
EFL Evas Textblock Object
Updated 1,567 Days AgoPublic

Introduction

This document serves as a complement to the existing [Textblock Documentation and Tutorial](http://docs.enlightenment.org/auto/efl/group__Evas__Object__Textblock.html#Evas_Object_Textblock_Tutorial "") in the official EFL documents.

The Textblock object is an extensive tool for handling complex text, such as text that has multiple styles or spans multiple lines. More advanced features include text alignment, wrapping and embedding of items such as icons.
Currently, this is the preferred way of handling text in EFL, and demand for a more comprehensive explanation led to the writing of this document.

Markup

Setting text in a Textblock is done with the evas_object_textblock_text_markup_set and evas_object_textblock_text_markup_prepend functions.
The textblock object introduces a light markup language for using the special text formatting it offers. For example, the following markup:
<font_weight=bold>Hello</font_weight> World
will produce the text Hello World, with "Hello" in bold: {F11559}

Formatting

Formatting can be achieved by either using the formatting functions, or by adding formatting tags when setting the markup text.
Tags add convenience and are converted to the proper formatting property. Thus, instead of <font_weight=bold>Hello</font_weight> World we can just use <b>Hello</b> World.
The <b> and <i> tags are default tags that are built into Textblock. Additional tags can be defined by setting a style on the Textblock object with the evas_object_textblock_style_set function.
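
The tag-to-format conversion described above can be pictured as a simple lookup. The following is a minimal sketch, not Textblock's actual implementation: the Tag_Alias type, the default_aliases table and tag_to_format are hypothetical names, and the expansion used for <i> is an assumption (only the <b> expansion comes from the example above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: maps a short tag to the format it stands for. */
typedef struct {
    const char *tag;
    const char *format;
} Tag_Alias;

static const Tag_Alias default_aliases[] = {
    { "b", "font_weight=bold" },  /* taken from the example above */
    { "i", "font_style=italic" }, /* assumed expansion for <i> */
};

/* Returns the format string for a tag, or NULL if the tag is unknown. */
const char *
tag_to_format(const char *tag)
{
    size_t n = sizeof(default_aliases) / sizeof(default_aliases[0]);

    assert(tag != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(default_aliases[i].tag, tag) == 0)
            return default_aliases[i].format;
    }
    return NULL;
}
```

Calling tag_to_format("b") yields the same format string that the long form of the markup spells out explicitly. A style set on the object would, in this picture, simply add more rows to the table.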

For the full list of formats that Textblock offers, please consult the Textblock Style Page.

Textblock Logic

Nodes

When a user sets the textblock's markup using evas_object_textblock_text_markup_set (or evas_object_textblock_text_markup_prepend), the text is parsed to create [Text Nodes](#nodes_text) and [Format Nodes](#nodes_format) from occurrences of plain text and tags, respectively. Nodes contain the minimal amount of information required, so that the heavy work of layout can be deferred to the rendering stage.

{F11562, layout=right, float}

Format nodes are assigned to text nodes, and each text node can have many format nodes assigned to it.
The relation (pointers) between the two types is depicted in the following diagram:

Solid lines represent required pointers to make it work, dotted lines represent list pointers (for iteration) and dashed lines represent pointers for optimization.

Text Nodes

[Evas_Object_Textblock_Node_Text]
Text nodes hold the actual text. The text is stripped of its format tags and stored as Unicode data.
The stripped format tags create format nodes with relevant information.

Format Nodes

[Evas_Object_Textblock_Node_Format]
These represent the format instances determined by the style and the format tags in a text.

{F11564, layout=right, float}

Paragraph

Paragraphs are delimited by a Paragraph Separator (PS) format tag. Each paragraph has lines, and each line describes the visual ordering of items.
The paragraph object holds a logical list of all of its items.

IMPORTANT: This is true if legacy newline support is turned off. Otherwise, a newline item creates a new paragraph instead of a line.

Lines

Lines are created by either having a Line Break format tag, or by a property of the
paragraph, such as line-wrapping.
Essentially, each line is a list of Textblock items.

Items

Items are elements that describe how a line is laid out. Their order in each line of a paragraph determines the visual placement of the text in the rendering stage.
Each item is associated with a text node, as well as its offset in that node, so that multiple items may represent different sections of the same text node. Each item is associated with a format node.
There are two types of items: Text and Format. Each extends the Item object.

Text Items

Text items are essentially an extension to Evas Text Props.

Text Props

The text props (text properties) structure is the core text structure of Evas. It is used throughout all text-related Evas objects.
Mainly, the text props structure consists of:

  • An array of glyphs
  • An array of glyph information.

Glyph information is populated during the pre-layout stage, and the glyphs themselves at the rendering stage.
The information of each glyph, which is required to handle text properly, is as follows:

  • Pen position after glyph
  • x/y bearing
  • Width
  • Opentype Information:
    • Cluster index
    • x/y offset

The Text_Props structure contains information for a whole paragraph. It can also be shared among multiple text items that were created as a result of one text item being split, for various reasons. One example of such a case is text wrapping, which is demonstrated in the following diagram:

{F11577, layout=right, float}

This is a rough example of how a single line of the "Hello World" text is linked to its text props. Note that the glyph information does not hold a character or a glyph, but only the information as described in the list above. The letters of each corresponding glyph information are placed in the diagram for the convenience of the reader.

{F11581, layout=right, float}

If the text gets wrapped because of the width restrictions of the textblock's window, the line is split. In this case, the wrapping is set to "word-wrap".
As we can see, the single item has been split into three items. Each Text Item has its own Text Props structure, which corresponds to a different range in the glyph information array. For example, the item of the word "World" is placed in the second line, but uses the same allocated space of the glyph information.
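
The sharing described above can be sketched as items that each view a range of one glyph information array. This is an illustrative model only: Glyph_Info, Item_View and item_split are hypothetical names and do not mirror the real Evas structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified per-glyph record (the real one holds more fields). */
typedef struct {
    int pen_after; /* pen position after the glyph */
    int width;
} Glyph_Info;

/* An item views a range of a shared glyph information array. */
typedef struct {
    const Glyph_Info *info;  /* shared with other items, not owned */
    size_t            start; /* first glyph of this item */
    size_t            len;   /* number of glyphs in this item */
} Item_View;

/* Splits `item` at `at`, counted from the item's start. The head stays in
 * `item` and the tail is returned. No glyph information is copied. */
Item_View
item_split(Item_View *item, size_t at)
{
    Item_View tail;

    assert(item != NULL && at <= item->len);
    tail.info = item->info;
    tail.start = item->start + at;
    tail.len = item->len - at;
    item->len = at;
    return tail;
}
```

Splitting an 11-glyph "Hello World" item at position 6 leaves a head covering glyphs 0 to 5 and a tail covering glyphs 6 to 10, both pointing at the same array. This is why a split is cheap: only the ranges change.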

Format Items

Format items are special formats that exist in the text itself. Text elements such as the paragraph separator, line break, tabs and <item> tags create format items. Tabs and <item> format items can actually take up space in the text.

Layout

Textblock's layout is divided into two main stages:

  • Creation of logical items as well as their respective paragraphs (pre-layout)
  • Individual visual handling of each paragraph, covering line geometry, wrapping etc.

Logical Items and Font Runs

The earlier example with text items showed a line with a single item being split due to line-wrapping. However, there are more cases where we want more than one item in our text, so that rendering can be done more efficiently.

{F11583, layout=right, float}

While most text in the western world sticks to Latin-only characters, there are cases where you need to use other scripts. For instance, if you want to explain that the word "Hello" is "שלום" in Hebrew (an RTL language), you would write a text similar to this: "Helloשלום" (normally you would at least add a whitespace between the two words).

{F11585, layout=right, float}

The above image shows the expected order of characters on-screen. The order in which these characters are stored in memory is a bit different:

The first thing that comes to mind is: how do we determine in which order we display these characters? Textblock handles this by creating a separate item for each "run" in the text. In brief, a "run" is the longest sequence of characters, starting at a given position, that ends just before the first character whose script differs from that of the first character in the run.

In this example, the runs are as follows: {F11587}
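
The run definition above can be sketched in a few lines. This is a toy model under stated assumptions: the text is given as an array of Unicode code points, and script_of is a hypothetical classifier that only knows basic Latin letters and the Hebrew block. A real implementation would rely on the Unicode script property and would treat characters such as spaces and punctuation specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SCRIPT_LATIN, SCRIPT_HEBREW, SCRIPT_OTHER } Script;

/* Toy classifier covering only A-Z, a-z and the Hebrew block. */
Script
script_of(uint32_t cp)
{
    if (cp >= 0x0590 && cp <= 0x05FF)
        return SCRIPT_HEBREW;
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return SCRIPT_LATIN;
    return SCRIPT_OTHER;
}

/* Returns the length of the run that starts at `pos`: the run ends just
 * before the first character whose script differs from text[pos]. */
size_t
run_length(const uint32_t *text, size_t len, size_t pos)
{
    assert(text != NULL && pos < len);

    Script first = script_of(text[pos]);
    size_t end = pos + 1;

    while (end < len && script_of(text[end]) == first)
        end++;
    return end - pos;
}
```

For the code points of "Hello" followed by "שלום" (U+05E9, U+05DC, U+05D5, U+05DD), the run starting at position 0 has length 5 and the run starting at position 5 has length 4, which matches the two runs in the example.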

Pre-Layout

At this stage all of the first-level information (that has been set by user functions), such as text and format nodes, is processed. Each text node produces a single paragraph, and the two are associated with one another. Logical textblock items are created and assigned to their respective paragraphs, after being set with the actual formats that were produced from processing the format nodes.

Pre-layout Example #1 - Font Formats
NOTE: The queuing mechanism of formats will not be described in this first example, to keep things a bit simpler. For now, we just need to know that it remembers to which offset in the text node's text it assigns each format.

{F11736, layout=right, float, size=thumb}

  • A user uses the API to set a markup text:

evas_object_textblock_text_markup_set(tb, "Small <font_size=24>Big");

This instantiates text and format nodes: Evas_Object_Textblock_Node_Text and Evas_Object_Textblock_Node_Format.
The text node contains the text information, specifically the string "Small Big".
The format node contains the format string font_size=24, and has an offset field of 6 to associate it with the Big part in the text.
This is all that is done at this point, so setting the markup is fast.
At the render stage, this new information will be processed into actual text entities.
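
The node creation step can be illustrated with a small stand-alone parser. This is a sketch under simplifying assumptions, not the Evas parser: Text_Node, Format_Node and parse_markup are hypothetical stand-ins, only well-formed <...> tags are handled, and closing tags and escapes are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FORMATS 8
#define MAX_LEN     64

/* Simplified stand-ins for the real node structures. */
typedef struct {
    char   format[MAX_LEN]; /* e.g. "font_size=24" */
    size_t offset;          /* position in the stripped text */
} Format_Node;

typedef struct {
    char        text[MAX_LEN]; /* text with the tags removed */
    Format_Node formats[MAX_FORMATS];
    size_t      format_count;
} Text_Node;

/* Splits markup into stripped text plus format nodes with offsets. */
void
parse_markup(const char *markup, Text_Node *node)
{
    size_t len = 0;

    assert(markup != NULL && node != NULL);
    memset(node, 0, sizeof(*node)); /* also NUL-terminates all strings */

    for (const char *p = markup; *p != '\0'; p++) {
        const char *end = (*p == '<') ? strchr(p, '>') : NULL;

        if (end != NULL && node->format_count < MAX_FORMATS) {
            Format_Node *fmt = &node->formats[node->format_count++];
            size_t n = (size_t)(end - p - 1);

            if (n >= MAX_LEN)
                n = MAX_LEN - 1;
            memcpy(fmt->format, p + 1, n); /* tag body without < and > */
            fmt->offset = len;             /* where the format starts */
            p = end;                       /* skip past the tag */
        } else if (len < MAX_LEN - 1) {
            node->text[len++] = *p;        /* plain text character */
        }
    }
}
```

Feeding it the markup from this example gives the text "Small Big" and one format node holding font_size=24 at offset 6. Nothing is measured or shaped here, which is the point of keeping this step cheap.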

{F11739, layout=right, float}

  • The new nodes are marked as new and handled at the pre-render stage:

A single paragraph is created for this single text node. Now, each of the format nodes corresponding to this text node (all of them, in our case) is processed and dictates what needs to be done.

Note that an initial format is set to the textblock. It is set by the default base style. Let's assume that in this example it is a font of size 10.
So, the current format is font_size=10.
The first format node is processed, and it is determined that the current font_size=10 and new format font_size=24 differ in their fonts. As a result, a Text Item is created with the previous format font_size=10, and is associated with the Small part in the text. Furthermore, the current format is replaced with the new font_size=24 format.
There are no more format nodes left, so a second Text Item is created with the current font_size=24 format, and it is associated with the Big part in the text.
The newly-created logical items are consecutively stored in the logical_items list of the paragraph.
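
The walk over the format nodes in this first example can be sketched as follows. It is a simplified model that covers only this case, where every format node changes the font and therefore closes the current item. Format_Node, Text_Item and build_items are hypothetical names, and the format queuing mechanism is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *format; /* e.g. "font_size=24" */
    size_t      offset; /* where it starts in the text node */
} Format_Node;

typedef struct {
    const char *format; /* format in effect for this item */
    size_t      start;  /* offset in the text node */
    size_t      len;    /* number of characters covered */
} Text_Item;

/* Walks the format nodes in order; each node closes the item built with the
 * previous format. Assumes ascending offsets that lie within text_len.
 * Returns the number of items written to `items` (at most max_items). */
size_t
build_items(size_t text_len, const char *base_format,
            const Format_Node *formats, size_t format_count,
            Text_Item *items, size_t max_items)
{
    const char *current = base_format;
    size_t start = 0, count = 0;

    assert(items != NULL && base_format != NULL);

    for (size_t i = 0; i <= format_count; i++) {
        /* After the last format node the item runs to the end of the text. */
        size_t end = (i < format_count) ? formats[i].offset : text_len;

        if (end > start && count < max_items) {
            items[count].format = current;
            items[count].start = start;
            items[count].len = end - start;
            count++;
        }
        if (i < format_count) {
            current = formats[i].format; /* new format becomes current */
            start = end;
        }
    }
    return count;
}
```

With a text length of 9, a base format of font_size=10 and one format node (font_size=24 at offset 6), the function produces two items: one covering "Small " with the base format and one covering "Big" with the new format.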

Pre-layout Example #2 - Color Formats
  • A user uses the API to set a markup text:

evas_object_textblock_text_markup_set(tb, "Black <color=#00f>Blue");

This instantiates text and format nodes: Evas_Object_Textblock_Node_Text and Evas_Object_Textblock_Node_Format.
The text node contains the text information, specifically the string "Black Blue".
The format node contains the format string color=#00f, and has an offset field of 6 to associate it with the Blue part in the text.

  • The new nodes are handled at the pre-render stage:

{F11828, layout=right, float}

Note that an initial format is set to the textblock. It is set by the default base style. Let's assume that in this example it's black.
So, the current format is color=#000.
The first format node is processed. This triggers the queuing mechanism to queue the first format color=#000, and set the current format as color=#00f. We overlooked this in the previous example to simplify things, but this is what happens for each format node.

Unlike the previous example, the current format node doesn't have a different font format. This means that we do not create two completely different Text Items. Instead, two text items will share the same glyphs area in memory, as will be explained. So, the second format is just queued with an offset specifying where it starts in the text.
There are no more format nodes left. Once we are done, we create the text items. We use the first queued format color=#000 and create the first Text Item for the whole Black Blue text. Afterwards, we use the second queued format color=#00f and split the Black Blue Text Item at the queued offset. Again, this results in two Text Items: one for Black and one for Blue. The color=#00f format is set on the second (Blue) text item.
The newly-created logical items are consecutively stored in the logical_items list of the paragraph.

You may notice that this behavior differs from our previous example. Here, each of the Text Items has the same font information, so they share the same glyphs space, but with different offsets (i.e. "split items"). This allows some text behaviors to work as they should (e.g. clusters, ligatures) even if we use different color formats.

{F12044, layout=right, float}

Font Information Breakdown

The Evas_Text_Props structure holds all the required data to have a correct rendering and querying of the text it represents.
TBD

Visual Layout

This is where all the logical items are ordered.
Geometries of all items, lines and paragraphs are calculated here. All line wrapping and ellipsis handling is done here as well. This stage results in what is called "The Formatted Layout".

Wrapping

The Textblock object supports wrapping of the following types: character, word and mixed. If the user has set the style to have wrapping, and the content's width exceeds the width of the textblock, then wrapping occurs, given there is enough height.

Ellipsis

When a user sets the style to have ellipsis, and the content's width exceeds the width of the textblock, then ellipsis occurs. In this case, the text is cut and a portion is replaced with "..." so that the text fits the width again. This differs from wrapping in that no characters are moved to the lines below.
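
The idea can be illustrated with a fixed-width model where every character is one unit wide. This is only a sketch: ellipsize is a hypothetical helper, it always cuts at the end of the text, and it counts characters where Textblock measures actual glyph widths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes `text` into `out`, cut to at most `width` characters. If the text
 * is too long, its tail is replaced with "...". Assumes fixed-width
 * characters and width >= 3; `out` must hold at least width + 1 bytes. */
void
ellipsize(const char *text, size_t width, char *out)
{
    size_t len;

    assert(text != NULL && out != NULL && width >= 3);
    len = strlen(text);
    if (len <= width) {
        memcpy(out, text, len + 1);    /* fits: copy unchanged */
        return;
    }
    memcpy(out, text, width - 3);      /* keep the head of the text */
    memcpy(out + width - 3, "...", 4); /* ellipsis plus terminator */
}
```

With a width of 8, "Hello World" becomes "Hello...": the text still occupies a single line, and nothing is pushed to a line below.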

Formatted vs. Native Size

The width and height values of the resulting layout are known as the "formatted size". It has the width that the user has set for the textblock object (using the evas_object_resize function), and the required height so that the whole text fits in it.
On the other hand, we have the "native size" values. This can be thought of as the formatted size, but with its width already set to infinity.
The difference between the two types of sizing is that text wrapping can *not* occur with the latter, as the width is big enough to fit the whole line of the original text. So, formatted height >= native height.
In actuality, the native size is calculated a bit differently: instead of running the layout process with an infinite width value, a different layout mechanism is used. In this kind of layout we skip wrapping handling. Lines are only created when meeting line or paragraph breaks. This makes the native size calculation a lot faster.
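
The difference between the two sizes can be sketched with a fixed-width model, where each character is one unit wide and each line is one unit high. native_size and formatted_height are hypothetical helpers: the first breaks lines only at '\n', the second adds a simple greedy word wrap. Real layout works on shaped glyphs, so this only illustrates the relation between the two results.

```c
#include <assert.h>
#include <stddef.h>

/* Native size: lines are created only at explicit breaks ('\n'). */
void
native_size(const char *text, size_t *w, size_t *h)
{
    size_t cur = 0;

    assert(text != NULL && w != NULL && h != NULL);
    *w = 0;
    *h = 1;
    for (; *text != '\0'; text++) {
        if (*text == '\n') {
            (*h)++;
            cur = 0;
        } else if (++cur > *w) {
            *w = cur;
        }
    }
}

/* Formatted height for a given width, using greedy word wrapping.
 * A word longer than `width` simply overflows its own line. */
size_t
formatted_height(const char *text, size_t width)
{
    size_t lines = 1, col = 0;

    assert(text != NULL && width > 0);
    while (*text != '\0') {
        if (*text == '\n') {
            lines++;
            col = 0;
            text++;
        } else if (*text == ' ') {
            if (col > 0 && col < width)
                col++; /* spaces at a line edge are dropped */
            text++;
        } else {
            size_t word = 0;

            while (text[word] != '\0' && text[word] != ' ' && text[word] != '\n')
                word++;
            if (col > 0 && col + word > width) {
                lines++; /* the word does not fit: wrap */
                col = 0;
            }
            col += word;
            text += word;
        }
    }
    return lines;
}
```

For "Hello World", the native size is 11 by 1. With a width of 6 the formatted height becomes 2, and with a width of 11 it stays at 1, so the formatted height is never smaller than the native height.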

Unicode

Evas Textblock works in accordance with the Unicode Standard:

  • Word breaking (UAX #29) and line breaking (UAX #14) are supported via libunibreak.
  • Glyph shaping and glyph information are retrieved using HarfBuzz.

References

Grapheme

An atomic unit of a certain written language, which is used to represent a single distinct sound unit (phoneme). A grapheme may be a single letter or a combination of letters.

Glyph

A visual element in a language. Some languages may regard a certain symbol a glyph, while others may not. Glyphs may either represent graphemes or other symbols.

Ligatures

A combination of graphemes, represented by a single glyph. Some font faces for the same language may not treat a given combination of graphemes as a ligature, while others do.

Grapheme Clusters vs. Ligatures

The first thing that needs to be clarified is that grapheme clusters and ligatures are different. Both a ligature and a grapheme cluster can be formed from two or more consecutive characters, but only a grapheme (cluster) is a "user-perceived" character.
[To be expanded...]

Bidi Properties

Usages

Textblock and Unicode

In this section we discuss in more detail how Evas Textblock supports and handles Unicode text.
We will not elaborate much on Unicode here. If you are unfamiliar with the basics of Unicode, it is advised to read up on them first.

Evas_Text_Props

As noted, the Evas_Text_Props struct is the core of every Evas Textblock object.
The Evas_Text_Props_Info
[TBD]

OpenType

The Evas_Text_Props structure has been expanded to support OpenType shaping, using HarfBuzz as the OpenType shaper. Its shaping properties are stored in the Evas_Font_OT_Info and Evas_Font_Glyph_Info structures.
These are stored and may be shared among multiple Text Items, for cases such as color-formatting and even wrapping.

Last Author
herdsman
Last Edited
Dec 8 2014, 4:06 AM