Page MenuHomePhabricator

Evas textblock: fix wrong hyphenation issues with non UTF8 encoded dictionary
ClosedPublic

Authored by id213sin on Apr 4 2016, 2:19 AM.

Details

Summary

hnj_hyphen_hyphenate2() needs properly encoded text based on the given
dictionary. Each dictionary contains its encoding information at the head
of file. So, text will be converted to proper encoding before calling
the function. It fixes T3221.
@fix

Test Plan

Included in Evas test suite.

Diff Detail

Repository
rEFL core/efl
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
id213sin updated this revision to Diff 8825.Apr 4 2016, 2:19 AM
id213sin retitled this revision from to Evas textblock: fix wrong hyphenation issues for multi bytes characters.
id213sin updated this object.
id213sin edited the test plan for this revision. (Show Details)
herdsman requested changes to this revision.Apr 4 2016, 8:55 AM
herdsman edited edge metadata.

The returned hyphens are not encoded in UTF-8 format (although we do pass the input as UTF-8).
I was doing the extra step to pass UTF-8, as the dictionary was UTF-8.

The result though is converted back to UTF-32.
Though, now I see that /usr/share/hyphen/hyph_de_DE.dic is encoded with ISO8859-1, so it was wrong to always pass the UTF-8 string.

It may be possible to solve this just passing the string as it is, and if the characters are valid DE language, the result will be fine. Either of us will have to test this.

This revision now requires changes to proceed.Apr 4 2016, 8:55 AM
id213sin updated this revision to Diff 8852.Apr 6 2016, 9:12 PM
id213sin edited edge metadata.

Update patch to check encoding information and convert text.

@herdsman
Thanks for helping me! :)

id213sin retitled this revision from Evas textblock: fix wrong hyphenation issues for multi bytes characters to Evas textblock: fix wrong hyphenation issues with non UTF8 encoded dictionary.Apr 6 2016, 9:13 PM
id213sin updated this object.
id213sin edited edge metadata.
tasn edited edge metadata.Apr 8 2016, 3:07 AM

I'm a bit surprised and sad to see that non-utf8 is actually legal. :( I'd say we should probably just restrict ourselves to utf-8 and open a ticket with upstream. This is quite ridiculous.

In D3863#63578, @tasn wrote:

I'm a bit surprised and sad to see that non-utf8 is actually legal. :( I'd say we should probably just restrict ourselves to utf-8 and open a ticket with upstream. This is quite ridiculous.

Well the encoding of the dictionary files is not an upstream issue. These are independently provided from various sources, it seems. The Arch Linux package DE dictionary is dated to '14.

If we still want to be strictly use utf-8, we could just convert the rogue dictionaries ourselves, so we avoid the runtime conversion.

tasn added a comment.Apr 14 2016, 2:35 AM

You don't want to convert on your own. Nevermind then. This sucks though.

raster added a subscriber: raster.Jul 11 2016, 1:41 AM

updates on this?

Hi,
Let's do some fixes and then we can get this in.

Thanks :)

src/lib/evas/canvas/evas_textblock_hyphenation.x
155

Shouldn't you pass the size in bytes instead i.e. converted_len? Also worth checking if it includes the optional BOM characters and subtract this.

161

Actually, this is the optional BOM (byte order mark) we discussed about. Better make that clear for anyone who sees this bit of code.

163

Just to be no the safe side, add a length check before checking the next byte.

herdsman requested changes to this revision.Jul 31 2016, 5:24 AM
herdsman edited edge metadata.

Ugh. Forgot.

This revision now requires changes to proceed.Jul 31 2016, 5:24 AM

@herdsman
Thanks for comment. :)
I'll update the patch soon.

src/lib/evas/canvas/evas_textblock_hyphenation.x
155

Yes, you're right. I'll replace word_len to (converted_len - converted_text_offset).

161

Ok, then I'll modify the comment.

163

I'll put more condition checks before and after checking BOM character.

id213sin updated this revision to Diff 9716.Aug 9 2016, 12:18 AM
id213sin edited edge metadata.

Update the following things.

  • Add clear comment for BOM character.
  • Add more condition before/after checking BOM character.
  • Pass proper length for converted text.
herdsman requested changes to this revision.Aug 11 2016, 5:15 AM
herdsman edited edge metadata.

Overall it looks good. However, the test suite fails (see inline comment).
I guess we need to provide the dictionary with the test suite?

Please share your opinion on this.

src/tests/evas/evas_test_textblock.c
4161

What happens if I don't have this specific dictionary installed?

This revision now requires changes to proceed.Aug 11 2016, 5:15 AM

@herdsman
I missed that case.
Please give me your opinion. :)

src/tests/evas/evas_test_textblock.c
4161

Oh, I missed that case. :P
But, if I provide a dict file in test suite,
I need a way to override or append a new path for dict.

How about to add a environment variable for appending a new path? :)

Yes, no problem about using an environment variable for that.

id213sin updated this revision to Diff 10019.Oct 24 2016, 12:39 AM
id213sin edited edge metadata.

Add hyph_de_DE.dic file for test-suite.
It also contain a change for getting path for dictionary files
via enviroment value.

@herdsman happy with the latest update?

herdsman requested changes to this revision.Feb 20 2017, 7:59 AM

Tested. Looks good with a few requested changes.
Also, better to also provide the en_US dictionary for the first test.

src/lib/evas/canvas/evas_textblock_hyphenation.x
141

Should be size_t. Should also remove the int cast below.

165

No need to cast. Also converted_text[0] and converted_text[1] is a bit more readable.

This revision now requires changes to proceed.Feb 20 2017, 7:59 AM
id213sin updated this revision to Diff 12835.Nov 1 2017, 12:20 AM

Update the following things.

  • Fix commented code.
  • Add hyph_en_US.dic and its README file.
  • Update hyph_de_DE.dic and add its README file.
zmike added a project: efl.May 2 2018, 4:13 PM
zmike added a subscriber: zmike.

This patch applies cleanly.

@herdsman

Tested and accepted. Thank you for the hard work :)

This revision was not accepted when it landed; it landed in state Needs Review.May 6 2018, 2:41 AM
This revision was automatically updated to reflect the committed changes.