Page MenuHomePhabricator

evas: change way of searching for Escape strings/values in textblock
ClosedPublic

Authored by ali.alzyod on Apr 15 2019, 10:32 AM.

Details

Summary

Instead of using old way for linear search for Escape Strings or Escape values, we will sort these values and string in compile time then binary search them.

In simple words:
Instead of having one array with pair of {escapeChar,escapeValue} and linear search it.
We will have two arrays with pair of {escapeChar,escapeValue}, one with escapeChar Sorted, and one with escapeValue sorted.
and we will use one of the array to binary search escape chars, and use the other to binary search escape values,

1- This will increase the speed for the search a lot for both Escape characters and Escape values.
2- Make code more easy to understand and trace
3- This will also fix bug for

int value;
const char * value = evas_textblock_string_escape_get("",&value)
//because of some unhanded case in previous code, this will return "&qout;" , which is first element in predefined escape character array
Test Plan
#define EFL_EO_API_SUPPORT 1
#define EFL_BETA_API_SUPPORT 1
#include<Eina.h>
#include<Efl.h>
#include <Elementary.h>

int main(int argc, char **argv)
{
   Evas_Object *win,*textblock;

   char * escapes[] = {
      "&Aacute;","&Acirc;","&Aelig;","&Agrave;","&Aring;","&Atilde;","&Auml;","&Ccedil;","&Dagger;",
      "&Eacute;","&Ecirc;","&Egrave;","&Eth;","&Euml;","&Iacute;","&Icirc;","&Igrave;","&Iuml;","&Ntilde;","&Oacute;","&Ocirc;","&Ograve;","&Oslash;","&Otilde;",
      "&Ouml;","&Thorn;","&Uacute;","&Ucirc;", "&Ugrave;", "&Yacute;", "&aacute;", "&acirc;", "&acute;", "&aelig;", "&agrave;", "&alpha;", "&amp;", "&and;", "&apos;",
      "&aring;", "&atilde;", "&auml;", "&beta;", "&brvbar;", "&bull;", "&ccedil;", "&cedil;", "&cent;", "&chi;", "&copy;", "&curren;", "&dagger;", "&darr;", "&deg;",
      "&delta;", "&divide;", "&eacute;", "&ecirc;", "&egrave;", "&epsilon;", "&equiv;", "&eta;", "&eth;", "&euml;", "&euro;", "&exist;", "&forall;", "&frac12;", "&frac14;", "&frac34;", 
      "&gamma;", "&gt;", "&harr;", "&hellip;", "&iacute;", "&icirc;", "&iexcl;", "&igrave;", "&int;", "&iota;", "&iquest;", "&iuml;", "&kappa;", "&lambda;", "&laquo;", "&larr;", "&larr;", "&lrm;", 
      "&lt;", "&macr;", "&micro;", "&middot;", "&mu;", "&nabla;", "&nbsp;", "&ne;", "&not;", "&ntilde;", "&nu;", "&oacute;", "&ocirc;", "&ograve;", "&omega;", "&omicron;", "&oplus;", "&or;", 
      "&ordf;", "&ordm;", "&oslash;", "&otilde;","&ouml;", "&para;", "&perp;", "&phi;", "&pi;", "&plusmn;", "&pound;", "&prod;", "&psi;", "&quot;", "&raquo;", "&rarr;", "&rarr;", "&reg;", "&rho;",
      "&rlm;","&sect;","&shy;","&sigma;", "&sum;", "&sup1;", "&sup2;", "&sup3;", "&szlig;","&tau;", "&theta;", "&thorn;","&times;","&uacute;", "&uarr;",
      "&ucirc;", "&ugrave;", "&uml;", "&upsilon;", "&uuml;", "&xi;",  "&yacute;", "&yen;", "&yuml;", "&zeta;", "&zwj;","&zwnj;",
      "&XXXXX&","&some text;","&WhatIsThis;" /*This is some invalide escape chars*/
      };

   char * common_escapes[] = {
      "&quot;","&amp;","&apos;","&lt;","&gt;",
      };


   char * values[] = {
      "\xc3\x81","\xc3\x82","\xc3\x86","\xc3\x80","\xc3\x85","\xc3\x83","\xc3\x84","\xc3\x87","\xe2\x80\xa1","\xc3\x89","\xc3\x8a","\xc3\x88","\xc3\x90","\xc3\x8b","\xc3\x8d",
      "\xc3\x8e","\xc3\x8c","\xc3\x8f","\xc3\x91","\xc3\x93","\xc3\x94","\xc3\x92","\xc3\x98","\xc3\x95","\xc3\x96","\xc3\x9e","\xc3\x9a","\xc3\x9b","\xc3\x99","\xc3\x9d","\xc3\xa1","\xc3\xa2","\xc2\xb4",
      "\xc3\xa6","\xc3\xa0","\xce\x91","\x26","\xe2\x88\xa7","\x27","\xc3\xa5","\xc3\xa3","\xc3\xa4","\xce\x92","\xc2\xa6","\xe2\x80\xa2","\xc3\xa7","\xc2\xb8","\xc2\xa2","\xce\xa7","\xc2\xa9","\xc2\xa4",
      "\xe2\x80\xa0","\xe2\x86\x93","\xc2\xb0","\xce\x94","\xc3\xb7","\xc3\xa9","\xc3\xaa","\xc3\xa8","\xce\x95","\xe2\x89\xa1","\xce\x97","\xc3\xb0","\xc3\xab","\xe2\x82\xac","\xe2\x88\x83","\xe2\x88\x80",
      "\xc2\xbd","\xc2\xbc","\xc2\xbe","\xce\x93","\x3e","\xe2\x86\x94","\xe2\x80\xa6","\xc3\xad","\xc3\xae","\xc2\xa1","\xc3\xac","\xe2\x88\xab","\xce\x99","\xc2\xbf","\xc3\xaf","\xce\x9a","\xce\x9b",
      "\xc2\xab","\xe2\x86\x90","\xe2\x87\x90","\xe2\x80\x8e","\x3c","\xc2\xaf","\xc2\xb5","\xc2\xb7","\xce\x9c","\xe2\x88\x87","\xc2\xa0","\xe2\x89\xa0","\xc2\xac","\xc3\xb1","\xce\x9d",
      "\xc3\xb3","\xc3\xb4","\xc3\xb2","\xce\xa9","\xce\x9f","\xe2\x8a\x95","\xe2\x88\xa8","\xc2\xaa","\xc2\xba","\xc3\xb8","\xc3\xb5","\xc3\xb6","\xc2\xb6","\xe2\x8a\xa5","\xce\xa6","\xce\xa0",
      "\xc2\xb1","\xc2\xa3","\xe2\x88\x8f","\xce\xa8","\x22","\xc2\xbb","\xe2\x86\x92","\xe2\x87\x92","\xc2\xae","\xce\xa1","\xe2\x80\x8f","\xc2\xa7","\xc2\xad","\xce\xa3","\xe2\x88\x91","\xc2\xb9",
      "\xc2\xb2","\xc2\xb3","\xc3\x9f","\xce\xa4","\xce\x98","\xc3\xbe","\xc3\x97","\xc3\xba","\xe2\x86\x91","\xc3\xbb","\xc3\xb9","\xc2\xa8","\xce\xa5","\xc3\xbc","\xce\x9e","\xc3\xbd","\xc2\xa5",
      "\xc3\xbf","\xce\x96","\xe2\x80\x8d","\xe2\x80\x8c"
      };

   clock_t start,end;
   char* value;
   int random,iRet;
   double total_time;
   srand(time(NULL));
   int arraySize;
   
   start = clock();
   arraySize = sizeof(escapes)/sizeof(char*);
   for(int i = 0 ; i < 1000000 ; i++){
      random = rand()%arraySize;
      value = evas_textblock_escape_string_get(escapes[random]);   
   }
   end = clock();
   total_time = ((double)(end-start))/CLOCKS_PER_SEC;
   printf("total time for escape_string is \t%f\n",total_time);


   start = clock();
   arraySize = sizeof(common_escapes)/sizeof(char*);
   for(int i = 0 ; i < 1000000 ; i++){
      random = rand()%arraySize;
      value = evas_textblock_escape_string_get(common_escapes[random]);   
   }
   end = clock();
   total_time = ((double)(end-start))/CLOCKS_PER_SEC;
   printf("total time for common escape_string is \t%f\n",total_time);


   start = clock();
   arraySize = sizeof(values)/sizeof(char*);
   for(int i = 0 ; i < 1000000 ; i++){
      random = rand()%arraySize;
      value = evas_textblock_string_escape_get(values[random],&iRet);   
   }
   end = clock();
   total_time = ((double)(end-start))/CLOCKS_PER_SEC;
   printf("total time for string_escape is \t%f\n",total_time);


   return 0;
}

Old Code
total time for escape_string is 2.370570
total time for common escape_string is 0.100057
total time for string_escape is 2.430365

New Code
total time for escape_string is 0.168964
total time for common escape_string is 0.078172
total time for string_escape is 0.136468

Diff Detail

Repository
rEFL core/efl
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10876
Build 8457: arc lint + arc unit
ali.alzyod created this revision.Apr 15 2019, 10:32 AM

It seems that this patch has no reviewers specified. If you are unsure who can review your patch, please check this wiki page and see if anyone can be added: https://phab.enlightenment.org/w/maintainers_reviewers/

ali.alzyod requested review of this revision.Apr 15 2019, 10:32 AM
ali.alzyod edited the summary of this revision. (Show Details)Apr 15 2019, 10:38 AM
ali.alzyod edited the summary of this revision. (Show Details)Apr 15 2019, 10:42 AM
ali.alzyod added reviewers: woohyun, bowonryu.
ali.alzyod edited the summary of this revision. (Show Details)Apr 15 2019, 10:46 AM

Replacing a linear search with a binary search is indeed a good thing. However, since this is a performance improvement, you should provide performance NUMBERS (As you did before, create a test case with lots of text and escape sequences).

I am surprised to see that the table of escapes contains all the & and ; chars. This hints that the algorithm could be improved... Do we first look for an & and THEN lookup the escapes table?
I tried to understand when is this escape sequence replacement performed but I got lost, so please excuse me if my suggestion is already in place.

src/lib/evas/canvas/evas_object_textblock.c
1610

The comment should say escape_values_e_sorted

ali.alzyod edited the test plan for this revision. (Show Details)Apr 16 2019, 5:13 AM
ali.alzyod added a comment.EditedApr 16 2019, 5:16 AM

@segfaultxavi Yes I think we can skip first '&' and last ';' from checking, this will enhance performance more, but I suggest doing it in other patch.
because this required other changes for exposed function, where we need to handle & in begining of string before passing it to internal algorithm, also to trim last ; from string

src/lib/evas/canvas/evas_object_textblock.c
1610

It will explain next variable that is called escape_values_v_sorted

segfaultxavi added inline comments.Apr 16 2019, 5:25 AM
src/lib/evas/canvas/evas_object_textblock.c
1610

Sorry, I meant escape_values_v_sorted (the comment says escape_values_s_sorted).

ali.alzyod added inline comments.Apr 16 2019, 5:54 AM
src/lib/evas/canvas/evas_object_textblock.c
1610

The comment means that, when you add new value to escape_values_v_sorted, then you should also reflect escape_values_s_sorted.

becase one contains escape_string sorted values, and the other contains escape_value sorted values

Well, these numbers are definitely impressive. Thanks.

One last thing. The original comment said that the escapes were sorted with the most common ones at the beginning. That is no longer the case. Could you run the test searching for only the 5 escapes marked as /* most common escaped stuff */ in the original code?
The problem with using artificial text sequences for the benchmarks is that they might not catch problems with natural (real) sequences.

src/lib/evas/canvas/evas_object_textblock.c
1610

Haha, yeah, and what I mean is that there is nothing called escape_values_s_sorted in the code :) notice the _s_.
What the comment wants to say is escape_values_e_sorted, notice the _e_.

ali.alzyod edited the test plan for this revision. (Show Details)Apr 16 2019, 9:31 AM
ali.alzyod added a comment.EditedApr 16 2019, 9:53 AM

@segfaultxavi

Old Code
total time for escape_string is 2.370570
total time for common escape_string is 0.100057
total time for string_escape is 2.430365

New Code
total time for escape_string is 0.133139
total time for common escape_string is 0.126393
total time for string_escape is 0.123412

With only pure common escape string (in old code/ still I think more should be added to this list) it is slitly slower, but still I think it is better to use new code, for example if we moved one or two tags to common list, speed will start dropping.
and new code is more likely to work with any tag that it can find.

We can create other list for common tags, and code can give slitly better results but it will complicate things up.

ali.alzyod edited the summary of this revision. (Show Details)
  • fix typo in the comments
ali.alzyod marked an inline comment as done.Apr 16 2019, 10:03 AM
segfaultxavi requested changes to this revision.Apr 16 2019, 10:42 AM

We have reached now a classical point. The new code is much better in the general case, but the old one is faster for the allegedly most common cases.
I don't think there's a good solution so maybe you can do a linear search on the short list of common escapes and then the binary search on the full list?

I have to ask: are you trying to optimize any particular scenario? Why are you looking at improving this function?

This revision now requires changes to proceed.Apr 16 2019, 10:42 AM
ali.alzyod added a comment.EditedApr 16 2019, 11:55 AM

@segfaultxavi I can update the code to get better results than old one for pure common escapes,
But still I think even if there are comment in the code that these are most common tags,this is not 100% true. (I mean that these are necessary the common ones), I can not find reference telling what are the most common one, for example this site show different list
https://www.impressivewebs.com/html-entity-reference-common/

And for your question, I want to update this one, to reuse the code if we are going to support named colors in EFL, like "red" "green", so searching binary list will make things much faster for big color names list

cedric requested changes to this revision.Apr 16 2019, 2:22 PM

Also it would be nice to follow efl coding style.

-search common escapes first
-EFL coding conventions

ali.alzyod edited the test plan for this revision. (Show Details)Apr 16 2019, 11:41 PM
ali.alzyod edited the test plan for this revision. (Show Details)
This comment was removed by ali.alzyod.
This comment was removed by ali.alzyod.
This comment was removed by ali.alzyod.
This comment was removed by ali.alzyod.
This comment was removed by ali.alzyod.

Can you update the benchmark numbers with the latest approach?
Thanks for taking the time to fix all this!
(BTW, what happened with D8620 ? nobody can open that ticket.)

src/lib/evas/canvas/evas_object_textblock.c
1451

Sorry, but I'll keep following you! The comment should say escape_values_e_sorted. Notice the _e_.

ali.alzyod added a comment.EditedApr 17 2019, 1:32 AM

@segfaultxavi I already update them in Test Plan

@segfaultxavi I already update them in Test Plan

Oh, sorry, I had not realized. Indeed we have very impressive numbers now :)
Just fix the comment (that has reverted itself) and I'm OK with this.

ali.alzyod updated this revision to Diff 21400.Apr 17 2019, 1:47 AM
  • fix typo in the comments
ali.alzyod marked an inline comment as done.Apr 17 2019, 1:47 AM
segfaultxavi added inline comments.Apr 17 2019, 3:13 AM
src/lib/evas/canvas/evas_object_textblock.c
7739

This condition looks very strange, because this should be the normal condition (string longer than escape).
In this case, you always return 1, which means always &apos; or &iexcl;.

Are you sure this function is working? Can you add checks to the benchmark to make sure that the returned escapes match the expected results?

ali.alzyod added inline comments.Apr 17 2019, 3:30 AM
src/lib/evas/canvas/evas_object_textblock.c
7739

There are no return value here, This will continue binary search upper or lower.

Value will return only if two strings are equal

ali.alzyod added inline comments.Apr 17 2019, 3:34 AM
src/lib/evas/canvas/evas_object_textblock.c
7739

because there are no version for strcmp that compare specific length of string with other full string. (you can not use strncmp)

segfaultxavi accepted this revision.Apr 17 2019, 3:44 AM

I have got no further comments :)

Also, I noticed the testsuite is already testing for correctness of these methods, nice!

src/lib/evas/canvas/evas_object_textblock.c
7739

Oh, sorry, you're right. Not my brightest day.

bu5hm4n requested changes to this revision.Apr 17 2019, 5:41 AM
bu5hm4n added a subscriber: bu5hm4n.

The code still is not in the coding standards of EFL, "while (" not "while(" same for the if's. We also do not name things in camel, so its not nRet but rather n_ret also there are normally a bit more " " a if should look for example like this`if (n_ret != -1)`. Please see https://www.enlightenment.org/contrib/devs/coding-conventions.md

This revision now requires changes to proceed.Apr 17 2019, 5:41 AM
ali.alzyod updated this revision to Diff 21403.Apr 17 2019, 5:51 AM
This comment was removed by ali.alzyod.
ali.alzyod updated this revision to Diff 21404.Apr 17 2019, 5:54 AM

follow EFL code convention for while/if statments

ali.alzyod updated this revision to Diff 21406.Apr 17 2019, 5:57 AM

code convention

Yeah, it should make "finding escape string" be faster.
If there would be no objection (or feedback), I'll accept this patch soon.

bu5hm4n resigned from this revision.Apr 18 2019, 11:42 PM

Feel free to approve it :).

However, there are still parts where the coding convention is not kept. The type is still in CamelCase, they should rather be Escape_Value instead of EscapeValue. And from time to time spaces after commas are missing. This however should not be a big blocker.

ali.alzyod updated this revision to Diff 21571.Apr 23 2019, 9:52 AM

fix code convention issues

bu5hm4n retitled this revision from Change Way of searching for Escape strings/values to evas: change way of searching for Escape strings/values in textblock.Apr 25 2019, 5:00 AM
This revision was not accepted when it landed; it landed in state Needs Review.Apr 25 2019, 5:04 AM
Closed by commit rEFL555ac0a452c3: evas: change way of searching for Escape strings/values in textblock (authored by ali.alzyod, committed by Marcel Hollerbach <mail@marcel-hollerbach.de>). · Explain Why
This revision was automatically updated to reflect the committed changes.