Page MenuHomePhabricator

efl_ui_textpath: mathmatical calculations
Needs ReviewPublic

Authored by ali.alzyod on Jun 8 2020, 9:46 AM.

Details

Summary

Reduce number of sqrt calls.

Diff Detail

Repository
rEFL core/efl
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 16998
Build 11306: arc lint + arc unit
ali.alzyod created this revision.Jun 8 2020, 9:46 AM

It seems that this patch has no reviewers specified. If you are unsure who can review your patch, please check this wiki page and see if anyone can be added: https://phab.enlightenment.org/w/maintainers_reviewers/

ali.alzyod requested review of this revision.Jun 8 2020, 9:46 AM
cedric requested changes to this revision.Jun 8 2020, 10:11 AM
cedric added inline comments.
src/lib/elementary/efl_ui_flip.c
373 ↗(On Diff #30567)

You could use sincos to have just one call and rely on the CPU having instruction that calculate them in one pass.

375 ↗(On Diff #30567)

Same as above.

src/lib/evas/canvas/evas_map.c
1043 ↗(On Diff #30567)

Same as above, but you could also do this in a more optional way by using a boolean that tell you if the sincos has been calculated already and only calculate it if necessary.

This revision now requires changes to proceed.Jun 8 2020, 10:11 AM
ali.alzyod updated this revision to Diff 30577.Jun 9 2020, 12:24 AM
ali.alzyod marked 3 inline comments as done.

use sincos instead of two calls for sin and cos

src/lib/evas/canvas/evas_map.c
1043 ↗(On Diff #30567)

Have dirty flag and cache results is more complicated than this patch Idea.

raster added a subscriber: raster.Jun 9 2020, 3:02 AM

don't take this the wrong way, but i think 90% of the time you'r chasing the wrong thing. the compiler already does this. you need to find the cases where the compiler does NOT do something smart ... (and -O0 doesn't count! -O2/3 should be what you worry about). just trying to encourage you to focus on the valuable things. reviewing patches that actually make no difference (and possibly make it harder to read/follow/understand) is not useful for reviewers or for you in creating the patches :) focus on the value bits... if you want pointers into what could do with speedups that i am pretty sure are a win in almost all if not all cases on average, then let's have a chat :).

src/lib/elementary/efl_ui_flip.c
381 ↗(On Diff #30577)

so we went from this sequence of math calls:

sqrt, pow, sin, asin, sin, sin

then sin to this:

sincos, sincos, sqrt, pow, asin, sincos

we actually do 6 math calls no mater what. now .. the question is ... i the new sequence faster? sqrt, pow, asin are the same so we're talking sincos vs sin speed.

without actually benchmarking my gut says "same speed". so we're talking a change that makes zero difference. imho it makes it less readable and clear.

388 ↗(On Diff #30577)

are you sure the above is actually faster (not -O0 - real life -O usage)? the compiler SHOULD see that cos(rho) and sin(rho) is called twice with no change in rho, store the result and call only once and re-use. that would make the above changes a NOP.

src/lib/elementary/efl_ui_textpath.c
292

this also is dubiously faster. it does make 1 sqrt conditional, adds a mul in return for that and that is fair and probably a bit faster. so in this case i think you win - so this comment is not a "change things" but a "i think you actually may have found a positive change here" :)

src/lib/elementary/elm_gengrid.c
1728 ↗(On Diff #30577)

are you sure the compiler doesn't already do this for you? are see its the same sin call with same vars that have not changed? :) double-check with gcc -s and read the asm :)

src/lib/evas/canvas/evas_map.c
1074 ↗(On Diff #30577)

now again.. does the compiler already figure this out for you? :) because i'm hinting at this... it does do that. try gcc -s with:

double x = atof(argv[1]);
printf("hello1 %1.1f\n", sin(x));
printf("hello2 %1.1f\n", sin(x));

and see how many times sin() is called :)

you're probably much better trying to beat the compiler by moving the if (rz/ry/rz == 0) outside of the loop to have no if's inside and "unroll it" or check the compiler does the unrolling for you... but given what i've seen from gcc your change here is a noop as it calculates the above just once. you will have replaced a sin+cos with a single sincos though... that is a win, but maybe not as much as you think. just as a hint, extract this function/code snippet into a stand-alone test you can benchmark and then read the asm ouptput too to see what the compiler does and i think you may find your wins are elsewhere like the unrolling of if's (so you then have 8 version of the loop with rz/y/z == 0 or not so its really 3 bits thus 8 versions of it).

@raster lets look at the following code:

#include<math.h>
#include<time.h>
#include<stdio.h>
#include<stdlib.h>

void nothing(double a, double b)
{
  (void) a;
  (void) b;
}

int main()
{
  clock_t start , end;
  double time_used = 0;
  int i,j;
  double s;
  double c;

  start = clock();
  for(j = 0 ; j < 100 ; j++)
  for(i = 0 ; i < 100000 ; i++)
  {
    double angle = ((double)(rand()%720)) / 2.0;
    s = sin(angle);
    c = cos(angle);
    nothing(sin(angle) , cos(angle));
  }
  end = clock();
  time_used = ((double)(end - start)) / CLOCKS_PER_SEC;
  printf("time before = %f\n", time_used);


  start = clock();
  for(j = 0 ; j < 100 ; j++)
  for(i = 0 ; i < 100000 ; i++)
  {
    double angle = ((double)(rand()%720)) / 2.0;
    sincos(angle, &s, &c);
    nothing(s , c);
  }
  end = clock();
  time_used = ((double)(end - start)) / CLOCKS_PER_SEC;
  printf("time after = %f\n", time_used);
}

even with -O3 you will notice that that change is faster, I know compiler optimization is very smart and very powerful, and we are not doing black magic to trick it,

bu5hm4n added a subscriber: bu5hm4n.Jun 9 2020, 3:49 AM

Don't look at the times, take a look at objdump -S a.out the outputted asm is equivalent. Also take a look where the calls to clock are located. The time advantage you see is *not* coming from the different call. infact, the sin and cos calls are stripped out completly. As a mixture of inlining, dead code removal, and const folding removed it.

ali.alzyod added a comment.EditedJun 9 2020, 4:11 AM

Don't look at the times, take a look at objdump -S a.out the outputted asm is equivalent. Also take a look where the calls to clock are located. The time advantage you see is *not* coming from the different call. infact, the sin and cos calls are stripped out completly. As a mixture of inlining, dead code removal, and const folding removed it.

Looking at asm (I use) https://godbolt.org/
the asm code is not the same for both cases:
lets try more simple version:

#include<math.h>
#include<time.h>
#include<stdio.h>
#include<stdlib.h>

int main()
{

  double s;
  double c;
  double angle = ((double)(rand()%720)) / 2.0;
  s = sin(angle);
  c = cos(angle);
  //sincos(angle, &s, &c);  
}

I do not see them both as equivalent , also please not sincos() is always faster or equal in speed to sin() cos() (depending on architecture )

The code *is* the same, Add -O3 and you will just see that the code lines are not mapped to anything in asm. Even on the wegpage. The next example also gets eliminated the cos and sin calls.

The code *is* the same, Add -O3 and you will just see that the code lines are not mapped to anything in asm. Even on the wegpage. The next example also gets eliminated the cos and sin calls.

I think GCC do that, but for example CLANG does not do it.
but since we are interested in GCC only I will drop this one

vtorri added a subscriber: vtorri.Jun 9 2020, 6:04 AM

note that there is a company which is actually trying to help the windows port, using the microsoft compiler, so please don't consider that only gcc will be used

exclude sin/cos changes

ali.alzyod retitled this revision from efl: reduce sin/cos calculations to efl_ui_textpath: mathmatical calculations.Jun 16 2020, 12:24 AM
ali.alzyod edited the summary of this revision. (Show Details)
ali.alzyod added reviewers: raster, bu5hm4n.
vtorri added inline comments.Jun 16 2020, 12:53 AM
src/lib/elementary/efl_ui_textpath.c
292

why using len_2 (used only in the test) ?

i would have done

if (line_len2 > len_2*len_2)

vtorri added inline comments.Jun 16 2020, 12:58 AM
src/lib/elementary/efl_ui_textpath.c
292

len*len, not len_2*len_2

ali.alzyod updated this revision to Diff 30653.Jun 16 2020, 1:08 AM

remove len_2

ali.alzyod marked 2 inline comments as done.Jun 16 2020, 1:08 AM
ali.alzyod added a reviewer: vtorri.