
eina: remove usless newline
AbandonedPublic

Authored by cedric on Jan 15 2018, 10:59 AM.

Details

Summary

ecore_evas: remove debug

eina: unregister log level when done with

Fixes a constant memory leak.
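Presumably the leak comes from a log domain that is registered at init time but never unregistered at shutdown. A minimal sketch of the register/unregister pairing, with an illustrative domain name rather than the one actually touched by the patch:

  #include <Eina.h>

  static int _my_log_dom = -1;   /* illustrative domain, not the one in the patch */

  int
  my_module_init(void)
  {
     _my_log_dom = eina_log_domain_register("my_module", EINA_COLOR_BLUE);
     return (_my_log_dom >= 0);
  }

  void
  my_module_shutdown(void)
  {
     /* without this call, the registered domain leaks on every init/shutdown cycle */
     eina_log_domain_unregister(_my_log_dom);
     _my_log_dom = -1;
  }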

eina: introduce EINA_HOT and EINA_COLD

These attributes respectively expand to __attribute__((hot)) and
__attribute__((cold)) when available. They allow marking functions as
hot or cold (frequently used or not), as well as qualifying labels
within a function (likely/unlikely branches).
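A minimal sketch of how such macros are typically defined and used; the exact compiler check in the actual patch may differ and likely goes through Eina's own feature detection:

  /* sketch only: gate on GCC >= 4.3, where the hot/cold attributes appeared */
  #if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
  # define EINA_HOT  __attribute__((hot))
  # define EINA_COLD __attribute__((cold))
  #else
  # define EINA_HOT
  # define EINA_COLD
  #endif

  /* a frequently-called function can be marked hot... */
  EINA_HOT  int  fast_lookup(unsigned int key);

  /* ...while a rarely-taken error path is marked cold, so the compiler
     optimizes it for size and keeps it off the hot instruction path */
  EINA_COLD void report_failure(const char *msg);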

eo: speed-up generated calls by removing call cache

The call cache needed to be thread-local to avoid concurrency issues.
The problem with TLS is that it adds extra overhead, which appears to be
greater than the optimization the cache provides.

Op is naturally atomic, because it is an unsigned integer. As such, it
cannot be tampered with while another thread is reading it. When
entering the generated function, the first operation done is reading
'op'. Under concurrency, access sequences will return either EFL_NOOP
or a valid op, because 'op' is not set until the very end of the
function, when everything has been computed. As such, we now use the
'op' atomic integer to implement a lock-free/wait-free mechanism, which
lets us drop the TLS nature of the cache, speeding up access to the
cache and therefore making functions execute faster.

We no longer test the generation count. This can be stated as a
limitation: it means that if you call efl_object_shutdown() and
re-initialize it later with different data, opcodes will be invalid.
I am not sure there is any use case for this to ever happen.
We could move all the caches into a dedicated section that can be
overwritten after a call to efl_object_shutdown(), but I am not sure it
would be very portable.
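A minimal sketch of the access pattern described above; _resolve_op_for() and the surrounding function are illustrative placeholders, not the exact generated code:

  static Efl_Object_Op _resolve_op_for(Eo *obj);   /* placeholder for the real lookup */

  static Efl_Object_Op ___op = EFL_NOOP;   /* shared, no longer thread-local */

  static void
  _my_generated_call(Eo *obj)
  {
     Efl_Object_Op op = ___op;             /* single read of the shared word */

     if (op == EFL_NOOP)
       {
          op = _resolve_op_for(obj);       /* full lookup on the slow path */
          ___op = op;                      /* published only once valid: a
                                              concurrent reader sees either
                                              EFL_NOOP (and resolves again)
                                              or this valid opcode */
       }
     /* ... perform the call using 'op' ... */
  }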

Benchmark: mean over 3 executions of

ELM_TEST_AUTOBOUNCE=100 time elementary_test -to genlist
                     BEFORE               AFTER
------------------------------------------------------------
time (ns)            11114111647.0        9147676220.0
frames               2872.3333333333335   2904.6666666666665
time per frame (ns)  3869364.6666666665   3149535.3333333335
user time (s)        11.096666666666666   9.22
cpu (%)              22.666666666666668   18.333333333333332

Ref T6580

Diff Detail

Repository
rEFL core/efl
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
jayji created this revision. Jan 15 2018, 10:59 AM
jayji requested review of this revision. Jan 15 2018, 11:04 AM
jayji added reviewers: raster, cedric.

This is a first commit round addressing T6580. Please let me know what you think about it.
To measure the performance impact, I've used the following script: python3 ./run_bench.py. It spawns elementary_test with autobounce, and collects the results. It produces a final data.csv that aggregates the results. A mean is displayed on stdout at the end of the runs.

jayji edited the summary of this revision. Jan 15 2018, 11:05 AM

The call cache needed to be thread-local to avoid concurrency issues.
The problem with TLS is that it adds extra overhead, which appears to be
greater than the optimization the cache provides.

Op is naturally atomic, because it is an unsigned integer. As such, it...

Hmmm, reading this patch you don't replace the cache with anything except a static Efl_Object_Op ___op... It's still doing a full lookup in this patch every time. So this basically just removes the cache entirely. I think op was like this prior to the cache. At least my benchmarking showed the cache to be a win at the time, a small one, but not a loss. Are you saying the cache hurts instead of helps?

I've done some numbers. I did 100 bounces and ran it 3 times. I did 3 runs to get an idea of "volatility".

Intel(R) Atom(TM) x7-Z8750 CPU @ 1.60GHz -> 2.3% better
BEFORE PATCH (AVG=6544815):
  NS since 2 = 18821959642 , 2911 frames = 6465805 / frame
  NS since 2 = 19329443400 , 2936 frames = 6583597 / frame
  NS since 2 = 19024192304 , 2889 frames = 6585044 / frame
AFTER PATCH (AVG=6398077):
  NS since 2 = 18382234636 , 2967 frames = 6195562 / frame
  NS since 2 = 18137977766 , 2955 frames = 6138063 / frame
  NS since 2 = 20369146343 , 2969 frames = 6860608 / frame
i5-4200U CPU @ 1.60GHz -> 1.6% worse
BEFORE PATCH (AVG=2572242):
  NS since 2 = 8048576869 , 2970 frames = 2709958 / frame
  NS since 2 = 7837472726 , 2970 frames = 2638879 / frame
  NS since 2 = 7032637871 , 2970 frames = 2367891 / frame
AFTER PATCH (AVG=2614774):
  NS since 2 = 7763040253 , 2970 frames = 2613818 / frame
  NS since 2 = 7762357696 , 2970 frames = 2613588 / frame
  NS since 2 = 7772249419 , 2970 frames = 2616918 / frame
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz -> 14.8% better
BEFORE PATCH (AVG=1397054):
  NS since 2 = 4426775116 , 2969 frames = 1490998 / frame
  NS since 2 = 3840920936 , 2871 frames = 1337833 / frame
  NS since 2 = 3958940154 , 2906 frames = 1362333 / frame
AFTER PATCH (AVG=1190854):
  NS since 2 = 3481158563 , 2871 frames = 1212524 / frame
  NS since 2 = 3281121154 , 2871 frames = 1142849 / frame
  NS since 2 = 3494554288 , 2871 frames = 1217190 / frame
Raspberry Pi3 (Wayland mode with E as compositor and GL Accel) -> 2.8% better
BEFORE PATCH (AVG=17800357):
  NS since 2 = 29441554545 , 1628 frames = 18084492 / frame
  NS since 2 = 29696646152 , 1672 frames = 17761152 / frame
  NS since 2 = 29387788564 , 1674 frames = 17555429 / frame
AFTER PATCH (AVG=17318537):
  NS since 2 = 27976573082 , 1641 frames = 17048490 / frame
  NS since 2 = 28797106727 , 1579 frames = 18237559 / frame
  NS since 2 = 27704812210 , 1662 frames = 16669562 / frame

So I guess whether this is better or not depends on the CPU type/generation and so on, but I guess it doesn't hurt.

This revision was not accepted when it landed; it landed in state Needs Review. Jan 16 2018, 12:51 AM
Closed by commit rEFL34d9f2070696: eina: remove usless newline (authored by jayji, committed by raster).
This revision was automatically updated to reflect the committed changes.
jpeg added a comment. Jan 16 2018, 2:57 AM

Note that this broke eo_suite.

raster reopened this revision. Jan 16 2018, 4:04 AM

wha .. wait. i committed this? damn! i didn't mean to! sorry!

jpeg added a comment. Jan 16 2018, 8:16 AM

Well, it's in. To be honest I am quite dubious that the cache was actually helpful... It's too specific.
But I wonder if merging this patch isn't breaking ABI??

how does it break abi? ... it's all internals. the op id is assigned at runtime and never really exposed?

jpeg added a comment. Jan 17 2018, 1:22 AM

The change in Eo.h is not internal by definition.
It will affect all apps already compiled, assuming they had their own .eo files. Unlikely, but still worth considering.

oh you mean people using eo api outside of efl... yes. it'd be a problem. but this will be a regular issue for them with eo in general. :)

jayji added a comment. Jan 23 2018, 9:28 AM

This is interesting. On my machine, results are always better, but I see you have a wider range of hardware to test on :D
Also, sorry for taking so long to reply, but since I do not work professionally on the EFL, the time I can allocate to it sometimes changes without warning :/

Since the patch landed, I'll submit what I've done on top of it, but I didn't take the time to make progress on the call resolution.

I think results will vary based on the size of caches, and on their speed relative to the cpu clock and memory. variation in results per run is kind of tough to deal with. i can't find a way to get stable results. I've locked the cpu to max cpuclock and performance mode etc. - still varies. i guess i could run fewer processes (render to framebuffer, and almost nothing on the system except the elementary_test running and the shell it runs from)... but that's hard to do in practice

don't worry about "being slow". that's just fine. do it when you find time! :)

cedric added a comment. Mar 6 2018, 5:54 PM

Is there any follow-up on this patch? We should maybe close this one, since it landed, and reopen for further improvements.

jayji added a comment. Mar 7 2018, 9:29 AM

Hi, not on this differential. I think you can safely close this one.

cedric commandeered this revision. Mar 7 2018, 9:36 AM
cedric edited reviewers, added: jayji; removed: cedric.

Good, then.

cedric abandoned this revision. Mar 7 2018, 9:36 AM