Hi,
I have been wondering lately what good the whole Eo work has brought to the EFL. I
wanted to see whether there is a noticeable overhead between before and after the
introduction of Eo.
I made a simple benchmark with a graphical program, linked first against EFL
master (0f7c5582a4 for the record) and then against EFL 1.7 (v1.7.10 to be exact).
This test program is strongly inspired by the genlist test with autobounce.
Its code (messy) is here: https://pastebin.com/gKCFkQjY. A "full" genlist
with 500 items, occupying the whole window, is made to scroll from top to
bottom, and then from bottom to top, every two seconds, for a total of 10
scrolls. The code was not modified between the tests; it was only recompiled and
linked against the appropriate EFL version. The window was maximized, on a
screen with a resolution of 1920x1080. The following metrics are considered:
- memory usage with valgrind+massif,
- Linux perf, to see where each program spends most of its time,
- system resource usage, as retrieved by "time",
- output of the autobounce benchmark.
The data are consistent from one execution of the program to another, given the
same execution conditions. The tests were performed with both the OpenGL-X11
and Software-X11 renderers. Both EFL versions were built from source with -O2.
Before going through the metrics, I have to say that the (my) perceived
responsiveness of the program is better with EFL 1.7. I get the same impression
after manually playing around with elementary_test.
1) Memory usage
- EFL 1.7 (software-x11): peak at 6.3 MiB
- EFL 1.7 (opengl-x11): peak at 33.7 MiB, in nominal, around 32 MiB
- EFL master (software-x11): peak at 10.2 MiB, in nominal around 8 MiB
- EFL master (opengl-x11): peak at 36.1 MiB, in nominal always around 34 MiB
I observe about the same difference between EFL 1.7 and EFL master for
terminology (v0.1.0). So there is definitely a slight increase in memory
usage (about 2.4 to 3.9 MiB at peak here), but it may not be that big of a
deal. Considering all the data allocated by Eo, this seems reasonable.
2) Linux Perf
Or: where are we spending our time?
Software-X11
EFL 1.7:
12,91%  a.out  libevas.so.1.7.10  [.] _op_blend_p_dp_sse3
12,25%  a.out  libevas.so.1.7.10  [.] scale_rgba_in_to_out_clip_sample_internal
 3,82%  a.out  libevas.so.1.7.10  [.] _op_copy_p_dp_mmx
 3,01%  a.out  libevas.so.1.7.10  [.] _op_blend_p_dp_mmx
 1,75%  a.out  [kernel.kallsyms]  [k] clear_page_erms
 1,44%  a.out  [kernel.kallsyms]  [k] shmem_getpage_gfp
 1,38%  a.out  libeina.so.1.7.10  [.] _eina_chained_mempool_alloc_in
 1,36%  a.out  libevas.so.1.7.10  [.] evas_event_thaw
 1,28%  a.out  libeina.so.1.7.10  [.] eina_chained_mempool_free
 1,21%  a.out  libc-2.26.so       [.] _int_malloc
[...]
EFL master:
6,06%  a.out            libeo.so.1.20.99    [.] _efl_object_call_resolve
6,03%  a.out            libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get.part.10
5,36%  a.out            libeo.so.1.20.99    [.] _eo_obj_pointer_get
3,36%  a.out            libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get_single.constprop.37
3,23%  a.out            ld-2.26.so          [.] _dl_update_slotinfo
2,25%  a.out            libc-2.26.so        [.] _int_malloc
1,63%  Eevas-thread-wk  libevas.so.1.20.99  [.] _op_blend_p_dp_mmx
1,62%  a.out            libc-2.26.so        [.] _int_free
1,52%  Eevas-thread-wk  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
1,37%  a.out            libeo.so.1.20.99    [.] efl_isa
1,37%  a.out            libeo.so.1.20.99    [.] _efl_object_call_end
1,32%  a.out            libevas.so.1.20.99  [.] evas_object_recalc_clippees
1,32%  a.out            libpthread-2.26.so  [.] __pthread_getspecific
1,22%  a.out            libedje.so.1.20.99  [.] _edje_part_recalc
OpenGL-X11
EFL 1.7:
2,58%  a.out  libc-2.26.so        [.] _int_malloc
2,42%  a.out  libevas.so.1.7.10   [.] evas_event_thaw
2,30%  a.out  libevas.so.1.7.10   [.] evas_object_event_callback_call
2,20%  a.out  libeina.so.1.7.10   [.] _eina_chained_mempool_alloc_in
2,19%  a.out  libpthread-2.26.so  [.] __pthread_mutex_lock
2,03%  a.out  libc-2.26.so        [.] __GI___strcmp_ssse3
2,01%  a.out  libevas.so.1.7.10   [.] _evas_event_object_list_raw_in_get.part.4
1,97%  a.out  libeina.so.1.7.10   [.] eina_chained_mempool_free
1,93%  a.out  libedje.so.1.7.10   [.] _edje_part_recalc_single
1,91%  a.out  libedje.so.1.7.10   [.] _edje_part_recalc
1,77%  a.out  libc-2.26.so        [.] __strlen_avx2
1,51%  a.out  module.so           [.] pipe_region_intersects
1,49%  a.out  libc-2.26.so        [.] __GI___printf_fp_l
1,44%  a.out  libc-2.26.so        [.] _int_free
1,34%  a.out  libpthread-2.26.so  [.] __pthread_mutex_unlock
1,20%  a.out  libeina.so.1.7.10   [.] eina_strbuf_common_append
EFL master:
6,58%  a.out  libeo.so.1.20.99    [.] _efl_object_call_resolve
5,71%  a.out  libeo.so.1.20.99    [.] _eo_obj_pointer_get
4,87%  a.out  libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get.part.10
3,41%  a.out  ld-2.26.so          [.] _dl_update_slotinfo
2,89%  a.out  libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get_single.constprop.37
2,36%  a.out  libc-2.26.so        [.] _int_malloc
1,68%  a.out  libc-2.26.so        [.] _int_free
1,61%  a.out  libpthread-2.26.so  [.] __pthread_getspecific
1,60%  a.out  module.so           [.] _evas_gl_common_context_push
1,55%  a.out  libeo.so.1.20.99    [.] efl_isa
1,39%  a.out  libeo.so.1.20.99    [.] _efl_object_call_end
1,33%  a.out  libedje.so.1.20.99  [.] _edje_part_recalc
[...]
It seems that in EFL master the hottest symbols are Eo internals (call
resolution and object pointer lookup). I was hoping that at least the software
renderer would spend more time drawing than resolving function calls.
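
To put a number on the "Eo stuff", the listed overheads can be summed per shared object. The following is only a rough sketch in Python: the helper name `overhead_by_object` is made up for this illustration, and it assumes `perf report` lines in the same layout as the traces pasted here (percentage with a comma or dot as decimal separator, command, shared object, `[.]` or `[k]`, symbol). Since the traces are truncated, the sums only cover the listed symbols, not the whole profile.

```python
import re
from collections import defaultdict

# One pasted `perf report` line:
#   <overhead>% <command> <shared object> [<.|k>] <symbol>
# The decimal separator may be "," (as in the traces here) or ".".
LINE_RE = re.compile(
    r"^\s*(\d+)[.,](\d+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$"
)


def overhead_by_object(report_text):
    """Sum the overhead percentages of the listed symbols per shared object."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip "[...]" markers, blank lines, anything else
        whole, frac, _command, shared_object, _kind, _symbol = match.groups()
        totals[shared_object] += float(f"{whole}.{frac}")
    return dict(totals)


# First entries of the EFL master / Software-X11 trace, for illustration only.
SAMPLE = """\
6,06% a.out libeo.so.1.20.99 [.] _efl_object_call_resolve
6,03% a.out libevas.so.1.20.99 [.] _evas_event_object_list_raw_in_get.part.10
5,36% a.out libeo.so.1.20.99 [.] _eo_obj_pointer_get
3,36% a.out libevas.so.1.20.99 [.] _evas_event_object_list_raw_in_get_single.constprop.37
3,23% a.out ld-2.26.so [.] _dl_update_slotinfo
[...]
"""

if __name__ == "__main__":
    totals = overhead_by_object(SAMPLE)
    for shared_object, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{total:6.2f}%  {shared_object}")
```

Feeding the complete traces to the same function, once per EFL version and renderer, would give a per-library view that is easier to compare than the raw symbol lists.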
3,4) Autobounce output + system resources
This is definitely the most interesting test, and I'm not sure I like what I
understand from the data.
+----------+--------+-----------------+--------+---------------------+---------------+----------------+
| Renderer | EFL    | Time Spent (ns) | Frames | Time (ns) per frame | CPU Usage (%) | Total Time (s) |
+----------+--------+-----------------+--------+---------------------+---------------+----------------+
| Software | 1.7    |   4_172_005_256 |    279 |          14_953_423 |            24 |           4.53 |
| Software | master |   4_557_504_828 |     93 |          49_005_428 |            34 |           7.42 |
| OpenGL   | 1.7    |   2_723_094_193 |    279 |           9_760_194 |            16 |           3.55 |
| OpenGL   | master |   4_568_038_976 |    126 |          36_254_277 |            34 |           7.68 |
+----------+--------+-----------------+--------+---------------------+---------------+----------------+
I think the total time taken to perform the same operation with the two
versions of EFL confirms that we definitely lost performance at some point.
This difference is clearly seen when running under valgrind. EFL master is so
slow it cannot perform its scrolling animation, while EFL 1.7 is fluid, even
with the software renderer.
I am intrigued by the significant difference in the number of frames. EFL
master displays fewer than half as many frames as EFL 1.7 (93 and 126 versus
279), takes longer doing so, and consumes more CPU time.
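
For reference, the per-frame column is simply the time spent divided by the number of frames, and the same two columns also give an average frame rate and the ratio to the EFL 1.7 run. A small sketch in Python that only reuses the numbers from the autobounce table; the function names are invented for this example and are not part of the benchmark program:

```python
# Rows copied from the autobounce table:
# (renderer, EFL version, time spent in ns, frames).
RUNS = [
    ("Software", "1.7", 4_172_005_256, 279),
    ("Software", "master", 4_557_504_828, 93),
    ("OpenGL", "1.7", 2_723_094_193, 279),
    ("OpenGL", "master", 4_568_038_976, 126),
]


def ns_per_frame(time_ns, frames):
    """Average frame time, truncated to whole nanoseconds like the table column."""
    return time_ns // frames


def frames_per_second(time_ns, frames):
    """Average frame rate over the measured scrolling time."""
    return frames * 1e9 / time_ns


def compare_to_baseline(runs, baseline_version="1.7"):
    """Relate every run to the baseline run that used the same renderer."""
    baseline = {r: (t, f) for r, v, t, f in runs if v == baseline_version}
    rows = []
    for renderer, version, time_ns, frames in runs:
        base_time, base_frames = baseline[renderer]
        rows.append({
            "renderer": renderer,
            "version": version,
            "fps": frames_per_second(time_ns, frames),
            # Share of the baseline frame count that was actually displayed.
            "frame_ratio": frames / base_frames,
            # How many times longer one frame takes than on the baseline.
            "frame_time_ratio": ns_per_frame(time_ns, frames)
            / ns_per_frame(base_time, base_frames),
        })
    return rows


if __name__ == "__main__":
    for row in compare_to_baseline(RUNS):
        print(f"{row['renderer']:<8} {row['version']:<6} "
              f"{row['fps']:5.1f} fps  "
              f"frames x{row['frame_ratio']:.2f}  "
              f"frame time x{row['frame_time_ratio']:.2f}")
```

These are averages over the whole run, so they say nothing about how evenly the frames were spread over time.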
I know benchmarks should be considered with care, and I'm aware there might
be flaws in my testing, but the results are consistent with my impression that
EFL 1.7 had a much better perceived responsiveness.
I tried to investigate the problem by removing the pointer indirection that Eo
does. I got a noticeable performance improvement (though not in perceived
responsiveness), but there is still a big difference with EFL 1.7:
+----------+--------+-----------------+--------+---------------------+---------------+----------------+
| Renderer | EFL    | Time Spent (ns) | Frames | Time (ns) per frame | CPU Usage (%) | Total Time (s) |
+----------+--------+-----------------+--------+---------------------+---------------+----------------+
| OpenGL   | master |   4_513_911_510 |    134 |          33_685_906 |            33 |           7.45 |
| Software | master |   4_453_696_881 |    100 |          44_536_968 |            32 |           7.09 |
+----------+--------+-----------------+--------+---------------------+---------------+----------------+
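
To quantify that improvement, the per-frame times of EFL master with and without the indirection can be compared with each other and with EFL 1.7. This is just arithmetic on the two tables, sketched in Python; the names `relative_change` and `slowdown` are illustrative only:

```python
# Time per frame in ns, copied from the two autobounce tables.
EFL_1_7 = {"OpenGL": 9_760_194, "Software": 14_953_423}
MASTER = {"OpenGL": 36_254_277, "Software": 49_005_428}
MASTER_NO_INDIRECTION = {"OpenGL": 33_685_906, "Software": 44_536_968}


def relative_change(before, after):
    """Signed relative change; a negative value means `after` is faster."""
    return (after - before) / before


def slowdown(reference, measured):
    """How many times longer `measured` is than `reference`."""
    return measured / reference


if __name__ == "__main__":
    for renderer in ("OpenGL", "Software"):
        gain = relative_change(MASTER[renderer], MASTER_NO_INDIRECTION[renderer])
        left = slowdown(EFL_1_7[renderer], MASTER_NO_INDIRECTION[renderer])
        print(f"{renderer:<8} frame time change: {gain:+.1%}, "
              f"still x{left:.2f} slower per frame than EFL 1.7")
```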
The perf traces below show where we are spending our time:
With OpenGL:
6,39%  a.out  libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get.part.10
6,15%  a.out  libeo.so.1.20.99    [.] _efl_object_call_resolve
4,01%  a.out  ld-2.26.so          [.] _dl_update_slotinfo
3,66%  a.out  libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get_single.constprop.37
2,48%  a.out  libc-2.26.so        [.] _int_malloc
1,93%  a.out  libc-2.26.so        [.] _int_free
1,73%  a.out  module.so           [.] _evas_gl_common_context_push
1,54%  a.out  libevas.so.1.20.99  [.] evas_object_recalc_clippees
1,52%  a.out  libeo.so.1.20.99    [.] _efl_object_event_callback_del
1,46%  a.out  libeo.so.1.20.99    [.] _efl_object_call_end
1,42%  a.out  ld-2.26.so          [.] update_get_addr
1,36%  a.out  libedje.so.1.20.99  [.] _edje_part_recalc
1,34%  a.out  libeina.so.1.20.99  [.] eina_chained_mempool_free
1,28%  a.out  libeina.so.1.20.99  [.] _eina_chained_mempool_alloc_in
1,28%  a.out  libeo.so.1.20.99    [.] _efl_object_event_callback_call
1,25%  a.out  libedje.so.1.20.99  [.] _edje_part_recalc_single
1,21%  a.out  libc-2.26.so        [.] cfree@GLIBC_2.2.5
1,21%  a.out  libeina.so.1.20.99  [.] eina_hash_find_by_hash
1,19%  a.out  ld-2.26.so          [.] __tls_get_addr
1,15%  a.out  libeo.so.1.20.99    [.] efl_data_scope_get
1,13%  a.out  libevas.so.1.20.99  [.] _evas_canvas_efl_object_event_thaw
1,06%  a.out  libc-2.26.so        [.] malloc_consolidate
1,02%  a.out  libeo.so.1.20.99    [.] efl_isa
0,95%  a.out  libeina.so.1.20.99  [.] eina_hash_free
0,90%  a.out  libeina.so.1.20.99  [.] eina_cow_write
With Software:
7,26%  a.out            libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get.part.10
5,45%  a.out            libeo.so.1.20.99    [.] _efl_object_call_resolve
3,97%  a.out            libevas.so.1.20.99  [.] _evas_event_object_list_raw_in_get_single.constprop.37
3,63%  a.out            ld-2.26.so          [.] _dl_update_slotinfo
2,45%  a.out            libc-2.26.so        [.] _int_malloc
1,75%  a.out            libc-2.26.so        [.] _int_free
1,73%  Eevas-thread-wk  [kernel]            [.] 0xffffffffaed0f2d7
1,68%  Eevas-thread-wk  libevas.so.1.20.99  [.] _op_blend_p_dp_mmx
1,67%  Eevas-thread-wk  [unknown]           [.] 0xffffffffaed0f2d7
1,43%  a.out            libevas.so.1.20.99  [.] evas_object_recalc_clippees
1,31%  a.out            libedje.so.1.20.99  [.] _edje_part_recalc
1,28%  a.out            libeo.so.1.20.99    [.] _efl_object_event_callback_del
1,25%  a.out            libeo.so.1.20.99    [.] _efl_object_call_end
1,24%  a.out            ld-2.26.so          [.] update_get_addr
1,20%  a.out            libeina.so.1.20.99  [.] eina_chained_mempool_free
1,18%  a.out            libedje.so.1.20.99  [.] _edje_part_recalc_single
1,17%  a.out            ld-2.26.so          [.] __tls_get_addr
1,13%  a.out            libeina.so.1.20.99  [.] _eina_chained_mempool_alloc_in
1,10%  a.out            libevas.so.1.20.99  [.] _evas_canvas_efl_object_event_thaw
1,08%  a.out            libeo.so.1.20.99    [.] _efl_object_event_callback_call
1,06%  a.out            libeina.so.1.20.99  [.] eina_hash_find_by_hash
1,02%  a.out            libeo.so.1.20.99    [.] efl_data_scope_get
0,96%  Eevas-thread-wk  libevas.so.1.20.99  [.] evas_common_scale_rgba_sample_draw
0,95%  a.out            libc-2.26.so        [.] cfree@GLIBC_2.2.5
0,93%  a.out            libeina.so.1.20.99  [.] eina_hash_free
0,85%  a.out            ld-2.26.so          [.] __tls_get_addr_slow
0,81%  a.out            libeo.so.1.20.99    [.] efl_isa
0,80%  a.out            libeina.so.1.20.99  [.] eina_cow_write
0,77%  Eevas-thread-wk  libevas.so.1.20.99  [.] _op_copy_c_dp_mmx
It seems I was running all the tests with my mouse cursor over the genlist,
causing events to be raised (mouse,in/mouse,out) I guess. EFL master seems to
have a hard time with them. Results are slightly better without the mouse over,
but still far from what EFL 1.7 shows.
So my observation is that between EFL 1.7 and EFL master, we have lost a lot of
perceived responsiveness, and we are consuming significantly more CPU time.
But I don't really know _where_. Eo causes an obvious slowdown, but it does not
seem to me to be solely responsible for it.
Thanks for taking the time to read this. Please tell me if my way of collecting
the data or my reading of it is incorrect; but I am convinced that what is shown
to the user is less responsive with master than with 1.7.
I'd love to see EFL master having a perceived responsiveness close to what EFL 1.7 offered.
@raster, @cedric what do you think of all that?