efl + weston and efl + framerate is really bad
Open, High (Public)

Description

have you guys noticed how bad framerate is? i see it on everything from intel to rpi to xu3/4 to nouveau. we basically manage 30fps... sometimes less. on weston we're like 10fps.

something is seriously wrong somewhere. it's not a lack of cpu or other resources as i see it on high powered gpu's too. it's some event/timing/animator thing... we should be pulling 60fps. my favorite test is:

ELM_ACCEL=gl elementary_test -to animation

that should do 60fps. everywhere. at least in e with partial rendering which i know we do it should be... we can move a window around at 60hz without breaking a sweat on the xu3 ... or on nouveau or intel, but a client rendering just doesn't get even close. it generally is about half that and sometimes pulls up to 60fps for short periods, then back down.

raster created this task. Aug 5 2017, 9:59 PM

This should make ticking much better... There are some missed opportunities for optimization (window show should be able to re-apply the previous buffer for sw render instead of doing a full frame rerender..)

It may also light up bugs in previously dark corners of the animator code...

I'm disappearing for vacation for a week, so won't be able to follow up on the big annoyance this exposed in E - I'm doing frame callbacks wrong for commits with 0 damage there, so we can end up burning more cpu than required while animating (under weston we don't chew cpu)

That will be the first thing I fix when I return.

raster reopened this task as Open. Aug 18 2017, 9:55 PM

i see absolutely no improvement - at least on raspberry pi ... :(

Our rpi problems are dominated by CMA allocations triggered by not using VBOs properly.

The "event/timing/animator thing" should be resolved, and sensible architectures should have steadier frame rates now. On an intel box i see our frame callback complete, our surface commit a ms or 2 later, and everything's beautifully smooth.

Odroid's still a mess, but I can't even get 30fps moving the mouse cursor under weston on my odroid right now unless I set repaint-window=15

With repaint window set to 15 I see us taking 10ms from frame callback to commit, but the next frame callback misses the refresh - and the mouse cursor skips. So it's a ridiculous amount of time for the client to get to swapbuffers, but it still makes the cut-off by a few ms - but I guess the frame's not complete when the compositor tries to source it as texture and we stall.
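The timing described above can be put into rough numbers. This is a minimal sketch, assuming a 60 Hz output and, as a simplification, a client that always presents on the first refresh after its frame is ready. The function names are hypothetical and nothing here is Weston's or EFL's actual scheduling code. It shows why 10ms of client work leaves only a few ms of slack, and why a client that misses every other refresh lands at 30fps.

```c
#include <assert.h>

/* Length of one refresh period in milliseconds. */
static double refresh_period_ms(double hz)
{
    assert(hz > 0.0);
    return 1000.0 / hz;
}

/* Time left in the refresh period after work_ms of client work.
 * A negative result means the frame misses that refresh. */
static double slack_ms(double hz, double work_ms)
{
    return refresh_period_ms(hz) - work_ms;
}

/* Whole refresh periods consumed by one frame (at least 1). */
static int refreshes_per_frame(double hz, double work_ms)
{
    double period = refresh_period_ms(hz);
    int n = (int)(work_ms / period);

    if ((double)n * period < work_ms)
        n++; /* round up: a late frame waits for the next refresh */
    return n < 1 ? 1 : n;
}

/* Frame rate seen on screen when every frame takes work_ms. */
static double effective_fps(double hz, double work_ms)
{
    return hz / (double)refreshes_per_frame(hz, work_ms);
}
```

With these assumptions, slack_ms(60.0, 10.0) is about 6.67, and effective_fps(60.0, 20.0) is 30: anything slower than one refresh period drops straight to half rate.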

btw, we only sort-of get partial updates on odroid, since their drivers (at least the ones I'm running...) don't use wl_surface.damage_buffer. Every commit posts full surface damage to the compositor, even if the client only re-rendered part of the frame.

tl;dr. shouldn't be a timing issue anymore, it now looks like it's our gl code, not our wayland code, that's causing performance issues on low performance systems.

i've been meaning to respond but i can only sensibly test at home. FYI things have gone downhill on rpi badly. and we're still a bug fest of pain. especially on rpi.

(this whole post is specifically about rpi3)
btw, you probably need to set
[core]
repaint-window=15

in your weston.ini file to give apps a little longer to do their thing... I find this lets me drag weston-terminal windows around on weston without chop on odroid.

I'm seeing some dumb stuff going on in EFL (GL_INVALID_VALUE in glDisableVertexAttribArray) - the code looks obviously weird in efl, it seems to assume all vert programs have the same number of variables. But the weird thing is that i'm seeing that debug spew in differing quantities per frame, as if some frames are much more complicated than others when it seems they shouldn't be (elementary_test -to animation)
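One way GL_INVALID_VALUE can come out of glDisableVertexAttribArray is an index that is not below GL_MAX_VERTEX_ATTRIBS, for example a -1 from glGetAttribLocation (attribute not active in that program) passed through as an unsigned index. Whether that is what happens in EFL is not confirmed here. The following is a minimal sketch of tracking enabled arrays per program instead of assuming every program has the same attributes. MAX_ATTRIBS, attrib_mask and attrib_delta_between are hypothetical names, and the actual GL calls are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the driver's GL_MAX_VERTEX_ATTRIBS value. */
#define MAX_ATTRIBS 16

/* Which attribute arrays must change when switching programs. */
typedef struct {
    uint32_t to_enable;  /* bit i set: enable array i  */
    uint32_t to_disable; /* bit i set: disable array i */
} attrib_delta;

/* Build a bitmask from attribute locations, skipping invalid ones
 * (-1 means "not an active attribute in this program"). */
static uint32_t attrib_mask(const int *locations, size_t count)
{
    uint32_t mask = 0;

    assert(locations != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (locations[i] >= 0 && locations[i] < MAX_ATTRIBS)
            mask |= (uint32_t)1 << locations[i];
    }
    return mask;
}

/* Compare the currently enabled arrays with what the next program wants. */
static attrib_delta attrib_delta_between(uint32_t current, uint32_t wanted)
{
    attrib_delta d;

    d.to_enable = wanted & ~current;
    d.to_disable = current & ~wanted;
    return d;
}
```

A caller would walk the set bits of to_disable and to_enable and issue one glDisableVertexAttribArray or glEnableVertexAttribArray per bit, so an invalid index never reaches the driver.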

When monitoring the weston side of the connection, I'm seeing pretty horrible things happen - EFL client sends a commit, and then 100+ms later the compositor sends the next frame event. I'm wondering if there's a weston regression here. I'm also seeing elementary_test stuck on screen after close, so there's definitely something wrong in latest weston from git.

I'm seeing better performance with E than weston right now, which surprises me. Under E I'm seeing a solid 30fps.
Also, under weston if I use sw render and shrink the window I can easily hit 60fps. Our timing seems correct there, as it does with GL, where I can view the compositor side logs and see that we submit a new commit quickly after getting a frame callback.

Oddly, I'm seeing fewer slow bo allocations than previously, and more consistent framerates. This is not expected from any efl change I'm aware of, and may be a mesa optimization? I still get the occasional ridiculously long stall though.

Not sure why things fall apart so dramatically under weston on rpi3, will look at some apitraces next week, maybe we're doing something differently (different visual type?) when run under different compositors.

There are definitely some pretty hilarious bugs when running with E_USE_HARDWARE_PLANES=1 but that's all off by default for good reason.

raster added a comment. Sep 3 2017, 8:06 PM

oh wait. you mean glDrawArrays() with client side memory vs using vbo's. we have vbo code. you can turn it on:

export EVAS_GL_MAPBUFFER=1

it never has shown itself to be faster... i even once spent some time making a vbo buffer pool to allow the gpu to be using a buffer while we fill another in parallel (like double/triple buffering but with a generic pool which tries to keep oldest buffers re-used first to avoid blocking). i had a branch with that locally. here are the patches:

P202
P203

i never had a use for them as they just added complexity and i saw no performance benefits on the drivers i tested... but i didn't try the rpi... i think this is before i was really testing on my rpi.

you can try the export environment var above as a quick and dirty first step (i can too but only at home, and i'm at the office all week...). in theory with the above 2 patches (fixed for conflicts and whatever) it should have the best non-blocking behavior possible (i found no way to figure out if a buffer is still in use by the driver or not - i would have used this to try re-use a buffer, but if it's queued or in use then create a new one, and down-size the pool of unused spare buffers once it gets too big or every now and again).
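The "oldest buffer re-used first" idea can be sketched as a fixed ring of buffer ids. This is not the code in P202/P203, just a minimal illustration under stated assumptions: POOL_SIZE, vbo_pool and its functions are hypothetical names, the ids stand in for GL buffer object names, and there is no in-use check (the comment above says none was found).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool depth, in the spirit of triple buffering. */
#define POOL_SIZE 3

typedef struct {
    unsigned int ids[POOL_SIZE]; /* stand-ins for GL buffer object names */
    size_t next;                 /* slot handed out least recently */
} vbo_pool;

/* Fill the pool with already-created buffer ids. */
static void vbo_pool_init(vbo_pool *p, const unsigned int *ids)
{
    assert(p != NULL && ids != NULL);
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->ids[i] = ids[i];
    p->next = 0;
}

/* Hand out the buffer that was used longest ago, on the theory that
 * the gpu is the most likely to be finished with that one. */
static unsigned int vbo_pool_acquire(vbo_pool *p)
{
    unsigned int id;

    assert(p != NULL);
    id = p->ids[p->next];
    p->next = (p->next + 1) % POOL_SIZE;
    return id;
}
```

Each frame's vertex upload would acquire a buffer from the pool before filling it, so consecutive uploads never touch the same buffer back to back. Whether that avoids stalls depends on the driver actually processing the buffers asynchronously.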

raster added a comment. Sep 4 2017, 6:11 AM

ok.

export EVAS_GL_MAPBUFFER=1

consistently between 50 to 60fps. i think the above vbo pool might do even better *IF* the vc4 drivers actually do async process the vbo's. but e is still only using 40-50% cpu (and elm_test 20-25%). so we're not cpu bound...

This seems to be unrelated to Enlightenment.

bu5hm4n triaged this task as High priority. Jun 10 2018, 12:07 PM
bu5hm4n moved this task from Backlog to rendering on the efl board. Jun 10 2018, 12:32 PM
zmike edited projects, added Restricted Project; removed efl. Jun 11 2018, 6:51 AM
zmike edited projects, added efl: display system; removed Restricted Project. Jun 11 2018, 9:13 AM