enlightenment-0.21.2 and 0.20.9 odd behavior
Open, Pending on user input, Public

Description

This happens on 0.21.x and 0.20.9
Fedora-24 system
I compiled Enlightenment-0.21.2 with efl 1.18

  • On enlightenment-0.21.2, keyboard is frozen. I have to wait a few seconds (about 12 seconds) for what I type to be displayed.

  • Clock (seconds) is frozen, then suddenly restarts, then stops again
  • These two problems disappear when moving the mouse cursor

maderios created this task. Aug 22 2016, 6:21 AM
maderios renamed this task from enlightenement-0.21.2 and 0.20.9 odd behavior to enlightenment-0.21.2 and 0.20.9 odd behavior.

what gfx card/driver?

davek added a subscriber: davek. Edited Aug 22 2016, 8:29 AM

If it helps, I've noticed something similar on my system. I've just compiled E 0.21.2 and EFL 1.18 tonight. I use a laptop with an Nvidia 670m, on a Debian Stretch system with Nvidia 352.79 drivers.

So far, to trigger the issue, I do the following:

  • Start up E
  • Then start terminology
  • Everything is fine at this point. Terminology cursor is blinking away happily.
  • Type one character.
  • The character appears, and Terminology freezes up. The clock on my Gkrellm app has also frozen.
  • Wait 4-5 seconds. Then everything responds properly.

Some noticed behaviour:

  • Any new terminal windows respond fine afterwards.
  • Selecting Enlightenment -> Restart does not retrigger the issue. E has to be started from scratch.

In T4415#67309, @raster wrote:

what gfx card/driver?

intel HD Graphics 5500
driver: i915

@davek - this happens just once after start? while it's frozen you can't move any windows, pop up any menus etc. etc.? capturing a backtrace during this freeze would be nice. e.g. ssh in from another machine and do:

killall -SEGV enlightenment

exactly at that time...

@maderios ... i'm wondering if this is a vsync event issue for you. can you set the following environment variable prior to running e:

export ECORE_NO_VSYNC=1

e.g. from your ~/.xsession or ~/.xinitrc or /etc/profile, or just launch e manually from a tty and set that env var... however you like...

In T4415#67388, @raster wrote:

capturing a backtrace during this freeze would be nice

OK, backtrace captured. Logged file uploaded:

My libxcb is version 1.11.1.

@davek well.. that tells us a lot.. e is waiting on the xserver. it's seeing a mapping notify event, which means the keyboard key mappings have changed, and it's refreshing its local mapping data from the server. the xserver is just not responding for a while. that's what that says. :( so at this point... i have to put this down to "xserver issue - it should normally respond in like 0.1s or so". mapping changes are not very common - they are rare (xmodmap for example) but do mostly happen on startup as people's modmap files and so on are set up.

That's true. I do have an .Xmodmap file in my home dir, which would be triggering a call to the xmodmap command.

Is it strange that E is only noticing the mapping event once I press a key? Shouldn't it pick up on the event at the time xmodmap is run?

Otherwise, yep, seems it's an xserver issue.

In T4415#67389, @raster wrote:

@maderios ... i'm wondering if this is a vsync event issue for you. can you set the following environment variable prior to running e:

export ECORE_NO_VSYNC=1

e.g. from your ~/.xsession or ~/.xinitrc or /etc/profile, or just launch e manually from a tty and set that env var... however you like...

I put it in my /home/user/.xsession, then launched my x session normally with startx. It doesn't work, the freeze is still there.

I don't know if it's useful, but I produced a backtrace

I have also seen this occasionally on my laptop's intel gpu, but only when running with just the laptop display. Normally I have a second monitor connected via HDMI, and when that is the case it's not triggered.

Also, as there's no clear description of the behavior yet, @raster: it isn't a complete lockup. e continues to run fine in the background; it's just that the display is only updated when you move the mouse.

@davek - notice that _ecore_x_event_handle_mapping_notify() begins the train of "let's update our mapping". that's the func that handles the x event saying there was a kbd remapping - so that's when the event came in... that's when x sent it... :/ we're simply responding to the event from the xserver when it arrives... :(

@maderios well ok. so it's not vsync events. well, that rules that one out. :)

@maderios well that backtrace says e is "waiting for stuff". timeouts, events etc. to come in... that's a normal state.

so the reason i asked about vsync is that ecore explicitly refuses to render frames faster than the animator tick. it times frame renders exactly to an animator tick. animator ticks by default come from the system clock and are timed at a framerate... 60fps for example is a default value there.

but with specific hardware we open /dev/dri/card0 and ask the gpu to give us vblank/vsync events directly timed to the screen refresh. there is a whitelist of known working cards there and everyone else will fail and fall back to the above system clock. that's why i asked you to disable vsync to see if perhaps we were not getting vsync events and that was causing issues because ecore will just refuse to render an update at all until it gets an animator tick.
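The gating described above can be pictured with a toy model. This is only a sketch, not EFL code: every name in it (`toy_pick_tick_source`, `toy_canvas`, `toy_on_tick`, and so on) is hypothetical, and the real whitelist and fallback logic inside ecore is more involved. It shows two things: vsync ticks are chosen only when they are not disabled and the card is known to work, and a pending update is drawn only when a tick actually arrives.

```c
#include <stdbool.h>

/* All names here are hypothetical; this is a toy model, not EFL code. */

typedef enum { TICK_SYSTEM_TIMER, TICK_VSYNC } tick_source;

/* vsync ticks are used only if they are not disabled (think
 * ECORE_NO_VSYNC=1) and the card is on the known-working list;
 * otherwise fall back to the system clock. */
static tick_source toy_pick_tick_source(bool no_vsync_env, bool card_whitelisted)
{
    if (no_vsync_env || !card_whitelisted) return TICK_SYSTEM_TIMER;
    return TICK_VSYNC;
}

typedef struct {
    bool dirty;    /* a change is waiting to be drawn */
    int  rendered; /* frames rendered so far */
} toy_canvas;

static void toy_mark_dirty(toy_canvas *c) { c->dirty = true; }

/* Runs once per animator tick. If ticks stop arriving, this is never
 * called and a dirty canvas stays undrawn. */
static void toy_on_tick(toy_canvas *c)
{
    if (!c->dirty) return;
    c->rendered++;
    c->dirty = false;
}
```

In this model a canvas marked dirty stays undrawn for as long as no tick is delivered, which is the failure mode the ECORE_NO_VSYNC test was meant to rule out.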

so that doesn't seem to be the issue... well now the debugging gets hard. the next port of call is whether evas is doing a swapbuffers at all.

so first. using gl or software for e's compositor? i shall assume gl right now. is efl built for egl/gles or opengl/glx? if the first we care about eglSwapBuffers, otherwise glXSwapBuffers.

there is no magic env var you can set to do this... so your best bet is this. you will have to recompile efl. open src/modules/evas/engines/gl_x11/evas_x_main.c in an editor. the eng_outbuf_flush() function is the one you are interested in. that will call the appropriate swapbuffers command. so see the #ifdef GL_GLES - that section until #else is for egl/gles. the stuff from #else to the last #endif in that func is for glx/opengl. so if you are built for glx/opengl then add a:

static int frame = 0;
printf("SWAP %i\n", frame++);

right above glXSwapBuffers() (or if egl/gles right above glsym_eglSwapBuffersWithDamage() or eglSwapBuffers() - make the printfs different strings so you know which is happening).
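One way to keep the different printfs apart is a small helper that takes a label. This is a sketch under assumptions: `debug_log_swap` is a hypothetical name, not part of EFL, and the labels are arbitrary. The `fflush()` is there because C stdio normally fully buffers stdout when it is redirected to a file, so without it the lines may only reach the log in large batches and `tail -f` would give a misleading picture.

```c
#include <stdio.h>

/* Hypothetical debugging helper, not part of EFL. Call it right above
 * the swap call with a label saying which code path was taken. */
static int debug_log_swap(const char *which)
{
    static int frame = 0;

    printf("SWAP %s %i\n", which, frame);
    /* stdout redirected to a file is fully buffered by default;
     * flush so each line shows up in the log immediately. */
    fflush(stdout);
    return frame++;
}
```

For example, `debug_log_swap("glx");` would go right above glXSwapBuffers(), and a call with a different label such as `"egl"` or `"egl-damage"` above each of the egl/gles swap calls.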

now from your .xsession make enlightenment_start redirect stdout like:

enlightenment_start > ~/swaplog.txt

then see... do you get swaps with frame numbers happening while the screen isn't updating? simple: tail -f ~/swaplog.txt from a terminal in e... move mouse. do you see a whole bunch of swapbuffer prints with frame numbers appear at once after not moving mouse for a while? or does only 1 appear on the first mouse move, then another line on every move after that? basically i'm wondering if we're swapping buffers - which we should be - but somehow x is refusing to display them. i kind of have a hunch you have a problem over on the xserver side and i'm bisecting to see if we are trying to swap at all. if we are - then the issue lies inside your gl driver libraries or the xserver itself and whatever 2d driver it has and any options it may be given. if we are NOT swapping... then the issue lies higher up - next port of call - is evas_render() being called by ecore_evas at all? have to bisect and find where in the pipeline the issue is.

i have never seen this issue before. not on nvidia, nouveau, intel or radeon. on any of my machines. i suspect the issue stems from something that is different on your machine to any of mine... thus why i currently suspect something lower level. but let's see.

I edited and recompiled efl 1.18. Recompiled e and terminology, too.
Started with a fresh e installation, changed conf software -> gl
I got this file

you were on software? wait - does the issue appear only when in gl? there is no log there.... that can't be right... wait is this in x11 or wayland mode?

I've seen it on X11, under GL. As E keeps running and it's only the screen that doesn't get refreshed, a backtrace isn't going to help much unless you want to know the value of certain flags.

In T4415#67561, @raster wrote:

you were on software? wait - does the issue appear only when in gl? there is no log there.... that can't be right... wait is this in x11 or wayland mode?

Yesterday, after editing evas_x_main.c in the efl 1.18 src, compiling, installing and configuring e, I was on opengl and x11. It didn't work, it was frozen.
Maybe important: today, I updated my fedora 24 system, libevdev was updated http://koji.fedoraproject.org/koji/buildinfo?buildID=793461
I started e. It works normally, no freeze :)
Then I changed opengl conf to software: it doesn't work, freeze again.
Then I changed software conf to opengl: it works normally
For me, it seems it's a problem with software rendering, not opengl.

eh? wtf? so...

before sw worked, and gl didn't.

after update...

gl works, sw doesn't.

right?

at this point i have not much to go on. the only difference with rendering engines is inside the actual rendering engine - everything above it including what triggers/begins a render is engine agnostic. right now the only thing that i can imagine it might be is something in xorg simply discarding renders ... why would it magically swap after a system upgrade where efl didn't change? right now i still suspect something inside the driver stack... efl is obviously not stuck. i have to assume it's rendering because i see no good reason it wouldn't be. that's what the swap logs were about - you tested them when using sw not gl. do you get them with gl enabled? well obviously you do... it's rendering frames...

can you try with an older fedora and same efl? fedora24 under vmware/virtualbox/qemu? i might try under a vm if i find some time but i'm pretty busy...

In T4415#67584, @raster wrote:

eh? wtf? so...

before sw worked, and gl didn't.

Yesterday, before the update, neither sw nor gl worked

after update...

gl works, sw doesn't.

right?

Today, after the update, gl works, sw doesn't, but I'm not sure the update changed anything, maybe...

In T4415#67584, @raster wrote:

can you try with an older fedora and same efl? fedora24 under vmware/virtualbox/qemu? i might try under a vm if i find some time but i'm pretty busy...

Ok, I'll try with fedora 23.

I compiled efl 1.18 and e on fedora 23 under virtualbox.
Software works normally. I can't use gl: it says e is not compiled with opengl, but it is,
with all mesa devel dependencies and the '--with-opengl=full' option.
Maybe a virtualbox problem...

ok- but yeah. you said software has the issue anyway. so different gfx driver... and it works. also older fedora too...

i have never seen your issue and i've thought of what might cause it... but i'm pretty sure efl is rendering and something at the xserver level is just deciding not to display it or update unless the xserver gets a mouse move. that's my best guess right now. without logs from all the bits of evas that actually put up content to the screen - like the sw outbuf xshmputimage calls... it isn't 100% but i see no reason why this wouldn't be called on your box when it is everywhere else, beyond perhaps the animator thing i spoke of before. either way, before an update gl didn't update... then suddenly it did. you changed nothing in efl and you get swapbufs obviously. so it must have before too ... so that just makes me more suspicious that it's something deep down in the xorg 2d driver land. proving it simply requires knowing for sure that we are putting image pixels to the screen. i suspect you may want to file a bug with fedora maintainers.

thanks! let's keep this open for tracking for now.

Maybe interesting:
I got the same problem with Lxde and Openbox
No problem with Xfce4, Blackbox, Icewm and Cinnamon...

raster added a comment. Sep 3 2016, 6:10 PM

oh really? so different wm's and compositors hit this. smells much much much more like a lower level issue. not much we can do here other than identify it and provide whatever info we can as to likely triggers. i'm fresh out of anything we can do on this end... :)

ProhtMeyhet triaged this task as Pending on user input priority. Oct 24 2016, 4:30 PM