On Plasma desktop terminology does not always start
Open, Incoming Queue, Public

Description

On my system, when I open a new instance of Terminology I get a sort of ghost window instead of an actual rendered window. The only indications of it are that the blur effect underneath the window decoration is still applied and the task is listed in the taskbar. Oddly, the thumbnail accurately shows the window along with the blinking cursor. I've attached a video clip demonstrating the issue (top right portion). My system is running Manjaro Plasma edition. There are no errors from Terminology when it is launched from another terminal and the issue occurs. I'm not sure if this is directly a bug in Terminology, or possibly some odd interaction between kwin and efl.

lunarfyre7 updated the task description. Jul 30 2019, 2:10 AM
lunarfyre7 updated the task description. Jul 30 2019, 2:13 AM

Which version of Terminology are you using?

lunarfyre7 added a comment. Edited Jul 30 2019, 4:04 PM

I'm using the AUR git package, the exact build is 1.5.0.r2427.413b879-1
The same issue occurs with the stable package too (v1.4.0-1).

Do you have the same issue if you're not using a transparent background in Terminology?

Wow, disabling transparency fixes it. Oddly, I also notice that when I enable background blur for the entire window in kwin, the issue seems to get worse.

@raster: do you have any idea?

raster added a comment. Aug 1 2019, 1:39 AM

ok. brainstorming here:

efl won't be querying what wm you have but there is a small bit of logic when alpha_set() is called on a window to enable alpha to see if you have a composited screen and if NOT... we fall back to a shaped window. this tests the selection owner of _NET_WM_CM_SXXX where XXX is the screen number (almost always just 0 - in fact only ever 0 for efl). it'll do the same dumb thing every single time if the setup is the same. i don't see how this can randomly detect then not detect a compositor each time but ecore_x_screen_is_composited() is the function in ecore_x that checks for this and returns true/false. i am going to assume for now it always returns true for you unless something truly bizarre is going on. i've not seen this fail before other than the obvious - no compositor, or compositor didn't go own the right selection etc., or it's a race (terminology/efl app starting WHILE compositor/wm starts and there is a race as to whether the query happens before the selection is owned).

i am assuming this is x11 for now btw.

in x11 efl has a whitelist of gpus to get vblank events from, and those events tick the animator. with no animator events from drm there will be no rendering/updates. but that is not going to be plasma based but drm driver based decisions. if the drm driver is not on the whitelist then it'll use ye olde timeout based rendering at 60fps by default (can be modified by env vars). ecore_x_vsync.c contains the whitelist. do:

export ECORE_VSYNC_DRM_VERSION_DEBUG=1

then run terminology to get debug output to see what it thinks. this will not depend on alpha windows though so it's probably not related. for alpha windows all we do is select the right argb visual for the window via xrender and render a destination alpha channel. be it software or gl. i assume you're software rendering as you didn't mention switching to gl via env var or elementary_config. there isn't any magic here really - efl has done this now for over a decade - been able to render alpha channels. we have been doing it to windows with destination alpha channels for a little less than that, but long enough.

now if this is efl somehow missing an expose event (i doubt that very much), then just hitting enter or doing anything to cause window redraw to need to happen will cause some content to appear. (enter in the empty/blank frame). focus in/out of window will also cause that and i see no redraw.

efl does say it handles the netwm sync request protocol. this is used to get some sync between wm and client on resizes to reduce resize glitches. i haven't seen issues with this, but to eliminate it as an option you'd need to modify some source code - remove/comment out the protos[num++] = ECORE_X_ATOM_NET_WM_SYNC_REQUEST line in _ecore_evas_x_protocols_set() in ecore_evas_x.c in the ecore_evas x module. there we have some ancient sync thing that e once supported between wm and efl but e doesn't do that anymore (has not for years). ecore_evas still has the support but i assume that is not being activated as we never documented it and it was efl specific. (as an aside i think i'll remove it from efl as it's been unused for so long... :) less bloat from history anyway).

so ... either it's the netwm sync protocol and someone getting it wrong (kwin or efl), something blocking animators entirely (a whitelisted drm device just not providing vblank events ... we DO have fallbacks and timeouts and what not to try to work around broken devices so i doubt this). or...

it's a kwin OR xorg/driver bug that somehow is just not picking up the right texture from pixmap stuff. creating an egl or glx image/pixmap is failing or a failure being ignored? i know for sure on some drivers we see eglcreateimage issues for some windows and it actually errors out (on the compositor side).

if you resize the window does it suddenly render content then?

now ... is this a wayland issue? i think not as you still have a window border. unless it's an x client via xwayland and kwin being a wayland compositor ... in which case it's quite possible it's a xorg/driver or kwin bug there like above. forcing efl to render via gl could help identify this. if you go to elementary_config and select rendering settings (under the more menu unless you resize the window), and select anything but "no acceleration" ... the next efl app run will use gl then. that'll apply to terminology of course... but since i don't see CSD here i am dismissing the possibility of this being a wayland issue.
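
As a sketch of the "env var" route to gl mentioned earlier in this comment: the thread never names the variable, so this assumes it is Elementary's ELM_ACCEL and that the value gl selects OpenGL rendering. The helper name run_with_gl is made up for illustration. It scopes the setting to a single launch, so the saved elementary_config setting stays untouched.

```shell
# Assumption: ELM_ACCEL is the env var referred to above and "gl"
# selects OpenGL rendering. run_with_gl is a hypothetical helper name.
# The variable applies only to the command being launched.
run_with_gl() {
    ELM_ACCEL=gl "$@"
}
```

Usage would be `run_with_gl terminology`. If the ghost window never appears that way but still appears with the default software rendering, that narrows the problem to the software rendering path.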

so some ideas... :) my gut tells me it's probably on the xorg/kwin side right now as the efl side here is pretty simple and just renders pixels to a window. the key places it could go wrong i listed above but as it's random and only to do with alpha it seems, i doubt that is the case right now (but could be wrong - just going via probability here).

lunarfyre7 added a comment. Edited Aug 1 2019, 3:27 PM

> i am assuming this is x11 for now btw.

Yep, this is on X11. I'll test it on Wayland as well.

> now if this is efl somehow missing an expose event (i doubt that very much), then just hitting enter or doing anything to cause window redraw to need to happen will cause some content to appear. (enter in the empty/blank frame). focus in/out of window will also cause that and i see no redraw.

Resizing doesn't change it. The window also doesn't accept any input, and mouse clicks seem to pass through to whatever is behind it. It also seems to interfere with other applications' focus: intermittently, other open windows cannot receive input either.

This is the output from terminology after setting ECORE_VSYNC_DRM_VERSION_DEBUG, it's the same whether or not the window opens correctly.

➜  ~ terminology
!BROKEN DRM! Do FIXUP of ABI
DRM Version: 1.6
Name:        'i915'
Date:        '20190417'
Desc:        'Intel Graphics'
Whitelisted i915 OK
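
Output of this shape can be checked mechanically. The sketch below assumes the "Whitelisted <driver> OK" line format seen in this one sample, and assumes that a missing line means efl is on the timer fallback described earlier in the thread. The function name vsync_source and the labels it prints are made up for illustration.

```shell
# Classify ECORE_VSYNC_DRM_VERSION_DEBUG output read from stdin.
# Assumption: a whitelisted driver is reported as "Whitelisted <name> OK",
# as in the sample above; no such line is treated as the timer fallback.
# vsync_source and its output labels are hypothetical.
vsync_source() {
    awk '
        $1 == "Whitelisted" && $3 == "OK" { print "drm-vsync:" $2; found = 1; exit }
        END { if (!found) print "timer-fallback" }
    '
}
```

One way to use it would be `ECORE_VSYNC_DRM_VERSION_DEBUG=1 terminology 2>&1 | vsync_source`. The `2>&1` is there because the thread does not say which stream the debug lines go to.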

> i assume you're software rendering as you didn't mention switching to gl via env var or elementary_config. there isn't any magic here really - efl has done this now for over a decade - been able to render alpha channels. we have been doing it to windows with destination alpha channels for a little less than that, but long enough.

It was using the default software rendering. When I switch to OpenGL, it consistently behaves correctly.

> efl does say it handles the netwm sync request protocol. this is used to get some sync between wm and client on resizes to reduce resize glitches. i haven't seen issues with this, but to eliminate it as an option you'd need to modify some source code - remove/comment out the protos[num++] = ECORE_X_ATOM_NET_WM_SYNC_REQUEST line in _ecore_evas_x_protocols_set() in ecore_evas_x.c in the ecore_evas x module. there we have some ancient sync thing that e once supported between wm and efl but e doesn't do that anymore (has not for years). ecore_evas still has the support but i assume that is not being activated as we never documented it and it was efl specific. (as an aside i think i'll remove it from efl as it's been unused for so long... :) less bloat from history anyway).

I'll try building Terminology from source with that change and see what happens.

raster added a comment. Aug 1 2019, 3:32 PM

ok. so intel gfx... broken drm is quite normal ... had to work around structs not matching the memory layouts defined in the drm headers - figured that out years ago :)

you'll have to build efl to change it and disable the netwm sync request ... but that's about the only thing i can think of...

unless it's the async rendering like totally hard blocking and thus never finishing a render?

export ECORE_EVAS_FORCE_SYNC_RENDER=1

and run terminology to disable async threaded sw rendering... i HAVE seen xlib bugs in the past regarding running from a thread. we do do xshmputimage from a thread when async rendering. setting that env var forces rendering to be single threaded in the main loop... the only way you'll know though is have it async render and inspect threads to see which one is the render thread and see if it's hung on something with gdb... you'll need debug syms too there in efl :) i have my doubts it's this ... but... :)
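
The two checks above could be scripted roughly like this. It is a minimal sketch: the helper names run_sync_render and dump_thread_backtraces are made up, and looking up the newest matching process with pgrep is an assumption of the sketch, not something from efl. The gdb options used (-p, -batch, -ex) are standard gdb command-line options.

```shell
# Run one command with async threaded software rendering disabled.
# The variable is scoped to that command instead of being exported
# into the whole shell session. Helper name is hypothetical.
run_sync_render() {
    ECORE_EVAS_FORCE_SYNC_RENDER=1 "$@"
}

# Print a backtrace of every thread of the newest process with the
# given name, so a render thread stuck in a call can be spotted.
# Needs gdb, permission to attach, and efl debug symbols for readable
# frames. The pgrep lookup is an assumption of this sketch.
dump_thread_backtraces() {
    pid=$(pgrep -n "$1") || { echo "no process named $1" >&2; return 1; }
    gdb -p "$pid" -batch -ex 'thread apply all bt'
}
```

For example, `run_sync_render terminology` tests the single threaded path. To inspect the async case, start terminology normally, wait for a ghost window, then run `dump_thread_backtraces terminology` from another terminal.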