
EFM sometimes fails to show files in any folder
Closed, Resolved (Public)

Description

I encountered this yesterday.

I open EFM and ~/home/yomi is just empty.
This also affects Shot's Save dialogue.
{F11193 size="full"}

abyomi0 created this task.Sep 1 2014, 11:45 AM
abyomi0 updated the task description. (Show Details)
abyomi0 raised the priority of this task from to Incoming Queue.
abyomi0 assigned this task to zmike.
abyomi0 added a project: enlightenment-git.
abyomi0 changed the visibility from "All Users" to "Public (No Login Required)".
abyomi0 added a subscriber: abyomi0.
abyomi0 triaged this task as Normal priority.Sep 3 2014, 12:37 PM

I seem to have forgotten to mention that it happens sporadically.
It seems to be limited to EFM.

0.19.0.18837.06cb4fe-1

abyomi0 renamed this task from EFM fails to show files in any folder to EFM sometimes fails to show files in any folder.Sep 16 2014, 8:59 AM
hnaparst added a subscriber: hnaparst.EditedSep 16 2014, 11:22 PM

We are trying E19 over at Gentoo, and this was the issue that popped up as the most serious problem. We also notice that no files show up on the desktop.

I would humbly suggest that the priority of this issue be changed to at least "high", or possibly "showstopper," since it is unlikely that anyone will use a window manager without being able to see files on the desktop or use a file manager.

So I think the problem is a combination of configuration options. I haven't totally nailed down which configure option is causing the problem, but I seem to have flipped enough options to get it working.

Which options would those be?

The only thing I have set is Show Icons on Desktop which is disabled. But EFM still works.

I've also updated since, and I haven't seen this happen again... but like I said before, it's probably intermittent.

hnaparst added a comment.EditedSep 17 2014, 9:36 AM

Here are the (efl config) options that allow EFM to work, but cause warnings while compiling efl.
If you flip the options so that there are no warnings, then EFM doesn't work anymore.



You have chosen to disable physics support. This disables lots of
core functionality and is effectively never tested. You are going
to find features that suddenly don't work and as a result cause
a series of breakages. This is simply not tested so you are on
your own in terms of ensuring everything works if you do this



Fribidi is used for handling right-to-left text (like Arabic,
Hebrew, Farsi, Persian etc.) and is very likely not a feature
you want to disable unless you know for absolute certain you
will never encounter and have to display such scripts. Also
note that we don't test with fribidi disabled so you may also
trigger code paths with bugs that are never normally used.



You disabled Gstreamer 1.x support. You likely don't want to do
this as it will heavily limit your media support options and render
some functionality as useless, leading to visible application bugs.



You disabled audio support in Ecore. This is not tested and may
Create bugs for you due to it creating untested code paths.
Reconsider disabling audio.



You have disabled xinput2 support. This means a whole lot of input
devices in X11 will not work correctly. You likely do not want to
do this.



You disabled XIM input method support. This is the most basic and
core input method protocol supported in X11 and you almost certainly
want the support for it. Input methods allow for complex text input
like for Chinese, Japanese and Korean as well as virtual keyboards
on touch/mobile devices.



SCIM is a modern and very common input method framework and you
disabled support for it. You very likely want the support for
complex language input, so please reconsider this. Input methods
allow for complex text input like for Chinese, Japanese and Korean
as well as virtual keyboards on touch/mobile devices.



Multisense has been disabled. This causes Edje audio suport to
Simply not work, and will break applications and libraries
that rely on it with users then reporting bugs.
If you want to mute audio, there are APIs and policies to do
that, as opposed to compiling it out.



I got that far, but then I got this warning:

--enable-i-really-know-what-i-am-doing-and-that-this-will-probably-break-things-and-i-will-fix-them-myself-and-send-patches-aba

As Clint Eastwood would say, "So now you have to ask yourself: Do you know what you are doing? Well, do you?"

In a way, we're dealing with two different things.

You ran into this by enabling things in EFL; I only seem to have run across this in git.

sera added a subscriber: sera.Oct 29 2014, 3:49 AM

I've got a system which exhibits the same problem.

A few observations:
enlightenment_f gets oom killed after having eaten 16GB RAM about a minute after boot
building efl with --with-profile=debug instead of release works around the issue (much less leaking observed)
disabling eeze on enlightenment works around the issue

sera added a comment.Nov 13 2014, 6:44 AM

Had some time to look into it again. I noticed that just restarting E lets EFM display icons afterwards.

Attached valgrind and strace output that should help figure out what is going wrong here.

Note the socket it fails to connect to does exist.

# ls -l /tmp/.ecore_service\|eeze_scanner\|0
srwxrwxrwx 1 root sera 0 Nov 13 14:41 /tmp/.ecore_service|eeze_scanner|0
sera added a comment.Nov 13 2014, 6:51 AM

Attaching doesn't work as expected. Stupid drag & drop. Even more so as this is a bug about a broken file manager ;)

strace: https://bpaste.net/show/18766a13f5b9
valgrind: https://bpaste.net/show/6f8ac5b3c7b8

output of strace -f enlightenment_fm

socket(PF_LOCAL, SOCK_STREAM, 0) = -1 EMFILE (Too many open files)
socket(PF_LOCAL, SOCK_STREAM, 0) = -1 EMFILE (Too many open files)
socket(PF_LOCAL, SOCK_STREAM, 0) = -1 EMFILE (Too many open files)

(repeating...)
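For anyone wanting to see what that strace pattern means in isolation: EMFILE is what socket() returns once the process has hit its per-process file descriptor limit, which is what you would expect if descriptors are being opened and never closed. The following is only a minimal sketch of that condition, not code from enlightenment_fm. The helper name `leak_sockets_until_failure` and its parameters are made up for illustration, and it uses AF_UNIX, which is the same address family as the PF_LOCAL seen in the trace.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of EFL or Enlightenment).
 * Temporarily lowers the soft RLIMIT_NOFILE to at most `limit`, then opens
 * AF_UNIX stream sockets without closing them until socket() fails or
 * `max_tries` sockets are open.
 * Returns the errno of the failing call (EMFILE is expected), or 0 if no
 * call failed. The number of sockets opened is stored in *opened. */
int leak_sockets_until_failure(rlim_t limit, int max_tries, int *opened)
{
    struct rlimit old_limit, lowered;
    int fds[1024];
    int count = 0;
    int err = 0;

    assert(opened != NULL);
    *opened = 0;
    if (max_tries > 1024)
        max_tries = 1024;

    if (getrlimit(RLIMIT_NOFILE, &old_limit) != 0)
        return errno;
    lowered = old_limit;
    if (limit < lowered.rlim_cur)
        lowered.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return errno;

    /* The "leak": every descriptor is kept open. */
    while (count < max_tries) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
            err = errno;
            break;
        }
        fds[count++] = fd;
    }

    /* Clean up so the caller is left in its original state. */
    for (int i = 0; i < count; i++)
        close(fds[i]);
    setrlimit(RLIMIT_NOFILE, &old_limit);

    *opened = count;
    return err;
}
```

Calling `leak_sockets_until_failure(32, 1024, &n)` should return EMFILE after a few dozen sockets at most. Whether enlightenment_fm reaches its limit through leaked sockets specifically is an assumption here; the strace output only shows that the limit has been reached.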

I think the best way to test is to use my workaround (chmod -x /usr/lib64/enlightenment/utils/enlightenment_fm), copying over to a temporary location and running from there.

Also I formerly didn't know there are two different binaries, /usr/lib64/enlightenment/utils/enlightenment_fm and /usr/bin/enlightenment_filemanager. That means T1864 is really a 100% dupe (but I don't know how to close it as a duplicate).

Also, *whether* it happens depends on *how* I run it. For example, running in gdb leads to the condition, while running in gdb with a breakpoint on socket() doesn't. Looks like some sort of race condition.

#0 0x00007ffff67bdfb7 in socket () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff758fea8 in ?? () from /usr/lib64/libecore_con.so.1
#2 0x00007ffff75850d2 in ?? () from /usr/lib64/libecore_con.so.1
#3 0x00007ffff5c53600 in eo_finalize () from /usr/lib64/libeo.so.1
#4 0x00007ffff7587087 in ecore_con_server_connect () from /usr/lib64/libecore_con.so.1
#5 0x0000000000409c20 in _scanner_poll (data=0x0) at src/bin/e_fm/e_fm_main_eeze.c:599
#6 0x0000000000409fc5 in _scanner_disc (data=0x0, type=19, ev=0x4bf4dd08) at src/bin/e_fm/e_fm_main_eeze.c:681
#7 0x00007ffff7138cbc in ?? () from /usr/lib64/libecore.so.1
#8 0x00007ffff713fd79 in ?? () from /usr/lib64/libecore.so.1
#9 0x00007ffff7140047 in ecore_main_loop_begin () from /usr/lib64/libecore.so.1
#10 0x0000000000404750 in main (argc=1, argv=0x7fffffffdd88) at src/bin/e_fm/e_fm_main.c:147

This will be better:

#0 0x00007ffff677ffb7 in socket () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff75807a1 in ecore_con_local_connect (obj=0x8004b76f4025b46a, cb_done=0x7ffff7570859 <_ecore_con_cl_handler>, data=0x8004b76f4025b46a) at lib/ecore_con/ecore_con_local.c:138
#2 0x00007ffff756c036 in _ecore_con_connector_eo_base_finalize (obj=0x8004b76f4025b46a, pd=0x0) at lib/ecore_con/ecore_con.c:509
#3 0x00007ffff5c0c4fd in eo_finalize () at lib/eo/eo_base.eo.c:62
#4 0x00007ffff756bcb3 in ecore_con_server_connect (compl_type=ECORE_CON_LOCAL_SYSTEM, name=0x40cce1 "eeze_scanner", port=0, data=0x0) at lib/ecore_con/ecore_con.c:446
#5 0x0000000000409c20 in _scanner_poll (data=0x0) at src/bin/e_fm/e_fm_main_eeze.c:599
#6 0x0000000000409fc5 in _scanner_disc (data=0x0, type=19, ev=0x5b9858a0) at src/bin/e_fm/e_fm_main_eeze.c:681
#7 0x00007ffff711ad43 in _ecore_call_handler_cb (func=0x409f7b <_scanner_disc>, data=0x0, type=19, event=0x5b9858a0) at lib/ecore/ecore_private.h:359
#8 0x00007ffff711bcdf in _ecore_event_call () at lib/ecore/ecore_events.c:562
#9 0x00007ffff712580b in _ecore_main_loop_iterate_internal (once_only=0) at lib/ecore/ecore_main.c:1942
#10 0x00007ffff7123ad4 in ecore_main_loop_begin () at lib/ecore/ecore_main.c:983
#11 0x0000000000404750 in main (argc=1, argv=0x7fffffffdd88) at src/bin/e_fm/e_fm_main.c:147

#4 0x0000000000409c20 in _scanner_poll (data=0x0) at src/bin/e_fm/e_fm_main_eeze.c:599
599 svr = ecore_con_server_connect(ECORE_CON_LOCAL_SYSTEM, "eeze_scanner", 0, NULL);

So there's definitely a race condition somewhere at the startup of enlightenment_fm. If that race condition gets triggered, then at some point _scanner_disc gets called over and over, as if in an endless loop. Using the "finish" command shows that _ecore_event_call is the function that never returns.

As I understand the function, it's responsible for dispatching a list of events or something like that, so when all of those events are dispatched, the loop(s) should break and it should return. But when this bug occurs, it never reaches the state that would cause it to return.
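To illustrate the kind of loop described above, here is a toy model of an event dispatcher. It is not ecore's implementation, and all names in it (`EventQueue`, `dispatch_all`, `handler_requeue`, `handler_consume`) are invented for this sketch. It assumes that a handler which fails to reconnect ends up queueing another disconnect event while the dispatcher is still draining the queue; that assumption is not confirmed anywhere in this task.

```c
#include <stddef.h>

#define QUEUE_CAP 64

/* Fixed-size ring buffer of pending event types. */
typedef struct {
    int events[QUEUE_CAP];
    size_t head;   /* index of the next event to dispatch */
    size_t count;  /* number of pending events */
} EventQueue;

typedef void (*Handler)(EventQueue *q, int event);

/* Append an event; returns 1 on success, 0 if the queue is full. */
int queue_push(EventQueue *q, int event)
{
    if (q->count == QUEUE_CAP)
        return 0;
    q->events[(q->head + q->count) % QUEUE_CAP] = event;
    q->count++;
    return 1;
}

/* Dispatch events until the queue is empty. `max_calls` is a safety cap
 * that exists only so this demo terminates; the loop described in the
 * task has no such cap. Returns the number of handler calls made. */
size_t dispatch_all(EventQueue *q, Handler handler, size_t max_calls)
{
    size_t calls = 0;
    while (q->count > 0 && calls < max_calls) {
        int event = q->events[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        handler(q, event);
        calls++;
    }
    return calls;
}

/* Models a handler whose reconnect attempt fails at once and therefore
 * queues another event of the same type. */
void handler_requeue(EventQueue *q, int event)
{
    queue_push(q, event);
}

/* Models a well-behaved handler that just consumes the event. */
void handler_consume(EventQueue *q, int event)
{
    (void)q;
    (void)event;
}
```

With `handler_consume`, `dispatch_all` returns after one call and the queue is empty. With `handler_requeue`, the queue never drains and only the cap stops the loop, which matches the observation that the dispatch function never returns.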

abyomi0 added a subscriber: zmike.Dec 1 2014, 2:28 AM

◀ Merged tasks: T1864.

abyomi0 raised the priority of this task from Normal to High.Dec 1 2014, 2:30 AM
pavlix added a comment.EditedDec 1 2014, 3:18 AM

677 static Eina_Bool
678 _scanner_disc(void *data UNUSED, int type UNUSED, Ecore_Con_Event_Server_Del *ev UNUSED)
679 {
680    INF("Scanner disconnected");
681    if (_scanner_poll(NULL))
682      _scanner_run();
683    return ECORE_CALLBACK_RENEW;
684 }

This is almost certainly wrong. It seems to always return ECORE_CALLBACK_RENEW causing the infinite loop. Even abort() would be better than that!
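One common way to keep a disconnect handler from spinning is to stop reconnecting immediately and instead schedule the retry after a growing delay, giving up after a fixed number of attempts. The sketch below only shows the bookkeeping for that in plain C. It is not the fix that was committed for this task, and `RetryState`, `retry_next_delay` and `retry_reset` are hypothetical names. In an EFL program the returned delay would presumably be handed to a timer rather than used directly.

```c
#include <stdbool.h>

/* Hypothetical retry bookkeeping for a reconnecting client. */
typedef struct {
    unsigned attempts;      /* retries scheduled since the last success */
    unsigned max_attempts;  /* give up after this many retries */
    double base_delay;      /* delay before the first retry, in seconds */
    double max_delay;       /* upper bound for any single delay */
} RetryState;

/* Decide whether another reconnect attempt should be scheduled.
 * On true, *delay_out holds the wait before that attempt: base_delay
 * doubled once per previous attempt and capped at max_delay.
 * On false, the caller should stop retrying and report the failure. */
bool retry_next_delay(RetryState *st, double *delay_out)
{
    if (st->attempts >= st->max_attempts)
        return false;

    double delay = st->base_delay;
    for (unsigned i = 0; i < st->attempts && delay < st->max_delay; i++)
        delay *= 2.0;
    if (delay > st->max_delay)
        delay = st->max_delay;

    st->attempts++;
    *delay_out = delay;
    return true;
}

/* Call after a successful connection so later failures start over. */
void retry_reset(RetryState *st)
{
    st->attempts = 0;
}
```

With a state of `{0, 3, 0.5, 1.5}`, three retries are allowed with delays of 0.5, 1.0 and 1.5 seconds, and a fourth request is refused. The point of the design is that a failing connection costs a bounded amount of work instead of a busy loop.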

zmike edited this Maniphest Task.Dec 5 2014, 10:47 AM
zmike edited this Maniphest Task.Dec 5 2014, 10:55 AM
zmike closed this task as Resolved.Dec 5 2014, 10:56 AM

This should work now.

zmike edited this Maniphest Task.Dec 5 2014, 11:03 AM

Will this get fixed for E19? Is there a branch with a fix or at least a patch that applies to Enlightenment 0.19.1? I would like to provide a fully working setup for the users of my E19 ebuild for Gentoo.

See:

https://wiki.gentoo.org/wiki/Enlightenment

Answering myself. So I see that the commits were backported to E19 and that a new release has been published. I updated the ebuild and the issue is hopefully over.