Page MenuHomePhabricator

Ecore_Exe seems to send spurious events
Closed, ResolvedPublic

Description

On EFL master (0f7c5582a4873711e0e086a8c725ffe899e29bcf), I observe that after a long time of execution of a program using ecore_exe to spawn another program that my handler ssociated to ECORE_EXE_EVENT_ADD is called at unexpected times, with the data parameter being NULL.

I reproduce this daily with eovim. When eovim starts, it creates ONE instance of the neovim process, and lives as long as the neovim process lives. In my workflow, eovim is still running before the night, and when I come back in the morning, I find eovim to be dead.

After leaving Eovim attached to GDB, I woke up to this backtrace:

#0  0x000055555555e2c7 in _nvim_added_cb ()
#1  0x00007ffff6f42ce9 in _ecore_event_message_handler_efl_loop_message_handler_message_call (obj=0x400000000e8a, pd=0x55555579cf00, message=0x40000018a4bb) at ../../src/lib/ecore/ecore_event_message_handler.c:335
#2  0x00007ffff6f488d5 in efl_loop_message_handler_message_call (obj=0x400000000e8a, message=0x40000018a4bb) at ../src/lib/ecore/efl_loop_message_handler.eo.c:14
#3  0x00007ffff6f44ea9 in _efl_loop_message_process (obj=<optimized out>, pd=0x555555795d10) at ../../src/lib/ecore/efl_loop.c:649
#4  0x00007ffff6f43f2d in efl_loop_message_process (obj=obj@entry=0x400000000287) at ../../src/lib/ecore/efl_loop.c:679
#5  0x00007ffff6f3f649 in _ecore_main_loop_iterate_internal (obj=obj@entry=0x400000000287, pd=pd@entry=0x555555795d10, once_only=once_only@entry=0) at ../../src/lib/ecore/ecore_main.c:2431
#6  0x00007ffff6f3ff85 in _ecore_main_loop_begin (obj=0x400000000287, pd=pd@entry=0x555555795d10) at ../../src/lib/ecore/ecore_main.c:1162
#7  0x00007ffff6f43e99 in _efl_loop_begin (obj=<optimized out>, pd=0x555555795d10) at ../../src/lib/ecore/efl_loop.c:85
#8  0x00007ffff6f4440d in efl_loop_begin (obj=0x400000000287) at ../src/lib/ecore/efl_loop.eo.c:44
#9  0x00007ffff6f40057 in ecore_main_loop_begin () at ../../src/lib/ecore/ecore_main.c:1235
#10 0x000055555555d84b in elm_main ()
#11 0x000055555555d90a in main ()
(gdb) f 1
#1  0x00007ffff6f42ce9 in _ecore_event_message_handler_efl_loop_message_handler_message_call (obj=0x400000000e8a, pd=0x55555579cf00, message=0x40000018a4bb) at ../../src/lib/ecore/ecore_event_message_handler.c:335
335                  if (!h->func(h->data, h->type, data)) break;
(gdb) list
330             pd->handlers_walking++;
331             EINA_INLIST_FOREACH(pd->handlers[type], h)
332               {
333                  if (h->delete_me) continue;
334                  handled++;
335                  if (!h->func(h->data, h->type, data)) break;
336               }
337             pd->handlers_walking--;
338             pd->current_event_data = NULL;
339             pd->current_event_type = -1;

And the data associated to the event is NULL:

(gdb) p *h
$2 = {__in_list = {next = 0x0, prev = 0x0, last = 0x5555558a39f0}, func = 0x55555555e2a8 <_nvim_added_cb>, data = 0x0, type = 10, delete_me = 0 '\000', to_add = 0 '\000'}

I can easily test for NULL in my handler as a bandaid, but I don't think this is the normal, expected behavior.

jayji created this task.Jan 10 2018, 11:41 PM
jayji updated the task description. (Show Details)
bu5hm4n triaged this task as High priority.Jun 11 2018, 1:39 AM
zmike edited projects, added Restricted Project; removed efl.Jun 11 2018, 6:53 AM
bu5hm4n edited projects, added efl: system integration; removed Restricted Project.Jun 11 2018, 7:44 AM
zmike lowered the priority of this task from High to Pending on user input.Jun 22 2018, 7:33 AM
zmike added a subscriber: zmike.

Can you verify whether this still happens? There have been some patches related to ecore things in the past 6 months...

jayji closed this task as Resolved.Aug 3 2018, 12:16 PM
jayji claimed this task.

This seems to have been fixed :)

jayji reopened this task as Open.Aug 11 2018, 2:28 AM

Actually, on EFL commit c57b3912a5f65632e17b023d9c12ae6e1fcc819a, I still have this problem.

Thread 1 "eovim" received signal SIGSEGV, Segmentation fault.
0x00005555555664bd in _nvim_added_cb (data=0x0, type=10, event=0x0) at /home/jean/e/eovim/src/nvim.c:239
239        INF("Process with PID %i was created", ecore_exe_pid_get(info->exe));
(gdb) p info (with const Ecore_Exe_Event_Add *const info = event;)
$1 = (const Ecore_Exe_Event_Add * const) 0x0
(gdb) bt
#0  0x00005555555664bd in _nvim_added_cb (data=0x0, type=10, event=0x0) at /home/jean/e/eovim/src/nvim.c:239
#1  0x00007ffff6e2d29d in _ecore_event_message_handler_efl_loop_message_handler_message_call (obj=0x40000000165e, pd=0x5555557c7da0, message=0x4000007064d5) at ../../src/lib/ecore/ecore_event_message_handler.c:359
#2  0x00007ffff6e3845d in efl_loop_message_handler_message_call (obj=0x40000000165e, message=0x4000007064d5) at ../src/lib/ecore/efl_loop_message_handler.eo.c:14
#3  0x00007ffff6e32b64 in _efl_loop_message_process (obj=0x400000000259, pd=0x5555557c0a00) at ../../src/lib/ecore/efl_loop.c:633
#4  0x00007ffff6e32d64 in efl_loop_message_process (obj=0x400000000259) at ../../src/lib/ecore/efl_loop.c:663
#5  0x00007ffff6e2abf9 in _ecore_main_loop_iterate_internal (obj=0x400000000259, pd=0x5555557c0a00, once_only=0) at ../../src/lib/ecore/ecore_main.c:2450
#6  0x00007ffff6e281c6 in _ecore_main_loop_begin (obj=0x400000000259, pd=0x5555557c0a00) at ../../src/lib/ecore/ecore_main.c:1183
#7  0x00007ffff6e311c6 in _efl_loop_begin (obj=0x400000000259, pd=0x5555557c0a00) at ../../src/lib/ecore/efl_loop.c:83
#8  0x00007ffff6e33ae5 in efl_loop_begin (obj=0x400000000259) at ../src/lib/ecore/efl_loop.eo.c:28
#9  0x00007ffff6e2830b in ecore_main_loop_begin () at ../../src/lib/ecore/ecore_main.c:1266
#10 0x00007ffff6359cc7 in elm_run () at ../../src/lib/elementary/elm_main.c:1315
#11 0x0000555555565415 in elm_main (argc=1, argv=0x7fffffffe288) at /home/jean/e/eovim/src/main.c:219
#12 0x00005555555655a3 in main (argc=1, argv=0x7fffffffe288) at /home/jean/e/eovim/src/main.c:237
zmike added a comment.Aug 12 2018, 8:06 AM

It would be interesting to know where/when/how this event is getting added.

It is pretty hard to reproduce, as it usually happens over night (but not regularly). I'll investigate the next core dump.

jayji added a comment.Aug 18 2018, 4:12 AM

It crashed again this night... but there is too much eo stuff... What I can see is that the data field of Efl_Loop_Message is NULL (that's the final NULL ptr)

EOLIAN static void
_ecore_event_message_handler_efl_loop_message_handler_message_call(Eo *obj, Ecore_Event_Message_Handler_Data *pd, Efl_Loop_Message *message)
{
   Handler *h;
   int type = -1;
   void *data = NULL, *free_func = NULL, *free_data = NULL;
   Ecore_End_Cb fn_free = NULL;
   Eina_List *l, *l2;
   int handled = 0;

   // call legacy handlers which are controled by this class' custom api
   ecore_event_message_data_steal
     (message, &type, &data, &free_func, &free_data); // <----- data is NULL

which comes from:

EOLIAN static Eina_Bool
_efl_loop_message_process(Eo *obj, Efl_Loop_Data *pd)
{
   if (!pd->message_queue) return EINA_FALSE;
   pd->message_walking++;
   _ecore_event_filters_call(obj, pd);
   while (pd->message_queue)
     {
        Message *msg = (Message *)pd->message_queue;
        if (!msg->delete_me)
          efl_loop_message_handler_message_call(msg->handler, msg->message); // <--- msg->message contains NULL

So the message_queue is fed with an invalid message. I've placed asserts in the efl code. I'll wait for them to fail.

jayji added a comment.Aug 21 2018, 1:24 PM

Well, it seems that

static void
_cb_read(void *data EINA_UNUSED, const Efl_Event *event)
{
   Eo *loop = efl_provider_find(event->object, EFL_LOOP_CLASS);
   int fd = efl_loop_handler_fd_get(event->object);
   char buf[8];

   if (read(fd, buf, sizeof(buf)) >= 0) return;
   DBG("system clock changed");
   // XXX: ecore event needs to be eo loop api's
   ecore_event_add(ECORE_EVENT_SYSTEM_TIMEDATE_CHANGED, NULL, NULL, NULL);
   _ecore_main_timechanges_stop(loop);
   _ecore_main_timechanges_start(loop);
}

is called, with ECORE_EVENT_SYSTEM_TIMEDATE_CHANGED being the same value than ECORE_EXE_EVENT_ADD .... I'm pretty sure this should not happen...

jayji added a comment.Aug 21 2018, 1:42 PM

Oh wait... ecore code seems wrong

// init some core legacy event types in t he same order and numbering as before
// ECORE_EVENT_NONE                     0
// no need to do as ev types start at 1

no no, it starts at -1.

I'd add a call to

ecore_event_message_handler_type_new(_event_msg_handler);

in _ecore_event_init()

zmike added a comment.Aug 21 2018, 1:45 PM

No, this should be done in the constructor, which does the initialization to -1 instead of 0.

zmike added a comment.Aug 21 2018, 1:47 PM

Although now that I'm testing a patch to do this, I'm getting an immediate crash on init?

zmike added a comment.Aug 21 2018, 1:50 PM

Okay, like an idiot I believed the comment that I found. @jayji adding that extra call (and removing the wrong comment) does resolve it. If you submit a patch for this I'll review.

Mhh,.. I'm actually not quite sure why just changing the initialization value in the constructor from -1 to 0 is not the way to go.

diff --git a/src/lib/ecore/ecore_event_message_handler.c b/src/lib/ecore/ecore_event_message_handler.c
index 1b593d0771..f5498832f9 100644
--- a/src/lib/ecore/ecore_event_message_handler.c
+++ b/src/lib/ecore/ecore_event_message_handler.c
@@ -290,7 +290,7 @@ EOLIAN static Efl_Object *
 _ecore_event_message_handler_efl_object_constructor(Eo *obj, Ecore_Event_Message_Handler_Data *pd)
 {
    obj = efl_constructor(efl_super(obj, MY_CLASS));
-   pd->event_type_count = -1;
+   pd->event_type_count = 0;
    pd->current_event_type = -1;
    return obj;
 }

That's what I tried when I first discovered it and I'm getting constant crashes. I suppose it's possible that it could somehow be local, but I did a clean reinstall...

Mhh.. it works for me. I've made a patch. Hopefully it will solve the problem.