Page MenuHomePhabricator

efl-net valgrind errors
Closed, ResolvedPublic

zmike created this task.Jan 20 2017, 7:58 AM
barbieri reassigned this task from barbieri to cedric.Feb 6 2017, 9:14 AM
barbieri added a subscriber: barbieri.

Cedric, looking at the code and the backtrace:

==15191==  Address 0x21e31b68 is 168 bytes inside a block of size 256 free'd
==15191==    at 0x4C2ED4A: free (vg_replace_malloc.c:530)
==15191==    by 0xE80AD64: _eina_freeq_free_do (eina_freeq.c:118)
==15191==    by 0xE80B675: eina_freeq_ptr_add (eina_freeq.c:371)
==15191==    by 0xE58923F: eina_freeq_ptr_main_add (eina_freeq.h:315)
==15191==    by 0xE58C75A: _eo_free (eo.c:997)
==15191==    by 0xE589700: _efl_unref_internal (eo_private.h:345)
==15191==    by 0xE58E8FD: efl_unref (eo.c:1757)
==15191==    by 0xDBE560C: _efl_io_copier_job (efl_io_copier.c:129)
==15191==    by 0xDBEED2D: _efl_loop_future_success (efl_promise.c:112)
==15191==    by 0xDBEEFA3: _efl_loop_future_propagate (efl_promise.c:190)
..

_efl_io_copier_job() is, as the name implies, a function called from a job (promise), which call the user -- thus keep an extra ref the user is free to delete the copier (ie: it's done, delete) and this is postponed until efl_io_copier.c:129, that efl_unref(copier)
FROM inside the job, that is efl_future_link(copier, pd->job).

The job is still alive, but the copier and its pd of pd->job are GONE.

Nevertheless, the promise doesn't realize that and tries to null the linked pointer that is long gone:

==15191== Invalid write of size 8
==15191==    at 0xE592D00: _wref_destruct (eo_base_class.c:870)
==15191==    by 0xE594B66: _efl_object_destructor (eo_base_class.c:1813)
==15191==    by 0xE595F2A: efl_destructor (efl_object.eo.c:58)
==15191==    by 0xDBEF8DD: _efl_loop_future_efl_object_destructor (efl_promise.c:447)
==15191==    by 0xE595F2A: efl_destructor (efl_object.eo.c:58)
==15191==    by 0xE58931E: _efl_del_internal (eo_private.h:248)
==15191==    by 0xE5896E5: _efl_unref_internal (eo_private.h:323)
==15191==    by 0xE58E8FD: efl_unref (eo.c:1757)
==15191==    by 0xDBEEFEC: _efl_loop_future_propagate (efl_promise.c:200)

From this code:

static void
_efl_loop_future_propagate(Eo *obj, Efl_Loop_Future_Data *pd)
{
   Efl_Event ev;
   Eina_Bool cancel;

   ev.object = obj;
   cancel = pd->fulfilled && !pd->message;

   // This has to be done early on to avoid recursive success/failure to
   // bypass the fulfilled check.
   pd->fulfilled = EINA_TRUE;

   if (cancel)
     {
        _efl_loop_future_failure(&ev, pd, EINA_ERROR_FUTURE_CANCEL);
     }
   else if (pd->message->error == 0)
     {
        _efl_loop_future_success(&ev, pd, pd->message->value); // line 190: calls _efl_io_copier_job()
     }
   else
     {
        _efl_loop_future_failure(&ev, pd, pd->message->error);
     }

   if (!pd->delayed)
     {
        pd->delayed = EINA_TRUE;
        efl_unref(obj); // line 200: calls _wref_destruct() that results in invalid write
     }
}
cedric added a comment.Feb 6 2017, 9:55 AM

This is a duplicate of T5149 I think.

zmike added a comment.Feb 6 2017, 10:44 AM
In T5119#81253, @cedric wrote:

This is a duplicate of T5149 I think.

I think you mean T5149 is a duplicate of this

This is still an issue.

I just saw this in a valgrind trace today, current efl and enlightenment git master.

Can confirm this with Ephoto:
#0 0x00007ffff794eabf in send () at /lib64/libpthread.so.0
#1 0x00007ffff505b27a in _efl_net_socket_fd_efl_io_writer_write (o=0x8000009224ae323f, pd=<optimized out>, ro_slice=0x7fffffffdb30, remaining=0x0) at lib/ecore_con/efl_net_socket_fd.c:277
#2 0x00007ffff70572d4 in efl_io_writer_write (obj=0x8000009224ae323f, slice=slice@entry=0x7fffffffdb30, remaining=remaining@entry=0x0) at ../src/lib/efl/interfaces/efl_io_writer.eo.c:3
#3 0x00007ffff7292498 in _efl_io_copier_write (o=o@entry=0x80000094d4ae3241, pd=pd@entry=0x119f3a0)

at lib/ecore/efl_io_copier.c:297

#4 0x00007ffff7293ffb in _efl_io_copier_job (data=0x80000094d4ae3241, ev=<optimized out>)

at lib/ecore/efl_io_copier.c:111

#5 0x00007ffff7297d36 in _efl_loop_future_success (ev=ev@entry=0x7fffffffdc20, pd=pd@entry=0xff0530, value=0x80000094d4ae3241) at lib/ecore/efl_promise.c:112
#6 0x00007ffff7298b5f in _efl_loop_future_propagate (obj=0x800000b164ae3689, pd=0xff0530)

at lib/ecore/efl_promise.c:190

#7 0x00007ffff7298beb in _efl_promise_propagate (obj=0x8000009324ae3688, pd=0xff4220) at lib/ecore/efl_promise.c:578
#8 0x00007ffff7299980 in ecore_loop_promise_fulfill (obj=<optimized out>) at lib/ecore/efl_promise.c:591
#9 0x00007ffff728b6c5 in _ecore_main_loop_iterate_internal (once_only=once_only@entry=0)

at lib/ecore/ecore_main.c:2272

#10 0x00007ffff728be27 in ecore_main_loop_begin () at lib/ecore/ecore_main.c:1299
#11 0x00007ffff612e745 in elm_run () at lib/elementary/elm_main.c:1281
#12 0x000000000040956d in elm_main (argc=argc@entry=1, argv=argv@entry=0x7fffffffddf8) at ephoto.c:74
#13 0x000000000040934c in main (argc=1, argv=0x7fffffffddf8) at ephoto.c:93

zmike added a comment.Jul 10 2017, 6:57 AM

Still an issue.

@cedric, how is the rewrite/rework of the promise/future? I'm not following git that closely, but seems no work to make it usable and race free... if we want to push users to this new pattern, it must work reliably AND must be easy to work with (like in JS, where it's almost transparent...)

zmike added a comment.Jul 24 2017, 5:35 AM

I feel like if this isn't fixed before the 1.20 release then it never will be.

iscaro added a subscriber: iscaro.Aug 3 2017, 11:58 AM

This should be fixed by c61ac48f1a3647a3010392d44781388fcae6a2a2. However the same problem may occur if promises/futures are not reworked.

Hi, the reworked future/promise landed today and looks like it fix this issue. Could you take a look with recent GIT (post de4825a274e81dc3035c70b38c3086b6446cba19)

bu5hm4n closed this task as Resolved.Jun 11 2018, 3:02 AM
bu5hm4n added a subscriber: bu5hm4n.

I see this now as fixed as per the last comment, if otherwise, please reopen.