Page MenuHomePhabricator

Promise problem
Closed, ResolvedPublic

Description

I was tracking down some wref destruction that access illigal memory.

Here is what happens:

  • efl_future_use(&pd->job, efl_loop_job(efl_loop_get(o), o));
  • the object where this was called in gets destroyed
  • efl_future_cancel gets called, everything is canceled
  • object gets completly destroyed
  • promise gets destroyed. -> wref's are deleted -> invalid memory.

Now we have a few possibilities.

  • Adding the storage field to the api, so efl_future_cancel can remove the wref. (possibly sucks for bindings)
  • Telling the user to remove the wref itself (thats the worst solution imo)
  • Add a macro just like use, which calls cancel and removes the wref.

like

static inline void
efl_future_unuse(Efl_Future **storage)
{
   efl_future_cancel(strorage);
   efl_wref_del(*storage, storage)
}

But that leaves the correct usage to the user, i dont really know what is better ...

Related Objects

bu5hm4n created this task.Feb 5 2017, 4:23 AM
bu5hm4n updated the task description. (Show Details)
cedric added a comment.Feb 5 2017, 8:42 AM

Cancel is supposed to destroy all wref right away. I am wondering if that's not a side effect of my change to delay the result propagation.away this is a bug, if the ref is not destroyed by cancel, there is an unlimited amount of bug.

So after thinking about it during the weekend, I believe the issue is that when someone is holding a reference to a future, it prevent it from being destroyed before leaving the first call to efl_future_cancel. From there, it can only get more borked. The solution is to not rely on Eo for handling wref at all and implement them completely inside efl_future. Will look at that later today.

stefan_schmidt triaged this task as High priority.Feb 10 2017, 6:52 AM
jpeg added a subscriber: jpeg.Feb 27 2017, 8:19 PM

The last I heard from @cedric about this was that his fix made problems to the cxx bindings. @felipealmeida could you work with @cedric to solve this?

I've tried to look today, but didn't have time. I'll look tomorrow and the weekend on this.

I've added the following test which is what basically the C++ tests does and fails. I don't think the test should fail in C nor C++, so maybe you (@cedric) can see what is making the promise getting deleted in that particular case?
The test used the efl_future_optional_set that i've implemented, but on top of your new branch it doesn't work either with optional_set or without, probably because the efl_future_optional_set is not working correctly because
you added more cases where optional is and isn't.

Regards,

cedric added a comment.Mar 6 2017, 2:51 PM

I think efl_future_optional_set is trying to work around the center of the problem. So once I start fixing the issue with wref, there is an issue with normal ref that shows up. The logical solution which is actually obvious is that efl normal object and efl future do not have the same life cycle. We do need to also override ref, but that isn't possible because ref is not an eo function yet.

Still I feel like we are more and more trying to fix the core problem which is that we have put efl futures on top of efl object with no good technical reason and the fallout of thousand of bugs to hunt. We can try to continue this path of intercepting all possible call again and again until we finally stop calling anything from eo, or we go back to the drawing board and dump eo all together. I think this is to late for fixing in this release cycle, but let stop this madness for next release and move future out of eo.

@stefan_schmidt and @bu5hm4n : I am now advocating we give up for this release cycle as the amount of work in any direction increase the side effect (Inheritance on efl_ref in eo or rewrite of efl future).

jpeg added a comment.Mar 6 2017, 6:51 PM
In T5149#83286, @cedric wrote:

@stefan_schmidt and @bu5hm4n : I am now advocating we give up for this release cycle as the amount of work in any direction increase the side effect (Inheritance on efl_ref in eo or rewrite of efl future).

Do you mean there is no risk of any bug when using legacy APIs, even if they are built on top of a promises (ecore-con, fileselector, ...)?
I know that right now fileselector doesn't cancel properly a directory open (expand a folder, close it, it may re-expand itself as new children keep coming).

cedric added a comment.Mar 6 2017, 7:19 PM

Is that the reason why the cancel is not working ? If it is, I am afraid I have to do some big change :'(

jpeg added a comment.Mar 6 2017, 10:46 PM

No I didn't say it is the reason. I haven't inverstigated fileselector, I only know this happens. My question was just: do we know of any real issues (with legacy API) related to this ticket?

Well, i am not directly aware of a bug. You just get the valgrind warnings. Now the question is, can this result in a crash ?

Why not push the promise fix that fails to work as expected from C++ bindings and leave the C++ bindings broken on the promise API rather than risk C API correctness?

cedric added a comment.Mar 8 2017, 9:28 AM

I have been thinking over night about @felipealmeida proposition and I think it makes quite some sense. Some use case of future won't be possible anymore with my wref patch, but at least it will be access to NULL which most API are protected against. I will do that today.

After further tests, my patch on wref also break ecore_con use. I guess we have to stay like it is for now and I will rewrite future to not depend on Eo for next release.

#0 0x00007ffff794eabf in send () at /lib64/libpthread.so.0
#1 0x00007ffff505b27a in _efl_net_socket_fd_efl_io_writer_write (o=0x8000009224ae323f, pd=<optimized out>, ro_slice=0x7fffffffdb30, remaining=0x0) at lib/ecore_con/efl_net_socket_fd.c:277
#2 0x00007ffff70572d4 in efl_io_writer_write (obj=0x8000009224ae323f, slice=slice@entry=0x7fffffffdb30, remaining=remaining@entry=0x0) at ../src/lib/efl/interfaces/efl_io_writer.eo.c:3
#3 0x00007ffff7292498 in _efl_io_copier_write (o=o@entry=0x80000094d4ae3241, pd=pd@entry=0x119f3a0)

at lib/ecore/efl_io_copier.c:297

#4 0x00007ffff7293ffb in _efl_io_copier_job (data=0x80000094d4ae3241, ev=<optimized out>)

at lib/ecore/efl_io_copier.c:111

#5 0x00007ffff7297d36 in _efl_loop_future_success (ev=ev@entry=0x7fffffffdc20, pd=pd@entry=0xff0530, value=0x80000094d4ae3241) at lib/ecore/efl_promise.c:112
#6 0x00007ffff7298b5f in _efl_loop_future_propagate (obj=0x800000b164ae3689, pd=0xff0530)

at lib/ecore/efl_promise.c:190

#7 0x00007ffff7298beb in _efl_promise_propagate (obj=0x8000009324ae3688, pd=0xff4220) at lib/ecore/efl_promise.c:578
#8 0x00007ffff7299980 in ecore_loop_promise_fulfill (obj=<optimized out>) at lib/ecore/efl_promise.c:591
#9 0x00007ffff728b6c5 in _ecore_main_loop_iterate_internal (once_only=once_only@entry=0)

at lib/ecore/ecore_main.c:2272

#10 0x00007ffff728be27 in ecore_main_loop_begin () at lib/ecore/ecore_main.c:1299
#11 0x00007ffff612e745 in elm_run () at lib/elementary/elm_main.c:1281
#12 0x000000000040956d in elm_main (argc=argc@entry=1, argv=argv@entry=0x7fffffffddf8) at ephoto.c:74
#13 0x000000000040934c in main (argc=1, argv=0x7fffffffddf8) at ephoto.c:93

barbieri closed this task as Resolved.Sep 11 2017, 12:27 PM
barbieri added a subscriber: barbieri.

promise/future was rewritten and this bug is gone.

if similar bug shows, open a new one with new backtrace...