Page MenuHomePhabricator

fileselector unit test errors
Open, Showstopper IssuesPublic

Description

Some crazy stuff going on here: _itc_del is being reached with it_data->model as invalid because it was previously destroyed through the _noref_death mechanism.

zmike created this task.Aug 16 2018, 2:39 PM
zmike triaged this task as Normal priority.

@cedric could you verify this?

zmike raised the priority of this task from Normal to Showstopper Issues.Feb 4 2019, 11:07 AM
zmike added a subscriber: bu5hm4n.

Something related to this is causing a deadlock in the unit test which occurs >50% of the time for me. This needs to be looked at urgently to verify that there isn't something wrong with model usage/implementation.

zmike added a project: Restricted Project.Feb 4 2019, 11:07 AM

While working on cleaning up the child model lifecycle of Efl.Io.Model for task T7528 I have found a few possible error that could explain the problem reported here. Could you check with that patch set if that solve the problem?

zmike added a comment.Feb 4 2019, 12:19 PM

It doesn't.

zmike added a comment.Feb 4 2019, 1:56 PM

For documentation purposes, this is the deadlock backtrace:

#0  0x00007f7885908655 in pthread_spin_lock () at /lib64/libpthread.so.0
#1  0x00007f78861046d8 in eina_spinlock_take (spinlock=0x55dc24c6ba5c) at ../src/lib/eina/eina_inline_lock_posix.x:672
#2  0x00007f7886105d26 in ecore_thread_cancel (thread=0x55dc24c6b990) at lib/ecore/ecore_thread.c:751
#3  0x00007f788490a128 in eio_file_cancel (ls=0x55dc24c18e40) at lib/eio/eio_file.c:742
#4  0x00007f788490b9b1 in _efl_io_manager_future_cancel (o=0x40000000289f, data=0x55dc24c18e40, error=125) at lib/eio/eio_private.h:551
#5  0x00007f7886455ea6 in _efl_future_cb (data=0x55dc248b67d0, value=..., dead_future=0x55dc2481df40) at lib/eo/eo_base_class.c:2079
#6  0x00007f78864a7669 in _eina_future_cb_dispatch (f=0x55dc2481df40, value=...) at lib/eina/eina_promise.c:351
#7  0x00007f78864a7c27 in _eina_future_cancel (f=0x55dc2481df40, err=125) at lib/eina/eina_promise.c:468
#8  0x00007f78864a8f59 in eina_future_cancel (f=0x55dc2481dfc0) at lib/eina/eina_promise.c:767
#9  0x00007f7886450684 in _efl_pending_futures_clear (pd=0x55dc24c70590) at lib/eo/eo_base_class.c:124
#10 0x00007f78864506b0 in _efl_object_invalidate (obj_id=0x4000001922de, pd=0x55dc24c70590) at lib/eo/eo_base_class.c:131
#11 0x00007f7886457dd6 in efl_invalidate (obj=0x4000001922de) at lib/eo/efl_object.eo.c:76
#12 0x00007f7884918b6a in _efl_io_model_efl_object_invalidate (obj=0x4000001922de, priv=0x55dc24c705e0) at lib/eio/efl_io_model.c:1063
#13 0x00007f7886457dd6 in efl_invalidate (obj=0x4000001922de) at lib/eo/efl_object.eo.c:76
#14 0x00007f78864507bd in _efl_invalidate (obj=0x55dc24c70550) at lib/eo/eo_base_class.c:167
#15 0x00007f78864524ec in _efl_object_parent_set (obj=0x4000001922de, pd=0x55dc24c70590, parent_id=0x0) at lib/eo/eo_base_class.c:726
#16 0x00007f7886456cf5 in efl_parent_set (obj=0x4000001922de, parent=0x0) at lib/eo/efl_object.eo.c:12
#17 0x00007f78860ce3a1 in _efl_loop_consumer_efl_object_parent_set (obj=0x4000001922de, pd=0x55dc24c705e0, parent=0x0) at lib/ecore/efl_loop_consumer.c:36
#18 0x00007f7886456cf5 in efl_parent_set (obj=0x4000001922de, parent=0x0) at lib/eo/efl_object.eo.c:12
#19 0x00007f78864522d5 in efl_del (obj=0x4000001922de) at lib/eo/eo_base_class.c:678
#20 0x00007f78859fe070 in _noref_death (data=0x0, event=0x7ffd8280b080) at lib/elementary/elc_fileselector.c:167
#21 0x00007f7886454e2a in _event_callback_call (obj_id=0x4000001922de, pd=0x55dc24c70590, desc=0x7f78864619c0 <_EFL_EVENT_NOREF>, event_info=0x0, legacy_compare=0 '\000') at lib/eo/eo_base_class.c:1681
#22 0x00007f7886455106 in _efl_object_event_callback_call (obj_id=0x4000001922de, pd=0x55dc24c70590, desc=0x7f78864619c0 <_EFL_EVENT_NOREF>, event_info=0x0) at lib/eo/eo_base_class.c:1765
#23 0x00007f78864551ac in efl_event_callback_call (obj=0x4000001922de, desc=0x7f78864619c0 <_EFL_EVENT_NOREF>, event_info=0x0) at lib/eo/eo_base_class.c:1768
#24 0x00007f788644be2f in efl_unref (obj_id=0x4000001922de) at lib/eo/eo.c:1963
#25 0x00007f78859ff5bb in _listing_request_cleanup (lreq=0x55dc24c17b70) at lib/elementary/elc_fileselector.c:706
#26 0x00007f78859ff555 in _process_last (lreq=0x55dc24c17b70) at lib/elementary/elc_fileselector.c:695
#27 0x00007f7885a00055 in _process_children_cb (data=0x55dc24c17b70, v=..., dead_future=0x55dc2481e040) at lib/elementary/elc_fileselector.c:959
#28 0x00007f78864a7669 in _eina_future_cb_dispatch (f=0x55dc2481e040, value=...) at lib/eina/eina_promise.c:351
#29 0x00007f78864a7c27 in _eina_future_cancel (f=0x55dc2481e040, err=125) at lib/eina/eina_promise.c:468
#30 0x00007f78864a8f59 in eina_future_cancel (f=0x55dc2481e080) at lib/eina/eina_promise.c:767
#31 0x00007f7886450684 in _efl_pending_futures_clear (pd=0x55dc249b5340) at lib/eo/eo_base_class.c:124
#32 0x00007f78864506b0 in _efl_object_invalidate (obj_id=0x40000001dd0c, pd=0x55dc249b5340) at lib/eo/eo_base_class.c:131
#33 0x00007f7886457dd6 in efl_invalidate (obj=0x40000001dd0c) at lib/eo/efl_object.eo.c:76
#34 0x00007f78861c39bc in _efl_canvas_object_efl_object_invalidate (eo_obj=0x40000001dd0c, obj=0x55dc249b5390) at lib/evas/canvas/evas_object_main.c:1350
#35 0x00007f7886457dd6 in efl_invalidate (obj=0x40000001dd0c) at lib/eo/efl_object.eo.c:76
#36 0x00007f7885b32e4f in _efl_access_object_efl_object_invalidate (obj=0x40000001dd0c, pd=0x55dc249b5874) at lib/elementary/efl_access_object.c:641
#37 0x00007f7886457dd6 in efl_invalidate (obj=0x40000001dd0c) at lib/eo/efl_object.eo.c:76
#38 0x00007f7885c96c3a in _efl_ui_focus_composition_efl_object_invalidate (obj=0x40000001dd0c, pd=0x55dc249b5830) at lib/elementary/efl_ui_focus_composition.c:223
#39 0x00007f7886457dd6 in efl_invalidate (obj=0x40000001dd0c) at lib/eo/efl_object.eo.c:76
#40 0x00007f78864507bd in _efl_invalidate (obj=0x55dc249b5300) at lib/eo/eo_base_class.c:167
#41 0x00007f78864524ec in _efl_object_parent_set (obj=0x40000001dd0c, pd=0x55dc249b5340, parent_id=0x0) at lib/eo/eo_base_class.c:726
#42 0x00007f7886456cf5 in efl_parent_set (obj=0x40000001dd0c, parent=0x0) at lib/eo/efl_object.eo.c:12
#43 0x00007f78860ce3a1 in _efl_loop_consumer_efl_object_parent_set (obj=0x40000001dd0c, pd=0x55dc249b5390, parent=0x0) at lib/ecore/efl_loop_consumer.c:36
#44 0x00007f7886456cf5 in efl_parent_set (obj=0x40000001dd0c, parent=0x0) at lib/eo/efl_object.eo.c:12
#45 0x00007f78864522d5 in efl_del (obj=0x40000001dd0c) at lib/eo/eo_base_class.c:678
#46 0x00007f78861c2893 in evas_object_del (obj=0x40000001dd0c) at lib/evas/canvas/evas_object_main.c:1052
#47 0x00007f7885c260df in _efl_ui_widget_efl_canvas_group_group_del (obj=0x400000003ca4, sd=0x55dc2480ebc0) at lib/elementary/efl_ui_widget.c:833
#48 0x00007f78861d982e in efl_canvas_group_del (obj=0x400000003ca4) at lib/evas/canvas/evas_object_smart.c:1843
#49 0x00007f7885c45e48 in _efl_ui_win_efl_canvas_group_group_del (obj=0x400000003ca4, sd=0x55dc2480ed60) at lib/elementary/efl_ui_win.c:2989
#50 0x00007f78861d982e in efl_canvas_group_del (obj=0x400000003ca4) at lib/evas/canvas/evas_object_smart.c:1843
#51 0x00007f78861d816b in evas_object_smart_del (eo_obj=0x400000003ca4) at lib/evas/canvas/evas_object_smart.c:1291
#52 0x00007f78861c398a in _efl_canvas_object_efl_object_invalidate (eo_obj=0x400000003ca4, obj=0x55dc2480ea00) at lib/evas/canvas/evas_object_main.c:1347
#53 0x00007f7886457dd6 in efl_invalidate (obj=0x400000003ca4) at lib/eo/efl_object.eo.c:76
#54 0x00007f7885b32e4f in _efl_access_object_efl_object_invalidate (obj=0x400000003ca4, pd=0x55dc2480efb8) at lib/elementary/efl_access_object.c:641
#55 0x00007f7886457dd6 in efl_invalidate (obj=0x400000003ca4) at lib/eo/efl_object.eo.c:76
#56 0x00007f78864507bd in _efl_invalidate (obj=0x55dc2480e970) at lib/eo/eo_base_class.c:167
#57 0x00007f78864524ec in _efl_object_parent_set (obj=0x400000003ca4, pd=0x55dc2480e9b0, parent_id=0x0) at lib/eo/eo_base_class.c:726
#58 0x00007f7886456cf5 in efl_parent_set (obj=0x400000003ca4, parent=0x0) at lib/eo/efl_object.eo.c:12
#59 0x00007f78860ce3a1 in _efl_loop_consumer_efl_object_parent_set (obj=0x400000003ca4, pd=0x55dc2480ea00, parent=0x0) at lib/ecore/efl_loop_consumer.c:36
#60 0x00007f7886456cf5 in efl_parent_set (obj=0x400000003ca4, parent=0x0) at lib/eo/efl_object.eo.c:12
#61 0x00007f78864522d5 in efl_del (obj=0x400000003ca4) at lib/eo/eo_base_class.c:678
#62 0x00007f78861c2893 in evas_object_del (obj=0x400000003ca4) at lib/evas/canvas/evas_object_main.c:1052
#63 0x00007f7885c478ec in _elm_win_shutdown () at lib/elementary/efl_ui_win.c:3670
#64 0x00007f7885b6fb4a in elm_shutdown () at lib/elementary/elm_main.c:457
#65 0x000055dc22a89a0b in _elm_suite_shutdown () at tests/elementary/suite_helpers.c:46
#66 0x00007f7886513820 in srunner_run_teardown () at /lib64/libcheck.so.0
#67 0x00007f78865142e8 in srunner_run_tagged () at /lib64/libcheck.so.0
#68 0x000055dc22a8ae95 in _efl_suite_run_end (sr=0x55dc24981e80, name=0x55dc22ac233b "elm_fileselector") at tests/elementary/../efl_check.h:251

Also, somewhat interesting is that I tried with eina thread debug enabled and the test passed every time with no errors. So I guess this is some timing issue.

zmike added a comment.Feb 5 2019, 8:17 AM

Here's a new one, I got a crash today:

Thread 1 "elm_suite" received signal SIGSEGV, Segmentation fault.
0x00007ffff7ebac26 in _efl_object_call_resolve (eo_id=0x40000012a72f, func_name=0x7ffff7ecee04 "efl_event_callback_array_del", call=0x7fffffffc4e0, op=33, file=0x7ffff7ece8fd "lib/eo/eo_base_class.c", line=1595) at lib/eo/eo.c:471
471	        vtable = EO_VTABLE(obj);
(gdb) bt
#0  0x00007ffff7ebac26 in _efl_object_call_resolve (eo_id=0x40000012a72f, func_name=0x7ffff7ecee04 "efl_event_callback_array_del", call=0x7fffffffc4e0, op=33, file=0x7ffff7ece8fd "lib/eo/eo_base_class.c", line=1595) at lib/eo/eo.c:471
#1  0x00007ffff7ec7a12 in efl_event_callback_array_del (obj=0x40000012a72f, array=0x7ffff7e96a60 <internal>, user_data=0x555555ac12c0) at lib/eo/eo_base_class.c:1592
#2  0x00007ffff7c0b5bb in evas_object_callback_shutdown (eo_obj=0x40000012a72f, obj=0x555555ac12c0) at lib/evas/canvas/evas_callbacks.c:869
#3  0x00007ffff7c4867f in evas_object_free (obj=0x555555ac12c0, clean_layer=0 '\000') at lib/evas/canvas/evas_object_main.c:549
#4  0x00007ffff7c23c27 in evas_layer_free_objects (lay=0x5555559cff10) at lib/evas/canvas/evas_layer.c:113
#5  0x00007ffff7c25c70 in _evas_canvas_efl_object_invalidate (eo_e=0x400000004696, e=0x5555558ded30) at lib/evas/canvas/evas_main.c:419
#6  0x00007ffff7ecadd6 in efl_invalidate (obj=0x400000004696) at lib/eo/efl_object.eo.c:76
#7  0x00007ffff7ec37bd in _efl_invalidate (obj=0x5555558deca0) at lib/eo/eo_base_class.c:167
#8  0x00007ffff7ec54ec in _efl_object_parent_set (obj=0x400000004696, pd=0x5555558dece0, parent_id=0x0) at lib/eo/eo_base_class.c:726
#9  0x00007ffff7ec9cf5 in efl_parent_set (obj=0x400000004696, parent=0x0) at lib/eo/efl_object.eo.c:12
#10 0x00007ffff7b55461 in _efl_loop_consumer_efl_object_parent_set (obj=0x400000004696, pd=0x5555558ded30, parent=0x0) at lib/ecore/efl_loop_consumer.c:36
#11 0x00007ffff7ec9cf5 in efl_parent_set (obj=0x400000004696, parent=0x0) at lib/eo/efl_object.eo.c:12
#12 0x00007ffff7ec52d5 in efl_del (obj=0x400000004696) at lib/eo/eo_base_class.c:678
#13 0x00007ffff7c25587 in evas_free (eo_e=0x400000004696) at lib/evas/canvas/evas_main.c:264
#14 0x00007ffff7b0db68 in _ecore_evas_free (ee=0x5555557b38a0) at lib/ecore_evas/ecore_evas.c:3497
#15 0x00007ffff7b05836 in ecore_evas_free (ee=0x5555557b38a0) at lib/ecore_evas/ecore_evas.c:1107
#16 0x00007ffff76caf8f in _deferred_ecore_evas_free (data=0x5555557b38a0) at lib/elementary/efl_ui_win.c:2274
#17 0x00007ffff7b4831a in _ecore_job_event_handler (data=0x0, type=15, ev=0x555555b90ef0) at lib/ecore/ecore_job.c:95
#18 0x00007ffff7b4ed47 in _ecore_event_message_handler_efl_loop_message_handler_message_call (obj=0x400000000e88, pd=0x55555587be50, message=0x4000001a6520) at lib/ecore/ecore_event_message_handler.c:360
#19 0x00007ffff7b58d42 in efl_loop_message_handler_message_call (obj=0x400000000e88, message=0x4000001a6520) at lib/ecore/efl_loop_message_handler.eo.c:14
#20 0x00007ffff7b535c7 in _efl_loop_message_process (obj=0x400000000285, pd=0x55555594d3d0) at lib/ecore/efl_loop.c:630
#21 0x00007ffff7b537b8 in efl_loop_message_process (obj=0x400000000285) at lib/ecore/efl_loop.c:660
#22 0x00007ffff7b4c85b in _ecore_main_loop_iterate_internal (obj=0x400000000285, pd=0x55555594d3d0, once_only=1) at lib/ecore/ecore_main.c:2458
#23 0x00007ffff7b49dcb in _ecore_main_loop_iterate (obj=0x400000000285, pd=0x55555594d3d0) at lib/ecore/ecore_main.c:1111
#24 0x00007ffff7b51d8f in _efl_loop_iterate (obj=0x400000000285, pd=0x55555594d3d0) at lib/ecore/efl_loop.c:70
#25 0x00007ffff7b5425e in efl_loop_iterate (obj=0x400000000285) at lib/ecore/efl_loop.eo.c:20
#26 0x00007ffff7b4a064 in ecore_main_loop_iterate () at lib/ecore/ecore_main.c:1258
#27 0x00007ffff75f6b56 in elm_shutdown () at lib/elementary/elm_main.c:460
#28 0x0000555555562a0b in _elm_suite_shutdown () at tests/elementary/suite_helpers.c:46
#29 0x00007ffff7f86820 in srunner_run_teardown () at /lib64/libcheck.so.0
#30 0x00007ffff7f86d54 in tcase_run_tfun_nofork.isra () at /lib64/libcheck.so.0
#31 0x00007ffff7f8736e in srunner_run_tagged () at /lib64/libcheck.so.0
#32 0x0000555555563e95 in _efl_suite_run_end (sr=0x555555925830, name=0x0) at tests/elementary/../efl_check.h:251
#33 0x00005555555643d4 in _efl_suite_build_and_run (argc=1, argv=0x7fffffffe0b0, suite_name=0x55555559b534 "Elementary", etc=0x5555555c35e0 <etc>, init=0x5555555628cc <_elm2_suite_init>, shutdown=0x5555555629fe <_elm_suite_shutdown>) at tests/elementary/../efl_check.h:386
#34 0x00005555555644e5 in main (argc=2, argv=0x7fffffffe0a8) at tests/elementary/elm_suite.c:100
zmike added a comment.Feb 5 2019, 11:33 AM

After spending most of the day on this:

  1. _noref_death() is reached on a model
  2. _invalidate() is called in elc_fileselector.c on the model
  3. this cancels the underlying model's future in eio
  4. eio triggers its "thread done" callback for the model
  5. _properties_changed() is called on this model
  6. the (dead) model is added as a genlist item in _process_model()
  7. the fileselector is destroyed
  8. the model is killed again and either deadlocks or crashes
YOhoho added a subscriber: YOhoho.Mar 8 2019, 2:30 AM

Oh, that's interesting. I had an issue on my filesystem yesterday which resulted in a stat failing all the time which would lead to an Efl.Io.Model to always be in error. The patch D8336 is solving the error path and might actually solve also this problem too. Would be interesting to see if that does the case.

zmike added a subscriber: cedric.

Still broken with that patchset. Also not just errors, this crashes and deadlocks.

@zmike, after the patchset i have only one crash left, which is 1 out of ~30 tries. (T7771) can we close this ?

zmike added a comment.Mar 28 2019, 6:19 AM

No, these patches 100% made things worse. Check https://travis-ci.org/Enlightenment/efl/builds not a single run has passed since they were landed and the issues are all related.

Given the fact that trais is acting really weird (still not building my patch with debugging output) what do you think about disabling them on travis for now ? DO you have issues locally ? I can try running it on a raspberryPI in a week ...

Okay, we have finally the output of a failing eio_suite, after looking briefly through it, is it possible that in some cases on our CI a thread is never executed ? I get 0 of my debugging output, which means, the code did not even start to list the content of '/tmp' which is rather weird IMO.

That sounds rather strange indeed. I can't really think a reason why the listing would not start at all especially because other part of efl io are succeeding in there test. Are any of you able to reproduce the problem locally?

I would recommend putting printfs into the thread you're expecting to run (line by line) and then doing a test run through travis to see what happens.

@cedric @zmike

https://github.com/Enlightenment/efl/commit/6e678d692101db524d7e9683cd3d42373869a975

That is the debugging output i have added there, which shows not at all when eio suite times out on travis.

bu5hm4n added a comment.EditedMar 31 2019, 2:25 AM

@cedric I think i have found today more of what could be possibily the fact why efl_io_model dies.

What i debugged for now:

  • Efl.Io.Model gets creates, with a parent, no additional references
  • The Efl.Io.Model class gets called for some properties, therefore _eio_build_st is executed.
  • _eio_build_st takes a reference

*some time passes by*

  • _eio_file_stat_done is executed, efl_unref is executed.
  • This causes _noref_death in efl_loop_model.c:116 to be executed, which takes the model, and deletes it. (What the ???)
  • The only reason this can work is when someone keeps calling async operations on the object, so new references are taken again and again all the time...

I have changed efl_add to efl_add_ref here:
https://git.enlightenment.org/core/efl.git/commit/?h=devs/bu5hm4n/eio_suite_fix&id=337401eeb0b9aa6cd6cdc4fad0659b9daf1828c4

Lets see how travis likes it.

But more serious, i can change the tests, no problem, but fileselector seems to be a bigger problem, we efl_add_ref the model there, and unref a few lines later, which seems to be just the same there again, we should *really* stop the objects from comitting suicide...

EDIT: !$%&! Did not help :(

A little bit more information:
this is the error log of a failed eio-suite run: https://phab.enlightenment.org/P281

When you see the code that is in this branch, you will observe that the testcase in question is not even executed correctly. Running the testsuite directly with docker exec --env EINA_LOG_BACKTRACE="0" --env EIO_MONITOR_POLL=1 $(cat $HOME/cid) ./build/src/tests/eio/eio_suite
Also does show this behavior. Running it additionally with the env var CK_FORK=no results in this bug disappearing, so i suspect broken fork() behavior on travis, or we have a bug in efl_check.h

bu5hm4n added a comment.EditedMar 31 2019, 7:08 AM

Okay, we hit the weirdest thing that i have ever seen:

../src/tests/eio/../efl_check.h:319 9204 0
../src/tests/eio/../efl_check.h:336 9204 0
../src/tests/eio/../efl_check.h:338 9211 0
../src/tests/eio/../efl_check.h:353 9211 0
../src/tests/eio/../efl_check.h:357 9211 0
../src/tests/eio/../efl_check.h:360 9211 0
../src/tests/eio/../efl_check.h:363 9211 0
Running suite(s): Eio
../src/tests/eio/../efl_check.h:319 9204 0
../src/tests/eio/../efl_check.h:336 9204 0
../src/tests/eio/../efl_check.h:338 9204 0
>9211Eio
../src/tests/eio/../efl_check.h:319 9204 1
../src/tests/eio/../efl_check.h:336 9204 1
../src/tests/eio/../efl_check.h:338 9212 1
../src/tests/eio/../efl_check.h:353 9212 1
../src/tests/eio/../efl_check.h:357 9212 1
../src/tests/eio/../efl_check.h:360 9212 1
../src/tests/eio/../efl_check.h:363 9212 1
Running suite(s): Eio
100%: Checks: 1, Failures: 0, Errors: 0
../src/tests/eio/../efl_check.h:368 9211 0
../src/tests/eio/../efl_check.h:371 9211 0
../src/tests/eio/../efl_check.h:376 9211 0
../src/tests/eio/../efl_check.h:319 9204 0
../src/tests/eio/../efl_check.h:336 9204 0
../src/tests/eio/../efl_check.h:338 9204 0
>9211Eio
../src/tests/eio/../efl_check.h:319 9204 1
../src/tests/eio/../efl_check.h:336 9204 1
../src/tests/eio/../efl_check.h:338 9204 1
>9212Efl Io Model Monitor
<9211
<9210
FAILSAFE TIMEOUT REACHED!
STILL RUNNING: Efl Io Model Monitor

This is the output of the eio testsuite, executed on travis vm,
">%d%s" Is executed in the suite-executor (right after the fork), %d is the pid of the child that executes the testcase. %s is the name of the testcase. What this shows is, that in iteration 0 the testcase "Eio" is executed 2 times, however with the same result of fork, this means: we have only 2 processes that will exit, while num_fork is 3. which means, something will not work here at all in that regard ...

With this efl_check.h is out, there is IMO no reason the code would do that, so we either have an compiler bug, or travis bug. I am not sure what we should do here...