Page MenuHomePhabricator

eina_thread_queue deadlocks on FreeBSD
Closed, ResolvedPublic

Description

I initially noticed this when E froze up after clicking on a gadget (the backlight gadget). After attaching GDB I observed that both the threadq submitter and background worker thread for evas async sample sampling were stuck waiting on the threadq semaphore.

The ecore test suite can hit this as well:

$ ( cd src && tests/ecore/ecore_suite Ecore_Test_Ccore_Thread_Eina_Thread_Queue )     
Running suite(s): Ecore
msgs done
10000000 messages sent
ERROR ERROR: DEADLOCK on lock 0x80a011008

(The deadlock detection isn't perfect — it seems to also hit this without printing a warning.)

c created this task.Apr 13 2015, 2:21 PM
c updated the task description. (Show Details)
c raised the priority of this task from to High.
c added a project: enlightenment-git.
c changed the visibility from "All Users" to "Public (No Login Required)".
c added a subscriber: c.
c added a comment.Apr 13 2015, 2:50 PM
Thread 3 (Thread 809c06c00 (LWP 101243)):
#0  0x000000080216bc69 in pthread_spin_trylock () from /lib/libthr.so.3
#1  0x00000008018dbd6d in eina_spinlock_take (spinlock=<optimized out>) at ../src/lib/eina/eina_inline_lock_posix.x:683
#2  eina_thread_queue_send (thq=<optimized out>, size=<optimized out>, allocref=0x7fffdfffdf58) at lib/eina/eina_thread_queue.c:385
#3  0x00000000004084e4 in th31_do (data=<optimized out>, th=<optimized out>)
    at tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:268
#4  0x0000000801d23a0b in _ecore_direct_worker (work=0x809ce4020) at lib/ecore/ecore_thread.c:424
#5  0x00000008018cdee2 in _eina_internal_call (context=0x809c3cb80) at lib/eina/eina_thread.c:109
#6  0x000000080216c6f4 in ?? () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfffe000

Thread 2 (Thread 809c07000 (LWP 101244)):
#0  0x0000000802177d9c in ?? () from /lib/libthr.so.3
#1  0x000000080216de04 in ?? () from /lib/libthr.so.3
#2  0x0000000802172b34 in ?? () from /lib/libthr.so.3
#3  0x00000008018dbe65 in eina_lock_take (mutex=<optimized out>) at ../src/lib/eina/eina_inline_lock_posix.x:234
#4  _eina_thread_queue_msg_alloc (thq=0x809c17c90, size=<optimized out>, blkret=0x80a00e000) at lib/eina/eina_thread_queue.c:241
#5  eina_thread_queue_send (thq=0x809c17c90, size=<optimized out>, allocref=0x7fffdfdfcf58) at lib/eina/eina_thread_queue.c:386
#6  0x0000000000408544 in th32_do (data=<optimized out>, th=<optimized out>)
    at tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:286
#7  0x0000000801d23a0b in _ecore_direct_worker (work=0x809ce4160) at lib/ecore/ecore_thread.c:424
#8  0x00000008018cdee2 in _eina_internal_call (context=0x809c3cbe0) at lib/eina/eina_thread.c:109
#9  0x000000080216c6f4 in ?? () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000

Thread 1 (Thread 809c06400 (LWP 100956)):
#0  0x00000008023fb328 in _umtx_op () from /lib/libc.so.7
#1  0x00000008023ea1f2 in sem_timedwait () from /lib/libc.so.7
#2  0x00000008018dbf7e in eina_semaphore_lock (sem=<optimized out>) at ../src/lib/eina/eina_inline_lock_posix.x:793
#3  _eina_thread_queue_wait (thq=0x809c17c90) at lib/eina/eina_thread_queue.c:181
#4  eina_thread_queue_wait (thq=0x809c17c90, allocref=0x7fffffffe028) at lib/eina/eina_thread_queue.c:431
#5  0x0000000000407d8f in ecore_test_ecore_thread_eina_thread_queue_t4 (_i=<optimized out>)
    at tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:309
#6  0x0000000800837e69 in srunner_run () from /usr/local/lib/libcheck.so.0
#7  0x000000000040549f in main (argc=<optimized out>, argv=0x7fffffffe248) at tests/ecore/ecore_suite.c:122
c assigned this task to raster.Apr 13 2015, 6:04 PM
c added a comment.Apr 17 2015, 6:29 AM

So. The deadlock seems to be a leak on lock_non_0_ref. Every other caller is simply stacked behind RWLOCK on thq.lock_write, which is held by the thread in send -> msg_alloc.

The send thread is stuck on acquiring lock_non_0_ref member of Eina_Thread_Queue_Msg_Block:

ERROR ERROR: DEADLOCK on lock 0x80a00e008
ERROR ERROR: DEADLOCK on lock 0x80a00e008

In the trace above, thread 2 is in eina_thread_queue_send; grabs thq->lock_write; then calls _eina_thread_queue_msg_alloc;

(gdb) p $blk.ref
$21 = 1

where it sleeps forever on eina_lock_take of the non-0-ref lock:

if (ref == 1) eina_lock_take(&(blk->lock_non_0_ref));

Anyway I'm trying to add more logging around the non_0_ref locks to track down the leak / holder.

just to add some meat. you have done some adding of aborts() to detect EINVAL and you think _eina_thread_queue_block_pool_lock is invalid (0x0 value - eg null ptr). all the code paths init and shut that doe correctly reading eina threadqueue code and all lock takes and frees do too... so either:

  1. eina is being shut down before threads using threadqueue finish. looking at ecore_test_ecore_thread_eina_thread_queue.c i can't see that happen - hell it never shuts eina or ecore down at all and the ecore_thread sending msgs quits itswlf (its for loop) and thus stops running - and the mainloop is a consumer at least in th1 and th2 tests (and your info says th1 0s wrong):

https://phab.enlightenment.org/D2376 / http://dpaste.com/02NF30X

  1. recursive spinlocks? reading the code - this can't be. there is no way for it to recurse here.
  2. memory corruption (hw, sw, kernel, libc?)

or

  1. spinlocks are somehow broken on bsd?

another option... something in eina_lock pthread wrapping etc. is broken - i havent looekd there. or now assuming it's correct at least. in which case threadqueue isn't the problem src :)

if it is #4 - replacing the spinlock with a regular lock and testing might prove informative. if problems go away either:

  1. spinlocks are broken on your bsd (is it a general bsd or just your instance issue?)
  2. this simply changed timing to avoid some racer condition to hide a bug - who knows... ?
cpet added a subscriber: cpet.Aug 2 2015, 11:10 PM

OpenBSD. EFL 1.17.0. Amd64.

pthread spinlocks not working. Of note: EFL 1.13.3's "src/lib/eina/eina_inline_lock_posix.x" will drop into EFL 1.17's and eolian_gen will not die.

I've also ran EFL 1.17.0 without spin locks...pthread_mutex_destroy is failing due to waiters. By counting and only destroying when there were no waiters the code without spin locking works.

Eolian (C) and eolian_gen will compile with vanilla EFL 1.17 on OpenBSD. It will fail everytime it runs. pthread_spin_destroy will fail (maybe due to above???) and abort. I've found when either not using spinlocking or using EFL 1.13.3's "eina_inline_lock_posix.c" eolian_gen is much happier, but EFL 1.17's eolian won't always validate its database, there are concurrency issues...not present with EFL 1.13.3...

cpet added a comment.Feb 29 2016, 6:16 PM
This comment was removed by raster.
cpet removed a subscriber: cpet.Feb 29 2016, 6:16 PM

it works fine on linux. until right now - no one bothered to check on bsd to see if spinlock -> full lock (mutex) change changed anything. right now it does appear that spinlocks are somehow broken on bsd - that's the best info i can see here. that smells to me right now of a "well bsd is broken. fix bsd." until we have more info. we use spinlocks specifically because they are for cases where contention is expected to be very low/short lived and a spinlock would be more efficient than a full mutex (use less memory too).

what to do? find out more information as to WHY spinlocks fail but mutexes succeed.

as for destruction having issues with waiters - that does smell like an issue somewhere in efl - can you get me a backtrace on destruction (with an abort() etc.) so i can at least see WHO is waiting when destruction happens? this could be a classic case of our shutdowns wanting to clean everything up and thus destroying objects that still have threads sitting around listening on (admittedly we are about to exit and don't really care).

we have test suites. we all almost 100% develop on linux, not bsd. the test suites succeed on linux. release after release. the builds succeed. if you care about bsd - then help and run the tests and tell us what's going on. if you don't care - then don't bother, but realize that we likely will not much care as bsd is frankly such a tiny percentage of our userbase and effectively nil of the developer base (or close to it). if you want that to change, stand up and do something about it.

@netstar is at least now trying. it's a start. so for changes since 1.13 for the eina_inline_lock_posix.x . the only commit i see as likely to affect things is:

76f9739f21a41799007db61595877b4912c4d605

does reverting that fix things for you? (git revert 76f9739f21a41799007db61595877b4912c4d605) ?

the only thing i see that really doing is actually DETECTING issues as opposed to ignoring them. the patch comes from @c who is a bsd user. so it worked for him.

so 2 issues -> spinlocks seem broken on bsd for some reason and someone s waiting on a lock (threadqueue) while it is being deleted. a backtrace will help there.

fyi i just put some waiting counter in eina_threadque - if someone is waiting it's ++'d and --'d when waiting is done. on free if > 0 it aborts. i just added this locally to debug and see. th abort doesn't happen. it works. so at least for me in my builds here on linux... no code is still waiting on a threadqueue while it is being destroyed. i really need your backtraces to see who/what/where/why/when this is going on.

git revert 76f9739f21a41799007db61595877b4912c4d605

I've just tried this with EFL.git.

Using OpenBSD 5.9 GENERIC.MP#1888 on amd64.

This revert stops pthread_spin_destroy from aborting on OpenBSD and will build EFL. Can someone test on FreeBSD???

raster added a comment.Mar 1 2016, 3:53 AM

ok - but that doesn't actually fix anything. it simply ignores an error. what about backtraces? that commit simply checks return values and aborts if there is an error that simply shouldn't happen.

c added a comment.Mar 1 2016, 2:04 PM
This comment was removed by raster.
raster added a comment.Mar 1 2016, 7:09 PM

btw - if comments aren't helping with the issue i don't see them as useful here. if you are not going to respond to questions to gather info then the best that will happen is a guess based on available info and that guess may just say "bsd seems broken because same thing on linux works fine". provide more info to work it out and narrow it down. @netstar is being rather useful and helpful atm. :)

http://www.haxlab.org/patches/eina_openbsd_thread_queue.patch

for (EFL-1.17.0) on OpenBSD. This fixes spinlock issues with eolian which allows us to build EFL 1.17.0 on OpenBSD.

This patch is committed to EFL (git).

Therefore, now, FreeBSD and OpenBSD are behaving quite similarly.

At the moment there are build issues with EFL (git 1.18.+) with FreeBSD. Perhaps we should fix these and then work on fixing pthread_spin issues on OpenBSD and FreeBSD once EFL (git 1.18+) builds and links okay

...am happy both BSDs are behaving the same...

The easiest way to test it just to have desklock start up on E init...

netstar added a comment.EditedMar 26 2016, 11:20 AM

Well, all I can manage to do is isolate the BSD code from the Linux code so that for BSDs the old behaviour is used and for Linux the proper behaviour is used...

http://www.haxlab.org/patches/eina_inline_lock_posix_bsds.patch

So the only undefined behaviour afaik would be:

a) if you try to lock an already locked spinlock (pthread_spin_lock + pthread_spin_trylock)
b) if you try to unlock an unlocked spinlock (pthread_spin_unlock)

I guess the patch isolates where the issues are.

ok. so that last patch above ignores error returns. what we need to know is WHY does it fail? let's look at the first.

static inline Eina_Lock_Result
eina_spinlock_take(Eina_Spinlock *spinlock)
{
#if defined(EINA_HAVE_POSIX_SPINLOCK)
   int t;

   while (EINA_TRUE)
     {
        t = pthread_spin_trylock(spinlock);
        if (t == 0) break;
        else if (t == EBUSY) sched_yield();
        else EINA_LOCK_ABORT_DEBUG(t, spin_trylock, spinlock);
     }

   return EINA_LOCK_SUCCEED;
#elif defined(EINA_HAVE_OSX_SPINLOCK)
   /* void OSSpinLockLock(OSSpinLock *lock);
    * Will spin if the lock is already held, but employs various strategies to
    * back off, making it immune to most priority-inversion livelocks. */
   OSSpinLockLock(spinlock);
   return EINA_LOCK_SUCCEED;
#else
   return eina_lock_take(spinlock);
#endif
}

ok - we only care about posix spinlocks:

int t;

while (EINA_TRUE)
  {
     t = pthread_spin_trylock(spinlock);
     if (t == 0) break;
     else if (t == EBUSY) sched_yield();
     else EINA_LOCK_ABORT_DEBUG(t, spin_trylock, spinlock);
  }

so here - what is the return when this aborts? what spinlock is it and where? what backtrace? what situation? locking an already locked spinlock should be defined behavior. that is actually the whole point. keep spinning until we get the lock. is it returning t as some other error than EBUSY while spinning? never succeeding? what is going on?

FreeBSD 10.3-RC3 amd64

pthread_spin_trylock returns 22 'resource temporarily unavailable'

also: EINA_ERROR ;'Invalid Argument' on spin_trylock

Loops infinitely like this.

Return checked/tested from above pthread_spin_trylock you isolated.

netstar added a comment.EditedMar 27 2016, 5:55 AM

This occurs when the desklock starts on E init...because it keeps failing there's no way to enter a password. This desklock code works with the non error checking spinlock code and is fine on Linux too with every EFL version I throw at it...

netstar added a comment.EditedMar 27 2016, 6:03 AM

Also occurs when I disable the desklock also...the screen remains blank forever and ever amen!

The blank screen is what users are reporting with E20 on FreeBSD afaik the wizard works up to a point and then it all turns to ashes!

Sorry a revision to a bogus statement.

results are undefined if:

pthread_spin_lock and pthread_spin_trylock try to lock if the SAME THREAD already owns the lock and if you try to unlock a spinlock which isn't locked.

again... WHAT LOCK? yes - if you try lock the same lock from the same thread... WHAT LOCK? where are the 2 lock instances? backtraces of each! :)

so when it returns 22... that's EINVAL. that's an invalid spinlock at least according to the return value. what spinlock is this? what is the backtrace? why does this happen on fbsd and not linux? is it perhaps not that the lock has been freed or never initialized but that we have memory corruption and a lock is being scribbled over? does valgrind say anything?

so just a sec... you can type a password but every password is "wrong" ? desklock forks(). in e_auth.c. this forked e process does the actual PAM check (because pam can pause for a while to auth if there is a failure blocking the mainloop etc. etc.)... is the forked child aborting? is there perhaps a problem using locks in a forked child? or is it on exit() within this child that things go bad?

so disabling desklock - blank when? when screen blank timeout hits, screen blanks and you cant get it back? again... backtraces. what locks where?

Okay, here's a start. This happens right after E finishes initializing.
I aborted on first occurrence with pthread_spin_trylock and have a back trace for you:

#0  0x00000008096cb35a in thr_kill () from /lib/libc.so.7
#1  0x00000008096cb346 in raise () from /lib/libc.so.7
#2  0x00000008096cb2c9 in abort () from /lib/libc.so.7
#3  0x000000080ed83181 in evas_software_xlib_outbuf_idle_flush (
    buf=<value optimized out>) at eina_inline_lock_posix.x:760
#4  0x0000000805e960fd in _evas_canvas_render_idle_flush (
    eo_e=<value optimized out>, e=0x80d00d070) at evas_render.c:3187
#5  0x0000000805e32681 in evas_render_idle_flush (obj=<value optimized out>)
    at evas_canvas.eo.c:400
#6  0x0000000803ec2f78 in _ecore_evas_cb_idle_flush (
    data=<value optimized out>) at ecore_evas.c:3164
#7  0x0000000807f03de8 in _ecore_timer_expired_call (when=93555.729092643)
    at ecore_private.h:336
#8  0x0000000807eff844 in _ecore_main_loop_iterate_internal (once_only=0)
    at ecore_main.c:2273
#9  0x0000000807effc57 in ecore_main_loop_begin () at ecore_main.c:1297
#10 0x0000000000438428 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at e_main.c:1044
(gdb)
(gdb) bt
#0  0x0000000809ab2eba in _select () from /lib/libc.so.7
#1  0x000000080931cff2 in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x0000000808218a9f in _timer_tick_core (data=0x0, thread=0x80d499420)
    at ecore_anim.c:140
#3  0x000000080822e64b in _ecore_direct_worker (work=0x80d499420)
    at ecore_thread.c:425
#4  0x0000000808dd8904 in _eina_internal_call (context=0x80d4d0a00)
    at eina_thread.c:125
#5  0x000000080931a855 in pthread_create () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
(gdb)

Another backtrace ... don't even know if this is related gosh!

With the second backtrace I compiled with more debugging symbols and captured the backtrace in the same way as the first one...

hmmm

#3  0x000000080ed83181 in evas_software_xlib_outbuf_idle_flush (buf=<value optimized out>) at eina_inline_lock_posix.x:760

where in evas_software_xlib_outbuf_idle_flush() ? can you compile with like -O1 not -O2 or -O3? though the inline kind of ruins things. eina_spinlock_take() is likely the function in the posix lock code - the abort there. but this function is not called from evas_software_xlib_outbuf_idle_flush() - it's some child function ... but where? it could be _unfind_xob() as that does SHMPOOL_LOCK() which does eina_spinlock_take()... that lock is initted by evas_software_xlib_outbuf_init() ... can you confirm that it is initted before this happens (add a printf there or trap in gdb to confirm). oh wait evas_software_xlib_outbuf_free() frees the shmpool spinlock. WTF? what? wait... if you have multiple outbufs this can be "freed" while still in use.

commit: 759dc373d78e4ded49566433355ff6398c6bc1d5

try that.

https://git.enlightenment.org/core/efl.git/commit/?id=5f307809b1d2ba2204bf073a06b4bc2fe1b205bb

Raster you gave me the wrong id, simotek found the above patch and yes sir...EFL on FreeBSD is running like a dream! Good job, well done!!!

netstar added a comment.EditedMar 29 2016, 11:15 AM

Nevermind, I've been running E for a few hours...another occurrence of the same spinlock issue., different spinlock I'm afraid right now I've no idea what triggered it. I'm guessing the fix could be applied to other implementations ??? This is running EFL 1.17.0 with the two patches that have come from these discussions.

Similarly as before, running on FreeBSD 10.3-RELEASE on X11 (amd64)...

oh sorry - wrong id. why did i give you that one? actually i think it may have changed id once i pushed... :) anyway... what is the new spinlock? where is this issue. again - we need to find it. :) same thing. need backtraces etc. hunting it down.

And another one! Here are two backtraces of the same event. It's caused by...

launching the Theme Selector dialogue from the menu or anywhere.

0x00000008097ebeba in _select () from /lib/libc.so.7
(gdb) bt
#0  0x00000008097ebeba in _select () from /lib/libc.so.7
#1  0x0000000809055ff2 in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x0000000807f90991 in _timer_tick_core (data=0x14, thread=0x80d05bb00)
    at ecore_anim.c:140
#3  0x0000000807f9e7ff in _ecore_direct_worker (work=0x80d05bb00)
    at ecore_thread.c:425
#4  0x0000000808b20382 in _eina_internal_call (context=0x80d0ea6a0)
    at eina_thread.c:125
#5  0x0000000809053855 in pthread_create () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()

(gdb) bt
#0  0x0000000809ab2eba in _select () from /lib/libc.so.7
#1  0x000000080931cff2 in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x0000000808218a59 in _timer_tick_core (data=0x0, thread=0x80d45f6a0)
    at ecore_anim.c:135
#3  0x000000080822e64b in _ecore_direct_worker (work=0x80d45f6a0)
    at ecore_thread.c:425
#4  0x0000000808dd8904 in _eina_internal_call (context=0x80d4ff9c0)
    at eina_thread.c:125
#5  0x000000080931a855 in pthread_create () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
(gdb)

Again, forgetting to set the context. This is amd64 FreeBSD 10.3-RELEASE. Other than this, things are pretty solid, been using E on FreeBSD for hours with few issues...I didnt think this one was relative to us but it seems after a little investigation is...howz that??? Not out!

This comment was removed by netstar.

ok - your bt isn't a problem. that's the animator thread that is pretty much forever sleeping in select() until something wakes it up - either an fd message which is to go change the timeout from "infinite" to "next animation slot time" ... or its a timeout... so it eventually will wake up. look at the other threads. gdb cmds:

info threads
thread 1
bt
thread 2
bt
thread 3
bt

etc. - go look at threads that are stuck. there is a whole sw scaler thread that will sit waiting all day for cmds - it's going to me stuck until it gets a cmd to wake it up... find a thread that should NOT be stuck... :) of course unless this select there never ever ever wakes up because either bsd kernel has ignored the timeout (you need to find parameters of the select to see if it asked for a timeout) ... or someone write()s to the fd (timer_fd_write which is a pipe that goes to the read fd) and that write is somehow lost? or never wakes up the select? this would stop all animation and actually all rendering if these animator ticks don't happen...

Okay, another backtrace:

(gdb) bt
#0  0x00001939adc9ba1a in thrkill () at <stdin>:2
#1  0x00001939adc97109 in *_libc_abort ()
    at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x00001939adc79539 in wrterror (d=0x19395ba6a310, 
    msg=0x1939adda2521 "use after free", p=0x1939de790400)
    at /usr/src/lib/libc/stdlib/malloc.c:286
#3  0x00001939adc7a969 in ofree (pool=0x19395ba6a310, p=0x1939de790400)
    at /usr/src/lib/libc/stdlib/malloc.c:1248
#4  0x00001939adc7aa2b in free (ptr=0x19393a90d980)
    at /usr/src/lib/libc/stdlib/malloc.c:1354
#5  0x0000193973c858ac in pthread_spin_destroy (lock=0x193972cee100)
    at /usr/src/lib/librthread/rthread_spin_lock.c:54
#6  0x00001939fd8d2de5 in _evas_cache_image_entry_delete (
    cache=0x193a00783a00, ie=0x193972cee000) at eina_inline_lock_posix.x:820
#7  0x00001939ba476805 in evas_software_xlib_outbuf_flush (buf=0x193a00783900, 
    rects=Variable "rects" is not available.
) at modules/evas/engines/software_x11/evas_xlib_outbuf.c:834
#8  0x00001939fd96ae06 in eng_output_flush (data=0x193962d01300, render_mode=6)
    at modules/evas/engines/software_generic/evas_engine.c:3716
#9  0x00001939fd8b9942 in evas_render_pipe_wakeup (data=0x193a0f269070)
    at lib/evas/canvas/evas_render.c:3057
#10 0x00001939fd94093b in evas_thread_worker_func (data=Variable "data" is not available.
)
    at lib/evas/common/evas_thread_render.c:110
#11 0x00001939454164ee in _eina_internal_call (context=Variable "context" is not available.
)
---Type <return> to continue, or q <return> to quit---
    at lib/eina/eina_thread.c:125
#12 0x0000193973c867ce in _rthread_start (v=Variable "v" is not available.
)
    at /usr/src/lib/librthread/rthread.c:135
#13 0x00001939adc235ab in __tfork_thread ()
    at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#14 0x0000000000000000 in ?? ()
(gdb)

And another!!!

OpenBSD 5.9-CURRENT EFL 1.17 with (3 patches)

#0  _dl_bind (object=0x1939810c5400, index=Variable "index" is not available.
)
    at /usr/src/libexec/ld.so/amd64/rtld_machine.c:389
#1  0x00001939860009e1 in _dl_bind_start ()
    at /usr/src/libexec/ld.so/amd64/ldasm.S:151
#2  0x0000193728adc187 in _e_x_composite_shutdown () at e_signals.c:27
#3  0x0000193728adc1a9 in e_sigabrt_act (x=Variable "x" is not available.
) at e_signals.c:117
#4  <signal handler called>
#5  0x00001939adc9ba1a in thrkill () at <stdin>:2
#6  0x00001939adc97109 in *_libc_abort ()
    at /usr/src/lib/libc/stdlib/abort.c:52
#7  0x00001939adc79539 in wrterror (d=0x19395ba6a310, 
    msg=0x1939adda2521 "use after free", p=0x1939de790400)
    at /usr/src/lib/libc/stdlib/malloc.c:286
#8  0x00001939adc7a969 in ofree (pool=0x19395ba6a310, p=0x1939de790400)
    at /usr/src/lib/libc/stdlib/malloc.c:1248
#9  0x00001939adc7aa2b in free (ptr=0x19393a90d980)
    at /usr/src/lib/libc/stdlib/malloc.c:1354
#10 0x0000193973c858ac in pthread_spin_destroy (lock=0x193972cee100)
    at /usr/src/lib/librthread/rthread_spin_lock.c:54
#11 0x00001939fd8d2de5 in _evas_cache_image_entry_delete (
    cache=0x193a00783a00, ie=0x193972cee000) at eina_inline_lock_posix.x:820
#12 0x00001939ba476805 in evas_software_xlib_outbuf_flush (buf=0x193a00783900, 
    rects=Variable "rects" is not available.
) at modules/evas/engines/software_x11/evas_xlib_outbuf.c:834
---Type <return> to continue, or q <return> to quit---
#13 0x00001939fd96ae06 in eng_output_flush (data=0x193962d01300, render_mode=6)
    at modules/evas/engines/software_generic/evas_engine.c:3716
#14 0x00001939fd8b9942 in evas_render_pipe_wakeup (data=0x193a0f269070)
    at lib/evas/canvas/evas_render.c:3057
#15 0x00001939fd94093b in evas_thread_worker_func (data=Variable "data" is not available.
)
    at lib/evas/common/evas_thread_render.c:110
#16 0x00001939454164ee in _eina_internal_call (context=Variable "context" is not available.
)
    at lib/eina/eina_thread.c:125
#17 0x0000193973c867ce in _rthread_start (v=Variable "v" is not available.
)
    at /usr/src/lib/librthread/rthread.c:135
#18 0x00001939adc235ab in __tfork_thread ()
    at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#19 0x0000000000000000 in ?? ()

Another thread from same:

#0  0x00001939adc4c48a in _thread_sys___thrsleep () at <stdin>:2
#1  0x0000193973c88f4c in _rthread_mutex_lock (mutexp=0x19397408d8d0, 
    trywait=0, abstime=0x0) at /usr/src/lib/librthread/rthread_sync.c:148
#2  0x00001939adc7be8d in malloc (size=36)
    at /usr/src/lib/libc/stdlib/malloc.c:1201
#3  0x000019396d8238c2 in _xcb_in_read (c=0x193a0e380000)
    at /usr/xenocara/lib/libxcb/libxcb/../../../dist/libxcb/src/xcb_in.c:259
#4  0x000019396d826596 in _xcb_conn_wait (c=0x193a0e380000, cond=Variable "cond" is not available.
)
    at /usr/xenocara/lib/libxcb/libxcb/../../../dist/libxcb/src/xcb_conn.c:495
#5  0x000019396d8250d3 in _xcb_out_send (c=0x193a0e380000, 
    vector=0x7f7ffffc5300, count=3)
    at /usr/xenocara/lib/libxcb/libxcb/../../../dist/libxcb/src/xcb_out.c:399
#6  0x000019396d8254c8 in xcb_writev (c=0x193a0e380000, vector=0x7f7ffffc5300, 
    count=3, requests=2)
    at /usr/xenocara/lib/libxcb/libxcb/../../../dist/libxcb/src/xcb_out.c:347
#7  0x0000193a21be3588 in _XSend () from /usr/X11R6/lib/libX11.so.16.1
#8  0x0000193a21be3f7e in _XEventsQueued () from /usr/X11R6/lib/libX11.so.16.1
#9  0x0000193a21bd4ddd in XPending () from /usr/X11R6/lib/libX11.so.16.1
#10 0x00001939971b4f99 in _ecore_x_fd_handler_buf (data=Variable "data" is not available.
)
    at lib/ecore_x/xlib/ecore_x.c:1173
#11 0x0000193962286277 in _ecore_main_fd_handlers_buf_call ()
    at ecore_private.h:414
#12 0x0000193962286537 in _ecore_main_loop_iterate_internal (once_only=0)
---Type <return> to continue, or q <return> to quit---
    at lib/ecore/ecore_main.c:2382
#13 0x000019396228676a in ecore_main_loop_begin ()
    at lib/ecore/ecore_main.c:1297
#14 0x0000193728a3518b in main (argc=Variable "argc" is not available.
) at e_main.c:1048
raster added a comment.Apr 1 2016, 7:46 PM

the last one above is xcb that is underneath xlib trying to lock access to the x connection. so either it's an xcb bug, some memory corruption of the lock itself ... or ANOTHER thread is currently holding the lock there and this is having to sit and wait. if you see this again look at the other threads to see if something else is inside some xlib call doing something. if not... well we have a deeper problem ass this shouldn't happen. even if something else is inside a lock - it shouldnt HANG forever ... find out where/what/why.

the other 2 are the same/similar issue. :)

raster added a comment.Apr 1 2016, 8:31 PM

ok- for the above 2 aborts on delete of image (not the latest bt but 2 before) could you at all get this to happen under valgrind? and try latest git? this is either a corruption of the image struct content or indeed a double-free of something there.

can you try this patch? i hope it will "abort" then on the canary1/2 checks:

the question then will be finding the first time something is freed - valgrind as i mentioned can figure this out. :)

Hi Raster, this is not the best working environment, I've another backtrace:

(gdb) bt
#0  0x00001d35a657ca1a in thrkill () at <stdin>:2
#1  0x00001d35a6578109 in *_libc_abort ()
    at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x00001d35a655a539 in wrterror (d=0x1d3615c7e520, 
    msg=0x1d35a6683521 "use after free", p=0x1d35345dc800)
    at /usr/src/lib/libc/stdlib/malloc.c:286
#3  0x00001d35a655b969 in ofree (pool=0x1d3615c7e520, p=0x1d35345dc800)
    at /usr/src/lib/libc/stdlib/malloc.c:1248
#4  0x00001d35a655ba2b in free (ptr=0x1d35f3f2ea00)
    at /usr/src/lib/libc/stdlib/malloc.c:1354
#5  0x00001d358805ac32 in _XDestroyImage () from /usr/X11R6/lib/libX11.so.16.1
#6  0x00001d3541b981fd in evas_software_xlib_x_output_buffer_free (
    xob=0x1d3553d29700, psync=Variable "psync" is not available.
)
    at modules/evas/engines/software_x11/evas_xlib_buffer.c:388
#7  0x00001d3541b96815 in evas_software_xlib_outbuf_flush (buf=0x1d360d837b00, 
    rects=Variable "rects" is not available.
) at modules/evas/engines/software_x11/evas_xlib_outbuf.c:835
#8  0x00001d358f90de06 in eng_output_flush (data=0x1d35dcb97300, render_mode=6)
    at modules/evas/engines/software_generic/evas_engine.c:3716
#9  0x00001d358f85c942 in evas_render_pipe_wakeup (data=0x1d35e03a3070)
    at lib/evas/canvas/evas_render.c:3057
#10 0x00001d358f8e393b in evas_thread_worker_func (data=Variable "data" is not available.
)
    at lib/evas/common/evas_thread_render.c:110
#11 0x00001d35d63324ee in _eina_internal_call (context=Variable "context" is not available.
)
---Type <return> to continue, or q <return> to quit---
    at lib/eina/eina_thread.c:125
#12 0x00001d361780c7ce in _rthread_start (v=Variable "v" is not available.
)
    at /usr/src/lib/librthread/rthread.c:135
#13 0x00001d35a65045ab in __tfork_thread ()
    at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#14 0x0000000000000000 in ?? ()
raster added a comment.Apr 4 2016, 7:05 PM

that kind of seems related - an outbuf x11 resource is being freed twice. how/why? as per the above. i need to know the other free location.

al@linux-g7r8:~> ==26537== Invalid read of size 4
==26537==    at 0x822B069: ecore_evas_get (ecore_evas.c:1167)
==26537==    by 0x458247: e_canvas_idle_flush (e_canvas.c:120)
==26537==    by 0x437435: main (e_main.c:1055)
==26537==  Address 0x1988a338 is 24 bytes inside a block of size 176 free'd
==26537==    at 0x4C2B3CB: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==26537==    by 0xA6AE912: evas_render_mapped (evas_render.c:1805)
==26537==    by 0xA6B08E7: evas_render_updates_internal_loop (evas_render.c:2380)
==26537==    by 0xA6B2CBB: evas_render_updates_internal (evas_render.c:2759)
==26537==    by 0xA6B41AC: _evas_canvas_render_async (evas_render.c:3102)
==26537==    by 0xA63972C: evas_canvas_render_async (in /usr/local/lib64/libevas.so.1.17.0)
==26537==    by 0xA63D434: evas_render_async (evas_canvas.eo.c:901)
==26537==    by 0x144DF976: _ecore_evas_x_render (ecore_evas_x.c:813)
==26537==    by 0x822E4CD: _ecore_evas_idle_enter (ecore_evas.c:172)
==26537==    by 0xC6FE04D: _ecore_call_task_cb (ecore_private.h:336)
==26537==    by 0xC6FE04D: _ecore_idle_enterer_call (ecore_idle_enterer.c:187)
==26537==    by 0xC7013A8: _ecore_main_loop_iterate_internal (ecore_main.c:2308)
==26537==    by 0xC701986: ecore_main_loop_begin (ecore_main.c:1297)
==26537==  Block was alloc'd at
==26537==    at 0x4C2A16F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==26537==    by 0xA71A6E5: evas_common_draw_context_dup (evas_draw_main.c:108)
==26537==    by 0xA76076D: eng_context_dup (evas_engine.c:456)
==26537==    by 0xA6AE73D: evas_render_mapped (evas_render.c:1765)
==26537==    by 0xA6B08E7: evas_render_updates_internal_loop (evas_render.c:2380)
==26537==    by 0xA6B2CBB: evas_render_updates_internal (evas_render.c:2759)
==26537==    by 0xA6B41AC: _evas_canvas_render_async (evas_render.c:3102)
==26537==    by 0xA63972C: evas_canvas_render_async (in /usr/local/lib64/libevas.so.1.17.0)
==26537==    by 0xA63D434: evas_render_async (evas_canvas.eo.c:901)
==26537==    by 0x144DF976: _ecore_evas_x_render (ecore_evas_x.c:813)
==26537==    by 0x822E4CD: _ecore_evas_idle_enter (ecore_evas.c:172)
==26537==    by 0xC6FE04D: _ecore_call_task_cb (ecore_private.h:336)
==26537==    by 0xC6FE04D: _ecore_idle_enterer_call (ecore_idle_enterer.c:187)
==26537==

waaaaait that last bt is really odd. it's like memory has been really messed up - we're accessing memory freed as part of the draw context inside evas.. but accessing it in ecore_evas ... what? that's bizarre. like really bizarre. normally valgrind points to the issue.... but here the issue happened long long long before this point... somehow a ptr was recycled or the wrong one freed or.... what?

oh btw.. is that your patched efl 1.17 or is this efl/elm from git master? also e from git master? while we're debugging all these nasties can we at least try git master as it could be these are fixed and thats why i never see them... ?

Gosh it's been a while since looking at this.

# tests/ecore/ecore_suite Ecore_Test_Ccore_Thread_Eina_Thread_Queue
Running suite(s): Ecore
10000000 messages sent
msg fd ok
28%: Checks: 7, Failures: 0, Errors: 5
tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:66:E:Ecore_Test_Ccore_Thread_Eina_Thread_Queue:ecore_test_ecore_thread_eina_thread_queue_t1:0: (after this point) Received signal 11 (Segmentation fault)
tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:214:E:Ecore_Test_Ccore_Thread_Eina_Thread_Queue:ecore_test_ecore_thread_eina_thread_queue_t3:0: (after this point) Received signal 11 (Segmentation fault)
tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:304:E:Ecore_Test_Ccore_Thread_Eina_Thread_Queue:ecore_test_ecore_thread_eina_thread_queue_t4:0: (after this point) Received signal 11 (Segmentation fault)
tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:395:E:Ecore_Test_Ccore_Thread_Eina_Thread_Queue:ecore_test_ecore_thread_eina_thread_queue_t5:0: (after this point) Received signal 11 (Segmentation fault)
tests/ecore/ecore_test_ecore_thread_eina_thread_queue.c:501:E:Ecore_Test_Ccore_Thread_Eina_Thread_Queue:ecore_test_ecore_thread_eina_thread_queue_t6:0: (after this point) Received signal 11 (Segmentation fault)

@raster

I've quickly de-constructed the tests attached is the new standalone test code (maybe you can fix this).

ATM this happens in the loop:

EINA ERROR: 'Invalid argument' on spin_trylock 0xce5d933d448
EINA ERROR: 'Invalid argument' on spin_trylock 0xce597c8c210
EINA ERROR: 'Invalid argument' on spin_trylock 0xce5d933d448
EINA ERROR: 'Invalid argument' on spin_trylock 0xce597c8c210
EINA ERROR: 'Invalid argument' on spin_trylock 0xce5d933d448
EINA ERROR: 'Invalid argument' on spin_trylock 0xce597c8c210
EINA ERROR: 'Invalid argument' on spin_trylock 0xce5d933d448
EINA ERROR: 'Invalid argument' on spin_trylock 0xce597c8c210
EINA ERROR: 'Invalid argument' on spin_trylock 0xce5d933d448
EINA ERROR: 'Invalid argument' on spin_trylock 0xce597c8c210

I've no idea if that helps, am hoping it might.

raster added a comment.Aug 4 2016, 7:26 PM

well segfaults are bad.can you get a backtrace to the segfault issues? you can stop check (the test infrA) from forking with:

export CK_FORK=no
raster added a comment.Aug 4 2016, 7:55 PM

btw your test case is broken. it'll HANG on test1() and never go to test2() the test.c

is a fix. :) now it completes to the end for me cleanly with no complaints. so...

can you get a backtrace to the invalid arguments too? on linux it's all fine btw.

ok scratch that last test.c update.

now fixed much better.

i have sat and read eina_thread_queue.c and i cannot see where it would be getting thigns wrong - like NOT iniiting a spinlock on a threadqueue.

well non-atomics had a compile warning - it should be harmless. i fixed that. does openbsd build with atomics enabled or not in eina_thread_queue.c ?

the test case doesn't even shut down eina or ecore so not accessing data that is being free on shutdown thus junk. the backtrace would help me identify a code path/point to look more closely at, but a review is showing me NADA.

the ONLY thing i can think of is the threadqueue struct or the thread queue message block HEADER has been corrupted/written over. thus the spinlock is now junk. valgrind on linux is happy as a big in mud (after i fixed it up)...

so what? why? do bsd spinlocks have bugs? i don't know. there is an invalid error - why is this so? why invalid argument?

Woke up going back to bed but just ran test.

EINA ERROR: 'Invalid argument' on spin_trylock 0x10fff6543390
EINA ERROR: 'Invalid argument' on spin_trylock 0x110006b1fe48
EINA ERROR: 'Invalid argument' on spin_trylock 0x10fff6543390
EINA ERROR: 'Invalid argument' on spin_trylock 0x110006b1fe48
EINA ERROR: 'Invalid argument' on spin_trylock 0x10fff6543390
EINA ERROR: 'Invalid argument' on spin_trylock 0x110006b1fe48
EINA ERROR: 'Invalid argument' on spin_trylock 0x10fff6543390
EINA ERROR: 'Invalid argument' on spin_trylock 0x110006b1fe48
EINA ERROR: 'Invalid argument' on spin_trylock 0x10fff6543390
EINA ERROR: 'Invalid argument' on spin_trylock 0x110006b1fe48
EINA ERROR: 'Invalid argument' on spin_trylock 0x10fff6543390
EINA ERROR: 'Invalid argument' on spin_trylock 0x110006b1fe48
EINA ERROR: 'Invalid argument' on spin_trylock 0x10fff6543390


* 3 thread 1040805  0x000010ffa40a167a in _thread_sys___thrwakeup () at <stdin>:2
  2 thread 1049063  0x000010ffa409634a in _thread_sys_nanosleep () at <stdin>:2
  1 thread 1017645  0x000010ffa406500a in _thread_sys___thrsleep () at <stdin>:2
(gdb) bt
#0  0x000010ffa40a167a in _thread_sys___thrwakeup () at <stdin>:2
#1  0x000011003e3d7292 in _thread_funlockfile (fp=Variable "fp" is not available.
) at /usr/src/lib/librthread/rthread_file.c:297
#2  0x000010ffa4075c1b in *_libc_vfprintf (fp=0x10ffa44f51f8, fmt0=0x10fff9c5f825 "EINA ERROR: '%s' on %s %p\n", ap=Variable "ap" is not available.
) at /usr/src/lib/libc/stdio/vfprintf.c:263
#3  0x000010ffa406363c in *_libc_printf (fmt=Variable "fmt" is not available.
) at /usr/src/lib/libc/stdio/printf.c:44
#4  0x000010fff9b31906 in eina_spinlock_take (spinlock=0x110006b1fe48) at eina_inline_lock_posix.x:761
#5  0x000010fff9b32211 in _eina_thread_queue_msg_fetch_done (blk=0x110006b1fe38) at lib/eina/eina_thread_queue.c:324
#6  0x000010fff9b32616 in eina_thread_queue_wait_done (thq=0x10ffc8072580, allocref=0x110006b1fe38) at lib/eina/eina_thread_queue.c:450
#7  0x000010fd4f501041 in th2_do (data=0x0, th=0x10ff8f390b00) at test.c:54
#8  0x000010ff95a3b43b in _ecore_direct_worker (work=0x10ff8f390b00) at lib/ecore/ecore_thread.c:425
#9  0x000010fff9b313a5 in _eina_internal_call (context=0x10ffee449c00) at lib/eina/eina_thread.c:150
#10 0x000011003e3d831e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#11 0x000010ffa409068b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#12 0x0000000000000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (thread 1049063)]#0  0x000010ffa409634a in _thread_sys_nanosleep () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
(gdb) bt
#0  0x000010ffa409634a in _thread_sys_nanosleep () at <stdin>:2
#1  0x000010ffa40b7647 in *_libc_nanosleep_cancel (timeout=Variable "timeout" is not available.
) at /usr/src/lib/libc/sys/w_nanosleep.c:27
#2  0x000010ffa407ffa7 in usleep (useconds=1160598240) at /usr/src/lib/libc/gen/usleep.c:46
#3  0x000010fd4f500f4b in th1_do (data=0x0, th=0x1100317d0c00) at test.c:32
#4  0x000010ff95a3b43b in _ecore_direct_worker (work=0x1100317d0c00) at lib/ecore/ecore_thread.c:425
#5  0x000010fff9b313a5 in _eina_internal_call (context=0x10ffe28f8cc0) at lib/eina/eina_thread.c:150
#6  0x000011003e3d831e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#7  0x000010ffa409068b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#8  0x0000000000000000 in ?? ()
(gdb) thread 1
[Switching to thread 1 (thread 1017645)]#0  0x000010ffa406500a in _thread_sys___thrsleep () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
(gdb) bt
#0  0x000010ffa406500a in _thread_sys___thrsleep () at <stdin>:2
#1  0x000011003e3d74a6 in _thread_flockfile (fp=0x10ffa44f51f8) at /usr/src/lib/librthread/rthread_file.c:208
#2  0x000010ffa4075bfa in *_libc_vfprintf (fp=0x10ffa44f51f8, fmt0=0x10fff9c5f825 "EINA ERROR: '%s' on %s %p\n", ap=0x7f7fffff4d70) at /usr/src/lib/libc/stdio/vfprintf.c:261
#3  0x000010ffa406363c in *_libc_printf (fmt=Variable "fmt" is not available.
) at /usr/src/lib/libc/stdio/printf.c:44
#4  0x000010fff9b31906 in eina_spinlock_take (spinlock=0x10fff6543390) at eina_inline_lock_posix.x:761
#5  0x000010fff9b32211 in _eina_thread_queue_msg_fetch_done (blk=0x10fff6543380) at lib/eina/eina_thread_queue.c:324
#6  0x000010fff9b32616 in eina_thread_queue_wait_done (thq=0x10fff6543380, allocref=0x10fff6543380) at lib/eina/eina_thread_queue.c:450
#7  0x000010fd4f50119e in test1 () at test.c:80
#8  0x000010fd4f501445 in main () at test.c:183
(gdb)

Hope to be back later to answer those questions you have.

(gdb) run
Starting program: /home/src/efl/src/tests/ecore/a.out

Program received signal SIGSEGV, Segmentation fault.
0x000014c13cd0126e in test1 () at test.c:82
82              if (DEBUG) printf("O[%4i]: %4i   [%4i]\n", i, msg->value, eina_thread_queue_pending_get(thq2));
(gdb) info threads
  3 thread 1074811  _dl_find_symbol_obj (object=0x14c380bfac00, name=0x14c13cd00877 "usleep", hash=22820181510264, flags=47710507, this=0x14c37915b608, weak_sym=0x14c37915b5a0,
    weak_object=0x14c37915b598) at /usr/src/libexec/ld.so/resolve.c:553
  2 thread 1086512  _dl_bind (object=0x14c380bfae00, index=Variable "index" is not available.
) at /usr/src/libexec/ld.so/amd64/rtld_machine.c:389
* 1 thread 1029793  0x000014c13cd0126e in test1 () at test.c:82
(gdb) bt
#0  0x000014c13cd0126e in test1 () at test.c:82
#1  0x000014c13cd01595 in main () at test.c:188
(gdb) thread 2
[Switching to thread 2 (thread 1086512)]#0  _dl_bind (object=0x14c380bfae00, index=Variable "index" is not available.
) at /usr/src/libexec/ld.so/amd64/rtld_machine.c:389
389     /usr/src/libexec/ld.so/amd64/rtld_machine.c: No such file or directory.
        in /usr/src/libexec/ld.so/amd64/rtld_machine.c
(gdb) bt
#0  _dl_bind (object=0x14c380bfae00, index=Variable "index" is not available.
) at /usr/src/libexec/ld.so/amd64/rtld_machine.c:389
#1  0x000014c3aee00995 in _dl_bind_start () at /usr/src/libexec/ld.so/amd64/ldasm.S:152
#2  0x000014c13cd00ffb in th1_do (data=0x0, th=0x14c3ee137800) at test.c:32
#3  0x000014c402d9943b in _ecore_direct_worker (work=0x14c3ee137800) at lib/ecore/ecore_thread.c:425
#4  0x000014c3a39cf3a5 in _eina_internal_call (context=0x14c365a4fca0) at lib/eina/eina_thread.c:150
#5  0x000014c40964631e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#6  0x000014c353e4268b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#7  0x0000000000000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (thread 1074811)]#0  _dl_find_symbol_obj (object=0x14c380bfac00, name=0x14c13cd00877 "usleep", hash=22820181510264, flags=47710507, this=0x14c37915b608,
    weak_sym=0x14c37915b5a0, weak_object=0x14c37915b598) at /usr/src/libexec/ld.so/resolve.c:553
553     /usr/src/libexec/ld.so/resolve.c: No such file or directory.
        in /usr/src/libexec/ld.so/resolve.c
(gdb) bt
#0  _dl_find_symbol_obj (object=0x14c380bfac00, name=0x14c13cd00877 "usleep", hash=22820181510264, flags=47710507, this=0x14c37915b608, weak_sym=0x14c37915b5a0, weak_object=0x14c37915b598)
    at /usr/src/libexec/ld.so/resolve.c:553
#1  0x000014c3aee02289 in _dl_find_symbol (name=0x14c13cd00877 "usleep", this=0x14c37915b608, flags=48, ref_sym=0x14c13cd00650, req_obj=0x14c380bfae00, pobj=0x14c37915b600)
    at /usr/src/libexec/ld.so/resolve.c:669
#2  0x000014c3aee04a43 in _dl_bind (object=0x14c380bfae00, index=Variable "index" is not available.
) at /usr/src/libexec/ld.so/amd64/rtld_machine.c:363
#3  0x000014c3aee00995 in _dl_bind_start () at /usr/src/libexec/ld.so/amd64/ldasm.S:152
#4  0x000014c13cd010f8 in th2_do (data=0x0, th=0x14c341d7a900) at test.c:54
#5  0x000014c402d9943b in _ecore_direct_worker (work=0x14c341d7a900) at lib/ecore/ecore_thread.c:425
#6  0x000014c3a39cf3a5 in _eina_internal_call (context=0x14c365a4ffc0) at lib/eina/eina_thread.c:150
#7  0x000014c40964631e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#8  0x000014c353e4268b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#9  0x0000000000000000 in ?? ()
(gdb)

:) back later!

Program received signal SIGSEGV, Segmentation fault.
0x000019612a20126e in test1 () at test.c:82
82              if (DEBUG) printf("O[%4i]: %4i   [%4i]\n", i, msg->value, eina_thread_queue_pending_get(thq2));
(gdb) bt
#0  0x000019612a20126e in test1 () at test.c:82
#1  0x000019612a201595 in main () at test.c:188
(gdb) print thq2
$1 = (Eina_Thread_Queue *) 0x19635310a080
(gdb) print msg->value
Cannot access memory at address 0x4
0x00000eee232010fc in th2_do (data=0x0, th=0xef0f9870200) at test.c:55
55              val = msg->value;
(gdb) print msg
$1 = (Msg *) 0x0
(gdb)

wtf? ok. i see. msg is null. it should never have been. this shouldnt have failed. so the segv's are just a side-effect of the msg fetch failing which in this case it shouldnt thus we're back to the original problem i think

Just so others know w/OpenBSD at least ATOMICS aren't enabled.

can you add checks for msg returns i (if (!msg) abort(); ?

(gdb) run
Starting program: /home/src/efl/src/tests/ecore/a.out 

Program received signal SIGABRT, Aborted.
0x000013fe80c6e86a in thrkill () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
Current language:  auto; currently asm
(gdb) bt
#0  0x000013fe80c6e86a in thrkill () at <stdin>:2
#1  0x000013fe80c36e49 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x000013fc77b012bd in test1 () at test.c:84
#3  0x000013fc77b015fd in main () at test.c:195
(gdb) info threads
  3 thread 1050343  0x000013fe80c6e86a in thrkill () at <stdin>:2
  2 thread 1067395  0x000013fe80c4d34a in _thread_sys_nanosleep () at <stdin>:2
* 1 thread 1080391  0x000013fe80c6e86a in thrkill () at <stdin>:2
(gdb) thread 2
[Switching to thread 2 (thread 1067395)]#0  0x000013fe80c4d34a in _thread_sys_nanosleep () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
(gdb) bt
#0  0x000013fe80c4d34a in _thread_sys_nanosleep () at <stdin>:2
#1  0x000013fe80c6e647 in *_libc_nanosleep_cancel (timeout=Variable "timeout" is not available.
) at /usr/src/lib/libc/sys/w_nanosleep.c:27
#2  0x000013fe80c36fa7 in usleep (useconds=1686440096) at /usr/src/lib/libc/gen/usleep.c:46
#3  0x000013fc77b0103b in th1_do (data=0x0, th=0x13ff0bb64200) at test.c:32
#4  0x000013fefb63943b in _ecore_direct_worker (work=0x13ff0bb64200) at lib/ecore/ecore_thread.c:425
#5  0x000013ff1a96d3a5 in _eina_internal_call (context=0x13fed57ea760) at lib/eina/eina_thread.c:150
#6  0x000013fecdef731e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#7  0x000013fe80c4768b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#8  0x0000000000000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (thread 1050343)]#0  0x000013fe80c6e86a in thrkill () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
(gdb) bt
#0  0x000013fe80c6e86a in thrkill () at <stdin>:2
#1  0x000013fe80c36e49 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x000013fc77b0110d in th2_do (data=0x0, th=0x13ff4fca5a00) at test.c:55
#3  0x000013fefb63943b in _ecore_direct_worker (work=0x13ff4fca5a00) at lib/ecore/ecore_thread.c:425
#4  0x000013ff1a96d3a5 in _eina_internal_call (context=0x13fef833d720) at lib/eina/eina_thread.c:150
#5  0x000013fecdef731e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#6  0x000013fe80c4768b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#7  0x0000000000000000 in ?? ()
(gdb)

The above is my own and this is from your test.c where you wanted the checks:

Current language:  auto; currently asm
(gdb) bt
#0  0x00000f66bd76f86a in thrkill () at <stdin>:2
#1  0x00000f66bd737e49 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x00000f642bf012cc in test1 () at test.c:85
#3  0x00000f642bf01622 in main () at test.c:195
(gdb) info threads
  3 thread 1002272  0x00000f66bd76f86a in thrkill () at <stdin>:2
  2 thread 1037134  _dl_find_symbol_obj (object=0xf6709256e00, name=0xf642bf008f6 "usleep", hash=130689984, flags=48, this=0xf6630287228, weak_sym=0xf66302871c0, weak_object=0xf66302871b8)
    at /usr/src/libexec/ld.so/resolve.c:580
* 1 thread 1089554  0x00000f66bd76f86a in thrkill () at <stdin>:2
(gdb) bt
#0  0x00000f66bd76f86a in thrkill () at <stdin>:2
#1  0x00000f66bd737e49 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x00000f642bf012cc in test1 () at test.c:85
#3  0x00000f642bf01622 in main () at test.c:195
(gdb) thread 2
[Switching to thread 2 (thread 1037134)]#0  _dl_find_symbol_obj (object=0xf6709256e00, name=0xf642bf008f6 "usleep", hash=130689984, flags=48, this=0xf6630287228, weak_sym=0xf66302871c0, 
    weak_object=0xf66302871b8) at /usr/src/libexec/ld.so/resolve.c:580
580     /usr/src/libexec/ld.so/resolve.c: No such file or directory.
        in /usr/src/libexec/ld.so/resolve.c
Current language:  auto; currently c
(gdb) bt
#0  _dl_find_symbol_obj (object=0xf6709256e00, name=0xf642bf008f6 "usleep", hash=130689984, flags=48, this=0xf6630287228, weak_sym=0xf66302871c0, weak_object=0xf66302871b8)
    at /usr/src/libexec/ld.so/resolve.c:580
#1  0x00000f6673002289 in _dl_find_symbol (name=0xf642bf008f6 "usleep", this=0xf6630287228, flags=48, ref_sym=0xf642bf00650, req_obj=0xf6709256000, pobj=0xf6630287220)
    at /usr/src/libexec/ld.so/resolve.c:669
#2  0x00000f6673004a43 in _dl_bind (object=0xf6709256000, index=Variable "index" is not available.
) at /usr/src/libexec/ld.so/amd64/rtld_machine.c:363
#3  0x00000f6673000995 in _dl_bind_start () at /usr/src/libexec/ld.so/amd64/ldasm.S:152
#4  0x00000f642bf0103b in th1_do (data=0x0, th=0xf66f26aa600) at test.c:32
#5  0x00000f6659e6543b in _ecore_direct_worker (work=0xf66f26aa600) at lib/ecore/ecore_thread.c:425
#6  0x00000f66363b53a5 in _eina_internal_call (context=0xf66d54bc8c0) at lib/eina/eina_thread.c:150
#7  0x00000f668919931e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#8  0x00000f66bd74868b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#9  0x0000000000000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (thread 1002272)]#0  0x00000f66bd76f86a in thrkill () at <stdin>:2
2       <stdin>: No such file or directory.
        in <stdin>
Current language:  auto; currently asm
(gdb) bt
#0  0x00000f66bd76f86a in thrkill () at <stdin>:2
#1  0x00000f66bd737e49 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
#2  0x00000f642bf0110d in th2_do (data=0x0, th=0xf663cf4eb00) at test.c:55
#3  0x00000f6659e6543b in _ecore_direct_worker (work=0xf663cf4eb00) at lib/ecore/ecore_thread.c:425
#4  0x00000f66363b53a5 in _eina_internal_call (context=0xf662f65e5a0) at lib/eina/eina_thread.c:150
#5  0x00000f668919931e in _rthread_start (v=Variable "v" is not available.
) at /usr/src/lib/librthread/rthread.c:115
#6  0x00000f66bd74868b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#7  0x0000000000000000 in ?? ()
(gdb)

ok. well THAT return of null leads to everything else. so the others are red herrings. WHY does that return null? what is i? it is 0, right?

./a.out

i is 0
Abort trap (core dumped)

_eina_thread_queue_msg_fetch() must be returning null... why? it waited for the semaphore to wake up. did the semaphore fail? return early? eina_semaphore_lock() ? actually i don't check the return value of eina_semaphore_lock() - it is not expected to be wrong/bad... is it? in _eina_thread_queue_wait() in eina_thread_queue.c?

Just checking now

Check this out!!!

/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: vsprintf() is often misused, please use vsnprintf()
  CCLD     bin/edje/edje_pick
  CCLD     bin/edje/edje_watch
  EDJ      modules/ethumb/emotion/template.edj
lib/edje/.libs/libedje.so.19.0: warning: warning: rand() may return deterministic values, is that what you want?
lib/edje/.libs/libedje.so.19.0: warning: warning: strcpy() is almost always misused, please use strlcpy()
lib/edje/.libs/libedje.so.19.0: warning: warning: sprintf() is often misused, please use snprintf()
lib/edje/.libs/libedje.so.19.0: warning: warning: strcat() is almost always misused, please use strlcat()
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: stpcpy() is dangerous; do not use it
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: vsprintf() is often misused, please use vsnprintf()
  EDJ      modules/elementary/prefs/elm_prefs_swallow.edj
ERR<88999>:eo lib/eo/eo_private.h:237 _eo_del_internal() in lib/eo/eo_private.h:334: Object of class 'Efl_Network_Server' - Not all of the object destructors have been executed.
ERR<88999>:eo lib/eo/eo_ptr_indirection.x:304 _eo_obj_pointer_get() obj_id 0x4000000030000004 is not pointing to a valid object. Maybe it has already been freed.
ERR<88999>:eo lib/eo/eo_ptr_indirection.c:22 _eo_pointer_error() Obj (0x4000000030000004) is an invalid ref.
ERR<71638>:eina_thread_queue lib/eina/eina_thread_queue.c:387 eina_thread_queue_new() Cannot init new semaphore for eina_threadqueue
ERR<71638>:eina_thread_queue lib/eina/eina_thread_queue.c:387 eina_thread_queue_new() Cannot init new semaphore for eina_threadqueue
Segmentation fault (core dumped) 
gmake[4]: *** [Makefile:51897: modules/ethumb/emotion/template.edj] Error 139
gmake[4]: *** Waiting for unfinished jobs....
ERR<88999>:eo lib/eo/eo_ptr_indirection.x:304 _eo_obj_pointer_get() obj_id 0x4000000030000004 is not pointing to a valid object. Maybe it has already been freed.
ERR<88999>:eo lib/eo/eo_ptr_indirection.c:22 _eo_pointer_error() Obj (0x4000000030000004) is an invalid ref.
lib/edje/.libs/libedje.so.19.0: warning: warning: rand() may return deterministic values, is that what you want?
lib/edje/.libs/libedje.so.19.0: warning: warning: strcpy() is almost always misused, please use strlcpy()
lib/edje/.libs/libedje.so.19.0: warning: warning: sprintf() is often misused, please use snprintf()
lib/edje/.libs/libedje.so.19.0: warning: warning: strcat() is almost always misused, please use strlcat()
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: stpcpy() is dangerous; do not use it
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: vsprintf() is often misused, please use vsnprintf()
lib/edje/.libs/libedje.so.19.0: warning: warning: rand() may return deterministic values, is that what you want?
lib/edje/.libs/libedje.so.19.0: warning: warning: strcpy() is almost always misused, please use strlcpy()
lib/edje/.libs/libedje.so.19.0: warning: warning: sprintf() is often misused, please use snprintf()
lib/edje/.libs/libedje.so.19.0: warning: warning: strcat() is almost always misused, please use strlcat()
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: stpcpy() is dangerous; do not use it
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: vsprintf() is often misused, please use vsnprintf()
/home/src/efl/src/lib/eio/.libs/libeio.so.19.0: warning: warning: strcpy() is almost always misused, please use strlcpy()
/home/src/efl/src/lib/eo/.libs/libeo.so.19.0: warning: warning: sprintf() is often misused, please use snprintf()
/home/src/efl/src/lib/eina/.libs/libeina.so.19.0: warning: warning: rand() may return deterministic values, is that what you want?
/home/src/efl/src/lib/eina/.libs/libeina.so.19.0: warning: warning: sprintf() is often misused, please use snprintf()
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: stpcpy() is dangerous; do not use it
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: vsprintf() is often misused, please use vsnprintf()
/usr/local/lib/libglib-2.0.so.4200.3: warning: warning: strcat() is almost always misused, please use strlcat()
gmake[3]: *** [Makefile:49587: all-recursive] Error 1
gmake[2]: *** [Makefile:16711: all] Error 2
gmake[1]: *** [Makefile:2956: all-recursive] Error 1
gmake: *** [Makefile:2053: all] Error 2
#
kern.malloc.kmemstat.sem=(inuse = 2, calls = 4, memuse = 10K, limblocks = 0, mapblocks = 0, maxused = 11K, limit = 78644K, spare = 0, sizes = (32,128,2048,8192))
kern.seminfo.semmni=1024
kern.seminfo.semmns=4096
kern.seminfo.semmnu=30
kern.seminfo.semmsl=60
kern.seminfo.semopm=100
kern.seminfo.semume=10
kern.seminfo.semusz=112
kern.seminfo.semvmx=32767
kern.seminfo.semaem=16384
#
Setting that we can change in /etc/sysctl.conf
the maximum number of System V IPC system-wide semaphore sets (and identifiers) which can exist at any given time:
#kern.seminfo.semmni

#the maximum total individual System V IPC semaphores which can be assigned by applications:
kern.seminfo.semmns

#the amount of shared memory available in the system (bytes):
kern.shminfo.shmmax

#the maximum number of shared memory segments:
sysctl kern.shminfo.shmseg
Error (1) (Operation not permitted)
ERR<76543>:eina_thread_queue lib/eina/eina_thread_queue.c:388 eina_thread_queue_new() Cannot init new semaphore for eina_threadqueue
Error (1) (Operation not permitted)
ERR<76543>:eina_thread_queue lib/eina/eina_thread_queue.c:388 eina_thread_queue_new() Cannot init new semaphore for eina_threadqueue