
efreetd: FreeBSD segfault.
Open, High, Public

Description

On FreeBSD 12.1 I regularly find an efreetd core dump, but I am unable to reproduce the crash manually.

Reading symbols from efreetd...
[New LWP 100126]
Core was generated by `/usr/local/bin/efreetd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000008002ef39c in _scheduled_entry_cb (f=0x80738ac10, value=...) at ../src/lib/eina/eina_promise.c:428
428	   Eina_Future_Scheduler *scheduler = f->scheduled_entry->scheduler;
(gdb) bt
#0  0x00000008002ef39c in _scheduled_entry_cb (f=0x80738ac10, value=...) at ../src/lib/eina/eina_promise.c:428
#1  0x0000000800ad61d7 in _futures_dispatch_cb (data=0x80180fe60, ev=0x7fffffffdb50) at ../src/lib/eo/eo_base_class.c:1806
#2  0x0000000800ad452e in _event_callback_call (obj_id=0x400000000111, pd=0x8011f5440, desc=0x8003ea3a0 <_EFL_LOOP_EVENT_IDLE_ENTER>, event_info=0x0, legacy_compare=0 '\000') at ../src/lib/eo/eo_base_class.c:2073
#3  0x0000000800ad34b0 in _efl_object_event_callback_call (obj_id=0x400000000111, pd=0x8011f5440, desc=0x8003ea3a0 <_EFL_LOOP_EVENT_IDLE_ENTER>, event_info=0x0) at ../src/lib/eo/eo_base_class.c:2158
#4  0x0000000800ac9770 in efl_event_callback_call (obj=0x400000000111, desc=0x8003ea3a0 <_EFL_LOOP_EVENT_IDLE_ENTER>, event_info=0x0) at ../src/lib/eo/eo_base_class.c:2161
#5  0x0000000800377e48 in _ecore_main_loop_iterate_internal (obj=0x400000000111, pd=0x8011f54b0, once_only=0) at ../src/lib/ecore/ecore_main.c:2413
#6  0x00000008003782c2 in _ecore_main_loop_begin (obj=0x400000000111, pd=0x8011f54b0) at ../src/lib/ecore/ecore_main.c:1200
#7  0x00000008003833dd in _efl_loop_begin (obj=0x400000000111, pd=0x8011f54b0) at ../src/lib/ecore/efl_loop.c:57
#8  0x0000000800381c4d in efl_loop_begin (obj=0x400000000111) at src/lib/ecore/efl_loop.eo.c:28
#9  0x00000008003784b4 in ecore_main_loop_begin () at ../src/lib/ecore/ecore_main.c:1285
#10 0x0000000000204590 in main (argc=1, argv=0x7fffffffe750) at ../src/bin/efreet/efreetd.c:82
(gdb) print f
$1 = (Eina_Future *) 0x80738ac10
(gdb) print f->scheduled_entry
$2 = (Eina_Future_Schedule_Entry *) 0x0
(gdb) print f->scheduled_entry->scheduler
Cannot access memory at address 0x0
(gdb) show threads
Undefined show command: "threads".  Try "help show".
(gdb) info threads
  Id   Target Id         Frame
* 1    LWP 100126        0x00000008002ef39c in _scheduled_entry_cb (f=0x80738ac10, value=...) at ../src/lib/eina/eina_promise.c:428
(gdb)

Details

netstar created this task. Nov 24 2019, 5:31 AM
netstar triaged this task as High priority.
netstar added a project: E on FreeBSD.

I was able to catch the same crash:

(lldb) run
Process 16266 launching
Process 16266 launched: '/usr/local/bin/efreetd' (x86_64)
Process 16266 stopped
* thread #1, name = 'efreetd', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00000008002c9cb8 libeina.so.1`_scheduled_entry_cb(f=0x00000008029ecd90, value=Eina_Value @ 0x00000008043580d0) at eina_promise.c:428
   425 	_scheduled_entry_cb(Eina_Future *f, Eina_Value value)
   426 	{
   427 	   // This function is called by the scheduler, so it has to be defined
-> 428 	   Eina_Future_Scheduler *scheduler = f->scheduled_entry->scheduler;
   429 	
   430 	   eina_lock_take(&_pending_futures_lock);
   431 	   _pending_futures = eina_list_remove(_pending_futures, f);
(lldb) bt
* thread #1, name = 'efreetd', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x00000008002c9cb8 libeina.so.1`_scheduled_entry_cb(f=0x00000008029ecd90, value=Eina_Value @ 0x00000008066dba40) at eina_promise.c:428
    frame #1: 0x00000008009fb085 libeo.so.1`_futures_dispatch_cb(data=<unavailable>, ev=<unavailable>) at eo_base_class.c:1806
    frame #2: 0x00000008009fa536 libeo.so.1`_event_callback_call(obj_id=0x00004000000000a0, pd=<unavailable>, desc=<unavailable>, event_info=<unavailable>, legacy_compare=<unavailable>) at eo_base_class.c:2073
    frame #3: 0x00000008009f27d6 libeo.so.1`efl_event_callback_call(obj=0x00004000000000a0, desc=0x00000008003703f0, event_info=0x0000000000000000) at eo_base_class.c:2161
    frame #4: 0x000000080032eb85 libecore.so.1`_ecore_main_loop_iterate_internal(obj=0x00004000000000a0, pd=0x00000008016120b0, once_only=0) at ecore_main.c:2413
    frame #5: 0x000000080032f12d libecore.so.1`_ecore_main_loop_begin(obj=0x00004000000000a0, pd=0x00000008016120b0) at ecore_main.c:1200
    frame #6: 0x00000008003346b6 libecore.so.1`_efl_loop_begin(obj=0x00004000000000a0, pd=0x00000008016120b0) at efl_loop.c:57
    frame #7: 0x0000000800334136 libecore.so.1`efl_loop_begin(obj=0x00004000000000a0) at efl_loop.eo.c:28
    frame #8: 0x000000080032f223 libecore.so.1`ecore_main_loop_begin at ecore_main.c:1285
    frame #9: 0x000000000020449c efreetd`main(argc=<unavailable>, argv=<unavailable>) at efreetd.c:82
    frame #10: 0x000000000020411b efreetd`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76

| LLDB (F1) | Target (F2) | Process (F3) | Thread (F4) | View (F5) | Help (F6) |
┌──<Sources>──────────────────────────────────────────────────────────────────────────────────────┐┌──<Threads>────────────┐
│ libeina.so.1`_scheduled_entry_cb                                                                ││ ◆─process 16266       │
│  418 │            }                                                                             ││                       │
│  419 │          _eina_promise_value_steal_and_link(scheduler, next_value, f);                   ││                       │
│  420 │       }                                                                                  ││                       │
│  421 │     else _eina_future_dispatch(scheduler, f, next_value);                                ││                       │
│  422 │  }                                                                                       ││                       │
│  423 │                                                                                          ││                       │
│  424 │ static void                                                                              ││                       │
│  425 │ _scheduled_entry_cb(Eina_Future *f, Eina_Value value)                                    ││                       │
│  426 │ {                                                                                        ││                       │
│  427 │    // This function is called by the scheduler, so it has to be defined                  ││                       │
│  428 │◆   Eina_Future_Scheduler *scheduler = f->scheduled_entry->scheduler;                     ││                       │
│  429 │                <<< Thread 1: signal SIGSEGV: invalid address (fault address: 0x0)       ││                       │
│  430 │    eina_lock_take(&_pending_futures_lock);                                               ││                       │
│  431 │    _pending_futures = eina_list_remove(_pending_futures, f);                             ││                       │
│  432 │    eina_lock_release(&_pending_futures_lock);                                            ││                       │
│  433 │    f->scheduled_entry = NULL;                                                            ││                       │
│  434 │    _eina_future_dispatch(scheduler, f, value);                                           ││                       │
│  435 │ }                                                                                        ││                       │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘│                       │
┌──<Variables>────────────────────────────────────────────────────────────────────────────────────┐│                       │
│ ├─◆─(Eina_Promise *) promise = 0x00000008029ece50                                               ││                       │
│ ├─◆─(Eina_Future *) next = 0x00000008029ecdd0                                                   ││                       │
│ ├─◆─(Eina_Future *) prev = 0x0000000000000000                                                   ││                       │
│ ├─◆─(Eina_Future_Cb) cb = 0x0000000000000000                                                    ││                       │
│ ├─◆─(const void *) data = 0x0000000000000000                                                    ││                       │
│ ├─◆─(Eina_Future **) storage = 0x0000000000000000                                               ││                       │
│ └─◆─(Eina_Future_Schedule_Entry *) scheduled_entry = 0x0000000000000000                         ││                       │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘└───────────────────────┘
Process: 16266    stopped               Thread: 100970      Frame:   0  PC = 0x00000008002c9cb8

By the way, it seems that when efreetd cannot build the cache (which is actually the case here), a new E profile cannot be created.

Another test: I started efreetd on a second screen while still in the login manager (so, without E). It started correctly, without errors, as the only efreetd process. Then I opened a session from the login manager: E tried to start new efreetd processes, and at one moment I saw five of them; then only two stayed running (the old one from the second screen and a new one). E started, saying that efreetd could not update the cache. Later the first instance (the one started manually) crashed and a new one was started.

Maybe this helps...

cedric added a comment. Dec 6 2019, 9:20 AM

Do you have something like valgrind available? The future seems to be completely empty apart from next and promise, which is weird.

Some changes here:

  1. I removed --warn-unresolved-symbols from my LDFLAGS
  2. I rebuilt all from Git with debug CFLAGS
  3. I removed additional search icon dirs from E config

Now I always have ONE instance of efreetd running (it still segfaults sometimes), and I have efreet_desktop_cache_create and efreet_icon_cache_create, started by efreetd, left as zombies (status Z).

BT of efreetd in this state:

* thread #1, name = 'efreetd'
  * frame #0: 0x0000000801b54ba8 libc.so.7`_accept + 8
    frame #1: 0x00000008018506e6 libthr.so.3`___lldb_unnamed_symbol28$$libthr.so.3 + 54
    frame #2: 0x0000000801e99228 libecore_con.so.1`efl_net_accept4(fd=41, addr=0x00007fffffffd678, addrlen=0x00007fffffffd66c, close_on_exec='\x01') at efl_net_server_fd.c:45:20
    frame #3: 0x0000000801e98582 libecore_con.so.1`_efl_net_server_fd_process_incoming_data(o=0x00004000000025b1, pd=0x0000000804a8d4d0) at efl_net_server_fd.c:468:13
    frame #4: 0x0000000801e977ed libecore_con.so.1`efl_net_server_fd_process_incoming_data(obj=0x00004000000025b1) at efl_net_server_fd.eo.c:136:7
    frame #5: 0x0000000801e991ec libecore_con.so.1`_efl_net_server_fd_event_read(data=0x0000000000000000, event=0x00007fffffffd800) at efl_net_server_fd.c:71:4
    frame #6: 0x0000000802c48b38 libeo.so.1`_event_callback_call(obj_id=0x00004000000025b1, pd=0x0000000804a8d460, desc=0x0000000800fbf880, event_info=0x0000000000000000, legacy_compare='\0') at eo_base_class.c:2138:19
    frame #7: 0x0000000802c479b0 libeo.so.1`_efl_object_event_callback_call(obj_id=0x00004000000025b1, pd=0x0000000804a8d460, desc=0x0000000800fbf880, event_info=0x0000000000000000) at eo_base_class.c:2199:11
    frame #8: 0x0000000802c3d760 libeo.so.1`efl_event_callback_call(obj=0x00004000000025b1, desc=0x0000000800fbf880, event_info=0x0000000000000000) at eo_base_class.c:2202:7
    frame #9: 0x0000000800d45872 libecore.so.1`_efl_loop_fd_read_cb(data=0x00004000000025b1, fd_handler=0x0000000804a833d0) at efl_loop_fd.c:34:9
    frame #10: 0x0000000800d3b0b2 libecore.so.1`_ecore_call_fd_cb(func=(libecore.so.1`_efl_loop_fd_read_cb at efl_loop_fd.c:29), data=0x00004000000025b1, fd_handler=0x0000000804a833d0) at ecore_private.h:506:11
    frame #11: 0x0000000800d3a6ba libecore.so.1`_ecore_main_fd_handlers_call(obj=0x00004000000001a8, pd=0x0000000804a9e0b0) at ecore_main.c:2114:24
    frame #12: 0x0000000800d37249 libecore.so.1`_ecore_main_loop_iterate_internal(obj=0x00004000000001a8, pd=0x0000000804a9e0b0, once_only=0) at ecore_main.c:2489:4
    frame #13: 0x0000000800d374d2 libecore.so.1`_ecore_main_loop_begin(obj=0x00004000000001a8, pd=0x0000000804a9e0b0) at ecore_main.c:1200:16
    frame #14: 0x0000000800d425ed libecore.so.1`_efl_loop_begin(obj=0x00004000000001a8, pd=0x0000000804a9e0b0) at efl_loop.c:57:4
    frame #15: 0x0000000800d40e5d libecore.so.1`efl_loop_begin(obj=0x00004000000001a8) at efl_loop.eo.c:28:7
    frame #16: 0x0000000800d376c4 libecore.so.1`ecore_main_loop_begin at ecore_main.c:1285:4
    frame #17: 0x0000000000403ab0 efreetd`main(argc=1, argv=0x00007fffffffe470) at efreetd.c:82:4
    frame #18: 0x000000000040362d efreetd`_start + 141

So I removed the x bit from efreetd, killed all running instances, restarted E, put the x bit back, and manually restarted efreetd under truss. It stopped with the following error:

ERR<32363>:efreet_cache ../src/lib/efreet/efreet_cache.c:147 _ipc_launch() Timeout in trying to start and then connect to efreetd

The truss log is attached.

Backtrace of efreetd.core:

(lldb) bt
* thread #1, name = 'efreetd', stop reason = signal SIGABRT
  * frame #0: 0x0000000801b349ba libc.so.7`__sys_thr_kill + 10
    frame #1: 0x0000000801b34984 libc.so.7`raise + 52
    frame #2: 0x0000000801b348f9 libc.so.7`abort + 73
    frame #3: 0x0000000800a9ec45 libeina.so.1`_eina_mmap_safe_sigbus(sig=10, siginfo=0x00007fffffffd3b0, ptr=0x00007fffffffd040) at eina_mmap.c:124:16
    frame #4: 0x0000000801853cfe libthr.so.3`___lldb_unnamed_symbol101$$libthr.so.3 + 222
    frame #5: 0x00000008018532bf libthr.so.3`___lldb_unnamed_symbol82$$libthr.so.3 + 319
    frame #6: 0x00007ffffffff003
    frame #7: 0x0000000800d3b113 libecore.so.1`_ecore_main_loop_spin_core(obj=0x000040000000019d, pd=0x0000000804a9e0b0) at ecore_main.c:2310:9
    frame #8: 0x0000000800d3a51d libecore.so.1`_ecore_main_loop_spin_timers(obj=0x000040000000019d, pd=0x0000000804a9e0b0) at ecore_main.c:2341:22
    frame #9: 0x0000000800d371d7 libecore.so.1`_ecore_main_loop_iterate_internal(obj=0x000040000000019d, pd=0x0000000804a9e0b0, once_only=0) at ecore_main.c:2470:28
    frame #10: 0x0000000800d374d2 libecore.so.1`_ecore_main_loop_begin(obj=0x000040000000019d, pd=0x0000000804a9e0b0) at ecore_main.c:1200:16
    frame #11: 0x0000000800d425ed libecore.so.1`_efl_loop_begin(obj=0x000040000000019d, pd=0x0000000804a9e0b0) at efl_loop.c:57:4
    frame #12: 0x0000000800d40e5d libecore.so.1`efl_loop_begin(obj=0x000040000000019d) at efl_loop.eo.c:28:7
    frame #13: 0x0000000800d376c4 libecore.so.1`ecore_main_loop_begin at ecore_main.c:1285:4
    frame #14: 0x0000000000403ab0 efreetd`main(argc=1, argv=0x00007fffffffe330) at efreetd.c:82:4
    frame #15: 0x000000000040362d efreetd`_start + 141

One more backtrace from a crashed efreetd:

(lldb) bt
* thread #1, name = 'efreetd', stop reason = signal SIGSEGV
  * frame #0: 0x0000000800aa9d3c libeina.so.1`_scheduled_entry_cb(f=0x0000000808f46b50, value=Eina_Value @ 0x00007fffffffd800) at eina_promise.c:429:59
    frame #1: 0x0000000802c4a8e7 libeo.so.1`_futures_dispatch_cb(data=0x0000000804a31980, ev=0x00007fffffffd8b0) at eo_base_class.c:1847:9
    frame #2: 0x0000000802c48a3e libeo.so.1`_event_callback_call(obj_id=0x0000400000000134, pd=0x0000000804a9e040, desc=0x0000000800fbf590, event_info=0x0000000000000000, legacy_compare='\0') at eo_base_class.c:2114:24
    frame #3: 0x0000000802c479b0 libeo.so.1`_efl_object_event_callback_call(obj_id=0x0000400000000134, pd=0x0000000804a9e040, desc=0x0000000800fbf590, event_info=0x0000000000000000) at eo_base_class.c:2199:11
    frame #4: 0x0000000802c3d760 libeo.so.1`efl_event_callback_call(obj=0x0000400000000134, desc=0x0000000800fbf590, event_info=0x0000000000000000) at eo_base_class.c:2202:7
    frame #5: 0x0000000800d36fbf libecore.so.1`_ecore_main_loop_iterate_internal(obj=0x0000400000000134, pd=0x0000000804a9e0b0, once_only=0) at ecore_main.c:2386:9
    frame #6: 0x0000000800d374d2 libecore.so.1`_ecore_main_loop_begin(obj=0x0000400000000134, pd=0x0000000804a9e0b0) at ecore_main.c:1200:16
    frame #7: 0x0000000800d425ed libecore.so.1`_efl_loop_begin(obj=0x0000400000000134, pd=0x0000000804a9e0b0) at efl_loop.c:57:4
    frame #8: 0x0000000800d40e5d libecore.so.1`efl_loop_begin(obj=0x0000400000000134) at efl_loop.eo.c:28:7
    frame #9: 0x0000000800d376c4 libecore.so.1`ecore_main_loop_begin at ecore_main.c:1285:4
    frame #10: 0x0000000000403ab0 efreetd`main(argc=1, argv=0x00007fffffffe498) at efreetd.c:82:4
    frame #11: 0x000000000040362d efreetd`_start + 141

I tried to start it under valgrind, but the log is not really usable: no sources loaded (?)

Valgrind works better on another PC, log attached.

well i put in a "don't crash" workaround to the crash above... but without knowing the full flow of how it gets there, that's the best i can do.
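
(The patch itself is not attached to this task. For illustration only, here is a minimal sketch of the shape such a guard could take in _scheduled_entry_cb, assuming it simply bails out when scheduled_entry has already been cleared; the actual commit may differ:)

static void
_scheduled_entry_cb(Eina_Future *f, Eina_Value value)
{
   /* hypothetical guard, not the actual patch: the core dumps above
    * show f->scheduled_entry == NULL at this point */
   if (!f->scheduled_entry)
     {
        eina_value_flush(&value);
        return;
     }
   Eina_Future_Scheduler *scheduler = f->scheduled_entry->scheduler;

   eina_lock_take(&_pending_futures_lock);
   _pending_futures = eina_list_remove(_pending_futures, f);
   eina_lock_release(&_pending_futures_lock);
   f->scheduled_entry = NULL;
   _eina_future_dispatch(scheduler, f, value);
}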

> well i put in a "don't crash" workaround to the crash above...

Thanks! I don't see any patch attached, is it in master?

> but without knowing the full flow of how it gets there, that's the best i can do.

How can we go deeper? ;)

So, the new version does not crash anymore; it freezes.
Started from the command line, after some time running, efreetd responds neither to Ctrl-C nor to SIGTERM.
Backtrace:

(lldb) bt
* thread #1, name = 'efreetd'
  * frame #0: 0x0000000801ac8ba8 libc.so.7`_accept + 8
    frame #1: 0x00000008017c46e6 libthr.so.3`___lldb_unnamed_symbol28$$libthr.so.3 + 54
    frame #2: 0x0000000801deea24 libecore_con.so.1`_efl_net_server_fd_process_incoming_data [inlined] efl_net_accept4(fd=41, addr=0x0000000801e147fa, addrlen=0x000000b900000080, close_on_exec='\x01') at efl_net_server_fd.c:45:20
    frame #3: 0x0000000801deea1c libecore_con.so.1`_efl_net_server_fd_process_incoming_data(o=0x000040000000256c, pd=<unavailable>) at efl_net_server_fd.c:468
    frame #4: 0x0000000801dee176 libecore_con.so.1`efl_net_server_fd_process_incoming_data(obj=0x000040000000256c) at efl_net_server_fd.eo.c:136:7
    frame #5: 0x0000000802b82599 libeo.so.1`_event_callback_call(obj_id=0x000040000000256c, pd=<unavailable>, desc=<unavailable>, event_info=<unavailable>, legacy_compare=<unavailable>) at eo_base_class.c:2139:19
    frame #6: 0x0000000802b7a5a6 libeo.so.1`efl_event_callback_call(obj=0x000040000000256c, desc=0x0000000800f4c6d0, event_info=0x0000000000000000) at eo_base_class.c:2203:7
    frame #7: 0x0000000800d0097f libecore.so.1`_efl_loop_fd_read_cb(data=0x000040000000256c, fd_handler=0x0000000804a833d0) at efl_loop_fd.c:34:9
    frame #8: 0x0000000800cf8840 libecore.so.1`_ecore_main_loop_iterate_internal [inlined] _ecore_call_fd_cb(func=<unavailable>, data=<unavailable>, fd_handler=<unavailable>) at ecore_private.h:506:11
    frame #9: 0x0000000800cf883a libecore.so.1`_ecore_main_loop_iterate_internal at ecore_main.c:2114
    frame #10: 0x0000000800cf87d5 libecore.so.1`_ecore_main_loop_iterate_internal(obj=0x0000400000000163, pd=0x0000000804a9d0a8, once_only=0) at ecore_main.c:2489
    frame #11: 0x0000000800cf8cbd libecore.so.1`_ecore_main_loop_begin(obj=0x0000400000000163, pd=0x0000000804a9d0a8) at ecore_main.c:1200:16
    frame #12: 0x0000000800cfe256 libecore.so.1`_efl_loop_begin(obj=0x0000400000000163, pd=0x0000000804a9d0a8) at efl_loop.c:57:4
    frame #13: 0x0000000800cfdcd6 libecore.so.1`efl_loop_begin(obj=0x0000400000000163) at efl_loop.eo.c:28:7
    frame #14: 0x0000000800cf8db3 libecore.so.1`ecore_main_loop_begin at ecore_main.c:1285:4
    frame #15: 0x000000000040394c efreetd`main(argc=<unavailable>, argv=<unavailable>) at efreetd.c:82:4
    frame #16: 0x000000000040362d efreetd`_start + 141

Restarting E produces another efreetd process with a zombie child efreet_icon_cache_create...

a blocking accept? that's just stupid. that should not be happening. we won't be doing an accept unless the fd says it's available for reads.
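
(For context, the expected pattern is roughly the following; a minimal sketch, not EFL's actual code: with a non-blocking listener that is only accepted after poll() reports it readable, accept() returns EAGAIN instead of blocking, even if the connection disappears in between:)

/* sketch of the poll-then-accept pattern: a non-blocking listener
 * cannot hang in accept() the way the backtrace above shows */
#include <poll.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <errno.h>

static int
accept_when_ready(int listen_fd)
{
   struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

   fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL, 0) | O_NONBLOCK);
   if (poll(&pfd, 1, -1) <= 0) return -1;   /* wait until readable */
   int fd = accept(listen_fd, NULL, NULL);  /* should not block now */
   if ((fd < 0) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)))
     return -1; /* spurious wakeup: the connection vanished again */
   return fd;
}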

It seems that on a manual restart of Enlightenment (Ctrl-Alt-End), EFL does not read XDG_RUNTIME_DIR correctly.
The variable is set in .zshrc; a manually started efreetd uses

/var/run/user/1001/.ecore/efreetd/0

Enlightenment started via startx uses it too (at least, it tries to), but after a manual restart of E I see another efreetd process, using

/tmp/xdg-g5E9qA/.ecore/efreetd/0

and at the same time the first efreetd process becomes frozen; its backtrace is the last one I posted (efl_net_accept4).
According to sockstat, Enlightenment is connected to

/tmp/xdg-g5E9qA/.ecore/efreetd/0

I built a small test program to check whether I can bind two processes to the same socket using default socket parameters. It is impossible, so the concept of using the socket as a flag to check that only one process is started seems to be OK. Probably there is another problem around.
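
(A minimal sketch of that test, assuming plain SOCK_STREAM Unix sockets with default parameters and a made-up path; the second bind() to an existing path fails with EADDRINUSE:)

/* two servers cannot bind the same Unix socket path: the second
 * bind() fails with EADDRINUSE while the socket file exists */
#include <sys/socket.h>
#include <sys/un.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
bind_unix(const char *path)
{
   struct sockaddr_un sa;
   int fd = socket(AF_UNIX, SOCK_STREAM, 0);

   memset(&sa, 0, sizeof(sa));
   sa.sun_family = AF_UNIX;
   strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
   if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
     {
        perror("bind"); /* second caller gets EADDRINUSE */
        close(fd);
        return -1;
     }
   listen(fd, 5);
   return fd;
}

int
main(void)
{
   int first  = bind_unix("/tmp/efreetd-test-sock");  /* succeeds */
   int second = bind_unix("/tmp/efreetd-test-sock");  /* fails: EADDRINUSE */
   return ((first >= 0) && (second < 0)) ? 0 : 1;
}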

if you restarted e - e may have deleted the old xdg dir and replaced it - oddly with a new xdg dir of the same name. the old efreetd will eventually die when all its clients go away. i changed e to not delete on shutdown but to have enlightenment_start do that instead.

but that still doesn't explain the blocking accept unless freebsd somehow can't differentiate the 2 sockets if they still have the same name (the old one that was deleted somehow still being triggered with client activity from the new one?)... that would sound totally odd - i'm clutching at straws here.

I could reproduce the blocking accept with my test program, but not in 100% of cases.
The key point: unlink the active socket and rebind it. Sometimes the old server and the client connected to it continue working correctly; sometimes stopping the second server blocks communication with the first one.
Anyway, it seems that an incorrect unlink is the root of the problem, as multiple servers on the same socket name is definitely not a good configuration. I see the conditional unlink in _efl_net_server_unix_bind of efl_net_server_unix, but I don't understand how and where pd->unlink_before_bind is set (it seems that _efl_net_server_unix_unlink_before_bind_set sets it, but I don't see it used anywhere in the EFL/E code).
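
(The reproduction, sketched with the same bind_unix() helper and path as in the sketch above: after unlink(), a second bind() to the same name succeeds, and two servers stay alive on different inodes:)

/* reproduction sketch, reusing bind_unix() and its includes from the
 * sketch above: unlink the live socket, then rebind the same name */
int
main(void)
{
   int first = bind_unix("/tmp/efreetd-test-sock");   /* original server */
   unlink("/tmp/efreetd-test-sock");                  /* the "incorrect unlink" */
   int second = bind_unix("/tmp/efreetd-test-sock");  /* now succeeds */
   /* clients connected before the unlink keep talking to `first`;
    * new connects by name reach `second` */
   return ((first >= 0) && (second >= 0)) ? 0 : 1;
}
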
There is a good article on the subject:
https://gavv.github.io/articles/unix-socket-reuse/
A workaround proposed on Stackoverflow:
https://stackoverflow.com/questions/7405932/how-to-know-whether-any-process-is-bound-to-a-unix-domain-socket
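
(The gist of that workaround, sketched: probe the path with connect(); ECONNREFUSED means the file exists but nothing is listening, so the stale file can be unlinked before bind():)

/* sketch of the Stack Overflow workaround: detect a stale socket file
 * by probing it with connect() before binding */
#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

static int
socket_is_stale(const char *path)
{
   struct sockaddr_un sa;
   int fd = socket(AF_UNIX, SOCK_STREAM, 0);
   int stale = 0;

   memset(&sa, 0, sizeof(sa));
   sa.sun_family = AF_UNIX;
   strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
   if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
     stale = (errno == ECONNREFUSED); /* file exists, nobody listening */
   close(fd);
   return stale; /* if stale, unlink(path) and bind() safely */
}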

unlink_before_bind is an option in the efl.net api. basically it's something that can be used to forcibly take over a socket even if something is still listening on it. it's not actually USED in efl anywhere, so it's an optional code path. you can ignore it as it's not relevant because it's not enabled.

the only issue may have been E rm -rf'ing the xdg runtime dir when e restarts. it wouldn't affect you until you restart e. i moved that to enlightenment_start so it'll only be deleted when enlightenment_start exits which should be the end of your session then.

i suspect bsd has issues with a same-named socket being active when one of them has been deleted and replaced, and this leads to nasty side-effects. it SHOULDN'T, as it is a different socket with a different inode/fd etc., but perhaps the way it's done means somewhere in the kernel some things get looked up by NAME and thus it gets confused and may switch between the old and the new socket.

I've just tested the new version on my test laptop. The situation is definitely better!! :)
I have only one instance of efreetd running, E communicates correctly with this instance, and the cache is correctly updated. Restarting E does not change this; everything continues to work correctly.
BUT!! There is still one problem. If I kill the efreetd process, it is restarted automatically, but then sometimes (not always!) it freezes, producing two or three zombies. If I kill it again, a new efreetd is started, but then it crashes (or exits) immediately and no new instance is created. If I restart E or start another EFL app at this point, everything comes back to the correct situation. It seems that killing efreetd (or crashing it) leaves the socket in place, so the new instance has trouble binding to it.
The thing I don't understand: if the socket is not deleted and not unlinked, how can the situation come back to the normal state?? As I understand it, with the current logic, a killed (or crashed) efreetd would prevent other instances from working...
As for deleting a working socket, I cannot produce any strange effects with my test program. If I delete a working socket, the server and the client continue working as they did before the deletion; a new server instance creates a new socket, and a new client correctly communicates with the new server (so I can have two client-server pairs working correctly using the same socket name but different FDs).

> It seems that killing efreetd (or crashing it) leaves the socket in place, so the new instance has trouble binding to it.

Correct. Unless there is a clean shutdown, the socket file and dir will still be there; no code ran to clean up the socket because efreetd was killed. There is not much we can do about that, other than perhaps on freebsd ONLY use the lock files and ALWAYS forcibly unlink the socket file if we manage to get the file lock.
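
(Sketched, that scheme could look roughly like this; the flock()-based locking and the paths are assumptions, not actual EFL code:)

/* sketch of the proposed scheme: take an exclusive lock on a separate
 * lock file; whoever holds the lock owns the name and may forcibly
 * unlink any stale socket before binding */
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

static int
claim_socket_path(const char *lock_path, const char *sock_path)
{
   int lock_fd = open(lock_path, O_CREAT | O_RDWR, 0600);

   if (lock_fd < 0) return -1;
   if (flock(lock_fd, LOCK_EX | LOCK_NB) < 0)
     {
        close(lock_fd); /* another live efreetd holds the lock */
        return -1;
     }
   /* the kernel releases the lock if this process dies, so any
    * leftover socket file here is guaranteed to be stale */
   unlink(sock_path);
   return lock_fd; /* keep open for the lifetime of the server */
}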

> The thing I don't understand: if the socket is not deleted and not unlinked, how can the situation come back to the normal state?? As I understand it, with the current logic, a killed (or crashed) efreetd would prevent other instances from working...

Not on linux. it works fine... (discussed on IRC) :)

> As for deleting a working socket, I cannot produce any strange effects with my test program. If I delete a working socket, the server and the client continue working as they did before the deletion; a new server instance creates a new socket, and a new client correctly communicates with the new server (so I can have two client-server pairs working correctly using the same socket name but different FDs).

Then I am struggling to explain things like a blocking accept() ... :(

I've just tested the last versions of EFL and E from git.
Everything works fine until I kill -9 the efreetd process (I think this situation is rather simple to reproduce). Then efreetd is restarted, but in a strange state, not responding, with two zombie processes (cache update and desktop). After repeating kill -9 multiple times, I finally get a working efreetd, and restarting E reconnects it to efreetd.