Page MenuHomePhabricator

ecore_ipc_server_add can't create CLOEXEC sockets, E leaks the sockets to all child processes
Closed, ResolvedPublic

Description

enlightenment creates a server from e_ipc_init with ecore_ipc_server_add, this creates a socket that is leaked to all processes launched from E.

It ends up creating a socket with efl_net_socket4() with close_on_exec derived from some object property that doesn't appear to have any way to set, since this all gets handled with one efl call.

ManMower created this task.Jul 12 2017, 9:42 AM

close_on_exec property is documented

[[Controls Close-on-Exec() using FD_CLOEXEC.

  Children socket will inherit the server's setting by
  default. One can change the behavior using each instance
  @Efl.Io.Closer.close_on_exec.set.
]]

errr... is that sensible? shouldn't it be off by default? why would you want to inherit such fd's BY DEFAULT?

it's tempting to change the default...

I can't check right now, but AFAIR the ecore con/ipc had a flag to set cloexec and would reflect to efl_net

As for default, it's just to match the OS default

I'm all in favour of changing that default - to something defined a little better than "OS default" - which basically means anyone writing portable code that doesn't manually change it is doing it wrong?

I can't really imagine a case where we'd actually want to leave these open to child processes - and making the default CLOEXEC set means we can avoid some potential races if a user is multi-threaded and one thread may fork between creation and call to cloexec setter. (E does not do this, I'm fishing for hypothetical reasons to advance my agenda. ;)

If there's actually an API to set this before or after creation then E is easily updated and I apologize for an invalid showstopper. I've not been able to find it, but I've found I'm terrible at grepping for ideas in generated headers, so there's a real possibility I just missed it.

i'm with @ManMower - OS default here is one of the few times i have to say "unix screwed up". fd's should be explicitly marked for survival across exec. across fork? sure. survive. exec? no.

the reason is - it's totally unexepcted. you just run some ipc service OR connect to some network resource and then some program you run can access all those fd's if they want UNLESS you go to effort every time of setting close on exec. i think we should do that by default as what situations do you want that are common where an ipc or network server fd or client fd's are visible to processes you exec?

but @ManMower is right. i can find fd's leaked from e into child processes e runs that have to do with e's ipc server. they should not be there and e shouldn't have to go do special work to ensure they are not (by default)...

I had some minutes to check and the legacy API had no such flag, so indeed the legacy API must set close_on_exec to true regardless of the default.

Will fix this and close this ticket to let you know.

still no time to test/debug, but an inspection of the source says the logic should be correct. Could you check the logs for ERR messages? What about DBG messages in ecore-con?

ecore_con_legacy.c:

  • starts with efl_net_server_fd_close_on_exec_set(inner_server, EINA_TRUE) before socket is created at efl_net_server_serve()

efl_net_server_fd.c:

  • starts with close_on_exec=True (due above) and applied during loop_fd_set().
  • close_on_exec_set() calls eina_file_close_on_exec(), on failure it would ERR

Do you know if that's the listening socket (server) or the accepted client? Or both?

Sorry, you're confusing me - I don't know which functions you're talking about in those files or how they're relevant.

No ERRs before the call to efl_net_socket4() in E startup.

If you need specific debug env vars set, you'll have to tell me exactly what to set.

Here's a bt if it helps:
#0 efl_net_socket4 (domain=1, type=1, protocol=0, close_on_exec=0 '\000') at lib/ecore_con/ecore_con.c:577
#1 0x00007f72dafd840f in _efl_net_server_unix_bind (o=0x8000000216f18fb8, pd=0xc87a59a030) at lib/ecore_con/efl_net_server_unix.c:65
#2 0x00007f72dafd8b42 in _efl_net_server_unix_efl_net_server_serve (o=0x8000000216f18fb8, pd=0xc87a59a030, address=0xc87a5751c0 "/run/user/1000/e-derekf@2ba8ec5c/31643|0") at lib/ecore_con/efl_net_server_unix.c:229
#3 0x00007f72dafabc6c in efl_net_server_serve (obj=0x8000000216f18fb8, address=0xc87a5751c0 "/run/user/1000/e-derekf@2ba8ec5c/31643|0") at lib/ecore_con/efl_net_server.eo.c:9
#4 0x00007f72dcf34a4f in ecore_ipc_server_add (type=ECORE_IPC_LOCAL_SYSTEM, name=0x7ffe38715ac0 "/run/user/1000/e-derekf@2ba8ec5c/31643", port=0, data=0x0) at lib/ecore_ipc/ecore_ipc.c:485
#5 0x000000c878ad38e1 in e_ipc_init () at src/bin/e_ipc.c:87
#6 0x000000c8789cd92b in main (argc=1, argv=0x7ffe3874fd88) at src/bin/e_main.c:611

That's how we get to efl_net_socket4() with close_on_exec == 0...

I don't see either of those files in the bt.

more poking around in gdb seems to indicate that efl_net_server_serve() is called before any call to efl_net_server_fd_close_on_exec_set() - the only call to fd_close_on_exec_set() in my run is a descendant of net_server_serve.

many thanks, I was looking into ecore_con_server_add() from ecore_con_legacy.c... from your bt it shows ecore_ipc_server_add() which does no close_on_exec... clearly a bug. Prior to efl_net_server_serve() we must have a block like in ecore_con_legacy.c:

if (efl_isa(svr->server, EFL_NET_SERVER_FD_CLASS))
  {
     efl_net_server_fd_close_on_exec_set(svr->server, EINA_TRUE);
     efl_net_server_fd_reuse_address_set(svr->server, EINA_TRUE);
     efl_net_server_fd_reuse_port_set(svr->server, EINA_TRUE);
  }

but not just that, the other efl_isa() to cope with legacy behavior are needed, like for tcp, udp, ssl, unix...

will fix ASAP, thanks and sorry about the fuzz

iscaro added a subscriber: iscaro.Jul 26 2017, 9:56 AM

Hello everyone. I created a patch fixing the problem. Please, send me some thoughts.
I also plan to set CLOEXEC to true by default, but I'll do that in another patch.

The patch looks good to me. Gustavo also mentioned the tcp, udp, ssl, unix, etc legacy behaviour. Are these all fine and do not need further tweaking? Just this patch plus CLOEXEC as default later on?

The patch looks good to me. Gustavo also mentioned the tcp, udp, ssl, unix, etc legacy behaviour. Are these all fine and do not need further tweaking? Just this patch plus CLOEXEC as default later on?

Yep, I forgot about those (tcp, ssl, etc). I pushed a patch to devs/iscaro/conn which contains the missing legacy behaviour and another patch that sets CLOEXEC by default.

Great, thanks a lot. Giving it some review now.

@iscaro they look good to me and I went ahead and pushed them. Just added a fixed tag to make it close this issue. Thanks for working on this.
@ManMower could you please give your test case a try with latest efl HEAD and let me know how it goes?

Thanks all - I've tested and E isn't leaking that fd anymore.