I've just tested the latest versions of EFL and E from git.
Everything works fine until I kill -9 the efreetd process (I think this situation is rather simple to reproduce). Then efreetd is restarted, but in a strange state: it does not respond, and there are two zombie processes (update cache and desktop). After repeating kill -9 several times, I finally get a working efreetd, and restarting E reconnects it to efreetd.
Sat, Feb 15
To summarize the ideas (after removing the out-of-scope parts):
It seems that killing efreetd (or crashing it) leaves the socket in place, so a new instance has trouble binding to it.
Fri, Feb 14
I've just tested the new version on my test laptop. The situation is definitely better!! :)
I have only one instance of efreetd running, E communicates correctly with this instance, and the cache is correctly updated. Restarting E does not change this situation: everything continues to work correctly.
BUT!! There is still one problem. If I kill the efreetd process, it is restarted automatically, but then sometimes (not always!) it freezes, producing two or three zombies. If I kill it again, a new efreetd is started, but it crashes (or exits) immediately and no new instances are created. If I restart E or start another EFL app now, everything comes back to the correct situation. It seems that killing efreetd (or crashing it) leaves the socket in place, so a new instance has trouble binding to it.
The thing I don't understand: if the socket is not deleted and not unlinked, how can the situation come back to the normal state?? As I understand it, with the current logic, a killed (or crashed) efreetd will prevent other instances from working...
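For reference, here is a minimal sketch of how a leftover socket file can be told apart from a live server. It is not EFL code and makes no claim about what efreetd does today. The helper names `server_is_alive` and `bind_reclaiming_stale` are hypothetical. The idea: `bind()` on an existing path fails with EADDRINUSE, and a `connect()` to a socket file that nobody listens on is typically refused, which is the signal that the file is stale and can be unlinked.

```python
import errno
import os
import socket
import tempfile


def server_is_alive(path):
    """Return True if something accepts connections on the socket at `path`."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return True
    except ConnectionRefusedError:
        # The file exists but no process is listening on it.
        return False
    finally:
        probe.close()


def bind_reclaiming_stale(path):
    """Bind and listen at `path`, removing the file only if it is stale.

    Hypothetical helper for illustration; not taken from EFL.
    """
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        srv.bind(path)
    except OSError as exc:
        # Give up on unrelated errors, or if a live server owns the path.
        if exc.errno != errno.EADDRINUSE or server_is_alive(path):
            srv.close()
            raise
        os.unlink(path)  # nobody is listening: remove the leftover file
        srv.bind(path)
    srv.listen(8)
    return srv


def demo():
    """Simulate a killed server, then reclaim its socket path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        dead = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        dead.bind(path)
        dead.listen(1)
        dead.close()  # like kill -9: the fd is gone, the file stays
        stale_file_left = os.path.exists(path)
        srv = bind_reclaiming_stale(path)
        alive = server_is_alive(path)
        srv.close()
        return stale_file_left, alive
```

Note that there is a small race between the probe and the unlink if two instances start at the same moment, so a real daemon would probably want a lock file as well.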
As for deleting a working socket: I cannot produce any strange effects with my test program. If I delete a working socket, the server and the client continue working as they did before the deletion, a new server instance creates a new socket, and a new client instance communicates correctly with the new server (so I can have two client-server pairs working correctly with the same socket name but different FDs).
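The observation above can be reproduced with a few lines. This is only a sketch of that kind of test program, not the one actually used; all function names are made up for the example. It shows that unlinking the socket file removes the name but leaves the already-open FDs untouched, so a second server can bind the same name while the first pair keeps talking.

```python
import os
import socket
import tempfile


def listen_at(path):
    """Create a listening Unix stream socket bound to `path`."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    return srv


def connect_pair(srv, path):
    """Connect a client to `path`; return (client, server-side connection)."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    conn, _ = srv.accept()
    return client, conn


def echo_once(client, conn, payload):
    """Send `payload` to the server side, echo it back, return what arrived."""
    client.sendall(payload)
    conn.sendall(conn.recv(len(payload)))
    return client.recv(len(payload))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        srv1 = listen_at(path)
        c1, s1 = connect_pair(srv1, path)

        os.unlink(path)  # delete the socket file while it is in use
        old_ok_after_unlink = echo_once(c1, s1, b"old") == b"old"

        srv2 = listen_at(path)  # same name, new socket file, new listener
        c2, s2 = connect_pair(srv2, path)
        new_ok = echo_once(c2, s2, b"new") == b"new"
        old_ok_after_rebind = echo_once(c1, s1, b"old") == b"old"

        for sock in (c1, s1, c2, s2, srv1, srv2):
            sock.close()
        return old_ok_after_unlink, new_ok, old_ok_after_rebind
```

It only covers the well-behaved case; it does not try to reproduce the intermittent blocking seen when the second server is stopped.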
unlink_before_bind is an option in the efl.net api. basically it's something that can be used to forcibly take over a socket even if something is still listening on it. it's not actually USED in efl anywhere, so it's an optional code path. you can ignore this as it's not relevant because it's not enabled.
I could reproduce blocking access with my test program, but not in 100% of cases.
The key point: unlink an active socket and rebind it. Sometimes the old server and the client connected to it continue working correctly; sometimes stopping the second server blocks communication with the first one.
Anyway, it seems that an incorrect unlink is the root of the problem, as multiple servers on the same socket are definitely not a good configuration. I see the conditional unlink in _efl_net_server_unix_bind of efl_net_server_unix, but I don't understand how and where pd->unlink_before_bind is set (it seems that _efl_net_server_unix_unlink_before_bind_set is used for it, but I don't see it used in EFL/E code).
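To make the effect of such a flag concrete, here is a rough sketch of what a conditional unlink before bind does in general. It is written from the description in this thread, not from the EFL source: `bind_unix` and its `unlink_before_bind` parameter are illustrative names only. Without the flag, a second bind on the same path fails with EADDRINUSE. With it, the second server takes over the name and the first one stops receiving new clients even though it is still listening.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path, unlink_before_bind=False):
    """Bind and listen at `path`; optionally remove an existing file first.

    Illustrative only; not a copy of _efl_net_server_unix_bind.
    """
    if unlink_before_bind:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # nothing to take over
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        srv.bind(path)
    except OSError:
        srv.close()
        raise
    srv.listen(1)
    return srv


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        first = bind_unix(path)

        # Default behaviour: the path is taken, so bind() fails.
        try:
            bind_unix(path)
            default_errno = None
        except OSError as exc:
            default_errno = exc.errno

        # Forced takeover: the name now points at the second server.
        second = bind_unix(path, unlink_before_bind=True)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        conn, _ = second.accept()

        # The first server is still listening but never sees the client.
        first.setblocking(False)
        try:
            first.accept()
            first_got_client = True
        except BlockingIOError:
            first_got_client = False

        for sock in (client, conn, first, second):
            sock.close()
        return default_errno == errno.EADDRINUSE, first_got_client
```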
There is a good article on the subject:
A workaround proposed on Stackoverflow:
if you restarted e - e may have deleted the old xdg dir and replaced it - oddly with a new xdg dir of the same name. the old efreetd will eventually die when all its clients go away. i changed e to not delete on shutdown but to have enlightenment_start do that instead.
Thu, Feb 13
It seems that on a manual restart of Enlightenment (Ctrl-Alt-End), EFL does not read XDG_RUNTIME_DIR correctly.
The variable is set in .zshrc, manually started efreetd uses
Enlightenment started using startx uses it too (at least, it tries), but after a manual restart of E I see another efreetd process, using
and at the same time the first efreetd process freezes; the backtrace is the last one I posted (efl_net_accept4).
According to sockstat, Enlightenment is connected to
a blocking accept? that's just stupid. that should not be happening. we won't be doing an accept unless the fd says it's available for reads.
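For readers following along, the pattern described here (only accept when the fd is readable) usually looks something like the sketch below. This is a generic illustration and says nothing about how efl.net is actually written; `accept_if_ready` is a hypothetical name. The listening socket is also made non-blocking, which is the usual safeguard: if the pending connection disappears between the readiness check and the accept, the call returns an error instead of hanging.

```python
import os
import select
import socket
import tempfile


def accept_if_ready(srv, timeout=0.0):
    """Accept one connection if the listener is readable, else return None."""
    readable, _, _ = select.select([srv], [], [], timeout)
    if not readable:
        return None
    try:
        conn, _ = srv.accept()
    except BlockingIOError:
        # Readiness was reported, but the client is gone by now.
        # With a non-blocking listener this is an error, not a hang.
        return None
    return conn


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(path)
        srv.listen(1)
        srv.setblocking(False)

        nothing_pending = accept_if_ready(srv) is None

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        conn = accept_if_ready(srv, timeout=1.0)
        accepted = conn is not None

        for sock in (client, conn, srv):
            if sock is not None:
                sock.close()
        return nothing_pending, accepted
```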
So, the new version does not crash anymore; it freezes.
Started from the command line, after running for some time efreetd responds to neither Ctrl-C nor SIGTERM.
Wed, Feb 12
Thanks! I don't see any patch attached, is it in master?
well i put in a "don't crash" workaround to the crash above... but without knowing the full flow of how it gets there, that's the best i can do.
Tue, Feb 11
Valgrind works better on another PC, log attached.
Yeah, that ticket includes discussion about how to document these config values (starting at T7356#124323) but we never got to any conclusion.
The global config is in the same state of disarray as the gestures config.
Vaguely, but not anything specific. I'm just talking about it in the sense that we should document the config values in the gesture classes.
This is related to the now-ancient discussion about how to handle configuration in a generic way, right?
I'm talking about T7383.
The biggest issue here is the config values. We must document these, which means they effectively become hardcoded. Unfortunately, our config value names are all bound to the legacy glayer elm widget, and so they have names like glayer_double_tap_timeout which is not good. A decision must be made about what to do here.
One more backtrace of a crashed efreetd:
Mon, Feb 10
So how's the user of an EO struct supposed to put the information inside it?
IMO Eo structs are not mutable per se. C and C++ structs are mutable.
If C# devs are not used to mutable structs, then we should not provide them, yeah.
What do you suggest we do, though? EO structs ARE mutable.
@felipealmeida So, I've found rEFL581bec9598943cc9274dfe7db1a73a4c878c3cdd, which makes struct immutable because of C# recommendations (and there's the reasoning I was looking for, too). Should we propose to make EO structs generate classes instead, then? Where could I read/who to talk about if it would or not go against the idea of structs?
Besides the name discussion, the API looks fine.
With the patches applied, this looks very fine.
Sun, Feb 9
I think this looks good now :) And we can move this to "stabilized".
I think this is just someone missing the setter. The setter should exist IMO. @jptiz can you look at this Monday? Thanks.
Fri, Feb 7
I don't know why that was done that way, but seems a bit arbitrary to me. Maybe @felipealmeida knows?
Oops, just realized actually structs don't have properties (https://www.enlightenment.org/contrib/docs/eo). Maybe changing an eo-struct would require creating a new one with modified values?
As far as I can see from struct_definition.hh, fields are intentionally read-only (I don't know the reason, though), while properties may be writable. Should we change that rule and add a setter for them, then? Where could I find the reasoning about which members should be read-only, etc.?
Okay, so we'll expand here.
Besides yours, we have no other prev_ and one previous_, one cur_ and 6 current_. I'm pretty sure we are striving for explicitness everywhere else.
point_add doesn't really work, since it is also used for updating the current state of an already-added touch point.
Thu, Feb 6
Some changes here:
- I removed --warn-unresolved-symbols from my LDFLAGS
- I rebuilt all from Git with debug CFLAGS
- I removed additional search icon dirs from E config