Interesting stuff on E

Interesting things you may want to know about E, EFL and development in general.

Simple EFL vs QML comparison, again!

A few weeks ago someone who knew QtQuick wrote a small application to compare it with EFL. You can see it here. I have been looking for years for a decent way to compare them. So here we are. Thanks a lot!

The first thing to point out is that a well done benchmark is always useful. It puts things in context and keeps you from staying on your isolated island, believing that your flint is the most awesome piece of technology when the rest of the world uses microwaves to cook. It is a good opportunity to know where we stand and what needs to be improved. In this particular case, the benchmark also helped us find issues we hadn't looked at before.

Misconceptions

I think I need to explain some things first about benchmarking EFL/C against QML as a language. QML is JavaScript at its heart, whereas EFL/C is obviously C. One is designed for Rapid Application Development (RAD), while the other is clearly not designed with that in mind. EFL/C, and Edje in particular, is designed to put a clear separation between your UI skin and your application logic, and to have you spend your time developing the application logic in a strictly typed and compiled language.

This gives you some benefits. You write your application logic once and you re-skin your app for another device or another look: a night theme, a day theme, a tablet theme, a TV theme and so on. Elemines comes out of the box with that feature. The QML version doesn't. It is more difficult to do in QML, as you tend to push the application logic into QML because "it's easy", and while this speeds up development, it removes the separation between skin and core application.

We do agree that there is a need for a RAD solution on top of EFL and we do plan to provide it with Elev8, but it is not ready for prime time yet. Maybe later this year I will blog about it. At least now I have a good benchmark for it! Until then, yes, a RAD-based solution like QML will save you from writing code.

With that in mind, I helped Jerome clean up Elemines by using more features of Edje and doing less stuff in C. Overall, this benchmark will have helped improve Elemines and make it an official game of the Enlightenment project.

Memory consumption

The original benchmark, after looking at lines of code, then focused on memory consumption. Since he ran that benchmark before my work on improving memory consumption, it was a good opportunity to see if that work paid off. As you can see in the massif-visualizer screenshot below (a wonderful KDE/Qt tool, really!), there is quite some change.

If you check carefully, we don't use the same images and fonts in both benchmarks, but when you sum them they are quite close and won't change the results (625+110KB for Elemines and 731+85KB on the other side). So at peak time Elemines will have allocated 3.8MB of data when qmlminer will have allocated 4.9MB. That is almost 30% more memory for qmlminer on a very simple application. Not bad at all, and note that there is no special effort in Elemines to achieve this.
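For reference, here is the arithmetic on the two peak figures quoted above. How large the gap looks depends on which side you take as the baseline:

```latex
\[
\frac{4.9\,\text{MB}}{3.8\,\text{MB}} \approx 1.29
\qquad\text{and}\qquad
1 - \frac{3.8\,\text{MB}}{4.9\,\text{MB}} \approx 0.22
\]
```

In other words, the qmlminer peak is about 29% above the Elemines one, and the Elemines peak is about 22% below the qmlminer one.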

But that benchmark was also done on a 64bit platform. We almost never run memory benchmarks on 64bit systems, as we target embedded devices and none of them (worth talking about) is 64bit at the moment. The benchmark suggested that we do quite poorly on 64bit, and we do, as you can see. So I spent a weekend looking at what was going on. There was a small over-allocation bug, where we didn't use the right code to align our structures, that cost 100K of memory just by itself, but the rest of it was simply the over-usage of pointers.

Fixing that will take a lot of time. It is a "long tail" of small cleanups: locating huge structures that have pointers in them and checking whether those pointers are already accessible in our code from somewhere else. The reality is that we don't really pay any price on 32bit for those lazy pointers, but on 64bit they add up quite quickly. After hours of work during the weekend, I barely managed to gain 300K (and got a 100K saving on 32bit with those changes). 2 more megabytes to go...

That will be a long, hard road, but hopefully it will be fixed for our 1.8 release. And we should thank this benchmark for having detected that problem, or we would not have noticed the issue before the release.

What I don't know here is how much of the overhead is added by the JavaScript binding, and I would likely need the help of someone familiar with QtQuick to find out. Also, I did the tests with Qt4. It would be interesting to test with Qt5 here. In the future, I will definitely test an Elev8 port of Elemines to see how much overhead the JS binding adds. This benchmark really gives us an interesting and not too artificial reference point.

Startup Time

Another point that the original benchmark focused on was startup time. I took this opportunity to improve EFL's ability to benchmark this by adding an environment variable "ELM_FIRST_FRAME" that can take a few different values:

  • T: Will tell you how long it took from just after main() or with quicklaunch just after the fork() to push the first frame to X.
  • E, D: Will exit the application with exit(-1) as soon as the first frame is pushed to X.
  • A: Will abort the application after the first frame is pushed to X.

All of this really helps to get a proper view of what takes time before getting the first frame on screen. It is not yet perfect, as you don't see the I/O cost in the benchmarking tool, only the CPU cost. So if you spend a lot of time waiting for X (I didn't try the xcb back-end to see if that helps ... maybe another day), you will not see that in most of your traces. At least, I don't know of a tool that can do that at the moment. If someone knows of one, please let me know.

With that in hand, I ran the same benchmark script as the one provided by the original author and just passed ELM_FIRST_FRAME=E in the environment when trying it. To magnify the difference in startup time, and as pointed out on that blog, I used a Raspberry Pi.

Just let me rant a little bit about the state of the X driver on the Raspberry Pi. I managed to crash it a few times when doing that benchmark. During that benchmark there was nothing else running on X: no window manager, no other application. Only one application at a time. One simple application at a time. Only software rendering, so no OpenGL involved. That was enough to make it crash, and running that benchmark would take hours on the Raspberry Pi. It was really painful to use the Raspberry Pi as a benchmarking platform...

So back to the benchmark. I set up an up-to-date Arch Linux on the Raspberry Pi that booted, connected to the network and started an SSH daemon. Then I connected twice: once to start X, the other time to run the benchmark. This limited, as much as possible, any external interference. I re-ran the benchmark a few times to be sure I hadn't set up anything incorrectly, and I do encourage people to look at the Qt setup on Arch, as the difference is just huge in my opinion.

As it was a good opportunity, I tested our QuickLaunch infrastructure. QuickLaunch is a daemon that partially initializes EFL and waits for another binary to tell it what to open. When the request comes, it fork()s, resets a few things inside EFL, dlopen()s the application and then starts it. There are still a few things done after the fork, like loading the configuration, the theme and its data. I think it would make sense to start loading part of the configuration or the theme in advance. It is just not as easy as it sounds, because we support a configuration/theme per virtual desktop, and thus per app launch, etc.

Anyway, with QuickLaunch the warm startup time of an EFL application is 3 times faster than that of the QtQuick application. QuickLaunch starts an application 0.4s faster than a launch without it. When you spend 2.25s starting up an application, that is a little bit more than a 20% speed improvement. Arguably it could be better, but it is clearly not worthless on an embedded target like the Raspberry Pi.

If you want to test QuickLaunch on that Arch image, just run this in one terminal:

$ DISPLAY=:0.0 elementary_quicklaunch

And in another one:

$ DISPLAY=:0.0 ELM_FIRST_FRAME=T elementary_run eleminesql

You should see a log getting displayed under elementary_quicklaunch every time the application is started with the time needed to push the first frame to X.

Summary

It is clear to anyone today that C is not a RAD solution and there is no argument there, but EFL was primarily designed to target the embedded world. This is a place with very limited CPU, memory, I/O and battery, but with a huge variety of screen sizes and input devices. That explains why we have a clear separation between the theme and the application logic, and we want to keep it this way.

It also explains why EFL is 3 times faster to start Elemines and why QtQuick needs almost 30% more memory on this benchmark. It also explains why 64bit is not that optimized at this point. We still have room to improve our startup times and our memory usage before we do the release.

Also, as we target the embedded world, we do understand the need to provide an efficient software back-end for our rendering and will continue to do so in the future. We even have plans to improve it. We do believe that there is a lot of hardware out there these days that can do nice graphics and run powerful applications, if it uses EFL for that.

Where EFL does need to improve is clearly in the tools and infrastructure around it. Having an easier to use environment for application developers is clearly where we are lacking today and what we should focus on. I do think that what Node.JS did for JavaScript on the server side could, and should, be replicated in Elev8.

It is to be noted that for any of these simple games and the benchmarks we are doing here, it doesn't matter what Elemines does in its application logic, as all the time will be spent inside the toolkit. In fact, from experience, I know that all these casual games will run perfectly well with JavaScript. There is no doubt about that. Maybe later this year we will be able to invest some time in Elev8 and see how it does in this benchmark. Also, what is core here is the underlying toolkit and how efficient it is at pushing pixels to the screen. I think that EFL is not too bad for what it targets...