
Efl excessive usage of Eina_List
Closed, Wontfix · Public

Description

I don't know why the go-to data structure in Efl is Eina_List.

Maybe because it is convenient, or maybe people just copy from one place and modify, which has a network effect (the more often I see a piece of code, the more likely I am to follow the same pattern).

As we know by now, a linked list is the worst data structure in terms of performance, even when we need random insertion and deletion, and yet I don't see much use of Eina_Array and only a few uses of Eina_Inlist.

Eina_List especially hurts cache locality, and of course we do a lot of pointer hopping, which means a greater chance of cache misses (the main reason for slowness in a program).

Here is a memory snapshot of elementary_test app startup.

And I feel the eina_mempool is really masking the problem (because we don't see the individual malloc and free calls), as well as our memory access pattern in Efl.

I believe that if we start refactoring by converting those data structures from Eina_List to some data structure which can store POD (Plain Old Data) objects in contiguous memory, we can improve the overall performance of Efl apps.

@raster
And maybe we should review how many mempools we have and whether we are using them effectively (what % of memory is actually used in a particular mempool), because these mempools are usually big (4k - 64k).

smohanty created this task. Aug 11 2019, 9:40 PM
smohanty added subscribers: Hermet, JongminLee, cedric.

Well, the cache locality is an interesting thing to discuss. For example, if your objects are really big and you are using an Eina_Inlist just to access one random member in that structure, you will basically have a worse case than with Eina_List (as with the mempool you are more likely to be near the Eina_List nodes you walk). There are some improvements we could do in Eina_Mempool to make the allocations even closer together in general.

I have been looking in the past at the clipper, and that was clearly better off with Eina_List than another structure. I also looked at eet, and it is mostly the same story, and much of the conversion to arrays that was meaningful was done when we released Edje (arguably, back then Edje was not designed with the monster theme we have today, so that might have changed). The rest I haven't looked at, but it is a case by case problem and Eina_List is pretty good at making it not an obvious problem.

Well, the cache locality is an interesting thing to discuss. For example, if your objects are really big and you are using an Eina_Inlist just to access one random member in that structure, you will basically have a worse case than with Eina_List (as with the mempool you are more likely to be near the Eina_List nodes you walk).

It's applicable to both Eina_List and Eina_Inlist; with Eina_List you still have to access the data pointer to visit a random member after accessing the node pointer (while Eina_Inlist will be 1 access instead of 2). And anyway, when we access a member we are not accessing a single byte but the whole cache line (64 bytes in general). As accessing a node with Eina_List does 2 pointer hops, it will do at least 2 cold cache reads in the worst case, and who knows how many via the prefetcher (if the list node and node data happen to be far from each other, in different cache lines). So Eina_List will perform worse on average.

And your mempool assumption works only if we add all the elements at once to the Eina_List (so that all the nodes are in a contiguous memory area). But we don't know the usage pattern of the user of Eina_List.

Note: Certainly, given that Eina_List nodes are created in a mempool, it's better than creating the node pointer using malloc, but still, in my opinion Eina_Inlist will outperform Eina_List in the average case.
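To make the two access patterns concrete, here is a minimal sketch using the real Eina iteration macros (the item types and do_something() are made up for illustration):

   #include <Eina.h>

   extern void do_something(int value);                         /* stand-in for real work */

   typedef struct { int value; } My_Item;                       /* payload allocated separately */
   typedef struct { EINA_INLIST; int value; } My_Inline_Item;   /* payload with embedded node */

   static void
   walk_both(Eina_List *list, Eina_Inlist *inlist)
   {
      Eina_List *l;
      void *data;
      My_Inline_Item *it;

      /* Eina_List: hop 1 loads the node, hop 2 loads the item -
       * two potentially cold cache lines per element */
      EINA_LIST_FOREACH(list, l, data)
        do_something(((My_Item *)data)->value);

      /* Eina_Inlist: node and payload share one allocation,
       * so a single dereference reaches both */
      EINA_INLIST_FOREACH(inlist, it)
        do_something(it->value);
   }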

https://phab.enlightenment.org/D7471

no one was interested. i've been meaning to drop this into eina and begin using it internally whenever possible. a slow move over to it would help... a lot. it's designed to help with locality by having blocks be a multiple of contiguous cachelines. it actually will save memory on anything that is not a super-short list. it still looks like eina list in that it stores pointers to stuff (inlist is a different beast). i built it as a fairly easy move from eina list to blist... just read the patch review. it's intended as super-early "let's throw ideas around" for this to be a viable replacement for eina list in almost all cases. as it hides the implementation it can even just drop down to a single array internally if the data never changes and is just appended to most of the time. it could under-allocate blocks for super-short lists. there are openings for lots of optimizations it can do to be even better in corner-cases. we don't have to do all of them day 1. the first step is agreeing on the api & design being good. the implementation leaks out into the design somewhat but not much. the only thing this can't do that a list can is keep a handle to a specific member of a list (an eina list node) and know that handle will be valid until that member is deleted/removed. it can't do that bit. all other uses of eina list are fair game. i also designed it to be less error prone, so no list = func(list, ...); just func(&list, ...);
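for clarity, that last point looks like this side by side (blist_append is the proposed api from the patch above, so treat it as a sketch):

   /* eina_list: the new head is returned and must be assigned back -
    * forgetting the assignment is a classic bug */
   list = eina_list_append(list, item);

   /* the blist style described above: the handle is passed by reference
    * and updated in place, so there is nothing to forget */
   blist_append(&list, item);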

my thoughts are that someone other than me should also look at the above as a starting point and see if we can do some fine tuning to this idea/code/design or... design something new. but this does kind of abstract a list much more than eina list. it merges this with arrays to a large extent, giving you the "best of both worlds" of list and array. as it hides the implementation behind the api, this means we have LOTS of room to optimize this in the future as well without breaking api. caveats are just:

  1. can't track individual nodes so any code (very rare) that keeps an Eina_List * ref to a node INSIDE a list (not the beginning of it - but any arbitrary member) and stores this and expects it to remain the same until removal of that member won't port. this is very rare so... :)
  2. very short lists (a few items) will be larger with blist. we could improve this in future with having different storage setups internally for such situations and we can "clean-up" short lists that settle down, but for now it will be worse for very short lists
  3. blists fragment more easily. there is a cleanup func but this needs to be accounted for over time so we need to have "gc" cycles i guess. where and when to do this will be a good question.
  4. blist doesn't nicely extend to eina inlist like list -> inlist did. i'd have loved a design that does, but i think we use inlists for really big objects almost always so the extra cost of the list node and indirection is really not that bad compared to everything else (if object is big we likely access lots of memory per obj access and if objects were stored serially we'd skip so much memory between objects that we'd not have good cache behavior anyway). i think this is a caveat worth living with.
  5. it isn't an instant win. it requires we first introduce this and make sure it's tested well and more importantly the DESIGN is right so we can stick to the current api and behavior model into the future and have room to improve it... but the work is almost all going to be in finding one list at a time and converting it to blist. the more get converted, the more we win. it's going to be a manual process. find the worst offenders that fit something like blist and do them first. see what we gain out of it, then keep going slowly over time. no instant wins here, but long-term i think this is a better data struct for almost all list-like uses and in fact almost all our array uses too.

For example:

I have a structure:

   struct A
   {
      Eina_Bool  someFlag;
      void      *someData;
   };

If I need to keep a list of these nodes in a data structure, in Efl my only option is:

   struct A *instance = malloc(sizeof(struct A));

and then add it to an Eina_Array, Eina_List or Eina_Inlist.
But if I want to put the structure's content directly into the container (in place), I don't have any tool to do it, and that, I guess, is the main problem:
because I am forced to create a heap object just so I can use it in a container, I get all the problems that a typical list has.
So if we improve the list by keeping blocks, it will just behave like an optimized Eina_Array or a deque with fixed size slots, but my main issue of heap allocated objects will remain the same.
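A minimal sketch of the kind of "store the POD in place" container being asked for here - entirely hypothetical, not an existing Eina API:

   #include <stdlib.h>
   #include <string.h>

   typedef struct
   {
      unsigned char *data;       /* one contiguous block holding the items themselves */
      size_t         item_size;  /* sizeof(struct A) for the example above */
      size_t         count, capacity;
   } Pod_Array;

   static int
   pod_array_push(Pod_Array *a, const void *item)
   {
      if (a->count == a->capacity)
        {
           size_t cap = a->capacity ? a->capacity * 2 : 8;
           void *tmp = realloc(a->data, cap * a->item_size);
           if (!tmp) return 0;
           a->data = tmp;
           a->capacity = cap;
        }
      /* the item content is copied into the block: no per-item malloc,
       * no pointer hop on iteration, items sit contiguously in memory */
      memcpy(a->data + a->count * a->item_size, item, a->item_size);
      a->count++;
      return 1;
   }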

we very rarely need to store something that simple in a list... but i get your point. most of the time it's a fair bit more than that. unless you can show me the many uses of such tiny data structs (that are either filled with lots of data - thus wasteful, or are walked really often - thus getting cache misses)?

I did not read all the replies here:

  1. Just plainly moving over to inlist or pod does not help at all IMO. You will not see the allocations in eina_list, but rather in other functions, just not accumulated. Additionally, moving to eina inlist is not always possible and not always helpful (@cedric laid that out).
  2. Saying list has bad performance without calling out a usecase is absolutely not useful. Take a usecase where you have an ordered set, where the order is defined by some function: a list will have way better performance than an array. Take a simple stack, and you will see that an array is better. It's not that easy to say something is bad or something is good.

What we IMO need is a consideration of what we use a list for. Storing callbacks or just children (like in widget) -> good idea. Implementing a simple boolean property in a parent -> (maybe) a bad idea. There are a lot of useful data structs that we could implement there; my biggest issue with those is that we will have individual append/remove/nth/find for each of them, which makes refactoring and later container changes a pain.

To start the effort (for resolving this ticket), someone should think about a way to have one interface for calling containers' appends etc., and then later on start to experiment with new containers and trade-off/mixed containers (like blist).

@bu5hm4n that's the idea of blist indeed. it's after a lot of thought on what eina list is bad at. it makes some compromises because it has to, but in return for other benefits. it's a "use in cases where it'd really help" thing. So trivially short lists -> not much help. Lists we very very very rarely walk -> not much help. Impossible for anything that keeps Eina_List * nodes middle-of-list and expects them to be "constant" until the member is removed. It doesn't solve inlist style use cases, which imho are mostly for very big objects where arrays would not help and just hurt.

I could look at making it possible to do an inlist-style thing with blist, but it'd complicate the code a bit as now members are N bytes instead of a fixed known size (and N needs to be aligned, and i have to deal with the block next/prev and count content merging nicely with this). I could do this as a separate binlist in future that referred to content differently for small items (that might be like 1-3 or 4 pointers in size but no bigger), but those use cases are minimal and not worth it at this point.

My take is blist is designed so that from the OUTSIDE the storage mechanism is opaque, and all blist does is guarantee ordering (unlike a hash) but doesn't have any sorting (like a tree) built in - ordering is explicitly maintained. That's about it. That is its only design guarantee. It can now optimize walks or getting the nth item quickly internally, or even rearrange data to be a flat single array if it wanted. perhaps blist is the wrong name... anyway. I made some assumptions on usage based on all the historical use of eina_list i've done myself and seen across e and efl etc. and this is a stab at improving a lot of that with a new api and data struct. i'd like to know the api and design is "right" first, and that optimizations beyond what i've already done are just an internal matter for the blist code and won't break/hurt the api.

Maybe my understanding is totally wrong; please correct me. I have drawn some ugly pictures.

This is how I think Eina_List and Eina_Inlist are stored in memory

If my understanding is correct, please explain how Eina_List will perform better than Eina_Inlist in any case? @cedric, @bu5hm4n

The next picture is the bookkeeping data per item in the list.

For Eina_List we are paying 32 bytes (excluding Eina_Magic - I have no idea what that is) = half a cache line, just to store 1 item in the container.
For Eina_Inlist we are paying 24 bytes for storing 1 item in the container.
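For reference, the two node layouts under discussion look roughly like this (paraphrased from the Eina public headers; exact macros and field order may differ slightly):

   struct _Eina_List              /* 4 pointers = 32 bytes on 64bit */
   {
      void                 *data;        /* the indirection to the item */
      Eina_List            *next;
      Eina_List            *prev;
      Eina_List_Accounting *accounting;  /* last + count, only valid on the first node */
      EINA_MAGIC                         /* debug magic, discussed further below */
   };

   struct _Eina_Inlist            /* 3 pointers = 24 bytes on 64bit */
   {
      Eina_Inlist *next;
      Eina_Inlist *prev;
      Eina_Inlist *last;                 /* valid only on the head */
   };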

I am surprised that we have an accounting pointer. Now I sincerely hope that we compile out that accounting pointer when we do a release build, otherwise it is a worse linked list implementation (in terms of memory) than what I read in my 1st class in Computer Science. :)

Saying list has bad performance without calling out a usecase is absolutely not useful.

   
Well, I also used to think that if I have to insert or delete at a random position, a list is the better data structure. But after seeing the profile results, and knowing how important the data access pattern is for getting good performance, I changed my mind a long time ago.

So if you want to be sure, just profile. I am pretty sure you too will change your mind.

Some Actual Benchmark Data by someone

https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-vector-vs-list.html

Although this comparison is for C++ vector vs. list, it shows how data access pattern and performance are correlated.

  1. Can you elaborate on what kind of benchmarks you are looking for? I do not believe that we have bad performance because of cache misses caused by eina_list; we have private data that is fetched multiple times during eo call chains and many more things. I simply do not believe that our list implementation causes a considerable amount of slowness.
  2. Speaking of list and mempool: list is using a mempool, which means the data is close together, which should improve cache usage here. Using inlist / pod is not really ensuring close memory segments. So would our cache usage go up or down with inlist / pod?
  3. Please have a look at the count method of list, it is O(1). In basic CS class 1 you will likely learn that counting requires literally counting each element; to get around this there is the accounting pointer, which also takes care of the last pointer. If you take a look at our usages, you will see that efl uses the count method quite often, which would otherwise be very bad performance-wise.
  4. The benchmark is nice for an isolated view, but not useful for usecases like "iterate over each item and perform 5 big eo functions on it"; this will likely mess up your cache so much that every cached result from the iterating is dead by the time you are done with the calls.

well, inlist is always part of much larger data structs ... so the inlist header inside a much larger one makes little difference, and arrays will be just as cache-missy as a linked list, as no single object item is less than a cacheline ... they are all big structs. so let's skip that. that's the real world usage pretty much.

now for the accounting pointer. it's always there. it's not compiled out - ever. it's necessary for functionality like finding the last member or getting count in O(1) time. it points to an invisible memory block the list impl maintains for you. this is done so that the list head/list itself is the same as the first item. NULL ptr is an empty list. it's nice and simple. it leads to simplicity in some ways. every node has it but only the first node's accounting ptr is valid. so ... this means it's 4 ptrs per list node, instead of 3 without it.

now... common malloc implementations allocate in buckets. keep in mind this was written in the 32bit days too, so we're talking 16 vs 12 bytes without the accounting ptr, but ok, 32 vs 24 on 64bit. now have a read of https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/generalprogramming/malloc_buckets.html - about malloc buckets. eina list was written before we had any mempools in eina, and every node was malloced, assuming the libc implementation would have a well optimized alloc path for lots of small allocations. with buckets by default that means that the libc would always alloc in bucket sizes... e.g. multiples of 8 bytes on 32bit and 16 bytes on 64bit (read that page above). that means if you try to allocate 12 bytes on 32bit you'd get a 16 byte chunk anyway, so... 4 ptrs uses the same space as 3 in the end. same on 64bit but with 32 bytes. this was an implementation of a list written with knowledge of the underlying allocation system in mind, where that extra accounting ptr is for free and saved having a different list head/container object vs. list nodes themselves, and is considered "free". what they teach you at school is theory :). they don't teach practice. :) now when at some point things were put into mempools, this changed things, but the list implementation and design were already a done deal long before then, and changing them would have been a huge amount of work. so it actually was a fairly sensible decision at the time - it was re-using wasted space that the malloc implementation would waste anyway. it's taking advantage of alignment by that implementation.
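if you want to see this bucketing on your own libc, a quick probe (malloc_usable_size() is a glibc extension; the actual step sizes depend on the allocator):

   #include <stdio.h>
   #include <stdlib.h>
   #include <malloc.h>   /* malloc_usable_size() - glibc extension */

   int main(void)
   {
      /* request sizes around 3-4 pointers and print what the allocator
       * actually reserves - the sizes come back in discrete steps */
      for (size_t n = 8; n <= 40; n += 4)
        {
           void *p = malloc(n);
           printf("requested %2zu -> usable %2zu\n", n, malloc_usable_size(p));
           free(p);
        }
      return 0;
   }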

now in general our list USAGE is not random. we don't randomly insert or remove most of the time. MOST usage is prepend and append. these are both O(1). removal does indeed need a walk and this is costly, but ... it depends on the length of the list and how often this is done compared to everything else. insertion does happen indeed but... insertion into very large arrays can be very expensive. insertion into a list is O(1) if you know the position. it's O(n) if u have to walk. if you maintain an Eina_List * node then you have the position always stored. in cases where this is actually done in performance critical places, code actually does do this. inlist of course always stores the node, as the object and node are merged. so insertion and removal are cheap in O() space. also keep in mind that eina list was designed in the days of much smaller cache sizes. when u have to memcpy() enough data up and down to make room for a new item to pack down a list, then it will begin to be worse with vectors (arrays). we have arrays too... we do not have a data structure that is "both". also keep in mind benchmarks like the above are rather "fake". hear me out. they drastically look good because they sit in a tight loop and do nothing but insert, remove or walk and then don't do anything else. caches are never polluted by anything else. in real life caches are invariably polluted very badly in between each time you may for example insert an item, or then remove it, so... what we're really talking about are the cache benefits: when you read in a cacheline you fetch multiple ptrs at once in one go and thus don't get more cache misses. chances are we have a cache miss per eina list item. we DO have a prefetch call to prefetch the next list node when using EINA_LIST_FOREACH() and this should mitigate this somewhat. so as long as you do enough per item when you walk to allow the prefetch to finish, then it's "zero cost". this will not always be the case of course. in "fake" benchmarks that is what happens and then the numbers are really skewed towards arrays/vectors vs lists, as opposed to real life where you may also then find some struct members and compare them or do a strcmp etc. per node that ends up maybe also stalling on memory and eating up cycles, allowing the prefetch to do its magic... - again... keep in mind most of our data is big and thus is already bigger than a cacheline anyway, so only on very tiny data structs is this really very useful. so keep in mind all of this performance is theory - in practice your caches get dirtied up by lots of other data, making the wins less and less. the real wins are mostly in saving the memory and thus lowering the footprint.
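the "maintain an Eina_List * node so the position is always stored" pattern looks like this with the real api (my_obj is a stand-in):

   Eina_List *list = NULL, *handle;

   list = eina_list_append(list, my_obj);
   handle = eina_list_last(list);            /* O(1) via the accounting ptr */

   /* ... much later: removal without any walk, O(1) */
   list = eina_list_remove_list(list, handle);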

if we want the properties of both vector and list... that is what blist is. it's small mini arrays (128 byte chunks) that are linked together. these align to cachelines, so going to the next block would be a cache miss anyway (well, without automagic auto-prefetch on detecting linear memory walks by a processor). of course blist could explicitly call prefetches as part of a walk, and i have the XXX comments in there to remind me to do just that. the benefits of that kind of structure are that it gets both worlds: very cheap insert/remove in the middle, because it only has to shuffle small arrays of items around and maybe delete a block or insert a block at times. it gets the benefits of a vector/array in that it's compact (not quite as compact as a pure array but almost) and it fetches lots of items in a single cacheline fetch, so cache locality is good. i have benchmark numbers in the diff above. they always look much better with total random access than eina list. the structure could even not use blocks and use a dumb single array if it wanted to. that is accounted for in the design, to allow it to adapt storage layout based on content, access patterns etc.
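a rough sketch of the block idea (illustrative only - not the actual D7471 code; the 13-slot figure is just what makes the struct come out at 128 bytes on 64bit):

   typedef struct _Blist_Block Blist_Block;
   struct _Blist_Block
   {
      Blist_Block *next, *prev;   /* blocks chained like a (short) list */
      int          count;         /* slots used in this block */
      void        *items[13];     /* 16 + 8 (padded int) + 104 = 128 bytes on 64bit */
   };

   static void
   blist_walk(Blist_Block *b, void (*fn)(void *))
   {
      for (; b; b = b->next)                 /* one pointer hop per block... */
        for (int i = 0; i < b->count; i++)   /* ...then a linear, cache-friendly scan */
          fn(b->items[i]);
   }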

so if you want a real comparison... the idea is to make a benchmark that reflects "real life", where we actually do work per list node and pollute the cache a fair bit per item, or use up enough cycles to make it realistic... then do the same thing with an array and then compare. mimic the behavior of real lists. focus on the lists that are modified/walked very often and make some test cases that mimic that behavior in a stand-alone benchmark... THEN we have something realistic to measure :) some lists could be converted to pure arrays with nice benefits. others, probably not. my take is that something like blist (or whatever would replace it) would make the choices easy, as it hides the implementation and thus can be both a vector and a linked list and change on the fly to meet current demands. the work then is just having the code to track the usage patterns in simple ways and then auto-adapt the data layout inside. we could have hinting api's to hint an initial layout from what we know the behavior is likely to be. if you can come up with something better, then i'm listening :) my take is a pure array is no better, because blist can be a pure array already - the implementation behind the api could do that if it thinks it's best. again - hinting api's to force an initial data model can be used. blists could use n-ary trees to make jumping to the nth item incredibly fast if this is a common pattern. if the blist is known to be sorted then same story for finding items by some key (callback), but this is not that common - most lists are ordered by explicit behavior like newest to oldest or top to bottom etc. ... or are totally unsorted. in some rare cases they may sort based on an index or string or something - but not that commonly. so a good representative-of-real-life benchmark is what is needed. i'm happy enough with blist that it's a good enough improvement in just memory footprint that it's totally worth looking at - it, or something like it. it could be fine tuned and improved over time too with no api/abi breaks or porting needed. that makes me happy with it as it is, but ... let me know :)

I think both @raster and @bu5hm4n have explained the problem here pretty well. Just to add on the topic of Evas_Object and clipper, which are at the top of that list: Evas_Object is a monster object, a KBytes-sized object that is bigger than the L1 cache for sure. As soon as you start walking them, you trash your cache with no way for the system to do anything about it. With Eina_List, the mempool increases the chance of locality, so you have a higher chance to just walk from one item to the next in cache. A possible improvement we could do on Eina_List would be to propagate to the mempool the nearest item we want to be allocated next to, and have the mempool use that information to give you a slot that would be the best possible for the cache. This would require limited change in our code base and likely bring some small benefit.

Micro benchmarking here is not very useful. You would see Eina_List having good results just because there will be one list in the mempool and all the allocations will be nicely aligned there. You can try to generate a more random pattern to trash the system, but it won't really match real life usecases. I remember reading a few years back that there are a few counters that report cache misses and the like. Maybe you could add this to the Expedite report and see which test really looks bad. Then try to figure out what that test does and where all those misses happen. Valgrind's cachegrind is supposed to have good reporting on this issue, but you have to narrow down the test enough that you get a meaningful report.

  1. Can you elaborate on what kind of benchmarks you are looking for? I do not believe that we have bad performance because of cache misses caused by eina_list; we have private data that is fetched multiple times during eo call chains and many more things. I simply do not believe that our list implementation causes a considerable amount of slowness.
If you ask me whether I want to use a container which can only store 4/8 bytes of data (a pointer to the data) and for that keeps 16/32 bytes of bookkeeping, I will not even look at it (let alone how good/bad it is to access that 4/8 bytes of data).
If we just assume that there is no problem and the bigger problem is somewhere else, we are just ignoring the problem, that's all.

2. Speaking of list and mempool: list is using a mempool, which means the data is close together, which should improve cache usage here. Using inlist / pod is not really ensuring close memory segments. So would our cache usage go up or down with inlist / pod?

Have a look at the list implementation: we are not allocating the data in the mempool, we are just allocating the bookkeeping (the list node) in the mempool. The data is allocated by the user (malloc/calloc or some other mempool). So don't be confused by the idea that the list is using a mempool and so it magically solves all your problems.

3. Please have a look at the count method of list, it is O(1). In basic CS class 1 you will likely learn that counting requires literally counting each element; to get around this there is the accounting pointer, which also takes care of the last pointer. If you take a look at our usages, you will see that efl uses the count method quite often, which would otherwise be very bad performance-wise.

Also in the same CS101 they teach us how to implement length() and append() in O(1) without putting a length and last pointer in every node. I am pretty sure that if you sit down for 2 hours you can implement a list from scratch which has O(1) length() and append() APIs but uses only 3 pointers instead of 4. Let me know how it goes... or Google will give you plenty of implementations.
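The textbook shape being referred to - a separate head that owns the count and tail, with plain 3-pointer nodes (hypothetical names, just to make the claim concrete):

   typedef struct _Node { struct _Node *next, *prev; void *data; } Node;

   typedef struct
   {
      Node        *first;
      Node        *last;    /* makes append() O(1) */
      unsigned int count;   /* makes length() O(1) */
   } List_Head;

   static void
   list_append(List_Head *h, Node *n)
   {
      n->next = NULL;
      n->prev = h->last;
      if (h->last) h->last->next = n;
      else h->first = n;
      h->last = n;
      h->count++;
   }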
  4. The benchmark is nice for an isolated view, but not useful for usecases like "iterate over each item and perform 5 big eo functions on it"; this will likely mess up your cache so much that every cached result from the iterating is dead by the time you are done with the calls.

Well, all I can do is highlight the problem and see if we can do something about it, or at least change the mindset (popular belief). If our mindset is that Eina_List is the best data structure, then, well, there is no point in arguing.

My take on Eina_List

Eina_List is a 3X fat, glorified Eina_Array. Both of these data structures can only store 8 bytes of data in one slot. If you disagree, I would be happy to hear your opinion.

Here is my conclusion on how good and bad Eina_List is compared to Eina_Array (not my favorite, but the best among the worst).

And I will take an EFL example so that there will be no discussion about synthetic use cases.

Ex:
Our default theme has 133 styles (in a list) and each style has a list of tags (a list).

style list nodes = 133, total tag nodes = 1727

So total list nodes we create = 133 + 1727 = 1860

Total memory just for nodes (no data) = 1860 * 32 = 59.52KB
The same structure is created again in Evas_TextBlock_Style, which means it has at least 1727 list nodes: 1727 * 32 = 55.26KB

List bookkeeping cost for storing the styles of the default theme = 59.52 + 55.26 = 114.78 KB.

Eina_Array

Edje level:

Array nodes: 1860 * 8 = 14.9KB
Array bookkeeping per container = size + capacity + data pointer = 24 bytes, and we have 133 containers, so 133 * 24 = 3.19KB

So total = 14.9 + 3.19 = 18KB

Let's assume Evas_Textblock also takes the same size when we change to Eina_Array: 18KB

So total bookkeeping cost using Eina_Array = 36KB

Memory footprint difference = 114 - 36 = 78KB.

So what I am trying to say is: if you change only the containers of these 2 instances from list to array, you will reduce heap memory usage by 78KB.

You can check with @raster and @cedric how that translates to power consumption and performance improvement.

That's all I have to say... and now I will get ready for my much needed vacation :)

  1. Can you elaborate on what kind of benchmarks you are looking for? I do not believe that we have bad performance because of cache misses caused by eina_list; we have private data that is fetched multiple times during eo call chains and many more things. I simply do not believe that our list implementation causes a considerable amount of slowness.

If you ask me whether I want to use a container which can only store 4/8 bytes of data (a pointer to the data) and for that keeps 16/32 bytes of bookkeeping, I will not even look at it (let alone how good/bad it is to access that 4/8 bytes of data).
If we just assume that there is no problem and the bigger problem is somewhere else, we are just ignoring the problem, that's all.

There are benefits to the excess data, with a few O(1) cases like insertion/deletion/counting. We could do slightly better with a singly linked list and a permanent head that does the counting, for very large lists that we only access linearly. As I have said before, I tried array/inlist for clippers and it was way worse, because of the access pattern.

2. Speaking of list and mempool: list is using a mempool, which means the data is close together, which should improve cache usage here. Using inlist / pod is not really ensuring close memory segments. So would our cache usage go up or down with inlist / pod?

Have a look at the list implementation: we are not allocating the data in the mempool, we are just allocating the bookkeeping (the list node) in the mempool. The data is allocated by the user (malloc/calloc or some other mempool). So don't be confused by the idea that the list is using a mempool and so it magically solves all your problems.

As said, it depends on the case. For lists that are referencing humongous objects, Eina_List is a better choice than Eina_Inlist, and the mempool adds data locality when walking the list.

3. Please have a look at the count method of list, it is O(1). In basic CS class 1 you will likely learn that counting requires literally counting each element; to get around this there is the accounting pointer, which also takes care of the last pointer. If you take a look at our usages, you will see that efl uses the count method quite often, which would otherwise be very bad performance-wise.

Also in the same CS101 they teach us how to implement length() and append() in O(1) without putting a length and last pointer in every node. I am pretty sure that if you sit down for 2 hours you can implement a list from scratch which has O(1) length() and append() APIs but uses only 3 pointers instead of 4. Let me know how it goes... or Google will give you plenty of implementations.

Yes, Eina_List is designed with the idea that we have small lists and often empty lists. Everything has a tradeoff. As said above, the main question is: do you have a benchmark that shows problems? Rule number one of optimizing: benchmark! Assumptions about where the problem comes from, without data, usually lead to wasting time on the wrong things.

  4. The benchmark is nice for an isolated view, but not useful for usecases like "iterate over each item and perform 5 big eo functions on it"; this will likely mess up your cache so much that every cached result from the iterating is dead by the time you are done with the calls.

Well, all I can do is highlight the problem and see if we can do something about it, or at least change the mindset (popular belief). If our mindset is that Eina_List is the best data structure, then, well, there is no point in arguing.

There is no better data structure. Their are good and bad use.

My take on Eina_List

Eina_List is a 3X fat, glorified Eina_Array. Both of these data structures can only store 8 bytes of data in one slot. If you disagree, I would be happy to hear your opinion.

Eina_Array doesn't allow for random insertion/deletion, or prepend, for starters. If we are using Eina_List where we could use Eina_Array, it is indeed a bad use of Eina_List.

Here is my conclusion on how good and bad Eina_List is compared to Eina_Array (not my favorite, but the best among the worst).
And I will take an EFL example so that there will be no discussion about synthetic use cases.

Ex:

Our default theme has 133 styles (in a list) and each style has a list of tags (a list).

style list nodes = 133, total tag nodes = 1727

So total list nodes we create = 133 + 1727 = 1860

Total memory just for nodes (no data) = 1860 * 32 = 59.52KB
The same structure is created again in Evas_TextBlock_Style, which means it has at least 1727 list nodes: 1727 * 32 = 55.26KB

List bookkeeping cost for storing the styles of the default theme = 59.52 + 55.26 = 114.78 KB.

Eina_Array

Edje level:

Array nodes: 1860 * 8 = 14.9KB
Array bookkeeping per container = size + capacity + data pointer = 24 bytes, and we have 133 containers, so 133 * 24 = 3.19KB

So total = 14.9 + 3.19 = 18KB

Let's assume Evas_Textblock also takes the same size when we change to Eina_Array: 18KB

So total bookkeeping cost using Eina_Array = 36KB

Memory footprint difference = 114 - 36 = 78KB.

So what I am trying to say is: if you change only the containers of these 2 instances from list to array, you will reduce heap memory usage by 78KB.

You can check with @raster and @cedric how that translates to power consumption and performance improvement.

Not an easy question at all. It really depends on what the access patterns are. The tag and style structures are rather small, and an Eina_Inlist (or Eina_Clist, if you do not need to jump to the last item and always prepend, or keep the tail somewhere) might pay off, as I am guessing we do have API to insert and remove tags (so we need the random access ability, ruling out an array).

This being said, the API could be entirely changed to allow the sharing of style information between edje and evas, using for example an Eina_Accessor. Deduplicating data would allow for even more memory reduction and, more importantly, stop the slow transfer of the information between edje and evas. Not saying that this is the best solution here, but it seems that there is room for improving the API.

Updated: We always use EINA_MAGIC; even in Tizen people are scared to turn off EINA_MAGIC_DEBUG, so the list node data size = 5 pointers = 20/40 bytes.

When I asked the team what the list node size is, everybody's answer was 3 pointers.

So at least now they are aware of the data structure they are using and the cost associated with it. That's the best I can get from this conversation.

Now all the internals of list/array are out in the open for future reference, so I will close this issue.

smohanty closed this task as Wontfix.Aug 12 2019, 8:03 PM

If you ask me whether I want to use a container which can only store 4/8 bytes of data (a pointer to the data) and for that keeps 16/32 bytes of bookkeeping, I will not even look at it (let alone how good/bad it is to access that 4/8 bytes of data). If we just assume that there is no problem and the bigger problem is somewhere else, we are just ignoring the problem, that's all.

I think you missed our point. You speak of a THEORETICAL problem. Where is the real problem, in real code, with real performance impact that can be measured? All of us (@bu5hm4n, @cedric, me) were making the point that in almost all cases this is not the real use. almost all structures that are walked/modified frequently enough and/or have enough members (more than a few items) to matter do not have trivial amounts of data in the structure. ESPECIALLY the inlist uses. At least that's from our knowledge of both EFL and of apps using EFL that we write (like enlightenment or terminology etc.). If you have identified some real problems we'd like to know about them. We don't know of them. We can spend all day on the theory - what matters is practice and actionable items that make a real difference to performance and/or memory footprint. These are the examples we're asking for - not theoretical. Real life "here is the code - my profiles say it's called X often and costs Y time and Z memory and if we used an array here we'd improve to A, B, C instead" :)

lists like the clipper list are an indirection because they have to be - they are pointing to an object somewhere else because that is the nature of the problem. there is no value in inlist there for example. using less memory per node and making those walks faster would help, and that's why i made blist as a start for these kinds of uses, of which we have a fair amount. a pure array/vector may not actually be faster here. or not that much faster. things like blist break even at 4 items. arrays/vectors have a similar problem. if they realloc for every item u add they'd suck. so they alloc in chunks. like 8, 16, 32, 64 items per alloc, and thus will over-allocate almost always.

Have a look at the list implementation: we are not allocating the data in the mempool, we are just allocating the bookkeeping (the list node) in the mempool. The data is allocated by the user (malloc/calloc or some other mempool). So don't be confused by the idea that the list is using a mempool and so it magically solves all your problems.

That is not a problem something like eina list, blist, inlist or any such data struct solves or should solve. it's up to the user to put into the list what they want and allocate it the way they want. they can use mempools for what they put in. they can just put in references to existing things. oid's ... anything in a pointer. it's a generic data struct, and as a generic struct a pointer is about the most generic thing that can be done. again - the number of times you need trivial amounts of data is rare. it's more often either just a ptr to something big and complex or... it's using inlist already, in which case the list node *IS* something big and complex. almost every single time.

again - not talking theory. this isn't CS class. it's real live code with real data, with compromises and understanding of the lifecycles and use cases. a lot of data goes in as "well that's rarely accessed and not much, so this is just less work and easier and so it won't have much impact". focus on the things that do have real impact from real profiles etc. - like you were doing with vtune. fyi i was trying to actually measure performance differences with the rwlock stuff... i basically couldn't. not even on a raspberry pi 3. i couldn't measure significant startup time differences. so it's had a very small impact, if anything worth measuring. it's a nice "theoretical" win. i'm not seeing real decent measurable numbers. :) that's what we want - real measurable numbers that go "this actually consistently now starts 5% faster every time" or "we measured memory and we saved 200kb - every time" etc. :)

Also in the same CS101 they teach us how to implement length() and append() in O(1) without putting a length and last pointer in every node. I am pretty sure that if you sit down for 2 hours you can implement a list from scratch which has O(1) length() and append() APIs but uses only 3 pointers instead of 4. Let me know how it goes... or Google will give you plenty of implementations.

read what i wrote. the 4th pointer is/was for free. you CAN implement a list as a circular list with first pointing to last. this leads to other very un-fun things, like detecting that you are at the start or end being more painful (it's not just a null next/prev, you have to compare against the known list head). you CANNOT implement O(1) counts at all without some other data tracking it. the only way is to have a different struct for the list as a whole (a list container) that then contains the linked list nodes, so this accounting block is in the header. that's what eina hash does. it's what blist does. eina list used the accounting ptr because i built it with malloc as the backend for the list nodes (as i explained) AND due to malloc bucketing the 4 pointers are the same cost as 3. read about bucket sizes in malloc implementations. :) CS101 does not discuss this kind of stuff as it sticks to pure theory. when an allocation for 24 bytes costs as much as one for 32, or an alloc for 12 bytes costs as much as 16... you sometimes design differently to simplify other bits of code. so i just inverted things so a list is ALSO its first item - no differing data structs - as i could, for free, attach the last and count info to the first linked list node with the accounting ptr. it simplified usage of the list as a result, and makes use of an implementation detail that CS101 doesn't care about: that 24 bytes == 32 bytes when it comes to allocations on 64bit (and 12 == 16 on 32bit)... i know the theory full well. also, do some real numbers and comparisons. don't read articles. write code. do it. write a theoretical "insertable array" where you can insert items in the middle and remove them and append, prepend, allocate in lots of 8 or 16 and so on to be efficient, then look at small, medium and large list sizes and do a bunch of operations common to lists (prepend, append, insert/remove) with some repeatable random kind of behavior that proportionally reflects the kind of usage patterns. then try this with various sized lists and your array and do the numbers. make sure you dirty the caches in between, because that is what happens. you will see something append then walk away and execute for 1000000's of cycles before it comes back to the list to maybe remove that item etc. - this is what we're saying - make a benchmark that reflects some real world usage patterns and then do some numbers to see what ends up best and what the cost differences really are. you may be surprised at how much less of an impact it really is, in the scheme of things. :) but... go ahead. i've done this kind of thing for blist. very simplistically, to test it, but i have a feeling for the differences. blist is basically an array that can be broken up into segments. last night i added some #defines so i can make a block any size i like (i've tested from 64 bytes up to 64kb for a block of ptrs). these numbers are "runtimes", so the lower the number, the better:

//         number of items in list...
// BLKSZ :      10    100     1k    10k   100k
// 64    :   0.975  0.954  0.980  2.198  2.109
// 128   :   0.871  0.779  0.788  1.540  1.096 * best overall
// 256   :   0.918  0.854  0.897  1.830  1.011
// 512   :   0.906  0.777  0.813  1.646  0.983
// 1024  :   0.874  0.780  0.804  1.447  0.824
// 2048  :   0.882  0.838  0.803  1.376  0.667 * best for long lists + overall
// 4096  :   0.903  0.812  0.893  1.391  0.648
// 8192  :   0.899  0.835  1.041  1.539  0.641
// 16384 :   0.898  0.820  1.641  1.812  0.655
// 32768 :   0.914  0.829  1.643  2.344  0.712
// 65536 :   4.563  1.291  1.782  4.497  0.968
//
// now numbers for eina list:
// list  :   1.632  1.794  2.116 12.587 12.521

now for lists of 10 items, a list is only about 60% slower. it'll use about 2x the memory too. this is super common in efl - to have smallish lists. of course that's for 10 items. a block size of 128... which seems to be the best overall middle-ground... uses more memory until we hit 5 items. then it wins... so at 4 items or less it's "the same". you can use a pure array without a block thing around it of course, and then you also need to keep a count and an alloc size, so at least 2 more ints plus the array itself - so arrays are not totally zero cost. do u keep the count and alloc count in the parent struct or in an eina_array-like one or... anyway. on 64bit an array is going to cost you at minimum 16 bytes. it cannot cost less, due to malloc implementation bucketing. if we have a generic eina_array and extend it to insert/remove in the middle then you end up with at LEAST 16 bytes as a base, THEN another 16 bytes for the first chunk. but that means u alloc in lots of 2 ptrs. let us for now just stick to storing ptrs to things. it's simpler. your example of ptr + bool is rare in real life. what is common is just plain ptrs, or fairly large structures (of at least multiple 10's of bytes if not 100's). so to be efficient with an array we will almost always alloc in steps of N like 8, 16, 32. so... a default might be 16, so with 1 item in an array (1 ptr) we will almost always alloc 16 bytes + 128 bytes in 2 chunks. now the cost is the same until we fill that (and that will be 16 pointers). an eina list on 64bit will cost less until we go over 4 items. look at the above numbers. the numbers are much closer, and here we're not polluting the cache very much in between. until we exceed 1k items, lists and arrays are within about 2x speed of each other. at 10k things begin to suck due to the excess memory loads. but so do "almost pure arrays" - see how the numbers at 10k items get much worse (more than 2x) vs a hybrid linked list/array (block sizes of like 256/512/1024)...

my point is that for small lists, eina list is actually pretty much a win. unless you have magic auto-adapting data structures (and blist is intended to be able to do that in future). once you hit 5 items an array begins to win, but not by the kind of amounts you might expect. by having a linked list of arrays you get the best of both worlds, where shuffling items up/down is only needed a little bit, for smallish blocks of items. blist could be extended to store items of size N, but then it would need an allocator too that inserts PLUS allocates and returns the ptr so u can fill it in. that ptr will NOT be valid long-term (unlike inlist or eina list data ptrs u insert), because as items shuffle up/down that ptr will change. arrays/vectors have different PROPERTIES. you CAN do this with a list:

x = alloc_obj();
x->a = 10;
x->b = 20;
append(list, x);
parent->x = x;
/* ... a long time later, after the list has been modified a lot, but x still exists */
parent->x->b = 30;

you CANNOT do that with an array where x is stored inline in the array. you CANNOT. it means every ptr to x used to access it directly, or every index, needs updating every time you modify the array, as the item you added may have moved in memory. every single time. this is a cost you are absolutely NOT accounting for. this is in fact quite common. the alternative is the array holds pointers TO items like eina list does. that means always the indirection and allocation you are saying "sucks". indeed it's a balance of these factors. CS101 does not cover all the subtleties of how data may be referenced across a codebase and when you want references to remain constant and usable without having to track every reference to an item and update them all. the above is the kind of real code we have in efl. we have structs pointing to structs, lists of structs that point to others etc. - storing inline is often a special case thing where possible, not the common case.
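the contrasting inline-array case, in the same sketch style (the counts are illustrative):

struct A { int a, b; };
struct A *arr = malloc(16 * sizeof(*arr));   /* say 12 of 16 slots in use */
struct A *x = &arr[5];                       /* keep a direct reference... */

/* inserting at slot 2 shifts slots 2..11 up by one */
memmove(&arr[3], &arr[2], 10 * sizeof(*arr));

/* x still points at slot 5, but the item that lived there has moved to
 * slot 6 - the stored reference is now silently wrong. and a realloc()
 * to grow the array can move the whole block, invalidating x entirely. */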

don't get me wrong - this debate is good. it needs to be had. yes, we can improve things. i'm sure there are improvements to be made. i know there are. it's why i was taking a stab at blist, to begin the train for that. i wanted a good design that could go forward efficiently and nicely, then be used where appropriate. i do think though you are being far too CS101-ey and thinking theory and not practice. all the examples of vectors do NOT deal with "you want to reference some object you inserted from 12 places in your code in 12 other structs and u want that reference to be constant regardless of the "list like container" it's stored in". what we're asking for is whether u identified real hotspots and issues that could do with some focus - like you did with the eet dict locks :)

Well, all I can do is highlight the problem and see if we can do something about it, or at least change the mindset (popular belief). If our mindset is that Eina_List is the best data structure, then, well, there is no point in arguing.

that's not the case. see above. but you need to, for example, go "i've looked at the clippee lists in evas. they seem to get quite long. i've done some debug code and counted items and statistics say X" (some frequency data on list lengths, so we have an idea of how many lists have 0 items, 1 element, 5, 10, 200, 1000 etc.), and then we begin to go "ooooh wow - so many lists here have 100+ items..." and we begin to think about it. then maybe you say "i replaced all the clippee stuff with an array - i had to add a remove func to eina array to remove in the middle - and i saw performance change to X or memory footprint to Y". ... do some of THIS stuff... identify hotspots. eina list is not changing, because it's an abi and the structs and their layout are exposed. it cannot change. we can change how we alloc list nodes and the data that goes into list nodes. things can be converted to inlist to merge node + item in cases where this is worth it. again - identify those cases. we already had that done for major uses. there are possibly more of them to improve on. if there are significant uses somewhere of trivial structures (ptr+bool for example) that indirect and could be inlined... then point them out. not theory. find them in efl code :) arrays and lists have different properties. removal from the middle of a list is very cheap. removal from a long array is expensive, even with caching effects. if data is inlined it's more expensive, and the win for list gets better at some point. the wins will also vary from system to system, arch to arch, depending on caching, memory subsystem etc., and from generation to generation. so to be really good you need to test on very old cpu's and very new ones, arms vs x86, low end arm systems that may have much faster memory buses vs cpu clocks, vs fast x86's with super fast cpu clocks and relatively slower memory but huge caches etc.

Our default theme has 133 styles (in a list) and each style has a list of tags (a list).

indeed edje never imagined our edj/theme files would become anywhere near as big as they are. i never expected them to be even 1/10th the current size. i expected maybe 10 or 20 styles... never the monster it is. the thing is... you can't change it from a list. it has to be a list because that's the data format on disk and the decode/encode. to change this will be to break it, and you have to deal with the old style list format AND a new added array - so 2 paths etc. - so changing it has to be worth the effort. this may be worth it and save some memory. be aware that the evas text styles can change internally, so that can be changed, but we won't create every single one all the time - we'll create on demand... right? so only those USED will matter on the evas side... :) so it's probably closer to 70k of usage (60k edje + 10k in evas), maybe 80k. it also will need multiple arrays each with their own counts, so your numbers will go up for arrays. a dynamic array needs a count plus an alloc count - 2 ints per array - so 133 times that overhead (133*8, close to 1k there), and if they use things like eina array they'll over-allocate, or building them will be a bit sucky... etc. ... but remember you are now talking a generic container vs a special case, where here we know the count of tags ahead of time and thus could alloc in 1 go. we could allocate all the strings in a single memory block and just index into it, with strings separated by nul bytes. the evas style side could do this and have a "compact" phase where, after it is built, it removes all the internal lists and builds an index into a single char * blob. now u don't need a ptr per tag.. u need an INT per tag. it's now string = style->string_blob + index; we could even use a short if the blob is < 64kb in size :) ... this will drastically reduce the # of allocations and thus fragmentation and malloc overhead. it won't recycle stringshare though, which may be bad. this needs to be looked at - what is the cost of keeping it as a single blob of strings (i.e. "tag1 here\0tag2\0tag3 here=10\0tag4" ...) vs ptrs to stringshares? needs to be measured.

but now you're talking a specific case, and this is practical and useful. we all know what an eina list costs in memory. we know the implications of walking the lists and modifying them. they are often just a lazy default choice, or chosen because lists have specific properties. yes - things can improve. like you say. i can do even better than the naive "just use an array of pointers" like above. if we have special cases you can make very compact stuff. imagine using just shorts for each style tag (well, a short for key, a short for value). instead of 1727 tag nodes at 2 ptrs each in an array (13816 bytes) we can do that in 3454 bytes with the short indexes. have the evas textblock styles auto-adapt. if the text blob is too big it has to move to ints for indexes... so it needs to have an ints vs shorts path (i doubt any style will exceed 4gb of ram so we don't need more than ints... (well, uints))... so i can do better. :) will using stringshares cost less in shared string content vs what we save in index sizes? that needs statistics to see. :) so do that and get some data from a running app and content in memory (by putting probing code in edje/evas to get some stats). :)
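a sketch of the single-blob + small-index idea, mirroring the string = style->string_blob + index line above (the structure and names are hypothetical):

   typedef struct
   {
      char           *string_blob;   /* "tag1 here\0tag2\0tag3 here=10\0..." */
      unsigned short *offsets;       /* one small index per tag while the blob is < 64kb */
      int             count;
   } Style_Tags;

   static const char *
   style_tag_get(const Style_Tags *s, int i)
   {
      return s->string_blob + s->offsets[i];   /* no per-tag pointer, no per-tag alloc */
   }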

what we want is the above kind of data - with a view of the specific data being encoded: what is it, the nature of that data, and... how it could possibly be reduced, in many ways. but, for example, the shorts/ints vs char *'s to stringshares decision depends on the nature of the data, so we need that to make a decision.

see where we're coming from? need this kind of analysis. :)

I tried array/inlist for clippers and it was way worse, because of the access pattern.

you tried that? cool. i never had. i was wondering about this. i am hoping blist might be better, as it does a merge of array + list and seems to have the best of both worlds except with very very very short lists. :) i guess trying again might be worth it if we have something better than just plain arrays.

Yes, Eina_List is designed with the idea that we have small lists and often empty lists. Everything has a tradeoff

correct. it was designed with this in mind - our lists would be fairly small and often totally empty. also it was designed for simplicity, and because often enough u want a count of a list and don't want to walk it. it was also designed with malloc implementations in mind where 24 bytes == 32 bytes in the implementation, as well as 12 == 16 on 32bit. in the cases where it didn't work well, other things were available to use. :) there are probably such cases through efl where we can save some memory or get speedups by custom-optimizing for a specific use case, and that is what we need the data/benchmarks for :)

There is no better data structure. Their are good and bad use.

I think i need to clarify frenglish or maybe cedrish :) I know what you mean, but to be clear - you mean that eina list is not "the best data structure ever and all others suck". you mean that it is just another kind of structure with its benefits and downsides. it has properties that other structures will not have, and vice-versa. you need to pick the right tool for the job. it may be that there are better structures (tools) for some other cases. finding those cases and detailing them, and then finding the best path to improve them, is what matters - and that needs data. :)

This being said, the API could be entirely changed to allow the sharing of style information between edje and evas, using for example an Eina_Accessor. Deduplicating data would allow for even more memory reduction and, more importantly, stop the slow transfer of the information between edje and evas. Not saying that this is the best solution here, but it seems that there is room for improving the API.

This is up for debate and it's probably a good thing to have at something like E dev day :) or on irc... my idea of a single text blob with 0 byte delimited strings and an index pointing into that blob might actually be best, even if tag strings and tag content end up duplicated - it reduces the indexes down to 1/4 the size. having a way of importing that tag blob into evas directly from edje (and having an updated edje on-disk format that stores it this way) may help too. it's hard though to do this and keep a clean api. this is one of those cases where "maybe a back-door C only api that lets us set the string ptr directly" is needed. :) but... up for debate, and this kind of debate on what to do is the useful bit :) the whole eina list discussion above is a red herring that just brought us to this point, where we can save some 10's of kb's perhaps, and that's nice. there is work to be done to do it. how much effort will *I* spend to save 80kb... it depends what other things i have to do... and i have a LOT of things to do. :) it's a matter of prioritization. how much could be saved, and with how much work? what impact to api's and design... and then what impact to performance?

blist is something that can be widely applied across many bits of efl. not all, but many. it's worth a lot of effort because of the wide application possibilities and the savings for longer lists. with some effort even super short lists can improve. so this is more of a "make a better data struct for these use cases" thing, and we know use cases for it already - clippees was indeed one of those i had in mind :) but the styles thing is very focused and specific and needs a different kind of effort and evaluation. :)

Updated: We always use EINA_MAGIC; even in Tizen people are scared to turn off EINA_MAGIC_DEBUG, so the list node data size = 5 pointers = 20/40 bytes.

oh yeah. eina magic was added much later, after i designed eina list to nicely fill a 32 byte chunk. it won't use 40 bytes now - it'll actually use 48, due to malloc bucket sizes..

Now all the internals of list/array are out in the open for future reference, so I will close this issue.

it's been out in the open forever. people didn't read the code it seems. :) there is a long history, and the magic was added i think because we were having memory corruption issues often enough that we wanted to catch them. originally it was not there. i'm wondering if we should keep it these days? with freeq we have memory poisoning on free, so a list node will have a pattern with freeq turned on, but no poisoning by default. i actually rarely-if-ever see magic issues with eina list... maybe just turn on poisoning by default (set fillpat_max to like 64 or 128 by default)? then a freed list node will have a fill pattern to it and be easy to identify. a corrupted one though... magic helps there. but maybe, again - turn this off on release builds. tizen should too, but... it'd save 16 bytes per node. @cedric ?