Improving scaling of image data with Evas and OpenGL

Better quality downscaling with Evas, OpenGL and Shaders.

This is my first post here on our new Phabricator setup. I'm testing the blog infra, and also writing some useful information at the same time. So welcome to our new and funky infra.

The situation

So thanks to some work @zmike is doing, Enlightenment now has the compositor on ALWAYS. This has been in the plans for years, and is a step toward improving Enlightenment, cutting down memory usage and paving the way for future support such as Wayland compositing directly to KMS/DRM buffers, fbcon and so on. It also simplifies code internally and opens up possibilities, as well as fixing some bugs (like iBar icons being cut off by the shelf when they pulse+zoom). To make this transition more efficient we are removing internal windows from Enlightenment and moving them to be objects directly in the compositor canvas.

This move means that more and more content is rendered by, and lives inside of, the compositor canvas exclusively. This has some downsides but many upsides. One of the upsides is that everything, even apparent window "content" like the Shelf, Menus, Popups etc., is 100% rendered by your compositor's rendering engine. This may be software or it may be OpenGL. This means we accelerate almost EVERYTHING through GL... even all the rendering of text, icons and more, if that is what you selected.

Side effects

This came with a side-effect. A downside. OpenGL just can't do 2D nicely. Not without some beating into shape. OpenGL was built for 3D. It is clear this is what it is really meant for. I've been beating OpenGL into doing 2D for a dozen years now. When it comes to 2D with OpenGL, we are recycling a tool meant for something else. 2D isn't a complete subset of 3D. That is a whole topic on its own, but anyone who has tried to seriously use OpenGL for 2D will attest to this.

So one of the things OpenGL can do is provide filtered scaling, to a limited extent. It can use linear interpolation on upscaling, and even on downscaling. It can use mipmaps, and all sorts of combinations of these with bi-linear and tri-linear interpolation, anisotropic filtering etc. This is great for 3D, but unfortunately the only one of these that is of any use to us in 2D is linear interpolation. Mipmaps add a fat memory requirement (since each level is a quarter the size of the one below, a full mipmap chain costs about 33% more memory: 1/4 + 1/16 + 1/64 + ... = 1/3, plus the cost of generation) AND we also have to handle scaling by non-uniform factors, for example 80% x 10% when stretching. This leads to needing a more complex mipmap setup that blows memory usage out badly.

So what do we do? Well, so far Evas has ASKED OpenGL to use anisotropic filtering at the maximum level and relied on linear interpolation. The reality is that anisotropic filtering doesn't work without mipmaps etc., and linear interpolation only provides decent quality down to 50% of the size of the original. Below that level of scaling it gets rather ugly. This just so happens to be something that Enlightenment does a lot for gadgets, icons and more.

Sampling and scaling in OpenGL

So first let's look at how linear interpolation works and why it equates to a 4-point multi-sample with weighting. When you linearly interpolate, you sample the 4 neighboring texels and compute a weighted average.

In this example we weight the bottom-right texel more than the other 3, giving an interpolation between the 4: a weighted average. If we continue using linear interpolation when scaling below 50% (e.g. to 25% of the original size), we end up doing a weighted average of only 4 texels out of a logical sample region of 16 (4x4) texels.
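To make the weighting concrete, here is a rough GLSL sketch of the computation GL_LINEAR performs in hardware, written out by hand. This is purely illustrative and not part of the Evas shaders: the "texel" uniform (one texel's size in texture coordinates) and the GL_NEAREST-filtered "tex" sampler are assumptions for the demo.

uniform sampler2D tex;   // assumed to be filtered with GL_NEAREST here
uniform vec2 texel;      // assumed uniform: (1/tex_width, 1/tex_height)
varying vec2 tex_c;
void main()
{
   // locate the 2x2 texel neighborhood around the sample point
   vec2 p = (tex_c / texel) - 0.5;       // position in texel units
   vec2 f = fract(p);                    // sub-texel position within the 2x2
   vec2 t00 = (floor(p) + 0.5) * texel;  // lower-left texel center
   vec4 c00 = texture2D(tex, t00);
   vec4 c10 = texture2D(tex, t00 + vec2(texel.x, 0.0));
   vec4 c01 = texture2D(tex, t00 + vec2(0.0, texel.y));
   vec4 c11 = texture2D(tex, t00 + vec2(texel.x, texel.y));
   // the weighted average: each texel contributes in proportion to how
   // close the sample point is to its center
   gl_FragColor = mix(mix(c00, c10, f.x), mix(c01, c11, f.x), f.y);
}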

This means we fail to account for 75% of the image information when downscaling to this level. This leads to rather ugly results, and it soon visually degrades to not being much better than nearest-neighbor scaling.

The solution

After thinking about all the things I could do (mipmaps, a scale cache like the software engine uses to keep high-quality scaled copies of frequently used data), and spending more time wondering why anisotropic filtering wasn't doing its multi-sampling without mipmaps, z-buffers etc. ... it dawned on me. Such a simple solution that it evaded my first thoughts... use GLSL! We already require it anyway. Just do the sampling ourselves manually in a shader, and of course only select this shader when downscaling sufficiently. So the shader now looks like this:

Vertex

attribute vec4 vertex;
attribute vec4 color;
attribute vec2 tex_coord;
attribute vec2 tex_sample;
uniform mat4 mvp;
varying vec4 col;
varying vec2 tex_c;
varying vec2 tex_s[4];
varying vec4 div_s;
void main()
{
   gl_Position = mvp * vertex;
   col = color;
   tex_c = tex_coord;
   // the 4 tap offsets: one per corner of the sample footprint
   tex_s[0] = vec2(-tex_sample.x, -tex_sample.y);
   tex_s[1] = vec2( tex_sample.x, -tex_sample.y);
   tex_s[2] = vec2( tex_sample.x,  tex_sample.y);
   tex_s[3] = vec2(-tex_sample.x,  tex_sample.y);
   // divisor used to average the 4 taps in the fragment shader
   div_s = vec4(4, 4, 4, 4);
}

Fragment

uniform sampler2D tex;
varying vec4 col;
varying vec2 tex_c;
varying vec2 tex_s[4];
varying vec4 div_s;
void main()
{
   // take 4 bilinear taps offset around the pixel center...
   vec4 col00 = texture2D(tex, tex_c + tex_s[0]);
   vec4 col01 = texture2D(tex, tex_c + tex_s[1]);
   vec4 col10 = texture2D(tex, tex_c + tex_s[2]);
   vec4 col11 = texture2D(tex, tex_c + tex_s[3]);
   // ... and average them, modulated by the object color
   gl_FragColor = ((col00 + col01 + col10 + col11) / div_s) * col;
}

This effectively makes us sample all 16 texels by offsetting our texture coordinate a bit in the x and y directions and sampling 4 times.
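A quick worked example of why this accounts for all 16 texels (assuming a 25% downscale and tex_sample offsets that put each tap at the center of one 2x2 quadrant of the 4x4 texel region; the exact offsets are computed on the C side, which isn't shown here): each bilinear tap is equidistant from its quadrant's 4 texel centers, so it averages them equally, and the shader then averages the 4 taps:

   result = (1/4) * sum over the 4 taps of [ (1/4) * sum of that tap's 2x2 texels ]
          = (1/16) * sum of all 16 texels

So every texel contributes the same 1/16 weight, instead of most of them contributing nothing.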

Of course this covers a case mipmapping does well - scaling down by the same proportion horizontally and vertically. Never fear. I implemented not just the 2x2 linear-interpolation multisampler (which is effectively a 16-point sample), but also 2x1 and 1x2 variants for more speed in those cases, as sketched below. I could extend it to do more, like 3x1, 3x3, 3x2, 2x3, 1x3, 4x1, 4x2, 4x3, 4x4, etc.
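For illustration, here is a minimal sketch of what the 2x1 fragment shader could look like, modeled on the 2x2 shader above; this is my sketch of the idea, not necessarily the exact code that landed in EFL. The matching vertex shader would set only two offsets, tex_s[0] = vec2(-tex_sample.x, 0.0) and tex_s[1] = vec2(tex_sample.x, 0.0), and a divisor of div_s = vec4(2, 2, 2, 2):

uniform sampler2D tex;
varying vec4 col;
varying vec2 tex_c;
varying vec2 tex_s[2];
varying vec4 div_s;
void main()
{
   // two bilinear taps offset along x only, then averaged
   vec4 col0 = texture2D(tex, tex_c + tex_s[0]);
   vec4 col1 = texture2D(tex, tex_c + tex_s[1]);
   gl_FragColor = ((col0 + col1) / div_s) * col;
}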

Results

Now of course you may be saying "But what if we scale down even more? Won't this be insufficient? Shouldn't you implement the 3x3 and 4x4 (and permutations), or will we have quality problems?", and you may have a point, but in actual testing this doesn't seem to be the case in real life. So for now this is enough. My rough speed testing shows the 2x2 multisample to be about half the speed of the normal naive linear-interpolation version. It's only used when downscaling this far, so it's not normally a "hit" until you have to scale like this. So what are the results? Well, see below: before (above) and after (below). The quality and smoothness improvements are drastic and amazing. Judge for yourself:

The software engine doesn't have quality problems because its downscaling is already a full weighted-area super-sampler, and always has been. This means it can be slow when downscaling, but it makes no compromises on quality and looks really good. With this shader magic, the OpenGL engine is now almost as good, while still being faster. Also, never fear... this works for OpenGL-ES2 as well.

This code is now in EFL Git as revision rEFL683e5d7d0848b0b044eca151c61ad2254dac2e63, is already available for pulling, and will be part of EFL 1.8 when it is released.