Page MenuHomePhabricator

evas_image_loader spins on error, user sees 100% cpu usage
Open, Incoming QueuePublic

Description

Refer to this file in efl: https://git.enlightenment.org/core/efl.git/tree/src/generic/evas/common/shmfile.c#n73

In function shm_alloc:
ACTUAL:

do
  {

...

  }
while (shm_fd < 0); // endless loop on some errors

EXPECTED:
#ifdef NEED_TMP_SHM

// linux and solaris do not allow multiple <slash>, but some systems do
static const char * EVAS_SHM_FNAME="/tmp/evas-loader"

#else

static const char * EVAS_SHM_FNAME="/evas-loader"

#endif

do
  {
     snprintf(shmfile, 1024,  EVAS_SHM_FNAME ".%i.%i", (int)getpid(), (int)rand());
     shm_fd = shm_open(shmfile, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
     if (shm_fd<0)  {
         // LOG error code
        switch (errno) {
             case EINTR:  
                     // did this process handle signal? Should we check some value and goto failed or exit()?
                     break; // probably we can try again, maybe check a counter(?), but we could also just skip the shared memory optimization.
             case EEXIST:
                     // This should not happen, process id is in the name, this should not collide. Is some instance loading more than one image?
                     // this should not happen because of rand(), did someone forget to call srand in main? Should not matter on BSD (non standard rand).
                    continue; // evil code says try again. Really? should we have counter to give up after a few retries?
             case EACCES: // no permission? not transient, not recoverable
                      // some systems expect and allow path=/tmp/evas_image_loader... this would fail first and every time
                     goto failed; // don't need shm after all?
             case ENFILE:
             case EMFILE: // doh, no file handles. other programs for enlightenment may fail too.
                     goto failed;
             case ENOENT: // unexpected. should not happen. we have O_CREAT -- can not recover
              // these should never happen
              case ENAMETOOLONG: 
              case EINVAL: // maybe we should not have defined NEED_TMP_SHM after all.
               default:  // linux man does not define ENOSPC ? POSIX does, this could happen.
                       goto failed;
         }
      }
  }
while (shm_fd < 0);

DISCUSSION:
Looping over the error condition implies the error is expected be transient. When would that happen?
The windows code seems to expect EEXIST, and rand() will resolve this, but now the process consuming inodes (how would that be cleaned up?)
EINTR is probably safe to retry, but other errors are not, unless some other process will release file handles while we are running; but then we might not get any benefit from this optimization.

This issue should have been mentioned in coding standards...
Endlessly looping over an error condition is deprecated.

 Only transient (EAGAIN and sometimes EINTR) or recoverable  error conditions should be retried.
 In some cases a call that returns an error can be retried with new parameters (like a new file name). 
         If a user or administrator should take action to fix an external condition, a log message should indicate the action item.
If an error is transient, a comment should indicate how and where another program or part of this program resolves the error.