Page MenuHomePhabricator

eina_file_path_sanitize generating faulty paths.
Open, HighPublic

Description

In case we have a symbolic link link this:
/usr -> /usr/lib

now you call eina_file_path_sanitize("/bin/../usr/../include/) the result will be /include/.
Which is wrong, since this path definitly references /usr/include/ in every shell, (which should probebly be the prefered way for doing this)

bu5hm4n created this task.Aug 17 2018, 2:53 PM
bu5hm4n triaged this task as High priority.

Which is wrong, since this path definitly references /usr/include/ in every shell, (which should probebly be the prefered way for doing this)

you're shell uses libc realpath which does resolve links:

realpath() expands all symbolic links and resolves references to /./,
/../ and extra '/' characters in the null-terminated string named by
path to produce a canonicalized absolute pathname.

from the name eina_file_path_sanitize i would assume it's only supposed to sanitize (or clean) a pathname, without checking for it's existance (so no syscall required).

Sure but the problem is: you can put into sanitize a valid path, and you will get back a not existing path, or in worst case, a file that you did not want. Which is 100% unexpected & a problem for us.

We are calling sanitize on the paths, before we open them (in eina_path.c) when eina_path_open is called. And in a lot of other places. Those places where this is called are not looking to me like they know that this will happen.

Site note: realpath only translate a symlink into a real path,in this example /usr -> /usr/lib, Not a whole path that contains a reference to a symlink.

We are calling sanitize on the paths, before we open them (in eina_path.c) when eina_path_open is called. And in a lot of other places. Those places where this is called are not looking to me like they know that this will happen.

then these calls to sanitize in those functions should be removed, as they are not only unnecessary, but also do not add anything to speed (the kernel has to do this work anyway and does it better) nor security.

side note: sanitizing a file path usually only makes sense if you get a file path from a non-trusted source like network services (http, ftp, smb) and you want to check if your program actually should to deliver such a file to an attacker. however, i don't see anything in eina_file.c that assumes it's working on non-trusted paths. so especially in eina_path_open() using sanitize defeats its own goals.

found the commit from @cedric a8d945f0a686f4359ca28ce288ca09e7ccb20ea2

eina: actually sanitize all file inserted in the cache.

-   /*
-     FIXME: always open absolute path
-     (need to fix filename according to current directory)
-   */
+   filename = eina_file_path_sanitize(path);
+   if (!filename) return NULL;

so eina_file_path_sanitize was actually introduced so a hash table used for caching [file-name]->[file_contents] accurately works for files with multiple relative paths in cache.

this could make the solution much simpler: only use eina_file_path_sanitize for setting the key in the hash table, but not for opening the file itself. however, stat is also called anyways, so why not get the information from there?

also eina_file_open is actually written to do it this way, because it first uses a syscall for opening a file, checks if it could open it and then checks the cache if the contents of the file is already there. (//commtens by me)

   filename = eina_file_sanitize(path);
   if (!filename) return NULL;

   if (shared)
#ifdef HAVE_SHM_OPEN
     fd = shm_open(filename, O_RDONLY, S_IRWXU | S_IRWXG | S_IRWXO);
#else
     goto on_error;
#endif
   else
     fd = open(filename, O_RDONLY, S_IRWXU | S_IRWXG | S_IRWXO);

// call succeeded, file exists
   if (fd < 0) goto on_error;

   if (!eina_file_close_on_exec(fd, EINA_TRUE))
     goto on_error;

// stat however has all information to identify a file
   if (fstat(fd, &file_stat))
     goto on_error;

   eina_lock_take(&_eina_file_lock_cache);

// only after opening, see if the file is in cache
   file = eina_hash_find(_eina_file_cache, filename);
   if ((file) && !_eina_file_timestamp_compare(file, &file_stat))
     {
        file->delete_me = EINA_TRUE;
        eina_hash_del(_eina_file_cache, file->filename, file);
        file = NULL;
     }

   if (!file)
     {
        n = malloc(sizeof(Eina_File));

this could make the solution much simpler: only use eina_file_path_sanitize for setting the key in the hash table, but not for opening the file itself. however, stat is also called anyways, so why not get the information from there?

No this is still a bad idea, since 2 paths (A,B) which are pointing to 2 different files can still point to the same file after eina_file_path_sanitize. And well I dont see how you can get the full path of a file from the stat structure ...

also eina_file_open is actually written to do it this way, because it first uses a syscall for opening a file, checks if it could open it and then checks the cache if the contents of the file is already there. (//commtens by me)

To be honest, i cannot believe that eina_file_open is written intentionally to map two different resulting paths into one file. Or open files in a different place then where your shell would open it, thats 100% unexpected to me...
I think we can just replace eina_file_sanitize calls with realpath(...) calls, which would resolve those issues.

this could make the solution much simpler: only use eina_file_path_sanitize for setting the key in the hash table, but not for opening the file itself. however, stat is also called anyways, so why not get the information from there?

No this is still a bad idea, since 2 paths (A,B) which are pointing to 2 different files can still point to the same file after eina_file_path_sanitize. And well I dont see how you can get the full path of a file from the stat structure ...

it's not necessary to have the path for uniquely identify a file. stat contains the filesystem id and the inode number which together form a unique identifier for any path.

also eina_file_open is actually written to do it this way, because it first uses a syscall for opening a file, checks if it could open it and then checks the cache if the contents of the file is already there. (//commtens by me)

To be honest, i cannot believe that eina_file_open is written intentionally to map two different resulting paths into one file. Or open files in a different place then where your shell would open it, thats 100% unexpected to me...

no, what i meant is: eina_file_open wasn't intended to sanitize anything it opens, but to use a file cache. and only the file cache needs to know about links so it doesn't cache anything more then once.

I think we can just replace eina_file_sanitize calls with realpath(...) calls, which would resolve those issues.

or that