Doctrine Common
DCOM-130

Paths in Doctrine\Common\Cache\FileCache could create large directory indexes

    Details

    • Type: Bug
    • Status: Awaiting Feedback
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3
    • Fix Version/s: None
    • Component/s: Caching
    • Labels:
      None
    • Environment:
      Any

      Description

      The way paths are currently created within FileCache, there is a theoretical maximum of 16^12 directories directly inside the cache directory, which is quite a large number. Schemes like this are usually used to restrict the number of files in any one directory.

      Comparing with git, for example, the dirs are arranged

      00/
      1c/
      ...
      ff/

      and then the objects are stored within those directories, which is a lot more manageable: if you happen to type ls in the cache directory, you will get a listing of at most 256 dirs. PhpThumb does something similar when caching images.

      How about something like this for getFilename():

      $idHash = md5($id);
      $path = substr($idHash, 0, 2) . DIRECTORY_SEPARATOR . substr($idHash, 2, 2) . DIRECTORY_SEPARATOR . substr($idHash, 4);
      $path = $this->directory . DIRECTORY_SEPARATOR . $path;

      return $path . $id . $this->extension;

      Not nearly so elegant, but I think this has better properties for the file system. I would also be tempted to use one of the SHA family of hashes and not include the $id within the filename, but perhaps that is helpful for debugging?
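      To make the proposed layout concrete, here is a small self-contained sketch of the path it would produce; the $id, cache directory, and extension values below are invented purely for illustration:

      <?php
      // Hypothetical stand-alone demo of the proposed scheme; the id, cache
      // directory and extension are made-up example values.
      $id        = 'my_cache_entry';
      $directory = '/tmp/doctrine-cache';
      $extension = '.doctrinecache.data';

      $idHash = md5($id);
      // two 2-character directory levels, then the remainder of the hash
      $path = substr($idHash, 0, 2) . DIRECTORY_SEPARATOR
            . substr($idHash, 2, 2) . DIRECTORY_SEPARATOR
            . substr($idHash, 4);
      $path = $directory . DIRECTORY_SEPARATOR . $path;

      // e.g. /tmp/doctrine-cache/ab/cd/<remaining 28 hex chars>my_cache_entry.doctrinecache.data
      echo $path . $id . $extension, PHP_EOL;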

        Activity

        Julian Higman added a comment - edited

        We hit this problem in a live system - with a lot of cached items, the number of subdirectories that FileCache creates can exceed the number that an ext3 filesystem allows in a single directory (about 32000).

        After that, an attempt to cache a new item can get an error like this:

        mkdir() [function.mkdir]: Too many links

        Our solution was similar to that suggested:

        
            protected function getFilename($id) {
                // one directory level per 2-character segment of the md5 hash
                $path = implode(DIRECTORY_SEPARATOR, str_split(md5($id), 2));
                $path = $this->directory . DIRECTORY_SEPARATOR . $path;

                return $path . DIRECTORY_SEPARATOR . $id . $this->extension;
            }
        
        

        It splits the md5 of the item id into parts of length 2, rather than the original 12. This creates a deeply nested structure, but one that won't ever exceed the limit on the number of subdirectories in any one directory. It's also the same subdirectory pattern used by default by Apache mod_disk_cache.
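        For a sense of the resulting depth, here is a small illustrative sketch (the id and base directory are invented): a 32-character md5 split into 2-character parts yields 16 directory levels per cache entry.

        <?php
        // Illustration only: show how deeply the 2-character split nests.
        $id        = 'some_entry';            // made-up id
        $directory = '/var/cache/app';        // made-up base directory

        $parts = str_split(md5($id), 2);      // 32 hex chars / 2 = 16 segments
        $path  = $directory . DIRECTORY_SEPARATOR
               . implode(DIRECTORY_SEPARATOR, $parts)
               . DIRECTORY_SEPARATOR . $id;

        echo $path, PHP_EOL;                                  // e.g. /var/cache/app/ab/cd/.../01/some_entry
        echo count($parts) . ' directory levels', PHP_EOL;    // prints "16 directory levels"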

        Julian Higman added a comment -

        After a couple of months in production, we ran into another problem with this - we reached the maximum number of inodes in the filesystem.

        The resulting errors look like this:

        mkdir() [function.mkdir]: No space left on device

        There is actually disk space left, but looking at the inodes shows that the limit has been hit:

        -bash-3.2# df -i
        Filesystem     Inodes   IUsed IFree IUse% Mounted on
        /dev/sda1     6553600 6553600     0  100% /

        The creation of directories and subdirectories can be constrained slightly by using 3 instead of 2 characters (with hex chars, that gives a maximum of 16^3 = 4096 subdirectories per directory, still less than the ext3 limit of about 32000):

        $path = implode(DIRECTORY_SEPARATOR, str_split(md5($id), 3));

        but ultimately the inodes will still all be used up.

        The only other options are pruning the cache at intervals, or switching to a different caching strategy altogether.
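        As a rough illustration of the pruning option, here is a minimal stand-alone sketch (not part of FileCache; the cache directory and age threshold are assumptions) that deletes cache files older than a given age and removes any directories left empty:

        <?php
        // Hypothetical pruning script, not part of Doctrine.
        $cacheDir = '/var/cache/app';   // assumed cache location
        $maxAge   = 86400;              // assumed threshold: one day, in seconds

        $iterator = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS),
            RecursiveIteratorIterator::CHILD_FIRST
        );

        foreach ($iterator as $file) {
            if ($file->isFile() && time() - $file->getMTime() > $maxAge) {
                unlink($file->getPathname());                 // stale cache entry
            } elseif ($file->isDir()
                && !(new FilesystemIterator($file->getPathname()))->valid()) {
                rmdir($file->getPathname());                  // empty dir, reclaim its inode
            }
        }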

        Marco Pivetta added a comment -

        Julian Higman I'd suggest that file-based caching mechanisms are not suited to that environment. The file cache is really meant for environments where there are strict constraints (like shared hosting).

        Loban Rahman added a comment -

        This bug has lain idle for a couple of years. The situation is now worse, since the code currently has

        str_split(hash('sha256', $id), 2)

        Our production system also ran out of inodes. Saying that "file-based caching mechanism is not suited for that environment, and is meant for those environments with strict constraints like shared hosting" doesn't make sense, because those environments are even more likely to run out of inodes.

        The proposed solution is simple and would solve this problem. Should I make a pull request?
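        For scale, a small illustrative check (the id is invented) of how much deeper the sha256-based split nests compared with the md5 variant shown above:

        <?php
        // Illustration only: sha256 produces 64 hex chars, so splitting into
        // 2-character parts gives 32 directory levels per entry (vs. 16 for md5).
        $id = 'example_id';   // made-up id

        echo count(str_split(hash('sha256', $id), 2)), PHP_EOL;  // 32
        echo count(str_split(md5($id), 2)), PHP_EOL;             // 16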

        Trent added a comment -

        Yes, this is definitely worse now because of the sha256 hash. I have a project where there are now so many directories that it can take over an hour to do a "rm -rf app/cache".

        Is there a fix in the works for this? Perhaps the length of the folder names needs to be increased?

        I don't think reducing the number of folders would be an issue, since the file's final destination is named with the id itself (so there shouldn't be any conflicts).

        str_split(str_pad(substr(preg_replace('/[^0-9]/', '', hash('sha256', $id)), 0, 3), 3, '0'), 1)

        The above takes only the digits (0-9) from the hash and creates 3 levels of subdirectories (each level allowing at most 10 folders, named 0-9).
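        For what it's worth, a small stand-alone demo of what that expression yields (the id is invented):

        <?php
        // Illustration of the digit-only split above; the id is made up.
        $id = 'example_id';

        $digits = preg_replace('/[^0-9]/', '', hash('sha256', $id)); // keep digits only
        $parts  = str_split(str_pad(substr($digits, 0, 3), 3, '0'), 1);

        // Three one-character levels, each a digit 0-9, e.g. "4/7/0"
        echo implode(DIRECTORY_SEPARATOR, $parts), PHP_EOL;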


          People

          • Assignee: Benjamin Eberlei
          • Reporter: R Churchill
          • Votes: 0
          • Watchers: 6

          Dates

          • Created:
          • Updated: