Doctrine Common / DCOM-130

Paths in Doctrine\Common\Cache\FileCache could create large directory indexes

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3
    • Fix Version/s: None
    • Component/s: Caching
    • Labels:
      None
    • Environment:
      Any

      Description

      The way FileCache currently builds paths, there is a theoretical maximum of 16^12 (about 2.8 × 10^14) directories directly under the cache directory, which is a very large number. Schemes like this are usually meant to restrict the number of entries in any one directory.

      Compare this with Git, which arranges its object directories as

      00/
      1c/
      ...
      ff/

      and then stores the objects inside those directories. This is far more manageable: if you happen to type ls in the cache directory, you get a listing of at most 256 directories. PhpThumb does something similar when caching images.
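      The Git-style layout described above can be sketched in PHP as follows. This is a minimal, hypothetical helper (the name gitStyleFilename and its parameters are illustrative, not part of Doctrine), assuming md5 is kept as the hash: the first two hex characters name the directory and the remaining 30 name the file.

```php
<?php
// Hypothetical helper, not part of Doctrine: lay out cache files the way Git
// lays out loose objects. The first two hex characters of the hash name the
// directory and the remaining characters name the file.
function gitStyleFilename(string $directory, string $id, string $extension): string
{
    $hash = md5($id);               // 32 hex characters
    $bucket = substr($hash, 0, 2);  // one of 256 directories: 00 .. ff
    $name = substr($hash, 2);       // remaining 30 hex characters

    return $directory . DIRECTORY_SEPARATOR . $bucket . DIRECTORY_SEPARATOR . $name . $extension;
}

echo gitStyleFilename('/tmp/cache', 'my-entry', '.data'), PHP_EOL;
```

      However many entries are cached, the cache directory itself never holds more than 256 subdirectories; the trade-off is that each bucket directory holds roughly 1/256 of all cache files.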

      How about something like this for getFilename():

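      protected function getFilename($id)
      {
          $idHash = md5($id);
          // Two directory levels of at most 256 entries each (00-ff/00-ff/)
          $path = substr($idHash, 0, 2) . DIRECTORY_SEPARATOR . substr($idHash, 2, 2) . DIRECTORY_SEPARATOR . substr($idHash, 4);
          $path = $this->directory . DIRECTORY_SEPARATOR . $path;

          // File name: remaining 28 hex characters of the hash, then the id
          return $path . $id . $this->extension;
      }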

      Not nearly as elegant, but I think this has better properties for the filesystem. I would also be tempted to use one of the SHA-family hashes and to leave the $id out of the filename, but perhaps it is helpful for debugging?
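      The reporter's closing suggestion (a SHA-family hash, no $id in the file name) might look like the sketch below. It is hypothetical: sha256Filename is an illustrative name, and it assumes the same two-level 00-ff/00-ff/ layout as the getFilename() proposal above, using PHP's built-in hash('sha256', ...).

```php
<?php
// Hypothetical variant of getFilename() following the reporter's suggestion:
// use SHA-256 instead of md5 and leave the raw $id out of the file name.
function sha256Filename(string $directory, string $id, string $extension): string
{
    $hash = hash('sha256', $id);       // 64 hex characters

    // Two levels of at most 256 directories each (00-ff/00-ff/), then the
    // remaining 60 hex characters as the file name.
    return $directory
        . DIRECTORY_SEPARATOR . substr($hash, 0, 2)
        . DIRECTORY_SEPARATOR . substr($hash, 2, 2)
        . DIRECTORY_SEPARATOR . substr($hash, 4)
        . $extension;
}

echo sha256Filename('/var/cache', 'user/42', '.cache'), PHP_EOL;
```

      The cost is the debugging aid the reporter mentions: the id is no longer visible when browsing the cache directory. In exchange, an id containing a character such as "/" can no longer add unexpected levels to the path.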

        Activity

        Julian Higman added a comment (edited):

        We hit this problem in a live system: with a lot of cached items, the number of subdirectories FileCache creates can exceed what an ext3 filesystem allows in a single directory (about 32000).

        After that, an attempt to cache a new item can get an error like this:

        mkdir() [function.mkdir]: Too many links

        Our solution was similar to that suggested:

        
            protected function getFilename($id) {
                // Split the 32-char md5 into 2-char chunks: 16 directory levels, at most 256 entries each
                $path = implode(DIRECTORY_SEPARATOR, str_split(md5($id), 2));
                $path = $this->directory . DIRECTORY_SEPARATOR . $path;
                return $path . DIRECTORY_SEPARATOR . $id . $this->extension;
            }
        
        

        It splits the md5 of the item id into chunks of length 2 rather than the original 12. This creates a deeply nested structure (16 levels), but one that can never exceed the subdirectory limit of any single directory, since each directory holds at most 16^2 = 256 subdirectories. It's also the same subdirectory pattern Apache mod_disk_cache uses by default.
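        To make the comparison between chunk lengths concrete, here is a small hypothetical helper (layoutStats is an illustrative name, not Doctrine API) that reports how many directory levels implode(DIRECTORY_SEPARATOR, str_split(md5($id), $chunk)) produces and the maximum number of subdirectories any one directory can hold. The 32000 threshold is the approximate ext3 figure quoted in this comment.

```php
<?php
// Hypothetical helper: describe the directory layout produced by
// implode(DIRECTORY_SEPARATOR, str_split(md5($id), $chunk)) for a chunk length.
// Every chunk becomes one directory level; the file "$id . $extension" sits
// in the last one.
const EXT3_SUBDIR_LIMIT = 32000; // approximate ext3 per-directory limit quoted above

function layoutStats(int $chunk): array
{
    $hexLength = 32;                                 // md5 in hex
    $levels = (int) ceil($hexLength / $chunk);       // number of directory levels
    $maxFanOut = 16 ** min($chunk, $hexLength);      // max subdirectories per directory
    return [
        'levels' => $levels,
        'maxFanOut' => $maxFanOut,
        'underExt3Limit' => $maxFanOut < EXT3_SUBDIR_LIMIT,
    ];
}

foreach ([12, 3, 2] as $chunk) {
    $s = layoutStats($chunk);
    printf("chunk %2d: %2d levels, up to %.0f subdirectories per directory (%s)\n",
        $chunk, $s['levels'], $s['maxFanOut'],
        $s['underExt3Limit'] ? 'within ext3 limit' : 'can exceed ext3 limit');
}
```

        The original chunk length of 12 gives only 3 levels but an unbounded-in-practice top level; a chunk length of 2 caps every directory at 256 entries at the price of 16 levels.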

        Julian Higman added a comment:

        After a couple of months in production, we ran into another problem with this: we reached the maximum number of inodes in the filesystem.

        The resulting errors look like this:

        mkdir() [function.mkdir]: No space left on device

        There was actually disk space left, but checking the inodes showed that the limit had been hit:

        -bash-3.2# df -i
        Filesystem     Inodes   IUsed  IFree  IUse%  Mounted on
        /dev/sda1     6553600  6553600     0   100%  /

        Directory creation can be reduced somewhat by using chunks of 3 characters instead of 2. With hex characters that gives at most 16^3 = 4096 subdirectories per directory, still below the ext3 limit of about 32000, and 11 levels instead of 16:

        $path = implode(DIRECTORY_SEPARATOR, str_split(md5($id), 3));

        but ultimately the inodes will still all be used up.
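        Why the inodes still run out can be seen with a rough estimate. The sketch below is hypothetical (estimateInodes is an illustrative name) and assumes md5 spreads ids uniformly: for each directory level it computes the expected number of distinct directories hit by N random hashes, B(1 - (1 - 1/B)^N), where B is the number of possible hash prefixes at that depth, and adds one inode per cache file. The 6553600 figure is the inode total from the df -i output above.

```php
<?php
// Rough, hypothetical estimate (not a measurement) of the inodes used by
// $entries cache items under implode(DIRECTORY_SEPARATOR, str_split(md5($id), $chunk)).
// Assumes md5 spreads ids uniformly; counts one inode per directory and one
// per cache file, and ignores the cache root itself.
function estimateInodes(int $entries, int $chunk): float
{
    $hexLength = 32;
    $levels = (int) ceil($hexLength / $chunk);
    $inodes = (float) $entries; // one file per entry

    for ($k = 1; $k <= $levels; $k++) {
        // Possible directories at depth $k: one per distinct hash prefix
        $buckets = 16.0 ** min($chunk * $k, $hexLength);
        // Expected distinct directories hit by $entries random hashes,
        // B * (1 - (1 - 1/B)^N), written with expm1/log1p to stay accurate for huge B
        $inodes += -$buckets * expm1($entries * log1p(-1.0 / $buckets));
    }
    return $inodes;
}

$available = 6553600; // total inodes on /dev/sda1 in the df -i output above
foreach ([2, 3] as $chunk) {
    printf("chunk %d: ~%.0f inodes for 100000 entries (of %d available)\n",
        $chunk, estimateInodes(100000, $chunk), $available);
}
```

        Once the top levels are saturated, each new entry costs close to levels + 1 inodes, so inode use grows linearly with the number of entries whatever the chunk length; a longer chunk only lowers the constant.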

        The only other options are pruning the cache at intervals, or switching to a different caching strategy altogether.
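        Pruning the cache at intervals could be done with a periodic job along the lines of the sketch below. It is hypothetical (pruneCacheDirectory is not a Doctrine method) and assumes a file's modification time is a good enough proxy for its age; it does not read any lifetime stored inside the cache files. Directories are visited after their contents (CHILD_FIRST), so directories emptied by the deletions are removed too, freeing their inodes as well.

```php
<?php
// Hypothetical pruning job (not part of Doctrine): delete cache files whose
// modification time is older than $maxAgeSeconds, then remove directories
// that have become empty, so directory inodes are freed along with file inodes.
function pruneCacheDirectory(string $directory, int $maxAgeSeconds): int
{
    $cutoff = time() - $maxAgeSeconds;
    $removed = 0;

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST // visit children before their parent
    );

    foreach ($iterator as $path => $info) {
        if ($info->isFile() && $info->getMTime() < $cutoff) {
            if (unlink($path)) {
                $removed++;
            }
        } elseif ($info->isDir()) {
            // Succeeds only if the directory is now empty; the warning for
            // non-empty directories is suppressed on purpose.
            @rmdir($path);
        }
    }
    return $removed;
}
```

        Run from cron, a job like this keeps the inode count bounded by what the cache holds within the chosen age window, rather than by everything ever cached.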

        Marco Pivetta added a comment:

        Julian Higman, I'd suggest that file-based caching mechanisms are not suited to that environment. The file cache is really meant for environments with strict constraints (like shared hosting).


          People

          • Assignee: Benjamin Eberlei
          • Reporter: R Churchill
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
              Updated: