[DCOM-130] Paths in Doctrine\Common\Cache\FileCache could create large directory indexes Created: 23/Oct/12  Updated: 10/May/13

Status: Open
Project: Doctrine Common
Component/s: Caching
Affects Version/s: 2.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: R Churchill Assignee: Benjamin Eberlei
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Any



 Description   

With the way FileCache currently builds paths, the cache directory can theoretically end up holding 16^12 subdirectories, which is a very large number. Schemes like this are usually used to restrict the number of entries in any one directory.

Compare this with git, for example, where the directories are arranged as:

00/
1c/
...
ff/

with the object store inside those directories. That is a lot more manageable: if you happen to type ls in the cache directory, you get a listing of at most 256 directories. PhpThumb does something similar when caching images.
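To put the two figures side by side, here is a small sketch of the arithmetic. It assumes the first path component is 12 hexadecimal characters in the current scheme (the basis for the 16^12 figure above) and 2 characters in the git-style scheme. The variable names are illustrative only, and the integer result assumes a 64-bit PHP build.

```php
<?php
// Each hex character has 16 possible values, so a directory name made of
// n hex characters allows 16^n distinct names at that level.
$currentFanOut = pow(16, 12); // 12-character first component
$gitFanOut     = pow(16, 2);  // 2-character first component, as in git

echo "12-char prefix: ", $currentFanOut, " possible directories\n";
echo " 2-char prefix: ", $gitFanOut, " possible directories\n";
```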

How about something like this for getFilename():

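// Hash the id so that entries spread evenly across the directories.
$idHash = md5($id);
// Two directory levels of at most 256 entries each; as written, the remaining
// 28 hash characters are then prefixed to the id to form the file name.
$path = substr($idHash, 0, 2) . DIRECTORY_SEPARATOR . substr($idHash, 2, 2) . DIRECTORY_SEPARATOR . substr($idHash, 4);
$path = $this->directory . DIRECTORY_SEPARATOR . $path;

return $path . $id . $this->extension;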

This is not nearly as elegant, but I think it has better properties for the file system. I would also be tempted to use one of the SHA family of hashes and to leave the $id out of the filename, but perhaps including it is helpful for debugging?
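The SHA variant mentioned above could look roughly like the following sketch. The function name shardedCachePath and its parameters are hypothetical and not part of Doctrine. It uses PHP's built-in hash() function with sha256 and leaves the $id out of the file name, so this is only one possible reading of the suggestion.

```php
<?php
/**
 * Hypothetical helper, not Doctrine's API: builds a cache file path from a
 * SHA-256 hash of the id, using two 2-character directory levels.
 */
function shardedCachePath($directory, $id, $extension)
{
    $hash = hash('sha256', $id); // 64 lowercase hex characters

    return $directory
        . DIRECTORY_SEPARATOR . substr($hash, 0, 2) // level 1: at most 256 dirs
        . DIRECTORY_SEPARATOR . substr($hash, 2, 2) // level 2: at most 256 dirs
        . DIRECTORY_SEPARATOR . substr($hash, 4)    // file name: rest of the hash
        . $extension;
}

// Sample directory, id, and extension are made up for the example.
echo shardedCachePath('cache', 'user.42', '.data'), "\n";
```

The trade-off is the one raised above: without the $id in the file name, a cache file can no longer be identified by eye when debugging.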



 Comments   
Comment by Julian Higman [ 10/May/13 ]

We hit this problem in a live system - with a lot of cached items, the number of subdirectories that FileCache creates can exceed the number that an ext3 filesystem allows in a single directory (about 32000).

After that, an attempt to cache a new item can get an error like this:

mkdir() [function.mkdir]: Too many links

Our solution was similar to that suggested:


    protected function getFilename($id) {
        // Split the 32-character md5 hash into 16 two-character path segments.
        $path = implode(DIRECTORY_SEPARATOR, str_split(md5($id), 2));
        $path = $this->directory . DIRECTORY_SEPARATOR . $path;
        return $path . DIRECTORY_SEPARATOR . $id . $this->extension;
    }

It splits the md5 of the item id into parts of length 2, rather than the original 12. This creates a deeply nested structure (16 levels for a 32-character hash), but no single directory can ever hold more than 256 subdirectories, so the limit on the number of subdirectories is never exceeded. It's also the same subdirectory pattern used by default by Apache mod_disk_cache.
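As a standalone illustration of the layout this produces, the following sketch reimplements the same splitting outside the class. The function name nestedCachePath and the sample directory, id, and extension are made up for the example.

```php
<?php
// Hypothetical standalone version of the getFilename() logic above.
function nestedCachePath($directory, $id, $extension)
{
    // md5() yields 32 hex characters; splitting them into pairs gives 16
    // path segments, each with at most 256 possible values.
    $segments = str_split(md5($id), 2);

    return $directory
        . DIRECTORY_SEPARATOR . implode(DIRECTORY_SEPARATOR, $segments)
        . DIRECTORY_SEPARATOR . $id . $extension;
}

$path = nestedCachePath('cache', 'hello', '.data');
echo $path, "\n";
// md5('hello') is 5d41402abc4b2a76b9719d911017c592, so on a system where
// DIRECTORY_SEPARATOR is "/" this prints:
// cache/5d/41/40/2a/bc/4b/2a/76/b9/71/9d/91/10/17/c5/92/hello.data
```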
