Details
-
Type:
Bug
-
Status:
Open
-
Priority:
Minor
-
Resolution: Unresolved
-
Affects Version/s: 2.3
-
Fix Version/s: None
-
Component/s: Caching
-
Labels:None
-
Environment:Any
Description
The way paths are created within FileCache currently, there is a theoretical maximum of 16^12 directories in the cache directory, which is quite a large number. Usually schemes like this are used to restrict the number of files in one directory.
Comparing with git, for example, the dirs are arranged
00/
1c/
...
ff/
and then the object store within those directories, which is a lot more manageable, say if you happen to type ls in the cache directory, you will get a maximum listing of 256 dirs. PhpThumb does something similar when caching images.
How about something like this for getFilename():
$idHash = md5($id);
$path = substr($idHash, 0, 2) . DIRECTORY_SEPARATOR . substr($idHash, 2, 2) . DIRECTORY_SEPARATOR . substr($idHash, 4);
$path = $this->directory . DIRECTORY_SEPARATOR . $path;
return $path . $id . $this->extension;
Not nearly so elegant, but I think this has better properties for the file system. Also I would be tempted to use one of the sha family hashes and not to include the $id within the filename, but perhaps this is helpful for debugging?
We hit this problem in a live system - with a lot of cached items, the number of subdirectories that FileCache creates can exceed the number that an ext3 filesystem allows in a single directory (about 32000).
After that, an attempt to cache a new item can get an error like this:
mkdir() [function.mkdir]: Too many links
Our solution was similar to that suggested:
protected function getFilename($id) { $path = implode(str_split(md5($id), 2), DIRECTORY_SEPARATOR); $path = $this->directory . DIRECTORY_SEPARATOR . $path; return $path . DIRECTORY_SEPARATOR . $id . $this->extension; }It splits the md5 of the item id into parts of length 2, rather than the original 12. This creates a deeply nested structure, but which won't ever exceed the limit on number of subdirectories in any one directory. It's the same subdirectory pattern used by default by Apache mod_disk_cache, as well.