[DC-644] _getCacheKeys() exhausts memory Created: 22/Apr/10  Updated: 06/Jul/11

Status: Open
Project: Doctrine 1
Component/s: Caching
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Amir W Assignee: Jonathan H. Wage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Doctrine is installed as a Symfony plugin. Using the latest Symfony from SVN.



 Description   

My scripts have excessive memory consumption and I've often seen this in my logs:

PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 2097152 bytes) in /proj/lib/vendor/symfony/lib/plugins/sfDoctrinePlugin/lib/vendor/doctrine/Doctrine/Cache/Apc.php on line 111

Looking into the code I've found which function to blame:

protected function _getCacheKeys()
{
    $ci = apc_cache_info('user');
    $keys = array();

    foreach ($ci['cache_list'] as $entry) {
        $keys[] = $entry['info']; // ######### THIS IS THE LINE
    }

    return $keys;
}

My server extensively uses APC caching and it's normal to have many cache keys.
Obviously retrieving ALL of them is time and memory consuming.
As I'm not well versed with Doctrine's code, I didn't want to dive further in.

Is there another way to avoid this pitfall?



 Comments   
Comment by Amir W [ 26/Apr/10 ]

Is there any patch that could be provided meanwhile? This is quite a problem on a live website.

Comment by Amir W [ 10/May/10 ]

Is this not a critical issue for Doctrine's cache? It's been up for 2 weeks with not even a comment...

Comment by Jonathan H. Wage [ 10/May/10 ]

Hi, what are you calling that is invoking _getCacheKeys()? The only methods that call it are the deleteBy*() methods. It is expected that these methods have to get the entire list of cache keys from the driver in order to perform the delete by operation. These cache clearing operations should probably be done in the CLI environment where the memory limits are higher. If you want to avoid _getCacheKeys() being invoked, then you must not use the deleteBy*() methods.

Comment by Amir W [ 10/May/10 ]

Thank you for commenting. Yes, I am using deleteByRegex() since I need to expire some result cache entries upon an update operation. What other choice do I have if I wish to keep using the result cache offered by Doctrine? Is there any other mechanism?

Can't _getCacheKeys() be optimized some way?

Comment by Jonathan H. Wage [ 10/May/10 ]

No, it is not able to be optimized anymore. It has to load all the keys into a php array in memory in order to loop over them to compare against the regex. You should probably not be doing cache clearing operations in the browser under apache. If you do, you'll need to raise your memory limit.

Comment by Amir W [ 10/May/10 ]

My code actually had a few of these calls and I've now removed use of the result cache with Doctrine. What you're writing means the result cache is not usable for dynamic websites. IMHO, it's a good practice to cache results and remove them once an update is made to the data (which naturally can happen due to an update from a user). However, if that by itself creates an overload on the server (and as you know even a temporary memory abuse leads to an overload), I cannot see how it can be useful.
Please tell me if you think there's a way the results cache can still be usable for a dynamic website.

Thanks

Comment by Jonathan H. Wage [ 10/May/10 ]

This is the only way to allow more complex delete functionality. How you use it, is not up to us. We intended that cache clearing is done from the command line or in an environment where the memory limit is high enough to be able to load all those keys. It may not be able to be used by everyone, if it is not working for how you are using it then you will need to think of another solution I suppose.

Comment by Amir W [ 10/May/10 ]

Thank you for your response and I'll think of another solution for my application.

I did dive into the code and there's a relevant optimization that could be made.

_getCacheKeys() is actually creating another array for all the cache keys which needlessly increases the memory used.
If the deleteBy*() method would be implemented at the driver level (such as with Apc.php) and not at the general level (Driver.php as it is now) this array would not have to be created. It won't be such a code bloat and would surely lessen memory use.
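A minimal sketch of that driver-level idea, not Doctrine's actual code: the loop below walks a cache listing and deletes matching entries as it goes, so no second array of keys is built. It assumes entries shaped like those in the function quoted above (`$entry['info']` holding the key) and PHP 7.1 or later. `deleteMatching()` and the `$delete` callback are hypothetical names. In an APC driver the listing would presumably come from `apc_cache_info('user')` and the callback would wrap `apc_delete()`; here a plain array stands in so the sketch runs without the extension. The listing itself is still loaded in full, so this only removes the extra copy.

```php
<?php
/**
 * Delete every entry whose key matches $regex, without first copying
 * the keys into an intermediate array.
 *
 * @param iterable $cacheList entries shaped like ['info' => '<cache key>']
 * @param string   $regex     PCRE pattern, delimiters included
 * @param callable $delete    called once with each matching key
 * @return int number of keys passed to $delete
 */
function deleteMatching(iterable $cacheList, string $regex, callable $delete): int
{
    $count = 0;
    foreach ($cacheList as $entry) {
        $key = $entry['info'];
        if (preg_match($regex, $key) === 1) {
            $delete($key);
            $count++;
        }
    }
    return $count;
}

// Stand-in data (hypothetical keys) so the sketch runs without APC.
$store = ['rc_user_1' => 'a', 'rc_user_2' => 'b', 'session_9' => 'c'];
$listing = [];
foreach (array_keys($store) as $k) {
    $listing[] = ['info' => $k];
}

$deleted = deleteMatching($listing, '/^rc_user_/', function (string $key) use (&$store): void {
    unset($store[$key]);
});
```

Passing the delete operation in as a callback keeps the matching loop independent of any particular cache backend.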

There could be a way around the problem which also implements another feature I miss with the results cache. By allowing some sort of cache tagging to mark the items that may need to be deleted we could easily delete relevant entries. I'll describe the interface here.

Instead of
$q = $q->useResultCache(true, 86400);

There should be
$q = $q->useTagResultCache('SomeTag', true, 86400);
which does the same PLUS update a cached variable (such as 'Doctrine_Result_Cache_Tag_SomeTag') which references the result cache keys of 'SomeTag'.

We can then easily implement deletion of relevant result cache entries with

deleteByTag('SomeTag')

which would read 'Doctrine_Result_Cache_Tag_SomeTag' to figure out which entries should be removed from the cache.
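The proposal above could be sketched roughly as follows. This is an illustration only: `useTagResultCache()` and `deleteByTag()` do not exist in Doctrine, and the `TaggedCache` class, its method names, and the in-memory array standing in for the cache driver are all assumptions made so the example runs on its own (PHP 7.1 or later). Only the index key name `Doctrine_Result_Cache_Tag_SomeTag` is taken from the description.

```php
<?php
/**
 * Hypothetical tag index layered over a key/value cache.
 * An in-memory array stands in for the real cache driver.
 */
class TaggedCache
{
    /** @var array<string, mixed> */
    private $data = [];

    public function save(string $key, $value): void
    {
        $this->data[$key] = $value;
    }

    public function contains(string $key): bool
    {
        return array_key_exists($key, $this->data);
    }

    public function delete(string $key): void
    {
        unset($this->data[$key]);
    }

    /** Save a value and record its key under the given tag. */
    public function saveTagged(string $tag, string $key, $value): void
    {
        $this->save($key, $value);
        $indexKey = $this->indexKey($tag);
        $keys = $this->data[$indexKey] ?? [];
        $keys[$key] = true;             // keyed by cache key, so no duplicates
        $this->data[$indexKey] = $keys; // read-modify-write: not atomic on a shared cache
    }

    /** Delete every key recorded under the tag, then the index entry itself. */
    public function deleteByTag(string $tag): int
    {
        $indexKey = $this->indexKey($tag);
        $keys = array_keys($this->data[$indexKey] ?? []);
        foreach ($keys as $key) {
            $this->delete($key);
        }
        $this->delete($indexKey);
        return count($keys);
    }

    private function indexKey(string $tag): string
    {
        return 'Doctrine_Result_Cache_Tag_' . $tag;
    }
}

$cache = new TaggedCache();
$cache->saveTagged('SomeTag', 'query_hash_1', ['row' => 1]);
$cache->saveTagged('SomeTag', 'query_hash_2', ['row' => 2]);
$cache->save('unrelated', 'kept');

$removed = $cache->deleteByTag('SomeTag');
```

Deleting by tag then touches only the keys listed in the index entry, rather than scanning every key in the cache. On a cache shared between processes, the index update in `saveTagged()` would need some form of locking or atomic update, which this sketch does not attempt.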

I'm pretty sure my usage scenario is not marginal but let me know what you think.

Comment by Jonathan H. Wage [ 10/May/10 ]

This is already possible if I understand what you describe.

$q->useResultCache(true, 3600, 'key_to_store_cache_under');

Now you can do:

$cacheDriver->delete('key_to_store_cache_under');

Also what you describe useTagResultCache() and keeping up with our own list of cache keys is the way it used to be and was changed to this after worse performance problems were discovered with that approach.

Comment by Amir W [ 10/May/10 ]

Perhaps I've been misunderstood so I'll try explain from the start.

In my system a few queries do relate to the same pieces of information. That information can be updated by a user and thus I would need to remove anywhere between 0 and 50 related result cache variables. I cannot easily name each and every one of my queries thus giving a specific key name doesn't help. So what I did was to prefix the name of each of the queries to indicate that I'll know how to remove them. I may have thousands of results cached and would need to clear just a few. That's why I use the deleteBy*() which proves to be extremely inefficient as it retrieves ALL the keys in my cache driver and not only the Doctrine related ones.

I really don't know how it was implemented before, but what I suggest wouldn't hurt performance, as tagging would be an optional addition managed with another variable. If you think that won't be useful to other Doctrine users, I'll simply implement it for my system.

Thanks

Comment by Jonathan H. Wage [ 10/May/10 ]

I think the best solution is the one you suggested earlier. That each cache driver should directly implement this functionality and bypass the creation of the array. What do you think? It is backwards compatible so that way we can commit it in 1.2.

Comment by Amir W [ 10/May/10 ]

Bypassing the array is a required optimization which is easy to implement, but it's not really a solution to the problem I'm facing. I believe that problem is common enough (Zend_Cache, for example, implements tagging) that tagging needs to be offered. As it would be 2 new functions that implement tagging only when specifically requested, it would also be backward compatible. The only thing I'm not sure about is whether some locking mechanism would be needed for the cached variable that holds the list of cache keys for a specific tag.

Comment by Jonathan H. Wage [ 10/May/10 ]

Let me know what you come up with and we'll have a look at including it in the next 1.2.x release.

Comment by Amir W [ 16/May/10 ]

Bypassing the extra array is still not good enough and IMHO the whole idea of deleteBy() should NOT be used if many such requests could be made, as is my case.

What I've done now is what I mentioned before with a patch that is quite ugly.

In Doctrine/Query/Abstract.php right after the line

$cacheDriver->save($hash, $cached, $this->getResultCacheLifeSpan());

I've added

                if (!empty($GLOBALS['rcache_users_in_query'])) {
                    MyCache::keepRelatedCacheKey($GLOBALS['rcache_users_in_query'], $hash);
                }

Which saves another cache key which holds the hash tags that would have to be deleted on an update.
My global variable is actually an array as a Doctrine query result may be associated with more than one user and possibly other parameters.
Before calling the $q->execute(), I simply update this variable.

When a user on my system does the update, I then delete all relevant Doctrine keys with something like

        if (is_null($cacheDriver)) {
            $cacheDriver = Doctrine_Manager::getInstance()->getAttribute(Doctrine_Core::ATTR_RESULT_CACHE);
        }

        foreach ($arKeys as $key) {
            $cacheDriver->delete($key);
        }

and then delete my other cache key.

This solution works well for me. Sorry I cannot make a nice Doctrine patch for it as I'm not well versed with your code. I still believe it should be supported by Doctrine with an optional extra parameter for $q->useResultCache()

Thanks

Comment by David Abdemoulaie [ 08/Jun/10 ]

Hi Amir,

Zend_Cache does not implement tagging for either the APC or Memcached backends; see the documentation. It also likely never will: all requests for this functionality have been closed as Won't Fix.

I don't think the deleteBy methods should have ever been implemented. When initially implemented they cached a "doctrine_cache_keys" variable to store the keys known to Doctrine. This however led to a crippling bug that would crash my production servers after a few hours. Not even a friendly "out of memory" limit, but a slowdown and eventual crash. Please see DDC-460 for details. Note that I don't use the magic delete methods, just simple saves with timeouts and this was affecting me.

I fixed the solution as you've seen using the _getCacheKeys() method. I don't believe this functionality should have ever been added to Doctrine to begin with, but this is what we have to work with. It should be the responsibility of the cache store to handle tagging and such, not poorly hacked on with application code.

As it stands, the current implementation doesn't affect people who aren't even using this functionality, as it should be. As Jon suggested, you shouldn't be using this in the context of a page request. Use a CLI script or work on another solution. Your idea of tracking your keys in application code is a good idea, but it doesn't belong in Doctrine imo.

Comment by Amir W [ 10/Jun/10 ]

Thanks David for your comment.

I agree with you that my implementation should not belong in Doctrine and that tagging should have been a part of the cache backends.

Continuing with the same logic you've presented, deleteBy...() functionality **should be removed** from Doctrine if it causes the system to crash, because it does so in such an obnoxious way that most developers would take too long to notice this is where the problem lies. It has certainly taken too much of my time and effort, and I'd rather spare others the pain.

Comment by Carsten Henkelmann [ 06/Jul/11 ]

We had the exact same problem. We used a "deleteAll()" of an ApcCache object and ran into the "allowed memory size exhausted" pitfall. We helped ourselves with a new class that extends ApcCache and uses the simpler apc_clear_cache function.

 
namespace Foo\Cache;

class ApcCache extends \Doctrine\Common\Cache\ApcCache
{
    /**
     * Delete all cache entries. Memory saving version...
     *
     * @return bool
     */
    public function deleteAll()
    {
        return apc_clear_cache('user');
    }
}
 
use Foo\Cache\ApcCache as Apc;
...
$this->_apc = new Apc();
$this->_apc->deleteAll();

This doesn't return the ids of the deleted entries like the original function but we don't need that. So this works fine for us.
