
Zope Concurrent Writes Strategy

by T. Kim Nguyen last modified Aug 29, 2011 11:51 PM

Strategies for handling many concurrent Zope writes

Current thinking: use catalog queueing to bring the number of user-visible (in-browser) errors down to zero consistently, and perhaps increase the number of conflict retries in Zope.

Test Environment

We have used these test sites so far:

  • http://svn.it.uwosh.edu:12080/plone2/makeFolder has collective.indexing
  • http://svn.it.uwosh.edu:11080/plone1/makeFolder has PloneQueueCatalog
  • http://plonedev.uwosh.edu/plone1/makeFolder has neither add-on installed
  • http://svn.it.uwosh.edu:12080/Plone/makeFolder has no queueing at all

Andreas Jung:

The standard, easy-to-implement recipes are: move your catalog to a dedicated storage, use a catalog queue, or try collective.indexing. However, at least in your example, your issue seems related to the use of portal_factory; I have no idea about that. You might look at RelStorage, although I don't think it will reduce the risk of write conflicts, because RelStorage is implemented on top of the ClientStorage API (check the zodb-dev archive for a posting by Dieter about a similar question of mine).

Is it possible to avoid using portal_factory for the content types Plone ships with?

You can at least configure this per portal_type through the ZMI in portal_factory. However, you will then see orphans for content that was created but never saved.

Also, can anyone point us to a recipe for using a dedicated storage for the catalog?

There is a how-to on plone.org. But you should try this only if your catalog is really prone to conflict errors.

jeanmichel.francois@makina-corpus.com:

Try playing with zope.conf parameters such as:

  • python-check-interval
  • maximum-number-of-session-objects
  • session-timeout-minutes
  • session-resolution-seconds
  • zserver-threads

Besides the reindexing problem in the content-creation workflow, there is the session problem. I reduced zserver-threads to only 2, increased the cache size, and increased the session resolution, because on a conflict error Zope retries the same request 4 times, if I remember correctly. The request then takes too long, the default session-resolution parameters are too short, and the request finally fails.
Creating Plone objects is really CPU-intensive; if you can, don't use portal_factory.

Erik Rose:

Use ZODB exploration tools to see which objects the conflicts were happening on.

Put portal_catalog in a separate storage, though it's not clear to me how that would help with conflict errors; it seems more likely that it would just make better use of the caches (in a large site, portal_catalog objects could easily use all the cache, leaving no room for normal objects).

If the CPU is not at 100%, add more ZEO clients until it is; if you can get those transactions finished faster, they'll be less likely to collide. I usually use one client per logical core. It's good that you're using session affinity and not keeping the sessions in ZEO; that should make your transactions go an eensy bit faster.

I guess I'm with Andreas in blaming portal_factory; it probably has some bottleneck object that every addition in the portal root writes to. Plone sites are stored as BTrees, right? Normal folders are a rich source of ConflictErrors, since they're loaded all-or-nothing; use Large Folders whenever you can, as they're stored as BTrees, and conflicts can then occur only at their individual nodes rather than at the whole-folder level.
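Erik's point about where conflicts happen can be illustrated with a toy model (plain Python, not ZODB code). It assumes every writer overlaps in time and adds one item. A normal folder is one persistent object, so all writers collide on it, while a BTree folder is modelled as a number of independent buckets picked at random. Real BTrees place keys by sort order, split buckets as they grow, and can resolve some concurrent changes to the same bucket; none of that is modelled, so treat the numbers as an illustration of the trend only.

```python
import random

def count_losers(n_writers, n_buckets, seed=0):
    """Toy model: how many of n_writers concurrent additions would conflict.

    Each writer adds one item, which lands in a randomly chosen bucket.
    Writers touching the same bucket conflict: the first to commit wins and
    the rest must retry. n_buckets=1 models a normal folder (one persistent
    object); larger values stand in for a BTree folder's nodes.
    All writers are assumed to overlap in time.
    """
    rng = random.Random(seed)
    per_bucket = {}
    for _ in range(n_writers):
        bucket = rng.randrange(n_buckets)
        per_bucket[bucket] = per_bucket.get(bucket, 0) + 1
    # In each bucket, everyone except the first committer loses.
    return sum(count - 1 for count in per_bucket.values())

for buckets in (1, 16, 256):
    print(f"{buckets:>3} bucket(s): {count_losers(50, buckets)} of 50 writers conflict")
```

With 50 writers the single-bucket case always loses 49 of them; spreading the same writes over more buckets sharply reduces the number of losers, which is the effect Erik attributes to Large Folders.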

Kurt Bendl:

* Split your sites into a few more Plone sites (if it makes sense).
* Run multiple zope clients (I had 8 clients with a load balancer in front)
* Have multiple ZEO/ZODBs mounted strategically
* Mount your catalog in a separate ZEO/ZODB from your data
* Delay indexing ["PloneQueueCatalog", "collective.indexing"]
* Use another search engine
* Think about having a separate set of Zope clients handle logged-in users, so anonymous users are less affected.

Alan Runyan:

It would help if you could write a simple test that demonstrates the problem.

The biggest problems we have seen in Plone:

- In-process indexing (collective.indexing still indexes 'in process')

- In-process transformations (portal_transforms needs a lot of love)

- Try removing the SearchableText index and reducing the number of indexes in portal_catalog. Watch the system go faster ;)

I would not move the catalog to a separate mount point. While there may be a win, the complexity is too high.

The best thing would be for us to integrate Kapil's repoze.queue with our indexing machinery. Basically, there would then be almost zero overhead for the indexing operation.

I do believe the size of the instances being serialized is of some concern, but not yet. There is so much unoptimized stuff (events, cataloging, indexing, transforms) slowing things down that the size of the pickles isn't even close to being a problem.

So, in summary: remove as many catalog indexes and metadata columns as possible to get the system to work, and see if it goes any faster with collective.indexing.

Nathan Van Gheem:

    Kim and I developed some basic testing mechanisms to show the severity of the problem. The tests aren't absolute, and right now we are only testing the creation of a folder, so not much indexing is occurring, but they still expose the problem very well.

    To test, we basically just created a Python script on the site that created a folder whenever it was called. We gave the script a proxy role of Manager so it could be run with curl on the command line. We also had a script to remove all the contents of the folder we were creating these in, so that each run started from the same state. We then had a shell script that made the simultaneous requests to create the folders.

    We tested against a plain Plone site, one with collective.indexing installed, and one with PloneQueueCatalog installed.
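The shell script itself is not reproduced on this page. As a rough stand-in, the Python 3 sketch below fires N requests at a makeFolder URL at roughly the same moment using a thread pool, and counts the failures. It assumes that a request which ultimately fails with a ConflictError comes back with an HTTP error status; the fetch parameter exists only so the driver can be exercised without a live server.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# First test URL from the list above; point it at another test site as needed.
MAKE_FOLDER_URL = "http://svn.it.uwosh.edu:12080/plone2/makeFolder"

def http_fetch(url, timeout=120):
    """Return True for a 2xx response, False for an HTTP error or network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # urllib.error.HTTPError and URLError both subclass OSError
        return False

def run_burst(n_requests, url=MAKE_FOLDER_URL, fetch=http_fetch):
    """Fire n_requests concurrently and return how many of them failed."""
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        results = list(pool.map(lambda _: fetch(url), range(n_requests)))
    return results.count(False)
```

Calling run_burst(25) five times, emptying the target folder between runs as described above, would give a list shaped like the 25-request line in the results below.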

Our numbers on a plain Plone site (errors in each of five runs => mean) look like this:

  • number of errors for runs with 10 simultaneous requests
    • [1, 0, 1, 2, 1] => 1.0
  • number of errors for runs with 25 simultaneous requests
    • [4, 5, 4, 5, 3] => 4.2
  • number of errors for runs with 50 simultaneous requests
    • [8, 10, 11, 9, 12] => 10.0
  • number of errors for runs with 75 simultaneous requests
    • [14, 15, 16, 18, 16] => 15.8
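The figure after each arrow is the mean of the five runs. Dividing it by the number of simultaneous requests gives the failure rate per burst, which shows that the error rate itself, not just the error count, rises with concurrency: about 10% at 10 requests and about 21% at 75. The snippet below recomputes both from the numbers above.

```python
from statistics import mean

# Errors per run (five runs each) on the plain Plone site, copied from the list above.
runs = {
    10: [1, 0, 1, 2, 1],
    25: [4, 5, 4, 5, 3],
    50: [8, 10, 11, 9, 12],
    75: [14, 15, 16, 18, 16],
}

def summarize(runs):
    """Return {n_requests: (mean_errors, failure_rate)} for each burst size."""
    return {n: (mean(errors), mean(errors) / n) for n, errors in runs.items()}

for n_requests, (avg, rate) in summarize(runs).items():
    print(f"{n_requests:>3} requests: mean {avg:.1f} errors, {rate:.1%} failed")
```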

With collective.indexing, the results got mildly better. With PloneQueueCatalog we got dramatically better results. However, we don't see how practical it is to use PloneQueueCatalog. We aren't going to process the queue manually, and running a cron job to process it seems like overkill. Until the queue is processed, users won't see their new objects on the site, which could also confuse them. Is this how the product is designed, or is there a better way to process the queue?

I also tested this with portal_factory turned off, without any better results.

Some links to information on our testing:
Overview - has all the scripts we used
Results

 

Andreas Jung:

The catalog queue is processed automatically by a dedicated thread, isn't it? Also, the queue should defer only the indexing of "unimportant" indexes, such as full-text indexes; other indexes will be updated immediately.
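Andreas's description of a queue that defers only the expensive indexes can be modelled with a toy class. Everything here is hypothetical (the class, its attributes, and the index names are for illustration and are not the QueueCatalog API); only the method name process() mirrors the call we use later on this page. Cheap indexes are written during the request, expensive ones are queued, and process() applies the queued work afterwards.

```python
from collections import deque

class ToyQueuedCatalog:
    """Toy model of deferred indexing; not the real QueueCatalog code."""

    def __init__(self, immediate, deferred):
        self.immediate = set(immediate)   # cheap indexes, e.g. path
        self.deferred = set(deferred)     # expensive ones, e.g. SearchableText
        self.indexes = {name: {} for name in self.immediate | self.deferred}
        self.queue = deque()

    def catalog_object(self, uid, values):
        """Called during the request: index cheap fields now, queue the rest."""
        for name in self.immediate:
            if name in values:
                self.indexes[name][uid] = values[name]
        pending = {k: v for k, v in values.items() if k in self.deferred}
        if pending:
            self.queue.append((uid, pending))

    def process(self):
        """Drain the queue later, outside the user's request; return items done."""
        done = 0
        while self.queue:
            uid, pending = self.queue.popleft()
            for name, value in pending.items():
                self.indexes[name][uid] = value
            done += 1
        return done
```

In this model, an object created during a request can be found by path immediately but not by full text until process() has run, which is the visibility delay Nathan was concerned about.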

 

Alan Runyan:

If you are going to do lots of simultaneous commits you *have* to go out
of process (QueueCatalog). Sometimes I think we forget exactly *how
much* work Zope/Plone are doing when a write is actually occurring.
Add in all of the transformations that could be happening when you do
a write/index.

You don't need a cron job; you could use a clock-server task *wink*.
But you will need to do something similar to going out of process if
you want many people writing concurrently. Maybe something
like collective.indexing *and* QueueCatalog would work best.

There are multiple issues at play here. But the answer is: you must
go out of process.

 

Matthew Wilkes:

It's about making changes to an object, not to a site. If you don't
already, use BTreeFolders (LargePloneFolder) as much as possible, and
if possible try to keep writes spread over as many folders as possible.
Another option is to ensure only one object is created at once: if
you can isolate by URL the request that creates an object in a folder
under write contention (such as the URL the students in your initial
mail used), map that onto a different Zope client with only one
thread. The requests will take longer, as they have to queue up, but
none of them will get conflict errors.

Another option would be to change the retry count for requests,  
although this may be hard-coded.  This would result in the  
transactions that currently fail taking longer but failing less often.
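Retrying works because Zope uses optimistic concurrency: on a ConflictError the whole request is aborted and replayed. The sketch below uses a local stand-in ConflictError and a hypothetical flaky() write to show the trade-off described here: each extra retry lowers the chance of a user-visible failure but adds another full attempt to the worst-case response time. In the Zope 2 code we have looked at, the limit appears to be the class attribute retry_max_count on ZPublisher's HTTPRequest (default 3, i.e. up to four attempts, matching the "4 times" recalled earlier); treat that as something to verify against your Zope version.

```python
class ConflictError(Exception):
    """Local stand-in for ZODB.POSException.ConflictError."""

def run_with_retries(attempt, max_retries=3):
    """Call attempt() until it succeeds or max_retries replays are used up.

    Returns (result, attempts_used). If every attempt conflicts, the last
    ConflictError is re-raised, which is what the user would see as an error.
    """
    for attempts_used in range(1, max_retries + 2):
        try:
            return attempt(), attempts_used
        except ConflictError:
            if attempts_used > max_retries:
                raise

def flaky(conflicts_before_success):
    """Hypothetical write that conflicts a fixed number of times, then commits."""
    remaining = [conflicts_before_success]

    def attempt():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConflictError()
        return "committed"

    return attempt
```

With the default of 3 retries, a write that conflicts three times still commits on its fourth attempt, while one that conflicts four times fails; raising max_retries turns that failure into a slower success.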

 

Alan Runyan:

Ack.

According to:
https://svn.plone.org/svn/plone/Plone/branches/3.1/Products/CMFPlone/PloneFolder.py

The root of Plone is a normal folder, not a BTreeFolder.

Anyone interested in fixing this?

 

Kim Nguyen:

On Oct 27, 2008, at 8:47 PM, Matthew Wilkes wrote:

> It's about making changes to an object, not to a site. If you don't
> already, use BTreeFolders (LargePloneFolder) as much as possible, and
> if possible try to keep writes spread over as many folders as possible.


When the problem first occurred, yes, all the users were in the root  
folder of the site.  I then set up a test site in which home folder  
creation was enabled, then had everyone log in, go to their own home  
folder, and try to add a new folder in there.  The same errors  
occurred, so the problem does not go away even if we try to spread the  
writes...

> Another option is to ensure only one object is created at once: if
> you can isolate by URL the request that creates an object in a folder
> under write contention (such as the URL the students in your initial
> mail used), map that onto a different Zope client with only one
> thread. The requests will take longer, as they have to queue up, but
> none of them will get conflict errors.


This seems worth a try: capture URLs containing "createObject" or
"portal_factory" and send them to a ZEO client that has only one thread.

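In practice this routing rule would live in the front-end proxy or load-balancer configuration. To state the rule precisely, here is a Python sketch; the backend names are hypothetical, and the substring match on "createObject" and "portal_factory" is exactly the heuristic above, so it will also catch any other URL whose path or query happens to contain those words.

```python
from urllib.parse import urlsplit

# Hypothetical backend names: a pool of normal multi-threaded ZEO clients and
# one ZEO client running with zserver-threads 1, so creation requests queue up.
SERIAL_BACKEND = "zeo-single-thread"
DEFAULT_BACKEND = "zeo-pool"

CREATION_MARKERS = ("createObject", "portal_factory")

def choose_backend(url):
    """Send object-creation requests to the single-threaded client."""
    parts = urlsplit(url)
    target = parts.path + "?" + parts.query  # ignore scheme and host
    if any(marker in target for marker in CREATION_MARKERS):
        return SERIAL_BACKEND
    return DEFAULT_BACKEND
```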
> Another option would be to change the retry count for requests,
> although this may be hard-coded. This would result in the
> transactions that currently fail taking longer but failing less often.


If anyone has pointers on where this might be in the bowels of Zope  
please let me know.

 

Alan Runyan:

I would not try any optimizations other than simple PloneQueueCatalog
+ collective.indexing. The way to make things faster is by doing less,
so start off by not doing any indexing in process.

> This is probably a naive question and I assume it is not as simple as
> this, but why can't these requests queue up and make people wait longer?


If you turn zserver-threads to 1 it will do exactly that.

 

What We Are Now Trying:

http://labs.menttes.com/zope/products/clockserver/

A slight correction to the instructions: do not include this line in buildout.cfg: %import Products.ClockServer

http://wiki.zope.org/zope2/QueueCatalog

http://zope.org/Members/ctheune/QueueCatalog/

http://wiki.zope.org/zope2/PloneQueueCatalog

http://dev.plone.org/collective/browser/PloneQueueCatalog/trunk (svn repository). To check it out:

svn co https://svn.plone.org/svn/collective/PloneQueueCatalog/trunk PloneQueueCatalog

or

http://pypi.python.org/simple/Products.PloneQueueCatalog/

We set the clock-server thread to wake up every 30 seconds. It invokes a Script (Python) that has the Manager proxy role and simply calls:

context.portal_catalog.process()

which processes all queued requests.
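Outside Zope, the clock server's role here (call the queue processor every 30 seconds, forever) can be sketched with a background thread and threading.Event. This is a stand-in for the scheduling pattern, not how ClockServer works internally; the process argument would correspond to our Script (Python) that calls context.portal_catalog.process().

```python
import threading

def start_periodic(process, period=30.0):
    """Call process() every `period` seconds in a daemon thread.

    Returns a threading.Event; set() it to stop the loop. Exceptions from
    process() are caught so that one failed run does not stop the schedule.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns True as soon as stop is set, which ends the loop.
        while not stop.wait(period):
            try:
                process()
            except Exception as exc:  # keep the schedule alive
                print(f"queue processing failed: {exc!r}")

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

For example, start_periodic(lambda: print("processing queue"), period=30) prints a line every 30 seconds until the returned event is set.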

 
