Packing and copying Data.fs from production server for local development

These instructions help you copy and transfer a production server's ZODB database (Data.fs) to your local computer for development and testing. This allows you to test against a copy of the real data and the production server's Plone instance set-up.

See the original tip by cguardia.

Data.fs is the ZODB file storage for the transactional database. The journal (undo) history takes up quite a lot of disk space there. Packing, i.e. removing the journal history, usually reduces the file size considerably, making the file lighter for the wire transfer. Depending on the age of the database, the packed copy can be less than 10% of the original size.

These instructions apply to Ubuntu/Debian based Linux systems. Adapt them to your own system using your operating system's best practices.

We need the ZODB Python package to work with the database. To use it, we’ll create a virtualenv Python installation in /tmp. In a virtualenv installation, installed Python packages do not pollute or break the system-wide setup. Note that you might need to use easy_install-2.4 depending on the OS. The latest stable ZODB can be picked from the PyPI listing. The Plone 3.x default is ZODB 3.7.x, which is not available as a Python egg, but you can use ZODB 3.8.x (the egg on PyPI is named ZODB3).

sudo easy_install virtualenv

cd /tmp

virtualenv packer

/tmp/packer/bin/easy_install "ZODB3==3.8.3"

The production Data.fs should not be modified in place, so you must create a copy of it to work with. The copy can be taken from a running system without fear of corrupting the database, since ZODB file storage is append-only.

cp /yoursite/var/filestorage/Data.fs /tmp/Data.fs.copy
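If you want to double-check the copy before packing, you can open it read-only and look at its size and the number of transactions in the journal. This is an optional extra step, not part of the original tip; a minimal sketch (the /tmp/check.py file name is just an example), saved as a script and run with the virtualenv Python created above:

import ZODB.FileStorage

# Open read-only so this check cannot modify the copy.
storage = ZODB.FileStorage.FileStorage('/tmp/Data.fs.copy', read_only=True)

size_mb = storage.getSize() / (1024.0 * 1024.0)
# Iterating all transactions can itself take a while on a big database.
transactions = sum(1 for _ in storage.iterator())
print('%.1f MB, %d transactions in history' % (size_mb, transactions))

storage.close()

/tmp/packer/bin/python /tmp/check.py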

Then create the following script snippet /tmp/pack.py using your favorite terminal editor.

import time
import ZODB.FileStorage
import ZODB.serialize

# Open the copied file storage and pack away the whole transaction history up to now.
storage = ZODB.FileStorage.FileStorage('/tmp/Data.fs.copy')
storage.pack(time.time(), ZODB.serialize.referencesf)

And run it using the virtualenv’ed Python setup with ZODB installed:

/tmp/packer/bin/python /tmp/pack.py

Lots of patience here… packing may take a while, but it’s still definitely faster than pushing the unpacked file through your Internet connection.
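If you prefer to keep a few days of undo history around instead of throwing all of it away, pack to a point in the past instead of the current time. A minimal variation of the script above; the three-day window is just an example value:

import time
import ZODB.FileStorage
import ZODB.serialize

# Keep the last 3 days of transaction history, discard everything older.
days_to_keep = 3
pack_time = time.time() - days_to_keep * 24 * 60 * 60

storage = ZODB.FileStorage.FileStorage('/tmp/Data.fs.copy')
storage.pack(pack_time, ZODB.serialize.referencesf)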

Verify that the file is successfully packed:

ls -lh Data.fs.copy
-rw-r--r-- 1 user user 30M 2009-09-01 13:24 Data.fs.copy

Woohoo, 1 GB was shrunk to 30 MB. Then copy the file to your local computer using scp and place it in your development buildout.

scp user@server:/tmp/Data.fs.copy ~/mybuildout/var/filestorage/Data.fs

You just saved about 30-90 minutes of waiting for the file transfer.
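Before starting your local instance, you can also do a quick smoke test that the transferred copy opens and the root object is reachable. Again a minimal sketch, not part of the original tip; run it from your buildout directory with a Python that can import ZODB (for example a zopepy interpreter script, if your buildout generates one):

import ZODB
import ZODB.FileStorage

# Open the transferred copy read-only and peek at the root object.
storage = ZODB.FileStorage.FileStorage('var/filestorage/Data.fs', read_only=True)
db = ZODB.DB(storage)
connection = db.open()
print(list(connection.root().keys()))  # a Zope database typically shows ['Application']
connection.close()
db.close()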

13 thoughts on “Packing and copying Data.fs from production server for local development”

  1. I’m often bzipping the database before transfer. bzip2 Data.fs.copy produces Data.fs.copy.bz2, which should be much smaller than the original file. Even without a pack, the bzipped database is much faster to transfer (and transfer-error safe).

  2. There is also fsrecover.py, which has been around for ages and can pack a Data.fs out of place (as well as check it for consistency):

    parts/zope2/lib/python/ZODB/fsrecover.py -P 0 /path/to/existing/Data.fs /path/to/new/Data.fs

    You might have to set your PYTHONPATH variable to point to the zope lib directory, e.g.:

    export PYTHONPATH=/Development/demo3/demo3/parts/zope2/lib/python

    IIRC this will work on a running live Data.fs, so you can create a packed copy in your home directory before copying down for local development.

    -Matt

  3. adding the following snippet to your `buildout.cfg` essentially wraps matt’s recipe into a convenience script (i hope the line breaks will survive :)):

    [buildout]

    parts += packer

    [packer]
    recipe = zc.recipe.egg
    eggs =
    entry-points = packer=ZODB.fsrecover:main
    extra-paths = ${zope2:location}/lib/python
    scripts = packer
    initialization = sys.argv[1:1] = ['-P0', 'var/filestorage/Data.fs']

    after running buildout you can now simply invoke:

    $ bin/packer

    which can then be downloaded as shown above…

  4. heh, it seems anything that looks like html is filtered here — the command invocation was meant to have an extra parameter, like so:

    $ bin/packer path/to/packed/Data.fs

    this will save a packed version of your current `Data.fs` into the given file…

  5. I use repozo.py to back up, so I have the full version and deltas shipped remotely each night. Then I can recover the Data.fs again using repozo and use that copy for testing (and verify my backups at the same time). In practice you need to either do a full repozo backup every few days or do a pack, which forces a full repozo backup the next time. I do the latter.

  6. How about this: doing the whole packing and copying using one or two SSH commands, without the need to log in to the server shell?

  7. mikko,

    yep, that was actually my “goal” when i put together that buildout snippet. unfortunately, though, `fsrecover` needs to be given a proper file name, i.e. writing things to stdout directly doesn’t work. at least not ootb.

    otherwise you could of course simply use something like:

    $ rsync -az plone@server:~/bin/packer > Data.fs

  8. Even better challenge: get those ssh snippets into a fabfile and put them into a collective.hostout.filestorage recipe, and the following would work.

    [host1]
    recipe=collective.hostout
    host=myhost.com
    path=/remotepath
    extends=hostfs

    [hostfs]
    recipe=collective.hostout.filestorage

    Once added to your local buildout you can then run

    $ bin/hostout pullfs hostfs
    $ bin/hostout pushfs hostfs

    The hostout plugins are new, but there should be enough there to do this:

    http://pypi.python.org/pypi/collective.hostout
