Importing a data snapshot

This is a page about improving or modifying OpenHatch.

We call that "Hacking OpenHatch," and there is a whole category of pages about that.

When you get your own instance of the OpenHatch code running, you'll discover that it is missing the data on the main OpenHatch site. To assist developers working on features for which that data would be helpful, we take periodic snapshots of the data on the main openhatch.org site.

Where you can find the snapshots

You may download a snapshot of the OpenHatch data from:

Note: We go through some effort to remove private information before we publish user data in these snapshots. The code for that is here.

Privacy implications

We discuss some privacy implications in the privacy policy document. We suggest that people read the privacy policy when they create their accounts.

How to use a snapshot

Note: You must run

 python manage.py syncdb --noinput

and

 python manage.py migrate

before the following steps will work. Read README.mkd to learn more about those commands. Running them should take less than one minute.


1. Copy the downloaded snapshot file into the 'oh-mainline' directory. The snapshot file is named date.json.gz, where 'date' is in the form YYYY-MM-DD.

2. To load the snapshot file into your database, run the following command (there is no need to uncompress the snapshot file first):

 python manage.py loaddata 2012-08-12.json.gz

Note: This may take a long time (10-15 minutes) and produces no output while it runs. This is normal.

When it finishes, you'll see output that looks something like this:

Installed 94858 object(s) from 1 fixture(s)
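
Because loaddata is silent while it works, it can be reassuring to see what a snapshot contains before loading it. The following is a minimal sketch, assuming the snapshot is an ordinary gzipped Django JSON fixture (a list of objects, each with a "model" key). The function name is hypothetical and not part of the OpenHatch code. It reads the whole file into memory, so it may be slow on a large snapshot.

```python
import collections
import gzip
import json


def count_fixture_objects(path):
    """Count the objects in a gzipped Django JSON fixture, grouped by model.

    Hypothetical helper for illustration; it does not touch the database.
    """
    # gzip.open in text mode decompresses on the fly, so the snapshot
    # does not need to be uncompressed on disk first.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        objects = json.load(handle)
    return collections.Counter(obj["model"] for obj in objects)
```

Calling count_fixture_objects("2012-08-12.json.gz") returns a Counter whose values should add up to the number of objects that loaddata reports.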

3. Then, run the following command to update the database file site.db with the new data snapshot:

 python manage.py syncdb

You can test that it worked by loading up your local projects page and ensuring it is not empty. Access http://127.0.0.1:8000/projects/ (and compare it to http://openhatch.org/projects/ if you like!) to check.
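
If you import snapshots often, the commands on this page can be collected into a small script. This is only a sketch: the helper names are hypothetical, and it assumes that "python" on your PATH is the interpreter you use for OpenHatch and that the snapshot file has already been copied into the directory that contains manage.py. Only the manage.py commands themselves come from the steps above.

```python
import re
import subprocess

# Snapshot files are named YYYY-MM-DD.json.gz, as described in step 1.
SNAPSHOT_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}\.json\.gz$")


def build_import_commands(snapshot_name):
    """Return the manage.py invocations from this page, in order."""
    if not SNAPSHOT_NAME.match(snapshot_name):
        raise ValueError(
            "expected a name like YYYY-MM-DD.json.gz, got %r" % snapshot_name
        )
    return [
        ["python", "manage.py", "syncdb", "--noinput"],      # prerequisite
        ["python", "manage.py", "migrate"],                  # prerequisite
        ["python", "manage.py", "loaddata", snapshot_name],  # step 2
        ["python", "manage.py", "syncdb"],                   # step 3
    ]


def run_import(snapshot_name, project_dir="."):
    """Run each command in project_dir, stopping at the first failure."""
    for command in build_import_commands(snapshot_name):
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command, cwd=project_dir)
```

For example, run_import("2012-08-12.json.gz") run from the 'oh-mainline' directory would execute the four commands in sequence. The loaddata step will still be silent for several minutes.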

In case of memory problems

If your operating system has problems loading the very large set of production data, you can get the db file directly from http://inside.openhatch.org/snapshots/. The critical step is to first rename the existing development db file (in case you need it later), and then rename the newly downloaded db file to site.db.

 python manage.py syncdb --noinput --migrate
 python manage.py migrate
 python manage.py loaddata NAME_OF_SNAPSHOT.gz
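
The rename step described in this section can be sketched in Python as follows. The function name and the backup naming scheme are assumptions made for illustration, as is the default path: adjust db_path to wherever your development site.db actually lives.

```python
import os
import shutil
import time


def swap_in_downloaded_db(downloaded_path, db_path="site.db"):
    """Keep the current database under a new name, then install the downloaded one.

    Returns the backup path, or None if there was no existing database.
    Hypothetical helper; not part of the OpenHatch code.
    """
    backup_path = None
    if os.path.exists(db_path):
        # Timestamped name, so the old development data stays available.
        backup_path = "%s.backup-%s" % (db_path, time.strftime("%Y%m%d-%H%M%S"))
        os.rename(db_path, backup_path)
    # Move the downloaded file into place under the name the site expects.
    shutil.move(downloaded_path, db_path)
    return backup_path
```

Renaming the old file before moving the new one into place means the development data is never overwritten, which is the point of the warning above.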

More about this

  • We are now creating these snapshots once a day.
  • We don't snapshot every single table. If you find there's something that we don't publish that we should, do file a bug!
  • Known issue: If your MySQL database isn't set up for Unicode, you could get a warning/error like this: "Incorrect string value: '\xC8\x9B' for column 'first_name' at row 1". That issue can be fixed; just re-create your database as described in the README.mkd file. (If you need help, come find us on #openhatch.)
  • How it works, on the servers: On linode2.openhatch.org, a cron job wakes up daily and runs mysite/scripts/snapshot_then_push.sh
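
To see why the Unicode issue in the list above occurs, it helps to decode the bytes quoted in the error message. The sketch below assumes the non-Unicode database uses a Latin-1 style character set, which is a guess about a typical MySQL default rather than something stated on this page. The helper name is hypothetical.

```python
# The bytes quoted in the MySQL error message above.
raw = b"\xc8\x9b"

# They are valid UTF-8 and decode to a single character, U+021B
# (a "t" with a comma below, used in Romanian names).
char = raw.decode("utf-8")
print(hex(ord(char)))  # 0x21b


def fits_in_latin1(text):
    """Return True if text can be encoded as Latin-1 without error."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_latin1(char))          # False
print(fits_in_latin1("Jos\u00e9"))   # True
```

A name containing this character cannot be represented in Latin-1, which is consistent with the error going away once the database is re-created with Unicode support.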