Category:Hacking OpenHatch

About our bug tracker import code
The code we use for crawling bug trackers lives in mysite/search/tasks/. Some of the imports are in __init__.py, but many have been migrated out to separate modules. It's fairly straightforward.

There are a few different designs of our bug tracker import code, and different bug trackers that we import from use different styles. I'll document only the most recent style here, since I like it the most so far. The Fedora Bugzilla "fitandfinish" importer is an example of it.

It's separated into three tasks:


 * One task accepts a Fedora Bugzilla bug ID as input. If we have a local copy of that bug and it is more than one day old, the task refreshes it by pulling fresh data from the remote bug tracker. If we have no local copy of that Fedora bug at all, it downloads the bug and stores a copy in our database. It's called LookAtOneFedoraBug and lives in mysite/search/tasks/bugzilla_instances.py.


 * One task is a "Periodic" task, basically a Python-based cron job. It wakes up every day and grabs a list of Fedora bugs from the Fedora bug tracker. For each such bug, it enqueues a LookAtOneFedoraBug instance. It's called LearnAboutNewFedoraFitAndFinishBugs.


 * The final task is another Periodic task. It wakes up once a day and, for each bug in our database that seems to come from Fedora, it enqueues a LookAtOneFedoraBug task to make sure we look at that bug. This task is needed so that we learn when bugs should be marked as CLOSED (note that the LearnAboutNewFedoraFitAndFinishBugs task would not look at CLOSED bugs, since those wouldn't appear in the Bugzilla search -- ask me to reword this explanation if necessary!).
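The staleness check at the heart of LookAtOneFedoraBug can be sketched roughly like this. The function and field names here are hypothetical, for illustration only; the real code lives in mysite/search/tasks/bugzilla_instances.py:

```python
import datetime

def bug_needs_refresh(last_polled, now=None):
    """Return True if our copy of a bug should be (re)downloaded.

    last_polled is the datetime we last pulled the bug from the
    remote tracker, or None if we have no local copy at all.
    (Hypothetical sketch, not the actual OpenHatch code.)
    """
    if last_polled is None:
        return True  # no local copy: download and store one
    if now is None:
        now = datetime.datetime.utcnow()
    # refresh any copy more than one day old
    return (now - last_polled) > datetime.timedelta(days=1)
```

The task would call something like this and, when it returns True, fetch fresh data from the remote Bugzilla and save it to the database.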

Why all this talk about "tasks"? Splitting the work this way keeps each piece of code small, and it lets us use a job queuing system to make sure we don't hammer the Fedora Bugzilla by downloading lots of bugs in parallel. We use the celery task queue to manage these jobs.

These tasks are the workers, but they're not where the real bug tracker parsing happens. That lives in mysite/customs/, in various files there. We do write tests for the bug tracker importers; those live in mysite/customs/tests.py. So far we mostly test the core import code that the tasks use, rather than the tasks themselves.
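A test of the core import code might look roughly like this sketch. The parsing helper here is hypothetical (the real tests in mysite/customs/tests.py exercise the actual importer classes):

```python
import unittest

def parse_bug_status(raw_status):
    """Hypothetical helper: map a raw Bugzilla status string to a
    simple "is this bug still open?" boolean."""
    closed_statuses = {'RESOLVED', 'VERIFIED', 'CLOSED'}
    return raw_status.strip().upper() not in closed_statuses

class ParseBugStatusTest(unittest.TestCase):
    def test_open_bug(self):
        self.assertTrue(parse_bug_status('NEW'))

    def test_closed_bug(self):
        self.assertFalse(parse_bug_status('closed'))

if __name__ == '__main__':
    unittest.main()
```

Tests like this stay fast because they never touch the network; the task machinery around them is exercised separately, if at all.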

Using buildout: adding a dependency

 * 1) Find the Python package name of the dependency. This will appear in the package's own setup.py file (as a parameter called "name" passed to the setup function).
 * 2) Edit milestone-a/setup.py.
 * 3) Add the dependency's Python package name to "install_requires". You may optionally include a version number.
 * 4) If you haven't already, create a tarball of the Python package: cd to the package's top-level directory (where its setup.py is) and run `python setup.py sdist`. This creates a tarball of the package in dist/.
 * 5) Host a tarball of the Python package at some public URL. (This is in order to cache a copy of the package somewhere everyone can reliably reach it.)
 * 6) Add the tarball URL to the list called "dependency_links" in milestone-a/setup.py, appending the string "#egg=the package name" to the URL.
 * 7) Add the package name to milestone-a/buildout.cfg under "eggs".
 * 8) Run bin/buildout
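The edits to milestone-a/setup.py described above might look something like this fragment. The package name ("somepackage") and the tarball URL are placeholders, not real values from our setup.py:

```python
# Hypothetical fragment of milestone-a/setup.py. "somepackage" and the
# example.com URL are placeholders standing in for the real dependency.
from setuptools import setup, find_packages

setup(
    name='milestone-a',
    packages=find_packages(),
    install_requires=[
        'somepackage==1.0',  # the new dependency; the version pin is optional
    ],
    dependency_links=[
        # the hosted tarball URL, with "#egg=" plus the package name appended
        'http://example.com/packages/somepackage-1.0.tar.gz#egg=somepackage',
    ],
)
```

After saving this and adding the package name under "eggs" in milestone-a/buildout.cfg, running bin/buildout should pick up the new dependency.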

Other notes of interest

 * Performance analysis