Bug tracker import code (moved)

From OpenHatch wiki
Revision as of 01:23, 29 July 2010 by Paulproteus

The code we use for crawling bug trackers lives in mysite/search/tasks/. Some of the importers are still in __init__.py, but many have been migrated out to separate modules. It's fairly straightforward.

Our bug tracker import code comes in a few different designs, and the importers for different bug trackers use different styles. I'll only document the most recent style here, since it's the one I like most so far. The Fedora Bugzilla "fitandfinish" importer is an example of it.

It's separated into three tasks:

  • One task accepts a Fedora Bugzilla bug ID as input. It checks if our copy of it is more than one day old. If so, it refreshes it by pulling the fresh data from the remote bug tracker. If we do not have a local copy of that Fedora bug at all, it downloads it and stores a copy in our database. It's called LookAtOneFedoraBug and lives in mysite/search/tasks/bugzilla_instances.py.
  • One task is a "Periodic" task, basically a Python-based cron job. It wakes up every day and grabs a list of Fedora bugs from the Fedora bug tracker. For each such bug, it enqueues a LookAtOneFedoraBug instance. It's called LearnAboutNewFedoraFitAndFinishBugs.
  • The final task is another Periodic task. It wakes up once a day and, for each bug in our database that seems to come from Fedora, it enqueues a LookAtOneFedoraBug task to make sure we look at that bug again. This task is needed so that we learn when bugs should be marked as CLOSED. (The LearnAboutNewFedoraFitAndFinishBugs task never looks at CLOSED bugs, since those don't appear in the Bugzilla search it runs. Ask me to reword this explanation if necessary!)
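
The three tasks above can be sketched in plain Python. This is only an illustration of the control flow, not the real code: the dict and list stand in for our database and the job queue, and the function names and the fetch and search callables are hypothetical.

```python
import datetime

STALE_AFTER = datetime.timedelta(days=1)

local_bugs = {}    # hypothetical stand-in for our database: bug_id -> record
pending_jobs = []  # hypothetical stand-in for the job queue


def look_at_one_fedora_bug(bug_id, fetch, now):
    """Refresh our copy of one bug if it is missing or more than a day old."""
    record = local_bugs.get(bug_id)
    if record is not None and now - record["last_polled"] <= STALE_AFTER:
        return False  # our copy is fresh enough, so skip the download
    data = fetch(bug_id)  # fetch() is a placeholder for the real download code
    local_bugs[bug_id] = {"status": data["status"], "last_polled": now}
    return True


def learn_about_new_bugs(search):
    """Daily job: enqueue one look-at job per bug the remote search returns."""
    for bug_id in search():
        pending_jobs.append(bug_id)


def refresh_known_bugs():
    """Daily job: enqueue one look-at job per bug we already store."""
    for bug_id in list(local_bugs):
        pending_jobs.append(bug_id)
```

Note that only refresh_known_bugs() can notice that a bug was closed, because a closed bug would no longer come back from the callable passed to learn_about_new_bugs().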

Why all this talk about "tasks"? Splitting the work this way keeps each piece of code small, and it lets a job queuing system make sure we don't hammer the Fedora Bugzilla by downloading lots of bugs in parallel. We use the Celery task queue to manage these jobs.
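
To show why a queue helps with politeness, here is a minimal sketch that uses only the Python standard library. It is not Celery's API; drain() and its parameters are hypothetical names. The point is that the number of workers pulling from the queue caps how many downloads can run at once, no matter how many jobs are waiting.

```python
import queue
import threading


def drain(jobs, handle, workers=1):
    """Run handle(job) for every job, using at most `workers` threads.

    Returns the highest number of jobs that were in flight at once.
    """
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            with lock:
                active += 1
                peak = max(peak, active)
            try:
                handle(job)
            finally:
                with lock:
                    active -= 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With workers=1 the jobs run strictly one after another, which is the gentlest setting for the remote bug tracker.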

These tasks are the workers, but they're not where the real bug tracker parsing work is done. That code lives in various files in mysite/customs/. We do write tests for the bug tracker importers, and those live in mysite/customs/tests.py. So far we mostly test the core import code that the tasks use, rather than the tasks themselves.
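
As a sketch of what testing the core import code (rather than the tasks) can look like, the example below tests a small parsing function against a canned response, so no network access or task queue is involved. Everything here is an assumption made for illustration: parse_bug(), the returned field names, and the sample data are invented, and the example assumes the importer reads Bugzilla's XML export.

```python
import unittest
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a Bugzilla XML export (not real bug data).
SAMPLE_XML = """<bugzilla>
  <bug>
    <bug_id>12345</bug_id>
    <short_desc>Dialog text is truncated</short_desc>
    <bug_status>CLOSED</bug_status>
  </bug>
</bugzilla>"""


def parse_bug(xml_text):
    """Hypothetical parser: pull the fields we care about out of one bug."""
    bug = ET.fromstring(xml_text).find("bug")
    return {
        "id": int(bug.findtext("bug_id")),
        "title": bug.findtext("short_desc"),
        "closed": bug.findtext("bug_status") == "CLOSED",
    }


class ParseBugTests(unittest.TestCase):
    def test_closed_bug_is_marked_closed(self):
        parsed = parse_bug(SAMPLE_XML)
        self.assertEqual(parsed["id"], 12345)
        self.assertEqual(parsed["title"], "Dialog text is truncated")
        self.assertTrue(parsed["closed"])
```

Saved in a module, a test like this can be run with python -m unittest.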