Maintaining an Up-to-Date PurlDB with PurlWatch

PurlDB serves as a knowledge base for packages. It is essential to keep this knowledge base updated as new package versions are released daily. PurlWatch is responsible for keeping PurlDB up-to-date. Depending on the PurlDB size PurlWatch provides two different approach.

Methods to Keep PurlDB Updated

Using the Management Command

For a relatively small and focused PurlDB, one can use the management command python manage_purldb.py watch_packages. This command can be run periodically using a cron job to watch all the PURLs in your PurlDB for new versions. Upon detecting new versions, it collects and indexes the new package. This approach is efficient for smaller databases.

Using the /api/watch Endpoint

For larger PurlDB, the /api/watch endpoint is ideal. Users can use this endpoint to register interest for a PURL and specify how frequently to check for new versions of package. Additionally, it includes a depth field to specify the level of data collection, such as version only, metadata, or a complete scan.

How Watch API Endpoint Works

The /api/watch endpoint allows users to register interest in a specific PURL and periodically monitors it for new version. To effectively manage this periodic monitoring, PurlWatch uses a comprehensive PackageWatch model. The watch_interval field determines how often to look for new package version. The depth field specifies the level of data collection, whether it’s just the version, metadata, or a full scan. Errors encountered during the watch process are tracked in watch_error and resets after each new watch. The schedule_work_id keeps track of the periodic job for a PURL throughout its lifecycle from creation, modification to deletion. The is_active field allows users to pause and resume the watch for any PURL, providing fine-grained control over the entire watch process.

The watch feature utilizes the RQ scheduler to keep track of when a particular PURL is due for watch. It creates watch task for the PURL and enqueues it in RQ for execution.

Advantages

  • Background tasks ensure that the PurlDB remains updated without manual intervention.

  • The watch frequency can be customized to balance the resource uses.

  • Users can define the depth of data collection based on their needs.

Tip

For detailed instructions on using /api/watch endpoint, refer to Watch PURL for new Version.