BGPKIT Broker 0.7 Release

SQLite, CLI, NATS, and more


BGPKIT Broker is a fundamental component in our design of an all-purpose BGP data processing pipeline. In short, it is a BGP data file meta information "broker" that tells data consumers which MRT files from RouteViews and RIPE RIS are available for any given time range. It commonly serves as the data input entry point for data pipelines.

For instance, here is a simple diagram for a system that creates a semi-real-time BGP data stream with BGPKIT Broker and Parser (a very common use case for these two libraries).

Sample workflow diagram where BGPKIT Broker indexes meta information from MRT archives and BGPKIT parser parses these files to BGP messages.

BGPKIT Broker periodically crawls the MRT data pages of the RIPE RIS and RouteViews collectors and indexes the files' meta information into a database. Downstream consumers can then query for new files, retrieve them, and process them into BGP messages.

Previously on BGPKIT Broker

In BGPKIT Broker versions 0.1 to 0.6, a working Broker instance consisted of three individual components: a crawler, a Postgres database, and an API.

Each of the three components ran independently and required its own configuration, cronjobs, deployment, and all these goodies. For example, to run BGPKIT Broker v0.6, a user needed to configure and run

  • a PostgreSQL database with proper credentials and schema set up;

  • a cronjob instance that periodically crawls the data sources, with optional locks to prevent overlapping executions in case some crawl became slow;

  • an API application, likely sitting behind a configured reverse proxy like Caddy, to serve the data.

It is fun and exciting to set all of this up for the first time, but it quickly becomes tiring for repeated setups and is too complex for bootstrapping new users.

V0.7: one CLI app that does everything

We completely revamped the architecture of BGPKIT Broker in V0.7 to merge all functionality needed for a running Broker instance into one single command-line application: bgpkit-broker. V0.7 provides a single application to configure, run, debug, and query everything on BGPKIT Broker.

To achieve this redesign, we made some major changes to our architecture.

SQLite instead of PostgreSQL

There are two major concerns when choosing a backend database for BGPKIT Broker: performance and portability.

SQLite is more than fast enough

BGPKIT Broker indexes metadata for all collectors from RouteViews and RIPE RIS, which includes the time, URL, type, and size of every RIB dump and updates MRT file from these two public archives. Dating all the way back to 1999, we have indexed the metadata of roughly 48 million MRT files.

With a single index on the timestamp of files, we are able to answer any file search query in less than 0.5s, which is more than fast enough for our use cases. We admit that we previously spent our time on "premature optimization"; in the end, the benefit of a simple schema outweighs the small performance gains.
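To illustrate why one ordered index on the timestamp is enough for time-range lookups, here is a minimal in-memory sketch. It uses a `BTreeMap` as a stand-in for a database index (SQLite indexes are also B-trees). The struct names, field names, and the `search` signature are hypothetical and are not Broker's actual schema or API.

```rust
use std::collections::BTreeMap;

/// Hypothetical metadata record; field names are assumptions, not Broker's schema.
#[derive(Debug, Clone, PartialEq)]
struct MrtFileMeta {
    url: String,
    data_type: String, // e.g. "rib" or "updates"
    size_bytes: u64,
}

/// In-memory stand-in for a table with one ordered index on the file timestamp.
#[derive(Default)]
struct TimestampIndex {
    by_ts: BTreeMap<i64, Vec<MrtFileMeta>>,
}

impl TimestampIndex {
    /// Register one file under its Unix timestamp.
    fn insert(&mut self, ts: i64, meta: MrtFileMeta) {
        self.by_ts.entry(ts).or_default().push(meta);
    }

    /// Return all files with ts_start <= timestamp <= ts_end, in time order.
    /// The ordered map seeks directly to ts_start instead of scanning everything.
    fn search(&self, ts_start: i64, ts_end: i64) -> Vec<(i64, &MrtFileMeta)> {
        if ts_start > ts_end {
            // BTreeMap::range panics on an inverted range, so treat it as "no results".
            return Vec::new();
        }
        self.by_ts
            .range(ts_start..=ts_end)
            .flat_map(|(ts, files)| files.iter().map(move |f| (*ts, f)))
            .collect()
    }
}
```

A range query only touches the entries inside the requested window, so lookup cost grows with the size of the result rather than with the total number of indexed files.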

Backup and bootstrap with just one file

Now, in terms of portability, we cannot appreciate enough the beauty of a single-file database like SQLite. In our current production setup, we periodically back up the database, and it literally involves just copying a single file to another directory (well, we also upload it to Cloudflare R2 for safekeeping).

Portability also means users can move their instance anywhere they want with ease. This is definitely the case for V0.7, where new users can bootstrap by simply downloading a SQLite file (our CLI provides all that functionality) and move to a new location by copying the file (e.g. with scp) to anywhere they desire.

Here is a video demonstrating bootstrapping a local BGPKIT Broker SQLite database with the new bgpkit-broker bootstrap command.

New file notification via NATS

Before V0.7, pipelines that need to continuously process new MRT files had to periodically "pull" data from a BGPKIT Broker instance and keep track of the latest files processed. We consider this a hassle that developers should not have to deal with, and thus introduced a new NATS-based message channel that allows data consumers to subscribe to a public or private NATS channel to which a Broker instance may publish new file notifications.

We dedicated nats.broker.bgpkit.com as the public endpoint for any NATS consumers to connect to. Whenever a new file becomes available in Broker, it publishes a new file notification to the public channel, carrying all the metadata found in the database entry. Consumers (e.g. data pipelines) can use the NatsNotifier::new(None).start_subscription() to start waiting for new files. The following snippet shows how a simple pipeline can use this feature in a loop.
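For contrast, the bookkeeping that a pull-based consumer has to carry can be sketched as follows. This is an illustrative sketch only: `FileItem`, `PullCursor`, and `take_new` are hypothetical names, and the actual polling call (querying a Broker instance) is left out.

```rust
/// Hypothetical record for one indexed MRT file.
#[derive(Debug, Clone, PartialEq)]
struct FileItem {
    ts_start: i64,
    url: String,
}

/// State a pull-based consumer must persist between polls:
/// the newest file timestamp it has already handled.
struct PullCursor {
    last_ts: Option<i64>,
}

impl PullCursor {
    fn new() -> Self {
        Self { last_ts: None }
    }

    /// Given the result of one poll, return only unseen items (oldest first)
    /// and advance the cursor.
    ///
    /// Simplification: items are compared by timestamp only, so a file that shows
    /// up late with an already-seen timestamp would be skipped. A real consumer
    /// would likely track (timestamp, URL) pairs instead.
    fn take_new(&mut self, mut polled: Vec<FileItem>) -> Vec<FileItem> {
        polled.sort_by_key(|f| f.ts_start);
        let fresh: Vec<FileItem> = polled
            .into_iter()
            .filter(|f| self.last_ts.map_or(true, |seen| f.ts_start > seen))
            .collect();
        if let Some(newest) = fresh.last() {
            self.last_ts = Some(newest.ts_start);
        }
        fresh
    }
}
```

Every consumer has to reimplement and persist this kind of cursor, which is the hassle a push-based notification channel removes.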

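// `url`, `subject`, and `pretty` are assumed to be defined by the surrounding code.

// Connect to the NATS endpoint; log the error and give up if that fails.
let mut notifier = match NatsNotifier::new(url).await {
    Ok(n) => n,
    Err(e) => {
        error!("{}", e);
        return;
    }
};
// Subscribe to the new-file subject before entering the receive loop.
if let Err(e) = notifier.start_subscription(subject).await {
    error!("{}", e);
    return;
}
// Block until the next notification arrives, then print it.
while let Some(item) = notifier.next().await {
    if pretty {
        println!("{}", serde_json::to_string_pretty(&item).unwrap());
    } else {
        println!("{}", item);
    }
}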

We also implemented a simple new file watcher in the app as the bgpkit-broker live subcommand. It starts a subscription to the public BGPKIT NATS endpoint and prints out new file data as it arrives on the channel.

One command to serve and update

As mentioned previously, the new bgpkit-broker application includes everything one needs to start an instance. Once the database has been bootstrapped to a local SQLite file (via the bgpkit-broker bootstrap <FILENAME> command), all that is needed to start an auto-updating API is to run bgpkit-broker serve <FILENAME>.

bgpkit-broker serve --help
Serve the Broker content via RESTful API

Usage: bgpkit-broker serve [OPTIONS] <DB_PATH>

Arguments:
  <DB_PATH>  broker db file location

Options:
  -i, --update-interval <UPDATE_INTERVAL>  update interval in seconds [default: 300]
      --no-log                             disable logging
  -b, --bootstrap                          bootstrap the database if it does not exist
      --env <ENV>                          
  -s, --silent                             disable bootstrap progress bar
  -h, --host <HOST>                        host address [default: 0.0.0.0]
  -p, --port <PORT>                        port number [default: 40064]
  -r, --root <ROOT>                        root path, useful for configuring docs UI [default: /]
      --no-update                          disable updater service
      --no-api                             disable API service
  -h, --help                               Print help
  -V, --version                            Print version

The serve subcommand also starts a thread that periodically crawls the data sources and updates the SQLite database, to make sure the API always serves up-to-date data.
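The general pattern of a background thread that updates on a fixed interval can be sketched with the standard library alone. This is not Broker's implementation: `spawn_updater`, `Updater`, and the `update` callback are hypothetical names, and the crawl itself is represented by an arbitrary closure.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Handle to a running background updater (hypothetical type).
struct Updater {
    stop_tx: Sender<()>,
    handle: JoinHandle<()>,
}

/// Spawn a thread that calls `update` immediately and then once per `interval`.
fn spawn_updater<F>(interval: Duration, mut update: F) -> Updater
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        update();
        // Wait for one interval, but wake up early if a stop signal arrives
        // or the sending side has been dropped.
        match stop_rx.recv_timeout(interval) {
            Err(RecvTimeoutError::Timeout) => continue,
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    Updater { stop_tx, handle }
}

impl Updater {
    /// Signal the loop to exit and wait for the thread to finish.
    fn stop(self) {
        let _ = self.stop_tx.send(());
        let _ = self.handle.join();
    }
}
```

With `Duration::from_secs(300)` this mirrors the default `--update-interval` shown in the help output above. Waiting on the channel instead of sleeping lets the service shut down promptly rather than blocking for a full interval.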

You may notice an error message about the notification channel at startup. This is by design: as a default behavior for a service, Broker tries to connect to the notification channel for new files, but no NATS URL is configured. We use the BGPKIT_BROKER_NATS_URL environment variable to configure the NATS channel to use.

We also allow users to optionally configure a heartbeat URL to monitor the data updating status. After every successful data crawling run, Broker will try to send an HTTP GET request to a URL if BGPKIT_BROKER_HEARTBEAT_URL is set in the environment. This is useful for monitoring the running status of the Broker instance without the need to set up a cronjob.
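The environment-driven behavior described above can be sketched as follows. Only the two variable names come from this post; the `Integrations` type, its methods, the treatment of empty values as unset, and the injected `http_get` callback are assumptions made for illustration, not Broker's actual code.

```rust
/// Optional integrations that are switched on purely by environment variables.
#[derive(Debug, PartialEq)]
struct Integrations {
    nats_url: Option<String>,
    heartbeat_url: Option<String>,
}

impl Integrations {
    /// Build from any key lookup, which keeps the logic testable.
    /// Assumption: blank values are treated the same as unset.
    fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let read = |key: &str| {
            lookup(key)
                .map(|v| v.trim().to_string())
                .filter(|v| !v.is_empty())
        };
        Self {
            nats_url: read("BGPKIT_BROKER_NATS_URL"),
            heartbeat_url: read("BGPKIT_BROKER_HEARTBEAT_URL"),
        }
    }

    /// Read the settings from the real process environment.
    fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }

    /// Call after each crawl. Pings the heartbeat URL only when the crawl
    /// succeeded and a URL is configured. Returns true if a ping was sent
    /// and the injected `http_get` reported success.
    fn heartbeat_after_crawl<G>(&self, crawl_ok: bool, http_get: G) -> bool
    where
        G: FnOnce(&str) -> Result<(), String>,
    {
        match (&self.heartbeat_url, crawl_ok) {
            (Some(url), true) => http_get(url).is_ok(),
            _ => false,
        }
    }
}
```

Because a failed crawl sends no ping, the monitoring service sees a missed heartbeat and can raise an alert without any extra cronjob on the Broker host.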

We use Better Stack's Uptime monitoring service for page and heartbeat monitoring, and the public Broker instance runs V0.7 with the heartbeat URL pointing to this service. All status information can be found at https://status.bgpkit.com/

Production-ready, on-prem deployment

Although BGPKIT Broker has not yet reached V1.0, we consider it to be feature-complete and production-ready. Ever since V0.2, we have made our best effort not to introduce any breaking changes, and the service has been serving the community with stable uptime. We believe all libraries running in production should be at least at version 1.0, and thus we will release V1.0 this summer.

We also made significant efforts in the V0.7 release to make sure BGPKIT Broker is as portable as possible. New users can spin up a fully functioning Broker instance with just two commands, bgpkit-broker bootstrap and bgpkit-broker serve, all within 5 minutes. With V0.7 released, we encourage all data pipeline designers to deploy a Broker instance on-premises, so that data pipelines are self-contained and external dependencies are reduced as much as possible. We will also continue to maintain our public instance to the best of our ability (we are currently at 99.996% uptime). Thanks to our sponsors, we are able to keep the services up as we do, and we plan to continue serving the community the same way for the foreseeable future.

For the full V0.7 release notes, please check out our GitHub release page. If you have any comments, please drop us a message on Twitter, Mastodon, or by email.

💖
If you find our libraries and services useful, we would greatly appreciate it if you considered sponsoring us on GitHub.