BGPKIT Broker is a fundamental component of our design for an all-purpose BGP data processing pipeline. In short, it is a BGP data file metadata "broker" that tells data consumers which MRT files from RouteViews and RIPE RIS are available for any given time range. It commonly serves as the data-input entry point for data pipelines.
For instance, here is a simple diagram for a system that creates a semi-real-time BGP data stream with BGPKIT Broker and Parser (a very common use case for these two libraries).
BGPKIT Broker periodically crawls the MRT data pages of the RouteViews and RIPE RIS collectors and indexes the file metadata into a database. Downstream consumers can then query for newly available files and process them into BGP messages.
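As a sketch of what a downstream consumer looks like, the snippet below queries a Broker instance using the bgpkit-broker Rust crate's builder API. The method names (ts_start, ts_end) and the iterable broker struct follow the crate's documented interface, but treat the details as assumptions if your version differs; the code also requires network access to a Broker API.

```rust
use bgpkit_broker::BgpkitBroker;

fn main() {
    // Build a query for MRT files within a time range; the builder
    // accepts UNIX timestamps or RFC 3339 time strings.
    let broker = BgpkitBroker::new()
        .ts_start("1634693400")
        .ts_end("1634693400");

    // The broker struct is iterable; each item describes one MRT file
    // (collector, type, URL, size, timestamp).
    for item in broker {
        println!("{}", item.url);
    }
}
```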
Previously on BGPKIT Broker
In BGPKIT Broker versions 0.1 through 0.6, a working Broker instance consisted of three individual components: a crawler, a PostgreSQL database, and an API.
Each of the three components runs independently and requires its own configuration, cronjobs, deployment, and all these goodies. For example, to run BGPKIT Broker v0.6, a user needs to configure and run
a PostgreSQL database with proper credentials and schema set up;
a cronjob instance that periodically crawls the data sources, with optional locks to prevent overlapping executions in case a crawl runs slow;
an API application, likely sitting behind a configured reverse proxy like Caddy, to serve the data.
It's fun and exciting to set all this up for the first time, but it quickly becomes tiring and overly complex for repeated setups or for bootstrapping new users.
V0.7: one CLI app that does everything
We completely revamped the architecture of BGPKIT Broker in V0.7 to merge all the functionality needed for a running Broker instance into one single command-line application: bgpkit-broker. V0.7 provides a single application to configure, run, debug, and query everything in BGPKIT Broker.
To achieve this redesign, we made some major changes to our architecture.
SQLite instead of PostgreSQL
There are two major concerns when choosing a backend database for BGPKIT Broker: performance and portability.
SQLite is more than fast enough
BGPKIT Broker indexes metadata for all collectors from RouteViews and RIPE RIS, including the time, URL, type, and size of every RIB dump and updates MRT file from these two public archives. Dating all the way back to 1999, we have indexed roughly 48 million MRT files' metadata.
With a single index on the file timestamp, we can answer any query in less than 0.5 seconds, which is more than fast enough for our use cases. We admit we spent some time on premature optimization; in the end, the simple schema outweighed the small performance gains.
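To illustrate why a single index suffices, here is a minimal, hypothetical schema sketch; the real Broker schema may differ, but any time-range lookup of this shape is served directly by the timestamp index.

```sql
-- Hypothetical, simplified schema; the actual Broker schema may differ.
CREATE TABLE files (
    timestamp    INTEGER NOT NULL,  -- file timestamp (UNIX epoch)
    collector_id TEXT    NOT NULL,  -- e.g. "rrc00" or "route-views2"
    type         TEXT    NOT NULL,  -- "rib" or "updates"
    url          TEXT    NOT NULL,
    rough_size   INTEGER
);
CREATE INDEX idx_files_timestamp ON files (timestamp);

-- A typical time-range query answered via the index:
SELECT * FROM files
WHERE timestamp BETWEEN 1634693400 AND 1634697000;
```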
Backup and bootstrap with just one file
Now, in terms of portability, we cannot appreciate enough the beauty of a single-file database like SQLite. In our current production setup, periodically backing up the database literally involves just copying a single file to another directory (well, we also upload it to Cloudflare R2 for safekeeping).
Portability also means users can move their instances anywhere they want with ease. With V0.7, new users can bootstrap by simply downloading a SQLite file (our CLI provides all that functionality) and relocate an instance by scp-ing the file anywhere they desire.
Here is a video demonstrating bootstrapping a local BGPKIT Broker SQLite database with the new bgpkit-broker bootstrap command.
New file notification via NATS
Before V0.7, pipelines that need to continuously process new MRT files had to "pull" data from a BGPKIT Broker instance periodically and keep track of the latest files processed. We consider this a hassle that developers should not have to deal with, and thus introduced a new NATS-based message channel that allows data consumers to subscribe to a public or private NATS channel where a Broker instance publishes new-file notifications.
We dedicated nats.broker.bgpkit.com as the public endpoint for any NATS consumer to connect to. Whenever a new file becomes available in Broker, it publishes a new-file notification, carrying all the metadata from the database entry, to the public channel. Consumers (e.g. data pipelines) can use NatsNotifier::new(None).start_subscription() to start waiting for new files. The following snippet shows how a simple pipeline can use this feature in a loop.
// Assumes the `bgpkit-broker` crate (with NATS support enabled),
// plus `serde_json` and the `tracing::error!` macro in scope.
use bgpkit_broker::notifier::NatsNotifier;
use tracing::error;

async fn watch_new_files(url: Option<String>, subject: Option<String>, pretty: bool) {
    // Connect to the NATS server; passing `None` falls back to the
    // default endpoint (configurable via BGPKIT_BROKER_NATS_URL).
    let mut notifier = match NatsNotifier::new(url).await {
        Ok(n) => n,
        Err(e) => {
            error!("{}", e);
            return;
        }
    };
    // Subscribe to the new-file notification subject.
    if let Err(e) = notifier.start_subscription(subject).await {
        error!("{}", e);
        return;
    }
    // Each received item describes one newly available MRT file.
    while let Some(item) = notifier.next().await {
        if pretty {
            println!("{}", serde_json::to_string_pretty(&item).unwrap());
        } else {
            println!("{}", item);
        }
    }
}
We also implemented a simple new-file watcher in the app as the bgpkit-broker live subcommand. It starts a subscription to the public BGPKIT NATS endpoint and prints out new file metadata as it arrives on the channel.
One command to serve and update
As mentioned previously, the new bgpkit-broker application includes everything one needs to start an instance. Once the database is bootstrapped to a local SQLite file (via the bgpkit-broker bootstrap <FILENAME> command), all it takes to start an auto-updating API is to run bgpkit-broker serve <FILENAME>.
bgpkit-broker serve --help
Serve the Broker content via RESTful API
Usage: bgpkit-broker serve [OPTIONS] <DB_PATH>
Arguments:
<DB_PATH> broker db file location
Options:
-i, --update-interval <UPDATE_INTERVAL> update interval in seconds [default: 300]
--no-log disable logging
-b, --bootstrap bootstrap the database if it does not exist
--env <ENV>
-s, --silent disable bootstrap progress bar
-h, --host <HOST> host address [default: 0.0.0.0]
-p, --port <PORT> port number [default: 40064]
-r, --root <ROOT> root path, useful for configuring docs UI [default: /]
--no-update disable updater service
--no-api disable API service
-h, --help Print help
-V, --version Print version
The serve subcommand also starts a thread that periodically crawls the data sources and updates the SQLite database, making sure the API always serves up-to-date data.
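For example, assuming a database file named broker.sqlite3 (the filename is arbitrary), a self-updating instance can be started with:

```shell
# Download a full copy of the indexed database (a single SQLite file).
bgpkit-broker bootstrap broker.sqlite3

# Serve the RESTful API and re-crawl the data sources every 300 seconds.
bgpkit-broker serve broker.sqlite3 --update-interval 300
```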
Noticed that error message? It's by design: as a default behavior for a service, it tries to connect to the new-file notification channel, but no NATS URL is configured. We use the BGPKIT_BROKER_NATS_URL environment variable to configure the NATS channel to use.
We also allow users to optionally configure a heartbeat URL to monitor the data-updating status. After every successful data-crawling run, Broker will attempt an HTTP GET to the URL set in the BGPKIT_BROKER_HEARTBEAT_URL environment variable. This is useful for monitoring the running status of a Broker instance without setting up a cronjob.
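Putting the two environment variables together (the URLs and filename below are placeholders, not real endpoints), a production invocation might look like:

```shell
# Optional: NATS server to publish new-file notifications to.
export BGPKIT_BROKER_NATS_URL="nats://nats.example.com:4222"

# Optional: heartbeat URL to GET after each successful crawl.
export BGPKIT_BROKER_HEARTBEAT_URL="https://uptime.example.com/heartbeat/abc123"

bgpkit-broker serve broker.sqlite3
```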
We use Better Stack's Uptime monitoring service for page and heartbeat monitoring, and the public Broker instance is running V0.7 with the heartbeat URL set to this service. All status information can be found at https://status.bgpkit.com/
Production-ready, on-prem deployment
Although BGPKIT Broker has not yet reached V1.0, we consider it feature-complete and production-ready. Ever since V0.2, we have made our best effort not to introduce any breaking changes, and the service has been serving the community with stable uptime. We believe all libraries running in production should be at least 1.0, and thus we will release V1.0 later this summer.
We also made significant efforts in the V0.7 release to make BGPKIT Broker as portable as possible. New users can spin up a fully functioning Broker instance with just two commands, bgpkit-broker bootstrap
and bgpkit-broker serve
, all within 5 minutes. With V0.7 released, we encourage all data pipeline designers to deploy a Broker instance on-premise, keeping data pipelines self-contained and reducing external dependencies as much as possible. We will also continue to maintain our public instance to the best of our abilities (we are currently at 99.996% uptime). Thanks to our sponsors, we are able to keep the services up as we do, and we plan to continue serving the community the same way for the foreseeable future.
For the full V0.7 release notes, please check out our GitHub release page. If you have any comments, please drop us a message on Twitter, Mastodon, or via email.