BGPKIT Blog

Command-line Routing Stats with Monocle and Cloudflare Radar API

Mingwei Zhang — Sun, 21 Apr 2024 18:49:59 GMT

BGPKIT monocle is a command-line utility program that helps users quickly pull Internet routing-related information from publicly available sources.

https://github.com/bgpkit/monocle

In BGPKIT monocle version V0.5, we add support for querying Cloudflare Radar's new BGP routing statistics and prefix-to-origin mapping APIs, the same APIs that power the Cloudflare Radar routing section. monocle users can now quickly glance overview of routing stats for any given ASN, country, or the whole Internet. Users can also quickly look up prefix origins and examine their RPKI validation status as well as prefix visibility on the global routing tables.

Using `monocle radar`

We added a new monocle radar command group in V0.5, which contains the following to subcommands:

monocle radar stats [QUERY]: get routing stats (like prefix count, rpki invalid count) for a given country or ASN.
monocle radar pfx2as [QUERY] [--rpki-status valid|invalid|unknown]: get prefix to origin mapping for a given prefix or ASN

mingwei@terrier ~ % monocle radarCloudflare Radar API lookup (set CF_API_TOKEN to enable)Usage: monocle radar Commands:  stats   get routing stats  pfx2as  look up prefix to origin mapping on the most recent global routing table snapshot  help    Print this message or the help of the given subcommand(s)Options:  -h, --help     Print help  -V, --version  Print version

Cloudflare API token needed

Since the monocle radar command relies on querying data using Cloudflare Radar public API, we also need to specify a user API token as the environment variable CF_API_TOKEN. Obtaining an API token is free and only needs a Cloudflare account. Interested users can follow their official tutorial to obtain a token. The environment variable can be set in a .env file in the current directory, or set in ~/.bashrc or ~/.profile etc.

`monocle radar stats`

Users can query the routing statistics for a given country or ASN. For example, monocle radar stats us returns the routing stats for the United States, while monocle radar stats 174 returns the stats for Cogent (AS174).

The displayed table is further divided into three rows, one for overall counting, and one for IPv4 and IPv6-specific counting. For each row, we show the following fields:

origins: the number of origins ASes registered in the given country
prefixes: the number of prefixes originated by the given ASN or ASes registered in the given country
rpki_valid/invalid/unknown: the number of RPKI valid/invalid/unknown prefix routes (prefix-origin mapping) on the global routing table and their percentage of the overall routes.

`monocle radar pfx2as`

Users can query the prefix-to-origin API to get the mapping of origin ASes and their originated prefixes on the global routing table.

In the following example, monocle radar pfx2as 174 --rpki-status invalid, we ask for all the prefixes originated by AS174 with the RPKI validation status to be invalid. This command returns us the list of RPKI invalid prefixes originated by AS174 at the time of generating the dataset.

Questions it can answer now (more in the future)

Here is a selected list of questions that monocle radar command can answer you:

How many ASes are there on the Internet that announce at least one prefix? (81,770)
How many of these ASes announce only IPv6 prefixes? (6,853)
How many prefixes are there on the global routing table? (1,205,218)
How many prefixes do AS400644 announce? (1)
Which AS(es) originates 1.1.1.0/24? (AS13335)
How many prefixes originated by AS174 are NOT covered by some RPKI ROA? (a lot, 94%+)
How about the RPKI valid ratio for the Philippines? (77%, nice!)

Powered by Cloudflare Radar free API

Cloudflare Radar is a hub that showcases global Internet traffic, attack, and technology trends and insights.

What Cloudflare Radar shines is its data openness. Everything you see on the Cloudflare Radar website is powered by their free publicly available APIs. It's a treasure trove there, and all users need is a free API token to access everything.

At BGPKIT, we think we can further improve the usability of the API by exposing them as a proper Rust SDK: radar-rs. This is our (unofficial) effort on bringing the Cloudflare Radar's rich data to Rust developers. For example, monocle radar is powered by this SDK.

2022 Year in Review

Mingwei Zhang — Tue, 31 Jan 2023 17:14:22 GMT

In 2022, BGPKIT as an open-source organization made significant progresses. As the founder, I am grateful for all the opportunities would like to take this time to appreciate all the milestones we achieved. In this post, I will go through some notable changes we made in 2022, and take a look at what we are excited about for the year of 2023.

BGPKIT Parser

There were a number of major features added to BGPKIT Parser in 2022:

v0.7.0 added support for filtering messages by many fields, and allowing reading from uncompressed files.
v0.7.1 added better examples of parallel MRT files processing with rayon.
v0.7.2 added filtering by multiple peer_ips.
v0.8.0 includes many internal refactoring and brought in the new oneio library to improve developer experience.

BGPKIT Broker

In 2022, we have revised the BGPKIT Broker backend to support crawling for estimated file sizes in addition to other fields like timestamps and URLs. This allows us to keep track of MRT file size changes and help users to pick suitable collectors to use, especially at the age where RIB dumps from a single collector could reach over 1 GB in size.

Figure of RIB sizes over time. Blue line is for rrc00, while yellow line is for route-views2.

We also made major revisions to our Rust SDK for more features like .latest() to get the lastest MRT files from each collector.

https://github.com/bgpkit/bgpkit-broker/releases/tag/v0.5.0

Plus, if you would like to deploy a broker instance yourself, we also have made some significant efforts to improve our documentation on self-hosting guide.

https://github.com/bgpkit/bgpkit-broker-backend/blob/main/deployment/README.md

Python Bindings

In addition to core Rust code base for parser and broker, we have also added support for Python bindings for various Rust SDKs. Users can easily parse MRT files directly using pybgpkit Python library. It is also proven to be usable on cloud-based Jupyter notebooks like Google Colab (examples).

https://github.com/bgpkit/pybgpkit

Monocle

To ties things together, we have also developed our first investigative tool, monocle , to help users to quick find relevant BGP announcements with a suite of easy-to-use utilities. Users with Rust toolchain installed can run cargo install monocle to install the tool.

https://github.com/bgpkit/monocle

Users can use the following subcommands

parse: parse single MRT files, remotely or locally
search: find and filter BGP messages accross multiple public collectors
time: convert between local time string and Unix timestamps
whois: find out AS names, ASN, registration countries, and organizations.

Web/API and Infrastructure

In 2022, we started to experiment new cloud-based infrastructure, especially cloud-based databases for better API stability and developer experiences. We ended up selecting Supabase as our PostgreSQL production host and a self-hosted instance for backup. We are happy with the performance and cost provided by Supabase and more exicted about the potential capability it brings such as user authentication, cloud storage, local dev schema changes, etc.

Based on the new infrastructure, we have started to test a new integrated API system (still in alpha). This allows us to put all of our data access and processing end-points into one location. Based on the new API, we have also developed a newer version of the BGPKIT Broker statistics page: https://alpha.stats.bgpkit.com/

New Datasets

Apart from provide SDKs and data APIs, we have also started providing free access to historical archives of some data that we find interesting. We blogged about one dataset previous, the peer-stats dataset:

https://blog.bgpkit.com/peer-stats-dataset/

Here is a list of our currently available datasets at https://data.bgpkit.com/

[peer-stats](https://data.bgpkit.com/peer-stats/): route collector peers statistics (IP, ASN, v4/v6 prefixes counts)
[as2rel](https://data.bgpkit.com/as2rel/): AS-level relationship, using all available collectors
[pfx2as](https://data.bgpkit.com/pfx2as/): prefix-to-AS mapping, using all available collectors
[ihr-hegemony](https://data.bgpkit.com/ihr/hegemony/ipv4/global/): mirror of IIJ-IHR's global hegemony score dataset (big shout out to Romain and Internet Health Report for producing this data)

All the above datasets are free to use for research or commercial usages, and here is the acceptable usage agreement.

More Public Repositories

There are more open code repositories set available, some experimental and some for data analysis. You can check out the full list here:

https://github.com/orgs/bgpkit/repositories

Founder's Notes

In later 2022, I have joined Cloudflare to continue working on routing security for public benefits. During the first few months, we have built and shipped our new route-leak detection system under the public Radar platform. BGPKIT suite is now used in production for the BGP data anaysis pipeline at Cloudflare. While working full-time now, I am still committed to maintain the software suite and bringing new features to BGPKIT. For example, this year, I ported back our Kafka support used in Cloudflare to BGPKIT Broker backend. Folks at Cloudflare are doing great things in the open-source realm, and BGPKIT software suite will continue to become more useful to BGP enthuesastics and remain completely open-source.

Quote from Cloudflare's route-leak detection system blog.

Looking at 2023, here are a few things that I am really excited to work on for BGPKIT suite:

continue improve the infrastructure of the system and adding new data processing pipelines
continue improve the parser's performance and reliability
adding new RFC supports to parser (e.g. RFC9234 for route-leak prevention)
productionizing the new API and stats website
write more examples and documentations (don't we all love that)

And yeah, we will continue to be open-source first!

If you like what we do here, please consider subscribe to our blog. For all code repositories, check out our GitHub page.

https://github.com/bgpkit

Introducing Peer-Stats Dataset

Mingwei Zhang — Mon, 16 May 2022 15:20:00 GMT

Public BGP data collector projects like RouteViews and RIPE RIS provide valuable research and operational information for understanding BGP and detecting Internet routing anomalies.

There are many BGP routers involved in BGP collection projects.

A project includes many "collectors," and each serves as a collection of messages from several active BGP peers from different networks. Some bigger collectors collect BGP data from more than one hundred BGP routers. For example, RIPE RIS rrc00 has 112 active BGP peers at writing. (To learn about the complete list of BGP peers from all collectors, try theexperimental toolswe developed.)

Sometimes, too many BGP peers may become problematic.

Not all peers present the same amount of data. Some peers are so-called "full-feed" peers, which are the ones that provide the full routing tables to the collector. In a routing table dump file from the collectors, we can observe the full table of these peers. Some peers, however, only provide a limited number of routing entries to the collectors, not representing the whole routing status from these peers. In a project that tries to rebuild full routing tables, e.g., some BGP hijack detection or anomaly detectors, people prefer to use the full-feed peers as their data source.

At times, we are only interested in data from certain peers. For example, when studying the routing data from a particular network, if the network connects to BGP data collectors, we can directly pull data from the collectors' data. However, it can be troublesome to learn about what collectors have data from certain peers. RIPE RIS provides a nice API for querying such info, but we couldn't find one for RouteViews.

Historical data for such information is also missing. Unfortunately, for the researchers who want to study the evolution of the data collectors, even RIPE RIS's peers API could not help with that.

Introducing BGPKIT Peer-Stats Dataset

Peer-Stats dataset is a publicly available, free-to-use dataset that aims to provide daily collector peer information for all RouteViews and RIPE RIS collectors for ten years.

https://data.bgpkit.com/peer-stats/

The data includes the following fields for each peer of a BGP collector:

asn: Autonomous System Number of the collector peer
ip: the IP address of the collector peer
num_v4_pfxs: the number of IPv4 prefixes propagated from the collector peer
num_v6_pfxs: the number of IPv6 prefixes propagated from the collector peer
num_connected_asns: the number of connected (immediate next hop) ASes from the collector peer

The dataset is organized by the following structure.

- collector    - year        - month            - data files

Screenshots of the dataset file listing site.

Each data file is in JSON format (see the section below) and compressed with bzip2. Users can easily use tools like bzcat and jq to view the data files. For example, you can run the following command to quickly view any of the peer-stats data for the collector rrc00 on 2022-05-01.

curl "https://data.bgpkit.com/peer-stats/rrc00/2022/05/rrc00-2022-05-01-1651363200.bz2" --silent | bzcat | jq

{  "collector": "rrc00",  "peers": {    "102.67.56.1": {      "asn": 328474,      "ip": "102.67.56.1",      "num_connected_asns": 330,      "num_v4_pfxs": 919443,      "num_v6_pfxs": 0    },    "103.102.5.1": {      "asn": 131477,      "ip": "103.102.5.1",      "num_connected_asns": 184,      "num_v4_pfxs": 895482,      "num_v6_pfxs": 0    },...

Because all the data files are generated against the midnight UTC RIB dump of the day, you can also easily construct a URL to a data file for any particular date using the following template.

https://data.bgpkit.com/peer-stats/{COLLECTOR}/{YEAR}/{MONTH}/{COLLECTOR}-{YEAR}-{MONTH}-{DAY}-{MIDNIGHT_TIMESTAMP}.bz2

Open-source

We also open-sourced the data collection command-line tool source code on GitHub. Feel free to check it out and run it on your infrastructure if needed.

https://github.com/bgpkit/peer-stats

Credits and Sponsorship

The original idea for this work came from our extensive discussion with Romain Fontugne (follow him on Twitter at @romain_fontugne) from IIJ. This work is made possible by IIJ's generous sponsorship.

Please consider sponsoring us on GitHub if you find our work valuable and would like to see more open-source code and datasets on BGP.

https://github.com/sponsors/bgpkit

KhersonTelecom Outage and Connectivity Change

Mingwei Zhang — Tue, 03 May 2022 18:06:05 GMT

Internet service in Russian-occupied Kherson, Ukraine was disabled at 16:12 UTC (6:12pm local) on Saturday, 30 April. #UkraineRussiaWar
Khersontelecom service was restored ~24hrs later via Russian transit from nearby Crimea. pic.twitter.com/uN31jLrzEc
Doug Madory (@DougMadory) May 2, 2022

The AS47598 experienced an outage shortly after 2022-04-30T16:10:00 and then resumed connectivity 2022-05-01T16:15:00 (both UTC time). After the outage, the AS47598 is then connected via a different upstream provider, AS201776.

The upstream provide change can also be seen on IIJ's Internet Health Report.

Prefixes

AS47598 announces only one prefix 91.206.110.0/23 (data from Hurricane Electric). Most of the BGP announcements are for this prefix. However, after the provider change happened, there were also a IPv6 prefix announcements for this V6 prefix as well 5bce:6e00::/23. It is possible that this prefix is a V4-translated prefix propagated to a V6 collector peer, but we do not know for sure.

We can also confirm the outage of the prefix with RIPEstats Routing History data widget. (uncheck the No low visibility box to reveal the outage).

BGP Messages

We can visualize the overall BGP announcements volume with Cloudflare Radars AS-level page.

The following are the UTC timestamps for the corresponding BGP message spikes.

2022-04-30T16:10:00: announcements with old provider 12883 in the paths.
2022-05-01T16:00:00: announcements with the new provider 201776 in the paths
2022-05-03T10:45:00: similar announcements with a new provider in the paths.

During the first gap time (2022-04-30T16:15:00 to 2022-05-01T16:00:00), there were 0 BGP updates for the prefix or from the ASN.

The old provider paths look like this where AS12883 is the next hop for AS47598. See the full list of messages from rrc00 here: https://gist.github.com/digizeph/c58b77f755d7fec8a7969807fb17d5ba.

207564 56655 3257 12883 47598

The new provider paths look like this where AS12389 and AS201776 are the next hops for AS47598. See the full list of messages from rrc00 here: https://gist.github.com/digizeph/896a4a7e4de23082b496b92ab5bdab5b

207564 28824 28824 1299 12389 201776 47598

BGP Data Tooling

The analysis is done using a privately hosted open-source BGPKIT parser web API. You can host it on your infrastructure, and the source code is freely available at https://github.com/bgpkit/pybgpkit-api. Comments and feedback are welcome!

Update on 2022-05-04T09:20:00 Pacific

#Kherson Internet connectivity is returning to the occupied city in South of #Ukraine, after an outage since Saturday. @Cloudflare data shows growth in requests since 04:15 UTC, and telecom connection was confirmed by the Ukrainian Vice PM @FedorovMykhailo. pic.twitter.com/JxT3kcM234
Cloudflare Radar (@CloudflareRadar) May 4, 2022

The upstreams for AS47598 has reverted back to the original ASes, and the traffic has started coming back to normal.

Parallel MRT Files Parsing with BGPKIT

Mingwei Zhang — Wed, 16 Mar 2022 04:50:13 GMT

In this post, we will talk about how to implement a Rust workflow that can process a large number of BGP data files as fast as we can. We will use BGPKIT Parser and Broker for data collection and parsing, and Rayon crate for parallelization of the code.

Task Overview

Before we begin to talk about the code design, we first need to introduce the data we are dealing with. We want to process the BGP data collected by various collectors, saved in compressed binary MRT format, and archived to files with a fixed interval. In this post, we are using RouteViews archive data as an example. The average data file size ranges from 2MB to 10MB by different collectors (AMSIX collector for example has pretty large files).

For processing, we can use the simplest task possible to do: sum the number of MRT records in all the files. We want to download and process all the updates files for one hour from all the collectors in RouteViews project. Here is the estimated amount of data we are dealing with:

35 collectors
5-minute interval12 files per collector
420 total number of files to download and process
840MB to 4.2GB total download size (its somewhere in between)

Ok! Now that we know what we want to do and have a sense of the estimated workload of the overall task, lets coding!

Photo by Glenn Carstens-Peters on Unsplash

1. Sequential Parsing

Our first attempt to achieve the goal is to design and implement a naive sequential workflow as described below

find all BGP updates files within the hour of interest
iterate through each file, parse the MRT data and count the number of records
sum all record counts and print out the result

For this sequential workflow, we will need to pull in two dependencies into Cargo.toml:

[dependencies]bgpkit-parser = "0.7.2"bgpkit-broker = "0.3.2"

The bgpkit-broker handles looking for updates files within the hour, while bgpkit-parser handles parsing each individual file.

Finding files

BGPKIT Broker indexes all available BGP MRT data archive files from both RouteViews and RIPE RIS in close-to-real-time. For each data file, it saves the following information:

project: route-views or riperis
collector: the collector ID, e.g. rrc00 or route-views2
url: the URL to the corresponding MRT file
timestamp: the UNIX time of the start time of the MRT data file.

With all this information indexed, we can then query the backend and retrieve files information as we want. BGPKIT Broker provides both RESTful API, Rust API, as well as a Python API. Here we use the Rust API to pull in the information we need:

let broker = BgpkitBroker::new_with_params(    "https://api.broker.bgpkit.com/v1",    QueryParams {        start_ts: Some(1640995200),        end_ts: Some(1640998799),        project: Some("route-views".to_string()),        data_type: Some("update".to_string()),        ..Default::default()    });for item in &broker {    println!("processing {:?}...", &item);}

The above block queries the broker and prints out all information of the retrieved files' metadata. The BgpkitBroker::new_with_params call accepts two parameters, one for the endpoint of the broker instance, and the other specifies the filtering criteria. In this example, we search for all BGP updates files from RouteViews with timestamps between 2022-01-01T00:00:00 and 2022-01-01T00:59:59 UTC. It prints out the output as the following:

processing BrokerItem { collector_id: "route-views.telxatl", timestamp: 1640997000, data_type: "update", url: "http://archive.routeviews.org/route-views.telxatl/bgpdata/2022.01/UPDATES/updates.20220101.0030.bz2" }...processing BrokerItem { collector_id: "route-views.uaeix", timestamp: 1640997000, data_type: "update", url: "http://archive.routeviews.org/route-views.uaeix/bgpdata/2022.01/UPDATES/updates.20220101.0030.bz2" }...processing BrokerItem { collector_id: "route-views.wide", timestamp: 1640997000, data_type: "update", url: "http://archive.routeviews.org/route-views.wide/bgpdata/2022.01/UPDATES/updates.20220101.0030.bz2" }...processing BrokerItem { collector_id: "route-views2", timestamp: 1640997900, data_type: "update", url: "http://archive.routeviews.org/bgpdata/2022.01/UPDATES/updates.20220101.0045.bz2" }...

Parse each MRT file

Previously, in the for loop, we only print out the retrieved meta information of the MRT files. Now let's add the actual parsing of the files into the loop. The code is very simple, as designed:

let mut sum: usize = 0;for item in &broker {    println!("processing {}...", &item.url);    let parser = BgpkitParser::new(&item.url).unwrap();    let count = parser.into_record_iter().count();    sum += count;}

We first define a mutable variable sum outside the loop. Then for each file, we create a new Parser instance by BgpkitParser::_new_(&item.url). Here, as our goal is to count the number of records, we call the parser's .into_record_iter() function to create a iterator over the records of the file, and then .count() to get the count of the records. Lastly, we add the count to the overall sum variable.

Run and timing

For testing, I use a fairly powerful VM on a host with AMD 3950x CPU (32 threads), then build the release build and time the release run. The runtime includes downloading the MRT files to my machine with 1Gpbs down link in Southern California.

cargo build --releasetime cargo run --release --bin sequential

It ended up taking about 1 minute and 23 seconds to sequentially parse 144 MRT files from RouteViews for all available ones within the first hour of 2022 (UTC).

total number of records for 144 files is 10554212real    1m23.081suser    0m39.535ssys     0m1.006s

2. Parallel Parsing

Since the parsing of each file is completely independent of each other, we can parse the files in parallel and then sum up the count for each thread at the end. In Rust with Rayon, this conversion is very simple.

Let's first add the dependency of Rayon first:

[dependencies]bgpkit-parser = "0.7.2"bgpkit-broker = "0.3.2"rayon = "1.5.1"

Then we change the broker code one tiny bit to collect all meta information for files into a vector first.

let items = broker.into_iter().collect::<Vec>();

This would enable us to fully utilize rayon's great syntax sugar to turn our sequential code into a parallel one.

let sum: usize = items.par_iter().map(|item| {    println!("processing {}...", &item.url);    let parser = BgpkitParser::new(&item.url).unwrap();    let count = parser.into_record_iter().count();    count}).sum();

The key difference here is the calling of .par_iter(). It turns a sequential iterator into a parallel iterator, and by default going to utilize all available cores on the host machine for scheduling. Then we call .map() to define the parsing steps for each file, and then call .sum() at the end to add all results up.

The final result is approximately 10x faster than the sequential version, and it took only 8 seconds to parse all MRT files and get the record counts.

total number of records is 10554212real    0m8.086suser    0m42.569ssys     0m1.068s

The full code for this example is as follows:

The source code of the two examples is available on GitHub. Feel free to poke around and tweak it as you wish.

https://gist.github.com/digizeph/c23ba39968c6cb4e1ad323520540010f

https://github.com/bgpkit/bgpkit-tutorials/tree/main/parallel-parsing

Real-time RIS Live Data with BGPKIT Parser

Mingwei Zhang — Fri, 12 Nov 2021 20:27:00 GMT

In terms of real-time BGP data processing, RIPE NCC provides a great data source: Routing Information Service Live (RIS Live).

To begin with, here is what RIS Live by the creators:

RIS Live is a feed that offers BGP messages in real-time. It collects information from the RIS Route Collectors (RRCs) and uses a WebSocket JSON API to monitor and detect routing events around the world. A non-interactive full stream (firehose) is also available.

In essence, RIS Live provides:

a WebSocket interface to stream BGP messages in real-time
ability to subscribe to sub-streams with custom filtering messages
JSON-encoded BGP messages as the stream payload
firehose HTTPS stream interface as well, without needing to work with websocket.

In this post, we will discuss how to use the RIS Live stream in practice.

RIS Live Message Format

RIS Live has client messages and server messages.

The client messages is used to setup or dismantle subscriptions, which essentially tell the server what kind of BGP messages a client would like to receive, and allow the server to send only the interested messages to the client.

A server acknowledges the requests from the client and afterwards start streaming requested data back to the client. At a high-level, a server sends either ris_message or ris_error messages. The ris_message is the main payload that we are interested in, while the ris_error message provides debugging messages for the scenarios where stream or subscription fails.

RIS Live provides great flexibility for the clients to specify/narrowdown the interested messages, allowing both the server and client process less messages during a streaming session.

host: only messages collected from a particular RRC (e.g. rrc21)
type: only messages of a given type, e.g. UPDATE , OPEN
require : only messages containing a given key, e.g. withdrawals will return only message that contains any withdrawn prefixes
peer: messages from a particular BGP peer
path : ASN or pattern to match the AS Path attribute in BGP update messages
prefix: only messages containing information for a given prefix
moreSpecific and lessSpecific: only messages that are the subprefix or super-prefix of the specified prefix
includeRaw: whether to include the Base64-encoded RAW BGP messages

As an example, lets take a look at the following message from the official manual:

{  "host": "rrc01",  "type": "UPDATE",  "require": "announcements",  "path": "64496,64497$"}

Example subscription message composer on RIS Live official site

As an example, lets take a look at the following message from the official manual:

{  "host": "rrc01",  "type": "UPDATE",  "require": "announcements",  "path": "64496,64497$"}

collected by rrc01
BGP UPDATE messages
have at least one announced prefix
the last two hops of the AS Path is 66496 and 64497 (the origin)

The ris_message consists of common header fields and data fields (although theyre on the same level).

The common header fields are present for all types of sub-type messages, including timestamp, peer, peer_asn, id, host, type . The rest of the fields are the data fields that are dependent on the type of the messages. For most people, the UPDATE message is what they need. The following JSON block is an example message pulled directly from the demo site.

Example JSON formatted RIS message:

This example shows a BGP announcement of AS132354 originating two prefixes 103.249.208.0/23 and 103.14.184.0/24 , with the next hop to be 37.49.237.228. At this point, the information we see here is pretty similar to what we can see from other BGP MRT readers output (e.g. from bgpdump or bgpreader), just in JSON format.

WebSocket or Firehose?

Provided that RIS Live provides both WebSocket and HTTP Firehose, one would naturally wonder which one is the right choice for their application. Here we have a brief comparison between the two in the context of RIS Live.

WebSocket

Good:

easy to customize stream by composing a simple JSON subscribe message
work with various toolings in languages like Python and JavaScript

Bad:

requires extra library dependencies to work with WebSocket
need to write somewhat lengthy to get started (comparing to firehose)

Firehose

Good:

easy to consume by simply calling GET request on the URL
simple single-liner commandline program can start the stream (e.g. a simple curl call), no need complex script

Bad:

customizing stream is doable with XRIS-SUBSCRIBE HTTP request header, but feels clunky and limited
in my personal tests, the stream get disconnected often due to the stream cannot keep up with the data producer. this did not happen with websocket tests.

Summary

If your application could afford additional dependencies or writing extra scripts, WebSocket is the better choice. RIS Live official manual also makes implication that the WebSocket format is the current formally-supported streaming method.

RIS Live Coding Example with BGPKIT Parser

Now that we have a basic idea of what is RIS Live and the basic message format, we can get started working on some code that will actually use RIS Live to do something useful.

In the following example, we will build a short monitoring service that alerts us when Facebook operators announces their DNS IP prefix (see what happened before here). We are going to build the service in Rust with BGPKIT Parser , WebSocket library Tungstenite.

First, lets collect some basic information about what we are going to monitor here:

Facebooks autonomous system number is 32934. So we will watch for all messages that was originated from AS32934.
Facebooks DNS server IP prefixes involved in the previous incidence are 129.134.30.0/23 and 185.89.218.0/23 . So we want to carefully watch these two prefixes in our monitoring system.
We want to use one of the RIPE RIS collectors data for monitoring, rrc21 is a good choice since its being used by RIS Lives demonstration. You can easily extend this service by tweaking the subscription message later.

OK, we are good to go. Lets do it!

Setting up the stream

We picked the Tungstenite library as our WebSocket library of choice, partly because it has a very straightforward API design.

Lets first connect to the websocket server by calling connect function given a websocket URL. One thing to notice is that the URL protocol section here is ws as opposed to the wss mentioned in the RIS Live documentation. For some reason, Tungstenite does not work with wss protocol (with SSL).

use tungstenite::{connect, Message}; const RIS_LIVE_URL: &str = "ws://ris-live.ripe.net/v1/ws/?client=rust-bgpkit-parser";let (mut socket, _response) =    connect(Url::parse(RIS_LIVE_URL).unwrap())    .expect("Can't connect to RIS Live websocket server");

Now, with a socket ready, we will first send a subscription message to let server know that we want some messages and we are ready to receive.

let msg = json!({"type": "ris_subscribe", "data": {"host": "rrc21"}}).to_string();socket.write_message(Message::Text(msg)).unwrap();

Here we composed a simple subscription message that limits the stream to have messages only from rrc21 collector.

Parsing JSON messages

At this point, we have a WebSocket connection to RIS Live server, and have sent out a subscription message to the server. The server should be sending back messages anytime now, and we are ready to consume the stream.

We would like to code the following behavior:

continuously reading the websocket messages;
parse JSON string into internal BGP structs;
check each message if it contains origins (withdraw-only messages does not contain AS paths, and thus no origins either;
if the origin AS is AS32934, and the announced prefix is 129.134.30.0/23 or 185.89.218.0/23 , then we print out the message to output.

loop {    let msg = socket.read_message().expect("Error reading message").to_string();    if letOk(elems) = parse_ris_live_message(msg.as_str()) {        for elem in elems {            if letSome(origins) = elem.origin_asns.as_ref() {                if origins.contains(&32934) &&                    ( elem.prefix.to_string() ==  "129.134.30.0/23".to_string() ||                        elem.prefix.to_string() ==  "185.89.218.0/23".to_string() )                {                    println!("{}", elem);                }            }        }    }}

The full example code can be found here:

https://gist.github.com/digizeph/fcac3027555c0b744ea0b3a11197b694

Building More with BGPKIT Tools

As introduced in our previous blog post, we added support of real-time BMP stream to BGPKIT Parser as well. Combining with RIPE RIS Live, and RouteViews BMP stream, we can build a powerful real-time BGP monitoring service directly within BGPKIT Parser. We also offer indexing and processing of historical BGP data as well with BGPKIT Broker.

Our goal at BGPKIT is to design, develop, and deploy the most developer-friendly BGP data processing toolkit. To learn more about our offerings, please check out our website and official Twitter account.

Real-time BMP with BGPKIT Parser

Mingwei Zhang — Wed, 10 Nov 2021 20:13:00 GMT

Real-time BGP data processing is very critical on building monitoring services that can detect BGP issues quickly with minimum delay and react to anomalies quickly and mitigate potential issues.

We are creating a new series of posts describing how we design our software to work with real-time BGP data streams. As an opening, we will describe how we handle data streams with BMP protocol and OpenBMP messages.

BMP

The BGP Monitoring Protocol (BMP) is a protocol that allows monitoring of BGP devices.

The RFC7854 describes the purpose of BMP as:

Many researchers and network operators wish to have access to the contents of routers BGP Routing Information Bases (RIBs) as well as a view of protocol updates the router is receiving. This monitoring task cannot be realized by standard protocol mechanisms. Prior to the introduction of BMP, this data could only be obtained through screen scraping.
BMP provides access to the Adj-RIB-In of a peer on an ongoing basis and a periodic dump of certain statistics the monitoring station can use for further analysis. From a high level, BMP can be thought of as the result of multiplexing together the messages received on the various monitored BGP sessions.

There are multiple types of BMP messages, each serving different purposes.

Peer up and down notification: notification about the status of peering sessions to a monitored router;
Initiation message: inform the monitoring station of the routers vendor, software version, and so on;
Termination message: provides information on why a monitored router is terminating a session;
Route monitoring:initial synchronization of the routing table;
Route mirroring: verbatim duplication of messages as received.

For real-time BGP data processing, we are specifically interested in the route monitoring and route-mirroring messages, as we provide the routing information encoded as actual BGP messages.

OpenBMP

OpenBMP is a software implementation of the BMP protocol. It is an open-source project created by Cisco and currently maintained by nice folks from CAIDA/UCSD and RouteViews. It is implemented in C++, can be used with any compliant BMP sender (e.g., router).

Architecture graph of OpenBMP

OpenBMP provides multiple formats for outputting the BMP messages collected from the connected routers, one of which is the raw_bmp format, which is a thin wrapper of the raw BMP messages. The raw_bmp format provides the best performance and allows use to handle the BMP messages directly without having to write a different parser for the plaintext messages.

RouteViews currently provides a OpenBMP Kafka stream that streams BMP messages from their collectors.

BGPKIT Parser with BMP/OpenBMP Support

We develop BGPKIT Parser to provide a one-stop solution for handling all parsing tasks regarding BGP data. Supporting real-time data like BMP is a very important milestone for us.

We have recently developed the full support for BMP messages, and partial support for OpenBMP messages (for raw_bmp type only). This enables us to start working with real-time BMP streams like RouteViews Kafka stream.

Below is an example code that takes RouteViews Kafka OpenBMP stream and parse the messages into internal data structures:

let mut reader = Cursor::new(Vec::from(kafka_payload));let header = parse_openbmp_header(&mut reader).unwrap();if let Ok(msg) = parse_bmp_msg(&mut reader) {    info!("Parsing OK: {:?}", msg.common_header.msg_type);    match msg.message_body {        MessageBody::RouteMonitoring(m) => {            dbg!(m.bgp_update);        }        _ => {}    }}

Here is a break down of what it does:

it first creates a bytes reader from the raw Kafka message payload;
then parse OpenBMP message header, which contains some basic information about the BMP session;
then it calls the parse_bmp_msg function to parse the embedded raw BMP messages and print out the BGP update messages if the parsing is successful.

Here is a full code example:

https://gist.github.com/digizeph/fcac3027555c0b744ea0b3a11197b694

We have published the SDK on crates.io and GitHub. Feel free the check out the example code at examples/routeviews-kafka.rs if you are interested.

Introducing BGPKIT Parser

Mingwei Zhang — Mon, 01 Nov 2021 20:02:00 GMT

BGPKIT Parser is an open-source Rust-based MRT/BGP data parser that takes a MRT formatted binary file and turns it into BGP messages. It is one of the most important building block software that enables BGP data processing and analysis tasks.

Design and Features

As mentioned in our previous post introducing BGPKIT Broker, the most used BGP data collection projects, RouteViews and RIPE RIS, both publish their collected BGP data in MRT format on their data platform. The BGPKIT Parser is designed to handle parsing tasks for these data sources.

We design our parser to strictly follow the industry standard, e.g. RFC4271 and RFC6396. BGPKIT Parser is also designed with the following goals:

performant: comparable to C-based implementations like bgpdump or bgpreader.
actively maintained: we consistently introduce feature updates and bug fixes, and support most of the relevant BGP RFCs.
ergonomic API: a three-line for loop can already get you started.
battery-included: ready to handle remote or local, bzip2 or gz data files out of the box.
open-source: we want people to use our parser freely and we can continue develop and improve it based on community feedbacks.

To demonstrate how easy it is to get started using BGPKIT Parser, check out the example below where we print out all BGP messages from a remote MRT file on RouteViews:

Example of reading a remote MRT file from RouteViews and print out BGP messages. Code available at https://gist.github.com/digizeph/9977371653f39a459ff3ae507dc3636c

There are a number of things happens when we call for elem in BgpkitParser::new(url) :

it creates a new BgpkitParser struct instance with the provided URL to the data file;
it tries to retrieve the content of the remote file and download the raw compressed bytes into memory;
it determines the compression type by file suffix and calls corresponding decompression library to create a buffered reader;
it then creates an iterator (used by the for loop) that continuous return new parsed items until it reaches the end of the data stream.

We determined that it is worth the extra binary file size to bring in the network and compression libraries into the project so that the library users will never have to worry about handling data downloading and decompression by themselves again.

The Future

The future of the BGPKIT Parser lies on continuous performance and stability improvements, as well as some exciting features that we are currently planning. Some of the coming features include

adding capability of handling real-time data streams coming from RIPE RIS Live and RotueViewss Kafka BMP stream;
supporting data serialization back to MRT files (reverse-parsing), which allows users to produce customized MRT files after data processing;
adding WASM support to allow BGP data parsing directly on the web with JavaScript.

Because we are building our software in Rust, we can effortlessly tapping into Rusts great software ecosystem and continue introducing new features and improvements. The future of BGPKIT Parser is exciting and we cant wait to bring more features for you to try out!

For more details about the BGPKIT Parser, check out our GitHub repo and our website. Feedback is highly appreciated!

https://github.com/bgpkit/bgpkit-parser

Introducing BGPKIT Broker

Mingwei Zhang — Sun, 31 Oct 2021 20:04:00 GMT

BGPKIT Broker is a data API service that focuses on building a BGP data file index to enable searching for public/private BGP data files with custom filters. It is one of the building block components designed by BGPKIT to facilitate BGP data processing with ease.

The first step to investigate a BGP event.

Imagine this scenario: a malicious player just attempted to hijack a IP prefix using BGP announcements, and you are interested in learning what exactly happened during that half-hour down time of the victim network, how would you start investigating?

Collecting evidences. Luckily for us, the actors on the Internet, good and bad, always leave traces behind if they use BGP as their method. There are a number of reputable public BGP route collectors operating for years collecting all BGP messages received from their connected router peers and dump them into regular dump files. The most used projects are RouteViews and RIPE RIS Data.

Files are all over the places. There are more than 60 different data collectors from the two projects alone, and each publishing their data under separate sites. The two projects also have different data file structure and compression algorithms for their data dump files. It is not hard to go find one data file during the interested event time from one collector, but it will be a real hassle to gather URLs toward all data files that include information within a specified time range.

Collectors data is published as compressed MRT files at regular frequency.

BGPKIT Broker A BGP Data File Index API Service

We designed the BGPKIT Broker to resolve one and only one problem: quickly collect links to the BGP data files that matches a filtering criteria.

You can filter BGP data using multiple criteria:

start_ts : UNIX timestamp that all files must be dumped after
end_ts : UNIX timestamp that all files must be dumped before
data_type : the type of the data file, can be update or rib
collector: the collector ID that the files are generated from
project: the data collection project, can be route-views or riperis
page and page_size: the pagination control for collecting a large number of files

Here is an example REST API call: https://api.broker.bgpkit.com/v1/search?data_type=update&start_ts=1633046400&end_ts=1633132800&collector=rrc00&project=riperis&page%20=2&page_size=3

It asks for all data updates files dumped between 1633046400 and 1633132800, from RIPE RISs collector rrc00. It also requested for the second page of the results and each page contains 3 items.

DEPRECATED: BGPKIT Broker API service is freely available to use, hosted at https://api.broker.bgpkit.com/v1/. The documentation is available at https://docs.broker.bgpkit.com/

BGPKIT Broker API and SDK has been upgraded to V2 now. The V1 examples are left up for legacy services that depends on it. Please check out the current API documentation at for more:https://api.broker.bgpkit.com/v2/

BGPKIT Broker Rust API

The BGPKIT Broker API service is built entirely using Rust, and of course we also developed native Rust API to access the broker data with ease using Rust.

Example BGPKIT Broker Rust API call

The Rust API is open source under MIT license, and with the free API, you can already build your own workflow today!

https://github.com/bgpkit/bgpkit-broker

Easy On-premise Deployment

For API service like this, especially that it also collects and indexes data of over 10 years span, one might imagine the deployment could be complex and slow.

For BGPKIT Broker, we spent extra efforts to make the API deployment process as quickly as possible, and also efficient on resource consumption. For references, we bootstrapped the entirety of the database in under 5 minutes, and cost less than 1 GB storage to store the information in PostgreSQL database. The whole database and API run fluently with less than 500MB of RAM usage. This allows use to deployment extra instances when under heavy load without costing a fortune.

Users who need dedicated resource allocation for query performance can contact us for private API hosting that are not shared by others. Enterprise option is also available for on-premise deployment with customization consultation available. If you are interested in testing it out, feel free to shoot us an email at contact@bgpkit.com.

For more information, checkout our website!

https://bgpkit.com/broker

BGPKIT Journey Started

Mingwei Zhang — Tue, 26 Oct 2021 23:17:00 GMT

BGPKIT is a small-team start-up that aims to provide comprehensive tool suite to facilitate companies building on-premise BGP data monitoring services. We started our journey of building the best BGP data toolkit for developers in October, 2021. Here is a brief glance on what we are working on and what values we strive to provide.

BGP

Border Gateway Protocol (BGP) is the de facto inter-domain routing protocols being used by every major companies on the Internet. The main purpose of BGP is to allow companies exchange IP prefix reachability information. In other words, companies tell other companies what IP blocks they have, and how to reach these IP blocks. This the key functionality that enables the Internet.

Because the purpose of BGP is to exchange information and allow everyone to know how to reach certain IP blocks, the BGP messages must be propagated globally and publicly. There are a number of data sources available out there that provides BGP data archives (e.g. RouteViews and RIPE RIS), or real-time data like BGP looking glasses or live BGP data stream.

The aforementioned data sources contains a wealth of information if you know what to look for. For example, by looking at BGP information, researchers can infer companies relationships, monitor security incidences. Having handy toolkit in hand enables people to build successful businesses around BGP data.

BGPKIT

At BGPKIT, we build software tools to process BGP data and reveal insights from BGP messages. We aims to provide the best developer experience, and enable customers to build their own BGP data processing pipeline and monitoring services on-premise.

Complete Tool Suite

Our goal is to build complete tool suite for BGP data processing: data collection, parsing, analysis, programmable and visual interface, and data warehousing. Everything you need to handle BGP data.

Rust Implementation

To achieve the best performance and security, we choose to focus on building our tools using the Rust programming language.

We believe that Rusts ecosystem is now mature enough that building modern features like data streaming, async data workflow, parallel processing, web API, or even porting the entire codebase onto WASM and running it on browsers is a archivable task with reasonable efforts.

Powerful Extensibility

At BGPKIT, we design our libraries to provide powerful API and assist customers to further customize workflow to meet individual needs. We strive to provide the most ergonomic interfaces that allow library consumers to easily integrate our library into theirs.

Embrace Open-Source

We also believe that good tools empowers people, so we open-sourced our building block libraries to the public with very permissive license so that any interested parties can explore and build their own ideas free of charge and limitation.

We are excited to start this journey of building, and we hope to have the chance to work with more people and building more dreams together! Follow us here, on Twitter, or visit our website to learn more!