Introducing Peer-Stats Dataset

Photo by fabio on Unsplash

Public BGP data collector projects like RouteViews and RIPE RIS provide valuable research and operational information for understanding BGP and detecting Internet routing anomalies.

There are many BGP routers involved in BGP collection projects.

A project includes many "collectors," and each one collects messages from several active BGP peers in different networks. Some of the larger collectors receive BGP data from more than one hundred BGP routers. For example, RIPE RIS rrc00 has 112 active BGP peers at the time of writing. (To see the complete list of BGP peers from all collectors, try the experimental tools we developed.)

Sometimes, too many BGP peers may become problematic.

Not all peers present the same amount of data. Some peers are so-called "full-feed" peers: they provide their full routing tables to the collector, and a routing table dump file from the collector contains the full table of each such peer. Other peers provide only a limited number of routing entries, which do not represent their whole routing state. Projects that need to rebuild full routing tables, e.g., BGP hijack or anomaly detectors, prefer to use full-feed peers as their data source.

At times, we are only interested in data from certain peers. For example, when studying the routing data of a particular network that connects to BGP data collectors, we can pull its data directly from those collectors. However, it can be troublesome to find out which collectors have data from a given peer. RIPE RIS provides a nice API for querying such info, but we couldn't find one for RouteViews.

Historical data of this kind is also missing. For researchers who want to study how the data collectors have evolved over time, even RIPE RIS's peers API cannot help.

Introducing BGPKIT Peer-Stats Dataset

The Peer-Stats dataset is a publicly available, free-to-use dataset that provides daily collector peer information for all RouteViews and RIPE RIS collectors, covering ten years of data.

https://data.bgpkit.com/peer-stats/

The data includes the following fields for each peer of a BGP collector:

  1. asn: Autonomous System Number of the collector peer

  2. ip: the IP address of the collector peer

  3. num_v4_pfxs: the number of IPv4 prefixes propagated from the collector peer

  4. num_v6_pfxs: the number of IPv6 prefixes propagated from the collector peer

  5. num_connected_asns: the number of connected (immediate next hop) ASes from the collector peer

The dataset is organized in the following directory structure.

- collector
    - year
        - month
            - data files

Screenshots of the dataset file listing site.

Each data file is in JSON format (see the section below) and compressed with bzip2. Users can easily use tools like bzcat and jq to view the data files. For example, you can run the following command to quickly view any of the peer-stats data for the collector rrc00 on 2022-05-01.

curl "https://data.bgpkit.com/peer-stats/rrc00/2022/05/rrc00-2022-05-01-1651363200.bz2" --silent | bzcat | jq
{
  "collector": "rrc00",
  "peers": {
    "102.67.56.1": {
      "asn": 328474,
      "ip": "102.67.56.1",
      "num_connected_asns": 330,
      "num_v4_pfxs": 919443,
      "num_v6_pfxs": 0
    },
    "103.102.5.1": {
      "asn": 131477,
      "ip": "103.102.5.1",
      "num_connected_asns": 184,
      "num_v4_pfxs": 895482,
      "num_v6_pfxs": 0
    },
...
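
The full-feed distinction described earlier can be applied to a data file with a few lines of Python. The following is a minimal sketch, not part of the peer-stats tool: the function names and the two prefix-count thresholds (700,000 IPv4 and 100,000 IPv6 prefixes) are assumptions chosen for illustration, and the third peer in the sample (192.0.2.1, AS64500) is a hypothetical partial-feed peer built from documentation-reserved values. The other two peers are copied from the output above.

```python
import bz2
import json


def load_peer_stats(raw: bytes) -> dict:
    """Decompress a bzip2-compressed peer-stats file and parse its JSON."""
    return json.loads(bz2.decompress(raw))


def full_feed_peers(stats: dict, min_v4: int = 700_000, min_v6: int = 100_000) -> list:
    """Return peers that reach either prefix threshold, largest IPv4 table first.

    The default thresholds are illustrative assumptions, not values defined
    by the dataset; tune them for the date range you are studying.
    """
    selected = [
        peer
        for peer in stats["peers"].values()
        if peer["num_v4_pfxs"] >= min_v4 or peer["num_v6_pfxs"] >= min_v6
    ]
    return sorted(selected, key=lambda peer: peer["num_v4_pfxs"], reverse=True)


# Sample document in the same layout as the rrc00 output above.
sample = {
    "collector": "rrc00",
    "peers": {
        "102.67.56.1": {"asn": 328474, "ip": "102.67.56.1",
                        "num_connected_asns": 330,
                        "num_v4_pfxs": 919443, "num_v6_pfxs": 0},
        "103.102.5.1": {"asn": 131477, "ip": "103.102.5.1",
                        "num_connected_asns": 184,
                        "num_v4_pfxs": 895482, "num_v6_pfxs": 0},
        # Hypothetical partial-feed peer, added only to exercise the filter.
        "192.0.2.1": {"asn": 64500, "ip": "192.0.2.1",
                      "num_connected_asns": 1,
                      "num_v4_pfxs": 120, "num_v6_pfxs": 0},
    },
}

# Round-trip through bzip2 to mimic reading a downloaded .bz2 file.
raw = bz2.compress(json.dumps(sample).encode("utf-8"))

for peer in full_feed_peers(load_peer_stats(raw)):
    print(peer["asn"], peer["ip"], peer["num_v4_pfxs"])
```

To run this against a real file, pass the downloaded bytes of a `.bz2` data file to `load_peer_stats` instead of the in-memory sample.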

Because all the data files are generated against the midnight UTC RIB dump of the day, you can also easily construct a URL to a data file for any particular date using the following template.

https://data.bgpkit.com/peer-stats/{COLLECTOR}/{YEAR}/{MONTH}/{COLLECTOR}-{YEAR}-{MONTH}-{DAY}-{MIDNIGHT_TIMESTAMP}.bz2
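
The template can be filled in programmatically. The sketch below assumes that `{MIDNIGHT_TIMESTAMP}` is the Unix timestamp of 00:00:00 UTC on the given day, which matches the rrc00 example URL shown earlier (1651363200 for 2022-05-01), and that the month and day are zero-padded as in that example. The function name `peer_stats_url` is hypothetical.

```python
from datetime import date, datetime, timezone

BASE_URL = "https://data.bgpkit.com/peer-stats"


def peer_stats_url(collector: str, day: date) -> str:
    """Build the data file URL for one collector and one calendar day."""
    # Assumption: the trailing number is the Unix timestamp of midnight UTC.
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    timestamp = int(midnight.timestamp())
    return (
        f"{BASE_URL}/{collector}/{day.year:04d}/{day.month:02d}/"
        f"{collector}-{day.isoformat()}-{timestamp}.bz2"
    )


print(peer_stats_url("rrc00", date(2022, 5, 1)))
# prints https://data.bgpkit.com/peer-stats/rrc00/2022/05/rrc00-2022-05-01-1651363200.bz2
```

Building the timestamp from an explicitly UTC-aware `datetime` avoids depending on the local timezone of the machine running the script.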

Open-source

We have also open-sourced the command-line tool that collects this data on GitHub. Feel free to check it out and run it on your own infrastructure if needed.

https://github.com/bgpkit/peer-stats


Credits and Sponsorship

The original idea for this work came from our extensive discussion with Romain Fontugne (follow him on Twitter at @romain_fontugne) from IIJ. This work is made possible by IIJ's generous sponsorship.

Please consider sponsoring us on GitHub if you find our work valuable and would like to see more open-source code and datasets on BGP.

https://github.com/sponsors/bgpkit