Using Parquet files

Parquet packages are built for analytics: they preserve column types and support efficient scans.

Origin and purpose

Parquet is a columnar storage format designed for analytical workloads. It stores values by column, enabling tools to read only the fields required by a query.

Why GeoIP Locations publishes Parquet

Parquet suits large-scale analysis such as grouping by country, ASN, RIR, network type, or snapshot month. It preserves column types, and because values are encoded and compressed column by column it usually stores data more compactly than CSV.

Recommended tools

  • DuckDB for local analytics.
  • Python with pandas and pyarrow.
  • Spark and other distributed processing engines.
  • DBeaver-compatible workflows where Parquet readers are available.

Example use cases

  • Top ASNs by IPv4 footprint.
  • Country-level coverage analytics.
  • Monthly release comparison and trend analysis.
  • Joining GeoIP rows to internal network inventory.