Using Parquet files
Parquet packages are optimized for analytics: they keep typed columns and support efficient scans.
Origin and purpose
Parquet is a columnar storage format designed for analytical workloads. It stores values by column, enabling tools to read only the fields required by a query.
Why GeoIP Locations publishes Parquet
Parquet is well suited to large-scale analysis, such as grouping by country, ASN, RIR, network type, or snapshot month. It preserves column types and, because of columnar encoding and compression, usually stores data more compactly than CSV.
Recommended tools
- DuckDB for local analytics.
- Python with pandas and pyarrow.
- Spark and other distributed processing engines.
- DBeaver-based workflows, where a Parquet reader is available.
Example use cases
- Top ASNs by IPv4 footprint.
- Country-level coverage analytics.
- Monthly release comparison and trend analysis.
- Joining GeoIP rows to internal network inventory.
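The first use case, top ASNs by IPv4 footprint, can be sketched with the Python standard library alone. This assumes each row has already been loaded from the Parquet file as an `(asn, cidr)` pair. The function name, the row shape, and the sample values are hypothetical. The sketch also assumes the networks do not overlap; overlapping ranges would be counted twice.

```python
import ipaddress
from collections import Counter


def top_asns_by_ipv4_footprint(rows, limit=10):
    """Return the `limit` ASNs with the most IPv4 addresses.

    `rows` is an iterable of (asn, cidr) pairs. IPv6 networks are skipped.
    Assumes the networks do not overlap.
    """
    footprint = Counter()
    for asn, cidr in rows:
        network = ipaddress.ip_network(cidr, strict=False)
        if network.version == 4:
            footprint[asn] += network.num_addresses
    return footprint.most_common(limit)


# Hypothetical sample rows using documentation address ranges.
sample_rows = [
    (64500, "192.0.2.0/24"),
    (64501, "198.51.100.0/25"),
    (64500, "203.0.113.0/24"),
    (64502, "2001:db8::/32"),
]

top = top_asns_by_ipv4_footprint(sample_rows, limit=2)
print(top)  # [(64500, 512), (64501, 128)]
```

For full-size releases, the same aggregation is likely better expressed as a grouped query in DuckDB or Spark, so the engine only scans the columns it needs.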