NPS Hikes

A Python project for collecting, validating, and analyzing hiking trail data from US National Parks. The project combines data from the National Park Service API, OpenStreetMap, and the USGS to build a PostGIS database of park boundaries and hiking trails, queryable through a REST API and an interactive Streamlit web app with natural language search.

Live demos

  • Web app --- Interactive map-based explorer with park selection, trail filters, data table, and CSV/GeoJSON export. Built with Streamlit and Folium.
  • API Swagger UI --- Browse the API docs and query park and trail data directly.

Note: Both demos run on free tiers and may take 30-60 seconds to respond on the first request while the servers wake up. Visualization endpoints (maps, elevation charts, 3D trails) and the natural language query endpoint (/query) are only available with a local deployment.

Project overview

  • Collect park metadata and boundaries from the NPS API.
  • Extract hiking trails from OpenStreetMap and The National Map (USGS).
  • Match personal hiking locations stored in Google My Maps to trail geometries.
  • Explore parks and trails through a FastAPI REST API.
  • Browse an interactive map, filter trails, and export data via a Streamlit web app.
  • Query the API in natural language via a local LLM.
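As a concrete example of the API-driven workflow above, the REST API can be queried from any HTTP client. Here is a minimal sketch in Python, assuming a local deployment on the default FastAPI port; the endpoint path and parameter name (`/parks/{code}/trails`, `min_length_km`) are illustrative assumptions, not the project's actual routes (check the Swagger UI for those):

```python
def build_trail_query(base_url, park_code, min_length_km=None):
    """Build the URL and query params for a trail listing request.

    The endpoint path and parameter names are assumptions for
    illustration; consult the Swagger UI for the real API surface.
    """
    params = {}
    if min_length_km is not None:
        params["min_length_km"] = min_length_km
    return f"{base_url}/parks/{park_code}/trails", params

# Usage against a local deployment (requires the `requests` package):
# url, params = build_trail_query("http://localhost:8000", "yose", min_length_km=5)
# trails = requests.get(url, params=params, timeout=30).json()
```

Keeping the URL construction separate from the network call makes the client logic easy to test without a running server.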

Project structure

nps-hikes/
├── api/                       # FastAPI REST API
│   ├── main.py                # API endpoints and application
│   ├── models.py              # Pydantic response models
│   ├── queries.py             # Database query functions
│   ├── database.py            # Database connection management
│   └── nlq/                   # Natural language query module (Ollama LLM)
├── scripts/                   # Data collection and processing scripts
│   ├── collectors/            # Data collection from external sources
│   ├── processors/            # Data processing and analysis
│   ├── database/              # Database management utilities
│   └── orchestrator.py        # Complete pipeline orchestration
├── streamlit_app/             # Interactive Streamlit web app (API client)
├── config/                    # Configuration and settings
├── profiling/                 # Data quality analysis modules
├── tests/                     # Test suite
├── docs/                      # Documentation (this site)
└── utils/                     # Logging and utility functions

Data collection pipeline

The pipeline runs six steps in the following order:

Step                        What it does                                            Data source
1. NPS data collection      Park metadata, coordinates, and boundary polygons       NPS API
2. OSM trails collection    Hiking trails within park boundaries                    OpenStreetMap
3. TNM trails collection    Official trail data within park boundaries              The National Map (USGS)
4. GMaps import             Hiking locations from Google My Maps KML files          KML files in raw_data/gmaps/
5. Trail matching           Matches GMaps locations to TNM or OSM trail geometries  Internal
6. Elevation collection     Elevation profiles for matched trails                   USGS Elevation Point Query Service (EPQS)
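Step 6 queries the USGS EPQS point by point over HTTP. The sketch below shows what a single lookup can look like; the v1 JSON endpoint and the "value" response field are assumptions based on the public service, not code taken from this project:

```python
import json
import urllib.parse
import urllib.request

# USGS Elevation Point Query Service; endpoint and response field
# name are assumptions based on the public v1 JSON API.
EPQS_URL = "https://epqs.nationalmap.gov/v1/json"

def epqs_url(lon, lat, units="Meters"):
    """Build the request URL for one elevation lookup (x = lon, y = lat)."""
    query = urllib.parse.urlencode({"x": lon, "y": lat, "units": units})
    return f"{EPQS_URL}?{query}"

def elevation_m(lon, lat):
    """Fetch a single elevation value in meters (makes a network call)."""
    with urllib.request.urlopen(epqs_url(lon, lat), timeout=30) as resp:
        return float(json.load(resp)["value"])
```

Because EPQS serves one point per request, an elevation profile for a trail means one call per sampled vertex, which is why this is the slowest pipeline step to re-run from scratch.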

The pipeline is resumable: each collector skips parks or trails that already have data in the database. If something interrupts a run, re-running picks up where it left off.