Install dependencies using uv (recommended):

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create and activate a virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -e .

# For development dependencies (including testing and linting tools)
uv pip install -e ".[dev]"
```

To generate or update the lockfile:

```bash
uv pip compile pyproject.toml -o uv.lock
```

To install dependencies from the lockfile:

```bash
uv pip install --requirements uv.lock
```

The default settings should work for local development, but if you need to tweak the environment variables, copy the .env.example file to .env and make your changes:

```bash
cp .env.example .env
```

Run the Docker containers:
The application uses several services:
- **ParadeDB** (PostgreSQL-compatible database)
  - Port: 2345
  - Default credentials: postgres/postgres
  - Database: btaa_geospatial_api
- **Elasticsearch** (Search engine)
  - Port: 9200
  - Single-node configuration
  - Security disabled for development
  - 2GB memory allocation
  - Index: btaa_geospatial_api
- **Redis** (Caching and message broker)
  - Port: 6379
  - Persistence enabled
  - Used for API caching and Celery tasks
- **DuckDB** (Embedded analytical database)
  - Runs in-process with the Python application
  - No separate service or port required
  - Database file: `data/duckdb/btaa_geospatial_api.duckdb`
  - Used for analytical queries and data processing
  - Access via the Python `duckdb` package
- **Celery Worker** (Background task processor)
  - Processes asynchronous tasks
  - Connected to Redis and ParadeDB
  - Logs available in the ./logs directory
- **Flower** (Celery monitoring)
  - Port: 5555
  - Web interface for monitoring Celery tasks
  - Access at http://localhost:5555
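As a rough orientation, a docker-compose definition for these services might look like the sketch below. The image tags, service names, and environment keys here are assumptions for illustration; consult the repository's actual docker-compose.yml for the real values.

```yaml
# Hypothetical sketch -- not the project's actual docker-compose.yml
services:
  paradedb:
    image: paradedb/paradedb:latest
    ports:
      - "2345:5432"              # host port 2345 -> container's PostgreSQL port
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: btaa_geospatial_api
  elasticsearch:
    image: elasticsearch:8.13.0
    ports:
      - "9200:9200"
    environment:
      discovery.type: single-node
      xpack.security.enabled: "false"   # security disabled for development
      ES_JAVA_OPTS: "-Xms2g -Xmx2g"     # 2GB memory allocation
  redis:
    image: redis:7
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes  # persistence enabled
```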
Start all services:

```bash
docker compose up -d
```

Import a flat file of GeoBlacklight OpenGeoMetadata Aardvark test fixture data:

```bash
cd data
psql -h localhost -p 2345 -U postgres -d btaa_geospatial_api -f btaa_geospatial_api.txt
```

Run the API server:

```bash
uvicorn main:app --reload
```

This script will create all the database tables needed for the application:

```bash
.venv/bin/python run_migrations.py
```

This script will populate the item_relationships triplestore:

```bash
.venv/bin/python scripts/populate_relationships.py
```

This script will create and populate the application ES index:

```bash
.venv/bin/python run_index.py
```

This script will download and import all the gazetteer data:

```bash
.venv/bin/python run_gazetteers.py
```

The application is also available as a Docker image on Docker Hub. You can pull and run the image using the following commands:

```bash
docker pull ewlarson/btaa-geospatial-api:latest
docker run -d -p 8000:8000 ewlarson/btaa-geospatial-api:latest
```

This will start the API server on port 8000.
Returns the API documentation.
The API supports aggressive Redis-based caching to improve performance. Caching can be controlled through environment variables:
```bash
# Enable/disable caching
ENDPOINT_CACHE=true

# Redis connection settings
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=optional_password
REDIS_DB=0

# Cache TTL settings (in seconds)
DOCUMENT_CACHE_TTL=86400  # 24 hours
SEARCH_CACHE_TTL=3600     # 1 hour
SUGGEST_CACHE_TTL=7200    # 2 hours
LIST_CACHE_TTL=43200      # 12 hours
CACHE_TTL=43200           # Default TTL (12 hours)
```
When caching is enabled:
- API responses are cached in Redis based on the endpoint and its parameters
- Search results are cached for faster repeated queries
- Resource details are cached to reduce database load
- Suggestions are cached to improve autocomplete performance
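As an illustration, endpoint-plus-parameters cache keys of the kind described above are typically built by hashing the endpoint path together with its sorted query parameters, with the TTL chosen by cache type. The sketch below is hypothetical: the function names and key format are not taken from this codebase, only the TTL values mirror the environment variables above.

```python
import hashlib

# TTLs in seconds, mirroring the environment variables above
TTLS = {"document": 86400, "search": 3600, "suggest": 7200, "list": 43200}
DEFAULT_TTL = 43200

def cache_key(endpoint: str, params: dict) -> str:
    """Build a deterministic cache key from an endpoint and its parameters."""
    # Sort params so ?q=x&page=1 and ?page=1&q=x hit the same cache entry
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return f"api-cache:{endpoint}:{digest}"

def ttl_for(cache_type: str) -> int:
    """Pick the TTL for a given cache type, falling back to the default."""
    return TTLS.get(cache_type, DEFAULT_TTL)

key = cache_key("/api/v1/search", {"q": "minnesota", "page": 1})
# The same parameters in any order yield the same key
assert key == cache_key("/api/v1/search", {"page": 1, "q": "minnesota"})
```

With a real Redis client, such a key would be passed to `setex(key, ttl_for("search"), payload)` so cached entries expire automatically.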
The cache is automatically invalidated when:
- Resources are created, updated, or deleted
- The Elasticsearch index is rebuilt
You can manually clear the cache using:
```
GET /api/v1/cache/clear?cache_type=search|resource|suggest|all
```
The API uses OpenAI's ChatGPT API to generate summaries of, and identify geographic named entities in, historical maps and geographic datasets. To use this feature:

- Set your OpenAI API key in the .env file:

  ```bash
  OPENAI_API_KEY=your_openai_api_key_here
  OPENAI_MODEL=gpt-3.5-turbo
  ```
- The summarization service will automatically use this API key to generate summaries.
- The geo_entities service will use the same API to identify and extract geographic named entities from the content.
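A minimal sketch of how such a summarization call might be wired up is shown below. The prompt wording and function name are hypothetical, not taken from this codebase; only the commented-out client usage follows the openai library's documented chat-completions API.

```python
import os

def build_summary_messages(title: str, description: str) -> list[dict]:
    """Assemble a chat prompt asking the model to summarize a resource.

    The prompt text here is illustrative, not the prompt this project uses.
    """
    return [
        {"role": "system",
         "content": "You summarize historical maps and geographic datasets."},
        {"role": "user",
         "content": f"Summarize this resource.\nTitle: {title}\nDescription: {description}"},
    ]

# The actual API call (requires OPENAI_API_KEY in the environment):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model=os.environ.get("OPENAI_MODEL", "gpt-3.5-turbo"),
#     messages=build_summary_messages("Map of Duluth, 1902", "A plat map of the city."),
# )
# summary = response.choices[0].message.content

messages = build_summary_messages("Map of Duluth, 1902", "A plat map of the city.")
assert messages[0]["role"] == "system"
```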
The API can process various types of assets to enhance summaries:
- IIIF Images: Extracts metadata and visual content from IIIF image services
- IIIF Manifests: Processes IIIF manifests to extract metadata, labels, and descriptions
- Cloud Optimized GeoTIFFs (COG): Extracts geospatial metadata from COG files
- PMTiles: Processes PMTiles assets to extract tile information
- Downloadable Files: Processes various file types (Shapefiles, Geodatabases, etc.)
To generate a summary for a resource:
```
POST /api/v1/resources/{id}/summarize
```
This will trigger an asynchronous task to generate a summary. You can retrieve the summary using:
```
GET /api/v1/resources/{id}/summaries
```
The API can identify and extract geographic named entities from resources. This includes:
- Place names
- Geographic coordinates
- Administrative boundaries
- Natural features
- Historical place names
To extract geographic entities:
```
POST /api/v1/resources/{id}/extract_entities
```
The response will include:
- Extracted entities with confidence scores
- Geographic coordinates when available
- Links to gazetteer entries
- Historical context when relevant
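For orientation, a response carrying the fields above might look roughly like this. Every field name and value here is hypothetical, sketched from the bullet list; consult the live OpenAPI docs for the real schema.

```python
# Hypothetical example response -- field names are illustrative only
example_response = {
    "entities": [
        {
            "name": "Lake Superior",
            "type": "natural_feature",
            "confidence": 0.97,                           # confidence score
            "coordinates": {"lat": 47.7, "lon": -87.5},   # when available
            "gazetteer_links": [
                {"source": "geonames", "id": "placeholder-id"},
            ],
            "historical_context": None,                   # when relevant
        }
    ]
}

entity = example_response["entities"][0]
assert 0.0 <= entity["confidence"] <= 1.0
```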
The following AI features are planned:
- Metadata summaries
- Imagery summaries
- Tabular data summaries
- OCR text extraction
- Subject enhancements
Data from BTAA GIN. @TODO add license.
Data from GeoNames. License: CC BY 4.0
Data from FAST (Faceted Application of Subject Terminology), which is made available by OCLC Online Computer Library Center, Inc. under the ODC Attribution License.
Data from Who's On First. License
- Docker Image - Published on Docker Hub
- Search - basic search across all text fields
- Search - autocomplete
- Search - spelling suggestions
- Search - more complex search with filters
- Search - pagination
- Search - sorting
- Search - basic faceting
- Performance - Redis caching
- Search - facet include/exclude
- Search - facet alpha and numerical pagination, and search within facets
- Search - advanced/fielded search
- Search - spatial search
- Search Results - thumbnail images (needs improvements)
- Search Results - bookmarked resources
- Item View - citations
- Item View - downloads
- Item View - relations (triplestore)
- Item View - exports (Shapefile, CSV, GeoJSON)
- Item View - export conversions (Shapefile to: GeoJSON, CSV, TSV, etc)
- Item View - code previews (Py, R, Leaflet)
- Item View - embeds
- Item View - allmaps integration (via embeds)
- Item View - data dictionaries
- Item View - web services
- Item View - metadata
- Item View - related resources (vector metadata search)
- Item View - similar images (vector imagery search)
- Collection View
- Place View
- Gazetteer - BTAA Spatial
- Gazetteer - Geonames
- Gazetteer - OCLC Fast (Geographic)
- Gazetteer - Who's on First
- Gazetteer - USGS Geographic Names Information System (GNIS), needed?
- GeoJSONs
- AI - Metadata summaries
- AI - Geographic entity extraction
- AI - Subject enhancements
- AI - Imagery - Summary
- AI - Imagery - OCR'd text
- AI - Tabular data summaries
- API - Analytics (PostHog?)
- API - Authentication/Authorization for "Admin" endpoints
- API - Throttling
- Hierarchical Faceting > Spatial, ex: https://geo.btaa.org/catalog/p16022coll230:1750
