Krawler-based jobs to scrape various data related to administrative entities.
This job relies on:
- osmium to extract administrative boundaries at different levels from OSM pbf files,
- ogr2ogr to generate sequential GeoJson files to handle large datasets,
- mapshaper to simplify complex geometries,
- tippecanoe to generate MBTiles,
- turfjs to compute the position of toponyms.
Important
The osmium, ogr2ogr, mapshaper and tippecanoe command-line tools must be installed on your system.
To set up the regions to process, you must export the REGIONS environment variable with a semicolon-separated list of GeoFabrik regions. For instance:
export REGIONS="europe/france;europe/albania"
If you'd like to simplify geometries, you can set the simplification tolerance and algorithm:
export SIMPLIFICATION_TOLERANCE=500 # defaults to 128
export SIMPLIFICATION_ALGORITHM=visvalingam # defaults to 'db'
Note
The given simplification tolerance will be scaled according to administrative level using this formula:
tolerance at level N = tolerance / 2^(N-2)
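As a worked example of the formula above, the following sketch computes the scaled tolerance for a few administrative levels. The function name `scaledTolerance` is hypothetical and is not taken from the job's code; it only applies the stated formula.

```javascript
// Hypothetical helper (not part of the job): applies the documented formula
// tolerance at level N = tolerance / 2^(N-2)
function scaledTolerance (tolerance, level) {
  return tolerance / Math.pow(2, level - 2)
}

// With SIMPLIFICATION_TOLERANCE=500:
for (const level of [2, 4, 8]) {
  console.log(`level ${level}: ${scaledTolerance(500, level)}`)
}
// level 2: 500
// level 4: 125
// level 8: 7.8125
```

The tolerance is therefore halved at each deeper level, so finer boundaries are simplified less aggressively.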
If you only need some languages for the i18n properties (name, alt_name and official_name) in the database, you can select the target languages like this:
export LANGUAGES="en;fr"
For testing purposes you can also limit the processed administrative levels using the MIN_LEVEL/MAX_LEVEL environment variables.
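To illustrate the effect of the language selection, here is a minimal sketch of how such a filter could work. It assumes the usual OSM convention where localized variants are stored as `<key>:<lang>` (e.g. `name:fr`). The functions `parseLanguages` and `filterI18nProperties` are hypothetical names introduced for illustration and may differ from what the job actually does.

```javascript
// Keys listed in the text above as carrying i18n variants
const I18N_KEYS = ['name', 'alt_name', 'official_name']

// Parse a semicolon-separated list such as "en;fr" (same syntax as LANGUAGES)
function parseLanguages (value) {
  return (value || '').split(';').map(s => s.trim()).filter(s => s.length > 0)
}

// Hypothetical filter: keep every plain property, and keep a localized variant
// like "name:fr" only when its language is in the target list
function filterI18nProperties (properties, languages) {
  const result = {}
  for (const [key, value] of Object.entries(properties)) {
    const separator = key.indexOf(':')
    const base = separator < 0 ? key : key.slice(0, separator)
    const lang = separator < 0 ? null : key.slice(separator + 1)
    const isLocalized = lang !== null && I18N_KEYS.includes(base)
    if (!isLocalized || languages.includes(lang)) result[key] = value
  }
  return result
}

const languages = parseLanguages('en;fr')
console.log(filterI18nProperties({
  name: 'France', 'name:en': 'France', 'name:de': 'Frankreich', admin_level: '2'
}, languages))
```

In this example the `name`, `name:en` and `admin_level` properties are kept, while `name:de` is dropped because German is not among the target languages.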
To generate the whole planet, first launch the osm-boundaries job from level 3 to 8 using continent extracts like this:
export REGIONS="africa;asia;australia-oceania;central-america;europe;north-america;south-america"
As large files are generated, e.g. for Europe, you might have to increase the default Node.js memory limit:
export NODE_OPTIONS=--max-old-space-size=8192
Then, launch the osm-planet-boundaries job for level 2, which uses a planet extract, and planet MBTiles generation. Indeed, the country level (i.e. administrative level 2) requires a whole planet file to avoid missing relations between continental and island areas.
Last but not least, launch the generate-osm-boundaries-mbtiles.sh script to generate an MBTiles file from the GeoJson files produced by the job, or the generate-osm-boundaries-gpkg.sh script to generate a GPKG file.
Important
GPKG generation requires the ogrmerge tool to be installed. If you are using an unstable Debian version you can install it as follows (this is not recommended on a stable version):
sudo nano /etc/apt/sources.list
# Edit file and add this line
deb http://deb.debian.org/debian/ unstable main contrib non-free
# Then install GDAL dev version to get ogrmerge
sudo apt update
sudo apt-get install libgdal-dev
To avoid generating data multiple times you can easily dump/restore it from/to MongoDB databases:
mongodump --host=localhost --port=27017 --username=user --password=password --db=atlas --collection=osm-boundaries --gzip --out dump
mongorestore --db=atlas --gzip --host=mongodb.example.net --port=27018 --username=user --password=password --collection=osm-boundaries dump/atlas/osm-boundaries.bson.gz
If you restore a dumped collection, first recreate the required indices with the Mongo shell to speed up data access, like this:
db['admin-express'].createIndexes([{ geometry: '2dsphere' }, { 'properties.layer': 1 }, { geometry: '2dsphere', 'properties.layer': 1 }])
db['osm-boundaries'].createIndexes([{ geometry: '2dsphere' }, { 'properties.admin_level': 1 }, { geometry: '2dsphere', 'properties.admin_level': 1 }])
This job relies on archive shape files from IGN and the mapshaper and 7z tools.
https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express
The job has been updated to include French Polynesia data in the admin-express dataset.
The script requires the following tools:
- mapshaper (can be installed with npm install -g mapshaper)
- wget, 7z, tippecanoe and tile-join. All of those can probably be found as packages in your favorite distribution (apt install 7z wget tippecanoe).
French Polynesia data is fetched from here and here for the INSEE codes.
The script reprojects and patches the French Polynesia data to match the Admin Express schema, adding properties like INSEE_COM, INSEE_DEP, NOM, NOM_M and POPULATION to it. It then merges the result with the Admin Express dataset to build the final MBTiles.
This job relies on archive shape files from IGN and the mapshaper and 7z tools.
https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#bdpr
This job relies on:
- 7z to extract archive shape files from eaufrance
- mapshaper to convert shape files to GeoJson
- tippecanoe to generate MBTiles (optional)
BDTOPAGE contains hydrographic data for the French territory and includes the following layers:
- Watercourses
- Water bodies
- Elementary surfaces
- Hydrographic sections
- Hydrographic nodes
- Land-sea boundaries
- Hydrographic basins (medium scale)
- Topographic watershed (medium scale)
To select specific layers to process, you can set the TOPAGE_LAYERS environment variable to a comma-separated list of <layer_name>=<output_name> entries like this:
export TOPAGE_LAYERS="BassinHydrographique_FXX=hydrographic-basins,BassinVersantTopographique_FXX=topographic-watersheds"
Run the job with:
krawler --jobfile ../k-atlas/jobfile-bdtopage.js
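The TOPAGE_LAYERS syntax described above can be sketched as a small parser. This is only an illustration of the `<layer_name>=<output_name>` format: the function name `parseTopageLayers` and its error handling are assumptions, not the job's actual implementation.

```javascript
// Hypothetical parser for the TOPAGE_LAYERS syntax described above:
// a comma-separated list of <layer_name>=<output_name> entries
function parseTopageLayers (value) {
  const layers = {}
  for (const entry of (value || '').split(',')) {
    const trimmed = entry.trim()
    if (trimmed.length === 0) continue // tolerate empty entries and trailing commas
    const [layerName, outputName] = trimmed.split('=').map(s => s.trim())
    if (!layerName || !outputName) {
      throw new Error(`Invalid TOPAGE_LAYERS entry: "${trimmed}"`)
    }
    layers[layerName] = outputName
  }
  return layers
}

const layers = parseTopageLayers(
  'BassinHydrographique_FXX=hydrographic-basins,BassinVersantTopographique_FXX=topographic-watersheds'
)
for (const [layerName, outputName] of Object.entries(layers)) {
  console.log(`${layerName} -> ${outputName}`)
}
// BassinHydrographique_FXX -> hydrographic-basins
// BassinVersantTopographique_FXX -> topographic-watersheds
```

Rejecting malformed entries early, as assumed here, makes a typo in the variable fail fast instead of silently skipping a layer.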
A Bash script is provided to generate MBTiles from BDTOPAGE data:
./generate-bdtopage-mbtiles.sh <output_dir> <geojson_file> <layer_name(optional)>
Two directories will be created:
- bdtopage-output/geojson/ containing one geojson file per layer
- bdtopage-output/mbtiles/ containing one mbtiles file per layer
- bdtopage-output/shapefiles/ containing the original shapefiles unzipped
- bdtopage-workdir/ temporary working directory
To debug, you can run this command from a local krawler install: node --inspect . ../k-atlas/jobfile-bdpr.js
To run it on the infrastructure we use Docker images based on the provided Docker files. If you'd like to test it manually, you can clone the repo and then run:
docker build --build-arg KRAWLER_TAG=latest -f dockerfile.bdpr -t k-atlas/bdpr-latest .
docker run --name bdpr --network=host --rm -e S3_ACCESS_KEY -e S3_SECRET_ACCESS_KEY -e S3_ENDPOINT -e S3_BUCKET -e "DEBUG=krawler*" k-atlas/bdpr-latest
Please refer to contribution section for more details.
Licensed under the MIT license.
Copyright (c) 2017-20xx Kalisio
