Dockerized papyri.info stack.
Clone with:
git clone --recurse-submodules https://github.com/dcthree/papyri-docker
First, you need to obtain a GitHub Personal Access Token with package registry permissions (see "Creating a personal access token"), and set it as the environment variable GITHUB_TOKEN for the docker compose process. You'll also need to set the environment variable GITHUB_USERNAME to your GitHub username. There are a variety of ways you can set these environment variables, including using an unversioned .env file in the directory where you've cloned this repository. These environment variables must be available for the navigator container to successfully build packages.
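As a rough sketch of the setup described above, the commented lines show what an unversioned .env file might contain (the values are placeholders), and the `check_github_env` function is a hypothetical helper, not part of this repository, that fails early when either variable is missing from the current shell. Note that it only inspects the shell environment; values kept solely in .env are read by Docker Compose itself and would not be seen by this check.

```shell
# Example .env contents (placeholders, do not commit this file):
#   GITHUB_USERNAME=your-github-username
#   GITHUB_TOKEN=your-personal-access-token

# Hypothetical helper: return non-zero if a required variable is unset or empty.
check_github_env() {
  missing=0
  for name in GITHUB_USERNAME GITHUB_TOKEN; do
    # Indirect lookup via eval keeps this compatible with POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

If you export the variables in your shell instead of using .env, running `check_github_env && docker compose build` would stop before the build when a variable is absent.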
Then, from this repository's directory:
docker compose build
docker compose up -d
- Watch logs in a separate terminal in the same directory:
docker compose logs -f -t
- If all is successful, you should be able to access the running copy once httpd comes up at: http://127.0.0.1:8000
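Since httpd can take a while to come up, it may be convenient to poll rather than watch the logs. The following is a minimal sketch, not something shipped with this repository: `retry_until` is a hypothetical helper that reruns a command until it succeeds or a maximum number of attempts is reached. The attempt count and delay in the example comment are arbitrary assumptions.

```shell
# Hypothetical helper: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, up to ATTEMPTS times, sleeping between tries.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumes curl is installed): poll the front page every 5 seconds,
# giving up after 60 attempts.
#   retry_until 60 5 curl -fsS -o /dev/null http://127.0.0.1:8000
```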
- Disk Space: after bringing up a complete stack, my docker system df shows 40GB of images, 1GB of containers, and 26GB of volumes (67GB total). You may need to increase the default disk allocation if you're running e.g. Docker for Mac.
- Network Port: if another service is already bound to port 8000, httpd will fail to come up. If this happens, you can just stop the other service and run docker compose up -d again.
- Memory: I have 16GB of RAM, 1GB of swap, and 6 VCPUs allocated to Docker. Bringing this up makes my system quite slow...
- Exception Logging: to see full Rails logs from the sosol container in Docker logs, set RAILS_LOG_TO_STDOUT=true
- Initial Indexing: if something goes wrong with the indexing process, you may need to use docker compose up -d --force-recreate when re-trying.
- Docker Compose Timeout: the default Docker Compose HTTP timeout of 60 seconds can sometimes cause problems with docker compose up/docker compose stop, due to the delay in responsiveness of some services. If you run into this, prefix the commands with e.g. COMPOSE_HTTP_TIMEOUT=10000.
- GitHub Maven Package Registry Auth: if you get a 401 Unauthorized error from trying to build dispatch.war or sync.war when you run docker compose up navigator, you may have an invalid GitHub Personal Access Token (basic) due to token expiration or invalid scope. Try using a new token following the instructions above.
- Want to start over from scratch?: run docker compose down -v.
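For the Network Port note above, one way to check whether something is already listening on port 8000 before bringing the stack up is sketched below. This is an assumption-laden example: `port_in_use` is a hypothetical helper, and it relies on bash's /dev/tcp redirection, so it will not work in other shells or in a bash built without network redirections.

```shell
# Bash-only sketch: /dev/tcp/HOST/PORT is a bash redirection feature,
# not a real device file.
port_in_use() {
  host=$1
  port=$2
  # The subshell tries to open a TCP connection on file descriptor 3.
  # It succeeds only if something is listening; the descriptor is
  # closed again when the subshell exits.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Example:
#   if port_in_use 127.0.0.1 8000; then
#     echo "port 8000 is already taken; httpd will fail to come up" >&2
#   fi
```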
- httpd: Apache 2.2 server, proxies the Navigator, Editor, XSugar, and Fuseki
- indexer: container that runs the indexing process using the below services
- navigator: the main "Papyrological Navigator" server
- fuseki: Apache Jena Fuseki 1.x SPARQL Server (aka "Numbers Server")
- tomcat-pn: Tomcat server running "dispatch" and "sync" servlets
- solr: Tomcat server running Apache Solr for search
- sosol: Puma server serving the Rails Editor (aka "SoSOL") application
- xsugar: container that runs XSugar, an XML transformer used by sosol
- postgres: PostgreSQL 13 server, shared by sosol and tomcat-pn
- repo_clone: shared Git checkout of the large main idp.data repository, shared by navigator, fuseki, tomcat-pn, sosol, & httpd
The papyri.info "Top Level Data Flow" diagram may help with understanding:
Services get started in the following order:
- postgres: no service/startup dependencies
- fuseki: no service/startup dependencies
- xsugar: no service/startup dependencies
- repo_clone: no service/startup dependencies, clones canonical
- navigator: once canonical is cloned and fuseki is up, sets config for solr, builds WAR files for tomcat-pn, runs "mapping" which loads data into fuseki
- solr: once solr config (/opt/solr/server/solr/solr.xml.lock) is in place, written by navigator
- indexer: once fuseki and solr are up and mapping is done, runs "indexing" which loads data into solr
- tomcat-pn: once WAR files are built by navigator and "mapping" is done
- sosol: once canonical is cloned and mysql is available, though some functionality depends on fuseki (as well as "mapping" from navigator) and xsugar
- httpd: once /srv/data/papyri.info/git/navigator/pn-config/pi.conf is in place and the proxied services sosol, xsugar, tomcat-pn, fuseki, and solr are available, Apache is started up as httpd
Service startup order is important, and the current docker-compose.yml uses several strategies to control it:
- wait-for-it.sh is used to wait for network service availability; indexer uses it to wait for solr startup, sosol uses it to wait for mysql startup
- lockfiles on shared volumes ensure that processes which only need to run once do run only once; these lockfiles are also sometimes used as a wait signal for containers that need the process to finish before they can run (these busy-wait until the lockfile exists)
Some containers also use links and depends_on clauses, but these are no longer relied upon to enforce startup order.
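The lockfile strategy can be illustrated with a small sketch. These are hypothetical helpers written for this explanation, not the scripts the containers actually run; the function names, the polling interval, and the lockfile path in the example comment are all assumptions.

```shell
# Hypothetical helpers illustrating the lockfile pattern.

# Run a command only if its lockfile does not exist yet, then create the
# lockfile so later invocations (and waiting containers) see it as done.
run_once() {
  lockfile=$1
  shift
  if [ -e "$lockfile" ]; then
    return 0
  fi
  # The lockfile is only created if the command succeeds.
  "$@" && touch "$lockfile"
}

# Busy-wait until the lockfile appears, polling at a fixed interval
# (default: every 5 seconds).
wait_for_lockfile() {
  lockfile=$1
  interval=${2:-5}
  while [ ! -e "$lockfile" ]; do
    sleep "$interval"
  done
}

# Example (illustrative path on a shared volume):
#   producer container:  run_once /shared/mapping.lock ./run-mapping.sh
#   consumer container:  wait_for_lockfile /shared/mapping.lock && ./start-server.sh
```

Creating the lockfile only after the command succeeds means a failed run is retried on the next start, at the cost of consumers waiting indefinitely if the producer never succeeds.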
You may note that we have some containers which run as continuous servers, and others which are containerized processes for building artifacts needed by those servers. Categorizing them may be useful:
Servers:
- httpd
- fuseki
- solr
- sosol
- tomcat-pn
- mysql
- xsugar
Processes:
- repo_clone
- navigator
- indexer
