Skip to content

add tasks back to MoH KE #157

@witash

Description

@witash

Describe the issue
MoH KE provisioned 1TB storage for the postgres database, in a postgres managed by kubernetes
The persistent volume for this cannot be easily resized.
1TB was not enough to store the entire postgres database, and the initial dbt run to populate the database was failing.
To get the dashboards working as quickly as possible, all documents with type 'task' were deleted.
Since tasks accounted for 2/3 of the storage usage (before material views), this freed up enough space to allow the initial dbt run to complete
However, tasks are still useful for analytics and debugging, and deleting them is not a permanent solution.

Describe the improvement you'd like

  1. When all the materialized views are finished, calculate exact storage requirements by adding a buffer of 1/2 (for temp tables and later additions) to the storage usage when everything is finished
  2. Create a new postgres instance with at least this much storage; either it could be a new node in the cluster, or, if the external postgres has enough free storage, switch to using that
  3. Create a new cht-sync instance using that postgres. It will take some time (about one week) to do the initial run, so leave the old one running.
  4. when it is finished, switch dashboards to the new postgres and stop and delete the old cht-sync

Describe alternatives you've considered
There's other ways to get the old tasks, I think this will be the simplest overall.
We could try and save time by pgdump/restore the current database, but it then gets complicated with how merge the old tasks

  1. To save time for the inital dbt run, pgdump and restore to the new postgres, excluding the source table. This may take a long time but cht-sync can keep updating while that is running.
  2. in the new database, truncate couch2pg_progress
  3. Run a new instance of cht-sync, without dbt, poitning to the new database. This will sync
  4. when its finished (1-3 days), stop it and restore couch2pg_progress from the saved table
  5. Switch the normal cht-sync to the new postgres. This will cause dashboards to be out of date again; it's starting over from where it was at step 4. That should be only a few days (at most week) of data so it should catch up quickly. couch2pg will try to sync some tasks that are already in the db; it will update saved_timestamp and treat them like updates, but it will not create duplicates.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions