Conversation
qredwoods
left a comment
Thanks Santosh! I'll defer to Anna on approval, but condensing the lines into a structure we iterate over, as in line 69, LGTM and is good style/prep for expansion. I expect this pattern will recur many times across the project, since I'm guessing there was a history of adding one data set at a time.
Tangent (and sorry if my terms are blurry here): maybe we can generalize that into a style guideline, specifically around handling that expected-to-grow trio. There was also discussion on a prior issue about abstracting at a higher level the places where liquefaction, soft_story, and tsunami were swapped in. In the future, @agennadi, we could look at whether there's a design that minimizes the need to add an nth thing in every place in the code when an nth data layer is added, maybe by making it a single reference that automatically goes through all the layers/sets. Maybe the principle is: when something touches all of the data layers (i.e. tsunami, soft-story, AND liquefaction), take advantage of opportunities to limit repetition of code, as Santosh has done here.
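For illustration, the kind of consolidation being suggested might look something like this hypothetical sketch. The layer names come from the datasets discussed above, but the `LAYERS` list and `build_etl_steps` function are made up for the example and don't exist in the repo:

```python
# Hypothetical sketch: drive per-layer logic from a single list, so adding
# an nth data layer means adding one entry here instead of editing N call sites.
LAYERS = ["tsunami", "soft_story", "liquefaction"]

def build_etl_steps(layers=LAYERS):
    """Return one (layer_name, task_name) pair per data layer."""
    return [(name, f"etl_{name}") for name in layers]

# Adding a fourth layer is then a one-line change to the list:
steps = build_etl_steps(LAYERS + ["landslide"])
```

The point is only the shape: a single reference (the list) that everything iterates over, while still leaving room for per-layer special cases where datasets genuinely differ.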
I saw Anna's thumbs up on my comment above -- if that means @agennadi you support PR approval, I can go ahead and approve this, or you can also do so directly.
(Also, to follow up on my point above about standardizing that we always try to combine the layers: in tonight's meeting, I was successfully convinced it doesn't make sense to do that. Anna pointed out that in other parts of the code the treatment of each dataset is necessarily very distinct due to the differences between datasets -- for example, the liquefaction and tsunami zones are mapped polygon-based, whereas soft-story is address-based data. So when there's code duplication like here, or more so in #695, sure, take opportunities to consolidate, but let's not design away the flexibility in pursuit of arbitrary unity, especially as each new layer will bring additional unique traits that require special-case handling.)
agennadi
left a comment
I ran the workflow from a test branch (715/test) and the jobs are failing (check out the logs here: https://github.com/sfbrigade/datasci-earthquake/actions/runs/19606231119).
This happens because each GitHub Actions job runs in an isolated environment. The pre-etl job installs Python and the dependencies, but run-etl knows nothing about them, so that job fails with a "uv: command not found" error.
post-etl fails with "fatal: not a git repository (or any of the parent directories): .git" for the same reason -- there is no repository checkout in that job.
Thus, each job would require its own setup and/or checkout. Otherwise, we could give up the matrices and go with bash parallel execution using & - what do you think?
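For what it's worth, the per-job setup option might look something like this sketch. The matrix values are the datasets discussed above, but the job name, ETL module path, and action versions are assumptions, not the actual workflow:

```yaml
jobs:
  run-etl:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        dataset: [tsunami, soft_story, liquefaction]
    steps:
      # Each job runs on a fresh runner, so the checkout and tool setup
      # must be repeated here rather than inherited from pre-etl.
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v3
      - name: Run ETL for ${{ matrix.dataset }}
        run: uv run python -m etl.${{ matrix.dataset }}  # hypothetical module path
```

The trade-off versus bash `&` parallelism is that the matrix gives per-dataset logs and failure isolation, at the cost of repeating setup in every job.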
Description
To parallelize the ETL steps, I've moved the before and after steps into their own jobs and made the actual ETL step use a matrix strategy.
Type of changes
Testing
How to test
testing description here: i.e. run app, go to x page, see that it does y
Clean commits