finished the lab #4
base: main
I've identified a few problems that were preventing the script from running, and then some further problems with failing tests. Fix the problems noted, use the "Source with echo" button to make sure the script runs, and then use `testthat` to identify any further issues before you resubmit.
#'
#I dont know how to do this
A plotly approach: Generate the static plots as above. Then call `plotly::ggplotly()` to generate an interactive version. In RStudio it will show up in the lower-right pane (the same one used for static plots and help files). Use it to identify which counties have these giant swings.
A filter approach: Look at the static plot to find a good threshold for "large" swings, say `abs(cases_per_pop)` larger than 1000. Write a filter for these large swings, then count the number of rows by county to figure out which counties are involved.
Next, filter down to those counties, then check things like the population.
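A minimal base-R sketch of the filter approach. It assumes a hypothetical toy data frame `covid_toy` standing in for `covid_df`, with the `county` and `cases_per_pop` columns used in the lab plus a made-up `population` column. The 1000 cutoff is just the example threshold from the comment; pick yours from the static plot.

```r
# Hypothetical toy data standing in for covid_df (values are invented).
covid_toy <- data.frame(
  county        = c("Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"),
  population    = c(500, 500, 500, 90000, 90000, 800),
  cases_per_pop = c(12, -1500, 2100, 8, 15, 1200)
)

# Example threshold for a "large" swing; tune it by eye from the static plot.
threshold <- 1000

# Keep only rows with a large swing in either direction.
large_swings <- covid_toy[abs(covid_toy$cases_per_pop) > threshold, ]

# Count the large-swing rows per county to see which counties are responsible.
swing_counts <- table(large_swings$county)
print(swing_counts)

# Filter down to those counties and check their populations.
suspects <- covid_toy[covid_toy$county %in% names(swing_counts), ]
print(unique(suspects[, c("county", "population")]))
```

With dplyr, the first two steps correspond to `filter(abs(cases_per_pop) > 1000)` followed by `count(county)`.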
lab.R
# ggplot(covid_df, aes(---, ---, group = ---)) +
#   geom_line()
ggplot(covid_df, aes(date, cases_per_pop, group = county)) +
  geom_line() +
With this hanging `+`, R tries to combine this ggplot object with the next expression (the ggplot object on line 137). It can't do this, so the lab script fails before the tests run. This is why `testthat::test_dir('tests')` is failing.
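A small base-R illustration of why a trailing operator breaks the script: the parser keeps reading onto the next line to find the right-hand operand, so two statements meant to be separate become a single expression. The strings parsed here are hypothetical stand-ins for the two ggplot calls, not the lab code itself.

```r
# Two statements, but the first ends with a dangling `+`.
hanging_src <- "p1 <- 1 +\np2 <- 2"
# The same two statements with the dangling `+` removed.
fixed_src <- "p1 <- 1\np2 <- 2"

# parse() returns one element per top-level expression.
hanging_exprs <- parse(text = hanging_src)
fixed_exprs   <- parse(text = fixed_src)

length(hanging_exprs)  # 1: the second line was absorbed as the operand of `+`
length(fixed_exprs)    # 2: two independent statements
```

The same thing happens with a hanging `%>%`: the next statement gets pulled into the pipeline.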
Suggested change:
-  geom_line() +
+  geom_line()
lab.R
         date<='2020-07-30') %>%
  group_by(county, fips) %>%
  summarize(cases_per_pop=sum(cases_per_pop)) %>%
  ungroup() %>%
The hanging pipe here is also causing the script to fail when it's run.
Suggested change:
-  ungroup() %>%
+  ungroup()
lab.R
         date>='2020-06-01',
         date<='2020-06-30') %>%
  group_by(county, fips) %>%
  summarize(parks=mean(pct_diff, rm.na =TRUE)) %>%
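Typo: `rm.na` should be `na.rm`. `mean()` has no `rm.na` argument, so the misspelled name is silently absorbed by `...` and NAs are not ignored. As a result, you end up filtering out more counties than intended.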
Suggested change:
-  summarize(parks=mean(pct_diff, rm.na =TRUE)) %>%
+  summarize(parks=mean(pct_diff, na.rm =TRUE)) %>%
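A small base-R illustration of the typo's effect, using hypothetical toy data (`parks_toy` is invented, not the lab's data). With the misspelled argument, any county with a missing `pct_diff` gets an `NA` mean, so a later step that drops `NA` values removes that county too.

```r
# Hypothetical toy data: county A has one missing pct_diff value.
parks_toy <- data.frame(
  county   = c("A", "A", "B", "B"),
  pct_diff = c(10, NA, 5, 15)
)

# Misspelled argument: `rm.na` is absorbed by `...` and ignored, so NAs propagate.
typo_means <- tapply(parks_toy$pct_diff, parks_toy$county, mean, rm.na = TRUE)

# Correct argument: NAs are dropped before averaging.
fixed_means <- tapply(parks_toy$pct_diff, parks_toy$county, mean, na.rm = TRUE)

print(typo_means)   # A is NA, B is 10
print(fixed_means)  # A is 10, B is 10

# Counting counties that survive an NA filter shows the difference.
sum(!is.na(typo_means))   # 1
sum(!is.na(fixed_means))  # 2
```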
Looks like the only remaining issue is the filter in `cases_july`.
cases_july = covid_df %>%
  filter(date>='2020-07-01',
         date<='2020-07-30') %>%
Double-check the number of days in July.
Suggested change:
-         date<='2020-07-30') %>%
+         date<='2020-07-31') %>%
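A quick base-R check of the off-by-one, using a generated sequence of July 2020 dates rather than the lab data. Matching on year and month instead of typing the last day is shown as one possible alternative that sidesteps the problem. It is a suggestion, not something the lab requires.

```r
# Every day in July 2020.
july <- seq(as.Date("2020-07-01"), as.Date("2020-07-31"), by = "day")

# An upper bound of July 30 silently drops July 31.
kept_30 <- july[july >= as.Date("2020-07-01") & july <= as.Date("2020-07-30")]
# An upper bound of July 31 keeps the whole month.
kept_31 <- july[july >= as.Date("2020-07-01") & july <= as.Date("2020-07-31")]

length(kept_30)  # 30
length(kept_31)  # 31

# Alternative: match on year-month, so no end-of-month day needs to be typed.
kept_month <- july[format(july, "%Y-%m") == "2020-07"]
length(kept_month)  # 31
```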