Clean and prepare tidy data with tidyr and dplyr.
- This lesson uses `fread` from the `data.table` package to read in spreadsheet files, rather than the tidyverse `read_csv` from `readr`. While there is little difference with the 60,000-row ACS table, there is a notable improvement with the >2M-row CBP table (about 0.5 seconds with `fread` vs. 7 seconds with `read_csv`).
- The lesson should be updated to use the `pivot_*` functions (`pivot_longer` and `pivot_wider`) in place of `gather` and `spread`.
- In `str_detect`, the regex `[0-9]` matches any single digit, `{2}` means exactly two repetitions of it, and `----` matches four literal hyphens.
- In `str_remove`, the regex `-+` matches one or more `-` characters.
- For troubleshooting `select`, try specifying `dplyr::select` in case there are conflicting packages loaded.
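The note on `fread` versus `read_csv` can be checked with a rough timing comparison. The sketch below is illustrative only: it writes a small made-up table to a temporary file (not the ACS or CBP data), so the timings will not reproduce the figures quoted above, and it assumes both `data.table` and `readr` are installed.

```r
library(data.table)

# Write a small, made-up table to a temporary CSV file (illustration only)
path <- tempfile(fileext = ".csv")
fwrite(data.frame(id = 1:1000, value = rnorm(1000)), path)

# Time each reader; on a table this small the difference is negligible
time_fread <- system.time(dat <- fread(path))
time_readr <- system.time(dat_readr <- suppressMessages(readr::read_csv(path)))

time_fread["elapsed"]
time_readr["elapsed"]

# fread returns a data.table, which is also a data.frame
class(dat)
```

Any speed difference should only become visible on files with millions of rows, as with the CBP table.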
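As a possible starting point for the `pivot_*` update, here is a minimal sketch of how `gather` and `spread` calls map onto `pivot_longer` and `pivot_wider`. The data frame and its column names (`county`, `y2016`, `y2017`) are hypothetical and do not come from the lesson data.

```r
library(tidyr)

# Hypothetical wide table, invented for illustration
wide <- data.frame(
  county = c("A", "B"),
  y2016 = c(10, 20),
  y2017 = c(11, 21)
)

# Roughly corresponds to gather(wide, key = "year", value = "count", -county);
# the row order of the result may differ from gather's
long <- pivot_longer(wide, cols = -county,
                     names_to = "year", values_to = "count")

# Roughly corresponds to spread(long, key = year, value = count)
wide_again <- pivot_wider(long, names_from = year, values_from = count)

long
wide_again
```

Both `pivot_*` functions return tibbles, so printed output will look slightly different from a base data frame.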
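The two regex notes can be tried out on a small character vector. The codes below are made up for illustration and only imitate the shape of sector-level industry codes; they are not taken from the lesson data.

```r
library(stringr)

# Hypothetical codes, invented for illustration
codes <- c("11----", "113///", "21----", "541110")

# TRUE where the value contains two digits followed by four literal hyphens
is_sector <- str_detect(codes, "[0-9]{2}----")
is_sector

# Remove the first run of one or more hyphens from each matching code
sectors <- str_remove(codes[is_sector], "-+")
sectors
```

Note that `str_detect` looks for the pattern anywhere in the string, so anchors (`^` and `$`) would be needed to require that the whole value has this form.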
The National Socio-Environmental Synthesis Center (SESYNC) curates and runs tutorials on using cyberinfrastructure in pursuit of the Center's scientific mission. Visit www.sesync.org to learn more about SESYNC and cyberhelp.sesync.org for more tutorials and ideas.