- 👨🏻‍💻 Author: Anderson H Uyekita
- 📚 Specialization: Data Science: Foundations using R Specialization
- 📖 Course: Exploratory Data Analysis
- 🧑‍🏫 Instructor: Roger D Peng
- 📆 Week 4
- 🚦 Start: Wednesday, 15 June 2022
- 🏁 Finish: Sunday, 19 June 2022
- 🌎 Rpubs: Interactive Document
- 📋 Instructions: Project Instructions
- 📄 README: README.md
- Requirements
- Scripts
- Output and Input Details
It is necessary to install the following packages to reproduce Course Project 2:
- `ggplot2`
- `tidyverse`
- `magrittr`
- `cowplot`

The scripts also use the following standard packages that ship with R:
- `base`
- `graphics`
- `utils`
- `grDevices`

Internet access is required to reproduce the project, because the scripts download the raw data.
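A minimal sketch, assuming you want to check the requirements before running the scripts: the hypothetical helper `find_missing()` reports which of the packages listed above cannot be loaded, and prints an `install.packages()` call for them rather than installing anything itself.

```r
# Packages listed in the Requirements section
required_pkgs <- c("ggplot2", "tidyverse", "magrittr", "cowplot")

# Hypothetical helper: return the packages whose namespace cannot be loaded
find_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

missing_pkgs <- find_missing(required_pkgs)
if (length(missing_pkgs) > 0) {
  # Print the command to run instead of installing automatically
  message("Missing packages; run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```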
Course Project 2 required developing six scripts, each of which creates one plot and exports it as a PNG file.
Script 1: total PM2.5 emissions by year (base graphics).
- Creating the folder `data` to keep all downloaded files in it;
- Downloading the zipped file (`FNEI_data.zip`) using `download.file()`;
- If the raw data is already downloaded, the script does not download any file;
- Unzipping `FNEI_data.zip` into the `data` folder;
- Loading the `summarySCC_PM25.rds` file using `base::readRDS` and naming it `NEI`;
- Loading the `Source_Classification_Code.rds` file using `base::readRDS` and naming it `SCC`;
- Summarizing the NEI dataset by year, then calculating the total PM2.5 emissions;
- Plotting a bar plot using the base graphics system;
- Saving the plot as a `png` file.
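The core of the first script can be sketched as follows. This is not the author's code: it runs on a small made-up data frame with the same column names as `NEI`, uses base R's `aggregate()` (the actual scripts may summarize with `tidyverse` functions instead), and writes to a hypothetical `plot1.png`.

```r
# Toy stand-in for NEI: same column names as the real data, made-up values
NEI <- data.frame(
  fips      = c("09001", "09001", "24510", "24510"),
  SCC       = c("10100401", "10100404", "10100401", "10100404"),
  Pollutant = "PM25-PRI",
  Emissions = c(15, 5, 8, 2),
  type      = c("POINT", "POINT", "NONPOINT", "POINT"),
  year      = c(1999L, 2008L, 1999L, 2008L)
)

# Total PM2.5 emissions per year, all locations combined
total_by_year <- aggregate(Emissions ~ year, data = NEI, FUN = sum)

# Bar plot with the base graphics system, written to a PNG file
# ("plot1.png" is a hypothetical file name)
png("plot1.png", width = 480, height = 480)
barplot(total_by_year$Emissions,
        names.arg = total_by_year$year,
        xlab = "Year",
        ylab = "Total PM2.5 emissions",
        main = "Total PM2.5 emissions by year")
invisible(dev.off())
```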
Script 2: total PM2.5 emissions by year in Baltimore City (base graphics).
- Creating the folder `data` to keep all downloaded files in it;
- Downloading the zipped file (`FNEI_data.zip`) using `download.file()`;
- If the raw data is already downloaded, the script does not download any file;
- Unzipping `FNEI_data.zip` into the `data` folder;
- Loading the `summarySCC_PM25.rds` file using `base::readRDS` and naming it `NEI`;
- Loading the `Source_Classification_Code.rds` file using `base::readRDS` and naming it `SCC`;
- Subsetting the NEI dataset to keep only `fips` equal to `24510` (Baltimore City);
- Summarizing the previous dataset by year, then calculating the total PM2.5 emissions;
- Plotting a line plot using the base graphics system;
- Saving the plot as a `png` file.
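A sketch of the Baltimore City subset and line plot, again on made-up data with `NEI`'s column names; `plot2.png` is a hypothetical file name. Since `fips` is stored as character in `NEI`, the filter compares against the string `"24510"`.

```r
# Toy stand-in for NEI (made-up values); fips is character, as in the real data
NEI <- data.frame(
  fips      = c("24510", "24510", "24510", "06037"),
  Emissions = c(10, 6, 4, 50),
  type      = c("POINT", "NONPOINT", "POINT", "POINT"),
  year      = c(1999L, 1999L, 2008L, 2008L)
)

# Keep Baltimore City only, comparing against the fips code as a string
baltimore <- subset(NEI, fips == "24510")

# Total PM2.5 emissions per year in Baltimore City
balt_by_year <- aggregate(Emissions ~ year, data = baltimore, FUN = sum)

# Line plot (points joined by lines) with the base graphics system
png("plot2.png", width = 480, height = 480)
plot(balt_by_year$year, balt_by_year$Emissions, type = "b",
     xlab = "Year", ylab = "Total PM2.5 emissions",
     main = "Baltimore City: total PM2.5 emissions")
invisible(dev.off())
```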
Script 3: Baltimore City emissions by source type (`ggplot2`).
- Creating the folder `data` to keep all downloaded files in it;
- Downloading the zipped file (`FNEI_data.zip`) using `download.file()`;
- If the raw data is already downloaded, the script does not download any file;
- Unzipping `FNEI_data.zip` into the `data` folder;
- Loading the `summarySCC_PM25.rds` file using `base::readRDS` and naming it `NEI`;
- Loading the `Source_Classification_Code.rds` file using `base::readRDS` and naming it `SCC`;
- Subsetting the NEI dataset to keep only `fips` equal to `24510` (Baltimore City);
- Summarizing the previous dataset by year and `type`, then calculating the total PM2.5 emissions;
- Plotting a line plot using the `ggplot2` package;
- Mapping `color = type` in the `geom_line` aesthetics;
- Saving the plot as a `png` file.
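A sketch of the third script's summary, assuming the totals are computed per year *and* source type so that `color = type` draws one line per type. The data are made up, `plot3.png` is hypothetical, and the plotting step runs only if `ggplot2` is installed.

```r
# Toy stand-in for NEI (made-up values)
NEI <- data.frame(
  fips      = c("24510", "24510", "24510", "24510", "06037"),
  Emissions = c(10, 12, 4, 2, 100),
  type      = c("POINT", "POINT", "ON-ROAD", "ON-ROAD", "POINT"),
  year      = c(1999L, 2008L, 1999L, 2008L, 1999L)
)

baltimore <- subset(NEI, fips == "24510")

# One total per year and source type, so each type gets its own line
by_year_type <- aggregate(Emissions ~ year + type, data = baltimore, FUN = sum)

# Plot only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(by_year_type, aes(x = year, y = Emissions, color = type)) +
    geom_line() +
    geom_point() +
    labs(x = "Year", y = "Total PM2.5 emissions",
         title = "Baltimore City: PM2.5 emissions by source type")
  ggsave("plot3.png", plot = p, width = 6, height = 4)
}
```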
Script 4: emissions from coal combustion-related sources (`ggplot2`).
- Creating the folder `data` to keep all downloaded files in it;
- Downloading the zipped file (`FNEI_data.zip`) using `download.file()`;
- If the raw data is already downloaded, the script does not download any file;
- Unzipping `FNEI_data.zip` into the `data` folder;
- Loading the `summarySCC_PM25.rds` file using `base::readRDS` and naming it `NEI`;
- Loading the `Source_Classification_Code.rds` file using `base::readRDS` and naming it `SCC`;
- Subsetting the SCC dataset to keep the coal sources of PM2.5, identified through the `EI.Sector` variable;
- Subsetting the NEI dataset to keep only the SCC codes selected in the previous step;
- Merging both previous datasets to add an `EI.Sector` column;
- Summarizing the previous dataset by year and `EI.Sector`, then calculating the total PM2.5 emissions;
- Plotting a stacked bar plot using the `ggplot2` package;
- Setting `position = "stack"` in `geom_bar`;
- Saving the plot as a `png` file.
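A sketch of the coal pipeline on made-up data (the SCC codes and emissions are invented; the sector labels come from the `EI.Sector` listing in this README). Because the data are already summarized, `geom_bar()` needs `stat = "identity"`; with its default `stat = "count"` it would count rows instead of plotting the totals. `plot4.png` is hypothetical.

```r
# Toy stand-ins: invented SCC codes and emissions, real EI.Sector labels
SCC <- data.frame(
  SCC       = c("10100101", "10200101", "10100601"),
  EI.Sector = c("Fuel Comb - Electric Generation - Coal",
                "Fuel Comb - Industrial Boilers, ICEs - Coal",
                "Fuel Comb - Electric Generation - Natural Gas")
)
NEI <- data.frame(
  fips      = c("09001", "09001", "24510", "24510"),
  SCC       = c("10100101", "10200101", "10100601", "10100101"),
  Emissions = c(50, 20, 7, 30),
  year      = c(1999L, 1999L, 1999L, 2008L)
)

# 1. SCC codes whose EI.Sector mentions coal
coal_scc <- SCC[grepl("Coal", SCC$EI.Sector), c("SCC", "EI.Sector")]

# 2. Keep only the NEI rows with those SCC codes
nei_coal <- NEI[NEI$SCC %in% coal_scc$SCC, ]

# 3. Add the EI.Sector column by merging on SCC
nei_coal <- merge(nei_coal, coal_scc, by = "SCC")

# 4. Total emissions per year and sector
coal_by_year <- aggregate(Emissions ~ year + EI.Sector, data = nei_coal, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # stat = "identity" plots the pre-computed totals instead of counting rows
  p <- ggplot(coal_by_year, aes(x = factor(year), y = Emissions, fill = EI.Sector)) +
    geom_bar(stat = "identity", position = "stack") +
    labs(x = "Year", y = "Total PM2.5 emissions",
         title = "Coal combustion-related PM2.5 emissions")
  ggsave("plot4.png", plot = p, width = 7, height = 4)
}
```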
Script 5: on-road emissions in Baltimore City (`ggplot2`).
- Creating the folder `data` to keep all downloaded files in it;
- Downloading the zipped file (`FNEI_data.zip`) using `download.file()`;
- If the raw data is already downloaded, the script does not download any file;
- Unzipping `FNEI_data.zip` into the `data` folder;
- Loading the `summarySCC_PM25.rds` file using `base::readRDS` and naming it `NEI`;
- Loading the `Source_Classification_Code.rds` file using `base::readRDS` and naming it `SCC`;
- Subsetting the NEI dataset to keep only `fips` equal to `24510` (Baltimore City) and `type` equal to `ON-ROAD`;
- Summarizing the previous dataset by year, then calculating the total PM2.5 emissions;
- Plotting a bar plot using the `ggplot2` package;
- Saving the plot as a `png` file.
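A sketch of the fifth script on made-up data; `geom_col()` is `ggplot2`'s shorthand for `geom_bar(stat = "identity")`, and `plot5.png` is a hypothetical file name.

```r
# Toy stand-in for NEI (made-up values)
NEI <- data.frame(
  fips      = c("24510", "24510", "24510", "06037"),
  Emissions = c(30, 5, 12, 80),
  type      = c("ON-ROAD", "POINT", "ON-ROAD", "ON-ROAD"),
  year      = c(1999L, 1999L, 2008L, 2008L)
)

# Baltimore City, on-road sources only
balt_road <- subset(NEI, fips == "24510" & type == "ON-ROAD")

# Total on-road PM2.5 emissions per year
road_by_year <- aggregate(Emissions ~ year, data = balt_road, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(road_by_year, aes(x = factor(year), y = Emissions)) +
    geom_col() +
    labs(x = "Year", y = "Total PM2.5 emissions",
         title = "Baltimore City: on-road PM2.5 emissions")
  ggsave("plot5.png", plot = p, width = 6, height = 4)
}
```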
Script 6: on-road emissions in Baltimore City and Los Angeles County (`ggplot2`).
- Creating the folder `data` to keep all downloaded files in it;
- Downloading the zipped file (`FNEI_data.zip`) using `download.file()`;
- If the raw data is already downloaded, the script does not download any file;
- Unzipping `FNEI_data.zip` into the `data` folder;
- Loading the `summarySCC_PM25.rds` file using `base::readRDS` and naming it `NEI`;
- Loading the `Source_Classification_Code.rds` file using `base::readRDS` and naming it `SCC`;
- Subsetting the NEI dataset to keep only `fips` equal to `24510` or `06037` (Baltimore City and Los Angeles County) and `type` equal to `ON-ROAD`;
- Creating an auxiliary dataset that maps each `fips` code to its city;
- Merging both previous datasets to add a column with the city name;
- Summarizing the previous dataset by year and city, then calculating the total PM2.5 emissions;
- Plotting a bar plot using the `ggplot2` package;
- Saving the plot as a `png` file.
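A sketch of the Baltimore City versus Los Angeles County comparison on made-up data. The lookup table `cities` and its `city` column are hypothetical names for the auxiliary dataset described above, and `plot6.png` is hypothetical.

```r
# Toy stand-in for NEI (made-up values)
NEI <- data.frame(
  fips      = c("24510", "24510", "06037", "06037", "06037", "09001"),
  Emissions = c(30, 12, 80, 20, 90, 7),
  type      = "ON-ROAD",
  year      = c(1999L, 2008L, 1999L, 1999L, 2008L, 1999L)
)

# fips is character: a numeric literal such as 06037 would become "6037"
# and would not match "06037"
two_cities <- subset(NEI, fips %in% c("24510", "06037") & type == "ON-ROAD")

# Hypothetical auxiliary dataset mapping fips codes to readable names
cities <- data.frame(
  fips = c("24510", "06037"),
  city = c("Baltimore City", "Los Angeles County")
)
two_cities <- merge(two_cities, cities, by = "fips")

# Total on-road emissions per year and place
city_by_year <- aggregate(Emissions ~ year + city, data = two_cities, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(city_by_year, aes(x = factor(year), y = Emissions, fill = city)) +
    geom_col(position = "dodge") +
    labs(x = "Year", y = "Total on-road PM2.5 emissions",
         title = "Baltimore City vs. Los Angeles County")
  ggsave("plot6.png", plot = p, width = 7, height = 4)
}
```

Dodged bars put both places on a shared axis; if one place's totals are much larger than the other's, `facet_wrap(~ city, scales = "free_y")` is an alternative that makes each trend easier to read.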
This section describes the inputs and outputs of the project.
Course Project 2 uses a single zipped file, `FNEI_data.zip`, which contains two compressed `.rds` files:
- `summarySCC_PM25.rds`;
- `Source_Classification_Code.rds`.
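The download-once logic that every script repeats could be factored into a single helper. This is a sketch, not the author's code: `load_fnei()` is a hypothetical name, the download URL is left as a parameter because this section does not state it, and it assumes the archive stores both `.rds` files at its top level.

```r
# Hypothetical helper: fetch, unzip, and read the two FNEI files, skipping
# any step whose output already exists in data_dir
load_fnei <- function(url, data_dir = "data") {
  zip_path <- file.path(data_dir, "FNEI_data.zip")
  nei_path <- file.path(data_dir, "summarySCC_PM25.rds")
  scc_path <- file.path(data_dir, "Source_Classification_Code.rds")

  # Create the data folder only when it is missing
  if (!dir.exists(data_dir)) dir.create(data_dir, recursive = TRUE)

  # Download only if the zipped raw data is not already on disk;
  # mode = "wb" keeps the binary file intact on Windows
  if (!file.exists(zip_path)) {
    download.file(url, destfile = zip_path, mode = "wb")
  }

  # Unzip only if either .rds file is missing
  # (assumes both files sit at the top level of the archive)
  if (!all(file.exists(c(nei_path, scc_path)))) {
    unzip(zip_path, exdir = data_dir)
  }

  list(NEI = readRDS(nei_path), SCC = readRDS(scc_path))
}
```

For example, `dat <- load_fnei(url)` followed by `NEI <- dat$NEI` and `SCC <- dat$SCC`, where `url` is the address given in the project instructions.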
The data from `summarySCC_PM25.rds` is already tidy; following the
assignment instructions, I named this object `NEI`.
- Size: 732,232,328 Bytes (698.3112 Megabytes)
- Rows: 6,497,651, and;
- Columns: 6.
I calculated the object size in bytes using `object_size()`
from the `pryr` package.
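The megabyte figures above are the byte counts divided by 1024^2. The sketch below reproduces them and shows a base-R alternative for measuring size, `utils::object.size()`, which may report slightly different values than `pryr::object_size()`; `bytes_to_mb()` is a hypothetical helper.

```r
# Convert a byte count to megabytes with 1 MB = 1024^2 bytes; this
# convention reproduces the sizes quoted in this README
bytes_to_mb <- function(bytes) bytes / 1024^2

bytes_to_mb(732232328)  # NEI: about 698.3112
bytes_to_mb(3983496)    # SCC: about 3.798958

# Base-R size measurement on a small example object
x <- data.frame(id = 1:1000)
size_bytes <- as.numeric(object.size(x))
bytes_to_mb(size_bytes)
```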
The NEI dataset has the following columns: `fips`, `SCC`, `Pollutant`, `Emissions`, `type`, `year`.
Head
| | fips | SCC | Pollutant | Emissions | type | year |
|---|---|---|---|---|---|---|
| 4 | 09001 | 10100401 | PM25-PRI | 15.714 | POINT | 1999 |
| 8 | 09001 | 10100404 | PM25-PRI | 234.178 | POINT | 1999 |
| 12 | 09001 | 10100501 | PM25-PRI | 0.128 | POINT | 1999 |
| 16 | 09001 | 10200401 | PM25-PRI | 2.036 | POINT | 1999 |
| 20 | 09001 | 10200504 | PM25-PRI | 0.388 | POINT | 1999 |
| 24 | 09001 | 10200602 | PM25-PRI | 1.490 | POINT | 1999 |
Structure
## 'data.frame': 6497651 obs. of 6 variables:
## $ fips : chr "09001" "09001" "09001" "09001" ...
## $ SCC : chr "10100401" "10100404" "10100501" "10200401" ...
## $ Pollutant: chr "PM25-PRI" "PM25-PRI" "PM25-PRI" "PM25-PRI" ...
## $ Emissions: num 15.714 234.178 0.128 2.036 0.388 ...
## $ type : chr "POINT" "POINT" "POINT" "POINT" ...
## $ year : int 1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
The data from `Source_Classification_Code.rds` is already tidy.
Following the assignment instructions, I named this object `SCC`.
- Size: 3,983,496 Bytes (3.798958 Megabytes)
- Rows: 11,717, and;
- Columns: 15.
I calculated the object size in bytes using `object_size()`
from the `pryr` package.
The SCC dataset has the following columns: `SCC`, `Data.Category`, `Short.Name`, `EI.Sector`, `Option.Group`, `Option.Set`, `SCC.Level.One`, `SCC.Level.Two`, `SCC.Level.Three`, `SCC.Level.Four`, `Map.To`, `Last.Inventory.Year`, `Created_Date`, `Revised_Date`, `Usage.Notes`.
Head
| SCC | Data.Category | Short.Name | EI.Sector | Option.Group | Option.Set | SCC.Level.One | SCC.Level.Two | SCC.Level.Three | SCC.Level.Four | Map.To | Last.Inventory.Year | Created_Date | Revised_Date | Usage.Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10100101 | Point | Ext Comb /Electric Gen /Anthracite Coal /Pulverized Coal | Fuel Comb - Electric Generation - Coal | | | External Combustion Boilers | Electric Generation | Anthracite Coal | Pulverized Coal | NA | NA | | | |
| 10100102 | Point | Ext Comb /Electric Gen /Anthracite Coal /Traveling Grate (Overfeed) Stoker | Fuel Comb - Electric Generation - Coal | | | External Combustion Boilers | Electric Generation | Anthracite Coal | Traveling Grate (Overfeed) Stoker | NA | NA | | | |
| 10100201 | Point | Ext Comb /Electric Gen /Bituminous Coal /Pulverized Coal: Wet Bottom | Fuel Comb - Electric Generation - Coal | | | External Combustion Boilers | Electric Generation | Bituminous/Subbituminous Coal | Pulverized Coal: Wet Bottom (Bituminous Coal) | NA | NA | | | |
| 10100202 | Point | Ext Comb /Electric Gen /Bituminous Coal /Pulverized Coal: Dry Bottom | Fuel Comb - Electric Generation - Coal | | | External Combustion Boilers | Electric Generation | Bituminous/Subbituminous Coal | Pulverized Coal: Dry Bottom (Bituminous Coal) | NA | NA | | | |
| 10100203 | Point | Ext Comb /Electric Gen /Bituminous Coal /Cyclone Furnace | Fuel Comb - Electric Generation - Coal | | | External Combustion Boilers | Electric Generation | Bituminous/Subbituminous Coal | Cyclone Furnace (Bituminous Coal) | NA | NA | | | |
| 10100204 | Point | Ext Comb /Electric Gen /Bituminous Coal /Spreader Stoker | Fuel Comb - Electric Generation - Coal | | | External Combustion Boilers | Electric Generation | Bituminous/Subbituminous Coal | Spreader Stoker (Bituminous Coal) | NA | NA | | | |
Structure
## 'data.frame': 11717 obs. of 15 variables:
## $ SCC : Factor w/ 11717 levels "10100101","10100102",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Data.Category : Factor w/ 6 levels "Biogenic","Event",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ Short.Name : Factor w/ 11238 levels "","2,4-D Salts and Esters Prod /Process Vents, 2,4-D Recovery: Filtration",..: 3283 3284 3293 3291 3290 3294 3295 3296 3292 3289 ...
## $ EI.Sector : Factor w/ 59 levels "Agriculture - Crops & Livestock Dust",..: 18 18 18 18 18 18 18 18 18 18 ...
## $ Option.Group : Factor w/ 25 levels "","C/I Kerosene",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Option.Set : Factor w/ 18 levels "","A","B","B1A",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ SCC.Level.One : Factor w/ 17 levels "Brick Kilns",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ SCC.Level.Two : Factor w/ 146 levels "","Agricultural Chemicals Production",..: 32 32 32 32 32 32 32 32 32 32 ...
## $ SCC.Level.Three : Factor w/ 1061 levels "","100% Biosolids (e.g., sewage sludge, manure, mixtures of these matls)",..: 88 88 156 156 156 156 156 156 156 156 ...
## $ SCC.Level.Four : Factor w/ 6084 levels "","(NH4)2 SO4 Acid Bath System and Evaporator",..: 4455 5583 4466 4458 1341 5246 5584 5983 4461 776 ...
## $ Map.To : num NA NA NA NA NA NA NA NA NA NA ...
## $ Last.Inventory.Year: int NA NA NA NA NA NA NA NA NA NA ...
## $ Created_Date : Factor w/ 57 levels "","1/27/2000 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Revised_Date : Factor w/ 44 levels "","1/27/2000 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Usage.Notes : Factor w/ 21 levels ""," ","includes bleaching towers, washer hoods, filtrate tanks, vacuum pump exhausts",..: 1 1 1 1 1 1 1 1 1 1 ...
The outputs of this assignment are six `.R` scripts and six `.png`
files.
EI.Sector
The `EI.Sector` variable has 59 levels:
## [1] Fuel Comb - Electric Generation - Coal
## [2] Fuel Comb - Electric Generation - Oil
## [3] Fuel Comb - Electric Generation - Natural Gas
## [4] Fuel Comb - Electric Generation - Other
## [5] Fuel Comb - Electric Generation - Biomass
## [6] Fuel Comb - Industrial Boilers, ICEs - Coal
## [7] Fuel Comb - Industrial Boilers, ICEs - Oil
## [8] Fuel Comb - Industrial Boilers, ICEs - Natural Gas
## [9] Fuel Comb - Industrial Boilers, ICEs - Other
## [10] Fuel Comb - Industrial Boilers, ICEs - Biomass
## [11] Fuel Comb - Comm/Institutional - Coal
## [12] Fuel Comb - Comm/Institutional - Oil
## [13] Fuel Comb - Comm/Institutional - Natural Gas
## [14] Fuel Comb - Comm/Institutional - Other
## [15] Fuel Comb - Comm/Institutional - Biomass
## [16] Industrial Processes - NEC
## [17] Fuel Comb - Residential - Other
## [18] Fuel Comb - Residential - Oil
## [19] Fuel Comb - Residential - Natural Gas
## [20] Fuel Comb - Residential - Wood
## [21] Mobile - On-Road Gasoline Light Duty Vehicles
## [22] Mobile - On-Road Gasoline Heavy Duty Vehicles
## [23] Mobile - On-Road Diesel Light Duty Vehicles
## [24] Mobile - On-Road Diesel Heavy Duty Vehicles
## [25] Mobile - Non-Road Equipment - Gasoline
## [26] Mobile - Non-Road Equipment - Other
## [27] Mobile - Non-Road Equipment - Diesel
## [28] Mobile - Aircraft
## [29] Mobile - Commercial Marine Vessels
## [30] Mobile - Locomotives
## [31] Dust - Paved Road Dust
## [32] Dust - Unpaved Road Dust
## [33] Industrial Processes - Chemical Manuf
## [34] Commercial Cooking
## [35] Industrial Processes - Non-ferrous Metals
## [36] Industrial Processes - Ferrous Metals
## [37] Industrial Processes - Petroleum Refineries
## [38] Industrial Processes - Oil & Gas Production
## [39] Dust - Construction Dust
## [40] Industrial Processes - Mining
## [41] Solvent - Non-Industrial Surface Coating
## [42] Solvent - Industrial Surface Coating & Solvent Use
## [43] Solvent - Degreasing
## [44] Solvent - Dry Cleaning
## [45] Solvent - Graphic Arts
## [46] Solvent - Consumer & Commercial Solvent Use
## [47] Industrial Processes - Storage and Transfer
## [48] Miscellaneous Non-Industrial NEC
## [49] Bulk Gasoline Terminals
## [50] Gas Stations
## [51] Waste Disposal
## [52] Agriculture - Livestock Waste
## [53] Agriculture - Crops & Livestock Dust
## [54] Fires - Agricultural Field Burning
## [55] Agriculture - Fertilizer Application
## [56] Fires - Wildfires
## [57] Fires - Prescribed Fires
## [58] Industrial Processes - Cement Manuf
## [59] Industrial Processes - Pulp & Paper
## 59 Levels: Agriculture - Crops & Livestock Dust ...
The `EI.Sector` levels related to coal:
## [1] Fuel Comb - Electric Generation - Coal
## [2] Fuel Comb - Industrial Boilers, ICEs - Coal
## [3] Fuel Comb - Comm/Institutional - Coal
## 59 Levels: Agriculture - Crops & Livestock Dust ...
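A minimal sketch of how the three coal levels can be pulled out of the factor levels, using a handful of the labels listed above. The anchored pattern `"- Coal$"` is an assumption, a stricter choice than matching `"Coal"` anywhere in the label; both give the same three levels for the list above.

```r
# A few EI.Sector labels copied from the listing above, stored as a factor
ei_sector <- factor(c(
  "Fuel Comb - Electric Generation - Coal",
  "Fuel Comb - Electric Generation - Oil",
  "Fuel Comb - Industrial Boilers, ICEs - Coal",
  "Fuel Comb - Comm/Institutional - Coal",
  "Mobile - Locomotives"
))

# Levels whose label ends with "- Coal"
coal_levels <- grep("- Coal$", levels(ei_sector), value = TRUE)
coal_levels
```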