Auto-generated via `{sandpaper}`
Source : c085c77
Branch : main
Author : swillerhansen <121032241+swillerhansen@users.noreply.github.com>
Time : 2025-12-02 17:12:12 +0000
Message : Merge pull request #23 from swillerhansen/main
rearranged into more episodes
File : introduction.md
Lines changed : 6 additions & 93 deletions
@@ -18,100 +18,13 @@ exercises: 2
18
18
::::::::::::::::::::::::::::::::::::::::::::::::
19
19
20
20
## Introduction
21
+
This course teaches how to use the programming language R for web scraping. Web scraping is a set of methods for systematically downloading parts of web pages with a script instead of copying and pasting from the page by hand.
21
22
22
-
This is a lesson created via The Carpentries Workbench. It is written in
23
-
[Pandoc-flavored Markdown][pandoc] for static files (with extension `.md`) and
24
-
[R Markdown][r-markdown] for dynamic files that can render code into output
25
-
(with extension `.Rmd`). Please refer to the [Introduction to The Carpentries
26
-
Workbench][carpentries-workbench] for full documentation.
23
+
A web page is made up of elements written in the HTML language.
24
+
HTML is a hierarchical file format: elements are nested inside other elements.
25
+
Insert example of HTML hierarchy here
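A minimal sketch of such a hierarchy, assuming the `rvest` package is installed. The HTML document below is made up for illustration and does not come from any real web page. The code walks from `<body>` down to a `<div>` and lists the names of the elements nested directly inside it.

```r
library(rvest)

# A made-up HTML document, used only for illustration.
doc <- read_html("
<html>
  <body>
    <div>
      <h2>A heading</h2>
      <p>A paragraph</p>
    </div>
  </body>
</html>")

# Walk down the hierarchy one level at a time.
body <- html_element(doc, "body")
div <- html_element(body, "div")

# The <div> is the parent of two child elements: <h2> and <p>.
child_names <- html_name(html_children(div))
print(child_names)
```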
27
26
28
-
What you need to know is that there are three sections required for a valid
29
-
Carpentries lesson template:
27
+
The advantage of web scraping is that our script can specify exactly which parts of the web page we want to download by referring to those elements' HTML code. This makes the download precise.
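As a hedged illustration of referring to elements by their HTML code, the sketch below uses the `rvest` package and a CSS selector to pick out only some parts of a page. The page content, the class name `course`, and the selector are assumptions invented for this example.

```r
library(rvest)

# A made-up page with two course entries (hypothetical content).
page <- read_html('
<html>
  <body>
    <h1>Course list</h1>
    <div class="course">
      <h2>Web scraping with R</h2>
      <p>Teacher: Ada</p>
    </div>
    <div class="course">
      <h2>Data cleaning</h2>
      <p>Teacher: Grace</p>
    </div>
  </body>
</html>')

# Select only the <h2> elements nested inside <div class="course">.
# The <h1> and the <p> elements are left out.
title_nodes <- html_elements(page, "div.course h2")

# Extract the text of the selected elements.
titles <- html_text2(title_nodes)
print(titles)
```

Changing the selector, for example to `div.course p`, would download the teacher lines instead of the titles.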
30
28
31
-
1. `questions` are displayed at the beginning of the episode to prime the
32
-
learner for the content.
33
-
2. `objectives` are the learning objectives for an episode displayed with
34
-
the questions.
35
-
3. `keypoints` are displayed at the end of the episode to reinforce the
## Challenge 2: how do you nest solutions within challenge blocks?
67
-
68
-
:::::::::::::::::::::::: solution
69
-
70
-
You can add a line with at least three colons and a `solution` tag.
71
-
72
-
:::::::::::::::::::::::::::::::::
73
-
::::::::::::::::::::::::::::::::::::::::::::::::
74
-
75
-
## Figures
76
-
77
-
You can include figures generated from R Markdown:
78
-
79
-
80
-
```r
81
-
pie(
82
-
c(Sky=78, "Sunny side of pyramid"=17, "Shady side of pyramid"=5),
83
-
init.angle=315,
84
-
col= c("deepskyblue", "yellow", "yellow3"),
85
-
border=FALSE
86
-
)
87
-
```
88
-
89
-
<div class="figure" style="text-align: center">
90
-
<img src="fig/introduction-rendered-pyramid-1.png" alt="pie chart illusion of a pyramid" />
91
-
<p class="caption">Sun arise each and every morning</p>
92
-
</div>
93
-
Or you can use pandoc markdown for static figures with the following syntax:
94
-
95
-
`{alt='alt text for
96
-
accessibility purposes'}`
97
-
98
-
{alt='Blue Carpentries hex person logo with no text.'}
99
-
100
-
## Math
101
-
102
-
One of our episodes contains $\LaTeX$ equations when describing how to create
103
-
dynamic reports with {knitr}, so we now use mathjax to describe this:
- Use `.md` files for episodes when you want static content
112
-
- Use `.Rmd` files for episodes when you need to generate output
113
-
- Run `sandpaper::check_lesson()` to identify any issues with your lesson
114
-
- Run `sandpaper::build_lesson()` to preview your lesson locally
115
-
116
-
::::::::::::::::::::::::::::::::::::::::::::::::
29
+
Furthermore, as we will do in this course, we can scrape the same element on multiple pages. Instead of opening each page separately, marking the parts we want to download, and copy-pasting them, we can write a script that scrapes the same HTML element on every page. This allows for speed and consistency.
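One possible way to sketch this idea with the `rvest` package is shown below. To keep the example self-contained, the "pages" are made-up HTML strings; in practice `read_html()` would be given the URL of each page instead. The helper name `scrape_heading` is hypothetical.

```r
library(rvest)

# Hypothetical stand-ins for several web pages. In real use these
# would be URLs passed to read_html().
pages <- c(
  page1 = "<html><body><h1>First page</h1><p>Text</p></body></html>",
  page2 = "<html><body><h1>Second page</h1><p>More text</p></body></html>"
)

# Scrape the same element (the first <h1>) from one page.
scrape_heading <- function(html) {
  doc <- read_html(html)
  html_text2(html_element(doc, "h1"))
}

# Apply the same function to every page.
headings <- vapply(pages, scrape_heading, character(1))
print(headings)
```

Because every page goes through the same function, the same element is extracted in the same way each time.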