Commit f78c874

committed
source commit: 0208327
0 parents  commit f78c874

19 files changed

+1708
-0
lines changed

01-introduction.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
1+
---
2+
title: Introduction
3+
teaching: 10
4+
exercises: 0
5+
---
6+
7+
::::::::::::::::::::::::::::::::::::::: objectives
8+
9+
- Describe OpenRefine’s uses and applications.
10+
- Differentiate data cleaning from data organization.
11+
- Experiment with OpenRefine’s user interface.
12+
13+
::::::::::::::::::::::::::::::::::::::::::::::::::
14+
15+
:::::::::::::::::::::::::::::::::::::::: questions
16+
17+
- What is OpenRefine useful for?
18+
19+
::::::::::::::::::::::::::::::::::::::::::::::::::
20+
21+
::::: instructor
22+
23+
## Please help improve this page
24+
25+
There are several issues related to this section of the lesson:
26+
27+
- [it does not explain the difference between data cleaning and data organisation (#56)][issue-56]
28+
- [the contents do not match the objectives (#86)][issue-86]
29+
- [it does not explain when (not) to use OpenRefine (#103)][issue-103]
30+
- [the Other Resources section needs refinement (#172)][issue-172]
31+
32+
[issue-56]: https://github.com/datacarpentry/openrefine-socialsci/issues/56
33+
[issue-86]: https://github.com/datacarpentry/openrefine-socialsci/issues/86
34+
[issue-103]: https://github.com/datacarpentry/openrefine-socialsci/issues/103
35+
[issue-172]: https://github.com/datacarpentry/openrefine-socialsci/issues/172
36+
37+
Your input on these issues would be much appreciated!
38+
39+
::::::::::::::::
40+
41+
## Motivations for the OpenRefine Lesson
42+
43+
- Data is often very messy. OpenRefine provides a set of tools to allow you to
44+
identify and amend the messy data.
45+
- It is important to know what you did to your data. Additionally, journals,
46+
granting agencies, and other institutions are requiring documentation of the
47+
steps you took when working with your data. With OpenRefine, you can capture
48+
all actions applied to your raw data and share them with your publication as
49+
supplemental material.
50+
- All actions are easily reversed in OpenRefine.
51+
- If you save your work, it is written to a new file. OpenRefine always uses a copy
52+
of your data and *does not* modify your original dataset.
53+
- Data cleaning steps often need repeating with multiple files. OpenRefine
54+
keeps track of all of your actions and allows them to be applied to different datasets.
55+
- Some concepts such as clustering algorithms are quite complex, but OpenRefine
56+
makes it easy to introduce them, use them, and show their power.
57+
58+
## Features
59+
60+
- Open source ([source on GitHub](https://github.com/OpenRefine/OpenRefine)).
61+
- A large, growing community, from novice to expert, ready to help. See the Getting
62+
Help section below.
63+
- Works with large-ish datasets (100,000 rows). You can adjust the memory allocation to
64+
accommodate larger datasets.
65+
- OpenRefine always keeps your data private on your own computer until you
66+
choose to share it. It works by running a small server on your computer and
67+
using your web browser to interact with it, but your private data never
68+
leaves your computer unless you want it to.
69+
70+
71+
::: instructor
72+
73+
### Data privacy when using APIs or reconciliation
74+
75+
Most functionality does not require an Internet connection and keeps your data
76+
on your computer.
77+
Some functions, however, like looking up data from URLs or reconciling values
78+
in your dataset with online services, necessarily require that data is sent to
79+
the online services.
80+
While this lesson does not cover these functions, it may be important to know
81+
how data could be shared with outside parties, especially if you work with
82+
sensitive or confidential data.
83+
84+
::::::::::::::
85+
86+
## Before we get started
87+
88+
Note: OpenRefine is a Java program that runs on your machine (not in the cloud). You
88+
use it through your web browser, but no internet connection is needed.
90+
91+
Follow the [Setup](../learners/setup.md) instructions to install OpenRefine.
92+
93+
If OpenRefine does not open automatically in your browser after you install and start it,
94+
point your browser at [http://127.0.0.1:3333/](http://127.0.0.1:3333/)
95+
or [http://localhost:3333](http://localhost:3333) to open the interface.
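
If the page does not load, it can help to check whether anything is answering at that address before troubleshooting the browser. The following is a minimal sketch in Python, not part of OpenRefine itself; the function name `openrefine_is_running` is hypothetical, and the only detail taken from this lesson is the default address `http://127.0.0.1:3333/`. It only reports whether *some* HTTP server responds there; it does not verify that the server is OpenRefine.

```python
import urllib.error
import urllib.request


def openrefine_is_running(url="http://127.0.0.1:3333/", timeout=2.0):
    """Return True if an HTTP server answers successfully at `url`.

    Hypothetical helper: it cannot tell OpenRefine apart from any
    other program that happens to be listening on the same port.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Any 2xx or 3xx status means a server answered normally.
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if openrefine_is_running():
        print("A server is responding at http://127.0.0.1:3333/")
    else:
        print("Nothing is responding; check that OpenRefine has been started.")
```

If the check reports that nothing is responding, the likely cause is that OpenRefine has not been started (or is still starting up), rather than a problem with the browser.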
96+
97+
98+
99+
::: instructor
100+
101+
### Zooming hides buttons
102+
103+
OpenRefine is used through its graphical user interface in this lesson.
104+
In classroom settings or in online classes, you probably want to zoom in on the
105+
interface so that text is readable to all.
106+
However, when you zoom in, some controls may fall outside the view.
107+
Dialog windows in OpenRefine cannot be dragged, so the only way to show buttons
108+
that were outside the view is to zoom out again.
109+
110+
If you are planning to teach this lesson to a big room, you may want to check
111+
if the main projector screen or monitor is large enough to show all of the
112+
user interface while having the text large enough that all learners can see it.
113+
114+
::::::::::::::
115+
116+
:::::::::::::::::::::::::::::::::::::::: keypoints
117+
118+
- OpenRefine is a powerful, free, and open source tool that can be used for data cleaning.
119+
- OpenRefine automatically tracks every step, allowing you to backtrack as needed and providing a record of all work done.
120+
121+
::::::::::::::::::::::::::::::::::::::::::::::::::
122+
123+
