diff --git a/episodes/01-introduction.md b/episodes/01-introduction.md index 2c6e058c..1c248d97 100644 --- a/episodes/01-introduction.md +++ b/episodes/01-introduction.md @@ -1,6 +1,6 @@ --- title: What is Wikidata? -teaching: 20 +teaching: 10 exercises: 10 --- @@ -32,7 +32,7 @@ Wikidata uses the same speed and similar collaborative editing platform, but for Wikidata contains various data types (e.g. text, images, quantities, coordinates, geographic shapes, dates). The data can be viewed in a web browser, but it can also be queried via a query interface called SPARQL, which we will cover later in this lesson. Data on Wikidata is published under the Creative Commons Public Domain 1.0 license. Thus, the data can be modified, copied, and distributed without permission. -Wikidata also contains authority files, bibliographic data, and other content normally managed in library databases. +Wikidata also contains authority files, bibliographic data, and other content that can similarly be found or managed in library databases. Importantly, Wikidata can be understood as linked open data, which can be [connected](https://www.wikidata.org/wiki/Wikidata:Data_access#How_can_I_get_data_out_of_Wikidata?) to other open data sets on the web. @@ -50,61 +50,85 @@ Wikidata has many features that make it of interest to librarians and knowledge a. *Scholia*: A tool built on top of Wikidata that visualizes scholarly profiles and research outputs, showing the impact of Wikidata in academic and research contexts. Librarians can showcase Scholia as a tangible example of how data in Wikidata is used for research and scholarship. b. *Crosswalks between systems*: Wikidata’s ability to link various identifiers (e.g., connecting ORCID to GND or VIAF) is beneficial for cross-referencing and data cleaning in library management systems. -## 1\.1 Intro interface +## 1\.1 Wikidata interface This section of the lesson introduces the Wikidata interface as it can be seen in a web browser. 
-Let's see if we as humans can simply read the data on Wikidata: +Let's look at the main elements you can use to read +and interact with the data on Wikidata. -- Explore a Wikidata Item page - - - Start by going to the [Wikidata Main Page](https://www.wikidata.org/wiki/Wikidata:Main_Page) by typing "www.wikidata.org" into your browser. You will see something like this: +- Start by going to the [Wikidata Main Page](https://www.wikidata.org/wiki/Wikidata:Main_Page) by typing "www.wikidata.org" into your browser. You will see something like this: ![](fig/Wikidata_Main_Page.png){alt='Screenshot of the Wikidata main page displaying in a web browser'} *Screenshot of [Wikidata Main Page](https://www.wikidata.org/wiki/Wikidata:Main_Page)* + +## 1\.2 Wikidata Items and Item Pages + +The primary units of data described on Wikidata are "items." Each item has an item page with a unique identifier designated by the letter `Q` followed by a string of digits. Let's explore a Wikidata item page, which will also demonstrate the characteristics of items in Wikidata. + +### Explore a Wikidata Item page - - Now go to the search bar in the top right corner and enter "british library". This will give you a list with search results. Click the entry that says: "British Library (Q23308) national library of the United Kingdom". Now you should see the british library's item page: - [https://www.wikidata.org/wiki/Q23308](https://www.wikidata.org/wiki/Q23308) +- Click in the search bar in the top right corner of the main page and enter "british library". As you start typing, you will see a list of search results. Click the entry that says: "British Library (Q23308) national library of the United Kingdom". Now you should see the British Library's item page: +[https://www.wikidata.org/wiki/Q23308](https://www.wikidata.org/wiki/Q23308) - - Let us explore the item *British Library (Q23308)*. The top part of the item page identifies the item.
It displays: +- Let us explore the item *British Library (Q23308)*. The top part of the item page identifies the item. Here you will see: - - unique identifier (constructed as the capital letter followed by one or more numbers) - - label - - description - - aliases + - label + - description + - unique identifier (constructed as the capital letter Q followed by one or more digits) + - aliases - - Lower on the page is a *statements* section, which shows relationships that have been asserted about the item. A statement has: +- Farther down on the page is a *Statements* subheading. This section shows relationships, or claims, that have been asserted about the item. Statements may include: - - property (constructed as the capital letter P followed by one or more numbers) - - value - - qualifier (optional) - - references (optional) - - can also be called a "triple," which will be explained later - - As you can see, a property can have multiple values for one property; for example "member of"; and can be further specified by qualifiers (not shown on the item page for British Library). + - property (constructed as the capital letter P followed by one or more digits) + - value + - qualifier (optional) + - references (optional) + - As you can see, a property can have multiple values. For example, the *member of* property has several values. These values can be further specified by qualifiers (not shown on the item page for British Library) and supported by references. + - Statements can also be called "triples," since they include three parts (the item, the property relationship, and the property's value), which we will look at more closely later in this lesson. -- There are many special terms and definitions here, like statements, qualifiers and so on.
Since many of these terms can be confusing, you may check [this overview graphic as a reference](https://upload.wikimedia.org/wikipedia/commons/a/ae/Datamodel_in_Wikidata.svg): +Wikidata items, as you can see above, have many special parts, like statements, qualifiers, and so on. The following overview graphic, [directly linked here](https://upload.wikimedia.org/wikipedia/commons/a/ae/Datamodel_in_Wikidata.svg), explains the main elements of a Wikidata item and shows how they may appear on an item page: ![](https://upload.wikimedia.org/wikipedia/commons/a/ae/Datamodel_in_Wikidata.svg){alt='Labeled display of a Wikidata item showing how elements like identifier, description, and staements may be displayed'} -- Most pages can be edited by anyone (note, however, that the British Library - Q23308 item is semi-protected). To edit an item, click the pen icon followed by the word "edit" in the upper-right are of the page. Don't worry if you made a mistake, you can always go back in an item's history and restore or undo changes. - - - "View history" - more later - - "Log in" and other things for registered users +:::::::::::: callout + +### Wikidata editing and change history + +Most pages can be edited by anyone (note, however, that the British Library (Q23308) item is semi-protected), and, like other wiki projects, Wikidata tracks all changes made to an item. +To see the changes made to an item, click "View history". +To edit an item, click the pen icon followed by the word "edit" in the upper-right area of an item page. +Don't worry if you make a mistake; you can always go back in an item's history and restore or undo changes. +We will explore the steps of editing a Wikidata item in [episode 3, "Introduction to editing"](../episodes/03-intro_to_editing.html).
-- All of Wikidata's data is published online under the [Creative Commons CC0 License](https://creativecommons.org/publicdomain/zero/1.0/), which states: +:::::::::::: + +## 1\.3 Wikidata's commitment to open data + +All of Wikidata's data is published freely and openly online under a [Creative Commons CC0 License](https://creativecommons.org/publicdomain/zero/1.0/), which states: "The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission." In other words, the data is openly licensed and reusable. Since Wikidata can also be linked to other data sources on the web, this means Wikidata is *linked open data*. - Follow this link to view a pdf that offers a one-page overview of Wikidata (visual): [https://commons.wikimedia.org/wiki/File:Wikidata-in-brief-1.0.pdf](https://commons.wikimedia.org/wiki/File:Wikidata-in-brief-1.0.pdf) -## 1\.2 Play games to open +:::::::::::: challenge + +### Explore a Wikidata Item + +Locate the Wikidata page of the city you were born in. Look for the population. + +- Has the population changed over time? Some wikidata pages appear in multiple languages. +- Are the aliases and data similar between Wikidata and the various Wikipedia entries in different languages? +- Compare the information in Wikipedia and Wikidata + +:::::::: solution + +- Depending on the detail and amount of information about a place, there may be multiple values regarding a city's population. Because a city changes over time, Wikidata statements can be qualified, including with the addition of a start/end date, or by providing a citation for the data. 
The change in population over time is a good example of why qualifiers matter for Wikidata statements. -- Visit the Wikipedia page of the city you were born in two languages - of you choice (you can choose different language version in the left - side of a Wikipedia page) and look the size of the population. Are - the numbers the same in the different language? Visit the item in - Wikidata. +:::::::: -## 1\.3 Wikidata Item Eastereggs +:::::::::::: + +## 1\.4 Wikidata Item Easter Eggs While most of the Q identifiers are arbitrary numbers, there are a few that suggest some meaning or humor, such as: @@ -120,20 +144,39 @@ While most of the Q identifiers are arbitrary numbers, there are a few that sugg - [Q666 - Number of the beast](https://www.wikidata.org/wiki/Q666) - [Q12345 - Count von Count, Character on Sesame Street](https://www.wikidata.org/wiki/Q12345) -## 1\.4 Linking Wikidata to other Wiki resources +## 1\.5 Linking Wikidata to other Wiki resources + +One of the most important and powerful parts of a Wikidata item page is the *Identifiers* subheading. This special section appears near the end of the item page and records how the item is identified in other databases or knowledge bases. Here, for example, is where you will find information about how an author's Wikidata item relates to various national library catalogs, the Virtual International Authority File, or fan databases that document an author's writings. This linking feature, which is highly developed in Wikidata, makes the data especially valuable to libraries, archives, and other cultural heritage institutions. + +In addition to external identifiers and authority sources, an item page also lists links to the item's Wikipedia articles (if there are any) and to other Wikimedia projects, including Wikimedia Commons, Wikisource, and others.
+ +:::::::::::: challenge + +### Links from Wikipedia to Wikidata + +Let's take a look at the relationships between Wikipedia and Wikidata. Consider, for example, Darwin's [On the Origin of Species](https://en.wikipedia.org/wiki/On_the_Origin_of_Species), a notable scientific work that is described in both resources. + +- What information do both resources share? How would you describe the information in Wikidata, in comparison to that in Wikipedia? How are they similar or different? -- Link from Wikipedia to Wikidata - - e.g. [https://en.wikipedia.org/wiki/On\_the\_Origin\_of\_Species](https://en.wikipedia.org/wiki/On_the_Origin_of_Species) - - \=> Follow the link "Wikidata item" on the left side under "tools" - - \=> [https://www.wikidata.org/wiki/Q20124](https://www.wikidata.org/wiki/Q20124) - - \=> the Wikipedia article is linked on the Wikidata's item page. You can find it on the right side. - - \=> link to WikiCommons and WikiSource +:::::::: solution + + - \=> Follow the "Wikidata item" link in the article's "Tools" menu + - \=> [https://www.wikidata.org/wiki/Q20124](https://www.wikidata.org/wiki/Q20124) + - \=> the Wikipedia article is linked from the Wikidata item page. You can find it on the right side. + - \=> the item page also links to Wikimedia Commons and Wikisource
+ +:::::::: + +:::::::::::: :::::::::::::::::::::::::::::::::::::::: keypoints - Wikidata entities are known as Items, and each item is displayed on a page that is identified with the item's "Q" number -- Relationships between entities are known as Properties, and each property is identified with a "P" number - Statements are assertions about items, which state relationships between items using wikidata properties. +- Relationships between entities are known as Properties, and each property is identified with a "P" number - Statements are also known as "triples" +- Wikidata and Wikipedia are complementary, but Wikidata is focused on basic claims or assertions, not descriptive or narrative information :::::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/02-Wikidata_underlying_concepts.md b/episodes/02-Wikidata_underlying_concepts.md index f65f80ac..1493e3d7 100644 --- a/episodes/02-Wikidata_underlying_concepts.md +++ b/episodes/02-Wikidata_underlying_concepts.md @@ -1,14 +1,14 @@ --- title: Underlying concepts of Wikidata -teaching: 0 -exercises: 0 +teaching: 10 +exercises: 10 --- ::::::::::::::::::::::::::::::::::::::: objectives - Know what a triple is, and relate structure of a Wikidata statement to traditional metadata field structure - Know how linked data can create more context for patrons/users in library catalogs -- Know how linked data can improve recall in library catalogs? (TODO: Check if we want to address this here). +- Know how linked data can improve recall in library catalogs? :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -19,87 +19,125 @@ exercises: 0 :::::::::::::::::::::::::::::::::::::::::::::::::: -## 2\.1 Concepts foundations: ways of storing data. +## 2\.1 Conceptual foundations: ways of storing data -There are many types of databases, the most common types are: +There are many types of database structures and systems. +Two common database types are relational databases and graph databases. 
+Understanding the commonalities and differences between these structures +helps explain what is distinctive about Wikidata's data structure. -### 2\.1.1 Relational databases: +### 2\.1.1 Relational databases A relational database is a set of formally described related tables from which data can be accessed or reassembled. This model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. each table/relation represents one "entity type" and these entities are connected via constrained relationships. This model is fully structured and mostly uses SQL (Structured Query Language) to retrive and manuplate data. -Examples: -![](fig/Relational_database_terms.svg.png){alt='relational database'} +A single database table and its basic parts are shown below. +Note that each row is a set of ordered values that corresponds to a single record. +Each column in the table represents an attribute that all records share, +and each row holds that record's value for the attribute. +Together, the entire table constitutes a relation that can be related +to other tables. + +![](fig/Relational_database_terms.svg.png){alt='Schematic of a data table in a relational database, which can be understood as a series of records (ordered tuple values) with various attributes, which can be related to other tables in the database through structured queries.'} ### 2\.1.2 Graph / Semantic databases -Semantic web is an extension of the World Wide Web standards, which promote common data formats and exchange protocols on the Web, most fundamentally the Resource Description Framework (RDF) is used to store data. Most RDF fundamentally uses SPARQL (Simple Protocol and Rdf Query Language) to read stored data while relational databases uses SQL (Structured Query Language) to do so.
In SQL relational database terms, RDF data can also be considered or viewed as a table with only three columns – the subject column, the predicate column, and the object column. +The Semantic Web is an extension of World Wide Web standards that promotes common +formats and exchange protocols on the Web. For data exchange, the fundamental Semantic Web standard +is the Resource Description Framework, or RDF. Rather than being defined by tables, +this "graph" or semantic structure is defined by relationship statements. RDF defines +a data model, with several serialization formats, for encoding and exchanging graph data on the web. -![](fig/Data_Structure_Diagram.jpg){alt='data structure diagram'} +RDF can be queried and analyzed using a language called SPARQL (SPARQL Protocol +and RDF Query Language). SPARQL has its own syntax, but it plays a role similar to the one +SQL (Structured Query Language) plays in building queries for relational databases. +In SQL relational database terms, RDF data can also be processed as a table, +but with only three columns – the subject column, the predicate column, and the object column. -## 2\.2 Concepts foundations (RDF and RDF triples) +![](fig/Data_Structure_Diagram.jpg){alt='A data structure diagram illustrating a possible connection between a list of triples, represented by a data dictionary, and a graph diagram which visualizes the relationships stipulated by the triples.'} -- The RDF is a conceptual data model, It is based on the idea of making statements about resources in expressions of the form (subject–predicate–object), known as triples. -- The subject denotes the resource, and the predicate denotes traits or aspects of the resource, and expresses a relationship between the subject and the object, for example: John-is-a person, John-born in-1980, John-works as-Engineer +## 2\.2 Conceptual foundations: RDF and Triples +RDF defines a conceptual data model that is based on the idea of +making statements about resources.
Unlike a relational database, +the data model defined by RDF is graph-based, and it relates defined entities +(as Wikidata calls them, *items*) that can be referred to by an Internationalized +Resource Identifier (an IRI, which in practice works much like a URL), and which can be +connected or related to any other defined entity through a standard language. While the data +structures can be complex, they rely on a basic structure called a **triple**, +which consists of a *subject* and an *object*, which are linked together, or related, by a defined +relationship called a *predicate* (as Wikidata calls it, a *property*). +You can read Wikipedia's definition of a [semantic triple](https://en.wikipedia.org/wiki/Semantic_triple). -- RDF data are stored on containers known as triplestores. +![](fig/RDF_subject_predicate_object.jpg){alt='Schematic illustration of an RDF Triple'} -- [https://en.wikipedia.org/wiki/Semantic\_triple](https://en.wikipedia.org/wiki/Semantic_triple) +The basic data statement is expressed in the form *subject–predicate–object*, also known as a *triple*. +The *subject* denotes the resource. In Wikidata, each item (identified by its Q number) serves as the subject of triples. +The *object* is usually another entity, though it may also be a literal value, such as a date or a number; it is related to the subject by the predicate. +The *predicate* denotes traits or aspects of the resource, +and expresses a relationship between the subject and the object, for example: -![](fig/RDF_subject_predicate_object.jpg){alt='RDF Tripe'} +- The British Library *is-a* library +- John *is-a* person +- John *born-in* 1980 +- John *has-occupation* engineer -## 2\.3 Underlying components +The last three are triples about the same subject, "John," with different predicates +and objects. -- Items - Items represents subjects such Douglas Adams and have identifiers that starts with letter "Q" like: Q42 for Douglas Adams.
Each item must have a name in one or more langauges, optionally have alternative names and descrition. -- Properties - Properties represents attributes of the subject such occupation and have identifiers that starts with letter "P" like: P106 for Occupation. -- Claims - Claims are the triples, which combine the formation of Item and Property and a value such: - Douglas Adams (Q42) - occupation (P106) - comedian. - Note: value can be already stored in wikidata, therefore the bot assigns the Q number of the value instead. -- Statement - A Claim is a part of a statement, a statement also includes: References, Ranks, and Qualifiers. -- References - Used to store the source of the claim, using properties, such stated in, qoute, and etc. -- Ranks - A useful component to mark outdated claims. -- Qualifiers - Qualifiers are besicly properties but on claims rather than items. +As you can imagine, Wikidata has a huge number of data items (subjects), and +it includes many millions of triple statements. +Databases that store RDF data are known as triplestores. -::::::::::::::::::::::::::::::::::::::: challenge +## 2\.3 Wikidata concepts -## Is data stored in the RDF triple format part of your work as a librarian? +**Items** +: Items represent things and concepts, including people, places, events, subjects, and more. Examples include the British Library (`Q23308`) and the author Douglas Adams. Wikidata items have identifiers that start with the letter "Q", like `Q42` for Douglas Adams. +  Each item must have a label in one or more languages, and may optionally have alternative names (aliases) and a description. -Take some time to think about if data stored in the RDF triple format -is part of your work as a librarian. Can you give an example in the format of an RDF triplet? +**Properties** +: Properties represent attributes of the subject, such as occupation, and have identifiers that start with the letter "P", like `P106` for occupation.
-::::::::::::::: solution +**Claims** +: Claims are the triple statements, which combine an item, a property, and a value. +For example: `Douglas Adams (Q42) - occupation (P106) - comedian (Q245068)`. *Note:* when the value is itself a Wikidata item, the claim stores that item's Q identifier rather than a text string. -## Solution +**Statement** +: A claim is part of a statement; a statement also includes references, a rank, and qualifiers. -*TO DO*: PLEASE ADD A REAL LIFE EXAMPLE +**References** +: References record the source of a claim, using properties such as *stated in*, *quotation*, and so on. +**Ranks** +: Ranks (preferred, normal, or deprecated) indicate which claims should be favored; for example, outdated values can be kept while the current value is marked as preferred. +**Qualifiers** +: Qualifiers are property-value pairs attached to a claim rather than directly to an item, adding context such as a point in time. + +::::::::::::::::::::::::::::::::::::::: discussion + +### Can you identify triple structures in library data? + +Is data stored in the RDF triple format part of your work as a librarian? +Take some time to think about the data you work with, such as catalog records +or authority files, and whether it could be expressed as triples. +Can you give an example in the format of an RDF triple? ::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::::: ::::::::::::::::::::::::::::::::::::::: challenge ## Point out one RDF triple on the Wikidata item page of former astronaut Mae Jemison. Got to the Wikidata page of Mae Jemison and point out one RDF triple. -An RDF triplet consists of a subject, a predicate and an object. +An RDF triple consists of a subject, a predicate, and an object. Can you assign the three corresponding Wikidata terms? ::::::::::::::: solution ## Solution -Got to Wikidata and either search for "Mae Jemison" or enter the ID *Q34091*. +Go to Wikidata and either search for "Mae Jemison" or enter the ID *Q34091*. In the picture below the statement "Mae C. Jemison - part of - NASA Astronaut Group 12" is an RDF triple.
![](fig/Mae_Jemison_Wikidata.png){alt='Wikidata\_Main\_Page'} *Screenshot of [Wikidata Main Page](https://www.wikidata.org/wiki/Q34091)* @@ -108,30 +146,12 @@ In the picture below the statement "Mae C. Jemison - part of - NASA Astronaut Gr :::::::::::::::::::::::::::::::::::::::::::::::::: -## 2\.4 Scholia - a webserive with Wikidata as underlying database -- Introduction with [The Linked Open Data Cloud](https://www.lod-cloud.net/) -- the structure enables queries -- reference to DBPedia -- you can build your own web services with Wikidata as database > [Scholia](https://scholia.toolforge.org/) - - e.g. search for Alex Bateman -## 2\.5 Wikidata one pager -- [https://commons.wikimedia.org/wiki/File:Wikidata-in-brief-1.0.pdf](https://commons.wikimedia.org/wiki/File:Wikidata-in-brief-1.0.pdf) -## 2\.6 How Wikidata compares with other data sets -- [https://meta.wikimedia.org/wiki/Wikidata/Notes/DBpedia\_and\_Wikidata](https://meta.wikimedia.org/wiki/Wikidata/Notes/DBpedia_and_Wikidata) -- [https://lod-cloud.net/](https://lod-cloud.net/) -FIXME - - - :::::::::::::::::::::::::::::::::::::::: keypoints -- First key point. (FIXME) +- Triples are the basic data structure of graph databases, and they are the conceptual structure of Wikidata statements. +- Wikidata items are identified by a human-readable label, a short description, and a unique identifier that begins with a Q. These items are the subjects of linked Wikidata statements. +- Wikidata defines relationships between items, also known as triple *predicates*, with Wikidata *properties*. +- Wikidata statements can capture library information, such as relationships like creatorship, publication, aboutness, and more. :::::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/index.md b/index.md index 7f51d1ef..39012b44 100644 --- a/index.md +++ b/index.md @@ -29,9 +29,8 @@ Contributions are very welcome** ## Prerequisites 1. Learners need a proper internet connection. - -2.
There is no need for pre installations. - +2. No software needs to be installed in advance. +3. Learners need a [Wikimedia user account](https://www.wikidata.org/w/index.php?title=Special:CreateAccount&returnto=Wikidata%3AMain+Page), which can be created for free and used to edit Wikidata, Wikipedia, or other Wikimedia projects. ::::::::::::::::::::::::::::::::::::::::::::::::::
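The triple model in the episode 2 changes can be sketched in a few lines of Python. In that model a statement has a subject, a predicate, and an object, and RDF data can be viewed as a table with only three columns. This is a minimal illustration under stated assumptions. It is not how Wikidata or any triplestore is implemented. The helpers `match` and `row_to_triples` and the "Jane" table row are hypothetical. The wildcard matching only loosely resembles a triple pattern in SPARQL.

```python
# A minimal sketch of RDF-style triples as (subject, predicate, object) tuples.
# The "John" triples mirror the episode 2 examples; match() and
# row_to_triples() are hypothetical helpers, not part of any RDF library.

Triple = tuple[str, str, str]

triples: list[Triple] = [
    ("British Library", "is-a", "library"),
    ("John", "is-a", "person"),
    ("John", "born-in", "1980"),
    ("John", "has-occupation", "engineer"),
]


def match(store: list[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard,
    loosely like a variable in a SPARQL triple pattern."""
    return [
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def row_to_triples(key_column: str, row: dict[str, str]) -> list[Triple]:
    """Flatten one relational-table row into triples: the row's key becomes
    the subject, each other column name a predicate, each cell an object."""
    subject = row[key_column]
    return [(subject, column, value) for column, value in row.items() if column != key_column]


# Everything asserted about John.
print(match(triples, s="John"))
# → [('John', 'is-a', 'person'), ('John', 'born-in', '1980'), ('John', 'has-occupation', 'engineer')]

# Which subjects are persons?
print([subj for subj, _, _ in match(triples, p="is-a", o="person")])
# → ['John']

# A hypothetical table row rewritten as three-column (triple) data.
print(row_to_triples("name", {"name": "Jane", "born-in": "1975", "has-occupation": "librarian"}))
# → [('Jane', 'born-in', '1975'), ('Jane', 'has-occupation', 'librarian')]
```

Each row of a relational table becomes several triples that share one subject. This is the sense in which RDF data can be read as a table with only subject, predicate, and object columns.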