Functionality
Home and sidebar
Creating a new run
Naming your run
Selecting collections
Selecting keyword lists
Selecting metadata file
Running the tool
Reports
Summary report
Individual reports
Managing keyword lists
Adding a new keyword list
Editing a keyword list
Deleting a keyword list
Managing collections
Adding a new collection and uploading files
Editing a collection
Deleting a collection
Managing past runs
Viewing the report from a past run
Deleting a past run
Using single-document sharing
When you first open the application, you will be directed to the home page. The sidebar, which opens from the top left, gives access to the rest of the tool's functionality, described below.
This is the main functionality of the tool: running the subcorpora tool on a list of collections, a list of keyword lists, and a metadata file, then generating a summary of the results along with folders containing the subcorpora.
Figure 1: Click on the "Create Run" section on the sidebar.
First, name your run. The run is then assigned a unique ID that combines the name, the date, and the time. A new folder for the run is created under data/runs; it will contain all the files generated once the run is complete.
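The exact ID format is not documented on this page. The following is a minimal Python sketch, assuming a name-date-time layout and a data/runs parent folder, of how such an ID and folder could be produced. `make_run_id` and `create_run_folder` are hypothetical helpers, not functions from the tool.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_id(name: str, when: Optional[datetime] = None) -> str:
    """Combine a run name with the date and time (hypothetical ID format)."""
    when = when or datetime.now()
    # Replace characters that are awkward in folder names with underscores.
    safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip())
    return f"{safe_name}-{when:%Y-%m-%d-%H%M%S}"


def create_run_folder(run_id: str, data_dir: Path = Path("data")) -> Path:
    """Create data/runs/<run_id>, the folder that will hold the run's output."""
    run_dir = data_dir / "runs" / run_id
    # exist_ok=False makes a clash with an existing run fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Including the time down to the second is what keeps two runs with the same name apart.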
Figure 2: Naming the run.
Here, you choose the collections to include in your run. You can select multiple collections; each collection will be run against each keyword list. Only collections whose corpus files have been uploaded and that appear in the "Collections" list can be selected. To add a new collection, use the "Collections" tab.
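Because every selected collection is run against every selected keyword list, the number of individual reports is the number of collections times the number of keyword lists. This short Python sketch enumerates those combinations; `plan_run` is a hypothetical helper used only for illustration.

```python
from itertools import product


def plan_run(collections: list[str], keyword_lists: list[str]) -> list[tuple[str, str]]:
    """Return every (collection, keyword list) pair a run would process."""
    if not collections or not keyword_lists:
        raise ValueError("select at least one collection and one keyword list")
    return list(product(collections, keyword_lists))
```

For example, `plan_run(["A", "B"], ["k1", "k2", "k3"])` returns six pairs, one per individual report.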
Figure 3: Selecting the collections that will be in the run.
Here, you choose the keyword lists to include in your run. You can select multiple keyword lists. Only keyword lists that appear in the "Keyword Lists" list can be selected. To add a new keyword list, use the "Keyword Lists" tab.
Figure 4: Selecting the keyword lists that will be in the run.
Here, you choose the metadata files to include in your run (or upload new ones). You can choose only one file for the collection metadata and one for the interviewee metadata.
The collection metadata must be a CSV file with (1) a header in the first row, (2) one row per interview text, and (3) the following columns with the exact column names:
- interview_id: the unique ID of the interview
- project_file_name: the filename of the project (must match the name of the uploaded file)
- no_transcript: TRUE or FALSE, indicating whether there is an actual transcript (physical file) for the interview
- date_of_first_interview: the date of the first interview, in the form mm/dd/yyyy
- interviewee_ids: a semicolon-separated list of the unique IDs of all interviewees present in the interview
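Before uploading, you may want to check a collection metadata file against the rules above. The following is a minimal Python sketch using only the standard library. `load_collection_metadata` is a hypothetical helper, and accepting TRUE/FALSE in any letter case is an assumption of the sketch, not a documented rule of the tool.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = [
    "interview_id",
    "project_file_name",
    "no_transcript",
    "date_of_first_interview",
    "interviewee_ids",
]


def load_collection_metadata(text: str) -> list[dict]:
    """Parse collection metadata CSV text and check the documented format."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # row 1 is the header
        # Assumption: case-insensitive TRUE/FALSE is tolerated here.
        if row["no_transcript"].strip().upper() not in ("TRUE", "FALSE"):
            raise ValueError(f"row {line_no}: no_transcript must be TRUE or FALSE")
        try:
            datetime.strptime(row["date_of_first_interview"], "%m/%d/%Y")
        except ValueError:
            raise ValueError(f"row {line_no}: date must be mm/dd/yyyy") from None
        # Split the semicolon-separated IDs, dropping blanks and stray spaces.
        row["interviewee_ids"] = [
            i.strip() for i in row["interviewee_ids"].split(";") if i.strip()
        ]
        rows.append(row)
    return rows
```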
The interviewee metadata must be a CSV file with (1) a header in the first row, (2) one row per interviewee, and (3) the following columns with the exact column names:
- interviewee_id: the unique ID of the interviewee
- interviewee_name: the name of the interviewee, in the form LASTNAME, FIRSTNAME MIDDLENAME
- birth_decade: the decade in which the interviewee was born
- interviewee_birth_country: the birth country of the interviewee
- sex: the sex of the interviewee
- identified_race: the identified race of the interviewee
- education: the education level of the interviewee
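Since interviewee_name contains a comma, that field has to be quoted in the CSV (for example `"DOE, JANE A"`), which spreadsheet exports normally do for you. The sketch below loads the interviewee file and lists any IDs referenced in the collection metadata's interviewee_ids column that have no matching interviewee row. It assumes, though the page does not state it outright, that those IDs are meant to match interviewee_id. Both function names are hypothetical.

```python
import csv
import io

INTERVIEWEE_COLUMNS = [
    "interviewee_id", "interviewee_name", "birth_decade",
    "interviewee_birth_country", "sex", "identified_race", "education",
]


def load_interviewees(text: str) -> dict[str, dict]:
    """Parse interviewee metadata CSV text into a dict keyed by interviewee_id."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in INTERVIEWEE_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    people = {}
    for row in reader:
        if row["interviewee_id"] in people:
            raise ValueError(f"duplicate interviewee_id: {row['interviewee_id']}")
        people[row["interviewee_id"]] = row
    return people


def unknown_interviewee_ids(referenced_ids: list[str], people: dict[str, dict]) -> list[str]:
    """Return IDs used in the collection metadata that have no interviewee row."""
    # Iterating over the dict yields its keys, i.e. the known interviewee IDs.
    return sorted(set(referenced_ids).difference(people))
```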
Figure 5: Selecting the metadata file to be used in the run.
The tool's Python script then runs with the options you have selected, and you can follow its progress on this page.
Figure 6: The progress bar with the progress message.
After each run, Winnow generates a summary report (across all keyword lists and all collections) and individual reports for each combination of keyword list and collection. The user can navigate between these reports by using the navigation bar at the top of the page.
The summary report summarizes the run across all keyword lists and all collections.
Figure 7: Example summary report. The report navigation is at the top of the page, basic information appears below it, and graphs and charts follow.
Currently, the summary report contains the following information:
- Total collections
- Percent of collections with keywords
- Total interviews
- Percent of interviews with keywords
- Total keywords searched for
- Total keywords found
- Data directory where files generated are located
- Graph of keyword use over time
- Graph of count of keywords found
- Graph of time range of interviews
- Graph of time range of interviewee birth dates
- Graph of race of interviewees
- Graph of sex of interviewees
- Graph of education of interviewees
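To make the interview-level figures in this list concrete, here is a minimal Python sketch that derives them from per-interview hit counts. The input shape (`interview_id -> {keyword: count}`) is a hypothetical one chosen for illustration. "Total keywords found" is interpreted here as the number of distinct keywords with at least one hit; the tool may define it differently (for example, as total occurrences).

```python
from collections import Counter


def summarize(hits: dict[str, dict[str, int]], keywords: list[str]) -> dict:
    """Compute summary figures from hypothetical per-interview keyword counts."""
    keyword_counts: Counter = Counter()
    interviews_with_hits = 0
    for counts in hits.values():
        found = {k: n for k, n in counts.items() if n > 0}
        if found:
            interviews_with_hits += 1
        keyword_counts.update(found)  # adds this interview's counts to the totals
    total = len(hits)
    return {
        "total_interviews": total,
        "percent_interviews_with_keywords": (
            100 * interviews_with_hits / total if total else 0.0
        ),
        "total_keywords_searched": len(set(keywords)),
        # Distinct keywords with at least one hit (see the note above).
        "total_keywords_found": len(keyword_counts),
        # Per-keyword totals: the data behind a "count of keywords found" graph.
        "keyword_counts": keyword_counts,
    }
```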
Below is an example of the graph that shows keyword use over time. You can choose which keywords are shown and which are hidden.
Figure 8: Graph of keyword use over time.
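This page does not say which date the graph uses. The sketch below assumes the year of each interview's date_of_first_interview from the collection metadata and sums keyword hits per year. The optional `selected` argument mimics showing or hiding keywords. `keyword_use_by_year` and its input shapes are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Optional


def keyword_use_by_year(
    hits: dict[str, dict[str, int]],
    dates: dict[str, str],
    selected: Optional[set[str]] = None,
) -> dict[str, dict[int, int]]:
    """Sum keyword hits per year of each interview's date_of_first_interview.

    hits:     interview_id -> {keyword: count}
    dates:    interview_id -> "mm/dd/yyyy" from the collection metadata
    selected: keywords to keep (None keeps all), like the graph's toggles
    """
    series = defaultdict(lambda: defaultdict(int))
    for interview_id, counts in hits.items():
        year = datetime.strptime(dates[interview_id], "%m/%d/%Y").year
        for keyword, n in counts.items():
            if selected is None or keyword in selected:
                series[keyword][year] += n
    # Plain dicts ordered by year, ready to plot as one line per keyword.
    return {k: dict(sorted(by_year.items())) for k, by_year in series.items()}
```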
An individual report looks very similar to the summary report. It contains the following information:
- Collection name
- Keyword list name
- Total keywords
- Total interviews
- Percent of interviews with keywords
- Total keywords found
- Percent keyword contexts flagged
- Percent keyword contexts marked as false hits
- Data directory and subcorpora folders with files generated from this run
- Graph of keyword use over time
- Graph of count of keywords found
- Graph of time range of interviews
- Graph of time range of interviewee birth dates
- Graph of race of interviewees
- Graph of sex of interviewees
- Graph of education of interviewees
- Tables of keywords in context, in which you can flag contexts and mark them as false hits
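The keyword-in-context tables and the two percentages above can be illustrated with a small sketch: find each keyword match, keep a window of surrounding text, and compute the share of contexts the user has flagged or marked as false hits. Whole-word, case-insensitive matching and the 40-character window are assumptions of this sketch; the tool's own matching rules are not described here. `KeywordContext`, `keywords_in_context`, and `percent` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class KeywordContext:
    keyword: str
    left: str
    match: str
    right: str
    flagged: bool = False    # set by the user in the report table
    false_hit: bool = False  # set by the user in the report table


def keywords_in_context(text: str, keywords: list[str], window: int = 40) -> list[KeywordContext]:
    """Find whole-word, case-insensitive matches with `window` characters on each side."""
    contexts = []
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            contexts.append(KeywordContext(
                keyword=kw,
                left=text[max(0, m.start() - window):m.start()],
                match=m.group(0),
                right=text[m.end():m.end() + window],
            ))
    return contexts


def percent(contexts: list[KeywordContext], attr: str) -> float:
    """Percent of contexts whose boolean attribute ('flagged' or 'false_hit') is set."""
    if not contexts:
        return 0.0
    return 100 * sum(getattr(c, attr) for c in contexts) / len(contexts)
```

With whole-word matching, a keyword such as "war" does not match inside "Warfare"; whether that is desirable depends on your keyword lists.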
Figure 9: Example of keywords in context with marked false hits and marked flagged contexts.
Figure 10: Table of keyword lists.
Figure 11: Adding a new keyword list.
Figure 12: Editing a keyword list.
Figure 13: Deleting a keyword list.
Figure 14: Table of collections.
Figure 15: Adding a new collection.
Figure 16: Editing a collection.
Currently, you cannot add, edit, or delete the files in a collection through the interface. To change them, you have to work directly in data/corpus-files.
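If you need to add a transcript by hand, here is a minimal Python sketch. It assumes, without confirmation from this page, that each collection is a subfolder of data/corpus-files named after the collection, so check your own data/corpus-files layout first. `add_file_to_collection` is a hypothetical helper. The file name should also match the project_file_name column in the collection metadata.

```python
import shutil
from pathlib import Path


def add_file_to_collection(
    src: Path, collection: str, corpus_dir: Path = Path("data/corpus-files")
) -> Path:
    """Copy a transcript into a collection folder (assumed one subfolder per collection)."""
    dest_dir = corpus_dir / collection
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"no collection folder at {dest_dir}")
    dest = dest_dir / src.name
    if dest.exists():
        # Refuse to overwrite silently; delete the old file yourself to replace it.
        raise FileExistsError(f"{dest} already exists")
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```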
Figure 17: Deleting a collection.
Figure 18: Table of past runs.
Click a run's link in the table to open its report.
Figure 19: Deleting a past run.
Under data/, there is a file called session.json. Share this file with a teammate and have them place it in their own data/ folder. They will then have the same current state as you.
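Placing the shared file overwrites the teammate's existing session.json, so keeping a backup is a reasonable precaution. The sketch below checks that the shared file is valid JSON, backs up the current file, and then copies the shared one into data/. The helper name and the session.json.bak backup name are hypothetical. Because this page does not say when the application reads session.json, doing this while the application is closed is likely the safest choice, though that is an assumption.

```python
import json
import shutil
from pathlib import Path


def install_shared_session(shared_file: Path, data_dir: Path = Path("data")) -> Path:
    """Copy a teammate's session.json into data/, backing up the current one first."""
    # Fail early if the shared file is not valid JSON (raises json.JSONDecodeError).
    json.loads(shared_file.read_text(encoding="utf-8"))
    target = data_dir / "session.json"
    if target.exists():
        shutil.copy2(target, data_dir / "session.json.bak")  # hypothetical backup name
    shutil.copy2(shared_file, target)
    return target
```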
This code belongs to the Stanford Oral History Text Analysis Project and is licensed under The MIT License.