Functionality

Home and sidebar
Creating a new run
    Naming your run
    Selecting collections
    Selecting keyword lists
    Selecting metadata files
    Running the tool
    Reports
        Summary report
        Individual reports
Managing keyword lists
    Adding a new keyword list
    Editing a keyword list
    Deleting a keyword list
Managing collections
    Adding a new collection and uploading files
    Editing a collection
    Deleting a collection
Managing past runs
    Viewing a report from a past run
    Deleting a past run
Using single-document sharing

Home and sidebar

When you first open the application, you will be directed to the home page. The sidebar, which can be opened from the top left, contains the rest of the functionality for the tool, detailed below.

Creating a new run

This is the main functionality of the tool: it runs the subcorpora tool on a list of collections, a list of keyword lists, and the metadata files, then generates a summary of the results and folders of the subcorpora.

Figure 1: Click on the "Create Run" section on the sidebar.

Naming your run

First, name your run. The run is then given a unique ID that combines the name, the date, and the time. A new folder for the run is created under data/runs; it will contain all the files generated by the run.
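
As a rough illustration of how such an ID and folder could be built, here is a minimal Python sketch. The exact ID format Winnow uses is not documented here, so the slug rules, the separator, and the function names make_run_id and create_run_folder are assumptions made for this example only.

```python
import re
from datetime import datetime
from pathlib import Path


def make_run_id(name: str, now: datetime) -> str:
    """Combine the run name, the date, and the time into one ID.

    Assumption: the name is lowercased and non-alphanumeric runs become
    hyphens. The real tool may format its IDs differently.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", name.strip()).strip("-").lower()
    return f"{slug}-{now:%Y-%m-%d}-{now:%H-%M-%S}"


def create_run_folder(app_root: Path, run_id: str) -> Path:
    """Create data/runs/<run_id> under app_root and return its path."""
    run_dir = app_root / "data" / "runs" / run_id
    # exist_ok=False makes an accidental ID collision fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

With these assumptions, make_run_id("Pilot run", datetime(2021, 3, 4, 9, 30, 0)) returns "pilot-run-2021-03-04-09-30-00".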

Figure 2: Naming the run.

Selecting collections

Here, you choose the collections to include in your run. You can select multiple collections; each collection is run against each selected keyword list. Only collections whose corpus files have been uploaded and that appear in the "Collections" list can be selected. To add a new collection, use the "Collections" tab.
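
The pairing rule described above (every collection against every keyword list) is a Cartesian product. The short Python sketch below shows how the resulting combinations can be enumerated; the function name plan_pairs and the sample names are hypothetical and are not taken from the tool's code.

```python
from itertools import product


def plan_pairs(collections, keyword_lists):
    """Return every (collection, keyword list) combination as a list of tuples."""
    return list(product(collections, keyword_lists))


# Hypothetical selection: 2 collections and 3 keyword lists give 6 pairs.
pairs = plan_pairs(["Collection A", "Collection B"],
                   ["List 1", "List 2", "List 3"])
```

Each pair corresponds to one individual report, so selecting many collections and many keyword lists multiplies the number of reports.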

Figure 3: Selecting the collections that will be in the run.

Selecting keyword lists

Here, you choose the keyword lists to include in your run. You can select multiple keyword lists. Only keyword lists that appear in the "Keyword Lists" list can be selected. To add a new keyword list, use the "Keyword Lists" tab.

Figure 4: Selecting the keyword lists that will be in the run.

Selecting metadata files

Here, you choose the metadata files to include in your run (or upload new ones). You can choose only one file for the collection metadata and one for the interviewee metadata.

The collection metadata must be a CSV file with (1) a header in the first row, (2) one row per interview text, and (3) the following columns with the exact column names:

interview_id: the unique id of that interview
project_file_name: the filename of the project (needs to match the one uploaded)
no_transcript: TRUE or FALSE, depending on whether there is an actual transcript (physical file) for the interview
date_of_first_interview: the date of the first interview in the form mm/dd/yyyy
interviewee_ids: a semicolon-separated list of all the unique interviewee ids present in the interview
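
Before uploading, it can help to check a collection metadata file against the requirements listed above. The following Python sketch is one way to do that; it is not part of Winnow, the function name check_collection_metadata is hypothetical, and treating duplicate interview ids as a problem is an assumption based on the word "unique".

```python
import csv
import io
from datetime import datetime

COLLECTION_COLUMNS = [
    "interview_id",
    "project_file_name",
    "no_transcript",
    "date_of_first_interview",
    "interviewee_ids",
]


def check_collection_metadata(csv_text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in COLLECTION_COLUMNS if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]

    problems = []
    seen_ids = set()
    # Row 1 is the header, so data rows start at row 2.
    for row_number, row in enumerate(reader, start=2):
        interview_id = row["interview_id"] or ""
        if interview_id in seen_ids:
            problems.append(f"row {row_number}: duplicate interview_id")
        seen_ids.add(interview_id)

        if (row["no_transcript"] or "") not in ("TRUE", "FALSE"):
            problems.append(f"row {row_number}: no_transcript must be TRUE or FALSE")

        try:
            # mm/dd/yyyy, as required for date_of_first_interview.
            datetime.strptime(row["date_of_first_interview"] or "", "%m/%d/%Y")
        except ValueError:
            problems.append(f"row {row_number}: date is not in mm/dd/yyyy form")
    return problems
```

Read the file with open(path, newline="", encoding="utf-8").read() and pass the text in; the path and encoding are up to you.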

The interviewee metadata must be a CSV file with (1) a header in the first row, (2) one row per interviewee, and (3) the following columns with the exact column names:

interviewee_id: the unique id of that interviewee
interviewee_name: the name of the interviewee in the form LASTNAME, FIRSTNAME MIDDLENAME
birth_decade: the decade that the interviewee was born
interviewee_birth_country: the birth country of the interviewee
sex: the sex of the interviewee
identified_race: the identified race of the interviewee
education: the education level of the interviewee
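
The two metadata files are linked through interviewee ids: each interviewee_ids cell in the collection metadata lists ids that should appear as interviewee_id rows in the interviewee metadata. The Python sketch below shows one way to look for ids that have no matching row. Whether Winnow performs such a check itself is not stated here, and the function names split_ids and find_unknown_interviewees are hypothetical.

```python
import csv
import io


def split_ids(cell: str) -> list[str]:
    """Split a semicolon-separated cell into trimmed, non-empty ids."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def find_unknown_interviewees(collection_csv: str, interviewee_csv: str) -> dict:
    """Map each interview_id to the interviewee ids missing from interviewee_csv."""
    known = {
        row["interviewee_id"]
        for row in csv.DictReader(io.StringIO(interviewee_csv))
    }
    unknown = {}
    for row in csv.DictReader(io.StringIO(collection_csv)):
        missing = [i for i in split_ids(row["interviewee_ids"]) if i not in known]
        if missing:
            unknown[row["interview_id"]] = missing
    return unknown
```

An empty dictionary means every id referenced by an interview has a row in the interviewee metadata.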

Figure 5: Selecting the metadata files to be used in the run.

Running the tool

The Python script then runs with the information you have given it. Here, you can follow its progress.

Figure 6: The progress bar with the progress message.

Reports

After each run, Winnow generates a summary report (across all keyword lists and all collections) and individual reports for each combination of keyword list and collection. You can move between these reports using the navigation bar at the top of the page.

Summary report

The summary report summarizes the run across all keyword lists and all collections.

Figure 7: Example summary report. The report navigation is at the top of the page, followed by basic information and then graphs and charts.

Currently, the summary report contains the following information:

Total collections
Percent of collections with keywords
Total interviews
Percent of interviews with keywords
Total keywords searched for
Total keywords found
Data directory where files generated are located
Graph of keyword use over time
Graph of count of keywords found
Graph of time range of interviews
Graph of time range of interviewee birth dates
Graph of race of interviewees
Graph of sex of interviewees
Graph of education of interviewees

Below is an example of the graph that shows keyword use over time. You can choose which keywords are shown.

Figure 8: Graph of keyword use over time.

Individual reports

An individual report looks very similar to the summary report. It contains the following information:

Collection name
Keyword list name
Total keywords
Total interviews
Percent of interviews with keywords
Total keywords found
Percent keyword contexts flagged
Percent keyword contexts marked as false hits
Data directory and subcorpora folders with files generated from this run
Graph of keyword use over time
Graph of count of keywords found
Graph of time range of interviews
Graph of time range of interviewee birth dates
Graph of race of interviewees
Graph of sex of interviewees
Graph of education of interviewees
Tables of keywords in context with flagging and false hit marker abilities

Figure 9: Example of keywords in context with marked false hits and marked flagged contexts.

Managing keyword lists

Figure 10: Table of keyword lists.

Adding a new keyword list

Figure 11: Adding a new keyword list.

Editing a keyword list

Figure 12: Editing a keyword list.

Deleting a keyword list

Figure 13: Deleting a keyword list.

Managing collections

Figure 14: Table of collections.

Adding a new collection and uploading files

Figure 15: Adding a new collection.

Editing a collection

Figure 16: Editing a collection.

Currently, you cannot edit, delete, or add files in a collection through the interface. To do so, go to data/corpus-files and make the changes manually.

Deleting a collection

Figure 17: Deleting a collection.

Managing past runs

Figure 18: Table of past runs.

Viewing a report from a past run

Simply click on the link; it will direct you to the report.

Deleting a past run

Figure 19: Deleting a past run.

Using single-document sharing

Under data/, there is a file called session.json. Share this file with a teammate and have them place it in their own data/ folder. They will then have your current state.
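
For the receiving side, the copy step can be scripted. The Python sketch below is only a convenience under stated assumptions: the function name install_shared_session and the backup file name session.backup.json are hypothetical, and the contents of session.json are not described in this document, so the script checks only that the file parses as JSON.

```python
import json
import shutil
from pathlib import Path


def install_shared_session(shared_file: Path, app_root: Path) -> Path:
    """Copy a received session.json into app_root/data/ and return the new path."""
    # Fail early if the shared file is not valid JSON.
    with shared_file.open(encoding="utf-8") as handle:
        json.load(handle)

    target = app_root / "data" / "session.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # Assumption: keep one backup of the previous state next to the new file.
        shutil.copy2(target, target.with_name("session.backup.json"))
    shutil.copy2(shared_file, target)
    return target
```

Keeping a backup means an existing local state is not lost if the shared file turns out to be the wrong one.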

This code belongs to the Stanford Oral History Text Analysis Project and is licensed under The MIT License.