Functionality

jadelintott edited this page Oct 25, 2020 · 17 revisions

Table of contents

Home and sidebar
Creating a new run
    Naming your run
    Selecting collections
    Selecting keyword lists
    Selecting metadata files
    Running the tool
    Reports
        Summary report
        Individual reports
Managing keyword lists
    Adding a new keyword list
    Editing a keyword list
    Deleting a keyword list
Managing collections
    Adding a new collection and uploading files
    Editing a collection
    Deleting a collection
Managing past runs
    Viewing a report from a past run
    Deleting a past run
Using single-document sharing

Home and sidebar

When you first open the application, you will be directed to the home page. The sidebar, which can be opened from the top left, contains the rest of the functionality for the tool, detailed below.

Creating a new run

This is the main function of the tool: it runs a set of collections against a set of keyword lists, using the metadata files, and generates a summary of the results along with folders of the subcorpora.

Figure 1: Click on the "Create Run" section on the sidebar.

Naming your run

First, name your run. The run is then given a unique ID that combines the name, the date, and the time. A new folder for the run is created under data/runs; it will contain all the files generated once the run is complete.
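
As a rough illustration of how such an ID and folder could be built, here is a minimal Python sketch. The exact format Winnow uses is not documented here, so the separator characters, the timestamp layout, and the helper names `make_run_id` and `make_run_folder` are all assumptions made for the example.

```python
import re
from datetime import datetime
from pathlib import Path


def make_run_id(name: str, now: datetime) -> str:
    """Combine a run name with the date and time (assumed format, not Winnow's)."""
    # Replace characters that are awkward in folder names with underscores.
    safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip()).strip("_")
    return f"{safe_name}_{now:%Y-%m-%d}_{now:%H-%M-%S}"


def make_run_folder(data_dir: Path, run_id: str) -> Path:
    """Create <data_dir>/runs/<run_id> and return its path."""
    run_dir = data_dir / "runs" / run_id
    # exist_ok=False makes a clash between two identical IDs fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


run_id = make_run_id("My first run", datetime(2020, 10, 25, 14, 30, 0))
print(run_id)  # My_first_run_2020-10-25_14-30-00
```

Including the time down to the second is what keeps two runs with the same name on the same day from sharing a folder.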

Figure 2: Naming the run.

Selecting collections

Here, you choose the collections that you want to include in your run. You can select multiple collections; each collection will be run against each keyword list. You can only select collections whose corpus files have been uploaded and that appear in the "Collections" list. To add a new collection, use the "Collections" tab.
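
To make "each collection will be run against each keyword list" concrete, the sketch below enumerates the pairings for a hypothetical selection. The collection and keyword list names are invented for the example, and this is not Winnow's own code.

```python
from itertools import product

collections = ["collection_a", "collection_b"]  # hypothetical names
keyword_lists = ["list_1", "list_2", "list_3"]  # hypothetical names

# Every collection is paired with every keyword list.
pairs = list(product(collections, keyword_lists))
for collection, keyword_list in pairs:
    print(f"{collection} x {keyword_list}")

print(len(pairs))  # 6
```

Selecting 2 collections and 3 keyword lists therefore yields 6 pairings, which is also the number of individual reports to expect, since one is generated per combination.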

Figure 3: Selecting the collections that will be in the run.

Selecting keyword lists

Here, you choose the keyword lists that you want to include in your run. You can select multiple keyword lists. You can only select keyword lists that appear in the "Keyword Lists" list. To add a new keyword list, use the "Keyword Lists" tab.

Figure 4: Selecting the keyword lists that will be in the run.

Selecting metadata files

Here, you choose the metadata files that you want to include in your run (or upload new ones). You can choose only one file for the collection metadata and one for the interviewee metadata.

The collection metadata must be a CSV file with (1) a header in the first row, (2) one row per interview text, and (3) the following columns with the exact column names:

  • interview_id: the unique id of that interview
  • project_file_name: the filename of the project (needs to match the one uploaded)
  • no_transcript: TRUE or FALSE--whether or not there is an actual transcript (physical file) for the interview
  • date_of_first_interview: the date of the first interview in the form mm/dd/yyyy
  • interviewee_ids: a semicolon-separated list of all the unique interviewee ids present in the interview
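
Before uploading, it can help to check a collection metadata file against the rules above. The following is a minimal sketch, not part of Winnow: the function names are hypothetical, and the strictness of each check (for example, accepting only upper-case TRUE and FALSE) is an assumption based on the column descriptions.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = {
    "interview_id", "project_file_name", "no_transcript",
    "date_of_first_interview", "interviewee_ids",
}


def split_interviewee_ids(cell: str) -> list:
    """Split a semicolon-separated cell into ids, dropping empty entries."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def check_collection_metadata(csv_text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        # Assumption: only the exact strings TRUE and FALSE are valid.
        if row["no_transcript"] not in ("TRUE", "FALSE"):
            problems.append(f"row {row_number}: no_transcript must be TRUE or FALSE")
        try:
            # Accepts mm/dd/yyyy; months and days without zero-padding also parse.
            datetime.strptime(row["date_of_first_interview"] or "", "%m/%d/%Y")
        except ValueError:
            problems.append(f"row {row_number}: date must be mm/dd/yyyy")
    return problems
```

For example, a row with `no` in no_transcript and `1998-03-15` as the date would produce two reported problems, while `FALSE` and `03/15/1998` would produce none.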

The interviewee metadata must be a CSV file with (1) a header in the first row, (2) one row per interviewee, and (3) the following columns with the exact column names:

  • interviewee_id: the unique id of that interviewee
  • interviewee_name: the name of the interviewee in the form LASTNAME, FIRSTNAME MIDDLENAME
  • birth_decade: the decade that the interviewee was born
  • interviewee_birth_country: the birth country of the interviewee
  • sex: the sex of the interviewee
  • identified_race: the identified race of the interviewee
  • education: the education level of the interviewee
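
Because the two files are linked by interviewee ids, one useful sanity check is to confirm that every id listed in the collection metadata also appears in the interviewee metadata. The sketch below assumes that this correspondence is expected, which the page does not state outright; the function name is hypothetical. Note that interviewee_name values contain a comma, so they must be quoted in the CSV file.

```python
import csv
import io


def unknown_interviewee_ids(collection_csv: str, interviewee_csv: str) -> set:
    """Return ids referenced by interviews but absent from the interviewee file."""
    known = {
        (row["interviewee_id"] or "").strip()
        for row in csv.DictReader(io.StringIO(interviewee_csv))
    }
    referenced = set()
    for row in csv.DictReader(io.StringIO(collection_csv)):
        # interviewee_ids is a semicolon-separated list.
        for part in (row["interviewee_ids"] or "").split(";"):
            if part.strip():
                referenced.add(part.strip())
    return referenced - known


# Shortened example files; real files need all the required columns.
collection_csv = "interview_id,interviewee_ids\n1,p1;p2\n2,p3\n"
interviewee_csv = (
    "interviewee_id,interviewee_name\n"
    'p1,"DOE, JANE A"\n'
    'p2,"ROE, RICHARD"\n'
)
print(unknown_interviewee_ids(collection_csv, interviewee_csv))  # {'p3'}
```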

Figure 5: Selecting the metadata file to be used in the run.

Running the tool

The tool's Python script then runs with the information that you have provided. This page shows its progress.

Figure 6: The progress bar with the progress message.

Reports

After each run, Winnow generates a summary report (across all keyword lists and all collections) and individual reports for each combination of keyword list and collection. The user can navigate between these reports by using the navigation bar at the top of the page.

Summary report

The summary report summarizes the run across all keyword lists and all collections.

Figure 7: Example summary report. The report navigation is at the top of the page. Basic information appears below it, followed by graphs and charts.

Currently, the summary report contains the following information:

  • Total collections
  • Percent of collections with keywords
  • Total interviews
  • Percent of interviews with keywords
  • Total keywords searched for
  • Total keywords found
  • Data directory where files generated are located
  • Graph of keyword use over time
  • Graph of count of keywords found
  • Graph of time range of interviews
  • Graph of time range of interviewee birth dates
  • Graph of race of interviewees
  • Graph of sex of interviewees
  • Graph of education of interviewees

Below is an example of the graph that shows keyword use over time. You can choose which keywords are shown.

Figure 8: Graph of keyword use over time.

Individual reports

An individual report looks very similar to the summary report. It contains the following information:

  • Collection name
  • Keyword list name
  • Total keywords
  • Total interviews
  • Percent of interviews with keywords
  • Total keywords found
  • Percent keyword contexts flagged
  • Percent keyword contexts marked as false hits
  • Data directory and subcorpora folders with files generated from this run
  • Graph of keyword use over time
  • Graph of count of keywords found
  • Graph of time range of interviews
  • Graph of time range of interviewee birth dates
  • Graph of race of interviewees
  • Graph of sex of interviewees
  • Graph of education of interviewees
  • Tables of keywords in context, with the ability to flag contexts and mark false hits

Figure 9: Example of keywords in context with marked false hits and marked flagged contexts.

Managing keyword lists

Figure 10: Table of keyword lists.

Adding a new keyword list

Figure 11: Adding a new keyword list.

Editing a keyword list

Figure 12: Editing a keyword list.

Deleting a keyword list

Figure 13: Deleting a keyword list.

Managing collections

Figure 14: Table of collections.

Adding a new collection and uploading files

Figure 15: Adding a new collection.

Editing a collection

Figure 16: Editing a collection.

Currently, you cannot edit, delete, or add a collection's files through the interface; you have to make those changes manually in data/corpus-files.

Deleting a collection

Figure 17: Deleting a collection.

Managing past runs

Figure 18: Table of past runs.

Viewing a report from a past run

Click the link for the run; it takes you to the report.

Deleting a past run

Figure 19: Deleting a past run.

Using single-document sharing

Under data/, there is a file called session.json. Share this file with a teammate and have them place it in their own data/ folder. They will then have your current state.
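
On the receiving side, placing the file can be as simple as a copy. The sketch below is one cautious way to do it, under the assumption that an existing session.json is worth keeping: it saves the old file first under a hypothetical backup name, session.backup.json, which Winnow itself does not use.

```python
import shutil
from pathlib import Path


def install_session(received: Path, data_dir: Path) -> Path:
    """Copy a received session.json into data_dir, backing up any existing one."""
    target = data_dir / "session.json"
    if target.exists():
        # Keep the previous state so it can be restored by hand.
        shutil.copy2(target, data_dir / "session.backup.json")
    shutil.copy2(received, target)
    return target
```

For example, `install_session(Path("Downloads/session.json"), Path("data"))` would replace data/session.json with the shared copy; the download path here is only illustrative.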