Description
Headline
Free Law Project enhances the relevancy of the RECAP Archive search by using AI document prioritization
What is the Feature?
Over in https://github.com/freelawproject/ai-experiments/issues/10, we're investigating the creation of a document classifier. Once that is complete, we will be able to classify all of our documents into categories like:
- Brief
- Motion to dismiss
- Complaint
- etc.
In freelawproject/courtlistener#4381, we're doing boosting based on court, such that items from more important courts show up at the top of search results.
During one of @s-taube's user research sessions, somebody mentioned that they didn't understand how documents get selected as the top hits (we only show three) when doing a RECAP query, and suggested we use document type to influence this. After all, it's much better to see a motion to dismiss than an attorney admittance motion or a scheduling order.
So, let's do that. Just like we plan to assign boosting scores to different courts, let's assign boosting scores to different document types.
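As a rough illustration of the idea, here is a minimal sketch of how a per-type boost could change which three documents get shown. Everything in it is an assumption for illustration: the `Doc` class, the `top_hits` helper, the type names, and the boost values are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass

# Hypothetical boost values per document type; not tuned, illustration only.
TYPE_BOOSTS = {
    "motion to dismiss": 3.0,
    "complaint": 2.5,
    "brief": 2.0,
    "scheduling order": 0.3,
    "attorney admittance motion": 0.1,
}
DEFAULT_BOOST = 1.0  # used for types that have no assigned boost


@dataclass
class Doc:
    description: str
    doc_type: str
    text_score: float  # relevance from text matching alone


def top_hits(docs, limit=3):
    """Return the `limit` documents with the highest boosted score."""
    def boosted(doc):
        return doc.text_score * TYPE_BOOSTS.get(doc.doc_type, DEFAULT_BOOST)

    return sorted(docs, key=boosted, reverse=True)[:limit]
```

With boosts like these, a motion to dismiss with a modest text score can outrank a scheduling order that happens to match the query text more strongly.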
What Problem Might it Solve?
More interesting document types would show up in results, and types nobody cares about (like attorney admittance motions) would show up less often.
Describe a Scenario in Which the Feature Might be Used
John is doing research on Miranda warnings, so he searches broadly for recent cases that mention the word "Miranda". In the results he sees important documents about Miranda warnings, instead of seeing scheduling orders in a case involving somebody named Miranda.
Technical Requirements
How hard is it to make, subjectively?
I think this will be medium difficulty once we have the documents classified.
Best guess, how long would it take to make, roughly? [X days/weeks/months/years]
Probably about a month if we work on it a bit at a time:
- Week 1: Make a script for data ingestion.
- Week 2: Ingest the data.
- Weeks 3-4: Tweak tests and code to use the relevancy.
- Week 5: Deploy.
What would it require that we do technically?
I think we'll have to add those classifications to Elastic, assign them scores, and then tweak our relevancy formula to use those scores.
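One possible shape for the relevancy tweak is an Elasticsearch `function_score` query with one weighted filter per document type. The sketch below only builds the query body as a plain dictionary. The field names `document_type` and `plain_text`, the weights, and the `build_boosted_query` helper are assumptions for illustration, not our actual mapping or formula.

```python
import json

# Hypothetical weights per classification; real values would need tuning.
DOCUMENT_TYPE_WEIGHTS = {
    "motion_to_dismiss": 3.0,
    "complaint": 2.5,
    "brief": 2.0,
    "scheduling_order": 0.3,
}


def build_boosted_query(user_query, type_weights=DOCUMENT_TYPE_WEIGHTS):
    """Build a function_score query body that scales scores by document type."""
    functions = [
        # Each function applies its weight only to documents of one type.
        # `document_type` is an assumed keyword field holding the classification.
        {"filter": {"term": {"document_type": doc_type}}, "weight": weight}
        for doc_type, weight in type_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                # `plain_text` is an assumed full-text field.
                "query": {"match": {"plain_text": user_query}},
                "functions": functions,
                "score_mode": "first",     # use the first function whose filter matches
                "boost_mode": "multiply",  # multiply the text score by the weight
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_boosted_query("Miranda"), indent=2))
```

Keeping the weights in one dictionary would make them easy to adjust while we tweak tests, and the same approach could sit alongside the court-based boosting from freelawproject/courtlistener#4381.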
Existing Systems or Alternatives?
I don't know if other search engines do this. They might, but I don't think they'd tell us.
Any Additional Information?
This came up in @s-taube's UX interviews — yay!
I think this is a very good idea that should help with our relevancy. It's not that hard, and we'll want to add these categories to Elastic anyway so folks can search for different filing categories. Why not use that for relevancy too?