Skip to content

Replace passivepy with a call to an LLM#147

Merged
nonprofittechy merged 10 commits intomigrate-from-spaCy-and-nltkfrom
replace-passivepy
Sep 25, 2025
Merged

Replace passivepy with a call to an LLM#147
nonprofittechy merged 10 commits intomigrate-from-spaCy-and-nltkfrom
replace-passivepy

Conversation

@nonprofittechy
Copy link
Member

@nonprofittechy nonprofittechy commented Sep 25, 2025

This replaces the sentence tokenization we used in a few places with a regular expression (instead of NTLK) and replaces the use of PassivePy (via tools.suffolklitlab.org) with a call to an LLM.

PassivePy states accuracy of 98% on its test dataset; the gpt-5-nano LLM via promptfoo evaluation scores 95.65% on the same dataset of about 1,100 sentences. Spent a lot of time going through multiple rounds of tests and tweaks with few shot with extremely detailed instructions vs zero shot classification, and closer to zero shot with fewer rules in the prompt seems to perform the best for gpt-5-nano. Additionally, when I looked closely at the failures, they seem to mostly be because of ambiguous meanings of sentences that have a valid passive voice interpretation but were marked as active by PassivePY's human annotators. I feel confident that the current performance of the LLM is good enough to capture confusing sentences, as the sentences that our LLM prompt marked "passive" but the human marked "active" confused me!

image

Some of the "weird" sentences where we disagreed with human annotators:

  1. The politics being discussed were causing scene. (human: passive, llm: active)
  2. I am stunned at the impact politics is having on our country these days. (human: active, LLM: passive)
  3. The debate tonight was heated (human: active, llm: passive)
  4. Politics can be stressful to be involved in. (human: active, llm: passive)

Some patterns with adjectives vs verb confusion--I agree with the humans after looking closely, but the errors are on weird/ungrammatical sentences, pretty close calls with two valid meanings (one passive and one active), or with ambiguity in usage.

Note that gpt-5-nano is extremely inexpensive, and our prompt does well with caching. Testing 1,100 sentences = 12.5 cents.

image

If this lets us power off tools.suffolklitlab.org, that would be a significant savings, as this is likely to cost less than a dollar a month for even quite high usage.

Additionally, explored using the new Responses API extensively but ultimately stuck with tried and true ChatCompletion; Responses cannot be tested in the current version of PromptFoo and it seems that performance was worse than with the older ChatCompletion (But again, hard to test with promptfoo; any gains would be slight reduction in cost, which is fractions of a penny per thousand uses).

Progress toward #145

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces PassivePy (a Python library for passive voice detection) with a call to OpenAI's LLM (gpt-5-nano) for passive voice detection in text analysis, moving from a local library to an AI-powered cloud solution.

  • Removes dependency on PassivePy and tools.suffolklitlab.org API for passive voice detection
  • Implements new LLM-based passive voice detection using OpenAI's gpt-5-nano model
  • Replaces NLTK sentence tokenization with a regex-based approach to reduce dependencies

Reviewed Changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
formfyxer/passive_voice_detection.py New module implementing LLM-based passive voice detection with OpenAI API
formfyxer/lit_explorer.py Updated to use new passive voice detection module instead of tools API
formfyxer/tests/test_passive_voice_detection.py Comprehensive unit tests for the new passive voice detection functionality
formfyxer/prompts/passive_voice.txt Prompt template for LLM passive voice classification
promptfooconfig.yaml Configuration for evaluating the LLM passive voice detector
test_passive_voice_detection.py Integration test script for the passive voice detection module
formfyxer/tests/passive_voice_test_dataset.csv Test dataset for passive voice evaluation

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@nonprofittechy
Copy link
Member Author

nonprofittechy commented Sep 25, 2025

@BryceStevenWilley this turned out to take much more testing than I expected--I thought it would be the easier drop-in replacement, lol.

But I'll do a future PR off of this branch since we already replace sentence tokenization in this PR.

Same errors with the old ML dependencies; going to ignore those for now.

@nonprofittechy nonprofittechy changed the base branch from main to migrate-from-spaCy-and-nltk September 25, 2025 15:42
Copy link
Contributor

@BryceStevenWilley BryceStevenWilley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! My only nit is that we should rename + move the integration test file.

nonprofittechy and others added 4 commits September 25, 2025 13:17
Co-authored-by: Bryce Willey <bryce.willey@suffolk.edu>
Co-authored-by: Bryce Willey <bryce.willey@suffolk.edu>
@nonprofittechy nonprofittechy merged commit c08bfd4 into migrate-from-spaCy-and-nltk Sep 25, 2025
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants