Skip to content

Refactor SuggestedResource to leverage new Fingerprint model#185

Merged
jazairi merged 2 commits intomainfrom
tco-122-associate-suggested-resource-and-term
Feb 14, 2025
Merged

Refactor SuggestedResource to leverage new Fingerprint model#185
jazairi merged 2 commits intomainfrom
tco-122-associate-suggested-resource-and-term

Conversation

@jazairi
Copy link
Copy Markdown
Contributor

@jazairi jazairi commented Feb 10, 2025

Commit message

Why these changes are being introduced:

We recently decided to make a separate Fingerprint model, associated with Term, as multiple detectors are likely to use fringerprinting (implemented in #138). We have also begun to split the ActiveRecord components of detectors into separate models (implemented for Detector::Journal in #162).

Relevant ticket(s):

How this addresses that need:

  • Splits the ActiveRecord components of Detector::SuggestedResource into a separate SuggestedResource model.
  • Associates SuggestedResource with Fingerprint, via Term, such that a suggested resource can have multiple terms and fingerprints.
  • Removes the suggested resource dashboard (see side effects).

Side effects of this change:

  • Terms that are associated with a suggested resource should not be destroyed. Rails does not allow the :dependent option on belongs_to associations, so this commit instead adds a before_destroy hook with a custom method that aborts the callback and logs the attempt in Sentry.
  • Because administrate does not handle has_many relationships well, we will need to build a custom dashboard to manage suggested resources. This is ticketed as TCO-145. Until that UI is ready, we will use the Rails console to make any requested changes to suggested resources.

Developer

Accessibility

  • ANDI or Wave has been run in accordance to our guide and
    all issues introduced by these changes have been resolved or opened
    as new issues (link to those issues in the Pull Request details above)
  • There are no accessibility implications to this change

Documentation

  • Project documentation has been updated, and yard output previewed
  • No documentation changes are needed

ENV

  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.

Stakeholders

  • Stakeholder approval has been confirmed
  • Stakeholder approval is not needed

Dependencies and migrations

NO dependencies are updated

YES migrations are included

Reviewer

Code

  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.

Documentation

  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.

Testing

  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.

@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-185 February 10, 2025 15:16 Inactive
@jazairi jazairi temporarily deployed to tacos-api-pipeline-pr-185 February 10, 2025 16:00 Inactive
@jazairi jazairi force-pushed the tco-122-associate-suggested-resource-and-term branch from adb8fe3 to 71a3baf Compare February 10, 2025 16:04
@jazairi jazairi temporarily deployed to tacos-api-pipeline-pr-185 February 10, 2025 16:04 Inactive
@JPrevost JPrevost self-assigned this Feb 10, 2025
@jazairi jazairi temporarily deployed to tacos-api-pipeline-pr-185 February 10, 2025 20:11 Inactive
# @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have
# at least three headers, named "Title", "URL", and "Phrase". Please note: these values
# are case sensitive.
def self.bulk_replace(input)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jazairi

In order to proceed with the review more efficiently, I came up with what might be okay for this bulk_replace method.

  def self.bulk_replace(input)
    raise ArgumentError.new, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)

    # Need to check what columns exist in input
    required_headers = %w[title url phrase]
    missing_headers = required_headers - input.headers
    raise ArgumentError.new, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?

    SuggestedResource.destroy_all

    input.each do |line|
      term = Term.find_or_create_by(phrase: line['phrase'])
      term.suggested_resource = SuggestedResource.new({ title: line['title'], url: line['url'] })
      term.save
    end
  end

It changes the required_headers to lowercase as that's how they come out of Heroku DataClips (there is a saved DataClip with the prod data) and avoids the problem I ran into by starting with Term and adding SuggestedResources instead of the other way around which was a remnant from before they were related.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not adjust tests that are likely failing now, nor do a ton of manual verification they actually migrated to the expected terms, but I now have 117 SuggestedResources locally I can use to help understand these changes better which was my main goal :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. This approach ends up with a 1:1 mapping of Terms to SuggestedResources rather than linking SuggestedResources to multiple Terms which was half the battle.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    input.each do |line|
      term = Term.find_or_create_by(phrase: line['phrase'])

      # check for existing SuggestedResource with the same title/url
      dup_check = SuggestedResource.where(title: line['title'], url: line['url'])

      # link to existing SuggestedResource if one exists
      term.suggested_resource = if dup_check.count > 0
                                  dup_check.first
                                # create a new SuggestedResource if it doesn't exist
                                else
                                  SuggestedResource.new({ title: line['title'], url: line['url'] })
                                end
      term.save
    end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I was tentatively working with, but I haven't confirmed it yet:

    input.each do |line|
      record = SuggestedResource.find_or_create_by({ title: line['Title'], url: line['URL'] })
      record.save
      record.terms.find_or_create_by(phrase: line['Phrase'])
    end

But again, I haven't tested that yet, and it seems too easy to work. In that case, I'm fine working from the term instead. If you want to push what you have, I can write/edit the tests. Thanks for looking into this!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed what I had, if after you look more closely at it your version does the same thing it's definitely way simpler and we should go with it instead :)

@JPrevost JPrevost temporarily deployed to tacos-api-pipeline-pr-185 February 12, 2025 14:29 Inactive
Copy link
Copy Markdown
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small tweaks suggested, but this seems to work exactly like we wanted. Thanks for working on this!

test 'destroys record successfully if no suggested resource is present' do
term = terms(:hi)
assert_nil term.suggested_resource
assert_nothing_raised { term.destroy }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is raised in the other test so Im not sure this is testing what we think it is? Maybe a better assertion is to confirm Term.where(phrase: 'whatever the original term we just deleted phrase was') returns no match?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe just follow what the other test does and check for term.present? If that doesn't work as expected then both tests would need updating :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, good catch. I think this was from an earlier iteration where I was raising an exception rather than aborting the callback. I agree that assert_not term.present? is much better here.

@@ -0,0 +1,10 @@
class CreateSuggestedResources < ActiveRecord::Migration[7.1]
Copy link
Copy Markdown
Member

@JPrevost JPrevost Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jazairi I'm not sure how important this is, but the order of migrations is currently incorrect. I noticed this when trying to rollback the latest migration to get back to main and it rolled back several migrations that landed recently before getting to these 3 and then it seems to error on some of these due to not having a foreign key it expects int he down variant. Locally this isn't a big deal as I can just delete my db and start over with main but I'm nervous on the order here for prod.

I suspect it would be better to have these named such that they show up after the latest main migrations as I believe Rails just tracks the latest date migrations have completed and not each individual migration status so there is risk that some environments will skip these migrations and we'll be super confused.

I may be misunderstanding how this works though...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Is that as simple as changing the date marker to, say, today?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so 🫰

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(GitHub has the worst fingers crossed emoji. That's what that is supposed to be)

@jazairi jazairi temporarily deployed to tacos-api-pipeline-pr-185 February 13, 2025 16:33 Inactive
@jazairi
Copy link
Copy Markdown
Contributor Author

jazairi commented Feb 13, 2025

@JPrevost I think the latest commit covers everything, but please let me know if I've missed something. Specifically, I'm a little shaky on whether the new test coverage is adequate/accurate.

@jazairi
Copy link
Copy Markdown
Contributor Author

jazairi commented Feb 13, 2025

I've never seen this GitHub actions caching error. Not sure what that's about. I'm going to try closing/reopening the PR to kickstart CI again... 🤞

@jazairi jazairi closed this Feb 13, 2025
@jazairi jazairi reopened this Feb 13, 2025
@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-185 February 13, 2025 18:59 Inactive
Rake::Task['suggested_resources:reload'].invoke(remote_file)
end

# The CSV has two rows, one of which is an existing term (web of science). So, only one term should be created.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocking. You asked about tests so I spent a bit more time than I normally might with these.

The only suggestion I have is that we might confirm for each of these the change in Term count and the change in SuggestedResource count both match the expectations as sometimes only once changes, sometimes both.

The other comment would be that I might test the bulk_loader method more directly rather than the rake task which just adds complexity. This looks like a carry-forward from the original implementation and just updated to the new spec so I see why you chose this option for this work as changing behavior and how we test is more risky so for now this is probably best to keep as-is (and maybe add assertions for SuggestedResource count if it feel it adds value to track both.

Copy link
Copy Markdown
Contributor Author

@jazairi jazairi Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, and as I was looking back into this, I realized that even the assertions we have might not all be matching the expectations. Sigh. I want to make sure we get this right before merge, so I'm going to double-check all of these. I'll add the recommended assertions back in as part of that review.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JPrevost I updated the tests to include assertions on both record types, and I updated the file fixtures (and cassettes) to more accurately reflect the cases we're trying to test. I know you've already approved, but I'm going to re-request review because this feels like a significant enough change to warrant it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change, thanks!

Copy link
Copy Markdown
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we've confirmed with UX the plan for us to manually make changes is okay, we're good to merge on my end.

@jazairi jazairi temporarily deployed to tacos-api-pipeline-pr-185 February 14, 2025 18:21 Inactive
@jazairi jazairi requested a review from JPrevost February 14, 2025 18:22
@jazairi jazairi force-pushed the tco-122-associate-suggested-resource-and-term branch from d5002e0 to fc9ee67 Compare February 14, 2025 18:37
@jazairi jazairi temporarily deployed to tacos-api-pipeline-pr-185 February 14, 2025 18:38 Inactive
jazairi and others added 2 commits February 14, 2025 13:38
Why these changes are being introduced:

We [recently decided](#145) to make a separate Fingerprint
model, associated with Term, as multiple detectors are likely
to use fringerprinting (implemented in #138). We have also begun
to split the ActiveRecord components of detectors into separate
models (implemented for Detector::Journal in #162).

Relevant ticket(s):

* [TCO-111](https://mitlibraries.atlassian.net/browse/TCO-111)
* [TCO-122](https://mitlibraries.atlassian.net/browse/TCO-122)

How this addresses that need:

* Splits the ActiveRecord components of Detector::SuggestedResource
into a separate SuggestedResource model.
* Associates SuggestedResource with Fingerprint, via Term, such
that a suggested resource can have multiple terms and fingerprints.
* Removes the suggested resource dashboard (see side effects).

Side effects of this change:

* Config has been adjusted to allow for development logging. This
was lost in the most recent Rails upgrade.
* Terms that are associated with a suggested resource should not
be destroyed. Rails does not allow the `:dependent` option on
`belongs_to` associations, so this commit instead adds a
`before_destroy` hook with a custom method that aborts the callback
and logs the attempt in Sentry.
* Because administrate does not handle has_many relationships well,
we will need to build a custom dashboard to manage suggested resources.
This is ticketed as [TCO-145](https://mitlibraries.atlassian.net/browse/TCO-145).
Until that UI is ready, we will use the Rails console to make any
requested changes to suggested resources.

co-authored-by: Jeremy Prevost <JPrevost@users.noreply.github.com>
Including as a separate commit, as it touches multiple files.
@jazairi jazairi force-pushed the tco-122-associate-suggested-resource-and-term branch from fc9ee67 to 2d2cfbc Compare February 14, 2025 18:39
@jazairi jazairi temporarily deployed to tacos-api-pipeline-pr-185 February 14, 2025 18:39 Inactive
@jazairi jazairi merged commit 82719c1 into main Feb 14, 2025
5 checks passed
@jazairi jazairi deleted the tco-122-associate-suggested-resource-and-term branch February 14, 2025 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants