Refactor SuggestedResource to leverage new Fingerprint model#185
Refactor SuggestedResource to leverage new Fingerprint model#185
Conversation
adb8fe3 to
71a3baf
Compare
| # @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have | ||
| # at least three headers, named "Title", "URL", and "Phrase". Please note: these values | ||
| # are case sensitive. | ||
| def self.bulk_replace(input) |
There was a problem hiding this comment.
In order to proceed with the review more efficiently, I came up with what might be okay for this bulk_replace method.
def self.bulk_replace(input)
raise ArgumentError.new, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)
# Need to check what columns exist in input
required_headers = %w[title url phrase]
missing_headers = required_headers - input.headers
raise ArgumentError.new, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?
SuggestedResource.destroy_all
input.each do |line|
term = Term.find_or_create_by(phrase: line['phrase'])
term.suggested_resource = SuggestedResource.new({ title: line['title'], url: line['url'] })
term.save
end
endIt changes the required_headers to lowercase as that's how they come out of Heroku DataClips (there is a saved DataClip with the prod data) and avoids the problem I ran into by starting with Term and adding SuggestedResources instead of the other way around which was a remnant from before they were related.
There was a problem hiding this comment.
I did not adjust tests that are likely failing now, nor do a ton of manual verification they actually migrated to the expected terms, but I now have 117 SuggestedResources locally I can use to help understand these changes better which was my main goal :)
There was a problem hiding this comment.
Ah. This approach ends up with a 1:1 mapping of Terms to SuggestedResources rather than linking SuggestedResources to multiple Terms which was half the battle.
There was a problem hiding this comment.
input.each do |line|
term = Term.find_or_create_by(phrase: line['phrase'])
# check for existing SuggestedResource with the same title/url
dup_check = SuggestedResource.where(title: line['title'], url: line['url'])
# link to existing SuggestedResource if one exists
term.suggested_resource = if dup_check.count > 0
dup_check.first
# create a new SuggestedResource if it doesn't exist
else
SuggestedResource.new({ title: line['title'], url: line['url'] })
end
term.save
endThere was a problem hiding this comment.
Here's what I was tentatively working with, but I haven't confirmed it yet:
input.each do |line|
record = SuggestedResource.find_or_create_by({ title: line['Title'], url: line['URL'] })
record.save
record.terms.find_or_create_by(phrase: line['Phrase'])
endBut again, I haven't tested that yet, and it seems too easy to work. In that case, I'm fine working from the term instead. If you want to push what you have, I can write/edit the tests. Thanks for looking into this!
There was a problem hiding this comment.
I've pushed what I had, if after you look more closely at it your version does the same thing it's definitely way simpler and we should go with it instead :)
JPrevost
left a comment
There was a problem hiding this comment.
A few small tweaks suggested, but this seems to work exactly like we wanted. Thanks for working on this!
test/models/term_test.rb
Outdated
| test 'destroys record successfully if no suggested resource is present' do | ||
| term = terms(:hi) | ||
| assert_nil term.suggested_resource | ||
| assert_nothing_raised { term.destroy } |
There was a problem hiding this comment.
Nothing is raised in the other test so Im not sure this is testing what we think it is? Maybe a better assertion is to confirm Term.where(phrase: 'whatever the original term we just deleted phrase was') returns no match?
There was a problem hiding this comment.
Or maybe just follow what the other test does and check for term.present? If that doesn't work as expected then both tests would need updating :)
There was a problem hiding this comment.
Oh yeah, good catch. I think this was from an earlier iteration where I was raising an exception rather than aborting the callback. I agree that assert_not term.present? is much better here.
| @@ -0,0 +1,10 @@ | |||
| class CreateSuggestedResources < ActiveRecord::Migration[7.1] | |||
There was a problem hiding this comment.
@jazairi I'm not sure how important this is, but the order of migrations is currently incorrect. I noticed this when trying to rollback the latest migration to get back to main and it rolled back several migrations that landed recently before getting to these 3 and then it seems to error on some of these due to not having a foreign key it expects int he down variant. Locally this isn't a big deal as I can just delete my db and start over with main but I'm nervous on the order here for prod.
I suspect it would be better to have these named such that they show up after the latest main migrations as I believe Rails just tracks the latest date migrations have completed and not each individual migration status so there is risk that some environments will skip these migrations and we'll be super confused.
I may be misunderstanding how this works though...
There was a problem hiding this comment.
Gotcha. Is that as simple as changing the date marker to, say, today?
There was a problem hiding this comment.
(GitHub has the worst fingers crossed emoji. That's what that is supposed to be)
|
@JPrevost I think the latest commit covers everything, but please let me know if I've missed something. Specifically, I'm a little shaky on whether the new test coverage is adequate/accurate. |
|
I've never seen this GitHub actions caching error. Not sure what that's about. I'm going to try closing/reopening the PR to kickstart CI again... 🤞 |
| Rake::Task['suggested_resources:reload'].invoke(remote_file) | ||
| end | ||
|
|
||
| # The CSV has two rows, one of which is an existing term (web of science). So, only one term should be created. |
There was a problem hiding this comment.
Non blocking. You asked about tests so I spent a bit more time than I normally might with these.
The only suggestion I have is that we might confirm for each of these the change in Term count and the change in SuggestedResource count both match the expectations as sometimes only once changes, sometimes both.
The other comment would be that I might test the bulk_loader method more directly rather than the rake task which just adds complexity. This looks like a carry-forward from the original implementation and just updated to the new spec so I see why you chose this option for this work as changing behavior and how we test is more risky so for now this is probably best to keep as-is (and maybe add assertions for SuggestedResource count if it feel it adds value to track both.
There was a problem hiding this comment.
This is a good point, and as I was looking back into this, I realized that even the assertions we have might not all be matching the expectations. Sigh. I want to make sure we get this right before merge, so I'm going to double-check all of these. I'll add the recommended assertions back in as part of that review.
There was a problem hiding this comment.
@JPrevost I updated the tests to include assertions on both record types, and I updated the file fixtures (and cassettes) to more accurately reflect the cases we're trying to test. I know you've already approved, but I'm going to re-request review because this feels like a significant enough change to warrant it.
JPrevost
left a comment
There was a problem hiding this comment.
Once we've confirmed with UX the plan for us to manually make changes is okay, we're good to merge on my end.
d5002e0 to
fc9ee67
Compare
Why these changes are being introduced: We [recently decided](#145) to make a separate Fingerprint model, associated with Term, as multiple detectors are likely to use fringerprinting (implemented in #138). We have also begun to split the ActiveRecord components of detectors into separate models (implemented for Detector::Journal in #162). Relevant ticket(s): * [TCO-111](https://mitlibraries.atlassian.net/browse/TCO-111) * [TCO-122](https://mitlibraries.atlassian.net/browse/TCO-122) How this addresses that need: * Splits the ActiveRecord components of Detector::SuggestedResource into a separate SuggestedResource model. * Associates SuggestedResource with Fingerprint, via Term, such that a suggested resource can have multiple terms and fingerprints. * Removes the suggested resource dashboard (see side effects). Side effects of this change: * Config has been adjusted to allow for development logging. This was lost in the most recent Rails upgrade. * Terms that are associated with a suggested resource should not be destroyed. Rails does not allow the `:dependent` option on `belongs_to` associations, so this commit instead adds a `before_destroy` hook with a custom method that aborts the callback and logs the attempt in Sentry. * Because administrate does not handle has_many relationships well, we will need to build a custom dashboard to manage suggested resources. This is ticketed as [TCO-145](https://mitlibraries.atlassian.net/browse/TCO-145). Until that UI is ready, we will use the Rails console to make any requested changes to suggested resources. co-authored-by: Jeremy Prevost <JPrevost@users.noreply.github.com>
Including as a separate commit, as it touches multiple files.
fc9ee67 to
2d2cfbc
Compare
Commit message
Why these changes are being introduced:
We recently decided to make a separate Fingerprint model, associated with Term, as multiple detectors are likely to use fringerprinting (implemented in #138). We have also begun to split the ActiveRecord components of detectors into separate models (implemented for Detector::Journal in #162).
Relevant ticket(s):
How this addresses that need:
Side effects of this change:
:dependentoption onbelongs_toassociations, so this commit instead adds abefore_destroyhook with a custom method that aborts the callback and logs the attempt in Sentry.Developer
Accessibility
all issues introduced by these changes have been resolved or opened
as new issues (link to those issues in the Pull Request details above)
Documentation
ENV
Stakeholders
Dependencies and migrations
NO dependencies are updated
YES migrations are included
Reviewer
Code
added technical debt.
Documentation
(not just this pull request message).
Testing