Merged
119 commits
76bd671
nsf init
davidcam-src Oct 7, 2025
9eaa4e9
new ingest tracker + rubocop
davidcam-src Oct 7, 2025
4139580
refactor
davidcam-src Oct 7, 2025
d713c07
build_config
davidcam-src Oct 7, 2025
264555c
restart functionality
davidcam-src Oct 7, 2025
8439e5e
small refactor for best match function
davidcam-src Oct 13, 2025
d2fd938
metadata ingest service
davidcam-src Oct 13, 2025
34f2322
syntax
davidcam-src Oct 13, 2025
5f860b5
attribute builder
davidcam-src Oct 13, 2025
54c0e04
syntax
davidcam-src Oct 13, 2025
c031f49
syntax
davidcam-src Oct 13, 2025
9d9f48d
openalex mapping
davidcam-src Oct 14, 2025
0c48839
openalex abstract, keywords
davidcam-src Oct 14, 2025
f366a62
open alex attribute builder
davidcam-src Oct 14, 2025
4d8c415
datacite abstract
davidcam-src Oct 15, 2025
e72416d
refactor
davidcam-src Oct 15, 2025
67554d2
base file attachment service
davidcam-src Oct 15, 2025
7886ed4
remove test changes
davidcam-src Oct 15, 2025
84f7a1b
including more stuff in the abstract file attachment service
davidcam-src Oct 17, 2025
a0a6d6f
move ingest result log helper
davidcam-src Oct 17, 2025
573e0c6
process records nsf
davidcam-src Oct 17, 2025
3e84b4d
ingest helper update
davidcam-src Oct 18, 2025
12c47a2
ingest coordinator file attachment
davidcam-src Oct 18, 2025
b7fd988
path ivs
davidcam-src Oct 18, 2025
eb9b514
normalized names
davidcam-src Oct 18, 2025
d1b19ca
wildcard
davidcam-src Oct 18, 2025
c3f6c3a
normalized filenames
davidcam-src Oct 18, 2025
7216501
visibility
davidcam-src Oct 18, 2025
5d949b4
refactor
davidcam-src Oct 20, 2025
ad12bce
remove resolved filename
davidcam-src Oct 20, 2025
eddfe3d
syntax
davidcam-src Oct 20, 2025
8638d82
refactoring
davidcam-src Oct 20, 2025
cd0cd38
refactor
davidcam-src Oct 21, 2025
4703747
notification service refactor
davidcam-src Oct 21, 2025
db7807e
html changes
davidcam-src Oct 21, 2025
9147622
run notif service
davidcam-src Oct 21, 2025
a9011da
parse tracker on restart
davidcam-src Oct 21, 2025
06b94e6
intro banner update
davidcam-src Oct 21, 2025
f3cd7ea
syntax
davidcam-src Oct 21, 2025
fe4c4de
sleep after each permissions and state sync
davidcam-src Oct 21, 2025
fd52736
path correction
davidcam-src Oct 21, 2025
a485a97
keyword correction
davidcam-src Oct 21, 2025
d37dabf
syntax
davidcam-src Oct 22, 2025
6621cf7
test syntax
davidcam-src Oct 22, 2025
477ea39
syntax
davidcam-src Oct 22, 2025
b91ff45
base report mailer tests
davidcam-src Oct 22, 2025
48ea791
nil default arg
davidcam-src Oct 22, 2025
899e5f5
reporting helper tests
davidcam-src Oct 22, 2025
8210b2e
base file attachment service
davidcam-src Oct 24, 2025
17798fb
base notification, tracker test files
davidcam-src Oct 24, 2025
a2eea38
notification helper tests
davidcam-src Oct 24, 2025
c2c2b8d
ingest tracker
davidcam-src Oct 24, 2025
53f3d27
reporting service test
davidcam-src Oct 24, 2025
b93720d
notif service test
davidcam-src Oct 24, 2025
f1729b2
nsf ingest coordinator tests
davidcam-src Oct 24, 2025
8f58e90
more test classes
davidcam-src Oct 24, 2025
7ca31a7
openalex_attribute_builder tests
davidcam-src Oct 24, 2025
f7fabe7
format display names from openalex
davidcam-src Oct 24, 2025
6a92363
cross ref and open alex test
davidcam-src Oct 24, 2025
3728853
file attachment service tests
davidcam-src Oct 24, 2025
3e15cd6
md ingest service tests
davidcam-src Oct 24, 2025
b9d9407
md retrieval helper tests
davidcam-src Oct 24, 2025
583c553
notification service tests
davidcam-src Oct 24, 2025
f40da4e
nsf ingest tracker tests
davidcam-src Oct 24, 2025
8146622
syntax
davidcam-src Oct 24, 2025
372b4f2
nsf report mailer test
davidcam-src Oct 24, 2025
c546bf7
bug fixes
davidcam-src Oct 27, 2025
a66829e
test bug fix
davidcam-src Oct 27, 2025
67a7138
syntax
davidcam-src Oct 27, 2025
421b201
syntax
davidcam-src Oct 27, 2025
e86be65
stub updates
davidcam-src Oct 27, 2025
fdb8b76
log expectation
davidcam-src Oct 27, 2025
6337e7a
stub update
davidcam-src Oct 27, 2025
4441e4a
removing some older tests
davidcam-src Oct 27, 2025
f055f86
ingest helper fix
davidcam-src Oct 27, 2025
684d69f
test updates
davidcam-src Oct 27, 2025
98fae09
test updates
davidcam-src Oct 28, 2025
4501adf
move run tests
davidcam-src Oct 28, 2025
3ef3250
remove redundant tests
davidcam-src Oct 28, 2025
5e72393
stubbing
davidcam-src Oct 28, 2025
c3596a5
wip log removal
davidcam-src Oct 28, 2025
a4ee39c
updated identifier mapping
davidcam-src Oct 28, 2025
5f5be1b
arg tweak
davidcam-src Oct 28, 2025
e6396e9
fallback to openalex for large author lists
davidcam-src Oct 28, 2025
d89cab8
set a limit for the number of authors
davidcam-src Oct 28, 2025
56a312d
sleep interval
davidcam-src Oct 28, 2025
b912512
updated message
davidcam-src Oct 28, 2025
f2c24ff
updated error message
davidcam-src Oct 28, 2025
766d64b
rubocop
davidcam-src Oct 28, 2025
5d7557c
more tests for coverage
davidcam-src Oct 28, 2025
4829179
remove old function, report html update
davidcam-src Oct 28, 2025
375bc2a
all ids
davidcam-src Oct 28, 2025
c842f65
stubbing update
davidcam-src Oct 28, 2025
20b839c
more test changes
davidcam-src Oct 28, 2025
a03ca64
test fixes
davidcam-src Oct 29, 2025
3321f05
syntax
davidcam-src Oct 29, 2025
06fed46
result hash, work utils
davidcam-src Oct 29, 2025
fece4c4
id merging
davidcam-src Oct 29, 2025
69dcbd2
missing filenames
davidcam-src Oct 29, 2025
2ee0ffb
ingest helper rollback
davidcam-src Oct 29, 2025
83d069e
adjusting file attachment
davidcam-src Oct 30, 2025
baf6225
rollback
davidcam-src Oct 30, 2025
98bad20
test update + rollback
davidcam-src Oct 30, 2025
5ed6a6b
address failing tests
davidcam-src Oct 30, 2025
44c9b95
base file attachment serv spec adjustment
davidcam-src Oct 30, 2025
0f9e8ce
more test changes
davidcam-src Oct 30, 2025
6c066cc
stub update
davidcam-src Oct 30, 2025
f371b55
test update - nsf mailer
davidcam-src Oct 30, 2025
ff7c581
remove article state from attr builders
davidcam-src Nov 3, 2025
dccf8fb
base attribute builder tests
davidcam-src Nov 3, 2025
69f73f7
datacite fallback for openalex abstracts, and test updates
davidcam-src Nov 3, 2025
164b5ab
crossref ab tests
davidcam-src Nov 3, 2025
8b72460
metadata ingest service tests
davidcam-src Nov 3, 2025
b76cc04
remove comments and add indents
davidcam-src Nov 3, 2025
530764b
attribute builder test update
davidcam-src Nov 3, 2025
a96f670
candidate count outside of loop
davidcam-src Nov 3, 2025
456dde6
comment updated authors limit
davidcam-src Nov 3, 2025
3819a28
method and initializer call updates
davidcam-src Nov 3, 2025
ba3fc53
test fix
davidcam-src Nov 3, 2025
75 changes: 44 additions & 31 deletions app/helpers/work_utils_helper.rb
@@ -7,14 +7,7 @@ def self.fetch_work_data_by_alternate_identifier(identifier)
     admin_set_name = work_data['admin_set_tesim']&.first
     admin_set_data = admin_set_name ? ActiveFedora::SolrService.get("title_tesim:#{admin_set_name} AND has_model_ssim:(\"AdminSet\")", { :rows => 1, 'df' => 'title_tesim'})['response']['docs'].first : {}
     Rails.logger.warn(self.generate_warning_message(admin_set_name, identifier)) if admin_set_data.blank?
-    result = {
-      work_id: work_data['id'],
-      work_type: work_data.dig('has_model_ssim', 0),
-      title: work_data['title_tesim']&.first,
-      admin_set_id: admin_set_data['id'],
-      admin_set_name: admin_set_name,
-      file_set_ids: work_data['file_set_ids_ssim']
-    }
+    result = self.generate_result_hash(work_data, admin_set_data, admin_set_name)
     result.compact.empty? ? nil : result
   end
   def self.fetch_work_data_by_fileset_id(fileset_id)
@@ -24,14 +17,7 @@ def self.fetch_work_data_by_fileset_id(fileset_id)
     admin_set_name = work_data['admin_set_tesim']&.first
     admin_set_data = admin_set_name ? ActiveFedora::SolrService.get("title_tesim:#{admin_set_name} AND has_model_ssim:(\"AdminSet\")", { :rows => 1, 'df' => 'title_tesim'})['response']['docs'].first : {}
     Rails.logger.warn(self.generate_warning_message(admin_set_name, fileset_id, :fileset)) if admin_set_data.blank?
-    result = {
-      work_id: work_data['id'],
-      work_type: work_data.dig('has_model_ssim', 0),
-      title: work_data['title_tesim']&.first,
-      admin_set_id: admin_set_data['id'],
-      admin_set_name: admin_set_name,
-      file_set_ids: work_data['file_set_ids_ssim']
-    }
+    result = self.generate_result_hash(work_data, admin_set_data, admin_set_name)
     result.compact.empty? ? nil : result
   end
   def self.fetch_work_data_by_id(work_id)
@@ -40,14 +26,7 @@ def self.fetch_work_data_by_id(work_id)
     admin_set_name = work_data['admin_set_tesim']&.first
     admin_set_data = admin_set_name ? ActiveFedora::SolrService.get("title_tesim:#{admin_set_name} AND has_model_ssim:(\"AdminSet\")", { :rows => 1, 'df' => 'title_tesim'})['response']['docs'].first : {}
     Rails.logger.warn(self.generate_warning_message(admin_set_name, work_id)) if admin_set_data.blank?
-    result = {
-      work_id: work_data['id'],
-      work_type: work_data.dig('has_model_ssim', 0),
-      title: work_data['title_tesim']&.first,
-      admin_set_id: admin_set_data['id'],
-      admin_set_name: admin_set_name,
-      file_set_ids: work_data['file_set_ids_ssim']
-    }
+    result = self.generate_result_hash(work_data, admin_set_data, admin_set_name)
     result.compact.empty? ? nil : result
   end

@@ -58,7 +37,7 @@ def self.fetch_work_data_by_doi(doi)

     # Step 2: If that fails, normalize DOI and search identifier_tesim with wildcard
     if work_data.blank?
-      normalized_doi = normalize_if_doi(doi)
+      normalized_doi = normalize_doi(doi)
       if normalized_doi
         fallback_value = "DOI: https://dx.doi.org/#{normalized_doi}"
         fallback_query = "identifier_tesim:\"#{fallback_value}\" NOT has_model_ssim:(\"FileSet\")"
@@ -82,16 +61,28 @@ def self.fetch_work_data_by_doi(doi)

     Rails.logger.warn(self.generate_warning_message(admin_set_name, doi, :doi)) if admin_set_data.blank?

-    result = {
+    result = self.generate_result_hash(work_data, admin_set_data, admin_set_name)
+    result.compact.empty? ? nil : result
+  end
+
+  def self.generate_result_hash(work_data, admin_set_data, admin_set_name)
+    identifiers = work_data['identifier_tesim'] || []
+
+    pmid = identifiers.find { |id| id.match?(/\APMID:\s*\d+/i) }&.split(':')&.last&.strip
+    pmcid = identifiers.find { |id| id.match?(/\APMCID:\s*\S+/i) }&.split(':')&.last&.strip
+    doi = identifiers.find { |id| id.match?(/\ADOI:\s*\S+/i) }&.split(':', 2)&.last&.strip
+
+    {
       work_id: work_data['id'],
       work_type: work_data.dig('has_model_ssim', 0),
       title: work_data['title_tesim']&.first,
       admin_set_id: admin_set_data['id'],
-      admin_set_name: admin_set_name,
-      file_set_ids: work_data['file_set_ids_ssim']
+      admin_set_name: admin_set_data['title_tesim']&.first,
+      file_set_ids: work_data['file_set_ids_ssim'],
+      pmid: pmid,
+      pmcid: pmcid,
+      doi: doi
     }
-
-    result.compact.empty? ? nil : result
   end

   def self.get_permissions_attributes(admin_set_id)
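The identifier parsing in generate_result_hash can be exercised in plain Ruby. In this sketch the sample identifier_tesim values are assumptions (the DOI string mirrors the fallback_value format built in fetch_work_data_by_doi). It shows why the DOI line uses split(':', 2): an unlimited split would cut the value at the colon in "https:".

```ruby
# Hypothetical sample values; real identifier_tesim contents may differ.
identifiers = [
  'PMID: 12345678',
  'PMCID: PMC1234567',
  'DOI: https://dx.doi.org/10.1000/example.123'
]

# Same find/split chains as generate_result_hash.
pmid  = identifiers.find { |id| id.match?(/\APMID:\s*\d+/i) }&.split(':')&.last&.strip
pmcid = identifiers.find { |id| id.match?(/\APMCID:\s*\S+/i) }&.split(':')&.last&.strip
doi   = identifiers.find { |id| id.match?(/\ADOI:\s*\S+/i) }&.split(':', 2)&.last&.strip

# Without the limit, the DOI URL is split at "https:" and only the tail survives.
naive_doi = identifiers.find { |id| id.match?(/\ADOI:\s*\S+/i) }&.split(':')&.last&.strip

# A missing prefix short-circuits the safe-navigation chain to nil.
no_pmid = ['DOI: 10.1000/abc'].find { |id| id.match?(/\APMID:\s*\d+/i) }&.split(':')&.last&.strip
```

Here pmid is "12345678", pmcid is "PMC1234567", doi keeps the full URL, naive_doi is "//dx.doi.org/10.1000/example.123", and no_pmid is nil.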
@@ -187,7 +178,7 @@ def self.generate_warning_message(admin_set_name, id, concern = :id)
     end
   end

-  def self.normalize_if_doi(identifier)
+  def self.normalize_doi(identifier)
     return identifier unless identifier.is_a?(String)
     # Strip prefix if it's a full DOI URL
     if identifier.match?(%r{\Ahttps?://(dx\.)?doi\.org/}i)
@@ -199,5 +190,27 @@ def self.normalize_if_doi(identifier)
     end
   end

+  # Wrapper to find best work match by trying each alternate identifier in order
+  def self.find_best_work_match_by_alternate_id(doi: nil, pmcid: nil, pmid: nil)
+    alt_ids = { doi: doi, pmcid: pmcid, pmid: pmid }.compact
+    return nil if alt_ids.empty?
+
+    alt_ids.each do |key, id|
+      next if id.blank?
+
+      work_data =
+        case key.to_s
+        when 'doi'
+          WorkUtilsHelper.fetch_work_data_by_doi(id)
+        else
+          WorkUtilsHelper.fetch_work_data_by_alternate_identifier(id)
+        end
+
+      return work_data if work_data.present?
+    end
+
+    nil
+  end
+
   private_class_method :build_cdr_url, :log_and_nil, :generate_warning_message
 end
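The lookup order in find_best_work_match_by_alternate_id can be sketched without Solr. Here lookup_by_doi, lookup_by_alternate_identifier, and FAKE_INDEX are hypothetical stand-ins for fetch_work_data_by_doi, fetch_work_data_by_alternate_identifier, and the index, and a strip/empty check approximates ActiveSupport's blank?/present?. Because Ruby hashes preserve insertion order, DOI is tried first, then PMCID, then PMID, and the first non-empty result wins.

```ruby
# Hypothetical index: alternate identifier => work data.
FAKE_INDEX = {
  'PMC7654321' => { work_id: 'work-2' },
  '33333333' => { work_id: 'work-3' }
}.freeze

# Stand-in for WorkUtilsHelper.fetch_work_data_by_doi; always misses here.
def lookup_by_doi(_doi)
  nil
end

# Stand-in for WorkUtilsHelper.fetch_work_data_by_alternate_identifier.
def lookup_by_alternate_identifier(id)
  FAKE_INDEX[id]
end

def find_best_match(doi: nil, pmcid: nil, pmid: nil)
  # compact drops nil arguments; the literal's key order fixes the priority DOI > PMCID > PMID.
  alt_ids = { doi: doi, pmcid: pmcid, pmid: pmid }.compact
  return nil if alt_ids.empty?

  alt_ids.each do |key, id|
    next if id.to_s.strip.empty? # approximates ActiveSupport's blank? for strings

    work_data = key == :doi ? lookup_by_doi(id) : lookup_by_alternate_identifier(id)
    # return inside the block exits find_best_match with the first hit
    return work_data unless work_data.nil? || work_data.empty?
  end

  nil
end

result = find_best_match(doi: '10.1000/xyz', pmcid: 'PMC7654321', pmid: '33333333')
```

With the DOI lookup stubbed to miss, result is the PMCID entry ({ work_id: 'work-2' }) and the PMID is never tried; an empty-string PMCID would be skipped in favor of the PMID.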
26 changes: 26 additions & 0 deletions app/mailers/base_ingest_report_mailer.rb
@@ -0,0 +1,26 @@
+# frozen_string_literal: true
+class BaseIngestReportMailer < ApplicationMailer
+  def ingest_report_email(report:, zip_path:, template_name:)
+    if zip_path.blank? || !File.exist?(zip_path)
+      LogUtilsHelper.double_log(
+        'No ZIP provided for attachment; sending email without attachments.',
+        :warn,
+        tag: "#{template_name}_report_email"
+      )
+    else
+      attachments[File.basename(zip_path)] = File.read(zip_path)
+      LogUtilsHelper.double_log(
+        "Attached ZIP file: #{zip_path}",
+        :info,
+        tag: "#{template_name}_report_email"
+      )
+    end
+
+    @report = report
+    mail(
+      to: report[:to] || '[email protected]',
+      subject: report[:subject],
+      template_name: template_name
+    )
+  end
+end
10 changes: 10 additions & 0 deletions app/mailers/nsf_report_mailer.rb
@@ -0,0 +1,10 @@
+# frozen_string_literal: true
+class NSFReportMailer < BaseIngestReportMailer
+  def nsf_report_email(report:, zip_path: nil)
+    ingest_report_email(
+      report: report,
+      zip_path: zip_path,
+      template_name: 'nsf_report_email'
+    )
+  end
+end
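The shared-template design can be sketched outside Rails. FakeBaseIngestReportMailer and FakeNSFReportMailer are hypothetical plain-Ruby stand-ins for the ActionMailer classes above: instead of calling mail(), ingest_report_email returns the recipient, subject, template name, and attachments it would use. DEFAULT_TO is a placeholder address, not the real one.

```ruby
# Hypothetical plain-Ruby stand-ins; no mail is sent and ActionMailer is not used.
class FakeBaseIngestReportMailer
  DEFAULT_TO = 'ingest-reports@example.edu' # placeholder address

  # Mirrors the branching in BaseIngestReportMailer#ingest_report_email, but
  # returns the values it would hand to mail() instead of building a message.
  def ingest_report_email(report:, zip_path:, template_name:)
    attachments = {}
    if zip_path.to_s.strip.empty? || !File.exist?(zip_path)
      warn 'No ZIP provided for attachment; sending email without attachments.'
    else
      attachments[File.basename(zip_path)] = File.read(zip_path)
    end

    {
      to: report[:to] || DEFAULT_TO,
      subject: report[:subject],
      template_name: template_name,
      attachments: attachments
    }
  end
end

# Each report type only supplies its own template name.
class FakeNSFReportMailer < FakeBaseIngestReportMailer
  def nsf_report_email(report:, zip_path: nil)
    ingest_report_email(report: report, zip_path: zip_path, template_name: 'nsf_report_email')
  end
end
```

Under this pattern a new report type only needs a subclass method that passes its own template_name; the attachment handling and fallback recipient live in one place.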
23 changes: 7 additions & 16 deletions app/mailers/pubmed_report_mailer.rb
@@ -1,19 +1,10 @@
 # frozen_string_literal: true
-class PubmedReportMailer < ApplicationMailer
-  def pubmed_report_email(report)
-    @report = report
-    mail(to: '[email protected]', subject: report[:subject])
-  end
-
-  def truncated_pubmed_report_email(report, zip_path)
-    if zip_path.blank? || !File.exist?(zip_path)
-      LogUtilsHelper.double_log('No ZIP provided for attachment; sending email without attachments.', :warn, tag: 'truncated_pubmed_report_email')
-    else
-      attachments[File.basename(zip_path)] = File.read(zip_path)
-      LogUtilsHelper.double_log("Attached ZIP file: #{zip_path}", :info, tag: 'truncated_pubmed_report_email')
-    end
-
-    @report = report
-    mail(to: '[email protected]', subject: report[:subject], template_name: 'pubmed_report_email')
+class PubmedReportMailer < BaseIngestReportMailer
+  def pubmed_report_email(report:, zip_path: nil)
+    ingest_report_email(
+      report: report,
+      zip_path: zip_path,
+      template_name: 'pubmed_report_email'
+    )
   end
 end
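Because PubmedReportMailer#pubmed_report_email now takes keyword arguments and truncated_pubmed_report_email is gone, existing call sites need updating. This sketch uses a hypothetical FakePubmedReportMailer with the same signature that only echoes its arguments; it shows the two new call forms and that the old positional form raises ArgumentError.

```ruby
# Hypothetical stand-in with the new keyword-only signature; it only echoes its arguments.
class FakePubmedReportMailer
  def pubmed_report_email(report:, zip_path: nil)
    { report: report, zip_path: zip_path, template_name: 'pubmed_report_email' }
  end
end

mailer = FakePubmedReportMailer.new
report = { subject: 'PubMed ingest report' }

# New call sites: the former truncated variant is now just the zip_path keyword.
full      = mailer.pubmed_report_email(report: report)
truncated = mailer.pubmed_report_email(report: report, zip_path: '/tmp/pubmed_report.zip')

# The old positional call no longer matches the signature.
old_call_error =
  begin
    mailer.pubmed_report_email(report)
    nil
  rescue ArgumentError => e
    e
  end
```

Any caller still using pubmed_report_email(report) or truncated_pubmed_report_email(report, zip_path) would fail at runtime, so those call sites presumably change in the same PR.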
9 changes: 6 additions & 3 deletions app/services/tasks/dimensions_ingest_service.rb
@@ -1,8 +1,8 @@
 # frozen_string_literal: true
 module Tasks
-  require 'tasks/ingest_helper'
+  require 'tasks/ingest_helper_utils/ingest_helper'
   class DimensionsIngestService
-    include Tasks::IngestHelper
+    include Tasks::IngestHelperUtils::IngestHelper
     attr_reader :admin_set, :depositor
     UNC_GRID_ID = 'grid.410711.2'

@@ -43,7 +43,10 @@ def process_publication(publication)
       create_sipity_workflow(work: article)
       pdf_path = extract_pdf(publication)
       if pdf_path
-        pdf_file = attach_pdf_to_work(article, pdf_path, @depositor, article.visibility)
+        pdf_file = attach_pdf_to_work(work: article,
+                                      file_path: pdf_path,
+                                      depositor: @depositor,
+                                      visibility: article.visibility)
         pdf_file.update(permissions_attributes: group_permissions(@admin_set))
       end
       article
133 changes: 0 additions & 133 deletions app/services/tasks/ingest_helper.rb

This file was deleted.
