Skip to content

Conversation

@DhanashreePetare
Copy link

@DhanashreePetare DhanashreePetare commented Dec 21, 2025

Description

Modified MappingStatsHolder.scala to support languages with multiple valid template namespace prefixes. Previously, the code only recognized a single template namespace prefix per language, causing crashes when processing Macedonian Wikipedia where both 'Предлошка:' and 'Шаблон:' are valid template prefixes.

Changes Made:

  1. Added import: org.dbpedia.extraction.wikiparser.impl.wikipedia.Namespaces
  2. Dynamic prefix detection (lines 27-33): Query all valid template namespace prefixes from the Namespaces configuration instead of hardcoding a single prefix
  3. Flexible template matching (lines 35-43): Use validTemplatePrefixes.find() to accept any valid prefix
  4. Safe redirect filtering (lines 65-69): Check matchedPrefix.isDefined before calling substring operations

Motivation and Context

Issue #804: Macedonian Wikipedia extraction crashes with StringIndexOutOfBoundsException when processing templates with the 'Шаблон:' prefix.

Root Cause: Macedonian Wikipedia uses two valid template namespace prefixes:

  • 'Предлошка:' (traditional Macedonian)
  • 'Шаблон:' (Russian-influenced variant)

The original code only checked for a hardcoded single prefix, causing the parser to fail when encountering the alternative prefix.

Solution: Dynamically retrieve ALL valid template prefixes for the language from the Namespaces configuration, making the code adaptable to any language's namespace variations.

Fixes #804

How Has This Been Tested?

  1. Code compilation: Verified no compilation errors with Scala 2.11.4
  2. Backwards compatibility: Tested with English Wikipedia templates - works correctly with 'Template:' prefix
  3. Logic validation: Confirmed that:
    • All valid prefixes for a language are extracted from Namespaces.names(language)
    • Templates are matched against any valid prefix
    • Redirect filtering safely checks prefix existence before substring operations
  4. Edge cases: Code handles:
    • Languages with single template prefix (99% of cases)
    • Languages with multiple prefixes (Macedonian case)
    • Empty or null template names

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project (Scala conventions)
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • All new and existing tests passed (no regression)
  • Code is backwards compatible with all existing languages

Summary by CodeRabbit

  • Documentation

    • Updated issue templates with improved guidance on DBpedia validation and error reporting.
    • Updated README with clarified links, namespace references, and contribution instructions.
  • Bug Fixes

    • Enhanced abstract extraction with improved text cleaning for better data quality.
    • Improved IRI validation patterns and constraints across multiple vocabularies.
  • Tests

    • Added comprehensive test coverage for abstract validation and IRI pattern matching.
  • Configuration

    • Added new configuration options for abstract processing.

✏️ Tip: You can customize this high-level summary in your review settings.

jlareck and others added 21 commits May 24, 2021 00:03
Removing broken information in brackets
* implement selection of construct validation tests

* rename test report file of construct validation tests

* fix path to test reports directory

* add removing brackets that start with "(,"

* make removing broken brackets in WikipediaNifExtractor configurable

* rename abstract extractors

* fix getting short abstracts in WikipediaNifExtractor

* generate two dumps for plain and html abstract extractors

* fix minidump abstract extraction
* add Construct model

* add construct validation test for (; in abstracts

* implement handling of right and left validators
* Fix merging of multiple infoboxes under one resource, see issue dbpedia#709
* Add new page to the minidump for testing
* Create SHACL test, add it to the shacl-test-groups.csv and disabled isbn test
closes dbpedia#771.

---------

Co-authored-by: Ted Thibodeau Jr <[email protected]>
@coderabbitai
Copy link

coderabbitai bot commented Dec 21, 2025

📝 Walkthrough

Walkthrough

This PR introduces a broken bracket removal feature for abstracts, refactors extractor names (AbstractExtractor → PlainAbstractExtractor, AbstractExtractorWikipedia → HtmlAbstractExtractor), redesigns the validation framework to use a Construct data model instead of plain Strings, fixes a Macedonian template namespace handling bug in MappingStatsHolder, and updates test infrastructure with new configurations and utilities.

Changes

Cohort / File(s) Summary
Documentation & Configuration Updates
.github/ISSUE_TEMPLATE/data.md, README.md, core/doc/HowTo-release-DBpedia.txt
Updated issue template sections and guidance; corrected links, namespace references, and formatting in README; updated release instructions to reference PlainAbstractExtractor
Configuration Framework
core/src/main/scala/org/dbpedia/extraction/config/Config.scala
Introduced removeBrokenBracketsProperty configuration flags in AbstractParameters and NifParameters (defaults false); wired to read keys for both plain and HTML abstracts
Extractor Renaming & Abstraction
core/src/main/scala/org/dbpedia/extraction/mappings/{PlainAbstractExtractor.scala, HtmlAbstractExtractor.scala, MissingAbstractsExtractor.scala, NifExtractor.scala}, dump/extraction.abstracts.properties, live/live.default.xml, server/server.default.properties
Renamed AbstractExtractor → PlainAbstractExtractor; renamed AbstractExtractorWikipedia → HtmlAbstractExtractor; updated all references and configuration properties
Abstract Bracket Removal Utility
core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala, core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala
Added new public utility removeBrokenBracketsInAbstracts(text: String) that scans for and removes bracketed segments with broken bracket markers; includes unit tests
Plain Abstract Extractor Enhancement
core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala
Integrated configuration-driven bracket removal; conditionally applies AbstractUtils.removeBrokenBracketsInAbstracts when enabled; updates text processing flow for post-processing and quad generation
NIF & Wiki Abstract Extractors
core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala, core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala
Added bracket removal support in WikipediaNifExtractor for abstract sections; expanded infobox handling in TemplateMapping with new conditions (condition4_same_entity_infoBox) affecting URI generation
Extractor Test Updates
core/src/test/scala/org/dbpedia/extraction/mappings/{NifExtractorTest.scala, PlainAbstractExtractorTest.scala}
Updated test references from AbstractExtractor to PlainAbstractExtractor; renamed test class from AbstractExtractorTest to PlainAbstractExtractorTest
Validation Framework Data Model
dump/src/main/scala/org/dbpedia/validation/construct/model/{Construct.scala, package.scala}
Introduced new Construct case class (self, left, right) and ValidatorGroup enumeration (RIGHT, LEFT, DEFAULT) to replace string-based validation inputs
Validator Trait & Base Classes
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/{Validator.scala, NotContainsValidator.scala, PatternValidator.scala, TypedLiteralValidator.scala, VocabValidator.scala}
Updated trait signature: changed run(nTriplePart: String) to run(nTriplePart: Construct); added VALIDATOR_GROUP field; implemented group-aware logic (RIGHT, LEFT, DEFAULT branching) in all validators
Generic Validators
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/{GenericIRIValidator.scala, GenericLiteralLangTagValidator.scala, GenericLiteralValidator.scala, GenericRdfLangStringValidator.scala, GenericValidator.scala}
Updated run method signature to accept Construct instead of String; adjusted implementations to use .self field of Construct for validation checks
Template Namespace Handling Bug Fix
server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala
Fixed StringIndexOutOfBoundsException for Macedonian templates (issue #804) by computing valid Template namespace prefixes as a set and matching against multiple variants; added 12 new public statistics fields (counts and ratios for templates and properties)
Test Generator & Suite Updates
dump/src/main/scala/org/dbpedia/validation/construct/tests/{generators/NTripleTestGenerator.scala, generators/Queries.scala, suites/NTripleTestSuite.scala}
Added getValidatorGroup helper method to extract ValidatorGroup from query results; updated SPARQL queries to fetch optional validatorGroup and doesNotContains fields; changed prepareFlatTerseLine to return Array[Construct] and validateNTriplePart to accept Construct parameter
Test Infrastructure & Utilities
dump/src/test/scala/org/dbpedia/extraction/dump/{TestConfig.scala, Utils.scala, ConstructValidationTest.scala, ExtractionTest.scala, ShaclTest.scala}
Added plainAbstractConfig to TestConfig; created Utils object with loadTestGroupsKeys, getGroup, and renameAbstractsDatasetFiles methods; refactored ShaclTest to delegate to Utils; added beforeAll setup in ConstructValidationTest for model preloading and filtering
Test Data & SHACL Resources
dump/src/test/resources/{ci-tests/dbpedia-specific-ci-tests.ttl, ci-tests/xsd_ci-tests.ttl, shacl-tests/instances/Marian_Breland_Bailey.ttl, shacl-tests/properties/dbp_abstract.ttl, extraction-configs/extraction.plain.abstracts.properties}
Added comprehensive SHACL trigger/validator definitions for IRIs, abstracts, and patterns; added test data for Marian Breland Bailey; introduced extraction.plain.abstracts.properties configuration; whitespace normalization in xsd tests
Test Configuration & Coverage
dump/src/test/bash/{minidump-overview.md, uris.lst}, dump/src/test/resources/shaclTestsCoverageTable.md
Added Marian Breland Bailey test URI and coverage tracking; reorganized coverage table to emphasize abstract validation entries; updated test suite naming
Maven Configuration
dump/pom.xml
Replaced single testGroup property with two distinct properties: shaclTestGroup and cvTestGroup, both set to PRODUCTIVE

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Multiple out-of-scope changes detected: abstract extractor refactoring (class renaming, broken brackets removal), configuration updates, test infrastructure changes, and validation framework enhancements unrelated to issue #804. Separate issue #804's MappingStatsHolder.scala changes into a focused PR; move abstract extractor refactoring, configuration updates, and validation framework changes into distinct PRs with their own issue tracking.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: supporting multiple template namespace prefixes for Macedonian, which directly addresses issue #804.
Linked Issues check ✅ Passed The PR addresses issue #804 objectives: it prevents StringIndexOutOfBoundsException, recognizes multiple valid template namespace prefixes for Macedonian, robustly handles unexpected prefixes, and updates MappingStatsHolder to use dynamic namespace detection.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@DhanashreePetare DhanashreePetare force-pushed the fix/issue-804-macedonian-templates branch from a5630e7 to df0b449 Compare January 21, 2026 20:30
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
4.8% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dump/src/test/resources/shaclTestsCoverageTable.md (1)

29-172: Normalize SHACL coverage table rows to 4 columns (no trailing pipes).

markdownlint reports that many added rows (e.g., Line 29 onward, plus Lines 216–229) have too few cells and end with a trailing pipe. Since the header defines 4 columns, each row should include 4 cells and end with the last cell value (not a trailing pipe). Please add placeholders for the “issue” and “comment” columns (or fill them) and remove trailing pipes to keep the table valid.

🛠️ Example fix (apply consistently to the other new rows)
-[http://de.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) `#en_abstract_validation` | 
+[http://de.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) `#en_abstract_validation` | - | -

Also applies to: 216-229

dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala (1)

73-101: Prevent Construct(null) from reaching validators (NPE regression).

If parsing fails, s/p/o remain null but you still build Construct(null). These survive .filter(_ != null) and later nTriplePart.self.startsWith(...) throws. Previously, null Strings were filtered out.

🔧 Proposed fix (return empty on parse failure / nulls)
     try {
       s = spo(0).drop(1)
       p = spo(1).trim.drop(1)
       o = {
@@
         if (aa.startsWith("<")) aa.drop(1).dropRight(1) else aa
       }
     }
     catch {
-      case ae: ArrayIndexOutOfBoundsException => println(line); ae.printStackTrace()
+      case ae: ArrayIndexOutOfBoundsException =>
+        println(line); ae.printStackTrace()
+        return Array.empty
     }
 
-    Array(Construct(s), Construct(p, Some(s), Some(o)), Construct(o))
+    if (s == null || p == null || o == null) return Array.empty
+    Array(Construct(s), Construct(p, Some(s), Some(o)), Construct(o))

Also applies to: 119-128

🤖 Fix all issues with AI agents
In @.github/ISSUE_TEMPLATE/data.md:
- Around line 10-38: Fix the markdownlint issues in the template sections
(headers "Issue validity", "Error Description", "Pinpointing the source of the
error", "Details") by replacing bare URLs with proper link syntax (e.g.,
[text](http://...)) for the links currently on the "Issue validity" and
"Pinpointing the source of the error" lines, remove the stray blank line inside
the blockquote in the "Details" section, and add language identifiers to the
fenced code blocks under "Wrong triples RDF snippet" and "Expected / corrected
RDF outcome snippet" (use ttl) and to the "Example DBpedia resource URL(s)"
block (use text) so the fences become ```ttl and ```text respectively. Ensure
the blockquote prefixes (>) remain consistent for each paragraph line.

In
`@core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala`:
- Around line 251-255: The regex in patternsToRemove inside
PlainAbstractExtractor.scala uses a greedy and non-DOTALL pattern for
"<normalized>.*</normalized>", which can span too much or miss multiline
content; change that entry to a non‑greedy DOTALL pattern (e.g. use a pattern
equivalent to (?s)<normalized>.*?</normalized>) so it matches the smallest block
including newlines between each pair of tags rather than greedily across the
line.

In `@dump/src/test/bash/minidump-overview.md`:
- Line 48: The bare URL on line 48 causes a markdownlint MD034 warning. Fix this
by wrapping the URL in angle brackets or converting it into a Markdown link
format in the minidump-overview.md file where the Marian Breland Bailey
Wikipedia link appears.

In `@dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl`:
- Around line 301-307: The v:validator block for <#iri_slash_n> uses
v:doesNotContain "\\n", which matches a backslash + 'n' rather than an actual
newline; update the v:doesNotContain value to use the newline escape "\n" (i.e.,
replace "\\n" with "\n") in the v:IRI_Validator entry so the test asserts IRIs
do not contain actual U+000A newlines.

In `@dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl`:
- Around line 15-19: Fix the malformed language-tag and replace the unsupported
negative lookahead: remove the space before the language tag in the sh:message
so the "@en" immediately follows the closing quote, and replace the sh:pattern
with a sh:not constraint on the dbpedia abstract path
(<http://dbpedia.org/ontology/abstract>) using a simple pattern that matches the
disallowed sequence instead of a negative lookahead (i.e., use sh:not with a
nested sh:pattern that targets "\(;" rather than the "(?!...)" construct).

In
`@dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala`:
- Around line 37-51: The current logic builds selectValues as a newline-joined
string and then removes statements whose subjects match those keys (inverting
intended filtering); instead build a Set of URIs and remove statements that are
NOT in that set so you keep only intended tests: change the creation to val
selectUris = groupKeys.map(x =>
s"https://github.com/dbpedia/extraction-framework$x").toSet, then in the
iterator loop use if (rdfSubject.isURIResource && rdfObject.toString ==
testGeneratorURI && !selectUris.contains(rdfSubject.getURI)) { iterator.remove()
} so membership checks use the Set and the filter keeps the selected tests
(referencing Utils.loadTestGroupsKeys, selectUris/selectValues,
iterator/listStatements, testGeneratorURI).

In `@dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala`:
- Around line 9-45: In loadTestGroupsKeys and getGroup ensure opened streams are
closed: replace scala.io.Source.fromFile(filePath) with a resource-safe pattern
(e.g. scala.util.Using) to close the Source after reading and avoid leaking file
handles, and wrap getClass.getClassLoader.getResourceAsStream(...) usage in a
safe/try-with-resources style (or scala.util.Using) so the InputStream is
closed; also handle the missing properties resource by returning
TestConfig.defaultTestGroup or throwing a clear exception instead of letting a
NullPointerException propagate (update loadTestGroupsKeys, getGroup, and any use
of classLoader/getResource/getResourceAsStream accordingly).
- Around line 47-58: The method renameAbstractsDatasetFiles currently assumes
minidumpDir.listFiles() is non-null and ignores failures from File.renameTo;
update renameAbstractsDatasetFiles to first check that minidumpDir.exists() and
that listFiles() != null (return or throw a descriptive exception if missing),
then iterate over Option(listFiles()) or a safe array, and after each call to
renameTo(...) check the boolean result and handle failures (log an error via
your test logger or throw an AssertionError) for both longAbstractsFile and
shortAbstractsFile so missing directories and failed renames fail fast and are
visible in tests; reference renameAbstractsDatasetFiles, minidumpDir, listFiles,
and renameTo in your changes.

In `@README.md`:
- Line 7: Fix the Slack line in the README: remove the duplicate "the", correct
"developement" to "development", fix casing/punctuation for clarity, and remove
the stray space inside the Slack invite URL so the link works; keep the channel
mention "#dev-team" and the DBpedia Slack workspace link but rewrite the
sentence to read smoothly (e.g., "Slack: join the `#dev-team` channel in the
DBpedia Slack workspace — the main point for development updates and
discussions").
- Around line 107-125: Fix markdownlint warnings in the guidelines section by
correcting nested list indentation (indent the sub-bullets under "If you already
pushed a branch..." and the quoted links block to align as nested list items),
normalize emphasis style (replace mixed *italic* with either underscores or
consistent asterisks throughout occurrences like "Try *not* to modify the
indentation." and the _italic_ examples), and replace bare URLs with proper
Markdown links (turn the bare https://forum.dbpedia.org and any other bare URLs
into [forum.dbpedia.org](https://forum.dbpedia.org) or descriptive link text).
Locate and update the paragraph and bulleted area containing the lines starting
"If you already pushed a branch..." and the "Please keep in mind:" block to
apply these changes.

In
`@server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala`:
- Around line 82-113: Add a small safe-ratio helper (e.g., def safeRatio(num:
Double, den: Double): Double = if (den == 0) 0.0 else num / den) and use it in
MappingStatsHolder to compute mappedTemplateRatio, mappedPropertyRatio,
mappedTemplateUseRatio and mappedPropertyUseRatio instead of directly dividing
(replace mappedTemplateCount.toDouble / templateCount.toDouble, etc.). Apply the
same helper or equivalent zero-check in MappingStats for mappedPropertyRatio and
mappedPropertyUseRatio to guard against zero denominators and avoid NaN/Infinity
in UI output. Ensure you reference the existing identifiers: MappingStatsHolder,
countTemplates, mappedTemplateRatio, mappedPropertyRatio,
mappedTemplateUseRatio, mappedPropertyUseRatio, and in MappingStats update
mappedPropertyRatio and mappedPropertyUseRatio.
🧹 Nitpick comments (4)
core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala (1)

55-58: Confirm logger category change is intentional.

Line 57 now logs under PlainAbstractExtractor. If you want logs scoped to MissingAbstractsExtractor, consider switching the logger category back to this class.

💡 Optional adjustment
-    private val logger = Logger.getLogger(classOf[PlainAbstractExtractor].getName)
+    private val logger = Logger.getLogger(classOf[MissingAbstractsExtractor].getName)
dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala (1)

1-3: Consider adding ScalaDoc for field semantics.

The Construct case class is well-designed with appropriate use of Option types. However, the purpose of left and right fields isn't immediately clear from the code alone. A brief ScalaDoc comment would help maintainers understand when these fields are populated and how validators should use them (e.g., for triple subject/object validation).

📝 Suggested documentation
 package org.dbpedia.validation.construct.model

+/**
+ * Represents a construct to be validated.
+ * `@param` self  The primary value to validate
+ * `@param` left  Optional left context (e.g., subject in a triple)
+ * `@param` right Optional right context (e.g., object in a triple)
+ */
 case class Construct(self: String, left: Option[String] = None, right: Option[String] = None)
core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala (1)

5-30: Good test coverage for core scenarios.

The tests cover the main use cases effectively. Consider adding edge case tests for robustness:

  • Nested brackets: "Text ((; nested)) more"
  • Multiple patterns: "A (; x) B (; y) C"
  • No brackets (identity): "Plain text without brackets"
  • Brackets at boundaries: "(; start) middle (end; )"

These would help ensure the utility handles less common inputs gracefully.

dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala (1)

3-9: Remove redundant imports.

Several imports are duplicated:

  • StmtIterator is imported on both lines 6 and 9
  • RDFNode is imported on both lines 9 and 19
  • model (line 4) and graph (line 3) appear unused
♻️ Suggested cleanup
 package org.dbpedia.extraction.dump

-import org.apache.jena.graph
-import org.apache.jena.rdf.model
-import org.apache.jena.rdf.model.impl.StatementImpl
-import org.apache.jena.rdf.model.StmtIterator
-
-import java.io.{File, FileInputStream, FileOutputStream}
-import org.apache.jena.rdf.model.{Alt, Bag, Literal, Model, ModelFactory, Property, RDFNode, RSIterator, ReifiedStatement, Resource, ResourceF, Statement, StmtIterator}
+import java.io.{File, FileInputStream, FileOutputStream}
+import java.util.function.Consumer
+
+import org.apache.jena.rdf.model.{Model, ModelFactory, RDFNode}
 import org.apache.jena.riot.{RDFDataMgr, RDFLanguages}

Also applies to: 19-19

Comment on lines +10 to +38
# Issue validity
> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/
> If the issue persists, please post the link from your browser here:

# Source
> Where did you find the data issue? Pick one, remove the others.

### Web / SPARQL
> State the service (e.g. http://dbpedia.org/sparql) and the SPARQL query
> give a link to the web / linked data pages (e.g. http://dbpedia.org/resource/Berlin)

### Release Dumps
> DBpedia provides monthly release dumps, cf. release-dashboard.dbpedia.org
> provide artifact & version or download link

### Running the DBpedia Extraction (DIEF) software
> Please include all necessary information.


# Classification
> If you have some familiarity with DBpedia, please use the classification tags at (link) to correctly file this issue. Otherwise skip this step.



### Error Description
# Error Description
> Please state the nature of your technical emergency:

# Pinpointing the source of the error
> Where did you find the data issue? Non-exhaustive options are:
* Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link**
* Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link**
* DIEF: you ran the software and the error occured then, please **include all necessary information such as the extractor or log**. If you had problems running the software use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose)

### Error specification
> Pick the appropriate:
# Details
> please post the details

- Affected extraction artifacts (Databus artifact version or file identifiers):
- https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects/mappingbased-objects_lang=en_disjointDomain.ttl.bz2
-
- Example DBpedia resource URL(s) having the error (one full IRI per line):
- http://dbpedia.org/resource/Leipzig
-
- Erroneous triples RDF snippet (NTRIPLES):
> Wrong triples RDF snippet
```

```
- Expected / corrected RDF outcome snippet (NTRIPLES):
> Expected / corrected RDF outcome snippet
```

```
>Example DBpedia resource URL(s)
```

### Additional context
> Add any other context about the problem here.
```
> Other
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resolve markdownlint issues: bare URLs + missing code‑fence languages.

Line 11 and Line 19 include bare URLs, and the fenced blocks at Lines 27/31/35 are missing language identifiers. There’s also a blank line inside a blockquote (Line 25). These are easy markdownlint failures to fix.

✅ Suggested cleanup
-# Issue validity
-> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/  we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/
+# Issue validity
+> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into <http://dbpedia.org/sparql>. During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At <http://dief.tools.dbpedia.org/server/extraction/en/> we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: <http://dief.tools.dbpedia.org/server/extraction/en/>
@@
-> Please state the nature of your technical emergency: 
+> Please state the nature of your technical emergency:
@@
-# Details
-> please post the details
+# Details
+> please post the details
+>
@@
-> Wrong triples RDF snippet 
-  ``` 
+> Wrong triples RDF snippet 
+  ```ttl
@@
-> Expected / corrected RDF outcome snippet 
-  ``` 
+> Expected / corrected RDF outcome snippet 
+  ```ttl
@@
->Example DBpedia resource URL(s)
-```
+>Example DBpedia resource URL(s)
+```text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

11-11: Bare URL used

(MD034, no-bare-urls)


11-11: Bare URL used

(MD034, no-bare-urls)


11-11: Bare URL used

(MD034, no-bare-urls)


19-19: Bare URL used

(MD034, no-bare-urls)


19-19: Bare URL used

(MD034, no-bare-urls)


25-25: Blank line inside blockquote

(MD028, no-blanks-blockquote)


27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In @.github/ISSUE_TEMPLATE/data.md around lines 10 - 38, Fix the markdownlint
issues in the template sections (headers "Issue validity", "Error Description",
"Pinpointing the source of the error", "Details") by replacing bare URLs with
proper link syntax (e.g., [text](http://...)) for the links currently on the
"Issue validity" and "Pinpointing the source of the error" lines, remove the
stray blank line inside the blockquote in the "Details" section, and add
language identifiers to the fenced code blocks under "Wrong triples RDF snippet"
and "Expected / corrected RDF outcome snippet" (use ttl) and to the "Example
DBpedia resource URL(s)" block (use text) so the fences become ```ttl and
```text respectively. Ensure the blockquote prefixes (>) remain consistent for
each paragraph line.

Comment on lines 251 to 255
val patternsToRemove = List(
"""<div style=[^/]*/>""".r -> " ",
"""</div>""".r -> " "
"""</div>""".r -> " ",
"""<normalized>.*<\/normalized>""".r -> ""
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make <normalized> removal non‑greedy to avoid stripping unrelated text.

.* is greedy and can remove content between the first and last <normalized> tag on the same line. It also won’t match across newlines. Prefer a non‑greedy DOTALL pattern.

🔧 Proposed fix (non-greedy DOTALL)
   val patternsToRemove = List(
     """<div style=[^/]*/>""".r -> " ",
     """</div>""".r -> " ",
-    """<normalized>.*<\/normalized>""".r -> ""
+    """(?s)<normalized>.*?</normalized>""".r -> ""
   )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val patternsToRemove = List(
"""<div style=[^/]*/>""".r -> " ",
"""</div>""".r -> " "
"""</div>""".r -> " ",
"""<normalized>.*<\/normalized>""".r -> ""
)
val patternsToRemove = List(
"""<div style=[^/]*/>""".r -> " ",
"""</div>""".r -> " ",
"""(?s)<normalized>.*?</normalized>""".r -> ""
)
🤖 Prompt for AI Agents
In
`@core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala`
around lines 251 - 255, The regex in patternsToRemove inside
PlainAbstractExtractor.scala uses a greedy and non-DOTALL pattern for
"<normalized>.*</normalized>", which can span too much or miss multiline
content; change that entry to a non‑greedy DOTALL pattern (e.g. use a pattern
equivalent to (?s)<normalized>.*?</normalized>) so it matches the smallest block
including newlines between each pair of tags rather than greedily across the
line.

* https://en.wikipedia.org/wiki/IKEA
* https://en.wikipedia.org/wiki/Jim_Pewter
* https://en.wikipedia.org/wiki/Kerala_Agricultural_University
* https://en.wikipedia.org/wiki/Marian_Breland_Bailey
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix MD034 by wrapping the new bare URL (Line 48).

markdownlint flags the newly added bare URL. Wrap it in angle brackets or use a Markdown link to satisfy MD034.

✅ Suggested fix
-* https://en.wikipedia.org/wiki/Marian_Breland_Bailey
+* <https://en.wikipedia.org/wiki/Marian_Breland_Bailey>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* https://en.wikipedia.org/wiki/Marian_Breland_Bailey
* <https://en.wikipedia.org/wiki/Marian_Breland_Bailey>
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

48-48: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In `@dump/src/test/bash/minidump-overview.md` at line 48, The bare URL on line 48
causes a markdownlint MD034 warning. Fix this by wrapping the URL in angle
brackets or converting it into a Markdown link format in the
minidump-overview.md file where the Marian Breland Bailey Wikipedia link
appears.

Comment on lines +301 to +307
<#iri_slash_n>
a v:TestGenerator ;
v:trigger trigger:generic_iri ;
v:validator [
a v:IRI_Validator ;
v:doesNotContain "\\n"
] .
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Turtle RDF string escape sequences \\n newline

💡 Result:

Yes — in Turtle string literals the escape sequence \n represents a newline (U+000A). Example: "line1\nline2" yields a string with a line break. In short (quoted) string literals actual newline characters must be escaped; in long ("""...""") literals you may include real newlines directly. [1] (w3.org)

Citations:


🏁 Script executed:

# Check the file context to understand the validator better
fd "dbpedia-specific-ci-tests.ttl" --type f -x wc -l {}

Repository: dbpedia/extraction-framework

Length of output: 140


🏁 Script executed:

# Read the relevant section of the file
fd "dbpedia-specific-ci-tests.ttl" --type f -x head -320 {} | tail -50

Repository: dbpedia/extraction-framework

Length of output: 2167


\\n matches a literal backslash‑n, not a newline character.

In Turtle, \n represents a newline (U+000A), so \\n represents a backslash followed by 'n'. To check that IRIs don't contain actual newline characters, use "\n" instead.

✅ Suggested fix
-        v:doesNotContain "\\n"
+        v:doesNotContain "\n"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<#iri_slash_n>
a v:TestGenerator ;
v:trigger trigger:generic_iri ;
v:validator [
a v:IRI_Validator ;
v:doesNotContain "\\n"
] .
<#iri_slash_n>
a v:TestGenerator ;
v:trigger trigger:generic_iri ;
v:validator [
a v:IRI_Validator ;
v:doesNotContain "\n"
] .
🤖 Prompt for AI Agents
In `@dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl` around lines
301 - 307, The v:validator block for <#iri_slash_n> uses v:doesNotContain "\\n",
which matches a backslash + 'n' rather than an actual newline; update the
v:doesNotContain value to use the newline escape "\n" (i.e., replace "\\n" with
"\n") in the v:IRI_Validator entry so the test asserts IRIs do not contain
actual U+000A newlines.

Comment on lines +15 to +19
sh:message "Error, found (; in dbo:abstract. "@en ;
sh:property [
sh:path <http://dbpedia.org/ontology/abstract> ;
sh:pattern "^((?!\\(\\;).)*$" ;
] .
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the file
find . -name "dbp_abstract.ttl" -type f

Repository: dbpedia/extraction-framework

Length of output: 137


🏁 Script executed:

# Also check the file size to understand how to read it
wc -l dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl 2>/dev/null || echo "File not found in expected location"

Repository: dbpedia/extraction-framework

Length of output: 138


🌐 Web query:

Turtle RDF language-tagged literal syntax @en space rules

💡 Result:

Short answer

  • The language tag must immediately follow the closing quote with no intervening space: e.g. "hello"@en (valid), "hello" @en (invalid). [1][2]
  • The @ is followed by a BCP‑47 language tag (well‑formed per BCP47). In RDF 1.2 an optional initial text‑direction can follow the tag separated by “--” (values ltr or rtl). [1][2]

Examples

  • Valid: "That Seventies Show"@en. [2]
  • Invalid (space): "That Seventies Show" @en. [1]

References

  • W3C RDF 1.2 Turtle specification (language/tag and LANG_DIR). [1]
  • W3C RDF 1.1 Turtle specification (language‑tag syntax and examples). [2]

🌐 Web query:

SPARQL XPath regex negative lookahead support

💡 Result:

Web search failed: Server error: no LLM provider could handle the message


🏁 Script executed:

cat -n dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl

Repository: dbpedia/extraction-framework

Length of output: 992


🌐 Web query:

SPARQL REGEX function negative lookahead support

💡 Result:

Short answer: No—SPARQL's REGEX uses the XPath/XQuery / XML Schema regular‑expression flavor, which does not include Perl‑style lookaround (negative/positive lookahead or lookbehind). (w3.org)

What to do instead (options):

  • Use SPARQL logic instead of lookarounds (e.g. FILTER + NOT EXISTS or an explicit FILTER with two regex calls). Example: FILTER(regex(...)) and FILTER NOT EXISTS { ... FILTER(regex(...)) } . (stackoverflow.com)
  • Preprocess your strings or add computed properties so matching becomes simpler.
  • If you control the engine and it offers an extension (some XQuery engines allow native Java/PCRE regex via a flag), use that engine‑specific extension—but this is nonstandard. (stackoverflow.com)

If you want, tell me which SPARQL engine you use and the exact pattern you need; I can propose a concrete non‑lookaround rewrite.

Citations:


🌐 Web query:

XSD XML Schema regular expressions negative lookahead

💡 Result:

Short answer: XML Schema (XSD) regular expressions do not support lookaround assertions (positive/negative lookahead or lookbehind). [1][2]

Options/alternatives:

  • If you can use XSD 1.1, add an xs:assert that uses XPath to reject matches, e.g.:
    <xs:assert test="not(matches(., 'FILENAME'))"/> — this enforces “no FILENAME anywhere” without lookahead. [1]
  • Build the allowed pattern(s) explicitly (unions/alternation, enumerations) so the forbidden case is never matched (can be cumbersome). [2][8]
  • Use a processor-specific extension to switch to a different regex engine (e.g. Saxon’s ';j' flag to use Java regex with lookarounds) — vendor-specific only. [3][7]

Sources:

  • W3C XML Schema (regular-expression syntax and limitations). [1]
  • Community discussion / examples showing lack of lookahead and workarounds (StackOverflow). [2][5]
  • Saxon documentation and examples showing the Java-regex extension flag. [3]

References:
[1] W3C XML Schema Part 2: Datatypes — Regular expressions.
[2] StackOverflow — “XML schema restriction pattern for not allowing specific string” / “Negative lookahead in XML schema”.
[3] Saxonica documentation (XSD 1.1 / Java-regex 'j' flag).


Fix invalid language-tagged literal and unsupported negative lookahead regex.

Line 15 has a space before @en which violates Turtle syntax for language-tagged literals (the tag must immediately follow the closing quote). Line 18's negative lookahead (?!...) is not supported by SPARQL/XPath regex and will be rejected by SHACL engines. Use sh:not with a simple pattern instead.

Proposed fix
-        sh:message "Error, found (; in dbo:abstract. "@en ;
+        sh:message "Error, found (; in dbo:abstract."@en ;
 	sh:property [
 		sh:path <http://dbpedia.org/ontology/abstract>  ;
-        sh:pattern "^((?!\\(\\;).)*$" ;
+        sh:not [ sh:pattern "\\(;" ] ;
 	] .
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sh:message "Error, found (; in dbo:abstract. "@en ;
sh:property [
sh:path <http://dbpedia.org/ontology/abstract> ;
sh:pattern "^((?!\\(\\;).)*$" ;
] .
sh:message "Error, found (; in dbo:abstract."@en ;
sh:property [
sh:path <http://dbpedia.org/ontology/abstract> ;
sh:not [ sh:pattern "\\(;" ] ;
] .
🤖 Prompt for AI Agents
In `@dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl` around lines
15 - 19, Fix the malformed language-tag and replace the unsupported negative
lookahead: remove the space before the language tag in the sh:message so the
"@en" immediately follows the closing quote, and replace the sh:pattern with a
sh:not constraint on the dbpedia abstract path
(<http://dbpedia.org/ontology/abstract>) using a simple pattern that matches the
disallowed sequence instead of a negative lookahead (i.e., use sh:not with a
nested sh:pattern that targets "\(;" rather than the "(?!...)" construct).

Comment on lines +9 to +45
def loadTestGroupsKeys(group: String, path: String, option: String = "yes"): Array[String] = {
println(
s"""##############
| GROUP $group
|##############""".stripMargin)

val filePath = classLoader.getResource(path).getFile
val file = scala.io.Source.fromFile(filePath)

val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
val columnsNames: Array[String] = table.head

if (!columnsNames.contains(group)) {
Array[String]()
}
else {
val indexOfGroup = columnsNames.indexOf(group)
val groupsKeys: Array[String] = table.tail.flatMap(row =>
if (row(indexOfGroup) == option) Array[String](row(0))
else Array[String]())
groupsKeys
}
}

def getGroup(testName: String): String = {
val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
val properties = new Properties()
resourceInputStream match {
case Some(inputStream) => properties.load(inputStream)
case None => return TestConfig.defaultTestGroup
}
val groupOption = Option(properties.getProperty(testName))
groupOption match {
case Some(group) => group
case None => TestConfig.defaultTestGroup
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Close opened resources to avoid file-handle leaks.

Source.fromFile and the properties input stream aren’t closed, which can leak file descriptors in long test runs. Also, a missing resource currently NPEs without a clear error.

🔧 Proposed fix (close resources + clearer error)
-    val filePath = classLoader.getResource(path).getFile
-    val file = scala.io.Source.fromFile(filePath)
-
-    val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
+    val fileUrl = Option(classLoader.getResource(path))
+      .getOrElse(throw new IllegalArgumentException(s"Missing resource: $path"))
+    val file = scala.io.Source.fromFile(fileUrl.getFile)
+    val table: Array[Array[String]] =
+      try file.getLines().map(_.split(",")).toArray
+      finally file.close()
@@
-    resourceInputStream match {
-      case Some(inputStream) => properties.load(inputStream)
+    resourceInputStream match {
+      case Some(inputStream) =>
+        try properties.load(inputStream)
+        finally inputStream.close()
       case None => return TestConfig.defaultTestGroup
     }
🤖 Prompt for AI Agents
In `@dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala` around lines 9 -
45, In loadTestGroupsKeys and getGroup ensure opened streams are closed: replace
scala.io.Source.fromFile(filePath) with a resource-safe pattern (e.g.
scala.util.Using) to close the Source after reading and avoid leaking file
handles, and wrap getClass.getClassLoader.getResourceAsStream(...) usage in a
safe/try-with-resources style (or scala.util.Using) so the InputStream is
closed; also handle the missing properties resource by returning
TestConfig.defaultTestGroup or throwing a clear exception instead of letting a
NullPointerException propagate (update loadTestGroupsKeys, getGroup, and any use
of classLoader/getResource/getResourceAsStream accordingly).

Comment on lines +47 to +58
def renameAbstractsDatasetFiles(datasetName: String): Unit = {
val minidumpDir = new File("./target/minidumptest/base")
minidumpDir.listFiles().foreach(f => {
val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
if (longAbstractsFile.exists()) {
longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
}
val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
if (shortAbstractsFile.exists()) {
shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
}
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard against missing base dir and failed renames.

listFiles() can return null if the directory doesn’t exist, and renameTo failures are currently silent, which can break downstream tests unpredictably.

🔧 Proposed fix (null-safe listing + rename check)
-    val minidumpDir = new File("./target/minidumptest/base")
-    minidumpDir.listFiles().foreach(f => {
+    val minidumpDir = new File("./target/minidumptest/base")
+    Option(minidumpDir.listFiles()).getOrElse(Array.empty).foreach(f => {
       val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
       if (longAbstractsFile.exists()) {
-        longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
+        val target = new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2")
+        if (!longAbstractsFile.renameTo(target)) {
+          throw new IllegalStateException(s"Failed to rename ${longAbstractsFile.getPath} -> ${target.getPath}")
+        }
       }
       val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
       if (shortAbstractsFile.exists()) {
-        shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
+        val target = new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2")
+        if (!shortAbstractsFile.renameTo(target)) {
+          throw new IllegalStateException(s"Failed to rename ${shortAbstractsFile.getPath} -> ${target.getPath}")
+        }
       }
     })
🤖 Prompt for AI Agents
In `@dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala` around lines 47
- 58, The method renameAbstractsDatasetFiles currently assumes
minidumpDir.listFiles() is non-null and ignores failures from File.renameTo;
update renameAbstractsDatasetFiles to first check that minidumpDir.exists() and
that listFiles() != null (return or throw a descriptive exception if missing),
then iterate over Option(listFiles()) or a safe array, and after each call to
renameTo(...) check the boolean result and handle failures (log an error via
your test logger or throw an AssertionError) for both longAbstractsFile and
shortAbstractsFile so missing directories and failed renames fail fast and are
visible in tests; reference renameAbstractsDatasetFiles, minidumpDir, listFiles,
and renameTo in your changes.

**Documentation**: http://dev.dbpedia.org/Extraction <br/>
**Get in touch with DBpedia**: https://wiki.dbpedia.org/join/get-in-touch <br/>
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace](https://dbpedia-slack.herokuapp.com/) - the main point for [developement updates](https://github.com/dbpedia/extraction-framework/blob/master/.github/workflows/maven.yml) and discussions <br/>
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typos and casing in the Slack line.

Small wording issues in user-facing docs.

✏️ Proposed fix
-**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
+**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) Slack channel within the [DBpedia Slack workspace](https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for development updates and discussions <br/>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) Slack channel within the [DBpedia Slack workspace](https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for development updates and discussions <br/>
🧰 Tools
🪛 LanguageTool

[uncategorized] ~7-~7: Did you mean the communication tool “Slack” (= proper noun, capitalized)?
Context: ...//dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack w...

(ON_SKYPE)


[grammar] ~7-~7: Ensure spelling is correct
Context: ...oeWBlolJb_lKifEqA) - the main point for developement updates and discussions
## Conte...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@README.md` at line 7, Fix the Slack line in the README: remove the duplicate
"the", correct "developement" to "development", fix casing/punctuation for
clarity, and remove the stray space inside the Slack invite URL so the link
works; keep the channel mention "#dev-team" and the DBpedia Slack workspace link
but rewrite the sentence to read smoothly (e.g., "Slack: join the `#dev-team`
channel in the DBpedia Slack workspace — the main point for development updates
and discussions").

Comment on lines +107 to +125
* In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.
* If there are further changes you need to make, because Travis said the build fails or because somebody caught something you overlooked, go back to item 4. Stay on the same branch (if it is still related to the same issue). GitHub will add the new commits to the same pull request.
* When everything is fine, your changes will be merged into `extraction-framework/dev`, finally the `dev` together with your improvements will be merged with the `master` branch.

Please keep in mind:
- Try *not* to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
- **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch!
- If you already pushed a branch to GitHub, later rebased the master onto this branch and then tried to push again, GitHub won't let you saying _"To prevent you from losing history, non-fast-forward updates were rejected"_. If _(and only if)_ you are sure that nobody already pulled from this branch, add `--force` to the push command.
[_"Don’t rebase branches you have shared with another developer."_](http://www.jarrodspillers.com/2009/08/19/git-merge-vs-git-rebase-avoiding-rebase-hell/)
[_"Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed."_](http://jeffkreeftmeijer.com/2010/the-magical-and-not-harmful-rebase/#comment-87479247)
[_"Never ever rebase a branch that you pushed, or that you pulled from another person_"](http://blog.experimentalworks.net/2009/03/merge-vs-rebase-a-deep-dive-into-the-mysteries-of-revision-control/)
- _"[Don’t rebase branches you have shared with another developer.](http://www.jarrodspillers.com/2009/08/19/git-merge-vs-git-rebase-avoiding-rebase-hell/)"_
- _"[Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed.](http://jeffkreeftmeijer.com/2010/the-magical-and-not-harmful-rebase/#comment-87479247)"_
- _"[Never ever rebase a branch that you pushed, or that you pulled from another person](https://web.archive.org/web/20150622064245/http://blog.experimentalworks.net/2009/03/merge-vs-rebase-a-deep-dive-into-the-mysteries-of-revision-control/)"_
- In general, we prefer Scala over Java.

More tips:
- Guides to setup your development environment for [Intellij](Setting up IntelliJ IDEA) or [Eclipse](Setting up eclipse).
- Get help with the [Maven build](Build-from-Source-with-Maven) or another form of [installation](Installation).
- [Download](Downloads) some data to work with.
- How to run [from Scala/Java](Run-from-Java-or-Scala) or [from a JAR](Run-from-a-JAR).
- Having different troubles? Check the [troubleshooting page](Troubleshooting) or post on https://forum.dbpedia.org.
- Get help with the [Maven build](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html) or another form of [installation](https://maven.apache.org/install.html).
- [Download](https://dumps.wikimedia.org/) some data to work with.
- How to run [from Scala/Java](https://docs.scala-lang.org/tutorials/scala-with-maven.html) or [from a JAR](https://docs.oracle.com/javase/tutorial/deployment/jar/run.html).
- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resolve markdownlint warnings in the guidelines section.

Nested list indentation, emphasis style, and bare URL can be made lint-friendly.

✏️ Proposed fix
-  * In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.
-  * If there are further changes you need to make, because Travis said the build fails or because somebody caught something you overlooked, go back to item 4. Stay on the same branch (if it is still related to the same issue). GitHub will add the new commits to the same pull request.
-  * When everything is fine, your changes will be merged into `extraction-framework/dev`, finally the `dev` together with your improvements will be merged with the `master` branch.
+    * In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.
+    * If there are further changes you need to make, because Travis said the build fails or because somebody caught something you overlooked, go back to item 4. Stay on the same branch (if it is still related to the same issue). GitHub will add the new commits to the same pull request.
+    * When everything is fine, your changes will be merged into `extraction-framework/dev`, finally the `dev` together with your improvements will be merged with the `master` branch.
@@
-- Try *not* to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
-- **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch!
+- Try _not_ to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
+- __Never__ rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, _if and only if_ nobody already pulled from the development branch!
@@
-- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org.
+- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on [https://forum.dbpedia.org](https://forum.dbpedia.org).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.
* If there are further changes you need to make, because Travis said the build fails or because somebody caught something you overlooked, go back to item 4. Stay on the same branch (if it is still related to the same issue). GitHub will add the new commits to the same pull request.
* When everything is fine, your changes will be merged into `extraction-framework/dev`, finally the `dev` together with your improvements will be merged with the `master` branch.
Please keep in mind:
- Try *not* to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
- **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch!
- If you already pushed a branch to GitHub, later rebased the master onto this branch and then tried to push again, GitHub won't let you saying _"To prevent you from losing history, non-fast-forward updates were rejected"_. If _(and only if)_ you are sure that nobody already pulled from this branch, add `--force` to the push command.
[_"Don’t rebase branches you have shared with another developer."_](http://www.jarrodspillers.com/2009/08/19/git-merge-vs-git-rebase-avoiding-rebase-hell/)
[_"Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed."_](http://jeffkreeftmeijer.com/2010/the-magical-and-not-harmful-rebase/#comment-87479247)
[_"Never ever rebase a branch that you pushed, or that you pulled from another person_"](http://blog.experimentalworks.net/2009/03/merge-vs-rebase-a-deep-dive-into-the-mysteries-of-revision-control/)
- _"[Don’t rebase branches you have shared with another developer.](http://www.jarrodspillers.com/2009/08/19/git-merge-vs-git-rebase-avoiding-rebase-hell/)"_
- _"[Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed.](http://jeffkreeftmeijer.com/2010/the-magical-and-not-harmful-rebase/#comment-87479247)"_
- _"[Never ever rebase a branch that you pushed, or that you pulled from another person](https://web.archive.org/web/20150622064245/http://blog.experimentalworks.net/2009/03/merge-vs-rebase-a-deep-dive-into-the-mysteries-of-revision-control/)"_
- In general, we prefer Scala over Java.
More tips:
- Guides to setup your development environment for [Intellij](Setting up IntelliJ IDEA) or [Eclipse](Setting up eclipse).
- Get help with the [Maven build](Build-from-Source-with-Maven) or another form of [installation](Installation).
- [Download](Downloads) some data to work with.
- How to run [from Scala/Java](Run-from-Java-or-Scala) or [from a JAR](Run-from-a-JAR).
- Having different troubles? Check the [troubleshooting page](Troubleshooting) or post on https://forum.dbpedia.org.
- Get help with the [Maven build](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html) or another form of [installation](https://maven.apache.org/install.html).
- [Download](https://dumps.wikimedia.org/) some data to work with.
- How to run [from Scala/Java](https://docs.scala-lang.org/tutorials/scala-with-maven.html) or [from a JAR](https://docs.oracle.com/javase/tutorial/deployment/jar/run.html).
- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org.
* In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.
* If there are further changes you need to make, because Travis said the build fails or because somebody caught something you overlooked, go back to item 4. Stay on the same branch (if it is still related to the same issue). GitHub will add the new commits to the same pull request.
* When everything is fine, your changes will be merged into `extraction-framework/dev`, finally the `dev` together with your improvements will be merged with the `master` branch.
Please keep in mind:
- Try _not_ to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
- __Never__ rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, _if and only if_ nobody already pulled from the development branch!
- If you already pushed a branch to GitHub, later rebased the master onto this branch and then tried to push again, GitHub won't let you saying _"To prevent you from losing history, non-fast-forward updates were rejected"_. If _(and only if)_ you are sure that nobody already pulled from this branch, add `--force` to the push command.
- _"[Don't rebase branches you have shared with another developer.](http://www.jarrodspillers.com/2009/08/19/git-merge-vs-git-rebase-avoiding-rebase-hell/)"_
- _"[Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed.](http://jeffkreeftmeijer.com/2010/the-magical-and-not-harmful-rebase/#comment-87479247)"_
- _"[Never ever rebase a branch that you pushed, or that you pulled from another person](https://web.archive.org/web/20150622064245/http://blog.experimentalworks.net/2009/03/merge-vs-rebase-a-deep-dive-into-the-mysteries-of-revision-control/)"_
- In general, we prefer Scala over Java.
More tips:
- Guides to setup your development environment for [Intellij](Setting up IntelliJ IDEA) or [Eclipse](Setting up eclipse).
- Get help with the [Maven build](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html) or another form of [installation](https://maven.apache.org/install.html).
- [Download](https://dumps.wikimedia.org/) some data to work with.
- How to run [from Scala/Java](https://docs.scala-lang.org/tutorials/scala-with-maven.html) or [from a JAR](https://docs.oracle.com/javase/tutorial/deployment/jar/run.html).
- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on [https://forum.dbpedia.org](https://forum.dbpedia.org).
🧰 Tools
🪛 LanguageTool

[style] ~108-~108: Consider an alternative to strengthen your wording.
Context: ...sion passes the build. * If there are further changes you need to make, because Travis said t...

(CHANGES_ADJUSTMENTS)


[style] ~116-~116: Consider using a more formal and expressive alternative to ‘awesome’.
Context: ...oiding-rebase-hell/)"_ - _"[Rebase is awesome, I use rebase exclusively for everythin...

(AWESOME)

🪛 markdownlint-cli2 (0.18.1)

107-107: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


108-108: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


109-109: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


112-112: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


112-112: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


113-113: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


113-113: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


125-125: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In `@README.md` around lines 107 - 125, Fix markdownlint warnings in the
guidelines section by correcting nested list indentation (indent the sub-bullets
under "If you already pushed a branch..." and the quoted links block to align as
nested list items), normalize emphasis style (replace mixed *italic* with either
underscores or consistent asterisks throughout occurrences like "Try *not* to
modify the indentation." and the _italic_ examples), and replace bare URLs with
proper Markdown links (turn the bare https://forum.dbpedia.org and any other
bare URLs into [forum.dbpedia.org](https://forum.dbpedia.org) or descriptive
link text). Locate and update the paragraph and bulleted area containing the
lines starting "If you already pushed a branch..." and the "Please keep in
mind:" block to apply these changes.

Comment on lines +82 to +113
private def countTemplates(all: Boolean, count: MappingStats => Int): Int = {
var sum = 0
for (ms <- mappedStatistics) {
if (all || ms.isMapped) {
if (!ignoreList.isTemplateIgnored(ms.templateName)) {
sum += count(ms)
}
}
sum
}
sum
}

private def countAllTemplates(count: MappingStats => Int): Int = countTemplates(true, count)
private def countMappedTemplates(count: MappingStats => Int): Int = countTemplates(false, count)

val templateCount = countAllTemplates(_ => 1)
val mappedTemplateCount = countMappedTemplates(_ => 1)

val templateUseCount = countAllTemplates(_.templateCount)
val mappedTemplateUseCount = countMappedTemplates(_.templateCount)

val propertyCount = countAllTemplates(_.propertyCount)
val mappedPropertyCount = countMappedTemplates(_.mappedPropertyCount)

private def countAllTemplates(count: MappingStats => Int): Int = countTemplates(true, count)
private def countMappedTemplates(count: MappingStats => Int): Int = countTemplates(false, count)

val templateCount = countAllTemplates(_ => 1)
val mappedTemplateCount = countMappedTemplates(_ => 1)

val templateUseCount = countAllTemplates(_.templateCount)
val mappedTemplateUseCount = countMappedTemplates(_.templateCount)

val propertyCount = countAllTemplates(_.propertyCount)
val mappedPropertyCount = countMappedTemplates(_.mappedPropertyCount)

val propertyUseCount = countAllTemplates(_.propertyUseCount)
val mappedPropertyUseCount = countMappedTemplates(_.mappedPropertyUseCount)

val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble

val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
val propertyUseCount = countAllTemplates(_.propertyUseCount)
val mappedPropertyUseCount = countMappedTemplates(_.mappedPropertyUseCount)

val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble

val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and examine the code at the specified lines
cat -n server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala | sed -n '70,125p'

Repository: dbpedia/extraction-framework

Length of output: 2766


🏁 Script executed:

# Search for usages of the ratio vals in the codebase
rg "mappedTemplateRatio|mappedPropertyRatio|mappedTemplateUseRatio|mappedPropertyUseRatio" --type scala

Repository: dbpedia/extraction-framework

Length of output: 4317


🏁 Script executed:

# Check MappingStats.scala for similar patterns
cat -n server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStats.scala | head -100

Repository: dbpedia/extraction-framework

Length of output: 2221


🏁 Script executed:

# Look for test files related to MappingStats or MappingStatsHolder
find . -type f -name "*test*" -o -name "*Test*" -o -name "*Spec*" | grep -i "mapping" | head -20

Repository: dbpedia/extraction-framework

Length of output: 1792


🏁 Script executed:

# Search for tests in the server/stats directory
find server -type f \( -name "*Test*.scala" -o -name "*Spec*.scala" \) | head -20

Repository: dbpedia/extraction-framework

Length of output: 252


🏁 Script executed:

# Check the IgnoreListTest to understand testing patterns for stats
cat server/src/test/scala/org/dbpedia/extraction/server/stats/IgnoreListTest.scala

Repository: dbpedia/extraction-framework

Length of output: 902


🏁 Script executed:

# Search for NaN, Infinity, or zero-denominator handling patterns in stats code
rg "NaN|Infinity|/ 0|denominator|division" server/src/main/scala/org/dbpedia/extraction/server/stats/ --type scala

Repository: dbpedia/extraction-framework

Length of output: 54


🏁 Script executed:

# Check the MappingStatsManager or MappingStatsHolder to see how mappedStatistics is populated
rg "new MappingStatsHolder|MappingStatsHolder(" server/src/main/scala/ --type scala -B 3 -A 3

Repository: dbpedia/extraction-framework

Length of output: 206


🏁 Script executed:

# Check the MappingStatsManager or MappingStatsHolder to see how mappedStatistics is populated
rg "MappingStatsHolder" server/src/main/scala/ --type scala -B 2 -A 2

Repository: dbpedia/extraction-framework

Length of output: 4731


🏁 Script executed:

# Look at the MappingStatsHolder.apply method to see how statistics are created
sed -n '1,75p' server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala

Repository: dbpedia/extraction-framework

Length of output: 2957


🏁 Script executed:

# Check if there are any comments or handling around empty statistics
rg "empty|null|zero" server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala --type scala -B 2 -A 2

Repository: dbpedia/extraction-framework

Length of output: 402


🏁 Script executed:

# Verify the exact lines in MappingStats.scala with the ratio calculations
sed -n '44,53p' server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStats.scala

Repository: dbpedia/extraction-framework

Length of output: 502


Guard ratio calculations in both MappingStatsHolder and MappingStats against zero denominators.

Division by zero in ratio calculations will produce NaN/Infinity when denominators are zero (e.g., empty mappedStatistics, all templates filtered by ignoreList, or no mapped items). These values are then formatted as strings in UI output ("NaN"/"Infinity" instead of percentages). Implement a safe ratio helper across both files.

Note: The same issue exists in MappingStats.scala lines 51–52 (mappedPropertyRatio and mappedPropertyUseRatio), which also requires the same fix.

🔧 Suggested fix

MappingStatsHolder.scala:

+  private def safeRatio(n: Int, d: Int): Double = if (d == 0) 0.0 else n.toDouble / d.toDouble
@@
-  val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
-  val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble
+  val mappedTemplateRatio = safeRatio(mappedTemplateCount, templateCount)
+  val mappedPropertyRatio = safeRatio(mappedPropertyCount, propertyCount)
@@
-  val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
-  val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
+  val mappedTemplateUseRatio = safeRatio(mappedTemplateUseCount, templateUseCount)
+  val mappedPropertyUseRatio = safeRatio(mappedPropertyUseCount, propertyUseCount)

MappingStats.scala:

+  private def safeRatio(n: Int, d: Int): Double = if (d == 0) 0.0 else n.toDouble / d.toDouble
@@
-  val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble
-  val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
+  val mappedPropertyRatio = safeRatio(mappedPropertyCount, propertyCount)
+  val mappedPropertyUseRatio = safeRatio(mappedPropertyUseCount, propertyUseCount)
🤖 Prompt for AI Agents
In
`@server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala`
around lines 82 - 113, Add a small safe-ratio helper (e.g., def safeRatio(num:
Double, den: Double): Double = if (den == 0) 0.0 else num / den) and use it in
MappingStatsHolder to compute mappedTemplateRatio, mappedPropertyRatio,
mappedTemplateUseRatio and mappedPropertyUseRatio instead of directly dividing
(replace mappedTemplateCount.toDouble / templateCount.toDouble, etc.). Apply the
same helper or equivalent zero-check in MappingStats for mappedPropertyRatio and
mappedPropertyUseRatio to guard against zero denominators and avoid NaN/Infinity
in UI output. Ensure you reference the existing identifiers: MappingStatsHolder,
countTemplates, mappedTemplateRatio, mappedPropertyRatio,
mappedTemplateUseRatio, mappedPropertyUseRatio, and in MappingStats update
mappedPropertyRatio and mappedPropertyUseRatio.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala (1)

14-14: Typo: "LTIERAL" should be "LITERAL".

The validator IRI contains a typo: #GENERIC_LTIERAL_LANG_TAG_VALIDATOR should be #GENERIC_LITERAL_LANG_TAG_VALIDATOR.

🐛 Proposed fix
-  override val iri: ValidatorIRI = "#GENERIC_LTIERAL_LANG_TAG_VALIDATOR"
+  override val iri: ValidatorIRI = "#GENERIC_LITERAL_LANG_TAG_VALIDATOR"
dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala (1)

104-109: Unused constructLabel parameter.

The constructLabel parameter is added to the method signature but is never used within the method body. This appears to be dead code.

#!/bin/bash
# Verify if constructLabel is used elsewhere in the codebase
rg -n "constructLabel" --type scala
dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala (1)

282-290: Inconsistent validatorGroup handling for NotContainsValidator.

The NotContainsValidator instantiation here does not pass validatorGroup, unlike lines 337-341 where the same validator type receives it. This inconsistency means IRI-based NotContainsValidator instances will always use ValidatorGroup.DEFAULT, potentially ignoring the configured group.

🐛 Proposed fix
         if (validatorQuerySolution.contains("doesNotContains")) {

           validatorQuerySolution.getLiteral("doesNotContains").getLexicalForm.split(delim).foreach(charSeq => {
-
-            validatorCollection.append(NotContainsValidator(currentValidatorID, validatorIRI, charSeq))
+            val validatorGroup = getValidatorGroup(validatorQuerySolution)
+            validatorCollection.append(NotContainsValidator(currentValidatorID, validatorIRI, charSeq, validatorGroup))
             groupedValidators.append(currentValidatorID)
             currentValidatorID += 1
           })
         }
🤖 Fix all issues with AI agents
In
`@core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala`:
- Around line 251-255: In PlainAbstractExtractor.scala update the
patternsToRemove entry that removes <normalized> blocks to use non-greedy
matching so it doesn't span multiple tags; specifically change the regex in the
patternsToRemove list (the entry currently """<normalized>.*<\/normalized>""".r
-> "") to a non-greedy form (e.g. use .*? inside the same triple-quoted regex)
so each <normalized>...</normalized> is removed independently.

In
`@dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala`:
- Around line 13-27: The current lexical extraction in run (using
VALIDATOR_GROUP and nTriplePart.left/right/self) is fragile because
split("\"").dropRight(1).drop(1).mkString("") fails on escaped quotes; replace
that inline splitting with a robust extractor: implement a function
extractLexicalForm(value: String) that correctly parses N-Triples quoted
literals (handling escaped quotes, language tags and datatypes) using a regex or
an N-Triples literal parser, then call extractLexicalForm(value) in each branch
(the branches of ValidatorGroup.RIGHT, ValidatorGroup.LEFT and the default)
instead of the split-based logic and return false when Option is None as before.

In `@dump/src/test/bash/minidump-overview.md`:
- Line 48: Replace the bare URL
"https://en.wikipedia.org/wiki/Marian_Breland_Bailey" with a proper Markdown
link to satisfy MD034: either wrap it as an autolink (<https://...>) or use link
text such as the page title (e.g., "Marian Breland Bailey") pointing to that URL
so the line no longer contains a bare URL.

In `@dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl`:
- Around line 59-63: The trigger pattern for trigger:abstract_property is
missing regex anchors and should be changed to an exact-match pattern by adding
^ at the start and $ at the end of the trigger:pattern value; locate the
trigger:abstract_property block and update its trigger:pattern string to include
the ^ and $ anchors so it matches only the full IRI
"http://dbpedia.org/ontology/abstract".

In `@dump/src/test/resources/shaclTestsCoverageTable.md`:
- Around line 29-33: The new table rows in shaclTestsCoverageTable.md (e.g.,
entries beginning with [http://el.dbpedia.org/resource/Βερολίνο] and the various
[http://.../resource/%3F_(film)] lines) only have two columns and end with a
trailing pipe, breaking the 4-column table format; update each such row to
include two additional empty cells for the “issue” and “comment” columns (add "
|  | " after the existing cells) and remove any trailing pipe at the end of the
line so every row has exactly four pipe-separated columns.

In
`@dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala`:
- Around line 37-51: The current selectValues is built as a single String
(mkString) so selectValues.contains(...) does substring checks; change
selectValues to a Set[String] of exact URIs and compare membership against
rdfSubject.getURI. Specifically, in the block using Utils.loadTestGroupsKeys and
selectValues, build selectValues as groupKeys.map(x =>
s"https://github.com/dbpedia/extraction-framework$x").toSet (no angle brackets
or trailing space) and then use if (rdfSubject.isURIResource &&
selectValues.contains(rdfSubject.getURI) && rdfObject.toString ==
testGeneratorURI) to perform exact set membership; keep the rest of the
iterator/remove logic the same.

In `@dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala`:
- Around line 47-59: The method renameAbstractsDatasetFiles can NPE because
minidumpDir.listFiles() may return null; update renameAbstractsDatasetFiles to
guard against that by first checking that minidumpDir.exists() &&
minidumpDir.isDirectory() or by wrapping listFiles() in Option(...) and using
getOrElse(Array.empty) before iterating; keep the rest of the logic unchanged
and operate only when files are present so longAbstractsFile/shortAbstractsFile
checks remain valid.
- Around line 33-45: In getGroup, the InputStream returned by
getResourceAsStream is not closed causing a resource leak; update getGroup to
load the Properties inside a safe-closure (e.g., scala.util.Using or a
try/finally) that ensures inputStream.close() is always called after
properties.load(inputStream), and keep the existing logic returning
TestConfig.defaultTestGroup on missing resource or missing property; reference
the getGroup method and the local variable resourceInputStream/inputStream when
implementing the safe-close.
- Around line 15-18: The code opens a scala.io.Source (variable file) and never
closes it, risking a resource leak; update the Utils code that computes table
(using variables filePath and file) to ensure the Source is closed after reading
— either wrap the file usage in a try-finally or use a managed resource
construct (e.g., Using.from or scala.util.Using) to read
file.getLines().map(_.split(",")).toArray and close the Source in the
finally/Using block so the file handle is always released.

In `@README.md`:
- Line 7: Fix the typos in the Slack sentence that starts "**Slack**: join the
[**#dev-team**]...": remove the duplicated "the", correct "developement" to
"development", ensure "Slack workspace" capitalization is consistent, and remove
the extra space before the URL so the sentence reads cleanly and without double
words or misspellings.

In
`@server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala`:
- Around line 78-113: The ratio fields in MappingStatsHolder
(mappedTemplateRatio, mappedPropertyRatio, mappedTemplateUseRatio,
mappedPropertyUseRatio) perform raw division and can produce NaN/Infinity when
denominators (templateCount, propertyCount, templateUseCount, propertyUseCount)
are zero; add a small helper method (e.g., safeRatio(numerator: Double,
denominator: Double): Double) that returns 0.0 when denominator == 0.0 and
otherwise returns numerator/denominator, then replace the four direct divisions
with safeRatio calls to ensure safe results.
🧹 Nitpick comments (15)
.github/ISSUE_TEMPLATE/data.md (3)

10-12: Major inconsistency: This file doesn't relate to the PR objectives.

The PR objectives state this PR fixes issue #804 for Macedonian template namespace handling in MappingStatsHolder.scala, and the AI summary describes changes to AbstractExtractor, validation framework, and broken bracket removal. However, this file is just a GitHub issue template with documentation updates—it contains no Scala code, template namespace handling, or extractor changes.

This suggests either the wrong file was included in the review, or this PR contains many unrelated changes beyond the stated objectives.


Optional: Wrap bare URLs in angle brackets.

The three URLs in this section are flagged by markdownlint (MD034). Consider wrapping them in angle brackets for better markdown compliance:

  • <https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1>
  • <http://dbpedia.org/sparql>
  • <http://dief.tools.dbpedia.org/server/extraction/en/>

17-21: Optional: Wrap bare URLs in angle brackets.

Line 19 contains two bare URLs flagged by markdownlint (MD034). Consider wrapping them:

  • <http://dbpedia.org/sparql>
  • <http://dbpedia.org/resource/Berlin>

25-35: Optional: Address markdown formatting issues.

Static analysis flags a few minor markdown formatting issues:

  • Line 25: Blank line inside blockquote (MD028) - consider removing the blank line or closing the blockquote before it
  • Lines 27, 31, 35: Fenced code blocks should specify a language (MD040) - consider adding language identifiers like turtle, rdf, or text to the code fence openers

These are pedantic formatting improvements and don't affect functionality.

core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala (5)

9-9: Unused import.

The Logger import is added but never used in this file.

🧹 Proposed fix
-import java.util.logging.Logger

86-90: Simplify boolean assignment and consider internationalization.

The if-else block can be simplified to a direct boolean expression. Additionally, hardcoding "Infobox" may not work correctly for non-English Wikipedia editions where infobox templates have different naming conventions.

♻️ Proposed simplification
-        val isInfobox = if (node.title.decoded.contains("Infobox")) {
-          true
-        } else {
-          false
-        }
+        val isInfobox = node.title.decoded.contains("Infobox")

Please verify whether non-English Wikipedia editions use different infobox naming conventions that this check might miss.


92-109: Use val instead of var and consider more robust name comparison.

Several variables are declared as var but never reassigned. Additionally, the name comparison at line 106 doesn't account for case sensitivity or URL encoding differences.

♻️ Proposed refactor
         var condition4_same_entity_infoBox = true
 
         if(isInfobox)
         {
           //getting name property from infobox
           val allNames = node.children.filter(p => p.key == "name")
-          var name = subjectUri;
-          if(allNames.size > 0)
-            name = allNames(0).propertyNodeValueToPlainText
+          val name = allNames.headOption
+            .map(_.propertyNodeValueToPlainText)
+            .getOrElse(subjectUri)
 
           //getting subject  of wikipedia page
-          var splittedURI = subjectUri.split("/")
-          var pageTitle = splittedURI(splittedURI.size - 1)
+          val splittedURI = subjectUri.split("/")
+          val pageTitle = splittedURI.last
 
-          if(!name.contains(pageTitle) && !pageTitle.contains(name))
+          if(!name.toLowerCase.contains(pageTitle.toLowerCase) && !pageTitle.toLowerCase.contains(name.toLowerCase))
             condition4_same_entity_infoBox = false
 
         }

199-213: Use idiomatic find instead of loop with mutable variable.

The loop pattern with a mutable nameProperty variable can be replaced with Scala's find method for cleaner, more idiomatic code.

♻️ Proposed refactor
-    //Try to find a property which contains 'name'
-    var nameProperty : PropertyNode = null
-    for(property <- properties if nameProperty == null)
-    {
-      if(property.key.toLowerCase.contains("name"))
-      {
-        nameProperty = property
-      }
-    }
-
-    //If no name property has been found -> Use the first property of the template
-    if(nameProperty == null)
-    {
-      nameProperty = properties.head
-    }
+    //Try to find a property which contains 'name', or use the first property
+    val nameProperty = properties
+      .find(_.key.toLowerCase.contains("name"))
+      .getOrElse(properties.head)

31-34: Incomplete comment or missing implementation.

The comment mentions "when extractor has a pre-phase" but there's no corresponding pre-phase implementation visible. Consider either completing the documentation or removing this placeholder comment.

dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl (1)

293-299: Consider extracting the duplicated language list pattern.

The language code pattern is duplicated from trigger:generic_wikipedia_dbpedia_extraction (line 37). While functional, this creates a maintenance burden if languages are added/removed. Consider whether this could reference the trigger pattern or be extracted to a shared constant.

core/src/main/scala/org/dbpedia/extraction/config/Config.scala (1)

280-281: Consider documenting the new config keys in sample configs/docs.

remove-broken-brackets-plain-abstracts and remove-broken-brackets-html-abstracts are new knobs; adding them to config templates (e.g., universal/sample properties) will make them discoverable.

Also applies to: 297-299

core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala (1)

57-57: Use the extractor’s own class for the logger category.

Logging under PlainAbstractExtractor makes MissingAbstractsExtractor messages harder to trace.

🔧 Suggested tweak
-    private val logger = Logger.getLogger(classOf[PlainAbstractExtractor].getName)
+    private val logger = Logger.getLogger(classOf[MissingAbstractsExtractor].getName)
dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala (1)

3-21: Remove duplicate imports.

Several imports are duplicated:

  • StmtIterator is imported at lines 6 and 9
  • RDFNode is imported at lines 9 and 19
  • Line 3-5 imports (graph, model, StatementImpl) appear unused
♻️ Suggested cleanup
 package org.dbpedia.extraction.dump

-import org.apache.jena.graph
-import org.apache.jena.rdf.model
-import org.apache.jena.rdf.model.impl.StatementImpl
-import org.apache.jena.rdf.model.StmtIterator
-
-import java.io.{File, FileInputStream, FileOutputStream}
-import org.apache.jena.rdf.model.{Alt, Bag, Literal, Model, ModelFactory, Property, RDFNode, RSIterator, ReifiedStatement, Resource, ResourceF, Statement, StmtIterator}
+import java.io.{File, FileInputStream, FileOutputStream}
+import org.apache.jena.rdf.model.{Model, ModelFactory}
 import org.apache.jena.riot.{RDFDataMgr, RDFLanguages}
 import org.apache.spark.sql.SQLContext
 import org.dbpedia.extraction.dump.TestConfig.{XSDCITestFile, ciTestFile, ciTestModel, classLoader, date, mappingsConfig, sparkSession}
 import org.dbpedia.extraction.dump.tags.ConstructValidationTestTag
 import org.dbpedia.validation.construct.report.ReportWriter
 import org.dbpedia.validation.construct.report.formats.ReportFormat
 import org.dbpedia.validation.construct.tests.TestSuiteFactory
 import org.dbpedia.validation.construct.tests.suites.NTripleTestSuite
 import org.scalatest.{BeforeAndAfterAll, DoNotDiscover, FunSuite}
-import org.apache.jena.rdf.model.RDFNode
-
-import java.util.function.Consumer
dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala (1)

50-51: Null check may need adjustment for Construct type.

The filter (_ != null) checks if the Construct object is null, but based on prepareFlatTerseLine, the Construct is always created (never null). However, the self field inside could be null if parsing fails (lines 77-98 show s, p, o can remain null on exception). Consider filtering on _.self != null instead.

Suggested fix
-          .filter(_ != null)
+          .filter(c => c != null && c.self != null)
dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala (1)

350-363: Consider using pattern matching for cleaner code.

The logic is correct, but this could be more idiomatic using Scala's pattern matching.

♻️ Optional refactor
   def getValidatorGroup(validatorQuerySolution: QuerySolution): ValidatorGroup.Value = {
-    if (validatorQuerySolution.contains("validatorGroup")) {
-      val validatorGroup = validatorQuerySolution.getResource("validatorGroup").getLocalName
-      if (validatorGroup == rightValidator) {
-        ValidatorGroup.RIGHT
-      } else if (validatorGroup == leftValidator){
-        ValidatorGroup.LEFT
-      } else {
-        ValidatorGroup.DEFAULT
-      }
-    } else {
-      ValidatorGroup.DEFAULT
+    Option(validatorQuerySolution)
+      .filter(_.contains("validatorGroup"))
+      .map(_.getResource("validatorGroup").getLocalName) match {
+        case Some(`rightValidator`) => ValidatorGroup.RIGHT
+        case Some(`leftValidator`) => ValidatorGroup.LEFT
+        case _ => ValidatorGroup.DEFAULT
     }
   }
core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala (1)

223-249: Code duplication with NifExtractor.postProcessExtractedHtml.

The postProcessExtractedHtml method appears to be duplicated from NifExtractor.scala (lines 72-95 in the relevant snippets). Consider extracting this to a shared utility (e.g., AbstractUtils) to avoid maintenance burden.

Comment on lines 251 to 255
val patternsToRemove = List(
"""<div style=[^/]*/>""".r -> " ",
"""</div>""".r -> " "
"""</div>""".r -> " ",
"""<normalized>.*<\/normalized>""".r -> ""
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use non-greedy matching for <normalized> tag removal.

The regex <normalized>.*<\/normalized> uses greedy matching, which could inadvertently match across multiple <normalized> blocks on the same line, removing more content than intended.

🐛 Proposed fix
   val patternsToRemove = List(
     """<div style=[^/]*/>""".r -> " ",
     """</div>""".r -> " ",
-    """<normalized>.*<\/normalized>""".r -> ""
+    """<normalized>.*?<\/normalized>""".r -> ""
   )
🤖 Prompt for AI Agents
In
`@core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala`
around lines 251 - 255, In PlainAbstractExtractor.scala update the
patternsToRemove entry that removes <normalized> blocks to use non-greedy
matching so it doesn't span multiple tags; specifically change the regex in the
patternsToRemove list (the entry currently """<normalized>.*<\/normalized>""".r
-> "") to a non-greedy form (e.g. use .*? inside the same triple-quoted regex)
so each <normalized>...</normalized> is removed independently.

Comment on lines +13 to +27
override def run(nTriplePart: Construct): Boolean = {
val lexicalForm = VALIDATOR_GROUP match {
case ValidatorGroup.RIGHT => nTriplePart.right match {
// TODO: 1) maybe we need to rename "value"
// 2) discuss what to do if we want to check the value that doesn't exist on
// the left or right side, at the moment we only return false in these cases
case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
case None => return false
}
case ValidatorGroup.LEFT => nTriplePart.left match {
case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
case None => return false
}
case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fragile lexical form extraction for literals with escaped quotes.

The split("\"").dropRight(1).drop(1).mkString("") approach works for simple quoted literals but will produce incorrect results for literals containing escaped quotes (e.g., "hello \"world\"").

Consider using a regex or proper N-Triples literal parser for more robust extraction:

Suggested improvement
-      case ValidatorGroup.RIGHT => nTriplePart.right match {
-        // TODO: 1) maybe we need to rename "value"
-        //       2) discuss what to do if we want to check the value that doesn't exist on
-        //          the left or right side, at the moment we only return false in these cases
-        case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
-        case None => return false
-      }
-      case ValidatorGroup.LEFT => nTriplePart.left match {
-        case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
-        case None => return false
-      }
-      case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("")
+  private def extractLexicalForm(value: String): String = {
+    val trimmed = value.trim
+    if (trimmed.startsWith("\"")) {
+      // Find the closing quote (handling escaped quotes)
+      val content = trimmed.drop(1)
+      val endIdx = content.lastIndexOf("\"")
+      if (endIdx >= 0) content.substring(0, endIdx) else content
+    } else trimmed
+  }

Then use extractLexicalForm(value) in each branch.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
override def run(nTriplePart: Construct): Boolean = {
val lexicalForm = VALIDATOR_GROUP match {
case ValidatorGroup.RIGHT => nTriplePart.right match {
// TODO: 1) maybe we need to rename "value"
// 2) discuss what to do if we want to check the value that doesn't exist on
// the left or right side, at the moment we only return false in these cases
case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
case None => return false
}
case ValidatorGroup.LEFT => nTriplePart.left match {
case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
case None => return false
}
case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("")
}
override def run(nTriplePart: Construct): Boolean = {
val lexicalForm = VALIDATOR_GROUP match {
case ValidatorGroup.RIGHT => nTriplePart.right match {
case Some(value) => extractLexicalForm(value)
case None => return false
}
case ValidatorGroup.LEFT => nTriplePart.left match {
case Some(value) => extractLexicalForm(value)
case None => return false
}
case _ => extractLexicalForm(nTriplePart.self)
}
// ... rest of the function
}
private def extractLexicalForm(value: String): String = {
val trimmed = value.trim
if (trimmed.startsWith("\"")) {
// Find the closing quote (handling escaped quotes)
val content = trimmed.drop(1)
val endIdx = content.lastIndexOf("\"")
if (endIdx >= 0) content.substring(0, endIdx) else content
} else trimmed
}
🤖 Prompt for AI Agents
In
`@dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala`
around lines 13 - 27, The current lexical extraction in run (using
VALIDATOR_GROUP and nTriplePart.left/right/self) is fragile because
split("\"").dropRight(1).drop(1).mkString("") fails on escaped quotes; replace
that inline splitting with a robust extractor: implement a function
extractLexicalForm(value: String) that correctly parses N-Triples quoted
literals (handling escaped quotes, language tags and datatypes) using a regex or
an N-Triples literal parser, then call extractLexicalForm(value) in each branch
(the branches of ValidatorGroup.RIGHT, ValidatorGroup.LEFT and the default)
instead of the split-based logic and return false when Option is None as before.

* https://en.wikipedia.org/wiki/IKEA
* https://en.wikipedia.org/wiki/Jim_Pewter
* https://en.wikipedia.org/wiki/Kerala_Agricultural_University
* https://en.wikipedia.org/wiki/Marian_Breland_Bailey
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Wrap the new URL to satisfy MD034 (no-bare-urls).

markdownlint flags the added bare URL. Use an autolink or link text.

🛠️ Proposed fix
-* https://en.wikipedia.org/wiki/Marian_Breland_Bailey
+* <https://en.wikipedia.org/wiki/Marian_Breland_Bailey>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* https://en.wikipedia.org/wiki/Marian_Breland_Bailey
* <https://en.wikipedia.org/wiki/Marian_Breland_Bailey>
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

48-48: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In `@dump/src/test/bash/minidump-overview.md` at line 48, Replace the bare URL
"https://en.wikipedia.org/wiki/Marian_Breland_Bailey" with a proper Markdown
link to satisfy MD034: either wrap it as an autolink (<https://...>) or use link
text such as the page title (e.g., "Marian Breland Bailey") pointing to that URL
so the line no longer contains a bare URL.

Comment on lines +59 to +63
trigger:abstract_property
a v:RDF_IRI_Trigger ;
trigger:pattern "http://dbpedia.org/ontology/abstract" ;
rdfs:label "abstract IRIs" ;
rdfs:comment "Match abstracts" .
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing regex anchor for exact match.

The trigger pattern "http://dbpedia.org/ontology/abstract" lacks the ^ start anchor and $ end anchor, which all other triggers in this file use consistently. Without anchors, this could match IRIs that merely contain this substring.

Proposed fix
 trigger:abstract_property
     a v:RDF_IRI_Trigger ;
-    trigger:pattern "http://dbpedia.org/ontology/abstract" ;
+    trigger:pattern "^http://dbpedia.org/ontology/abstract$" ;
 	rdfs:label "abstract IRIs" ;
 	rdfs:comment "Match abstracts" .
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
trigger:abstract_property
a v:RDF_IRI_Trigger ;
trigger:pattern "http://dbpedia.org/ontology/abstract" ;
rdfs:label "abstract IRIs" ;
rdfs:comment "Match abstracts" .
trigger:abstract_property
a v:RDF_IRI_Trigger ;
trigger:pattern "^http://dbpedia.org/ontology/abstract$" ;
rdfs:label "abstract IRIs" ;
rdfs:comment "Match abstracts" .
🤖 Prompt for AI Agents
In `@dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl` around lines
59 - 63, The trigger pattern for trigger:abstract_property is missing regex
anchors and should be changed to an exact-match pattern by adding ^ at the start
and $ at the end of the trigger:pattern value; locate the
trigger:abstract_property block and update its trigger:pattern string to include
the ^ and $ anchors so it matches only the full IRI
"http://dbpedia.org/ontology/abstract".

Comment on lines +29 to 33
[http://de.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://de.dbpedia.org/resource/Berlin](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Berlin&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://el.dbpedia.org/resource/Βερολίνο](http://dief.tools.dbpedia.org/server/extraction/el/extract?title=Βερολίνο&revid=&format=trix&extractors=custom) |
[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) #en_abstract_validation |
[http://en.dbpedia.org/resource/%3F_(film)](http://dief.tools.dbpedia.org/server/extraction/en/extract?title=%3F_(film)&revid=&format=trix&extractors=custom) | [http://dbpedia.org/property/accessDate](http://dbpedia.org/property/accessDate) #Citation_english_languagа_accessDate_datatype_validation |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix table rows to keep 4-column shape (and remove trailing pipes).

Several new rows only have two columns and end with a trailing pipe, which breaks the 4‑column table contract and triggers markdownlint. Add empty cells for “issue” and “comment” (and avoid trailing pipes) for all new rows in this pattern.

🔧 Example fix (apply to similar rows)
-[http://de.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) `#en_abstract_validation` | 
+[http://de.dbpedia.org/resource/Arthur_Schopenhauer](http://dief.tools.dbpedia.org/server/extraction/de/extract?title=Arthur_Schopenhauer&revid=&format=trix&extractors=custom) | [http://dbpedia.org/ontology/abstract](http://dbpedia.org/ontology/abstract) `#en_abstract_validation` |  | 
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

29-29: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


29-29: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


30-30: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


30-30: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


31-31: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


31-31: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


32-32: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


32-32: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


33-33: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


33-33: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)

🤖 Prompt for AI Agents
In `@dump/src/test/resources/shaclTestsCoverageTable.md` around lines 29 - 33, The
new table rows in shaclTestsCoverageTable.md (e.g., entries beginning with
[http://el.dbpedia.org/resource/Βερολίνο] and the various
[http://.../resource/%3F_(film)] lines) only have two columns and end with a
trailing pipe, breaking the 4-column table format; update each such row to
include two additional empty cells for the “issue” and “comment” columns (add "
|  | " after the existing cells) and remove any trailing pipe at the end of the
line so every row has exactly four pipe-separated columns.

Comment on lines +15 to +18
val filePath = classLoader.getResource(path).getFile
val file = scala.io.Source.fromFile(filePath)

val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resource leak: file handle not closed.

scala.io.Source.fromFile opens a file handle that should be closed after use to prevent resource leaks.

Suggested fix using try-finally
-    val file = scala.io.Source.fromFile(filePath)
-
-    val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
+    val file = scala.io.Source.fromFile(filePath)
+    val table: Array[Array[String]] = try {
+      file.getLines().map(_.split(",")).toArray
+    } finally {
+      file.close()
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val filePath = classLoader.getResource(path).getFile
val file = scala.io.Source.fromFile(filePath)
val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
val filePath = classLoader.getResource(path).getFile
val file = scala.io.Source.fromFile(filePath)
val table: Array[Array[String]] = try {
file.getLines().map(_.split(",")).toArray
} finally {
file.close()
}
🤖 Prompt for AI Agents
In `@dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala` around lines 15
- 18, The code opens a scala.io.Source (variable file) and never closes it,
risking a resource leak; update the Utils code that computes table (using
variables filePath and file) to ensure the Source is closed after reading —
either wrap the file usage in a try-finally or use a managed resource construct
(e.g., Using.from or scala.util.Using) to read
file.getLines().map(_.split(",")).toArray and close the Source in the
finally/Using block so the file handle is always released.

Comment on lines +33 to +45
def getGroup(testName: String): String = {
val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
val properties = new Properties()
resourceInputStream match {
case Some(inputStream) => properties.load(inputStream)
case None => return TestConfig.defaultTestGroup
}
val groupOption = Option(properties.getProperty(testName))
groupOption match {
case Some(group) => group
case None => TestConfig.defaultTestGroup
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Resource leak: InputStream not closed after loading properties.

The InputStream from getResourceAsStream should be closed after use.

Suggested fix
   def getGroup(testName: String): String = {
     val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
     val properties = new Properties()
     resourceInputStream match {
-      case Some(inputStream) => properties.load(inputStream)
+      case Some(inputStream) =>
+        try {
+          properties.load(inputStream)
+        } finally {
+          inputStream.close()
+        }
       case None => return TestConfig.defaultTestGroup
     }
🤖 Prompt for AI Agents
In `@dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala` around lines 33
- 45, In getGroup, the InputStream returned by getResourceAsStream is not closed
causing a resource leak; update getGroup to load the Properties inside a
safe-closure (e.g., scala.util.Using or a try/finally) that ensures
inputStream.close() is always called after properties.load(inputStream), and
keep the existing logic returning TestConfig.defaultTestGroup on missing
resource or missing property; reference the getGroup method and the local
variable resourceInputStream/inputStream when implementing the safe-close.

Comment on lines +47 to +59
def renameAbstractsDatasetFiles(datasetName: String): Unit = {
val minidumpDir = new File("./target/minidumptest/base")
minidumpDir.listFiles().foreach(f => {
val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
if (longAbstractsFile.exists()) {
longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
}
val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
if (shortAbstractsFile.exists()) {
shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
}
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential NullPointerException if directory doesn't exist.

minidumpDir.listFiles() returns null if the path is not a directory or an I/O error occurs, which would cause a NullPointerException when calling foreach.

Suggested fix
   def renameAbstractsDatasetFiles(datasetName: String): Unit = {
     val minidumpDir = new File("./target/minidumptest/base")
-    minidumpDir.listFiles().foreach(f => {
+    Option(minidumpDir.listFiles()).getOrElse(Array.empty).foreach(f => {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def renameAbstractsDatasetFiles(datasetName: String): Unit = {
val minidumpDir = new File("./target/minidumptest/base")
minidumpDir.listFiles().foreach(f => {
val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
if (longAbstractsFile.exists()) {
longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
}
val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
if (shortAbstractsFile.exists()) {
shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
}
})
}
def renameAbstractsDatasetFiles(datasetName: String): Unit = {
val minidumpDir = new File("./target/minidumptest/base")
Option(minidumpDir.listFiles()).getOrElse(Array.empty).foreach(f => {
val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
if (longAbstractsFile.exists()) {
longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
}
val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
if (shortAbstractsFile.exists()) {
shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
}
})
}
🤖 Prompt for AI Agents
In `@dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala` around lines 47
- 59, The method renameAbstractsDatasetFiles can NPE because
minidumpDir.listFiles() may return null; update renameAbstractsDatasetFiles to
guard against that by first checking that minidumpDir.exists() &&
minidumpDir.isDirectory() or by wrapping listFiles() in Option(...) and using
getOrElse(Array.empty) before iterating; keep the rest of the logic unchanged
and operate only when files are present so longAbstractsFile/shortAbstractsFile
checks remain valid.

**Documentation**: http://dev.dbpedia.org/Extraction <br/>
**Get in touch with DBpedia**: https://wiki.dbpedia.org/join/get-in-touch <br/>
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace](https://dbpedia-slack.herokuapp.com/) - the main point for [developement updates](https://github.com/dbpedia/extraction-framework/blob/master/.github/workflows/maven.yml) and discussions <br/>
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typos and capitalization in the Slack sentence.
Line 7 has a duplicated “the” and “developement” typo.

✅ Suggested edit
-**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
+**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) Slack channel within the [DBpedia Slack workspace](https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for development updates and discussions <br/>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) Slack channel within the [DBpedia Slack workspace](https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for development updates and discussions <br/>
🧰 Tools
🪛 LanguageTool

[uncategorized] ~7-~7: Did you mean the communication tool “Slack” (= proper noun, capitalized)?
Context: ...//dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack w...

(ON_SKYPE)


[grammar] ~7-~7: Ensure spelling is correct
Context: ...oeWBlolJb_lKifEqA) - the main point for developement updates and discussions
## Conte...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@README.md` at line 7, Fix the typos in the Slack sentence that starts
"**Slack**: join the [**#dev-team**]...": remove the duplicated "the", correct
"developement" to "development", ensure "Slack workspace" capitalization is
consistent, and remove the extra space before the URL so the sentence reads
cleanly and without double words or misspellings.

Comment on lines +78 to +113
* Also holds on to the mappings to make synchronization in MappingStatsManager easier.
* TODO: better solution for mappings?
*/
class MappingStatsHolder(val mappings: Mappings, val mappedStatistics: List[MappingStats], val reversedRedirects: Map[String, String], ignoreList: IgnoreList) {

private def countTemplates(all: Boolean, count: MappingStats => Int): Int = {
var sum = 0
for (ms <- mappedStatistics) {
if (all || ms.isMapped) {
if (! ignoreList.isTemplateIgnored(ms.templateName)) {
sum += count(ms)
}
private def countTemplates(all: Boolean, count: MappingStats => Int): Int = {
var sum = 0
for (ms <- mappedStatistics) {
if (all || ms.isMapped) {
if (!ignoreList.isTemplateIgnored(ms.templateName)) {
sum += count(ms)
}
}
sum
}
sum
}

private def countAllTemplates(count: MappingStats => Int): Int = countTemplates(true, count)
private def countMappedTemplates(count: MappingStats => Int): Int = countTemplates(false, count)

val templateCount = countAllTemplates(_ => 1)
val mappedTemplateCount = countMappedTemplates(_ => 1)

val templateUseCount = countAllTemplates(_.templateCount)
val mappedTemplateUseCount = countMappedTemplates(_.templateCount)

val propertyCount = countAllTemplates(_.propertyCount)
val mappedPropertyCount = countMappedTemplates(_.mappedPropertyCount)

private def countAllTemplates(count: MappingStats => Int): Int = countTemplates(true, count)
private def countMappedTemplates(count: MappingStats => Int): Int = countTemplates(false, count)

val templateCount = countAllTemplates(_ => 1)
val mappedTemplateCount = countMappedTemplates(_ => 1)

val templateUseCount = countAllTemplates(_.templateCount)
val mappedTemplateUseCount = countMappedTemplates(_.templateCount)

val propertyCount = countAllTemplates(_.propertyCount)
val mappedPropertyCount = countMappedTemplates(_.mappedPropertyCount)

val propertyUseCount = countAllTemplates(_.propertyUseCount)
val mappedPropertyUseCount = countMappedTemplates(_.mappedPropertyUseCount)

val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble

val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
val propertyUseCount = countAllTemplates(_.propertyUseCount)
val mappedPropertyUseCount = countMappedTemplates(_.mappedPropertyUseCount)

val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble

val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard ratio calculations against zero counts.

If any denominator is 0 (e.g., empty stats or all templates ignored), ratios become NaN/Infinity. Consider a safe helper that returns 0.0 in that case.

✅ Suggested fix (safe ratios)
   private def countAllTemplates(count: MappingStats => Int): Int = countTemplates(true, count)
   private def countMappedTemplates(count: MappingStats => Int): Int = countTemplates(false, count)
+
+  private def ratio(num: Int, den: Int): Double =
+    if (den == 0) 0.0 else num.toDouble / den.toDouble
@@
-  val mappedTemplateRatio = mappedTemplateCount.toDouble / templateCount.toDouble
-  val mappedPropertyRatio = mappedPropertyCount.toDouble / propertyCount.toDouble
+  val mappedTemplateRatio = ratio(mappedTemplateCount, templateCount)
+  val mappedPropertyRatio = ratio(mappedPropertyCount, propertyCount)
@@
-  val mappedTemplateUseRatio = mappedTemplateUseCount.toDouble / templateUseCount.toDouble
-  val mappedPropertyUseRatio = mappedPropertyUseCount.toDouble / propertyUseCount.toDouble
+  val mappedTemplateUseRatio = ratio(mappedTemplateUseCount, templateUseCount)
+  val mappedPropertyUseRatio = ratio(mappedPropertyUseCount, propertyUseCount)
🤖 Prompt for AI Agents
In
`@server/src/main/scala/org/dbpedia/extraction/server/stats/MappingStatsHolder.scala`
around lines 78 - 113, The ratio fields in MappingStatsHolder
(mappedTemplateRatio, mappedPropertyRatio, mappedTemplateUseRatio,
mappedPropertyUseRatio) perform raw division and can produce NaN/Infinity when
denominators (templateCount, propertyCount, templateUseCount, propertyUseCount)
are zero; add a small helper method (e.g., safeRatio(numerator: Double,
denominator: Double): Double) that returns 0.0 when denominator == 0.0 and
otherwise returns numerator/denominator, then replace the four direct divisions
with safeRatio calls to ensure safe results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Server crashes with StringIndexOutOfBoundsException when processing Macedonian (mk) templates using 'Шаблон:' namespace

6 participants