@TheJoeFin
Owner

  • Add more testing
  • Add better handling of spaces

Enhanced OCR framework to support Japanese text processing.
Introduced `PostOcrUtilities` for modular post-OCR handling.
Added `ja-word-borders.json` for test data and updated
`OcrJapaneseImage` test to use new utilities. Updated
`Tests.csproj` to include the JSON file. Refactored to enable
language-specific OCR logic.
Improved Japanese text processing and OCR handling, including:
- Fixed typos in Japanese test strings for accuracy.
- Implemented `GetTextFromJaWordBorders` for line grouping.
- Added `ProcessLineToString` for furigana and main text formatting.
- Refactored sorting logic with LINQ for modularity.
- Enhanced post-processing to merge single-character lines.
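
The line grouping and single-character merge listed above could look roughly like the sketch below. It is written in Python purely for illustration and is not the PR's C# code: `WordBox`, `group_into_lines`, `merge_single_char_lines`, and the `tolerance` parameter are hypothetical names, and the rule "a one-character line is appended to the line before it" is an assumption about what the merge does.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical stand-in for a word border: text plus top-left position."""
    text: str
    left: float
    top: float


def group_into_lines(words, tolerance=10.0):
    """Group boxes whose tops lie within `tolerance` of the line's first box."""
    lines = []
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        if lines and abs(word.top - lines[-1][0].top) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Read each line left to right and join without spaces (Japanese text).
    return ["".join(w.text for w in sorted(line, key=lambda w: w.left))
            for line in lines]


def merge_single_char_lines(lines):
    """Append any one-character line to the line before it (assumed rule)."""
    merged = []
    for line in lines:
        if merged and len(line) == 1:
            merged[-1] += line
        else:
            merged.append(line)
    return merged


words = [
    WordBox("日本", 0, 0),
    WordBox("語", 40, 2),
    WordBox("の", 0, 30),
    WordBox("テキスト", 0, 60),
]
lines = group_into_lines(words)
print(lines)                           # three lines, one of them a lone "の"
print(merge_single_char_lines(lines))  # the lone "の" joins the previous line
```

A fixed pixel tolerance is the simplest possible grouping criterion; the actual `GetTextFromJaWordBorders` may well use box heights or overlap instead.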

Added a new Bash command in `settings.local.json` to run `dotnet test` with detailed logging and filtering.
Updated `multiLineInput` in `StringMethodTests.cs` to test edge cases with trailing spaces and adjusted an `[InlineData]` test case. Added `GetTextFromOcrResult` to `PostOcrUtilities.cs` for processing OCR results into `WordBorderInfo` objects, leveraging existing logic in `GetTextFromWordBorderInfo`.
@TheJoeFin added the enhancement (New feature or request) and General Processing (Relating to the processing of images to some type of text output) labels on Nov 10, 2025
Reversed the space-joining condition in `OcrUtilities` to adjust text processing logic. Removed the unused `Windows.Media.Ocr` namespace from `OcrTests`.
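
The PR text does not show the condition itself, so the following is only an illustration of why its direction matters when Japanese and space-delimited text pass through the same code. It is a Python sketch, not the C# in `OcrUtilities`; `is_japanese_char`, `join_words`, and the chosen Unicode ranges are assumptions made for the example.

```python
def is_japanese_char(ch):
    """True for Hiragana, Katakana, or CJK Unified Ideographs (assumed ranges)."""
    code = ord(ch)
    return (0x3040 <= code <= 0x309F      # Hiragana
            or 0x30A0 <= code <= 0x30FF   # Katakana
            or 0x4E00 <= code <= 0x9FFF)  # CJK Unified Ideographs


def join_words(words):
    """Join words, inserting a space only when neither neighbor is Japanese."""
    result = ""
    for word in words:
        if not word:
            continue  # skip empty OCR fragments
        touches_japanese = is_japanese_char(result[-1]) if result else False
        touches_japanese = touches_japanese or is_japanese_char(word[0])
        if result and not touches_japanese:
            result += " "
        result += word
    return result


print(join_words(["Hello", "world"]))
print(join_words(["日本", "語"]))
```

With the check inverted, spaces would land between Japanese words and disappear between Latin ones, which is the kind of symptom a reversed condition would produce.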
Base automatically changed from `dev` to `main` on November 16, 2025 07:05