Better Japanese post-processing #570

TheJoeFin · 2025-11-10T15:52:16Z

Add more testing
Add better handling of spaces

Enhanced OCR framework to support Japanese text processing. Introduced `PostOcrUtilities` for modular post-OCR handling. Added `ja-word-borders.json` for test data and updated `OcrJapaneseImage` test to use new utilities. Updated `Tests.csproj` to include the JSON file. Refactored to enable language-specific OCR logic.

Improved Japanese text processing and OCR handling, including:
- Fixed typos in Japanese test strings for accuracy.
- Implemented `GetTextFromJaWordBorders` for line grouping.
- Added `ProcessLineToString` for furigana and main text formatting.
- Refactored sorting logic with LINQ for modularity.
- Enhanced post-processing to merge single-character lines.

Added a new Bash command in `settings.local.json` to run `dotnet test` with detailed logging and filtering.
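For readers unfamiliar with the approach, the line grouping and single-character merge described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `WordBox`, `JaLineGroupingSketch`, `GroupIntoLines`, and `MergeSingleCharacterLines` are hypothetical names, the half-line-height tolerance is an assumed threshold, and the sketch assumes horizontal text and ignores furigana.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one recognized word and its bounding box;
// not the project's actual WordBorderInfo type.
public record WordBox(string Text, double Left, double Top, double Height);

public static class JaLineGroupingSketch
{
    // Groups boxes into lines: a box joins the current line when its vertical
    // center is within half a line height of that line's first box (assumed threshold).
    public static List<string> GroupIntoLines(IEnumerable<WordBox> boxes)
    {
        List<List<WordBox>> lines = new();

        foreach (WordBox box in boxes.OrderBy(b => b.Top).ThenBy(b => b.Left))
        {
            if (lines.Count > 0)
            {
                List<WordBox> current = lines[lines.Count - 1];
                WordBox anchor = current[0];
                double center = box.Top + box.Height / 2;
                double anchorCenter = anchor.Top + anchor.Height / 2;
                if (Math.Abs(center - anchorCenter) <= anchor.Height / 2)
                {
                    current.Add(box);
                    continue;
                }
            }
            lines.Add(new List<WordBox> { box });
        }

        // Japanese is written without spaces, so words are concatenated directly,
        // ordered left to right within each line.
        return lines
            .Select(l => string.Concat(l.OrderBy(b => b.Left).Select(b => b.Text)))
            .ToList();
    }

    // Appends any single-character line to the line before it.
    // Note: Length counts UTF-16 code units, so characters outside the BMP are not merged.
    public static List<string> MergeSingleCharacterLines(IEnumerable<string> lines)
    {
        List<string> merged = new();
        foreach (string line in lines)
        {
            if (line.Length == 1 && merged.Count > 0)
                merged[merged.Count - 1] += line;
            else
                merged.Add(line);
        }
        return merged;
    }
}
```

Anchoring each line on its first box keeps the sketch short; a production version would likely need to handle vertical text and furigana rows, which the PR addresses separately in `ProcessLineToString`.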

Updated `multiLineInput` in `StringMethodTests.cs` to test edge cases with trailing spaces and adjusted an `[InlineData]` test case. Added `GetTextFromOcrResult` to `PostOcrUtilities.cs` for processing OCR results into `WordBorderInfo` objects, leveraging existing logic in `GetTextFromWordBorderInfo`.

Reversed space-joining condition in `OcrUtilities` to adjust text processing logic. Removed unused `Windows.Media.Ocr` namespace from `OcrTests`.
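The description does not say what the space-joining condition checks, so the following is only a sketch of the kind of decision involved, under the assumption that adjacent Japanese words are joined without a space while other words keep one. `SpaceJoiningSketch`, `IsJapaneseChar`, and `JoinWords` are hypothetical names and do not come from `OcrUtilities`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SpaceJoiningSketch
{
    // True for Hiragana, Katakana, and CJK Unified Ideographs (BMP ranges only).
    public static bool IsJapaneseChar(char c) =>
        (c >= '\u3040' && c <= '\u30FF') || (c >= '\u4E00' && c <= '\u9FFF');

    // Joins words, inserting a space only when the characters on either side
    // of the boundary are not both Japanese (assumed rule for this sketch).
    public static string JoinWords(IReadOnlyList<string> words)
    {
        StringBuilder sb = new();
        foreach (string word in words)
        {
            if (word.Length == 0)
                continue;

            if (sb.Length > 0)
            {
                bool bothJapanese = IsJapaneseChar(sb[sb.Length - 1]) && IsJapaneseChar(word[0]);
                if (!bothJapanese)
                    sb.Append(' ');
            }
            sb.Append(word);
        }
        return sb.ToString();
    }
}
```

With a check like this, accidentally inverting the condition would put spaces inside Japanese text and strip them from Latin text, which is the sort of error a reversed condition would correct.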

TheJoeFin added 5 commits November 9, 2025 23:02

Add test image and new CJK test

f44e816

use new Method for getting text

1a9bf09

TheJoeFin added the enhancement (New feature or request) and General Processing (Relating to the processing of images to some type of text output) labels Nov 10, 2025

Refactor OCR logic and remove unused namespace

e6f4afe

Reversed space-joining condition in `OcrUtilities` to adjust text processing logic. Removed unused `Windows.Media.Ocr` namespace from `OcrTests`.

Base automatically changed from dev to main November 16, 2025 07:05

2 participants
