fix: preserve HTML tables in Outlook .msg conversion #1596

Open
lavish0000 wants to merge 1 commit into microsoft:main from lavish0000:codex/fix/outlook-html-tables-1567

Conversation

@lavish0000

Closes #1567

Summary

  • prefer the Outlook HTML body streams when converting .msg files
  • run HTML bodies through the existing HTML-to-Markdown path so tables stay tables
  • fall back to the plain-text body when no HTML body is available
  • add regression coverage for both the HTML-body and plain-text fallback paths
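The body-selection order described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in this PR: `pick_body`, `html_table_to_markdown`, and `_TableCollector` are hypothetical names, and the toy table converter only stands in for the project's existing HTML-to-Markdown path, which is not reproduced here.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional


class _TableCollector(HTMLParser):
    """Hypothetical helper: collects the text of <td>/<th> cells, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self.rows:  # ignore cells that appear outside any <tr>
                self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_markdown(html: str) -> str:
    """Stand-in converter (assumption): renders table rows as a Markdown table."""
    parser = _TableCollector()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def pick_body(
    html_body: Optional[str],
    text_body: Optional[str],
    html_to_md: Callable[[str], str] = html_table_to_markdown,
) -> str:
    """Prefer the HTML body; fall back to plain text when no HTML is available."""
    if html_body and html_body.strip():
        return html_to_md(html_body)
    return (text_body or "").strip()
```

With an HTML body present, the table structure survives as Markdown; with no HTML body, the plain-text body is returned unchanged apart from trimming. Passing the converter as a parameter keeps the selection logic easy to test separately from the conversion itself.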

Testing

  • hatch test tests/test_outlook_msg_converter.py 'tests/test_module_vectors.py::test_guess_stream_info[test_vector4]' 'tests/test_module_vectors.py::test_convert_local[test_vector4]' 'tests/test_module_vectors.py::test_convert_stream_with_hints[test_vector4]' 'tests/test_module_vectors.py::test_convert_stream_without_hints[test_vector4]' 'tests/test_module_vectors.py::test_convert_file_uri[test_vector4]' 'tests/test_module_vectors.py::test_convert_data_uri[test_vector4]'
  • uvx black --check packages/markitdown/src/markitdown/converters/_outlook_msg_converter.py packages/markitdown/tests/test_outlook_msg_converter.py
  • hatch run types:check src/markitdown/converters/_outlook_msg_converter.py tests/test_outlook_msg_converter.py

Development

Successfully merging this pull request may close these issues.

[outlook]: HTML Tables in outlook files
