feat: convert tables with merged cells to HTML with rowspan/colspan #126

Closed
vnixx wants to merge 3 commits into whale4113:main from vnixx:main

Conversation

vnixx commented Mar 24, 2026

  • Tables with merged cells (rowspan/colspan) in Lark documents are now exported as HTML <table> instead of GFM markdown tables, which cannot represent cell merges.
  • Merge info is resolved from the DOM by reading rowspan/colspan attributes and display: none style on <td> elements, with a fallback that scrolls off-screen tables into view via locateBlockWithRecordId (same pattern as transformMentionUsers).
[Screenshots: Lark | Before | After]

…/colspan HTML output

GFM markdown tables cannot represent merged cells. When a Lark table
contains rowspan/colspan merges, it is now converted to an HTML <table>
with proper attributes instead.

Merge info is resolved by reading rowspan/colspan from the rendered DOM
<td> elements. For tables not in the viewport (virtual scrolling), the
extension scrolls them into view via locateBlockWithRecordId before
reading — following the same pattern as transformMentionUsers.
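
A minimal sketch of the HTML output step described in this commit message, assuming the merge info has already been resolved into a per-cell rowSpan/colSpan. The `Cell` type and `renderMergedTable` are hypothetical names for illustration and are not taken from this PR's code.

```typescript
// Hypothetical cell shape for illustration; not the extension's actual types.
interface Cell {
  html: string; // cell content, already converted and escaped
  rowSpan: number; // 1 means no vertical merge
  colSpan: number; // 1 means no horizontal merge
}

// Render a rectangular grid as an HTML table. A merged region is described by
// its top-left cell; every other position it covers is skipped in the output.
function renderMergedTable(grid: Cell[][]): string {
  const covered = new Set<string>();
  const rows = grid.map((row, r) => {
    const cells: string[] = [];
    row.forEach((cell, c) => {
      if (covered.has(`${r},${c}`)) return;
      // Mark the other positions inside this cell's span as covered.
      for (let dr = 0; dr < cell.rowSpan; dr++) {
        for (let dc = 0; dc < cell.colSpan; dc++) {
          if (dr !== 0 || dc !== 0) covered.add(`${r + dr},${c + dc}`);
        }
      }
      const attrs =
        (cell.rowSpan > 1 ? ` rowspan="${cell.rowSpan}"` : "") +
        (cell.colSpan > 1 ? ` colspan="${cell.colSpan}"` : "");
      cells.push(`<td${attrs}>${cell.html}</td>`);
    });
    return `<tr>${cells.join("")}</tr>`;
  });
  return `<table><tbody>${rows.join("")}</tbody></table>`;
}
```

Skipping covered positions is what a GFM table cannot express: every GFM row must contain the same number of cells, so a merged region has no representation there.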
vnixx marked this pull request as draft March 24, 2026 09:26
…ad of markdown

Markdown syntax like **bold** inside <td> tags is not parsed by
renderers. Convert cell content to HTML (e.g. <strong>) via the
mdast-to-hast-to-html pipeline so formatting is preserved.
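
To illustrate why cell content has to be serialized as HTML, here is a simplified stand-in for the pipeline mentioned in this commit message. It does not use the real mdast/hast libraries; the `Inline` node type below is a hand-written subset of mdast-style inline nodes, and `inlineToHtml` is a hypothetical helper.

```typescript
// Minimal mdast-like inline nodes (a simplified subset defined here for
// illustration; the real pipeline would use the mdast/hast utilities).
type Inline =
  | { type: "text"; value: string }
  | { type: "inlineCode"; value: string }
  | { type: "strong"; children: Inline[] }
  | { type: "emphasis"; children: Inline[] };

// Escape characters that would otherwise be interpreted as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize inline nodes to HTML so formatting survives inside a <td>.
function inlineToHtml(nodes: Inline[]): string {
  return nodes
    .map((node): string => {
      switch (node.type) {
        case "text":
          return escapeHtml(node.value);
        case "inlineCode":
          return `<code>${escapeHtml(node.value)}</code>`;
        case "strong":
          return `<strong>${inlineToHtml(node.children)}</strong>`;
        case "emphasis":
          return `<em>${inlineToHtml(node.children)}</em>`;
      }
    })
    .join("");
}
```

With this approach a cell that would have been written as `**bold**` in Markdown is emitted as `<strong>bold</strong>`, which renderers display correctly inside a raw HTML table.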
whale4113 marked this pull request as ready for review March 24, 2026 10:27
…yout

When a merged table cell contains multiple consecutive images, wrap them
in a <table><thead><tr><th> structure for horizontal display, matching
the behavior of the original plugin's invalid table HTML output.
Author

vnixx commented Mar 25, 2026

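Hi, thank you for this excellent plugin. It is the best tool I have found for converting Feishu (Lark) documents to Markdown.
While using it, I found that tables containing merged cells could not be exported in their original layout, so I tried adding support for them. After some manual testing the feature seems to work, but I know very little about front-end development and the code was written by AI. I would appreciate a review; if there are problems, please let me know so I can fix them, or feel free to close the PR.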

whale4113 added a commit that referenced this pull request Mar 28, 2026
@whale4113
Owner

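Hi @vnixx, thank you for contributing code to this project, and for taking the time to dig into merged cells, which is a fairly complex problem! Judging from the screenshot comparison, the result looks great 👍
However, building on the idea in this PR, I have already implemented the same feature in a slightly different way; the logic is in 53e1dff. Here is why I did not merge this PR directly: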

Main issue: dependence on the DOM

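The core logic of this PR is resolveMergedTablesFromDom, which obtains merge information by querying the rowspan/colspan attributes of <td> elements in the DOM and, when a table is outside the viewport, triggers scrolling via locateBlockWithRecordId before reading. This approach works, but it carries a few risks: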

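  • Fragility: it depends on the DOM structure that Feishu renders. If the Feishu front end changes how it renders (for example, a different virtual-scrolling strategy or a renamed data-block-id attribute), this logic could fail silently.
  • Timing dependency: it needs waitForFunction plus a timeout to wait for the DOM to be ready, which adds nondeterminism and potential performance overhead.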

A better data source

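Feishu already provides merge_info directly in the TableBlock snapshot (including row_span and col_span). This is data-layer information: it does not depend on DOM rendering and does not require waiting for scrolling.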
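
As a rough sketch of reading merge data from the snapshot instead of the DOM: only the names merge_info, row_span and col_span come from this comment. The flat row-major layout, the `column_size` field, and the helper names below are assumptions made for illustration, and the real snapshot shape may differ.

```typescript
// Assumed snapshot fragment. merge_info, row_span and col_span are named in
// the comment above; column_size and the row-major layout are hypothetical.
interface MergeInfo {
  row_span: number;
  col_span: number;
}
interface TableSnapshot {
  column_size: number;
  merge_info: MergeInfo[]; // assumed: one entry per cell, row by row
}

// Decide whether the table needs HTML output (true) or can stay GFM (false).
function hasMergedCells(snapshot: TableSnapshot): boolean {
  return snapshot.merge_info.some((m) => m.row_span > 1 || m.col_span > 1);
}

// Split the flat merge_info list into rows of column_size entries.
function toSpanGrid(snapshot: TableSnapshot): MergeInfo[][] {
  if (snapshot.column_size <= 0) {
    throw new Error("column_size must be positive");
  }
  const grid: MergeInfo[][] = [];
  for (let i = 0; i < snapshot.merge_info.length; i += snapshot.column_size) {
    grid.push(snapshot.merge_info.slice(i, i + snapshot.column_size));
  }
  return grid;
}
```

Because this reads plain data, it needs no scrolling and no waiting, which is the advantage over the DOM-based approach that the comment describes.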

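Thanks again for finding and reporting this problem. Your PR was a very useful reference, especially the problem analysis and the screenshot comparison. If you find new bugs or have other ideas for improvement, you are welcome to open more issues or PRs!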

Author

vnixx commented Mar 31, 2026

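@whale4113 Sounds good. I also felt my implementation had hidden risks, but I did not understand them well enough; your implementation is much more reassuring!! Thanks~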

vnixx closed this Mar 31, 2026