Open
31 commits
ec48fcb
Improve performance by parsing HtmlColor with memory span
onizet Sep 28, 2024
a5048cd
Improve regex to avoid a call to HtmlDecode
onizet Sep 28, 2024
eabbce6
Unit and Margin also now use memory span
onizet Sep 28, 2024
1dff57b
Bug fix on Span polyfill method
onizet Sep 30, 2024
a1cd67e
Reuse existing .Net framework implementation of HtmlDecode if available
onizet Sep 30, 2024
b523d8b
Fix sln markup
onizet Sep 30, 2024
e79b024
Handle case sensitiveness on unit metric
onizet Oct 5, 2024
8ce2cec
Use regex compiled for performance boost
onizet Nov 11, 2024
ca5dff9
Set a timeout for regex
onizet Dec 10, 2024
129bebb
Optimise parsing of Style attributes
onizet Dec 11, 2024
66834d8
Ensure to catch regex timeout exception
onizet Dec 12, 2024
4f516aa
Prefer usage of await and async method
onizet Dec 12, 2024
19c263f
Optimise parsing of Style attributes
onizet Dec 14, 2024
07e3f2e
Update changelog
onizet Dec 14, 2024
2bd74d3
Add Benchmark project
onizet Dec 14, 2024
83851e9
Improve unit tests
onizet Dec 14, 2024
7a5322b
Allow to run multiple runtimes benchmarks
onizet Dec 14, 2024
52a3610
Include the csproj in the sln
onizet Dec 14, 2024
04e9302
Remove neat support for SearchValues as performant is worst than Read…
onizet Dec 14, 2024
2b14315
Merge branch 'dev' into memory-span
onizet Dec 16, 2024
dbfb53e
Merge branch 'memory-span' of github.com:onizet/html2openxml into mem…
onizet Dec 16, 2024
89f4e5f
Fix compilation error
onizet Dec 16, 2024
e62d833
Use FrozenDictionary for better performance (thanks Graham!)
onizet Dec 16, 2024
5dc31f7
Decrease warnings
onizet Dec 16, 2024
03bd94d
Code simplification
onizet Dec 20, 2024
9ec8636
Rewrite parsing to improve code readability
onizet Dec 21, 2024
935da6c
Do not allocate new array in a loop
onizet Jan 6, 2025
af8711e
AngleSharp update
onizet Jan 6, 2025
9b8f240
Ensure to trim input before parsing the color
onizet Jan 10, 2025
436a600
Use benefits of FrozenDictionary only for net8, no cumbersome net462 …
onizet Jan 10, 2025
3d9d9b9
Merge develop
onizet Nov 12, 2025
38 changes: 38 additions & 0 deletions .github/workflows/publish_nuget.yml
@@ -0,0 +1,38 @@
# This workflow builds, packs and publishes the solution to NuGet

name: 'publish_nuget.yml'

on:
  push:
    tags:
      - '[0-9]+.[0-9]+'
    branches:
      - master

jobs:
  publish:
    if: startsWith(github.ref, 'refs/tags/') && github.ref_type == 'tag' && github.ref_name != ''
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Full history is needed so that origin/master exists for the ancestry check
          fetch-depth: 0
      - name: Ensure tag is on master
        run: |
          TAG_COMMIT=$(git rev-list -n 1 ${{ github.ref_name }})
          if ! git merge-base --is-ancestor $TAG_COMMIT origin/master; then
            echo "Tag is not on master branch. Skipping publish."
            exit 1
          fi
      - name: Setup .NET 8.
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'
      - name: Build
        run: dotnet build --configuration Release
      - name: Pack
        run: dotnet pack src/Html2OpenXml/HtmlToOpenXml.csproj --configuration Release --output ./nupkg
      - name: Push nuget to NuGet.org
        run: dotnet nuget push ./nupkg/*.nupkg --api-key $NUGET_API_KEY -s https://api.nuget.org/v3/index.json
        env:
          NUGET_API_KEY: ${{ secrets.NUGET_API_KEY }}
      - name: Create Release and Upload Artifact to Release
        run: gh release create ${{github.ref_name}} -t "Release ${{github.ref_name}}" ./nupkg/*.nupkg --generate-notes --draft
        env:
          # The GitHub CLI reads its token from GH_TOKEN in Actions
          GH_TOKEN: ${{ github.token }}
4 changes: 2 additions & 2 deletions .vscode/settings.json
@@ -1,9 +1,9 @@
{
"omnisharp.organizeImportsOnFormat": true,
"dotnet.completion.showCompletionItemsFromUnimportedNamespaces": false,
"coverage-gutters.coverageFileNames":[
"coverage.info"
],
"coverage-gutters.showGutterCoverage": false,
"coverage-gutters.showLineCoverage": true
"coverage-gutters.showLineCoverage": true,
"dotnet.formatting.organizeImportsOnFormat": true
}
50 changes: 50 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,55 @@
# Changelog

## 3.3.0

- Rewrite parsing to use System.Span instead of Regex
- Set a timeout on the remaining regexes to mitigate ReDoS (regex denial-of-service) attacks

## 3.2.8

- Fix a fatal crash when trying to convert multiple images #215
- New feature to reference external images instead of embedding them #216
- Fix a potential issue on image streams that are disposed too early.
- Support table `col` with a percentage width #206

## 3.2.7

- Fix handling Uri with an anchor #209
- New option DefaultStyles.NumberedHeadingStyle to support an alternate heading style #210

## 3.2.6

(wrong packaging, same code as 3.2.5)

## 3.2.5

- Fix a crash with the new whitespace handling introduced in 3.2.3 #191
- Fix a crash when the HTML contains 2 images with an identical source path #193
- Support margin auto for table alignment #194
- Fix handling whitespace between runs #195
- Whitelist more MIME types as specified by the IANA standard #196
- Support EMF files #196
- Correct handling of `figcaption` (allow nested phrasings) #197
- Numbered lists now support the type attribute `<ol type="1|a|A|i|I">` #198
- Always restart nested numbered lists #198
- Fix table borders being removed even when the specified word table style has borders #199
- Defensive code when a downloaded image stream is truncated #201
- Table inside list is constrained to not exceed page margin #202
- Table now supports width:auto for auto-fit content #202

## 3.2.4

- Fix a crash with the new whitespace handling introduced in 3.2.3 #191
- Table inside list must be aligned with the list item #192

## 3.2.3

- Improve support of table alignment #187
- Fix a crash if a span is empty
- Heading with only digits should not be considered as a numbering #189
- Fix whitespaces inserted between spans #179 and #185
- Support percentage size (typically width:100%) for img node #188

## 3.2.2

- Supports a feature to disable heading numbering #175
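The 3.3.0 changelog entries combine two techniques: parsing with spans instead of Regex, and setting a timeout on the regexes that remain. The following C# sketch illustrates both under stated assumptions; it is not the library's code. `TryParseHexColor`, `TryMatchStyle`, the style pattern and the 100 ms timeout are hypothetical. It assumes .NET Core 2.1 or later (for example .NET 8), where `int.TryParse` accepts a `ReadOnlySpan<char>`; on .NET Framework 4.8 that overload is not built in and would need a polyfill.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical pattern and timeout, for illustration only. A match running longer
// than the timeout throws RegexMatchTimeoutException instead of blocking the
// thread on a pathological (ReDoS-style) input.
var styleRegex = new Regex(@"([\w-]+)\s*:\s*([^;]+)",
    RegexOptions.Compiled, TimeSpan.FromMilliseconds(100));

Console.WriteLine(TryParseHexColor(" #ff8000 ", out int color) ? color.ToString("X6") : "invalid"); // prints FF8000
Console.WriteLine(TryMatchStyle("color: red", out Match m) ? m.Groups[1].Value : "no match");       // prints color

// Hypothetical helper: parses "#rgb" or "#rrggbb" by slicing a span,
// so no substring is allocated along the way.
static bool TryParseHexColor(ReadOnlySpan<char> input, out int rgb)
{
    rgb = 0;
    input = input.Trim(); // trim before parsing
    if (input.Length == 0 || input[0] != '#') return false;
    ReadOnlySpan<char> hex = input.Slice(1);

    if (hex.Length == 6)
        return int.TryParse(hex, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out rgb);

    if (hex.Length == 3)
    {
        // Shorthand form: each digit is doubled, so "#0f0" means "#00ff00".
        int value = 0;
        foreach (char c in hex)
        {
            int n = HexDigit(c);
            if (n < 0) return false;
            value = (value << 8) | (n << 4) | n;
        }
        rgb = value;
        return true;
    }
    return false;
}

// Returns the value of a single hex digit, or -1 if the char is not a hex digit.
static int HexDigit(char c) =>
    c >= '0' && c <= '9' ? c - '0' :
    c >= 'a' && c <= 'f' ? c - 'a' + 10 :
    c >= 'A' && c <= 'F' ? c - 'A' + 10 : -1;

// Hypothetical helper: a timed-out match is reported as "no match"
// instead of letting the exception escape.
bool TryMatchStyle(string input, out Match match)
{
    try
    {
        match = styleRegex.Match(input);
        return match.Success;
    }
    catch (RegexMatchTimeoutException)
    {
        match = Match.Empty;
        return false;
    }
}
```

Returning `false` on a timeout treats the input as unparseable, so one hostile attribute does not abort a whole conversion; rethrowing is the alternative when callers must know that a timeout occurred.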
66 changes: 37 additions & 29 deletions HtmlToOpenXml.sln
@@ -13,34 +13,42 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Demo", "examples\Demo\Demo.
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "HtmlToOpenXml.Tests", "test\HtmlToOpenXml.Tests\HtmlToOpenXml.Tests.csproj", "{CA0A68E0-45A0-4A01-A061-F951D93D6906}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Benchmark", "examples\Benchmark\Benchmark.csproj", "{143A3684-FAEB-43D0-A895-09BE5FDF85F6}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.Build.0 = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.Build.0 = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.Build.0 = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.ActiveCfg = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514} = {58520A98-BA53-4BA4-AAE3-786AA21331D6}
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{CA0A68E0-45A0-4A01-A061-F951D93D6906} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {14EE1026-6507-4295-9FEE-67A55C3849CE}
EndGlobalSection
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.Build.0 = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.Build.0 = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.Build.0 = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.ActiveCfg = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.Build.0 = Release|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Debug|Any CPU.Build.0 = Debug|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Release|Any CPU.ActiveCfg = Release|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514} = {58520A98-BA53-4BA4-AAE3-786AA21331D6}
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{CA0A68E0-45A0-4A01-A061-F951D93D6906} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{143A3684-FAEB-43D0-A895-09BE5FDF85F6} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {14EE1026-6507-4295-9FEE-67A55C3849CE}
SolutionGuid = {194D4CBE-A20A-4E32-967B-E1BBD3922C29}
EndGlobalSection
EndGlobal
21 changes: 21 additions & 0 deletions examples/Benchmark/Benchmark.csproj
@@ -0,0 +1,21 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;net8.0</TargetFrameworks>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <SonarQubeExclude>true</SonarQubeExclude>
    <LangVersion>latest</LangVersion>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.14.0" />
    <ProjectReference Include="..\..\src\Html2OpenXml\HtmlToOpenXml.csproj" />
  </ItemGroup>

  <ItemGroup>
    <EmbeddedResource Include="*.html" />
  </ItemGroup>

</Project>
35 changes: 35 additions & 0 deletions examples/Benchmark/Benchmarks.cs
@@ -0,0 +1,35 @@
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;

[MemoryDiagnoser]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net48)]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net80, baseline: true)]
public class Benchmarks
{
    [Benchmark]
    public async Task ParseWithSpan()
    {
        string html = ResourceHelper.GetString("benchmark.html");

        using (MemoryStream generatedDocument = new MemoryStream())
        using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            MainDocumentPart? mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.RenderPreAsTable = true;

            await converter.ParseBody(html);
            mainPart.Document.Save();
        }
    }
}
3 changes: 3 additions & 0 deletions examples/Benchmark/Program.cs
@@ -0,0 +1,3 @@
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();
7 changes: 7 additions & 0 deletions examples/Benchmark/README.md
@@ -0,0 +1,7 @@
# Benchmarks

To run the benchmark tool, first build the project: `dotnet build -c Release`

Then run the performance tests against multiple runtimes:
`dotnet run -c Release -f net8.0 --runtimes net48 net8.0`
40 changes: 40 additions & 0 deletions examples/Benchmark/ResourceHelper.cs
@@ -0,0 +1,40 @@
/*
 * Copyright (c) 2017 Deal Stream sàrl. All rights reserved
 */
using System.IO;
using System.Reflection;
using System.Resources;

/// <summary>
/// Helper class to read embedded resources.
/// </summary>
public static class ResourceHelper
{
    public static string GetString(string resourceName)
    {
        return GetString(typeof(ResourceHelper).GetTypeInfo().Assembly, resourceName);
    }

    public static string GetString(Assembly assembly, string resourceName)
    {
        using (var stream = GetStream(assembly, resourceName))
        {
            using (var reader = new StreamReader(stream))
                return reader.ReadToEnd();
        }
    }

    public static Stream GetStream(string resourceName)
    {
        return GetStream(typeof(ResourceHelper).GetTypeInfo().Assembly, resourceName);
    }

    public static Stream GetStream(Assembly assembly, string resourceName)
    {
        var stream = assembly.GetManifestResourceStream(assembly.GetName().Name + "." + resourceName);
        if (stream == null)
            throw new MissingManifestResourceException($"Requested resource `{resourceName}` was not found in the assembly `{assembly}`.");

        return stream;
    }
}