hesphoros/UniConv


UniConv

High-performance C++17 library for Unicode and multi-encoding conversion

180+ encodings · SIMD acceleration · lock-free concurrency · multi-core parallel batch processing


UniConv combines the encoding breadth of GNU libiconv (180+ character sets) with a modern high-performance C++ stack: optional simdutf SIMD acceleration, a lock-free concurrent descriptor cache, zero-copy I/O, and adaptive multi-core batch processing.

Core features

  • 180+ encodings: full coverage of UTF-8/16/32, GBK, Shift_JIS, ISO-8859, EBCDIC, and more
  • SIMD acceleration: optional simdutf integration; UTF-8/16 conversion in either direction at 3.3 to 5.1 GB/s (4x to 12.5x speedup)
  • Multi-core parallel batch processing: converting 4096 items in parallel reaches 10.76 GB/s (17.7x speedup)
  • Lock-free concurrent cache: parallel-hashmap with 4-way parallelism plus O(1) LRU eviction
  • Zero-copy I/O: string_view passed straight to iconv, BufferLease zero-copy output, and iconv writing directly into std::string
  • Thread safety: every public API can be called directly from multiple threads
  • Cross-platform: Windows / Linux / macOS; available via vcpkg, FetchContent, or vendored source
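
To make the "O(1) LRU eviction" item concrete, here is a minimal single-threaded sketch of the usual technique: a doubly linked list keeps entries in recency order and a hash map points at the list nodes, so lookup, promotion, and eviction are all constant time on average. The `LruCache` class and its key format are hypothetical illustrations, not UniConv's actual cache, which the list above describes as lock-free and built on parallel-hashmap.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration: an LRU cache keyed by an encoding-pair string
// such as "GBK->UTF-8". Not thread-safe and not UniConv's own class.
template <typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the cached value, if any, and marks the entry most recently used.
    std::optional<Value> Get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // splice relinks the node in O(1) and keeps iterators valid.
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    // Inserts or updates an entry; evicts the least recently used one when full.
    void Put(const std::string& key, Value value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);  // back = least recently used
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t Size() const { return order_.size(); }

private:
    using Entry = std::pair<std::string, Value>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, typename std::list<Entry>::iterator> index_;
};
```

A descriptor cache built this way would map an encoding pair to an open conversion handle, so repeated conversions between the same two encodings skip the cost of reopening it.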

Performance at a glance

Apple Silicon (10 cores, ARM64), optimized Release build. See BENCHMARK.md for the full data.

Throughput tiers (UTF-8 → UTF-16, 1 MB, single-threaded):

  simdutf (AVX-512)        ████████████████████████  12 GB/s
  UniConv+simdutf (NEON)   ███████████               3.6 GB/s  ← SIMD enabled
  ─── SIMD ─────────────────────────────────────
  UniConv (iconv)          ███                        900 MB/s  ← top of the scalar tier
  glibc iconv              ██                         700 MB/s
  ICU ucnv                 ██                         500 MB/s
  ─── scalar ───────────────────────────────────
  UniConv (parallel batch) ██████████████████████    10.76 GB/s ← multi-core parallel

| Scenario | Throughput |
| --- | --- |
| UTF-8 → UTF-16LE (iconv) | 895 MB/s |
| UTF-8 → UTF-16LE (simdutf) | 3.62 GB/s |
| ASCII text (simdutf) | 5.62 GB/s |
| Parallel batch, 4096 items | 10.76 GB/s |
| ThreadLocal() instance lookup | 0.75 ns |
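
The parallel batch figure comes from spreading independent conversions across cores. As a rough sketch of that idea (not UniConv's scheduler, which the README describes as adaptive), the hypothetical helper below splits a batch into contiguous slices and gives each slice to one `std::thread`. The `ConvertBatchParallel` name and the `convert` callback are assumptions made for illustration; the callback must be safe to call from several threads at once and must not throw.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: applies `convert` to every input on up to
// `worker_count` threads. Each worker owns a contiguous slice of the batch,
// so no two threads ever write the same output element.
inline std::vector<std::string> ConvertBatchParallel(
    const std::vector<std::string>& inputs,
    const std::function<std::string(const std::string&)>& convert,
    std::size_t worker_count = std::thread::hardware_concurrency()) {
    std::vector<std::string> outputs(inputs.size());
    if (inputs.empty()) return outputs;

    // hardware_concurrency() may return 0; never use more workers than items.
    worker_count = std::max<std::size_t>(1, std::min(worker_count, inputs.size()));
    const std::size_t chunk = (inputs.size() + worker_count - 1) / worker_count;
    assert(chunk > 0);

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < inputs.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, inputs.size());
        workers.emplace_back([&inputs, &outputs, &convert, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                outputs[i] = convert(inputs[i]);
            }
        });
    }
    for (auto& worker : workers) worker.join();
    return outputs;
}
```

Static slicing like this works best when items are similar in size; a batch with a few very large items would benefit from dynamic work distribution instead.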

Quick start

Installation

vcpkg install hesphoros-uniconv                            # vcpkg
# Or with CMake FetchContent:
# FetchContent_Declare(UniConv GIT_REPOSITORY https://github.com/hesphoros/UniConv.git GIT_TAG main)
# FetchContent_MakeAvailable(UniConv)
# target_link_libraries(your_target PRIVATE UniConv)

Usage

#include <string>
#include <string_view>
#include <vector>

#include <UniConv/UniConv.h>

auto& conv = UniConv::ThreadLocal();  // instance lookup takes about 0.75 ns

// Single conversion
std::string output;
conv.ConvertEncoding("中文测试", "GBK", "UTF-8", output);

// With error handling
auto result = conv.ConvertEncodingFast("测试", "GBK", "UTF-8");
if (result.IsSuccess()) { /* result.GetValue() */ }

// Fastest path (842 MB/s)
ErrorCode err = conv.ConvertEncodingFast("数据", "UTF-8", "UTF-16LE", output);

// Parallel batch processing (4096 items → 10.76 GB/s)
// LoadData() stands in for your own data loader.
std::vector<std::string> dataset = LoadData(), outputs;
conv.ConvertEncodingBatchParallel(dataset, "GBK", "UTF-8", outputs);

// Zero-copy via string_view (large_buf and len are your existing buffer and its length)
conv.ConvertEncodingFast(std::string_view{large_buf, len}, "UTF-8", "UTF-16LE", output);
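
For readers curious what "string_view passed straight to iconv" and "iconv writing directly into std::string" could look like underneath, the sketch below uses only the standard POSIX `iconv_open` / `iconv` / `iconv_close` calls. The `ConvertWithIconv` helper is hypothetical and is not part of UniConv's API. It reads from the caller's buffer without copying it and lets iconv write into the `std::string`'s own storage, growing that storage when iconv reports `E2BIG`.

```cpp
#include <iconv.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper built on POSIX iconv; not UniConv's implementation.
inline std::string ConvertWithIconv(std::string_view input,
                                    const char* from, const char* to) {
    assert(from != nullptr && to != nullptr);
    iconv_t cd = iconv_open(to, from);  // argument order is (tocode, fromcode)
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("unsupported encoding pair");
    }

    // Initial guess for the output size; grown on demand below.
    std::string output(input.size() * 2 + 16, '\0');

    // iconv takes char** for historical reasons; it does not modify the input.
    char* in_ptr = const_cast<char*>(input.data());
    std::size_t in_left = input.size();
    std::size_t written = 0;

    while (in_left > 0) {
        char* out_ptr = output.data() + written;
        std::size_t out_left = output.size() - written;
        const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        written = output.size() - out_left;
        if (rc == static_cast<std::size_t>(-1)) {
            if (errno == E2BIG) {  // output buffer full: grow it and resume
                output.resize(output.size() * 2);
                continue;
            }
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input sequence");
        }
    }
    // Stateful target encodings would also need a final flush call with a
    // null input pointer; that step is omitted from this sketch.
    iconv_close(cd);
    output.resize(written);
    return output;
}
```

For example, `ConvertWithIconv("Az", "UTF-8", "UTF-16LE")` yields the four bytes `41 00 7A 00`. A production version would cache the descriptor rather than reopening it on every call.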

Build

cmake .. -DCMAKE_BUILD_TYPE=Release                    # standard build
cmake .. -DUNICONV_USE_SIMDUTF=ON                      # enable SIMD
cmake .. -DUNICONV_BUILD_TESTS=ON                      # unit tests (Google Test)
cmake .. -DUNICONV_BUILD_BENCHMARKS=ON                 # benchmarks (Google Benchmark)

All 191 tests pass under ASan + UBSan, with zero memory-safety issues.

Dependencies

| Dependency | Purpose | License |
| --- | --- | --- |
| GNU libiconv | Encoding conversion engine | LGPL-2.1+ |
| parallel-hashmap | Lock-free concurrent hash map | Apache-2.0 |
| simdutf (optional) | SIMD UTF acceleration | Apache-2.0 / MIT |

System requirements: C++17 · CMake 3.16+ · Windows / Linux / macOS

License

This project is licensed under the MIT License; see LICENSE. The libiconv source code vendored in this repository is licensed under LGPL-2.1+; see COPYING.LIB and THIRD_PARTY_LICENSES.md.

About

UniConv: a high-performance C++17 character encoding conversion library with SIMD acceleration (simdutf), an O(1) LRU cache, tiered buffer pools, a lock-free thread pool, and adaptive parallel processing. Supports 100+ encodings, zero-copy conversions, and modern CMake integration.
