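High-performance C++17 library for Unicode and multi-encoding conversion
180+ encodings · SIMD acceleration · lock-free concurrency · multi-core parallel batch processing
UniConv combines the encoding breadth of GNU libiconv (180+ character sets) with a modern high-performance C++ stack: optional simdutf SIMD acceleration, a lock-free concurrent descriptor cache, zero-copy I/O, and adaptive multi-core batch processing.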
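- 180+ encodings: full coverage of UTF-8/16/32, GBK, Shift_JIS, ISO-8859, EBCDIC, and more
- SIMD acceleration: optional simdutf integration; UTF-8/UTF-16 conversion in both directions runs at 3.3 to 5.1 GB/s (4x to 12.5x speedup)
- Multi-core parallel batch processing: converting 4096 items in parallel reaches 10.76 GB/s (17.7x speedup)
- Lock-free concurrent cache: parallel-hashmap with 4-way parallelism plus O(1) LRU eviction
- Zero-copy I/O: string_view input passed straight to iconv, zero-copy output through BufferLease, and iconv writing directly into std::string
- Thread safety: every public API can be called directly from multiple threads
- Cross-platform: Windows / Linux / macOS, available through vcpkg, FetchContent, or vendored source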
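To make the "O(1) LRU eviction" item concrete, here is a minimal single-threaded sketch of the classic list-plus-hash-map technique. It is an illustration under stated assumptions, not UniConv's cache: the `LruCache` class, its `Get`/`Put` methods, and the integer value standing in for an iconv descriptor are all hypothetical, and the lock-free concurrency that parallel-hashmap provides is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only: not UniConv's actual cache.
// Keys look like "GBK->UTF-8"; the int stands in for a conversion descriptor.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0 && "capacity must be positive");
    }

    // Returns the cached value and marks the entry as most recently used.
    std::optional<int> Get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // splice relinks the node to the front in O(1); iterators stay valid.
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    // Inserts or updates an entry, evicting the least recently used one when full.
    void Put(const std::string& key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);  // back = least recently used
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
    }

    std::size_t Size() const { return order_.size(); }

private:
    using Entry = std::pair<std::string, int>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Both the lookup and the eviction touch a constant number of list nodes and one hash-map slot (average case), which is where the O(1) bound comes from.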
Measured on Apple Silicon (10 cores, ARM64) with a Release build. Full results are in BENCHMARK.md.
Throughput tiers (UTF-8 → UTF-16, 1 MB, single-threaded):
simdutf (AVX-512)       ████████████████████████ 12 GB/s
UniConv+simdutf (NEON)  ███████████ 3.6 GB/s     ← SIMD enabled
─── SIMD ─────────────────────────────────────
UniConv (iconv)         ███ 900 MB/s             ← top scalar tier
glibc iconv             ██ 700 MB/s
ICU ucnv                ██ 500 MB/s
─── Scalar ───────────────────────────────────
UniConv (parallel batch) ██████████████████████ 10.76 GB/s ← multi-core parallel
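The parallel batch figure depends on dividing a batch among worker threads. As a rough illustration of what "adaptive" splitting could look like, the sketch below partitions a batch into contiguous ranges and uses fewer workers when the batch is small. `SplitBatch`, its parameters, and the `min_per_worker` threshold are hypothetical names chosen for this example; UniConv's actual strategy is not documented here and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits `count` items into contiguous [begin, end) ranges.
// `min_per_worker` keeps small batches on fewer workers so that thread
// start-up cost does not outweigh the conversion work.
std::vector<std::pair<std::size_t, std::size_t>>
SplitBatch(std::size_t count, std::size_t max_workers, std::size_t min_per_worker) {
    assert(max_workers > 0 && min_per_worker > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (count == 0) return ranges;

    // Use no more workers than the batch can keep busy, and at least one.
    const std::size_t workers =
        std::min(max_workers, std::max<std::size_t>(1, count / min_per_worker));
    const std::size_t base = count / workers;
    const std::size_t extra = count % workers;  // first `extra` ranges get one more item

    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```

Each range could then be handed to its own thread, with every thread converting its slice of the input into the matching slice of the output, so no two threads write to the same element.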
| Scenario | Throughput / latency |
|---|---|
| UTF-8 → UTF-16LE (iconv) | 895 MB/s |
| UTF-8 → UTF-16LE (simdutf) | 3.62 GB/s |
| ASCII text (simdutf) | 5.62 GB/s |
| Parallel batch, 4096 items | 10.76 GB/s |
| ThreadLocal() instance lookup | 0.75 ns |
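One plausible reason a per-thread instance lookup can be this cheap is that a function-local `thread_local` object needs no lock: after the first call on a thread, access is typically an initialization check plus a thread-local address computation. The sketch below shows that general C++ pattern. The `Converter` class and its members are hypothetical stand-ins, not the UniConv class, and whether UniConv implements `ThreadLocal()` this way is an assumption.

```cpp
#include <cstddef>

// Hypothetical stand-in for a converter; not the UniConv class.
class Converter {
public:
    // One instance per thread, constructed lazily on first use in that thread.
    static Converter& ThreadLocal() {
        thread_local Converter instance;
        return instance;
    }

    // Per-thread state needs no locking: no other thread can reach this instance
    // through ThreadLocal().
    void Touch() { ++calls_; }
    std::size_t Calls() const { return calls_; }

private:
    Converter() = default;
    std::size_t calls_ = 0;
};
```

Repeated calls on the same thread return the same object, so any per-thread caches or scratch buffers held by the instance are reused across conversions.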
Installation
vcpkg install hesphoros-uniconv # vcpkg
# Or with CMake FetchContent:
# FetchContent_Declare(UniConv GIT_REPOSITORY https://github.com/hesphoros/UniConv.git GIT_TAG main)
# FetchContent_MakeAvailable(UniConv)
# target_link_libraries(your_target PRIVATE UniConv)
Usage
#include <UniConv/UniConv.h>
#include <string>
#include <string_view>
#include <vector>
auto& conv = UniConv::ThreadLocal(); // per-thread instance; lookup takes about 0.75 ns
// Single conversion
std::string output;
conv.ConvertEncoding("中文测试", "GBK", "UTF-8", output);
// With error handling
auto result = conv.ConvertEncodingFast("测试", "GBK", "UTF-8");
if (result.IsSuccess()) { /* use result.GetValue() */ }
// Fastest path (842 MB/s)
ErrorCode err = conv.ConvertEncodingFast("数据", "UTF-8", "UTF-16LE", output);
// Parallel batch processing (4096 items → 10.76 GB/s)
std::vector<std::string> dataset = LoadData(); // LoadData() stands in for your own loader
std::vector<std::string> outputs;
conv.ConvertEncodingBatchParallel(dataset, "GBK", "UTF-8", outputs);
// Zero-copy input via string_view
conv.ConvertEncodingFast(std::string_view{large_buf, len}, "UTF-8", "UTF-16LE", output);
Build
cmake .. -DCMAKE_BUILD_TYPE=Release # standard build
cmake .. -DUNICONV_USE_SIMDUTF=ON # enable SIMD
cmake .. -DUNICONV_BUILD_TESTS=ON # unit tests (Google Test)
cmake .. -DUNICONV_BUILD_BENCHMARKS=ON # performance benchmarks (Google Benchmark)
All 191 tests pass under ASan + UBSan, with zero memory-safety issues.
| Dependency | Purpose | License |
|---|---|---|
| GNU libiconv | Encoding conversion engine | LGPL-2.1+ |
| parallel-hashmap | Lock-free concurrent hash table | Apache-2.0 |
| simdutf (optional) | SIMD UTF acceleration | Apache-2.0 / MIT |
Requirements: C++17 · CMake 3.16+ · Windows / Linux / macOS
This project is released under the MIT License; see LICENSE. The libiconv source code vendored in this repository is licensed under LGPL-2.1+; see COPYING.LIB and THIRD_PARTY_LICENSES.md.