Commit 5f47430

feat: update report
1 parent 2961265 commit 5f47430

1 file changed: +166 -3 lines changed

tts_tech_report/index.html

Lines changed: 166 additions & 3 deletions
@@ -82,6 +82,9 @@ <h2>Contents</h2>
8282
<li><a href="#flow-vae-vs-vae-comparison">Flow-VAE vs. VAE Comparison</a></li>
8383
<li><a href="#professional-voice-clone-pvc-demonstration">Professional Voice Clone (PVC) Demonstration</a></li>
8484
<li><a href="#emotion-control-demonstration">Emotion Control Demonstration</a></li>
85+
<li><a href="#text-prompted-voice-generation-demonstration">Text-Prompted Voice Generation Demonstration</a></li>
86+
<li><a href="#comparison-of-voice-naturalness">Comparison of Voice
87+
Naturalness with the Previous-Generation Product</a></li>
8588
</ol>
8689
</nav>
8790

@@ -750,17 +753,26 @@ <h2 id="professional-voice-clone-pvc-demonstration">Professional Voice Clone (PV
750753
<th scope="col" style="text-align: center;">Source Audio</th>
751754
<th scope="col" style="text-align: center;">Fast</th>
752755
<th scope="col" style="text-align: center;">PVC</th>
756+
<th scope="col" style="text-align: center;">Differences</th>
753757
</tr>
754758
<tr class="border-bottom-thin">
755-
<td style="width: 30%">
759+
<td style="width: 25%">
756760
<audio src="assets/audios/JosephBrodsky_Source.wav" controls></audio>
757761
</td>
758-
<td style="width: 30%">
762+
<td style="width: 25%">
759763
<audio src="assets/audios/JosephBrodsky_Fast.mp3" controls></audio>
760764
</td>
761-
<td style="width: 30%">
765+
<td style="width: 25%">
762766
<audio src="assets/audios/JosephBrodsky_PVC.mp3" controls></audio>
763767
</td>
768+
<td>
769+
Like the ZeroShot version, the PVC<br>
770+
version has rising sentence-final intonation,<br>
771+
but it distinctively sustains this<br>
772+
elevated pitch instead of following the<br>
773+
pitch declination typical of<br>
774+
declarative sentences.
775+
</td>
764776
</tr>
765777
<tr class="border-bottom-thin">
766778
<td>
@@ -772,6 +784,12 @@ <h2 id="professional-voice-clone-pvc-demonstration">Professional Voice Clone (PV
772784
<td>
773785
<audio src="assets/audios/TianJin_PVC.mp3" controls></audio>
774786
</td>
787+
<td>
788+
With more material, the model not only<br>
789+
reproduces the speaker's voice characteristics<br>
790+
but also captures more dialectal<br>
791+
features accurately.
792+
</td>
775793
</tr>
776794
</tbody>
777795
</table>
@@ -886,6 +904,151 @@ <h3>DEMO</h3>
886904
</table>
887905
</div>
888906
</div>
907+
908+
<div class="article-block">
909+
<h2 id="text-prompted-voice-generation-demonstration">Text-Prompted Voice Generation Demonstration</h2>
910+
<div class="scroll-wrapper">
911+
<table style="width: 100%;">
912+
<tbody>
913+
<tr class="border-bottom-thin">
914+
<th scope="col">Prompt</th>
915+
<th scope="col">Text Content</th>
916+
<th scope="col" style="text-align: center;">Audio</th>
917+
</tr>
918+
<tr class="border-bottom-thin">
919+
<td>
920+
<p>
921+
男性中年声音,说中文,音色浑厚醇厚,带有自然的磁性,语速偏慢,<br>
922+
音量适中,音调偏低沉。声音整体给人沉稳可靠的感觉,<br>
923+
在深度访谈场景中表现出专业性和亲和力,音质清晰,吐字规整有力。
924+
</p>
925+
<p>
926+
English Meaning: A middle-aged male voice speaking Chinese,<br>
927+
characterized by rich and mellow timbre with natural resonance.<br>
928+
The speech rate is moderately slow, with medium volume and<br>
929+
a deep pitch. The overall voice quality conveys a sense of<br>
930+
steadiness and reliability, demonstrating both professionalism <br>
931+
and approachability in in-depth interview settings.<br>
932+
The voice features clear articulation with precise and<br>
933+
well-defined pronunciation.
934+
</p>
935+
</td>
936+
<td>
937+
在这个安静的夜晚,让我们一起走进《人生笔记》这本书。<br>
938+
作者用平实的文字记录下生活中的点点滴滴,<br>
939+
让我们看到平凡中的真善美。<br>
940+
今天,我们先来读第一章:'生活的痕迹'......
941+
</td>
942+
<td>
943+
<audio class="audio-md" src="assets/audios/深度访谈男中年.wav" controls></audio>
944+
</td>
945+
</tr>
946+
<tr class="border-bottom-thin">
947+
<td>
948+
<p>
949+
说中文的女青年,音色偏甜美,语速比较快,说话时带着一种轻快的感觉,<br>
950+
整体音调较高,像是在直播带货,整体氛围比较活跃,<br>
951+
声音清晰,听起来很有亲和力。
952+
</p>
953+
<p>
954+
English Meaning: A young Chinese female voice with a sweet and pleasant timbre. <br>
955+
The speech rate is relatively fast, and the pitch is moderately high,<br>
956+
carrying a light and energetic quality reminiscent of live-streaming sales<br>
957+
presentations. The overall atmosphere is vibrant and dynamic,<br>
958+
with clear voice quality and an engaging, approachable tone.
959+
</p>
960+
</td>
961+
<td>
962+
亲爱的宝宝们,等了好久的神仙面霜终于到货啦!<br>
963+
你们看这个包装是不是超级精致?<br>
964+
我自己已经用了一个月了,效果真的绝绝子!<br>
965+
而且这次活动价真的太划算了,错过真的会后悔的哦~
966+
</td>
967+
<td>
968+
<audio class="audio-md" src="assets/audios/直播带货女青年.wav" controls></audio>
969+
</td>
970+
</tr>
971+
<tr class="border-bottom-thin">
972+
<td>
973+
<p>
974+
中国女青年的声音,音色清脆,说话速度偏快,语调活泼,<br>
975+
像是在做游戏直播,声音中带着愉快的感觉,整体音调较高,<br>
976+
整体氛围比较轻松。
977+
</p>
978+
<p>
979+
English Meaning: A young Chinese woman's voice with a crisp and bright timbre. <br>
980+
The speech rate is relatively fast, with a moderately high<br>
981+
pitch and lively intonation characteristic of gaming live streams.<br>
982+
The voice conveys a cheerful quality, creating an overall<br>
983+
relaxed and casual atmosphere.
984+
</p>
985+
</td>
986+
<td>
987+
啊!这里有个宝箱!让我们看看里面是什么~<br>
988+
哇!是传说中的紫色装备!运气也太好了吧!<br>
989+
谢谢小伙伴们的打赏,我们继续往前探索......
990+
</td>
991+
<td>
992+
<audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
993+
</td>
994+
</tr>
995+
</tbody>
996+
</table>
997+
</div>
998+
</div>
999+
1000+
<div class="article-block">
1001+
<h2 id="comparison-of-voice-naturalness">Comparison of Voice Naturalness
1002+
with the Previous-Generation Product</h2>
1003+
<p>The new model demonstrates significant advantages in naturalness compared to the previous version.</p>
1004+
<h3 style="margin-top: 2rem;">Source Audio for Radiant_Girl</h3>
1005+
<audio src="assets/audios/English_Radiant_Girl_Sourse.wav" controls></audio>
1006+
<h3>DEMO</h3>
1007+
<div class="scroll-wrapper">
1008+
<table style="width: 100%;">
1009+
<tbody>
1010+
<tr class="border-bottom-thin">
1011+
<th scope="col">Text Content</th>
1012+
<th scope="col" style="text-align: center;">MiniMax<br>Speech_02_HD</th>
1013+
<th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
1014+
<th scope="col" style="text-align: center;">AWS<br>Polly</th>
1015+
</tr>
1016+
<tr class="border-bottom-thin">
1017+
<td>
1018+
I sat alone in the empty room, staring at the old photographs,<br>
1019+
wondering how everything could change so quickly,<br>
1020+
how a lifetime of memories could fade away just like that.
1021+
</td>
1022+
<td>
1023+
<audio class="audio-md" src="assets/audios/Radiant_Girl_1.mp3" controls></audio>
1024+
</td>
1025+
<td>
1026+
<audio class="audio-md" src="assets/audios/Emma_1.mp3" controls></audio>
1027+
</td>
1028+
<td>
1029+
<audio class="audio-md" src="assets/audios/Joanna_1.mp3" controls></audio>
1030+
</td>
1031+
</tr>
1032+
<tr class="border-bottom-thin">
1033+
<td>
1034+
The moment I held my acceptance letter, my heart burst with joy - <br>
1035+
all those sleepless nights finally paid off, and I couldn't stop<br>
1036+
dancing around the room, calling everyone I knew to share this amazing news!
1037+
</td>
1038+
<td>
1039+
<audio class="audio-md" src="assets/audios/Radiant_Girl_2.mp3" controls></audio>
1040+
</td>
1041+
<td>
1042+
<audio class="audio-md" src="assets/audios/Emma_2.mp3" controls></audio>
1043+
</td>
1044+
<td>
1045+
<audio class="audio-md" src="assets/audios/Joanna_2.mp3" controls></audio>
1046+
</td>
1047+
</tr>
1048+
</tbody>
1049+
</table>
1050+
</div>
1051+
</div>
8891052
</article>
8901053
</main>
8911054
