@@ -82,6 +82,9 @@ <h2>Contents</h2>
 <li><a href="#flow-vae-vs-vae-comparison">Flow-VAE vs. VAE Comparison</a></li>
 <li><a href="#professional-voice-clone-pvc-demonstration">Professional Voice Clone (PVC) Demonstration</a></li>
 <li><a href="#emotion-control-demonstration">Emotion Control Demonstration</a></li>
+<li><a href="#text-prompted-voice-generation-demonstration">Text-Prompted Voice Generation Demonstration</a></li>
+<li><a href="#comparison-of-voice-naturalness">Comparison of voice
+naturalness with the previous generation product</a></li>
 </ol>
 </nav>
@@ -750,17 +753,26 @@ <h2 id="professional-voice-clone-pvc-demonstration">Professional Voice Clone (PV
 <th scope="col" style="text-align: center;">Source Audio</th>
 <th scope="col" style="text-align: center;">Fast</th>
 <th scope="col" style="text-align: center;">PVC</th>
+<th scope="col" style="text-align: center;">Differences</th>
 </tr>
 <tr class="border-bottom-thin">
-<td style="width: 30%">
+<td style="width: 25%">
 <audio src="assets/audios/JosephBrodsky_Source.wav" controls></audio>
 </td>
-<td style="width: 30%">
+<td style="width: 25%">
 <audio src="assets/audios/JosephBrodsky_Fast.mp3" controls></audio>
 </td>
-<td style="width: 30%">
+<td style="width: 25%">
 <audio src="assets/audios/JosephBrodsky_PVC.mp3" controls></audio>
 </td>
+<td>
+Like the ZeroShot version, the PVC<br>
+version has rising sentence-final intonation,<br>
+but it sustains this elevated pitch<br>
+instead of following the pitch declination<br>
+typical of declarative sentences.
+</td>
 </tr>
 <tr class="border-bottom-thin">
 <td>
@@ -772,6 +784,12 @@ <h2 id="professional-voice-clone-pvc-demonstration">Professional Voice Clone (PV
 <td>
 <audio src="assets/audios/TianJin_PVC.mp3" controls></audio>
 </td>
+<td>
+With more source material, the model not only<br>
+reproduces the speaker's voice characteristics<br>
+but also accurately captures more<br>
+dialectal features.
+</td>
 </tr>
 </tbody>
 </table>
@@ -886,6 +904,151 @@ <h3>DEMO</h3>
 </table>
 </div>
 </div>
+
+<div class="article-block">
+<h2 id="text-prompted-voice-generation-demonstration">Text-Prompted Voice Generation Demonstration</h2>
+<div class="scroll-wrapper">
+<table style="width: 100%;">
+<tbody>
+<tr class="border-bottom-thin">
+<th scope="col">Prompt</th>
+<th scope="col">Text Content</th>
+<th scope="col" style="text-align: center;">Audio</th>
+</tr>
+<tr class="border-bottom-thin">
+<td>
+<p>
+男性中年声音,说中文,音色浑厚醇厚,带有自然的磁性,语速偏慢,<br>
+音量适中,音调偏低沉。声音整体给人沉稳可靠的感觉,<br>
+在深度访谈场景中表现出专业性和亲和力,音质清晰,吐字规整有力。
+</p>
+<p>
+English Meaning: A middle-aged male voice speaking Chinese,<br>
+with a rich, mellow timbre and natural resonance.<br>
+The speech rate is moderately slow, with medium volume and<br>
+a deep pitch. The overall voice quality conveys<br>
+steadiness and reliability, showing both professionalism<br>
+and approachability in in-depth interview settings.<br>
+The voice is clear, with precise and<br>
+well-defined pronunciation.
+</p>
+</td>
+<td>
+在这个安静的夜晚,让我们一起走进《人生笔记》这本书。<br>
+作者用平实的文字记录下生活中的点点滴滴,<br>
+让我们看到平凡中的真善美。<br>
+今天,我们先来读第一章:'生活的痕迹'......
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/深度访谈男中年.wav" controls></audio>
+</td>
+</tr>
+<tr class="border-bottom-thin">
+<td>
+<p>
+说中文的女青年,音色偏甜美,语速比较快,说话时带着一种轻快的感觉,<br>
+整体音调较高,像是在直播带货,整体氛围比较活跃,<br>
+声音清晰,听起来很有亲和力。
+</p>
+<p>
+English Meaning: A young female voice speaking Chinese, with a sweet and pleasant timbre.<br>
+The speech rate is relatively fast, and the pitch is moderately high,<br>
+carrying a light and energetic quality reminiscent of live-streaming sales<br>
+presentations. The overall atmosphere is vibrant and dynamic,<br>
+with clear voice quality and an engaging, approachable tone.
+</p>
+</td>
+<td>
+亲爱的宝宝们,等了好久的神仙面霜终于到货啦!<br>
+你们看这个包装是不是超级精致?<br>
+我自己已经用了一个月了,效果真的绝绝子!<br>
+而且这次活动价真的太划算了,错过真的会后悔的哦~
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/直播带货女青年.wav" controls></audio>
+</td>
+</tr>
+<tr class="border-bottom-thin">
+<td>
+<p>
+中国女青年的声音,音色清脆,说话速度偏快,语调活泼,<br>
+像是在做游戏直播,声音中带着愉快的感觉,整体音调较高,<br>
+整体氛围比较轻松。
+</p>
+<p>
+English Meaning: A young Chinese woman's voice with a crisp and bright timbre.<br>
+The speech rate is relatively fast, with a moderately high<br>
+pitch and lively intonation characteristic of gaming live streams.<br>
+The voice conveys a cheerful quality, creating an overall<br>
+relaxed and casual atmosphere.
+</p>
+</td>
+<td>
+啊!这里有个宝箱!让我们看看里面是什么~<br>
+哇!是传说中的紫色装备!运气也太好了吧!<br>
+谢谢小伙伴们的打赏,我们继续往前探索......
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
+</td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
+
+<div class="article-block">
+<h2 id="comparison-of-voice-naturalness">Comparison of voice naturalness
+with the previous generation product</h2>
+<p>The new model sounds noticeably more natural than the previous-generation products.</p>
+<h3 style="margin-top: 2rem;">Source Audio for Radiant_Girl</h3>
+<audio src="assets/audios/English_Radiant_Girl_Sourse.wav" controls></audio>
+<h3>DEMO</h3>
+<div class="scroll-wrapper">
+<table style="width: 100%;">
+<tbody>
+<tr class="border-bottom-thin">
+<th scope="col">Text Content</th>
+<th scope="col" style="text-align: center;">MiniMax<br>Speech_02_HD</th>
+<th scope="col" style="text-align: center;">Microsoft<br>Azure TTS</th>
+<th scope="col" style="text-align: center;">AWS<br>Polly</th>
+</tr>
+<tr class="border-bottom-thin">
+<td>
+I sat alone in the empty room, staring at the old photographs,<br>
+wondering how everything could change so quickly,<br>
+how a lifetime of memories could fade away just like that.
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/Radiant_Girl_1.mp3" controls></audio>
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/Emma_1.mp3" controls></audio>
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/Joanna_1.mp3" controls></audio>
+</td>
+</tr>
+<tr class="border-bottom-thin">
+<td>
+The moment I held my acceptance letter, my heart burst with joy -<br>
+all those sleepless nights finally paid off, and I couldn't stop<br>
+dancing around the room, calling everyone I knew to share this amazing news!
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/Radiant_Girl_2.mp3" controls></audio>
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/Emma_2.mp3" controls></audio>
+</td>
+<td>
+<audio class="audio-md" src="assets/audios/Joanna_2.mp3" controls></audio>
+</td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
 </article>
 </main>