Commit 2384d06

use TextEncoder and TextDecoder when available
This commit allows the RxPlayer to use the `TextEncoder` and `TextDecoder` APIs, when available. `TextEncoder` encodes JS strings into a UTF-8 byte sequence (it only supports UTF-8 output). `TextDecoder` decodes UTF-8, UTF-16BE or UTF-16LE bytes into a JS string.

Because `TextEncoder` and `TextDecoder` are not defined in IE11 or in some older browser versions we claim to support, we still fall back to the custom implementations, either when those APIs do not exist or when the operation fails.

Note a significant difference between the `TextDecoder` interface and the previous implementation. When `TextDecoder` encounters byte sequences that are invalid in the corresponding encoding, it replaces them with a "REPLACEMENT CHARACTER" (U+FFFD, �), whereas the previous implementation threw in that same situation. The replacement seems fine and even desirable, but it means that we now have two different behaviors, depending on the current platform/browser.

The functions using the `TextDecoder` APIs are also exposed directly through the `StringUtils` tools, so applications using them may notice the new behavior. Thankfully, our API documentation says nothing about invalid sequences. Even if we consider that this does not break our API (which is still unclear to me), it is something to keep in mind, as it might surprise users relying on this API throwing.

Also, I tried to add unit tests, but "jsdom", which jest relies on to simulate a browser in Node.js, does not implement either API yet. Support is under way, though: jsdom/whatwg-encoding#11
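Where jsdom still lacks these APIs, a common workaround is to expose Node.js's own `TextEncoder`/`TextDecoder` (exported by the `util` module) on the test global before the tests run. Below is a minimal sketch. It assumes a Jest setup file registered through the `setupFiles` option (the file name is hypothetical). It also assumes that Jest's jsdom environment uses the jsdom window as its global object, so that `window.TextDecoder` in the code under test resolves to Node's implementation.

```typescript
// jest.setup.ts (hypothetical name), listed under Jest's `setupFiles` option.
// Assumption: in Jest's jsdom environment `globalThis` is the jsdom window,
// so anything defined here is also visible as `window.X` in the code under test.
import {
  TextDecoder as NodeTextDecoder,
  TextEncoder as NodeTextEncoder,
} from "util";

const testGlobal = globalThis as unknown as Record<string, unknown>;

// Only fill the gaps: keep native implementations when the environment has them.
if (typeof testGlobal.TextEncoder !== "function") {
  testGlobal.TextEncoder = NodeTextEncoder;
}
if (typeof testGlobal.TextDecoder !== "function") {
  testGlobal.TextDecoder = NodeTextDecoder;
}
```

Only missing globals are filled in, so the setup file becomes a no-op once jsdom ships its own implementation.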
1 parent 54b8312 commit 2384d06

1 file changed (+45 -0 lines changed)

src/utils/string_parsing.ts

Lines changed: 45 additions & 0 deletions
@@ -14,6 +14,7 @@
  * limitations under the License.
  */
 
+import log from "../log";
 import assert from "./assert";
 
 /**
@@ -56,6 +57,17 @@ function strToBeUtf16(str: string): Uint8Array {
  * @returns {string}
  */
 function utf16LEToStr(bytes : Uint8Array) : string {
+  if (typeof window.TextDecoder === "function") {
+    try {
+      // instanciation throws if the encoding is unsupported
+      const decoder = new TextDecoder("utf-16le");
+      return decoder.decode(bytes);
+    } catch (e) {
+      log.warn("Utils: could not use TextDecoder to parse UTF-16LE, " +
+               "fallbacking to another implementation", e);
+    }
+  }
+
   let str = "";
   for (let i = 0; i < bytes.length; i += 2) {
     str += String.fromCharCode((bytes[i + 1] << 8) + bytes[i]);
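The two code paths also differ on byte order marks. By default, `TextDecoder` consumes a leading BOM (U+FEFF). The fallback loop instead turns the BOM into a regular U+FEFF character. The sketch below copies the visible loop of `utf16LEToStr` into a standalone function. It assumes the function ends with `return str;`, which the hunk does not show. It then compares that function with `TextDecoder`, with and without the standard `ignoreBOM` option.

```typescript
// Standalone copy of the fallback loop shown in the hunk above
// (assumption: the original function simply returns `str` after the loop).
function fallbackUtf16LEToStr(bytes: Uint8Array): string {
  let str = "";
  for (let i = 0; i < bytes.length; i += 2) {
    str += String.fromCharCode((bytes[i + 1] << 8) + bytes[i]);
  }
  return str;
}

// "A" in UTF-16LE, preceded by the little-endian byte order mark FF FE.
const bytesWithBom = new Uint8Array([0xff, 0xfe, 0x41, 0x00]);

// Default TextDecoder: the BOM is stripped from the output.
const decodedNative = new TextDecoder("utf-16le").decode(bytesWithBom);
// With ignoreBOM: true, TextDecoder keeps the BOM as a U+FEFF character.
const decodedNativeKeepBom =
  new TextDecoder("utf-16le", { ignoreBOM: true }).decode(bytesWithBom);
// Fallback loop: (0xFE << 8) + 0xFF === 0xFEFF, so the BOM becomes a character.
const decodedFallback = fallbackUtf16LEToStr(bytesWithBom);

console.log(decodedNative.length, decodedNativeKeepBom.length, decodedFallback.length); // 1 2 2
```

If callers compare decoded strings, the BOM may therefore need to be stripped explicitly in the fallback path for both paths to agree.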
@@ -69,6 +81,17 @@ function utf16LEToStr(bytes : Uint8Array) : string {
  * @returns {string}
  */
 function beUtf16ToStr(bytes : Uint8Array) : string {
+  if (typeof window.TextDecoder === "function") {
+    try {
+      // instanciation throws if the encoding is unsupported
+      const decoder = new TextDecoder("utf-16be");
+      return decoder.decode(bytes);
+    } catch (e) {
+      log.warn("Utils: could not use TextDecoder to parse UTF-16BE, " +
+               "fallbacking to another implementation", e);
+    }
+  }
+
   let str = "";
   for (let i = 0; i < bytes.length; i += 2) {
     str += String.fromCharCode((bytes[i] << 8) + bytes[i + 1]);
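The feature check reads `window.TextDecoder`, which assumes a `window` global. In a Web Worker or in Node.js, `window` is not defined, so that expression would throw a `ReferenceError` before the fallback is reached. The sketch below shows the same decode-with-fallback pattern for UTF-16BE, written against `globalThis` instead. This is an assumption about target environments, not what the commit does, and `decodeUtf16BE` is a hypothetical name. It also bounds the loop with `i + 1 < bytes.length`, so a trailing odd byte is dropped. In the original loop, `bytes[i + 1]` would be `undefined`, giving a `NaN` char code and thus a U+0000 character.

```typescript
// Hypothetical helper, not part of the RxPlayer code base.
function decodeUtf16BE(bytes: Uint8Array): string {
  // `globalThis` is defined in windows, workers and Node.js alike.
  if (typeof globalThis.TextDecoder === "function") {
    try {
      // The constructor throws a RangeError if "utf-16be" is unsupported.
      return new TextDecoder("utf-16be").decode(bytes);
    } catch (_err) {
      // Fall through to the manual implementation below.
    }
  }
  let str = "";
  // Combine each pair of bytes, most significant byte first.
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    str += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
  }
  return str;
}

console.log(decodeUtf16BE(new Uint8Array([0x00, 0x68, 0x00, 0x69]))); // hi
```

Surrogate pairs need no special handling in the manual path. Each 16-bit unit is appended as-is, and JS strings are themselves sequences of UTF-16 code units.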
@@ -83,6 +106,17 @@ function beUtf16ToStr(bytes : Uint8Array) : string {
  * @returns {Uint8Array}
  */
 function strToUtf8(str : string) : Uint8Array {
+  if (typeof window.TextEncoder === "function") {
+    try {
+      // instanciation throws if the encoding is unsupported
+      const encoder = new TextEncoder();
+      return encoder.encode(str);
+    } catch (e) {
+      log.warn("Utils: could not use TextEncoder to encode string into UTF-8, " +
+               "fallbacking to another implementation", e);
+    }
+  }
+
   // http://stackoverflow.com/a/13691499 provides an ugly but functional solution.
   // (Note you have to dig deeper to understand it but I have more faith in
   // stackoverflow not going down in the future so I leave that link.)
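`TextEncoder` always produces UTF-8. It encodes a lone surrogate (a UTF-16 code unit with no partner) as U+FFFD, i.e. the bytes `EF BF BD`. For comparison, the sketch below is a hand-written, code-point-based encoder that follows the same rule. It is an illustration only, not the RxPlayer's fallback, which the hunk only refers to through a Stack Overflow link. `manualUtf8Encode` is a hypothetical name.

```typescript
// Hypothetical code-point-based UTF-8 encoder, for comparison with TextEncoder.
function manualUtf8Encode(str: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < str.length; i++) {
    // codePointAt combines a valid surrogate pair into one code point.
    let cp = str.codePointAt(i) as number;
    if (cp > 0xffff) {
      i++; // skip the low surrogate that was just consumed
    } else if (cp >= 0xd800 && cp <= 0xdfff) {
      // A lone surrogate cannot be encoded in UTF-8: use U+FFFD like TextEncoder.
      cp = 0xfffd;
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return new Uint8Array(out);
}

// "é", "€", an emoji (surrogate pair) and a lone high surrogate.
const sample = "é€\u{1F600}\ud800";
console.log(Array.from(new TextEncoder().encode(sample)));
console.log(Array.from(manualUtf8Encode(sample)));
```

Both calls should print the same 12 bytes. Any fallback that throws on lone surrogates instead would be another case where behavior depends on whether `TextEncoder` exists.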
@@ -209,6 +243,17 @@ function intToHex(num : number, size : number) : string {
  * @returns {string}
  */
 function utf8ToStr(data : Uint8Array) : string {
+  if (typeof window.TextDecoder === "function") {
+    try {
+      // TextDecoder use UTF-8 by default
+      const decoder = new TextDecoder();
+      return decoder.decode(data);
+    } catch (e) {
+      log.warn("Utils: could not use TextDecoder to parse UTF-8, " +
+               "fallbacking to another implementation", e);
+    }
+  }
+
   let uint8 = data;
 
   // If present, strip off the UTF-8 BOM.
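The invalid-sequence behavior described in the commit message applies to the `decoder.decode(data)` call in this hunk. The sketch below decodes an invalid UTF-8 sequence: `0xC3` announces a two-byte character, but `0x28` is not a continuation byte. A default `TextDecoder` substitutes U+FFFD. With the standard `fatal: true` option, `decode` throws a `TypeError` instead. Passing `{ fatal: true }` would be one possible way to keep the previous throwing behavior on platforms that have `TextDecoder`. The commit does not do this, and the helper names below are hypothetical.

```typescript
// 0xC3 starts a two-byte UTF-8 sequence, but 0x28 ("(") is not a
// continuation byte, so 0xC3 is an invalid sequence on its own.
const invalidUtf8 = new Uint8Array([0xc3, 0x28]);

// Default mode: each invalid sequence becomes U+FFFD REPLACEMENT CHARACTER.
function decodeLenient(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// Fatal mode: decode() throws a TypeError on invalid input.
function decodeStrict(bytes: Uint8Array): string {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(decodeLenient(invalidUtf8).length); // 2 (U+FFFD followed by "(")
try {
  decodeStrict(invalidUtf8);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```

With `fatal: true` inside the commit's `try` block, the error would be caught and the custom implementation would then run. So the final outcome would again depend on the fallback's own handling of invalid input.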
