javascript - Unexpected behaviour of String.fromCodePoint / String#codePointAt (Firefox/ES6)
Since version 29, Firefox provides the String.fromCodePoint and String#codePointAt methods, and Mozilla has published polyfills on the respective MDN pages.
So it happens that while trying them out, I seem to be missing something important: splitting the string "ä☺𠜎" into code points and reassembling it from those code points returns an, at least to me, unexpected result.
I've built a test case: http://jsfiddle.net/dcodeio/yhwp7/
var str = "ä☺𠜎"; ...split it, reassemble it...
Am I missing something?
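For reference, here is a minimal sketch of what such a test case presumably does (the fiddle's exact code isn't reproduced here, so the loop below is an assumption): collect .codePointAt(i) for every index of the string, then rebuild the string with String.fromCodePoint.

var str = "ä☺𠜎";
var codePoints = [];
for (var i = 0; i < str.length; i++) {
    codePoints.push(str.codePointAt(i)); // note: i counts UTF-16 code units, not code points
}
var reassembled = String.fromCodePoint.apply(null, codePoints);
console.log(codePoints);          // [228, 9786, 132878, 57102]
console.log(reassembled === str); // false -- the result has one extra character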
This is not a problem of .codePointAt(), but rather of the encoding of the character 𠜎.
𠜎 has a JavaScript string length of 2. Why? Because JavaScript strings are encoded in UTF-16, where each code unit is 2 bytes. The code point of 𠜎 (132878) is greater than what a single 2-byte UTF-16 code unit can represent (0-65535). This means it needs to be encoded with two UTF-16 code units (4 bytes), a so-called surrogate pair. Its UTF-16 representation is 0xD841 0xDF0E, consuming 2 characters in the string.
When using .charCodeAt() you can see the two code units:
var string = "𠜎"; console.log( string.charCodeAt(0), string.charCodeAt(1) ); // logs 55361 57102 (0xD841 0xDF0E)
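Those two values follow from the standard UTF-16 encoding rule for code points above 0xFFFF; the arithmetic below is that general rule, not code from the original answer:

var codePoint = 0x2070E;              // 132878, the code point of 𠜎
var offset = codePoint - 0x10000;     // 0x1070E
var high = 0xD800 + (offset >> 10);   // 0xD841 == 55361 (high surrogate)
var low  = 0xDC00 + (offset & 0x3FF); // 0xDF0E == 57102 (low surrogate)
console.log(high, low);               // 55361 57102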
So why doesn't it display 228, 9786, 55361, 57102? That's because .codePointAt() correctly converts a surrogate pair into a single integer code point (132878).
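You can see both behaviours side by side (a quick check, not from the original answer):

var s = "𠜎";
console.log(s.codePointAt(0)); // 132878 -- reads the whole surrogate pair
console.log(s.codePointAt(1)); // 57102  -- starts on the low surrogate and returns it as-is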
So why is 57102 in the output then? Because your loop iterates up to str.length, which is 4 (because "𠜎".length == 2), so .codePointAt() is also executed on str[3], the lone low surrogate, which yields 57102.
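One way to fix the loop, offered here as a sketch rather than the answer's own code, is to skip the second code unit whenever a full astral code point was read, or to use ES6's for...of, which iterates by code point:

var str = "ä☺𠜎";
var codePoints = [];
for (var i = 0; i < str.length; i++) {
    var cp = str.codePointAt(i);
    codePoints.push(cp);
    if (cp > 0xFFFF) i++;             // skip the low surrogate of the pair
}
console.log(codePoints);                                           // [228, 9786, 132878]
console.log(String.fromCodePoint.apply(null, codePoints) === str); // true

// Alternatively (ES6): for (var ch of str) codePoints.push(ch.codePointAt(0));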