C# - How to decode a UTF-8 encoded string split across two buffers right in the middle of a 4-byte character?
A character in UTF-8 encoding can be up to 4 bytes long. Now imagine I read from a stream into one buffer and then into another. Unfortunately, it just so happens that the end of the first buffer holds the first 2 bytes of a 4-byte UTF-8 encoded character, and the beginning of the second buffer holds the remaining 2 bytes.
Is there a way to partially decode that string (while leaving the 2 remaining bytes) without copying the two buffers into one big array?
    string str = "hello\u263aworld";
    Console.WriteLine(str);
    Console.WriteLine("length of 'helloworld': " + Encoding.UTF8.GetBytes("helloworld").Length);
    var bytes = Encoding.UTF8.GetBytes(str);
    Console.WriteLine("length of 'hello\u263aworld': " + bytes.Length);
    Console.WriteLine(Encoding.UTF8.GetString(bytes, 0, 6));
    Console.WriteLine(Encoding.UTF8.GetString(bytes, 7, bytes.Length - 7));
This returns:
hello☺world
length of 'helloworld': 10
length of 'hello☺world': 13
hello�
�world
The smiley face is 3 bytes long.
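To see where the split lands, it can help to dump the encoded bytes. The following is only a small sketch; the class name ByteLayout and the helper ToHex are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text;

static class ByteLayout
{
    // Hypothetical helper: returns the UTF-8 bytes of a string as
    // space-separated hex pairs.
    public static string ToHex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s)).Replace("-", " ");
    }

    static void Main()
    {
        // U+263A encodes as the three bytes E2 98 BA (indices 5, 6 and 7 here).
        Console.WriteLine(ToHex("hello\u263Aworld"));
        // prints: 68 65 6C 6C 6F E2 98 BA 77 6F 72 6C 64
    }
}
```

GetString(bytes, 0, 6) ends right after E2, a lead byte with no continuation bytes, and GetString(bytes, 7, ...) starts at BA, a continuation byte with no lead byte. Each call replaces its incomplete sequence with U+FFFD, and the byte at index 6 is never read at all.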
Is there a class that deals with split decoding of strings? I would like to get "hello" first and then "☺world", reusing the remainder of the not-yet-decoded byte array, without copying both arrays into one big array. I just want to use the remainder of the first buffer and somehow make the magic happen.
You should use a Decoder, which is able to maintain state between calls to GetChars - it remembers the bytes it hasn't decoded yet.
    using System;
    using System.Text;

    class Test
    {
        static void Main()
        {
            string str = "hello\u263aworld";

            var bytes = Encoding.UTF8.GetBytes(str);
            var decoder = Encoding.UTF8.GetDecoder();

            // Long enough for the whole string
            char[] buffer = new char[100];

            // Convert the first "packet"
            var length1 = decoder.GetChars(bytes, 0, 6, buffer, 0);
            // Convert the second "packet", writing into the buffer
            // from where we left off
            // Note: 6 not 7, because otherwise we're skipping a byte...
            var length2 = decoder.GetChars(bytes, 6, bytes.Length - 6, buffer, length1);
            var reconstituted = new string(buffer, 0, length1 + length2);
            Console.WriteLine(str == reconstituted); // True
        }
    }
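The same idea extends to any number of buffers arriving one after another. The following is a minimal sketch under a few assumptions: the class ChunkedUtf8 and the method DecodeChunks are hypothetical names, and the two arrays in Main only simulate buffers that would normally come straight from a stream.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ChunkedUtf8
{
    // Hypothetical helper: decodes a sequence of byte chunks with one shared
    // Decoder, so a character split across two chunks is still decoded whole.
    public static string DecodeChunks(IEnumerable<byte[]> chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var result = new StringBuilder();

        foreach (byte[] chunk in chunks)
        {
            // GetCharCount takes bytes buffered from earlier chunks into
            // account; flush: false keeps the decoder state for the next call.
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length, false)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0, false);
            result.Append(chars, 0, written);
        }
        return result.ToString();
    }

    static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("hello\u263Aworld");

        // Simulate two separately received buffers, split after the first
        // byte of the 3-byte smiley (index 6).
        byte[] first = new byte[6];
        byte[] second = new byte[bytes.Length - 6];
        Array.Copy(bytes, 0, first, 0, first.Length);
        Array.Copy(bytes, 6, second, 0, second.Length);

        string decoded = DecodeChunks(new[] { first, second });
        Console.WriteLine(decoded == "hello\u263Aworld"); // True
    }
}
```

Only the decoded characters are accumulated; the input buffers are never merged. This sketch never flushes the decoder, so an incomplete sequence at the very end of the last chunk is silently left behind. A real reader would probably want a final call with flush set to true to deal with that case.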