Unicode / UTF-8
Sunday, May 22nd, 2022 06:27 pm
From what I have determined, the Wordle "Share" function saves these Unicode characters:
Black Square: (U+2B1B)
White Square: (U+2B1C)
Yellow Square: (U+1F7E8)
Green Square: (U+1F7E9)
But when I look at the saved text in a hex editor, why does it show these values instead?
Black Square: e2 ac 9b
White Square: e2 ac 9c
Yellow Square: f0 9f 9f a8
Green Square: f0 9f 9f a9
The hex editor does recognize these bytes as the Unicode characters above, since it displays them in its side panel.
So how does e2 ac 9b get converted to 2b 1b, and so on?
Never mind, I figured it out. The text is saved as UTF-8, which stores each code point as a sequence of one to four bytes.
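A quick way to confirm this is to let Python's built-in UTF-8 codec encode the four code points listed above and compare the result with the hex editor. This is a small sketch assuming Python 3.8 or later (needed for the separator argument of `bytes.hex`); `SQUARES` and `utf8_hex` are just illustrative names.

```python
# Code points of the Wordle "Share" squares, as listed above.
SQUARES = {
    "Black Square": 0x2B1B,
    "White Square": 0x2B1C,
    "Yellow Square": 0x1F7E8,
    "Green Square": 0x1F7E9,
}


def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 bytes of a code point as space-separated hex."""
    return chr(code_point).encode("utf-8").hex(" ")


for name, cp in SQUARES.items():
    print(f"{name}: U+{cp:04X} -> {utf8_hex(cp)}")
```

Each printed line should match the corresponding hex editor value, e.g. `Black Square: U+2B1B -> e2 ac 9b`.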
e2 ac 9b
-> (in binary) 1110 0010 . 1010 1100 . 1001 1011
-> (bracketing the fixed marker bits: 1110 marks the lead byte of a 3-byte sequence, 10 marks each continuation byte) [1110] 0010 . [10]10 1100 . [10]01 1011
-> (stripping out the bracketed bits and joining what remains) 0010 1011 . 0001 1011
-> 2b 1b
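The same steps for a three-byte sequence can be written with bit masks: check the constant marker bits, keep the low 4 bits of the lead byte and the low 6 bits of each continuation byte, and shift them together. A minimal sketch; `decode_3byte` is a hypothetical helper name, and it only checks the marker bits (it does not reject overlong encodings or surrogates).

```python
def decode_3byte(b: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) to a code point."""
    if len(b) != 3:
        raise ValueError("expected exactly 3 bytes")
    # The bracketed (constant) bits: 1110 on the lead byte, 10 on each continuation byte.
    if (b[0] & 0xF0) != 0xE0 or (b[1] & 0xC0) != 0x80 or (b[2] & 0xC0) != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Strip the marker bits and concatenate the rest: 4 + 6 + 6 = 16 payload bits.
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


print(f"U+{decode_3byte(bytes.fromhex('e2 ac 9b')):04X}")  # U+2B1B
```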
f0 9f 9f a8
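-> (in binary) 1111 0000 . 1001 1111 . 1001 1111 . 1010 1000
-> (bracketing the fixed marker bits: 11110 marks the lead byte of a 4-byte sequence, 10 marks each continuation byte) [1111 0]000 . [10]01 1111 . [10]01 1111 . [10]10 1000
-> (stripping out the bracketed bits, then regrouping the remaining 21 bits into bytes from right to left and padding the left end with zeros) 0000 0001 . 1111 0111 . 1110 1000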
-> 01 f7 e8
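Both walk-throughs follow one pattern: the lead byte's high bits say how many bytes the sequence has, and every continuation byte contributes 6 payload bits. A minimal sketch of that general rule for a single sequence of one to four bytes; `decode_utf8_sequence` is a hypothetical name, and for real input `bytes.decode("utf-8")` is the right tool, since this sketch does not reject overlong encodings, surrogates, or values above U+10FFFF.

```python
def decode_utf8_sequence(b: bytes) -> int:
    """Decode a single UTF-8 sequence of 1 to 4 bytes to its code point."""
    if not b:
        raise ValueError("empty input")
    lead = b[0]
    # The lead byte's marker bits give the length and how many payload bits it keeps.
    if (lead & 0x80) == 0x00:    # 0xxxxxxx
        length, payload = 1, lead & 0x7F
    elif (lead & 0xE0) == 0xC0:  # 110xxxxx
        length, payload = 2, lead & 0x1F
    elif (lead & 0xF0) == 0xE0:  # 1110xxxx
        length, payload = 3, lead & 0x0F
    elif (lead & 0xF8) == 0xF0:  # 11110xxx
        length, payload = 4, lead & 0x07
    else:
        raise ValueError(f"invalid lead byte {lead:#04x}")
    if len(b) != length:
        raise ValueError(f"expected {length} bytes, got {len(b)}")
    for cont in b[1:]:
        # Each continuation byte must look like 10xxxxxx; append its 6 payload bits.
        if (cont & 0xC0) != 0x80:
            raise ValueError(f"invalid continuation byte {cont:#04x}")
        payload = (payload << 6) | (cont & 0x3F)
    return payload


for hex_str in ["e2 ac 9b", "e2 ac 9c", "f0 9f 9f a8", "f0 9f 9f a9"]:
    print(hex_str, "->", f"U+{decode_utf8_sequence(bytes.fromhex(hex_str)):04X}")
```

Running the loop should print the four code points from the top of the post, U+2B1B through U+1F7E9.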