Messing with how emojis are encoded, Paul Butler demonstrates how one might hide data via a smiley:
Most unicode characters do not have variations associated with them. Since unicode is an evolving standard and aims to be future-compatible, variation selectors are supposed to be preserved during transformations, even if their meaning is not known by the code handling them. So the codepoint U+0067 (“g”) followed by U+FE01 (VS-2) renders as a lowercase “g”, exactly the same as U+0067 alone. But if you copy and paste it, the variation selector will tag along with it.
Since 256 is exactly enough variations to represent a single byte, this gives us a way to “hide” one byte of data in any other unicode codepoint.
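To make the trick concrete, here is a minimal Python sketch of the idea described in the quote. The function names (`byte_to_selector`, `selector_to_byte`, `hide`, `reveal`) are made up for illustration, and the mapping of byte value n to the (n+1)-th variation selector is an assumption; it is one natural choice, not necessarily what Butler's tool does. The only fixed facts it relies on are the two Unicode ranges: U+FE00 to U+FE0F (VS-1 to VS-16) and U+E0100 to U+E01EF (VS-17 to VS-256).

```python
from typing import Optional


def byte_to_selector(b: int) -> str:
    """Map a byte value (0-255) to one of the 256 variation selectors.

    Assumed mapping: bytes 0-15 use U+FE00..U+FE0F, bytes 16-255 use
    U+E0100..U+E01EF.
    """
    if not 0 <= b <= 255:
        raise ValueError("byte value out of range")
    if b < 16:
        return chr(0xFE00 + b)
    return chr(0xE0100 + (b - 16))


def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse of byte_to_selector; returns None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide(carrier: str, payload: bytes) -> str:
    """Append one variation selector per payload byte after the carrier."""
    return carrier + "".join(byte_to_selector(b) for b in payload)


def reveal(text: str) -> bytes:
    """Collect every variation selector in the text and decode it to bytes."""
    out = bytearray()
    for ch in text:
        b = selector_to_byte(ch)
        if b is not None:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    stego = hide("g", "hello".encode("utf-8"))
    print(len(stego))     # 1 visible character plus 5 hidden selectors
    print(reveal(stego))  # b'hello'
```

One caveat with this sketch: `reveal` treats every variation selector as payload, so a carrier that already contains one (for example an emoji written with U+FE0F) would add a stray byte to the decoded output.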
Use this simple tool to give it a whirl 😀󠄼󠅟󠅞󠅗󠄐󠅜󠅙󠅦󠅕󠄐󠅤󠅘󠅕󠄐󠅔󠅑󠅤󠅑󠄐󠅠󠅟󠅙󠅞󠅤󠄞.