
UTF-8 (8-bit Unicode Transformation Format)

Overview

UTF-8 is a variable-width character encoding used for electronic communication. It encodes all 1,112,064 valid Unicode code points using one to four one-byte (8-bit) code units. UTF-8 has become the dominant character encoding for the World Wide Web, accounting for more than 90% of all web pages.
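The variable-width scheme can be illustrated with a minimal sketch, assuming Python 3.8 or later. The function `utf8_encode` below is a hypothetical hand-written helper for illustration, not a library API. It picks one to four bytes depending on the size of the code point and cross-checks each result against Python's built-in `str.encode("utf-8")`. It also shows where the figure 1,112,064 comes from: 17 planes of 65,536 code points, minus the 2,048 surrogate code points (U+D800 to U+DFFF), which UTF-8 does not encode.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode code point as 1 to 4 UTF-8 bytes."""
    # Surrogates and values above U+10FFFF are not valid Unicode scalar values.
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# 17 planes x 65,536 code points, minus the 2,048 surrogates U+D800..U+DFFF.
VALID_CODE_POINTS = 17 * 0x10000 - (0xDFFF - 0xD800 + 1)

if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        encoded = utf8_encode(ord(ch))
        # Cross-check the hand-rolled encoder against Python's built-in codec.
        assert encoded == ch.encode("utf-8")
        print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {encoded.hex(' ')}")
    print("valid code points:", VALID_CODE_POINTS)
```

Running it prints one line per sample character, showing that ASCII characters stay one byte long while characters outside the Basic Multilingual Plane, such as emoji, take four bytes.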