UTF-8 or ISO-8859-1?
ISO-8859-1 is sufficient for Western European languages such as English, French, and German. UTF-8 covers far more characters than ISO-8859-1 and also supports most other languages (Japanese, for example). Historically, the disadvantage of UTF-8 was that older software did not support it as widely as ISO-8859-1. Declare the chosen encoding in the page's head with one of the following meta elements:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
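To illustrate how a program might read the declaration above, here is a minimal Python sketch that extracts the charset from a `Content-Type` meta element using the standard library's `html.parser`. The names `MetaCharsetParser` and `declared_charset` are hypothetical helpers introduced for this example; real browsers apply a more elaborate detection procedure that also considers HTTP headers and byte order marks.

```python
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Records the charset declared in the first Content-Type meta element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        # The content attribute looks like "text/html; charset=utf-8".
        for part in (attr.get("content") or "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset" and value:
                self.charset = value.strip().lower()


def declared_charset(html):
    """Return the declared charset, or None if no declaration is found."""
    parser = MetaCharsetParser()
    parser.feed(html)
    return parser.charset


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(declared_charset(page))
```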
UTF-8
UTF-8 (UCS Transformation Format – 8-bit) is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks (BOM). For these and other reasons, UTF-8 has become the dominant character encoding for the World Wide Web, accounting for more than half of all Web pages.
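The endianness point can be seen in a short Python sketch, which encodes one character in UTF-8 and in both byte orders of UTF-16. The variable names are illustrative only; the codec names are those of Python's standard library.

```python
import codecs

text = "A"  # U+0041

utf8 = text.encode("utf-8")          # one byte, no byte-order question
utf16_le = text.encode("utf-16-le")  # little-endian: low byte first
utf16_be = text.encode("utf-16-be")  # big-endian: high byte first
utf16_bom = text.encode("utf-16")    # platform byte order, prefixed with a BOM

print(utf8.hex(" "))
print(utf16_le.hex(" "))
print(utf16_be.hex(" "))

# The generic "utf-16" codec prepends a byte order mark so that a reader
# can tell which of the two byte orders follows.
has_bom = utf16_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(has_bom)
```

The same character produces two different byte sequences in UTF-16 depending on byte order, whereas the UTF-8 form is identical on every platform.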
The Internet Engineering Task Force (IETF) requires all internet protocols to identify the encoding used for character data, and the supported character encodings must include UTF-8. The Internet Mail Consortium (IMC) recommends that all e‑mail programs be able to display and create mail using UTF-8. UTF-8 is also increasingly being used as the default character encoding in operating systems, programming languages, APIs, and software applications.
UTF-8 encodes each of the 1,112,064 code points in the Unicode character set using one to four 8-bit bytes (termed “octets” in the Unicode Standard). Code points with lower numerical values (i.e., earlier code positions in the Unicode character set, which tend to occur more frequently in practice) are encoded using fewer bytes, making the encoding scheme reasonably efficient. In particular, the first 128 characters of the Unicode character set, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as the corresponding ASCII character, making valid ASCII text valid UTF-8-encoded Unicode text as well.
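The variable-length scheme described above can be checked with a few lines of Python. This is a minimal sketch: the sample characters and the helper name `utf8_length` are chosen for illustration. The code-point total follows from the size of the Unicode code space minus the surrogate range, which cannot be encoded.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# One sample from each length class (illustrative choices).
samples = ["A", "é", "€", "😀"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) -> {encoded.hex(' ')}")

# ASCII text yields identical bytes under ASCII and UTF-8.
ascii_compatible = "Hello".encode("ascii") == "Hello".encode("utf-8")

# Code space U+0000..U+10FFFF minus the surrogates U+D800..U+DFFF.
encodable_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)
print(encodable_code_points)
```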
ISO-8859-1
ISO-8859-1 (formally ISO/IEC 8859-1, "8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1") is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, whose first edition was published in 1987. It is informally referred to as Latin-1 and is generally intended for the “Western European” languages shown below:
- Afrikaans
- Albanian
- Basque
- Breton
- Catalan
- English (UK and US)
- Faroese
- Galician
- German
- Icelandic
- Irish (new orthography)
- Italian
- Kurdish (The Kurdish Unified Alphabet)
- Latin (basic classical orthography)
- Leonese
- Luxembourgish (basic classical orthography)
- Norwegian (Bokmål and Nynorsk)
- Occitan
- Portuguese
- Rhaeto-Romanic
- Scottish Gaelic
- Spanish
- Swahili
- Swedish
- Walloon
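As a rough illustration of what "single-byte" means in practice, the following Python sketch compares ISO-8859-1 and UTF-8 encodings of the same word and shows what happens when text falls outside the Latin-1 repertoire. The helper `fits_latin1` is a hypothetical name introduced here.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


word = "café"
latin1_bytes = word.encode("iso-8859-1")  # one byte per character
utf8_bytes = word.encode("utf-8")         # "é" takes two bytes

print(len(latin1_bytes), len(utf8_bytes))

# Reading UTF-8 bytes as ISO-8859-1 never fails, but it garbles
# every non-ASCII character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

print(fits_latin1("Bokmål"), fits_latin1("日本語"), fits_latin1("€"))
```

Note that the euro sign is not part of ISO-8859-1, so text containing it needs another encoding such as UTF-8.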
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text. Most modern character-encoding schemes are based on ASCII, though they support many more characters than ASCII does.
US-ASCII is the Internet Assigned Numbers Authority (IANA) preferred charset name for ASCII.
Historically, ASCII developed from telegraphic codes. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on ASCII formally began on October 6, 1960, with the first meeting of the American Standards Association's (ASA) X3.2 subcommittee. The first edition of the standard was published in 1963, followed by a major revision in 1967 and the most recent update in 1986. Compared to earlier telegraph codes, the proposed Bell code and ASCII were both ordered for more convenient sorting (i.e., alphabetization) of lists, and added features for devices other than teleprinters.
ASCII defines 128 characters: 33 are non-printing control characters (now mostly obsolete) that affect how text and space are processed, 94 are printable characters, and the remaining one, the space, is considered an invisible graphic. US-ASCII was the most commonly used character encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8.
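The breakdown of the 128 codes can be verified with a short Python sketch. The variable names are illustrative; the ranges are the standard ASCII ones (control codes 0 to 31 plus 127, space at 32, printable characters 33 to 126).

```python
# Control characters: codes 0x00-0x1F plus DEL (0x7F).
control_codes = [c for c in range(128) if c < 0x20 or c == 0x7F]

# The space character sits alone at 0x20.
space_code = 0x20

# Printable (visible) characters: 0x21 ("!") through 0x7E ("~").
printable_codes = list(range(0x21, 0x7F))

total = len(control_codes) + 1 + len(printable_codes)
print(len(control_codes), len(printable_codes), total)

# The printable range, shown as text.
print("".join(chr(c) for c in printable_codes))
```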
