Anything Wrong With Using Windows-1252 Instead Of UTF-8
Solution 1:
Windows-1252 is one of many fixed-size (single-byte) character sets. The Mac has its own set (Mac OS Roman), and there are several ISO sets for various parts of Europe and for other parts of the world. Most of them differ from each other only slightly.
The good point is that every character has a fixed size: 1 character = 1 byte, no matter what.
The bad points are:
- Some people may not have your encoding installed
- Some people may use a slightly different encoding, resulting in a few issues that are not obvious to spot but get very ugly in the long run
- You can only support a few languages
That includes any quotation you would like to make. In Windows-1252 you can't write Russian, Greek, Polish...
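The language limitation is easy to check. The following is a minimal sketch, assuming Python 3 and its built-in cp1252 codec (Python's name for Windows-1252); the helper name can_encode and the sample words are made up for illustration.

```python
def can_encode(text: str, encoding: str = "cp1252") -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative sample words, not taken from any real data set.
samples = {
    "French": "café",
    "Russian": "Привет",
    "Greek": "Καλημέρα",
    "Polish": "Łódź",
}

for language, word in samples.items():
    status = "fits" if can_encode(word) else "does NOT fit"
    print(f"{language}: {word!r} {status} in Windows-1252")
```

Only the French sample fits; the other three contain characters that have no byte assigned in Windows-1252.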
UTF-8 is the standard variable-width encoding of Unicode: each character takes 1 to 4 bytes. It can represent every Unicode character, so virtually anything you may encounter. It is most compact for Latin-based text (ASCII characters take 1 byte each); other scripts take more storage space, typically 2 or 3 bytes per character.
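To make the storage cost concrete, here is a small sketch (assuming Python 3; the helper name utf8_length is hypothetical) that reports how many bytes UTF-8 needs for a few characters from different ranges.

```python
def utf8_length(char: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(char.encode("utf-8"))


# ASCII letter, accented Latin letter, math symbol, emoji.
for char in ["a", "é", "√", "😀"]:
    print(f"U+{ord(char):04X} {char!r}: {utf8_length(char)} byte(s)")
```

The four samples need 1, 2, 3 and 4 bytes respectively, while plain ASCII text stays byte-for-byte identical to its Windows-1252 form.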
It is used in XML, JSON, and most types of web services you may find. It is a good default when you don't know what encoding to use. It limits the number of encoding issues, such as "I thought you were in Latin-1" / "No, I was using Latin-9, but then this guy on a Mac used Roman". If more than one person works on the content of the website, they may have different default encodings on their platforms, and your content may get messed up at some point.
UTF-8 is, as far as I know, the only way to easily standardize the encoding used between people without discussion.
A typical example: if your website is encoded in Windows-1252 and the new dev has a Mac, you'll probably be in trouble.
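The kind of mix-up described above can be reproduced in a few lines. This is only a sketch, assuming Python 3 and its built-in cp1252, latin-1 and mac_roman codecs; the sample sentence and the helper name decode_as are invented for the demonstration.

```python
def decode_as(data: bytes, codec: str) -> str:
    """Interpret the same raw bytes under a given single-byte codec."""
    return data.decode(codec)


original = "prix: 10 €, café"
# What a Windows editor would write to disk.
raw = original.encode("cp1252")

for codec in ("cp1252", "latin-1", "mac_roman"):
    text = decode_as(raw, codec)
    verdict = "ok" if text == original else "corrupted"
    print(f"{codec:10} -> {text!r} ({verdict})")
```

No error is raised in the two wrong cases: the bytes decode "successfully" into different characters, which is why this kind of damage tends to go unnoticed until later.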
Solution 2:
You claim that Windows-1252 offers everything you need but the √ symbol is a counter-example. You must be using one of these tricks:
- HTML entities: &radic;, &#8730; or similar
- Print another character and change the font
In either case, your solution is not portable: stuff will only display correctly in a properly configured web browser. Everything else (database, JavaScript, text files, plain text e-mail messages...) will not contain the real data.
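A short sketch of why the entity trick is not portable, assuming Python 3 and its standard html module: the square root symbol has no byte in Windows-1252, and the entity only turns into the real character once something HTML-aware decodes it.

```python
import html

# The real character cannot be stored in Windows-1252 at all.
try:
    "√".encode("cp1252")
    fits_in_cp1252 = True
except UnicodeEncodeError:
    fits_in_cp1252 = False
print("√ fits in Windows-1252:", fits_in_cp1252)

# What ends up in the database or text file is the entity, not the symbol.
stored = "&radic;2"
print("stored text contains √:", "√" in stored)

# Only an HTML-aware consumer recovers the real character.
print("after HTML decoding:", html.unescape(stored))
```

Anything that reads the stored text without an HTML decoding step (a plain text e-mail, a CSV export, a search index) sees the literal entity.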
Additionally, JSON exchanged between systems must be encoded in UTF-8 (RFC 8259). JavaScript will normally make the conversions for you, but you must ensure that your whole tool-chain behaves similarly.
So to answer your main question: there's nothing wrong in using Windows-1252 if that's all you need. The problem is that you already need more than it can offer.
As for your problems with UTF-8: UTF-8 is a full Unicode encoding, so it does meet all the requirements. (Not being able to make it work can be your reason to dump it, but it isn't a technical reason.) My guess is that, since your current data doesn't contain actual square root symbols, switching encodings breaks the trick you were using. You need to:
- Find out what current data looks like
- Run a one-time search and replace
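The two steps could look roughly like this. It is a sketch under assumptions: Python 3, legacy data stored as Windows-1252 bytes, and the square root written as one of the two entities shown in the REPLACEMENTS table; the function names and that table are hypothetical and would need adapting to whatever the survey step actually finds.

```python
import re
from collections import Counter

# Assumed placeholders for the square root; adjust after surveying real data.
REPLACEMENTS = {"&radic;": "√", "&#8730;": "√"}


def survey_entities(text: str) -> Counter:
    """Step 1: count every HTML entity that appears in the current data."""
    return Counter(re.findall(r"&#?\w+;", text))


def convert_legacy(raw: bytes) -> bytes:
    """Step 2: decode Windows-1252, swap in real characters, re-encode as UTF-8.

    Decoding raises UnicodeDecodeError on the few bytes that Windows-1252
    leaves undefined, which is a useful signal that the data is not what
    it was assumed to be.
    """
    text = raw.decode("cp1252")
    for placeholder, real_char in REPLACEMENTS.items():
        text = text.replace(placeholder, real_char)
    return text.encode("utf-8")


legacy = "café: &radic;2 and &#8730;3".encode("cp1252")
print(survey_entities(legacy.decode("cp1252")))
print(convert_legacy(legacy).decode("utf-8"))
```

Replacing only the listed placeholders, rather than decoding every entity, leaves markup-significant entities such as &amp; and &lt; untouched.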
Solution 3:
What charset is the web server declaring?
Try changing the web server to UTF-8. In the Apache configuration (httpd.conf or apache2.conf, depending on the installation):
AddDefaultCharset utf-8
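To see which charset a response actually declares, the Content-Type header can be inspected. Below is a minimal sketch, assuming Python 3 and the standard email.message module for header parsing; the helper name charset_from_content_type is hypothetical, and the header values are examples rather than output from a real server.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lowercase, or None when none is declared.
    """
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


# Example header values, as a server might send them.
print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html; charset=windows-1252"))
print(charset_from_content_type("text/html"))
```

The header value itself can be obtained from the browser's developer tools or any HTTP client.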