A pedant that hangs out in the dark corner-cases of the web.

Monday, August 25, 2008

Roundup: XP performance vs. Vista

I dug a bit into some anecdotal reports I had heard about XP vs. Vista. It turns out XP beats Vista on performance: 32- or 64-bit, service packs or no, in a box or with a fox, in a house or with a mouse.

Sure, it's doing more, but is it doing more good? Source: "Death match: Windows Vista versus XP", InfoWorld analysis by Randall C. Kennedy, 2008-03-17.

Thursday, August 14, 2008

.NET 3.5 sp1 Setup Blocks Itself!

Um… ooooooooooooo-kay… that seems a bit meta.

I guess I'll click Ignore? Oh look, the progress bar is going backwards. I guess it's a Regress Bar.

Wednesday, August 06, 2008

Efficacy of .NET StreamReader's detectEncodingFromByteOrderMarks

Several overloads of the .NET System.IO.StreamReader constructor accept a boolean detectEncodingFromByteOrderMarks parameter. It tells the reader to look for a byte-order mark (BOM), also called an encoding signature, when the file is first read.

When enabled, this feature populates the CurrentEncoding property after the first time the file is read (which can be a simple call to Peek()).
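The sniffing can be approximated with a short Python script. This is not the .NET implementation. `detect_bom` and `read_text` are hypothetical names. The sketch only recognizes the standard Unicode signatures for UTF-8, UTF-16 and UTF-32 in both byte orders, and it falls back to UTF-8 when no signature is present.

```python
import codecs
from typing import Tuple

# Signatures checked longest-first: the UTF-32 LE mark (FF FE 00 00) begins
# with the UTF-16 LE mark (FF FE), so the four-byte marks must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes, default: str = "utf-8") -> Tuple[str, int]:
    """Return (encoding, bom_length); use `default` when no signature matches."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0

def read_text(data: bytes) -> str:
    """Skip the detected signature, then decode the rest with that encoding."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding)

print(detect_bom("café".encode("utf-8-sig")))  # ('utf-8', 3)
print(detect_bom("café".encode("cp1252")))     # ('utf-8', 0): no BOM, so the default applies
```

BOM-less content always falls through to the default, however it was actually encoded. That fallback is why so many of the results below read "not detected."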

This detection only works reliably for encodings that write a BOM. However, the fallback encoding is UTF-8, so content in several single-byte encodings still reads correctly as long as it stays within the 7-bit ASCII range.

Here is a sample of how well this feature works with content written in various encodings:

us-ascii
Not detected, but works fine with the default UTF-8 since ASCII is a subset of UTF-8.
utf-7
Not detected, not UTF-8 compatible.
utf-8
Detected correctly. Default encoding anyway.
utf-16/UCS-2
Detected correctly (as utf-16).
utf-32
Detected correctly.
utf-32BE
Detected correctly, but still reads incorrectly in my testing!
unicodeFFFE (UTF-16 big-endian)
Detected correctly.
Windows-1252, iso-8859-1, iso-8859-15, macintosh
Not detected, but content in the shared 7-bit ASCII range reads correctly with the UTF-8 default. Characters above 0x7F generally do not.
Various EBCDIC encodings: IBM037, IBM500, IBM870
Not detected, and not read correctly in tests.
UTF-EBCDIC, SCSU, BOCU-1, Punycode, CESU-8, UCS-4*, UTF-1, UTF-9†, UTF-18
Not supported by the .NET Framework.

For the most part, content in a Unicode encoding (written with a BOM) has the greatest chance of success, and encodings not listed here are unlikely to work. EBCDIC, non-Unicode international encodings and others must really be opened with their explicit encoding if they are to be read successfully, which means you have to anticipate them. That is why you should only produce UTF-8/16/32 content.
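This small Python sketch shows why EBCDIC has to be anticipated. Python's `cp037` codec stands in for IBM037, and the variable names are illustrative. The last step shows one way to produce signed UTF-8: Python's `utf-8-sig` codec writes the EF BB BF signature.

```python
text = "Hello, EBCDIC"

# IBM037 bytes carry no BOM, and its letters sit above 0x7F ("H" is 0xC8).
ebcdic_bytes = text.encode("cp037")

# Unanticipated: falling back to UTF-8 produces replacement characters and noise.
guessed = ebcdic_bytes.decode("utf-8", errors="replace")

# Anticipated: opening the data with its explicit encoding round-trips cleanly.
explicit = ebcdic_bytes.decode("cp037")

# Producing Unicode instead: "utf-8-sig" prepends the EF BB BF signature,
# which a BOM-sniffing reader can detect without being told the encoding.
signed = text.encode("utf-8-sig")

print(repr(guessed))
print(explicit)          # Hello, EBCDIC
print(signed[:3].hex())  # efbbbf
```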

* Not recognized as an alias for UTF-32.
† To be fair, these encodings are a joke.