Friday, June 23, 2006

Bug in Notepad …

The day before yesterday, one of my colleagues sent me a message asking me to type “Bush hid the facts” in Notepad, save it, close it and reopen it. I tried it, and I was surprised to see that Notepad displayed only a few boxes!

Step 1: Start >> Run >> notepad
Step 2: Just type the text in quotes alone – “Bush hid the facts”
Step 3: Save the file [I saved it as a.txt] and close it
Step 4: Open that saved file.

I didn’t think this was an Easter egg, as I was sure that Microsoft wouldn’t needlessly take on the US government like this. So I thought of testing it out.

As a first attempt, I replaced every letter in that file with “a”, so that the file read “aaaa aaa aaa aaaaa”, saved it as “b.txt”, closed it and reopened it. As I had expected, it showed only boxes. So I immediately concluded that Notepad isn’t good at opening files that contain only a single sentence of four words with lengths 4-3-3-5.

I replied to my friend with my findings! But I must admit that this isn’t consistent at all: the problem shows up for some words/characters and not for others.

What confused me more was that I was able to open the saved Notepad file (with the above-mentioned 4-3-3-5 pattern) from the command prompt without any hassle.

Step 1: Start >> Run >> cmd
Step 2: edit a.txt

Only then did I start thinking along the lines of “Maybe this has something to do with text encoding”. So I started looking for the answer with the help of my friend Google. Though there are many sites that talk about it, to my knowledge the most relevant ones are these two links.

  1. http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx
    --- Frankly, I didn’t understand much of this post, as it was over my head. But the one point I did understand is that the problem lies in the IsTextUnicode() function, which Notepad uses to decide whether a piece of text is Unicode or not.
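
To see why a guess of "Unicode" is even possible for this text, here is a minimal Python sketch. It does not call IsTextUnicode() or reproduce its heuristics; it only assumes that "Unicode" here means little-endian UTF-16 and shows what happens when the same bytes are read that way. The helper name misread_as_utf16 is made up for this illustration.

```python
def misread_as_utf16(text: str) -> str:
    """Encode text as one-byte-per-character ASCII, then decode those
    same bytes as little-endian UTF-16 (an assumption about what
    "Unicode" means here; this is not the IsTextUnicode() algorithm)."""
    raw = text.encode("ascii")
    return raw.decode("utf-16-le")


for phrase in ("Bush hid the facts", "aaaa aaa aaa aaaaa"):
    raw = phrase.encode("ascii")
    garbled = misread_as_utf16(phrase)
    # Every pair of ASCII bytes collapses into a single 16-bit character.
    code_points = [f"U+{ord(ch):04X}" for ch in garbled]
    print(f"{phrase!r}: {len(raw)} bytes -> {len(garbled)} characters")
    print("  ", " ".join(code_points))
```

Both phrases are 18 bytes long, so each one turns into 9 sixteen-bit characters, and all of them fall in the CJK range (U+4E00 to U+9FFF). A machine without a font for those characters would show them as boxes. Nothing is lost in the file itself: encoding the 9 characters back to UTF-16 LE gives the original 18 bytes.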

If you go through those two links, you will see that people were already discussing this issue back in 2004 :(


4 comments:

Hari Baskar J said...

Vadivel:

In the same Notepad file, if you edit the few boxes, type "Bush hid the facts", save, close and reopen,
it shows the message properly.
So it's not the 4-3-3-5 formatting.

Maybe it isn't consistent.

Vadivel said...

Yes, Hari. It isn't related to 4-3-3-5 as I had initially thought.

If you check the links I have provided, you can see that they use text of varied lengths and still get this error.

So it's purely to do with encoding, as Ray has pointed out in his blog.

Raja said...

Good Research... Like it :)

Sudar said...

Hi Vadivel,

It’s because of text encoding as you have pointed out. The IsTextUnicode() function is based on some heuristics and is not 100% perfect.

But what made this bug famous is not the encoding problem itself but the text for which the bug occurred. ;)

Cheers,
Sudar