I was able to get this done in under 3 ms, starting with a file buffer size of 1K and letting it grow to at most 2K.
I used a byte-by-byte comparison to do the diff because I wanted to display line numbers as well as the text, so I needed to be able to count the number of newlines in the file. Functions that read a file line by line can save a lot of development time, since they do all the work of parsing out the lines for you, but they do nothing to make your program lean and mean. When I thought I had things as lean and mean as possible and had the total time down to 8 to 12 ms, I ran the profiler in Visual Studio.
I knew that output is a pretty expensive operation, but I was a little surprised at just how much time it was taking.
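
To make the byte-by-byte idea concrete, here is a minimal sketch of comparing two in-memory buffers while counting newlines, so that the line number is already known when the first mismatch is found. This is not the code from the article: the names FirstDiff and firstDifference are made up for illustration, and it only locates the first difference rather than producing a full diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type (not from the article): where two buffers first differ.
struct FirstDiff {
    bool found;          // true if the buffers differ anywhere
    std::size_t offset;  // byte offset of the first difference
    std::size_t line;    // 1-based line number containing that offset
};

// Walk both buffers one byte at a time, counting '\n' along the way so the
// line number is available the moment a mismatch shows up.
inline FirstDiff firstDifference(const char* a, std::size_t aLen,
                                 const char* b, std::size_t bLen) {
    assert(a != nullptr && b != nullptr);
    const std::size_t common = aLen < bLen ? aLen : bLen;
    std::size_t line = 1;
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i]) return {true, i, line};
        if (a[i] == '\n') ++line;  // bytes matched, so either buffer can be checked
    }
    // One buffer is a prefix of the other: they differ where the shorter one ends.
    if (aLen != bLen) return {true, common, line};
    return {false, common, line};
}
```

Calling firstDifference("a\nb\nc", 5, "a\nb\nd", 5) reports a difference at offset 4 on line 3. No line-parsing function is involved, which is the point of the approach: the newline count is a side effect of a loop that has to touch every byte anyway.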

I have found that your algorithm fails and returns incorrect results on some of my test files. I guess it was a case of me not doing a regression test after a last-minute change from a linked list to just an array of longs. No, I really didn't even consider LCS; I was mostly thinking about how to accomplish this using the least amount of memory while running as fast as possible.
I increase this buffer size only when a really long line exceeds it or when the run of differences I am currently comparing exceeds it.
So much processing goes into reading the file, and into reading each line of it.
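
The buffer policy described above (start small and grow only when a long line or a run of differences needs more room) could be sketched roughly as follows. The class name GrowableBuffer and the doubling growth step are assumptions made for this example; the actual growth strategy is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the article): a read buffer that starts at 1K
// and grows only when the caller asks for more room than it currently has.
class GrowableBuffer {
public:
    explicit GrowableBuffer(std::size_t initial = 1024) : data_(initial) {
        assert(initial > 0);  // a zero-sized start could never grow by doubling
    }

    // Make sure at least `needed` bytes are available, e.g. for a very long
    // line or a long run of differences. Doubling is an assumed growth step.
    void ensure(std::size_t needed) {
        std::size_t size = data_.size();
        while (size < needed) size *= 2;
        if (size != data_.size()) data_.resize(size);  // existing bytes are kept
    }

    std::size_t capacity() const { return data_.size(); }
    char* data() { return data_.data(); }

private:
    std::vector<char> data_;
};
```

With the default 1K start, ensure(500) leaves the buffer alone and ensure(1500) grows it to 2K. In the common case nothing is reallocated, so the memory footprint stays small unless the input actually demands more.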

This article is obviously for the C++ contest, which seems to have fewer entries than the C# category. I could have cheated (optimized) a little more by not displaying the results while I was timing, but decided that wouldn't be right.

