When most developers are faced with a fixed-width text file, they reach for the String class. While this is effective, it isn’t efficient. Strings in .NET are immutable, so every call to Substring allocates a new string, and heavy use of it is memory intensive. A better way is to use the classes in System.Text.RegularExpressions.
A fixed-width file is one where each column is defined by the number of characters it occupies. For instance, here is a list of the Big 10 (11? 12?) universities, their locations, and the years they were founded:
University of Illinois          Champaign, Illinois         1867
Indiana University              Bloomington, Indiana        1820
University of Iowa              Iowa City, Iowa             1847
University of Michigan          Ann Arbor, Michigan         1817
Michigan State University       East Lansing, Michigan      1855
University of Minnesota         Minneapolis, Minnesota      1851
Northwestern University         Evanston, Illinois          1851
Ohio State University           Columbus, Ohio              1870
Pennsylvania State University   State College, Pennsylvania 1855
Purdue University               West Lafayette, Indiana     1869
University of Wisconsin–Madison Madison, Wisconsin          1848
The university column is 32 characters wide, the location is 28 characters, and the year is 4 characters; shorter values are padded with trailing spaces. We can debate up and down the benefits of such a format, but it is what it is, and we often get files like this from legacy systems.
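If you need to build a line in this layout yourself, say to create a test file, the column widths translate directly into padding. This is only a minimal sketch and not part of the program discussed here; the names FixedWidthSample and FormatLine are made up for illustration, and it assumes no value is longer than its column (PadRight does not truncate).

```csharp
using System;

static class FixedWidthSample
{
    // Hypothetical helper: pads each field to the widths described above (32, 28, 4).
    // Assumes every value already fits in its column.
    public static string FormatLine(string school, string location, int year)
    {
        return school.PadRight(32) + location.PadRight(28) + year.ToString().PadLeft(4);
    }

    static void Main()
    {
        string line = FormatLine("University of Iowa", "Iowa City, Iowa", 1847);
        // Brackets make the trailing padding visible on the console.
        Console.WriteLine("[" + line + "]");
        Console.WriteLine(line.Length); // 32 + 28 + 4 = 64
    }
}
```

Every line produced this way is exactly 64 characters long, which is what a pattern with fixed repetition counts relies on.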
Instead of using the String.Substring method to get the values out, we can use the Regex and Match classes in System.Text.RegularExpressions. When you call Regex.Match, you get back a Match object that exposes a Groups collection, holding each piece captured where the expression and the input intersect.
Here is an example program that loads the file and uses an expression with named groups (note the (?<name>...) format) to break each line into a collection, basically an array. Notice that there isn’t a single call to Substring in the project. To run the program, save the formatted text above into a file called “BigTen.txt” in the root of your C: drive.
using System;
using System.IO;
using System.Text.RegularExpressions;
namespace BigTen
{
    class Program
    {
        static void Main(string[] args)
        {
            // Named groups capture the three fixed-width columns: 32 + 28 + 4 characters.
            string pattern = @"^(?<school>.{32})(?<location>.{28})(?<joined>.{4})$";
            Regex re = new Regex(pattern);
            // The using block closes the file even if an exception is thrown.
            using (StreamReader sr = new StreamReader(@"c:\BigTen.txt"))
            {
                while (sr.Peek() != -1)
                {
                    Match match = re.Match(sr.ReadLine());
                    // Skip lines that do not fit the layout (blank or wrong length).
                    if (!match.Success)
                    {
                        continue;
                    }
                    // TrimEnd removes the padding that fills out each column.
                    Console.WriteLine(match.Groups["school"].Value.TrimEnd());
                    Console.WriteLine(match.Groups["location"].Value.TrimEnd());
                    Console.WriteLine(match.Groups["joined"].Value.TrimEnd() + "\n");
                }
            }
            // Keep the console window open until Enter is pressed.
            Console.ReadLine();
        }
    }
}
Of course, there are downsides to regular expressions. They are difficult to debug, and the syntax is arcane. For this, however, they make for an excellent solution, and the formatting of the expression is quite readable. Only one expression is used, so it is easier than some to debug. I think it is a good solution to the problem at hand. Give it a try!
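Since debugging is the weak spot, one way to make failures visible is to check Match.Success and report the offending line rather than printing empty values. A rough sketch, assuming the same pattern as the program above; the names FixedWidthCheck and Describe are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class FixedWidthCheck
{
    // Same layout as the article's pattern: 32 + 28 + 4 characters.
    static readonly Regex Layout =
        new Regex(@"^(?<school>.{32})(?<location>.{28})(?<joined>.{4})$");

    // Hypothetical helper: returns a readable summary of the line,
    // or a diagnostic message when the line does not fit the layout.
    public static string Describe(string line, int lineNumber)
    {
        Match match = Layout.Match(line);
        if (!match.Success)
        {
            return "Line " + lineNumber
                 + ": does not match the 64-character layout (length " + line.Length + ")";
        }
        return match.Groups["school"].Value.TrimEnd() + " | "
             + match.Groups["location"].Value.TrimEnd() + " | "
             + match.Groups["joined"].Value;
    }

    static void Main()
    {
        string good = "University of Iowa".PadRight(32) + "Iowa City, Iowa".PadRight(28) + "1847";
        Console.WriteLine(Describe(good, 1));
        Console.WriteLine(Describe("too short", 2));
    }
}
```

When a match fails, Match.Groups still returns group objects whose Value is an empty string, so without the Success check a malformed line silently prints blanks.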