A moderate-performance, low-memory reader for delimited files that follows a modified version of RFC 4180 (CSV). On my box it reads CSV files at ~20MB/sec. I'm pretty sure it can be faster, but this seems like a decent place to start.
The modifications are: it skips blank lines, and it allows a remark character, which is the pound symbol (#) by default. The delimiter defaults to a comma (,) but can be changed to a tab or any other character.
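As a rough illustration of the skip rule, here is a minimal PowerBASIC sketch. IsSkippedLine is a hypothetical helper written for this example and is not part of delimitedReader.inc. It assumes that a line containing only spaces counts as blank and that the remark character must be the first non-space character; the reader itself may decide differently.

```powerbasic
' Hypothetical helper, not part of delimitedReader.inc.
' Returns -1 (true) when a line would be skipped: it is blank, or it
' starts with the remark character. Trimming spaces first is an
' assumption of this sketch, not documented reader behavior.
Function IsSkippedLine( ByVal txt As String, ByVal remarkChar As String ) As Long
    txt = Trim$( txt )
    If Len( txt ) = 0 Then
        Function = -1                       ' blank line
    ElseIf Left$( txt, 1 ) = remarkChar Then
        Function = -1                       ' remark line, e.g. "# comment"
    Else
        Function = 0                        ' ordinary data line
    End If
End Function
```

A check like this only makes sense for lines outside quoted fields, since a quoted field may legitimately contain line breaks and the remark character.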
It fully supports any characters embedded inside quotes, including carriage returns and line feeds. A double quote can be embedded by writing two double quotes (""). For example, "This ""is"" a test" is returned as This "is" a test when read.
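The quoting rule can be sketched in a few lines of PowerBASIC. UnquoteField is a hypothetical helper written for this example; it is not part of delimitedReader.inc, and the reader's own implementation may work differently.

```powerbasic
' Hypothetical helper, not part of delimitedReader.inc.
' Strips the surrounding quotes from a quoted field and collapses
' each doubled quote ("") into a single quote (").
Function UnquoteField( ByVal fld As String ) As String
    If Len( fld ) >= 2 And Left$( fld, 1 ) = $Dq And Right$( fld, 1 ) = $Dq Then
        fld = Mid$( fld, 2, Len( fld ) - 2 )    ' drop the outer quotes
        Replace $Dq + $Dq With $Dq In fld       ' "" becomes "
    End If
    Function = fld                              ' unquoted fields pass through unchanged
End Function
```

Given the raw field "This ""is"" a test", this sketch yields This "is" a test, matching the behavior described above.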
Test.bas lets you try the three included delimited files. If you download the Big CSV file (about 14MB, expands to ~104MB), you can use Test2.bas to see how fast it runs on your box.
To use it, you must include delimitedReader.inc:

    #Include "delimitedReader.inc"
In code you would do some variation of the following:

    Local rdr As iDelimitedReader
    rdr = Class "cDelimitedReader"
    rdr.Open( "myFileName", -1 )    ' -1 for a csv with headers, 0 for a csv without headers
    While rdr.ReadNext()
        ' Do something
        For i = 0 To rdr.ColumnCount - 1
            value = rdr.ByIndex( i )
        Next
    Wend
    rdr.Close()

Some additional class methods:
Note that ColumnNames, delimiter character, and remark character can only be changed before the CSV reader is opened.
If a file does not have column headers, you can assign column names if you want. If you don't, the columns will be labeled c1, c2, c3, etc. The following is an example of how you might set the column names before the reader is opened:
    Dim names(1) As String
    Array Assign names() = "Column1Name", "Column2Name"
    Call SetColumnNames( rdr, names() )