Tutorial :Better Way of Getting the Text Value of Cells in an Excel (*.xls) File



Question:

I'm trying to write an import function for getting data out of an excel file. How I currently do it is as follows:

Private Sub ReadExcel(ByVal childform As PhoneDiag.frmData, ByVal FileName As String)            Dim xlApp As Excel.Application          Dim xlWorkBook As Excel.Workbook          Dim xlWorkSheet As Excel.Worksheet            xlApp = New Excel.ApplicationClass          xlWorkBook = xlApp.Workbooks.Open(FileName)          xlWorkSheet = xlWorkBook.Worksheets(1)          Dim columnrange = xlWorkSheet.Columns          Dim therange = xlWorkSheet.UsedRange            ''Add rows by column          For rCnt = 2 To therange.Rows.Count                Dim rowArray(therange.Columns.Count) As String                For cCnt = 1 To therange.Columns.Count                    Dim Obj = CType(therange.Cells(rCnt, cCnt), Excel.Range)                  Dim celltext As String                  celltext = Obj.Value.ToString                  rowArray((cCnt - 1)) = celltext                Next                childform.datagridSample.Rows.Add(rowArray)            Next            '' make sure we close the excel.exe service after use          xlWorkBook.Close()          xlApp.Quit()          System.Runtime.InteropServices.Marshal.ReleaseComObject(xlWorkBook)          System.Runtime.InteropServices.Marshal.ReleaseComObject(xlWorkSheet)          System.Runtime.InteropServices.Marshal.ReleaseComObject(xlApp)        End Sub  

The problem is, of course, is that it runs horribly. From what I can gather, it more than likely comes down to this line:

Dim Obj = CType(therange.Cells(rCnt, cCnt), Excel.Range)  

All I need is the text from the cells, not to create an object for each cell (and then not send them to garbage collection). Is there an easier way of just getting the text?

Ideally, if I can get a method for getting the text values of the cell, I'd like to get the multiple rowArray()'s added to a master array and update the program's values later.

If you see any other performance tips, let me know. It would be much appreciated. =b

EDIT: I also realize that I have two options if I were to create a master array, say mArr, to hold all the data. Would it be better performance-wise to have mArr to be large and the sub arrays small, or mArr to be small and the sub arrays to hold more of the information?

I ask because the files that will be imported will have more rows than columns, so I was wondering if there was any "set" way of doing it.


Solution:1

Reading cell content individually is a huge performance killer. My advice is to read first the entire range into an array of objects, and then retrieve the data from that array. I don't write in VB.NET, but in C# the code looks something like this:

Excel.Range firstCell = excelWorksheet.get_Range("A1", Type.Missing);  Excel.Range lastCell = excelWorksheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);  object[,] cellValues;  Excel.Range worksheetCells = excelWorksheet.get_Range(firstCell, lastCell);  cellValues = worksheetCells.Value2 as object[,];  

This example reads the entire contents of a sheet into cellValues (careful about nulls). The same advice applies to writing to the sheet - do it all at once, with one array.
If you are interested, I have a longer post on this there
Oh and BTW, replace

xlApp = New Excel.ApplicationClass  

by

xlApp = New Excel.Application  


Solution:2

here's a C# version (but you get the gist) that will get data into a datatable...

con = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+SpreadsheetLocation+";Extended Properties=\"Excel 8.0;IMEX=1;HDR=NO\"");  OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM ["+Worksheet+"$]", con);  DataTable dt = new DataTable();  da.Fill(dt);  

where "SpreadSheetLocation" and "Worksheet" are a file path and work sheet names respectively. You can then convert the datatable rows to arrays if you like.

Update : you also don't need Excel installed in the machine for this solution...


Solution:3

For simple reading, I'd use the Excel Data Reader available on CodePlex.

Machines using the component don't need Excel installed and it's pretty easy to use. You can read a Worksheet into a DataSet.


Solution:4

SpreadshsetGear for .NET will allow you to open a workbook and access the raw values of cells (numbers, text, logical or error) or get the formatted text of the cells. Because SpreadsheetGear runs as part of your application rather than COM Interop as with Excel, it will run much faster (see the comments on this page to see what some of our customers have said about performance).

You can see live samples here and download the free trial here.

Disclaimer: I own SpreadsheetGear LLC


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »