Query Open XML Spreadsheets in VB.NET using LINQ

When working with SpreadsheetML, one of the most common needs is to retrieve the data from a worksheet or a table in as easy a fashion as possible.  There has been a fair amount written for C# developers to do this, but not nearly as much for VB.NET.  Some time ago, I wrote a blog post, Using LINQ to Query Excel Tables, which introduced a few C# classes and extension methods that make it easy to query SpreadsheetML.  This post presents a super-easy way to use that code from VB.NET.

To make it as easy as possible to get going using LINQ with VB to access SpreadsheetML, I’ve recorded the following screen-cast that walks through the process of building a VB.NET application that uses the code from that blog post.  Here is the video:

I’ve attached a zip file that contains this code, as well as the sample spreadsheet.

The gist of the technique is to put together a project that contains a VB module that uses some C# code in another project in the same solution.  Then, you can write a bit of code to query the spreadsheet.  The following code shows how to retrieve all cells for all rows in a worksheet:

Console.WriteLine("Contents of Spreadsheet (Column C)")
Console.WriteLine("==================================")
Using doc As SpreadsheetDocument = SpreadsheetDocument.Open("Data.xlsx", False)
   
Dim worksheet As WorksheetPart = doc.WorkbookPart.GetPartById("rId1")
   
For Each Row In worksheet.Rows()
       
Console.WriteLine("  RowId:{0}", Row.RowId)
       
Console.WriteLine("  Spans:{0}", Row.Spans)
       
For Each cell In Row.Cells()
           
Console.WriteLine("    Column:{0}", cell.Column)
           
Console.WriteLine("      ColumnId:{0}", cell.ColumnId)
           
If cell.Type IsNot Nothing Then
               
Console.WriteLine("      Type:{0}", cell.Type)
           
End If
           
If cell.Value IsNot Nothing Then
               
Console.WriteLine("      Value:{0}", cell.Value)
           
End If
           
If cell.Formula IsNot Nothing Then
               
Console.WriteLine("      Formula:>{0}<", cell.Formula)
           
End If
           
If cell.SharedString IsNot Nothing Then
               
Console.WriteLine("      SharedString:>{0}<", cell.SharedString)
           
End If
       
Next
   
Next
End Using

If you want to retrieve just the data for a specific column, you can modify the above code as follows.  Note that when the code iterates through the Row.Cells() collection, you
can use the Where extension method to retrieve just the column you are interested in:

Console.WriteLine("Contents of Spreadsheet (Column C)")
Console.WriteLine("==================================")
Using doc As SpreadsheetDocument = SpreadsheetDocument.Open("Data.xlsx", False)
   
Dim worksheet As WorksheetPart = doc.WorkbookPart.GetPartById("rId1")
   
For Each Row In worksheet.Rows()
       
Console.WriteLine("  RowId:{0}", Row.RowId)
       
Console.WriteLine("  Spans:{0}", Row.Spans)
       
For Each cell In Row.Cells().Where(Function(c) c.ColumnId = "C")
           
Console.WriteLine("    Column:{0}", cell.Column)
           
Console.WriteLine("      ColumnId:{0}", cell.ColumnId)
           
If cell.Type IsNot Nothing Then
               
Console.WriteLine("      Type:{0}", cell.Type)
           
End If
           
If cell.Value IsNot Nothing Then
               
Console.WriteLine("      Value:{0}", cell.Value)
           
End If
           
If cell.Formula IsNot Nothing Then
               
Console.WriteLine("      Formula:>{0}<", cell.Formula)
           
End If
           
If cell.SharedString IsNot Nothing Then
               
Console.WriteLine("      SharedString:>{0}<", cell.SharedString)
           
End If
       
Next
   
Next
End Using

If you want to just retrieve the values of interest, so that you could do something like create a drop-down list with the values from a specific column, you could rewrite the code like this:

Console.WriteLine("Contents of Spreadsheet (Column C)")
Console.WriteLine("==================================")
Using doc As SpreadsheetDocument = SpreadsheetDocument.Open("Data.xlsx", False)
   
Dim worksheet As WorksheetPart = doc.WorkbookPart.GetPartById("rId1")
   
For Each Row In worksheet.Rows()
       
For Each cell In Row.Cells().Where(Function(c) c.ColumnId = "C")
           
If cell.Type IsNot Nothing Then
               
If cell.Type = "s" Then
                   
Console.WriteLine(cell.SharedString)
               
End If
           
Else
               
If cell.Value IsNot Nothing Then
                   
Console.WriteLine(cell.Value)
               
End If
           
End If
       
Next
   
Next
End Using

Finally, if you convert the cells in the worksheet to a table, then you can simplify the code and write it like this:

Console.WriteLine("Contents of Table")
Console.WriteLine("=================")
Using doc As SpreadsheetDocument = _
   
SpreadsheetDocument.Open("Data.xlsx", False)
   
Dim query = From i In doc.Table("MyTable").TableRows()
       
Select i
   
For Each r In query
       
Console.WriteLine("{0} : {1} : {2}", _
            r
("Col1").Value.PadRight(10), _
            r
("Col2").Value.PadRight(10), r("Col3"))
   
Next
End Using

Example – Download Code