Exploring Table Markup in Open XML SpreadsheetML
I’ve written a blog post and recorded a screen-cast around table markup in SpreadsheetML. It is pretty easy to create a worksheet, populate cells in the worksheet, and then convert the cells to a table. Tables in Excel 2010, as you probably know, look like this:
Here is the screen-cast that explains table markup, and walks through creation of a small Open XML SDK program that converts ordinary SpreadsheetML cells into a table:
Explores table markup, and walks through creating an Open XML SDK program to convert cells into a table.
Justin said,
June 9, 2011 @ 7:06 pm
Hi Eric
Thanks for this and all your great posts regarding OpenXML.
Just one note on modifying tables and sheets in OpenXML, so that that anyone else reading this doesn’t have to go through the same pain I did. If your first column in your table definition is named ‘Column’ for example, the corresponding cell in the worksheet must also have a value of ‘Column’. The applies to all table column definitions.
I had a scenario where my first table column was named ‘DevHours’ and the corresponding cell in the sheet had a value of ‘Dev Hours’. When I tried to open the xlsx in Excel, it complained that the content was unreadable and that the table definition needed to repaired. No hint as to what aspect of the table needed to be repaired or how the content was unreadable. This is led to several hours of frustration before finally figuring it out.
I know this issue might seem obvious to some but I was unaware of such a tight dependency between the sheet and table in terms of naming values. Perhaps it’s mentioned in the document spec, I don’t know. This is only my second day exploring OpenXML so give me a break 🙂
Cheers
Eric White said,
June 10, 2011 @ 7:52 am
Hi Justin,
Thanks for this note! I was not aware of this tight dependency but now I am. 🙂
-Eric
Rick Spiewak said,
September 11, 2012 @ 9:18 pm
Hello, Eric. I’m trying to use your example for how to turn cells into a table. Unfortunately, in my case the cells are first filled dynamically – I’m attempting to replicate what Excel does in importing XML. So the idea of using the Productivity Tool to generate the code doesn’t make sense. Instead, I’m trying to create it on the fly. This results in an unreadable file, and I can’t even unzip the table file!
I’m concerned about hardwiring “rId1” among other things.
Do you have any advice?
Thanks!
Public Function DefineTable(ByVal worksheetPart As WorksheetPart, startCell As String, endCell As String, tableName As String) As Boolean
' Starting row has column names
Dim startRow As Integer = GetRowIndex(startCell)
' Get columns
Dim startColumn As String = GetColumnName(startCell)
If startColumn.Length 1 Then Throw New ArgumentOutOfRangeException(startCell, "Column not single character")
Dim endColumn As String = GetColumnName(endCell)
If endColumn.Length 1 Then Throw New ArgumentOutOfRangeException(endCell, "Column not single character")
Dim columnCount As Integer = Asc(endColumn) - Asc(startColumn) 'Only works for single character columns
Dim tableDefPartsCount As Integer = worksheetPart.GetPartsOfType(Of TableDefinitionPart)().Count
Dim newId As Integer = tableDefPartsCount + 1
' Add Table Definition
Dim newTableDefinitionPart = worksheetPart.AddNewPart(Of TableDefinitionPart)("rId1")
' Create content for Table Definition Part
Dim newTable As New Spreadsheet.Table()
With newTable
.Id = newId
.Name = tableName
.DisplayName = tableName
.Reference = startCell & ":" & endCell
.TotalsRowShown = False
.AutoFilter = New AutoFilter()
.AutoFilter.Reference = .Reference
End With
' Create the columns for the table
Dim newTableColumns As New TableColumns
newTableColumns.Count = columnCount
Dim currentColumn As String = startColumn
For i As Integer = 1 To columnCount
Dim newTableColumn As New TableColumn
With newTableColumn
.Id = i
' get the name for the column from the contents of the cell in the top row
Dim headerCell As Cell = GetSpreadsheetCell(worksheetPart.Worksheet, currentColumn, startRow)
Dim newName As String = GetCellValue(headerCell, Me.SpreadSheetDocument)
.Name = newName
.UniqueName = newName
End With
newTableColumns.Append(newTableColumn)
currentColumn = GetNextColumn(currentColumn)
Next
newTable.Append(newTableColumns)
newTableDefinitionPart.table = newTable
Dim newTableParts As New TableParts()
newTableParts.Count = 1
Dim newTablePart As New TablePart
newTablePart.Id = "rId1"
newTableParts.Append(newTablePart)
worksheetPart.Worksheet.Append(newTableParts)
worksheetPart.Worksheet.Save()
End Function
Eric White said,
September 12, 2012 @ 2:54 am
Hi Rick,
My first piece of advice is that you first simplify your markup to the simplest possible when you generate, and make sure that you can generate a valid spreadsheet. Then, after you have generated the simple markup, you can add in complexities one at a time, making sure that as you add each complexity, you are still generating valid spreadsheet markup.
One issue that makes SpreadsheetML somewhat more complex to generate is that while Word is forgiving if you do not order the attributes exactly as per the spec, Excel is not. In addition to generating the correct elements and attributes, you must make sure that you order them properly.
The Open XML SDK productivity tool can validate documents and report errors. When I generate invalid markup, I first always check the errors that it reports.
Finally, there are two other options that you can use to generate spreadsheet markup. The screen-cast, Using Open XML and LINQ to XML in a Streaming Fashion to Create Huge Spreadsheets, is oriented towards generating huge spreadsheets, but it can just as easily be used to generate small ones. It will generate valid markup.
While I have not personally used it, I have heard good things about ClosedXML. It may suit your needs, and relieve you of much of the burden of generating valid markup.
-Eric
Rick Spiewak said,
September 17, 2012 @ 4:48 pm
Thanks, Eric. I can already create a valid spreadsheet. I can import XML, replicating the layout that Excel creates. It’s only when I try to add the table markup that it fails. I’m working from the code I published here.
Importing a large XML file led me to some optimization by caching the results of some of the queries, and this works as well. I also have adapted the code to ensure that rows are added in numerical order.
The latest problem only arises in trying to overlay the table formatting.
Unfortunately, in adapting to the need to order rows correctly, I found that the validation in the productivity tool falls short – as does doing the same in code.
Is there a good general reference or example for adding table markup to a set of cells when you don’t have advance knowledge of the rest of the markup? i.e., you aren’t free to arbitrarily pick relationship ids, etc.?
Thanks,
Rick
Rick Spiewak said,
September 21, 2012 @ 3:47 pm
I figured out the Id part:
Public Function DefineTable(ByVal worksheetPart As WorksheetPart, startCell As String, endCell As String, tableName As String) As Boolean
' Get the parent SpreadsheetDocument
Dim spreadsheetdocument As SpreadsheetDocument = DirectCast(worksheetPart.OpenXmlPackage, SpreadsheetDocument)
Dim wsId As String = spreadsheetdocument.WorkbookPart.GetIdOfPart(worksheetPart)
but I’m still trying to figure out how not to break the document.
Eric White said,
September 21, 2012 @ 4:08 pm
With regards to generating unique relationship IDs, I generate a GUID, strip the dashes out, prepend an ‘R’, and use that as the rId. This is safe.
Rick Spiewak said,
September 21, 2012 @ 8:26 pm
I’m getting closer – saved the output document in more places. The table.xml file is coming up empty. I can’t figure out how to force it to save – there is no “save” operation on tableparts or table. I’m saving the worksheet:
'''
''' Create a table from existing data in a worksheet
'''
''' WorksheetPart
''' e.g. A1
''' e.g. C4
''' String
'''
'''
Public Function DefineTable(ByVal worksheetPart As WorksheetPart, startCell As String, endCell As String, tableName As String) As Boolean
worksheetPart.Worksheet.Save()
' Get the parent SpreadsheetDocument
Dim spreadsheetdocument As SpreadsheetDocument = DirectCast(worksheetPart.OpenXmlPackage, SpreadsheetDocument)
Dim wsId As String = spreadsheetdocument.WorkbookPart.GetIdOfPart(worksheetPart)
' Starting row has column names
Dim startRow As Integer = GetRowIndex(startCell)
' Get columns
Dim startColumn As String = GetColumnName(startCell)
If startColumn.Length 1 Then Throw New ArgumentOutOfRangeException(startCell, "Column not single character")
Dim endColumn As String = GetColumnName(endCell)
If endColumn.Length 1 Then Throw New ArgumentOutOfRangeException(endCell, "Column not single character")
Dim columnCount As Integer = Asc(endColumn) - Asc(startColumn) + 1 'Only works for single character columns
Dim tableDefPartsCount As Integer = worksheetPart.GetPartsOfType(Of TableDefinitionPart)().Count
Dim newId As Integer = tableDefPartsCount + 1
' Add Table Definition
Dim newTableDefinitionPart = worksheetPart.AddNewPart(Of TableDefinitionPart)(CreateRelationshipId())
worksheetPart.Worksheet.Save()
' Create content for Table Definition Part
Dim newTable As New Spreadsheet.Table()
With newTable
.Id = newId
.Name = tableName
.DisplayName = tableName
.Reference = startCell & ":" & endCell
.TotalsRowShown = False
.AutoFilter = New AutoFilter()
.AutoFilter.Reference = .Reference
End With
' Create the columns for the table
Dim newTableColumns As New TableColumns
newTableColumns.Count = columnCount
Dim currentColumn As String = startColumn
For i As Integer = 1 To columnCount
Dim newTableColumn As New TableColumn
With newTableColumn
.Id = i
' get the name for the column from the contents of the cell in the top row
Dim headerCell As Cell = GetSpreadsheetCell(worksheetPart.Worksheet, currentColumn, startRow)
Dim newName As String = GetCellValue(headerCell, Me.ExcelSpreadsheetDocument)
.Name = newName
.UniqueName = newName
End With
newTableColumns.Append(newTableColumn)
currentColumn = GetNextColumn(currentColumn)
Next
newTable.Append(newTableColumns)
newTableDefinitionPart.table = newTable
Dim newTableParts As New TableParts()
newTableParts.Count = 1
Dim newTablePart As New TablePart() With {.Id = CreateRelationshipId()}
newTableParts.Append(newTablePart)
worksheetPart.Worksheet.Append(newTableParts)
worksheetPart.Worksheet.Save()
End Function
'''
''' Create a unique Relationship ID
'''
'''
'''
Private Function CreateRelationshipId() As String
Dim newGuid As Guid = Guid.NewGuid
Dim strippedGuid As String = Regex.Replace(newGuid.ToString, "-", "")
Return "rId" & strippedGuid.Substring(0, 4)
End Function
but I’m obviously missing something, since the table doesn’t get created. I can even open the output file in Excel (but not the productivity tool – it complains that the root element for table.xml is missing, which is true since the file is empty…)
Any other ideas? This whole issue of when to save what is one of the most irksome in using this SDK!
Thanks,
Rick