When Writing Spreadsheets, A Comparison of using the Shared String Table to using In-Line Strings

A common question for developers new to Open XML is:

When writing a spreadsheet, what is the difference between using the shared string table, and using in-line strings?

To better discuss this, let’s consider the pseudo code when using the shared string table to the pseudo code when using in-line strings.  Lets consider that we have some data source that we can iterate through a row at a time, and for each row, we can access each column within the row.  This is typical of just about any data source that you might use to create a spreadsheet.

My recommended way to develop an Open XML program is to not start with a completely empty workbook / worksheet.  Instead I create a ‘template’ workbook that contains only one worksheet, and the one worksheet doesn’t contain any data.  The code then opens this ‘template’ workbook, opens the one worksheet part, and adds the data to this part.

Here is the pseudo code when using the shared string table:

Copy the 'template' workbook to a new file.
Open the new file, specifying the argument that enables the
    code to write changes to the workbook.
Open the workbook, find the worksheet relationship ID, and
    open the worksheet.
If necessary, clear any data from the worksheet.  However, if you
    start with a completely empty worksheet, this step is not necessary.
For each row in the data source
  Create the row element
  for each column in the row
    Is data[row, column] a number?
      Create the cell markup
      Sample markup would look like this:
        <c r="A2">
          <v>1</v>
        </c>
    Is data[row, column] a string?
      Call the method to add the string to the shared string table.
      If the shared string table already contains the string at data[row, column],
        then return the index of the existing string in the shared string table.
      If the shared string table does not contain the string, then
        add the string to the end of the shared string table
        return the index of the newly added string.
      Create the cell markup.
        The 't' attribute = 's'.
        The <v> element contains the index.
      Sample markup would look like this (where the <v>0</v> contains the correct
        integer index into the shared string table):
        <c r="A1"
           t="s">
          <v>0</v>
        </c>
  Next column
Next row
Write and close the shared string table.
Write and close the worksheet part.
Write and close the workbook part (if necessary).
Close the package.

In contrast, here is the pseudo code if we use inline strings:

Copy the 'template' workbook to a new file.
Open the new file, specifying the argument that enables the
    code to write changes to the workbook.
Open the workbook, find the worksheet relationship ID, and
    open the worksheet.
If necessary, clear any data from the worksheet.  However, if you
    start with a completely empty worksheet, this step is not necessary.
For each row in the data source
  Create the row element
  for each column in the row
    Is data[row, column] a number?
      Create the cell markup
      Sample markup would look like this:
        <c r="A2">
          <v>1</v>
        </c>
    Is data[row, column] a string?
      Create the cell markup
      Sample markup would look like this:
        <c r="A1"
           t="inlineStr">
          <is>
            <t>This is a string</t>
          </is>
        </c>
  Next column
Next row
Write and close the shared string table.
Write and close the worksheet part.
Write and close the workbook part (if necessary).
Close the package.

As you can see, this second method is simpler than the first.  It is somewhat easier to write this code.

There is a topic on MSDN that discusses the shared string table:

How to: Insert text into a cell in a spreadsheet document

This topic contains a method, InsertSharedStringItem, which contains an implementation of the pseudo code below:

If the shared string table already contains the string at data[row, column],
  then return the index of the existing string in the shared string table.
If the shared string table does not contain the string, then
  add the string to the end of the shared string table
  return the index of the newly added string.

If you want to write some code to generate a workbook, and if you want to avoid the hassles associated with the shared string table, then you can take the following approach:

So, for example, I create a small worksheet that contains the data:

Column1, Column2, Column3, Column4
1,2,3,4
A,B,C,D

Then I run the code that converts to NOT use the shared string table.  The resulting markup looks like this:

<x:sheetData>
  <x:row r="1"
          spans="1:4"
          x14ac:dyDescent="0.25">
    <x:c r="A1"
          t="inlineStr">
      <x:is>
        <x:t>Column1</x:t>
      </x:is>
    </x:c>
    <x:c r="B1"
          t="inlineStr">
      <x:is>
        <x:t>Column2</x:t>
      </x:is>
    </x:c>
    <x:c r="C1"
          t="inlineStr">
      <x:is>
        <x:t>Column3</x:t>
      </x:is>
    </x:c>
    <x:c r="D1"
          t="inlineStr">
      <x:is>
        <x:t>Column4</x:t>
      </x:is>
    </x:c>
  </x:row>
  <x:row r="2"
          spans="1:4"
          x14ac:dyDescent="0.25">
    <x:c r="A2">
      <x:v>1</x:v>
    </x:c>
    <x:c r="B2">
      <x:v>2</x:v>
    </x:c>
    <x:c r="C2">
      <x:v>3</x:v>
    </x:c>
    <x:c r="D2">
      <x:v>4</x:v>
    </x:c>
  </x:row>
  <x:row r="3"
          spans="1:4"
          x14ac:dyDescent="0.25">
    <x:c r="A3"
          t="inlineStr">
      <x:is>
        <x:t>A</x:t>
      </x:is>
    </x:c>
    <x:c r="B3"
          t="inlineStr">
      <x:is>
        <x:t>B</x:t>
      </x:is>
    </x:c>
    <x:c r="C3"
          t="inlineStr">
      <x:is>
        <x:t>C</x:t>
      </x:is>
    </x:c>
    <x:c r="D3"
          t="inlineStr">
      <x:is>
        <x:t>D</x:t>
      </x:is>
    </x:c>
  </x:row>
</x:sheetData>

Then, if I use the Open XML SDK Productivity Tool to generate code to generate the worksheet part, the resulting code looks like this:

using DocumentFormat.OpenXml.Spreadsheet;
using DocumentFormat.OpenXml;

namespace GeneratedCode
{
    public class GeneratedClass
    {
        // Creates an Worksheet instance and adds its children.
        public Worksheet GenerateWorksheet()
        {
            Worksheet worksheet1 = new Worksheet(){ MCAttributes = new MarkupCompatibilityAttributes(){ Ignorable = "x14ac" }  };
            worksheet1.AddNamespaceDeclaration("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
            worksheet1.AddNamespaceDeclaration("mc", "http://schemas.openxmlformats.org/markup-compatibility/2006");
            worksheet1.AddNamespaceDeclaration("x14ac", "http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac");
            worksheet1.AddNamespaceDeclaration("x", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
            SheetDimension sheetDimension1 = new SheetDimension(){ Reference = "A1:D3" };

            SheetViews sheetViews1 = new SheetViews();

            SheetView sheetView1 = new SheetView(){ TabSelected = true, WorkbookViewId = (UInt32Value)0U };
            Selection selection1 = new Selection(){ ActiveCell = "D4", SequenceOfReferences = new ListValue<StringValue>() { InnerText = "D4" } };

            sheetView1.Append(selection1);

            sheetViews1.Append(sheetView1);
            SheetFormatProperties sheetFormatProperties1 = new SheetFormatProperties(){ DefaultRowHeight = 15D, DyDescent = 0.25D };

            SheetData sheetData1 = new SheetData();

            Row row1 = new Row(){ RowIndex = (UInt32Value)1U, Spans = new ListValue<StringValue>() { InnerText = "1:4" }, DyDescent = 0.25D };

            Cell cell1 = new Cell(){ CellReference = "A1", DataType = CellValues.InlineString };

            InlineString inlineString1 = new InlineString();
            Text text1 = new Text();
            text1.Text = "Column1";

            inlineString1.Append(text1);

            cell1.Append(inlineString1);

            Cell cell2 = new Cell(){ CellReference = "B1", DataType = CellValues.InlineString };

            InlineString inlineString2 = new InlineString();
            Text text2 = new Text();
            text2.Text = "Column2";

            inlineString2.Append(text2);

            cell2.Append(inlineString2);

            Cell cell3 = new Cell(){ CellReference = "C1", DataType = CellValues.InlineString };

            InlineString inlineString3 = new InlineString();
            Text text3 = new Text();
            text3.Text = "Column3";

            inlineString3.Append(text3);

            cell3.Append(inlineString3);

            Cell cell4 = new Cell(){ CellReference = "D1", DataType = CellValues.InlineString };

            InlineString inlineString4 = new InlineString();
            Text text4 = new Text();
            text4.Text = "Column4";

            inlineString4.Append(text4);

            cell4.Append(inlineString4);

            row1.Append(cell1);
            row1.Append(cell2);
            row1.Append(cell3);
            row1.Append(cell4);

            Row row2 = new Row(){ RowIndex = (UInt32Value)2U, Spans = new ListValue<StringValue>() { InnerText = "1:4" }, DyDescent = 0.25D };

            Cell cell5 = new Cell(){ CellReference = "A2" };
            CellValue cellValue1 = new CellValue();
            cellValue1.Text = "1";

            cell5.Append(cellValue1);

            Cell cell6 = new Cell(){ CellReference = "B2" };
            CellValue cellValue2 = new CellValue();
            cellValue2.Text = "2";

            cell6.Append(cellValue2);

            Cell cell7 = new Cell(){ CellReference = "C2" };
            CellValue cellValue3 = new CellValue();
            cellValue3.Text = "3";

            cell7.Append(cellValue3);

            Cell cell8 = new Cell(){ CellReference = "D2" };
            CellValue cellValue4 = new CellValue();
            cellValue4.Text = "4";

            cell8.Append(cellValue4);

            row2.Append(cell5);
            row2.Append(cell6);
            row2.Append(cell7);
            row2.Append(cell8);

            Row row3 = new Row(){ RowIndex = (UInt32Value)3U, Spans = new ListValue<StringValue>() { InnerText = "1:4" }, DyDescent = 0.25D };

            Cell cell9 = new Cell(){ CellReference = "A3", DataType = CellValues.InlineString };

            InlineString inlineString5 = new InlineString();
            Text text5 = new Text();
            text5.Text = "A";

            inlineString5.Append(text5);

            cell9.Append(inlineString5);

            Cell cell10 = new Cell(){ CellReference = "B3", DataType = CellValues.InlineString };

            InlineString inlineString6 = new InlineString();
            Text text6 = new Text();
            text6.Text = "B";

            inlineString6.Append(text6);

            cell10.Append(inlineString6);

            Cell cell11 = new Cell(){ CellReference = "C3", DataType = CellValues.InlineString };

            InlineString inlineString7 = new InlineString();
            Text text7 = new Text();
            text7.Text = "C";

            inlineString7.Append(text7);

            cell11.Append(inlineString7);

            Cell cell12 = new Cell(){ CellReference = "D3", DataType = CellValues.InlineString };

            InlineString inlineString8 = new InlineString();
            Text text8 = new Text();
            text8.Text = "D";

            inlineString8.Append(text8);

            cell12.Append(inlineString8);

            row3.Append(cell9);
            row3.Append(cell10);
            row3.Append(cell11);
            row3.Append(cell12);

            sheetData1.Append(row1);
            sheetData1.Append(row2);
            sheetData1.Append(row3);
            PageMargins pageMargins1 = new PageMargins(){ Left = 0.7D, Right = 0.7D, Top = 0.75D, Bottom = 0.75D, Header = 0.3D, Footer = 0.3D };

            worksheet1.Append(sheetDimension1);
            worksheet1.Append(sheetViews1);
            worksheet1.Append(sheetFormatProperties1);
            worksheet1.Append(sheetData1);
            worksheet1.Append(pageMargins1);
            return worksheet1;
        }


    }
}

This code is simpler than comparable code that uses the shared string table.

Cheers, Eric White