I have PPTX files created by users in PowerPoint 2016. The slides contain embedded Excel worksheets that I need to access for further processing. I am using Open XML SDK v2.6.1 in my project.
When I pass the embedded object stream to SpreadsheetDocument.Open using the following code:
    using (PresentationDocument pd = PresentationDocument.Open(pptxFile, true))
    {
        foreach (SlidePart slide in pd.PresentationPart.GetPartsOfType<SlidePart>())
        {
            foreach (EmbeddedObjectPart eoPart in slide.EmbeddedObjectParts)
            {
                using (SpreadsheetDocument sd = SpreadsheetDocument.Open(eoPart.GetStream(), true))
                {
                    // do some work with worksheets
                    var count = sd.WorkbookPart.WorksheetParts.Count();
                }
            }
        }
    }
I get the following exception:
System.IO.FileFormatException: File contains corrupted data.
at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode packageFileMode, FileAccess packageFileAccess)
at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess)
at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(Stream stream, Boolean readWriteMode)
at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
at ...
When I open the PPTX package, rename oleObject1.bin in the embeddings folder to oleObject1.zip, and look at the file information in WinRAR, it is reported as an SFX ZIP volume rather than a plain ZIP archive.
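Independently of WinRAR, the leading bytes of the embedded stream can be checked to see what kind of container it is. The following is only a minimal sketch: `EmbeddedPayloadSniffer` and `Describe` are hypothetical names made up for illustration, not part of the Open XML SDK, and the only assumptions are the well-known signatures of a ZIP local file header (`PK\x03\x04`) and of an OLE compound file (`D0 CF 11 E0 A1 B1 1A E1`).

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Open XML SDK): classifies a stream
// by its leading signature bytes.
public static class EmbeddedPayloadSniffer
{
    // First four bytes of a ZIP local file header ("PK\x03\x04").
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // First eight bytes of an OLE compound file (structured storage).
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Reads up to eight bytes from the current position and returns
    // "zip", "ole-compound-file", or "unknown".
    // Note: this advances the stream, so reopen or rewind it afterwards.
    public static string Describe(Stream stream)
    {
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream reached early
            read += n;
        }

        if (StartsWith(header, read, ZipSignature)) return "zip";
        if (StartsWith(header, read, OleSignature)) return "ole-compound-file";
        return "unknown";
    }

    private static bool StartsWith(byte[] data, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Inside the inner loop of the code above, this could be called as `EmbeddedPayloadSniffer.Describe(eoPart.GetStream())`. A result other than "zip" would at least be consistent with the exception, since the packaging layer expects the stream to start as a plain ZIP package.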
The only way I could get SpreadsheetDocument to open the embedded object stream was to convert the stream to a System.IO.Compression.ZipArchive using the DotNetZip library.
So I have the following questions:
1. Is there a way to get the Open XML SDK to open an embedded Excel worksheet stream without explicit transcoding (from an SFX ZIP volume to a ZIP archive)?
2. What is the best way to write the modified stream back into the presentation document? This is important because the worksheet data will be updated and has to be written back to the host document.
3. Is there another, more elegant way to solve this issue?
Note: this issue does not occur when the worksheet is embedded in the presentation programmatically using the Open XML SDK.