Office 2007 Open XML global string replace (C#)

[edit: If you're dealing exclusively with Excel, all strings are in /sharedStrings.xml. However, this is not true of Office 2007 documents in general. See
http://www.rootsilver.com/2008/11/openxml-excel-vs-word---string for more details. This solution/hack works with all Open XML docs.]
You can spend a lot of time parsing really ugly XML in Office 2007 documents.
[edit: for example, just found this: http://msdn.microsoft.com/en-us/library/cc974107.aspx]

Or, you can do something like this: a global string replace in all the xml files packaged within the .pptx, .xlsx, or .docx files (they're really just .zip files).
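
To see what is actually packaged inside one of these files, a small sketch like the following lists the parts. PartLister and ListParts are names I made up for illustration; it assumes the same WindowsBase reference as the main listing, and the path is passed on the command line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll, same reference as the main listing

// Hypothetical helper, not part of the original utility.
public static class PartLister {
  // Returns one "uri (content type)" line per part in the package.
  public static List<string> ListParts(string fileName) {
    List<string> lines = new List<string>();

    // Read-only access is enough for listing.
    using (Package package = Package.Open(fileName, FileMode.Open, FileAccess.Read)) {
      foreach (PackagePart part in package.GetParts()) {
        lines.Add(string.Format("{0} ({1})", part.Uri, part.ContentType));
      }
    }
    return lines;
  }

  public static void Main(string[] args) {
    // args[0] is the path to a .docx, .xlsx, or .pptx file
    foreach (string line in ListParts(args[0])) {
      Console.WriteLine(line);
    }
  }
}
```

Running it against a document before and after a replace is a quick way to check which parts the utility is going to touch.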

For my purposes, I used this code to replace the URL in linked files (i.e. "Paste special") in PowerPoint files: I can use it to chug through a stack of .pptx files, replacing the URL. However, it's generally useful for replacing any regex-able text within Word, PowerPoint, or Excel 2007 documents.
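
One thing to watch when the search text is a URL: the search string is treated as a regular expression, so characters like "." and "?" are metacharacters, and "$" is special in the replacement string. Here is a minimal sketch of escaping both sides when you want a plain literal replace. LiteralReplace and ReplaceLiteral are hypothetical names, and the example.com URLs are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original utility.
public static class LiteralReplace {
  // Replaces `search` with `replacement`, treating both as plain text.
  public static string ReplaceLiteral(string input, string search, string replacement) {
    // Escape regex metacharacters such as '.' and '?' in the search text.
    string pattern = Regex.Escape(search);

    // "$$" is how a literal '$' is written in a .NET replacement string.
    string safeReplacement = replacement.Replace("$", "$$");

    return Regex.Replace(input, pattern, safeReplacement, RegexOptions.IgnoreCase);
  }

  public static void Main() {
    string xml = "<a href=\"http://old.example.com/files/a.xlsx\"/>";

    // prints <a href="http://new.example.com/files/a.xlsx"/>
    Console.WriteLine(ReplaceLiteral(xml,
      "http://old.example.com/files", "http://new.example.com/files"));
  }
}
```

With the utility in this post, the same effect should be available by passing Regex.Escape(oldUrl) as the search argument.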





using System;
using System.Xml;
using System.IO;
using System.Text.RegularExpressions;
using System.IO.Packaging; //C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0\windowsbase.dll

public class ReplacementUtility {
  public static void Main() {
    ReplacementUtility utility = new ReplacementUtility();

    //Word 2007
    string templateFile = "c:\\template.docx";
    string newFile = "c:\\widgets.docx";

    File.Copy(templateFile, newFile);
    utility.UpdateOpenXmlDocument(newFile, "Acme Inc", "Wally's Widgets");

    //PowerPoint 2007
    templateFile = "c:\\template.pptx";
    newFile = "c:\\widgets.pptx";

    File.Copy(templateFile, newFile);
    utility.UpdateOpenXmlDocument(newFile, "Acme Inc", "Wally's Widgets");

  }

  public void UpdateOpenXmlDocument(string fileName, string searchRegexp, string replacement) {
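    // searchRegexp is treated as a regular expression, not a literal string
    Regex regex = new Regex(searchRegexp, RegexOptions.Multiline|RegexOptions.IgnoreCase);

    using (Package package = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite)) {
      foreach (PackagePart part in package.GetParts()) {
        string uri = part.Uri.ToString();

        // only touch the XML parts and the relationship (.rels) parts
        if (uri.IndexOf(".xml") != -1 || uri.IndexOf(".rels") != -1) {
          XmlDocument document = new XmlDocument();

          // read the part, closing the stream before writing back
          using (Stream input = part.GetStream(FileMode.Open, FileAccess.Read)) {
            document.Load(input);
          }

          if (regex.IsMatch(document.InnerXml)) {
            document.InnerXml = regex.Replace(document.InnerXml, replacement);

            // FileMode.Create truncates the part so no stale bytes are left behind
            using (Stream output = part.GetStream(FileMode.Create, FileAccess.Write)) {
              document.Save(output);
            }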
          }
        }
      }
    }
  }
}
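
For chugging through a whole stack of .pptx files, a rough sketch of a batch wrapper might look like this. BatchRunner and ProcessAll are names I invented for the example; the per-file work is passed in as a delegate so the sketch stands on its own, and the folder paths come from the command line.

```csharp
using System;
using System.IO;

// Hypothetical batch wrapper, not part of the original utility.
public static class BatchRunner {
  // Copies every .pptx in sourceDir to targetDir, runs `update` on each copy,
  // and returns the number of files processed.
  public static int ProcessAll(string sourceDir, string targetDir, Action<string> update) {
    Directory.CreateDirectory(targetDir); // does nothing if it already exists
    int count = 0;

    foreach (string source in Directory.GetFiles(sourceDir, "*.pptx")) {
      string target = Path.Combine(targetDir, Path.GetFileName(source));

      // Work on a copy so the original file is left untouched.
      File.Copy(source, target, true);
      update(target);
      count++;
    }
    return count;
  }

  public static void Main(string[] args) {
    // args[0] = source folder, args[1] = target folder
    int processed = ProcessAll(args[0], args[1], delegate(string path) {
      Console.WriteLine("would update " + path);
    });
    Console.WriteLine(processed + " file(s) processed");
  }
}
```

In place of the placeholder delegate, you would pass something like path => utility.UpdateOpenXmlDocument(path, oldUrl, newUrl), where oldUrl and newUrl are your own strings.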

3 Comments

John said:

Hi,

I tried this on my project with .docx files. While debugging it works just fine (all the words I want to be replaced, get replaced), but when I try to open the new file in MS Word 2007, Word gives an error about opening the Open XML file, because there are problems with the contents (freely translated from Dutch...).

Anybody got help on this matter? I really don't know where the error may come from.

Jeffrey Knight (author) said:

Code edit:
document.Save(part.GetStream());

replaced with:
document.Save(part.GetStream(FileMode.Create, FileAccess.Write));
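
Why this likely matters: as far as I can tell from the documentation, the parameterless GetStream() opens the part without truncating it, so writing XML that is shorter than the original leaves stale bytes at the end and the part is no longer well-formed. The sketch below shows the same effect with a plain FileStream on a temp file. It is an analogy, not the packaging API itself, and TruncationDemo and Overwrite are made-up names.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo using a plain file as a stand-in for a package part.
public static class TruncationDemo {
  // Writes `original` to the file, overwrites it with `shorter` using the
  // given FileMode, and returns what ends up on disk.
  public static string Overwrite(string path, string original, string shorter, FileMode mode) {
    File.WriteAllText(path, original, Encoding.ASCII);

    using (FileStream stream = new FileStream(path, mode, FileAccess.Write)) {
      byte[] bytes = Encoding.ASCII.GetBytes(shorter);
      stream.Write(bytes, 0, bytes.Length);
    }
    return File.ReadAllText(path, Encoding.ASCII);
  }

  public static void Main() {
    string path = Path.GetTempFileName();
    string original = "<doc>Wally's Widgets</doc>";
    string shorter = "<doc>Acme</doc>";

    // OpenOrCreate keeps the old tail: <doc>Acme</doc>dgets</doc>
    Console.WriteLine(Overwrite(path, original, shorter, FileMode.OpenOrCreate));

    // Create truncates first: <doc>Acme</doc>
    Console.WriteLine(Overwrite(path, original, shorter, FileMode.Create));

    File.Delete(path);
  }
}
```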

George said:

You can reduce the number of lines in this code if you use a ready-made library, such as Aspose or the Invoke docx lib (http://invoke.co.nz/products/docx.aspx). This is completely free and will automate this task.
