... that's unfortunately not a trivial task.
But with the new Office 2007 Word documents and their Open XML format, it is possible.
Background: The new Office documents (*.docx) are just data containers (zip files) that hold different parts (XML, images, styles, etc.) and the relationships between them. The main part is itself an XML file describing the content (paragraphs, image positions, plain text, etc.). More info at [1] and [2].
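To make the "zip container" idea a bit more tangible, here is a small sketch that builds a tiny zip archive in memory and lists its entries. It is only an illustration under some assumptions: it uses System.IO.Compression from newer .NET versions instead of the Open XML SDK, the class and method names are made up for this example, and the three entries are empty stubs that only mimic the layout of a real .docx package.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PackageListing
{
    // Builds a small zip archive in memory whose entry names mimic a .docx package.
    // The entry contents are stubs, not valid Open XML parts.
    public static byte[] BuildSamplePackage()
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            using (ZipArchive zip = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                AddEntry(zip, "[Content_Types].xml", "<Types/>");
                AddEntry(zip, "_rels/.rels", "<Relationships/>");
                AddEntry(zip, "word/document.xml", "<document/>");
            }
            // The archive is complete once the ZipArchive has been disposed.
            return buffer.ToArray();
        }
    }

    private static void AddEntry(ZipArchive zip, string name, string content)
    {
        ZipArchiveEntry entry = zip.CreateEntry(name);
        using (StreamWriter writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
        {
            writer.Write(content);
        }
    }

    // Returns the names of all entries stored in a zip-based package.
    public static List<string> ListParts(Stream package)
    {
        List<string> names = new List<string>();
        using (ZipArchive zip = new ZipArchive(package, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in ListParts(new MemoryStream(BuildSamplePackage())))
        {
            Console.WriteLine(name);
        }
    }
}
```

Passing a FileStream of a real .docx file to ListParts should list the actual parts of that document in the same way.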
On Brian Jones' blog [3] I found the solution. With the "altChunk" element it is possible to reference an XHTML part within the document. 1+1=2 ... with this information and the latest Open XML SDK (April CTP) [4] I wrote a small piece of code which inserts the content of web pages into a new Word document.
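For orientation, this is roughly what the main part looks like once an altChunk reference is in place. The sketch below only builds the markup with LINQ to XML and prints it; it does not touch a real package. The class name and the sample text are made up for this example. The r:id value has to match the ID of a relationship that points to the XHTML part.

```csharp
using System;
using System.Xml.Linq;

public static class AltChunkMarkup
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Builds a minimal main document XML with one paragraph followed by an altChunk.
    public static XDocument Build(string relationshipId)
    {
        return new XDocument(
            new XElement(W + "document",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XAttribute(XNamespace.Xmlns + "r", R.NamespaceName),
                new XElement(W + "body",
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "t", "Text before the imported page"))),
                    // The altChunk sits next to the paragraph, not inside it.
                    new XElement(W + "altChunk",
                        new XAttribute(R + "id", relationshipId)))));
    }

    public static void Main()
    {
        Console.WriteLine(Build("myExternalXhtmlID"));
    }
}
```

Note that the altChunk element is a sibling of the paragraph; this is why the code further down inserts it after the paragraph rather than inside it.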
Here are the code snippets; maybe you can reuse them somehow. You can also download the Visual Studio 2008 solution at [5].
First) Generate a new Word document
/// <summary>
/// Creates the new Word document using the Open XML SDK.
/// see http://msdn2.microsoft.com/en-us/library/bb656295.aspx
/// </summary>
/// <param name="document">The document.</param>
public static void CreateNewWordDocument(string document)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(document, WordprocessingDocumentType.Document))
    {
        // Set the content of the document so that Word can open it.
        MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();

        // write the main content XML structure into the main part
        const string docXml =
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
<w:body><w:p><w:r><w:t>Generated from Holgers Blog:</w:t></w:r></w:p></w:body>
</w:document>";

        using (Stream stream = mainPart.GetStream())
        {
            byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
            stream.Write(buf, 0, buf.Length);
        }
    }
}
Secondly) Add a Bookmark in the Document
/// <summary>
/// Inserts a bookmark in the document.
/// </summary>
/// <param name="document">The document.</param>
/// <param name="bookmarkName">Name of the bookmark.</param>
public static void InsertBookmark(string document, string bookmarkName)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        using (Stream stream = wordDoc.MainDocumentPart.GetStream())
        {
            // create an XmlDocument from the XML in the part stream
            XmlDocument xmlDocument = new XmlDocument();
            xmlDocument.LoadXml(new StreamReader(stream).ReadToEnd());

            // find all paragraph nodes; the bookmark goes after the last one
            XmlNodeList nodes = FindNodes(xmlDocument, "/w:document/w:body/w:p");
            if (nodes.Count > 0)
            {
                string bookmarkID = Guid.NewGuid().ToString();

                // create the bookmark string
                string bookmark = string.Format("<w:bookmarkStart w:id=\"{0}\" w:name=\"{1}\"/><w:bookmarkEnd w:id=\"{2}\" />", bookmarkID, bookmarkName, bookmarkID);

                // add the bookmark after the last paragraph
                nodes[nodes.Count - 1].CreateNavigator().InsertAfter(bookmark);

                // rewind the stream and fill it with the new content
                byte[] buf = (new UTF8Encoding()).GetBytes(xmlDocument.OuterXml);
                stream.Seek(0, 0);
                stream.Write(buf, 0, buf.Length);
            }
        }
    }
}


/// <summary>
/// Finds nodes in the XML document.
/// Extracted into its own method because of the many namespaces.
/// </summary>
/// <param name="xmlDocument">The XML document.</param>
/// <param name="xPathExpression">The XPath expression.</param>
/// <returns>The nodes matching the XPath expression.</returns>
public static XmlNodeList FindNodes(XmlDocument xmlDocument, string xPathExpression)
{
    // create the namespace manager and add some namespaces
    XmlNamespaceManager namespaceManager = new XmlNamespaceManager(xmlDocument.NameTable);
    namespaceManager.AddNamespace("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
    namespaceManager.AddNamespace("tns", "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties");
    namespaceManager.AddNamespace("dcmitype", "http://purl.org/dc/dcmitype/");
    namespaceManager.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
    namespaceManager.AddNamespace("cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties");
    namespaceManager.AddNamespace("ds", "http://schemas.openxmlformats.org/officeDocument/2006/customXml");
    namespaceManager.AddNamespace("vt", "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes");
    namespaceManager.AddNamespace("v", "urn:schemas-microsoft-com:vml");
    namespaceManager.AddNamespace("w10", "urn:schemas-microsoft-com:office:word");
    namespaceManager.AddNamespace("wne", "http://schemas.microsoft.com/office/word/2006/wordml");
    namespaceManager.AddNamespace("b", "http://schemas.openxmlformats.org/officeDocument/2006/bibliography");
    namespaceManager.AddNamespace("sl", "http://schemas.openxmlformats.org/schemaLibrary/2006/main");
    namespaceManager.AddNamespace("m", "http://schemas.openxmlformats.org/officeDocument/2006/math");
    namespaceManager.AddNamespace("o", "urn:schemas-microsoft-com:office:office");
    namespaceManager.AddNamespace("dcterms", "http://purl.org/dc/terms/");
    namespaceManager.AddNamespace("a", "http://schemas.openxmlformats.org/drawingml/2006/main");
    namespaceManager.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");
    namespaceManager.AddNamespace("wp", "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing");
    namespaceManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
    namespaceManager.AddNamespace("ve", "http://schemas.openxmlformats.org/markup-compatibility/2006");
    namespaceManager.AddNamespace("pkg", "http://schemas.microsoft.com/office/2006/xmlPackage");

    return xmlDocument.SelectNodes(xPathExpression, namespaceManager);
}
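A side note on the namespace handling: an XPath query only needs the prefixes it actually uses, and a prefix in the query is matched by its namespace URI, not by the prefix used in the document. The following stand-alone sketch shows this with a tiny in-memory document. The class name and the sample XML are made up for this example; the sample deliberately uses the prefix "x" in the document and "w" in the query.

```csharp
using System;
using System.Xml;

public static class XPathNamespaceDemo
{
    public const string WordMl =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts the paragraphs below the body, registering only the one prefix the query uses.
    public static int CountParagraphs(string documentXml)
    {
        XmlDocument xmlDocument = new XmlDocument();
        xmlDocument.LoadXml(documentXml);

        XmlNamespaceManager namespaceManager = new XmlNamespaceManager(xmlDocument.NameTable);
        namespaceManager.AddNamespace("w", WordMl);

        return xmlDocument.SelectNodes("/w:document/w:body/w:p", namespaceManager).Count;
    }

    public static void Main()
    {
        // The document uses the prefix "x", the query uses "w"; both map to the same URI.
        string xml =
            "<x:document xmlns:x=\"" + WordMl + "\">" +
            "<x:body><x:p/><x:p/></x:body></x:document>";

        Console.WriteLine(CountParagraphs(xml)); // prints 2
    }
}
```

So for the queries used in this post, registering "w" alone would be enough; the long list in FindNodes just makes the helper reusable for other parts.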
Thirdly) Add the xHtml part to the Document and put it in the Bookmark
/// <summary>
/// Adds the XHTML part to the document.
/// see Brian Jones Blog: http://blogs.msdn.com/brian_jones/archive/2006/08/08/692705.aspx
/// </summary>
/// <param name="document">The document.</param>
/// <param name="xHtmlStream">The XHTML stream.</param>
/// <param name="bookmarkName">Name of the bookmark.</param>
public static void AddXHtmlPart(string document, Stream xHtmlStream, string bookmarkName)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;

        string relationID = "myExternalXhtmlID";
        string altChunk = "<w:altChunk r:id=\"" + relationID + "\" />";

        // add the extended part (XHTML)
        ExtendedPart extPart = mainPart.AddExtendedPart("http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk", "application/xhtml+xml", "/AddedXhtml.xhtml", relationID);
        extPart.FeedData(xHtmlStream);

        // create a dictionary of bookmark names / XML snippets
        Dictionary<string, string> xmlSnippetCollection = new Dictionary<string, string>();
        xmlSnippetCollection.Add(bookmarkName, altChunk);

        // and replace the bookmarks
        using (Stream stream = mainPart.GetStream())
        {
            ReplaceBookmarks(stream, xmlSnippetCollection);
        }
    }
}



/// <summary>
/// Replaces the bookmarks found in the XML stream with the XML snippets from the collection.
/// </summary>
/// <param name="stream">The stream.</param>
/// <param name="xmlSnippetCollection">The XML snippet collection.</param>
private static void ReplaceBookmarks(Stream stream, Dictionary<string, string> xmlSnippetCollection)
{
    // create an XmlDocument from the XML in the passed stream
    XmlDocument xmlDocument = new XmlDocument();
    xmlDocument.LoadXml(new StreamReader(stream).ReadToEnd());

    // find all bookmarks
    XmlNodeList selectedNodes = FindNodes(xmlDocument, "/w:document/w:body//w:bookmarkStart");

    if (selectedNodes.Count > 0)
    {
        // add the r: namespace if it does not exist yet; it is necessary for the chunk
        if (xmlDocument.DocumentElement.Attributes["xmlns:r"] == null)
        {
            XmlAttribute namespaceAttribute = xmlDocument.CreateAttribute("xmlns", "r", "http://www.w3.org/2000/xmlns/");
            namespaceAttribute.Value = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
            xmlDocument.DocumentElement.Attributes.Append(namespaceAttribute);
        }

        foreach (XmlNode selectedNode in selectedNodes)
        {
            // look the name up by attribute name instead of relying on the attribute order
            XmlAttribute nameAttribute = selectedNode.Attributes["w:name"];
            if (nameAttribute != null && xmlSnippetCollection.ContainsKey(nameAttribute.Value))
            {
                // insert the references after the bookmarks
                // (after the paragraph, otherwise the document produces errors)
                if (selectedNode.ParentNode != null && selectedNode.ParentNode.Name == "w:p")
                    selectedNode.ParentNode.CreateNavigator().InsertAfter(xmlSnippetCollection[nameAttribute.Value]);
                else
                    selectedNode.CreateNavigator().InsertAfter(xmlSnippetCollection[nameAttribute.Value]);
            }
        }

        // rewind the stream and fill it with the new content
        byte[] buf = (new UTF8Encoding()).GetBytes(xmlDocument.OuterXml);
        stream.Seek(0, 0);
        stream.Write(buf, 0, buf.Length);
    }
}
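One thing to keep in mind about the "seek to the start and write" pattern used in both methods: it only overwrites bytes, it does not shorten the stream. In this post the XML always grows, so nothing is left over, but if the new content were shorter than the old one, stale bytes would remain at the end. The sketch below demonstrates the effect with a plain MemoryStream. The class and method names are made up for this example, and it is an assumption on my side that the part stream returned by the SDK supports SetLength in the same way.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamRewrite
{
    // Overwrites the stream from the start and optionally cuts off what is left of the old content.
    // Returns the full stream content afterwards.
    public static string Overwrite(Stream stream, string newContent, bool truncate)
    {
        byte[] buf = new UTF8Encoding(false).GetBytes(newContent);
        stream.Seek(0, SeekOrigin.Begin);
        stream.Write(buf, 0, buf.Length);
        if (truncate)
        {
            stream.SetLength(buf.Length);
        }

        // Read everything back; the reader is not disposed so the stream stays open.
        stream.Seek(0, SeekOrigin.Begin);
        return new StreamReader(stream, Encoding.UTF8).ReadToEnd();
    }

    public static void Main()
    {
        MemoryStream stream = new MemoryStream();
        byte[] original = Encoding.UTF8.GetBytes("<a>long content</a>");
        stream.Write(original, 0, original.Length);

        // Without truncation the tail of the old content is still there.
        Console.WriteLine(Overwrite(stream, "<b/>", false));

        // With truncation only the new content remains.
        Console.WriteLine(Overwrite(stream, "<b/>", true));
    }
}
```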
Fourthly) Put everything together
protected void btnDoTheMagic_Click(object sender, EventArgs e)
{
    // define the filename
    string fileName = Path.Combine(Server.MapPath(""), "Generated.docx");

    // generate a new document
    CreateNewWordDocument(fileName);

    // add a bookmark in the document
    InsertBookmark(fileName, "AddXHtmlHere");

    // get the stream of the website and add it to the document;
    // the using blocks release the response and its stream afterwards
    using (WebResponse response = WebRequest.Create(TextBox1.Text).GetResponse())
    using (Stream stream = response.GetResponseStream())
    {
        AddXHtmlPart(fileName, stream, "AddXHtmlHere");
    }

    // Force this content to be downloaded
    // as a Word document with the name of your choice.
    // The file is a .docx package, so use the Open XML content type and extension.
    Response.AppendHeader("Content-Type", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    Response.AppendHeader("Content-disposition", "attachment; filename=myword.docx");

    Response.WriteFile(fileName);
    Response.End();
}
At the end it looks like this. OK, it does not look exactly like the original page, but it is acceptable ... or not?
[1] http://openxmldeveloper.com/
[2] Open XML SDK Documentation
[3] Using XHTML in a WordprocessingML document
[4] download OpenXML SDK
[5] WordProgramming.zip