Using a GUI front end and a back end written in Perl, Damaged DOCX2TXT extracts the text from damaged or corrupted Word 2007 docx files when Word 2007 itself fails to salvage it.
Word 2007 files are really zipped collections of mostly XML files, and XML is not tolerant of file corruption. The text of a Word 2007 document is stored in the document.xml file within the zipped collection. Judging from the errors it generates, Word 2007 appears to use both an unzipper and an XML reading algorithm that are insufficiently tolerant of corruption when salvaging text from that XML file in corrupt docx files. Damaged DOCX2TXT, on the other hand, uses a more corruption-tolerant unzipper and a corruption-tolerant XML reading algorithm, succeeding where MS Word fails.
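The general idea can be illustrated with a minimal Python sketch. This is not Damaged DOCX2TXT's actual Perl code, whose internals are not described here; it only shows one plausible way to be tolerant at both stages. It assumes the text lives at the standard word/document.xml path inside the archive, that the entry is stored or deflated, and that its local file header is still intact. The function names read_document_xml, _inflate_from_local_header and extract_text are hypothetical.

```python
import html
import io
import re
import struct
import zipfile
import zlib

NAME = b"word/document.xml"  # standard location of the body text in a docx


def read_document_xml(data: bytes) -> str:
    """Return word/document.xml as text, even if the archive is damaged."""
    try:
        # Normal path: relies on an intact central directory and CRC.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            raw = zf.read(NAME.decode())
    except (zipfile.BadZipFile, KeyError, zlib.error, EOFError, OSError):
        # Tolerant path: ignore the central directory entirely.
        raw = _inflate_from_local_header(data)
    return raw.decode("utf-8", errors="replace")


def _inflate_from_local_header(data: bytes) -> bytes:
    """Locate the entry's local file header and inflate as much as possible."""
    pos = data.find(NAME)
    while pos != -1:
        hdr = pos - 30  # the file name starts 30 bytes into a local header
        if hdr >= 0 and data[hdr:hdr + 4] == b"PK\x03\x04":
            (method,) = struct.unpack_from("<H", data, hdr + 8)
            (size,) = struct.unpack_from("<I", data, hdr + 18)
            name_len, extra_len = struct.unpack_from("<HH", data, hdr + 26)
            start = hdr + 30 + name_len + extra_len
            if method == 0:  # stored without compression
                return data[start:start + size] if size else data[start:]
            inflater = zlib.decompressobj(-15)  # raw deflate, no zlib header
            out = b""
            # Feed small chunks so a bad region only costs the current chunk.
            for i in range(start, len(data), 1024):
                try:
                    out += inflater.decompress(data[i:i + 1024])
                except zlib.error:
                    break
                if inflater.eof:
                    break
            return out
        pos = data.find(NAME, pos + 1)
    return b""


def extract_text(xml: str) -> str:
    """Pull text runs out of the XML with regexes instead of a strict parser."""
    paragraphs = []
    for chunk in xml.split("</w:p>"):  # one chunk per paragraph
        # Capture text after each <w:t> tag, even if its closing tag is lost.
        runs = re.findall(r"<w:t(?:\s[^>]*)?>([^<]*)", chunk)
        text = html.unescape("".join(runs))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Calling extract_text(read_document_xml(data)) on the bytes of a docx file returns whatever paragraphs could be recovered. A strict XML parser would reject the whole file at the first malformed byte, whereas the pattern matching above simply keeps every text run it can still recognise.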
Damaged DOCX2TXT can also be used simply as a viewer for undamaged Word DOCX files, without having Word 2007 or 2010 installed (or an earlier version of Word with the Compatibility Pack). It also works as a text editor for the extracted docx text.
.Net Version 2