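Recently I needed to batch-convert PDFs to txt. Dedicated conversion software gave poor results. Then I remembered that Word 2013 can open PDFs, so I tried it, and the output was quite good.
Word can then save the opened document as txt.
The question is how to automate this. Word exposes a COM interface, so Python can drive it directly through the Windows API. For saving, SaveAs2 is the method to use: it is the API in Word 2010 and 2013, while earlier versions only have SaveAs.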
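Because SaveAs2 only exists from Word 2010 on, a script that may meet older Word installs could pick the method by version. The sketch below assumes a Word `Document` object obtained through COM (for example via pywin32). It relies on `Application.Version` returning a string such as "14.0" (Word 2010) or "15.0" (Word 2013). The helper names `pick_save_method` and `save_document` are hypothetical.

```python
def pick_save_method(version):
    """Return "SaveAs2" for Word 2010 (version 14.0) and later, "SaveAs" for older versions."""
    major = float(version)  # Application.Version is a string such as "15.0"
    return "SaveAs2" if major >= 14.0 else "SaveAs"


def save_document(doc, path, file_format):
    """Save an open Word document with whichever save method this Word version provides."""
    # doc.Application is the Word Application object that owns the document
    method = getattr(doc, pick_save_method(doc.Application.Version))
    method(path, FileFormat=file_format)
```

Note that PDF opening itself requires Word 2013 or later. The fallback to SaveAs therefore only matters when the same script also saves ordinary documents on older Word versions.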
The code is as follows:
```python
# -*- coding: utf-8 -*-
```
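Here is a minimal sketch of the batch conversion. It assumes Windows, Word 2013 or later, and the pywin32 package (`win32com.client`). The names `txt_path_for` and `convert_folder` are hypothetical. Passing `Encoding=65001` (msoEncodingUTF8) to SaveAs2 is an added assumption so that Chinese text survives. Without it, Word writes the text file in the system code page.

```python
# -*- coding: utf-8 -*-
# Sketch: batch-convert PDFs to .txt by driving Word through COM.
# Assumes Windows, Word 2013+, and pywin32 (win32com) installed.
import os

WD_FORMAT_TEXT = 2        # wdFormatText: Microsoft Windows text format
MSO_ENCODING_UTF8 = 65001  # assumption: save as UTF-8 instead of the system code page


def txt_path_for(pdf_path):
    """Return the .txt path that sits next to the given .pdf file."""
    root, _ = os.path.splitext(pdf_path)
    return root + ".txt"


def convert_folder(folder):
    """Open every PDF in `folder` with Word and save it as .txt alongside it."""
    import win32com.client  # imported here so txt_path_for works without pywin32

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    word.DisplayAlerts = 0  # wdAlertsNone: try to suppress prompts while converting
    try:
        for name in os.listdir(folder):
            if not name.lower().endswith(".pdf"):
                continue
            # Word needs absolute paths when driven through COM
            pdf_path = os.path.abspath(os.path.join(folder, name))
            # ConfirmConversions=False skips the "convert this file?" dialog
            doc = word.Documents.Open(pdf_path, ConfirmConversions=False, ReadOnly=True)
            try:
                doc.SaveAs2(txt_path_for(pdf_path),
                            FileFormat=WD_FORMAT_TEXT,
                            Encoding=MSO_ENCODING_UTF8)
            finally:
                doc.Close(SaveChanges=False)  # discard the converted in-memory document
    finally:
        word.Quit()
```

Call it as `convert_folder(r"C:\pdfs")`. Each `report.pdf` gets a `report.txt` next to it. Word is started once and reused for every file rather than relaunched per document.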
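Appendix
Saving in other file formats
Only the FileFormat argument of SaveAs2 needs to change. For example, I save as TXT with FileFormat=2. For HTML the value is 10 (filtered HTML); standard HTML is 8.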
Name | Value | Description |
---|---|---|
wdFormatDocument | 0 | Microsoft Office Word 97 – 2003 binary file format. |
wdFormatDOSText | 4 | Microsoft DOS text format. |
wdFormatDOSTextLineBreaks | 5 | Microsoft DOS text with line breaks preserved. |
wdFormatEncodedText | 7 | Encoded text format. |
wdFormatFilteredHTML | 10 | Filtered HTML format. |
wdFormatFlatXML | 19 | Open XML file format saved as a single XML file. |
wdFormatFlatXMLMacroEnabled | 20 | Open XML file format with macros enabled saved as a single XML file. |
wdFormatFlatXMLTemplate | 21 | Open XML template format saved as a single XML file. |
wdFormatFlatXMLTemplateMacroEnabled | 22 | Open XML template format with macros enabled saved as a single XML file. |
wdFormatOpenDocumentText | 23 | OpenDocument Text format. |
wdFormatHTML | 8 | Standard HTML format. |
wdFormatRTF | 6 | Rich text format (RTF). |
wdFormatStrictOpenXMLDocument | 24 | Strict Open XML document format. |
wdFormatTemplate | 1 | Word template format. |
wdFormatText | 2 | Microsoft Windows text format. |
wdFormatTextLineBreaks | 3 | Windows text format with line breaks preserved. |
wdFormatUnicodeText | 7 | Unicode text format. |
wdFormatWebArchive | 9 | Web archive format. |
wdFormatXML | 11 | Extensible Markup Language (XML) format. |
wdFormatDocument97 | 0 | Microsoft Word 97 document format. |
wdFormatDocumentDefault | 16 | Word default document file format. For Word 2010, this is the DOCX format. |
wdFormatPDF | 17 | PDF format. |
wdFormatTemplate97 | 1 | Word 97 template format. |
wdFormatXMLDocument | 12 | XML document format. |
wdFormatXMLDocumentMacroEnabled | 13 | XML document format with macros enabled. |
wdFormatXMLTemplate | 14 | XML template format. |
wdFormatXMLTemplateMacroEnabled | 15 | XML template format with macros enabled. |
wdFormatXPS | 18 | XPS format. |
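To reuse one script for several output types, the FileFormat values from the table above can be kept in a small lookup. The sketch below is hypothetical: the names `SAVE_FORMATS`, `target_for`, and `save_as` and the chosen extensions are illustrative. It assumes `doc` is an already-open Word Document obtained through COM.

```python
# Hypothetical lookup: a few WdSaveFormat values from the table above,
# paired with the file extension used for each output type.
SAVE_FORMATS = {
    "txt": (2, ".txt"),     # wdFormatText
    "html": (10, ".html"),  # wdFormatFilteredHTML
    "rtf": (6, ".rtf"),     # wdFormatRTF
    "docx": (16, ".docx"),  # wdFormatDocumentDefault
    "pdf": (17, ".pdf"),    # wdFormatPDF
}


def target_for(base_path, kind):
    """Return (output path, FileFormat value) for a format name such as "txt"."""
    file_format, ext = SAVE_FORMATS[kind]
    return base_path + ext, file_format


def save_as(doc, base_path, kind):
    """Save an already-open Word document in another format; only FileFormat changes."""
    path, file_format = target_for(base_path, kind)
    doc.SaveAs2(path, FileFormat=file_format)
    return path
```

Adding a format means adding one dictionary entry. The SaveAs2 call itself stays the same.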