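Recently I needed to batch-convert PDFs to TXT. Dedicated conversion software gave poor results, but then I remembered that Word 2013 can open PDFs. I tried it, and the results were quite good.
Word can then save the opened document as TXT.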
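So how do you automate this? Word exposes a COM interface, so Python can drive it directly through the Windows COM API. It is best to use `SaveAs2`, which is the save method in the Word 2010 and 2013 object model. Earlier versions only have `SaveAs`.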
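Because `SaveAs2` only exists from Word 2010 (internal version 14.0) onward, a script that may also run against older Word installations can choose the method at run time. This is a minimal sketch, not the author's code. The helper names `pick_save_method` and `save_as` are hypothetical, and the check assumes `Application.Version` returns strings such as `"14.0"` or `"15.0"`.

```python
def pick_save_method(version: str) -> str:
    """Return the name of the Document save method for a Word Application.Version string."""
    # Application.Version looks like "15.0"; keep only the major number before the dot.
    major = int(version.split(".")[0])
    # Word 2010 is 14.0 and Word 2013 is 15.0; both (and later versions) expose SaveAs2.
    return "SaveAs2" if major >= 14 else "SaveAs"


def save_as(word_app, doc, path: str, file_format: int) -> None:
    """Save a COM Document object with whichever method this Word version provides."""
    method = getattr(doc, pick_save_method(word_app.Version))
    # FileName and FileFormat are the first two positional parameters of both methods.
    method(path, file_format)
```

Passing the arguments positionally keeps the call identical for `SaveAs` and `SaveAs2`, since both start with `FileName, FileFormat`.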
The code is as follows:
```python
# -*- coding: utf-8 -*-
```
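Only the encoding header of the original listing survives, so here is a minimal sketch of the batch conversion described above. It is not the author's original code. It assumes Windows, Word 2013 or later, and pywin32 (`win32com.client`); `convert_pdfs_to_txt` and `txt_path_for` are hypothetical names. `FileFormat=2` is `wdFormatText` from the appendix table, and `0` passed to `Close` is `wdDoNotSaveChanges`. Plain `wdFormatText` output typically uses the system ANSI code page; `SaveAs2` also accepts an `Encoding` argument (65001 is `msoEncodingUTF8`) if UTF-8 is needed, which this sketch does not use.

```python
import glob
import os

WD_FORMAT_TEXT = 2          # wdFormatText: plain Windows text
WD_DO_NOT_SAVE_CHANGES = 0  # wdDoNotSaveChanges, passed to Document.Close
WD_ALERTS_NONE = 0          # wdAlertsNone, for Application.DisplayAlerts


def txt_path_for(pdf_path: str) -> str:
    """Map .../name.pdf to an absolute .../name.txt in the same folder."""
    root, _ext = os.path.splitext(os.path.abspath(pdf_path))
    return root + ".txt"


def convert_pdfs_to_txt(folder: str) -> list:
    """Open every PDF in `folder` with Word, save it as .txt, and return the paths written."""
    # Imported here so the module (and txt_path_for) can be loaded without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    word.DisplayAlerts = WD_ALERTS_NONE  # suppress most Word dialogs
    written = []
    try:
        for pdf in sorted(glob.glob(os.path.join(folder, "*.pdf"))):
            # Word expects absolute paths. ConfirmConversions=False skips the
            # "Convert File" prompt; Word 2013 may still show its one-time
            # PDF conversion notice on the first run.
            doc = word.Documents.Open(os.path.abspath(pdf),
                                      ConfirmConversions=False, ReadOnly=True)
            try:
                out = txt_path_for(pdf)
                doc.SaveAs2(out, FileFormat=WD_FORMAT_TEXT)
                written.append(out)
            finally:
                # Close without writing anything back to the source PDF.
                doc.Close(WD_DO_NOT_SAVE_CHANGES)
    finally:
        # Always quit, so no orphaned WINWORD.EXE is left if a file raises an error.
        word.Quit()
    return written
```

On Windows, `convert_pdfs_to_txt(r"C:\pdfs")` would write one `.txt` next to each PDF and return their paths. Reusing a single Word instance for the whole batch avoids the cost of starting Word once per file.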
Appendix
Other save formats
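To save in another format, only the `FileFormat` argument of `SaveAs2` needs to change. For example, I used `FileFormat=2` for TXT; for HTML, use 10 (filtered HTML) or 8 (standard HTML). The values come from Word's `WdSaveFormat` enumeration: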
| Name | Value | Description |
|---|---|---|
| wdFormatDocument | 0 | Microsoft Office Word 97 – 2003 binary file format. |
| wdFormatDOSText | 4 | Microsoft DOS text format. |
| wdFormatDOSTextLineBreaks | 5 | Microsoft DOS text with line breaks preserved. |
| wdFormatEncodedText | 7 | Encoded text format. |
| wdFormatFilteredHTML | 10 | Filtered HTML format. |
| wdFormatFlatXML | 19 | Open XML file format saved as a single XML file. |
| wdFormatFlatXMLMacroEnabled | 20 | Open XML file format with macros enabled saved as a single XML file. |
| wdFormatFlatXMLTemplate | 21 | Open XML template format saved as a single XML file. |
| wdFormatFlatXMLTemplateMacroEnabled | 22 | Open XML template format with macros enabled saved as a single XML file. |
| wdFormatOpenDocumentText | 23 | OpenDocument Text format. |
| wdFormatHTML | 8 | Standard HTML format. |
| wdFormatRTF | 6 | Rich text format (RTF). |
| wdFormatStrictOpenXMLDocument | 24 | Strict Open XML document format. |
| wdFormatTemplate | 1 | Word template format. |
| wdFormatText | 2 | Microsoft Windows text format. |
| wdFormatTextLineBreaks | 3 | Windows text format with line breaks preserved. |
| wdFormatUnicodeText | 7 | Unicode text format. |
| wdFormatWebArchive | 9 | Web archive format. |
| wdFormatXML | 11 | Extensible Markup Language (XML) format. |
| wdFormatDocument97 | 0 | Microsoft Word 97 document format. |
| wdFormatDocumentDefault | 16 | Word default document file format. For Word 2010, this is the DOCX format. |
| wdFormatPDF | 17 | PDF format. |
| wdFormatTemplate97 | 1 | Word 97 template format. |
| wdFormatXMLDocument | 12 | XML document format. |
| wdFormatXMLDocumentMacroEnabled | 13 | XML document format with macros enabled. |
| wdFormatXMLTemplate | 14 | XML template format. |
| wdFormatXMLTemplateMacroEnabled | 15 | XML template format with macros enabled. |
| wdFormatXPS | 18 | XPS format. |
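To reuse the conversion for other targets, the `FileFormat` value can be derived from the output file's extension. This is a minimal sketch; `FORMAT_BY_EXTENSION` and `file_format_for` are hypothetical names, and which extension maps to which constant (for instance `.html` to filtered HTML rather than standard HTML) is a choice made here, not something Word enforces.

```python
import os

# WdSaveFormat values from the table above, keyed by a typical file extension.
FORMAT_BY_EXTENSION = {
    ".doc": 0,    # wdFormatDocument
    ".txt": 2,    # wdFormatText
    ".rtf": 6,    # wdFormatRTF
    ".mht": 9,    # wdFormatWebArchive
    ".htm": 10,   # wdFormatFilteredHTML
    ".html": 10,  # wdFormatFilteredHTML
    ".docx": 16,  # wdFormatDocumentDefault
    ".pdf": 17,   # wdFormatPDF
    ".xps": 18,   # wdFormatXPS
    ".odt": 23,   # wdFormatOpenDocumentText
}


def file_format_for(path: str) -> int:
    """Return the WdSaveFormat value to pass as SaveAs2(..., FileFormat=...) for `path`."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return FORMAT_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no WdSaveFormat mapped for extension {ext!r}") from None
```

For example, `doc.SaveAs2(out_path, FileFormat=file_format_for(out_path))` would save `report.html` as filtered HTML and `report.txt` as plain text.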