Extracting attachments from a PDF with Python

Download PDFtk Server: https://www.pdflabs.com/tools/pdftk-server/

If the PDF is password-protected, first convert it to an unprotected PDF:

pdftk protected.pdf input_pw PASSWORD output unprotected.pdf

If the PDF has no password, skip the previous step.
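The decryption step can be sketched in Python as below. This is a minimal sketch, not the article's final code: the helper names `build_decrypt_command` and `decrypt_if_needed` are hypothetical, and it assumes `pdftk` can be found by that name. The `run` parameter exists only so the function can be exercised without PDFtk installed.

```python
import subprocess


def build_decrypt_command(src_pdf, password, dst_pdf, pdftk="pdftk"):
    """Return the pdftk argument list that writes an unprotected copy of src_pdf."""
    return [pdftk, src_pdf, "input_pw", password, "output", dst_pdf]


def decrypt_if_needed(src_pdf, password, dst_pdf, run=subprocess.run):
    """Decrypt only when a password is given; return the PDF path to use next."""
    if not password:
        # No password: the decryption step is skipped entirely.
        return src_pdf
    # check=True raises CalledProcessError if pdftk exits with an error.
    run(build_decrypt_command(src_pdf, password, dst_pdf), check=True)
    return dst_pdf
```

Passing the command as a list rather than a single string keeps file names that contain spaces intact.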

Extract the attachments (the PDF must not be password-protected):

pdftk unprotected.pdf unpack_files output output_directory
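A possible Python wrapper for the extraction step, which also reports which files appeared in the output directory. This is a sketch under assumptions: `build_unpack_command` and `unpack_attachments` are hypothetical names, `pdftk` is assumed to be callable by that name, and the `run` parameter is there so the logic can be tried without PDFtk installed.

```python
import os
import subprocess


def build_unpack_command(pdf, out_dir, pdftk="pdftk"):
    """Return the pdftk argument list that unpacks attachments into out_dir."""
    return [pdftk, pdf, "unpack_files", "output", out_dir]


def unpack_attachments(pdf, out_dir, run=subprocess.run):
    """Extract attachments and return the names of files newly added to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    before = set(os.listdir(out_dir))
    # check=True raises CalledProcessError if pdftk exits with an error.
    run(build_unpack_command(pdf, out_dir), check=True)
    return sorted(set(os.listdir(out_dir)) - before)
```

An empty list back from `unpack_attachments` means no new file was written, for example because the PDF has no attachments.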

If Python reports that the command does not exist when it runs the command,

add os.chdir(<pdftk bin directory>) before running it.
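As an alternative to changing the working directory, the executable can be located explicitly and called by its full path. This is a sketch of that alternative, not the approach described above; `find_pdftk` is a hypothetical helper, and it relies only on the standard library's `shutil.which`.

```python
import shutil


def find_pdftk(pdftk_bin_folder=None):
    """Return a path to the pdftk executable, or None if it cannot be found."""
    if pdftk_bin_folder:
        # Look in the explicitly given install folder first.
        found = shutil.which("pdftk", path=pdftk_bin_folder)
        if found:
            return found
    # Fall back to the directories listed in the PATH environment variable.
    return shutil.which("pdftk")
```

The returned path can be used as the first element of the command list, and a `None` result gives a clear place to report that PDFtk Server is not installed.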

 

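Full code:

import os
import subprocess


def get_attachment(pdf_path, psd, pdftk_bin_folder):
    # Resolve paths before changing directory so relative inputs still work.
    pdf_path = os.path.abspath(pdf_path)
    pdf_folder_path = os.path.dirname(pdf_path)
    # Run from the pdftk bin directory so the "pdftk" command is found.
    os.chdir(pdftk_bin_folder)
    source_pdf = pdf_path
    if psd:
        # unpack_files needs an unprotected PDF, so decrypt to a temporary copy first.
        source_pdf = os.path.join(pdf_folder_path, "temp.pdf")
        subprocess.run(["pdftk", pdf_path, "input_pw", psd, "output", source_pdf], check=True)
    # Argument lists keep paths with spaces intact; check=True raises if pdftk fails.
    subprocess.run(["pdftk", source_pdf, "unpack_files", "output", pdf_folder_path], check=True)


if __name__ == '__main__':
    # pdf_path = r"C:\Users\86173\Desktop\test\word\2-protected.pdf"
    # psd = "dfcver"
    pdf_path = r"C:\Users\86173\Desktop\test\word\无密码1.pdf"
    psd = ""
    pdftk_bin_folder = r"C:\Program Files (x86)\PDFtk Server\bin"
    try:
        get_attachment(pdf_path, psd, pdftk_bin_folder)
        print("Extraction succeeded")
    except Exception as e:
        print("Extraction failed")
        print(e)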
