基于openoffice实现html、doc互换

合集下载

Java实现在线预览的示例代码（openOffice实现）

Java实现在线预览的⽰例代码（openOffice实现）简介之前有写了poi实现在线预览的⽂章，⾥⾯也说到了使⽤openOffice也可以做到，这⾥就详细介绍⼀下。

我的实现逻辑有两种：⼀、利⽤jodconverter(基于OpenOffice服务)将⽂件(.doc、.docx、.xls、.ppt)转化为html格式。

⼆、利⽤jodconverter(基于OpenOffice服务)将⽂件(.doc、.docx、.xls、.ppt)转化为pdf格式。

转换成html格式⼤家都能理解，这样就可以直接在浏览器上查看了，也就实现了在线预览的功能；转换成pdf格式这点，需要⽤户安装了Adobe Reader XI，这样你会发现把pdf直接拖到浏览器页⾯可以直接打开预览，这样也就实现了在线预览的功能。

将⽂件转化为html格式或者pdf格式话不多说，直接上代码。

package com.pdfPreview.util;import java.io.File;import java.io.FileInputStream;import java.io.FileOutputStream;import java.io.IOException;import java.io.InputStream;import java.io.OutputStream;import .ConnectException;import java.text.SimpleDateFormat;import java.util.Date;import com.artofsolving.jodconverter.DocumentConverter;import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;/*** 利⽤jodconverter(基于OpenOffice服务)将⽂件(*.doc、*.docx、*.xls、*.ppt)转化为html格式或者pdf格式，* 使⽤前请检查OpenOffice服务是否已经开启, OpenOffice进程名称：soffice.exe | soffice.bin** @author yjclsx*/public class Doc2HtmlUtil {private static Doc2HtmlUtil doc2HtmlUtil;/*** 获取Doc2HtmlUtil实例*/public static synchronized Doc2HtmlUtil getDoc2HtmlUtilInstance() {if (doc2HtmlUtil == null) {doc2HtmlUtil = new Doc2HtmlUtil();}return doc2HtmlUtil;}/*** 转换⽂件成html** @param fromFileInputStream:* @throws IOException*/public String file2Html(InputStream fromFileInputStream, String toFilePath,String type) throws IOException {Date date = new Date();SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");String timesuffix = sdf.format(date);String docFileName = null;String htmFileName = null;if("doc".equals(type)){docFileName = "doc_" + timesuffix + ".doc";htmFileName = "doc_" + timesuffix + ".html";}else if("docx".equals(type)){docFileName = "docx_" + timesuffix + ".docx";htmFileName = "docx_" + timesuffix + ".html";}else if("xls".equals(type)){docFileName = "xls_" + timesuffix + ".xls";htmFileName = "xls_" + timesuffix + ".html";}else if("ppt".equals(type)){docFileName = "ppt_" + timesuffix + ".ppt";htmFileName = "ppt_" + timesuffix + ".html";}else{return null;}File htmlOutputFile = new File(toFilePath + File.separatorChar + htmFileName);File docInputFile = new File(toFilePath + File.separatorChar + docFileName);if (htmlOutputFile.exists())htmlOutputFile.delete();htmlOutputFile.createNewFile();if (docInputFile.exists())docInputFile.delete();docInputFile.createNewFile();/*** 由fromFileInputStream构建输⼊⽂件*/try {OutputStream os = new FileOutputStream(docInputFile);int bytesRead = 0;byte[] buffer = new byte[1024 * 8];while ((bytesRead = fromFileInputStream.read(buffer)) != -1) {os.write(buffer, 0, bytesRead);}os.close();fromFileInputStream.close();} catch (IOException e) {}OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);try {connection.connect();} catch (ConnectException e) {System.err.println("⽂件转换出错，请检查OpenOffice服务是否启动。

PHP实现wordexcelppt转换为PDF

PHP实现wordexcelppt转换为PDF 前段时间负责公司内部⽂件平台的设计，其中有⼀个需求是要能够在线浏览⽤户上传的 office ⽂件。

我的思路是先将 office 转换成 PDF，再通过 pdf.js 插件解析 PDF ⽂件，使其能在任何浏览器下查看。

可以通过 PHP 的 COM 组件，调⽤其它能够处理 office ⽂件的应⽤程序，利⽤提供的接⼝来转换 PDF ⽂件。

OpenOfficeOpenOffice 是⼀套开源跨平台的办公软件，由许多⾃由软件⼈⼠共同来维持，让⼤家能在 Microsoft Office 之外，还能有免费的 Office 可以使⽤。

OpenOffice 与微软的办公软件套件兼容，能将 doc、xls、ppt 等⽂件转换为 PDF 格式，其功能绝对不⽐ Microsoft Office 差。

OpenOffice 需要 java ⽀持，请确认安装了 JDK，并配置了 JRE 环境变量。

1. 配置组件服务OpenOffice 安装完成之后，按 win+R 快捷键进⼊运⾏菜单，输⼊ Dcomcnfg 打开组件服务。

[组件服务] >> [计算机] >> [我的电脑] >> [DCOM配置] >> [OpenOffice Service Manager]右键打开属性⾯板，选择安全选项卡，分别在启动和激活权限和访问权限上勾选⾃定义，添加 Everyone 的权限。

↑启动和激活权限和访问权限都使⽤⾃定义配置↑添加 Everyone ⽤户组，记得确认前先检查名称↑两个⾃定义配置相同，允许 Everyone 拥有所有权限再选择标识选项卡，勾选交互式⽤户，保存设置后退出。

2. 后台运⾏软件安装完 OpenOffice 后，需要启动⼀次确认软件可以正常运⾏，然后再打开命令⾏运⾏以下命令：切换到安装⽬录：后台运⾏该软件：PS：该命令只需要执⾏⼀次，就可以使软件⼀直在后台运⾏，即使重启服务器也不受影响。

officetohtml实现方式

officetohtml实现方式1. 引言1.1 概述officetohtml是一种将Office文档（如Word、Excel或PowerPoint）转换为HTML格式的工具或技术。

它可以帮助用户将传统的Office文档转化为网页友好的格式，使其在各种平台和设备上都可以方便地访问和阅读。

1.2 文章结构本文将详细介绍officetohtml实现方式，并探讨各种可能的方法。

首先，我们会对officetohtml进行简要概述，了解其基本原理和用途。

接下来，我们将重点介绍两种常见的实现方式，并比较它们的优缺点。

最后，我们会总结文章内容并对officetohtml实现方式进行评价和展望。

1.3 目的本文旨在帮助读者更好地理解officetohtml实现方式，并为他们选择适合自己需求的方法提供指导和参考。

通过详细介绍和比较不同的实现方式，读者可以更全面地了解这项技术，并能够根据自身情况做出明智的决策。

以上是“1. 引言”部分内容，请根据需要进行修改和完善。

2. officetohtml实现方式2.1 什么是officetohtmlOfficetohtml是一种将Office文档（如Word、Excel和PowerPoint）转换为HTML格式的工具或技术。

它能够将这些文档转换为在Web浏览器中进行展示或共享的可交互的HTML页面。

2.2 实现方式一实现方式一可以使用Microsoft Office提供的内置功能来实现officetohtml转换。

以下是基本步骤：1. 打开相应的Office文档（例如Word文档）。

2. 选择“另存为”选项，并选择HTML格式作为要保存的文件类型。

3. 自定义HTML选项，以便满足特定需求，比如页面布局、样式等。

4. 单击“保存”按钮，将Office文档保存为HTML格式。

使用此方法实现officetohtml转换的优点是简单且易于操作。

同时，由于该功能是由Microsoft Office自身提供的，所以它能够确保高度兼容性和准确性。

Word文档转换为HTML帮助文档操作手册范本

Word文档转换为HTML帮助文档操作手册一、使用到的软件●DOC2CHM●Dreamweaver CS3●Help and manual 4二、操作步骤1. 先建立一个工作目录。

如hhwork。

2.将需要转换的文件复制到此工作目录下。

如果是中文文件名，最好将其改为英文文件名。

例：现在要将《小神探点检定修信息管理系统使用手册0.3.6.doc》转换为Html格式的帮助文档，首先将此文档复制到hhwork目录下并将其更名为manual36.doc。

如图1所示。

图13.打开软件DOC2CHM，然后找到manual36.doc，然后点击“Convert”按钮，如图2所示。

图24. 程序分析文档后，打开如图3所示的界面。

图35. 在图3所示的界面中选择默认的“Outline”，然后点击“Last>>”按钮，打开图4所示的界面。

图46. 在图4所示的界面中点击“Convert”按钮，程序开始将文档Manual36.doc转换为Html文档，并保存在Manual36子目录下。

7. 在子目录下的以Outline开头的文件夹下，将后缀名为jpg的文件名更改一下，目的是每个文件的名称不同。

8. 用Dreamweaver打开此目录中的所有htm文件，如图5。

图59. 在图5所示的界面中将出现在标题前的标签删除掉，然后将标题复制到标题框中。

然后将图片的更改正确。

10. 打开Help and Manual 4，如图6。

图611. 在图6所示的界面中点击“新建”按钮创建新的帮助方案。

如图7所示。

图712. 在图7所示的界面中选择“导入现有的文件从…”，然后选择“常规HTML和文本文件”，在下面的框中指定源文件夹的位置。

然后点击“下一步”。

程序打开图8所示的界面。

图813. 在上图中指定输出文件的位置，可以采用默认位置。

然后点击“下一步”打开图9所示的界面。

图914. 在图9所示的界面中将不需要的文件移除，然后点击“下一步”打开图10所示的界面。

java调用openoffice将office系列文档转换为PDF的示例方法

java调⽤openoffice将office系列⽂档转换为PDF的⽰例⽅法前导：发过程中经常会使⽤java将office系列⽂档转换为PDF，⼀般都使⽤微软提供的openoffice+jodconverter 实现转换⽂档。

openoffice既有windows版本也有linux版。

不⽤担⼼⽣产环境是linux系统。

1、openoffice依赖jar，以maven为例：<dependency><groupId>com.artofsolving</groupId><artifactId>jodconverter</artifactId><version>2.2.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>jurt</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>ridl</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>juh</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>unoil</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.slf4j</groupId><artifactId>slf4j-jdk14</artifactId><version>1.4.3</version></dependency>2、直接上转换代码，需要监听openoffice应⽤程序8100端⼝即可。

jodconverterlocalproperties 说明

jodconverterlocalproperties 说明
jodconverter是一个Java 库，用于将Office 文档（如Microsoft Word、Excel 和PowerPoint）转换为其他格式，如PDF、HTML、纯文本等。

它使用LibreOffice 或OpenOffice 的后台进程来执行转换。

jodconverterlocalproperties可能是指在使用jodconverter时需要配置的一些本地
属性。

这些属性可能包括：
1.Office Home Location: 指定LibreOffice 或OpenOffice 的安装路径。

这
是jodconverter需要知道以便能够启动后台进程并执行文档转换的位置。

2.Port Number: jodconverter启动LibreOffice 或OpenOffice 的后台进程时可能使
用的端口号。

3.Timeout Settings: 指定转换操作的超时时间，以避免转换操作无限制地挂起。

4.Logging Configuration: 配置日志记录，以便跟踪和调试转换过程中的任何问题。

5.Other System Properties: 可能还包括其他与LibreOffice 或OpenOffice 后台进程
交互所需的系统属性。

这些属性通常在配置文件或环境变量中设置，以便jodconverter在执行转换时使用。

具体的属性和配置方法可能因jodconverter的版本和使用的Office 套件的不同而有所变化。

【好文翻译】一步一步教你使用Spire.Doc转换Word文档格式

【好⽂翻译】⼀步⼀步教你使⽤Spire.Doc转换Word⽂档格式背景：本⽂试图证明和审查的格式转换能⼒。

很长的⼀段时间⾥，为了操作⽂档，开发⼈员不得不在服务器上安装Office软件。

⾸先，这是⼀个很糟糕的设计和实践。

第⼆，微软从没打算把Office作为⼀个服务器组件，它也⽤来在服务器端解释和操作⽂档的。

于是乎，产⽣了类似Spire.Doc这样的类库。

当我们讨论这个问题时，值得⼀提的是Office Open Xml. Office Open XML (也有⾮正式地称呼为 OOXML 或OpenXML) 是⼀种压缩的, 基于XML的⽂件格式，由微软开发，⽤于表现电⼦表格，展⽰图表，演⽰和⽂字处理等。

在2005年11⽉，微软宣布作为ECMA国际主要合作伙伴，将其开发的基于XML的⽂件格式标准化，称之为"Office Open XML"。

Open XML的引进使office⽂档结构更加标准化，并且开发⼈员使⽤ Open XML SDK可以直接进⾏很多简单的操作，但是仍然有很多差距，如将word⽂档转换成其他格式，⽐如PDF，图像，或者HTML等。

这就是Spire.Doc 来拯救开发⼈员的原因。

⽂档转换：我将在⽂章的其余部分来介绍Spire.Doc可以适⽤的多种场景。

⽂中展⽰的所有例⼦均可以在 Spire.Doc 的DEMO中找到，你可以很容易地下载并使⽤它们。

我的例⼦是⼀个简单的控制台程序，当然它也⽀持其他平台，如web项⽬或者Silverlight项⽬等。

⽤他们⾃⼰的话来说，Spire.Doc 宣称："Spire.Doc for .NET 可以将word⽂件转换成最常见和流⾏的格式。

"为了开始使⽤Spire.Doc，你⾸先需要添加Spire.Doc,Spire.License 和Spire.Pdf引⽤到你的项⽬中，这两个组件是打包在Spire.Doc中的.你需要⼀个有效的Spire.Doc授权⽂件才能使⽤这个类库，否则它将在⽂档中显⽰"评估版本"警告。

Java实现word文档在线预览，读取office（word,excel,ppt）文件

Java实现word⽂档在线预览，读取office（word,excel,ppt）⽂件想要实现word或者其他office⽂件的在线预览，⼤部分都是⽤的两种⽅式，⼀种是使⽤openoffice转换之后再通过其他插件预览，还有⼀种⽅式就是通过POI读取内容然后预览。

⼀、使⽤openoffice⽅式实现word预览主要思路是：1.通过第三⽅⼯具openoffice，将word、excel、ppt、txt等⽂件转换为pdf⽂件2.通过swfTools将pdf⽂件转换成swf格式的⽂件3.通过FlexPaper⽂档组件在页⾯上进⾏展⽰我使⽤的⼯具版本：openof：3.4.1swfTools：1007FlexPaper：这个关系不⼤，我随便下的⼀个。

推荐使⽤1.5.1JODConverter：需要jar包，如果是maven管理直接引⽤就可以操作步骤：1.office准备下载openoffice：从过往⽂件，其他语⾔中找到中⽂版3.4.1的版本下载后，解压缩，安装然后找到安装⽬录下的program ⽂件夹在⽬录下运⾏soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard如果运⾏失败，可能会有提⽰，那就加上 .\ 在运⾏试⼀下这样openoffice的服务就开启了。

2.将flexpaper⽂件中的js⽂件夹(包含了flexpaper_flash_debug.js，flexpaper_flash.js,jquery.js,这三个js⽂件主要是预览swf⽂件的插件)拷贝⾄⽹站根⽬录;将FlexPaperViewer.swf拷贝⾄⽹站根⽬录下(该⽂件主要是⽤在⽹页中播放swf⽂件的播放器)项⽬结构：页⾯代码：fileUpload.jsp<%@ page language="java" contentType="text/html; charset=UTF-8"pageEncoding="UTF-8"%><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>⽂档在线预览系统</title><style>body {margin-top:100px;background:#fff;font-family: Verdana, Tahoma;}a {color:#CE4614;}#msg-box {color: #CE4614; font-size:0.9em;text-align:center;}#msg-box .logo {border-bottom:5px solid #ECE5D9;margin-bottom:20px;padding-bottom:10px;}#msg-box .title {font-size:1.4em;font-weight:bold;margin:0 0 30px 0;}#msg-box .nav {margin-top:20px;}</style></head><body><div id="msg-box"><form name="form1" method="post" enctype="multipart/form-data" action="docUploadConvertAction.jsp"><div class="title">请上传要处理的⽂件，过程可能需要⼏分钟，请稍候⽚刻。

openoffice教程

Openoffice入门教程首先安装Apache_OpenOffice_4.0.1_Win_x86_install_zh-CN.exe或者Apache_OpenOffice_3.2.0_LinuxX86-64_install_wJRE_zh-CN.tar.gzLinux 说明1、复制Apache_OpenOffice_3.2.0_LinuxX86-64_install_wJRE_zh-CN.tar.gz到/opt目录安装，安装命令如下：tar -xzvf Apache_OpenOffice_3.2.0_LinuxX86-64_install_wJRE_zh-CN.tar.gzcd OOO320_m12_native_packed-1_zh-CN.9483/RPMSrpm -ivh *.rpm2、在/etc/rc.d/rc.local 中加入如下内容（假设openoffice安装在/opt/3）：/opt/3/program/soffice"-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo -headless -nofirststartwizard &3、在shell命令窗口执行命令：nohup /opt/3/program/soffice -accept="socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo -headless -nofirststartwizard &4、查看启动命令ps -ef | grep openoffice6、重启oaWindow说明echo "用以下命令启动OpenOffice服务"进入目录cd C:\Program Files (x86)\OpenOffice 4\program输入启动服务soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizardJar包说明可以直接在当前word文档下载jar包例子1commons-io-1.4.jar jodconverter-2.2.0.jar jodconverter-cli-2.2.0.jar juh-2.2.0.jar jurt-2.2.0.jar ridl-2.2.0.jar slf4j-api-1.4.0.jar slf4j-jdk14-1.4.0.jar unoil-2.2.0.jar xstream-1.2.2.jar例子2需要多家两个jar包jooconverter-2.0rc2.jarridl-2.0.jar。

odconv应用实例

odconv应用实例
odconv是一个用于将OpenDocument格式转换为其他格式的命令行工具。

它可以将ODF文档（.odt，.ods，.odp等）转换为其他常见的文档格式，比如PDF，HTML，纯文本等。

下面我将从几个方面来介绍odconv的应用实例。

1. 将ODF文档转换为PDF格式：
通过使用odconv命令行工具，你可以将一个OpenDocument 格式的文档（比如.odt文件）转换为PDF格式。

这在需要与他人共享文档，或者需要打印文档时非常有用。

你只需要运行odconv命令并指定输入文件和输出文件的路径即可完成转换。

2. 将ODF文档转换为HTML格式：
另一个常见的应用实例是将ODF文档转换为HTML格式，这样可以在网页上轻松地展示文档内容，或者在网页中嵌入文档。

odconv可以帮助你将.odt或.ods文件转换为HTML格式，使得文档内容可以在网页上方便地展示。

3. 将ODF文档转换为纯文本格式：
有时候我们可能需要将文档的内容提取为纯文本格式，以便进行文本分析或者其他处理。

odconv可以帮助你将ODF文档转换为纯文本格式，方便进行后续的处理和分析。

4. 批量转换：
odconv还支持批量转换功能，你可以指定一个文件夹，让odconv批量处理其中的所有ODF文档，将它们转换为指定的格式，这在处理大量文档时非常方便。

总之，odconv是一个非常实用的工具，可以帮助你将OpenDocument格式的文档转换为其他常见格式，方便文档的分享、展示和处理。

希望以上介绍能够帮助你更好地了解odconv的应用实例。

Linux下openoffice转换word文档到pdf文档时中文乱码问题

Linux下openoffice转换word文档到pdf文档时中文乱码问题报错显示：INFO: connectedJun 1, 2009 11:21:52 AMcom.artofsolving.jodconverter.openoffice.connection.AbstractOpenOffic eConnection disposingINFO: disconnectedException in thread "main"com.artofsolving.jodconverter.openoffice.connection.OpenOfficeExcepti on: conversion failed: could not load input documentatcom.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocument Converter.loadAndExport(OpenOfficeDocumentConverter.java:131)atcom.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocument Converter.convertInternal(OpenOfficeDocumentConverter.java:120)atcom.artofsolving.jodconverter.openoffice.converter.AbstractOpenOffice DocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:10 4)atcom.artofsolving.jodconverter.openoffice.converter.AbstractOpenOffice DocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:74) atcom.artofsolving.jodconverter.openoffice.converter.AbstractOpenOffice DocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:70) atcom.artofsolving.jodconverter.cli.ConvertDocument.convertOne(ConvertD ocument.java:154)atcom.artofsolving.jodconverter.cli.ConvertDocument.main(ConvertDocument.java:139)问题解决：此时可能是linux下的jre没有相应的中文字体的问题下载 simhei.ttf 黑体simsun.ttc 宋体两种字体文件找到jre的字体路径：/usr/jdk1.6.0_22/jre/lib/fonts新建文件夹fallback：mkdir fallback将字体simhei.ttf 、simsun.ttc拷贝到/usr/jdk1.6.0_22/jre/lib/fonts/fallback目录下重启openofficeps ax|grep soffice显示如下：22739 pts/5 S 0:00 /bin/sh/opt/3/program/soffice -headless-accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard22747 pts/5 Sl 0:01/opt/3/program/soffice.bin -headless-accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard23789 pts/5 S+ 0:00 grep soffice关闭soffice进程：kill 22739以后台启动openoffice：/opt/3/program/soffice -headless-accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard & 问题解决了！！但是，这种情况下只能解决，宋体和黑体的乱码问题，其他字体的还需添加字体文件来解决。

C#实现HTML转WORD及WORD转PDF的方法

C#实现HTML转WORD及WORD转PDF的⽅法本⽂实例讲述了C#实现HTML转WORD及WORD转PDF的⽅法。

分享给⼤家供⼤家参考。

具体如下：功能：实现HTML转WORD，WORD转PDF具体代码如下：using System;using System.Collections.Generic;using ponentModel;using System.Data;using System.Drawing;using System.Text;using System.Windows.Forms;using Word = Microsoft.Office.Interop.Word;using oWord = Microsoft.Office.Interop.Word;using System.Reflection;using System.Configuration;using System.Web;using System.Web.Security;using System.Web.UI;using System.Web.UI.WebControls;using System.Web.UI.WebControls.WebParts;using System.Web.UI.HtmlControls;using Microsoft.Office.Core;using System.Text.RegularExpressions;namespace WindowsApplication2{public partial class Form1 : Form{public Form1(){InitializeComponent();}private void button1_Click(object sender, EventArgs e){object oMissing = System.Reflection.Missing.Value;object oEndOfDoc = "\\endofdoc"; /* \endofdoc is a predefined bookmark *///Start Word and create a new document.Word._Application oWord;Word._Document oDoc;oWord = new Word.Application();oWord.Visible = true;oDoc = oWord.Documents.Add(ref oMissing, ref oMissing,ref oMissing, ref oMissing);//Insert a paragraph at the beginning of the document.Word.Paragraph oPara1;oPara1 = oDoc.Content.Paragraphs.Add(ref oMissing);oPara1.Range.Text = "Heading 1";oPara1.Range.Font.Bold = 1;oPara1.Format.SpaceAfter = 24; //24 pt spacing after paragraph.oPara1.Range.InsertParagraphAfter();//Insert a paragraph at the end of the document.Word.Paragraph oPara2;object oRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oPara2 = oDoc.Content.Paragraphs.Add(ref oRng);oPara2.Range.Text = "Heading 2";oPara2.Format.SpaceAfter = 6;oPara2.Range.InsertParagraphAfter();//Insert another paragraph.Word.Paragraph oPara3;oRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oPara3 = oDoc.Content.Paragraphs.Add(ref oRng);oPara3.Range.Text = "This is a sentence of normal text. Now here is a table:";oPara3.Range.Font.Bold = 0;oPara3.Format.SpaceAfter = 24;oPara3.Range.InsertParagraphAfter();//Insert a 3 x 5 table, fill it with data, and make the first row//bold and italic.Word.Table oTable;Word.Range wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oTable = oDoc.Tables.Add(wrdRng, 3, 5, ref oMissing, ref oMissing);oTable.Range.ParagraphFormat.SpaceAfter = 6;int r, c;string strText;for (r = 1; r <= 3; r++)for (c = 1; c <= 5; c++){strText = "r" + r + "c" + c;oTable.Cell(r, c).Range.Text = strText;}oTable.Rows[1].Range.Font.Bold = 1;oTable.Rows[1].Range.Font.Italic = 1;//Add some text after the table.Word.Paragraph oPara4;oRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oPara4 = oDoc.Content.Paragraphs.Add(ref oRng);oPara4.Range.InsertParagraphBefore();oPara4.Range.Text = "And here's another table:";oPara4.Format.SpaceAfter = 24;oPara4.Range.InsertParagraphAfter();//Insert a 5 x 2 table, fill it with data, and change the column widths.wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oTable = oDoc.Tables.Add(wrdRng, 5, 2, ref oMissing, ref oMissing);oTable.Range.ParagraphFormat.SpaceAfter = 6;for (r = 1; r <= 5; r++)for (c = 1; c <= 2; c++){strText = "r" + r + "c" + c;oTable.Cell(r, c).Range.Text = strText;}oTable.Columns[1].Width = oWord.InchesToPoints(2); //Change width of columns 1 & 2oTable.Columns[2].Width = oWord.InchesToPoints(3);//Keep inserting text. When you get to 7 inches from top of the//document, insert a hard page break.object oPos;double dPos = oWord.InchesToPoints(7);oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range.InsertParagraphAfter();do{wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;wrdRng.ParagraphFormat.SpaceAfter = 6;wrdRng.InsertAfter("A line of text");wrdRng.InsertParagraphAfter();oPos = wrdRng.get_Information(Word.WdInformation.wdVerticalPositionRelativeToPage);}while (dPos >= Convert.ToDouble(oPos));object oCollapseEnd = Word.WdCollapseDirection.wdCollapseEnd;object oPageBreak = Word.WdBreakType.wdPageBreak;wrdRng.Collapse(ref oCollapseEnd);wrdRng.InsertBreak(ref oPageBreak);wrdRng.Collapse(ref oCollapseEnd);wrdRng.InsertAfter("We're now on page 2. Here's my chart:");wrdRng.InsertParagraphAfter();//Insert a chart.Word.InlineShape oShape;object oClassType = "MSGraph.Chart.8";wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oShape = wrdRng.InlineShapes.AddOLEObject(ref oClassType, ref oMissing,ref oMissing, ref oMissing, ref oMissing,ref oMissing, ref oMissing, ref oMissing);//Demonstrate use of late bound oChart and oChartApp objects to//manipulate the chart object with MSGraph.object oChart;object oChartApp;oChart = oShape.OLEFormat.Object;oChartApp = oChart.GetType().InvokeMember("Application",BindingFlags.GetProperty, null, oChart, null);//Change the chart type to Line.object[] Parameters = new Object[1];Parameters[0] = 4; //xlLine = 4oChart.GetType().InvokeMember("ChartType", BindingFlags.SetProperty,null, oChart, Parameters);//Update the chart image and quit MSGraph.oChartApp.GetType().InvokeMember("Update",BindingFlags.InvokeMethod, null, oChartApp, null);oChartApp.GetType().InvokeMember("Quit",BindingFlags.InvokeMethod, null, oChartApp, null);//... If desired, you can proceed from here using the Microsoft Graph//Object model on the oChart and oChartApp objects to make additional//changes to the chart.//Set the width of the chart.oShape.Width = oWord.InchesToPoints(6.25f);oShape.Height = oWord.InchesToPoints(3.57f);//Add text after the chart.wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;wrdRng.InsertParagraphAfter();wrdRng.InsertAfter("THE END.");//Close this form.this.Close();}private void button2_Click(object sender, EventArgs e){string s = "";if (openFileDialog1.ShowDialog() == DialogResult.OK){s = openFileDialog1.FileName;}else{return;}// 在此处放置⽤户代码以初始化页⾯Word.ApplicationClass word = new Word.ApplicationClass();Type wordType = word.GetType();Word.Documents docs = word.Documents;// 打开⽂件Type docsType = docs.GetType();object fileName = s;Word.Document doc = (Word.Document)docsType.InvokeMember("Open",System.Reflection.BindingFlags.InvokeMethod, null, docs, new Object[] { fileName, false, false }); // 转换格式，另存为Type docType = doc.GetType();object saveFileName = "d:\\Reports\\aaa.doc";//下⾯是Microsoft Word 9 Object Library的写法，如果是10，可能写成：/*docType.InvokeMember("SaveAs", System.Reflection.BindingFlags.InvokeMethod,null, doc, new object[]{saveFileName, Word.WdSaveFormat.wdFormatFilteredHTML});*////其它格式：///wdFormatHTML///wdFormatDocument///wdFormatDOSText///wdFormatDOSTextLineBreaks///wdFormatEncodedText///wdFormatRTF///wdFormatTemplate///wdFormatText///wdFormatTextLineBreaks///wdFormatUnicodeTextdocType.InvokeMember("SaveAs", System.Reflection.BindingFlags.InvokeMethod,null, doc, new object[] { saveFileName, Word.WdSaveFormat.wdFormatDocument });// 退出 WordwordType.InvokeMember("Quit", System.Reflection.BindingFlags.InvokeMethod,null, word, null);}private void WordConvert(string s){oWord.ApplicationClass word = new Microsoft.Office.Interop.Word.ApplicationClass();Type wordType = word.GetType();//打开WORD⽂档/*对应脚本中的var word = new ActiveXObject("Word.Application");var doc = word.Documents.Open(docfile);*/oWord.Documents docs = word.Documents;Type docsType = docs.GetType();object objDocName =s;oWord.Document doc = (oWord.Document)docsType.InvokeMember("Open", System.Reflection.BindingFlags.InvokeMethod, null, docs, new Object[] { objDocName, true, true });//打印输出到指定⽂件//你可以使⽤ doc.PrintOut();⽅法,次⽅法调⽤中的参数设置较繁琐,建议使⽤ Type.InvokeMember 来调⽤时可以不⽤将PrintOut的参数设置全,只设置4个主要参数Type docType = doc.GetType();object printFileName = @"c:\aaa.ps";docType.InvokeMember("PrintOut", System.Reflection.BindingFlags.InvokeMethod, null, doc, new object[] { false, false, oWord.WdPrintOutRange.wdPrintAllDocument, printFileName });//new object[]{false,false,oWord.WdPrintOutRange.wdPrintAllDocument,printFileName}//对应脚本中的word.PrintOut(false, false, 0, psfile);的参数//退出WORD//对应脚本中的word.Quit();wordType.InvokeMember("Quit", System.Reflection.BindingFlags.InvokeMethod, null, word, null);object o1 = "c:\\aaa.ps";object o2 = "c:\\aaa.pdf";object o3 = "";//引⽤将PS转换成PDF的对象//try catch之间对应的是脚本中的 PDF.FileToPDF(psfile,pdffile,""); //你可以使⽤ pdfConvert.FileToPDF("c:\\test.ps","c:\\test.pdf","");这样的转换⽅法,本⼈只是为了保持与WORD相同的调⽤⽅式 try{ACRODISTXLib.PdfDistillerClass pdf = new ACRODISTXLib.PdfDistillerClass();Type pdfType = pdf.GetType();pdfType.InvokeMember("FileToPDF", System.Reflection.BindingFlags.InvokeMethod, null, pdf, new object[] { o1, o2, o3 });pdf = null;}catch { } //读者⾃⼰补写错误处理//为防⽌本⽅法调⽤多次时发⽣错误,必须停⽌acrodist.exe进程foreach (System.Diagnostics .Process proc in System.Diagnostics.Process.GetProcesses()){int begpos;int endpos;string sProcName = proc.ToString();begpos = sProcName.IndexOf("(") + 1;endpos = sProcName.IndexOf(")");sProcName = sProcName.Substring(begpos, endpos - begpos);if (sProcName.ToLower().CompareTo("acrodist") == 0){try{proc.Kill(); //停⽌进程}catch { } //读者⾃⼰补写错误处理break;}}}private void button3_Click(object sender, EventArgs e){if (openFileDialog1.ShowDialog() == DialogResult.OK){string s = openFileDialog1.FileName;WordConvert(s);}}//getnextcodeprivate void button4_Click(object sender, EventArgs e){WorkCell myWorkCell = new WorkCell(textBox2.Text,textBox1.Text);textBox3.Text = myWorkCell.GetNextCode();}}public class WorkCell{private string workCellCode;private string parentCellCode;private string commonCode;private char[] code;private char[] pCode;private char[] standCode;private string s;public WorkCell( string mycode,string parentcode){workCellCode = mycode;parentCellCode = parentcode;standCode = new char[] { '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'W', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z' }; commonCode = Regex.Replace(parentCellCode,@"0+","");code = workCellCode.Substring(commonCode.Length).ToCharArray();}public string WorkCellCode{set{workCellCode = value;}get{return workCellCode;}}public string ParentCellCode{set{workCellCode = value;}get{return workCellCode;}}public string GetNextCode(){string s="";if (code.Length > 0){int i = 0;for (i = code.Length - 1; i >= 0; i--){if (code[i] != '0'){GetNextChar(i);break;}}for(i=0;i<code.Length;i++){s+=code[i].ToString();}return commonCode + s;}else{return "null";}}//设置code中的下⼀个代码，从右边起，找到第⼀个⾮0字符，将其按标准代码⾃加1，溢出则进位private char GetNextChar(int j){int i = -1;int flag = 0;for (i = 0; i < standCode.Length; i++){if (code[j] == standCode[i]){flag = 1;break;}}//MessageBox.Show(code[j].ToString()+" "+standCode[i].ToString()+" "+i.ToString());if (i >= standCode.Length-1 || flag==0){code[j] = standCode[0];if (j > 0)code[j - 1] = GetNextChar(j - 1);}else{code[j] = standCode[i + 1];}return code[j];}}}希望本⽂所述对⼤家的C#程序设计有所帮助。

批量转换HTML文件转换成WORD文档

批量转换HTML文件转换成WORD文档批量转换HTML文件转换成WORD文档如果一个一个转换将会是一项浩大的工程，其实，利用Word 2003的“转换向导”就可以实现文档格式的批量转换。

step1.启动“转换向导”。

运行Word 2003，单击“文件”→“新建”，打开“新建文档”任务窗格，单击“本机上的模板…”（右侧导航栏），弹出“模板”对话框，切换到“其他文档”标签卡，双击列表中的“转换向导”，启动“转换向导”。

step2.选择转换格式。

单击“下一步”，设置转换格式，这里有两个转换选项，一个是将其他格式的文档转换为Word文档格式，另一个是将Word文档格式转换为其他文件格式，大家可以根据操作过程中实际需要进行选择，如果需要将Word文档格式转换为纯文本文件格式，这里就应该选择“从Word文档格式转换为其他文件格式”，并在下拉列表中选中“纯文本”。

step3.选择转换文件。

接下来向导要求你选择源文件夹和目标文件夹，点击“源文件夹”右侧的“浏览”按钮，找到需要转换的文档所在的文件夹；单击“目标文件夹”右侧的“浏览”按钮，为转换后的文件指定一个文件夹。

单击“下一步”，双击“可用文件”列表中需要转换的DOC文件，将文件添加到“转换文件”列表中（单击“全选”按钮可以将所有文件一次性添加到列表中）。

step4.完成文件转换。

将需要转换的文件全部添加列表中后，单击“完成”按钮，系统开始转换文件，转换时会显示进度、已转换文件数量，并将转换后的文件以设定的格式自动存入指定的文件夹中。

Word 2003的“转换向导”不仅可以把DOC格式的文档转换成TXT格式的文本文件，只要在“选择转换格式”窗口中进行设置，还可以把HTML文档、Outlook 通讯簿、RTF格式文档、WPS文件、文本文档等转换为DOC文档，或者把Word 文档转换成HTML文档、MS DOS文本、RTF等格式的文档。

JAVA：借用OpenOffice将上传的Word文档转换成Html格式

本文由我司收集整编，推荐下载，如有疑问，请与我司联系JAVA：借用OpenOffice 将上传的Word 文档转换成Html 格式2013/09/24 0 为什么会想起来将上传的word 文档转换成html 式呢？设想，如果一个系统需要发布在页面的文章都是来自word 文档，一般会执行下面的流程：使用word 打开文档，Ctrl A，进入发布文章页面，Ctrl V。

看起来也不麻烦，但是，如果文档中包含大量图片呢？尴尬的事是图片都需要重新上传吧？如果可以将已经编写好的word 文档上传到服务器就可以在相应页面进行展示，将会是一件非常惬意的事情，最起码信息发布人员会很开心。

程序员可能就不会这么想了，囧。

将Word 转Html 的原理是这样的：1、客户上传Word 文档到服务器2、服务器调用OpenOffice 程序打开上传的Word 文档3、OpenOffice 将Word 文档另存为Html 式4、Over至此可见，这要求服务器端安装OpenOffice 软件，其实也可以是MS Office，不过OpenOffice 的优势是跨平台，你懂的。

恩，说明一下，本文的测试基于MS Win7 Ultimate X64 系统。

下面就是规规矩矩的实现。

1、下载OpenOffice，download.openoffice/index.htmlSo easy...2、下载Jodconverter artofsolving/opensource/jodconverter 这是一个开启OpenOffice进行式转化的第三方jar 包。

3、泡杯热茶，等待下载。

4、安装OpenOffice，安装结束后，调用cmd，启动OpenOffice 的一项服务：C:\Program Files (x86)\OpenOffice 3\program soffice -headless -accept=socket,port=8100;urp;5、打开eclipse6、喝杯热茶，等待eclipse 打开。

htmldocx.asblob原理

htmldocx.asblob原理
htmldocx.asblob是一种将HTML文档转换为Word文档的方法，可以将HTML内容以Word文档的形式进行导出和保存。

其
原理是将HTML代码转换为Office Open XML格式的文档，
其中使用到了相关的HTML和CSS解析器和处理器。

具体原理如下：
1. 传入HTML字符串：将HTML字符串作为参数传入htmldocx.asblob方法中。

2. 创建文档对象：使用相关的HTML解析器将HTML字符串
解析为一个DOM树结构，然后创建一个Word文档对象。

3. 样式处理：将HTML中的CSS样式转换为Word文档中的
相应样式，例如字体、字号、颜色、对齐方式等。

4. 内容处理：解析HTML中的标签，并根据每个标签的语义
和属性生成相应的Word文档内容，包括文字、图片、表格、
超链接等。

5. 导出为二进制格式：将生成的Word文档对象转换为二进制
格式，即Office Open XML格式，以便于保存为.docx文件或
进行其他处理。

总之，htmldocx.asblob方法通过解析HTML代码，将其转换
为Word文档对象，并最终导出为二进制格式的Word文档，实现了将HTML内容转换为Word文档的功能。

java平台,使用openoffice将word转换为html

java环境下将word转换为html目前没有很简单的方法。

使用openOffice实现应该算是“矬子里面拔大个”。

1，首先下载openOffice。

这是个第三方开源的项目，专门用于在java环境中进行类似word 文档编写（要是连个word编辑都做不出来，那java在外行心目中地位就蹭蹭地下去了）。

我下载的是 3.2版本。

2，下载后安装。

通过cmd进入“安装目录\ 3\program”文件夹下。

运行一下命令soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard意思是启动openoffice的一个服务，以备为其他程序使用（看看咱们的开源领袖多大方，不像微软那么小气，生怕自己的用）。

3，测试一下8100端口是否能使用。

cmd命令“telnet localhost 8100”，如果开启了，就会有黑的不能再黑的屏幕显现，如果没开启，就会出现连接不上的消息。

4，下载jodconverter项目，我下的是2.2.2版本。

（咱就不重复制造轮子了，直接就上车吧！）5，自己创建项目，把jodconverter文件夹lib中的所有jar包都引用一下。

然后写下以下代码public static void main(String args[]) {File inputFile = new File("D:\\test\\广告测试.doc");File outputFile = new File("D:\\test\\广告测试.html");OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);try{connection.connect();}catch(Exception e){e.printStackTrace();}DocumentConverter converter = new OpenOfficeDocumentConverter(connection);converter.convert(inputFile, outputFile);connection.disconnect();}然后就运行一下，应该没什么问题。

PHP调用OpenOffice实现word转PDF的方法

PHP调⽤OpenOffice实现word转PDF的⽅法最近⼀直在研究PHP word⽂档转PDF，也在⽹上搜索了很多类似的资料，⼤多数都是通过OpenOffice进⾏转换的。

核⼼的代码如下：function MakePropertyValue($name,$value,$osm){$oStruct = $osm->Bridge_GetStruct("com.sun.star.beans.PropertyValue");$oStruct->Name = $name;$oStruct->Value = $value;return $oStruct;}function word2pdf($doc_url, $output_url){$osm = new COM("com.sun.star.ServiceManager") or die ("Please be sure that is installed.n");$args = array(MakePropertyValue("Hidden",true,$osm));$oDesktop = $osm->createInstance("com.sun.star.frame.Desktop");$oWriterDoc = $oDesktop->loadComponentFromURL($doc_url,"_blank", 0, $args);$export_args = array(MakePropertyValue("FilterName","writer_pdf_Export",$osm));$oWriterDoc->storeToURL($output_url,$export_args);$oWriterDoc->close(true);}$doc_file=dirname(__FILE__)."/11.doc"; //源⽂件，DOC或者WPS都可以$output_file=dirname(__FILE__)."/11.pdf"; //欲转PDF的⽂件名$doc_file = "file:///" . $doc_file;$output_file = "file:///" . $output_file;$document->word2pdf($doc_file,$output_file);⽤上述发现代码⼀直在报错( ! ) Fatal error: Uncaught exception 'com_exception' with message '<b>Source:</b> [automation bridge] <br/><b>Description:</b> com.sun.star.task.ErrorCodeIOException: ' in I:\phpStudy\WWW\DocPreview\test2.php on line 27( ! ) com_exception: <b>Source:</b> [automation bridge] <br/><b>Description:</b> com.sun.star.task.ErrorCodeIOException: in I:\phpStudy\WWW\DocPreview\test2.php on line 27最后发现原来是转出路径的问题：通过调试得出上述代码的转出路径$output_file 是file:///I:\phpStudy\WWW\DocPreview\sdds.pdf。

word转换成html的方法

word转换成html的⽅法之前⽤到了word转换成HTMl的做法，⽹上找过⼀段类似的代码，后发现好多不能执⾏，调试了半天才最终搞定。

⽅法接参数是word⽂件路径。

执⾏这样的代码，需要添加 Microsoft.Office.Interop.Word.dll引⽤。

///<summary>/// Word转换为HTML///</summary>///<param name="path"></param>private string WordToHTMl(ref string path){string pageUrl = null;object fltDocFormat = 10; //For filtered HTML Outputobject missing = System.Reflection.Missing.Value;object readOnly = false; //Open file in readOnly modeobject isVisible = false;//The process has to be in invisible modeobject fileName = path;//获取⽹站服务器端的路径string newPath = System.Web.HttpContext.Current.Server.MapPath("/ConvertFiles/");string newFileName = System.IO.Path.GetFileNameWithoutExtension(fileName.ToString());object saveFileName = newPath + savePath + newFileName + ".html";//判断⽂件是否被转换if (!System.IO.File.Exists(SaveFileName)){Microsoft.Office.Interop.Word.ApplicationClass objWord = new Microsoft.Office.Interop.Word.ApplicationClass();//打开word⽂档objWord.Documents.Open(ref fileName, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref isVisible, ref missing, ref missing, ref missing, ref missing, //Do the background activityobjWord.Visible = false;Microsoft.Office.Interop.Word.Document oDoc = objWord.ActiveDocument;oDoc.SaveAs(ref saveFileName, ref fltDocFormat, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing); oDoc.Close(ref missing, ref missing, ref missing);objWord.Quit(ref missing, ref missing, ref missing);//Process[] myProcesses = Process.GetProcessesByName("WINWORD");//foreach (Process myProcess in myProcesses)//{// myProcess.Kill();//}// string pageUrl = newFileName + ".html";// System.Web.HttpContext.Current.Server.Transfer(pageUrl);}else{//页⾯路径pageUrl = IpPath + FileName + ".html";}return pageUrl;}。

java使用POI实现html和word相互转换

java使⽤POI实现html和word相互转换项⽬后端使⽤了springboot，maven，前端使⽤了ckeditor富⽂本编辑器。

⽬前从html转换的word为doc格式，⽽图⽚处理⽀持的是docx格式，所以需要⼿动把doc另存为docx，然后才可以进⾏图⽚替换。

⼀.添加maven依赖主要使⽤了以下和poi相关的依赖，为了便于获取html的图⽚元素，还使⽤了jsoup：<dependency><groupId>org.apache.poi</groupId><artifactId>poi</artifactId><version>3.14</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>poi-scratchpad</artifactId><version>3.14</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>poi-ooxml</artifactId><version>3.14</version></dependency><dependency><groupId>fr.opensagres.xdocreport</groupId><artifactId>xdocreport</artifactId><version>1.0.6</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>poi-ooxml-schemas</artifactId><version>3.14</version></dependency><dependency><groupId>org.apache.poi</groupId><artifactId>ooxml-schemas</artifactId><version>1.3</version></dependency><dependency><groupId>org.jsoup</groupId><artifactId>jsoup</artifactId><version>1.11.3</version></dependency>⼆.word转换为html在springboot项⽬的resources⽬录下新建static⽂件夹，将需要转换的word⽂件temp.docx粘贴进去，由于static是springboot的默认资源⽂件，所以不需要在配置⽂件⾥⾯另⾏配置了，如果改成其他名字，需要在application.yml进⾏相应配置。

document.open()用法

document.open()用法
()是JavaScript在HTML文档中打开一个新的或者替换当前的文档的方法。

这个方法通常用于创建新的HTML文档，或者输出HTML内容。

以下是一些基本用法：
1、创建一个新的HTML文档：
在这个例子中，()方法打开一个新的HTML文档，然后使用()方法将HTML内容写入到新打开的文档中。

2、替换当前的HTML文档：
在这个例子中，第二个参数"replace"告诉浏览器替换当前的HTML 文档。

如果省略第二个参数，浏览器将创建一个新的HTML文档。

请注意，一旦使用()方法打开了一个新的HTML文档，原来的文档内容将被清空。

因此，使用这个方法需要谨慎，确保大家知道自己在做什么，并且有备份原始文档内容的措施。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

为什么会想起来将上传的word文档转换成html格式呢？设想，如果一个系统需要发布在页面的文章都是来自word文档，一般会执行下面的流程：使用word打开文档，Ctrl+A，进入发布文章页面，Ctrl+V。

看起来也不麻烦，但是，如果文档中包含大量图片呢？尴尬的事是图片都需要重新上传吧？如果可以将已经编写好的word文档上传到服务器就可以在相应页面进行展示，将会是一件非常惬意的事情，最起码信息发布人员会很开心。

程序员可能就不会这么想了，囧。

将Word转Html的原理是这样的：1、客户上传Word文档到服务器2、服务器调用OpenOffice程序打开上传的Word文档3、OpenOffice将Word文档另存为Html格式4、Over至此可见，这要求服务器端安装OpenOffice软件，其实也可以是MS Office，不过OpenOffice 的优势是跨平台，你懂的。

恩，说明一下，本文的测试基于MS Win7 Ultimate X64 系统。

下面就是规规矩矩的实现。

1、下载OpenOffice，/index.html So easy...2、下载Jodconverter /opensource/jodconverter 这是一个开启OpenOffice进行格式转化的第三方jar包。

3、泡杯热茶，等待下载。

4、安装OpenOffice，安装结束后，调用cmd，启动OpenOffice的一项服务：C:\Program Files (x86)\ 3\program>soffice -headless -accept="socket,port=8100;urp;"5、打开eclipse6、喝杯热茶，等待eclipse打开。

7、新建eclipse项目，导入Jodconverter/lib 下得jar包。

8、Coding...查看代码package com.mzule.doc2html.util;import java.io.BufferedReader;import java.io.File;import java.io.FileInputStream;import java.io.FileNotFoundException;import java.io.IOException;import java.io.InputStreamReader;import .ConnectException;import java.util.Date;import java.util.regex.Matcher;import java.util.regex.Pattern;import com.artofsolving.jodconverter.DocumentConverter;importcom.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;importcom.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;importcom.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;/*** 将Word文档转换成html字符串的工具类** @author MZULE**/publicclass Doc2Html {publicstaticvoid main(String[] args) {System.out.println(toHtmlString(new File("C:/test/test.doc"),"C:/test"));}/*** 将word文档转换成html文档** @param docFile* 需要转换的word文档* @param filepath* 转换之后html的存放路径* @return转换之后的html文件*/publicstatic File convert(File docFile, String filepath) {// 创建保存html的文件File htmlFile = new File(filepath + "/" + new Date().getTime()+ ".html");// 创建Openoffice连接OpenOfficeConnection con =new SocketOpenOfficeConnection(8100);try{//连接con.connect();}catch(ConnectException e) {System.out.println("获取OpenOffice连接失败...");e.printStackTrace();}//创建转换器DocumentConverter converter =new OpenOfficeDocumentConverter(con);//转换文档问htmlconverter.convert(docFile, htmlFile);//关闭openoffice连接con.disconnect();return htmlFile;}/*** 将word转换成html文件，并且获取html文件代码。

**@param docFile* 需要转换的文档*@param filepath* 文档中图片的保存位置*@return转换成功的html代码*/public static String toHtmlString(File docFile, String filepath) {//转换word文档File htmlFile = convert(docFile, filepath);//获取html文件流StringBufferhtmlSb =new StringBuffer();try{BufferedReaderbr =new BufferedReader(new InputStreamReader(new FileInputStream(htmlFile)));while(br.ready()) {htmlSb.append(br.readLine());}br.close();//删除临时文件htmlFile.delete();}catch(FileNotFoundException e) {e.printStackTrace();}catch(IOException e) {e.printStackTrace();}//HTML文件字符串String htmlStr = htmlSb.toString();//返回经过清洁的html文本return clearFormat(htmlStr, filepath);}/*** 清除一些不需要的html标记**@param htmlStr* 带有复杂html标记的html语句*@return去除了不需要html标记的语句*/protected static String clearFormat(String htmlStr, String docImgPath) {//获取body内容的正则String bodyReg = "<BODY .*</BODY>";Pattern bodyPattern = pile(bodyReg);Matcher bodyMatcher = bodyPattern.matcher(htmlStr);if(bodyMatcher.find()) {//获取BODY内容，并转化BODY标签为DIVhtmlStr = bodyMatcher.group().replaceFirst("<BODY", "<DIV").replaceAll("</BODY>", "</DIV>");}//调整图片地址htmlStr = htmlStr.replaceAll("<IMG SRC=\"", "<IMG SRC=\"" + docImgPath+ "/");//把<P></P>转换成</div></div>保留样式//content = content.replaceAll("(<P)([^>]*>.*?)(<\\/P>)",//"<div$2</div>");//把<P></P>转换成</div></div>并删除样式htmlStr = htmlStr.replaceAll("(<P)([^>]*)(>.*?)(<\\/P>)", "<p$3</p>");//删除不需要的标签htmlStr = htmlStr.replaceAll("<[/]?(font|FONT|span|SPAN|xml|XML|del|DEL|ins|INS|meta|META|[ovwxpOVWXP]:\\w+) [^>]*?>","");//删除不需要的属性htmlStr = htmlStr.replaceAll("<([^>]*)(?:lang|LANG|class|CLASS|style|STYLE|size|SIZE|face|FACE|[ovwxpOVWXP]:\\ w+)=(?:'[^']*'|\"\"[^\"\"]*\"\"|[^>]+)([^>]*)>","<$1$2>");return htmlStr;}}。