
[Translated from a MOS article] How to index a Microsoft (Office) Word 2007 document?

April 2, 2016 ⁄ Windows

Source: How To Index a Microsoft (Office) Word Document 2007 ? (Doc ID 752710.1)

Applies to:
Oracle Text - Version: 11.1.0.7 to 11.2.0.3 - Release: 11.1 to 11.2
Information in this document applies to any platform.

Goal
This note explains how to index a BLOB column in a table whose rows contain Microsoft Word 2007 documents (the new Microsoft format, DOCX).

Starting with Oracle Database 11.1.0.7, Oracle Text uses Oracle Outside In HTML Export technology for document filtering (translator's note: Oracle Outside In HTML Export belongs to the Oracle product line Middleware > Content Management > Oracle Outside In Technology). It replaces the filtering technology that Oracle had licensed from Autonomy Inc.
As a result, Microsoft (Office) Word 2007 documents can be indexed from Oracle Database 11.1.0.7 onward.
Refer to Appendix B of the Oracle Text Reference for the complete list of document formats supported by the filter in 11.1.0.7.

Oracle Text Reference 11g Release 1 (11.1)
Part Number B28304-03
http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/afilsupt.htm#i634493
B.2 Supported Document Formats

Solution:

Follow the steps below to make a Microsoft Word 2007 document searchable.

Step 1 - Within the /tmp directory place all the files to be used from this note.

docx1.sql
docx2.sql
test.txt
test.docx

-- The four files above have been uploaded to CSDN resources and can be downloaded from:
http://download.csdn.net/download/msdnchina/9480052

Step 2 - Create the necessary schema and privileges

connect system/manager       -- or connect as any other privileged user

create user testdocx identified by testdocx;
grant connect, resource, create any directory to testdocx;
connect testdocx/testdocx

Step 3 - Create the necessary objects (refer to the docx1.sql script)...

SQL> @/tmp/docx1.sql
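
The contents of docx1.sql are not reproduced in this note. As a rough illustration only, a script of this kind could look like the sketch below. The directory name docx_dir, the table docs, its columns, and the index name docs_idx are all assumptions made for this example and are not taken from the original script. The sketch also assumes that test.docx and test.txt sit in /tmp on the database server and that neither file is empty.

```sql
-- Hypothetical sketch, NOT the actual docx1.sql from the note.
-- All object names (docx_dir, docs, docs_idx) are illustrative assumptions.

-- Directory object pointing at the staging directory from Step 1.
create or replace directory docx_dir as '/tmp';

-- Table with a BLOB column that holds the documents.
create table docs (
  id    number primary key,
  fname varchar2(100),
  doc   blob
);

-- Load each file from the directory object into the BLOB column.
declare
  procedure load_file(p_id in number, p_name in varchar2) is
    l_bfile bfile := bfilename('DOCX_DIR', p_name);
    l_blob  blob;
  begin
    -- Insert an empty LOB and get back a writable locator.
    insert into docs (id, fname, doc)
    values (p_id, p_name, empty_blob())
    returning doc into l_blob;

    dbms_lob.fileopen(l_bfile, dbms_lob.file_readonly);
    dbms_lob.loadfromfile(l_blob, l_bfile, dbms_lob.getlength(l_bfile));
    dbms_lob.fileclose(l_bfile);
  end;
begin
  load_file(1, 'test.docx');
  load_file(2, 'test.txt');
  commit;
end;
/

-- CONTEXT index on the BLOB column. AUTO_FILTER detects the format of
-- each document and filters it to indexable text.
create index docs_idx on docs (doc)
  indextype is ctxsys.context
  parameters ('filter ctxsys.auto_filter');

-- Any document that failed to filter is reported here.
select err_textkey, err_text from ctx_user_index_errors;
```

AUTO_FILTER is named explicitly here only to make the filtering step visible; whether the original script sets the filter or relies on the default for binary columns is not stated in the note.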

 

Step 4 - Check a couple of terms inside the documents (refer to the docx2.sql script) ...

SQL> @/tmp/docx2.sql
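
The contents of docx2.sql are likewise not shown. A check of this kind would typically run CONTAINS queries against the indexed column. The sketch below assumes a table docs with a BLOB column doc and a CONTEXT index docs_idx on it; these names are hypothetical, and the search term 'oracle' is only a placeholder for a word known to occur in test.docx.

```sql
-- Hypothetical sketch, NOT the actual docx2.sql from the note.
-- Assumes table docs(id, fname, doc) with CONTEXT index docs_idx on doc.

-- Rows whose document contains the placeholder term, best match first.
select id, fname, score(1) as relevance
from   docs
where  contains(doc, 'oracle', 1) > 0
order  by score(1) desc;

-- Tokens extracted at index time, read from the index token table
-- DR$<index_name>$I. Useful for confirming the DOCX was filtered to text.
select distinct token_text
from   dr$docs_idx$i
order  by token_text;
```

If the row for test.docx comes back from the first query, the DOCX file was filtered and indexed. If it does not, CTX_USER_INDEX_ERRORS is the first place to look.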