org.apache.lenya.lucene.html
Class HtmlDocument

java.lang.Object
  extended by org.apache.lenya.lucene.html.HtmlDocument

public class HtmlDocument
extends java.lang.Object

The HtmlDocument class creates a Lucene Document from an HTML document.

It does this by using the JTidy package. It can take input from a File or an InputStream.
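
The sketch below illustrates the general shape described above: HTML goes in as an InputStream, and a title and body text come out through getTitle() and getBody(). It is not the Lenya implementation. HtmlTextSketch, its parse helper, the UTF-8 decoding, and the regex-based extraction are all assumptions made so the example runs with the JDK alone; the real class delegates parsing to JTidy, which copes with malformed markup that this stand-in does not.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in mirroring the getTitle()/getBody() shape of HtmlDocument. */
final class HtmlTextSketch {
    // Crude regex extraction, used here only in place of a real parser such as JTidy.
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern BODY =
        Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final String title;
    private final String body;

    /** Reads the whole stream, decodes it as UTF-8 (an assumption), and extracts the text. */
    HtmlTextSketch(InputStream is) throws IOException {
        this(new String(is.readAllBytes(), StandardCharsets.UTF_8));
    }

    private HtmlTextSketch(String html) {
        this.title = textOf(firstGroup(TITLE, html));
        this.body = textOf(firstGroup(BODY, html));
    }

    /** Convenience factory for in-memory HTML; not part of the documented API. */
    static HtmlTextSketch parse(String html) {
        return new HtmlTextSketch(html);
    }

    String getTitle() {
        return title;
    }

    String getBody() {
        return body;
    }

    /** Returns the first capture group of the pattern, or an empty string if absent. */
    private static String firstGroup(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : "";
    }

    /** Replaces tags with spaces, then collapses runs of whitespace. */
    private static String textOf(String fragment) {
        return fragment.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Hello</title></head>"
            + "<body><p>Hi <b>there</b></p></body></html>";
        InputStream is = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        HtmlTextSketch doc = new HtmlTextSketch(is);
        System.out.println(doc.getTitle());
        System.out.println(doc.getBody());
    }
}
```

With the Lenya, Lucene, and JTidy jars on the classpath, the equivalent call would presumably be new HtmlDocument(is) followed by getTitle() and getBody().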


Constructor Summary
HtmlDocument(java.io.File file)
          Constructs an HtmlDocument from a File.
HtmlDocument(java.io.InputStream is)
          Constructs an HtmlDocument from an InputStream.
 
Method Summary
static org.apache.lucene.document.Document Document(java.io.File file)
          Creates a Lucene Document from a File.
 java.lang.String getBody()
          Gets the body text attribute of the HtmlDocument object.
static org.apache.lucene.document.Document getDocument(java.io.InputStream is)
          Creates a Lucene Document from an InputStream.
 java.lang.String getTitle()
          Gets the title attribute of the HtmlDocument object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlDocument

public HtmlDocument(java.io.File file)
             throws java.io.IOException
Constructs an HtmlDocument from a File.

Parameters:
file - the File containing the HTML to parse
Throws:
java.io.IOException - if an I/O exception occurs

HtmlDocument

public HtmlDocument(java.io.InputStream is)
             throws java.io.IOException
Constructs an HtmlDocument from an InputStream.

Parameters:
is - the InputStream containing the HTML
Throws:
java.io.IOException - if an I/O exception occurs
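
Both constructors declare java.io.IOException, so callers need to handle it whichever input form they choose. The following is a minimal sketch of that calling pattern using only the JDK. InputSourcesSketch, consume, and describe are hypothetical names, and consume merely counts bytes where the real code would construct an HtmlDocument. Whether HtmlDocument closes a stream passed to it is not stated here, so the sketch closes the stream itself.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class InputSourcesSketch {
    /** Hypothetical placeholder for the parsing step; it only counts bytes. */
    static int consume(InputStream is) throws IOException {
        // With the real class on the classpath this would be: new HtmlDocument(is)
        return is.readAllBytes().length;
    }

    /** Opens the file, hands the stream to consume(), and reports any I/O failure. */
    static String describe(File file) {
        // try-with-resources closes the stream even if consume() throws.
        try (InputStream is = new FileInputStream(file)) {
            return "read " + consume(is) + " bytes";
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException, so it lands here too.
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // File input: a missing file surfaces as an IOException.
        System.out.println(describe(new File("no-such-file.html")));

        // Stream input: any InputStream works, including an in-memory one.
        byte[] html = "<html><body>hi</body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println("read " + consume(new ByteArrayInputStream(html)) + " bytes");
    }
}
```
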
Method Detail

getDocument

public static org.apache.lucene.document.Document getDocument(java.io.InputStream is)
                                                       throws java.io.IOException
Creates a Lucene Document from an InputStream.

Parameters:
is - the InputStream containing the HTML
Returns:
the Lucene Document created from the stream
Throws:
java.io.IOException - if an I/O exception occurs

Document

public static org.apache.lucene.document.Document Document(java.io.File file)
                                                    throws java.io.IOException
Creates a Lucene Document from a File.

Parameters:
file - the File containing the HTML to parse
Returns:
the Lucene Document created from the file
Throws:
java.io.IOException - if an I/O exception occurs

getTitle

public java.lang.String getTitle()
Gets the title attribute of the HtmlDocument object.

Returns:
the title value

getBody

public java.lang.String getBody()
Gets the body text attribute of the HtmlDocument object.

Returns:
the body text value


Copyright © 1999-2005 Apache Software Foundation. All Rights Reserved.