java.lang.Object
  org.apache.lenya.lucene.html.HtmlDocument
public class HtmlDocument
The HtmlDocument class creates a Lucene Document from an HTML document.
It does this by using the JTidy package. It can take input from a File or an InputStream.
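
To illustrate what the title and body text attributes amount to, here is a minimal, JDK-only sketch. It is not the Lenya implementation: `HtmlTextSketch` is a hypothetical stand-in that only mirrors the shape of the InputStream constructor, `getTitle()` and `getBody()`. It swaps JTidy for simple regular expressions, which are far less tolerant of malformed HTML, and it does not build a Lucene Document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in mirroring the shape of HtmlDocument; not the Lenya class. */
class HtmlTextSketch {
    // Assumption: regex matching replaces the JTidy parse used by the real class.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final String title;
    private final String body;

    /** Reads the whole stream as UTF-8 (an assumption) and extracts both attributes. */
    HtmlTextSketch(InputStream is) throws IOException {
        String html = new String(is.readAllBytes(), StandardCharsets.UTF_8);
        this.title = textOf(TITLE, html);
        this.body = textOf(BODY, html);
    }

    /** Returns the text inside the first match, with tags stripped and whitespace collapsed. */
    private static String textOf(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        if (!m.find()) {
            return ""; // element missing: return an empty string rather than null
        }
        return m.group(1).replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    String getTitle() {
        return title;
    }

    String getBody() {
        return body;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Sample page</title></head>"
                + "<body><h1>Greeting</h1><p>Hello, world.</p></body></html>";
        HtmlTextSketch doc = new HtmlTextSketch(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getTitle());
        System.out.println(doc.getBody());
    }
}
```

With the real class, the corresponding calls are the ones documented on this page: construct an HtmlDocument from a File or an InputStream and read `getTitle()` and `getBody()`, or use the static `Document(java.io.File)` and `getDocument(java.io.InputStream)` methods to obtain a Lucene Document directly.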
| Constructor Summary | |
|---|---|
| HtmlDocument(java.io.File file) | Constructs an HtmlDocument from a File. |
| HtmlDocument(java.io.InputStream is) | Constructs an HtmlDocument from an InputStream. |

| Method Summary | | |
|---|---|---|
| static org.apache.lucene.document.Document | Document(java.io.File file) | Creates a Lucene Document from a File. |
| java.lang.String | getBody() | Gets the body text attribute of the HtmlDocument object. |
| static org.apache.lucene.document.Document | getDocument(java.io.InputStream is) | Creates a Lucene Document from an InputStream. |
| java.lang.String | getTitle() | Gets the title attribute of the HtmlDocument object. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|

public HtmlDocument(java.io.File file)
    throws java.io.IOException

Constructs an HtmlDocument from a File.

Parameters:
file - the File containing the HTML to parse
Throws:
java.io.IOException - if an I/O exception occurs

public HtmlDocument(java.io.InputStream is)
    throws java.io.IOException

Constructs an HtmlDocument from an InputStream.

Parameters:
is - the InputStream containing the HTML
Throws:
java.io.IOException - if an I/O exception occurs

| Method Detail |
|---|
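
public static org.apache.lucene.document.Document getDocument(java.io.InputStream is)
    throws java.io.IOException

Creates a Lucene Document from an InputStream.

Parameters:
is - the InputStream containing the HTML
Throws:
java.io.IOException - if an I/O exception occurs

public static org.apache.lucene.document.Document Document(java.io.File file)
    throws java.io.IOException

Creates a Lucene Document from a File.

Parameters:
file - the File containing the HTML to parse
Throws:
java.io.IOException - if an I/O exception occurs

public java.lang.String getTitle()

Gets the title attribute of the HtmlDocument object.

public java.lang.String getBody()

Gets the body text attribute of the HtmlDocument object.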