org.apache.lenya.cms.cocoon.generation
Class LinkStatusGenerator

java.lang.Object
  extended by org.apache.avalon.framework.logger.AbstractLogEnabled
      extended by org.apache.cocoon.xml.AbstractXMLProducer
          extended by org.apache.cocoon.generation.AbstractGenerator
              extended by org.apache.cocoon.generation.ServiceableGenerator
                  extended by org.apache.lenya.cms.cocoon.generation.LinkStatusGenerator
All Implemented Interfaces:
org.apache.avalon.excalibur.pool.Poolable, org.apache.avalon.excalibur.pool.Recyclable, org.apache.avalon.framework.activity.Disposable, org.apache.avalon.framework.component.Component, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.logger.LogEnabled, org.apache.avalon.framework.service.Serviceable, org.apache.cocoon.generation.Generator, org.apache.cocoon.sitemap.SitemapModelComponent, org.apache.cocoon.xml.XMLProducer

public class LinkStatusGenerator
extends org.apache.cocoon.generation.ServiceableGenerator
implements org.apache.avalon.excalibur.pool.Recyclable, org.apache.avalon.framework.configuration.Configurable

Generates a list of the links reachable from the src, together with the status of each link.

  <map:generator name="linkStatus" src="org.apache.lenya.cms.cocoon.generation.LinkStatusGenerator"/>

   <map:generate type="linkStatus" src="/{pubid}/{area}/{doc-id}.html">
      <map:parameter name="depth" value="1"/>
   </map:generate>
 


Field Summary
static java.lang.String ACCEPT_CONFIG
          Config element name specifying the HTTP Accept header value.
static java.lang.String ACCEPT_DEFAULT
          Default value of accept configuration value.
protected  org.xml.sax.helpers.AttributesImpl attributes
           
protected static java.lang.String CONTENT_ATTR_NAME
           
protected  int depth
          The depth parameter determines how deep the generator delves when following links.
static java.lang.String EXCLUDE_CONFIG
          Config element name specifying excluding regular expression pattern.
protected static java.lang.String HREF_ATTR_NAME
           
static java.lang.String INCLUDE_CONFIG
          Config element name specifying including regular expression pattern.
protected  org.apache.excalibur.source.Source inputSource
           
static java.lang.String LINK_CONTENT_TYPE_CONFIG
          Config element name specifying the expected link content type.
 java.lang.String LINK_CONTENT_TYPE_DEFAULT
          Default value of link-content-type configuration value.
protected static java.lang.String LINK_NODE_NAME
           
static java.lang.String LINK_VIEW_QUERY_CONFIG
          Config element name specifying the query string appended when requesting the links of a URL.
static java.lang.String LINK_VIEW_QUERY_DEFAULT
          Default value of link-view-query configuration value.
protected static java.lang.String MESSAGE_ATTR_NAME
           
protected static java.lang.String PREFIX
          The namespace prefix for this namespace.
protected static java.lang.String REFERRER_ATTR_NAME
           
protected static java.lang.String STATUS_ATTR_NAME
           
protected static java.lang.String TOP_NODE_NAME
           
protected static java.lang.String URI
          The URI of the namespace of this generator.
static java.lang.String USER_AGENT_CONFIG
          Config element name specifying the HTTP User-Agent header value.
static java.lang.String USER_AGENT_DEFAULT
          Default value of user-agent configuration value.
 
Fields inherited from class org.apache.cocoon.generation.ServiceableGenerator
manager
 
Fields inherited from class org.apache.cocoon.generation.AbstractGenerator
objectModel, parameters, resolver, source
 
Fields inherited from class org.apache.cocoon.xml.AbstractXMLProducer
contentHandler, EMPTY_CONTENT_HANDLER, lexicalHandler, xmlConsumer
 
Fields inherited from interface org.apache.cocoon.generation.Generator
ROLE
 
Constructor Summary
LinkStatusGenerator()
           
 
Method Summary
 void configure(org.apache.avalon.framework.configuration.Configuration configuration)
          Configure the crawler component.
 void generate()
          Generate XML data.
protected  java.util.List getLinksFromConnection(java.lang.String url_link_string, java.lang.String url_of_referrer, int referrerDepth)
          Retrieves the list of links of a URL.
protected  java.lang.String processURL(java.lang.String uri, java.lang.String referrer, int referrerDepth)
          Generates the XML attributes of a URL and calculates the URL for retrieving its links.
 void recycle()
           
 void setup(org.apache.cocoon.environment.SourceResolver resolver, java.util.Map objectModel, java.lang.String src, org.apache.avalon.framework.parameters.Parameters par)
           
 
Methods inherited from class org.apache.cocoon.generation.ServiceableGenerator
dispose, service
 
Methods inherited from class org.apache.cocoon.xml.AbstractXMLProducer
setConsumer, setContentHandler, setLexicalHandler
 
Methods inherited from class org.apache.avalon.framework.logger.AbstractLogEnabled
enableLogging, getLogger, setupLogger, setupLogger, setupLogger
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.cocoon.xml.XMLProducer
setConsumer
 

Field Detail

URI

protected static final java.lang.String URI
The URI of the namespace of this generator.

See Also:
Constant Field Values

PREFIX

protected static final java.lang.String PREFIX
The namespace prefix for this namespace.

See Also:
Constant Field Values

TOP_NODE_NAME

protected static final java.lang.String TOP_NODE_NAME
See Also:
Constant Field Values

LINK_NODE_NAME

protected static final java.lang.String LINK_NODE_NAME
See Also:
Constant Field Values

HREF_ATTR_NAME

protected static final java.lang.String HREF_ATTR_NAME
See Also:
Constant Field Values

REFERRER_ATTR_NAME

protected static final java.lang.String REFERRER_ATTR_NAME
See Also:
Constant Field Values

CONTENT_ATTR_NAME

protected static final java.lang.String CONTENT_ATTR_NAME
See Also:
Constant Field Values

STATUS_ATTR_NAME

protected static final java.lang.String STATUS_ATTR_NAME
See Also:
Constant Field Values

MESSAGE_ATTR_NAME

protected static final java.lang.String MESSAGE_ATTR_NAME
See Also:
Constant Field Values

attributes

protected org.xml.sax.helpers.AttributesImpl attributes

LINK_CONTENT_TYPE_CONFIG

public static final java.lang.String LINK_CONTENT_TYPE_CONFIG
Config element name specifying the expected link content type.

Its value is link-content-type.

See Also:
Constant Field Values

LINK_CONTENT_TYPE_DEFAULT

public final java.lang.String LINK_CONTENT_TYPE_DEFAULT
Default value of link-content-type configuration value.

Its value is application/x-cocoon-links.

See Also:
Constant Field Values

LINK_VIEW_QUERY_CONFIG

public static final java.lang.String LINK_VIEW_QUERY_CONFIG
Config element name specifying the query string appended when requesting the links of a URL.

Its value is link-view-query.

See Also:
Constant Field Values

LINK_VIEW_QUERY_DEFAULT

public static final java.lang.String LINK_VIEW_QUERY_DEFAULT
Default value of link-view-query configuration value.

Its value is ?cocoon-view=links.

See Also:
Constant Field Values
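To illustrate how the link-view query could be combined with a page URL, here is a minimal sketch. The class LinkViewUrl is hypothetical, and its handling of URLs that already carry a query string (the leading ? becomes &) is an assumption, not something this page documents.

```java
// Hypothetical helper, not part of LinkStatusGenerator: builds the URL that
// requests a page's Cocoon links view.
final class LinkViewUrl {
    // Mirrors the documented default of the link-view-query configuration value.
    static final String LINK_VIEW_QUERY_DEFAULT = "?cocoon-view=links";

    private LinkViewUrl() {
    }

    // Appends linkViewQuery to url. Assumption: if url already has a query
    // string, the leading '?' of linkViewQuery is replaced by '&'.
    static String toLinkViewUrl(String url, String linkViewQuery) {
        if (url == null || linkViewQuery == null) {
            throw new IllegalArgumentException("url and linkViewQuery must not be null");
        }
        if (url.indexOf('?') >= 0 && linkViewQuery.startsWith("?")) {
            return url + "&" + linkViewQuery.substring(1);
        }
        return url + linkViewQuery;
    }

    public static void main(String[] args) {
        System.out.println(toLinkViewUrl("http://host/foo/bar", LINK_VIEW_QUERY_DEFAULT));
        System.out.println(toLinkViewUrl("http://host/foo/bar?lang=en", LINK_VIEW_QUERY_DEFAULT));
    }
}
```

Run as shown, main prints http://host/foo/bar?cocoon-view=links followed by http://host/foo/bar?lang=en&cocoon-view=links.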

EXCLUDE_CONFIG

public static final java.lang.String EXCLUDE_CONFIG
Config element name specifying excluding regular expression pattern.

Its value is exclude.

See Also:
Constant Field Values

INCLUDE_CONFIG

public static final java.lang.String INCLUDE_CONFIG
Config element name specifying including regular expression pattern.

Its value is include.

See Also:
Constant Field Values

USER_AGENT_CONFIG

public static final java.lang.String USER_AGENT_CONFIG
Config element name specifying the HTTP User-Agent header value.

Its value is user-agent.

See Also:
Constant Field Values

USER_AGENT_DEFAULT

public static final java.lang.String USER_AGENT_DEFAULT
Default value of user-agent configuration value.

See Also:
Constants.COMPLETE_NAME

ACCEPT_CONFIG

public static final java.lang.String ACCEPT_CONFIG
Config element name specifying the HTTP Accept header value.

Its value is accept.

See Also:
Constant Field Values

ACCEPT_DEFAULT

public static final java.lang.String ACCEPT_DEFAULT
Default value of accept configuration value.

Its value is */*.

See Also:
Constant Field Values
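A minimal sketch of how the user-agent and accept values could be applied as request headers on a java.net.URLConnection, falling back to defaults when they are not configured. CrawlHeaders and prepare are hypothetical names; the user-agent default is passed in as a parameter because the value of Constants.COMPLETE_NAME is not given on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper: applies the user-agent and accept configuration values
// as HTTP request headers.
final class CrawlHeaders {
    // Documented default of the accept configuration value.
    static final String ACCEPT_DEFAULT = "*/*";

    private CrawlHeaders() {
    }

    // Opens a connection to url (no request is sent yet) and sets the headers.
    // userAgentDefault stands in for Constants.COMPLETE_NAME, whose value is not shown here.
    static URLConnection prepare(String url, String userAgent, String accept, String userAgentDefault) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", userAgent != null ? userAgent : userAgentDefault);
            conn.setRequestProperty("Accept", accept != null ? accept : ACCEPT_DEFAULT);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection conn = prepare("http://localhost/", null, null, "ExampleAgent");
        System.out.println(conn.getRequestProperty("User-Agent"));
        System.out.println(conn.getRequestProperty("Accept"));
    }
}
```

Because URL.openConnection() does not send a request until the connection is actually used, the headers can be inspected with getRequestProperty without any network access.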

depth

protected int depth
The depth parameter determines how deep the generator delves when following links.
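The effect of depth can be pictured as a breadth-first traversal that stops following links once it is depth steps away from src. The sketch below assumes that depth 1 means "src plus the pages it links to directly"; DepthLimitedCrawl and the linksOf function (a stand-in for requesting a page's links view) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch of a depth-limited, breadth-first link traversal.
final class DepthLimitedCrawl {
    private DepthLimitedCrawl() {
    }

    // One reachable URL, the page that referred to it, and its distance from the start.
    record Visit(String url, String referrer, int depth) {
    }

    static List<Visit> crawl(String start, int maxDepth, Function<String, List<String>> linksOf) {
        List<Visit> visits = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // prevents revisiting pages reached through cycles
        Deque<Visit> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(new Visit(start, null, 0));
        while (!queue.isEmpty()) {
            Visit current = queue.poll();
            visits.add(current);
            if (current.depth() >= maxDepth) {
                continue; // report this URL, but do not follow its links
            }
            for (String link : linksOf.apply(current.url())) {
                if (seen.add(link)) {
                    queue.add(new Visit(link, current.url(), current.depth() + 1));
                }
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
                "/a", List.of("/b", "/c"),
                "/b", List.of("/d"),
                "/c", List.of("/a"),
                "/d", List.of());
        for (Visit v : crawl("/a", 1, u -> site.getOrDefault(u, List.of()))) {
            System.out.println(v);
        }
    }
}
```

With depth 1 the sample site yields /a, /b and /c; /d is two links away from /a and is only reached with depth 2.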


inputSource

protected org.apache.excalibur.source.Source inputSource

Constructor Detail

LinkStatusGenerator

public LinkStatusGenerator()

Method Detail

configure

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
               throws org.apache.avalon.framework.configuration.ConfigurationException
Configure the crawler component.

The configuration can specify which URIs to include in, and which to exclude from, crawling. The patterns are specified as regular expressions.

Moreover, you can configure the content type expected from crawling requests and the query string appended to each crawling request.


 <include>.*\.html?</include> or <include>.*\.html?, .*\.xsp</include>
 <exclude>.*\.gif</exclude> or <exclude>.*\.gif, .*\.jpe?g</exclude>
 <link-content-type> application/x-cocoon-links </link-content-type>
 <link-view-query> ?cocoon-view=links </link-view-query>
 <user-agent> Cocoon </user-agent>
 <accept> text/xml </accept>
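The include and exclude values in the example are comma-separated lists of regular expressions. Below is a minimal sketch of parsing and applying them with java.util.regex, assuming a URI is crawled when it matches some include pattern (or none are configured) and no exclude pattern. UrlFilter is a hypothetical name, and the acceptance rule is an assumption rather than this class's verified logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter built from the include and exclude configuration values.
final class UrlFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    UrlFilter(String includeConfig, String excludeConfig) {
        this.includes = parse(includeConfig);
        this.excludes = parse(excludeConfig);
    }

    // Splits a comma-separated list such as ".*\.gif, .*\.jpe?g" into compiled patterns.
    static List<Pattern> parse(String config) {
        List<Pattern> patterns = new ArrayList<>();
        if (config == null) {
            return patterns;
        }
        for (String part : config.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    // Assumed rule: accept if the URI fully matches an include pattern
    // (or no includes are configured) and matches no exclude pattern.
    boolean accepts(String uri) {
        boolean included = includes.isEmpty()
                || includes.stream().anyMatch(p -> p.matcher(uri).matches());
        boolean excluded = excludes.stream().anyMatch(p -> p.matcher(uri).matches());
        return included && !excluded;
    }
}
```

Splitting on commas means a pattern that itself contains a comma, such as \d{1,3}, would be broken apart; the sketch does not handle that case.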
 

Specified by:
configure in interface org.apache.avalon.framework.configuration.Configurable
Parameters:
configuration - XML configuration of this avalon component.
Throws:
org.apache.avalon.framework.configuration.ConfigurationException - if the configuration is invalid.

setup

public void setup(org.apache.cocoon.environment.SourceResolver resolver,
                  java.util.Map objectModel,
                  java.lang.String src,
                  org.apache.avalon.framework.parameters.Parameters par)
           throws org.apache.cocoon.ProcessingException,
                  org.xml.sax.SAXException,
                  java.io.IOException
Specified by:
setup in interface org.apache.cocoon.sitemap.SitemapModelComponent
Overrides:
setup in class org.apache.cocoon.generation.AbstractGenerator
Throws:
org.apache.cocoon.ProcessingException
org.xml.sax.SAXException
java.io.IOException

generate

public void generate()
              throws org.xml.sax.SAXException,
                     org.apache.cocoon.ProcessingException
Generate XML data.

Specified by:
generate in interface org.apache.cocoon.generation.Generator
Throws:
org.xml.sax.SAXException - if an error occurs while outputting the document
org.apache.cocoon.ProcessingException - if the requested URI was not found
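Judging from the LINK_NODE_NAME and *_ATTR_NAME fields, generate() presumably reports each link as an element carrying href, referrer, status and message attributes. The sketch below emits such an element as SAX events through a JAXP identity TransformerHandler so the result can be printed. The namespace URI, prefix, and element and attribute names are assumptions modelled on the field names; the real values are listed under Constant Field Values. Inside Cocoon the events would go to the pipeline's content handler instead of a serializer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class LinkStatusXml {
    // Assumed values; the generator's actual URI and PREFIX constants may differ.
    static final String NS = "http://apache.org/cocoon/linkstatus/2.0";
    static final String PREFIX = "linkstatus";

    private LinkStatusXml() {
    }

    // Emits one link element inside a top-level element as SAX events and
    // serializes them to a string. referrer and message may be null.
    static String render(String href, String referrer, String status, String message) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
                throw new IllegalStateException("TransformerFactory does not support SAX input");
            }
            TransformerHandler handler = ((SAXTransformerFactory) tf).newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "href", "href", "CDATA", href);
            if (referrer != null) {
                attrs.addAttribute("", "referrer", "referrer", "CDATA", referrer);
            }
            attrs.addAttribute("", "status", "status", "CDATA", status);
            if (message != null) {
                attrs.addAttribute("", "message", "message", "CDATA", message);
            }

            handler.startDocument();
            handler.startPrefixMapping(PREFIX, NS);
            handler.startElement(NS, "linkstatus", PREFIX + ":linkstatus", new AttributesImpl());
            handler.startElement(NS, "link", PREFIX + ":link", attrs);
            handler.endElement(NS, "link", PREFIX + ":link");
            handler.endElement(NS, "linkstatus", PREFIX + ":linkstatus");
            handler.endPrefixMapping(PREFIX);
            handler.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A call such as LinkStatusXml.render("http://host/a.html", "http://host/", "200", null) returns a small document whose link element carries the href, referrer and status attributes and omits message.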

getLinksFromConnection

protected java.util.List getLinksFromConnection(java.lang.String url_link_string,
                                                java.lang.String url_of_referrer,
                                                int referrerDepth)
Retrieves the list of links of a URL.

Parameters:
url_link_string - URL for requesting links; url_link_string is assumed to query the Cocoon links view, i.e. to be of the form http://host/foo/bar?cocoon-view=links
url_of_referrer - base URL whose links are requested, i.e. of the form http://host/foo/bar
Returns:
the list of links of url_of_referrer, as returned by requesting url_link_string
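A minimal sketch of the parsing step described above, assuming the links view answers with the application/x-cocoon-links content type and one link per line, and that relative links are resolved against url_of_referrer. LinksViewParser is hypothetical, and it reads from a java.io.Reader rather than an open connection so that it can run offline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class LinksViewParser {
    // Documented default of the link-content-type configuration value.
    static final String LINK_CONTENT_TYPE_DEFAULT = "application/x-cocoon-links";

    private LinksViewParser() {
    }

    // Returns the absolute links listed in body, or an empty list if the
    // response does not have the expected links-view content type.
    static List<String> parse(String contentType, Reader body, String urlOfReferrer) {
        List<String> links = new ArrayList<>();
        if (contentType == null) {
            return links;
        }
        // Ignore parameters such as "; charset=UTF-8" when comparing the MIME type.
        int semi = contentType.indexOf(';');
        String mime = (semi >= 0 ? contentType.substring(0, semi) : contentType).trim();
        if (!mime.equalsIgnoreCase(LINK_CONTENT_TYPE_DEFAULT)) {
            return links;
        }
        URI base = URI.create(urlOfReferrer);
        try (BufferedReader reader = new BufferedReader(body)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String link = line.trim();
                if (!link.isEmpty()) {
                    links.add(base.resolve(link).toString()); // relative links become absolute
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

For a referrer of http://host/foo/bar, a line baz.html resolves to http://host/foo/baz.html and /top.html resolves to http://host/top.html, while absolute links pass through unchanged.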

processURL

protected java.lang.String processURL(java.lang.String uri,
                                      java.lang.String referrer,
                                      int referrerDepth)
                               throws org.xml.sax.SAXException
Generates the XML attributes of a URL and calculates the URL for retrieving its links.

Parameters:
uri - the URI to process
referrer - the referrer of the URI
Returns:
the URL for retrieving links, or null if the URI is an excluded URL and not an included URL.
Throws:
org.xml.sax.SAXException

recycle

public void recycle()
Specified by:
recycle in interface org.apache.avalon.excalibur.pool.Recyclable
Overrides:
recycle in class org.apache.cocoon.generation.AbstractGenerator


Copyright $ Apache Software Foundation. All Rights Reserved.