Class WikiConnector

  • All Implemented Interfaces:
    org.apache.manifoldcf.core.interfaces.IConnector, org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector

    public class WikiConnector
    extends org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
    This is the repository connector for a wiki.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String _rcsid  
      protected java.lang.String accessPassword  
      protected java.lang.String accessRealm  
      protected java.lang.String accessUser  
      protected static java.lang.String[] activitiesList
      Activities list
      protected static java.lang.String ACTIVITY_FETCH
      Fetch activity
      protected java.lang.String baseURL
      Base URL
      protected org.apache.http.conn.HttpClientConnectionManager connectionManager
      Connection management
      protected boolean hasBeenSetup
      Has setup been called?
      protected org.apache.http.client.HttpClient httpClient  
      protected java.lang.String proxyDomain  
      protected java.lang.String proxyHost  
      protected java.lang.String proxyPassword  
      protected java.lang.String proxyPort  
      protected java.lang.String proxyUsername  
      protected java.lang.String server
      Server name
      protected java.lang.String serverDomain  
      protected java.lang.String serverLogin  
      protected java.lang.String serverPass  
      protected java.lang.String userAgent
      The user-agent for this connector instance
      • Fields inherited from class org.apache.manifoldcf.core.connector.BaseConnector

        currentContext, params
      • Fields inherited from interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector

        GLOBAL_DENY_TOKEN, JOBMODE_CONTINUOUS, JOBMODE_ONCEONLY, MODEL_ADD, MODEL_ADD_CHANGE, MODEL_ADD_CHANGE_DELETE, MODEL_ALL, MODEL_CHAINED_ADD, MODEL_CHAINED_ADD_CHANGE, MODEL_CHAINED_ADD_CHANGE_DELETE, MODEL_PARTIAL
    • Constructor Summary

      Constructors 
      Constructor Description
      WikiConnector()
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String addSeedDocuments​(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities, org.apache.manifoldcf.core.interfaces.Specification spec, java.lang.String lastSeedVersion, long seedTime, int jobMode)
      Queue "seed" documents.
      java.lang.String check()
      Check status of connection.
      void connect​(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters)
      Connect.
      void disconnect()
      Close the connection.
      protected java.lang.String executeListPagesViaThread​(java.lang.String startPageTitle, java.lang.String namespace, java.lang.String prefix, org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities)
      Execute a listPages() operation via a thread.
      protected static java.lang.String[] getAcls​(org.apache.manifoldcf.core.interfaces.Specification spec)
      Get the forced ACLs from the document specification.
      java.lang.String[] getActivitiesList()
      List the activities we might report on.
      java.lang.String[] getBinNames​(java.lang.String documentIdentifier)
      For any given document, list the bins that it is a member of.
      protected java.lang.String getCheckURL()
      Get a URL for a check operation.
      protected void getDocInfo​(java.lang.String documentIdentifier, java.lang.String documentVersion, java.lang.String fullURL, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, java.lang.String[] allowACL)
      Get document info and index the document.
      protected void getDocURLs​(java.lang.String[] documentIdentifiers, java.util.Map<java.lang.String,​java.lang.String> urls)  
      protected java.lang.String getGetDocInfoURL​(java.lang.String documentIdentifier)
      Create a URL to obtain a page's metadata and content, given the page ID.
      protected java.lang.String getGetDocURLsURL​(java.lang.String[] documentIdentifiers)
      Create a URL to obtain the URLs of multiple pages, given the page IDs.
      protected java.lang.String getGetNamespacesURL()
      Create a URL to obtain the namespaces.
      protected java.lang.String getGetTimestampURL​(java.lang.String[] documentIdentifiers)
      Create a URL to obtain the timestamps of multiple pages, given the page IDs.
      protected org.apache.http.client.methods.HttpRequestBase getInitializedGetMethod​(java.lang.String URL)
      Create and initialize an HttpRequestBase
      protected org.apache.http.client.methods.HttpRequestBase getInitializedPostMethod​(java.lang.String URL, java.util.Map<java.lang.String,​java.lang.String> params)
      Create and initialize a post method.
      protected java.lang.String getListPagesURL​(java.lang.String startingTitle, java.lang.String namespace, java.lang.String prefix)
      Create a URL to obtain the next 500 pages.
      int getMaxDocumentRequest()
      Get the maximum number of documents to amalgamate together into one batch, for this connector.
      protected void getNamespaces​(java.util.Map<java.lang.String,​java.lang.String> namespaces)
      Obtain the set of namespaces, as a map keyed by the canonical namespace name where the value is the descriptive name.
      protected void getSession()  
      protected void getTimestamps​(java.lang.String[] documentIdentifiers, java.util.Map<java.lang.String,​java.lang.String> versions, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities)
      Obtain document versions for a set of documents.
      protected static void handleException​(java.lang.Throwable thr)  
      protected void listAllPages​(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities, java.lang.String namespace, java.lang.String prefix, long startTime, long endTime)
      Perform a series of listPages() operations, so that we fully obtain the documents we're looking for even though we're limited to 500 of them per request.
      protected boolean loginToAPI()
      Log in via the Wiki API.
      void outputConfigurationBody​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.lang.String tabName)
      Output the configuration body section.
      void outputConfigurationHeader​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.util.List<java.lang.String> tabsArray)
      Output the configuration header section.
      void outputSpecificationBody​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName)
      Output the specification body section.
      void outputSpecificationHeader​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray)
      Output the specification header section.
      protected static boolean parseCheckResponse​(java.io.InputStream is)
      Parse check response, e.g.:
      protected static boolean parseGetDocURLsResponse​(java.io.InputStream is, java.util.Map<java.lang.String,​java.lang.String> urls)
      This method parses a response like the following:
      protected static boolean parseGetTimestampResponse​(java.io.InputStream is, java.util.Map<java.lang.String,​java.lang.String> versions)
      This method parses a response like the following:
      protected static boolean parseListPagesResponse​(java.io.InputStream is, org.apache.manifoldcf.connectorcommon.common.XThreadStringBuffer buffer, java.lang.String startPageTitle, WikiConnector.ReturnString lastTitle)
      Parse list output, e.g.:
      protected void performCheck()
      Do the check operation.
      void poll()
      This method is periodically called for all connectors that are connected but not in active use.
      java.lang.String processConfigurationPost​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
      Process a configuration post.
      void processDocuments​(java.lang.String[] documentIdentifiers, org.apache.manifoldcf.crawler.interfaces.IExistingVersions statuses, org.apache.manifoldcf.core.interfaces.Specification spec, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, int jobMode, boolean usesDefaultAuthority)
      Process a set of documents.
      java.lang.String processSpecificationPost​(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber)
      Process a specification post.
      protected static java.lang.String readResponseAsString​(org.apache.http.HttpResponse httpResponse)  
      void viewConfiguration​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
      View configuration.
      void viewSpecification​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber)
      View specification.
      • Methods inherited from class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector

        getConnectorModel, getFormCheckJavascriptMethodName, getFormPresaveCheckJavascriptMethodName, getRelationshipTypes, requestInfo
      • Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector

        clearThreadContext, deinstall, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationHeader, outputConfigurationHeader, pack, packFixedList, packList, packList, processConfigurationPost, setThreadContext, unpack, unpackFixedList, unpackList, viewConfiguration
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface org.apache.manifoldcf.core.interfaces.IConnector

        clearThreadContext, deinstall, getConfiguration, install, isConnected, setThreadContext
    • Field Detail

      • ACTIVITY_FETCH

        protected static final java.lang.String ACTIVITY_FETCH
        Fetch activity
        See Also:
        Constant Field Values
      • activitiesList

        protected static final java.lang.String[] activitiesList
        Activities list
      • hasBeenSetup

        protected boolean hasBeenSetup
        Has setup been called?
      • server

        protected java.lang.String server
        Server name
      • baseURL

        protected java.lang.String baseURL
        Base URL
      • userAgent

        protected java.lang.String userAgent
        The user-agent for this connector instance
      • serverLogin

        protected java.lang.String serverLogin
      • serverPass

        protected java.lang.String serverPass
      • serverDomain

        protected java.lang.String serverDomain
      • accessRealm

        protected java.lang.String accessRealm
      • accessUser

        protected java.lang.String accessUser
      • accessPassword

        protected java.lang.String accessPassword
      • proxyHost

        protected java.lang.String proxyHost
      • proxyPort

        protected java.lang.String proxyPort
      • proxyDomain

        protected java.lang.String proxyDomain
      • proxyUsername

        protected java.lang.String proxyUsername
      • proxyPassword

        protected java.lang.String proxyPassword
      • connectionManager

        protected org.apache.http.conn.HttpClientConnectionManager connectionManager
        Connection management
      • httpClient

        protected org.apache.http.client.HttpClient httpClient
    • Constructor Detail

      • WikiConnector

        public WikiConnector()
        Constructor.
    • Method Detail

      • getActivitiesList

        public java.lang.String[] getActivitiesList()
        List the activities we might report on.
        Specified by:
        getActivitiesList in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        getActivitiesList in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
      • getBinNames

        public java.lang.String[] getBinNames​(java.lang.String documentIdentifier)
        For any given document, list the bins that it is a member of.
        Specified by:
        getBinNames in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        getBinNames in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
      • connect

        public void connect​(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters)
        Connect.
        Specified by:
        connect in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        connect in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        configParameters - is the set of configuration parameters, which in this case describe the target wiki server, basic auth configuration, etc.
      • getSession

        protected void getSession()
                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                  org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • loginToAPI

        protected boolean loginToAPI()
                              throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                     org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Log in via the Wiki API. Call this method whenever login is apparently needed.
        Returns:
        true if the login was successful, false otherwise.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
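The "call this method whenever login is apparently needed" contract suggests a retry-once pattern: attempt the request, log in if the response signals that authentication is required, then repeat the request. The following is a minimal sketch of that pattern only. The Attempt type, the withLogin helper, and the use of functional interfaces in place of real HTTP calls are all hypothetical and do not appear in WikiConnector.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical illustration; none of these names come from WikiConnector. */
class LoginRetrySketch {

  /** Outcome of one request attempt: a value, or a signal that login is required. */
  static final class Attempt<T> {
    final T value;
    final boolean loginNeeded;

    private Attempt(T value, boolean loginNeeded) {
      this.value = value;
      this.loginNeeded = loginNeeded;
    }

    static <T> Attempt<T> ok(T value) {
      return new Attempt<>(value, false);
    }

    static <T> Attempt<T> needsLogin() {
      return new Attempt<>(null, true);
    }
  }

  /**
   * Runs the request; if it reports that login is needed, logs in once
   * (login returns true on success, as loginToAPI() does) and retries once.
   */
  static <T> T withLogin(Supplier<Attempt<T>> request, BooleanSupplier login) {
    Attempt<T> first = request.get();
    if (!first.loginNeeded) {
      return first.value;
    }
    if (!login.getAsBoolean()) {
      throw new IllegalStateException("Login failed");
    }
    Attempt<T> second = request.get();
    if (second.loginNeeded) {
      throw new IllegalStateException("Login still required after a successful login");
    }
    return second.value;
  }
}
```

Limiting the retry to a single attempt avoids looping forever against a server that keeps rejecting the session.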
      • check

        public java.lang.String check()
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Check status of connection.
        Specified by:
        check in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        check in class org.apache.manifoldcf.core.connector.BaseConnector
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • poll

        public void poll()
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        This method is periodically called for all connectors that are connected but not in active use.
        Specified by:
        poll in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        poll in class org.apache.manifoldcf.core.connector.BaseConnector
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • disconnect

        public void disconnect()
                        throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Close the connection. Call this before discarding the connection.
        Specified by:
        disconnect in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        disconnect in class org.apache.manifoldcf.core.connector.BaseConnector
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • getMaxDocumentRequest

        public int getMaxDocumentRequest()
        Get the maximum number of documents to amalgamate together into one batch, for this connector.
        Specified by:
        getMaxDocumentRequest in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        getMaxDocumentRequest in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
        Returns:
        the maximum number. 0 indicates "unlimited".
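To illustrate what the return value means, here is a sketch of how a caller might split a list of document identifiers into batches that honor this limit, treating 0 as "unlimited". The BatchSketch class and toBatches method are hypothetical; the actual batching is done by the framework, and its implementation is not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of honoring a "max documents per request" value. */
class BatchSketch {

  /**
   * Splits ids into consecutive batches of at most maxPerBatch elements.
   * A value of 0 means "unlimited": all ids go into a single batch.
   */
  static List<String[]> toBatches(String[] ids, int maxPerBatch) {
    if (maxPerBatch < 0) {
      throw new IllegalArgumentException("maxPerBatch must be >= 0");
    }
    List<String[]> batches = new ArrayList<>();
    if (ids.length == 0) {
      return batches;
    }
    int size = (maxPerBatch == 0) ? ids.length : maxPerBatch;
    for (int start = 0; start < ids.length; start += size) {
      int end = Math.min(start + size, ids.length);
      batches.add(Arrays.copyOfRange(ids, start, end));
    }
    return batches;
  }
}
```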
      • addSeedDocuments

        public java.lang.String addSeedDocuments​(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities,
                                                 org.apache.manifoldcf.core.interfaces.Specification spec,
                                                 java.lang.String lastSeedVersion,
                                                 long seedTime,
                                                 int jobMode)
                                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                 org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Queue "seed" documents. Seed documents are the starting places for crawling activity. Documents are seeded when this method calls the appropriate methods in the passed-in ISeedingActivity object. This method can choose to find only those repository changes that happened during the specified time interval. The seeds recorded by this method will be interpreted by the framework based on what the getConnectorModel() method returns. It is not a big problem if the connector chooses to create more seeds than are strictly necessary; it is merely a question of overall work required. The end time and seeding version string passed to this method may be interpreted for greatest efficiency. For continuous crawling jobs, this method will be called once when the job starts, and then at various periodic intervals as the job executes. When a job's specification is changed, the framework automatically resets the seeding version string to null. The seeding version string may also be set to null on each job run, depending on the connector model returned by getConnectorModel(). Note that it is always acceptable for this method to seed more documents rather than fewer. The connector will be connected before this method can be called.
        Specified by:
        addSeedDocuments in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        addSeedDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
        Parameters:
        activities - is the interface this method should use to perform whatever framework actions are desired.
        spec - is a document specification (that comes from the job).
        seedTime - is the end of the time range of documents to consider, exclusive.
        lastSeedVersion - is the last seeding version string for this job, or null if the job has no previous seeding version string.
        jobMode - is an integer describing how the job is being run, whether continuous or once-only.
        Returns:
        an updated seeding version string, to be stored with the job.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
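One way a connector could use the seeding version string is to store the end time of the previous seeding pass in it, so that the next pass only considers documents changed since then. The sketch below shows that idea under stated assumptions: the SeedVersionSketch class, the map of modification times, and the Consumer standing in for ISeedingActivity are all hypothetical, and whether WikiConnector encodes its version string this way is not stated in this documentation.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical illustration of an incremental seeding pass. */
class SeedVersionSketch {

  /**
   * Seeds every document modified in [startTime, seedTime), where startTime
   * comes from the previous version string (or 0 if there is none).
   *
   * @param modifiedTimes   assumed repository view: document id to modification time (ms)
   * @param lastSeedVersion previous version string, or null on the first pass
   * @param seedTime        end of the time range, exclusive
   * @param addSeed         stand-in for the seeding activity's "add seed" call
   * @return the updated version string to store with the job
   */
  static String seed(Map<String, Long> modifiedTimes,
                     String lastSeedVersion,
                     long seedTime,
                     Consumer<String> addSeed) {
    long startTime = (lastSeedVersion == null) ? 0L : Long.parseLong(lastSeedVersion);
    for (Map.Entry<String, Long> entry : modifiedTimes.entrySet()) {
      long modified = entry.getValue();
      if (modified >= startTime && modified < seedTime) {
        addSeed.accept(entry.getKey());
      }
    }
    // The next pass starts where this one ended.
    return Long.toString(seedTime);
  }
}
```

Because a null version string falls back to a start time of 0, a reset by the framework simply causes everything to be seeded again, which is consistent with the rule that seeding too many documents is always safe.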
      • processDocuments

        public void processDocuments​(java.lang.String[] documentIdentifiers,
                                     org.apache.manifoldcf.crawler.interfaces.IExistingVersions statuses,
                                     org.apache.manifoldcf.core.interfaces.Specification spec,
                                     org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
                                     int jobMode,
                                     boolean usesDefaultAuthority)
                              throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                     org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Process a set of documents. This method should cause each document to be fetched and processed, with the results added to the queue of documents for the current job, entered into the incremental ingestion manager, or both. The document specification allows this class to filter what is done based on the job. The connector will be connected before this method can be called.
        Specified by:
        processDocuments in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        processDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
        Parameters:
        documentIdentifiers - is the set of document identifiers to process.
        statuses - are the currently-stored document versions for each document in the set of document identifiers passed in above.
        activities - is the interface this method should use to queue up new document references and ingest documents.
        jobMode - is an integer describing how the job is being run, whether continuous or once-only.
        usesDefaultAuthority - will be true only if the authority in use for these documents is the default one.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getAcls

        protected static java.lang.String[] getAcls​(org.apache.manifoldcf.core.interfaces.Specification spec)
        Get the forced ACLs from the document specification.
        Parameters:
        spec - is the document specification.
        Returns:
        the acls.
      • outputConfigurationHeader

        public void outputConfigurationHeader​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                              org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                              java.util.Locale locale,
                                              org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
                                              java.util.List<java.lang.String> tabsArray)
                                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                              java.io.IOException
        Output the configuration header section. This method is called in the head section of the connector's configuration page. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the configuration editing HTML.
        Specified by:
        outputConfigurationHeader in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        outputConfigurationHeader in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        out - is the output to which any HTML should be sent.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • outputConfigurationBody

        public void outputConfigurationBody​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                            org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                            java.util.Locale locale,
                                            org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
                                            java.lang.String tabName)
                                     throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                            java.io.IOException
        Output the configuration body section. This method is called in the body section of the connector's configuration page. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is "editconnection".
        Specified by:
        outputConfigurationBody in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        outputConfigurationBody in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        out - is the output to which any HTML should be sent.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        tabName - is the current tab name.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
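A common way to divide work between the header and body methods is for the header to register tab names and for the body to emit visible inputs only on the selected tab, carrying the values of other tabs as hidden inputs so that they survive the form post. The following is a minimal sketch of that pattern, not the WikiConnector implementation. The tab name "Server", the field name "serverName", and the StringBuilder standing in for IHTTPOutput are assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical sketch; a StringBuilder stands in for IHTTPOutput. */
class ConfigTabSketch {
  static final String SERVER_TAB = "Server"; // assumed tab name

  /** Header step: register this connector's tab. */
  static void header(List<String> tabsArray) {
    tabsArray.add(SERVER_TAB);
  }

  /** Body step: visible input on the selected tab, hidden input on any other tab. */
  static void body(StringBuilder out, String tabName, String serverValue) {
    String type = SERVER_TAB.equals(tabName) ? "text" : "hidden";
    out.append("<input type=\"").append(type)
       .append("\" name=\"serverName\" value=\"")
       .append(escape(serverValue))
       .append("\"/>\n");
  }

  /** Minimal escaping for a double-quoted HTML attribute value. */
  static String escape(String s) {
    return s.replace("&", "&amp;")
            .replace("\"", "&quot;")
            .replace("<", "&lt;")
            .replace(">", "&gt;");
  }
}
```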
      • processConfigurationPost

        public java.lang.String processConfigurationPost​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                                         org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
                                                         java.util.Locale locale,
                                                         org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Process a configuration post. This method is called at the start of the connector's configuration page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the configuration parameters accordingly. The name of the posted form is "editconnection".
        Specified by:
        processConfigurationPost in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        processConfigurationPost in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        variableContext - is the set of variables available from the post, including binary file post information.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        Returns:
        null if all is well, or a string error message if there is an error that should prevent saving of the connection (and cause a redirection to an error page).
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
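The return contract (null on success, an error message otherwise) can be sketched as follows. This is an illustration under assumptions, not the connector's code: plain Map objects stand in for IPostParameters and ConfigParams, and the field and parameter names ("serverName", "proxyPort") are hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch; Maps stand in for IPostParameters and ConfigParams. */
class ConfigPostSketch {

  /**
   * Copies posted form values into the configuration parameters.
   * Fields absent from the post are left unchanged.
   *
   * @return null if all is well, or an error message that should prevent saving
   */
  static String applyPost(Map<String, String> posted, Map<String, String> parameters) {
    String server = posted.get("serverName");
    if (server != null) {
      if (server.trim().isEmpty()) {
        return "Server name must not be empty";
      }
      parameters.put("serverName", server.trim());
    }

    String port = posted.get("proxyPort");
    if (port != null && !port.isEmpty()) {
      try {
        int value = Integer.parseInt(port);
        if (value < 1 || value > 65535) {
          return "Proxy port must be between 1 and 65535";
        }
      } catch (NumberFormatException e) {
        return "Proxy port must be a number";
      }
      parameters.put("proxyPort", port);
    }
    return null;
  }
}
```

Checking each field for null before using it matters because this method is called whenever a post is merely possible, so any given field may be missing.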
      • viewConfiguration

        public void viewConfiguration​(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                      org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                      java.util.Locale locale,
                                      org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      java.io.IOException
        View configuration. This method is called in the body section of the connector's view configuration page. Its purpose is to present the connection information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.
        Specified by:
        viewConfiguration in interface org.apache.manifoldcf.core.interfaces.IConnector
        Overrides:
        viewConfiguration in class org.apache.manifoldcf.core.connector.BaseConnector
        Parameters:
        threadContext - is the local thread context.
        out - is the output to which any HTML should be sent.
        parameters - are the configuration parameters, as they currently exist, for this connection being configured.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • outputSpecificationHeader

        public void outputSpecificationHeader​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                              java.util.Locale locale,
                                              org.apache.manifoldcf.core.interfaces.Specification ds,
                                              int connectionSequenceNumber,
                                              java.util.List<java.lang.String> tabsArray)
                                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                              java.io.IOException
        Output the specification header section. This method is called in the head section of a job page which has selected a repository connection of the current type. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the job editing HTML. The connector will be connected before this method can be called.
        Specified by:
        outputSpecificationHeader in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        outputSpecificationHeader in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • outputSpecificationBody

        public void outputSpecificationBody​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                            java.util.Locale locale,
                                            org.apache.manifoldcf.core.interfaces.Specification ds,
                                            int connectionSequenceNumber,
                                            int actualSequenceNumber,
                                            java.lang.String tabName)
                                     throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                            java.io.IOException
        Output the specification body section. This method is called in the body section of a job page which has selected a repository connection of the current type. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is always "editjob". The connector will be connected before this method can be called.
        Specified by:
        outputSpecificationBody in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        outputSpecificationBody in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        actualSequenceNumber - is the connection within the job that has currently been selected.
        tabName - is the current tab name. (actualSequenceNumber, tabName) form a unique tuple within the job.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • processSpecificationPost

        public java.lang.String processSpecificationPost​(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
                                                         java.util.Locale locale,
                                                         org.apache.manifoldcf.core.interfaces.Specification ds,
                                                         int connectionSequenceNumber)
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Process a specification post. This method is called at the start of a job's edit or view page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the document specification accordingly. The name of the posted form is always "editjob". The connector will be connected before this method can be called.
        Specified by:
        processSpecificationPost in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        processSpecificationPost in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
        Parameters:
        variableContext - contains the post data, including binary file-upload information.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        Returns:
        null if all is well, or a string error message if there is an error that should prevent saving of the job (and cause a redirection to an error page).
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • viewSpecification

        public void viewSpecification​(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                      java.util.Locale locale,
                                      org.apache.manifoldcf.core.interfaces.Specification ds,
                                      int connectionSequenceNumber)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      java.io.IOException
        View specification. This method is called in the body section of a job's view page. Its purpose is to present the document specification information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags. The connector will be connected before this method can be called.
        Specified by:
        viewSpecification in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
        Overrides:
        viewSpecification in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        java.io.IOException
      • getInitializedGetMethod

        protected org.apache.http.client.methods.HttpRequestBase getInitializedGetMethod​(java.lang.String URL)
                                                                                  throws java.io.IOException
        Create and initialize an HttpRequestBase for a GET request to the given URL.
        Throws:
        java.io.IOException
      • getInitializedPostMethod

        protected org.apache.http.client.methods.HttpRequestBase getInitializedPostMethod​(java.lang.String URL,
                                                                                          java.util.Map<java.lang.String,​java.lang.String> params)
                                                                                   throws java.io.IOException
        Create and initialize an HttpRequestBase for a POST request to the given URL with the given parameters.
        Throws:
        java.io.IOException
      • performCheck

        protected void performCheck()
                             throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                    org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Do the check operation. This throws an exception if anything is wrong.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getCheckURL

        protected java.lang.String getCheckURL()
                                        throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Get a URL for a check operation.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • parseCheckResponse

        protected static boolean parseCheckResponse​(java.io.InputStream is)
                                             throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                    org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Parse check response, e.g.:
         <api xmlns="http://www.mediawiki.org/xml/api/">
           <query>
             <allpages>
               <p pageid="19839654" ns="0" title="Kre'fey" />
             </allpages>
           </query>
           <query-continue>
             <allpages apfrom="Krea" />
           </query-continue>
         </api>
         
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
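        As an illustration of the response shape above, the following is a minimal sketch that uses the JDK DOM parser to test whether a check response lists at least one page. The class name CheckResponseSketch and the method hasAtLeastOnePage are hypothetical, and the rule "at least one <p> element means the check passed" is an assumption; the connector's own parser and the meaning of its boolean result may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch; not the connector's actual implementation.
public class CheckResponseSketch {

  // Assumption: a response is considered good when it contains at least one <p> element.
  static boolean hasAtLeastOnePage(InputStream is) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
      return doc.getElementsByTagName("p").getLength() > 0;
    } catch (Exception e) {
      // Wrap parser and I/O failures so that callers see a single unchecked exception type.
      throw new IllegalStateException("Unparseable check response", e);
    }
  }

  public static void main(String[] args) {
    String xml =
        "<api xmlns=\"http://www.mediawiki.org/xml/api/\"><query><allpages>"
            + "<p pageid=\"19839654\" ns=\"0\" title=\"Kre'fey\" />"
            + "</allpages></query></api>";
    InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    System.out.println(hasAtLeastOnePage(is));
  }
}
```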
      • listAllPages

        protected void listAllPages​(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities,
                                    java.lang.String namespace,
                                    java.lang.String prefix,
                                    long startTime,
                                    long endTime)
                             throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                    org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Perform a series of listPages() operations, so that we fully obtain the documents we're looking for even though we're limited to 500 of them per request.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
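        The batching idea described for listAllPages can be sketched as a loop that keeps requesting batches until no continuation point remains. Everything in the sketch is hypothetical: the BatchFetcher interface stands in for a single listPages() request, the in-memory fetcher replaces the wiki, and the batch size is 2 rather than 500 so that the loop is easy to follow. It also assumes each request reports the title to continue from, which may not match how the connector tracks its position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "fetch in batches until done" loop.
public class ListAllPagesSketch {

  /** Stand-in for one listPages() request. */
  interface BatchFetcher {
    /** Adds one batch of titles to sink; returns the title to continue from, or null when done. */
    String fetch(String startTitle, List<String> sink);
  }

  /** Repeats batch requests until the fetcher reports that nothing is left. */
  static List<String> listAll(BatchFetcher fetcher) {
    List<String> all = new ArrayList<>();
    String start = null; // null means "start from the first title"
    do {
      start = fetcher.fetch(start, all);
    } while (start != null);
    return all;
  }

  /** In-memory fetcher over a sorted title list, used only to exercise the loop. */
  static BatchFetcher inMemory(List<String> sortedTitles, int batchSize) {
    return (startTitle, sink) -> {
      int from = 0;
      if (startTitle != null) {
        // Skip titles that sort before the continuation point.
        while (from < sortedTitles.size() && sortedTitles.get(from).compareTo(startTitle) < 0) {
          from++;
        }
      }
      int to = Math.min(from + batchSize, sortedTitles.size());
      sink.addAll(sortedTitles.subList(from, to));
      return to < sortedTitles.size() ? sortedTitles.get(to) : null;
    };
  }

  public static void main(String[] args) {
    List<String> titles = Arrays.asList("A", "B", "C", "D", "E");
    System.out.println(listAll(inMemory(titles, 2)));
  }
}
```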
      • executeListPagesViaThread

        protected java.lang.String executeListPagesViaThread​(java.lang.String startPageTitle,
                                                             java.lang.String namespace,
                                                             java.lang.String prefix,
                                                             org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities)
                                                      throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                             org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Execute a listPages() operation via a thread. Returns the last page title.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getListPagesURL

        protected java.lang.String getListPagesURL​(java.lang.String startingTitle,
                                                   java.lang.String namespace,
                                                   java.lang.String prefix)
                                            throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Create a URL to obtain the next 500 pages.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
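        For orientation, a list-pages URL of this kind might be assembled as in the sketch below. The class ListPagesUrlSketch, the example host, and the choice to pass the API endpoint as an argument are assumptions for illustration. The query parameters (list=allpages, apfrom, apnamespace, apprefix, aplimit) are standard MediaWiki API parameters, but the exact URL the connector builds is not shown in this documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch; the connector's real URL layout may differ.
public class ListPagesUrlSketch {

  /** Builds an allpages query for up to 500 pages; null arguments are left out. */
  static String build(String apiUrl, String startingTitle, String namespace, String prefix) {
    StringBuilder sb = new StringBuilder(apiUrl)
        .append("?action=query&list=allpages&format=xml&aplimit=500");
    appendParam(sb, "apfrom", startingTitle);
    appendParam(sb, "apnamespace", namespace);
    appendParam(sb, "apprefix", prefix);
    return sb.toString();
  }

  /** Appends one URL-encoded query parameter, skipping it when the value is null. */
  private static void appendParam(StringBuilder sb, String name, String value) {
    if (value == null) {
      return;
    }
    try {
      sb.append('&').append(name).append('=').append(URLEncoder.encode(value, "UTF-8"));
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a conforming JVM.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // wiki.example.org is a placeholder host.
    System.out.println(build("http://wiki.example.org/w/api.php", "Kre Mbaye", "0", null));
  }
}
```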
      • parseListPagesResponse

        protected static boolean parseListPagesResponse​(java.io.InputStream is,
                                                        org.apache.manifoldcf.connectorcommon.common.XThreadStringBuffer buffer,
                                                        java.lang.String startPageTitle,
                                                        WikiConnector.ReturnString lastTitle)
                                                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Parse list output, e.g.:
         <api xmlns="http://www.mediawiki.org/xml/api/">
           <query>
             <allpages>
               <p pageid="19839654" ns="0" title="Kre'fey" />
               <p pageid="30955295" ns="0" title="Kre-O" />
               <p pageid="14773725" ns="0" title="Kre8tiveworkz" />
               <p pageid="19219017" ns="0" title="Kre M'Baye" />
               <p pageid="19319577" ns="0" title="Kre Mbaye" />
             </allpages>
           </query>
           <query-continue>
             <allpages apfrom="Krea" />
           </query-continue>
         </api>
         
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
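        The sample above carries two pieces of information: the listed pages and the continuation point in <query-continue>. A minimal sketch of extracting both with the JDK DOM parser follows. The class ListPagesResponseSketch and its parse method are hypothetical, and collecting page IDs into a plain List is an assumption; the connector itself passes results through an XThreadStringBuffer and a ReturnString instead.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the connector's actual parser.
public class ListPagesResponseSketch {

  /** Adds every pageid to pageIds; returns the apfrom continuation value, or null if absent. */
  static String parse(InputStream is, List<String> pageIds) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
    } catch (Exception e) {
      throw new IllegalStateException("Unparseable list response", e);
    }
    NodeList pages = doc.getElementsByTagName("p");
    for (int i = 0; i < pages.getLength(); i++) {
      pageIds.add(((Element) pages.item(i)).getAttribute("pageid"));
    }
    // Search only inside <query-continue>, because <allpages> also appears under <query>.
    NodeList cont = doc.getElementsByTagName("query-continue");
    if (cont.getLength() == 0) {
      return null;
    }
    NodeList all = ((Element) cont.item(0)).getElementsByTagName("allpages");
    if (all.getLength() == 0) {
      return null;
    }
    String apfrom = ((Element) all.item(0)).getAttribute("apfrom");
    return apfrom.isEmpty() ? null : apfrom;
  }

  public static void main(String[] args) {
    String xml =
        "<api xmlns=\"http://www.mediawiki.org/xml/api/\"><query><allpages>"
            + "<p pageid=\"19839654\" ns=\"0\" title=\"Kre'fey\" />"
            + "<p pageid=\"30955295\" ns=\"0\" title=\"Kre-O\" />"
            + "</allpages></query>"
            + "<query-continue><allpages apfrom=\"Krea\" /></query-continue></api>";
    List<String> ids = new ArrayList<>();
    String next = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), ids);
    System.out.println(ids + " next=" + next);
  }
}
```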
      • getDocURLs

        protected void getDocURLs​(java.lang.String[] documentIdentifiers,
                                  java.util.Map<java.lang.String,​java.lang.String> urls)
                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                  org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getGetDocURLsURL

        protected java.lang.String getGetDocURLsURL​(java.lang.String[] documentIdentifiers)
                                             throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Create a URL to obtain the URLs of multiple pages, given their page IDs.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
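        One plausible shape for such a URL, assuming the MediaWiki prop=info and inprop=url parameters (which produce the fullurl attribute) and a pipe-separated pageids list, is sketched below. The class DocUrlsUrlSketch, the example host, and the explicit API endpoint argument are hypothetical; the connector's actual query string is not documented here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch; the connector's real URL layout may differ.
public class DocUrlsUrlSketch {

  /** Builds a query asking for the full URL of each page ID. */
  static String build(String apiUrl, String[] documentIdentifiers) {
    // MediaWiki accepts several page IDs in one request, separated by '|'.
    String joined = String.join("|", documentIdentifiers);
    try {
      return apiUrl + "?action=query&prop=info&inprop=url&format=xml&pageids="
          + URLEncoder.encode(joined, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a conforming JVM.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // wiki.example.org is a placeholder host.
    System.out.println(build("http://wiki.example.org/w/api.php",
        new String[] {"27697087", "19839654"}));
  }
}
```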
      • parseGetDocURLsResponse

        protected static boolean parseGetDocURLsResponse​(java.io.InputStream is,
                                                         java.util.Map<java.lang.String,​java.lang.String> urls)
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                         org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        This method parses a response like the following:
         <api>
           <query>
             <pages>
               <page pageid="27697087" ns="0" title="API" fullurl="..."/>
             </pages>
           </query>
         </api>
         
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getTimestamps

        protected void getTimestamps​(java.lang.String[] documentIdentifiers,
                                     java.util.Map<java.lang.String,​java.lang.String> versions,
                                     org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities)
                              throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                     org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Obtain document versions for a set of documents.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getGetTimestampURL

        protected java.lang.String getGetTimestampURL​(java.lang.String[] documentIdentifiers)
                                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Create a URL to obtain the timestamps of multiple pages, given their page IDs.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • parseGetTimestampResponse

        protected static boolean parseGetTimestampResponse​(java.io.InputStream is,
                                                           java.util.Map<java.lang.String,​java.lang.String> versions)
                                                    throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                           org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        This method parses a response like the following:
         <api>
           <query>
             <pages>
               <page pageid="27697087" ns="0" title="API">
                 <revisions>
                   <rev user="Graham87" timestamp="2010-06-13T08:41:17Z" />
                 </revisions>
               </page>
             </pages>
           </query>
         </api>
         
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
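        To make the nesting in the sample concrete, the sketch below maps each page ID to the timestamp of its first <rev> element. The class TimestampResponseSketch is hypothetical, and storing the raw timestamp as the map value is an assumption; the connector may derive its version strings differently.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the connector's actual parser.
public class TimestampResponseSketch {

  /** Fills versions with pageid -> timestamp of the first <rev> found under each <page>. */
  static void parse(InputStream is, Map<String, String> versions) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
    } catch (Exception e) {
      throw new IllegalStateException("Unparseable timestamp response", e);
    }
    NodeList pages = doc.getElementsByTagName("page");
    for (int i = 0; i < pages.getLength(); i++) {
      Element page = (Element) pages.item(i);
      NodeList revs = page.getElementsByTagName("rev");
      if (revs.getLength() > 0) {
        // Pages without any revision element are skipped rather than mapped to an empty value.
        versions.put(page.getAttribute("pageid"),
            ((Element) revs.item(0)).getAttribute("timestamp"));
      }
    }
  }

  public static void main(String[] args) {
    String xml =
        "<api><query><pages><page pageid=\"27697087\" ns=\"0\" title=\"API\"><revisions>"
            + "<rev user=\"Graham87\" timestamp=\"2010-06-13T08:41:17Z\" />"
            + "</revisions></page></pages></query></api>";
    Map<String, String> versions = new HashMap<>();
    parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), versions);
    System.out.println(versions.get("27697087"));
  }
}
```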
      • getNamespaces

        protected void getNamespaces​(java.util.Map<java.lang.String,​java.lang.String> namespaces)
                              throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                     org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Obtain the set of namespaces, as a map keyed by the canonical namespace name where the value is the descriptive name.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getGetNamespacesURL

        protected java.lang.String getGetNamespacesURL()
                                                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Create a URL to obtain the namespaces.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • getDocInfo

        protected void getDocInfo​(java.lang.String documentIdentifier,
                                  java.lang.String documentVersion,
                                  java.lang.String fullURL,
                                  org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
                                  java.lang.String[] allowACL)
                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                  org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get document info and index the document.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getGetDocInfoURL

        protected java.lang.String getGetDocInfoURL​(java.lang.String documentIdentifier)
                                             throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Create a URL to obtain a page's metadata and content, given the page ID. Open question: can multiple document identifiers be requested at a time?
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • readResponseAsString

        protected static java.lang.String readResponseAsString​(org.apache.http.HttpResponse httpResponse)
                                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • handleException

        protected static void handleException​(java.lang.Throwable thr)
                                       throws java.lang.InterruptedException,
                                              org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                              org.apache.manifoldcf.agents.interfaces.ServiceInterruption,
                                              java.io.IOException,
                                              org.apache.http.HttpException
        Throws:
        java.lang.InterruptedException
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        java.io.IOException
        org.apache.http.HttpException