Full-text search (FTS) is a core capability of content management systems: it lets users search both the content and the metadata associated with it. In a previous blog post, I described a new, fully scalable architecture for content management using Apache Chemistry with a Couchbase repository for metadata (and possibly blobs). Today, I would like to discuss how to integrate FTS into this architecture in a scalable way, without the need for yet another tier (Elasticsearch, Solr, LucidWorks).

In 2015, Couchbase announced the development of CBFT, which stands for Couchbase Full Text search and is currently in developer preview. CBFT is a simple, integrated, distributed full-text server that covers 80% of the features most applications need. You can find more information on CBFT here: http://connect15.couchbase.com/agenda/sneak-peek-cbft-full-text-search-couchbase/

In this article, I will start investigating how to integrate CBFT into Apache Chemistry (CMIS) for full-text search on metadata.

  • Setup

To install Couchbase, follow the documentation here.

Create a bucket called cmismeta. This bucket contains the metadata of each content item (folder, file).

To install Apache Chemistry using Couchbase repository, follow the documentation here.

To install CBFT, follow the documentation here.

  • Create a CBFT index

Start CBFT on a local node: cbft -s http://localhost:8091

Point your web browser to cbft's web admin UI: http://localhost:8095

On the Indexes listing page, click on the New Index button.

Create an index called cmis-fts on bucket cmismeta.
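The same index can presumably be created without the web UI, through cbft's REST API. The sketch below only builds the HTTP request and does not send it. The PUT /api/index/{name} endpoint and the indexType, sourceType and sourceName parameters are my assumptions based on the cbft developer preview, so check them against the documentation of your cbft version. The class name CbftIndexRequest is hypothetical, and the code relies on the java.net.http client from JDK 11+.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class CbftIndexRequest {

    // Builds (but does not send) a PUT request that would define an index.
    // ASSUMPTION: endpoint and parameter names follow the cbft developer
    // preview REST API; verify them for your cbft version.
    static HttpRequest build(String cbftBaseUrl, String indexName, String bucket) {
        String uri = cbftBaseUrl + "/api/index/" + indexName
                + "?indexType=bleve"
                + "&sourceType=couchbase"
                + "&sourceName=" + URLEncoder.encode(bucket, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(uri))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8095", "cmis-fts", "cmismeta");
        // Show what would be sent to cbft.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually create the index, the request would be passed to HttpClient.newHttpClient().send(...) while cbft is running.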

  • Test your index

To test your index, you need to add content to the cmismeta bucket. You can either use the Apache Chemistry workbench to create content (folders, files), whose metadata will be stored in the cmismeta bucket, or add simple test documents directly to the bucket (and remove them afterwards).

In this example, I already have a bunch of files added to the Content Management Couchbase repository.

Open the Query tab and enter a query using the Bleve query syntax.
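As a reminder of what such a query looks like: in the Bleve query string syntax, a leading + marks a required term, a leading - an excluded term, double quotes delimit a phrase, and field:term restricts a term to one field. Here is a small sketch that composes such a string. The BleveQueryString helper and the field name "name" are hypothetical and only serve the illustration; the fields actually available depend on the metadata stored in your bucket.

```java
import java.util.ArrayList;
import java.util.List;

final class BleveQueryString {

    private final List<String> clauses = new ArrayList<>();

    // Term that must be present in a hit.
    BleveQueryString must(String term) {
        clauses.add("+" + quoteIfPhrase(term));
        return this;
    }

    // Term that must not be present in a hit.
    BleveQueryString mustNot(String term) {
        clauses.add("-" + quoteIfPhrase(term));
        return this;
    }

    // Term searched in a single field only.
    BleveQueryString field(String field, String term) {
        clauses.add(field + ":" + quoteIfPhrase(term));
        return this;
    }

    // Multi-word terms are turned into quoted phrases.
    private static String quoteIfPhrase(String term) {
        return term.contains(" ") ? "\"" + term + "\"" : term;
    }

    String build() {
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String q = new BleveQueryString()
                .must("report")
                .mustNot("draft")
                .field("name", "annual budget") // hypothetical field name
                .build();
        System.out.println(q); // prints: +report -draft name:"annual budget"
    }
}
```

The resulting string can be pasted as-is into the query box of the cbft admin UI.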

  • CMIS Apache Chemistry project

First, you need to activate the full-text query capability in the CMIS Couchbase repository class.

public class CouchbaseRepository {

    private RepositoryInfo createRepositoryInfo(CmisVersion cmisVersion) {
        // set repo infos
        RepositoryInfoImpl repositoryInfo = new RepositoryInfoImpl();
        repositoryInfo.setCmisVersionSupported(cmisVersion.value());
        …

        // set repo capabilities: advertise full-text-only query support
        RepositoryCapabilitiesImpl capabilities = new RepositoryCapabilitiesImpl();
        capabilities.setCapabilityQuery(CapabilityQuery.FULLTEXTONLY);
        …

        repositoryInfo.setCapabilities(capabilities);
        return repositoryInfo;
    }
}
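With the capability set to FULLTEXTONLY, clients are expected to send CMIS queries whose WHERE clause is a full-text predicate, for example SELECT * FROM cmis:document WHERE CONTAINS('couchbase'). The sketch below shows one way the repository could pull the search text out of such a statement before handing it to the full-text engine. It is a simplification under my own assumptions: the ContainsExtractor class is hypothetical, only the single-argument form of CONTAINS is handled, and a real implementation would want a proper CMIS query parser rather than a regular expression.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContainsExtractor {

    // Matches CONTAINS('...') and captures the quoted text, allowing
    // backslash-escaped characters such as \' inside the quotes.
    private static final Pattern CONTAINS = Pattern.compile(
            "\\bCONTAINS\\s*\\(\\s*'((?:[^'\\\\]|\\\\.)*)'\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    // Returns the full-text expression of the statement, if there is one.
    static Optional<String> searchText(String statement) {
        Matcher m = CONTAINS.matcher(statement);
        if (!m.find()) {
            return Optional.empty();
        }
        // Undo the escaping of single quotes inside the expression.
        return Optional.of(m.group(1).replace("\\'", "'"));
    }

    public static void main(String[] args) {
        String statement = "SELECT * FROM cmis:document WHERE CONTAINS('couchbase')";
        System.out.println(searchText(statement).orElse("<no full-text predicate>"));
        // prints: couchbase
    }
}
```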

To query the CBFT index, we use its REST API through a Jersey client.

First, add the dependency to the Maven pom file.

<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.8</version>
</dependency>

Then create a new CBFT service class. This service needs the CBFT location and the index name. It provides a simple query method that returns the list of keys of the matching documents in the cmismeta bucket in Couchbase.

package org.apache.chemistry.opencmis.couchbase;

import java.util.ArrayList;
import java.util.List;

import com.couchbase.client.java.document.json.JsonArray;
import com.couchbase.client.java.document.json.JsonObject;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;

public class CBFTService {

    private final String cbftLocation;
    private final Client client;
    private final String indexid;

    public CBFTService(String location, String indexid) {
        this.cbftLocation = location;
        this.indexid = indexid;
        this.client = Client.create();
    }

    /**
     * Search the cbft index.
     * @param query the query to search
     * @return list of keys matching the query
     */
    public List<String> query(String query) {
        List<String> results = new ArrayList<String>();

        WebResource webResource = client
                .resource("http://" + this.cbftLocation + ":8095/api/index/" + indexid + "/query");

        // Build the request body with JsonObject rather than by string
        // concatenation, so that quotes in the query are escaped correctly.
        String input = JsonObject.create()
                .put("q", query)
                .put("indexName", indexid)
                .put("size", 10)
                .put("from", 0)
                .put("explain", true)
                .put("highlight", JsonObject.create())
                .put("query", JsonObject.create()
                        .put("boost", 1)
                        .put("query", query))
                .put("fields", JsonArray.from("*"))
                .put("ctl", JsonObject.create()
                        .put("consistency", JsonObject.create()
                                .put("level", "")
                                .put("vectors", JsonObject.create()))
                        .put("timeout", 0))
                .toString();

        ClientResponse response = webResource.type("application/json")
                .post(ClientResponse.class, input);

        if (response.getStatus() != 200) {
            throw new RuntimeException("Failed : HTTP error code : "
                    + response.getStatus());
        }

        String output = response.getEntity(String.class);
        JsonObject content = JsonObject.fromJson(output);
        JsonArray hits = content.getArray("hits");

        if (hits != null) {
            for (int i = 0; i < hits.size(); i++) {
                // each hit carries the key of the matching document
                results.add(hits.getObject(i).getString("id"));
            }
        }

        return results;
    }
}
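If you would rather not depend on Jersey 1.x, the same kind of request can be prepared with the HTTP client that ships with JDK 11+. The sketch below builds the POST request without sending it and escapes the query text by hand, since user input may contain double quotes. The CbftQueryRequest class is hypothetical, and the trimmed-down payload (only query, size and from) is my assumption: I have not checked that cbft accepts it without the other fields used in the service above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class CbftQueryRequest {

    // Escapes the characters that would break a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // ASSUMPTION: a reduced version of the payload used in CBFTService.
    static String body(String query, int size, int from) {
        return "{\"query\":{\"query\":\"" + jsonEscape(query) + "\"},"
                + "\"size\":" + size + ",\"from\":" + from + "}";
    }

    // Builds (but does not send) the POST request against the query endpoint.
    static HttpRequest build(String cbftHost, String indexName, String query) {
        URI uri = URI.create("http://" + cbftHost + ":8095/api/index/" + indexName + "/query");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(query, 10, 0)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("localhost", "cmis-fts", "annual \"budget\"");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("annual \"budget\"", 10, 0));
    }
}
```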

You can now query the content management server from the workbench: the query is answered through CBFT, and you can click on a result to see the associated content.

Author

Posted by Cecile Le Pape, Solutions Architect, Couchbase

After 10 years at the University of Paris 6 (France), where she worked in the database team on replication, consistency, XML, p2p, syndication and mobile projects, Cecile decided to move on and start a new life experience in industry. Cecile joined the architect team of a small company called Oceane Consulting to develop document management projects. She is now a Solutions Architect at Couchbase and very happy to be involved in both the technical and the sales side.
