Introduction

Couchbase Full Text Search (FTS) is a great tool for indexing and querying geospatial data.  In this article, I’ll present a geospatial search use case and demonstrate the various ways that we can perform a search of location data using the Couchbase Full Text Search service.  I’ll be using Couchbase Server Enterprise Edition 6.6 (running locally in Docker) to create an FTS index on my sample geospatial dataset and then run geospatial queries against the index.

What makes this case interesting is that we aren’t using a spatial database. What is a spatial database? Unlike Couchbase Server, which is a NoSQL document database, spatial databases are specially optimized for data that describes geometric spaces such as lines, points of interest, or even 3-D topology in advanced instances. As we’ll see, Couchbase’s Full Text Search capabilities make it just as useful for handling and querying geospatial data as anything we might expect from a more specialized solution.

Use Case

My family has always enjoyed visiting and exploring Great Smoky Mountains National Park (or GRSM, the National Park Service’s abbreviation), and one day we might be interested in relocating there. But you can’t live in the national park, so we need to consider the various cities and towns near the park and make a short list of the ones to evaluate and possibly visit. 

The main objective is to be close to the national park, but we’ll consider other factors, such as the size (population) of the towns, too.

Sample Dataset

To support my GRSM use case, I’ve decided to use a public dataset from GeoNames that includes states, cities, towns, and other landmarks from various nations around the world. I downloaded their United States data file and imported into Couchbase only the “populated places” data (records with feature codes of ‘PPL’, ‘PPLA’, ‘PPLA2’, ‘PPLA3’, ‘PPLA4’, and ‘PPLC’) for cities and towns with a non-zero population. The result is a Couchbase bucket, “cities”, with 30,734 documents.
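If you want to reproduce a similar import, the filtering step could look something like the Python sketch below. It assumes the standard GeoNames dump layout (a tab-separated file such as US.txt with no header row, where the feature code is the 8th column and the population the 15th). The helper names are hypothetical, and the sample rows are made-up placeholders rather than real GeoNames records.

```python
import csv
import io
from typing import Iterable, List

# Feature codes for the "populated places" kept in this post.
POPULATED_PLACE_CODES = {"PPL", "PPLA", "PPLA2", "PPLA3", "PPLA4", "PPLC"}

# Zero-based column positions in the GeoNames dump (assumed layout).
FEATURE_CODE_COL = 7
POPULATION_COL = 14
NUM_COLUMNS = 19


def keep_row(row: List[str]) -> bool:
    """True for populated places with a non-zero population."""
    if len(row) < NUM_COLUMNS:
        return False  # skip malformed or truncated lines
    population = int(row[POPULATION_COL] or 0)
    return row[FEATURE_CODE_COL] in POPULATED_PLACE_CODES and population > 0


def filter_geonames(lines: Iterable[str]) -> List[List[str]]:
    """Parse tab-separated GeoNames lines and keep only the rows to import."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if keep_row(row)]


def make_row(geonameid: str, name: str, code: str, population: str) -> str:
    """Build a placeholder GeoNames-style line for the demo below."""
    cols = [""] * NUM_COLUMNS
    cols[0], cols[1], cols[7], cols[14] = geonameid, name, code, population
    return "\t".join(cols)


sample = io.StringIO("\n".join([
    make_row("1", "Example Town", "PPL", "1200"),
    make_row("2", "Example Ghost Town", "PPL", "0"),
    make_row("3", "Example Peak", "MT", "0"),
]))
print([row[1] for row in filter_geonames(sample)])  # ['Example Town']
```

Against the real file, you would pass an open handle instead, for example `open("US.txt", encoding="utf-8", newline="")`.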

Each city’s document data model includes some attributes of interest for my GRSM use case: the name, state, population, elevation, and, most importantly, the latitude and longitude.  Here are a couple of sample JSON documents:

city::4699066

 

city::4649251
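The document keys follow a `city::<geonameid>` pattern, which the index relies on later. A minimal sketch of turning one filtered GeoNames row into such a key and document follows. It assumes the state comes from the GeoNames admin1 code (the two-letter state abbreviation for US records) and that a blank elevation becomes null. Both are assumptions made here, and the sample row is a placeholder.

```python
import json
from typing import List, Optional, Tuple


def to_city_document(row: List[str]) -> Tuple[str, dict]:
    """Map a GeoNames row to a (document key, JSON document) pair."""
    geonameid, name = row[0], row[1]
    elevation: Optional[int] = int(row[15]) if row[15] else None
    key = f"city::{geonameid}"  # the "city" prefix doubles as the FTS type
    doc = {
        "name": name,
        "state": row[10],                 # admin1 code, e.g. "TN" (assumption)
        "population": int(row[14] or 0),
        "elevation": elevation,           # None -> null in JSON (assumption)
        "geo": {"lat": float(row[4]), "lon": float(row[5])},
    }
    return key, doc


# Placeholder row: 19 tab-separated columns, mostly empty.
row = [""] * 19
row[0], row[1], row[4], row[5] = "1234567", "Example Town", "35.5", "-83.5"
row[10], row[14] = "TN", "1200"

key, doc = to_city_document(row)
print(key)  # city::1234567
print(json.dumps(doc))
```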

Creating the Index

With the city data loaded into the cities bucket, we can build an FTS index that suits the “live near GRSM” use case. I’ll briefly cover the highlights of creating the required index; the full index definition is in the appendix at the end of this post. (For a more detailed explanation of creating FTS indexes, please refer to my blog post on the topic.)

Index creation key points: 

  • Name: city_geo
  • Bucket: cities
  • Type identifier: select “Doc ID up to separator” and enter “::” as the delimiter (note the keys of the sample documents above)
  • Type mappings: 
    • Uncheck “default”
    • Create a mapping for “city” type documents, indexing only these specified fields: 
      • name: I’ll use the keyword analyzer for this field (because we want to sort by name later), and I’ll check index, store, _all, term vectors, and docvalues so that, in addition to searching by this field, I can test the index with highlights and sort by this field. 
      • state: Just store this text field so we can retrieve it in the search results. 
      • population: Set the type to number, and check index, store, and docvalues so that I can sort results based on population later. 
      • elevation: Set the type to number and check only store so that this value is included in the search results.
      • geo: Set the type to geopoint (since each document has the “lat” and “lon” properties in the “geo” subdocument), and check index, store, and _all.

create the city_geo search index
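For reference, the key points above map onto an FTS index definition roughly like the one this Python sketch builds. It is a trimmed approximation written from the bullet list, not the exact definition in the appendix. The UI adds further defaults (analyzers, plan parameters, store settings) that are omitted here, so compare it against what your server generates before relying on it. A definition like this can be sent with a PUT request to the Search service's `/api/index/city_geo` endpoint.

```python
import json


def field(name, ftype, *, analyzer=None, index=False, store=False,
          include_in_all=False, term_vectors=False, docvalues=False):
    """One child-field mapping; keyword flags mirror the UI checkboxes."""
    spec = {
        "name": name,
        "type": ftype,
        "index": index,
        "store": store,
        "include_in_all": include_in_all,      # "_all"
        "include_term_vectors": term_vectors,  # "term vectors"
        "docvalues": docvalues,
    }
    if analyzer:
        spec["analyzer"] = analyzer
    return {"enabled": True, "dynamic": False, "fields": [spec]}


city_geo = {
    "type": "fulltext-index",
    "name": "city_geo",
    "sourceType": "couchbase",
    "sourceName": "cities",
    "params": {
        # "Doc ID up to separator" with "::": city::4699066 -> type "city".
        "doc_config": {"mode": "docid_prefix", "docid_prefix_delim": "::"},
        "mapping": {
            "default_mapping": {"enabled": False},  # "default" unchecked
            "types": {
                "city": {
                    "enabled": True,
                    "dynamic": False,  # index only the fields listed here
                    "properties": {
                        "name": field("name", "text", analyzer="keyword",
                                      index=True, store=True, include_in_all=True,
                                      term_vectors=True, docvalues=True),
                        "state": field("state", "text", store=True),
                        "population": field("population", "number", index=True,
                                            store=True, docvalues=True),
                        "elevation": field("elevation", "number", store=True),
                        "geo": field("geo", "geopoint", index=True, store=True,
                                     include_in_all=True),
                    },
                }
            },
        },
    },
}

print(json.dumps(city_geo, indent=2))
```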

We’ll wait until the indexing process is 100% complete:

index processing complete
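If you'd rather script the wait than watch the UI, one approach is to poll the index's document count until it reaches the number of documents in the bucket (30,734 here). The sketch assumes the Search REST API is reachable on localhost port 8094 and that `GET /api/index/{name}/count` returns JSON with a `count` field. The credentials are placeholders, and a matching count is only a rough proxy for the 100% figure the UI shows.

```python
import base64
import json
import time
import urllib.request
from typing import Callable

SEARCH_HOST = "http://localhost:8094"  # assumption: local Search service


def fetch_doc_count(index: str, user: str, password: str) -> int:
    """Ask the Search service how many documents the index currently holds."""
    req = urllib.request.Request(f"{SEARCH_HOST}/api/index/{index}/count")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return int(json.load(resp)["count"])


def wait_for_index(get_count: Callable[[], int], expected: int,
                   timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll get_count() until it reaches expected; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_count() >= expected:
            return True
        time.sleep(poll_s)
    return False


# Usage against a live cluster (placeholder credentials):
# wait_for_index(lambda: fetch_doc_count("city_geo", "Administrator", "password"), 30734)
```

Passing the counter in as a callable keeps the polling loop independent of HTTP, so it can be exercised without a running cluster.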

Now, let’s quickly test the index in the Couchbase UI to verify the index is working as expected.  The result looks good!

testing the city_geo index

Geospatial Searches

Now that the dataset is loaded and indexed, I can get to the heart of the subject and execute some geospatial queries against the index. For demonstration purposes, I’ll query the Couchbase Search Service REST API with cURL, but Search queries can also be executed through any of the Couchbase SDKs as part of your application or service. N1QL can also run Full Text Search queries through its SEARCH() function, so you can use search from familiar SQL syntax without writing application code.

I’ll format the REST API response for readability using jq, an open source command-line JSON processor.
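The cURL-plus-jq workflow translates directly to other tools. As an illustration, here is a small Python sketch using only the standard library. It POSTs a query body to the index's `/api/index/{name}/query` endpoint with basic authentication and pretty-prints the JSON response (the role jq plays in this post). The host, port 8094, and the helper names are assumptions for a local, single-node setup.

```python
import base64
import json
import urllib.request

DEFAULT_HOST = "http://localhost:8094"  # assumption: local Search service


def build_search_request(index: str, body: dict, user: str, password: str,
                         host: str = DEFAULT_HOST) -> urllib.request.Request:
    """Prepare a POST to the index's query endpoint with basic auth."""
    req = urllib.request.Request(f"{host}/api/index/{index}/query",
                                 data=json.dumps(body).encode("utf-8"),
                                 method="POST")
    req.add_header("Content-Type", "application/json")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


def search(index: str, body: dict, user: str, password: str,
           host: str = DEFAULT_HOST) -> dict:
    """Send the query and decode the JSON response."""
    with urllib.request.urlopen(
            build_search_request(index, body, user, password, host)) as resp:
        return json.load(resp)


def pretty(result: dict) -> str:
    """Indented, key-sorted JSON, similar in spirit to `| jq .`."""
    return json.dumps(result, indent=2, sort_keys=True)


# Usage against a live cluster (placeholder credentials):
# print(pretty(search("city_geo", {"query": {"match_all": {}}}, "Administrator", "password")))
```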

Search Method 1: Point and Radius

Often we want to know what’s nearby, or within a specified distance of a specific point. In my use case, I’d like to know which cities and towns are near the national park, maybe within 50 miles as a starting point. This first geospatial search method is referred to as “point and radius”, “point and distance”, or “radius-based”.
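Conceptually, a point-and-radius search keeps every location whose great-circle distance from the center is at most the radius. The Python sketch below illustrates that idea with the haversine formula over an in-memory list. It only illustrates the concept: the Search service does this work against the index, and its internal distance calculation may differ. The place names and coordinates are made up.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def within_radius(center: Tuple[float, float],
                  places: List[Tuple[str, float, float]],
                  radius_mi: float) -> List[Tuple[str, float]]:
    """Return (name, distance) for places inside the radius, nearest first."""
    lat0, lon0 = center
    hits = []
    for name, lat, lon in places:
        d = haversine_mi(lat0, lon0, lat, lon)
        if d <= radius_mi:
            hits.append((name, round(d, 1)))
    return sorted(hits, key=lambda hit: hit[1])


places = [("Example A", 0.0, 0.5), ("Example B", 0.0, 2.0)]
print(within_radius((0.0, 0.0), places, 50))  # [('Example A', 34.5)]
```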

For my “point”, I’ve chosen Newfound Gap, which is a pass over the Smoky Mountains on the border of Tennessee and North Carolina, as well as a popular lookout point and a trailhead for the Appalachian Trail.  It’s a must-do for my family when we visit GRSM.  Let’s look for towns/cities within 50 miles of Newfound Gap. 
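A point-and-radius request body for the Search service might look something like this sketch, which builds the JSON in Python. The Newfound Gap coordinates are approximate and assumed for illustration (check them against a map). The `geo` field name matches the index mapping described earlier, and sorting by `geo_distance` from the same point returns the nearest towns first.

```python
import json

# Approximate coordinates for Newfound Gap (assumption, for illustration).
NEWFOUND_GAP = {"lat": 35.611, "lon": -83.425}


def radius_query(center: dict, distance: str = "50mi",
                 field: str = "geo", size: int = 15) -> dict:
    """Geo distance query: everything within `distance` of `center`."""
    return {
        "from": 0,
        "size": size,
        "fields": ["*"],  # return stored fields (name, state, population, ...)
        "query": {"location": center, "distance": distance, "field": field},
        # Nearest first: sort hits by distance from the same center point.
        "sort": [{"by": "geo_distance", "field": field,
                  "unit": "mi", "location": center}],
    }


print(json.dumps(radius_query(NEWFOUND_GAP), indent=2))
```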

area for point and radius geospatial search

Here’s the radius-based query: 

 

The result is 79 cities, sorted by distance from Newfound Gap.  I’ve included the first 15 results here: 

Search Method 2: Bounding Box

Seventy-nine is a lot of cities and towns to consider, so let’s look at this another way. From my visits to the national park over the years, I know roughly that I want to live somewhere between Knoxville, TN, and Waynesville, NC. Given those two locations, I can query my GeoNames dataset using the “bounding box” or “rectangle-based” geospatial search method.

I can supply the coordinates of places near Knoxville and Waynesville as parameters to my search, and those will be used as the upper-left and lower-right corners of a rectangle. Any cities located within that rectangle will be returned by the query.
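Translated into a request body, a rectangle-based query needs a `top_left` and a `bottom_right` point plus the geopoint field. In this Python sketch, the Knoxville and Waynesville coordinates are approximate and assumed for illustration; the corners used in this post may differ. Results are sorted by the `name` field, which the index stores with docvalues.

```python
import json

# Approximate coordinates (assumption): Knoxville, TN is the north-west
# (top-left) corner and Waynesville, NC the south-east (bottom-right) corner.
KNOXVILLE = {"lat": 35.96, "lon": -83.92}
WAYNESVILLE = {"lat": 35.49, "lon": -82.99}


def bounding_box_query(top_left: dict, bottom_right: dict,
                       field: str = "geo", size: int = 50) -> dict:
    """Rectangle query: everything between the two corners."""
    # Simple sanity check; it does not handle boxes that cross the antimeridian.
    if (top_left["lat"] <= bottom_right["lat"]
            or top_left["lon"] >= bottom_right["lon"]):
        raise ValueError("top_left must be north-west of bottom_right")
    return {
        "size": size,
        "fields": ["*"],
        "query": {"top_left": top_left, "bottom_right": bottom_right,
                  "field": field},
        "sort": ["name"],  # ascending by name (relies on docvalues on "name")
    }


print(json.dumps(bounding_box_query(KNOXVILLE, WAYNESVILLE), indent=2))
```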

selected area for bounding box geospatial search

Here’s the rectangle-based query: 

 

The result is 21 cities, sorted by name: 

Search Method 3: Polygon

After some additional research, I’ve decided that I would prefer to live within Sevier County, but south of Interstate 40 and north of the national park boundary.  

selected area for polygon-based geospatial search

To do this, I will need to run a polygon-based search against my FTS index.  This third method was recently added in Couchbase Server 6.5.1. The areas for geospatial search queries can now be specified as polygons, in addition to circles and rectangles.  The polygon is expressed as a series of latitude-longitude coordinates, each determining the location of one corner of the polygon.  
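In a request body, the polygon can be given as a `polygon_points` array on the geopoint field; the Couchbase documentation describes points written as `"lat,lon"` strings, among other formats, so check what your version accepts. The Python sketch below builds such a body from (latitude, longitude) corners, sorted by population ascending as in this post. It also adds a small even-odd ray-casting check for testing points locally. The corner coordinates are hypothetical placeholders, not the polygon drawn over Sevier County. The ray-casting helper is a planar illustration of point-in-polygon logic, not a description of how the Search service evaluates polygons.

```python
import json
from typing import List, Tuple

Corner = Tuple[float, float]  # (latitude, longitude)


def polygon_query(corners: List[Corner], field: str = "geo",
                  size: int = 50) -> dict:
    """Polygon query: everything inside the polygon traced by `corners`."""
    if len(corners) < 3:
        raise ValueError("a polygon needs at least three corners")
    return {
        "size": size,
        "fields": ["*"],
        "query": {"field": field,
                  "polygon_points": [f"{lat},{lon}" for lat, lon in corners]},
        "sort": ["population"],  # ascending; "-population" would be descending
    }


def point_in_polygon(lat: float, lon: float, corners: List[Corner]) -> bool:
    """Even-odd ray casting, treating lat/lon as flat x/y (fine for small areas)."""
    inside = False
    for i in range(len(corners)):
        lat_i, lon_i = corners[i]
        lat_j, lon_j = corners[i - 1]  # previous corner; wraps around at i == 0
        if (lat_i > lat) != (lat_j > lat):  # edge straddles the point's latitude
            # Longitude where this edge crosses the point's latitude.
            cross_lon = lon_j + (lat - lat_j) * (lon_i - lon_j) / (lat_i - lat_j)
            if lon < cross_lon:  # crossing lies east of the point
                inside = not inside
    return inside


# Hypothetical triangle, for illustration only.
corners = [(35.80, -83.60), (35.80, -83.30), (35.65, -83.45)]
print(json.dumps(polygon_query(corners), indent=2))
print(point_in_polygon(35.75, -83.45, corners))  # True
print(point_in_polygon(35.90, -83.45, corners))  # False
```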

On the map of Sevier County above (the light red line is the county boundary), I’ve overlaid a polygon that roughly corresponds to the area that I’m interested in, and I’ve captured the coordinates of the points of the polygon.  I’ll use these coordinates to form my geospatial polygon-based query:

 

The result is a very manageable list of 6 cities, sorted by population in ascending order: 

Summary and Next Steps

With these three search methods, Couchbase offers a comprehensive geospatial search capability for your applications. I encourage you to create an index with geopoint data and run some point-and-radius, bounding box, or polygon-based geospatial queries. You can easily do this with one of our Couchbase sample datasets, travel-sample, which has plenty of location-based data for this purpose.

Take it one step further and visualize the JSON search results in real time with a web-based mapping platform such as Mapbox or ESRI. You’ll also benefit from managing your data in a distributed database that supports horizontal scaling, key-value access, data consistency, and more.
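Web mapping tools such as Mapbox commonly accept GeoJSON, so one simple bridge is to convert search hits into a GeoJSON FeatureCollection. This Python sketch assumes each hit has already been flattened into a dict with name, lat, and lon (plus any other properties). How stored geopoint fields come back in the Search response can vary, so that flattening step is left as an assumption. Note that GeoJSON orders coordinates as [longitude, latitude].

```python
import json
from typing import Iterable


def to_geojson(places: Iterable[dict]) -> dict:
    """Convert {name, lat, lon, ...} dicts into a GeoJSON FeatureCollection."""
    features = []
    for place in places:
        props = {k: v for k, v in place.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point",
                         "coordinates": [place["lon"], place["lat"]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


# Placeholder hit, already flattened (assumption).
hits = [{"name": "Example Town", "lat": 35.5, "lon": -83.5, "population": 1200}]
print(json.dumps(to_geojson(hits), indent=2))
```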

Geospatial search is just one of the capabilities of Full Text Search in Couchbase.  You can also try out queries on arrays and natural language queries with scoring, faceting, and boosting.  For more information on this topic, take a look at the application developer documentation and training links in the reference section below.  

References

 

Appendix

Index Creation cURL Command and JSON Definition:

 

Author

Posted by Brian Kane, Solutions Engineer, Couchbase

Brian Kane is a Solutions Engineer at Couchbase and has been working in application development and with database technologies since 1996. He is based in the Houston, Texas area.
