A look under the covers of PouchDB-find

PouchDB-find is a new API and MongoDB-inspired query language that offers a simpler way to query a PouchDB database. It is better suited to ad-hoc querying, and a fair amount easier to learn, than PouchDB’s current way of querying documents via Map/Reduce. It works with PouchDB, Cloudant Query and CouchDB Mango (part of the CouchDB 2.0 release).

At the moment, PouchDB-find is still in beta. We are working hard on getting it out of beta and to a stable release (V1). Once that is achieved we will integrate it into PouchDB to replace Map/Reduce as the default way to find documents. 

Understanding how PouchDB-find translates your query and actually finds the documents will help you write better queries for it. This post explains how PouchDB-find works and highlights some of the constraints of the library.

Since PouchDB-find is a JavaScript implementation of CouchDB Mango, and CouchDB Mango is the underlying implementation for CouchDB 2.0 and Cloudant, the explanation here applies to CouchDB 2.0 and Cloudant as well.

Creating a basic index

To begin using PouchDB-find, we first need to create an index with a list of fields we want to query against.

Let’s create a user document structure to use for the queries in this blog post.

{
    _id: 'garren-unique-id',
    name: 'Garren Smith',
    age: 31,
    country: 'South Africa',
    languages: ['javascript', 'ruby', 'c#', 'c']
}

Imagine we have lots of documents like this stored in our database. For our first query we would like to find all the people aged 30 or older. We first create an index for the age field:

db.createIndex({
  index: {
    fields: ['age']
  }
});

Now that the index is created, we can use that to run our first query:

db.find({
  selector: {age: {$gte: 30}}
}).then(function (result) {
  // list of people shown here
});

The most important part of the query is the age: {$gte: 30} selector. This says: find all the documents where the age field is greater than or equal to 30.

Let’s now explore how PouchDB-find actually runs a query and finds the documents.

How a query works

Under the hood, PouchDB-find still uses Map/Reduce. Each query made with PouchDB-find uses a two step process (I am ignoring the sorting steps to keep this simpler). The first step is a Map/Reduce query using the selector fields as the Map/Reduce view parameters. This is followed by in-memory processing of the results from the Map/Reduce query.

When we defined our age index, PouchDB-find created a Map/Reduce view that emits the age field as the key. A basic version of the Map function would look something like this:

function (doc) {
   if (doc.age) {
     emit(doc.age, doc);
   }
}

The age field is emitted as the key and the doc as the value.

For our query of age: {$gte: 30}, the Map/Reduce parameters used to find the correct documents would be {startkey=30} and that would be run against the index created by the above Map function.

Suppose we want to do a range query, something like age: {$gte: 30, $lt: 35}, that is, finding all the documents with an age field greater than or equal to 30 and less than 35. The view query parameters would be {startkey=30, endkey=35, inclusive_end=false}.
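To make the mapping from selector to view parameters a bit more concrete, here is a minimal sketch in plain JavaScript. It is a simplified model, not PouchDB-find's actual code: buildView and queryView are hypothetical helpers, the sample documents are made up, and a real view would use a sorted B-tree lookup rather than the linear filter used here.

```javascript
// Simplified model of a Map/Reduce view keyed on age.
// buildView and queryView are hypothetical helpers, not PouchDB APIs.
const docs = [
  { _id: 'a', name: 'Ann', age: 28 },
  { _id: 'b', name: 'Ben', age: 30 },
  { _id: 'c', name: 'Cat', age: 34 },
  { _id: 'd', name: 'Dan', age: 35 },
  { _id: 'e', name: 'Eve' } // no age field, so the map function emits nothing
];

// Run the map function over every doc and keep the rows sorted by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((x, y) => x.key - y.key);
}

// Apply CouchDB-style range options to the sorted rows.
function queryView(rows, { startkey, endkey, inclusive_end = true } = {}) {
  return rows.filter((row) => {
    if (startkey !== undefined && row.key < startkey) return false;
    if (endkey !== undefined) {
      if (inclusive_end ? row.key > endkey : row.key >= endkey) return false;
    }
    return true;
  });
}

const ageView = buildView(docs, (doc, emit) => {
  if (doc.age) emit(doc.age, doc);
});

// age: {$gte: 30}
const atLeast30 = queryView(ageView, { startkey: 30 }).map((r) => r.id);

// age: {$gte: 30, $lt: 35}
const from30To35 = queryView(ageView, {
  startkey: 30,
  endkey: 35,
  inclusive_end: false
}).map((r) => r.id);

console.log(atLeast30);  // [ 'b', 'c', 'd' ]
console.log(from30To35); // [ 'b', 'c' ]
```

Note how inclusive_end: false is what turns the endkey into a strict $lt bound, so the document with age 35 drops out of the second result.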

In-memory processing

The above queries didn’t need any in-memory processing, as the fields in the query were all defined in the index. PouchDB-find only does in-memory processing if there are extra fields in the query that are not part of any index, or if the index being used has multiple fields. Consider the following example:

db.find({
  selector: {
    age: {$gte: 30},
    country: 'South Africa'
  }
}).then(function (result) {
  // list of people shown here
})

The above query will first do a Map/Reduce query for the age field with the parameters {startkey=30}. Using the results from the view query, it will then do some in-memory processing and select each document with the country field equal to South Africa. If we wanted to make this query more efficient and reduce the number of documents for in-memory processing, we could create another index that also includes the country field.

db.createIndex({
  index: {
    fields: ['age', 'country']
  }
});

Now the query for selector: {age: {$gte: 30}, country: 'South Africa'} would translate into view query parameters {startkey=[30, 'South Africa'], endkey=[{}, 'South Africa']}. There will still be some in-memory processing for the country field, but it would be run against far fewer documents.
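Why is the country field still checked in memory when it is part of the index? Array keys are compared element by element, so a range on [age, country] is mostly a range on age. The sketch below illustrates this with a simplified stand-in for CouchDB's view collation. The collate and typeRank helpers are hypothetical, only cover the types used here, and compare strings by code unit rather than with CouchDB's real collation rules.

```javascript
// Simplified stand-in for CouchDB view collation, covering only the
// types used in this example: numbers < strings < arrays < objects.
function typeRank(v) {
  if (typeof v === 'number') return 1;
  if (typeof v === 'string') return 2;
  if (Array.isArray(v)) return 3;
  return 4; // plain objects such as {} sort after everything else
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 1) return a - b;
  if (ra === 2) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 3) {
    // Arrays are compared element by element, then by length.
    const len = Math.min(a.length, b.length);
    for (let i = 0; i < len; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // objects are treated as equal in this sketch
}

// Rows of the ['age', 'country'] index: each key is [age, country].
const rows = [
  { id: 'p1', key: [29, 'South Africa'] },
  { id: 'p2', key: [30, 'South Africa'] },
  { id: 'p3', key: [31, 'Germany'] },
  { id: 'p4', key: [42, 'South Africa'] }
];

const startkey = [30, 'South Africa'];
const endkey = [{}, 'South Africa'];

// Step 1: the view query keeps rows with startkey <= key <= endkey.
const fromView = rows.filter(
  (row) => collate(row.key, startkey) >= 0 && collate(row.key, endkey) <= 0
);

// Step 2: the in-memory check on country removes the stragglers.
const matches = fromView.filter((row) => row.key[1] === 'South Africa');

console.log(fromView.map((r) => r.id)); // [ 'p2', 'p3', 'p4' ]
console.log(matches.map((r) => r.id));  // [ 'p2', 'p4' ]
```

The row for p3 (age 31, Germany) falls inside the key range because 31 sorts between 30 and {}, which is why the country check still has to happen after the view query.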

So now you have two indexes. If a query only has the age field, it will use the index with only the age field in it. If both the age and country fields are specified in the query, it will use the second index. This way you can limit the amount of in-memory processing, which makes the queries a lot faster.
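As a rough illustration of that choice, here is a small sketch that picks an index purely from the field names in the selector. pickIndex and the index names are hypothetical, and the rule used (take the usable index that covers the most queried fields) is an assumption made for this example. The real query planner is more involved and also looks at which operators are used.

```javascript
// Hypothetical index descriptions, mirroring the two createIndex calls above.
const indexes = [
  { name: 'age-index', fields: ['age'] },
  { name: 'age-country-index', fields: ['age', 'country'] }
];

// Pick the index whose fields all appear in the selector,
// preferring the one that covers the most fields.
// This is an illustrative rule, not PouchDB-find's actual planner.
function pickIndex(selector, indexes) {
  const queried = Object.keys(selector);
  const usable = indexes.filter((idx) =>
    idx.fields.every((field) => queried.includes(field))
  );
  if (usable.length === 0) return null;
  return usable.reduce((best, idx) =>
    idx.fields.length > best.fields.length ? idx : best
  );
}

const ageOnly = pickIndex({ age: { $gte: 30 } }, indexes);
const ageAndCountry = pickIndex(
  { age: { $gte: 30 }, country: 'South Africa' },
  indexes
);
const nameOnly = pickIndex({ name: 'Garren Smith' }, indexes);

console.log(ageOnly.name);       // age-index
console.log(ageAndCountry.name); // age-country-index
console.log(nameOnly);           // null
```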

Map/Reduce limits

Map/Reduce does have some limits, and this is where people can get stuck: only the equality and range comparison operators can be translated into a view query. You cannot convert a $regex or an $elemMatch into view query parameters, so only $eq, $gt, $gte, $lt, and $lte can be used for a view query. Any other operator has to be evaluated in-memory.
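One way to picture this split is a small helper that sorts the fields of a selector into those that could drive a view query and those that have to be checked in memory. splitSelector is a hypothetical function written for this post, not part of PouchDB-find, and it only handles flat selectors like the ones used here.

```javascript
// Operators that can be expressed as startkey/endkey view parameters.
const VIEW_OPERATORS = new Set(['$eq', '$gt', '$gte', '$lt', '$lte']);

// Hypothetical helper: classify each selector field by its operators.
function splitSelector(selector) {
  const viewFields = [];
  const inMemoryFields = [];
  for (const [field, condition] of Object.entries(selector)) {
    const isOperatorObject =
      condition !== null &&
      typeof condition === 'object' &&
      !Array.isArray(condition);
    // A bare value such as country: 'South Africa' is shorthand for $eq.
    const operators = isOperatorObject ? Object.keys(condition) : ['$eq'];
    const translatable = operators.every((op) => VIEW_OPERATORS.has(op));
    (translatable ? viewFields : inMemoryFields).push(field);
  }
  return { viewFields, inMemoryFields };
}

const split = splitSelector({
  age: { $gte: 30 },
  country: 'South Africa',
  name: { $regex: '^G' }
});

console.log(split.viewFields);     // [ 'age', 'country' ]
console.log(split.inMemoryFields); // [ 'name' ]
```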

So even with the two previously created indexes, I will get the error "There is no index available for this selector." if I try to do this:

db.find({
  selector: {
    age: {$mod: [10,0]},
    country: 'South Africa'
  }
}).then(function (result) {
  // list of people shown here
});

A very quick fix to that would be to do this:

db.find({
  selector: {
    _id: {$gt: null},
    age: {$mod: [10,0]},
    country: 'South Africa'
  }
}).then(function (result) {
  // list of people shown here
});

That uses db.allDocs() to fetch all the documents from the database and then processes the other two predicates in-memory. This would be fine with a small database, but as you can imagine it doesn’t scale well with a large number of documents. This is why it is always a good idea to test with your own database, target browsers and devices to make sure your queries are performant enough.
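In effect, the quick fix turns the query into a full scan. A minimal sketch of what that amounts to, using a made-up in-memory array in place of the database (matchesMod is a hypothetical helper that mimics the $mod operator for integer values):

```javascript
// Stand-in for every document in the database.
const everyDoc = [
  { _id: 'p1', age: 30, country: 'South Africa' },
  { _id: 'p2', age: 31, country: 'South Africa' },
  { _id: 'p3', age: 40, country: 'Germany' },
  { _id: 'p4', age: 50, country: 'South Africa' }
];

// Hypothetical helper mimicking $mod: [divisor, remainder].
function matchesMod(value, [divisor, remainder]) {
  return Number.isInteger(value) && value % divisor === remainder;
}

// Every document is examined, because no index narrows the set first.
let examined = 0;
const found = everyDoc.filter((doc) => {
  examined++;
  return matchesMod(doc.age, [10, 0]) && doc.country === 'South Africa';
});

console.log(examined);                // 4
console.log(found.map((d) => d._id)); // [ 'p1', 'p4' ]
```

The number of documents examined always equals the size of the database, which is the part that stops scaling.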

Effectively combining Map/Reduce and in-memory

Let’s look at one last query. If we want to find all the people who are 30 or older, live in South Africa and program in JavaScript, we could do this:

db.find({
  selector: {
    age: {$gte: 30},
    country: 'South Africa',
    languages: {
      $elemMatch: {$eq: 'javascript'}
    }
  }
}).then(function (result) {
  // list of people shown here
});

This query will work by first running a Map/Reduce query using the age and country fields, then it will do an in-memory operation on the results to find all documents with the javascript value in the languages array. This is the best use of in-memory operators as they are only applied to a subset of documents in the database. Ideally you always want the Map/Reduce query to do the heaviest lifting in terms of narrowing down the result set so in-memory processing is done on the smallest number of documents possible.
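To see how much work each step does, here is a small sketch with made-up data. The first filter is only a stand-in for the indexed part of the query (age and country), and the second is the in-memory check on the languages array. Neither is PouchDB-find's actual implementation.

```javascript
const people = [
  { _id: 'p1', age: 31, country: 'South Africa', languages: ['javascript', 'ruby'] },
  { _id: 'p2', age: 45, country: 'South Africa', languages: ['c#'] },
  { _id: 'p3', age: 38, country: 'South Africa', languages: ['c', 'javascript'] },
  { _id: 'p4', age: 52, country: 'Germany', languages: ['javascript'] },
  { _id: 'p5', age: 24, country: 'South Africa', languages: ['javascript'] }
];

// Step 1: stand-in for the indexed part of the query (age and country).
const candidates = people.filter(
  (p) => p.age >= 30 && p.country === 'South Africa'
);

// Step 2: the array check runs in memory, but only on the candidates.
const programmers = candidates.filter(
  (p) =>
    Array.isArray(p.languages) &&
    p.languages.some((lang) => lang === 'javascript')
);

console.log(people.length);                 // 5 documents in total
console.log(candidates.length);             // 3 reach the in-memory step
console.log(programmers.map((p) => p._id)); // [ 'p1', 'p3' ]
```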

Hopefully you now have a better understanding of how PouchDB-find works, and of how to use it to get the best results. Please leave any questions or feedback about PouchDB-find in the comments.