Posts Tagged ‘NoSQL’

Rediscovering BerkeleyDB on the Raspberry Pi

Friday, August 29th, 2014

BerkeleyDB (BDB) is an embedded database designed for in-application datastores and small devices. At its most basic, it’s a simple key-value store of byte arrays.

Like SQLite, the entire database is self-contained in a series of files, with no separate server process or network port overhead.

Unlike SQLite, it also has good support for managing concurrent transactions, plus more advanced database features such as live backups and replication.

Querying is also surprisingly advanced. Beyond hashtable style key lookups, the keys themselves are indexable and searchable, and thanks to the secondary index feature, it’s also possible to do complex queries within the data values themselves.

While it’s possible to get a full-fledged relational dbms like PostgreSQL running on a Raspberry Pi, the fact that BDB runs with minimal resources is useful in a constrained environment, especially if you have disabled swap, and undertaken other cruel and unusual measures to preserve SD card life so that your Pi can run as close to 24x7x365 as possible.

Back in 2009, I was using it extensively on a messaging and workflow project, and I revisited that code when it came time to decide on a datastore for a new, long-running server application designed to run on a Pi.

I’ve cleaned up that codebase and made it available as a public repo on github:

Node.js and MongoDB: A Simple Example

Thursday, September 22nd, 2011

Update, October 14, 2012:
If you are interested in creating highly scalable applications with MongoDB, you should really consider using Go (#golang) instead, and check out the Go version of this post.

I’ve been learning Node.js. After getting through an excellent basic tutorial, I wanted to experiment with connecting to MongoDB, since it would be helpful for some enhancements I have planned for Stealth Mode Watch and BookHunch.

There are several drivers available, but the most commonly used and recommended one is node-mongodb-native.

Despite the examples, though, it wasn’t clear how to collect and return the results of a single query.

Both the simple example and the queries example just dump the results to the console within the cursor callback, and the best answer on the normally reliable StackOverflow site was problematic too: the server code was written in a blocking style without callbacks, which defeats the purpose of using Node.js in the first place.

Fortunately, though, there is a terrific article about control flow on How to Node which explains how to do multiple asynchronous tasks (either in parallel or serially), and collect all the results.

The examples in the control flow article dealt with accessing files from different folders, which was simple enough to change to running MongoDB queries instead.

The first step is to install node-mongodb-native using npm:

npm config set loglevel info
npm install mongodb

The first line essentially sets npm to a more verbose logging level, so it tells you what it’s doing as it runs; the second actually installs the node-mongodb-native module (it seems that the native parser is obsolete, so there is no need to use the --mongodb:native switch in the second line).

Here’s how to execute multiple queries and collect their results.

var Db = require('mongodb').Db,
    Connection = require('mongodb').Connection,
    Server = require('mongodb').Server;

var host = process.env['MONGO_NODE_DRIVER_HOST'] != null ? 
           process.env['MONGO_NODE_DRIVER_HOST'] : 'localhost';
var port = process.env['MONGO_NODE_DRIVER_PORT'] != null ?
           process.env['MONGO_NODE_DRIVER_PORT'] : Connection.DEFAULT_PORT;

These lines just load the db module and prepare for a connection; no connection has been opened yet.

This function executes the query in the db, collects the result documents in a list, and passes them to a callback function when it’s done iterating through the query cursor:

function runQuery (db, myCollection, query, nextFn) {
    // perform the {query} on the collection and invoke nextFn when done
    db.open(function(err, db) {
        db.collection(myCollection, function(err, collection) {
            collection.find(query, function(err, cursor) {
                cursor.toArray(function(err, docs) {
                    console.log("Found " + docs.length + " documents");
                    var queryResults = [];
                    for(var i=0; i<docs.length; i++) {
                        queryResults[queryResults.length] = docs[i];
                    }
                    db.close();
                    nextFn(queryResults);
                });
            });
        });
    });
}

So that leaves opening the connection, defining the query or queries, and calling runQuery().

Suppose we have a database consisting of two collections, people and companies, and that we want to search for a given string anywhere among a person’s name, a company’s name, or even a company’s address.

Our search function would look like this:

function search (personName, companyName, address, nextFn) {
    var data = [], count = 3;
    var doneFn = function(results) {
        data = data.concat(results);
        count--;
        if( count <= 0 ) {
            var uniqueResults = [];
            for(var i=0; i<data.length; i++) {
                if( ! uniqueResults[data[i]['_id']] ) {
                    uniqueResults[uniqueResults.length] = data[i];
                    uniqueResults[data[i]['_id']] = true;
                }
            }
            nextFn(uniqueResults);
        }
    };
    runQuery(new Db('mydb', new Server(host, port, {})), 'people',
             {'name':new RegExp('^'+personName, 'i')}, doneFn);
    runQuery(new Db('mydb', new Server(host, port, {})), 'companies',
             {'name':new RegExp('^'+companyName, 'i')}, doneFn);
    runQuery(new Db('mydb', new Server(host, port, {})), 'companies',
             {'address':new RegExp(address, 'i')}, doneFn);
}

Just like the control flow article, we use a counter in our callback function — doneFn — to check when there are no more queries to run, and once that’s the case, we iterate through the combined list of results and use the document ObjectId to ensure the list of results is unique.
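The counter pattern is easy to see in isolation. Here is a minimal, MongoDB-free sketch (the function and variable names are mine, not from the control flow article):

```javascript
// Build one shared callback that collects results from n parallel
// tasks and fires nextFn exactly once, when the last task reports in.
function makeCollector(n, nextFn) {
    var data = [], count = n;
    return function doneFn(results) {
        data = data.concat(results);
        count--;
        if (count <= 0) {
            nextFn(data);
        }
    };
}

// usage: three "queries" reporting through the same doneFn
var doneFn = makeCollector(3, function(all) {
    console.log("collected " + all.length + " results");
});
doneFn(['a']);
doneFn(['b', 'c']);
doneFn(['d']);   // prints "collected 4 results"
```

Since doneFn only counts completions, it doesn’t matter in what order the parallel tasks finish.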

That unique list of results is then passed to the callback function given to search(), i.e., nextFn, which ultimately decides what to do with the combined results.

Each call to runQuery() needs its own db connection, opened like this:

new Db('mydb', new Server(host, port, {}))

because each query will run asynchronously and independently of each other.

It’s also possible to define a single connection, once, before any of the runQuery() calls happen:

var db = new Db('mydb', new Server(host, port, {}));

and just pass the db variable to runQuery(), but then the db.close() line would have to be removed from inside the runQuery() function, and placed inside the doneFn instead.
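To make that shared-connection flow concrete without a running MongoDB, here is a sketch in which makeFakeDb (my own stand-in, not part of the driver) mimics just enough of the collection/find/toArray API to show where close() moves:

```javascript
function makeFakeDb(collections) {
    // stand-in for an opened node-mongodb-native connection
    return {
        closed: false,
        collection: function(name, cb) {
            var docs = collections[name] || [];
            cb(null, {
                find: function(query, cb2) {
                    cb2(null, { toArray: function(cb3) { cb3(null, docs); } });
                }
            });
        },
        close: function() { this.closed = true; }
    };
}

// runQuery no longer opens or closes the connection it is given
function runQuery(db, myCollection, query, nextFn) {
    db.collection(myCollection, function(err, collection) {
        collection.find(query, function(err, cursor) {
            cursor.toArray(function(err, docs) {
                nextFn(docs);
            });
        });
    });
}

var db = makeFakeDb({ people: [{_id: 1}], companies: [{_id: 2}] });
var data = [], count = 2;
var doneFn = function(results) {
    data = data.concat(results);
    count--;
    if (count <= 0) {
        db.close();   // the one place the shared connection is closed
        console.log("closed after " + data.length + " results");
    }
};
runQuery(db, 'people', {}, doneFn);
runQuery(db, 'companies', {}, doneFn);
```

The trade-off: one connection means less setup per query, but doneFn now has the extra responsibility of knowing when everyone is finished with it.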

As for the queries themselves, the beautiful thing about using JavaScript and MongoDB together is that any native JavaScript object doubles as the query input to MongoDB.

So any query that works in the MongoDB command line will work in JavaScript with little or no transformation necessary, unlike the hoops you have to jump through with Python and pymongo.
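For example, the shell query db.people.find({ name: /^Jones/i }) is, in Node, just a plain object (using the same field names as the examples above):

```javascript
// the RegExp constructor form behaves exactly like the shell's
// /^Jones/i literal, including the case-insensitive flag
var query = { 'name': new RegExp('^Jones', 'i') };

console.log(query['name'].test('Jones & Co.'));   // true
console.log(query['name'].test('Smith'));         // false
```

Passing that object straight to collection.find() is all the "translation" required.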

Finally, the function that calls search() has to decide what to do with the results.

If we’re writing an API and want the results sent back over HTTP as json, all we have to do is this (within the larger context of an http server in Node.js, of course):

search('Jones', 'Jones', 'California', function(results) {
    var replyJson = {"warning":"No matches found"};
    if( results.length > 0 ) {
	replyJson = {"result":{ "matched":results.length, "matches":results}};
    }
    response.writeHead(200, {"Content-Type": "application/json"});
    response.end(JSON.stringify(replyJson));
});

This example searches the database for people or companies named Jones, or companies located in California.

The final callback here gets passed the unique list of results from doneFn inside search() (i.e., it is the nextFn passed into search()) and generates an HTTP response in JSON format.