API
Database Creation
There are two configuration parameters that control the sharding
topology of a BigCouch database. The defaults are specified in the
[cluster] block of the server configuration file and may be
overridden at database creation time. N specifies the number of
replicas of each document that are stored, while Q fixes the number
of partitions of the database. The following command creates a
database with 32 partitions in which each document is stored 3
times:
curl -X PUT 'http://loadbalancer:5984/test_db?n=3&q=32'
Document Updates
curl -X PUT http://loadbalancer:5984/test_db/doc_1 -H content-type:application/json -d '{"a":1,"b":2}'
BigCouch accepts a w query-string parameter on updates which
overrides the default write quorum for the database. When BigCouch
writes the N copies of each document it will respond to the client
after W of them have been committed successfully (the operations to
commit the remaining copies will continue in the background).
If W copies cannot be committed successfully, BigCouch will respond with either a
202 Accepted if at least one copy is saved, or a 409 Conflict if all hosts conclude that the
update is based on an outdated revision.
W defaults to the simple majority of N, which is
the recommended choice for most applications.
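As a rough illustration of the quorum arithmetic, the sketch below computes a simple majority for a given N and builds an update URL that sets w explicitly. The helper names majority and update_url are made up for this example, and "simple majority" is assumed to mean floor(N/2) + 1; the host, database, and document names are the placeholders used above. Nothing is sent to a server.

```shell
# Hypothetical helper: simple majority of N replicas, assumed to be floor(N/2) + 1.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: build (without sending) an update URL that overrides w.
update_url() {
  echo "http://loadbalancer:5984/test_db/doc_1?w=$1"
}

# With N=3 the assumed default quorum is 2; build the matching URL.
majority 3
update_url "$(majority 3)"
```

To actually perform the write, the URL would be passed to curl together with the -X PUT, -H, and -d arguments shown in the update example above.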
Document Reads
curl http://loadbalancer:5984/test_db/doc_1
As in the case of updates, there is an r query-string parameter that
sets the quorum for reads. When BigCouch reads a document it issues
requests to all N copies of the partition hosting the document and
responds to the client when R matching success responses are
received. The default quorum is the simple majority of N and is
the recommended choice for most applications.
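A read that overrides the quorum might look like the following sketch. The helper name read_cmd is hypothetical; it only prints the curl command, so the result can be inspected before it is run against a real cluster.

```shell
# Hypothetical helper: print the curl command for reading a document.
# $1 = document id, $2 = optional read quorum (omit to keep the default).
read_cmd() {
  url="http://loadbalancer:5984/test_db/$1"
  if [ -n "${2:-}" ]; then
    url="$url?r=$2"
  fi
  echo "curl '$url'"
}

read_cmd doc_1      # default read quorum
read_cmd doc_1 1    # respond after a single successful reply
```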
The differences between BigCouch and CouchDB
When CouchDB runs standalone it listens on some port, typically
5984. BigCouch can run this way but usually several instances will
be run as part of a cluster of nodes. Each can be viewed more or less
as a single CouchDB instance but it actually listens on two ports.
Here's what a typical config file on a programmer's machine might
look like:
[chttpd]
port = 5984
docroot = /Users/bitdiddle/emacs/bigcouch/rel/dev1/share/www
[httpd]
port = 5986
The chttpd stanza specifies the front end port, which supports the
user API, and the httpd stanza specifies the back door or admin
port. So, e.g., when adding nodes to a cluster or querying membership
one might make calls like:
curl -X PUT http://127.0.0.1:15986/nodes/dev2@127.0.0.1 -d {}
against the admin port. Usually a load balancer would be used in
front of BigCouch and it would be passing calls to the 5984 ports.
So each node in the cluster has a config file that names these ports.
Since BigCouch makes use of distributed Erlang there are also some
key parameters in the vm.args file, in particular -name and
-setcookie that are important to set correctly. The -name must be
the name of a node as specified above, e.g. dev2@127.0.0.1, and
-setcookie sets a cookie common to all Erlang nodes in a
cluster. More details are documented in the comments in the vm.args
file.
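To make the relationship between the node name, the vm.args entries, and the admin-port call concrete, here is a small sketch that prints all three for one node. The function name node_setup and the cookie value are placeholders invented for the example; the address and port are copied from the call shown above. Nothing is written to disk or sent to a server.

```shell
# Hypothetical helper: print the vm.args lines and the join command for one node.
# $1 = node name (must be identical in both places)
# $2 = cookie shared by every node in the cluster
node_setup() {
  printf '%s\n' "-name $1" "-setcookie $2"
  printf '%s\n' "curl -X PUT http://127.0.0.1:15986/nodes/$1 -d {}"
}

node_setup dev2@127.0.0.1 example_cookie
```

The first two lines of output belong in that node's vm.args file; the third is the call made against the admin port of a node that is already in the cluster.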
Zones
BigCouch also supports zones. Enabling them is as easy as adding the number of zones to the cluster config:
[cluster]
z = 3
and then editing the node entries for each node in a zone to add a zone field, e.g.:
{"_id": "dev1@127.0.0.1", "zone": "parts_unknown"}
API differences
BigCouch embeds CouchDB and strives to maintain API compatibility, but differences do arise, often due to fundamental constraints of programming distributed systems. In this section we outline those differences.
BigCouch does not allow the user to _restart the entire cluster.
The code behind the _replicate endpoint in BigCouch is taken directly from Apache CouchDB. As such there is no sensible notion of a "local" database as a source or target. Users wishing to replicate with a BigCouch database should specify the database using a full URL.
Cluster-wide _stats are not provided. Statistics collection can be accomplished via the backdoor port.
DB and view index _compact resources are not supported. Compaction is triggered for each database partition individually via the backdoor HTTP interface.
Update sequences are JSON Arrays instead of simple integers.
The _changes feed is not totally ordered but does honor all the semantics required for replication and third party synchronization. Two calls to _changes will likely yield a different order of results. There is no guarantee even for a single user that changes made in a certain order will appear that way as writes are distributed to multiple nodes.
Incremental requests to _changes using the ?since query-string parameter may show the same update multiple times if the set of database partitions used to generate the previous response cannot be contacted; i.e., BigCouch will contact another replica of that partition and merge its local _changes feed into the result instead of refusing the request.
_purge is not supported.
Temporary views are not supported. It's widely thought that these should be removed completely from CouchDB so we've taken the lead.
all_or_nothing semantics on _bulk_docs are not supported in BigCouch.
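Because _replicate has no notion of a local database, a replication request names both ends by full URL. The sketch below only builds the JSON body; the helper name replicate_body, the target host other-cluster, and the database name test_db_copy are hypothetical, and the source and target field names are those of the CouchDB _replicate API.

```shell
# Hypothetical helper: build a _replicate request body from two full database URLs.
# $1 = source database URL, $2 = target database URL
replicate_body() {
  printf '{"source": "%s", "target": "%s"}\n' "$1" "$2"
}

replicate_body http://loadbalancer:5984/test_db http://other-cluster:5984/test_db_copy
```

The resulting body would be sent with a POST to the _replicate endpoint on the front end port, using the same content-type header as the update example.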