
How to Use mongo-connector with Elasticsearch

Last week I wrote about how Jixee replaced Elasticsearch Rivers with mongo-connector.  This week I wanted to write up a tutorial to give you an idea of how you might implement this in your own environment. We’ll use a simple example for demo purposes, and we’ll be running apps from the command line in multiple terminals, instead of using start scripts.  Since this is a tutorial, we’ll start from scratch.

Here’s what you’ll need:

  • Ubuntu 14.04.3 (an instance with at least 2 GB of RAM for this example)
  • curl
  • Java JRE 1.8.xx
  • Python 2.7.6 and pip
  • MongoDB 3.0.7 (or higher)
  • Elasticsearch 2.0.0 (or higher)
  • mongo-connector (latest)

Installation

Let’s kick it off by installing what we need to get started. At certain steps I’ll have you open a separate terminal and keep it open, because we’ll run some of our services in it.

1. Install Python, pip, and curl

sudo apt-get install python2.7 python-pip curl

2. Install MongoDB

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
sudo apt-get update
sudo apt-get install mongodb-org

3. Install Java

sudo apt-add-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Let’s make sure Java installed correctly:

java -version

You should see something like:

java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
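If you’re provisioning several machines, you may prefer to script this check rather than eyeball it. Here’s a minimal Python 3 sketch; java_major_version is a hypothetical helper, not part of any tool in this tutorial. It pulls the major version out of the java -version output and treats the legacy "1.8.0_66" numbering as Java 8:

```python
import re


def java_major_version(version_output: str) -> int:
    """Return the major Java version found in `java -version` output.

    Legacy releases report "1.8.0_66" (major version 8); Java 9 and later
    report e.g. "11.0.2" or just "17".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.x" numbering
    return first


# Example against the installed JVM (java -version prints to stderr):
# import subprocess
# out = subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
# assert java_major_version(out) >= 8
```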

4. Install Elasticsearch 2.0.0

cd /opt
sudo curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.0.0/elasticsearch-2.0.0.tar.gz
sudo tar -xvf elasticsearch-2.0.0.tar.gz

Set the JAVA_HOME environment variable:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Now let’s give our user ownership of the elasticsearch-2.0.0 directory. We need to do this because we’ll run Elasticsearch as our own user rather than as root (Elasticsearch 2.x refuses to start as root):

sudo chown -R $USER elasticsearch-2.0.0
cd elasticsearch-2.0.0
./bin/elasticsearch

Now leave this terminal open off to the side somewhere, and open a new terminal. In that new terminal, test to see if Elasticsearch is up:

curl http://localhost:9200

You should see something like this (the node name is randomized on a default Elasticsearch install):

{
  "name" : "Suprema",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.0.0",
    "build_hash" : "de54438d6af8f9340d50c5c786151783ce7d6be5",
    "build_timestamp" : "2015-10-22T08:09:48Z",
    "build_snapshot" : false,
    "lucene_version" : "5.2.1"
  },
  "tagline" : "You Know, for Search"
}
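The same check can be scripted. This Python 3 sketch (the helper names es_version and fetch_root are hypothetical) fetches the root endpoint with the standard library and reads version.number from the JSON shown above. It assumes Elasticsearch is listening on localhost:9200, as in this tutorial:

```python
import json
import urllib.request


def es_version(root_response: str) -> str:
    """Extract version.number from the JSON body of GET / on Elasticsearch."""
    return json.loads(root_response)["version"]["number"]


def fetch_root(base_url: str = "http://localhost:9200") -> str:
    """GET the Elasticsearch root endpoint and return the raw response body."""
    with urllib.request.urlopen(base_url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Example (requires the local Elasticsearch node to be running):
# version = es_version(fetch_root())
# assert version.startswith("2."), version
```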

 

One more install left; stay in this new terminal for the next step.

5. Install mongo-connector with pip:

sudo pip install mongo-connector

Configure MongoDB

Edit the MongoDB config file to turn on replication. mongo-connector syncs by tailing MongoDB’s replica set oplog, so replication must be enabled for it to work. Here is the complete config file I used.

Replace /etc/mongod.conf with the following:

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# network interfaces
net:
  port: 27017
#  bindIp: 127.0.0.1


#processManagement:

#security:

#operationProfiling:

replication:
  replSetName: rs0
  oplogSizeMB: 100

#sharding:

## Enterprise-Only Options:

#auditLog:

#snmp:

Now let’s restart MongoDB:

sudo service mongod restart

Next, we need to turn on replication by initiating the replica set. Start the MongoDB shell:

mongo

(You’ll see some startup warnings, which you can ignore for the purposes of this demonstration. If you’re deploying MongoDB in production, make sure to address them.)

Now initiate the replica set:

rs.initiate()

After a few seconds (press Enter to refresh it), the prompt should change to:

rs0:PRIMARY>
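To confirm the replica set from a script instead, MongoDB’s ismaster command reports whether a mongod is the primary and which replica set it belongs to. This Python 3 sketch assumes pymongo 3.11 or newer (pymongo is a mongo-connector dependency; the directConnection option needs 3.11+), and the helper names are hypothetical. The default set name rs0 matches replSetName in the config above:

```python
def is_replset_primary(reply: dict, set_name: str = "rs0") -> bool:
    """True if an ismaster reply comes from the primary of `set_name`."""
    # A standalone mongod has no setName, so it fails this check too.
    return bool(reply.get("ismaster")) and reply.get("setName") == set_name


def fetch_ismaster(uri: str = "mongodb://localhost:27017") -> dict:
    """Run the ismaster command against a single mongod (needs pymongo 3.11+)."""
    from pymongo import MongoClient  # lazy import so the check above needs no driver

    # directConnection=True talks to this one node instead of discovering the set
    client = MongoClient(uri, directConnection=True)
    try:
        return client.admin.command("ismaster")
    finally:
        client.close()


# Example (requires the local mongod to be running):
# print(is_replset_primary(fetch_ismaster()))
```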

Stay in this prompt for the next section.

 

Create Some Test Data

Alright, now we’re ready to create some test data. While you’re still in the mongo shell, run the following commands.

Switch to a new database (MongoDB creates it when you first write to it):

use connectortest

Create a new collection to test with:

db.createCollection("syncthis");

Now let’s seed this collection with some sequential test data. The mongo shell runs JavaScript directly, so a small for loop will do:

for (var i = 1; i <= 10; i++) {
   db.syncthis.insert( { value : i } )
}

Let’s check to make sure our data is there:

db.syncthis.find()

You should see something like this (your ObjectIds will differ from mine):

{ "_id" : ObjectId("5643023b236a08630cce8dd1"), "value" : 1 }
{ "_id" : ObjectId("5643023b236a08630cce8dd2"), "value" : 2 }
{ "_id" : ObjectId("5643023b236a08630cce8dd3"), "value" : 3 }
{ "_id" : ObjectId("5643023b236a08630cce8dd4"), "value" : 4 }
{ "_id" : ObjectId("5643023b236a08630cce8dd5"), "value" : 5 }
{ "_id" : ObjectId("5643023b236a08630cce8dd6"), "value" : 6 }
{ "_id" : ObjectId("5643023b236a08630cce8dd7"), "value" : 7 }
{ "_id" : ObjectId("5643023b236a08630cce8dd8"), "value" : 8 }
{ "_id" : ObjectId("5643023b236a08630cce8dd9"), "value" : 9 }
{ "_id" : ObjectId("5643023b236a08630cce8dda"), "value" : 10 }
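You can also read the collection back from Python, for example to compare it with Elasticsearch later. A minimal sketch, assuming Python 3 and pymongo 3.11 or newer; fetch_values and sorted_values are hypothetical helper names. It drops _id because ObjectIds differ on every machine, and it sorts because MongoDB does not guarantee result order without an explicit sort:

```python
def sorted_values(docs) -> list:
    """Return the `value` field of each document, sorted ascending."""
    return sorted(doc["value"] for doc in docs)


def fetch_values(uri: str = "mongodb://localhost:27017") -> list:
    """Read connectortest.syncthis and return its sorted values (needs pymongo)."""
    from pymongo import MongoClient  # lazy import so sorted_values needs no driver

    client = MongoClient(uri, directConnection=True)
    try:
        # Project away _id and keep only the field we care about
        cursor = client.connectortest.syncthis.find({}, {"value": 1, "_id": 0})
        return sorted_values(cursor)
    finally:
        client.close()


# Example (requires the local mongod to be running):
# assert fetch_values() == list(range(1, 11))
```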

 

Alright, we’re ready to fire up mongo-connector. Keep this terminal open for later, and open a new one.

 

Set up mongo-connector

From the new terminal, starting mongo-connector takes a single command:

mongo-connector -m localhost:27017 -t http://localhost:9200 -o /opt/mongo-connector.oplog -d elastic_doc_manager -n connectortest.syncthis

For completeness, here’s what each parameter in the command above does:

-m    MongoDB host and port (mongodb_host:27017)

-t    URL of the target system (http://elasticsearch_host:9200)

-o    file where mongo-connector records its progress through the oplog (mongodb_oplog_position.oplog)

-d    doc manager for the target system; here elastic_doc_manager, others include solr_doc_manager and mongo_doc_manager

-n    comma-separated list of namespaces to sync (database.collection,database.another_collection)
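If you launch mongo-connector from a script, building the argument list in one place makes it easy to get -n right: every namespace goes into a single comma-separated value. A Python 3 sketch; connector_args is a hypothetical helper, and it uses only the flags described above:

```python
def connector_args(mongo_host, target_url, oplog_file, doc_manager, namespaces):
    """Assemble a mongo-connector command line like the one in this tutorial."""
    for ns in namespaces:
        # Each namespace must look like database.collection
        if "." not in ns:
            raise ValueError("namespace must be database.collection: %r" % ns)
    return [
        "mongo-connector",
        "-m", mongo_host,
        "-t", target_url,
        "-o", oplog_file,
        "-d", doc_manager,
        "-n", ",".join(namespaces),  # one comma-separated value
    ]


# Example (runs in the foreground until interrupted):
# import subprocess
# subprocess.run(connector_args("localhost:27017", "http://localhost:9200",
#                               "/opt/mongo-connector.oplog", "elastic_doc_manager",
#                               ["connectortest.syncthis"]), check=True)
```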

Verify mongo-connector is working

We’ll need yet another terminal for this step (sorry for all the terminals!). Run:

curl 'http://localhost:9200/connectortest/_search?pretty'

You should see the documents you created in your MongoDB collection above, in no particular order. Another check is the document count in the index:

curl 'localhost:9200/connectortest/_count?pretty'

The output should look like:

{
  "count" : 10,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}
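Both checks can be scripted with the Python 3 standard library. In this sketch (helper names are hypothetical), parse_count reads the count field of a _count response, and source_values collects the value field from each search hit, sorted so it can be compared with MongoDB. Note that _search returns only 10 hits by default, so pass a larger size once the index holds more documents:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the local node from this tutorial


def get_json(url: str) -> dict:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parse_count(body: dict) -> int:
    """Return the document count from a _count response."""
    return body["count"]


def source_values(body: dict) -> list:
    """Return the sorted `value` fields from a _search response's hits."""
    return sorted(hit["_source"]["value"] for hit in body["hits"]["hits"])


# Example against the live node:
# print(parse_count(get_json(ES_URL + "/connectortest/_count")))
# print(source_values(get_json(ES_URL + "/connectortest/_search?size=100")))
```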

Add more data to MongoDB

Alright, we’re almost done. The last thing to check is that new data added to MongoDB gets synced to our Elasticsearch instance. Go back to the mongo shell terminal and insert five more documents:

for (var i = 1; i <= 5; i++) {
   db.syncthis.insert( { value : i } )
}

Now verify the new data with the count query from above:

curl 'localhost:9200/connectortest/_count?pretty'

If the connector worked, you should see:

{
  "count" : 15,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}
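mongo-connector tails the oplog asynchronously, and Elasticsearch makes new documents searchable only after a refresh (every second by default), so the count can lag briefly behind MongoDB. In a script it’s safer to poll than to check once. A Python 3 sketch; wait_for_count is a hypothetical helper that accepts any zero-argument function returning the current count:

```python
import time


def wait_for_count(fetch_count, expected, timeout=30.0, interval=1.0):
    """Poll fetch_count() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected count was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if fetch_count() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (the fetcher below is an assumption; any callable returning an int works):
# import json, urllib.request
# def es_count():
#     with urllib.request.urlopen("http://localhost:9200/connectortest/_count") as r:
#         return json.load(r)["count"]
# print(wait_for_count(es_count, 15))
```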

 

And there you have it. If you’re still with me, you can see that mongo-connector is a viable and easy way to sync your MongoDB data into Elasticsearch. This is obviously a simplistic demo, but the same ideas can be adapted to your own MongoDB deployments. For a real deployment you’d want start scripts for Elasticsearch and mongo-connector, so they run as services and write to log files instead of the terminals we used here.

Thanks for reading!

  • sampath kumar

    Hi! Thank you very much for the tutorial. I’ve successfully connected MongoDB and Elasticsearch by following it, but I’m having a problem indexing any databases other than the one shown in the tutorial. Can you please help me find out what the problem is?

    • Eric

      Hi Sampath, no problem! Sorry you’re having issues. You should be able to add databases and collections to the mongo-connector start line like this:

      mongo-connector -m localhost:27017 -t http://localhost:9200 -o /opt/mongo-connector.oplog -d elastic_doc_manager -n connectortest.syncthis,anotherdb.collection_name

      Have you already tried that?

      • mlioti

        I tried this as well and it did not work. Can you explain why?

        Every time I search in Elasticsearch I only see the values from the syncthis collection. I can’t seem to query or search my preexisting databases.

  • SivaCoHan

    Thanks for sharing. It helps a lot.

    • Eric

      Absolutely! I’m glad you found this helpful.

  • Laurent Sarrazin

    Thanks for this tutorial. Really helpful. It seems that “elastic2_doc_manager” should be used instead of “elastic_doc_manager”.

  • Amit Kumar

    I have billions of records in MongoDB, with continuous write operations. Is mongo-connector recommended for a production environment? And what should we do if data stops syncing to Elasticsearch? Do we have to recreate the index, or is there another solution?

  • Victor Tang

    Thank you for the tutorial. For everyone who has problems with the configuration part:
    1. Pay attention to the db path.
    2. Use “mongod -f ” to start MongoDB.