Wednesday, December 10, 2014

How to Set Up Clustering In CQ/AEM 6 using MongoDB

Background:


With CQ / AEM 6, TarPM is no longer supported. AEM 6 ships with Oak, which for now supports the TarMK and MongoMK MicroKernels out of the box. More information about what is new can be found at http://www.slideshare.net/AEMHub2014/oak-michael-marth . With this change, support for clustering moves to the storage layer itself (which makes more sense, given all the clustering issues in earlier versions). TarMK has no replication or sharding feature, so clustering in CQ / AEM 6 comes down to MongoDB, which supports both and hence enables high availability (HA, through replication) and scalability (through sharding, though this is still an open question; see note 1 below).

Here are step-by-step instructions for setting up clustering using MongoDB in CQ.

Prerequisites:


There are two cases for setting up a replica set:

Set up a new MongoDB instance:

  • Set up additional MongoDB instances based on the instructions above
  • Start any one of the instances using ./mongod --port <Your Port> --dbpath <Your DB Path> --replSet <Replica Set Name, can be anything> &
  • You can also use a configuration file to do this. More instructions here: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
  • Once MongoDB is started, you can add additional replicas using the following instructions
  • Once the replica set is up, set up AEM
  • Then you can go to each Mongo instance and check whether data is coming in by using the Mongo log
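
The replica set steps above can be sketched as plain data. This is a minimal sketch under assumptions: the host names (mongo1 to mongo3.example.com), the port 27017, the data path, and the replica set name aemrs are all hypothetical and not taken from the original setup. It builds the argument list for the mongod start command shown above and a configuration document of the shape the mongo shell helper rs.initiate() accepts.

```python
import json


def mongod_args(port, dbpath, repl_set):
    """Return the argument list for the mongod start command shown above."""
    return ["./mongod", "--port", str(port), "--dbpath", dbpath, "--replSet", repl_set]


def replica_set_config(repl_set, hosts):
    """Build a replica set configuration document.

    Each member needs a unique integer _id and a "host:port" string.
    """
    return {
        "_id": repl_set,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


# Hypothetical hosts, used for illustration only.
HOSTS = [
    "mongo1.example.com:27017",
    "mongo2.example.com:27017",
    "mongo3.example.com:27017",
]

if __name__ == "__main__":
    print(" ".join(mongod_args(27017, "/data/db", "aemrs")))
    print(json.dumps(replica_set_config("aemrs", HOSTS), indent=2))
```

The printed JSON could be pasted into rs.initiate(...) in the mongo shell on the first instance; the replica set name has to match the value given to --replSet on every member.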
Convert an existing Mongo instance:

  • Stop your AEM instance
  • Use the following instructions to convert Mongo to a replica set
  • Once this is done, change the AEM start script to add the Mongo replica instances as described in the first approach
  • Start your AEM instance
  • AEM should now be part of the replica set
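
As a rough illustration of the start script change, the sketch below builds a MongoDB connection string that lists every replica set member and wraps it in JVM system properties. It assumes the Oak properties oak.mongo.uri and oak.mongo.db (the second one is the "-Doak.mongo.db=" setting mentioned in the comments below); the host names, replica set name, and database name are hypothetical, so verify everything against the documentation for your AEM version.

```python
def mongo_uri(hosts, repl_set):
    """Build a MongoDB connection string that lists all replica set members."""
    return "mongodb://" + ",".join(hosts) + "/?replicaSet=" + repl_set


def aem_jvm_args(hosts, repl_set, db_name):
    """Return JVM options that point Oak at the replica set (assumed property names)."""
    return [
        "-Doak.mongo.uri=" + mongo_uri(hosts, repl_set),
        "-Doak.mongo.db=" + db_name,
    ]


if __name__ == "__main__":
    # Hypothetical hosts and names, used for illustration only.
    hosts = ["mongo1.example.com:27017", "mongo2.example.com:27017"]
    print(" ".join(aem_jvm_args(hosts, "aemrs", "aem-author")))
```

Listing several members in the URI lets the driver find the current primary even if the first host is down, which is the point of moving to a replica set.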

Backup and Restore

Please check https://docs.mongodb.org/v3.0/tutorial/backup-and-restore-tools/ for MongoDB's backup and restore instructions.

An automated script can be found here: https://github.com/micahwedemeyer/automongobackup/blob/master/src/automongobackup.sh . Just put this script under /etc/cron.daily and you are set for backups.
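
For a sense of what a daily backup job runs, here is a minimal sketch that assembles a mongodump command writing into a dated directory. It is not the linked script; the host, database name, and backup directory are hypothetical values chosen for illustration.

```python
import datetime


def mongodump_args(host, port, db_name, backup_root, day):
    """Return a mongodump argument list that writes into a per-day directory."""
    out_dir = backup_root.rstrip("/") + "/" + day.strftime("%Y-%m-%d")
    return [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--db", db_name,
        "--out", out_dir,
    ]


if __name__ == "__main__":
    # Hypothetical values; a cron job would pass today's date.
    args = mongodump_args(
        "mongo1.example.com", 27017, "aem-author",
        "/var/backups/mongo", datetime.date.today(),
    )
    print(" ".join(args))
```

One directory per day keeps restores simple, since mongorestore can be pointed at a single dated dump.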

Some Common Questions

Should I set up my AEM author instance on MongoDB?

Unless you have a clustering requirement, I would not suggest setting up your author instance with MongoDB, mainly because of the administrative overhead.

Should I set up my AEM publish instance on MongoDB?

Same as above: unless you have a requirement for shared content generation, I would suggest not using MongoDB. With AEM Communities, you now have the option to add Mongo persistence for community features at any time. More detail here: https://docs.adobe.com/docs/en/aem/6-1/administer/communities/srp/msrp.html and https://docs.adobe.com/docs/en/aem/6-1/administer/communities/srp/msrp/demo-mongo.html

Should I store blobs in MongoDB as well in AEM?

It is not recommended to store blob data in MongoDB. There are other options you can use in that case, such as local storage, NAS, or AWS. More detail: https://docs.adobe.com/content/docs/en/aem/6-1/deploy/platform/aem-with-mongodb.html#AEM Configuration and https://docs.adobe.com/docs/en/aem/6-1/deploy/platform/data-store-config.html

How can I secure my MongoDB deployment with AEM?

Notes:


1) Mongo replication only provides high availability (HA); it does not provide scalability. For scalability you need to use the sharding feature provided by Mongo. However, I am not sure what the best shard key would be. You can shard on the _id attribute. More information about sharding can be found here: http://docs.mongodb.org/manual/sharding/ . If you use sharding, I would suggest combining it with replication (shard, then replicate each shard instance) to provide both HA and scalability.

2) Mongo replication has many features, such as making certain replica instances read only (a data center replica), which you can use to avoid high latency across data centers. All the member configuration options are described here: http://docs.mongodb.org/manual/administration/replica-set-member-configuration/

3) MongoDB recently released MMS https://mms.mongodb.com/ to monitor and deploy Mongo clusters easily. This will be useful if you are worried about the administrative cost of Mongo.

4) If you don't want to store large documents in Mongo, feel free to use a custom data store using the instructions here: http://jackrabbit.apache.org/oak/docs/osgi_config.html

5) Mongo recently launched another feature, pluggable storage engines. You can use this for faster reads and writes based on your requirements (for example, a primary on write-optimized storage such as SSD, with reads served from cheaper storage). More info here: https://www.mongosoup.de/blog-entry/A-closer-look-at-pluggable-storage.html (official doc yet to come)

6) Official AEM Documentation: https://docs.adobe.com/content/docs/en/aem/6-1/deploy/platform/aem-with-mongodb.html
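
To make note 2 more concrete, here is a minimal sketch of a replica set configuration in which a member in a second data center is given priority 0, so it replicates data and can serve reads but is never elected primary. The host names and replica set name are hypothetical; priority is a standard replica set member setting, and the other available options are covered in the member configuration documentation linked in note 2.

```python
def member(member_id, host, can_become_primary=True):
    """Build one replica set member document.

    A member with priority 0 stays a secondary and is never elected primary.
    """
    doc = {"_id": member_id, "host": host}
    if not can_become_primary:
        doc["priority"] = 0
    return doc


def two_datacenter_config(repl_set, local_hosts, remote_hosts):
    """Local members are electable; remote data center members are not."""
    members = [member(i, h) for i, h in enumerate(local_hosts)]
    offset = len(members)
    members += [
        member(offset + i, h, can_become_primary=False)
        for i, h in enumerate(remote_hosts)
    ]
    return {"_id": repl_set, "members": members}


if __name__ == "__main__":
    # Hypothetical hosts, used for illustration only.
    print(two_datacenter_config(
        "aemrs",
        ["dc1-mongo1.example.com:27017", "dc1-mongo2.example.com:27017"],
        ["dc2-mongo1.example.com:27017"],
    ))
```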


Finally .... Some more Mongo Command ...



Special thanks to Nelson Mei for setting up the POC for Mongo with AEM.

11 comments:

  1. This looks like an awesome article. I have to start the conversion from TarMK to MongoMK; however, I found the migration tool doesn't work, so I am going to use VLT instead (I saw the reported bug in Jira).

    Thanks for your hard work. I will be stepping through this in the coming weeks.

    Replies
    1. Hello Eric,

      Not sure why you are using VLT. Do you mean you are trying to migrate everything using vlt sync?

      I think you might lose version and audit info in this process.

      Yogesh

  2. Thanks for the great information and for all of the other posts I have used as resources! This site has really become an awesome source of little-known configs and tidbits to point me in the right direction.

    I have a question related to a combined TarMK and MongoMK configuration shown here:
    http://docs.adobe.com/docs/en/aem/6-0/deploy/recommended-deploys.html#MongoMK%20with%20Active/Active%20Cluster

    If I am using TarMK for the Author, does the publish agent point to just one Publish Instance (since they all share a single backend), to all Publish Instances, or directly to MongoDB? If it does point to MongoDB, what does the configuration for the Publish Agent look like?

    I appreciate any direction you can provide, thanks!

    Replies
    1. If all publish instances share common data, for example user generated content, that needs to be in sync on all of them at the same time, you can use MongoMK. In that setup, all publish instances point to a common Mongo replica cluster as shown above. You can also have a setup where the Mongo master is used for writes and the others for reads. This will help to improve read latency in such a setup.

      Yogesh

    2. Thanks! I understand having the publish instances using MongoMK and pointing to a common MongoDB, but if I am using TarMK with an Author Instance creating Authored Content for those Publish Instances, how is the content replicated? By pointing to one of the Publishes or by pointing it to the MongoDB directly somehow?

    3. Hello George,

      Sorry for the confusion. In this case you need to set up replication to just one publish instance (preferably a publish instance pointing to the master replica), as replication to the other publish instances happens at the Mongo level.

      Yogesh

    4. I appreciate the clarification! The diagram showed the Author pointing to MongoDB and I was not sure if it should be replicating directly or through a Publish Instance. It makes sense to let the one Publish ingest the replication and store them in MongoMK for all other Publishes to access. But if there was a way to Publish directly to the MongoDB from the Author instance, I thought I should try it!

      Thanks again, and your "-Doak.mongo.db=" config example saved me a bunch of time. It seems to be missing from quite a few of the other examples I have seen.

    5. I did not see the diagram. But that is possible if you are using the same Mongo for both author and publish (you need to create different collections in the same database, however). Even in that case you need to set up at least one replication agent.

      Yogesh

  3. Hello Yogesh, thanks for the clear explanation of the AEM and Mongo integration. Please find my question below and let me know your thoughts.
    We plan to set up 1 author with 3 publish instances (publish1, publish2, publish3). The author is planned as TarMK and the publish instances would be on MongoDB. We have 3 Mongo instances: 1 primary (mongo1) and 2 secondaries (mongo2, mongo3).
    publish1 is installed pointing to mongo1, and we were able to replicate from author to publish1 using tree activation.
    Now when we try to install publish2 pointing to mongo1 or to the other instances, mongo2 or mongo3, we get the following error
    ---------------------
    java.lang.RuntimeException: Error occurred while obtaining InputStream for blobId [6c7055bf3fa87c47c2ade09f80849db9e4727e07#10816]
    ---------------------------------------------------

    My question is whether the individual publish instances should each be installed against a fresh Mongo, with the Mongo instances replicated at a later point.

    Replies
    1. Hello Krish,

      Any reason why pub2 and pub3 are pointing to mongo2 and mongo3 only? In theory you should use the same Mongo config across all publish instances, as at any point there is only one master (primary) in a Mongo replication cluster. Pointing a publish instance to just a slave might not work. For more detail please check http://docs.mongodb.org/manual/core/replication-introduction/

      Yogesh

    2. Hi Yogesh,
      Can you provide a recommended replication setup for MongoDB publishers? Specifically, I am looking at replication from author to one publisher versus replicating from author to all publishers via a load balancer. Can a load balancer be used for replication?
      Thanks.
