Tuesday, October 23, 2012

How to Manage Hot Backup or Disaster Recovery in CQ

There are several approaches; I list them here with their pros and cons.


Approach 1: Use clustering, but direct the requests only to one node. In case of problems with this node just switch over to the other node. Essentially this is an active-passive scenario.
+ proven technology, documented and fully supported feature
+ automatic fail over easily possible
- additional license cost
- latency between nodes might affect performance
- managing a cluster is sometimes difficult
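
The switch-over in Approach 1 can be sketched as a priority-ordered choice: send requests to the active node while it is healthy, otherwise to the passive one. This is only an illustration of the routing decision, not CQ or dispatcher configuration; the node names and the simulated health state are hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_target(nodes: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None


# Hypothetical node names: the first entry is the active node, the second the passive one.
NODES = ["author-active:4502", "author-standby:4502"]

# Simulated health state for illustration; a real check would probe each node.
down = {"author-active:4502"}
target = pick_target(NODES, lambda node: node not in down)
print(target)  # author-standby:4502
```

In practice this decision usually lives in a load balancer or similar front end rather than in application code.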

Approach 2: Use replication (on modification, no version, no status update).
+ proven technology, documented and fully supported
+ easy handling, reconfiguration during runtime
- manual reconfiguration of replication in case of switch
- "on modification" only works for cq:Page nodes and not for other types such as DAM assets, workflows, events and replication queues. (But if you activate a DAM asset, it will be on the standby system.)

So Approach 2 does not give you a "full standby" system; it is more a way not to lose content. Everything else is probably gone, so it is only for a real worst-case scenario.


Approach 3: Periodically build content packages and replicate them to the standby system.
+ you can also provision other staging systems with this content
+ load on the active system is predictable; normal editing actions are not burdened by it
- self-written, not supported
- you can package workflows and events when grabbing them from the repository
- needs testing to confirm that it works

Some Questions:

Q: What is the best way to create an active-passive cluster, and what should I be careful about?
A: Create a multi-node cluster and make sure that the DR nodes are not taking any requests. Make sure there is not much latency between the DR node and the active nodes.
It is also very difficult to guarantee that one of the active nodes, rather than the DR node, becomes master if the current master goes down. There is no harm in having the DR node as master, but it is not recommended (writes always go through the master). You can use either the Felix console or the preferredMaster property to set up the master in advance. Please read http://crxcluster.wemblog.com very carefully to understand CRX better.

To make a node master:
http://HOST:PORT/system/console/jmx/com.adobe.granite%3Atype%3DRepository

Since this is exposed through JMX, you can monitor it and invoke it at run time.
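
For monitoring from a script, here is a minimal sketch that builds the console URL above and fetches the page. It assumes the console accepts HTTP basic authentication; the host, port and credentials are placeholders, and the helper names are my own. It only reads the page and does not invoke any operation on the MBean.

```python
import base64
import urllib.request
from urllib.parse import quote

# MBean object name, taken from the console URL above.
REPOSITORY_MBEAN = "com.adobe.granite:type=Repository"


def repository_mbean_url(host: str, port: int) -> str:
    """Build the JMX console URL for the repository MBean."""
    return f"http://{host}:{port}/system/console/jmx/{quote(REPOSITORY_MBEAN, safe='')}"


def fetch_mbean_page(host: str, port: int, user: str, password: str, timeout: float = 10.0) -> str:
    """Fetch the console page as text (assumes basic authentication is accepted)."""
    request = urllib.request.Request(repository_mbean_url(host, port))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Placeholder host and port; no network call is made here.
print(repository_mbean_url("localhost", 4502))
# http://localhost:4502/system/console/jmx/com.adobe.granite%3Atype%3DRepository
```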

Q: Do I have to take a cold backup of all the nodes?
A: No. If you are using clustering, a backup of any one node (usually the master) is enough.

Q: What about publish instances?
A: Clustering is not recommended on publish unless there is no other way to support a particular use case. In that case, each publish instance is a hot backup of the others. Note that you cannot recover a publish instance from a nightly backup alone, because content may have been updated since the backup was created. The usual recommendation is to keep a backup publish instance that takes no load but is configured as a replication agent target on author.

Q: I am left with only a nightly backup. How should I create a new publish instance?
A: In this case you have to find out what was replicated between the time the last nightly backup was created and now. You can use the nightly backup together with http://www.wemblog.com/2011/10/how-to-find-all-pages-modified-or.html to support this use case.
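
The selection step can be sketched as follows: given pages with their last replication time (for example, collected with the query from the linked post), keep those replicated after the backup was taken and re-activate them on the restored instance. The paths and timestamps below are made-up sample data, and the function name is my own.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def replicated_since(records: Iterable[Tuple[str, datetime]], backup_time: datetime) -> List[str]:
    """Return, sorted, the paths whose last replication is later than the backup."""
    return sorted(path for path, replicated_at in records if replicated_at > backup_time)


# Made-up sample data: (page path, last replication time).
backup_time = datetime(2012, 10, 22, 2, 0, tzinfo=timezone.utc)
records = [
    ("/content/site/en/home", datetime(2012, 10, 21, 15, 30, tzinfo=timezone.utc)),
    ("/content/site/en/news", datetime(2012, 10, 22, 9, 5, tzinfo=timezone.utc)),
    ("/content/site/en/products", datetime(2012, 10, 23, 11, 45, tzinfo=timezone.utc)),
]

to_reactivate = replicated_since(records, backup_time)
print(to_reactivate)  # ['/content/site/en/news', '/content/site/en/products']
```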

10 comments:

  1. Seen several alternative re-rooting methods but this is the easiest and the most doable one. It just took me an ample amount of time in redoing the whole thing and everything seems okay again.

  2. Hi Yogesh,

    Interesting blog. I have a question on Publish Clustering. Is there any reason why publish clustering is not recommended? Looks like shared-nothing cluster is a good option for high availability provided there are no network latency issues. Instead of configuring multiple publish instances with replication from author against each one of them, can we use a publish cluster with replication happening to the master publish instance from the author and then the cluster takes care of synchronization between the different publish nodes.

    Could you provide your thoughts?

    Replies
    1. Hello,

      Thanks for your feedback. Clustering is not recommended on publish because it introduces a single point of failure and because clustering itself can be unstable. Publish is critical to business transactions, and at any point you should be able to recover or work on any publish instance independently, which is difficult in a cluster.

      Again, I am not saying that you cannot use clustering on publish, only that it is not recommended.

      Yogesh

  3. Hello Yogesh,
    Nice and helpful article
    Do you have any suggestions on the online backup that CQ provides? Have you found it as stable as cold backup?

    Replies
    1. Yes. Online backups are stable (but best suited to moderate-size repositories). More information about backup is here: http://www.cqtutorial.com/courses/cq-admin/cq-admin-lessons/cq-maintenance/cq-backup

      Yogesh

  4. A nice and very helpful topic you have shared with us. I learned more about online backups by reading your post. Keep writing. I will come back to read your next post. Wish you all the best.

  5. Really, I agree. Adobe is really helpful and amazing, and this is a good article. Keep it up :)

  6. Which one is the best approach? Can you recommend one?

    Replies
    1. Hello chiji,

      That depends. If you don't care about versioning, ACLs and so on, then replication is the better approach because you don't have to deal with clustering issues. Otherwise, active-passive clustering is a good solution.

      Yogesh

  7. The second one looks more reliable. Nevertheless, I assume that data replication is also the most expensive one.
