Monday, July 16, 2012

How to Clean Up Nodes in CQ / WEM

Use Case:

1) Clean your repository.
2) Remove unwanted nodes.

Solution: You can use the following package to clean up nodes based on certain criteria.

1) Install this package using Package Manager.

2) Go to http://<host>:<port>/apps/tools/components/cleanNode/run.html

3) Select your options.

4) Click Run.

Note: Use this utility with care, as it deletes nodes from the repository. Do NOT use it in production without proper testing.
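The package's code is not shown here, so what follows is only a minimal Java sketch of the general idea: select nodes below a root path by some criterion, look at a dry run first, and only then delete each match together with its subtree. The repository is modeled as an in-memory map of paths to properties; `NodeCleaner`, `clean` and the `status` property are hypothetical. Against a real repository the deletion would go through the JCR API (for example `Session.getNode(path).remove()` followed by `Session.save()`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a repository: node path -> properties.
// Not the package's code; a real tool would use the JCR API instead.
class NodeCleaner {
    private final TreeMap<String, Map<String, String>> nodes = new TreeMap<>();

    void add(String path, Map<String, String> props) {
        nodes.put(path, props);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    // Collects nodes below rootPath whose properties match the criterion.
    // dryRun=true only reports; dryRun=false also removes each match and its subtree.
    List<String> clean(String rootPath, Predicate<Map<String, String>> criterion, boolean dryRun) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : nodes.entrySet()) {
            if (e.getKey().startsWith(rootPath + "/") && criterion.test(e.getValue())) {
                matches.add(e.getKey());
            }
        }
        if (!dryRun) {
            for (String path : matches) {
                nodes.keySet().removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        NodeCleaner repo = new NodeCleaner();
        repo.add("/content/mysite/old", Map.of("status", "obsolete"));
        repo.add("/content/mysite/old/child", Map.of());
        repo.add("/content/mysite/live", Map.of("status", "active"));
        Predicate<Map<String, String>> obsolete = p -> "obsolete".equals(p.get("status"));

        System.out.println("Dry run would delete: " + repo.clean("/content/mysite", obsolete, true));
        repo.clean("/content/mysite", obsolete, false);
        System.out.println("Child still exists? " + repo.exists("/content/mysite/old/child"));
    }
}
```

Keeping the dry run and the real run on the same selection code means what you review is exactly what gets deleted.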

Thursday, July 12, 2012

How to Clear the Replication Queue in CQ / WEM

Use Case:

Case 1: Your replication queue is blocked and you have no idea why it is not being processed.
Case 2: Someone accidentally activated thousands of pages and you want to remove them from the queue now.
Case 3: You want a safe way to remove replication jobs and then activate them again if required.
Case 4: You want to clean up a dummy or problematic replication agent.

Prerequisite: CQ5.4, or CQ5.3 with the replication fix pack.

Solution:

1) Install the attached package using Package Manager.

2) Go to http://<host>:<port>/apps/tools/components/cleanReplication/run.html, or go to http://<host>:<port>/etc/replication/agents.author.html and click on your agent to see the clean-up options.

3) First do a dry run and check which entries would be deleted.

4) Then check all the checkboxes and click "Run clean replication".

5) There is also a failed replication listener configuration; you can configure it to send an email when replication fails.

Here are some screenshots:

Here is the package:

For CQ5.4:


Q: What goes to the failed queue?
A: Replication jobs that still fail after the maximum number of retries.

Q: Why not use the clear option on the replication agent page to remove entries?
A: Sometimes you won't see the entry you need to remove. Sometimes the queue holds so many items that removing them through the UI is impractical. And sometimes you want to delete only specific entries, but there are many of them (in that case you can use the filter). See the use cases above for more detail.

Note: Because the way replication data is stored in the repository has changed a lot over time, this might not work with versions of CQ earlier than CQ5.3. You can also remove unwanted code and adapt it to your needs. If you run into a problem, let me know and, given time, I can look into it.

Special thanks to Henry Saginor from Adobe for providing the code for the failed replication listener.
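To make the failed queue answer above concrete, here is a simplified Java sketch of that policy: a job is attempted once, retried up to a maximum, and only then moved to a failed queue, where a listener is notified (the role the failed replication listener plays when it sends an email). Everything here (`RetryingQueue`, `maxRetries`, the `deliver` callback) is a hypothetical illustration, not CQ's replication code; the real agent also waits between retries and keeps its queue in the repository.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical, simplified model of "retry, then move to the failed queue".
// Not CQ's implementation: no retry delay, no persistence.
class RetryingQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> failed = new ArrayList<>();
    private final int maxRetries;
    private final Consumer<String> onFailure; // e.g. a listener that sends an email

    RetryingQueue(int maxRetries, Consumer<String> onFailure) {
        this.maxRetries = maxRetries;
        this.onFailure = onFailure;
    }

    void enqueue(String path) {
        pending.addLast(path);
    }

    // 'deliver' returns true when the target instance accepted the job.
    void processAll(Predicate<String> deliver) {
        while (!pending.isEmpty()) {
            String path = pending.removeFirst();
            boolean delivered = deliver.test(path);
            // One initial attempt plus at most maxRetries retries.
            for (int retry = 0; !delivered && retry < maxRetries; retry++) {
                delivered = deliver.test(path);
            }
            if (!delivered) {
                failed.add(path);       // only now does the job reach the failed queue
                onFailure.accept(path); // notify, like the failed replication listener
            }
        }
    }

    List<String> failed() {
        return failed;
    }

    public static void main(String[] args) {
        RetryingQueue queue = new RetryingQueue(3,
                path -> System.out.println("ALERT: replication failed for " + path));
        queue.enqueue("/content/mysite/en/us/survey");
        queue.enqueue("/content/mysite/broken");
        queue.processAll(path -> !path.endsWith("/broken"));
        System.out.println("Failed queue: " + queue.failed());
    }
}
```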


How to Use Dispatcher with Mapped Content

Use Case: You are using /etc/map or resource resolution to map your content, and dispatcher flush is not working anymore.
More information about Sling mapping can be found here: http://sling.apache.org/site/mappings-for-resource-resolution.html

I have seen a lot of customers use Sling mapping or resource resolver settings to map /content/<their site> to / to shorten URLs. But as soon as they do that, they report that dispatcher flush is not working. Here is the reason why:

1) Dispatcher flush does not take mapping or resource resolver rules into account when it flushes the cache or invalidates static files.

To understand it better, here is an example:

1) For example, your page URL is somedomain.com/content/mysite/en/us/survey.html and you shorten it to somedomain.com/en/us/survey.html using mapping or resource resolver rules.

2) Suppose your document root is /docroot/htdocs.

3) Now when someone requests somedomain.com/en/us/survey.html, the page is cached as /docroot/htdocs/en/us/survey.html.

4) But when you activate this page on author, the dispatcher flushes /docroot/htdocs/content/mysite/en/us/survey.html (the path for somedomain.com/content/mysite/en/us/survey.html), because the dispatcher flush agent has no idea about the mapping rules. The cached copy at /docroot/htdocs/en/us/survey.html stays stale.
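The mismatch in steps 3 and 4 comes down to two different file paths. The small Java sketch below spells them out, under the simplifying assumption that the dispatcher stores a response at docroot plus the request path, and that flushing a handle invalidates that handle's `.html` file; the class and method names are made up for illustration.

```java
// Hypothetical illustration of why flush misses the cached file when URLs are shortened.
class FlushMismatch {
    static final String DOCROOT = "/docroot/htdocs";

    // The dispatcher caches a response under docroot + the request path it received.
    static String cachedFile(String requestPath) {
        return DOCROOT + requestPath;
    }

    // The flush agent sends the repository path (handle); it knows nothing about mapping.
    // Simplification: treat the flush as invalidating handle + ".html".
    static String flushedFile(String handle) {
        return DOCROOT + handle + ".html";
    }

    public static void main(String[] args) {
        String cached = cachedFile("/en/us/survey.html");
        String flushed = flushedFile("/content/mysite/en/us/survey");
        System.out.println("cached:  " + cached);
        System.out.println("flushed: " + flushed);
        System.out.println("same file? " + cached.equals(flushed));
    }
}
```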

Resolution:

To avoid this problem, add mod_rewrite rules alongside your resource resolver and mapping rules, so that the dispatcher caches pages under their full repository paths.

Here is a basic example of the rules you want in your mod_rewrite configuration (these rules will differ from site to site):

RewriteRule ^/$ <Your home page> [PT,L]

RewriteCond %{REQUEST_URI} !^/apps/(.*) [NC]
RewriteCond %{REQUEST_URI} !^/etc(.*) [NC]
RewriteCond %{REQUEST_URI} !^/libs(.*) [NC]
RewriteCond %{REQUEST_URI} !^/content(.*) [NC]
RewriteCond %{REQUEST_URI} !^/system(.*) [NC]
RewriteCond %{REQUEST_URI} !^/dam(.*) [NC]
RewriteRule ^/(.*) /content/mysite/$1 [PT]
# No separate rule is needed for the query string: RewriteRule matches only the URL path, and the query string is passed through unchanged.

The rules above make sure that pages are cached under their full /content/mysite/... paths, so the flush agent invalidates the right files.
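As a rough check of what these rules do, the following Java sketch reproduces their logic (it is not Apache, and `HOME_PAGE` is a hypothetical stand-in for `<Your home page>`). A shortened path such as `/en/us/survey.html` is rewritten to `/content/mysite/en/us/survey.html`, and because of `[PT]` that rewritten path is what the dispatcher sees, so the page is cached under the same path the flush agent later invalidates.

```java
import java.util.List;
import java.util.Locale;

// Rough Java model of the mod_rewrite rules above; same logic, not Apache itself.
class RewriteModel {
    // Hypothetical home page standing in for "<Your home page>".
    static final String HOME_PAGE = "/content/mysite/en.html";
    // Prefixes excluded by the RewriteCond lines (they already point into the repository).
    static final List<String> EXCLUDED = List.of("/apps/", "/etc", "/libs", "/content", "/system", "/dam");

    static String rewrite(String path) {
        if (path.equals("/")) {
            return HOME_PAGE;                          // RewriteRule ^/$ <Your home page>
        }
        String lower = path.toLowerCase(Locale.ROOT);  // [NC]: case-insensitive match
        for (String prefix : EXCLUDED) {
            if (lower.startsWith(prefix)) {
                return path;                           // a RewriteCond fails: leave the path alone
            }
        }
        return "/content/mysite" + path;               // RewriteRule ^/(.*) /content/mysite/$1 [PT]
    }

    public static void main(String[] args) {
        String docroot = "/docroot/htdocs";
        // With the rewrite in place, the cache file matches the path that gets flushed.
        System.out.println(docroot + rewrite("/en/us/survey.html"));
        System.out.println(rewrite("/etc/designs/mysite/style.css"));
    }
}
```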

One more trick (it does not work 100% of the time; see http://forums.adobe.com/thread/1082213 for details):

You can also do something like this:

LoadModule headers_module modules/mod_headers.so
RequestHeader edit CQ-Handle /content/mysite(/.*) $1 early
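The header trick works by rewriting the `CQ-Handle` header of the flush request, so that, if the edit is applied before the dispatcher handles the request, the shortened path is invalidated instead of the full one. A minimal Java sketch of the same regex search-and-replace follows; `HandleEdit` is a hypothetical name. The second example shows one case the rule does not cover: the site root handle `/content/mysite` itself, because the pattern requires a `/` after `mysite`.

```java
// Mimics the effect of: RequestHeader edit CQ-Handle /content/mysite(/.*) $1 early
// mod_headers "edit" does a regex search-and-replace on the header value (first match),
// which String.replaceFirst reproduces here.
class HandleEdit {
    static String editHandle(String handle) {
        return handle.replaceFirst("/content/mysite(/.*)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(editHandle("/content/mysite/en/us/survey")); // /en/us/survey
        System.out.println(editHandle("/content/mysite"));              // unchanged
        System.out.println(editHandle("/content/dam/logo.png"));        // unchanged
    }
}
```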

Another solution to this issue, at the replication agent level, by David:

http://adobe-consulting-services.github.io/acs-aem-commons/features/dispatcher-flush-rules.html

The code for the above can be found here: https://github.com/Adobe-Consulting-Services/acs-aem-commons/tree/master/bundle/src/main/java/com/adobe/acs/commons/replication

Q: Why would I use /etc/map or resource resolution if these rules are enough for mapping?
A: mod_rewrite rules only take care of mapping incoming requests, not link rewriting. You need /etc/map or resource resolver rules to rewrite the links in your pages.

Q: Why doesn't dispatcher flush take mapping rules into account?
A: There is already an enhancement request for this.

Q: I don't want to write these rewrite rules. How else can I handle this?
A: One option is to set /statfileslevel to 0 (which is the default). Every activation then touches the single .stat file in the docroot, which invalidates all cached resources matched by the /invalidate section of dispatcher.any. If you watch your dispatcher log carefully, you will still see the wrong file being deleted on activation, but you will see updated content because of the invalidation.

Q: What about vanity URLs?
A: Well, that's problematic. You might want to stick with the .stat file approach or write mod_rewrite rules for them as well.

On a side note, understanding /statfileslevel is very important for improving your website's performance.
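To see why /statfileslevel matters, this Java sketch lists which `.stat` files an activation touches, assuming the model described in the Dispatcher documentation: the docroot is level 0, and `.stat` files are touched from the docroot down to the configured level or the activated file's own folder, whichever is shallower. With level 0 only the docroot `.stat` is touched, which is why everything under /invalidate goes stale; with a higher level, other sites' folders keep their own untouched `.stat` files. `StatFiles` is a hypothetical name, and the real dispatcher has more detail than this.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of which .stat files one activation touches (docroot = level 0).
// Only the parent folders of the activated handle are considered.
class StatFiles {
    static List<String> touched(String docroot, String handle, int statfilesLevel) {
        List<String> result = new ArrayList<>();
        String dir = docroot;
        result.add(dir + "/.stat");                 // level 0 is always touched
        String[] segments = handle.split("/");      // "/a/b/page" -> ["", "a", "b", "page"]
        // Walk folder levels 1..statfilesLevel, stopping at the page's own folder.
        for (int level = 1; level <= statfilesLevel && level < segments.length - 1; level++) {
            dir = dir + "/" + segments[level];
            result.add(dir + "/.stat");
        }
        return result;
    }

    public static void main(String[] args) {
        String handle = "/content/mysite/en/us/survey";
        System.out.println(touched("/docroot/htdocs", handle, 0));
        System.out.println(touched("/docroot/htdocs", handle, 2));
    }
}
```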