Friday, September 30, 2011

How to get Client IP through dispatcher in CQ

Try the following code to get the client IP address.

// headers in order of trust, most trusted at top
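A minimal sketch of such a helper follows. Several things are assumptions: the class name ClientIp, the header list and its order, and passing a header-lookup function instead of a request object (so the sketch compiles without the servlet API). Adjust the headers to whatever your dispatcher and load balancer actually forward.

```java
import java.util.List;
import java.util.function.UnaryOperator;

class ClientIp {
    // Hypothetical header list, most trusted first. Adjust to match what your
    // dispatcher / load balancer actually sets.
    static final List<String> HEADERS = List.of(
            "X-Forwarded-For", "Proxy-Client-IP", "WL-Proxy-Client-IP");

    // getHeader: any header lookup, e.g. slingRequest::getHeader
    // remoteAddr: fallback when no header is present, e.g. slingRequest.getRemoteAddr()
    static String getClientIpAddr(UnaryOperator<String> getHeader, String remoteAddr) {
        for (String name : HEADERS) {
            String value = getHeader.apply(name);
            if (value == null || value.isBlank() || "unknown".equalsIgnoreCase(value.trim())) {
                continue; // header missing or placeholder value: try the next one
            }
            // X-Forwarded-For can be "client, proxy1, proxy2"; the first entry is the client
            int comma = value.indexOf(',');
            return (comma >= 0 ? value.substring(0, comma) : value).trim();
        }
        return remoteAddr; // no forwarding header: use the direct peer address
    }
}
```

From a JSP this would be called as ClientIp.getClientIpAddr(slingRequest::getHeader, slingRequest.getRemoteAddr()). The sketch uses Java 11 APIs (List.of, String.isBlank). Forwarding headers can be set by anyone, so they are only trustworthy if your dispatcher or load balancer overwrites them.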

Then in your JSP you can simply call:

<%=getClientIpAddr(slingRequest)%>

Note: Make sure that the dispatcher passes all headers through to CQ.

On the dispatcher side, allow all headers with this configuration in dispatcher.any:

/clientheaders
      {
      "*"
      }

How to remove version history in CQ / WEM

Problem:
Over time the version store /jcr:system/jcr:versionStorage can grow to a considerable size. You will notice this when you see:
a large number of tar files (CQ5.3) in:
/crx-quickstart/repository/version/copy
/crx-quickstart/repository/shared/version
a large Lucene index in:
/crx-quickstart/repository/repository/index

This is also helpful when you are upgrading from CQ5.2.x to the latest version, as version purging was not enabled in earlier versions.

Solution: There are a couple of solutions:

You can enable automatic version purging. See more information here.

You can also remove versions using the code mentioned here.
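Whichever route you take, it helps to be clear about which versions a purge would remove. The sketch below only computes that list under one hypothetical retention rule. The class name VersionPurgePlan and the minKeep/maxKeep/maxAge parameters are illustrative and not CQ settings. The rule always keeps the newest minKeep versions; beyond those it drops anything past the newest maxKeep or older than maxAge. In a real repository the removal itself would go through the JCR API, for example VersionHistory.removeVersion(String).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class VersionPurgePlan {
    // Returns the creation timestamps of the versions that would be purged
    // under the hypothetical rule described above, newest first.
    static List<Instant> toPurge(List<Instant> created, int minKeep, int maxKeep,
                                 Duration maxAge, Instant now) {
        List<Instant> newestFirst = new ArrayList<>(created);
        newestFirst.sort(Comparator.reverseOrder());
        Instant cutoff = now.minus(maxAge);
        List<Instant> purge = new ArrayList<>();
        for (int i = 0; i < newestFirst.size(); i++) {
            Instant v = newestFirst.get(i);
            if (i < minKeep) {
                continue; // always retained, however old
            }
            if (i >= maxKeep || v.isBefore(cutoff)) {
                purge.add(v); // too many versions, or too old
            }
        }
        return purge;
    }
}
```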

Tuesday, September 20, 2011

How to Set up SSL Locally in CQ / WEM / Dispatcher

Use case: Testing

Step 1: Generate a Keystore

Use keytool to generate the keystore.

You can confirm that keytool is available on your system with the command keytool -help

Once you have keytool installed, use the following steps to generate the keystore:

From the command line, navigate to /crx-quickstart/server/etc
Run the command "keytool -genkey -keystore mykey -alias myalias -keyalg rsa"
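You can check that the keystore contains the alias you will reference in the SSL configuration with keytool -list -keystore mykey, or from Java. The sketch below uses a hypothetical class name, KeystoreCheck. It loads the file with the JDK's default keystore type, which is also what keytool uses when -storetype is not given, and returns its aliases.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

class KeystoreCheck {
    // Loads the keystore with the JDK default type (JKS up to Java 8,
    // PKCS12 from Java 9) and returns all alias names it contains.
    static List<String> aliases(Path file, char[] storePassword) {
        try (InputStream in = Files.newInputStream(file)) {
            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
            ks.load(in, storePassword); // fails if the password is wrong
            return Collections.list(ks.aliases());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot read keystore " + file, e);
        }
    }
}
```

After the keytool command above, KeystoreCheck.aliases(Path.of("crx-quickstart/server/etc/mykey"), password).contains("myalias") should be true, provided the same JDK is used for both.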

Set up SSL up to CQ5.4 (using server.xml)

Use the following entry in server.xml:

<container>
<listener>
<!-- You already have one entry here don't modify it-->
</listener>
<!--Entry for new SSL Listener-->
<listener>
<bind-port>443</bind-port>
<ssl>
<protocol>SSL</protocol>
<key-store>
<name>etc/mykey</name>
<passphrase><Password you have given while creating certificate></passphrase>
</key-store>
<key>
<alias>myalias</alias>
<password><Password you have given while creating certificate></password>
</key>
</ssl>
</listener>
<!--End of new entry for SSL-->
</container>


You can also check /crx-quickstart/server/etc/SSL_HowTo.txt for details on how the configuration can be done.

NOTE:

Once you have SSL set up, check logs/server.log to make sure that the server has started on the secure port.

If you get an error like:

*ERROR* servletengine: Unable to start https listener on address 127.0.0.1, port 443: Permission denied
This means you need to start CQ as the root user, because on Unix systems only root can bind to ports below 1024.

Set up SSL in CQ5.5

In CQ5.5, CQSE is deployed as a bundle and you can configure SSL using the Felix configuration. Please see the screenshot for how to do that; all parameters are self-explanatory.



This is the actual configuration:




Note
1. You can put the certificate file at any location you want; only an absolute path is required.
2. Currently there is no way to configure multiple ports to listen on.

Set up SSL on Apache (if your SSL terminates at Apache)

Assuming that you are using the Apache web server:

Click here to see how to generate the certificate and key file.

If you already have a certificate and a password-protected key, you can use the following command to write a copy of the key without the passphrase (so Apache can start without prompting for it):

openssl rsa -in <Your Key>.key -out <Key without Password>.new.key

Then open /conf/httpd.conf and add the following entries:

<VirtualHost *:80>
    ServerName wemblog.com
    ServerAlias wemblog*.com

    RewriteEngine on
    # Rewrite all requests to https
    RewriteRule ^(.*)$ https://%{HTTP_HOST}$1 [L,R=301]
</VirtualHost>

NameVirtualHost *:443
Listen 443
<VirtualHost *:443>
    ServerName wemblog.com
    ServerAlias wemblog*.com
    SSLEngine on
    SSLCertificateFile <cert path>.cert
    SSLCertificateKeyFile <key path>.key

    RewriteEngine on
  
    ProxyPreserveHost On
    ProxyPass / http://localhost:4502/
    ProxyPassReverse / http://localhost:4502/

    #set header for SSL
    Header add X-Forwarded-Proto "https"
    <LocationMatch "/(content|apps|etc).*">
        RequestHeader set X-Forwarded-Proto "https"
    </LocationMatch>

</VirtualHost>

To be honest, you will find a ton of information about setting up SSL on Apache on Google.


Note: If you have to use https only (force https in CQ) and not let authors use http, there are two options:

1) Configure a dispatcher (Apache) rewrite rule to redirect all http requests to https.
2) If you are not using the dispatcher, you can add a mapping under /etc/map that redirects all requests to the https port. Here is an example.

Please check https://cwiki.apache.org/SLING/flexible-resource-resolution.html


You have to do something like this:

/etc/map
+-- http
    +-- localhost.4502
        +-- sling:redirect = "https://localhost:<your secure port>"

How to set up debug mode for a Class/Package in CQ / CRX (up to CQ5.4)

Use case: You have a problem with certain modules in CQ/CRX and you want to debug them.

Setting Debug mode in CQ

For example, to set debug logging for the package com.day.cq.dam.core:

[1] Log into Felix Console: http://<host>:<port>/system/console/configMgr
[2] From "Factory Configurations", create "Apache Sling Logging Writer Configuration"
[2.1] Set value of "Log File" to "../logs/dam.log"
[2.2] Click on "Save"
[3] From "Factory Configurations", create "Apache Sling Logging Logger Configuration"
[3.1] Set value of "Log Level" to "Debug"
[3.2] Set value of "Log File" to "../logs/dam.log"
[3.3] Add "Logger" => com.day.cq.dam.core
[3.4] Click on "Save"

Now all debug information for the package com.day.cq.dam.core will be written to /crx-quickstart/logs/dam.log

Click here to see how to rotate custom logs in CQ.


Note: Output of the standard (OOTB) logger will be redirected.

For example, everything logged with:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// field in a class inside the configured package
private final Logger logger = LoggerFactory.getLogger(getClass());

// and then, wherever you log
logger.debug("Your message");


Setting debug mode for CRX

Suppose you want to enable logging for the com.day.crx.security.ldap package.

1. Open the log4j.xml file (located in crx-quickstart/server/runtime/0/_crx/WEB-INF/) and add another appender:

<!-- ldap.log -->
<appender name="ldap" class="org.apache.log4j.FileAppender">
<param name="File" value="crx-quickstart/logs/crx/ldap.log"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{dd.MM.yyyy HH:mm:ss} *%-5p* %m%n"/>
</layout>
</appender>


2. Add another logger:

<logger name="com.day.crx.security.ldap">
<level value="debug"/>
<appender-ref ref="ldap"/>
</logger>


NOTE

Optionally, you can use "com.day.crx.security" instead of "com.day.crx.security.ldap" to see all CRX security-related debug log messages in the logs/crx/ldap.log file.

3. Restart the server.

4. Try logging into CRX directly with your LDAP account.

5. Examine the ldap.log and error.log files from CRX for errors.

NOTE

If you are troubleshooting LDAP for a CQ instance, first try logging into CRX at http://<host>:<port>/crx. Second, try logging into CQ at http://<host>:<port>. Then examine the ldap.log and error.log files from CRX and CQ for errors.

In CQ5.5 there is no log4j.xml; all CRX loggers need to be configured using Sling configuration. So if you want to debug LDAP in CQ5.5, you have to use an Apache Sling Logging Logger Configuration, as described above for CQ.


As a side note, you can start CRXDE in debug mode by adding "-d" at the end of the start command. For example: java -jar <CQ jar file name> -d

Friday, September 16, 2011

How to add ACL to a node

Use case: Sometimes you want to set an ACL on a node based on certain events (for example, page creation under a certain path).

Solution: You can use this code to associate an ACL with a node.

How to create pages using curl command in CQ / WEM

Use case: Testing (for example, if you are trying to capture certain events or testing something in MSM)

Solution: There are various ways of creating pages with the curl command in CQ. Some of them are as follows:

1) curl -u admin:admin -F cmd="createPage" -F label="" -F parentPath="/content/geometrixx/en/company" -F template="/apps/geometrixx/templates/contentpage" -F title="new page" http://localhost:4502/bin/wcmcommand

2) curl -u admin:admin -F "jcr:primaryType=cq:Page" -F "jcr:content/jcr:primaryType=cq:PageContent" -F "jcr:content/jcr:title=New Page" -F "jcr:content/sling:resourceType=geometrixx/components/contentpage" http://localhost:4502/content/geometrixx/en/page

3) curl --data jcr:primaryType=cq:Page --user admin:admin http://<host>:<port>/content/geometrixx/en/toolbar/test3

And then

curl --data jcr:primaryType=cq:PageContent --user admin:admin http://<host>:<port>/content/geometrixx/en/toolbar/test3/jcr:content
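For scripted tests you can send the same request as example 1 from Java. The sketch below uses a hypothetical class, CreatePage, and the Java 11+ java.net.http API to build the POST with basic authentication. One assumption: it sends the fields URL-encoded (application/x-www-form-urlencoded) instead of as multipart form data like curl -F, because servlet request parameters are normally read the same way for both. The host, credentials, and paths are the sample values from example 1.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class CreatePage {
    // Encodes fields as application/x-www-form-urlencoded, keeping map order.
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Same fields as curl example 1 (sample parent path, template and title).
    static HttpRequest createPageRequest(String baseUrl, String user, String password) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("cmd", "createPage");
        fields.put("label", "");
        fields.put("parentPath", "/content/geometrixx/en/company");
        fields.put("template", "/apps/geometrixx/templates/contentpage");
        fields.put("title", "new page");
        // curl -u user:password sends HTTP basic authentication
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/bin/wcmcommand"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
    }
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and check the status code.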


You can also use Java code to create pages in CQ.

Details of the Sling POST servlet (content manipulation API) can be found here: http://sling.apache.org/site/manipulating-content-the-slingpostservlet-servletspost.html

Thanks Henry Seginor, Paul from Adobe for providing this information

How to use the vlt tool to copy data from one CRX to another CRX

Use case: Sometimes you want to migrate a repository from one instance to another (for example, because of unrepairable corruption in the current repository). If your repository is big, Package Manager is not a viable option, as you end up creating a lot of packages.

Assumptions: You have basic knowledge of CQ structure and CRX, and you are using CRX 2.2 or higher. Paths may differ if you are using an older version of CRX.

Here are the different options:

1) Use Package Manager
    • Pros: Very simple to create packages and does not require command-line knowledge
    • Cons: Not suitable for big packages. Sometimes slow
2) Use VLT (explained in this article)
    • Pros: Faster than Package Manager. Does not consume package space like Package Manager does.
    • Cons: Slow I/O
3) Use a tool based on VLT, such as Recap http://adamcin.net/net.adamcin.recap/
    • Pros: UI on top of vlt rcp, making it easy to see status and administer. Does not require command-line knowledge.
    • Cons: Slow I/O because it uses VLT under the hood.
4) Use a tool based on another transfer protocol, such as Grabbit https://github.com/TWCable/grabbit
    • Pros: Faster than VLT
    • Cons: Not supported by Adobe. Needs more dependencies to set up.

Step 1: Make sure that you have vlt set up properly. More details can be found here. Afterwards, you can run the vlt --version command to check that vlt is installed.



Step 2: Use the following command to migrate data between CRX instances:

vlt rcp -r http://<login>:<password>@<source-host>:<port>/crx/-/jcr:root/<Source-Path> http://<login>:<password>@<destination-host>:<port>/crx/-/jcr:root/

Note
1) While you are migrating data from one CQ instance to another, make sure that you stop the launchpad application from <host>:<port>/admin (this makes sure that unnecessary workflows are not triggered during migration). If this is not possible, disable the DAM-related workflows by going to http://<host>:<port>/libs/cq/workflow/content/console.html and then to Launcher. Don't forget to re-enable them after the copy, and make sure that no DAM activity is going on on the target instance during that time.


2) Note that copying users and groups does not mean that all ACLs are copied too; ACLs are stored under the /content node. In general you need to migrate the following (plus anything else custom you have):
-- /content/<Your-site>
-- /apps/<Your-application>
-- /var/dam/<your-asset>
-- /content/dam/<your-asset>
-- /etc/design/<your-design>
-- /etc/tags/<your-tags>
-- /etc/workflow/<your-custom-stuff>
-- /etc/replication/<your-custom-replication>

Important Note: At all times (even after migration), you have to make sure that /content/dam and /var/dam are in sync. After migration, go to http://<host>:<port>/etc/dam/healthchecker.html and make sure that they are in sync by clicking Check Binaries (lists entries missing in /var/dam) and Check Assets (lists entries missing in /content/dam). You may also want to do a small test before the big migration to make sure that all renditions are migrated correctly.

If you want to process all assets again, migrate only /var/dam, in batches (as asset synchronization using workflows can be expensive), and enable launchpad or all workflows. To migrate assets in batches, you can sleep between two vlt rcp runs. Make sure that you monitor the logs of the target system for OOM errors or other issues.

You can write a script that does the above for all the paths, so that you can reuse it in the future.
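A minimal sketch of such a script in Java follows; the class name VltBatch is hypothetical. It builds one vlt rcp -r command per path and runs them one after another, pausing between runs. As an assumption, the destination URL gets the same path appended as the source. Compare this with the Step 2 command and adjust it if you copy to a different location.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class VltBatch {
    // sourceBase / destBase look like http://<login>:<password>@<host>:<port>/crx/-/jcr:root
    // Each path (e.g. /content/<Your-site>) gets its own "vlt rcp -r" command.
    static List<List<String>> commands(String sourceBase, String destBase, List<String> paths) {
        List<List<String>> cmds = new ArrayList<>();
        for (String p : paths) {
            cmds.add(List.of("vlt", "rcp", "-r", sourceBase + p, destBase + p));
        }
        return cmds;
    }

    // Runs the commands sequentially, stopping on the first failure and
    // sleeping between runs so the target instance can catch up.
    static void run(List<List<String>> cmds, long pauseMillis)
            throws IOException, InterruptedException {
        for (List<String> cmd : cmds) {
            Process proc = new ProcessBuilder(cmd).inheritIO().start();
            if (proc.waitFor() != 0) {
                throw new IllegalStateException("Failed: " + String.join(" ", cmd));
            }
            Thread.sleep(pauseMillis);
        }
    }
}
```

Keeping the path list in one place (for example the list of paths in note 2 above) makes it easy to rerun the same migration later.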

If you do not have a login, you can download Recap from here.

How to read logs from the browser in CQ

Use case: Sometimes, for whatever reason, you don't have access to the physical server and thus cannot read logs in real time.

Solution: Please follow the steps below:

1) Download the log reader war file from here
2) Go to http://<host>:<port>/admin
3) Add content at /logreader and upload the war file you saved.
4) Once uploaded, you can go to http://<host>:<port>/logreader to read the logs.

In CQ5.5 you can use the system configuration to read logs in the browser.



For CQ5.6 and later, you can also go to HOST:PORT/system/console/status-slinglogs to view log files.



Caution: Please don't keep the log reader war file in production, as it consumes resources. Remove it once your job is done.


Thanks Sebastian Hoogenberk from Adobe for providing this information.

Thursday, September 15, 2011

How to reindex Large Repository in CQ or CRX

Note: Please test this before using.

Re-indexing a large repository may take a lot of time and cause downtime. The procedure below keeps the production repository operational and does the re-indexing on a backup copy. Downtime of the production repository is limited to the time it takes to index the changes that were made on the production system while the re-indexing was performed.

  • Stop the production repository
  • Copy the repository to a separate machine
  • Start the production repository again
  • On the repository copy:
    • Delete the index
    • Start the repository, which will start the indexing process
    • Once the repository is up and running (indexing completed), stop it again
  • Stop the production repository
  • Replace crx-quickstart/repository/workspaces/crx.default/index with the one from the copy
  • Replace crx-quickstart/repository/revision.log with the one from the copy
  • Rename crx-quickstart/repository/cluster_node.id to crx-quickstart/repository/cluster_node.id.orig
  • Start the production repository

At this point the repository will apply the changes to the index that were done since the copy was made.

  • Once the repository is online, shut it down again.
  • Delete crx-quickstart/repository/cluster_node.id
  • Rename crx-quickstart/repository/cluster_node.id.orig to cluster_node.id

CRX Clustering

All the information about CRX clustering can be found here. Please feel free to add comments.

How to remove eclipse based CRXDE cache

Why: Sometimes there are classpath issues with CRXDE. You can clear the CRXDE cache by removing the ".crxde" folder in your home directory. For example, on a Unix system such as Mac OS X, if your username is test, you can find ".crxde" under /Users/test.

Note that .crxde is a hidden folder.

You can also make a jar file available on the CRXDE classpath by adding it to /etc/crxde/profiles/default/libs
(you can use any WebDAV tool to do that). By default, anything under an /install folder is available on the classpath.

How to remove repository inconsistency using redo.log in CQ5 / WEM

You can use a redo log to remove search index inconsistencies (instead of rebuilding the whole index).

1. Find all the inconsistent nodes (see below for how)
2. Stop the instance
3. Check the directory crx-quickstart/repository/workspaces/crx.default/index
4. There should be a file named 'indexes_X', e.g. indexes_1y
5. Rename the redo_X.log so that it matches the suffix from above, e.g. redo_1y.log
6. Place the redo log in the crx-quickstart/repository/workspaces/crx.default/index folder
7. Restart the instance

The redo log file will have entries like this:

1 DEL 003171fe-e2e8-457b-a3af-f74eed12c1b9
1 DEL d4748598-374a-43a3-95a5-f53c94df8c7d
1 DEL fd8747eb-38c5-4256-b416-ad37b804a9e9
1 COM
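Steps 4 to 6 and the format shown above can be scripted. The sketch below uses a hypothetical helper, RedoLog. It derives the redo-log file name from the indexes_X file name and writes one DEL line per UUID, followed by a COM line. The leading "1" on each line is copied from the sample entries above rather than computed.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedoLog {
    private static final Pattern INDEXES = Pattern.compile("indexes_(\\w+)");

    // indexes_1y -> redo_1y.log (the suffix must match, see step 5)
    static String redoFileName(String indexesFileName) {
        Matcher m = INDEXES.matcher(indexesFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an indexes_X file: " + indexesFileName);
        }
        return "redo_" + m.group(1) + ".log";
    }

    // One "1 DEL <uuid>" line per node to drop from the index, then "1 COM",
    // mirroring the sample entries above.
    static String redoLogContent(List<String> uuids) {
        StringBuilder sb = new StringBuilder();
        for (String uuid : uuids) {
            sb.append("1 DEL ").append(uuid).append('\n');
        }
        return sb.append("1 COM\n").toString();
    }
}
```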



How to find all inconsistent nodes
[1] Copy this file to your local system
[2] Open FindCorruption.java
[3] Change the following variables to paths on your local file system:
INPUT_FILE_NAME (location of your log file)
OUTPUT_FILE_NAME_SEARCH (location where you want the output saved)
OUTPUT_FILE_NAME_TAR
OUTPUT_FILE_NAME_MISSING_CHILD
OUTPUT_FILE_NAME_ORPHAN_CHILD
[4] Save and compile with javac FindCorruption.java
[5] Then run java FindCorruption
[6] For search index inconsistencies, you can use the search output as the redo log
[7] For Tar PM inconsistencies, you can check selected UUIDs (with the consistencyCheckUUIDs param in repository.xml) to fix the inconsistency. More information can be found here
[8] For missing child or orphan nodes, use console.sh if there are only a few (see information here), or restore from backup if you have a lot of orphan nodes

How to find all orphan nodes in CRX

You may often come across the following error in CRX:
NodeState CHILD_UUID references inexistent parent uuid PARENT_UUID
Unfortunately you cannot fix this issue, but at least you can see which orphan nodes exist in your repository.

Drop the attached JSP in runtime/0/_crx/diagnostic/, start CRX/CQ and run it through http://<host>:<port>/crx/diagnostic/showNodes.jsp. You can specify a path to a tar file or copy folder. It will scan through all nodes with a UUID (referenceable nodes) and output an exception if a node is orphaned.

Other option: If you are sure that all the orphan node messages are in the CRX log file, you can use any tool to parse them out.
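One such tool, sketched in Java with a hypothetical class name, OrphanScan, reads the log line by line. It collects each child UUID together with the missing parent UUID. It assumes the message appears in the log exactly as quoted above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrphanScan {
    private static final Pattern ORPHAN =
            Pattern.compile("NodeState (\\S+) references inexistent parent uuid (\\S+)");

    // Maps each orphaned child UUID to the parent UUID it points to,
    // in the order the messages appear in the log.
    static Map<String, String> scan(BufferedReader log) {
        Map<String, String> orphans = new LinkedHashMap<>();
        try {
            String line;
            while ((line = log.readLine()) != null) {
                Matcher m = ORPHAN.matcher(line);
                if (m.find()) {
                    orphans.put(m.group(1), m.group(2));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return orphans;
    }
}
```

Pass it a reader over your CRX log, for example Files.newBufferedReader(Path.of(...)) with the path of your error log.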

Thanks Andrew Khoury and Henry Seginor from Adobe for providing this information

Which index is what in CQ/CRX

> /repository/repository/index
Lucene Search Index for jcr:system (includes jcr:versionStorage)

> /repository/workspaces/crx.default/index
Lucene Search Index for crx.default workspace

> /repository/workspaces/crx.default/index*.tar
Tar PM Index for crx.default workspace

> /version/index (the version index, of course)
Tar PM Index for version workspace

> /tarJournal/index*.tar
Tar PM Index for the cluster journal for Journal PM in 5.4

Thanks Andrew Khoury From Adobe for providing this information

How to find and remove all .lock files in CQ

Why: Sometimes, even after the CQ instance is stopped, not all lock files get deleted, which causes problems on restart.

Use the following command (with CQ stopped):

find <path to /crx-quickstart> -name "\.lock" -exec rm '{}' \; -print

Simple unix stuff huh :)
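If find is not available (for example on Windows), the same search-and-delete can be sketched with java.nio.file. The class name LockFiles is hypothetical. Unlike the find command, this sketch only matches regular files named exactly ".lock". As with the command, only run it while CQ is stopped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LockFiles {
    // Recursively finds regular files named exactly ".lock" under root.
    static List<Path> find(Path root) {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals(".lock"))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes each lock file and prints its path, like -exec rm ... -print.
    static void deleteAll(Path root) {
        for (Path p : find(root)) {
            try {
                Files.delete(p);
                System.out.println(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```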

How to fix LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 180) error

Symptom: Uploading certain documents (Word, Excel, PDF) causes the system to go down because of an OOM error. And since these documents are in a workflow, restarting CQ does not help either.

Resolution:

1) Add -Dcom.day.crx.persistence.tar.IndexMergeDelay=0 to the startup script. This makes sure that the synchronous index merge after uploading a document or changing large properties does not cause problems (there is a known issue with synchronous index merging).

2) Modify the SearchIndex section in your repository.xml and workspace.xml:

change

<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
<param name="path" value="${wsp.home}/index"/>
<param name="resultFetchSize" value="50"/>
</SearchIndex>

to

<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
<param name="path" value="${wsp.home}/index"/>
<param name="resultFetchSize" value="50"/>
<param name="forkJavaCommand" value="nice java -Xmx32m"/>
<param name="extractorPoolSize" value="2"/>
</SearchIndex>


For the above option in CRX2.1, make sure that hotfix pack 2.1.0.9 is installed. It works out of the box in CRX2.2.

Please note that this will not make those files process successfully, but it will get rid of the OOM error so that your system does not go down.

Other option: If you are not concerned about full-text indexing of these documents, you can disable indexing of these document types in tika-config (crx-quickstart/server/runtime/0/_crx/WEB-INF/classes/org/apache/jackrabbit/core/query/lucene/tika-config.xml). If this folder structure is not present, you have to create it. The original tika-config.xml can be found by unzipping crx-quickstart/server/runtime/0/_crx/WEB-INF/lib/jackrabbit-core-*.jar (copy it to some other location, rename it to .zip and then unzip it) and then going to org/apache/jackrabbit/core/query/lucene.

You can use org.apache.tika.parser.EmptyParser as the parser class for document types you do not want parsed.

For example (to not index Excel .xlsx spreadsheets):

<parser class="org.apache.tika.parser.EmptyParser">
<mime>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mime>
</parser>

To disable PDF parsing, remove this entry:
<parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
<mime>application/pdf</mime>
</parser>

The above method will also help you reduce the Lucene index size in CQ.

Note: To reduce the Lucene index size, you can also add the following in workspace.xml:

<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
<param name="path" value="${wsp.home}/index"/>
<!-- add below param -->
<param name="supportHighlighting" value="false"/>
</SearchIndex>

How to allow only certain IP addresses to connect to an author instance

Problem: You want only certain IP addresses to access your author instance.

Use case: You have a dispatcher in front of the author instance and you want everyone to access the author through the dispatcher.

Solution:
Approach 1: You can put your author in a DMZ or behind the firewall and open the firewall port only for the dispatcher.

Approach 2 (only available in CQ 5.4 or lower):
Modify server.xml under /crx-quickstart/server/etc/ and add the following entry:


<listener>
<access-constraint>
<deny>
<ip-address><IP address you want to deny></ip-address>
</deny>

<allow>
<ip-address><IP you want to allow></ip-address>
</allow>
</access-constraint>
......
</listener>

See server_3_0.dtd for details of tags.

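Approach 3:

You can also use the /allowedClients section (inside /cache) of dispatcher.any to allow specific IPs. Note that this setting controls which clients may send cache flush (invalidation) requests to the dispatcher; it does not restrict normal page requests.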


/allowedClients
        {
      /0000
          {
          /glob "*"
          /type "deny"
          }
        /0001
          {
          /glob "localhost"
          /type "allow"
          }
         /0002
           {
           /glob "127.0.0.1"
           /type "allow"
           }
       }
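To reason about a rule list like the one above, it helps to model how it is evaluated. The sketch below assumes the same last-match-wins evaluation that dispatcher uses for its /filter rules: every rule is checked in order, and the last one whose glob matches the client decides. The class AllowedClients is hypothetical. It treats only "*" as a wildcard and denies a client that matches no rule.

```java
import java.util.List;
import java.util.regex.Pattern;

class AllowedClients {
    // One /glob + /type pair from dispatcher.any.
    static final class Rule {
        final String glob;
        final boolean allow;

        Rule(String glob, boolean allow) {
            this.glob = glob;
            this.allow = allow;
        }
    }

    // Converts a glob to a regex; only "*" (any run of characters) is a wildcard here.
    static Pattern toRegex(String glob) {
        StringBuilder re = new StringBuilder();
        for (char c : glob.toCharArray()) {
            re.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(re.toString());
    }

    // Checks rules top to bottom; the last matching rule wins (no match: denied).
    static boolean isAllowed(String clientIp, List<Rule> rules) {
        boolean allowed = false;
        for (Rule rule : rules) {
            if (toRegex(rule.glob).matcher(clientIp).matches()) {
                allowed = rule.allow;
            }
        }
        return allowed;
    }
}
```

With the three rules above, 127.0.0.1 is allowed, because the later allow rule overrides the initial deny "*", while any other IP address stays denied.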

Approach 4:

Use the ModSecurity Apache module to restrict IP addresses. More details about the module can be found here:

About Mod security Module: https://www.modsecurity.org/

Set up Mod security module in Apache: https://linode.com/docs/web-servers/apache-tips-and-tricks/configure-modsecurity-on-apache/

Restrict IP address using Mod security: https://www.codeproject.com/Articles/574935/BlockplusIPplususingplusModSecurity

How to set up dynamic flushing based on path in CQ

Problem: Currently, when you activate a page, all the pages get invalidated (based on the entries in the /invalidate section of dispatcher.any). If you have a very flat site structure, this can be a performance hit for the publish server. For example, if you have a site structure like /content/en, /content/es, /content/dam, you may want activations under /content/dam not to invalidate all the pages under /content/en or /content/es. With the OOTB settings this is not possible to achieve, as for the above case you need to set the stat file level to 1.

Workaround: Thankfully there is a workaround for the above problem. The workaround assumes that you are using Apache as the web server; you need to do something similar in IIS, which unfortunately I don't know. It also assumes that farm selection in the dispatcher is based on the Host header, which is correct in the current dispatcher release.

Approach: Since the stat file level and invalidation can be configured per farm, we are going to create a separate farm to serve DAM content and to handle DAM invalidation. We make sure that activation requests and content requests for DAM get forwarded to the new farm. For that we will use the mod_env, mod_setenvif and mod_headers modules of Apache.
Step 1: Add a new host name to all requests for /content/dam (in our example). To do that, add the following entry under section of httpd.conf:

RequestHeader set Host "localhost.dam.com"

# This makes sure that flush requests for DAM go to the DAM farm

SetEnv hostnameforfarm localhost
SetEnvIfNoCase CQ-Path ^/content/dam/ hostnameforfarm=localhost.dam.com
RequestHeader set Host %{hostnameforfarm}e
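The SetEnv / SetEnvIfNoCase pair above chooses the Host header for each flush request from its CQ-Path header, and dispatcher then selects the farm from that host. A plain-Java model of that decision may make it easier to test before you touch httpd.conf. The class FlushFarmHost is hypothetical; the host names are the example values above.

```java
import java.util.regex.Pattern;

class FlushFarmHost {
    // Mirrors: SetEnvIfNoCase CQ-Path ^/content/dam/ hostnameforfarm=localhost.dam.com
    private static final Pattern DAM_PATH =
            Pattern.compile("^/content/dam/", Pattern.CASE_INSENSITIVE);

    // Returns the Host header Apache would set for a flush request with the
    // given CQ-Path header value (null means the header is absent).
    static String hostForFlush(String cqPath) {
        String host = "localhost"; // SetEnv hostnameforfarm localhost (default farm)
        if (cqPath != null && DAM_PATH.matcher(cqPath).find()) {
            host = "localhost.dam.com"; // DAM farm
        }
        return host;
    }
}
```

The null case illustrates the note at the end of this post: without a CQ-Path header, DAM flushes fall through to the default farm.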


Step 2: Create a new farm in dispatcher.any to handle all DAM requests, and disable invalidation of .html files there. Your new dispatcher.any will look like this:
# name of the dispatcher
/name "edu.ru.siws.dispatcher"

# each farm configures a set of (load-balanced) renders
/farms
  {
  # first farm entry (label is not important, just for your convenience)
  /restofsite
    {
    # ... all existing entries
    }
  # second farm entry (label is not important, just for your convenience)
  /dam
    {
    /clientheaders
      {
      # ... all headers
      }

    # hostname globbing for farm selection (virtual domain addressing)
    /virtualhosts
      {
      "localhost.dam.com"
      }

    # ... everything else, then in the /cache section:
    /cache
      {
      # ... docroot, rules, etc.
      /invalidate
        {
        /0000
          {
          /glob "*.html"
          /type "deny"
          }
        }
      }
    }
  }



Now all invalidation requests for DAM will go to the other farm, which will not invalidate any HTML pages.

You can also customize the above to put the stat file at a different level for the second farm. That way you will not invalidate the global stat file.

Note: For this to work, make sure that your dispatcher flush agent adds the CQ-Path header.




Thanks Andrew Khoury from Adobe for this information.