Wednesday, February 15, 2012

How to improve Tar Optimization and Indexing process in CQ 5.4 / WEM

Use Case: You have a lot of tar files (because of some bad code) that you want to clear quickly, or you want to speed up the indexing process while reindexing the repository.

Prerequisite: CRX 2.2 with hotfix, or later.


There are two new TarPersistenceManager configuration options: optimizeCount and indexInMemory.

The optimizeCount option controls how many blocks the
TarPM optimizer processes at a time. The default value is
one, which avoids blocking concurrent user operations as
much as possible. By increasing this value the TarPM
optimization speed increases, but the performance of
concurrent user operations decreases.

In addition to this, you can do the following:

The main problem of the Tar PM optimization is reading from the Tar index files in random order. To speed this up, load the files in the buffer cache every few minutes, from both the crx.default and the version directory:
cat .../repository/workspaces/crx.default/index*.tar > /dev/null
cat .../repository/version/index*.tar > /dev/null
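
The two commands above can be wrapped in a small helper so they are easy to repeat. This is only a sketch: the function name warm_index_cache and the REPO_DIR variable are hypothetical, and REPO_DIR stands in for the repository path that is abbreviated as ".../repository" above.

```shell
#!/bin/sh
# Hypothetical helper (not part of CRX): reads every TarPM index file once
# so the operating system keeps it in the buffer cache.
# $1 is the repository directory, i.e. the ".../repository" path above.
warm_index_cache() {
    repo_dir=$1
    for dir in "$repo_dir/workspaces/crx.default" "$repo_dir/version"; do
        for f in "$dir"/index*.tar; do
            # If the glob matches nothing, $f is the literal pattern; skip it.
            if [ -f "$f" ]; then
                cat "$f" > /dev/null
            fi
        done
    done
    return 0
}

# Example use (assumed interval of 5 minutes), left commented out on purpose:
#   while true; do warm_index_cache "$REPO_DIR"; sleep 300; done
```

Running the loop only while the optimization or reindexing is in progress keeps the extra disk reads limited to the period where they help.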

The indexInMemory option is a boolean flag for making the
repository read all TarPM index files entirely in memory.
This can significantly boost performance for large
repositories, but requires gigabytes of free RAM.

Here is how these options are set:

<PersistenceManager class="">
  <param name="optimizeCount" value="{Numeric value you want to set}" />
  <param name="indexInMemory" value="true" />
</PersistenceManager>

Please test these settings before use.

Special thanks to Thomas Mueller and Jukka Zitting from Adobe for this information.


  1. What do you mean by "(Because of some bad code)" in this case? Can you give any examples of "bad code" that might cause there to be "a lot of tar files"?

    1. A simple example of bad code is while(1){ create a node in the repository or create replication content }, or any recursive call whose terminating condition is never reached and that keeps creating repository nodes.

  2. Hi Yogesh,
    How do we determine the optimal value for optimizeCount?

    1. Shuja,

      It depends on how much blocking you can tolerate. Keeping the value as low as possible ensures that there is no slowdown in user experience, because the tar files are blocked for writes while the optimizer processes them. For an author instance, I guess you can increase this count if you know your authors will not be working during that time.


    2. Thanks! We run our Tar optimization at 2 AM, but I am wondering what will happen when we go international in a few months.

    3. Shuja,

      Tar optimization should not cause any problems unless writes are going on in the repository. In that case, write performance will be slower than usual.


  3. Hi Yogesh,
    We are using CQ 5.5. In our case, tar optimization runs fine on the tarjournal and version directories, but not on workspaces/crx-default: evaluating one tar file in that directory takes forever, so the tar optimization never finishes. No user is logged in to this server, yet it still runs very slowly and never finishes. But when I stop all the replication agents and the rendering service, it completes very quickly.
    Any help would be greatly appreciated.

    1. Hello Mohan,

      Note that tar optimization time is limited out of the box (3 hours). What happens when you run tar optimization manually? In the log, you should be able to see exactly which tar file is taking a lot of time.


  4. Are you sure about
    cat .../repository/workspaces/crx.default/index*.tar /dev/null
    Maybe:
    cat .../repository/workspaces/crx.default/index*.tar >/dev/null

    1. Thank you for your feedback. It's cat .../repository/workspaces/crx.default/index*.tar >/dev/null. The HTML editor removed the > for me. Thanks a lot for noticing that.

