Friday, May 10, 2013

How to Perform System Clean Up in Adobe CQ / AEM (CQ5.5)

Use Case:

A CQ system grows over time as data is added, modified and removed. CQ follows an append-only model for the datastore, so data is never deleted from the datastore even when it is deleted from the console. Over time you also accumulate many unnecessary packages from deployments and migrations. On top of that, adding many DAM assets creates a lot of workflow data that is not needed afterwards.

As a result, disk usage increases. If you plan to run many instances on the same hardware (especially dev instances), it makes sense to reduce the size of each instance from time to time.

Solution: 

You can use the following script to clean up your data from time to time.

Prerequisite:

Get the workflow purge package (purge-workflows-2.zip, the file name the script below expects) from here

Step 1:

Create a file with information about your instances (in this example it is named host_list.txt)

#File is used to feed the clean up package script
#FORMAT HOST:PORT
<YOUR SERVER>:<PORT>
#END
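
As a minimal sketch of how a file in this format could be read, the helper below skips comment and blank lines and splits each remaining entry into host and port. The function name read_host_list is hypothetical and not part of the original script, which passes each HOST:PORT entry straight into the URL instead.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# print "host port" for every usable HOST:PORT line in a host list file.
read_host_list() {
  local list_file="$1"
  local line host port
  while IFS= read -r line || [ -n "${line}" ]; do
    # Remove spaces, tabs and carriage returns
    line=$(printf '%s' "${line}" | tr -d ' \t\r')
    case "${line}" in
      ''|'#'*) continue ;;   # blank line or comment
      *:*) ;;                # looks like HOST:PORT
      *) echo "Skipping malformed line: ${line}" >&2; continue ;;
    esac
    host="${line%%:*}"       # text before the first colon
    port="${line##*:}"       # text after the last colon
    printf '%s %s\n' "${host}" "${port}"
  done < "${list_file}"
}
```

Calling read_host_list host_list.txt prints one "host port" pair per line, which makes it easy to spot a malformed entry before running the clean up.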

Step 2:

Actual Script

#!/bin/bash
#
# Description:
#      Purge workflows (master author only)
#      Clean old packages
#      Run datastore GC


PURGE_WORK_FLOWS_FILE="purge-workflows-2.zip"
CURL_USER='admin:my_super_secret'
IS_PURGE_PAK_FOUND=NO
MY_HOST_LIST=host_list.txt
# Name of package group that you want to clear
PACKAGE_GROUP="<MY PACKAGE GROUP>"


if [ ! -f "${MY_HOST_LIST}" ]; then
  echo "Error cannot find host list file: ${MY_HOST_LIST}"
  echo "Exiting ..."
  exit 1;
fi

function run_purge_job()
{
# Host name of the master author (no space after "=", and quoted)
MY_HOST="<YOUR HOST NAME>"
IS_PURGE_PAK_FOUND=$(curl -su "${CURL_USER}" "http://${MY_HOST}:4502/crx/packmgr/service.jsp?cmd=ls" | grep "name" | grep "purge-workflows-2" | tr -d ' \t\n\r\f')

if [ -z "${IS_PURGE_PAK_FOUND}" ]; then
  IS_PURGE_PAK_FOUND=NO
else
  IS_PURGE_PAK_FOUND=YES
fi

# Upload and install the purge package if it is not installed yet
if [ "${IS_PURGE_PAK_FOUND}" = "NO" ] && [ -f "${PURGE_WORK_FLOWS_FILE}" ]; then
   MY_PAK_NAME=$(basename "${PURGE_WORK_FLOWS_FILE}" .zip)
   MY_STATUS=$(curl -su "${CURL_USER}" -f -F "install=true" -F "name=${MY_PAK_NAME}" -F "file=@${PURGE_WORK_FLOWS_FILE}" "http://${MY_HOST}:4502/crx/packmgr/service.jsp" | grep 'code="200"' | tr -d ' \t\n\r\f')

   if [ -z "${MY_STATUS}" ]; then
     echo "Error uploading $PURGE_WORK_FLOWS_FILE exiting..."
     exit 1
   fi
   # The package is installed now, so the purge below can run
   IS_PURGE_PAK_FOUND=YES
fi

if [ "${IS_PURGE_PAK_FOUND}" = "YES" ]; then
   # Purge completed workflows first, then aborted ones
   curl -su "${CURL_USER}" -X POST --data "status=COMPLETED&runpurge=1&Start=Run" "http://${MY_HOST}:4502/apps/workflow-purge/purge.html" > /dev/null 2>&1
   sleep 10
   curl -su "${CURL_USER}" -X POST --data "status=ABORTED&runpurge=1&Start=Run" "http://${MY_HOST}:4502/apps/workflow-purge/purge.html" > /dev/null 2>&1
fi
}

function clean_old()
{
for MY_HOST in $(grep -v '^#' "${MY_HOST_LIST}")
do
# Non-empty only if the instance answers and lists a package from the group
IS_INSTANCE_UP=$(curl --connect-timeout 20 -su "${CURL_USER}" -X POST "http://${MY_HOST}/crx/packmgr/service.jsp?cmd=ls" | grep "name" | grep -i "${PACKAGE_GROUP}" | tr -d ' \t\n\r\f')

if [ -z "${IS_INSTANCE_UP}" ]; then
   continue
fi

# You can delete multiple packages here
# Or you can use commands from here
echo "deleting package group"
curl -su "${CURL_USER}" -F ":operation=delete" "http://${MY_HOST}/etc/packages/${PACKAGE_GROUP}" > /dev/null 2>&1
sleep 10
done
}

function clean_datastore_gc()
{
for MY_HOST in $(grep -v '^#' "${MY_HOST_LIST}")
do
# Skip hosts that do not answer with HTTP 200
IS_INSTANCE_UP=$(curl --connect-timeout 20 -su "${CURL_USER}" -Is "http://${MY_HOST}/crx/packmgr/index.jsp" | grep HTTP | cut -d ' ' -f2)

if [ "${IS_INSTANCE_UP}" != "200" ]; then
   continue
fi
echo "running datastore gc"
   curl -su  "${CURL_USER}" -X POST --data "delete=true&delay=2" http://${MY_HOST}/system/console/jmx/com.adobe.granite%3Atype%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean > /dev/null 2>&1
done
}

case "$1" in
  'purge')
   run_purge_job
;;
  'clean_paks')
   clean_old
;;
  'clean_ds')
   clean_datastore_gc
;;
*)
  echo "Usage: $0 {purge|clean_paks|clean_ds}"
  exit 1
  ;;
esac
exit 0
#
#end
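
The instance check in clean_datastore_gc depends on pulling the status code out of the response headers. The sketch below shows one way to isolate that step so it can be tried without a running instance. The function names status_from_headers and is_instance_up are hypothetical and not part of the script above; the URL and the CURL_USER value are the ones the script already uses.

```shell
#!/bin/bash
# Hypothetical helpers, shown only to illustrate the status check.
CURL_USER='admin:my_super_secret'

# Read HTTP response headers on stdin and print the status code
# of the last status line (prints nothing if there is none).
status_from_headers() {
  awk '{ gsub(/\r/, "") }
       toupper($1) ~ /^HTTP\// { code = $2 }
       END { if (code != "") print code }'
}

# Succeed only if the package manager page of HOST:PORT answers with 200.
is_instance_up() {
  local code
  code=$(curl --connect-timeout 20 -su "${CURL_USER}" -I "http://$1/crx/packmgr/index.jsp" | status_from_headers)
  [ "${code}" = "200" ]
}
```

With these helpers, a loop body could start with is_instance_up "${MY_HOST}" || continue. The clean up script itself is run with one of the three sub-commands from its usage message: purge, clean_paks or clean_ds.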


Manual Cleaning:

CQ5.5 and before:
1) Download the workflow purge script from here
2) Install the purge script using package manager
3) Log in as admin or as a user with administrative access
4) Go to http://${MY_HOST}:4502/apps/workflow-purge/purge.html
5) Select "completed" from the drop-down and run the workflow purge.
6) You might have to run it multiple times to make sure that everything is deleted.
7) Using CRXDE Lite or CRX Explorer with an admin session, go to /etc/packages/<Your package group>
8) Delete the packages you want to remove
9) After deleting, click "Save All"
10) To run datastore GC, please follow http://www.wemblog.com/2012/03/how-to-run-online-backup-using-curl-in.html or http://www.cqtutorial.com/courses/cq-admin/cq-admin-lessons/cq-maintenance/cq-datastore-gc


In CQ 5.6 you can configure audit and workflow purge out of the box (OOTB) using the instructions here: http://helpx.adobe.com/cq/kb/howtopurgewf.html


Special Thanks to Rexwell Minnis for organizing this script.

Note: Please test this before use. I did not have enough time to test it completely.