Saturday, November 26, 2011

How to Truncate and Interpret a Tar File in CQ / WEM

Use Case: You want to analyze what is inside a tar file, or possibly truncate it using standard command-line tools.

Solution:

Suppose you are analyzing the very first data tar file under /crx-quickstart/repository/workspace/crx.default:

$ tar -tvf data_00010.tar | less

-rwxrwxrwx 0/0 301 2011-11-22 02:00 cafebabe-cafe-babe-cafe-babecafebabe.na
-rwxrwxrwx 0/0 80 2011-11-22 02:00 deadbeef-cafe-babe-cafe-babecafebabe.na
-rwxrwxrwx 0/0 142 2011-11-22 02:00 f34600a5-be47-41be-8568-8c79dcbd7355.na
-rwxrwxrwx 0/0 296 2011-11-22 02:00 21232f29-7a57-35a7-8389-4a0e4a801fc3.na
-rwxrwxrwx 0/0 348 2011-11-22 02:00 a9c3313a-c296-486d-99ac-b56ebfdefd26.na
-rwxrwxrwx 0/0 192 2011-11-22 02:00 63781579-2ec1-441d-b582-3eea7ce6a0bd.na
-rwxrwxrwx 0/0 276 2011-11-22 02:00 294de355-7d9d-30b3-92d8-a1e6aab028cf.na
-rwxrwxrwx 0/0 286 2011-11-22 02:00 984c30b3-96f5-48b7-a1f3-3d74d0f00460.na
-rwxrwxrwx 0/0 0 2011-11-23 11:30 1322065821400/2782dd9a-c5a8-4689-a9aa-ae46a4074ac3.n
.....
-rwxrwxrwx 0/0 20 2011-11-22 02:00 commit-1321945201521.sh

Entries that are not in the <number>/<UUID> form are system UUIDs (the first one is the root node) and should not be modified.

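An entry such as 1322065821400/2782dd9a-c5a8-4689-a9aa-ae46a4074ac3.n means that a transaction was started on UUID 2782dd9a-c5a8-4689-a9aa-ae46a4074ac3 at 1322065821400 (milliseconds since the epoch), i.e. Wed Nov 23 2011 11:30:21 GMT-0500 (EST). You can convert the value with System.out.println(new java.sql.Timestamp(1322065821400L).toString()) or with http://www.esqsoft.com/javascript_examples/date-to-epoch.htm.

An entry such as commit-1321945201521.sh means that a transaction was committed; its timestamp corresponds to Tue Nov 22 2011 02:00:01 GMT-0500 (EST). A commit entry carries the same number as the entries of its transaction (for example, commit-1322090435044.sh and 1322090435044/... in the listing below).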
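The naming rules above can be sketched as a small shell helper. This is a minimal sketch, not part of CRX: the function names ms_to_date and entry_info are hypothetical, the patterns simply mirror the entry names in the listing, and date -u -d @SECONDS assumes GNU coreutils (BSD/macOS date uses -r SECONDS instead). Times are printed in UTC to avoid depending on the machine's time zone.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CRX): classify a tar entry name and
# decode its millisecond timestamp. date -u -d @SECONDS is GNU coreutils
# syntax; BSD/macOS date uses -r SECONDS instead.

# ms_to_date MILLIS: print the instant as a UTC date string.
ms_to_date() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
}

# entry_info NAME: print "<kind> <date>" following the naming rules above.
entry_info() {
  case $1 in
    commit-*.sh)                 # commit marker: commit-<millis>.sh
      ms=${1#commit-}
      ms=${ms%.sh}
      echo "commit $(ms_to_date "$ms")" ;;
    [0-9]*/*)                    # transaction entry: <millis>/<UUID>.<ext>
      ms=${1%%/*}
      echo "transaction-start $(ms_to_date "$ms")" ;;
    *)                           # plain <UUID>.<ext>: no timestamp prefix
      echo "node (no timestamp)" ;;
  esac
}

# Usage:
#   entry_info 1322065821400/2782dd9a-c5a8-4689-a9aa-ae46a4074ac3.n
#   entry_info commit-1321945201521.sh
```

With the two names from the listing, the calls should print transaction-start 2011-11-23 16:30:21 UTC and commit 2011-11-22 07:00:01 UTC, which are the UTC equivalents of the EST times given above.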

To truncate a tar file, you need a hex viewer (or hex editor) in combination with the dd tool.

First use something like hexdump -C data_00010.tar | more to get the hex offset of the entry (timestamp) where you want to cut.

Then run dd if=./data_00010.tar of=./data_00010_copy.tar bs=<location> count=1 to truncate. This copies the first <location> bytes into a new file and leaves the original untouched. Note that <location> is a decimal value, so you need to convert the hex offset to decimal before using this command.
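The two steps can be wrapped in a small shell function. This is a hypothetical helper (hex_to_dec and truncate_tar are not standard tools): it converts the hexdump offset with printf, refuses to overwrite an existing output file, and checks that the offset falls on a 512-byte boundary, since every tar header starts on one. It swaps the bs=<location> count=1 form for bs=512 with a block count; for a regular file both copy the same leading bytes, but the smaller block size avoids allocating one huge buffer.

```shell
#!/bin/sh
# Hypothetical helpers (not standard tools) for the two steps above.

# hex_to_dec HEX: convert a hexdump offset such as 03ac3800 to decimal.
hex_to_dec() {
  printf '%d\n' "0x${1#0x}"
}

# truncate_tar SRC DST HEX_OFFSET: copy the bytes of SRC that come before
# HEX_OFFSET into a new file DST. SRC itself is never modified.
truncate_tar() {
  src=$1
  dst=$2
  off=$(hex_to_dec "$3") || return 1
  if [ -e "$dst" ]; then
    echo "refusing to overwrite existing file $dst" >&2
    return 1
  fi
  if [ $(( off % 512 )) -ne 0 ]; then
    # Tar headers always start on a 512-byte boundary.
    echo "offset $off is not on a 512-byte tar block boundary" >&2
    return 1
  fi
  dd if="$src" of="$dst" bs=512 count=$(( off / 512 )) 2>/dev/null
}

# Usage: truncate_tar ./data_00010.tar ./data_00010_copy.tar <hex offset>
```

The overwrite check matters because dd silently replaces its output file; writing to a fresh name keeps the original tar available if the cut point turns out to be wrong.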


For example:
$ tar -tvf data_00010.tar | tail -n 4
-rwxrwxrwx 0/0 20 2011-11-23 17:24 commit-1322087067980.sh
-rwxrwxrwx 0/0 0 2011-11-23 18:20 1322090435044/398a250e-96f7-4029-a083-4c78f426e56d.n
-rwxrwxrwx 0/0 203 2011-11-23 18:20 1322090435044/7efe2484-5818-4638-8ccb-08ad9c2fe4c0.na
-rwxrwxrwx 0/0 20 2011-11-23 18:20 commit-1322090435044.sh

$ hexdump -C data_00010.tar | tail -n 100

03ac3400 63 6f 6d 6d 69 74 2d 31 33 32 32 30 38 37 30 36 |commit-132208706|
03ac3410 37 39 38 30 2e 73 68 00 00 00 00 00 00 00 00 00 |7980.sh.........|
03ac3420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
03ac3460 00 00 00 00 20 20 20 37 37 37 20 00 20 20 20 20 |....   777 .    |
03ac3470 20 30 20 00 20 20 20 20 20 30 20 00 20 20 20 20 | 0 .     0 .    |
03ac3480 20 20 20 20 20 32 34 20 31 31 36 36 33 32 37 31 |     24 11663271|
03ac3490 32 33 33 20 20 20 20 37 32 32 32 20 30 00 00 00 |233    7222 0...|
03ac34a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
03ac3600 6d 76 20 31 33 32 32 30 38 37 30 36 37 39 38 30 |mv 1322087067980|
03ac3610 2f 2a 20 2e 00 00 00 00 00 00 00 00 00 00 00 00 |/* .............|
03ac3620 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
03ac3800 31 33 32 32 30 39 30 34 33 35 30 34 34 2f 33 39 |1322090435044/39|
03ac3810 38 61 32 35 30 65 2d 39 36 66 37 2d 34 30 32 39 |8a250e-96f7-4029|
03ac3820 2d 61 30 38 33 2d 34 63 37 38 66 34 32 36 65 35 |-a083-4c78f426e5|
03ac3830 36 64 2e 6e 00 00 00 00 00 00 00 00 00 00 00 00 |6d.n............|
03ac3840 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
03ac3860 00 00 00 00 20 20 20 37 37 37 20 00 20 20 20 20 |....   777 .    |
03ac3870 20 30 20 00 20 20 20 20 20 30 20 00 20 20 20 20 | 0 .     0 .    |
03ac3880 20 20 20 20 20 20 30 20 31 31 36 36 33 32 37 37 |      0 11663277|
03ac3890 37 30 33 20 20 20 31 32 31 31 30 20 30 00 00 00 |703   12110 0...|
03ac38a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
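Each group in the dump above is a 512-byte tar header. Assuming the standard POSIX ustar layout (name at byte 0, size at byte 124 and mtime at byte 136, both stored as octal ASCII), the header at 03ac3400 decodes as follows: the size field reads 24 octal, i.e. the 20-byte commit-1322087067980.sh, and the mtime field reads 11663271233 octal, i.e. 1322087067 seconds, which matches the commit timestamp. The data block that follows at 03ac3600 holds that 20-byte script itself (mv 1322087067980/* .). The hypothetical helpers below read those fields with dd instead of by eye.

```shell
#!/bin/sh
# Hypothetical helpers: decode fields of the 512-byte tar header that starts
# at a given byte offset. Field positions follow the POSIX ustar layout.

# raw_field FILE OFFSET START LEN: print one header field, dropping the NUL
# and space padding (fine for CRX entry names, which contain no spaces).
raw_field() {
  dd if="$1" bs=1 skip=$(( $2 + $3 )) count="$4" 2>/dev/null | tr -d '\000 '
}

# tar_header_info FILE OFFSET: print "<name> <size in bytes> <mtime in s>".
tar_header_info() {
  name=$(raw_field "$1" "$2" 0 100)                        # bytes 0-99
  size=$(printf '%d' "0$(raw_field "$1" "$2" 124 12)")     # octal, leading 0
  mtime=$(printf '%d' "0$(raw_field "$1" "$2" 136 12)")    # octal, leading 0
  echo "$name $size $mtime"
}

# Usage: tar_header_info data_00010.tar "$(printf '%d' 0x03ac3400)"
```

For the header at 03ac3400 in the dump, this should print commit-1322087067980.sh 20 1322087067. The leading 0 added before each numeric field makes printf read it as octal, which is how tar stores sizes and timestamps.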

In the example above, suppose you want to truncate at 1322090435044/398a250e-96f7-4029-a083-4c78f426e56d.n, i.e. remove that entry and everything after it. Find the corresponding entry in the hex output:

03ac3800 31 33 32 32 30 39 30 34 33 35 30 34 34 2f 33 39 |1322090435044/39|
03ac3810 38 61 32 35 30 65 2d 39 36 66 37 2d 34 30 32 39 |8a250e-96f7-4029|
03ac3820 2d 61 30 38 33 2d 34 63 37 38 66 34 32 36 65 35 |-a083-4c78f426e5|
03ac3830 36 64 2e 6e 00 00 00 00 00 00 00 00 00 00 00 00 |6d.n............|
03ac3840 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

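Take the offset in the first column of the line where the entry name begins, 03ac3800 in this case. Because the name is the first field of the entry's 512-byte tar header, this is exactly where the entry starts. Convert it to decimal using any tool (for example http://www.binaryhexconverter.com/hex-to-decimal-converter); the decimal value is 61618176. Then use the following command to truncate: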

dd if=./data_00010.tar of=./data_00010_copy.tar bs=61618176 count=1
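The conversion in this step can also be checked in the shell instead of an online converter, since printf accepts a 0x-prefixed hex number. The offset is an exact multiple of 512, as every tar header offset should be, so (for a regular file) dd bs=512 count=120348 would copy exactly the same bytes as the command above.

```shell
#!/bin/sh
# Worked check of the offset used in this example.
off=$(printf '%d' 0x03ac3800)
echo "$off"                 # 61618176
echo $(( off % 512 ))       # 0: lies on a 512-byte tar block boundary
echo $(( off / 512 ))       # 120348 blocks of 512 bytes
```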

Now compare both files: the original is data_00010.tar and the truncated copy is data_00010_copy.tar. The copy ends with commit-1322087067980.sh; the entries of transaction 1322090435044 are gone.

$ tar -tvf data_00010.tar | tail -n 6
-rwxrwxrwx 0/0 20 2011-11-23 16:09 commit-1322082572919.sh
-rwxrwxrwx 0/0 168 2011-11-23 17:24 1322087067980/fb042de1-85f4-4ca3-b4ee-e083ea79753c.na
-rwxrwxrwx 0/0 20 2011-11-23 17:24 commit-1322087067980.sh
-rwxrwxrwx 0/0 0 2011-11-23 18:20 1322090435044/398a250e-96f7-4029-a083-4c78f426e56d.n
-rwxrwxrwx 0/0 203 2011-11-23 18:20 1322090435044/7efe2484-5818-4638-8ccb-08ad9c2fe4c0.na
-rwxrwxrwx 0/0 20 2011-11-23 18:20 commit-1322090435044.sh

$ tar -tvf data_00010_copy.tar | tail -n 6
-rwxrwxrwx 0/0 20 2011-11-23 12:14 commit-1322068449556.sh
-rwxrwxrwx 0/0 0 2011-11-23 16:09 1322082572919/ef65dcfa-000a-4ede-ab3d-4844ee66d242.n
-rwxrwxrwx 0/0 92 2011-11-23 16:09 1322082572919/97daadf7-770b-4005-a1f1-86040b32b9b5.na
-rwxrwxrwx 0/0 20 2011-11-23 16:09 commit-1322082572919.sh
-rwxrwxrwx 0/0 168 2011-11-23 17:24 1322087067980/fb042de1-85f4-4ca3-b4ee-e083ea79753c.na
-rwxrwxrwx 0/0 20 2011-11-23 17:24 commit-1322087067980.sh
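Searching hexdump output by eye is error-prone. A rough alternative is to walk the tar headers and print each entry's offset, both in decimal (ready for dd) and in the same 8-digit hex form that hexdump -C uses. This is a hypothetical sketch that assumes every entry has a plain header with a name shorter than 100 bytes (no GNU long-name or pax extension headers), which matches the CRX entries shown here; it reads only the name and size fields of each header.

```shell
#!/bin/sh
# Hypothetical sketch: list every entry with the byte offset of its header,
# in decimal (for dd) and as 8-digit hex (as printed by hexdump -C).
# Assumes plain headers with names under 100 bytes, i.e. no GNU long-name
# or pax extension headers, which matches the CRX entries shown above.

# hdr_field FILE OFFSET START LEN: one header field without NUL/space padding.
hdr_field() {
  dd if="$1" bs=1 skip=$(( $2 + $3 )) count="$4" 2>/dev/null | tr -d '\000 '
}

list_tar_offsets() {
  file=$1
  off=0
  total=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
  while [ $(( off + 512 )) -le "$total" ]; do
    name=$(hdr_field "$file" "$off" 0 100)
    [ -z "$name" ] && break          # all-zero block: end of archive
    size=$(printf '%d' "0$(hdr_field "$file" "$off" 124 12)")
    printf '%d\t%08x\t%s\n' "$off" "$off" "$name"
    # Next header: skip this header plus the data rounded up to 512 bytes.
    off=$(( off + 512 + (size + 511) / 512 * 512 ))
  done
}

# Usage: list_tar_offsets data_00010.tar | tail -n 4
```

For the archive in this example, the line for 1322090435044/398a250e-96f7-4029-a083-4c78f426e56d.n should show 61618176 and 03ac3800, the same values found by hand above. Each header costs a few small dd calls, so this is slow on very large archives, but it never modifies the file.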


References:
1) Hex to decimal converter: http://www.binaryhexconverter.com/hex-to-decimal-converter
2) Making sense of hexdump: http://www.novell.com/communities/node/6419/making-sense-hexdump
3) dd (Unix): http://en.wikipedia.org/wiki/Dd_(Unix)

Thanks to Thomas Mueller from Adobe for providing this information.

Caution: This command is very dangerous and can lead to data loss. Please use it with a lot of caution.
