Linux - How to remove only the duplicate files under a directory (with the same cksum)
I built the following script in order to remove files with the same cksum (or content).
The problem is that the script can remove files twice, as in the following example (output).
My target is to remove only the duplicate file, not the source file.
Script output:
    starting:
    same: /tmp/file_inventury.out /tmp/file_inventury.out.1
    remove: /tmp/file_inventury.out.1
    same: /tmp/file_inventury.out.1 /tmp/file_inventury.out
    remove: /tmp/file_inventury.out
    same: /tmp/file_inventury.out.2 /tmp/file_inventury.out.3
    remove: /tmp/file_inventury.out.3
    same: /tmp/file_inventury.out.3 /tmp/file_inventury.out.2
    remove: /tmp/file_inventury.out.2
    same: /tmp/file_inventury.out.4 /tmp/file_inventury.out
    remove: /tmp/file_inventury.out
    done.
My script:

    #!/bin/bash
    dir="/tmp"
    echo "starting:"
    for file1 in ${dir}/file_inventury.out*; do
        for file2 in ${dir}/file_inventury.out*; do
            if [ $file1 != $file2 ]; then
                diff "$file1" "$file2" 1>/dev/null
                stat=$?
                if [ $stat -eq 0 ]
                then
                    echo "same: $file1 $file2"
                    echo "remove: $file2"
                    rm "$file1"
                    break
                fi
            fi
        done
    done
    echo "done."
I would also like to hear other options for how to remove files with the same content or cksum (I need to remove only the duplicate file, not the primary file).
Please advise how I can do this under Solaris OS (possible options: a find one-liner, awk, sed, etc.).
This version should be more efficient. I was nervous about paste matching up the correct rows, but it looks like POSIX specifies that globbing is sorted by default.
    for i in *; do
        date -u +%Y-%m-%dT%TZ -r "$i";
    done > .stat;          #store last modification time in a sortable format
    cksum * > .cksum;      #store the cksum, size, and filename
    paste .stat .cksum |   #data for each file, 1 per row
        sort |             #sort by mtime so the original comes first
        awk '{
            if($2 in f)
                system("rm -v " $4); #rm if we have seen an occurrence of this cksum
            else
                f[$2]++              #count the first occurrence
        }'
This should run in O(n * log(n)) time, reading each file once.
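One caveat with the awk step above is that it hands only the first whitespace-separated word of each name to rm, so file names containing spaces would not be handled. Also, `date -r FILE` and `rm -v` are extensions that may not be available with every system's stock tools. A minimal sketch of the same idea (keep the oldest file per checksum, remove later copies) that avoids both is shown here. It assumes that file names contain no newlines and that `ls -tr` ordering is an acceptable stand-in for the mtime sort; `remove_dups` is a hypothetical helper name, not part of the original answer.

```shell
#!/bin/sh
# remove_dups DIR  (hypothetical helper, for illustration only)
# Keeps the oldest file for each distinct content in DIR and removes
# the later copies. The body runs in a subshell, so the cd and the
# variables do not leak into the caller.
remove_dups() (
    cd "$1" || exit 1
    seen=" "    # space-delimited list of "crc:size" keys already kept
    # ls -tr lists oldest first; assumes file names contain no newlines.
    ls -tr | while IFS= read -r name; do
        [ -f "$name" ] || continue
        # Reading from stdin makes cksum print only the checksum and size.
        key=$(cksum < "$name" | tr ' \t' '::')
        case $seen in
            *" $key "*) echo "remove: $name"; rm -- "$name" ;;
            *)          seen="$seen$key " ;;
        esac
    done
)
```

It would be called as `remove_dups /tmp`. The `case` lookup scans the list of keys seen so far, so this trades the sort-based efficiency for simplicity and is meant for directories of modest size.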
You can put the paste/sort/awk pipeline in a shell script as:

    #!/bin/sh
    for i in *; do date -u +%Y-%m-%dT%TZ -r "$i"; done > .stat;
    cksum * > .cksum;
    paste .stat .cksum | sort | awk '{if($2 in f) system("rm -v " $4); else f[$2]++}';
    rm .stat .cksum;
    exit 0;
Or as a one-liner:

    for i in *; do date -u +%Y-%m-%dT%TZ -r "$i"; done > .stat; cksum * > .cksum; paste .stat .cksum | sort | awk '{if($2 in f) system("rm -v " $4); else f[$2]++}'; rm .stat .cksum;
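For comparison, the nested-loop approach from the question can also be adjusted so that only the later copy is removed. The following is a rough sketch, not the asker's script: it assumes the "primary" file is the one that sorts first in glob order (for example `file_inventury.out` before `file_inventury.out.1`), it uses `cmp -s` in place of `diff`, and `remove_later_copies` is a hypothetical name.

```shell
#!/bin/sh
# remove_later_copies PREFIX  (hypothetical helper, for illustration only)
# For every file matching PREFIX*, removes any file later in glob order
# that has identical content, so the first file of each group survives.
remove_later_copies() (
    for file1 in "$1"*; do
        [ -f "$file1" ] || continue       # already removed as a duplicate
        past=0                            # becomes 1 once file1 has been passed
        for file2 in "$1"*; do
            if [ "$file2" = "$file1" ]; then
                past=1
                continue
            fi
            # Only compare against files that sort after file1.
            [ "$past" -eq 1 ] && [ -f "$file2" ] || continue
            if cmp -s "$file1" "$file2"; then
                echo "remove: $file2 (same as $file1)"
                rm -- "$file2"
            fi
        done
    done
)
```

It would be called as `remove_later_copies /tmp/file_inventury.out`. Because each pair is compared byte by byte, this is quadratic in the number of files, so the checksum-based approach is likely the better fit for large directories.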