linux - How to remove only the duplicate files under a directory (with the same cksum)

- June 15, 2013

I built the following script to remove files that have the same cksum (or content).

The problem is that the script can remove files twice, so both copies of a pair end up deleted, as the following example output shows.

My goal is to remove only the duplicate file, not the source file.

script output:

    starting:
    same: /tmp/file_inventury.out /tmp/file_inventury.out.1
    remove: /tmp/file_inventury.out.1
    same: /tmp/file_inventury.out.1 /tmp/file_inventury.out
    remove: /tmp/file_inventury.out
    same: /tmp/file_inventury.out.2 /tmp/file_inventury.out.3
    remove: /tmp/file_inventury.out.3
    same: /tmp/file_inventury.out.3 /tmp/file_inventury.out.2
    remove: /tmp/file_inventury.out.2
    same: /tmp/file_inventury.out.4 /tmp/file_inventury.out
    remove: /tmp/file_inventury.out
    done.

my script:

    #!/bin/bash
    dir="/tmp"
    echo "starting:"
    for file1 in ${dir}/file_inventury.out*; do
        for file2 in ${dir}/file_inventury.out*; do
            if [ $file1 != $file2 ]; then
                diff "$file1" "$file2" 1>/dev/null
                stat=$?
                if [ $stat -eq 0 ]
                then
                    echo "same: $file1 $file2"
                    echo "remove: $file2"
                    rm "$file1"
                    break
                fi
            fi
        done
    done
    echo "done."

I am also open to other ways of removing files with the same content or cksum (I need to remove only the duplicate file, not the primary file).

Please advise how this can be done under Solaris OS (options, for example: a find one-liner, awk, sed, etc.).

This version should be more efficient. I was nervous about paste matching the correct rows, but it looks like POSIX specifies that globbing is sorted by default.

    for i in *; do
        date -u +%Y-%m-%dT%TZ -r "$i";
    done > .stat;         #store last modification time in a sortable format
    cksum * > .cksum;     #store the cksum, size, and filename
    paste .stat .cksum |  #data for each file, 1 per row
        sort |            #sort by mtime so the original comes first
        awk '{
            if($2 in f)
                system("rm -v " $4); #rm if we have seen an occurrence of this cksum
            else
                f[$2]++              #count the first occurrence
        }'

This should run in O(n * log(n)) time, reading each file once.
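If the modification time does not matter and it is acceptable to keep whichever file comes first in glob order, the same "remember each checksum, delete repeats" idea can be sketched with only `cksum`, `awk` and `rm`. This may be useful where `date -r FILE` is not available (it is a GNU extension, so it is probably missing on a stock Solaris). The function name `dedupe_by_cksum` and its two arguments are made up for this illustration, and the sketch assumes file names without whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): remove every file whose
# checksum and size match an earlier file in glob order.
# Assumption: file names contain no spaces, tabs or newlines.
dedupe_by_cksum() {
    dir=$1       # directory to scan
    prefix=$2    # file name prefix, e.g. file_inventury.out

    # cksum prints "checksum size path" for each file, in glob order.
    cksum "$dir"/"$prefix"* |
    awk '{
        key = $1 " " $2           # checksum plus size identifies the content
        if (key in seen)
            print $3              # content seen before: report the duplicate
        else
            seen[key] = 1         # first occurrence: keep it
    }' |
    while IFS= read -r dup; do
        echo "remove: $dup"
        rm "$dup"
    done
}
```

Calling `dedupe_by_cksum /tmp file_inventury.out` on the files from the question would keep `/tmp/file_inventury.out` as the primary copy, because it sorts first. As with any approach based on `cksum`, two different files with the same CRC and size would be treated as duplicates.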

The mtime-sorted pipeline can be put in a shell script as:

    #!/bin/sh
    for i in *; do
        date -u +%Y-%m-%dT%TZ -r "$i";
    done > .stat;
    cksum * > .cksum;
    paste .stat .cksum | sort | awk '{if($2 in f) system("rm -v " $4); else f[$2]++}';
    rm .stat .cksum;
    exit 0;

Or as a one-liner:

    for i in *; do date -u +%Y-%m-%dT%TZ -r "$i"; done > .stat; cksum * > .cksum; paste .stat .cksum | sort | awk '{if($2 in f) system("rm -v " $4); else f[$2]++}'; rm .stat .cksum;
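For comparison, the double removal in the question seems to come from the nested loop visiting every pair in both orders, so each file of a pair gets a turn at being deleted. A minimal sketch of a pairwise loop that avoids this is shown here: it compares each file only against names that come after it, skips files that are already gone, and deletes the later file of a matching pair. The function name `remove_later_duplicates` and its arguments are hypothetical, and `cmp -s` is swapped in for `diff` as a quieter byte-for-byte comparison.

```shell
#!/bin/sh
# Hypothetical helper (not the asker's script): pairwise comparison that only
# ever deletes the later file of a matching pair.
remove_later_duplicates() {
    dir=$1       # directory to scan
    prefix=$2    # file name prefix, e.g. file_inventury.out

    for file1 in "$dir"/"$prefix"*; do
        [ -f "$file1" ] || continue      # removed earlier, or glob had no match
        past=0                           # becomes 1 once the inner loop passes file1
        for file2 in "$dir"/"$prefix"*; do
            if [ "$file2" = "$file1" ]; then
                past=1
                continue
            fi
            [ "$past" -eq 1 ] || continue    # only look at names after file1
            [ -f "$file2" ] || continue
            if cmp -s "$file1" "$file2"; then
                echo "same: $file1 $file2"
                echo "remove: $file2"
                rm "$file2"
            fi
        done
    done
}
```

This is still quadratic in the number of files and reads each file many times, so it is only reasonable for small directories; the checksum-based pipeline scales better.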
