Ubuntu: How to copy random files to a specific folder?



Question:

I have a vast collection of files (6.5 million) in several folders and sub-folders, and I want to copy some random picks (about 200k-300k files) to a directory to make a randomized sample.

The folder tree looks like this (just a small sample); each folder contains several files:

.
├── articles.0-9A-B.txt
│   ├── 20_Century_Br_Hist
│   ├── 3_Biotech
│   ├── A_A_Case_Rep
│   ├── AAPS_J
│   ├── AAPS_PharmSciTech
│   ├── Abdom_Imaging
│   ├── Abdom_Radiol
│   ├── Abdom_Radiol_(NY)
│   ├── Acad_Emerg_Med
│   ├── Acad_Med
│   ├── Acad_Psychiatry
│   ├── Acad_Radiol
│   ├── Acc_Chem_Res
.
.
.
│   ├── Bull_Sci_Technol_Soc
│   ├── Bull_Volcanol
│   ├── Bull_World_Health_Organ
│   ├── Bundesgesundheitsblatt_Gesundheitsforschung_Gesundheitsschutz
│   ├── Burn_Res
│   ├── Burns
│   ├── Burns_Trauma
│   └── Bus_Soc
├── articles.A-B.xml
│   ├── 20_Century_Br_Hist
│   ├── 3_Biotech
│   ├── A_A_Case_Rep
│   ├── AAPS_J
│   ├── AAPS_PharmSciTech
│   ├── Abdom_Imaging
.
.
.


Solution 1:

Normally this would be a one-liner, but it may be a bad idea to process such a huge number of file names directly, so I'll use a tempfile here.

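#!/bin/bash
# Temporary file that holds the randomly chosen file names
a=$(mktemp)
# List every file, then keep a random 200,000-300,000 of the lines
find /path/to/dir -type f | shuf -n "$(shuf -i200000-300000 -n1)" >"$a"
# Copy each listed file; IFS='' and -r keep the names unmodified
while IFS='' read -r l || [[ -n "$l" ]]; do
    cp -- "$l" /path/to/out/dir
done <"$a"
# Remove the temporary file again
rm -f -- "$a"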

This will find every file located in /path/to/dir, shuffle them and save a random number of lines (between 200,000 and 300,000 as requested) of the output in tempfile $a. The while loop then just copies every file in the list to /path/to/out/dir.


On second thought, we don't need a tempfile at all: we can pipe the list straight into the while loop or, which I prefer, into tr and xargs:

#!/bin/bash
find /path/to/dir -type f | shuf -n $(shuf -i200000-300000 -n1) |\
tr '\n' '\0' | xargs -0 -n1 cp -t /path/to/out/dir

This way you can even specify how many file names each invocation of cp should receive via xargs' -n option. With -n1, as above, cp runs once per file; a larger value starts fewer cp processes.
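As a rough sketch of the batching idea, and not part of the original answer, the same pipeline can be wrapped in a function that hands cp many names at once. It assumes GNU find, shuf and xargs (for -print0, -z and -0 -r). The function name copy_random_sample and its parameters src, dst, count and batch are made up for illustration. Using NUL-terminated names from start to finish also makes the tr step unnecessary.

```shell
#!/bin/bash
# Hypothetical helper (name and parameters are illustrative only):
#   copy_random_sample SRC DST COUNT [BATCH]
# Copies COUNT randomly chosen regular files from below SRC into DST,
# passing at most BATCH (default 1000) file names to each cp call.
copy_random_sample() {
    local src=$1 dst=$2 count=$3 batch=${4:-1000}
    mkdir -p -- "$dst"
    # -print0 / -z / -0 keep every name NUL-terminated, so names that
    # contain spaces or newlines pass through the pipeline unchanged.
    # -r stops xargs from running cp when no file was found at all.
    find "$src" -type f -print0 |
        shuf -z -n "$count" |
        xargs -0 -r -n "$batch" cp -t "$dst" --
}

# Example call, using the placeholder paths from above:
# copy_random_sample /path/to/dir /path/to/out/dir "$(shuf -i200000-300000 -n1)"
```

Since everything lands in a single directory, files that share a name but come from different sub-folders will overwrite each other, so the output may contain fewer files than were picked.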

