Andrea Moro's blog

Combining multiple CSV files on Mac OS X

If you have ever used a Windows or DOS based system, one thing you may have appreciated is the ability to use wildcards while searching for files and directories. Fortunately, the same ability exists on the Mac, but it is safest to put a path in front of the pattern, even if it is just ./ for the current folder. Without it, ls may simply answer with its usage string (for example when a matching file name begins with a dash and is mistaken for an option), which is of no help at all.

Assuming you are in the folder where all your files reside, to list only the files that match certain criteria you have to type something like

ls ./*yoursearchstring*

Clearly, ‘yoursearchstring’ needs to be replaced with the string you need; remember to leave out the quotes.
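To see the wildcard at work without touching any real data, here is a minimal sketch that runs in a throwaway folder. The file names (report_jan.csv and so on) and the demo_dir variable are made up for the illustration.

```shell
# Work in a temporary folder so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical file names, invented for this demo.
touch report_jan.csv report_feb.csv notes.txt

# The shell expands the pattern; ls only ever sees the matching names.
matches=$(ls ./*report*)
echo "$matches"

# Go back to where we started and remove the temporary folder.
cd - > /dev/null && rm -rf "$demo_dir"
```

Only the two report files are listed; notes.txt does not contain the search string, so the shell never passes it to ls.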

I’ve written a Python script to generate a series of files for me, and in my case the files are generated in chronological order. Let’s say I need to order my files from the most recent to the oldest.

One way to do this is to pass the -t flag to ls, which sorts the files by modification time with the newest first:

ls -t ./*yoursearchstring*
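A quick way to check the ordering is to create a few files with known modification times. This is only a sketch: the export_*.csv names and the dates are invented, and touch -t is used just to fake the timestamps.

```shell
# Work in a temporary folder so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical files with explicit modification times (YYYYMMDDhhmm).
touch -t 202001010900 export_a.csv   # oldest
touch -t 202003010900 export_b.csv   # newest
touch -t 202002010900 export_c.csv   # in between

# -t sorts by modification time, most recent first.
newest_first=$(ls -t ./*export*)
echo "$newest_first"

# Adding -r reverses the order, so the oldest file comes first.
oldest_first=$(ls -tr ./*export*)
echo "$oldest_first"

# Go back to where we started and remove the temporary folder.
cd - > /dev/null && rm -rf "$demo_dir"
```

The first listing starts with export_b.csv and ends with export_a.csv; the second one is the same list reversed.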

I’m not going to give you any further examples on this, as it is outside the scope of this article.

All in one line 

Unix based systems are nice because they can combine multiple commands in one single line. To merge files together we need to resort to the cat utility, which is normally used to print the content of a file on screen.

In our case we will use it to copy the source files into a destination file. If the destination already exists, it will be overwritten unless you use the >> operator (double greater-than sign), which stands for appending. The normal syntax uses the single greater-than sign (>).
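The difference between the two operators is easy to try out. The sketch below uses two tiny made-up files (first.csv and second.csv) in a temporary folder; all names are assumptions for the demo.

```shell
# Work in a temporary folder so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Two hypothetical one-line CSV files.
printf 'a,1\n' > first.csv
printf 'b,2\n' > second.csv

# A single > creates the destination or overwrites it.
cat first.csv > combined.csv
cat second.csv > combined.csv
overwritten=$(cat combined.csv)   # only the content of second.csv is left

# A double >> keeps what is there and adds to the end.
cat first.csv >> combined.csv
appended=$(cat combined.csv)      # second.csv followed by first.csv

echo "$appended"

# Go back to where we started and remove the temporary folder.
cd - > /dev/null && rm -rf "$demo_dir"
```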

So the final command will look like this

ls ./*yoursearchstring* | xargs -I t1 cat t1 >> mynewfile.csv 

Oh man. There’s a lot of stuff going on in there. We have already seen ‘ls’. Next, I piped that list into ‘xargs’, which now has a -I flag to go with it. The flag gives each argument received from ls a placeholder name (‘t1’ in this case; the name must come immediately after the flag), and xargs runs the command that follows once for every file name.
As you can see, that name gets used later in the ‘xargs’ command, as the first argument to ‘cat’, which requires a source. That ‘cat’ command is still part of ‘xargs’ and it’s where the magic actually happens!

The shell opens mynewfile.csv for appending, creating it if it does not exist yet. Because xargs is fed by ls, it runs cat once for each file that ls finds, starting with the first file name listed. Each cat prints the content of its file, and thanks to the double greater-than sign that content is added to the end of mynewfile.csv instead of replacing it.
As a result, you will have your shiny new mynewfile.csv with all the content of your files (in my case a series of CSV documents).
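If you want to try the whole thing safely first, here is a small sketch that runs the one-liner on three made-up files (part_1.csv to part_3.csv) in a temporary folder. It also compares the result with a plain cat on the same wildcard, which is an alternative I am adding for comparison and not the command used above; merged_glob.csv is just a name picked for the demo.

```shell
# Work in a temporary folder so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Three hypothetical one-line CSV files.
printf 'a,1\n' > part_1.csv
printf 'b,2\n' > part_2.csv
printf 'c,3\n' > part_3.csv

# The one-liner: xargs runs cat once per file name coming from ls.
ls ./*part* | xargs -I t1 cat t1 >> mynewfile.csv
merged=$(cat mynewfile.csv)
echo "$merged"

# Alternative sketch: cat accepts several files, so the wildcard alone is enough.
cat ./*part* > merged_glob.csv

# cmp -s is silent and only reports through its exit status.
if cmp -s mynewfile.csv merged_glob.csv; then identical=yes; else identical=no; fi
echo "identical: $identical"

# Go back to where we started and remove the temporary folder.
cd - > /dev/null && rm -rf "$demo_dir"
```

One thing to watch in both versions: give the destination a name that does not match your search string, otherwise the output file ends up among its own sources.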

If you are not sure whether to include all the files, you can add the -p flag to the xargs command, which makes the utility prompt for confirmation before executing each command.

Pretty cool, isn’t it?


