Combining multiple CSV files on Mac OS X

If you have ever used a Windows or DOS-based system, one thing you may have appreciated is the ability to use wildcards when searching for files and directories. Fortunately, wildcards work on the Mac too, but it is a good habit to prefix the pattern with a path, such as ./ for the current folder. Otherwise, a file whose name starts with a dash can be mistaken for an option, and ls will respond with a terse usage message that doesn’t help much.

Assuming you are in the folder where your files reside, you can list only the files whose names match a pattern with a command like the one below, replacing yoursearchstring with your own text.

#!/bin/bash
ls ./*yoursearchstring*
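To see what the wildcard actually does, here is a minimal sketch you can paste into Terminal. It assumes a throwaway folder created with mktemp and made-up file names (report_2021.csv, report_2022.csv, notes.txt); the point is that the shell expands ./*report* into the list of matching names before ls ever runs.

```bash
#!/bin/bash
# Sketch: wildcard expansion happens in the shell, before ls runs.
# The folder and the file names are hypothetical, for illustration only.
set -euo pipefail

demo_dir=$(mktemp -d)      # throwaway folder
cd "$demo_dir"
touch report_2021.csv report_2022.csv notes.txt

# echo prints the exact arguments the shell would hand to ls
echo ./*report*            # prints: ./report_2021.csv ./report_2022.csv

# ls then simply lists those names (notes.txt is not matched)
ls ./*report*
```

If nothing matches, bash passes the pattern through unchanged and ls complains that the file does not exist; zsh, the default shell on recent macOS versions, stops with a "no matches found" error instead.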

I’ve written a Python script that generates a series of files for me, and in my case the files are created one after another over time. Let’s say I need to order them from the most recent to the oldest.

One way to do this is to pass the -t flag to ls, which sorts the listing by modification time, newest first:

#!/bin/bash
ls -t ./*yourstring*
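Here is a small sketch of -t at work. It uses touch -t to give three hypothetical files different modification times (the names and dates are made up), then lists them newest first; adding -r reverses the order, which is handy if you want the oldest file first.

```bash
#!/bin/bash
# Sketch: ordering files by modification time with ls -t.
# File names and timestamps are hypothetical.
set -euo pipefail

demo_dir=$(mktemp -d)
cd "$demo_dir"

# touch -t takes [[CC]YY]MMDDhhmm and sets the modification time
touch -t 202101010900 data_old.csv
touch -t 202102010900 data_mid.csv
touch -t 202103010900 data_new.csv

ls -t ./*data*     # newest first: data_new, data_mid, data_old
ls -tr ./*data*    # -r reverses the sort: oldest first
```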

I’m not going to give any further examples on this, as it is beyond the scope of this article.

All in one line

Unix-based systems are nice because they let you chain several commands on a single line. To merge files, we turn to the cat utility, which is normally used to print the content of a file on screen (its name is short for “concatenate”).
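Before bringing in ls and xargs, here is cat on its own. Given several files, it writes them to standard output one after another. The file names and contents below are made up for the sketch.

```bash
#!/bin/bash
# Sketch: cat prints several files back to back.
# The files and their contents are hypothetical.
set -euo pipefail

demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'id,value\n1,10\n' > part1.csv
printf 'id,value\n2,20\n' > part2.csv

# cat writes part1.csv, then part2.csv, to the screen
cat part1.csv part2.csv
```

Notice that the id,value line shows up twice in the output: plain concatenation copies every line of every file, header rows included.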

In our case we will use it to add the content of each source file to a destination file. The usual redirection operator is the single greater-than sign (>): if the destination already exists, the shell overwrites it. The double greater-than sign (>>) appends to the destination instead, creating it if it does not exist yet.
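A quick way to convince yourself of the difference between the two operators, using a hypothetical file called out.txt in a throwaway folder:

```bash
#!/bin/bash
# Sketch: > overwrites, >> appends. out.txt is a hypothetical file name.
set -euo pipefail

demo_dir=$(mktemp -d)
cd "$demo_dir"

echo "first"  > out.txt    # creates out.txt
echo "second" > out.txt    # overwrites it: only "second" remains
echo "third" >> out.txt    # appends: "second" then "third"

cat out.txt
```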

So the final command looks like this:

#!/bin/bash
ls ./*yoursearchstring* | xargs -I t1 cat t1 >> mynewfile.csv

Oh man. There’s a lot going on in there. We have already seen ls. Next, I piped its list of file names into xargs, this time with the -I flag. The flag gives a name to each line xargs receives from ls, that is, to each file name (t1 in this case; the name must come immediately after the flag). xargs then runs cat once per file name, replacing t1 with that name, so each file becomes the argument of its own cat command.
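To see what -I does without touching any files, you can swap cat for echo. The sketch below feeds three made-up file names to xargs and shows one command being run per name, with t1 replaced each time.

```bash
#!/bin/bash
# Sketch: xargs -I runs one command per input line,
# substituting the line wherever the placeholder (t1) appears.
# The file names are hypothetical; nothing is read or written.
set -euo pipefail

printf 'a.csv\nb.csv\nc.csv\n' | xargs -I t1 echo "cat t1"
# prints:
# cat a.csv
# cat b.csv
# cat c.csv
```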

Before xargs starts, the shell opens mynewfile.csv in append mode (creating it if it does not exist) and connects it to the output of xargs. xargs then runs cat once for each file name ls found: the first cat prints the content of the first file, the second cat prints the content of the next one, and so on. Because all of that output flows into mynewfile.csv, each file’s content lands right after the previous one. As a result, you will have your shiny new mynewfile.csv with the content of all your files (in my case a series of CSV documents). Keep in mind that running the command a second time appends everything again, so delete mynewfile.csv first if you want to start fresh.
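Putting it all together, here is a self-contained sketch that builds a throwaway folder with three small, made-up CSV files, runs the command above with yoursearchstring replaced by the hypothetical pattern sales, and prints the result. It also adds -t to ls so the files are combined newest first.

```bash
#!/bin/bash
# Sketch: merge every file matching *sales* into mynewfile.csv,
# newest file first. Folder, file names, and contents are hypothetical.
set -euo pipefail

demo_dir=$(mktemp -d)
cd "$demo_dir"

printf 'jan,100\n' > sales_jan.csv
printf 'feb,200\n' > sales_feb.csv
printf 'mar,300\n' > sales_mar.csv
touch -t 202101150000 sales_jan.csv
touch -t 202102150000 sales_feb.csv
touch -t 202103150000 sales_mar.csv

# Start from a clean slate: >> would otherwise append to an old result
rm -f mynewfile.csv

# The command from the article, with -t added to ls
ls -t ./*sales* | xargs -I t1 cat t1 >> mynewfile.csv

cat mynewfile.csv    # mar,300 then feb,200 then jan,100
```

Because the file list comes from parsing ls output, names containing quotes or newlines can confuse xargs. If that is a concern and order does not matter, letting the shell hand the names straight to cat, as in cat ./*yoursearchstring* >> mynewfile.csv, avoids the pipe altogether, at the cost of alphabetical rather than time-based order.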

If you are not sure whether to include every file, you can add the -p flag to xargs, which makes it ask for confirmation before running each command.

Pretty cool, isn’t it?