Jiby's selfhosted effort

Jb Doyon’s personal website

Using Awk to replace only some lines

Posted on — Nov 16, 2019

I was trying to convert bank-issued CSV files that have a weird data layout, for automatic processing by the excellent ledger. This made me learn a new thing about awk. Follow me down the UNIXy goodness!

The data I was getting looks like this:

Transaction Date,Transaction Description,Debit Amount,Credit Amount,Balance
29/02/2018,Restaurant,14.5,,1234.2
12/03/2018,ATM withdrawal,50,,1184.2
23/04/2018,Deposited check,,100,1284.2

Note the last line having 100 in a different column: The problem is that, when receiving money, the transaction is logged in a separate entry (“Credit Amount”) than the (more common) case of debit, which goes to “Debit Amount” column. My importing tools required a single column, so the output must look like this:

Transaction Date,Transaction Description,Debit Amount,Credit Amount,Balance
29/02/2018,Restaurant,14.5,,1234.2
12/03/2018,ATM withdrawal,50,,1184.2
23/04/2018,Deposited check,-100,,1284.2

The change would be straightforward with sed if it was a simple word substitution problem, but the problem involved parsing the fields and applying action on selected transactions only. Naturally I turned to awk to do this, giving me the following conversion:

awk -F',' '$4 {print $1 "," $2",-" $4 ",," $5}' data.csv

Resultin in

23/04/2018,Deposited check,-100,,1284.2

Decomposing the command, we asked awk to parse fields delimited by commas (-F','), applying changes on lines that have non-empty fourth field (“Credit Amount”), and the changes are simply to print the same numbers, shifted one column to the left with a - in front.

This worked great to convert those entries, but would have meant I had to separate “positive data” and “negative data and re-join them manually later, possibly losing the original ordering. In effect, I wanted to make awk do nothing on debits, and swap credit operations with the script above. I had to look up how to do that, and re-discovered the -e flag of awk in the process!

       -e program-text
       --source program-text
              Use program-text as AWK program source code.  This option allows the
              easy  intermixing  of  library  functions  (used  via  the -f and -i
              options) with source code  entered  on  the  command  line.   It  is
              intended  primarily  for  medium to large AWK programs used in shell
              scripts.

So we can use multiple programs on the same data, ending up with quite a clean solution:

awk -F',' \
    -e '$4 {print $1 "," $2",-" $4 ",," $5}' \
    -e '!$4 {print}' \
    data.csv

Resulting as expected in:

Transaction Date,Transaction Description,Debit Amount,Credit Amount,Balance
29/02/2018,Restaurant,14.5,,1234.2
12/03/2018,ATM withdrawal,50,,1184.2
23/04/2018,Deposited check,-100,,1284.2

The script quite clearly shows that when there is a fourth field (credit), apply a transformation, but when there isn’t, print the line as it was before. awk processes records line by line, so both “programs” would be run on each line (with only one rule matching each time), preserving the order just like I wanted to.

Note that the -e flag is a common feature of such scripting languages, as it applies in sed as well, that’s what made me check the manual again when thinking about this.

Recapping, sed is good at word substitution (and more, but that’s not the point) and awk is fantastic when processing records with delimiters, applying operations on each entry. Hurray for standard UNIX tools!