How do I conditionally append the occurrence of a field using awk? - csv

I have a file that looks like this:
Level,Member
HIGH,John
HIGH,John
HIGH,Paul
HIGH,George
REG,George
REG,George
REG,George
REG,John
REG,Paul
REG,Paul
REG,Ringo
If I want to append a count of the occurrence of data in the second column, this works great:
awk 'BEGIN{ FS=OFS="," }{ $0=$0 OFS (++a[$2]) }1' file
But I'm having trouble figuring out how to add an if/else statement so that I can conditionally count by level so that my output looks like this:
Level,Member,1
HIGH,George,1
HIGH,John,1
HIGH,John,2
HIGH,Paul,1
REG,George,1
REG,George,2
REG,George,3
REG,John,1
REG,Paul,1
REG,Paul,2
REG,Ringo,1
Please note that the count starts over when the level changes from HIGH to REG. The file is already sorted by level and then by member.

or just..
$ awk '{$0=$0","++a[$0]}1' file
Level,Member,1
HIGH,John,1
HIGH,John,2
HIGH,Paul,1
HIGH,George,1
REG,George,1
REG,George,2
REG,George,3
REG,John,1
REG,Paul,1
REG,Paul,2
REG,Ringo,1

Keep it simple:
awk '{print $0","(++c[$0])}'

Your command was already pretty fine. I have just changed the key of associative array a[]:
awk 'BEGIN{ FS=OFS="," }{ $0=$0 OFS (++a[$2$1]) }1' file
or:
awk 'BEGIN{ FS=OFS="," }{ $0=$0 OFS (++a[$0]) }1' file
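A minimal sketch of the same idea with a two-part array key, assuming a POSIX shell and awk. Writing the key as a[$1,$2] joins the fields with awk's SUBSEP character, so two different level/member pairs cannot collapse into the same string the way a plain concatenation like $2$1 could. The function name count_by_level and the array name seen are made up for illustration.

```shell
# count_by_level is a hypothetical helper name, not part of the answers above.
# It appends a running count per (level, member) pair to every line.
count_by_level() {
  awk 'BEGIN { FS = OFS = "," }
       { print $0, ++seen[$1, $2] }   # key is $1 SUBSEP $2
      ' "$@"
}

# Sample run on a few lines of the question's data (header included).
printf '%s\n' 'Level,Member' 'HIGH,John' 'HIGH,John' 'REG,John' | count_by_level
```

Because the count is keyed on both fields, it restarts whenever the level changes, whether or not the file is sorted.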

Related

CSV Column Insertion via awk

I am trying to insert a column in front of the first column in a comma separated value file (CSV). At first blush, awk seems to be the way to go but, I'm struggling with how to move down the new column.
CSV File
A,B,C,D,E,F
1,2,3,4,5,6
2,3,4,5,6,7
3,4,5,6,7,8
4,5,6,7,8,9
Attempted Code
awk 'BEGIN{FS=OFS=","}{$1=$1 OFS (FNR<1 ? $1 "0\nA\n2\nC" : "col")}1'
Result
A,col,B,C,D,E,F
1,col,2,3,4,5,6
2,col,3,4,5,6,7
3,col,4,5,6,7,8
4,col,5,6,7,8,9
Expected Result
col,A,B,C,D,E,F
0,1,2,3,4,5,6
A,2,3,4,5,6,7
2,3,4,5,6,7,8
C,4,5,6,7,8,9
This can be easily done using paste + printf:
paste -d, <(printf "col\n0\nA\n2\nC\n") file
col,A,B,C,D,E,F
0,1,2,3,4,5,6
A,2,3,4,5,6,7
2,3,4,5,6,7,8
C,4,5,6,7,8,9
<(...) is process substitution available in bash. For other shells use a pipeline like this:
printf "col\n0\nA\n2\nC\n" | paste -d, - file
With awk only, you could try the following solution, written and tested with the shown samples.
awk -v value="$(echo -e "col\n0\nA\n2\nC")" '
BEGIN{
FS=OFS=","
num=split(value,arr,ORS)
for(i=1;i<=num;i++){
newVal[i]=arr[i]
}
}
{
$1=arr[FNR] OFS $1
}
1
' Input_file
Explanation:
First, create an awk variable named value that holds the output of the shell's echo command. NOTE: the -e option makes echo interpret \n as a newline instead of printing it literally.
In the BEGIN section of the awk program, set FS and OFS to , for all lines of Input_file.
Use the split function to break value into an array named arr, using ORS (newline) as the delimiter.
Loop from 1 to num (the number of values produced by the echo command).
Inside the loop, copy each arr value into an array named newVal under the same index i (1, 2, 3 and so on).
In the main block, set the first field to the arr value for the current line (arr[FNR]) followed by OFS and the original $1, then print the line.
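The steps above can be sketched more compactly. This is a variant, not the answer's exact code: it assumes the new column values contain no spaces, so they can be passed as one space-separated string and split on whitespace, and it skips the extra copy array. The names prepend_column, vals and col are hypothetical.

```shell
# prepend_column is a hypothetical helper: first argument is a
# space-separated list of new first-column values, the rest are files
# (or stdin when no file is given).
prepend_column() {
  vals=$1; shift
  awk -v vals="$vals" '
    BEGIN { FS = OFS = ","; split(vals, col, " ") }
    { print col[FNR], $0 }   # FNR selects the value for this row
  ' "$@"
}

printf '%s\n' 'A,B,C' '1,2,3' '2,3,4' | prepend_column 'col 0 A'
```

Rows beyond the number of supplied values get an empty first field.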

Increment field value provided another field matches a string

I am trying to increment a value in a csv file, provided it matches a search string. Here is the script that was utilized:
awk -i inplace -F',' '$1 == "FL" { print $1, $2+1} ' data.txt
Contents of data.txt:
NY,1
FL,5
CA,1
Current Output:
FL 6
Intended Output:
NY,1
FL,6
CA,1
Thanks.
$ awk 'BEGIN{FS=OFS=","} $1=="FL"{++$2} 1' data.txt
NY,1
FL,6
CA,1
Intended Output:
NY,1 FL,6 CA,1
I would use GNU AWK for this task in the following way. Let the content of file.txt be
NY,1
FL,5
CA,1
then
awk 'BEGIN{FS=OFS=",";ORS=" "}{print $1,$2+($1=="FL")}' file.txt
gives output
NY,1 FL,6 CA,1
Explanation: I tell GNU AWK that the field separator (FS) and output field separator (OFS) are both , and that the output record separator (ORS) is a space, in accordance with your requirements. Then for each line I print the 1st field followed by the 2nd field plus the result of the comparison $1=="FL", which is 1 when it holds and 0 when it does not. If you want to know more about FS, OFS or ORS, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
(tested in gawk 4.2.1)
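A small sketch of the same comparison-as-number trick, this time leaving ORS at its default so each record stays on its own line (the layout the original question asked for). The helper name bump_if_match and the variable key are assumptions for illustration.

```shell
# bump_if_match is a hypothetical helper: adds 1 to field 2 on rows
# whose first field equals the given key.
bump_if_match() {
  key=$1; shift
  awk -v key="$key" 'BEGIN { FS = OFS = "," }
       { print $1, $2 + ($1 == key) }   # comparison yields 1 or 0
      ' "$@"
}

printf '%s\n' 'NY,1' 'FL,5' 'CA,1' | bump_if_match FL
```

This only rewrites two-field rows, as in the sample data; any further fields would be dropped by the print.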
Use this Perl one-liner:
perl -i -F',' -lane 'if ( $F[0] eq "FL" ) { $F[1]++; } print join ",", @F;' data.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Convert JSON Objects to JSON object array Without programming Language (exception shell scripting)

I have a file containing multiple JSON objects and need to convert them to a JSON array. I have bash and Excel installed, but cannot install any other tool.
{"name": "a","age":"17"}
{"name":"b","age":"18"}
To:
[{"name": "a","age":"17"},
{"name":"b","age":"18"}]
Assuming one object per line, as shown in the OP's question:
echo -n "["; while read line; do echo "${line},"; done < <(cat test.txt) | sed -re '$ s/(.*),/\1]/'
Result:
[{"name": "a","age":"17"},
{"name":"b","age":"18"}]
Inspired by https://askubuntu.com/a/475804
awk '(NR==FNR){count++} (NR!=FNR){ if (FNR==1) printf("["); printf("%s", $0); print (FNR==count)?"]":"," } END {if (count==0) print "[]"}' file file
A less compact but more readable version:
awk '
(NR==FNR) {
count++;
}
(NR!=FNR) {
if (FNR==1)
printf("[");
printf("%s", $0);
if (FNR==count)
print "]";
else
print ",";
}
END {
if (count==0) print "[]";
}' file file
The trick is to give the same file twice to awk. Because NR==FNR is true only while the first copy is being read, that first pass is dedicated to counting the number of lines into the variable count.
The second pass, where NR!=FNR, applies the following algorithm to each line:
Write [ for the first line only
Then write the record, using printf instead of print in order to avoid the newline ending
Then write either ] or , depending on whether we are on the last line or not, using print in order to end with a newline
The END command is just a failsafe to output an empty array in case the file is empty.
Assumptions:
no requirement to (re)format the input data
Sample input:
$ cat raw.dat
{"name": "a","age":"17"}
{"name":"b","age":"18"}
{"name":"C","age":"23"}
One awk idea:
awk 'BEGIN {pfx="["} {printf "%s%s",pfx,$0; pfx=",\n"} END {printf "]\n"}' raw.dat
Where:
for each input line we printf the line without a terminating linefeed
for the first line we use a prefix (pfx) of [
for subsequent lines the prefix (pfx) is set to ,\n (ie, terminate the previous line with ,\n)
once the file has been processed we terminate the last input line with a printf "]\n"
requires a single pass through the input file
This generates:
[{"name": "a","age":"17"},
{"name":"b","age":"18"},
{"name":"C","age":"23"}]
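A hedged variant of the single-pass idea that also covers an empty input file, which the version above would turn into a lone ]. It assumes one object per line and no blank lines; the function name to_json_array is made up for this sketch.

```shell
# to_json_array is a hypothetical helper: wraps one-object-per-line input
# in [ ... ] with commas between records, in a single pass.
to_json_array() {
  awk 'BEGIN { pfx = "[" }
       { printf "%s%s", pfx, $0; pfx = ",\n" }
       END { print (NR ? "]" : "[]") }   # NR is 0 when no line was read
      ' "$@"
}

printf '%s\n' '{"name":"a"}' '{"name":"b"}' | to_json_array
```

The END block relies on NR keeping its final value after the last record, so no second pass or line count is needed.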
Making sure @chepner's comment (re: a sed solution) isn't lost in the mix:
sed '1s/^/[/;2,$s/^/,/;$s/$/]/' raw.dat
This generates:
[{"name": "a","age":"17"}
,{"name":"b","age":"18"}
,{"name":"C","age":"23"}]
NOTE: I can remove this if @chepner wants to post this as an answer.

Splitting a string column "date time" into "date" "time" in CSV file

I have a csv file of the form :
$ head purchases.csv
id,userID,itemID,price,platform,day
1,9132,id_005,3600,2,2014-10-30 17:29:46
2,67894,id_005,3000,1,2015-04-23 21:22:55
3,272780,id_004,1000,1,2014-11-27 16:58:30
4,302396,id_001,100,1,2014-12-11 08:35:07
Now, I want to change the csv's last column. Currently, it's as day column in the form 2014-10-30 17:29:46 ie with a whitespace between the date and the time. But I want to split this column into two columns day and time so that after the change the csv file becomes:
$ head purchases.csv
id,userID,itemID,price,platform,day,time
1,9132,id_005,3600,2,2014-10-30,17:29:46
2,67894,id_005,3000,1,2015-04-23,21:22:55
3,272780,id_004,1000,1,2014-11-27,16:58:30
How can I do it from terminal?
Using split on $6:
$ awk -v OFS=\, -F\, 'NR==1{print $0,"time";next} {split($6,a," "); print $1,$2,$3,$4,$5,a[1],a[2]}' test.in
id,userID,itemID,price,platform,day,time
1,9132,id_005,3600,2,2014-10-30,17:29:46
2,67894,id_005,3000,1,2015-04-23,21:22:55
3,272780,id_004,1000,1,2014-11-27,16:58:30
4,302396,id_001,100,1,2014-12-11,08:35:07
Or you could use gsub and just replace the space with a comma:
$ awk -v OFS=\, -F\, 'NR==1{print $0,"time";next} {gsub(/ /,",",$6); print $0}' test.in
James Brown's answer is helpful, but it hard-codes the column to modify while also assuming it is the last.
A few simple tweaks generalize the solution:
awk -v ndx=6 -v OFS=, -F, 'NR==1 {sub(/$/, ",time", $ndx); print; next} sub(" ", ",", $ndx)' \
purchases.csv
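A sketch of the generalized version applied to a column that is not the last one. It assumes simple CSV without quoted commas and sets OFS explicitly, because changing a field makes awk rebuild the line with OFS. The helper name split_datetime is hypothetical, and unlike the one-liner above it prints every row, including rows with no space in the chosen column.

```shell
# split_datetime is a hypothetical helper: first argument is the 1-based
# index of the "date time" column, the rest are files (or stdin).
split_datetime() {
  ndx=$1; shift
  awk -v ndx="$ndx" 'BEGIN { FS = OFS = "," }
    NR == 1 { $ndx = $ndx OFS "time"; print; next }   # add header cell
    { sub(/ /, OFS, $ndx); print }                    # first space -> comma
  ' "$@"
}

printf '%s\n' 'id,day,price' '1,2014-10-30 17:29:46,3600' | split_datetime 2
```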

awk CSV Split with headers Windows

Ok I have a csv file I need to split based on a column value which is fine, but I cannot get the headers to print in each file.
Currently I use:
awk "FS =\",\" {output=$3\".csv\"; print $0 > output}" test.csv
Which splits the file based on column 3, but I don't know how to add the header to each file.
I've searched high & low but can't find a solution that will work in a one liner...
UPDATE
OK to date we have a working one liner:
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr>$3\".csv\"}{print>$3\".csv\"}" test.csv
Or in test.awk:
BEGIN{FS=","} NR==1 {hdr=$0;next}!($3 in files) {files[$3]=1;print hdr>$3".csv"}{print>$3".csv"}
Command to run used:
awk -f test.awk test.csv
I really appreciate the help here, I've been trying for hours and have a few things left to work out.
1) Blank line inserted after header
2) Sort the data on specified fields
Further down the line I want to additionally do a row count & cut a reference number from another file is this possible with AWK or am I using the wrong tool for the job?
Thanks again.
UPDATED#2
Blank line after header line
UPDATED
Try this:
On Unix/cygwin (I tested on cygwin):
awk -F, 'NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr"\n">$3".csv"}{print>$3".csv"}' test.csv
Or adding Kent's ideas:
awk -F, 'NR==1{hdr=$0;next}{out=$3".csv"}!($3 in files){files[$3];print hdr"\n">out}{print>out}' test.csv
On windows cmd (not tested):
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr\"\n\">$3\".csv\"}{print>$3\".csv\"}" test.csv
This stores the header line of test.csv in hdr. For each following line it checks whether the file name value has already been seen. If not, it stores the name in the files hash and prints the header line to that file. In either case it then prints the whole line to the file.
Example file:
$ cat test.csv
A,B,C,D
1,2,a,3
4,5,b,4
Output
$ cat a.csv
A,B,C,D
1,2,a,3
$ cat b.csv
A,B,C,D
4,5,b,4
ADDED
If you would like to put the awk script into a file, you could try this (I cannot test it, sorry).
test.awk
BEGIN{FS=","}
NR==1 {hdr=$0;next}
!($3 in files) {files[$3]=1;print hdr"\n">$3".csv"}
{print>$3".csv"}
Then You may call it as
awk -f test.awk test.csv
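A hedged sketch that touches two of the follow-up points from the question: it writes the header without a trailing blank line, and it reports a row count per output file at the end. It assumes a Unix-like shell and simple CSV; the names split_by_col3 and rows are made up for illustration.

```shell
# split_by_col3 is a hypothetical helper: splits a CSV by column 3 into
# <value>.csv files in the current directory, each starting with the header.
split_by_col3() {
  awk -F, '
    NR == 1 { hdr = $0; next }              # remember the header line
    {
      out = $3 ".csv"
      if (!(out in rows)) print hdr > out   # first row for this file
      print > out
      rows[out]++                           # data rows written per file
    }
    END { for (f in rows) print f, rows[f] }
  ' "$@"
}
```

Calling split_by_col3 test.csv creates the per-value files and prints one "file count" line per file; the order of those lines is unspecified, so pipe them through sort if the order matters.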
awk -F, 'NR==1{h=$0;next}{out=$3".csv";
if(!(out in a))print h> out; print $0 > out;a[out]}' test.csv
Try something like this:
awk -F, '
BEGIN {
getline header
}
{
out=$3".csv"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}' test.csv
Windows version: (Not tested)
awk -F, "
BEGIN {
getline header
}
{
out=$3\".csv\"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}" test.csv
awk 'NR==1{ hdr=$0; next } { output=$3".csv"; if( !(output in a)) print hdr > output; a[output]
print > output}' FS=, test.csv