Adding a column in multiple csv file using awk - csv

I want to add a column to multiple (500) CSV files of the same dimensionality. The column should act as an identifier for each individual file. I want to create a bash script using awk (I am a newbie in awk). The CSV files do come with headers.
For example:
Input File1.csv
#name,#age,#height
A,12,4.5
B,13,5.0
Input File2.csv
#name,#age,#height
C,11,4.6
D,12,4.3
I want to add a new column "#ID" in both files, where the value of ID will be the same within an individual file but different between files.
Expected Output
File1.csv
#name,#age,#height,#ID
A,12,4.5,1
B,13,5.0,1
Expected File2.csv
#name,#age,#height,#ID
C,11,4.6,2
D,12,4.3,2
Please suggest.

If you don't need to extract the id number from the filename, this should do.
$ c=1; for f in File*.csv;
do
sed -i '1s/$/,#ID/; 2,$s/$/,'$c'/' "$f";
c=$((c+1));
done
Note that this is an in-place edit. Perhaps make a backup or test first.
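A minimal sketch of the same loop run against the sample data from the question, inside a scratch directory so the in-place edit cannot touch real files (GNU sed assumed; BSD/macOS sed would need `-i ''`):

```shell
# run in a scratch directory, using the sample data from the question
tmp=$(mktemp -d) && cd "$tmp"
printf '#name,#age,#height\nA,12,4.5\nB,13,5.0\n' > File1.csv
printf '#name,#age,#height\nC,11,4.6\nD,12,4.3\n' > File2.csv

c=1
for f in File*.csv; do
    # append ",#ID" to the header and ",<id>" to every data line
    sed -i '1s/$/,#ID/; 2,$s/$/,'"$c"'/' "$f"
    c=$((c+1))
done

cat File1.csv File2.csv
```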
UPDATE
If you don't need the individual files to be updated, this may work better for you
$ awk -v OFS=, 'BEGIN {f="allFiles.csv"}
FNR==1 {c++; print $0,"#ID" > f; next}
{print $0,c > f}' File*.csv

awk -F, -v OFS=, '
FNR == 1 {
$(NF + 1) = "#ID"
i++
f = FILENAME
sub(/Input/, "Output", f)
} FNR != 1 {
$(NF + 1) = i
} {
print > f
}' Input*.csv

With GNU awk for inplace editing and ARGIND:
awk -i inplace -v OFS=, '{print $0, (FNR==1 ? "#ID" : ARGIND)}' File*.csv
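For awks without `-i inplace`, a portable sketch is to write each file to a temporary file and move it back; a shell counter stands in for ARGIND (file names here are just the question's examples):

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf '#name,#age,#height\nA,12,4.5\n' > File1.csv
printf '#name,#age,#height\nC,11,4.6\n' > File2.csv

c=0
for f in File*.csv; do
    c=$((c+1))
    # same logic as the gawk one-liner, with ARGIND replaced by the counter
    awk -v OFS=, -v id="$c" '{ print $0, (FNR==1 ? "#ID" : id) }' "$f" > "$f.tmp" &&
        mv "$f.tmp" "$f"
done

cat File1.csv File2.csv
```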

Related

Increment field value provided another field matches a string

I am trying to increment a value in a csv file, provided it matches a search string. Here is the script that was utilized:
awk -i inplace -F',' '$1 == "FL" { print $1, $2+1} ' data.txt
Contents of data.txt:
NY,1
FL,5
CA,1
Current Output:
FL 6
Intended Output:
NY,1
FL,6
CA,1
Thanks.
$ awk 'BEGIN{FS=OFS=","} $1=="FL"{++$2} 1' data.txt
NY,1
FL,6
CA,1
Intended Output:
NY,1 FL,6 CA,1
I would harness GNU AWK for this task in the following way; let file.txt content be
NY,1
FL,5
CA,1
then
awk 'BEGIN{FS=OFS=",";ORS=" "}{print $1,$2+($1=="FL")}' file.txt
gives output
NY,1 FL,6 CA,1
Explanation: I inform GNU AWK that the field separator (FS) and the output field separator (OFS) are , and the output record separator (ORS) is a space, in accordance with your requirements. Then for each line I print the 1st field followed by the 2nd field increased by the value of the comparison $1=="FL", which is 1 when it holds and 0 when it does not. If you want to know more about FS, OFS or ORS then read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
(tested in gawk 4.2.1)
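The comparison-as-number trick used above is general awk behaviour: a relational expression evaluates to 1 when true and 0 when false, so it can be added directly. A minimal illustration:

```shell
# an awk comparison yields 1 (true) or 0 (false), usable in arithmetic
awk 'BEGIN { x = 5; print x + ("a" == "a"), x + ("a" == "b") }'
# prints: 6 5
```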
Use this Perl one-liner:
perl -i -F',' -lane 'if ( $F[0] eq "FL" ) { $F[1]++; } print join ",", @F;' data.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

AWK: comparing 2 columns from 2 CSV files, outputting to a third. How do I also get the output that doesn't match to another file?

I currently have the following script:
awk -F, 'NR==FNR { a[$1 FS $4]=$0; next } $1 FS $4 in a { printf a[$1 FS $4]; sub($1 FS $4,""); print }' file1.csv file2.csv > combined.csv
this compares columns 1 & 4 from both csv files and outputs the result from both files to combined.csv. Is it possible to output the lines from file 1 & file 2 that don't match to other files with the same awk line? Or would I need to do separate parses?
File1
ResourceName,ResourceType,PatternType,User,Host,Operation,PermissionType
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow
File2
topic,groupName,Name,User,email,team,contact,teamemail,date,clienttype
BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
combined
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
Wanted additional files:
non matched file1:
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
non matched file2:
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
Again, I might be trying to do too much in one line? Would it be wiser to run another parse?
Assuming the key pairs of $1 and $4 are unique within each input file then using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 FS $4 }
NR==FNR {
    file1[key] = $0
    next
}
key in file1 {
    print file1[key], $0 > "out_combined"
    delete file1[key]
    next
}
{
    print > "out_file2_only"
}
END {
    for (key in file1) {
        print file1[key] > "out_file1_only"
    }
}
$ awk -f tst.awk file{1,2}
$ head out_*
==> out_combined <==
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
==> out_file1_only <==
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
==> out_file2_only <==
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
The order of lines in out_file1_only will be shuffled by the in operator - if that's a problem let us know as it's an easy tweak to retain the input order.
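The tweak hinted at above is to record the keys in a numbered array as they are read, then replay them in that order in the END block. A self-contained sketch with a tiny made-up sample (the key is $1 FS $4 as before):

```shell
tmp=$(mktemp -d) && cd "$tmp"
# tiny made-up sample data, not the question's files
printf 'h1,h2,h3,h4\na,x,1,p\nb,y,2,q\nc,z,3,r\n' > file1
printf 'h1,h2,h3,h4\na,_,_,p\nd,_,_,s\n' > file2

cat > tst_ordered.awk <<'EOF'
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 FS $4 }
NR==FNR {
    file1[key] = $0
    order[++n] = key            # remember insertion order
    next
}
key in file1 {
    print file1[key], $0 > "out_combined"
    delete file1[key]
    next
}
{ print > "out_file2_only" }
END {
    # replay the keys in input order; matched keys were deleted above
    for (i = 1; i <= n; i++)
        if (order[i] in file1)
            print file1[order[i]] > "out_file1_only"
}
EOF
awk -f tst_ordered.awk file1 file2
cat out_file1_only
```

With this sample, out_file1_only comes out as b,y,2,q then c,z,3,r, matching file1's line order.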

How to read a csv file into arrays and compare and replace it with entries from another csv file?

I have two csv files file1.csv and file2.csv.
file1.csv contains 4 columns.
file1:
Header1,Header2,Header3,Header4
aaaaaaa,bbbbbbb,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
iiiiiii,jjjjjjj,kkkkkkk,lllllll
mmmmmmm,nnnnnnn,ooooooo,ppppppp
file2:
"Header1","Header2","Header3"
"aaaaaaa","cat","dog"
"iiiiiii","doctor","engineer"
"mmmmmmm","sky","blue"
So what I am trying to do is read file1.csv line by line, put each entry into an array, then compare the first element of that array with the first column of file2.csv; if a match exists, replace column 1 and column 2 of file1.csv with the corresponding columns of file2.csv.
So my desired output is:
cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp
I am able to do it when there is only one column to replace.
Here is my code:
awk -F'"(,")?' '
NR==FNR { r[$2] = $3; next }
{ for (n in r) gsub(n,r[n]) } IGNORECASE=1' file2.csv file1.csv>output.csv
My final step is to dump the entire array into a file with 10 columns.
Any suggestions where I can improve or correct my code?
EDIT: Considering that your Input_file2 has data in "ytest","test2" etc. format, try the following. (Thanks to Tiw for providing these samples in his/her post.)
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
gsub(/\"/,"")
a[tolower($1)]=$0
next
}
a[tolower($1)]{
print a[tolower($1)],$NF
next
}
1' file2.csv file1.csv
Could you please try following.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1]=$0
next
}
a[$1]{
print a[$1],$NF
next
}
1' Input_file2 Input_file1
OR in case you could have combination of lower and capital letters in Input_file(s) then try following.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[tolower($1)]=$0
next
}
a[tolower($1)]{
print a[tolower($1)],$NF
next
}
1' Input_file2 Input_file1
With any awk and any number of fields in either file:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
    gsub(/"/,"")
    key = tolower($1)
}
NR==FNR {
    for (i=2; i<=NF; i++) {
        map[key,i] = $i
    }
    next
}
{
    for (i=2; i<=NF; i++) {
        $(i-1) = ((key,i) in map ? map[key,i] : $(i-1))
    }
    print
}
$ awk -f tst.awk file2 file1
Header2,Header3,Header3,Header4
cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp
Given your sample data, and the description from your comments, please try this:
(Judging from your own code, you may have quotes around the fields, so the script strips them first.)
awk 'BEGIN{FS=OFS=","}
NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");a[$1]=$2;b[$1]=$3;next}
$1 in a{$2=b[$1];$1=a[$1];}
1' file2.csv file1.csv
Eg:
$ cat file1.csv
Header1,Header2,Header3,Header4
aaaaaaa,bbbbbbb,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
iiiiiii,jjjjjjj,kkkkkkk,lllllll
mmmmmmm,nnnnnnn,ooooooo,ppppppp
$ cat file2.csv
"Header1","Header2","Header3"
"aaaaaaa","cat","dog"
"iiiiiii","doctor","engineer"
"mmmmmmm","sky","blue"
$ awk 'BEGIN{FS=OFS=","}
NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");a[$1]=$2;b[$1]=$3;next}
$1 in a{$2=b[$1];$1=a[$1];}
1' file2.csv file1.csv
Header2,Header3,Header3,Header4
cat,dog,ccccccc,ddddddd
eeeeeee,fffffff,ggggggg,hhhhhhh
doctor,engineer,kkkkkkk,lllllll
sky,blue,ooooooo,ppppppp
Another way, more verbose, but I think it's better for you to understand (GNU awk):
awk 'BEGIN{FS=OFS=","}
NR==FNR{for(i=1;i<=NF;i++)$i=gensub(/^"(.*)"$/,"\\1",1,$i);a[$1]=$2;b[$1]=$3;next}
$1 in b{$2=b[$1];}
$1 in a{$1=a[$1];}
1' file2.csv file1.csv
Note a pitfall here: since $1 is the key, we should change $1 last.
A case insensitive solution:
awk 'BEGIN{FS=OFS=","}
NR==FNR{gsub(/^"|"$/,"");gsub(/","/,",");k=tolower($1);a[k]=$2;b[k]=$3;next}
{k=tolower($1);if(a[k]){$2=b[k];$1=a[k]}}
1' file2.csv file1.csv
To keep the code concise, I added the variable k and moved the "if" inside.

awk compare one column from two CSV files and display fields from both files

I want to compare the second column of the 1st file with the 1st column of the 2nd file; if a match is found, display all fields from the 1st file and all fields from the 2nd file.
file1:
"971525408953","a8:5b:78:5a:dd:dc","TRUE"
"971558216784","ec:1f:72:24:7b:30","TRUE"
"971506509910","e8:50:8b:d8:f3:b5","TRUE"
"971509525934","c8:14:79:b4:bc:da","FALSE"
"971506904830","58:48:22:83:87:7f","TRUE"
file2:
"fc:e9:98:1e:a2:a2",2016-03-07 23:39:29,"TRUE"
"c8:14:79:b4:bc:da",2016-03-08 04:26:06,"TRUE"
"78:a3:e4:87:df:19",2015-12-30 01:22:42,"TRUE"
"18:f6:43:b1:82:47",2016-03-08 08:38:41,"TRUE"
"58:48:22:83:87:7f",2015-12-22 01:22:42,"TRUE"
output expected:
"c8:14:79:b4:bc:da",2016-03-08 04:26:06,"TRUE","971509525934","c8:14:79:b4:bc:da","FALSE"
"58:48:22:83:87:7f",2015-12-22 01:22:42,"TRUE","971506904830","58:48:22:83:87:7f","TRUE"
But if I run the following command I get this output, without n[$2] and n[$3]:
awk -F"," 'NR==FNR { n[$2] = $1; next } ($1 in n) {print $1,$2,$3,n[$1],n[$2],n[$3] }' file1 file2
"c8:14:79:b4:bc:da",2016-03-08 04:26:06,"TRUE","971509525934",,
"58:48:22:83:87:7f",2015-12-22 01:22:42,"TRUE","971506904830",,
Can anyone help me with this?
awk -F"," -v OFS="," 'NR==FNR { n[$2] = $1$2$3; next } ($1 in n) {print $1,$2,$3, n[$1] }' file1 file2
output:
"c8:14:79:b4:bc:da",2016-03-08 04:26:06,"TRUE","971509525934""c8:14:79:b4:bc:da""TRUE"
"58:48:22:83:87:7f",2015-12-22 01:22:42,"TRUE","971506904830""58:48:22:83:87:7f""TRUE"
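The appended fields run together because `$1$2$3` concatenates with no separator; if the comma-separated form shown in the expected output is wanted, one variant is to join the saved fields with FS. A sketch using two rows of the sample data:

```shell
tmp=$(mktemp -d) && cd "$tmp"
cat > file1 <<'EOF'
"971509525934","c8:14:79:b4:bc:da","FALSE"
"971506904830","58:48:22:83:87:7f","TRUE"
EOF
cat > file2 <<'EOF'
"c8:14:79:b4:bc:da",2016-03-08 04:26:06,"TRUE"
"58:48:22:83:87:7f",2015-12-22 01:22:42,"TRUE"
EOF

# store the whole file1 row with FS between fields so the commas survive
awk -F, -v OFS=, 'NR==FNR { n[$2] = $1 FS $2 FS $3; next }
                  ($1 in n) { print $1, $2, $3, n[$1] }' file1 file2
```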

awk CSV Split with headers Windows

OK, I have a csv file I need to split based on a column value, which is fine, but I cannot get the headers to print in each file.
Currently I use:
awk "FS =\",\" {output=$3\".csv\"; print $0 > output}" test.csv
Which splits the file based on column 3, but I don't know how to add the header to each file.
I've searched high & low but can't find a solution that will work in a one liner...
UPDATE
OK to date we have a working one liner:
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr>$3\".csv\"}{print>$3\".csv\"}" test.csv
Or in test.awk:
BEGIN{FS=","} NR==1 {hdr=$0;next}!($3 in files) {files[$3]=1;print hdr>$3".csv"}{print>$3".csv"}
Command to run used:
awk -f test.awk test.csv
I really appreciate the help here, I've been trying for hours and have a few things left to work out.
1) Blank line inserted after header
2) Sort the data on specified fields
Further down the line I want to additionally do a row count and cut a reference number from another file. Is this possible with AWK or am I using the wrong tool for the job?
Thanks again.
UPDATED#2
Blank line after header line
UPDATED
Try this:
On Unix/cygwin (I tested on cygwin):
awk -F, 'NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr"\n">$3".csv"}{print>$3".csv"}' test.csv
Or adding Kent's ideas:
awk -F, 'NR==1{hdr=$0;next}{out=$3".csv"}!($3 in files){files[$3];print hdr"\n">out}{print>out}' test.csv
On windows cmd (not tested):
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr\"\n\">$3\".csv\"}{print>$3\".csv\"}" test.csv
This stores the header line of test.csv in hdr. For the following lines it checks whether the file named after the $3 value already exists in the files array. If not, it stores the name in files and prints the header line. In any case it prints the whole line to the file.
Example file:
$ cat test.csv
A,B,C,D
1,2,a,3
4,5,b,4
Output
$ cat a.csv
A,B,C,D
1,2,a,3
$ cat b.csv
A,B,C,D
4,5,b,4
ADDED
If you would like to put the awk script into a file, you could try (I cannot test it, sorry):
test.awk
BEGIN{FS=","}
NR==1 {hdr=$0;next}
!($3 in files) {files[$3]=1;print hdr"\n">$3".csv"}
{print>$3".csv"}
Then You may call it as
awk -f test.awk test.csv
awk -F, 'NR==1{h=$0;next}{out=$3".csv";
if (!(out in a)) print h > out; print $0 > out; a[out]}' test.csv
Try something like this:
awk -F, '
BEGIN {
getline header
}
{
out=$3".csv"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}' test.csv
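A quick check of the getline-header approach above, using the same two-row sample in a scratch directory (plain `getline` in BEGIN consumes the first record of the first input file):

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf 'A,B,C,D\n1,2,a,3\n4,5,b,4\n' > test.csv

awk -F, '
BEGIN { getline header }                   # consume the header line first
{
    out = $3 ".csv"
    if (!($3 in seen)) print header > out  # header once per output file
    print $0 > out
    seen[$3]                               # bare reference marks $3 as seen
}' test.csv

cat a.csv b.csv
```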
Windows version: (Not tested)
awk -F, "
BEGIN {
getline header
}
{
out=$3\".csv\"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}" test.csv
awk 'NR==1 { hdr=$0; next }
{ output=$3".csv"; if (!(output in a)) print hdr > output; a[output]
print > output }' FS=, test.csv