output lines from file A where first columns match file B - csv

Two csv files formatted identically like this:
blah#domain.com,Elon,Tusk
I want to output the lines from the first file whose email address (the first column) also appears in the second file.

Instead of awk, I use join for this type of task because it's simpler/easier for me to remember, e.g.
join -t',' -o 1.1,1.2,1.3 <(sort -t',' -k1,1 first.csv) <(sort -t',' -k1,1 second.csv)
although I believe that awk is the best tool for this type of task, e.g.
awk -F, 'FNR==NR {a[$1]; next}; $1 in a' second.csv first.csv
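For illustration with made-up data: if first.csv contains
blah#domain.com,Elon,Tusk
other#domain.com,Ada,Lovelace
and second.csv contains
blah#domain.com,foo,bar
then both commands print only the line from first.csv whose first column also occurs in second.csv:
blah#domain.com,Elon,Tusk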

Related

extract rows from one csv file based on column information in the other csv

I have 2 csv files. I want the information from column 1 of file 1 to be used to extract rows from the other csv file.
My file1.csv has names arranged in a column:
ENSG00000102977
ENSG00000100823
ENSG00000149311
ENSG00000175054
ENSG00000085224
ENSG00000178999
ENSG00000197299
ENSG00000139618
ENSG00000105173
ENSG00000175305
ENSG00000166226
ENSG00000163468
ENSG00000115484
ENSG00000150753
ENSG00000146731
and the 2nd csv file has the names along with the corresponding values arranged in rows.
ENSG00000102977 1083.82334384253 1824.50639384557 1962.86064714976 1367.60568624972
I wrote an awk script
`awk 'FNR == NR{f1[$1];next} $1 in f2 {print $1,$2,$3,$4,$5 $NF}' FS="," file1.csv FS="," file2.csv`
but it returns without any output or error.
Kindly guide me as to where I am wrong. It's a bit puzzling since there is no error.
Try grep's -f option - read patterns from a file:
grep -f fileWithPatterns.csv fileToExtractFrom.csv
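As an aside, the awk in the question most likely fails for two reasons: the array is built as f1 but the test is $1 in f2, and the sample file2.csv is whitespace-separated, so FS="," turns each of its lines into a single field. A minimal corrected sketch, assuming file2.csv really is space-separated as shown:
awk 'NR==FNR {f1[$1]; next} $1 in f1' file1.csv file2.csv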

matching patterns and creating new files

I have a csv file named file1.csv:
something;AD;sss;Andorra;nothing;type_1;sss
something222;AD;sss222;Andorra;nothing222;type_2;aaa
thing;NL;thing3;Netherlands;thing;type_2;bb
etc;US;etc;United States;etc;type_2;nothing
I want to create separate files for each country. I run greps like this:
grep -e "\;AD\;.*\;Andorra\;" file1.csv > fileAD.csv
grep -e "\;NL\;.*\;Netherlands\;" file1.csv > fileNL.csv
grep -e "\;US\;.*\;United\sStates\;" file1.csv > fileUS.csv
This works, but I have all the countries in the world, and I don't want to write these lines for every country. Is there any other solution? Any help is really appreciated.
Edit: I updated my question. I also have a column with type_1 and type_2. After all the files corresponding to each country are created, I need to create new files for every country with just type_1 and new files with just type_2.
For example, for Andorra, I need the files:
fileAD.csv :
something;AD;sss;Andorra;nothing;type_1;sss
something222;AD;sss222;Andorra;nothing222;type_2;aaa
fileADtype_1.csv:
something;AD;sss;Andorra;nothing;type_1;sss
fileADtype_2.csv:
something222;AD;sss222;Andorra;nothing222;type_2;aaa
I think it is OK to look just at the column with the abbreviation, but I wanted both columns, the one with "AD" and the one with the full name "Andorra", to be on the safe side.
I'd go for a one-liner with only one instance of awk and no temporary files:
awk -F ';' '{print >> "file" $2 ".csv"}' file1.csv
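The type_1/type_2 files from the edit can be produced in the same pass. A sketch, assuming the type is the sixth ;-separated field as in the sample data (close() keeps the number of open files down when there are many countries):
awk -F';' '{
    country = "file" $2 ".csv"     # e.g. fileAD.csv
    typed = "file" $2 $6 ".csv"    # e.g. fileADtype_1.csv
    print >> country; close(country)
    print >> typed; close(typed)
}' file1.csv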
As a one-liner with awk and a shell loop:
for code in $(awk -F';' '{print $2}' data.csv | uniq); do awk -F';' -v pat="$code" '$2 ~ pat {print $0}' data.csv > "file${code}.csv"; done
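Note that uniq only removes adjacent duplicates, so this relies on the country codes already being grouped in data.csv; if they are not, a variant with sort -u is safer:
for code in $(awk -F';' '{print $2}' data.csv | sort -u); do awk -F';' -v pat="$code" '$2 ~ pat' data.csv > "file${code}.csv"; done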

Change column if some regex expression is true with awk or sed

I have a file (let's call it data.csv) similar to this:
"123","456","ud,h-match","moredata"
with many rows in the same format and embedded commas. What I need to do is look at the third column and see if it matches an expression. In this case I want to know if the third column contains "match" anywhere (which it does). If it does, I want to replace the whole column with something else, like "replaced". So, relating it to the example data.csv file, I would want it to look like this:
"123","456","replaced","moredata"
Ideally, I want the file data.csv itself to be changed (time is of the essence since I have a big file) but it's also fine if you write it to another file.
Edit:
I have tried using awk:
awk -F'","' -OFS="," '{if(tolower($3) ~ "stringI'mSearchingFor"){$3="replacement"; print}else print}' file
but it doesn't change anything. If I remove the OFS portion then it works, but the output gets separated by spaces and the columns are no longer enclosed in double quotes.
Depending on the answer to my question about what you mean by column, this may be what you want (uses GNU awk for FPAT):
$ awk -v FPAT='[^,]+|"[^"]+"' -v OFS=',' '$3~/match/{$3="\"replaced\""} 1' file
"123","456","replaced","moredata"
Use awk -i inplace ... if you want to do "in place" editing.
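For example, a sketch of the in-place form (assumes GNU awk 4.1 or later, where -i inplace is available):
gawk -i inplace -v FPAT='[^,]+|"[^"]+"' -v OFS=',' '$3~/match/{$3="\"replaced\""} 1' file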
With any awk (but slightly more fragile than the above since it leaves the leading/trailing " on the first and last fields, and has no -i inplace):
$ awk 'BEGIN{FS=OFS="\",\""} $3~/match/{$3="replaced"} 1' file
"123","456","replaced","moredata"

How to split text file into multiple files and extract filename from line prefix?

I have a simple log file with content like:
1504007980.039:{"key":"valueA"}
1504007990.359:{"key":"valueB", "key2": "valueC"}
...
That I'd like to output to multiple files that each have as content the JSON part that comes after the timestamp. So I would get as a result the files:
1504007980039.json
1504007990359.json
...
This is similar to How to split one text file into multiple *.txt files? but the name of each file should be extracted from its line (with the extra dot removed), not generated from an index.
Preferably I'd want a one-liner that can be executed in bash.
Since you aren't using GNU awk you need to close output files as you go to avoid the "too many open files" error. To avoid that, issues with specific values in your JSON, and undefined behavior during output redirection, this is what you need:
awk '{
    fname = $0
    sub(/\./, "", fname)        # drop the dot in the timestamp
    sub(/:.*/, ".json", fname)  # keep only the timestamp part, append .json
    sub(/[^:]+:/, "")           # strip the timestamp prefix from the record itself
    print >> fname
    close(fname)                # close each file to stay under the open-file limit
}' file
You can of course squeeze it onto 1 line if you see some benefit to that:
awk '{f=$0;sub(/\./,"",f);sub(/:.*/,".json",f);sub(/[^:]+:/,"");print>>f;close(f)}' file
awk solution:
awk '{ idx=index($0,":"); fn=substr($0,1,idx-1)".json"; sub(/\./,"",fn);
print substr($0,idx+1) > fn; close(fn) }' input.log
idx=index($0,":") - capturing index of the 1st :
fn=substr($0,1,idx-1)".json" - preparing filename
Viewing results (for 2 sample lines from the question):
for f in *.json; do echo "$f"; cat "$f"; echo; done
The output (filename -> content):
1504007980039.json
{"key":"valueA"}
1504007990359.json
{"key":"valueB"}

merge csv unix based on column 1

Hi, I have two csv files having the same columns, like:
x.csv
column1,column2
A,2
B,1
y.csv
column1,column2
A,1
C,2
I want the output to be like:
z.csv
column1,column2
A,2
B,1
C,2
i.e. for matching data in the first column I want to keep the x.csv record (like A,2), and for a key that appears only in y.csv I just want to append its row (like C,2).
Thanks
$ awk -F, 'NR==FNR{a[$1]; print; next} ! ($1 in a)' x.csv y.csv
column1,column2
A,2
B,1
C,2
How it works
-F,
This tells awk to use a comma as the field separator
NR==FNR{a[$1]; print; next}
While reading the first file (NR==FNR), this tells awk to (a) add $1 as a key to the associative array a, (b) print the line, and (c) skip the remaining commands and jump to the next line of the file.
! ($1 in a)
If we get here, that means we are working on the second file. In that case, we print the line if the first field is not a key of array a (in other words, if the first field did not appear in the first file). This is also why the column1,column2 header shows up only once: the header of y.csv starts with column1, which was already seen while reading x.csv.