Read a CSV file inside awk script (CLOSED) - csv

I want to run an AWK script without typing the CSV file name in the terminal; instead, I want to reference that file from inside my code.
Current Input terminal:
./script.awk file.csv
Desired Input Terminal:
./script.awk
Here is the script I have written so far:
#!/usr/bin/awk -f
BEGIN{print"Filtered Elements:"}
BEGIN{FS=","}
{ if ($8~/.*5.*/ && $2~/.*Sh.*/ && ($3~/.*i.*/ || $4~/.*s.*/)) { print } }
{ if ($3~/.*ra.*/ && $7~/.*18.*/ && $13~/.*r.*/) { print } }
{ if ($5~/.*7.*/ && $2~/.*l.*/ && ($4~/.*Fi.*/ || $12~/20.*/)) { print } }
} file.csv
I also tried this:
#!/usr/bin/awk -f
BEGIN{print"Filtered Elements:"}
BEGIN{FS=","}
BEGIN{
while (getline < file.csv > 0) {
{ if ($8~/.*5.*/ && $2~/.*Sh.*/ && ($3~/.*i.*/ || $4~/.*s.*/)) { print } }
{ if ($3~/.*ra.*/ && $7~/.*18.*/ && $13~/.*r.*/) { print } }
{ if ($5~/.*7.*/ && $2~/.*l.*/ && ($4~/.*Fi.*/ || $12~/20.*/)) { print } }
}
But either way, an error occurred.
Thank you in advance!

Your second example is a correct getline loop, except that the file path should be quoted to be treated as a string (and not a variable): while (getline < "file.csv" > 0) #....
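For instance, a corrected version of that second script might look like this (a sketch; the filter conditions are copied from the question, with the redundant leading/trailing .* dropped from the regexes):
#!/usr/bin/awk -f
BEGIN {
    FS = ","
    print "Filtered Elements:"
    # quoted path, and parentheses to make the comparison unambiguous
    while ((getline < "file.csv") > 0) {
        if ($8 ~ /5/ && $2 ~ /Sh/ && ($3 ~ /i/ || $4 ~ /s/)) print
        if ($3 ~ /ra/ && $7 ~ /18/ && $13 ~ /r/) print
        if ($5 ~ /7/ && $2 ~ /l/ && ($4 ~ /Fi/ || $12 ~ /20/)) print
    }
    close("file.csv")
}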
Alternatively, you can set the script arguments (including input files and variables) by manipulating ARGV and ARGC in a BEGIN block:
BEGIN {
ARGV[1] = "file.csv"
ARGC = 2
}
{
# commands here process file.csv as normal
}
Running this as ./script is the same as if you set the argument with the shell (like ./script file.csv).
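Putting this together with the filters from the question, the whole self-contained script could look like the sketch below (a pattern with no action prints the matching line, so a line matching several patterns prints several times, just like the original three if-blocks):
#!/usr/bin/awk -f
BEGIN {
    print "Filtered Elements:"
    FS = ","
    ARGV[1] = "file.csv"   # inject the input file as if passed on the command line
    ARGC = 2
}
$8 ~ /5/ && $2 ~ /Sh/ && ($3 ~ /i/ || $4 ~ /s/)
$3 ~ /ra/ && $7 ~ /18/ && $13 ~ /r/
$5 ~ /7/ && $2 ~ /l/ && ($4 ~ /Fi/ || $12 ~ /20/)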

An awk script isn't a command you call; it's a set of instructions interpreted by awk, where awk IS a command you call. What you're apparently trying to do is write a Unix command that's implemented as a shell script which includes a call to awk, e.g.:
#!/usr/bin/env bash
awk '
{ print "foo", $0 }
' 'file.csv'
Store that in a file named stuff (not stuff.awk or stuff.sh or anything else with a suffix), and then call it as ./stuff or just stuff if the current directory is in your PATH.
Though you technically can use a shebang to call awk directly, don't do it - see https://stackoverflow.com/a/61002754/1745001.
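As a variation on that wrapper (my sketch, not part of the answer above), the CSV path can be taken as an optional argument that defaults to file.csv, the name assumed in the question:
#!/usr/bin/env bash
csv="${1:-file.csv}"   # first argument, or file.csv by default
awk '
{ print "foo", $0 }
' "$csv"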

Related

Tcl: can catch { exec } know whether a final newline was output?

Consider the following:
% catch { exec echo "test" } result
0
% catch { exec echo -n "test" } resultnonl
0
% if { $result == $resultnonl } { echo "true" }
true
Question: Is there a way for the two resulting variables to be different?
Use case: I'm retrieving the contents of the clipboard and cannot differentiate between these two cases. In Emacs, it is very common for me to kill (cut) a line without its final newline, and also very common to kill a whole line. The clipboard only differs by the newline.
Check out the -keepnewline flag to exec. Watch:
catch { exec -keepnewline -- echo "test" } result
string length $result
With -keepnewline, the trailing newline that echo emits is preserved instead of being stripped, so the length comes out as 5 ("test\n") rather than 4, and the two clipboard cases become distinguishable.

Create a table with style from file

I am trying to create an HTML table from a CSV file. In the CSV file I have three fields that are already filtered so that they do not cause problems, but when I run the code, the report file does not generate any output. There must be an error in how I go through the file, or the failure is somewhere I cannot see:
The CSV input looks similar to this:
I cannot see an obvious error in your file, but I was able to generate the required HTML file using an awk script file as follows (the correct #! path can be found by running which awk in a terminal):
#! /usr/bin/awk -f
BEGIN {
    FS = ","
    print "<!DOCTYPE html>\n<head><title>Report</title>"
    print "<link rel=\"stylesheet\" href=\"style.css\">"
    print "<meta charset=\"utf-8\"/>"
    print "</head>\n<body>\n<div class=\"head-style\"><h2>Report</h2>\n</div>"
}
NR<2 {
    print "<table>\n<tr><th>"$1"</th><th>"$2"</th><th>"$3"</th></tr>"
}
NR>1 {
    print "<tr><td>"$1"</td><td>"$2"</td><td>"$3"</td></tr>"
}
END {
    print "</table>\n<div class=\"footer\">\n<p>0000 st</p>\n</div>\n</body>\n</html>"
}
I formatted the printing using field references ($1 etc.) between quoted strings. Note also that quotes can be escaped for printing.
I saved the script as awkScript.awk and made it executable from the command line using:
chmod +x awkScript.awk
This can then be executed on the csv file with the command:
./awkScript.awk rm.csv > rep.html
It looks like you forgot to pass the fields as parameters to function print_line(Platan, Recl, Tror).
# Since "Platan" and "Recl" are strings, I think the format string
# should be: "%s %s %.2f %s\n" (BTW, I included "\n" to improve readability).
function print_line(Platan, Recl, Tror) {
printf("%s %s %.2f %s\n", "<tr><td>"Platan"</td>", "<td>"Recl"</td><td>", Tror, "</td></tr>") ;
}
{
if (NR > 1) {
print_line($1, $2, $3) # should solve
}
}

Is there a simple way to convert a CSV with 0-indexed paths as keys to JSON with Miller?

Consider the following CSV:
email/1,email/2
abc@xyz.org,bob@pass.com
You can easily convert it to JSON (taking into account the paths defined by the keys) with Miller:
mlr --icsv --ojson --jflatsep '/' cat file.csv
[ { "email": ["abc#xyz.org", "bob#pass.com"] } ]
Now, if the paths are 0-indexed in the CSV (which is surely more common):
email/0,email/1
abc@xyz.org,bob@pass.com
Then, without prior knowledge of the field names, it seems that you'll have to rewrite the whole conversion:
edit: replaced the hard-coded / with FLATSEP builtin variable:
mlr --icsv --flatsep '/' put -q '
    begin { @labels = []; print "[" }
    # translate the original CSV header from 0-indexed to 1-indexed
    NR == 1 {
        i = 1;
        for (k in $*) {
            @labels[i] = joinv( apply( splita(k,FLATSEP), func(e) {
                return typeof(e) == "int" ? e+1 : e
            }), FLATSEP );
            i += 1;
        }
    }
    NR > 1 { print @object, "," }
    # create an object from the translated labels and the row values
    o = {};
    i = 1;
    for (k,v in $*) {
        o[@labels[i]] = v;
        i += 1;
    }
    @object = arrayify( unflatten(o,FLATSEP) );
    end { if (NR > 0) { print @object } print "]" }
' file.csv
I would like to know if I'm missing something obvious, like a command-line option or a way to rename the fields with the put verb, or maybe something else? You're also welcome to share your insights about the previous code, as I'm not really confident in my Miller programming skills.
Update:
With @aborruso's approach of pre-processing the CSV header, this could be reduced to:
note: I didn't keep the regextract part because it means knowing the CSV header in advance.
mlr --csv -N --flatsep '/' put '
    NR == 1 {
        for (i,k in $*) {
            $[i] = joinv( apply( splita(k,FLATSEP), func(e) {
                return typeof(e) == "int" ? e+1 : e
            }), FLATSEP );
        }
    }
' file.csv |
mlr --icsv --flatsep '/' --ojson cat
Even if there are workarounds, like using the rename verb (when you know the header in advance) or pre-processing the CSV header, I still hope that Miller's author could add an extra command-line option to deal with this kind of 0-indexed external data; adding DSL functions like arrayify0 (and flatten0) could also prove useful in some cases.
I would like to know if I'm missing something obvious, like a command line option or a way to rename the fields with put verb, or maybe something else?
Starting from this
email/0,email/1
abc@xyz.org,bob@pass.com
you can use an implicit CSV header and run
mlr --csv -N put 'if (NR == 1) {for (k in $*) {$[k] = "email/".string(int(regextract($[k],"[0-9]+"))+1)}}' input.csv
to have
email/1,email/2
abc@xyz.org,bob@pass.com
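The re-indexed CSV can then be piped straight into the conversion from the question; combining the two commands shown above gives (a sketch):
mlr --csv -N put 'if (NR == 1) {for (k in $*) {$[k] = "email/".string(int(regextract($[k],"[0-9]+"))+1)}}' input.csv |
mlr --icsv --ojson --jflatsep '/' cat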

AWK: comparing 2 columns from 2 CSV files, outputting to a third. How do I also get the output that doesn't match into another file?

I currently have the following script:
awk -F, 'NR==FNR { a[$1 FS $4]=$0; next } $1 FS $4 in a { printf a[$1 FS $4]; sub($1 FS $4,""); print }' file1.csv file2.csv > combined.csv
This compares columns 1 & 4 from both CSV files and outputs the result from both files to combined.csv. Is it possible to output the lines from file 1 & file 2 that don't match to other files with the same awk line? Or would I need to do separate parses?
File1
ResourceName,ResourceType,PatternType,User,Host,Operation,PermissionType
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow
File2
topic,groupName,Name,User,email,team,contact,teamemail,date,clienttype
BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
combined
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
Wanted additional files:
non matched file1:
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
non matched file2:
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
Again, I might be trying to do too much in one line? Would it be wiser to run another parse?
Assuming the key pairs of $1 and $4 are unique within each input file then using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 FS $4 }
NR==FNR {
    file1[key] = $0
    next
}
key in file1 {
    print file1[key], $0 > "out_combined"
    delete file1[key]
    next
}
{
    print > "out_file2_only"
}
END {
    for (key in file1) {
        print file1[key] > "out_file1_only"
    }
}
$ awk -f tst.awk file{1,2}
$ head out_*
==> out_combined <==
BIG.TestTopic,Cluster,LITERAL,Bigboy,*,Create,Allow,BIG.TestTopic,BIG.ConsumerGroup,Bobby,Bigboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
BIG.DEVtopic,Cluster,LITERAL,Oldboy,*,DescribeConfigs,Allow,BIG.DEVtopic,BIG.ConsumerGroup,Bobby,Oldboy,bobby@example.com,team 1,Bobby,boys@example.com,2021-11-26T10:10:17Z,Consumer
==> out_file1_only <==
BIG.PRETopic,Cluster,LITERAL,Smallboy,*,Create,Allow
==> out_file2_only <==
BIG.TestTopic,BIG.ConsumerGroup,Susan,Younglady,younglady@example.com,team 1,Susan,girls@example.com,2021-11-26T10:10:17Z,Producer
The order of lines in out_file1_only will be shuffled by the in operator - if that's a problem, it's an easy tweak to retain the input order; see the sketch below.
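For reference, one way to do that tweak (my sketch, not part of the answer above; it still assumes the $1/$4 key pairs are unique within each file) is to remember the first-seen order of the keys in a parallel array and replay it in the END block:
BEGIN { FS=OFS="," }
FNR==1 { next }
{ key = $1 FS $4 }
NR==FNR {
    file1[key] = $0
    order[++numKeys] = key              # remember first-seen key order
    next
}
key in file1 {
    print file1[key], $0 > "out_combined"
    delete file1[key]
    next
}
{
    print > "out_file2_only"
}
END {
    for (i = 1; i <= numKeys; i++) {
        key = order[i]
        if (key in file1)               # only keys that were never matched
            print file1[key] > "out_file1_only"
    }
}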

Merge *.csv files and add filename as a column in terminal using awk

Summary: I have close to 500 *.csv files that I need to merge into one csv file where during the merge process the filename for each csv needs to be added in each row in a new column.
I have read many threads here on Stack Overflow and beyond. I am attempting to do this directly in Terminal (not as a script that I run in the terminal). Here is what I have so far. When I run this in Terminal it returns "for quote>" and does not complete. I am hoping that someone can guide me.
for f in *.csv; do awk -f ' { x=1 ; if ( x == NR ) { print "date,ProductNumber,Brand,Description,Size,UnitType,Pack,UPC,Available,Status,Delivery Due Date" } else { gsub(".csv","",FILENAME); print FILENAME","$0 } } “$f” > “output$f”; done
Each csv file is structured the same and here is some sample data:
ProductNumber,Brand,Description,Size,UnitType,Pack,UPC,Available,Status,Delivery Due Date
="0100503","BARNEY BUTTER","ALMOND BTR,SMOOTH","16 OZ ","CS"," 6",="0094922553584"," 99","Active"," "
="0100701","NATRALIA","BODY LOTION,DRY SKIN","8.45 FZ ","EA"," 1",="0835787000765"," 33","Active"," "
="0101741","SAN PELLEGRINO","SPRKLNG BEV,ARANCIATA,ROS","6/11.15F","CS"," 4",="0041508300360"," 0","Active"," "
awk -v OFS=, '
NR == 1 {
    print "date,ProductNumber,Brand,Description,Size,UnitType,Pack,UPC,Available,Status,Delivery Due Date"
}
FNR == 1 {
    file = FILENAME
    sub(/\.csv$/, "", file)
    next                    # skip each file's own header line
}
{ print file, $0 }
' *.csv > out.csv
If the list of files is too long for a single command line, then:
find . -name '*.csv' -print0 | xargs -0 awk '...' > out.csv
(Note that xargs may split the list across several awk invocations, in which case the NR == 1 header logic would print one header per invocation.)
awk -v OFS=, '
NR == 1 && FNR == 1 {
    # header line of the very first file: prepend a column for the filename
    print "filename", $0
}
NR > 1 && FNR > 1 {
    file = FILENAME
    sub(/\.csv$/, "", file)
    print file, $0
}
' *.csv > out.csv
This worked for me. Keeps the first line/header from the first file, and adds a header for the column that will hold the filename, then dumps all the files.