I have a problem here. I have to print a column from a text file using awk. However, the columns are not separated by spaces but by a single comma. It looks something like this:
column1,column2,column3,column4,column5,column6
How would I print out the 3rd column using awk?
Try:
awk -F',' '{print $3}' myfile.txt
Here -F tells awk to use , as the field separator.
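For example, with a throwaway file in the question's layout (the file name and contents are made up):

```shell
# build a two-line sample file in the comma-separated layout from the question
printf 'c1,c2,c3,c4,c5,c6\nx1,x2,x3,x4,x5,x6\n' > myfile.txt

# -F',' sets the field separator, so $3 is the third comma-delimited column
awk -F',' '{print $3}' myfile.txt
# c3
# x3
```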
If your only requirement is to print the third field of every line, with each field delimited by a comma, you can use cut:
cut -d, -f3 file
-d, sets the delimiter to a comma
-f3 specifies that only the third field is to be printed
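As a quick sketch with made-up data: cut also accepts field lists such as -f2,3 (and ranges like -f2-4) if you ever need more than one column:

```shell
printf 'a,b,c,d\n1,2,3,4\n' > file

cut -d, -f3 file      # just the third field
# c
# 3

cut -d, -f2,3 file    # a comma-separated list of fields also works
# b,c
# 2,3
```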
Try this awk:
awk -F, '{$0=$3}1' file
column3
-F, split fields on ,
$0=$3 set the line to only field 3
1 print the line (an always-true pattern with no action triggers the default action, print)
This could also be used:
awk -F, '{print $3}' file
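A quick illustration of the two forms with made-up data: both print field 3, but {$0=$3}1 does it by rewriting the whole record first:

```shell
printf 'a,b,c,d\n' > file

awk -F, '{$0=$3}1' file      # record becomes "c", then the bare 1 prints it
# c

awk -F, '{print $3}' file    # prints field 3 directly
# c
```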
A simple, although awk-less solution in bash:
while IFS=, read -r a a a b; do echo "$a"; done <inputfile
It works faster than awk for small files (<100 lines), as it uses fewer resources: it stays in the shell and avoids the expensive fork and execve system calls needed to launch an external program.
EDIT from Ed Morton (sorry for hijacking the answer, I don't know if there's a better way to address this):
To put to rest the myth that shell will run faster than awk for small files:
$ wc -l file
99 file
$ time while IFS=, read -r a a a b; do echo "$a"; done <file >/dev/null
real 0m0.016s
user 0m0.000s
sys 0m0.015s
$ time awk -F, '{print $3}' file >/dev/null
real 0m0.016s
user 0m0.000s
sys 0m0.015s
I expect that if you get a REALLY small enough file then you will see the shell script run a fraction of a blink of an eye faster than the awk script, but who cares?
And if you don't believe that it's harder to write robust shell scripts than awk scripts, look at this bug in the shell script you posted:
$ cat file
a,b,-e,d
$ cut -d, -f3 file
-e
$ awk -F, '{print $3}' file
-e
$ while IFS=, read -r a a a b; do echo "$a"; done <file
$
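One way to harden the shell loop against values like -e (which bash's echo builtin swallows as an option) is to use printf instead; this is a sketch of that fix, not part of the original answer:

```shell
printf 'a,b,-e,d\n' > file

# printf '%s\n' prints its argument literally, so "-e" survives
while IFS=, read -r a a a b; do printf '%s\n' "$a"; done < file
# -e
```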
This seems like a simple problem, but I find everything command-line related confusing.
I have a CSV with 5 columns. I want to multiply everything in columns 2-5 by a variable defined earlier in my bash script.
Obviously the below doesn't work, but it shows what I'm trying to achieve:
awk -F , -v OFS=, 'seq($2 $5)*='$MULTIPLIER in.csv > out.csv
Generally speaking:
awk -F, -v OFS=, -v m="${MULTIPLIER}" '{for (i=2;i<=5;i++) $i*=m}1' in.csv > out.csv
Assuming there's a header record, and a variation on setting FS/OFS:
awk -v m="${MULTIPLIER}" 'BEGIN {FS=OFS=","} NR>1 {for (i=2;i<=5;i++) $i*=m}1' in.csv > out.csv
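As a quick sanity check with a made-up 5-column file and a multiplier of 10, using the header-aware variant:

```shell
# sample input: header row plus one data row
printf 'id,q1,q2,q3,q4\nr1,1,2,3,4\n' > in.csv

MULTIPLIER=10
# NR>1 skips the header; fields 2-5 of every other row are multiplied by m
awk -v m="${MULTIPLIER}" 'BEGIN {FS=OFS=","} NR>1 {for (i=2;i<=5;i++) $i*=m}1' in.csv
# id,q1,q2,q3,q4
# r1,10,20,30,40
```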
I want to read environment variables from a .json file:
{
"PASSPHRASE": "$(cat /home/x/secret)",
}
With the following script:
IFS=$'\n'
for s in $(echo $values | jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" $1); do
export $s
echo $s
done
unset IFS
But then I got $(cat /home/x/secret) in PASSPHRASE, and cat is not executed. When I execute the line export PASSPHRASE=$(cat /home/x/secret) directly, I get the correct result (the content of the file in the environment variable). What do I have to change in my script to get it working?
When you do export PASSPHRASE=$(cat /home/x/secret) in the shell, it interprets the $() expression, executes the command within, and stores its output in the variable PASSPHRASE.
When you place $() in the JSON file, however, it is read by jq and treated as a plain string, which is the equivalent of doing export PASSPHRASE=\$(cat /home/x/secret) (note the backslash, which escapes the dollar sign so it is treated as a literal character instead of starting a command substitution). If you do that instead and then echo the contents of the variable, you will see similar results to running your script.
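You can see this by running the jq part alone against such a file (contents made up to match the question); the $() comes out as a literal string, nothing executes it:

```shell
cat > test.json <<'EOF'
{"PASSPHRASE": "$(cat /home/x/secret)"}
EOF

# jq just serializes key=value pairs; the $() is plain text to it
jq -r 'to_entries|map("\(.key)=\(.value|tostring)")|.[]' test.json
# PASSPHRASE=$(cat /home/x/secret)
```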
If you want to force bash to interpret the string as a command you could use sh -c <command> instead, for example like this:
test.json:
{
"PASSPHRASE": "cat /home/x/secret"
}
test.sh:
IFS=$'\n'
for s in $(echo $values | jq -r "to_entries|map(\"\(.value|tostring)\")|.[]" $1); do
echo "$(sh -c "$s")"
done
unset IFS
This prints out the contents of /home/x/secret. It does not solve your problem directly but should give you an idea of how you could change your original code to achieve what you need.
Thanks to Maciej I changed the script and got it working:
IFS=$'\n'
for line in $(jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" "$1"); do
lineExecuted=$(sh -c "echo $line")
export "$lineExecuted"
echo "$lineExecuted"
done
unset IFS
I need to merge together all csv files of a directory that have the same header (1st line). Let's say we have:
file a.txt:
head1,head2,head3
1,2,"abc"
8,42,"def"
file b.txt:
head4,head2
"aa",2
file c.txt:
head1,head2,head3
12,2,"z"
15,2,"z"
If I want all files with header "head1,head2,head3", then it should merge files a and c and produce:
awk 'FNR==1 && NR!=1{next;}{print}' a.txt c.txt
head1,head2,head3
1,2,"abc"
8,42,"def"
12,2,"z"
15,2,"z"
Now I can, for a given header, detect the files to merge automatically, but I can't pass the resulting list to awk. I am using the following command:
head -n1 -v * | grep -B1 "head1,head2,head3" | awk "/==>/{ print \$2 }" | xargs -l -0 awk 'FNR==1 && NR!=1{next;}{print}'
awk: fatal: cannot open file `a.txt
c.txt
' for reading (No such file or directory)
where head lists the file names and first lines, grep keeps only the headers that are matching (and associated filenames on the preceding lines with -B1), and the first call to awk keeps only the file names, one per line.
I tried as well (adding tr '\n' ' '):
head -n1 -v * | grep -B1 "head1,head2,head3" | awk "/==>/{ print \$2 }" | tr '\n' ' ' | xargs -l -0 awk 'FNR==1 && NR!=1{next;}{print}'
awk: fatal: cannot open file `a.txt c.txt ' for reading (No such file or directory)
I eventually tried the following (using tr '\n' '\0' instead):
head -n1 -v * | grep -B1 "head1,head2,head3" | awk "/==>/{ print \$2 }" | tr '\n' '\0' | xargs -l -0 awk 'FNR==1 && NR!=1{next;}{print}'
head1,head2,head3
1,2,"abc"
8,42,"def"
head1,head2,head3
12,2,"z"
15,2,"z"
(although I'm not sure I understand exactly how \0 is interpreted), at least this command works, but it looks like each file is processed by a separate awk invocation, as the header is printed twice.
What am I missing?
Does this help?
$ awk -v h='head1,head2,head3' 'BEGIN{print h} FNR==1{f=$0==h?1:0; next} f' *.txt
head1,head2,head3
1,2,"abc"
8,42,"def"
12,2,"z"
15,2,"z"
-v h='head1,head2,head3' save the header line to check in a variable h
BEGIN{print h} print the header (assumes there'll be at least one file that matches)
FNR==1{f=$0==h?1:0; next} set/clear flag based on first line of file matching contents of h
f print if flag is set
*.txt list of files to merge
With GNU awk, you can skip unnecessarily reading files that don't match by using FNR==1{if($0!=h) nextfile; next} 1
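A sketch of that variant against the example files from the question (your awk must support nextfile, as gawk and mawk do):

```shell
# recreate the question's three sample files
printf 'head1,head2,head3\n1,2,"abc"\n8,42,"def"\n' > a.txt
printf 'head4,head2\n"aa",2\n' > b.txt
printf 'head1,head2,head3\n12,2,"z"\n15,2,"z"\n' > c.txt

# non-matching files are abandoned at their first line via nextfile
awk -v h='head1,head2,head3' 'BEGIN{print h} FNR==1{if($0!=h) nextfile; next} 1' a.txt b.txt c.txt
# head1,head2,head3
# 1,2,"abc"
# 8,42,"def"
# 12,2,"z"
# 15,2,"z"
```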
I have a string assigned to a variable:
#/bin/bash
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
I need to extract only l0ng_Str1ng.of.d1fF3erent_charAct3rs without quotes and assign that to another variable.
I understand I can use awk, sed, or cut but I am having trouble getting around the special characters in the original string.
Thanks in advance!
EDIT: I was not awake I should specify this is JSON. Thanks for the replies so far.
EDIT2: I am using BSD (macOS)
It looks like you have a JSON string there. Keep in mind that JSON object keys are unordered, so most sed, awk, and cut solutions will fail if your string arrives next time with the keys in a different order.
It is most robust to use a JSON parser.
You could use ruby with its JSON parser library:
$ echo "$fullToken" | ruby -r json -e 'p JSON.parse($<.read)["token"];'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
Or, if you don't want the quoted string (which is useful for Bash):
$ echo "$fullToken" | ruby -r json -e 'puts JSON.parse($<.read)["token"];'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Or with jq:
$ echo "$fullToken" | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
All these solutions will work even if the JSON string is in a different order:
$ echo '{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}' | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
$ echo '{"token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs", "type":"APP"}' | jq '.token'
"l0ng_Str1ng.of.d1fF3erent_charAct3rs"
But KNOWING that you SHOULD use a JSON parser, you can also use a PCRE with a lookbehind in GNU grep:
$ echo "$fullToken" | grep -oP '(?<="token":")[^"]*'
Or in Perl:
$ echo "$fullToken" | perl -lane 'print $1 if /(?<="token":)"([^"]*)/'
Both of those also work if the string is in a different order.
Or, with POSIX awk:
$ echo "$fullToken" | awk -F"[,:}]" '{for(i=1;i<=NF;i++){if($i~/"token"/){print $(i+1)}}}'
Or, with POSIX sed, you can do:
$ echo "$fullToken" | sed -E 's/.*"token":"([^"]*).*/\1/'
These solutions are presented from strongest (use a JSON parser) to most fragile (sed). But the sed solution here is better than the other non-parser ones because it will still work if the key/value pairs in the JSON string appear in a different order.
PS: If you want to remove the quotes from a line, that is a great job for sed:
$ echo '"quoted string"'
"quoted string"
$ echo '"quoted string"' | sed -E 's/^"(.*)"$/UN\1/'
UNquoted string
In awk:
$ awk -v f="$fullToken" '
BEGIN{
while(match(f,/[^:{},]+:[^:{},]+/)) { # search key:value pairs
p=substr(f,RSTART,RLENGTH) # set pair to p
f=substr(f,RSTART+RLENGTH) # remove p from f
split(p,a,":") # split to get key and value
for(i in a) # remove leading and trailing "
gsub(/^"|"$/,"",a[i])
if(a[1]=="token") { # if key is token
print a[2] # output value
exit # no need to process further
}
}
}'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
This assumes the token value can't contain any of the characters :, {, } or , (they delimit the key:value pairs here).
GNU sed:
fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
echo "$fullToken"|sed -r 's/.*"(.*)".*/\1/'
A grep method would be:
$ grep -oP '[^"]+(?="[^"]+$)' <<< "$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Brief explanation:
[^"]+ : match a run of non-" characters
(?="[^"]+$) : a lookahead requiring the match to be followed by the last ", i.e. a " with only non-" characters after it up to the end of the line
You may also use a sed method to do that:
$ sed -E 's/.*"([^"]+)"[^"]+$/\1/' <<< "$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
If the source of your string is JSON, then you should use JSON-specific tools. If not, then consider:
Using awk
$ fullToken='{"type":"APP","token":"l0ng_Str1ng.of.d1fF3erent_charAct3rs"}'
$ echo "$fullToken" | awk -F'"' '{print $8}'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using cut
$ echo "$fullToken" | cut -d'"' -f8
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using sed
$ echo "$fullToken" | sed -E 's/.*"([^"]*)"[^"]*$/\1/'
l0ng_Str1ng.of.d1fF3erent_charAct3rs
Using bash and one of the above
The above all work with POSIX shells. If the shell is bash, then we can use a here-string and eliminate the pipeline. Taking cut as the example:
$ cut -d'"' -f8 <<<"$fullToken"
l0ng_Str1ng.of.d1fF3erent_charAct3rs
I need some help with the batch-file equivalents of grep -v Wildcard and grep -o.
This is my code in shell.
result=`mysqlshow --user=$dbUser --password=$dbPass sample | grep -v Wildcard | grep -o sample`
The closest batch equivalent of grep (not counting third-party tools like GnuWin32 grep) is findstr.
grep -v finds lines that don't match the pattern. The findstr version of this is findstr /V
grep -o shows only the part of the line that matches the pattern. Unfortunately, there's no equivalent of this, but you can run the command and then have a check along the lines of
if %errorlevel% equ 0 echo sample