How to create logic that subtracts the results of two awk commands? - csv

I need help subtracting the results of the two awk commands below. Could someone give me some insight?
1. Find the total number of rows whose 12th column matches a specified word
2. Find the total number of rows whose 12th column matches the word and whose 13th column contains a specified date
3. Subtract result 2 from result 1 and print the difference
This solves problem 1
awk -F ',' '$12 ~ /<WORD>/ {count++} END {print count}' file.csv
This solves problem 2
awk -F ',' '$12 ~ /<WORD>/ && $13 ~ /<DATE>/ {count2++} END {print count2}' file.csv
Unfortunately, I'm not getting the result for problem 3 below.
awk -F ',' '$12 ~ /<WORD>/ {count++} END {print count}' file.csv; awk -F ',' '$12 ~ /<WORD>/ && $13 ~ /<DATE>/ {count2++} END {print count2}' file.csv; awk {print $count-$count2}

If you run multiple awk commands, the variables used are not shared. If you want them to be shared, you could combine the commands into a single program:
awk -F ',' '
$12 ~ /<WORD>/ {count++}
$12 ~ /<WORD>/ && $13 ~ /<DATE>/ {count2++}
END {print count - count2}
' file.csv
However, your three specifications seem to simplify to:
print the number of rows of the csv file file.csv whose column 12 contains a specific word and whose column 13 does not contain a specific date
awk -F, '$12~/word/ && $13!~/date/ {n++} END {print n+0}' file.csv
where /word/ and /date/ are regular expressions that provide the required word and date respectively.
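For example, assuming the word is ERROR and the date is 2024-01-01 (both placeholders for illustration):
awk -F, '$12~/ERROR/ && $13!~/2024-01-01/ {n++} END {print n+0}' file.csv
The n+0 forces numeric output, so you get 0 instead of a blank line when nothing matches.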

Related

Convert single column to multiple, ensuring column count on last line

I would like to use AWK (Windows) to convert a text file with a single column to multiple columns - the count specified in the script or on the command line.
This question has been asked before but my final data file needs to have the same column count all the way.
Example of input:
L1
L2
L3
L4
L5
L6
L7
split into 3 columns with ";" as the separator:
L1;L2;L3
L4;L5;L6
L7;; <<< here two empty fields are created at the end of the file, since I used just one value on this line.
I tried to modify variants of the typical solution given (NR%4 {printf $0",";next} 1;) with a counter, but could not quite get it right.
I would prefer not to count lines beforehand, which would mean running over the file multiple times.
You may use this awk solution:
awk -v n=3 '{
  sub(/\r$/, "")                        # remove a DOS line break, if present
  printf "%s", $0 (NR % n ? ";" : ORS)
}
END {
  # pad the last record with empty columns if needed
  if (NR % n) {
    for (i = 1; i < n - NR % n; ++i)
      printf ";"
    print ""
  }
}' file
L1;L2;L3
L4;L5;L6
L7;;
With your shown samples, please try the following xargs + awk combination to achieve the outcome you need:
xargs -n3 < Input_file |
awk -v OFS=";" '{if(NF==1){$0=$0";;"};if(NF==2){$0=$0";"};$1=$1} 1'
For an awk I would do:
awk -v n=3 '
{printf("%s%s", $0, (NR%n>0) ? ";" : ORS)}
END{
  if (NR % n) {   # only pad when the last record is partial
    for (i = NR % n; i < n - 1; i++) printf ";"
    printf ORS
  }
}' file
Or, an alternative awk:
awk -v n=3 -v OFS=";" '
{ row = row ? row FS $0 : $0 }                   # collect n input lines into row
!(NR % n) { $0 = row; NF = n; print; row = "" }  # rebuild the record so fields are joined by OFS
END { if (NR % n) { $0 = row; NF = n; print } }  # flush and pad the last partial record
' file
Or you can use ruby if you want more options:
ruby -le '
n=3
puts $<.read.
split($/).
each_slice(n).
map{|sl| sl.fill(sl.size...n) { "" }; sl.join(";") }.
join($\) # with -l, $/ (RS) and $\ (ORS) are set correctly for the platform
' file
Or, realize that paste is designed to do this:
paste -d';' - - - <file
(Use a - for each column desired)
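If the column count is not fixed, the dash list can be generated; a bash sketch (assuming seq is available):
n=3
paste -d';' $(printf -- '- %.0s' $(seq "$n")) < file
printf emits one unquoted - per number in the sequence, and the shell splits them into the separate arguments paste expects.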
Any of those prints (with n=3):
L1;L2;L3
L4;L5;L6
L7;;
(And work correctly for other values of n...)

Awk multiple transformations / separators at once

I have to transform (preprocess) a CSV file by generating/inserting a new column that is the concatenation of existing columns.
For example, transform:
A|B|C|D|E
into:
A|B|C|D|C > D|E
In this example, I do it with:
cat myfile.csv | awk 'BEGIN{FS=OFS="|"} {$4 = $4 OFS $3" > "$4} 1'
But now I have something more complex to do, and I can't find out how to do it.
I have to transform:
A|B|C|x,y,z|E
into
A|B|C|x,y,z|C > x,C > y,C > z|E
How can it be done in awk (or another command) efficiently (my csv file can contain thousands of lines)?
Thanks.
With GNU awk (for gensub which is a GNU extension):
awk -F'|' '{$6=$5; $5=gensub(/(^|,)/,"\\1" $3 " > ","g",$4); print}' OFS='|'
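On the sample line A|B|C|x,y,z|E this should print:
A|B|C|x,y,z|C > x,C > y,C > z|E
(gensub returns the modified string rather than changing $4 in place, so the original column can be kept while the new one is assigned to $5, after moving the old $5 out of the way.)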
You can split the 4th field into an array:
awk 'BEGIN{FS=OFS="|"} {split($4,a,",");$4="";for(i=1;i in a;i++)$4=($4? $4 "," : "") $3 " > " a[i]} 1' myfile.csv
A|B|C|C > x,C > y,C > z|E
There are many ways to do this, but the simplest is the following:
$ awk 'BEGIN{FS=OFS="|"}{t=$4;gsub(/[^,]+/,$3" > &",t);$4 = $4 OFS t}1'
We make a copy of the fourth field in variable t. In there, we replace every run of characters that does not contain the separator (,) with the content of the third field, followed by > and the original matched string (&).
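Running it on the sample line should print:
A|B|C|x,y,z|C > x,C > y,C > z|E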

How to combine several GAWK statements?

I have the following:
cat *.csv > COMBINED.csv
sort -k1 -n -t, COMBINED.csv > A.csv
gawk -F ',' '{sub(/[[:lower:]]+/,"",$1)}1' OFS=',' A.csv # REMOVE LOWER CASE CHARACTERS FROM 1st COLUMN
gawk -F ',' 'length($1) == 14 { print }' A.csv > B.csv # REMOVE ANY LINE FROM CSV WHERE VALUE IN FIRST COLUMN IS NOT 14 CHARACTERS
gawk -F ',' '{ gsub("/", "-", $2) ; print }' OFS=',' B.csv > C.csv # REPLACE FORWARD SLASH WITH HYPHEN IN SECOND COLUMN
gawk -F ',' '{print > ("processed/"$1".csv")}' C.csv # SPLIT CSV INTO FILES GROUPED BY VALUE IN FIRST COLUMN AND SAVE THE FILE WITH THAT VALUE
However, I think 4 separate lines is overkill, and I was wondering whether I could optimise it or at least streamline it into a one-liner.
I've tried piping the data but got stuck in a mix of errors.
Thanks
In awk you can chain multiple pattern-action pairs:
pattern1 { action1 }
pattern2 { action2 }
pattern3 { action3 }
So every time a record is read, awk processes it by first applying pattern-action 1, then pattern-action 2, and so on.
In your case, it seems like you can do:
awk 'BEGIN{FS=OFS=","}
# remove lower case characters from first column
{sub(/[[:lower:]]+/,"",$1)}
# process only lines with 14 characters in first column
(length($1) != 14) { next }
# replace forward slash with hyphen
{ gsub("/", "-", $2) }
{ print > ("processed/" $1 ".csv") }' <(sort -k1 -n -t, combined.csv)
You could essentially also do the sorting in GNU awk, but to mimic the sort exactly we would need to know your input format.
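As a rough sketch of that idea, assuming the first column is a plain integer key (gawk only, since PROCINFO["sorted_in"] is a GNU extension), the sort step could be replaced with:
gawk 'BEGIN{FS=","}
{ rows[$1 + 0] = (rows[$1 + 0] ? rows[$1 + 0] ORS : "") $0 }  # group records by numeric key
END {
  PROCINFO["sorted_in"] = "@ind_num_asc"   # traverse the keys in numeric order
  for (k in rows) print rows[k]
}' combined.csv
Its output could then be fed to the filtering program above instead of the sort.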

Print Rows if End of Field Matches a String in AWK

I have a csv file and I am trying to print rows using awk if a certain field ends with a specific string. For example, I have the below CSV file:
col1,col2,col3
1,abcd,.abcd_efg
2,efgh,.abcd
3,ijkl,.abcd_mno
4,mnop,.abcd
5,qrst,.abcd_uvw
This is the result I am after:
2,efgh,.abcd
4,mnop,.abcd
But I am getting a different result. This is the awk command I am using:
cat file.csv | awk -F"," '{if ($3 ~ ".abcd" ) print $0}'
and this is the result I am getting:
1,abcd,.abcd_efg
2,efgh,.abcd
3,ijkl,.abcd_mno
4,mnop,.abcd
5,qrst,.abcd_uvw
I even tried the below, but no matches were returned, so it didn't work:
cat file.csv | awk -F"," '{if ($3 ~ ".abcd$" ) print $0}'
Any clue what the issue might be? Am I using the wrong expression to get this result?
EDIT: Here is another command where I tried Kent's solution, but it didn't work:
cat file.csv | awk -F"," '$3 ~ "[.]abcd"'
First of all, the cat in cat file | awk ... is useless; just use awk ... file.
Your input text has not a single comma, so why did you set FS=","?
If you want to do an exact string comparison, use $3 == "whatever" instead of $3 ~ /regex/.
So your code could be changed into:
awk '$3 == ".abcd"' file
If you really love regex and want to do it the regex-match way:
awk '$3 ~ "[.]abcd$"' file
or
awk '$3 ~ /^[.]abcd$/' file
depending on what you require.
You may modify your awk command as follows:
$ cat file.csv | awk '$3 ~ /\.abcd$/ {print $0}'
2 efgh .abcd
4 mnop .abcd
Brief explanation:
$3 ~ /\.abcd$/: if $3 matches the regex \.abcd$ (a literal dot followed by abcd at the end of the field), print $0
According to your modified question, you may change the awk command to:
cat file.csv | awk -F, '$3 ~ /\.abcd$/ {print $0}'

(sed/awk) Extract values from text to csv file - even/odd lines pattern

I need to extract some numeric values from a given ASCII text file and export them to a specifically formatted csv file. The input file has an even/odd line pattern:
SCF Done: E(UHF) = -216.432419652 A.U. after 12 cycles
CCSD(T)= -0.21667965032D+03
SCF Done: E(UHF) = -213.594303492 A.U. after 10 cycles
CCSD(T)= -0.21379841974D+03
SCF Done: E(UHF) = -2.86120139864 A.U. after 6 cycles
CCSD(T)= -0.29007031339D+01
and so on
I need the 5th-column value from the odd lines and the 2nd-column value from the even lines. They should be printed to a semicolon-separated csv file, with 10 values in each row. So the output should look like:
-216.432419652;-0.21667965032D+03;-213.594303492;-0.21379841974D+03;-2.86120139864;-0.29007031339D+01; ...linebreak after 5 pairs of values
I started with awk '{print $5}' and awk '{print $2}', but was not successful in creating a pattern that acts only on even/odd lines.
Is there a simple way to do that?
The following script doesn't use a lot of the great power of awk, but will do the job for you and is hopefully understandable:
NR % 2 { printf "%s;", $5 }
NR % 2 == 0 { printf "%s;", $2 }
NR % 10 == 0 { print "" }
END { if (NR % 10) print "" }  # finish the last row unless it was just completed
Usage (save the above as script.awk):
awk -f script.awk input.txt
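With the six sample lines shown above, this should print:
-216.432419652;-0.21667965032D+03;-213.594303492;-0.21379841974D+03;-2.86120139864;-0.29007031339D+01;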
Given a file called data.txt, try:
awk '/SCF/{ printf "%s;", $5 } /CCSD/{ printf "%s;", $2 } NR % 10 == 0 { printf "\n" }' data.txt
Something like this could work:
awk '{x = NF > 3 ? $5 : $2; printf("%s;", x)} NR % 10 == 0 {print ""}' file
Here NF > 3 ? $5 : $2 is a ternary operator: it checks the number of fields (NF) on the line and assigns the 5th field to x when there are more than 3 fields (the SCF line), else the 2nd field (the CCSD line). printf("%s;", x) prints each value followed by a ";". NR is a built-in that keeps track of the number of lines read, and the modulo check NR % 10 == 0 emits a line break each time 10 lines have been crossed.
This might work for you (using ; throughout so the output matches the requested format):
tr -s ' ' ';' <file | paste -sd';\n' | cut -d';' -f5,11 | paste -sd';;;;\n'
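For the sample input this should print a single row:
-216.432419652;-0.21667965032D+03;-213.594303492;-0.21379841974D+03;-2.86120139864;-0.29007031339D+01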