table creation from CSV file with headers using awk

I have a comma-separated CSV file with headers, and I want to include the headers in the table.
Input:
header,word1,word2,word3
supercalifragi,black,white,red
adc,bad,cat,love
Output:
| header         | word1 | word2 | word3 |
| -------------- | ----- | ----- | ----- |
| supercalifragi | black | white | red   |
| adc            | bad   | cat   | love  |
I need to include the headers, and I need to take the length of the words in the input file into account so that the finished table is aligned correctly.
Here is the updated code:
function pr(){
for(i=1;i<=NF;i++)
printf "| %-"len[i]+1"s",$i;
printf "|\n"
}
NR==FNR{
for(i=1;i<=NF;i++)
if(len[i]<length($i)){
len[i]=length($i);
word[i]=$i
}next
}{pr()}
FNR==1{
for(i=1;i<=NF;i++){
gsub(/./,"-",word[i]);
$i=word[i]};
pr()
}

I took the liberty of rewriting the entire code from scratch. This should work (it needs GNU awk, because it uses arrays of arrays):
BEGIN {
FS=","
OFS=" | "
# no need to initialise transientLength: NF is 0 here, and unset entries compare as 0
}
{
if(NR==1) {
# read headers
for (i=0; i<NF; i++) {
headers[i] = $(i+1)
transientLength[i] = (length($(i+1))>=transientLength[i] ? length($(i+1)) : transientLength[i])
}
} else {
for (i=0; i<NF; i++) {
fields[NR][i] = $(i+1)
transientLength[i] = (length($(i+1))>=transientLength[i] ? length($(i+1)) : transientLength[i])
}
}
}
END {
# print header (index loop, because "for (j in headers)" does not guarantee column order)
for (j=0; j<length(headers); j++) {
spaceLength = transientLength[j]-length(headers[j])
for (s=1;s<=spaceLength;s++) {
spaces = spaces" "
}
if (!printable) printable = headers[j] spaces
else printable = printable OFS headers[j] spaces
spaces = "" # garbage collection
}
printable = "| "printable" |"
print printable
printable = "" # garbage collection
# print alignments (index loop, to keep the columns in input order)
for (j=0; j<length(transientLength); j++) {
for (i=1;i<=transientLength[j];i++) {
sep = sep"-"
}
if (!printable) printable = sep
else printable = printable OFS sep
sep = "" # garbage collection
}
printable = "| "printable" |"
print printable
printable = "" # garbage collection
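# print all rows (records 2..NR, in input order)
for (f=2; f<=NR; f++) {
if (!(f in fields)) continue # skip blank input lines
for (j=0; j<length(fields[f]); j++) {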
spaceLength = transientLength[j]-length(fields[f][j])
for (s=1;s<=spaceLength;s++) {
spaces = spaces" "
}
if (!printable) printable = fields[f][j] spaces
else printable = printable OFS fields[f][j] spaces
spaces = "" # garbage collection
}
printable = "| "printable" |"
print printable
printable = "" # garbage collection
}
}
But please be aware: you need to clean your input file of unnecessary whitespace. It should read:
header,word1,word2,word3
supercalifragi,black,white,red
adc,bad,cat,love
Alternatively, you might use FS=", ", but that would only work for input formatted exactly like your example.

A shorter alternative that scans the file twice:
$ awk -F' *, *' 'function pr()
{for(i=1;i<=NF;i++) printf "| %-"len[i]+1"s",$i; printf "|\n"}
NR==FNR{for(i=1;i<=NF;i++)
if(len[i]<length($i)) {len[i]=length($i); word[i]=$i} next}
{pr()}
FNR==1{for(i=1;i<=NF;i++) {gsub(/./,"-",word[i]); $i=word[i]}; pr()}' file{,}
| header         | word1 | word2 | word3 |
| -------------- | ----- | ----- | ----- |
| supercalifragi | black | white | red   |
| adc            | bad   | cat   | love  |
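
For readers who want to see the two-pass idea outside awk, here is a minimal Python sketch of the same technique: one pass measures the widest cell per column, a second pass prints every row padded to those widths. The function name csv_to_table is made up for this illustration, and the sketch assumes every row has the same number of fields.

```python
import csv
import io

def csv_to_table(text):
    """Render CSV text as a pipe-delimited table with a dashed rule under the header."""
    # Assumption: the input is rectangular (every row has the same number of cells).
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text)) if row]
    # Pass 1: the widest cell in each column decides that column's width.
    widths = [max(len(cell) for cell in col) for col in zip(*rows)]

    def fmt(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    # Pass 2: header, dashed rule, then the data rows.
    lines = [fmt(rows[0]), fmt(["-" * w for w in widths])]
    lines += [fmt(r) for r in rows[1:]]
    return "\n".join(lines)

SAMPLE = "header,word1,word2,word3\nsupercalifragi,black,white,red\nadc,bad,cat,love\n"
print(csv_to_table(SAMPLE))
```

Stripping each cell also takes care of stray spaces after the commas, which the awk version handles through its field separator.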

It's not exactly the output you asked for but maybe this is all you really need:
$ column -t -s, -o' | ' < file | awk '1; NR==1{gsub(/[^|]/,"-"); print}'
header         | word1 | word2 | word3
---------------|-------|-------|------
supercalifragi | black | white | red
adc            | bad   | cat   | love

Related

Format number in thousands separators with jq json cli

Given {"a": 1234567890}, I want 1,234,567,890 in the result. How can this be done with jq?
echo '{"a": 1234567890}' | jq '.a | FORMAT?'
Thanks to @peak's answer, the solution is
echo '{"a": 1234567890}' | jq -r 'def h: [while(length>0; .[:-3]) | .[-3:]] | reverse | join(","); .a | tostring | h'
//-> 1,234,567,890

Here's an idiomatic one-liner definition:
def h: tostring | [while(length>0; .[:-3]) | .[-3:]] | reverse | join(",");
Example
12, 123, 1234, 12345678 | h
Output (using -r option):
12
123
1,234
12,345,678
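
The grouping logic in that one-liner (repeatedly peel the last three characters off the string, then reverse and join) may be easier to follow in a step-by-step form. Below is a rough Python sketch of the same idea; the name group_thousands is invented here, and the handling of a leading minus sign is an addition that the jq definition does not have.

```python
def group_thousands(n, sep=","):
    """Insert sep between groups of three digits, counted from the right."""
    digits = str(n)
    sign = ""
    if digits.startswith("-"):       # assumption: negative integers should keep their sign
        sign, digits = "-", digits[1:]
    groups = []
    while digits:
        groups.append(digits[-3:])   # take the last (up to) three digits
        digits = digits[:-3]         # and drop them from the remainder
    return sign + sep.join(reversed(groups))

for value in (12, 123, 1234, 12345678):
    print(group_thousands(value))
```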

jq doesn't (yet) have a printf function that formats according to locale settings.
If that's an option for you, you can pass the number to the shell's printf:
echo '{"a": 12345}' | jq '.a' | xargs printf "%'.f\n"
12,345
Note that this conversion relies on the ' flag in the format %'.f, which is explained in man 3 printf.

Here's a generic solution for integers or integer-valued strings:
# "h" for "human-readable"
def h:
  def hh: .[0] as $s | .[1] as $answer
    | if ($s|length) == 0 then $answer
      else ((if $answer == "" then "" else "," end) + $answer ) as $a
      | [$s[0:-3], $s[-3:] + $a] | hh
      end;
  [ tostring, ""] | hh;
Example
12, 123, 1234, 12345678 | h
Result (using -r option):
12
123
1,234
12,345,678

Regexp matches the output when it shouldn't

My output looks like this:
+------+--------------+---------+------------+-----------------------+
| id | IP address | status | type | created at |
+------+--------------+---------+------------+-----------------------+
| sc-1 | | running | r4.2 | Aug 21, 2017 08:09:44 |
| sc-2 | 164.54.39.30 | running | r4.2 | Aug 21, 2017 08:09:44 |
+------+--------------+---------+------------+-----------------------+
I need to check whether sc-1 and sc-2 are "running".
My solution for the first row is the following:
proc check_if_exist_in_output {lines String_to_Check} {
    set err_msg ""
    set Flag "False"
    foreach line $lines {
        if {$line eq $String_to_Check} {
            set Flag "True"
            return $Flag
        }
    }
    if {$Flag == "False"} {return "Error"}
}
This works fine, but the IP address may also appear in the first row, and then my script no longer works.
So I tried a solution with a regexp: I want to take a line from the output and check whether it contains the line I am looking for.
Long story short:
if "| sc-2 | | running |" is part of the line " | sc-2 | 164.54.39.30 | running | ",
the answer should be true.
My new proc with the regexp looks like this:
proc check_if_exist_in_output_with_reg_exspression {lines Str} {
set Flag "False"
foreach line $lines {if {[regexp "$line.*" $Str] == 1}{
set Flag "True"
return $Flag }}
if {$Flag == "False"} {
#FAil step
"Retuen Error"
}
}
#calling the proc:
set lines [split $OutPut "\n"]
set expected_output "| sc-2 | | running |"
set Result [check_if_exist_in_output_reg $lines $expected_output]
But the above proc always returns TRUE, no matter what I send it,
while I expect to get False when the line doesn't really exist.
I also thought about sending the expected result as something like:
set expected_output "| sc-2 |[regexp {(?:\d+\.){3}\d+}]| running |"
But I am not sure how to write it in the correct way.

As suggested by jasonmclose (I am not sure how to tag users here), I changed the regexp to a simpler one, and the proc now works fine like this:
proc check_if_exist_in_output_with_reg_exspression {lines String_to_Check} {
    foreach line $lines {
        # String_to_Check is the pattern, e.g. {^.*sc-2.*running.*}; test it against each line
        if {[regexp -- $String_to_Check $line]} {
            return "True"
        }
    }
    #Fail step
    return "Error"
}

This is not an answer to the OP's question about getting the regexp right, but it is worth mentioning that the whole thing can also be tackled without a regexp. You may want to consider something along these lines:
set data {
+------+--------------+---------+------------+-----------------------+
| id | IP address | status | type | created at |
+------+--------------+---------+------------+-----------------------+
| sc-1 | | running | r4.2 | Aug 21, 2017 08:09:44 |
| sc-2 | 164.54.39.30 | running | r4.2 | Aug 21, 2017 08:09:44 |
+------+--------------+---------+------------+-----------------------+
}
set status [dict create]
foreach line [lrange [split [string trim $data] "\n"] 3 end-1] {
set line [lmap el [split [string trim $line |] |] {string trim $el}]
dict set status [lindex $line 0] [lindex $line 2]
}
puts [dict get $status "sc-2"]
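
The same split-and-trim approach carries over to other languages. As a rough illustration (not part of the original answer), here is a Python sketch that builds the id-to-status mapping from the table; parse_status is a made-up name, and the sketch assumes the id is always the first cell and the status the third.

```python
TABLE = """\
+------+--------------+---------+------------+-----------------------+
| id   | IP address   | status  | type       | created at            |
+------+--------------+---------+------------+-----------------------+
| sc-1 |              | running | r4.2       | Aug 21, 2017 08:09:44 |
| sc-2 | 164.54.39.30 | running | r4.2       | Aug 21, 2017 08:09:44 |
+------+--------------+---------+------------+-----------------------+
"""

def parse_status(table):
    """Map each row's id (first cell) to its status (third cell)."""
    status = {}
    # Border lines start with "+", so keeping "|" lines leaves header + data rows.
    rows = [ln for ln in table.splitlines() if ln.startswith("|")]
    for line in rows[1:]:                      # rows[0] is the header row
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        status[cells[0]] = cells[2]
    return status

status = parse_status(TABLE)
print(status["sc-2"])
```

Because an empty IP cell is still a cell, the status stays at index 2 whether or not the address is present.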

set Output {+------+--------------+---------+------------+-----------------------+
| id | IP address | status | type | created at |
+------+--------------+---------+------------+-----------------------+
| sc-1 | | running | r4.2 | Aug 21, 2017 08:09:44 |
| sc-2 | 164.54.39.30 | running | r4.2 | Aug 21, 2017 08:09:44 |
+------+--------------+---------+------------+-----------------------+};
proc check_if_exist_in_output_reg {lines Str} {
set Flag "False"
if {[regexp -linestop $Str $lines a] ==1} { ; # -linestop keeps . and [^...] from matching across newlines; the match is stored in variable a
#puts $a
set Flag "True"
return $Flag
} else {
return "Error"
}
}
#calling the proc:
#set lines [split $Output "\n"] : not needed
set expected_output {sc-1.*running} ; # braces prevent substitution inside the pattern
set Result [check_if_exist_in_output_reg $Output $expected_output]
puts $Result
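
The question also asked how to write a pattern in which the IP address may or may not be present. One possible shape, sketched here in Python rather than Tcl, is to make the address group optional between the two cell separators. The names ROW and is_running are hypothetical, and the pattern assumes ids and statuses contain no spaces.

```python
import re

# id | optional IPv4 address | status, anchored at the start of a table row.
ROW = re.compile(
    r"^\|\s*(?P<id>\S+)\s*\|\s*(?P<ip>(?:\d+\.){3}\d+)?\s*\|\s*(?P<status>\S+)\s*\|"
)

LINES = [
    "+------+--------------+---------+------------+-----------------------+",
    "| id | IP address | status | type | created at |",
    "| sc-1 | | running | r4.2 | Aug 21, 2017 08:09:44 |",
    "| sc-2 | 164.54.39.30 | running | r4.2 | Aug 21, 2017 08:09:44 |",
]

def is_running(lines, row_id):
    """Return True if the row with the given id has status 'running'."""
    for line in lines:
        m = ROW.match(line)
        if m and m.group("id") == row_id:
            return m.group("status") == "running"
    return False

print(is_running(LINES, "sc-1"), is_running(LINES, "sc-2"))
```

Note that a literal | has to be escaped in the pattern; unescaped, it means alternation.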

CSV output file using command line for wireshark IO graph statistics

I save the IO graph statistics as a CSV file containing the bits per second using the Wireshark GUI. Is there a way to generate this CSV file with command-line tshark? I can generate the statistics on the command line as bytes per second as follows:
tshark -nr test.pcap -q -z io,stat,1,BYTES
How do I generate bits/second and save it to a CSV file?
Any help is appreciated.

I don't know a way to do that using only tshark, but you can easily parse the output from tshark into a CSV file:
tshark -nr tmp.pcap -q -z io,stat,1,BYTES | grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" | awk -F '[ |]+' '{print $2","($5*8)}'
Explanations
grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" selects only the rows of the tshark output that contain actual data (i.e., second <> second | transmitted bytes).
awk -F '[ |]+' '{print $2","($5*8)}' splits each row into 5 fields with [ |]+ as the separator and prints fields 2 (the second at which the interval starts) and 5 (the transmitted bytes, multiplied by 8 to get bits) with a comma between them.
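
If the grep and awk steps feel opaque, the same filter-and-convert logic can be sketched in Python. This is only an illustration: the function name and the SAMPLE report (including its byte counts) are made up, and unlike the grep pattern above, this regex also accepts fractional interval bounds and the literal end marker Dur.

```python
import re

# One data row of `tshark -z io,stat,1,BYTES`: "| start <> end | bytes |".
ROW = re.compile(r"(\d+(?:\.\d+)?)\s+<>\s+(\d+(?:\.\d+)?|Dur)\s*\|\s+(\d+)")

def bytes_rows_to_bits_csv(report):
    """Yield 'start,bits' lines for every data row found in the report text."""
    for line in report.splitlines():
        m = ROW.search(line)
        if m:
            yield f"{m.group(1)},{int(m.group(3)) * 8}"   # bytes -> bits

# Hypothetical sample report; the numbers are invented for the example.
SAMPLE = """\
| Interval | Bytes |
|  0 <>  1 |  1500 |
|  1 <>  2 |   250 |
"""
print("\n".join(bytes_rows_to_bits_csv(SAMPLE)))
```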

Another thing that may be good to know:
If you change the interval from 1 second to 0.5 seconds, you have to allow a decimal point in the grep pattern by adding \. between the digit groups (\d).
Otherwise the result will be an empty *.csv file.
grep -P "\d{1,2}\.{1}\d{1,2}\s+<>\s+\d{1,2}\.{1}\d{1,2}\s*\|\s+\d+"

The answers in this thread gave me the keys to solving a similar problem with tshark io stats and I wanted to share the results and how it works. In my case, the task was to convert multiple columns of tshark io stat records with potential decimals in the data. This answer converts multiple data columns to csv, adds rudimentary headers, accounts for decimals in fields and variable numbers of spaces.
Complete command string
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10" \
| grep -P "\d+\.?\d*\s+<>\s+|Interval +\|" \
| tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g' > somefile.csv
Explanation
The command string has 3 major parts:
1. tshark creates the report with the data in columns.
2. grep extracts the desired lines.
3. tr and sed convert the records grep matched into a CSV file.
Part 1: tshark creates the report with the data in columns
tshark is run with -z io,stat at a 30 second interval, counting frames and bytes with various filters.
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10"
Here is the output when run against my test pcap file:
=================================================================================================
| IO Statistics |
| |
| Duration: 179.179180 secs |
| Interval: 30 secs |
| |
| Col 1: Frames and bytes |
| 2: FRAMES |
| 3: BYTES |
| 4: FRAMES()ip.src == 10.10.10.10 |
| 5: BYTES()ip.src == 10.10.10.10 |
| 6: FRAMES()ip.dst == 10.10.10.10 |
| 7: BYTES()ip.dst == 10.10.10.10 |
|-----------------------------------------------------------------------------------------------|
| |1 |2 |3 |4 |5 |6 |7 |
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
|-----------------------------------------------------------------------------------------------|
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
=================================================================================================
Considerations
Looking at this output, we can see several items to consider:
Rows with data have a unique sequence in the Interval column of "space<>space", which we can use for matching.
We want the header line, so we will use the word "Interval" followed by spaces and then a "|" character.
The number of spaces in a column are variable depending on the number of digits per measurement.
The Interval column gives both the time from 0 and the time from the first measurement. Either can be used, so we will keep both and let the user decide.
When using milliseconds there will be decimals in the Interval field
Depending on the statistic requested, there may be decimals in the data columns
The use of "|" as delimiters will require escaping in any regex statement that covers them.
Part 2: Extract the desired lines with grep
Once tshark produces output, we use grep with regex to extract the lines we want to save.
grep -P "\d+\.?\d*\s+<>\s+|Interval +\|"
grep will use the "Digit(s)Space(s)<>Space(s)" character sequence in the Interval column to match the lines with data. It also uses an OR to grab the header by matching the characters "Interval |".
grep -P # The "-P" flag turns on PCRE regex matching, which is not the same as egrep. With egrep, you will need to change the escaping.
"\d+ # Match on 1 or more Digits. This is the 1st set of numbers in the Interval column.
\.? # 0 or 1 Periods. We need this to handle possible fractional seconds.
\d* # 0 or more Digits. To handle possible fractional seconds.
\s+<>\s+ # 1 or more Spaces followed by the Characters "<>", then 1 or more Spaces.
| # Since this is not escaped, it is a regex OR
Interval +\|" # Match the String "Interval" followed by 1 or more Spaces and a literal "|".
From the tshark output, grep matched these lines:
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
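
To check which lines a pattern like this keeps, it can help to try it in isolation. The following Python sketch applies the same alternation with the re module; REPORT is an abbreviated excerpt of the tshark output above (fewer columns), and select_lines is a name invented for this example.

```python
import re

# Same alternation as the grep -P pattern: a data row, or the column-header row.
KEEP = re.compile(r"\d+\.?\d*\s+<>\s+|Interval +\|")

REPORT = """\
| Duration: 179.179180 secs |
| Interval: 30 secs |
| Interval | Frames | Bytes |
| 0 <> 30 | 107813 | 120111352 |
| 150 <> Dur | 123736 | 142639416 |
"""

def select_lines(report):
    """Return only the lines the pattern matches, like grep would."""
    return [line for line in report.splitlines() if KEEP.search(line)]

for line in select_lines(REPORT):
    print(line)
```

The "Interval: 30 secs" line is rejected because the word is followed by a colon rather than by spaces and a "|".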
Part 3: Use tr and sed to convert the records grep matched into a csv delimited file.
tr and sed are used to convert the lines grep matched into CSV. tr does the bulk of the work, removing spaces and changing "|" to ",". This is simpler and faster than using sed. However, sed is used for some cleanup work:
tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g'
Here is how these commands perform the conversion. The first trick is to get rid of all of the spaces. This means we don't have to account for them in any regex sequences, making the rest of the work simpler.
| tr -d " " # Spaces are in the way, so delete them.
| tr "|" "," # Change all "|" Characters to ",".
| sed -E 's/<>/,/; # Change "<>" to "," splitting the Interval column.
s/(^,|,$)//g; # Delete leading and/or trailing "," on each line.
s/Interval/Start,Stop/g' # Each of the "Interval" columns needs a header, so change the text "Interval" into two words with a , separating them.
> somefile.csv # Redirect the output into somefile.csv
Final result
Once through this process, we have a csv output that can now be imported into your favorite csv tool, spreadsheet, or fed to a graphing program like gnuplot.
$ cat somefile.csv
Start,Stop,Frames,Bytes,FRAMES,BYTES,FRAMES,BYTES,FRAMES,BYTES
0,30,107813,120111352,107813,120111352,26682,15294257,80994,104808983
30,60,122437,124508575,122437,124508575,49331,17080888,73017,107422509
60,90,138999,135488315,138999,135488315,54829,22130920,84029,113348686
90,120,158241,217781653,158241,217781653,42103,15870237,115971,201901201
120,150,111708,131890800,111708,131890800,43709,18800647,67871,113082296
150,Dur,123736,142639416,123736,142639416,50754,22053280,72786,120574520
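
For comparison, the tr and sed steps map almost one-to-one onto string operations in a scripting language. Here is a minimal Python sketch of the per-line conversion; row_to_csv is a hypothetical helper, and the example rows are shortened versions of the lines shown above.

```python
import re

def row_to_csv(line):
    """Convert one grep-matched report line into a CSV record."""
    line = line.replace(" ", "")                    # tr -d " "
    line = line.replace("|", ",")                   # tr "|" ","
    line = line.replace("<>", ",", 1)               # s/<>/,/
    line = re.sub(r"^,|,$", "", line)               # s/(^,|,$)//g
    return line.replace("Interval", "Start,Stop")   # s/Interval/Start,Stop/g

for row in ("| Interval | Frames | Bytes |",
            "| 0 <> 30 | 107813 | 120111352 |",
            "| 150 <> Dur | 123736 | 142639416 |"):
    print(row_to_csv(row))
```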

Unable to fix the logic of the bash script

I have a table (say UserInputDetails) with the following entries:
+------------+-----------+----------+
| screenId | userInput | numInput |
+------------+-----------+----------+
| 13_1_2_1 | 2 | 9 |
| 13_1_2_2 | 2 | 9 |
| 13_1_2_2 | 3 | 2 |
| 13_1_2_2 | 9 | 2 |
| 13_1_2_2_2 | 3 | 3 |
| 13_1_2_2_2 | 5 | 2 |
| 13_2_2_2 | 4 | 4 |
| 13_2_2_2 | 5 | 4 |
| 13_2_2_2 | 7 | 2 |
+------------+-----------+----------+
I need to write a shell script that produces the following output:
13_1_2_1,0,0,9,0,0,0,0,0,0,0
13_1_2_2,0,0,9,2,0,0,0,0,0,2
13_1_2_2_2,0,0,0,3,0,2,0,0,0,0
13_2_2_2,0,0,0,0,4,4,0,2,0,0
Explanation for the output:
The first line of the output gives the numInput values per userInput for screenId '13_1_2_1'. Each line first prints the screenId and then the corresponding numInput for userInput 0-9. Since the numInput for userInput '2' is 9 and is 0 for the rest of 0-9, the line is 13_1_2_1,0,0,9,0,0,0,0,0,0,0
The bash script I wrote for this is:
#!/bin/bash
MYSQL="mysql -uroot -proot -N Database1"
yesterday=""
if [ $# -ge 1 ]
then
yesterday="$1"
else
yesterday=`$MYSQL -sBe "select date_sub(date(now()), interval 1 day);"`
fi
echo "DATE: $yesterday"
PREVSCREENID=''
SCREENID=
ABC=tempSqlDataFile
$MYSQL -sBe "select screenId, userInput, numInput from userInputDetails group by screenID, userInput" > $ABC
for i in {0..9}
do
arr[$i]='0'
done
while read line
do
SCREENID=`echo $line | awk '{ print $1 }'`
i=`echo $line | awk '{print $2 }'`
arr[$i]=`echo $line | awk '{print $3}'`
if [[ $SCREENID != $PREVSCREENID ]]
then
echo "$SCREENID ${arr[*]}" | tr ' ' ','
for i in {0..9}
do
arr[$i]='0'
done
else
i=`echo $line | awk '{print $2 + 1}'`
arr[$i]=`echo $line | awk '{print $3}'`
fi
PREVSCREENID=$SCREENID
done < $ABC
The logic is going wrong somewhere, and I am unable to get it right. The output of the above shell script is:
13_1_2_1,0,0,9,0,0,0,0,0,0,0,
13_1_2_2,0,0,9,0,0,0,0,0,0,0,
13_1_2_2_2,0,0,9,3,0,0,0,0,0,2,
13_2_2_2,0,0,0,3,4,2,0,0,0,0,
Can you please help me fix the logic in my script? Also, since I am new to scripting and programming, this may not be an efficient way to perform this task. Please suggest a more efficient way if there is one.

There are a number of errors in your script. Here's a rewrite of the latter part:
while read SCREENID i n; do
    if [[ "$SCREENID" != "$PREVSCREENID" ]]; then
        [ "$PREVSCREENID" ] && echo "$PREVSCREENID ${arr[*]}" | tr ' ' ,
        for j in {0..9}; do arr[$j]=0; done
    fi
    arr[$i]="$n"
    PREVSCREENID="$SCREENID"
done < "$ABC"
echo "$PREVSCREENID ${arr[*]}" | tr ' ' ,
You can avoid calling tr like this:
# local IFS=, joins the array with commas and is restored when the function returns
print_arr() { local IFS=,; echo "$PREVSCREENID,${arr[*]}"; }
while read SCREENID i n; do
    if [[ "$SCREENID" != "$PREVSCREENID" ]]; then
        [ "$PREVSCREENID" ] && print_arr
        for j in {0..9}; do arr[$j]=0; done
    fi
    arr[$i]="$n"
    PREVSCREENID="$SCREENID"
done < "$ABC"
print_arr
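
Since the question also asked for a more efficient approach, here is a rough Python sketch of the same pivot, offered as an illustration rather than a drop-in replacement. The function name pivot is made up; the rows are the ones from the table in the question, and userInput is assumed to always be in the range 0-9.

```python
def pivot(rows, slots=10):
    """Turn (screen_id, user_input, num_input) rows into one CSV line per screen_id."""
    out = {}                                   # dicts keep insertion order (Python 3.7+)
    for screen_id, user_input, num_input in rows:
        counts = out.setdefault(screen_id, [0] * slots)
        counts[int(user_input)] = int(num_input)
    return [",".join([sid] + [str(c) for c in counts]) for sid, counts in out.items()]

ROWS = [
    ("13_1_2_1", 2, 9),
    ("13_1_2_2", 2, 9), ("13_1_2_2", 3, 2), ("13_1_2_2", 9, 2),
    ("13_1_2_2_2", 3, 3), ("13_1_2_2_2", 5, 2),
    ("13_2_2_2", 4, 4), ("13_2_2_2", 5, 4), ("13_2_2_2", 7, 2),
]
print("\n".join(pivot(ROWS)))
```

Keying on the screenId removes the need to track the previous id, so the rows do not even have to arrive grouped.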

How to parse out a grid of numbers in pictorial form?

Assuming each cell of the grid can contain any number from 1 to 99, what's the simplest way to recognize each number?
For example:
-------------
| 1 | 2 | 3 |
|-----------|
|11 | 12| 13|
|-----------|
|4 | 5 | 6 |
|-----------|
How do I parse them into a 2 dimensional array? Language doesn't matter, I just want to get a general solution.
Thanks,

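If you know that to be the format, I would go with either a regex or simple string splitting.
Examples in Perl:
REGEX:
my @data;
while ( <FILE> ) {
    # skip lines that do not contain three numbers (the dashed separators)
    next unless /\D*(\d+)\D+(\d+)\D+(\d+)\D*/;
    push @data, [ $1, $2, $3 ];    # one array reference per grid row
}
STRING OPS:
my @data;
while ( <FILE> ) {
    next unless /\d/;
    my @cells = split /\|/, $_;    # escape |, which otherwise means alternation
    s/\s+//g for @cells;           # strip padding and the trailing newline
    push @data, [ grep { length } @cells ];    # drop the empty edge fields
}
Or something to that effect.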
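
Since the question says the language doesn't matter, here is the regex idea as a short Python sketch as well. It simply collects every run of digits on each line that contains one; parse_grid is a made-up name, and the sketch assumes the cells hold only non-negative integers.

```python
import re

GRID = """\
-------------
| 1 | 2 | 3 |
|-----------|
|11 | 12| 13|
|-----------|
|4 | 5 | 6 |
|-----------|
"""

def parse_grid(text):
    """Return one list of ints per line that contains digits."""
    return [[int(n) for n in re.findall(r"\d+", line)]
            for line in text.splitlines() if re.search(r"\d", line)]

print(parse_grid(GRID))
```

This version does not care how many columns a row has or how the cells are padded.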
