I am trying to create an HTML table from a CSV file. The CSV file has three fields that are already filtered so that they do not cause problems, but when I run the code, the report file contains no output. There must be an error in how I iterate over the file, but I cannot see where else the failure could be:
The csv input looks similar to this:
I cannot see an obvious error in your file, but I was able to generate the required HTML file using an awk script as follows (the correct #! path can be found by running which awk in a terminal):
#!/usr/bin/awk -f
BEGIN {
    FS = ","
    print "<!DOCTYPE html>\n<html>\n<head><title>Report</title>"
    print "<link rel=\"stylesheet\" href=\"style.css\">"
    print "<meta charset=\"utf-8\"/>"
    print "</head>\n<body>\n<div class=\"head-style\"><h2>Report</h2>\n</div>"
}
NR < 2 {
    # first line: open the table and print the header cells
    print "<table>\n<tr><th>"$1"</th><th>"$2"</th><th>"$3"</th></tr>"
}
NR > 1 {
    # remaining lines: print one table row each
    print "<tr><td>"$1"</td><td>"$2"</td><td>"$3"</td></tr>"
}
END {
    print "</table>\n<div class=\"footer\">\n<p>0000 st</p>\n</div>\n</body>\n</html>"
}
I formatted the output using field references ($1, etc.) between quoted strings. Note also that double quotes can be backslash-escaped for printing.
I saved the script as awkScript.awk and made it executable from the command line using:
chmod +x awkScript.awk
This can then be executed on the csv file with the command:
./awkScript.awk rm.csv > rep.html
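For illustration only (the CSV sample above wasn't shown, so this input is hypothetical), an rm.csv such as:
Name,Qty,Price
Oak,12,3.50
would yield a rep.html whose table section reads:
<table>
<tr><th>Name</th><th>Qty</th><th>Price</th></tr>
<tr><td>Oak</td><td>12</td><td>3.50</td></tr>
</table>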
It looks like you forgot to pass the fields as parameters to the function print_line(Platan, Recl, Tror).
# Since "Platan" and "Recl" are strings, I think the format string
# should be: "%s %s %.2f %s\n" (BTW, I included "\n" to improve readability).
function print_line(Platan, Recl, Tror) {
    printf("%s %s %.2f %s\n", "<tr><td>"Platan"</td>", "<td>"Recl"</td><td>", Tror, "</td></tr>")
}
{
    if (NR > 1) {
        print_line($1, $2, $3)  # should solve
    }
}
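As a quick check (the input line is hypothetical, and fix.awk is an assumed name for a file holding the snippet above), piping a header plus one data row through it:
$ printf 'A,B,C\nOak,12,3.14159\n' | awk -F, -f fix.awk
<tr><td>Oak</td> <td>12</td><td> 3.14 </td></tr>
The header line is skipped by the NR > 1 guard, and %.2f rounds the third field to two decimals.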
I have a file containing multiple JSON objects and need to convert them to a single JSON array. I have bash and Excel installed, but cannot install any other tool.
{"name": "a","age":"17"}
{"name":"b","age":"18"}
To:
[{"name": "a","age":"17"},
{"name":"b","age":"18"}]
Assuming one object per line, as shown in the OP's question:
echo -n "["; while IFS= read -r line; do echo "${line},"; done < test.txt | sed -re '$ s/(.*),/\1]/'
The sed expression runs only on the last line (the $ address) and replaces its trailing comma with ].
Result:
[{"name": "a","age":"17"},
{"name":"b","age":"18"}]
Inspired by https://askubuntu.com/a/475804
awk '(NR==FNR){count++} (NR!=FNR){ if (FNR==1) printf("["); printf("%s", $0); print (FNR==count)?"]":"," } END {if (count==0) print "[]"}' file file
A less compact but more readable version:
awk '
(NR==FNR) {
    count++;
}
(NR!=FNR) {
    if (FNR==1)
        printf("[");
    printf("%s", $0);
    if (FNR==count)
        print "]";
    else
        print ",";
}
END {
    if (count==0) print "[]";
}' file file
The trick is to give the same file twice to awk. Because NR==FNR is true only while the first copy is being read, the first pass is dedicated to counting the number of lines into the variable count.
The second pass, with NR!=FNR, applies the following algorithm to each line:
Write [ for the first line only
Then write the record, using printf instead of print in order to avoid the newline ending
Then write either ] or , depending on whether we are on the last line or not, using print in order to end with a newline
The END block is just a failsafe that outputs an empty array in case the file is empty.
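To see the two counters in action, this throwaway one-liner (file stands in for your input) prints NR and FNR for every record; NR keeps growing across both copies while FNR resets at the start of the second:
awk '{ print NR, FNR }' file file
For a two-line file it prints 1 1 and 2 2 on the first pass (NR==FNR), then 3 1 and 4 2 on the second (NR!=FNR).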
Assumptions:
no requirement to (re)format the input data
Sample input:
$ cat raw.dat
{"name": "a","age":"17"}
{"name":"b","age":"18"}
{"name":"C","age":"23"}
One awk idea:
awk 'BEGIN {pfx="["} {printf "%s%s",pfx,$0; pfx=",\n"} END {printf "]\n"}' raw.dat
Where:
for each input line we printf the line without a terminating linefeed
for the first line we use a prefix (pfx) of [
for subsequent lines the prefix (pfx) is set to ,\n (i.e., terminating the previous line with ,\n)
once the file has been processed we terminate the last input line with a printf "]\n"
requires a single pass through the input file
This generates:
[{"name": "a","age":"17"},
{"name":"b","age":"18"},
{"name":"C","age":"23"}]
Making sure @chepner's comment (re: a sed solution) isn't lost in the mix:
sed '1s/^/[/;2,$s/^/,/;$s/$/]/' raw.dat
This prepends [ to the first line, prepends , to every line from the second onward, and appends ] to the last line.
This generates:
[{"name": "a","age":"17"}
,{"name":"b","age":"18"}
,{"name":"C","age":"23"}]
NOTE: I can remove this if @chepner wants to post this as an answer.
If I have a file with content like this:
{"id":"doi-platelemetry-doi/doiplatelemetry","name":"doi/doiplatelemetry","location":"doi/doiplatelemetry/2.0.0/doiplatelemetry:2.0.0_49","component":"doi","tag":"2.0.0_49"},{"id":"doi-maintenance-service-doi/maintenance-service","name":"doi/maintenance-service","location":"doi/1.0.0/maintenance-service:1.0.0.681","component":"doi","tag":"1.0.0.681"}
How do we replace all / with - only in the value of the location field? Meaning, after the replacement, the file content should be:
{"id":"doi-platelemetry-doi/doiplatelemetry","name":"doi/doiplatelemetry","location":"doi-doiplatelemetry-2.0.0-doiplatelemetry:2.0.0_49","component":"doi","tag":"2.0.0_49"},{"id":"doi-maintenance-service-doi/maintenance-service","name":"doi/maintenance-service","location":"doi-1.0.0-maintenance-service:1.0.0.681","component":"doi","tag":"1.0.0.681"}
This can be very easily achieved using jq but I am looking for a solution that works even when tools like jq are not available.
With awk:
awk 'BEGIN { RS=ORS="," } /location/ { gsub("/","-") } 1' file > file.tmp && mv -f file.tmp file
This sets the record separator to ",", so each comma-separated chunk becomes its own record; when a record contains location, gsub replaces every "/" with "-". Note that the final record picks up a trailing comma (ORS is appended to every record), which you may want to strip afterwards.
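If you want to be strict about touching only the location value (for instance, if another field's value could contain the word location), here is a sketch that rewrites just the quoted value following "location": (it assumes values never contain escaped quotes):
awk '{
    out = ""; s = $0
    while (match(s, /"location":"[^"]*"/)) {
        val = substr(s, RSTART, RLENGTH)     # the matched "location":"..." chunk
        gsub("/", "-", val)                  # slashes occur only in the value part
        out = out substr(s, 1, RSTART-1) val
        s = substr(s, RSTART+RLENGTH)        # resume scanning after the match
    }
    print out s
}' file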
I'm working on parsing JSON data using JSON.sh, and I want to read data from a JSON file (test.json) whose content looks something like this:
{
"/home/ukrishnan/projects/test.yml": {
"LOG_DRIVER": "syslog",
"IMAGE": "mysql:5.6"
},
"/home/ukrishnan/projects/mysql/app.xml": {
"ENV_ACCOUNT_BRIDGE_ENDPOINT": "/u01/src/test/sample.txt"
}
}
I try to parse this JSON with JSON.sh using:
test_parser=`sh ./lib/JSON.sh < test/test.json`
echo $test_parser
It prints:
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog" ["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6" ["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"} ["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt" ["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"} [] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
Whereas if I run the same command (sh ./lib/JSON.sh < test/test.json) directly in the terminal, it prints with line breaks:
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog"
["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6"
["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"}
["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt"
["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}
[] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
I want to read this output and assign it to bash variables like:
file_name='/home/ukrishnan/projects/test.yml'
key='LOG_DRIVER'
value='syslog'
As I'm almost completely new to shell scripting, grep, and awk, I don't have much idea how to achieve this. Any help would be greatly appreciated.
I wrote a JSON serializer / deserializer for gawk, if you're interested. Save that script and modify it, replacing everything above # === FUNCTIONS === with the following:
#!/usr/bin/gawk -f
# capture JSON string from beginning to end into a scalar variable
{ json = json ORS $0 }
END {
# objectify JSON string to the multilevel array "obj"
deserialize(json, obj)
for (filename in obj) {
print "file_name=" quote(filename)
for (key in obj[filename]) {
# print key="value"
print key "=" quote(obj[filename][key])
}
}
}
Do chmod 755 json.awk and execute it. Output will resemble this:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
ENV_ACCOUNT_BRIDGE_ENDPOINT="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
LOG_DRIVER="syslog"
IMAGE="mysql:5.6"
Hopefully the logic is reasonably easy to follow. If you prefer to output filename=, key=, and value= on every loop iteration, modify the nested for loops accordingly:
for (filename in obj) {
for (key in obj[filename]) {
print "file_name=" quote(filename)
print "key=" quote(key)
print "value=" quote(obj[filename][key])
}
}
That change will result in the following output:
$ ./json.awk test5.json
file_name="/home/ukrishnan/projects/mysql/app.xml"
key="ENV_ACCOUNT_BRIDGE_ENDPOINT"
value="/u01/src/test/sample.txt"
file_name="/home/ukrishnan/projects/test.yml"
key="LOG_DRIVER"
value="syslog"
file_name="/home/ukrishnan/projects/test.yml"
key="IMAGE"
value="mysql:5.6"
Anyway, with that output, you can do something silly in bash like this to populate and act upon the variables (be aware that eval executes each line, so only feed it trusted input):
#!/bin/bash
./test.awk test5.json | while read -r line; do {
eval $line
[ "${line/=*/}" = "value" ] && {
echo "bash: file_name=$file_name"
echo "bash: key=$key"
echo "bash: value=$value"
echo "------"
}
}; done
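If you'd rather not eval arbitrary lines, here is a sketch that parses the key="value" output by hand; it assumes the values never contain embedded double quotes:
#!/bin/bash
./json.awk test5.json | while IFS='=' read -r name rest; do
    val=${rest#\"}; val=${val%\"}    # strip the surrounding quotes
    case $name in
        file_name) file_name=$val ;;
        key)       key=$val ;;
        value)     value=$val
                   echo "bash: file_name=$file_name"
                   echo "bash: key=$key"
                   echo "bash: value=$value"
                   echo "------" ;;
    esac
done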
It'd probably be more graceful just to do all processing within gawk from start to finish and not mess with the polyglot handoff, though.
Getting back to json.awk, if you prefer to keep json.awk modular for easy reuse in future projects, you could remove everything above # === FUNCTIONS ===, create a separate main.awk containing the code block at the top of this answer, and #include "json.awk" as a helper library pretty much anywhere outside of END {...} (just below the shebang, for example).
JSON.sh (from http://json.org) offers a nice bash-friendly means of flattening a JSON file; you've already shown in your question what that looks like. So, the flattened form has the format:
[node] tab value
To extract the information you want, think in terms of UNIX text processing: the lines you're interested in actually follow this pattern:
["filename","key"] tab ["value"]
In regex notation, we replace:
filename with (.*)
key with (.*)
tab with \t
value with (.*)
We can retrieve the first, second and third matching groups with \1, \2, \3 respectively.
When used in sed, we also note that the symbols []() need to be escaped with a backslash \, resulting in the following script:
./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d'
/home/ukrishnan/projects/test.yml,LOG_DRIVER,syslog
/home/ukrishnan/projects/test.yml,IMAGE,mysql:5.6
/home/ukrishnan/projects/mysql/app.xml,ENV_ACCOUNT_BRIDGE_ENDPOINT,/u01/src/test/sample.txt
Now we put the lines in a loop and, for each line, extract filename, key, and value (this relies on the values containing no whitespace, which holds for the sample data):
for line in $(./lib/JSON.sh < test/test.json | sed 's/\["\(.*\)","\(.*\)\"]\t"\(.*\)"/\1,\2,\3/;t;d')
do
IFS="," read -ra arr <<< "$line"
filename=${arr[0]}
key=${arr[1]}
value=${arr[2]}
cat <<EOF
filename : $filename
key : $key
value : $value
EOF
done
Which outputs:
filename : /home/ukrishnan/projects/test.yml
key : LOG_DRIVER
value : syslog
filename : /home/ukrishnan/projects/test.yml
key : IMAGE
value : mysql:5.6
filename : /home/ukrishnan/projects/mysql/app.xml
key : ENV_ACCOUNT_BRIDGE_ENDPOINT
value : /u01/src/test/sample.txt
OK, I have a CSV file I need to split based on a column value, which is fine, but I cannot get the header to print in each file.
Currently I use:
awk "FS =\",\" {output=$3\".csv\"; print $0 > output}" test.csv
This splits the file based on column 3, but I don't know how to add the header to each file.
I've searched high and low but can't find a solution that will work as a one-liner...
UPDATE
OK, to date we have a working one-liner:
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr>$3\".csv\"}{print>$3\".csv\"}" test.csv
Or in test.awk:
BEGIN{FS=","} NR==1 {hdr=$0;next}!($3 in files) {files[$3]=1;print hdr>$3".csv"}{print>$3".csv"}
Command to run used:
awk -f test.awk test.csv
I really appreciate the help here. I've been trying for hours and have a few things left to work out:
1) Blank line inserted after header
2) Sort the data on specified fields
Further down the line I also want to do a row count and cut a reference number from another file. Is this possible with awk, or am I using the wrong tool for the job?
Thanks again.
UPDATE #2: a blank line is now inserted after the header line.
UPDATE:
Try this:
On Unix/Cygwin (I tested on Cygwin):
awk -F, 'NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr"\n">$3".csv"}{print>$3".csv"}' test.csv
Or adding Kent's ideas:
awk -F, 'NR==1{hdr=$0;next}{out=$3".csv"}!($3 in files){files[$3];print hdr"\n">out}{print>out}' test.csv
On Windows cmd (not tested):
awk -F, "NR==1{hdr=$0;next}!($3 in files){files[$3]=1;print hdr\"\n\">$3\".csv\"}{print>$3\".csv\"}" test.csv
This stores the header line of test.csv in hdr. For each subsequent line, it checks whether the output file name already exists in the files array; if not, it records the name and prints the header line. Either way, it then prints the whole line to the file.
Example file:
$ cat test.csv
A,B,C,D
1,2,a,3
4,5,b,4
Output
$ cat a.csv
A,B,C,D
1,2,a,3
$ cat b.csv
A,B,C,D
4,5,b,4
ADDED
If you would like to put the awk script into a file, you could try this (I cannot test it, sorry):
test.awk
BEGIN{FS=","}
NR==1 {hdr=$0;next}
!($3 in files) {files[$3]=1;print hdr"\n">$3".csv"}
{print>$3".csv"}
Then you may call it as:
awk -f test.awk test.csv
awk -F, 'NR==1{h=$0;next}{out=$3".csv";
if (!(out in a)) print h > out; print $0 > out; a[out]}' test.csv
Try something like this; getline in the BEGIN block consumes the header line before the main rules run:
awk -F, '
BEGIN {
getline header
}
{
out=$3".csv"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}' test.csv
Windows version: (Not tested)
awk "BEGIN {
FS = \",\"
getline header
}
{
out=$3\".csv\"
if (!($3 in seen)) {
print header > out
}
print $0 > out
seen[$3]
}" test.csv
awk 'NR==1 { hdr=$0; next }
{ output=$3".csv"; if (!(output in a)) print hdr > output; a[output]
print > output }' FS=, test.csv
I have to extract data from a JSON file depending on a specific key. The data then has to be filtered (based on the key value) and separated into different fixed-width flat files. I have to develop a solution using shell scripting.
Since the data is just key:value pairs, I can extract them by processing each line of the JSON file, checking the type, and writing the values to the corresponding fixed-width file.
My problem is that the input JSON file is approximately 5GB in size. My method is very basic, and I would like to know if there is a better way to achieve this using shell scripting?
A sample JSON file would look like this:
{"Type":"Mail","id":"101","Subject":"How are you ?","Attachment":"true"}
{"Type":"Chat","id":"12ABD","Mode:Online"}
The above is a sample of the kind of data I need to process.
Give this a try:
#!/usr/bin/awk -f
{
    line = ""
    gsub("[{}\x22]", "", $0)           # strip braces and double quotes (\x22)
    f = split($0, a, "[:,]")           # split the rest on colons and commas
    for (i=1; i<=f; i++)
        if (a[i] == "Type")
            file = a[++i]              # use the Type value as the output file name
        else
            line = line sprintf("%-15s", a[i])
    print line > (file ".fixed.out")
}
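Assuming the script is saved as, say, split.awk (a hypothetical name), run it with:
awk -f split.awk input.json
where input.json contains one JSON object per line, as in the sample above.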
I made assumptions based on the sample data provided, and a lot may need to be changed if the data varies much from what you've shown. In particular, this script will not work properly if the data values or field names contain colons, commas, quotes or braces. If this is a problem, it's one of the primary reasons that a proper JSON parser should be used. If it were my assignment, I'd push back hard on this point to get permission to use the proper tools.
This outputs lines that have type "Mail" to a file named "Mail.fixed.out" and type "Chat" to "Chat.fixed.out", etc.
The "Type" field name and field value ("Mail", etc.) are not output as part of the contents. This can be changed.
Otherwise, both the field names and values are output. This can be changed.
The field widths are all fixed at 15 characters, padded with spaces, with no delimiters. The field width can be changed, etc.
Let me know how close this comes to what you're looking for and I can make some adjustments.
A Perl script:
#!/usr/bin/perl -w
use strict;
use warnings;
no strict 'refs'; # for FileCache
use FileCache; # avoid exceeding system's maximum number of file descriptors
use JSON;
my $type;
my $json = JSON->new->utf8(1); #NOTE: expect utf-8 strings
while(my $line = <>) { # for each input line
# extract type
eval { $type = $json->decode($line)->{Type} };
$type = 'json_decode_error' if $@;
$type ||= 'missing_type';
# print to the appropriate file
my $fh = cacheout '>>', "$type.out";
print $fh $line; #NOTE: use cache if there are too many hdd seeks
}
The corresponding shell script:
#!/bin/bash
#NOTE: bash is used to create non-ascii filenames correctly
__extract_type()
{
perl -MJSON -e 'print from_json(shift)->{Type}' "$1"
}
__process_input()
{
local IFS=$'\n'
while read -r line; do # for each input line
# extract type
local type="$(__extract_type "$line" 2>/dev/null ||
echo json_decode_error)"
[ -z "$type" ] && local type=missing_type
# print to the appropriate file
echo "$line" >> "$type.out"
done
}
__process_input
Example:
$ ./script-name < input_file
$ ls -1 *.out
json_decode_error.out
Mail.out