In my hypothetical folder /hd/log/, I have about two dozen folders, and each folder has log files named in this format: foldername.2017.07.09.log. I have a crontab that gzips the last log file every night, so there is a new log file with a new name every day.
I am trying to create a dynamic JSON file whose output looks like this:
[
{
"Foldername": "foldername",
"lastmodifiedfile": "/hd/log/foldername/foldername.2017.07.09.log"
},
{
"Foldername": "foldername2",
"lastmodifiedfile": "/hd/log/foldername2/foldername2.2017.07.09.log"
}
]
The bash script should dynamically create an array entry for each subfolder name (in case more folders are added or names are changed) and also give a direct link to the last modified file.
I already have a PHP program to parse the JSON file, but no sane way to create this JSON file dynamically.
Any help or pointers are appreciated.
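printf "%s" "["
sep=""
# -mindepth 1 keeps /hd/log itself out of the list (supported by GNU and BSD find)
for var in $(find /hd/log -mindepth 1 -type d)
do
# newest entry first; the command itself must not be wrapped in quotes
path=$(ls -1t "$var" | head -1)
echo "$var/$path" | awk -F/ -v sep="$sep" '{ printf "%s\n\t{\n\t\t\"Foldername\":\"%s\",\n\t\t\"lastmodifiedfile\":\"%s\"\n\t}", sep, $(NF-1), $0 }'
# every object after the first is preceded by a comma, so no trailing comma is left
sep=","
done
printf "\n%s\n" "]"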
Here we loop over all directories in /hd/log, taking each in turn, and use ls -1t | head -1 to get the last modified file in that directory. The path and file name are then piped through awk to produce the desired output. We first set awk's field delimiter to / with the -F flag. We then print the JSON syntax as required, using the last-but-one /-delimited field for the directory name ($(NF-1), where NF is the number of fields) and the complete line ($0) for the last modified file.
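For comparison, the same idea can be sketched without awk by letting a shell glob enumerate the subdirectories. This is only a sketch under some assumptions: latest_files_json is a made-up helper name, the log root is passed as the first argument, and file and folder names contain no newlines, double quotes, or backslashes (nothing here escapes them for JSON).

```shell
#!/bin/bash
# Sketch: print a JSON array naming the newest file in each subdirectory.
# latest_files_json is a hypothetical helper; $1 is the log root, e.g. /hd/log.
latest_files_json() {
    local root="${1%/}" dir newest sep=''
    printf '['
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue        # the glob matched nothing
        dir=${dir%/}
        # Newest entry by modification time (assumes names without newlines)
        newest=$(ls -1t "$dir" | head -n 1)
        [ -n "$newest" ] || continue     # skip empty directories
        printf '%s\n  {\n    "Foldername": "%s",\n    "lastmodifiedfile": "%s/%s"\n  }' \
            "$sep" "${dir##*/}" "$dir" "$newest"
        sep=','                          # later objects are preceded by a comma
    done
    printf '\n]\n'
}
```

Called as latest_files_json /hd/log > logs.json (the output file name is just an example), it picks up added or renamed folders automatically because the glob is expanded on every run.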
I'm creating a gitlab-ci.yml file where I need to iterate over a JSON file and extract the keys and values. I cannot use jq, and I am trying to do this with a cat command. This is my script:
script:
while read line
do
echo $line
done < $myfile
And this is myfile:
{"var1":"test1",
"other_var":"test2"}
Since I am not able to use jq because it is not installed and my customer doesn't allow me to install it, how can I print something like this:
"Key is var1 and value is test1"
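One possible approach without jq, sketched here under fairly strong assumptions: the JSON is flat, every value is a string, no key or value contains an escaped double quote, and no key/value pair is split across lines. print_pairs is a made-up helper name, and grep -o is a common extension (GNU, BSD, BusyBox) rather than a POSIX option.

```shell
#!/bin/bash
# Sketch: print "Key is K and value is V" for each "K":"V" pair in a flat JSON file.
# print_pairs is a hypothetical helper; $1 is the JSON file.
print_pairs() {
    # grep -o puts every "key":"value" match on a line of its own,
    # then sed rewrites each match into the requested sentence.
    grep -o '"[^"]*"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
        sed 's/^"\([^"]*\)"[[:space:]]*:[[:space:]]*"\([^"]*\)"$/Key is \1 and value is \2/'
}
```

With the sample file from the question, print_pairs "$myfile" would print one line for var1 and one for other_var. Anything beyond this simple shape (nested objects, numbers, escaped quotes) needs a real JSON parser.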
I am using json2csv to convert multiple json files structured like
{
"address": "0xe9f6191596bca549e20431978ee09d3f8db959a9",
"copyright": "None",
"created_at": "None"
...
}
The problem is that I need to put multiple json files into one csv file.
In my code I iterate through a hash file, call a curl with that hash and output the data to a json. Then I use json2csv to convert each json to csv.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o main.csv;
done
I use -o to write the CSV output; however, each call just overwrites the previous data, so I end up with only one row.
I have also used >>, and this does append to the CSV file.
json2csv -i "curl_outs/${h}.json" >> main.csv
But for some reason it appends the data's keys to the end of the csv file
I've also tried
cat csv_outs/*.csv > main.csv
However I get the same output.
How do I append multiple json files to one main csv file?
It's not entirely clear from the image and your description what's wrong with >>, but it looks like maybe the CSV file doesn't have a trailing line break, so appending the next file (>>) starts writing directly at the end of the last row and column (cell) of the previous file's data.
I deal with CSVs almost daily and love the GoCSV tool. Its stack subcommand will do just what the name implies: stack multiple CSVs, one on top of the other.
In your case, you could download each JSON and convert it to an individual (intermediate) CSV. Then, at the end, stack all the intermediate CSVs, then delete all the intermediate CSVs.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o curl_outs/$h.csv;
done
gocsv stack curl_outs/*.csv > main.csv;
# I suggested deleting the intermediate CSVs
# rm curl_outs/*.csv
# ...
I changed the last line of your loop to json2csv -i curl_outs/$h.json -o curl_outs/$h.csv; to create those intermediate CSVs I mentioned before. Now, gocsv's stack subcommand can take a list of those intermediate CSVs and give you main.csv.
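If installing GoCSV is not an option, a plain awk filter can do a similar job for simple cases. The sketch below assumes every intermediate CSV has the same columns in the same order and a header that fits on one line; stack_csvs is a made-up helper name, not part of json2csv or GoCSV.

```shell
#!/bin/bash
# Sketch: concatenate CSV files, keeping the header row of the first file only.
# stack_csvs is a hypothetical helper; its arguments are the CSV files to stack.
stack_csvs() {
    # FNR restarts at 1 for each file while NR keeps counting, so
    # "FNR == 1 && NR != 1" matches the header of every file except the first.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}
```

In the script above this would be stack_csvs curl_outs/*.csv > main.csv. Because awk ends every record it prints with a newline, a file that lacks a trailing line break no longer runs into the first row of the next one.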
I have a case involving a loop. My task is to create JSON files in a loop from CSV data. Unfortunately, when I generate the pk field, the value is empty, which makes my JSON invalid. This is a subset of my CSV:
table,pk
aaa,nik
aab,ida
aac,idb
aad,idc
aae,idd
aef,ide
...
This is my full code:
#!bin/bash
CSV_LIST="/xxx/table_lists.csv"
DATA=${CSV_LIST}
mkdir sqlconn
cd sqlconn
cat ${DATA} |
while IFS=',' read table pk ; do
PK= echo ${pk} | tr -d '\n'
cat > ./sqlservercon_$table.json << EOF
{"name" :"sqlservercon_$table","config":{"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector","topics":"$table",
...
,"pk.fields":" $PK","pk.mode":"record_value","destination.table.format":"db.dbo.$table","errors.tolerance":"all","flush.size":"10000"
}}
EOF
done
So the rendered result gives me this:
{"name" :"sqlservercon_XXX","config":{"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector","topics":"XXX",...
,"pk.fields":" ","pk.mode":"record_value","destination.table.format":"XXX.XXX.XXX","errors.tolerance":"all","flush.size":"10000"
}}
But when I use the pk field unmodified
...,
"pk.fields":" $pk",
...
, it gives me an invalid JSON file like this:
...,"pk.fields":" id
",...
Any help is appreciated.
UPDATE
When I check my CSV using cat -v table_lists.csv, the last column ends with a ^M character (a carriage return) that ruins the JSON file. But I still don't know how to deal with it.
Following up on the comments I gave, the following script worked:
#!/bin/bash
cd /home/test
CSV_LIST="/home/test/tableList.csv"
DATA=${CSV_LIST}
# Prepare data file
sed -i "s/\r//g" ${DATA}
# Added for debugging purpose
echo "Creating connection file in JSON for"
# Print file content from 2nd line only
tail --lines=+2 ${DATA} |
while IFS=',' read TABLE PK ; do
# Added for debugging purpose
echo "Table: ${TABLE} and PK: ${PK}"
# Added missing $()
PK_TRIMMED=$(echo ${PK} | tr -d '\n')
cat > ./sqlservercon_${TABLE}.json << EOF
{"name":"sqlservercon_${TABLE}","config":{"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector","topics":"${TABLE}",...,"pk.fields":"${PK_TRIMMED}","pk.mode":"record_value","destination.table.format":"db.dbo.${TABLE}","errors.tolerance":"all","flush.size":"10000"}}
EOF
done
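If you would rather not rewrite the CSV in place with sed -i, the carriage return can also be dropped inside the loop with a parameter expansion. This is a minimal sketch, assuming a two-column table,pk file with a header row; print_table_pk is a made-up helper name, and it only prints the cleaned values instead of writing the connector files.

```shell
#!/bin/bash
# Sketch: read "table,pk" rows and strip a trailing carriage return from pk.
# print_table_pk is a hypothetical helper; $1 is the CSV path.
print_table_pk() {
    local cr table pk
    cr=$(printf '\r')
    # Skip the header row, then split each remaining row on the comma.
    # "|| [ -n "$table" ]" also handles a last row without a trailing newline.
    tail -n +2 "$1" | while IFS=',' read -r table pk || [ -n "$table" ]; do
        pk=${pk%"$cr"}                  # remove one trailing CR, if present
        printf '%s -> %s\n' "$table" "$pk"
    done
}
```

The same pk=${pk%"$cr"} line could go at the top of the loop body in the script above, before the here-document is written.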
Okay, after several checks, besides the wrong script that I gave here, I investigated the CSV file. I downloaded it directly from Google Spreadsheet; even though it gave me a .csv, it was not encoded correctly for UNIX or Ubuntu, which is my development environment.
So I decided to do something like this manually:
From the Google spreadsheet, select all the columns that I want to use
Create an empty CSV file
Copy and paste the cells into the .csv file
Replace each "  " (double space) with ,
As for the loop, because I want to send the JSON with curl instead of saving it, I do this:
#!/bin/bash
CSV_LIST="/home/admin/kafka/main/config/tables/table_lists.csv"
DATA=${CSV_LIST}
while IFS=',' read table pk; do
curl -X POST http://localhost:8083/connectors -H 'Content-Type:application/json' -d'{"name" :"sqlservercon_'$table'","config":{...,...,"destination.table.format":"db.dbo.'$table'","errors.tolerance":"all",
"flush.size":"10000"
}}' | jq
done < ${DATA}
I have a simple log file with content like:
1504007980.039:{"key":"valueA"}
1504007990.359:{"key":"valueB", "key2": "valueC"}
...
I'd like to split it into multiple files, each containing the JSON part that comes after the timestamp. So I would get as a result the files:
1504007980039.json
1504007990359.json
...
This is similar to "How to split one text file into multiple *.txt files?", but the file name should be extracted from each line (with the extra dot removed) rather than generated from an index.
Preferably, I'd like a one-liner that can be executed in bash.
Since you aren't using GNU awk you need to close output files as you go to avoid the "too many open files" error. To avoid that and issues around specific values in your JSON and issues related to undefined behavior during output redirection, this is what you need:
awk '{
fname = $0
sub(/\./,"",fname)
sub(/:.*/,".json",fname)
sub(/[^:]+:/,"")
print >> fname
close(fname)
}' file
You can of course squeeze it onto 1 line if you see some benefit to that:
awk '{f=$0;sub(/\./,"",f);sub(/:.*/,".json",f);sub(/[^:]+:/,"");print>>f;close(f)}' file
awk solution:
awk '{ idx=index($0,":"); fn=substr($0,1,idx-1)".json"; sub(/\./,"",fn);
print substr($0,idx+1) > fn; close(fn) }' input.log
idx=index($0,":") - capturing index of the 1st :
fn=substr($0,1,idx-1)".json" - preparing filename
Viewing results (for 2 sample lines from the question):
for f in *.json; do echo "$f"; cat "$f"; echo; done
The output (filename -> content):
1504007980039.json
{"key":"valueA"}
1504007990359.json
{"key":"valueB", "key2": "valueC"}
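For small logs, the same split can also be sketched as a plain shell loop; it starts one tr process per line, so the awk versions above will be much faster on large files. split_log is a made-up helper name, and the optional second argument (an output directory) is an addition for illustration.

```shell
#!/bin/bash
# Sketch: write the JSON part of each "timestamp:json" line to <timestamp>.json.
# split_log is a hypothetical helper; $1 is the log file, $2 an optional output dir.
split_log() {
    local src="$1" outdir="${2:-.}" line stamp name
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in *:*) ;; *) continue ;; esac   # skip lines without a colon
        stamp=${line%%:*}                           # text before the first colon
        name=$(printf '%s' "$stamp" | tr -d '.')    # 1504007980.039 -> 1504007980039
        printf '%s\n' "${line#*:}" > "$outdir/$name.json"
    done < "$src"
}
```

Each output file is overwritten rather than appended to, so two lines with the same timestamp would leave only the later one.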
I'm using jq (http://stedolan.github.io/jq/) to pull some specific data from some JSON files and convert it to another JSON file eg:
cat data1.json | ./jq '[.["messages"][] | {to: .to, from: .from, body: .body, direction: .direction, date_sent: .date_sent }]' > results1.json
I have 50 JSON files in a directory to do this to. How do I write a bit of shell script to iterate over all 50 files, perform said function, and save out to 50 scrubbed JSON files?
I'm thinking it's something along these lines, but I need some guidance:
for file in *.json | ./jq | '[.["messages"][] | {to: .to, from: .from, body: .body, direction: .direction, date_sent: .date_sent }]' "$file" "$newfile.json" ; done
Thanks!
I'm not familiar with jq, so there might be some way to get it to process many files in a single invocation. This will work for invoking it once per file though:
#!/bin/bash
for file in *.json; do
./jq '[.["messages"...' < "$file" > "$file.scrubbed"
done
Using cat to feed a file to a command's standard input is redundant. Just use < instead.
If your input files follow a consistent naming scheme like dataN.json and you want the output files to be called e.g. resultN.json, you could use > "${file/data/result}" instead (though it is a Bash feature and might not be available in other shells). Be careful not to accidentally overwrite a file whose name doesn't contain "data": for such a name the expansion returns the name unchanged, so the output would be written over the input. Search for ${parameter/pattern/string} in the Bash manual.
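To make the overwrite caveat concrete, here is a small Bash-only sketch that refuses to produce an output name when the input name does not contain "data". result_name is a made-up helper name.

```shell
#!/bin/bash
# Sketch: derive resultN.json from dataN.json, failing for any other name.
# result_name is a hypothetical helper; it relies on Bash's ${parameter/pattern/string}.
result_name() {
    case $1 in
        *data*) printf '%s\n' "${1/data/result}" ;;  # replace the first "data"
        *)      return 1 ;;                          # nothing to replace: refuse
    esac
}
```

Inside the loop above, something like out=$(result_name "$file") || continue would skip files that do not follow the naming scheme, and the redirect then becomes > "$out".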