Bash loop through CSV leaves the last column value with a newline - json

I have a case with a loop. My task is to create JSON files in a loop from CSV data. Unfortunately, when I generate the pk field, the value is empty, which makes my JSON invalid. This is a subset of my CSV:
table,pk
aaa,nik
aab,ida
aac,idb
aad,idc
aae,idd
aef,ide
...
This is my full code:
#!bin/bash
CSV_LIST="/xxx/table_lists.csv"
DATA=${CSV_LIST}
mkdir sqlconn
cd sqlconn
cat ${DATA} |
while IFS=',' read table pk ; do
PK= echo ${pk} | tr -d '\n'
cat > ./sqlservercon_$table.json << EOF
{"name" :"sqlservercon_$table","config":{"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector","topics":"$table",
...
,"pk.fields":" $PK","pk.mode":"record_value","destination.table.format":"db.dbo.$table","errors.tolerance":"all","flush.size":"10000"
}}
EOF
done
So the rendered result gives me this:
{"name" :"sqlservercon_XXX","config":{"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector","topics":"XXX",...
,"pk.fields":" ","pk.mode":"record_value","destination.table.format":"XXX.XXX.XXX","errors.tolerance":"all","flush.size":"10000"
}}
but when I use my pk field unmodified, like this:
...,
"pk.fields":" $pk",
...
then it gives me a broken JSON file like this:
...,"pk.fields":" id
",...
Any help is appreciated.
UPDATE
When I check my CSV with cat -v table_lists.csv, the last column has a ^M (carriage return) character that ruins the JSON file, but I still don't know how to deal with it.
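For reference, a minimal sketch of stripping those carriage returns before the loop (assuming the file simply has Windows CRLF line endings):
# delete every carriage return in place (GNU sed)
sed -i 's/\r$//' /xxx/table_lists.csv
# or write a cleaned copy without touching the original
tr -d '\r' < /xxx/table_lists.csv > table_lists_unix.csv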

With respect to the comments I gave, the following script was working:
#!/bin/bash
cd /home/test
CSV_LIST="/home/test/tableList.csv"
DATA=${CSV_LIST}
# Prepare data file
sed -i "s/\r//g" ${DATA}
# Added for debugging purpose
echo "Creating connection file in JSON for"
# Print file content from 2nd line only
tail --lines=+2 ${DATA} |
while IFS=',' read TABLE PK ; do
# Added for debugging purpose
echo "Table: ${TABLE} and PK: ${PK}"
# Added missing $()
PK_TRIMMED=$(echo ${PK} | tr -d '\n')
cat > ./sqlservercon_${TABLE}.json << EOF
{"name":"sqlservercon_${TABLE}","config":{"connector.class":"io.confluent.connect.jdbc.JdbcSinkConnector","topics":"${TABLE}",...,"pk.fields":"${PK_TRIMMED}","pk.mode":"record_value","destination.table.format":"db.dbo.${TABLE}","errors.tolerance":"all","flush.size":"10000"}}
EOF
done

Okay, after several checks, besides the wrong script I gave here, I investigated the CSV file. I downloaded it directly from Google Spreadsheets; even though it gives me a .csv, it is not encoded correctly for UNIX/Ubuntu, which is my development environment.
So I decided to do something like this manually (a scripted version of the cleanup is sketched below):
From the Google spreadsheet, select all the columns I want to use
Create an empty csv file
Copy-paste the cells into the .csv file
Replace the " " (double space) separators with ,
And for the loop, because I want to curl the JSON instead of saving it, I do this:
#!/bin/bash
CSV_LIST="/home/admin/kafka/main/config/tables/table_lists.csv"
DATA=${CSV_LIST}
while IFS=',' read table pk; do
curl -X POST http://localhost:8083/connectors -H 'Content-Type:application/json' -d'{"name" :"sqlservercon_'$table'","config":{...,...,"destination.table.format":"db.dbo.'$table'","errors.tolerance":"all",
"flush.size":"10000"
}}' | jq
done < ${DATA}
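Note that, unlike the working script above, this loop does not skip the CSV header row or strip carriage returns; a sketch of the same loop with those two guards added (the curl call itself stays exactly as above):
#!/bin/bash
CSV_LIST="/home/admin/kafka/main/config/tables/table_lists.csv"
# skip the header row and drop any carriage returns before reading table,pk pairs
tail -n +2 "${CSV_LIST}" | tr -d '\r' | while IFS=',' read -r table pk; do
echo "Posting sqlservercon_${table} with pk ${pk}"
# ... the same curl -X POST call as above goes here ...
done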

Related

json2csv output multiple jsons to one csv

I am using json2csv to convert multiple json files structured like
{
"address": "0xe9f6191596bca549e20431978ee09d3f8db959a9",
"copyright": "None",
"created_at": "None"
...
}
The problem is that I need to put multiple json files into one csv file.
In my code I iterate through a hash file, call a curl with that hash and output the data to a json. Then I use json2csv to convert each json to csv.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o main.csv;
done
I use -o to output the JSON into the CSV; however, it just overwrites the previous data, so I end up with only one row.
I have used >>, and this does append to the csv file.
json2csv -i "curl_outs/${h}.json" >> main.csv
But for some reason it appends the data's keys to the end of the CSV file.
I've also tried
cat csv_outs/*.csv > main.csv
However I get the same output.
How do I append multiple json files to one main csv file?
It's not entirely clear from the image and your description what's wrong with >>, but it looks like maybe the CSV file doesn't have a trailing line break, so appending the next file (>>) starts writing directly at the end of the last row and column (cell) of the previous file's data.
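A quick way to check for, and repair, a missing final newline before appending (a sketch):
# if the last byte of main.csv is not a newline, the command substitution
# is non-empty, so append the missing newline before the next >>
if [ -n "$(tail -c 1 main.csv)" ]; then
echo >> main.csv
fi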
I deal with CSVs almost daily and love the GoCSV tool. Its stack subcommand will do just what the name implies: stack multiple CSVs, one on top of the other.
In your case, you could download each JSON and convert it to an individual (intermediate) CSV. Then, at the end, stack all the intermediate CSVs, then delete all the intermediate CSVs.
mkdir -p curl_outs
{ cat hashes.hash; echo; } | while read h; do
echo "Downloading $h"
curl -L https://main.net955305.contentfabric.io/s/main/q/$h/meta/public/nft > curl_outs/$h.json;
node index.js $h;
json2csv -i curl_outs/$h.json -o curl_outs/$h.csv;
done
gocsv stack curl_outs/*.csv > main.csv;
# I suggested deleting the intermediate CSVs
# rm curl_outs/*.csv
# ...
I changed the last line of your loop to json2csv -i curl_outs/$h.json -o curl_outs/$h.csv; to create those intermediate CSVs I mentioned before. Now, gocsv's stack subcommand can take a list of those intermediate CSVs and give you main.csv.
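If you would rather not install another tool, a header-aware concatenation with plain awk is a common alternative (a sketch; it assumes every intermediate CSV has the same single header row and no quoted fields containing newlines):
# keep the header from the first file only, skip it in every later file
awk 'FNR==1 && NR!=1 {next} {print}' curl_outs/*.csv > main.csv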

How to split text file into multiple files and extract filename from line prefix?

I have a simple log file with content like:
1504007980.039:{"key":"valueA"}
1504007990.359:{"key":"valueB", "key2": "valueC"}
...
that I'd like to split into multiple files, each containing the JSON part that comes after the timestamp. So I would get as a result the files:
1504007980039.json
1504007990359.json
...
This is similar to How to split one text file into multiple *.txt files? but the name of each file should be extracted from its line (with the extra dot removed), not generated from an index.
Preferably I'd want a one-liner that can be executed in bash.
Since you aren't using GNU awk, you need to close output files as you go to avoid the "too many open files" error. To avoid that, plus issues around specific values in your JSON and undefined behavior during output redirection, this is what you need:
awk '{
fname = $0
sub(/\./,"",fname)
sub(/:.*/,".json",fname)
sub(/[^:]+:/,"")
print >> fname
close(fname)
}' file
You can of course squeeze it onto 1 line if you see some benefit to that:
awk '{f=$0;sub(/\./,"",f);sub(/:.*/,".json",f);sub(/[^:]+:/,"");print>>f;close(f)}' file
awk solution:
awk '{ idx=index($0,":"); fn=substr($0,1,idx-1)".json"; sub(/\./,"",fn);
print substr($0,idx+1) > fn; close(fn) }' input.log
idx=index($0,":") - capturing index of the 1st :
fn=substr($0,1,idx-1)".json" - preparing filename
Viewing results (for 2 sample lines from the question):
for f in *.json; do echo "$f"; cat "$f"; echo; done
The output (filename -> content):
1504007980039.json
{"key":"valueA"}
1504007990359.json
{"key":"valueB"}

Split JSON into multiple files

I have a JSON file exported from MongoDB which looks like:
{"_id":"99919","city":"THORNE BAY"}
{"_id":"99921","city":"CRAIG"}
{"_id":"99922","city":"HYDABURG"}
{"_id":"99923","city":"HYDER"}
There are about 30000 lines; I want to split each line into its own .json file. (I'm trying to transfer my data onto a Couchbase cluster.)
I tried doing this:
cat cities.json | jq -c -M '.' | \
while read line; do echo $line > .chunks/cities_$(date +%s%N).json; done
but I found that it seems to drop loads of lines, and running this command only gave me 50-odd files when I was expecting 30000-odd!
Is there a sensible way to do this that doesn't drop any data, using whatever tool suits?
Assuming you don't care about the exact filenames, if you want to split input into multiple files, just use split.
jq -c . < cities.json | split -l 1 --additional-suffix=.json - .chunks/cities_
In general to split any text file into separate files per-line using any awk on any UNIX system is simply:
awk '{close(f); f=".chunks/cities_"NR".json"; print > f}' cities.json

Read a json file with bash

I would like to read the JSON file from http://freifunk.in-kiel.de/alfred.json in bash and separate it into files named by the hostname of each element in that JSON string.
How do I read json with bash?
How do I read json with bash?
You can use jq for that. The first thing you have to do is extract the list of hostnames and save it to a bash array. Looping over that array, you would then run another query for each hostname to extract its element and save the data through redirection, with the filename based on the hostname as well.
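A sketch of that array-based variant (assuming the hostnames contain no whitespace, slashes, or glob characters, and that bash 4+ is available for mapfile):
# collect the hostnames once, then query alfred.json again for each one
mapfile -t hostnames < <(jq -r '.[] | .hostname' alfred.json)
for hostname in "${hostnames[@]}"; do
  jq --arg hostname "$hostname" \
    '.[] | select(.hostname == $hostname)' alfred.json > "out-${hostname}.json"
done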
The easiest way to do this is with two instances of jq -- one listing hostnames, and another (inside the loop) extracting individual entries.
This is, alas, a bit inefficient (since it means rereading the file from the top for each record to extract).
while read -r hostname; do
[[ $hostname = */* ]] && continue # paranoia; see comments
jq --arg hostname "$hostname" \
'.[] | select(.hostname == $hostname)' <alfred.json >"out-${hostname}.json"
done < <(jq -r '.[] | .hostname' <alfred.json)
(The out- prefix prevents alfred.json from being overwritten if it includes an entry for a host named alfred).
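If the rereading bothers you, a sketch of a variant that scans alfred.json only once (it still runs jq once per record, but each run parses just that one compact record rather than the whole file):
# stream each record on its own line, then pull the hostname out of that single record
jq -c '.[]' <alfred.json | while read -r record; do
  hostname=$(jq -r '.hostname' <<<"$record")
  [[ $hostname = */* ]] && continue # same path-safety paranoia as above
  printf '%s\n' "$record" > "out-${hostname}.json"
done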
You can use a Python one-liner in a similar way, like this (I haven't checked):
curl -s http://freifunk.in-kiel.de/alfred.json | python -c '
import json, sys
tbl=json.load(sys.stdin)
for t in tbl:
with open(tbl[t]["hostname"], "w") as fp:
json.dump(tbl[t], fp)
'

Export blob from mysql to file in bash script

I'm trying to write a bash script that, among other things, extracts information from a mysql database. I tried the following to extract a file from entry 20:
mysql -se "select file_column_name from table where id=20;" >file.txt
That gave me a file.txt with the file name, not the file contents. How would I get the actual blob into file.txt?
Turn the value in file.txt into a variable and then use it as you need to? i.e.
blobFile=$(cat file.txt)
echo "----- contents of $blobFile ---------"
cat $blobFile
# copy the file somewhere else
scp $blobFile user@remote:/path/to/remote/loc/for/blobFile
# look for info in blobfile
grep specialInfo $blobFile
# etc ...
Is that what you want/need to do?
I hope this helps.
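If the column actually holds the blob bytes themselves (rather than a path to a file), a server-side dump is the usual route. A sketch, assuming you have the FILE privilege and secure_file_priv allows writing to the chosen directory on the database server (the path here is just an example):
# INTO DUMPFILE writes the raw column value, with no escaping and no trailing newline,
# to a file on the *server* host; the target file must not already exist
mysql -se "select file_column_name into dumpfile '/tmp/file_20.bin' from table where id=20;"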