Passing "|" as an argument to unix script - csv

I have written a generic Unix script to load an Oracle table from any CSV. The delimiter (field separator) in the CSV can be anything like ',' or '|' or ':' etc.
Hence I am trying to pass the delimiter as an explicit argument to the script, and it works fine for most delimiters, but when I pass | it does not give me the proper result; the | seems to be implicitly converted to ,.
ksh -x myscript csv_name |
# not working
ksh -x myscript csv_name ,
# working
Please suggest if there is an escape that can be used for this.

Tested on my machine; you need to use one of the commands below:
ksh -x test.sh file "|"
OR
ksh -x test.sh file \|
OR
ksh -x test.sh file '|'
We need to disable the special meaning of "|" with an escape, single quotes, or double quotes; otherwise the shell treats the unquoted | as a pipe operator, so it never reaches the script as an argument.
Testing -
cat file
A|B|B|A
A|A|A
B|A|B|A
B|B
A|A
A|B
A|A|B
cat test.sh
vFile=$1
sep=$2
awk -F "$sep" '{print $1,$2}' ${vFile}
ksh -x test.sh file "|"
A B
A A
B A
B B
A A
A B
A A
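For context, here is a minimal sketch of how the quoted separator might be carried through inside such a generic loader script. The original script is not shown, so the table name, column list, connect string and the use of SQL*Loader are assumptions; the point is simply that the second argument must stay quoted wherever it is used.
#!/bin/ksh
# myscript: load a CSV into an Oracle table with a caller-supplied delimiter
vFile=$1   # CSV file name
sep=$2     # field separator, e.g. ',' ':' or '|'
# build a SQL*Loader control file using the separator (table and columns are placeholders)
cat > loader.ctl <<EOF
LOAD DATA
INFILE '${vFile}'
APPEND INTO TABLE my_table
FIELDS TERMINATED BY '${sep}'
(col1, col2, col3)
EOF
sqlldr userid=user/password control=loader.ctl
Invoked as: ksh myscript data.csv '|'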

Related

Validating the json files in a folder/sub folder through shell script

find dir1 | grep .json | python -mjson.tool $1 > /dev/null
I am using the above command, but it doesn't take the files as inputs. What should I do to check all the JSON files in a folder and validate whether each one is proper JSON?
I found this a while back and have been using it:
for j in *.json; do python -mjson.tool "$j" > /dev/null || echo "INVALID $j" >&2; done;
I'm sure it compares similarly to dasho-o's answer, but I'm including it as another option.
You need to use 'xargs'.
The find/grep pipeline writes the JSON file names to the standard input of the next command. You need to build command lines from those file names; this is what xargs does:
find dir1 | grep .json | xargs -L1 python -mjson.tool > /dev/null
Side notes:
Since 'find' has filtering and execution predicates, a more compact command can be created:
find dir1 -name '*.json' -exec python -mjson.tool '{}' ';'
Also consider using 'jq' as a lightweight alternative to validating via Python.
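A minimal sketch of that jq-based check (assuming jq is installed; jq empty parses each file and exits non-zero on invalid JSON):
find dir1 -name '*.json' -exec sh -c 'jq empty "$1" || echo "INVALID: $1" >&2' sh {} \;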

JQ can't parse \u2022 character

I'm trying to perform a bulk upload to Elasticsearch (around 1 million documents). In order to do that, I'm using jq to reformat the JSON file extracted from a MySQL database and curl to post the data to Elasticsearch:
cat dataset.json | jq -r -c '.[] | { "index" : { } }, .' | curl -u login:password -H "Content-Type: application/json" -XPOST "https://.../skills/default/_bulk?pretty" --data-binary @-
I get an error:
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 276249, column 317
I found that the character that jq can't parse is \u2022. I tried adding the -r flag to the jq command but the error still occurs. How can I handle this for all occurrences of \u2022?
Here's verification that \u2022 is properly handled by various versions of jq in a Mac environment:
$ echo '"\u2022"' | jq-1.4 .
"•"
$ echo '"•"' | jq-1.6 .
"•"
$ echo '"•"' | jq-1.5 .
"•"
$ echo '"•"' | jq-1.4 .
"•"
$
Perhaps the problem is related to a bug that was fixed since the release of jq 1.5 (see e.g. https://github.com/stedolan/jq/issues/1311).
If you are having difficulties with jq version 1.6 (the current version), please provide a minimal complete verifiable example
with further details about the computing environment.
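As a starting point for such an example, two quick checks may help narrow things down: confirming that the installed jq handles the \u2022 escape, and looking for literal (unescaped) control characters in the input, which is what the error message actually complains about. This is only a sketch; the -P option assumes GNU grep:
jq --version
echo '"\u2022"' | jq .                       # should print "•" if the escape is handled
grep -nP '[\x00-\x1F]' dataset.json | head   # locate lines containing raw control characters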

How to pass bash variable to JSON

I'm trying to write a sample script where I'm generating names like 'student-101' ... 'student-160'. I need to POST JSON data, and when I do, I get a JSON parse error.
Here's my script:
name="student-10"
for i in {1..1}
do
r_name=$name$i
echo $r_name
curl -i -H 'Authorization: token <token>' -d '{"name": $r_name, "private": true}' "<URL>" >> create_repos_1.txt
echo created $r_name
done
I always get a "Problems parsing JSON" error. I've tried various combinations of quotes, etc., but nothing seems to work!
What am I doing wrong?
First, your name property is a string, so you need to put double quotes around its value in the JSON.
Second, with single quotes, bash won't do variable expansion: it won't replace $r_name with the variable's content (see Expansion of variable inside single quotes in a command in bash shell script for more information).
In summary, use:
-d '{"name": "'"$r_name"'", "private": true}'
Another option is to use printf to create the data string:
printf -v data '{"name": "%s", "private": true}' "$r_name"
curl -i -H 'Authorization: token <token>' -d "$data" "$url" >> create_repos_1.txt
Don't; use jq (or something similar) to build correctly quoted JSON using variable inputs.
name="student-10"
for i in {1..1}
do
r_name=$name$i
jq -n --arg r_name "$r_name" '{name: $r_name, private: true}' |
curl -i -H 'Authorization: token <token>' -d @- "<URL>" >> create_repos_1.txt
echo created $r_name
done
The @- argument tells curl to read the data for -d from standard input (here, the pipe from jq).
Something like "{\"name\": \"$r_name\", \"private\": true}" may work, but it is ugly and will also fail if r_name contains any character which needs to be quoted in the resulting JSON, such as double quotes or ASCII control characters.
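To illustrate that last point, here is a small check (a sketch; the name value is a made-up example containing a double quote):
r_name='student-"10"'
jq -n -c --arg r_name "$r_name" '{name: $r_name, private: true}'
{"name":"student-\"10\"","private":true}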

Printing column separated by comma using Awk command line

I have a problem here. I have to print a column from a text file using awk. However, the columns are not separated by spaces at all, only by a single comma. It looks something like this:
column1,column2,column3,column4,column5,column6
How would I print out the 3rd column using awk?
Try:
awk -F',' '{print $3}' myfile.txt
Here the -F option tells awk to use , as the field separator.
If your only requirement is to print the third field of every line, with each field delimited by a comma, you can use cut:
cut -d, -f3 file
-d, sets the delimiter to a comma
-f3 specifies that only the third field is to be printed
Try this awk
awk -F, '{$0=$3}1' file
column3
-F, divides fields by ,
$0=$3 sets the line to only field 3
1 prints the resulting line
This could also be used:
awk -F, '{print $3}' file
A simple, although awk-less solution in bash:
while IFS=, read -r a a a b; do echo "$a"; done <inputfile
It works faster than awk for small files (<100 lines) as it uses fewer resources (it avoids the expensive fork and execve system calls).
EDIT from Ed Morton (sorry for hi-jacking the answer, I don't know if there's a better way to address this):
To put to rest the myth that shell will run faster than awk for small files:
$ wc -l file
99 file
$ time while IFS=, read -r a a a b; do echo "$a"; done <file >/dev/null
real 0m0.016s
user 0m0.000s
sys 0m0.015s
$ time awk -F, '{print $3}' file >/dev/null
real 0m0.016s
user 0m0.000s
sys 0m0.015s
I expect that if you get a REALLY small enough file you will see the shell script run a fraction of a blink of an eye faster than the awk script, but who cares?
And if you don't believe that it's harder to write robust shell scripts than awk scripts, look at this bug in the shell script you posted:
$ cat file
a,b,-e,d
$ cut -d, -f3 file
-e
$ awk -F, '{print $3}' file
-e
$ while IFS=, read -r a a a b; do echo "$a"; done <file
$
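For what it's worth, that particular trap can be sidestepped by replacing echo with printf, which does not treat -e as an option (a sketch; the broader point about awk being more robust still stands):
$ while IFS=, read -r a a a b; do printf '%s\n' "$a"; done <file
-e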

xargs: how to have literal double-quotes in replacement?

I have a source file with JSON objects, one per line, like this:
source:
{"_id":"1","name":"one"}
{"_id":"2","name":"two"}
{"_id":"3","name":"three"}
I want to send each line to a
curl -X POST -H "application/json" myURL -d '<REPLACEMENT>'
The double quotes do not make it to curl when I try
<source xargs -I % curl -X POST -H "application/json" myURL -d '%'
I tried escaping the quotes in the curl command, and later I replaced all double quotes in the source file with \". I found no version that worked.
Another approach, using seq with sed to write each line into a temp file and then curl -d @temp, did not work out for me either.
Is there an elegant solution or do I have to write a script with a loop?
That's an interesting problem. There must be a better solution (perhaps konsolebox is onto something), but substituting all " with \" would work:
$ echo '"hello"'
"hello"
$ echo '"hello"' | xargs echo
hello
$ echo '"hello"' | sed 's/"/\\"/g' | xargs echo
"hello"
GNU Parallel was built specifically to deal with xargs' bad handling of special characters:
<source parallel curl -X POST -H "application/json" myURL -d {}
Not only will it quote " correctly, it will quote any string correctly, so it will be interpreted as a single argument by curl.
Added bonus: your queries will run in parallel - one query per CPU.
This should do the trick:
cat source.json | xargs -0 -I {} curl {}
From man xargs:
-0, --null
Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end-of-file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.
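For line-per-object input like the source file here, the newlines would first have to be turned into NUL separators, for example with tr (a sketch that combines this with the question's curl command):
tr '\n' '\0' < source | xargs -0 -I % curl -X POST -H "application/json" myURL -d '%'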
Try to use --data-urlencode:
<source xargs -I % curl -X POST -H "application/json" myURL --data-urlencode '%'
The option may be used with other formats; see the curl manual. You can also try --data-binary.