Ruby: stripping backslashes from string - MySQL

I'm running into an issue trying to parse a predefined Nessus XML report and dump the data into a MySQL database. Some of the data I'm dumping into the database contains the following characters, which understandably makes MySQL barf: ' " \
I am able to remove the single and double quotes, but is there a method to escape an escape character? Keep in mind I don't have control over what is stored once it is iterated. Here's an example:
myvariable = "this is some bloated nessus output that has a bunch of crappy data and this domain\username"
myvariable.gsub!(/\\/, '')
The gsub above won't remove the backslash because it already treats \u as an escape sequence.
Here is the actual code that parses the Nessus XML file:
#!/usr/bin/ruby
#
# Database Schema:
# VALUES(Id (leave null), start time, hostname, host_ip, operating_system, scan_name, plugin_id, cve, cvss, risk, port, description, synopsis, solution, see_also, plugin_output, vuln_crit, vuln_high, vuln_med)
#
require 'mysql'
require 'nessus'
begin
  con = Mysql.new 'yourdbhost', 'yourdbuser', 'yourpass', 'nessusdb'
  scanTime = Time.now.to_i

  Nessus::Parse.new("bloated.xml", :version => 2) do |scan|
    scan.each_host do |host| # enumerate each host
      start_time = host.start_time
      next if host.event_count.zero? # skip host if there are no events to dump in the db
      host.each_event do |event|
        # '#{event.see_also.join('\s').gsub(/\"|\'|\\/, '')}'
        # '#{event.solution.gsub!(/\"|\'|\\/, '')}'
        # '#{event.synopsis.gsub!(/\"|\'|\\/, '')}'
        con.query( \
          "INSERT INTO nessus_scans VALUES \
          (NULL, \
          '#{scanTime}', \
          '#{host.hostname}', \
          '#{host.ip}', \
          '#{host.operating_system}', \
          '#{scan.title}', \
          '#{event.plugin_id}', \
          '#{event.cve}', \
          '#{event.cvss_base_score}', \
          '#{event.risk}', \
          '#{event.port}', \
          '#{event.description.gsub!(/\"|\'|\\/, '')}', \
          NULL, \
          NULL, \
          NULL, \
          NULL, \
          NULL, \
          NULL, \
          NULL)
        ")
      end   # end host.each_event iteration
    end     # end scan.each_host iteration
  end       # end Nessus::Parse (xml file) iteration
rescue Mysql::Error => e
  puts e.errno
  puts e.error
ensure
  con.close if con
end

You have a gigantic SQL injection hole here because you're not doing any escaping. Using the MySQL driver directly like this is a bad idea; at the very least use a database layer such as Sequel or ActiveRecord. The only reason MySQL is "barfing" is that you're not escaping your values.
The easiest fix for this mess is to run every value through the driver's escape_string method (e.g. con.escape_string(event.description)), but you need to do this for every single value, which quickly becomes tedious. A proper database layer lets you use parameterized queries that handle escaping for you, which is why I strongly encourage that.

For example, you can use a character class (brackets) and match one or more backslashes:
myvariable.gsub!(/[\\]+/, '')

Export sql table with data that has commas to a csv file [duplicate]

Is there an easy way to run a MySQL query from the Linux command line and output the results in CSV format?
Here's what I'm doing now:
mysql -u uid -ppwd -D dbname << EOQ | sed -e 's/ /,/g' | tee list.csv
select id, concat("\"",name,"\"") as name
from students
EOQ
It gets messy when there are a lot of columns that need to be surrounded by quotes, or if there are quotes in the results that need to be escaped.
From Save MySQL query results into a text or CSV file:
SELECT order_id,product_name,qty
FROM orders
WHERE foo = 'bar'
INTO OUTFILE '/var/lib/mysql-files/orders.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Note: That syntax may need to be reordered to
SELECT order_id,product_name,qty
INTO OUTFILE '/var/lib/mysql-files/orders.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM orders
WHERE foo = 'bar';
in more recent versions of MySQL.
Using this command, column names will not be exported.
Also note that /var/lib/mysql-files/orders.csv will be on the server that is running MySQL. The user that the MySQL process runs under must have permission to write to the chosen directory, or the command will fail.
If you want to write output to your local machine from a remote server (especially a hosted or virtualized machine such as Heroku or Amazon RDS), this solution is not suitable.
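If you do need a header row with INTO OUTFILE, one common workaround is to UNION a row of string literals onto the query, roughly like this (a sketch reusing the columns above; depending on your MySQL version you may need to reorder the INTO OUTFILE clause as noted above, and the row order of a UNION is not formally guaranteed, though in practice the header row comes out first):
SELECT 'order_id', 'product_name', 'qty'
UNION ALL
SELECT order_id, product_name, qty
FROM orders
WHERE foo = 'bar'
INTO OUTFILE '/var/lib/mysql-files/orders-with-header.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';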
mysql your_database --password=foo < my_requests.sql > out.tsv
This produces a tab-separated format. If you are certain that commas do not appear in any of the column data (and neither do tabs), you can use this pipe command to get a true CSV (thanks to user John Carter):
... .sql | sed 's/\t/,/g' > out.csv
mysql --batch, -B
Print results using tab as the column separator, with each row on a
new line. With this option, mysql does not use the history file.
Batch mode results in non-tabular output format and escaping of
special characters. Escaping may be disabled by using raw mode; see
the description for the --raw option.
This will give you a tab-separated file. Since commas (or strings containing comma) are not escaped, it is not straightforward to change the delimiter to comma.
Here's a fairly gnarly way of doing it[1]:
mysql --user=wibble --password mydatabasename -B -e "select * from vehicle_categories;" | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > vehicle_categories.csv
It works pretty well. Once again, though, a regular expression proves write-only.
Regex Explanation:
s/// means substitute what's between the first // with what's between the second //
the "g" at the end is a modifier that means "all instance, not just first"
^ (in this context) means beginning of line
$ (in this context) means end of line
So, putting it all together:
s/'/\'/ Replace ' with \'
s/\t/\",\"/g Replace all \t (tab) with ","
s/^/\"/ at the beginning of the line place a "
s/$/\"/ At the end of the line, place a "
s/\n//g Replace all \n (newline) with nothing
[1] I found it somewhere and can't take any credit.
Pipe it through 'tr' (Unix/Cygwin only):
mysql <database> -e "<query here>" | tr '\t' ',' > data.csv
N.B.: This handles neither embedded commas, nor embedded tabs.
This saved me a couple of times. It is fast and it works!
--batch
Print results using tab as the column separator, with each row on a
new line.
--raw disables character escaping (\n, \t, \0, and \)
Example:
mysql -udemo_user -p -h127.0.0.1 --port=3306 \
--default-character-set=utf8mb4 --database=demo_database \
--batch --raw < /tmp/demo_sql_query.sql > /tmp/demo_csv_export.tsv
For completeness you could convert to CSV (but be careful because tabs could be inside field values - e.g., text fields)
tr '\t' ',' < file.tsv > file.csv
The OUTFILE solution given by Paul Tomblin causes a file to be written on the MySQL server itself, so this will work only if you have FILE access, as well as login access or other means for retrieving the file from that box.
If you don't have such access, and tab-delimited output is a reasonable substitute for CSV (e.g., if your end goal is to import to Excel), then serbaut's solution (using mysql --batch and optionally --raw) is the way to go.
MySQL Workbench can export recordsets to CSV, and it seems to handle commas in fields very well. The CSV opens up in OpenOffice Calc fine.
All of the solutions here to date, except the MySQL Workbench one, are incorrect and quite possibly unsafe (i.e., security issues) for at least some possible content in the MySQL database.
MySQL Workbench (and similarly phpMyAdmin) provide a formally correct solution, but they are designed for downloading the output to a user's location. They're not so useful for things like automating data export.
It is not possible to generate reliably correct CSV content from the output of mysql -B -e 'SELECT ...' because that cannot encode carriage returns and white space in fields. The '-s' flag to mysql does do backslash escaping, and might lead to a correct solution. However, using a scripting language (one with decent internal data structures that is, not Bash), and libraries where the encoding issues have already been carefully worked out is far safer.
I thought about writing a script for this, but as soon as I thought about what I'd call it, it occurred to me to search for preexisting work by the same name. While I haven't gone over it thoroughly, mysql2csv looks promising. Depending on your application, the YAML approach to specifying the SQL commands might or might not appeal though. I'm also not thrilled with the requirement for a more recent version of Ruby than comes as standard with my Ubuntu 12.04 (Precise Pangolin) laptop or Debian 6.0 (Squeeze) servers. Yes, I know I could use RVM, but I'd rather not maintain that for such a simple purpose.
Use:
mysql your_database -p < my_requests.sql | awk '{print $1","$2}' > out.csv
Many of the answers on this page are weak, because they don't handle the general case of what can occur in CSV format. E.g., commas and quotes embedded in fields and other conditions that always come up eventually. We need a general solution that works for all valid CSV input data.
Here's a simple and strong solution in Python:
#!/usr/bin/env python
import csv
import sys

tab_in = csv.reader(sys.stdin, dialect=csv.excel_tab)
comma_out = csv.writer(sys.stdout, dialect=csv.excel)

for row in tab_in:
    comma_out.writerow(row)
Name that file tab2csv, put it on your path, give it execute permissions, then use it like this:
mysql OTHER_OPTIONS --batch --execute='select * from whatever;' | tab2csv > outfile.csv
The Python CSV-handling functions cover corner cases for CSV input format(s).
This could be improved to handle very large files via a streaming approach.
From your command line, you can do this:
mysql -h *hostname* -P *port number* --database=*database_name* -u *username* -p -e *your SQL query* | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > *output_file_name.csv*
Credits: Exporting table from Amazon RDS into a CSV file
This answer uses Python and a popular third party library, PyMySQL. I'm adding it because Python's csv library is powerful enough to correctly handle many different flavors of .csv and no other answers are using Python code to interact with the database.
import contextlib
import csv
import datetime
import os

# https://github.com/PyMySQL/PyMySQL
import pymysql

SQL_QUERY = """
SELECT * FROM my_table WHERE my_attribute = 'my_attribute';
"""

# embedding passwords in code gets nasty when you use version control
# the environment is not much better, but this is an example
# https://stackoverflow.com/questions/12461484
SQL_USER = os.environ['SQL_USER']
SQL_PASS = os.environ['SQL_PASS']

connection = pymysql.connect(host='localhost',
                             user=SQL_USER,
                             password=SQL_PASS,
                             db='dbname')

with contextlib.closing(connection):
    with connection.cursor() as cursor:
        cursor.execute(SQL_QUERY)
        # Hope you have enough memory :)
        results = cursor.fetchall()

output_file = 'my_query-{}.csv'.format(datetime.datetime.today().strftime('%Y-%m-%d'))
with open(output_file, 'w', newline='') as csvfile:
    # http://stackoverflow.com/a/17725590/2958070 about lineterminator
    csv_writer = csv.writer(csvfile, lineterminator='\n')
    csv_writer.writerows(results)
I encountered the same problem, and Paul's answer wasn't an option since the database was on Amazon RDS. Replacing the tabs with commas did not work because the data had embedded commas and tabs. I found that mycli, which is a drop-in alternative for the mysql client, supports CSV output out of the box with the --csv flag:
mycli db_name --csv -e "select * from flowers" > flowers.csv
This is simple, and it works on anything without needing batch mode or output files:
select concat_ws(',',
concat('"', replace(field1, '"', '""'), '"'),
concat('"', replace(field2, '"', '""'), '"'),
concat('"', replace(field3, '"', '""'), '"'))
from your_table where etc;
Explanation:
Replace " with "" in each field --> replace(field1, '"', '""')
Surround each result in quotation marks --> concat('"', result1, '"')
Place a comma between each quoted result --> concat_ws(',', quoted1, quoted2, ...)
That's it!
Also, if you're performing the query on the Bash command line, I believe the tr command can be used to substitute the default tabs to arbitrary delimiters.
$ echo "SELECT * FROM Table123" | mysql Database456 | tr "\t" ,
You can have a MySQL table that uses the CSV storage engine.
Then you will have a file on your hard disk that is always in CSV format, which you can simply copy without any processing.
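A minimal sketch (the table definition here is made up; note that the CSV engine requires every column to be NOT NULL and does not support indexes):
CREATE TABLE orders_csv (
  order_id INT NOT NULL,
  product_name VARCHAR(255) NOT NULL,
  qty INT NOT NULL
) ENGINE=CSV;
INSERT INTO orders_csv SELECT order_id, product_name, qty FROM orders;
The data is then stored as a plain file named orders_csv.CSV inside that database's directory under the MySQL data directory, and you can copy it from there.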
To expand on previous answers, the following one-liner exports a single table as a tab-separated file. It's suitable for automation, exporting the database every day or so.
mysql -B -D mydatabase -e 'select * from mytable'
Conveniently, we can use the same technique to list out MySQL's tables, and to describe the fields on a single table:
mysql -B -D mydatabase -e 'show tables'
mysql -B -D mydatabase -e 'desc users'
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
email varchar(128) NO UNI NULL
lastName varchar(100) YES NULL
title varchar(128) YES UNI NULL
userName varchar(128) YES UNI NULL
firstName varchar(100) YES NULL
Here's what I do:
echo $QUERY | \
mysql -B $MYSQL_OPTS | \
perl -F"\t" -lane 'print join ",", map {s/"/""/g; /^[\d.]+$/ ? $_ : qq("$_")} @F ' | \
mail -s 'report' person@address
The Perl script (snipped from elsewhere) does a nice job of converting the tab-separated fields to CSV.
Building on user7610's answer, here is the best way to do it. With mysql's OUTFILE I ran into 60 minutes of file-ownership and overwriting problems.
It's not cool, but it worked in 5 mins.
php csvdump.php localhost root password database tablename > whatever-you-like.csv
<?php
$server = $argv[1];
$user = $argv[2];
$password = $argv[3];
$db = $argv[4];
$table = $argv[5];

mysql_connect($server, $user, $password) or die(mysql_error());
mysql_select_db($db) or die(mysql_error());

// fetch the data
$rows = mysql_query('SELECT * FROM ' . $table);
$rows || die(mysql_error());

// create a file pointer connected to the output stream
$output = fopen('php://output', 'w');

// output the column headings
$fields = [];
for ($i = 0; $i < mysql_num_fields($rows); $i++) {
    $field_info = mysql_fetch_field($rows, $i);
    $fields[] = $field_info->name;
}
fputcsv($output, $fields);

// loop over the rows, outputting them
while ($row = mysql_fetch_assoc($rows)) fputcsv($output, $row);
?>
Not exactly CSV format, but the tee command from the MySQL client can be used to save the output into a local file:
tee foobar.txt
SELECT foo FROM bar;
You can disable it using notee.
The problem with SELECT … INTO OUTFILE …; is that it requires permission to write files at the server.
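You can check up front whether the server will allow it at all: the account needs the FILE privilege, and the secure_file_priv variable controls where (or whether) the server may write files. A quick sketch:
mysql -u youruser -p -e "SHOW VARIABLES LIKE 'secure_file_priv'; SHOW GRANTS FOR CURRENT_USER();"
An empty secure_file_priv value means the server may write wherever it has filesystem permission, a path restricts output files to that directory, and NULL disables INTO OUTFILE entirely.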
In my case, putting FROM table_name ..... before INTO OUTFILE ..... gives an error:
Unexpected ordering of clauses. (near "FROM" at position 10)
What works for me:
SELECT *
INTO OUTFILE '/Volumes/Development/sql/sql/enabled_contacts.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM table_name
WHERE column_name = 'value'
What worked for me:
SELECT *
FROM students
WHERE foo = 'bar'
LIMIT 0,1200000
INTO OUTFILE './students-1200000.csv'
FIELDS TERMINATED BY ',' ESCAPED BY '"'
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
None of the solutions in this thread worked for my particular case: I had pretty-printed JSON data inside one of the columns, which would get messed up in the CSV output. For those with a similar problem, try LINES TERMINATED BY '\r\n' instead.
Another problem for those trying to open the CSV with Microsoft Excel: keep in mind that a single cell can hold at most 32,767 characters; above that, the content overflows into the rows below. To identify which records in a column have the issue, use the query below. You can then truncate those records or handle them as you'd like.
SELECT id,name,CHAR_LENGTH(json_student_description) AS 'character length'
FROM students
WHERE CHAR_LENGTH(json_student_description)>32767;
Using the solution posted by Tim Harding, I created this Bash script to facilitate the process (the root password is requested, but you can easily modify the script to ask for any other user):
#!/bin/bash

if [ "$1" == "" ]; then
    echo "Usage: $0 DATABASE TABLE [MYSQL EXTRA COMMANDS]"
    exit
fi

DBNAME=$1
TABLE=$2
FNAME=$1.$2.csv
MCOMM=$3

echo "MySQL password: "
stty -echo
read PASS
stty echo

mysql -uroot -p$PASS $MCOMM $DBNAME -B -e "SELECT * FROM $TABLE;" | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > $FNAME
It will create a file named: database.table.csv
If you have PHP set up on the server, you can use mysql2csv to export an (actually valid) CSV file for an arbitrary MySQL query. See my answer at MySQL - SELECT * INTO OUTFILE LOCAL ? for a little more context/info.
I tried to maintain the option names from mysql so it should be sufficient to provide the --file and --query options:
./mysql2csv --file="/tmp/result.csv" --query='SELECT 1 as foo, 2 as bar;' --user="username" --password="password"
"Install" mysql2csv via
wget https://gist.githubusercontent.com/paslandau/37bf787eab1b84fc7ae679d1823cf401/raw/29a48bb0a43f6750858e1ddec054d3552f3cbc45/mysql2csv -O mysql2csv -q && (sha256sum mysql2csv | cmp <(echo "b109535b29733bd596ecc8608e008732e617e97906f119c66dd7cf6ab2865a65 mysql2csv") || (echo "ERROR comparing hash, Found:" ;sha256sum mysql2csv) ) && chmod +x mysql2csv
(Download content of the gist, check checksum and make it executable.)
The following produces tab-delimited and valid CSV output. Unlike most of the other answers, this technique correctly handles escaping of tabs, commas, quotes, and new lines without any stream filter like sed, AWK, or tr.
The example shows how to pipe a remote MySQL table directly into a local SQLite database using streams. This works without FILE permission or SELECT INTO OUTFILE permission. I have added new lines for readability.
mysql -B -C --raw -u 'username' --password='password' --host='hostname' 'databasename'
-e 'SELECT
CONCAT('\''"'\'',REPLACE(`id`,'\''"'\'', '\''""'\''),'\''"'\'') AS '\''id'\'',
CONCAT('\''"'\'',REPLACE(`value`,'\''"'\'', '\''""'\''),'\''"'\'') AS '\''value'\''
FROM sampledata'
2>/dev/null | sqlite3 -csv -separator $'\t' mydb.db '.import /dev/stdin mycsvtable'
The 2>/dev/null is needed to suppress the warning about the password on the command line.
If your data has NULLs, you can use the IFNULL() function in the query.
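For example, the value column from the query above could be wrapped so that NULLs come out as empty quoted fields (a sketch of just the SQL part, without the shell quoting):
SELECT CONCAT('"', REPLACE(IFNULL(`value`, ''), '"', '""'), '"') AS `value`
FROM sampledata;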
A simple solution in Python that writes a standard-format CSV file with headers and writes data as a stream (low memory use):
import csv

def export_table(connection, table_name, output_filename):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM " + table_name)
    # thanks to https://gist.github.com/madan712/f27ac3b703a541abbcd63871a4a56636 for this hint
    header = [descriptor[0] for descriptor in cursor.description]
    with open(output_filename, 'w') as csvfile:
        csv_writer = csv.writer(csvfile, dialect='excel')
        csv_writer.writerow(header)
        for row in cursor:
            csv_writer.writerow(row)
You could use it like:
import mysql.connector as mysql
# (or https://github.com/PyMySQL/PyMySQL should work but I haven't tested it)

db = mysql.connect(
    host="localhost",
    user="USERNAME",
    db="DATABASE_NAME",
    port=9999)

for table_name in ['table1', 'table2']:
    export_table(db, table_name, table_name + '.csv')

db.close()
For simplicity, this intentionally doesn't include some fancier stuff from another answer like using an environment variable for credentials, contextlib, etc. There is a subtlety mentioned there about line endings that I haven't evaluated.
A tiny Bash script for doing simple query-to-CSV dumps, inspired by Tim Harding's answer.
#!/bin/bash

# $1 = query to execute
# $2 = outfile
# $3 = mysql database name
# $4 = mysql username

if [ -z "$1" ]; then
    echo "Query not given"
    exit 1
fi

if [ -z "$2" ]; then
    echo "Outfile not given"
    exit 1
fi

MYSQL_DB=""
MYSQL_USER="root"

if [ ! -z "$3" ]; then
    MYSQL_DB=$3
fi

if [ ! -z "$4" ]; then
    MYSQL_USER=$4
fi

if [ -z "$MYSQL_DB" ]; then
    echo "Database name not given"
    exit 1
fi

if [ -z "$MYSQL_USER" ]; then
    echo "Database user not given"
    exit 1
fi

mysql -u $MYSQL_USER -p -D $MYSQL_DB -B -s -e "$1" | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > $2

echo "Written to $2"
If you are getting a secure-file-priv error even after moving your destination file into C:\ProgramData\MySQL\MySQL Server 8.0\Uploads, and the following query -
SELECT * FROM attendance INTO OUTFILE 'C:\ProgramData\MySQL\MySQL Server 8.0\Uploads\FileName.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
is still not working, just change the backslashes (\) in the path to forward slashes (/).
And that works!
Example:
SELECT * FROM attendance INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/FileName.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Each time the query runs successfully, it will generate a new CSV file.
Cool, right?
The following Bash script works for me. It optionally also gets the schema for the requested tables.
#!/bin/bash
#
# Export MySQL data to CSV
#https://stackoverflow.com/questions/356578/how-to-output-mysql-query-results-in-csv-format
#
# ANSI colors
#http://www.csc.uvic.ca/~sae/seng265/fall04/tips/s265s047-tips/bash-using-colors.html
blue='\033[0;34m'
red='\033[0;31m'
green='\033[0;32m' # '\e[1;32m' is too bright for white bg.
endColor='\033[0m'
#
# A colored message
# params:
# 1: l_color - the color of the message
# 2: l_msg - the message to display
#
color_msg() {
    local l_color="$1"
    local l_msg="$2"
    echo -e "${l_color}$l_msg${endColor}"
}
#
# Error
#
# Show the given error message on standard error and exit
#
# Parameters:
# 1: l_msg - the error message to display
#
error() {
    local l_msg="$1"
    # Use ANSI red for error
    color_msg $red "Error:" 1>&2
    color_msg $red "\t$l_msg" 1>&2
    usage
}
#
# Display usage
#
usage() {
    echo "usage: $0 [-h|--help]" 1>&2
    echo " -o | --output csvdirectory" 1>&2
    echo " -d | --database database" 1>&2
    echo " -t | --tables tables" 1>&2
    echo " -p | --password password" 1>&2
    echo " -u | --user user" 1>&2
    echo " -hs | --host host" 1>&2
    echo " -gs | --get-schema" 1>&2
    echo "" 1>&2
    echo " output: output CSV directory to export MySQL data into" 1>&2
    echo "" 1>&2
    echo " user: MySQL user" 1>&2
    echo " password: MySQL password" 1>&2
    echo "" 1>&2
    echo " database: target database" 1>&2
    echo " tables: tables to export" 1>&2
    echo " host: host of target database" 1>&2
    echo "" 1>&2
    echo " -h|--help: show help" 1>&2
    exit 1
}
#
# show help
#
help() {
    echo "$0 Help" 1>&2
    echo "===========" 1>&2
    echo "$0 exports a CSV file from a MySQL database optionally limiting to a list of tables" 1>&2
    echo " example: $0 --database=cms --user=scott --password=tiger --tables=person --output person.csv" 1>&2
    echo "" 1>&2
    usage
}

domysql() {
    mysql --host $host -u$user --password=$password $database
}

getcolumns() {
    local l_table="$1"
    echo "describe $l_table" | domysql | cut -f1 | grep -v "Field" | grep -v "Warning" | paste -sd "," - 2>/dev/null
}
host="localhost"
mysqlfiles="/var/lib/mysql-files/"
# Parse command line options
while true; do
    #echo "option $1"
    case "$1" in
        # Options without arguments
        -h|--help) usage;;
        -d|--database) database="$2" ; shift ;;
        -t|--tables) tables="$2" ; shift ;;
        -o|--output) csvoutput="$2" ; shift ;;
        -u|--user) user="$2" ; shift ;;
        -hs|--host) host="$2" ; shift ;;
        -p|--password) password="$2" ; shift ;;
        -gs|--get-schema) option="getschema";;
        (--) shift; break;;
        (-*) echo "$0: error - unrecognized option $1" 1>&2; usage;;
        (*) break;;
    esac
    shift
done
# Checks
if [ "$csvoutput" == "" ]
then
error "output CSV directory is not set"
fi
if [ "$database" == "" ]
then
error "MySQL database is not set"
fi
if [ "$user" == "" ]
then
error "MySQL user is not set"
fi
if [ "$password" == "" ]
then
error "MySQL password is not set"
fi
color_msg $blue "exporting tables of database $database"
if [ "$tables" = "" ]
then
tables=$(echo "show tables" | domysql)
fi
case $option in
    getschema)
        rm $csvoutput$database.schema
        for table in $tables
        do
            color_msg $blue "getting schema for $table"
            echo -n "$table:" >> $csvoutput$database.schema
            getcolumns $table >> $csvoutput$database.schema
        done
        ;;
    *)
        for table in $tables
        do
            color_msg $blue "exporting table $table"
            cols=$(grep "$table:" $csvoutput$database.schema | cut -f2 -d:)
            if [ "$cols" = "" ]
            then
                cols=$(getcolumns $table)
            fi
            ssh $host rm $mysqlfiles/$table.csv
            cat <<EOF | mysql --host $host -u$user --password=$password $database
SELECT $cols FROM $table INTO OUTFILE '$mysqlfiles$table.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
EOF
            scp $host:$mysqlfiles/$table.csv $csvoutput$table.csv.raw
            (echo "$cols"; cat $csvoutput$table.csv.raw) > $csvoutput$table.csv
            rm $csvoutput$table.csv.raw
        done
        ;;
esac

How to prevent SQL injection while executing a MySQL statement through shell?

I have this code:
printf '%s' 'Enter deployment request ID: '
read request_id
[[ $request_id ]] || { printf '%s' 'Request ID is required' >&2; exit 2; }
...
mysql -h "$db_host" -u app_user --database dep_db -p -sNE "
update dep_requests set
state='FAILED', end_time=sysdate(), message='Cancelled manually'
where id='$request_id' limit 1;
"
Since request_id, which is a string, is received as a user input, it could possibly lead to SQL injection. What is the best way to make this code free from that vulnerability?
I can possibly validate the input with a regex match. Are there better ways?
Some people propose to do this using SET syntax:
SET @request_id = '$request_id';
update dep_requests set
  state='FAILED', end_time=sysdate(),
  message='Cancelled manually'
where id=@request_id limit 1;
The $request_id is a reference to your Bash variable. If you paste this into a double-quoted Bash string, it ought to work.
Since we are still pasting strings together: if you escape single quotes with \' in $request_id, I think an attacker can't break out of the quoting. E.g.: request_id="${request_id//\'/\\\'}"
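Putting those two suggestions together, a minimal sketch (it assumes the server is not running with NO_BACKSLASH_ESCAPES, and it escapes backslashes before single quotes so the added escaping cannot itself be undone):
# escape backslashes first, then single quotes, in the user-supplied value
request_id_esc="${request_id//\\/\\\\}"
request_id_esc="${request_id_esc//\'/\\\'}"
mysql -h "$db_host" -u app_user --database dep_db -p -sNE "
  SET @request_id = '$request_id_esc';
  update dep_requests set
    state='FAILED', end_time=sysdate(), message='Cancelled manually'
  where id=@request_id limit 1;
"
Validating the input against a strict pattern (e.g. digits only), as you suggest, is still a good additional guard.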

jq --arg variable used in quoted string within select()

I want to select() an object based on a string containing a jq variable ($ARCH), using the --arg jq argument. Here's the use case while looking for "/bin/linux/$ARCH/kubeadm" from Google...
You may need to install xml2json first, i.e. sudo gem install --no-rdoc --no-ri xml2json, and then run the script I wrote to do the xml2json conversion:
#!/usr/bin/ruby
# Written by Jim Conner

require 'xml2json'

xml = ARGV[0]

begin
  if xml == '-'
    xdata = ARGF.read.chomp
    puts XML2JSON.parse(xdata)
  else
    puts XML2JSON.parse(File.read(xml).chomp)
  end
rescue => e
  $stderr.puts 'Unable to comply: %s' % [e.message]
end
Then run the following:
curl -sSL https://storage.googleapis.com/kubernetes-release/ > /var/tmp/k8s.xml | \
xml2json - | \
jq --arg ARCH amd64 '[.ListBucketResult.Contents[] | select(.Key | contains("/bin/linux/$arch/kubeadm"))]'
...which returns an empty set because jq doesn't interpolate variables inside quoted strings. I know I can get around this by using multiple select()/contains() calls, but I'd prefer not to if possible.
jq simply may not do it, but if someone knows a way to do it, I'd much appreciate it.
jq does support string interpolation, and in your case the string would be:
"/bin/linux/\($ARCH)/kubeadm"
Notice that this is not a JSON string: the occurrence of "\(" signals that the string is subject to interpolation. Very nifty.
(Alternatively, you could of course use string concatenation:
"/bin/linux/" + $ARCH + "/kubeadm")
Btw, you might wish to avoid contains here. Its semantics is (are?) quite complex and perhaps counter-intuitive. Consider using startswith, index, or (for regex matches) test.
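Putting that together with the pipeline in the question, the filter would look something like this (a sketch; it keeps contains for brevity, but test or index would work as well, per the note above):
jq --arg ARCH amd64 '[.ListBucketResult.Contents[] | select(.Key | contains("/bin/linux/\($ARCH)/kubeadm"))]'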

Build a JSON string with Bash variables

I need to read these Bash variables into my JSON string, and I am not familiar with Bash. Any help is appreciated.
#!/bin/sh
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING='{"bucketname":"$BUCKET_NAME"","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}'
echo $JSON_STRING
You are better off using a program like jq to generate the JSON, if you don't know ahead of time if the contents of the variables are properly escaped for inclusion in JSON. Otherwise, you will just end up with invalid JSON for your trouble.
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING=$( jq -n \
--arg bn "$BUCKET_NAME" \
--arg on "$OBJECT_NAME" \
--arg tl "$TARGET_LOCATION" \
'{bucketname: $bn, objectname: $on, targetlocation: $tl}' )
You can use printf:
JSON_FMT='{"bucketname":"%s","objectname":"%s","targetlocation":"%s"}\n'
printf "$JSON_FMT" "$BUCKET_NAME" "$OBJECT_NAME" "$TARGET_LOCATION"
Much clearer and simpler.
A possibility:
#!/bin/bash
BUCKET_NAME="testbucket"
OBJECT_NAME="testworkflow-2.0.1.jar"
TARGET_LOCATION="/opt/test/testworkflow-2.0.1.jar"
# one line
JSON_STRING='{"bucketname":"'"$BUCKET_NAME"'","objectname":"'"$OBJECT_NAME"'","targetlocation":"'"$TARGET_LOCATION"'"}'
# multi-line
JSON_STRING="{
\"bucketname\":\"${BUCKET_NAME}\",
\"objectname\":\"${OBJECT_NAME}\",
\"targetlocation\":\"${TARGET_LOCATION}\"
}"
# [optional] validate the string is valid json
echo "${JSON_STRING}" | jq
In addition to chepner's answer, it's also possible to construct the object completely from args with this simple recipe:
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING=$(jq -n \
--arg bucketname "$BUCKET_NAME" \
--arg objectname "$OBJECT_NAME" \
--arg targetlocation "$TARGET_LOCATION" \
'$ARGS.named')
Explanation:
--null-input | -n disables reading input. From the man page: Don't read any input at all! Instead, the filter is run once using null as the input. This is useful when using jq as a simple calculator or to construct JSON data from scratch.
--arg name value passes values to the program as predefined variables: value is available as $name. All named arguments are also available as $ARGS.named
Because the format of $ARGS.named is already an object, jq can output it as is.
First, don't use ALL_CAPS_VARNAMES: it's too easy to accidentally overwrite a crucial shell variable (like PATH)
Mixing single and double quotes in shell strings can be a hassle. In this case, I'd use printf:
bucket_name=testbucket
object_name=testworkflow-2.0.1.jar
target_location=/opt/test/testworkflow-2.0.1.jar
template='{"bucketname":"%s","objectname":"%s","targetlocation":"%s"}'
json_string=$(printf "$template" "$bucket_name" "$object_name" "$target_location")
echo "$json_string"
For homework, read this page carefully: Security implications of forgetting to quote a variable in bash/POSIX shells
A note on creating JSON with string concatenation: there are edge cases. For example, if any of your strings contain double quotes, you can end up with broken JSON:
$ bucket_name='a "string with quotes"'
$ printf '{"bucket":"%s"}\n' "$bucket_name"
{"bucket":"a "string with quotes""}
To do this more safely with bash, we need to escape that string's double quotes:
$ printf '{"bucket":"%s"}\n' "${bucket_name//\"/\\\"}"
{"bucket":"a \"string with quotes\""}
I had to work out all the possible ways to deal with JSON strings in a command request. Please look at the following code to see why using single quotes can fail if used incorrectly.
# Create Release and Tag commit in Github repository
# returns string with in-place substituted variables
json=$(cat <<-END
{
"tag_name": "${version}",
"target_commitish": "${branch}",
"name": "${title}",
"body": "${notes}",
"draft": ${is_draft},
"prerelease": ${is_prerelease}
}
END
)
# returns raw string without any substitutions
# single or double quoted delimiter - check HEREDOC specs
json=$(cat <<-"END" # or 'END'
{
"tag_name": "${version}",
"target_commitish": "${branch}",
"name": "${title}",
"body": "${notes}",
"draft": ${is_draft},
"prerelease": ${is_prerelease}
}
END
)
# prints fully formatted string with substituted variables as follows:
echo "${json}"
{
"tag_name" : "My_tag",
"target_commitish":"My_branch"
....
}
Note 1: Use of single vs double quotes
# enclosing in single quotes means no variable substitution
# (treats everything as raw char literals)
echo '${json}'
${json}
echo '"${json}"'
"${json}"
# enclosing in single quotes and outer double quotes causes
# variable expansion surrounded by single quotes(treated as raw char literals).
echo "'${json}'"
'{
"tag_name" : "My_tag",
"target_commitish":"My_branch"
....
}'
Note 2: Caution with line terminators
Note that the JSON string is formatted with line terminators such as LF \n
or carriage return \r (if it's encoded on Windows, it contains CRLF \r\n).
Using the tr (translate) utility from the shell, we can remove the line terminators, if any:
# following code serializes json and removes any line terminators
# in substituted value/object variables too
json=$(echo "$json" | tr -d '\n' | tr -d '\r' )
# string enclosed in single quotes are still raw literals
echo '${json}'
${json}
echo '"${json}"'
"${json}"
# After CRLF/LF are removed
echo "'${json}'"
'{ "tag_name" : "My_tag", "target_commitish":"My_branch" .... }'
Note 3: Formatting
While manipulating a JSON string with variables, we can use a combination of ' and " such as the following, if we want to protect some raw literals while using outer double quotes for in-place substitution/string interpolation:
# mixing ' and "
username=admin
password=pass
echo "$username:$password"
admin:pass
echo "$username"':'"$password"
admin:pass
echo "$username"'[${delimiter}]'"$password"
admin[${delimiter}]pass
Note 4: Using in a command
The following curl request uses the serialized JSON (with the line terminators already removed):
response=$(curl -i \
--user ${username}:${api_token} \
-X POST \
-H 'Accept: application/vnd.github.v3+json' \
-d "$json" \
"https://api.github.com/repos/${username}/${repository}/releases" \
--output /dev/null \
--write-out "%{http_code}" \
--silent
)
So when using it in command variables, validate that it is properly formatted before using it :)
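For example, a quick validity check with jq before firing the request (a sketch; jq exits non-zero when its input is not valid JSON):
# jq empty parses the input and produces no output; a non-zero exit means invalid JSON
if ! printf '%s' "$json" | jq empty 2>/dev/null; then
    echo "payload is not valid JSON" >&2
    exit 1
fi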
If you need to build a JSON representation where members mapped to undefined or empty variables should be omitted, then jo can help.
#!/bin/bash
BUCKET_NAME=testbucket
OBJECT_NAME=""
JO_OPTS=()

if [[ ! "${BUCKET_NAME}x" = "x" ]] ; then
    JO_OPTS+=("bucketname=${BUCKET_NAME}")
fi

if [[ ! "${OBJECT_NAME}x" = "x" ]] ; then
    JO_OPTS+=("objectname=${OBJECT_NAME}")
fi

if [[ ! "${TARGET_LOCATION}x" = "x" ]] ; then
    JO_OPTS+=("targetlocation=${TARGET_LOCATION}")
fi

jo "${JO_OPTS[@]}"
The output of the commands above would be just the following (note the absence of the objectname and targetlocation members):
{"bucketname":"testbucket"}
It can be done the following way:
JSON_STRING='{"bucketname":"'$BUCKET_NAME'","objectname":"'$OBJECT_NAME'","targetlocation":"'$TARGET_LOCATION'"}'
For Node.js developers, or if you have a Node environment installed, you can try this:
JSON_STRING=$(node -e "console.log(JSON.stringify({bucketname: '$BUCKET_NAME', objectname: '$OBJECT_NAME', targetlocation: '$TARGET_LOCATION'}))")
The advantage of this method is that you can easily convert a very complicated JSON object (e.g. an object containing an array, or where you need an int value instead of a string) to a JSON string without worrying about invalid JSON errors.
The disadvantage is that it relies on a Node.js environment.
These solutions come a little late, but I think they are inherently simpler than previous suggestions (avoiding the complications of quoting and escaping).
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
# Initial unsuccessful solution
JSON_STRING='{"bucketname":"$BUCKET_NAME","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}'
echo $JSON_STRING
# If your substitution variables have NO whitespace this is sufficient
JSON_STRING=$(tr -d [:space:] <<JSON
{"bucketname":"$BUCKET_NAME","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}
JSON
)
echo $JSON_STRING
# If your substitution variables are more general and maybe have whitespace this works
JSON_STRING=$(jq -c . <<JSON
{"bucketname":"$BUCKET_NAME","objectname":"$OBJECT_NAME","targetlocation":"$TARGET_LOCATION"}
JSON
)
echo $JSON_STRING
#... A change in layout could also make it more maintainable
JSON_STRING=$(jq -c . <<JSON
{
"bucketname" : "$BUCKET_NAME",
"objectname" : "$OBJECT_NAME",
"targetlocation" : "$TARGET_LOCATION"
}
JSON
)
echo $JSON_STRING
To build upon Hao's answer using NodeJS: you can split up the lines, and use the -p option which saves having to use console.log.
JSON_STRING=$(node -pe "
JSON.stringify({
bucketname: process.env.BUCKET_NAME,
objectname: process.env.OBJECT_NAME,
targetlocation: process.env.TARGET_LOCATION
});
")
An inconvenience is that you need to export the variables beforehand, i.e.
export BUCKET_NAME=testbucket
# etc.
Note: You might be thinking, why use process.env? Why not just use single quotes and have bucketname: '$BUCKET_NAME', etc so bash inserts the variables? The reason is that using process.env is safer - if you don't have control over the contents of $TARGET_LOCATION it could inject JavaScript into your node command and do malicious things (by closing the single quote, e.g. the $TARGET_LOCATION string contents could be '}); /* Here I can run commands to delete files! */; console.log({'a': 'b. On the other hand, process.env takes care of sanitising the input.
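If you prefer not to export the variables, you can also set them just for the node invocation (a sketch; it is the same env-prefix idea used with node -p in a later answer in this thread):
JSON_STRING=$(
  BUCKET_NAME="$BUCKET_NAME" OBJECT_NAME="$OBJECT_NAME" TARGET_LOCATION="$TARGET_LOCATION" \
  node -pe "
    JSON.stringify({
      bucketname: process.env.BUCKET_NAME,
      objectname: process.env.OBJECT_NAME,
      targetlocation: process.env.TARGET_LOCATION
    });
  "
)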
You could use envsubst:
export VAR="some_value_here"
echo '{"test":"$VAR"}' | envsubst > json.json
also it might be a "template" file:
//json.template
{"var": "$VALUE", "another_var":"$ANOTHER_VALUE"}
So after you could do:
export VALUE="some_value_here"
export ANOTHER_VALUE="something_else"
cat json.template | envsubst > misha.json
For the general case of building JSON from Bash with arbitrary inputs, many of the previous responses (even the highly voted ones using jq) omit cases where the variables contain a " double quote or a \n newline escape sequence, and where you need complex string concatenation of the inputs.
When using jq, you need to printf %b the input first to get the \n converted to real newlines, so that once you pass the result through jq you get \n back and not \\n.
I found this with version with nodejs to be quite easy to reason about if you know javascript/nodejs well:
TITLE='Title'
AUTHOR='Bob'
JSON=$( TITLE="$TITLE" AUTHOR="$AUTHOR" node -p 'JSON.stringify( {"message": `Title: ${process.env.TITLE}\n\nAuthor: ${process.env.AUTHOR}`} )' )
It's a bit verbose due to process.env, but it allows you to properly pass the variables from the shell and then format things inside the (Node.js) backticks in a safe way.
This outputs:
printf "%s\n" "$JSON"
{"message":"Title: Title\n\nAuthor: Bob"}
(Note: when having a variable with \n always use printf "%s\n" "$VAR" and not echo "$VAR", whose output is platform-dependent! See here for details)
Similar thing with jq would be
TITLE='Title'
AUTHOR='Bob'
MESSAGE="Title: ${TITLE}\n\nAuthor: ${AUTHOR}"
MESSAGE_ESCAPED_FOR_JQ=$(printf %b "${MESSAGE}")
JSON=$( jq '{"message": $jq_msg}' --arg jq_msg "$MESSAGE_ESCAPED_FOR_JQ" --null-input --compact-output --raw-output --monochrome-output )
(The last two params are not necessary when running in a subshell, but I just added them so that the output is the same when you run the jq command in a top-level shell.)
Bash will not substitute variables into a single-quoted string. To get the variables substituted, Bash needs a double-quoted string.
You need to use a double-quoted string for the JSON and escape the double-quote characters inside the JSON string.
Example:
#!/bin/sh
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
JSON_STRING="{\"bucketname\":\"$BUCKET_NAME\",\"objectname\":\"$OBJECT_NAME\",\"targetlocation\":\"$TARGET_LOCATION\"}"
echo $JSON_STRING
If you have Node.js and minimist installed globally:
jc() {
node -p "JSON.stringify(require('minimist')(process.argv), (k,v) => k=='_'?undefined:v)" -- "$#"
}
jc --key1 foo --number 12 --boolean \
--under_score 'abc def' --'white space' ' '
# {"key1":"foo","number":12,"boolean":true,"under_score":"abc def","white space":" "}
You can then post it with curl or similar:
curl --data "$(jc --type message --value 'hello world!')" \
--header 'content-type: application/json' \
http://server.ip/api/endpoint
Be careful: minimist will parse dots:
jc --m.room.member @gholk:ccns.io
# {"m":{"room":{"member":"@gholk:ccns.io"}}}
Used this for AWS Macie configuration:
JSON_CONFIG=$( jq -n \
--arg bucket_name "$BUCKET_NAME" \
--arg kms_key_arn "$KMS_KEY_ARN" \
'{"s3Destination":{"bucketName":$bucket_name,"kmsKeyArn":$kms_key_arn}}'
)
aws macie2 put-classification-export-configuration --configuration "$JSON_CONFIG"
You can simply make a call like this to print the JSON.
#!/bin/sh
BUCKET_NAME=testbucket
OBJECT_NAME=testworkflow-2.0.1.jar
TARGET_LOCATION=/opt/test/testworkflow-2.0.1.jar
echo '{ "bucketName": "'"$BUCKET_NAME"'", "objectName": "'"$OBJECT_NAME"'", "targetLocation": "'"$TARGET_LOCATION"'" }'
or
JSON_STRING='{ "bucketName": "'"$BUCKET_NAME"'", "objectName": "'"$OBJECT_NAME"'", "targetLocation": "'"$TARGET_LOCATION"'" }'
echo $JSON_STRING

How can I regex ,, to ,\N, in my CSVs so that mysqlimport understands them?

Say I have a normal CSV like
# helloworld.csv
hello,world,,,"please don't replace quoted stuff like ,,",,
If I want mysqlimport to understand that some of those fields are NULL, then I need:
# helloworld.mysql.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,,",\N,\N
I got some help from another question -- Why does sed not replace overlapping patterns -- but note the problem:
$ perl -pe 'while (s#,,#,\\N,#) {}' -pe 's/,$/,\\N/g' helloworld.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,\N,",\N,\N
^^
How can I write the regex so it doesn't replace ,, if they're between quotes?
FINAL ANSWER
Here's the final perl I used, thanks to the accepted answer below:
perl -pe 's/^,/\\N,/; while (s/,(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)/,\\N/g) {}; s/,$/,\\N/' helloworld.csv
That takes care of leading, trailing, and unquoted empty strings.
Why not use Text::CSV? You can parse the file with it and then use map to replace empty fields with '\N', e.g.
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag();
$csv->parse($line); # parse a CSV string into fields
my @fields = $csv->fields(); # get the parsed fields
@fields = map { $_ eq "" ? '\N' : $_ } @fields;
$csv->combine(@fields); # combine fields into a string
Assuming that you won't have escaped quotes, you can make sure that you only replace ,, if it's followed by an even number of quotes:
$subject =~
s/, # Match ,
(?=,) # only if followed by another ,
(?= # and only if followed by...
(?: # the following group:
[^"]*" # any number of non-quote characters, followed by one quote
[^"]*" # the same thing again (even number!)
)* # any number of times, followed by
[^"]* # any number of non-quotes until...
$ # end of string.
) # End of lookahead assertion
/,\N/xg;
Input:
foo,,bar,,,baz,"foo,,,oof",zap,,zip
Output:
foo,\N,bar,\N,\N,baz,"foo,,,oof",zap,\N,zip