MySQL LOAD DATA from PowerShell with a variable

I am trying to insert data from a CSV file into a MySQL database using a PowerShell script. When I use a (dummy) variable in the LOAD DATA query, I run into trouble.
Reproducible example:
Create a MySQL database and table with
CREATE DATABASE loadfiletest;
USE loadfiletest;
CREATE TABLE testtable (field1 INT, field2 INT DEFAULT 0);
Create a CSV file named loadfiletestdata.csv containing
1,3
2,4
Create the PowerShell script (don't forget to change the database password and possibly the username)
[system.reflection.assembly]::LoadWithPartialName("MySql.Data")
$mysqlConn = New-Object -TypeName MySql.Data.MySqlClient.MySqlConnection
$mysqlConn.ConnectionString = "SERVER=localhost;DATABASE=loadfiletest;UID=root;PWD=pwd"
$mysqlConn.Open()
$MysqlQuery = New-Object -TypeName MySql.Data.MySqlClient.MySqlCommand
$MysqlQuery.Connection = $mysqlConn
$MysqlQuery.CommandText = "LOAD DATA LOCAL INFILE 'C:/path/to/files/loadfiletestdata.csv' INTO TABLE loadfiletest.testtable FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '""' LINES TERMINATED BY '\r\n' (field1, field2)"
$MysqlQuery.ExecuteNonQuery()
Put everything in the folder C:/path/to/files/ (which should also be the path in the PowerShell script) and run the script. This populates the table testtable with
field1 field2
1 3
2 4
as one would expect. This implies that the quoting and such are as they should be. Each time the script is executed, those values are inserted into the table. Now, when I replace (field1, field2) in the second-to-last line of the PowerShell script with (field1, @dummy), I would expect that the values
field1 field2
1 0
2 0
are inserted into the table. However, I receive the error
Exception calling "ExecuteNonQuery" with "0" argument(s): "Fatal error encountered during command execution."
At C:\path\to\files\loadfiletest.ps1:8 char:1
+ $queryOutput = $MysqlQuery.ExecuteNonQuery()
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : MySqlException
When I run the query with @dummy from a MySQL client, it works. The syntax also looks the same to me as what can be found in the MySQL manual (somewhere in the middle of the page, look for @dummy).
A few further experiments I did suggest that any LOAD DATA query containing a variable @whatever gives this error.
So the questions:
Why doesn't it work?
Is there a way to execute a LOAD DATA query with (dummy) variables from powershell?
If not, is there an elegant workaround?
Obvious workarounds are creating an intermediate CSV file matching the layout of the table, or creating an intermediate table matching the layout of the CSV file. However, that seems ugly and cumbersome for something that, imho, should "just work".
Note: The present question is a follow-up and generalization of this question. I chose to start a new one, since replacing the old content would make the answers already given obsolete, and adding the content of this question would make the old question very long and full of useless sidetracks.

I know this is old, but I had the same problem and I found the solution here:
http://blog.tjitjing.com/index.php/2009/05/mysqldatamysqlclientmysqlexception-parameter-id-must-be-defined.html
Quoting from the above blog:
"Starting from version 5.2.2 of the Connector you should add the Allow User Variables=True Connection String Setting in order to use User Defined Variables in your SQL statements.
Example of Connection String:
Database=testdb;Data Source=localhost;User Id=root;Password=hello;Allow User Variables=True"
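Applied to the script above, the PowerShell connection string would then presumably read:
SERVER=localhost;DATABASE=loadfiletest;UID=root;PWD=pwd;Allow User Variables=True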

Related

Parametric query when using 'load data infile'

I use parameterized queries for normal inserts/updates, for security.
How do I do that for a query like this:
LOAD DATA INFILE '/filepath' INTO TABLE mytable
In my case, the path to the file is different every time (for different requests). Is it fine to proceed like this, given that I am not getting any data from outside and the file is on the server itself:
path = /filepath
"LOAD DATA INFILE" + path + "INTO TABLE mytable"
Since LOAD DATA is not listed under SQL Syntax Allowed in Prepared Statements, you can't prepare something like
LOAD DATA INFILE ? INTO TABLE mytable
But SET is listed. So a workaround could be to prepare and execute
SET @filepath = ?
And then execute
LOAD DATA INFILE @filepath INTO TABLE mytable
Update
In Python with MySQLdb the following query should work
LOAD DATA INFILE %s INTO TABLE mytable
since MySQLdb does not use server-side prepared statements; the driver escapes and quotes the parameter client-side.
To answer your "is it fine to proceed like this" question: your example code will fail because the resulting query will be missing quotes around the filename. If you changed it to the following, it could run, but it is still a bad idea IMO:
path = "/filepath"
sql = "LOAD DATA INFILE '" + path + "' INTO TABLE mytable" # note the single quotes
While you may not be accepting outside input today, code has a way of sticking around and getting reused/copied, so you should use the API in a way that will escape your parameters:
sql = "LOAD DATA INFILE %s INTO TABLE mytable"
cursor.execute(sql, (path,))
And don't forget to commit if autocommit is not enabled.
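Put together, a minimal end-to-end sketch (assuming the MySQLdb driver, a table named mytable, and a file the server can read; connection values and paths are illustrative):
import MySQLdb

# Illustrative connection values; adjust for your server.
conn = MySQLdb.connect(host="localhost", user="root", passwd="pwd", db="mydb")
cursor = conn.cursor()

path = "/var/lib/mysql-files/data.csv"  # hypothetical file path
# MySQLdb interpolates the parameter client-side, escaped and quoted,
# so the server receives an ordinary string literal.
cursor.execute("LOAD DATA INFILE %s INTO TABLE mytable", (path,))

conn.commit()  # needed if autocommit is off
conn.close()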

PostgreSQL multiple CSV import and add filename to each column

I've got 200k CSV files and I need to import them all into a single PostgreSQL table. It's a list of parameters from various devices; each CSV's file name contains the device's serial number, and I need that to be in one of the columns of each row.
So to simplify: I've got a few columns of data (no headers). Let's say the columns in each CSV file are Date, Variable, Value, and the file name is SERIALNUMBER_and_someOtherStuffIDontNeed.csv.
I'm trying to use Cygwin to write a bash script that iterates over the files and does it for me, but for some reason it won't work, showing: syntax error at or near "as"
Here's my code:
#!/bin/bash
FILELIST=/cygdrive/c/devices/files/*
for INPUT_FILE in $FILELIST
do
psql -U postgres -d devices -c "copy devicelist
(
Date,
Variable,
Value,
SN as CURRENT_LOAD_SOURCE(),
)
from '$INPUT_FILE
delimiter ',' ;"
done
I'm learning SQL, so it might be an obvious mistake, but I can't see it.
Also, I know that in this form I will get the full file name, not just the serial-number bit I want, but I can probably handle that somehow later.
Please advise.
Thanks.
I don't think there is a CURRENT_LOAD_SOURCE() function in Postgres. A work-around is to leave the name column NULL on the COPY, and patch it to the desired value just after the COPY. I prefer a shell here-document because that makes quoting inside the SQL body easier. (BTW: for 10k files, the globbing needed to obtain FILELIST might exceed ARG_MAX for the shell...)
#!/bin/bash
FILELIST="`ls /tmp/*.c`"
for INPUT_FILE in $FILELIST
do
echo "File:" $INPUT_FILE
psql -U postgres -d devices <<OMG
-- I have a schema "tmp" for testing purposes
CREATE TABLE IF NOT EXISTS tmp.filelist(name text, content text);
COPY tmp.filelist ( content)
from '$INPUT_FILE' delimiter ',' ;
UPDATE tmp.filelist SET name = '$INPUT_FILE'
WHERE name IS NULL;
OMG
done
For anyone interested in an answer: I used a Python script to change the file names, and then another script using psycopg2 to connect to the database and do everything in one connection. It took 10 minutes instead of 10 hours.
Here's the code:
Renaming the files (apparently, to import from a CSV you need all the rows to be filled, and the information I needed was in the first 4 columns anyway, so I put together a solution that generates whole new CSVs instead of just renaming them):
import os
import csv

path = 'C:/devices/files'
os.chdir(path)
i = 0
for file in os.listdir(path):
    try:
        i += 1
        if i % 10000 == 0:
            # just to see the progress
            print(i)
        serial_number = file[:8]
        creader = csv.reader(open(file))
        # newline='' avoids the blank rows csv.writer otherwise produces on Windows
        cwriter = csv.writer(open('processed_' + file, 'w', newline=''))
        for cline in creader:
            new_line = [val for col, val in enumerate(cline) if col not in (4, 5, 6, 7)]
            new_line.insert(0, serial_number)
            cwriter.writerow(new_line)
    except:
        print('problem with file: ' + file)
Updating the database:
import os
import psycopg2

path = "C:\\devices\\files"
directory_listing = os.listdir(path)
conn = psycopg2.connect("dbname='devices' user='postgres' host='localhost'")
cursor = conn.cursor()
print(len(directory_listing))
i = 100001
while i < 218792:
    current_file = directory_listing[i]
    i += 1
    full_path = "C:/devices/files/" + current_file
    with open(full_path) as f:
        cursor.copy_from(file=f, table='devicelistlive', sep=",")
    conn.commit()
conn.close()
Don't mind the while loop and the weird numbers; that's just because I was importing in portions for testing purposes. It can easily be replaced with a for loop, as sketched below.
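A minimal sketch of that for-loop version (untested, assuming the same table and paths as above):
import os
import psycopg2

path = "C:\\devices\\files"
conn = psycopg2.connect("dbname='devices' user='postgres' host='localhost'")
cursor = conn.cursor()
for current_file in os.listdir(path):
    # stream each CSV into the table, then commit per file
    with open(os.path.join(path, current_file)) as f:
        cursor.copy_from(file=f, table='devicelistlive', sep=",")
    conn.commit()
conn.close()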

MySQL File or Directory not Found ODBC

I am writing a program that does data transformations via MySQL, and it deals with big files.
I asked a question earlier about another issue I was having; while I was trying out someone's answer, I got the following error:
[MySQL][ODBC 5.3(a) Driver][mysqld-5.5.5-10.1.9-MariaDB]File 'C:\xampp\mysql\data\ingram\' not found (Errcode: 2 "No such file or directory")
I am certain that directory exists, and when I change the code back to its original state it works perfectly.
What is going on there?
This is the piece of code that gives me the problem:
Cmd.CommandText = String.Format("LOAD DATA INFILE ""{0}"" IGNORE INTO TABLE libros_nueva FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '""' ESCAPED BY '""' LINES TERMINATED BY '\r\n';", filepath)
Cmd.Execute()
Any help will be appreciated!
Given the salient portion of the error message:
File 'C:\xampp\mysql\data\ingram\' not found (Errcode: 2 "No such file or directory")
I am pretty sure you are passing just a path when a full path and file name are required. There is certainly no file name in the path it echoed back.
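As a quick sanity check before handing the value to LOAD DATA, verify that what you built really points at a file rather than a directory (an illustrative sketch in Python; the names are made up):
import os

directory = r"C:\xampp\mysql\data\ingram"  # hypothetical directory
filename = "books.csv"                     # hypothetical file name
filepath = os.path.join(directory, filename)

# LOAD DATA needs the full path *including* the file name.
assert os.path.isfile(filepath), "not a file: " + filepath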
Can you please explain it [MySqlBulkLoader] to me?
Another way to import is to use MySqlBulkLoader from the MySql.Data.MySqlClient namespace:
' columns in the order they appear in the CSV file:
Dim cols As String() = {"Name", "Descr", "`Group`", "ValueA",
"Bird", "Fish", "zDate", "Color", "Active"}
Dim csvFile As String = "C:\Temp\mysqlImport.csv"
Dim rows As Int32
Using dbcon As New MySqlConnection(MySQLConnStr)
Dim bulk = New MySqlBulkLoader(dbcon)
bulk.TableName = "importer"
bulk.FieldTerminator = "," ' this is a CSV
bulk.LineTerminator = "\r\n" ' == CR/LF
bulk.FileName = csvFile ' full file path name to CSV
bulk.NumberOfLinesToSkip = 0 ' has a header?
bulk.Columns.Clear()
For Each s In cols
bulk.Columns.Add(s) ' tell MySQL the order
Next
rows = bulk.Load() ' Make it so.
End Using
Times to import 100k rows: 3619, 2719 and 2987 ms. There is also a LoadAsync method which may be of interest given your last question.
If there are data transforms to do before the insert, CSVHelper can provide an easy way to load records so you can do whatever needs to be done, then use normal SQL Inserts to update the DB.
Part of this answer shows using CSVHelper to import into Access in batches of 50k, which was pretty fast.

use of DECLARE for MySQL LOAD DATA statement

I'm trying to run this query from a .NET application:
LOAD DATA LOCAL INFILE 'testsFile.txt'
INTO TABLE Test
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(idTest, SampleID, Analyst, @Analysed, Device, Comments, @TotalRUL, @RULOne, @RULTwo, @RULThree, @RULFour, Uploaded)
SET
Analysed = nullif(@Analysed,''),
TotalRUL = nullif(@TotalRUL,''),
RULOne = nullif(@RULOne,''),
RULTwo = nullif(@RULTwo,''),
RULThree = nullif(@RULThree,''),
RULFour = nullif(@RULFour,'')
When I run this query from MySQL Workbench everything works fine, but when I use my .net application to run the query I get the following exception:
Parameter '@Analysed' must be defined.
I don't think I can use a DECLARE statement outside of a stored procedure, and I can't use a stored procedure because of my use of the LOAD DATA statement.
What to do? Is this checkmate?
Sure you can't. If your query works in Workbench, this sounds like a .NET bug.
I suggest you try "stupid" solutions like using backticks (after the @ and after Analysed... sorry, Stack Overflow's autoformatting doesn't allow me to show you what I mean) or changing the variable's name.
How to use MySQL user-variables with ADO.NET
seems to have the answer to this.
I needed to add Allow User Variables=True to the connection string (the same setting mentioned in the first answer above).

Why does my INSERT sometimes fail with "no such field"?

I've been using the following snippet in developments for years. Now, all of a sudden, I get a DB Error: no such field warning:
$process = "process";
$create = $connection->query
(
"INSERT INTO summery (process) VALUES($process)"
);
if (DB::isError($create)) die($create->getMessage($create));
but it's fine if I use numerics
$process = "12345";
$create = $connection->query
(
"INSERT INTO summery (process) VALUES($process)"
);
if (DB::isError($create)) die($create->getMessage($create));
or write the value directly into the expression
$create = $connection->query
(
"INSERT INTO summery (process) VALUES('process')"
);
if (DB::isError($create)) die($create->getMessage($create));
I'm really confused ... any suggestions?
It's always better to use prepared queries and parameter placeholders. Like this in Perl DBI:
my $process=1234;
my $ins_process = $dbh->prepare("INSERT INTO summary (process) values(?)");
$ins_process->execute($process);
For best performance, prepare all your often-used queries right after opening the database connection. Many database engines will store them on the server during the session, much like small temporary stored procedures.
It's also very good for security. Writing the value into an INSERT string yourself means that you must get the escaping right in every SQL statement. Using the prepare-and-execute style means that only one place (execute) needs to know about escaping, if escaping is even necessary.
Ditto what Zan Lynx said about placeholders. But you may still be wondering why your code failed.
It appears that you forgot a crucial detail from the previous code that worked for you for years: quotes.
This (tested) code works fine:
my $thing = 'abcde';
my $sth = $dbh->prepare("INSERT INTO table1 (id,field1)
VALUES (3,'$thing')");
$sth->execute;
But this next code (lacking the quotation marks in the VALUES field, just as your first example does) produces the error you report, because VALUES (3,$thing) resolves to VALUES (3,abcde), causing your SQL server to look for a field called abcde, and there is no field by that name.
my $thing = 'abcde';
my $sth = $dbh->prepare("INSERT INTO table1 (id,field1)
VALUES (3,$thing)");
$sth->execute;
All of this assumes that your first example is not a direct quote of the code that failed as you describe, and is therefore not quite what you intended, because it resolves to:
"INSERT INTO summery (process) VALUES(process)"
which, as mentioned above, causes your SQL server to read the item in the VALUES set as another field name. As given, this actually runs on MySQL without complaint and fills the field called 'process' with NULL, because that's what the field called 'process' contained when MySQL looked there for a value as it created the new record.
I do use this style for quick throw-away hacks involving known, secure data (e.g. a value supplied within the program itself). But for anything involving data that comes from outside the program, or that might possibly contain characters other than [0-9a-zA-Z], using placeholders will save you grief.
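For completeness, here is the same placeholder pattern sketched in Python (assuming the MySQLdb driver and the summery table from the question; connection values are illustrative):
import MySQLdb

# Illustrative connection values; adjust for your server.
conn = MySQLdb.connect(host="localhost", user="root", passwd="pwd", db="mydb")
cursor = conn.cursor()

process = "process"  # an arbitrary string; the driver escapes and quotes it
cursor.execute("INSERT INTO summery (process) VALUES (%s)", (process,))

conn.commit()
conn.close()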