Dynamic Multi Insert with DBI placeholders for many sets of VALUES - mysql

I'm building a dynamic SQL statement that will insert one or more sets of VALUES via a prepared DBI statement. My question is this:
Since I have a dynamic number of VALUES sets, and I will need to append as many ( ?, ?, ? ),( ?, ?, ? ) groups as necessary to extend the statement INSERT INTO `tblname` ( $columnsString ) VALUES so that only one query is submitted using placeholders and bind values: is this the preferred method (most efficient, etc.; reasoning behind the efficiency would be helpful in your answer if possible), or should I just build the query string with sprintf and $dbh->quote()?
(As a little extra information: I'm actually using AnyEvent::DBI right now, which only exposes placeholders and bind values, not the quote() method, so this wouldn't be easy for me to accomplish without creating a separate plain DBI $dbh and using another db server connection just to use quote(), or without altering the AnyEvent::DBI module myself.)
Normally I would just execute the statements as necessary, but in this heavy-workload case I'm trying to batch inserts together for some DB efficiency.
Also, if anyone could explain whether it is possible (and if so, how) to insert an SQL DEFAULT value using placeholders and bind values, that'd be awesome. Typically, if I ever needed to do that, I'd append the DEFAULTs to the string directly and use sprintf and $dbh->quote() only for the non-DEFAULT values.
UPDATE:
Worked out the misunderstanding in a quick chat. User ikegami suggested that instead of building the query string myself without placeholders, I just intermingle VALUES and placeholders, such as:
$queryString .= '(DEFAULT,?,?),(DEFAULT,DEFAULT,DEFAULT)';
Part of the reason I first asked this question on SO was that I was somewhat against this intermingling, thinking it made the code less readable; but after being assured that an SQL DEFAULT can't be passed as a placeholder bind value, this was the method I began implementing.
Using placeholders where possible does seem to be the more accepted method of building queries, and if you want an SQL DEFAULT you just need to include it in the same query building as the placeholders. This does not apply to NULL values, as those CAN be inserted with placeholders and a bind value of undef.
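For reference, here is a minimal sketch of that intermingling, assuming a hypothetical tblname with columns name and score: scalar refs mark literal SQL tokens, everything else becomes a placeholder (and undef binds as NULL).

my @rows = (
    [ 'alice', 42 ],
    [ 'bob',   undef ],        # undef becomes NULL via a placeholder
    [ 'carol', \'DEFAULT' ],   # a scalar ref marks a literal SQL token
);

my (@tuples, @bind);
for my $row (@rows) {
    my @slots = map { ref $_ ? $$_ : do { push @bind, $_; '?' } } @$row;
    push @tuples, '(' . join(', ', @slots) . ')';
}

my $sql = 'INSERT INTO `tblname` (`name`, `score`) VALUES ' . join(',', @tuples);
$dbh->do($sql, undef, @bind);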
Update 2:
The reason I asked about performance and the 'acceptance' of building your own query with quote() versus building it with placeholders, and why I've gone with a solution that always lists all columns in the SQL INSERT INTO tblname (cols), is that I have roughly 2-4 million rows a day going into a terrible db server, and my code is running on an equally terrible server. With my requirement for SQL DEFAULT values and these terrible performance constraints, I've chosen a solution for now.
For future devs who stumble upon this: take a look at @emazep's solution of using SQL::Abstract, or, if for some reason you need to build your own, consider @Schwern's subroutine solution, possibly incorporating some of @ikegami's answer into it, as these are all great answers on the current state of affairs regarding the usage of DBI and building dynamic queries.

Unless there is a specific reason to reinvent the wheel (there could be some), SQL::Abstract (among others) has already solved the problem of dynamic SQL generation for all of us:
my %data = (
    name    => 'Jimbo Bobson',
    phone   => '123-456-7890',
    address => '42 Sister Lane',
    city    => 'St. Louis',
    state   => 'Louisiana',
);

use SQL::Abstract;
my ($stmt, @bind) = SQL::Abstract->new->insert('people', \%data);

print $stmt, "\n";
print join ', ', @bind;
which prints:
INSERT INTO people ( address, city, name, phone, state)
VALUES ( ?, ?, ?, ?, ? )
42 Sister Lane, St. Louis, Jimbo Bobson, 123-456-7890, Louisiana
SQL::Abstract then offers a nice trick to iterate over many rows to insert without regenerating the SQL every time, but for bulk inserts there is also SQL::Abstract::Plugin::InsertMulti:
use SQL::Abstract;
use SQL::Abstract::Plugin::InsertMulti;

my ($stmt, @bind) = SQL::Abstract->new->insert_multi( 'people', [
    { name => 'foo', age => 23 },
    { name => 'bar', age => 40 },
]);
# INSERT INTO people ( age, name ) VALUES ( ?, ? ), ( ?, ? )
# 23, foo, 40, bar
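Either generated statement can then be handed to DBI in the usual way; a sketch, assuming $dbh is an open DBI handle:

my $sth = $dbh->prepare($stmt);
$sth->execute(@bind);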

I have, on occasion, used a construct like:
#!/usr/bin/env perl

use strict; use warnings;

# ...

my @columns = ('a' .. 'z');

# Note: quote_identifier() is the right call for column names;
# quote() is for values.
my $sql = sprintf(q{INSERT INTO sometable (%s) VALUES (%s)},
    join(',', map $dbh->quote_identifier($_), @columns),
    join(',', ('?') x @columns),
);
As for handling DEFAULT, wouldn't leaving that column out ensure that the DB sets it to the default value?
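Indeed it would: any column omitted from the list gets its column default. A sketch, assuming a hypothetical sometable whose score column has a DEFAULT:

my @columns = ('name');   # 'score' is omitted, so it takes its DEFAULT
my $sql = sprintf(q{INSERT INTO sometable (%s) VALUES (%s)},
    join(',', map $dbh->quote_identifier($_), @columns),
    join(',', ('?') x @columns),
);
$dbh->do($sql, undef, 'alice');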

If you would use placeholders for "static" queries, you should use them for "dynamic" queries too. A query is a query.
my $stmt = 'UPDATE Widget SET foo=?';
my @params = $foo;

if ($set_far) {
    $stmt .= ', far=?';
    push @params, $far;
}

{
    my @where;
    if ($check_boo) {
        push @where, 'boo=?';
        push @params, $boo;
    }
    if ($check_bar) {
        push @where, 'bar=?';
        push @params, $bar;
    }
    $stmt .= ' WHERE ' . join ' AND ', map "($_)", @where
        if @where;
}

$dbh->do($stmt, undef, @params);
I used an UPDATE since it allowed me to demonstrate more, but everything applies to INSERT too.
my @fields = ('foo');
my @params = ($foo);

if ($set_far) {
    push @fields, 'far';
    push @params, $far;
}

$stmt = 'INSERT INTO Widget ('
      . join(',', @fields)
      . ') VALUES ('
      . join(',', ('?') x @fields)
      . ')';
$dbh->do($stmt, undef, @params);
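The same technique extends to the multi-row INSERT the question asks about: repeat one placeholder tuple per row and flatten the bind values. A sketch with hypothetical data:

my @rows  = ( [ 1, 'a' ], [ 2, 'b' ], [ 3, 'c' ] );
my $tuple = '(' . join(',', ('?') x @{ $rows[0] }) . ')';
my $stmt  = 'INSERT INTO Widget (foo, far) VALUES '
          . join(',', ($tuple) x @rows);
$dbh->do($stmt, undef, map { @$_ } @rows);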

You've expressed concerns about the readability of the code and also being able to pass in a DEFAULT. I'll take @ikegami's answer one step further...
sub insert {
    my($dbh, $table, $fields, $values) = @_;

    # quote_identifier() handles table and column names; quote() is for values.
    my $q_table      = $dbh->quote_identifier($table);
    my @q_fields     = map { $dbh->quote_identifier($_) } @$fields;
    my @placeholders = map { "?" } @q_fields;

    my $sql = qq{
        INSERT INTO $q_table
               ( @{[ join(', ', @q_fields) ]} )
        VALUES ( @{[ join(', ', @placeholders) ]} )
    };

    return $dbh->do($sql, undef, @$values);
}
Now you have a generic multi-value insert routine.
# INSERT INTO foo (`bar`, `baz`) VALUES ( 23, 42 )
insert( $dbh, "foo", ['bar', 'baz'], [23, 42] );
To indicate a default value, don't pass in that column.
# INSERT INTO foo (`bar`) VALUES ( 23 )
# `baz` will use its default
insert( $dbh, "foo", ['bar'], [23] );
You can optimize this to make your subroutine do multiple inserts with one subroutine call and one prepared statement, saving CPU on the client side (and maybe some on the database side, if it supports prepared handles).
sub insert {
    my($dbh, $table, $fields, @rows) = @_;

    my $q_table      = $dbh->quote_identifier($table);
    my @q_fields     = map { $dbh->quote_identifier($_) } @$fields;
    my @placeholders = map { "?" } @q_fields;

    my $sql = qq{
        INSERT INTO $q_table
               ( @{[ join(', ', @q_fields) ]} )
        VALUES ( @{[ join(', ', @placeholders) ]} )
    };

    my $sth = $dbh->prepare_cached($sql);
    for my $values (@rows) {
        $sth->execute(@$values);
    }
}
# INSERT INTO foo (`bar`, `baz`) VALUES ( 23, 42 )
# INSERT INTO foo (`bar`, `baz`) VALUES ( 99, 12 )
insert( $dbh, "foo", ['bar', 'baz'], [23, 42], [99, 12] );
Finally, you can write a bulk insert that passes multiple rows of values in a single statement. This is probably the most efficient way to do large groups of inserts, and it is where having a fixed set of columns and passing in a DEFAULT marker comes in handy. I've employed the idiom where values passed as scalar references are treated as raw SQL, so now you have the flexibility to pass in whatever you like.
sub insert {
    my($dbh, $table, $fields, @rows) = @_;

    my $q_table  = $dbh->quote_identifier($table);
    my @q_fields = map { $dbh->quote_identifier($_) } @$fields;

    my $sql = qq{
        INSERT INTO $q_table
               ( @{[ join(', ', @q_fields) ]} )
        VALUES
    };

    # This would be more elegant building an array and then joining it
    # together on ",\n", but that would double the memory usage and there
    # might be a lot of values.
    for my $values (@rows) {
        $sql .= "( ";
        # Scalar refs are treated as bare SQL; everything else gets quoted.
        $sql .= join ", ", map { ref $_ ? $$_ : $dbh->quote($_) } @$values;
        $sql .= "),\n";
    }
    $sql =~ s{,\n$}{};

    return $dbh->do($sql);
}

# INSERT INTO foo (`bar`, `baz`) VALUES ( 23, NOW() ), ( DEFAULT, 12 )
insert( $dbh, "foo", ['bar', 'baz'], [23, \"NOW()"], [\"DEFAULT", 12] );
The down side is that this builds a string in memory, possibly a very large one. To get around that you have to use your database's bulk-insert-from-file syntax.
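For MySQL, that usually means LOAD DATA INFILE. A minimal sketch, assuming local infile is enabled on both the client (mysql_local_infile) and the server:

use File::Temp ();

# Dump the rows to a temporary CSV file, then bulk-load it.
my $tmp = File::Temp->new( SUFFIX => '.csv' );
print {$tmp} "23,42\n99,12\n";
$tmp->flush;

$dbh->do(
    sprintf(
        q{LOAD DATA LOCAL INFILE %s INTO TABLE foo
          FIELDS TERMINATED BY ',' (bar, baz)},
        $dbh->quote( $tmp->filename )
    )
);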
Rather than writing all this SQL generation stuff yourself, go with @emazep's answer and use SQL::Abstract and SQL::Abstract::Plugin::InsertMulti.
Just make sure you profile.

Related

Symfony3 : How to do a massive import from a CSV file as fast as possible?

I have a .csv file with more than 690 000 rows.
I found a solution to import data that works very well, but it's a little bit slow... (around 100 records every 3 seconds = 63 hours!!).
How can I improve my code to make it faster?
I do the import via a console command.
Also, I would like to import only prescribers that aren't already in the database (to save time). To complicate things, no field is really unique (except for id).
Two prescribers can have the same last name, first name, live in the same city, and have the same RPPS and professional codes. But it's the combination of these 6 fields that makes them unique!
That's why I check every field before creating a new one.
<?php
namespace AppBundle\Command;

use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Helper\ProgressBar;
use AppBundle\Entity\Prescriber;

class PrescribersImportCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this
            // the name of the command (the part after "bin/console")
            ->setName('import:prescribers')
            ->setDescription('Import prescribers from .csv file');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        // Show when the script is launched
        $now = new \DateTime();
        $output->writeln('<comment>Start : ' . $now->format('d-m-Y G:i:s') . ' ---</comment>');

        // Import CSV on DB via Doctrine ORM
        $this->import($input, $output);

        // Show when the script is over
        $now = new \DateTime();
        $output->writeln('<comment>End : ' . $now->format('d-m-Y G:i:s') . ' ---</comment>');
    }

    protected function import(InputInterface $input, OutputInterface $output)
    {
        $em = $this->getContainer()->get('doctrine')->getManager();
        // Turning off doctrine's default query logging to save memory
        $em->getConnection()->getConfiguration()->setSQLLogger(null);

        // Get php array of data from CSV
        $data = $this->getData();

        // Start progress
        $size = count($data);
        $progress = new ProgressBar($output, $size);
        $progress->start();

        // Processing on each row of data
        $batchSize = 100; # frequency for persisting the data
        $i = 1;          # current index of records

        foreach ($data as $row) {
            $p = $em->getRepository('AppBundle:Prescriber')->findOneBy(array(
                'rpps'       => $row['rpps'],
                'lastname'   => $row['nom'],
                'firstname'  => $row['prenom'],
                'profCode'   => $row['code_prof'],
                'postalCode' => $row['code_postal'],
                'city'       => $row['ville'],
            ));

            # If the prescriber does not exist we create one
            if (!is_object($p)) {
                $p = new Prescriber();
                $p->setRpps($row['rpps']);
                $p->setLastname($row['nom']);
                $p->setFirstname($row['prenom']);
                $p->setProfCode($row['code_prof']);
                $p->setPostalCode($row['code_postal']);
                $p->setCity($row['ville']);
                $em->persist($p);
            }

            # flush each 100 prescribers persisted
            if (($i % $batchSize) === 0) {
                $em->flush();
                $em->clear(); // Detaches all objects from Doctrine!

                // Advancing for progress display on console
                $progress->advance($batchSize);
                $progress->display();
            }
            $i++;
        }

        // Flush and clear the remaining queued data
        $em->flush();
        $em->clear();

        // Ending the progress bar process
        $progress->finish();
    }

    protected function getData()
    {
        // Getting the CSV from filesystem
        $fileName = 'web/docs/prescripteurs.csv';

        // Using service for converting CSV to PHP Array
        $converter = $this->getContainer()->get('app.csvtoarray_converter');
        $data = $converter->convert($fileName);

        return $data;
    }
}
EDIT
According to @Jake N's answer, here is the final code.
It's very, very fast! 10 minutes to import 653 727 of 693 230 rows (39 503 duplicate items!)
1) Add two columns to my table: created_at and updated_at
2) Add a single composite UNIQUE index covering every column of my table (except id and the dates) with phpMyAdmin, to prevent duplicate items. See the sketch after the code below.
3) Add ON DUPLICATE KEY UPDATE to my query, to update just the updated_at column.
foreach ($data as $row) {
    $sql = "INSERT INTO prescripteurs (rpps, nom, prenom, code_prof, code_postal, ville)
            VALUES (:rpps, :nom, :prenom, :codeprof, :cp, :ville)
            ON DUPLICATE KEY UPDATE updated_at = NOW()";
    $stmt = $em->getConnection()->prepare($sql);
    $r = $stmt->execute(array(
        'rpps'     => $row['rpps'],
        'nom'      => $row['nom'],
        'prenom'   => $row['prenom'],
        'codeprof' => $row['code_prof'],
        'cp'       => $row['code_postal'],
        'ville'    => $row['ville'],
    ));

    if (!$r) {
        $progress->clear();
        $output->writeln('<comment>An error occurred.</comment>');
        $progress->display();
    } elseif (($i % $batchSize) === 0) {
        $progress->advance($batchSize);
        $progress->display();
    }
    $i++;
}

// Ending the progress bar process
$progress->finish();
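For reference, the composite UNIQUE index from step 2 would look something like this (a sketch; the index name is made up):

ALTER TABLE prescripteurs
    ADD UNIQUE KEY uniq_prescriber (rpps, nom, prenom, code_prof, code_postal, ville);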
1. Don't use Doctrine
Try not to use Doctrine if you can: it eats memory and, as you have found, is slow. Try using just raw SQL for the import, with simple INSERT statements:
$sql = <<<SQL
INSERT INTO `category` (`label`, `code`, `is_hidden`) VALUES ('Hello', 'World', '1');
SQL;
$stmt = $this->getDoctrine()->getManager()->getConnection()->prepare($sql);
$stmt->execute();
Or you can prepare the statement with values:
$sql = <<<SQL
INSERT INTO `category` (`label`, `code`, `is_hidden`) VALUES (:label, :code, :hidden);
SQL;
$stmt = $this->getDoctrine()->getManager()->getConnection()->prepare($sql);
$stmt->execute(['label' => 'Hello', 'code' => 'World', 'hidden' => 1]);
Untested code, but it should get you started as this is how I have done it before.
2. Index
Also, for your checks, do you have an index on all those fields, so that the lookup is as quick as possible? See the sketch below.
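A sketch of such an index, covering the six columns used in the findOneBy() lookup (the index name is made up):

CREATE INDEX idx_prescriber_lookup
    ON prescripteurs (rpps, nom, prenom, code_prof, code_postal, ville);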

Inserting several "new" items into the database with DBIC

I'm working on a bioinformatics project that requires me to read genomic data (nothing too fancy, just think of it as strings) from various organisms and insert it into a database. Each read belongs to one organism, and can contain from 5,000 to 50,000 genes, which I need to process and analyze prior to storage.
The script currently doing this is written in Perl and, after all calculations, stores the results in a hash like this:
$new{$id}{gene_name}            = $id;
$new{$id}{gene_database_source} = $gene_database_source;
$new{$id}{product}              = $product;
$new{$id}{sequence}             = $sequence;
$new{$id}{seqlength}            = $seqlength;
$new{$id}{digest}               = $digest;
$new{$id}{mw}                   = $mw;
$new{$id}{iep}                  = $iep;
$new{$id}{tms}                  = $tms;
After all genes are read, the insertions are made by looping through the hash inside an eval {} block:
eval {
    foreach my $id (keys %new) {
        my $rs = $schema->resultset('Genes')->create(
            {
                gene_name              => $new{$id}{gene_name},
                gene_product           => $new{$id}{product},
                sequence               => $new{$id}{sequence},
                gene_protein_length    => $new{$id}{seqlength},
                digest                 => $new{$id}{digest},
                gene_isoelectric_point => $new{$id}{iep},
                gene_molecular_weight  => $new{$id}{mw},
                gene_tmd_count         => $new{$id}{tms},
                gene_species           => $species,
                species_code           => $spc,
                user_id                => $tdruserid,
                gene_database_source   => $new{$id}{gene_database_source},
            }
        );
    }
};
While this "works", it has at least two problems I'd like to solve:
The eval statement is intended to "failsafe" the insertions: if one of the insertions fails, the eval dies and no insertion is done. This is clearly not how eval works. I'm pretty sure all insertions made until the failure point will be done, and there's no rollback whatsoever.
The script needs to loop twice through very large datasets (once while reading and creating the hashes, and once again when reading the hashes and performing the insertions). This makes the process's performance rather poor.
Instead of creating the hashes, I'd been thinking of using DBIx::Class's new method ($schema->new({ ... })) and then doing one massive insert transaction. That would solve the double iteration, and the eval would either work (or not) with a single transaction, which would give the expected behaviour of "either all insertions or none"... Is there a way to do this?
You can create your massive transaction by using a TxnScopeGuard in DBIC. In the most basic form, that would be as follows.
eval {    # or try from Try::Tiny
    my $guard = $schema->txn_scope_guard;

    foreach my $id ( keys %new ) {
        my $rs = $schema->resultset('Genes')->create(
            {
                gene_name              => $new{$id}{gene_name},
                gene_product           => $new{$id}{product},
                sequence               => $new{$id}{sequence},
                gene_protein_length    => $new{$id}{seqlength},
                digest                 => $new{$id}{digest},
                gene_isoelectric_point => $new{$id}{iep},
                gene_molecular_weight  => $new{$id}{mw},
                gene_tmd_count         => $new{$id}{tms},
                gene_species           => $species,
                species_code           => $spc,
                user_id                => $tdruserid,
                gene_database_source   => $new{$id}{gene_database_source},
            }
        );
    }

    $guard->commit;
};
You create a scope guard object, and when you're done setting up your transaction, you commit it. If the object goes out of scope, i.e. because something died, it will roll back the transaction automatically.
The eval can catch the die, and your program will not crash. You had that part correct, but you're also right that your code will not undo previous inserts. Note that Try::Tiny's try provides nicer syntax, but it's not needed here.
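For illustration, the same guard with Try::Tiny, purely for the nicer syntax (a sketch):

use Try::Tiny;

try {
    my $guard = $schema->txn_scope_guard;
    # ... the create() calls from above go here ...
    $guard->commit;
}
catch {
    # $_ holds the error; the guard has already rolled back
    warn "Transaction failed and was rolled back: $_";
};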
A transaction in this case means that the inserts only become permanent together: nothing is committed until $guard->commit, and if anything dies before that point, all of the inserts are rolled back.
Note that this will still issue one INSERT statement per row!
If you want to instead create larger INSERT statements, like the following, you need populate, not new.
INSERT INTO foo (bar, baz) VALUES
(1, 1),
(2, 2),
(3, 3),
...
The populate method lets you pass in an array reference with multiple rows at one time. This is supposed to be way faster than inserting one at a time.
$schema->resultset("Artist")->populate([
    [ qw( artistid name ) ],
    [ 100, 'A Formally Unknown Singer' ],
    [ 101, 'A singer that jumped the shark two albums ago' ],
    [ 102, 'An actually cool singer' ],
]);
Translated to your loop, that would be as follows. Note that the documentation claims that it's faster if you run it in void context.
eval {
    $schema->resultset('Genes')->populate(
        [
            [
                qw(
                    gene_name gene_product sequence
                    gene_protein_length digest gene_isoelectric_point
                    gene_molecular_weight gene_tmd_count gene_species
                    species_code user_id gene_database_source
                )
            ],
            map {
                [
                    $new{$_}{gene_name}, $new{$_}{product},
                    $new{$_}{sequence},  $new{$_}{seqlength},
                    $new{$_}{digest},    $new{$_}{iep},
                    $new{$_}{mw},        $new{$_}{tms},
                    $species,            $spc,
                    $tdruserid,          $new{$_}{gene_database_source},
                ]
            } keys %new
        ],
    );
};
Like this, the scope guard is not needed. However, I would advise you not to do more than 1000 rows per statement. Processing it in chunks might be a good idea for performance reasons; in that case, you'd loop over the keys 1000 at a time. List::MoreUtils has a nice natatime function for that.
use List::MoreUtils 'natatime';

eval {
    my $guard = $schema->txn_scope_guard;

    my $it = natatime 1_000, keys %new;
    while ( my @keys = $it->() ) {
        $schema->resultset('Genes')->populate(
            [
                [
                    qw(
                        gene_name gene_product sequence
                        gene_protein_length digest gene_isoelectric_point
                        gene_molecular_weight gene_tmd_count gene_species
                        species_code user_id gene_database_source
                    )
                ],
                map {
                    [
                        $new{$_}{gene_name}, $new{$_}{product},
                        $new{$_}{sequence},  $new{$_}{seqlength},
                        $new{$_}{digest},    $new{$_}{iep},
                        $new{$_}{mw},        $new{$_}{tms},
                        $species,            $spc,
                        $tdruserid,          $new{$_}{gene_database_source},
                    ]
                } @keys
            ],
        );
    }

    $guard->commit;
};
Now it will do 1000 rows per insertion, and run all those queries in one big transaction. If one of them fails, none will be done.
The script needs to loop twice through very large datasets (once while reading and creating the hashes, and once again when reading the hashes and performing the insertions). This makes the process's performance rather poor.
You're not showing how you create the data, besides this assignment.
$new{$id}{gene_name}            = $id;
$new{$id}{gene_database_source} = $gene_database_source;
$new{$id}{product}              = $product;
If that's all there is to it, nothing is stopping you from using the approach I've shown above directly where you're processing the data the first time and building the hash. The following code is incomplete, because you're not telling us where the data is coming from, but you should get the gist.
eval {
    my $guard = $schema->txn_scope_guard;

    # the column order used for every populate() call below
    my @columns = qw(
        gene_name gene_product sequence
        gene_protein_length digest gene_isoelectric_point
        gene_molecular_weight gene_tmd_count gene_species
        species_code user_id gene_database_source
    );

    # we use this to collect rows to process
    my @rows;

    # this is where your data comes in
    while ( my $foo = <DATA> ) {

        # here you process the data and come up with your variables
        my ( $id, $gene_database_source, $product, $sequence, $seqlength,
            $digest, $mw, $iep, $tms );

        # collect the row so we can insert it later
        # (values in the same order as @columns; $species, $spc and
        # $tdruserid come from the surrounding scope, as in the question)
        push @rows,
            [
                $id,  $product, $sequence, $seqlength, $digest,
                $iep, $mw,      $tms,      $species,   $spc,
                $tdruserid, $gene_database_source,
            ];

        # only insert if we reached the limit
        if ( @rows == 1000 ) {
            $schema->resultset('Genes')->populate( [ \@columns, @rows ] );

            # empty the list of values
            @rows = ();
        }
    }

    # don't forget whatever is left over from the last chunk
    $schema->resultset('Genes')->populate( [ \@columns, @rows ] ) if @rows;

    $guard->commit;
};
Essentially we collect up to 1000 rows directly as array references while we process them, and when we've reached the limit, we pass them to the database. We then reset our row array and start over. Again, all of this is wrapped in a transaction, so it will only be committed if all the inserts are fine.
There is more information on transactions in DBIC in the cookbook.
Please note that I have not tested any of this code.

Powershell array value in MySQL insert is outputting entire array

First off, I am new to PowerShell; this is my first ever attempt. I'm taking the "I can program in other languages, so I can hack through this project" approach. I have spent a few days now trying to solve this one, and I know it's something stupid. I'm just stuck.
# Insert into master table
$query = "INSERT INTO ``master``(``diff``, ``key``) VALUES ('$diff', '$key')"
Invoke-MySqlQuery -Query $query
$query
This works fine; the test output displays:
INSERT INTO `master`(`diff`, `key`) VALUES ('248', 'k000002143200000000000000680006080005500900030082670009461000500000091000000000000')
It also inputs into the MySQL DB just fine.
The following, which uses the exact same format, is not working for me. Like I said at the top, I know it is some stupid formatting thing I'm missing.
# Insert into answers table
$query = "INSERT INTO ``puzzles``(``id``, ``p1``, ``p2``, ``p3``, ``p4``, ``p5``, ``p6``, ``p7``, ``p8``, ``p9``, ``p10``, ``p11``, ``p12``, ``p13``, ``p14``, ``p15``, ``p16``, ``p17``, ``p18``, ``p19``, ``p20``, ``p21``, ``p22``, ``p23``, ``p24``, ``p25``, ``p26``, ``p27``, ``p28``, ``p29``, ``p30``, ``p31``, ``p32``, ``p33``, ``p34``, ``p35``, ``p36``, ``p37``, ``p38``, ``p39``, ``p40``, ``p41``, ``p42``, ``p43``, ``p44``, ``p45``, ``p46``, ``p47``, ``p48``, ``p49``, ``p50``, ``p51``, ``p52``, ``p53``, ``p54``, ``p55``, ``p56``, ``p57``, ``p58``, ``p59``, ``p60``, ``p61``, ``p62``, ``p63``, ``p64``, ``p65``, ``p66``, ``p67``, ``p68``, ``p69``, ``p70``, ``p71``, ``p72``, ``p73``, ``p74``, ``p75``, ``p76``, ``p77``, ``p78``, ``p79``, ``p80``, ``p81``) VALUES ('$id', '$p[0]', '$p[1]', '$p[2]', '$p[3]', '$p[4]', '$p[5]', '$p[6]', '$p[7]', '$p[8]', '$p[9]', '$p[10]', '$p[11]', '$p[12]', '$p[13]', '$p[14]', '$p[15]', '$p[16]', '$p[17]', '$p[18]', '$p[19]', '$p[20]', '$p[21]', '$p[22]', '$p[23]', '$p[24]', '$p[25]', '$p[26]', '$p[27]', '$p[28]', '$p[29]', '$p[30]', '$p[31]', '$p[32]', '$p[33]', '$p[34]', '$p[35]', '$p[36]', '$p[37]', '$p[38]', '$p[39]', '$p[40]', '$p[41]', '$p[42]', '$p[43]', '$p[44]', '$p[45]', '$p[46]', '$p[47]', '$p[48]', '$p[49]', '$p[50]', '$p[51]', '$p[52]', '$p[53]', '$p[54]', '$p[55]', '$p[56]', '$p[57]', '$p[58]', '$p[59]', '$p[60]', '$p[61]', '$p[62]', '$p[63]', '$p[64]', '$p[65]', '$p[66]', '$p[67]', '$p[68]', '$p[69]', '$p[70]', '$p[71]', '$p[72]', '$p[73]', '$p[74]', '$p[75]', '$p[76]', '$p[77]', '$p[78]', '$p[79]', '$p[80]')"
$query
$p[0]
$p[1]
$p[2]
$p[3]
$p[4]
$p[5]
Invoke-MySqlQuery -Query $query
This is the output I get on one of the runs:
INSERT INTO `puzzles`(`id`, `p1`, `p2`, `p3`, `p4`, `p5`, `p6`, `p7`, `p8`, `p9`, `p10`, `p11`, `p12`, `p13`, `p14`, `p15`, `p16`, `p17`, `p
18`, `p19`, `p20`, `p21`, `p22`, `p23`, `p24`, `p25`, `p26`, `p27`, `p28`, `p29`, `p30`, `p31`, `p32`, `p33`, `p34`, `p35`, `p36`, `p37`, `p
38`, `p39`, `p40`, `p41`, `p42`, `p43`, `p44`, `p45`, `p46`, `p47`, `p48`, `p49`, `p50`, `p51`, `p52`, `p53`, `p54`, `p55`, `p56`, `p57`, `p
58`, `p59`, `p60`, `p61`, `p62`, `p63`, `p64`, `p65`, `p66`, `p67`, `p68`, `p69`, `p70`, `p71`, `p72`, `p73`, `p74`, `p75`, `p76`, `p77`, `p
78`, `p79`, `p80`, `p81`) VALUES ('2596', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[0]', '000002143
200000000000000680006080005500900030082670009461000500000091000000000000[1]', '0000021432000000000000006800060800055009000300826700094610005
00000091000000000000[2]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[3]', '0000021432000000000000006
80006080005500900030082670009461000500000091000000000000[4]', '00000214320000000000000068000608000550090003008267000946100050000009100000000
0000[5]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[6]', '00000214320000000000000068000608000550090
0030082670009461000500000091000000000000[7]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[8]', '00000
2143200000000000000680006080005500900030082670009461000500000091000000000000[9]', '000002143200000000000000680006080005500900030082670009461
000500000091000000000000[10]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[11]', '0000021432000000000
00000680006080005500900030082670009461000500000091000000000000[12]', '0000021432000000000000006800060800055009000300826700094610005000000910
00000000000[13]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[14]', '00000214320000000000000068000608
0005500900030082670009461000500000091000000000000[15]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[1
6]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[17]', '000002143200000000000000680006080005500900030
082670009461000500000091000000000000[18]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[19]', '0000021
43200000000000000680006080005500900030082670009461000500000091000000000000[20]', '0000021432000000000000006800060800055009000300826700094610
00500000091000000000000[21]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[22]', '00000214320000000000
0000680006080005500900030082670009461000500000091000000000000[23]', '00000214320000000000000068000608000550090003008267000946100050000009100
0000000000[24]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[25]', '000002143200000000000000680006080
005500900030082670009461000500000091000000000000[26]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[27
]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[28]', '0000021432000000000000006800060800055009000300
82670009461000500000091000000000000[29]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[30]', '00000214
3200000000000000680006080005500900030082670009461000500000091000000000000[31]', '00000214320000000000000068000608000550090003008267000946100
0500000091000000000000[32]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[33]', '000002143200000000000
000680006080005500900030082670009461000500000091000000000000[34]', '000002143200000000000000680006080005500900030082670009461000500000091000
000000000[35]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[36]', '0000021432000000000000006800060800
05500900030082670009461000500000091000000000000[37]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[38]
', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[39]', '00000214320000000000000068000608000550090003008
2670009461000500000091000000000000[40]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[41]', '000002143
200000000000000680006080005500900030082670009461000500000091000000000000[42]', '000002143200000000000000680006080005500900030082670009461000
500000091000000000000[43]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[44]', '0000021432000000000000
00680006080005500900030082670009461000500000091000000000000[45]', '0000021432000000000000006800060800055009000300826700094610005000000910000
00000000[46]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[47]', '00000214320000000000000068000608000
5500900030082670009461000500000091000000000000[48]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[49]'
, '000002143200000000000000680006080005500900030082670009461000500000091000000000000[50]', '000002143200000000000000680006080005500900030082
670009461000500000091000000000000[51]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[52]', '0000021432
00000000000000680006080005500900030082670009461000500000091000000000000[53]', '0000021432000000000000006800060800055009000300826700094610005
00000091000000000000[54]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[55]', '00000214320000000000000
0680006080005500900030082670009461000500000091000000000000[56]', '00000214320000000000000068000608000550090003008267000946100050000009100000
0000000[57]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[58]', '000002143200000000000000680006080005
500900030082670009461000500000091000000000000[59]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[60]',
'000002143200000000000000680006080005500900030082670009461000500000091000000000000[61]', '0000021432000000000000006800060800055009000300826
70009461000500000091000000000000[62]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[63]', '00000214320
0000000000000680006080005500900030082670009461000500000091000000000000[64]', '00000214320000000000000068000608000550090003008267000946100050
0000091000000000000[65]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[66]', '000002143200000000000000
680006080005500900030082670009461000500000091000000000000[67]', '000002143200000000000000680006080005500900030082670009461000500000091000000
000000[68]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[69]', '0000021432000000000000006800060800055
00900030082670009461000500000091000000000000[70]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[71]',
'000002143200000000000000680006080005500900030082670009461000500000091000000000000[72]', '00000214320000000000000068000608000550090003008267
0009461000500000091000000000000[73]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[74]', '000002143200
000000000000680006080005500900030082670009461000500000091000000000000[75]', '000002143200000000000000680006080005500900030082670009461000500
000091000000000000[76]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[77]', '0000021432000000000000006
80006080005500900030082670009461000500000091000000000000[78]', '0000021432000000000000006800060800055009000300826700094610005000000910000000
00000[79]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[80]')
0
0
0
0
0
2
In the $query variable, the $p[X] does not work like the bare $p[X] lines below it; it's spitting out the entire array. Obviously it's not working when it sends it to MySQL either: I get the ID correct and that's it.
Thank you all in advance for the help; thumping my head on the keyboard hurts after a while!
Ian
Bonus question! Any tips on this would be great as well. None of the 0s actually need to be there; I want them to be NULL instead. Any tips on a clean way to do that would be wonderful. Full disclosure: I have not even looked into this part yet!
EDIT - In response to @Mathias R. Jessen
I removed all the other code not having to do with $p, to make it easy on the eyes.
# Set the key variable
$key = (Get-Content Puzzles.txt)[$loop] # ....73.146...8.......6......5..2..8.8..5.9..1.923..........273.........8..3..1.46 #Extreme
$key = $key.Replace(".", "0") # 000073014600080000000600000050020080800509001092300000000002730000000008003001046 #Extreme
$key = -join('k', $key.Substring(0,81)) # k000073014600080000000600000050020080800509001092300000000002730000000008003001046
$check = (Invoke-MySqlQuery -Query "SELECT COUNT(*) FROM ``master`` WHERE ``key`` = '$key'")
$check = $check.'COUNT(*)'
# Check in master table if key exists, i.e. not a new puzzle
IF ($check -eq 0) {
$p = $key.Substring(1,81) # 000073014600080000000600000050020080800509001092300000000002730000000008003001046
# Insert into answers table
$query = "INSERT INTO ``puzzles``(``id``, ``p1``, ``p2``, ``p3``, ``p4``, ``p5``, ``p6``, ``p7``, ``p8``, ``p9``, ``p10``, ``p11``, ``p12``, ``p13``, ``p14``, ``p15``, ``p16``, ``p17``, ``p18``, ``p19``, ``p20``, ``p21``, ``p22``, ``p23``, ``p24``, ``p25``, ``p26``, ``p27``, ``p28``, ``p29``, ``p30``, ``p31``, ``p32``, ``p33``, ``p34``, ``p35``, ``p36``, ``p37``, ``p38``, ``p39``, ``p40``, ``p41``, ``p42``, ``p43``, ``p44``, ``p45``, ``p46``, ``p47``, ``p48``, ``p49``, ``p50``, ``p51``, ``p52``, ``p53``, ``p54``, ``p55``, ``p56``, ``p57``, ``p58``, ``p59``, ``p60``, ``p61``, ``p62``, ``p63``, ``p64``, ``p65``, ``p66``, ``p67``, ``p68``, ``p69``, ``p70``, ``p71``, ``p72``, ``p73``, ``p74``, ``p75``, ``p76``, ``p77``, ``p78``, ``p79``, ``p80``, ``p81``) VALUES ('$id', '$p[0]', '$p[1]', '$p[2]', '$p[3]', '$p[4]', '$p[5]', '$p[6]', '$p[7]', '$p[8]', '$p[9]', '$p[10]', '$p[11]', '$p[12]', '$p[13]', '$p[14]', '$p[15]', '$p[16]', '$p[17]', '$p[18]', '$p[19]', '$p[20]', '$p[21]', '$p[22]', '$p[23]', '$p[24]', '$p[25]', '$p[26]', '$p[27]', '$p[28]', '$p[29]', '$p[30]', '$p[31]', '$p[32]', '$p[33]', '$p[34]', '$p[35]', '$p[36]', '$p[37]', '$p[38]', '$p[39]', '$p[40]', '$p[41]', '$p[42]', '$p[43]', '$p[44]', '$p[45]', '$p[46]', '$p[47]', '$p[48]', '$p[49]', '$p[50]', '$p[51]', '$p[52]', '$p[53]', '$p[54]', '$p[55]', '$p[56]', '$p[57]', '$p[58]', '$p[59]', '$p[60]', '$p[61]', '$p[62]', '$p[63]', '$p[64]', '$p[65]', '$p[66]', '$p[67]', '$p[68]', '$p[69]', '$p[70]', '$p[71]', '$p[72]', '$p[73]', '$p[74]', '$p[75]', '$p[76]', '$p[77]', '$p[78]', '$p[79]', '$p[80]')"
Invoke-MySqlQuery -Query $query
}
You need to use the subexpression operator $() to evaluate the collection at that index during string interpolation. Without this operator, the content of your entire collection is being printed, as well as your literal index syntax.
Your first example works as expected because you're only interpolating simple variables, without doing any additional work to them.
Here's a simple example from the command line:
C:\> $arr = 1,2,3,4
Outside of a string:
C:\> $arr[0]
1
During string interpolation, without the subexpression operator:
C:\> "$arr[0]"
1 2 3 4[0]
During string interpolation, with the subexpression operator:
C:\> "$($arr[0])"
1
This means that your example would become something like this:
...VALUES ('$id', '$($p[0])', '$($p[1])'...
Note that $id is working correctly because it is a simple variable. You only need to use the subexpression operator for additional work like evaluating indexes, properties, etc.
This concept is also sometimes called variable expansion, if you would like to research further.

Using PDO to insert variables into SELECT clause?

I am attempting to get the distance from a user to each venue stored in a MySQL database, using the spherical law of cosines. The user inputs their location, and the following query is executed.
$data = array(':lat' => $lat, ':lon' => $lon);
$qry = "SELECT ACOS(SIN(v.Latitude) * SIN(:lat) + COS(v.Latitude) * COS(:lat) * COS(:lon - v.Longitude)) * 3963 AS distance FROM Venue v";
$stmt = $pdo->prepare($qry);
$stmt->execute($data);
$rows = $stmt->fetchAll();
The problem is, I get the following error.
PHP Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[HY093]: Invalid parameter number'
When I remove the variables (:lat and :lon) from the SELECT clause, it works just fine. Other variables later in the statement (not shown here) also work fine; it is only the variables in the SELECT clause that cause an issue. Is this inability to use PDO variables within SELECT clauses a limitation of PDO, or is there a way around this issue?
I am using PHP 5.4.15, and my PDO options are as follows.
$options = array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8', // UTF-8 to prevent issue sending special characters with JSON
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // fire exceptions for errors (turn this off for release)
PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC, // only return results indexed by column name
PDO::ATTR_EMULATE_PREPARES => false // actually prepare statements, not pseudo-prepare ( http://stackoverflow.com/questions/10113562/pdo-mysql-use-pdoattr-emulate-prepares-or-not )
);
With PDO::ATTR_EMULATE_PREPARES set to false, MySQL's native prepared statements do not allow the same named placeholder to be bound in more than one place, which is why :lat triggers the "Invalid parameter number" error. Switch to positional placeholders and pass the value once per occurrence:
$data = array($lat, $lat, $lon);
$qry = "SELECT ACOS(SIN(v.Latitude) * SIN(?) + COS(v.Latitude) * COS(?) * COS(? - v.Longitude)) * 3963 AS distance FROM Venue v";
$stmt = $pdo->prepare($qry);
$stmt->execute($data);
$rows = $stmt->fetchAll();

How do I cleanly extract MySQL enum values in Perl?

I have some code which needs to ensure some data is in a mysql enum prior to insertion in the database. The cleanest way I've found of doing this is the following code:
sub enum_values {
    my ( $self, $schema, $table, $column ) = @_;

    # don't eval to let the error bubble up
    my $columns = $schema->storage->dbh->selectrow_hashref(
        "SHOW COLUMNS FROM `$table` like ?",
        {},
        $column,
    );
    unless ($columns) {
        X::Internal::Database::UnknownColumn->throw(
            column => $column,
            table  => $table,
        );
    }

    my $type = $columns->{Type} or X::Panic->throw(
        details => "Could not determine type for $table.$column",
    );

    unless ( $type =~ /\Aenum\((.*)\)\z/ ) {
        X::Internal::Database::IncorrectTypeForColumn->throw(
            type_wanted => 'enum',
            type_found  => $type,
        );
    }
    $type = $1;

    require Text::CSV_XS;
    my $csv = Text::CSV_XS->new;
    $csv->parse($type) or X::Panic->throw(
        details => "Could not parse enum CSV data: " . $csv->error_input,
    );
    return map { /\A'(.*)'\z/; $1 } $csv->fields;
}
We're using DBIx::Class. Surely there is a better way of accomplishing this? (Note that the $table variable is coming from our code, not from any external source. Thus, no security issue).
No need to be so heroic. Using a reasonably modern version of DBD::mysql, the hash returned by DBI's column_info() method contains a pre-split version of the valid enum values in the key mysql_values:
my $sth = $dbh->column_info(undef, undef, 'mytable', '%');

# fetch one row at a time; fetchrow_hashref returns undef when done
while ( my $col_info = $sth->fetchrow_hashref )
{
    if ( $col_info->{'TYPE_NAME'} eq 'ENUM' )
    {
        # The mysql_values key contains a reference to an array of valid enum values
        print "Valid enum values for $col_info->{'COLUMN_NAME'}: ",
            join(', ', @{ $col_info->{'mysql_values'} }), "\n";
    }
    ...
}
I'd say using Text::CSV_XS may be overkill, unless you have weird things like commas in your enums (a bad idea anyway, if you ask me). I'd probably use this instead:
my #fields = $type =~ / ' ([^']+) ' (?:,|\z) /msgx;
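For example, applied to the captured inner part of the Type string (a quick sketch):

my $type   = "'red','green','blue'";   # what $1 holds after the enum(...) match
my @fields = $type =~ / ' ([^']+) ' (?:,|\z) /msgx;
# @fields is now ('red', 'green', 'blue')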
Other than that, I don't think there are shortcuts.
I spent part of the day asking the #dbix-class channel over on MagNet the same question, and came across this same lack of an answer. Since I found the answer and nobody else seems to have done so yet, I'll paste the transcript below the TL;DR here:
my $cfg   = Config::Simple->new( $rc_file );
my $mysql = $cfg->get_block('mysql');
my $dsn   =
    "DBI:mysql:database=$mysql->{database};" .
    "host=$mysql->{hostname};port=$mysql->{port}";
my $schema =
    DTSS::CDN::Schema->connect( $dsn, $mysql->{user}, $mysql->{password} );

my $valid_enum_values =
    $schema->source('Cdnurl')->column_info('scheme')->{extra}->{list};
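From there, validating an incoming value before insertion is a one-liner (a sketch; $incoming is hypothetical):

use List::Util 'any';

die "invalid scheme: $incoming"
    unless any { $_ eq $incoming } @$valid_enum_values;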
And now the IRC log of me beating my head against a wall:
14:40 < cj> is there a cross-platform way to get the valid values of an
enum?
15:11 < cj> it looks like I could add 'InflateColumn::Object::Enum' to the
__PACKAGE__->load_components(...) list for tables with enum
columns
15:12 < cj> and then call values() on the enum column
15:13 < cj> but how do I get dbic-dump to add
'InflateColumn::Object::Enum' to
__PACKAGE__->load_components(...) for only tables with enum
columns?
15:20 < cj> I guess I could just add it for all tables, since I'm doing
the same for InflateColumn::DateTime
15:39 < cj> hurm... is there a way to get a column without making a
request to the db?
15:40 < cj> I know that we store in the DTSS::CDN::Schema::Result::Cdnurl
class all of the information that I need to know about the
scheme column before any request is issued
15:42 <@ilmari> cj: for Pg and mysql Schema::Loader will add the list of
valid values to the ->{extra}->{list} column attribute
15:43 <@ilmari> cj: if you're using some other database that has enums,
patches welcome :)
15:43 <@ilmari> or even just a link to the documentation on how to extract
the values
15:43 <@ilmari> and a willingness to test if it's not a database I have
access to
15:43 < cj> thanks, but I'm using mysql. if I were using sqlite for this
project, I'd probably oblige :-)
15:44 <@ilmari> cj: to add components to only some tables, use
result_components_map
15:44 < cj> and is there a way to get at those attributes without making a
query?
15:45 < cj> can we do $schema->resultset('Cdnurl') without having it issue
a query, for instance?
15:45 <@ilmari> $result_source->column_info('colname')->{extra}->{list}
15:45 < cj> and $result_source is $schema->resultset('Cdnurl') ?
15:45 <@ilmari> dbic never issues a query until you start retrieving the
results
15:45 < cj> oh, nice.
15:46 <@ilmari> $schema->source('Cdnurl')
15:46 <@ilmari> the result source is where the result set gets the results
from when they are needed
15:47 <@ilmari> names have meanings :)