Question regarding best practices for queries and perl.
Is there a performance penalty, or is it considered bad practice, to have a query inside a Perl for loop? Is it an issue to send so many statements in rapid fire to the DB?
code is quasi pseudo code
# @line has 5000 elements
foreach my $elem ( @line ) {
    SQL = INSERT IGNORE INTO <table> ( column1, .. , column10 ) VALUES ( 'a', .. , 'j' )
}
What about deletes and/or updates?
foreach my $elem ( @line ) {
    my $UN = substr( $elem, 0, 10 );
    SQL = UPDATE <table> SET <column> = $UN;
}
foreach my $elem ( @line ) {
    my $UN = substr( $elem, 0, 10 );
    SQL = DELETE FROM <table> WHERE <column> = $UN
}
Also, I have a question in the same arena: I have 5000 items I am checking, and my database has anywhere from 1 to 5000 matching rows at any given time. Is it acceptable to loop through my 5000 items in Perl and delete each ID from the database, or should there first be a check to see whether the ID exists before issuing the DELETE command?
foreach my $elem ( @line ) {
    $ID = substr( $elem, 5, 0 );
    SQL = DELETE FROM <table> WHERE id = $ID;
}
or should it be something like:
foreach my $elem ( @line ) {
    $ID = substr( $elem, 5, 0 );
    SQL = DELETE FROM <table> WHERE id = $ID if ID exists;
}
Thanks,
--Eherr
As for inserts in rapid succession, not a problem. The server is tailored to handle that.
Caution should be taken with INSERT IGNORE for other reasons, though: program logic that ought to handle a failure cannot do so if you have just ignored that failure.
As for the particular UPDATE you showed, it does not make a ton of sense in a loop (or perhaps at all) because you are not specifying a WHERE clause. Why loop, say, 1000 times, with each iteration updating every row because there is no WHERE clause? Maybe that was just a typo on your part.
As for deletes in a loop, there is no problem with that either, in general. If you are looking to empty a table, look into TRUNCATE TABLE: it is faster, and not logged, if that is ever a desire. Note, though, that TRUNCATE is disallowed on a table that is the referenced table in a foreign key constraint (in those situations there is a referencing table and a referenced table).
Other general comments: care should be taken to ensure that any referential integrity that is in place (or should be in place) is honored; INSERT IGNORE, UPDATE, or DELETE can all fail due to foreign key constraints. Also, checking for the existence of a row that you are about to delete anyway is overkill: the DELETE has to walk down the B-tree to find the row either way, so a prior existence check just does that walk twice. On a table scan it would be even more added pain.
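To illustrate that last point with a minimal sketch (table name and id value are placeholders): the DELETE itself reports how many rows it removed, so a separate existence check beforehand just repeats the same lookup.
# Delete unconditionally; no prior SELECT needed.
DELETE FROM my_table WHERE id = 123;
# If the application cares whether the row actually existed,
# the affected-row count from the DELETE already answers that.
SELECT ROW_COUNT();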
Lastly, when you are in a situation of massive bulk insert, loops are never up to the task in any programming language compared to LOAD DATA INFILE performance. Several of your peers have seen 8 to 10 hour operations reduced to 2 minutes by using LOAD DATA (references to links available if you ask).
MySQL manual pages below:
Referential Integrity with Foreign Key Constraints
Quickly clearing tables with Truncate Table
Bulk inserts with LOAD DATA INFILE.
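As a rough illustration only (the file path, table, and column names are made up), a LOAD DATA INFILE call for a tab-delimited file might look like:
# Bulk-load a tab-delimited file, one row per line.
LOAD DATA INFILE '/tmp/lines.tsv'
INTO TABLE my_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
( column1, column2, column3 );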
In my opinion, it is a bit slow to make multiple queries. It is better to construct a single UPDATE, INSERT, SELECT, or DELETE query and fire it once; a sketch follows below.
A few tips before choosing between multiple queries and a single query:
1) If the database is configured to kill any query that takes more than a specified time, then a single query that is too large can get killed.
2) Also, if a user is waiting for the response, it can be done with pagination, i.e., fetch a few records now and subsequent ones later, but not one by one.
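For illustration only (table, column names, and values are placeholders), collapsing the earlier loops into single statements could look like:
# One multi-row INSERT instead of 5000 single-row statements
INSERT IGNORE INTO my_table ( column1, column2 )
VALUES ( 'a1', 'b1' ),
       ( 'a2', 'b2' ),
       ( 'a3', 'b3' );
# One DELETE for a whole batch of ids instead of one DELETE per id
DELETE FROM my_table WHERE id IN ( 101, 102, 103 );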
5000 queries into a database shouldn't be a performance bottleneck. You're fine. You can always benchmark a read-only run.
Related
I have a table with 1.5 million rows and 47k values to update.
I've tried two ways of doing it and both are pretty slow.
The first is 47k separate statements of the form
UPDATE $table
SET name = '$name'
WHERE id = '$id'
Second is
$prefix = "UPDATE table
SET name = (case ";
while () {
$mid .= "when id = '$id' then '$name' ";
}
$suffix = "end);";
$query = $prefix . $mid . $suffix;
Is there a way of doing it faster? Maybe with LOAD DATA INFILE? I can't figure out the UPDATE syntax with that one.
I had to import large files on a daily basis, and tried all sorts of things.
In the end I got the best performance from a specific combination of:
First copy the CSV to the database server, and load it from the local disk there, instead of loading the CSV from your client machine.
Make sure that you have a table structure that exactly matches the file. I've used a temporary table for the import, and then used separate queries on that to get the data into the final table (a sketch of this follows at the end of this answer).
No foreign keys or unique index checks on the tmp table.
That will speed things up a lot already. If you need to squeeze out more performance, you can increase the log buffer size.
And obviously:
make sure that you don't import stuff that you don't need to. Be critical about which fields you include, and which rows.
If you only have a few different values of text in a column, use a numeric value for it instead.
Do you really need 8 decimals in your floats?
Are you repeatedly importing the same data, when you could import only the changes?
Make sure that you don't trigger unnecessary type conversions during import. Prepare your data to be as close as possible to the table that you're importing into.
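A hedged sketch of that temporary-table approach, which is also one way LOAD DATA INFILE can drive the UPDATE asked about above (the table, columns, and file path are hypothetical):
# 1) Load the 47k (id, name) pairs into a bare staging table
CREATE TEMPORARY TABLE name_updates (
    id   INT NOT NULL,
    name VARCHAR(255) NOT NULL
);
LOAD DATA LOCAL INFILE '/tmp/name_updates.csv'
INTO TABLE name_updates
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
( id, name );
# 2) One multi-table UPDATE joins the staging rows to the real table
UPDATE my_table AS t
JOIN name_updates AS u ON u.id = t.id
SET t.name = u.name;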
I have a table with
(ID INT AUTO_INCREMENT PRIMARY KEY,
tag VARCHAR UNIQUE)
I want to insert multiple tags at once, like this:
INSERT INTO tags (tag) VALUES ("java"), ("php"), ("python");
If I execute this and "java" is already in the table, I get an error, and "php" and "python" don't get added.
If I do it like this :
INSERT INTO tags (tag) VALUES ("java"), ("php"), ("python")
ON DUPLICATE KEY UPDATE tag = VALUES(tag)
it runs without an error, but it skips 2 values in the ID field.
Example: I have Java with ID = 1 and I run the query. Then PHP will be 3 and Python 4. Is there a way to execute this query without skipping the IDs?
I don't want big spaces between them. I also tried INSERT IGNORE.
Thank you!
See "SQL #1" in http://mysql.rjweb.org/doc.php/staging_table#normalization . It is more complex but avoids 'burning' ids. It has the potential drawback of needing the tags in another table. A snippet from that link:
# This should not be in the main transaction, and it should be
# done with autocommit = ON
# In fact, it could lead to strange errors if this were part
# of the main transaction and it ROLLBACKed.
INSERT IGNORE INTO Hosts (host_name)
SELECT DISTINCT s.host_name
FROM Staging AS s
LEFT JOIN Hosts AS n ON n.host_name = s.host_name
WHERE n.host_id IS NULL;
By isolating this as its own transaction, we get it finished in a hurry, thereby minimizing blocking. By saying IGNORE, we don't care if other threads are 'simultaneously' inserting the same host_names. (If you don't have another thread doing such INSERTs, you can toss the IGNORE.)
(Then it goes on to talk about IODKU.)
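Translated to the tags table in this question, a hedged sketch of that same pattern (staging_tags is a hypothetical table standing in for wherever the new tag values come from):
# Insert only the tags that are not already present, so no
# auto-increment values are reserved for the duplicates.
INSERT IGNORE INTO tags (tag)
SELECT DISTINCT s.tag
FROM staging_tags AS s
LEFT JOIN tags AS t ON t.tag = s.tag
WHERE t.id IS NULL;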
The InnoDB engine's main feature is support for ACID transactions.
What it usually does, and what I would point out is not really a "problem", is that the engine "reserves" the id before knowing whether the row is a duplicate or not.
Below is one solution, but it depends on your table: if it is a very large one, you should run some tests first, because the AUTO_INCREMENT function helps you follow the ordering of the ids.
I'll give you some examples:
INSERT INTO tags (tag) VALUES ('java'), ('php'), ('python')
ON DUPLICATE KEY UPDATE tag = VALUES(tag), id = LAST_INSERT_ID(id);
SELECT LAST_INSERT_ID();
ALTER TABLE tags AUTO_INCREMENT = 1;
Note: I added LAST_INSERT_ID(id) because every time you insert or update, it gives you the inserted or reserved id.
Each time INSERT INTO is called, AUTO_INCREMENT must be followed.
Here is a chunk of the SQL I'm using for a Perl-based web application. I have a number of requests and each has a number of accessions, and each has a status. This chunk of code is there to update the table for every accession_analysis that shares all these fields for each accession in a request.
UPDATE accession_analysis
SET analysis_id = ? ,
reference_id = ? ,
status = ? ,
extra_parameters = ?
WHERE analysis_id = ?
AND reference_id = ?
AND status = ?
AND extra_parameters = ?
AND accession_id IN (
    SELECT accession_id
    FROM accessions
    WHERE request_id = ?
)
I have changed the tables so that there's a status table for accession_analysis, so when I update, I update both accession_analysis and accession_analysis_status, which has status, status_text and the id of the accession_analysis, which is a not null auto_increment variable.
I have no strong idea about how to modify this code to allow this. My first pass grabbed all the accessions and looped through them, then filtered for all the fields, then updated. I didn't like that because I had many connections with short SQL commands, which I understood to be bad, but I can't help but think the only way to really do this is to go back to the loop in Perl holding two simpler SQL statements.
Is there a way to do this in SQL that, with my relative SQL inexperience, I'm just not seeing?
The answer depends on which DBMS you're using. The easiest way is to create a trigger on one table that provides the logic of updating the other table. (For any DB newbies -- a trigger is procedural code attached to a table at the DBMS (not application) layer that runs in response to an insert, update or delete on the table.). A similar, slightly less desirable method is to put the logic in a stored procedure and execute that instead of the update statement you're now using.
If the DBMS you're using doesn't support either of these mechanisms, then there isn't a good way to do what you're after while guaranteeing transactional integrity. However if the problem you're solving can tolerate a timing difference in the two tables' updates (i.e. The data in one of the tables is only used at predetermined times, like reporting or some type of batched operation) you could write to one table (live) and create a separate process that runs when needed (later) to update the second table using data from the first table. The correctness of allowing data to be updated at different times becomes a large and immovable design assumption, however.
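As a hedged sketch only (MySQL syntax; the status table's column names are guesses based on the question), the trigger approach could look something like:
# Whenever accession_analysis is updated, record the new status in the
# companion status table. Table and column names are assumptions.
CREATE TRIGGER accession_analysis_status_sync
AFTER UPDATE ON accession_analysis
FOR EACH ROW
    INSERT INTO accession_analysis_status (accession_analysis_id, status)
    VALUES (NEW.id, NEW.status);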
If this is mostly about connection speed, then one option you have is to write a stored procedure that handles the "double update or insert" transparently. See the manual for stored procedures:
http://dev.mysql.com/doc/refman/5.5/en/create-procedure.html
Otherwise, you probably cannot do it in one statement; see the MySQL INSERT syntax:
http://dev.mysql.com/doc/refman/5.5/en/insert.html
The UPDATE syntax allows for multi-table updates (not in combination with INSERT, though):
http://dev.mysql.com/doc/refman/5.5/en/update.html
Each table needs its own INSERT / UPDATE in the query.
In fact, even if you create a view by JOINing multiple tables, when you INSERT into the view, you can only INSERT with fields belonging to one of the tables at a time.
The modifications made by the INSERT statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For example, an INSERT into a multitable view must use a column_list that references only columns from one base table. For more information about updatable views, see CREATE VIEW.
Inserting data into multiple tables through an sql view (MySQL)
INSERT (SQL Server)
Same is true of UPDATE
The modifications made by the UPDATE statement cannot affect more than one of the base tables referenced in the FROM clause of the view. For more information on updatable views, see CREATE VIEW.
However, you can have multiple INSERTs or UPDATEs per query or stored procedure.
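For the stored procedure route, a minimal hedged sketch (the procedure name, parameters, and status-table columns are invented; it simply issues the two statements server-side so the application makes a single call):
DELIMITER //
CREATE PROCEDURE update_analysis_and_status(IN p_analysis_id INT,
                                            IN p_status VARCHAR(64))
BEGIN
    # First statement: update the main table
    UPDATE accession_analysis
       SET status = p_status
     WHERE analysis_id = p_analysis_id;
    # Second statement: record the status in the companion table
    INSERT INTO accession_analysis_status (accession_analysis_id, status)
    VALUES (p_analysis_id, p_status);
END //
DELIMITER ;
# Called from the application as:
# CALL update_analysis_and_status(42, 'complete');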
I have a table that needs regular updating. These updates happen in batches. Unlike with INSERT, I can't just include multiple rows in a single query. What I do now is prepare the UPDATE statement, then loop through all the possibilities and execute each. Sure, the preparation happens only once, but there are still a lot of executions.
I created several versions of the table of different sizes (thinking that maybe better indexing or splitting the table would help). However, that did not have an effect on update times: 100 updates take about 4 seconds for either the 1,000-row table or the 500,000-row one.
Is there a smarter way of doing this faster?
As asked in the comments, here is actual code (PHP) I have been testing with. Column 'id' is a primary key.
$stmt = $dblink->prepare("UPDATE my_table SET col1 = ? , col2 = ? WHERE id = ?");
$rc = $stmt->bind_param("ssi", $c1, $c2, $id);
foreach ($items as $item) {
    $c1 = $item['c1'];
    $c2 = $item['c2'];
    $id = $item['id'];
    $rc = $stmt->execute();
}
$stmt->close();
If you really want to do it all in one big statement, a kludgy way would be to use the "on duplicate key" functionality of the insert statement, even though all the rows should already exist, and the duplicate key update will hit for every single row.
INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b), c=VALUES(c);
Try LOAD DATA INFILE. Much faster than MySQL INSERTs or UPDATEs, as long as you can get the data into a flat format.
I am working on a web app project and there is a rather large HTML form whose data needs to be stored in a table. The form and the insert are already done, but my client wants to be able to load the saved data back into the HTML form and change it. Again, this is no problem, but when I went to write the update I came across a question: would it be appropriate to just keep the insert query and then delete the old row if it was an edit?
Basically, what already happens is that when the form is submitted, all of the data is put into a table using INSERT. I also have a flag called edit that contains the primary key ID if the data is for an existing record being updated. I can handle the update function in two ways:
a) Create an actual update query with all the fields/data set and use an if/else to decide whether to run the update or insert query.
b) Do the insert every time but add a single line to DELETE WHERE row=editID after the insert is successful.
Since the DELETE would only happen if the INSERT was successful, I don't run the risk of deleting the data without inserting it and thus losing it; but since INSERT/DELETE is two queries, would it be less efficient than just using an if/else to decide whether to run an INSERT or an UPDATE?
There is a second table that uses the auto-increment ID as a foreign key, and that table has to be updated every time the form is submitted, so if I delete the row in table A I will also be deleting the associated rows from table B. This seems like it would be bad programming practice, so I am leaning towards option a) anyway, but it is very tempting just to use the single-line option. The DELETE would basically be as follows. Would this in fact be bad programming practice? Aside from conventions, are there any reasons why this is a "never do that!" type of code?
if ($insertFormResults) {
    $formId = mysql_insert_id();
    echo "Your form was saved successfully.";
    if (isset($_POST['edit'])) {
        $query = "DELETE FROM registerForm WHERE id='$_POST[edit]'";
        $result = mysql_query($query);
    }
}
Whilst the INSERT/DELETE option would work perfectly well, I'd recommend against it as:
1) Unless you bundle the INSERT/DELETE up into a single transaction, or better yet encapsulate the INSERT/DELETE in a stored procedure, you run the theoretical risk of accumulating duplicates. And if you do use an SP or a transaction, you're just effectively rewriting the UPDATE statement, which is obviously inefficient and moreover will give rise to a few WTF raised eyebrows later from anyone maintaining your code.
2) Although it doesn't sound like an issue in your case, you are potentially impacting referential integrity should you need that. Furthermore, you are losing the rather useful ability to easily retrieve records in creation order.
3) Probably not a great consideration on a small application, but you are going to end up with a seriously fragmented database fairly quickly, which will slow data retrieval.
Update is only one round trip to the server, which is more efficient. Unless you have a reason that involves the possibility of bad data, always default to using an UPDATE.
It seems to me that doing the delete is pointless. If you run an UPDATE in MySQL, it will only change the record if it is different from what is stored already, so is there some reason why you would need to do a delete instead? I usually use a case (switch) to catch update/delete calls from the user:
<?php
switch ($action) {
    case "delete":
        // block of code to run when the condition equals value1
        break;
    case "edit":
        // block of code to run when the condition equals value2
        break;
}
?>