Update code
$stmt_update = $db->prepare( '
UPDATE table SET Column = 1
WHERE Id = 17
' );
$stmt_update->execute( );
This takes ~25 milliseconds.
But
$stmt_select = $db->prepare( '
SELECT
`Text`,
`NameOfPerson`,
`EmailOne`,
`Phone`,
`EmailTwo`
FROM table_name
WHERE Id = ?
' );
$stmt_select->execute( array( trim( $_GET["two"] ) ) );
This takes ~ one millisecond.
Is such a difference normal? Any ideas how to make the update execute faster?
That makes sense, but you need to learn a few things about measuring performance. The first query may be reading data into memory, which takes a bit of time; when you run the second query, the data is already there. Often, the second time you run exactly the same query it is faster than the first time -- unless you fiddle with caching options on the server.
The update is going to be slower because databases have what are called ACID properties. That means that the update is not completed until the database is as sure as it can be that the change has been made. Typically, this means committing a log transaction to disk, so you are waiting for the disk write to be completed. It is not enough for the disk write to start -- it has to be completed. It also means that the update has to acquire locks for the parts of the table being updated.
In addition, the database eventually has to write the actual modified data pages to disk. In MySQL, this probably depends on the storage engine. So, you might be waiting for that.
A select doesn't modify anything. It just reads. So there is some time for getting the data, but as soon as it is in memory, the query can process and finish.
In addition, updates may generate other work for the database engine -- such as updating indexes and running triggers. It is unclear if these are defined on your table.
So, I would expect an update to take longer than a select.
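If you have several of these updates to apply at once, one common way to reduce the per-statement commit overhead is to group them into a single transaction, so the durable log write happens once at COMMIT instead of once per autocommitted statement. A minimal sketch, reusing the table and column names from the question (the extra Ids are made up for illustration):

START TRANSACTION;
UPDATE `table` SET `Column` = 1 WHERE `Id` = 17;
UPDATE `table` SET `Column` = 1 WHERE `Id` = 18;  -- additional Ids are illustrative only
UPDATE `table` SET `Column` = 1 WHERE `Id` = 19;
COMMIT;  -- one log flush for the whole batch

This won't speed up a single one-off update, but it amortizes the disk-flush cost when you have many of them.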
Related
I have a MySQL table that keeps gaining new records every 5 seconds.
The questions are:
Can I run a query on this set of data that may take more than 5 seconds?
If a SELECT statement takes more than 5 seconds, will it affect the scheduled INSERT statement?
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
Can I run a query on this set of data that may take more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (i.e. searching using primary keys), and using the EXPLAIN keyword to get the database's query planner to tell you how it intends to execute the query internally (e.g. is it using primary keys, foreign keys, or indexes, or is it scanning all table rows?).
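As a rough illustration of reading EXPLAIN output (the table and column names here are made up, not taken from your schema):

EXPLAIN SELECT col_a, col_b
FROM your_table
WHERE id = 12345;
-- In the output, check the "type", "key" and "rows" columns:
-- "const" or "ref" with a named key means an index is being used,
-- while "ALL" means a full table scan, which is what you want to avoid.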
If a SELECT statement takes more than 5 seconds, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent insert from actually inserting until the select has completed", that depends on the storage engine. For example, MyISAM and InnoDB are different, and that includes locking policies. For example, MyISAM tends to lock entire tables while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the select statement were to lock an entire table, then the insert statement would have to wait until the select has completed and the lock has been released, meaning that the app would see the results prior to the insert's update.
I understand that locking the database can prevent messing up the SELECT statement.
It can also introduce a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds, and depending on how frequently you run your queries and how efficiently they've been built.
What is the good practice when I need the data for calculations while that data will be updated within a short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested, and to let the database do its job of ensuring the consistency and integrity of said data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated but others haven't yet).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
All of your questions are related to table locking, and the answers depend on how the database is configured.
Read: http://www.mysqltutorial.org/mysql-table-locking/
Performing a SELECT statement while an INSERT statement is working
If you want to perform a SELECT statement while an INSERT is still running, you should open a new connection and close it for every check. For example, if I want to insert lots of records and want to know, via a SELECT query, that the last record has been inserted, I have to open and close the connection inside the for or while loop.
# send a request to store data
# (the INSERT statement is running and takes a long time)

# poll with a SELECT statement in a while loop,
# opening and closing a fresh connection each time
while true:
    cnx.open()
    result = select statement
    cnx.close()
    # break the while loop once you get the result
    if result:
        break
I have a database with 200+ entries, and with a cronjob I'm updating the database every 5 minutes. All entries are unique.
My code:
foreach ($players as $pl) {
    mysql_query("UPDATE rp_players SET state_one = '".$pl['s_o']."', state_two = '".$pl['s_t']."' WHERE id = '".$pl['id']."' ")
        or die(mysql_error());
}
There are 200+ queries every 5 minutes. I don't know what would happen if my database had many more entries (2000... 5000+). I think the server would die.
Is there any solution (optimization or something...)?
I don't think you can do much except make the cron execute every 10 minutes if it keeps getting slower. Also, you can set a rule to delete entries older than X days.
If id is your primary (and, as you mentioned, unique) key, the updates should be fast and can't be optimized much further (since it's a primary key)... if not, see if you can add an index.
The only problem which could occur (to my mind) is cron job overlapping due to slow updates: let's assume your job starts at 1:00am and isn't finished by 1:05am... this will mean that your queries will pile up, creating server load, slow response times, etc...
If this is your case, you could use RabbitMQ to queue your update queries and process them in a more controlled way...
I would load all data that is to be updated into a temporary table using the LOAD DATA INFILE command: http://dev.mysql.com/doc/refman/5.5/en/load-data.html
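Roughly, the loading step could look like this; the temporary table definition, column types and file path below are assumptions you'd adapt to your actual data:

-- staging table matching the columns you want to update
CREATE TEMPORARY TABLE tmp_players (
    id        INT NOT NULL PRIMARY KEY,
    state_one VARCHAR(32),
    state_two VARCHAR(32)
);

-- bulk-load the new values from a CSV file on the server
LOAD DATA INFILE '/tmp/players.csv'
INTO TABLE tmp_players
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, state_one, state_two);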
Then, you could update everything with one query:
UPDATE rp_players p
INNER JOIN tmp_players t
    ON p.id = t.id
SET p.state_one = t.state_one
  , p.state_two = t.state_two
;
This would be much more efficient because you would remove a lot of the back-and-forth to the server that you incur by running a separate query every time through a PHP loop.
Depending on where the data is coming from, you might be able to remove PHP from this process entirely.
Our database is insert intensive (200-500k inserts per night) but update light (maybe a few hundred updates per day).
I need to preserve a history of all changes to the inserted rows themselves for an indefinite period of time (but not the actual insert). I would love to use Change Data Capture, but the amount of space required to support this is not available. If I can figure out how to do one of the following, my life would be much easier.
1) Limit change data capture to UPDATEs and DELETEs only
2) Clean up only INSERTs from the CDC tables regularly
In the past, I'd have just used a trigger (which is still not off the table!).
I would just use a trigger to capture updates and deletes.
I don't think you can tell CDC which DML to pay attention to, and I think it's quite wasteful to let CDC record all of these inserts only to delete them afterward. That in and of itself is expensive, and the fragmentation it will cause will also cause issues for any queries you run against the capture tables (you'll have lots of mostly-empty pages), as will the work required to constantly keep the statistics up to date.
You could possibly put an INSTEAD OF INSERT trigger on the capture table that just does nothing, but I haven't tried this to see if it is even allowed, and I certainly don't know what impact it would have on the CDC functions. Possibly worth some investigation, but my original answer still stands even if this hack does work: just use a trigger.
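For reference, a bare-bones sketch of such a trigger; the history table and the column list are assumptions you'd replace with whatever you actually need to preserve:

-- assumes dbo.TableYouWantToCapture_History already exists with matching columns
CREATE TRIGGER trg_TableYouWantToCapture_History
ON dbo.TableYouWantToCapture
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- "deleted" holds the pre-change rows for both updates and deletes
    INSERT INTO dbo.TableYouWantToCapture_History (ColumnName1, ColumnName2, ChangedAt)
    SELECT d.ColumnName1, d.ColumnName2, SYSDATETIME()
    FROM deleted AS d;
END;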
If space is a consideration, you can always assign the CDC tables to a different filegroup, which could potentially live on a different drive. You'd do that this way:
ALTER DATABASE YourDatabase
ADD FILEGROUP [cdc_ChangeTables];
go
--this step requires going on to somewhere on your hard drive and creating a folder
ALTER DATABASE YourDatabase
ADD FILE ( NAME = N'cdc_ChangeTables',
FILENAME = N'E:\NameOfFolderYouSetUp\YourDatabase_cdc_ChangeTables.mdf',
SIZE = 1048576KB,
FILEGROWTH = 102400KB )
TO FILEGROUP [cdc_ChangeTables];
GO
Then when you want to set up your CDC tables, you point them toward that filegroup instead:
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'TableYouWantToCapture',
    @role_name = N'cdc_admin',
    @filegroup_name = N'cdc_ChangeTables', --this is where you name the filegroup from the previous step
    @supports_net_changes = 1,
    @capture_instance = N'dbo_TableYouWantToCapture',
    @captured_column_list = 'ColumnName1, ColumnName2'; --comma-delimited list of column names
GO
If you want to query only updates/deletes, you can use the system function like so:
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_TableYouWantToCapture(@from_lsn, @to_lsn, N'all update old')
WHERE __$operation IN (1,3,4); -- 1 = delete, 3 = update (before image), 4 = update (after image)
MySQL 5.1, Ubuntu 10.10 64bit, Linode virtual machine.
All tables are InnoDB.
One of our production machines uses a MySQL database containing 31 related tables. In one table, there is a field containing display values that may change several times per day, depending on conditions.
These changes to the display values are applied lazily throughout the day during usage hours. A script periodically runs and checks a few inexpensive conditions that may cause a change, and updates the display value if a condition is met. However, this lazy method doesn't catch all possible scenarios in which the display value should be updated, in order to keep the background process load to a minimum during working hours.
Once per night, a script purges all display values stored in the table and recalculates them all, thereby catching all possible changes. This is a much more expensive operation.
This has all been running consistently for about 6 months. Suddenly, 3 days ago, the run time of the nightly script went from an average of 40 seconds to 11 minutes.
The overall proportions of the stored data have not changed in a significant way.
I have investigated as best I can, and the part of the script that is suddenly running slower is the last update statement that writes the new display values. It is executed once per row, given the (INT(11)) id of the row and the new display value (also an INT).
update `table` set `display_value` = ? where `id` = ?
The funny thing is that the purge of all the previous values is executed as:
update `table` set `display_value` = null
And this statement still runs at the same speed as always.
The display_value field is not indexed. id is the primary key. There are 4 other foreign keys in table that are not modified at any point during execution.
And the final curveball: if I dump this schema to a test VM and execute the same script, it runs in 40 seconds, not 11 minutes. I have not attempted to rebuild the schema on the production machine, as that's simply not a long-term solution, and I want to understand what's happening here.
Is something off with my indexes? Do they get cruft in them after thousands of updates on the same rows?
Update
I was able to completely resolve this problem by running optimize on the schema. Since InnoDB doesn't support optimize, this forced a rebuild, and resolved the issue. Perhaps I had a corrupted index?
mysqlcheck -A -o -u <user> -p
There is a chance that the UPDATE statement won't use the index on id; however, that's very improbable (if possible at all) for a query like yours.
Is there a chance your table is locked by a long-running concurrent query / DML? Which engine does the table use?
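To check for that, you can look at what else is running and whether anything is waiting on locks, for example:

-- look for long-running statements in the "Time" column
SHOW FULL PROCESSLIST;
-- the TRANSACTIONS section of the InnoDB status output shows lock waits
SHOW ENGINE INNODB STATUS\G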
Also, updating the table record-by-record is not efficient. You can load your values into a temporary table in a bulk manner and update the main table with a single command:
CREATE TEMPORARY TABLE tmp_display_values (id INT NOT NULL PRIMARY KEY, new_display_value INT);
INSERT
INTO tmp_display_values
VALUES
(?, ?),
(?, ?),
…;
UPDATE `table` dv
JOIN tmp_display_values t
ON dv.id = t.id
SET dv.display_value = t.new_display_value;
EDIT: To clarify, the records originally come from a flat-file database and are not in the MySQL database.
In one of our existing C programs, whose purpose is to take data from the flat file and insert it (based on criteria) into the MySQL tables:
Open connection to MySQL DB
for record in all_record_of_my_flat_file:
    if record contains a certain field:
        if record is NOT in sql_table A:                     // see #1
            insert record information into sql_table A and B // see #2
Close connection to MySQL DB
Where #1 is: select field from sql_table A where field=XXX
And #2 is: 2 inserts
I believe that management did not feel it was worth adding the functionality so that when the field in the flat file is created, it would be inserted into the database. This is specific to one customer (that I know of). I, too, felt it odd that we use a tool such as this to "sync" the data. I was given the duty of using and maintaining this script, so I haven't heard too much about the entire process. The intent is primarily to handle additional records, so this is not the first time it has been used.
This is typically done every X months to sync everything up, or so I'm told. I've also been told that this process takes roughly a couple of days. There are (currently) at most 2.5 million records (though not necessarily all 2.5M will be inserted, and most likely much fewer). One of the tables contains 10 fields and the other 5 fields. There isn't much to be done about iterating through the records, since that part can't be changed at the moment. What I would like to do is speed up the part where I query MySQL.
I'm not sure if I have left out any important details -- please let me know! I'm also no SQL expert so feel free to point out the obvious.
I thought about:
Putting all the inserts into a transaction (at the moment I'm not sure how important it is for the transaction to be all-or-none or if this affects performance)
Using Insert X Where Not Exists Y
LOAD DATA INFILE (but that would require I create a (possibly) large temp file)
I read that (hopefully someone can confirm) I should drop indexes so they aren't re-calculated.
mysql Ver 14.7 Distrib 4.1.22, for sun-solaris2.10 (sparc) using readline 4.3
Why not upgrade your MySQL server to 5.0 (or 5.1), and then use a trigger so it's always up to date (no need for the monthly script)?
DELIMITER //
CREATE TRIGGER insert_into_a AFTER INSERT ON source_table
FOR EACH ROW
BEGIN
    IF NEW.foo > 1 THEN
        -- check whether a matching row already exists in a
        SET @testvar = (SELECT id FROM a WHERE a.id = NEW.id);
        IF @testvar IS NULL THEN
            INSERT INTO a (col1, col2) VALUES (NEW.col1, NEW.col2);
            INSERT INTO b (col1, col2) VALUES (NEW.col1, NEW.col2);
        END IF;
    END IF;
END //
DELIMITER ;
Then, you could even setup update and delete triggers so that the tables are always in sync (if the source table col1 is updated, it'll automatically propagate to a and b)...
Here are my thoughts on your utility script...
1) Is just good practice anyway; I'd do it no matter what.
2) May save you a considerable amount of execution time. If you can solve a problem in straight SQL without using iteration in a C program, this can save a fair amount of time. You'll have to profile it first in a test environment to ensure it really does.
3) LOAD DATA INFILE is a tactic to use when inserting a massive amount of data. If you have a lot of records to insert (I'd write a query to do an analysis to figure out how many records you'll have to insert into table B), then it might behoove you to load them this way.
Dropping the indexes before the insert can be helpful to reduce running time, but you'll want to make sure you put them back when you're done.
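For example, on MyISAM tables you can skip non-unique index maintenance during the load and rebuild the indexes in one pass at the end; the table name below is a placeholder, and these statements only have an effect on MyISAM tables:

ALTER TABLE sql_table_b DISABLE KEYS;
-- ... run the bulk inserts here ...
ALTER TABLE sql_table_b ENABLE KEYS;  -- rebuilds the non-unique indexes in one pass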
Although... why aren't all the records in table B in the first place? You haven't mentioned how processing works, but I would think it would be advantageous to ensure (in your app) that the records got there without your service script's intervention. Of course, you understand your situation better than I do, so ignore this paragraph if it's off-base. I know from experience that there are lots of reasons why utility cleanup scripts need to exist.
EDIT: After reading your revised post, your problem domain has changed: you have a bunch of records in a (searchable?) flat file that you need to load into the database based on certain criteria. I think the trick to doing this as quickly as possible is to determine where the C application is actually the slowest and spends the most time spinning its proverbial wheels:
If it's reading off the disk, you're stuck; you can't do anything about that unless you get a faster disk.
If it's doing the SQL query-insert operation, you could try optimizing that, but you're doing a comparison between two data sources (the flat file and the MySQL database).
A quick thought: doing a LOAD DATA INFILE bulk insert to populate a temporary table very quickly (perhaps even an in-memory table, if MySQL allows that), and then doing the INSERT ... WHERE NOT EXISTS, might be faster than what you're currently doing.
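Sketching that idea out; the staging table and column names below are placeholders rather than your real schema:

-- after LOAD DATA INFILE has filled tmp_flat_records, copy over only the missing rows
INSERT INTO sql_table_a (field, other_col)
SELECT t.field, t.other_col
FROM tmp_flat_records AS t
WHERE NOT EXISTS (
    SELECT 1 FROM sql_table_a AS a WHERE a.field = t.field
);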
In short, do profiling, and figure out where the slowdown is. Aside from that, talk with an experienced DBA for tips on how to do this well.
I discussed this with another colleague and here are some of the improvements we came up with:
For:
SELECT X FROM TABLE_A WHERE Y=Z;
Change to (currently awaiting verification on whether X is, and will always be, unique):
SELECT X FROM TABLE_A WHERE X=Z LIMIT 1;
This was an easy change and we saw some slight improvements. I can't really quantify it well but I did:
SELECT X FROM TABLE_A ORDER BY RAND() LIMIT 1
and compared the first two queries. For a few tests there was about a 0.1 second improvement. Perhaps it cached something, but the LIMIT 1 should help somewhat.
Then another (yet to be implemented) improvement(?):
for record number X in entire record range:
    if (no CACHE):
        CACHE = retrieve Y records (sequentially) from the database
    if (X exceeds the highest record number in CACHE):
        CACHE = retrieve the next set of Y records (sequentially) from the database
    search for record number X in CACHE
    ...etc
I'm not sure what to set Y to; are there any methods for determining a good number to try? The table has 200k entries. I will edit in some results when I finish the implementation.
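For what it's worth, the "retrieve the next set of Y records (sequentially)" step could key off the highest record number seen so far instead of using an OFFSET, so every chunk is an indexed range scan. A sketch (the table and column names are placeholders, and Y = 1000 is only a starting guess to benchmark against):

SELECT id, field
FROM sql_table_a
WHERE id > ?        -- highest id from the previous chunk; 0 on the first pass
ORDER BY id
LIMIT 1000;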