I have an odd problem that I can't figure out. I'm not much of a MySQL guy (more of a SQL Server person), and I have an INSERT statement that is running (seemingly) forever.
INSERT INTO voter_registration_v2.temp_address_map (person_id, address_id)
select person_id, address_id from
voter_registration.voters v
inner join voter_registration_v2.address a
on v.house_num = a.house_num
and v.half_code = a.half_code
and v.street_dir = a.street_dir
and v.street_name = a.street_name
and v.street_type_cd = a.street_type_cd
and v.street_sufx_cd = a.street_sufx_cd
and v.unit_designator = a.unit_designator
and v.unit_num = a.unit_num
and v.res_city_desc = a.res_city_desc
and v.state_cd = a.state_cd
and v.zip_code = a.zip_code;
The SELECT itself runs in 20s, 16s to fetch. When I run it with the INSERT, I've timed out at 6000s. All tables use the MyISAM engine; I attempted InnoDB originally, but it didn't make a difference. It definitely is a large insert, about 600k records. Below is the CREATE for the temp table.
CREATE TABLE temp_address_map (
    person_id INT PRIMARY KEY,
    address_id INT
);
However, even with 600k - I can't imagine an INSERT taking 100+ minutes if the SELECT only takes ~30s. Appreciate any suggestions.
I have noticed odd problems with my local installation of MySQL anyway. Some SELECT statements that take .5 seconds or less randomly began running forever as well. The ONLY way I could fix the problem was to uninstall and reinstall the server. I must have run through 100 suggestions on forums before I gave up. It's almost like MySQL gets progressively slower until it's unusable. (my RAM is around 48% used). Kind of odd, not sure that's what is going on here though...
You are correct. Under most circumstances, a select query that returns in 20s should not take hours for the insert. However, I would caution that you may be timing the select based on the "first row" that is returned. The insert doesn't return until all rows have been processed.
You have a very detailed on clause. I would suggest a composite index on all the columns used in the clause (starting from the most general to the least general):
create index idx_address_allkeys
on address(state_cd, res_city_desc, zip_code, street_name, ...);
In other words, I am guessing that your code is using a nested loop join, returning one row at a time.
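To check which plan you are actually getting, you can run EXPLAIN on the SELECT by itself, before and after adding the index, and look at the join type and estimated row count for the address table; something like:
EXPLAIN
SELECT person_id, address_id
FROM voter_registration.voters v
INNER JOIN voter_registration_v2.address a
    ON v.house_num = a.house_num
   AND v.zip_code = a.zip_code
   -- ... plus the remaining join columns from your query ...
;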
Related
I have run into an issue using a MySQL database where, after creating a new table and adding CRUD database query logic to my web application (with the backend written in C), update queries will sometimes take 10-20 minutes to execute.
The web application has Apache modules that talk to server daemons that have a connection to a MySQL (MariaDB 10.4) database. The server daemons each have about 20 worker threads, waiting to handle any requests from the Apache modules. The worker threads maintain a constant connection to the MySQL database. I added a new table with the following schema:
CREATE TABLE MyTable
(
    table_id INT NOT NULL AUTO_INCREMENT,
    index_id INT NOT NULL,
    int_column_1 INT DEFAULT 0,
    decimal_column_1 DECIMAL(9,3) DEFAULT 0,
    decimal_column_2 DECIMAL(9,3) DEFAULT 0,
    varchar_column_1 VARCHAR(3000) DEFAULT NULL,
    varchar_column_2 VARCHAR(3000) DEFAULT NULL,
    deleted TINYINT DEFAULT 0,
    PRIMARY KEY (table_id),
    KEY index_on_index_id (index_id)
);
Then I added the following CRUD operations:
1. RETRIEVE:
SELECT table_id, varchar_column_1, ... FROM MyTable WHERE index_id = ${given index_id}
2. CREATE:
INSERT INTO MyTable (index_id, varchar_column_2, ...) VALUES (${given}, ${given}, ...)
Note: This is done using a prepared statement because ${given varchar_column_2} is a user-entered value.
3. UPDATE:
UPDATE MyTable SET varchar_column_2 = IFNULL(${given varchar_column_2}, varchar_column_2) WHERE table_id = ${given table_id}
Note: This is also done using a prepared statement because ${given varchar_column_2} is a user-entered value. Also, the IFNULL is a kludge to handle the possibility that the given varchar_column_2 might be null, in which case the column is just set to the value already in the table (see the prepared-statement sketch after this list).
4. DELETE:
UPDATE MyTable SET deleted = 1 WHERE table_id = ${given table_id}
Finally, there is a delete index_id operation:
UPDATE MyTable SET deleted = 1 WHERE index_id = ${given index_id}
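Purely to illustrate the shape of that UPDATE as a prepared statement (the real application prepares it through the C API; upd_stmt, @new_value, and @row_id are made-up names for this sketch):
PREPARE upd_stmt FROM
    'UPDATE MyTable
        SET varchar_column_2 = IFNULL(?, varchar_column_2)
      WHERE table_id = ?';
-- bind a user-entered value (or NULL to keep the existing one) and a row id
SET @new_value = 'example value', @row_id = 42;
EXECUTE upd_stmt USING @new_value, @row_id;
DEALLOCATE PREPARE upd_stmt;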
This was deployed to a production server without proper testing. On that production server, a script I wrote was run to fill MyTable with about 30,000 entries. Then, using the CRUD operations, about 600 updates, 50 creates, 20 deletes, and thousands of retrieves were performed on the table. The problem is that after some time (an hour or two) of these operations being performed, the update operations would take 10+ minutes to execute. Eventually, all of the worker threads in the server daemon would be stuck waiting on the update operations, and any other requests to the daemon would time out. This behavior happened twice in one day and one more time two days later.
There were three parts of this behavior that really confused me. One is that all update operations on the database were being blocked, so even if the daemon, or any daemon, was updating a different table in the database, that update would take 10+ minutes. The next is that select operations would execute instantly while all the update queries were taking 10+ minutes. Finally, after 10-20 minutes, all of the 20-ish update queries would successfully execute, the database would be correctly updated, and the threads would go back to working properly.
I received a dump of the database and ran EXPLAIN ${mysql query} for each of the new CRUD queries, and none produced strange results. In the "Extra" column, the only entry was "Using where" for the queries that have where clauses. Another potential problem is the use of varchars. Since the UPDATE operations are used the most and are the ones that seem to be causing the problem, I thought that because the varchars change size a lot (they range from 8 to 500 characters), the updates might run into some MySQL memory issue that causes the long execution time. I also thought maybe there was an issue with table-level locks, but running
SHOW STATUS LIKE 'Table%';
returned table_locks_waited = 0.
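For reference, since Table_locks_waited only covers table-level locks, an InnoDB-level check (assuming the tables use InnoDB, the MariaDB default) would be something like:
-- transactions currently open, and what they are running or waiting on
SELECT trx_id, trx_state, trx_started, trx_query
FROM information_schema.INNODB_TRX;

-- overall lock and transaction picture
SHOW ENGINE INNODB STATUS;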
Unfortunately, no database monitoring was being done on the production server that was having issues; I only have the order of the transactions as they happened. Adding to this, each time the issue occurred, the first update query that was blocked was an update to a different table in the database. It was the same query twice (though it is also the most common update query in the application), and it has been in the application for months without any issues.
I tried to reproduce this issue on a server with the same table and CRUD operations, but with only 600 entries in MyTable, making about 100 update requests, 20 create requests, 5 delete requests, and hundreds of retrieve requests. I could not reproduce the issue of the update queries taking 10+ minutes. This makes me think that maybe the size of the table has something to do with it.
I am looking for any suggestions on what might be causing this issue, or any ideas on how to better diagnose the problem.
Sorry for the extremely long question. I am a junior software engineer who is in a little over his head. Any help would be really appreciated. I can also provide any additional information about the database or application if needed.
I have a table with just under 50 million rows. It hit the limit for INT (2147483647). At the moment the table is not being written to.
I am planning on changing the ID column from INT to BIGINT. I am using a Rails migration to do this with the following migration:
def up
execute('ALTER TABLE table_name MODIFY COLUMN id BIGINT(8) NOT NULL AUTO_INCREMENT')
end
I have tested this locally on a dataset of 2000 rows and it worked OK. Running the ALTER TABLE command across the 50 million rows should be OK since the table is not being used at the moment?
I wanted to check before I run the migration. Any input would be appreciated, thanks!
We had exactly the same scenario, but with PostgreSQL, and I know how 50M rows can fill up the whole range of INT: it's gaps in the IDs, gaps generated by deleting rows over time or other factors involving incomplete transactions, etc.
I will explain what we ended up doing, but first, seriously: testing a data migration for 50M rows on 2k rows is not a good test.
There can be multiple solutions to this problem, depending on factors like which DB provider you are using. We were using Amazon RDS, and it has limits on runtime and on what they call IOPS (input/output operations). If we ran such an intensive query on a DB with those limits, it would run out of its IOPS quota midway through, and when the IOPS quota runs out, the DB ends up being too slow and kind of just useless. We had to cancel our query and let the IOPS catch up, which takes about 30 minutes to 1 hour.
If you have no such restrictions and have the DB on premises or something like that, then there is another factor: can you afford downtime?
If you can afford downtime and have no IOPS-type restriction on your DB, you can run this query directly. It will take a lot of time (maybe half an hour or so, depending on a lot of factors), and in the meantime the table will be locked as rows are being changed, so make sure that not only is this table not getting any writes, but also no reads during the process, so your process runs to the end smoothly without any deadlock-type situation.
What we did to avoid downtime and the Amazon RDS IOPS limits:
In my case, we still had about 40M ids left in the table when we realized this was going to run out, and we wanted to avoid downtime, so we took a multi-step approach:
Create a new BIGINT column, name it new_id or something (have it uniquely indexed from the start); this will be nullable with default NULL.
Write background jobs which run a few times each night and backfill the new_id column from the id column. We were backfilling about 4-5M rows each night, and a lot more over weekends (as our app had no traffic on weekends).
When you are caught up backfilling, you will have to stop any access to this table (we just took down our app for a few minutes at night), and create a new sequence starting from the max(new_id) value, or use the existing sequence and bind it to the new_id column, with its default value set to nextval of that sequence.
Now switch the primary key from id to new_id; before that, make new_id NOT NULL.
Delete id column.
Rename new_id to id.
And resume your DB operations.
The above is a minimal write-up of what we did; you can google some nice articles about it, one is this. This approach is not new and pretty common, so I am sure you will find MySQL-specific ones too, or you can just adjust a couple of things from the article above and you should be good to go.
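Since our write-up is Postgres-flavoured, here is a rough MySQL sketch of the same column-swap idea (table_name, id, the index name, and the batch size are placeholders, and AUTO_INCREMENT stands in for the sequence; note each ALTER still rewrites the table, so test the timings first):
-- 1. Add the new column, nullable, with a unique index from the start
ALTER TABLE table_name ADD COLUMN new_id BIGINT NULL,
                       ADD UNIQUE INDEX idx_new_id (new_id);

-- 2. Backfill in batches from a background job (repeat until no rows match)
UPDATE table_name SET new_id = id WHERE new_id IS NULL LIMIT 10000;

-- 3. In the maintenance window: promote new_id to be the primary key
ALTER TABLE table_name MODIFY new_id BIGINT NOT NULL;
ALTER TABLE table_name DROP COLUMN id;
ALTER TABLE table_name ADD PRIMARY KEY (new_id);
ALTER TABLE table_name CHANGE new_id id BIGINT NOT NULL AUTO_INCREMENT;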
I have a database with 200+ entries, and with a cronjob I'm updating the database every 5 minutes. All entries are unique.
My code:
foreach($players as $pl){
mysql_query("UPDATE rp_players SET state_one = '".$pl['s_o']."', state_two = '".$pl['s_t']."' WHERE id = '".$pl['id']."' ")
or die(mysql_error());
}
There are 200+ queries every 5 minutes. I don't know what would happen if my database had many more entries (2000... 5000+). I think the server would die.
Is there any solution (optimization or something...)?
I think you can't do much but make the cron execute every 10 minutes if it's getting slower and slower. Also, you can set a rule to delete entries older than X days.
If id is your primary (and unique, as you mentioned) key, updates should be fast and can't really be optimised further (since it's a primary key... if not, see if you can add an index).
The only problem which could occur (to my mind) is cronjob overlapping due to slow updates: let's assume your job starts at 1:00am and isn't finished by 1:05am... this means your queries will pile up, creating server load, slow response times, etc...
If this is your case, you could use RabbitMQ to queue your update queries and process them in a more controlled way...
I would load all data that is to be updated into a temporary table using the LOAD DATA INFILE command: http://dev.mysql.com/doc/refman/5.5/en/load-data.html
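A rough sketch of that load step, assuming the fresh values arrive in a CSV file (/path/to/players.csv is a made-up path, and the state column types are guesses):
CREATE TEMPORARY TABLE tmp_players (
    id        INT NOT NULL PRIMARY KEY,
    state_one VARCHAR(255),
    state_two VARCHAR(255)
);

LOAD DATA LOCAL INFILE '/path/to/players.csv'
INTO TABLE tmp_players
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, state_one, state_two);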
Then, you could update everything with one query:
UPDATE rp_players p
INNER JOIN tmp_players t
ON p.id = t.id
SET p.state_one = t.state_one
, p.state_two = t.state_two
;
This would be much more efficient because you would remove a lot of the back and forth to the server that you are incurring by running a separate query every time through a php loop.
Depending on where the data is coming from, you might be able to remove PHP from this process entirely.
MySQL 5.1, Ubuntu 10.10 64bit, Linode virtual machine.
All tables are InnoDB.
One of our production machines uses a MySQL database containing 31 related tables. In one table, there is a field containing display values that may change several times per day, depending on conditions.
These changes to the display values are applied lazily throughout the day during usage hours. A script periodically runs and checks a few inexpensive conditions that may cause a change, and updates the display value if a condition is met. However, this lazy method doesn't catch all possible scenarios in which the display value should be updated, in order to keep background process load to a minimum during working hours.
Once per night, a script purges all display values stored in the table and recalculates them all, thereby catching all possible changes. This is a much more expensive operation.
This has all been running consistently for about 6 months. Suddenly, 3 days ago, the run time of the nightly script went from an average of 40 seconds to 11 minutes.
The overall proportions on the stored data have not changed in a significant way.
I have investigated as best I can, and the part of the script that is suddenly running slower is the last update statement that writes the new display values. It is executed once per row, given the (INT(11)) id of the row and the new display value (also an INT).
update `table` set `display_value` = ? where `id` = ?
The funny thing is, that the purge of all the previous values is executed as:
update `table` set `display_value` = null
And this statement still runs at the same speed as always.
The display_value field is not indexed. id is the primary key. There are 4 other foreign keys in the table that are not modified at any point during execution.
And the final curveball: if I dump this schema to a test VM and execute the same script, it runs in 40 seconds, not 11 minutes. I have not attempted to rebuild the schema on the production machine, as that's simply not a long-term solution and I want to understand what's happening here.
Is something off with my indexes? Do they get cruft in them after thousands of updates on the same rows?
Update
I was able to completely resolve this problem by running optimize on the schema. Since InnoDB doesn't support OPTIMIZE directly, this forced a rebuild, which resolved the issue. Perhaps I had a corrupted index?
mysqlcheck -A -o -u <user> -p
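For a single table, the per-table equivalent of that mysqlcheck run is:
OPTIMIZE TABLE `table`;
-- InnoDB maps this to a rebuild plus ANALYZE and reports something like
-- "Table does not support optimize, doing recreate + analyze instead".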
There is a chance that the UPDATE statement won't use the index on id; however, that's very improbable (if possible at all) for a query like yours.
Is there a chance your table is locked by a long-running concurrent query / DML? Which engine does the table use?
Also, updating the table record-by-record is not efficient. You can load your values into a temporary table in a bulk manner and update the main table with a single command:
CREATE TEMPORARY TABLE tmp_display_values (id INT NOT NULL PRIMARY KEY, new_display_value INT);
INSERT
INTO tmp_display_values
VALUES
(?, ?),
(?, ?),
…;
UPDATE `table` dv
JOIN tmp_display_values t
ON dv.id = t.id
SET dv.display_value = t.new_display_value;
I have a process that imports a lot of data (950k rows) using inserts that insert 500 rows at a time. The process generally takes about 12 hours, which isn't too bad. Normally doing a query on the table is pretty quick (under 1 second) as I've put (what I think to be) the proper indexes in place. The problem I'm having is trying to run a query when the import process is running. It is making the query take almost 2 minutes! What can I do to make these two things not compete for resources (or whatever)? I've looked into "insert delayed" but not sure I want to change the table to MyISAM.
Thanks for any help!
Have you tried using priority hints?
SELECT HIGH_PRIORITY ... and INSERT LOW_PRIORITY ...
12 hours to insert 950k rows is pretty heavy duty. How big are these rows? What kind of indexes are on them? Even if the actual data insertion goes quickly, the continual updating of the indexes will definitely cause performance degradation for anything using those table(s) at the time.
Are you doing these imports with the bulk INSERT syntax (insert into tab (x) values (a), (b), (c), etc...) or one INSERT per row? Doing the bulk insert will require a longer index-updating period (as it has to generate index data for 500 rows) than doing it for a single row. There will no doubt be some sort of internal lock placed on the indexes while the data is updated, in which case you're contending with 950k/500 = 1,900 locking sessions at minimum.
I found that on some of my bulk-insert scripts (an HTTP log analyzer for some custom data mining), it was quicker to DISABLE the indexes on the relevant tables, then re-enable/rebuild them after the data dump was completed. If I remember right, it was about 37 minutes to insert 200,000 rows of hit data with keys enabled, and about 3 minutes with no indexing.
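If the table were MyISAM, that trick looks like the following; note it only suspends non-unique indexes, and the table name here is just borrowed from the question as a stand-in:
ALTER TABLE properties DISABLE KEYS;
-- ... run the 500-row bulk INSERT batches ...
ALTER TABLE properties ENABLE KEYS;  -- rebuilds the non-unique indexes in one pass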
So I finally found the slowdown while searching during the import of my data. I had one query like this:
SELECT * FROM `properties` WHERE (state like 'Florida%') and (county like 'Hillsborough%') ORDER BY created_at desc LIMIT 0, 50
and when I ran an EXPLAIN on it, I found out it was scanning around 215,000 rows (even with proper indexes on state and county in place). I then ran an EXPLAIN on the following query:
SELECT * FROM `properties` WHERE (state = 'Florida') and (county = 'Hillsborough') ORDER BY created_at desc LIMIT 0, 50
and saw that it only had to scan 500 rows. Considering that the actual result set was something like 350, I think I identified the slowdown.
I've made the switch to not using "like" in my queries and am very happy with the snappier results.
Thanks to everyone for your help and suggestions. They are much appreciated!
You can try importing your data into an auxiliary table and then merging it into the main table. You don't lose performance on your main table, and I think your DB can manage the merge much faster than the multiple insertions.
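A minimal sketch of that idea, assuming the target is the properties table from the question and properties_staging is a made-up name:
-- same structure and indexes as the main table
CREATE TABLE properties_staging LIKE properties;

-- point the 500-row batch INSERTs at the staging table instead, then merge off-peak:
INSERT INTO properties
SELECT * FROM properties_staging;

DROP TABLE properties_staging;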