I have a table with several hundred million rows of data. I want to delete the table, but every operation I perform on it loses its connection after running for 50,000+ seconds (about 16 hours), which is under the 60,000-second timeout I have set in the database. I've tried creating a stored procedure containing the DROP TABLE statement, thinking that if I hand the work to the DB it would not need a connection to process it, but the same thing happens. Is it just timing out? Or do I need to do something else?
Instead do TRUNCATE TABLE. Internally it creates an equivalent, but empty, table, then swaps. This technique might take a second, even for a very big table.
If you are deleting most of a table, then it is usually faster (sometimes a lot faster), to do
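CREATE TABLE new LIKE `real`;  -- REAL is a reserved word in MySQL, so it must be quoted
INSERT INTO new
SELECT ... FROM `real`
WHERE ...; -- the rows you want to keep
RENAME TABLE `real` TO old, new TO `real`;
DROP TABLE old;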
Why do you need to delete everything?
For other techniques in massive deletes, including big chunks out of a huge table, see https://mariadb.com/kb/en/mariadb/big-deletes/
Related
I was working on a table that has about 50 million rows (2 GB). I needed to optimize its performance, so I added an index on a column through the phpMyAdmin panel. The table got locked, every query on that table queued up behind the lock, and in the end I had to restart and kill all the queries. (And yeah, I forgot to mention I was doing this on production. My bad!)
When I did some research I found some solutions, such as creating a duplicate table, but is there any alternative method?
You may follow these steps:
Create a temp table
Create triggers on the first table (for inserts, updates, deletes) so that they are replicated to the temp table
In small batches, migrate data
When done, rename the temp table to the original table's name, and drop the other table
But since you said you are doing this in production, you need to consider live traffic while dropping one table and creating another.
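The steps above can be sketched in SQL roughly as follows. This is a simplified illustration, not a production-ready migration. The names orders, orders_new, id, status and idx_status are hypothetical, the table is assumed to have only those two columns with id as a primary key that is never updated, and the sketch ignores races between the batch copy and concurrent writes, which a tool such as pt-online-schema-change takes care of.

```sql
-- Hypothetical schema: orders(id INT PRIMARY KEY, status INT).
-- 1. Create the shadow table and add the new index while it is still empty.
CREATE TABLE orders_new LIKE orders;
ALTER TABLE orders_new ADD INDEX idx_status (status);

-- 2. Replicate every change on the live table into the shadow table.
CREATE TRIGGER orders_ai AFTER INSERT ON orders FOR EACH ROW
  REPLACE INTO orders_new (id, status) VALUES (NEW.id, NEW.status);
CREATE TRIGGER orders_au AFTER UPDATE ON orders FOR EACH ROW
  REPLACE INTO orders_new (id, status) VALUES (NEW.id, NEW.status);
CREATE TRIGGER orders_ad AFTER DELETE ON orders FOR EACH ROW
  DELETE FROM orders_new WHERE id = OLD.id;

-- 3. Copy existing rows in small batches; repeat with the next id range
--    until the whole table is covered. IGNORE skips rows that the
--    triggers have already copied.
INSERT IGNORE INTO orders_new (id, status)
SELECT id, status FROM orders WHERE id >= 1 AND id < 10001;

-- 4. Swap the tables in one atomic statement, then drop the old one
--    (its triggers are dropped along with it).
RENAME TABLE orders TO orders_old, orders_new TO orders;
DROP TABLE orders_old;
```

Keeping each batch small limits how long any single statement holds locks, which is the point of migrating in batches on a table that is serving live traffic.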
I have the following cron process running every hour to update global game stats:
Create temporary table
For each statistic, insert rows into the temporary table (stat key, user, score, rank)
Truncate main stats table
Copy data from temporary table to main table
The last step causes massive backlog in queries. Looking at SHOW PROCESSLIST I see a bunch of updating-status queries that are stuck until the copy completes (which may take up to a minute).
However I did notice that it's not like it has consecutive query IDs piling up, many queries get completed just fine. So it almost seems like it's a "thread" that gets stuck or something. Also of note is that the stuck updates have nothing in common with the ongoing copy (different tables, etc)
So:
Can I have cron connect to MySQL on a dedicated "thread" such that its disk activity (or whatever it is) doesn't lock other updates, OR
Am I misinterpreting what's going on, and if so how can I find out what the actual case is?
Let me know if you need any more info.
MySQL threads are not perfectly named. If you're a Java dev, for example, you might make some untrue assumptions about MySQL threads based on your Java knowledge.
For some reason that's hard to diagnose from a distance, your copy step is blocking some queries from completing. If you're curious about which ones try doing
SHOW FULL PROCESSLIST
and try to make sense of the result.
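To make the process list easier to read, it can also be queried like a table. The following is a sketch that assumes MySQL 5.7 or later with the sys schema installed and InnoDB tables; neither is stated in the question.

```sql
-- Non-idle sessions, longest-running first. This is the same data as
-- SHOW FULL PROCESSLIST, but it can be filtered and sorted.
SELECT id, user, db, command, time, state, info
FROM information_schema.processlist
WHERE command <> 'Sleep'
ORDER BY time DESC;

-- Which session is waiting on which (InnoDB row-lock waits only).
SELECT wait_age, locked_table,
       waiting_pid, waiting_query,
       blocking_pid, blocking_query
FROM sys.innodb_lock_waits;
```

If the second query returns no rows while the updates are stuck, row locks are probably not what is holding them up, and the cause is more likely elsewhere (for example disk contention during the copy).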
In the meantime, you might consider a slightly different approach to refreshing these hourly stats.
create a new, non temporary table, calling it something like stats_11 for the 11am update. If the table with that name already existed, drop the old one first.
populate that table as needed.
add the indexes it needs. Sometimes populating the table is faster if the indexes aren't in place while you're doing it.
create or replace view stats as select * from stats_11
Next hour, do the same with stats_12. The idea is to have your stats view pointing to a valid stats table almost always.
This should reduce your exposure time to the stats-table building operation.
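One hourly run of that approach might look like the sketch below. The column names follow the question's description (stat key, user, score, rank), but the exact types, the source table scores(user_id, points), and the index name are assumptions made for illustration. It also assumes MySQL 8.0 or later (for the RANK() window function) and that stats is not already the name of a base table, since a view cannot share a name with a table.

```sql
-- Build the 11am table from scratch.
DROP TABLE IF EXISTS stats_11;
CREATE TABLE stats_11 (
  stat_key VARCHAR(64) NOT NULL,
  user_id  INT NOT NULL,
  score    INT NOT NULL,
  `rank`   INT NOT NULL          -- RANK is reserved in MySQL 8.0, so quote it
);

-- Populate it (hypothetical statistic computed from a hypothetical table).
INSERT INTO stats_11 (stat_key, user_id, score, `rank`)
SELECT 'total_points',
       user_id,
       SUM(points),
       RANK() OVER (ORDER BY SUM(points) DESC)
FROM scores
GROUP BY user_id;

-- Add indexes after loading, then repoint the view.
ALTER TABLE stats_11 ADD INDEX idx_stat_rank (stat_key, `rank`);
CREATE OR REPLACE VIEW stats AS SELECT * FROM stats_11;
```

Readers keep querying stats the whole time; only the last statement changes what they see.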
If the task is to completely rebuild the table, this is the best:
CREATE TABLE new_stats LIKE stats;
-- ... fill up new_stats by whatever means ...
RENAME TABLE stats TO old_stats, new_stats TO stats;
DROP TABLE old_stats;
There is zero interference because table stats is always available and always has a complete set of rows. (OK, RENAME does take a minuscule amount of time.)
No VIEWs, no TEMPORARY table, no copying the data over, no need for 24 tables.
You could consider doing the task "continually", rather than hourly. This becomes especially beneficial if the table gets so big that the hourly cron job takes more than one hour!
I need to add at least one index to a column of type int(1) on an InnoDB table. There are about 3 million rows that it would need to index. This is a database on my production server, and it is in use by thousands of people every day. I tried to add the index the standard way, but it was taking too much time (I let it run for about 7 minutes before killing the process) and locking rows, which meant a frozen application for many users.
The VPS that runs all of this has 512 MB of RAM and an Intel Xeon E5504 processor.
How can I add an index to this production database without interrupting my users' experience?
Unless the table is either read-only or write-only (one or the other, not both), you'll probably need to take down the site: lock the database, run the operation, and wait.
If the table is write-only, redirect the writes to a temporary table and run the operation on the old table, then switch the writes back to the old table and insert the data from the temporary table.
If the table is read-only, duplicate the table and run the operation on the copy.
If the table is read/write, a messy alternative that might work is to create a new table with the indexes, set its primary key start point to the next value in the original table, change your read requests to select from both tables (for example with a UNION), and write exclusively to the new table. Then write a script that inserts rows from the old table into the new one and deletes them from the old table. It'll take far, far longer than the downtime would, and plenty can go wrong, but it should be doable.
you can set the start point of a primary key with
ALTER TABLE `my_table` AUTO_INCREMENT = X;
hope that helps.
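A rough sketch of the read/write alternative described above, with hypothetical names (my_table_new, the flag column, idx_flag) and a made-up next id of 3000001. It assumes id is an AUTO_INCREMENT primary key and that rows still in the old table are no longer modified once writes have moved to the new table.

```sql
-- New table with the index in place, numbering continues after the old table.
CREATE TABLE my_table_new LIKE my_table;
ALTER TABLE my_table_new ADD INDEX idx_flag (flag);
ALTER TABLE my_table_new AUTO_INCREMENT = 3000001;  -- next id of my_table

-- Reads have to look at both tables until the move is finished.
SELECT id, flag FROM my_table_new WHERE flag = 1
UNION ALL
SELECT id, flag FROM my_table WHERE flag = 1;

-- Move rows over in small batches; run repeatedly until my_table is empty.
-- Both statements pick the same 1000 lowest ids because nothing is
-- inserted into my_table any more.
START TRANSACTION;
INSERT INTO my_table_new SELECT * FROM my_table ORDER BY id LIMIT 1000;
DELETE FROM my_table ORDER BY id LIMIT 1000;
COMMIT;
```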
Take a look at pt-online-schema-change. I think this tool can be quite useful in your case. It will obviously put additional load on your database server, but it should not block access to the table for most of the operation.
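A different option from the tool above, offered as an assumption about the setup rather than something stated in the question: on MySQL 5.6 or later, InnoDB can usually build a secondary index while still allowing concurrent reads and writes. The table, column, and index names here are hypothetical.

```sql
-- Ask for an in-place build with no blocking of reads or writes.
-- If the server cannot honor LOCK=NONE for this change, the statement
-- fails right away instead of silently taking a stronger lock.
ALTER TABLE my_table
  ADD INDEX idx_flag (flag),
  ALGORITHM=INPLACE, LOCK=NONE;
```

The build still costs I/O and CPU, so on a small server it is worth running at a quiet time.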
I have a huge indexed MySQL table (~300-400 GB) to which I need to append new entries from time to time (the new data takes ~10-20 GB). The raw file with new data may contain mistakes that can only be fixed manually and are visible only when the processing script reaches them. Also, the new data should be available in the main DB only after the full processing of the raw data is finished. So, to avoid screwing up the main table, I decided on the following workflow:
The script creates temporary table with the structure identical to the main table and fills it.
Once it is done and verified, the temporary table is inserted into main one:
INSERT INTO main_table (all_fields_except_primary_key) SELECT all_fields_except_primary_key FROM new_table;
And this procedure is extremely slow, as I understand due to indexing new results.
I have read that inserting into indexed tables is very slow in general, and some professionals suggest dropping the indexes before inserting a large amount of data and then indexing again. But with this much data, reindexing the whole table takes very long (much longer than my naive INSERT INTO .. SELECT ..) and, more importantly, the main table can hardly be used while it runs (without indexes, SELECTs take ages).
So I had the idea of indexing the temporary table before inserting (since that is very fast) and then doing a merge that combines both indexes.
Is it possible somehow in MySQL?
And another question: is there perhaps another workaround for my task?
Just finished rewriting many queries as batch queries - no more DB calls inside of foreach loops!
One of these new batch queries, an INSERT IGNORE into a pivot table, is taking 1-4 seconds each time. It is fairly large (~100 rows per call) and the table also has more than 2 million rows.
This is the current bottleneck in my program. Should I consider something like locking the table (I've never done this before, but I have heard it is ... dangerous), or are there other options I should look at first?
As it is a pivot table, there is a unique key composed of both the columns I am inserting.
Are you using indexes? Indexing the correct columns speeds things up immensely. If you are doing a lot of updating and inserting, sometimes it makes sense to disable indexes until you are finished, since re-indexing takes time. I don't understand how locking the table would help. Is this table in use by other users or applications? That would be the main reason locking would increase speed.