I need to update about 100,000 records in a MySQL table (with indexes), so this process can take a long time. I'm searching for a solution that will work faster.
I have three solutions, but I have no time for speed tests.
Solutions:
a usual UPDATE for each new record in an array loop (bad performance)
using UPDATE syntax like here Update multiple rows with one query? - can't find any performance results
using LOAD DATA INFILE with the same value for the key field; I guess in this case it will perform an UPDATE instead of an INSERT, which should work faster
Do you know which solution is best?
The most important criterion is execution speed.
Thanks.
LOAD DATA INFILE is the fastest way to upsert a large amount of data from a file;
the second solution is not as bad as you might think, especially if you can execute something like
UPDATE table_name
SET field = value
WHERE id IN (list_of_ids);
but it would be better if you posted your actual UPDATE query.
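For the LOAD DATA INFILE route, a minimal sketch, assuming a hypothetical table my_table with primary key id and a comma-separated file; note that REPLACE deletes and re-inserts a row whose key already exists rather than updating it in place:

LOAD DATA INFILE '/tmp/updates.csv'
REPLACE
INTO TABLE my_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, field1, field2);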
So I have a MySQL database in my project.
I have a main table that holds the main stuff for updating and inserting.
I have huge traffic on the data. What I am doing is mainly reading a .csv file and inserting into the table.
Everything works fine for 3 days, but when the table goes above 20 million records the database starts responding slowly, and at 60 million it is even slower.
What have I done?
I have applied an index on the columns where I think I need it (the WHERE clause fields, for fast searching).
I don't think query optimisation is the issue, because the database works fine for 3 days and only slows down as the table fills up; as I reach 60 million rows it gets even slower.
Can you suggest an approach for handling this?
What should I do? Should I shift the data out every 3 days, or what? What have you done in such a situation?
The purpose of a database is to store huge amounts of information. I think the problem is not in your database itself; it is more likely poor queries, joins, database buffer settings, indexes, and cache. These are the usual reasons a response slows down. For more info check this link
"I have applied an index where I think I need it"
Yes, indexes improve the performance of SELECT queries, but at the same time they degrade your DML operations, because the index has to be restructured whenever you change an indexed column.
Now, this totally depends on your business need: whether you need the index or not, and whether you can compromise on SELECT or on DML performance.
Currently, many industries use two different schemas: OLAP for reporting and analytics, and OLTP to store real-time data (including some real-time reporting).
First of all, it would be helpful for us to know which kind of data you want to store.
Normally it makes no sense to store such a huge amount of data within 3 days, because no one will ever be able to use it effectively. So it is better to reduce the data before storing it in the database.
e.g.
If you get measurement values from a device that gives you one value per millisecond, you should consider whether any user will ever ask for a specific value at a specific millisecond, or whether it makes more sense to store the average value per second, minute, or hour, or perhaps per day.
If you really need the milliseconds, but only when the user takes a deeper look, you can create a table derived from the main table containing only the average values per hour or day (or whatever) and work with that table. Only when the user goes into the "milliseconds" view do you use the main table and have to live with the worse performance.
All of this is of course only possible if the database data is read-only. If the data in the database is changed from the application (and not only appended by the CSV import), then using more than one table will be error-prone.
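A minimal sketch of such an aggregate table, assuming a hypothetical raw table measurements(device_id, measured_at, value):

-- One averaged row per device per hour.
CREATE TABLE measurements_hourly (
    device_id  INT      NOT NULL,
    hour_start DATETIME NOT NULL,
    avg_value  DOUBLE   NOT NULL,
    PRIMARY KEY (device_id, hour_start)
);

-- Refill from the raw table, e.g. in a nightly job.
INSERT INTO measurements_hourly (device_id, hour_start, avg_value)
SELECT device_id,
       DATE_FORMAT(measured_at, '%Y-%m-%d %H:00:00'),
       AVG(value)
FROM measurements
GROUP BY device_id, DATE_FORMAT(measured_at, '%Y-%m-%d %H:00:00');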
Which operation do you want to speed up?
insert operation
A good way to speed it up is to insert records in batches. For example, insert 1000 records in each INSERT statement:
INSERT INTO test VALUES (value_list), (value_list), ..., (value_list);
other operations
If your table has tens of millions of records, everything will slow down. This is quite common.
To speed it up in this situation, here is some advice:
Optimize your table definition. It depends on your particular case. Creating indexes is a common way.
Optimize your SQL statements. Apparently a good SQL statement will run much faster, and a bad SQL statement might be a performance killer.
Data migration. If only part of your data is used frequently, you can shift the infrequently-used data to another big table.
Sharding. This is a more complicated approach, but it is commonly used in big-data systems.
For the .csv file, use LOAD DATA INFILE ...
Are you using InnoDB? How much RAM do you have? What is the value of innodb_buffer_pool_size? It may not be set correctly, judging by the queries slowing down as the data increases.
Let's see a slow query. And SHOW CREATE TABLE. Often a 'composite' index is needed. Or reformulation of the SELECT.
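To illustrate, a couple of statements of the kind meant here; the table and index column names are just placeholders:

-- Check how big the InnoDB buffer pool currently is (in bytes):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Show the table definition and existing indexes:
SHOW CREATE TABLE my_table;

-- Example of a 'composite' index covering two columns used in the WHERE clause:
ALTER TABLE my_table ADD INDEX idx_user_created (user_id, created_at);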
I've come across a situation where I need to select a huge amount of data (say 100k records that look like ID | {"points":"9","votes":"2","breakdown":"0,0,0,1,1"}), process it in PHP, and then put it back. The question is about putting it back efficiently. I saw a solution using INSERT ... ON DUPLICATE KEY UPDATE, and I saw a solution with UPDATE using CASE. Are there any other solutions? Which would be the most efficient way to update a huge data array?
The better choice is using a simple UPDATE.
When you try to put data in via the insert-and-handle-duplicate path, your DB does additional work: try to insert, verify constraints, raise the duplicate-key error, update the row, verify constraints again.
Update
I ran tests on my local PC for INSERT INTO ... ON DUPLICATE KEY UPDATE and UPDATE statements against a table with 43k rows.
The first approach works about 40% faster.
But both finished in under 1.5s. I suppose your PHP code will be the bottleneck of your approach, and you should not worry about the speed of the MySQL statements. Of course, that holds only if your table is not huge and does not have dozens of millions of rows.
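For reference, the two kinds of statement being compared look roughly like this, assuming a hypothetical table stats with primary key id and a text column data:

-- Variant 1: insert-or-update in one statement.
INSERT INTO stats (id, data)
VALUES (1, '{"points":"9","votes":"2"}'),
       (2, '{"points":"4","votes":"1"}')
ON DUPLICATE KEY UPDATE data = VALUES(data);

-- Variant 2: a plain multi-row UPDATE using CASE.
UPDATE stats
SET data = CASE id
             WHEN 1 THEN '{"points":"9","votes":"2"}'
             WHEN 2 THEN '{"points":"4","votes":"1"}'
           END
WHERE id IN (1, 2);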
Update 2
My local PC uses MySQL 5.6 in default configuration.
RAM: 8 GB
I have to read approximately 5000 rows of 6 columns max from an .xls file. I'm using PHP >= 5.3. I'll be using PHPExcel for that task. I haven't tried it, but I think it can handle it (if you have other options, they are welcome).
The issue is that every time I read a row, I need to query the database to verify whether that particular row exists. If it does, overwrite it; if not, add it.
I think that's going to take a lot of time and PHP will simply time out (I can't modify the timeout variable since it's a shared server).
Could you give me a hand with this?
Appreciate your help
Since you're using MySQL, all you have to do is insert data and not worry about a row being there at all.
Here's why and how:
If you query the database from PHP to verify that a row exists, that's bad. The reason it's bad is that you are prone to getting false results: there's a lag between PHP and MySQL, and PHP can't be used to verify data integrity. That's the job of the database.
To ensure there are no duplicate rows, we use UNIQUE constraints on our columns.
MySQL extends the SQL standard with the INSERT INTO ... ON DUPLICATE KEY UPDATE syntax. That lets you just insert data, and if there's a duplicate row, it is updated with the new data instead.
Reading 5000 rows is quick. Inserting 5000 is also quick if you wrap it in a transaction. I would suggest reading 100 rows from the Excel file, starting a transaction, and just inserting the data (using ON DUPLICATE KEY UPDATE to handle duplicates). That lets you spend one I/O of your hard drive to save 100 records. Doing so, you can finish the whole process in a few seconds, so you don't need to worry about performance or timeouts.
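A rough SQL sketch of that flow; the table, column, and key names are hypothetical:

-- Let the database itself detect duplicates:
ALTER TABLE products ADD UNIQUE KEY uq_sku (sku);

-- For each batch of ~100 rows read from the spreadsheet:
START TRANSACTION;
INSERT INTO products (sku, name, price)
VALUES ('A-001', 'Widget', 9.99),
       ('A-002', 'Gadget', 14.50)
ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price);
COMMIT;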
First, run this process via exec, so the timeout doesn't matter.
Second, select the existing rows before reading the Excel file. Don't select them in one query; read 2000 rows at a time, for example, and collect them into an array.
Third, use the .xlsx format and a chunked reader, which allows you to read the file in parts rather than all at once.
It's not a 100% guarantee, but I did the same.
I have a delete query that removes a huge chunk of data from a table. The query looks like this:
While loop
DELETE FROM table_name WHERE id=12345;
End loop
This query removes thousands of rows of data.
The loop is huge, and the amount of data deleted by the query is also very large. As a result the system becomes very slow. I need a way to optimize it.
Note: the storage engine is InnoDB.
Thanks
This reply is based on the assumption that the loop isn't responsible for the slow process.
There are several ways to speed up DELETE statements, some more practical than others. First you have to find out what exactly slows down the DELETE query: for example many indexes, a lot of data rows, maybe slow hardware, or just a poor connection to the server.
One way would be to use the QUICK option, which handles deletion in the index files differently. Check out the details in the MySQL manual.
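For reference, the modifier goes right after DELETE (hypothetical table name); note that QUICK skips merging index leaves and mainly benefits MyISAM tables, so its effect on InnoDB is limited:

DELETE QUICK FROM table_name WHERE id = 12345;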
Another way would be:
Copy your table,
Delete your data in the copy,
Drop the original table,
Rename the processed copy to the original table name.
That way you separate the deleting from your working table. Of course, this shouldn't be done if the table in question is under a lot of traffic, or uses foreign keys and such.
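A rough sketch of those steps, with hypothetical names:

CREATE TABLE table_name_copy LIKE table_name;  -- same structure and indexes
INSERT INTO table_name_copy SELECT * FROM table_name;
DELETE FROM table_name_copy WHERE id IN (/* ids to remove */);
DROP TABLE table_name;
RENAME TABLE table_name_copy TO table_name;

A common refinement is to copy only the rows you want to keep into the new table, which skips the DELETE step entirely.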
I have an "item" table with the following structure:
item
id
name
etc.
Users can put items from this item table into their inventory. I store it in the inventory table like this:
inventory
id
item_id
user_id
Is it OK to insert 1000 rows into the inventory table? What is the best way to insert 1000 rows?
MySQL can handle millions of records in a single table without any tweaks. With a few tweaks it can handle hundreds of millions (I have done that). So I wouldn't worry about that.
To improve insert performance you should use batch inserts:
INSERT INTO my_table (col1, col2) VALUES (val1_1, val2_1), (val1_2, val2_2);
Storing the records to a file and using LOAD DATA INFILE yields even better results (the best in my case), but it requires more effort.
It's okay to insert 1000 rows. You should do this as a transaction so the indices are updated all at once at the commit.
You can also construct a single INSERT statement to insert many rows at a time. (See the syntax for INSERT.) However, I wonder how advisable it would be to do that for 1,000 rows.
The most efficient would probably be to use LOAD DATA INFILE or LOAD XML.
When it gets into the thousands, I usually write to a pipe-delimited CSV file and use LOAD DATA INFILE to pull it in quickly. By writing to disk, you avoid issues with overflowing your string buffer if the language you are using has limits on string size. LOAD DATA INFILE is optimized for bulk uploads.
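For example, assuming the inventory table from the question and a file with one id|item_id|user_id triple per line:

LOAD DATA INFILE '/tmp/inventory.txt'
INTO TABLE inventory
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(id, item_id, user_id);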
I've done this with up to 1 billion rows (on a cheap $400 4GB 3 year old 32-bit Ubuntu box), so one thousand is not an issue.
Added note: if you don't care about the id assigned and you just want a new unique ID for every record you insert, you could consider setting up AUTO_INCREMENT on id in the table and let MySQL assign an ID for you.
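A sketch of that table definition for the inventory table from the question (the column types are assumptions):

CREATE TABLE inventory (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    item_id INT UNSIGNED NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (id)
);

-- Then omit id on insert and MySQL assigns it:
INSERT INTO inventory (item_id, user_id) VALUES (42, 7);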
It also depends on how many users you have: if you have 1,000,000 users all doing 1,000 inserts every few minutes, then the server is going to struggle to keep up. From a MySQL point of view, it is certainly capable of handling that much data.