I have a project assignment that needs big data, so I decided to test MySQL query performance on a large dataset. I want to multiply the rows of one table in the database. I've tried this before, but the process took a very long time.
First I tried INSERT INTO the table itself, which was a long process.
Second, I tried a different way and used mysqlimport to import 1 GB of data, which took about 1.5 hours.
So, if I want to enlarge a MySQL table, do you have any suggestions?
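For context, the "insert the table into itself" approach mentioned above usually looks something like this (a sketch; `big_table` and its columns are placeholders, not from the original post):

```sql
-- Doubles the row count on each run. The AUTO_INCREMENT id column
-- is assumed and deliberately omitted from the column list so new
-- rows get fresh keys instead of duplicate-key errors.
INSERT INTO big_table (col_a, col_b)
SELECT col_a, col_b FROM big_table;
```

Each run doubles the table, so repeated runs grow it exponentially, but every pass rewrites indexes for all new rows, which is why it gets slow.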
Though this question could be flagged as "not constructive", I will still suggest something.
If your objective is "only to make the table large", as per your comment, why take all the trouble of inserting duplicates or using mysqlimport? Instead, search for and download free large sample databases and play around with those:
https://launchpad.net/test-db/+download
http://dev.mysql.com/doc/index-other.html
http://www.ozerov.de/bigdump/
If a particular table structure is explicitly needed, then run some DDL queries (ALTER TABLE) to shape the downloaded tables to your needs.
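A minimal sketch of reshaping a downloaded sample table this way (the table and column names below are illustrative, not from any specific sample database):

```sql
-- Reshape a downloaded sample table to match the structure you need.
ALTER TABLE employees
    ADD COLUMN is_active TINYINT NOT NULL DEFAULT 0,
    DROP COLUMN birth_date,
    MODIFY COLUMN last_name VARCHAR(100) NOT NULL;

-- Optionally rename it to match your own schema.
RENAME TABLE employees TO my_test_table;
```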
Related
I have a backup table that - thanks to poorly planned management by a programmer who is bad at math - has 3.5 billion records in it. The data drive is nearly full, performance is suffering. I need to clean out this table but just a simple SELECT COUNT(1) statement takes 30 minutes or more to return.
There is no primary key on the table.
The database uses SIMPLE logging. There's only 25 GB left on the drive, so I need to be mindful that whatever I do leaves space for the database to keep functioning for everyone else. I'm waiting for confirmation as I type this, but I don't think I need to keep any of the data that's in there now.
On a table with that many records, would TRUNCATE TABLE grind the system to a halt?
Also looking into the solutions proposed here: How to delete large data of table in SQL without log?
The idea is to allow my clients to keep working while I'm doing all this.
TRUNCATE TABLE would work if you no longer need the records. It will not reduce the size of the database, though; if you need that, you would also have to shrink the data file.
If you would rather delete the records, Aaron Bertrand has good examples and test results here: https://sqlperformance.com/2013/03/io-subsystem/chunk-deletes
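A minimal sketch of that truncate-then-shrink sequence in T-SQL (the table and logical file names are placeholders, not from the original question):

```sql
-- TRUNCATE deallocates pages and is minimally logged, so it stays fast
-- even on billions of rows and is safe under the SIMPLE recovery model.
TRUNCATE TABLE dbo.BackupTable;

-- Optionally return the freed space to the OS. Shrinking can fragment
-- indexes, so only do this if you actually need the disk space back.
DBCC SHRINKFILE (N'MyDatabase_Data', 25000);  -- target size in MB
```

TRUNCATE takes a schema-modification lock on the table but does not scan the rows, so it should not grind the rest of the system to a halt the way a logged DELETE would.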
As the heading above suggests, what I did was not the normal way of altering a table to add one column.
We are using MySQL 5.5, and Toad for database operations. There is one table, appln_doc, which has nearly 300K records storing applicants' documents, such as images, some of them very large.
Now we have to add a new column is_signature of type tinyint. We tried three different ways:
Using Toad's built-in provision: double-click on the table, and in the window that appears, under the Columns tab, add a new column with name, type, and size, then click the Alter button.
Using an ALTER TABLE query in Toad itself.
Logging into MySQL via PuTTY and executing the same query.
All three attempts led to the same problem: "Lock wait timeout exceeded; try restarting transaction" on a 'stuck' MySQL table. So we tried to kill the waiting process and alter the table again, but the result was the same.
We killed the process again, restarted the MySQL server, and tried once more to add the column; the problem persisted.
Finally, we exported all the table data to an Excel sheet and truncated the table. After that, adding the column succeeded. We then added an is_signature column to the exported sheet, with 0 as the default value for every row, and imported the data back into the table. That's why I said I didn't add the column in the normal way.
Has anybody faced a situation like this and found a better solution? Can anybody tell me why this is happening? Is it because of the bulk and size of the data stored in that table?
PS: The table appln_doc has a child table appln_doc_details with no data. Only appln_doc has this problem when being altered.
At the end of the day, no matter what tools you use, it all boils down to one of two scenarios:
An alter table statement
Create new table/copy data/delete old table/rename new table.
The first is generally faster; the second is generally more flexible (some table changes cannot be done with ALTER).
Either way, handling this much data just takes a lot of time, and there's nothing you can really do about it.
On the bright side, almost all timeouts are configurable somewhere. I don't know offhand how to configure this particular one, but I'm 99% sure you can. Find out how and increase it to something big enough. For 300K records I'd expect the operation to take around 10 minutes or less, but of course it depends. Set it to 10 hours. :)
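For what it's worth, the timeout in that error message is InnoDB's lock wait timeout, which is dynamic in MySQL 5.5 and can be raised per session (a sketch; 36000 seconds is the 10 hours suggested above):

```sql
-- Check the current value (the default is 50 seconds).
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Raise it for this session only, then run the ALTER.
SET SESSION innodb_lock_wait_timeout = 36000;
ALTER TABLE appln_doc ADD COLUMN is_signature TINYINT NOT NULL DEFAULT 0;
```

Note that a long lock wait usually means some other open transaction is still holding a lock on the table, so it is also worth checking SHOW PROCESSLIST for idle connections before retrying.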
So I have a MySQL database in my project.
I have a main table that takes the main updates and inserts.
There is huge data traffic on it. What I'm mainly doing is reading a .csv file and inserting the rows into the table.
Everything works fine for 3 days, but once the table grows past 20 million records the database starts responding slowly, and at 60 million it is slower still.
What have I done?
I have applied indexes where I think they are needed (the WHERE-clause fields, for fast searching).
I don't think query optimisation is the issue, because the database works fine for 3 days and only slows down as the table fills up, getting slower still as I reach 60 million records.
Can you suggest an approach for handling this?
What should I do? Should I move the data out every 3 days, or what? What have you done in such a situation?
The purpose of a database is to store huge amounts of information. I think the problem is not your database itself; more likely it is poor queries, joins, the database buffer, indexes, or caching. These are the usual reasons responses slow down. For more info check this link
I have applied index in the record where i think i need of it
Yes, an index improves the performance of SELECT queries, but at the same time it degrades your DML operations, since the index has to be restructured whenever you change an indexed column.
Now, this totally depends on your business need: whether you need the index or not, and whether you can compromise on SELECT or on DML performance.
Currently, many industries use two different schemas: OLAP for reporting and analytics, and OLTP for storing real-time data (including some real-time reporting).
First of all, it would be helpful for us to know which kind of data you want to store.
Normally it makes no sense to store such a huge amount of data every 3 days, because no one will ever be able to use it effectively. So it is better to reduce the data before storing it in the database.
e.g.
If you get measured values from a device that gives you one value per millisecond, you should ask whether any user will ever look for a specific value at a specific millisecond, or whether it makes more sense to store the average value per second, minute, hour, or even per day.
If you really need the milliseconds, but only when the user takes a deeper look, you can create a table derived from the main table with only the average values per hour or day (or whatever) and work with that table. Only when the user drills into the "milliseconds" view do you use the main table, and you live with the worse performance there.
All of this is of course only possible if the database data is read-only. If the data in the database is changed by the application (rather than only appended by the CSV import), then using more than one table will be error-prone.
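A sketch of such an hourly rollup table (all table and column names here are placeholders, since the question doesn't give a schema):

```sql
-- Pre-aggregated table that the UI queries by default; the raw
-- millisecond table is only touched in the drill-down view.
CREATE TABLE measurement_hourly (
    device_id  INT      NOT NULL,
    hour_start DATETIME NOT NULL,
    avg_value  DOUBLE   NOT NULL,
    PRIMARY KEY (device_id, hour_start)
);

INSERT INTO measurement_hourly (device_id, hour_start, avg_value)
SELECT device_id,
       DATE_FORMAT(measured_at, '%Y-%m-%d %H:00:00'),
       AVG(value)
FROM measurement_raw
GROUP BY device_id, DATE_FORMAT(measured_at, '%Y-%m-%d %H:00:00');
```

The rollup is roughly 3.6 million times smaller than millisecond-resolution raw data, so queries against it stay fast regardless of how the raw table grows.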
Which operation do you want to speed up?
insert operation
A good way to speed it up is to insert records in batches. For example, insert 1000 records with each INSERT statement:
INSERT INTO test VALUES (value_list),(value_list),...,(value_list);
other operations
If your table has tens of millions of records, everything will slow down. This is quite common.
To speed it up in this situation, here is some advice:
Optimize your table definition. This depends on your particular case; creating indexes is a common approach.
Optimize your SQL statements. A well-written SQL statement runs much faster, and a badly written one can be a performance killer.
Data migration. If only part of your data is used frequently, you can move the infrequently used data to another, bigger table.
Sharding. This is more complicated, but it is commonly used in big-data systems.
For the .csv file, use LOAD DATA INFILE ...
Are you using InnoDB? How much RAM do you have? What is the value of innodb_buffer_pool_size? It may not be set right, based on queries slowing down as the data grows.
Let's see a slow query, and SHOW CREATE TABLE. Often a 'composite' index is needed, or a reformulation of the SELECT.
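For the CSV import mentioned above, a hedged sketch of LOAD DATA INFILE (the file path, table, and column names are placeholders):

```sql
-- Bulk-loads the whole CSV in one statement; typically far faster than
-- row-by-row INSERTs because parsing and index maintenance are batched.
LOAD DATA INFILE '/tmp/traffic.csv'
INTO TABLE main_table
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES          -- skip the header row, if the file has one
(col_a, col_b, col_c);
```

Note that the file must be readable by the MySQL server process, and the server's secure_file_priv setting may restrict which directories it will read from.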
I have a system that a client designed and the table was originally not supposed to get larger than 10 gigs (maybe 10 million rows) over a few years. Well, they've imported a lot more information than they were thinking and within a month, the table is now up to 208 gigs (900 million rows).
I have very little experience with MySQL and a lot more with Microsoft SQL Server. Is there anything in MySQL that would allow the client to have the database span multiple files, so queries wouldn't have to scan the entire table and index? There is a field on the table that could easily be split on, but I wasn't sure how to do this.
The main issue I'm trying to solve is a retrieval query from this table. Inserts aren't a big deal at all, since they're all done by a back-end service. I have a test system where the table is about 2 gigs (6 million rows) and my query takes less than a second. When the same query is run on the production system, it takes 20 seconds. I have a feeling the query itself is fine; it's just the size of the table that's causing the issue. There is an index on this table created specifically for this query, and according to EXPLAIN, it is being used.
If you have any other suggestions/questions, please feel free to ask.
Use partitioning, and in particular the parts of CREATE TABLE that set DATA DIRECTORY and INDEX DIRECTORY.
With these options you can put partitions on separate drives if needed. Usually, though, it's enough to partition on a key you can use in every query, usually time.
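A sketch of range partitioning on a time column (the table, columns, and partition boundaries are placeholders; note that in MySQL the partitioning column must be part of every unique key, including the primary key):

```sql
-- Queries that filter on created_at only touch the matching partitions
-- (partition pruning), so they never scan the whole table.
CREATE TABLE events (
    id         BIGINT       NOT NULL,
    created_at DATE         NOT NULL,
    payload    VARCHAR(255),
    PRIMARY KEY (id, created_at)   -- partition key must be in the PK
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION p2012 VALUES LESS THAN (2013),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

DATA DIRECTORY and INDEX DIRECTORY options can be added per partition to spread them across drives, provided the server's configuration permits it.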
In addition to partitioning, which has already been mentioned, you might also want to run the tuning-primer script to make sure your MySQL configuration is optimal.
I am inserting part of a large table into a new MyISAM table. I tried both the command line and phpMyAdmin, and both take a long time. But I found that the table file in the MySQL data folder actually contains gigabytes of data, while phpMyAdmin shows no records. Then I ran CHECK TABLE on it, and that takes forever...
What is wrong here? Should I switch to InnoDB?
Do you have indices defined on your table? If you're mainly interested in inserting a lot of data quickly, you could consider dropping the indices, doing the insert, and then re-adding them. It won't be faster overall (in fact the manual intervention will likely make the whole operation slower), but it gives you more direct visibility into how long the data insertion takes versus the index build that follows.
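A sketch of that drop/insert/re-add sequence (table, index, and column names are placeholders; for MyISAM the same effect is available via ALTER TABLE ... DISABLE KEYS / ENABLE KEYS):

```sql
-- Drop the index so each inserted row skips index maintenance.
ALTER TABLE new_table DROP INDEX idx_col_a;

-- Copy the subset of rows from the large source table.
INSERT INTO new_table (col_a, col_b)
SELECT col_a, col_b FROM big_table WHERE col_b > 100;

-- Rebuild the index in one pass at the end.
ALTER TABLE new_table ADD INDEX idx_col_a (col_a);
```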