MySQL insert query cannot finish

I am inserting part of a large table into a new MyISAM table. I tried both the command line and phpMyAdmin, and both take a long time. But in the MySQL data folder I can see the table file already has GBs of data, while phpMyAdmin shows the table has no records. Then I run CHECK TABLE on it, and that takes forever too...
What is wrong here? Should I change to innoDB?

Do you have indexes defined on your table? If you're mostly interested in inserting a lot of data quickly, you could consider dropping the indexes, doing the insert, and then re-adding the indexes. It won't be any faster overall (in fact the manual intervention will likely make the whole operation slower), but it gives you more direct visibility into how long the data insertion takes versus the indexing that follows.
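A minimal sketch of that approach, assuming a hypothetical MyISAM target table new_table with a secondary index idx_col on column col, filled from a hypothetical source big_table:

ALTER TABLE new_table DROP INDEX idx_col;        -- drop the secondary index first
INSERT INTO new_table SELECT id, col, created_at FROM big_table WHERE created_at >= '2012-01-01';
ALTER TABLE new_table ADD INDEX idx_col (col);   -- rebuild the index in one pass at the end

For MyISAM specifically, ALTER TABLE new_table DISABLE KEYS; before the insert and ALTER TABLE new_table ENABLE KEYS; afterwards achieves much the same thing without dropping the index definitions.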

Related

How to add a column normally into a mysql table with more than 100K records?

As the word "normally" in the heading suggests, what I did is not the normal way of altering a table to add a column.
We are using MySQL 5.5, and Toad for working with the database. There is one table, appln_doc, with nearly 300K records storing applicants' documents, such as images, and those are quite large.
Now we have to add a new column is_signature of type tinyint. We tried three different ways:
Using Toad's built-in provision: double-click the table, and in the window that opens add a new column with name, type and size under the Columns tab, then click the Alter button.
Using an ALTER TABLE query in Toad itself.
Logging into MySQL via PuTTY and executing the same ALTER TABLE query.
All three attempts led to the same problem: "Lock wait timeout exceeded; try restarting transaction" on a 'stuck' MySQL table. So we tried to kill the waiting process and alter the table again; the result was still the same.
We killed the process again, restarted the MySQL server, and tried once more to add the column; the problem remained.
Lastly we exported all the table data to an Excel sheet and truncated the table. After that, adding the column succeeded. We then added an is_signature column to the exported sheet with 0 as the default value for every row, and imported the data back into the table. That's why I said that I didn't add the column in a normal way.
So has anybody faced a situation like this and found a better solution than this? Can anybody tell me why this happens? Is it because of the bulk and size of the data stored in that table?
PS: The table appln_doc has a child table appln_doc_details with no data. Only this table has the problem while altering.
At the end of the day, no matter what tools you use, it all boils down to one of two scenarios:
An alter table statement
Create new table/copy data/delete old table/rename new table.
The first one is generally faster; the second is generally more flexible (there are some changes to a table that cannot be made with a plain ALTER).
Either way, handling this much data just takes a lot of time, and there's nothing you can really do about it.
On the bright side, almost all timeouts are configurable somewhere. I don't know how to configure this particular one, but I'm 99% sure that you can. Find out how and increase it to be big enough. For 300K records, I think that the operation will take around 10 minutes or less, but of course it depends. Set it to 10 hours. :)
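The timeout behind that error message is most likely innodb_lock_wait_timeout (or lock_wait_timeout if the ALTER is waiting on a metadata lock). A hedged sketch, reusing the table and column names from the question:

SET SESSION innodb_lock_wait_timeout = 36000;   -- value in seconds; raised only for this session
ALTER TABLE appln_doc ADD COLUMN is_signature TINYINT NOT NULL DEFAULT 0;

If the ALTER still waits forever, look at SHOW FULL PROCESSLIST (and SHOW ENGINE INNODB STATUS) for the transaction that is holding a lock on appln_doc and deal with that transaction first.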

How can I improve MySQL database performance?

So I have a MySQL database in my project.
I have a main table that holds the main stuff for updating and inserting.
There is huge traffic on this data. What I am mainly doing is reading a .csv file and inserting its rows into the table.
Everything works fine for 3 days, but when the table goes above 20 million records the database starts responding slowly, and at 60 million it is even slower.
What have I done?
I have applied indexes to the columns where I think I need them (the WHERE clause fields, for fast searching).
I don't think query optimisation is the issue, because the database works fine for 3 days and only slows down as the table fills up; by the time I reach 60 million records it is slower still.
Can you suggest an approach for handling this?
What should I do? Should I move the data out every 3 days, or what? What have you done in such a situation?
The purpose of a database is to store huge amounts of information. I don't think the problem is your database itself; it is more likely poor queries, joins, buffer configuration, indexing or caching. These are the usual reasons a response slows down.
"I have applied indexes to the columns where I think I need them"
Yes, an index improves the performance of SELECT queries, but at the same time it degrades your DML operations, since the index has to be restructured whenever you change an indexed column.
Now, this depends entirely on your business need: whether you need the index or not, and whether you can compromise on SELECT or on DML performance.
Currently, many industries use two different schemas: OLAP for reporting and analytics, and OLTP to store real-time data (including some real-time reporting).
First of all, it would help us to know what kind of data you want to store.
Normally it makes no sense to store such a huge amount of data every 3 days, because no one will ever be able to use it in an effective way. So it is better to reduce the data before storing it in the database.
e.g.
If you get measured values from a device that delivers one value per millisecond, ask yourself whether any user will ever look up a specific value at a specific millisecond, or whether it makes more sense to store the average value per second, minute, hour, or perhaps per day.
If you really need the milliseconds, but only when the user takes a deeper look, you can create a table derived from the main table that contains only the average values per hour or day (or whatever) and work with that table. Only when the user goes into the "milliseconds" view do you use the main table and live with the worse performance.
All of this is of course only possible if the database data is read-only. If the data in the database is changed by the application (and not only appended to by the CSV import), then using more than one table will be error-prone.
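A minimal sketch of that rollup idea, assuming a hypothetical raw table measurements(device_id, measured_at, value):

-- summary table with one row per device and hour
CREATE TABLE measurements_hourly (
  device_id INT NOT NULL,
  hour_start DATETIME NOT NULL,
  avg_value DOUBLE NOT NULL,
  PRIMARY KEY (device_id, hour_start)
);

-- (re)build it from the raw data, or restrict the SELECT to the newly imported range
INSERT INTO measurements_hourly (device_id, hour_start, avg_value)
SELECT device_id,
       DATE_FORMAT(measured_at, '%Y-%m-%d %H:00:00'),
       AVG(value)
FROM measurements
GROUP BY device_id, DATE_FORMAT(measured_at, '%Y-%m-%d %H:00:00');

Dashboards and reports then read measurements_hourly, and only the drill-down view touches the raw measurements table.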
Which operation do you want to speed up?
insert operation
A good way to speed it up is to insert records in batches. For example, insert 1000 records in each INSERT statement:
insert into test values (value_list),(value_list)...(value_list);
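A concrete (hypothetical) instance of that pattern, assuming a table test(id, val):

INSERT INTO test (id, val) VALUES (1, 'a'), (2, 'b'), (3, 'c');  -- one round trip, many rows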
other operations
If your table has tens of millions of records, everything will slow down. This is quite common.
To speed it up in this situation, here is some advice:
Optimize your table definition. It depends on your particular case. Creating indexes is a common way.
Optimize your SQL statements. Apparently a good SQL statement will run much faster, and a bad SQL statement might be a performance killer.
Data migration. If only part of your data is used frequently, you can shift the infrequently-used data to another big table (a sketch follows this list).
Sharding. This is a more complicated way, but usually used in big data system.
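A rough sketch of the data-migration item above, assuming a hypothetical main table events with an indexed created_at column and an archive table events_archive with the same structure:

-- move rows older than 90 days into the archive table
INSERT INTO events_archive
SELECT * FROM events
WHERE created_at < NOW() - INTERVAL 90 DAY;

DELETE FROM events
WHERE created_at < NOW() - INTERVAL 90 DAY;

Run in a scheduled job, this keeps the hot table small while the history stays queryable in the archive table.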
For the .csv file, use LOAD DATA INFILE ...
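A hedged example of such a load, assuming a hypothetical file /tmp/import.csv whose columns match the table:

LOAD DATA INFILE '/tmp/import.csv'
INTO TABLE test
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;   -- skip the header row

This is usually far faster than generating individual INSERT statements from the file.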
Are you using InnoDB? How much RAM do you have? What is the value of innodb_buffer_pool_size? That may not be set right -- based on queries slowing down as the data increases.
Let's see a slow query. And SHOW CREATE TABLE. Often a 'composite' index is needed. Or reformulation of the SELECT.
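To inspect the setting mentioned above (the 12G figure is only an illustrative value; a common starting point on a dedicated MySQL server is roughly 70% of RAM):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- then, in my.cnf:
-- innodb_buffer_pool_size = 12G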

Converting a big MyISAM to InnoDB

I'm trying to convert a 10million rows MySQL MyISAM table into InnoDB.
I tried ALTER TABLE but that made my server get stuck, so I killed the MySQL process manually. What is the recommended way to do this?
Options I've thought about:
1. Making a new table which is InnoDB and inserting parts of the data each time.
2. Dumping the table into a text file and then doing LOAD FILE
3. Trying again and just keeping the server non-responsive until it finishes (I tried for 2 hours, and the server is a production server, so I prefer to keep it running)
4. Duplicating the table, Removing its indexes, then converting, and then adding indexes
Changing the engine of the table requires a rewrite of the table, and that's why the table is unavailable for so long. Removing indexes, then converting, then adding indexes back may speed up the initial conversion, but adding an index creates a read lock on your table, so the effect in the end will be the same.

Making a new table and transferring the data is the way to go. Usually this is done in two parts: first copy the records, then replay any changes that were made while the records were being copied. If you can afford to disable inserts/updates on the table while leaving reads enabled, this is not a problem. If not, there are several possible solutions. One of them is to use Facebook's online schema change tool. Another option is to have the application write to both tables while you migrate the records, then switch over to the new table only. This depends on the application code, and the crucial part is handling unique keys / duplicates, since in the old table you may be updating a record while in the new one you need to insert it (here the transaction isolation level may also play a crucial role; lower it as much as you can).

The "classic" way is to use replication, which, as far as I know, is also done in two parts: you record the master position, import a dump of the database into the second server, then start it as a slave to catch up with the changes.
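A minimal sketch of the copy approach, with hypothetical table and column names (and assuming writes can be paused briefly for the final swap):

-- create an empty copy of the table and switch it to InnoDB while it is still empty
CREATE TABLE mytable_innodb LIKE mytable;
ALTER TABLE mytable_innodb ENGINE=InnoDB;

-- copy the rows in primary-key ranges to keep each chunk small
INSERT INTO mytable_innodb SELECT * FROM mytable WHERE id BETWEEN 1 AND 1000000;
INSERT INTO mytable_innodb SELECT * FROM mytable WHERE id BETWEEN 1000001 AND 2000000;
-- repeat for the remaining ranges, then replay any rows changed in the meantime

-- atomically swap the tables once they are in sync
RENAME TABLE mytable TO mytable_myisam, mytable_innodb TO mytable;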
Have you tried to order your data first by the PK ? e.g:
ALTER TABLE tablename ORDER BY PK_column;
should speed up the conversion.

Generating a massive 150M-row MySQL table

I have a C program that mines a huge data source (20GB of raw text) and generates loads of INSERTs to execute on simple blank table (4 integer columns with 1 primary key). Setup as a MEMORY table, the entire task completes in 8 hours. After finishing, about 150 million rows exist in the table. Eight hours is a completely-decent number for me. This is a one-time deal.
The problem comes when trying to convert the MEMORY table back into MyISAM so that (A) I'll have the memory freed up for other processes and (B) the data won't be killed when I restart the computer.
ALTER TABLE memtable ENGINE = MyISAM
I've let this ALTER TABLE query run for over two days now, and it's not done. I've now killed it.
If I create the table initially as MyISAM, the write speed seems terribly poor (especially because the query requires the ON DUPLICATE KEY UPDATE technique). I can't temporarily turn off the keys: the table would become over 1,000 times larger if I did, and then I'd have to reprocess the keys and essentially run a GROUP BY on 150,000,000,000 rows. Umm, no.
One of the key constraints to realize: The INSERT query UPDATEs records if the primary key (a hash) exists in the table already.
At the very beginning of an attempt at strictly using MyISAM, I'm getting a rough speed of 1,250 rows per second. Once the index grows, I imagine this rate will tank even more.
I have 16GB of memory installed in the machine. What's the best way to generate a massive table that ultimately ends up as an on-disk, indexed MyISAM table?
Clarification: There are many, many UPDATEs going on from the query (INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever). This isn't, by any means, a raw dump problem. My reasoning for trying a MEMORY table in the first place was for speeding-up all the index lookups and table-changes that occur for every INSERT.
If you intend to make it a MyISAM table, why are you creating it in memory in the first place? If it's only for speed, I think the conversion to a MyISAM table is going to negate any speed improvement you get by creating it in memory to start with.
You say inserting directly into an "on disk" table is too slow (though I'm not sure how you're deciding that, given that your current method is taking days). You may be able to turn off or remove the uniqueness constraints, do the insert, and then use a DELETE query later to re-establish uniqueness before re-enabling/re-adding the constraints. I have used this technique when importing into an InnoDB table in the past, and found that even with the later delete it was much faster overall.
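A minimal sketch of that de-duplication step, assuming the rows are loaded into a constraint-free table staging with an auto-increment column id and a would-be-unique column hash_key (note this keeps the first-loaded row per key; if duplicates need to be summed, as in the question, aggregate with GROUP BY into a second table instead):

-- delete every duplicate except the row with the lowest id for each key
DELETE t1 FROM staging t1
JOIN staging t2
  ON t1.hash_key = t2.hash_key
 AND t1.id > t2.id;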
Another option might be to create a CSV file instead of the INSERT statements, and either load it into the table using LOAD DATA INFILE (I believe that is faster than individual inserts, but I can't find a reference at present) or use it directly via the CSV storage engine, depending on your needs.
Sorry to keep throwing comments at you (last one, probably).
I just found this article, which provides an example of converting a large table from MyISAM to InnoDB. While that isn't what you are doing, the author uses an intermediate MEMORY table and describes going from MEMORY to InnoDB in an efficient way: ordering the table in memory the way that InnoDB expects it to be ordered in the end. If you aren't tied to MyISAM it might be worth a look, since you already have a "correct" MEMORY table built.
I don't use MySQL but I use SQL Server, and this is the process I use to handle a file of similar size. First I dump the file into a staging table that has no constraints. Then I identify and delete the dupes from the staging table. Then I look for existing records that might match and put the id field into a column in the staging table. Then I update where the id column is not null and insert where it is null. One of the reasons I do all the work of getting rid of the dupes in the staging table is that it means less impact on the prod table when I run the load, and thus it is faster in the end. My whole process runs in less than an hour (and actually does much more than I describe, as I also have to denormalize and clean the data) and affects production tables for less than 15 minutes of that time. I don't have to worry about adjusting any constraints or dropping indexes or any of that, since I do most of my processing before I hit the prod table.
Consider whether a similar process might work better for you. Also, could you use some sort of bulk import to get the raw data into the staging table (I pull the 22 GB file I have into staging in around 16 minutes) instead of working row by row?
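A rough MySQL adaptation of that staging idea, with hypothetical names (the staging table is deliberately keyless so the raw load is fast, and the final merge reuses the ON DUPLICATE KEY UPDATE behaviour the question already relies on):

-- keyless staging table, filled by LOAD DATA INFILE or plain batched inserts
CREATE TABLE staging (
  hash_key BIGINT NOT NULL,
  val INT NOT NULL
) ENGINE=MyISAM;

-- collapse duplicates in one pass, then merge into the indexed production table
INSERT INTO prod (hash_key, val)
SELECT hash_key, SUM(val)
FROM staging
GROUP BY hash_key
ON DUPLICATE KEY UPDATE val = val + VALUES(val);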

when to use OPTIMIZE in mysql

I have a database full of time-sensitive data, so on a daily basis I truncate the table and then import the new data (from a merge of other databases) into the truncated table.
Currently I am running OPTIMIZE on the table after I have imported the daily refresh of data.
However, looking at the mysql OPTIMIZE syntax page
http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
it says I can optimize to reclaim unused space and defrag the data.
So should I be running OPTIMIZE twice?
Once when I delete the data, and then again after I've reinserted it?
or just once?
and if just once, should it be after loading the new data?
or after clearing out the old?
It may depend on whether you are using MyISAM or InnoDB tables, but I would run the OPTIMIZE after truncating the table. That ensures the space is reclaimed, and it will run very quickly on an empty table.
When you insert your batch of data it should all be inserted in order and not be fragmented anyway, and since it's a fresh insert there will be no space to reclaim. If it's a small dataset it may not matter too much, but on a large dataset running OPTIMIZE after the insert could also be quite slow.
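A minimal sketch of that daily ordering, assuming a hypothetical table daily_data and a CSV export of the merged databases:

TRUNCATE TABLE daily_data;
OPTIMIZE TABLE daily_data;                 -- reclaim space while the table is empty; this is cheap
LOAD DATA INFILE '/tmp/daily_merge.csv'    -- reload the fresh data, which arrives unfragmented
INTO TABLE daily_data
FIELDS TERMINATED BY ',';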
Just once is fine, after you've imported the new data.
After deleting or updating a set of data in your database, you can use the OPTIMIZE TABLE command to remove the fragmented space.
There is no need to run the OPTIMIZE command twice; once all the DML is done, run OPTIMIZE a single time.