Importing a table with phpMyAdmin takes too long - MySQL

I've imported a MySQL table stored in a .sql file using my localhost phpMyAdmin. The table has 14,000 records (simple data, 5 fields only) and it took almost 10 minutes. Is this normal? I'm running a laptop with Win8, Core i7 quad, and my XAMPP seems to be configured properly.
Thanks

Your hard drive is the limit in this case. Having a single INSERT per row means your insert rate is bounded by your hard drive's IOPS (I/O operations per second).
Bulk inserting reduces the IOPS needed but increases the MB/s transferred, which is what you want in this case.
So rewriting the statements like
INSERT INTO table VALUES (1,2,3,4),(1,2,3,4)
with comma-separated rows will give a huge boost.
Putting in a hard drive with higher IOPS will also speed things up if the rewritten query is still slow.
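As a rough sketch of the rewrite (table and column names here are invented for illustration, not taken from the original dump), with the statements also wrapped in a single transaction so an InnoDB table only pays for one flush to disk:

-- original dump: one INSERT per row, each paying its own I/O round trip
INSERT INTO records (id, a, b, c, d) VALUES (1, 'x1', 'y1', 'z1', 'w1');
INSERT INTO records (id, a, b, c, d) VALUES (2, 'x2', 'y2', 'z2', 'w2');

-- rewritten: one multi-row INSERT inside one transaction
START TRANSACTION;
INSERT INTO records (id, a, b, c, d) VALUES
  (1, 'x1', 'y1', 'z1', 'w1'),
  (2, 'x2', 'y2', 'z2', 'w2');
COMMIT;

If the .sql file came from mysqldump, its --extended-insert option (on by default) already produces the multi-row form; phpMyAdmin's exporter has a similar option for putting multiple rows in each INSERT.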

Related

Extreme execution time differences between local and production with same data in MySQL with InnoDB

I'm trying to execute an ALTER TABLE to convert a table to UTF-8. I exported all the data from a very large table in production and imported it into my local MySQL instance to test how much time it would take and prepare myself.
The table contains 33 million rows but only 3 columns.
Here is the query for the conversion:
ALTER TABLE image_public_key CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci, FORCE;
When I run that query on my local instance, running on a small laptop with an SSD, it is done in approximately 5 minutes. No big deal.
When I run the same query on the same table in production, using Amazon RDS with a db.r3.xlarge instance, it takes 55 minutes (which is 11x the time!).
I'm trying to understand how a small laptop can beat a machine like that (4 CPUs + 30 GB RAM + 800 GB of storage total, 300 GB free at the moment).
Could it be some throttling on Amazon's side? Some default variables? Any help would be appreciated. I need to run the same query on multiple tables and cannot put production on hold for so much time...
EDIT
I might add that the task stays stuck for a very long time in the "copy to tmp table" state. Not sure if this is useful information.
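One hedged way to start narrowing this down, before blaming throttling, is to compare the InnoDB settings that matter most for a full table rebuild on both servers, and to watch the statement's state while it runs:

-- run on both the laptop and the RDS instance and compare
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_io_capacity';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- while the ALTER is running: "copy to tmp table" means the whole table is being rebuilt
SHOW PROCESSLIST;

Large differences in these values between the two servers (RDS parameter groups ship with their own defaults) would at least explain part of the timing gap.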

Queries fast after creating an index but slow after a few minutes MySQL

I have several tables with ~15 million rows. When I create an index on the id column and then execute a simple query like SELECT * FROM my_table WHERE id = 1, I retrieve the data within one second. But then, after a few minutes, if I execute the query with a different id it takes over 15 seconds.
I'm sure it is not the query cache, because I'm trying different ids all the time to make sure I'm not retrieving from the cache. Also, I used EXPLAIN to make sure the index is being used.
The specs of the server are:
CPU: Intel Dual Xeon 5405 Harpertown 2.0Ghz Quad Core
RAM: 8GB
Hard drive 2: 146GB SAS (15k rpm)
Another thing I noticed is that if I execute REPAIR TABLE my_table the queries complete within one second again. I assume something is being cached, either the table or the index. If so, is there any way to tell MySQL to keep it cached? Is it normal, given the specs of the server, to take around 13 seconds on an indexed table? The index is not unique and each query returns around 3000 rows.
NOTE: I'm using MyISAM and I know there won't be any writes to these tables; all the queries will only read data.
SOLVED: Thank you for your answers; as many of you pointed out, it was the key_buffer_size. I also reordered the tables using the same column as the index so the records are not scattered, and now the queries consistently run in under 1 second.
Please provide
SHOW CREATE TABLE
SHOW VARIABLES LIKE '%buffer%';
Likely causes:
key_buffer_size (when using MyISAM) is not 20% of RAM; or innodb_buffer_pool_size is not 70% of available RAM (when using InnoDB).
Another query (or group of queries) is coming in and "blowing out the cache" (key_buffer or buffer_pool). Look for such queries.
When using InnoDB, you don't have a PRIMARY KEY. (It is really important to have such.)
For 3000 rows to take 15 seconds to load, I deduce:
The cache for the table (not necessarily for the index) was blown out, and
The 3000 rows were scattered around the table (hence fetching one row does not help much in finding subsequent rows).
Memory allocation blog: http://mysql.rjweb.org/doc.php/memory
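As a minimal sketch of the checks above (the sizes below are placeholders scaled to the 8 GB of RAM mentioned in the question, not recommendations):

-- how the caches are currently sized
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data + index cache

-- MyISAM: roughly 20% of an 8 GB box
SET GLOBAL key_buffer_size = 1600 * 1024 * 1024;
-- InnoDB: innodb_buffer_pool_size is only resizable at runtime in MySQL 5.7+;
-- on older versions it has to be set in the configuration file and the server restarted.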
Is it normal, given the specs of the server, to take around 13 seconds on an indexed table?
The high variance in response time indicates that something is amiss. With only 8 GB of RAM and 15 million rows, you might not have enough RAM to keep the index in memory.
Is swap enabled on the server? This could explain the extreme jump in response time.
Investigate the memory situation with a tool like top, htop or glances.

MySQL RAM requirement for a 22 billion record SELECT query

I have a table which is expected to hold 22 billion records yearly. How much RAM will be required if each record takes around 4 KB of data?
The table is expected to need around 8 TB of storage.
[update]
There are no join queries involved. I just need the SELECT queries to be executed efficiently.
I have found that there is no general rule of thumb for how much RAM you need for x number of records in MySQL.
The first factor you need to look at is the design of the database itself. This is one of the most impactful factors of all. If your database is poorly designed, throwing RAM at it isn't going to fix your problem.
Another factor is how this data is going to be accessed, i.e. if a specific row is being accessed by 100 people with SELECT * FROM table WHERE column = value, then you could get away with a tiny amount of RAM, as you would just use query caching.
It MAY (not always) be a good idea to keep your entire database in RAM so it can be read more quickly (depending on the total size of the database). I.e. if your database is 100 GB in size then 128 GB of RAM should be sufficient to deal with any overheads such as the OS and other factors.
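Since the question only gives a row count and a rough row size, a hedged way to turn that into a concrete number is to measure the actual data and index footprint once some data is loaded and compare it with candidate RAM sizes ('your_schema' below is a placeholder):

-- approximate on-disk footprint of each table and its indexes, in GB
SELECT table_name,
       ROUND(data_length  / 1024 / 1024 / 1024, 1) AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 1) AS index_gb
FROM   information_schema.tables
WHERE  table_schema = 'your_schema';

If at least the indexes (ideally the hot portion of the data too) fit in the InnoDB buffer pool or MyISAM key buffer, point-lookup SELECTs stay fast; RAM beyond that mostly buys a larger cache rather than a different query plan.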
On one system I support, we load 224 GB of Oracle CDR records daily for a network operator.
On another system, around 20 lakh (2 million) rows are retrieved daily from a SQL database.
You can use 128 GB if you are using one server; otherwise,
if you are using a load balancer, you can use 62 GB on every machine.

IOPS or Throughput? - Determining Write Bottleneck in Amazon RDS Instance

We have nightly load jobs that write several hundred thousand records to a MySQL reporting database running in Amazon RDS.
The load jobs are taking several hours to complete, but I am having a hard time figuring out where the bottleneck is.
The instance is currently running with General Purpose (SSD) storage. Looking at the CloudWatch metrics, it appears I am averaging less than 50 IOPS for the last week. However, Network Receive Throughput is less than 0.2 MB/sec.
Is there any way to tell from this data whether I am being bottlenecked by network latency (we are currently loading the data from a remote server... this will change eventually) or by write IOPS?
If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if network latency is the issue, I will need to redesign our load jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.
Any advice is appreciated.
UPDATE:
More info about my instance: I am using an m3.xlarge instance, provisioned for 500 GB of storage. The load jobs are done with the ETL tool from Pentaho. They pull from multiple (remote) source databases and insert into the RDS instance using multiple threads.
You aren't using up much CPU. Your memory is very low. An instance with more memory should be a good win.
You're only doing 50-150 IOPS. That's low; you should get 3000 in a burst on standard SSD-level storage. However, if your database is small, that is probably hurting you (since you get 3 IOPS per GB, so if you are on a 50 GB or smaller database, consider paying for provisioned IOPS).
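As a rough worked figure from the numbers already given: the question's 500 GB General Purpose volume should get about 500 × 3 = 1,500 baseline IOPS, so the observed 50-150 IOPS is well below the volume's own ceiling, which points more towards the loading pipeline or the network than towards storage.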
You might also try Aurora; it speaks MySQL, and supposedly has great performance.
If you can spread out your writes, the spikes will be smaller.
A very quick test is to buy provisioned IOPS, but be careful as you may get fewer than you do currently during a burst.
Another quick means to determine your bottleneck is to profile your job execution application with a profiler that understands your database driver. If you're using Java, JProfiler will show the characteristics of your job and its use of the database.
A third is to configure your database driver to print statistics about the database workload. This might inform you that you are issuing far more queries than you would expect.
Your most likely culprit accessing the database remotely is actually round-trip latency. The impact is easy to overlook or underestimate.
If the remote database has, for example, a 75 millisecond round-trip time, you can't possibly execute more than 1000 (milliseconds/sec) / 75 (milliseconds/round trip) = 13.3 queries per second if you're using a single connection. There's no getting around the laws of physics.
The spikes suggest inefficiency in the loading process, where it gathers for a while, then loads for a while, then gathers for a while, then loads for a while.
Separate but related: if you don't have the MySQL client/server compression protocol enabled on the client side, find out how to enable it. (The server always supports compression, but the client has to request it during the initial connection handshake.) This won't fix the core problem, but it should improve the situation somewhat, since less data to physically transfer means less time wasted in transit.
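A quick hedged check of whether a given connection actually negotiated the compressed protocol (the client has to request it, e.g. with the mysql command-line client's --compress option or the equivalent setting in your driver):

-- run from the loading connection; ON means compression was negotiated
SHOW SESSION STATUS LIKE 'Compression';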
I'm not an RDS expert and I don't know if my own particular case can shed some light. Anyway, I hope this gives you some kind of insight.
I have a db.t1.micro with 200 GB provisioned (which gives me 600 IOPS baseline performance), on General Purpose SSD storage.
The heaviest workload is when I aggregate thousands of records from a pool of around 2.5 million rows drawn from a 10 million-row table and another of 8 million rows. I do this every day. This is what I average (it is steady performance, unlike yours, where I see a pattern of spikes):
Write/ReadIOPS: +600 IOPS
NetworkTrafficReceived/Transmit throughput: < 3,000 Bytes/sec (my queries are relatively short)
Database connections: 15 (workers aggregating on parallel)
Queue depth: 7.5 counts
Read/Write Throughput: 10MB per second
The whole aggregation task takes around 3 hours.
Also check the "10 tips to improve the performance of your app in AWS" SlideShare from AWS Summit 2014.
I don't know what else to say since I'm not an expert! Good luck!
In my case it was the number of records. I was writing only 30 records per minute and had write IOPS of around the same, 20 to 30. But this was eating at the CPU, which reduced the CPU credits quite steeply. So I took all the data in that table, moved it to another "historic" table, and cleared all the data in the original table.
CPU dropped back down to normal levels, but write IOPS stayed about the same, which was fine. The problem: indexes. I think that because so many records needed to be indexed when inserting, it took far more CPU to maintain the index with that number of rows, even though the only index I had was the primary key.
Moral of my story: the problem is not always where you think it lies. Although I saw increased write IOPS, that was not the root cause; it was the CPU being used for index maintenance during inserts, which caused the CPU credits to fall.
Not even X-Ray on the Lambda could catch the increased query time. That is when I started to look at the DB directly.
Your queue depth graph shows > 2, which clearly indicates that the IOPS are under-provisioned. (If queue depth < 2, then IOPS are over-provisioned.)
I think you have used the default AUTOCOMMIT = 1 (autocommit mode). It performs a log flush to disk for every insert and exhausts the IOPS.
So it is better, for performance tuning, to set AUTOCOMMIT = 0 before bulk inserts of data in MySQL, so that the insert sequence looks like:
SET AUTOCOMMIT = 0;
START TRANSACTION;
-- first 10000 recs
INSERT INTO SomeTable (column1, column2) VALUES (vala1,valb1),(vala2,valb2) ... (vala10000,valb10000);
COMMIT;
-- next 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala10001,valb10001),(vala10002,valb10002) ... (vala20000,valb20000);
COMMIT;
-- next 10000 recs
.
.
.
SET AUTOCOMMIT = 1;
Using the above approach on a t2.micro, I inserted 300,000 rows in 15 minutes using PHP.

MySQL exceeds system RAM when doing a multi-table UPDATE

I am running a MySQL server on a Mac Pro with 64 GB of RAM and 6 cores. Table1 in my schema has 330 million rows. Table2 has 65,000 rows. (I also have several other tables with a combined total of about 1.5 billion rows, but they are not being used in the operation I am attempting, so I don't think they are relevant.)
I am trying to do what I would have thought was a relatively simple update statement (see below) to bring some data from Table2 into Table1. However, I am having a terrible time with MySQL blowing through my system RAM, forcing me into swap, and eventually freezing up the whole system so that MySQL becomes unresponsive and I need to restart my computer. My update statement is as below:
UPDATE Table1, Table2
SET
Table1.Column1 = Table2.Column1,
Table1.Column2 = Table2.Column2,
Table1.Column3 = Table2.Column3,
Table1.Column4 = Table2.Column4
WHERE
(Table1.Column5 = Table2.Column5) AND
(Table1.Column6 = Table2.Column6) AND
(Table1.Column7 = Table2.Column7) AND
(Table1.id between 0 AND 5000000);
Ultimately, I want to perform this update for all 330 million rows in Table1. I decided to break it up into batches of 5 million rows each, though, because
(a) I was getting problems with exceeding the lock size and
(b) I thought it might help with my problems of blowing through RAM.
Here are some more relevant details about the situation:
I have created indexes for both Table1 and Table2 over the combination of Column5, Column6, Column7 (the columns whose values I am matching on).
Table1 has 50 columns and is about 60 GB total.
Table2 has 8 columns and is 3.5 MB total.
I know that some people might recommend foreign keys in this situation, rather than updating table1 with info from table2, but (a) I have plenty of disk space and don't really care about using it to maximum efficiency (b) none of the values in any of these tables will change over time and (c) I am most concerned about speed of queries run on table1, and if it takes this long to get info from table2 to table1, I certainly don't want to need to repeat the process for every query I run on table1.
In response to the problem of exceeding the maximum lock table size, I have experimented with increasing innodb_buffer_pool_size. I've tried a number of values. Even at something as low as 8 GB (i.e. 1/8th of my computer's RAM, and I'm running almost nothing else on it while doing this), I still have this problem of the mysqld process using up basically all of the RAM available on the system and then starting to pull RAM allocation from the operating system (i.e. my kernel_task starts showing up as using 30 GB of RAM, whereas it usually uses around 2 GB).
The problem with the maximum locks seems to have been largely resolved; I no longer get this error, though maybe that's just because now I blow through my memory and crash before I can get there.
I've experimented with smaller batch sizes (1 million rows, 100,000 rows). These seem to work maybe a bit better than the 5 million row batches, but they still generally have the same problems, maybe only a bit slower to develop. And, performance seems terrible - for instance, at the rate I was going on the 100,000 batch sizes, it would have taken about 7 days to perform this update.
The tables both use InnoDB
I generally set SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; although I don't know if it actually helps or not (I am the only user accessing this DB in any way, so I don't really care about locking and would do away with it entirely if I could)
I notice a lot of variability in the time it takes batches to run. For instance, on the 1 million row batches, I would observe times anywhere between 45 seconds and 20 minutes.
When I tried running something that just found the matching rows and then put only two column values for those into a new table, I got much more consistent times (about 2.5 minutes per million rows). Thus, it seems that my problems might stem from the fact that I'm updating values in the table that I am doing the matching on, even though the columns I'm updating are different from those I am matching on.
The columns that I am matching on and updating just contain INT and CHAR types, none with more than 7 characters max.
I ran a CHECK TABLE diagnostic and it came back ok.
Overall, I am tremendously perplexed why this would be so difficult. I am new to mysql and databases in general. Since Table2 is so small, I could accomplish this same task, much faster I believe, in python using a dictionary lookup. I would have thought though that databases would be able to handle this better, since handling and updating big datasets is what they are designed for.
I ran some diagnostics on the queries using Mysql workbench and confirmed that there are NOT full table scans being performed.
It really seems something must be going wrong here though. If the system has 64 GB of ram, and that is more than the entire size of the two tables combined (though counting index size it is a bit more than 64 GB for the two tables), and if the operation is only being applied on 5 million out of 330 million rows at a time, it just doesn't make sense that it should blow out the ram.
Therefore, I am wondering:
Is the syntax of how I am writing this update statement somehow horribly bad and inefficient such that it would explain the horrible performance and problems?
Are there some kind of parameters beyond the innodb_buffer_pool_size that I should be configuring, either to put a firmer cap on the ram mysql uses or to get it to more effectively use resources?
Are there other sorts of diagnostics I should be running to try to detect problems with my tables, schema, etc.?
What is a "reasonable" amount of time to expect an update like this to take?
So, after consulting with several people knowledgeable about such matters, here are the solutions I came up with:
I brought my innodb_buffer_pool_size down to 4GB, i.e. 1/16th of my total system memory. This finally seemed to be enough to reliably stop MySQL from blowing through my 64GB of RAM.
I simplified my indexes so that they only contained exactly the columns I needed, and made sure that all indexes I was using were small enough to fit into RAM (with plenty of room to spare for other uses of RAM by MySQL as well).
I learned to accept that MySQL just doesn't seem to be built for particularly large data sets (or at least not on a single machine, even a relatively big one like mine). So I accepted that manually breaking up my jobs into batches (sketched at the end of this answer) would often be necessary, since apparently MySQL doesn't have what it takes to make the right decisions about how to break a job up on its own in order to be conscientious about system resources like RAM.
Sometimes, when doing jobs along the lines of this, or in general, on my moderately large datasets, I'll use MySQL to do my updates and joins. Other times, I'll just break the data up into chunks and then do the joining or other such operations in another program, such as R (generally using a package like data.table that handles largish data relatively efficiently).
I was also advised that, alternatively, I could use something like Pig or Hive on a Hadoop cluster, which should be able to handle data of this size better.
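For what it's worth, here is a minimal sketch of the batching approach described above, using the same placeholder table and column names as the question and an explicit JOIN; the chunk size is illustrative, and the loop over id ranges would be driven from outside MySQL (a shell or Python script, for example):

-- one chunk; repeat with the next id range and commit after each chunk
START TRANSACTION;
UPDATE Table1
JOIN   Table2
  ON   Table1.Column5 = Table2.Column5
 AND   Table1.Column6 = Table2.Column6
 AND   Table1.Column7 = Table2.Column7
SET    Table1.Column1 = Table2.Column1,
       Table1.Column2 = Table2.Column2,
       Table1.Column3 = Table2.Column3,
       Table1.Column4 = Table2.Column4
WHERE  Table1.id BETWEEN 0 AND 999999;   -- next chunk: BETWEEN 1000000 AND 1999999, and so on
COMMIT;

Keeping each chunk small enough that the touched rows and index pages fit comfortably in the (deliberately reduced) buffer pool is what keeps mysqld from ballooning past its configured memory.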