MySQL slows down after INSERT

I'm running into performance issues with my web application and found out that the bottleneck is the DB. The app is running on a LAMP server (VPS) with 4 CPUs and 2 GB RAM.
After inserting a new record into the DB (a table with around 100,000 records), SELECT queries slow down significantly for a while (sometimes for several minutes). I thought the problem was reindexing, but there is practically no activity on the VPS after the insert. There is plenty of memory left, no need for swapping, and the CPU is idle.
Truth is, selects are quite complex:
SELECT COUNT(A.id), B.title FROM B JOIN A .... WHERE ..lot of stuff..
Both A and B have about 100K records. A has many columns; B has only a few, but it is a tree structure represented by a nested set. B doesn't change very often, but A does. The WHERE conditions are mostly covered by indexes. There are usually about 10-30 rows in the result set.
Are there any optimizations I could perform?

You might want to include your "lot of stuff"... you could be doing LIKE comparisons or joining on unindexed varchar columns :)
You'll also need to look at indexing columns that are used heavily.

First thing is: DO NOT trust any CPU/RAM etc. measurements inside a VPS - they can be wrong, since they don't take into account what is going on elsewhere on the machine (in other VPSes)!
As for the performance:
Check the query plans for all your SQL statements, and use a profiler on the app itself to see where the bottlenecks are.
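For example, running EXPLAIN on a sketch of the query from the question (the join condition and WHERE clause are placeholders, since the real ones weren't posted) shows which indexes are used and how many rows each step examines:

-- Sketch only: the join column and filter stand in for the real "lot of stuff".
EXPLAIN
SELECT COUNT(A.id), B.title
FROM B
JOIN A ON A.b_id = B.id          -- hypothetical join column
WHERE A.status = 'open'          -- hypothetical filter
GROUP BY B.title;

Watch the Extra column for "Using temporary" / "Using filesort" and for row estimates far larger than the 10-30 rows you actually end up with.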
Another point is to check the configuration of your MySQL DB. Is there any replication going on (that might cause a slowdown too)? Does the DB have enough RAM? Is the DB on a different machine/VPS or on the same VPS?

Related

Redshift design or configuration issue? - My Redshift data warehouse seems much slower than my MySQL database

I have a Redshift data warehouse that is pulling data in from multiple sources.
One is my MySQL database and the others are some cloud-based databases that get pulled in.
When querying in Redshift, the query response is significantly slower than against the same MySQL table(s).
Here is an example:
SELECT *
FROM leads
WHERE id = 10162064
In MySQL this takes 0.4 seconds. In Redshift it takes 4.4 seconds.
The table has 11 million rows. "id" is indexed in MySQL; in Redshift it is not, since it is a columnar system.
I know that Redshift is a columnar data warehouse (which is relatively new to me) and MySQL is a relational database that is able to utilize indexes. I'm not sure if Redshift is the right tool for us for reporting, or if we need something else. We have about 200 tables in it from 5 different systems and it is currently at 90 GB.
We have a reporting tool sitting on top that does native queries to pull data. They are pretty slow but are also pulling a ton of data from multiple tables. I would expect some slowness with these, but with a simple statement like above, I would expect it to be quicker.
I've tried some different DIST and SORT key configurations but see no real improvement.
I've run vacuum and analyze with no improvement.
We have 4 nodes, dc2.large. We are currently only using 14% of storage. CPU utilization is frequently near 100%. Database connections average about 10 at any given time.
The data warehouse just has exact copies of the tables from our integration with the other sources. We are trying to do near-real-time reporting with this.
Just looking for advice on how to improve the performance of our Redshift via configuration changes, some sort of view or dim-table architecture, or any other tips to help me get the most out of Redshift.
I've worked with clients on this type of issue many times and I'm happy to help but this may take some back and forth to narrow in on what is happening.
First I'm assuming that "leads" is a normal table, not a view and not an external table. Please correct if this assumption isn't right.
Next I'm assuming that this table isn't very wide and that "select *" isn't contributing greatly to the speed concern. Yes?
Next question: why this size of cluster for a table of only 11M rows? I'd guess there are other, much larger data sets in the database and that this table isn't what's driving the cluster size.
The first step of narrowing this down is to go onto the AWS console for Redshift and find the query in question. Look at the actual execution statistics and see where the query is spending its time. I'd guess it will be in loading (scanning) the table but you never know.
You also should look at STL_WLM_QUERY for the query in question and see how much wait time there was in the running of this query. Queueing can take time, and if you have interactive queries that need faster response times, then some WLM configuration may be needed.
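Something along these lines works as a first check (the query ID is a placeholder you'd pull from the console or STL_QUERY; the timing columns are in microseconds):

SELECT query, service_class,
       total_queue_time / 1000000.0 AS queue_seconds,
       total_exec_time  / 1000000.0 AS exec_seconds
FROM stl_wlm_query
WHERE query = 123456;   -- replace with the actual query ID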
It could also be compile time but given the simplicity of the query this seems unlikely.
My suspicion is that the table is spread too thin around the cluster and there are lots of mostly empty blocks being read, but this is just based on assumptions. Is "id" the distkey or sortkey for this table? Other factors likely in play are cluster load - is the cluster busy when this query runs? WLM is one place where things can interfere, but disk I/O bandwidth is a shared resource, and if some other queries are abusing the disks, this will make every query's access to disk slow. (The same is true of network bandwidth and leader-node workload, but these don't seem to be central to your issue at the moment.)
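A quick way to check the distribution/sort setup and how thinly the table is spread is SVV_TABLE_INFO (this assumes "leads" is a normal table in the public schema):

SELECT "table", diststyle, sortkey1, size AS blocks_1mb,
       tbl_rows, skew_rows, unsorted, pct_used
FROM svv_table_info
WHERE "schema" = 'public' AND "table" = 'leads';

A large block count relative to tbl_rows, or a high skew_rows, would support the "mostly empty blocks" theory.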
As I mentioned resolving this will likely take some back and forth so leave comments if you have additional information.
(I am speaking from a knowledge of MySQL, not Redshift.)
SELECT * FROM leads WHERE id = 10162064
If id is indexed, especially if it is a UNIQUE (or PRIMARY) key, 0.4 sec sounds like a long network delay. I would expect 0.004 as a worst case (with SSDs and `PRIMARY KEY(id)`).
(If leads is a VIEW, then let's see the tables. 0.4s may be reasonable!)
That query works well for an RDBMS, but not for a columnar database. Face it.
I can understand using a columnar database to handle random queries on various columns. See also MariaDB's implementation of "Columnstore" -- that would give you both RDBMS and Columnar in a single package. Still, they are separate enough that you can't really intermix the two technologies.
If you are getting 100% CPU in MySQL, show us the query, its EXPLAIN, and SHOW CREATE TABLE. Often, a better index and/or query formulation can solve that.
For "real time reporting" in a Data Warehouse, building and maintaining Summary Tables is often the answer.
Tell us more about the "exact copy" of the DW data. In some situations, the Summary tables can supplant one copy of the Fact table data.

Join order differs between two instances of the same MySQL DB

There is a query that I want to optimize. To run some tests, I took a snapshot of the production database and created a new test instance of it. Using the EXPLAIN clause, I can see that the order of the joins differs between the two databases. The two databases have the same version (MySQL 5.6.19a), the same engine (InnoDB), the same schema, the same indexes, the same data, and run on the same hardware. The only difference is that the production database uses more memory (obviously) because it has more connections to it.
What may cause the join order to be different?
The memory usage?
The indexes are still building in the test instance?
The indexes of the production database are fragmented?
This is rare but quite feasible. InnoDB has "statistics" about each index on each table; it uses them to decide what the best way is to perform the query, including what order to look at the tables in.
The statistics used to come from 8 'random' dives into the BTree to get a crude feel for the number of rows and the distribution of the data. The timing of the dives, the number '8', and the randomness have all been criticized, and gradually they have been improved. Only some improvements exist in 5.6.19.
Also the "cost" model of deciding how to perform the query has recently had an overhaul (5.7 / 8.0). 8.0 and MariaDB 10.0 have "histograms", which should lead to better query plan choices. Not yet implemented (as of 8.0.0): Noticing which blocks are already cached; this could picking a 'worse' index because more of it is cached, hence faster.
Because of the complexity of the optimization problem and the huge number of possibilities, there are even some cases where a newer version picks a worse query plan.
Even if you are running the same query on the same machine, the query plan could be different.
I presume you already knew that changing a constant in the query can change the query plan -- and do it for the better. I have seen the same query come up with 6 different query plans, presumably due to different constants. This can be annoying if you are doing EXPLAIN on a query found in the slowlog -- you can't be sure that that query plan was used when it was "slow".
We simply have to live with all this.
You could do ANALYZE TABLE to recompute the statistics. But that can make things worse or better, depending on the phase of the moon. It might even (coincidentally) make your two instances perform the query the same.
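For example (the table names are placeholders for the ones in your join):

-- Recomputes the index statistics InnoDB uses for join-order decisions.
ANALYZE TABLE table_a, table_b;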
The real question is "did one server run your query significantly faster than the other?" (After accounting for caching, other activity, etc, etc.)
When both of two tables in a JOIN are being filtered (something in WHERE), it is very difficult for the Optimizer to decide. If there is also ORDER BY and LIMIT, it becomes even harder to decide.
If you would like to provide your SELECT, its EXPLAIN, and SHOW CREATE TABLE, we can discuss details. (But start a new question.)

MySQL exceeds system RAM when doing UPDATE SELECT

I am running a MySQL server on a Mac Pro with 64 GB of RAM and 6 cores. Table1 in my schema has 330 million rows. Table2 has 65,000 rows. (I also have several other tables with a combined total of about 1.5 billion rows, but they are not being used in the operation I am attempting, so I don't think they are relevant.)
I am trying to do what I would have thought was a relatively simple UPDATE statement (see below) to bring some data from Table2 into Table1. However, I am having a terrible time with MySQL blowing through my system RAM, forcing me into swap, and eventually freezing up the whole system so that MySQL becomes unresponsive and I need to restart my computer. My UPDATE statement is below:
UPDATE Table1, Table2
SET
Table1.Column1 = Table2.Column1,
Table1.Column2 = Table2.Column2,
Table1.Column3 = Table2.Column3,
Table1.Column4 = Table2.Column4
WHERE
(Table1.Column5 = Table2.Column5) AND
(Table1.Column6 = Table2.Column6) AND
(Table1.Column7 = Table2.Column7) AND
(Table1.id between 0 AND 5000000);
Ultimately, I want to perform this update for all 330 million rows in Table1. I decided to break it up into batches of 5 million rows each, though, because
(a) I was getting problems with exceeding the lock size and
(b) I thought it might help with my problems of blowing through RAM.
Here are some more relevant details about the situation:
I have created indexes for both Table1 and Table2 over the combination of Column5, Column6, Column7 (the columns whose values I am matching on).
Table1 has 50 columns and is about 60 GB total.
Table2 has 8 columns and is 3.5 MB total.
I know that some people might recommend foreign keys in this situation, rather than updating table1 with info from table2, but (a) I have plenty of disk space and don't really care about using it to maximum efficiency, (b) none of the values in any of these tables will change over time, and (c) I am most concerned about the speed of queries run on table1, and if it takes this long to get info from table2 to table1, I certainly don't want to repeat the process for every query I run on table1.
In response to the problem of exceeding the maximum lock table size, I have experimented with increasing innodb_buffer_pool_size. I've tried a number of values. Even at something as low as 8 GB (i.e. 1/8th of my computer's RAM, and I'm running almost nothing else on it while doing this), I am still having this problem of the mysqld process using up basically all of the RAM available on the system and then starting to pull RAM allocation from the operating system (i.e. my kernel_task starts showing up as using 30 GB of RAM, whereas it usually uses around 2 GB).
The problem with the maximum locks seems to have been largely resolved; I no longer get this error, though maybe that's just because now I blow through my memory and crash before I can get there.
I've experimented with smaller batch sizes (1 million rows, 100,000 rows). These seem to work maybe a bit better than the 5 million row batches, but they still generally have the same problems, maybe only a bit slower to develop. And, performance seems terrible - for instance, at the rate I was going on the 100,000 batch sizes, it would have taken about 7 days to perform this update.
The tables both use InnoDB
I generally set SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; although I don't know if it actually helps or not (I am the only user accessing this DB in any way, so I don't really care about locking and would do away with it entirely if I could)
I notice a lot of variability in the time it takes batches to run. For instance, on the 1 million row batches, I would observe times anywhere between 45 seconds and 20 minutes.
When I tried running something that just found the matching rows and then put only two column values for those into a new table, I got much more consistent times (about 2.5 minutes per million rows). Thus, it seems that my problems might somehow stem from the fact that I'm updating values in the same table that I am doing the matching on, even though the columns I'm updating are different from those I am matching on.
The columns that I am matching on and updating just contain INT and CHAR types, none with more than 7 characters max.
I ran a CHECK TABLE diagnostic and it came back ok.
Overall, I am tremendously perplexed why this would be so difficult. I am new to MySQL and databases in general. Since Table2 is so small, I could accomplish this same task, much faster I believe, in Python using a dictionary lookup. I would have thought, though, that databases would be able to handle this better, since handling and updating big datasets is what they are designed for.
I ran some diagnostics on the queries using MySQL Workbench and confirmed that no full table scans are being performed.
It really seems something must be going wrong here, though. If the system has 64 GB of RAM, and that is more than the entire size of the two tables combined (though counting index size it is a bit more than 64 GB for the two tables), and if the operation is only being applied to 5 million out of 330 million rows at a time, it just doesn't make sense that it should blow out the RAM.
Therefore, I am wondering:
Is the syntax of how I am writing this update statement somehow horribly bad and inefficient such that it would explain the horrible performance and problems?
Are there some kind of parameters beyond the innodb_buffer_pool_size that I should be configuring, either to put a firmer cap on the ram mysql uses or to get it to more effectively use resources?
Are there other sorts of diagnostics I should be running to try to detect problems with my tables, schema, etc.?
What is a "reasonable" amount of time to expect an update like this to take?
So, after consulting with several people knowledgeable about such matters, here are the solutions I came up with:
I brought my innodb_buffer_pool_size down to 4GB, i.e. 1/16th of my total system memory. This finally seemed to be enough to reliably stop MySQL from blowing through my 64GB of RAM.
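For reference, a sketch of that setting (SET GLOBAL works on MySQL 5.7.5+ where the buffer pool can be resized online; on older versions, put it in my.cnf under [mysqld] and restart):

SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;  -- 4 GB
-- or permanently in my.cnf:  innodb_buffer_pool_size = 4G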
I simplified my indexes so that they only contained exactly the columns I needed, and made sure that all indexes I was using were small enough to fit into RAM (with plenty of room to spare for other uses of RAM by MySQL as well).
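For illustration, using the column names from the question, a narrow index over only the matched columns might look like this (a sketch, not necessarily the exact definitions I ended up with):

-- Index only the columns used for matching; keep it narrow so it fits in RAM.
ALTER TABLE Table1 ADD INDEX idx_match (Column5, Column6, Column7);
ALTER TABLE Table2 ADD INDEX idx_match (Column5, Column6, Column7);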
I learned to accept that MySQL just doesn't seem to be built for particularly large data sets (or, at least not on a single machine, even if a relatively big machine like what I have). Thus, I accepted that manually breaking up my jobs into batches would often be necessary, since apparently the machinery of MySQL doesn't have what it takes to make the right decisions about how to break a job up on its own, in order to be conscientious about system resources like RAM.
Sometimes, when doing jobs along the lines of this, or in general, on my moderately large datasets, I'll use MySQL to do my updates and joins. Other times, I'll just break the data up into chunks and then do the joining or other such operations in another program, such as R (generally using a package like data.table that handles largish data relatively efficiently).
I was also advised that, alternatively, I could use something like Pig or Hive on a Hadoop cluster, which should be able to better handle data of this size.

MySQL vs SQL Server 2008 R2 simple select query performance

Can anyone explain to me why there is a dramatic difference in performance between MySQL and SQL Server for this simple select statement?
SELECT email from Users WHERE id=1
Currently the database has just one table with 3 users. The MySQL time is on average 0.0003 seconds while SQL Server takes 0.05. Is this normal, or is the MSSQL server not configured properly?
EDIT:
Both tables have the same structure, primary key is set to id, MySQL engine type is InnoDB.
I tried the query with WITH(NOLOCK) but the result is the same.
Are the servers of the same level of power? Hardware makes a difference too. And are there roughly the same number of people accessing the DB at the same time? Are any other applications using the same hardware (databases in general should not share servers with other applications)?
Personally I wouldn't worry about this type of difference. If you want to see which is performing better, then add millions of records to the database and then test queries. Databases in general all perform well with simple queries on tiny tables, even badly designed or incorrectly set up ones. To know if you will have a performance problem, you need to test with large amounts of data and many simultaneous users, on hardware similar to what you will have in prod.
The issue with diagnosing low cost queries is that the fixed cost may swamp the variable costs. Not that I'm a MS-Fanboy, but I'm more familiar with MS-SQL, so I'll address that, primarily.
MS-SQL probably has more overhead for optimization and query parsing, which adds a fixed cost to the query when deciding whether to use the index, looking at statistics, etc. MS-SQL also logs a lot of information about the query plan when it executes, and stores a lot of data for future optimization, which adds overhead.
This would all be helpful when the query takes a long time, but when benchmarking a single query it seems to show a slower result.
There are several factors that might affect that benchmark but the most significant is probably the way MySQL caches queries.
When you run a query, MySQL will cache the text of the query and the result. When the same query is issued again it will simply return the result from cache and not actually run the query.
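To take the query cache out of the benchmark, you can bypass it per statement or check whether it is enabled at all (this applies to MySQL 5.x; the query cache was removed in 8.0):

-- Bypass the query cache for a single statement:
SELECT SQL_NO_CACHE email FROM Users WHERE id = 1;
-- Check whether the query cache is enabled and how big it is:
SHOW VARIABLES LIKE 'query_cache%';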
Another important factor is that the SQL Server metric is the total elapsed time, not just the time it takes to seek to that record or pull it from cache. In SQL Server, turning on SET STATISTICS TIME ON will break it down a little more, but you're still not really comparing like for like.
Finally, I'm not sure what the goal of this benchmarking is since that is an overly simplistic query. Are you comparing the platforms for a new project? What are your criteria for selection?

MySQL Inconsistent Performance

I'm running a MySQL query that joins various tables of 500,000+ rows. Sometimes it takes a second, other times around 15 seconds! This is on my local machine. I have experienced similarly varied times before on other intensive queries; does anyone know why this is?
Thanks
Thanks for the replies - I am using appropriate indexes, inner and left joins, and have a WHERE clause covering a range of one week out of a possible 2-year period of invoices. If I keep varying it (so presumably the query results are not cached) and re-running, the time varies a lot, even if the number of rows retrieved is similar. The server is not busy. There are a few scheduled queries every minute, but they are not intensive and take around 200 ms.
The EXPLAIN plan shows that a table of around 2,000 rows is always fully scanned. So maybe these rows are sometimes cached, or maybe the indexes are cached - I didn't know indexes could be cached. I will try again with caching turned off.
Editing again - the query cache is in fact off, and I'm using InnoDB, so it looks like increasing innodb_buffer_pool_size is the way to go.
Same query each time?
It's hard to tell, based on what you've posted. If we assume that the schema and data aren't changing, I'd guess that there's something else running on your machine when the queries are long that would explain the difference. It could be that the state of memory is different, so paging is going on; an anti-virus program is running; some other service has started. It's impossible to answer.
Try to do an
OPTIMIZE TABLE
That should help refresh some data useful for the query planner.
You haven't given us much information; if you're using MyISAM tables, it may be a matter of locks.
Are you using ANSI INNER JOINs? A little basic, but don't use "cross joins". Those are the joins with the comma, like
SELECT * FROM t1, t2 WHERE t1.id_t1=t2.id_t1
Last things you may want to try: increase your buffers (InnoDB), your key_buffer (MyISAM), and the query cache buffers.
Here are some common reasons (barring your server simply being too busy):
The slow query is hitting the hard drive. In the fast case, the indexes and data are already cached in MySQL or the OS file cache.
Retrieving the data gets blocked by updates/inserts; for MyISAM tables, the whole table gets locked in some cases whenever someone inserts/updates data in it.
Table statistics are out of date and/or the wrong index gets selected. Running ANALYZE or OPTIMIZE on the table can help.
You have the query cache enabled; fetching the result of a cached query is fast, while fetching it when it's not in the cache might be slow. Try turning off the query cache to check whether the query is always slow when it's not served from the cache.
In any case, you should show the output of EXPLAIN on your queries to verify that indexes are being used properly - even if they're not, queries can be fast when everything is in RAM but grind to a halt when they need to hit the hard drive.
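One rough way to see whether the slow runs are actually hitting the disk is to compare the InnoDB buffer pool counters before and after the query:

-- Innodb_buffer_pool_read_requests are logical reads; Innodb_buffer_pool_reads
-- jumping during a slow run means pages had to come from disk, not the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';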