Two identical databases on same server with different running times - mysql

I have two databases that are identical except that in one I have about 500.000 entries (distributed over several tables) while the other database is empty.
If I run my program in the empty database then execution takes around 10mins while in the database with the 500k entries execution takes around 40mins. I now deleted some of the entries (about 250k entries) and it speeded up the execution by around 10mins. The strange thing is that these tables where not heavily queried (just some very simple inserts), so I wonder how this can have such an effect on the execution.
Also, all SQL statements that I do (I run a lot of them) are rahter simple (no complicated joins mainly inserts), so I wonder why some tables with 250k entries can have such an effect on the performance. Any ideas what could be the reason?

Following things could be the reason but for actual reasons you should look and profile your queries,
Though you think you are making simple inserts, its not a simple operation from DB perspective. (for every entry you insert following things may change and update
Index
Constraints
Integrity of DB (PK-FK) and there are many things to consider.above things look simple but they take time if volume is high
Check volume of queries (if high no. of insert queries are getting executed then as might be knowing Insert is exclusive operation i.e. it locks the table for updating and volume is high that means more locking time and waiting time.)
to avoid this probably you can try chaining or bulk operations
Is bulk update faster than single update in db2?
Data Distribution also plays important role. if you are accessing heavily loaded tables then parsing/accessing/fetching data from such tables will also take time (it doesn't matter for single query but it really hurts for large volume of similar queries). Try to minimize that by tuning your queries.

Related

Concurrent mysql queries causing large query queue's

I have a large mysql database that receives large volumes of queries, each query takes around 5-10 seconds to perform.
Queries involve checking records, updating records and adding records.
I'm experiencing some significant bottle necks in the query executions, which I believe is due to incoming queries having to 'queue' whilst current queries are using records that these incoming queries need to access.
Is there a way, besides completely reformatting my database structure and SQL queries, to enable simultaneous use of database records by queries?
An INSERT, UPDATE, or DELETE operation locks the relevant tables - myISAM - or rows -InnoDB - until the operation completes. Be sure your query of this type are fastly commited .. and also chechck for you transacation isolating the part with relevant looking ..
For MySQL internal locking see: https://dev.mysql.com/doc/refman/5.5/en/internal-locking.html
Also remeber that in mysql there are differente storage engine with different features eg:
The MyISAM storage engine supports concurrent inserts to reduce
contention between readers and writers for a given table: If a MyISAM
table has no holes in the data file (deleted rows in the middle), an
INSERT statement can be executed to add rows to the end of the table
at the same time that SELECT statements are reading rows from the
table.
https://dev.mysql.com/doc/refman/5.7/en/concurrent-inserts.html
eventually take a look at https://dev.mysql.com/doc/refman/5.7/en/optimization.html

MySQL speed of a table affected by the size of other tables?

In a MySQL database with multiple tables, is the runtime of an query on a table affected by the size of all the other tables within the same database?
I can only think of two effects that other tables could have on a query, assuming that these tables are not involved in the query, no other queries are running, and there are no constraints or triggers that link the tables.
The first effect is during the compilation phase. During this phase, the SQL parser is fetching information about tables and columns from metadata tables. More tables and columns in the database could slow down the compilation of the query.
The other effect is page table and disk fragmentation. If you have a clean system and start allocating and filling pages, then the pages are probably going to fill contiguous pages on the disk (no guarantees, but probably). At access time, the operating system might be pre-fetching physical pages adjacent to a requested page. In the environment I described, this pre-fetched data would probably be used in the query.
In a database with multiple tables, you have a much greater chance of the disk being fragmented. In this case, the pre-fetched hardware pages are less likely to be for the same tables in the query. This means that they don't get used, so an additional I/O request is needed to get the next page.
I've described this in terms of disk fragmentation, but a similar thing can happen with the pages themselves. The physical pages where data for a table is stored may not be contiguous, with similar results.
Fragmentation can be an issue with databases. In fact, it can be an issue regardless of the number of tables in the database. But more tables with more insert/delete activity on them tends to increase fragmentation. However, the effects are usually pretty slight, and would only in certain extreme circumstances be responsible for a significant reduction in performance.
Having many tables (and data) only requires an HDD space.
Processing speed depends on your table optimization and size + queries you executing, not an entire database.
Query run-time can be effected by a wide variety of server environment concerns, so while one might say that in and of itself having other large tables in the database wouldn't effect the speed of queries on unrelated tables, the reality is those tables might be getting queried too and consuming server resources. This of course effects your query speed.
It should only be a problem if that table (the large one) is related to the other table with some sort of connection (foreign key constraints would be one example)...otherwise your tables should behave independently. That said, if you have a single table that is so massive as to be causing speed problems, you might want to find other solutions for refactoring some of that data into smaller subsets.

how to optimize a slow batch INSERT IGNORE in MySQL

Just finished rewriting many queries as batch queries - no more DB calls inside of foreach loops!
One of these new batch queries, and insert ignore into a pivot table, is taking 1-4 seconds each time. It is fairly large (~100 rows per call) and the table is also > 2 million rows.
This is the current bottleneck in my program. Should I consider something like locking the table (never done this before, but I have heard it is ... dangerous) or are there other options I should look at first.
As it is a pivot table, there is a unique key comprised of both the rows I am updating.
Are you using indexes? Indexing the correct columns speeds things up immensely. If you are doing a lot of updating and inserting, sometimes it makes sense to disable indexes until finished, since re-indexing takes time. I don't understand how locking the table would help. Is this table in user by other users or applications? That would be the main reason locking would increase speed.

Updating large quantities of data in a production database

I have a large quantity of data in a production database that I want to update with batches of data while the data in the table is still available for end user use. The updates could be insertion of new rows or updates of existing rows. The specific table is approximately 50M rows, and the updates will be between 100k - 1M rows per "batch". What I would like to do is insert replace with a low priority.. In other words, I want the database to kind of slowly do the batch import without impacting performance of other queries that are occurring concurrently to the same disk spindles. To complicate this, the update data is heavily indexed. 8 b-tree indexes across multiple columns to facilitate various lookup that adds quite a bit of overhead to the import.
I've thought about batching the inserts down into 1-2k record blocks, then having the external script that loads the data just pause for a couple seconds between each insert, but that's really kind of hokey IMHO. Plus, during a 1M record batch, I really don't want to add 500-1000 2second pauses to add 20-40 minutes of extra load time if its not needed. Anyone have ideas on a better way to do this?
I've dealt with a similar scenario using InnoDB and hundreds of millions of rows. Batching with a throttling mechanism is the way to go if you want to minimize risk to end users. I'd experiment with different pause times and see what works for you. With small batches you have the benefit that you can adjust accordingly. You might find that you don't need any pause if you run this all sequentially. If your end users are using more connections then they'll naturally get more resources.
If you're using MyISAM there's a LOW_PRIORITY option for UPDATE. If you're using InnoDB with replication be sure to check that it's not getting too far behind because of the extra load. Apparently it runs in a single thread and that turned out to be the bottleneck for us. Consequently we programmed our throttling mechanism to just check how far behind replication was and pause as needed.
An INSERT DELAYED might be what you need. From the linked documentation:
Each time that delayed_insert_limit rows are written, the handler checks whether any SELECT statements are still pending. If so, it permits these to execute before continuing.
Check this link: http://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html What I would do is write a script that will execute your batch updates when MySQL is showing Threads_running or Connections under a certain number. Hopefully you have some sort of test server where you can determine what a good number threshold might be for either of those server variables. There are plenty of other of server status variables to look at in there also. Maybe control the executions by the Innodb_data_pending_writes number? Let us know what works for you, its an interesting question!

How to predict MySQL tipping points?

I work on a big web application that uses a MySQL 5.0 database with InnoDB tables. Twice over the last couple of months, we have experienced the following scenario:
The database server runs fine for weeks, with low load and few slow queries.
A frequently-executed query that previously ran quickly will suddenly start running very slowly.
Database load spikes and the site hangs.
The solution in both cases was to find the slow query in the slow query log and create a new index on the table to speed it up. After applying the index, database performance returned to normal.
What's most frustrating is that, in both cases, we had no warning about the impending doom; all of our monitoring systems (e.g., graphs of system load, CPU usage, query execution rates, slow queries) told us that the database server was in good health.
Question #1: How can we predict these kinds of tipping points or avoid them altogether?
One thing we are not doing with any regularity is running OPTIMIZE TABLE or ANALYZE TABLE. We've had a hard time finding a good rule of thumb about how often (if ever) to manually do these things. (Since these commands LOCK tables, we don't want to run them indiscriminately.) Do these scenarios sound like the result of unoptimized tables?
Question #2: Should we be manually running OPTIMIZE or ANALYZE? If so, how often?
More details about the app: database usage pattern is approximately 95% reads, 5% writes; database executes around 300 queries/second; the table used in the slow queries was the same in both cases, and has hundreds of thousands of records.
The MySQL Performance Blog is a fantastic resource. Namely, this post covers the basics of properly tuning InnoDB-specific parameters.
I've also found that the PDF version of the MySQL Reference Manual to be essential. Chapter 7 covers general optimization, and section 7.5 covers server-specific optimizations you can toy with.
From the sound of your server, the query cache may be of IMMENSE value to you.
The reference manual also gives you some great detail concerning slow queries, caches, query optimization, and even disk seek analysis with indexes.
It may be worth your time to look into multi-master replication, allowing you to lock one server entirely and run OPTIMIZE/ANALYZE, without taking a performance hit (as 95% of your queries are reads, the other server could manage the writes just fine).
Section 12.5.2.5 covers OPTIMIZE TABLE in detail, and 12.5.2.1 covers ANALYZE TABLE in detail.
Update for your edits/emphasis:
Question #2 is easy to answer. From the reference manual:
OPTIMIZE:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows. [...] You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data table.
And ANALYZE:
ANALYZE TABLE analyzes and stores the key distribution for a table. [...] MySQL uses the stored key distribution to decide the order in which tables should be joined when you perform a join on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.
OPTIMIZE is good to run when you have the free time. MySQL optimizes well around deleted rows, but if you go and delete 20GB of data from a table, it may be a good idea to run this. It is definitely not required for good performance in most cases.
ANALYZE is much more critical. As noted, having the needed table data available to MySQL (provided with ANALYZE) is very important when it comes to pretty much any query. It is something that should be run on a common basis.
Question #1 is a bit more of a trick. I would watch the server very carefully when this happens, namely disk I/O. My bet would be that your server is thrashing either your swap or the (InnoDB) caches. In either case, it may be query, tuning, or load related. Unoptimized tables could cause this. As mentioned, running ANALYZE can immensely help performance, and will likely help out too.
I haven't found any good way of predicting MySQL "tipping points" -- and I've run into a few.
Having said that, I've found tipping points are related to table size. But not merely raw table size, rather how big the "area of interest" is to a query. For example, in a table of over 3 million rows and about 40 columns, about three-quarters integers, most queries that would easily select a portion of them based on indices are fast. However, when one value in a query on one indexed column means two-thirds of the rows are now "interesting", the query is now about 5-times slower than normal. Lesson: try to arrange your data so such a scan isn't necessary.
However, such behaviour now gives you a size to look for. This size will be heavily dependant on your server setup, the MySQL server variables and the table's schema and data.
Similarly, I've seen reporting queries run in reasonable time (~45 seconds) if the period is two weeks, but take half-an-hour if the period is extended to four weeks.
Use slow query log that will help you to narrow down the queries you want to optimize.
For time critical queries it sometimes better to keep stable plan by using hints.
It sounds like you have a frustrating situation and maybe not the best code review process and development environment.
Whenever you add a new query to your code you need to check that it has the appropriate indexes ready and add those with the code release.
If you don't do that your second option is to constantly monitor the slow query log and then go beat the developers; I mean go add the index.
There's an option to enable logging of queries that didn't use an index which would be useful to you.
If there are some queries that "works and stops working" (but are "using and index") then it's likely that the query wasn't very good in the first place (low cardinality in the index; inefficient join; ...) and the first rule of evaluating the query carefully when it's added would apply.
For question #2 - On InnoDB "analyze table" is basically free to run, so if you have bad join performance it doesn't hurt to run it. Unless the balance of the keys in the table are changing a lot it's unlikely to help though. It almost always comes down to bad queries. "optimize table" rebuilds the InnoDB table; in my experience it's relatively rare that it helps enough to be worth the hassle of having the table unavailable for the duration (or doing the master-master failover stuff while it's running).