Magento fragmented tables - MySQL

After running mysqltuner on my Magento db it tells me
[!!] Total fragmented tables: 1203
and among the suggestions it says
Run OPTIMIZE TABLE to defragment tables for better performance
I just made a db backup, and I was wondering what the best way is to defragment the Magento db.
I saw on some forums that I can't use OPTIMIZE because the tables are InnoDB.
My questions are:
1) Which is the best command to optimize a Magento InnoDB database?
I can't do it one by one if there are 1203 tables.
2) Can I run the optimization on the whole database at once?

Bogus! That tool always complains about fragmented tables. Virtually all tables in the universe are "fragmented" to some level. Virtually no MySQL system suffers from fragmentation.
I have studied literally thousands of slow queries. Only two could be traced to fragmentation. The rest were solved by indexes, reformulation of queries, redesign of the schema, etc. Not defragmentation.
Most of the other output from that tool is reasonably good. Did anything else warrant a [!!]?
Is the system running slowly?
The best tool for diagnosing slow MySQL is the slowlog. Turn it on, set long_query_time to 2 (or less), wait a day, run pt-query-digest, show us the top 1 or 2 slow queries. With luck, we can give you a fix that will significantly speed up the system.
And if you want a more thorough analysis of the tunables, provide me with SHOW GLOBAL STATUS (after being up at least a day), SHOW VARIABLES and how much RAM you have.
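The workflow described above can be sketched as follows (the log file path is an assumption; adjust it for your system):

```sql
-- Turn on the slow query log at runtime (available on MySQL 5.1+)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';  -- assumed path
SET GLOBAL long_query_time = 2;  -- seconds; use a smaller value to catch more queries

-- After roughly a day of traffic, summarize from the shell:
--   pt-query-digest /var/log/mysql/mysql-slow.log > digest.txt

-- And gather the tuning inputs mentioned above:
SHOW GLOBAL STATUS;
SHOW VARIABLES;
```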

Related

Table partitioning takes too long with percona mysql

I'm trying to create a new partition (LIST) on an existing InnoDB table with 30 million records. The query has been running for almost 2 hours and is still in the "copy to tmp table" state.
I have used the Percona MySQL performance wizard to improve the configuration, and I don't see any difference. The server has no traffic at all.
It is running on an Ubuntu server with 16 cores, 30GB of memory and SSD storage (300/3000 IO), and it looks like MySQL is not using all the resources: memory usage sits at 9GB and only 3 cores are active, at very low load.
Is there a way to adjust the settings so it uses more resources and speeds up the query?
First of all, PARTITION BY LIST is virtually useless. Why do you think it might be worth doing?
Let's see SHOW CREATE TABLE. If there are a lot of secondary indexes, that could be the issue.
How big is innodb_buffer_pool_size? Sounds like it is not as big as it should be. (Recommend about 70% of available RAM.)
Let's see the SQL that is taking so long. There may be something subtle in it.
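As a quick sketch of those checks (database and table names are placeholders):

```sql
-- Look for a large number of secondary indexes
SHOW CREATE TABLE mydb.mytable\G

-- Check the buffer pool size in gigabytes
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

-- On a 30GB-RAM server, ~70% of RAM would be about 20GB. Note that
-- innodb_buffer_pool_size is only resizable at runtime from MySQL 5.7.5;
-- on older versions, change it in my.cnf and restart:
-- SET GLOBAL innodb_buffer_pool_size = 20 * 1024 * 1024 * 1024;
```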

What data quantity is considered too big for MySQL?

I am looking for a free SQL database able to handle my data model. The project is a production database running on a local network that is not connected to the internet, without any replication. The number of applications connected at the same time would be less than 10.
The data volume forecast for the next 5 years is:
3 tables of 100 million rows
2 tables of 500 million rows
20 tables with fewer than 10k rows
My first idea was to use MySQL, but I have found several articles around the web saying that MySQL is not designed for big databases. But what does "big" mean in this case?
Can someone tell me whether MySQL is able to handle my data model?
I read that Postgres would be a good alternative, but that it requires many hours of tuning to be efficient with big tables.
I don't think my project needs a NoSQL database.
I would like to know if anyone has experience to share regarding MySQL.
UPDATE
The database will be accessed by C# software (max 10 at the same time) and a web application (2-3 at the same time).
It is important to mention that only a few updates will be done on the big tables; they mostly receive insert queries. Delete statements will be run only a few times, on the 20 small tables.
The big tables are very often used in select statements, but most often just to check whether an entry exists, not to return grouped and ordered batches of data.
I work for Percona, a company that provides consulting and other services for MySQL solutions.
For what it's worth, we have worked with many customers who are successfully using MySQL with very large databases: terabytes of data, tens of thousands of tables, tables with billions of rows, transaction loads of tens of thousands of requests per second. You may get some more insight by reading some of our customer case studies.
You describe the number of tables and the number of rows, but nothing about how you will query these tables. Certainly one could query a table of only a few hundred rows in a way that would not scale well. But this can be said of any database, not just MySQL.
Likewise, one could query a table that is terabytes in size in an efficient way. It all depends on how you need to query it.
You also have to set specific goals for performance. If you want queries to run in milliseconds, that's challenging but doable with high-end hardware. If it's adequate for your queries to run in a couple of seconds, you can be a lot more relaxed about the scalability.
The point is that MySQL is not a constraining factor in these cases, any more than any other choice of database is a constraining factor.
Re your comments.
MySQL has referential integrity checks in its default storage engine, InnoDB. The claim that "MySQL has no integrity checks" is a myth often repeated over the years.
I think you need to stop reading superficial or outdated articles about MySQL, and read some more complete and current documentation.
MySQLPerformanceBlog.com
High Performance MySQL, 3rd edition
MySQL 5.6 manual
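The referential-integrity point can be demonstrated with a plain InnoDB foreign key (table names are illustrative):

```sql
CREATE TABLE parent (
    id INT PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
    id INT PRIMARY KEY,
    parent_id INT NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES parent(id)
) ENGINE=InnoDB;

-- Fails with error 1452 (foreign key constraint) because parent 99 does not exist:
INSERT INTO child VALUES (1, 99);
```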
MySQL has two important (and significantly different) database engines: MyISAM and InnoDB. The limits depend on usage. MyISAM is non-transactional; imports are relatively fast, but the engine is very simple (it has no data cache of its own) and JOINs on tables larger than 100MB can be slow due to the simple MySQL planner (hash joins are supported from 5.6). InnoDB is transactional and very fast for operations based on the primary key, but imports are slower.
Current versions of MySQL do not have as good a planner as Postgres has (though there is progress), so complex queries usually run much better on PostgreSQL, while really simple queries run better on MySQL.
The complexity of PostgreSQL configuration is a myth. It is much simpler than MySQL InnoDB configuration: you only have to set five parameters: max_connections, shared_buffers, work_mem, maintenance_work_mem and effective_cache_size. Almost all of them relate to the memory available to Postgres on the server. It usually takes about 5 minutes. In my experience, databases up to 100GB usually run without any problems on Postgres (and probably on MySQL too). There are two important factors: how much speed you expect, and how much memory and how fast an IO subsystem you have.
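As a rough illustration of those five parameters, a postgresql.conf fragment for a server with around 30GB of RAM might look like this (values are illustrative starting points, not a tuned recommendation):

```
max_connections = 100
shared_buffers = 8GB            # ~25% of RAM is a common starting point
work_mem = 64MB                 # per sort/hash operation, per connection
maintenance_work_mem = 1GB      # used by VACUUM, CREATE INDEX, etc.
effective_cache_size = 20GB     # planner hint: OS cache plus shared_buffers
```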
With large databases you need experience and knowledge of whatever database technology you use. Everything is fast while you are in memory; the higher the ratio of database size to memory, the more work you have to do to get good results.
First of all, MySQL's table size is limited only by the file size limit of your OS, which is in the terabytes on any modern OS. That poses no problem. More important are questions like these:
What kind of queries will you run?
Are the large table records updated frequently or basically archives for history data?
What is your hardware budget?
What is the kind of query speed you need?
Are you familiar with table partitioning, archive tables, config tuning?
How fast do you need to write (expected inserts per second)?
What language will you use to connect to the db (Java, .net, Ruby etc)
What platform are you most familiar with?
Will you run queries which might cause table scans, such as LIKE '%something%', which have to go through every single row and take forever?
MySQL is used by Facebook, Google, Twitter and others with large tables, and 100,000,000 rows is not much in the age of social media. MySQL has very few drawbacks (even though I prefer PostgreSQL in most cases); one example is altering a large table, such as adding a new index, which might send your company on a couple of days of forced vacation if you don't have a replica to fail over to in the meantime. Is there a reason why NoSQL is not an option? Sometimes hybrid approaches are a good choice, like keeping your relational business logic in MySQL and huge statistical tables in a NoSQL database like MongoDB, which can scale by adding new servers in minutes (MySQL can too, but it's more complicated). MongoDB can now have an indexed column which can be searched at blistering speed.
The bottom line: you need to answer the above questions first to make a well-informed decision. If you have huge tables and only search on indexed keys, almost any database will do; if you expect many changes to the structure down the road, you may want a different approach.
Edit:
Based on your update you just posted I doubt you would run into problems.

MySQL configuration for a data science workload?

All of the advice I find on the web for tuning MySQL for performance deals with production databases that have a high number of connections and many repeated queries. This is not my workload; instead, I'm doing data investigation with MySQL where I am the only user, the data doesn't change very often (bulk imports only), and the number of connections I might have at any given time is < 20. The data I have is largish (several hundred gigs, tables with 50M rows with a bunch of strings in them), but the queries I write are rarely run more than a few times each.
I have the O'Reilly Schwartz et al. book on MySQL and it has been a godsend for understanding how to make some things (like indices) work to my advantage. Yet I feel much less comfortable with the server parameters for this kind of workload, as I can find few examples on the web. Here are the non-stock (MySQL 5.5, Ubuntu) parameters I am running with:
max_heap_table_size=32G
tmp_table_size=32G
join_buffer_size=6G
innodb_buffer_pool_size=10G
innodb_buffer_pool_instances=2
sort_buffer_size=100M
My server is a multi-core (quad-core, which seems wasted on MySQL, but sometimes I'll hit it with a couple of queries at once) machine with 32GB of RAM. Right now it looks like MySQL is limiting itself to 12GB of RAM, likely because of innodb_buffer_pool_size. I set tmp_table_size and the heap size to be just fantastical because I had been doing some queries where I stored a lot in memory.
Are there any good resources to tune MySQL to this kind of workload? Are there suggestions on what parameters I should set for innodb?
I don't think you have to tune your InnoDB engine performance any more. The real performance gain will be in the way you structure tables, and the queries you write. Be sure that the columns you select on are indexed, sensible primary keys are chosen, etc. Tables with 50M rows shouldn't be a problem as long as you have a good primary key.
If you haven't run into any performance bottlenecks yet, then I think there is no reason to worry.
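A quick way to sanity-check the indexing advice above on any given query (table, column and index names here are placeholders):

```sql
-- See whether the query can use an index (look at the "key" column)
EXPLAIN SELECT * FROM measurements WHERE sensor_id = 42;

-- List the indexes that already exist on the table
SHOW INDEX FROM measurements;

-- Add one if the filtered column turns out to be unindexed
ALTER TABLE measurements ADD INDEX idx_sensor (sensor_id);
```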

MySQL Optimising Table Cache & tmp disk tables

I'm trying to optimise my MySQL database.
I've got around 90 tables most of which are hardly ever used.
Only 10 or so do the vast bulk of the work running my website.
MySQL status statistics show approximately 2M queries over 2.5 days and report "Opened_tables" of 1.7k (with Open_tables at 256). I have table_cache set at 256, increased from 32.
I presume most of the opened tables are either multiple instances of the same tables from different connections or some temporary tables.
In the same period it reports "Created_tmp_tables" of 19.1 k and more annoyingly Created_tmp_disk_tables of 5.7k. I have max_heap_table_size and tmp_table_size both set at 128M.
I've tried to optimise my indexes & joins as best I can, and I've tried to avoid BLOB and TEXT fields in the tables to avoid disk usage.
Is there anything you can suggest to improve things?
First of all, don't conclude your MySQL database is performing poorly based on these internal statistics. There's nothing wrong with tmp tables. In fact, queries involving ordering or summaries require their creation.
It's like trying to repair your vehicle after analyzing the amount of time it spent in second gear. Substantially less than 1% of your queries are generating tmp tables. That is good. That number is low enough that these queries might be for backups or some kind of maintenance operation, rather than production.
If you are having performance problems, you will know that because certain queries are working too slowly, and certain pages on your web app are slow. Can you figure out which queries have problems? There's a slow query log that might help you.
http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
You might try increasing tmp_table_size if you have plenty of RAM. Why not take it up a couple hundred megabytes and see if things get better? But they probably won't change noticeably.
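A minimal sketch of checking the tmp-table counters and raising the limits (the 256M value is only an example):

```sql
-- Compare in-memory vs on-disk temporary table counts
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Both limits apply together; the effective cap is the smaller of the two
SET GLOBAL tmp_table_size      = 256 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 256 * 1024 * 1024;
```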

How to predict MySQL tipping points?

I work on a big web application that uses a MySQL 5.0 database with InnoDB tables. Twice over the last couple of months, we have experienced the following scenario:
The database server runs fine for weeks, with low load and few slow queries.
A frequently-executed query that previously ran quickly will suddenly start running very slowly.
Database load spikes and the site hangs.
The solution in both cases was to find the slow query in the slow query log and create a new index on the table to speed it up. After applying the index, database performance returned to normal.
What's most frustrating is that, in both cases, we had no warning about the impending doom; all of our monitoring systems (e.g., graphs of system load, CPU usage, query execution rates, slow queries) told us that the database server was in good health.
Question #1: How can we predict these kinds of tipping points or avoid them altogether?
One thing we are not doing with any regularity is running OPTIMIZE TABLE or ANALYZE TABLE. We've had a hard time finding a good rule of thumb about how often (if ever) to manually do these things. (Since these commands LOCK tables, we don't want to run them indiscriminately.) Do these scenarios sound like the result of unoptimized tables?
Question #2: Should we be manually running OPTIMIZE or ANALYZE? If so, how often?
More details about the app: database usage pattern is approximately 95% reads, 5% writes; database executes around 300 queries/second; the table used in the slow queries was the same in both cases, and has hundreds of thousands of records.
The MySQL Performance Blog is a fantastic resource. Namely, this post covers the basics of properly tuning InnoDB-specific parameters.
I've also found that the PDF version of the MySQL Reference Manual to be essential. Chapter 7 covers general optimization, and section 7.5 covers server-specific optimizations you can toy with.
From the sound of your server, the query cache may be of IMMENSE value to you.
The reference manual also gives you some great detail concerning slow queries, caches, query optimization, and even disk seek analysis with indexes.
It may be worth your time to look into multi-master replication, allowing you to lock one server entirely and run OPTIMIZE/ANALYZE, without taking a performance hit (as 95% of your queries are reads, the other server could manage the writes just fine).
Section 12.5.2.5 covers OPTIMIZE TABLE in detail, and 12.5.2.1 covers ANALYZE TABLE in detail.
Update for your edits/emphasis:
Question #2 is easy to answer. From the reference manual:
OPTIMIZE:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows. [...] You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data table.
And ANALYZE:
ANALYZE TABLE analyzes and stores the key distribution for a table. [...] MySQL uses the stored key distribution to decide the order in which tables should be joined when you perform a join on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.
OPTIMIZE is good to run when you have the free time. MySQL optimizes well around deleted rows, but if you go and delete 20GB of data from a table, it may be a good idea to run this. It is definitely not required for good performance in most cases.
ANALYZE is much more critical. As noted, having the needed table data available to MySQL (provided with ANALYZE) is very important when it comes to pretty much any query. It is something that should be run on a common basis.
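In practice that boils down to something like this (the table name is a placeholder; on MySQL 5.0 both statements lock the table, so schedule them off-peak):

```sql
-- Refresh index statistics; cheap, worth running regularly
ANALYZE TABLE orders;

-- Rebuild the table and reclaim free space; expensive, run rarely
OPTIMIZE TABLE orders;
```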
Question #1 is a bit more of a trick. I would watch the server very carefully when this happens, namely disk I/O. My bet would be that your server is thrashing either your swap or the (InnoDB) caches. In either case, it may be query, tuning, or load related. Unoptimized tables could cause this. As mentioned, running ANALYZE can immensely help performance, and will likely help out too.
I haven't found any good way of predicting MySQL "tipping points" -- and I've run into a few.
Having said that, I've found tipping points are related to table size. But not merely raw table size; rather, how big the "area of interest" is to a query. For example, in a table of over 3 million rows and about 40 columns (about three-quarters of them integers), most queries that select a portion of the rows based on indices are fast. However, when one value in a query on one indexed column means two-thirds of the rows are now "interesting", the query becomes about five times slower than normal. Lesson: try to arrange your data so such a scan isn't necessary.
However, such behaviour now gives you a size to look for. This size will be heavily dependent on your server setup, the MySQL server variables, and the table's schema and data.
Similarly, I've seen reporting queries run in reasonable time (~45 seconds) if the period is two weeks, but take half-an-hour if the period is extended to four weeks.
Use the slow query log; it will help you narrow down the queries you want to optimize.
For time-critical queries, it is sometimes better to keep a stable plan by using index hints.
It sounds like you have a frustrating situation and maybe not the best code review process and development environment.
Whenever you add a new query to your code you need to check that it has the appropriate indexes ready and add those with the code release.
If you don't do that your second option is to constantly monitor the slow query log and then go beat the developers; I mean go add the index.
There's an option to enable logging of queries that didn't use an index which would be useful to you.
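That option can be enabled alongside the slow log; on recent MySQL versions it can be set at runtime (on very old versions, put log-queries-not-using-indexes in my.cnf and restart):

```sql
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL log_queries_not_using_indexes = 'ON';
```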
If there are queries that "work and then stop working" (but are "using an index"), it's likely that the query wasn't very good in the first place (low cardinality in the index, inefficient join, ...), and the first rule of evaluating the query carefully when it's added would apply.
For question #2: on InnoDB, ANALYZE TABLE is basically free to run, so if you have bad join performance it doesn't hurt to run it. Unless the balance of the keys in the table is changing a lot, it's unlikely to help, though. It almost always comes down to bad queries. OPTIMIZE TABLE rebuilds the InnoDB table; in my experience it's relatively rare that it helps enough to be worth the hassle of having the table unavailable for the duration (or doing the master-master failover stuff while it's running).