MySQL memory usage for large database

We have a MySQL DB (OLD_DATA) in which many partitioned tables have grown past 100GB. To improve server performance we thought of creating a parallel DB (NEW_DATA) and collecting new data there.
The DBs are MyISAM and the server has 96GB RAM.
After this, OLD_DATA will not be accessed.
Will this approach help improve server performance in terms of RAM and CPU usage?
Will the data from OLD_DATA be loaded into memory?

Please provide SHOW CREATE TABLE. If it is, as you hinted, PARTITIONed, then we need to factor that into the analysis.
Generally if "old" rows are no longer accessed in any way, they do not hurt, and there would be no need to do what you did. Please elaborate on the queries that you feel are "slow", preferably by providing EXPLAIN SELECT.
MyISAM caches index blocks in the "key_buffer"; how big are the indexes? What is key_buffer_size set to? Data blocks are cached by the OS.
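If you don't know those numbers offhand, a quick way to check (index sizes from information_schema are approximate):

SHOW VARIABLES LIKE 'key_buffer_size';

SELECT SUM(index_length) / 1024 / 1024 / 1024 AS index_gb
FROM information_schema.TABLES
WHERE engine = 'MyISAM';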
It is quite 'normal' for a system to have a dataset many times as big as RAM. Performance depends on the 'working set' of the dataset and on how actively you are querying the data. You have implied that "old" rows are not used, hence not part of the 'working set'.
On the other hand, if you have "table scans", the entire table is part of the working set. It is quite possible that we can advise on how to change those queries to be nicer.

how to tune MySQL 5.6 database

We're developing an enterprise product that uses MySQL 5.6 on Windows to store reports generated by multiple clients. Our database contains approximately 20 tables, each holding from a few hundred thousand to several million records. All tables have more than 10 columns with a mix of numeric and textual data. All tables use the InnoDB engine with a numeric field as the primary key, and each is also indexed on another numeric field, different from the primary key.
About 10 connections are used to merge new data into the database. The data is viewed via a web console, and there is no limit as such on the number of viewers. We also don't have any foreign keys between tables, so we don't use joins.
We haven't created stored procedures for fetching data. Do stored procedures really improve performance?
While searching for solutions on the internet, I found that changing values such as innodb_buffer_pool_size, read_rnd_buffer_size, and sort_buffer_size in my.ini/my.cnf can improve performance as well as minimize MySQL's memory requirements. But I am not confident about it, because I don't know what the proper values should be or what the side effects are. Currently I have kept the default configuration. Please let me know which values can be changed in the configuration file to improve performance and minimize memory requirements without side effects.
I would also like to know other ways to optimize and fine-tune the MySQL engine to boost performance while using resources optimally.
Minimum software/hardware requirements for our product are:
OS : Windows 2000 SP4 Professional / Server and later.
RAM : 1 GB
CPU : 1 GHz.
There are no predefined performance values for any database; if there were, they would already be used in the default configuration.
So first think about schema design. If your schema is designed well, you have already resolved half the issues.
From your description the schema looks good, as you have already applied indexes, avoided joins, etc.
First try your application with the default settings provided by the MySQL engine.
Then carry out performance tests of your application's critical/widely used/resource-intensive workflows. (The definition of these terms varies per application.)
If you find performance bottlenecks on the database side, then try the things below (see the sketch after the list):
Check the heavy queries used by the application and optimize them.
If queries cannot be rewritten, make them faster using indexes.
Carry out regular maintenance of the database (defragmentation, statistics updates, etc.).
If you see buffer pool shortages, sort overflows, or read-buffer pressure, then tune those settings accordingly; this is not a one-time job.
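A minimal sketch of the first couple of steps, using hypothetical table and column names:

-- Find heavy queries: log anything slower than 1 second
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;

-- Inspect a candidate's plan; "type: ALL" means a full table scan
EXPLAIN SELECT report_id, status FROM reports WHERE client_id = 42;

-- If so, an index on the filtered column usually helps
ALTER TABLE reports ADD INDEX idx_client (client_id);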
Performance tuning is a difficult job that requires a lot of patience and expertise.
And yes, stored procedures generally help performance. Apply them whenever you can move heavy code logic into the database; a minimal sketch follows.
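For example, a procedure that moves a fetch into the database (all names here are hypothetical):

DELIMITER //
CREATE PROCEDURE get_client_reports(IN p_client_id INT)
BEGIN
  SELECT * FROM reports WHERE client_id = p_client_id;
END //
DELIMITER ;

CALL get_client_reports(42);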
Hope this was useful.

MySQL configuration for a data science workload?

All the advice I find on the web for tuning MySQL for performance deals with production databases that have a high number of connections and many repeated queries. This is not my workload; instead, I'm doing data investigation with MySQL where I am the only user, the data doesn't change very often (bulk imports only), and the number of connections I might have at any given time is < 20. The data I have is largish (several hundred gigs, tables with 50M rows and a bunch of strings in them), but the queries I write are rarely run more than a few times each.
I have the O'Reilly Schwartz et al. book on MySQL and it has been a godsend for understanding how to make some things (like indices) work to my advantage. Yet I feel much less comfortable with the server parameters for this kind of workload, as I can find few examples on the web. Here are the non-stock (MySQL 5.5, Ubuntu) parameters I am running with:
max_heap_table_size=32G
tmp_table_size=32G
join_buffer_size=6G
innodb_buffer_pool_size=10G
innodb_buffer_pool_instances=2
sort_buffer_size=100M
My server is a multi-core machine (quad-core, which seems wasted on MySQL, though sometimes I'll run a couple of queries at once) with 32GB of RAM. Right now it looks like MySQL is limiting itself to 12GB of RAM, likely because of innodb_buffer_pool_size. I set tmp_table_size and max_heap_table_size to fantastical values because I had been doing some queries where I stored a lot in memory.
Are there any good resources to tune MySQL to this kind of workload? Are there suggestions on what parameters I should set for innodb?
I don't think you have to tune your InnoDB engine performance any more. The real performance gain will be in the way you structure tables, and the queries you write. Be sure that the columns you select on are indexed, sensible primary keys are chosen, etc. Tables with 50M rows shouldn't be a problem as long as you have a good primary key.
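For instance, with hypothetical table and column names, EXPLAIN shows whether a lookup can use an index:

EXPLAIN SELECT name, value FROM observations WHERE subject_id = 123;
-- "type: ALL" means a full scan of all rows;
-- "type: ref" with a key listed means the index is being used

ALTER TABLE observations ADD INDEX idx_subject (subject_id);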
If you haven't run into any performance bottlenecks yet, then I think there is no reason to worry.

When does a slow MySQL query on a given connection affect other connections?

I think I have a basic understanding of this, but am hoping that someone can give me more details as I am interested in learning more about database performance.
Let's say I have a very large database with many millions of entries, and the database supports many connections. Simple queries on the database will be slow since there's so much data. I'm trying to understand exactly when a query on one connection starts to have a direct effect on the performance of queries running on other connections.
If one connection locks some rows, I understand that this will hold up queries on other connections that need those rows. For example, doing:
SELECT ... FOR UPDATE
will lock what you are selecting.
What happens when you do something simple like:
SELECT COUNT(*) FROM myTable
Let's say we have a table with a billion rows, so running the count is going to take some time (running on InnoDB). Will it affect queries running on other connections?
What if you select a large amount of data using SELECT and JOIN, like:
SELECT * FROM myTable1 JOIN myTable2 ON myTable1.id = myTable2.id;
does having a join lock anything for other queries?
I'm finding it hard to know which queries will have a direct effect on the performance of queries running on other connections.
Thanks
There are different angles:
Row locking: this shouldn't happen if you tune your architecture, so you should forget about it.
Real performance issues and bottlenecks; in your case, collateral effects.
About this second point, the problem is mainly divided into 3 areas:
Disk reads
Memory usage (buffer)
CPU usage.
About disk reads: the more data (in bytes) you retrieve, the busier the hard drive gets, and the more it slows down any other activity using it. Reduce the size of the selected rows to avoid disk overhead.
About memory usage: MySQL manages internal buffers that can become a bottleneck in some situations. I don't know enough about them to give you a proper answer, but I know this is definitely something you should keep an eye on.
About CPU usage: basically, the CPU gets busy when it
has to calculate (joins, preparing statements, arithmetic...)
has to do all the peripheral work: moving bytes from disk to memory, for instance.
Optimize your queries to reduce CPU overhead. (Sounds silly, but it always turns out to be the problem anyway...)
So, how do you know when there's a collateral effect? By profiling your hardware...
How to profile?
Absolute profiling: use SHOW ENGINE INNODB STATUS or SHOW PROFILE to get useful information about MySQL's main disk, CPU, and memory counters (see the sketch after this list).
Relative profiling: use your favorite OS profiler. Under Windows, for instance, you can use the great perfmon.exe and watch the PRIVATE BYTES and VIRTUAL BYTES of the MySQL process. I say relative because, after all, if a query is time-consuming on your computer, it might not be on a NASA system...
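A minimal server-side starting point (the COUNT query is just an example):

SHOW ENGINE INNODB STATUS;    -- buffer pool activity, pending I/O, lock waits
SHOW FULL PROCESSLIST;        -- what every connection is doing right now

SET profiling = 1;            -- per-query stage timings (MySQL 5.x)
SELECT COUNT(*) FROM myTable;
SHOW PROFILES;                -- recent queries with their durations
SHOW PROFILE FOR QUERY 1;     -- stage-by-stage breakdown of query 1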
Hope it helps, regards.
This is a very general question, so giving a precise answer is difficult.
You can think of the database as a pool of shared resources, especially because the underlying hardware your database runs on has physical limits. Most often, when you see something like a SELECT query impacting other queries, it's because they're all competing for those underlying physical resources, like disk IO, RAM access, or CPU time, and there isn't enough to go around.
So the actual results you will see depend heavily on your database's physical hardware and configuration settings.
For instance, in your SELECT examples the variables might be: Is the data the query needs already in RAM? Can it look up the rows efficiently by an index? If it does have to do IO, how many other queries are asking to read data from disk? Is it using a secondary index and having to do multiple reads? Is the database doing read-ahead to buffer other pages? Is the query causing sequential or random IO? Are any updates holding locks on the data? How much read IO can the physical hardware support?
You would have to answer all those questions for all queries currently executing to know if they're going to affect the performance of other queries.
This is why DBAs exist. Busy databases are complex systems; it's all about the interaction of a great many different operations, all with thousands of possible variables affecting them.
So what you generally do is optimize the things you can control as well as you know how (hardware, MySQL configuration, schema, and indexes), then start measuring the system as it runs to understand what is actually going on.
So in your case, I would say that it's infinitely more helpful to focus on simply optimizing your queries individually. The faster they execute, the less resources they are probably using and the less chance they have of impacting others. Then you learn to analyze the system. Just look at one thing that's slow and ask "why is this slow?" Then fix it. That's the optimization process.
However, in the first case you wrote, with SELECT ... FOR UPDATE, explicit locks can and will be big performance issues. Be careful with those.
Plain read queries are affected only by the isolation levels of other queries; they themselves never block the table.
Isolation levels are transactional safety modes. If another query uses locking and your isolation level does not allow dirty reads, your reads will be held until the other query finishes writing or releases its locks.
MVCC is a mechanism that allows a database to create a new version of the data when it needs to update or delete. This means that when you start a read on the current version of the data, the data won't get tainted by future updates or deletes.
When you start a write on current data, even though that data is being read by another process, you are in fact writing the new rows somewhere else and marking them as the newest version. In the end that means no blocking for the writing process (at least not because of the reading process).
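A two-session sketch of that behaviour under InnoDB's default REPEATABLE READ isolation level (the table is hypothetical):

-- Session A
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;  -- reads version V1

-- Session B, concurrently (not blocked by A's read)
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;                                     -- creates version V2

-- Session A, again
SELECT balance FROM accounts WHERE id = 1;  -- still sees V1: a consistent snapshot
COMMIT;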

How do I make a MySQL database run completely in memory?

I noticed that my database server supports the Memory database engine. I want to take a database I have already built on InnoDB and run it completely in memory, for performance.
How do I do that? I explored PHPMyAdmin, and I can't find a "change engine" functionality.
Assuming you understand the consequences of using the MEMORY engine as mentioned in comments, and here, as well as some others you'll find by searching about (no transaction safety, locking issues, etc) - you can proceed as follows:
MEMORY tables are stored differently than InnoDB, so you'll need to use an export/import strategy. First dump each table separately to a file using SELECT * FROM tablename INTO OUTFILE 'table_filename'. Create the MEMORY database and recreate the tables you'll be using with this syntax: CREATE TABLE tablename (...) ENGINE = MEMORY;. You can then import your data using LOAD DATA INFILE 'table_filename' INTO TABLE tablename for each table.
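Put together, the round trip for one table might look like this (the table, columns, and file path are hypothetical; note that MEMORY tables cannot hold BLOB or TEXT columns):

SELECT * FROM mytable INTO OUTFILE '/tmp/mytable.txt';

CREATE TABLE memdb.mytable (
  id INT NOT NULL PRIMARY KEY,
  val VARCHAR(255)      -- MEMORY supports VARCHAR, but not BLOB/TEXT
) ENGINE = MEMORY;

LOAD DATA INFILE '/tmp/mytable.txt' INTO TABLE memdb.mytable;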
It is also possible to place the MySQL data directory on a tmpfs, thus speeding up database reads and writes. It might not be the most efficient way to do this, but sometimes you can't just change the storage engine.
Here is my fstab entry for my MySQL data directory
none /opt/mysql/server-5.6/data tmpfs defaults,size=1000M,uid=999,gid=1000,mode=0700 0 0
You may also want to take a look at the innodb_flush_log_at_trx_commit=2 setting. Maybe this will speed up your MySQL sufficiently.
innodb_flush_log_at_trx_commit changes MySQL's disk-flush behaviour. When set to 2, the log buffer is flushed to disk only about once per second. By default (1), every transaction commit causes a flush, and thus more IO load.
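The setting is dynamic, so you can try it without a restart:

SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- trade-off: up to about one second of committed transactions
-- can be lost if the server crashes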
Memory Engine is not the solution you're looking for. You lose everything that you went to a database for in the first place (i.e. ACID).
Here are some better alternatives:
Don't use joins - very few large apps do (e.g. Google, Flickr, Netflix), because joins perform poorly over large data sets.
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.
- The MySQL Manual
Make sure the columns you're querying against have indexes. Use EXPLAIN to confirm they are being used.
Use and increase your query cache and the memory for your indexes, to get them in memory and to store frequent lookups.
Denormalize your schema, especially for simple joins (e.g. getting fooId from barMap).
The last point is key. I used to love joins, but then had to run joins on a few tables with 100M+ rows. No good. You're better off inserting the data you're joining against into the target table (if it's not too much) and querying against indexed columns; you'll get your result in a few milliseconds.
I hope those help.
If your database is small enough (or if you add enough memory), your database will effectively run in memory, since your data will be cached after the first request.
Changing the database table definitions to use the memory engine is probably more complicated than you need.
If you have enough memory to load the tables into memory with the MEMORY engine, you have enough to tune the InnoDB settings to cache everything anyway.
"How do I do that? I explored PHPMyAdmin, and I can't find a "change engine" functionality."
In direct response to this part of your question, you can issue an ALTER TABLE tbl ENGINE=MEMORY; and it'll recreate the table in the new engine.
In place of the Memory storage engine, one can consider MySQL Cluster. It is said to give similar performance but to support disk-backed operation for durability. I've not tried it, but it looks promising (and has been in development for a number of years).
You can find the official MySQL Cluster documentation here.
Additional thoughts:
Ramdisk - set the temp directory MySQL uses to a RAM disk; very easy to set up.
memcached - a memcached server is easy to set up; use it to store the results of your queries for X amount of time.

Will a MySQL table with 20,000,000 records be fast with concurrent access?

I ran a lookup test against an indexed MySQL table containing 20,000,000 records, and according to my results, it takes 0.004 seconds to retrieve a record given an id, even when joining against another table containing 4,000 records. This was on a 3GHz dual-core machine, with only one user (me) accessing the database. Writes were also fast, as this table took under ten minutes to create all 20,000,000 records.
Assuming my test was accurate, can I expect performance to be as snappy on a production server, with, say, 200 users concurrently reading from and writing to this table?
I assume InnoDB would be best?
That depends on the storage engine you're going to use and what the read/write ratio is.
InnoDB will be better if there are lots of writes. If it's mostly reads with a very occasional write, MyISAM might be faster. MyISAM uses table-level locking, so it locks up the whole table whenever you need to update. InnoDB uses row-level locking, so you can have concurrent updates on different rows.
InnoDB is definitely safer, so I'd stick with it anyhow.
BTW, remember that RAM is very cheap right now, so buy a lot.
Depends on any number of factors:
Server hardware (especially RAM)
Server configuration
Data size
Number of indexes and index size
Storage engine
Writer/reader ratio
I wouldn't expect it to scale that well. More importantly, this kind of thing is too important to speculate about. Benchmark it and see for yourself.
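Rather than guessing, you can simulate the load: mysqlslap ships with MySQL and runs a query from many concurrent clients. A hypothetical run (the schema name, table, and id are only examples):

mysqlslap --concurrency=200 --iterations=10 \
  --create-schema=mytestdb \
  --query="SELECT * FROM records WHERE id = 12345"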
Regarding the storage engine, I wouldn't dare use anything but InnoDB for a table of that size that is both read from and written to. With MyISAM, any write query that isn't a primitive insert or a single-row update ends up locking the whole table, which yields terrible performance.
There's no reason MySQL couldn't handle that kind of load without any significant issues. There are a number of other variables involved, though (otherwise, it's a 'how long is a piece of string' question). Personally, I've had a number of tables in various databases that are well beyond that range.
How large is each record (on average)?
How much RAM does the database server have, and how much of it is allocated to the various parts of MySQL/InnoDB?
A default configuration may only allow for a default 8MB buffer between disk and client (which might work fine for a single user), but trying to fit a 6GB+ database through that is doomed to failure. That problem was real, BTW, and was causing several crashes a day of a database/website until I was brought in to troubleshoot it.
If you are likely to do a great deal more with that database, I'd recommend getting someone with a little more experience, or at least doing what you can to apply some optimisations. Reading 'High Performance MySQL, 2nd Edition' is a good start, as is looking at tools like Maatkit.
As long as your schema design and DAL are constructed well enough, you understand query optimization inside out, can adjust all the server configuration settings at a professional level, and have "enough" hardware properly configured, yes (except for sufficiently pathological cases).
Same answer for both engines.
You should probably perform a load test to verify, but as long as the indexes are created properly (meaning they are matched to your query statements), the SELECT queries should perform at an acceptable speed (the INSERTs and/or UPDATEs may be more of a speed issue, though, depending on how many indexes you have and how large they get).