MySQL thread_concurrency, innodb_thread_concurrency, should I use or not? - mysql

I have a dedicated server with 24 CPUs and 32GB of ram.
This server serves website and mysql.
I don't know what is the difference between those two variables, if there is any.
I don't know if I should use them, because from what I've read on Google, those variables might be ignored depending on the OS or the MySQL version.
So should I use them?

Please read the MySQL Performance Blog carefully, select decent initial values, monitor the performance of your server during the busy hours of the day, and tune accordingly.
There are no simple answers, because your workload is uniquely yours.
Off the top of my head, your balance of CPU and RAM seems wrong. I'd expect 1-4 cores for 64GB of RAM, or 24 cores with the maximum RAM you can get, perhaps 192GB. CPU needs to be provisioned for query rate, while RAM needs to be provisioned for the active/hot dataset size. I can imagine an unusual workload where your CPU/RAM ratio makes sense, but I'm not sure InnoDB is in fact the best solution for such a workload.
Coming back to your question: "thread_concurrency doesn't do what you expect"; in short, you most likely should not use it. innodb_thread_concurrency is just a cutoff; I'd say if your workload is all hot (i.e. MySQL doesn't hit disk much), it should not be higher than the number of cores. Do read up on the blog; these settings are not as simple as they seem.
Also you may want to pay attention to: thread_cache_size, innodb_buffer_pool_size, innodb_additional_mem_pool_size, max_heap_table_size, sort_buffer_size / key_buffer_size, innodb_flush_log_at_trx_commit, innodb_log_file_size. And probably a few more I can't think of right now.
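Those shorthand names map to my.cnf settings like the following (5.5-era variable names). The values here are placeholders to illustrate the shape of the file, not recommendations; every one of them needs to be tuned against your own workload:

```ini
# Illustrative my.cnf fragment only; all values are placeholders.
[mysqld]
thread_cache_size               = 16
innodb_buffer_pool_size         = 16G   # leave room for the web stack on a shared box
innodb_additional_mem_pool_size = 16M
max_heap_table_size             = 64M
tmp_table_size                  = 64M
sort_buffer_size                = 2M    # allocated per connection; keep it small
key_buffer_size                 = 256M  # MyISAM indexes only
innodb_flush_log_at_trx_commit  = 1     # 1 = durable; 2 or 0 trade safety for speed
innodb_log_file_size            = 256M
```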

The thread_concurrency option in MySQL applies mainly to Solaris systems and is deprecated as of version 5.6, so tuning it is likely a waste of time.
thread_concurrency
Also please read: https://www.percona.com/blog/2012/06/04/thread_concurrency-doesnt-do-what-you-expect/
innodb_thread_concurrency can be adjusted for performance, but I've found no performance increase from using it.
I found the best information at https://www.percona.com/blog/. Advice from other sources, applied blindly, can render MySQL barely operable.

(According to the manual, the "thread_concurrency" variable is usable only on Solaris.)

This will depend on a number of issues, the operating system, the scheduler options, the I/O subsystem, and the number and type of CPUs, as well as the type and number of queries being run.
The only way you can tell for certain on your system is to adjust the value of innodb_thread_concurrency and benchmark typical workloads. A reasonable starting point is twice the number of CPU cores available (48 in your case, or 0 for no limit). You can then increase it until you start to see the system become CPU bound, then throttle it back a bit.
This doesn't take into account the disk activity that your transactions will generate; from there you can look at disk I/O and make further adjustments.
Setting this to 0 means unlimited, so that by default there is no limit on the number of concurrently executing threads:
http://dev.mysql.com/doc/refman/5.5/en/innodb-performance-thread_concurrency.html
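The benchmarking loop described above might be sketched like this. It only generates the candidate values and statements to try; the actual mysql invocation is commented out (it would need the SUPER privilege), and the workload command is a placeholder for your own benchmark:

```shell
# Generate candidate innodb_thread_concurrency values to benchmark,
# from 0 (no limit) up to 2x the core count (2 x 24 = 48 here).
CORES=${CORES:-24}
MAX=$((CORES * 2))
for c in 0 "$CORES" "$MAX"; do
  echo "SET GLOBAL innodb_thread_concurrency = $c;"
  # mysql -e "SET GLOBAL innodb_thread_concurrency = $c"   # apply (needs SUPER)
  # ...run your typical workload here and record throughput/latency...
done
```

Each pass should run long enough to be representative; compare throughput and latency across the candidate values before settling on one.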

Related

what is the issue with rapidminer hbos process ?

I have a dataset of more than 1 million transactions, running the HBOS algorithm on Windows with 32 GB of RAM.
The issue is that we are getting an Out of Memory error.
Can anyone help?
The HBOS algorithm can be quite memory-intensive, and the memory needed grows with the number of attributes used. So first of all, reducing the number of attributes might help.
But I couldn't reproduce your error. Perhaps you should reduce the max memory used by RapidMiner (under Settings -> Preferences -> System). The JVM always needs a slight overhead, so running RapidMiner with 30GB max memory should be safe.
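The headroom arithmetic behind that suggestion can be made explicit. The 2 GB overhead figure below is an assumption (the JVM and OS need some room beyond the heap), not a RapidMiner-documented value:

```shell
# Pick a max-heap size by leaving headroom for the OS and JVM overhead.
TOTAL_GB=32
OVERHEAD_GB=2   # assumed overhead for OS + JVM bookkeeping
HEAP_GB=$((TOTAL_GB - OVERHEAD_GB))
echo "Set RapidMiner max memory to ${HEAP_GB} GB (i.e. -Xmx${HEAP_GB}g)"
```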

How to correctly calculate RAM requirement for a bucket in Couchbase

We have a bucket of about 34 million items in a Couchbase cluster setup of 6 AWS nodes. The bucket has been allocated 32.1GB of RAM (5482MB per node) and is currently using 29.1GB. If I use the formula provided in the Couchbase documentation (http://docs.couchbase.com/admin/admin/Concepts/bp-sizingGuidelines.html) it should use approx. 8.94GB of RAM.
Am I calculating it incorrectly? Below is link to google spreadsheet with all the details.
https://docs.google.com/spreadsheets/d/1b9XQn030TBCurUjv3bkhiHJ_aahepaBmFg_lJQj-EzQ/edit?usp=sharing
Assuming that you indeed have a working set of 0.5%, which as Kirk pointed out in his comment, is odd but not impossible, then you are calculating the result of the memory sizing formula correctly. However, it's important to understand that the formula is not a hard and fast rule that fits all situations. Rather, it's a general guideline and serves as a good starting point for you to go and begin your performance tests. Also, keep in mind that the RAM sizing isn't the only consideration for deciding on cluster size, because you also have to consider data safety, total disk write throughput, network bandwidth, CPU, how much a single node failure affects the rest of the cluster, and more.
Using the result of the RAM sizing formula as a starting point, you should now actually test whether your working assumptions were correct. That means putting real (or closely representative) load on the bucket and seeing whether the percentage of cache misses is low enough and the operation latency is within your acceptable limits. There is no general rule for this; what's acceptable to some applications might be too slow for others.
Just as an example: if you see that under load your cache miss ratio is 5%, and while the average read latency is 3ms the top 1% latency is 100ms, then you have to consider whether having one out of every 100 reads take that much longer is acceptable in your application. If it is, great; if not, you need to start increasing the RAM size until it matches your actual working set. Similarly, you should keep an eye on disk throughput, CPU usage, etc.
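For reference, the general shape of that sizing arithmetic looks like this. Every constant below is an assumption for illustration (average key/value sizes, metadata bytes per document, replica count, headroom all vary by deployment and Couchbase version); only the item count comes from the question, so the result will not match the 8.94GB figure exactly:

```shell
# Sketch of the Couchbase RAM sizing arithmetic: metadata stays fully
# resident, while only the working-set fraction of the data does.
RAM_GB=$(awk 'BEGIN {
  docs     = 34e6      # items in the bucket (from the question)
  key      = 40        # avg key size in bytes (assumed)
  value    = 300       # avg value size in bytes (assumed)
  meta     = 56        # metadata bytes per document (version-dependent)
  copies   = 2         # 1 active + 1 replica (assumed)
  ws       = 0.005     # working set: 0.5% of data kept in RAM
  headroom = 1.30      # ~30% overhead (assumed)
  metadata = docs * (key + meta) * copies
  dataset  = docs * (key + value) * copies
  printf "%.2f", (metadata + dataset * ws) * headroom / 2^30
}')
echo "estimated cluster RAM: ${RAM_GB} GB"
```

Plugging in your own measured key/value sizes and the per-version constants from the sizing guideline is what makes the result meaningful.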

swap space used while physical memory is free

I recently migrated between 2 servers (the newer one has lower specs), and it freezes all the time even though there is no load on the server. Below are my specs:
HP DL120G5 / Intel Quad-Core Xeon X3210 / 8GB RAM
free -m output:
total used free shared buffers cached
Mem: 7863 7603 260 0 176 5736
-/+ buffers/cache: 1690 6173
Swap: 4094 412 3681
As you can see, there is 412 MB used in swap while almost 80% of the physical RAM is available.
I don't know if this should cause any trouble, but almost no swap was used on my old server, so this does not seem right to me.
I have a cPanel license, so I contacted their support, and they noted that I have high iowait. Indeed, when I ran sar I noticed it sometimes exceeds 60%; most often it's around 20%, but sometimes it reaches 60% or even 70%.
I don't really know how to diagnose that. I suspected my drive was slow and causing the latency, so I ran a test using dd and the speed was 250 MB/s, so I think the transfer speed is OK; plus the hardware is supposed to be brand new.
The high load usually happens when I use gzip or tar to extract files (backing up or restoring a cPanel account).
One important thing to mention is that top reports MySQL using 100% to 125% of the CPU, and sometimes much more. If I trace the mysql process, I keep getting this error continually:
setsockopt(376, SOL_IP, IP_TOS, [8], 4) = -1 EOPNOTSUPP (Operation not supported)
I don't know what that means, nor did I find useful information googling it.
I forgot to mention, for what it's worth, that it's a web hosting server, so it has the standard web hosting setup (Apache, PHP, MySQL, etc.).
So how do I properly diagnose this issue and find a solution, or what might the possible causes be?
As you may have realized by now, the free -m output shows 7603MiB (~7.6GiB) USED, not free.
You're out of memory and it has started swapping which will drastically slow things down. Since most applications are unaware that the virtual memory is now coming from much slower disk, the system may very well appear to "hang" with no feedback describing the problem.
From your description, the first process I'd kill in order to regain control would be MySQL. If you have ssh/rsh/telnet connectivity to this box from another machine, you may have to log in from there in order to get a usable command line to kill from.
My first thought (hypothesis?) for what's happening is...
MySQL is trying to do something that is not supported as this machine is currently configured. It could be a missing library, an unset environment variable, or any number of things.
That operation allocates some memory, but it keeps failing and not cleaning up the allocation when it does. If this were a shell script, it could be fixed by putting an event trap command at the beginning that runs a function to release memory and clean up.
The code is written to keep retrying on failure, so it rapidly uses up all your memory. Referring back to the shell script illustration, the trap function might also prompt to see if you really want to keep retrying.
Not a complete answer but hopefully will help.
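As a side note, the relationship between the two "used" figures in the free -m output above can be reproduced directly; the first line counts kernel buffers and page cache as used, and the "-/+ buffers/cache" line subtracts them back out (matching up to rounding):

```shell
# Decompose the first "used" column of free -m using the numbers
# pasted in the question (all values in MiB).
USED=7603; BUFFERS=176; CACHED=5736
APP_USED=$((USED - BUFFERS - CACHED))
echo "memory actually held by applications: ${APP_USED} MiB"
```

Keeping this distinction in mind helps decide whether the box is genuinely out of memory or whether the page cache is simply doing its job.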

What is the cost of memory access?

We like to think that a memory access is fast and constant, but on modern architectures/OSes, that's not necessarily true.
Consider the following C code:
int i = 34;
int *p = &i;
// do something that may or may not involve i and p
{...}
// 3 days later:
*p = 643;
What is the estimated cost of this last assignment in CPU instructions, if
i is in L1 cache,
i is in L2 cache,
i is in L3 cache,
i is in RAM proper,
i is paged out to an SSD disk,
i is paged out to a traditional disk?
Where else can i be?
Of course the numbers are not absolute, but I'm only interested in orders of magnitude. I tried searching the webs, but Google did not bless me this time.
Here are some hard numbers demonstrating that exact timings vary from one CPU family and version to another: http://www.agner.org/optimize/
These numbers are a good guide:
L1 1 ns
L2 5 ns
RAM 83 ns
Disk 13700000 ns
And as an infograph to give you the orders of magnitude:
(src http://news.ycombinator.com/item?id=702713)
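Scaling that table to L1 = 1 makes the orders of magnitude explicit:

```shell
# Relative cost factors computed from the latency table above (ns).
awk 'BEGIN {
  l1 = 1; l2 = 5; ram = 83; disk = 13700000
  printf "L2 = %dx L1, RAM = %dx L1, disk = %dx RAM\n", l2/l1, ram/l1, disk/ram
}'
```

In other words, a cache miss to RAM costs roughly two orders of magnitude over L1, and a page-in from a spinning disk costs five to six orders of magnitude over RAM.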
Norvig has some values from 2001. Things have changed some since then but I think the relative speeds are still roughly correct.
It could also be in a CPU register. The C/C++ keyword "register" suggests to the compiler that the variable be kept in a register, but you can't guarantee it will stay there, or even ever get there.
As long as the cache/RAM/hard disk/SSD is not busy serving other accesses (e.g. DMA requests) and the hardware is reasonably reliable, the cost is still constant (though it may be a large constant).
When you get a cache miss and have to page in from the hard disk to read the variable, it's just a simple hard disk read request. This cost is huge, as the CPU has to: send an interrupt to the kernel for the hard disk read request, send a request to the hard disk, wait for the hard disk to write the data to RAM, then read the data from RAM into cache and into a register. However, this cost is still a constant cost.
The actual numbers and proportions will vary depending on your hardware and on how well its components match (e.g. if your CPU runs at 2000 MHz and your RAM sends data at 333 MHz, they don't sync very well). The only way you can figure this out is to test it in your program.
And this is not premature optimization, this is micro-optimization. Let the compiler worry about these kind of details.
These numbers change all the time. But for rough estimates for 2010, Kathryn McKinley has nice slides on the web, which I don't feel compelled to copy here.
The search term you want is "memory hierarchy" or "memory hierarchy cost".
Where else can i be?
p and *p are different things, and both of them can be located in any of the locations in your list. The pointer address might additionally still be stored in a CPU register when the assignment is made, so it doesn't need to be fetched from RAM/cache/…
Regarding performance: this is highly CPU-dependent. Thinking in orders of magnitude, accessing RAM is worse than accessing cache entries, and accessing swapped-out pages is the worst. All are somewhat unpredictable because they depend on other factors as well (e.g. other processors, depending on the system architecture).

How to measure current load of MySQL server?

How to measure current load of MySQL server? I know I can measure different things like CPU usage, RAM usage, disk IO etc but is there a generic load measure for example the server is at 40% load etc?
mysql> SHOW GLOBAL STATUS;
Found here.
The notion of "40% load" is not really well-defined. Your particular application may react differently to constraints on different resources. Applications will typically be bound by one of three factors: available (physical) memory, available CPU time, and disk IO.
On Linux (or possibly other *NIX) systems, you can get a snapshot of these with vmstat, or iostat (which provides more detail on disk IO).
However, to connect these to "40% load", you need to understand your database's performance characteristics under typical load. The best way to do this is to test with typical queries under varying amounts of load, until you observe response times increasing dramatically (this will mean you've hit a bottleneck in memory, CPU, or disk). This load should be considered your critical level, which you do not want to exceed.
is there a generic load measure for example the server is at 40% load ?
Yes, there is:
SELECT LOAD_FILE("/proc/loadavg")
This works on a Linux machine. It displays the system load averages for the past 1, 5, and 15 minutes.
The system load average is the average number of processes that are in either a runnable or an uninterruptible state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in an uninterruptible state is waiting for some I/O access, e.g. waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single-CPU system is loaded all the time, while on a 4-CPU system it means it was idle 75% of the time.
So if you want to normalize, you also need to count the number of CPUs.
you can do that too with
SELECT LOAD_FILE("/proc/cpuinfo")
see also 'man proc'
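The normalization step can be sketched like this. The load and CPU count are sample values here; on a live box you would take the first field of /proc/loadavg and the output of nproc instead:

```shell
# Normalize a 1-minute load average by the CPU count.
LOAD_1MIN="2.40"   # sample value; first field of /proc/loadavg
CPUS=24            # sample value; output of nproc
NORMALIZED=$(awk -v l="$LOAD_1MIN" -v c="$CPUS" 'BEGIN { printf "%.2f", l/c }')
echo "normalized 1-min load: $NORMALIZED"
```

A normalized value of 1.00 means all cores are busy on average; values well below that mean the box has CPU headroom.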
With top or htop you can follow the usage on Linux in real time.
On Linux-based systems the standard check is usually uptime, which returns a load index according to the metrics described here.
Aside from all the good answers on this page (SHOW GLOBAL STATUS, vmstat, top...), there is also a very simple tool written by Jeremy Zawodny that is perfect for non-admin users. It is called "mytop"; more info at http://jeremy.zawodny.com/mysql/mytop/
As per my research, there are some tools like:
MYTOP: an open source program written in Perl.
MTOP: also an open source program written in Perl; it works the same as MYTOP, but it monitors queries that take a long time and kills them after a specified time.
Link for details of the above commands