Who is using all the memory on my production server (apache + mysql + rails)? - mysql

I am running an EC2 small instance as my production server. It has 1.7 GB of memory, and I noticed it uses almost all of it. However, when I check the top output, it looks like only about 30% is actually in use. Did I misread the top output?
Here is the top output (sorted by %MEM)
top - 21:33:15 up 141 days, 9:39, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 81 total, 2 running, 79 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1747660k total, 1733580k used, 14080k free, 224144k buffers
Swap: 917496k total, 132k used, 917364k free, 1144808k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11664 mysql 15 0 794m 83m 5020 S 0.0 4.9 0:17.34 mysqld
12845 nobody 25 0 52416 38m 3200 S 0.0 2.3 0:02.10 ruby1.8
12847 nobody 16 0 52704 38m 2068 S 0.0 2.2 0:02.08 ruby1.8
12023 www-data 15 0 37692 10m 4164 S 0.0 0.6 0:01.28 apache2
11979 www-data 15 0 37660 10m 4172 S 0.0 0.6 0:01.24 apache2
12020 www-data 15 0 37708 10m 4120 S 0.0 0.6 0:01.17 apache2
12263 www-data 15 0 37708 10m 4176 S 0.0 0.6 0:00.83 apache2
11989 www-data 15 0 37720 10m 4024 S 0.0 0.6 0:01.28 apache2
12014 www-data 15 0 37468 10m 4172 S 0.0 0.6 0:01.17 apache2
12021 www-data 15 0 37652 10m 3992 S 0.0 0.6 0:01.25 apache2
12054 www-data 15 0 37480 10m 4176 S 0.0 0.6 0:01.33 apache2
11990 www-data 15 0 37448 10m 4188 S 0.0 0.6 0:01.16 apache2
12024 www-data 16 0 37416 10m 4172 S 0.0 0.6 0:01.00 apache2
11991 www-data 15 0 37432 10m 4148 S 0.0 0.6 0:01.24 apache2
11984 www-data 15 0 37444 9.8m 3972 S 0.0 0.6 0:01.33 apache2
11985 www-data 15 0 37444 9.8m 3948 S 0.0 0.6 0:01.18 apache2
11982 www-data 15 0 37408 9.8m 3968 S 0.0 0.6 0:01.12 apache2
12013 www-data 17 0 37432 9.8m 4152 S 0.0 0.6 0:01.19 apache2
12052 www-data 15 0 37176 9.8m 4180 S 0.0 0.6 0:01.29 apache2
11981 www-data 15 0 37172 9.8m 4168 S 0.0 0.6 0:01.40 apache2
12395 www-data 15 0 37420 9988 3972 S 0.0 0.6 0:00.72 apache2
12015 www-data 15 0 37412 9972 3900 S 0.0 0.6 0:01.31 apache2
11987 www-data 15 0 37160 9956 4136 S 0.0 0.6 0:01.22 apache2
12022 www-data 15 0 37140 9900 4140 S 0.0 0.6 0:01.20 apache2
12051 www-data 15 0 37216 9848 3976 S 0.0 0.6 0:01.31 apache2
11978 www-data 18 0 36948 9784 4180 S 0.0 0.6 0:01.08 apache2
11975 www-data 15 0 37140 9772 3972 S 0.0 0.6 0:01.49 apache2
12019 www-data 15 0 37148 9752 3944 S 0.0 0.6 0:01.08 apache2
11970 www-data 15 0 36920 9736 4160 S 0.0 0.6 0:01.25 apache2
11974 www-data 15 0 36848 9656 4148 S 0.0 0.6 0:01.53 apache2
11973 www-data 15 0 36924 9552 3972 S 0.0 0.5 0:01.19 apache2
28622 root 18 0 35232 9232 5592 S 0.0 0.5 0:00.30 apache2
11969 www-data 15 0 36340 9132 4136 S 0.0 0.5 0:01.51 apache2
12018 www-data 19 0 36332 9124 4136 S 0.0 0.5 0:01.32 apache2
11972 www-data 15 0 36320 8968 3988 S 0.0 0.5 0:01.33 apache2
12012 www-data 15 0 35796 8600 4144 S 0.0 0.5 0:01.11 apache2
11965 root 15 0 17356 7552 1644 S 0.0 0.4 0:00.13 ruby1.8
12848 root 15 0 8384 2744 2164 R 0.0 0.2 0:00.12 sshd
12762 root 15 0 8384 2724 2164 S 0.0 0.2 0:00.01 sshd
11302 postfix 18 0 6184 2576 1880 S 0.0 0.1 0:00.02 tlsmgr
11964 root 16 0 8188 2248 1492 S 0.0 0.1 0:00.06 ApplicationPool
23997 postfix 22 0 5856 1852 1488 S 0.0 0.1 0:00.22 qmgr
12850 root 15 0 4408 1848 1436 S 0.0 0.1 0:00.00 bash
12764 root 25 0 4396 1800 1400 S 0.0 0.1 0:00.00 bash
23996 root 15 0 5804 1780 1428 S 0.0 0.1 0:01.01 master
13036 postfix 17 0 5812 1684 1356 S 0.0 0.1 0:00.00 pickup
1051 klog 18 0 2884 1676 436 S 0.0 0.1 0:00.04 klogd
13035 root 15 0 2468 1164 916 R 0.0 0.1 0:00.01 top
5841 nobody 15 0 2652 1120 684 S 0.0 0.1 0:00.50 memcached
11509 root 15 0 5456 1068 676 S 0.0 0.1 0:00.00 sshd
1163 root 18 0 3560 1060 872 S 0.0 0.1 0:01.46 cron
1 root 18 0 2032 840 580 S 0.0 0.0 0:04.20 init
4070 syslog 18 0 2056 732 568 S 0.0 0.0 7:25.48 syslogd
908 root 16 -2 2292 656 528 S 0.0 0.0 0:00.06 dhclient3

The 'used' count includes the filesystem cache and kernel buffers. Cached memory can be freed whenever an application needs more heap. You are right that only about 30% is actually used: roughly 65% of the total is cache and 12% is buffers.
The kernel will release cached memory when an application attempts to allocate more; this is normal behaviour and I see no problem with your memory usage.
You only have a problem when you are using significant amounts of swap and your 'cached' count is very low.
Some additional helpful information here (applicable to any Linux distro) -
http://forums.gentoo.org/viewtopic.php?t=175419
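If you want the same arithmetic without eyeballing top, here is a minimal sketch (assuming a Linux /proc filesystem) that reads /proc/meminfo and reports usage with buffers and cache excluded:
def meminfo():
    """Parse /proc/meminfo into a dict of kB values."""
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, value = line.split(':', 1)
            info[key] = int(value.split()[0])  # first token is the size in kB
    return info

m = meminfo()
used = m['MemTotal'] - m['MemFree']
really_used = used - m['Buffers'] - m['Cached']
print('used (as top reports): %d kB (%.0f%%)' % (used, 100.0 * used / m['MemTotal']))
print('buffers + cache      : %d kB' % (m['Buffers'] + m['Cached']))
print('really used          : %d kB (%.0f%%)' % (really_used, 100.0 * really_used / m['MemTotal']))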

Mem: 1747660k total, 1733580k used, 14080k free, 224144k buffers
compare the total and used :-)

It is used for file buffering. There is nothing wrong with that, since good memory management should always use all available memory in the system. I don't remember the exact accounting, but I think the 1144808k cached is the memory you can't find.
You can try writing a simple application that reserves about 1 GB of memory, releases it, and quits. Then you should probably see that 1 GB counted as free memory, since the file buffers were dropped to make room for it.
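A minimal sketch of that experiment (assuming Python is available on the box; run free -m before, while it sleeps, and after it exits):
import time

ONE_GIB = 1024 * 1024 * 1024

# Reserve ~1 GiB; bytearray zero-fills, so the pages are really committed.
blob = bytearray(ONE_GIB)
print('holding ~1 GiB - check free/top from another shell')
time.sleep(30)

# Release it and quit; the kernel should now report roughly 1 GiB more
# 'free', because page cache was evicted to satisfy the allocation.
del blob
print('released, exiting')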

Related

How can I stop zombie processes from being left behind by Puppeteer without --no-sandbox?

I don't want to use --no-sandbox for security reasons, but without it I can't use --no-zygote, which is the only solution I could find to prevent zombie processes from being created. How can I achieve the same goal of cleaning up zombie processes without --no-sandbox? I know about dumb-init, but I want to know if there is a way to keep the processes from becoming zombies in the first place.
The zombie processes left behind are like this
$ ps aux | grep chrome | head -n 10
app 60 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome_crashpad] <defunct>
app 65 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>
app 66 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>
app 82 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>
app 163 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>
app 179 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome_crashpad] <defunct>
app 184 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>
app 185 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>
app 202 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>
app 285 0.0 0.0 0 0 ? Z Mar24 0:00 [chrome] <defunct>

The effect of loss weighting in a multi-task learning framework

I have designed a multi-task network where the first layers are shared between two output layers. While reading about multi-task learning principles, I learned that there should be a scalar weight parameter, such as alpha, that balances the two losses produced by the two output layers. My question is about this parameter itself: does it have an effect on the model's final performance? Probably yes.
This is the part of my code snippet for computation of losses:
...
mtl_loss = (alpha) * loss_1 + (1-alpha) * loss_2
mtl_loss.backward()
...
Above, loss_1 is MSELoss, and loss_2 is CrossEntropyLoss. As such, picking alpha=0.9, I'm getting the following loss values during training steps:
[2020-05-03 04:46:55,398 INFO] Step 50/150000; loss_1: 0.90 + loss_2: 1.48 = mtl_loss: 2.43 (RMSE: 2.03, F1score: 0.07); lr: 0.0000001; 29 docs/s; 28 sec
[2020-05-03 04:47:23,238 INFO] Step 100/150000; loss_1: 0.40 + loss_2: 1.27 = mtl_loss: 1.72 (RMSE: 1.38, F1score: 0.07); lr: 0.0000002; 29 docs/s; 56 sec
[2020-05-03 04:47:51,117 INFO] Step 150/150000; loss_1: 0.12 + loss_2: 1.19 = mtl_loss: 1.37 (RMSE: 0.81, F1score: 0.08); lr: 0.0000003; 29 docs/s; 84 sec
[2020-05-03 04:48:19,034 INFO] Step 200/150000; loss_1: 0.04 + loss_2: 1.10 = mtl_loss: 1.20 (RMSE: 0.55, F1score: 0.07); lr: 0.0000004; 29 docs/s; 112 sec
[2020-05-03 04:48:46,927 INFO] Step 250/150000; loss_1: 0.02 + loss_2: 0.96 = mtl_loss: 1.03 (RMSE: 0.46, F1score: 0.08); lr: 0.0000005; 29 docs/s; 140 sec
[2020-05-03 04:49:14,851 INFO] Step 300/150000; loss_1: 0.02 + loss_2: 0.99 = mtl_loss: 1.05 (RMSE: 0.43, F1score: 0.08); lr: 0.0000006; 29 docs/s; 167 sec
[2020-05-03 04:49:42,793 INFO] Step 350/150000; loss_1: 0.02 + loss_2: 0.97 = mtl_loss: 1.04 (RMSE: 0.43, F1score: 0.08); lr: 0.0000007; 29 docs/s; 195 sec
[2020-05-03 04:50:10,821 INFO] Step 400/150000; loss_1: 0.01 + loss_2: 0.94 = mtl_loss: 1.00 (RMSE: 0.41, F1score: 0.08); lr: 0.0000008; 29 docs/s; 223 sec
[2020-05-03 04:50:38,943 INFO] Step 450/150000; loss_1: 0.01 + loss_2: 0.86 = mtl_loss: 0.92 (RMSE: 0.40, F1score: 0.08); lr: 0.0000009; 29 docs/s; 252 sec
As the training loss shows, the first network, which uses MSELoss, converges very fast, while the second network has not converged yet. RMSE and F1 score are the two metrics I'm using to track the progress of the first and second network, respectively.
I know that picking the optimal alpha is somewhat experimental, but are there hints to make the process of picking it easier? Specifically, I want the two networks to train in step with each other, not like above, where the first network converges extremely fast. Can the alpha parameter help control this?
With that alpha, loss_1 contributes more to the result, and since backpropagation updates the weights in proportion to the error, that task improves faster. Try a more balanced alpha to even out the performance of the two tasks.
You can also try changing alpha during training.
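For example, here is a self-contained sketch (a dummy two-head model on random data, not the asker's network) that anneals alpha linearly so the fast-converging MSE head gradually gets less weight:
import torch
import torch.nn as nn

def alpha_schedule(step, total_steps, start=0.5, end=0.1):
    """Linearly anneal alpha from `start` to `end` over training."""
    t = min(step / float(total_steps), 1.0)
    return start + t * (end - start)

class TwoHeadNet(nn.Module):
    def __init__(self, in_dim=16, hidden=32, n_classes=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_reg = nn.Linear(hidden, 1)          # task 1: regression head (MSE)
        self.head_cls = nn.Linear(hidden, n_classes)  # task 2: classification head (CE)

    def forward(self, x):
        h = self.shared(x)
        return self.head_reg(h).squeeze(-1), self.head_cls(h)

model = TwoHeadNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()
total_steps = 500

for step in range(1, total_steps + 1):
    x = torch.randn(64, 16)               # dummy batch
    y_reg = torch.randn(64)               # regression targets
    y_cls = torch.randint(0, 4, (64,))    # class labels
    out_reg, out_cls = model(x)
    loss_1, loss_2 = mse(out_reg, y_reg), ce(out_cls, y_cls)
    alpha = alpha_schedule(step, total_steps)
    mtl_loss = alpha * loss_1 + (1 - alpha) * loss_2
    optimizer.zero_grad()
    mtl_loss.backward()
    optimizer.step()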

Preventing MySQL From Crashing? “Creeping” Memory Leak [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
I set up a fresh 1GB VPS server instance dedicated solely to MySQL. Everything seems to be working great. However, I noticed that mysqld memory usage quickly grows and sort of peaks out at about 700MB (as expected), but then it slowly “creeps” up over the course of 1-2 days. Then when it reaches about 770MB, the process gets killed by OOM Killer and restarts within a few seconds. It’s not a massive downtime, but I would like it to be stable.
I am using MySQL version 5.7.21. Here are the variables I changed from the defaults in my.cnf; everything else is left at its default. The biggest change is increasing innodb_buffer_pool_size to 512M:
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/
[mysqld]
sql_mode = "NO_ENGINE_SUBSTITUTION"
innodb_buffer_pool_size = 384M
innodb_log_buffer_size = 2097152
innodb_log_file_size = 20971520
innodb_strict_mode = OFF
join_buffer_size = 1048576
key_buffer_size = 88080384
max_connect_errors = 10000
max_connections = 151
myisam_recover_options = "BACKUP,FORCE"
performance_schema = 0
read_buffer_size = 1048576
slow_query_log = ON
sort_buffer_size = 1048576
sync_binlog = 0
thread_stack = 262144
wait_timeout = 14400
I'm kind of a noob at MySQL administration, so I'm hoping someone with more experience can offer some advice on keeping my MySQL instance stable on my 1GB server, on OOM Killer strategies, and on making the database faster/more efficient at the same time.
EDIT: I added some extra files for additional information:
SHOW GLOBAL STATUS: https://pastebin.com/SSVEJrQc
SHOW GLOBAL VARIABLES: https://pastebin.com/gV5yGdFR
SHOW ENGINE INNODB STATUS: https://pastebin.com/suHwbpiP
/var/log/mysql/error.log : https://pastebin.com/cRFGrNTp (combined past couple days)
The my.cnf file is shown at the top. I made some changes to it to reduce memory usage, such as dropping innodb_buffer_pool_size down to 384M instead of 512M. According to "top -c", the memory usage quickly rose to around 550MB and slowly crept up to about 740M; no crash yet.
Also, the server instance has 1GB of RAM, so I'm not sure why MySQL has to crash at around 770M. It's just a fresh install of Ubuntu 16.04 and MySQL, nothing else at all, no apache or php.
EDIT: I've included more data, I'm running instance on a Dreamhost DreamCompute cloud server:
"top" Results: https://pastebin.com/RNBYMf0b
"df -h" results:
Filesystem Size Used Avail Use% Mounted on
udev 488M 0 488M 0% /dev
tmpfs 100M 11M 89M 11% /run
/dev/vda1 78G 22G 56G 29% /
tmpfs 497M 0 497M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 497M 0 497M 0% /sys/fs/cgroup
tmpfs 100M 0 100M 0% /run/user/1000
"iostat -x": https://pastebin.com/xUNu7fEi
"mysqltuner": https://pastebin.com/pVC3kN1C
I'm seeing a lot of "tmpfs" mounts (which live in RAM). Is there anything that should or shouldn't be on this list? Also, I've just now updated my.cnf with the values Wilson Hauck provided and will restart mysqld.
Edit: Regarding "ulimit", after reading more about it, I edited /etc/security/limits.conf and added the following: "* - nofile 40000". Now when I run sudo sh -c "ulimit -n", it shows 40000 instead of 1024.
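(For what it's worth, the limit a process actually sees can also be checked from inside it; here is a tiny sketch using Python's resource module. It shows the limits for the Python process itself, which is not necessarily what the init system hands to mysqld.)
import resource

# Soft limit is what `ulimit -n` reports; the hard limit is the ceiling the
# soft limit can be raised to without extra privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open files: soft=%d hard=%d' % (soft, hard))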
The "iostat -x" and "mysqltuner" reports were created right after I restarted MySQL because they wouldn't install due to "not enough memory". Thanks Wilson for the mysqltuner suggestion, that looks like a great program.
UPDATE: 2/25/2018:
With all of the recommended settings set, I had mysql again crash about 10 hours ago. Here are the details from syslog:
Feb 25 09:41:52 database kernel: [739566.195215] snapd invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=-900
Feb 25 09:41:52 database kernel: [739566.195276] snapd cpuset=/ mems_allowed=0
Feb 25 09:41:52 database kernel: [739566.195310] CPU: 0 PID: 1228 Comm: snapd Not tainted 4.4.0-112-generic #135-Ubuntu
Feb 25 09:41:52 database kernel: [739566.195311] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
Feb 25 09:41:52 database kernel: [739566.195313] 0000000000000286 245c0a16fc0af5b0 ffff8800000839d8 ffffffff813fc233
Feb 25 09:41:52 database kernel: [739566.195317] ffff880000083b90 ffff88003ae39c00 ffff880000083a48 ffffffff8120dafe
Feb 25 09:41:52 database kernel: [739566.195319] ffffffff81cd8367 0000000000000000 ffffffff81e6b1a0 0000000000000206
Feb 25 09:41:52 database kernel: [739566.195321] Call Trace:
Feb 25 09:41:52 database kernel: [739566.195361] [<ffffffff813fc233>] dump_stack+0x63/0x90
Feb 25 09:41:52 database kernel: [739566.195375] [<ffffffff8120dafe>] dump_header+0x5a/0x1c5
Feb 25 09:41:52 database kernel: [739566.195383] [<ffffffff811946a2>] oom_kill_process+0x202/0x3c0
Feb 25 09:41:52 database kernel: [739566.195385] [<ffffffff81194ac9>] out_of_memory+0x219/0x460
Feb 25 09:41:52 database kernel: [739566.195394] [<ffffffff8119aad5>] __alloc_pages_slowpath.constprop.88+0x965/0xb00
Feb 25 09:41:52 database kernel: [739566.195396] [<ffffffff8119aef6>] __alloc_pages_nodemask+0x286/0x2a0
Feb 25 09:41:52 database kernel: [739566.195404] [<ffffffff811e483c>] alloc_pages_current+0x8c/0x110
Feb 25 09:41:52 database kernel: [739566.195406] [<ffffffff81190c6b>] __page_cache_alloc+0xab/0xc0
Feb 25 09:41:52 database kernel: [739566.195407] [<ffffffff8119317a>] filemap_fault+0x14a/0x3f0
Feb 25 09:41:52 database kernel: [739566.195418] [<ffffffff812a5d56>] ext4_filemap_fault+0x36/0x50
Feb 25 09:41:52 database kernel: [739566.195419] [<ffffffff811bfe70>] __do_fault+0x50/0xe0
Feb 25 09:41:52 database kernel: [739566.195421] [<ffffffff811c39c2>] handle_mm_fault+0xfa2/0x1820
Feb 25 09:41:52 database kernel: [739566.195433] [<ffffffff810bbc6c>] ? set_next_entity+0x9c/0xb0
Feb 25 09:41:52 database kernel: [739566.195443] [<ffffffff8106b687>] __do_page_fault+0x197/0x400
Feb 25 09:41:52 database kernel: [739566.195445] [<ffffffff8106b957>] trace_do_page_fault+0x37/0xe0
Feb 25 09:41:52 database kernel: [739566.195450] [<ffffffff81063f29>] do_async_page_fault+0x19/0x70
Feb 25 09:41:52 database kernel: [739566.195464] [<ffffffff81849af8>] async_page_fault+0x28/0x30
Feb 25 09:41:52 database kernel: [739566.195465] Mem-Info:
Feb 25 09:41:52 database kernel: [739566.195470] active_anon:220242 inactive_anon:1399 isolated_anon:0
Feb 25 09:41:52 database kernel: [739566.195470] active_file:799 inactive_file:1712 isolated_file:0
Feb 25 09:41:52 database kernel: [739566.195470] unevictable:913 dirty:1 writeback:0 unstable:0
Feb 25 09:41:52 database kernel: [739566.195470] slab_reclaimable:5664 slab_unreclaimable:3906
Feb 25 09:41:52 database kernel: [739566.195470] mapped:1931 shmem:2691 pagetables:1544 bounce:0
Feb 25 09:41:52 database kernel: [739566.195470] free:12712 free_pcp:113 free_cma:0
Feb 25 09:41:52 database kernel: [739566.195473] Node 0 DMA free:4548kB min:716kB low:892kB high:1072kB active_anon:5548kB inactive_anon:12kB active_file:1068kB inactive_file:1968kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:4kB writeback:0kB mapped:808kB shmem:364kB slab_reclaimable:220kB slab_unreclaimable:500kB kernel_stack:368kB pagetables:936kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:19320 all_unreclaimable? yes
Feb 25 09:41:52 database kernel: [739566.195491] lowmem_reserve[]: 0 958 958 958 958
Feb 25 09:41:52 database kernel: [739566.195495] Node 0 DMA32 free:46300kB min:44336kB low:55420kB high:66504kB active_anon:875420kB inactive_anon:5584kB active_file:2128kB inactive_file:4880kB unevictable:3652kB isolated(anon):0kB isolated(file):0kB present:1032184kB managed:1000192kB mlocked:3652kB dirty:0kB writeback:0kB mapped:6916kB shmem:10400kB slab_reclaimable:22436kB slab_unreclaimable:15124kB kernel_stack:3376kB pagetables:5240kB unstable:0kB bounce:0kB free_pcp:452kB local_pcp:452kB free_cma:0kB writeback_tmp:0kB pages_scanned:42084 all_unreclaimable? yes
Feb 25 09:41:52 database kernel: [739566.195502] lowmem_reserve[]: 0 0 0 0 0
Feb 25 09:41:52 database kernel: [739566.195504] Node 0 DMA: 1*4kB (U) 6*8kB (UME) 81*16kB (UME) 32*32kB (UM) 8*64kB (ME) 7*128kB (ME) 3*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4548kB
Feb 25 09:41:52 database kernel: [739566.195514] Node 0 DMA32: 107*4kB (MEH) 240*8kB (ME) 249*16kB (UMEH) 141*32kB (UMEH) 108*64kB (UMEH) 59*128kB (UMEH) 20*256kB (UMEH) 13*512kB (UME) 9*1024kB (UMH) 0*2048kB 0*4096kB = 46300kB
Feb 25 09:41:52 database kernel: [739566.195526] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb 25 09:41:52 database kernel: [739566.195560] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 25 09:41:52 database kernel: [739566.195561] 5817 total pagecache pages
Feb 25 09:41:52 database kernel: [739566.195576] 0 pages in swap cache
Feb 25 09:41:52 database kernel: [739566.195581] Swap cache stats: add 0, delete 0, find 0/0
Feb 25 09:41:52 database kernel: [739566.195582] Free swap = 0kB
Feb 25 09:41:52 database kernel: [739566.195583] Total swap = 0kB
Feb 25 09:41:52 database kernel: [739566.195584] 262044 pages RAM
Feb 25 09:41:52 database kernel: [739566.195585] 0 pages HighMem/MovableOnly
Feb 25 09:41:52 database kernel: [739566.195585] 8019 pages reserved
Feb 25 09:41:52 database kernel: [739566.195586] 0 pages cma reserved
Feb 25 09:41:52 database kernel: [739566.195587] 0 pages hwpoisoned
Feb 25 09:41:52 database kernel: [739566.195588] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Feb 25 09:41:52 database kernel: [739566.195593] [ 348] 0 348 9237 1199 21 3 0 0 systemd-journal
Feb 25 09:41:52 database kernel: [739566.195595] [ 451] 0 451 25742 46 17 3 0 0 lvmetad
Feb 25 09:41:52 database kernel: [739566.195597] [ 452] 0 452 10744 376 24 3 0 -1000 systemd-udevd
Feb 25 09:41:52 database kernel: [739566.195599] [ 550] 100 550 25081 61 19 3 0 0 systemd-timesyn
Feb 25 09:41:52 database kernel: [739566.195601] [ 940] 0 940 4030 222 11 3 0 0 dhclient
Feb 25 09:41:52 database kernel: [739566.195603] [ 1054] 0 1054 1305 29 8 3 0 0 iscsid
Feb 25 09:41:52 database kernel: [739566.195604] [ 1055] 0 1055 1430 877 8 3 0 -17 iscsid
Feb 25 09:41:52 database kernel: [739566.195606] [ 1063] 0 1063 68647 1056 37 3 0 0 accounts-daemon
Feb 25 09:41:52 database kernel: [739566.195608] [ 1072] 0 1072 6932 491 19 3 0 0 cron
Feb 25 09:41:52 database kernel: [739566.195609] [ 1078] 0 1078 6511 337 18 3 0 0 atd
Feb 25 09:41:52 database kernel: [739566.195611] [ 1091] 0 1091 1099 300 8 3 0 0 acpid
Feb 25 09:41:52 database kernel: [739566.195613] [ 1094] 0 1094 7136 99 19 3 0 0 systemd-logind
Feb 25 09:41:52 database kernel: [739566.195614] [ 1099] 0 1099 16377 273 35 3 0 -1000 sshd
Feb 25 09:41:52 database kernel: [739566.195616] [ 1101] 104 1101 64098 348 27 3 0 0 rsyslogd
Feb 25 09:41:52 database kernel: [739566.195618] [ 1105] 107 1105 10722 380 27 3 0 -900 dbus-daemon
Feb 25 09:41:52 database kernel: [739566.195619] [ 1113] 0 1113 70365 2590 31 6 0 -900 snapd
Feb 25 09:41:52 database kernel: [739566.195621] [ 1114] 0 1114 158952 1017 31 4 0 0 lxcfs
Feb 25 09:41:52 database kernel: [739566.195623] [ 1150] 0 1150 3343 36 11 3 0 0 mdadm
Feb 25 09:41:52 database kernel: [739566.195624] [ 1159] 0 1159 69294 181 38 3 0 0 polkitd
Feb 25 09:41:52 database kernel: [739566.195626] [ 1203] 0 1203 3618 374 12 3 0 0 agetty
Feb 25 09:41:52 database kernel: [739566.195627] [13137] 0 13137 3664 356 11 3 0 0 agetty
Feb 25 09:41:52 database kernel: [739566.195629] [13141] 0 13141 3664 329 12 3 0 0 agetty
Feb 25 09:41:52 database kernel: [739566.195631] [ 4475] 112 4475 345228 190282 441 4 0 0 mysqld
Feb 25 09:41:52 database kernel: [739566.195635] [28991] 0 28991 1126 141 8 3 0 0 apt.systemd.dai
Feb 25 09:41:52 database kernel: [739566.195638] [28998] 0 28998 1126 383 8 3 0 0 apt.systemd.dai
Feb 25 09:41:52 database kernel: [739566.195640] [29036] 0 29036 11324 1282 27 3 0 0 apt-get
Feb 25 09:41:52 database kernel: [739566.195641] [29217] 0 29217 11324 867 23 3 0 0 apt-get
Feb 25 09:41:52 database kernel: [739566.195643] [29220] 0 29220 1126 157 8 3 0 0 sh
Feb 25 09:41:52 database kernel: [739566.195645] [29221] 0 29221 1126 367 9 3 0 0 update-motd-upd
Feb 25 09:41:52 database kernel: [739566.195646] [29235] 0 29235 40967 16609 84 3 0 0 apt-check
Feb 25 09:41:52 database kernel: [739566.195648] [29570] 0 29570 12235 358 28 3 0 0 cron
Feb 25 09:41:52 database kernel: [739566.195649] [29571] 0 29571 1126 144 8 3 0 0 sh
Feb 25 09:41:52 database kernel: [739566.195651] [29572] 0 29572 2809 278 10 3 0 0 bash
Feb 25 09:41:52 database kernel: [739566.195653] [29579] 0 29579 12235 290 28 3 0 0 cron
Feb 25 09:41:52 database kernel: [739566.195654] [29580] 0 29580 1126 163 7 3 0 0 sh
Feb 25 09:41:52 database kernel: [739566.195656] [29581] 0 29581 2809 91 9 3 0 0 bash
Feb 25 09:41:52 database kernel: [739566.195658] [29582] 0 29582 14775 135 30 3 0 0 sshd
Feb 25 09:41:52 database kernel: [739566.195659] [29584] 0 29584 12235 358 28 3 0 0 cron
Feb 25 09:41:52 database kernel: [739566.195661] [29589] 0 29589 14775 134 32 3 0 0 sshd
Feb 25 09:41:52 database kernel: [739566.195662] [29592] 0 29592 12855 431 30 3 0 0 sudo
Feb 25 09:41:52 database kernel: [739566.195664] [29597] 0 29597 1126 70 8 3 0 0 sh
Feb 25 09:41:52 database kernel: [739566.195665] [29598] 0 29598 2809 124 9 3 0 0 bash
Feb 25 09:41:52 database kernel: [739566.195667] [29602] 0 29602 12235 357 28 3 0 0 cron
Feb 25 09:41:52 database kernel: [739566.195668] [29603] 0 29603 1126 92 8 3 0 0 sh
Feb 25 09:41:52 database kernel: [739566.195670] [29604] 0 29604 2809 321 10 3 0 0 bash
Feb 25 09:41:52 database kernel: [739566.195672] [29615] 0 29615 5787 36 13 3 0 0 systemctl
Feb 25 09:41:52 database kernel: [739566.195673] [29620] 0 29620 12235 276 28 3 0 0 cron
Feb 25 09:41:52 database kernel: [739566.195675] [29621] 0 29621 11236 413 26 3 0 0 sudo
Feb 25 09:41:52 database kernel: [739566.195677] [29623] 0 29623 1126 139 8 3 0 0 sh
Feb 25 09:41:52 database kernel: [739566.195678] [29624] 0 29624 2807 234 10 3 0 0 bash
Feb 25 09:41:52 database kernel: [739566.195680] [29628] 0 29628 14775 75 30 3 0 0 sshd
Feb 25 09:41:52 database kernel: [739566.195681] [29629] 0 29629 12235 476 28 3 0 0 cron
Feb 25 09:41:52 database kernel: [739566.195683] [29641] 0 29641 6945 111 18 3 0 0 sudo
Feb 25 09:41:52 database kernel: [739566.195684] [29642] 0 29642 6945 87 18 3 0 0 sudo
Feb 25 09:41:52 database kernel: [739566.195686] [29643] 0 29643 1126 152 8 3 0 0 sh
Feb 25 09:41:52 database kernel: [739566.195687] [29644] 0 29644 345 1 5 3 0 0 bash
Feb 25 09:41:52 database kernel: [739566.195689] Out of memory: Kill process 4475 (mysqld) score 750 or sacrifice child
Feb 25 09:41:52 database kernel: [739566.197374] Killed process 4475 (mysqld) total-vm:1380912kB, anon-rss:761128kB, file-rss:0kB
Feb 25 09:41:53 database kernel: [739566.658202] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20807 DF PROTO=TCP SPT=54168 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739566.658214] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20808 DF PROTO=TCP SPT=49412 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739566.658291] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20809 DF PROTO=TCP SPT=54293 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739566.666146] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20810 DF PROTO=TCP SPT=53588 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739566.666160] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20811 DF PROTO=TCP SPT=53612 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database systemd[1]: Started MySQL Community Server.
Feb 25 09:41:53 database CRON[29570]: (CRON) info (No MTA installed, discarding output)
Feb 25 09:41:53 database systemd[1]: mysql.service: Main process exited, code=killed, status=9/KILL
Feb 25 09:41:53 database systemd[1]: mysql.service: Unit entered failed state.
Feb 25 09:41:53 database systemd[1]: mysql.service: Failed with result 'signal'.
Feb 25 09:41:53 database systemd[1]: mysql.service: Service hold-off time over, scheduling restart.
Feb 25 09:41:53 database systemd[1]: Stopped MySQL Community Server.
Feb 25 09:41:53 database systemd[1]: Starting MySQL Community Server...
Feb 25 09:41:53 database kernel: [739567.046536] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=42 ID=21475 DF PROTO=TCP SPT=37118 DPT=22 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739567.066187] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20869 DF PROTO=TCP SPT=54293 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739567.066313] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20870 DF PROTO=TCP SPT=49412 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739567.066324] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.*** DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20871 DF PROTO=TCP SPT=54168 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:53 database kernel: [739567.082221] [UFW BLOCK] IN=ens3 OUT= MAC=fa:16:3e:bc:28:e3:44:f4:77:a7:c0:20:08:00 SRC=***.***.***.***DST=***.***.***.*** LEN=40 TOS=0x00 PREC=0x00 TTL=61 ID=20873 DF PROTO=TCP SPT=53612 DPT=3306 WINDOW=0 RES=0x00 RST URGP=0
Feb 25 09:41:54 database kernel: [739567.524532] audit: type=1400 audit(1519551714.232:296): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld" name="/proc/29710/status" pid=29710 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=112 ouid=112
Feb 25 09:41:54 database kernel: [739567.524603] audit: type=1400 audit(1519551714.232:297): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld" name="/sys/devices/system/node/" pid=29710 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=112 ouid=0
Feb 25 09:41:54 database kernel: [739567.524670] audit: type=1400 audit(1519551714.232:298): apparmor="DENIED" operation="open" profile="/usr/sbin/mysqld" name="/proc/29710/status" pid=29710 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=112 ouid=112
Feb 25 09:41:56 database systemd[1]: Started MySQL Community Server.
According to MySQLtuner, it says: "Maximum possible memory usage: 664.8M (66.99% of installed RAM)", so it must be exceeding this somehow.
Suggestions to consider for your my.cnf/ini [mysqld] section,
ulimit -n 40000 # at your Linux command prompt to raise n open files limit
table_open_cache=10000 # from 2000 to support 1M+ opened in 2 days
table_definition_cache=2500 # from default to support 2000+ opened in 2 days
open_files_limit=30000 # from 5000 to support 900,000 + opened in 2 days
max_connections=50 # from 151 to support 17 max_used_connections
read_rnd_buffer_size=128K # from 256k default to reduce RD RPS
innodb_change_buffer_max_size=15 # from 25% of innodb_buffer_pool_size 1% used
innodb_log_buffer_size=12M # from 2M to cover 30 minutes of log
innodb_log_file_size=120M # from ~ 20M to cover a few days
#max_allowed_packet=16M # lead with # to keep the default of 1M for max_allowed_packet
# if you need more than 1M, in your SESSION run: SET max_allowed_packet=nnnnnnnn; up to 1G, and 1G is the LIMIT.
query_cache_size=0 # from 16M - it is already OFF, do not waste RAM on it
query_cache_limit=1K # from 1M to conserve more RAM
query_cache_min_res_unit=512 # from 4096 to store more small results, if ever used
innodb_buffer_pool_instances=8 # from 1 to minimize mutex contention;
# you will be fine with UNDER 1G per instance of innodb_buffer_pool_size
innodb_lru_scan_depth=128 # from 1024 which is causing page_cleaner warnings
innodb_page_cleaners=64 # from 1 to auto follow = innodb_buffer_pool_instances
thread_cache_size=50 # from 8 default to support 17 max_used with room for growth.
From my perspective, back up your current my.cnf/ini and implement all of these. There are more opportunities for another day.
Could we use Stack Overflow's chat next week?
Your query_cache_size is 52428800 = 50M
innodb_buffer_pool_size = 512M
key_buffer_size = 88080384 = 84M
That's already ~650M, which leaves MySQL about 50M for 'other stuff'.
If you tweak down some of these settings, chances are that you can stay below 700M. Unless your version of MySQL really has a memory leak.
Try reducing some of these numbers by another 50M in total.
Bonus:
Just found this calculator: http://www.mysqlcalculator.com/. Might help you figure this out more accurately. Can't speak to how accurate this tool is though
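The arithmetic those tools do is roughly "global buffers plus per-connection buffers times max_connections". A hedged sketch of that worst-case estimate, plugging in the values from the my.cnf and the 52428800 query cache figure quoted above (it is an approximation, not an exact accounting of mysqld's footprint):
MB = 1024 * 1024

# Buffers allocated once per server.
global_buffers = {
    'innodb_buffer_pool_size': 384 * MB,
    'innodb_log_buffer_size': 2 * MB,
    'key_buffer_size': 84 * MB,
    'query_cache_size': 50 * MB,
}

# Buffers that can be allocated per connection.
per_thread_buffers = {
    'read_buffer_size': 1 * MB,
    'read_rnd_buffer_size': 256 * 1024,
    'sort_buffer_size': 1 * MB,
    'join_buffer_size': 1 * MB,
    'thread_stack': 256 * 1024,
}

max_connections = 151

worst_case = sum(global_buffers.values()) + \
    max_connections * sum(per_thread_buffers.values())
print('worst case: %.0f MB' % (worst_case / float(MB)))  # ~1048 MB on a 1 GB box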
To answer part of your question:
Also, the server instance has 1GB of RAM, so I'm not sure why MySQL has to crash at around 770M. It's just a fresh install of Ubuntu 16.04 and MySQL, nothing else at all, no apache or php.
A server never has "nothing else at all": even if you don't install anything else yourself, there is always background stuff running, plus whatever your VPS image includes. You can use top (or, if available, my preferred tool, htop) to see what's running and what's using memory.
In the OOM kill log, for instance, apt-check has an rss of 16848, meaning it's taking about 66 MB of RAM by itself (see this answer on OOM killer logs; the numbers are counts of 4 kB pages). Additionally, many of the system "directories" such as /tmp are actually stored in RAM rather than on disk. You can see whether that's the case on your machine by running df -h: anything listed with tmpfs as its filesystem is stored in RAM, and if it shows space in use, it is using RAM as well.
Put these various things together, and it's plausible that system processes take up a decent chunk of that RAM, even without any other processes you specifically installed.
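A quick sketch of that unit conversion (the rss and total_vm columns in the kernel's OOM table are counts of 4 kB pages):
PAGE_KB = 4  # the OOM report counts 4 kB pages

def pages_to_mb(pages):
    return pages * PAGE_KB / 1024.0

# Numbers taken from the OOM table in the question:
print('apt-check rss: %.1f MB' % pages_to_mb(16609))   # ~64.9 MB
print('mysqld rss   : %.1f MB' % pages_to_mb(190282))  # ~743 MB, matches the 'Killed process' line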

How can I improve performance on DRF with high CPU time

I have a REST API built with DRF and am already seeing a performance hit with 100 objects and one user requesting (me, testing).
When I request the more complex endpoint, I get these CPU results, consistently 5-10 s:
Resource Value
User CPU time 5987.089 msec
System CPU time 463.929 msec
Total CPU time 6451.018 msec
Elapsed time 6800.938 msec
Context switches 9 voluntary, 773 involuntary
but the SQL query stays below 100 ms.
The simpler endpoints show similar behaviour, with CPU times around 1 s and query times around 20 ms.
So far, here is what I have tried:
I am using select_related() and prefetch_related(), which improved the query time but not the CPU time
I am using ImageKit to generate pictures, stored on S3. I removed the whole spec to test, and it had only a minor impact
I use a method field to fetch user-specific data. Removing it also had only a minor impact
I have checked the log files on the backend and nothing specific shows up there...
Backend is Nginx - supervisord - gunicorn - postgresql - django 1.8.1
Here are the serializer and view:
class ParticipationOrganizationSerializer(ModelSerializer):
    organization = OrganizationSerializer(required=False, read_only=True, )
    bookmark = SerializerMethodField(
        required=False,
        read_only=True,
    )
    location_map = LocationMapSerializer(
        required=False,
        read_only=True,
    )

    class Meta:
        model = Participation
        fields = (
            'id',
            'slug',
            'organization',
            'location_map',
            'map_code',
            'partner',
            'looking_for',
            'complex_profile',
            'bookmark',
            'confirmed',
        )
        read_only_fields = (
            'id',
            'slug',
            'organization',
            'location_map',
            'map_code',
            'partner',
            'bookmark',
            'confirmed',
        )

    def get_bookmark(self, obj):
        request = self.context.get('request', None)
        if request is not None:
            if(request.user.is_authenticated()):
                # print(obj.bookmarks.filter(author=request.user).count())
                try:
                    bookmark = obj.bookmarks.get(author=request.user)
                    # bookmark = Bookmark.objects.get(
                    #     author=request.user,
                    #     participation=obj,
                    # )
                    return BookmarkSerializer(bookmark).data
                except Bookmark.DoesNotExist:
                    # We have nothing yet
                    return None
                except Bookmark.MultipleObjectsReturned:
                    # This should not happen, but in case it does, delete all
                    # the bookmarks for safety reasons.
                    Bookmark.objects.filter(
                        author=request.user,
                        participation=obj,
                    ).delete()
                    return None
        return None


class ParticipationOrganizationViewSet(ReadOnlyModelViewSet):
    """
    A readonly ViewSet for viewing participations of a certain event.
    """
    serializer_class = ParticipationOrganizationSerializer
    queryset = Participation.objects.all().select_related(
        'location_map',
        'organization',
        'organization__logo_image',
    ).prefetch_related(
        'bookmarks',
    )
    lookup_field = 'slug'

    def get_queryset(self):
        event_slug = self.kwargs['event_slug']
        # Filter for the current event
        # Filter to show only the confirmed participations
        participations = Participation.objects.filter(
            event__slug=event_slug,
            confirmed=True
        ).select_related(
            'location_map',
            'organization',
            'organization__logo_image',
        ).prefetch_related(
            'bookmarks',
        )
        # Filter on partners? This is a parameter passed on in the url
        partners = self.request.query_params.get('partners', None)
        if(partners == "true"):
            participations = participations.filter(partner=True)
        return participations

    # http://stackoverflow.com/questions/22616973/django-rest-framework-use-different-serializers-in-the-same-modelviewset
    def get_serializer_class(self):
        if self.action == 'list':
            return ParticipationOrganizationListSerializer
        if self.action == 'retrieve':
            return ParticipationOrganizationSerializer
        return ParticipationOrganizationListSerializer
Any help is very much appreciated!
update
I dumped the data to my local machine and I am observing similar times. I guess this rules out the whole production setup (nginx, gunicorn)?
update 2
Here are the results of the profiler.
Also, I made some progress in improving the speed by:
Simplifying my serializers
Doing the tests with curl and having Debug Toolbar off
ncalls tottime percall cumtime percall filename:lineno(function)
0 0 0 profile:0(profiler)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/views.py:442(dispatch)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/viewsets.py:69(view)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/django/views/decorators/csrf.py:57(wrapped_view)
1 0 0 3.44 3.44 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/mixins.py:39(list)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:605(to_representation)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:225(data)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:672(data)
344/114 0.015 0 3.318 0.029 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:454(to_representation)
805 0.01 0 2.936 0.004 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:1368(to_representation)
2767 0.013 0 2.567 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py:166(send)
2070 0.002 0 2.52 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/registry.py:52(existence_required_receiver)
2070 0.005 0 2.518 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/registry.py:55(_receive)
2070 0.004 0 2.513 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/utils.py:147(call_strategy_method)
2070 0.002 0 2.508 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/strategies.py:14(on_existence_required)
2070 0.005 0 2.506 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:86(generate)
2070 0.002 0 2.501 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:109(generate)
2070 0.003 0 2.499 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:94(generate_now)
2070 0.01 0 2.496 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:65(get_state)
690 0.001 0 2.292 0.003 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:148(__nonzero__)
690 0.005 0 2.291 0.003 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:124(__bool__)
2070 0.007 0 2.276 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:112(_exists)
2070 0.01 0 2.269 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:409(exists)
4140 0.004 0 2.14 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:282(entries)
1633 0.003 0 2.135 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:288()
1633 0.001 0 2.129 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucketlistresultset.py:24(bucket_lister)
2 0 0 2.128 1.064 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucket.py:390(_get_all)
2 0 0 2.128 1.064 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucket.py:426(get_all_keys)
1331 0.003 0 1.288 0.001 /usr/lib/python2.7/ssl.py:335(recv)
1331 1.285 0.001 1.285 0.001 /usr/lib/python2.7/ssl.py:254(read)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:886(_mexe)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/connection.py:643(make_request)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:1062(make_request)
2 0.004 0.002 0.896 0.448 /usr/lib/python2.7/httplib.py:585(_read_chunked)
2 0 0 0.896 0.448 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:393(read)
2 0 0 0.896 0.448 /usr/lib/python2.7/httplib.py:540(read)
166 0.002 0 0.777 0.005 /usr/lib/python2.7/httplib.py:643(_safe_read)
166 0.005 0 0.775 0.005 /usr/lib/python2.7/socket.py:336(read)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:793(send)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:998(_send_request)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:820(_send_output)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:977(request)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:962(endheaders)
1 0 0 0.567 0.567 /usr/lib/python2.7/httplib.py:1174(connect)
1380 0.001 0 0.547 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:82(url)
1380 0.007 0 0.546 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:72(_storage_attr)
105 0.009 0 0.528 0.005 /usr/lib/python2.7/socket.py:406(readline)
2 0 0 0.413 0.207 /usr/lib/python2.7/httplib.py:408(begin)
2 0 0 0.413 0.207 /usr/lib/python2.7/httplib.py:1015(getresponse)
2 0 0 0.407 0.203 /usr/lib/python2.7/httplib.py:369(_read_status)
2750 0.003 0 0.337 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:399(get_attribute)
1 0.223 0.223 0.335 0.335 /usr/lib/python2.7/socket.py:537(create_connection)
2865 0.012 0 0.334 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:65(get_attribute)
1610 0.005 0 0.314 0 /home/my_app/.virtualenvs/my_app/src/django-s3-folder-storage/s3_folder_storage/s3.py:13(url)
1610 0.012 0 0.309 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:457(url)
690 0.005 0 0.292 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/models/fields/utils.py:10(__get__)
690 0.007 0 0.251 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:20(__init__)
2 0 0 0.248 0.124
>>>> cutting here, low impact calls

The curious case of high 5 min load average

Looking for some expert advice here. I'm a first-time sysadmin on my own server and I can't figure out the bottleneck.
Linux CentOS 6 Apache 2.4 PHP 5.5
I've been receiving tons of high 5-minute load average alerts from CSF, ranging between 8 and 80.
So I went ahead and installed mysqltuner on my server and let it run for 3 days.
The results don't show anything out of the ordinary, but I'm still getting high 5-minute load averages daily.
I'm trying to find the bottleneck (CPU, load caused by out-of-memory issues, or I/O-bound load).
Would be stoked if someone can share any insights...
(I've included mysqltuner's report and the high-load email output below.)
-------- Security Recommendations -------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] There is no basic password file list!
-------- Performance Metrics -------------------------------------------------
[--] Up for: 120d 18h 27m 20s (227M q [21.795 qps], 51M conn, TX: 907B, RX: 26B)
[--] Reads / Writes: 38% / 62%
[--] Binary logging is disabled
[--] Total buffers: 15.4G global + 4.1M per thread (600 max threads)
[OK] Maximum reached memory usage: 16.2G (51.75% of installed RAM)
[OK] Maximum possible memory usage: 17.8G (56.91% of installed RAM)
[OK] Slow queries: 0% (11/227M)
[OK] Highest usage of available connections: 33% (199/600)
[OK] Aborted connections: 0.56% (284327/51183230)
[OK] Query cache efficiency: 83.0% (78M cached / 94M selects)
[!!] Query cache prunes per day: 10288
[OK] Sorts requiring temporary tables: 0% (392 temp sorts / 1M sorts)
[OK] Temporary tables created on disk: 4% (65K on disk / 1M total)
[OK] Thread cache hit rate: 99% (199 created / 51M connections)
[OK] Table cache hit rate: 22% (425 open / 1K opened)
[OK] Open file limit used: 0% (433/50K)
[OK] Table locks acquired immediately: 99% (41M immediate / 41M locks)
-------- MyISAM Metrics -----------------------------------------------------
[!!] Key buffer used: 20.2% (108M used / 536M cache)
[OK] Key buffer size / total MyISAM indexes: 512.0M/14.7M
[OK] Read Key buffer hit rate: 99.8% (51M cached / 121K reads)
[!!] Write Key buffer hit rate: 40.8% (4M cached / 2M writes)
-------- InnoDB Metrics -----------------------------------------------------
[--] InnoDB is enabled.
[OK] InnoDB buffer pool / data size: 14.6G/140.9M
[!!] InnoDB buffer pool instances: 1
[!!] InnoDB Used buffer: 3.39% (32546 used/ 959999 total)
[OK] InnoDB Read buffer efficiency: 100.00% (5437258684 hits/ 5437259670 total)
[!!] InnoDB Write buffer efficiency: 0.00% (0 hits/ 1 total)
[OK] InnoDB log waits: 0.00% (0 waits / 24069213 writes)
-------- AriaDB Metrics -----------------------------------------------------
[--] AriaDB is disabled.
-------- Replication Metrics -------------------------------------------------
[--] No replication slave(s) for this server.
[--] This is a standalone server..
-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
Increasing the query_cache size over 128M may reduce performance
Variables to adjust:
query_cache_size (> 128M) [see warning above]
innodb_buffer_pool_instances(=14)
----------------------
(The only change I've made is to reduce InnoDB size and add multiple pool instances)
The high daily load email:
Time: Sun Dec 6 05:43:53 2015 -0500
1 Min Load Avg: 80.26
5 Min Load Avg: 21.19
15 Min Load Avg: 7.46
Running/Total Processes: 221/875
ps.txt
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-1424 16243 0/149/2713129 W 1.28 14 0 0.0 2.34 65107.59 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
1-1424 17057 0/18/2701770 W 2.15 4 0 0.0 0.30 62402.50 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
2-1424 17064 0/24/2685073 W 2.11 13 0 0.0 0.32 62668.14 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
3-1424 15319 0/215/2657841 W 3.50 4 0 0.0 3.88 61950.21 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
4-1424 11567 0/204/2651294 W 7.10 7 0 0.0 3.00 63562.61 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
5-1424 16512 0/37/2640191 W 2.19 5 0 0.0 0.60 63637.48 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
6-1424 17735 0/8/2630311 W 0.62 19 0 0.0 0.06 65036.68 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
7-1424 16521 0/31/2613938 W 2.20 19 0 0.0 0.36 62385.07 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
8-1424 16081 0/33/2611913 W 2.46 5 0 0.0 0.42 60535.12 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
9-1424 14711 0/120/2603042 W 1.89 18 0 0.0 2.11 59868.26 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
10-1424 16838 0/21/2592501 W 1.77 15 0 0.0 0.24 62195.33 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
11-1424 16531 0/42/2584776 W 2.45 11 0 0.0 0.39 62253.11 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
12-1424 17065 0/20/2570161 W 1.29 12 0 0.0 0.18 60474.65 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
13-1424 17770 0/13/2564128 W 1.27 2 0 0.0 0.63 59748.24 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
14-1424 17771 0/14/2542936 W 1.30 2 0 0.0 0.17 60513.73 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
15-1424 15736 0/64/2536855 W 2.91 7 0 0.0 1.16 61453.61 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
16-1424 17077 0/19/2522131 W 2.76 15 0 0.0 0.35 59307.60 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
17-1424 14723 0/93/2521068 W 3.38 6 0 0.0 1.77 60437.40 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
18-1424 16279 0/62/2509938 W 1.81 15 0 0.0 1.07 61401.24 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
19-1424 15333 0/116/2498356 W 3.24 19 0 0.0 1.69 57911.45 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
20-1424 16297 1/35/2494463 W 0.98 53 62 16.1 0.47 59474.66 58.174.24.65 suspensionrevolution.com:80 GET /new/wp-content/themes/optimizePressTheme/lib/assets/defaul
21-1424 16298 0/40/2473943 W 3.83 3 0 0.0 0.54 57987.71 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
22-1424 18054 0/6/2469193 W 1.23 1 0 0.0 0.05 59122.65 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
23-1424 12894 0/162/2458774 W 5.90 17 0 0.0 2.42 56404.92 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
24-1424 18088 0/4/2452422 W 0.90 11 0 0.0 0.00 58405.08 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
25-1424 18089 0/6/2446570 W 1.22 1 0 0.0 0.03 57036.34 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
26-1424 17079 0/30/2439491 W 1.88 0 0 0.0 0.43 54697.67 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
27-1424 16101 0/64/2416961 W 1.53 18 0 0.0 1.69 57160.43 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
28-1424 18140 0/9/2403931 W 0.62 18 0 0.0 0.02 55901.03 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
29-982 1505 1/18/1548355 G 0.14 2914733 450294 2.8 0.29 34947.04 96.47.70.4 suspensionrevolution.com:80 POST /dap/dap-clickbank-6.0.php HTTP/1.1
30-1424 15338 0/100/2384316 W 2.20 7 0 0.0 1.24 53919.77 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
31-1424 16300 2/56/2379897 K 3.47 3 1365 2.4 1.01 55195.75 89.166.18.35 appcoiner.com:80 GET /favicon.ico HTTP/1.1
32-1424 15749 0/108/2369131 W 3.26 17 0 0.0 2.09 55452.02 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
33-1424 17100 0/17/2359616 W 1.70 11 0 0.0 0.16 52564.23 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
34-1424 16310 0/162/2356424 W 3.95 15 0 0.0 2.68 55800.32 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
35-1424 16543 0/63/2326471 W 1.29 4 0 0.0 0.75 55028.80 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
36-1424 17101 0/18/2331624 W 2.05 14 0 0.0 0.21 53656.66 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
37-1424 17102 0/20/2314444 W 1.51 19 0 0.0 0.29 55684.29 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
38-1424 19665 0/1/2295814 W 0.00 3 0 0.0 0.00 52187.61 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
39-984 8727 1/69/1464486 G 0.68 2906097 450284 2.8 0.83 33844.50 74.63.153.4 suspensionrevolution.com:80 POST /dap/dap-clickbank-6.0.php HTTP/1.1
40-1424 19720 0/1/2277467 W 0.00 2 0 0.0 0.00 55864.93 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
41-1424 18141 0/6/2270838 W 0.62 14 0 0.0 0.02 54059.95 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
42-983 18177 1/49/1440183 G 0.63 2910665 450302 2.8 0.57 31224.74 74.63.153.4 suspensionrevolution.com:80 POST /dap/dap-clickbank-6.0.php HTTP/1.1
43-1424 16104 2/57/2242969 W 2.62 5 0 8.8 0.83 56170.39 54.202.7.147 appcoiner.com:80 GET /start-2/?utm_expid=111102625-1.-ThtNpCTSByWcbkMGdBOow.1&ho
44-1424 16547 0/28/2247277 W 3.84 5 0 0.0 0.31 53028.08 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
45-1424 15797 0/80/2225028 W 3.24 5 0 0.0 1.63 51333.94 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
46-1424 19721 0/1/2205346 W 0.00 2 0 0.0 0.00 52025.79 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
47-1424 18142 0/11/2207016 W 0.94 1 0 0.0 0.07 51355.07 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
48-1424 17104 0/137/2172322 W 1.28 7 0 0.0 2.32 49665.11 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
49-1424 16314 0/63/2168481 W 4.14 5 0 0.0 1.16 49191.04 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
50-1424 19763 0/0/2141243 W 2.41 12 0 0.0 0.00 49538.97 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
51-1424 17106 0/20/2137681 W 1.24 7 0 0.0 0.29 49973.70 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
52-1424 16549 0/34/2125106 W 1.99 7 0 0.0 0.50 50442.63 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
53-1424 17107 0/20/2109740 W 2.18 2 0 0.0 0.31 48074.92 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
54-1424 18143 0/4/2087977 W 1.21 8 0 0.0 0.00 49243.64 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
55-1424 17114 0/30/2062106 W 0.34 17 0 0.0 0.46 48605.36 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
56-1424 17115 0/19/2064562 W 2.07 0 0 0.0 0.30 47600.46 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
57-1424 16569 0/41/2051051 W 3.50 8 0 0.0 0.54 47547.57 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
58-1424 17116 0/28/2023150 W 1.28 4 0 0.0 0.42 49170.14 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
59-1424 17117 0/30/2010767 W 1.41 1 0 0.0 0.51 47681.03 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
60-1424 17118 0/17/1999913 R 0.05 53 5 0.0 0.26 46914.84 65.30.135.196
61-1424 16572 0/35/1978028 W 2.73 16 0 0.0 0.45 45848.82 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
62-1424 16573 0/27/1957566 W 1.60 0 0 0.0 0.45 46768.01 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
63-1424 16574 0/43/1936669 W 2.37 3 0 0.0 0.54 43520.07 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
64-1424 16575 0/28/1922381 W 1.54 1 0 0.0 0.33 45007.49 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
65-1424 16576 0/32/1903916 W 2.15 13 0 0.0 0.73 45117.83 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
66-1424 17119 0/28/1878566 W 1.18 6 0 0.0 0.45 44448.25 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
67-1424 16578 1/39/1869043 K 2.01 0 0 1.2 0.53 44966.25 114.79.47.51 suspensionrevolution.com:80 GET /favicon.ico HTTP/1.1
68-1424 16579 0/37/1841958 W 2.59 5 0 0.0 0.48 44262.26 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
This is my TOP output:
root#ns513521 [~]# top
top - 08:21:31 up 153 days, 3:51, 1 user, load average: 0.15, 0.27, 0.51
Tasks: 230 total, 2 running, 227 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.4%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32855908k total, 25102984k used, 7752924k free, 886004k buffers
Swap: 1569780k total, 63984k used, 1505796k free, 21254784k cached
This is my iostat output:
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 21.99 1476.57 549.50 19540299168 7271856568
sdb 19.53 982.85 549.49 13006647390 7271718616
sdc 19.46 978.26 549.49 12945853934 7271718616
md2 9.78 492.22 399.94 6513868322 5292584264
md1 20.05 27.80 140.15 367920858 1854711136
I think you can just use sar, e.g. something like this:
sar -q -s 00:00:00 -e 11:59:59 -f /var/log/sa/sa`date +%d | awk '{printf "%02d", $1 - 1}'`
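sar reads the history that sysstat collects; for a quick ad-hoc check of the current load relative to core count, here is a small sketch (plain /proc/loadavg plus multiprocessing.cpu_count(), not what CSF itself does):
from multiprocessing import cpu_count

# /proc/loadavg starts with the 1, 5 and 15 minute load averages.
with open('/proc/loadavg') as f:
    one, five, fifteen = (float(x) for x in f.read().split()[:3])

cores = cpu_count()
for label, value in (('1 min', one), ('5 min', five), ('15 min', fifteen)):
    flag = 'HIGH' if value > cores else 'ok'
    print('%-7s %6.2f  (%.2f per core)  %s' % (label, value, value / cores, flag))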