How does atopsar calculate HDD load? - mysql

I have an HDD (sda) dedicated to the MySQL InnoDB log files (ib_logfile0, ib_logfile1), and atopsar shows a high load on this HDD.
atopsar -d 60:
13:02:10  disk  busy  read/s  KB/read  writ/s  KB/writ  avque  avserv  _dsk
13:03:10  sda    59%     4.4      4.0    45.3      6.2     1.0  11.88 ms
13:04:10  sda    60%     4.5      4.0    45.6      6.1     1.0  11.98 ms
13:05:10  sda    58%     4.2      4.0    44.7      6.0     1.0  11.94 ms
dstat -tdD total,sda 60:
----system---- --dsk/total-- --dsk/sda--
     time     | read  writ : read  writ
24-09 13:11:24|  23k  912k :9689B  391k
24-09 13:12:24|  33k  971k :  16k  270k
24-09 13:13:24|  16k  893k :  14k  235k
24-09 13:14:24|  18k  963k :  16k  254k
pt-ioprofile --cell sizes:
  total  pread   read  pwrite  write  fsync   open  close  lseek  fcntl  filename
 905728      0      0  905728      0      0      0      0      0      0  /var/mysqllog/mysql/ib_logfile0
200-400 KB per second does not seem like much to show busy > 50%, especially considering that the only files on the HDD are the MySQL InnoDB log files, and (from the InnoDB blog):
The redo log files are used in a circular fashion. This means that the redo logs are written from the beginning to end of first redo log file, then it is continued to be written into the next log file, and so on till it reaches the last redo log file. Once the last redo log file has been written, then redo logs are again written from the first redo log file.
So the question is: why is the load so high? Is this really the physical limit of the HDD?
It seems the busy figure is calculated as (all requests, read + write) * avserv / 1000. For the first atopsar line the calculation is as follows: (4.4 + 45.3) * 11.88 / 1000 = 0.59, i.e. 59%.
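A quick check of this formula against all three atopsar sample lines above (a minimal sketch; the numbers are copied from the output, and the formula itself is only the guess stated above):

# Reproduce the suspected atopsar "busy" calculation for the sample lines above:
# busy ~= (read/s + writ/s) * avserv_ms / 1000, i.e. the fraction of each second
# the disk spends servicing requests.
samples = [
    # (read/s, writ/s, avserv in ms, reported busy %)
    (4.4, 45.3, 11.88, 59),
    (4.5, 45.6, 11.98, 60),
    (4.2, 44.7, 11.94, 58),
]
for reads, writes, avserv_ms, reported in samples:
    busy = (reads + writes) * avserv_ms / 1000.0
    print(f"calculated {busy:.1%} vs reported {reported}%")

All three lines come out at roughly 58-60%, matching the busy column, which supports the formula above.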

Related

libvirt: use of hugepages on NUMA system

The machine has 4 NUMA nodes and is booted with the kernel boot parameter default_hugepagesz=1G. I start the VM with libvirt/virsh, and I can see that qemu launches with -m 65536 ... -mem-prealloc -mem-path /mnt/hugepages/libvirt/qemu, i.e. it starts the virtual machine with 64 GB of memory and asks it to allocate the guest memory from a temporarily created file in /mnt/hugepages/libvirt/qemu:
% fgrep Huge /proc/meminfo
AnonHugePages: 270336 kB
ShmemHugePages: 0 kB
HugePages_Total: 113
HugePages_Free: 49
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 118489088 kB
%
% numastat -cm -p `pidof qemu-system-x86_64`
Per-node process memory usage (in MBs) for PID 3365 (qemu-system-x86)
                  Node 0  Node 1  Node 2  Node 3   Total
                  ------  ------  ------  ------  ------
Huge               29696    7168       0   28672   65536
Heap                   0       0       0      31      31
Stack                  0       0       0       0       0
Private                4       9       4     305     322
                  ------  ------  ------  ------  ------
Total              29700    7177       4   29008   65889
...
                  Node 0  Node 1  Node 2  Node 3   Total
                  ------  ------  ------  ------  ------
MemTotal          128748  129017  129017  129004  515785
MemFree            98732   97339  100060   95848  391979
MemUsed            30016   31678   28957   33156  123807
...
AnonHugePages          0       4       0     260     264
HugePages_Total    29696   28672   28672   28672  115712
HugePages_Free         0   21504   28672       0   50176
HugePages_Surp         0       0       0       0       0
%
This output confirms that the host's 512 GB of memory is split equally across the NUMA nodes, and that the hugepages are also distributed almost equally across the nodes.
The question is: how does qemu (or kvm?) determine how many hugepages to allocate? Note that the libvirt XML has the following directive:
<memoryBacking>
<hugepages/>
<locked/>
</memoryBacking>
However, it is unclear from https://libvirt.org/formatdomain.html#memory-tuning what the defaults for hugepage allocation are and from which nodes the pages are taken. Is it possible to have all of the VM's memory allocated from node 0? What is the right way of doing this?
UPDATE
Since my VM workload is actually pinned to a set of cores on the single NUMA node 0 using the <vcpupin> element, I thought it would be a good idea to force QEMU to allocate memory from the same NUMA node:
<numatune>
<memory mode="strict" nodeset="0"/>
</numatune>
However, this didn't work; qemu reported an error in its log:
os_mem_prealloc insufficient free host memory pages available to allocate guest ram
Does it mean it fails to find free huge pages on the numa node 0?
If you use a plain <hugepages/> element, then libvirt will configure QEMU to allocate from the default huge page pool. Given your default_hugepagesz=1G, that should mean that QEMU allocates 1 GB sized pages. QEMU will allocate as many as are needed to satisfy the requested RAM size. Given your configuration, these huge pages can potentially be allocated from any NUMA node.
With more advanced libvirt configuration it is possible to request allocation of a specific size of huge page, and pick them from specific NUMA nodes. The latter is only really needed if you are also locking CPUs to a specific host NUMA node.
Does it mean it fails to find free huge pages on the numa node 0?
Yes, it does.
numastat -m can be used to find out how many huge pages there are in total and how many of them are free per node.
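The same per-node counters are also exposed under sysfs, so whether node 0 still has enough free 1 GB pages for the guest can be checked directly. A minimal sketch, assuming the 1 GB default page size from the boot parameter above (the 64-page requirement corresponds to the 64 GB guest):

import glob, os

# Per-node 1 GB hugepage counters exposed by the kernel under sysfs.
PATTERN = "/sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages"

def read_counter(path):
    with open(path) as f:
        return int(f.read().strip())

for nr_path in sorted(glob.glob(PATTERN)):
    node = nr_path.split("/")[5]                          # e.g. "node0"
    free_path = os.path.join(os.path.dirname(nr_path), "free_hugepages")
    total, free = read_counter(nr_path), read_counter(free_path)
    print(f"{node}: {free} of {total} 1G hugepages free")

# A 64 GB guest backed by 1 GB pages needs 64 free pages on node 0 if all of
# its memory is to come from that node (as attempted with numatune above).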

Chromium native memory growth (by image caching?)

My single-page web application runs into a memory leak situation with Chromium on Linux. The app fetches HTML markup and embeds it into the main page via .innerHTML. The remote HTML snippets may reference JPGs or PNGs. The memory leak does not occur in the JS heap, as can be seen in the table below.
Column A shows the size of the JS heap as reported by performance.memory.usedJSHeapSize. (I am running Chromium with the --enable-precise-memory-info parameter to get precise values.)
Column B shows the total amount of memory occupied by Chromium as shown by top (only start and end values sampled).
Column C shows the amount of "available memory" in Linux as reported by /proc/meminfo.
      A          B           C
01  1337628   234.5 MB   522964 KB
02  1372198              499404 KB
03  1500304              499864 KB
04  1568540              485476 KB
05  1651320              478048 KB
06  1718846              489684 KB  GC
07  1300169              450240 KB
08  1394914              475624 KB
09  1462320              472540 KB
10  1516964              471064 KB
11  1644589              459604 KB  GC
12  1287521              441532 KB
13  1446901              449220 KB
14  1580417              436504 KB
15  1690518              457488 KB
16  1772467              444216 KB  GC
17  1261924              418896 KB
18  1329657              439252 KB
19  1403951              436028 KB
20  1498403              434292 KB
21  1607942              429272 KB  GC
22  1298138              403828 KB
23  1402844              412368 KB
24  1498350              412560 KB
25  1570854              409912 KB
26  1639122              419268 KB
27  1715667              399460 KB  GC
28  1327934              379188 KB
29  1438188              417764 KB
30  1499364              401160 KB
31  1646557              406020 KB
32  1720947              402000 KB  GC
33  1369626   283.3 MB   378324 KB
While the JS heap only varies between 1.3 MB and 1.8 MB during my 33-step test, Chromium's memory (as reported by top) grows by 48.8 MB (from 234.5 MB to 283.3 MB). And according to /proc/meminfo, the "available memory" even shrinks by about 145 MB (from 522964 KB to 378324 KB in column C) over the same period. I assume Chromium is occupying a good amount of cache memory outside of the reported 283.3 MB. Note that I am invoking GC manually 6 times via the developer tools.
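For reference, the column C figures can be collected with a small sampler like the one below; it assumes the "available memory" value referred to above is the MemAvailable field of /proc/meminfo (a sketch, not part of the original test setup):

# Read the "available memory" figure (column C) from /proc/meminfo.
def available_kb():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])    # value is reported in kB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

print(f"{available_kb()} KB available")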
Before running the test, I have stopped all unnecessary services and killed all unneeded processes. No other code is doing any essential work in parallel. There are no browser extensions and no other open tabs.
The memory leak appears to be in native memory and probably involves the images being shown. They never seem to be freed. This issue appears to be similar to 1 (bugs.webkit.org). I have applied all the usual suggestions as listed here 2. If I keep the app running, the amount of memory occupied by Chromium grows without bound until everything comes to a crawl or until the Linux OOM killer strikes. The one thing I can do (before it is too late) is to switch the browser to a different URL. This releases all native memory, and I can then return to my app and continue with a completely fresh memory situation. But this is not a real solution.
Q: Can the native memory caches be freed in a more programmatic way?

Spark CSV GZip to Parquet?

I am using Spark 2.3.1 PySpark (AWS EMR)
I am getting memory errors:
Container killed by YARN for exceeding memory limits
Consider boosting spark.yarn.executor.memoryOverhead
I have an input of 160 files, each approx. 350-400 MB, each in gzipped CSV format.
To read the csv.gz files (with a wildcard) I use this PySpark:
dfgz = spark.read.load("s3://mybucket/yyyymm=201708/datafile_*.csv.gz",
    format="csv", sep="^", inferSchema="false", header="false", multiLine="true", quote="^", nullValue="~", schema="id string,....")
To save the data frame I use this (PySpark)
(dfgz
.write
.partitionBy("yyyymm")
.mode("overwrite")
.format("parquet")
.option("path", "s3://mybucket/mytable_parquet")
.saveAsTable("data_test.mytable")
)
One line of code to save all 160 files.
I tried this with 1 file and it works fine.
Total size for all 160 files (csv.gzip) is about 64 GBytes.
Each file, as pure CSV when unzipped, is approx. 3.5 GB. I am assuming Spark may unzip each file in RAM and then convert it to Parquet in RAM?
I want to convert each csv.gz file to Parquet format, i.e. I want 160 Parquet files as output (ideally).
The task runs for a while and it seems to create one Parquet file for each csv.gz file. After some time it always fails with a YARN memory error.
I tried various settings for executor memory and memoryOverhead and they all result in no change - the job always fails. I tried memoryOverhead of up to 1-8 GB and executor memory of 8 GB.
Apart from manually breaking up the 160-file input workload into many smaller workloads, what else can I do?
Do I need a Spark cluster with a total RAM capacity of much greater than 64 GB?
I use 4 slave nodes, each with 8 CPUs and 16 GB of RAM, plus one master with 4 CPUs and 8 GB of RAM.
This is (with overhead) less than the 64 GB of input gzipped CSV files I am trying to process, but the files are evenly sized at 350-400 MB, so I don't understand why Spark is throwing memory errors: it could easily process them one file at a time per executor, discard each and move on to the next file. It does not appear to work this way. I feel it is trying to load all the input csv.gz files into memory, but I have no way of knowing (I am still new to Spark 2.3.1).
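One commonly suggested mitigation (not something from the original post, so treat it as a sketch): gzip is not splittable, so each .csv.gz file arrives as a single partition of roughly 3.5 GB uncompressed, and repartitioning after the read spreads those rows over more, smaller tasks before the Parquet write. Whether it actually helps depends on where the memory is going.

# Sketch: reuse the dfgz data frame read with spark.read.load above, but
# spread the rows over more partitions before writing Parquet. The target of
# 160 partitions (roughly one output file per input file) is an assumption.
(dfgz
 .repartition(160)
 .write
 .partitionBy("yyyymm")
 .mode("overwrite")
 .format("parquet")
 .option("path", "s3://mybucket/mytable_parquet")
 .saveAsTable("data_test.mytable"))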
Late Update: I managed to get it to work with the following memory config:
4 slave nodes, each 8 CPU and 16 GB of RAM
1 master node, 4 CPU and 8 GB of RAM:
spark maximizeResourceAllocation false
spark-defaults spark.driver.memoryOverhead 1g
spark-defaults spark.executor.memoryOverhead 2g
spark-defaults spark.executor.instances 8
spark-defaults spark.executor.cores 3
spark-defaults spark.default.parallelism 48
spark-defaults spark.driver.memory 6g
spark-defaults spark.executor.memory 6g
Needless to say - I cannot explain why this config worked!
Also, this took 2+ hours to process 64 GB of gzip data, which seems slow even for a small 4+1 node cluster with a total of 32+4 CPUs and 64+8 GB of RAM. Perhaps S3 was the bottleneck...
FWIW I just did not expect to micro-manage a database cluster for memory, disk I/O or CPU allocation.
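For what it's worth, the executor-side settings from that working configuration can also be applied when the SparkSession is created rather than in spark-defaults (a sketch; the driver memory settings are left out because the driver JVM is already running by the time this code executes, and the app name is just a placeholder):

from pyspark.sql import SparkSession

# Executor-side settings from the working spark-defaults configuration above,
# applied programmatically at session creation time.
spark = (SparkSession.builder
         .appName("csv-gz-to-parquet")                     # placeholder name
         .config("spark.executor.memory", "6g")
         .config("spark.executor.memoryOverhead", "2g")
         .config("spark.executor.instances", "8")
         .config("spark.executor.cores", "3")
         .config("spark.default.parallelism", "48")
         .getOrCreate())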
Update 2:
I just ran another load on the same cluster with the same config, a smaller load of 129 files of the same sizes, and this load failed with the same YARN memory errors.
I am very disappointed with Spark 2.3.1 memory management.
Thank you for any guidance

How to profile a web service?

I'm currently developing a practice application in node.js. The application consists of a JSON REST web service which offers two operations:
Insert log (a PUT request to /log, with the message to log)
Last 100 logs (a GET request to /log, that returns the latest 100 logs)
The current stack is formed by a node.js server that has the application logic and a mongodb database that takes care of the persistence. To offer the JSON REST web services I'm using the node-restify module.
I'm currently executing some stress tests using apache bench (using 5000 requests with a concurrency of 10) and get the following results:
Execute stress tests
1) Insert log
Requests per second: 754.80 [#/sec] (mean)
2) Last 100 logs
Requests per second: 110.37 [#/sec] (mean)
I'm surprised by the difference in performance, since the query I'm executing uses an index. Interestingly enough, in the deeper tests I have performed, the JSON output generation seems to take up most of the time.
Can node applications be profiled in detail?
Is this behaviour normal? Does retrieving data really take that much longer than inserting data?
EDIT:
Full test information
1) Insert log
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Server Software: log-server
Server Hostname: localhost
Server Port: 3010
Document Path: /log
Document Length: 0 bytes
Concurrency Level: 10
Time taken for tests: 6.502 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 2240634 bytes
Total PUT: 935000
HTML transferred: 0 bytes
Requests per second: 768.99 [#/sec] (mean)
Time per request: 13.004 [ms] (mean)
Time per request: 1.300 [ms] (mean, across all concurrent requests)
Transfer rate: 336.53 [Kbytes/sec] received
140.43 kb/s sent
476.96 kb/s total
Connection Times (ms)
              min  mean[+/-sd]  median   max
Connect:        0    0    0.1        0     3
Processing:     6   13    3.9       12    39
Waiting:        6   12    3.9       11    39
Total:          6   13    3.9       12    39
Percentage of the requests served within a certain time (ms)
50% 12
66% 12
75% 12
80% 13
90% 15
95% 24
98% 26
99% 30
100% 39 (longest request)
2) Last 100 logs
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Server Software: log-server
Server Hostname: localhost
Server Port: 3010
Document Path: /log
Document Length: 4601 bytes
Concurrency Level: 10
Time taken for tests: 46.528 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 25620233 bytes
HTML transferred: 23005000 bytes
Requests per second: 107.46 [#/sec] (mean)
Time per request: 93.057 [ms] (mean)
Time per request: 9.306 [ms] (mean, across all concurrent requests)
Transfer rate: 537.73 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd]  median   max
Connect:        0    0    0.1        0     1
Processing:    28   93   16.4       92   166
Waiting:       26   85   18.0       86   161
Total:         29   93   16.4       92   166
Percentage of the requests served within a certain time (ms)
50% 92
66% 97
75% 101
80% 104
90% 113
95% 121
98% 131
99% 137
100% 166 (longest request)
Retrieving data from the database
To query the database I use the mongoosejs module. The log schema is defined as:
{
date: { type: Date, 'default': Date.now, index: true },
message: String
}
and the query I execute is the following:
Log.find({}, ['message']).sort('date', -1).limit(100)
Can node applications be profiled in detail?
Yes. Use node --prof app.js to create a v8.log, then use linux-tick-processor, mac-tick-processor or windows-tick-processor.bat (in deps/v8/tools in the node src directory) to interpret the log. You have to build d8 in deps/v8 to be able to run the tick processor.
Here's how I do it on my machine:
# install the build tool needed to compile d8
apt-get install scons
# build d8 inside node's bundled v8 sources (required by the tick processor)
cd ~/development/external/node-0.6.12/deps/v8
scons arch=x64 d8
# run the app under the profiler; this writes a v8.log in the current directory
cd ~/development/projects/foo
node --prof app.js
# post-process the v8.log into a readable profile (D8_PATH points at the d8 build)
D8_PATH=~/development/external/node-0.6.12/deps/v8 ~/development/external/node-0.6.12/deps/v8/tools/linux-tick-processor > profile.log
There are also a few tools to make this easier, including node-profiler and v8-profiler (with node-inspector).
Regarding your other question, I would like some more information on how you fetch your data from Mongo, and what the data looks like (I agree with beny23 that it looks like a suspiciously low amount of data).
I strongly suggest taking a look at the DTrace support of Restify. It will likely become your best friend when profiling.
http://mcavage.github.com/node-restify/#DTrace

MySQL InnoDB optimisation

I'm having some trouble understanding InnoDB usage - we have a Drupal-based DB (5:1 read:write) running on MySQL (Server version: 5.1.41-3ubuntu12.10-log (Ubuntu)). Our current InnoDB data/index sizing is:
Current InnoDB index space = 196 M
Current InnoDB data space = 475 M
Looking around on the web and reading books like 'High Performance MySQL' suggests allowing a 10% increase on the data size - I have set the buffer pool to (data + index) + 10% and noticed that the buffer pool was at 100%... even increasing it beyond this to 896 MB still leaves it at 100% (even though the data + indexes are only ~671 MB).
I've attached the output of the InnoDB section of mysqlreport below. A 'Free' pages value of 1 also seems to suggest a major problem. The innodb_flush_method is set to its default - I will investigate setting it to O_DIRECT, but I want to sort out this issue first.
__ InnoDB Buffer Pool __________________________________________________
Usage   895.98M of 896.00M  %Used: 100.00
Read hit  100.00%
Pages
  Free            1             %Total:  0.00
  Data       55.96k                     97.59  %Drty: 0.01
  Misc         1383                      2.41
  Latched         0                      0.00
Reads       405.96M    1.2k/s
  From file  15.60k    0.0/s             0.00
  Ahead Rnd     211    0.0/s
  Ahead Sql    1028    0.0/s
Writes       29.10M   87.3/s
Flushes     597.58k    1.8/s
Wait Free         0      0/s
__ InnoDB Lock _________________________________________________________
Waits            66    0.0/s
Current           0
Time acquiring
  Total      3890 ms
  Average      58 ms
  Max        3377 ms
__ InnoDB Data, Pages, Rows ____________________________________________
Data
  Reads      21.51k    0.1/s
  Writes    666.48k    2.0/s
  fsync     324.11k    1.0/s
  Pending
    Reads         0
    Writes        0
    fsync         0
Pages
  Created    84.16k    0.3/s
  Read       59.35k    0.2/s
  Written   597.58k    1.8/s
Rows
  Deleted    19.13k    0.1/s
  Inserted    6.13M   18.4/s
  Read      196.84M  590.6/s
  Updated   139.69k    0.4/s
Any help on this would be greatly appreciated.
Thanks!