MPI executable fails for ArchLinux on Termux - configuration

Why do I see the following error messages when executing mpirun on ArchLinux for Termux?
The same program runs on plain Termux without any glitches.
[root@localhost home]# mpirun --allow-run-as-root
[localhost:06773] opal_ifinit: ioctl(SIOCGIFHWADDR) failed with errno=13
[localhost:06773] pmix_ifinit: ioctl(SIOCGIFHWADDR) failed with errno=13
[localhost:06773] oob_tcp: problems getting address for index 83376 (kernel index -1)
--------------------------------------------------------------------------
No network interfaces were found for out-of-band communications. We require
at least one available network for out-of-band messaging.
--------------------------------------------------------------------------
A Google search reveals these links:
https://groups.google.com/g/trans-abyss/c/r5Z6w7BoNe4
https://users.open-mpi.narkive.com/ralQnMWY/ompi-users-no-network-interfaces-were-found-for-out-of-band-communications
https://users.open-mpi.narkive.com/kheNxePO/ompi-users-general-question-about-running-single-node-jobs
(Setting export OMPI_MCA_oob=^tcp removes the TCP-related error.)
https://github.com/open-mpi/ompi/issues/6960
https://www.mail-archive.com/users@lists.open-mpi.org/msg32661.html
However, none of them appear to have a relevant solution.
Output of ifconfig and ip addr on ArchLinux for Termux:
dummy0: flags=195<UP,BROADCAST,RUNNING,NOARP> mtu 1500
inet6 fe80::38a0:1bff:fe81:d4f5 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 0 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3 bytes 210 (210.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 0 (UNSPEC)
RX packets 17247 bytes 2062939 (1.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 17247 bytes 2062939 (1.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
p2p0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
rmnet_data0: flags=65<UP,RUNNING> mtu 1500
inet 10.140.58.138 netmask 255.255.255.252
inet6 fe80::93a5:ad99:4660:adc4 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 416796 bytes 376287723 (358.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 318293 bytes 69933666 (66.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
rmnet_data7: flags=65<UP,RUNNING> mtu 2000
inet6 fe80::a6b7:c914:44de:639 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 8 bytes 620 (620.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10 bytes 752 (752.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
rmnet_ipa0: flags=65<UP,RUNNING> mtu 2000
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 222785 bytes 381290027 (363.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 318303 bytes 69934418 (66.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wlan0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 650238 bytes 739939859 (705.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 408284 bytes 63728624 (60.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 3a:a0:1b:81:d4:f5 brd ff:ff:ff:ff:ff:ff
inet6 fe80::38a0:1bff:fe81:d4f5/64 scope link
valid_lft forever preferred_lft forever
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default
link/sit 0.0.0.0 brd 0.0.0.0
4: rmnet_ipa0: <UP,LOWER_UP> mtu 2000 qdisc pfifo_fast state UNKNOWN group default qlen 1000
link/[530]
5: rmnet_data0: <UP,LOWER_UP> mtu 1500 qdisc htb state UNKNOWN group default qlen 1000
link/[530]
inet 10.140.58.138/30 scope global rmnet_data0
valid_lft forever preferred_lft forever
inet6 fe80::93a5:ad99:4660:adc4/64 scope link
valid_lft forever preferred_lft forever
6: rmnet_data1: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
7: rmnet_data2: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
8: rmnet_data3: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
9: rmnet_data4: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
10: rmnet_data5: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
11: rmnet_data6: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
12: rmnet_data7: <UP,LOWER_UP> mtu 2000 qdisc htb state UNKNOWN group default qlen 1000
link/[530]
inet6 fe80::a6b7:c914:44de:639/64 scope link
valid_lft forever preferred_lft forever
13: r_rmnet_data0: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
14: r_rmnet_data1: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
15: r_rmnet_data2: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
16: r_rmnet_data3: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
17: r_rmnet_data4: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
18: r_rmnet_data5: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
19: r_rmnet_data6: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
20: r_rmnet_data7: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
21: r_rmnet_data8: <> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/[530]
22: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 04:d1:3a:07:70:4b brd ff:ff:ff:ff:ff:ff
23: p2p0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 06:d1:3a:07:70:4b brd ff:ff:ff:ff:ff:ff

I faced this problem on a laptop with Fedora 34: OpenMPI only worked with an active Wi-Fi connection. The problem was solved by adding the name of the loopback network interface for out-of-band communication to the system-wide OpenMPI settings file /etc/openmpi-x86_64/openmpi-mca-params.conf:
oob_tcp_if_include = lo
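The same MCA parameter can also be set per shell or per invocation instead of system-wide, using OpenMPI's standard OMPI_MCA_* environment variable and --mca flag conventions (a hedged sketch; your_program is a placeholder):

export OMPI_MCA_oob_tcp_if_include=lo
# or, for a single run:
mpirun --mca oob_tcp_if_include lo --allow-run-as-root -np 2 ./your_program

This is plausibly relevant to the Termux case above as well, since lo is the only interface that is up with an address while wlan0 and p2p0 have no carrier.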

Related

HaProxy Configurations Error while setting up multi proc environment

I want to handle 6000 requests/sec and am trying to set up HAProxy with multi-process settings, but I am getting this error:
cpu-map expects a process number including 'all', 'odd', 'even', or a
number from 1 to 64, followed by a list of CPU ranges with numbers
from 0 to 63
Following are the configurations I'm using.
global
daemon
maxconn 200000
maxsslconn 200000
#stats socket /run/haproxy/admin.sock mode 660 level admin
stats socket 127.0.0.1:14567
nbproc 6
cpu-map auto:1/all 0
cpu-map auto:2/all 1
cpu-map auto:3/all 2
cpu-map auto:4/all 3
cpu-map auto:5/all 4
cpu-map auto:6/all 5
stats bind-process 6
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL). This list is from:
# https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3
tune.ssl.default-dh-param 2048
and
global
daemon
maxconn 200000
maxsslconn 200000
#stats socket /run/haproxy/admin.sock mode 660 level admin
stats socket 127.0.0.1:14567
nbproc 6
cpu-map 1 0
cpu-map 2 1
cpu-map 3 2
cpu-map 4 3
cpu-map 5 4
cpu-map 6 5
stats bind-process 6
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL). This list is from:
# https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3
tune.ssl.default-dh-param 2048
Nothing is working as expected.
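For what it's worth, the grammar quoted in the error message only accepts a plain process designator ('all', 'odd', 'even', or a number) followed by CPU ranges; the auto:X/all form is thread-aware syntax that, to my knowledge, only exists in HAProxy 1.8 and later. A hedged sketch matching the older grammar (check haproxy -v before relying on this):

global
    nbproc 6
    cpu-map 1 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3
    cpu-map 5 4
    cpu-map 6 5
    # or pin every process across a CPU range:
    # cpu-map all 0-5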

MySQL TokuDB engine using too much CPU

I have converted the tables of a database from InnoDB to TokuDB, and I noticed that with TokuDB reads use far more CPU. Why is this?
To be more specific, the server with the TokuDB tables is a replication slave of an InnoDB server that is part of a PXC cluster. The slave runs regular Percona Server, not PXC, yet it seems to use far too much CPU and I do not know why.
Below is my my.cnf config:
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
thp-setting=never
socket = /var/run/mysqld/mysqld.sock
nice = 0
flush_caches
numa_interleave
core-file-size = unlimited
open_files_limit = 1024
[mysqld]
back_log = 65535
bind-address = 0.0.0.0
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
core_file
basedir = /usr
datadir = /var/lib/mysql
#default_storage_engine = InnoDB
enforce-gtid-consistency = 1
expand_fast_index_creation = 1
expire_logs_days = 7
gtid_mode = ON
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 1
innodb_buffer_pool_populate = 1
innodb_buffer_pool_size = 512M
innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_force_recovery = 1
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 64M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_support_xa = FALSE
innodb_write_io_threads = 64
lc-messages-dir = /usr/share/mysql
log-bin = mysqld-bin
log-queries-not-using-indexes
log-slave-updates
long_query_time = 1
master_info_repository = TABLE
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 2500
max_user_connections = 2550
min_examined_row_limit = 1000
open_files_limit = 1024
port = 3306
relay_log_info_repository = TABLE
relay-log-recovery = TRUE
relay-log-recovery = 1
skip-external-locking
skip-name-resolve
slave_parallel_workers = 8
slow_query_log = 1
slow_query_log_timestamp_always = 1
socket = /var/run/mysqld/mysqld.sock
table_open_cache = 4096
thread_cache = 1024
tmpdir = /srv/tmp
transaction_isolation = REPEATABLE-READ
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
server-id = 2
# TokuDB fine tuning
default_storage_engine = TokuDB
tokudb_analyze_time = 5
#tokudb_cache_size = 6G
tokudb_directio = 1
tokudb_commit_sync = 0
tokudb_fsync_log_period = 1000
tokudb_load_save_space = 1
tokudb_alter_print_error = 0
tokudb_block_size = 4MB
tokudb_bulk_fetch = 1
tokudb_disable_slow_alter = 1
tokudb_last_lock_timeout = empty
tokudb_row_format = tokudb_quicklz
#tokudb_data_dir = /var/lib/tokudb
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[mysql]
#no-auto-rehash # faster start of mysql but no tab completion
[isamchk]
key_buffer = 16M
!includedir /etc/mysql/conf.d/
The following replication messages were reported by our monitoring system xymon when tokudb_cache_size was initially set to 80% of total RAM.
2016-02-25 16:42:04 9604 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=db-kdb-slave-6-relay-bin' to avoid this problem.
2016-02-25 16:42:05 9604 [Warning] Recovery from master pos 552554502 and file mysqld-bin.001163. Previous relay log pos and relay log file had been set to 552554714, ./db-kdb-slave-6-relay-bin.002933 respectively.
2016-02-25 16:42:05 9604 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
------More info about the Master server running InnoDB and part of PXC-----------
## Results from top
top - 10:05:12 up 14 days, 7:56, 2 users, load average: 2.16, 2.31, 2.39
Tasks: 413 total, 1 running, 412 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.9 us, 0.6 sy, 0.0 ni, 89.9 id, 0.3 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem: 65704012 total, 63553216 used, 2150796 free, 169832 buffers
KiB Swap: 975868 total, 809892 used, 165976 free. 16304268 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2485 mysql 20 0 60.146g 0.045t 2.612g S 314.9 73.3 27762:43 mysqld
## disk info
george@db-erp-3:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 32G 8.0K 32G 1% /dev
tmpfs 6.3G 1.2M 6.3G 1% /run
/dev/sda2 274G 2.1G 258G 1% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 32G 0 32G 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/nvme0n1p1 1.1T 542G 503G 52% /srv
na1:/vol/yphome 4.5T 3.7T 875G 82% /net/account
## Memory info
george@db-erp-3:~$ free -g
total used free shared buffers cached
Mem: 62 60 2 0 0 15
-/+ buffers/cache: 44 17
Swap: 0 0 0
george@db-erp-3:~$
## Database info
+--------------------+----------------------+
| Data Base Name | Data Base Size in MB |
+--------------------+----------------------+
| information_schema | 0.00976563 |
| dberp | 347143.32031250 |
| mysql | 2.11562061 |
| performance_schema | 0.00000000 |
+--------------------+----------------------+
4 rows in set (0.13 sec)
+--------------------+----------------------+------------------+
| Data Base Name | Data Base Size in MB | Free Space in MB |
+--------------------+----------------------+------------------+
| information_schema | 0.00976563 | 0.00000000 |
| dberp | 347143.32031250 | 6270.00000000 |
| mysql | 2.11562061 | 4.00199127 |
| performance_schema | 0.00000000 | 0.00000000 |
+--------------------+----------------------+------------------+
4 rows in set (0.03 sec)
Your CPU usage will be higher for reads because TokuDB data must be decompressed before it can be used. Also, if this slave is replicating any activity from the master, then it is also doing compression for the insert/update/delete traffic.
A couple of ideas:
1. Reduce the value of tokudb_block_size. While 4MB is great for compression, it means that your point queries must decompress far more data than they need. Try 256KB and see how CPU usage and performance change. You might have to rebuild your slave to accomplish this easily (I'm now over a year away from working at TokuDB).
2. Look at your tokudb_cache_size. It defaults to 50% of RAM, but if nothing else runs on this server you should raise it to somewhere between 75% and 80%. That means fewer reads and less decompression, since more data will be in your cache.
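A hedged my.cnf sketch of both suggestions, with illustrative values for a dedicated 64 GB replica like the hardware shown above (not prescriptive; re-measure after each change):

# 1. smaller blocks: point queries decompress less data
tokudb_block_size = 262144   # 256 KB instead of 4 MB; applies to rebuilt tables only
# 2. bigger cache: fewer reads, less decompression
tokudb_cache_size = 48G      # ~75% of 64 GB on a dedicated host

Since tokudb_block_size only takes effect for newly (re)built tables, this lines up with the note above about rebuilding the slave.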

How can I improve performance on DRF with high CPU time

I have a REST API built with DRF, and I am already seeing a performance hit with 100 objects and a single user requesting (me, testing).
When requesting the more complex query, I get these results for CPU, always 5-10 s:
Resource           Value
User CPU time      5987.089 msec
System CPU time    463.929 msec
Total CPU time     6451.018 msec
Elapsed time       6800.938 msec
Context switches   9 voluntary, 773 involuntary
but the SQL queries stay below 100 ms.
The simpler queries show similar behaviour, with CPU times around 1 s and query times around 20 ms.
So far, this is what I have tried:
I am doing select_related() and prefetch_related(), which improved the query time but not the CPU time.
I am using Imagekit to generate pictures, stored on S3. I removed the whole specification to test, and this had only minor impact.
I run a method field to fetch user-specific data. Removing this also had only minor impact.
I have checked the log files on the backend and nothing specific shows up there.
Backend is Nginx - supervisord - gunicorn - postgresql - django 1.8.1
Here are the serializer and view:
class ParticipationOrganizationSerializer(ModelSerializer):
    organization = OrganizationSerializer(required=False, read_only=True)
    bookmark = SerializerMethodField(
        required=False,
        read_only=True,
    )
    location_map = LocationMapSerializer(
        required=False,
        read_only=True,
    )

    class Meta:
        model = Participation
        fields = (
            'id',
            'slug',
            'organization',
            'location_map',
            'map_code',
            'partner',
            'looking_for',
            'complex_profile',
            'bookmark',
            'confirmed',
        )
        read_only_fields = (
            'id',
            'slug',
            'organization',
            'location_map',
            'map_code',
            'partner',
            'bookmark',
            'confirmed',
        )

    def get_bookmark(self, obj):
        request = self.context.get('request', None)
        if request is not None:
            if request.user.is_authenticated():
                # print(obj.bookmarks.filter(author=request.user).count())
                try:
                    bookmark = obj.bookmarks.get(author=request.user)
                    # bookmark = Bookmark.objects.get(
                    #     author=request.user,
                    #     participation=obj,
                    # )
                    return BookmarkSerializer(bookmark).data
                except Bookmark.DoesNotExist:
                    # We have nothing yet
                    return None
                except Bookmark.MultipleObjectsReturned:
                    # This should not happen, but in case it does, delete all
                    # the bookmarks for safety reasons.
                    Bookmark.objects.filter(
                        author=request.user,
                        participation=obj,
                    ).delete()
                    return None
        return None


class ParticipationOrganizationViewSet(ReadOnlyModelViewSet):
    """
    A readonly ViewSet for viewing participations of a certain event.
    """
    serializer_class = ParticipationOrganizationSerializer
    queryset = Participation.objects.all().select_related(
        'location_map',
        'organization',
        'organization__logo_image',
    ).prefetch_related(
        'bookmarks',
    )
    lookup_field = 'slug'

    def get_queryset(self):
        event_slug = self.kwargs['event_slug']
        # Filter for the current event
        # Filter to show only the confirmed participations
        participations = Participation.objects.filter(
            event__slug=event_slug,
            confirmed=True,
        ).select_related(
            'location_map',
            'organization',
            'organization__logo_image',
        ).prefetch_related(
            'bookmarks',
        )
        # Filter on partners? This is a parameter passed on in the url
        partners = self.request.query_params.get('partners', None)
        if partners == "true":
            participations = participations.filter(partner=True)
        return participations

    # http://stackoverflow.com/questions/22616973/django-rest-framework-use-different-serializers-in-the-same-modelviewset
    def get_serializer_class(self):
        if self.action == 'list':
            return ParticipationOrganizationListSerializer
        if self.action == 'retrieve':
            return ParticipationOrganizationSerializer
        return ParticipationOrganizationListSerializer
Any help is very much appreciated!
update
I dumped the data to my local machine and I am observing similar times. I guess this rules out the whole production setup (nginx, gunicorn)?
update 2
Here are the results of the profiler.
Also, I made some progress in improving the speeds by:
simplifying my serializers
doing the tests with curl, with the Debug Toolbar off
ncalls tottime percall cumtime percall filename:lineno(function)
0 0 0 profile:0(profiler)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/views.py:442(dispatch)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/viewsets.py:69(view)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/django/views/decorators/csrf.py:57(wrapped_view)
1 0 0 3.44 3.44 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/mixins.py:39(list)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:605(to_representation)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:225(data)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:672(data)
344/114 0.015 0 3.318 0.029 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:454(to_representation)
805 0.01 0 2.936 0.004 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:1368(to_representation)
2767 0.013 0 2.567 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py:166(send)
2070 0.002 0 2.52 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/registry.py:52(existence_required_receiver)
2070 0.005 0 2.518 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/registry.py:55(_receive)
2070 0.004 0 2.513 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/utils.py:147(call_strategy_method)
2070 0.002 0 2.508 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/strategies.py:14(on_existence_required)
2070 0.005 0 2.506 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:86(generate)
2070 0.002 0 2.501 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:109(generate)
2070 0.003 0 2.499 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:94(generate_now)
2070 0.01 0 2.496 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:65(get_state)
690 0.001 0 2.292 0.003 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:148(__nonzero__)
690 0.005 0 2.291 0.003 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:124(__bool__)
2070 0.007 0 2.276 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:112(_exists)
2070 0.01 0 2.269 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:409(exists)
4140 0.004 0 2.14 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:282(entries)
1633 0.003 0 2.135 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:288()
1633 0.001 0 2.129 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucketlistresultset.py:24(bucket_lister)
2 0 0 2.128 1.064 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucket.py:390(_get_all)
2 0 0 2.128 1.064 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucket.py:426(get_all_keys)
1331 0.003 0 1.288 0.001 /usr/lib/python2.7/ssl.py:335(recv)
1331 1.285 0.001 1.285 0.001 /usr/lib/python2.7/ssl.py:254(read)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:886(_mexe)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/connection.py:643(make_request)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:1062(make_request)
2 0.004 0.002 0.896 0.448 /usr/lib/python2.7/httplib.py:585(_read_chunked)
2 0 0 0.896 0.448 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:393(read)
2 0 0 0.896 0.448 /usr/lib/python2.7/httplib.py:540(read)
166 0.002 0 0.777 0.005 /usr/lib/python2.7/httplib.py:643(_safe_read)
166 0.005 0 0.775 0.005 /usr/lib/python2.7/socket.py:336(read)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:793(send)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:998(_send_request)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:820(_send_output)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:977(request)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:962(endheaders)
1 0 0 0.567 0.567 /usr/lib/python2.7/httplib.py:1174(connect)
1380 0.001 0 0.547 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:82(url)
1380 0.007 0 0.546 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:72(_storage_attr)
105 0.009 0 0.528 0.005 /usr/lib/python2.7/socket.py:406(readline)
2 0 0 0.413 0.207 /usr/lib/python2.7/httplib.py:408(begin)
2 0 0 0.413 0.207 /usr/lib/python2.7/httplib.py:1015(getresponse)
2 0 0 0.407 0.203 /usr/lib/python2.7/httplib.py:369(_read_status)
2750 0.003 0 0.337 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:399(get_attribute)
1 0.223 0.223 0.335 0.335 /usr/lib/python2.7/socket.py:537(create_connection)
2865 0.012 0 0.334 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:65(get_attribute)
1610 0.005 0 0.314 0 /home/my_app/.virtualenvs/my_app/src/django-s3-folder-storage/s3_folder_storage/s3.py:13(url)
1610 0.012 0 0.309 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:457(url)
690 0.005 0 0.292 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/models/fields/utils.py:10(__get__)
690 0.007 0 0.251 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:20(__init__)
2 0 0 0.248 0.124
>>>> cutting here, low impact calls
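Reading the profile, nearly all of the 3.4 s is spent under imagekit's cache-file existence checks, which hit S3 through boto (the get_state/_exists/exists/bucket_lister frames). If that diagnosis holds, a hedged option is django-imagekit's optimistic cache-file strategy, which trusts that a generated image exists instead of asking the storage backend on every serialization. A minimal sketch:

# settings.py - hedged sketch: stop checking S3 for every rendered image;
# the Optimistic strategy assumes cache files exist once generated
IMAGEKIT_DEFAULT_CACHEFILE_STRATEGY = 'imagekit.cachefiles.strategies.Optimistic'

The trade-off is that a cache file that was never generated shows up as a broken URL instead of being created on access.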

Percona Xtradb Cluster failing

I have set up a Percona XtraDB Cluster with 3 nodes. The first node starts fine with bootstrap, but when I try to start the second node to join the cluster, I get the following error:
2015-08-27 18:08:08 25990 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (a6b3fced-4ca1-11e5-b5da-d69fa186273c): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():463. IST will be unavailable.
2015-08-27 18:08:08 25990 [Note] WSREP: Member 0.0 (db-gc-pxc2) requested state transfer from 'any'. Selected 1.0 (db-gc-pxc1)(SYNCED) as donor.
2015-08-27 18:08:08 25990 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
2015-08-27 18:08:08 25990 [Note] WSREP: Requesting state transfer: success, donor: 1
2015-08-27 18:08:08 25990 [Warning] WSREP: 1.0 (db-gc-pxc1): State transfer to 0.0 (db-gc-pxc2) failed: -12 (Cannot allocate memory)
2015-08-27 18:08:08 25990 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():731: Will never receive state. Need to abort.
2015-08-27 18:08:08 25990 [Note] WSREP: gcomm: terminating thread
2015-08-27 18:08:08 25990 [Note] WSREP: gcomm: joining thread
2015-08-27 18:08:08 25990 [Note] WSREP: gcomm: closing backend
Below is my cluster config in my.cnf file:
# Galera Config
wsrep_cluster_name = pxc
wsrep_cluster_address = gcomm://192.168.2.100,192.168.2.101,10.168.1.102
wsrep_node_address = 10.1.0.101
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "gcache.size=4G"
wsrep_slave_threads = 32
wsrep_sst_auth = "user:userpass"
wsrep_node_name = node2
#wsrep_sst_method = xtrabackup_throttle
wsrep_sst_method = xtrabackup-v2
What would be causing this error?
FYI, I do have the user and password for wsrep_sst_auth created in the database.
Here is the remainder of the my.cnf if it helps:
back_log = 65535
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
datadir = /var/lib/mysql
#default_storage_engine = InnoDB
expand_fast_index_creation = 1
expire_logs_days = 7
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 6
innodb_buffer_pool_populate = 1
innodb_buffer_pool_size = 6G # XXX 64GB RAM, 80%
#innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_locks_unsafe_for_binlog = 1
#innodb_log_file_size = 64M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_support_xa = FALSE
innodb_write_io_threads = 64
log-bin = mysqld-bin
log-queries-not-using-indexes
log-slave-updates
long_query_time = 1
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 4096
min_examined_row_limit = 1000
performance-schema-instrument='%=ON'
port = 3306
relay-log-recovery = TRUE
skip-name-resolve
slow_query_log = 1
slow_query_log_timestamp_always = 1
table_open_cache = 4096
thread_cache = 1024
tmpdir = /srv/tmp
transaction_isolation = REPEATABLE-READ
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
This would seem to be the root cause:
2015-08-27 18:08:08 25990 [Warning] WSREP: 1.0 (db-gc-pxc1): State transfer to 0.0 (db-gc-pxc2) failed: -12 (Cannot allocate memory)
The new node tries to join the cluster. It has no state yet (its local UUID is all zeroes), so an IST is not available; it therefore needs a full SST from the donor node.
Node pxc2 is the joiner and pxc1 is the selected donor; however, pxc1 reports that the state transfer failed, which causes the join to fail.
You should check the logs on the donor node (pxc1) for more detail, but the log we have indicates it had insufficient memory to run the export of the database. Not knowing your hardware configuration, I can't give a definitive answer, but most likely your my.cnf is configured to be too memory hungry for the available RAM, so the xtrabackup process cannot run; or else the database is simply too large. Add more memory to the node, or reduce the allocations in my.cnf.
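A hedged sketch of the kind of trimming meant here, applied on the donor (pxc1); the exact numbers depend on how much RAM that host has and what else runs on it:

# leave headroom on the donor for the xtrabackup SST process
wsrep_provider_options = "gcache.size=1G"   # for example, down from 4G
innodb_buffer_pool_size = 4G                # size this to leave a few GB free

After changing these, restart the donor and retry the SST while watching free and the donor's error log.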

What is using all the memory on my production server (apache + mysql + rails)?

I am running an EC2 small instance as my production server. It has 1.7 GB of memory, and I noticed it uses almost all of it. However, when I check the top output, it looks like only about 30% is actually used. Did I misread the top output?
Here is the top output (sorted by %MEM):
top - 21:33:15 up 141 days, 9:39, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 81 total, 2 running, 79 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1747660k total, 1733580k used, 14080k free, 224144k buffers
Swap: 917496k total, 132k used, 917364k free, 1144808k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11664 mysql 15 0 794m 83m 5020 S 0.0 4.9 0:17.34 mysqld
12845 nobody 25 0 52416 38m 3200 S 0.0 2.3 0:02.10 ruby1.8
12847 nobody 16 0 52704 38m 2068 S 0.0 2.2 0:02.08 ruby1.8
12023 www-data 15 0 37692 10m 4164 S 0.0 0.6 0:01.28 apache2
11979 www-data 15 0 37660 10m 4172 S 0.0 0.6 0:01.24 apache2
12020 www-data 15 0 37708 10m 4120 S 0.0 0.6 0:01.17 apache2
12263 www-data 15 0 37708 10m 4176 S 0.0 0.6 0:00.83 apache2
11989 www-data 15 0 37720 10m 4024 S 0.0 0.6 0:01.28 apache2
12014 www-data 15 0 37468 10m 4172 S 0.0 0.6 0:01.17 apache2
12021 www-data 15 0 37652 10m 3992 S 0.0 0.6 0:01.25 apache2
12054 www-data 15 0 37480 10m 4176 S 0.0 0.6 0:01.33 apache2
11990 www-data 15 0 37448 10m 4188 S 0.0 0.6 0:01.16 apache2
12024 www-data 16 0 37416 10m 4172 S 0.0 0.6 0:01.00 apache2
11991 www-data 15 0 37432 10m 4148 S 0.0 0.6 0:01.24 apache2
11984 www-data 15 0 37444 9.8m 3972 S 0.0 0.6 0:01.33 apache2
11985 www-data 15 0 37444 9.8m 3948 S 0.0 0.6 0:01.18 apache2
11982 www-data 15 0 37408 9.8m 3968 S 0.0 0.6 0:01.12 apache2
12013 www-data 17 0 37432 9.8m 4152 S 0.0 0.6 0:01.19 apache2
12052 www-data 15 0 37176 9.8m 4180 S 0.0 0.6 0:01.29 apache2
11981 www-data 15 0 37172 9.8m 4168 S 0.0 0.6 0:01.40 apache2
12395 www-data 15 0 37420 9988 3972 S 0.0 0.6 0:00.72 apache2
12015 www-data 15 0 37412 9972 3900 S 0.0 0.6 0:01.31 apache2
11987 www-data 15 0 37160 9956 4136 S 0.0 0.6 0:01.22 apache2
12022 www-data 15 0 37140 9900 4140 S 0.0 0.6 0:01.20 apache2
12051 www-data 15 0 37216 9848 3976 S 0.0 0.6 0:01.31 apache2
11978 www-data 18 0 36948 9784 4180 S 0.0 0.6 0:01.08 apache2
11975 www-data 15 0 37140 9772 3972 S 0.0 0.6 0:01.49 apache2
12019 www-data 15 0 37148 9752 3944 S 0.0 0.6 0:01.08 apache2
11970 www-data 15 0 36920 9736 4160 S 0.0 0.6 0:01.25 apache2
11974 www-data 15 0 36848 9656 4148 S 0.0 0.6 0:01.53 apache2
11973 www-data 15 0 36924 9552 3972 S 0.0 0.5 0:01.19 apache2
28622 root 18 0 35232 9232 5592 S 0.0 0.5 0:00.30 apache2
11969 www-data 15 0 36340 9132 4136 S 0.0 0.5 0:01.51 apache2
12018 www-data 19 0 36332 9124 4136 S 0.0 0.5 0:01.32 apache2
11972 www-data 15 0 36320 8968 3988 S 0.0 0.5 0:01.33 apache2
12012 www-data 15 0 35796 8600 4144 S 0.0 0.5 0:01.11 apache2
11965 root 15 0 17356 7552 1644 S 0.0 0.4 0:00.13 ruby1.8
12848 root 15 0 8384 2744 2164 R 0.0 0.2 0:00.12 sshd
12762 root 15 0 8384 2724 2164 S 0.0 0.2 0:00.01 sshd
11302 postfix 18 0 6184 2576 1880 S 0.0 0.1 0:00.02 tlsmgr
11964 root 16 0 8188 2248 1492 S 0.0 0.1 0:00.06 ApplicationPool
23997 postfix 22 0 5856 1852 1488 S 0.0 0.1 0:00.22 qmgr
12850 root 15 0 4408 1848 1436 S 0.0 0.1 0:00.00 bash
12764 root 25 0 4396 1800 1400 S 0.0 0.1 0:00.00 bash
23996 root 15 0 5804 1780 1428 S 0.0 0.1 0:01.01 master
13036 postfix 17 0 5812 1684 1356 S 0.0 0.1 0:00.00 pickup
1051 klog 18 0 2884 1676 436 S 0.0 0.1 0:00.04 klogd
13035 root 15 0 2468 1164 916 R 0.0 0.1 0:00.01 top
5841 nobody 15 0 2652 1120 684 S 0.0 0.1 0:00.50 memcached
11509 root 15 0 5456 1068 676 S 0.0 0.1 0:00.00 sshd
1163 root 18 0 3560 1060 872 S 0.0 0.1 0:01.46 cron
1 root 18 0 2032 840 580 S 0.0 0.0 0:04.20 init
4070 syslog 18 0 2056 732 568 S 0.0 0.0 7:25.48 syslogd
908 root 16 -2 2292 656 528 S 0.0 0.0 0:00.06 dhclient3
The 'used' count includes the filesystem cache and kernel buffers. Cached memory can be freed when an application requires more heap. You are right to say that only about 30% is actually used, since roughly 65% of the 'used' figure is cache and 12% is buffers.
The kernel releases cached memory when an application attempts to allocate more; this is normal behaviour, and I see no problem with your memory usage.
You only have a problem when you are using significant amounts of swap and your 'cached' count is very low.
Some additional helpful information here (applicable to any Linux distro) -
http://forums.gentoo.org/viewtopic.php?t=175419
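To make the arithmetic concrete with the numbers from the top output above, converted to MB, this is roughly what free -m would print; the '-/+ buffers/cache' row is the application-level usage being described:

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1706       1693         13          0        218       1117
-/+ buffers/cache:        356       1350
Swap:          895          0        895

Only 356 MB of the 1693 MB 'used' is real application memory; the rest is reclaimable cache and buffers.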
Mem: 1747660k total, 1733580k used, 14080k free, 224144k buffers
Compare the total and the used :-)
It is mostly used for file buffering. There is nothing wrong with this, since good memory management should always use all available memory in the system. I don't remember exactly, but I think the 1144808k cached is the memory you can't find.
You can try writing a simple application that reserves about 1 GB of memory, releases it, and quits. You should then see roughly 1 GB counted as free memory, since file buffers will have been evicted to satisfy the allocation.
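A minimal sketch of that experiment in Python (hedged: exact accounting varies by kernel and allocator; the per-page writes make sure pages are actually committed, not just reserved):

import time

SIZE = 1024 * 1024 * 1024           # ~1 GB
buf = bytearray(SIZE)               # reserve the address space
for i in range(0, SIZE, 4096):      # touch one byte per 4 KB page
    buf[i] = 1                      # forces the kernel to commit the page
time.sleep(30)                      # compare `free` in another terminal
del buf                             # released on exit as well

While it sleeps, the 'cached' figure in free should shrink as the kernel evicts file cache to satisfy the allocation; after the process exits, the memory shows up as free again.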