The effect of the loss-weighting parameter in a multi-task learning framework - deep-learning

I have designed a multi-task network where the first layers are shared between two output layers. While reading up on multi-task learning principles, I learned that the two losses from the two output heads are usually combined with a scalar weight, say alpha. My question is about this parameter itself. Does it affect the model's final performance? Probably yes.
This is the part of my code that computes the losses:
...
mtl_loss = (alpha) * loss_1 + (1-alpha) * loss_2
mtl_loss.backward()
...
Above, loss_1 is an MSELoss and loss_2 is a CrossEntropyLoss. With alpha=0.9, I'm getting the following loss values during training:
[2020-05-03 04:46:55,398 INFO] Step 50/150000; loss_1: 0.90 + loss_2: 1.48 = mtl_loss: 2.43 (RMSE: 2.03, F1score: 0.07); lr: 0.0000001; 29 docs/s; 28 sec
[2020-05-03 04:47:23,238 INFO] Step 100/150000; loss_1: 0.40 + loss_2: 1.27 = mtl_loss: 1.72 (RMSE: 1.38, F1score: 0.07); lr: 0.0000002; 29 docs/s; 56 sec
[2020-05-03 04:47:51,117 INFO] Step 150/150000; loss_1: 0.12 + loss_2: 1.19 = mtl_loss: 1.37 (RMSE: 0.81, F1score: 0.08); lr: 0.0000003; 29 docs/s; 84 sec
[2020-05-03 04:48:19,034 INFO] Step 200/150000; loss_1: 0.04 + loss_2: 1.10 = mtl_loss: 1.20 (RMSE: 0.55, F1score: 0.07); lr: 0.0000004; 29 docs/s; 112 sec
[2020-05-03 04:48:46,927 INFO] Step 250/150000; loss_1: 0.02 + loss_2: 0.96 = mtl_loss: 1.03 (RMSE: 0.46, F1score: 0.08); lr: 0.0000005; 29 docs/s; 140 sec
[2020-05-03 04:49:14,851 INFO] Step 300/150000; loss_1: 0.02 + loss_2: 0.99 = mtl_loss: 1.05 (RMSE: 0.43, F1score: 0.08); lr: 0.0000006; 29 docs/s; 167 sec
[2020-05-03 04:49:42,793 INFO] Step 350/150000; loss_1: 0.02 + loss_2: 0.97 = mtl_loss: 1.04 (RMSE: 0.43, F1score: 0.08); lr: 0.0000007; 29 docs/s; 195 sec
[2020-05-03 04:50:10,821 INFO] Step 400/150000; loss_1: 0.01 + loss_2: 0.94 = mtl_loss: 1.00 (RMSE: 0.41, F1score: 0.08); lr: 0.0000008; 29 docs/s; 223 sec
[2020-05-03 04:50:38,943 INFO] Step 450/150000; loss_1: 0.01 + loss_2: 0.86 = mtl_loss: 0.92 (RMSE: 0.40, F1score: 0.08); lr: 0.0000009; 29 docs/s; 252 sec
As the training loss shows, the first network, which uses MSELoss, converges very quickly, while the second network has not converged yet. RMSE and F1 score are the two metrics I use to track the progress of the first and second network, respectively.
I know that picking the optimal alpha is somewhat empirical, but are there hints that make the process of picking it easier? Specifically, I want the two networks to train in step with each other, not like above where the first network converges far ahead of the second. Can the alpha parameter help control this?

With that alpha, loss_1 contributes more to the total loss, and since backpropagation updates the weights in proportion to the error, the first task improves faster. Try a more balanced alpha to even out the performance on both tasks.
You can also change alpha during training.
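One practical option, since the MSE head races ahead: start with a high alpha and anneal it toward a more balanced value as training progresses, so loss_2 gradually receives more weight. A minimal sketch; the linear schedule and the start/end values are my own illustrative choices, not from the question:

```python
# Hedged sketch: linearly decay alpha so the slower task gains weight.
# alpha_start/alpha_end are illustrative assumptions, not tuned values.
def alpha_schedule(step, total_steps, alpha_start=0.9, alpha_end=0.5):
    """Linearly decay alpha from alpha_start to alpha_end over training."""
    frac = min(step / total_steps, 1.0)
    return alpha_start + (alpha_end - alpha_start) * frac

def mtl_loss(loss_1, loss_2, alpha):
    # Same convex combination as the question's snippet.
    return alpha * loss_1 + (1 - alpha) * loss_2

# Early on the fast-converging MSE task dominates; by the last step both
# tasks are weighted (nearly) equally.
print(alpha_schedule(0, 150000))       # 0.9
print(alpha_schedule(150000, 150000))
print(mtl_loss(0.9, 1.48, alpha_schedule(0, 150000)))
```

Learning the task weights instead of hand-tuning them (e.g. the homoscedastic-uncertainty weighting of Kendall et al.) is another common alternative.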


statsmodels OLS gives parameters despite perfect multicollinearity

Assume the following df:
ib c d1 d2
0 1.14 1 1 0
1 1.0 1 1 0
2 0.71 1 1 0
3 0.6 1 1 0
4 0.66 1 1 0
5 1.0 1 1 0
6 1.26 1 1 0
7 1.29 1 1 0
8 1.52 1 1 0
9 1.31 1 1 0
10 0.89 1 0 1
d1 and d2 are perfectly collinear with the constant (d1 + d2 = c in every row). Now I estimate the following regression model:
import statsmodels.api as sm
reg = sm.OLS(df['ib'], df[['c', 'd1', 'd2']]).fit().summary()
reg
This gives me the following output:
<class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: ib R-squared: 0.087
Model: OLS Adj. R-squared: -0.028
Method: Least Squares F-statistic: 0.7590
Date: Thu, 17 Nov 2022 Prob (F-statistic): 0.409
Time: 12:19:34 Log-Likelihood: -1.5470
No. Observations: 10 AIC: 7.094
Df Residuals: 8 BIC: 7.699
Df Model: 1
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
c 0.7767 0.111 7.000 0.000 0.521 1.033
d1 0.2433 0.127 1.923 0.091 -0.048 0.535
d2 0.5333 0.213 2.499 0.037 0.041 1.026
==============================================================================
Omnibus: 0.257 Durbin-Watson: 0.760
Prob(Omnibus): 0.879 Jarque-Bera (JB): 0.404
Skew: 0.043 Prob(JB): 0.817
Kurtosis: 2.019 Cond. No. 8.91e+15
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 2.34e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
"""
However, including c, d1 and d2 together is the well-known dummy variable trap which, to my understanding, should make it impossible to estimate the model. Why is that not the case here?
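Part of what is going on, stated as my understanding of statsmodels' internals (verify against the docs): OLS.fit uses the Moore-Penrose pseudoinverse by default (method='pinv'), which returns a minimum-norm solution even when the design matrix is singular instead of raising an error; the smallest-eigenvalue note in the summary is the only warning you get. The rank deficiency itself is easy to confirm:

```python
import numpy as np

# Recreate the design matrix from the question: a constant plus two dummies
# that sum to the constant, i.e. c = d1 + d2 exactly.
c = np.ones(11)
d1 = np.array([1] * 10 + [0])
d2 = 1 - d1
X = np.column_stack([c, d1, d2])

# 3 columns but rank 2, so X'X is singular and OLS has no unique solution.
print(np.linalg.matrix_rank(X))  # 2
```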

Join two tables, different date frequency MySQL

I'm using MySQL to calculate returns for my portfolio. I have a table of portfolio holdings; the holding period is, say, 6 months:
table Portfolio
DATE_ TICKER WEIGHT
2007-01-31 AAPL 0.2
2007-01-31 IBM 0.2
2007-01-31 FB 0.3
2007-01-31 MMM 0.3
2007-07-31 AAPL 0.1
2007-07-31 FB 0.8
2007-07-31 AMD 0.1
... ... ...
And I have a monthly stats table for these companies (the whole universe of stocks), including monthly returns:
table stats
DATE_ TICKER RETURN OTHER_STATS
2007-01-31 AAPL 0.01 ...
2007-01-31 IBM 0.03 ...
2007-01-31 FB 0.13 ...
2007-01-31 MMM -0.07 ...
2007-02-28 AAPL 0.03 ...
2007-02-28 IBM 0.04 ...
2007-02-28 FB 0.06 ...
2007-02-28 MMM -0.10 ...
I'm re-balancing the portfolio every 6 months, so during those 6 months the weight of each stock doesn't change. What I want to get is something like this:
ResultTable
DATE_ TICKER RETURN OTHER_STATS WEIGHT
2007-01-31 AAPL 0.01 ... 0.2
2007-01-31 IBM 0.03 ... 0.2
2007-01-31 FB 0.13 ... 0.3
2007-01-31 MMM -0.07 ... 0.3
2007-02-28 AAPL 0.03 ... 0.2
2007-02-28 IBM 0.04 ... 0.2
2007-02-28 FB 0.06 ... 0.3
2007-02-28 MMM -0.10 ... 0.3
2007-03-31 AAPL 0.03 ... 0.2
2007-03-31 IBM 0.14 ... 0.2
2007-03-31 FB 0.16 ... 0.3
2007-03-31 MMM -0.06 ... 0.3
... ... ... ... ...
2007-07-31 AAPL ... ... 0.1
2007-07-31 FB ... ... 0.8
2007-07-31 AMD ... ... 0.1
2007-08-31 AAPL ... ... 0.1
2007-08-31 FB ... ... 0.8
2007-08-31 AMD ... ... 0.1
I tried:
select s.*, p.WEIGHT from portfolio p
left join stats s
on p.DATE_ = s.DATE_
and p.TICKER= s.TICKER;
but it only gives me rows for the portfolio re-balancing dates.
Is there an efficient way to calculate the monthly returns?
This might work, if I understand your formula:
SELECT
p.`DATE_`,
p.`TICKER`,
SUM(s.`RETURN` * p.`WEIGHT`) as `return`,
p.WEIGHT
FROM `portfolio` p
LEFT JOIN `stats` s
ON p.`TICKER` = s.`TICKER`
WHERE s.`DATE_` BETWEEN p.`DATE_` AND DATE_ADD(DATE_ADD(p.`DATE_`, INTERVAL 6 MONTH), INTERVAL -1 DAY)
GROUP BY p.`DATE_`, p.`TICKER`
ORDER BY p.`DATE_`, p.`TICKER`;
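One detail worth flagging (my own observation, not part of the answer above): putting the s.DATE_ range condition in the WHERE clause turns the LEFT JOIN into an inner join, because NULL rows from stats fail the filter; moving it into the ON clause keeps unmatched portfolio rows. A runnable sketch of the period join, using SQLite in place of MySQL (so date(p.DATE_, '+6 months') stands in for DATE_ADD(p.DATE_, INTERVAL 6 MONTH)); the table contents are made up:

```python
import sqlite3

# Hypothetical in-memory tables mirroring the question's schema.
con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE portfolio (DATE_ TEXT, TICKER TEXT, WEIGHT REAL);
CREATE TABLE stats     (DATE_ TEXT, TICKER TEXT, RETURN REAL);
INSERT INTO portfolio VALUES ('2007-01-31','AAPL',0.2),('2007-07-31','AAPL',0.1);
INSERT INTO stats VALUES ('2007-01-31','AAPL',0.01),('2007-02-28','AAPL',0.03),
                         ('2007-07-31','AAPL',0.02);
""")

# The date-range predicate lives in ON, not WHERE, so portfolio rows with no
# matching stats would still survive the LEFT JOIN.
rows = con.execute("""
SELECT s.DATE_, s.TICKER, s.RETURN, p.WEIGHT
FROM portfolio p
LEFT JOIN stats s
  ON  p.TICKER = s.TICKER
  AND s.DATE_ >= p.DATE_
  AND s.DATE_ <  date(p.DATE_, '+6 months')
ORDER BY s.DATE_;
""").fetchall()
for r in rows:
    print(r)  # each monthly stats row carries the weight of its period
```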

Concatenating multiple csv files into one

I have multiple .csv files and I want to concatenate them into one file; essentially, I would like to pick certain columns and append them side by side.
The code I have here doesn't work: no error message at all, it just does nothing.
Does anybody know how to fix it?
import pandas as pd
import datetime
import numpy as np
import glob
import csv
import os
def concatenate(indir='/My Documents/Python/Test/in',
                outfile='/My Documents/Python/Test/out/Forecast.csv'):
    os.chdir(indir)
    fileList = glob.glob('*.csv')
    print(fileList)
    dfList = []
    colnames = ["DateTime", "WindSpeed", "Capacity", "p0.025", "p0.05",
                "p0.1", "p0.5", "p0.9", "p0.95", "p0.975", "suffix"]
    for filename in fileList:
        print(filename)
        df = pd.read_csv(filename, delimiter=',', engine='python',
                         encoding='latin-1', index_col=False)
        dfList.append(df)
    concatDF = pd.concat(dfList, axis=0)
    concatDF.columns = colnames
    concatDF.to_csv(outfile, index=None)
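One thing worth ruling out first (an assumption on my part, since the snippet may be truncated): the code above only defines concatenate; if the script never actually calls concatenate(), it will print nothing and write nothing, which matches "it just does nothing". A trivial, self-contained illustration of the difference:

```python
# Defining a function produces no output by itself; it must be invoked.
def concatenate_stub():
    return 'ran'

# Without the call below, running this file would appear to "do nothing".
result = concatenate_stub()
print(result)  # ran
```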
I ran this code to set up files on my file system
setup
import pandas as pd
import numpy as np
def setup_test_files(indir='in'):
    colnames = [
        "WindSpeed", "Capacity",
        "p0.025", "p0.05", "p0.1", "p0.5",
        "p0.9", "p0.95", "p0.975", "suffix"
    ]
    tidx = pd.date_range('2016-03-31', periods=3, freq='M', name='DateTime')
    for filename in ['in/fn_{}.csv'.format(i) for i in range(3)]:
        pd.DataFrame(
            np.random.rand(3, len(colnames)),
            tidx, colnames
        ).round(2).to_csv(filename)
        print(filename)

setup_test_files()
This created 3 files named ['fn_0.csv', 'fn_1.csv', 'fn_2.csv']
They look like this
with open('in/fn_0.csv', 'r') as fo:
    print(''.join(fo.readlines()))
DateTime,WindSpeed,Capacity,p0.025,p0.05,p0.1,p0.5,p0.9,p0.95,p0.975,suffix
2016-03-31,0.03,0.76,0.62,0.21,0.76,0.36,0.44,0.61,0.23,0.04
2016-04-30,0.39,0.12,0.31,0.99,0.86,0.35,0.15,0.61,0.55,0.03
2016-05-31,0.72,1.0,0.71,0.86,0.41,0.79,0.22,0.76,0.92,0.79
I'll define a parser function and one that does the concatenation separately. Why? Because I think it's easier to follow that way.
import pandas as pd
import glob
import os
def read_csv(fn):
    colnames = [
        "DateTime", "WindSpeed", "Capacity",
        "p0.025", "p0.05", "p0.1", "p0.5",
        "p0.9", "p0.95", "p0.975", "suffix"
    ]
    df = pd.read_csv(fn, encoding='latin-1')
    df.columns = colnames
    return df

def concatenate(indir='in', outfile='out/Forecast.csv'):
    curdir = os.getcwd()
    try:
        os.chdir(indir)
        file_list = glob.glob('*.csv')
        df_names = [fn.replace('.csv', '') for fn in file_list]
        concat_df = pd.concat(
            [read_csv(fn) for fn in file_list],
            axis=1, keys=df_names)
        # notice I was nice enough to change directory back :-)
        os.chdir(curdir)
        concat_df.to_csv(outfile, index=None)
    except:
        os.chdir(curdir)
Then run concatenation
concatenate()
You can read in the results like this
print(pd.read_csv('out/Forecast.csv', header=[0, 1]))
fn_0 \
DateTime WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9 p0.95 p0.975
0 2016-03-31 0.03 0.76 0.62 0.21 0.76 0.36 0.44 0.61 0.23
1 2016-04-30 0.39 0.12 0.31 0.99 0.86 0.35 0.15 0.61 0.55
2 2016-05-31 0.72 1.00 0.71 0.86 0.41 0.79 0.22 0.76 0.92
... fn_2
... WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9 p0.95 p0.975 suffix
0 ... 0.80 0.79 0.38 0.94 0.91 0.18 0.27 0.14 0.39 0.91
1 ... 0.60 0.97 0.04 0.69 0.04 0.65 0.94 0.81 0.37 0.22
2 ... 0.78 0.53 0.83 0.93 0.92 0.12 0.15 0.65 0.06 0.11
[3 rows x 33 columns]
Notes:
You aren't making DateTime your index, and I think that's probably what you want. If so, change the read_csv and concatenate functions to this:
import pandas as pd
import glob
import os
def read_csv(fn):
    colnames = [
        "WindSpeed", "Capacity",
        "p0.025", "p0.05", "p0.1", "p0.5",
        "p0.9", "p0.95", "p0.975", "suffix"
    ]
    # notice extra parameters for specifying index and parsing dates
    df = pd.read_csv(fn, index_col=0, parse_dates=[0], encoding='latin-1')
    df.index.name = "DateTime"
    df.columns = colnames
    return df

def concatenate(indir='in', outfile='out/Forecast.csv'):
    curdir = os.getcwd()
    try:
        os.chdir(indir)
        file_list = glob.glob('*.csv')
        df_names = [fn.replace('.csv', '') for fn in file_list]
        concat_df = pd.concat(
            [read_csv(fn) for fn in file_list],
            axis=1, keys=df_names)
        os.chdir(curdir)
        concat_df.to_csv(outfile)
    except:
        os.chdir(curdir)
This is what the final result looks like with this change; notice that the dates are aligned this way:
fn_0 \
WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9 p0.95 p0.975
DateTime
2016-03-31 0.03 0.76 0.62 0.21 0.76 0.36 0.44 0.61 0.23
2016-04-30 0.39 0.12 0.31 0.99 0.86 0.35 0.15 0.61 0.55
2016-05-31 0.72 1.00 0.71 0.86 0.41 0.79 0.22 0.76 0.92
... fn_2 \
suffix ... WindSpeed Capacity p0.025 p0.05 p0.1 p0.5 p0.9
DateTime ...
2016-03-31 0.04 ... 0.80 0.79 0.38 0.94 0.91 0.18 0.27
2016-04-30 0.03 ... 0.60 0.97 0.04 0.69 0.04 0.65 0.94
2016-05-31 0.79 ... 0.78 0.53 0.83 0.93 0.92 0.12 0.15
p0.95 p0.975 suffix
DateTime
2016-03-31 0.14 0.39 0.91
2016-04-30 0.81 0.37 0.22
2016-05-31 0.65 0.06 0.11
[3 rows x 30 columns]

How can I improve performance on DRF with high CPU time

I have a REST API built with DRF and I'm already starting to see a performance hit with 100 objects and a single user requesting (me, testing).
When requesting the more complex query, I get these CPU results, always 5-10 s:
Resource Value
User CPU time 5987.089 msec
System CPU time 463.929 msec
Total CPU time 6451.018 msec
Elapsed time 6800.938 msec
Context switches 9 voluntary, 773 involuntary
but the SQL query stays below 100 ms.
The simpler queries show similar behaviour, with CPU times around 1 s and query times around 20 ms.
So far, what I have tried:
I am using select_related() and prefetch_related(), which improved the query time but not the CPU time.
I am using ImageKit to generate pictures on an S3 bucket. I removed the whole specification to test and it had only minor impact.
I run a method field to fetch user-specific data. Removing it also had only minor impact.
I have checked the log files on the backend and nothing specific shows up there...
Backend is Nginx - supervisord - gunicorn - postgresql - django 1.8.1
Here are the serializer and view:
class ParticipationOrganizationSerializer(ModelSerializer):
    organization = OrganizationSerializer(required=False, read_only=True)
    bookmark = SerializerMethodField(
        required=False,
        read_only=True,
    )
    location_map = LocationMapSerializer(
        required=False,
        read_only=True,
    )

    class Meta:
        model = Participation
        fields = (
            'id',
            'slug',
            'organization',
            'location_map',
            'map_code',
            'partner',
            'looking_for',
            'complex_profile',
            'bookmark',
            'confirmed',
        )
        read_only_fields = (
            'id',
            'slug',
            'organization',
            'location_map',
            'map_code',
            'partner',
            'bookmark',
            'confirmed',
        )

    def get_bookmark(self, obj):
        request = self.context.get('request', None)
        if request is not None:
            if request.user.is_authenticated():
                # print(obj.bookmarks.filter(author=request.user).count())
                try:
                    bookmark = obj.bookmarks.get(author=request.user)
                    # bookmark = Bookmark.objects.get(
                    #     author=request.user,
                    #     participation=obj,
                    # )
                    return BookmarkSerializer(bookmark).data
                except Bookmark.DoesNotExist:
                    # We have nothing yet
                    return None
                except Bookmark.MultipleObjectsReturned:
                    # This should not happen, but in case it does, delete all
                    # the bookmarks for safety reasons.
                    Bookmark.objects.filter(
                        author=request.user,
                        participation=obj,
                    ).delete()
                    return None
        return None


class ParticipationOrganizationViewSet(ReadOnlyModelViewSet):
    """
    A readonly ViewSet for viewing participations of a certain event.
    """
    serializer_class = ParticipationOrganizationSerializer
    queryset = Participation.objects.all().select_related(
        'location_map',
        'organization',
        'organization__logo_image',
    ).prefetch_related(
        'bookmarks',
    )
    lookup_field = 'slug'

    def get_queryset(self):
        event_slug = self.kwargs['event_slug']
        # Filter for the current event
        # Filter to show only the confirmed participations
        participations = Participation.objects.filter(
            event__slug=event_slug,
            confirmed=True
        ).select_related(
            'location_map',
            'organization',
            'organization__logo_image',
        ).prefetch_related(
            'bookmarks',
        )
        # Filter on partners? This is a parameter passed on in the url
        partners = self.request.query_params.get('partners', None)
        if partners == "true":
            participations = participations.filter(partner=True)
        return participations

    # http://stackoverflow.com/questions/22616973/django-rest-framework-use-different-serializers-in-the-same-modelviewset
    def get_serializer_class(self):
        if self.action == 'list':
            return ParticipationOrganizationListSerializer
        if self.action == 'retrieve':
            return ParticipationOrganizationSerializer
        return ParticipationOrganizationListSerializer
Any help is very much appreciated!
update
I dumped the data to my local machine and I am observing similar times. I guess this rules out the whole production setup (nginx, gunicorn)?
update 2
Here are the results of the profiler.
Also, I made some progress in improving the speed by:
simplifying my serializers
doing the tests with curl, with the Debug Toolbar off
ncalls tottime percall cumtime percall filename:lineno(function)
0 0 0 profile:0(profiler)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/views.py:442(dispatch)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/viewsets.py:69(view)
1 0 0 3.441 3.441 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/django/views/decorators/csrf.py:57(wrapped_view)
1 0 0 3.44 3.44 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/mixins.py:39(list)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:605(to_representation)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:225(data)
1 0 0 3.438 3.438 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:672(data)
344/114 0.015 0 3.318 0.029 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/serializers.py:454(to_representation)
805 0.01 0 2.936 0.004 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:1368(to_representation)
2767 0.013 0 2.567 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py:166(send)
2070 0.002 0 2.52 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/registry.py:52(existence_required_receiver)
2070 0.005 0 2.518 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/registry.py:55(_receive)
2070 0.004 0 2.513 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/utils.py:147(call_strategy_method)
2070 0.002 0 2.508 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/strategies.py:14(on_existence_required)
2070 0.005 0 2.506 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:86(generate)
2070 0.002 0 2.501 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:109(generate)
2070 0.003 0 2.499 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:94(generate_now)
2070 0.01 0 2.496 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:65(get_state)
690 0.001 0 2.292 0.003 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:148(__nonzero__)
690 0.005 0 2.291 0.003 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:124(__bool__)
2070 0.007 0 2.276 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/backends.py:112(_exists)
2070 0.01 0 2.269 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:409(exists)
4140 0.004 0 2.14 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:282(entries)
1633 0.003 0 2.135 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:288()
1633 0.001 0 2.129 0.001 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucketlistresultset.py:24(bucket_lister)
2 0 0 2.128 1.064 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucket.py:390(_get_all)
2 0 0 2.128 1.064 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/bucket.py:426(get_all_keys)
1331 0.003 0 1.288 0.001 /usr/lib/python2.7/ssl.py:335(recv)
1331 1.285 0.001 1.285 0.001 /usr/lib/python2.7/ssl.py:254(read)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:886(_mexe)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/s3/connection.py:643(make_request)
2 0 0 0.983 0.491 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:1062(make_request)
2 0.004 0.002 0.896 0.448 /usr/lib/python2.7/httplib.py:585(_read_chunked)
2 0 0 0.896 0.448 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/boto/connection.py:393(read)
2 0 0 0.896 0.448 /usr/lib/python2.7/httplib.py:540(read)
166 0.002 0 0.777 0.005 /usr/lib/python2.7/httplib.py:643(_safe_read)
166 0.005 0 0.775 0.005 /usr/lib/python2.7/socket.py:336(read)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:793(send)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:998(_send_request)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:820(_send_output)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:977(request)
2 0 0 0.568 0.284 /usr/lib/python2.7/httplib.py:962(endheaders)
1 0 0 0.567 0.567 /usr/lib/python2.7/httplib.py:1174(connect)
1380 0.001 0 0.547 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:82(url)
1380 0.007 0 0.546 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:72(_storage_attr)
105 0.009 0 0.528 0.005 /usr/lib/python2.7/socket.py:406(readline)
2 0 0 0.413 0.207 /usr/lib/python2.7/httplib.py:408(begin)
2 0 0 0.413 0.207 /usr/lib/python2.7/httplib.py:1015(getresponse)
2 0 0 0.407 0.203 /usr/lib/python2.7/httplib.py:369(_read_status)
2750 0.003 0 0.337 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:399(get_attribute)
1 0.223 0.223 0.335 0.335 /usr/lib/python2.7/socket.py:537(create_connection)
2865 0.012 0 0.334 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/rest_framework/fields.py:65(get_attribute)
1610 0.005 0 0.314 0 /home/my_app/.virtualenvs/my_app/src/django-s3-folder-storage/s3_folder_storage/s3.py:13(url)
1610 0.012 0 0.309 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/storages/backends/s3boto.py:457(url)
690 0.005 0 0.292 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/models/fields/utils.py:10(__get__)
690 0.007 0 0.251 0 /home/my_app/.virtualenvs/my_app/local/lib/python2.7/site-packages/imagekit/cachefiles/__init__.py:20(__init__)
2 0 0 0.248 0.124
>>>> cutting here, low impact calls
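For what it's worth, the profile above spends most of its time under imagekit's on_existence_required handlers calling s3boto.exists(), i.e. one S3 round trip per generated image per response. django-imagekit can skip those existence checks with its "optimistic" cache-file strategy; a settings sketch (the setting name is from imagekit's docs, so verify it against your installed version):

```python
# settings.py sketch: trust that generated cache files already exist on S3
# instead of issuing an existence check per image on every request.
IMAGEKIT_DEFAULT_CACHEFILE_STRATEGY = 'imagekit.cachefiles.strategies.Optimistic'
```

The trade-off is that a missing cache file will 404 until it is generated, so this usually pairs with pre-generating images out of band.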

The curious case of high 5 min load average

Looking for some expert advice here. I'm a first-time sysadmin on my own server and I can't figure out the bottleneck.
Linux CentOS 6, Apache 2.4, PHP 5.5
I've been receiving tons of high 5-minute load average alerts from CSF, ranging between 8 and 80.
So I went ahead and installed MySQLTuner on the server and let it run for 3 days.
The results don't show anything out of the ordinary, but I'm still getting high 5-minute load averages daily.
I'm trying to find the bottleneck (CPU, load caused by out-of-memory issues, or I/O-bound load).
Would be stoked if someone could share any insights...
(I've included MySQLTuner's report and the high-load email output below.)
-------- Security Recommendations -------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] There is no basic password file list!
-------- Performance Metrics -------------------------------------------------
[--] Up for: 120d 18h 27m 20s (227M q [21.795 qps], 51M conn, TX: 907B, RX: 26B)
[--] Reads / Writes: 38% / 62%
[--] Binary logging is disabled
[--] Total buffers: 15.4G global + 4.1M per thread (600 max threads)
[OK] Maximum reached memory usage: 16.2G (51.75% of installed RAM)
[OK] Maximum possible memory usage: 17.8G (56.91% of installed RAM)
[OK] Slow queries: 0% (11/227M)
[OK] Highest usage of available connections: 33% (199/600)
[OK] Aborted connections: 0.56% (284327/51183230)
[OK] Query cache efficiency: 83.0% (78M cached / 94M selects)
[!!] Query cache prunes per day: 10288
[OK] Sorts requiring temporary tables: 0% (392 temp sorts / 1M sorts)
[OK] Temporary tables created on disk: 4% (65K on disk / 1M total)
[OK] Thread cache hit rate: 99% (199 created / 51M connections)
[OK] Table cache hit rate: 22% (425 open / 1K opened)
[OK] Open file limit used: 0% (433/50K)
[OK] Table locks acquired immediately: 99% (41M immediate / 41M locks)
-------- MyISAM Metrics -----------------------------------------------------
[!!] Key buffer used: 20.2% (108M used / 536M cache)
[OK] Key buffer size / total MyISAM indexes: 512.0M/14.7M
[OK] Read Key buffer hit rate: 99.8% (51M cached / 121K reads)
[!!] Write Key buffer hit rate: 40.8% (4M cached / 2M writes)
-------- InnoDB Metrics -----------------------------------------------------
[--] InnoDB is enabled.
[OK] InnoDB buffer pool / data size: 14.6G/140.9M
[!!] InnoDB buffer pool instances: 1
[!!] InnoDB Used buffer: 3.39% (32546 used/ 959999 total)
[OK] InnoDB Read buffer efficiency: 100.00% (5437258684 hits/ 5437259670 total)
[!!] InnoDB Write buffer efficiency: 0.00% (0 hits/ 1 total)
[OK] InnoDB log waits: 0.00% (0 waits / 24069213 writes)
-------- AriaDB Metrics -----------------------------------------------------
[--] AriaDB is disabled.
-------- Replication Metrics -------------------------------------------------
[--] No replication slave(s) for this server.
[--] This is a standalone server..
-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
Increasing the query_cache size over 128M may reduce performance
Variables to adjust:
query_cache_size (> 128M) [see warning above]
innodb_buffer_pool_instances(=14)
----------------------
(The only change I've made since is to reduce the InnoDB buffer size and add multiple buffer pool instances.)
The high daily load email:
Time: Sun Dec 6 05:43:53 2015 -0500
1 Min Load Avg: 80.26
5 Min Load Avg: 21.19
15 Min Load Avg: 7.46
Running/Total Processes: 221/875
ps.txt
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-1424 16243 0/149/2713129 W 1.28 14 0 0.0 2.34 65107.59 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
1-1424 17057 0/18/2701770 W 2.15 4 0 0.0 0.30 62402.50 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
2-1424 17064 0/24/2685073 W 2.11 13 0 0.0 0.32 62668.14 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
3-1424 15319 0/215/2657841 W 3.50 4 0 0.0 3.88 61950.21 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
4-1424 11567 0/204/2651294 W 7.10 7 0 0.0 3.00 63562.61 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
5-1424 16512 0/37/2640191 W 2.19 5 0 0.0 0.60 63637.48 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
6-1424 17735 0/8/2630311 W 0.62 19 0 0.0 0.06 65036.68 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
7-1424 16521 0/31/2613938 W 2.20 19 0 0.0 0.36 62385.07 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
8-1424 16081 0/33/2611913 W 2.46 5 0 0.0 0.42 60535.12 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
9-1424 14711 0/120/2603042 W 1.89 18 0 0.0 2.11 59868.26 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
10-1424 16838 0/21/2592501 W 1.77 15 0 0.0 0.24 62195.33 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
11-1424 16531 0/42/2584776 W 2.45 11 0 0.0 0.39 62253.11 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
12-1424 17065 0/20/2570161 W 1.29 12 0 0.0 0.18 60474.65 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
13-1424 17770 0/13/2564128 W 1.27 2 0 0.0 0.63 59748.24 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
14-1424 17771 0/14/2542936 W 1.30 2 0 0.0 0.17 60513.73 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
15-1424 15736 0/64/2536855 W 2.91 7 0 0.0 1.16 61453.61 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
16-1424 17077 0/19/2522131 W 2.76 15 0 0.0 0.35 59307.60 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
17-1424 14723 0/93/2521068 W 3.38 6 0 0.0 1.77 60437.40 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
18-1424 16279 0/62/2509938 W 1.81 15 0 0.0 1.07 61401.24 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
19-1424 15333 0/116/2498356 W 3.24 19 0 0.0 1.69 57911.45 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
20-1424 16297 1/35/2494463 W 0.98 53 62 16.1 0.47 59474.66 58.174.24.65 suspensionrevolution.com:80 GET /new/wp-content/themes/optimizePressTheme/lib/assets/defaul
21-1424 16298 0/40/2473943 W 3.83 3 0 0.0 0.54 57987.71 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
22-1424 18054 0/6/2469193 W 1.23 1 0 0.0 0.05 59122.65 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
23-1424 12894 0/162/2458774 W 5.90 17 0 0.0 2.42 56404.92 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
24-1424 18088 0/4/2452422 W 0.90 11 0 0.0 0.00 58405.08 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
25-1424 18089 0/6/2446570 W 1.22 1 0 0.0 0.03 57036.34 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
26-1424 17079 0/30/2439491 W 1.88 0 0 0.0 0.43 54697.67 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
27-1424 16101 0/64/2416961 W 1.53 18 0 0.0 1.69 57160.43 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
28-1424 18140 0/9/2403931 W 0.62 18 0 0.0 0.02 55901.03 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
29-982 1505 1/18/1548355 G 0.14 2914733 450294 2.8 0.29 34947.04 96.47.70.4 suspensionrevolution.com:80 POST /dap/dap-clickbank-6.0.php HTTP/1.1
30-1424 15338 0/100/2384316 W 2.20 7 0 0.0 1.24 53919.77 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
31-1424 16300 2/56/2379897 K 3.47 3 1365 2.4 1.01 55195.75 89.166.18.35 appcoiner.com:80 GET /favicon.ico HTTP/1.1
32-1424 15749 0/108/2369131 W 3.26 17 0 0.0 2.09 55452.02 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
33-1424 17100 0/17/2359616 W 1.70 11 0 0.0 0.16 52564.23 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
34-1424 16310 0/162/2356424 W 3.95 15 0 0.0 2.68 55800.32 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
35-1424 16543 0/63/2326471 W 1.29 4 0 0.0 0.75 55028.80 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
36-1424 17101 0/18/2331624 W 2.05 14 0 0.0 0.21 53656.66 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
37-1424 17102 0/20/2314444 W 1.51 19 0 0.0 0.29 55684.29 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
38-1424 19665 0/1/2295814 W 0.00 3 0 0.0 0.00 52187.61 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
39-984 8727 1/69/1464486 G 0.68 2906097 450284 2.8 0.83 33844.50 74.63.153.4 suspensionrevolution.com:80 POST /dap/dap-clickbank-6.0.php HTTP/1.1
40-1424 19720 0/1/2277467 W 0.00 2 0 0.0 0.00 55864.93 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
41-1424 18141 0/6/2270838 W 0.62 14 0 0.0 0.02 54059.95 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
42-983 18177 1/49/1440183 G 0.63 2910665 450302 2.8 0.57 31224.74 74.63.153.4 suspensionrevolution.com:80 POST /dap/dap-clickbank-6.0.php HTTP/1.1
43-1424 16104 2/57/2242969 W 2.62 5 0 8.8 0.83 56170.39 54.202.7.147 appcoiner.com:80 GET /start-2/?utm_expid=111102625-1.-ThtNpCTSByWcbkMGdBOow.1&ho
44-1424 16547 0/28/2247277 W 3.84 5 0 0.0 0.31 53028.08 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
45-1424 15797 0/80/2225028 W 3.24 5 0 0.0 1.63 51333.94 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
46-1424 19721 0/1/2205346 W 0.00 2 0 0.0 0.00 52025.79 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
47-1424 18142 0/11/2207016 W 0.94 1 0 0.0 0.07 51355.07 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
48-1424 17104 0/137/2172322 W 1.28 7 0 0.0 2.32 49665.11 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
49-1424 16314 0/63/2168481 W 4.14 5 0 0.0 1.16 49191.04 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
50-1424 19763 0/0/2141243 W 2.41 12 0 0.0 0.00 49538.97 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
51-1424 17106 0/20/2137681 W 1.24 7 0 0.0 0.29 49973.70 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
52-1424 16549 0/34/2125106 W 1.99 7 0 0.0 0.50 50442.63 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
53-1424 17107 0/20/2109740 W 2.18 2 0 0.0 0.31 48074.92 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
54-1424 18143 0/4/2087977 W 1.21 8 0 0.0 0.00 49243.64 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
55-1424 17114 0/30/2062106 W 0.34 17 0 0.0 0.46 48605.36 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
56-1424 17115 0/19/2064562 W 2.07 0 0 0.0 0.30 47600.46 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
57-1424 16569 0/41/2051051 W 3.50 8 0 0.0 0.54 47547.57 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
58-1424 17116 0/28/2023150 W 1.28 4 0 0.0 0.42 49170.14 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
59-1424 17117 0/30/2010767 W 1.41 1 0 0.0 0.51 47681.03 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
60-1424 17118 0/17/1999913 R 0.05 53 5 0.0 0.26 46914.84 65.30.135.196
61-1424 16572 0/35/1978028 W 2.73 16 0 0.0 0.45 45848.82 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
62-1424 16573 0/27/1957566 W 1.60 0 0 0.0 0.45 46768.01 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
63-1424 16574 0/43/1936669 W 2.37 3 0 0.0 0.54 43520.07 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
64-1424 16575 0/28/1922381 W 1.54 1 0 0.0 0.33 45007.49 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
65-1424 16576 0/32/1903916 W 2.15 13 0 0.0 0.73 45117.83 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
66-1424 17119 0/28/1878566 W 1.18 6 0 0.0 0.45 44448.25 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
67-1424 16578 1/39/1869043 K 2.01 0 0 1.2 0.53 44966.25 114.79.47.51 suspensionrevolution.com:80 GET /favicon.ico HTTP/1.1
68-1424 16579 0/37/1841958 W 2.59 5 0 0.0 0.48 44262.26 72.5.231.11 appcoiner.com:80 GET /?hopc2s=nakt123 HTTP/1.1
This is my top output:
root#ns513521 [~]# top
top - 08:21:31 up 153 days, 3:51, 1 user, load average: 0.15, 0.27, 0.51
Tasks: 230 total, 2 running, 227 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.4%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32855908k total, 25102984k used, 7752924k free, 886004k buffers
Swap: 1569780k total, 63984k used, 1505796k free, 21254784k cached
This is my iostat output:
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 21.99 1476.57 549.50 19540299168 7271856568
sdb 19.53 982.85 549.49 13006647390 7271718616
sdc 19.46 978.26 549.49 12945853934 7271718616
md2 9.78 492.22 399.94 6513868322 5292584264
md1 20.05 27.80 140.15 367920858 1854711136
I think you can just use sar, e.g. something like this :
sar -q -s 00:00:00 -e 11:59:59 -f /var/log/sa/sa`date +%d | awk '{printf "%02d", $1 - 1}'`