I have a really strange behaviour in my Rails app.
Sometimes, about 1 request in every 500, a simple piece of ActiveRecord code takes a very long time (from 5 sec to X min). Some examples:
User.where(id: params[:id]).first # or
current_user # (with *Devise*) or
Authentication.where(provider: 'whatever', user_id: 12345).first
I don't really know where to look: all these queries use appropriate indexes (they usually complete in milliseconds), the server load seems constant, ...
Has anyone here had this kind of issue before, or any ideas on how I could resolve it?
Thanks!
FYI, I work with Rails/ActiveRecord 3.2.17, puma 3.11.4, Docker (ECS on AWS), MySQL 5.6.34
I have made changes in the server.properties file in Kafka 0.8.1.1, i.e. added log.cleaner.enable=true, and also enabled cleanup.policy=compact while creating the topic.
Now, to test it, I pushed the following (key, message) pairs to the topic:
Offset: 1 - (123, abc);
Offset: 2 - (234, def);
Offset: 3 - (345, ghi);
Offset: 4 - (123, changed)
I pushed the 4th message with the same key as an earlier input but a changed message, so log compaction should come into the picture here. Yet using the Kafka tool, I can still see all 4 offsets in the topic. How can I tell whether log compaction is working or not? Should the earlier message be deleted, or is log compaction working fine since the new message has been pushed?
Does this have anything to do with the log.retention.hours, topic.log.retention.hours or log.retention.size configurations? What is the role of these configs in log compaction?
P.S. - I have thoroughly gone through the Apache documentation, but it is still not clear to me.
Even though this question is a few months old, I just came across it while doing research for my own question. I had created a minimal Java example of how compaction works; maybe it is helpful for you too:
https://gist.github.com/anonymous/f78184eaeec3ee82b15182aec24a432a
Furthermore, consulting the documentation, I used the following configuration on a topic level for compaction to kick in as quickly as possible:
min.cleanable.dirty.ratio=0.01
cleanup.policy=compact
segment.ms=100
delete.retention.ms=100
When run, this class shows that compaction works - there is only ever one message with the same key on the topic.
With the appropriate settings, this should also be reproducible on the command line.
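For instance, here is a minimal command-line sketch (the topic name is made up, and it assumes a reasonably recent Kafka distribution whose standard CLI tools accept these flags):

# Create a compacted topic with the aggressive settings above
kafka-topics.sh --zookeeper localhost:2181 --create --topic compaction-test \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.01 \
  --config segment.ms=100 \
  --config delete.retention.ms=100

# Produce keyed messages; key and value are separated by ':'
kafka-console-producer.sh --broker-list localhost:9092 --topic compaction-test \
  --property parse.key=true --property key.separator=:

# Read the topic from the beginning: once the cleaner has run, only the
# latest value per key should remain in the older segments
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic compaction-test \
  --from-beginning --property print.key=true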
Actually, log compaction is visible only once the number of messages reaches a very high count, e.g. 1 million. So, if you have that much data, it's good. Otherwise, using configuration changes, you can reduce this limit to, say, 100 messages, and then you can see that among messages with the same key only the latest one will remain; the previous one will be deleted. It is better to use log compaction only if you have a full snapshot of your data every time; otherwise you may lose previous messages with the same key, which might be useful.
To check a topic's properties from the CLI, you can use the kafka-topics command:
https://grokbase.com/t/kafka/users/14aev0snbd/command-line-tool-for-topic-metadata
It is also worth taking a look at log.roll.hours, which is 168 hours by default. In simple words: even if your topic is not very active and you cannot fill the maximum segment size (by default 1 GB for normal topics and 100 MB for the offsets topic) within a week, you will still end up with a closed segment whose size is below log.segment.bytes. This segment can then be compacted on the next turn.
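As a sketch (the topic name is hypothetical), the roll time can also be lowered per topic with the standard kafka-configs tool, so that segments close, and thus become candidates for compaction, sooner:

# Roll segments after one hour instead of the 168-hour broker default
kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name compaction-test \
  --add-config segment.ms=3600000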
You can do it with the kafka-topics CLI.
I'm running it from Docker (confluentinc/cp-enterprise-kafka:6.0.0).
$ docker-compose exec kafka kafka-topics --zookeeper zookeeper:32181 --describe --topic count-colors-output
Topic: count-colors-output PartitionCount: 1 ReplicationFactor: 1 Configs: cleanup.policy=compact,segment.ms=100,min.cleanable.dirty.ratio=0.01,delete.retention.ms=100
Topic: count-colors-output Partition: 0 Leader: 1 Replicas: 1 Isr: 1
But don't get confused if you don't see anything in the Configs field; that happens when default values are in use. So unless you see cleanup.policy=compact in the output, the topic is not compacted.
I created a new Rails 4.2.1 test project to try out the new streaming feature (the 'Live' one which I read about here). The project is set up to use MySQL for the database (I also tried SQLite but couldn't repro the issue with it). The project is simple, consisting only of: 1) a model Test with 2 attributes (both strings), 2) a simple route resources :tests, and 3) a simple controller tests_controller with one action index. The model and controller were generated by the standard Rails generators, and only the controller was modified, as follows:
class TestsController < ApplicationController
  include ActionController::Live

  def index
    response.headers['Content-Type'] = 'application/json'
    response.stream.write('{"count": 5, "tests": [')
    first = true
    Test.find_each do |test|
      # write the separator before each element except the first,
      # so the array has no trailing comma
      response.stream.write(',') unless first
      first = false
      response.stream.write(test.to_json)
    end
    response.stream.write(']}')
  ensure
    response.stream.close
  end
end
When I run rails s and test by hand, everything seems fine. But when I added a test (shown below), I got a strange error:
1) Error:
TestsControllerTest#test_index:
ActiveRecord::StatementInvalid: Mysql2::Error: This connection is in use by: #<Thread:0x007f862a4a7e48#/Users/xxx/.rvm/gems/ruby-2.2.2/gems/actionpack-4.2.1/lib/action_controller/metal/live.rb:269 sleep>: ROLLBACK
The test is:
require 'test_helper'

class TestsControllerTest < ActionController::TestCase
  test "index" do
    @request.headers['Accept'] = 'application/json'
    get :index
    assert_response :success
  end
end
Note that the error is intermittent, coming up only about half the time. Also, even though testing by hand doesn't cause any errors, I'm worried that errors will occur when multiple clients hit the API at the same time. Any suggestions as to what's going on here would be much appreciated.
Pretty old, but you need to actually check out a new database connection, since ActionController::Live executes the action in a separate thread:
The final caveat is that your actions are executed in a separate thread than the main thread. Make sure your actions are thread safe, and this shouldn't be a problem (don't share state across threads, etc).
https://github.com/rails/rails/blob/861b70e92f4a1fc0e465ffcf2ee62680519c8f6f/actionpack/lib/action_controller/metal/live.rb
You can even use an around_filter/around_action for this.
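A minimal sketch of that idea (the helper name is hypothetical; it relies on ActiveRecord's connection pool to check a connection out for the streaming thread and return it afterwards):

class TestsController < ApplicationController
  include ActionController::Live

  around_action :with_checked_out_connection, only: :index

  def index
    # ... streaming code as above ...
  end

  private

  # Hypothetical helper: run the action with a connection checked out
  # from the pool for this (Live) thread, releasing it when done.
  def with_checked_out_connection
    ActiveRecord::Base.connection_pool.with_connection { yield }
  end
end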
I'm working on a django application with a MySQL backend. I use south to migrate my schema.
I've just written a migration that drops indexes on some columns, because of a foreign key change AND a name change for said columns.
I did not want to go through the "add the column" -> "copy the data" -> "remove the old column" routine.
So my migration code chunks look like this (sample given for one table):
# Change FK link from 'reference_workobject' to 'assets_asset'
db.delete_foreign_key('reference_snapshot', 'workObject_id')
db.delete_index('reference_snapshot', ['workObject_id'])
db.rename_column('reference_snapshot', 'workObject_id', 'asset_id')
db.alter_column('reference_snapshot', 'asset_id',
                self.gf('django.db.models.fields.related.ForeignKey')(null=True, to=orm['assets.Asset']))
But I get an error that looks like:
DatabaseError: (1091, "Can't DROP 'reference_snapshot_6232368c'; check that column/key exists")
Indeed, I fed my development DB (a Windows 7 workstation) with a dump from the production server (CentOS 6.5). And I saw that the index-name generation call (db.create_index_name) does not return the same value on the two platforms.
>>> # On WINDOWS
>>> db.create_index_name('reference_snapshot',['workObject_id'])
'reference_snapshot_6232368c'
>>> # On Linux
>>> db.create_index_name('reference_snapshot',['workObject_id'])
'reference_snapshot_9dcdc974'
After investigation, I saw that south should generate my index name as follows:
index_name = '%s_%x' % (table_name, abs(hash((column_name,))) % 2**32)
EDIT
Executing that code snippet yields the same value on both platforms. However, the call to create_index_name does not. Paradox. Maybe south is not using the code I thought it was.
In fact, the hash() call inside that snippet does yield different results:
>>> # On Windows
>>> hash((column_name,))
1965709063
>>> # On Linux
>>> hash((column_name,))
-791966850447943929
If the 32-bit truncation (% 2**32) were applied without the call to abs(), though, the results would be the same.
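Indeed, checking with the two values above bears this out:

>>> -791966850447943929 % 2**32   # low 32 bits of the Linux (64-bit) hash
1965709063
>>> 1965709063 % 2**32            # the Windows (32-bit) hash, unchanged
1965709063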
I conclude from this that the hashing is 32-bit on Windows (even though I use 64-bit Python) and 64-bit on Linux.
END EDIT
So I'm stuck, as I can't use the production dumps on Windows to test my migrations.
Any ideas? Maybe monkey-patching something on my dev workstation could do the trick.
Thanks
Too bad. We can't help but suffer the differences in hash implementation.
However, I've just read that the index-name calculation will now (as of Django 1.5) use md5, because of Python 3's hash() randomization: http://south.aeracode.org/ticket/1222
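As a minimal sketch of that md5-based idea (the function name is hypothetical, not South's actual implementation), a digest-based name is stable across platforms and Python builds:

import hashlib

# Hypothetical stand-in for south's create_index_name: an md5 digest of
# the column names is identical on Windows and Linux, 32- or 64-bit.
def stable_index_name(table_name, column_names):
    digest = hashlib.md5(','.join(column_names).encode('utf-8')).hexdigest()[:8]
    return '%s_%s' % (table_name, digest)

print(stable_index_name('reference_snapshot', ['workObject_id']))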
We have a hybrid web application integrating a MySQL db with Plone (the last upgrade was to Plone 4.0), using collective.tin, collective.lead and SQLAlchemy.
OK, I know that collective.tin was never released and collective.lead has been superseded; however, everything has worked (almost) perfectly for a few years.
Recently we experienced a very strange behaviour and are looking for help in order to understand it.
Among others, we have 2 Plone content types, say A and B, defined by subclassing collective.tin, and the corresponding InnoDB MySQL tables; rows of B have a foreign key towards A.
In the span of 15-20 minutes, 2 different users created 3 A objects and some 10-20 B objects that were never committed to MySQL but were indexed by Plone. Queries I executed with a MySQL client from the Linux shell could not find those A rows (I didn't look for the B rows); however, queries executed through the web application (the aforementioned component stack) by those 2 users, and also by other users, occasionally still found and correctly displayed some of those 3 A objects.
Only after I restarted the Zope instance was it possible to resume normal activity from the Plone web interface. The 3 A rows and many B rows were still missing from the MySQL db, but the autoincrement counter showed the expected increment; I had to remove 3 invalid brains for A objects from the Plone index (I didn't worry about the B objects).
Any suggestion on possible causes and on how to investigate the problem?
We had the exact same problem with SQLAlchemy 0.4: the session would get out of sync with the actual database contents. The problem was somewhat masked in our case because users were sent to specific backends in the cluster through session affinity; if the affinity was suddenly lost, messages appeared to have disappeared. The exact details are a little hazy, because I cannot locate the correct (ancient) revision history of the fix I put in place.
From what I can glean from context, the session identity map prevents the session from re-querying the database for objects it has retrieved before. It thus won't see changes made to those objects in different sessions.
The fix is to call .expire_all() on the session after each and every commit or rollback; SQLAlchemy 0.5 and up does this automatically (autoexpire=True on the session, now called expire_on_commit, I believe), but for 0.4 you'll need to register a SessionExtension to do this for you.
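For reference, a minimal sketch of the 0.5+ behaviour (the DSN is made up); expire_on_commit is on by default there, so the identity map is expired after every commit:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('mysql://user:pw@localhost/mydb')  # hypothetical DSN
# expire_on_commit=True (the default) expires all instances in the
# identity map after each commit, so they are reloaded on next access.
Session = sessionmaker(bind=engine, expire_on_commit=True)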
Lucky for you, we also use collective.lead for this project, so my fix is your fix:
# The identity map should be flushed on commit.
# SQLAlchemy 0.5 does this properly, but in 0.4 we need to do this via
# a SessionExtension.
from sqlalchemy import __version__

if __version__[:3] == '0.4':
    from sqlalchemy.orm.session import SessionExtension

    class ExpireAllSessionExtension(SessionExtension):
        def after_commit(self, session):
            """Expire the identity map on commit"""
            session.expire_all()

        def after_rollback(self, session):
            """Expire the identity map on rollback"""
            session.expire_all()

    def installExtension():
        # Patch collective.lead.database to let us install the extension
        # on the session created there.
        from collective.lead.database import Database
        old_session = Database.session.fget

        def session(self):
            session = old_session(self)
            if session.extension is None:
                session.extension = ExpireAllSessionExtension()
            return session

        Database.session = property(session)
else:
    def installExtension():
        pass
When defining the mapper, you install this extension with:
from .sessionexpiration import installExtension
# Ensure that sessions get properly expired on commit and rollback.
installExtension()
Currently we're relying on Sphinx's PHP library to manage our faceted search, which depends on the ability to use Sphinx's multi-queries feature.
The latest Sphinx search documentation describes how to perform the same multi-query procedure in SphinxQL, via MySQL. It gives an example using PHP.
http://sphinxsearch.com/docs/manual-2.0.4.html#sphinxql-multi-queries
Do any MySQL gems exist for Ruby that support multi-queries in this way?
I'm looking at the mysql2 gem, which seems to be the latest thing, but it doesn't appear to support them. Am I still at a loss when it comes to Sphinx multi-queries in Ruby?
If not, I'm going to write a client that supports them at work in the next few days anyway, but obviously SphinxQL would make this much easier. I'd also rather not have my gem connect over two different protocols just for RT indexes (which can only be written to via SphinxQL). It seems like SphinxQL is basically where it's at.
It appears the ruby-mysql gem supports this: https://github.com/tmtm/ruby-mysql/blob/master/lib/mysql.rb#L406-419
I assume it is processing this correctly. It says 'execute', but in reality it appears to simply be fetching the next set of results from a query that was already executed.
# execute next query if multiple queries are specified.
# === Return
# true if next query exists.
def next_result
  return false unless more_results
  check_connection
  @fields = nil
  nfields = @protocol.get_result
  if nfields
    @fields = @protocol.retr_fields nfields
    @result_exist = true
  end
  return true
end
There is a reference to it here too: http://zetcode.com/db/mysqlrubytutorial/ (under 'Multiple Statements').
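As a rough, untested usage sketch (assuming ruby-mysql mirrors the C API here, and a hypothetical SphinxQL endpoint on port 9306), you would enable multi-statements, send the semicolon-separated queries, then walk the result sets with next_result/store_result:

require 'mysql'

# SphinxQL speaks the MySQL wire protocol; host and port are made up.
my = Mysql.connect('127.0.0.1', nil, nil, nil, 9306)
my.set_server_option(Mysql::OPTION_MULTI_STATEMENTS_ON)

result = my.query("SELECT * FROM idx WHERE MATCH('foo'); SHOW META")
loop do
  result.each { |row| p row }
  break unless my.next_result   # false when no more result sets remain
  result = my.store_result
end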
I ended up writing my own gem, which includes a little wrapper around MySQL. Not a fully-fledged mysql client, but a minimal bridge in order to support SphinxQL.
You can see the gem here: https://github.com/d11wtq/oedipus
And the C extension here: https://github.com/d11wtq/oedipus/blob/master/ext/oedipus/oedipus.c