Read after write consistency with mysql and multiple concurrent connections - mysql

I'm trying to understand whether it is possible to achieve the following:
I have multiple instances of an application server running behind a round-robin load balancer. The client expects GET after POST/PUT semantics, in particular the client will make a POST request, wait for the response and immediately make a GET request expecting the response to reflect the change made by the POST request, e.g:
> Request: POST /some/endpoint
< Response: 201 CREATED
< Location: /some/endpoint/123
> Request: GET /some/endpoint/123
< Response must not be 404 Not Found
It is not guaranteed that both requests are handled by the same application server. Each application server has a pool of connections to the DB. Each request will commit a transaction before responding to the client.
Thus the database will on one connection see an INSERT statement, followed by a COMMIT. One another connection, it will see a SELECT statement. Temporally, the SELECT will be strictly after the commit, however there may only be a tiny delay in the order of milliseconds.
The application server I have in mind uses Java, Spring, and Hibernate. The database is MySQL 5.7.11 managed by Amazon RDS in a multiple availability zone setup.
I'm trying to understand whether this behavior can be achieved and how so. There is a similar question, but the answer suggesting to lock the table does not seem right for an application that must handle concurrent requests.

Under ordinary circumstances, you will not have any issue with this sequence of requests, since your MySQL will have committed the changes to the database by the time the 201 response has been sent back. Therefore, any subsequent statements will see the created / updated record.
What could be the extraordinary circumstances under which the subsequent select will not find the updated / inserted record?
Another process commits an update or delete statement that changes or removes the given record. There is not too much you can do about this, since it is part of the normal operation. If you do not want such thing to happen, then you have to implement application level locking of data.
The subsequent GET request is routed not only to a different application server, but that one uses (or is forced to use) a different database instance, which does not have the most updated state of that record. I would envisage this to happen if either application or database server level there is a severe failure, or routing of the request goes really bad (routed to a data center at a different geographical location). These should not happen too frequently.

If you're using MyISAM tables, you might be seeing the effects of 'concurrent inserts' (see 8.11.3 in the mysql manual). You can avoid them by either setting the concurrent_insert system variable to 0, or by using the HIGH_PRIORITY keyword on the INSERT.

Related

MySQL trigger notifies a client

I have an Android frontend.
The Android client makes a request to my NodeJS backend server and waits for a reply.
The NodeJS reads a value in a MySQL database record (without send it back to the client) and waits that its value changes (an other Android client changes it with a different request in less than 20 seconds), then when it happens the NodeJS server replies to client with that new value.
Now, my approach was to create a MySQL trigger and when there is an update in that table it notifies the NodeJS server, but I don't know how to do it.
I thought two easiers ways with busy waiting for give you an idea:
the client sends a request every 100ms and the server replies with the SELECT of that value, then when the client gets a different reply it means that the value changed;
the client sends a request and the server every 100ms makes a SELECT query until it gets a different value, then it replies with value to the client.
Both are bruteforce approach, I would like to don't use them for obvious reasons. Any idea?
Thank you.
Welcome to StackOverflow. Your question is very broad and I don't think I can give you a very detailed answer here. However, I think I can give you some hints and ideas that may help you along the road.
Mysql has no internal way to running external commands as a trigger action. To my knowledge there exists a workaround in form of external plugin (UDF) that allowes mysql to do what you want. See Invoking a PHP script from a MySQL trigger and https://patternbuffer.wordpress.com/2012/09/14/triggering-shell-script-from-mysql/
However, I think going this route is a sign of using the wrong architecture or wrong design patterns for what you want to achieve.
First idea that pops into my mind is this: Would it not be possible to introduce some sort of messaging from the second nodjs request (the one that changes the DB) to the first one (the one that needs an update when the DB value changes)? That way the the first nodejs "process" only need to query the DB upon real changes when it receives a message.
Another question would be, if you actually need to use mysql, or if some other datastore might be better suited. Redis comes to my mind, since with redis you could implement the messaging to the nodejs at the same time...
In general polling is not always the wrong choice. Especially for high load environments where you expect in each poll to collect some data. Polling makes impossible to overload the processing capacity for the data retrieving side, since this process controls the maximum throughput. With pushing you give that control to the pushing side and if there is many such pushing sides, control is hard to achieve.
If I was you I would look into redis and learn how elegantly its publish/subscribe mechanism can be used as messaging system in your context. See https://redis.io/topics/pubsub

How to understand a session on an sql server?

Hi am confused with sql servers session. What does it actually mean? Does it keep track of the client like httpSession? I have read some documents on query life cycle. None talks about the sesion. Most of the documents say that after the query is recived by the server it gets parsed and then maintains a syntax tree and then execution plan and then executes the query and then a dispatch palan and then dispatches the resultset to the client who issued the query on the server. In the whole story where does the session on sql server like mysql server fits in and what actually it does? or There is no session concept on Mysql server(any sql server)? am i in wrong imagination?
A session in this context usually just refers to a single client connection.
The client connects to the DB server and authenticates; this is the start of the session.
When the client disconnects (gracefully or not) the session ends.
This is relevant for things like temporary tables or transactions: Un-committed transactions will be rolled back by the DBMS and all temporary tables created through this connection (=session) are discarded when the client disconnects, i.e. when the session ends.
Note that a client does not necessarily actively end a session or connection. The client may crash, or the network connection may break, or the server may shut down &c. Any of this implicitly terminates the session.
Problems may arise when a (client) application uses a connection pool keeping connections (and sessions) open and handing them out transparently to different application components. When not handled correctly, errors may occur because a given session may already be 'spoiled' by a previous operation. If, for example, one routine on the client creates a temporary table named 'X' and fails to explicitly drop it afterwards, the next routine that 'inherits' this session may encounter an error when trying to create another temporary table of that name, because it already exists in this specific session; which couldn't be the case if the connection/session was freshly created.
"Session" is mainly a generic term. You connect to a server (MySQL, Oracle, FTP, IRC... whatever), you do your stuff and finally disconnect when you're done. That has been a session.
HTTP is a particular case. It's a stateless protocol: if you spend an hour reading a web site, you don't remain connected for a whole hour. You make a quick connection, fetch an item at a time (an HTML document, a style sheet, a picture...) and close the connection. (Internals are actually more complex but that's the general idea.) When you ask for a second page, the server doesn't know who you are: that makes it impossible to keep track of your whole browsing session at protocol level. Thus HTTP sessions were invented: they're a way to emulate physical sessions.
The MySQL session starts when you open a connection to the server. A connection ID is assigned which can be read via the SELECT CONNECTION_ID() statement. The session is terminated when the connection is closed or, in case of persistent connections, after a certain timeout or when the server shuts down.

How to cache popular queries to avoid both stamedes and blank results

On the customizable front page of our web site, we offer users the option of showing modules showing recently updated content, choosing from well over 100 modules.
All of the data is generated by MySQL queries, the results of which are cached via memcached. Our current system works like this: when a user load a page containing modules, module, they are immediately served the data from cache, and the query is added to a queue to be updated by a separate gearman process (so that the page load does not wait for the mysql query). That query is then run once every 15 minutes to refresh the data in cache. The queue of queries itself is periodically purged so that we do not continually refresh data that has not been requested recently.
The problem is what to do when the cache is empty, for some reason. This doesn't happen often, but when it does, the user is currently shown an empty module, and the data is refreshed in the gearman process so that a bit later, when the same (or a different) user reloads the page, there is data to show.
Our traffic is such that, if we were to try to run the query live for the user when the cache is empty, we would have a serious problem with stampeding--we'd be running the same (possibly slow) query many times as many users loaded the page. Is there any way to solve the "blank module" problem without opening up the risk of stampeding?
This is an interesting implementation though varies a bit from the way most typically implement memcached in fronT of MySQL.
In most cases users will set things up to where queries are first evaluated at memcached to see if there is is an available entry. If so they server it from memcached and never query the database at all. If there is a cache miss, then the query is made against the database, the results added to memcached, and the information returned to the caller. This is how you would typically build up your cache for read queries.
In cases where data is being updated, the update would be made against the database, and then the appropriate data in memcached invalidated and/or updated. Similarly for inserts, you could either do nothing regarding the cache (and let the next read on that record populate the cache), or you could actively add the data related to the insert into the cache, depending on your application needs.
In this way you wouldn't need to take the extra step of calling the database to get authoritative data after getting initial data from memcached. The data in memcached would be a copy of the authoritative data which is just updated/invalidated upon updates/inserts.
Based on your comments, one thing you might want to try in order to prevent a number of of queries on your database in case of cache misses is to use a mutex of sorts. For example, when the first client hits memcached and gets a cache miss for that lookup, you could could insert a temporary value in memcached indicating that the data is pending, then make the query against the database, and the update the memcached data with the result.
On the client side, when you get a cache miss or a "pending" result, you could simply initiate a retry for the cache after a certain period of time (which you may want to increase exponentially). So perhaps first hey wait for 1 second, then try back gain in 2 seconds if they still get a "pending" results, then retry in 4 seconds, and so on.
This would amount in possibly more requests against the memcached server, but should resolve any problems on the database layer.

Prevent 'too many connections'(ConnectionPool is not the answer, looking for mysql server side solution)

A few weeks ago, I post a question about queuing database access request to prevent 'too many connection' error when massive concurrent db requests happen. People told me ConnectionPool is the right way to go which I agreed at that time. However, I finally realized this is not the solution especially when there are a lot of different clients accessing mysql server through network, because connection pool is at client side it can not prevent the sum of connections of all clients from exceeding the max connection number of mysql server.
I think there should be some middleware on the mysql server working as a queue or pool, is anybody familiar with this? Thank you.
I know this question is widely asked, I am also surprised as if there is no total solution for it.
HAProxy should perform TCP-level queueing for you purpose. Though, would it be better to build an application server in the middle, to handle incoming flow at more conscious level than TCP. This could require rewriting of both server and clients, but could give you more control over what's happening.
What you ask is actually a pretty complicated problem.
First of all you need to decide whether mis-alignments in data are acceptable, for example: if you store in the database the number of Likes received, and you ask this number at 12:00:00, and the number in the DB is 500, and someone posts a LIKE at 12:00:01, and you query it again at 12:00:02; is it OK to receive "500" again, even if the correct number should be 501, provided that in a little time the answer "501" does come out?
If this is acceptable (the infamous "301 bug" in YouTube), then you might start caching some SELECT responses.
You might even cache them in middleware, i.e. have a special process running continuously and hogging ONE connection to MySQL, and answering requests in a queue. You might run it internally in the server as a Web server on port 8001 and have an Apache ReverseProxy, HAproxy, pound, or NginX location to proxy it outside.
You can do the same for special UPDATE/DELETE queries even if it's trickier.
It would be best to cache queries running asynchronously through AJAX first, if any, because serializing queries with a proxy is liable to perceptibly slow down the application.
You have a threefold target:
run queries on MySQL as fast as possible (look into indexing and MySQL caching) in order to free the ConnectionPool and keep it as lightly loaded as possible.
refactor the application in order to extract all information from queries (e.g., the number of rows with a certain property AND those rows as data are often retrieved using TWO queries, but with proper management you need only one and a SQLNumRows() call. Also, quite often similar queries with different informations are run, when a single query might have returned all information at one go: typically, one query to check user/password, another to fetch the complete user profile).
divert the most calls possible to something not at all (NginX, middleware) or lightly (queuing process) bound to MySQL; in the latter case, using a known number of connections in order to run predictably.
Unfortunately there's no easy "magic bullet" to solve this problem (except of course increasing the number of connections, maybe replicating the DB on several hosts running as master-slave. While not really a magic bullet, it is easier to design and implement).

How To Mutex Across a Network?

I have a desktop application that runs on a network and every instance connects to the same database.
So, in this situation, how can I implement a mutex that works across all running instances that are connected to the same database?
In other words, I don't wan't that two+ instances to run the same function at the same time. If one is already running the function, the other instances shouldn't have access to it.
PS: Database transaction won't solve, because the function I wan't to mutex doesn't use the database. I've mentioned the database just because it can be used to exchange information across the running instances.
PS2: The function takes about ~30 minutes to complete, so if a second instance tries to run the same function I would like to display a nice message that it can't be performed right now because computer 'X' is already running that function.
PS3: The function has to be processed on the client machine, so I can't use stored procedures.
I think you're looking for a database transaction. A transaction will isolate your changes from all other clients.
Update:
You mentioned that the function doesn't currently write to the database. If you want to mutex this function, there will have to be some central location to store the current mutex holder. The database can work for this -- just add a new table that includes the computername of the current holder. Check that table before starting your function.
I think your question may be confusion though. Mutexes should be about protecting resources. If your function is not accessing the database, then what shared resource are you protecting?
put the code inside a transaction either - in the app, or better -inside a stored procedure, and call the stored procedure.
the transaction mechanism will isolate the code between the callers.
Conversely consider a message queue. As mentioned, the DB should manage all of this for you either in transactions or serial access to tables (ala MyISAM).
In the past I have done the following:
Create a table that basically has two fields, function_name and is_running
I don't know what RDBMS you are using, but most have a way to lock individual records for update. Here is some pseduocode based on Oracle:
BEGIN TRANS
SELECT FOR UPDATE is_running FROM function_table WHERE function_name='foo';
-- Check here to see if it is running, if not, you can set running to 'true'
UPDATE function_table set is_running='Y' where function_name='foo';
COMMIT TRANS
Now I don't have the Oracle PSQL docs with me, but you get the idea. The 'FOR UPDATE' clause locks there record after the read until the commit, so other processes will block on that SELECT statement until the current process commits.
You can use Terracotta to implement such functionality, if you've got a Java stack.
Even if your function does not currently use the database, you could still solve the problem with a specific table for the purpose of synchronizing this function. The specifics would depend on your DB and how it handles isolation levels and locking. For example, with SQL Server you would set the transaction isolation to repeatable read, read a value from your locking row and update it inside a transaction. Don't commit the transaction until your function is done. You can also use explicit table locks in a transaction on most databases which might be simpler. This is probably the simplest solution given you are already using a database.
If you do not want to rely on the database for whatever reason you could write a simple service that would accept TCP connections from your client. Each client would request permission to run and would return a response when done. The server would be able to ensure only one client gets permission to run at a time. Dead clients would eventually drop the TCP connection and be detected as long as you have the correct keep alive setting.
The message queue solution suggested by Xepoch would also work. You could use something like MSMQ or Java Message Queue and have a single message that would act as a run token. All your clients would request the message and then repost it when done. You risk a deadlock if a client dies before reposting so you would need to devise some logic to detect this and it might get complicated.