How to make polling from database scalable? - mysql

I am try to find a scalable way to allow for my desktop application to run command when a change in the database is made.
The application is for running a remote command on your PC. The user logs into the website and can choose the run the command. Currently, users have to download a desktop application that checks the database every few seconds to see if a value has changed. The value can only be changed when they login to a website and press a button.
For now it seems to be working fine since there aren't many users. But when I hit 100+ users hitting the database 100+ times every few seconds is not good. What might be a better approach?

It's true that polling for changes is too expensive, especially if you have many clients. The queries are often very costly, and it's tempting to run the queries frequently to make sure the client gets notified promptly after a change. It's better to avoid polling the database.
One suggestion in the comments above is to use a UDF called from a trigger. But I don't recommend this, because a trigger runs when you do an INSERT/UPDATE/DELETE, not when you COMMIT the change. So a client could be notified of a change, and then when they check the database the change appears to not be there, because either the transaction was rolled back, or else the transaction simply hasn't been committed yet.
Another reason the trigger solution is not good is that MySQL triggers execute once for each row changed, not once for each INSERT/UPDATE/DELETE statement. So you could cause notification spam, if you do an UPDATE that affects thousands of rows.
A different solution is to use a message queue like RabbitMQ or ActiveMQ or Amazon SQS (there are many others). When a client commits their INSERT/UPDATE/DELETE, they confirm the commit succeeded, then post a message on a message queue topic. Many clients can be notified efficiently this way. But it requires that every client who commits changes to the database write code to post to the message queue.
Another solution is for clients to subscribe to MySQL's binary log and read it as a change data capture log. Every committed change to the database is logged in the binary log. You can make clients read this, and it has no more impact to the database server than a replication client (MySQL can easily support hundreds of replicas).
A hybrid solution is to consume the binary log, and turn those changes into events in a message queue. This is how a product like Debezium works. It reads the binary log, and posts events to an Apache Kafka message queue. Then other clients can wait for events on the Kafka queue and respond to them.

Related

Trigger script when SQL Data changes

I'm trying to make a live react control panel, so when you push a button on the web control panel the data (true or false) goes to the SQL database (phpmyadmin) and the when the data changes te SQL database should trigger a script on the raspberry pi that will turn the light on.
I know how to write data to the SQL database and how to control a lamp with a raspberry pi but I dont know how to trigger or execute something when data in the SQL database gets updated.
It needs to live, like react in max 20 ms or something. Can anyone help me with this?
The SQL Database runs on Ubuntu and is phpmyadmin based.
Greets,
Jules
Schematic:
DataUpdateGraphical
It's not a good idea to use a trigger in MySQL to activate any external process. The reason is that the trigger fires when the INSERT/UPDATE/DELETE executes, not when the transaction commits. So if the external process receives the event, it may immediately go query the database to get other details about that data change, and find it cannot see the uncommitted data.
Instead, I recommend whatever app is writing to the database should be responsible for creating the notification. Only then can the app wait until after the transaction is confirmed to be committed.
So your PHP code that handles the button press would insert/update some data the database, and check that the SQL completed without errors (always check the result of executing an SQL statement) and the transaction committed.
Then the same PHP code subsequently calls your script, or posts an even to a message queue that the script is waiting for, or something like that.
Just don't use the MySQL as a poor man's message queue! It's not the right tool for that.
The same advice applies to any other action you want to do external to the database. Like sending an email, writing a file, making an http API call, etc.
Don't do it in an SQL trigger, because external actions don't obey transaction isolation. The trigger or one of the cascading data updates could get rolled back, but the effect of an external action cannot be rolled back.
MySQL doesn't have a way to deliver an event to external software from within a trigger. That's what you need to have your database push events to your app.
(Actually, it's possible to install a user-defined function that sends an industry-standard stomp messsage to a message queue system like rabbitmq . But you will have to control the entire server, AND your database administrator, to get that installed.)
The alternative: run a query every so often to retrieve changed information, and push it to your app. That's a nasty alternative: polling is a pain in the xxx neck.
Can you get your server app to detect changes as it UPDATEs the database? It'll take some programming and testing, but it's a good solution to your problem.
You could use redis instead of / in addition to MySql. redis sends events to web servers whenever values change, which is close to perfect for what you want to do. https://redis.io/topics/notifications

Is there a way to keep track of the calls being done in mysql server by a web app?

I'm finishing a system at work that makes calls to mysql server. Those calls' arguments reveal information that I need to keep private, like vote(idUser, idCandidate). There's no information in the db that relates those two of course, nor in "the visible part" of the back end, but even though I think this can't be done, I wanted to make sure that it is impossible to trace this sort of calls, with a log or something (calls that were made, or calls being made at the moment), as it is impossible in most languages, unless you specifically "debug" in a certain way, while the system is in production and being used. I hope the questions is clear enough. Thanks.
How do I log thee? Let me count the ways.
MySQL query log. I can enable this per-session and send everything to a log file.
I can set up a slave server and have insertions sent to me by the master. This is a significant intervention and would leave a wide trace.
On the server, unbeknownst to either Web app and MySQL log, I can intercept communications between the two. I need administrative access to the machine, of course.
On the server, again with administrative access, I can both log the query calls and inject a logging instrumentation into the SQL interface (the legitimate one is the MySQL Audit Plugin, but there are several alternatives, developed for various purposes by developers over the years)
What can you do? You can have the applications use a secure protocol, just for starters.
Then, you need to secure your machine so that administrator tricks do not work, and even if the logs are activated, nobody can read them and you can be advised of any new and modified file to delete it promptly.

Best Practice for synchronized jobs in Application clusters

We have got 3 REST-Applications within a cluster.
So each application server can receive requests from "outside".
Now we got timed events, which are analysing the database and add/remove rows from the database, send emails, etc.
The problem is, that each application server does start this timed events and it happens that 2 application server are starting this analysing job at the same time.
We got a sql table in the back.
Our idea was to lock a table within the sql database, when starting the job. If the table is locked, we exit the job, because an other application just started to analyse.
What's a good practice to insert some kind of semaphore ?
Any ideas ?
Don't use semaphores, you are over complicating things, just use message queueing, where you queue your tasks and get them executed in row.
Make ONLY one separate node/process/child_process to consume from the queue and get your task done.
We (at a previous employer) used a database-based semaphore. Each of several (for redundancy and load sharing) servers had the same set of cron jobs. The first thing in each was a custom library call that did:
Connect to the database and check for (or insert) "I'm working on X".
If the flag was already set, then the cron job silently exited.
When finished, the flag was cleared.
The table included a timestamp and a host name -- for debugging and recovering from cron jobs that fail to finish gracefully.
I forget how the "test and set" was done. Possibly an optimistic INSERT, then check for "duplicate key".

RDS Read Replica Considerations

We hired an intern and want to let him play around with our data to generate useful reports. Currently we just took a database snapshot and created a new RDS instance that we gave him access to. But that is out of date almost immediately due to changes on the production database.
What we'd like is a live (or close-to-live) mirror of our actual database that we can give him access to without worrying about him modifying any real data or accidentally bringing down our production database (eg by running a silly query like SELECT (*) FROM ourbigtable or a really slow join).
Would a read replica be suitable for this purpose? It looks like it would at least be staying up to date but I'm not clear what would happen if a read replica went down or if data was accidentally changed on it or any other potential liabilities.
The only thing I could find related to this was this SO question and this has me a bit worried (emphasis mine):
If you're trying to pre-calculate a lot of data and otherwise modify
what's on the read replica you need to be really careful you're not
changing data -- if the read is no longer consistent then you're in
trouble :)
TL;DR Don't do it unless you really know what you're doing and you
understand all the ramifications.
And bluntly, MySQL replication can be quirky in my experience, so even
knowing what is supposed to happen and what does happen if there's as
the master tries to write updated data to slave you've also
updated.... who knows.
Is there any risk to the production database if we let an intern have at it on an unreferenced read replica?
We've been running read-replicas of our production databases for a couple years now without any significant issues. All of our sales, marketing, etc. people who need the ability to run queries are provided access to the replica. It's worked quite well and has been stable for the most part. The production databases are locked down so that only our applications can connect to it, and the read-replicas are accessible only via SSL from our office. Setting up the security is pretty important since you would be creating all the user accounts on the master database and they'd then get replicated to the read-replica.
I think we once saw a read-replica get into a bad state due to a hardware-related issue. The great thing about read-replicas though is that you can simply terminate one and create a new one any time you want/need to. As long as the new replica has the exact same instance name as the old one its DNS, etc. will remain unchanged, so aside from being briefly unavailable everything should be pretty much transparent to the end users. Once or twice we've also simply rebooted a stuck read-replica and it was able to eventually catch up on its own as well.
There's no way that data on the read-replica can be updated by any method other than processing commands sent from the master database. RDS simply won't allow you to run something like an insert, update, etc. on a read-replica no matter what permissions the user has. So you don't need to worry about data changing on the read-replica causing things to get out of sync with the master.
Occasionally the replica can get a bit behind the production database if somebody submits a long running query, but it typically catches back up fairly quickly once the query completes. In all our production environments we have a few monitors set up to keep an eye on replication and to also check for long running queries. We make use of the pmp-check-mysql-replication-delay command in the Percona Toolkit for MySQL to keep an eye on replication. It's run every few minutes via Nagios. We also have a custom script that's run via cron that checks for long running queries. It basically parses the output of the "SHOW FULL PROCESSLIST" command and sends out an e-mail if a query has been running for a long period of time along with the username of the person running it and the command to kill the query if we decide we need to.
With those checks in place we've had very little problem with the read-replicas.
The MySQL replication works in a way that what happens on the slave has no effect on the master.
A replication slave asks for a history of events that happened on the master and applies them locally. The master never writes anything on the slaves: the slaves read from the master and do the writing themselves. If the slave fails to apply the events it read from the master, it will stop with an error.
The problematic part of this style of data replication is that if you modify the slave and later modify the master, you might have a different value on the slave than on the master. This can be avoided by turning on the global read_onlyvariable.

MySQL transactions: reads while writing

I'm implementing PayPal Payments Standard in the website I'm working on. The question is not related to PayPal, I just want to present this question through my real problem.
PayPal can notify your server about a payment in two ways:
PayPal IPN - after each payment PayPal sends a (server-to-server) notification to a url (choose by you) with the transaction details.
PayPal PDT - after a payment (if you set this up in your PP account) PayPal will redirect the user back to your site, passing the transaction id in the url, so you can query PayPal about that transaction, to get details.
The problem is, that you can't be sure which one happens first:
Will your server notified by IPN
Will be the user redirected back to your site
Whichever is happening first, I want to be sure I'm not processing a transaction twice.
So, in both cases, I query my DB against the transaction id coming from paypal (and the payment status actually..but it doesn't matter now) to see if I already saved and processed that transaction. If not, I process it, and save the transaction id with other transaction details into my database.
QUESTION
What happens if I start processing the first request (let it be the PDT..so the user was redirected back to my site, but my server wasn't notified by IPN yet), but before I actually save the transaction to database, the second (the IPN) request arrives and it will try to process the transaction too, because it doesn't find it in DB.
I would love to make sure that while I'm writing a transaction into database, no other queries can read the table, looking for that given transaction id.
I'm using InnoDB, and don't want to lock the whole table, for the time of the write.
Can this be solved simply by transactions, have I to lock "manually" that row? I'm really confused, and I hope some more experienced mysql developers can help making this clear for me and solving the problem.
Native database locks are almost useless in a Web context, particularly in situations like this. MySQL connections are generally NOT done in a persistent way - when a script shuts down, so does the MySQL connection and all locks are released and any in-flight transactions are rolled back.
e.g.
situation 1: You direct a user to paypal's site to complete the purchase
When they head off paypal, the script which sent over the http redirect will terminate and shuts down. Locks/transactions are released/rolled back, and they come back to a "virgin" status as far as the DB is concerned. Their record is no longer locked.
situation 2: Paypal does a server-to-server response. This will be done via a completely separate HTTP connection, utterly distinct from the connection established by the user to your server. That means any locks you establish in the yourserver<->user connection will be distinct from the paypal<->yourserver session, and the paypal response will encounter locked tables. And of course, there's no way of predicting when the paypal response comes in. If the network gods smile upon you and paypal's not swamped, you get a response very quickly and possibly while the user<->you connection is still open. If things are slow and the response is delayed, that response MAY encounter unlocked tables/rows because the user<->server session has completed.
You COULD use persistent MySQL connections, but they open up a whole other world of pain. e.g. consider the case where your script has a bug which gets triggered halfway through processing. You connection, do some transaction work, set up some locks... and then the script dies. Because the MySQL connection is persistent, MySQL will NOT see that the client script has died, and it will keep the transactions/locks in-flight. But the connection is still sitting there, in the shared pool waiting for another session to pick it up. When it invariably is, that new script has no idea that it's gotten this old "stale" connection. It'll step into the middle of a mess of locks and transactions it has no idea exists. You can VERY easily get yourself into a deadlock situation like this, because your buggy scripts have dumped garbage all over the system and other scripts cannot cope with that garbage.
Basically, unless you implement your own locking mechanism on top of the system, e.g. UPDATE users SET locked=1 WHERE id=XXX, you cannot use native DB locking mechanisms in a Web context except in 1-shot-per-script contexts. Locks should never be attempted over multiple independent requests.