Do data queries and computation happen on the MySQL server or the Rails server?

I need to regularly run a long backend job with long MySQL queries, which will take several hours to complete. I set up the Delayed Job gem to schedule this job.
When this process is running:
(1) Will this job slow down my Rails front-end server (i.e., will it take much longer to respond to a simple user request)?
(2) Where does the heavy computation happen: in my Rails server, or in the MySQL server?
(3) Will the MySQL server be occupied by my scheduled job, so that no one else can access MySQL at the same time?
Thank you.

The answer to your question is: it depends.
Whether the job slows down the front end: if your task is processor intensive, it could slow down the Rails server. If you are concerned about the DJ workers impacting the front-end box, move them to another box with access to a shared DB. Your worker box needs the project set up but does not need to be the same box you are serving pages from.
Where the computation happens depends entirely on how you wrote your task. Typically a Rails app issues simple select / insert / update / delete statements, and the actual computation is done in Rails. But you can write select statements that involve complex joins or take advantage of functions in the DB, which offloads the computation of complex fields to the DB.
Whether MySQL is blocked for everyone else depends on the number of connections your DB is configured to accept. Typically, on a production-level server, you wouldn't see an issue here from the size of your query. But you should take into account how many active connections there are and how many are permitted. Each Rails instance counts as a connection, as does each DJ worker.
In each case the actual performance is going to depend on several factors: how many connections you are creating, how much data you are transmitting between worker and DB, and where you are doing the work.
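To make the point about where the computation happens concrete, here is a minimal sketch. It is written in Python rather than Rails, and it uses SQLite from the standard library as a stand-in for MySQL; the `orders` table and its columns are hypothetical. The same totals are computed once in the application process and once inside the database.

```python
import sqlite3

# SQLite (standard library) stands in for MySQL here; the table and
# column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# Option 1: fetch every row and do the computation in the application.
# Every row crosses the connection and the app's CPU does the summing.
totals_in_app = {}
for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
    totals_in_app[customer_id] = totals_in_app.get(customer_id, 0.0) + amount

# Option 2: let the database aggregate and return one row per customer.
# Only the results cross the connection; the database's CPU does the work.
totals_in_db = dict(
    conn.execute("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id")
)

conn.close()
```

Both options produce the same totals. Which one is cheaper for a real job depends on the data volume and on which machine has spare capacity.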

If the Rails server is on the same machine as the MySQL server, then there will be some impact. But your OS and MySQL together are pretty capable of minimizing the effects without much other intervention by you. Depending on how you're deployed, you can always use the 'nice' command to lower the priority of the delayed job, minimizing its impact on your site's responsiveness.

Related

Amazon RDS MySQL/Aurora query sometimes hangs forever. Any 2 cents on metrics and approaches we can use to triage it and prevent it from happening?

Some context: in our old data pipeline system, we are running MySQL 5.6 or Aurora on Amazon RDS. The bad thing about our old data pipeline is that it runs a lot of heavy computation on the database servers, because we are handcuffed by the original design: transactional databases are treated as a data warehouse, and our backend API “fishes” the databases directly and heavily. We are currently patching this old data pipeline while redesigning the new data warehouse in Snowflake.
In our old data pipeline system, the pipeline calculation is a series of sequential MySQL queries. As our data grows, the problem is that the calculation may hang forever at, for example, the step 3 MySQL query, while all the metrics we monitor in Amazon CloudWatch/Grafana (CPU, database connections, freeable memory, network throughput, swap usage, read latency, available storage, write latency, etc.) look normal. The MySQL slow query log is not really helpful here because every query in the data pipeline is slow anyway (a single query can take hours because the old pipeline runs a lot of heavy computation on the database servers). We usually solve these problems by “blindly” upgrading the MySQL/Aurora Amazon RDS service and hoping that fixes the issue. I am wondering:
(1) Which database metrics in MySQL 5.6 or Aurora on Amazon RDS should we monitor in real time to help us identify why a query freezes forever? Something like innodb_buffer_pool_size?
(2) Is there any existing tool or in-house approach that can predict how much hardware we need, so that we can execute a query and be confident it will succeed? Could someone share their 2 cents?
One thought: since Amazon RDS is sometimes a bit of a black box, one possible way is to host our own MySQL server on an Amazon EC2 instance in parallel with our Amazon MySQL 5.6/Aurora RDS production server, so we can ssh into the MySQL server and run command-line tools like mytop (https://www.tecmint.com/mysql-performance-monitoring/) to gather a lot more real-time MySQL metrics that can help us triage the issue. Open to any 2 cents from gurus. Thank you!
None of the tools mentioned at that link should need to run on the database server itself, and to the extent that this is true, there should be no difference in their behavior if they aren't. Run them on any Linux server, giving the appropriate --host and --user and --password arguments (in whatever form they may expect). Even mysqladmin works remotely. Most of the MySQL command line tools do (such as the mysql cli, mysqldump, mysqlbinlog, and even mysqlcheck).
There is no magic coupling that most administrative utilities can gain by running on the same server as MySQL Server itself -- this is a common misconception but, in fact, even when running on the same machine, they still have to make a connection to the server, just like any other client. They may connect to the unix socket locally rather than using TCP, but it's still an ordinary client connection, and provides no extra capabilities.
It is also possible to run an external replica of an RDS/MySQL or Aurora/MySQL server on your own EC2 instance (or in your own data center, even). But this isn't likely to tell you a whole lot that you can't learn from the RDS metrics, particularly in light of the above. (Note also, that even replica servers acquire their replication streams using an ordinary client connection back to the master server.)
Avoid the temptation to tweak server parameters. On RDS, most of the defaults are quite sane, and unless you know specifically and precisely why you want to adjust a parameter... don't do it.
The most likely explanation for slow queries... is poorly written queries and/or poorly designed indexes.
If you are not familiar with EXPLAIN SELECT, then you need to learn it, live it, and love it. SQL is declarative, not procedural. That is, SQL tells the server what you want -- not specifically how to obtain it internally. For example: SELECT ... FROM x JOIN y tells the server to match up the rows from table x and y ON a certain criteria, but does not tell the server whether to read from x then find the matching rows in y... or read from y and find the matching rows in x. The net result is the same either way -- it doesn't matter which table the server examines first, internally -- but if the query or the indexes don't allow the server to correctly deduce the optimum path to the results you've requested, it can spend countless hours churning through unnecessary effort.
Take, for an extreme and overly simplified example, a table with millions of rows and a table with 1 row. It would make sense to read the small table first, so you know what 1 value you're trying to join in the large table. It would make no sense to read through each row in the large table, then go over and check the small table for a match for each of the millions of rows. The order in which you join tables can be different from the order in which the actual joining is done.
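The effect of join order can be illustrated with a small simulation. This is only a sketch in plain Python, not how MySQL executes joins internally; the two "tables", their columns, and the dictionary standing in for an index are all hypothetical.

```python
# Hypothetical data: a large table with a join column, and a one-row table.
large = [{"id": i, "small_id": i % 1000} for i in range(100_000)]
small = [{"id": 42}]

# A dictionary plays the role of an index on large.small_id.
large_by_small_id = {}
for big_row in large:
    large_by_small_id.setdefault(big_row["small_id"], []).append(big_row)

# Plan A: read the small table first, then probe the index on the large one.
rows_examined_a = 0
result_a = []
for small_row in small:
    rows_examined_a += 1
    for big_row in large_by_small_id.get(small_row["id"], []):
        rows_examined_a += 1
        result_a.append((small_row["id"], big_row["id"]))

# Plan B: read every row of the large table, then check the small table.
small_ids = {small_row["id"] for small_row in small}
rows_examined_b = 0
result_b = []
for big_row in large:
    rows_examined_b += 1
    if big_row["small_id"] in small_ids:
        result_b.append((big_row["small_id"], big_row["id"]))
```

Both plans return the same 100 matching pairs, but plan A examines 101 rows while plan B examines all 100,000.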
And that's where EXPLAIN comes in. This allows you to inspect the query plan -- the strategy the internal query optimizer has concluded will get it to the answer you need with the least amount of effort. This is the core of the magic of relational database systems -- finding the correct solution in the optimal time, based on what it knows about the data. EXPLAIN shows you the order in which the tables are being accessed, how they're being joined, which indexes are being used, and an estimate of the number of rows from each table that are involved -- and these numbers multiply together to give you an estimate of the number of permutations involved in resolving your query. Two small tables, each with 50,000 rows, joined without a proper index, means an entirely unreasonable 2,500,000,000 unique combinations between the two tables that must be evaluated; every row must be compared to every other row. In short, if this turns out to be the kind of thing that you are (unknowingly) asking the server to do, then you are definitely doing something wrong. Inspecting your query plan should be second nature any time you write a complex query, to ensure that the server is using a sensible strategy to resolve it.
The output is cryptic, but secret decoder rings are available.
https://dev.mysql.com/doc/refman/5.7/en/explain.html#explain-execution-plan
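The row-estimate arithmetic described above can be sketched in a few lines of Python. The helper name `estimated_combinations` is made up for this illustration; the inputs are the per-table estimates you would read from the rows column of the EXPLAIN output.

```python
from math import prod


def estimated_combinations(rows_per_table):
    """Multiply the per-table row estimates taken from an EXPLAIN plan."""
    return prod(rows_per_table)


# Two 50,000-row tables joined without a usable index: every row of one
# table is compared with every row of the other.
unindexed = estimated_combinations([50_000, 50_000])

# The same join when an index narrows the second table to about 1 row
# per row of the first (an assumed figure for illustration).
indexed = estimated_combinations([50_000, 1])
```

Here `unindexed` is 2,500,000,000 and `indexed` is 50,000, which is why a large product in the plan is usually the first thing to look for.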

simultaneous connections to a mysql database

I made a program that receives user input and stores it in a MySQL database. I want to deploy this program on several computers so users can upload information to the same database simultaneously. The database is very simple: it has just seven columns, and the user will only enter four of them.
There would be around two-three hundred computers uploading information (not always at the same time but it can happen). How reliable is this? Is that even possible?
It's my first script ever so I appreciate if you could point me in the right direction. Thanks in advance.
Having simultaneous connections from the same script depends on how you're processing the requests. The typical choices are by forking a new Python process (usually handled by a webserver), or by handling all the requests with a single process.
If you're forking processes (new process each request):
A single MySQL connection per process should be perfectly fine (the total number of active connections will then equal the number of requests you're handling).
You typically shouldn't worry about multiple connections, since a single MySQL connection (and the server) can handle loads much higher than that (completely dependent upon the hardware, of course). In which case, as @GeorgeDaniel said, it's more important that you focus on controlling how many active processes you have and making sure they don't strain your computer.
If you're running a single process:
Again, a single MySQL connection should be fast enough for all of those requests. If you want, you can look into grouping the inserts together, or into using multiple connections.
MySQL is fast and should be able to easily handle 200+ simultaneous connections that are writing/reading, regardless of how many active connections you have open. And yet again, the performance you get from MySQL is completely dependent upon your hardware.
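Grouping the inserts, as suggested above, might look like the following sketch. SQLite from the standard library stands in for MySQL so the example runs anywhere, and the `entries` table and its four columns are hypothetical. MySQL drivers that follow the Python DB-API also offer `executemany`, though their placeholder style may differ from the `?` used here.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, email TEXT, city TEXT, note TEXT)")

# Rows collected from users and waiting to be written.
pending = [
    ("Ana", "ana@example.com", "Lima", "first"),
    ("Ben", "ben@example.com", "Oslo", "second"),
    ("Cy", "cy@example.com", "Pune", "third"),
]

# One executemany call inside one transaction, instead of three separate
# INSERT statements each followed by its own commit.
with conn:
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", pending)

row_count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
conn.close()
```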
Yes, it is possible to have that many MySQL connections. It depends on a few variables. The maximum number of connections MySQL can support depends on the quality of the thread library on a given platform, the amount of RAM available, how much RAM is used for each connection, the workload from each connection, and the desired response time.
The number of connections permitted is controlled by the max_connections system variable. The default value is 151 to improve performance when MySQL is used with the Apache Web server.
The important part is to handle the connections properly and close them when you are done. You do not want redundant connections piling up, as they can cause slowdowns in the long run. Make sure your code always closes its connections.
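One way to make sure a connection is always closed, even when an error occurs, is to wrap it in `contextlib.closing`. The sketch below uses SQLite from the standard library as a stand-in for MySQL, and the `entries` table is hypothetical; the same pattern works for any connection object that has a `close()` method.

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() when the block exits, whether it exits
# normally or because of an exception.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE entries (name TEXT)")
    conn.execute("INSERT INTO entries VALUES ('example')")
    conn.commit()
    stored = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

# Check that the connection really is closed: sqlite3 refuses further use.
try:
    conn.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True
```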

Restart MySQL server without disrupting users

What are some generally accepted strategies for restarting a MySQL server on a busy website without interrupting current users? I am using a LAMP setup. I don't mind taking down the site for a time if need be, but if certain user activities are interrupted I could wind up with corrupted data. I do have the ability to bring up a second server if that helps in the transition. I need a solution that results in no corrupted data / data loss.
I suspect this could be a common problem without an easy solution, but not sure what the best approach would be. Any guidance would be appreciated.
Thanks, Brian
Any solution for high availability depends on redundancy.
The most popular strategy today is to run two MySQL servers. Configure the two servers to replicate bidirectionally. This comes with its own challenges; you must manage your applications carefully to write to only one server at a time, to avoid creating update conflicts. When you need to restart one MySQL server, switch your apps to use the other MySQL server.
Even with this configuration, you can't make the switchover without interrupting connections, even if the interruption is brief.
Another solution is MySQL Cluster, in which both MySQL Servers and storage are redundant, but this is also complex to set up and manage, requires high-end hardware resources, and shards your data in ways that make it hard to optimize for general SQL queries.

How are "simultaneous client connections" quantified in mysql

Sorry for the newb factor, but I was reading about "Too many connections" to mysql.
http://dev.mysql.com/doc/refman/5.5/en/too-many-connections.html
How are "simultaneous client connections" quantified in mysql?
For example if 20 million people are on gmail (let's say they use mysql with only 1 table to store everything just for sake of example) and all those people simultaneously all click on an email to open up, does that mean there are 20 million simultaneous connections or just one connection since all the users are connecting to the same table?
EDIT: I'm trying to understand what the term 'client' means. Is a 'client' someone who is using the application, or is a 'client' the part of the application (ex. php script) that is connecting to the database?
When a visitor goes to your website and the server-side script connects to the database it is 1 connection - you can make as many queries as necessary during that connection to any number of tables/databases - and on termination of the script the connection ends. If 31 people request a page (and hence a db connection) and your limit is 30, then the 31st person will get an error.
You can upgrade server hardware so MySQL can efficiently handle loads of connections or spread the load across multiple database servers. It is possible to have your server-side scripting environment maintain a persistent connection to MySQL in which case all scripts make queries through that single connection. This will probably have adverse effects on the correct queuing of queries and their order to maintain usable speeds under high load, and ultimately doesn't solve the CPU/memory/disk bottlenecks with handling large numbers of queries.
In the case of a webmail application, the query to check for new messages runs so fast (in the milliseconds) that hitting server limits isn't likely unless it's on a large scale.
Google's applications scale on a level previously unheard of. Check out the docs on MapReduce, GoogleFS, etc. It's awesome.
In answer to your edit - anything that connects directly to MySQL is considered a client in this case. Each PHP script that connects to MySQL is a client, as is the MySQL console on the command line, or anything else.
Hope that helps
The connections mentioned are server connections. Every client has one or more. For example, if your PHP script connects to MySQL, there may be several web requests at a time and thus several connections to the DB.
Sometimes you can run out of them, because they are not closed properly after they are no longer needed.
And I think Gmail is stored in a different way than in one large MySQL DB :]
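The way simultaneous connections are counted can be shown with a toy model. This is only an illustration in Python: `ToyServer` and `TooManyConnections` are made-up names, and the class does not reflect how MySQL is implemented. It only mirrors the rule that each connecting script counts as one client and that connections beyond the limit are refused.

```python
class TooManyConnections(Exception):
    """Raised by the toy server when the connection limit is reached."""


class ToyServer:
    """A toy model of a connection limit; not how MySQL is implemented."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self):
        if self.open_connections >= self.max_connections:
            raise TooManyConnections("Too many connections")
        self.open_connections += 1

    def disconnect(self):
        self.open_connections -= 1


server = ToyServer(max_connections=30)

# 30 page requests arrive at once; each script opens one connection.
for _ in range(30):
    server.connect()

# The 31st simultaneous request is refused.
try:
    server.connect()
    thirty_first_refused = False
except TooManyConnections:
    thirty_first_refused = True

# Once one script finishes and closes its connection, a new one fits.
server.disconnect()
server.connect()
```

The number of people using the site never appears in the model; only the scripts that hold an open connection at the same moment count against the limit.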

Mysql primary database usage

Q:
I've inherited a system that consists (for simplicity) of 2 application servers that write to a single master database. One application server performs quite a few operations per unit of time (each taking a small amount of time, on the order of milliseconds). The other application server acts like an API server, through which clients interact. This "API" server operates on half the tables in the database, most of which are not needed by the other application server. However, the "API" server does cause the other application server, through its interaction with SQL Server, to lose time and performance.
I wanted to know what would be a good approach in resolving this.
Ideas so far:
[1] create a second database which will be master-master slaved with the current database. Getting the http://mysql-mmm.org/ scripts and running them. (concurrency?)
[2] slowly begin moving tables from "master" database into a new "API" database. (lots of legacy code..)
[3] some kind of a SQL priority queue.. (how fault tolerant can this be?)
Step 1 - work out where your bottleneck is
Step 2 - decide where your best return on effort is
If you simply want to make it perform better, then you have to work out where the slow point is. Ideally you would use 3 hosts, one for each application server and one for the database. In this configuration, you should quickly be able to work out if it is the database working the disks hard, or if it's CPU loading, lock contention etc.
Once you know where the bottleneck is, you'll have a much more focussed problem to fix. The options you have suggested may or may not help depending on what the real bottleneck is.