Can MySQL parallelize UNION subqueries (or anything at all)? - mysql

I use a partitioned table with a large amount of data. According to MySQL docs, it is on the ToDo list that:
Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized.
... but, can I achieve the same functionality using UNION subqueries? Are they parallelized, or do I have to create a multithreaded client to run concurrent queries with all the possible partition keys?
Edit:
The question is not strictly about UNION or subqueries. I would like to utilize as many cores as possible for my queries. Is there any way to do this (and make sure it's done) without parallelizing my application?
Any good documentation about MySQL's current parallelizing capabilities?

As far as I know, the only way at present to use more than one thread/core to run queries from your application is to use more than one connection. This of course makes it impossible to run parallel queries that are part of a single transaction.
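A minimal sketch of the multiple-connection approach in Python, assuming a hypothetical `connect` factory that returns a new DB-API 2.0 (PEP 249) connection each time it is called (for example, a thin wrapper around whichever MySQL driver you use). The name `run_parallel` and its parameters are illustrative only, not part of any library. Each query gets its own connection and its own thread, so the server is free to work on them concurrently; they cannot share a transaction.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(connect, queries, max_workers=4):
    """Run each SQL string on its own connection; return results in query order.

    `connect` is an assumed zero-argument factory returning a new
    DB-API 2.0 connection (hypothetical; supply your own).
    """
    def run_one(sql):
        conn = connect()              # one connection per query, never shared between threads
        try:
            cur = conn.cursor()
            cur.execute(sql)
            return cur.fetchall()
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `queries` in its results
        return list(pool.map(run_one, queries))
```

A typical use would pass one query per partition key and combine the per-partition results (for example, summing partial SUM() values) in the application.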

The different queries that are UNIONed together in one larger query aren't really subqueries, strictly speaking.
The queries are run in order
The column names of the result are determined by the first query (in current MySQL, the column types take all of the queries into account)
By default, identical rows are dropped (UNION defaults to DISTINCT)
The result set is not finished building until all queries are run
...there is no way to parallelize the different queries, as they are all really part of the same query.
You may want to try running the different queries in parallel from your code, and then merging the results in your code once all the queries complete.
The documentation on UNIONs can be found here.
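If the results are merged in application code instead of by a single UNION, the de-duplication has to be reproduced by hand. The sketch below illustrates this with Python's built-in sqlite3 module standing in for MySQL, purely so that it runs without a server; the tables `part_a` and `part_b` are made up for the example. Plain concatenation corresponds to UNION ALL, and dropping identical rows corresponds to the default UNION (DISTINCT).

```python
import sqlite3

# SQLite stands in for MySQL here so the example runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part_a (id INTEGER, val TEXT);
    CREATE TABLE part_b (id INTEGER, val TEXT);
    INSERT INTO part_a VALUES (1, 'x'), (2, 'y');
    INSERT INTO part_b VALUES (2, 'y'), (3, 'z');
""")

# What the server returns for a single UNION / UNION ALL query.
union_rows = conn.execute(
    "SELECT id, val FROM part_a UNION SELECT id, val FROM part_b ORDER BY id"
).fetchall()
union_all_rows = conn.execute(
    "SELECT id, val FROM part_a UNION ALL SELECT id, val FROM part_b ORDER BY id"
).fetchall()

# The same results assembled client-side from two separate queries.
rows_a = conn.execute("SELECT id, val FROM part_a").fetchall()
rows_b = conn.execute("SELECT id, val FROM part_b").fetchall()
merged_all = sorted(rows_a + rows_b)             # emulates UNION ALL
merged_distinct = sorted(set(rows_a + rows_b))   # emulates UNION (DISTINCT)
```

Here the row (2, 'y') appears in both tables, so it shows up twice in `merged_all` and once in `merged_distinct`.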

I think a similar question was answered here:
http://forums.mysql.com/read.php?115,84453,84453
(Maybe I should have posted this as a comment, but I honestly couldn't find a comment button anywhere around here.)

Related

get MySQL query list of joins that do not use indexes

I see this error in phpMyAdmin:
The number of joins that do not use indexes
but I never use JOIN in my code.
Now I want to get a list of the queries that do not use indexes.
How can I get this list?
I tried enabling the slow query log, but I cannot tell which queries do not use indexes.
Can someone guide me?

There is no list of "joins not using indexes".
Certain admin queries use VIEWs that have JOINs; possibly that is where they came from.
There are valid cases in which not using an index is OK for a JOIN. A simple example is joining two tables where one of the tables has so few rows that an index would not help with performance.
The slow query log provides something more important: a list of the "worst" queries (from a performance point of view). Any "slow" query with a JOIN that needs an index will show up there, even without the log_queries_not_using_indexes option turned on. (I prefer to turn that option off, since it clutters the slow log with unexciting queries.)

I'll briefly mention that this isn't an error; it's not even really a warning. The Advisor tab makes broad, generic performance suggestions to guide you towards optimizing your system. It is pretty normal for some of these suggestions to be unfixable or not to apply to your situation. In fact, my test system gives the same advice about joins without an index.
As Rick James alluded to, these queries might not come directly from code that you write; some administrative tasks may be triggering them (your operating system might run some housekeeping queries, MySQL runs some queries against itself, etc.).
The best way to look at the queries is to log them, which is answered in "Log all queries in mysql". You could use the "Process" sub-tab of the "Status" area in phpMyAdmin (very similar to how you get to the Advisor tab) to look at active queries, but that's not going to give you a complete picture over time. Once you have that list of queries, you can analyze them to determine whether there are improvements you can make to the code.

Are there instances where subquery or multiple query solutions are the only solution—where joins won't work?

I've just been getting used to MySQL, and I'm trying really hard to make sure that my queries are quick, and able to be used on another database system if I choose to move it. One of the things I've been trying to avoid is querying twice, or using subqueries. I like the idea of views, and so I try to use them when I can. Recently I've run into an instance where I needed help and the response I got used a subquery... I'm almost certain that the GROUP BY aggregate needed to be used, and I was wondering if I can sneak my way out of it, but the bigger question is, are there instances where subqueries or multiple queries are absolutely necessary?
Thanks

CakePHP: Is it possible to force find() to run a single MySQL query

I'm using CakePHP 2.x. When I inspect the SQL dump, I notice that its "automagic" is causing one of my find()s to run several separate SELECT queries (and then presumably merging them all together into a single pretty array of data).
This is normally fine, but I need to run one very large query on a table of 10K rows with several joins, and this is proving too much for the magic to handle: when I try to construct it through find('all', $conditions), the query times out after 300 seconds. But when I write an equivalent query manually with JOINs, it runs very fast.
My theory is that whatever PHP "magic" is required to weave the separate queries together is causing a bottleneck for this one large query.
Is my theory a plausible explanation for what's going on?
Is there a way to tell Cake to just keep it simple and make one big fat SELECT instead of its fancy automagic?
Update: I forgot to mention that I already know about $this->Model->query(). Using this is how I figured out that the slow-down was coming from PHP magic. It works when we do it this way, but it feels a little clunky to maintain the same query in two different forms. That's why I was hoping CakePHP offered an alternative to the way it builds up big queries from multiple smaller ones.

In cases like this, where you query tables with 10k records, you shouldn't be doing a find('all') without limiting the associations. These are some of the strategies you can apply:
Set recursive to -1 if you don't need related models (in CakePHP 2.x, 0 still fetches belongsTo and hasOne associations)
Use Containable Behavior to bring only the associated models you need.
Apply limits to your query
Caching is a good friend
Create and destroy associations on the fly as you need them.
Since you didn't specify the exact problem, I have just given you general ideas to apply depending on the problem you have.

Use subselects or multiple queries?

I'm wondering what's best. At this moment I have 3 'activation' codes for certain functionality within our back-end (shop) software. These three codes are currently checked for validity with 3 queries. This can also be done with 1 query using subselects. More and more codes may be added in the future, so what is considered best practice in this situation? The perspective I'm interested in is reducing load on the DB server and getting the best performance in this scenario. (Indexes are set properly, of course.)

I think almost the only scenario where breaking the query into several makes sense is when the results of some of them are cached. That way their overall performance might be better.
Another scenario might be when you want to move business logic out of the DB to the application, even though the performance might degrade.
Otherwise, I would use a single query.

One well-written query is practically always better than several queries.
In most cases the best way is to rewrite your queries so that you can retrieve the required info with one query.
BTW, subqueries are often treated like joins by the internal optimizer, so it's useful to learn how to write SQL queries with joins and the like.
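As a rough illustration of the single-query approach, the sketch below checks any number of codes in one round trip with an IN list. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the table `activation_codes`, its columns, and the function `valid_codes` are assumptions made up for the example, not the asker's schema.

```python
import sqlite3


def valid_codes(conn, codes):
    """Return the subset of `codes` marked valid, using a single query."""
    if not codes:
        return set()
    # One placeholder per code. sqlite3 uses "?"; many MySQL drivers use "%s".
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT code FROM activation_codes "
        f"WHERE valid = 1 AND code IN ({placeholders})"
    )
    return {row[0] for row in conn.execute(sql, list(codes))}


# Hypothetical data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activation_codes (code TEXT PRIMARY KEY, valid INTEGER);
    INSERT INTO activation_codes VALUES ('AAA', 1), ('BBB', 0), ('CCC', 1);
""")
found = valid_codes(conn, ["AAA", "BBB", "ZZZ"])
```

Adding a fourth or fifth code later only lengthens the IN list; the number of queries stays at one.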

How to iteratively optimize a MySQL query?

I'm trying to iteratively optimize a slow MySQL query, which means I run the query, get timing, tweak it, re-run it, get timing, etc. The problem is that the timing is non-stationary, and later executions of the query perform very differently from earlier executions.
I know to clear the query cache, or turn it off, between executions. I also know that, at some level, the OS will affect query performance in ways MySQL can't control or understand. But in general, what's the best I can do with this kind of iterative query optimization, so that I can compare apples to apples?

Your best tool for query optimization is EXPLAIN. It will take a while to learn what the output means, but once you have, you will understand how MySQL's (horrible, broken, backwards) query planner decides how best to retrieve the requested data.
Changes in the parameters to the query can result in wildly different query plans, so this may account for some of the problems you are seeing.
You might want to consider using the slow query log to capture all queries that might be running with low performance. Perhaps you'll find that the query in question only falls into the low performance category when it uses certain parameters?

Create a script that runs the query 1000 times, or whatever number of iterations causes the results to stabilize.
Then follow your process as described above, but make sure you rely on an average of multiple executions rather than a single execution, because you're right: the results will not be stable as row counts change and your machine does other things.
Also, try to use a wide array of inputs to the query, if that makes sense for your use case.
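The repeated-run idea can be sketched as a small Python timing harness. Here `run` is an assumed zero-argument callable that executes the query and fetches all of its rows (you supply it with whatever driver you use); `time_query`, `iterations`, and `warmup` are illustrative names, and the defaults are arbitrary.

```python
import statistics
import time


def time_query(run, iterations=100, warmup=5):
    """Call `run()` repeatedly and return timing statistics in seconds."""
    for _ in range(warmup):           # untimed runs, to let caches settle
        run()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }
```

Comparing the median (and the spread) between two versions of a query is usually more informative than comparing single executions, and the same harness can be called once per input set to cover a range of parameters.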