group_concat performance issue in MySQL - mysql

I added a group_concat to a query and killed the performance. The explain plans are identical before and after I added it, so I'm confused as to how to optimize this.
Here is a simplified version of the query:
SELECT @curRow := @curRow + 1 AS row_number,
docID,
docTypeID,
CASE WHEN COUNT(1) > 1
THEN group_concat( makeID )
-- THEN 'multiple makes found'
ELSE MIN(makeID)
END AS makeID,
MIN(desc) AS desc
FROM simplified_mysql_table,
(SELECT @curRow := 0) r
GROUP BY docID, docTypeID,
CASE WHEN docTypeID = 1
THEN 0
ELSE row_number
END;
Note the CASE statement in the SELECT. The group_concat kills performance. If I comment that line and just output 'multiple makes found' it executes very quickly. Any idea what is causing this?

In the original non-simplified version of this query we had a DISTINCT, which was completely unnecessary and causing the performance issue with group_concat. I'm not sure why it caused such a problem, but removing it fixed the performance issue.

In MySQL, group_concat should not kill query performance. It is additional work involving strings, so some slowdown is expected, but more like 10% than 10x. Can you quantify the difference in the query times?
Question: is MakeID a character string or integer? I wonder if a conversion from integer to string might affect the performance.
Second, what would the performance be for concat(min(makeID), '-', max(makeID)) instead of the group_concat?
Third, does the real group_concat use DISTINCT or ORDER BY? These could slow things down, especially in a memory limited environment.
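For reference, the variants being asked about look like this. This is only a sketch against the question's simplified table; the alias makeIDs is made up for illustration. Comparing the timings of the two forms is one way to see whether DISTINCT or ORDER BY inside the aggregate is the expensive part.

```sql
-- Plain form: concatenates every makeID in the group, in no guaranteed order.
SELECT docID, docTypeID, GROUP_CONCAT(makeID) AS makeIDs
FROM simplified_mysql_table
GROUP BY docID, docTypeID;

-- With DISTINCT and ORDER BY inside the aggregate: each group has to be
-- de-duplicated and sorted before it is concatenated.
SELECT docID, docTypeID,
       GROUP_CONCAT(DISTINCT makeID ORDER BY makeID SEPARATOR ',') AS makeIDs
FROM simplified_mysql_table
GROUP BY docID, docTypeID;
```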

An idea is to split the query into two parts, one for WHEN docTypeID = 1 and one for the rest.
Try adding an index on (docTypeID, docID, makeID, desc) first:
SELECT
docID,
docTypeID,
CAST(makeID AS CHAR) AS makeID,
`desc`
FROM
simplified_mysql_table
WHERE
NOT docTypeID = 1
UNION ALL
SELECT
docID,
1,
GROUP_CONCAT( makeID ),
`desc` -- this is not standard SQL
FROM
simplified_mysql_table
WHERE
docTypeID = 1
GROUP BY
docTypeID,
docID ;

Related

How can I speed up this query with an aliased column?

So I found this code snippet here on SO. It essentially fakes a "row_number()" function for MySQL. It executes quite fast, which I like and need, but I am unable to tack on a where clause at the end.
select
@i:=@i+1 as iterator, t.*
from
big_table as t, (select @i:=0) as foo
Adding in where iterator = 875 yields an error.
The snippet above executes in about .0004 seconds. I know I can wrap it within another query as a subquery, but then it becomes painfully slow.
select * from (
select
@i:=@i+1 as iterator, t.*
from
big_table as t, (select @i:=0) as foo) t
where iterator = 875
The snippet above takes over 10 seconds to execute.
Any way to speed this up?
In this case you could use the LIMIT as a WHERE:
select
@i:=@i+1 as iterator, t.*
from
big_table as t, (select @i:=874) as foo
LIMIT 875,1
Since you only want record 875, this would be fast.
Could you please try this?
Increasing the value of the variable in the where clause and checking it against 875 would do the trick.
SELECT
t.*
FROM
big_table AS t,
(SELECT @i := 0) AS foo
WHERE
(@i := @i + 1) = 875
LIMIT 1
Caution:
Unless you specify an order by clause it's not guaranteed that you will get the same row every time having the desired row number. MySQL doesn't ensure this since data in table is an unordered set.
So, if you specify an order on some field then you don't need user defined variable to get that particular record.
SELECT
big_table.*
FROM big_table
ORDER BY big_table.<some_field>
LIMIT 875,1;
You can significantly improve performance if the some_field is indexed.

Getting the top N results of every category in a table

I'd like to extract the top 10 results of a certain category within a table, ordered by date. My table looks like
CREATE TABLE IF NOT EXISTS Table
( name VARCHAR(50)
, category VARCHAR(50)
, date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
So far I've come up with SELECT category FROM Table GROUP BY category;, this will give me every category in store.
Next I need to run SELECT * FROM Table WHERE category=$categ ORDER BY date DESC LIMIT 10; in some kind of foreach loop for every $categ fed to me by the first instruction.
I'd like to do all of this in MySQL, if possible; I've come across several answers online but they all seem to involve two or more tables, or provide difficult examples that seem hard to understand... It would seem silly to me that something that can be dealt with so simply in server code (doesn't even create that much overhead, apart from the needless storage of the category names) is so difficult to translate into SQL code, but if nothing works that's what I'll end up doing, I guess.
You can use an inline view and user-defined variables to set a "row number" column, and then the outer query can filter based on the "row number" column. (Doing this, we can emulate a ROW_NUMBER analytic function.)
For large sets, this may not be the most efficient approach, but it works reasonably for small sets.
The outer query would look something like this:
SELECT q.*
FROM (
<view_query>
) q
WHERE q.row_num <= 10
ORDER
BY q.category, q.date DESC, q.name
The view query would be something like this
SELECT IF(@cat = t.category, @i := @i + 1, @i := 1) AS row_num
, @cat := t.category AS category
, t.date
, t.name
FROM mytable t
CROSS
JOIN ( SELECT @i := 0, @cat := NULL ) i
ORDER BY t.category, t.date DESC
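On MySQL 8.0 and later, the user-variable emulation is no longer needed, because ROW_NUMBER() is available as a window function. The following is a minimal sketch of the same top-10-per-category filter, assuming MySQL 8.0+ and the table name mytable used in this answer:

```sql
-- Assumes MySQL 8.0+ (window functions) and a table mytable(name, category, date).
SELECT q.name, q.category, q.date
FROM (
    SELECT t.name,
           t.category,
           t.date,
           -- Number the rows within each category, newest first;
           -- name is only a tie-breaker for equal dates.
           ROW_NUMBER() OVER (PARTITION BY t.category
                              ORDER BY t.date DESC, t.name) AS row_num
    FROM mytable t
) q
WHERE q.row_num <= 10
ORDER BY q.category, q.date DESC, q.name;
```

Unlike the variable-based version, this does not depend on the order in which MySQL happens to evaluate the select list.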

MySQL query / clause execution order

What is the predefined order in which the clauses are executed in MySQL? Is some of it decided at run time, and is this order correct?
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
The actual execution of MySQL statements is a bit tricky. However, the standard does specify the order of interpretation of elements in the query. This is basically in the order that you specify, although I think HAVING and GROUP BY could come after SELECT:
FROM clause
WHERE clause
SELECT clause
GROUP BY clause
HAVING clause
ORDER BY clause
This is important for understanding how queries are parsed. You cannot use a column alias defined in a SELECT in the WHERE clause, for instance, because the WHERE is parsed before the SELECT. On the other hand, such an alias can be in the ORDER BY clause.
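To make the alias rule concrete, here is a small sketch. The table orders(id, price, qty) and the alias total are hypothetical, introduced only for illustration.

```sql
-- Hypothetical table: orders(id, price, qty).

-- Rejected with "Unknown column 'total' in 'where clause'":
-- the alias does not exist yet when WHERE is resolved.
-- SELECT id, price * qty AS total FROM orders WHERE total > 100;

-- Accepted: repeat the expression in WHERE; the alias is visible in ORDER BY.
SELECT id, price * qty AS total
FROM orders
WHERE price * qty > 100
ORDER BY total DESC;
```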
As for actual execution, that is really left up to the optimizer. For instance:
. . .
GROUP BY a, b, c
ORDER BY NULL
and
. . .
GROUP BY a, b, c
ORDER BY a, b, c
both have the effect of the ORDER BY not being executed at all -- and so not executed after the GROUP BY (in the first case, the effect is to remove sorting from the GROUP BY and in the second the effect is to do nothing more than the GROUP BY already does).
This is how you can get a rough idea of how MySQL executes a SELECT query:
DROP TABLE if exists new_table;
CREATE TABLE `new_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`testdecimal` decimal(6,2) DEFAULT NULL,
PRIMARY KEY (`id`));
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
set @mysqlorder := '';
select @mysqlorder := CONCAT(@mysqlorder," SELECT ") from new_table,(select @mysqlorder := CONCAT(@mysqlorder," FROM ")) tt
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN1 ")) t on ((select @mysqlorder := CONCAT(@mysqlorder," ON1 ")) or rand() < 1)
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN2 ")) t2 on ((select @mysqlorder := CONCAT(@mysqlorder," ON2 ")) or rand() < 1)
where ((select @mysqlorder := CONCAT(@mysqlorder," WHERE ")) or IF(new_table.testdecimal = 1234.45,true,false))
group by (select @mysqlorder := CONCAT(@mysqlorder," GROUPBY ")),id
having (select @mysqlorder := CONCAT(@mysqlorder," HAVING "))
order by (select @mysqlorder := CONCAT(@mysqlorder," ORDERBY "));
select @mysqlorder;
Here is the output of the query above, which should give a rough picture of how MySQL executes a SELECT:
FROM JOIN1 JOIN2 WHERE ON2 ON1 ORDERBY GROUPBY SELECT WHERE
ON2 ON1 ORDERBY GROUPBY SELECT HAVING HAVING
It appears that the generalized pattern in Standard SQL for Logical Query Processing Phase is (at least from SQL-92 - starting on p.177) :
from clause
joined table
where clause
group by clause
having clause
query specification (ie. SELECT)
You can find and download newer Standard SQL Standardization documents from here:
https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F
For MSSQL (since it tends to stay fairly close to the standard in my experience) the Logical Query Processing Phase is generally:
FROM <left_table>
ON <join_condition>
<join_type> JOIN <right_table>
WHERE <where_condition>
GROUP BY <group_by_list>
WITH {CUBE | ROLLUP}
HAVING <having_condition>
SELECT
DISTINCT
ORDER BY <order_by_list>
<TOP_specification> <select_list>
From: Chapter 1 of Ben-Gan, Itzik, et al., Inside Microsoft SQL Server 2005: T-SQL Querying, (Microsoft Press)
Also: Ben-Gan, Itzik, et al., Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA), (Microsoft Press)
It should be noted that MySQL can be configured to operate closer to standard as well if desired by setting the SQL Mode (although probably only recommended for fringe cases):
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html
For MySQL, I searched both MySQL and MariaDB documentation and could find nothing other than the few statements that Gordon Linoff mentioned in passing that were in the MySQL documentation for SELECT. They are:
If ORDER BY occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
If LIMIT occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
From MySQL JOIN Documentation: Natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard
A quick skim through SQL-92, from "from clause" to "query specification", showed that the logic can be conditional depending on how the query is written. I could not find anything in the MySQL or MariaDB documentation (not saying it is not there, I just could not find it), and other articles on MySQL's Logical Query Processing Phase conflicted in their order. So the best way MySQL seems to offer for determining some sort of Logical Query Processing Phase for a specific query (or at least the steps used for join optimization in the query plan) is to trace the query execution as follows (from the MySQL documentation, "Tracing The Optimizer/Typical Usage"):
# Turn tracing on (it's off by default):
SET optimizer_trace="enabled=on";
SELECT ...; # your query here
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# possibly more queries...
# When done with tracing, disable it:
SET optimizer_trace="enabled=off";
You can interpret the results by looking at MySQL documentation's Tracing Example.
Basically, it appears that you want to look for "join_optimization" (which says that the from/join statements are being evaluated for the specific query, the specific query being the one stated as "Select#"), then look for "condition_processing: condition", and then "clause_processing: clause". As it says in the MySQL documentation for General Trace Structure:
A trace follows closely the actual execution path: there is a join-preparation object, a join-optimization object, a join-execution object, for each JOIN... It is far from showing everything happening in the optimizer, but we plan to show more information in the future.
Interestingly enough, I found that running a query like the following in MySQL showed that its apparent processing order for query optimization was FROM, WHERE, HAVING, ORDER BY, and then GROUP BY:
SELECT
a.id
, max(a.timestamp)
FROM database.table AS a
LEFT JOIN database.table2 AS b on a.id = b.id
WHERE a.id > 1
GROUP BY a.id HAVING max(a.timestamp) > 0
ORDER BY a.id
I am assuming that since "condition_processing" and "clause_processing" are within the "Select#" group that these are processed before SELECT - which lines up with SQL-99, but it is an assumption.
In terms of operators and variables, Zawodny, Jeremy D., et al., High Performance MySQL, 2nd Edition, (O'Reilly) states that:
The := assignment operator has lower precedence than any other operator, so you have to be careful to parenthesize explicitly.
I only mention this because, when working with one or more variables, the problem may lie less in the order of the Logical Query Processing Phase than in the precedence of the assignment operator, which is worth checking when a query does not execute as expected.
I think the execution order is like this:
(7) SELECT
(8) DISTINCT <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) HAVING <having_condition>
(9) ORDER BY <order_by_condition>
(10) LIMIT <limit_number>[, <offset_number>]

Efficiently getting the previous record (by index), in MySQL

I have a composite index (on 'name' VARCHAR(50) and 'ts' TIMESTAMP columns).
For a particular record, I want to find the preceding record, by time, with the same 'name' (if it exists).
(When using low-level non-relational DBs, such as D-ISAM, this was quick and trivial.)
How can I do this in MySQL?
Here is an approach. Find all rows that have the same name as the record you are looking for. Then order by ts in a reverse order and choose the first one:
select t.*
from table t join
(select * from table t where <record = your record>
) therec
on t.name = therec.name and t.ts < therec.ts
order by t.ts desc
limit 1;
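If the name and ts of the current record are already known to the caller, the self-join can be dropped. The following is a sketch under that assumption; the table name my_table and the literal values are placeholders, not taken from the question. With a composite index on (name, ts), MySQL should be able to answer it by reading a single index entry just before the given timestamp.

```sql
-- Placeholders: my_table, 'my name' and the timestamp literal are illustrative.
SELECT t.*
FROM my_table t
WHERE t.name = 'my name'                -- equality on the first index column
  AND t.ts < '2024-01-01 12:00:00'      -- range on the second index column
ORDER BY t.ts DESC                      -- walk the index backwards
LIMIT 1;                                -- keep only the closest earlier row
```

Running EXPLAIN on the query is a quick way to confirm that the composite index is actually used.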
All the help I could find was along the lines of @GordonLinoff's answer.
In the end, I solved the problem by adding a 'seq'uence number to each record, showing how far along the index it was.
This is only a solution where the data is append-only, but this is my use case.
I added the values with:
SET @num := 0;
UPDATE my_table
SET seq = @num := @num + 1
WHERE name = "my name"
ORDER BY ts, id
This worked but is ugly. It runs in roughly O(n), whereas the subquery approaches are O(n^2), which is impractical for anything more than a few thousand rows.

Optimizing a mySQL query with functions

I'm running this query (via PHP) and wondering if there is a faster way to get the same result:
SELECT
date(date_time) as `date`,
unix_timestamp(date(date_time)) as `timestamp`,
month(date(date_time)) as `month`,
dayname(date(date_time)) as `dayname`,
dayofmonth(date(date_time)) as `daynum`,
hour(date_time) as `hour`, minute(date_time) as `increment`
FROM loc_data
WHERE loc_id = 2
As you can see I'm performing the date(date_time) function 5 times but would like to store the result of the first and use that result from then on. Would this increase performance of the query? The query is called many thousands of times in a script. (When I perform the functions in PHP instead of mySQL I get no big difference in speed from the current query above.)
Have you tried with SQL variables?
select @a from (select @a := 1) a
That works, with your query
SELECT
@n as `date`,
unix_timestamp(@n) as `timestamp`
FROM loc_data l,
(select @n := date(l.date_time)) a
WHERE l.loc_id = 2
But I'm not sure if this will work for you.
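A simpler option that avoids variables altogether: MONTH(), DAYNAME() and DAYOFMONTH() accept a DATETIME directly and only look at its date part, so most of the nested date() calls can be dropped. The sketch below should return the same columns as the original query. Whether it is measurably faster is an assumption to verify by timing it; an index on loc_id is likely to matter more than the function calls.

```sql
-- Same result columns as the original query, with the redundant DATE() wrappers removed.
SELECT
    DATE(date_time)                 AS `date`,
    UNIX_TIMESTAMP(DATE(date_time)) AS `timestamp`,  -- midnight of that day
    MONTH(date_time)                AS `month`,
    DAYNAME(date_time)              AS `dayname`,
    DAYOFMONTH(date_time)           AS `daynum`,
    HOUR(date_time)                 AS `hour`,
    MINUTE(date_time)               AS `increment`
FROM loc_data
WHERE loc_id = 2;
```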