I was wondering if there is a way to store where clause conditions and not calculate them more than once in order to determine which one was satisfied.
Here is what I am talking about:
select col, col1>5 cond1, col2<400 cond2 from table where col1>5 or col2<400
I don't think you should worry about calculating simple comparisons like col1 > 5 more than once per row, but to save typing in a more complex query, you can do this:
SELECT col, col1 > 5 AS cond1, col2 < 400 AS cond2
FROM table
HAVING cond1 or cond2
Using having instead of where gives access to the aliases introduced in the select clause.
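For comparison, here is a small runnable sketch of a more portable variant of the same idea: compute the conditions once in a derived table and filter on the aliases in the outer WHERE. It uses Python's sqlite3 module as a stand-in for MySQL, and the table name t and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 10, 500), ("b", 1, 100), ("c", 1, 900), ("d", 7, 50)],
)

# Each comparison is written once, in the inner SELECT;
# the outer WHERE filters on the aliases cond1 and cond2.
rows = conn.execute(
    """
    SELECT col, cond1, cond2
    FROM (SELECT col, col1 > 5 AS cond1, col2 < 400 AS cond2 FROM t) AS x
    WHERE cond1 OR cond2
    ORDER BY col
    """
).fetchall()
print(rows)
```

Row c satisfies neither condition, so it is filtered out; the other rows come back with 1/0 flags showing which condition matched.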
In our project we use a MySQL table for analytics. It is a big table: 40+ columns and more than 10 million (10kk) rows.
A big part of the query time goes to calculating the result (50+ columns in the result). The idea is to reuse calculated values as variables to make it faster.
Query example:
SELECT col1, SUM(col2) as s_col2, SUM(col3) as s_col3, AVG(col2) as a_col2, ...,
SUM(col2)/SUM(col3) as aaa,
ROUND(AVG(col4), 2) as a_col4,
ROUND(SUM(col5), 2) as s_col5,
ROUND(SUM(col5)/AVG(col4), 2) as zzz,
...
JOIN ...
GROUP BY ...
ORDER BY ...
The idea is to use an @variable, for example:
SELECT col1, @s_col2 := SUM(col2) as s_col2, @s_col3 := SUM(col3) as s_col3, ...,
@s_col2/@s_col3 as aaa,
It works only for the few variables that are assigned outside a function, but I don't want an additional column for every variable.
@a_col4 := AVG(col4), # I don't need this column
@s_col5 := SUM(col5), # I don't need this column
ROUND(@a_col4, 2) as a_col4,
ROUND(@s_col5, 2) as s_col5,
ROUND(@s_col5/@a_col4, 2) as zzz,
How can I assign variables inside functions?
ROUND(@a_col4 := AVG(col4), 2) as a_col4, # does not work
ROUND((@s_col5 := SUM(col5)), 2) as s_col5, # does not work
ROUND(@s_col5/@a_col4, 2) as zzz,
UPDATED:
Thanks guys for your help.
The MySQL engine is probably smart enough to compute the value of
SUM(col5) only once
I am not sure because for a big quantity of columns
SUM(col1) as a1,
SUM(col1) as a2,
SUM(col1) as a3,
SUM(col1) as a4,
SUM(col1) as a5,
Is slower than
@a1 := SUM(col1) as a1,
@a1 as a2,
@a1 as a3,
@a1 as a4,
@a1 as a5,
you can also use CTE for reusing table results
I tried, but some values use too many functions one inside another, sometimes 7 levels deep and not all variables can be reused in this way (COALESCE(ROUND(COALESCE(ROUND(SUM(AVG(IF..AND...OR...AND)...
All my changes, (15 variables) have very small effect, for the small period it takes 139 sec (was 151 sec), but some of our reports take a few hours and we need stronger optimisation.
We will try to analyse server bottlenecks, maybe use partitioning, sharding...
As a general rule, the number of rows touched is much more important
to how long a query will take than the functions being evaluated.
The number of rows is always big, but there are a lot of indexes and it works really fast. If I comment out the columns that need calculations and select only the existing columns, it takes 40 sec (instead of 150).
The way to do it in SQL is to use the SELECT with the sums and averages as the basis for an outer SELECT:
SELECT *,
s_col2/s_col3 as aaa,
ROUND(a_col4, 2) as a_col4,
ROUND(s_col5, 2) as s_col5,
ROUND(s_col5/a_col4, 2) as zzz,
...
FROM
(SELECT col1, SUM(col2) as s_col2, SUM(col3) as s_col3, AVG(col2) as a_col2, ...,
...
JOIN ...
GROUP BY ...) t1
ORDER BY ...
In MySQL 8 you can also use a CTE for reusing table results; see the manual.
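A minimal runnable sketch of the inner/outer pattern, using Python's sqlite3 module as a stand-in for MySQL. The table name stats and the sample values are hypothetical; only two of the aggregates from the question are shown.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (col1 TEXT, col2 REAL, col3 REAL)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [("x", 10, 3), ("x", 30, 4), ("y", 5, 5)],
)

# The inner query aggregates once per group; the outer query only
# combines the already-computed aliases s_col2 and s_col3.
rows = conn.execute(
    """
    SELECT col1, s_col2, s_col3,
           ROUND(s_col2 / s_col3, 2) AS aaa
    FROM (SELECT col1, SUM(col2) AS s_col2, SUM(col3) AS s_col3
          FROM stats
          GROUP BY col1) AS t1
    ORDER BY col1
    """
).fetchall()
for row in rows:
    print(row)
```

Whether this is actually faster than repeating the aggregates depends on the server; the main gain shown here is that each aggregate expression is written only once.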
Don't worry.
As a general rule, the number of rows touched is much more important to how long a query will take than the functions being evaluated.
A similar question comes from the choice between these:
SELECT foo, COUNT(*) FROM x GROUP BY foo ORDER BY COUNT(*) DESC LIMIT 5;
SELECT foo, COUNT(*) FROM x GROUP BY foo ORDER BY 2 DESC LIMIT 5;
I often do the latter because it is fewer keystrokes. I have not been able to determine whether it is faster.
I suggest you write your query in the simplest or clearest way. A subquery (or CTE) may actually be clearer in spite of taking more keystrokes.
Clarity and correctness are more important than speed.
And beware -- With both JOIN and GROUP BY in the query, you may have incorrect results. The JOIN is done before the aggregation; the GROUP BY comes after. Check to see if COUNT or SUM is bigger than it should be. If so, you will need a subquery or CTE.
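Here is a small sketch of that inflation and one way around it, run against SQLite through Python's sqlite3 module as a stand-in for MySQL. The orders and shipments tables are invented for the demo: order 1 has two shipments, so joining first counts its amount twice.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
    CREATE TABLE shipments (order_id INTEGER);
    INSERT INTO orders VALUES (1, 'c', 100), (2, 'c', 50);
    INSERT INTO shipments VALUES (1), (1), (2);
    """
)

# JOIN first, aggregate second: order 1 appears twice, so its amount is summed twice.
naive = conn.execute(
    """
    SELECT o.customer, SUM(o.amount), COUNT(*)
    FROM orders o
    JOIN shipments s ON s.order_id = o.id
    GROUP BY o.customer
    """
).fetchall()

# Aggregate shipments in a derived table first, so each order joins to one row.
fixed = conn.execute(
    """
    SELECT o.customer, SUM(o.amount), SUM(s.n)
    FROM orders o
    JOIN (SELECT order_id, COUNT(*) AS n
          FROM shipments
          GROUP BY order_id) s ON s.order_id = o.id
    GROUP BY o.customer
    """
).fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

The naive query reports a total of 250 for a customer whose orders add up to 150; the derived-table version reports 150 with the same shipment count.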
I have a code like this:
SELECT column1 = (SELECT MAX(column-name21) FROM table-name2 WHERE condition2 GROUP BY id2) as m,
column2 = (SELECT count(*) FROM table-name2 WHERE condition2 GROUP BY id2) as c,
column-names
FROM table-name
WHERE condition
ORDER BY ordercondition
LIMIT 25,50
those internal selects are quite long and complicated.
My question is: are there language constructs in MySQL that allow one to avoid duplicating code and computations in this case?
For example, something like this
SELECT (column1, column2) = (SELECT MAX(column-name1) as m, count(*) as c FROM table-name WHERE condition GROUP BY id),
column-names
FROM table-name
WHERE condition
ORDER BY ordercondition
LIMIT 25,50
which of course won't be interpreted by mysql.
I tried this:
SELECT (SELECT MAX(column-name1) as column1, count(*) as column2 FROM table-name WHERE condition GROUP BY id),
column-names
FROM table-name
WHERE condition
ORDER BY ordercondition
LIMIT 25,50
and it also doesn't work.
Such subqueries get cumbersome when you need more than one from the same source. Usually, the "fix" is to use a "derived table" and JOIN:
SELECT x2.col1, x2.col2, names
FROM ( SELECT MAX(c21) AS col1,
COUNT(*) AS col2,
?? -- may be needed for "cond2"
FROM t2
WHERE cond2a ) AS x2
JOIN t1
ON cond2b
WHERE cond1
ORDER BY ??? -- Limit is non-deterministic without ORDER BY
LIMIT 25, 50
If the "condition" in the subquery is "correlated", please specify it; it makes a big difference in how to transform the query.
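As a concrete sketch of the derived-table approach, here is a runnable version against SQLite via Python's sqlite3 module (a stand-in for MySQL). It assumes the subquery is correlated on an id, i.e. t2.id2 matches t1.id; that correlation and all table contents are assumptions, since the original conditions are not shown.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (id2 INTEGER, c21 INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 5), (1, 9), (2, 4);
    """
)

# Both aggregates come from one pass over t2; the join key id2 is
# carried along so the derived table can be matched to t1.
rows = conn.execute(
    """
    SELECT t1.name, x2.col1, x2.col2
    FROM (SELECT id2, MAX(c21) AS col1, COUNT(*) AS col2
          FROM t2
          GROUP BY id2) AS x2
    JOIN t1 ON t1.id = x2.id2
    ORDER BY t1.id
    LIMIT 25
    """
).fetchall()
print(rows)
```

With an inner JOIN, t1 rows that have no match in t2 (id 3 here) drop out; a LEFT JOIN from t1 would keep them with NULL aggregates.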
The construct COUNT(col) is usually a mistake:
COUNT(*) -- the number of rows.
COUNT(DISTINCT col) -- the number of different values in column `col`.
COUNT(col) -- count the number of rows with non-NULL `col`.
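The three forms can be compared side by side in a quick sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL (these three COUNT forms behave the same way in both), with a made-up one-column table containing a duplicate and a NULL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (None,), (2,)])

# Four rows, three non-NULL values, two distinct non-NULL values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(col), COUNT(DISTINCT col) FROM t"
).fetchone()
print(counts)
```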
Please provide your actual query and provide SHOW CREATE TABLE. I sloughed over several issues; "the devil is in the details".
for Edit 1
INDEX(tool, uuuuId) -- would help performance
Is uuuuId some form of "hash" or "UUID"? If so, that is relevant to seeing how the performance works. Also, how big (approximately) are the tables? What is the value of innodb_buffer_pool_size. (I am fishing for whether you are I/O-bound or CPU-bound.)
WZ needs INDEX(uuuuId, ppppppId, check1). But actually, that Select...=Yes can be turned into an EXISTS for some speedup.
Z might benefit from INDEX(check1, uuuuId, ppppppId, check2)
Since Z and WZ are the same table, this might take care of both:
INDEX(ppppppId, uuuuId, check1, check2)
(The order is important.)
I'm updating an old website and one of the queries isn't working anymore:
SELECT * FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
I noticed if I dropped the GROUP BY it works, but the result set doesn't match the original:
SELECT * FROM tbl WHERE col1 IS NULL ORDER BY col2
So I tried reading up on GROUP BY in the docs to see what might be the issue, and it seemed to suggest not using * to select all the fields, but explicitly using the column name so I tried it with just the column that was being ordered and grouped:
SELECT col2 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
That works, but after looking through the code I found the query needs 2 columns, so whoever added * was overdoing it. But if I add that second column, the query produces an error; similarly, adding a third column produces the same error:
SELECT col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
SELECT col1, col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
Can anyone tell me why this last query doesn't work? I can't decipher why from the docs, but this is the minimum query required to get the result set I need.
Running the query in Adminer I get this error
Error in query (1055): Expression #2 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'name.table.column'
which is not functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
You need to be careful when you use GROUP BY. Once you understand what GROUP BY does, you will know the issue yourself. It does an aggregation on your data or in other words, it reduces your data by doing some operation on the raw entries and creating new reduced number of entries on which some aggregation function has been applied(SUM, COUNT, AVG, etc.)
The fields you provide in the GROUP BY clause represents the level of aggregation/roll-up you are going for.
SELECT col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col1 ORDER BY col1
Here you are trying to do the aggregation at col1 level, meaning that for every distinct value present in column col1, there will be some operation done on some other columns you provide in SELECT clause(here col2,col3) so that in the output you have non-repeating values in col1 and some rolled-up values of col2 and col3 against each distinct col1 value based on what function you apply(SUM, COUNT, AVG, etc.).
How do you apply this function? That is what is missing in your above query. To solve it, you need to apply some aggregation function on the fields that are present in the SELECT clause but not in GROUP BY clause. Taking an example of SUM, try this:
SELECT SUM(col2), SUM(col3) FROM tbl WHERE col1 IS NULL GROUP BY col1 ORDER BY col1
OR for a better idea, removing WHERE filter and checking the output by running:
SELECT col1, SUM(col2), SUM(col3) FROM tbl GROUP BY col1 ORDER BY col1
Additionally, the reason why your other query
SELECT col2 FROM tbl WHERE col1 IS NULL GROUP BY col2 ORDER BY col2
worked is because you need not apply aggregation to the field(here col2) which is present in the GROUP BY clause.
First of all, when query() returns false, you should find out what the error was. You seem to be using PDO, so I will direct you to this page: http://php.net/manual/en/pdo.error-handling.php
TL;DR - you should enable PDO exceptions, or else you need to write code to check the result of every call to query(), prepare(), and execute() to see if an error occurred. And if so, use errorInfo() to find out the actual error. Doing anything else is flying blind!
Error in query (1055): Expression #2 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'webvictoria.cats_oct.matchLink'
which is not functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
This is a common issue. See dozens of questions tagged mysql-error-1055.
I guess you just upgraded to MySQL 5.7. MySQL 5.7 enables the ONLY_FULL_GROUP_BY SQL mode by default (often loosely called "strict mode" in this context). In MySQL 5.6 and earlier, it was optional and not enabled by default.
See: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
You can't write ambiguous queries. If you GROUP BY col2, which value in the group of rows of each group should be used for col1 and col3? It's ambiguous.
Without strict mode, MySQL chooses an arbitrary row from the group. With strict mode, it reverts to standard SQL behavior, and disallows the ambiguous query. This is how most other brands of SQL database behave, by the way.
To fix it, you must follow this rule: Every column in your select list must be one of:
A column in your GROUP BY clause
A column functionally dependent on the columns in your GROUP BY clause (so there can only be one value)
Used in an aggregate function like MIN(), MAX(), COUNT(), SUM(), AVG(), or GROUP_CONCAT()
Some people choose to disable strict mode in MySQL 5.7 for the sake of "getting the code working again." But it isn't working—it's just giving ambiguous results like it did before MySQL 5.7.
It's better to fix the logic of your queries.
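A sketch of a query that follows the rule, run on SQLite through Python's sqlite3 module. Note the assumption: SQLite does not enforce ONLY_FULL_GROUP_BY, so this only demonstrates the compliant form and its deterministic result, not the MySQL error. The table contents are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 INTEGER, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(None, "a", 3), (None, "a", 1), (None, "b", 7), (5, "c", 9)],
)

# col2 is in GROUP BY; col3 is wrapped in an aggregate, so every selected
# expression has exactly one value per group.
rows = conn.execute(
    """
    SELECT col2, MIN(col3) AS min_col3, COUNT(*) AS n
    FROM tbl
    WHERE col1 IS NULL
    GROUP BY col2
    ORDER BY col2
    """
).fetchall()
print(rows)
```

Which aggregate to use (MIN, MAX, GROUP_CONCAT, ...) depends on which value from each group the application actually needs.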
This query:
SELECT *
FROM tbl
WHERE col1 IS NULL
GROUP BY col1
ORDER BY col1;
never really worked. It may have seemed to work, but you were just lucky. You have unaggregated columns in the SELECT. These come from an arbitrary row.
You can do something like this to get values from other columns:
SELECT col1, min(col2), min(col3)
FROM tbl t
WHERE col1 IS NULL
GROUP BY col1
ORDER BY col1;
The reason it didn't work is because you need to use one of the selection criteria in the GROUP BY and the ORDER BY. So if you wanted to group by col1, you would need to do this:
SELECT col1, col2, col3
FROM tbl
WHERE
col1 IS NULL
GROUP BY col1
ORDER BY col1
;
Without selecting that field, you are basically saying "Hey go get me every phone number in California" Then after you get that you say "Now order them by first name and group them by last name" and DBMS says "but... I don't have any of that"
try this
SELECT col2, col3 FROM tbl WHERE col1 IS NULL GROUP BY col2, col3 ORDER BY col2, col3
SELECT * FROM table WHERE (a!=1 AND b!=2)
How can I select all rows where a is not 1 and b is not 2 at the same time?
That is - in PHP it'd be ($a!=1 && $b!=2), bringing rows for example like
a=1, b=6,
a=5, b=2,
...
(but not a=1, b=2)
What I am getting with MySQL equals to PHP's $a!=1 && $b!=2 (the same just without brackets), bringing rows like
a=7, b=9,
a=4, b=0,
...
(but never a=1, and never b=2)
Based on your example, you want or, not and:
SELECT t.*
FROM table t
WHERE a <> 1 OR b <> 2;
Although MySQL (and other databases) support != for inequality, the standard is <>. If you want to allow NULL values, you might want:
SELECT t.*
FROM table t
WHERE NOT (a <=> 1) OR NOT (b <=> 2);
<=> is a NULL-safe equality operator specific to MySQL; negating it gives a NULL-safe inequality test.
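To see how NULLs change the result, here is a small sketch on SQLite via Python's sqlite3 module. SQLite has no <=> operator, so its IS NOT operator is used as a stand-in for a negated MySQL <=>; that substitution and the sample rows are assumptions for the demo.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 2), (1, 6), (5, 2), (7, 9), (None, 2), (1, None)],
)

# Plain <>: a comparison with NULL yields NULL, so rows where the only
# "true" side would involve NULL are dropped.
plain = conn.execute(
    "SELECT a, b FROM t WHERE a <> 1 OR b <> 2 ORDER BY rowid"
).fetchall()

# NULL-safe inequality: NULL counts as "different from 1" (or 2).
null_safe = conn.execute(
    "SELECT a, b FROM t WHERE a IS NOT 1 OR b IS NOT 2 ORDER BY rowid"
).fetchall()

print("plain:    ", plain)
print("null-safe:", null_safe)
```

Both queries exclude only the (1, 2) row among the non-NULL data; the NULL-safe version additionally keeps (NULL, 2) and (1, NULL).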
I need to calculate the sum of one column(col2) , but the column has both numbers and text. How do I exclude the text alone before I use sum()?
The table has around 1 million rows, so is there any way other than replacing the text first?
My query will be :
Select col1,sum(col2) from t1 group by col1,col2
Thanks in advance
You can use regexp to filter the column:
Select col1,sum(col2) from t1 WHERE col2 REGEXP '^[0-9]+$' group by col1,col2
You could use MySQL built in REGEXP function.
to learn more visit : https://dev.mysql.com/doc/refman/5.1/en/regexp.html
Or another way is using CAST or CONVERT function
to learn in detail : https://dev.mysql.com/doc/refman/5.0/en/cast-functions.html
Hope this is helpful
Assuming you mean the number is at the beginning of the text, the easiest way is simply to use implicit conversion:
Select col1, sum(col2 + 0)
from t1
group by col1, col2;
If col2 starts with a non-numeric character, then MySQL will return 0. Otherwise, it will convert the leading numeric characters to a number.
Note that your query doesn't really make sense, because you are aggregating by col2 as well as including it in the group by. I suspect you really want:
Select col1, sum(col2 + 0)
from t1
group by col1;
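The two approaches (skip non-numeric values entirely, or convert the leading digits) give different totals, which a small sketch makes visible. It runs on SQLite via Python's sqlite3 module as a stand-in for MySQL: GLOB replaces the REGEXP '^[0-9]+$' filter and CAST(... AS INTEGER) replaces the col2 + 0 conversion. Those substitutions and the sample data are assumptions for the demo.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [("g1", "10"), ("g1", "abc"), ("g1", "5"), ("g2", "7kg"), ("g2", "3")],
)

# Approach 1: keep only values made up entirely of digits, then sum.
digits_only = conn.execute(
    """
    SELECT col1, SUM(CAST(col2 AS INTEGER))
    FROM t1
    WHERE col2 <> '' AND col2 NOT GLOB '*[^0-9]*'
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()

# Approach 2: convert the leading numeric prefix of every value
# ('7kg' becomes 7, 'abc' becomes 0), then sum.
leading_prefix = conn.execute(
    """
    SELECT col1, SUM(CAST(col2 AS INTEGER))
    FROM t1
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()

print("digits only:   ", digits_only)
print("leading prefix:", leading_prefix)
```

For group g2 the first approach ignores '7kg' and the second counts it as 7, so pick whichever matches what the text values mean in your data.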