Is HAVING ever necessary for non-aggregated grouped columns? - mysql

I have some code that generates SQL, and I need to understand whether HAVING is ever necessary (or useful) for non-aggregated grouped columns. I haven't found any examples that suggest it is, but wanted to check here.
The MySQL docs have this comment: "The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions."
I know that HAVING is necessary for aggregated conditions on groups, and I also understand that WHERE can be used for non-aggregated grouped columns (which can be more efficient than HAVING), but my question is this:
Is HAVING ever necessary (or useful) for non-aggregated grouped columns?
Thanks

HAVING is specifically designed for conditions on aggregated columns. MySQL also allows non-aggregated columns in the HAVING clause. There are three use cases that I can think of:
An efficiency hack, when the column has the same value in every row of the group.
An error, which should be avoided.
An efficiency hack, when there is no aggregation.
The first could conceivably be used in a situation like this:
select l.*, sum(x.y)
from list l join
. . .
group by l.listid
having l.foo = 'bar';
This works because all l.foo should have the same value for a given l.listid (assuming l.listid is a primary key). In this case, this filters the data as if you used where.
BUT, if this condition is not true, then the HAVING/WHERE equivalence is not true. The HAVING will choose a value from an indeterminate row and then filter the resulting aggregation column. The WHERE does the filtering before the aggregation. So, if lists could have the same type and you do:
select l.*, sum(x.y)
from list l join
. . .
group by l.type
having l.foo = 'bar';
This is a badly formed query (hence an error in my opinion), but is not equivalent to moving the condition to the WHERE.
The third situation is where there is no aggregation:
select l.*, concat('a', 'b', 'c') as test
from list l
having test = 'abc';
This is a convenience in MySQL. Other dialects would use a subquery for this, but MySQL (at least in older versions) materializes such subqueries, which introduces inefficiency.
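The subquery form that other dialects would use can be sketched as follows. This is only an illustration run against SQLite through Python's sqlite3 module, not MySQL; the two sample rows are made up, and SQLite's `||` operator stands in for MySQL's concat().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the table name "list" comes from the answer.
conn.execute("CREATE TABLE list (listid INTEGER PRIMARY KEY, foo TEXT)")
conn.executemany("INSERT INTO list VALUES (?, ?)", [(1, "bar"), (2, "baz")])

# Portable form: compute the alias in a derived table, then filter on it
# in the outer query's WHERE instead of using HAVING on the alias.
# SQLite concatenates with ||; MySQL would use concat('a', 'b', 'c').
rows = conn.execute(
    """
    SELECT *
    FROM (SELECT l.*, 'a' || 'b' || 'c' AS test FROM list l) AS t
    WHERE t.test = 'abc'
    ORDER BY t.listid
    """
).fetchall()
print(rows)
```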

No.
The WHERE condition is applied before the rows are grouped; HAVING is applied to the grouped rows. If no aggregated column is used, you're selecting based on a single row value anyway, so the only (semantic) difference is whether the rows will be selected before grouping or after it, but the result will be the same.
(Note that this difference may not even be true in practice, the optimizer might reorder the operations.)
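A minimal sketch of that equivalence, assuming the filtered column is itself part of the GROUP BY. It runs against SQLite via Python's sqlite3 module rather than MySQL, and the `items` table and its rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: several rows per listid, each with a foo value and y.
conn.execute("CREATE TABLE items (listid INTEGER, foo TEXT, y INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "bar", 10), (1, "bar", 5), (2, "baz", 7), (3, "bar", 1)],
)

# Filter before grouping.
with_where = conn.execute(
    "SELECT listid, foo, SUM(y) FROM items WHERE foo = 'bar' "
    "GROUP BY listid, foo ORDER BY listid"
).fetchall()

# Filter after grouping, on a column that is part of the GROUP BY.
with_having = conn.execute(
    "SELECT listid, foo, SUM(y) FROM items "
    "GROUP BY listid, foo HAVING foo = 'bar' ORDER BY listid"
).fetchall()

print(with_where == with_having)
```

Because foo is a grouping column here, every group has exactly one foo value, so both queries keep the same groups.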

Related

What is the meaning of this SQL syntax: HAVING 1?

My requirement is this: I have a table, and I need to group by one of its fields and get the latest record in each group. I searched the Internet for a solution and found this:
SELECT *
FROM (
SELECT *
FROM record r
WHERE r.id in (xx,xx,xx) HAVING 1
ORDER BY r.time DESC
) a
GROUP BY a.id
The result is correct, but I can't understand the meaning of "having 1" after the where clause. I hope someone can give me an answer. Thank you very much.
It does nothing, just like having true would. Presumably it is a placeholder where sometimes additional conditions are applied? But since there is no group by or use of aggregate functions in the subquery, any having conditions are going to be treated no differently than where conditions.
Normally you select rows and apply where conditions, then any grouping (explicit, or implicit as in select count(*)) occurs, and the having clause can specify further constraints after the grouping.
Note that your query is not guaranteed to give the results you want; the order by in the subquery in theory has no effect on the outer query, and the optimizer may skip it. It is possible that the presence of having makes a difference to the optimizer, but that is not something you should rely on, certainly not from one version of MySQL to another.
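One way to get the latest record per group without relying on the subquery's ORDER BY is to join each row to its group's maximum time. The sketch below is an assumption-laden illustration, not the asker's schema: it runs on SQLite via Python's sqlite3, the `payload` column and the sample rows are hypothetical, and only `record`, `id`, and `time` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "payload" is a made-up column standing in for the rest of the record.
conn.execute("CREATE TABLE record (id INTEGER, time TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "old"),
        (1, "2020-03-01", "new"),
        (2, "2020-02-01", "only"),
    ],
)

# Derived table m holds the newest time per id; joining back on both
# columns keeps exactly the rows that carry that newest time.
latest = conn.execute(
    """
    SELECT r.id, r.time, r.payload
    FROM record r
    JOIN (SELECT id, MAX(time) AS max_time FROM record GROUP BY id) m
      ON m.id = r.id AND m.max_time = r.time
    ORDER BY r.id
    """
).fetchall()
print(latest)
```

If two rows in a group share the same maximum time, this form returns both of them, so a tie-breaker would be needed in that case.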

Where clause with one column and multiple criteria returning one row instead of 13

I have a simple query with a few rows and multiple criteria in the where clause but it is only returning one row instead of 13. No joins and the syntax was triple checked and appears to be free of errors.
Query:
select column1, column2, column3
from mydb
where onecolumn in (number1, number2....number13)
Results:
It returns one row of data associated with a random number in the where clause.
I spent a big part of the day trying to figure this one out and am now out of ideas. Please help...
Absent a more detailed test case, and the actual SQL statement that is actually running, this question cannot be answered. Here are some "ideas"...
Our first guess is that the rows you think are going to satisfy the predicates aren't actually satisfying all of the conditions.
Our second guess is that you've got an aggregate expression (COUNT(), MAX(), SUM()) in the SELECT list that's causing an implicit GROUP BY. This is a common "gotcha"... the non-standard MySQL extension to GROUP BY which allows non-aggregates to appear in the SELECT list, which are not also included as expressions in the GROUP BY clause. This same gotcha appears when the GROUP BY clause is omitted entirely, and an aggregate is included in the SELECT list.
But the question doesn't make any mention of an aggregate expression in the SELECT list.
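The implicit GROUP BY gotcha from the second guess can be reproduced with a small sketch. It uses SQLite through Python's sqlite3, which happens to tolerate bare columns next to an aggregate much like the MySQL extension described above; the table contents are invented, and only the names `mydb`, `onecolumn`, and `column1` echo the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydb (onecolumn INTEGER, column1 TEXT)")
# Hypothetical data: 13 rows, one per number.
conn.executemany(
    "INSERT INTO mydb VALUES (?, ?)",
    [(n, f"row{n}") for n in range(1, 14)],
)

# No aggregate: one result row per matching table row.
plain = conn.execute(
    "SELECT column1 FROM mydb WHERE onecolumn IN (1, 2, 3)"
).fetchall()

# An aggregate in the SELECT list with no GROUP BY collapses everything
# into a single group, so only one row comes back; column1 is taken from
# an arbitrary row of that group.
collapsed = conn.execute(
    "SELECT column1, COUNT(*) FROM mydb WHERE onecolumn IN (1, 2, 3)"
).fetchall()

print(len(plain), len(collapsed))
```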
Our third guess is another issue that beginners frequently overlook: the order of precedence of operations, especially AND and OR. For example, consider the expressions:
a AND b OR c
a AND ( b OR c )
( a AND b ) OR c
consider those while we sing-along, Sesame Street style,...: "One of these things is not like the others, one of these things just doesn't belong..."
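For anyone who would rather check than sing, here is a small sketch that evaluates the three expressions with a = 0, b = 0, c = 1. It runs on SQLite via Python's sqlite3; AND binds more tightly than OR there, as it does in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
a, b, c = 0, 0, 1  # sample truth values chosen so the forms disagree


def evaluate(expression):
    """Evaluate a boolean SQL expression with three ? placeholders for a, b, c."""
    return conn.execute(f"SELECT {expression}", (a, b, c)).fetchone()[0]


unparenthesized = evaluate("? AND ? OR ?")
and_first = evaluate("(? AND ?) OR ?")
or_first = evaluate("? AND (? OR ?)")

# The unparenthesized form matches (a AND b) OR c, not a AND (b OR c).
print(unparenthesized, and_first, or_first)
```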
A fourth guess... if it wasn't for the row being returned having a value of onecolumn as a random number in the IN list... if it was instead the first number in the IN list, we'd be very suspicious that the IN list actually contains a single string value that looks like a list of values, but is actually not.
The two expressions in the SELECT list look very similar, but they are very different:
SELECT t.n IN (2,3,5,7) AS n_in_list
, t.n IN ('2,3,5,7') AS n_in_string
FROM ( SELECT 2 AS n
UNION ALL SELECT 3
UNION ALL SELECT 5
) t
The first expression is comparing n to each value in a list of four values.
The second expression is equivalent to t.n IN (2).
This is a frequent trip up when neophytes are dynamically creating SQL text, thinking that they can pass in a string value and that MySQL will see the commas within the string as part of the SQL statement.
(But this doesn't explain how the row returned could match a random number in the list rather than the first one.)
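When the SQL text is built dynamically, the usual way to avoid the single-string trap is to emit one placeholder per value. The sketch below shows this with Python's sqlite3 module; the table `t` and the `wanted` list are illustrative. Note one difference from the MySQL behavior described above: SQLite does not coerce '2,3,5,7' to the number 2, so the single-string version matches nothing here instead of matching only the first value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(2,), (3,), (5,)])

wanted = [2, 3, 5, 7]

# Wrong: the whole list is bound as ONE string value, '2,3,5,7'.
as_string = conn.execute(
    "SELECT n FROM t WHERE n IN (?)", (",".join(map(str, wanted)),)
).fetchall()

# Right: one placeholder per value, so the commas are part of the SQL text
# and each number is bound separately.
placeholders = ", ".join("?" for _ in wanted)
as_list = conn.execute(
    f"SELECT n FROM t WHERE n IN ({placeholders}) ORDER BY n", wanted
).fetchall()

print(as_string, as_list)
```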
Those are all just guesses. Those are some of the most frequent trip ups we see, but we're just guessing. It could be something else entirely. In its current form, there is no definitive "answer" to the question.

SQL Query seemingly returning an incomplete result set

The following query returns many correct rows, but does not return a row for seed = '1985.00-Miller-13' (there are others missing too but this is just one example):
SELECT g.dam_alias "Seed"
FROM genetic g LEFT OUTER JOIN (genetic g1d)
ON (g.dam_alias = g1d.genetic_alias)
GROUP BY g1d.dam_alias , g1d.sire_alias;
However if I add a WHERE clause to the query specifying the row that I think is missing, it shows up. Here is the modified query:
SELECT g.dam_alias "Seed"
FROM genetic g LEFT OUTER JOIN (genetic g1d)
ON (g.dam_alias = g1d.genetic_alias)
WHERE g.dam_alias = '1985.00-Miller-13' -- this is the added line
GROUP BY g1d.dam_alias , g1d.sire_alias;
If my original query indeed should not have returned the row for the seed "1985.00-Miller-13", I would have expected the second query to return no rows.
At first I suspected that my keys/indexes were corrupt, so I did a db dump and rebuilt from the resulting SQL script. I have replicated the problem using MySQL v5.6 and MariaDB v10.0.17.
I have hand inspected the data and walked through the query on paper and find nothing that is inconsistent with my expected results.
Any suggestions would be greatly appreciated. I can provide any additional information/schema/data that anyone might need.
Thanks.
You're grouping on g1d.dam_alias, but selecting g.dam_alias.
Most other RDBMS products do not allow the selection of unaggregated columns from within a group, because it is ambiguous from which record within the group a value should be returned. MySQL does however permit this operation as a performance enhancement, although the documentation is clear that the results in such cases are indeterminate:
See MySQL Handling of GROUP BY (emphasis added):
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
What's (presumably—we cannot say for certain without seeing the underlying data) happening is that g.dam_alias = '1985.00-Miller-13' exists within some groups, but different values of g.dam_alias from other records within those groups are selected instead. When you add the filter, there are no other values to select and consequently the value that is selected is guaranteed to be the one you expect.
It's difficult to make a recommendation for fixing this problem without understanding the semantics of your desired query.
You are using left outer join and the group by references the second table. These values could be NULL. Take the column from the first table:
SELECT g.dam_alias "Seed"
FROM genetic g LEFT OUTER JOIN
genetic g1d
ON g.dam_alias = g1d.genetic_alias
GROUP BY g.dam_alias, g1d.sire_alias;
---------^
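The effect of grouping on the outer-joined table's columns can be seen in a small sketch. It uses SQLite through Python's sqlite3 (which, like MySQL in its permissive mode, accepts the non-grouped g.dam_alias in the select list). The three rows are invented so that no dam_alias has a matching genetic_alias, which makes every g1d column NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genetic (genetic_alias TEXT, dam_alias TEXT, sire_alias TEXT)"
)
# Hypothetical rows: none of the dams appears as a genetic_alias.
conn.executemany(
    "INSERT INTO genetic VALUES (?, ?, ?)",
    [("A", "seed1", "x"), ("B", "seed2", "y"), ("C", "seed3", "z")],
)

# Grouping on the right-hand table: all unmatched rows share
# (NULL, NULL) and collapse into a single group.
by_right = conn.execute(
    """
    SELECT g.dam_alias
    FROM genetic g LEFT OUTER JOIN genetic g1d
      ON g.dam_alias = g1d.genetic_alias
    GROUP BY g1d.dam_alias, g1d.sire_alias
    """
).fetchall()

# Grouping on the left-hand table's column keeps one group per seed.
by_left = conn.execute(
    """
    SELECT g.dam_alias
    FROM genetic g LEFT OUTER JOIN genetic g1d
      ON g.dam_alias = g1d.genetic_alias
    GROUP BY g.dam_alias, g1d.sire_alias
    ORDER BY g.dam_alias
    """
).fetchall()

print(len(by_right), len(by_left))
```

Which seed survives in the first query is up to the engine, which is why a particular seed can appear to go missing.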

Huge performance difference between two similar SQL queries

I have two SQL queries that provide the same output.
My first intuition was to use this:
SELECT * FROM performance_dev.report_golden_results
where id IN (SELECT max(id) as 'id' from performance_dev.report_golden_results
group by platform_id, release_id, configuration_id)
Now, this took something like 70 secs to complete!
Searching for another solution I tried something similar:
SELECT * FROM performance_dev.report_golden_results e
join (SELECT max(id) as 'id'
from performance_dev.report_golden_results
group by platform_id, release_id, configuration_id) s
ON s.id = e.id;
Surprisingly, this took 0.05 secs to complete!!!
how come these two are so different?
thanks!
The first thing that might cause the time lag is that MySQL uses a 'semi-join' strategy for subqueries. According to the documentation, a semi-join works as follows:
If a subquery meets the preceding criteria, MySQL converts it to a semi-join and makes a cost-based choice from these strategies:
Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.
Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
But giving an explicit join spares MySQL this work, which might be the reason.
I hope it helped!
MySQL does not consider the first query a candidate for semi-join optimization (MySQL converts semi-joins to classic joins with some kind of optimization: FirstMatch, Duplicate Weedout, ...).
Thus a full scan will be made on the first table and the subquery will be evaluated for each row generated by the outer select, hence the bad performance.
The second one is a classic join. In this case MySQL computes the result of the derived query and then matches only the values from that result against the rows of the first table that satisfy the condition, so no full scan is needed on the first table (I assume here that id is an indexed column).
The question now is why MySQL does not consider the first query a candidate for semi-join optimization. The answer is documented in the MySQL manual: https://dev.mysql.com/doc/refman/5.6/en/semijoins.html
In MySQL, a subquery must satisfy these criteria to be handled as a semijoin:
It must be an IN (or =ANY) subquery that appears at the top level of the WHERE or ON clause, possibly as a term in an AND expression. For example:
SELECT ...
FROM ot1, ...
WHERE (oe1, ...) IN (SELECT ie1, ... FROM it1, ... WHERE ...);
Here, ot_i and it_i represent tables in the outer and inner parts of the query, and oe_i and ie_i represent expressions that refer to columns in the outer and inner tables.
It must be a single SELECT without UNION constructs.
It must not contain a GROUP BY or HAVING clause.
It must not be implicitly grouped (it must contain no aggregate functions).
It must not have ORDER BY with LIMIT.
The STRAIGHT_JOIN modifier must not be present.
The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.
Your subquery uses GROUP BY, hence semi-join optimization was not applied.
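For reference, the rewrite from the IN subquery to a join against a derived table can be sketched like this. The sketch only checks that both forms return the same rows; it runs on SQLite via Python's sqlite3, so it says nothing about MySQL's optimizer or timings, and the five sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_golden_results ("
    "id INTEGER PRIMARY KEY, platform_id INTEGER, "
    "release_id INTEGER, configuration_id INTEGER)"
)
# Hypothetical rows forming three (platform, release, configuration) groups.
conn.executemany(
    "INSERT INTO report_golden_results VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 1, 1), (3, 2, 1, 1), (4, 2, 1, 1), (5, 2, 2, 1)],
)

# Form 1: IN with a grouped subquery (not eligible for semi-join in MySQL 5.6).
in_rows = conn.execute(
    """
    SELECT id FROM report_golden_results
    WHERE id IN (SELECT MAX(id) FROM report_golden_results
                 GROUP BY platform_id, release_id, configuration_id)
    ORDER BY id
    """
).fetchall()

# Form 2: explicit join against the same grouped query as a derived table.
join_rows = conn.execute(
    """
    SELECT e.id FROM report_golden_results e
    JOIN (SELECT MAX(id) AS id FROM report_golden_results
          GROUP BY platform_id, release_id, configuration_id) s
      ON s.id = e.id
    ORDER BY e.id
    """
).fetchall()

print(in_rows == join_rows)
```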

Query for multiple conditions in MySQL

I want to be able to query for multiple conditions when I have a table that connects the ids from two other tables.
My three tables
destination:
id_destination, name_destination
keyword:
id_keyword, name_keyword
destination_keyword:
id_keyword, id_destination
Where the last one connects ids from the destination- and the keyword table, in order to associate destination with keywords.
A query to get the destination based on keyword would then look like
SELECT destination.name_destination FROM destination
NATURAL JOIN destination_keyword
NATURAL JOIN keyword
WHERE keyword.name_keyword like _keyword_
Is it possible to query for multiple keywords? Let's say I wanted to get the destinations that match all or some of the keywords in the list sunny, ocean, fishing, ordered by number of matches. How would I move forward? Should I restructure my tables? I am sort of new to SQL and would very much like some input.
Order your table joins starting with keyword and use a count of the number of times the destination is joined:
select
d.id_destination,
d.name_destination,
count(d.id_destination) as matches
from keyword k
join destination_keyword dk on dk.id_keyword = k.id_keyword
join destination d on d.id_destination = dk.id_destination
where name_keyword in ('sunny', 'ocean', 'fishing')
group by 1, 2
order by 3 desc
This query assumes that name_keyword values are single words like "sunny".
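A runnable sketch of the count-and-order approach, using the table and column names from the question. The sample destinations and keyword links are invented, the keyword list is passed as bound parameters, and the whole thing runs on SQLite via Python's sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE destination (id_destination INTEGER PRIMARY KEY, name_destination TEXT);
    CREATE TABLE keyword (id_keyword INTEGER PRIMARY KEY, name_keyword TEXT);
    CREATE TABLE destination_keyword (id_keyword INTEGER, id_destination INTEGER);
    -- Hypothetical sample data
    INSERT INTO destination VALUES (1, 'Beach'), (2, 'Lake'), (3, 'City');
    INSERT INTO keyword VALUES (1, 'sunny'), (2, 'ocean'), (3, 'fishing'), (4, 'museum');
    INSERT INTO destination_keyword VALUES (1, 1), (2, 1), (3, 1), (3, 2), (1, 2), (4, 3);
    """
)

keywords = ["sunny", "ocean", "fishing"]
placeholders = ", ".join("?" for _ in keywords)

# One joined row per (destination, matching keyword); counting them per
# destination gives the number of matches to sort by.
matches = conn.execute(
    f"""
    SELECT d.name_destination, COUNT(*) AS matches
    FROM keyword k
    JOIN destination_keyword dk ON dk.id_keyword = k.id_keyword
    JOIN destination d ON d.id_destination = dk.id_destination
    WHERE k.name_keyword IN ({placeholders})
    GROUP BY d.id_destination, d.name_destination
    ORDER BY matches DESC, d.name_destination
    """,
    keywords,
).fetchall()
print(matches)
```

Destinations with no matching keyword do not appear at all; to require every keyword to match, a `HAVING COUNT(*) = 3` style condition could be added, assuming each keyword is linked to a destination at most once.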
Using natural joins is not a good idea, because if the table structures change such that two naturally joined tables gain columns with the same name, your query will suddenly stop working. Also, by explicitly declaring the join condition, readers of your code will immediately understand how the tables are joined, and can modify it to add non-key conditions as required.
Requiring that only key columns share the same name is also restrictive, because it requires unnatural column names like "name_keyword" instead of simply "name". The suffix "_keyword" is redundant, adds no value, and exists only because you have to have it when using natural joins.
Natural joins save hardly any typing (and often cause more typing overall), impose limitations on join types and names, and are brittle.
They are to be avoided.
You can try something like the following:
SELECT dest.name_destination, count(*) FROM destination dest, destination_keyword dest_key, keyword kw
WHERE kw.id_keyword = dest_key.id_keyword
AND dest_key.id_destination = dest.id_destination
AND kw.name_keyword IN ('sunny', 'ocean', 'fishing')
GROUP BY dest.name_destination
ORDER BY count(*) DESC, dest.name_destination
Haven't tested it, but if it is not correct it should show you the way to accomplish it.
You can do multiple LIKE statements:
Column LIKE 'value1' OR Column LIKE 'value2' OR ...
Or you could do a regular expression match (with REGEXP rather than LIKE, since LIKE does not treat | as alternation):
Column REGEXP 'something|somethingelse|whatever'
The trick to ordering by number of matches has to do with understanding the GROUP BY clause and the ORDER BY clause. You either want one count for everything, or you want one count per something. In the first case you just use the COUNT function by itself. In the second case you use the GROUP BY clause to "group" the somethings/categories that you want counted. ORDER BY should be pretty straightforward.
I think based on the information you have provided your table structure is fine.
Hope this helps.
DISCLAIMER: My syntax isn't accurate.