Difference between GROUP BY HAVING and SELECT MAX() GROUP BY - mysql

I'm new to SQL and getting confused by the difference in the following two queries:
SELECT MAX(version), * FROM table WHERE primary_key = #key GROUP BY location
SELECT version, * FROM table WHERE primary_key = #key GROUP BY location HAVING version = MAX(version)
Assuming that the table looks like something like this:
primary_key | version | location | data
If I'm understanding this correctly, both queries select the max version entry within each location (among those that have #key as primary key). So is there any difference between the two queries? Or is the difference just on performance?

You left out the table name.
Using * (all columns) together with an aggregate function is deprecated, and most databases do not allow it.
SELECT MAX(version)
FROM your_table
WHERE primary_key = #key
GROUP BY location
SELECT version
FROM your_table
WHERE primary_key = #key
GROUP BY location
HAVING version = MAX(version)
The two queries are different because in the second one the grouped result is filtered to match the HAVING condition.
HAVING always works on the result of the query, whereas WHERE works directly on the row source of the query.
And yes, the second is slower than the first, although in most cases the difference will not be noticeable.
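To make the WHERE versus HAVING distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name t, the sample rows, and the `version < 3` condition are assumptions chosen for illustration; they are not taken from the question.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (primary_key INTEGER, version INTEGER, location TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "A"), (1, 3, "A"), (1, 2, "B")],
)

# WHERE removes rows before they are grouped, so MAX() only sees versions < 3.
where_rows = conn.execute(
    "SELECT location, MAX(version) FROM t "
    "WHERE primary_key = 1 AND version < 3 "
    "GROUP BY location ORDER BY location"
).fetchall()

# HAVING removes whole groups after MAX() has been computed for each group.
having_rows = conn.execute(
    "SELECT location, MAX(version) FROM t "
    "WHERE primary_key = 1 "
    "GROUP BY location HAVING MAX(version) < 3 ORDER BY location"
).fetchall()

print(where_rows)   # [('A', 1), ('B', 2)]
print(having_rows)  # [('B', 2)]
```

Location A survives the WHERE filter with a smaller maximum, but is dropped entirely by the HAVING filter because its real maximum is 3.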

Please refer to the link below; it may be useful:
http://www.dofactory.com/sql/group-by

Related

Exists on a distinct column selection does not work as expected when offset is provided

Given the following table definition:
CREATE TABLE demo (
id integer PRIMARY KEY,
name varchar(10)
);
INSERT INTO demo VALUES (1, 'test');
INSERT INTO demo VALUES (2, 'test');
The following queries (which are assumed to be semantically identical - please correct me if I'm wrong):
SELECT DISTINCT name
FROM demo
WHERE name = 'test';
SELECT DISTINCT name
FROM demo
WHERE name = 'test'
-- actual value is irrelevant as long
-- as it is > number of entries that would result
LIMIT 10
OFFSET 0;
Both correctly return:
name
----
test
In addition, the query:
SELECT EXISTS(
SELECT DISTINCT name
FROM demo
WHERE name = 'test'
LIMIT 10
OFFSET 0
);
also correctly returns 1 (or t in PostgreSQL). However, the query:
SELECT EXISTS(
SELECT DISTINCT name
FROM demo
WHERE name = 'test'
LIMIT 10
OFFSET 1 -- note the offset: 1 more than what the DISTINCTed query should return
);
also returns 1 in SQLite and MySQL, but f in PostgreSQL. It seems as if the OFFSET is applied to the query result in PostgreSQL (as expected), but the DISTINCT has precedence in SQLite and MySQL.
AFAIK, the SQL standard defines LIMIT/OFFSET to be evaluated last (I couldn't find a link to the standard to verify this myself, but every search turns up the same answer), meaning that the PostgreSQL behaviour is correct.
Is this a bug that has been fixed in PostgreSQL?
Tested on:
SQLite 3.36.0
MySQL 8.0.28-0ubuntu0.20.04.3
PostgreSQL 14.2 (Debian 14.2-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
Interestingly, using GROUP BY instead of DISTINCT, as follows:
SELECT EXISTS(
SELECT name
FROM demo
WHERE name = 'test'
GROUP BY name
LIMIT 10
OFFSET 1
);
correctly returns 0 on SQLite, but it still returns an incorrect result of 1 on MySQL.
It seems that, as noted in the comments, this is simply a bug in MySQL and SQLite. I was about to report it on their tracker, but Oracle wants a lot of information for me to be able to do that, so I refrained.
The behaviour in PostgreSQL is the expected one and correct.
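As a rough way to check what the subquery itself yields, the sketch below runs it as a top-level query in SQLite through Python's sqlite3 module, and also counts its rows through a derived table instead of using EXISTS. The COUNT(*) variant is an assumption about a possible workaround, not something proposed in the thread, and the EXISTS result is printed without a stated expectation because it depends on the engine and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demo (id integer PRIMARY KEY, name varchar(10));
    INSERT INTO demo VALUES (1, 'test');
    INSERT INTO demo VALUES (2, 'test');
    """
)

inner = "SELECT DISTINCT name FROM demo WHERE name = 'test' LIMIT 10 OFFSET 1"

# Run as a top-level query: DISTINCT leaves one row and OFFSET 1 skips it.
inner_rows = conn.execute(inner).fetchall()

# Counting the rows of the derived table avoids EXISTS altogether.
(row_count,) = conn.execute(f"SELECT COUNT(*) FROM ({inner})").fetchone()

# Printed only; the value differs between engines and versions.
(exists_flag,) = conn.execute(f"SELECT EXISTS({inner})").fetchone()

print(inner_rows)  # []
print(row_count)   # 0
print(exists_flag)
```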

When grouping, I need to select a column that is not in the GROUP BY, without applying an aggregate function to it (I want it as is)

I have a DataFrame with millions of records and 8 columns.
I want to group it by col1 and col2 and select name_id, max(SUM), col1, col2.
The problem is that name_id is neither part of the GROUP BY condition nor wrapped in an aggregate function.
Can you please suggest a method that solves my problem in SQL or PySpark?
Input DataFrame (here SUM = number of columns that have data, and name_id is unique):
Required output: name_id (as is), max(SUM), col1, col2
I tried something like this but it's not working:
Any suggestion is welcome!
I tried the code below, which works in one scenario but not in others.
Working scenario: when the maximum value in the SUM column is duplicated, it works fine and returns the max name_id, which is my requirement.
When the maximum value in the SUM column is not duplicated, it returns null. In the table below, according to the logic, my output should contain name_id = 48981 and name_id = 52214, but I am only getting name_id = 52214.
This is a classic greatest-per-group problem. I would suggest the following solution:
select d.*
from data_frame d
join (
    select col_1, col_2,
           max(sum) max_sum,
           max(name_id) max_name_id
    from data_frame
    group by col_1, col_2
) t on d.col_1 = t.col_1 and
       d.col_2 = t.col_2 and
       d.name_id = t.max_name_id and
       d.sum = t.max_sum
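The join above only returns a group when the row with the largest name_id also holds the largest sum. A possible variant, sketched here with Python's sqlite3 module, joins on the maximum sum alone and then picks the largest name_id among the tied rows. The sample rows and the column name total (standing in for SUM) are hypothetical, since the original data is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_frame (name_id INTEGER, col1 TEXT, col2 TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO data_frame VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", 5),
        (2, "a", "x", 7),
        (3, "a", "x", 7),  # ties with name_id 2 on the group maximum
        (4, "b", "y", 4),  # group maximum, but not the largest name_id
        (5, "b", "y", 2),
    ],
)

rows = conn.execute(
    """
    SELECT d.col1, d.col2, d.total, MAX(d.name_id) AS name_id
    FROM data_frame d
    JOIN (
        SELECT col1, col2, MAX(total) AS max_total
        FROM data_frame
        GROUP BY col1, col2
    ) t ON d.col1 = t.col1
       AND d.col2 = t.col2
       AND d.total = t.max_total
    GROUP BY d.col1, d.col2, d.total
    ORDER BY d.col1, d.col2
    """
).fetchall()

print(rows)  # [('a', 'x', 7, 3), ('b', 'y', 4, 4)]
```

Group (b, y) is kept even though its largest name_id (5) does not belong to the row with the maximum total.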
You seem to want:
select max(name_id), max(sum), col1, col2, max(col3), . . .
from t
group by col1, col2;
Your last column doesn't seem to be using max(), but you have not explained that logic.

MySQL: Is it possible to use a subquery inside a FROM clause in order to pick the table name from another table

I was wondering if there's any way to add a subquery with a switch case to the FROM clause of my select query in order to select a table based on a condition.
For example:
select a.*
from (select (case when (table2.column = 'something')
                   then (table2.tablename1)
                   else (table2.tablename2)
              end) as tablename
      from table2
      where table2.column2 = 'blabla'
      limit 1
     ) a
I tried to write that in many variations, and so far none of them worked.
In the most successful attempts (when I got no MySQL errors) it returned the name of the table as the result itself (for example, the value that's in table2.tablename2). I understand why it did that (because I selected everything from the result of a select), but how can I use the table name from the result to set the table in the main query?
Hope that makes sense...
Any idea?

MySQL IF on Where clause

Is it possible to write a query that changes the WHERE clause according to some condition? For instance, I want to select * from table1 where date is 19/July/2016, but if the field id is null then do nothing, else compare id to something else. Like the query below?
Select * from table1 where date="2016-07-19" if(isnull(id),"",and id=(select * from ...))
Yes. This should be possible.
If we assume that date and id are references to columns in (the unfortunately named) table table1, if I'm understanding what you are attempting to achieve, we could write a query like this:
SELECT t.id
     , t.date
     , t....
  FROM table1 t
 WHERE t.date='2016-07-19'
   AND ( t.id IS NULL
      OR t.id IN ( SELECT expr FROM ... )
       )
It would also be possible to incorporate the MySQL IF() and IFNULL() functions, if there's some requirement to do that.
As far as dynamically changing the text of the SQL statement after the statement is submitted to the database, no, that's not possible. Any dynamic changes to the SQL text would need to be done when the SQL statement is generated, before it is submitted to the database.
My personal preference would be to use a join operation rather than an IN (subquery) predicate.
I think you're trying too hard. If id is NULL, the comparison evaluates to NULL, which the WHERE clause treats the same as FALSE. So:
Select * from table1 where date="2016-07-19" and id=(select * from ...)
Should only match the records you want. If id is NULL you get nothing.
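The two answers behave differently for rows whose id is NULL. The sketch below, run against SQLite via Python's sqlite3 module, shows the difference. The allowed_ids table is a hypothetical stand-in for the elided subquery, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# allowed_ids is a hypothetical stand-in for the elided "(select ... from ...)".
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, date TEXT);
    CREATE TABLE allowed_ids (id INTEGER);
    INSERT INTO table1 VALUES (NULL, '2016-07-19');
    INSERT INTO table1 VALUES (1, '2016-07-19');
    INSERT INTO table1 VALUES (2, '2016-07-19');
    INSERT INTO table1 VALUES (1, '2016-07-20');
    INSERT INTO allowed_ids VALUES (1);
    """
)

# First answer: rows with a NULL id pass, other rows must match the subquery.
with_null_rows = conn.execute(
    "SELECT id, date FROM table1 "
    "WHERE date = '2016-07-19' "
    "AND (id IS NULL OR id IN (SELECT id FROM allowed_ids)) "
    "ORDER BY id"
).fetchall()

# Second answer: a NULL id never satisfies the equality, so that row is dropped.
equality_rows = conn.execute(
    "SELECT id, date FROM table1 "
    "WHERE date = '2016-07-19' "
    "AND id = (SELECT id FROM allowed_ids LIMIT 1) "
    "ORDER BY id"
).fetchall()

print(with_null_rows)  # [(None, '2016-07-19'), (1, '2016-07-19')]
print(equality_rows)   # [(1, '2016-07-19')]
```

Which form is right depends on whether "do nothing" for a NULL id means keeping the row or discarding it.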

Select record using values from a previous COUNT() IN MYSQL

Using COUNT in MySQL, I obtain a series of values that appear only once in my database, listed below:
valueName
---------
value1
value2
value3
value4
I need a script that retrieves all records in a table where valueName is not one of the values listed in the initial count, and I need these two steps to run in a single script (it doesn't matter how many parts it has).
I've got the script to obtain the list above like this:
SELECT field AS new_name FROM table GROUP BY field HAVING COUNT(field) = 1;
And it works.
The problem is that I don't know how to work with the aggregated result of the first step. Maybe by using some kind of function, or a loop (though I don't think SQL has those).
I've tried different things, like putting a COUNT inside a WHERE clause, but they don't work.
Please help!
Use a join:
select t.*
from table t join
     (SELECT field
      FROM table
      GROUP BY field
      HAVING COUNT(field) > 1
     ) filter
     on t.field = filter.field;
If you have a primary key in your table and an index on table(field, pk), the following is probably faster:
select t.*
from table t
where exists (select 1
              from table t2
              where t2.field = t.field and t2.pk <> t.pk
             );
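As a quick check that the join and the EXISTS forms select the same rows, here is a small sketch using Python's sqlite3 module. The table name items, the alias multi, and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pk INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        (1, "value1"), (2, "dup"), (3, "dup"), (4, "value2"),
        (5, "other"), (6, "other"), (7, "other"),
    ],
)

# Join against the set of field values that occur more than once.
join_rows = conn.execute(
    """
    SELECT t.pk, t.field
    FROM items t
    JOIN (
        SELECT field FROM items GROUP BY field HAVING COUNT(field) > 1
    ) multi ON t.field = multi.field
    ORDER BY t.pk
    """
).fetchall()

# Keep a row if another row with the same field but a different pk exists.
exists_rows = conn.execute(
    """
    SELECT t.pk, t.field
    FROM items t
    WHERE EXISTS (
        SELECT 1 FROM items t2
        WHERE t2.field = t.field AND t2.pk <> t.pk
    )
    ORDER BY t.pk
    """
).fetchall()

print(join_rows)    # [(2, 'dup'), (3, 'dup'), (5, 'other'), (6, 'other'), (7, 'other')]
print(exists_rows)  # same rows as join_rows
```

The rows with value1 and value2, which appear only once, are excluded by both queries.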
Try this:
SELECT table.* FROM table
JOIN
(SELECT field FROM table GROUP BY field HAVING COUNT(field) > 1) newtable
ON
table.field = newtable.field;
This should work.