DISTINCT works differently in MySQL and PostgreSQL

I have created a sample table, report, and filled it with sample data using the following commands.
create table report(id int primary key,vistor_id int, branch_id int,date int);
insert into report values (1,1,3,27),(2,1,2,27),(3,1,1,28),(4,1,4,30),(5,1,1,30);
I need to find the list of recently visited branches (based on the date column) without duplicates.
So I used the following query:
select distinct branch_id from report order by date desc;
It works on MySQL but produces the following error on PostgreSQL. How can I fix this, or how can I obtain the same result in PostgreSQL? (The error is from sqlfiddle.com.)
ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list Position: 48

Your query is not valid standard SQL. It works in MySQL only if the ONLY_FULL_GROUP_BY option is disabled.
The problem is that there may be multiple dates per branch_id: which one should be used for ordering?
You can use aggregation and be explicit about what you ask for. Say you want to order by the latest date per branch_id:
select branch_id from report group by branch_id order by max(date) desc
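As a quick check of the aggregated query, here is a minimal sketch that runs it against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL, which is an assumption made only to keep the example self-contained. The extra `branch_id` sort key is also an addition: two pairs of branches share the same latest date, and without a tie-breaker their relative order is not defined.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table report(id int primary key, vistor_id int, branch_id int, date int);
    insert into report values (1,1,3,27),(2,1,2,27),(3,1,1,28),(4,1,4,30),(5,1,1,30);
""")

# One row per branch, ordered by that branch's latest visit date.
# The branch_id tie-breaker is not in the original answer; it only makes
# the order of branches with equal latest dates deterministic.
rows = conn.execute("""
    select branch_id
    from report
    group by branch_id
    order by max(date) desc, branch_id
""").fetchall()

recent_branches = [branch_id for (branch_id,) in rows]
print(recent_branches)
```

Branches 1 and 4 were both last visited on date 30, and branches 2 and 3 on date 27, so the tie-breaker decides the order within each pair.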

Related

GROUP BY clause in MySQL groups records with different values

The MySQL GROUP BY clause groups records even when they have different values.
However, I would like it to behave as DB2 SQL does, so that records that do not contain exactly the same information are not grouped.
Currently in MySQL, for:
id Name
A Amanda
A Ana
GROUP BY id would return one record chosen arbitrarily (unless aggregate functions are used, of course).
In DB2 SQL, however, the same GROUP BY id would not group those rows: it returns 2 records and never picks one of the values at random when grouping without aggregate functions.
First, id is a bad name for a column that is not the primary key of a table. But that is not relevant to your question.
This query:
select id, name
from t
group by id;
returns an error in almost any database other than MySQL. The problem is that name is not in the group by and is not the argument of an aggregation function. The failure is ANSI-standard behavior, not honored by MySQL.
A typical way to write the query is:
select id, max(name)
from t
group by id;
This should work in all databases (assuming name is not some obscure type where max() doesn't work).
Or, if you want each name, then:
select id, name
from t
group by id, name;
or the simpler:
select distinct id, name
from t;
In MySQL, you can get the ANSI standard behavior by setting ONLY_FULL_GROUP_BY for the database/session. MySQL will then return an error, as DB2 does in this case.
MySQL 5.7.5 and later have ONLY_FULL_GROUP_BY enabled by default.
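To see the three portable rewrites side by side, here is a small sketch using the two sample rows from the question. It runs on Python's built-in sqlite3 module, which is an assumption made for convenience. SQLite also tolerates bare columns in a GROUP BY query, so this cannot reproduce the error that DB2 or a strict MySQL would raise; it only shows what the unambiguous queries return.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (id text, name text);
    insert into t values ('A', 'Amanda'), ('A', 'Ana');
""")

# One row per id, with an explicit choice of which name to keep.
one_per_id = conn.execute(
    "select id, max(name) from t group by id"
).fetchall()

# One row per distinct (id, name) pair, written two equivalent ways.
# The ORDER BY is added here only to make the output order predictable.
each_pair = conn.execute(
    "select id, name from t group by id, name order by id, name"
).fetchall()
distinct_pairs = conn.execute(
    "select distinct id, name from t order by id, name"
).fetchall()

print(one_per_id)
print(each_pair)
print(distinct_pairs)
```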
GROUP BY in MySQL groups the records according to the listed fields. Think of it this way: it keeps one row per group and the others do not show up. It has its uses, for example, counting how many times each id is repeated in the table:
select count(id), id from table group by id
To achieve your purpose, however, you can group by multiple fields, something along the lines of:
select * from table group by id, name
I do not think there is an automated way to do this, but using
GROUP BY id, name
would give you the solution you are looking for.

Why does MySQL select count(distinct user_id) return the wrong number?

I have a big table in MySQL. It has 13 million rows.
The MySQL version is 5.7.10.
The table structure is as below:
create table table_name (
user_id varchar(20) not null,
item_id varchar(20) not null
);
1. The first query is:
select count(distinct user_id) from table;
Result: 760,000
2. The second query is:
select count(1) from (select user_id from table group by user_id) a;
Result: 120,000
user_id is not null in every row.
The correct number is 120,000. Why does the first query return the wrong number?
When I run the first query in Hive and Spark SQL, the result is 120,000.
So, is this a MySQL bug, or is there a setting that would make it return the right result?
Thank you!
Update: I tried it on another PC, and the first query returned 120,000, the right number. The MySQL version there is 5.6.26.
So maybe it is a bug in 5.7.10.
There are multiple known bugs in MySQL's COUNT(DISTINCT) when the column is part of a two-column unique key.
here and here
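For reference, the two queries from the question are logically equivalent when user_id is never null, so a correct engine should return the same number for both. Below is a minimal sketch of that check using Python's built-in sqlite3 module; the engine, the table name `user_items`, and the sample rows are all assumptions made for illustration, and the example does not reproduce the MySQL 5.7.10 behaviour described above.

```python
import sqlite3

# In-memory SQLite database; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_items ("
    "user_id varchar(20) not null, item_id varchar(20) not null)"
)
conn.executemany(
    "insert into user_items values (?, ?)",
    [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3"), ("u3", "i3")],
)

# Query 1: count distinct users directly.
(distinct_count,) = conn.execute(
    "select count(distinct user_id) from user_items"
).fetchone()

# Query 2: count the groups produced by GROUP BY user_id.
(grouped_count,) = conn.execute(
    "select count(1) from (select user_id from user_items group by user_id) a"
).fetchone()

print(distinct_count, grouped_count)
```

Running both forms against the same data is a cheap way to confirm whether a particular server is affected.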

MySQL not respecting 'HAVING' where aliases are used?

I've got the following table structure:
CREATE TABLE reservations (
id int auto_increment primary key,
minDate date,
maxDate date
);
CREATE TABLE stays (
id int auto_increment primary key,
reservation_id int,
theDate date
);
INSERT INTO reservations VALUES (null, CURDATE(), CURDATE());
INSERT INTO stays VALUES (null, 1, CURDATE());
It's for a booking system that records reservations (a general container) and stays (someone for each night).
I am trying to run the following query to extract all reservations whose number of nights differs from the number of stays in the database (e.g. the reservation says there should be 2 nights, but there is only 1 in the database, etc.):
SELECT
reservations.id AS 'Reservation ID',
reservations.minDate,
reservations.maxDate,
DATEDIFF(reservations.maxDate, reservations.minDate) + 1 AS 'numNights',
COUNT(DISTINCT stays.id) AS 'numStays'
FROM
reservations
LEFT JOIN stays ON reservations.id = stays.reservation_id
GROUP BY
reservations.id
HAVING
`numNights` != `numStays`
ORDER BY
reservations.minDate
This works perfectly on my Windows version of MySQL (xampp), and production CentOS server, but is broken on a testing machine running version 5.6.19-0ubuntu0.14.04.1. On the broken machine, it's pulling back all rows, even though the numNights and numStays columns match.
If I replace the aliases in the HAVING clause with the expressions used in the SELECT part, then it works fine, but I can't understand why it doesn't like the aliases in the HAVING clause (on this version)?
Btw, it's definitely not a no-quote/quote/double-quote/backtick issue, I've tried all combinations. I might have thought it was a charset encoding issue, but DATEDIFF() and COUNT() should be returning the same type of integers back, right? And that wouldn't explain why expressions work in the HAVING part.
I have an SQL Fiddle set up for experimenting as well... it works fine on that too. So now I'm at a loss.
This is an invalid SQL query because of the GROUP BY clause. Unfortunately, MySQL sometimes allows this. Please check the MySQL setting ONLY_FULL_GROUP_BY and http://dev.mysql.com/doc/refman/5.0/en/sql-mode.html#sqlmode_only_full_group_by
Both tables contain only a single row of data. Without the HAVING clause, the query returns only one row (SQL Fiddle). Since numNights = numStays = 1, the query returns nothing once the HAVING clause is applied. I think this will help you.
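The workaround mentioned in the question, repeating the expressions in HAVING instead of using the aliases, can be sketched as follows. This is an illustration on Python's built-in sqlite3 module rather than MySQL, so several things are assumptions: `julianday()` stands in for MySQL's `DATEDIFF()`, the dates are fixed sample values instead of `CURDATE()`, and the second reservation is added so that one row matches and one does not. The non-aggregated columns are also listed in GROUP BY so the query does not depend on a relaxed GROUP BY mode.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table reservations (id integer primary key, minDate text, maxDate text);
    create table stays (id integer primary key, reservation_id int, theDate text);
    -- Reservation 1 spans two nights but has one stay; reservation 2 is consistent.
    insert into reservations values
        (1, '2024-01-01', '2024-01-02'),
        (2, '2024-01-05', '2024-01-05');
    insert into stays values (1, 1, '2024-01-01'), (2, 2, '2024-01-05');
""")

# SQLite equivalent of DATEDIFF(maxDate, minDate) + 1 (assumption).
nights = "cast(julianday(r.maxDate) - julianday(r.minDate) as integer) + 1"

# HAVING repeats the full expressions rather than the SELECT aliases.
query = f"""
    select r.id, {nights} as numNights, count(distinct s.id) as numStays
    from reservations r
    left join stays s on r.id = s.reservation_id
    group by r.id, r.minDate, r.maxDate
    having {nights} != count(distinct s.id)
    order by r.minDate
"""
mismatched = conn.execute(query).fetchall()
print(mismatched)
```

Only the reservation whose night count differs from its stay count comes back.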

Why does MySQL allow you to group by columns that are not selected?

I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and it's quite good despite its title. However, the chapter on GROUP BY confuses me:
"Grouping data is a simple process. The selected columns (the column list following
the SELECT keyword in a query) are the columns that can be referenced in the GROUP
BY clause. If a column is not found in the SELECT statement, it cannot be used in the
GROUP BY clause. This is logical if you think about it—how can you group data on a
report if the data is not displayed? "
How come when I ran this statement in MySQL it works?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from table t
where t.id in (select max(id)
from table t
group by thegroup
);
(Note: There are other ways to write such a query, this is just an example.)
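Here is a small runnable sketch of that pattern, where the subquery groups by a column it never selects. It uses Python's built-in sqlite3 module, and the table name `items`, its columns, and the sample rows are hypothetical, chosen only to make the example self-contained.

```python
import sqlite3

# In-memory SQLite database; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table items (id int primary key, thegroup text, label text);
    insert into items values (1, 'a', 'old a'), (2, 'a', 'new a'), (3, 'b', 'only b');
""")

# The subquery groups by thegroup but selects only max(id),
# so the grouping column never appears in its select list.
latest = conn.execute("""
    select t.*
    from items t
    where t.id in (select max(id)
                   from items
                   group by thegroup)
    order by t.id
""").fetchall()
print(latest)
```

Each group contributes exactly the row with its highest id.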
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.
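The caveat about repeated EMP_ID values can be illustrated with a few hypothetical rows. This sketch runs on Python's built-in sqlite3 module; the engine and the sample data are assumptions, and only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMPLOYEE_PAY_TBL (EMP_ID int, SALARY int, BONUS int);
    -- Employee 1 appears with two different bonus values.
    insert into EMPLOYEE_PAY_TBL values (1, 1000, 100), (1, 1000, 200), (2, 1500, 100);
""")

# Grouping by (EMP_ID, BONUS) but selecting only EMP_ID is valid,
# yet each distinct bonus produces its own row for the same employee.
rows = conn.execute("""
    select EMP_ID
    from EMPLOYEE_PAY_TBL
    group by EMP_ID, BONUS
    order by EMP_ID
""").fetchall()
print(rows)
```

Employee 1 shows up twice in the result, and nothing in the output says which bonus each row corresponds to.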

Avoid creating temporary table when both group by and order by are used in query

Is there any way to avoid creation of temporary table in such a query?
SELECT item_id FROM titem
GROUP BY item_id
ORDER BY item_created;
I tried to create indexes (item_id, item_created) and (item_created, item_id), but that didn't help.
You have a problem with your query. When the GROUP BY is performed, which item_created is selected for each group? By default, MySQL selects one for each group more or less randomly (from within the group). Because only one item_created is used for each group, a temporary table is required to sort on it.
I would advise you to use one of the aggregate functions, such as min() or max(), to get a better-defined item_created to sort on. This still won't fix the temporary table problem; an index on item_id is about the best you can do.
SELECT item_id, max(item_created) as ic FROM titem
GROUP BY item_id
ORDER BY ic;
It is interesting to note that some databases, e.g. Oracle (I'm told), would throw an error on your original query, because item_created is neither inside an aggregate function in the selected fields nor in the GROUP BY clause. MySQL can be set to mimic this behaviour.
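To show why the choice of aggregate matters for the ordering, here is a minimal sketch in which min() and max() sort the same groups in opposite orders. It runs on Python's built-in sqlite3 module, and the integer timestamps are hypothetical sample data. It says nothing about whether MySQL needs a temporary table, only about which item_created value the sort uses.

```python
import sqlite3

# In-memory SQLite database; item_created values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table titem (item_id int, item_created int);
    -- Item 1 was created at times 1 and 10, item 2 at time 5.
    insert into titem values (1, 1), (1, 10), (2, 5);
""")

# Sort groups by their latest creation time.
by_latest = [row[0] for row in conn.execute("""
    select item_id, max(item_created) as ic
    from titem
    group by item_id
    order by ic
""")]

# Sort groups by their earliest creation time.
by_earliest = [row[0] for row in conn.execute("""
    select item_id, min(item_created) as ic
    from titem
    group by item_id
    order by ic
""")]

print(by_latest, by_earliest)
```

With a bare `ORDER BY item_created`, the server is free to use either value for item 1, so either order could come back.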