using distinct with all attributes - mysql

We can use * to select all attributes from a table. I am using DISTINCT and my table contains 16 columns; how can I use DISTINCT with it? I cannot do select distinct Id,* from abc;
What would be the best way?
Another way could be select distinct id,col1,col2 etc.

If you want one row per id in the results, you can use GROUP BY id. But then it's not advisable to put the other columns in the SELECT list on their own (even if MySQL allows it: that depends on whether the ONLY_FULL_GROUP_BY SQL mode is on or off). It's advisable to use the other columns with aggregate functions like MIN(), MAX(), COUNT(), etc. In MySQL, there is also a GROUP_CONCAT() aggregate function that will collect all values from a column for a group:
SELECT
    id
  , COUNT(*) AS number_of_rows_with_same_id
  , MIN(col1) AS min_col1
  , MAX(col1) AS max_col1
  -- ... MIN()/MAX() of other columns as needed
  , GROUP_CONCAT(col1) AS gc_col1
  , GROUP_CONCAT(col2) AS gc_col2
  -- ... GROUP_CONCAT(col3) through GROUP_CONCAT(col15)
  , GROUP_CONCAT(col16) AS gc_col16
FROM
    abc
GROUP BY
    id ;
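A minimal sketch of the aggregate approach, assuming a hypothetical three-column version of abc and using SQLite (via Python's sqlite3) only so it runs anywhere; COUNT, MIN, MAX and GROUP_CONCAT all exist in MySQL too. Every column except id goes through an aggregate, so the result is deterministic apart from the order inside GROUP_CONCAT.

```python
import sqlite3

# Hypothetical stand-in for abc, trimmed to three columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (id INTEGER, col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 30, "b"), (2, 20, "c")],
)


def summarize_per_id(conn):
    """One row per id; every non-grouped column goes through an aggregate."""
    return conn.execute(
        """
        SELECT id,
               COUNT(*)           AS number_of_rows_with_same_id,
               MIN(col1)          AS min_col1,
               MAX(col1)          AS max_col1,
               GROUP_CONCAT(col2) AS gc_col2  -- comma-separated, order not guaranteed
        FROM abc
        GROUP BY id
        ORDER BY id
        """
    ).fetchall()


for row in summarize_per_id(conn):
    print(row)
```

In MySQL you can make the concatenation order explicit with GROUP_CONCAT(col2 ORDER BY col2).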
The query:
SELECT *
FROM abc
GROUP BY id ;
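is not valid in SQL-92, because you have non-aggregated columns in the SELECT list that are not in the GROUP BY list; it can be valid in SQL:2003 and later, which allow such columns when they are functionally dependent on the grouping columns. Still, it's invalid here because the other columns are not functionally dependent on the grouping column (id). MySQL, when ONLY_FULL_GROUP_BY is off, unfortunately allows such queries and does no checking of functional dependency.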
So you never know which row (of the many with the same id) will be returned, or even whether (horror!) you get values from different rows with the same id. As Andriy comments, the consequence is that values for columns other than id will be chosen arbitrarily. If you want predictable results, just don't use such a technique.
An example solution: If you want just one row from every id, and you have a datetime or timestamp (or some other) column that you can use for ordering, you can do this:
SELECT t.*
FROM abc AS t
JOIN
( SELECT id
, MIN(some_column) AS m -- or MAX()
FROM abc
GROUP BY id
) AS g
ON g.id = t.id
AND g.m = t.some_column ;
This will work as long as the (id, some_column) combination is unique.
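The join above can be tried out with a small sketch, assuming hypothetical data where some_column holds ISO date strings (which sort correctly as text) and payload stands in for the remaining columns. SQLite is used here as a portable stand-in; the query itself is the same shape as the MySQL one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (id INTEGER, some_column TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?)",
    [
        (1, "2020-01-02", "later row for id 1"),
        (1, "2020-01-01", "earliest row for id 1"),
        (2, "2020-03-05", "only row for id 2"),
    ],
)


def first_row_per_id(conn):
    """Join each row to its group's MIN(some_column); only the matching row survives."""
    return conn.execute(
        """
        SELECT t.*
        FROM abc AS t
        JOIN (SELECT id, MIN(some_column) AS m
              FROM abc
              GROUP BY id) AS g
          ON g.id = t.id
         AND g.m = t.some_column
        ORDER BY t.id
        """
    ).fetchall()


print(first_row_per_id(conn))
```

If two rows shared the same (id, some_column), both would be returned, which is why the uniqueness condition matters.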

Use GROUP BY instead of DISTINCT:
group by col1, col2, col3
It does the same as DISTINCT.

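SELECT DISTINCT * FROM `some_table`
is absolutely valid syntax.
The error is caused by writing Id, *. In MySQL an unqualified * together with other items in the select list may produce a parse error (you would need abc.* instead), and * already includes the Id column, which is usually unique anyway.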
So what you'll need in your case is just:
SELECT DISTINCT * FROM `abc`
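A small sketch of what DISTINCT * actually does, assuming a hypothetical two-column abc and SQLite as a stand-in: DISTINCT compares whole rows, so only exact duplicates collapse, and an id can still appear more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (id INTEGER, col1 TEXT)")
conn.executemany(
    "INSERT INTO abc VALUES (?, ?)",
    [(1, "x"), (1, "x"), (1, "y"), (2, "z")],
)


def distinct_rows(conn):
    # Only the fully identical pair (1, 'x') is merged; (1, 'y') stays.
    return sorted(conn.execute("SELECT DISTINCT * FROM abc").fetchall())


print(distinct_rows(conn))  # [(1, 'x'), (1, 'y'), (2, 'z')]
```

If the goal is one row per id, the GROUP BY approaches in the first answer are needed instead.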

SELECT * FROM abc WHERE id IN (SELECT DISTINCT id FROM abc);
You can totally do this.
Hope this helps.
Initially I thought it would work, but GROUP BY is the best option. This is the same as doing SELECT * FROM abc. Sorry, guys.

Related

What is the correct MySQL syntax for a query with a COUNT from another table as a column?

Here is a query I've used that almost does what I want:
SELECT *, COUNT(p.prize_id) as number_prizes
FROM tbl_draw d
INNER JOIN tbl_prize p ON p.draw_id=d.draw_id
WHERE d.draw_id={$draw_id}
The key point is that it counts the number of items from tbl_prize that matches on draw_id and presents that number as a new column 'number_prizes' in the result set. For this query, the result set is a single row, because of the final WHERE clause that matches on a specific draw_id.
I want it to return ALL of the rows from tbl_draw with that same calculation per row. My problem is that when I remove the final clause "WHERE d.draw_id={$draw_id}", the result collapses all the rows into one: it only sends back the first such row found in tbl_draw, with 'number_prizes' being a total of all prizes.
How can I phrase the query better?
You need to add a GROUP BY clause naming all columns, e.g.:
<your query>
group by col1, col2, ... -- list all other columns
As said above, you should put the GROUP BY at the end of your query, like this:
SELECT *, COUNT(p.prize_id) as number_prizes
FROM tbl_draw d
INNER JOIN tbl_prize p ON p.draw_id=d.draw_id
WHERE d.draw_id={$draw_id}
GROUP BY col1, col2, col3, coln.....
And if you want the id from tbl_draw, your SELECT must be like:
SELECT d.id AS draw_id, d.*, COUNT(p.prize_id) AS number_prizes
I strongly recommend that you do not use SELECT *, ... here. List all of the columns that form the groups within the SELECT clause, then repeat that list within GROUP BY.
Here's why: "SELECT * will work only until someone adds a column to that table!" Which of course will happen someday.
Then, all of a sudden, your query will start failing, because * references this new column (which presumably you don't care about anyway ...), which is not in the GROUP BY list as required.
Always specify exactly what columns (and aggregate functions ...) your query needs. You'll be very glad you did. (And so will your co-workers!)
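Following that advice, a sketch of the per-draw count with the same explicit columns in SELECT and GROUP BY and no WHERE clause. The tables are hypothetical minimal versions of tbl_draw and tbl_prize (draw_name stands in for the other tbl_draw columns), run on SQLite for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_draw (draw_id INTEGER PRIMARY KEY, draw_name TEXT);
    CREATE TABLE tbl_prize (prize_id INTEGER PRIMARY KEY, draw_id INTEGER);
    INSERT INTO tbl_draw VALUES (1, 'spring'), (2, 'summer'), (3, 'autumn');
    INSERT INTO tbl_prize VALUES (10, 1), (11, 1), (12, 2);
    """
)


def prizes_per_draw(conn):
    # Same column list in SELECT and GROUP BY: one row per draw.
    return conn.execute(
        """
        SELECT d.draw_id, d.draw_name, COUNT(p.prize_id) AS number_prizes
        FROM tbl_draw d
        INNER JOIN tbl_prize p ON p.draw_id = d.draw_id
        GROUP BY d.draw_id, d.draw_name
        ORDER BY d.draw_id
        """
    ).fetchall()


print(prizes_per_draw(conn))
```

With INNER JOIN, draw 3 (no prizes) is left out; a LEFT JOIN would keep it with number_prizes = 0, since COUNT(p.prize_id) ignores NULLs.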

Mysql combine two results and group them by field

I have been trying but it seems I am missing something. I want to combine two results from two tables by a common field.
I would like to group results from these two queries by customer field.
SELECT errors.customer, count(errors.customer) as err_count,severity from errors group by customer,severity;
SELECT customer,sum(size) as Tot_size,count(customer) as Policy_count from backup group by customer;
I have tried this.
SELECT errors.customer, count(errors.customer) as err_count,severity from errors group by customer,severity union all SELECT customer,count(customer) as Policy_count ,sum(size) as Tot_size from backup group by customer;
But for some reason some columns are missing.
You should follow the requirements for union:
The UNION operator is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in each SELECT statement must also be in the same order
Apparently, the above items are not satisfied in your query.
Try something like this:
SELECT q1.customer, Tot_size, Policy_count, err_count, severity
FROM ( SELECT customer, SUM(size) AS Tot_size, COUNT(customer) AS Policy_count
FROM backup GROUP BY customer ) q1
LEFT JOIN ( SELECT customer, COUNT(customer) AS err_count, severity
FROM errors GROUP BY customer, severity ) q2 ON q1.customer = q2.customer
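The corrected LEFT JOIN can be checked with a sketch on hypothetical backup and errors data, using SQLite as a stand-in. Each customer's backup totals are repeated once per severity row, and customers with no errors get NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE backup (customer TEXT, size INTEGER);
    CREATE TABLE errors (customer TEXT, severity TEXT);
    INSERT INTO backup VALUES ('acme', 5), ('acme', 7), ('globex', 3);
    INSERT INTO errors VALUES ('acme', 'high'), ('acme', 'high'), ('acme', 'low');
    """
)


def combined_report(conn):
    return conn.execute(
        """
        SELECT q1.customer, Tot_size, Policy_count, err_count, severity
        FROM (SELECT customer, SUM(size) AS Tot_size, COUNT(customer) AS Policy_count
              FROM backup GROUP BY customer) q1
        LEFT JOIN (SELECT customer, COUNT(customer) AS err_count, severity
                   FROM errors GROUP BY customer, severity) q2
          ON q1.customer = q2.customer
        ORDER BY q1.customer, severity
        """
    ).fetchall()


for row in combined_report(conn):
    print(row)
```

Customers that appear only in errors are dropped, because the LEFT JOIN starts from backup.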
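Your first query returns customer, err_count and severity, while your second returns customer, Policy_count and Tot_size. UNION matches columns by position and names the result columns after the first SELECT, so Policy_count lands under err_count and Tot_size under severity.
In order to use the UNION operator your queries need to have the same number of columns, and the columns in each position should be compatible.
If one query has no corresponding column for a position, you can set a default such as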
"'n/a' as severity "
if it should be textual or
"0 as severity "
for a numerical value.
Cheers Martin
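A sketch of the padding idea with UNION ALL, on hypothetical data and with SQLite as a stand-in: both SELECTs return the same five columns in the same order, with 0 or 'n/a' filling the positions a query has no data for. The result column names come from the first SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE backup (customer TEXT, size INTEGER);
    CREATE TABLE errors (customer TEXT, severity TEXT);
    INSERT INTO backup VALUES ('acme', 5), ('acme', 7), ('globex', 3);
    INSERT INTO errors VALUES ('acme', 'high'), ('acme', 'high'), ('acme', 'low');
    """
)

STACKED_SQL = """
    SELECT customer, COUNT(customer) AS err_count, severity,
           0 AS Tot_size, 0 AS Policy_count
    FROM errors GROUP BY customer, severity
    UNION ALL
    SELECT customer, 0, 'n/a', SUM(size), COUNT(customer)  -- placeholders first
    FROM backup GROUP BY customer
    ORDER BY 1, 3
"""


def stacked_report(conn):
    cur = conn.execute(STACKED_SQL)
    names = [d[0] for d in cur.description]  # taken from the first SELECT
    return names, cur.fetchall()


print(stacked_report(conn))
```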

GROUP BY clause with non aggregate functions

Why does MySQL allow non-aggregate functions with a GROUP BY clause?
For example, this query works fine:
SELECT col, CHAR_LENGTH(col) FROM table
GROUP BY col
Is it acceptable to use queries like this?
Sometimes it is quite acceptable. Your query, written in more standard SQL, would be something like:
SELECT col, CHAR_LENGTH(col)
FROM (SELECT col FROM table GROUP BY col) c
or as:
SELECT col, MAX(CHAR_LENGTH(col))
FROM table
GROUP BY col
Using non-aggregated columns you can simplify the query a little bit, but the query becomes a little more difficult to read.
It could also be useful when you are sure that all non aggregated columns share the same value:
SELECT id, name, surname
FROM table
GROUP BY id
HAVING COUNT(*)=1
or when it doesn't matter which value you need to return:
SELECT id, name
FROM table
GROUP BY id
will return a single name associated with that id (probably the first name encountered, but we can't be sure which one is first, and ORDER BY doesn't help here...). Be warned that if you want to select multiple non-aggregated columns:
SELECT id, name, surname
FROM table
GROUP BY id
we have no guarantees that the name and surname returned will belong to the same row.
I would prefer not to use this extension, unless you are 100% sure of why you are using it.
MySQL has some "improvements" and tries to run and return results from invalid queries. For queries with non-aggregated columns that are not in the GROUP BY list, a strict RDBMS should throw an error, but MySQL will run the query, group the result by col and put the value from an arbitrarily chosen row into the other column.
If I'm guessing correctly about what you want to do, DISTINCT is a better choice:
SELECT DISTINCT col, CHAR_LENGTH(col) FROM table;
It more clearly indicates to readers what you're trying to accomplish.
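A quick sketch comparing the three forms discussed in this thread, assuming a hypothetical one-column table t (table itself is a reserved word) and SQLite as a stand-in, where LENGTH() plays the role of MySQL's CHAR_LENGTH(). All three return one row per distinct col.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("ab",), ("ab",), ("xyz",)])

QUERIES = {
    "derived": "SELECT col, LENGTH(col) FROM (SELECT col FROM t GROUP BY col) c",
    "max": "SELECT col, MAX(LENGTH(col)) FROM t GROUP BY col",
    "distinct": "SELECT DISTINCT col, LENGTH(col) FROM t",
}


def run_all(conn):
    # Sort each result so the comparison does not depend on row order.
    return {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}


for name, rows in run_all(conn).items():
    print(name, rows)
```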

mysql ORDER BY MIN() not matching up with id

I have a database that has the following columns:
-------------------
id|domain|hit_count
-------------------
And I would like to perform this query on it:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY MIN(hit_count)
I would like this query to give me the id of the row that had the smallest hit_count for $domain. The only problem is that if I have two rows that have the same domain, say www.bestbuy.com, the query will just group by whichever one came first, and then although I will get the correct lowest hit_count, the id may or may not be the id of the row that has the lowest hit_count.
Does anyone know of a way for me to perform this query and to get the id that matches up with MIN(hit_count)? Thanks!
Try this:
SELECT id,MIN(hit_count),domain FROM table GROUP BY domain HAVING domain='$domain'
See, when you're using aggregates, either via aggregate functions (and min() is such a function) or via GROUP BY or HAVING operators, your data is being grouped. In your case it is grouped by domain. You have 2 fields in your select list, id and min(hit_count).
Now, for each group database knows which hit_count to pick, as you've specified this explicitly via the aggregate function. But what about id — which one should be included?
For such fields MySQL returns a value from an arbitrary row of the group, which I find an error-prone approach. In most other RDBMSes you will get an error for such a query.
The rule is: if you use aggregates, then all columns should be either arguments of aggregate functions or arguments of GROUP BY operator.
To achieve the desired result, you need a subquery:
SELECT id, domain, hit_count
FROM `table`
WHERE domain = '$domain'
AND hit_count = (SELECT min(hit_count) FROM `table` WHERE domain = '$domain');
I've used backticks, as table is a reserved word in SQL.
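A runnable sketch of the subquery approach, assuming a hypothetical table named hits (instead of `table`) and SQLite as a stand-in. It binds the domain as a query parameter rather than interpolating $domain into the SQL string, which also avoids SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (id INTEGER, domain TEXT, hit_count INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [(1, "www.bestbuy.com", 40), (2, "www.bestbuy.com", 7), (3, "example.org", 1)],
)


def ids_with_min_hits(conn, domain):
    """Rows of the given domain whose hit_count equals that domain's minimum."""
    return conn.execute(
        """
        SELECT id, domain, hit_count
        FROM hits
        WHERE domain = ?
          AND hit_count = (SELECT MIN(hit_count) FROM hits WHERE domain = ?)
        ORDER BY id
        """,
        (domain, domain),
    ).fetchall()


print(ids_with_min_hits(conn, "www.bestbuy.com"))
```

If several rows tie for the minimum, all of them are returned.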
SELECT
id,
hit_count
FROM
`table`
WHERE
domain='$domain'
AND hit_count = (SELECT MIN(hit_count) FROM `table` WHERE domain='$domain')
Try this:
SELECT id,hit_count
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY hit_count ASC;
This should also work:
select id, MIN(hit_count) from table where domain="$domain";
I had the same question. Please see that question below:
min(column) is not returning me correct data of other columns
You are using a GROUP BY, which means each row in the result represents a group of values.
One of those values is the group name (the value of the field you grouped by). The rest are arbitrary values from within that group.
For example, the following table:
F1 | F2
1  | aa
1  | bb
1  | cc
2  | gg
2  | hh
If you group by F1: SELECT F1,F2 FROM T GROUP BY F1
You will get two rows:
1 and one value from (aa,bb,cc)
2 and one value from (gg,hh)
If you want a deterministic result set, you need to tell the software which algorithm to apply to the group, for example:
MIN
MAX
COUNT
SUM
etc.
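Applying explicit aggregates to the F1/F2 example makes the result deterministic. A small sketch, using SQLite as a stand-in and the same hypothetical table T:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (F1 INTEGER, F2 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, "aa"), (1, "bb"), (1, "cc"), (2, "gg"), (2, "hh")],
)


def per_group(conn):
    # Every F2 reference is wrapped in an aggregate, so nothing is chosen arbitrarily.
    return conn.execute(
        "SELECT F1, MIN(F2), MAX(F2), COUNT(F2) FROM T GROUP BY F1 ORDER BY F1"
    ).fetchall()


print(per_group(conn))  # [(1, 'aa', 'cc', 3), (2, 'gg', 'hh', 2)]
```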
There is a simpler way: your query is OK, just add the DESC keyword after GROUP BY domain (note that ASC/DESC after GROUP BY is deprecated in MySQL 5.7 and no longer supported as of MySQL 8.0.13):
SELECT
id,
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
Explanation:
When you use GROUP BY with an aggregate function it always selects the first record, but if you restrict it with the DESC keyword it will select the lowest or last record of that group.
For testing purposes, use this query, which only adds group_concat:
SELECT
group_concat(id),
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
If you can have duplicated domains, group by id:
SELECT id,MIN(hit_count)
FROM domain WHERE domain='$domain'
GROUP BY id ORDER BY MIN(hit_count)

What's faster, SELECT DISTINCT or GROUP BY in MySQL?

If I have a table
CREATE TABLE users (
id int(10) unsigned NOT NULL auto_increment,
name varchar(255) NOT NULL,
profession varchar(255) NOT NULL,
employer varchar(255) NOT NULL,
PRIMARY KEY (id)
)
and I want to get all unique values of profession field, what would be faster (or recommended):
SELECT DISTINCT u.profession FROM users u
or
SELECT u.profession FROM users u GROUP BY u.profession
?
They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).
If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.
When in doubt, test!
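In the spirit of "test it", a sketch that runs both forms against a hypothetical users table (mirroring the CREATE TABLE above, with SQLite types) and checks that they return the same set. Timings from SQLite say nothing about MySQL; the point is the harness, not the numbers.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "profession TEXT NOT NULL, employer TEXT NOT NULL)"
)
# Hypothetical data: 10,000 users spread over 50 professions.
conn.executemany(
    "INSERT INTO users (name, profession, employer) VALUES (?, ?, ?)",
    [(f"user{i}", f"prof{i % 50}", f"emp{i % 7}") for i in range(10_000)],
)

DISTINCT_SQL = "SELECT DISTINCT u.profession FROM users u"
GROUP_BY_SQL = "SELECT u.profession FROM users u GROUP BY u.profession"


def timed(sql):
    """Return (elapsed seconds, sorted rows) for one query."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, sorted(rows)


d_time, d_rows = timed(DISTINCT_SQL)
g_time, g_rows = timed(GROUP_BY_SQL)
print(f"DISTINCT: {d_time * 1000:.2f} ms, GROUP BY: {g_time * 1000:.2f} ms")
```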
If you have an index on profession, these two are synonyms.
If you don't, then use DISTINCT.
GROUP BY in MySQL sorts results (before MySQL 8.0). You can even do:
SELECT u.profession FROM users u GROUP BY u.profession DESC
and get your professions sorted in DESC order.
DISTINCT creates a temporary table and uses it for storing duplicates. GROUP BY does the same, but sorts the distinct results afterwards.
So
SELECT DISTINCT u.profession FROM users u
is faster, if you don't have an index on profession.
All of the answers above are correct, for the case of DISTINCT on a single column vs GROUP BY on a single column.
Every db engine has its own implementation and optimizations, and if you care about the very little difference (in most cases) then you have to test against specific server AND specific version! As implementations may change...
BUT, if you select more than one column in the query, then the DISTINCT is essentially different! Because in this case it will compare ALL columns of all rows, instead of just one column.
So if you have something like:
-- This will NOT return unique by [id], but unique by (id,name)
SELECT DISTINCT id, name FROM some_query_with_joins
-- This will select unique by [id].
SELECT id, name FROM some_query_with_joins GROUP BY id
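A sketch of the difference, assuming a hypothetical table some_query_with_joins that stands in for the joined result, on SQLite. DISTINCT keeps every distinct (id, name) pair; grouping by id gives one row per id, and wrapping name in MIN() makes the choice of name explicit instead of arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_query_with_joins (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO some_query_with_joins VALUES (?, ?)",
    [(1, "ann"), (1, "bob"), (1, "ann"), (2, "cy")],
)


def distinct_pairs(conn):
    # DISTINCT compares all selected columns, so id 1 appears twice.
    return sorted(conn.execute("SELECT DISTINCT id, name FROM some_query_with_joins"))


def one_per_id(conn):
    # One row per id; MIN(name) states which name we want.
    return conn.execute(
        "SELECT id, MIN(name) FROM some_query_with_joins GROUP BY id ORDER BY id"
    ).fetchall()


print(distinct_pairs(conn))  # [(1, 'ann'), (1, 'bob'), (2, 'cy')]
print(one_per_id(conn))      # [(1, 'ann'), (2, 'cy')]
```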
It is a common mistake to think that the DISTINCT keyword distinguishes rows by the first column you specified, but DISTINCT applies to the whole selected row.
So, people, be careful not to take the answers above as correct for all cases... You might get confused and get the wrong results while all you wanted was to optimize!
Go for the simplest and shortest if you can: DISTINCT seems to be more what you are looking for, because it will give you EXACTLY the answer you need and only that!
Well, DISTINCT can be slower than GROUP BY on some occasions in Postgres (I don't know about other DBs).
Tested example:
postgres=# select count(*) from (select distinct i from g) a;
count
10001
(1 row)
Time: 1563,109 ms
postgres=# select count(*) from (select i from g group by i) a;
count
10001
(1 row)
Time: 594,481 ms
http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I
so be careful ... :)
GROUP BY is more expensive than DISTINCT, since GROUP BY sorts the result while DISTINCT avoids it (this applies to MySQL versions before 8.0, where GROUP BY sorted implicitly). But if you want GROUP BY to yield the same result as DISTINCT, add ORDER BY NULL:
SELECT DISTINCT u.profession FROM users u
is equal to
SELECT u.profession FROM users u GROUP BY u.profession order by null
It seems that the queries are not exactly the same. At least for MySQL.
Compare:
describe select distinct productname from northwind.products
describe select productname from northwind.products group by productname
The second query gives additionally "Using filesort" in Extra.
In MySQL, "Group By" uses an extra step: filesort. I realize DISTINCT is faster than GROUP BY, and that was a surprise.
After heavy testing we came to the conclusion that GROUP BY is faster:
SELECT sql_no_cache
opnamegroep_intern
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13) group by opnamegroep_intern
635 total, 0.0944 seconds
Showing records 0 - 29 (635 total, query took 0.0484 sec)
SELECT sql_no_cache
distinct (opnamegroep_intern)
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13)
635 total, 0.2117 seconds (almost 100% slower)
Showing records 0 - 29 (635 total, query took 0.3468 sec)
(more of a functional note)
There are cases when you have to use GROUP BY, for example if you wanted to get the number of employees per employer:
SELECT u.employer, COUNT(u.id) AS "total employees" FROM users u GROUP BY u.employer
In such a scenario DISTINCT u.employer doesn't work right. Perhaps there is a way, but I just do not know it. (If someone knows how to make such a query with DISTINCT please add a note!)
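A sketch of the count-per-employer case on hypothetical users data, using SQLite as a stand-in: GROUP BY lets COUNT run once per employer, while DISTINCT alone only returns the employer names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, profession TEXT, employer TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, profession, employer) VALUES (?, ?, ?)",
    [("ann", "dev", "acme"), ("bob", "ops", "acme"), ("cy", "dev", "globex")],
)


def employees_per_employer(conn):
    return conn.execute(
        'SELECT u.employer, COUNT(u.id) AS "total employees" '
        "FROM users u GROUP BY u.employer ORDER BY u.employer"
    ).fetchall()


def distinct_employers(conn):
    # DISTINCT deduplicates names but has no per-group aggregate.
    return conn.execute(
        "SELECT DISTINCT u.employer FROM users u ORDER BY u.employer"
    ).fetchall()


print(employees_per_employer(conn))  # [('acme', 2), ('globex', 1)]
print(distinct_employers(conn))      # [('acme',), ('globex',)]
```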
Here is a simple approach (SQL Server) which will print the elapsed time for each query:
DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;
SET @t1 = GETDATE();
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
SET @t1 = GETDATE();
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
Or try SET STATISTICS TIME (Transact-SQL):
SET STATISTICS TIME ON;
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET STATISTICS TIME OFF;
It simply displays the number of milliseconds required to parse, compile, and execute each statement as below:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
SELECT DISTINCT will always be the same as, or faster than, a GROUP BY. On some systems (e.g. Oracle), GROUP BY might be optimized to be the same as DISTINCT for most queries. On others (such as SQL Server), DISTINCT can be considerably faster.
This is not a rule.
For each query, try DISTINCT and then GROUP BY separately, compare the time each query takes to complete, and use the faster one.
In my project I sometimes use GROUP BY and other times DISTINCT.
If you don't need any group functions (SUM, AVG, etc., in case you want to add numeric data to the table), use SELECT DISTINCT. I suspect it's faster, but I have nothing to show for it.
In any case, if you're worried about speed, create an index on the column.
If the problem allows it, try EXISTS, since it is optimized to stop as soon as a match is found (and doesn't buffer any response). So, if you are just trying to normalize data for a WHERE clause like this:
SELECT S.* FROM SOMETHING S WHERE S.ID IN ( SELECT DISTINCT DCR.SOMETHING_ID FROM DIFF_CARDINALITY_RELATIONSHIP DCR ) -- to keep same cardinality
A faster response would be:
SELECT S.* FROM SOMETHING S WHERE EXISTS ( SELECT 1 FROM DIFF_CARDINALITY_RELATIONSHIP DCR WHERE DCR.SOMETHING_ID = S.ID )
This isn't always possible but when available you will see a faster response.
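A sketch checking that the IN and EXISTS forms keep the same cardinality, while a plain JOIN duplicates rows. The tables and data are hypothetical and SQLite is used as a stand-in; it verifies the results, not the speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SOMETHING (ID INTEGER PRIMARY KEY, LABEL TEXT);
    CREATE TABLE DIFF_CARDINALITY_RELATIONSHIP (SOMETHING_ID INTEGER);
    INSERT INTO SOMETHING VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO DIFF_CARDINALITY_RELATIONSHIP VALUES (1), (1), (3);
    """
)

IN_SQL = """SELECT S.* FROM SOMETHING S
            WHERE S.ID IN (SELECT DISTINCT DCR.SOMETHING_ID
                           FROM DIFF_CARDINALITY_RELATIONSHIP DCR)
            ORDER BY S.ID"""
EXISTS_SQL = """SELECT S.* FROM SOMETHING S
                WHERE EXISTS (SELECT 1 FROM DIFF_CARDINALITY_RELATIONSHIP DCR
                              WHERE DCR.SOMETHING_ID = S.ID)
                ORDER BY S.ID"""
# For contrast: a join repeats S rows once per matching relationship row.
JOIN_SQL = """SELECT S.* FROM SOMETHING S
              JOIN DIFF_CARDINALITY_RELATIONSHIP DCR ON DCR.SOMETHING_ID = S.ID
              ORDER BY S.ID"""


def run(sql):
    return conn.execute(sql).fetchall()


print(run(IN_SQL), run(EXISTS_SQL), run(JOIN_SQL))
```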
In MySQL I have found that GROUP BY will treat NULL as distinct, while DISTINCT does not.
I took the exact same DISTINCT query, removed the DISTINCT, added the selected fields as the GROUP BY, and got many more rows because one of the fields was NULL.
So... I tend to believe that there is more to DISTINCT in MySQL.