Mysql query to find distinct rows shows incorrect result - mysql

I wish to find the total number of distinct records in a table.
I have a table with the following columns
id, name, product, rating, manufacturer, price
This has around 128 rows with some duplicates based on different column names.
I only want to select distinct rows:
select distinct name, product, rating, manufacturer, price from table
This returns 47 rows
For pagination purposes, I need to find the total number of distinct records, so I have another statement:
select distinct count(name), product, rating, manufacturer, price from table
But this returns 128 instead of 47.
How can I get the total number of distinct rows? Any help will be much appreciated. Thanks

You have the distinct and count reversed.
SELECT COUNT(DISTINCT column_name) FROM table_name
Also, I would drop the extra fields when counting; otherwise the values returned for those other fields will be unpredictable.

It is not quite clear if you want to get the count in the SAME query with the results or if you want to run a different query. Here go both solutions. In the result as a new column:
select distinct name, product, rating, manufacturer, price, (
select count(*) from (
select distinct name, product, rating, manufacturer, price from table1
) as resultCount) as resultCount
from table1
Notice that the previous solution repeats the count(*) on every row, which is neither efficient nor visually appealing. Instead, try running two queries: one to get the actual data and another to get the number of records that match that data:
select distinct name, product, rating, manufacturer, price from table1
select count(*) from (
select distinct name, product, rating, manufacturer, price from table1
) as result
Hope this helps
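The two-query approach above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere), and the sample rows, the page size, and the ORDER BY column are all made up for illustration:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, product TEXT, "
    "rating INTEGER, manufacturer TEXT, price REAL)"
)
# Hypothetical sample data: 5 rows, only 3 of them distinct once id is ignored.
rows = [
    ("Ann", "Widget", 5, "Acme", 9.99),
    ("Ann", "Widget", 5, "Acme", 9.99),
    ("Bob", "Widget", 4, "Acme", 9.99),
    ("Bob", "Widget", 4, "Acme", 9.99),
    ("Cat", "Gadget", 3, "Globex", 19.5),
]
conn.executemany(
    "INSERT INTO table1 (name, product, rating, manufacturer, price) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

distinct_sql = "SELECT DISTINCT name, product, rating, manufacturer, price FROM table1"

# Query 1: total number of distinct rows, used to size the pager.
total = conn.execute(f"SELECT COUNT(*) FROM ({distinct_sql}) AS result").fetchone()[0]

# Query 2: one page of the distinct rows (page size 2, first page).
page = conn.execute(f"{distinct_sql} ORDER BY name LIMIT ? OFFSET ?", (2, 0)).fetchall()

print(total)  # number of distinct rows
print(page)   # first page of distinct rows
```

The count query wraps the same DISTINCT select as a derived table, so the total stays in step with what the data query returns.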

Try adding a GROUP BY name, product, rating, manufacturer, price clause.

It would require running your actual query TWICE: once as an inner query for the distinct rows, then counting those as a single row returned, and then joining that count to the original select distinct...
select distinct
t1.name,
t1.product,
t1.rating,
t1.manufacturer,
t1.price,
JustTheCount.DistCnt
from
table t1,
( select count(*) as DistCnt
from ( select distinct
t2.name,
t2.product,
t2.rating,
t2.manufacturer,
t2.price
from
table t2 ) DistinctRows
) JustTheCount

In the following query, DISTINCT applies to the whole select list, so you get rows that are distinct across all five columns, not just rows with distinct names:
SELECT DISTINCT name, product, rating, manufacturer, price FROM table
To count the distinct values of a single column, use the following format:
SELECT COUNT(DISTINCT name) FROM table
Notice that DISTINCT goes inside of the COUNT function so that you're counting the distinct names. This counts names only; to count the same rows that the first query returns, count over that query as a derived table, for example SELECT COUNT(*) FROM (SELECT DISTINCT name, product, rating, manufacturer, price FROM table) AS d. You probably don't want to include other, non-aggregated columns in the count query, because their values will come from an arbitrary row of the set.
Most applications will run the count query first, followed by the query to return the results. Also keep in mind that the count may differ from the number of records actually returned if the table changes between the two queries.
SELECT DISTINCT COUNT(name), product FROM table isn't even a valid query in MySQL 4.x. You can't mix aggregate and non-aggregate columns. In 5.x, it'll run unless the ONLY_FULL_GROUP_BY SQL mode is enabled, but the values for the non-aggregate columns will come from an arbitrary row of the set.
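To make the difference between counting distinct values of one column and counting distinct rows concrete, here is a minimal sketch. It runs against SQLite through Python's sqlite3 module rather than MySQL (an assumption made for portability), and the products table and its rows are hypothetical:

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Ann", "Widget"), ("Ann", "Gadget"), ("Ann", "Widget"), ("Bob", "Widget")],
)

# Counts the different values in the name column only.
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM products"
).fetchone()[0]

# Counts rows that differ in at least one of the selected columns.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, product FROM products) AS d"
).fetchone()[0]

print(distinct_names, distinct_rows)
```

With this data the two numbers differ, because Ann appears with two different products.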

At the risk of sparking some flames here... you could always use
SQL_CALC_FOUND_ROWS as the first part of your SQL. This is very MySQL-specific, though.
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();

Related

MySQL 5.7 How to do GROUP BY with sorting?

Similar to this issue: MySQL 5.7 group by latest record
I'm not sure how to do this properly in 5.7. Also with possibility of 2nd sort column. Working query in 5.6 that I'm trying to replicate in 5.7:
SELECT id FROM test
GROUP BY category
ORDER BY sort1 DESC, sort2 DESC
id is not always the highest, so MAX(id) does not work.
Looking into the link above, the solution for single sort should be:
SELECT t1.*
FROM test t1
INNER JOIN (
SELECT category, max(sort) AS sort FROM test GROUP BY category
) t2 ON t2.category = t1.category AND t2.sort = t1.sort
But how would it work with two sort columns?
You are using GROUP BY the wrong way.
Think of GROUP BY as a way to separate rows into different groups. Each group has one or more rows, based on the value of the GROUP BY column.
Once you have those groups, selecting plain table columns (as in: select *) is like picking an arbitrary row from each group. This is not useful.
Usually, once we group records (or rows), we need to find meta information about those records. For example: get the count of records in that group (as in: select count(*)), or the sum of values of a specific column in that group (as in: select sum(price)), or get the min, max or avg values.
So in a nutshell, when you use GROUP BY you should use one of the aggregate functions with it; otherwise it's not going to do you any good.
Why don't you have the ORDER BY at your outer query, instead?
SELECT *
FROM (
SELECT 100 AS id, 1 AS category, NULL AS sort
UNION
SELECT 200 AS id, 1 AS category, 2 AS sort
) dt
GROUP BY category
ORDER BY sort DESC;
It seems that when the data was grouped, MySQL took the first row of each group and ignored the ORDER BY ... DESC. In your first query, the rows were ordered descending first, and then GROUP BY took the first record, which is 200. And yes, this isn't how GROUP BY should be used; it is meant to be used in conjunction with aggregate functions.
When you select a column in a GROUP BY query that is not one of the columns you are grouping by (i.e., your id), you have no control over the value unless you use an aggregate function. If you want a predictable value, use MIN or MAX:
SELECT MAX(id), category FROM `test2`
GROUP BY category; -- always returns 200
SELECT MIN(id), category FROM `test2`
GROUP BY category; -- always returns 100
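For the original question about two sort columns, one possible approach (a sketch, not something proposed in the answers above) is to keep each row for which no other row in the same category ranks higher on (sort1, sort2). The example below runs on SQLite via Python's sqlite3 as a stand-in for MySQL; the sample rows are hypothetical, and NULL sort values are not handled:

```python
import sqlite3

# SQLite in memory stands in for MySQL 5.7 here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, category TEXT, sort1 INTEGER, sort2 INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (1, "a", 1, 5),
        (2, "a", 2, 1),
        (3, "a", 2, 3),    # best in category a: highest sort1, then highest sort2
        (4, "b", 7, 0),
        (5, "b", 7, 9),    # best in category b
        (6, "b", 3, 100),
    ],
)

# Keep a row only if no row in the same category sorts after it
# when comparing sort1 first and sort2 as the tie-breaker.
top_rows = conn.execute(
    """
    SELECT t1.id, t1.category
    FROM test t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM test t2
        WHERE t2.category = t1.category
          AND (t2.sort1 > t1.sort1
               OR (t2.sort1 = t1.sort1 AND t2.sort2 > t1.sort2))
    )
    ORDER BY t1.category
    """
).fetchall()

print(top_rows)
```

If two rows in a category tie on both sort columns, this query returns both of them, so a further tie-breaker (such as id) would be needed to guarantee one row per category.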

Selecting unique data

I can't seem to find a good way to select unique data. Specifically unique values within a query.
Here's an example:
A select distinct query returns 10,000 rows. Within those rows, one column - let's call it vendors - has maybe 6 unique values. How can I return just the 6 unique vendors without scrolling through 10,000 records to make sure I caught them all. Even sorting by vendor this would still be a daunting task.
select distinct vendor from (select [distinct] col1, col2, ..., vendor from your_table) temp;
On the other hand you could ask directly for the distinct vendor, without running the more expensive query:
select distinct vendor from yourtable where {your_criteria}
Maybe you should try giving an alias to the query result that returns 10k rows,
something like (SELECT DISTINCT ... FROM ...) as yourtable,
and then do SELECT DISTINCT yourcolumnname FROM yourtable:
(SELECT DISTINCT * FROM xxx) as yourtable -- this returns your 10k rows and names that result simply yourtable
and then SELECT DISTINCT youruniquecolumn FROM yourtable -- this selects all unique values of that column from your 10k rows
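As a quick check that both forms suggested above give the same list of vendors, here is a small sketch. It uses SQLite through Python's sqlite3 as a stand-in for the asker's database, and the table layout and vendor names are invented for the example:

```python
import sqlite3

# SQLite in memory stands in for the real database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 INTEGER, vendor TEXT)")

# Hypothetical data: 1,000 rows spread over three vendors.
vendors = ["Acme", "Globex", "Initech"]
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(i, vendors[i % 3]) for i in range(1000)],
)

# Direct form: ask only for the distinct vendors.
direct = conn.execute(
    "SELECT DISTINCT vendor FROM your_table ORDER BY vendor"
).fetchall()

# Nested form: distinct vendors taken from the larger distinct query.
nested = conn.execute(
    "SELECT DISTINCT vendor FROM "
    "(SELECT DISTINCT col1, vendor FROM your_table) AS temp ORDER BY vendor"
).fetchall()

print(direct)
print(nested)
```

Both queries return the same vendors; the direct form simply avoids materializing the larger result first.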

select distinct count(id) vs select count(distinct id)

I'm trying to get distinct values from a table. When I ran select distinct count(id) from table I got a count of over a million. However, when I ran select count(distinct id) from table I got only around 300k. What is the difference between the two queries?
Thanks
When you do select distinct count(id) then you are basically doing:
select distinct cnt
from (select count(id) as cnt from t) t;
Because the inner query only returns one row, the distinct is not doing anything. The query counts the number of rows in the table (well, more accurately, the number of rows where id is not null).
On the other hand, when you do:
select count(distinct id)
from t;
Then the query counts the number of different values that id takes on in the table. This would appear to be what you want.
If id is the primary key, the count from distinct count(id) will match the count from count(distinct id).
If id is not the primary key but has a unique constraint (on id alone, not in combination with any other column), count(distinct id) will equal distinct count(id), as in the primary key case.
If id is just another column, select distinct count(id) from table will return one row with the number of records where the id column is NOT NULL, whereas select count(distinct id) from table will return one row with the number of distinct non-NULL ids in the table.
In no case will either count exceed the total number of rows in your table.
The second select is definitely what you want, because it collapses duplicate ids (if you have 10 records with id=5, they will all be counted as one record) and returns how many distinct ids are in the table.
However, the first select will do something odd, and I'm not entirely sure what it will do.
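The behaviour of the two queries can be checked on a tiny table. This is a sketch that uses SQLite via Python's sqlite3 in place of MySQL (an assumption), with made-up id values that include duplicates and one NULL:

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(1,), (1,), (2,), (2,), (2,), (None,), (3,)],
)

# COUNT(id) yields a single row; DISTINCT over one row changes nothing.
distinct_count = conn.execute("SELECT DISTINCT COUNT(id) FROM t").fetchall()

# Counts the different non-NULL values of id.
count_distinct = conn.execute("SELECT COUNT(DISTINCT id) FROM t").fetchall()

# Total rows, including the one with a NULL id.
total_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(distinct_count, count_distinct, total_rows)
```

The first query reports the number of non-NULL ids, the second the number of different ids, and neither exceeds the total row count.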

Mysql select distinct

I am trying to select rows from a MySQL table without duplicates. That works fine, but the problem is that the query won't let me select all the fields, only the field I used with DISTINCT. Let me write out the queries for better understanding:
mysql_query("SELECT DISTINCT ticket_id FROM temp_tickets ORDER BY ticket_id")
mysql_query("SELECT * , DISTINCT ticket_id FROM temp_tickets ORDER BY ticket_id")
The first one works fine.
When I try to select all fields, I end up with errors.
I am trying to select the latest of the duplicates. Say ticket_id 127 appears 3 times, on row ids 7, 8, and 9; I want to select it once, with the latest entry, which would be 9 in this case. The same applies to all the other ticket_ids.
Any ideas?
Thanks
DISTINCT is not a function that applies only to some columns. It's a query modifier that applies to all columns in the select-list.
That is, DISTINCT reduces rows only if all columns are identical to the columns of another row.
DISTINCT must follow immediately after SELECT (along with other query modifiers, like SQL_CALC_FOUND_ROWS). Then following the query modifiers, you can list columns.
RIGHT: SELECT DISTINCT foo, ticket_id FROM table...
Output a row for each distinct pairing of values across ticket_id and foo.
WRONG: SELECT foo, DISTINCT ticket_id FROM table...
If there are three distinct values of ticket_id, would this return only three rows? What if there are six distinct values of foo? Which three values of the six possible values of foo should be output?
It's ambiguous as written.
Are you looking for "SELECT * FROM temp_tickets GROUP BY ticket_id ORDER BY ticket_id"?
UPDATE
SELECT t.*
FROM
(SELECT ticket_id, MAX(id) as id FROM temp_tickets GROUP BY ticket_id) a
INNER JOIN temp_tickets t ON (t.id = a.id)
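The MAX(id) join above can be exercised with the numbers from the question. This sketch runs on SQLite through Python's sqlite3 rather than MySQL (an assumption), and the note column and its values are invented so that there is something besides the ids to select:

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_tickets (id INTEGER PRIMARY KEY, ticket_id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO temp_tickets VALUES (?, ?, ?)",
    [
        (7, 127, "first"),
        (8, 127, "second"),
        (9, 127, "third"),   # latest entry for ticket 127
        (10, 200, "only"),
    ],
)

# The derived table finds the highest id per ticket_id;
# the join then pulls the full row for each of those ids.
latest = conn.execute(
    """
    SELECT t.*
    FROM (SELECT ticket_id, MAX(id) AS id FROM temp_tickets GROUP BY ticket_id) a
    INNER JOIN temp_tickets t ON (t.id = a.id)
    ORDER BY t.ticket_id
    """
).fetchall()

print(latest)
```

This relies on a higher id meaning a later entry, which matches the example in the question but is an assumption about the real table.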
You can use GROUP BY instead of DISTINCT. With DISTINCT, you'll struggle to select all the columns from the table; with GROUP BY, you can get distinct values and also all the fields in the table.
You can use DISTINCT like that
mysql_query("SELECT DISTINCT(ticket_id), column1, column2, column3
FROM temp_tickets
ORDER BY ticket_id");
use a subselect:
http://forums.asp.net/t/1470093.aspx

MySQL DISTINCT before GROUP

I have a MySQL table with 10 fields. For this particular query, I only care about 4: Title, Variables, Location, Date. I would like to take the distinct values of these four groups and then group by Title, Variables. However, When I use the following query
Select DISTINCT
Title,
Variables,
Location,
Date
FROM ForecastsTest2 WHERE ...
GROUP BY Variables, Title
ORDER BY Title
It groups first and then takes distinct results. Is there any way I can switch this order?
I ended up finding my solution in
SELECT Variables,
Title
FROM (SELECT DISTINCT Variables,
Title,
Location,
Date
FROM MyTABLE as Table1
) as Table2
WHERE ...
GROUP BY Variables, Title
ORDER BY TITLE
I guess I didn't do a good job of mentioning this, but I also added a HAVING COUNT(*) >= 2 to the query. In this case, the count happens after all non-distinct rows have been removed.
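The effect of deduplicating before grouping can be seen in a small sketch. It uses SQLite via Python's sqlite3 as a stand-in for MySQL (an assumption), the table name from the question, and hypothetical rows; the WHERE clause from the original query is left out because its conditions were not given:

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ForecastsTest2 (Title TEXT, Variables TEXT, Location TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO ForecastsTest2 VALUES (?, ?, ?, ?)",
    [
        ("T1", "V1", "NY", "2020-01-01"),
        ("T1", "V1", "NY", "2020-01-01"),  # exact duplicate
        ("T1", "V1", "LA", "2020-01-01"),
        ("T2", "V2", "NY", "2020-01-01"),
        ("T2", "V2", "NY", "2020-01-01"),  # exact duplicate
        ("T2", "V2", "NY", "2020-01-01"),  # exact duplicate
    ],
)

# DISTINCT runs in the derived table, so COUNT(*) sees deduplicated rows.
distinct_first = conn.execute(
    """
    SELECT Variables, Title
    FROM (SELECT DISTINCT Variables, Title, Location, Date FROM ForecastsTest2) AS Table2
    GROUP BY Variables, Title
    HAVING COUNT(*) >= 2
    ORDER BY Title
    """
).fetchall()

# Without the derived table, duplicates inflate the per-group count.
group_only = conn.execute(
    """
    SELECT Variables, Title
    FROM ForecastsTest2
    GROUP BY Variables, Title
    HAVING COUNT(*) >= 2
    ORDER BY Title
    """
).fetchall()

print(distinct_first)
print(group_only)
```

Only the group that still has two different rows after deduplication survives the HAVING filter in the first query, while the second query keeps both groups.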