Not selecting duplicates in join / where query - mysql

I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.
For example (added asterisks for clarity)
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.

DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows" - any row value may participate in making the row unique
You are getting phone numbers duplicated because you're only looking at the column in isolation. The database is looking at phone number and also date. The rows you posted have different dates, and these hence cause the rows to be different
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM battle
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get a set of rows where there is only one set of value combinations for anything that is in the group by list. In this case, only unique phone numbers. But, because you want other values as well (I.e. Date) you MUST use what's called an aggregate function, to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
if you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a group by is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregating function is applied to any of the values in the list associated with the key
The simple rule when using group by (and one that MySQL will do implicitly for you) is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
i.e. You can copy paste from your select list to your group list. MySQL does this implicitly for you if you don't do it (i.e. If you use one or more aggregate functions like max in your select list, but forget or omit the group by clause- it will take everything that isn't in an agggregate function and run the grouping as if you'd written it). Whether group by is hence largely redundant is often debated, but there do exist other things you can do with a group by, such as rollup, cube and grouping sets. Also you can group on a column, if that column is used in a deterministic function, without having to group on the result of he deterministic function. Whether there is any point to doing so is a debate for another time :)

You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, hat is the latest date...

Related

mySQL group by function showing lack of data

What I'm after is to see what is the fastest lap time for particular races, which will be identified by using race name and race date.
SELECT lapName AS Name, lapDate AS Date, T
FROM Lap
INNER JOIN Race ON Lap.lapName = Race.Name
AND Lap.lapDate = Race.Date
GROUP BY Date;
It currently only displays 3 different race names, with 4 different dates, meaning I've got 4 combinations total, when there are in fact 9 unique race name, race date combinations.
Unique race data is stored in the Race table. Laptimes are stored in the LapInfo table.
I'm also getting a warning about my group statement saying it is ambiguous though it still runs.
You don't seem to need a join for this:
SELECT l.lapRaceName, l.lapRaceDate,
MIN(l.lapTime)
FROM LapInfo l
GROUP BY l.lapRaceName, l.lapRaceDate;
If you don't need a JOIN, it is superfluous to put one in the query.
First of all, your query is actually invalid SQL. You need to use the MIN function to get the fastest lapTime. Also, you have to GROUP BY lapRaceName, raceDate instead of just lapRaceName. Unfortunately, in this case, mysql is lax enough to execute it without error.
Also, you JOIN LapInfo with Race, and return jthe joined columns from LapInfo that you alias as names that can be found in Race. That's OK from SQL point of view, but that's also usulessly complicated : return directly the columns from the Race table, as they have the names that you are looking for.
Finally, it would be far better to indicate which table each column belongs to. Here, column lapTime belongs to table LapInfo, so let's make it explicit.
Query :
SELECT
Race.raceName,
Race.raceDate,
MIN(LapInfo.lapTime)
FROM
Race
INNER JOIN LapInfo
ON LapInfo.lapRaceName = Race.raceName
AND LapInfo.lapRaceDate = Race.raceDate
GROUP BY
Race.raceName,
Race.raceDate
;

Query for multiple conditions in MySQL

I want to be able to query for multiple statements when I have a table that connects the id's from two other tables.
My three tables
destination:
id_destination, name_destination
keyword:
id_keyword, name_keyword
destination_keyword:
id_keyword, id_destination
Where the last one connects ids from the destination- and the keyword table, in order to associate destination with keywords.
A query to get the destination based on keyword would then look like
SELECT destination.name_destination FROM destination
NATURAL JOIN destination_keyword
NATURAL JOIN keyword
WHERE keyword.name_keyword like _keyword_
Is it possible to query for multiple keywords, let's say I wanted to get the destinations that matches all or some of the keywords in the list sunny, ocean, fishing and order by number of matches. How would I move forward? Should I restructure my tables? I am sort of new to SQL and would very much like some input.
Order your table joins starting with keyword and use a count on the number of time the destination is joined:
select
d.id_destination,
d.name_destination,
count(d.id_destination) as matches
from keyword k
join destination_keyword dk on dk.keyword = k.keyword
join destination d on d.id_destination = dk.id_destination
where name_keyword in ('sunny', 'ocean', 'fishing')
group by 1, 2
order by 3 desc
This query assumes that name_keyword values are single words like "sunny".
Using natural joins is not a good idea, because if the table structures change such that two naturally joined tables get altered to have columns the same name added, suddenly your query will stop working. Also by explicitly declaring the join condition, readers of your code will immediately understand how the tables are jones, and can modify it to add non-key conditions as required.
Requiring that only key columns share the same name is also restrictive, because it requires unnatural column names like "name_keyword" instead of simply "name" - the suffix "_keyword" is redundant and adds no value and exists only because your have to have it because you are using natural joins.
Natural joins save hardly any typing (and often cause more typing over all) and impose limitations on join types and names and are brittle.
They are to be avoided.
You can try something like the following:
SELECT dest.name_destination, count(*) FROM destination dest, destination_keyword dest_key, keyword key
WHERE key.id_keyword = dest_key.id_keyword
AND dest_key.id_destination = dest.id_destination
AND key.name_keyword IN ('sunny', 'ocean', 'fishing')
GROUP BY dest.name_destination
ORDER BY count(*), dest.name_destination
Haven't tested it, but if it is not correct it should show you the way to accomplish it.
You can do multiple LIKE statements:
Column LIKE 'value1' OR Column LIKE 'value2' OR ...
Or you could do a regular expression match:
Column LIKE 'something|somtthing|whatever'
The trick to ordering by number of matches has to do with understanding the GROUP BY clause and the ORDER BY clause. You either want one count for everything, or you want one count per something. So for the first case you just use the COUNT function by itself. In the second case you use the GROUP BY clause to "group" somethings/categories that you want counted. ORDER BY should be pretty straight forward.
I think based on the information you have provided your table structure is fine.
Hope this helps.
DISCLAIMER: My syntax isn't accurate.

MySQL GROUP BY returns only first row

I have a table named forms with the following structure-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query-
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?
As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns including those by which you are not grouping the query) gives you an indeterminate one record from each group.
It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
Btw, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), MAX(). In MySQL it usually returns the first row of each group if you just specify the column names; other databases will not like that though.
Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL, MySQL lets you get away with it and returns only one value for all columns not mentioned in the group by clause. Almost any other database would not perform this query. As a rule, any column, that is not part of the grouping condition must be used with an aggregate function.
as far as mysql is concerned, I just solved my problem by hit & trial.
I had the same problem 10 minutes ago. I was using mysql statement something like this:
SELECT * FROM forms GROUP BY 'ID'; // returns only one row
However using the statement like the following would yeild same result:
SELECT ID FROM forms GROUP BY 'ID'; // returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; // returns more than one row (with one column of field "ID") grouped by ID
or
SELECT * FROM forms GROUP BY ID; // returns more than one row (with columns of all fields) grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; // returns more than one row (with columns of all fields) grouped by ID
Lesson: Donot use semicolon, i believe it does a stringtype search with colons. Remove colons from column name and it will group by its value. However you can use backtick escapes eg. ID
Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query becomes this-
SELECT * FROM forms WHERE GROUP='SomeGroup' ORDER BY 'GROUP'
SELECT * FROM forms GROUP BY `GROUP`
it's strange that your query does work
The above result is kind of correct, but not quite.
All columns you select, which are not part of the GROUP BY statement have to be aggregated by some function (list of aggregation function from the MySQL docu). Most often they are used together with numeric columns.
Besides this, your query will return one output row for every (combination of) attributes in the columns referenced in the GROUP BY statement. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.
Group by clause should only be required if you have any group functions, say max, min, avg, sum, etc, applied in query expressions. Your query does not show any such functions. Meaning you actually not required a Group by clause. And if you still use such clause, you will receive only the first record from a grouped results.
Hence output on your query is perfect.
Query result is perfect; it will return only one row.

How do I use distinct for a column along with a where clause in sql server 2008?

I want to get the distinct value of a particular column however duplicity is not properly managed if more than 3 columns are selected.
The query is:
SELECT DISTINCT
ShoppingSessionId, userid
FROM
dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId, userid
HAVING
userid = 7
This query produces correct result, but if we add another column then result is wrong.
Please help me as I want to use the ShoppingSessionId as a distinct, except when I want to use all the columns from the table, including with the where clause .
How can I do that?
The DISTINCT keyword applies to the entire row, never to a column.
Presently DISTINCT is not needed at all, because your script already makes sure that ShoppingSession is distinct: by specifying the column in GROUP BY and filtering on the other grouping column (userid).
When you add a third column to GROUP BY and it results in duplicated ShoppingSession, it means that some ShoppingSession values are associated with many different values of the added column.
If you want ShoppingSession to remain distinct after including that third column, you should decide which values of the the added column should be left in the output and which should be discarded. This is called aggregating. You could apply the MAX() function to that column, or MIN() or any other suitable aggregate function. Note that the column should not be included in GROUP BY in this case.
Here's an illustration of what I'm talking about:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId,
userid
HAVING userid = 7
There's one more note on your query. The HAVING clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE clause instead:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId,
userid
Although both queries would produce identical results, their efficiency would be different, because the first query would have to pull all rows, group/aggregate them, then discard all rows except userid = 7, but the second one would discard rows first and only then group/aggregate the remaining, which is much more efficient.
You could go even further and exclude the userid column from GROUP BY and pull its value with an aggregate function:
SELECT
ShoppingSessionId,
MAX(userid) AS userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId
Since all userid values in your output are supposed to contain 7 (because that's in your filter), you can just pick a maximum value per every ShoppingSession, knowing that it'll always be 7.

SQL combine COUNT and AVG query with SELECT

I need to get the average rating and the total number of ratings for a particular user and then select all single ratings (rating_value, rating_text, creator) as well:
$rating_query = mysql_query("SELECT COUNT(1) as rating_count
,AVG(rating_value), rating_value, rating_text, creator
FROM user_rating WHERE rated_user = $user_id");
This query would return the COUNT(1) result and the AVG(rating_value) for every row, but I only need those values once.
Is there any way to do this without making 2 separate queries?
There may be a trick I'm not aware of, but I don't think that's possible to do in a single query. You could try using a GROUP BY clause if that would make sense for you, but I'm guessing it probably doesn't from the column names you're using. Any relation requires a single atomic value at any given row and column, even if that value is null. What you are requesting is that columns 1 and 2 in every row but the first have no value, and again I don't think this is possible.