MySQL multiple intersection with self performance - mysql

For simplicity, let's say we have a table with two columns: uid (user id) and fruit, describing what kinds of fruit a user likes.
E.g.:
uid | fruit
----|------------
1 | Strawberry
1 | Orange
2 | Strawberry
2 | Banana
3 | Watermelon
and so forth.
If I want to find what kinds of fruit are common in N particular users (i.e. the intersection N times of the table with itself), the first option is to use an INNER JOIN.
SELECT DISTINCT fruit FROM Fruits f1
INNER JOIN Fruits f2 USING (fruit)
INNER JOIN Fruits f3 USING (fruit)
...
INNER JOIN Fruits fN USING (fruit)
WHERE f1.uid = 1 AND f2.uid = 2 ... AND fN.uid = M
But this kind of looks silly to me. What if N = 10, or even 20? Is it sensible to do 20 joins? Is there some other join operation I'm missing?
Before learning the "magic" of joins, I used another method, which would apply in my current case as follows:
SELECT DISTINCT fruit FROM Fruits
WHERE uid IN (1, 2, ..., M)
GROUP BY fruit
HAVING COUNT(*) = N
It seems much more compact, but I remember somebody telling me to avoid using GROUP BY because it is slower than an INNER JOIN.
So, I guess my question really is, is there maybe a third method for doing the above? If yes/no, which one is the most efficient?
-- EDIT --
So, it seems a similar question has been asked before. The two answers provided there are actually the two methods I'm using.
But the question remains. Which one is really more efficient? Is there, maybe, a third one?
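To compare the two approaches from the question side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it says nothing about MySQL's actual performance. The helper names common_fruits_group_by and common_fruits_joins are made up for this illustration, and COUNT(DISTINCT uid) is used instead of COUNT(*) on the assumption that duplicate (uid, fruit) rows might exist.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; rows follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fruits (uid INTEGER, fruit TEXT)")
conn.executemany(
    "INSERT INTO Fruits (uid, fruit) VALUES (?, ?)",
    [(1, "Strawberry"), (1, "Orange"), (2, "Strawberry"),
     (2, "Banana"), (3, "Watermelon")],
)


def common_fruits_group_by(conn, uids):
    """Fruits liked by every user in uids, via GROUP BY / HAVING."""
    placeholders = ", ".join("?" for _ in uids)
    sql = (
        "SELECT fruit FROM Fruits "
        f"WHERE uid IN ({placeholders}) "
        "GROUP BY fruit "
        # COUNT(DISTINCT uid) guards against duplicate (uid, fruit) rows.
        "HAVING COUNT(DISTINCT uid) = ? "
        "ORDER BY fruit"
    )
    return [row[0] for row in conn.execute(sql, [*uids, len(uids)])]


def common_fruits_joins(conn, uids):
    """Same result via one self-join per extra user, generated in a loop."""
    joins = "".join(
        f" INNER JOIN Fruits f{i} ON f{i}.fruit = f1.fruit AND f{i}.uid = ?"
        for i in range(2, len(uids) + 1)
    )
    sql = (
        f"SELECT DISTINCT f1.fruit FROM Fruits f1{joins} "
        "WHERE f1.uid = ? ORDER BY f1.fruit"
    )
    # Parameters follow placeholder order: joined users first, then f1's user.
    return [row[0] for row in conn.execute(sql, [*uids[1:], uids[0]])]


group_by_result = common_fruits_group_by(conn, [1, 2])
joins_result = common_fruits_joins(conn, [1, 2])
print(group_by_result, joins_result)
```

Both helpers expect a non-empty list of distinct user ids. Which form is faster on a real MySQL server depends on indexes and data distribution, so it is worth checking with EXPLAIN on your own tables.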

Related

MySQL Query which require multiple joins

I have a system that is used to log children's behavior. If a child is naughty, it is logged as a negative, and if the child behaves well, it is logged as a positive.
For instance - if a child is rude it gets a 'Rude' negative and this is logged in the system with minus x points.
My structure can be seen in this sqlfiddle - http://sqlfiddle.com/#!9/46904
In the users_rewards_logged table, the reward_id column is a foreign key linked to either the deductions OR the achievements table, depending on the type column.
If type is 1, it is a deduction reward; if type is 2, it is an achievement reward.
I basically want a query to list out something like this:
+-----------+--------+-------+
| reward    | points | count |
+-----------+--------+-------+
| Good Work | 100    | 1     |
| Rude      | -50    | 2     |
+-----------+--------+-------+
So it tallies up the figures and matches the reward depending on type (1 is a deduction, 2 is an achievement).
What is a good way to do this, based on the sqlfiddle?
Here's a query that gets the above desired results:
SELECT COALESCE(ua.name, ud.name) AS reward,
SUM(url.points) AS points, COUNT(url.logged_id) AS count
FROM users_rewards_logged url
LEFT JOIN users_deductions ud
ON ud.deduction_id = url.reward_id
AND url.type = 1
LEFT JOIN users_achievements ua
ON ua.achievement_id = url.reward_id
AND url.type = 2
GROUP BY url.reward_id, url.type
Your SQLFiddle had the points and type columns in the wrong order for the table users_rewards_logged.
Here's the fixed SQLFiddle with the result:
reward points count
Good Work 100 1
Rude -50 2
Although eggyal is correct--this is rather bad design for your data--what you ask can be done, but requires a UNION clause:
SELECT users_achievements.name, users_rewards_logged.points, COUNT(*)
FROM users_rewards_logged
INNER JOIN users_achievements ON users_achievements.achievement_id = users_rewards_logged.reward_id
WHERE users_rewards_logged.type = 2
GROUP BY 1, 2
UNION
SELECT users_deductions.name, users_rewards_logged.points, COUNT(*)
FROM users_rewards_logged
INNER JOIN users_deductions ON users_deductions.deduction_id = users_rewards_logged.reward_id
WHERE users_rewards_logged.type = 1
GROUP BY 1, 2
There's no reason NOT to combine the achievements and deductions tables and just use non-conflicting codes. If you combined the tables, then you would no longer need the UNION clause--your query would be MUCH simpler.
I noticed that you have two tables (users_deductions and users_achievements) that define the type of reward. As @eggyal stated, you are violating the principle of orthogonal design, which is why your schema lacks normalization.
So, I have combined the tables users_deductions and users_achievements in one table called reward_type.
The result is in this fiddle: http://sqlfiddle.com/#!9/813d5/6

SQL view query not working

These are my tables:
portal_users(id, first_name, last_name, email, ...,)
courses(id, name, location, capacity, ...,)
feedback_questions(id, question, ...,)
feedback_answers(id, answers, IDquestion, IDuser, IDcourse)
I want to do a view with this:
course | first_name | last_name | IDuser | IDcourse | question_1 | answer_1 | question_2 | answer_2
So far
CREATE VIEW feedback_answers_vw
as
SELECT
fa.id,
fa.answer,
fq.question,
pu.first_name,
pu.last_name,
fa.IDuser,
fa.IDcourse
FROM feedback_answers fa
INNER JOIN feedback_questions fq
ON fa.IDquestion = fq.id
INNER JOIN portal_users pu
ON fa.IDuser = pu.id
INNER JOIN courses cu
on fa.IDcourse = cu.id
GROUP BY
fa.IDcourse, fa.IDuser
This just displays one question and its answer, but not all the questions that belong to the same course and user.
I could hardcode this with something like this in the SELECT statement
SELECT
fa.id,
(select question
from feedback_questions
where id = 1) as question_1,
(select question
from feedback_questions
where id = 2) as question_2,
(select question
from feedback_questions
where id = 3) as question_3,
pu.first_name,
pu.last_name,
fa.IDuser,
fa.IDcourse
But I want to do it the right way, so I won't have to change the code every time a question is added.
Edit:
This is data example of my tables:
**Portal users:**
1, tom, hanks, tom_hanks@example.com, ...,
2, steven, spielberg, steven@example.com, ...,
**Courses:**
1, quality, california, 30
2, information technologies, texas, 24
**Questions:**
1, How did you find the course?, ...,
2, Do you want purchase order?, ...,
**Answers:**
1, Internet, 1, 1, 1
2, yes, 2, 1, 1
3, TV, 1, 2, 1,
4, no, 2, 2, 1,
5, Internet, 1, 1, 2
6, yes, 2, 1, 2
This is an example of the data I want to display in the view:
course|first_name|last_name|IDuser|IDcourse|Question_1|Answer_1|Question_2|Answer_2
----------------------------------------------------------------------------------
quality | tom | hanks | 1 | 1 | How did you find the course? | Internet | Do you want purchase order? | yes
quality | steven | spielberg | 2 | 1 | How did you find the course? | TV | Do you want purchase order? | no
Information technologies | tom | hanks | 1 | 2 | How did you find the course? | Internet | Do you want purchase order? | yes
I don't know whether you actually need this as a view, but here is a sample query for it.
Look first at the WHERE clause: the query starts from the answers to question ID = 1. I then join to the courses and portal_users tables to get their fields. Since feedback_answers is the first table, we can immediately get that answer, and a join to the questions table gets the question text.
So how do we expand this query for future questions? Notice the LEFT JOINs right after that, to feedback_answers AGAIN, but this time under the alias "fa2". The JOIN clause matches the same user and the same course, but only for question ID = 2. That alias then joins to the questions table (alias fq2) on fa2.IDquestion. Then comes the exact same setup for question 3. So I am doing nothing but using the same tables under different aliases. Every row starts with question 1; if there is a question 2, get it; if there is a question 3, get it too. Expand as needed, with no GROUP BY.
If you only care about a single course, just add that condition to the outermost WHERE clause; nothing else needs to change. For 10 questions, copy and paste the LEFT JOIN components for the corresponding question IDs.
SELECT
c.name as course,
pu.first_name,
pu.last_name,
fa1.IDUser,
fa1.IDCourse,
fq1.question as question1,
fa1.answers as answer1,
fq2.question as question2,
fa2.answers as answer2,
fq3.question as question3,
fa3.answers as answer3
from
feedback_answers fa1
JOIN courses c
ON fa1.IDCourse = c.id
JOIN portal_users pu
ON fa1.IDUser = pu.id
JOIN feedback_questions fq1
ON fa1.IDquestion = fq1.id
LEFT JOIN feedback_answers fa2
ON fa1.IDUser = fa2.IDUser
AND fa1.IDCourse = fa2.IDCourse
AND fa2.IDquestion = 2
LEFT JOIN feedback_questions fq2
ON fa2.IDquestion = fq2.id
LEFT JOIN feedback_answers fa3
ON fa1.IDUser = fa3.IDUser
AND fa1.IDCourse = fa3.IDCourse
AND fa3.IDquestion = 3
LEFT JOIN feedback_questions fq3
ON fa3.IDquestion = fq3.id
where
fa1.IDquestion = 1
After some sample data was posted, it is clear a pivot table (known in Access/Excel as a 'crosstab') is needed here. We could hard-code a SELECT with some CASE expressions to cover a fixed number of questions, but that solution would have to be revisited each time a question was added or removed. @drapp has such a solution below.
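As a rough illustration of the CASE-based pivot, and of generating it from the questions table so it does not need hand editing, here is a sketch using Python's sqlite3 as a stand-in for MySQL. The helper build_pivot_sql is hypothetical, only the answers are pivoted (question texts are left out for brevity), and the last sample answer is assumed to belong to question 2, which is what the expected output in the question implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portal_users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feedback_questions (id INTEGER PRIMARY KEY, question TEXT);
CREATE TABLE feedback_answers (id INTEGER PRIMARY KEY, answers TEXT,
                               IDquestion INTEGER, IDuser INTEGER, IDcourse INTEGER);
INSERT INTO portal_users VALUES (1, 'tom', 'hanks'), (2, 'steven', 'spielberg');
INSERT INTO courses VALUES (1, 'quality'), (2, 'information technologies');
INSERT INTO feedback_questions VALUES
    (1, 'How did you find the course?'), (2, 'Do you want purchase order?');
-- Assumption: answer 6 belongs to question 2, as the expected output implies.
INSERT INTO feedback_answers VALUES
    (1, 'Internet', 1, 1, 1), (2, 'yes', 2, 1, 1), (3, 'TV', 1, 2, 1),
    (4, 'no', 2, 2, 1), (5, 'Internet', 1, 1, 2), (6, 'yes', 2, 1, 2);
""")


def build_pivot_sql(conn):
    """Build one MAX(CASE ...) column per question currently in the table."""
    question_ids = [row[0] for row in conn.execute(
        "SELECT id FROM feedback_questions ORDER BY id")]
    # Ids come from the database and are forced to int before interpolation.
    # The sketch assumes at least one question exists.
    answer_cols = ",\n       ".join(
        f"MAX(CASE WHEN fa.IDquestion = {int(qid)} THEN fa.answers END)"
        f" AS answer_{int(qid)}"
        for qid in question_ids
    )
    return f"""
SELECT c.name AS course, pu.first_name, pu.last_name, fa.IDuser, fa.IDcourse,
       {answer_cols}
FROM feedback_answers fa
JOIN courses c ON c.id = fa.IDcourse
JOIN portal_users pu ON pu.id = fa.IDuser
GROUP BY c.name, pu.first_name, pu.last_name, fa.IDuser, fa.IDcourse
ORDER BY fa.IDcourse, fa.IDuser
"""


pivot_rows = conn.execute(build_pivot_sql(conn)).fetchall()
for row in pivot_rows:
    print(row)
```

Because the column list is rebuilt on every call, adding a question adds a column without touching the code. The trade-off is that the statement can no longer live in a static view definition.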
However, what we really want is for the number of columns returned by the view to grow and shrink with the number of questions. Unfortunately, MySQL doesn't have any built-in functions to help us build one. The best solution I found online is http://www.artfulsoftware.com/infotree/qrytip.php?id=523. The author proposes building the SQL SELECT statement dynamically and then executing it. It is actually a pretty elegant solution.
Faced with this requirement, I wouldn't do this work in SQL. The problem I foresee is that as the number of questions rises, more and more columns will be returned; your view would quickly slow to a crawl and eventually stop working altogether. I would try to transform the data outside of SQL, perhaps in an ETL tool, application server, or BI tool.
Or, if I had the ability to (and I never do), I would switch database engines. Here are three solutions from other engines that provide tools for creating pivot tables:
Oracle: http://www.oracle.com/technetwork/articles/sql/11g-pivot-097235.html
Postgres: http://www.postgresql.org/docs/9.1/static/tablefunc.html
SQL Server: http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx
We just talked about views in my DB Design class, and my professor said that you can't use a join in a VIEW statement. I can't recall why not, but your issue looks familiar.

Select one value from a group based on order from other columns

Problem
Suppose I have this table tab (fiddle available).
| g | a | b | v |
---------------------
| 1 | 3 | 5 | foo |
| 1 | 4 | 7 | bar |
| 1 | 2 | 9 | baz |
| 2 | 1 | 1 | dog |
| 2 | 5 | 2 | cat |
| 2 | 5 | 3 | horse |
| 2 | 3 | 8 | pig |
I'm grouping rows by g, and for each group I want one value from column v. However, I don't want just any value: I want the value from the row with maximal a, and among all of those, the one with maximal b. In other words, my result should be
| 1 | bar |
| 2 | horse |
Current solution
I know of a query to achieve this:
SELECT grps.g,
(SELECT v FROM tab
WHERE g = grps.g
ORDER BY a DESC, b DESC
LIMIT 1) AS r
FROM (SELECT DISTINCT g FROM tab) grps
Question
But I consider this query rather ugly, mostly because it uses a dependent subquery, which feels like a real performance killer. So I wonder whether there is an easier solution to this problem.
Expected answers
The most likely answer I expect to this question would be some kind of add-on or patch for MySQL (or MariaDB) which does provide a feature for this. But I'll welcome other useful inspirations as well. Anything which works without a dependent subquery would qualify as an answer.
If your solution only works for a single ordering column, i.e. couldn't distinguish between cat and horse, feel free to suggest that answer as well as I expect it to be still useful to the majority of use cases. For example, 100*a+b would be a likely way to order the above data by both columns while still using only a single expression.
I have a few pretty hackish solutions in mind, and might add them after a while, but I'll first look and see whether some nice new ones pour in first.
Benchmark results
As it is pretty hard to compare the various answers just by looking at them, I've run some benchmarks on them. This was run on my own desktop, using MySQL 5.1. The numbers won't compare to any other system, only to one another. You probably should be doing your own tests with your real-life data if performance is crucial to your application. When new answers come in, I might add them to my script, and re-run all the tests.
100,000 items, 1,000 groups to choose from, InnoDb:
0.166s for MvG (from question)
0.520s for RichardTheKiwi
2.199s for xdazz
19.24s for Dems (sequential sub-queries)
48.72s for acatt
100,000 items, 50,000 groups to choose from, InnoDb:
0.356s for xdazz
0.640s for RichardTheKiwi
0.764s for MvG (from question)
51.50s for acatt
too long for Dems (sequential sub-queries)
100,000 items, 100 groups to choose from, InnoDb:
0.163s for MvG (from question)
0.523s for RichardTheKiwi
2.072s for Dems (sequential sub-queries)
17.78s for xdazz
49.85s for acatt
So it seems that my own solution so far isn't all that bad, even with the dependent subquery. Surprisingly, the solution by acatt, which uses a dependent subquery as well and which I therefore would have considered about the same, performs much worse. Probably something the MySQL optimizer can't cope with. The solution RichardTheKiwi proposed seems to have good overall performance as well. The other two solutions depend heavily on the structure of the data. With many small groups, xdazz's approach outperforms all the others, whereas the solution by Dems performs best (though still not exceptionally well) for few large groups.
SELECT g, a, b, v
FROM (
SELECT *,
@rn := IF(g = @g, @rn + 1, 1) rn,
@g := g
FROM (select @g := null, @rn := 0) x,
tab
ORDER BY g, a desc, b desc, v
) X
WHERE rn = 1;
Single pass. All the other solutions look O(n^2) to me.
This approach doesn't use a sub-query.
SELECT t1.g, t1.v
FROM tab t1
LEFT JOIN tab t2 ON t1.g = t2.g AND (t1.a < t2.a OR (t1.a = t2.a AND t1.b < t2.b))
WHERE t2.g IS NULL
Explanation:
The LEFT JOIN works on the basis that when t1.a is at its maximum value, there is no t2.a with a greater value, so the t2 columns will all be NULL.
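To see the anti-join at work on the question's sample data, here is a small sketch. It runs the same query through Python's sqlite3 module as a stand-in for MySQL; the only addition is an ORDER BY so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (g INTEGER, a INTEGER, b INTEGER, v TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", [
    (1, 3, 5, "foo"), (1, 4, 7, "bar"), (1, 2, 9, "baz"),
    (2, 1, 1, "dog"), (2, 5, 2, "cat"), (2, 5, 3, "horse"), (2, 3, 8, "pig"),
])

# A row survives only if no row in the same group beats it on (a, b).
ANTI_JOIN = """
SELECT t1.g, t1.v
FROM tab t1
LEFT JOIN tab t2
  ON t1.g = t2.g
 AND (t1.a < t2.a OR (t1.a = t2.a AND t1.b < t2.b))
WHERE t2.g IS NULL
ORDER BY t1.g
"""

anti_join_rows = conn.execute(ANTI_JOIN).fetchall()
print(anti_join_rows)
```

Note that if two rows in a group tie on both a and b, neither beats the other and both are returned.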
Many RDBMS have constructs that are particularly suited to this problem. MySQL isn't one of them.
This leads you to three basic approaches.
Check each record to see if it is one you want, using a correlated sub-query in an EXISTS clause. (@acatt's answer, but I understand that MySQL doesn't always optimise this very well. Make sure you have a composite index on (g,a,b) before concluding that MySQL can't handle it.)
Do a half cartesian product to fulfil the same check. Any record which does not join is a target record. Where each group ('g') is large, this can quickly degrade performance (if there are 10 records for each unique value of g, this will yield ~50 records and discard 49; for a group size of 100 it yields ~5000 records and discards 4999), but it is great for small group sizes. (@xdazz's answer.)
Or use multiple sub-queries to determine the MAX(a) and then the MAX(b)...
Multiple sequential sub-queries...
SELECT
yourTable.*
FROM
(SELECT g, MAX(a) AS a FROM yourTable GROUP BY g ) AS searchA
INNER JOIN
(SELECT g, a, MAX(b) AS b FROM yourTable GROUP BY g, a) AS searchB
ON searchA.g = searchB.g
AND searchA.a = searchB.a
INNER JOIN
yourTable
ON yourTable.g = searchB.g
AND yourTable.a = searchB.a
AND yourTable.b = searchB.b
Depending on how MySQL optimises the second sub-query, this may or may not be more performant than the other options. It is, however, the longest (and potentially least maintainable) code for the given task.
Assuming a composite index on all three search fields (g, a, b), I would presume it to be best for large group sizes of g. But that should be tested.
For small group sizes of g, I'd go with @xdazz's answer.
EDIT
There is also a brute force approach.
Create an identical table, but with an AUTO_INCREMENT column as an id.
Insert your table into this clone, ordered by g, a, b.
The id's can then be found with SELECT g, MAX(id).
This result can then be used to look-up the v values you need.
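The four steps above can be sketched as follows, again using Python's sqlite3 as a stand-in for MySQL. The clone's name tab_sorted and the alias winners are made up for the example, and SQLite's INTEGER PRIMARY KEY AUTOINCREMENT plays the role of MySQL's AUTO_INCREMENT. The approach relies on ids being handed out in the ORDER BY order of the INSERT ... SELECT, which is worth verifying on your own MySQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (g INTEGER, a INTEGER, b INTEGER, v TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", [
    (1, 3, 5, "foo"), (1, 4, 7, "bar"), (1, 2, 9, "baz"),
    (2, 1, 1, "dog"), (2, 5, 2, "cat"), (2, 5, 3, "horse"), (2, 3, 8, "pig"),
])

# Step 1: an identical table plus an auto-incrementing id.
conn.execute("""
CREATE TABLE tab_sorted (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    g INTEGER, a INTEGER, b INTEGER, v TEXT
)
""")

# Step 2: copy the rows in (g, a, b) order, so the last id per group wins.
conn.execute("""
INSERT INTO tab_sorted (g, a, b, v)
SELECT g, a, b, v FROM tab ORDER BY g, a, b
""")

# Steps 3 and 4: find MAX(id) per group, then look up v for those ids.
brute_force_rows = conn.execute("""
SELECT s.g, s.v
FROM tab_sorted s
JOIN (SELECT g, MAX(id) AS max_id FROM tab_sorted GROUP BY g) winners
  ON winners.max_id = s.id
ORDER BY s.g
""").fetchall()
print(brute_force_rows)
```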
This is unlikely to be the best approach. If it is, it is effectively a condemnation of the ability of MySQL's optimiser to deal with this type of problem.
That said, every engine has its weak spots. So, personally, I try everything until I think I understand how the RDBMS is behaving and can make my choice :)
EDIT
Example using ROW_NUMBER(). (Oracle, SQL Server, PostgreSQL, etc.)
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY g ORDER BY a DESC, b DESC) AS sequence_id,
*
FROM
yourTable
)
AS data
WHERE
sequence_id = 1
This can be solved using a correlated query:
SELECT g, v
FROM tab t
WHERE NOT EXISTS (
SELECT 1
FROM tab
WHERE g = t.g
AND (a > t.a
OR (a = t.a AND b > t.b))
)
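One detail that matters in this NOT EXISTS form is operator precedence: AND binds more tightly than OR, so the OR branch has to sit inside parentheses together with the a > t.a test, otherwise it is evaluated without the g = t.g restriction. The sketch below (Python's sqlite3 standing in for MySQL) runs both variants. The question's sample data happens to give the same result either way, so one hypothetical row, (3, 4, 9, 'owl'), is added to expose the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (g INTEGER, a INTEGER, b INTEGER, v TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", [
    (1, 3, 5, "foo"), (1, 4, 7, "bar"), (1, 2, 9, "baz"),
    (2, 1, 1, "dog"), (2, 5, 2, "cat"), (2, 5, 3, "horse"), (2, 3, 8, "pig"),
    (3, 4, 9, "owl"),  # hypothetical extra row, not in the question's data
])

QUERY = """
SELECT t.g, t.v
FROM tab t
WHERE NOT EXISTS (
    SELECT 1 FROM tab x
    WHERE {condition}
)
ORDER BY t.g
"""

# OR branch grouped with the a-comparison: both branches stay within group g.
grouped = "x.g = t.g AND (x.a > t.a OR (x.a = t.a AND x.b > t.b))"
# No grouping: the OR branch also matches rows from other groups.
ungrouped = "x.g = t.g AND x.a > t.a OR (x.a = t.a AND x.b > t.b)"

with_parens = conn.execute(QUERY.format(condition=grouped)).fetchall()
without_parens = conn.execute(QUERY.format(condition=ungrouped)).fetchall()
print(with_parens)
print(without_parens)
```

Without the parentheses, 'bar' is knocked out by 'owl', a row from a different group that merely shares a = 4 and has a larger b.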

How do I compute a ranking with MySQL stored procedures?

Let's assume we have this very simple table:
|class |student|
---------------
Math Alice
Math Bob
Math Peter
Math Anne
Music Bob
Music Chris
Music Debbie
Music Emily
Music David
Sports Alice
Sports Chris
Sports Emily
.
.
.
Now I want to find out, who I have the most classes in common with.
So basically I want a query that gets as input a list of classes (some subset of all classes)
and returns a list like:
|student |common classes|
Brad 6
Melissa 4
Chris 3
Bob 3
.
.
.
What I'm doing right now is a single query for every class. Merging the results is done on the client side. This is very slow, because I am a very hardworking student and I'm attending around 1000 classes - and so do most of the other students. I'd like to reduce the transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I'd be glad if someone could give me some hints on how to do that.
(note: I'm using a MySQL cluster, because it's a very big school with 1 million classes and several million students)
UPDATE
OK, it's obvious that I'm not a DB expert ;) Getting nearly the same answer four times means it's too easy.
Thank you anyway! I tested the following SQL statement and it's returning what I need, although it is very slow on the cluster (but that will be another question, I guess).
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
But actually I simplified my problem a bit too much, so let's make it a bit harder:
Some classes are more important than others, so they are weighted:
| class | importance |
Music 0.8
Math 0.7
Sports 0.01
English 0.5
...
Additionally, students can be more or less important.
(In case you're wondering what this is all about... it's an analogy. And it's getting worse. So please just accept that fact. It has to do with normalizing.)
|student | importance |
Bob 3.5
Anne 4.2
Chris 0.3
...
This means a simple COUNT() won't do it anymore.
In order to find out who I have the most in common with, I want to do the following:
map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
float myScoreForClassC = getMyScoreForClass(c);
List students = getStudentsAttendingClass(c);
foreach (Student s in students)
{
float studentScoreForClassC = c.classImportance*s.Importance;
studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
}
}
I hope it's not getting too confusing.
I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure, which classes I'm attending.
SELECT
tbl.student,
COUNT(tbl.class) AS common_classes
FROM
tbl
WHERE tbl.class IN (SELECT
sub.class
FROM
tbl AS sub
WHERE
(sub.student = "BEN")) -- substitute "BEN" as appropriate
GROUP BY tbl.student
ORDER BY common_classes DESC;
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
Update re your question update.
Assuming there's a table class_importance and student_importance as you describe above:
SELECT classes.student, SUM(ci.importance*si.importance) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
The only thing this doesn't have is the LEAST(weighted_importance, myScoreForClassC) because I don't know how you calculate that.
Supposing you have another table myScores:
class | score
Math 10
Sports 0
Music 0.8
...
You can combine it all like this (see the extra LEAST inside the SUM):
SELECT classes.student, SUM(LEAST(m.score,ci.importance*si.importance)) -- min
AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class=ci.class
LEFT JOIN student_importance si ON classes.student=si.student
LEFT JOIN myScores m ON classes.class=m.class -- add in myScores
WHERE classes.class in (my_subject_list)
GROUP BY classes.student
ORDER BY weighted_importance DESC
If your myScores didn't have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score,defaultvalue).
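For a quick check of the combined query, here is a sketch run through Python's sqlite3 as a stand-in for MySQL. SQLite has no LEAST(), so its two-argument scalar MIN() takes that role here; on MySQL you would keep LEAST(). The enrollment rows are a subset of the question's sample data, limited to the students whose importance is listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes (class TEXT, student TEXT);
CREATE TABLE class_importance (class TEXT, importance REAL);
CREATE TABLE student_importance (student TEXT, importance REAL);
CREATE TABLE myScores (class TEXT, score REAL);
INSERT INTO classes VALUES ('Math', 'Bob'), ('Music', 'Bob'), ('Math', 'Anne'),
                           ('Music', 'Chris'), ('Sports', 'Chris');
INSERT INTO class_importance VALUES ('Music', 0.8), ('Math', 0.7), ('Sports', 0.01);
INSERT INTO student_importance VALUES ('Bob', 3.5), ('Anne', 4.2), ('Chris', 0.3);
INSERT INTO myScores VALUES ('Math', 10), ('Sports', 0), ('Music', 0.8);
""")

my_subject_list = ["Math", "Music", "Sports"]
placeholders = ", ".join("?" for _ in my_subject_list)

# Scalar MIN(x, y) is SQLite's counterpart of MySQL's LEAST(x, y).
sql = f"""
SELECT classes.student,
       SUM(MIN(m.score, ci.importance * si.importance)) AS weighted_importance
FROM classes
LEFT JOIN class_importance ci ON classes.class = ci.class
LEFT JOIN student_importance si ON classes.student = si.student
LEFT JOIN myScores m ON classes.class = m.class
WHERE classes.class IN ({placeholders})
GROUP BY classes.student
ORDER BY weighted_importance DESC
"""

ranking = conn.execute(sql, my_subject_list).fetchall()
for student, score in ranking:
    print(student, round(score, 3))
```

With these numbers Bob scores min(10, 0.7 * 3.5) + min(0.8, 0.8 * 3.5) = 2.45 + 0.8 = 3.25, which matches what the nested loop in the question would accumulate.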
As I understand your question, you can simply run a query like this:
SELECT `student`, COUNT(`class`) AS `commonClasses`
FROM `classes_to_students`
WHERE `class` IN ('Math', 'Music', 'Sports')
GROUP BY `student`
ORDER BY `commonClasses` DESC
Do you need to specify the classes? Or could you just specify the student? Knowing the student would let you get their classes and then get the list of other students who share those classes.
SELECT
otherStudents.Student,
COUNT(*) AS sharedClasses
FROM
class_student_map AS myClasses
INNER JOIN
class_student_map AS otherStudents
ON otherStudents.class = myClasses.class
AND otherStudents.student != myClasses.student
WHERE
myClasses.student = 'Ben'
GROUP BY
otherStudents.Student
EDIT
To follow up your edit, you just need to join on the new table and do your calculation.
Using the SQL example you gave in the edit...
SELECT
classes_table.student,
MIN(class_importance.importance * student_importance.importance) as rank
FROM
classes_table
INNER JOIN
class_importance
ON classes_table.class = class_importance.class
INNER JOIN
student_importance
ON classes_table.student = student_importance.student
WHERE
classes_table.class in (my_subject_list)
GROUP BY
classes_table.student
ORDER BY
2

GROUP BY does not remove duplicates

I have coded a watchlist system. In the overview of a user's watchlist, they should see a list of records. However, the list shows duplicates, even though the database contains only the correct number of rows.
I've tried GROUP BY watch.watch_id and GROUP BY rec.record_id; none of the groupings I've tried seems to remove the duplicates. I'm not sure what I'm doing wrong.
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.user_id = 1
GROUP BY watch.watch_id
LIMIT 0, 25
The watchlist table looks like this:
+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
| 13 | 1 | 22 | 1314038274 |
| 14 | 1 | 25 | 1314038995 |
+----------+---------+-----------+------------+
GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicates in some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:
SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP by watch.watch_id
EDIT
The OP asked for some clarification.
Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.
First, you might have completely duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 3
3 4 5
Then simply use DISTINCT
SELECT DISTINCT * FROM V;
a b c
- - -
1 2 3
3 4 5
Or, you might have partially duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 6
3 4 5
Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:
SELECT DISTINCT a,b FROM V;
a b
- -
1 2
3 4
Or you could perform some kind of aggregation on them. You could add them up:
SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;
a b tot
- - ---
1 2 9
3 4 5
You could pick the smallest value:
SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;
a b first
- - -----
1 2 3
3 4 5
Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
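The three ways of combining the partially duplicate rows can be checked with a short sketch. It uses Python's sqlite3 as a stand-in for MySQL, with V created as a plain table holding the second example's rows; ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO V VALUES (?, ?, ?)",
                 [(1, 2, 3), (1, 2, 6), (3, 4, 5)])

# Option 1: drop column c, then collapse identical (a, b) pairs.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b FROM V ORDER BY a, b").fetchall()

# Option 2: keep c, but say how to combine it within each (a, b) group.
sum_rows = conn.execute(
    "SELECT a, b, SUM(c) AS tot FROM V GROUP BY a, b ORDER BY a, b").fetchall()
min_rows = conn.execute(
    "SELECT a, b, MIN(c) AS first FROM V GROUP BY a, b ORDER BY a, b").fetchall()

print(distinct_rows)
print(sum_rows)
print(min_rows)
```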
While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.
You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, the exact rows it returns for the non-GROUP BY fields is undefined.
If you wanted to do this with a GROUP BY try something more like the following:
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN est8_records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN est8_members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT watch_id FROM watchlist WHERE user_id = 1
GROUP BY watch_id)
LIMIT 0, 25
I would never recommend using SELECT DISTINCT; it can be really slow on big datasets.
Try using things like EXISTS.
You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.
Also, from the results displayed, they have different records. That looks like a perfectly valid, expected result. If you are trying to select only distinct values, then you don't want to GROUP; you want to select distinct values.
SELECT DISTINCT ...
If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.
To suppress duplicates in your results, either use DISTINCT as @Laykes says, or try
GROUP BY watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.