MySQL get duplicate rows in subquery - mysql

I want to display all duplicate records from my table, rows are like this
uid planet degree
1 1 104
1 2 109
1 3 206
2 1 40
2 2 76
2 3 302
I have many different OR statements with different combinations in subquery and I want to count every one of them which matches, but it only displays the first match of each planet and degree.
Query:
SELECT DISTINCT
p.uid,
(SELECT COUNT(*)
FROM Params AS p2
WHERE p2.uid = p.uid
AND(
(p2.planet = 1 AND p2.degree BETWEEN 320 - 10 AND 320 + 10) OR
(p2.planet = 7 AND p2.degree BETWEEN 316 - 10 AND 316 + 10)
...Some more OR statements...
)
) AS counts FROM Params AS p HAVING counts > 0 ORDER BY p.uid DESC
any solution folks?

updated
So, the problem most people have with their counting-joined-sub-query-group-queries, is that the base query isn't right, and the following may seem like a complete overkill for this question ;o)
base data
in this particular example what you would want as a data basis is at first this:
(uidA, planetA, uidB, planetB) for every combination of player A and player B planets. that one is quite simple (l is for left, r is for right):
SELECT l.uid, l.planet, r.uid, r.planet
FROM params l, params r
first step done.
filter data
now you want to determine if - for one row, meaning one pair of planets - the planets collide (or almost collide). this is where the WHERE comes in.
WHERE ABS(l.degree-r.degree) < 10
would for example only leave those pairs of planets with a difference in degrees of less than 10. more complex stuff is possible (your crazy conditional ...), for example if the planets have different diameters, you may add additional conditions. however, my advice would be that you put some of the additional data that you have in your query into tables.
for example, if all 1st planets players have the same size, you could have a table with (planet_id, size). If every planet can have different sizes, add the size to the params table as a column.
then your WHERE clause could be like:
WHERE ABS(l.degree-r.degree) < l.size+r.size
if for example two big planets with size 5 and 10 should be at least 15 degrees apart, this query would find all those pairs of planets that aren't.
we assume, that you have a nice conditional, so at this point, we have a list of (uidA, planetA, uidB, planetB) of planets, that are close to colliding or colliding (whatever semantics you chose). the next step is to get the data you're actually interested in:
limit uidA to a specific user_id (the currently logged in user for example)
add l.uid = <uid> to your WHERE.
count for every planet A, how many planets B exist, that threaten collision
add GROUP BY l.uid, l.planet,
replace r.uid, r.planet with count(*) as counts in your SELECT clause
then you can even filter: HAVING counts > 1 (HAVING is the WHERE for after you have GROUPed)
and of course, you can
filter out certain players B that may not have planetary interactions with player A
add to your WHERE
r.uid NOT IN (1)
find only self collisions
WHERE l.uid = r.uid
find only non-self collisions
WHERE l.uid <> r.uid
find only collisions with one specific planet
WHERE l.planet = 1
conclusion
a structured approach where you start from the correct base data, then filter it appropriately and then group it, is usually the best approach. if some of the concepts are unclear to you, please read up on them online, there are manuals everywhere
final query could look something like this
SELECT l.uid, l.planet, count(*) as counts
FROM params l, params r
WHERE [ collision-condition ]
GROUP BY l.uid, l.planet
HAVING counts > 0
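As a minimal sketch of the base data / filter / group steps, the following runs the final query against the sample rows from the question in an in-memory SQLite database (SQLite stands in for MySQL here only so the example is runnable). The function name, the max_diff parameter, and the extra condition that stops a row from pairing with itself are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (uid, planet, degree).
ROWS = [(1, 1, 104), (1, 2, 109), (1, 3, 206),
        (2, 1, 40), (2, 2, 76), (2, 3, 302)]


def count_close_pairs(rows, max_diff=10):
    """For each (uid, planet), count the other rows whose degree differs by less than max_diff."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE params (uid INTEGER, planet INTEGER, degree INTEGER)")
    conn.executemany("INSERT INTO params VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT l.uid, l.planet, COUNT(*) AS counts
        FROM params l, params r                           -- base data: every pair of rows
        WHERE ABS(l.degree - r.degree) < ?                -- filter: the collision condition
          AND (l.uid <> r.uid OR l.planet <> r.planet)    -- assumption: skip a row paired with itself
        GROUP BY l.uid, l.planet                          -- group: one output row per planet A
        ORDER BY l.uid, l.planet
        """,
        (max_diff,),
    ).fetchall()
    conn.close()
    return result


close_pairs = count_close_pairs(ROWS)
```

Only planets that have at least one partner survive the join, so a HAVING clause is not needed to drop zero counts in this form.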
if you want to collide a non-planet object, you might want to either make a "virtual table", so instead of FROM params l, params r you do (with possibly different fields, I just assume you add a size-field that is somehow used):
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size) r
multiple:
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size
UNION
SELECT 250 as degree, 3 as planet, 10 as size
UNION ...) r
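The "virtual table" idea can also replace the long chain of OR conditions from the question: each (planet, degree) target becomes one row of a derived table, and a join plus GROUP BY counts every match per uid. This is a rough sketch using in-memory SQLite; the target values below are hypothetical (the question's real list is elided), and count_matches and tolerance are names made up for the example.

```python
import sqlite3

# Sample rows from the question: (uid, planet, degree).
ROWS = [(1, 1, 104), (1, 2, 109), (1, 3, 206),
        (2, 1, 40), (2, 2, 76), (2, 3, 302)]

# Hypothetical targets (planet, degree), chosen so that some sample rows match.
TARGETS = [(1, 100), (2, 110), (3, 300)]


def count_matches(rows, targets, tolerance=10):
    """Count, per uid, how many rows fall within +/- tolerance of a target for the same planet."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE params (uid INTEGER, planet INTEGER, degree INTEGER)")
    conn.executemany("INSERT INTO params VALUES (?, ?, ?)", rows)
    # One "SELECT ? AS planet, ? AS degree" per target, glued together with UNION ALL.
    derived = " UNION ALL ".join("SELECT ? AS planet, ? AS degree" for _ in targets)
    flat_targets = [value for target in targets for value in target]
    sql = f"""
        SELECT p.uid, COUNT(*) AS counts
        FROM params p
        JOIN ({derived}) t
          ON t.planet = p.planet
         AND p.degree BETWEEN t.degree - ? AND t.degree + ?
        GROUP BY p.uid
        ORDER BY p.uid DESC
    """
    result = conn.execute(sql, flat_targets + [tolerance, tolerance]).fetchall()
    conn.close()
    return result
```

With the hypothetical targets above, uid 1 matches on planets 1 and 2 and uid 2 matches on planet 3, so every matching row is counted rather than only the first one.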

Related

Creating a SQL view from tables without UIDs

I have two tables:
match_rating, which have data on a team's performance in a match. There are naturally two tuples for every matchId (since there are two teams to each match). The PK is matchId, teamId.
event, which has information on events during matches. The PK is an autoincremented UID, and it contains the Foreign Keys match_id and subject_team_id as well.
Now I want to create a new view which counts how many times certain events happen in a match, for each team, with fields like this:
But for the life of me I cannot get around the fact that there are 1) two tuples for each match in the match_rating table, and 2) querying the event table on match_id returns events for both teams.
The closest I got was something like this:
SELECT SUM(
CASE
WHEN evt.event_type_id = 101 THEN 1
WHEN evt.event_type_id = 111 THEN 1
WHEN evt.event_type_id = 121 THEN 1
[etc]
END
) AS 'mid_chances',
SUM(
CASE
WHEN evt.event_type_id = 103 THEN 1
WHEN evt.event_type_id = 113 THEN 1
WHEN evt.event_type_id = 123 THEN 1
[etc]
END
) AS 'right_chances',
mr.tactic,
mr.tactic_skill,
mr.bp,
evt.match_id,
evt.subject_team_id
FROM event evt
JOIN match_rating mr
ON evt.match_id = mr.match_id
WHERE evt.event_type_id BETWEEN 100 AND 104 OR
evt.event_type_id BETWEEN 110 AND 114 OR
evt.event_type_id BETWEEN 120 AND 124 OR
[etc]
GROUP BY evt.match_id
ORDER BY `right_chances` DESC
But still, this counts the events twice, reporting 2 events where there was only 1, 6 for 3 events and so on. I have tried grouping on team_id as well (GROUP BY evt.match_id AND team_id) , but that returns only 2 rows with all events counted.
I hope I have made my problem clear, and it should be obvious that I really need a good tip or two.
Edit for clarity (sorry):
Sample data for match_rating table:
Sample data for the event table:
What I would like to see as the result is this:
That is, two tuples for each match, one for each team, where the types of events that team had is summed up. Thanks so much for looking into this!
Update after comments/feedback
OK.. just to confirm, what you want is
Each row of the output represents a team within a match
Other values (other than match_id and team_id) are sums or other aggregations across multiple rows?
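If that is the case, then I believe you should be doing a GROUP BY on both match_id and team_id. This should cause the correct number of rows to be generated (one for each match_id/team_id combination). You say in your question that you have tried it already, but GROUP BY evt.match_id AND team_id groups by the result of a single boolean expression rather than by two columns, which would explain why you got only 2 rows; separate the columns with a comma instead (GROUP BY evt.match_id, evt.subject_team_id). I suggest reviewing it (potentially after also considering the below).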
With your data, it appears that the 'event' table also has a field which indicates the team_id. To ensure you only get the relevant team's events, I suggest your join between match_rating and event be on both fields e.g.,
FROM event evt
JOIN match_rating mr
ON evt.match_id = mr.match_id
AND evt.subject_team_id = mr.team_id
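To see the effect of joining on both keys and grouping by both columns, here is a small sketch against in-memory SQLite. The sample rows are hypothetical (the question's sample data was not preserved), the column list is trimmed to a few fields, and the CASE expressions are written with IN plus ELSE 0 as a shorthand for the repeated WHEN branches.

```python
import sqlite3


def chances_per_team():
    """Return one row per (match_id, team_id) with that team's events counted once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE match_rating (match_id INTEGER, team_id INTEGER, tactic INTEGER,
                                   PRIMARY KEY (match_id, team_id));
        CREATE TABLE event (id INTEGER PRIMARY KEY AUTOINCREMENT, match_id INTEGER,
                            subject_team_id INTEGER, event_type_id INTEGER);
        -- Hypothetical data: one match, two teams, four events.
        INSERT INTO match_rating VALUES (1, 10, 0), (1, 20, 3);
        INSERT INTO event (match_id, subject_team_id, event_type_id) VALUES
            (1, 10, 101), (1, 10, 111), (1, 10, 103), (1, 20, 123);
    """)
    rows = conn.execute("""
        SELECT evt.match_id,
               evt.subject_team_id,
               mr.tactic,
               SUM(CASE WHEN evt.event_type_id IN (101, 111, 121) THEN 1 ELSE 0 END) AS mid_chances,
               SUM(CASE WHEN evt.event_type_id IN (103, 113, 123) THEN 1 ELSE 0 END) AS right_chances
        FROM event evt
        JOIN match_rating mr
          ON evt.match_id = mr.match_id
         AND evt.subject_team_id = mr.team_id        -- only this team's rating row
        GROUP BY evt.match_id, evt.subject_team_id, mr.tactic
        ORDER BY evt.match_id, evt.subject_team_id
    """).fetchall()
    conn.close()
    return rows
```

Because each event now joins to exactly one match_rating row, the sums are no longer doubled, and the comma-separated GROUP BY yields one row per team per match.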
Previous answer - does not answer the question (as per later comments)
Just confirming - the issue is that when you run it, for each match it returns 2 rows - one for each team - but you want to do processing on both teams as one row only?
As such, you could do a few things (e.g., self-join the match rating table to itself, with Team1 ratings and Team2 ratings).
Alternatively, you could modify your FROM to have joins to match_rating twice - where the first has the lower ID for the two teams e.g.,
FROM event evt
JOIN match_rating mr_team1
ON evt.match_id = mr_team1.match_id
JOIN match_rating mr_team2
ON evt.match_id = mr_team2.match_id
AND mr_team1.team_id < mr_team2.team_id
Of course, your processing then needs to be modified to take this into account e.g., one row represents a match, and you have a bunch of data for team1 and similar data for team2. You'd then, I assume, compare the data for team1 columns and team2 columns to get some sort of rating etc (e.g., chance for Team1 to win, etc).

Need a different permutation of groups of numbers

I have numbers from 1 to 36. What I am trying to do is put all these numbers into three groups and works out all various permutations of groups.
Each group must contain 12 numbers, from 1 to 36
A number cannot appear in more than one group, per permutation
Here is an example....
Permutation 1
Group 1: 1,2,3,4,5,6,7,8,9,10,11,12
Group 2: 13,14,15,16,17,18,19,20,21,22,23,24
Group 3: 25,26,27,28,29,30,31,32,33,34,35,36
Permutation 2
Group 1: 1,2,3,4,5,6,7,8,9,10,11,13
Group 2: 12,14,15,16,17,18,19,20,21,22,23,24
Group 3: 25,26,27,28,29,30,31,32,33,34,35,36
Permutation 3
Group 1: 1,2,3,4,5,6,7,8,9,10,11,14
Group 2: 12,13,15,16,17,18,19,20,21,22,23,24
Group 3: 25,26,27,28,29,30,31,32,33,34,35,36
Those are three example, I would expect there to be millions/billions more
The analysis that follows assumes the order of groups matters - that is, if the numbers were 1, 2, 3 then the grouping [{1},{2},{3}] is distinct from the grouping [{3},{2},{1}] (indeed, there are six distinct groupings when taking from this set of numbers).
In your case, how do we proceed? Well, we must first choose the first group. There are 36 choose 12 ways to do this, or (36!)/[(12!)(24!)] = 1,251,677,700 ways. We must then choose the second group. There are 24 choose 12 ways to do this, or (24!)/[(12!)(12!)] = 2,704,156 ways. The third group is then whatever remains. Since the second choice is already conditioned upon the first, we get the total number of ways of taking the three groups by multiplying the numbers; the total number of ways to choose three equal groups of 12 from a pool of 36 is 3,384,731,762,521,200. If you stored each number in one 8-bit byte, then storing every list would take at least ~3.4 petabytes (well, times the size of the list, which is 36 bytes, so more like ~122 petabytes). This is a lot of data and will take some time to generate and no small amount of disk space to store, so be aware of this.
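The arithmetic is easy to check with Python's math.comb (available from Python 3.8). The variable names are only for illustration, and the storage figure assumes one byte per stored number, as in the estimate above.

```python
from math import comb

first_group = comb(36, 12)    # ways to pick the first group of 12 out of 36
second_group = comb(24, 12)   # ways to pick the second group from the remaining 24
total = first_group * second_group   # the third group is whatever is left over

# Assumption: each of the 36 numbers in a grouping is stored in a single byte.
bytes_needed = total * 36
```

The byte count comes out on the order of a hundred petabytes, which is why materializing every grouping is impractical.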
To actually implement this is not so terrible. However, I think you are going to have undue difficulty implementing this in SQL, if it's possible at all. Pure SQL does not have operations that return more than n^2 entries (for a simple cross join) and so getting such huge numbers of results would require a large number of joins. Moreover, it does not strike me as possible to generalize the procedure, since SQL without recursive common table expressions (which MySQL only supports from version 8.0) has no way to do general recursion and therefore cannot do a variable number of joins.
You could use a procedural language to generate the groupings and then write the groupings into a database. I don't know whether this is what you are after.
n = 36
group1[1...12] = []
group2[1...12] = []
group3[1...12] = []
function Choose(input[1...n], m, minIndex, group)
    if minIndex + m > n + 1 then
        return
    if m = 0 then
        if group = group1 then
            Choose(input[1...n], 12, 1, group2)
        else if group = group2 then
            group3[1...12] = input[1...12]
            print group1, group2, group3
        return
    for i = minIndex to n do
        group[12 - m + 1] = input[i]
        Choose(input[1 ... i - 1].input[i + 1 ... n], m - 1, i, group)
When you call this like Choose([1...36], 12, 1, group1) what it does is fill in group1 with all possible ordered subsequences of length 12. At that point, m = 0 and group = group1, so the call Choose([?], 12, 1, group2) is made (for every possible choice of group1, hence the ?). That will choose all remaining ordered subsequences of length 12 for group2, at which point again m = 0 and now group = group2. We may now safely assign group3 to the remaining entries (there is only one way to choose group3 after choosing group1 and group2).
We take ordered subsequences only by propagating the index at which to begin looking on the recursive call (minIndex). We take ordered subsequences to avoid getting permutations of the same set of 12 items (since order doesn't matter within a group).
Each recursive call to Choose in the loop passes input with one element removed: precisely that element that just got added to the group under consideration.
We check for minIndex + m > n + 1 and stop the recursion early because, in this case, we have skipped too many items in the input to be able to ever fill up the current group with 12 items (while choosing the subsequence to be ordered).
You will notice I have hard-coded the assumption of 12/36/3 groups right into the logic of the program. This was done for brevity and clarity, not because you can't parameterize it in the input size N and the number of groups k to form. To do this, you'd need to create an array of groups (k groups of size N/k each), then call Choose with N/k instead of 12 and use a select/switch case statement instead of if/then/else to determine whether to Choose again or print. But those details can be left as an exercise.
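For what it's worth, here is one possible parameterized version in Python. It is a sketch rather than a translation of the pseudocode: it leans on itertools.combinations to produce the ordered subsequences instead of passing a minimum index around, it assumes the items are distinct, and the name groupings is made up for the example. It is a generator, so nothing is stored unless the caller stores it.

```python
from itertools import combinations


def groupings(items, k):
    """Yield every way to split distinct items into k ordered groups of equal size."""
    items = tuple(items)
    if k < 1 or len(items) % k != 0:
        raise ValueError("the number of items must be divisible by k")
    size = len(items) // k

    def rec(remaining, groups_left):
        if groups_left == 1:
            # Only one way to form the last group: take everything that is left.
            yield (remaining,)
            return
        # combinations() keeps input order, so each set of `size` items appears once.
        for group in combinations(remaining, size):
            chosen = set(group)
            rest = tuple(x for x in remaining if x not in chosen)
            for tail in rec(rest, groups_left - 1):
                yield (group,) + tail

    yield from rec(items, k)


# Small usage example: 6 numbers into 3 groups of 2.
small = list(groupings(range(1, 7), 3))
```

Calling groupings(range(1, 37), 3) would enumerate the full 12/36/3 case lazily, though iterating over all of it is not realistic given the count above.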

SQL code need help understanding

I'm working on a movie database and I thought it might be a good idea to have some type of parental control in place for the future. I wrote some SQL code and it works for the most part, but i don't know why.
In my main movies table I have movies rated with the standard rating g, pg, pg-13, r, and NC-17.
Here is the SQL code i used
Select title
From movies
Where rating < "r";
It works, though it still shows the NC-17 shows. If I change the r rating to NC-17, it only shows the g rated shows.
I know I can type out a longer SQL to give me the matches I want, but I want to understand why this code is performing the way it is.
Thanks for the help.
How is MySQL to know R is less than NC-17? MySQL knows how to sort numbers and letters but not movie ratings. You have to assign the ratings numbers and sort based on that.
For example:
Rating Value
------------------
G 1
PG 10
PG-13 20
R 30
NC-17 40
Then give each movie the numerical value of the rating (or use a join) and then sort on that.
SQL doesn't understand the movie rating system. The < operator compares strings in alphabetical order. So when you say < 'R', it's looking for all ratings that sort before R in the alphabet, and NC-17 does because N comes before R. Since there are a limited number of options for ratings, you're best off doing something along the lines of this:
SELECT title
FROM movies
WHERE rating NOT LIKE 'R'
AND rating NOT LIKE 'NC-17'
Here is the query that would probably work for you if you want to rank them:
SELECT m.title, m.rating
FROM
(
SELECT s.title, s.rating,
CASE WHEN s.rating = 'G' THEN 1
WHEN s.rating = 'PG' THEN 2
WHEN s.rating = 'PG-13' THEN 3
WHEN s.rating = 'R' THEN 4
WHEN s.rating = 'NC-17' THEN 5
ELSE 6 END AS MovieRanking
FROM movies s
) m
WHERE m.MovieRanking < 4
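The lookup-table suggestion can be sketched as follows with in-memory SQLite. The ratings table uses the values from the first answer; the movie titles are hypothetical, the function names are made up, and the ratings are stored in upper case as an assumption. The second function reproduces the plain string comparison so the difference is visible.

```python
import sqlite3


def build_db():
    """Create a ratings lookup table and a few hypothetical movies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ratings (rating TEXT PRIMARY KEY, value INTEGER);
        INSERT INTO ratings VALUES ('G', 1), ('PG', 10), ('PG-13', 20), ('R', 30), ('NC-17', 40);
        CREATE TABLE movies (title TEXT, rating TEXT);
        -- Hypothetical titles, one per rating.
        INSERT INTO movies VALUES ('Movie A', 'G'), ('Movie B', 'PG'), ('Movie C', 'PG-13'),
                                  ('Movie D', 'R'), ('Movie E', 'NC-17');
    """)
    return conn


def titles_below(conn, limit_rating):
    """Titles whose rating ranks strictly below limit_rating, using the numeric value."""
    rows = conn.execute("""
        SELECT m.title
        FROM movies m
        JOIN ratings r ON r.rating = m.rating
        WHERE r.value < (SELECT value FROM ratings WHERE rating = ?)
        ORDER BY r.value, m.title
    """, (limit_rating,)).fetchall()
    return [title for (title,) in rows]


def titles_by_string_compare(conn, limit_rating):
    """The original approach: compare the rating strings directly."""
    rows = conn.execute(
        "SELECT title FROM movies WHERE rating < ? ORDER BY title", (limit_rating,)
    ).fetchall()
    return [title for (title,) in rows]
```

Comparing the two functions for 'R' shows the NC-17 title slipping through the string comparison but not the numeric one.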

mysql select rows not in time period (defined by two columns)

The issue looks like that:
I have a table of classes, each with a start_time and an end_time (stored as INT minutes, e.g. 120 is 2:00 and 130 is 2:10). What I need to do is take the classes selected by the user and filter the rest to retrieve the classes that do not collide with the selected ones. Can anyone help with this? Maybe some clue?
sample rows:
id start end
1 0 100
2 50 150
3 160 200
4 50 150
5 50 100
6 200 300
if I have selected id=1 then it should return row 3,6 (it covers with 2,4,5 between 50 and 100 so it's impossible to participate in both classes)
if I have selected id=2,6 then it should return row 3
if I have selected id=2 then it should return rows 3,6
if I have selected id=6 then it should return rows 1,2,3,4,5
if I have selected id=3 then it should return rows 1,2
That gives the non-overlapping ids for one given id:
select c.id from classes c
inner join classes noc on noc.id = <given id>
where c.start > noc.end or c.end < noc.start
;
EDIT:
As far as I understood the extended examples now, you want to give arbitrary subsets of ids as input and want to have all ids which don't overlap with any of them. Let's try:
select c.*
from classes c
left join classes noc on noc.id in (<idlist>)
and noc.start < c.end
and noc.end > c.start
where noc.id is null;
The "<" and ">" might be "<=" and ">=" depending on meaning of "overlap".
You do not sound like you are searching for subclasses, but I think you will find your way through the set jungle! ;-)
Your example "selected id=3" gives 1,2,4,5,6 to me, because 3 overlaps with no one.
Explanation:
Go through classes
Look for classes that overlap with the given classes
And show me only those classes where no overlapping class is found.
The noc stands for "not allowed class". If that class is found, there exists an overlapping to your given set.
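The anti-join can be tried against the sample rows from the question with in-memory SQLite. In this sketch the columns are named start_time and end_time (as in the question's description), the id list is passed as bound parameters, and non_colliding is a name invented for the example.

```python
import sqlite3

# Sample rows from the question: (id, start, end) in minutes.
CLASSES = [(1, 0, 100), (2, 50, 150), (3, 160, 200),
           (4, 50, 150), (5, 50, 100), (6, 200, 300)]


def non_colliding(selected_ids):
    """Return the ids of classes that overlap none of the selected classes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE classes (id INTEGER PRIMARY KEY, start_time INTEGER, end_time INTEGER)"
    )
    conn.executemany("INSERT INTO classes VALUES (?, ?, ?)", CLASSES)
    placeholders = ", ".join("?" for _ in selected_ids)
    rows = conn.execute(f"""
        SELECT c.id
        FROM classes c
        LEFT JOIN classes noc
          ON noc.id IN ({placeholders})
         AND noc.start_time < c.end_time     -- strict comparisons: touching endpoints
         AND noc.end_time > c.start_time     -- do not count as an overlap
        WHERE noc.id IS NULL                 -- keep only classes with no overlapping partner
        ORDER BY c.id
    """, list(selected_ids)).fetchall()
    conn.close()
    return [class_id for (class_id,) in rows]
```

The selected classes overlap themselves, so they drop out of the result automatically.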
Based on hardly anything other than assuming one table as you stated, you will need a self join:
SELECT rest.*
FROM classes AS chosen
JOIN classes AS rest
ON rest.start_time NOT BETWEEN chosen.start_time AND chosen.end_time
AND rest.end_time NOT BETWEEN chosen.start_time AND chosen.end_time
AND chosen.start_time NOT BETWEEN rest.start_time AND rest.end_time
WHERE chosen.ClassID = '4'
I've aliased the tables as 'chosen' for chosen class, and 'rest' for the rest of the class list. This will return all the 'rest' of the classes that don't overlap your chosen class mentioned in the WHERE clause. The third condition is needed to exclude a class that completely contains the chosen one, since neither of its endpoints falls inside the chosen class.

MySQL query for items where average price is less than X?

I'm stumped with how to do the following purely in MySQL, and I've resorted to taking my result set and manipulating it in ruby afterwards, which doesn't seem ideal.
Here's the question. With a dataset of 'items' like:
id state_id price issue_date listed
1 5 450 2011 1
1 5 455 2011 1
1 5 490 2011 1
1 5 510 2012 0
1 5 525 2012 1
...
I'm trying to get something like:
SELECT * FROM items
WHERE ([some conditions], e.g. issue_date >= 2011 and listed=1)
AND state_id = 5
GROUP BY id
HAVING AVG(price) <= 500
ORDER BY price DESC
LIMIT 25
Essentially I want to grab a "group" of items whose average price fall under a certain threshold. I know that my above example "group by" and "having" are not correct since it's just going to give the AVG(price) of that one item, which doesn't really make sense. I'm just trying to illustrate my desired result.
The important thing here is I want all of the individual items in my result set, I don't just want to see one row with the average price, total, etc.
Currently I'm just doing the above query without the HAVING AVG(price) and adding up the individual items one-by-one (in ruby) until I reach the desired average. It would be really great if I could figure out how to do this in SQL. Using subqueries or something clever like joining the table onto itself are certainly acceptable solutions if they work well! Thanks!
UPDATE: In response to Tudor's answer below, here are some clarifications. There is always going to be a target quantity in addition to the target average. And we would always sort the results by price low to high, and by date.
So if we did have 10 items that were all priced at $5 and we wanted to find 5 items with an average < $6, we'd simply return the first 5 items. We wouldn't return the first one only, and we wouldn't return the first 3 grouped with the last 2. That's essentially how my code in ruby is working right now.
I would do almost the inverse of what Jasper provided... Start your query with your criteria to explicitly limit it to the few items that MAY qualify, instead of getting all items and running a sub-select on each entry, which could pose a larger performance hit... I could be wrong, but here's my offering.
select
i2.*
from
( SELECT i.id
FROM items i
WHERE
i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500 ) PreQualify
JOIN items i2
on PreQualify.id = i2.id
AND i2.issue_date > 2011
AND i2.listed = 1
AND i2.state_id = 5
order by
i2.price desc
limit
25
Not sure of the order by, especially if you wanted grouping by item... In addition, I would ensure an index on (state_id, Listed, id, issue_date)
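A quick way to try the pre-qualify idea is an in-memory SQLite table. In this sketch the first five rows are the sample data from the question, the rows for id 2 are hypothetical (added so that something is filtered out), the date filter uses the question's issue_date >= 2011 rather than > 2011, and qualifying_rows and max_average are invented names.

```python
import sqlite3

ITEMS = [  # (id, state_id, price, issue_date, listed)
    (1, 5, 450, 2011, 1), (1, 5, 455, 2011, 1), (1, 5, 490, 2011, 1),
    (1, 5, 510, 2012, 0), (1, 5, 525, 2012, 1),
    # Hypothetical second id whose average is above the threshold.
    (2, 5, 600, 2011, 1), (2, 5, 700, 2012, 1),
]


def qualifying_rows(max_average=500):
    """Return (id, price) rows for ids whose filtered average price is <= max_average."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER, state_id INTEGER, price INTEGER,"
        " issue_date INTEGER, listed INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", ITEMS)
    rows = conn.execute("""
        SELECT i2.id, i2.price
        FROM (SELECT i.id
              FROM items i
              WHERE i.issue_date >= 2011 AND i.listed = 1 AND i.state_id = 5
              GROUP BY i.id
              HAVING AVG(i.price) <= ?) prequalify
        JOIN items i2
          ON prequalify.id = i2.id
         AND i2.issue_date >= 2011 AND i2.listed = 1 AND i2.state_id = 5
        ORDER BY i2.price DESC
        LIMIT 25
    """, (max_average,)).fetchall()
    conn.close()
    return rows
```

The inner query decides which ids qualify, and the outer join brings back the individual rows for those ids under the same filters.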
CLARIFICATION per comments
I think I AM correct on it. Don't confuse "HAVING" clause with "WHERE". WHERE says DO or DONT include based on certain conditions. HAVING means after all the where clauses and grouping is done, the result set will "POTENTIALLY" accept the answer. THEN the HAVING is checked, and if IT STILL qualifies, includes in the result set, otherwise throws it out. Try the following from the INNER query alone... Do once WITHOUT the HAVING clause, then again WITH the HAVING clause...
SELECT i.id, avg( i.price )
FROM items i
WHERE i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500
As you get more into writing queries, try the parts individually to see what you are getting vs what you are thinking... You'll find how / why certain things work. In addition, you are now talking in your updated question about getting multiple IDs and prices at apparent low and high range... yet you are also applying a limit. If you had 20 items, and each had 10 qualifying records, your limit of 25 would show all of the first item and 5 into the second... which is NOT what I think you want... you may want 25 of each qualified "id". That would wrap this query into yet another level...
What MySQL does makes perfect sense. What you want to do does not make sense:
if you have let's say 4 items, each with price of 5 and you put HAVING AVERAGE <= 7 what you say is that the query should return ALL the permutations, like:
{1} - since item with id 1, can be a group by itself
{1,2}
{1,3}
{1,4}
{1,2,3}
{1,2,4}
...
and so on?
Your algorithm of computing the average in ruby is also not valid, if you have items with values 5, 1, 7, 10 - and seek for an average value of less than 7, element with value 10 can be returned just in a group with element of value 1. But, by your algorithm (if I understood correctly), element with value 1 is returned in the first group.
Update
What you want is something like the Knapsack problem and your approach is using some kind of Greedy Algorithm to solve it. I don't think there are straight, easy and correct ways to implement that in SQL.
After a google search, I found this article which tries to solve the knapsack problem with AI written in SQL.
By considering your item price as a weight, having the number of items and the desired average, you could compute the maximum value that can be entered in the 'knapsack' by multiplying desired_cost with number_of_items
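For the simpler variant described in the question's update (a fixed target quantity, items taken from cheapest to most expensive), the budget idea reduces to a single check: the cheapest N items have the lowest possible average, so if their total exceeds desired average times N, no selection of N items can meet the target. A small sketch, with a made-up function name and plain lists of prices standing in for the query result:

```python
def cheapest_within_average(prices, count, max_average):
    """Return the `count` cheapest prices if their average is <= max_average, else None."""
    if count <= 0 or count > len(prices):
        return None
    chosen = sorted(prices)[:count]
    # Budget check: the total may not exceed max_average * count.
    if sum(chosen) <= max_average * count:
        return chosen
    return None
```

This is only the greedy check, not a general knapsack solver; it answers whether the cheapest N items fit and returns them if they do.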
I'm not entirely sure from your question, but I think this is a solution to your problem:
SELECT * FROM items
WHERE (some "conditions", e.g. issue_date > 2011 and listed=1)
AND state_id = 5
AND id IN (SELECT id
FROM items
GROUP BY id
HAVING AVG(price) <= 500)
ORDER BY price DESC
LIMIT 25
note: This is off the top of my head and I haven't done complex SQL in a while, so it might be wrong. I think this or something like it should work, though.