How to determine the first result of a group in MySQL?

So I'm building a collectible card management app, and I have these tables:
"card": contains all the distinct cards ever made and gives their information (id, name, text, and so on).
"edition": contains all the card sets ever released (I called it "edition" because "set" is a reserved word).
"cardinset": since a card can appear in more than one set, this is the associative table between the two. It also gives the number of the card in the set and the number of copies of it I own from that set in French (fr) and in English (en).
Of course, all tables have a unique auto-increment id, called "id".
The purpose of my SQL query is to list all the cards of a set, ordered by the number of the card in the set. I need all the info from the "cardinset" entry (number, fr, en) and of course the info from the "card" entry (name, rarity, etc.), but I also need to know how many copies of each card I have in total (across all sets, not just this one).
My SQL query looks like this (I removed a few fields that weren't important):
SELECT
c.name,
c.rarity,
SUM(cis.fr) + SUM(cis.en) AS available,
cis.number,
cis.fr,
cis.en
FROM
card AS c
INNER JOIN cardinset AS cis ON c.id = cis.cardId
WHERE
c.id IN
(
SELECT
cardId
FROM
cardinset AS cs
WHERE
setId = 104
ORDER BY
number
)
GROUP BY
c.id,
c.name
ORDER BY
cis.number
It almost works, but it doesn't retrieve the right cardinset entry for each card, since it takes the first one of the group, which is not always the one linked to the right set.
Example:
| c.name | c.rarity | available | cis.number | cis.fr | cis.en |
| -------------- | -------- | --------- | ---------- | ------ | ------ |
| Divine Verdict | Common | 9 | 008 | 1 | 1 |
Here, the card info (name and rarity) is correct, as is the "available" field. However, the cis fields are wrong: they come from a cis entry linking this card to another set.
The question is: is it possible to define which entry is the first in the group, and therefore which one is returned in this case? If not, is there another (maybe cleaner) way to get the result I want?
Thank you in advance for your answers; I really don't know what to do here. I guess I've reached the limits of my knowledge of MySQL...
Here's a more precise example. Screenshot n°1 shows the first results of my query (described above); there are 212 results in total. They should be ordered by number, with exactly one result for each number, and yet there are some exceptions:
n° 005, which should be "Divine Verdict", isn't there, because it appears instead as n° 008. That's because that card is part of 6 different sets, as we can see in screenshot n°2 (result of the query "SELECT * FROM cardinset WHERE cardId = 13984"), and the group returns the first entry, which is for set n°12 and not n°104 as I want. However, the "available" field shows "9", which is the result I want: the sum of all the "fr" and "en" fields for that card across all 6 sets it appears in.
There are other cards that don't have the right cardinset info: n° 011 and 019 are missing, but can be found further down with other cardinset info.

I believe this is the way you would want to format your query.
SELECT
c.name,
c.rarity,
cis.fr + cis.en AS available,
cis.number,
cis.fr,
cis.en
FROM
card AS c
INNER JOIN cardinset AS cis ON c.id = cis.cardId
WHERE
c.id IN
(
SELECT
cardId
FROM
cardinset AS cs
WHERE
setId = 104
GROUP BY
setID, cardID
)
ORDER BY
cis.number
The GROUP BY clause was moved into the subselect and modified to make sure each entry is the right card/set combination. I also removed the SUMs because they were not necessary.

I made it at last!
I used a subquery with a GROUP BY clause to get the "available" field. It's long and not very fast, but it gets the job done. If you have an idea that could improve it, don't hesitate to share it.
SELECT e.code, cs.number, sub.name, sub.rarity, cs.fr, cs.en, sub.available
FROM cardinset as cs
INNER JOIN edition as e ON e.id = cs.setId
INNER JOIN (
SELECT c.id, c.name, c.rarity,
SUM(cis.fr)+SUM(cis.en) as available, SUM(cis.frused)+SUM(cis.enused) as used
FROM card as c
INNER JOIN cardinset as cis ON c.id = cis.cardId
WHERE c.id IN (
SELECT cardId
FROM cardinset as cins
WHERE setId = 54)
GROUP BY c.id, c.name
ORDER BY c.id
) AS sub ON cs.cardId = sub.id
WHERE setId = 54
ORDER BY cs.number
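A minimal sketch of the approach above, using Python's built-in `sqlite3` module as a stand-in for MySQL so it can run anywhere: per-card totals are computed in a derived table, and the outer query reads `number`, `fr` and `en` only from the rows of the set being listed, so nothing depends on which row MySQL picks for a group. Table and column names follow the question; the rows (including the second card) are made-up toy data, and the `edition` join and the `frused`/`enused` columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (id INTEGER PRIMARY KEY, name TEXT, rarity TEXT);
CREATE TABLE cardinset (
    id INTEGER PRIMARY KEY,
    cardId INTEGER, setId INTEGER, number TEXT, fr INTEGER, en INTEGER
);
-- Toy data: card 1 is in sets 12 and 104, card 2 only in set 104.
INSERT INTO card VALUES (1, 'Divine Verdict', 'Common'), (2, 'Other Card', 'Rare');
INSERT INTO cardinset (cardId, setId, number, fr, en) VALUES
    (1, 12, '008', 1, 1),
    (1, 104, '005', 3, 4),
    (2, 104, '006', 0, 2);
""")

query = """
SELECT cs.number, sub.name, sub.rarity, cs.fr, cs.en, sub.available
FROM cardinset AS cs
INNER JOIN (
    -- one row per card, summing the copies owned across all sets
    SELECT c.id, c.name, c.rarity, SUM(cis.fr) + SUM(cis.en) AS available
    FROM card AS c
    INNER JOIN cardinset AS cis ON c.id = cis.cardId
    WHERE c.id IN (SELECT cardId FROM cardinset WHERE setId = :set_id)
    GROUP BY c.id, c.name, c.rarity
) AS sub ON cs.cardId = sub.id
WHERE cs.setId = :set_id  -- number/fr/en come only from this set's rows
ORDER BY cs.number
"""
rows = conn.execute(query, {"set_id": 104}).fetchall()
for row in rows:
    print(row)
```

With this data there is exactly one row per number of set 104, and "Divine Verdict" still reports 9 copies in total (1+1 from set 12 plus 3+4 from set 104).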

Related

Reuse body of a mysql query for both count and rows

Because I'm working with a framework (Magento), I don't have direct control of the SQL that is actually executed. I can build various parts of the query, but in different contexts it's modified in different ways before it goes to the database.
Here is a simplified example of what I'm working with.
students
id | name
---+------
1  | paul
2  | james
3  | jo

enrolments
student_id | class
-----------+---------
1          | biology
1          | english
2          | maths
2          | english
2          | french
3          | physics
3          | maths
A query to show all students who are studying English together with all the courses those students are enrolled on, would be:
SELECT name, GROUP_CONCAT(enrolments.class) AS classes
FROM students LEFT JOIN enrolments ON students.id=enrolments.student_id
WHERE students.id IN ( SELECT e.student_id
FROM enrolments AS e
WHERE e.class LIKE "english" )
GROUP BY students.id
This will give the expected results
name| classes
----+----------------------
paul|biology, english
james|maths, english, french
Counting the number of students who study English would be trivial, if it weren't for the fact that Magento automatically reuses portions of my first query. For the count, it modifies my original query as follows:
Removes the columns being selected. This would be the name and classes columns.
Adds a count(*) column to the select
Removes any group by clause
After this butchery, my query above becomes
SELECT COUNT(*)
FROM students LEFT JOIN enrolments ON students.id=enrolments.student_id
WHERE students.id IN ( SELECT e.student_id
FROM enrolments AS e
WHERE e.class LIKE "english" )
Which will not give me the number of students enrolled on the English course as I require. Instead it will give me the combined number of enrolments of all students who are enrolled on the English course.
I'm trying to come up with a query that can be used in both contexts, counting and getting the rows. I get to keep any JOIN clauses and WHERE clauses, and that's about it.
The problem with your original query is the GROUP BY clause. Selecting COUNT(*) by keeping the GROUP BY clause would result in two rows with a number of classes for each user:
| COUNT(*) |
|----------|
| 2 |
| 3 |
Removing the GROUP BY clause will just return the number of all rows from the LEFT JOIN:
| COUNT(*) |
|----------|
| 5 |
The only way I see Magento could solve that problem is to put the original query into a subquery (derived table) and count the rows of the result. But that might end up with terrible performance. I would also be fine with an exception complaining that a query with a GROUP BY clause cannot be used for pagination (or something like that). Just returning an unexpected result is probably the worst thing a library can do.
Well, it just so happens I have a solution. :-)
Use a correlated subquery for GROUP_CONCAT in the SELECT clause. This way you will not need a GROUP BY clause.
SELECT name, (SELECT GROUP_CONCAT(enrolments.class)
FROM enrolments
WHERE enrolments.student_id = students.id
) AS classes
FROM students
WHERE students.id IN ( SELECT e.student_id
FROM enrolments AS e
WHERE e.class LIKE "english" )
However, I would rewrite the query to use an INNER JOIN instead of an IN condition:
SELECT s.name, (
SELECT GROUP_CONCAT(e2.class)
FROM enrolments e2
WHERE e2.student_id = s.id
) AS classes
FROM students s
JOIN enrolments e1
ON e1.student_id = s.id
WHERE e1.class = "english";
Both queries will return the same result as your original one.
| name | classes |
|-------|----------------------|
| paul | biology,english |
| james | maths,english,french |
They also return the correct count when modified by Magento:
| COUNT(*) |
|----------|
| 2 |
Demo: http://rextester.com/OJRU38109
Additionally, chances are good that it will even perform better, because MySQL's optimizer often creates bad execution plans for queries with JOINs and GROUP BY.
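Both contexts can be reproduced with Python's `sqlite3` module (standing in for MySQL) on the sample data from the question. The shared `FROM ... WHERE` part is kept in one string; the row query adds the correlated GROUP_CONCAT, and the count query just puts `COUNT(*)` in front of it, as the framework does. For comparison, the original LEFT JOIN + IN query with its GROUP BY stripped is counted too. GROUP_CONCAT does not guarantee the order of the concatenated values unless you ask for one (MySQL accepts ORDER BY inside GROUP_CONCAT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrolments (student_id INTEGER, class TEXT);
INSERT INTO students VALUES (1, 'paul'), (2, 'james'), (3, 'jo');
INSERT INTO enrolments VALUES
    (1, 'biology'), (1, 'english'),
    (2, 'maths'), (2, 'english'), (2, 'french'),
    (3, 'physics'), (3, 'maths');
""")

# The FROM/JOIN/WHERE part that survives in both contexts.
body = """
FROM students s
JOIN enrolments e1 ON e1.student_id = s.id
WHERE e1.class = 'english'
"""

# Row context: the correlated subquery builds the class list, so no GROUP BY.
rows_sql = """
SELECT s.name,
       (SELECT GROUP_CONCAT(e2.class) FROM enrolments e2
        WHERE e2.student_id = s.id) AS classes
""" + body + " ORDER BY s.id"

# Count context: the select list replaced by COUNT(*), everything else kept.
count_sql = "SELECT COUNT(*)" + body

rows = conn.execute(rows_sql).fetchall()
count = conn.execute(count_sql).fetchone()[0]

# The original query after the framework removed its GROUP BY: counts enrolments.
naive_count = conn.execute("""
SELECT COUNT(*)
FROM students LEFT JOIN enrolments ON students.id = enrolments.student_id
WHERE students.id IN (SELECT e.student_id FROM enrolments AS e WHERE e.class = 'english')
""").fetchone()[0]

print(rows)
print(count, naive_count)
```

The correlated version counts 2 students, while the stripped original counts the 5 enrolments of those students.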

Eliminate certain duplicated rows after group by

With this db:
Chef(cid,cname,age),
Recipe(rid,rname),
Cooked(orderid,cid,rid,price)
Customers(cuid,orderid,time,daytime,age)
[cid means chef id, and so on]
Given orders from customers, I need to find, for each chef, the difference between his/her age and the average age of the people who ordered his/her meals.
I wrote the following query:
select cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch NATURAL JOIN Cooked Co,Customers Cu
where Co.orderid = Cu.orderid
group by cid
This solves the problem, but if you assume that customers have unique ids, it might not work, because one customer can order two meals from the same chef and affect the calculation.
Now, I know it can be answered with NOT EXISTS, but I'm looking for a solution that uses GROUP BY (something similar to what I wrote). So far I couldn't find one (I searched and tried many ways, from SELECT DISTINCT, to manipulating the WHERE clause, to "HAVING COUNT(DISTINCT ...)").
Edit: People asked for an example. I'm coding using SQLFiddle and it crashes a lot, so I'll try my best:
cid | cuid | orderid | Cu.age
-----------------------------
1 333 1 20
1 200 2 40
1 200 5 40
2 4 3 36
Let's say Chef 1's age is 50. My query will give 50 - (20+40+40)/3 = 16 2/3, although it should actually be 50 - (20+40)/2 = 20 (because the customer with id 200 ordered two recipes from our beloved Chef 1).
Assume Chef 2's age is 47. My query will result:
cid | Diff
----------
1 16.667
2 11
Another edit: I wasn't taught any particular SQL dialect, so I really have no idea what the differences are between Oracle, MySQL and Microsoft SQL Server; I'm basically "freestyle" querying. (I hope it will be good in my exam as well :O )
First, you should write your query as:
select ch.cid, ch.age - AVG(cu.age) as Diff
from Chef ch join
Cooked co
on ch.cid = co.cid join
Customers cu
on co.orderid = cu.orderid
group by ch.cid, ch.age;
Two different reasons:
NATURAL JOIN is just a bug waiting to happen. List the columns that you want used for the join, lest an unexpected field or spelling difference affect the results.
Never use commas in the FROM clause. Always use explicit JOIN syntax.
Next, the answer to your question is more complicated. For each chef, we can get the average age of the customers by doing:
select cid, avg(age)
from (select distinct co.cid, cu.cuid, cu.age
from Cooked Co join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid;
Then, for the difference, you need to bring that information in as well. One method is in the subquery:
select cid, ( cage - avg(cuage) ) as diff
from (select distinct co.cid, cu.cuid, cu.age as cuage, ch.age as cage
from Chef ch join
Cooked co
on ch.cid = co.cid join
Customers cu
on co.orderid = cu.orderid
) c
group by cid, cage;
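The per-order average and the per-customer (DISTINCT) average can be compared on the question's example with Python's `sqlite3` module standing in for MySQL. This is a sketch with the aliases written out consistently; the ages are the ones used in the question's worked arithmetic (20 and 40 for Chef 1's customers, 36 for Chef 2's), the `time`/`daytime` columns are omitted, and the recipe ids and prices are dummy values. The DISTINCT step assumes a customer's age is the same on every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Chef (cid INTEGER PRIMARY KEY, cname TEXT, age INTEGER);
CREATE TABLE Cooked (orderid INTEGER, cid INTEGER, rid INTEGER, price REAL);
CREATE TABLE Customers (cuid INTEGER, orderid INTEGER, age INTEGER);
INSERT INTO Chef VALUES (1, 'chef 1', 50), (2, 'chef 2', 47);
INSERT INTO Cooked VALUES (1, 1, 1, 0), (2, 1, 1, 0), (5, 1, 2, 0), (3, 2, 3, 0);
-- customer 200 placed two orders (2 and 5) with chef 1
INSERT INTO Customers VALUES (333, 1, 20), (200, 2, 40), (200, 5, 40), (4, 3, 36);
""")

joins = """
FROM Chef ch
JOIN Cooked co ON ch.cid = co.cid
JOIN Customers cu ON co.orderid = cu.orderid
"""

# One row per order: a customer who ordered twice is counted twice.
per_order = conn.execute(
    "SELECT ch.cid, ch.age - AVG(cu.age)" + joins +
    "GROUP BY ch.cid, ch.age ORDER BY ch.cid").fetchall()

# DISTINCT collapses repeated (chef, customer) pairs before averaging.
per_customer = conn.execute(
    "SELECT cid, cage - AVG(cuage) AS diff FROM ("
    "SELECT DISTINCT co.cid, cu.cuid, cu.age AS cuage, ch.age AS cage" + joins +
    ") c GROUP BY cid, cage ORDER BY cid").fetchall()

print(per_order)
print(per_customer)
```

Chef 1 comes out at about 16.667 per order but 20.0 per customer, matching the numbers in the question; Chef 2 is 11.0 either way.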

MySQL self-join: fixing the result order when a parent does not exist

I have a table named Taxonomy; it pretty much holds everything that has to do with the structure of the organization. As a school example, the table looks like:
id | name | type | parent
1 | 2014 | year | null
2 | igcse| dep | 1
3 | kg1 | grade| 2
4 | c1 | class| 3
4 types: the uppermost is the school year (2014-2015, for example); under it the school department, under that the grade (grade 1, 2, 3, etc.), and under that multiple classes.
When I need to get this, I run a self-join query like this:
SELECT y.name AS year,
d.name AS dep,
g.name AS grade,
c.name AS class,
e.name AS exam
FROM `taxonomys` e
left JOIN `taxonomys` c on c.id = e.parent
left JOIN `taxonomys` g on g.id = c.parent
left JOIN `taxonomys` d on d.id = g.parent
left JOIN `taxonomys` y on y.id = d.parent
where c.type in ('grade','department','class','year')
It's working fine, except for the many NULLs I get!
Example result of the query:
As you can see, the classes show correctly, with the year under the year field,
yet in the first row, the year is under the Class field (shifted 3 cells).
Whenever there is a NULL value, the row is shifted. How can I fix that?
Thanks a lot
EDIT
What is this table?
A variation of the Adjacency List Model; I added an extra column named type so that I can identify the level of any row without having to retrieve the whole path.
The table holds the structure of the school. For example:
Every new year, they create a year, e.g. "2014-2015"; inside this year they create the school's different departments ("American diploma", "Highschool", "Playschool", etc.); under those come the school grades, and under every grade come the classes.
Example
id name type parent
1 2014-2015 year null
2 highschool dep. 1
3 grade 10 grade 2
4 grade 11 grade 2
5 class a class 3
6 class b class 3
These entries mean:
2014-2015
|__ Highschool
    |__ grade 10
        |__ class a
        |__ class b
    |__ grade 11
After that, another table links students to class nodes, another table links posts to classes, and so on.
So this table basically holds the school structure. Since years, departments and grades exist only to organize the school's users (no data except names is needed for them), I decided to keep them all in one table.
Why did you build it with this very, very, very bad/anti-pattern design?
It's actually working very nicely for me!
(We have 4 years with all the departments, students and classes, over 100k posts linked to users/classes and over 10k users, and it's working smoothly so far!)
I don't know what your output is supposed to be, I don't know how your query is supposed to get it, and I don't know how you are encoding what information in your table, but I suspect that the query below gives the result you want, that it's somewhat redundant (namely the last 4 lines), and that your design is really, really, really an anti-pattern.
/*
rows y.n,d.n,g.n,c.n,e.n
where
taxonomies(y.i,y.n,'year',y.p)
AND taxonomies(d.i,d.n,'dep',y.i) AND taxonomies(g.i,g.n,'grade',d.i)
AND taxonomies(c.i,c.n,'class',g.i) AND taxonomies(e.i,e.n,'exam',c.i)
*/
SELECT y.name AS year,
d.name AS dep,
g.name AS grade,
c.name AS class,
e.name AS exam
FROM `taxonomys` y
JOIN `taxonomys` d on y.id = d.parent
JOIN `taxonomys` g on d.id = g.parent
JOIN `taxonomys` c on g.id = c.parent
JOIN `taxonomys` e on c.id = e.parent
WHERE y.type = 'year'
AND d.type = 'dep'
AND g.type = 'grade'
AND c.type = 'class'
AND e.type = 'exam'
They are shifting to the left and you are doing a LEFT JOIN. That's a clue. I suggest changing it to a plain JOIN. When you perform a LEFT JOIN, you get all the rows from the left table together with the matching rows from the right table; where there is no matching row in the right table, its columns are returned as NULL.
The columns shifting places is new to me (I'm not an SQL expert, though), but it seems to me that it could be related to the fact that you are retrieving all the data from the same table, taxonomys.
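A small reproduction with Python's `sqlite3` module (standing in for MySQL), using the example rows from the EDIT; the type of row 2 is written 'dep', as in the first example and the answer's query. It shows where the shift comes from: the question's query treats every row as if it were an exam and walks up from it, so a row that sits higher in the tree leaves the upper columns NULL and its ancestors land in the lower columns. The top-down join pins each alias to one type, so names always land in the right column; it stops at class level because the example data has no exam rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxonomys (id INTEGER PRIMARY KEY, name TEXT, type TEXT, parent INTEGER);
INSERT INTO taxonomys VALUES
    (1, '2014-2015', 'year', NULL),
    (2, 'highschool', 'dep', 1),
    (3, 'grade 10', 'grade', 2),
    (4, 'grade 11', 'grade', 2),
    (5, 'class a', 'class', 3),
    (6, 'class b', 'class', 3);
""")

# The question's bottom-up query: every row plays the role of "e" (the exam).
bottom_up = conn.execute("""
SELECT y.name, d.name, g.name, c.name, e.name
FROM taxonomys e
LEFT JOIN taxonomys c ON c.id = e.parent
LEFT JOIN taxonomys g ON g.id = c.parent
LEFT JOIN taxonomys d ON d.id = g.parent
LEFT JOIN taxonomys y ON y.id = d.parent
WHERE c.type IN ('grade', 'department', 'class', 'year')
""").fetchall()

# Top-down: start from the year and require each level to have the right type.
top_down = conn.execute("""
SELECT y.name, d.name, g.name, c.name
FROM taxonomys y
JOIN taxonomys d ON d.parent = y.id
JOIN taxonomys g ON g.parent = d.id
JOIN taxonomys c ON c.parent = g.id
WHERE y.type = 'year' AND d.type = 'dep'
  AND g.type = 'grade' AND c.type = 'class'
ORDER BY c.name
""").fetchall()

for row in bottom_up:
    print(row)
print(top_down)
```

In the bottom-up result, the row that starts from 'highschool' has '2014-2015' in the class column, which is the "year under the Class field, shifted 3 cells" described in the question.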

MySQL 4-table query

I could use some expert advice with a MYSQL query I'm trying to put together.
What I would like to do:
I'm trying to create a page that will allow users to perform an advanced search across multiple tables.
The 4 tables are:
members, profiles, skills, genre
Members:
*********************************
id | member_id | login | zipcode
*********************************
Profiles:
*********************************************************************
id | member_id | exp | commitment | practice | gigs | availability
*********************************************************************
Skills:
************************************************************************
id | member_id | lead_vocals | background_vocals | guitar | bass| drums
************************************************************************
Genre:
********************************************************************************
id | member_id | alternative | classic_rock | modern_rock | blues | heavy_metal
********************************************************************************
Skills and Genre represent check box values checked or not (1 or 0)
The search form would be a series of checkboxes and dropdowns that would allow a user to specify the specific items they want to search for.
What I need help with:
I need help coming up with the best way to put this query together. I've been reading up on joins, unions, subqueries and derived tables. I can do some basic queries and get part of the data, for example:
SELECT members.member_id FROM members LEFT JOIN skills ON members.member_id = skills.member_id WHERE skills.lead_vocals = 1
However, I just can't seem to wrap my head around putting it all together.
An example of the search criteria would look something like this:
An example of the search criteria: a user fills out the form and wants to search for all members with: (members table) zipcode = 11111 OR zipcode = 22222; (profiles table) commitment = ANY, practice = ANY, gigs = 1, availability = ANY; (skills table) lead_vocals = 1 and lead_guitar = 1; (genre table) alternative = 1, modern_rock = 1, heavy_metal = 1.
Note I already have the logic to calculate the zipcode distance and return a list of zip codes in the range.
At the end of the day the query just needs to return a list of results with member_id and login from the members table that match the criteria.
I'm not looking for somebody to just provide that answer (although I wouldn't mind the answer :)) I learn better by trying to figure it out on my own but I need some help getting started.
Thanks in advance.
The SQL query in the question seems valid, you just need to add the rest of the tables and conditions to get the data you want.
A user fills out the form and wants to search for all members with
(members table) zipcode = 11111 OR zipcode = 22222 (profiles table)
commitment = ANY, practice = ANY, gigs = 1, availability = ANY
(skills table) lead_vocals = 1 and lead_guitar = 1 (genre table)
alternative = 1, modern_rock = 1, heavy_metal = 1
You already put half of the query in your request, except it is written in English and it happens that SQL is just a small subset of English.
The tables you mentioned appear in the FROM clause:
FROM members m
INNER JOIN profiles p USING (member_id)
INNER JOIN skills s USING (member_id)
INNER JOIN genre g USING (member_id)
The conditions appear in the WHERE clause:
WHERE p.gigs = 1
AND s.lead_vocals = 1 AND s.guitar = 1
AND g.alternative = 1 AND g.modern_rock = 1 AND g.heavy_metal = 1
The fields that allow ANY value do not appear in the query; they do not filter the results.
Searching more than one value for zipcode can be done using the IN operator:
AND m.zipcode IN ('11111', '22222')
At the end of the day the query just needs to return a list of results with member_id and login from the members table that match the criteria.
The fields to be returned go in the SELECT clause:
SELECT m.member_id, m.login
Maybe you want to get the list of members in a specific order, for example sorted by their login names:
ORDER BY m.login
... or by some of their skills; put lead vocals in front of the list:
ORDER BY s.lead_vocals DESC
(order DESCending to get those having 1 in front of those having 0 in column lead_vocals)
Now, if we put all together we get the complete query:
SELECT m.member_id, m.login
FROM members m
INNER JOIN profiles p USING (member_id)
INNER JOIN skills s USING (member_id)
INNER JOIN genre g USING (member_id)
WHERE p.gigs = 1
AND s.lead_vocals = 1 AND s.guitar = 1
AND g.alternative = 1 AND g.modern_rock = 1 AND g.heavy_metal = 1
AND m.zipcode IN ('11111', '22222')
ORDER BY s.lead_vocals DESC, m.login
Because you get the information from user input you don't know in advance that practice, for example, is allowed to have ANY value. You need to compose the query from pieces, using the data received from the form.
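One way to compose the query from pieces is sketched below in Python, with `sqlite3` standing in for MySQL. `build_search`, the `FILTER_COLUMNS` whitelist and the `'ANY'` convention are hypothetical names chosen for illustration; column names follow the table definitions in the question (so the guitar checkbox maps to `s.guitar`). Column names are only ever taken from the whitelist, while the user's values go through `?` placeholders, which keeps the composed SQL safe from injection. Many Python MySQL drivers use `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical whitelist: form field -> qualified column. Only these names are
# ever spliced into the SQL text; the values always go through placeholders.
FILTER_COLUMNS = {
    "commitment": "p.commitment",
    "practice": "p.practice",
    "gigs": "p.gigs",
    "lead_vocals": "s.lead_vocals",
    "guitar": "s.guitar",
    "alternative": "g.alternative",
    "modern_rock": "g.modern_rock",
    "heavy_metal": "g.heavy_metal",
}

def build_search(form):
    """Compose (sql, params) from form values; 'ANY' or a missing field adds no condition."""
    conditions, params = [], []
    zipcodes = form.get("zipcodes") or []
    if zipcodes:
        conditions.append("m.zipcode IN (%s)" % ", ".join(["?"] * len(zipcodes)))
        params.extend(zipcodes)
    for field, column in FILTER_COLUMNS.items():
        value = form.get(field, "ANY")
        if value != "ANY":
            conditions.append(column + " = ?")
            params.append(value)
    sql = ("SELECT m.member_id, m.login FROM members m"
           " INNER JOIN profiles p ON p.member_id = m.member_id"
           " INNER JOIN skills s ON s.member_id = m.member_id"
           " INNER JOIN genre g ON g.member_id = m.member_id")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY m.login", params

# Usage with a cut-down copy of the four tables and three made-up members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (id INTEGER PRIMARY KEY, member_id INTEGER, login TEXT, zipcode TEXT);
CREATE TABLE profiles (member_id INTEGER, commitment INTEGER, practice INTEGER, gigs INTEGER);
CREATE TABLE skills (member_id INTEGER, lead_vocals INTEGER, guitar INTEGER);
CREATE TABLE genre (member_id INTEGER, alternative INTEGER, modern_rock INTEGER, heavy_metal INTEGER);
INSERT INTO members VALUES (1, 101, 'alice', '11111'), (2, 102, 'bob', '22222'), (3, 103, 'carol', '33333');
INSERT INTO profiles VALUES (101, 1, 1, 1), (102, 1, 1, 0), (103, 1, 1, 1);
INSERT INTO skills VALUES (101, 1, 1), (102, 1, 1), (103, 1, 1);
INSERT INTO genre VALUES (101, 1, 1, 1), (102, 1, 1, 1), (103, 1, 1, 1);
""")
form = {"zipcodes": ["11111", "22222"], "commitment": "ANY", "practice": "ANY", "gigs": 1,
        "lead_vocals": 1, "guitar": 1, "alternative": 1, "modern_rock": 1, "heavy_metal": 1}
sql, params = build_search(form)
result = conn.execute(sql, params).fetchall()
print(result)
```

Here 'bob' is dropped by gigs = 1 and 'carol' by the zipcode list, while the ANY fields never reach the WHERE clause.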
Try this query.
select
m.*
from
members m
left join profiles p on p.member_id = m.member_id
left join skills s on s.member_id = m.member_id
left join genre g on g.member_id = m.member_id
where (m.zipcode = 11111 OR m.zipcode = 22222) and p.gigs = 1 and s.lead_vocals = 1 and s.guitar = 1 and g.alternative = 1 and g.modern_rock = 1 and g.heavy_metal = 1

MySQL selecting rows with a max id and matching other conditions

Using the tables below as an example and the listed query as a base query, I want to add a way to select only the rows with the max id, without having to do a second query!
TABLE VEHICLES
id vehicleName
----- --------
1 cool car
2 cool car
3 cool bus
4 cool bus
5 cool bus
6 car
7 truck
8 motorcycle
9 scooter
10 scooter
11 bus
TABLE VEHICLE NAMES
nameId vehicleName
------ -------
1 cool car
2 cool bus
3 car
4 truck
5 motorcycle
6 scooter
7 bus
TABLE VEHICLE ATTRIBUTES
nameId attribute
------ ---------
1 FAST
1 SMALL
1 SHINY
2 BIG
2 SLOW
3 EXPENSIVE
4 SHINY
5 FAST
5 SMALL
6 SHINY
6 SMALL
7 SMALL
And the base query:
select a.*
from vehicle a
join vehicle_names b using(vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
group by a.id
having count(distinct c.attribute) = 2;
So what I want to achieve is to select rows that have certain attributes and match a name, but only one entry for each name: the one with the highest id!
So a working solution in this example would return the below rows:
id vehicleName
----- --------
2 cool car
10 scooter
(if it used some sort of MAX on the id).
At the moment I get all the entries for cool car and scooter.
My real-world database follows a similar structure and has tens of thousands of entries, so a query like the one above could easily return 3000+ results. I limit the results to 100 rows to keep execution time low, as the results are used in a search on my site. The reason I have repeats of "vehicles" with the same name and only a different ID is that new models are constantly added, but I keep the older ones around for those who want to dig them up! But on a search by car name I don't want to return the older ones, just the newest one, which is the one with the highest ID!
The correct answer would adapt the query I provided above (the one I'm currently using) so that it only returns rows where the name matches and the id is the highest!
If this isn't possible, suggestions on how I can achieve what I want without massively increasing the execution time of a search would be appreciated!
If you want to keep your logic, here's what I would do:
select a.*
from vehicle a
left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
join vehicle_names b on (a.vehicleName = b.vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct c.attribute) = 2;
Which yields:
+----+-------------+
| id | vehicleName |
+----+-------------+
| 2 | cool car |
| 10 | scooter |
+----+-------------+
2 rows in set (0.00 sec)
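The query above can be checked against the question's sample data with Python's `sqlite3` module standing in for MySQL (SQLite also accepts the non-aggregated `a.vehicleName` next to `GROUP BY a.id`). The `LEFT JOIN ... a2.id IS NULL` pair works as an anti-join: a row survives only if no row with the same name has a larger id. An `ORDER BY` is added only to make the result order stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_names (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_attribs (nameId INTEGER, attribute TEXT);
INSERT INTO vehicle (id, vehicleName) VALUES
    (1, 'cool car'), (2, 'cool car'), (3, 'cool bus'), (4, 'cool bus'), (5, 'cool bus'),
    (6, 'car'), (7, 'truck'), (8, 'motorcycle'), (9, 'scooter'), (10, 'scooter'), (11, 'bus');
INSERT INTO vehicle_names VALUES
    (1, 'cool car'), (2, 'cool bus'), (3, 'car'), (4, 'truck'),
    (5, 'motorcycle'), (6, 'scooter'), (7, 'bus');
INSERT INTO vehicle_attribs VALUES
    (1, 'FAST'), (1, 'SMALL'), (1, 'SHINY'), (2, 'BIG'), (2, 'SLOW'), (3, 'EXPENSIVE'),
    (4, 'SHINY'), (5, 'FAST'), (5, 'SMALL'), (6, 'SHINY'), (6, 'SMALL'), (7, 'SMALL');
""")

latest_matching = conn.execute("""
SELECT a.id, a.vehicleName
FROM vehicle a
-- a2 is any newer row with the same name; only rows without one are kept
LEFT JOIN vehicle a2 ON a.vehicleName = a2.vehicleName AND a.id < a2.id
JOIN vehicle_names b ON a.vehicleName = b.vehicleName
JOIN vehicle_attribs c USING (nameId)
WHERE c.attribute IN ('SMALL', 'SHINY')
  AND a.vehicleName LIKE '%coo%'
  AND a2.id IS NULL
GROUP BY a.id
HAVING COUNT(DISTINCT c.attribute) = 2
ORDER BY a.id
""").fetchall()
print(latest_matching)
```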
As others said, normalization could be done on a few levels:
Keeping your current vehicle_names table as the primary lookup table, I would change:
update vehicle a
inner join vehicle_names b using (vehicleName)
set a.vehicleName = b.nameId;
alter table vehicle change column vehicleName nameId int;
create table attribs (
attribId int auto_increment primary key,
attribute varchar(20),
unique key attribute (attribute)
);
insert into attribs (attribute)
select distinct attribute from vehicle_attribs;
update vehicle_attribs a
inner join attribs b using (attribute)
set a.attribute=b.attribId;
alter table vehicle_attribs change column attribute attribId int;
Which led to the following query:
select a.id, b.vehicleName
from vehicle a
left join vehicle a2 on (a.nameId = a2.nameId and a.id < a2.id)
join vehicle_names b on (a.nameId = b.nameId)
join vehicle_attribs c on (a.nameId=c.nameId)
inner join attribs d using (attribId)
where d.attribute in ('SMALL', 'SHINY')
and b.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct d.attribute) = 2;
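A sketch of the normalized version in SQLite (via Python's `sqlite3`): instead of running the UPDATE/ALTER conversion, whose `CHANGE COLUMN` syntax is MySQL-specific, the converted tables are created directly with the same data. The attribute ids are assigned by hand here; the query is the one above with an `ORDER BY` added for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tables as they would look after the conversion, created directly.
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, nameId INTEGER);
CREATE TABLE vehicle_names (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE attribs (attribId INTEGER PRIMARY KEY, attribute TEXT UNIQUE);
CREATE TABLE vehicle_attribs (nameId INTEGER, attribId INTEGER);
INSERT INTO vehicle_names VALUES
    (1, 'cool car'), (2, 'cool bus'), (3, 'car'), (4, 'truck'),
    (5, 'motorcycle'), (6, 'scooter'), (7, 'bus');
INSERT INTO vehicle (id, nameId) VALUES
    (1, 1), (2, 1), (3, 2), (4, 2), (5, 2), (6, 3), (7, 4), (8, 5), (9, 6), (10, 6), (11, 7);
INSERT INTO attribs (attribId, attribute) VALUES
    (1, 'FAST'), (2, 'SMALL'), (3, 'SHINY'), (4, 'BIG'), (5, 'SLOW'), (6, 'EXPENSIVE');
INSERT INTO vehicle_attribs VALUES
    (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6),
    (4, 3), (5, 1), (5, 2), (6, 3), (6, 2), (7, 2);
""")

normalized = conn.execute("""
SELECT a.id, b.vehicleName
FROM vehicle a
LEFT JOIN vehicle a2 ON (a.nameId = a2.nameId AND a.id < a2.id)
JOIN vehicle_names b ON (a.nameId = b.nameId)
JOIN vehicle_attribs c ON (a.nameId = c.nameId)
INNER JOIN attribs d USING (attribId)
WHERE d.attribute IN ('SMALL', 'SHINY')
  AND b.vehicleName LIKE '%coo%'
  AND a2.id IS NULL
GROUP BY a.id
HAVING COUNT(DISTINCT d.attribute) = 2
ORDER BY a.id
""").fetchall()
print(normalized)
```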
The table does not seem normalized; however, this lets you do this:
select max(id), vehicleName
from VEHICLES
group by vehicleName
having count(*)>=2;
I'm not sure I completely understand your model, but the following query satisfies your requirements as they stand. The first sub query finds the latest version of the vehicle. The second query satisfies your "and" condition. Then I just join the queries on vehiclename (which is the key?).
select a.id
,a.vehiclename
from (select a.vehicleName, max(id) as id
from vehicle a
where vehicleName like '%coo%'
group by vehicleName
) as a
join (select b.vehiclename
from vehicle_names b
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
group by b.vehiclename
having count(distinct c.attribute) = 2
) as b on (a.vehicleName = b.vehicleName);
If this "latest vehicle" logic is something you will need to do a lot, a small suggestion would be to create a view (see below) which returns the latest version of each vehicle. Then you could use the view instead of the find-max-query. Note that this is purely for ease-of-use, it offers no performance benefits.
select *
from vehicle a
where id = (select max(b.id)
from vehicle b
where a.vehiclename = b.vehiclename);
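The view idea can be sketched in SQLite through Python's `sqlite3` module; `latest_vehicle` is a hypothetical view name, and its body is the query above. Against the question's vehicle rows it keeps exactly one row per name, the one with the highest id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
INSERT INTO vehicle (id, vehicleName) VALUES
    (1, 'cool car'), (2, 'cool car'), (3, 'cool bus'), (4, 'cool bus'), (5, 'cool bus'),
    (6, 'car'), (7, 'truck'), (8, 'motorcycle'), (9, 'scooter'), (10, 'scooter'), (11, 'bus');
-- hypothetical view name wrapping the correlated MAX(id) query
CREATE VIEW latest_vehicle AS
SELECT *
FROM vehicle a
WHERE id = (SELECT MAX(b.id) FROM vehicle b WHERE a.vehicleName = b.vehicleName);
""")
latest = conn.execute("SELECT id, vehicleName FROM latest_vehicle ORDER BY id").fetchall()
print(latest)
```

The search query could then select from `latest_vehicle` instead of `vehicle`; as noted above, this is for convenience and not for speed.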
Without going into a proper redesign of your model, you could:
1) Add a column IsLatest that your application manages.
This is not perfect, but it will satisfy your question (until the next problem; see the note at the end).
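All you need to do when you add a new entry is issue statements such as:
-- @name holds the name of the vehicle being added
UPDATE vehicle
SET IsLatest = 0
WHERE vehicleName = @name AND IsLatest = 1;
INSERT INTO vehicle (vehicleName, IsLatest)
VALUES (@name, 1);
inside a transaction (in MySQL, a trigger on vehicle cannot update other rows of vehicle itself).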
2) Alternatively, you can find out the max id before you issue your query:
SELECT MAX(id)
FROM vehicle
WHERE vehicleName = @name
3) You can do it in a single SQL statement, and given an index on (vehicleName, id) it should actually have decent speed:
select a.*
from vehicle a
join vehicle_names b ON a.vehicleName = b.vehicleName
join vehicle_attribs c ON b.nameId = c.nameId AND c.attribute = 'SMALL'
join vehicle_attribs d ON b.nameId = d.nameId AND d.attribute = 'SHINY'
left join vehicle notmax ON a.vehicleName = notmax.vehicleName AND a.id < notmax.id
where a.vehicleName like '%coo%'
AND notmax.id IS NULL
I have removed your GROUP BY and HAVING and replaced them with another join (assuming that each attribute appears at most once per nameId).
I have also used one of the ways to find the max per group: join the table to itself and keep only the rows for which there is no record with a bigger id for the same name.
There are other ways; search SO for 'max per group sql'. Also see here, though it is not complete.