I have a table named Taxonomy; it pretty much holds everything that has to do with the structure of the organization. As a school example, the table looks like:
id | name | type | parent
1 | 2014 | year | null
2 | igcse| dep | 1
3 | kg1 | grade| 2
4 | c1 | class| 3
There are 4 types: the uppermost is the school year (2014-2015, for example), under it the school department, under it the grade (grade 1, 2, 3, etc.), and under it multiple classes.
When I need to get this, I run a self-join query like this:
SELECT y.name AS year,
d.name AS dep,
g.name AS grade,
c.name AS class,
e.name AS exam
FROM `taxonomys` e
left JOIN `taxonomys` c on c.id = e.parent
left JOIN `taxonomys` g on g.id = c.parent
left JOIN `taxonomys` d on d.id = g.parent
left JOIN `taxonomys` y on y.id = d.parent
where c.type in ('grade','department','class','year')
It's working fine except for the many NULLs I get!
Example result of the query:
As you can see, the classes show correctly, with the year under the year field,
yet in the first row the year is under the class field (shifted 3 cells).
Whenever there is a NULL, the values are shifted. How can I fix that?
Thanks a lot.
EDIT
What is this table ?
A variation of Adjacency List Model, I added an extra column named type so that I can identify level of any row without having to retrieve whole path.
The table holds the structure of the school. Example:
Every new year, they create a year, for example "2014-2015"; inside this year they create the school's different departments ("american diploma", "Highschool", "playschool", etc.); under those come the school grades, and under every grade come the classes.
Example
id name type parent
1 2014-2015 year null
2 highschool dep. 1
3 grade 10 grade 2
4 grade 11 grade 2
5 class a class 3
6 class b class 3
These inputs mean:
2014-2015
|__ Highschool
    |__ grade 10
    |   |__ class a
    |   |__ class b
    |__ grade 11
After that, another table links students to a class node, another table links posts to classes, and so on.
So this table basically holds the school structure. Since years, departments, and grades are only there for organizing the school's users (no data except names is needed for them), I decided to have them all in one table.
Why did you build it with this very, very, very bad/anti-pattern design?
It's actually working very nicely for me!
(We have 4 years with all the departments, students and classes, over 100k posts linked to user/class and over 10k users, and it's working smoothly so far!)
I don't know what your output is supposed to be, I don't know how your query is supposed to get it, and I don't know how you are encoding what information in your table. But I suspect that the query below gives the result you want, that it's somewhat redundant (namely the last 4 lines), and that your design is really, really, really an anti-pattern.
/*
rows y.n,d.n,g.n,c.n,e.n
where
taxonomies(y.i,y.n,'year',y.p)
AND taxonomies(d.i,d.n,'dep',y.i) AND taxonomies(g.i,g.n,'grade',d.i)
AND taxonomies(c.i,c.n,'class',g.i) AND taxonomies(e.i,e.n,'exam',c.i)
*/
SELECT y.name AS year,
d.name AS dep,
g.name AS grade,
c.name AS class,
e.name AS exam
FROM `taxonomys` y
JOIN `taxonomys` d on y.id = d.parent
JOIN `taxonomys` g on d.id = g.parent
JOIN `taxonomys` c on g.id = c.parent
JOIN `taxonomys` e on c.id = e.parent
WHERE y.type = 'year'
AND d.type = 'dep'
AND g.type = 'grade'
AND c.type = 'class'
AND e.type = 'exam'
They are shifting to the left and you are doing a LEFT JOIN. That's a clue. I suggest changing it to a plain JOIN. When you perform a LEFT JOIN you get all the rows from the left table together with the matching rows from the right table. If there are no matching rows in the right table, it returns NULL values for the right table's columns.
The columns shifting places is new to my eyes (not a SQL expert here though), but it seems to me that it could be related to the fact that you are retrieving all the data from the same table, taxonomys: every row, whatever its type, can play the role of e, so a row that sits higher in the tree puts its ancestors into the wrong columns.
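To make the shifting concrete, here is a minimal sketch that walks the tree from the year downwards instead of from an arbitrary row upwards, with the type check placed in each ON clause. It uses Python's sqlite3 module only so that it can be run as-is; the rows are the example rows from the question's EDIT, and keeping LEFT JOIN (so that a grade without classes still appears) is an assumption about what the asker wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxonomys (id INTEGER PRIMARY KEY, name TEXT, type TEXT, parent INTEGER);
INSERT INTO taxonomys VALUES
  (1, '2014-2015',  'year',  NULL),
  (2, 'highschool', 'dep',   1),
  (3, 'grade 10',   'grade', 2),
  (4, 'grade 11',   'grade', 2),
  (5, 'class a',    'class', 3),
  (6, 'class b',    'class', 3);
""")

# Start at the year and walk down. Because each alias is pinned to one type,
# a name can only ever land in the column that matches its level.
hierarchy_rows = conn.execute("""
    SELECT y.name, d.name, g.name, c.name
    FROM taxonomys y
    LEFT JOIN taxonomys d ON d.parent = y.id AND d.type = 'dep'
    LEFT JOIN taxonomys g ON g.parent = d.id AND g.type = 'grade'
    LEFT JOIN taxonomys c ON c.parent = g.id AND c.type = 'class'
    WHERE y.type = 'year'
    ORDER BY y.id, d.id, g.id, c.id
""").fetchall()

for row in hierarchy_rows:
    print(row)
```

With this shape, NULLs can only appear at the right-hand end of a row (here, grade 11 has no class), never in the middle or shifted to the left.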
So I have this relational model for a hospital (not made by me).
Patient (has an address and an id), hospital (has an id and an address), and there is also a relationship table representing placement in the hospital (hospital.id, patient.id). There are other tables too, but they don't matter for this query.
The purpose of the query is to find hospitals that have no patients placed from cities other than the hospital's own (on the condition that the address contains only the city).
The problem I have is theoretical: I don't really know whether to use a full outer join with a or b null, or something else, in the query that finds hospitals containing "foreign" patients (like joining hospital with its placement and then doing a full outer join where the a or b table record is null). But that leads to a question: will I get results from the query? Because I need cities that don't match, but all the explanations of that join are about .
Thanks to all who embraced my utterly imperfect English and understood it.
Upd.
Patient:
id=1, city =A;
id=2, city =B;
id=3, city =B;
id=4, city =A;
id=5, city =C;
Hospital:
Id =1, city=A
id =2, city=B;
Placement:
h.id p.id
1 1
1 4
2 2
2 3
2 5
The expected result is "1", the id of the first hospital (where there are no patients from another city), plus any others with that "feature".
My query is like:
select id from hospital where id not in
(select id,address from hospital inner join placement on h.id=placement.h.id as b inner join patient on placement.p.id=p.id where hospital.address<>patient.address )
Sorry for the delay.
Is shawn's query correct?
Can I use h.id instead of 1? I don't know if our teacher would accept that, because he has never shown us something like that, and in 10 years he hasn't managed to create an example of that database for students to test queries on.
select * from hospitals h
where not exists (
select 1 -- dummy value, use h.id if you prefer
from patients p inner join placement pl on pl.pid = p.id
where pl.hid = h.id and p.city <> h.city
)
or
select h.id
from hospitals h
left outer join
placement pl inner join patients p on p.id = pl.pid
on pl.hid = h.id
group by h.id
-- this won't count nulls resulting from zero placements for that hospital
-- as long as standard sql null comparisons are active
having count(case when h.city <> p.city then 1 end) = 0
Looks like it works to me: http://rextester.com/BTJB59061
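For anyone without a test database, the following is a small sketch that loads the sample rows from the question into an in-memory SQLite database and runs both approaches. The table and column names (hospitals, patients, placement, hid, pid, city) follow the queries above. The second query is written as two chained LEFT JOINs rather than the nested join, which is a simplification that assumes every placement row points at an existing patient.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients  (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE hospitals (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE placement (hid INTEGER, pid INTEGER);
INSERT INTO patients  VALUES (1,'A'), (2,'B'), (3,'B'), (4,'A'), (5,'C');
INSERT INTO hospitals VALUES (1,'A'), (2,'B');
INSERT INTO placement VALUES (1,1), (1,4), (2,2), (2,3), (2,5);
""")

# Approach 1: keep a hospital only if no placed patient comes from another city.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT h.id
    FROM hospitals h
    WHERE NOT EXISTS (
        SELECT 1
        FROM patients p
        JOIN placement pl ON pl.pid = p.id
        WHERE pl.hid = h.id AND p.city <> h.city
    )
    ORDER BY h.id
""")]

# Approach 2: count the mismatching patients per hospital and require zero.
# COUNT skips the NULL produced by the CASE when the cities are equal.
having_ids = [r[0] for r in conn.execute("""
    SELECT h.id
    FROM hospitals h
    LEFT JOIN placement pl ON pl.hid = h.id
    LEFT JOIN patients  p  ON p.id = pl.pid
    GROUP BY h.id
    HAVING COUNT(CASE WHEN h.city <> p.city THEN 1 END) = 0
    ORDER BY h.id
""")]

print(not_exists_ids, having_ids)
```

Hospital 2 is excluded by both queries because patient 5 comes from city C.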
So I'm doing a collectible cards managing app, and I have these tables:
"card": contains all the distinct cards ever made, and gives their information (id, name, text, and so on)
"edition": contains all the card sets ever released (called it "edition" because "set" was a reserved word)
"cardinset": since a card can appear in more than one set, this is the associative table between the two. It also gives the number of the card in the set and the number of copies I have of it in french (fr) and in english (en) from that set.
Of course, all tables have a unique auto-increment id, called "id".
The purpose of my SQL request is to list all the cards of a set, ordered by the number of the card in the set. I need all the info of the "cardinset" entry (number, fr, en) and of course the info of the "card" entry (name, rarity, etc.), but I also need to know how many copies of each card I have in total (across all sets, not just in this one).
My SQL request looks like this (I removed a few fields that weren't important):
SELECT
c.name,
c.rarity,
SUM(cis.fr) + SUM(cis.en) AS available,
cis.number,
cis.fr,
cis.en
FROM
card AS c
INNER JOIN cardinset AS cis ON c.id = cis.cardId
WHERE
c.id IN
(
SELECT
cardId
FROM
cardinset AS cs
WHERE
setId = 104
ORDER BY
number
)
GROUP BY
c.id,
c.name
ORDER BY
cis.number
It almost works, but it doesn't retrieve the right cardinset entry for each card, since it takes the first one of the group, which is not always the one linked to the right set.
Example:
| c.name | c.rarity | available | cis.number | cis.fr | cis.en |
| -------------- | -------- | --------- | ---------- | ------ | ------ |
| Divine Verdict | Common | 9 | 008 | 1 | 1 |
Here, the card info (name and rarity) is correct, as is the "available" field. However, the cis fields are wrong: they belong to a cis entry linking this card to another set.
The question is: is it possible to define which entry is the first in the group, and therefore is returned in this case? And if not, is there another way (maybe cleaner) to get the result I want?
Thank you in advance for your answer, I really don't know what to do here... I guess I've reached the limits of my knowledge of MySQL...
Here's a more precise example. This screenshot n°1 shows the first results of my query (described above), knowing that there are 212 results in total. They should be ordered by number, and there should be exactly one result of each number, and yet there are some exceptions:
n° 005, which should be "Divine Verdict", isn't there, because it appears instead as n° 008. That's because that card is part of 6 different sets, as we can see in screenshot n°2 (result of the query "SELECT * FROM cardinset WHERE cardId = 13984"), and the group returns the first entry, which is for set n°12 and not n°104 as I would have it. However, the "available" field shows "9", which is the result I want: the sum of all the "fr" and "en" fields for that card in all 6 sets it appears in.
There are other cards that don't have the right cardinset info: n° 011 and 019 are missing, but can be found lower with other cardinset info.
I believe this is the way you would want to format your query.
SELECT
c.name,
c.rarity,
cis.fr + cis.en AS available,
cis.number,
cis.fr,
cis.en
FROM
card AS c
INNER JOIN cardinset AS cis ON c.id = cis.cardId
WHERE
c.id IN
(
SELECT
cardId
FROM
cardinset AS cs
WHERE
setId = 104
GROUP BY
setID, cardID
)
ORDER BY
cis.number
The GROUP BY clause was moved into the sub select and modified to make sure an entry is the right combo of card/set. Also removed the SUMs because that was not necessary.
I made it at last!
I used a subquery to get the "available" field with a GROUP BY clause. It's long, and not very fast, but it gets the job done. If you have an idea that could improve it, don't hesitate.
SELECT e.code, cs.number, sub.name, sub.rarity, cs.fr, cs.en, sub.available
FROM cardinset as cs
INNER JOIN edition as e ON e.id = cs.setId
INNER JOIN (
SELECT c.id, c.name, c.rarity,
SUM(cis.fr)+SUM(cis.en) as available, SUM(cis.frused)+SUM(cis.enused) as used
FROM card as c
INNER JOIN cardinset as cis ON c.id = cis.cardId
WHERE c.id IN (
SELECT cardId
FROM cardinset as cins
WHERE setId = 54)
GROUP BY c.id, c.name
ORDER BY c.id
) AS sub ON cs.cardId = sub.id
WHERE setId = 54
ORDER BY cs.number
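One possible simplification, sketched here and not benchmarked: compute the per-card totals once in a derived table and join it to the cardinset rows of the wanted set, so the set-specific columns (number, fr, en) never pass through a GROUP BY. The table and column names follow the question, as do card 13984, sets 12 and 104, and the total of 9; the individual fr/en counts and the second card are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (id INTEGER PRIMARY KEY, name TEXT, rarity TEXT);
CREATE TABLE cardinset (id INTEGER PRIMARY KEY, cardId INTEGER, setId INTEGER,
                        number TEXT, fr INTEGER, en INTEGER);
-- Hypothetical sample rows: Divine Verdict appears in two sets here.
INSERT INTO card VALUES (13984, 'Divine Verdict', 'Common'), (2, 'Other Card', 'Rare');
INSERT INTO cardinset (cardId, setId, number, fr, en) VALUES
  (13984,  12, '008', 1, 1),
  (13984, 104, '005', 2, 5),
  (2,     104, '001', 0, 3);
""")

cards_in_set = conn.execute("""
    SELECT cs.number, c.name, c.rarity, cs.fr, cs.en, t.available
    FROM cardinset cs
    JOIN card c ON c.id = cs.cardId
    JOIN (
        -- total copies of each card across all sets
        SELECT cardId, SUM(fr + en) AS available
        FROM cardinset
        GROUP BY cardId
    ) t ON t.cardId = cs.cardId
    WHERE cs.setId = ?
    ORDER BY cs.number
""", (104,)).fetchall()

for row in cards_in_set:
    print(row)
```

Each card of set 104 appears once, with its own number, fr and en for that set, while "available" still counts copies from every set.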
I have tried for a number of hours to get this. I am still quite new to mysql but have managed to achieve queries that I was impressed with after using the resources and examples I found. I am a bit stuck here. Apologies if I do not ask this very well.
Three tables that are used for managing categories and category membership within a project.
table a = project membership
id user_id project_id
== ======= ==========
1 1 10
2 1 12
3 3 45
4 5 12
table b = categories
id name project_id
== ==== ==========
1 cat1 10
2 cat4 12
3 cat8 45
table c = category members
id user_id_added category_id capability
== ============= =========== ==========
1 1 2 1
2 3 3 2
3 5 3 1
4 5 2 0
Required result
members of project 2
user_id category capability_in_category
======= ======== ======================
1 2 1
5 2 0
SELECT a.user_id
, c.capability
, b.id as category
FROM a
LEFT OUTER JOIN b
ON a.project_id = b.project_id
LEFT OUTER JOIN c
ON b.id = c.category_id
WHERE a.project_id = $project_id
AND c.category_id = $category_id;
It feels like I shouldn't need to join the three tables, but I do not see a way of joining the project table with the category membership table without using the category table (b). The query I am running nearly works, but user capability is not returned correctly. I am using left outer joins because a member may not always be part of a category, but they still need to be shown as a member of the project. I have been trying various joins and subqueries, without success. I basically need a list of the members in the project and, if they are part of a category, the capability they have in that specific category. I feel there are potentially a few ways of doing this, but there is a gray area I am struggling to bridge.
The question is vague so I might help you to solve the wrong problem but if you want to have all members of a specific project listed (regardless of their capability) and to list the capabilities in a specified category listed as well, then:
SELECT project_members.user_id
, category_members.category_id AS category
, category_members.capability AS capability
FROM project_members
LEFT OUTER JOIN categories
ON project_members.project_id = categories.project_id
LEFT OUTER JOIN category_members
ON categories.id = category_members.category_id
AND category_members.user_id_added = project_members.user_id
WHERE project_members.project_id = $project_id
AND (categories.id = $category_id OR categories.id IS NULL);
should get you that.
I altered three things compared to your original query:
I used full table names, as they say more than "a, b, c".
I added the additional constraint category_members.user_id_added = project_members.user_id to the second join so as to not join category_members of a different user to a project_members record.
I loosened the WHERE condition so that members not having the desired capability are also displayed. category and capability will be NULL for those records.
As to your question regarding having to join the three tables the answer is yes, you need to do that.
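A runnable sketch of the three-table join against the sample rows from the question, using Python's sqlite3 module. It differs from the query above in one respect, which is a design choice on my part: the category filter sits in the ON clause of the first LEFT JOIN instead of in WHERE, so a project member is kept even when the project has other categories. The names project_members, categories and category_members stand for tables a, b and c, and user 7 is a hypothetical extra member of project 12 who is in no category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_members  (id INTEGER PRIMARY KEY, user_id INTEGER, project_id INTEGER);
CREATE TABLE categories       (id INTEGER PRIMARY KEY, name TEXT, project_id INTEGER);
CREATE TABLE category_members (id INTEGER PRIMARY KEY, user_id_added INTEGER,
                               category_id INTEGER, capability INTEGER);
INSERT INTO project_members VALUES (1,1,10), (2,1,12), (3,3,45), (4,5,12),
                                   (5,7,12);  -- user 7 is hypothetical
INSERT INTO categories VALUES (1,'cat1',10), (2,'cat4',12), (3,'cat8',45);
INSERT INTO category_members VALUES (1,1,2,1), (2,3,3,2), (3,5,3,1), (4,5,2,0);
""")

members = conn.execute("""
    SELECT pm.user_id, cm.category_id, cm.capability
    FROM project_members pm
    LEFT JOIN categories cat
           ON cat.project_id = pm.project_id
          AND cat.id = :category_id
    LEFT JOIN category_members cm
           ON cm.category_id = cat.id
          AND cm.user_id_added = pm.user_id
    WHERE pm.project_id = :project_id
    ORDER BY pm.user_id
""", {"project_id": 12, "category_id": 2}).fetchall()

for row in members:
    print(row)
```

Users 1 and 5 come back with their capability in category 2, and user 7 comes back with NULLs because they belong to the project but not to the category.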
Below is my query:
$query = "
SELECT DISTINCT gr.SessionId, t.TeacherUsername, t.TeacherForename,
t.TeacherSurname, cm.ModuleId, m.ModuleName,
cm.CourseId, c.CourseName, st.Year, st.StudentUsername,
st.StudentForename, st.StudentSurname, gr.Mark, gr.Grade
FROM Teacher t
INNER JOIN Session s ON t.TeacherId = s.TeacherId
JOIN Grade_Report gr ON s.SessionId = gr.SessionId
JOIN Student st ON gr.StudentId = st.StudentId
JOIN Course c ON st.CourseId = c.CourseId
JOIN Course_Module cm ON c.CourseId = cm.CourseId
JOIN Module m ON cm.ModuleId = m.ModuleId
WHERE
('".mysql_real_escape_string($sessionid)."' = '' OR gr.SessionId = '".mysql_real_escape_string($sessionid)."')
ORDER BY $orderfield ASC
";
You don't need to worry about the WHERE and ORDER BY clauses. My problem is that the query result shows 26 rows when it should show 13 rows.
I know the reason for this: the Course_Module table is a cross-reference table between the Course table and the Module table, and it is needed to link those two tables together.
But the Course table uses CourseId to JOIN another table, and so does the Course_Module table. So CourseId is used twice in the JOINs, and because of this the rows are duplicated. There should be 13 rows, but because each row is duplicated it shows 26.
I tried GROUP BY cm.CourseId, but it ends up displaying 2 rows, one for each of two different CourseIds, which is not what I want at all.
So my question is: is there a way I can use the Course_Module table to JOIN tables but ignore it when it comes to displaying results?
If the query was this:
$query = "
SELECT DISTINCT gr.SessionId, t.TeacherUsername, t.TeacherForename,
t.TeacherSurname, cm.ModuleId, m.ModuleName,
cm.CourseId, c.CourseName, st.Year, st.StudentUsername,
st.StudentForename, st.StudentSurname, gr.Mark, gr.Grade
FROM Teacher t
INNER JOIN Session s ON t.TeacherId = s.TeacherId
JOIN Grade_Report gr ON s.SessionId = gr.SessionId
JOIN Student st ON gr.StudentId = st.StudentId
JOIN Course c ON st.CourseId = c.CourseId;
This query shows 13 rows, but it has no link to the Module table, so I don't know the names of the modules taken for each grade report.
Below is an example of the result I am getting at the moment:
Student Session Module Course Grade
S1 AAA CHT2520 ICT A
S1 AAA CHT2520 ICT A
S2 AAA CHT2520 ICT B
S2 AAA CHT2520 ICT B
S3 AAB CHT2220 BIT D
S3 AAB CHT2220 BIT D
S4 AAC CHI2250 COMP A
S4 AAC CHI2250 COMP A
It should be:
Student Session Module Course Grade
S1 AAA CHT2520 ICT A
S2 AAA CHT2520 ICT B
S3 AAB CHT2220 BIT D
S4 AAC CHI2250 COMP A
Thank You
Your select clause asks for 14 columns. The results you showed only had 5. If you limit your select clause to those 5 columns, you'll get the 13 rows that you want.
To include all 14 columns, look at the other columns in the results. Realize that right now, you don't have 26 rows in your result set so much as you have 13 pairs of rows. Look carefully at each pair, and somewhere you'll find a column that's different — something that distinguishes one record in a matched pair from the other. Add a condition to the join on the table that hosts this column to prevent one of the values from making it to your results, and you'll get the right number of rows. This may require a derived table or correlated sub-query in the join condition to limit the join to only the first match (for some definition of "first" determined by the sub query).
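As an illustration of that idea, here is a reduced sketch of how a cross-reference table multiplies rows and how an extra join condition removes the extras. Everything in it is an assumption made for the demonstration: the cut-down tables, the second module CHT2521, and above all the Session.ModuleId column, which is only a stand-in for whatever column actually differs within each pair of rows in your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down, hypothetical versions of the tables in the question.
CREATE TABLE Session       (SessionId TEXT, ModuleId TEXT);  -- ModuleId is assumed
CREATE TABLE Student       (StudentId TEXT, CourseId TEXT);
CREATE TABLE Grade_Report  (SessionId TEXT, StudentId TEXT, Grade TEXT);
CREATE TABLE Course_Module (CourseId TEXT, ModuleId TEXT);
INSERT INTO Session       VALUES ('AAA', 'CHT2520');
INSERT INTO Student       VALUES ('S1', 'ICT'), ('S2', 'ICT');
INSERT INTO Grade_Report  VALUES ('AAA', 'S1', 'A'), ('AAA', 'S2', 'B');
INSERT INTO Course_Module VALUES ('ICT', 'CHT2520'), ('ICT', 'CHT2521');
""")

base = """
    SELECT st.StudentId, cm.ModuleId, gr.Grade
    FROM Grade_Report gr
    JOIN Session s  ON s.SessionId  = gr.SessionId
    JOIN Student st ON st.StudentId = gr.StudentId
    JOIN Course_Module cm ON cm.CourseId = st.CourseId {extra}
    ORDER BY st.StudentId, cm.ModuleId
"""

# Joining on CourseId alone yields one row per module of the course.
fan_out = conn.execute(base.format(extra="")).fetchall()

# Pinning the module in the join condition keeps one row per grade report.
pinned = conn.execute(base.format(extra="AND cm.ModuleId = s.ModuleId")).fetchall()

print(len(fan_out), len(pinned))
```

The first query returns every grade once per module of the student's course; the second returns it once.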
I am not a big expert but why are you using two different joins in your case? Stick to INNER JOIN throughout the query and it might fix the issue.
Using the tables below as an example and the listed query as a base query, I want to add a way to select only rows with a max id! Without having to do a second query!
TABLE VEHICLES
id vehicleName
----- --------
1 cool car
2 cool car
3 cool bus
4 cool bus
5 cool bus
6 car
7 truck
8 motorcycle
9 scooter
10 scooter
11 bus
TABLE VEHICLE NAMES
nameId vehicleName
------ -------
1 cool car
2 cool bus
3 car
4 truck
5 motorcycle
6 scooter
7 bus
TABLE VEHICLE ATTRIBUTES
nameId attribute
------ ---------
1 FAST
1 SMALL
1 SHINY
2 BIG
2 SLOW
3 EXPENSIVE
4 SHINY
5 FAST
5 SMALL
6 SHINY
6 SMALL
7 SMALL
And the base query:
select a.*
from vehicle a
join vehicle_names b using(vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
group
by a.id
having count(distinct c.attribute) = 2;
So what I want to achieve is to select rows with certain attributes, that match a name but only one entry for each name that matches where the id is the highest!
So a working solution in this example would return the below rows:
id vehicleName
----- --------
2 cool car
10 scooter
if it were using some sort of max on the id.
At the moment I get all the entries for cool car and scooter.
My real-world database follows a similar structure and has tens of thousands of entries in it, so a query like the one above could easily return 3000+ results. I limit the results to 100 rows to keep execution time low, as the results are used in a search on my site. The reason I have repeats of "vehicles" with the same name but a different ID is that new models are constantly added, but I keep the older ones around for those who want to dig them up! On a search by name, though, I don't want to return the older ones, just the newest one, which is the one with the highest ID!
The correct answer would adapt the query I provided above that I'm currently using and have it only return rows where the name matches but has the highest id!
If this isn't possible, suggestions on how I can achieve what I want without massively increasing the execution time of a search would be appreciated!
If you want to keep your logic, here is what I would do:
select a.*
from vehicle a
left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
join vehicle_names b on (a.vehicleName = b.vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct c.attribute) = 2;
Which yields:
+----+-------------+
| id | vehicleName |
+----+-------------+
| 2 | cool car |
| 10 | scooter |
+----+-------------+
2 rows in set (0.00 sec)
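To try this without a MySQL server, the sketch below loads the sample tables from the question into an in-memory SQLite database and runs the same self-join idea: a row is the latest one for its name when the LEFT JOIN finds no row with the same name and a higher id. The table names vehicle, vehicle_names and vehicle_attribs follow the base query; the joins are spelled out with ON instead of USING purely for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle         (id INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_names   (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_attribs (nameId INTEGER, attribute TEXT);
INSERT INTO vehicle VALUES
  (1,'cool car'), (2,'cool car'), (3,'cool bus'), (4,'cool bus'), (5,'cool bus'),
  (6,'car'), (7,'truck'), (8,'motorcycle'), (9,'scooter'), (10,'scooter'), (11,'bus');
INSERT INTO vehicle_names VALUES
  (1,'cool car'), (2,'cool bus'), (3,'car'), (4,'truck'),
  (5,'motorcycle'), (6,'scooter'), (7,'bus');
INSERT INTO vehicle_attribs VALUES
  (1,'FAST'), (1,'SMALL'), (1,'SHINY'), (2,'BIG'), (2,'SLOW'), (3,'EXPENSIVE'),
  (4,'SHINY'), (5,'FAST'), (5,'SMALL'), (6,'SHINY'), (6,'SMALL'), (7,'SMALL');
""")

latest_matches = conn.execute("""
    SELECT a.id, a.vehicleName
    FROM vehicle a
    LEFT JOIN vehicle a2                 -- any newer row with the same name?
           ON a2.vehicleName = a.vehicleName AND a2.id > a.id
    JOIN vehicle_names   b ON b.vehicleName = a.vehicleName
    JOIN vehicle_attribs c ON c.nameId = b.nameId
    WHERE c.attribute IN ('SMALL', 'SHINY')
      AND a.vehicleName LIKE '%coo%'
      AND a2.id IS NULL                  -- keep only rows with no newer sibling
    GROUP BY a.id, a.vehicleName
    HAVING COUNT(DISTINCT c.attribute) = 2
    ORDER BY a.id
""").fetchall()

print(latest_matches)
```

"cool bus" also has a latest row (id 5), but it drops out because it has neither attribute.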
As others said, normalization could be done on a few levels:
Keeping your current vehicle_names table as the primary lookup table, I would change:
update vehicle a
inner join vehicle_names b using (vehicleName)
set a.vehicleName = b.nameId;
alter table vehicle change column vehicleName nameId int;
create table attribs (
attribId int auto_increment primary key,
attribute varchar(20),
unique key attribute (attribute)
);
insert into attribs (attribute)
select distinct attribute from vehicle_attribs;
update vehicle_attribs a
inner join attribs b using (attribute)
set a.attribute=b.attribId;
alter table vehicle_attribs change column attribute attribId int;
Which led to the following query:
select a.id, b.vehicleName
from vehicle a
left join vehicle a2 on (a.nameId = a2.nameId and a.id < a2.id)
join vehicle_names b on (a.nameId = b.nameId)
join vehicle_attribs c on (a.nameId=c.nameId)
inner join attribs d using (attribId)
where d.attribute in ('SMALL', 'SHINY')
and b.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct d.attribute) = 2;
The table does not seem normalized; however, that makes it easy to do this:
select max(id), vehicleName
from VEHICLES
group by vehicleName
having count(*)>=2;
I'm not sure I completely understand your model, but the following query satisfies your requirements as they stand. The first sub query finds the latest version of the vehicle. The second query satisfies your "and" condition. Then I just join the queries on vehiclename (which is the key?).
select a.id
,a.vehiclename
from (select a.vehicleName, max(id) as id
from vehicle a
where vehicleName like '%coo%'
group by vehicleName
) as a
join (select b.vehiclename
from vehicle_names b
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
group by b.vehiclename
having count(distinct c.attribute) = 2
) as b on (a.vehicleName = b.vehicleName);
If this "latest vehicle" logic is something you will need to do a lot, a small suggestion would be to create a view (see below) which returns the latest version of each vehicle. Then you could use the view instead of the find-max-query. Note that this is purely for ease-of-use, it offers no performance benefits.
create view latest_vehicle as -- the view name is only a placeholder
select *
from vehicle a
where id = (select max(b.id)
from vehicle b
where a.vehiclename = b.vehiclename);
Without going into a proper redesign of your model, you could:
1) Add a column IsLatest that your application could manage.
This is not perfect but will satisfy your question (until the next problem, see the note at the end).
All you need to do when you add a new entry is issue queries such as
UPDATE vehicle
SET IsLatest = 0
WHERE vehicleName = #name AND IsLatest = 1
INSERT new vehicle row
UPDATE vehicle
SET IsLatest = 1
WHERE id = #last_inserted_id
in a transaction or a trigger.
2) Alternatively, you can find the max id before you issue your query:
SELECT MAX(id)
FROM vehicle
WHERE vehicleName = #name
3) You can do it in a single SQL statement, and provided there is an index on (vehicleName, id) it should actually have decent speed:
select a.*
from vehicle a
join vehicle_names b ON a.vehicleName = b.vehicleName
join vehicle_attribs c ON b.nameId = c.nameId AND c.attribute = 'SMALL'
join vehicle_attribs d ON b.nameId = d.nameId AND d.attribute = 'SHINY'
left join vehicle notmax ON a.vehicleName = notmax.vehicleName AND a.id < notmax.id
where a.vehicleName like '%coo%'
AND notmax.id IS NULL
I have removed your GROUP BY and HAVING and replaced them with another join (assuming that each attribute appears at most once per nameId).
I have also used one of the ways to find the max per group, which is to join the table to itself and keep only the rows for which there is no record with a bigger id for the same name.
There are other ways; search SO for 'max per group sql'. Also see here, though it is not complete.
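Option 1 can be sketched in application code as follows. This is only an outline under assumptions: it uses Python's sqlite3 module in place of MySQL, add_vehicle is a hypothetical helper name, and the IsLatest column is the one proposed above, not something that exists in the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE vehicle (
        id          INTEGER PRIMARY KEY,
        vehicleName TEXT NOT NULL,
        IsLatest    INTEGER NOT NULL DEFAULT 0
    )
""")

def add_vehicle(conn, name):
    """Insert a new model and make it the only IsLatest row for its name."""
    with conn:  # one transaction: commit on success, roll back on error
        # Clear the flag on the previous latest row of the same name only.
        conn.execute(
            "UPDATE vehicle SET IsLatest = 0 WHERE vehicleName = ? AND IsLatest = 1",
            (name,),
        )
        cur = conn.execute(
            "INSERT INTO vehicle (vehicleName, IsLatest) VALUES (?, 1)",
            (name,),
        )
        return cur.lastrowid

for name in ["cool car", "cool car", "cool bus", "scooter", "scooter"]:
    add_vehicle(conn, name)

latest = conn.execute(
    "SELECT id, vehicleName FROM vehicle WHERE IsLatest = 1 ORDER BY id"
).fetchall()
print(latest)
```

The search query can then filter on IsLatest = 1 instead of self-joining, at the cost of the application having to keep the flag correct on every insert.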