I have four tables: series, seasons, episodes, images. Each series consists of multiple seasons, each of which consists of multiple episodes. Each episode has one or more images attached to it. Now I would like to retrieve one series including all its seasons, episodes and images.
SELECT * FROM series
LEFT JOIN seasons ON seasons.seasons_series_id=series.series_id
LEFT JOIN episodes ON episodes.episodes_seasons_id=seasons.seasons_id
LEFT JOIN images ON images.images_id=episodes.episodes_images_id
WHERE series.series_id=1
The above query does not work, because seasons_id is not available when running the second LEFT JOIN etc. Should I be using nested queries instead?
In the query posted to the question, the seasons_id generally IS available for that second LEFT JOIN (and the third, if it comes to it).
When you add additional JOINs to a query, those JOINs take into account not only the table from the original FROM clause but also the entire result set built up by any JOINs so far. This is one reason why always using an alias for your tables is a good idea: it's possible to include the same table in a query more than once via a JOIN, and aliases help keep separate instances of the same table straight.
The only case when your seasons_id would not be available is when you have a series record that does not have any seasons records associated with it. In this case, you would have a NULL value in your results for the seasons_id, and you would further have no way in the schema shown to connect any episode record with that series record at all. In this schema, every series must have at least one season if it is to have any episodes or images. Thus, the missing seasons_id wouldn't matter anyway, because you couldn't ever hope to match any episode records for that series.
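The behaviour described above can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the sample rows and the *_name / images_file columns are made up for illustration; only the join columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series   (series_id INTEGER PRIMARY KEY, series_name TEXT);
    CREATE TABLE seasons  (seasons_id INTEGER PRIMARY KEY, seasons_series_id INTEGER,
                           seasons_name TEXT);
    CREATE TABLE images   (images_id INTEGER PRIMARY KEY, images_file TEXT);
    CREATE TABLE episodes (episodes_id INTEGER PRIMARY KEY, episodes_seasons_id INTEGER,
                           episodes_images_id INTEGER, episodes_name TEXT);

    INSERT INTO series   VALUES (1, 'Series with seasons'), (2, 'Series without seasons');
    INSERT INTO seasons  VALUES (10, 1, 'Season 1'), (11, 1, 'Season 2');
    INSERT INTO images   VALUES (100, 'e1.png');
    INSERT INTO episodes VALUES (1000, 10, 100, 'Episode 1');
""")

# Same shape as the query in the question, with aliases and explicit columns.
# Each later JOIN can refer to columns of any table joined before it.
QUERY = """
    SELECT se.series_name, sn.seasons_name, ep.episodes_name, im.images_file
    FROM series AS se
    LEFT JOIN seasons  AS sn ON sn.seasons_series_id = se.series_id
    LEFT JOIN episodes AS ep ON ep.episodes_seasons_id = sn.seasons_id
    LEFT JOIN images   AS im ON im.images_id = ep.episodes_images_id
    WHERE se.series_id = ?
    ORDER BY sn.seasons_id, ep.episodes_id
"""

rows_with_seasons = conn.execute(QUERY, (1,)).fetchall()
rows_without_seasons = conn.execute(QUERY, (2,)).fetchall()

for row in rows_with_seasons + rows_without_seasons:
    print(row)
```

The series without seasons still comes back as a single row, with NULL (None in Python) in every season, episode and image column.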
There's nothing wrong with your query. If a relationship breaks down and the left join shows, for example, a season 2 with no known episodes, then there won't be any images for those non-episodes. It doesn't stop the series having two seasons; you just see results like:
Game of thrones, season 1, episode 1, image 1
Game of thrones, season 2, null, null
If your database enforces relationships then you'll never be able to insert images for Game of Thrones season 2 episode 1, because the episode has to exist first to be a parent to the child images. If your database doesn't enforce relationships, then you can go ahead and insert a load of images and give them all an episode ID of 971, which you predict is what s2 e1 will get when you do get around to inserting it, but they won't show in your query because they're orphans as long as the episode with ID 971 doesn't exist in the DB.
If you're hoping your query will show these orphaned images, you'll have to write it in a different way.
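One possible "different way" is to start from images and keep only the rows that find no episode. This is a sketch only: it runs on SQLite via Python's sqlite3 module, and it assumes the images table carries the episode ID in a column I've called images_episode_id, which is not in the schema shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE episodes (episodes_id INTEGER PRIMARY KEY, episodes_name TEXT);
    -- images_episode_id is a hypothetical column. No foreign key is declared,
    -- so nothing stops it from pointing at an episode that does not exist yet.
    CREATE TABLE images (images_id INTEGER PRIMARY KEY, images_episode_id INTEGER,
                         images_file TEXT);

    INSERT INTO episodes VALUES (1, 's1 e1');
    INSERT INTO images VALUES (1, 1, 's1e1.png'),
                              (2, 971, 's2e1-a.png'),
                              (3, 971, 's2e1-b.png');
""")

# Anti-join: LEFT JOIN from images, then keep rows where no episode matched.
orphans = conn.execute("""
    SELECT im.images_file
    FROM images AS im
    LEFT JOIN episodes AS ep ON ep.episodes_id = im.images_episode_id
    WHERE ep.episodes_id IS NULL
    ORDER BY im.images_id
""").fetchall()

print(orphans)
```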
You might have missing columns because you are not using aliases for your tables, so MySQL does not know which column belongs to which table. Try using an alias for every table and run it again.
Hope this helps
Something really bugs me and I'm not sure what the "correct" approach is.
If I make a select to get contacts from my database, there are a decent number of joins involved.
It will look something like this (around 60-70 columns):
SELECT *
FROM contacts
LEFT JOIN company
LEFT JOIN person
LEFT JOIN address
LEFT JOIN person_communication
LEFT JOIN company_communication
LEFT JOIN categories
LEFT JOIN notes
company and person are 1:1 cardinality, so that's straightforward.
But "address", "communication" and "categories" are 1:n cardinality.
So depending on the number of rows in the 1:n tables I will get a lot of "double" rows (I don't know what the real term for that is; the rows are not really duplicates, since the address or phone number etc. is different). For myself as a contact, a fairly filled-in contact, I get 85 rows back.
How do you guys work with that?
In my PHP application I always wrote some "Data-Mapper" where the array key was the "contact.ID" (the primary key), then checked whether it existed and pushed the additional data into it. Also, PHP is not really type-strict, which makes that easy.
Now I'm learning Go (golang) and I thought: screw that LOOOONG select and data mapping, just write selects for all the 1:n tables... yeah, no: not enough connections to load a table full of contacts. I know that I can increase the connection limit, but the error seems to imply that this would be the wrong way.
I use the following driver: https://github.com/go-sql-driver/mysql
I also tried GROUP_CONCAT, but then I ran into trouble parsing it back.
Do I have to do my mapping approach again, or is there some nice solution out there? I found it quite dirty at points, though.
The solution is simple: you need to execute more than one query!
The cause of all the "duplicate" rows is that you're generating a result called a Cartesian product. You are trying to join to several tables with 1:n relationships, but each of these has no relationship to the other, so there's no join condition restricting them with respect to each other.
Therefore you get a result with every combination of all the 1:n relationships. If you have 3 matches in address, 5 matches in communication, and 5 matches in categories, you'd get 3x5x5 = 75 rows.
So you need to run a separate SQL query for each of your 1:n relationships. Don't be afraid: MySQL can handle a few queries. You need them.
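To make the arithmetic concrete, here is a rough sketch of both approaches. It runs on SQLite through Python's sqlite3 module instead of MySQL, and the column names (contact_id, detail) and the load_contact helper are invented for the example, since the question doesn't show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address       (id INTEGER PRIMARY KEY, contact_id INTEGER, detail TEXT);
    CREATE TABLE communication (id INTEGER PRIMARY KEY, contact_id INTEGER, detail TEXT);
    CREATE TABLE categories    (id INTEGER PRIMARY KEY, contact_id INTEGER, detail TEXT);
    INSERT INTO contacts VALUES (1, 'Example contact');
""")

# 3 addresses, 5 communication entries and 5 categories for contact 1.
for table, n in (("address", 3), ("communication", 5), ("categories", 5)):
    conn.executemany(
        f"INSERT INTO {table} (contact_id, detail) VALUES (1, ?)",
        [(f"{table} {i}",) for i in range(n)],
    )

# One big join: every combination of the three unrelated 1:n tables.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM contacts AS c
    LEFT JOIN address       AS a ON a.contact_id = c.id
    LEFT JOIN communication AS m ON m.contact_id = c.id
    LEFT JOIN categories    AS g ON g.contact_id = c.id
    WHERE c.id = 1
""").fetchone()[0]


def load_contact(conn, contact_id):
    """Load one contact with one query per 1:n table (no row multiplication)."""
    name = conn.execute(
        "SELECT name FROM contacts WHERE id = ?", (contact_id,)
    ).fetchone()[0]
    contact = {"id": contact_id, "name": name}
    # Table names come from this fixed tuple, never from user input.
    for table in ("address", "communication", "categories"):
        rows = conn.execute(
            f"SELECT detail FROM {table} WHERE contact_id = ? ORDER BY id",
            (contact_id,),
        )
        contact[table] = [row[0] for row in rows]
    return contact


contact = load_contact(conn, 1)
print(joined_rows)  # 3 * 5 * 5 combinations
print({key: len(value) for key, value in contact.items() if isinstance(value, list)})
```

For a whole list of contacts the same idea still works with one query per child table, filtered with contact_id IN (...), so the number of queries stays fixed instead of growing with the number of contacts.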
I have a query that the only way I could get it to work was to left join, on three fields. If I did an ordinary inner join on these three fields the query returned nothing. But if I try each individual join separately, they all join as I would expect, e.g. Bob to Bob, Bookshop to Bookshop, Bread to Bread etc.
So for these two sets of query results...
1.Manager  1.Shop    1.Product  1.Cost  2.Manager  2.Shop    2.Product  2.Quantity
Bob        Hardware  Spanners   15      Bob        Hardware  Spanners   3
Terry      Food      Bread      12      Terry      Food      Bread      4
Sue        Bookshop  Books      18      Sue        Bookshop  Books      7
...this query returns no results:
SELECT 1.Manager, 1.Shop, 1.Product, 1.Cost, 2.Quantity
FROM 1 INNER JOIN 2 ON 1.Manager = 2.Manager AND 1.Shop = 2.Shop AND 1.Product = 2.Product;
I know joining on text isn't ideal, but I have similar queries that join on these three fields without problem, so wondered whether it was a 'feature' of Access that I had encountered, or whether it's likely to be a problem in the data?
-edit-
By putting the JOIN conditions into the WHERE clause instead, I found that, if I have WHERE 1.Manager = "Bob" AND 2.Manager = "Bob":
WHERE 1.Product = "Spanners"
works on its own, and:
WHERE 2.Product = "Spanners"
works on its own, but combining the two:
WHERE 1.Product = "Spanners" AND 2.Product = "Spanners"
again returns nothing!
-edit 2-
The main query does indeed behave properly when it is referencing the data in tables. So there may be something odd about the way the base queries return their results.
-edit 3-
This is the link to an example of the problem: [link removed]
01 Top Level Queries: both of these are the same, except that one refers to tables, and works, and the other refers to queries, and does not work. I want to find out why the query version doesn't work.
02 2nd Level Queries and Tables: there are two versions of each set of data - one is a query, and the other is a table made using a Make Table version of the query. Both are identical as far as I can tell.
03 and 04 Level Queries: these are lower level queries that go to make up the 2nd level queries
Tables: these are the base tables that all other queries are built on.
OK, so I downloaded your db and took a look. I got as far as finding that if you put the NumStores query first in your inner join then it would return records, then abandoned ship. I don't want to sound harsh, but you are so far down the road of poor database design that you have no hope of going further. Among the many issues that will continue to cause you problems are:
No primary keys in your tables (no indexes of any kind).
Incomprehensible naming convention for your objects (queries and tables).
Data is duplicated in many different tables (normalization violations).
Embedded subqueries in your main queries.
If you want to use Access to help you, you need to learn how to use it.
For the record, if anyone looks at this question having a similar problem - one of the queries that fed into the main query was grouping on a field that didn't appear anywhere in that particular query. Once I'd removed that field from the Group By clause the main query returned the results I expected.
Odd that a query was essentially returning exactly the same results with different behaviour, but there you go.
Had the same problem here in the future (year 2017, Access 2010).
For some reason, Left Join would work bringing the exact same result Inner Join brought and mysteriously stopped.
After "Feb 11 '13 at 9:54" message, I noticed that one of the joined queries had doubled Group By fields not showing (no reason for that), so I deleted them. It worked. Access recreated the no-show Group By fields, but not doubled anymore, and that was the (bug?) problem.
Using MySQL I have a table of users, a table of matches (updated with the actual result) and a table called users_picks (at first it's always going to be 10 football matches per gameweek per league because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches per gameweek).
In the users_picks table, should I store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with a FK for user and gameweek. All picks in one row would mean I had columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid. Especially since there's going to be lots of potential NULLs in the db. The second option would mean eventually millions of rows. But would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? eg. Calculating how many matches a user has hit correctly in a specific round, how many alltime correct hits etc.
If I'm not making much sense, I'll try to elaborate on anything. I just want my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine if there were to be included later a new league with 11 matches. Or 400.
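As a rough illustration of why the statistics are not a PITA with the second option, here is a sketch that counts a user's exact-score hits per gameweek. It runs on SQLite through Python's sqlite3 module rather than MySQL. The users_picks columns follow Option 2 from the question; the layout of the matches table and all the sample scores are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: matches holds the actual result of each match.
    CREATE TABLE matches (match_id INTEGER PRIMARY KEY, gameweek_id INTEGER,
                          hometeam_score INTEGER, awayteam_score INTEGER);
    CREATE TABLE users_picks (pick_id INTEGER PRIMARY KEY, user_id INTEGER,
                              league_id INTEGER, gameweek_id INTEGER, match_id INTEGER,
                              hometeam_score INTEGER, awayteam_score INTEGER);

    INSERT INTO matches VALUES (1, 1, 2, 1), (2, 1, 0, 0), (3, 2, 3, 2);
    INSERT INTO users_picks
        (user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score)
    VALUES
        (7, 1, 1, 1, 2, 1),   -- correct
        (7, 1, 1, 2, 1, 0),   -- wrong
        (7, 1, 2, 3, 3, 2),   -- correct
        (8, 1, 1, 1, 0, 0);   -- wrong
""")

# A pick is a hit when both predicted scores equal the actual result.
CORRECT_PER_GAMEWEEK = """
    SELECT p.gameweek_id, COUNT(*) AS correct
    FROM users_picks AS p
    JOIN matches AS m ON m.match_id = p.match_id
    WHERE p.user_id = ?
      AND p.hometeam_score = m.hometeam_score
      AND p.awayteam_score = m.awayteam_score
    GROUP BY p.gameweek_id
    ORDER BY p.gameweek_id
"""

per_gameweek = conn.execute(CORRECT_PER_GAMEWEEK, (7,)).fetchall()
all_time = sum(correct for _, correct in per_gameweek)

print(per_gameweek, all_time)
```

With the first option the same question would need one comparison per matchN column, and the query would have to change whenever a league had a different number of matches.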
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_picks` WHERE `user_id`=?;
to get all the picks for a given user.
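A minimal sketch of that index follows, again on SQLite via Python's sqlite3 module, so the query-plan output is SQLite's and not MySQL's. The index name idx_users_picks_user_id is just a name I picked. The CREATE INDEX statement itself has the same form in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_picks (pick_id INTEGER PRIMARY KEY, user_id INTEGER,
                              league_id INTEGER, gameweek_id INTEGER, match_id INTEGER,
                              hometeam_score INTEGER, awayteam_score INTEGER);
    -- Secondary index for the "all picks of one user" lookup.
    CREATE INDEX idx_users_picks_user_id ON users_picks (user_id);
""")

# Ask the engine how it would run the lookup; the last column of each
# EXPLAIN QUERY PLAN row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users_picks WHERE user_id = ?", (7,)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
uses_index = "idx_users_picks_user_id" in plan_text

print(plan_text)
```

In MySQL the equivalent check would be to run EXPLAIN on the SELECT and look at which key it reports.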
We have a database table that has way too many rows. To speed up performance, we are trying to create a summary table. This works great for one-to-one relationships. E.g. let's say furniture has a type and a manufacturer_id; you could have a table that has both of these columns and a counts column. It would be easy to query that table and very quickly get the number of furnitures of a given type.
But what if there is a many-to-many relationship? So each piece of furniture can also have one or many colors and one or many distributors. What happens then? Is there any way to summarize this data so I can quickly find how many furnitures are green? Or how many are blue and yellow?
Obviously this is just a made-up example. But given a huge database table with millions and millions of rows, how can I create a summary table to quickly look up aggregate information?
Assuming you know what you are doing and know this is a real bottleneck: do you have measurements of the performance now? Do you know where it starts taking time?
You will have to query the database anyway to get that count. So you can store it in a separate table like color count and distributor count. Another solution is to cache the results of these queries in a caching system. For example if you have memcached or some other tools already in use.
The simplest option, when all you have is the database, is to create a table:
table color count
color_id
amount
That is a very simple query. You can index it very well and no joins are needed.
Updating can be done with triggers, with a cron job, or at the moment you update the many-to-many table, depending on your needs and capacity. Take into consideration that updating the records also takes time, so use it for optimizing reads, which is what I read in your question.
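For the trigger variant, a sketch could look like the following. It is written for SQLite and run through Python's sqlite3 module, so the trigger syntax is SQLite's; MySQL triggers are written differently. The furniture_color table and the trigger names are assumptions, and only color_count with color_id and amount comes from the table sketched above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed many-to-many link table between furniture and colors.
    CREATE TABLE furniture_color (furniture_id INTEGER, color_id INTEGER,
                                  PRIMARY KEY (furniture_id, color_id));
    CREATE TABLE color_count (color_id INTEGER PRIMARY KEY, amount INTEGER NOT NULL);

    -- Keep the summary in step with every insert into the link table.
    CREATE TRIGGER furniture_color_ai AFTER INSERT ON furniture_color
    BEGIN
        INSERT OR IGNORE INTO color_count (color_id, amount) VALUES (NEW.color_id, 0);
        UPDATE color_count SET amount = amount + 1 WHERE color_id = NEW.color_id;
    END;

    -- And with every delete.
    CREATE TRIGGER furniture_color_ad AFTER DELETE ON furniture_color
    BEGIN
        UPDATE color_count SET amount = amount - 1 WHERE color_id = OLD.color_id;
    END;
""")

conn.executemany("INSERT INTO furniture_color VALUES (?, ?)", [(1, 3), (2, 3), (2, 5)])
conn.execute("DELETE FROM furniture_color WHERE furniture_id = 1 AND color_id = 3")

# Reading the summary is a primary-key lookup, with no join and no scan.
counts = dict(conn.execute("SELECT color_id, amount FROM color_count"))
amount_for_color_3 = conn.execute(
    "SELECT amount FROM color_count WHERE color_id = ?", (3,)
).fetchone()[0]

print(counts, amount_for_color_3)
```

The cost is paid on every write to the link table, which is the trade-off mentioned above.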
Multiple tables should keep the size down... and a good database system should keep the performance up.
In my opinion, keeping a separate 'summary table' creates a lot of overhead and maintenance problems and is only really useful if the same summary information is desired over and over (i.e., how many furnitures are green without also storing how many are blue, how many are yellow, how many are blue and yellow, etc., etc., etc.)
What I would do is:
Table 1: furnitures
Column 1: uniqueID
Column 2: name
Table 2: distributors
Column 1: uniqueID
Column 2: name
Table 3: colors
Column 1: uniqueID
Column 2: name
Table 4: furniture-distributor
Column 1: furnitureUniqueIDvalue
Column 2: distributorUniqueIDvalue
Table 5: furniture-color
Column 1: furnitureUniqueIDvalue
Column 2: colorUniqueIDvalue
How many furnitures are green:
SELECT COUNT(*) FROM furniture-color WHERE colorUniqueIDvalue = 'green ID';
How many furniture are both blue and yellow:
SELECT COUNT(*) FROM furniture-color as t1 INNER JOIN furniture-color as t2 ON t1.furnitureUniqueIDvalue = t2.furnitureUniqueIDvalue AND t1.colorUniqueIDvalue = 'blue ID' AND t2.colorUniqueIDvalue = 'yellow ID';
Getting lists of distributors of blue and yellow furniture, or furniture from a particular distributor that is either green or red, or most anything else is possible with the right SQL statement (left as an exercise for the reader).
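Here is a runnable sketch of the two counting queries above, using SQLite via Python's sqlite3 module. One deliberate change: the link table is called furniture_color with an underscore, because a hyphenated name would have to be quoted in most databases. The sample furniture and the color_id helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE furnitures (uniqueID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE colors     (uniqueID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE furniture_color (furnitureUniqueIDvalue INTEGER,
                                  colorUniqueIDvalue INTEGER,
                                  PRIMARY KEY (furnitureUniqueIDvalue, colorUniqueIDvalue));

    INSERT INTO furnitures VALUES (1, 'chair'), (2, 'table'), (3, 'sofa');
    INSERT INTO colors     VALUES (1, 'green'), (2, 'blue'), (3, 'yellow');
    -- chair: green; table: blue + yellow; sofa: blue + green
    INSERT INTO furniture_color VALUES (1, 1), (2, 2), (2, 3), (3, 2), (3, 1);
""")


def color_id(name):
    """Look up a color's uniqueID by name (hypothetical helper)."""
    return conn.execute(
        "SELECT uniqueID FROM colors WHERE name = ?", (name,)
    ).fetchone()[0]


# How many furnitures are green.
green_count = conn.execute(
    "SELECT COUNT(*) FROM furniture_color WHERE colorUniqueIDvalue = ?",
    (color_id("green"),),
).fetchone()[0]

# How many furnitures are both blue and yellow: self-join the link table.
blue_and_yellow_count = conn.execute("""
    SELECT COUNT(*)
    FROM furniture_color AS t1
    INNER JOIN furniture_color AS t2
        ON t1.furnitureUniqueIDvalue = t2.furnitureUniqueIDvalue
    WHERE t1.colorUniqueIDvalue = ? AND t2.colorUniqueIDvalue = ?
""", (color_id("blue"), color_id("yellow"))).fetchone()[0]

print(green_count, blue_and_yellow_count)
```

The composite primary key on the link table matters here: it stops the same furniture/color pair from being stored twice, which would otherwise inflate both counts.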
You need to distinguish between counting different types of furniture (distinct furniture id) and counting actual pieces of furniture.
If you have a distributor-color table, then you can count actual pieces of furniture. However, you cannot count different types of furniture. This is the difference between additive facts and non-additive facts, in the terminology of OLAP. If you are interested in this subject, check out Ralph Kimball and his classic book "The Data Warehouse Toolkit".
To count furniture types, you need to include that in your table. So, you need a distributor-color-furniture table. Now to get the total for a distributor, you can use:
select distributor, count(distinct furnitureid)
from dcf
group by distributor
And similarly for color.
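A small sketch of the difference, run on SQLite through Python's sqlite3 module. The table name dcf and the distributor and furnitureid columns come from the query above; the color column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per distributor / color / furniture combination.
    CREATE TABLE dcf (distributor TEXT, color TEXT, furnitureid INTEGER);
    INSERT INTO dcf VALUES
        ('Acme', 'red',  1),
        ('Acme', 'blue', 1),   -- same furniture type, second color
        ('Acme', 'red',  2),
        ('Bolt', 'red',  1);
""")

# COUNT(*) counts combination rows; COUNT(DISTINCT ...) counts furniture types.
per_distributor = conn.execute("""
    SELECT distributor,
           COUNT(*) AS row_count,
           COUNT(DISTINCT furnitureid) AS furniture_types
    FROM dcf
    GROUP BY distributor
    ORDER BY distributor
""").fetchall()

print(per_distributor)
```

Acme has three rows but only two furniture types, which is why the distinct count cannot be derived by adding up per-color totals.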
It seems that you want to translate your original data into a fact table, for ease of reporting. This is a very good and standard idea for developing data marts. Your data mart could have two fact tables: one for each type of furniture (so you can handle the manufacturing questions easily) and another for distributor-color-furniture (for harder questions).
Some databases, such as Oracle and SQL Server, have support for these types of data structures. What you are talking about is more like a new "system", rather than just a new "table". You need to think about the dimensions for the fact table, the updates, and the types of reports that you need.
There will be 2^n possible rows in the color summary table where 'n' is the number of colors. If you reduce the colors to a bitmap and assign each color a location (red=0,orange=1,yellow=2,green=3,etc.) then your color summary table could be:
Color Count
0x0001 256
0x0002 345
0x0003 23839
etc.
256 only have red, 345 only have orange, 23,839 have red and orange. To get a count of how many have red but could have other colors would require summing the rows with bit position 0 set. Alternatively a separate summary table could be set up with only 'n' entries, one for each color, to avoid summing over the rows.
If you want the summary table to manage both distributor and color then I think it would have 2^n * 2^m rows (where 'm' is the number of distributors) to have all the combinations of multiple distributors for multiple pieces of furniture each possibly with multiple colors.
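The bitmap lookups can be sketched in a few lines of plain Python. The bit positions and the three counts are the ones from the example table above; the function names are made up, and a dict stands in for the summary table.

```python
# Bit position assigned to each color, as in the example above.
COLOR_BITS = {"red": 0, "orange": 1, "yellow": 2, "green": 3}

# Summary "table": bitmap of the exact color set -> number of pieces.
summary = {0x0001: 256, 0x0002: 345, 0x0003: 23839}


def mask_for(colors):
    """Build the bitmap for an exact set of colors."""
    mask = 0
    for color in colors:
        mask |= 1 << COLOR_BITS[color]
    return mask


def count_exact(colors):
    """Pieces that have exactly these colors and no others (one row lookup)."""
    return summary.get(mask_for(colors), 0)


def count_including(color):
    """Pieces that have this color, possibly with others (sum over rows)."""
    bit = 1 << COLOR_BITS[color]
    return sum(count for mask, count in summary.items() if mask & bit)


print(count_exact(["red"]))            # red only
print(count_exact(["red", "orange"]))  # red and orange, nothing else
print(count_including("red"))          # red only + red and orange
```

count_including is the "summing the rows with bit position 0 set" step; the separate n-row table suggested above would replace that sum with a single lookup.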
I'm building a small cinema booking system as a PHP web application.
The database has a Film and a Showing table (amongst others, but those are not important here).
A Showing has a date and a time, and each Showing consists of one Film
A Film can have many Showings
I'm trying to build a query that will get all the film_name, showing_date and showing_time although I want to group the results so I don't have multiple films in the result, as you can have more than one showing on the same date.
I have this SQL:
SELECT f.film_name, s.showing_date, s.showing_time
FROM film f, showing s
WHERE f.film_id = s.film_id
GROUP BY s.film_id
However, it's not showing all the times for each film, just the first one. I guess there is a lot I'm missing, and maybe I should split the showing times into a separate table, but any help would be greatly appreciated. I will post more information and diagrams if necessary.
Thanks
Assuming you want one row per film, with all showings in the same row, try:
SELECT f.film_name, group_concat(concat(s.showing_date, ' ', s.showing_time)) showings
FROM film f, showing s
WHERE f.film_id = s.film_id
GROUP BY s.film_id
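A quick way to try the idea is the sketch below. It runs on SQLite through Python's sqlite3 module, not MySQL, so it uses SQLite's || operator in place of concat() and SQLite's two-argument group_concat(); the sample films and showings are invented. The table and column names are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film    (film_id INTEGER PRIMARY KEY, film_name TEXT);
    CREATE TABLE showing (showing_id INTEGER PRIMARY KEY, film_id INTEGER,
                          showing_date TEXT, showing_time TEXT);

    INSERT INTO film VALUES (1, 'Film A'), (2, 'Film B');
    INSERT INTO showing (film_id, showing_date, showing_time) VALUES
        (1, '2013-03-01', '18:00'),
        (1, '2013-03-01', '20:30'),
        (2, '2013-03-01', '19:00');
""")

# One row per film; all of its showings collapsed into one comma-separated string.
rows = conn.execute("""
    SELECT f.film_name,
           group_concat(s.showing_date || ' ' || s.showing_time, ',') AS showings
    FROM film AS f
    JOIN showing AS s ON s.film_id = f.film_id
    GROUP BY f.film_id, f.film_name
    ORDER BY f.film_name
""").fetchall()

# Split the string back into a list in the application. SQLite does not
# promise an order inside group_concat, so sort after splitting.
showings_by_film = {name: sorted(showings.split(",")) for name, showings in rows}

print(showings_by_film)
```

In MySQL, GROUP_CONCAT also accepts ORDER BY and SEPARATOR inside the call, and its result is cut off at group_concat_max_len (1024 by default, as far as I know), which is worth checking for films with many showings.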
You cannot do what you are asking to do.
Each row in your result set can only show one film name and one show time. If film A is showing 5 times, then you can either get a result set of five lines, all listing film A and the different show times, or if you group by film A, you will only get one result, and it will list the first show time.
Based upon what you have told us, I believe what you are looking for is some way to condense each film into one row that still lists the showing dates and times properly. In order to do this, you will need to somehow collapse these rows into one row in a way that is not often used. Normally you would use some sort of function on these rows (SUM, COUNT, etc.) to give aggregate data. However, it sounds like you want to see the actual data.
To do this, there is a really helpful SO question here:
Concatenate many rows into a single text string?
The second-highest rated response talks about using XML PATH, which would probably be the cleanest way of doing it if your database supports that feature. If not, look at the accepted answer (COALESCE). I would suggest putting this type of code into a scalar function that returned one field with comma-separated showtimes for you. Then you could list a film and have a list of showtimes next to the film.
Sorry for the confusion and for maybe wasting your time; I think I have found the solution by splitting the showing times into a separate table.
I find all of the films being shown on a certain date, then loop through and select all the showing times for those films based on the showing id returned from the first query, as there will only be one showing of a film per day. I add this information to the first result on each loop cycle and pass the whole data back.
There are probably better ways of doing it, but this will do for now.
Thanks