Combine four queries into one - mysql

I have inherited a legacy code base, along with its database (which I cannot modify), and I got stuck on these queries while refactoring the code. There are 4 separate queries which I am trying to combine into one, if possible. I will supply the table schemas if needed. Also, if you think this cannot be solved in one query, please explain why.
These are the queries (it is about sports), the final purpose is to get all cups which the given user has created and/or joined
cups table holds the created cups, their creator, etc
joined table holds the users that have joined a specific cup, the place (rank) they have ended
registered_cups table holds the names of the cups
The first one is trying to get all cups which the user has created
SELECT cup_id FROM cups WHERE user_id = 'givenUser' AND cup_type <> 6
The second one is trying to get all cups which the user has joined
SELECT joined.cup_id, joined.cup_rank FROM joined LEFT JOIN cups USING (cup_id) WHERE joined.user_id = 'givenUser' AND cups.cup_type <> 6
Now, the results from the first one are put into a comma-separated string which is then supplied to the next queries. The third one is reading information for the selected cups
SELECT cup_id, user_id, status FROM joined WHERE cup_id IN (cupList)
And the last one gets the names of the cups, their edition, and orders them
SELECT name, cup_id, edition, cups.user_id FROM cups LEFT JOIN registered_cups USING(register_id) WHERE cup_id IN (cupList) ORDER BY name ASC, edition DESC
As you can see there is a lot of repetition, and not needed things, so this is what I came up with:
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups, joined, registered_cups
WHERE cups.cup_id = joined.cup_id
AND cups.register_id = registered_cups.register_id
AND joined.user_id = '308288'
AND cups.cup_type <>6
ORDER BY registered_cups.name ASC , cups.edition DESC
The problem with my query is that I only get the cups which the user joined, but it is also possible that a user created a cup but did not take part in it. That's why there are the first two queries, but I don't know how to combine them successfully. I hope you understand what I'm trying to achieve.
Update:
Here is a fiddle of the schema with a little input. The user in question is user_id = 133; the query should basically select all of the rows, which is 21, with the point that one of them should be a cup that is organized, but not joined, by the user; all of the others are both.

There are a couple of ways you could do it. You could do an outer join like this:
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups INNER JOIN registered_cups ON cups.register_id = registered_cups.register_id
LEFT OUTER JOIN joined ON cups.cup_id = joined.cup_id AND joined.user_id = '308288'
WHERE cups.cup_type <>6
AND (joined.user_id IS NOT NULL
OR cups.user_id = '308288')
ORDER BY registered_cups.name ASC , cups.edition DESC
or a union like this:
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups, joined, registered_cups
WHERE cups.cup_id = joined.cup_id
AND cups.register_id = registered_cups.register_id
AND joined.user_id = '308288'
AND cups.cup_type <>6
UNION
SELECT cups.cup_id, cups.edition, cups.user_id, null, registered_cups.name
FROM cups, registered_cups
WHERE cups.register_id = registered_cups.register_id
AND cups.user_id = '308288'
AND cups.cup_type <>6
AND NOT EXISTS
(SELECT 1
FROM joined
WHERE cups.user_id = joined.user_id AND cups.cup_id = joined.cup_id)
ORDER BY name ASC, edition DESC
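To make the outer-join idea concrete, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the four sample cups and their values are made up for illustration; the column names follow the question.

```python
import sqlite3

# Hypothetical miniature schema: column names follow the question, data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE registered_cups (register_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cups (cup_id INTEGER PRIMARY KEY, register_id INTEGER,
                   user_id INTEGER, cup_type INTEGER, edition INTEGER);
CREATE TABLE joined (cup_id INTEGER, user_id INTEGER, cup_rank INTEGER);

INSERT INTO registered_cups VALUES (1, 'Alpha Cup'), (2, 'Beta Cup');
-- cup 10: created by 133 and joined by 133
-- cup 11: created by 133, joined only by user 7
-- cup 12: created by user 7, joined by 133
-- cup 13: created by user 7, joined by user 7 (must not appear)
INSERT INTO cups VALUES (10, 1, 133, 1, 1), (11, 1, 133, 1, 2),
                        (12, 2, 7, 1, 1), (13, 2, 7, 1, 2);
INSERT INTO joined VALUES (10, 133, 3), (11, 7, 1), (12, 133, 2), (13, 7, 1);
""")

# The user filter sits in the ON clause so that created-but-not-joined cups
# survive the outer join with a NULL cup_rank.
rows = conn.execute("""
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups
INNER JOIN registered_cups ON cups.register_id = registered_cups.register_id
LEFT OUTER JOIN joined ON cups.cup_id = joined.cup_id AND joined.user_id = ?
WHERE cups.cup_type <> 6
  AND (joined.user_id IS NOT NULL OR cups.user_id = ?)
ORDER BY registered_cups.name ASC, cups.edition DESC
""", (133, 133)).fetchall()

for row in rows:
    print(row)
```

With this sample data, cup 11 comes back with a NULL rank (created only), cups 10 and 12 come back with their ranks, and cup 13 is filtered out.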

I assume you have a users table and that cups are created before they can be joined (seems logical).
SELECT
c.cup_id,
u.user_id,
c.edition,
j.status,
j.cup_rank,
cr.user_id IS NOT NULL created,
r.name
FROM
(SELECT 133 user_id) u
/* semi cross join the cups table */
JOIN
cups c
ON u.user_id = 133 AND c.cup_type <> 6
JOIN
registered_cups r
ON c.register_id = r.register_id
LEFT JOIN
joined j
ON c.cup_id = j.cup_id AND u.user_id = j.user_id
LEFT JOIN
cups cr
ON c.cup_id = cr.cup_id AND u.user_id = cr.user_id
ORDER BY r.name, c.edition
Maybe I have some column names wrong or from the wrong table, but the idea stays the same
Edit:
SQL Fiddle

Related

MySQL SELECT queries without LIMIT

I am doing a course on relational databases, MySQL to be more specific. We need to create some SELECT queries for a project. The project is related to music. It has tables to represent musicians (musician), bands (band) and a musician's ability to perform a certain task, like singing or playing the guitar (act).
Table musician contains :
id
name
stagename
startyear
Table band contains :
code
name
type ("band" or "solo")
startyear
And finally, table act contains :
band (foreign key to code of "band" table)
musician (foreign key to id of "musician" table)
hability (guitarist, singer, like that... and a foreign key to another table)
earnings
I have doubts about two exercises. The first one asks to select the id and stagename of the musicians who participate in the most acts in bands whose type is solo.
My solution for the first one is this:
SELECT ma.id, ma.stagename
FROM musician ma, act d, band ba
WHERE ma.id = d.musician
AND ba.code = d.band
AND ba.type = "solo"
GROUP BY ma.id, ma.stagename
HAVING COUNT(ma.id) = (SELECT COUNT(d2.musician) AS count
FROM act d2, band ba2
WHERE d2.band = ba2.code
AND ba2.type = "solo"
GROUP BY d2.musician
ORDER BY count DESC
LIMIT 1);
The second one is very similar to the first. We need to select, for every startyear, the id and stagename of the musician who has the most acts, with the corresponding number of acts and the maximum and minimum of their earnings. This is my solution:
SELECT ma.startyear, ma.id, ma.stagename, COUNT(ma.id) AS NumActs, MIN(d.earnings), MAX(d.earnings)
FROM musician ma, act d, band ba
WHERE ma.id = d.musician
AND ba.code = d.band
AND ba.type = "solo"
GROUP BY ma.startyear, ma.id, ma.stagename
HAVING COUNT(ma.id) = (SELECT COUNT(d2.musician) AS count
FROM act d2, band ba2
WHERE d2.band = ba2.code
AND ba2.type = "solo"
GROUP BY d2.musician
ORDER BY count DESC
LIMIT 1);
The results with my dummy data are perfect but my teacher told us we should avoid using the LIMIT option, but that's the only way we can get the highest number, at least with what we know right now.
I've seen a lot of subqueries after the FROM statement to solve this problem, however, for this project we can't use subqueries inside FROM. Is this really possible without LIMIT ?
Thanks in advance.
It is possible, but it is much worse than using a subquery in FROM or LIMIT, so I'd never use it in real life :)
Well, long story short, you can do something like this:
SELECT
m.id
, m.stagename
FROM
musician m
INNER JOIN act a ON (
a.musician = m.id
)
INNER JOIN band b ON (
b.code = a.band
AND b.type = 'solo'
)
GROUP BY
m.id
, m.stagename
HAVING
NOT EXISTS (
SELECT
*
FROM
act a2
INNER JOIN band b2 ON (
b2.code = a2.band
AND b2.type = 'solo'
)
WHERE
a2.musician != a.musician
GROUP BY
a2.musician
HAVING
COUNT(a2.musician) > COUNT(a.musician)
)
;
I think you can understand the idea from the query itself as it's pretty straightforward. However, let me know if you need an explanation.
It is possible that your restriction was slightly different and that you were only disallowed from using a subquery in your main FROM part.
P.S. I also use the INNER JOIN ... ON syntax, as it makes it easier to see which conditions are table join conditions and which are WHERE conditions.
P.P.S. There might be mistakes in the query, as I do not have your data structure and so cannot execute the query and check. I only checked that the idea works with my test table.
EDIT I just re-read the question; my initial reading missed that inline views are disallowed.
We can avoid the ORDER BY ... DESC LIMIT 1 construct by making the subquery into an inline view (or, a "derived table" in the MySQL parlance), and using a MAX() aggregate.
As a trivial demonstration, this query:
SELECT b.foo
FROM bar b
ORDER
BY b.foo DESC
LIMIT 1
can be emulated with this query:
SELECT MAX(c.foo) AS foo
FROM (
SELECT b.foo
FROM bar b
) c
An example re-write of the first query in the question
SELECT ma.id
, ma.stagename
FROM musician ma
JOIN act d
ON d.musician = ma.id
JOIN band ba
ON ba.code = d.band
WHERE ba.type = 'solo'
GROUP
BY ma.id
, ma.stagename
HAVING COUNT(ma.id)
= ( SELECT MAX(c.count)
FROM (
SELECT COUNT(d2.musician) AS count
FROM act d2
JOIN band ba2
ON ba2.code = d2.band
WHERE ba2.type = 'solo'
GROUP
BY d2.musician
) c
)
NOTE: this is a demonstration of a rewrite of the query in the question; it makes no guarantee that this query (or the query in the question) returns a result that satisfies any particular specification. And the specification given in the question is not at all clear.
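As a rough check of the MAX() rewrite, the sketch below runs it from Python against an in-memory SQLite database, used here only as a stand-in for MySQL. The three musicians and their acts are invented sample data, and the tables are trimmed to the columns the query needs. Note that it still uses a derived table, which the question says is not allowed in the course project.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE musician (id INTEGER PRIMARY KEY, stagename TEXT);
CREATE TABLE band (code INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE act (band INTEGER, musician INTEGER);

INSERT INTO musician VALUES (1, 'Ana'), (2, 'Rui'), (3, 'Zed');
INSERT INTO band VALUES (100, 'solo'), (101, 'solo'), (200, 'band');
-- Ana: 2 solo acts, Rui: 2 solo acts + 1 band act, Zed: 1 solo act
INSERT INTO act VALUES (100, 1), (101, 1), (100, 2), (101, 2), (200, 2), (100, 3);
""")

# MAX() over the per-musician counts replaces ORDER BY ... DESC LIMIT 1.
top = conn.execute("""
SELECT ma.id, ma.stagename
FROM musician ma
JOIN act d ON d.musician = ma.id
JOIN band ba ON ba.code = d.band
WHERE ba.type = 'solo'
GROUP BY ma.id, ma.stagename
HAVING COUNT(ma.id) = (SELECT MAX(c.cnt)
                       FROM (SELECT COUNT(d2.musician) AS cnt
                             FROM act d2
                             JOIN band ba2 ON ba2.code = d2.band
                             WHERE ba2.type = 'solo'
                             GROUP BY d2.musician) c)
ORDER BY ma.id
""").fetchall()

print(top)
```

Both musicians tied on two solo acts are returned, and the band act does not count toward the total.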

How to improve SELECT performance joining multiple tables

I have the following MySQL SELECT statement that was working OK on a small data set but died when the volume was increased:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country, BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, Clients, BookingProgram, BookingAccommodation, Countries, ClientType, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
AND Programs.ProgramId = BookingProgram.ProgramId
With around 10K records in Bookings and 25K records in each of BookingAccommodation and BookingProgram, the volume isn't huge, but the query ran in 950 seconds. I'm running the query in the SQL window of phpMyAdmin on a local MAMP server.
Splitting it into 3 queries the result comes back in a fraction of a second for each:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate, Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country
FROM Bookings, Clients, Countries, ClientType
WHERE Bookings.WeekBeginning >= '2016-10-01'
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
SELECT DISTINCT Bookings.BookingId, BookingAccommodation.AccomId, BookingAccommodation.ShareType
FROM Bookings, BookingAccommodation
WHERE Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
SELECT DISTINCT Bookings.BookingId, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, BookingProgram, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND Programs.ProgramId = BookingProgram.ProgramId
There are multiple records in BookingAccommodation and BookingProgram for each record in Bookings but I only require one record from each hence the SELECT DISTINCT.
The primary key on Bookings is BookingId.
The primary key on BookingAccommodation is BookingId, AccomDate, AccomId
The primary key on BookingProgram is BookingId, ProgramId, AccomType
I've tried to rewrite the query with joins and sub queries but I'm obviously not doing it right. How can I join these 3 queries back into a single query that will perform well?
These are the basics of using subqueries instead of joins (MySQL assumed FWIW). Apologies for pseudocode, I thought it important to answer ASAP as this is one of the top hits on this issue I faced just now.
A client makes a booking to go on a cruise ship. The client should also specify their diet (eg. vegetarian, vegan, no soy, etc). We thus have three tables:
Bookings
Booking_Id, Booking_Date, Booking_Time, Client_Id
Clients
Client_Id, Client_Name, Client_Phone, Client_DietId
Diets
Diet_Id, Diet_Name
We now want to present to the concierge a full booking view.
Using "JOINS":
SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, Clients.Client_Name, Diets.Diet_Name
FROM Bookings
INNER JOIN Clients
ON Bookings.Client_Id = Clients.Client_Id
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id
Using "SUBQUERIES":
The way I think of it, each subquery creates a "temp table" out of one of those separate JOINs. Of course, "temp tables" may or may not be the accurate low-level implementation, but anecdotally subqueries can sometimes be faster than huge joins (other threads cover this).
I have separate joins I want to do from the above example:
First I need to join the Clients with their Diets, then I join that "table" with Bookings.
Thus I end up with this (note the table (re)naming when referring to the subquery):
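SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, ClientDetailsWithDiets.Client_Name, ClientDetailsWithDiets.Diet_Name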
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id
Now if another table is to be joined say Staff assigned to a particular Booking, then the whole thing above would be nested, and so on eg:
SELECT [RELEVANT FIELDS HERE ETC]
FROM
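(SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, ClientDetailsWithDiets.Client_Name, ClientDetailsWithDiets.Diet_Name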
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id)
AS BookingDetailsFull
INNER JOIN Staff
ON BookingDetailsFull.Booking_Id = Staff.Booking_Id_Assigned
Try changing it to:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId,
Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,
Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1,
Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country,
BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId,
Programs.ProgramDesc
FROM Bookings
JOIN Clients ON Clients.ClientId = Bookings.ClientId AND Bookings.WeekBeginning >= '2016-10-01' AND Clients.Email <> ''
JOIN BookingProgram ON Bookings.BookingId = BookingProgram.BookingId
JOIN BookingAccommodation ON Bookings.BookingId = BookingAccommodation.BookingId AND BookingAccommodation.Nights > 0
JOIN Countries ON Clients.CountryId = Countries.CountryId
JOIN Programs ON Programs.ProgramId = BookingProgram.ProgramId
WHERE Bookings.WeekBeginning >= '2016-10-01';
If this is not getting you the results you wanted, try EXPLAIN and see the query plan.
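Please note: the ClientType table is not used anywhere in the query, so I did not include it in the JOINs. Listing it in FROM with no join condition cross joins it with everything else, multiplying the rows that DISTINCT then has to collapse.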
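To illustrate why an unused table in a comma-separated FROM list matters, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The three tables are cut down to a couple of columns, the Description column and all rows are hypothetical, and the point is only the row counts.

```python
import sqlite3

# Hypothetical, heavily trimmed versions of the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bookings (BookingId INTEGER PRIMARY KEY, ClientId INTEGER);
CREATE TABLE Clients (ClientId INTEGER PRIMARY KEY, ClientType INTEGER);
CREATE TABLE ClientType (ClientType INTEGER PRIMARY KEY, Description TEXT);

INSERT INTO Clients VALUES (1, 1), (2, 2);
INSERT INTO Bookings VALUES (10, 1), (11, 2), (12, 1);
INSERT INTO ClientType VALUES (1, 'Retail'), (2, 'Agent'), (3, 'Staff'), (4, 'Other');
""")

# ClientType is listed but never joined: every booking row is repeated
# once per ClientType row.
with_unused = conn.execute("""
SELECT COUNT(*) FROM Bookings, Clients, ClientType
WHERE Clients.ClientId = Bookings.ClientId
""").fetchone()[0]

# Same query without the unused table.
without_unused = conn.execute("""
SELECT COUNT(*) FROM Bookings
JOIN Clients ON Clients.ClientId = Bookings.ClientId
""").fetchone()[0]

print(with_unused, without_unused)
```

Here the intermediate result grows from 3 rows to 12 (3 bookings times 4 client types). On real tables the same multiplication applies to every additional join, so dropping the unjoined table is worth trying before anything else.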
Rather than spend more time trying to improve the select statement as it hits so many tables I opted to split it into the separate queries as I outlined in the original question.
In the end this was the quickest practical solution.

optimizing query (LEFT JOIN)

My goal is to show search results for companies both with categories and without (not added yet). My companies table has roughly 12,000 records. Only about 200 companies have categories.
There are two search inputs:
$name -> name of company or category.
$id_country -> id of the country
I want to display:
1) how many results there are in the whole database (that's why I use SQL_CALC_FOUND_ROWS)
2) 10 results per page, using LIMIT (with pagination).
My query:
SELECT SQL_CALC_FOUND_ROWS
c.*,
lc.name as langName,
lc.shortDesc,
lc.longDesc
FROM companies c
JOIN lang_companies lc USING(id_company)
LEFT JOIN categories_companies cc USING(id_company)
LEFT JOIN lang_categories lang_cat USING (id_category)
WHERE
lc.id_lang = '2' AND c.status = 1 AND c.active = 1 AND c.id_country = ".$id_country." AND
(lc.name = LCASE('".$name."') OR (lang_cat.name = LCASE('".$name."') AND lang_cat.id_lang = '2')
OR c.city = '".$name."')
GROUP BY c.id_company
ORDER BY c.id_hierarchi asc
LIMIT 0, 10
This query takes roughly 6 seconds to execute and I want to optimize it. Could you help me?
I will be grateful for any suggestions.
Out of the FROM part of your query, you do not seem to actually use the tables which are joined with these two lines:
LEFT JOIN categories_companies cc USING(id_company)
LEFT JOIN categories cat USING (id_category)
I presume you can simply exclude them from the query, unless they are relevant through something more subtle, like the join suppressing rows.

MySQL - Trying to show results for rows that have 0 records...across 3 columns

There's a lot of Q&A out there for how to make MySQL show results for rows that have 0 records, but they all involve 1-2 tables/fields at most.
I'm trying to achieve the same ends, but across 3 fields, and I just can't seem to get it.
Here's what I've hacked together:
SELECT circuit.circuit_name, county.county_name, result.adr_result, count( result.adr_result ) AS num_results
FROM
(
SELECT cases.case_id, cases.county_id, cases.result_id
FROM cases
WHERE cases.status_id <> "2"
) q1
RIGHT JOIN county ON q1.county_id = county.county_id
RIGHT JOIN circuit ON county.circuit_id = circuit.circuit_id
RIGHT JOIN result ON q1.result_id = result.result_id
GROUP BY adr_result, circuit_name, county_name
ORDER BY circuit_name, county_name, adr_result
What I need to see is a list of ALL circuits in the first column, a list of ALL counties per circuit in the second column, a list of ALL possible adr_result entries for each county (they're the same for every county) in the third column, and then the respective count for the circuit/county/result combination-- even if it is 0. I've tried every combination of left, right and inner join (I know inner is definitely not the solution, but I'm frustrated) and just can't see where I'm going wrong.
Any help would be appreciated!
Here is a start. I can't follow your problem statement completely. For instance, what is the purpose of the cases table? Nonetheless, when you say "ALL" records for each of those tables, I interpret it as a Cartesian product, which is implemented through the derived table in the FROM clause (notice that result is joined without any ON condition). The counts then come from a LEFT JOIN to cases, so combinations with no cases still show 0.
SELECT everthingjoin.circuit_name
, everthingjoin.county_name
, everthingjoin.adr_result
, COUNT(cases.case_id) AS num_results
FROM
(SELECT circuit.circuit_name, county.county_id, county.county_name, result.result_id, result.adr_result
FROM circuit
JOIN county
ON county.circuit_id = circuit.circuit_id
CROSS JOIN result) AS everthingjoin
LEFT JOIN cases
ON cases.status_id <> "2"
AND cases.county_id = everthingjoin.county_id
AND cases.result_id = everthingjoin.result_id
GROUP BY adr_result, circuit_name, county_name
ORDER BY circuit_name, county_name, adr_result
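As a sanity check on the idea of pairing every county with every possible result and then left-joining the cases, here is a minimal sketch run from Python against in-memory SQLite (a stand-in for MySQL). The table and column names follow the question, while the single circuit, two counties, two results and three cases are invented sample data.

```python
import sqlite3

# Hypothetical sample data; names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE circuit (circuit_id INTEGER PRIMARY KEY, circuit_name TEXT);
CREATE TABLE county (county_id INTEGER PRIMARY KEY, circuit_id INTEGER, county_name TEXT);
CREATE TABLE result (result_id INTEGER PRIMARY KEY, adr_result TEXT);
CREATE TABLE cases (case_id INTEGER PRIMARY KEY, county_id INTEGER,
                    result_id INTEGER, status_id TEXT);

INSERT INTO circuit VALUES (1, 'First');
INSERT INTO county VALUES (1, 1, 'Adams'), (2, 1, 'Brown');
INSERT INTO result VALUES (1, 'Settled'), (2, 'Impasse');
-- Adams: two countable settled cases and one case with status 2 (ignored).
-- Brown: no cases at all.
INSERT INTO cases VALUES (1, 1, 1, '1'), (2, 1, 1, '3'), (3, 1, 2, '2');
""")

# Build every county/result pair first, then count matching cases.
# COUNT(ca.case_id) ignores the NULLs produced by the outer join, giving 0.
counts = conn.execute("""
SELECT ci.circuit_name, co.county_name, r.adr_result, COUNT(ca.case_id) AS num_results
FROM circuit ci
JOIN county co ON co.circuit_id = ci.circuit_id
CROSS JOIN result r
LEFT JOIN cases ca
       ON ca.county_id = co.county_id
      AND ca.result_id = r.result_id
      AND ca.status_id <> '2'
GROUP BY ci.circuit_name, co.county_name, r.adr_result
ORDER BY ci.circuit_name, co.county_name, r.adr_result
""").fetchall()

for row in counts:
    print(row)
```

The status filter lives in the ON clause rather than in WHERE; moving it to WHERE would discard the NULL-extended rows and the zero counts would disappear again.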
Try this and see if it provides some ideas:
SELECT
circuit.circuit_name
, county.county_name
, result.adr_result
, COUNT(result.result_id) AS num_results
, COUNT(DISTINCT result.adr_result) AS num_distinct_results
FROM cases
LEFT JOIN county
ON cases.county_id = county.county_id
LEFT JOIN circuit
ON county.circuit_id = circuit.circuit_id
LEFT JOIN result
ON cases.result_id = result.result_id
WHERE cases.status_id <> "2"
GROUP BY
circuit.circuit_name
, county.county_name
, result.adr_result
ORDER BY
circuit_name, county_name, adr_result

Query worked fine on localhost but it's very slow on our server

We've tested with 1 million records in every table, and the results were fine, always under 0.08 seconds.
So we deployed it on our server, but it's very slow there, taking up to 36 seconds.
We asked for help before to optimize the query we were running on our test machine, where we detailed the basic structure of our one-to-many relationship:
Problems to optimize large query and tables structure
This is the final query, the one we're using after getting help at the link above:
explain
SELECT
st.sid, st.title, st.summary, st.storynotes, st.thumb, st.completed, st.wordcount, st.rid, st.date, st.updated,
stats.total_reviews, stats.total_recommendations,
(SELECT GROUP_CONCAT(CAST(catid AS CHAR)) FROM fanfiction_stories_categories WHERE sid = st.sid) as categories,
(SELECT GROUP_CONCAT(CAST(genre_id AS CHAR)) FROM fanfiction_stories_genres WHERE sid = st.sid) as genres,
(SELECT GROUP_CONCAT(CAST(warning_id AS CHAR)) FROM fanfiction_stories_warnings WHERE sid = st.sid) as warnings
FROM
fanfiction_stories st
LEFT JOIN fanfiction_stories_stats stats ON st.sid = stats.sid
JOIN fanfiction_stories_categories cat ON st.sid = cat.sid AND cat.catid = 924
WHERE validated = 1
ORDER BY updated DESC
LIMIT 0, 15
That's the explain:
http://dl.dropbox.com/u/14508898/Printscreen/stackoverflow_explain_print_003.PNG
0 rows affected, 6 rows found. Duration for 1 query: 31,356 sec.
Updated
We removed some old indexes left over from the previous DB structure on fanfiction_stories and added new indexes to fanfiction_stories_categories; now it is much faster. This is the updated explain:
http://dl.dropbox.com/u/14508898/Printscreen/stackoverflow_explain_print_004.PNG
Sorry, the program that I use can only export the explain table as HTML, CSV, etc.; it doesn't produce an ASCII table to display here.
Can we optimize it even more? Any help is much appreciated.
It might be all the GROUP_CONCATs that you are doing; they are quite memory hungry. Also, instead of a bare JOIN you might prefer to write an explicit INNER JOIN (MySQL treats the two the same, so this is for readability), like:
SELECT
st.sid, st.title, st.summary, st.storynotes, st.thumb, st.completed, st.wordcount, st.rid, st.date, st.updated,
stats.total_reviews, stats.total_recommendations,
(SELECT GROUP_CONCAT(CAST(catid AS CHAR)) FROM fanfiction_stories_categories WHERE sid = st.sid) as categories,
(SELECT GROUP_CONCAT(CAST(genre_id AS CHAR)) FROM fanfiction_stories_genres WHERE sid = st.sid) as genres,
(SELECT GROUP_CONCAT(CAST(warning_id AS CHAR)) FROM fanfiction_stories_warnings WHERE sid = st.sid) as warnings
FROM
fanfiction_stories st
LEFT JOIN fanfiction_stories_stats stats ON st.sid = stats.sid
INNER JOIN fanfiction_stories_categories cat ON st.sid = cat.sid AND cat.catid = 924
WHERE validated = 1
ORDER BY updated DESC
LIMIT 0, 15
This should work, although I don't have table structures and sample data to test against. If you remove each of the (SELECT ... ) AS column subqueries, LEFT JOIN those tables instead, and group the entire outer query by sid, you should get the same result. I think this is more efficient than running a correlated subquery per column. The GROUP_CONCAT is grouped by sid at the end anyway. The only thing that might be an issue is NULL values in these concatenated fields, which you can wrap with IFNULL().
I would ensure EACH of these tables has an index on the sid column used for the join. Additionally, your main stories table should have an index on validated for its criterion (= 1).
Based on your feedback, I would move the category criterion and table to the top. Get ONE CATEGORY first, then see what stories are associated with it. Then, from only those stories, hook up the rest: genres, warnings, stats, etc. You obviously have a smaller set of categories, so I would hit THAT as the primary table in the query. Let me know how this works.
SELECT STRAIGHT_JOIN
st.sid,
st.title,
st.summary,
st.storynotes,
st.thumb,
st.completed,
st.wordcount,
st.rid,
st.date,
st.updated,
stats.total_reviews,
stats.total_recommendations,
GROUP_CONCAT( DISTINCT cat.catid ) categories,
GROUP_CONCAT( DISTINCT genre.genre_id ) genres,
GROUP_CONCAT( DISTINCT warn.warning_id ) as warnings
FROM
fanfiction_stories_categories cat
JOIN fanfiction_stories st
ON cat.sid = st.sid
AND st.Validated = 1
LEFT JOIN fanfiction_stories_stats stats
ON st.sid = stats.sid
LEFT JOIN fanfiction_stories_genres genre
on st.sid = genre.sid
LEFT JOIN fanfiction_stories_warnings warn
on st.sid = warn.sid
WHERE
cat.catid = 924
group by
st.sid
ORDER BY
updated DESC
LIMIT
0, 15
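One detail of the query above worth seeing in isolation is why DISTINCT appears inside each GROUP_CONCAT: joining two one-to-many tables to the same story multiplies the rows before grouping. The sketch below shows this with Python's sqlite3 module as a stand-in for MySQL; the shortened table names and the sample ids are hypothetical.

```python
import sqlite3

# Hypothetical, simplified tables: one story with 2 genres and 3 warnings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stories (sid INTEGER PRIMARY KEY);
CREATE TABLE genres (sid INTEGER, genre_id INTEGER);
CREATE TABLE warnings (sid INTEGER, warning_id INTEGER);
INSERT INTO stories VALUES (1);
INSERT INTO genres VALUES (1, 10), (1, 11);
INSERT INTO warnings VALUES (1, 20), (1, 21), (1, 22);
""")

# The two LEFT JOINs yield 2 x 3 = 6 rows for the story, so each genre id
# is seen three times unless DISTINCT is applied inside the aggregate.
plain, distinct = conn.execute("""
SELECT GROUP_CONCAT(g.genre_id), GROUP_CONCAT(DISTINCT g.genre_id)
FROM stories st
LEFT JOIN genres g ON g.sid = st.sid
LEFT JOIN warnings w ON w.sid = st.sid
GROUP BY st.sid
""").fetchone()

print(plain)
print(distinct)
```

The same multiplication is also a cost: with many genres and warnings per story, the joined intermediate result can become much larger than what the per-column subqueries touch, so it is worth comparing both forms with EXPLAIN on real data.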