Running a check within a SQL query (Maybe a subquery?) - mysql

I have a simple laptop testing booking system with 6 laptops, named Laptop01 to 06 that each have three allocated time slots.
A user is able to select these time slots if the slot is not booked or if the booking has been cancelled/declined.
While I have working code, I've realised a fatal error that causes a cancelled/declined slot to duplicate.
Let me explain...
event_information - Holds the booking event information (only ID is needed for this example)
event_machine_time - This holds all the laptops, with three rows per laptop with the unique timings available to choose from
event_booking - This holds the actual booking, which then links to another candidate database, not included here
I then run a simple query that joins everything together and (I thought) identifies the booked events:
SELECT machine_laptop, machine_name, B.id AS m_id, C.id AS c_id, C.confirmed AS c_confirmed, C.live AS c_live,
(C.id IS NOT NULL AND C.confirmed !=2 AND C.live !=0) AS booked
FROM event_information A
INNER JOIN event_machine_time B ON ( 1 =1 )
LEFT JOIN event_booking C on (B.id = C.machine_time_id and A.id = C.information_id )
WHERE A.id = :id
ORDER BY `B`.`id` DESC
booked is checking if confirmed isn't 2 - which means the booking has been cancelled/declined (0 - not confirmed, 1 - confirmed) and live is checking for deletion (0 - deleted, 1 - not deleted).
However if a person either gets deleted (live - 0) or cancels/declines (confirmed - 2) then in my front end slot selector dropdown it will add an extra slot as the booked column is still 0, as shown below:
This allows the user to then choose from two slots at the same time, meaning double bookings occur.
I now know that using a Join is the wrong thing to do, and I'm presuming that I need to run a subquery, but I'm not an SQL expert and I would love some help to find examples of similar 'second queries' that I can learn from.
Also apologies if my terminology is wrong.
EDIT:
As requested I've included the output:
Second edit and conclusion:
In the end I managed to craft a solution together using a sub query to remove the cancelled/declined bookings before the output, then use a Group By to only display one of each timing. This most likely isn't the best way, but it worked for me.
SELECT machine_laptop, machine_name, B.id AS m_id, C.id AS c_id, C.confirmed AS c_confirmed, C.live AS c_live, B.start_time AS b_start_time, (
C.id IS NOT NULL
AND C.confirmed !=2
AND C.live !=0
) AS booked
FROM event_information A
INNER JOIN event_machine_time B ON (1=1)
LEFT JOIN (SELECT * FROM event_booking WHERE confirmed <> '2' AND live <> '0') AS C ON ( B.id = C.machine_time_id AND A.id = C.information_id )
WHERE A.id = :id
GROUP BY m_id
ORDER BY machine_name ASC, b_start_time ASC
Thank you for all your input.
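For readers who want to reproduce the duplicate-slot problem, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The column names on event_machine_time and the sample rows are assumptions made for illustration. It moves the cancelled/deleted filter into the ON clause of the LEFT JOIN, which has the same effect as the filtered subquery above, and compares the result with the original join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_information (id INTEGER PRIMARY KEY);
CREATE TABLE event_machine_time (
    id INTEGER PRIMARY KEY, machine_name TEXT, start_time TEXT
);
CREATE TABLE event_booking (
    id INTEGER PRIMARY KEY,
    information_id INTEGER,
    machine_time_id INTEGER,
    confirmed INTEGER,  -- 0 not confirmed, 1 confirmed, 2 cancelled/declined
    live INTEGER        -- 0 deleted, 1 not deleted
);
INSERT INTO event_information VALUES (1);
INSERT INTO event_machine_time VALUES
    (1, 'Laptop01', '09:00'), (2, 'Laptop01', '10:00');
-- Slot 1 was cancelled once and then booked again; slot 2 was never booked.
INSERT INTO event_booking VALUES (1, 1, 1, 2, 1), (2, 1, 1, 1, 1);
""")

# Original shape: every booking row, cancelled or not, produces a slot row.
NAIVE = """
SELECT B.id AS m_id,
       (C.id IS NOT NULL AND C.confirmed != 2 AND C.live != 0) AS booked
FROM event_information A
CROSS JOIN event_machine_time B
LEFT JOIN event_booking C
       ON B.id = C.machine_time_id AND A.id = C.information_id
WHERE A.id = ?
ORDER BY B.id, C.id
"""

# Filter in the ON clause: cancelled/deleted bookings never join at all.
FILTERED = """
SELECT B.id AS m_id, (C.id IS NOT NULL) AS booked
FROM event_information A
CROSS JOIN event_machine_time B
LEFT JOIN event_booking C
       ON B.id = C.machine_time_id
      AND A.id = C.information_id
      AND C.confirmed != 2
      AND C.live != 0
WHERE A.id = ?
ORDER BY B.id
"""

naive_rows = conn.execute(NAIVE, (1,)).fetchall()
filtered_rows = conn.execute(FILTERED, (1,)).fetchall()
print(naive_rows)     # slot 1 appears twice
print(filtered_rows)  # one row per slot
```

With this sample data the GROUP BY is no longer needed, but a slot would still appear twice if two active bookings existed for it, so a uniqueness rule on active bookings may still be worth considering.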

Try the query below:
SELECT machine_laptop, machine_name, B.id AS m_id, C.id AS c_id, C.confirmed
AS c_confirmed, C.live AS c_live,
(C.id IS NOT NULL AND C.confirmed !=2 AND C.live !=0) AS booked
FROM event_information A
LEFT JOIN event_booking C ON A.id = C.information_id
RIGHT JOIN event_machine_time B ON B.id = C.machine_time_id
WHERE A.id = :id
ORDER BY `B`.`id` DESC

If you make event_booking (B) the starting point for your query, you can see that there's no need to pull all rows and columns from A and C. Instead you can join on matching rows directly. But as I can't properly grasp what your query is trying to achieve, I have a couple of questions first:
While this may work it's actually something that's not under your control nor defined by you. Some more strict mode would politely tell you to specify which aliased table you're referring to in your SELECT, as this
SELECT machine_laptop, machine_name -- combined with
FROM event_information A
actually doesn't make sense and the only reason why it's working is that you're leveraging MySQL's optimisations. In addition to that, you're doing table joins in a mixed mode (meaning that you use both JOIN and WHERE tA.colX=tB.colY methods). This makes it really difficult to follow.
INNER JOIN event_machine_time B ON ( 1 =1 )
Um? What exactly is the purpose of this? As far as I can tell this will only cause it to JOIN both full tables, only to later filter the result using WHERE.
Furthermore, are you even using primary keys? Your condition includes C.id IS NOT NULL while primary keys can't even contain NULLs (NULL is the third boolean state in SQL land: there is True, False, and Null, meaning Undefined, which obviously couldn't be used in a primary key, as a primary key must be unique and an Undefined value can be anything or nothing, ergo it violates the uniqueness requirement). So I'm assuming you're actually using this NULL check because the temp table during JOIN seems to contain them?
EDIT:
Try to split this into two parts, where you first join 2 tables, and then join third table with the result.
I suggest you go briefly over What is the difference between "INNER JOIN" and "OUTER JOIN"? - as this is pretty great post and clarifies many aspects.
For starters I'd go with something like:
SELECT
<i.cols>,
<b.cols>,
<mt.cols>,
IF(b.confirmed !=2 AND b.live !=0, True, False) AS booked
FROM
event_booking b
LEFT JOIN
event_information i ON b.information_id = i.id
LEFT JOIN
event_machine_time mt ON b.machine_time_id = mt.id
WHERE <conditions>
Later I'd change LEFT JOIN into something more appropriate. However, bear in mind that INNER JOIN is only useful if you're 100% sure that the rows returned from the joined table columns are unique.
Can there even be a 1:n or n:1 relationship between the i and b tables? I'd assume there couldn't be multiple bookings for the same event info (n:1), nor could the same event information apply to multiple events (1:n)?

Related

How to rewrite UNION with LEFT JOIN more efficiently

I have two tables...one that registers users and one that checks in users. A user will always have a single entry in the register table but a user may have 0 or multiple entries in the checkin table. For a raffle selector, I wrote a query that is picking 1 entry from the register table and then 1 entry from the checkin table - each sub query picks a random entry so long as that userID does not exist in a 3rd table that stores the raffle winners. After the two entries are returned, it then randomly selects one of the two returned entries as the winner.
However, I believe there should be a more efficient way of writing this so it's ONLY picking an entry once, not picking two entries and then picking one of the two.
It took me quite a while to figure out how to correctly write the below query as I am not proficient in mysql at all. The query works and seems to work efficiently, but I believe there should be a better way of writing it that also consolidates the amount of query code.
Hoping someone here can help or advise.
Table note: clubusers/clubHistory have multiple overlapping columns but the tables are not the same:
register = clubUsers
checkins = clubHistory
winners = clubRaffleWinners
SELECT * FROM (
(SELECT ch.user_ID,ch.clID FROM clubHistory AS ch
LEFT OUTER JOIN clubRaffleWinners AS cr1 ON
ch.user_ID=cr1.user_ID
AND cr1.cID=1157
AND cr1.rafID=18
AND cr1.crID=1001
AND cr1.ceID=1167
AND cr1.chDate1='2022-06-04'
WHERE
ch.cID=1157
AND ch.crID=1001
AND ch.ceID=1167
AND ch.chDate='2022-06-04'
AND cr1.user_ID IS NULL
GROUP BY ch.user_ID ORDER BY RAND() LIMIT 1
)
UNION
(SELECT cu.user_ID,cu.clID FROM clubUsers AS cu
LEFT OUTER JOIN clubRaffleWinners AS cr2 ON
cu.user_ID=cr2.user_ID
AND cr2.cID=1157
AND cr2.rafID=18
AND cr2.crID=1001
AND cr2.ceID=1167
AND cr2.chDate1='2022-06-04'
WHERE
cu.cID=1157
AND cu.crID=1001
AND cu.ceID=1167
AND cu.calDate<='2022-06-04'
AND cr2.user_ID IS NULL
GROUP BY cu.user_ID ORDER BY RAND() LIMIT 1
)
) AS foo order by RAND() LIMIT 1 ;
UPDATE:
As @JettoMartinez points out below, my current query could in fact randomly return the same user from each table, so the final returned entry would just be that same user. I didn't realize this in my struggles just to get the above query to work. Thus my original question, asking for a more optimized query that simply selects a single random entry from both tables (where that user is not already in the winners table), is applicable for yet another reason.
There are two ways I can think of (Do note that since I don't fully understand the tables, I'm not using all the conditions you used in your JOIN statements, meaning it might need more work):
Using an exclusive subquery:
SELECT
cu.user_ID,
cu.clID,
ch.cID
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
WHERE cu.user_ID NOT IN (
SELECT
user_ID
FROM
clubRaffleWinners
WHERE
-- other conditions
)
ORDER BY RAND() LIMIT 1;
Using a LEFT "OUTER" JOIN, as you asked for:
SELECT
cu.user_ID,
cu.clID,
ch.cID -- Or any relevant field from clubHistory, really
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
LEFT JOIN clubRaffleWinners cr ON cr.user_ID = cu.user_ID
AND ... -- other conditions to ensure uniqueness
AND ... -- that could also be in the WHERE part
WHERE
cr.user_ID IS NULL -- this will filter out the INNER part of the JOIN
ORDER BY RAND() LIMIT 1;
I don't have a dataset to properly test these queries, so please take them as a concept. I also didn't query clubHistory since I honestly don't see the point of doing so. Joining clubRaffleWinners to clubUsers seems enough to me.
EDIT
Since the user_ID in clubHistory is relevant to the raffle, I added a LEFT JOIN to it and added a field from said table in the SELECT statement, so that the user_ID repeats once per matching entry in clubHistory (or appears once if the user has no entries). This means a user's chance of winning is proportional to their number of clubHistory entries, with a minimum of one.
This logic can be applied to the first query with a subquery too, and if the added field needs to be out, the query could be wrapped in a CTE or a subquery.
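To check that the two exclusion styles above select the same candidates, here is a small sketch run against Python's built-in sqlite3 module instead of MySQL. The tables are cut down to the columns needed, the sample rows are invented, and `rafID = 18` stands in for the "other conditions" placeholder. The random ordering is left out so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clubUsers (user_ID INTEGER PRIMARY KEY);
CREATE TABLE clubRaffleWinners (
    user_ID INTEGER NOT NULL,
    rafID INTEGER NOT NULL
);
INSERT INTO clubUsers VALUES (1), (2), (3), (4);
-- Users 2 and 4 already won raffle 18; user 3 won a different raffle.
INSERT INTO clubRaffleWinners VALUES (2, 18), (4, 18), (3, 17);
""")

# Style 1: exclude winners with a NOT IN subquery.
NOT_IN = """
SELECT cu.user_ID
FROM clubUsers cu
WHERE cu.user_ID NOT IN (
    SELECT user_ID FROM clubRaffleWinners WHERE rafID = 18
)
ORDER BY cu.user_ID
"""

# Style 2: LEFT JOIN to the winners and keep only the unmatched rows.
ANTI_JOIN = """
SELECT cu.user_ID
FROM clubUsers cu
LEFT JOIN clubRaffleWinners cr
       ON cr.user_ID = cu.user_ID AND cr.rafID = 18
WHERE cr.user_ID IS NULL
ORDER BY cu.user_ID
"""

eligible_not_in = [row[0] for row in conn.execute(NOT_IN)]
eligible_anti_join = [row[0] for row in conn.execute(ANTI_JOIN)]
print(eligible_not_in, eligible_anti_join)
```

The winners column is declared NOT NULL here on purpose: if the subquery could return a NULL user_ID, the NOT IN form would return no rows at all, while the LEFT JOIN form would be unaffected.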
From what you are describing, I want to make sure I understand.
Every registered person qualifies for 1 entry.
However, if they have checked in, they get 1 entry for each time they checked in instead. So, someone who registered and has NEVER checked in gets 1 entry. But someone who registered and checked in 3 times gets a total of JUST the 3 entries for the check-ins, not 4 for also being registered.
Regardless of who is POSSIBLE, you want to EXCLUDE all people who have already been a winner in the raffle.
You SHOULD be able to get results from this below. Since the columns appear to be the same filtering on the cID, crID, ceID and Date, I have the primary FROM based on the registered clubUsers.
From that, a left-join to the clubHistory will either allow that person's ID to be returned once if only registered, OR multiple times based on the times checked in such as the example.
From the given user, I am also directly left-joining to the raffle winning history on the same criteria as the club history join, with the addition of rafID = 18, which appears to indicate the specific raffle being drawn for. Whether the person has a single entry or multiple entries, the IS NULL test in the final WHERE excludes them if they are found in the winners table.
The query will return all entries, single or multiple, for users that have not already won, order them by RAND(), and apply a single LIMIT 1 to get the final winner. I don't know why you needed what appeared to be the clubhouse ID when you only really care about WHO won, without any regard to whether there is a clubhouse history entry or not.
SELECT
cu.user_ID
FROM
clubUsers AS cu
LEFT JOIN clubHistory ch
on cu.user_ID = ch.user_ID
AND cu.cID = ch.cID
AND cu.crID = ch.crID
AND cu.ceID = ch.ceID
AND ch.chDate = '2022-06-04'
LEFT JOIN clubRaffleWinners AS crw
ON cu.user_ID = crw.user_ID
AND cu.cID = crw.cID
AND cu.crID = crw.crID
AND cu.ceID = crw.ceID
AND crw.chDate1 = '2022-06-04'
AND crw.rafID = 18
WHERE
cu.cID = 1157
AND cu.crID = 1001
AND cu.ceID = 1167
AND cu.calDate <= '2022-06-04'
AND crw.user_id IS NULL
order by
RAND()
LIMIT 1
For performance purposes, I would ensure the following indexes:
table              index
clubUsers          ( cid, crID, ceID, calDate, user_id )
clubHistory        ( user_id, cID, crID, ceID, chDate )
clubRaffleWinners  ( user_id, cID, crID, ceID, chDate1, rafID )
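To see how many raffle entries each user gets from this kind of double LEFT JOIN, the sketch below counts the joined rows per user. It runs on Python's built-in sqlite3 module, not MySQL, so `RANDOM()` replaces `RAND()`. The tables are reduced to a user_ID column (plus a hypothetical chID key) and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clubUsers (user_ID INTEGER PRIMARY KEY);
CREATE TABLE clubHistory (chID INTEGER PRIMARY KEY, user_ID INTEGER);
CREATE TABLE clubRaffleWinners (user_ID INTEGER);
INSERT INTO clubUsers VALUES (1), (2), (3);
-- User 1 never checked in, user 2 checked in three times, user 3 once.
INSERT INTO clubHistory (user_ID) VALUES (2), (2), (2), (3);
-- User 3 has already won and must be excluded.
INSERT INTO clubRaffleWinners VALUES (3);
""")

BASE = """
FROM clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
LEFT JOIN clubRaffleWinners crw ON crw.user_ID = cu.user_ID
WHERE crw.user_ID IS NULL
"""

# One joined row per check-in, or a single row when there are none.
entries = conn.execute(
    "SELECT cu.user_ID, COUNT(*) AS entries "
    + BASE
    + " GROUP BY cu.user_ID ORDER BY cu.user_ID"
).fetchall()
print(entries)

# Draw one winner; a user with more rows is proportionally more likely.
winner = conn.execute(
    "SELECT cu.user_ID " + BASE + " ORDER BY RANDOM() LIMIT 1"
).fetchone()[0]
print(winner)
```

In this data user 2 holds three of the four remaining rows, which matches the "check-ins replace the registration entry" reading described above.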
(Just a Comment, but need formatting.)
I would start by trying to put these 4 values in a single table, not repeated across 3 tables:
cu.cID=1157
AND cu.crID=1001
AND cu.ceID=1167
AND cu.calDate<='2022-06-04'
Please provide SHOW CREATE TABLE for each table; then I can assess whether the recommended indexes make sense.

MySQL View in place of subquery does not return the same result

The query below is grabbing some information about a category of toys and showing the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing - the sales table row ids are not necessarily in chronological order (e.g., a toy with a sale id of 5 could have been sold later than a toy with a sale id of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But like I said it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then replace the subquery with the view, like
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, this change causes the query to no longer grab the most recent sale, and the sales price it returns is no longer the most recent.
Why is it that the table view doesn't return the same result as the same select as a subquery?
After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought that this approach could only be used when querying a single table, but indeed it works even when you've joined many other tables. You just have to left join the same table twice using the s.date_sold < s2.date_sold join condition and make sure the where clause looks for the null value in the second table's id column.
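A minimal sketch of that self-join, run with Python's built-in sqlite3 module rather than MySQL. Only the sales table is modelled, the rows are invented, and the `invalid = 0` filter from the original query is left out to keep it short (it would have to apply to both `s` and `s2`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    catalog_product_id INTEGER,
    condition_id INTEGER,
    date_sold TEXT,      -- ISO dates, so text comparison is chronological
    sold_price REAL
);
INSERT INTO sales VALUES
    (5,  100, 1, '2023-03-01', 30.0),  -- lower id but the later sale
    (10, 100, 1, '2023-01-15', 25.0),
    (11, 100, 2, '2023-02-01', 12.0),
    (12, 200, 1, '2023-02-10', 40.0),
    (13, 200, 1, '2023-02-20', 45.0);
""")

# For each sale s, look for a later sale s2 in the same group.
# If none exists (s2.id IS NULL), s is the most recent sale of its group.
LATEST_PER_GROUP = """
SELECT s.catalog_product_id, s.condition_id, s.date_sold, s.sold_price
FROM sales s
LEFT JOIN sales s2
       ON s2.catalog_product_id = s.catalog_product_id
      AND s2.condition_id = s.condition_id
      AND s.date_sold < s2.date_sold
WHERE s2.id IS NULL
ORDER BY s.catalog_product_id, s.condition_id
"""

latest = conn.execute(LATEST_PER_GROUP).fetchall()
for row in latest:
    print(row)
```

Unlike the GROUP BY over an ordered subquery or view, this does not depend on which row the engine happens to pick for non-aggregated columns. If two sales in a group share the same latest date_sold, both rows are returned.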

SQL: select non-archived and referenced entries among three tables

I got the following database layout. The table on the left - the "exhibit" is somehow the parent. The one in the middle - the "flyer" table describes some data about the exhibit and is therefore linked to it by ex_ID. Let's say it's like versioning. Only the latest "flyer" (the highest ID) is important. On the right side you got "text" parts that reference a row in the "flyer" table directly.
So here's the question. How can I retrieve in ONE statement all flyers a) with the highest ID to a non-archived exhibit and that are b) referenced by non-archived text-parts. For example, as shown in the picture, ID 1 and 4 of the flyer-table should not be returned.
For a SQL expert, this might be an easy question. But for all others, that's learning. So, please, no down-voting.
This statement returns only flyers with the highest id by exhibition, that are referenced by a non archived exhibition and have either not archived parts or no referenced parts at all:
SELECT
f.id fid,
f.ex_id,
e.id eid,
e.archived earchived,
p.id pid,
p.archived parchived
FROM
Flyer f
INNER JOIN (
SELECT
MAX(id) max_id
FROM
Flyer
GROUP BY
ex_id
) t
ON
f.id = t.max_id
INNER JOIN
Exhibition e
ON
f.ex_id = e.id
AND
e.archived = false
LEFT JOIN
Part p
ON
f.id = p.fl_id
AND
p.archived = false
;
Explanation
The INNER JOIN to the subselect will give us the highest id of the Flyer table by Exhibition id.
We then use an INNER JOIN to Exhibition to get the details of the exhibition, only if it is not archived,
and a LEFT JOIN to the Part table to get only those parts that are not archived or non-existent.
DEMO
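The statement can be tried out with a small sketch on Python's built-in sqlite3 module. The table layouts are reduced to the columns the query uses, the rows are invented (they are not the data from the picture in the question), and `archived = 0` is used in place of `false`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Exhibition (id INTEGER PRIMARY KEY, archived INTEGER);
CREATE TABLE Flyer (id INTEGER PRIMARY KEY, ex_id INTEGER);
CREATE TABLE Part (id INTEGER PRIMARY KEY, fl_id INTEGER, archived INTEGER);
-- Exhibition 1 is active, exhibition 2 is archived.
INSERT INTO Exhibition VALUES (1, 0), (2, 1);
-- Flyers 1 and 2 are versions for exhibition 1, flyer 3 is for exhibition 2.
INSERT INTO Flyer VALUES (1, 1), (2, 1), (3, 2);
-- Flyer 2 has one active and one archived part; flyer 1 has an active part.
INSERT INTO Part VALUES (1, 2, 0), (2, 2, 1), (3, 1, 0);
""")

QUERY = """
SELECT f.id AS fid, p.id AS pid
FROM Flyer f
INNER JOIN (
    SELECT MAX(id) AS max_id FROM Flyer GROUP BY ex_id
) t ON f.id = t.max_id
INNER JOIN Exhibition e ON f.ex_id = e.id AND e.archived = 0
LEFT JOIN Part p ON f.id = p.fl_id AND p.archived = 0
ORDER BY f.id, p.id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Flyer 1 is dropped because it is not the latest version, and flyer 3 because its exhibition is archived. If a flyer must be referenced by at least one non-archived part, changing the LEFT JOIN to an INNER JOIN would enforce that.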

I can't wrap my head around joins

So, alright, I have a few tables. My current query runs against a "historical" table. I want to do a join of some kind to get the most recent status from my Current table. These tables share a like column, called "ID"
Here's the structure
ddCurrent
-ID
-Location
-Status
-Time
ddHistorical
-CID (AI field to keep multiple records per site)
-ID
-Location
-Status
-Time
My goal now is to do a simple join to get all the variables from ddHistorical and the current Status from ddCurrent.
I know that they can be joined on ID since both of them have the same items in their ID tables, I just can't figure out which kind of join is appropriate or why?
I'm sure someone may provide a specific link that goes into great detail explaining, but I'll try to summarize it this way. When writing a query, I try to list the tables from the position of what table do I want to get data from and have that as my first table in the "FROM" clause. Then, do "JOIN" criteria to other tables based on relationships (such as IDs). In your example
FROM
ddHistorical ddH
INNER JOIN ddCurrent ddC
on ddH.ID = ddC.ID
In this case, with INNER JOIN (same as JOIN), the ddHistorical table is the left table (listed first, for my styling consistency and indentation) and ddCurrent is the right table. Notice that my ON criteria that joins them together is also left alias.column = right alias.column; again, this is just for mental correlation purposes.
An INNER JOIN (or JOIN) means a record MUST have a match on each side, otherwise it is discarded.
A LEFT JOIN means give me all records in the LEFT table (ddHistorical in this case), regardless of a matching in the right-side table (ddCurrent). Not practical in this example.
A RIGHT JOIN is the reverse... give me all records from the RIGHT-side table REGARDLESS of a matching record in the left side table. Most of the time you will see LEFT-JOINs more frequently than RIGHT-JOINs.
Now, a sample to mentally get the left-join. You work at a car dealership and have a master table of 10 cars that are sold. For a given month, you want to know what IS NOT selling. So, start with the master table of all cars and look at the sales table for what DID sell. If there is NO such sales activity, the right-side table will have NULL values.
select
M.CarID,
M.CarModel
from
MasterCarsList M
LEFT JOIN CarSales CS
on M.CarID = CS.CarID
AND month( CS.DateSold ) = 4
where
CS.CarID IS NULL
So, my LEFT join is based on a matching car ID -- AND -- the month of sales activity is 4 (April) as I may not care about sales for Jan-Mar -- but would also qualify year too, but this is a simple sample.
If there is no record in the Car Sales table it will have a NULL value for all columns. I just happen to care about the car ID column since that was the join basis. That is why I am including that in the WHERE clause. For all other types of cars that DO have a sale it will have a value.
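Here is a runnable sketch of the car example using Python's built-in sqlite3 module. The sample rows and the SaleID column are made up, and because SQLite has no `month()` function, `strftime('%m', ...)` is used as a stand-in. It also shows what changes when the month test is not part of the join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterCarsList (CarID INTEGER PRIMARY KEY, CarModel TEXT);
CREATE TABLE CarSales (
    SaleID INTEGER PRIMARY KEY, CarID INTEGER, DateSold TEXT
);
INSERT INTO MasterCarsList VALUES (1, 'Coupe'), (2, 'Sedan'), (3, 'Wagon');
-- The coupe sold in April, the sedan only in March, the wagon never.
INSERT INTO CarSales (CarID, DateSold) VALUES
    (1, '2024-04-03'), (2, '2024-03-28');
""")

# Month test inside ON: only April sales count as a match.
NOT_SOLD_IN_APRIL = """
SELECT M.CarID, M.CarModel
FROM MasterCarsList M
LEFT JOIN CarSales CS
       ON M.CarID = CS.CarID
      AND strftime('%m', CS.DateSold) = '04'
WHERE CS.CarID IS NULL
ORDER BY M.CarID
"""

# No month test: any sale counts as a match.
NEVER_SOLD = """
SELECT M.CarID, M.CarModel
FROM MasterCarsList M
LEFT JOIN CarSales CS ON M.CarID = CS.CarID
WHERE CS.CarID IS NULL
ORDER BY M.CarID
"""

not_sold_in_april = conn.execute(NOT_SOLD_IN_APRIL).fetchall()
never_sold = conn.execute(NEVER_SOLD).fetchall()
print(not_sold_in_april)
print(never_sold)
```

The sedan shows up in the first result because its March sale does not satisfy the ON condition, so the join leaves NULLs on the right side for it.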
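This is a common approach you will see in querying where someone is looking for all of one thing regardless of the other. Some use a WHERE NOT EXISTS ( subselect ) instead; depending on the database version and optimizer, that can perform slower because the subselect may be tested for every record, although many optimizers handle both forms similarly.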
Other examples may be you want a list of all employees of a company, and if they had some certification / training to show it... You still want all employees, but LEFT-JOINING to some certification/training table would expose those extra field as needed.
select
Emp.FullName,
Cert.DateCertified
FROM
Employees Emp
Left Join Certifications Cert
on Emp.EmpID = Cert.EmpID
Hopefully these samples help you understand better the relationship for queries, and now to actually provide answer for your needs.
If what you want is a list of all "Current" items and want to look at their historical past, I would use current FIRST. This might be if your current table of things is 50, but historically your table had 420 items. You don't care about the other 360 items, just those that are current and the history of those.
select
ddC.WhateverColumns,
ddH.WhateverHistoricalColumns
from
ddCurrent ddC
JOIN ddHistorical ddH
on ddC.ID = ddH.ID
If there is always a current field then a simple INNER JOIN will do it
SELECT a.CID, a.ID, a.Location, a.Status, a.Time, b.Status
FROM ddHistorical a
INNER JOIN ddCurrent b
ON a.ID = b.ID
An INNER JOIN will omit any ddHistorical rows that don't have a corresponding ID in ddCurrent.
A LEFT JOIN will include all ddHistorical rows, even if they don't have a corresponding ID in ddCurrent, but the ddCurrent values will be null (because they're unknown).
Also note that a LEFT JOIN is just a specific type of outer join. Don't bother with the others yet - 90% or more of what you'll ever do will be INNER or LEFT.
To include only those ddHistorical rows where the ID is in ddCurrent:
SELECT h.CID, h.ID, h.Location, h.Status, c.Status, h.Time
FROM ddHistorical h
INNER JOIN ddCurrent c ON h.ID = c.ID
If you want to include ddHistorical rows even if the ID isn't in ddCurrent:
SELECT h.CID, h.ID, h.Location, h.Status, c.Status, h.Time
FROM ddHistorical h
LEFT JOIN ddCurrent c ON h.ID = c.ID
If all ddHistorical rows happen to match an ID in ddCurrent, note that both queries will return the same result.
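The difference between the two queries is easy to see with one historical row whose ID is missing from ddCurrent. A minimal sketch on Python's built-in sqlite3 module, with invented rows and the table layout taken from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ddCurrent (
    ID INTEGER PRIMARY KEY, Location TEXT, Status TEXT, Time TEXT
);
CREATE TABLE ddHistorical (
    CID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID INTEGER, Location TEXT, Status TEXT, Time TEXT
);
INSERT INTO ddCurrent VALUES (1, 'North', 'up', '12:00');
-- Site 2 has history but no row in ddCurrent.
INSERT INTO ddHistorical (ID, Location, Status, Time) VALUES
    (1, 'North', 'down', '10:00'),
    (1, 'North', 'up',   '11:00'),
    (2, 'South', 'down', '10:30');
""")

COLUMNS = "SELECT h.CID, h.ID, h.Status, c.Status FROM ddHistorical h "

inner_rows = conn.execute(
    COLUMNS + "INNER JOIN ddCurrent c ON h.ID = c.ID ORDER BY h.CID"
).fetchall()
left_rows = conn.execute(
    COLUMNS + "LEFT JOIN ddCurrent c ON h.ID = c.ID ORDER BY h.CID"
).fetchall()

print(inner_rows)  # history for site 2 is dropped
print(left_rows)   # history for site 2 is kept, current Status is None
```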

MySQL query return unexpected values

Need to generate a courses list and count the following questions:
all
unanswered
answered but unchecked
My database structure is looking like that
https://docs.google.com/open?id=0B9ExyO6ktYcOenZ1WlBwdlY2R3c
Explanation for some of tables:
answer_chk_results - checked answers table. So if some answer doesn't exist on this table, it means it's unchecked
lesson_questions - lesson <-> question associations (by id) table
courses-lessons - courses <-> lessons associations (by id) table
Executing
SELECT
c.ID,
c. NAME,
COUNT(lq.id) AS Questions,
COUNT(
CASE
WHEN a.id IS NULL THEN
lq.id
END
) AS UnAnswered,
COUNT(
CASE
WHEN cr.id IS NULL THEN
lq.id
END
) AS UnChecked
FROM
courses c
LEFT JOIN `courses-lessons` cl ON cl.cid = c.id
LEFT JOIN `lesson_questions` lq ON lq.lid = cl.lid
LEFT JOIN answers a ON a.qid = lq.qid
LEFT JOIN answer_chk_results cr ON cr.aid = a.id
GROUP BY
c.ID
Tested it first on SQL fiddle with sample data. (Real data is huge, so I can't place it on sqlfiddle.) It returned some values, so I thought it worked well. But when I test it with real data, I see that it returns wrong values. For example, when I count manually, the count of all questions should be 25, but the query returns 27. Maybe I'm doing something wrong.
Note MySQL server running on my local machine, so I can give you teamviewer id and password if you want to connect to my desktop remotely and test query with real data.
I suspect the problem is that different joins are resulting in a multiplication of rows. The best way to fix this is by using subqueries along each dimension. The following is a more practical way. Replace the COUNTs in the select with COUNT DISTINCT:
SELECT c.ID, c. NAME,
COUNT(distinct lq.id) AS Questions,
COUNT(distinct CASE WHEN a.id IS NULL THEN lq.id END) AS UnAnswered,
COUNT(distinct CASE WHEN cr.id IS NULL THEN lq.id END) AS UnChecked
Compared to COUNT, COUNT DISTINCT is a resource hog (it has to remove duplicates). However, it will probably work fine for your purposes.
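The row multiplication is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with a cut-down, assumed version of the schema (the real one is only linked, so the column lists here are guesses based on the query) and one question that has two answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE `courses-lessons` (cid INTEGER, lid INTEGER);
CREATE TABLE lesson_questions (id INTEGER PRIMARY KEY, lid INTEGER, qid INTEGER);
CREATE TABLE answers (id INTEGER PRIMARY KEY, qid INTEGER);
INSERT INTO courses VALUES (1, 'SQL');
INSERT INTO `courses-lessons` VALUES (1, 10);
-- One lesson with two questions.
INSERT INTO lesson_questions VALUES (1, 10, 100), (2, 10, 101);
-- Question 100 was answered twice, question 101 not at all.
INSERT INTO answers VALUES (1, 100), (2, 100);
""")

QUERY = """
SELECT c.id,
       COUNT(lq.id) AS plain_count,
       COUNT(DISTINCT lq.id) AS questions,
       COUNT(DISTINCT CASE WHEN a.id IS NULL THEN lq.id END) AS unanswered
FROM courses c
LEFT JOIN `courses-lessons` cl ON cl.cid = c.id
LEFT JOIN lesson_questions lq ON lq.lid = cl.lid
LEFT JOIN answers a ON a.qid = lq.qid
GROUP BY c.id
"""

result = conn.execute(QUERY).fetchone()
print(result)
```

The plain COUNT sees question 100 once per answer and reports three questions, while COUNT(DISTINCT ...) reports the two that actually exist.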
Use this query
SELECT
c.ID,
c.NAME,
COUNT(lq.id) AS Questions,
COUNT(IF(a.id IS NULL, lq.id, NULL)) AS UnAnswered,
COUNT(IF(cr.id IS NULL, lq.id, NULL)) AS UnChecked
FROM courses c
LEFT JOIN `courses-lessons` cl ON cl.cid = c.id
LEFT JOIN `lesson_questions` AS lq ON lq.lid = cl.lid
LEFT JOIN answers a ON a.qid = lq.qid
LEFT JOIN answer_chk_results cr ON cr.aid = a.id
GROUP BY c.ID