How do you convert a join query into a nested query (using a "WHERE ... IN" condition)?
For example, how would this be converted into a nested query?
SELECT student.studentname,
schedule.subcode,
AVG(attendance.ispresent)*100 AS Attendance_Status
FROM student
JOIN attendance
ON student.usn = attendance.usn
JOIN schedule
ON schedule.sched_id = attendance.sched_id
WHERE student.usn="4jc14is008"
GROUP BY schedule.subcode
ORDER BY schedule.subcode;
This will be evaluated as an NLJ ("Nested Loop Join").
1. Look in student, filtering on usn = '...'.
2. For each such row (probably one in this case), do an NLJ into attendance to locate all rows that satisfy the given ON condition. This leads to a longer (or shorter) list of rows.
3. For each of the rows from step 2, do an NLJ into schedule based on its ON condition. Now there is a set of rows...
4. Do the GROUP BY and ORDER BY (these can be done simultaneously since they are identical). This will deliver no more rows than step 3 gave. (GROUP BY can never increase the number of rows.)
Do not rewrite the query into WHERE x IN ( SELECT ... ); that usually makes it run slower.
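For reference, the two forms can be compared on a toy dataset. This is only a sketch: SQLite (via Python's sqlite3) stands in for MySQL, the sample rows and student names are made up, and studentname is left out of the SELECT list to keep the nested form short. It shows that a WHERE ... IN rewrite returns the same rows; it says nothing about which form runs faster on a real server.

```python
import sqlite3

# Hypothetical miniature schema and rows; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student    (usn TEXT PRIMARY KEY, studentname TEXT);
    CREATE TABLE schedule   (sched_id INTEGER PRIMARY KEY, subcode TEXT);
    CREATE TABLE attendance (usn TEXT, sched_id INTEGER, ispresent INTEGER);
    INSERT INTO student  VALUES ('4jc14is008', 'Asha'), ('4jc14is009', 'Ravi');
    INSERT INTO schedule VALUES (1, 'CS101'), (2, 'CS101'), (3, 'MA201');
    INSERT INTO attendance VALUES
        ('4jc14is008', 1, 1), ('4jc14is008', 2, 0), ('4jc14is008', 3, 1),
        ('4jc14is009', 1, 0);
""")

# The join form from the question (without studentname).
join_sql = """
    SELECT schedule.subcode, AVG(attendance.ispresent) * 100
    FROM student
    JOIN attendance ON student.usn = attendance.usn
    JOIN schedule   ON schedule.sched_id = attendance.sched_id
    WHERE student.usn = ?
    GROUP BY schedule.subcode
    ORDER BY schedule.subcode
"""

# A nested form: the student filter becomes WHERE ... IN, and the
# subject code is looked up with a correlated scalar subquery.
nested_sql = """
    SELECT (SELECT subcode FROM schedule
            WHERE schedule.sched_id = attendance.sched_id) AS subcode,
           AVG(attendance.ispresent) * 100
    FROM attendance
    WHERE attendance.usn IN (SELECT usn FROM student WHERE usn = ?)
    GROUP BY subcode
    ORDER BY subcode
"""

join_rows = conn.execute(join_sql, ("4jc14is008",)).fetchall()
nested_rows = conn.execute(nested_sql, ("4jc14is008",)).fetchall()
print(join_rows)
print(nested_rows)
```

Both queries should list one row per subject code with the attendance percentage for that student.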
What I'm after is to see what is the fastest lap time for particular races, which will be identified by using race name and race date.
SELECT lapName AS Name, lapDate AS Date, T
FROM Lap
INNER JOIN Race ON Lap.lapName = Race.Name
AND Lap.lapDate = Race.Date
GROUP BY Date;
It currently only displays 3 different race names, with 4 different dates, meaning I've got 4 combinations total, when there are in fact 9 unique race name, race date combinations.
Unique race data is stored in the Race table. Laptimes are stored in the LapInfo table.
I'm also getting a warning about my group statement saying it is ambiguous though it still runs.
You don't seem to need a join for this:
SELECT l.lapRaceName, l.lapRaceDate,
MIN(l.lapTime)
FROM LapInfo l
GROUP BY l.lapRaceName, l.lapRaceDate;
If you don't need a JOIN, it is superfluous to put one in the query.
First of all, your query is actually invalid SQL. You need to use the MIN function to get the fastest lapTime. Also, you have to GROUP BY lapRaceName, raceDate instead of just lapRaceName. Unfortunately, in this case, MySQL is lax enough to execute it without error.
Also, you JOIN LapInfo with Race, and return the joined columns from LapInfo, aliased to names that can be found in Race. That's OK from an SQL point of view, but it's also uselessly complicated: return the columns directly from the Race table, as they already have the names that you are looking for.
Finally, it would be far better to indicate which table each column belongs to. Here, column lapTime belongs to table LapInfo, so let's make it explicit.
Query :
SELECT
Race.raceName,
Race.raceDate,
MIN(LapInfo.lapTime)
FROM
Race
INNER JOIN LapInfo
ON LapInfo.lapRaceName = Race.raceName
AND LapInfo.lapRaceDate = Race.raceDate
GROUP BY
Race.raceName,
Race.raceDate
;
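A small check of why both columns belong in the GROUP BY, as a sketch only: SQLite stands in for MySQL and the lap rows are invented. Grouping by date alone merges races that share a date, which matches the symptom of seeing fewer combinations than expected.

```python
import sqlite3

# Invented sample laps; two different races share the date 2015-09-06.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LapInfo (lapRaceName TEXT, lapRaceDate TEXT, lapTime REAL);
    INSERT INTO LapInfo VALUES
        ('Monaco', '2015-05-24', 78.1),
        ('Monaco', '2015-05-24', 77.4),
        ('Monaco', '2016-05-29', 79.0),
        ('Monza',  '2015-09-06', 85.3),
        ('Monza',  '2015-09-06', 86.0),
        ('Spa',    '2015-09-06', 110.2);
""")

# Fastest lap per (race name, race date) combination.
fastest = conn.execute("""
    SELECT lapRaceName, lapRaceDate, MIN(lapTime)
    FROM LapInfo
    GROUP BY lapRaceName, lapRaceDate
    ORDER BY lapRaceName, lapRaceDate
""").fetchall()

# Number of groups when grouping by the date only.
groups_by_date_only = conn.execute("""
    SELECT COUNT(*) FROM (SELECT lapRaceDate FROM LapInfo GROUP BY lapRaceDate)
""").fetchone()[0]

for row in fastest:
    print(row)
print("groups when grouping by date only:", groups_by_date_only)
```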
I have a MySQL query of this form:
SELECT
employee.name,
totalpayments.totalpaid
FROM
employee
JOIN (
SELECT
paychecks.employee_id,
SUM(paychecks.amount) totalpaid
FROM
paychecks
GROUP BY
paychecks.employee_id
) totalpayments on totalpayments.employee_id = employee.id
I've recently found that this returns MUCH faster in this form:
SELECT
employee.name,
(
SELECT
SUM(paychecks.amount)
FROM
paychecks
WHERE
paychecks.employee_id = employee.id
) totalpaid
FROM
employee
It surprises me that there would be a difference in speed, and that the lower query would be faster. I prefer the upper form for development, because I can run the subquery independently.
Is there a way to get the "best of both worlds": speedy results return AND being able to run the subquery in isolation?
Likely, the correlated subquery is able to make effective use of an index, which is why it's fast, even though that subquery has to be executed multiple times.
For the first query, the inline view causes MySQL to create a derived table, and for large sets, that's effectively a MyISAM table.
In MySQL 5.6.x and later, the optimizer may choose to add an index on the derived table, if that would allow a ref operation and the estimated cost of the ref operation is lower than the nested loops scan.
I recommend you try using EXPLAIN to see the access plan. (Based on your report of performance, I suspect you are running on MySQL version 5.5 or earlier.)
The two statements are not entirely equivalent in the case where there are rows in employee for which there are no matching rows in paychecks.
An equivalent result could be obtained entirely avoiding a subquery:
SELECT e.name
, SUM(p.amount) AS total_paid
FROM employee e
JOIN paychecks p
ON p.employee_id = e.id
GROUP BY e.id
(Use an inner join to get a result equivalent to the first query, use a LEFT outer join to be equivalent to the second query. Wrap the SUM() aggregate in an IFNULL function if you want to return a zero rather than a NULL value when no matching row with a non-null value of amount is found in paychecks.)
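The difference between the three forms shows up as soon as one employee has no paychecks. The following is a minimal sketch with made-up rows, using SQLite through Python's sqlite3 in place of MySQL (IFNULL exists in both).

```python
import sqlite3

# Made-up rows: employee 3 ('Cy') has no paychecks at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE paychecks (employee_id INTEGER, amount REAL);
    INSERT INTO employee  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO paychecks VALUES (1, 100.0), (1, 150.0), (2, 200.0);
""")

# Inner join: employees without paychecks disappear from the result.
inner_rows = conn.execute("""
    SELECT e.name, SUM(p.amount)
    FROM employee e
    JOIN paychecks p ON p.employee_id = e.id
    GROUP BY e.id
    ORDER BY e.id
""").fetchall()

# Left join with IFNULL: every employee is kept, missing totals become 0.
left_rows = conn.execute("""
    SELECT e.name, IFNULL(SUM(p.amount), 0)
    FROM employee e
    LEFT JOIN paychecks p ON p.employee_id = e.id
    GROUP BY e.id
    ORDER BY e.id
""").fetchall()

# Correlated subquery: every employee is kept, missing totals are NULL.
correlated_rows = conn.execute("""
    SELECT e.name,
           (SELECT SUM(p.amount) FROM paychecks p WHERE p.employee_id = e.id)
    FROM employee e
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
print(correlated_rows)
```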
Conceptually, a join is a Cartesian product: every record of table A is combined with every record of table B. The output will be
number of records in table A * number of records in table B = rows in the new table
10 * 10 = 100
and out of those 100 records, only the ones that match the filters are returned by the query.
In a nested query, there is a simple inner query, and only the records it returns become the input to the outer query, which is why a nested query can be faster than a join.
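The row-count arithmetic can be checked directly. This sketch uses two hypothetical ten-row tables in SQLite; it only counts rows of the logical result and does not show how a database engine actually executes a join, which normally uses indexes rather than building the full product.

```python
import sqlite3

# Two hypothetical tables with ten rows each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER);
""")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(10)])

# Unfiltered product: every row of a paired with every row of b.
cross_count = conn.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]

# Join with a condition: only the matching pairs remain.
matched_count = conn.execute(
    "SELECT COUNT(*) FROM a JOIN b ON b.a_id = a.id"
).fetchone()[0]

print(cross_count, matched_count)
```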
I have data that resemble stock data that is being updated every hour. So there are 24 entries every day for each stock. (just using stock as an example). But sometimes, the data may not be updated.
For example, let's assume we have 3 stocks, A, B, C. And assume that we gather data at various intervals during the day for each stock. The data would look something like this...
row A B C
1 3 4 5
2 3.5 4.1 5
3 2.9 3.8 4.3
What I want is to sum up the average value of each stock for this time period or
Avg(A) + Avg(B) + Avg(C)
In reality I have hundreds of stocks and hundreds of thousands of rows. I need this to calculate for a single day.
I tried this (stock names are in an array - stocks = array('A','B','C'))
SELECT SUM(AVG(stock_price)) FROM table WHERE date = [mydate] AND stock_name IN ('".implode("','", $stocks)."') GROUP BY stock_name
but that didn't work. Can someone provide some insight?
Thanks, in advance.
Calculate the per-stock averages in a sub-query, then sum them in the main query.
SELECT SUM(average_price) AS total_averages
FROM (SELECT AVG(price) AS average_price
FROM table
WHERE <conditions>
GROUP BY stock_name) AS averages
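Applied to the three-row example from the question, the sub-query approach can be sketched as follows. SQLite stands in for MySQL, and the table name quotes and the date value are hypothetical (the question's table is simply called "table").

```python
import sqlite3

# The A/B/C values from the question, stored one row per stock and reading.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (stock_name TEXT, stock_price REAL, date TEXT)")
readings = {"A": [3, 3.5, 2.9], "B": [4, 4.1, 3.8], "C": [5, 5, 4.3]}
for name, prices in readings.items():
    conn.executemany(
        "INSERT INTO quotes VALUES (?, ?, '2014-01-01')",
        [(name, price) for price in prices],
    )

# Inner query: one average per stock. Outer query: sum of those averages.
total = conn.execute("""
    SELECT SUM(average_price) AS total_averages
    FROM (SELECT AVG(stock_price) AS average_price
          FROM quotes
          WHERE date = '2014-01-01'
            AND stock_name IN ('A', 'B', 'C')
          GROUP BY stock_name) AS averages
""").fetchone()[0]

print(round(total, 4))
```

The result is Avg(A) + Avg(B) + Avg(C) = 9.4/3 + 11.9/3 + 14.3/3, roughly 11.8667.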
One way to do it, use an inline view as a rowsource:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT AVG(t.stock_price) AS avg_stock_price
FROM table t
WHERE t.date = [mydate]
AND t.stock_name IN ('a','b','c')
GROUP BY t.stock_name
) a
You can run just the query from the inline view (aliased as a) to verify the results it returns. The outer query runs against the set of rows returned by the inline view query. (MySQL refers to the inline view, aliased as a, as a "derived table".)
The outer query is effectively like this:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM a
The "trick" is that "a" isn't a regular table, it's a set of rows returned by a query; but in terms of relational algebra theory, it works the same... it's a set of rows. If a were a regular table, we could write:
SELECT b.col
FROM (
SELECT col FROM a
) b
We don't want to do that in MySQL when we don't have to, because of the inefficient way that MySQL processes that. MySQL first runs the inner query (the query in the inline view). MySQL creates a temporary MyISAM table, and inserts the rows returned by the query into the temporary MyISAM table. MySQL then runs the outer query, against that temporary table (which MySQL refers to as a "derived table") to return the result. Creating and populating a temporary table that's a copy of a regular table is a lot of overhead, especially with large sets.
What makes this powerful is that the inline view query can include JOINs, a WHERE clause, aggregates, GROUP BY, whatever. As long as it returns a set of rows (with appropriate column names), we can wrap the query in parens, and reference it in another query as if it were a table.
In my application I have two MySQL tables, 'units' and 'impressions', in a one-to-many relation. I need to fetch a list of all ad units from the units table and also fetch the impressions count for each ad unit.
I have two SELECT queries to do this task (simplified for this example), first using sub-select:
SELECT
(SELECT COUNT(*) FROM impressions WHERE impression_unit_id = unit_id) AS impressions_count,
unit_id
FROM units;
and second using GROUP BY:
SELECT
COUNT(impression_id) AS impressions_count,
unit_id
FROM units
LEFT JOIN impressions ON impression_unit_id = unit_id
GROUP BY unit_id;
Sub-select query runs for each record (ad unit) so GROUP BY looks smarter but it has one JOIN more. Which one to prefer for performance?
The GROUP BY query will perform better. The query optimizer might optimize the first query to use a join, but I wouldn't count on it since it is written to use a dependent sub-query, which will be much slower. As long as the tables are properly indexed, JOINs should not be a major concern for performance.
The first query, if it doesn't get optimized to use a JOIN, will have to run the sub-query for each row in the units table, whereas the JOIN query does it all in one operation.
To find out how the query gets optimized, run an EXPLAIN of both queries. If the first one uses a dependent sub-query, it will be slower.
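As a rough illustration of comparing the two forms and inspecting their plans, here is a sketch on a tiny made-up dataset. SQLite stands in for MySQL, so the plan text comes from SQLite's EXPLAIN QUERY PLAN and will not look like MySQL's EXPLAIN output; the index name is hypothetical.

```python
import sqlite3

# Made-up rows: unit 3 has no impressions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE units (unit_id INTEGER PRIMARY KEY);
    CREATE TABLE impressions (
        impression_id INTEGER PRIMARY KEY,
        impression_unit_id INTEGER
    );
    CREATE INDEX idx_impressions_unit ON impressions (impression_unit_id);
    INSERT INTO units VALUES (1), (2), (3);
    INSERT INTO impressions (impression_unit_id) VALUES (1), (1), (2);
""")

subselect_sql = """
    SELECT unit_id,
           (SELECT COUNT(*) FROM impressions
            WHERE impression_unit_id = unit_id) AS impressions_count
    FROM units
    ORDER BY unit_id
"""
group_by_sql = """
    SELECT unit_id, COUNT(impression_id) AS impressions_count
    FROM units
    LEFT JOIN impressions ON impression_unit_id = unit_id
    GROUP BY unit_id
    ORDER BY unit_id
"""

subselect_rows = conn.execute(subselect_sql).fetchall()
group_by_rows = conn.execute(group_by_sql).fetchall()

# The last column of each plan row is the human-readable description.
subselect_plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + subselect_sql)]
group_by_plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + group_by_sql)]

print(subselect_rows, group_by_rows)
print(subselect_plan)
print(group_by_plan)
```

Both queries return the same counts, including a zero for the unit without impressions; which one is faster depends on the engine, the indexes, and the data volume.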
I have followed the tutorial over at tizag for the MAX() mysql function and have written the query below, which does exactly what I need. The only trouble is I need to JOIN it to two more tables so I can work with all the rows I need.
$query = "SELECT idproducts, MAX(date) FROM results GROUP BY idproducts ORDER BY MAX(date) DESC";
I have this query below, which has the JOIN I need and works:
$query = ("SELECT *
FROM operators
JOIN products
ON operators.idoperators = products.idoperator JOIN results
ON products.idProducts = results.idproducts
ORDER BY drawndate DESC
LIMIT 20");
Could someone show me how to merge the top query with the JOIN element from my second query? I am new to PHP and MySQL, this being my first adventure into a computer language. I have read and tried really hard to get those two queries to work, but I am at a brick wall. I cannot work out how to add the JOIN element to the first query :(
Could some kind person take pity on a newb and help me?
Try this query.
SELECT
*
FROM
operators
JOIN products
ON operators.idoperators = products.idoperator
JOIN
(
SELECT
idproducts,
MAX(date)
FROM results
GROUP BY idproducts
) AS t
ON products.idproducts = t.idproducts
ORDER BY drawndate DESC
LIMIT 20
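The pattern of joining to a grouped derived table can be tried out on a small dataset. This is a sketch under assumptions: SQLite stands in for MySQL, the columns opname and prodname and all rows are invented, the aggregate is given the alias latest so it can be referenced, and the sort is done on that alias rather than on drawndate.

```python
import sqlite3

# Invented rows; opname, prodname and the alias "latest" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operators (idoperators INTEGER PRIMARY KEY, opname TEXT);
    CREATE TABLE products  (idproducts INTEGER PRIMARY KEY,
                            idoperator INTEGER, prodname TEXT);
    CREATE TABLE results   (idresults INTEGER PRIMARY KEY,
                            idproducts INTEGER, date TEXT);
    INSERT INTO operators VALUES (1, 'OpA'), (2, 'OpB');
    INSERT INTO products  VALUES (10, 1, 'P10'), (11, 2, 'P11');
    INSERT INTO results (idproducts, date) VALUES
        (10, '2013-01-05'), (10, '2013-02-01'), (11, '2013-01-20');
""")

# The derived table t holds one row per product with its latest date.
latest_rows = conn.execute("""
    SELECT o.opname, p.prodname, t.latest
    FROM operators o
    JOIN products p ON o.idoperators = p.idoperator
    JOIN (SELECT idproducts, MAX(date) AS latest
          FROM results
          GROUP BY idproducts) AS t
      ON p.idproducts = t.idproducts
    ORDER BY t.latest DESC
    LIMIT 20
""").fetchall()

print(latest_rows)
```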
JOINs function somewhat independently of aggregation functions; they just change the intermediate result-set upon which the aggregate functions operate. I like to point to the way the MySQL documentation is written, which uses the term 'table_reference' in the SELECT syntax, and expands on what that means in the JOIN syntax. Basically, any simple query which has a table specified can simply expand that table to a complete JOIN clause and the query will operate the same basic way, just with a modified intermediate result-set.
I say "intermediate result-set" to hint at the mindset which helped me understand JOINs and aggregation. Understanding the order in which MySQL builds your final result is critical to knowing how to reliably get the results you want. Generally, it starts by looking at the first row of the first table you specify after 'FROM', and decides if it might match by looking at 'WHERE' clauses. If it is not immediately discardable, it attempts to JOIN that row to the first JOIN specified, and repeats the "will this be discarded by WHERE?" check. This repeats for all JOINs, which either add rows to your result set, or remove them, or leave just the one, as appropriate for your JOINs, WHEREs and data. This process builds what I am referring to when I say "intermediate result-set". Somewhere between starting and finishing your complete query, MySQL has in its memory a potentially massive table-like structure of data which it built using the process I just described. Only then does it begin to aggregate (GROUP) the results according to your criteria.
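The idea of an intermediate result-set can be made visible by running the join once without and once with the aggregation. A minimal sketch with invented rows, using SQLite in place of MySQL; it shows the logical row counts only, not the engine's actual execution order.

```python
import sqlite3

# Invented rows: product 10 has two results, product 11 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (idproducts INTEGER PRIMARY KEY);
    CREATE TABLE results  (idproducts INTEGER, date TEXT);
    INSERT INTO products VALUES (10), (11);
    INSERT INTO results  VALUES
        (10, '2013-01-05'), (10, '2013-02-01'), (11, '2013-01-20');
""")

# The joined rows before any grouping: the intermediate result-set.
intermediate = conn.execute("""
    SELECT p.idproducts, r.date
    FROM products p
    JOIN results r ON r.idproducts = p.idproducts
    ORDER BY p.idproducts, r.date
""").fetchall()

# The same join, now collapsed to one row per product by GROUP BY.
aggregated = conn.execute("""
    SELECT p.idproducts, MAX(r.date)
    FROM products p
    JOIN results r ON r.idproducts = p.idproducts
    GROUP BY p.idproducts
    ORDER BY p.idproducts
""").fetchall()

print(len(intermediate), intermediate)
print(len(aggregated), aggregated)
```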
So for your query, it depends on what specifically you are going for (not entirely clear in OP). If you simply want the MAX(date) from the second query, you can simply add that expression to the SELECT clause and then add an aggregation spec to the end:
SELECT *, MAX(date)
FROM operators
...
GROUP BY idproducts
ORDER BY ...
Alternatively, you can add the JOIN section of the second query to the first.