Leetcode problem link: https://leetcode.com/problems/rising-temperature/
The solution that I don't understand:
SELECT
weather.id AS 'Id'
FROM
weather
JOIN
weather w ON DATEDIFF(weather.recordDate, w.recordDate) = 1
AND weather.Temperature > w.Temperature
;
Here weather and w (an alias of weather) are the same table, and DATEDIFF is comparing dates, but I don't understand this: if weather and w are the same table, doesn't that mean DATEDIFF is comparing the same rows? The solution is correct, which means the rows being compared are not the same. How?
The table is the same and the columns being compared are the same, but it is not the same rows that satisfy the conditions on both sides of the join.
Simply put, a table join is the Cartesian product of the rows of the tables involved, which means that a self join of a table with 3 rows will return 9 rows.
When you set conditions, the result set is filtered and only the combined rows that satisfy them are returned. In this exercise, the conditions relate a row from the first instance of the table to a row from the second instance.
It is better if you alias both copies of the table:
SELECT w1.id
FROM weather w1 JOIN weather w2
ON DATEDIFF(w1.recordDate, w2.recordDate) = 1 AND w1.Temperature > w2.Temperature;
What this query does is join every row of the table with every other row of the same table whose recordDate is the previous day and whose Temperature is lower, and it returns the id of the first copy (whenever the conditions are satisfied).
In fact, all rows of the table are compared against all rows, but when a row is paired with itself it is rejected, because the conditions fail for that pair.
Also, note that your query may return the same id more than once, because for a given row there may exist more than one other row whose date is one day earlier and whose temperature is lower.
So maybe you want:
SELECT DISTINCT w1.id
.....................
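To make the mechanics concrete, here is a minimal self-contained sketch; the sample data below is invented purely for illustration (the table and column names follow the LeetCode schema):

-- Hypothetical sample data:
-- weather
-- | id | recordDate | Temperature |
-- |  1 | 2015-01-01 |          10 |
-- |  2 | 2015-01-02 |          25 |
-- |  3 | 2015-01-03 |          20 |
-- |  4 | 2015-01-04 |          30 |

SELECT DISTINCT w1.id
FROM weather w1
JOIN weather w2
  ON DATEDIFF(w1.recordDate, w2.recordDate) = 1
 AND w1.Temperature > w2.Temperature;

-- The join first forms all 4 * 4 = 16 row pairs, then keeps only the pairs
-- where w1 is exactly one day after w2 and warmer:
--   (w1 = id 2, w2 = id 1): 2015-01-02 vs 2015-01-01, 25 > 10  -> kept
--   (w1 = id 4, w2 = id 3): 2015-01-04 vs 2015-01-03, 30 > 20  -> kept
-- A row paired with itself always fails DATEDIFF(...) = 1, so it is rejected.
-- Result: ids 2 and 4.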
Related
Am I understanding what Left Join is supposed to do?
I have a query. Call it Query A. It returns 19 records.
I have another query, Query B. It returns 1,400 records.
I insert Query B into Query A as a left join, so Query A becomes:
SELECT *
FROM tableA
LEFT JOIN (<<entire SQL of Query B>>) ON tableA.id = tableB.id
Now, a Left Join means everything from Table A, and only records from Table B where they match. So no matter what, this mixed query should not return more than the 19 records that the original Query A returns. What I actually get is 1,000 records.
Am I fundamentally misunderstanding how LEFT JOIN works?
You're not exactly misunderstanding LEFT JOIN, so much as the results implied by it. If you have only one row in A, and 1000 in B that reference to the id of that single row in A; your result will be 1000 rows. You're overlooking that the relation may be 1-to-many. The size of the "left" table/subquery (subject to WHERE conditions) is the lower bound for the number of results.
Yes, you are misunderstanding slightly.
Now, a Left Join means everything from Table A, and only records from Table B where they match.
So far, so good: the data from Table B will be included only if it matches against Table A according to the rules you specify in the ON clause.
So no matter what, this mixed query should not return more than the 19 records that the original Query A returns.
This seems like it makes sense, until you realise that more than one row in Table B can match the same row in Table A.
Let's say you have 2 rows in Table A, one with A_ID=1 and one with A_ID=3; and 10 rows in Table B; 5 of the rows in Table B have A_ID=1, and 5 have A_ID=2. All the rows in Table B have different values for B_ID.
If you use a Left Join with the condition that A_ID must match, which rows will you get?
The row from Table A with A_ID=3 will show up once, with a NULL value for B_ID, because there is no row in Table B to match it.
The 5 rows from Table B with A_ID=2 won't show up at all, because they don't match any rows from Table A.
The 5 rows from Table B with A_ID=1 will all show up, each partnered with the 1 row from Table A with A_ID=1.
So you get 6 results, even though there were only 2 rows in Table A.
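A quick sketch that reproduces this result (the table and column names are made up to match the example above):

-- Hypothetical schema and data: 2 rows in Table A, 10 rows in Table B
CREATE TABLE TableA (A_ID INT);
CREATE TABLE TableB (B_ID INT, A_ID INT);

INSERT INTO TableA VALUES (1), (3);
INSERT INTO TableB VALUES
  (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),    -- 5 rows with A_ID = 1
  (6, 2), (7, 2), (8, 2), (9, 2), (10, 2);   -- 5 rows with A_ID = 2

SELECT a.A_ID, b.B_ID
FROM TableA a
LEFT JOIN TableB b ON b.A_ID = a.A_ID;

-- Returns 6 rows: five for A_ID = 1 (one per matching row in TableB),
-- and one for A_ID = 3 with B_ID = NULL.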
I have two tables called tc_revenue and tc_rates.
tc_revenue contains: code, revenue, startDate, endDate
tc_rates contains: code, tier, payout, startDate, endDate
Now I need to get records where code = 100, and the records should be unique.
I have used this query
SELECT *
FROM task_code_rates
LEFT JOIN task_code_revenue ON task_code_revenue.code = task_code_rates.code
WHERE task_code_rates.code = 105;
But I am getting repeated records; please help me find the correct solution.
e.g. in this example every record is repeated 2 times.
Thanks
Use a group by for whatever field you need unique. For example, if you want one row per code, then:
SELECT *
FROM task_code_rates
LEFT JOIN task_code_revenue ON task_code_revenue.code = task_code_rates.code
WHERE task_code_rates.code = 105
GROUP BY task_code_revenue.code, task_code_revenue.tier
If code admits duplicates in both tables and you join only on code, then you will get the Cartesian product of all matching rows from one table with all matching rows from the other.
If you have 5 records with code 100 in the first table and 2 records with code 100 in the second table, you'll get 5 times 2 = 10 results, all combinations of matching rows from the left and the right.
Unless you have duplicates inside one (or both) tables, all 10 results will differ in columns coming from one table, the other, or both.
But if you were expecting to get two combined rows plus three rows from the first table with NULLs for the second table's columns, that will not happen.
This is how joins work; and anyway, how could the database decide which rows to combine if it didn't generate all combinations and let you narrow them down in the WHERE clause?
Maybe you need to add more criteria to the ON clause, such as also matching dates?
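For example, if matching rows should also cover the same date range, the join might look like the sketch below (whether the date columns are the right linking columns is an assumption; adjust to whatever actually relates the two tables):

SELECT *
FROM task_code_rates r
LEFT JOIN task_code_revenue v
  ON v.code = r.code
 AND v.startDate = r.startDate
 AND v.endDate = r.endDate
WHERE r.code = 105;

-- With the extra conditions, each rates row matches at most one revenue row,
-- so the duplicates disappear (assuming the date ranges line up one-to-one).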
I have data that resembles stock data and is updated every hour, so there are 24 entries every day for each stock (just using stocks as an example). But sometimes the data may not be updated.
For example, let's assume we have 3 stocks, A, B, C. And assume that we gather data at various intervals during the day for each stock. The data would look something like this...
row    A      B      C
1      3      4      5
2      3.5    4.1    5
3      2.9    3.8    4.3
What I want is to sum up the average value of each stock for this time period or
Avg(A) + Avg(B) + Avg(C)
In reality I have hundreds of stocks and hundreds of thousands of rows, and I need to calculate this for a single day.
I tried this (the stock names are in a PHP array: $stocks = array('A','B','C')):
SELECT SUM(AVG(stock_price)) FROM table WHERE date = [mydate] AND stock_name IN ('".implode("','", $stocks)."') GROUP BY stock_name
but that didn't work. Can someone provide some insight?
Thanks, in advance.
Calculate the per-stock averages in a sub-query, then sum them in the main query.
SELECT SUM(average_price) AS total_averages
FROM (SELECT AVG(price) AS average_price
      FROM table
      WHERE <conditions>
      GROUP BY stock_name) AS averages
One way to do it is to use an inline view as a row source:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT AVG(t.stock_price) AS avg_stock_price
FROM table t
WHERE t.date = [mydate]
AND t.stock_name IN ('a','b','c')
GROUP BY t.stock_name
) a
You can run just the query from the inline view (aliased as a) on its own to verify the results it returns. The outer query runs against the set of rows returned by the inline view query. (MySQL refers to the inline view (aliased as a) as a "derived table".)
The outer query is effectively like this:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM a
The "trick" is that "a" isn't a regular table; it's a set of rows returned by a query, but in terms of relational algebra it works the same: it's a set of rows. If a were a regular table, we could write:
SELECT b.col
FROM (
SELECT col FROM a
) b
We don't want to do that in MySQL when we don't have to, because of the inefficient way that MySQL processes that. MySQL first runs the inner query (the query in the inline view). MySQL creates a temporary MyISAM table, and inserts the rows returned by the query into the temporary MyISAM table. MySQL then runs the outer query, against that temporary table (which MySQL refers to as a "derived table") to return the result. Creating and populating a temporary table that's a copy of a regular table is a lot of overhead, especially with large sets.
What makes this powerful is that the inline view query can include JOINs, a WHERE clause, aggregates, GROUP BY, whatever. As long as it returns a set of rows (with appropriate column names), we can wrap the query in parentheses and reference it in another query as if it were a table.
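For instance, a derived table can itself filter and aggregate, and the outer query can join it to another table. A sketch with hypothetical table and column names:

-- Per-stock averages computed in the inline view, then joined to a lookup table.
SELECT s.stock_name,
       a.avg_stock_price
FROM ( SELECT t.stock_name,
              AVG(t.stock_price) AS avg_stock_price
       FROM stock_prices t
       WHERE t.trade_date = '2012-01-01'
       GROUP BY t.stock_name
     ) a
JOIN stocks s ON s.stock_name = a.stock_name;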
I have 3 tables with mainly string data and a unique id column:
categories ~45 rows
clientfuncs ~800 rows
serverfuncs ~600 rows
All tables have a unique auto-increment primary key column 'id'.
I try to count rows in one query:
SELECT COUNT(categories.id), COUNT(serverfuncs.id), COUNT(clientfuncs.id) FROM categories, serverfuncs, clientfuncs
It takes 1.5 - 1.7 s.
And when I try
SELECT COUNT(categories.id), COUNT(serverfuncs.id) FROM categories, serverfuncs
or
SELECT COUNT(categories.id), COUNT(clientfuncs.id) FROM categories, clientfuncs
or
SELECT COUNT(clientfuncs.id), COUNT(serverfuncs.id) FROM clientfuncs, serverfuncs
, it takes 0.005 - 0.01 s. (as it should be)
Can someone explain, what is the reason for this?
You're doing a cross join of 45*800*600 rows; you'll notice that when you check the result of the counts :-)
Try this instead:
SELECT
(SELECT COUNT(*) FROM categories),
(SELECT COUNT(*) FROM serverfuncs),
(SELECT COUNT(*) FROM clientfuncs);
The queries are doing a Cartesian product, since no join condition is applied, so:
1st query: 800 * 600 * 45 = 21.6 million rows
2nd query: 45 * 600 = 27,000 rows
3rd query: 45 * 800 ...
It's because your query is joining the tables (the commas in the FROM clause are shorthand for a cross join) rather than counting them individually. So your queries with only two tables will be quicker.
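For reference, the comma-separated FROM list is just an implicit cross join; the query could be written explicitly as:

SELECT COUNT(categories.id), COUNT(serverfuncs.id), COUNT(clientfuncs.id)
FROM categories
CROSS JOIN serverfuncs
CROSS JOIN clientfuncs;

-- Both forms build 45 * 800 * 600 combined rows before anything is counted.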
First of all, do you really want to use three tables in the FROM clause to compute counts that are specific to each table? This causes the SELECT statement to produce the Cartesian product of the three tables, so the counts are computed over 45 x 800 x 600 rows; many duplicates of categories.id values are counted, and the same goes for the other counts. If you use only the first two tables in the FROM clause, the Cartesian product contains only 45 x 800 rows, which is far fewer than what the three tables produce; hence the queries with two tables are much faster. Primary keys are of no use in this case.
Better to use three separate statements to get the count from each table.
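That is, simply one plain count per table:

SELECT COUNT(*) FROM categories;
SELECT COUNT(*) FROM serverfuncs;
SELECT COUNT(*) FROM clientfuncs;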
If you still insist on getting the counts in one shot, you may use the following syntax:
SELECT (SELECT COUNT(categories.id) FROM categories),
(SELECT COUNT(serverfuncs.id) FROM serverfuncs),
(SELECT COUNT(clientfuncs.id) FROM clientfuncs);
if your RDBMS supports SELECT statements without a FROM clause. These will give correct counts and will be very fast.
How is it possible for these two queries to be different? I mean, the first query didn't include all the rows from my left table, so I moved the conditions into the join part.
Query 1
SELECT COUNT(*) as opens, hours.hour as point
FROM hours
LEFT OUTER JOIN tracking ON hours.hour = HOUR(FROM_UNIXTIME(tracking.open_date))
WHERE tracking.campaign_id = 83
AND tracking.open_date < 1299538799
AND tracking.open_date > 1299452401
GROUP BY hours.hour
Query 2
SELECT COUNT(*) as opens, hours.hour as point
FROM hours
LEFT JOIN tracking ON hours.hour = HOUR(FROM_UNIXTIME(tracking.open_date))
AND tracking.campaign_id = 83
AND tracking.open_date < 1299538799
AND tracking.open_date > 1299452401
GROUP BY hours.hour
The difference is that the first query gives me 18 rows, with no rows between point 17 and 22. But when I run the second query, it shows the full 24 rows, yet the rows between 17 and 22 have a value of 1! I would have expected it to be 0 or NULL. If it really is 1, shouldn't it have appeared in the first query?
How has this happened?
The first join is really an INNER JOIN: the outer-joined table should not appear in the WHERE clause the way it does in your top query. Also, instead of COUNT(*), pick a column from the outer-joined table.
You're using COUNT(*), which will count every row in your result set (as it's written), since even without data in tracking, you do have data in hours.
Try changing COUNT(*) to COUNT(tracking.open_date) (or any non-nullable column within tracking; it doesn't matter which one).
COUNT(*) counts the number of rows produced by the query.
You can use COUNT(tracking.open_date), or basically any column from the tracking table (the right table).
The problem is that the first query does the outer join, producing some rows where every column from the tracking table is NULL. It then applies a filter on those tracking columns, and since they are NULL, the corresponding rows are filtered out of the result set, which is why hours 17 to 22 disappear.
The second query keeps the conditions in the ON clause, so it does a proper outer join and preserves those rows.
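Putting these answers together, a corrected version of the second query that returns 0 for the empty hours would look like this (a sketch based only on the columns shown in the question):

SELECT COUNT(tracking.open_date) AS opens, hours.hour AS point
FROM hours
LEFT JOIN tracking
  ON hours.hour = HOUR(FROM_UNIXTIME(tracking.open_date))
 AND tracking.campaign_id = 83
 AND tracking.open_date < 1299538799
 AND tracking.open_date > 1299452401
GROUP BY hours.hour

-- COUNT(tracking.open_date) ignores the NULLs produced for unmatched hours,
-- so those hours show 0 instead of 1, while all 24 rows are still returned.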