sql self join without equal operator on column - mysql

I'm posting the problem and a SQL solution that works. My confusion is this: whenever I did a self join in the past, there was always an equality comparison (equal operator) between the columns being joined, but in the example below the self join seems to work without one. The example uses the minus operator and >, with no equal operator specifying which columns to join on.
If there is no equal operator between columns, how does the underlying self join work in my example?
Problem,
Given a Weather table, write a SQL query to find all dates' Ids with higher temperature compared to its previous (yesterday's) dates.
+---------+------------+------------------+
| Id(INT) | Date(DATE) | Temperature(INT) |
+---------+------------+------------------+
|       1 | 2015-01-01 |               10 |
|       2 | 2015-01-02 |               25 |
|       3 | 2015-01-03 |               20 |
|       4 | 2015-01-04 |               30 |
+---------+------------+------------------+
For example, return the following Ids for the above Weather table:
+----+
| Id |
+----+
|  2 |
|  4 |
+----+
SQL solution,
select W1.Id
from Weather W1, Weather W2
where TO_DAYS(W1.Date)-TO_DAYS(W2.Date) = 1 and W1.Temperature > W2.Temperature

Writing it using an ANSI join, since they're a standard part of SQL:
select W1.Id
from Weather W1
inner join
Weather W2
on TO_DAYS(W1.Date)-TO_DAYS(W2.Date) = 1 and
W1.Temperature > W2.Temperature
(Should produce an identical result set)
A join is just the process of matching up two sets of rows - you have a row source on the "left" and a row source on the "right" of the join. In trivial cases, these row sources are tables, but a join may also join the results of any previous joins as the row sources.
In theory, in the join, the result would be a cartesian product - every row on the left would be matched with every row on the right. If this is what you want, you can indicate this with CROSS JOIN.
Usually, however, we want to restrict the result of the join to less than the cartesian product of the rows. And we express those restrictions by writing an ON clause (or in the WHERE clause in your example using the old-style comma join).
The most common type of join is an equijoin, where one or more columns on each side are compared for equality. But that is by no means required: the join condition can be any predicate that makes sense. For example, one form of join that I employ semi-regularly, which I describe as a "triangle join" (by no means standard terminology), matches every row with every row that comes later:
SELECT
*
FROM
Table t1
left join
Table t2
on
t1.ID < t2.ID
And that's perfectly fine. The row with the lowest ID in Table will be matched with every other row in the table. The row with the highest ID value will not be matched with any other rows (because this is a LEFT JOIN, it still appears once in the result, with NULLs in the t2 columns).
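To make the point about non-equality join predicates concrete, here is a minimal sketch that runs the queries from this thread against the sample Weather data. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module rather than MySQL, and since SQLite has no TO_DAYS, julianday() is swapped in to compute the day difference.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Weather (Id INTEGER, Date TEXT, Temperature INTEGER);
INSERT INTO Weather VALUES
    (1, '2015-01-01', 10),
    (2, '2015-01-02', 25),
    (3, '2015-01-03', 20),
    (4, '2015-01-04', 30);
""")

# No join condition at all: every row is paired with every row (4 x 4).
pairs = conn.execute(
    "SELECT COUNT(*) FROM Weather W1 CROSS JOIN Weather W2"
).fetchone()[0]

# The join condition is an arbitrary predicate, not a column = column test.
# julianday() replaces MySQL's TO_DAYS() here (an assumption of this sketch).
warmer = [row[0] for row in conn.execute("""
    SELECT W1.Id
    FROM Weather W1
    INNER JOIN Weather W2
        ON julianday(W1.Date) - julianday(W2.Date) = 1
       AND W1.Temperature > W2.Temperature
    ORDER BY W1.Id
""")]

# "Triangle join": each row paired with every row that has a higher Id.
triangle = conn.execute(
    "SELECT COUNT(*) FROM Weather t1 INNER JOIN Weather t2 ON t1.Id < t2.Id"
).fetchone()[0]

print(pairs, warmer, triangle)
```

Conceptually, the ON clause just filters the 16 candidate pairs; the engine is free to evaluate it more cleverly than that, but the result is the same.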

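That's called an "implicit join" - I suggest you read up on SQL JOIN, for example at https://en.wikipedia.org/wiki/Join_(SQL).
In short: the tables are listed, separated by commas, in the FROM clause, and the join condition is written in the WHERE clause instead of an ON clause. The database does not guess the join columns for you; whatever predicate you put in WHERE decides which row pairs are kept.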

Related

Aggregate data over multiple fields (not records)

id | segment1 | segment2 | segment3 | segment4 | FREQUENT
1  | A        | B        | A        | A        | A
2  | B        | C        | C        | C        | C
I need to find the most frequent letter among segment1, segment2, segment3 and segment4 for each row, i.e. to compute the FREQUENT column.
First of all, you need to copy all the segments into a single column. You can do this using UNION ALL or a temporary table.
Then you need to count how often each value occurs, grouping the result by ID and value.
Then you need to get the most frequent value in every group. This can be done with a self LEFT JOIN or by ordering and numbering the rows within each group. It is the most complicated part. See "Row number per group in mysql".
Then you filter by the row number and enjoy the result.
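The steps above can be sketched roughly as follows. This is an illustration only: the table name segments is hypothetical (the question does not name the table), SQLite via Python's sqlite3 stands in for MySQL, and a NOT EXISTS filter is used instead of row numbering to pick the top value per id.

```python
import sqlite3

# Assumption: the table is called "segments"; the original post gives no name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE segments (
    id INTEGER, segment1 TEXT, segment2 TEXT, segment3 TEXT, segment4 TEXT
);
INSERT INTO segments VALUES
    (1, 'A', 'B', 'A', 'A'),
    (2, 'B', 'C', 'C', 'C');
""")

query = """
WITH unpivoted AS (
    -- Step 1: copy all segment columns into a single column.
    SELECT id, segment1 AS letter FROM segments
    UNION ALL SELECT id, segment2 FROM segments
    UNION ALL SELECT id, segment3 FROM segments
    UNION ALL SELECT id, segment4 FROM segments
),
counts AS (
    -- Step 2: count occurrences per id and letter.
    SELECT id, letter, COUNT(*) AS freq
    FROM unpivoted
    GROUP BY id, letter
)
-- Step 3: keep a letter only if no other letter for that id is more frequent.
SELECT c.id, c.letter
FROM counts c
WHERE NOT EXISTS (
    SELECT 1 FROM counts o WHERE o.id = c.id AND o.freq > c.freq
)
ORDER BY c.id
"""
frequent = conn.execute(query).fetchall()
print(frequent)
```

Note that with this filter a tie (two letters equally frequent for one id) returns both letters; row numbering would be one way to force a single winner.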

2 inner joins between same 2 tables

I am trying to select columns from 2 tables,
The INNER JOIN conditions are $table1.idaction_url=$table2.idaction AND $table1.idaction_name=$table2.idaction.
However, the query below produces no output. It seems like the INNER JOIN can only take one condition: if I use AND to include both conditions, as shown in the query below, there won't be any output. Please look at the picture below. Please advise.
$mysql=("SELECT conv(hex($table1.idvisitor), 16, 10) as visitorId,
$table1.server_time, $table1.idaction_url,
$table1.time_spent_ref_action,$table2.name,
$table2.type, $table1.idaction_name, $table2.idaction
FROM $table1
INNER JOIN $table2
ON $table1.idaction_url=$table2.idaction
AND $table1.idaction_name=$table2.idaction
WHERE conv(hex(idvisitor), 16, 10)='".$id."'
ORDER BY server_time DESC");
Short answer:
You need to use two separate inner joins, not only a single join.
E.g.
SELECT `actionurls`.`name` AS `actionUrl`, `actionnames`.`name` AS `actionName`
FROM `table1`
INNER JOIN `table2` AS `actionurls` ON `table1`.`idaction_url` = `actionurls`.`idaction`
INNER JOIN `table2` AS `actionnames` ON `table1`.`idaction_name` = `actionnames`.`idaction`
(Modify this query with any additional fields you want to select).
In depth: INNER JOIN, when done on a value unique to the second table (the table joined to the first in this operation), will fetch at most one matching row per row of the first table. What you want, judging by the SELECT part of your statement, is to fetch data from the other table twice, into the same result row.
INNER JOIN table2 ON [comparison] will, for each row selected from table1, grab any rows from table2 for which [comparison] is TRUE, then copy the row from table1 N times, where N is the number of rows found in table2. If N = 0, the row is skipped. In our case N = 1, so an INNER JOIN of idaction_name in table1 to idaction in table2, for example, lets you select all the action names.
To get the action URLs as well, we have to INNER JOIN a second time. Normally you can't join the same table twice, because SQL won't know which of the two joined tables is meant when you write table2.name in the first part of your query; both would have the same name, which is ambiguous. The solution is table aliases.
The output (of my answer above) is going to be something like:
+-----+------------------------+-------------------------+
| Row | actionUrl              | actionName              |
+-----+------------------------+-------------------------+
| 1   | unx.co.jp/             | UNIX | Kumamoto Home    |
| 2   | unx.co.jp/profile.html | UNIX | Kumamoto Profile |
| ... | ...                    | ...                     |
+-----+------------------------+-------------------------+
While if you used only a single join, you would get this kind of output (using OR):
+-----+-------------------------+
| Row | actionUrl               |
+-----+-------------------------+
| 1   | unx.co.jp/              |
| 2   | UNIX | Kumamoto Home    |
| 3   | unx.co.jp/profile.html  |
| 4   | UNIX | Kumamoto Profile |
| ... | ...                     |
+-----+-------------------------+
Using AND and a single join, you only get output if idaction_name == idaction_url is TRUE. This is not the case, so there's no output.
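The difference between one join with AND and two aliased joins can be reproduced with a tiny data set. The sketch below is an assumption-laden illustration: the column layout is reduced to the fields that matter, the single sample row is made up for the demo, and SQLite (via Python's sqlite3) is used in place of MySQL.

```python
import sqlite3

# Assumption: simplified versions of table1/table2 with one made-up visit row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (idaction INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table1 (idaction_url INTEGER, idaction_name INTEGER);
INSERT INTO table2 VALUES (1, 'unx.co.jp/'), (2, 'UNIX | Kumamoto Home');
INSERT INTO table1 VALUES (1, 2);
""")

# One join, two conditions: a table2 row would need idaction = 1 AND = 2.
single_join = conn.execute("""
    SELECT table2.name
    FROM table1
    INNER JOIN table2
        ON table1.idaction_url = table2.idaction
       AND table1.idaction_name = table2.idaction
""").fetchall()

# Two joins to the same table, told apart by aliases.
double_join = conn.execute("""
    SELECT actionurls.name AS actionUrl, actionnames.name AS actionName
    FROM table1
    INNER JOIN table2 AS actionurls
        ON table1.idaction_url = actionurls.idaction
    INNER JOIN table2 AS actionnames
        ON table1.idaction_name = actionnames.idaction
""").fetchall()

print(single_join)
print(double_join)
```

The single join returns nothing because no row of table2 can equal both ids at once, while the aliased joins look up the URL and the name independently.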
If you want to know more about how to use JOINS, consult the manual about them.
Sidenote
Also, I can't help but notice you're using variables (e.g. $table1) that store the names of the tables. Do you make sure that those values do not contain user input? And, if they do, do you at least whitelist a list of tables that users can access? You may have some security issues with this.
INNER JOIN does not put any restriction on the number of conditions it can have.
Zero resulting rows means that no rows satisfy the two conditions simultaneously.
Make sure you are joining on the correct columns. Try going step by step, adding one condition at a time, to identify where the data is lost.

SQL order of execution for correlated subquery

I have the following Personnel table:
+---------+----------+-------------+
| name    | dept_nbr | job_title   |
+---------+----------+-------------+
| Michael |       14 | Programmer  |
| Kumar   |       14 | Programmer  |
| Dave    |       14 | Programmer  |
| Jane    |       14 | Manager     |
| Carol   |       37 | Programmer  |
| Joe     |       37 | Programmer  |
| John    |       59 | CEO         |
+---------+----------+-------------+
Problem: Find all dept_nbr's (departments) that have fewer than 3 programmers.
Working query:
SELECT DISTINCT dept_nbr
FROM Personnel AS P1
WHERE (SELECT COUNT(P2.dept_nbr)
FROM Personnel AS P2
WHERE P1.dept_nbr = P2.dept_nbr AND P2.job_title = 'Programmer') < 3;
Result:
37
59
Notes:
Department 14 is correctly not included as it has 3 programmers (3 is equal to but not fewer than 3). Department 59 has zero programmers, and is also correctly included in the results.
My question:
When the above query executes, how does a generic SQL engine proceed? From what I have read, SQL execution order is (roughly): From, Where, Group By, Having, and Select. So, is the following correct?
1 - The Outer Query passes each row of the Personnel table as P1 into the Inner query.
2.a - The Inner Query scans the entire Personnel table as P2, row by row, looking for rows that satisfy the condition "P1.dept_nbr = P2.dept_nbr AND P2.job_title = 'Programmer'".
2.b – Once the Inner Query is done with the entire table, it COUNTs the matching dept_nbr values and returns it to the Outer Query.
3 – In the Outer Query, if the count returned from the Inner Query satisfies the condition "WHERE (Inner Query Count Result) < 3", the corresponding dept_nbr for the P1 row is SELECTed.
4 – Following all rows processed by the Outer Query, the Outer Query does a DISTINCT on the results and displays the unique dept_nbr values.
Is my understanding above correct? Specifically, does the outer query do the DISTINCT at the very end (step #4)? It seems that in this way, the inner query does redundant scanning (for example, it processes dept_nbr = 14 four times, when it really has the answer in the first pass).
I tested the above query on sqlfiddle.com w/ MySQL 5.6.
"When the above query executes, how does a generic SQL engine proceed? From what I have read, SQL execution order is (roughly): From, Where, Group By, Having, and Select."
This statement is -- generally -- not correct. SQL is parsed in the order that you describe. However, the execution is determined by the optimizer and might have little to do with the original query. Remember: SQL is a descriptive language, not a procedural language. It describes the result set, not the specific steps for calculating it.
That said, MySQL's execution plan is much closer to the query than most other databases (particularly more advanced databases with better optimizers). And, almost any database is going to proceed in the steps you describe for this query. The aggregation in the subquery limits the choices for optimization.
If you want to eliminate the redundancy, then do the select distinct before the filtering:
SELECT dept_nbr
FROM (SELECT DISTINCT dept_nbr FROM Personnel P1) P1
WHERE (SELECT COUNT(P2.dept_nbr)
FROM Personnel AS P2
WHERE P1.dept_nbr = P2.dept_nbr AND P2.job_title = 'Programmer'
) < 3;
You can also do this more simply with just an aggregation:
select dept_nbr
from personnel
group by dept_nbr
having sum(job_title = 'Programmer') < 3;
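As a quick check that the three formulations in this thread describe the same result set, here is a small sketch that runs them against the sample Personnel data. It assumes SQLite (through Python's sqlite3) as a stand-in for MySQL; SQLite, like MySQL, evaluates a comparison such as job_title = 'Programmer' to 0 or 1, so the SUM trick carries over. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 5.6 used in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel (name TEXT, dept_nbr INTEGER, job_title TEXT);
INSERT INTO Personnel VALUES
    ('Michael', 14, 'Programmer'),
    ('Kumar',   14, 'Programmer'),
    ('Dave',    14, 'Programmer'),
    ('Jane',    14, 'Manager'),
    ('Carol',   37, 'Programmer'),
    ('Joe',     37, 'Programmer'),
    ('John',    59, 'CEO');
""")

queries = {
    # Original: correlated subquery evaluated per outer row, DISTINCT last.
    "correlated": """
        SELECT DISTINCT dept_nbr
        FROM Personnel AS P1
        WHERE (SELECT COUNT(P2.dept_nbr)
               FROM Personnel AS P2
               WHERE P1.dept_nbr = P2.dept_nbr
                 AND P2.job_title = 'Programmer') < 3
        ORDER BY dept_nbr
    """,
    # DISTINCT first, so the subquery runs once per department.
    "distinct_first": """
        SELECT dept_nbr
        FROM (SELECT DISTINCT dept_nbr FROM Personnel) AS P1
        WHERE (SELECT COUNT(P2.dept_nbr)
               FROM Personnel AS P2
               WHERE P1.dept_nbr = P2.dept_nbr
                 AND P2.job_title = 'Programmer') < 3
        ORDER BY dept_nbr
    """,
    # Plain aggregation with a conditional sum.
    "aggregated": """
        SELECT dept_nbr
        FROM Personnel
        GROUP BY dept_nbr
        HAVING SUM(job_title = 'Programmer') < 3
        ORDER BY dept_nbr
    """,
}

results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}
print(results)
```

All three return departments 37 and 59; how much work each one costs depends on the engine's optimizer, which this sketch does not measure.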
Add EXPLAIN (or EXPLAIN EXTENDED on MySQL 5.6; the EXTENDED keyword was removed in MySQL 8.0) before your query and it should show you the execution plan, which details the steps the server chose for your query. This is a very useful tool when trying to optimize queries.

Using GROUP_CONCAT in separate row records

I'm having trouble using GROUP_CONCAT. I'm pretty sure this is the only way to get what I want, but it doesn't seem to give me the results I need.
Here is my statement:
SELECT
b.*,
GROUP_CONCAT(c.finance_code) AS finance_codes
FROM
`oc_finance_breakpoints` b
LEFT JOIN
`oc_finance_breakpoints_codes` c ON c.breakpoint_id = b.breakpoint_id;
This will gather data in the finance_breakpoints table, structure below:
breakpoint_id
from_value
to_value
minimum_deposit
As well as multiple "finance codes" from my join table, finance_breakpoint_codes:
breakpoint_code_id
breakpoint_id
finance_code
There can be, and are likely to be, several finance codes to a breakpoint. When I run the SQL with only one entry in the breakpoints table, I get the following:
1 | 280.00 | 750.00 | 10 | ONIF6,ONIF10,ONIF12
But if there are two entries in the breakpoints table, all that happens is it tacks the additional finance codes onto the end of the above, meaning I only ever get one row with the first set of data, and all the finance codes in one column.
Ideally I'd like it to return something such as this:
1 | 280.00 | 750.00 | 10 | ONIF6,ONIF10,ONIF12
2 | 750.00 | 1500.00 | 10 | ONIB12-9.9,ONIB24-9.9,ONIB36-9
Rather than:
1 | 280.00 | 750.00 | 10 | ONIF6,ONIF10,ONIF12,ONIB12-9.9,ONIB24-9.9,ONIB36-9
Is there any way of achieving what I want? Am I maybe using the wrong function?
The use of an aggregate function (such as GROUP_CONCAT) in your query ensures that it will return aggregated results, while the absence of an explicit grouping ensures that it will return a single, overall summary row.
You need to add a group by clause to the end of your query - like so:
SELECT
b.*,
GROUP_CONCAT(c.finance_code) AS finance_codes
FROM
`oc_finance_breakpoints` b
LEFT JOIN `oc_finance_breakpoints_codes` c
ON c.breakpoint_id = b.breakpoint_id
GROUP BY b.breakpoint_id
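The effect of the missing GROUP BY can be seen on the sample data from the question. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) in place of MySQL; both support GROUP_CONCAT, though the order of the concatenated values is not guaranteed, so the sketch sorts the codes before comparing them.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table layouts follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oc_finance_breakpoints (
    breakpoint_id INTEGER PRIMARY KEY,
    from_value REAL, to_value REAL, minimum_deposit INTEGER
);
CREATE TABLE oc_finance_breakpoints_codes (
    breakpoint_code_id INTEGER PRIMARY KEY,
    breakpoint_id INTEGER, finance_code TEXT
);
INSERT INTO oc_finance_breakpoints VALUES
    (1, 280.00, 750.00, 10),
    (2, 750.00, 1500.00, 10);
INSERT INTO oc_finance_breakpoints_codes (breakpoint_id, finance_code) VALUES
    (1, 'ONIF6'), (1, 'ONIF10'), (1, 'ONIF12'),
    (2, 'ONIB12-9.9'), (2, 'ONIB24-9.9'), (2, 'ONIB36-9');
""")

join_sql = """
    SELECT b.breakpoint_id, GROUP_CONCAT(c.finance_code) AS finance_codes
    FROM oc_finance_breakpoints b
    LEFT JOIN oc_finance_breakpoints_codes c
        ON c.breakpoint_id = b.breakpoint_id
"""

# Aggregate without GROUP BY: the whole join collapses into one summary row.
collapsed = conn.execute(join_sql).fetchall()

# Aggregate with GROUP BY: one row (and one code list) per breakpoint.
grouped = conn.execute(
    join_sql + " GROUP BY b.breakpoint_id ORDER BY b.breakpoint_id"
).fetchall()

codes_by_breakpoint = {bid: sorted(codes.split(","))
                       for bid, codes in grouped}
print(len(collapsed), codes_by_breakpoint)
```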

Microsoft Access 2003 Query - Count records and sum it

I'm creating a query with Microsoft Access 2003 and have encountered an issue. I'm new!
I've got 2 tables. In the first table, I have a list of records that include the name, property name and the country state. In the second table, I have a list of property names, the number of units in each property and the property's country state.
I would like to count the number of records in the first table by state, while also summing up the number of units the properties have in that state.
The problem I ran into is that when I sum the number of units, the units are counted repeatedly!
Take this example:
Table1:
Name | State | Property Name
Mr 1 | State A | Building AAA
Mr 2 | State A | Building AAA
Mr 3 | State A | Building BBB
Mr 4 | State B | Building XXX
Mr 5 | State B | Building XXX
Table2:
Property Name | State | Number of Units
Building AAA | State A | 100
Building BBB | State A | 50
Building XXX | State B | 20
My Result:
State | Number of Units | No of Records
State A | 250 | 3
State B | 40 | 2
The result i want:
State | Number of Units | No of Records
State A | 150 | 3
State B | 20 | 2
EXPANDED
Assuming you are using the Access query builder, you will need to construct three Select queries:
1) Table1 will be the source table for the first query. Use the State field twice in the query, first as a Group By field and second as a Count field. (Any of the fields could have been used for the count, since you are only interested in the number of records.) Save the query for use in the third query.
2) Table2 will be the source table for the second query. Use the State field as a Group By field and the Units field as a Sum field. Save this query, too.
3) The third query will bring the information together. For the source, use the first and second queries, with a join between them on the State field. Select the State field (from either query) as a Group By Field, the CountOfState field from the first query as a Sum field, and the SumofUnits field from the second query as a Sum field.
While the actual amount of work done by Access in producing the final result will not change, the three queries can be consolidated into a single query by editing the underlying SQL.
The new query was produced by inserting the Table1 and Table2 queries into the third, final result query, one on either side of the INNER JOIN statement. The T1 and T2 in the new query are aliases for the embedded queries that eliminate ambiguity in referencing the fields of those queries.
The new query cannot be created using the Query Builder (although the original three queries provide the raw material for it). Instead, the SQL must be written/pasted in/edited in the SQL View of the Query Builder.
SELECT T1.State AS State,
Sum(T1.CountOfState) AS Records,
Sum(T2.SumOfUnits) AS Units
FROM
(SELECT Table1.State,
Count(Table1.State) AS CountOfState
FROM Table1
GROUP BY Table1.State) T1
INNER JOIN
(SELECT Table2.State,
Sum(Table2.Units) AS SumOfUnits
FROM Table2
GROUP BY Table2.State) T2
ON T1.State = T2.State
GROUP BY T1.State;
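To show why the units were repeating and why aggregating each table first avoids it, here is a small sketch on the sample data. It is an illustration under assumptions: SQLite (via Python's sqlite3) stands in for Access, the column names PropertyName and Units are simplified stand-ins for "Property Name" and "Number of Units", and the naive query is a guess at what produced the doubled totals, since the question does not show its SQL.

```python
import sqlite3

# Assumption: simplified column names; SQLite is used instead of Access.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Name TEXT, State TEXT, PropertyName TEXT);
CREATE TABLE Table2 (PropertyName TEXT, State TEXT, Units INTEGER);
INSERT INTO Table1 VALUES
    ('Mr 1', 'State A', 'Building AAA'),
    ('Mr 2', 'State A', 'Building AAA'),
    ('Mr 3', 'State A', 'Building BBB'),
    ('Mr 4', 'State B', 'Building XXX'),
    ('Mr 5', 'State B', 'Building XXX');
INSERT INTO Table2 VALUES
    ('Building AAA', 'State A', 100),
    ('Building BBB', 'State A', 50),
    ('Building XXX', 'State B', 20);
""")

# Naive version (assumed): joining row-to-row repeats each property's units
# once per matching person, so the sum is inflated.
naive = conn.execute("""
    SELECT Table1.State, SUM(Table2.Units), COUNT(*)
    FROM Table1
    INNER JOIN Table2 ON Table1.PropertyName = Table2.PropertyName
    GROUP BY Table1.State
    ORDER BY Table1.State
""").fetchall()

# Aggregate each table to one row per state first, then join the summaries.
fixed = conn.execute("""
    SELECT T1.State, SUM(T2.SumOfUnits) AS Units, SUM(T1.CountOfState) AS Records
    FROM (SELECT State, COUNT(State) AS CountOfState
          FROM Table1 GROUP BY State) T1
    INNER JOIN
         (SELECT State, SUM(Units) AS SumOfUnits
          FROM Table2 GROUP BY State) T2
      ON T1.State = T2.State
    GROUP BY T1.State
    ORDER BY T1.State
""").fetchall()

print(naive)
print(fixed)
```

Because each subquery already yields exactly one row per state, the join between them is one-to-one and nothing is counted twice.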