I am a beginner with SQL queries. I would like to understand the purpose of an outer query when the result could apparently be obtained from the inner query alone. In the case below, we delete a set of records from a table that are older than a particular timestamp. The other conditions match on unique keys, and I wonder why we need to check equality on unique keys within the same table. The query was written for a purge process and is executed from Java. My question is only about the purpose of the outer query, if the inner query can achieve the result by itself.
The nested query we have:
DELETE FROM ORDER_LOG T1
WHERE EXISTS (select *
FROM (SELECT VENDOR_NBR, ITEM_NBR, ORDER_ID, PDT_CODE, LOG_TS
FROM ORDER_LOG WHERE LOG_TS < "timestamp"
FETCH FIRST 100000 ROWS ONLY) AS T2
WHERE (T1.LOG_TS = T2.LOG_TS
AND T1.VENDOR_NBR= T2.VENDOR_NBR
AND T1.ORDER_ID=T2.ORDER_ID))
Non-nested query (which I think is executable and produces the same result):
DELETE FROM ORDER_LOG
WHERE LOG_TS< "timestamp"
FETCH FIRST 100000 ROWS ONLY
FETCH FIRST 100000 ROWS ONLY does not work with the DELETE statement. You first need to select only those rows, and then delete them. Your second query will delete all entries that satisfy the condition LOG_TS < "timestamp".
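To see the select-then-delete idea in action, here is a minimal sketch using Python's built-in sqlite3 module rather than the database from the question. The table contents, the purge_batch helper and the batch size of 2 are assumptions made up for illustration. SQLite's rowid plays the role that the unique-key comparison plays in the original query: it ties each outer row back to one of the rows picked by the limited inner SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_log (vendor_nbr INTEGER, order_id INTEGER, log_ts TEXT)"
)
conn.executemany(
    "INSERT INTO order_log VALUES (?, ?, ?)",
    [
        (1, 100, "2024-01-01"),
        (1, 101, "2024-01-02"),
        (2, 102, "2024-01-03"),
        (2, 103, "2024-01-04"),
        (3, 104, "2024-01-05"),
    ],
)


def purge_batch(conn, cutoff, batch_size):
    """Delete at most batch_size rows older than cutoff; return rows deleted."""
    cur = conn.execute(
        """
        DELETE FROM order_log
        WHERE rowid IN (
            SELECT rowid              -- inner query: pick a limited batch
            FROM order_log
            WHERE log_ts < ?
            ORDER BY log_ts
            LIMIT ?
        )
        """,
        (cutoff, batch_size),
    )
    conn.commit()
    return cur.rowcount


# Purge in batches until nothing older than the cutoff is left.
deleted_per_batch = []
while True:
    n = purge_batch(conn, "2024-01-04", 2)
    if n == 0:
        break
    deleted_per_batch.append(n)

remaining = conn.execute("SELECT COUNT(*) FROM order_log").fetchone()[0]
print(deleted_per_batch, remaining)
```

The inner SELECT decides which rows belong to the batch, and the outer DELETE only removes rows that match that batch, so each call removes a bounded number of rows.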
I have a single record which joins to N other tables, and extracts a single column from each of them. I would like to put all N of those extracted columns in a single record.
After constructing the diagram below it seems like I can get to the second step easily, and then I should be able to use an aggregate function to filter out the NULLs. I have looked around for something like GROUP_COALESCE, but I couldn't find anything that accomplishes this.
I have a fiddle here which unfortunately works, because MySQL will let you select columns which aren't in the GROUP BY without an aggregate at your own peril http://sqlfiddle.com/#!9/304992/1/0.
Is there a way I can make sure that it always selects the column from the record, if the record exists?
The end result should be one record per group, and each column would contain the value from the only row successfully joined for that group.
If I followed you correctly, you can just use aggregate functions on the columns coming from the joined tables. Aggregate functions ignore null values, so, since you have two null values and one non-null value for each column and each group, this will return the expected output (while conforming to the ONLY_FULL_GROUP_BY option).
SELECT
group_table_id,
MAX(t1.v) t1_v,
MAX(t2.v) t2_v,
MAX(t3.v) t3_v
FROM group_table
LEFT JOIN t1 ON t1.group_id = group_table_id
LEFT JOIN t2 ON t2.group_id = group_table_id
LEFT JOIN t3 ON t3.group_id = group_table_id
GROUP BY group_table_id
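As a small illustration of "aggregate functions ignore null values", here is a sketch using Python's built-in sqlite3 module. The joined table below is an assumption: it stands in for the intermediate result the question describes (several rows per group, each carrying one non-NULL value), and its column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined (group_id INTEGER, t1_v TEXT, t2_v TEXT, t3_v TEXT)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [
        (1, "a", None, None),
        (1, None, "b", None),
        (1, None, None, "c"),
        (2, "d", None, None),  # group 2 has no t2/t3 value at all
    ],
)

# MAX skips NULLs, so each column collapses to its single non-NULL value.
# If a group has only NULLs in a column, the result for that column is NULL.
rows = conn.execute(
    """
    SELECT group_id, MAX(t1_v), MAX(t2_v), MAX(t3_v)
    FROM joined
    GROUP BY group_id
    ORDER BY group_id
    """
).fetchall()
print(rows)
```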
I don't know if this is possible, but can mysql do a sub select and retrieve multiple records?
Here is my simplified query:
SELECT table1.*,
(
SELECT table2.*
FROM Table2 table2
WHERE table2.key_id = table1.key_id
)
FROM Table1 table1
Basically, Table2 has X amount of records that I need to pull back in the query and I don't want to have to run a secondary query (for instance get the results from Table1 and then loop over those results and then get all the results from Table2).
Thanks.
No. The subquery in the SELECT clause is called a scalar subquery. A scalar subquery has two important properties:
It can only retrieve one column.
It can only retrieve zero or one rows.
A scalar subquery -- as its name implies -- substitutes for a scalar value in an expression. If the subquery returns no rows, the value used in the expression is NULL.
In your case, you can use a LEFT JOIN instead:
SELECT t1.*, t2.*
FROM Table1 t1 LEFT JOIN
Table2 t2
ON t2.key_id = t1.key_id;
Note that table aliases are a good thing. However, they should make the query simpler, so repeating the table name is not a big win.
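Here is a minimal sketch of what the LEFT JOIN gives you, using Python's built-in sqlite3 module. The name and item columns and the sample rows are assumptions for illustration. The parent row is repeated once per matching child row, and a parent with no children still appears once with NULL, so a single query is enough and the grouping can be done on the client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (key_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (key_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "first"), (2, "second")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "x"), (1, "y")])

rows = conn.execute(
    """
    SELECT t1.key_id, t1.name, t2.item
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.key_id = t1.key_id
    ORDER BY t1.key_id, t2.item
    """
).fetchall()

# Regroup the flat join result into one list of child items per parent key.
children = {}
for key_id, name, item in rows:
    children.setdefault(key_id, [])
    if item is not None:  # None marks a parent with no matching child row
        children[key_id].append(item)

print(rows)
print(children)
```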
MySQL can do a subquery that returns multiple rows or multiple columns, but it's not valid to do that in a scalar context.
You're putting a subquery in a scalar context. In other words, in the select-list, a subquery must return one column and one row (or zero rows), because it will be used for one item on the respective row as it uses the select-list to build a result.
I have two tables which I am joining with a LEFT JOIN. Both tables are empty, but when I run the query, MySQL returns a row with all NULLs.
I have tried several queries like
SELECT products.*,SUM(pq_quantity) as quantity
FROM `products` LEFT JOIN `products_quantities` ON `pq_product_idFk` = `p_id`
WHERE `p_volusion_id` = '37808'
OR
SELECT products.*,SUM(pq_quantity) as quantity
FROM `products` LEFT JOIN `products_quantities` ON `pq_product_idFk` = `p_id`
WHERE `p_volusion_id` = '37808' AND p_id IS NOT NULL
OR
SELECT products.*,SUM(pq_quantity) as quantity
FROM `products` LEFT JOIN `products_quantities` ON `pq_product_idFk` = `p_id` AND `p_volusion_id` = '37808' AND p_id IS NOT NULL
None of the above queries works: I only want rows that are not NULL.
Thanks
Both the tables are empty. But when I run the query, mysql returns a row with all NULLS.
The presence of GROUP BY (aggregate) functions in the SELECT clause normally requires a GROUP BY clause as well. If it is not present, however, the SQL standard specifies that a single group is created from all the rows that pass the WHERE clause.
Because of the * used in the SELECT clause, all the queries you posted are invalid SQL.
A query that contains a GROUP BY clause does not return rows from tables. It creates rows using the values extracted from the tables. First it creates groups (and sub-groups) using the expressions from the GROUP BY clause. All the rows from a group have the same value for the first expression specified in the GROUP BY clause.
If there are two or more expressions in the GROUP BY clause, each group is split into sub-groups using the second expression then each sub-group is further split into sub-sub-groups using the third expression (if exists) and so on.
From each such group of rows (after the last split), the database engine generates one new row and puts it into the result set. If the SELECT clause contains expressions that are neither arguments of a GROUP BY aggregate function nor present in the GROUP BY clause then, most probably, these expressions will have more than one value in a subgroup. This is why the query is invalid SQL. Up to version 5.7.5, MySQL accepts such invalid queries but reserves the right to return any value it wants (from the group) for the offending expressions.
Back to your question: as explained above, even without a GROUP BY clause, your query is processed as if it had one, and a single group is created from all the rows filtered by the WHERE clause.
It is an empty group, but this doesn't prevent the database engine from generating a row from it. Since there are no values to use to compute SUM(pq_quantity), NULL is the logical value it returns in the columns of the result set.
NULL is a special value that means the absence of any value or an unknown value. It makes perfect sense in your case. You don't have any values in the tables, so there is no way one could compute SUM(pq_quantity). Its value is not available (i.e. NULL).
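The single-group behaviour can be checked with a small sketch using Python's built-in sqlite3 module. This is SQLite rather than MySQL, and the table definitions are assumptions reduced to the columns used in the question. The second query adds an explicit GROUP BY p_id, which is a variation not in the question: with no rows there are no groups, so no row comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (p_id INTEGER PRIMARY KEY, p_volusion_id TEXT)")
conn.execute(
    "CREATE TABLE products_quantities (pq_product_idFk INTEGER, pq_quantity INTEGER)"
)

# Aggregate without GROUP BY: one implicit group, even though it is empty.
ungrouped = conn.execute(
    """
    SELECT SUM(pq_quantity) AS quantity
    FROM products
    LEFT JOIN products_quantities ON pq_product_idFk = p_id
    WHERE p_volusion_id = '37808'
    """
).fetchall()

# Aggregate with GROUP BY: groups come from the rows, and there are none.
grouped = conn.execute(
    """
    SELECT p_id, SUM(pq_quantity) AS quantity
    FROM products
    LEFT JOIN products_quantities ON pq_product_idFk = p_id
    WHERE p_volusion_id = '37808'
    GROUP BY p_id
    """
).fetchall()

print(ungrouped)  # a single row holding NULL
print(grouped)    # no rows at all
```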
I'm having trouble getting my head around evaluating correlated subqueries. An example is using a correlated subquery in SELECT so that GROUP BY isn't needed:
Consider the relations:
Movies: Title, Director, Length
Schedule: Theater, Title
I have the following query
SELECT S.Theater, MAX(M.Length)
FROM Movies M JOIN Schedule S ON M.Title=S.Title
GROUP BY S.Theater
Which gets the longest film that every theatre is playing. This is the same query without using GROUP BY:
SELECT DISTINCT S.theater,
(SELECT MAX(M.Length)
FROM Movies M
WHERE M.Title=S.Title)
FROM Schedule S
but I don't understand how it quite works.
I'd appreciate if anybody could give me an example of how correlated subqueries are evaluated.
Thanks :)
Conceptually...
To understand this, first ignore the bit about correlated subquery.
Consider the order of operations for a statement like this:
SELECT t.foo FROM mytable t
MySQL prepares an empty resultset. Rows in the resultset will consist of one column, because there is one expression in the SELECT list. A row is retrieved from mytable. MySQL puts a row into the resultset, using the value from the foo column from the mytable row, assigning it to the foo column in the resultset. Fetch the next row, repeat that same process, until there are no more rows to fetch from the table.
Pretty easy stuff. But bear with me.
Consider this statement:
SELECT t.foo AS fi, 'bar' AS fo FROM mytable t
MySQL processes that the same way. Prepare an empty resultset. Rows in the resultset are going to have two columns this time. The first column is given the name fi (because we assigned the name fi with an alias). The second column in rows of the resultset will be named fo, because (again) we assigned an alias.
Now we fetch a row from mytable, and insert a row into the resultset. The value of the foo column goes into the column named fi, and the literal string 'bar' goes into the column named fo. Continue fetching rows and inserting rows into the resultset, until there are no more rows to fetch.
Not too hard.
Next, consider this statement, which looks a little more tricky:
SELECT t.foo AS fi, (SELECT 'bar') AS fo FROM mytable t
Same thing happens again. Empty resultset. Rows have two columns, named fi and fo.
Fetch a row from mytable, and insert a row into the resultset. The value of foo goes into column fi (just like before.) This is where it gets tricky... for the second column in the resultset, MySQL executes the query inside the parens. In this case it's a pretty simple query, we can test that pretty easily to see what it returns. Take the result from that query and assign that to the fo column, and insert the row into the resultset.
Still with me?
SELECT t.foo AS fi, (SELECT q.tip FROM bartab q LIMIT 1) AS fo FROM mytable t
This is starting to look more complicated. But it's not really that much different. The same things happen again. Prepare the empty resultset. Rows will have two columns, one named fi, the other named fo. Fetch a row from mytable. Get the value from the foo column, and assign it to the fi column in the result row. For the fo column, execute the query, and assign the result from the query to the fo column. Insert the result row into the resultset. Fetch another row from mytable, and repeat the process.
Here we should stop and notice something. MySQL is picky about that query in the SELECT list. Really really picky. MySQL has restrictions on that. The query must return exactly one column. And it cannot return more than one row.
In that last example, for the row being inserted into the resultset, MySQL is looking for a single value to assign to the fo column. When we think about it that way, it makes sense that the query can't return more than one column... what would MySQL do with the value from the second column? And it makes sense that we don't want to return more than one row... what would MySQL do with multiple rows?
MySQL will allow the query to return zero rows. When that happens, MySQL assigns a NULL to the fo column.
If you have an understanding of that, you're 95% of the way there to understanding the correlated subquery.
Let's look at another example. Our single line of SQL is getting a little unwieldy, so we'll just add some line breaks and spaces to make it easier for us to work with. The extra spaces and line breaks don't change the meaning of our statement.
SELECT t.foo AS fi
, ( SELECT q.tip
FROM bartab q
WHERE q.col = t.foo
ORDER BY q.tip DESC
LIMIT 1
) AS fo
FROM mytable t
Okay, that looks a lot more complicated. But is it really? It's the same thing again. Prepare an empty resultset. Rows will have two columns, fi and fo. Fetch a row from mytable, and get a row ready to insert into the resultset. Copy the value from the foo column, assign it to the fi column. And for the fo column, execute the query, assign the single value returned by the query to the fo column, and push the row into the resultset. Fetch the next row from mytable, and repeat.
To explain (finally!) the part about "correlated".
The query we are going to run to get the result for the fo column contains a reference to a column from the outer table: t.foo. In this example it appears in the WHERE clause; it doesn't have to, it could appear anywhere in the statement.
When MySQL runs that subquery, it passes the value of the foo column into the query. If the row we just fetched from mytable has a value of 42 in the foo column... that subquery is equivalent to
SELECT q.tip
FROM bartab q
WHERE q.col = 42
ORDER BY q.tip DESC
LIMIT 1
But we're not really passing in the literal value 42; we're passing in values from the row in the outer query, so the result returned by our subquery is "related" to the row we're processing in the outer query.
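The "pass the outer value in" picture can be tried out with a small sketch using Python's built-in sqlite3 module. This is SQLite rather than MySQL, and the sample rows are assumptions. The first query is the correlated statement from above. The loop after it does by hand what the conceptual description says: fetch each outer row, then run the subquery with that row's foo value bound as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (foo INTEGER)")
conn.execute("CREATE TABLE bartab (col INTEGER, tip INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO bartab VALUES (?, ?)", [(1, 10), (1, 30), (2, 5)])

# The correlated subquery, evaluated by the database.
correlated = conn.execute(
    """
    SELECT t.foo AS fi
         , ( SELECT q.tip
               FROM bartab q
              WHERE q.col = t.foo
              ORDER BY q.tip DESC
              LIMIT 1
           ) AS fo
      FROM mytable t
     ORDER BY t.foo
    """
).fetchall()

# The same thing done by hand: one subquery execution per outer row.
manual = []
outer_rows = conn.execute("SELECT t.foo FROM mytable t ORDER BY t.foo").fetchall()
for (foo,) in outer_rows:
    hit = conn.execute(
        "SELECT q.tip FROM bartab q WHERE q.col = ? ORDER BY q.tip DESC LIMIT 1",
        (foo,),
    ).fetchone()
    # Zero rows from the subquery becomes NULL (None) in the result row.
    manual.append((foo, hit[0] if hit is not None else None))

print(correlated)
print(manual)
```

Both lists hold the same rows. The outer row with foo = 3 has no match in bartab, so its fo value is NULL.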
We could be a lot more complicated in our subquery, as long as we remember the rule about the subquery in the SELECT list... it has to return exactly one column, and at most one row. It returns at most one value.
Correlated subqueries can appear in parts of the statement other than the SELECT list, such as the WHERE clause. The same general concept applies. For each row processed by the outer query, the values of the column(s) from that row are passed in to the subquery. The result returned from the subquery is related to the row being processed in the outer query.
(The discussion omits all the steps before the actual execution... parsing the statement into tokens, performing the syntax check (keywords and identifiers in the right place). Then performing the semantics check (does mytable exist, does the user have select privilege on it, does the column foo exist in mytable). Then determining the access plan. And in the execution, obtaining the required locks, and so on. All that happens with every statement we execute.)
And we're going to not discuss the kinds of horrendous performance issues we can create with correlated subqueries. Though the previous discussion should give a clue. Since the subquery is executed for every row we're putting into the resultset (if it's in the SELECT list of our outer query), or is being executed for every row that is accessed by the outer query... if the outer query is returning 40,000 rows, that means our correlated subquery is going to be executed 40,000 times. So we better well make sure that subquery executes fast. Even when it executes fast, we're still going to execute it 40,000 times.
From a conceptual standpoint, imagine that the database is going through each row of the result without the subquery:
SELECT DISTINCT S.Theater, S.Title
FROM Schedule S
And then, for each one of those, running the subquery for you:
SELECT MAX(M.Length)
FROM Movies M
WHERE M.Title = (whatever S.Title was)
And placing that in as the value. Really, it's not (conceptually) that different from using a function:
SELECT DISTINCT S.Theater, SUBSTRING(S.Title, 1, 5)
FROM Schedule S
It's just that this function performs a query against another table, instead.
I do say conceptually, though. The database may be optimizing the correlated query into something more like a join. Whatever it does internally matters for performance, but doesn't matter as much for understanding the concept.
But, it may not return the results you're expecting. Consider the following data (sorry sqlfiddle seems to be erroring atm):
CREATE TABLE Movies (
Title varchar(255),
Length int(10) unsigned,
PRIMARY KEY (Title)
);
CREATE TABLE Schedule (
Title varchar(255),
Theater varchar(255),
PRIMARY KEY (Theater, Title)
);
INSERT INTO Movies
VALUES ('Star Wars', 121);
INSERT INTO Movies
VALUES ('Minions', 91);
INSERT INTO Movies
VALUES ('Up', 96);
INSERT INTO Schedule
VALUES ('Star Wars', 'Cinema 8');
INSERT INTO Schedule
VALUES ('Minions', 'Cinema 8');
INSERT INTO Schedule
VALUES ('Up', 'Cinema 8');
INSERT INTO Schedule
VALUES ('Star Wars', 'Cinema 6');
And then this query:
SELECT DISTINCT
S.Theater,
(
SELECT MAX(M.Length)
FROM Movies M
WHERE M.Title = S.Title
) AS MaxLength
FROM Schedule S;
You'll get this result:
+----------+-----------+
| Theater  | MaxLength |
+----------+-----------+
| Cinema 6 |       121 |
| Cinema 8 |        91 |
| Cinema 8 |       121 |
| Cinema 8 |        96 |
+----------+-----------+
As you can see, it's not a replacement for GROUP BY (and you can still use GROUP BY), it's just running the subquery for each row. DISTINCT will only remove duplicates from the result. It's not giving the "greatest length" per theater anymore, it's just giving each unique movie length associated with the theater name.
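The difference can be reproduced with a short sketch using Python's built-in sqlite3 module (SQLite rather than MySQL, with the column types simplified). The third query is an assumption on my part, not something from the question: it correlates on Theater instead of Title, which is one way to get the per-theater maximum without GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Title TEXT PRIMARY KEY, Length INTEGER)")
conn.execute(
    "CREATE TABLE Schedule (Title TEXT, Theater TEXT, PRIMARY KEY (Theater, Title))"
)
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?)",
    [("Star Wars", 121), ("Minions", 91), ("Up", 96)],
)
conn.executemany(
    "INSERT INTO Schedule (Title, Theater) VALUES (?, ?)",
    [
        ("Star Wars", "Cinema 8"),
        ("Minions", "Cinema 8"),
        ("Up", "Cinema 8"),
        ("Star Wars", "Cinema 6"),
    ],
)

# GROUP BY version: one row per theater.
grouped = conn.execute(
    """
    SELECT S.Theater, MAX(M.Length)
    FROM Movies M JOIN Schedule S ON M.Title = S.Title
    GROUP BY S.Theater
    ORDER BY S.Theater
    """
).fetchall()

# Subquery correlated on Title: one value per scheduled movie, then DISTINCT.
per_title = sorted(
    conn.execute(
        """
        SELECT DISTINCT S.Theater,
               (SELECT MAX(M.Length) FROM Movies M WHERE M.Title = S.Title)
        FROM Schedule S
        """
    ).fetchall()
)

# Subquery correlated on Theater: the maximum over everything that theater shows.
per_theater = conn.execute(
    """
    SELECT DISTINCT S.Theater,
           (SELECT MAX(M.Length)
            FROM Movies M JOIN Schedule S2 ON S2.Title = M.Title
            WHERE S2.Theater = S.Theater)
    FROM Schedule S
    ORDER BY S.Theater
    """
).fetchall()

print(grouped)
print(per_title)
print(per_theater)
```

Only the subquery correlated on Theater matches the GROUP BY result. The one correlated on Title returns a row for every distinct movie length shown in each theater.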
PS: You would likely want to use an ID column of some sort to identify movies, rather than using the Title in the join. This way, if by chance the name of the movie has to be amended, it only needs to change in one place, not all over Schedule too. Plus, it's faster to join on an ID number than on a string.