MySQL Intersect Query on One Table - mysql

I have a table that contains event information for users, i.e. each row contains an ID, a user's start date, completion date, and the event number. Example:
Row 1: ID 256 | StartDate 13360500 | FinishDate 13390500 | EventNum 3
I am trying to intersect all of the rows for users who have finished events 3 & 2, but I can't figure out why my query is returning no results:
SELECT table_id FROM table
WHERE table_EventNum = 3
AND table_FinishDate > 0
AND table_id IN (SELECT table_id FROM table WHERE table_EventNum = 2);
This query without the subquery (the line separated from the rest at the bottom) returns a bunch of non-null results, as it should, and the subquery also returns a bunch of non-null results (again, as it should). But for some reason the composite query doesn't return any rows at all. I know the IN command returns NULL if the expression on either side is NULL, but since both queries return results I'm not sure what else might cause this.
Any ideas? Thanks!

Assuming FinishDate is NULL when the event is not complete. Also assuming that there has to be a row with matching id and event number 2 and that event 3 cannot happen before event 2:
SELECT t1.table_id FROM table t1 INNER JOIN table t2 ON t1.table_id = t2.table_id
WHERE t1.table_EventNum = 3 AND t2.table_EventNum = 2
AND t1.table_FinishDate IS NOT NULL
Note that I could not find anything wrong with your query other than the fact that you do not need a subquery.
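To check that the two forms agree, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table name events, its column names, and the sample rows are all hypothetical stand-ins for the table in the question.

```python
import sqlite3

# Hypothetical table and columns standing in for the question's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, start_date INTEGER, "
    "finish_date INTEGER, event_num INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (256, 13360500, 13390500, 3),  # finished event 3 ...
        (256, 13300000, 13350000, 2),  # ... and has a row for event 2
        (300, 13360500, None, 3),      # event 3 not finished yet
        (300, 13300000, 13350000, 2),
        (400, 13360500, 13390500, 3),  # finished event 3, no event 2 row
    ],
)

# The asker's form: filter on event 3, then require an event 2 row via IN.
in_query = """
    SELECT user_id FROM events
    WHERE event_num = 3 AND finish_date > 0
      AND user_id IN (SELECT user_id FROM events WHERE event_num = 2)
"""
# The answer's form: self join on the user id.
join_query = """
    SELECT t1.user_id
    FROM events t1
    INNER JOIN events t2 ON t1.user_id = t2.user_id
    WHERE t1.event_num = 3 AND t2.event_num = 2
      AND t1.finish_date IS NOT NULL
"""
in_result = sorted(row[0] for row in conn.execute(in_query))
join_result = sorted(row[0] for row in conn.execute(join_query))
print(in_result, join_result)
```

On this data both queries return only user 256. If the IN version comes back empty against the real table, the cause is more likely in the data (for example, ids that do not actually match between the event 2 and event 3 rows) than in the query shape.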

Related

SQL query which includes COUNT(*) in its SELECT clause confuses me

I'm a newbie in SQL, trying to find my way through.
I have the following diagram:
and I'm being requested to
"Produce a list of number of items from each product which was ordered
in June 2004. Assume there's a function MONTH() and YEAR()"
The given solution is:
SELECT cat_num, COUNT(*)
FROM ord_rec AS O, include AS I
WHERE O.ord_num = I.ord_num AND
MONTH(O.ord_date) = 6 AND
YEAR(O.ord_date) = 2004
GROUP BY cat_num;
What I'm confused about is the COUNT(*). (specifically the asterisk within).
Does it COUNT all rows that are returned from the given query? So the asterisk refers to all of the returned ROWS? or am I far off?
Is it any different than having:
SELECT cat_num, COUNT(cat_num)
Thanks!
The COUNT(*) function returns the number of rows in the result set of the SELECT statement (per group, when GROUP BY is used). It counts every row, including duplicates and rows that contain NULL values.
The COUNT(cat_num) function returns the number of rows in which cat_num is not NULL.
Consider an example:
| Block | Range |
| ----- | ----- |
| A | 1-10 |
| A | 10-1 |
| B | (NULL) |
| B | (NULL) |
| B | (NULL) |
For this data, using the query:
SELECT
COUNT(*),
COUNT(t.`Block`),
COUNT(t.`Range`)
FROM
`test_table` t
You'll obtain these results:
| count(*) | count(t.Block) | count(t.Range) |
| -------- | -------------- | -------------- |
| 5 | 5 | 2 |
I hope that clears your confusion.
The COUNT(*) function returns the number of rows in a table in a query. It counts duplicate rows and rows that contain null values.
Overall, you can use * or ALL or DISTINCT or some expression along
with COUNT to count the number of rows w.r.t. some condition or all of
the rows, depending on the arguments you are using along with the
COUNT() function.
Possible parameters for COUNT()
When * is used with COUNT(), all rows are counted, even if some of them contain NULL, but COUNT(column_name) does not count a row if that column is NULL in it.
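To see the difference between the argument forms, here is a small sketch using Python's built-in sqlite3 module (an assumption for convenience; COUNT follows the same NULL rules in MySQL). It reuses the test_table data from the earlier example and adds COUNT(DISTINCT ...) for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Identifiers are double-quoted because RANGE is a keyword in SQLite.
conn.execute('CREATE TABLE test_table ("Block" TEXT, "Range" TEXT)')
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [("A", "1-10"), ("A", "10-1"), ("B", None), ("B", None), ("B", None)],
)

counts = conn.execute(
    'SELECT COUNT(*), '                    # every row
    '       COUNT(t."Block"), '            # rows where Block is not NULL
    '       COUNT(t."Range"), '            # rows where Range is not NULL
    '       COUNT(DISTINCT t."Block") '    # distinct non-NULL Block values
    'FROM test_table t'
).fetchone()
print(counts)  # (5, 5, 2, 2)
```

In the original question, COUNT(*) and COUNT(cat_num) would only differ if cat_num could be NULL in the joined rows.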

SQL Query of Counts That Lie Between Values in Other Table

I'm not exactly sure how to phrase the title. I have a query that I cannot figure out:
I have a table 'values' with timestamps (1970 epoch decimal) and a blob for each row. I have a second table called 'keys' that contains timestamps and the keys to decrypt each of the blobs in the first table 'values'. The key changes periodically at random intervals, and each time the key changes, a new set of keys is written to the 'keys' table. There are multiple keys, and when a new key set is written to the 'keys' table, each key has a separate entry with the same timestamp.
if I do something like this:
select distinct timestamp from keys;
I get a set returned for every time the keys rotated and I wrote a new keyset into the database.
What I would like is a sql statement in mysql that returns timestamps for each keyset and the total number of records in the 'values' table between each of those key timestamps.
For instance:
| Timestamp | Count |
| --------- | ----- |
| 1635962134 | 23|
| 1636048054 | 450|
| 1636145254 | 701|
etc...
The last row needs special consideration since it's the "current" set and doesn't have another entry in the keys table (yet).
SQL Fiddle with Sample Data:
For the sample data above, the results should be:
| Timestamp | Count |
| --------- | ----- |
| 1635962134 | 14|
| 1636043734 | 28|
| 1636119328 | 11|
You are a little limited by the MySQL version, but you can use a variable to help create the row set. You could do it with joins, too, but it would be a little more complicated.
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=b5d587b30f1a758ce31e3fa4745f26d0
SELECT k.key1, k.key2, count(*) as vol
FROM my_values v
JOIN (
SELECT key_ts as key1, @curr-1 as key2, @curr:= key_ts
FROM (
SELECT DISTINCT key_ts FROM my_keys
JOIN (SELECT @curr:=9999999999) var
ORDER BY key_ts DESC
) z
) k ON (v.val_ts BETWEEN k.key1 and k.key2)
GROUP BY key1, key2
First (the innermost subquery) select the distinct timestamps from my_keys and order them. I use that join just to set the variable. You could use a SET statement in a separate query also. I set the variable to an arbitrarily high timestamp, so that the last timestamp in the series will always have a partner.
Select from that the key and the variable value minus 1 (to prevent overlap), and then after that set the variable to the current key. That has to be done after everything else in the select query. That will generate two paired timestamps representing a time range.
Then just join my_values to those keys, and use BETWEEN as the join condition.
Let me know if that works for you.
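For comparison, here is a sketch of the join-based alternative mentioned above. It is a different technique from the user-variable query: each key timestamp is paired with the next one through a correlated subquery, and the values are then counted in the half-open range [key_ts, next_ts). It runs on Python's built-in sqlite3 module rather than MySQL 5.6, and the key_name and payload columns plus all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_keys (key_ts INTEGER, key_name TEXT);  -- key_name is hypothetical
    CREATE TABLE my_values (val_ts INTEGER, payload BLOB);  -- payload is hypothetical
""")
# Two keys per rotation at 100 and 200, one key at 300 (the "current" set).
conn.executemany(
    "INSERT INTO my_keys VALUES (?, ?)",
    [(100, "a"), (100, "b"), (200, "a"), (200, "b"), (300, "a")],
)
conn.executemany(
    "INSERT INTO my_values VALUES (?, NULL)",
    [(50,), (100,), (150,), (199,), (200,), (250,), (300,), (301,), (999,)],
)

ranges_query = """
    SELECT r.key_ts, COUNT(v.val_ts) AS vol
    FROM (
        -- Pair each distinct key timestamp with the next one (NULL for the last).
        SELECT k.key_ts,
               (SELECT MIN(k2.key_ts) FROM my_keys k2
                WHERE k2.key_ts > k.key_ts) AS next_ts
        FROM (SELECT DISTINCT key_ts FROM my_keys) k
    ) r
    LEFT JOIN my_values v
           ON v.val_ts >= r.key_ts
          AND (r.next_ts IS NULL OR v.val_ts < r.next_ts)
    GROUP BY r.key_ts
    ORDER BY r.key_ts
"""
volumes = conn.execute(ranges_query).fetchall()
print(volumes)  # [(100, 3), (200, 2), (300, 3)]
```

The open-ended last range takes the place of the arbitrarily high sentinel timestamp, and the strict upper bound avoids the "minus 1" adjustment, which matters if the timestamps are decimals rather than whole seconds. The value at 50 precedes the first key set and is not counted anywhere.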

Why can't this subquery return more than one row?

This Query is being generated by Django ORM using RawSQL:
SELECT `productos`.`codigo_barras`, (
SELECT
articulos.costo_us * (1 + articulos.iva_coef)
FROM
articulos
INNER JOIN (
SELECT
articulos.id, MAX(encargosProveedor.fecha_entrega)
FROM
articulos, encargosProveedor_listado_articulos, encargosProveedor, itemArticulosProveedor
WHERE
articulos.id = itemArticulosProveedor.articulos_id AND
encargosProveedor.id = encargosProveedor_listado_articulos.encargosproveedor_id
GROUP BY
articulos.producto_id
)
AS ultimos ON articulos.id = ultimos.id
) AS `ultimo_precio` FROM `productos`
It's giving an error
1242 - Subquery returns more than 1 row
This is the result of the subquery
+----+--------------------------------------+
| id | MAX(encargosProveedor.fecha_entrega) |
+----+--------------------------------------+
| 1 | 2019-04-17 |
+----+--------------------------------------+
| 3 | 2019-04-17 |
+----+--------------------------------------+
I read the MySQL documentation, but I can't understand why there is a problem with returning two rows. I've tried a lot of alternatives.
Where is the problem?
Subqueries included as columns of a SELECT statement are called "scalar subqueries". A scalar subquery should be able to produce zero or one row only since its value (the scalar) will be placed in the returned row of the result set of the query, where there's room for one value only. Therefore, if a subquery returns more than a single row, it cannot be used directly as a SELECT column.
One option is to force it to produce one row at most, maybe using an aggregation function such as MAX(), MIN(), COUNT(), etc.
Another option is to join the subquery as a "table expression", where there is no restriction on the number of returned rows.
Too long for a comment.
It's not the
SELECT articulos.id, MAX(encargosProveedor.fecha_entrega)
FROM ...
subquery that's the problem. As that is part of a JOIN expression it is allowed to return more than one row. However, since that returns more than one row, the surrounding subquery:
SELECT articulos.costo_us * (1 + articulos.iva_coef)
FROM articulos
INNER JOIN (SELECT articulos.id, MAX(encargosProveedor.fecha_entrega)
FROM ...)
will also return more than one row.
You need to figure out a way to prevent the outer subquery returning more than one row even when the inner one does, possibly by using aggregation functions such as MIN or MAX. Alternatively, you need to find a way to distinguish between rows in the inner subquery that have the same MAX(encargosProveedor.fecha_entrega) value (perhaps ordering by another value with a LIMIT 1) so that query only returns one row.
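A minimal sketch of the ORDER BY ... LIMIT 1 idea, assuming the intent is "the most recent price for each product". The schema is heavily simplified and hypothetical: fecha_entrega is placed directly on articulos instead of being reached through the order tables. It uses Python's built-in sqlite3 module. Note that SQLite does not raise MySQL's error 1242 (it silently takes the first row of a scalar subquery), so the sketch only demonstrates the corrected query, not the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE productos (id INTEGER PRIMARY KEY, codigo_barras TEXT);
    -- Hypothetical simplification: delivery date stored on the article itself.
    CREATE TABLE articulos (
        id INTEGER PRIMARY KEY, producto_id INTEGER,
        costo_us REAL, iva_coef REAL, fecha_entrega TEXT
    );
    INSERT INTO productos VALUES (1, '779001'), (2, '779002'), (3, '779003');
    INSERT INTO articulos VALUES
        (1, 1, 10.0, 0.5,  '2019-04-10'),
        (2, 1, 20.0, 0.25, '2019-04-17'),
        (3, 2, 8.0,  0.5,  '2019-04-17');
""")

# The scalar subquery is correlated to the outer row (a.producto_id = p.id)
# and capped at one row; a.id breaks ties between equal delivery dates.
latest_price_query = """
    SELECT p.codigo_barras,
           (SELECT a.costo_us * (1 + a.iva_coef)
            FROM articulos a
            WHERE a.producto_id = p.id
            ORDER BY a.fecha_entrega DESC, a.id DESC
            LIMIT 1) AS ultimo_precio
    FROM productos p
    ORDER BY p.id
"""
prices = conn.execute(latest_price_query).fetchall()
print(prices)  # [('779001', 25.0), ('779002', 12.0), ('779003', None)]
```

The product with no articles gets NULL (None in Python), which is what a scalar subquery yields when it returns zero rows.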

Evaluating a correlated subquery in SQL

I'm having trouble getting my head around evaluating correlated subqueries. An example is using a correlated subquery in SELECT so that GROUP BY isn't needed:
Consider the relations:
Movies : Title, Director, Length
Schedule : Theatre, Title
I have the following query
SELECT S.Theater, MAX(M.Length)
FROM Movies M JOIN Schedule S ON M.Title=S.Title
GROUP BY S.Theater
Which gets the longest film that every theatre is playing. This is the same query without using GROUP BY:
SELECT DISTINCT S.theater,
(SELECT MAX(M.Length)
FROM Movies M
WHERE M.Title=S.Title)
FROM Schedule S
but I don't understand how it quite works.
I'd appreciate if anybody could give me an example of how correlated subqueries are evaluated.
Thanks :)
Conceptually...
To understand this, first ignore the bit about correlated subquery.
Consider the order of operations for a statement like this:
SELECT t.foo FROM mytable t
MySQL prepares an empty resultset. Rows in the resultset will consist of one column, because there is one expression in the SELECT list. A row is retrieved from mytable. MySQL puts a row into the resultset, using the value from the foo column from the mytable row, assigning it to the foo column in the resultset. Fetch the next row, repeat that same process, until there are no more rows to fetch from the table.
Pretty easy stuff. But bear with me.
Consider this statement:
SELECT t.foo AS fi, 'bar' AS fo FROM mytable t
MySQL processes that the same way. Prepare an empty resultset. Rows in the resultset are going to have two columns this time. The first column is given the name fi (because we assigned the name fi with an alias). The second column in rows of the resultset will be named fo, because (again) we assigned an alias.
Now we fetch a row from mytable, and insert a row into the resultset. The value of the foo column goes into the column named fi, and the literal string 'bar' goes into the column named fo. Continue fetching rows and inserting rows into the resultset, until there are no more rows to fetch.
Not too hard.
Next, consider this statement, which looks a little more tricky:
SELECT t.foo AS fi, (SELECT 'bar') AS fo FROM mytable t
Same thing happens again. Empty resultset. Rows have two columns, named fi and fo.
Fetch a row from mytable, and insert a row into the resultset. The value of foo goes into column fi (just like before.) This is where it gets tricky... for the second column in the resultset, MySQL executes the query inside the parens. In this case it's a pretty simple query, we can test that pretty easily to see what it returns. Take the result from that query and assign that to the fo column, and insert the row into the resultset.
Still with me?
SELECT t.foo AS fi, (SELECT q.tip FROM bartab q LIMIT 1) AS fo FROM mytable t
This is starting to look more complicated. But it's not really that much different. The same things happen again. Prepare the empty resultset. Rows will have two columns, one named fi, the other named fo. Fetch a row from mytable. Get the value from the foo column, and assign it to the fi column in the result row. For the fo column, execute the query, and assign the result from the query to the fo column. Insert the result row into the resultset. Fetch another row from mytable, and repeat the process.
Here we should stop and notice something. MySQL is picky about that query in the SELECT list. Really really picky. MySQL has restrictions on that. The query must return exactly one column. And it cannot return more than one row.
In that last example, for the row being inserted into the resultset, MySQL is looking for a single value to assign to the fo column. When we think about it that way, it makes sense that the query can't return more than one column... what would MySQL do with the value from the second column? And it makes sense that we don't want to return more than one row... what would MySQL do with multiple rows?
MySQL will allow the query to return zero rows. When that happens, MySQL assigns a NULL to the fo column.
If you have an understanding of that, you're 95% of the way there to understanding the correlated subquery.
Let's look at another example. Our single line of SQL is getting a little unwieldy, so we'll just add some line breaks and spaces to make it easier for us to work with. The extra spaces and line breaks don't change the meaning of our statement.
SELECT t.foo AS fi
, ( SELECT q.tip
FROM bartab q
WHERE q.col = t.foo
ORDER BY q.tip DESC
LIMIT 1
) AS fo
FROM mytable t
Okay, that looks a lot more complicated. But is it really? It's the same thing again. Prepare an empty resultset. Rows will have two columns, fi and fo. Fetch a row from mytable, and get a row ready to insert into the resultset. Copy the value from the foo column, assign it to the fi column. And for the fo column, execute the query, assign the single value returned by the query to the fo column, and push the row into the resultset. Fetch the next row from mytable, and repeat.
To explain (finally!) the part about "correlated":
The query we are going to run to get the result for the fo column contains a reference to a column from the outer table: t.foo. In this example that appears in the WHERE clause; it doesn't have to, it could appear anywhere in the statement.
What MySQL does with that, when it runs that subquery, it passes in the value of the foo column, into the query. If the row we just fetched from mytable has a value of 42 in the foo column... that subquery is equivalent to
SELECT q.tip
FROM bartab q
WHERE q.col = 42
ORDER BY q.tip DESC
LIMIT 1
But since we're not passing in the literal value of 42, what we're passing in is values from the row in the outer query, the result returned by our subquery is "related" to the row we're processing in the outer query.
We could be a lot more complicated in our subquery, as long as we remember the rule about the subquery in the SELECT list... it has to return exactly one column, and at most one row. It returns at most one value.
Correlated subqueries can appear in parts of the statement other than the SELECT list, such as the WHERE clause. The same general concept applies. For each row processed by the outer query, the values of the column(s) from that row are passed in to the subquery. The result returned from the subquery is related to the row being processed in the outer query.
(The discussion omits all the steps before the actual execution... parsing the statement into tokens, performing the syntax check (keywords and identifiers in the right place). Then performing the semantics check (does mytable exist, does the user have select privilege on it, does the column foo exist in mytable). Then determining the access plan. And in the execution, obtaining the required locks, and so on. All that happens with every statement we execute.)
And we're going to not discuss the kinds of horrendous performance issues we can create with correlated subqueries. Though the previous discussion should give a clue. Since the subquery is executed for every row we're putting into the resultset (if it's in the SELECT list of our outer query), or is being executed for every row that is accessed by the outer query... if the outer query is returning 40,000 rows, that means our correlated subquery is going to be executed 40,000 times. So we better well make sure that subquery executes fast. Even when it executes fast, we're still going to execute it 40,000 times.
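The row-by-row evaluation described above can be mimicked by hand. This sketch uses Python's built-in sqlite3 module (not MySQL) and made-up rows for mytable and bartab: it runs the correlated query once, then reproduces it with an explicit loop that executes the inner query once per outer row, passing in the value of foo as a parameter. It models the concept only; a real optimizer may execute the statement differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (foo INTEGER);
    CREATE TABLE bartab (col INTEGER, tip INTEGER);
    INSERT INTO mytable VALUES (42), (7), (99);
    INSERT INTO bartab VALUES (42, 5), (42, 9), (7, 3);
""")

# The correlated subquery, evaluated by the database.
correlated = conn.execute("""
    SELECT t.foo AS fi,
           (SELECT q.tip FROM bartab q
            WHERE q.col = t.foo
            ORDER BY q.tip DESC
            LIMIT 1) AS fo
    FROM mytable t
    ORDER BY t.foo
""").fetchall()

# The same thing done by hand: one inner query per outer row.
manual = []
outer_rows = conn.execute("SELECT t.foo FROM mytable t ORDER BY t.foo").fetchall()
for (foo,) in outer_rows:
    inner = conn.execute(
        "SELECT q.tip FROM bartab q WHERE q.col = ? ORDER BY q.tip DESC LIMIT 1",
        (foo,),
    ).fetchone()
    # Zero rows from the inner query becomes NULL (None) in the result row.
    manual.append((foo, inner[0] if inner is not None else None))

print(correlated)  # [(7, 3), (42, 9), (99, None)]
print(manual)      # same rows
```

The loop also makes the performance concern visible: the inner query runs once for every row of the outer query.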
From a conceptual standpoint, imagine that the database is going through each row of the result without the subquery:
SELECT DISTINCT S.Theater, S.Title
FROM Schedule S
And then, for each one of those, running the subquery for you:
SELECT MAX(M.Length)
FROM Movies M
WHERE M.Title = (whatever S.Title was)
And placing that in as the value. Really, it's not (conceptually) that different from using a function:
SELECT DISTINCT S.Theater, SUBSTRING(S.Title, 1, 5)
FROM Schedule S
It's just that this function performs a query against another table, instead.
I do say conceptually, though. The database may be optimizing the correlated query into something more like a join. Whatever it does internally matters for performance, but doesn't matter as much for understanding the concept.
But, it may not return the results you're expecting. Consider the following data (sorry sqlfiddle seems to be erroring atm):
CREATE TABLE Movies (
Title varchar(255),
Length int(10) unsigned,
PRIMARY KEY (Title)
);
CREATE TABLE Schedule (
Title varchar(255),
Theater varchar(255),
PRIMARY KEY (Theater, Title)
);
INSERT INTO Movies
VALUES ('Star Wars', 121);
INSERT INTO Movies
VALUES ('Minions', 91);
INSERT INTO Movies
VALUES ('Up', 96);
INSERT INTO Schedule
VALUES ('Star Wars', 'Cinema 8');
INSERT INTO Schedule
VALUES ('Minions', 'Cinema 8');
INSERT INTO Schedule
VALUES ('Up', 'Cinema 8');
INSERT INTO Schedule
VALUES ('Star Wars', 'Cinema 6');
And then this query:
SELECT DISTINCT
S.Theater,
(
SELECT MAX(M.Length)
FROM Movies M
WHERE M.Title = S.Title
) AS MaxLength
FROM Schedule S;
You'll get this result:
+----------+-----------+
| Theater | MaxLength |
+----------+-----------+
| Cinema 6 | 121 |
| Cinema 8 | 91 |
| Cinema 8 | 121 |
| Cinema 8 | 96 |
+----------+-----------+
As you can see, it's not a replacement for GROUP BY (and you can still use GROUP BY), it's just running the subquery for each row. DISTINCT will only remove duplicates from the result. It's not giving the "greatest length" per theater anymore, it's just giving each unique movie length associated with the theater name.
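If the goal is to match the GROUP BY result without GROUP BY, one possible fix (a sketch, not the only way) is to correlate the subquery on the theater rather than on the title, so that it looks at every movie that theater is showing. The check below runs on Python's built-in sqlite3 module with the same sample data; the column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (Title TEXT PRIMARY KEY, Length INTEGER);
    CREATE TABLE Schedule (Title TEXT, Theater TEXT, PRIMARY KEY (Theater, Title));
    INSERT INTO Movies VALUES ('Star Wars', 121), ('Minions', 91), ('Up', 96);
    INSERT INTO Schedule VALUES
        ('Star Wars', 'Cinema 8'), ('Minions', 'Cinema 8'),
        ('Up', 'Cinema 8'), ('Star Wars', 'Cinema 6');
""")

grouped = conn.execute("""
    SELECT S.Theater, MAX(M.Length)
    FROM Movies M JOIN Schedule S ON M.Title = S.Title
    GROUP BY S.Theater
    ORDER BY S.Theater
""").fetchall()

# Correlate on the theater: the subquery scans all titles scheduled there.
correlated = conn.execute("""
    SELECT DISTINCT S.Theater,
           (SELECT MAX(M.Length)
            FROM Movies M JOIN Schedule S2 ON M.Title = S2.Title
            WHERE S2.Theater = S.Theater) AS MaxLength
    FROM Schedule S
    ORDER BY S.Theater
""").fetchall()

print(grouped)     # [('Cinema 6', 121), ('Cinema 8', 121)]
print(correlated)  # same rows
```

Here DISTINCT collapses the repeated (Theater, MaxLength) pairs, because every row for a given theater now carries the same maximum.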
PS: You would probably want to use an ID column of some sort to identify movies, rather than using the Title in the join. This way, if by chance the name of the movie has to be amended, it only needs to change in one place, not all over Schedule too. Plus, it's faster to join on an ID number than a string.

MySQL select several columns of several tables at the same time without using JOIN

SELECT id,
(SELECT payer_id FROM transactions WHERE id = user_id),
(SELECT sentence FROM cofg_sentences WHERE id = user_id),
(SELECT name FROM cofg_options WHERE id = user_id),
(SELECT hour FROM cofg_time WHERE id = user_id),
(SELECT field_id, url FROM cofg_feeds WHERE id = user_id),
(SELECT field_id, fb_user FROM cofg_accounts WHERE id = user_id)
FROM users WHERE token = '.......'
I am trying to do this query but I am receiving this error:
#1241 - Operand should contain 1 column(s)
If I delete second column of subquery I get:
#1242 - Subquery returns more than 1 row
I know there are other questions with the same error, but the queries are different. I don't want to use JOIN because I have read it will decrease the performance.
What are the problems here? Any kind of help is appreciated :)
Thanks in advance
"I don't want to use JOIN because I have read it will decrease the performance"
What did you find when you measured it?
The pattern of putting sub queries in the select clause is a viable technique for fixing some performance problems if you know what the optimiser is doing and what your data looks like.
You should start by implementing your code using proper joins then change it if it needs to be changed.
#1241 - Operand should contain 1 column(s)
Is caused by
(SELECT field_id, url FROM
If it is appropriate to use a sub query you can avoid this error by concatenating the values with a separator then splitting them later.
#1242 - Subquery returns more than 1 row
Should be self evident and is solved by aggregating the rows or limiting to one row.
Do you agree that a typical SQL row looks something like this:
SQL Row Example:
COLUMN1 COLUMN2 COLUMN3 COLUMN4
1 anyvalue 2 othervalue
When you run a subquery in the SELECT list, it should return a single VALUE (1 row and 1 column), so that you still have a typical SQL row.
SQL Row Example:
COLUMN1 COLUMN2 COLUMN3 COLUMN4
SUBQUERY anyvalue 2 othervalue
Let's go. First error:
1241 - Operand should contain 1 column(s)
you are trying to put 2 columns in a place supposed to have one simple value.
Second error:
1242 - Subquery returns more than 1 row
your SQL is returning many rows with many values but it is expecting only one simple value as I said before.
Adding LIMIT 1 to cap each subquery at one row would be a solution in your case (MySQL uses LIMIT; TOP is SQL Server syntax):
SELECT id,
(SELECT payer_id FROM transactions WHERE id = user_id LIMIT 1),
(SELECT sentence FROM cofg_sentences WHERE id = user_id LIMIT 1),
(SELECT name FROM cofg_options WHERE id = user_id LIMIT 1),
(SELECT hour FROM cofg_time WHERE id = user_id LIMIT 1),
(SELECT url FROM cofg_feeds WHERE id = user_id LIMIT 1),
(SELECT fb_user FROM cofg_accounts WHERE id = user_id LIMIT 1)
FROM users WHERE token = '.......'
I don't want to interfere in your logic. I will just explain something about subquery:
You have 6 subqueries in your SELECT. It means you will hit the database six times for each user.
If you have 100 users you will hit your database 600 times just in one simple SQL instruction.
Think about using joins!
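As a rough sketch of what the join version could look like, here is a cut-down example on Python's built-in sqlite3 module. The schema is a guess: it assumes each cofg_* table has a user_id column pointing at users.id, which the question does not confirm, and it only includes two of the six tables. The token value and URLs are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: every cofg_* table references users.id via user_id.
    CREATE TABLE users (id INTEGER PRIMARY KEY, token TEXT);
    CREATE TABLE cofg_time (user_id INTEGER, hour INTEGER);
    CREATE TABLE cofg_feeds (user_id INTEGER, field_id INTEGER, url TEXT);
    INSERT INTO users VALUES (1, 'abc'), (2, 'xyz');
    INSERT INTO cofg_time VALUES (1, 9);
    INSERT INTO cofg_feeds VALUES
        (1, 10, 'https://example.com/a'),
        (1, 11, 'https://example.com/b');
""")

# LEFT JOIN keeps the user even when a config table has no row for them.
# A table with several rows per user (cofg_feeds) yields one result row per
# match, and both of its columns can be selected without error 1241.
joined = conn.execute("""
    SELECT u.id, t.hour, f.field_id, f.url
    FROM users u
    LEFT JOIN cofg_time t ON t.user_id = u.id
    LEFT JOIN cofg_feeds f ON f.user_id = u.id
    WHERE u.token = ?
    ORDER BY f.field_id
""", ("abc",)).fetchall()
print(joined)
# [(1, 9, 10, 'https://example.com/a'), (1, 9, 11, 'https://example.com/b')]
```

One thing to watch for: joining several one-to-many tables in the same statement multiplies the rows, so tables like cofg_feeds and cofg_accounts may be better fetched in separate queries or aggregated first. Whichever form you pick, measure it against your own data.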