We have a table with two columns, ID and Value. ID is the row index, and Value consists of a fixed string followed by a key (a hexadecimal number), stored as a string in the database. Take 00001810010 as an example: the fixed string is 0000181 and the second part, 0010, is the key.
Table
ID Value
0 00001810000
1 00001810010
2 00001810500
3 00001810900
4 0000181090a
What I want to get from the above table is the interval of unused key numbers between consecutive rows. For the above table the result is
[1, F], [11, 4FF], [501, 8FF], [901, 909]
I can read all the records into memory and handle them via C++, but is it possible to implement it through MySQL statements only? How?
I would be tempted to match each row with the previous row using something like this:
SELECT sub1.id AS this_row_id,
sub1.value AS this_row_value,
z.id AS prev_row_id,
z.value AS prev_row_value
FROM
(
SELECT a.id, a.value, MAX(b.id) AS bid
FROM some_table a
INNER JOIN some_table b
ON a.id > b.id
GROUP BY a.id, a.value
) sub1
INNER JOIN some_table z
ON z.id = sub1.bid
You might want to use LEFT OUTER JOINs rather than INNER JOINs depending on what you want for the first record (where there is no previous record to match on).
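Building on the previous-row pairing above, the intervals themselves can be derived by converting the key part of each value from hexadecimal and taking "previous key + 1" through "current key - 1". The sketch below checks that idea in Python, with an in-memory SQLite database standing in for MySQL. The hex_to_int function is a hypothetical helper registered only for this demo; in MySQL, CONV(SUBSTRING(value, 8), 16, 10) should play the same role. It also assumes the fixed prefix is always 7 characters long.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: stand-in for MySQL's CONV(x, 16, 10); SQLite has no hex parser.
conn.create_function("hex_to_int", 1, lambda s: int(s, 16))
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)", [
    (0, "00001810000"), (1, "00001810010"), (2, "00001810500"),
    (3, "00001810900"), (4, "0000181090a")])

# Pair each row with the previous row, then turn the two keys into a gap.
rows = conn.execute("""
    SELECT hex_to_int(substr(z.value, 8)) + 1    AS gap_start,
           hex_to_int(substr(sub1.value, 8)) - 1 AS gap_end
    FROM (
        SELECT a.id, a.value, MAX(b.id) AS bid
        FROM some_table a
        INNER JOIN some_table b ON a.id > b.id
        GROUP BY a.id, a.value
    ) sub1
    INNER JOIN some_table z ON z.id = sub1.bid
    ORDER BY sub1.id
""").fetchall()

# Skip empty gaps (adjacent keys) and print the bounds back in hexadecimal.
gaps = [(format(lo, "X"), format(hi, "X")) for lo, hi in rows if lo <= hi]
print(gaps)
```

Note that the first gap ends at F rather than 9, because the keys are hexadecimal and 0x10 - 1 is 0xF.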
I have two similar SELECT queries that retrieve data from the same table "my_table".
-- 1st select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON u = v
JOIN table3 ON x = y
UNION ALL
-- 2nd select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON r = s
JOIN table3 ON t = u
Duplicates are to be filtered out under the following conditions:
If the second select returns an id that is already present in the 1st select, it should be discarded.
Is there an easy solution without using a common table expression?
Note: The SQL does not have to be a UNION and can also be changed.
UNION filters out duplicate rows by default. UNION ALL does not remove duplicates.
But the duplicates are based on all columns being identical, not just the id column. If a given id value occurs in both queries, but any of the other two columns are different, then it counts as a distinct row.
If you want to reduce the result to a single row per id, then use a GROUP BY:
SELECT id, ...aggregate expressions...
FROM (
SELECT my_table.id, a, b ...
UNION
SELECT my_table.id, a, b ...
) AS t
GROUP BY id;
When you GROUP BY id, then any other expressions of the outer select-list must be in aggregate functions like MAX() or SUM(), etc.
The reason it is important to use an aggregate function is that when there are multiple rows with the same id value which you want to reduce to one row, what value should be displayed for a and b?
Example:
id a b
4 12 24
4 18 28
If you group by id, you would get one row for id=4, but what value for the other two columns?
id a b
4 ? ?
Read https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html for more details on this, or see my answer to "Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
You must use an aggregate function, which includes GROUP_CONCAT() to append all the values from that column in a comma-separated list. Or you can use ANY_VALUE() which picks one of the values from that column arbitrarily.
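As a rough illustration of the id=4 example, the sketch below runs the GROUP BY over a UNION in Python against an in-memory SQLite database (an assumption made for the demo; the original discussion is about MySQL). The two literal SELECTs are hypothetical stand-ins for the real queries. ANY_VALUE() is left out because SQLite does not provide it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The inner UNION stands in for the two SELECTs. Both return id = 4 but
# with different a and b, so UNION keeps both rows.
rows = conn.execute("""
    SELECT id,
           MAX(a)          AS max_a,
           MAX(b)          AS max_b,
           GROUP_CONCAT(a) AS all_a
    FROM (
        SELECT 4 AS id, 12 AS a, 24 AS b
        UNION
        SELECT 4, 18, 28
    ) AS t
    GROUP BY id
""").fetchall()

# One row per id; each aggregate decides which value(s) of a and b survive.
row_id, max_a, max_b, all_a = rows[0]
print(row_id, max_a, max_b, all_a)
```

The order of values inside the GROUP_CONCAT() result is not guaranteed unless an ordering is specified, so the list should not be relied on to come back in a particular sequence.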
I think this should do it:
-- 1st select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON u = v
JOIN table3 ON x = y
UNION ALL
-- 2nd select, minus the ids already returned by the 1st select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON r = s
JOIN table3 ON t = u
WHERE my_table.id NOT IN (
SELECT my_table.id
FROM my_table
JOIN table2 ON u = v
JOIN table3 ON x = y
)
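To check the requirement that second-select rows are dropped when their id already appears in the first select, here is a small sketch in Python with an in-memory SQLite database. The tables first_result and second_result are hypothetical stand-ins for the output of the two joins, and NOT EXISTS is used in place of NOT IN; either form should express the same filter when id is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the rows each SELECT would return.
    CREATE TABLE first_result  (id INTEGER, a INTEGER, b INTEGER);
    CREATE TABLE second_result (id INTEGER, a INTEGER, b INTEGER);
    INSERT INTO first_result  VALUES (1, 10, 20), (2, 11, 21);
    INSERT INTO second_result VALUES (2, 99, 98), (3, 12, 22);
""")

# Keep every first-select row; keep a second-select row only when its id
# is absent from the first select.
rows = conn.execute("""
    SELECT id, a, b FROM first_result
    UNION ALL
    SELECT s.id, s.a, s.b
    FROM second_result s
    WHERE NOT EXISTS (SELECT 1 FROM first_result f WHERE f.id = s.id)
    ORDER BY id
""").fetchall()
print(rows)
```

Here id 2 occurs in both inputs with different a and b, and only the first select's version survives.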
Suppose I have four tables: tbl1 ... tbl4. Each has a unique numerical id field. tbl1, tbl2 and tbl3 each has a foreign key field for the next table in the sequence. E.g. tbl1 has a tbl2_id foreign key field, and so on. Each table also has a field order (and other fields not relevant to the question).
It is straightforward to join all four tables to return all rows of tbl1 together with corresponding fields from the other three tables. It is also easy to order this result set by a specific ORDER BY combination of the order fields. It is also easy to return just the row that corresponds to some particular id in tbl1, e.g. WHERE tbl1.id = 7777.
QUESTION: what query most efficiently returns (e.g.) 100 rows, starting from the row corresponding to id=7777, in the order determined by the specific combination of order fields?
Using ROW_NUMBER (or an emulation of it in MySQL versions < 8) to get the position of the id=7777 row, and then using that position in a second run of the same query to set the offset in the LIMIT clause, would be one approach (with a read lock in between). But can it be done in a single query?
# FIRST QUERY: get row number of result row where tbl1.id = 7777
SELECT x.row_number
FROM
(SELECT @row_number:=@row_number+1 AS row_number, tbl1.id AS id
FROM (SELECT @row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <some conditions>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
WHERE id=7777;
Store the row number from the above query, subtract 1 (the row number is 1-based, while the LIMIT offset is 0-based), and bind the result to :offset in the following query.
# SECOND QUERY: get 100 rows starting from the one with id=7777
SELECT x.field1, x.field2, <etc.>
FROM
(SELECT @row_number:=@row_number+1 AS row_number, field1, field2
FROM (SELECT @row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <same conditions as before>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
LIMIT :offset, 100;
Clarify question
In the general case, you won't ask for WHERE id1 > 7777. Instead, you have a tuple of (11,22,33,44) and you want to "continue where you left off".
Two discussions, with
That is messy, but not impossible. See Iterating through a compound key. It gives an example of doing it with 2 columns; 4 columns coming from 4 tables is an extension of that.
A variation
Here is another discussion of such: https://dba.stackexchange.com/questions/164428/should-i-store-data-pre-ordered-rather-than-ordering-on-the-fly/164755#164755
When actually implementing this, I have found that letting the "100" (LIMIT) be flexible can be easier to think through. The idea is: reach forward 100 rows (with LIMIT 100,1). Let's say you get (111,222,333,444). If you are currently at (111, ...), then deal with id2/3/4. If it is, say, (113, ...), then do WHERE id1 < 113 and leave off any specification of id2/3/4. This means fetching fewer than 100 rows, but it lands you just shy of starting id1=113.
That is, it involves constructing a WHERE clause with between 1 and 4 conditions.
In all cases, your query says ORDER BY id1, id2, id3, id4. And the only use for LIMIT is in the probe to figure out how far ahead the 100th row is (with LIMIT 100,1).
I think I can dig out some old Perl code for that.
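The "continue where you left off" idea can also be sketched as a single query that joins against the anchor row instead of computing a row number. The example below is a simplified illustration, not the code referred to above: it collapses the four joined tables into one hypothetical items table with two ordering columns (ord_a, ord_b), assumes that (ord_a, ord_b) is unique, and runs in Python against an in-memory SQLite database rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for the four-table join.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, ord_a INTEGER, ord_b INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (7001, 1, 1), (7777, 1, 2), (7002, 1, 3),
    (7003, 2, 1), (7004, 0, 9), (7005, 2, 2)])

def page_from(conn, start_id, page_size):
    """Return up to page_size ids in (ord_a, ord_b) order, starting at start_id."""
    rows = conn.execute("""
        SELECT t.id
        FROM items t
        JOIN items s ON s.id = ?              -- s is the anchor row
        WHERE t.ord_a > s.ord_a
           OR (t.ord_a = s.ord_a AND t.ord_b >= s.ord_b)
        ORDER BY t.ord_a, t.ord_b
        LIMIT ?
    """, (start_id, page_size)).fetchall()
    return [r[0] for r in rows]

page = page_from(conn, 7777, 3)
print(page)
```

With four ordering columns the WHERE clause grows into the nested "greater than, or equal and ..." form, which is the messy part mentioned above. An index on the ordering columns is what would let the database seek to the starting point instead of scanning.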
There are 4 columns in table A: id, name, create_time and content.
create table A
(
id int primary key,
name varchar(20),
create_time datetime,
content varchar(4000)
);
create table B like A;
For each name, I want to select the record with the latest create_time and insert it into another table B.
I execute the following SQL, but it takes unacceptably long.
insert into B
select A.*
from A,
(select name, max(create_time) create_time from A group by name) tmp
where A.name = tmp.name
and A.create_time = tmp.create_time;
Table A has 10 million rows and is about 10 GB; the statement takes 200 s to execute.
Is there any way to do this faster, or are there MySQL Server parameters I can change to make it run faster?
P.S.:
Table A can be of any type: a partitioned table or something else.
First, make sure you have a proper index on A (name, create_time) and B (name, create_time),
then try using an explicit join with an ON condition:
insert into B
select A.*
from A
inner join (
select name, max(create_time) create_time
from A
group by name) tmp on ( A.name = tmp.name and A.create_time = tmp.create_time)
The query you need is:
INSERT INTO B
SELECT m.*
FROM A m # m from "max"
LEFT JOIN A l # l from "later"
ON m.name = l.name # the same name
AND m.create_time < l.create_time # "l" was created later than "m"
WHERE l.name IS NULL # there is no "later"
How it works:
It joins A aliased as m (from "max") against itself aliased as l (from "later" than "max"). The LEFT JOIN ensures that, in the absence of a WHERE clause, all the rows from m are present in the result set. Each row from m is combined with all rows from l that have the same name (m.name = l.name) and were created after the row from m (m.create_time < l.create_time). The WHERE condition keeps in the result set only the rows from m that do not have any match in l (there is no record with the same name and a greater creation time).
Discussion
If more than one row in A has the same name and create_time, the query returns all of them. To keep only one of them, an additional condition is required.
In the ON clause, replace the condition m.create_time < l.create_time with:
(m.create_time < l.create_time OR (m.create_time = l.create_time AND m.id < l.id))
The outer parentheses are needed because AND binds more tightly than OR; without them, the tie-break would also match rows with different names. Adjust/replace the m.id < l.id part of the condition to suit your needs (this version keeps the row with the highest id among the tied rows).
Make sure the table A has indexes on the columns used by the query (name and create_time). Otherwise the performance improvement compared with your original query is not significant.
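The self-join with a tie-break on id can be tried on a few rows. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL (an assumption for the demo), so B is created explicitly instead of with CREATE TABLE ... LIKE, and create_time is stored as ISO-formatted text. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT,
                    create_time TEXT, content TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT,
                    create_time TEXT, content TEXT);
    INSERT INTO A VALUES
        (1, 'x', '2020-01-01 00:00:00', 'old x'),
        (2, 'x', '2020-02-01 00:00:00', 'new x'),
        (3, 'y', '2020-01-05 00:00:00', 'tie y, lower id'),
        (4, 'y', '2020-01-05 00:00:00', 'tie y, higher id');
""")

# m survives only if no l with the same name is later, or equally late
# with a higher id.
conn.execute("""
    INSERT INTO B
    SELECT m.*
    FROM A m
    LEFT JOIN A l
           ON m.name = l.name
          AND (m.create_time < l.create_time
               OR (m.create_time = l.create_time AND m.id < l.id))
    WHERE l.name IS NULL
""")

kept_ids = [r[0] for r in conn.execute("SELECT id FROM B ORDER BY id")]
print(kept_ids)
```

For name 'y', both rows share the same create_time, and the m.id < l.id tie-break leaves the row with the higher id.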
I have a table where I store items and the time where they are relevant. For this question the following columns are relevant:
CREATE TABLE my_items
(
id INTEGER,
category INTEGER,
t DOUBLE
);
I want to select all items from a specific category (e.g. 1) and the sets of items that have a time within +- 5 (seconds) from these items.
I will probably do this with two types of queries in a script:
SELECT id,t from my_items where category=1;
then loop over the result set, using each result row's time as t_q1, and do a separate query:
SELECT id from my_items where t >= t_q1-5 AND t <= t_q1+5;
How can I do this in one query?
You can use a join. Take your subquery that selects all category 1 items, and join it with the original table on the condition that the time is within +/- five. It's possible that duplicate rows are returned, so you can group by id to avoid that:
SELECT t.*
FROM myTable t
JOIN (SELECT id, timeCol FROM myTable WHERE category = 1) t1
ON t.timeCol BETWEEN (t1.timeCol - 5) AND (t1.timeCol + 5)
OR t.id = t1.id
GROUP BY t.id;
I added the OR t.id = t1.id to make sure that the rows of category 1 are still included.
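A self-join of this kind can be tried against the my_items schema from the question. The sketch below is a variation rather than the exact query above: it uses SELECT DISTINCT instead of GROUP BY to drop duplicates and omits the OR on id, since each category 1 row is already within 5 seconds of itself. It runs in Python on an in-memory SQLite database (an assumption for the demo), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_items (id INTEGER, category INTEGER, t REAL)")
conn.executemany("INSERT INTO my_items VALUES (?, ?, ?)", [
    (1, 1, 100.0), (2, 2, 103.0), (3, 2, 110.0),
    (4, 1, 200.0), (5, 3, 195.5), (6, 2, 150.0)])

# c ranges over the category 1 items; i is any item within 5 seconds of c.
rows = conn.execute("""
    SELECT DISTINCT i.id
    FROM my_items i
    JOIN my_items c
      ON c.category = 1
     AND i.t BETWEEN c.t - 5 AND c.t + 5
    ORDER BY i.id
""").fetchall()

nearby_ids = [r[0] for r in rows]
print(nearby_ids)
```

Items 3 and 6 are left out because neither lies within 5 seconds of a category 1 item.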
You can use a single query with all your criteria if there is only one table:
SELECT id,t from my_items where category=1 AND t >= t_q1-5 AND t <= t_q1+5;
If there are two tables, use a right join on the timestamps table for performance.
select id
from my_items i,
(select min(t) min_t, max(t) max_t from my_items where category=1) i2
where i.category = 1 or
i.t between i2.min_t-5 and i2.max_t+5
I am having trouble with the relational algebra and transformation into SQL of this rather complicated query:
I need to select all values from table A joined to table B where there are no matching records in table B, or there are matching records but the set of matching records do not have a field that contains one of 4 of a possible 8 total values.
Database is MySQL 5.0... using an InnoDB engine for the tables.
Select
a.*
from
a
left join
b
on
a.id=b.id
where
b.id is null
or
b.field1 not in ("value1","value2","value3","value4");
I'm not sure if there is any real performance improvement but one other way is:
SELECT
*
FROM
tableA
WHERE
id NOT IN ( SELECT id FROM tableB WHERE field1 NOT IN ("value1", "value2"));
Your requirements are a bit unclear. My 1st interpretation is that you only want the A columns, and never more than 1 instance of a given A row.
select * from A where not exists (
select B.id
from B
where B.id=A.id
and B.field in ('badVal1','badVal2','badVal3','badVal4')
)
My 2nd interpretation is that you want all columns from (A outer joined to B), with perhaps more than one instance of an A row if there are multiple B rows, as long as no B row with a forbidden value exists for that A row.
select * from A
left outer join B on A.id=B.id
where not exists (
select C.id
from B as C
where A.id=C.id
and C.field in ('badVal1','badVal2','badVal3','badVal4')
)
Both queries could be expressed using NOT IN instead of a correlated NOT EXISTS. It's hard to know which would be faster without knowing the data.
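The difference between filtering individual joined rows and excluding an A row because of its whole set of B rows can be seen on a tiny data set. The sketch below runs in Python on an in-memory SQLite database (an assumption for the demo; the question targets MySQL 5.0) and uses made-up rows. It compares the correlated NOT EXISTS form with a LEFT JOIN whose WHERE clause tests each joined row separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER, field TEXT);
    INSERT INTO A VALUES (1), (2), (3);
    -- A row 1 has one harmless and one forbidden B row; A row 3 has none.
    INSERT INTO B VALUES (1, 'ok'), (1, 'badVal1'), (2, 'ok');
""")

# Set-wise test: drop an A row if ANY of its B rows is forbidden.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM A
    WHERE NOT EXISTS (
        SELECT 1 FROM B
        WHERE B.id = A.id
          AND B.field IN ('badVal1', 'badVal2', 'badVal3', 'badVal4'))
    ORDER BY A.id
""")]

# Row-wise test: an A row is kept if at least one joined row passes.
left_join_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT A.id FROM A
    LEFT JOIN B ON A.id = B.id
    WHERE B.id IS NULL
       OR B.field NOT IN ('badVal1', 'badVal2', 'badVal3', 'badVal4')
    ORDER BY A.id
""")]

print(not_exists_ids, left_join_ids)
```

A row 1 is excluded by NOT EXISTS but still returned by the row-wise LEFT JOIN filter, because its 'ok' row passes the test on its own.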