I have two columns: column1 (id) is the primary key and column2 (title) is not (values in column2 can repeat). I want to know whether the select speed is the same for the following two queries:
Query #1:
SELECT *
FROM table
WHERE id = '$id' AND title = '$title';
Query #2:
SELECT *
FROM table
WHERE title = '$title' AND id = '$id';
Your two queries should have exactly the same execution plan; the order of AND'ed conditions does not matter, because both conditions are applied at the same time, or at least in the same way.
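You can verify this on your own server by comparing the plans, for example (backticks added because table is a reserved word in MySQL):
EXPLAIN SELECT * FROM `table` WHERE id = '$id' AND title = '$title';
EXPLAIN SELECT * FROM `table` WHERE title = '$title' AND id = '$id';
Both should report the same access path (with a primary key on id, a const lookup on the primary key).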
If you want to optimize this query:
SELECT *
FROM table
WHERE id = '$id' AND title = '$title';
Then you can use an index:
create index idx_table_id_title on table(id, title)
Also, when writing queries in an application, you should use parameters for the queries rather than substituting values directly into the query string.
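For example, using MySQL's server-side prepared statements (a sketch; in application code you would normally use your driver's placeholder mechanism instead):
PREPARE stmt FROM 'SELECT * FROM `table` WHERE id = ? AND title = ?';
SET @id = 1, @title = 'some title';
EXECUTE stmt USING @id, @title;
DEALLOCATE PREPARE stmt;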
I have a use case where I need to count the unmatched rows (excluding matching records) between two different partitions of a partitioned Hive table.
Let's suppose there is a partitioned table called test, partitioned on the column as_of_date. To get the unmatched rows I tried two options:
1.)
select count(x.item_id)
from
(select coalesce(test_new.item_id, test_old.item_id) as item_id
from
(select item_id from test where as_of_date = '2019-03-10') test_new
full outer join
(select item_id from test where as_of_date = '2019-03-09') test_old
on test_new.item_id = test_old.item_id
where coalesce(test_new.item_id,0) != coalesce(test_old.item_id,0)) as x;
2.) I create a view first and then query against it:
create view test_diff as
select coalesce(test_new.item_id, test_old.item_id) as item_id, coalesce(test_new.as_of_date, date_add(test_old.as_of_date, 1)) as as_of_date
from test test_new
full outer join test test_old
on (test_new.item_id = test_old.item_id and date_sub(test_new.as_of_date, 1) = test_old.as_of_date)
where coalesce(test_new.item_id,0) != coalesce(test_old.item_id,0);
Then I run this query:
select count(distinct item_id) from test_diff where as_of_date = '2019-03-10';
The two cases return different counts; with the second option I get a lower count. Can anyone suggest why the counts differ?
Assuming the view in the second option is filtered correctly (the equivalent of as_of_date = '2019-03-10' in the first option): the first option counts with count(x.item_id), whereas the second option uses count(distinct item_id). The DISTINCT will reduce the count in the second option whenever item_id values repeat.
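To confirm that duplicates explain the gap, you could compare both aggregates over the same result set (a quick diagnostic against the view from option 2):
select count(item_id) as total_rows, count(distinct item_id) as unique_items
from test_diff
where as_of_date = '2019-03-10';
If total_rows is larger than unique_items, repeated item_id values are the cause of the difference.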
I have the following SQL query I want to optimize:
select table1.tiers as col1, table1.id_item as col2
from items table1
where (table1.tiers is not null)
and table1.tiers<>''
and table1.id_item = (select max(table2.id_item)
from items table2
where table1.tiers=table2.tiers)
and table1.n_version_item=(select max(table2.n_version_item)
from items table2
where table2.id_item=table1.id_item)
I tried this:
select table1.tiers as col1, table1.id_item as col2
from items table1
where (table1.tiers is not null)
and table1.tiers<> ''
and CONCAT(table1.id_item,table1.n_version_item) = (select CONCAT(max(table2.id_item),max(table2.n_version_item))
from items table2
where table2.id_item=table1.id_item
and table1.tiers=table2.tiers)
But I'm not getting the same result: the original query returns fewer rows than the modified one. Note that the items table has a primary key (id, version), and a tier can be assigned to each such pair.
Wrapping a column in a function prevents an index from being used, so CONCAT(table1.id_item, table1.n_version_item) will not use an index unless it is a function-based index. However, as a_horse_with_no_name mentioned in the comments, you can use the following:
select itm.tiers as col1, itm.id_item as col2
from items itm
where itm.tiers is not null
and itm.tiers<>''
and (itm.id_item , itm.n_version_item)= (select
max(item_sub.id_item),max(item_sub.n_version_item)
from items item_sub
where itm.tiers=item_sub.tiers)
Then check the query's execution plan to see which index is used (you could start with an index leading on the tiers column, plus another index on id_item and n_version_item).
I think you want:
select i.tiers as col1, i.id_item as col2
from items i
where i.tiers is not null and -- redundant, but I'm leaving it in
      i.tiers <> '' and
      (id_item, n_version_item) = (select i2.id_item, i2.n_version_item
                                   from items i2
                                   where i2.tiers = i.tiers
                                   order by i2.id_item desc, i2.n_version_item desc
                                   limit 1
                                  );
For this version, you want an index on items(tiers, id_item, n_version_item).
If you hide a column inside a function (CONCAT, DATE, etc.), no index can be used to help performance. This eliminates your second version from consideration.
Related to that is the use of "Row Constructors" (see a_horse_with_no_name's Comment). They have historically been poorly optimized; avoid them. I am referring to WHERE (a,b) IN ( (1,2), ...) or other variants.
Now, let's dissect
and table1.id_item = (select max(table2.id_item)
from items table2
where table1.tiers=table2.tiers)
table2 needs INDEX(tiers, id_item), in that order. With such an index, the subquery is very fast. The other subquery needs INDEX(id_item, n_version_item). Those feed into the rest:
and table1.id_item = <<value>>
Now let's look at the whole
where (table1.tiers is not null)
and table1.tiers<>''
and table1.id_item = <<value>>
and table1.n_version_item = <<value>>
= is easy to optimize; the others are not. So let's build
INDEX(id_item, n_version_item, -- in either order
tiers) -- last
By using the order I specified, you can also avoid needing the INDEX(id_item, n_version_item) mentioned above.
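Concretely, the two recommended indexes could be created like this (index names are illustrative):
CREATE INDEX idx_items_tiers_id ON items(tiers, id_item);
CREATE INDEX idx_items_id_ver_tiers ON items(id_item, n_version_item, tiers);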
(It would help if you provided SHOW CREATE TABLE; I need to know what the PK is, and some other things.)
As a bonus, these indexes will be "covering indexes", meaning the queries can be answered from the index alone, without reading the table rows.
As a final note (a minor one):
where (table1.tiers is not null)
and table1.tiers<>''
It would be better to settle on a single encoding (NULL or the empty string) for whatever state you are indicating.
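For example, if you settle on NULL, you could normalize the existing rows once, after which the IS NOT NULL test alone is sufficient (a sketch):
UPDATE items SET tiers = NULL WHERE tiers = '';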
I have the SQL statement below and I am trying to add a USE INDEX clause for the indexes on percent and doc_type, which I have already created. No matter where I put the USE INDEX (ipercent, idoc_type) clause, it gives me an error saying USE is in the wrong place. Any ideas?
select name,e_title
from
(select * from problem2.workson natural join
(problem2.documents,problem2.employees)) as newTable
where percent = 100
and (doc_type = 'internal-report'
or doc_type = 'external-report')
group by name
having count(name) > 1
EXPLAIN output: (screenshot omitted)
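For what it's worth, in MySQL an index hint must immediately follow a base-table reference in the FROM clause; it cannot be attached to a derived table such as newTable. A rough sketch with the hints moved onto the base tables (assuming ipercent is on workson and idoc_type on documents; adjust to your schema):
select name, e_title
from problem2.workson use index (ipercent)
natural join problem2.documents use index (idoc_type)
natural join problem2.employees
where percent = 100
  and (doc_type = 'internal-report' or doc_type = 'external-report')
group by name
having count(name) > 1;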
The schematic code of what I am trying to do:
INPUT VAR inputOne; (First input of the desired statement)
INPUT VAR inputTwo; (Second input of the desired statement)
INPUT VAR inputThree; (Third input of the desired statement)
-
VAR repResult = getResult("SELECT * FROM `representatives` WHERE `rID` = inputOne LIMIT 1;")
VAR evResult = getResult("SELECT `events`.`eID` FROM `events` WHERE `eventDateTime` = inputTwo LIMIT 1;")
if (repResult != null && evResult != null) {
executeQuery("INSERT INTO `votes` (`representatives_rID`, `events_eID`, `voteResult`) VALUES(inputOne,evResult.eID,inputThree);");
}
It is quite slow when I execute these as separate statements, especially because there are ~1,000,000 rows that need to be checked and inserted.
I was wondering, if there is any alternative, one-query way of doing this.
You can use INSERT-SELECT syntax to accomplish this:
INSERT INTO `votes` (`representatives_rID`, `events_eID`, `voteResult`)
select inputOne, `events`.`eID`, inputThree FROM `events` WHERE `eventDateTime` = inputTwo LIMIT 1
The above combines all three params into one INSERT-SELECT statement where the results of the select are sent to the insert.
See: Insert into ... values ( SELECT ... FROM ... ) for select-insert statement.
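If the existence check on inputOne also matters (the pseudocode tested repResult as well), it can be folded in as a join; a sketch, with table and column names taken from the pseudocode:
INSERT INTO `votes` (`representatives_rID`, `events_eID`, `voteResult`)
SELECT r.`rID`, e.`eID`, inputThree
FROM `representatives` r
JOIN `events` e ON e.`eventDateTime` = inputTwo
WHERE r.`rID` = inputOne
LIMIT 1;
If either row is missing, the SELECT returns nothing and no row is inserted.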
Yes, you can put these statements into one stored procedure (which returns 2 result sets and performs 1 insert), but no, that probably wouldn't help, because 3 SQL statements are not much network traffic and because stored procedures are slow in MySQL.
Is rID a primary key? Does the first query extract big fields you don't really need?
Is there a unique index on eventDateTime? If the table is not InnoDB, the index should explicitly include eID so that it becomes a covering index (see the sketch after these questions).
After making those keys, you can drop LIMIT 1.
Are rID and eID datatypes as small as possible?
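If any of those indexes are missing, they might look like the following (names are illustrative; the composite version is for the non-InnoDB covering-index case):
CREATE UNIQUE INDEX idx_events_dt ON events(eventDateTime);
-- or, on a non-InnoDB table, include eID so the index covers the query:
CREATE INDEX idx_events_dt_eid ON events(eventDateTime, eID);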
I have a table defined like the following...
CREATE TABLE actions (
  id INTEGER PRIMARY KEY AUTO_INCREMENT,
  end BOOLEAN,
  type VARCHAR(15) NOT NULL,
  subtype_a VARCHAR(15),
  subtype_b VARCHAR(15)
);
I'm trying to query for the last end action of some type for each unique (subtype_a, subtype_b) pair, similar to a GROUP BY (except that SQLite doesn't specify which row is returned by a GROUP BY).
On an SQLite database of about 1MB, the query I have now can take upwards of two seconds, but I need to speed it up to take under a second (since this will be called frequently).
example query:
SELECT * FROM actions a_out
WHERE id =
(SELECT MAX(a_in.id) FROM actions a_in
WHERE a_out.subtype_a = a_in.subtype_a
AND a_out.subtype_b = a_in.subtype_b
AND a_in.status IS NOT NULL
 AND a_in.type = 'some_type');
If it helps, I know all the unique possibilities for a (subtype_a,subtype_b)
eg:
(a,1)
(a,2)
(b,3)
(b,4)
(b,5)
(b,6)
Beginning with version 3.7.11, SQLite guarantees which record is returned in a group:
Queries of the form: "SELECT max(x), y FROM table" returns the value of y on the same row that contains the maximum x value.
So greatest-n-per-group can be implemented in a much simpler way:
SELECT *, max(id)
FROM actions
WHERE type = 'some_type'
GROUP BY subtype_a, subtype_b
Is this any faster?
select *
from actions
where id in (select max(id)
             from actions
             where type = 'some_type'
             group by subtype_a, subtype_b);
This is the greatest-n-per-group problem that comes up frequently on Stack Overflow.
Here's how I solve it:
SELECT a_out.* FROM actions a_out
LEFT OUTER JOIN actions a_in ON a_out.subtype_a = a_in.subtype_a
AND a_out.subtype_b = a_in.subtype_b
AND a_out.id < a_in.id
WHERE a_out.type = 'some_type' AND a_in.id IS NULL
If you have an index on (type, subtype_a, subtype_b, id) this should run very fast.
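That index could be created like so (the name is illustrative):
CREATE INDEX idx_actions_type_subtypes_id ON actions(type, subtype_a, subtype_b, id);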
See also my answers to similar SQL questions:
Fetch the row which has the Max value for a column
Retrieving the last record in each group
SQL join: selecting the last records in a one-to-many relationship
Or this brilliant article by Jan Kneschke: Groupwise Max.