SQL query optimization - performance issues - MySQL

I have the following SQL query that I want to optimize:
select table1.tiers as col1, table1.id_item as col2
from items table1
where (table1.tiers is not null)
and table1.tiers<>''
and table1.id_item = (select max(table2.id_item)
from items table2
where table1.tiers=table2.tiers)
and table1.n_version_item=(select max(table2.n_version_item)
from items table2
where table2.id_item=table1.id_item)
I tried this:
select table1.tiers as col1, table1.id_item as col2
from items table1
where (table1.tiers is not null)
and table1.tiers<> ''
and CONCAT(table1.id_item,table1.n_version_item) = (select CONCAT(max(table2.id_item),max(table2.n_version_item))
from items table2
where table2.id_item=table1.id_item
and table1.tiers=table2.tiers)
But I'm not getting the same result: the original query returns fewer rows than the modified one. Note that the items table has a primary key (id, version), and a tier can be assigned to each such pair.

Wrapping a column in a function prevents an index on that column from being used, so CONCAT(table1.id_item,table1.n_version_item) will not use an index unless it is a function-based index. However, as a_horse_with_no_name mentioned in the comments, you can use the following:
select itm.tiers as col1, itm.id_item as col2
from items itm
where itm.tiers is not null
and itm.tiers<>''
and (itm.id_item , itm.n_version_item)= (select
max(item_sub.id_item),max(item_sub.n_version_item)
from items item_sub
where itm.tiers=item_sub.tiers)
Then check the query plan to see which index the query uses (you can start with one index on tiers and another index on id_item and n_version_item).

I think you want:
select i.tiers as col1, i.id_item as col2
from items i
where i.tiers is not null and -- redundant, but I'm leaving it in
i.tiers <> '' and
(i.id_item, i.n_version_item) = (select i2.id_item, i2.n_version_item
from items i2
where i2.tiers = i.tiers
order by i2.id_item desc, i2.n_version_item desc
limit 1
);
For this version, you want an index on items(tiers, id_item, n_version_item).

If you hide a column inside a 'function' (CONCAT, DATE, etc, etc), no index can be used to help performance. This eliminates your second version from consideration.
Related to that is the use of "Row Constructors" (see a_horse_with_no_name's Comment). They have historically been poorly optimized; avoid them. I am referring to WHERE (a,b) IN ( (1,2), ...) or other variants.
Now, let's dissect
and table1.id_item = (select max(table2.id_item)
from items table2
where table1.tiers=table2.tiers)
table2 needs INDEX(tiers, id_item) in that order. With that index, the subquery is very fast. The other subquery needs INDEX(id_item, n_version_item). Those feed into the rest:
and table1.id_item = <<value>>
Now let's look at the whole
where (table1.tiers is not null)
and table1.tiers<>''
and table1.id_item = <<value>>
and table1.n_version_item = <<value>>
= is easy to optimize; the others are not. So let's build
INDEX(id_item, n_version_item, -- in either order
tiers) -- last
By using the order I specified, you can avoid also needing INDEX(id_item, n_version_item) that was mentioned above.
(It would help if you provided SHOW CREATE TABLE; I need to know what the PK is, and some other things.)
As a bonus, these indexes will be "covering indexes".
As a final note (a minor one):
where (table1.tiers is not null)
and table1.tiers<>''
It would be better to settle on a single encoding (NULL or the empty string) for whatever you are indicating with them.
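To see why the CONCAT rewrite in the question returns more rows than the original, it can help to run both on a tiny data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, writes SQLite's `||` operator where the question uses CONCAT, and the four sample rows are made up. The rewrite correlates its single subquery on both id_item and tiers, so MAX(id_item) always equals the outer row's own id_item and the "highest id per tier" filter is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id_item        INTEGER NOT NULL,
        n_version_item INTEGER NOT NULL,
        tiers          TEXT,
        PRIMARY KEY (id_item, n_version_item)
    );
    -- Made-up sample rows: tier A has ids 1 and 2, tier B has id 3.
    INSERT INTO items VALUES
        (1, 1, 'A'), (1, 2, 'A'),
        (2, 1, 'A'),
        (3, 1, 'B');
""")

# Original query: highest id per tier, then highest version of that id.
ORIGINAL = """
    SELECT t1.tiers, t1.id_item
    FROM items t1
    WHERE t1.tiers IS NOT NULL AND t1.tiers <> ''
      AND t1.id_item = (SELECT MAX(t2.id_item) FROM items t2
                        WHERE t2.tiers = t1.tiers)
      AND t1.n_version_item = (SELECT MAX(t2.n_version_item) FROM items t2
                               WHERE t2.id_item = t1.id_item)
    ORDER BY 1, 2
"""

# Rewrite from the question (|| instead of CONCAT): the subquery is
# correlated on id_item as well, so every id keeps its latest version.
REWRITE = """
    SELECT t1.tiers, t1.id_item
    FROM items t1
    WHERE t1.tiers IS NOT NULL AND t1.tiers <> ''
      AND t1.id_item || t1.n_version_item =
          (SELECT MAX(t2.id_item) || MAX(t2.n_version_item) FROM items t2
           WHERE t2.id_item = t1.id_item AND t2.tiers = t1.tiers)
    ORDER BY 1, 2
"""

original_rows = conn.execute(ORIGINAL).fetchall()
rewrite_rows = conn.execute(REWRITE).fetchall()
print("original:", original_rows)
print("rewrite: ", rewrite_rows)
```

On this data the original keeps only id 2 for tier A, while the rewrite also returns id 1, which matches the "fewer rows" observation in the question.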

Calculating unmatching rows in partitioned table in hive

I have a use case where I have to count the unmatching rows (excluding matching records) between two different partitions of a partitioned Hive table.
Suppose there is a partitioned table called test, partitioned on the column as_of_date. To get the unmatching rows I tried two options:
1.)
select count(x.item_id)
from
(select coalesce(test_new.item_id, test_old.item_id) as item_id
from
(select item_id from test where as_of_date = '2019-03-10') test_new
full outer join
(select item_id from test where as_of_date = '2019-03-09') test_old
on test_new.item_id = test_old.item_id
where coalesce(test_new.item_id,0) != coalesce(test_old.item_id,0)) as x;
2.) I am creating a view first and then querying on that
create view test_diff as
select coalesce(test_new.item_id, test_old.item_id) as item_id, coalesce(test_new.as_of_date, date_add(test_old.as_of_date, 1)) as as_of_date
from test test_new
full outer join test test_old
on (test_new.item_id = test_old.item_id and date_sub(test_new.as_of_date, 1) = test_old.as_of_date)
where coalesce(test_new.item_id,0) != coalesce(test_old.item_id,0);
Then I am using query
select count(distinct item_id) from test_diff where as_of_date = '2019-03-10';
The two cases return different counts; the second option gives a lower count. Please suggest why the counts are different.
Assuming you have taken care to filter the test_new and test_old tables (as_of_date = '2019-03-10') in the second option:
In the first option you use count(x.item_id) in the select clause, whereas the second option uses count(distinct item_id). The distinct may have reduced the item count in the latter option.
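The effect of DISTINCT on the count can be checked on a toy table. This is a minimal sketch using Python's sqlite3 module rather than Hive; the table name diff_rows and its three rows are hypothetical and only stand in for the output of the difference query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the result of the difference query:
    -- item 1 appears twice for the same date.
    CREATE TABLE diff_rows (item_id INTEGER, as_of_date TEXT);
    INSERT INTO diff_rows VALUES
        (1, '2019-03-10'),
        (1, '2019-03-10'),
        (2, '2019-03-10');
""")

plain_count, distinct_count = conn.execute("""
    SELECT COUNT(item_id), COUNT(DISTINCT item_id)
    FROM diff_rows
    WHERE as_of_date = '2019-03-10'
""").fetchone()

print(plain_count, distinct_count)
```

If item_id repeats within a partition, the two counts diverge in exactly this way, so comparing both queries with the same aggregate is a reasonable first check.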

mysql update table set column3 where table1.column1 like concat ('%',table2.column2,'%')

I saw recently (can't find it now) this syntax:
... LIKE CONCAT('%',col1,'%')
It works for SELECTs, but in an UPDATE it affects 0 rows.
This is my query:
update locations set email = (
select col2 from vendoremail
where locations.city LIKE CONCAT('%',col1,'%')
AND locations.zip LIKE CONCAT('%',col1,'%')
)
here is a sample of col1 :
"455 N Cherokee St: Muskogee, OK 74403"
without the quotes
I hope I have given enough data to elicit an answer or two - thank you!
You have it backwards. You want to put the city and zip into the pattern.
update locations set email = (
select col2 from vendoremail
where col1 LIKE CONCAT('%', locations.city, '%', locations.zip, '%')
)
However, this may not always work properly. If you have two vendors in the same city+zip, the subquery will return 2 emails, but when you use a subquery as a value it has to return only 1 row. You can add LIMIT 1 to the subquery to prevent an error when this happens. But it will be selecting one of the vendors unpredictably -- maybe you should come up with a more reliable way to match the tables.
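The direction of the pattern can be tried out on the sample address from the question. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and writes `||` where MySQL would use CONCAT; the second location row and the e-mail address are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (city TEXT, zip TEXT, email TEXT);
    CREATE TABLE vendoremail (col1 TEXT, col2 TEXT);
    INSERT INTO locations (city, zip) VALUES
        ('Muskogee', '74403'),
        ('Tulsa', '74103');          -- made-up row with no matching vendor
    INSERT INTO vendoremail VALUES
        ('455 N Cherokee St: Muskogee, OK 74403', 'vendor@example.com');
""")

# Backwards: the short city name can never contain the full address.
backwards_matches = conn.execute("""
    SELECT COUNT(*)
    FROM locations JOIN vendoremail
      ON locations.city LIKE '%' || col1 || '%'
""").fetchone()[0]

# Corrected: look for the city and zip inside the address. LIMIT 1 keeps
# the subquery scalar if several vendors share a city and zip.
conn.execute("""
    UPDATE locations SET email = (
        SELECT col2 FROM vendoremail
        WHERE col1 LIKE '%' || locations.city || '%' || locations.zip || '%'
        LIMIT 1
    )
""")

rows = conn.execute(
    "SELECT city, email FROM locations ORDER BY city"
).fetchall()
print(backwards_matches, rows)
```

Note that a location with no matching vendor ends up with a NULL email, because the subquery returns no row for it; add a WHERE EXISTS (...) condition to the UPDATE if existing values should be left alone.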
If col1 is "455 N Cherokee St: Muskogee, OK 74403",
I think locations.city is Muskogee and locations.zip is 74403,
so the query should be
update locations
set email = (
select col2 from vendoremail
where col1 LIKE CONCAT('%',locations.city,'%')
AND col1 LIKE CONCAT('%',locations.zip,'%')
)

why using IN (or NOT IN) clause in a query makes it really slow

I have a query:
SELECT DISTINCT field1 FROM table1 WHERE field2 = something
(table1 contains 1 million records, execution time:0.106sec, returns: 20 records)
Another query
SELECT DISTINCT similarField1 FROM table2 WHERE similarField2 = somethingElse
(table2 contains half million records, execution time:0.078sec, returns: 20 records)
Now if I run a query, by combining above both:
SELECT DISTINCT field1 FROM table1 WHERE field2 = something AND field1 NOT IN (SELECT DISTINCT similarField1 FROM table2 WHERE similarField2 = somethingElse)
It doesn't return a result even after running for 10 minutes. Why has it become so dramatically slow, and what could be a potential solution?
edit: I am using MySQL with dbvisualizer 6.5
You don't need DISTINCT in the subquery. Try NOT EXISTS, which is probably more efficient in SQL Server:
SELECT DISTINCT field1
FROM table1
WHERE field2 = @something
AND NOT EXISTS
(
SELECT 1 FROM table2
WHERE table2.similarfield1 = table1.field1
AND table2.similarfield2 = @somethingelse
)
Edit: Since you have updated the tags, I'm not sure whether this is more efficient in MySQL. However, I'd prefer NOT EXISTS anyway, since it also works with NULL values (if you use IS NULL) and is easier to read and maintain.
My query and advice are similar to @TimSchmelter's.
In fact, you should not need DISTINCT at all. First remove DISTINCT and check whether you get duplicate records. You have asked about only part of your problem, and the table design is not clear.
You should post your complete problem and query here without hesitation. Also, don't forget to index field2, field1, similarField1, and similarField2.
SELECT DISTINCT field1
FROM table1 tbl1
WHERE field2 = something
AND NOT EXISTS (
SELECT similarField1
FROM table2 tbl2
WHERE tbl1.field1 = tbl2.similarField1
AND similarField2 = somethingElse
)
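The remark about NULL values can be illustrated with a small experiment. This is a sketch only: it runs on SQLite through Python's sqlite3 module rather than MySQL, the sample rows and the index name idx_table2 are made up, and it shows the semantic difference between NOT IN and NOT EXISTS, not their relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 INTEGER, field2 TEXT);
    CREATE TABLE table2 (similarField1 INTEGER, similarField2 TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'x'), (3, 'x');
    -- One NULL among the values the subquery will return.
    INSERT INTO table2 VALUES (1, 'y'), (NULL, 'y');
    -- Hypothetical index covering the correlated lookup.
    CREATE INDEX idx_table2 ON table2 (similarField2, similarField1);
""")

# NOT IN: comparing against a list that contains NULL never yields TRUE.
not_in_rows = conn.execute("""
    SELECT DISTINCT field1 FROM table1
    WHERE field2 = 'x'
      AND field1 NOT IN (SELECT similarField1 FROM table2
                         WHERE similarField2 = 'y')
    ORDER BY field1
""").fetchall()

# NOT EXISTS: the NULL row simply fails the equality and is ignored.
not_exists_rows = conn.execute("""
    SELECT DISTINCT field1 FROM table1 tbl1
    WHERE field2 = 'x'
      AND NOT EXISTS (SELECT 1 FROM table2 tbl2
                      WHERE tbl2.similarField1 = tbl1.field1
                        AND tbl2.similarField2 = 'y')
    ORDER BY field1
""").fetchall()

print(not_in_rows, not_exists_rows)
```

So the two forms are interchangeable only when similarField1 cannot be NULL (or NULLs are filtered out in the subquery).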

Reorder a MYSQL table

I have a MySQL table with an 'Order' field, but when a record gets deleted a gap appears.
How can I update my 'Order' field sequentially?
If possible in one query.
id.........order
1...........1
5...........2
4...........4
3...........6
5...........8
to
id.........order
1...........1
5...........2
4...........3
3...........4
5...........5
I could do this record by record,
running a SELECT ordered by Order and changing the Order field row by row,
but to be honest I don't like it.
Thanks
Extra info :
I also would like to change it this way :
id.........order
1...........1
5...........2
4...........3
3...........3.5
5...........4
to
id.........order
1...........1
5...........2
4...........3
3...........4
5...........5
In MySQL you can do this:
update t join
(select t.*, (@rn := @rn + 1) as rn
from t cross join
(select @rn := 0) const
order by t.`order`
) torder
on t.id = torder.id
set `order` = torder.rn;
In most databases, you can also do this with a correlated subquery. But this might be a problem in MySQL, because it doesn't allow the table being updated to be referenced in a subquery:
update t
set `order` = (select count(*)
from t t2
where t2.`order` < t.`order` or
(t2.`order` = t.`order` and t2.id <= t.id)
);
There is no need to re-number or re-order. The table just gives you all your data. If you need it presented a certain way, that is the job of a query.
You don't even need to change the order value in the query either, just do:
SELECT * FROM MyTable WHERE mycolumn = 'MyCondition' ORDER BY `order`;
The above answer is excellent but it took me a while to grok it so I offer a slight rewrite which I hope brings clarity to others faster:
update
originalTable
join (select originalTable.ID,
(@newValue := @newValue + 10) as newValue
from originalTable
cross join (select @newValue := 0) newTable
order by originalTable.Sequence)
originalTable_reordered
on originalTable.ID = originalTable_reordered.ID
set originalTable.Sequence = originalTable_reordered.newValue;
Note that originalTable.* is NOT required - only the field used for the final join.
My example assumes the field to be updated is called Sequence (perhaps clearer in intent than order but mainly sidesteps the reserved keyword issue)
What took me a while to get was that "const" in the original answer was not a MySQL keyword. (I'm never a fan of abbreviations for that reason: they can be interpreted many ways, especially at the very times when it is best that they not be misinterpreted. Makes for verbose code, I know, but clarity always trumps convenience in my book.)
Not quite sure what the select @newValue := 0 is for, but I think it is there because the variable has to be initialized before it can be used later on.
The value of this update is of course that it updates all the rows in question atomically, rather than doing a data pull and updating single rows one by one programmatically.
My next question, which should not be difficult to ascertain (though I've learned that SQL can be a tricky beast at the best of times), is whether this can be safely done on a subset of the data (where some originalTable.parentID is a set value).
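For readers who want to try the renumbering without MySQL user variables, here is a hedged sketch of the correlated-count idea from the first answer. It runs on SQLite via Python's sqlite3 module, so it is an illustration rather than MySQL code. The names t, ord and new_order are made up, and the ranks are staged in a temporary table first so that the UPDATE never reads the table it is modifying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; "ord" avoids the reserved word ORDER.
    CREATE TABLE t (id INTEGER PRIMARY KEY, ord NUMERIC);
    INSERT INTO t VALUES (1, 1), (5, 2), (4, 3), (3, 3.5), (6, 4);

    -- Stage each row's rank: the number of rows that sort at or before
    -- it, with id as the tie-breaker for equal ord values.
    CREATE TEMP TABLE new_order AS
    SELECT t.id,
           (SELECT COUNT(*) FROM t AS t2
            WHERE t2.ord < t.ord
               OR (t2.ord = t.ord AND t2.id <= t.id)) AS rn
    FROM t;

    -- Apply the staged ranks in one statement.
    UPDATE t
    SET ord = (SELECT rn FROM new_order WHERE new_order.id = t.id);
""")

rows = conn.execute("SELECT id, ord FROM t ORDER BY ord").fetchall()
print(rows)
```

The row that had ord 3.5 ends up at position 4 and the gaps are closed. Restricting both statements with the same WHERE condition (for example on a parent id column) would be one way to renumber only a subset, though that is untested here.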

SQL ANY & ALL Operators

I have started using sql and have heard much about the ANY and ALL operators. Can somebody explain to me the kind of queries they are used in and how they work?
The ANY and ALL operators allow you to perform a comparison between a single column value and a range of other values. For instance:
select * from Table1 t1 where t1.Col1 < ANY(select value from Table2)
ANY means that the condition will be satisfied if the operation is true for any of the values in the range. ALL means that the condition will be satisfied only if the operation is true for all values in the range.
To use an example that might hit closer to home, doing this:
select * from Table1 t1 where t1.Col1 = ANY(select value from Table2)
Is the same as doing this:
select * from Table1 t1 where t1.Col1 in (select value from Table2)
"I have heard much about the ANY and ALL operators"
I'm mildly surprised: I rarely see them used myself. Far more commonly seen are WHERE val IN (subquery) and WHERE EXISTS (subquery).
To borrow @Adam Robinson's example:
SELECT *
FROM Table1 AS t1
WHERE t1.Col1 < ANY (
SELECT value
FROM Table2
);
I more usually see this written like this:
SELECT *
FROM Table1 AS t1
WHERE EXISTS (
SELECT *
FROM Table2 AS t2
WHERE t1.Col1 < t2.value
);
I find this construct easier to read because the parameters of the predicate (t1.Col1 and t2.value respectively) are closer together.
Answers above addressed some aspects of "ANY" and did not address "ALL".
Both of these are more useful when comparing against another table and its entries are changing dynamically.
Especially true for < ANY and > ANY, since for static arguments, you could just take MAX/MIN respectively, and drop the "ANY".
For example, this query -
SELECT ProductName, ProductID FROM Products
WHERE ProductID > ANY (100, 200, 300);
can be simplified to -
SELECT ProductName, ProductID FROM Products
WHERE ProductID > 100;
Note that an "= ALL" query compares one column value with every value in ALL (...), so it is always false unless all of the "ALL" arguments are identical.
For example -
SELECT ProductName, ProductID FROM Products
WHERE ProductID = ALL (SELECT ProductID FROM OrderDetails);
which is always empty (false) when the subquery returns several different values, as in -
SELECT ProductName, ProductID FROM Products
WHERE ProductID = ALL (10, 20, 30);
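The MIN/MAX simplifications above can be checked by modelling the operators in plain Python. This is a sketch of the semantics only: the helper names are made up, and it deliberately ignores SQL's three-valued logic for NULLs.

```python
def gt_any(x, values):
    """x > ANY (values): true if x exceeds at least one value."""
    return any(x > v for v in values)


def lt_any(x, values):
    """x < ANY (values): true if x is below at least one value."""
    return any(x < v for v in values)


def gt_all(x, values):
    """x > ALL (values): true if x exceeds every value."""
    return all(x > v for v in values)


def eq_all(x, values):
    """x = ALL (values): true only if every value equals x."""
    return all(x == v for v in values)


values = [100, 200, 300]

# For a non-empty static list, ANY/ALL reduce to a MIN or MAX comparison.
for x in range(0, 400):
    assert gt_any(x, values) == (x > min(values))
    assert lt_any(x, values) == (x < max(values))
    assert gt_all(x, values) == (x > max(values))

print(eq_all(10, [10, 20, 30]))  # several different values
print(eq_all(10, [10, 10]))      # all arguments identical
print(eq_all(10, []))            # ALL over an empty set
```

The last call shows one edge case the prose does not mention: a comparison with ALL over an empty set is true, which is also how SQL treats an empty subquery.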
Adding to Adam's reply, be wary that the syntax can be ambiguous:
SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;
Here ANY can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value. (via postgresql.org)
Here is a sample query that may put some context into this. Let's say we have a table of Major League Baseball players and a table of common Puerto Rican last names, and somebody wants to see how common Puerto Rican players are in MLB. They could run the following query:
SELECT mlb_roster.last_name FROM mlb_roster WHERE mlb_roster.last_name = ANY (SELECT common_pr_names.last_name FROM common_pr_names)
What the query is doing here is comparing the last names on the MLB roster and displaying only the ones that are also found on the list of common Puerto Rican names.