TSQL Multiple IN queries bad performance - sql-server-2008

Why are queries like
delete from A where F1 IN (1,2,3,5,5) and F2 IN (7,9,10,11)
so slow (F1 and F2 are indexed, stats are updated), and how do you
optimize them?

Given your example, I'm not sure there's anything you could do to increase performance.
However, your example is simplistic. If it used subqueries in the IN clauses instead, there would probably be room for improvement, perhaps by using EXISTS or just joining. I think the meat of this question is probably about performance issues with IN clauses, though, right?
Your best tool when considering performance is to examine the execution plans of different solutions and see which one makes the most sense for the amount and types of data you expect.
This SO post explains a bit about how an IN clause works:
SQL Server IN vs. EXISTS Performance
Here's a blog post that also discusses performance factors:
http://sqlknowledgebank.blogspot.com/2012/11/in-exists-clause-and-their-performance.html

My guess is that it runs as a nested (dual) loop.
My sample is a SELECT; it is a lot faster to optimize a SELECT first.
With a join on a PK, the query optimizer has more to work with.
But with the PK, you cannot insert 5 twice.
-- One keyed temp table per IN list (the PK holds each value only once)
create table #tempF1 (ID int primary key);
insert into #tempF1 values (1),(2),(3),(5);
create table #tempF2 (ID int primary key);
insert into #tempF2 values (7),(9),(10),(11);
-- Each IN list becomes a join against its keyed temp table
select *
from tbl
inner merge join #tempF1
on tbl.F1 = #tempF1.ID
inner merge join #tempF2
on tbl.F2 = #tempF2.ID;
This may not work in your situation, so also test other join hints, and no hint at all.
I use this technique on some big tables with complex queries where the query optimizer made poor choices.
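For anyone who wants to try the keyed-temp-table idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. Assumptions: the table `tbl`, its rows, and the function names are hypothetical, and SQLite has no #temp tables, merge-join hints, or DELETE ... JOIN syntax. So this only checks that the keyed-table form removes exactly the same rows as the IN form; it says nothing about which is faster on SQL Server.

```python
import sqlite3

def build(conn):
    # Hypothetical table `tbl` with indexed columns F1 and F2, as in the question.
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, F1 INT, F2 INT)")
    conn.execute("CREATE INDEX ix_tbl_f1 ON tbl (F1)")
    conn.execute("CREATE INDEX ix_tbl_f2 ON tbl (F2)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)",
                     [(i, i % 6, i % 12) for i in range(1, 101)])

def delete_with_in(conn):
    # Original form: literal IN lists (the duplicate 5 is harmless for IN).
    cur = conn.execute(
        "DELETE FROM tbl WHERE F1 IN (1,2,3,5,5) AND F2 IN (7,9,10,11)")
    return cur.rowcount

def delete_with_keyed_tables(conn):
    # Load each list into a temp table with a PRIMARY KEY. The key would
    # reject the duplicate 5, so the list is de-duplicated (via a set) first.
    conn.execute("CREATE TEMP TABLE tempF1 (ID INTEGER PRIMARY KEY)")
    conn.execute("CREATE TEMP TABLE tempF2 (ID INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO tempF1 VALUES (?)",
                     [(v,) for v in sorted({1, 2, 3, 5, 5})])
    conn.executemany("INSERT INTO tempF2 VALUES (?)",
                     [(v,) for v in (7, 9, 10, 11)])
    # SQLite's DELETE has no JOIN clause, so the joined rows are picked out
    # by primary key in a subquery and then deleted from tbl.
    cur = conn.execute(
        "DELETE FROM tbl WHERE id IN ("
        " SELECT t.id FROM tbl t"
        " JOIN tempF1 a ON t.F1 = a.ID"
        " JOIN tempF2 b ON t.F2 = b.ID)")
    return cur.rowcount
```

In T-SQL the same join can be written directly in the DELETE (DELETE tbl FROM tbl INNER JOIN #tempF1 ...), which is where the join hints from the answer would go.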

Related

Should I use a subquery or a combined WHERE?

This specific situation may seem a bit silly, but I just want to know how I should solve it: there is a table (schools), and in this table you find all students with their school ID. The order is completely random, but with a SELECT statement you can sort it.
CREATE TABLE schools (school_id int, name varchar(32), age ...);
Now I want to search for a student by name (with LIKE '%name%'), but only if he's in a certain school.
I already tried this:
SELECT * FROM `schools` WHERE `school_id` = 33 and `name` LIKE '%max%';
But then I realized that I could also use a subquery like:
SELECT * FROM (SELECT * FROM `schools` WHERE `school_id` = 33) AS a
WHERE a.name LIKE '%max%';
Which way is more efficient, i.e. has better performance?
You can use the EXPLAIN keyword to see exactly how each query is executed.
I'd say it's almost certain that these two will execute identically.
The query optimizer will probably choose the same plan for both queries. If you want to know for sure, look at the execution plan when you execute each query.
The query without the subquery is probably more efficient in MySQL:
SELECT *
FROM `schools`
WHERE `school_id` = 33 and `name` LIKE '%max%';
MySQL has this nasty tendency to materialize subqueries -- that is, to actually run the subquery and save it as a temporary table (it is getting better, though). Most other databases do not do this. So, in other databases, the two should be equivalent.
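MySQL is smart enough to use an index on school_id, if one is available, even though there are other comparisons. (The leading % in LIKE '%max%' means an index on name can't be used to narrow the search, so school_id is the column that matters here.) If no suitable index is available, it will do a full table scan, which will probably dominate the performance.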
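A quick way to compare the two forms, sketched with Python's sqlite3 module standing in for MySQL. Assumptions: SQLite's planner is not MySQL's, and the index name ix_schools_school_id and the sample rows are made up. The sketch checks that both queries return the same rows and prints each plan so you can see whether the school_id index is used.

```python
import sqlite3

# The two queries from the question (backticks dropped for brevity).
DIRECT = "SELECT * FROM schools WHERE school_id = 33 AND name LIKE '%max%'"
DERIVED = ("SELECT * FROM (SELECT * FROM schools WHERE school_id = 33) AS a "
           "WHERE a.name LIKE '%max%'")

def build():
    conn = sqlite3.connect(":memory:")
    # Hypothetical subset of the question's table (the elided columns are left out).
    conn.execute("CREATE TABLE schools (school_id INT, name VARCHAR(32))")
    conn.execute("CREATE INDEX ix_schools_school_id ON schools (school_id)")
    conn.executemany("INSERT INTO schools VALUES (?, ?)",
                     [(33, "Max Power"), (33, "Anna"), (12, "Maxine"), (33, "maximilian")])
    return conn

def rows(conn, sql):
    return sorted(conn.execute(sql).fetchall())

def plan(conn, sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]

if __name__ == "__main__":
    conn = build()
    for sql in (DIRECT, DERIVED):
        print(rows(conn, sql), plan(conn, sql))
```

In MySQL itself the equivalent check is running EXPLAIN on both statements, as the answers suggest.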

EXISTS vs ALL, ANY, SOME

I'm trying to understand the difference between EXISTS and ALL in MySQL. Let me give you an example:
SELECT *
FROM table1
WHERE NOT EXISTS (
SELECT *
FROM table2
WHERE table2.val < table1.val
);
SELECT *
FROM table1
WHERE val <= ALL( SELECT val FROM table2 );
A quote from MySQL docs:
Traditionally, an EXISTS subquery starts with SELECT *, but it could
begin with SELECT 5 or SELECT column1 or anything at all. MySQL
ignores the SELECT list in such a subquery, so it makes no difference. [1]
Reading this, it seems to me that MySQL should be able to translate both queries to the same relational algebra expression. Both queries are just a simple comparison between values from two tables. However, that doesn't seem to be the case. I tried both queries, and the second one performs much better than the first one.
How exactly are these queries handled by the optimizer?
Why can't the optimizer make the first query perform as well as the second one?
Is it always more efficient to use an ALL/ANY/SOME condition?
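The queries in your question are not equivalent: they return different rows when NULLs are involved, so they will have different execution plans regardless of how well they're optimized. If table2.val contains a NULL, val <= ALL(...) can never be TRUE, while NOT EXISTS simply ignores that row; and a table1 row whose val is NULL is returned by NOT EXISTS but not by <= ALL (unless table2 is empty). Rewriting the second query as NOT val > ANY(...) doesn't close that gap, because under SQL's three-valued logic it gives exactly the same result as val <= ALL(...).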
You should always use EXPLAIN to see the execution plan of a query and realize that the execution plan can change as your data changes. Testing and understanding the execution plan will help you determine which methods perform better. There is no hard and fast rule for ALL/ANY/SOME and they're often optimized down to an EXISTS.
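Why the optimizer can't freely swap the two forms: a small Python model of SQL's three-valued logic (an illustration only, not how MySQL evaluates anything internally; the function names are made up) shows that they agree when there are no NULLs and disagree when either val column contains a NULL.

```python
# SQL's three-valued logic modelled in Python: True, False, or None (UNKNOWN).
# NULL values in the tables are represented by None.

def sql_lt(a, b):
    """a < b in SQL: UNKNOWN if either operand is NULL."""
    if a is None or b is None:
        return None
    return a < b

def sql_le(a, b):
    """a <= b in SQL: UNKNOWN if either operand is NULL."""
    if a is None or b is None:
        return None
    return a <= b

def not_exists_smaller(v, table2):
    """NOT EXISTS (SELECT * FROM table2 WHERE table2.val < v).
    EXISTS only counts rows whose condition is TRUE; UNKNOWN rows are ignored,
    so NOT EXISTS always yields TRUE or FALSE."""
    return not any(sql_lt(s, v) is True for s in table2)

def le_all(v, table2):
    """v <= ALL (SELECT val FROM table2): FALSE if any comparison is FALSE,
    TRUE if every comparison is TRUE (including an empty table2),
    otherwise UNKNOWN."""
    results = [sql_le(v, s) for s in table2]
    if any(r is False for r in results):
        return False
    if all(r is True for r in results):
        return True
    return None

def where(table1, predicate, table2):
    """A WHERE clause keeps a row only when its condition is TRUE."""
    return [v for v in table1 if predicate(v, table2) is True]
```

For example, with table1 = [1, 5] and table2 = [3, 7, None], the NOT EXISTS form keeps 1 while the <= ALL form returns nothing.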

MySQL performance of VIEW for tables combined with UNION ALL

Let's say I have 2 tables in MySQL:
create table `persons` (
`id` bigint unsigned not null auto_increment,
`first_name` varchar(64),
`surname` varchar(64),
primary key(`id`)
);
create table `companies` (
`id` bigint unsigned not null auto_increment,
`name` varchar(128),
primary key(`id`)
);
Now, very often I need to treat them the same way, which is why the following query:
select person.id as `id`, concat(person.first_name, ' ', person.surname) as `name`, 'person' as `person_type`
from persons person
union all
select company.id as `id`, company.name as `name`, 'company' as `person_type`
from companies company
starts to appear in other queries quite often: as part of joins or subselects.
For now, I simply inject this query into joins or subselects like:
select *
from some_table row
left outer join (>>> query from above goes here <<<) as `persons`
on row.person_id = persons.id and row.person_type = persons.person_type
But today I had to use this union query in another query multiple times, i.e. join it twice.
Since I've never had experience with views and have heard that they have many disadvantages, my question is:
Is it normal practice to create a view for this union query and use it in my joins, subselects etc.? In terms of performance, will it be worse, equal or better compared to just inserting it into joins, subselects etc.? Are there any drawbacks of having a view in this case?
Thanks in advance for any help!
I concur with all of the points in Bill Karwin's excellent answer.
Q: Is it normal practice to create a view for this union query and use it in my joins, subselects etc.?
A: With MySQL, the more normal practice is to avoid using the "CREATE VIEW" statement.
Q: In terms of performance, will it be worse, equal or better compared to just inserting it into joins, subselects etc.?
A: Referencing a view object will have the identical performance to an equivalent inline view.
(There might be a teeny-tiny bit more work to look up the view object, check privileges, and then replace the view reference with the stored SQL, vs. sending a statement that is just a teeny-tiny bit longer. But any of those differences are insignificant.)
Q: Are there any drawbacks of having a view in this case?
A: The biggest drawback is in how MySQL processes a view like this one, whether it's stored or inline. Because the query contains UNION ALL, MySQL can't merge it into the outer query; it runs the view query and materializes the results from that query as a temporary MyISAM table. But there's no difference there whether the view definition is stored or included inline. (Other RDBMSs process views much differently from MySQL.)
One big drawback of a view is that predicates from the outer query NEVER get pushed down into the view query. Every time you reference that view, even with a query for a single id value, MySQL is going to run the view query and create a temporary MyISAM table (with no indexes on it), and THEN MySQL will run the outer query against that temporary MyISAM table.
So, in terms of performance, think of a reference to a view on par with "CREATE TEMPORARY TABLE t (cols) ENGINE=MyISAM" and "INSERT INTO t (cols) SELECT ...".
MySQL actually refers to an inline view as a "derived table", and that name makes a lot of sense, when we understand what MySQL is doing with it.
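To make the comparison concrete, here is a sketch that runs the question's join three ways: with the union inlined as a derived table, through a stored view, and against a hand-built temporary table filled with INSERT ... SELECT. Assumptions: Python's sqlite3 module stands in for MySQL, the view name all_persons, the temp table all_persons_tmp, the some_table columns, and the sample rows are hypothetical, and CONCAT is written as SQLite's || operator. SQLite has no MyISAM and plans differently, so this only shows that the three forms are interchangeable in their results, not how fast each is in MySQL.

```python
import sqlite3

# The question's union query, with CONCAT rewritten as SQLite's || operator.
UNION_SQL = """
    SELECT person.id AS id,
           person.first_name || ' ' || person.surname AS name,
           'person' AS person_type
      FROM persons person
    UNION ALL
    SELECT company.id AS id, company.name AS name, 'company' AS person_type
      FROM companies company
"""

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, surname TEXT);
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE some_table (row_id INTEGER PRIMARY KEY, person_id INT, person_type TEXT);
        INSERT INTO persons VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
        INSERT INTO companies VALUES (1, 'Acme');
        INSERT INTO some_table VALUES (10, 1, 'person'), (11, 1, 'company'),
                                      (12, 2, 'person'), (13, 3, 'company');
    """)
    return conn

def join_against(conn, source):
    # `source` is a table name, a view name, or a parenthesized inline view.
    sql = (f"SELECT r.row_id, p.name FROM some_table r "
           f"LEFT OUTER JOIN {source} AS p "
           f"ON r.person_id = p.id AND r.person_type = p.person_type "
           f"ORDER BY r.row_id")
    return conn.execute(sql).fetchall()

def inline_view(conn):
    return join_against(conn, f"({UNION_SQL})")

def stored_view(conn):
    conn.execute(f"CREATE VIEW IF NOT EXISTS all_persons AS {UNION_SQL}")
    return join_against(conn, "all_persons")

def temp_table(conn):
    # Manual version of "CREATE TEMPORARY TABLE t ..." plus "INSERT INTO t SELECT ...".
    conn.execute("DROP TABLE IF EXISTS temp.all_persons_tmp")
    conn.execute("CREATE TEMP TABLE all_persons_tmp (id INT, name TEXT, person_type TEXT)")
    conn.execute(f"INSERT INTO all_persons_tmp (id, name, person_type) {UNION_SQL}")
    return join_against(conn, "all_persons_tmp")
```

Row 13 refers to a company that doesn't exist, so all three forms return NULL (None) for its name, as a LEFT OUTER JOIN should.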
My personal preference is to not use the "CREATE VIEW" statement. The biggest drawback (as I see it) is that it "hides" the SQL that is being executed. For the future reader, the reference to the view looks like a table. And then, when he goes to write a SQL statement, he's going to reference the view like it was a table, which is very convenient. Then he decides he's going to join that table to itself, with another reference to it. (For the second reference, MySQL also runs that query again and creates yet another temporary (and unindexed) MyISAM table.) And now there's a JOIN operation on that. And then a predicate "WHERE view.column = 'foo'" gets added on the outer query.
It ends up "hiding" the most obvious performance improvement, sliding that predicate into the view query.
And then someone comes along and decides to create a new view, which references the old view. He only needs a subset of rows, and can't modify the existing view because that might break something, so he creates a new view... CREATE VIEW myview AS SELECT p.* FROM publicview p WHERE p.col = 'foo'.
And, now, a reference to myview is going to first run the publicview query, create a temporary MyISAM table, then the myview query gets run against that, creating another temporary MyISAM table, which the outer query is going to run against.
Basically, the convenience of the view has the potential for unintentional performance problems. With the view definition available on the database for anyone to use, someone is going to use it, even where it's not the most appropriate solution.
At least with an inline view, the person writing the SQL statement is more aware of the actual SQL being executed, and having all that SQL laid out gives an opportunity for tweaking it for performance.
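The "slide the predicate into the view query" tweak can be sketched like this, with Python's sqlite3 module as a stand-in (the schema follows the question; the rows and function names are made up). Both queries return the same rows; in the second, each branch of the UNION ALL is filtered on its own primary key, which is the rewrite you would otherwise be hoping the optimizer does for you.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, surname TEXT);
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO persons VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
        INSERT INTO companies VALUES (1, 'Acme'), (2, 'Initech');
    """)
    return conn

# Predicate applied outside the derived table. An engine that materializes
# the derived table without pushing predicates down (as described above for
# MySQL) builds the whole UNION ALL result first and filters it afterwards.
OUTSIDE = """
SELECT u.id, u.name, u.person_type FROM (
    SELECT id, first_name || ' ' || surname AS name, 'person' AS person_type FROM persons
    UNION ALL
    SELECT id, name, 'company' AS person_type FROM companies
) AS u
WHERE u.id = ?
"""

# The same predicate pushed by hand into each branch, so each branch can
# look up its table's primary key instead of reading every row.
INSIDE = """
SELECT id, first_name || ' ' || surname AS name, 'person' AS person_type
  FROM persons WHERE id = ?
UNION ALL
SELECT id, name, 'company' AS person_type
  FROM companies WHERE id = ?
"""

def outside(conn, wanted_id):
    return sorted(conn.execute(OUTSIDE, (wanted_id,)).fetchall())

def inside(conn, wanted_id):
    return sorted(conn.execute(INSIDE, (wanted_id, wanted_id)).fetchall())
```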
My two cents.
TAMING BEASTLY SQL
I find that applying regular formatting rules (that my tools automatically do) can bend monstrous SQL into something I can read and work with.
SELECT row.col1
     , row.col2
     , person.*
  FROM some_table row
  LEFT
  JOIN ( SELECT 'person'  AS `person_type`
              , p.id      AS `id`
              , CONCAT(p.first_name,' ',p.surname) AS `name`
           FROM persons p
          UNION ALL
         SELECT 'company' AS `person_type`
              , c.id      AS `id`
              , c.name    AS `name`
           FROM companies c
       ) person
    ON person.id = row.person_id
   AND person.person_type = row.person_type
I'd be equally likely to avoid the inline view altogether and use conditional expressions in the SELECT list, though this does get more unwieldy with lots of columns.
SELECT row.col1
     , row.col2
     , row.person_type AS ref_person_type
     , row.person_id   AS ref_person_id
     , CASE
       WHEN row.person_type = 'person'  THEN p.id
       WHEN row.person_type = 'company' THEN c.id
       END AS `person_id`
     , CASE
       WHEN row.person_type = 'person'  THEN CONCAT(p.first_name,' ',p.surname)
       WHEN row.person_type = 'company' THEN c.name
       END AS `name`
  FROM some_table row
  LEFT
  JOIN persons p
    ON row.person_type = 'person'
   AND p.id = row.person_id
  LEFT
  JOIN companies c
    ON row.person_type = 'company'
   AND c.id = row.person_id
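A small check that the CASE-based form returns the same rows as the UNION ALL form, sketched with Python's sqlite3 module. Assumptions: the some_table columns row_id, person_id and person_type, the sample rows, and the helper names are hypothetical, and CONCAT is written as SQLite's || operator.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, surname TEXT);
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE some_table (row_id INTEGER PRIMARY KEY, person_id INT, person_type TEXT);
        INSERT INTO persons VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
        INSERT INTO companies VALUES (1, 'Acme');
        INSERT INTO some_table VALUES (10, 1, 'person'), (11, 1, 'company'),
                                      (12, 2, 'person'), (13, 3, 'company');
    """)
    return conn

# Two LEFT JOINs, each guarded by person_type, with CASE picking the column.
CASE_SQL = """
SELECT r.row_id,
       CASE WHEN r.person_type = 'person'  THEN p.id
            WHEN r.person_type = 'company' THEN c.id END AS person_id,
       CASE WHEN r.person_type = 'person'  THEN p.first_name || ' ' || p.surname
            WHEN r.person_type = 'company' THEN c.name END AS name
  FROM some_table r
  LEFT JOIN persons p   ON r.person_type = 'person'  AND p.id = r.person_id
  LEFT JOIN companies c ON r.person_type = 'company' AND c.id = r.person_id
 ORDER BY r.row_id
"""

# The inline-view (UNION ALL) form for comparison.
UNION_SQL = """
SELECT r.row_id, u.id AS person_id, u.name
  FROM some_table r
  LEFT JOIN (SELECT 'person' AS person_type, id, first_name || ' ' || surname AS name
               FROM persons
             UNION ALL
             SELECT 'company', id, name FROM companies) u
    ON u.id = r.person_id AND u.person_type = r.person_type
 ORDER BY r.row_id
"""

def fetch(conn, sql):
    return conn.execute(sql).fetchall()
```

The CASE form never builds the combined person/company row set at all; each row of some_table is matched directly against the one table its person_type points to.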
A view makes your SQL shorter. That's all.
It's a common misconception for MySQL users that views store anything. They don't (at least not in MySQL). They're more like an alias or a macro. Querying the view is most often just like running the query in the "expanded" form. Querying a view twice in one query (as in the join example you mentioned) doesn't take any advantage of the view -- it will run the query twice.
In fact, views can cause worse performance, depending on the query and how you use them, because they may need to store the result in a temporary table every time you query them.
See http://dev.mysql.com/doc/refman/5.6/en/view-algorithms.html for more details on when a view uses the temptable algorithm.
On the other hand, UNION queries also create temporary tables as they accumulate their results. So you're stuck with the cost of a temp table anyway.

SQL Server Partition Performance Issue

I have the following scenario:
TableA (ID GUID, Type INT) : +60M rows
TableB (ID GUID) : +5M rows
TableA has an index on ID and Type
In TableB, the primary key is ID
I need to improve the following query:
SELECT * FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID AND A.Type = 5
The query takes about 30 seconds to complete.
We have tried partitioning TableA on the Type column, but the query execution time remains the same. Even the execution plan is still the same. As far as I understood, partitioning the table should greatly improve performance?
Do I have to adjust my query to use the partition thus increasing performance?
Are my indexes wrong?
Thanks in advance!
You are one of the people who think partitioning is a magic switch that improves performance when pressed. Partitioning mostly reduces performance and helps in a few narrow cases. It is mostly a management feature for bulk loading and data archiving/deletion.
Partitioning has serious consequences and cannot be done without proper understanding, planning and testing.
Create the proper indexes. In your case, an index on A(Type, ID) would be a good start; alternatively, a filtered index on A(ID) WHERE Type = 5.
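The effect of the suggested composite index on the plan can be sketched with Python's sqlite3 module. This is only a stand-in: SQLite has no partitioning and no UNIQUEIDENTIFIER type (so the GUIDs are TEXT here), and the index name IX_TableA_Type_ID is made up. The sketch prints EXPLAIN QUERY PLAN output before and after creating A(Type, ID).

```python
import sqlite3

# The query from the question.
QUERY = ("SELECT * FROM TableA A INNER JOIN TableB B "
         "ON A.ID = B.ID AND A.Type = 5")

def build():
    conn = sqlite3.connect(":memory:")
    # GUIDs stored as TEXT, since SQLite has no UNIQUEIDENTIFIER type.
    conn.execute("CREATE TABLE TableA (ID TEXT, Type INT)")
    conn.execute("CREATE TABLE TableB (ID TEXT PRIMARY KEY)")
    return conn

def plan(conn):
    # Detail column of each EXPLAIN QUERY PLAN row.
    return [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

def add_type_id_index(conn):
    # Type first, to match the equality filter; ID second, to serve the join.
    conn.execute("CREATE INDEX IX_TableA_Type_ID ON TableA (Type, ID)")

if __name__ == "__main__":
    conn = build()
    print("before:", plan(conn))
    add_type_id_index(conn)
    print("after: ", plan(conn))
```

SQLite also supports partial indexes (CREATE INDEX ... WHERE ...), the closest analogue of the filtered index mentioned above. On SQL Server itself you would compare the actual execution plans before and after creating each index.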

Better way to accomplish Nested SQL Query?

Right now I'm implementing the following SQL query for an iPhone app, and I'm using HTTP GET. The SQL query does not contain joins, so is it efficient enough?
SELECT
menu_name
FROM Menus
WHERE
menu_id IN (
SELECT
menus_id
FROM Restaurants_Menus
WHERE Restaurants_id = '$restaurantID'
)
Only you can answer if it is efficient enough. If it meets your needs, then it is fine. However, it may be faster if you use a JOIN:
SELECT
Menus.menu_name
FROM
Menus
JOIN Restaurants_Menus ON Menus.menu_id = Restaurants_Menus.menus_id
WHERE Restaurants_Menus.Restaurants_id = '$restaurantID'
You can run them both with EXPLAIN to determine where indexes are being used and judge the query execution time. If Restaurants_Menus is not a large table, and Restaurants_id is a primary key, the two queries are not likely to differ much in execution time.
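Here is a sketch that runs the IN form, the JOIN form, and an EXISTS form side by side, using Python's sqlite3 module as a stand-in for MySQL (the sample rows and helper names are made up). It also shows a semantic difference worth knowing before swapping them: if Restaurants_Menus contains the same restaurant/menu pair twice, the JOIN returns that menu twice, while IN and EXISTS return it once. It uses a bound parameter instead of pasting '$restaurantID' into the SQL string.

```python
import sqlite3

IN_SQL = ("SELECT menu_name FROM Menus WHERE menu_id IN "
          "(SELECT menus_id FROM Restaurants_Menus WHERE Restaurants_id = ?)")
JOIN_SQL = ("SELECT Menus.menu_name FROM Menus "
            "JOIN Restaurants_Menus ON Menus.menu_id = Restaurants_Menus.menus_id "
            "WHERE Restaurants_Menus.Restaurants_id = ?")
EXISTS_SQL = ("SELECT m.menu_name FROM Menus m WHERE EXISTS "
              "(SELECT 1 FROM Restaurants_Menus rm "
              " WHERE rm.menus_id = m.menu_id AND rm.Restaurants_id = ?)")

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Menus (menu_id INTEGER PRIMARY KEY, menu_name TEXT)")
    conn.execute("CREATE TABLE Restaurants_Menus (Restaurants_id INT, menus_id INT)")
    conn.executemany("INSERT INTO Menus VALUES (?, ?)",
                     [(1, "Breakfast"), (2, "Lunch"), (3, "Dinner")])
    # The (7, 2) pair appears twice on purpose, to show how duplicates differ.
    conn.executemany("INSERT INTO Restaurants_Menus VALUES (?, ?)",
                     [(7, 1), (7, 2), (7, 2), (8, 3)])
    return conn

def menu_names(conn, sql, restaurant_id):
    # Bound parameter (?) instead of interpolating '$restaurantID' into the SQL text.
    return sorted(row[0] for row in conn.execute(sql, (restaurant_id,)))
```

If the link table can never hold duplicate pairs (for example, with a primary key on (Restaurants_id, menus_id)), the three forms return the same rows and the choice comes down to what EXPLAIN shows.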