Should i rather use a subquery or a combined WHERE?

Should i rather use a subquery or a combined WHERE? - mysql

This specific situation may seem a bit silly, but i just want to know how i should solve it: there is a table (schools) and in this table you find all students with their school-id. The order is completely random, but with a SELECT statement you can sort it.
CREATE TABLE schools (school_id int, name varchar(32), age ...);
Now i want to search for a student by his name (with LIKE '%name%'), but only if he's in a certain school.
I already tried this:
SELECT * FROM `schools` WHERE `school_id` = 33 and `name` LIKE '%max%';
But then i realized, that i could also use subqueries like:
SELECT * FROM (SELECT * FROM `schools` WHERE `school_id` = 33) AS a
WHERE a.name LIKE '%max%';
Which way is more efficient/has a higher performance?

You can use the EXPLAIN keyword to see exactly how each query is executed.
I'd say it's almost a definite that these two will execute identically.

The query optimizer will probably choose the same plan for both queries. If you want to know for sure, look at the execution plan when you execute each query.

The query without the subquery is probably more efficient in MySQL:
SELECT *
FROM `schools`
WHERE `school_id` = 33 and `name` LIKE '%max%';
MySQL has this nasty tendency to materialize subqueries -- that is, to actually run the subquery and save it as a temporary table (it is getting better, though). Most other databases do not do this. So, in other databases, the two should be equivalent.
MySQL is smart enough to use an index, if available, for school_id, even though there are other comparisons. If no indexes are available, it will be doing a full table scan, which will probably dominate the performance.

Related

MySQL order by makes query slow solved, but not sure why

I have the following query:
select *
from
`twitter_posts`
where
`main_handle_id` in (
select
`twitter`.`main_handle_id`
from
`users`
inner join `twitter` on `twitter`.`user_id` = `user`.`id`
where
`users` LIKE 'foo'
)
order by created_at
This query runs dramatically slow when the order by is used. The twitter_posts table has indexes on the created_at timestamp and on the main_handle_id column.
I used "explain" to see what the engine was planning on doing and noticed a filesort... which I was not expecting to see as an index exists on the twitter_posts table. After skimming through the various posts covering this on Stackoverflow and blogs, none of the examples worked for me. I had a hunch, that maybe, the sql optimizer got confused somehow with the nested select. So I wrapped the query and ordered the result of that by created_at:
select * (select *
from
`twitter_posts`
where
`main_handle_id` in (
select
`twitter`.`main_handle_id`
from
`users`
inner join `twitter` on `twitter`.`user_id` = `user`.`id`
where
`users` LIKE 'foo'
)
)
order by created_at
Using explain on this one, DOES use the index to sort and returns the response in a fraction of what it takes with filesort. So I've "solved" my problem, but I don't understand why SQL would make a different choice based on the solution. As far as I can see, I'm just wrapping the result and ask everything of that result in a seemingly redundant statement...
Edit: I'm still unaware of why the optimizer started using the index on the created_at when using a redundant subquery, but in some cases it wasn't faster. I've now gone with the solution to add the statement "FORCE INDEX FOR ORDER BY created_at_index_name".

Don't use IN ( SELECT ... ); convert that to a JOIN. Or maybe converting to EXISTS ( SELECT ... ) may optimize better.
Please provide the EXPLAINs. Even if they are both the same. Provide EXPLAIN FORMAT=JSON SELECT ... if your version has such.
If the Optimizer is confused by a nested query, why add another nesting??
Which table is users in?
You mention user.id, yet there is no user table in FROM or JOIN.
What version of MySQL (or MariaDB) are you using? 5.6 and especially 5.7 have significant differences in the optimizer, with respect to MariaDB.
Please provide SHOW CREATE TABLE for each table mentioned in the query.
There are hints in your comments that the queries you presented are not the ones giving you trouble. Please do not ask about a query that does not itself demonstrate the problem.
Oh, is it users.name? Need to see the CREATE TABLEs to see if you have a suitable index.
Since MariaDB was mentioned, I am adding that tag. (This is probably the only way to get someone with specific MariaDB expertise involved.)

How to remove this filesort?

I have problem, my slow query is still using filesort.
I can not get rid of this extra flag. Could you please help me?
My query looks like this:
select `User`,
`Friend`
from `friends`
WHERE
(`User`='3053741' || `Friend`='3053741')
AND `Status`='1'
ORDER by `Recent` DESC;
Explain says this:
http://blindr.eu/stack/1.jpg
My table structure and indexed (it is in slovak language, but it should be the same):
http://blindr.eu/stack/2.jpg
A little help is needed, how to get rid of this filesort?

USING FILESORT does not mean, that MySQL is actually using a file. The naming is a bit "unlucky". The sorting is done in memory, as long as the data fits into memory. That said, have a look at the variable sort_buffer_size. But keep in mind, that this is a per session variable. Every connection to your server allocates this memory.
The other option is of course, to do the sorting on application level. You have no LIMIT clause anyway.
Oh, and you have too many indexes. Combined indexes can also be used when just the left-most column is used. So some of your indexes are redundant.
If performance is really that much of an issue, try another approach, cause the OR in the WHERE makes it hard, if not impossible, to get rid of using filesort via indexes. Try this query:
SELECT * FROM (
select "this is a user" AS friend_or_user, `User`, Recent
from `friends`
WHERE
`User`='3053741'
AND `Status`='1'
UNION ALL
select "this is a friend",
`Friend`, Recent
from `friends`
WHERE
`Friend`='3053741'
AND `Status`='1'
) here_needs_to_be_an_alias
ORDER by `Recent` DESC

MySQL performance of VIEW for tables combined with UNION ALL

Let's say I have 2 tables in MySQL:
create table `persons` (
`id` bigint unsigned not null auto_increment,
`first_name` varchar(64),
`surname` varchar(64),
primary key(`id`)
);
create table `companies` (
`id` bigint unsigned not null auto_increment,
`name` varchar(128),
primary key(`id`)
);
Now, very often I need to treat them the same, that's why following query:
select person.id as `id`, concat(person.first_name, ' ', person.surname) as `name`, 'person' as `person_type`
from persons
union all
select company.id as `id`, company.name as `name`, 'company' as `person_type`
from companies
starts to appear in other queries quite often: as part of joins or subselects.
For now, I simply inject this query into joins or subselects like:
select *
from some_table row
left outer join (>>> query from above goes here <<<) as `persons`
on row.person_id = persons.id and row.person_type = persons.person_type
But, today I had to use discussed union query into another query multiple times i.e. join it twice.
Since I never had experience with views and heard that they have many disadvantages, my question is:
Is it normal practice to create a view for discussed union query and use it in my joins , subselects etc? In terms of performance - will it be worse, equal or better comparing to just inserting it into joins, subselects etc? Are there any drawbacks of having a view in this case?
Thanks in advance for any help!

I concur with all of the points in Bill Karwin's excellent answer.
Q: Is it normal practice to create a view for discussed union query and use it in my joins, subselects etc?
A: With MySQL the more normal practices is to avoid using "CREATE VIEW" statement.
Q: In terms of performance - will it be worse, equal or better comparing to just inserting it into joins, subselects etc?
A: Referencing a view object will have the identical performance to an equivalent inline view.
(There might be a teensy-tiny bit more work to lookup the view object, checking privileges, and then replace the view reference with the stored SQL, vs. sending a statement that is just a teeny-tiny bit longer. But any of those differences are insignificant.)
Q: Are there any drawbacks of having a view in this case?
A: The biggest drawback is in how MySQL processes a view, whether it's stored or inline. MySQL will always run the view query and materialize the results from that query as a temporary MyISAM table. But there's no difference there whether the view definition is stored, or whether it's included inline. (Other RDBMSs process views much differently than MySQL).
One big drawback of a view is that predicates from the outer query NEVER get pushed down into the view query. Every time you reference that view, even with a query for a single id value, MySQL is going to run the view query and create a temporary MyISAM table (with no indexes on it), and THEN MySQL will run the outer query against that temporary MyISAM table.
So, in terms of performance, think of a reference to a view on par with "CREATE TEMPORARY TABLE t (cols) ENGINE=MyISAM" and "INSERT INTO t (cols) SELECT ...".
MySQL actually refers to an inline view as a "derived table", and that name makes a lot of sense, when we understand what MySQL is doing with it.
My personal preference is to not use the "CREATE VIEW" statement. The biggest drawback (as I see it) is that it "hides" SQL that is being executed. For the future reader, the reference to the view looks like a table. And then, when he goes to write a SQL statement, he's going to reference the view like it was a table, so very convenient. Then he decides he's going to join that table to itself, with another reference to it. (For the second reference, MySQL also runs that query again, and creates yet another temporary (and unindexed) MyISAM table. And now there's a JOIN operation on that. And then a predicate "WHERE view.column = 'foo'" gets added on the outer query.
It ends up "hiding" the most obvious performance improvement, sliding that predicate into the view query.
And then, someone comes along and decides they are going to create new view, which references the old view. He only needs a subset of rows, and can't modify the existing view because that might break something, so he creates a new view... CREATE VIEW myview FROM publicview p WHERE p.col = 'foo'.
And, now, a reference to myview is going to first run the publicview query, create a temporary MyISAM table, then the myview query gets run against that, creating another temporary MyISAM table, which the outer query is going to run against.
Basically, the convenience of the view has the potential for unintentional performance problems. With the view definition available on the database for anyone to use, someone is going to use it, even where it's not the most appropriate solution.
At least with an inline view, the person writing the SQL statement is more aware of the actual SQL being executed, and having all that SQL laid out gives an opportunity for tweaking it for performance.
My two cents.
TAMING BEASTLY SQL
I find that applying regular formatting rules (that my tools automatically do) can bend monstrous SQL into something I can read and work with.
SELECT row.col1
, row.col2
, person.*
FROM some_table row
LEFT
JOIN ( SELECT 'person' AS `person_type`
, p.id AS `id`
, CONCAT(p.first_name,' ',p.surname) AS `name`
FROM person p
UNION ALL
SELECT 'company' AS `person_type`
, c.id AS `id`
, c.name AS `name`
FROM company c
) person
ON person.id = row.person_id
AND person.person_type = row.person_type
I'd be equally likely to avoid the inline view at all, and use conditional expressions in the SELECT list, though this does get more unwieldy for lots of columns.
SELECT row.col1
, row.col2
, row.person_type AS ref_person_type
, row.person_id AS ref_person_id
, CASE
WHEN row.person_type = 'person' THEN p.id
WHEN row.person_type = 'company' THEN c.id
END AS `person_id`
, CASE
WHEN row.person_type = 'person' THEN CONCAT(p.first_name,' ',p.surname)
WHEN row.person_type = 'company' THEN c.name
END AS `name`
FROM some_table row
LEFT
JOIN person p
ON row.person_type = 'person'
AND p.id = row.person_id
LEFT
JOIN company c
ON row.person_type = 'company'
AND c.id = row.person_id

A view makes your SQL shorter. That's all.
It's a common misconception for MySQL users that views store anything. They don't (at least not in MySQL). They're more like an alias or a macro. Querying the view is most often just like running the query in the "expanded" form. Querying a view twice in one query (as in the join example you mentioned) doesn't take any advantage of the view -- it will run the query twice.
In fact, view can cause worse performance, depending on the query and how you use them, because they may need to store the result in a temporary table every time you query them.
See http://dev.mysql.com/doc/refman/5.6/en/view-algorithms.html for more details on when a view uses the temptable algorithm.
On the other hand, UNION queries also create temporary tables as they accumulate their results. So you're stuck with the cost of a temp table anyway.

SELECT with JOIN where joined row is NULL

I am trying to select rows from a table which don't have a correspondence in the other table.
For this purpose, I'm currently using LEFT JOIN and WHERE joined_table.any_column IS NULL, but I don't think that's the fastest way.
SELECT * FROM main_table mt LEFT JOIN joined_table jt ON mt.foreign_id=jt.id WHERE jt.id IS NULL
This query works, but as I said, I'm looking for a faster alternative.

Your query is a standard query for this:
SELECT *
FROM main_table mt LEFT JOIN
joined_table jt
ON mt.foreign_id=jt.id
WHERE jt.id IS NULL;
You can try this as well:
SELECT mt.*
FROM main_table mt
WHERE not exists (select 1 from joined_table jt where mt.foreign_id = jt.id);
In some versions of MySQL, it might produce a better execution plan.

In my experience with MSSQL the syntax used (usually) produces the exact same query plan as the WHERE NOT EXISTS() syntax, however this is mysql, so I can't be sure about performance!!
That said, I'm a much bigger fan of using the WHERE NOT EXISTS() syntax for the following reasons :
it's easier to read. If you speak a bit of English anyone can deduce the meaning of the query
it's more foolproof, I've seen people test for NULL on a NULL-able field
it can't have side effects like 'doubled-records' due to the JOIN. If the referenced field is unique there is no problem, but again I've seen situations where people chose 'insufficient keys' causing the main-table to get multiple hits against the joined table... and off course they solved it again using DISTINCT (aarrgg!!! =)
As for performance, make sure to have a (unique) index on the referenced field(s) and if possible put a FK-relationship between both tables. Query-wise I doubt you can squeeze much more out of it.
My 2 cents.

The query that you are running is usually the fastest option, just make sure that you have an index forh both mt.foreign_id and jt.id.
You mentioned that this query is more complex, so it might be possible that the problem is in another part of the query. You should check the execution plan to see what is wrong and fix it.

TSQL Multiple IN queries bad performance

Why queries like
delete from A where F1 IN (1,2,3,5,5) and F2 IN (7,9,10,11)
are so slow (F1 and F2 are indexed, stats updated) and how do you
optimize them?

Given your example, I'm not sure there's anything you could do to increase performance.
However, your example is simplistic, and if instead your example were using subqueries in the IN statements, then it would probably have room for improvement, perhaps by using an EXISTS instead or just joining. I think the meat of this question is probably about performance issues with IN statements though, right?
Your best tool when considering performance is to examine the explain plans of different solutions and see which one makes most sense for the amount and types of data you expect.
This SO post explains some about how an IN statement works...
SQL Server IN vs. EXISTS Performance
Here's a blog that also discusses performance factors...
http://sqlknowledgebank.blogspot.com/2012/11/in-exists-clause-and-their-performance.html

By guess is a dual loop
My sample is a select
It is a lot faster to optimize a select first
With a join on a PK the query optimizer has more to work with
But with the PK you cannot insert 5 twice
create table #tempF1 (ID int primary key);
insert into #tempF1 values (1),(2),(3),(4);
create table #tempF2 (ID int primary key);
insert into #tempF2 values (1),(2),(3),(5);
select *
from tbl
inner merge join #tempF1
on tbl.F1 = #tempF1.ID
inner merge join #tempF2
on tbl.F1 = #tempF2.ID
May not work in your situation and test other join hints and no hint
I use this technique on some big tables with complex queries where the query optimizer got stupid

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008