speed select vs join - mysql

The two queries below do the same thing: basically, show all the ids of table1 that are present in table2. What puzzles me is that the simple SELECT is way, way faster than the JOIN. I would have expected the JOIN to be a bit slower, but not by that much: 5 seconds vs. 0.2.
Can anyone elaborate on this?
SELECT table1.id
FROM table1, table2
WHERE table1.id = table2.id
Duration/Fetch 0.295/0.028 (MySQL Workbench 5.2.47)
SELECT table1.id
FROM table1
INNER JOIN table2
ON table1.id=table2.id
Duration/Fetch 5.035/0.027 (MySQL Workbench 5.2.47)

Q: Can anyone elaborate on this?
A: Before we go the "a bug in MySQL" route that @a_horse_with_no_name seems impatient to race down, we'd really need to ensure that this is repeatable behavior and isn't just a quirk.
And to do that, we'd really need to see the elapsed time result from more than one run of the query.
If the query cache is enabled on the server, we want to run the queries with the SQL_NO_CACHE hint added (SELECT SQL_NO_CACHE table1.id ...) so we know we aren't retrieving cached results.
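For example, both test queries from the question with the hint added:
SELECT SQL_NO_CACHE table1.id
FROM table1, table2
WHERE table1.id = table2.id;

SELECT SQL_NO_CACHE table1.id
FROM table1
INNER JOIN table2 ON table1.id = table2.id;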
I'd repeat the execution of each query at least three times, throw out the result from the first run, and average the other runs. (The purpose of this is to eliminate the impact of the table data not being in the cache, either the InnoDB buffer pool or the filesystem cache.)
Also, run an EXPLAIN SELECT ... for each query, and compare the access plans.
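With the queries from the question, that's:
EXPLAIN SELECT table1.id FROM table1, table2 WHERE table1.id = table2.id;
EXPLAIN SELECT table1.id FROM table1 INNER JOIN table2 ON table1.id = table2.id;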
If either of these tables is MyISAM storage engine, note that MyISAM tables are subject to locking by DML operations; while an INSERT, UPDATE or DELETE operation is run on the table, the SELECT statements will be blocked from accessing the table. (But five seconds seems a bit much for that, unless these are really large tables, or really inefficient DML statements).
With InnoDB, the SELECT queries won't be blocked by DML operations.
Elapsed time is also going to depend on what else is going on on the system.
But the total elapsed time is going to include more than just the time in the MySQL server. Temporarily turning on the MySQL general_log would allow you to capture the statements that are actually being processed by the server.
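A minimal way to do that from a privileged session (general_log is a standard server variable; by default it logs to a file):
SET GLOBAL general_log = 'ON';
-- run the two test queries from Workbench, then:
SET GLOBAL general_log = 'OFF';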

This looks like something that could be further optimized by the database engine if indeed you are running both queries under the exact same context.
SQL is declarative. By successfully declaring what you want, the engine has free rein to restructure the "how" of your request to bring back the fastest result.
The earliest versions of SQL didn't even have the keyword JOIN. There was only the comma.
There are many coding constructs in SQL that imperatively force a single inferior methodology over another, and they should be avoided. JOIN shouldn't be avoided. Something sounds amiss. JOIN is the core element of SQL. It would be a shame to always have to use commas.
There are a zillion factors that go into the performance of a JOIN, all based on your environment, schema, and data. Chances are that your table1 and table2 represent a fringe case that may have gotten past the optimization algorithms.

The SQL_NO_CACHE hint worked; the new results are:
Duration/Fetch 5.065/0.027 for the select-where and
Duration/Fetch 5.050/0.027 for the join
I would have thought that the select-where would be faster, but the join was actually a tad swifter. Either way, the difference is negligible.
I would like to thank everyone for their responses.

Related

MySQL count(*) super slow with concurrent queries

I have a script that tries to read all the rows from a table like this:
select count(*) from `table` where col1 = 'Y' or col1 is null;
col1 and col2 are not indexed, and this query usually takes ~20 seconds, but if someone is already running it, it takes ages and gets blocked.
We only have around 100k rows in the table, and I tried it without the WHERE clause and it causes the same issue.
The table uses InnoDB, so it doesn't store the exact count, but I am curious whether there is any concurrency parameter I should look into. I am not sure if the absence of indexes on the table causes the issue, but it doesn't make sense to me.
Thanks!
If they are not indexed, MySQL is required to read the entire disk files of your tables to find your data. A single hard disk cannot handle concurrent read-intensive operations very well. You have to index.
It looks like your SELECT COUNT(*)... query is being serialized with other operations on your table. Unless you tell the MySQL server otherwise, your query will do its best to be very precise.
Try changing the transaction isolation level by issuing this command immediately before your query.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Setting this enables so-called dirty reads, which means you might not count everything in the table that changes during your operation. But that probably will not foul up your application too badly.
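For example, with the query from the question (backticks added because table is a reserved word; substitute your real table name):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM `table` WHERE col1 = 'Y' OR col1 IS NULL;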
(Adding appropriate indexes is always a good idea, but not the cause of the problem you ask about.)

MySQL JOIN Keeps Timing Out

I am currently trying to run a JOIN between two tables in a local MySQL database and it's not working. Below is the query; I am even limiting it to 10 rows just to run a test. After running this query for 15-20 minutes, it gives me "Error Code: 2013. Lost connection to MySQL server during query". My computer is not going to sleep, and I'm not doing anything to interrupt the connection.
SELECT rd_allid.CreateDate, rd_allid.SrceId, adobe.Date, adobe.Id
FROM rd_allid JOIN adobe
ON rd_allid.SrceId = adobe.Id
LIMIT 10
The rd_allid table has 17 million rows of data and the adobe table has 10 million. I know this is a lot, but I have a strong computer. My processor is an i7 6700 3.4GHz and I have 32GB of RAM. I'm also running this on a solid-state drive.
Any ideas why I cannot run this query?
"Why I cannot run this query?"
There's not enough information to determine definitively what is happening. We can only make guesses and speculations. And offer some suggestions.
I suspect MySQL is attempting to materialize the entire resultset before the LIMIT 10 clause is applied. For this query, there's no optimization for the LIMIT clause.
And we might guess that there is not a suitable index for the JOIN operation, which is causing MySQL to perform a nested loops join.
We also suspect that MySQL is encountering some resource limitation which is causing the session to be terminated. Possibly it is filling up all the space in /tmp (that usually throws an error, something like "invalid/corrupted myisam table '#tmpNNN'", something of that ilk). Or it could be some other resource constraint. Without doing an analysis, we're just guessing.
It's possible MySQL wrote something to the error log (hostname.err). I'd check there.
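To find where the error log lives (log_error is a standard server variable):
SHOW VARIABLES LIKE 'log_error';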
But whatever condition MySQL is running into (the answer to the question "Why I cannot run this query"), I'm seriously questioning the purpose of the query itself. Why is that query being run? Why is returning that particular resultset important?
There are several possible queries we could execute. Some of those will run a long time, and some will be much more performant.
One of the best ways to investigate query performance is to use MySQL EXPLAIN. That will show us the query execution plan, revealing the operations MySQL will perform, in what order, and which indexes will be used.
We can make some suggestions as to possible indexes to add, based on the query shown, e.g. on adobe (Id, Date).
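As a sketch, that index could be created like this (the index name is made up; the columns come from the query shown):
CREATE INDEX idx_adobe_id_date ON adobe (Id, Date);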
And we can make some suggestions about modifications to the query (e.g. adding a WHERE clause, using a LEFT JOIN, incorporating inline views, etc.). But we don't have enough of a specification to recommend a suitable alternative.
You can try something like:
SELECT rd_allidT.CreateDate, rd_allidT.SrceId, adobeT.Date, adobeT.Id
FROM
(SELECT CreateDate, SrceId FROM rd_allid ORDER BY SrceId LIMIT 1000) rd_allidT
INNER JOIN
(SELECT Id, Date FROM adobe ORDER BY Id LIMIT 1000) adobeT ON adobeT.Id = rd_allidT.SrceId;
This may help you get faster response times.
Also, if you are not interested in the whole relation, you can add WHERE clauses inside the derived tables; those filters are applied before the INNER JOIN, which makes the query faster still.
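For instance, a sketch with a hypothetical date filter pushed into the first derived table (the cutoff value is made up; adjust it to your data):
SELECT rd_allidT.CreateDate, rd_allidT.SrceId, adobeT.Id
FROM
(SELECT CreateDate, SrceId FROM rd_allid WHERE CreateDate >= '2017-01-01' ORDER BY SrceId LIMIT 1000) rd_allidT
INNER JOIN
(SELECT Id FROM adobe ORDER BY Id LIMIT 1000) adobeT ON adobeT.Id = rd_allidT.SrceId;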

Improve this update query

So I've been struggling to run this query; it takes a really long time.
It's MySQL InnoDB. The fields I am using are indexed. It's on a pretty beefy server with around 10 GB allocated to the InnoDB buffer pool.
UPDATE TEMP_account_product p
JOIN products_temp c ON (c.`some_id` = p.`old_someid`)
SET p.`product` = c.id
WHERE p.product IS NULL;
The thing to note here is that both tables contain around 900,000 rows, and the filter (WHERE p.product IS NULL) matches around 800,000 records.
I have a feeling I'm kinda screwed here but thought Id try anyway.
I think the possible reasons for slow execution of this type of request are:
Most probable: you have an index (or indexes) on the updated field, and the request is updating a lot of rows. In that case MySQL needs to do a lot of work rebuilding those indexes during the UPDATE. Just DROP the index(es) before the request, and recreate them later (if needed).
The JOIN is slow (you can check it by running a SELECT with that JOIN), i.e. the join is done without indexes. Add indexes in that case.
Slow filtering in the WHERE clause (i.e. MySQL does a full scan to filter); you can check how fast it is with a SELECT using the same filter.
I suggest running it in batches, so that you don't need to rely on the query plan to decide not to bring the entire result set into memory before it starts doing the updates. Add something like LIMIT 1000 to the query, and then run it until the number of affected rows is zero (the technique depends on your environment, but I think it could be done in SQL).
UPDATE: this is not a valid option (as-is).
Sure enough, I overlooked this in the UPDATE docs:
For the multiple-table syntax ... In this case, ORDER BY and LIMIT cannot be used.
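A workable alternative is to batch by key range instead of LIMIT. This sketch assumes TEMP_account_product has an integer primary key named id (an assumption, not from the question); run it repeatedly, advancing the range, until every row has been covered:
UPDATE TEMP_account_product p
JOIN products_temp c ON (c.`some_id` = p.`old_someid`)
SET p.`product` = c.id
WHERE p.product IS NULL
AND p.id BETWEEN 1 AND 10000; -- advance this range on each run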

Determining which columns to index in MySQL in CakePHP

I have a web app, which has quite a few queries being fired from every page. As more data was added to the DB, we noticed that the pages were taking longer and longer to load.
On examining phpMyAdmin -> Status -> Joins, we noticed this (with the number in red):
Select_full_join 348.6k: The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
How do I determine which joins are causing the problems? Are all the joins equally to blame?
How do I determine which columns should be indexed, for the performance to be proper?
We are using CakePHP + MySQL, and the queries are all auto-generated.
The rule of thumb that I have always used is that if I am using a join, the fields that I am joining on need to be indexed.
For instance, if you have a query like the following:
SELECT t1.name, t2.salary
FROM employee AS t1
INNER JOIN info AS t2 ON t1.name = t2.name;
Both t1.name and t2.name should be indexed.
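In MySQL that could look like this (the index names are made up):
ALTER TABLE employee ADD INDEX idx_employee_name (name);
ALTER TABLE info ADD INDEX idx_info_name (name);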
Below are some good reads for this as well:
Optimizing MySQL: Importance of JOIN Order
How to optimize MySQL JOIN queries through indexing
And in general, this guy's site has some good info as well.
MySQL Optimizer Team
Edit: This is always helpful.
And if you have access to your server settings, check out:
MySQL Slow Server Logs
Once you have a log of slow queries, you can use explain on them to see what needs indexing.
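If you have server access, the slow query log can be turned on at runtime with standard server variables (the 1-second threshold here is just an example):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- seconds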
If you don't know which queries are running inefficiently, you have a couple of choices.
You could try this:
Try issuing the command SHOW FULL PROCESSLIST from phpMyAdmin while your web site is active. It will show you, hopefully, a bunch of slow-running queries. The FULL processlist should give you the entire query. You could then use the EXPLAIN command to figure out what it's doing.
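That is:
SHOW FULL PROCESSLIST;
-- then, for any long-running statement you spot in the Info column,
-- run EXPLAIN on that query.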
You should also try this:
Think through the work your application is doing on behalf of your users. Think through which of your queries have to romp through lots of data to deliver value to the users. Think through which tables are growing as your application gets used more and more.
Then, find your queries that deliver that value, and that access your growing tables. Again, use the EXPLAIN command to see how MySQL is processing them, and add indexes as needed.
I suspect it will be very obvious which indexes you should add. Add the obvious ones, then let your system stabilize for a couple of workdays, then remeasure.
Notice that this is a normal part of bringing a new application into production.

Does MySQL record how often it uses indices?

I've got a table (InnoDB) with a fair number of indices. It would be great to know if one (or more) of these was never actually used. I don't care as much about the disk space, but insertion speed is sometimes an issue.
Does MySQL record any statistics on how often it has used each index when running queries?
There is a way to find unused indexes with a small patch to MySQL.
I'd suggest turning on profiling and doing some bottleneck analysis.
set profiling=1;
After which point you can let some of your heavy queries run for a while. Eventually, you turn it off and then examine the queries that ran to see which were heaviest in execution time. If you don't see any, the commands below may help you dig further.
Some other commands to note are:
show profiles;
select sum(duration) from information_schema.profiling where query_id=<ID you want to look at>;
show profile for query <ID you want to look at>;
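Putting those together, a minimal profiling session might look like this (the sample query is hypothetical; use one of your own):
SET profiling = 1;
SELECT COUNT(*) FROM orders; -- hypothetical heavy query
SHOW PROFILES; -- note the Query_ID of the statement you care about
SHOW PROFILE FOR QUERY 1;
SET profiling = 0;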
Finally, if you wish to see unused indexes and you have userstats enabled:
SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME
FROM information_schema.statistics s
LEFT JOIN information_schema.index_statistics INDXS
ON (s.TABLE_SCHEMA = INDXS.TABLE_SCHEMA AND
    s.TABLE_NAME = INDXS.TABLE_NAME AND
    s.INDEX_NAME = INDXS.INDEX_NAME)
WHERE INDXS.TABLE_SCHEMA IS NULL;
AFAIK, it only records total count of index usages.
SHOW STATUS includes a Select_scan counter, for example.
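For example (these are standard status counters):
SHOW GLOBAL STATUS LIKE 'Select_scan';
SHOW GLOBAL STATUS LIKE 'Select_full_join';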
The plans for queries don't vary much: turn on the (slow) query logging with a threshold of 0 seconds, then write a bit of code in the language of your choice to parse the log files and strip all the literals out of the queries, then do explain plans for any query executed more than 'N' times.
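The mysqldumpslow utility that ships with MySQL already does the literal stripping and grouping; the log path here is an assumption:
mysqldumpslow -s c /var/log/mysql/mysql-slow.log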
C.