MySQL query caching of inner query

I have a large query with many nested SELECT statements. A simplified version might look like this:
SELECT * FROM tableA WHERE x IN (
    SELECT x FROM tableB WHERE x IN (
        SELECT id FROM tableC WHERE user_id = y
    )
)
Crucially, the innermost statement starts off by looking at the user_id and selecting a list of id numbers to use in the rest of the query.
The problem I'm having is that even if two users have the same data in tableC, the rest of the query doesn't seem to be cached.
For example if SELECT * FROM tableC WHERE user_id = 1 returns (1,2,3,4,5)
and SELECT * FROM tableC WHERE user_id = 2 also returns (1,2,3,4,5)
If I run the full query with user_id = 1, the execution time is about 0.007 seconds. If I re-run it, the time drops to 0.002 seconds. If I then change the user_id to 2 and run the query, the execution time goes back to 0.007 seconds for the first run. Is it possible for MySQL to cache the results of the individual parts of a query?

What you are seeing is the MySQL query cache at work. When you run SELECT * FROM tableC WHERE user_id = 1 for the first time, the server computes the result (1,2,3,4,5) and stores the full query text together with its result in the query cache. That is why the second execution is faster: the cached result is keyed to the exact text of the first query.
When you run the query with user_id = 2, the server has never seen that exact query string before, so it executes it from scratch and returns the result (which in your case happens to be identical). The next time you run it, it will be served from the query cache and will be significantly faster. Either way, the server stores two separate entries in the query cache: caching is keyed on the full query text, not on the results of individual subqueries, so the inner parts of a query are never cached on their own.
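A minimal way to observe this, assuming MySQL 5.x with the query cache enabled (the feature was removed entirely in MySQL 8.0):
-- Check the cache is on, then watch the hit counter around the queries.
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache_hits';
SELECT * FROM tableC WHERE user_id = 1;  -- first run: computed, then cached
SELECT * FROM tableC WHERE user_id = 1;  -- identical text: served from the cache
SELECT * FROM tableC WHERE user_id = 2;  -- different text: computed from scratch
SHOW STATUS LIKE 'Qcache_hits';          -- incremented only by the repeated query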

Related

MySQL Using IN() SubQuery Creates Much Longer Execution Time

What is the difference between the following queries? The first takes 0.00 seconds to execute, the second takes 0.00 seconds, and the third takes 0.71 seconds. For some reason, when I put the two queries together in example 3, it takes much longer to execute. Table traffic has an index on shortcode and ip; table redirect has an index on campaign and pubid.
Is there another type of index that could speed this scenario up?
Query 1: (0.00 Execution)
SELECT * FROM traffic
WHERE shortcode IN ('CODE1')
GROUP BY ip
Query 2: (0.00 Execution)
SELECT shortcode FROM redirect WHERE campaign = '385' AND pubid = '35'
Query 3: (0.71 Execution)
SELECT * FROM traffic
WHERE shortcode IN
(SELECT shortcode FROM redirect WHERE campaign = '385' AND pubid = '35')
GROUP BY ip
In older versions of MySQL, the IN ( SELECT ... ) construct was very poorly optimized: for every row in traffic it would re-execute the subquery. What version of MySQL are you using? The simple and efficient solution is to turn it into a JOIN.
SELECT t.*
FROM traffic AS t
JOIN redirect AS r USING(shortcode)
WHERE r.campaign = '385'
  AND r.pubid = '35'
GROUP BY ip
You also need INDEX(campaign, pubid, shortcode) on redirect.
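A hedged sketch of adding that index (the index name is arbitrary; the table and column names come from the question):
-- campaign and pubid satisfy the WHERE clause; including shortcode makes
-- the index covering, so the join can be resolved from the index alone.
ALTER TABLE redirect
  ADD INDEX idx_campaign_pubid_shortcode (campaign, pubid, shortcode);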
There is a "bug" in the either query -- You are asking for all columns, but grouping by only ip. If the rest of the columns are not really dependent on ip.

Count query time in Laravel?

I want to get the first and last rows in a table, and I found two ways to do it:
$first = DB::table('shops')->first();
$last = DB::table('shops')->orderBy('id','DESC')->first();
And:
$shops = DB::table('shops')->get();
$first2 = $shops[0];
$last2 = $shops[count($shops)-1];
My question is: which way performs faster on the same DB? And is there any way to log query time?
The DB may be large: 1,000 rows, 10,000 rows, etc.
For about 8000 rows, here are the results:
The first way:
[2016-03-01 19:14:11] local.DEBUG: select * from `shops` limit 1; in 1.27 ms
[2016-03-01 19:14:11] local.DEBUG: select * from `shops` order by `id` desc limit 1; in 3.04 ms
The second way:
local.DEBUG: select * from `shops`; in 188.98 ms
You can see that the second way is far slower than the first.
That is because the second way has to fetch every record from the shops table, which takes a lot of time.
For a bigger data set, the second way may not work at all, because the request will time out.
Update:
Just as another experiment, I tried a third way to solve the problem in one query, like the following:
$shops = DB::table('shops')
    ->whereRaw('id = (SELECT MIN(id) FROM shops)')
    ->orWhereRaw('id = (SELECT MAX(id) FROM shops)')
    ->get();
I compared it with the first way, and here is the result:
# the 3rd way
[2016-03-01 19:51:56] local.DEBUG: select * from `shops` where id = (SELECT MIN(id) from shops) or id = (Select MAX(id) from shops); in 1.04 ms
# the 1st way
[2016-03-01 19:52:02] local.DEBUG: select * from `shops` limit 1; in 0.67 ms
[2016-03-01 19:52:02] local.DEBUG: select * from `shops` order by `id` desc limit 1; in 0.5 ms
It seems that the single query with subqueries is slightly faster than the two separate queries combined.
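If the goal is a single round trip, a plain-SQL alternative (a sketch, assuming shops has an auto-increment primary key id) is to combine two index-backed lookups with UNION ALL:
-- MySQL allows ORDER BY / LIMIT inside parenthesized UNION branches;
-- each branch is resolved directly from the primary key index.
(SELECT * FROM shops ORDER BY id ASC LIMIT 1)
UNION ALL
(SELECT * FROM shops ORDER BY id DESC LIMIT 1);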
The difference between the timing on the queries will really come down to how quickly your machine can process them, and of course the number of rows it is working with.
I can tell you now, though, that the first solution is only grabbing one result in each query and so will be much quicker. You are fetching only two rows with two queries.
The second is loading all of your rows into an array and then grabbing the first and last; with 1,000 to 10,000 rows this could take a fair bit of time.
You could always try running the queries manually to see just how long they take.
You can get the exact SQL that would be executed by calling toSql() at the end of the query builder chain instead of get() or first(), e.g. DB::table('shops')->orderBy('id', 'DESC')->toSql().
As I commented: the first example will be way faster, because you're only getting two records. The last example fetches all the records, which in your case might be thousands. That's a huge difference in page loading time.
If you want to see how long your page load takes, check out barryvdh/laravel-debugbar, which gives great information about all sorts of things, including page loading time.

Slow MySQL UPDATE with SUM

I have a query like this:
UPDATE linksupload as lu SET lu.progress = (SELECT SUM(l.submitted)
FROM links l
WHERE l.upload_id = lu.id)
It takes 10 seconds to execute. linksupload contains 10 rows; links contains 700k rows.
Query:
UPDATE linksupload as lu SET lu.progress = (SELECT count(*)
FROM links l
WHERE l.upload_id = lu.id)
takes 0.0003 seconds to execute. A standalone SELECT with SUM and GROUP BY (like the subquery in the first statement) is also fast. upload_id and id are indexed. Why does the first query take so long to execute, and how can I speed it up?
Indexes let the database find data fast without reading the whole table.
The second query only needs to count matching rows, which can be answered from the index on upload_id alone, so it never touches the table rows. The first query has to read the submitted value for every matching row, and that column is not in the index, so the server must fetch the table rows themselves.
That is why the first query is slow.
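A hedged fix, assuming the table and column names from the question: extend the index so it also covers the submitted column, letting MySQL compute the SUM from the index alone.
-- A composite ("covering") index: upload_id for the lookup, submitted so
-- SUM(submitted) can be computed without fetching the table rows.
ALTER TABLE links ADD INDEX idx_upload_submitted (upload_id, submitted);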

Finding total rows for a query which uses LIMIT

I have a function which returns X rows, where X is a user-selected parameter. I know I can use SQL_CALC_FOUND_ROWS in the query, but I then have to run SELECT FOUND_ROWS() immediately after. If I put SELECT FOUND_ROWS() after the main query inside my cfquery tag, only the total-rows value is returned, not my data. If I use it in another cfquery, it is possible that another query runs on the MySQL connection in between and my result is no longer available. What would be the better way to handle this?
OK, so I implemented it by wrapping my queries in a cftransaction. Please note: I resort to this method only when my primary query is huge and/or re-running it as a subquery to get the total record count is not an option. The MySQL documentation states that SELECT FOUND_ROWS() must run immediately after the query containing SQL_CALC_FOUND_ROWS, on the same connection. If I run it in the same cfquery tag, I cannot access the data from my primary query. If I run it in another cfquery, there is a risk that the connection will be returned to the pool in between. @Leigh mentioned that running both queries within a cftransaction ensures that the connection is retained.
Not the best solution, but much better than running a huge query twice.
<cftransaction>
    <cfquery name="qry1" datasource="dsn">
        select SQL_CALC_FOUND_ROWS col1, col2
        from someTable
        some complex joins
        where a = b
    </cfquery>
    <cfquery name="qry2" datasource="dsn">
        select FOUND_ROWS() as TotalRows;
    </cfquery>
</cftransaction>
This answer depends very much on the complexity and time cost of your query, but in a simple case you can just build the count into the initial query as a subselect. If you have a table of countries, for example, you can do the following to get the top 10 rows along with the total count:
select
    country_name,
    (select COUNT(*) from country) as total
from country
limit 10
You could just run a query-of-queries on the query object that's returned: in a second cfquery, set dbtype="query" and use the name of the first query as the table, and you can run SQL against it. You should be able to do a SELECT COUNT(*) to count the rows returned by the first query.
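A sketch of that query-of-queries, assuming the first query is named qry1 as in the answer above; this SQL runs inside a <cfquery dbtype="query"> tag against the in-memory result set, not against MySQL:
-- Counts the rows held in the qry1 result object.
SELECT COUNT(*) AS TotalRows FROM qry1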

MySQL ORDER BY after GROUP BY causes very slow query

I can't figure out why the ORDER BY clause in the second query below causes it to take over a minute, while the first one returns results instantly. Is there a better way to do this ORDER BY?
Fast:
select c.id, max(date(a.sent)) as sent,
if(c.id in (select id from bin where (num=1 or num=2)),1,0) as done
from test c, test2 a
where c.id=a.id
group by c.id
limit 1;
Slow:
select c.id, max(date(a.sent)) as sent,
if(c.id in (select id from bin where (num=1 or num=2)),1,0) as done
from test c, test2 a
where c.id=a.id
group by c.id
order by done, sent
limit 1;
It's because the "columns" in the ORDER BY clause are not real columns but aliases for calculations made elsewhere in the query. They aren't indexed, so the server has to compute and sort them on the fly. Using a JOIN for the calculation of done, rather than a subquery, would likely speed this up a lot.
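A sketch of that JOIN rewrite, using the table and column names from the question (treat it as a starting point rather than a drop-in replacement):
-- LEFT JOIN bin once and flag matches instead of running the IN()
-- subquery per row; MAX(... IS NOT NULL) is 1 if any bin row matched.
SELECT c.id,
       MAX(DATE(a.sent)) AS sent,
       MAX(b.id IS NOT NULL) AS done
FROM test c
JOIN test2 a ON a.id = c.id
LEFT JOIN bin b ON b.id = c.id AND b.num IN (1, 2)
GROUP BY c.id
ORDER BY done, sent
LIMIT 1;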
If you were bringing back all records, the sorting should not take much time, even though the sort keys are computed, non-indexed fields. However, you are using LIMIT 1, and that changes the optimizer's approach.
In the first case, you are grouping (and thus implicitly ordering) by an ID. Since you have LIMIT 1 and the ID probably has an index, the optimizer can walk the index ID by ID and return as soon as it completes one group that matches the WHERE clause.
In the second query, however, even though you only want one record, the optimizer does not know which one that will be until it has computed the entire result set (as if there were no LIMIT 1) and sorted it, returning only the first row.
Take off the LIMIT 1 and compare the two queries. If the difference remains, it may be a different problem.
It is difficult to say what would work best with your volumes. Try this query:
select id, max(date(sent)) as sent, 0 As done
from test2
where exists (select 1 from bin where bin.id=test2.id and num not in (1,2))
group by id
union all
select id, max(date(sent)) as sent, 1 As done
from test2
where exists (select 1 from bin where bin.id=test2.id and num in (1,2))
group by id
order by done, sent
limit 1
SQL Fiddle is here if you want to tweak it.
I left out the test table because you were not bringing back any field except id, which is already on test2. If you need other fields from test, you will have to tweak it.
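For example, a sketch of the first branch with test joined back in (t.name is a hypothetical stand-in for whatever extra columns you need from test):
-- join test back in and group by the extra columns as well
select t.id, t.name, max(date(a.sent)) as sent, 0 as done
from test t
join test2 a on a.id = t.id
where exists (select 1 from bin where bin.id = t.id and num not in (1,2))
group by t.id, t.name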