Merging SQL statements and how it can affect processing time


Let's assume I have the following tables:
items: item_id | view_count
item_views: view_id | item_id | ip_address | last_view
What I would like to do is this: if the last view of the item with a given item_id by a given ip_address was more than an hour ago, increment that item's view_count in the items table, and then get the item's view count as the result. Here is how I would normally do it:
q = SELECT count(*) FROM item_views WHERE item_id='item_id' AND ip_address='some_ip' AND last_view < current_time-60*60
if(q==1) then q = UPDATE items SET view_count = view_count+1 WHERE item_id='item_id'
//and finally get view_count of item
q = SELECT view_count FROM items WHERE item_id='item_id'
Here I used 3 SQL queries. How can I merge them into a single SQL query? And how would that affect processing time? Would it be faster or slower than the previous method?

I don't think your logic matches what you describe. The query:
SELECT count(*)
FROM item_views
WHERE item_id='item_id' AND
ip_address='some_ip' AND
last_view < current_time-60*60
counts the views that are older than your time window. I think you want:
last_view > current_time-60*60
and then test if (q == 0) on the next line.
MySQL handles NOT EXISTS efficiently, so the following should work well:
update items
set view_count = view_count+1
WHERE item_id='item_id' and
not exists (select 1
from item_views
where item_id='item_id' AND
ip_address='some_ip' AND
last_view > current_time-60*60
)
It will work much better with an index on item_views(item_id, ip_address, last_view) and an index on items(item_id).
In MySQL scripting, you could then write:
. . .
set view_count = (@q := view_count+1)
. . .
This would also give you the variable you are looking for.
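To check the NOT EXISTS logic end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; SQLite's syntax differs in places). It assumes last_view holds Unix timestamps in seconds, and the helper names record_view and make_demo_db are hypothetical. SQLite has no @q user variables, so the sketch reads view_count back with a second SELECT.

```python
import sqlite3


def record_view(conn, item_id, ip_address, now):
    """Increment items.view_count unless ip_address viewed item_id within the
    last hour, then return the current view_count.

    Assumption: last_view is stored as Unix seconds. Like the answer, this does
    not insert or update the item_views row itself.
    """
    conn.execute(
        """
        UPDATE items
        SET view_count = view_count + 1
        WHERE item_id = ?
          AND NOT EXISTS (SELECT 1
                          FROM item_views
                          WHERE item_views.item_id = ?
                            AND item_views.ip_address = ?
                            AND item_views.last_view > ?)
        """,
        (item_id, item_id, ip_address, now - 60 * 60),
    )
    # SQLite has no MySQL-style @q variable, so read the count back.
    row = conn.execute(
        "SELECT view_count FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0]


def make_demo_db():
    """Hypothetical toy data: item 1 has 10 views, last seen by 10.0.0.1 at t=1000."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, view_count INTEGER NOT NULL);
        CREATE TABLE item_views (view_id INTEGER PRIMARY KEY, item_id INTEGER,
                                 ip_address TEXT, last_view INTEGER);
        -- the index suggested in the answer
        CREATE INDEX ix_item_views ON item_views (item_id, ip_address, last_view);
        INSERT INTO items VALUES (1, 10);
        INSERT INTO item_views (item_id, ip_address, last_view) VALUES (1, '10.0.0.1', 1000);
        """
    )
    return conn


conn = make_demo_db()
print(record_view(conn, 1, "10.0.0.1", now=1000 + 600))  # viewed 10 minutes ago: 10
print(record_view(conn, 1, "10.0.0.2", now=1000 + 600))  # no recent view: 11
```

Because the condition lives inside the UPDATE, the check and the increment happen in one statement, so only the final read remains a separate query.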

update items target
inner join (
select item_id
from item_views
where item_id = 'item_id'
and ip_address = 'some_ip'
and last_view < current_time - 60*60
) ref
on ref.item_id = target.item_id
set target.view_count = target.view_count + 1;
You can only combine the UPDATE with the condition by using a join, as in the above example, but you'll still need a separate SELECT statement to read view_count.
It may be slower on a very large and/or unindexed table.

Related

mysql is scanning table despite index

I have the following MySQL query that I think should be faster. The table has 1 million records, and the query takes 3.5 seconds:
set @numberofdayssinceexpiration = 1;
set @today = DATE(now());
set @start_position = (@pagenumber-1) * @pagesize;
SELECT *
FROM (SELECT ad.id,
title,
description,
startson,
expireson,
ad.appuserid UserId,
user.email UserName,
ExpiredCount.totalcount
FROM advertisement ad
LEFT JOIN (SELECT servicetypeid,
Count(*) AS TotalCount
FROM advertisement
WHERE Datediff(@today,expireson) =
@numberofdayssinceexpiration
AND sendreminderafterexpiration = 1
GROUP BY servicetypeid) AS ExpiredCount
ON ExpiredCount.servicetypeid = ad.servicetypeid
LEFT JOIN aspnetusers user
ON user.id = ad.appuserid
WHERE Datediff(@today,expireson) = @numberofdayssinceexpiration
AND sendreminderafterexpiration = 1
ORDER BY ad.id) AS expiredAds
LIMIT 20 offset 1;
Here's the execution plan:
Here are the indexes defined on the table:
I wonder what I am doing wrong.
Thanks for any help

First, I would like to point out some problems. Then I will get to your question.
LIMIT 20 OFFSET 1 gives you 20 rows starting with the second row.
The lack of an ORDER BY in the outer query may lead to unpredictable ordering; in particular, LIMIT and OFFSET can pick whichever rows they like. Newer versions will actually throw away the ORDER BY in the subquery.
DATEDIFF(), being a function applied to the column, makes that part of the WHERE not 'sargable'; that is, it can't use an INDEX on expireson. The usual (sargable) way to express "expired exactly one day ago" is a range on the bare column, which works whether expireson is a DATE or a DATETIME:
WHERE expireson >= CURDATE() - INTERVAL 1 DAY
AND expireson < CURDATE()
Please qualify each column name. With that, I may be able to advise on optimal indexes.
Please provide SHOW CREATE TABLE so that we can see what column(s) are in each index.
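The sargability point can be seen directly in a query plan. This is a sketch in Python's sqlite3 rather than MySQL (an assumption: SQLite's julianday() stands in for DATEDIFF(), and EXPLAIN QUERY PLAN stands in for EXPLAIN); the advertisement table here is a cut-down, hypothetical version with ISO date strings and a fixed "today".

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE advertisement (
        id INTEGER PRIMARY KEY,
        title TEXT,
        expireson TEXT  -- ISO 'YYYY-MM-DD' strings compare correctly as text
    );
    CREATE INDEX ix_expireson ON advertisement (expireson);
    """
)

today = date(2024, 1, 10)  # fixed "today" so the demo is repeatable
yesterday = today - timedelta(days=1)
conn.executemany(
    "INSERT INTO advertisement (title, expireson) VALUES (?, ?)",
    [(f"ad {i}", (today - timedelta(days=i % 5)).isoformat()) for i in range(100)],
)


def plan(sql, params):
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Column wrapped in a function (like DATEDIFF): the index cannot be used.
NON_SARGABLE = ("SELECT id, title FROM advertisement "
                "WHERE julianday(?) - julianday(expireson) = 1")
# Bare column compared against a half-open range: the index can be used.
SARGABLE = ("SELECT id, title FROM advertisement "
            "WHERE expireson >= ? AND expireson < ?")

fn_params = (today.isoformat(),)
range_params = (yesterday.isoformat(), today.isoformat())

print(plan(NON_SARGABLE, fn_params))   # a SCAN of the whole table
print(plan(SARGABLE, range_params))    # a SEARCH using ix_expireson

fn_rows = sorted(conn.execute(NON_SARGABLE, fn_params).fetchall())
range_rows = sorted(conn.execute(SARGABLE, range_params).fetchall())
print(len(fn_rows), fn_rows == range_rows)
```

Both predicates select the same rows; only the range form lets the planner seek into the index instead of evaluating the function on every row.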

SQL subquery with reference to parent table gives "Unknown column 'test_results.id' in 'where clause'"

First, take a look at the diagram (this is an application for testing students' knowledge).
I already have a working application that calculates the score (as a percentage), but to sort by score it has to select all the records of the current test. That drastically slows the app down (about 10 seconds of waiting), so I decided to move that logic into a single SQL query.
Now, my SQL query looks like this:
select test_results.*,
(
select test_result_total_score * 100 / test_result_total_max_score
from (
select (select sum(question_score)
from (
select question_total_right_answers = question_total_options as question_score
from (
select (
select count(*)
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
and answers.is_chosen = answer_options.is_right
) as question_total_right_answers,
(
select count(*)
from answers
left join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
) as question_total_options
from asked_questions
where asked_questions.test_result_id = test_results.id
) as rigt_per_question
) as questions_scores) as test_result_total_score,
(select count(*)
from asked_questions
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
) as right_per_test_result
) as result_in_percents
from test_results
where test_results.id between 1 and 200;
Here is what it should do: for each asked question, count how many answer options there are (question_total_options) and how many of them the user answered correctly (question_total_right_answers); these are the most deeply nested subqueries.
Then, for each of these results, calculate a score: 1 if the user answered every option correctly, and 0 if at least one option was answered incorrectly.
After that, sum the scores of all those questions (test_result_total_score, i.e. how many questions the user answered correctly). Also count how many questions there are in the test result (test_result_total_max_score).
With that information we can calculate the percentage of correctly answered questions (test_result_total_score * 100 / test_result_total_max_score).
The error is on lines 23 and 28 of the query:
where asked_questions.test_result_id = test_results.id
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
It says: [42S22][1054] Unknown column 'test_results.id' in 'where clause'
I have tried using a variable @test_result_id like this:
select test_results.*,
@test_result_id := test_results.id,
( ... )
where asked_questions.test_result_id = @test_result_id
where asked_questions.test_result_id = @test_result_id) as test_result_total_max_score
It evaluates, but gives the wrong result (probably because the evaluation order of the select-list expressions is undefined). In fact, every result_in_percents equals the value for the very first result.

For those facing a similar problem: it seems there is no simple solution. The error occurs because a derived table (a subquery in the FROM clause) cannot refer to columns of the outer query, so test_results.id is not visible inside it.
First, you can try to rewrite your subqueries with joins, as I did below. But once you need to aggregate results that are already grouped, you are out of luck. A "dirty" workaround is to create a function, which gets around the barrier of nested subqueries:
DELIMITER $$
create function test_result_in_percents(p_test_result_id bigint unsigned)
returns float
reads sql data
begin
-- the p_ prefix keeps the parameter distinct from the test_result_id columns
-- a question scores 1 only if every answer's is_chosen matches its option's is_right
return (
select sum(tmp.question_right) * 100 / count(*)
from (select sum(answers.is_chosen = answer_options.is_right) = count(*) as question_right
, asked_questions.test_result_id as test_result_id
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
inner join asked_questions on asked_questions.id = answers.asked_question_id
where asked_questions.test_result_id = p_test_result_id
group by answers.asked_question_id
) as tmp
group by tmp.test_result_id
);
end$$
DELIMITER ;
And then, just use this function:
select (test_result_in_percents(test_results.id)) as `result_percents`
from `test_results`
where `test_results`.`test_id` = 181
and `test_results`.`test_id` is not null
order by `test_results`.`id` desc;
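SQLite has no CREATE FUNCTION, so this sketch (Python's sqlite3, used as a stand-in for MySQL) runs the function's body with the test result id bound as a parameter, on a tiny hypothetical data set, to show the scoring rule: a question counts as right only if every answer's is_chosen matches its option's is_right. The Python helper name result_in_percents is made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asked_questions (id INTEGER PRIMARY KEY, test_result_id INTEGER);
    CREATE TABLE answer_options (id INTEGER PRIMARY KEY, is_right INTEGER);
    CREATE TABLE answers (id INTEGER PRIMARY KEY, asked_question_id INTEGER,
                          answer_option_id INTEGER, is_chosen INTEGER);

    -- test result 1: question 1 right, question 2 wrong; test result 2: one right question
    INSERT INTO asked_questions VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO answer_options VALUES (1, 1), (2, 0), (3, 1), (4, 0), (5, 1);
    INSERT INTO answers (asked_question_id, answer_option_id, is_chosen) VALUES
        (1, 1, 1), (1, 2, 0),  -- both answers match is_right
        (2, 3, 0), (2, 4, 1),  -- neither answer matches is_right
        (3, 5, 1);
    """
)

# Body of test_result_in_percents with the parameter bound via "?".
# 100.0 instead of 100 because SQLite's integer division would truncate.
PERCENT_SQL = """
SELECT SUM(tmp.question_right) * 100.0 / COUNT(*)
FROM (SELECT SUM(answers.is_chosen = answer_options.is_right) = COUNT(*) AS question_right
      FROM answers
      JOIN answer_options ON answer_options.id = answers.answer_option_id
      JOIN asked_questions ON asked_questions.id = answers.asked_question_id
      WHERE asked_questions.test_result_id = ?
      GROUP BY answers.asked_question_id) AS tmp
"""


def result_in_percents(test_result_id):
    return conn.execute(PERCENT_SQL, (test_result_id,)).fetchone()[0]


for test_result_id in (1, 2, 3):
    print(test_result_id, result_in_percents(test_result_id))
```

A test result with no asked questions (id 3 here) yields NULL rather than 0, which may matter when sorting by the score.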

Query takes more than 40 seconds to execute

This query takes more than 40 seconds to execute on a table that has 200k rows
SELECT
my_robots.*,
(
SELECT count(id)
FROM hpsi_trading
WHERE estado <= 1 and idRobot = my_robots.id
) as openorders,
apikeys.apikey,
apikeys.apisecret
FROM my_robots, apikeys
WHERE estado <= 1
and idRobot = '2'
and ready = '1'
and apikeys.id = my_robots.idApiKey
and (my_robots.id LIKE '%0'
OR my_robots.id LIKE '%1'
OR my_robots.id LIKE '%2')
I know it is because of the COUNT subquery inside the query, but how can I fix this efficiently?
Edit: Explain
Thanks.

Use GROUP BY instead
SELECT my_robots.*,
count(hpsi_trading.id) as openorders,
apikeys.apikey,
apikeys.apisecret
FROM my_robots
JOIN apikeys ON apikeys.id = my_robots.idApiKey
LEFT JOIN hpsi_trading ON hpsi_trading.idRobot = my_robots.id and hpsi_trading.estado <= 1
WHERE estado <= 1 and
idRobot = '2' and
ready = '1' and
(
my_robots.id LIKE '%0' OR
my_robots.id LIKE '%1' OR
my_robots.id LIKE '%2'
)
GROUP BY my_robots.id, apikeys.apikey, apikeys.apisecret
Use explicit JOIN syntax. Some indexes will be needed to make it run fast; however, the database structure is not clear from your post (or from your query).
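A quick way to confirm that the JOIN + GROUP BY rewrite returns the same counts as the correlated subquery is to run both on toy data. This sketch uses Python's sqlite3 as a stand-in for MySQL and drops apikeys and the other filters (assumptions made to keep it short). It also shows why the hpsi_trading estado filter belongs in the ON clause: COUNT(t.id) gives 0 for robots with no matching orders, whereas moving that filter into WHERE silently removes those robots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_robots (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hpsi_trading (id INTEGER PRIMARY KEY, idRobot INTEGER, estado INTEGER);
    CREATE INDEX hpsi_trading_index_1 ON hpsi_trading (idRobot, estado, id);

    INSERT INTO my_robots VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- robot 1: two open orders (estado <= 1); robot 2: only a closed one; robot 3: none
    INSERT INTO hpsi_trading (idRobot, estado) VALUES (1, 0), (1, 1), (1, 2), (2, 5);
    """
)

# Original shape: correlated subquery, evaluated once per robot.
CORRELATED = """
SELECT r.id,
       (SELECT COUNT(t.id) FROM hpsi_trading t
        WHERE t.estado <= 1 AND t.idRobot = r.id) AS openorders
FROM my_robots r
ORDER BY r.id
"""

# LEFT JOIN + GROUP BY, with the hpsi_trading filter kept in the ON clause.
JOINED = """
SELECT r.id, COUNT(t.id) AS openorders
FROM my_robots r
LEFT JOIN hpsi_trading t ON t.idRobot = r.id AND t.estado <= 1
GROUP BY r.id
ORDER BY r.id
"""

# Same join, but with the hpsi_trading filter moved into WHERE.
FILTER_IN_WHERE = """
SELECT r.id, COUNT(t.id) AS openorders
FROM my_robots r
LEFT JOIN hpsi_trading t ON t.idRobot = r.id
WHERE t.estado <= 1
GROUP BY r.id
ORDER BY r.id
"""

for name, sql in [("correlated", CORRELATED), ("joined", JOINED),
                  ("filter in WHERE", FILTER_IN_WHERE)]:
    print(name, conn.execute(sql).fetchall())
```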

The EXPLAIN plan shows that the most expensive part is reading the data from the hpsi_trading table.
The challenge from the database's point of view is that the query contains a correlated subquery in the SELECT clause, which needs to be executed once for each result of the outer query (after filtering).
Replacing this subquery with a JOIN + GROUP BY requires MySQL to join all of these records first (inflating the result) and only then collapse them with GROUP BY, which can also take time.
Instead, I would extract the subquery into a temporary table that is grouped as it is created, index it, and join to it. That way the subquery runs only once, using a fast covering index; the data is already grouped, and only then is it joined to the other tables.
So far, it's all pros. The con is that extracting a subquery into a temporary table may require more effort on the development side.
Please try this version and let me know if it helped (if not, please provide a fresh EXPLAIN plan screenshot):
Creating the temp table:
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS
SELECT idRobot, COUNT(id) as openorders
FROM hpsi_trading
WHERE estado <= 1
GROUP BY idRobot;
The modified query:
SELECT
my_robots.*,
temp1.openorders,
apikeys.apikey,
apikeys.apisecret
FROM
my_robots
JOIN apikeys ON apikeys.id = my_robots.idApiKey
LEFT JOIN temp1 ON temp1.idRobot = my_robots.id
WHERE
apikeys.estado <= 1 AND apikeys.idRobot = '2'
AND apikeys.ready = '1'
AND (my_robots.id LIKE '%0'
OR my_robots.id LIKE '%1'
OR my_robots.id LIKE '%2')
The indexes to add for this solution (I assumed from logic that estado, idRobot and ready are from the apikeys table. If that's not the case, let me know and I'll adjust the indexes):
ALTER TABLE `temp1` ADD INDEX `temp1_index_1` (idRobot);
ALTER TABLE `hpsi_trading` ADD INDEX `hpsi_trading_index_1` (idRobot, estado, id);
ALTER TABLE `apikeys` ADD INDEX `apikeys_index_1` (`idRobot`, `ready`, `id`, `estado`);
ALTER TABLE `my_robots` ADD INDEX `my_robots_index_1` (`idApiKey`);
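The temporary-table approach can be sketched the same way, with Python's sqlite3 standing in for MySQL and hypothetical toy data. One behavioural difference to be aware of: robots with no open orders have no row in temp1, so the LEFT JOIN returns NULL where the original correlated COUNT returned 0; wrapping the column in COALESCE restores the 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_robots (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hpsi_trading (id INTEGER PRIMARY KEY, idRobot INTEGER, estado INTEGER);
    INSERT INTO my_robots VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO hpsi_trading (idRobot, estado) VALUES (1, 0), (1, 1), (1, 2), (2, 5);

    -- Pre-aggregate once, as in the answer's temp1.
    CREATE TEMP TABLE temp1 AS
        SELECT idRobot, COUNT(id) AS openorders
        FROM hpsi_trading
        WHERE estado <= 1
        GROUP BY idRobot;
    CREATE INDEX temp.temp1_index_1 ON temp1 (idRobot);
    """
)

# Robots missing from temp1 come back with NULL openorders.
RAW = """
SELECT r.id, temp1.openorders
FROM my_robots r
LEFT JOIN temp1 ON temp1.idRobot = r.id
ORDER BY r.id
"""

# COALESCE turns the NULL back into the 0 the correlated COUNT used to return.
WITH_DEFAULT = """
SELECT r.id, COALESCE(temp1.openorders, 0) AS openorders
FROM my_robots r
LEFT JOIN temp1 ON temp1.idRobot = r.id
ORDER BY r.id
"""

print(conn.execute(RAW).fetchall())
print(conn.execute(WITH_DEFAULT).fetchall())
```

Note that temp1 is a snapshot: it does not see later changes to hpsi_trading, and because the answer uses CREATE TEMPORARY TABLE IF NOT EXISTS, a second run in the same MySQL session would leave the existing temp1 (and its old counts) in place unless it is dropped first.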

Retrieving rows that have 2 columns matching and 1 different

Below is my table called 'datapoints'. I am trying to find cases where there are different values of 'sensorValue' for the same 'timeOfReading' and 'sensorNumber'.
For example:
sensorNumber | sensorValue | timeOfReading
5 | 5 | 6
5 | 5 | 6
5 | 6 | 10   <---- same time/sensor, different value!
5 | 7 | 10   <---- same time/sensor, different value!
This should output sensorNumber: 5, timeOfReading: 10.
I understand this is a duplicate question (one of the related links is listed below for reference); however, none of the solutions work for me, because my query simply never finishes.
Below is my SQL code:
SELECT table1.sensorNumber, table1.timeOfReading
FROM datapoints table1
WHERE (SELECT COUNT(*)
FROM datapoints table2
WHERE table1.sensorNumber = table2.sensorNumber
AND table1.timeOfReading = table1.timeOfReading
AND table1.sensorValue != table2.sensorValue) > 1
AND table1.timeOfReading < 20;
Notice that I have bounded timeOfReading to values below 20. I also tried bounding both table1 and table2, but no matter what I put, the query just runs until it times out without returning any results.
The database contains about 700 MB of data, so I don't think I can run this on the entire database in a reasonable amount of time. Could this be the culprit?
If so, how can I limit my query so that the search runs efficiently? If not, what am I doing wrong?
Select rows having 2 columns equal value
EDIT:
Error Code: 2013. Lost connection to MySQL server during query 600.000 sec
When I try to run the query again I get this error unless I restart
Error Code: 2006. MySQL server has gone away 0.000 sec

You can use a self-JOIN to match related rows in the same table.
SELECT DISTINCT t1.sensorNumber, t1.timeOfReading
FROM datapoints AS t1
JOIN datapoints AS t2
ON t1.sensorNumber = t2.sensorNumber
AND t1.timeOfReading = t2.timeOfReading
AND t1.sensorValue != t2.sensorValue
WHERE t1.timeOfReading < 20
To improve performance, make sure you have a composite index on sensorNumber and timeOfReading:
CREATE INDEX ix_sn_tr on datapoints (sensorNumber, timeOfReading);
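The self-join can be tried on the rows from the question. This sketch uses Python's sqlite3 as a stand-in for MySQL and adds a hypothetical sensor 6 whose duplicate readings agree, so only sensor 5 at time 10 should come back. Note that the join compares t1.timeOfReading with t2.timeOfReading; the query in the question compares table1.timeOfReading with itself, which is always true, so its correlated subquery is not restricted to the same reading time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datapoints (sensorNumber INTEGER, sensorValue INTEGER, timeOfReading INTEGER);
    CREATE INDEX ix_sn_tr ON datapoints (sensorNumber, timeOfReading);
    -- rows from the question, plus a sensor 6 whose duplicate readings agree
    INSERT INTO datapoints VALUES (5, 5, 6), (5, 5, 6), (5, 6, 10), (5, 7, 10),
                                  (6, 1, 10), (6, 1, 10);
    """
)

SELF_JOIN = """
SELECT DISTINCT t1.sensorNumber, t1.timeOfReading
FROM datapoints AS t1
JOIN datapoints AS t2
  ON t1.sensorNumber = t2.sensorNumber
 AND t1.timeOfReading = t2.timeOfReading  -- t2 here, not t1
 AND t1.sensorValue != t2.sensorValue
WHERE t1.timeOfReading < 20
"""

print(conn.execute(SELF_JOIN).fetchall())  # [(5, 10)]
```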

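I think you have missed a condition. Add a not-equal condition as well, so that only instances with different values are returned (new_table, num, timeRead and value stand for datapoints, sensorNumber, timeOfReading and sensorValue):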
SELECT *
FROM new_table a
WHERE EXISTS (SELECT * FROM new_table b
WHERE a.num = b.num
AND a.timeRead = b.timeRead
AND a.value != b.value);

You can try this query:
select testTable.* from testTable inner join (
SELECT sensorNumber,timeOfReading
FROM testTable
group by sensorNumber , timeOfReading having Count(distinct sensorValue) > 1) t
on
t.sensorNumber = testTable.sensorNumber and t.timeOfReading = testTable.timeOfReading;

This query will return the sensorNumber and the timeOfReading where there are different values of sensorValue:
select sensorNumber, timeOfReading
from tablename
group by sensorNumber, timeOfReading
having count(distinct sensorValue)>1
and this will return the actual records:
select t.*
from
tablename t inner join (
select sensorNumber, timeOfReading
from tablename
group by sensorNumber, timeOfReading
having count(distinct sensorValue)>1
) d on t.sensorNumber=d.sensorNumber and t.timeOfReading=d.timeOfReading
I would suggest adding an index on (sensorNumber, timeOfReading):
alter table tablename add index idx_sensor_time (sensorNumber, timeOfReading)
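Here is the same toy data run through both GROUP BY queries, again with Python's sqlite3 standing in for MySQL (an assumption); the extra sensor 6 rows are hypothetical and should not be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tablename (sensorNumber INTEGER, sensorValue INTEGER, timeOfReading INTEGER);
    CREATE INDEX idx_sensor_time ON tablename (sensorNumber, timeOfReading);
    INSERT INTO tablename VALUES (5, 5, 6), (5, 5, 6), (5, 6, 10), (5, 7, 10),
                                 (6, 1, 10), (6, 1, 10);
    """
)

# (sensor, time) groups that contain more than one distinct value.
CONFLICTS = """
SELECT sensorNumber, timeOfReading
FROM tablename
GROUP BY sensorNumber, timeOfReading
HAVING COUNT(DISTINCT sensorValue) > 1
"""

# Join the conflicting groups back to fetch the underlying rows.
CONFLICT_ROWS = f"""
SELECT t.*
FROM tablename t
JOIN ({CONFLICTS}) d
  ON t.sensorNumber = d.sensorNumber AND t.timeOfReading = d.timeOfReading
ORDER BY t.sensorValue
"""

print(conn.execute(CONFLICTS).fetchall())      # [(5, 10)]
print(conn.execute(CONFLICT_ROWS).fetchall())  # [(5, 6, 10), (5, 7, 10)]
```

Unlike the pairwise self-join, the GROUP BY reads each (sensorNumber, timeOfReading) group once, which is why the index on those two columns helps it.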

Update with SELECT and group without GROUP BY

I have a table like this (MySQL 5.0.x, MyISAM):
response{id, title, status, ...} (status: 1 new, 3 multi)
I would like to update the status of all responses from new (status = 1) to multi (status = 3) when at least 20 responses have the same title.
I have this query, but it does not work:
UPDATE response SET status = 3 WHERE status = 1 AND title IN (
SELECT title FROM (
SELECT DISTINCT(r.title) FROM response r WHERE EXISTS (
SELECT 1 FROM response spam WHERE spam.title = r.title LIMIT 20, 1)
)
as u)
Please note:
I use the nested SELECT to avoid the famous "You can't specify target table 'response' for update in FROM clause" error.
I cannot use GROUP BY for performance reasons. The query cost of a solution using LIMIT is much better (although it is less readable).
EDIT:
It is possible to SELECT FROM an UPDATE target in MySQL. See the solution here.
The issue is that the selected data is totally wrong.
The only working solution I found uses GROUP BY:
UPDATE response SET status = 3
WHERE status = 1 AND title IN (SELECT title
FROM (SELECT title
FROM response
GROUP BY title
HAVING COUNT(1) >= 20)
as derived_response)
Thanks for your help! :)
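The working GROUP BY update from the EDIT can be checked on toy data. This is a sketch with Python's sqlite3 standing in for MySQL: SQLite does not raise MySQL's "can't specify target table" error, so the derived table is not strictly needed there, but it is kept to mirror the query. The titles and row counts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE response (id INTEGER PRIMARY KEY, title TEXT, status INTEGER)")
# 22 new responses sharing one title, 3 sharing another
rows = [("same title", 1)] * 22 + [("hello", 1)] * 3
conn.executemany("INSERT INTO response (title, status) VALUES (?, ?)", rows)

# The query from the EDIT; in MySQL the derived table avoids the target-table error.
conn.execute(
    """
    UPDATE response SET status = 3
    WHERE status = 1 AND title IN (SELECT title
                                   FROM (SELECT title
                                         FROM response
                                         GROUP BY title
                                         HAVING COUNT(1) >= 20)
                                   AS derived_response)
    """
)

summary = conn.execute(
    "SELECT title, status, COUNT(*) FROM response GROUP BY title, status ORDER BY title"
).fetchall()
print(summary)  # [('hello', 1, 3), ('same title', 3, 22)]
```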

MySQL doesn't like it when you try to UPDATE and SELECT from the same table in one query. It has to do with locking priorities, etc.
Here's how I would solve this problem:
SELECT CONCAT('UPDATE response SET status = 3 ',
'WHERE status = 1 AND title = ', QUOTE(title), ';') AS update_sql
FROM response
GROUP BY title
HAVING COUNT(*) >= 20;
This query produces a series of UPDATE statements, with the quoted titles that deserve to be updated embedded. Capture the result and run it as an SQL script.
I understand that GROUP BY in MySQL often incurs a temporary table, and this can be costly. But is that a deal-breaker? How frequently do you need to run this query? Besides, any other solutions are likely to require a temporary table too.
I can think of one way to solve this problem without using GROUP BY:
CREATE TEMPORARY TABLE titlecount (c INTEGER, title VARCHAR(100) PRIMARY KEY);
INSERT INTO titlecount (c, title)
SELECT 1, title FROM response
ON DUPLICATE KEY UPDATE c = c+1;
UPDATE response JOIN titlecount USING (title)
SET response.status = 3
WHERE response.status = 1 AND titlecount.c >= 20;
But this also uses a temporary table, which is why you try to avoid using GROUP BY in the first place.

I would write something straightforward like this:
UPDATE `response`, (
SELECT title, count(title) as count from `response`
WHERE status = 1
GROUP BY title
) AS tmp
SET response.status = 3
WHERE status = 1 AND response.title = tmp.title AND count >= 20;
Is using GROUP BY really that slow? The solution you tried to implement appears to query the same table again and again, so even if it worked, it would likely be much slower than GROUP BY.

This is a funny peculiarity of MySQL; I can't think of a way to do it in a single statement (with or without GROUP BY).
You could select the appropriate response rows into a temporary table first then do the update by selecting from that temp table.

You'll have to use a temporary table:
create temporary table r_update (title varchar(10));
insert r_update
select title
from response
group
by title
having count(*) < 20;
update response r
left outer
join r_update ru
on ru.title = r.title
set r.status = case when ru.title is null then 3 else 1 end
where r.status = 1;
