I have the following query, which runs slowly.
It produces a list of ACTION records with "action.typeid=1"
and also counts whether an ACTION record with "typeid=2" exists.
It is this count that slows things down: EXPLAIN reports "Using temporary" and "Using filesort".
Can you help me find out how to improve it?
EXPLAIN
SELECT
action.actionid
FROM
ACTION
INNER JOIN EVENT
ON action.eventid = event.eventid
LEFT JOIN
(SELECT
COUNT(1),
action.eventid
FROM
ACTION
WHERE (action.typeid = '2')
GROUP BY action.eventid) AS act
ON act.eventid = event.eventid
WHERE actiondate2 BETWEEN 20130601
AND 20131031
AND event.siteid = 1
AND action.typeid = 1
The following indexes exist
CREATE INDEX idx_cusid ON `event` (cusid);
CREATE INDEX idx_actiontypeid ON `action` (typeid);
CREATE INDEX idx_actioneventid ON `action` (eventid);
CREATE INDEX idx_actiondate ON `action` (actiondate2);
CREATE INDEX idx_eventsiteid ON `event` (siteid);
Sir, the requirements in the question are still unclear to me.
I am going to explain my confusion using examples.
Please take a look at a simple demo: http://sqlfiddle.com/#!2/19f52c/6
This demo contains a simplified (for the sake of clarity) database structure and queries.
The query from the question (the first query in the demo) returns the following results:
SELECT action.actionid
FROM ACTION
INNER JOIN EVENT
ON action.eventid = event.eventid
LEFT JOIN
(SELECT
COUNT(1),
action.eventid
FROM
ACTION
WHERE (action.typeid = '2')
GROUP BY action.eventid) AS act
ON act.eventid = event.eventid
WHERE -- actiondate2 BETWEEN 20130601 AND 20131031 AND
event.siteid = 1
AND action.typeid = 1
;
+ ------------- +
| actionid |
+ ------------- +
| 1 |
| 3 |
| 5 |
+ ------------- +
However, in the above query the subquery with alias ACT is simply useless.
The query executes this subquery (consuming time and server resources), then ignores its results and throws them away.
The above query is equivalent to the query below (the second query in the demo). It returns the same results as the query from the question but without the subquery, which saves time and resources, so it should perform much better:
SELECT action.actionid
FROM
ACTION
INNER JOIN EVENT
ON action.eventid = event.eventid
WHERE -- actiondate2 BETWEEN 20130601 AND 20131031 AND
event.siteid = 1
AND action.typeid = 1
;
+ ------------- +
| actionid |
+ ------------- +
| 1 |
| 3 |
| 5 |
+ ------------- +
If your intent is to optimize the query shown in your question, then simply use the query above; that is the answer to your question.
However, judging by your comments about the expected results, the query in the question is probably wrong: it doesn't give the expected results.
It is still unclear what the query should return, though. There are many possibilities; I'll show some of them below.
If you need to list every action.actionid with typeid = 1, but only those for which a record with the same eventid and typeid = 2 exists, then use the query below (the third query in the demo):
SELECT a.actionid
FROM
ACTION a
INNER JOIN EVENT e
ON a.eventid = e.eventid
WHERE -- actiondate2 BETWEEN 20130601 AND 20131031 AND
EXISTS ( SELECT 1 FROM action a1
WHERE a1.eventid = a.eventid
AND a1.typeid = 2
)
AND e.siteid = 1
AND a.typeid = 1
;
+ ------------- +
| actionid |
+ ------------- +
| 1 |
| 3 |
+ ------------- +
This query uses the EXISTS operator instead of COUNT(). If we only want to know whether some record exists, we don't need to count all of them. COUNT() must read every matching record, whereas EXISTS stops reading the table as soon as it finds the first record that meets the conditions, so EXISTS is usually faster than COUNT().
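To make the EXISTS variant easy to try without the linked demo, here is a minimal sketch that runs it through Python's built-in sqlite3 module. The rows are hypothetical sample data (not the data from the demo), SQLite stands in for MySQL, and the tables are named actions and events purely to avoid any keyword clash.

```python
import sqlite3

# Hypothetical sample data: three type-1 actions on site 1; only events 10
# and 20 also have a type-2 action.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events  (eventid INTEGER PRIMARY KEY, siteid INTEGER);
    CREATE TABLE actions (actionid INTEGER PRIMARY KEY, eventid INTEGER, typeid INTEGER);
    INSERT INTO events  VALUES (10, 1), (20, 1), (30, 1);
    INSERT INTO actions VALUES (1, 10, 1), (2, 10, 2),
                               (3, 20, 1), (4, 20, 2),
                               (5, 30, 1);
""")

# Same shape as the third query: keep type-1 actions only when a type-2
# action exists for the same event.
exists_rows = conn.execute("""
    SELECT a.actionid
    FROM actions a
    INNER JOIN events e ON a.eventid = e.eventid
    WHERE EXISTS (SELECT 1 FROM actions a1
                  WHERE a1.eventid = a.eventid AND a1.typeid = 2)
      AND e.siteid = 1
      AND a.typeid = 1
    ORDER BY a.actionid
""").fetchall()

print(exists_rows)
```

With this sample data, action 5 is dropped because its event has no type-2 action.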
If you need to list every action.actionid with typeid = 1 and also show whether corresponding records with typeid = 2 exist, then use the query below (the fourth query in the demo):
SELECT
a.actionid ,
CASE WHEN EXISTS ( SELECT 1 FROM action a1
WHERE a1.eventid = a.eventid
AND a1.typeid = 2
)
THEN 'typeid=2 EXISTS'
ELSE 'typeid=2 DOESN''T EXIST'
END typeid2_exist
FROM
ACTION a
INNER JOIN EVENT e
ON a.eventid = e.eventid
WHERE -- actiondate2 BETWEEN 20130601 AND 20131031 AND
e.siteid = 1
AND a.typeid = 1
;
+ ------------- + ---------------------- +
| actionid | typeid2_exist |
+ ------------- + ---------------------- +
| 1 | typeid=2 EXISTS |
| 3 | typeid=2 EXISTS |
| 5 | typeid=2 DOESN'T EXIST |
+ ------------- + ---------------------- +
But if you really need to count the corresponding records with typeid = 2, then this query can help (the fifth query in the demo):
SELECT
a.actionid ,
( SELECT count(*) FROM action a1
WHERE a1.eventid = a.eventid
AND a1.typeid = 2
) typeid2_count
FROM
ACTION a
INNER JOIN EVENT e
ON a.eventid = e.eventid
WHERE -- actiondate2 BETWEEN 20130601 AND 20131031 AND
e.siteid = 1
AND a.typeid = 1
;
+ ------------- + ------------------ +
| actionid | typeid2_count |
+ ------------- + ------------------ +
| 1 | 1 |
| 3 | 1 |
| 5 | 0 |
+ ------------- + ------------------ +
If none of the queries shown above meets your requirements, please show (based on the sample data in the demo) the results that the query should return. This will help someone on this forum build a proper query that meets all your requirements.
Then, once we have identified the right query that meets all expectations, we can start to optimize its performance.
I have two tables, one of which is an index (or map) that helps the other when pulling queries.
SELECT v.*
FROM smv_ v
WHERE (SELECT p.network
FROM providers p
WHERE p.provider_id = v.provider_id) = 'RUU='
AND (SELECT p.va
FROM providers p
WHERE p.provider_id = v.provider_id) = 'MjU='
LIMIT 1;
Because we do not know the name of the column that holds the main data, we need to look it up, using the provider_id which is in both tables, and then query.
I am not getting any errors, but also no data back. I have spent the past hour trying to put this on sqlfiddle, but it kept crashing, so I just wanted to check if my code is really wrong, hence the crashing?
In the above example, I am looking in the providers table for column network, where the provider_id matches, and then use that as the column on smv.
I am sure I have done this before just like this, but after trying all weekend I thought I would ask here.
Thanks in Advance.
UPDATE
Here is an example of the data:
This is the providers table. It acts as the link, so that no matter what the column is called on the smv table, we can connect them.
+---+---+---------------+---------+-------+--------+-----+-------+--------+
| | A | B | C | D | E | F | G | H |
+---+---+---------------+---------+-------+--------+-----+-------+--------+
| 1 | 1 | Home | network | batch | bs | bp | va | bex |
| 2 | 2 | Recharge | code | id | serial | pin | value | expire |
+---+---+---------------+---------+-------+--------+-----+-------+--------+
In the example above, column G means that for Recharge the smv column will be value. So that is what we would look for in our WHERE clause.
Here is the smv table:
+---+---+----+----------+---+----+---------------------+-------+----+
|   | A | B  | C        | D | E  | F                   | value | va |
+---+---+----+----------+---+----+---------------------+-------+----+
| 1 | 1 | X2 | Home     | 4 | 10 | 2016-09-26 15:20:58 |       | 7  |
| 2 | 2 | X2 | Recharge | 4 | 11 | 2016-09-26 15:20:58 | 9     |    |
+---+---+----+----------+---+----+---------------------+-------+----+
value in the same example as above would be 9, or 'RUU=' decoded.
So we do not know the name of the column until the row from smv is read; once we have it, we can look up which column name we need to get the correct information.
Hope this helps.
MORE INFO
At the point of triggering, we do not know which row holds the right data, because many of the fields would be empty. The map is there to help us query the right column and get the right row (smv grows over time depending on what's uploaded).
1) SELECT p.va FROM providers p WHERE p.network = 'Recharge' ;
2) SELECT s.* FROM smv s, providers p WHERE p.network = 'Recharge';
1) gives me the correct column I need to look up and query smv, using the above examples it would come back with "value". So I need to now look up, within the smv table, C = Recharge, and value = '9'. This should bring me back row 2 of the smv table.
So individually both 1 and 2 queries work, but I need them put together so the query is done on the database server.
Hope this gives more insight
Even More Info
From reading other posts, which are not really doing what I need, I have come up with this:
SELECT s.*
FROM (SELECT
(SELECT p.va
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS net,
(SELECT p.bex
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS bex
FROM dh_smv_providers) AS val, dh_smv_ s
WHERE s.provider_id = 'vodaphone' AND net = '20'
ORDER BY from_base64(val.bex) DESC;
The above comes back blank, but if I replace net in the WHERE clause with a column I know exists, I do get the expected results:
SELECT s.*
FROM (SELECT
(SELECT p.va
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS net,
(SELECT p.bex
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS bex
FROM dh_smv_providers) AS val, dh_smv_ s
WHERE s.provider_id = 'vodaphone' AND value = '20'
ORDER BY from_base64(val.bex) DESC;
So what am I doing wrong? Why is net not treated as the column named by the subquery result, "value"?
Thanks
SELECT
v.*,
p.network, p.va
FROM
smv_ v
INNER JOIN
providers p ON p.provider_id = v.provider_id
WHERE
p.network = 'RUU=' AND p.va = 'MjU='
LIMIT 1;
The tables talk to each other via the JOIN syntax. This completely circumvents the need (and limitations) of sub-selects.
The INNER JOIN means that only fully successful matches are returned. You may need to adjust the join type for your situation, but the SQL will return a row with all v columns where p.va = 'MjU=' and p.network = 'RUU=' and p.provider_id = v.provider_id.
What I was trying to explain in the comments is that the subqueries in your WHERE clause are evaluated separately for each row of the outer query:
SELECT *
FROM a
WHERE (SELECT * FROM b WHERE a)
AND (SELECT * FROM c WHERE a OR b)
In this layout (as you have in your question), b and c are correlated subqueries: each one can reference the current row of a and is re-evaluated for every such row, so providers is looked up twice per row of smv_. The JOIN above expresses the same conditions, p.provider_id = v.provider_id included, with a single lookup of providers.
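A related point from the question: a value fetched from providers is data, and SQL will not treat it as a column name of smv on its own. One way to combine the two steps is to look the column name up first and then build the second statement from it. The sketch below does this in Python with sqlite3 as a stand-in; the schema, the sample rows and the helper name find_smv_rows are all simplified assumptions, not the asker's real tables. On MySQL the same idea could be kept on the server by building the statement text and running it with PREPARE and EXECUTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema: providers.va stores the NAME of the
    -- smv column that holds the value for that provider.
    CREATE TABLE providers (provider_id INTEGER, network TEXT, va TEXT);
    CREATE TABLE smv (id INTEGER, provider TEXT, value TEXT, va TEXT);
    INSERT INTO providers VALUES (1, 'Home', 'va'), (2, 'Recharge', 'value');
    INSERT INTO smv VALUES (1, 'Home', NULL, '7'), (2, 'Recharge', '9', NULL);
""")


def find_smv_rows(conn, network, wanted):
    """Hypothetical helper: resolve the column name, then query smv with it."""
    # Step 1: look up which smv column this provider uses.
    row = conn.execute(
        "SELECT va FROM providers WHERE network = ?", (network,)
    ).fetchone()
    if row is None:
        return []
    column = row[0]
    # Identifiers cannot be bound as parameters, so check the name against
    # the table's real columns before putting it into the statement text.
    valid = {info[1] for info in conn.execute("PRAGMA table_info(smv)")}
    if column not in valid:
        raise ValueError(f"unknown column: {column!r}")
    # Step 2: query smv using the looked-up column name.
    sql = f'SELECT id FROM smv WHERE provider = ? AND "{column}" = ?'
    return conn.execute(sql, (network, wanted)).fetchall()


recharge_rows = find_smv_rows(conn, "Recharge", "9")
home_rows = find_smv_rows(conn, "Home", "7")
print(recharge_rows, home_rows)
```

Checking the name against the table's actual column list matters here, because the column name ends up inside the SQL text rather than in a bound parameter.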
I am having problems with a specific query, or rather with creating the query in the first place.
The columns can be reduced to id, seconds and status.
=============================
| id | seconds | status |
-----------------------------
| 0 | 0 | 0 |
| 1 | 12 | 1 |
| 2 | 25 | 0 |
| 3 | 37 | 1 |
| 4 | 42 | 0 |
=============================
What I'd like to have: All entries with status = 1 PLUS all entries that are less than 10 seconds away from those entries. Basically, I want to fetch all possible pairs (or triplets, etc.) of rows to check manually (later automatically) whether they need to be paired (there is a column parent_id for this purpose, but we don't need that for the query). I could do this in code (first select all status=1, then loop), but I wonder whether it is possible to do this purely in the database.
Thus, my desired output would be the following:
=============================
| id | seconds | status |
-----------------------------
| 1 | 12 | 1 | <- status = 1
| 3 | 37 | 1 | <- status = 1
| 4 | 42 | 0 | <- only 5 seconds after status = 1
=============================
My current best guess is this:
SELECT * FROM entries e0
WHERE
e0.status = 1 OR
e0.status = 0 AND
0 < (SELECT count(*)
FROM entries e1
WHERE e1.status = 1 AND abs(e1.seconds - e0.seconds) < 10)
But this fetches the whole table, and I don't really know why - and it takes a long time to do so (there is an index on the column seconds, the table has 9000 entries).
Is there a way to do this (maybe even efficiently)?
Here's one option with union all and exists:
select * from entries where status = 1
union all
select * from entries e where status = 0 and
exists (select 1
from entries e2
where e2.status = 1 and
abs(e.seconds - e2.seconds) < 10
)
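As a quick check, the UNION ALL plus EXISTS query can be run against the five sample rows from the question. This is a sketch using Python's sqlite3 as a stand-in for the real database. The second variant, which replaces abs() with a plain range comparison, is my own assumption about what may let an index on seconds be used; it is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, seconds INTEGER, status INTEGER);
    INSERT INTO entries VALUES (0, 0, 0), (1, 12, 1), (2, 25, 0), (3, 37, 1), (4, 42, 0);
""")

# Query from the answer: status = 1 rows plus status = 0 rows that lie
# within 10 seconds of some status = 1 row.
rows_abs = conn.execute("""
    SELECT id, seconds, status FROM entries WHERE status = 1
    UNION ALL
    SELECT id, seconds, status FROM entries e
    WHERE status = 0
      AND EXISTS (SELECT 1 FROM entries e2
                  WHERE e2.status = 1
                    AND abs(e.seconds - e2.seconds) < 10)
    ORDER BY id
""").fetchall()

# Assumed variant: the same condition written as a range on e2.seconds,
# without wrapping the column in a function.
rows_range = conn.execute("""
    SELECT id, seconds, status FROM entries WHERE status = 1
    UNION ALL
    SELECT id, seconds, status FROM entries e
    WHERE status = 0
      AND EXISTS (SELECT 1 FROM entries e2
                  WHERE e2.status = 1
                    AND e2.seconds > e.seconds - 10
                    AND e2.seconds < e.seconds + 10)
    ORDER BY id
""").fetchall()

print(rows_abs)
print(rows_range)
```

Both variants return rows 1, 3 and 4, which matches the desired output in the question.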
Alternatively you could use an outer join with distinct instead of exists:
select distinct e.*
from entries e
left join entries e2 on e2.status = 1
where e.status = 1 or abs(e.seconds - e2.seconds) < 10
I prefer to do it in a single query. However there are also ways of doing it with exists or subqueries as well. Utilizing an outer join means you can grab everything at once with a nicely crafted where and join statements, adding a group by or distinct based on your performance situation will tidy up your results and make them unique rows.
My suggestion for WHERE clauses is to use parentheses to establish the precedence you intend. It makes your intentions clearer in the code.
WHERE Condition1 = True OR Condition2 = True AND Condition3 = True
Should be
WHERE Condition1 = True OR (Condition2 = True AND Condition3 = True)
AND already binds more tightly than OR, so the two forms evaluate identically, which is why I would not have expected your query to evaluate in the manner you mention. Still, I always use parentheses to establish precedence, because it is clearer and makes more complex conditions easier to craft.
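The precedence of AND over OR is easy to confirm with constant expressions. A minimal sketch, again using Python's sqlite3 as the engine (standard SQL engines, MySQL and SQL Server included, use the same precedence for these two operators):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No parentheses: AND is applied first, so this reads as 1 OR (0 AND 0).
implicit = conn.execute("SELECT 1 OR 0 AND 0").fetchone()[0]

# Explicit parentheses matching the default precedence.
and_first = conn.execute("SELECT 1 OR (0 AND 0)").fetchone()[0]

# Forcing OR to be applied first changes the result.
or_first = conn.execute("SELECT (1 OR 0) AND 0").fetchone()[0]

print(implicit, and_first, or_first)
```

The first two expressions are true and the third is false, so the unparenthesized condition in the question already groups the way the parenthesized version does.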
The reason you are getting the whole table is the data in your table. Seriously, sometimes we go looking for the answer and make it complicated. I prefer my way of writing your query, but given your example result set, my query and yours return the same results! Try lowering the 10 seconds to 1, 2, 3 and so on, and see what effect it has on your query. My assumption is that in your full dataset every record with a status of 0 is within 10 seconds of a record with a status of 1. I would have commented back, but this is one of the first questions I have answered.
Here is some example code based on your dataset and query.
DECLARE @Entries AS TABLE (
Id INT
,Seconds INT
,[Status] BIT
)
INSERT INTO @Entries (Id, Seconds, [Status])
VALUES (0,0,0 )
,(1,12,1 )
,(2,25,0 )
,(3,37,1 )
,(4,42,0 )
SELECT *
FROM
@Entries e0
WHERE
e0.Status = 1
OR e0.Status = 0
AND 0 < (SELECT count(*)
FROM
@Entries e1
WHERE e1.Status = 1 AND ABS(e1.Seconds - e0.Seconds) < 10)
SELECT DISTINCT
e0.*
FROM
@Entries e0
LEFT JOIN @Entries e1
ON e1.[Status] = 1
AND ABS(e1.seconds - e0.seconds) < 10
WHERE
e0.[Status] = 1
OR e1.id IS NOT NULL
I have a table similar to the one shown below.
-----------------------------
JOB ID | parameter | result |
-----------------------------
1 | xyz | 10 |
1 | abc | 15 |
2 | xyz | 12 |
2 | abc | 8 |
2 | mno | 20 |
-----------------------------
I want the result as shown below.
parameter | result 1 | result 2 |
----------------------------------
xyz | 10 | 12 |
mno | NULL | 20 |
abc | 15 | 8 |
----------------------------------
My goal is to have a single table which can compare the result values of two different jobs. It can be two or more jobs.
You want to simulate a pivot table, since MySQL doesn't have a PIVOT operator.
select
param,
max(case when id = 1 then res else null end) as 'result 1',
max(case when id = 2 then res else null end) as 'result 2'
from table
group by param
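The conditional-aggregation pivot can be tried on the sample rows from the question. In this sketch the table name job_results and the column names job_id, parameter and result are my own labels for the question's columns, and Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_results (job_id INTEGER, parameter TEXT, result INTEGER);
    INSERT INTO job_results VALUES
        (1, 'xyz', 10), (1, 'abc', 15),
        (2, 'xyz', 12), (2, 'abc', 8), (2, 'mno', 20);
""")

# One MAX(CASE ...) per job: rows of other jobs contribute NULL, which MAX
# ignores, so each output column holds that job's result or NULL.
pivot = conn.execute("""
    SELECT parameter,
           MAX(CASE WHEN job_id = 1 THEN result END) AS result_1,
           MAX(CASE WHEN job_id = 2 THEN result END) AS result_2
    FROM job_results
    GROUP BY parameter
    ORDER BY parameter
""").fetchall()

for row in pivot:
    print(row)
```

Comparing more jobs means adding one more MAX(CASE ...) column per job, so the list of jobs has to be known when the query is written.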
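MySQL has no FULL OUTER JOIN, so you need to emulate it with a UNION of a LEFT JOIN and a RIGHT JOIN. The job filter for the optional side has to go into the ON clause, otherwise it turns the outer join back into an inner join. Identifiers that contain spaces are quoted with backticks, not single quotes.
Something like:
select t1.parameter, t1.result as 'Result 1', t2.result as 'Result 2' from
my_table as t1 left join my_table as t2
on t1.parameter = t2.parameter and t2.`JOB ID` = 2
where t1.`JOB ID` = 1
union
select t2.parameter, t1.result as 'Result 1', t2.result as 'Result 2' from
my_table as t1 right join my_table as t2
on t1.parameter = t2.parameter and t1.`JOB ID` = 1
where t2.`JOB ID` = 2
In a database that supports FULL OUTER JOIN, it becomes easier:
select coalesce(t1.parameter, t2.parameter) as parameter, t1.result as "Result 1", t2.result as "Result 2" from
(select parameter, result from my_table where "JOB ID" = 1) as t1
full outer join
(select parameter, result from my_table where "JOB ID" = 2) as t2
on t1.parameter = t2.parameter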
In Postgres, you can use something like:
select parameter, (array_agg(result))[1], (array_agg(result))[2] from my_table group by parameter;
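The idea is: aggregate all the results for a given parameter into an array of results, and then fetch individual elements from those arrays. Note that without an ORDER BY inside array_agg the element order is not guaranteed, and a parameter that is missing from job 1 (such as mno) gets its job 2 result in the first position.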
I think that you can achieve something similar in MySQL by using GROUP_CONCAT(), although it returns a string instead of an array, so you cannot easily index it. But you can split by commas after that.
select q1.parameter, q2.result as r1, q3.result as r2
from
(select distinct parameter from temp2) q1
left join (select parameter, result from temp2 where job_id = 1) q2
on q1.parameter = q2.parameter
left join (select parameter, result from temp2 where job_id = 2) q3
on q1.parameter = q3.parameter;
It works, but it's not efficient. Still, since I'm gathering you are trying to solve something more complex than what's presented, this might help form your general solution.
While I'm at it, here's a slightly cleaner solution:
select distinct q1.parameter, q2.result as r1, q3.result as r2
from
temp2 q1
left join (select parameter, result from temp2 where job_id = 1) q2
on q1.parameter = q2.parameter
left join (select parameter, result from temp2 where job_id = 2) q3
on q1.parameter = q3.parameter;
I have three tables as following:
USERS TABLE
id_user| name |
---------------
1 | ...
2 | ...
SERVICES TABLE
id_service | name |
-------------------
1 | ...
2 | ...
3 | ...
USER_SERVICES TABLE (n-m)
id_user | id_service
--------------------
1 | 1
1 | 2
2 | 1
I need to do a SELECT starting from "SELECT * FROM users" and then filter the users by their services. For example, I need to get every user who has both service 1 and service 2 (the user may have other services too, but 1 and 2 for sure).
I did the following:
SELECT *
FROM `users`
INNER JOIN user_services ON users.id_user = user_services.id_user
WHERE id_service=1 AND id_service=2
But this, of course, doesn't work, since no single record matches both service = 1 and service = 2.
What can I do?
Add an extra join for the other service you want to check:
SELECT *
FROM `users`
INNER JOIN user_services us1 ON users.id_user = us1.id_user AND us1.id_service=1
INNER JOIN user_services us2 ON users.id_user = us2.id_user AND us2.id_service=2
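The double join can be checked against the sample rows from the question. This is a sketch in Python with sqlite3; the user names are placeholders I made up. The second query is a different technique that the answer does not mention, grouping with HAVING, included only as a possible alternative when the list of required services grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id_user INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_services (id_user INTEGER, id_service INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');  -- placeholder names
    INSERT INTO user_services VALUES (1, 1), (1, 2), (2, 1);
""")

# One join per required service, as in the answer above.
by_joins = conn.execute("""
    SELECT users.id_user
    FROM users
    INNER JOIN user_services us1
            ON users.id_user = us1.id_user AND us1.id_service = 1
    INNER JOIN user_services us2
            ON users.id_user = us2.id_user AND us2.id_service = 2
    ORDER BY users.id_user
""").fetchall()

# Alternative (not from the answer): count the distinct required services
# each user has and keep the users who have all of them.
by_having = conn.execute("""
    SELECT u.id_user
    FROM users u
    INNER JOIN user_services us ON u.id_user = us.id_user
    WHERE us.id_service IN (1, 2)
    GROUP BY u.id_user
    HAVING COUNT(DISTINCT us.id_service) = 2
    ORDER BY u.id_user
""").fetchall()

print(by_joins, by_having)
```

Both queries return only user 1, who is the only user holding services 1 and 2.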
select t.*,
(select count(*) from user_services where id_user = t.id_user) how_much
from users t;
Is this what you want?
It shows the users' data and how many services each user has in the user_services table. Another possibility is this:
select t.*,
(case when (select count(*)
from user_services where id_user = t.id_user and id_service = 1) > 0
then 'service1'
else 'null'
end) has_service_1
from users t;
The problem with this select is that you have to repeat the case...end once for every id_service you have, so it doesn't make sense if the number of services grows over time. On the other hand, if the number is more or less fixed and not large, this could be a solution.
I'd like to count how many occurrences of a value happen before a specific value
Below is my starting table
+-----------------+--------------+------------+
| Id | Activity | Time |
+-----------------+--------------+------------+
| 1 | Click | 1392263852 |
| 2 | Error | 1392263853 |
| 3 | Finish | 1392263862 |
| 4 | Click | 1392263883 |
| 5 | Click | 1392263888 |
| 6 | Finish | 1392263952 |
+-----------------+--------------+------------+
I'd like to count how many clicks happen before a finish happens.
I've got a very roundabout way of doing it where I write a function to find the last
finished activity and query the clicks between the finishes.
Also repeat this for Error.
What I'd like to achieve is the below table
+-----------------+--------------+------------+--------------+------------+
| Id | Activity | Time | Clicks | Error |
+-----------------+--------------+------------+--------------+------------+
| 3 | Finish | 1392263862 | 1 | 1 |
| 6 | Finish | 1392263952 | 2 | 0 |
+-----------------+--------------+------------+--------------+------------+
This table is very long so I'm looking for an efficient solution.
If anyone has any ideas.
Thanks heaps!
This is a complicated problem. Here is an approach to solving it. The groups between the "finish" records need to be identified as being the same, by assigning a group identifier to them. This identifier can be calculated by counting the number of "finish" records with a larger id.
Once this is assigned, your results can be calculated using an aggregation.
The group identifier can be calculated using a correlated subquery:
select max(id) as id, 'Finish' as Activity, max(time) as Time,
sum(Activity = 'Click') as Clicks, sum(activity = 'Error') as Error
from (select s.*,
(select sum(s2.activity = 'Finish')
from starting s2
where s2.id >= s.id
) as FinishCount
from starting s
) s
group by FinishCount;
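To see the group identifier at work, the same idea can be run on the six sample rows from the question. This is a sketch with Python's sqlite3 standing in for MySQL; it relies on a comparison evaluating to 0 or 1, which holds in both engines. The aliases finish_id, finish_time and finish_count are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE starting (id INTEGER PRIMARY KEY, activity TEXT, time INTEGER);
    INSERT INTO starting VALUES
        (1, 'Click',  1392263852), (2, 'Error',  1392263853),
        (3, 'Finish', 1392263862), (4, 'Click',  1392263883),
        (5, 'Click',  1392263888), (6, 'Finish', 1392263952);
""")

# finish_count = number of 'Finish' rows at or after this row. Every row
# leading up to the same 'Finish' gets the same number, which becomes the
# grouping key.
summary = conn.execute("""
    SELECT MAX(id)                 AS finish_id,
           MAX(time)               AS finish_time,
           SUM(activity = 'Click') AS clicks,
           SUM(activity = 'Error') AS errors
    FROM (SELECT s.*,
                 (SELECT SUM(s2.activity = 'Finish')
                  FROM starting s2
                  WHERE s2.id >= s.id) AS finish_count
          FROM starting s) AS grouped
    GROUP BY finish_count
    ORDER BY finish_id
""").fetchall()

for row in summary:
    print(row)
```

Rows 1 to 3 share finish_count 2 and rows 4 to 6 share finish_count 1, which gives the two groups in the desired output.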
A version that leverages user(session) variables
SELECT MAX(id) id,
MAX(activity) activity,
MAX(time) time,
SUM(activity = 'Click') clicks,
SUM(activity = 'Error') error
FROM
(
SELECT t.*, @g := IF(activity <> 'Finish' AND @a = 'Finish', @g + 1, @g) g, @a := activity
FROM table1 t CROSS JOIN (SELECT @g := 0, @a := NULL) i
ORDER BY time
) q
GROUP BY g
Output:
| ID | ACTIVITY | TIME | CLICKS | ERROR |
|----|----------|------------|--------|-------|
| 3 | Finish | 1392263862 | 1 | 1 |
| 6 | Finish | 1392263952 | 2 | 0 |
Try:
select x.id
, x.activity
, x.time
, sum(case when y.activity = 'Click' then 1 else 0 end) as clicks
, sum(case when y.activity = 'Error' then 1 else 0 end) as errors
from tbl x, tbl y
where x.activity = 'Finish'
and y.time < x.time
and (y.time > (select max(z.time) from tbl z where z.activity = 'Finish' and z.time < x.time)
or x.time = (select min(z.time) from tbl z where z.activity = 'Finish'))
group by x.id
, x.activity
, x.time
order by x.id
Here's another method of using variables, which is somewhat different to @peterm's:
SELECT
Id,
Activity,
Time,
Clicks,
Errors
FROM (
SELECT
t.*,
@clicks := @clicks + (activity = 'Click') AS Clicks,
@errors := @errors + (activity = 'Error') AS Errors,
@clicks := @clicks * (activity <> 'Finish'),
@errors := @errors * (activity <> 'Finish')
FROM
`starting` t
CROSS JOIN
(SELECT @clicks := 0, @errors := 0) i
ORDER BY
time
) AS s
WHERE Activity = 'Finish'
;
What's similar to Peter's query is that this one uses a subquery that's returning all the rows, setting some variables along the way and returning the variables' values as columns. That may be common to most methods that use variables, though, and that's where the similarity between these two queries ends.
The difference is in how the accumulated results are calculated. Here all the accumulation is done in the subquery, and the main query merely filters the derived dataset on Activity = 'Finish' to return the final result set. In contrast, the other query uses grouping and aggregation at the outer level to get the accumulated results, which may make it slower than mine in comparison.
At the same time Peter's suggestion is more easily scalable in terms of coding. If you happen to have to extend the number of activities to account for, his query would only need expansion in the form of adding one SUM(activity = '...') AS ... per new activity to the outer SELECT, whereas in my query you would need to add a variable and several expressions, as well as a column in the outer SELECT, per every new activity, which would bloat the resulting code much more quickly.
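For readers who find the variable-based accumulation hard to follow, the same running-counter logic can be mirrored in a few lines of plain Python: count as rows arrive in time order, emit the counters at every Finish, then reset them. This is only an illustration of the logic, with a hypothetical function name summarize, and not a replacement for doing the work in the database.

```python
def summarize(rows):
    """Return one (id, time, clicks, errors) tuple per 'Finish' row.

    rows is an iterable of (id, activity, time) tuples.
    """
    clicks = errors = 0
    finished = []
    # Process rows in time order, like ORDER BY time in the subquery.
    for row_id, activity, time in sorted(rows, key=lambda r: r[2]):
        if activity == "Click":
            clicks += 1
        elif activity == "Error":
            errors += 1
        elif activity == "Finish":
            # Emit the accumulated counters, then reset them, which is
            # what multiplying by (activity <> 'Finish') does in SQL.
            finished.append((row_id, time, clicks, errors))
            clicks = errors = 0
    return finished


sample = [
    (1, "Click", 1392263852), (2, "Error", 1392263853),
    (3, "Finish", 1392263862), (4, "Click", 1392263883),
    (5, "Click", 1392263888), (6, "Finish", 1392263952),
]
result = summarize(sample)
print(result)
```

Each additional activity needs its own counter and its own reset, which is the same scaling cost described above for the SQL version.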