Improving this MySQL Query - Select as sub-query - mysql

I have this query
SELECT
shot.hole AS hole,
shot.id AS id,
(SELECT s.id FROM shot AS s
WHERE s.hole = shot.hole AND s.shot_number > shot.shot_number AND shot.round_id = s.round_id
ORDER BY s.shot_number ASC LIMIT 1) AS next_shot_id,
shot.distance AS distance_remaining,
shot.type AS hit_type,
shot.area AS onto
FROM shot
JOIN course ON shot.course_id = course.id
JOIN round ON shot.round_id = round.id
WHERE round.uID = 78
This returns 900~ rows in around 0.7 seconds. This is OK-ish, but there are more lines like this required
(SELECT s.id FROM shot AS s
WHERE s.hole = shot.hole AND s.shot_number > shot.shot_number AND shot.round_id = s.round_id
ORDER BY s.shot_number ASC LIMIT 1) AS next_shot_id,
For example
(SELECT s.id FROM shot AS s
WHERE s.hole = shot.hole AND s.shot_number < shot.shot_number AND shot.round_id = s.round_id
ORDER BY s.shot_number ASC LIMIT 1) AS past_shot_id,
Adding this increases the load time to 10s of seconds which is far too long and the page often doesn't load at all or MySQL just locks up and using show processlist shows that the query is just sat there sending data.
Removing the ORDER BY s.shot_number ASC clause in those sub queries reduces the query time down to 0.05 seconds which is much much better. But the ORDER BY is required to ensure that the next or past row (shot) is returned, rather than any old random row.
How can I improve this query to make it run faster and return the same results. Perhaps my approach for obtaining the next and past rows is sub optimal and I need to look at a different way of returning those next and previous row IDs?
EDIT - additional background info
The query was fine on my testing domain, a subdomain. But when moved to the live domain the issues started. Hardly anything was changed yet the whole site came to halt because of these new slow queries. Key notes:
Different domain
Different folder in /var/www
Same DB
Same DB credentials
Same code
Added indexes in an attempt to fix - this didn't help
Could any of these affected the load time?

This will get marked down in a minute for 'not being an answer', but it illustrates a possible solution without simply handing it to you on a plate....
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT x.i, MIN(y.i) FROM ints x LEFT JOIN ints y ON y.i > x.i GROUP BY x.i;
+---+----------+
| i | MIN(y.i) |
+---+----------+
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | NULL |
+---+----------+

To expand on Strawberry's answer, doing additional left-join for a "pre-query" to get all the prior / next IDs, then join out to get whatever details you need.
select
Shot.ID,
Shot.Hole,
Shot.Distance as Distance_Remaining,
Shot.Type as Hit_Type,
Shot.Area as Onto
PriorShot.Hole as PriorHole,
PriorShot.Distance as PriorDistanceRemain,
NextShot.Hole as NextHole,
NextShot.Distance as NextDistanceRemain
from
( SELECT
shot.id,
MIN(nextshot.id) as NextShotID,
MAX(priorshot.id) as PriorShotID
FROM
round
JOIN shot
on round.id = shot.round_id
LEFT JOIN shot nextshot
ON shot.round_id = nextshot.round_id
AND shot.hole = nextshot.hole
AND shot.shot_number < nextshot.shot_number
LEFT JOIN shot priorshot
ON shot.round_id = priorshot.round_id
AND shot.hole = priorshot.hole
AND shot.shot_number > priorshot.shot_number
WHERE
round.uID = 78
GROUP BY
shot.id ) AllShots
JOIN Shot
on AllShots.id = Shot.ID
LEFT JOIN shot PriorShot
on AllShots.PriorShotID = PriorShot.ID
LEFT JOIN shot NextShot
on AllShots.NextShotID = NextShot.ID
The inner query gets only those for round.uID = 78, then you can join to the next / prior as needed. I did not add the joins to the course and round tables as no result columns were presented, but could easily be added.

I wonder how well the following performs. It replaces the joining operations with string operations.
SELECT shot.hole AS hole, shot.id AS id,
substring_index(substring_index(shots, ',', find_in_set(shot.id, ss.shots) + 1), ',', -1
) as nextsi,
substring_index(substring_index(shots, ',', find_in_set(shot.id, ss.shots) - 1), ',', -1
) as prevsi,
shot.distance AS distance_remaining, shot.type AS hit_type, shot.area AS onto
FROM shot JOIN
course
ON shot.course_id = course.id JOIN
round
ON shot.round_id = round.id join
(select s.round_id, s.hole, group_concat(s.id order by s.shot_number) as shots
from shot s
group by s.round_id, s.hole
) ss
on ss.round_id = shot.round_id and ss.hole = shot.hole
WHERE round.uID = 78
Note that this doesn't work fully -- it will produce erroneous results on the first and last shot. I'm wondering how the performance is before fixing those details.

Related

How can I count rows in a 1:N:N relation in a faster way?

This question is a bit complicated to me, and I can't explain it in one sentence so the title may seem quite ambiguous.
I have 3 tables in my MySQL database, their structure is shown below:
word_list (5 million rows)
+-----+--------+
| wid | word |
+-----+--------+
| 1 | foo |
| 2 | bar |
| 3 | hello |
+-----+--------+
paper_word_relation (10 million rows)
+-----+-------+
| pid | word |
+-----+-------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 3 |
+-----+-------+
paper_citation_relation (80K rows)
+----------+--------+
| pid_from | pid_to |
+----------+--------+
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 3 |
+----------+--------+
I want to find out how many papers contain word W, and cite the papers also contain word W.(for each word in the list)
I use two inner join to do this job but it seems extremely slow when the word is popular - above 50s (quite fast if the word is rarely used - below 0.1s), here is my code
SELECT COUNT(*) FROM (
SELECT a.pid_from, a.pid_to, b.word FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
WHERE b.word = 2 AND c.word = 2) AS d
How can I do this faster? Is my query not efficient enough or it's the problem about the amount of data?
I can only come up with one solution that I delete the words which occur less than 2 in the paper_word_relation table. (About 4 million words only occur once)
Thanks!
If you are only concerned with getting the Count, you should not be first getting the results into a Derived Table, and then Count the rows out. This may create unnecessary temporary tables storing lots of data in-memory. You can directly count the number of rows.
I also think that you need to count unique number of papers. Because of Many-to-Many relationships in paper_citation_relation table, duplicate rows may be coming for a single paper.
SELECT COUNT(DISTINCT a.pid_from)
FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b ON a.pid_from = b.pid
INNER JOIN paper_word_relation AS c ON a.pid_to = c.pid
WHERE b.word = 2 AND c.word = 2
For performance, you will need following indexing:
Composite Index on (pid_from, pid_to) in the paper_citation_relation table.
Composite Index on (pid, word) in the paper_word_relation table.
We may also possibly optimize the query further by reducing one join and use conditional AND/OR based filtering in HAVING. You will need to benchmark it though.
SELECT COUNT(*)
FROM (
SELECT a.pid_from
FROM paper_citation_relation AS a
INNER JOIN paper_word_relation AS b
ON (a.pid_from = b.pid OR
a.pid_to = b.pid)
GROUP BY a.pid_from
HAVING SUM(a.pid_from = b.pid AND b.word = 2) AND
SUM(a.pid_to = b.pid AND b.word = 2)
)
After the first 1:n join you get the same pid_to multiple times and your next join is no longer 1:n but n:m, creating a possibly huge intermediate result before the final DISTINCT. It's similar to a CROSS JOIN and it's getting worse for popular words, e.g. 10*10 vs. 1000*1000 rows.
You must remove the duplicates before the join, this should return the same number as #MadhurBhaiya's answer
SELECT Count(*) -- no more DISTINCT needed
FROM
(
SELECT DISTINCT cr.pid_to -- reducing m to 1
FROM paper_citation_relation AS cr
JOIN paper_word_relation AS wr
ON cr.pid_from = wr.pid
WHERE wr.word = 2
) AS dt
JOIN paper_word_relation AS wr
ON dt.pid_to = wr.pid -- 1:n join again
WHERE wr.word = 2
If you want to count the number of papers which have been cited you need to get a distinct list of pid (either pid_from or pid_to) from paper_citation_relation first and then join to the specific word.
SELECT Count(*)
FROM
( -- get a unique list of cited or citing papers
SELECT pid_from AS pid -- citing
FROM paper_citation_relation
UNION -- DISTINCT by default
SELECT pid_to -- cited
FROM paper_citation_relation
) AS dt
JOIN paper_word_relation AS wr
ON wr.pid = dt.pid
WHERE wr.word = 2 -- now check for the searched word
The number returned by this might be slightly higher (it counts a paper regardless if cited or citing).

Issue with mysql query that calls column name from another table

I have two tables, one is an index (or map) which helps when other when pulling queries.
SELECT v.*
FROM smv_ v
WHERE (SELECT p.network
FROM providers p
WHERE p.provider_id = v.provider_id) = 'RUU='
AND (SELECT p.va
FROM providers p
WHERE p.provider_id = v.provider_id) = 'MjU='
LIMIT 1;
Because we do not know the name of the column that holds the main data, we need to look it up, using the provider_id which is in both tables, and then query.
I am not getting any errors, but also no data back. I have spent the past hour trying to put this on sqlfiddle, but it kept crashing, so I just wanted to check if my code is really wrong, hence the crashing?
In the above example, I am looking in the providers table for column network, where the provider_id matches, and then use that as the column on smv.
I am sure i have done this before just like this, but after the weekend trying I thought i would ask on here.
Thanks in Advance.
UPDATE
Here is an example of the data:
THis is the providers, this links so no matter what the name of the column on the smv table, we can link them.
+---+---+---------------+---------+-------+--------+-----+-------+--------+
| | A | B | C | D | E | F | G | H |
+---+---+---------------+---------+-------+--------+-----+-------+--------+
| 1 | 1 | Home | network | batch | bs | bp | va | bex |
| 2 | 2 | Recharge | code | id | serial | pin | value | expire |
+---+---+---------------+---------+-------+--------+-----+-------+--------+
In the example above, G will mean in the smv column for recharge we be value. So that is what we would look for in our WHERE clause.
Here is the smv table:
+---+---+-----------+-----------+---+----+---------------------+-----+--+
| | A | B | C | D | E | F | value | va |
+---+---+-----------+-----------+---+----+---------------------+-----+--+
| 1 | 1 | X2 | Home | 4 | 10 | 2016-09-26 15:20:58 | | 7 |
| 2 | 2 | X2 | Recharge | 4 | 11 | 2016-09-26 15:20:58 | 9 | |
+---+---+-----------+-----------+---+----+---------------------+-----+--+
value in the same example as above would be 9, or 'RUU=' decoded.
So we do not know the name of the rows, until the row from smv is called, once we have this, we can look up what column name we need to get the correct information.
Hope this helps.
MORE INFO
At the point of triggering, we do not know what the row consists of the right data because some many of the fields would be empty. The map is there to help we query the right column, to get the right row (smv grows over time depending on whats uploaded.)
1) SELECT p.va FROM providers p WHERE p.network = 'Recharge' ;
2) SELECT s.* FROM smv s, providers p WHERE p.network = 'Recharge';
1) gives me the correct column I need to look up and query smv, using the above examples it would come back with "value". So I need to now look up, within the smv table, C = Recharge, and value = '9'. This should bring me back row 2 of the smv table.
So individually both 1 and 2 queries work, but I need them put together so the query is done on the database server.
Hope this gives more insight
Even More Info
From reading other posts, which are not really doing what I need, i have come up with this:
SELECT s.*
FROM (SELECT
(SELECT p.va
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS net,
(SELECT p.bex
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS bex
FROM dh_smv_providers) AS val, dh_smv_ s
WHERE s.provider_id = 'vodaphone' AND net = '20'
ORDER BY from_base64(val.bex) DESC;
The above comes back blank, but if i replace net, in the WHERE clause with a column I know exists, I do get the results expected:
SELECT s.*
FROM (SELECT
(SELECT p.va
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS net,
(SELECT p.bex
FROM dh_smv_providers p
WHERE p.provider_name = 'vodaphone'
LIMIT 1) AS bex
FROM dh_smv_providers) AS val, dh_smv_ s
WHERE s.provider_id = 'vodaphone' AND value = '20'
ORDER BY from_base64(val.bex) DESC;
So what I am doing wrong, which is net, not showing the value derived from the subquery "value" ?
Thanks
SELECT
v.*,
p.network, p.va
FROM
smv_ v
INNER JOIN
providers p ON p.provider_id = v.provider_id
WHERE
p.network = 'RUU=' AND p.va = 'MjU='
LIMIT 1;
The tables talk to each other via the JOIN syntax. This completely circumvents the need (and limitations) of sub-selects.
The INNER JOIN means that only fully successful matches are returned, you may need to adjust this type of join for your situation but the SQL will return a row of all v columns where p.va = MjU and p.network = RUU and p.provider_id = v.provider_id.
What I was trying to explain in comments is that subqueries do not have any knowledge of their outer query:
SELECT *
FROM a
WHERE (SELECT * FROM b WHERE a)
AND (SELECT * FROM c WHERE a OR b)
This layout (as you have in your question) is that b knows nothing about a because the b query is executed first, then the c query, then finally the a query. So your original query is looking for WHERE p.provider_id = v.provider_id but v has not yet been defined so the result is false.

Create a query that fetches rows plus their adjacent rows

I am having problems with a specific query - respectively creating the query in the first place.
The columns can be reduced to id, seconds and status.
=============================
| id | seconds | status |
-----------------------------
| 0 | 0 | 0 |
| 1 | 12 | 1 |
| 2 | 25 | 0 |
| 3 | 37 | 1 |
| 4 | 42 | 0 |
=============================
What I'd like to have: All entries with status = 1 PLUS all entries that are less than 10 seconds away from those entries. Basically, I want to fetch all possible pairs (or triplets, etc.) of rows to check manually (later automatically) whether they need to be paired (there is a column parent_id for this purpose, but we don't need that for the query). I could do this in code (first select all status=1, then loop), but I wonder whether it is possible to do this purely in the database.
Thus, my desired output would be the following:
=============================
| id | seconds | status |
-----------------------------
| 1 | 12 | 1 | <- status = 1
| 3 | 37 | 1 | <- status = 1
| 4 | 42 | 0 | <- only 5 seconds after status = 1
=============================
My current best guess is this:
SELECT * FROM entries e0
WHERE
e0.status = 1 OR
e0.status = 0 AND
0 < (SELECT count(*)
FROM entries e1
WHERE e1.status = 1 AND abs(e1.seconds - e0.seconds) < 10)
But this fetches the whole table, and I don't really know why - and it takes a long time to do so (there is an index on the column seconds, the table has 9000 entries).
Is there a way to do this (maybe even effiently)?
Here's one option with union all and exists:
select * from entries where status = 1
union all
select * from entries e where status = 0 and
exists (select 1
from entries e2
where e2.status = 1 and
abs(e.seconds - e2.seconds) < 10
)
SQL Fiddle Demo
Alternatively you could use an outer join with distinct instead of exists:
select distinct e.*
from entries e
left join entries e2 on e2.status = 1
where e.status = 1 or abs(e.seconds - e2.seconds) < 10
More Fiddle
I prefer to do it in a single query. However there are also ways of doing it with exists or subqueries as well. Utilizing an outer join means you can grab everything at once with a nicely crafted where and join statements, adding a group by or distinct based on your performance situation will tidy up your results and make them unique rows.
My suggestion on where statements to ensure your intentions are met is to use parenthesis to establish your intended precedence. It will make your code clearer to your intentions.
WHERE Condition1 = True OR Condition2 = True AND Condition3 = True
Should be
WHERE Condition1 = True OR (Condition2 = True AND Condition3 = True)
Oddly, I would not have thought it would evaluate in the manner you mention because of past experience but then again I ALWAYS use parenthesis to establish my precedence to make it more clear and easier to craft more complex conditions.
Reason you are getting the whole table. Is because of the data in your table. Seriously, sometimes we go looking for the answer and make it complicated, I prefer my way of solving the query of yours but given your result set example my query and yours get the same results! Try changing the 10 seconds down to 1/2/3 etc and see what the effect of your query is. My assumption would be in your full dataset that your any record with a status of 0 is within 10 seconds of a record that has a status of 1...... I would have commented back but this is one of the first questions I have answered.
Here is some example code based on your dataset and query.
DECLARE #Entries AS TABLE (
Id INT
,Seconds INT
,[Status] BIT
)
INSERT INTO #Entries (Id, Seconds, [Status])
VALUES (0,0,0 )
,(1,12,1 )
,(2,25,0 )
,(3,37,1 )
,(4,42,0 )
SELECT *
FROM
#Entries e0
WHERE
e0.Status = 1
OR e0.Status = 0
AND 0 < (SELECT count(*)
FROM
#Entries e1
WHERE e1.Status = 1 AND ABS(e1.Seconds - e0.Seconds) < 10)
SELECT DISTINCT
e0.*
FROM
#Entries e0
LEFT JOIN #Entries e1
ON e1.[Status] = 1
AND ABS(e1.seconds - e0.seconds) < 10
WHERE
e0.[Status] = 1
OR e1.id IS NOT NULL

Filtering without duplicates in mysql

I'm fetching data from four different tables. This is how I did:
select value_split.id,
count(invoice_request.id) as counts,
GROUP_CONCAT(`category`.`category_name`) as `category_name`
from value_split
left join invoice_request
on invoice_request.id = value_split.id
and MONTH(invoice_request.from_date) = '05'
and YEAR(invoice_request.from_date) = '2016'
and invoice_request.status = '1'
left join `category_value` on value_split.id = category_value.id
left join `category` on category_value.category_id = category.category_id
where MONTH(value_split.date) = '05'
and YEAR(value_split.date) = '2016'
group by value_split.id, value_split.date
limit 0, 2147483647
The expected answer is:
id|counts|category_name
1 | 1 | service, drop
2 | 2 | drop
3 | 1 | service
But, what I get is:
id|counts|category_name
1 | 5 | service, drop
2 | 4 | drop
3 | 5 | service
If I remove line 10 and 11, I get the correct answer. But, if I include them, I get the wrong one. But, I want to display that too. So, what should I do to get the correct output? What's wrong with this?
Use
COUNT(DISTINCT invoice_request.id) AS counts,

Combining 2 SUMS in MySQL and optimising query

I have the below code which works:
SELECT admin_teams.name,
SUM(temp_orders.amount_paid) as amount,
SUM(instalments.amount) as amount2
FROM temp_orders
LEFT JOIN admin_teams
ON admin_teams.id = temp_orders.team
LEFT JOIN instalments
ON instalments.order_id = temp_orders.order_id
WHERE
(DATE(temp_orders.date_paid) = CURDATE()
OR DATE(instalments.date_paid) = CURDATE())
AND (temp_orders.pay_status = 4
OR instalments.pay_status = 4)
GROUP BY temp_orders.team
ORDER BY temp_orders.team ASC
LIMIT 5
It produces a table that looks like:
+-------------+--------+---------+
| name | amount | amount2 |
+-------------+--------+---------+
| team name 1 | 100 | 150 |
| team name 2 | 200 | 250 |
| team name 3 | 300 | 175 |
+-------------+--------+---------+
I have two issues;
I actually only want one column which is the sum of amount and amount2.
The query is VERY slow - this took 190 sec to run.
I did have it almost working with a Union which was almost instant - I couldn't however get it fully working because the number of columns in my first select statement will not match those in the second - the table 'instalments' does not have a team column but the table temp_orders does.
Can anyone help with either problem?
Thanks.
SELECT admin_teams.name,
(SUM(temp_orders.amount_paid) + SUM(instalments.amount)) as amount,
FROM temp_orders
LEFT JOIN admin_teams
ON admin_teams.id = temp_orders.team
LEFT JOIN instalments
ON instalments.order_id = temp_orders.order_id
WHERE
temp_orders.date_paid >= CURDATE()
OR instalments.date_paid >= CURDATE())
AND (temp_orders.pay_status = 4
OR instalments.pay_status = 4)
GROUP BY temp_orders.team
ORDER BY temp_orders.team ASC
LIMIT 5
And add these indexes
ALTER TABLE temp_orders ADD KEY (date_paid ,pay_status,team);
ALTER TABLE instalments ADD KEY (date_paid ,pay_status);