MySQL query index & performance improvements - mysql

I have created an application to track progress in League of Legends for me and my friends. For this purpose, I collect information about the current rank several times a day into my MySQL database. To fetch the results and show them in a graph, I use the following queries:
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
WHERE
lol_summoner.name IN (". str_repeat('?, ', count($names) - 1) ."?)
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
The first query is used when I want to retrieve all players saved in the database. The grid table is a temporary table that holds timestamps generated at a fixed interval, so that information can be retrieved in chunks of this interval. Both placeholders in this query are bound to the interval. The second query is used when I want to retrieve information for specific players only.
The grid table is produced by the following stored procedure, which is called with three parameters (n_first: first timestamp, n_last: last timestamp, n_increment: increment between two timestamps):
BEGIN
-- Create tmp table
DROP TEMPORARY TABLE IF EXISTS series_tmp;
CREATE TEMPORARY TABLE series_tmp (
series bigint
) engine = memory;
WHILE n_first <= n_last DO
-- Insert in tmp table
INSERT INTO series_tmp (series) VALUES (n_first);
-- Advance to the next timestamp in the series
SET n_first = n_first + n_increment;
END WHILE;
END
The query works and finishes in reasonable time (~10 seconds), but I would be grateful for any help improving it, either by rewriting it or by adding indexes to the database.
/Edit:
After reviewing @Rick James's answer, I modified the queries as follows:
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
WHERE lol_summoner.name IN (<NAMES>)
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
This improves the query execution time by a wide margin (it now finishes in well under 1 s).

Problem 1 and Solution
You need a series of integers between two values? And they differ by 1? Or by some larger value?
First, create a permanent table of the numbers from 0 to some large enough value:
CREATE TABLE Num10 ( n INT );
INSERT INTO Num10 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE Nums ( n INT, PRIMARY KEY(n))
SELECT a.n*1000 + b.n*100 + c.n*10 + d.n AS n -- alias must match the column name in Nums
FROM Num10 AS a
JOIN Num10 AS b -- note "cross join"
JOIN Num10 AS c
JOIN Num10 AS d;
Now Nums has 0..9999. (Make it bigger if you might need more.)
To get a sequence of consecutive numbers from 123 through 234:
SELECT 123 + n FROM Nums WHERE n < 234-123+1;
To get a sequence of consecutive numbers from 12345 through 23456, in steps of 15:
SELECT 12345 + 15*n FROM Nums WHERE n < (23456-12345+1)/15;
JOIN to a SELECT like one of those instead of to series_tmp.
Barring other issues, that should speed things up significantly.
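A minimal sketch of the numbers-table idea, using Python's built-in sqlite3 module in place of MySQL so that it can be run anywhere. The helper names make_nums and series are made up for illustration. One difference to be aware of: SQLite's / on integers is integer division, so the bound is written here as n <= (last - first) / step rather than copied from the MySQL expression above.

```python
import sqlite3


def make_nums(con):
    """Create Num10 (0..9) and Nums (0..9999) as described above."""
    con.execute("CREATE TABLE Num10 (n INTEGER)")
    con.executemany("INSERT INTO Num10 VALUES (?)", [(i,) for i in range(10)])
    con.execute("CREATE TABLE Nums (n INTEGER PRIMARY KEY)")
    # Four-way cross join of the ten digits yields every number 0..9999 once.
    con.execute(
        """
        INSERT INTO Nums (n)
        SELECT a.n*1000 + b.n*100 + c.n*10 + d.n
        FROM Num10 AS a
        CROSS JOIN Num10 AS b
        CROSS JOIN Num10 AS c
        CROSS JOIN Num10 AS d
        """
    )


def series(con, first, last, step):
    """Return first, first+step, ... up to and including at most last."""
    rows = con.execute(
        """
        SELECT :first + :step * n
        FROM Nums
        WHERE n <= (:last - :first) / :step  -- integer division in SQLite
        ORDER BY n
        """,
        {"first": first, "last": last, "step": step},
    ).fetchall()
    return [r[0] for r in rows]


con = sqlite3.connect(":memory:")
make_nums(con)
consecutive = series(con, 123, 234, 1)
stepped = series(con, 12345, 23456, 15)
```

The result of such a SELECT can then be joined in place of series_tmp, which avoids filling a temporary table row by row in a loop.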
Problem 2
You are GROUPing BY series, but ORDERing by timestamp. They are related, so you might get the 'right' answer. But think about it.
Problem 3
You seem to be building "buckets" (called "series"?) from "timestamps". Is this correct? If so, let's work backwards -- Turn a "timestamp" into a "bucket" number:
bucket_number = (timestamp - start) / bucket_size
By doing that throughout, you can avoid 'Problem 1' and eliminate my solution to it. That is, reformulate the entire queries in terms of buckets.
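As a rough illustration of the bucket formula, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The table layout (name, ts, points), the sample rows, and the function name bucket_averages are assumptions made for the example. In SQLite, / on integers already truncates; in MySQL you would use DIV (or FLOOR) to get the same effect.

```python
import sqlite3


def bucket_averages(rows, start, bucket_size):
    """Average the non-zero points per (name, bucket).

    bucket = (ts - start) / bucket_size, using integer division.
    Assumes every ts is >= start.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lol (name TEXT, ts INTEGER, points INTEGER)")
    con.executemany("INSERT INTO lol VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT name,
               (ts - :start) / :size AS bucket,
               AVG(NULLIF(points, 0)) AS avg_points  -- zeros are ignored
        FROM lol
        GROUP BY name, bucket
        ORDER BY name, bucket
        """,
        {"start": start, "size": bucket_size},
    ).fetchall()
    con.close()
    return result


# Made-up sample data: (name, timestamp, points)
sample = [
    ("alice", 100, 10),
    ("alice", 130, 20),
    ("alice", 170, 0),
    ("alice", 190, 40),
    ("bob", 110, 50),
]
averages = bucket_averages(sample, start=100, bucket_size=60)
```

No helper table of timestamps is needed: each row computes its own bucket, and the GROUP BY does the rest.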

Related

MySQL Count from a table where condition on the last row of a related table

I'm new with MySQL and actually have a problem. (... and my English is poor... :D)
The database (extract)
I have 3 tables: Batch, MainPost and MainPostHistory.
A Batch has 1 to x MainPost, and a MainPost has 1 to x MainPostHistory (kind of log).
Every table has an auto-increment primary key.
In addition, a MainPostHistory is defined by a DateTime and a MainPostStatusID.
Of course, all tables are linked by foreign key indexes.
What I have to do
I have to count (for each Batch) the number of MainPost rows whose last MainPostHistory has a MainPostStatusID equal to (for example) 0.
So I have 2 parameters: the BatchID and the MainPostStatusID to check.
What I've done
I wrote the following query, but receive an error "Unknown column MP.ID" :
SELECT COUNT(*)
FROM MainPost AS MP
WHERE (MP.BatchID = @BatchID) AND (((
SELECT qMPH.MainPostStatusID
FROM (
SELECT MPH.MainPostStatusID
FROM MainPostHistory AS MPH
WHERE MPH.MainPostID = MP.ID
ORDER BY MPH.DateTime DESC
LIMIT 1
) AS qMPH
)) = @SearchedMainPostStatusID);
What I expect
Why this error, and how to solve it?
And, by the way, is it the best way to do it?
Please! And thanks for reading! :-)
You don't need to nest the subquery inside another one where MP.ID is out of scope:
SELECT COUNT(*)
FROM MainPost AS MP
WHERE (MP.BatchID = @BatchID) AND (
SELECT MPH.MainPostStatusID
FROM MainPostHistory AS MPH
WHERE MPH.MainPostID = MP.ID
ORDER BY MPH.DateTime DESC
LIMIT 1
) = @SearchedMainPostStatusID;
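To see the single-level correlated subquery at work, here is a small sketch that runs it through Python's sqlite3 module (used as a stand-in for MySQL). The sample rows and the helper names make_db and count_posts_with_last_status are invented for the example; the table and column names follow the question.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with made-up sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE MainPost (ID INTEGER PRIMARY KEY, BatchID INTEGER);
        CREATE TABLE MainPostHistory (
            ID INTEGER PRIMARY KEY,
            MainPostID INTEGER,
            DateTime TEXT,
            MainPostStatusID INTEGER
        );
        INSERT INTO MainPost VALUES (1, 10), (2, 10), (3, 20);
        INSERT INTO MainPostHistory (MainPostID, DateTime, MainPostStatusID) VALUES
            (1, '2024-01-01 10:00:00', 5),
            (1, '2024-01-02 10:00:00', 0),
            (2, '2024-01-01 10:00:00', 0),
            (2, '2024-01-03 10:00:00', 7),
            (3, '2024-01-01 10:00:00', 0);
        """
    )
    return con


def count_posts_with_last_status(con, batch_id, status_id):
    """Count MainPost rows in a batch whose most recent history row has status_id."""
    row = con.execute(
        """
        SELECT COUNT(*)
        FROM MainPost AS MP
        WHERE MP.BatchID = ?
          AND (SELECT MPH.MainPostStatusID
               FROM MainPostHistory AS MPH
               WHERE MPH.MainPostID = MP.ID  -- MP is visible one level down
               ORDER BY MPH.DateTime DESC
               LIMIT 1) = ?
        """,
        (batch_id, status_id),
    ).fetchone()
    return row[0]


con = make_db()
count = count_posts_with_last_status(con, batch_id=10, status_id=0)
```

In batch 10 of this sample, post 1 ends on status 0 while post 2 ends on status 7, so only post 1 is counted, even though post 2 had status 0 earlier.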

Find next or previous ID when query contains multiple cases

I am looking for the most efficient way to find the next or previous ID of the following query:
SELECT *
FROM transactions
ORDER
BY CASE order_status
WHEN 'order_accepted' THEN 1
WHEN 'processing_order' THEN 2
WHEN 'order_send_mailer' THEN 3
WHEN 'order_send' THEN 4
WHEN 'order_received' THEN 5
WHEN 'order_refunded' THEN 6
ELSE 7 END
, id DESC limit 1;
I tried adding a where id > '$id' or where id < '$id' clause to the query, but it didn't give me the next or previous ID I was looking for.
For those who need some explanation of what I am trying to do: I want to go to the next or previous order, in the order defined by the CASE expression, with a forward or backward button.
What it currently looks like:
-id- -order_status-
9399 order_accepted
9398 processing_order
9363 processing_order
9403 order_send_mailer
9318 order_send
9346 order_received
9345 order_received
9050 order_refunded
For example, the next ID for 9403 would be 9363 and the previous ID would be 9318.
Change your order_status into an enum column. This will save disk space and make sorting by order_status simpler and faster.
-- Add a new version of the column using an enum.
-- These strings are aliases for ordered numbers.
-- 'order_accepted' is 1, 'processing_order' is 2, etc.
alter table transactions add column enum_order_status enum(
'order_accepted',
'processing_order',
'order_send_mailer',
'order_send',
'order_received',
'order_refunded'
) not null;
-- Copy the status into the new enum column.
-- MySQL will translate the string into the number for you.
update transactions
set enum_order_status = order_status;
-- Drop the old column.
alter table transactions drop column order_status;
-- Rename the new enum column.
alter table transactions rename column enum_order_status to order_status;
-- Index it.
create index transactions_order_status on transactions(order_status);
-- Enjoy your vastly simplified and much faster query.
select *
from transactions
order by order_status, id desc
That's not actually necessary, but it makes everything much simpler.
With that out of the way, use the window functions lead and lag to refer to the previous and next rows in a query.
select
id, order_status,
lead(id) over w, lead(order_status) over w,
lag(id) over w, lag(order_status) over w
from transactions
window w as (order by order_status, id desc);
Note, window functions were added in MySQL 8. If you're using an older version I recommend upgrading ASAP; MySQL 8 has many big improvements. Otherwise you can simulate it with correlated subqueries and self-joins.
If you want the previous and next rows of a specific row, use the technique from this answer. We add row_numbers to the table in the desired order, and then fetch 9403 and its previous and next row by row number.
-- Add a row number to your table in the desired order.
with ordered_transactions as (
select
*, row_number() over w as rn
from transactions
window w as (order by order_status, id desc)
)
select *
from ordered_transactions
-- Find the row number for ID 9403, then add -1, 0, and 1.
-- If 9403 is row number 5 you'll fetch row numbers 4, 5, and 6.
where rn in (
select rn+i
from ordered_transactions ot
-- All this is doing is making us three "rows" where i = -1, 0, and 1.
cross join (SELECT -1 AS i UNION ALL SELECT 0 UNION ALL SELECT 1) cj
where ot.id = 9403
);
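If it helps to see the neighbour lookup outside SQL, here is a plain Python sketch of the same idea: sort by status rank, then by id descending, and take the rows on either side of the target. The function name neighbours and the STATUS_RANK table are assumptions for this example; the data is taken from the question. The first value corresponds to lag(id) and the second to lead(id) in the window query.

```python
# Rank of each status, mirroring the CASE expression in the question.
STATUS_RANK = {
    "order_accepted": 1,
    "processing_order": 2,
    "order_send_mailer": 3,
    "order_send": 4,
    "order_received": 5,
    "order_refunded": 6,
}


def neighbours(transactions, target_id):
    """Return (id before target, id after target) in display order.

    transactions is a list of (id, order_status) tuples. Unknown statuses
    sort last (rank 7), as in the ELSE branch of the original query.
    """
    ordered = sorted(
        transactions,
        key=lambda t: (STATUS_RANK.get(t[1], 7), -t[0]),
    )
    ids = [t[0] for t in ordered]
    pos = ids.index(target_id)
    before = ids[pos - 1] if pos > 0 else None
    after = ids[pos + 1] if pos + 1 < len(ids) else None
    return before, after


rows = [
    (9345, "order_received"),
    (9399, "order_accepted"),
    (9363, "processing_order"),
    (9050, "order_refunded"),
    (9403, "order_send_mailer"),
    (9398, "processing_order"),
    (9318, "order_send"),
    (9346, "order_received"),
]
before_9403, after_9403 = neighbours(rows, 9403)
```

For 9403 this gives 9363 on one side and 9318 on the other, matching the pair named in the question. Which of the two a button calls "next" is a naming decision for the UI.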

merging SQL statements and how can it affect processing time

Let's assume I have the following tables:
items table
item_id|view_count
item_views table
view_id|item_id|ip_address|last_view
What I would like to do is:
If the last view of the item with the given item_id by the given ip_address was more than an hour ago, I would like to increment the item's view_count in the items table and, as a result, get the item's view count. How I would normally do it:
q = SELECT count(*) FROM item_views WHERE item_id='item_id' AND ip_address='some_ip' AND last_view < current_time-60*60
if(q==1) then q = UPDATE items SET view_count = view_count+1 WHERE item_id='item_id'
//and finally get view_count of item
q = SELECT view_count FROM items WHERE item_id='item_id'
Here I used 3 SQL queries. How can I merge them into one SQL query? And how would that affect the processing time? Would it be faster or slower than the previous method?
I don't think your logic is correct for what you describe. The query:
SELECT count(*)
FROM item_views
WHERE item_id='item_id' AND
ip_address='some_ip' AND
last_view < current_time-60*60
counts the views that are older than your time frame. I think you want:
last_view > current_time-60*60
and then have if q = 0 on the next line.
MySQL is pretty good with the performance of not exists, so the following should work well:
update items
set view_count = view_count+1
WHERE item_id='item_id' and
not exists (select 1
from item_views
where item_id='item_id' AND
ip_address='some_ip' AND
last_view > current_time-60*60
)
It will work much better with an index on item_views(item_id, ip_address, last_view) and an index on items(item_id).
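Here is a small runnable sketch of the NOT EXISTS update, using Python's sqlite3 module as a stand-in for MySQL. It assumes last_view is stored as an integer number of seconds, and the names make_db, register_view, and the sample data are made up for the example. The follow-up SELECT is still a separate statement.

```python
import sqlite3


def make_db():
    """Small in-memory database: one item with 5 views, one recorded viewer."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, view_count INTEGER NOT NULL);
        CREATE TABLE item_views (
            view_id INTEGER PRIMARY KEY,
            item_id INTEGER,
            ip_address TEXT,
            last_view INTEGER  -- assumed: seconds since some epoch
        );
        CREATE INDEX item_views_lookup
            ON item_views (item_id, ip_address, last_view);
        INSERT INTO items VALUES (1, 5);
        INSERT INTO item_views (item_id, ip_address, last_view)
            VALUES (1, '10.0.0.1', 1000);
        """
    )
    return con


def register_view(con, item_id, ip, now, window=3600):
    """Increment view_count unless this IP viewed the item within the window.

    Returns the item's view_count after the (possible) update.
    """
    con.execute(
        """
        UPDATE items
        SET view_count = view_count + 1
        WHERE item_id = :item
          AND NOT EXISTS (SELECT 1
                          FROM item_views
                          WHERE item_id = :item
                            AND ip_address = :ip
                            AND last_view > :now - :window)
        """,
        {"item": item_id, "ip": ip, "now": now, "window": window},
    )
    row = con.execute(
        "SELECT view_count FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0]


con = make_db()
after_recent_view = register_view(con, 1, "10.0.0.1", now=2000)
after_old_view = register_view(con, 1, "10.0.0.1", now=4601)
```

Note that this sketch only reads item_views. Recording the new view (updating last_view) would be a further statement, which the question does not describe.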
In MySQL scripting, you could then write:
. . .
set view_count = (@q := view_count+1)
. . .
This would also give you the variable you are looking for.
update target
set target.view_count = target.view_count + 1
from items target
inner join (
select item_id
from item_views
where item_id = 'item_id'
and ip_address = 'some_ip'
and last_view < current_time - 60*60
) ref
on ref.item_id = target.item_id;
You can combine the update statement with the condition only by using a join, as in the example above, and you'll still need a separate select statement.
It may be slower on very large sets or unindexed tables.

How can you find ID gaps in a MySQL recordset?

The issue here is related to another question I had...
I have millions of records, and the ID of each record is auto-incremented. Unfortunately, a generated ID is sometimes thrown away, so there are many gaps between IDs.
I want to find the gaps, and re-use the ids that were abandoned.
What's an efficient way to do this in MySQL?
First of all, what advantage are you trying to get by reusing the skipped values? An ordinary INT UNSIGNED will let you count up to 4,294,967,295. With "millions of records" your database would have to grow a thousand times over before running out of valid IDs. (And then using a BIGINT UNSIGNED will bump you up to 18,446,744,073,709,551,615 values.)
Trying to recycle values MySQL has skipped is likely to use up a lot of your time trying to compensate for something that really doesn't bother MySQL in the first place.
With that said, you can find missing IDs with something like:
SELECT id + 1
FROM the_table
WHERE NOT EXISTS (SELECT 1 FROM the_table t2 WHERE t2.id = the_table.id + 1);
This will find only the first missing number in each gap (e.g., if you have {1, 2, 3, 8, 10} it will find {4, 9}, plus 11, which is simply one past the current maximum), but it's likely to be efficient, and of course once you've filled in an ID you can always run it again.
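A quick way to check what the NOT EXISTS query returns is to run it on a tiny table. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the function name first_missing_ids is made up. Note that the row with the highest id always qualifies too, so the result includes one value past the current maximum.

```python
import sqlite3


def first_missing_ids(ids):
    """Return id + 1 for every id whose successor is absent, in ascending order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO the_table VALUES (?)", [(i,) for i in ids])
    rows = con.execute(
        """
        SELECT id + 1
        FROM the_table
        WHERE NOT EXISTS (SELECT 1
                          FROM the_table t2
                          WHERE t2.id = the_table.id + 1)
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]


missing = first_missing_ids([1, 2, 3, 8, 10])
```

With the sample set {1, 2, 3, 8, 10} the gaps start at 4 and 9; the trailing 11 is just the next unused id and can be dropped if only interior gaps matter.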
The following will return a row for each gap in the integer field "n" in mytab:
/* cs will contain 1 row for each contiguous sequence of integers in mytab.n
and will have the start of that chain.
ce will contain the end of that chain */
create temporary table cs (row int auto_increment primary key, n int);
create temporary table ce like cs;
insert into cs (n) select n from mytab where n-1 not in (select n from mytab) order by n;
insert into ce (n) select n from mytab where n+1 not in (select n from mytab) order by n;
select ce.n + 1 as bgap, cs.n - 1 as egap
from cs, ce where cs.row = ce.row + 1;
If instead of the gaps you want the contiguous chains then the final select should be:
select cs.n as bchain, ce.n as echain from cs,ce where cs.row=ce.row;
This solution is better, in case you need to include the first element as 1:
SELECT
1 AS gap_start,
MIN(e.id) - 1 AS gap_end
FROM
factura_entrada e
WHERE
NOT EXISTS(
SELECT
1
FROM
factura_entrada
WHERE
id = 1
)
LIMIT 1
UNION
SELECT
a.id + 1 AS gap_start,
MIN(b.id)- 1 AS gap_end
FROM
factura_entrada AS a,
factura_entrada AS b
WHERE
a.id < b.id
GROUP BY
a.id
HAVING
gap_start < MIN(b.id);
If you are using MariaDB, you have a faster option:
SELECT * FROM seq_1_to_50000 where seq not in (select col from table);
docs: https://mariadb.com/kb/en/mariadb/sequence/

Producing multiple maximum and minimum values with SQL Query

I'm becoming frustrated with a curious limitation of SQL - its apparent inability to relate one record to another outside of aggregate functions. My problem is summarized thusly.
I have a table, already sorted. I need to find its maximum values (note the plural!) and minimum values. No, I am not looking for a single maximum or single minimum. More specifically I'm trying to generate a list of the local peaks of a numeric sequence. A rough description of an algorithm to generate this is:
WHILE NOT END_OF_TABLE
IF RECORD != FIRST_RECORD AND RECORD != LAST_RECORD THEN
IF ((RECORD(Field)<RECORD_PREVIOUS(Field) AND RECORD(Field)<RECORD_NEXT(Field)) OR
RECORD(Field)>RECORD_PREVIOUS(Field) AND RECORD(Field)>RECORD_NEXT(Field)) THEN
ADD_RESULT RECORD
END IF
END IF
END WHILE
See the problem? I need a query in which a given record is compared against the values of the previous and next records. Can this even be accomplished in standard SQL?
Your frustration is shared by many; while SQL is great for working with general sets, it's terribly deficient when trying to work with issues specific to ordered sets (whether it's physically ordered in the table or there is an implicit or explicit logical order is irrelevant). There are some things that can help (for example, the rank() and row_number() functions), but the solutions can differ across RDBMS's.
If you can be specific about which platform you're working with, I or someone else can provide a more detailed answer.
You have to self-join twice and generate a rownumber without gaps:
In T-SQL:
WITH ordered AS (
SELECT ROW_NUMBER() OVER (ORDER BY your_sort_order) AS RowNumber
,* -- other columns here
FROM your_table -- placeholder: the table being analysed
)
SELECT *
FROM ordered
LEFT JOIN ordered AS prev
ON prev.RowNumber = ordered.RowNumber - 1
LEFT JOIN ordered AS next
ON next.RowNumber = ordered.RowNumber + 1
WHERE -- here you put in your local min/local max and end-point handling logic - end points will have NULL in next/prev
Yes. You need a self join - but without a database schema, it's hard to be specific about the solution.
Specifically, I'm wondering about the "ordering" thing you mention - but I'm going to assume there's an "ID" field we can use for this.
(Oh, and I'm using old-school join syntax, coz I'm a dinosaur).
select *
from myTable main,
myTable previous,
myTable next
where previous.id = main.id - 1
and next.id = main.id + 1
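and ((previous.record > main.record and next.record > main.record) -- local minimum
or (previous.record < main.record and next.record < main.record)) -- local maximum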
(I think I've interpreted your requirement correctly in the greater/less than clauses, but adjust to taste).
SELECT
current.RowID,
current.Value,
CASE WHEN
(
(current.Value < COALESCE(previous.Value, current.Value + 1))
AND
(current.Value < COALESCE(subsequent.Value, current.Value + 1))
)
THEN
'Minima'
ELSE
'Maxima'
END
FROM
myTable current
LEFT JOIN
myTable previous
ON previous.RowID = (SELECT MAX(RowID) FROM myTable WHERE RowID < current.ROWID)
LEFT JOIN
myTable subsequent
ON subsequent.RowID = (SELECT MIN(RowID) FROM myTable WHERE RowID > current.ROWID)
WHERE
(
(current.Value < COALESCE(previous.Value, current.Value + 1))
AND
(current.Value < COALESCE(subsequent.Value, current.Value + 1))
)
OR
(
(current.Value > COALESCE(previous.Value, current.Value - 1))
AND
(current.Value > COALESCE(subsequent.Value, current.Value - 1))
)
Note: The < and > logic is copied from you, but does not cater for local maxima/minima that are equal across one or more consecutive records.
Note: I've created a fictional RowID to join the records in order; all that is important is that the joins get the "previous" and "subsequent" records.
Note: The LEFT JOINs and COALESCE expressions cause the first and last values to always be counted as a maximum or minimum.
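For comparison, here is a small runnable sketch of the self-join approach, following the asker's pseudocode (so the first and last records are skipped rather than counted). It uses Python's sqlite3 module as a stand-in database and assumes a gap-free RowID, so the neighbours are simply RowID - 1 and RowID + 1. The function name local_extrema and the aliases cur, prev, and nxt are made up for the example.

```python
import sqlite3


def local_extrema(values):
    """Return (RowID, Value, 'min' | 'max') for strict interior extrema.

    RowIDs are assigned 1..n in input order. Inner joins drop the first
    and last rows, because they lack one of the two neighbours.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (RowID INTEGER PRIMARY KEY, Value INTEGER)")
    con.executemany(
        "INSERT INTO myTable VALUES (?, ?)", list(enumerate(values, start=1))
    )
    rows = con.execute(
        """
        SELECT cur.RowID,
               cur.Value,
               CASE WHEN cur.Value < prev.Value THEN 'min' ELSE 'max' END
        FROM myTable AS cur
        JOIN myTable AS prev ON prev.RowID = cur.RowID - 1
        JOIN myTable AS nxt  ON nxt.RowID  = cur.RowID + 1
        WHERE (cur.Value < prev.Value AND cur.Value < nxt.Value)
           OR (cur.Value > prev.Value AND cur.Value > nxt.Value)
        ORDER BY cur.RowID
        """
    ).fetchall()
    con.close()
    return rows


extrema = local_extrema([5, 1, 4, 2, 6])
```

As with the query above, plateaus (equal consecutive values) are not reported, because the comparisons are strict.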