MySQL Procedure / MySQL Function

I am still relatively new to MySQL and am stuck on a bit of data engineering.
I have a table with the following columns:
Event_ID, Minutes, EventCode
I have multiple rows with the same Event_ID, each recording which event occurred (EventCode) and when it occurred in minutes (Minutes).
What I want to do is output to a new table the sequence of events based on the minutes for an event_id:
Eg:
Source:
Event_ID, Minutes, EventCode
12, 45, A
12, 49, B
12, 78, A
Would be transformed into:
12, 45, A, 1
12, 49, B, 2
12, 78, A, 3
So the last column shows the sequence. Although it can be assumed that the source table is sorted by event_id followed by minutes, I would prefer a solution that works even if it is unsorted, if possible.
Some pointers would be great!
Thanks

In MySQL 8 and higher you can use the ROW_NUMBER() window function.
SELECT event_id,
minutes,
eventcode,
row_number() OVER (PARTITION BY event_id
ORDER BY minutes)
FROM elbat;
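If the goal is to materialize the sequence into a new table, the same query can feed a CREATE TABLE ... AS SELECT. This is only a sketch: the target table name events_sequenced and the alias seq are made up, and elbat stands for your source table as above.
-- Sketch only: builds a new table holding the per-event sequence numbers
CREATE TABLE events_sequenced AS
SELECT event_id,
       minutes,
       eventcode,
       ROW_NUMBER() OVER (PARTITION BY event_id
                          ORDER BY minutes) AS seq
FROM elbat;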

Try this query:
select event_id, minutes, eventcode, @rownum := @rownum + 1 AS No
from elbat, (SELECT @rownum := 0) r;

Related

Select with Inner Join operator and IN (long results) - Optimize Query

I have a MySQL query that looks for results in two different tables.
Tables
Contract
id, contract, creditor_id, client_id, event_id
Invoice
id, contract_id, invoice, due, value
The idea is to select the contracts using some parameters in the query, such as:
initial and final delay, initial and final value, events, creditor.
For this, I use the INNER JOIN, HAVING and IN.
Details:
After receiving the result, I take the values and loop over them to run an update on each result row, using its ID.
I built an example in SQL Fiddle for better visualization.
The problem is that when this query returns very long results or thousands of rows, it is really slow.
So I wanted to know whether there is a better, more optimal way to write the same query.
Query:
SELECT `c`.`id`,
`c`.`contract`,
`c`.`creditor_id`,
`c`.`client_id`,
`c`.`event_id`,
`t`.`total_value`,
`delay`
FROM `contract` `c`
INNER JOIN
(SELECT contract_id,
Sum(value) total_value,
Datediff(Curdate(), due) AS delay
FROM invoice t GROUP BY contract_id
HAVING delay <= 99999
AND delay >= 1
AND total_value >= 1
AND total_value < 99999) t ON `t`.`contract_id` = `c`.`id`
WHERE `c`.`creditor_id` = 1
AND `c`.`event_id` IN(4, 7, 5, 8, 13, 3, 6, 15, 2, 24, 1, 21, 20, 14, 17, 18, 16, 23, 25, 22, 9, 10, 26, 12, 19, 11)
If "1..99999" means "any value", then remove the test from the query. That is construct a different query when the user wants an open-ended test.
Deal with the lack of due in the GROUP BY; one way to handle it is sketched after these pointers.
Change Datediff(Curdate(), due) > 123 to due < CURDATE() - INTERVAL 123 DAY. That will give us a chance to use due in an INDEX.
Qualify due and value; we can't tell which table they are in.
Please provide SHOW CREATE TABLE.
c could use INDEX(creditor_id, event_id), but after the above issues are addressed, there may be an even better index.
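A rough sketch of the derived-table part with those pointers applied, assuming due and value both live in invoice and that 1..99999 really meant "no bound". MAX(due) is only one way to resolve the ungrouped due; the asker has to decide which invoice date the delay should be measured from.
-- Sketch only: open-ended bounds are dropped and the ungrouped `due`
-- is resolved via MAX(due).
SELECT i.contract_id,
       SUM(i.value)                    AS total_value,
       DATEDIFF(CURDATE(), MAX(i.due)) AS delay
FROM invoice i
GROUP BY i.contract_id
HAVING delay >= 1          -- keep only the bounds the user actually set
   AND total_value >= 1;
-- With a hard cutoff such as "delay > 123", prefer a sargable filter:
--   WHERE i.due < CURDATE() - INTERVAL 123 DAY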

MySQL - Pull most recent value within date range for group of IDs

I have the query below
SELECT SUM(CAST(hd.value AS SIGNED)) as case_count
FROM historical_data hd
WHERE hd.tag_id IN (45,109,173,237,301,365,429)
AND hd.shift = 1
AND hd.timestamp BETWEEN '2018-04-10' AND '2018-04-11'
ORDER BY TIMESTAMP DESC
and with this I'm trying to select a SUM of the value for each of the IDs passed, during the time frame in the BETWEEN statement, but only the most recent value within that timeframe. So the end result would be a SUM of the case_count values for each ID passed in, taken at the last timestamp that ID has in that date range.
I am having trouble figuring out HOW to accomplish this. My historical_data table is HUGE, however I do have very specific indexing on it that allows the queries to function fairly well - as well as partitioning on the table by YEAR.
Can anyone provide a pointer on how to get the data I need? I'd rather not loop over the list of IDs and run this query without the SUM and a LIMIT 1, but I guess I can if that's the only way.
Here is one method:
SELECT SUM(CAST(hd.value AS SIGNED)) as case_count
FROM historical_data hd
WHERE hd.tag_id IN (45, 109, 173, 237, 301, 365, 429) AND
hd.shift = 1 AND
hd.timestamp = (SELECT MAX(hd2.timestamp)
FROM historical_data hd2
WHERE hd2.tag_id = hd.tag_id AND
hd2.shift = hd.shift AND
hd2.timestamp BETWEEN '2018-04-10' AND '2018-04-11'
);
The optimal index for this query is on historical_data(shift, tag_id, timestamp).
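In MySQL DDL that index could be created along these lines (the index name is just illustrative):
-- Equality columns (shift, tag_id) first, then the range/MAX column (timestamp)
CREATE INDEX idx_shift_tag_ts
    ON historical_data (shift, tag_id, timestamp);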

Query Database Accurately Based on Timestamp

I am currently having an accuracy issue when querying price vs. time in a Google BigQuery dataset. What I would like is the price of an asset every five minutes, yet some assets have an empty row for the exact minute.
For example, with VEN vs. ICX, which are two cryptocurrencies, there might be a moment for which price data is not available for a specific second. In my query I select the price every 300 seconds, yet some assets don't have a row at exactly 5 minutes and 0 seconds. In that case I would like to get the last known price: a good price to use would be the one at 4 minutes and 58 seconds.
My query right now is:
SELECT MIN(price) AS PRICE, timestamp
FROM [coin_data]
WHERE coin="BTCUSD" AND TIMESTAMP_TO_SEC(timestamp) % 300 = 0
GROUP BY timestamp
ORDER BY timestamp ASC
This query results in this sort of gap in specific places:
Row((10339.25, datetime.datetime(2018, 2, 26, 21, 55, tzinfo=<UTC>)))
Row((10354.62, datetime.datetime(2018, 2, 26, 22, 0, tzinfo=<UTC>)))
Row((10320.0, datetime.datetime(2018, 2, 26, 22, 10[should be 5 for 5 min], tzinfo=<UTC>)))
The last row should be at the 5-minute mark, not 10, since that is the minutes place.
To select the row that falls exactly on a 5-minute mark if it exists, or otherwise the closest existing entry, you can use (analytic) window functions (which use OVER()) instead of aggregate functions (which use GROUP BY), as follows:
group all rows into "separate" 5 minute groups
sort them by proximity to the desired time
select the first row from each partition.
Here the OVER clause creates the "window frames" and sorts the rows within them; RANK() then numbers the rows in each frame in that sorted order.
Standard SQL
WITH data AS (
  SELECT *,
         CAST(FLOOR(UNIX_SECONDS(timestamp)/300) AS INT64) AS timegroup
  FROM `coin_data`
)
SELECT MIN(price) AS min_price, timestamp
FROM (
  SELECT *,
         RANK() OVER(PARTITION BY timegroup ORDER BY timestamp ASC) AS rank
  FROM data
)
WHERE rank = 1
GROUP BY timestamp
ORDER BY timestamp ASC
Legacy SQL
SELECT MIN(price) AS min_price, timestamp
FROM (
SELECT *,
RANK() OVER(PARTITION BY timegroup ORDER BY timestamp ASC) AS rank,
FROM (
SELECT *,
INTEGER(FLOOR(TIMESTAMP_TO_SEC(timestamp)/300)) AS timegroup
FROM [coin_data]) AS data )
WHERE rank = 1
GROUP BY timestamp
ORDER BY timestamp ASC
It seems that you have many prices for the same timestamp, in which case you may want to add another field to the OVER clause:
OVER(PARTITION BY timegroup, exchange ORDER BY timestamp ASC)
Notes:
Consider migrating to Standard SQL, which is the preferred SQL dialect for querying data stored in BigQuery. You can do that on a per-query basis, so you don't have to migrate everything at the same time.
The idea was to provide a general query that illustrates the principle, so I don't filter out empty rows; it isn't clear whether they are NULL or an empty string, and it isn't necessary for the answer.

Long running MYSQL query, Frame Table and View with grouping

I have a table t_date_interval_30 that is the Cartesian product of a calendar year of dates (365 days) and a time field incremented in 30-minute intervals. I use it as a framework to hang call data on.
t_date_interval_30
DATE, DAYNAME, INTERVAL
'2013-01-01', 'Tuesday', '00:00:00'
'2013-01-01', 'Tuesday', '00:30:00'
'2013-01-01', 'Tuesday', '01:00:00'
'2013-01-01', 'Tuesday', '01:30:00'
'2013-01-01', 'Tuesday', '02:00:00'
'2013-01-01', 'Tuesday', '02:30:00'
ETC...
Next I have a view v_call_details that is a summarized view of the call data. Call data is summarized down to one row per call session initiated; the source for it can have multiple rows per call session, e.g. a call rolls Ring No Answer from one target to another and each leg of the call adds a new record row.
v_call_details
CLIENT, CSQ, SESS_ID, DATE, CALL_START, CONT_DISP, MET_SLA
'Acme','ACME_CSQ','123-123456789-01','2013-01-01','2013-01-01 00:12:34','ABANDONED',TRUE
'Acme','ACME_CSQ','123-123456998-01','2013-01-01','2013-01-01 00:45:02','HANDLED',TRUE
'Acme','ACME_CSQ','123-123457291-01','2013-01-02','2013-01-02 13:31:58','HANDLED',FALSE
ETC...
So, when I run the below query it takes forever.
SELECT
cd.`client`,
cd.`csq`,
di.`date`,
di.`dayname`,
di.`interval`,
count(cd.`sess_id`) AS `calls`,
(count(cd.`sess_id`) - sum(IF(cd.`cont_disp` = 'ABANDONED'
AND cd.`met_sla` > 0,
1,
0))) AS `presented`
FROM
t_date_interval_30 di
LEFT JOIN
v_call_details cd ON (di.`date` = cd.`date`
AND di.`interval` = SEC_TO_TIME((TIME_TO_SEC(cd.`call_start`) DIV 1800) * 1800))
WHERE
di.`date` BETWEEN '2013-05-01' AND '2013-05-02'
GROUP BY cd.`csq`, di.`date`, di.`interval`
I have never really worked with indexes (though I have tried adding a few to the DATE values and CALL_START values). When I run an EXPLAIN EXTENDED I get the below results.
id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, PRIMARY, di, range, i_date, i_date, 3, , 96, 100.00, Using where; Using temporary; Using filesort
1, PRIMARY, <derived2>, ALL, , , , , 153419, 100.00, ,
2, DERIVED, t_cisco_csq_agent_details, ALL , , , , 161925, 100.00, Using temporary; Using filesort
2, DERIVED, t_lkp_clients, ALL , , , , 56, 100.00, ,
Any advice would be greatly appreciated. Right now if I run the query, returning results for 2 days worth of data takes roughly 70 seconds. At that rate, doing a 90 day report will take an hour and a half... I need to find a way to bring that down.
First, don't assume that 90 days worth of data will require 45 times the effort of 2 days. Your query is doing a full scan of the call details table, and this may account for much of the effort. MySQL can propagate the condition on date from di to cd through the equijoin. I'm not sure if it does in this case (because of the second condition).
Second, you are using a view. That might make it impossible to actually improve performance. You can try, but you should try to write the query without the view.
My next question is how long does this take to run:
select cd.csq, cd.`date`,
       SEC_TO_TIME((TIME_TO_SEC(cd.`call_start`) DIV 1800) * 1800) as `interval`,
       count(*)
from v_call_details cd
WHERE cd.`date` BETWEEN '2013-05-01' AND '2013-05-02'
GROUP BY cd.csq, cd.`date`, `interval`;
If this takes a reasonable amount of time, then test it for 90 days. If that works, then you can do the aggregation first and then join back to the di table. This is just an idea. I suspect the real performance problem is in the view.
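A sketch of that aggregate-first idea, reusing the column names from the question (treat it as a starting point rather than a drop-in replacement):
-- Aggregate the call details on their own first, then join the much
-- smaller result back onto the interval frame.
SELECT di.`date`, di.`dayname`, di.`interval`,
       agg.csq, agg.calls, agg.presented
FROM t_date_interval_30 di
LEFT JOIN
    (SELECT cd.`csq`, cd.`date`,
            SEC_TO_TIME((TIME_TO_SEC(cd.`call_start`) DIV 1800) * 1800) AS `interval`,
            COUNT(cd.`sess_id`) AS calls,
            COUNT(cd.`sess_id`) - SUM(IF(cd.`cont_disp` = 'ABANDONED'
                                         AND cd.`met_sla` > 0, 1, 0)) AS presented
     FROM v_call_details cd
     WHERE cd.`date` BETWEEN '2013-05-01' AND '2013-05-02'
     GROUP BY cd.`csq`, cd.`date`, `interval`) agg
        ON agg.`date` = di.`date` AND agg.`interval` = di.`interval`
WHERE di.`date` BETWEEN '2013-05-01' AND '2013-05-02';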

SQL calculate SUM of part of the data

I have a table with columns date, subs, unsubs, where date is the primary key, and subs and unsubs are the numbers of subscribes and unsubscribes on that date.
I want to add a column with the total number of subscribers on each day; that is, when sorted by date, subsall = subsall of the previous row + subs - unsubs.
For example: the table is
----------------
2012-01-01, 5, 2
2012-01-02, 6, 1
2012-01-03, 4, 3
----------------
I want to add a new column and the table becomes
-------------------
2012-01-01, 5, 2, 3
2012-01-02, 6, 1, 8
2012-01-03, 4, 3, 9
-------------------
What is the MySQL command to do that?
You are looking for a cumulative sum. Try this:
SELECT date, subs, unsubs, (@csum := @csum + subs - unsubs) cumulative
FROM `table`, (SELECT @csum := 0) c
ORDER BY date
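On MySQL 8 or later the same running total can be written without user variables, using a window function (a sketch; `table` is the placeholder name from the question):
-- Running total of subs - unsubs, ordered by date
SELECT `date`, subs, unsubs,
       SUM(subs - unsubs) OVER (ORDER BY `date`) AS subsall
FROM `table`
ORDER BY `date`;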
You should create trigger on INSERT action, please see the manual: http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
You can just select the difference of the two columns and give it an alias of 'subsall' like this:
SELECT date, subs, unsubs, (subs - unsubs) AS subsall FROM `table`
SELECT (COUNT(subs) - COUNT(unsubs)) AS total_subscribers FROM `table`
GROUP BY DATE_FORMAT(date, '%d-%m-%Y')
Hope this works for you.