Optimize a query for calculating datetime difference - mysql

I have a SQL table:
+---------+----------+---------------------+---------------------+---------+
| id | party_id | begintime | endtime | to_meas |
+---------+----------+---------------------+---------------------+---------+
| 1395035 | 9255 | 2010-09-26 00:34:02 | 2010-09-26 03:56:20 | 0 |
| 1395036 | 8974 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 | 0 |
| 1395037 | 8974 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 | 0 |
| 1395038 | 8975 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 | 0 |
| 1395039 | 8975 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 | 0 |
| 1395040 | 8974 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 | 0 |
| 1395041 | 8975 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 | 0 |
| 1395042 | 8974 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 | 0 |
| 1395043 | 8975 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 | 0 |
| 1395044 | 8974 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 | 0 |
| 1395045 | 8975 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 | 0 |
| 1395046 | 8974 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 | 0 |
| 1395047 | 8975 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 | 0 |
| 1395048 | 8974 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 | 0 |
| 1395049 | 8975 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 | 0 |
| 1395050 | 8976 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 |
| 1395051 | 8977 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 |
| 1395052 | 8978 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 |
| 1395053 | 8979 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 |
| 1395054 | 8976 | 2009-07-25 10:47:14 | 2009-07-26 09:41:44 | 0 |
+---------+----------+---------------------+---------------------+---------+
...
I need to calculate the time between each row's begintime and the previous row's endtime, and set to_meas to 1 if this difference is greater than 30 minutes. Here is my attempt in MySQL:
update doses d set to_meas=1 where d.id in
(select a.id from party join (select * from doses) a
on party_id=a.party_id
left join (select * from doses) b
on party.id=b.party_id
and b.begintime=(select min(begintime)
from (select * from doses) c
where c.begintime > a.endtime)
and timestampdiff(minute, a.endtime, b.begintime) > 30
group by party.id);
This command runs practically forever. I also tried it with Python's pandas:
conn = engine.connect()
sql = '''
select doses.id, party_id, party.ml, begintime, endtime
from doses join party on party.id=doses.party_id
'''
df = pd.read_sql(con=conn, sql=sql)
measure = df.groupby('party_id', as_index=False).apply(
lambda x: x[pd.to_datetime(x['begintime']) -
pd.to_datetime(x.shift()['endtime']) > pd.to_timedelta('30 minutes')])
measure_ids = measure['id'].to_list()
measure_list = ','.join([str(x) for x in measure_ids])
conn.execute(
'update doses set to_meas=true where id in(%s)' % measure_list)
The last statement takes about 10 seconds. Is there a way to optimize the SQL so that it runs as fast as the pandas version?

In MySQL 8.0, you can select the result you want with window functions, like so:
select d.*,
(begintime > lag(endtime) over(partition by party_id order by endtime) + interval 30 minute) as to_meas
from doses d
In earlier versions:
select d.*,
(
begintime > (
select max(endtime) + interval 30 minute
from doses d1
where d1.party_id = d.party_id and d1.endtime < d.endtime
)
) as to_meas
from doses d
I would not recommend storing such derived information. You can use the query, or create a view. But if you really insist on an update:
update doses d
inner join (
select id,
(
begintime > (
select max(endtime) + interval 30 minute
from doses d1
where d1.party_id = d.party_id and d1.endtime < d.endtime
)
) as to_meas
from doses d
) d1 on d1.id = d.id
set d.to_meas = d1.to_meas
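As a rough illustration of the "use a view instead of storing the flag" suggestion, the sketch below runs the correlated-subquery version through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption made so the example is self-contained): `datetime(x, '+30 minutes')` replaces MySQL's `+ interval 30 minute`, and the view name `doses_flagged` is hypothetical. The rows are a subset of the sample table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doses (
    id INTEGER PRIMARY KEY,
    party_id INTEGER,
    begintime TEXT,
    endtime TEXT
);
INSERT INTO doses VALUES
    (1395036, 8974, '2009-07-10 11:00:00', '2009-07-10 21:30:00'),
    (1395037, 8974, '2009-07-10 23:14:00', '2009-07-11 08:48:00'),
    (1395044, 8974, '2009-07-13 21:57:37', '2009-07-15 08:25:45'),
    (1395046, 8974, '2009-07-15 08:51:25', '2009-07-16 10:29:13');

-- Hypothetical view: the flag is computed on read, never stored.
CREATE VIEW doses_flagged AS
SELECT d.*,
       d.begintime > (
           SELECT datetime(max(d1.endtime), '+30 minutes')
           FROM doses d1
           WHERE d1.party_id = d.party_id AND d1.endtime < d.endtime
       ) AS to_meas
FROM doses d;
""")

# to_meas is NULL (None) for the first row of a party: it has no previous endtime.
flags = dict(conn.execute("SELECT id, to_meas FROM doses_flagged"))
print(flags)
```

Row 1395046 starts less than 30 minutes after the previous endtime, so it is the only row flagged 0 in this subset.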

You can update your data using a correlated subquery as follows:
Update doses d
Set to_meas = 1
Where begintime > (select max(dd.endtime) + interval '30' minute
From doses dd where dd.begintime < d.begintime
And dd.party_id = d.party_id)

If you want to update the data, you can use window functions in the update:
update doses d join
(select d.*,
lag(d.endtime) over (partition by d.party_id order by d.endtime) as prev_endtime
from doses d
) dd
on d.id = dd.id and
d.begintime > dd.prev_endtime + interval 30 minute
set to_meas = 1;
Then, for this query, you want an index on doses(party_id, endtime). I assume that id is already declared as a primary key.
Note: With this index, you might find it faster simply to calculate the value on the fly rather than storing it in the table.
EDIT:
In older versions of MySQL, you can phrase this as:
update doses d join
(select d.*,
(select max(d2.endtime)
from doses d2
where d2.party_id = d.party_id and
d2.endtime < d.endtime
) as prev_endtime
from doses d
) dd
on d.id = dd.id and
d.begintime > dd.prev_endtime + interval 30 minute
set to_meas = 1;
You have relatively few rows per party_id so a correlated query seems reasonable. This also needs an index on (party_id, endtime).
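To make the index advice concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption; the index name `idx_doses_party_endtime` is hypothetical). It creates the composite index on (party_id, endtime) and looks up each row's previous endtime with a correlated subquery. The printed plan is SQLite's, whose format differs from MySQL's EXPLAIN and varies by version, so treat it only as a way to inspect whether an index is picked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doses (
    id INTEGER PRIMARY KEY,
    party_id INTEGER,
    begintime TEXT,
    endtime TEXT,
    to_meas INTEGER DEFAULT 0
);
INSERT INTO doses (id, party_id, begintime, endtime) VALUES
    (1395042, 8974, '2009-07-12 20:38:27', '2009-07-13 20:33:21'),
    (1395044, 8974, '2009-07-13 21:57:37', '2009-07-15 08:25:45'),
    (1395046, 8974, '2009-07-15 08:51:25', '2009-07-16 10:29:13'),
    (1395043, 8975, '2009-07-12 20:38:27', '2009-07-13 20:33:21');
-- Composite index: equality on party_id first, then a range on endtime.
CREATE INDEX idx_doses_party_endtime ON doses (party_id, endtime);
""")

prev_end_sql = """
SELECT d.id,
       (SELECT max(d2.endtime)
        FROM doses d2
        WHERE d2.party_id = d.party_id AND d2.endtime < d.endtime) AS prev_endtime
FROM doses d
ORDER BY d.id
"""
prev_end = dict(conn.execute(prev_end_sql))

# Inspect the plan chosen by SQLite (not MySQL) for the lookup.
for row in conn.execute("EXPLAIN QUERY PLAN " + prev_end_sql):
    print(row)
```

The column order matters: party_id comes first because it is compared with equality, and endtime second because it is compared with a range.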

Related

Nested JOIN to create custom dynamic columns

I have a table veicoli (vehicles) like this:
-------------------------------
| ID | Modello | Targa |
-------------------------------
| 1 | IVECO | XA123WE |
-------------------------------
| 2 | IVECO | CF556XD |
-------------------------------
| 3 | FIAT | AS332ZZ |
-------------------------------
| 4 | GOLF | GF567YU |
-------------------------------
For each vehicle I have zero, one, or multiple revisioni_veicolo (revisions). The one with the latest DateExpiring is the one I need to check to see whether the revision is still valid as of today's date:
-------------------------------------------------------------------
| ID | veicoli_ID | DateExpiring | Pass_Success |
-------------------------------------------------------------------
| 1 | 1 | 2019-07-01 | 1 |
-------------------------------------------------------------------
| 2 | 1 | 2020-10-01 | 0 |
-------------------------------------------------------------------
| 3 | 2 | 2019-11-25 | 1 |
-------------------------------------------------------------------
| 4 | 2 | 2018-10-20 | 1 |
-------------------------------------------------------------------
| 5 | 4 | 2017-10-20 | 1 |
-------------------------------------------------------------------
Based on my example above (today is 2019-10-29):
Vehicle: ID = 1 has a revision still active (2020-10-01) but not passed (Pass_success = 0)
Vehicle: ID = 2 has a revision still active (2019-11-25) and passed (Pass_success = 1)
Vehicle: ID = 3 has no revision yet
Vehicle: ID = 4 has revision, but no active revision (last expired on 2017-10-20) but the last one passed the check (Pass_success = 1)
What I need is to have 3 new custom columns created dynamically on my query result:
-------------------------------------------------------------------------------------------
| ID | Modello | Targa | RevisionPresent | RevisionStillActive | LastRevisionPassed |
-------------------------------------------------------------------------------------------
| 1 | IVECO | XA123WE | true | true | false |
-------------------------------------------------------------------------------------------
| 2 | IVECO | CF556XD | true | true | true |
-------------------------------------------------------------------------------------------
| 3 | FIAT | AS332ZZ | false | false | false |
-------------------------------------------------------------------------------------------
| 4 | GOLF | GF567YU | true | false | true |
-------------------------------------------------------------------------------------------
I tried to start from my old post: MYSQL INNER JOIN to get 3 types of result
But I'm very confused by nested JOINs.
I started a fiddle, but I'm stuck on a syntax error: http://sqlfiddle.com/#!9/3c70bf/2
You need a LEFT JOIN of the tables and conditional aggregation:
select v.ID, v.Modello, v.Targa,
max(r.DataScadenzaRevisione is not null) RevisionPresent,
coalesce(max(r.DataScadenzaRevisione >= current_date()), 0) RevisionStillActive,
max(case when r.DataScadenzaRevisione = g.maxdate then r.EsitoPositivo else 0 end) LastRevisionPassed
from veicoli v
left join revisioni_veicolo r on r.veicoli_ID = v.id
left join (
select veicoli_id, max(DataScadenzaRevisione) maxdate
from revisioni_veicolo
group by veicoli_id
) g on g.veicoli_ID = v.id
group by v.ID, v.Modello, v.Targa
See the demo.
Results:
| ID | Modello | Targa | RevisionPresent | RevisionStillActive | LastRevisionPassed |
| --- | ------- | ------- | --------------- | ------------------- | ------------------ |
| 1 | IVECO | XA123WE | 1 | 1 | 0 |
| 2 | IVECO | CF556XD | 1 | 1 | 1 |
| 3 | FIAT | AS332ZZ | 0 | 0 | 0 |
| 4 | GOLF | GF567YU | 1 | 0 | 1 |
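A self-contained way to try the LEFT JOIN plus conditional aggregation idea is sketched below with Python's sqlite3 module, used here as a stand-in for MySQL (an assumption). To keep the output reproducible, the current date is passed as a parameter (`:today`, fixed to the 2019-10-29 used in the question) instead of calling `current_date()`. The data is the question's sample, using the column names from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE veicoli (ID INTEGER PRIMARY KEY, Modello TEXT, Targa TEXT);
CREATE TABLE revisioni_veicolo (
    ID INTEGER PRIMARY KEY, veicoli_ID INTEGER,
    DataScadenzaRevisione TEXT, EsitoPositivo INTEGER);
INSERT INTO veicoli VALUES
    (1, 'IVECO', 'XA123WE'), (2, 'IVECO', 'CF556XD'),
    (3, 'FIAT', 'AS332ZZ'), (4, 'GOLF', 'GF567YU');
INSERT INTO revisioni_veicolo VALUES
    (1, 1, '2019-07-01', 1), (2, 1, '2020-10-01', 0),
    (3, 2, '2019-11-25', 1), (4, 2, '2018-10-20', 1),
    (5, 4, '2017-10-20', 1);
""")

query = """
SELECT v.ID,
       max(r.DataScadenzaRevisione IS NOT NULL) AS RevisionPresent,
       coalesce(max(r.DataScadenzaRevisione >= :today), 0) AS RevisionStillActive,
       max(CASE WHEN r.DataScadenzaRevisione = g.maxdate
                THEN r.EsitoPositivo ELSE 0 END) AS LastRevisionPassed
FROM veicoli v
LEFT JOIN revisioni_veicolo r ON r.veicoli_ID = v.ID
LEFT JOIN (SELECT veicoli_ID, max(DataScadenzaRevisione) AS maxdate
           FROM revisioni_veicolo GROUP BY veicoli_ID) g ON g.veicoli_ID = v.ID
GROUP BY v.ID
ORDER BY v.ID
"""
# Each vehicle collapses to one row; vehicle 3 has no revisions at all.
rows = conn.execute(query, {"today": "2019-10-29"}).fetchall()
print(rows)
```

The LEFT JOIN keeps vehicle 3, and the aggregates turn its all-NULL joined row into zeros rather than dropping it.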
...
LEFT JOIN (SELECT a.veicoli_ID, a.EsitoPositivo AS StatoUltimaRevisione,
a.DataScadenzaRevisione FROM revisioni_veicolo) a
...
There are two things wrong with this:
The alias a is defined for this subquery, so you can't reference it inside the subquery. But you don't need to qualify the columns in this subquery anyway - you didn't do this in other subqueries, so I'm not sure why you did it in this case.
You don't have any join condition for this join. MySQL is a little bit inconsistent about when join conditions are required. But in this case, you need one.
After I tested the query with these two corrections, it works.
Basically, you just need to look at the last revision of each vehicle to produce that result set.
You can do the filtering with a correlated subquery:
select
v.ID,
v.Modello,
v.Targa,
(DataScadenzaRevisione >= now()) RevisionPresent,
(DataScadenzaRevisione >= now() and EsitoPositivo = 1) RevisionStillActive,
(EsitoPositivo = 1) LastRevisionPassed
from
veicoli v
left join revisioni_veicolo r
on r.veicoli_ID = v.ID
and r.DataScadenzaRevisione = (
select max(DataScadenzaRevisione)
from revisioni_veicolo r1
where r1.veicoli_ID = v.ID
)
You can check the results with your sample data in this db fiddle.
Or you can use a window function (this requires MySQL 8.0):
select
v.ID,
v.Modello,
v.Targa,
(DataScadenzaRevisione >= now()) RevisionPresent,
(DataScadenzaRevisione >= now() and EsitoPositivo = 1) RevisionStillActive,
(EsitoPositivo = 1) LastRevisionPassed
from (
select
v.ID,
v.Modello,
v.Targa,
r.DataScadenzaRevisione,
r.EsitoPositivo,
row_number() over(partition by v.ID order by r.DataScadenzaRevisione desc) rn
from veicoli v
left join revisioni_veicolo r on r.veicoli_ID = v.ID
) v where rn = 1

Using left join with min

I am trying to connect two tables with left join and a date.
My SQL Query
SELECT
ord.`ordernumber` bestellnummer,
his.`change_date` zahldatum
FROM
`s_order` ord
LEFT JOIN
`s_order_history` his ON ((ord.`id`=his.`orderID`) AND (ord.`cleared`=his.`payment_status_id`)) #AND MIN(his.`change_date`)
WHERE
ord.`ordertime` >= \''.$dateSTART.'\' AND ord.`ordertime` <= \''.$dateSTOP.'\'' ;
s_order
+----+---------------------+---------+-------------+
| id | ordertime | cleared | ordernumber |
+----+---------------------+---------+-------------+
| 1 | 2014-08-11 19:53:43 | 2 | 123 |
| 2 | 2014-08-15 18:33:34 | 2 | 125 |
+----+---------------------+---------+-------------+
s_order_history
+----+-------------------+-----------------+---------+---------------------+
| id | payment_status_id | order_status_id | orderID | change_date |
+----+-------------------+-----------------+---------+---------------------+
| 1 | 1 | 5 | 1 | 2014-08-11 20:53:43 |
| 2 | 2 | 5 | 1 | 2014-08-11 22:53:43 |
| 3 | 2 | 7 | 1 | 2014-08-12 19:53:43 |
| 4 | 1 | 5 | 2 | 2014-08-15 18:33:34 |
| 5 | 1 | 6 | 2 | 2014-08-16 18:33:34 |
| 6 | 2 | 6 | 2 | 2014-08-17 18:33:34 |
+----+-------------------+-----------------+---------+---------------------+
Wanted result:
+-------------+---------------------+
| ordernumber | change_date |
+-------------+---------------------+
| 123 | 2014-08-11 22:53:43 |
| 125 | 2014-08-17 18:33:34 |
+-------------+---------------------+
The problem is getting only the date on which the cleared/payment_status_id value was changed in s_order. I currently get all dates where the payment_status_id matches the current cleared value, but I only need the one where it happened first.
This is only an excerpt of the actual query; the original is a lot longer (mostly more left joins and a lot more tables).
You can group data by ordernumber
SELECT
ord.`ordernumber` bestellnummer,
MIN(his.`change_date`) as zahldatum
FROM
`s_order` ord
LEFT JOIN
`s_order_history` his ON ((ord.`id`=his.`orderID`) AND (ord.`cleared`=his.`payment_status_id`)) #AND MIN(his.`change_date`)
WHERE
ord.`ordertime` >= \''.$dateSTART.'\' AND ord.`ordertime` <= \''.$dateSTOP.'\''
GROUP BY
ord.`ordernumber`;
or you can group data in a subquery:
SELECT
ord.`ordernumber` bestellnummer,
his.`min_change_date` zahldatum
FROM
`s_order` ord
LEFT JOIN (
SELECT
orderID, payment_status_id, MIN(change_date) as min_change_date
FROM
s_order_history
GROUP BY
orderID, payment_status_id
) his ON (ord.`id` = his.`orderID` AND ord.`cleared` = his.`payment_status_id`)
WHERE
ord.`ordertime` >= \''.$dateSTART.'\' AND ord.`ordertime` <= \''.$dateSTOP.'\'';
Try this:
select s_order.ordernumber, min(s_order_history.change_date)
from s_order left join s_order_history
on s_order.id = s_order_history.orderID
and s_order.cleared = s_order_history.payment_status_id
group by s_order.ordernumber
SELECT ord.`ordernumber` bestellnummer,
MIN( his.`change_date` ) zahldatum
...
GROUP BY ord.`ordernumber`
MIN is an aggregate function, so you can't use it directly in a JOIN condition the way you tried above. You are also not comparing it to a value in your JOIN.
You'll want to do something like:
his.`change_date` = (SELECT MIN(h2.`change_date`) FROM `s_order_history` h2 WHERE h2.`orderID` = ord.`id` AND h2.`payment_status_id` = ord.`cleared`)
in your JOIN.
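A quick way to see the "compare against a MIN subquery in the JOIN" idea in action is sketched below, using Python's sqlite3 module as a stand-in for MySQL (an assumption) and the sample rows from the question. The alias `h2` is hypothetical. The date-range filter on ordertime is left out because the placeholders in the original query are PHP variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s_order (
    id INTEGER PRIMARY KEY, ordertime TEXT, cleared INTEGER, ordernumber INTEGER);
CREATE TABLE s_order_history (
    id INTEGER PRIMARY KEY, payment_status_id INTEGER, order_status_id INTEGER,
    orderID INTEGER, change_date TEXT);
INSERT INTO s_order VALUES
    (1, '2014-08-11 19:53:43', 2, 123),
    (2, '2014-08-15 18:33:34', 2, 125);
INSERT INTO s_order_history VALUES
    (1, 1, 5, 1, '2014-08-11 20:53:43'),
    (2, 2, 5, 1, '2014-08-11 22:53:43'),
    (3, 2, 7, 1, '2014-08-12 19:53:43'),
    (4, 1, 5, 2, '2014-08-15 18:33:34'),
    (5, 1, 6, 2, '2014-08-16 18:33:34'),
    (6, 2, 6, 2, '2014-08-17 18:33:34');
""")

# The aggregate lives in a correlated subquery; the JOIN compares against its value.
rows = conn.execute("""
SELECT ord.ordernumber, his.change_date
FROM s_order ord
LEFT JOIN s_order_history his
       ON his.orderID = ord.id
      AND his.payment_status_id = ord.cleared
      AND his.change_date = (
            SELECT min(h2.change_date)
            FROM s_order_history h2
            WHERE h2.orderID = ord.id
              AND h2.payment_status_id = ord.cleared)
ORDER BY ord.ordernumber
""").fetchall()
print(rows)
```

The subquery has to repeat the payment_status_id filter; otherwise the minimum would be taken over all history rows of the order, including those with a different status.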

How to determine daily accumulated values in mysql for each sample?

I've got a mysql table that has a running total:
+---------------------+--------+
| Timestamp | Total |
+---------------------+--------+
| 2012-07-04 05:35:00 | 1.280 | 1.280-1.280 = 0
| 2012-07-04 09:25:00 | 2.173 | 2.173-1.280 = 0.893
| 2012-07-04 09:30:00 | 2.219 | 2.219-1.280 = 0.939
| 2012-07-04 15:00:00 | 7.778 | 7.778-1.280 = 6.498
| 2012-07-04 21:05:00 | 13.032 | 13.032-1.280 = 11.752
| 2012-07-04 22:00:00 | 13.033 | 13.033-1.280 = 11.753
| 2012-07-05 05:20:00 | 13.033 | 13.033-13.033 = 0
| 2012-07-05 07:10:00 | 13.140 | 13.140-13.033 = 0.107
| 2012-07-05 10:15:00 | 14.993 | 14.993-13.033 = 1.960
| 2012-07-05 11:35:00 | 16.870 | 16.870-13.033 = 3.837
+---------------------+--------+
What I'm looking for is a query that determines the aggregated daily increase for each interval.
I've tried to show the desired outcome as well as the calculation behind each row. I've tried already several things with a join, but somehow I fail to determine what the starting value for each day is.
Thanks.
I can't vouch for the efficiency of this query, but it does get you the results you are looking for:
SELECT t1.`Timestamp`, t1.`Total`,
CASE WHEN t1.`timestamp` =
(SELECT MIN(t2.`Timestamp`)
FROM myTable t2
WHERE DATE(t2.`Timestamp`)=DATE(t1.`Timestamp`))
THEN 0
ELSE t1.`Total` - (SELECT MIN(t3.`Total`)
FROM myTable t3
WHERE DATE(t3.`Timestamp`)=DATE(t1.`Timestamp`))
END AS Diff
FROM myTable t1
ORDER BY `Timestamp`
Alternate Solution (more efficient I think)
SELECT t1.`Timestamp`, t1.`Total`, (t1.`Total` - d1.MinVal) diff
FROM myTable t1
INNER JOIN
(SELECT DATE(`Timestamp`) ts_date,
MIN(`Total`) AS MinVal
FROM myTable
GROUP BY ts_date) d1
ON DATE(t1.`Timestamp`) = d1.ts_date
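For checking the expected values outside the database, the per-day baseline idea can be sketched in plain Python. Note one deliberate difference, which is an assumption on my part: this sketch uses the first reading of each day as the baseline, whereas the queries above use MIN(Total); the two agree as long as the running total never decreases within a day. The function name `daily_increase` is hypothetical, and the samples are a subset of the question's table.

```python
from datetime import datetime

samples = [
    ("2012-07-04 05:35:00", 1.280),
    ("2012-07-04 09:25:00", 2.173),
    ("2012-07-04 22:00:00", 13.033),
    ("2012-07-05 05:20:00", 13.033),
    ("2012-07-05 07:10:00", 13.140),
]

def daily_increase(rows):
    """Return (timestamp, total, increase since that day's first reading)."""
    baseline = {}  # date -> total at the first reading of that date
    result = []
    for ts, total in sorted(rows):  # ISO timestamps sort chronologically
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        start = baseline.setdefault(day, total)
        # Round to 3 decimals to hide binary floating-point noise.
        result.append((ts, total, round(total - start, 3)))
    return result

diffs = [d for _, _, d in daily_increase(samples)]
print(diffs)
```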

Efficient assignment of percentile/rank in MYSQL

I have a couple of very large tables (over 400,000 rows) that look like the following:
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | NULL |
| 3684515 | 3.0476 | NULL |
| 3684516 | 2.6499 | NULL |
| 3684517 | 0.3585 | NULL |
| 3684518 | 1.6919 | NULL |
| 3684519 | 2.8515 | NULL |
| 3684520 | 4.0728 | NULL |
| 3684521 | 4.0224 | NULL |
| 3684522 | 5.8207 | NULL |
| 3684523 | 6.8291 | NULL |
+---------+--------+---------------+...about 400,000 more
I need to assign each row in the M1_Percentile column a value that represents "the percent of rows with M1 values less than or equal to the current row's M1 value".
In other words, I need: M1_Percentile = (number of rows with M1 <= the current row's M1) / (total number of rows) * 100
I implemented this successfully, but it is FAR FAR too slow. If anyone could create a more efficient version of the following code, I would really appreciate it!
UPDATE myTable AS X JOIN (
SELECT
s1.ID, COUNT(s2.ID)/ (SELECT COUNT(*) FROM myTable) * 100 AS percentile
FROM
myTable s1 JOIN myTable s2 on (s2.M1 <= s1.M1)
GROUP BY s1.ID
ORDER BY s1.ID) AS Z
ON (X.ID = Z.ID)
SET X.M1_Percentile = Z.percentile;
This is the (correct but slow) result from the above query if the number of rows is limited to the ones you see (10 rows):
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | 60 |
| 3684515 | 3.0476 | 50 |
| 3684516 | 2.6499 | 30 |
| 3684517 | 0.3585 | 10 |
| 3684518 | 1.6919 | 20 |
| 3684519 | 2.8515 | 40 |
| 3684520 | 4.0728 | 80 |
| 3684521 | 4.0224 | 70 |
| 3684522 | 5.8207 | 90 |
| 3684523 | 6.8291 | 100 |
+---------+--------+---------------+
Producing the same results for the entire 400,000 rows takes orders of magnitude longer.
I cannot test this, but you could try something like:
update table t
set mi_percentile = (
select count(*)
from table t1
where M1 < t.M1 / (
select count(*)
from table));
UPDATE:
update test t
set m1_pc = (
(select count(*) from test t1 where t1.M1 < t.M1) * 100 /
( select count(*) from test));
This works in Oracle (the only database I have available). I do remember getting that error in MySQL. It is very annoying.
Fair warning: mysql isn't my native environment. However, after a little research, I think the following query should be workable:
UPDATE myTable AS X
JOIN (
SELECT X.ID, (
SELECT COUNT(*)
FROM myTable X1
WHERE (X.M1, X.ID) >= (X1.M1, X1.ID)) AS Rnk
FROM myTable AS X
) AS RowRank
ON (X.ID = RowRank.ID)
CROSS JOIN (
SELECT COUNT(*) AS TotalCount
FROM myTable
) AS TotalCount
SET X.M1_Percentile = RowRank.Rnk / TotalCount.TotalCount * 100;
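The self-join in the question compares every row with every other row, which is why it slows down so sharply as the table grows. As a point of comparison, and not as a replacement for the SQL answers, here is a minimal Python sketch of a sort-based approach: sort the values once, then find each value's count of "less than or equal" rows with a binary search. The function name `percentiles` is hypothetical, and the input is the ten M1 values shown in the question.

```python
from bisect import bisect_right

def percentiles(values):
    """Percent of values less than or equal to each value, in input order."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right returns how many elements of the sorted list are <= v.
    return [bisect_right(ordered, v) * 100 / n for v in values]

m1 = [3.2997, 3.0476, 2.6499, 0.3585, 1.6919,
      2.8515, 4.0728, 4.0224, 5.8207, 6.8291]
result = percentiles(m1)
print(result)
```

This takes one sort plus one binary search per row instead of one comparison per pair of rows, and tied values all receive the same percentile, as in the question's definition.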

How to resolve this MySQL query?

I have a table that looks like this:
CREATE TEMPORARY TABLE MainList (
`pTime` int(10) unsigned NOT NULL,
`STD` double NOT NULL,
PRIMARY KEY (`pTime`)
) ENGINE=MEMORY;
+------------+-------------+
| pTime | STD |
+------------+-------------+
| 1106080500 | -0.5058072 |
| 1106081100 | -0.82790455 |
| 1106081400 | -0.59226294 |
| 1106081700 | -0.99998194 |
| 1106540100 | -0.86649279 |
| 1107194700 | 1.51340543 |
| 1107305700 | 0.96225296 |
| 1107306300 | 0.53937716 |
+------------+-------------+ .. etc
pTime is my primary key.
I want to make a query that, for every row in my table, will find the first pTime where STD has a flipped sign and is further away from 0 than STD of the above table. (For simplicity's sake, just imagine that I am looking for 0-STD)
Here is an example of the output I want:
+------------+-------------+------------+-------------+
| pTime | STD | pTime_Oppo | STD_Oppo |
+------------+-------------+------------+-------------+
| 1106080500 | -0.5058072 | 1106090400 | 0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 | 0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 | 0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 | 1.0660959 |
+------------+-------------+------------+-------------+
I can't seem to get it right!
I tried the following:
SELECT DISTINCT
MainList.pTime,
MainList.STD,
b34d1.pTime,
b34d1.STD
FROM
MainList
JOIN b34d1 ON(
b34d1.pTime > MainList.pTime
AND(
(
MainList.STD > 0
AND b34d1.STD <= 0 - MainList.STD
)
OR(
MainList.STD < 0
AND b34d1.STD >= 0 - MainList.STD
)
)
);
That code just freezes my server up.
P.S. Table b34d1 is just like MainList, except that it contains many more rows:
mysql> select STD, Slope from b31d1 limit 10;
+-------------+--------------+
| STD | Slope |
+-------------+--------------+
| -0.44922675 | -5.2016129 |
| -0.11892021 | -8.15249267 |
| 0.62574686 | -10.19794721 |
| 1.10469057 | -12.43768328 |
| 1.52917352 | -13.08651026 |
| 1.61803899 | -13.2441349 |
| 1.82686555 | -12.04912023 |
| 2.07480736 | -11.22067449 |
| 2.45529961 | -7.84090909 |
| 1.86468335 | -6.26466276 |
+-------------+--------------+
mysql> select count(*) from b31d1;
+----------+
| count(*) |
+----------+
| 439340 |
+----------+
1 row in set (0.00 sec)
In fact MainList is just a filtered version of b34d1 that uses the MEMORY engine
mysql> show create table b34d1;
+-------+-----------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------+
| Table | Create Table
|
+-------+-----------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------+
| b34d1 | CREATE TABLE `b34d1` (
`pTime` int(10) unsigned NOT NULL,
`Slope` double NOT NULL,
`STD` double NOT NULL,
PRIMARY KEY (`pTime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MIN_ROWS=339331 MAX_ROWS=539331 PACK_KEYS=1 ROW_FORMAT=FIXED |
+-------+-----------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------+
Edit: I just did a little experiment and I am very confused by the results:
SELECT DISTINCT
b34d1.pTime,
b34d1.STD,
Anti.pTime,
Anti.STD
FROM
b34d1
LEFT JOIN b34d1 As Anti ON(
Anti.pTime > b34d1.pTime
AND(
(
b34d1.STD > 0
AND b34d1.STD <= 0 - Anti.STD
)
OR(
b34d1.STD < 0
AND b34d1.STD >= 0 - Anti.STD
)
)
) limit 10;
+------------+-------------+------------+------------+
| pTime | STD | pTime | STD |
+------------+-------------+------------+------------+
| 1104537600 | -0.70381962 | 1104539100 | 0.73473692 |
| 1104537600 | -0.70381962 | 1104714000 | 1.46733274 |
| 1104537600 | -0.70381962 | 1104714300 | 2.02097356 |
| 1104537600 | -0.70381962 | 1104714600 | 2.60642099 |
| 1104537600 | -0.70381962 | 1104714900 | 2.01006557 |
| 1104537600 | -0.70381962 | 1104715200 | 1.97724189 |
| 1104537600 | -0.70381962 | 1104715500 | 1.85683704 |
| 1104537600 | -0.70381962 | 1104715800 | 1.2754127 |
| 1104537600 | -0.70381962 | 1104716100 | 0.87900156 |
| 1104537600 | -0.70381962 | 1104716400 | 0.72957739 |
+------------+-------------+------------+------------+
Why are all the values under the first pTime the same?
Selecting other fields from a row having some aggregate statistic (such as a minimum or maximum value) is a little messy in SQL. Such queries aren't so simple. You typically need an extra join or a subquery. For example:
SELECT m.pTime, m.STD, m2.pTime AS pTime_Oppo, m2.STD AS STD_Oppo
FROM MainList AS m
JOIN
(SELECT m1.pTime, MIN(m2.pTime) AS pTime_Oppo
FROM MainList AS m1
JOIN MainList AS m2
ON m1.pTime < m2.pTime AND SIGN(m1.STD) != SIGN(m2.STD)
WHERE ABS(m1.STD) <= ABS(m2.std)
GROUP BY m1.pTime
) AS oppo ON m.pTime = oppo.pTime
JOIN MainList AS m2 ON oppo.pTime_Oppo = m2.pTime
;
Using the sample data:
INSERT INTO MainList (`pTime`, `STD`)
VALUES
(1106080500, -0.5058072),
(1106081100, -0.82790455),
(1106081400, -0.59226294),
(1106081700, -0.99998194),
(1106090400, 0.57510881),
(1106091300, 0.85599817),
(1106091600, 1.0660959),
(1106540100, -0.86649279),
(1107194700, 1.51340543),
(1107305700, 0.96225296),
(1107306300, 0.53937716)
;
The results are:
+------------+-------------+------------+-------------+
| pTime | STD | pTime_Oppo | STD_Oppo |
+------------+-------------+------------+-------------+
| 1106080500 | -0.5058072 | 1106090400 | 0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 | 0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 | 0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 | 1.0660959 |
| 1106090400 | 0.57510881 | 1106540100 | -0.86649279 |
| 1106091300 | 0.85599817 | 1106540100 | -0.86649279 |
| 1106540100 | -0.86649279 | 1107194700 | 1.51340543 |
+------------+-------------+------------+-------------+
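To sanity-check a query like this on a small sample, a brute-force reference in Python can be handy. The sketch below is an assumption-laden illustration, not an efficient solution: it is quadratic, it treats "opposite sign" as a strictly negative product (so rows with STD exactly 0 never match), and the function name `first_opposite` is hypothetical. It uses the sample rows inserted above.

```python
def first_opposite(rows):
    """For each (pTime, STD), find the first later row whose STD has the
    opposite sign and an absolute value at least as large.
    Rows without such a match are left out, like an inner join."""
    rows = sorted(rows)  # order by pTime
    out = []
    for i, (ptime, std) in enumerate(rows):
        for later_ptime, later_std in rows[i + 1:]:
            if std * later_std < 0 and abs(later_std) >= abs(std):
                out.append((ptime, std, later_ptime, later_std))
                break  # only the first match is wanted
    return out

main_list = [
    (1106080500, -0.5058072), (1106081100, -0.82790455),
    (1106081400, -0.59226294), (1106081700, -0.99998194),
    (1106090400, 0.57510881), (1106091300, 0.85599817),
    (1106091600, 1.0660959), (1106540100, -0.86649279),
    (1107194700, 1.51340543), (1107305700, 0.96225296),
    (1107306300, 0.53937716),
]
result = first_opposite(main_list)
for row in result:
    print(row)
```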
Any solution based on functions such as ABS or SIGN (or anything similar needed to check the sign) is doomed to be inefficient on big data sets, because wrapping a column in a function prevents the use of an index on it.
You are creating a temporary table inside a stored procedure, so you can alter its schema without losing anything. Adding a column that stores the sign of STD and storing STD itself as an absolute value should give you a huge performance boost: you can then simply find the first larger pTime with a larger STD and a different sign, and all conditions can use indexes in a query like this (STD_positive keeps STD's sign):
SELECT * from mainlist m
LEFT JOIN mainlist mu
ON mu.pTime = ( SELECT md.pTime FROM mainlist md
WHERE m.pTime < md.pTime
AND m.STD < md.STD
AND m.STD_positive <> md.STD_positive
ORDER BY md.pTime
LIMIT 1 )
LEFT JOIN is needed here to return rows that don't have a bigger STD. If you don't need them, use a plain JOIN. This query should run fine even on lots of records, with proper indexes chosen by carefully checking the EXPLAIN output, starting with an index on STD.
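The schema change described above can be tried out with the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL (an assumption). STD is stored as an absolute value and the sign goes into STD_positive, as suggested; the index name `idx_mainlist_std` is hypothetical, and whether a given engine actually uses the index should still be confirmed with EXPLAIN on real data.

```python
import sqlite3

data = [
    (1106080500, -0.5058072), (1106081100, -0.82790455),
    (1106081400, -0.59226294), (1106081700, -0.99998194),
    (1106090400, 0.57510881), (1106091300, 0.85599817),
    (1106091600, 1.0660959), (1106540100, -0.86649279),
    (1107194700, 1.51340543), (1107305700, 0.96225296),
    (1107306300, 0.53937716),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mainlist (pTime INTEGER PRIMARY KEY, STD REAL, STD_positive INTEGER)"
)
# Store the magnitude in STD and the sign separately in STD_positive.
conn.executemany(
    "INSERT INTO mainlist VALUES (?, ?, ?)",
    [(t, abs(s), int(s > 0)) for t, s in data],
)
conn.execute("CREATE INDEX idx_mainlist_std ON mainlist (STD)")

pairs = conn.execute("""
SELECT m.pTime, mu.pTime
FROM mainlist m
LEFT JOIN mainlist mu
  ON mu.pTime = (SELECT md.pTime FROM mainlist md
                 WHERE md.pTime > m.pTime
                   AND md.STD > m.STD
                   AND md.STD_positive <> m.STD_positive
                 ORDER BY md.pTime
                 LIMIT 1)
ORDER BY m.pTime
""").fetchall()
print(pairs)
```

Rows with no later, larger, opposite-sign value come back with None in the second column because of the LEFT JOIN.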
SELECT
m.pTime,
m.STD,
mo.pTime AS pTime_Oppo,
-mo.STD AS STD_Oppo
FROM MainList m
INNER JOIN (
SELECT
pTime,
-STD AS STD
FROM MainList
) mo ON m.STD > 0 AND mo.STD > m.STD
OR m.STD < 0 AND mo.STD < m.STD
LEFT JOIN (
SELECT
pTime,
-STD AS STD
FROM MainList
) mo2 ON mo.STD > 0 AND mo2.STD > m.STD AND mo.STD > mo2.STD
OR mo.STD < 0 AND mo2.STD < m.STD AND mo.STD < mo2.STD
WHERE mo2.pTime IS NULL