How to resolve this MySQL query? - mysql

I have a table that looks like this:
CREATE TEMPORARY TABLE MainList (
`pTime` int(10) unsigned NOT NULL,
`STD` double NOT NULL,
PRIMARY KEY (`pTime`)
) ENGINE=MEMORY;
+------------+-------------+
| pTime | STD |
+------------+-------------+
| 1106080500 | -0.5058072 |
| 1106081100 | -0.82790455 |
| 1106081400 | -0.59226294 |
| 1106081700 | -0.99998194 |
| 1106540100 | -0.86649279 |
| 1107194700 | 1.51340543 |
| 1107305700 | 0.96225296 |
| 1107306300 | 0.53937716 |
+------------+-------------+ .. etc
pTime is my primary key.
I want to make a query that, for every row in my table, will find the first later pTime where STD has a flipped sign and is further away from 0 than the current row's STD. (For simplicity's sake, just imagine that I am looking for 0-STD.)
Here is an example of the output I want:
+------------+-------------+------------+-------------+
| pTime | STD | pTime_Oppo | STD_Oppo |
+------------+-------------+------------+-------------+
| 1106080500 | -0.5058072 | 1106090400 | 0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 | 0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 | 0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 | 1.0660959 |
+------------+-------------+------------+-------------+
I can't seem to get it right!
I tried the following:
SELECT DISTINCT
MainList.pTime,
MainList.STD,
b34d1.pTime,
b34d1.STD
FROM
MainList
JOIN b34d1 ON(
b34d1.pTime > MainList.pTime
AND(
(
MainList.STD > 0
AND b34d1.STD <= 0 - MainList.STD
)
OR(
MainList.STD < 0
AND b34d1.STD >= 0 - MainList.STD
)
)
);
That code just freezes my server up.
P.S. Table b34d1 is just like MainList, except it contains many more rows:
mysql> select STD, Slope from b31d1 limit 10;
+-------------+--------------+
| STD | Slope |
+-------------+--------------+
| -0.44922675 | -5.2016129 |
| -0.11892021 | -8.15249267 |
| 0.62574686 | -10.19794721 |
| 1.10469057 | -12.43768328 |
| 1.52917352 | -13.08651026 |
| 1.61803899 | -13.2441349 |
| 1.82686555 | -12.04912023 |
| 2.07480736 | -11.22067449 |
| 2.45529961 | -7.84090909 |
| 1.86468335 | -6.26466276 |
+-------------+--------------+
mysql> select count(*) from b31d1;
+----------+
| count(*) |
+----------+
| 439340 |
+----------+
1 row in set (0.00 sec)
In fact, MainList is just a filtered version of b34d1 that uses the MEMORY engine.
mysql> show create table b34d1;
CREATE TABLE `b34d1` (
  `pTime` int(10) unsigned NOT NULL,
  `Slope` double NOT NULL,
  `STD` double NOT NULL,
  PRIMARY KEY (`pTime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MIN_ROWS=339331 MAX_ROWS=539331 PACK_KEYS=1 ROW_FORMAT=FIXED
Edit: I just did a little experiment and I am very confused by the results:
SELECT DISTINCT
b34d1.pTime,
b34d1.STD,
Anti.pTime,
Anti.STD
FROM
b34d1
LEFT JOIN b34d1 As Anti ON(
Anti.pTime > b34d1.pTime
AND(
(
b34d1.STD > 0
AND b34d1.STD <= 0 - Anti.STD
)
OR(
b34d1.STD < 0
AND b34d1.STD >= 0 - Anti.STD
)
)
) limit 10;
+------------+-------------+------------+------------+
| pTime | STD | pTime | STD |
+------------+-------------+------------+------------+
| 1104537600 | -0.70381962 | 1104539100 | 0.73473692 |
| 1104537600 | -0.70381962 | 1104714000 | 1.46733274 |
| 1104537600 | -0.70381962 | 1104714300 | 2.02097356 |
| 1104537600 | -0.70381962 | 1104714600 | 2.60642099 |
| 1104537600 | -0.70381962 | 1104714900 | 2.01006557 |
| 1104537600 | -0.70381962 | 1104715200 | 1.97724189 |
| 1104537600 | -0.70381962 | 1104715500 | 1.85683704 |
| 1104537600 | -0.70381962 | 1104715800 | 1.2754127 |
| 1104537600 | -0.70381962 | 1104716100 | 0.87900156 |
| 1104537600 | -0.70381962 | 1104716400 | 0.72957739 |
+------------+-------------+------------+------------+
Why are all the values under the first pTime the same?

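Selecting other columns from the row that holds an aggregate value (such as a minimum or maximum) is a little messy in SQL: you typically need an extra join or a subquery. (In MySQL, a TEMPORARY table can't be referenced more than once in the same query, so MainList needs to be a regular table for the self-joins below to run.) For example: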
SELECT m.pTime, m.STD, m2.pTime AS pTime_Oppo, m2.STD AS STD_Oppo
FROM MainList AS m
JOIN
  (SELECT m1.pTime, MIN(m2.pTime) AS pTime_Oppo
   FROM MainList AS m1
   JOIN MainList AS m2
     ON m1.pTime < m2.pTime AND SIGN(m1.STD) != SIGN(m2.STD)
   WHERE ABS(m1.STD) <= ABS(m2.STD)
   GROUP BY m1.pTime
  ) AS oppo ON m.pTime = oppo.pTime
JOIN MainList AS m2 ON oppo.pTime_Oppo = m2.pTime
;
Using the sample data:
INSERT INTO MainList (`pTime`, `STD`)
VALUES
(1106080500, -0.5058072),
(1106081100, -0.82790455),
(1106081400, -0.59226294),
(1106081700, -0.99998194),
(1106090400, 0.57510881),
(1106091300, 0.85599817),
(1106091600, 1.0660959),
(1106540100, -0.86649279),
(1107194700, 1.51340543),
(1107305700, 0.96225296),
(1107306300, 0.53937716)
;
The results are:
+------------+-------------+------------+-------------+
| pTime | STD | pTime_Oppo | STD_Oppo |
+------------+-------------+------------+-------------+
| 1106080500 | -0.5058072 | 1106090400 | 0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 | 0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 | 0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 | 1.0660959 |
| 1106090400 | 0.57510881 | 1106540100 | -0.86649279 |
| 1106091300 | 0.85599817 | 1106540100 | -0.86649279 |
| 1106540100 | -0.86649279 | 1107194700 | 1.51340543 |
+------------+-------------+------------+-------------+
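As a cross-check of the rule the query implements (the earliest later row whose STD has a different SIGN() and an ABS() at least as large), here is a brute-force Python sketch over the same sample rows. It is only a reference for testing the SQL on small inputs: the helper names `sign` and `first_opposite` are illustrative, and the O(n²) scan is no faster than the self-join.

```python
# Sample rows from the INSERT statement above: (pTime, STD).
ROWS = [
    (1106080500, -0.5058072),
    (1106081100, -0.82790455),
    (1106081400, -0.59226294),
    (1106081700, -0.99998194),
    (1106090400, 0.57510881),
    (1106091300, 0.85599817),
    (1106091600, 1.0660959),
    (1106540100, -0.86649279),
    (1107194700, 1.51340543),
    (1107305700, 0.96225296),
    (1107306300, 0.53937716),
]


def sign(x):
    """Mirror MySQL's SIGN(): -1, 0 or 1."""
    return (x > 0) - (x < 0)


def first_opposite(rows):
    """For each row, find the earliest later row whose STD has a different
    SIGN() and an ABS() at least as large. Rows without a match are skipped,
    as with the inner JOIN in the query."""
    ordered = sorted(rows)  # tuples sort by pTime first
    result = []
    for i, (p_time, std) in enumerate(ordered):
        for p_time2, std2 in ordered[i + 1:]:
            if sign(std) != sign(std2) and abs(std) <= abs(std2):
                result.append((p_time, std, p_time2, std2))
                break  # earliest match, like MIN(m2.pTime)
    return result


for row in first_opposite(ROWS):
    print(row)
```

It prints the same seven rows as the result table above.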

Any solution that wraps the column in ABS(), SIGN() or a similar function to check its sign is bound to be slow on big data sets, because MySQL can't use an index on such an expression.
Since you are creating a temporary table inside a stored procedure (SP), you can alter its schema without losing anything. Adding a column that stores the sign of STD, and storing STD itself unsigned (as its absolute value), should give you a HUGE performance boost: you can then simply look for the first bigger pTime with a bigger STD and a different sign, and every condition can use an index, in a query like this (STD_positive keeps STD's sign):
SELECT * FROM MainList m
LEFT JOIN MainList mu
  ON mu.pTime = (SELECT md.pTime FROM MainList md
                 WHERE m.pTime < md.pTime
                   AND m.STD < md.STD
                   AND m.STD_positive <> md.STD_positive
                 ORDER BY md.pTime
                 LIMIT 1);
The LEFT JOIN is needed here to return rows that don't have a later, bigger STD of the opposite sign. If you don't need them, use a plain JOIN. With proper indexes, chosen by carefully checking the EXPLAIN output (start with an index on STD), this query should run fine even on lots of records.
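To see the idea end to end, here is a small Python sketch that uses SQLite (from the standard library) as a stand-in for MySQL. It stores STD as an absolute value plus an STD_positive flag, adds an index, and runs the query above, selecting just the two pTime columns and adding an ORDER BY for stable output. The SQLite substitution and the index definition `idx_sign_std` are assumptions for illustration; on MySQL you would still check EXPLAIN.

```python
import sqlite3

# Sample rows from the answers above: (pTime, signed STD).
ROWS = [
    (1106080500, -0.5058072),
    (1106081100, -0.82790455),
    (1106081400, -0.59226294),
    (1106081700, -0.99998194),
    (1106090400, 0.57510881),
    (1106091300, 0.85599817),
    (1106091600, 1.0660959),
    (1106540100, -0.86649279),
    (1107194700, 1.51340543),
    (1107305700, 0.96225296),
    (1107306300, 0.53937716),
]

QUERY = """
SELECT m.pTime, mu.pTime
FROM MainList m
LEFT JOIN MainList mu
  ON mu.pTime = (SELECT md.pTime FROM MainList md
                 WHERE m.pTime < md.pTime
                   AND m.STD < md.STD
                   AND m.STD_positive <> md.STD_positive
                 ORDER BY md.pTime
                 LIMIT 1)
ORDER BY m.pTime
"""


def first_bigger_opposite(rows):
    conn = sqlite3.connect(":memory:")
    # STD holds the magnitude; STD_positive holds the sign (1 if > 0, else 0,
    # so an STD of exactly 0 is grouped with the negatives).
    conn.execute(
        "CREATE TABLE MainList ("
        "pTime INTEGER PRIMARY KEY, STD REAL NOT NULL, STD_positive INTEGER NOT NULL)"
    )
    # Hypothetical index on plain columns: no function wraps them in the query.
    conn.execute("CREATE INDEX idx_sign_std ON MainList (STD_positive, STD)")
    conn.executemany(
        "INSERT INTO MainList VALUES (?, ?, ?)",
        [(p_time, abs(std), int(std > 0)) for p_time, std in rows],
    )
    pairs = conn.execute(QUERY).fetchall()
    conn.close()
    return pairs


for pair in first_bigger_opposite(ROWS):
    print(pair)
```

Rows with no later, bigger opposite-sign value come back with None in the second column, which is what the LEFT JOIN is for.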

SELECT
  m.pTime,
  m.STD,
  mo.pTime AS pTime_Oppo,
  -mo.STD AS STD_Oppo
FROM MainList m
INNER JOIN (
  SELECT
    pTime,
    -STD AS STD
  FROM MainList
) mo ON mo.pTime > m.pTime
    AND (m.STD > 0 AND mo.STD > m.STD
      OR m.STD < 0 AND mo.STD < m.STD)
LEFT JOIN (
  SELECT
    pTime,
    -STD AS STD
  FROM MainList
) mo2 ON mo2.pTime > m.pTime
     AND mo2.pTime < mo.pTime
     AND (m.STD > 0 AND mo2.STD > m.STD
       OR m.STD < 0 AND mo2.STD < m.STD)
WHERE mo2.pTime IS NULL;

Related

How come I get a syntax error when using a CTE to update a table?

I have a column (share_2pp) that needs to be updated with a calculated result from the table. This select query produces the column (share_2pp) I would like.
WITH cte
AS (
SELECT recipe
, SUM(meal_nr) AS meal_sum
FROM w03_forecast
GROUP BY recipe
)
SELECT w03_forecast.recipe
, w03_forecast.meal_nr
, meal_sum
, (meal_nr / meal_sum) AS share_2pp
FROM w03_forecast
INNER JOIN cte
ON w03_forecast.recipe = cte.recipe;
+--------+---------+----------+-----------+
| recipe | meal_nr | meal_sum | share_2pp |
+--------+---------+----------+-----------+
| 1 | 3842 | 4593 | 0.8365 |
| 2 | 4284 | 5130 | 0.8351 |
| 3 | 4166 | 4926 | 0.8457 |
| 4 | 2830 | 3382 | 0.8368 |
| 5 | 2495 | 2935 | 0.8501 |
| 1 | 751 | 4593 | 0.1635 |
| 2 | 846 | 5130 | 0.1649 |
| 3 | 760 | 4926 | 0.1543 |
| 4 | 552 | 3382 | 0.1632 |
| 5 | 440 | 2935 | 0.1499 |
+--------+---------+----------+-----------+
However, when I try to update the table I get a syntax error at FROM.
WITH cte
AS (
SELECT recipe
, SUM(meal_nr) AS meal_sum
FROM w03_forecast
GROUP BY recipe
)
UPDATE w03_forecast
SET w03_forecast.share_2pp = (meal_nr / meal_sum)
FROM w03_forecast
INNER JOIN cte
ON w03_forecast.recipe = cte.recipe;
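MySQL doesn't support the UPDATE ... FROM syntax, which is why you get the error at FROM. I think you can just use a multi-table UPDATE with JOIN: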
UPDATE w03_forecast f JOIN
(SELECT recipe, SUM(meal_nr) AS meal_sum
FROM w03_forecast
GROUP BY recipe
) r
USING (recipe)
SET f.share_2pp = (meal_nr / meal_sum);
That said, there is no reason to store this in the table. It can easily be calculated on the fly using window functions (MySQL 8.0+):
select f.*,
       (meal_nr / sum(meal_nr) over (partition by recipe)) as share_2pp
from w03_forecast f;
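To make the per-recipe behaviour concrete, here is a small Python sketch that recomputes share_2pp from the recipe/meal_nr pairs in the question's output. Each meal_nr is divided by the total for its own recipe, which is what SUM(meal_nr) OVER (PARTITION BY recipe) supplies; an empty OVER () would divide by the total of all rows instead. The function name `share_2pp` is illustrative, and rounding to 4 places just matches the displayed values.

```python
from collections import defaultdict

# (recipe, meal_nr) pairs from the question's SELECT output.
ROWS = [(1, 3842), (2, 4284), (3, 4166), (4, 2830), (5, 2495),
        (1, 751), (2, 846), (3, 760), (4, 552), (5, 440)]


def share_2pp(rows):
    """Return (recipe, meal_nr, meal_sum, share) for each row, where meal_sum
    is the total meal_nr of that row's recipe (the 'partition' sum)."""
    totals = defaultdict(int)
    for recipe, meal_nr in rows:
        totals[recipe] += meal_nr
    return [(recipe, meal_nr, totals[recipe], round(meal_nr / totals[recipe], 4))
            for recipe, meal_nr in rows]


for row in share_2pp(ROWS):
    print(row)
```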

Converting CHAR Primary Key to INT in MySQL/MariaDB

I have a table that uses CHAR as the primary key for customers. I am attempting to load this table into a schema such that the primary key should be an INT.
DROP TABLE IF EXISTS `customers`;
CREATE TABLE `customers` (
`customer_id` char(5) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `customers` VALUES ('99944'),('99946'),('99976'),('A0014'),('A0049'),('A0124'),('C01AH'),('C01AQ'),('C01AW'),('C01AX'),('C01AY'),('C01AZ');
I have attempted variations on select cast(customer_id AS UNSIGNED) FROM customers; but only get back 0s for the non-int rows. How do I cast the non-int rows into a consistent INT result?
The ideal result would look like this:
For customer IDs that are solely integers, leave them alone.
For customer IDs that contain any letter, replace everything in the ID with a unique numerical identifier.
Expected result:
SELECT * FROM Customers;
`customer_id`
-------
99944
99946
99976
13871911
13871912
13871913
13872128
13872229
13872293
13872505
13872512
13872561
GMB gave me another idea:
use HEX() and CONV(.., 16, 10) to convert the string's hexadecimal representation into a decimal number.
Query
SELECT
customers.customer_id
, CASE
WHEN (customers.customer_id >> 0) > 0
THEN customers.customer_id >> 0
ELSE
CONV(HEX(customers.customer_id), 16, 10)
END
AS customer_id_int
FROM
customers;
Result
| customer_id | customer_id_int |
| ----------- | --------------- |
| 99944 | 99944 |
| 99946 | 99946 |
| 99976 | 99976 |
| A0014 | 279981338932 |
| A0049 | 279981339705 |
| A0124 | 279981404724 |
| C01AH | 288571343176 |
| C01AQ | 288571343185 |
| C01AW | 288571343191 |
| C01AX | 288571343192 |
| C01AY | 288571343193 |
| C01AZ | 288571343194 |
P.S. The generated values are too large for INT (even INT UNSIGNED), so you need to use a BIGINT column.
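To see where numbers like 279981338932 come from, and why they need BIGINT, here is a Python sketch that mimics CONV(HEX(...), 16, 10) on a few of the sample IDs. It assumes the IDs are plain ASCII, so each character is one byte in both latin1 and utf8; the helper name `hex_conv` is illustrative.

```python
def hex_conv(customer_id: str) -> int:
    """Mimic CONV(HEX(customer_id), 16, 10) for an ASCII-only ID.

    HEX() spells out the string's bytes in hexadecimal, and CONV(..., 16, 10)
    reads those digits as one base-16 number, i.e. the bytes taken as a
    big-endian integer.
    """
    return int(customer_id.encode("ascii").hex(), 16)


INT_UNSIGNED_MAX = 2**32 - 1
BIGINT_MAX = 2**63 - 1

for cid in ["A0014", "A0049", "A0124", "C01AH"]:
    value = hex_conv(cid)
    # Five bytes need up to 40 bits: too big for INT UNSIGNED here,
    # but comfortably inside BIGINT.
    print(cid, value, value > INT_UNSIGNED_MAX, value <= BIGINT_MAX)
```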
Update:
Another method, which generates smaller integers (UNSIGNED INT), uses a "SQL number generator", SUBSTRING(), ORD() and GROUP_CONCAT().
Query
SELECT
  customers.customer_id
  , CASE
      WHEN customers.customer_id >> 1 > 0
      THEN customers.customer_id
      ELSE
        GROUP_CONCAT(
          CASE
            WHEN SUBSTRING(customers.customer_id, number_generator.number, 1) NOT BETWEEN 'A' AND 'Z'
            THEN SUBSTRING(customers.customer_id, number_generator.number, 1) >> 1
            ELSE ORD(SUBSTRING(customers.customer_id, number_generator.number, 1))
          END
          ORDER BY
            number_generator.number ASC
          SEPARATOR ''
        )
    END AS customer_id_int
FROM (
  SELECT
    record_1.number
  FROM (
    SELECT 1 AS number UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
  ) AS record_1
) AS number_generator
CROSS JOIN
  customers
GROUP BY
  customers.customer_id
ORDER BY
  customers.customer_id ASC;
Result
| customer_id | customer_id_int |
| ----------- | --------------- |
| 99944 | 99944 |
| 99946 | 99946 |
| 99976 | 99976 |
| A0014 | 650002 |
| A0049 | 650024 |
| A0124 | 650012 |
| C01AH | 67006572 |
| C01AQ | 67006581 |
| C01AW | 67006587 |
| C01AX | 67006588 |
| C01AY | 67006589 |
| C01AZ | 67006590 |
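The per-character mapping is easy to misread, so here is a Python sketch that mimics it for one ID, assuming (as in the sample) that an ID is either all digits or starts with an upper-case letter. It reproduces the table above and also shows that `>> 1` halves each digit, so different IDs can collide: A0014 and A0015 both map to 650002. Using `>> 0` instead would keep each digit intact. The function name `number_generator_id` is illustrative.

```python
def number_generator_id(customer_id: str) -> str:
    """Mimic the GROUP_CONCAT() query above for one ID.

    Assumes an ID is either all digits (returned unchanged, as the WHEN
    branch does for the sample's numeric IDs) or starts with a letter,
    and that letters are upper-case A-Z.
    """
    if customer_id.isdigit():
        return customer_id
    parts = []
    for ch in customer_id:
        if "A" <= ch <= "Z":
            parts.append(str(ord(ch)))  # ORD() of the letter
        else:
            parts.append(str(int(ch) >> 1))  # digit >> 1, i.e. the digit halved
    return "".join(parts)


for cid in ["A0014", "A0049", "A0124", "C01AH", "A0015"]:
    print(cid, number_generator_id(cid))
```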
With MariaDB >= 10.0.5, here is a solution that turns a string primary key into an integer primary key in a predictable manner:
SELECT
customer_id old_id,
CAST(
REGEXP_REPLACE(customer_id, '([^0-9])', ORD('$1'))
AS UNSIGNED
) new_id
FROM customers;
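REGEXP_REPLACE() captures non-numeric characters (anywhere in the string) and replaces each of them. Note, however, that ORD('$1') is evaluated once, before the replacement runs, so it returns ORD('$') = 36: every non-numeric character is replaced by 36 rather than by its own ordinal, as the demo output shows. The result is predictable, but different IDs can map to the same number (for example A0014 and C0014).
Output of the DB Fiddle demo (on different sample IDs):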
old_id | new_id
:----- | -------:
99944 | 99944
9Z946 | 936946
A9CZ6 | 36936366
A0C14 | 3603614
0ABC0 | 3636360
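A short Python sketch makes the evaluation order visible. It uses `re.sub` as a stand-in for REGEXP_REPLACE and compares a constant replacement (what ORD('$1') actually evaluates to) with a true per-character ordinal replacement. The first function reproduces the demo output above; the second is what one might expect the query to do, which, as far as I know, REGEXP_REPLACE can't do on its own because its replacement is a fixed string. Both function names are illustrative.

```python
import re


def mariadb_like(customer_id: str) -> str:
    # ORD('$1') is computed once, before REGEXP_REPLACE runs: ORD('$') = 36,
    # so every non-digit is replaced by the constant string "36".
    return re.sub(r"[^0-9]", str(ord("$")), customer_id)


def per_character_ord(customer_id: str) -> str:
    # Each non-digit replaced by its own ordinal (a callback per match).
    return re.sub(r"[^0-9]", lambda m: str(ord(m.group(0))), customer_id)


for cid in ["9Z946", "A9CZ6", "A0C14", "0ABC0"]:
    # int() mimics CAST(... AS UNSIGNED), which drops leading zeros.
    print(cid, int(mariadb_like(cid)), int(per_character_ord(cid)))
```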
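Using MySQL 8.0 REGEXP_REPLACE (this simply strips the non-digit characters, so, for example, C01AH and C01AQ both become 1 and the result is not unique):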
select cast(REGEXP_REPLACE(customer_id,'[^0-9]','') AS UNSIGNED) FROM customers;

MySQL : Update Table upon certain conditions

I have two tables: a master table and a current table (refreshed every hour).
Both tables have the same structure:
Chk | description |state | date
I need to update or append (add the new row) to the master table if:
1) a row has a new ID, or
2) a particular variable ('state' in this case) has changed.
I tried to do it using the query below, without success:
INSERT into AGILE_TICKETS_DLY
SELECT * FROM CURR_AGILE_TICKETS curr
WHERE EXISTS (SELECT * FROM AGILE_TICKETS_DLY mstr
WHERE (curr.chk != mstr.chk) OR ( curr.chk = mstr.chk and
mstr.state != curr.state))
Any pointers on how to achieve this?
You can try this pair of queries:
-- insert new rows
insert into agile_tickets_dly
select * from curr_agile_tickets
where chk not in (select chk from agile_tickets_dly);
-- update updated rows
update agile_tickets_dly x
join
(
select b.chk chk,b.description description,b.state state,b.date date
from agile_tickets_dly a, curr_agile_tickets b
where
a.chk=b.chk and
(a.description != b.description or a.state != b.state or a.date != b.date)
) y
on x.chk=y.chk
set x.description = y.description, x.state= y.state, x.date = y.date;
Illustration:
select * from agile_tickets_dly;
+------+-------------+---------+------------+
| chk | description | state | date |
+------+-------------+---------+------------+
| 0 | desc-0 | state-1 | 01-01-2017 |
| 1 | desc-1 | state-1 | 01-01-2018 |
| 2 | desc-2 | state-2 | 01-02-2018 |
| 3 | desc-3 | state-3 | 01-03-2018 |
+------+-------------+---------+------------+
-- one new row with chk=4, three updated rows with chk=1,2,3
select * from curr_agile_tickets;
+------+----------------+-----------------+----------------+
| chk | description | state | date |
+------+----------------+-----------------+----------------+
| 0 | desc-0 | state-1 | 01-01-2017 |
| 1 | desc-1 | state-1 | date-1-updated |
| 2 | desc-2-updated | state-2 | 01-02-2018 |
| 3 | desc-3 | state-3-updated | 01-03-2018 |
| 4 | desc-4 | state-4 | 01-04-2018 |
+------+----------------+-----------------+----------------+
-- after executing the two queries
select * from agile_tickets_dly;
+------+----------------+-----------------+----------------+
| chk | description | state | date |
+------+----------------+-----------------+----------------+
| 0 | desc-0 | state-1 | 01-01-2017 |
| 1 | desc-1 | state-1 | date-1-updated |
| 2 | desc-2-updated | state-2 | 01-02-2018 |
| 3 | desc-3 | state-3-updated | 01-03-2018 |
| 4 | desc-4 | state-4 | 01-04-2018 |
+------+----------------+-----------------+----------------+
I tried to do this in 2 separate steps:
1) First I appended all rows with new IDs; this worked:
INSERT into AGILE_TICKETS_DLY
SELECT * FROM CURR_AGILE_TICKETS curr
WHERE not EXISTS (SELECT * FROM AGILE_TICKETS_DLY mstr
WHERE (curr.chk = mstr.chk));
But then I tried the following and got an error:
2) Then replace the 'state' variable with the new value:
INSERT into AGILE_TICKETS_DLY_1 (state)
SELECT state
from CURR_AGILE_TICKETS_1 curr
where exists ( select * from AGILE_TICKETS_DLY_1 mstr where curr.chk =
mstr.chk);
But this gives me an error:
SQL Error (1364) : Field 'chk' doesn't have a default value.
What does that mean?

Why do I get mismatched column values in my MySQL result?

MySQL query:
SELECT id,student_user_id,MIN(start_time) FROM appoint_course
WHERE student_user_id IN(
931,2034,2068,2111,2115,2173,2181,2285,2500,2505,2507,
2518,2594,2596,2600,2608,2637,2652,2654
)
AND course_type=3 and disabled=0 GROUP BY student_user_id;
Result:
+-------+-----------------+-----------------+
| id | student_user_id | MIN(start_time) |
+-------+-----------------+-----------------+
| 8356 | 931 | 1500351000 |
| 9205 | 2034 | 1501733400 |
| 9246 | 2068 | 1501649100 |
| 9755 | 2111 | 1502943000 |
| 9585 | 2115 | 1502595300 |
| 10820 | 2173 | 1503545700 |
| 9594 | 2181 | 1502852400 |
| 10324 | 2285 | 1502852400 |
| 11204 | 2500 | 1504839600 |
| 11152 | 2507 | 1504064100 |
| 12480 | 2594 | 1505707800 |
| 11521 | 2608 | 1504494000 |
| 11818 | 2652 | 1504753200 |
+-------+-----------------+-----------------+
But the correct start time is:
id: 9594
start_time: 1503284400
The correct start_time for 9594 is 1503284400, not 1502852400. In fact, 1502852400 belongs to record 9597.
I do not know why.
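In almost any other database your query would return an error, because id is not in the GROUP BY (MySQL 5.7.5+ rejects it too when ONLY_FULL_GROUP_BY is enabled, which is the default). Without that mode, MySQL returns the id from an arbitrary row in each group, which need not be the row holding MIN(start_time). The correct query is: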
SELECT student_user_id, MIN(start_time)
FROM appoint_course
WHERE student_user_id IN (931,2034,2068,2111,2115,2173,2181,2285,2500,2505,2507,2518,2594,2596,2600,2608,2637,2652,2654) AND
course_type = 3 and disabled = 0
GROUP BY student_user_id;
In your case, adding a simple MIN(id) to the SELECT might work, assuming that ids increase with the start time.
More generally, you appear to want:
SELECT ac.*
FROM appoint_course ac
WHERE ac.student_user_id IN (931,2034,2068,2111,2115,2173,2181,2285,2500,2505,2507,2518,2594,2596,2600,2608,2637,2652,2654) AND
ac.course_type = 3 AND ac.disabled = 0 AND
ac.start_time = (SELECT MIN(ac2.start_time)
FROM appoint_course ac2
WHERE ac2.student_user_id = ac.student_user_id AND
ac2.course_type = ac.course_type AND
ac2.disabled = ac.disabled
);
No GROUP BY is necessary.
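The point of the correlated subquery is that id and start_time come from the same row. Here is a minimal Python sketch of that rule using the two rows mentioned in the question, assuming both belong to student 2181 (the question implies this but doesn't show it). The function name `row_at_min_start` is illustrative.

```python
# Two rows mentioned in the question: (id, student_user_id, start_time).
# Assumption: both belong to student 2181.
ROWS = [
    (9594, 2181, 1503284400),
    (9597, 2181, 1502852400),
]


def row_at_min_start(rows):
    """Return {student_user_id: (id, start_time)} taken from the row with the
    smallest start_time, so id and start_time always come from the same row.
    On ties this keeps the first row seen; the SQL version returns all of them."""
    best = {}
    for row_id, student, start in rows:
        if student not in best or start < best[student][1]:
            best[student] = (row_id, start)
    return best


print(row_at_min_start(ROWS))
```

It pairs 1502852400 with id 9597, not with the 9594 that the GROUP BY query happened to return.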
I should add that there is a MySQL hack that often works:
SELECT student_user_id, MIN(start_time),
SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY start_time), ',', 1) as id_at_min_start_time
FROM appoint_course
WHERE student_user_id IN (931,2034,2068,2111,2115,2173,2181,2285,2500,2505,2507,2518,2594,2596,2600,2608,2637,2652,2654) AND
course_type = 3 and disabled = 0
GROUP BY student_user_id;
This relies on string manipulation, and the GROUP_CONCAT() result is limited by the group_concat_max_len system variable (1024 bytes by default), so very large groups can overflow it.

Efficient assignment of percentile/rank in MYSQL

I have a couple of very large tables (over 400,000 rows) that look like the following:
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | NULL |
| 3684515 | 3.0476 | NULL |
| 3684516 | 2.6499 | NULL |
| 3684517 | 0.3585 | NULL |
| 3684518 | 1.6919 | NULL |
| 3684519 | 2.8515 | NULL |
| 3684520 | 4.0728 | NULL |
| 3684521 | 4.0224 | NULL |
| 3684522 | 5.8207 | NULL |
| 3684523 | 6.8291 | NULL |
+---------+--------+---------------+...about 400,000 more
I need to assign each row in the M1_Percentile column a value that represents "the percent of rows with M1 values equal to or lower than the current row's M1 value".
In other words, I need:
M1_Percentile = 100 * (number of rows with M1 <= this row's M1) / (total number of rows)
I implemented this successfully, but it is FAR FAR too slow. If anyone could create a more efficient version of the following code, I would really appreciate it!
UPDATE myTable AS X JOIN (
SELECT
s1.ID, COUNT(s2.ID)/ (SELECT COUNT(*) FROM myTable) * 100 AS percentile
FROM
myTable s1 JOIN myTable s2 on (s2.M1 <= s1.M1)
GROUP BY s1.ID
ORDER BY s1.ID) AS Z
ON (X.ID = Z.ID)
SET X.M1_Percentile = Z.percentile;
This is the (correct but slow) result from the above query if the number of rows is limited to the ones you see (10 rows):
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | 60 |
| 3684515 | 3.0476 | 50 |
| 3684516 | 2.6499 | 30 |
| 3684517 | 0.3585 | 10 |
| 3684518 | 1.6919 | 20 |
| 3684519 | 2.8515 | 40 |
| 3684520 | 4.0728 | 80 |
| 3684521 | 4.0224 | 70 |
| 3684522 | 5.8207 | 90 |
| 3684523 | 6.8291 | 100 |
+---------+--------+---------------+
Producing the same results for all 400,000 rows takes orders of magnitude longer.
I cannot test this, but you could try something like:
update table t
set mi_percentile = (
select count(*)
from table t1
where M1 < t.M1 / (
select count(*)
from table));
UPDATE:
update test t
set m1_pc = (
(select count(*) from test t1 where t1.M1 <= t.M1) * 100 /
( select count(*) from test));
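This works in Oracle (the only database I have available). In MySQL, though, I remember getting an error for this kind of query: MySQL doesn't let a subquery read from the table being updated (error 1093). It is very annoying.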
Fair warning: MySQL isn't my native environment. However, after a little research, I think the following query should work:
UPDATE myTable AS X
JOIN (
  SELECT X2.ID, (
    SELECT COUNT(*)
    FROM myTable X1
    WHERE (X2.M1, X2.ID) >= (X1.M1, X1.ID)
  ) AS RowNum
  FROM myTable AS X2
) AS RowRank
  ON (X.ID = RowRank.ID)
CROSS JOIN (
  SELECT COUNT(*) AS TotalCount
  FROM myTable
) AS TotalCount
SET X.M1_Percentile = RowRank.RowNum / TotalCount.TotalCount * 100;
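The slow part of every query here is counting, for each row, how many rows have an M1 less than or equal to its own, which is O(n²) as a self-join or correlated subquery. Sorting the values once lets a binary search answer each count in O(log n). The Python sketch below (the function name `percentiles` is illustrative) reproduces the 10-row result from the question. If you are on MySQL 8.0, CUME_DIST() OVER (ORDER BY M1) is documented to return the fraction of rows with values less than or equal to the current row's, so multiplying it by 100 should give the same column; that is untested here at 400,000-row scale.

```python
from bisect import bisect_right


def percentiles(values):
    """Percent of rows whose value is <= each row's value.

    One sort plus a binary search per row: O(n log n) instead of the
    O(n^2) self-join. bisect_right counts ties as 'equal or lower'.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) * 100 / n for v in values]


# M1 values of the 10 sample rows, in ID order.
M1 = [3.2997, 3.0476, 2.6499, 0.3585, 1.6919, 2.8515, 4.0728, 4.0224, 5.8207, 6.8291]
print(percentiles(M1))
```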