MySQL Question - I want to eliminate all text within any parenthesis - mysql

Just checking to see if any of you would have a solution for this – from the below text like this I want to eliminate all text within any parenthesis.
Input –
PAY - addition,FILES (aaaaaaaaaaaaaa/bbbbbbbbbbbs i.e. ssss,ffff – i.e. cccccc),DED (ppppppp, llllll, fffff gggg),LOSS (ddddd, hhhhhh – i.e.),F TO G ( “F” is switching to “G”)
Output –
PAY - addition,FILES,DED,LOSS,F TO G

If you are running MySQL 8.0, you can do this with regexp_replace():
regexp_replace(mytext, '\\([^)]*\\)', '')
This works as long as there are no nested parentheses in the expression (which is consistent with your sample data).
Demo on DB Fiddle:
select regexp_replace(
'PAY - addition,FILES (aaaaaaaaaaaaaa/bbbbbbbbbbbs i.e. ssss,ffff – i.e. cccccc),DED (ppppppp, llllll, fffff gggg),LOSS (ddddd, hhhhhh – i.e.),F TO G ( “F” is switching to “G”)',
'\\([^)]*\\)',
''
) val
| val |
| :--------------------------------------- |
| PAY - addition,FILES ,DED ,LOSS ,F TO G |

Another on for MYSQL8.0:
SET #input:="PAY - addition,FILES (aaaaaaaaaaaaaa/bbbbbbbbbbbs i.e. ssss,ffff – i.e. cccccc),DED (ppppppp, llllll, fffff gggg),LOSS (ddddd, hhhhhh – i.e.),F TO G ( “F” is switching to “G”)";
with recursive cte as (
select
0 i,
#input as text
union all
select
i+1,
CASE WHEN instr(text,'(') >0 AND instr(text,')')>instr(text,'(') THEN REPLACE(text, substring(text,instr(text,'('),instr(text,')')-instr(text,'(')+1), '') ELSE '' END
from cte
where i<10
) select text from cte where text<>'' order by i desc limit 1;
output:
+------------------------------------------+
| text |
+------------------------------------------+
| PAY - addition,FILES ,DED ,LOSS ,F TO G |
+------------------------------------------+
1 row in set (0.00 sec)

Related

How to show positions of '1' within a long number mysql

I know I can find the first position of 1 with my number by using the following:
SELECT POSITION("1" IN "0000100001000001");
How would I find all the positions of 1 to return 5;10;16
Your string appears to be a 16-bit number represented in base-2.
I set a user variable to your example string.
set #bin = '0000100001000001';
We can use CONV() to convert it to a base-10 number instead of base-2. This allows the integer value to be used when we use it in numeric expressions.
mysql> select conv(#bin, 2, 10);
+-------------------+
| conv(#bin, 2, 10) |
+-------------------+
| 2113 |
+-------------------+
Then we can test for a particular bit set in this number using the & bitwise-and operator.
mysql> select conv(#bin, 2, 10) & 64;
+------------------------+
| conv(#bin, 2, 10) & 64 |
+------------------------+
| 64 |
+------------------------+
We can test all the bits of the integer value. If a given bit is set, then substitute the "position," as you call it, for that bit (counting from left to right, which is the opposite of the traditional bit positions).
If the bit is not set, then default to NULL. Then concatenate these together using CONCAT_WS(), which ignores NULLs.
select concat_ws(';',
case conv(#bin,2,10)&32768 when 32768 then 1 end,
case conv(#bin,2,10)&16384 when 16384 then 2 end,
case conv(#bin,2,10)&8192 when 8192 then 3 end,
case conv(#bin,2,10)&4096 when 4096 then 4 end,
case conv(#bin,2,10)&2048 when 2048 then 5 end,
case conv(#bin,2,10)&1024 when 1024 then 6 end,
case conv(#bin,2,10)&512 when 512 then 7 end,
case conv(#bin,2,10)&256 when 256 then 8 end,
case conv(#bin,2,10)&128 when 128 then 9 end,
case conv(#bin,2,10)&64 when 64 then 10 end,
case conv(#bin,2,10)&32 when 32 then 11 end,
case conv(#bin,2,10)&16 when 16 then 12 end,
case conv(#bin,2,10)&8 when 8 then 13 end,
case conv(#bin,2,10)&4 when 4 then 14 end,
case conv(#bin,2,10)&2 when 2 then 15 end,
case conv(#bin,2,10)&1 when 1 then 16 end) as bits_set;
Output:
+----------+
| bits_set |
+----------+
| 5;10;16 |
+----------+
There is no such functionality built in. You can create your own function for this.
delimiter $$
create function f_position_multiple(
in_f char(1),
in_str text
)
returns text
begin
declare v_delim char(1);
declare v_loc int;
declare v_ret text;
set v_ret = '';
set v_delim = '';
set v_loc = 0;
set v_loc = locate(in_f, in_str, v_loc+1);
while(v_loc>0) do
set v_ret = concat(v_ret, v_delim, v_loc);
set v_delim = ';';
set v_loc = locate(in_f, in_str, v_loc+1);
end while;
return v_ret;
end
$$
And then you can use:
select f_position_multiple('1', '1001001')
A solution for MySql 8.0+ with a recursive CTE:
set #n = '0000100001000001';
with recursive cte as (
select 0 pos, ' ' bit
union all
select pos + 1, substring(#n, pos + 1, 1)
from cte
where pos < length(#n)
)
select group_concat(pos order by pos separator ';') result
from cte
where bit = '1'
See the demo.
Result:
| result |
| ------- |
| 5;10;16 |

Two things to do in MySQL IF()

I have a problem concerning the IF() function in MySQL.
I would like to return a string and change the value of a variable. Somwhat like:
IF(#firstRow=1, "Dear" AND #firstRow:=0, "dear")
This outputs only '0' instead of 'Dear'...
I would be very thankful for some input on ways I could solve this problem!
Louis :)
AND is a boolean operator, not a "also do this other thing" operator.
"Dear" AND 0 returns 0 because 0 is treated as false in MySQL and <anything> AND false will return false.
Also because the integer/boolean value of "Dear" is 0 as well. Using a string in a numeric context just reads initial digits in the string, if any, and ignores the rest.
It's not clear what your problem is, but I guess you want to capitalize the word "dear" if the row is the first one in the result set.
Instead of being too clever by half trying to fit the side-effect into your expression, do yourself a favor and break it out into a separate column:
mysql> SELECT IF(#firstRow=1, 'Dear', 'dear'), #firstRow:=0 AS _ignoreThis
-> FROM (SELECT #firstRow:=1) AS _init
-> CROSS JOIN
-> mytable;
+---------------------------------+-------------+
| IF(#firstRow=1, 'Dear', 'dear') | _ignoreThis |
+---------------------------------+-------------+
| Dear | 0 |
| dear | 0 |
| dear | 0 |
+---------------------------------+-------------+
But if you really want to make your code as confusing and unreadable as possible, you can do something like this:
SELECT IF(#firstRow=1, CONCAT('Dear', IF(#firstRow:=0, '', '')), 'dear')
FROM (SELECT #firstRow:=1) AS _init
CROSS JOIN
...
But remember this important metric of code quality: WTFs per minute.
Use a case expression instead of IF() as the syntax is far easier to follow e.g.
select
case when #firstRow = 1 then 'Dear' else 'dear' end AS Salutation
, #firstRow := 0
from (
select 1 n union all
select 2 n union all
select 3
) d
cross join (SELECT #firstRow:=1) var
+---+------------+----------------+
| | Salutation | #firstRow := 0 |
+---+------------+----------------+
| 1 | Dear | 0 |
| 2 | dear | 0 |
| 3 | dear | 0 |
+---+------------+----------------+
Demo

Order by for column in varchar type

I have the following column strand which is ordered in ascending order but its taking 3.10 as next after 3.1 instead of 3.2..
the column is varchar type..
Strand
3.1
3.1.1
3.1.1.1
3.1.1.2
3.1.2
3.1.2.1
3.10 # wrong
3.10.1 # wrong
3.10.1.1 # wrong
3.2 <- this should have been after 3.1.2.1
3.2.1
3.2.1.1
..
3.9
3.9.1.1
<- here is where 3.10 , 3.10.1 and 3.10.1.1 should reside
I used the following query to order it;
SELECT * FROM [table1]
ORDER BY RPAD(Strand,4,'.0') ;
how to make sure its ordered in the right way such that 3.10,3.10.1 and 3.10.1.1 is at last
Try this:
DROP TABLE T1;
CREATE TABLE T1 (Strand VARCHAR(20));
INSERT INTO T1 VALUES ('3.1');
INSERT INTO T1 VALUES('3.1.1');
INSERT INTO T1 VALUES('3.1.1.1');
INSERT INTO T1 VALUES('3.1.1.2');
INSERT INTO T1 VALUES('3.2');
INSERT INTO T1 VALUES('3.2.1');
INSERT INTO T1 VALUES('3.10');
INSERT INTO T1 VALUES('3.10.1');
SELECT * FROM T1
ORDER BY STRAND;
SELECT *
FROM T1
ORDER BY
CAST(SUBSTRING_INDEX(CONCAT(Strand+'.0.0.0.0','.',1) AS UNSIGNED INTEGER) *1000 +
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(Strand,'.0.0.0.0'),'.',2),'.',-1) AS UNSIGNED INTEGER) *100 +
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(Strand,'.0.0.0.0'),'.',3),'.',-1) AS UNSIGNED INTEGER) *10 +
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(Strand,'.0.0.0.0'),'.',4),'.',-1) AS UNSIGNED INTEGER)
Output not ordeded:
Strand
1 3.1
2 3.1.1
3 3.1.1.1
4 3.1.1.2
5 3.10
6 3.10.1
7 3.2
8 3.2.1
Output Ordered:
Strand
1 3.1
2 3.1.1
3 3.1.1.1
4 3.1.1.2
5 3.2
6 3.2.1
7 3.10
8 3.10.1
you can order the result baset on the integer value of your field. your code will looks like
select [myfield]from [mytable] order by
convert(RPAD(replace([myfield],'.',''),4,0),UNSIGNED INTEGER);
in this code replace function will cleand the dots (.)
hope thin help
You must normalize each group of digits
SELECT * FROM [table1]
ORDER BY CONCAT(
LPAD(SUBSTRING_INDEX(Strand,'.',1),3,'0'), '-',
LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX(Strand,'.',2),'.',-1),3,'0'), '-',
LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX(Strand,'.',3),'.',-1),3,'0'), '-',
LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX(Strand,'.',3),'.',-1),3,'0'));
sample
mysql> SELECT CONCAT(
-> LPAD(SUBSTRING_INDEX('3.10.1.1','.',1),3,'0'), '-',
-> LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX('3.10.1.1','.',2),'.',-1),3,'0'), '-',
-> LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX('3.10.1.1','.',3),'.',-1),3,'0'), '-',
-> LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX('3.10.1.1','.',3),'.',-1),3,'0'));
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CONCAT(
LPAD(SUBSTRING_INDEX('3.10.1.1','.',1),3,'0'), '-',
LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX('3.10.1.1','.',2),'.',-1),3,'0'), '-',
LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX('3.10.1.1','.',3),'.',-1),3,'0'), '-',
LPAD(SUBSTRING_INDEX(SUBSTRI |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 003-010-001-001 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)
Cause "strand" column is text data, so it will be ordered in alphabetical. To make it be ordered as your desire, you should format your data before insert or update it. Suppose maximum digit for each level is 3, your data should be formated like this
003.001
003.001.001
003.001.001.001
003.002
003.002.001
003.002.001.001
003.010
010.001
The altenative way is splitting "strand" column into mutiple columns. Each column will store data for each level, such as
Level1 | Level2 | Level3 ...
3 | 1 | 0
3 | 1 | 1
3 | 2 | 0
...
3 | 10 | 0
Datatype of these columns should be number and then you should be able to order by these columns.
If the point(.) in your data is no more than 3, you can try this:
select *
from demo
order by replace(Strand, '.', '') * pow(10, (3 + length(replace(Strand, '.', '')) - length(Strand)))
If the point is uncertain, here you can should use subquery to get max num of point:
select demo.Strand
from demo
cross join (
select max(length(Strand) - length(replace(Strand, '.', ''))) as num from demo
) t
order by replace(Strand, '.', '') * pow(10, (num + length(replace(Strand, '.', '')) - length(Strand)))
See demo in Rextester.
As you see, I've used function replace, length, pow in order by clause.
1) replace(Strand, '.', '') will give us int number, like:
replace('3.10.1.1', '.', '') => 31011;
2) (3 + length(replace(Strand, '.', '')) - length(Strand)) will give us the count of point which the max num of point minus point's count in Strand, like:
3.1 => 2;
3)pow returns the value of X raised to the power of Y;
so the sample data will be calculated like:
3100
3110
3111
3112
3120
3121
31000
31010
31011
3200
3210
3211
3900
3911
by these nums, you will get the right sort.

MYSQL query average price

I have to calculate the average price of a house in Groningen.
Though the price is not stored as an number but as a string (with some additional information) and it uses a point ('.') as a thousands separator.
Price is stored as 'Vraagprijs' in Dutch.
The table results are:
€ 95.000 k.k.
€ 116.500 v.o.n.
€ 115.000 v.o.n.
and goes so on...
My query:
'$'SELECT AVG(SUBSTRING(value,8,8)) AS AveragePrice_Groningen
FROM properties
WHERE name = 'vraagprijs'
AND EXISTS (SELECT *
FROM estate
WHERE pc_wp LIKE '%Groningen%'
AND properties.woid = estate.id);
The result is:
209.47509187620884
But it has to be:
20947509187620,884
How can i get this done?
The AVG(SUBSTRING(value,8,8)) dosent work:
sample
MariaDB [yourSchema]> SELECT *,SUBSTRING(`value`,8,8), SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1) FROM properties;
+----+-----------------------+------------------------+----------------------------------------------------------+
| id | value | SUBSTRING(`value`,8,8) | SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1) |
+----+-----------------------+------------------------+----------------------------------------------------------+
| 1 | € 95.000 k.k. | 95.000 k | 95.000 |
| 2 | € 116.500 v.o.n. | 116.500 | 116.500 |
| 3 | € 115.000 v.o.n. | 115.000 | 115.000 |
+----+-----------------------+------------------------+----------------------------------------------------------+
3 rows in set (0.00 sec)
MariaDB [yourSchema]>
**change it to **
AVG(SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1))
Try using a CAST DECIMAL and SPLIT for get the right part of the string
'$'
SELECT AVG( CAST(SPLIT_STR(value,' ', 2)) AS DECIMAL) AS AveragePrice_Groningen
FROM properties
WHERE name = 'vraagprijs'
AND EXISTS (SELECT *
FROM estate
WHERE pc_wp LIKE '%Groningen%'
AND properties.woid = estate.id);
You entered the data with the . as decimal separator, which is normal in Dutch, but not normal in English where they tend to use the , as decimal separator.
Enter the data into you database as 215000.000, etc and you should get normal values as answer.

Two methods of performing cohort analysis in MySQL using joins

I make a cohort analysis processor. Input parameters: time range and step, condition (initial event) to exctract cohorts, additional condition (retention event) to check after each N hours/days/months. Output parameters: cohort analysis grid, like this:
0h | 16h | 32h | 48h | 64h | 80h | 96h |
cohort #00 15 | 6 | 4 | 1 | 1 | 2 | 2 |
cohort #01 1 | 35 | 8 | 0 | 2 | 0 | 1 |
cohort #02 0 | 3 | 31 | 11 | 5 | 3 | 0 |
cohort #03 0 | 0 | 4 | 27 | 7 | 6 | 2 |
cohort #04 0 | 1 | 1 | 4 | 29 | 4 | 3 |
Basically:
fetch cohorts: unique users who did something 1 in every period from time_begin every time_step.
find how many of them (in each cohort) did something 2 after N seconds, N*2 seconds, N*3, and so on until now.
In short - I have 2 solutions. One works too slow and includes a heavy select with joins for each data step: 1 day, 2 day, 3 day, etc. I want to optimize it by joining result for every data step to cohorts - and it's the second solution. It looks like it works but I'm not sure it's the best way and that it will give the same result even if cohorts will intersect. Please check it out.
Here's the whole story.
I have a table of > 100,000 events, something like this:
#user-id, timestamp, event_name
events_view (uid varchar(64), tm int(11), e varchar(64))
example input row:
"user_sampleid1", 1423836540, "level_end:001:win"
To make a cohort analisys first I extract cohorts: for example, users, who send special event '1st_launch' in 10 hour periods starting from 2015-02-13 and ending with 2015-02-16. All code in this post is simplified and shortened to see the idea.
DROP TABLE IF EXISTS tmp_c;
create temporary table tmp_c (uid varchar(64), tm int(11), c int(11) );
set beg = UNIX_TIMESTAMP('2015-02-13 00:00:00');
set en = UNIX_TIMESTAMP('2015-02-16 00:00:00');
select min(tm) into t_start from events_view ;
select max(tm) into t_end from events_view ;
if beg < t_start then
set beg = t_start;
end if;
if en > t_end then
set en = t_end;
end if;
set period = 3600 * 10;
set cnt_c = ceil((en - beg) / period) ;
/*works quick enough*/
WHILE i < cnt_c DO
insert into tmp_c (
select uid, min(tm), i from events_view where
locate("1st_launch", e) > 0 and tm > (beg + period * i)
AND tm <= (beg + period * (i+1)) group by uid );
SET i = i+1;
END WHILE;
Cohorts may consist the same user ids, though usually one user is exist only in one cohort. And in each cohort users are unique.
Now I have temp table like this:
user_id | 1st timestamp | cohort_no
uid1 1423836540 0
uid2 1423839540 0
uid3 1423841160 1
uid4 1423841460 2
...
uidN 1423843080 M
Then I need to again divide time range on periods and calculate for each period how many users from each cohort have sent event "level_end:001:win".
For each small period I select all unique users who have sent "level_end:001:win" event and left join them to tmp_c cohorts table. So I have something like this:
user_id | 1st timestamp | cohort_no | user_id | other fields...
uid1 1423836540 0 uid1
uid2 1423839540 0 null
uid3 1423841160 1 null
uid4 1423841460 2 uid4
...
uidN 1423843080 M null
This way I see how many users from my cohorts are in those who have sent "level_end:001:win", exclude not found by where clause: where t2.uid is not null.
Finally I perform grouping and have counts of users in each cohort, who have sent "level_end:001:win" in this particluar period.
Here's the code:
DROP TABLE IF EXISTS tmp_res;
create temporary table tmp_res (uid varchar(64) CHARACTER SET cp1251 NOT NULL, c int(11), cnt int(11) );
set i = 0;
set cnt_c = ceil((t_end - beg) / period) ;
WHILE i < cnt_c DO
insert into tmp_res
select concat(beg + period * i, "_", beg + period * (i+1)), c, count(distinct(uid)) from
(select t1.uid, t1.c from tmp_c t1 left join
(select uid, min(tm) from events_view where
locate("level_end:001:win", e) > 0 and
tm > (beg + period * i) AND tm <= (beg + period * (i+1)) group by uid ) t2
on t1.uid = t2.uid where t2.uid is not null) t3
group by c;
SET i = i+1;
END WHILE;
/*getting result of the first method: tooo slooooow!*/
select * from tmp_res;
The result I've got (it's ok that some cohorts are not appear on some periods):
"1423832400_1423890000","1","35"
"1423832400_1423890000","2","3"
"1423832400_1423890000","3","1"
"1423832400_1423890000","4","1"
"1423890000_1423947600","1","21"
"1423890000_1423947600","2","50"
"1423890000_1423947600","3","2"
"1423947600_1424005200","1","9"
"1423947600_1424005200","2","24"
"1423947600_1424005200","3","70"
"1423947600_1424005200","4","6"
"1424005200_1424062800","1","7"
"1424005200_1424062800","2","15"
"1424005200_1424062800","3","21"
"1424005200_1424062800","4","32"
"1424062800_1424120400","1","7"
"1424062800_1424120400","2","13"
"1424062800_1424120400","3","24"
"1424062800_1424120400","4","18"
"1424120400_1424178000","1","10"
"1424120400_1424178000","2","12"
"1424120400_1424178000","3","18"
"1424120400_1424178000","4","14"
"1424178000_1424235600","1","6"
"1424178000_1424235600","2","7"
"1424178000_1424235600","3","9"
"1424178000_1424235600","4","12"
"1424235600_1424293200","1","6"
"1424235600_1424293200","2","8"
"1424235600_1424293200","3","9"
"1424235600_1424293200","4","5"
"1424293200_1424350800","1","5"
"1424293200_1424350800","2","3"
"1424293200_1424350800","3","11"
"1424293200_1424350800","4","10"
"1424350800_1424408400","1","8"
"1424350800_1424408400","2","5"
"1424350800_1424408400","3","7"
"1424350800_1424408400","4","7"
"1424408400_1424466000","2","6"
"1424408400_1424466000","3","7"
"1424408400_1424466000","4","3"
"1424466000_1424523600","1","3"
"1424466000_1424523600","2","4"
"1424466000_1424523600","3","8"
"1424466000_1424523600","4","2"
"1424523600_1424581200","2","3"
"1424523600_1424581200","3","3"
It works but it takes too much time to process because there are many queries here instead of one, so I need to rewrite it.
I think it can be rewritten with joins, but I'm still not sure how.
I decided to make a temporary table and write period boundaries in it:
DROP TABLE IF EXISTS tmp_times;
create temporary table tmp_times (tm_start int(11), tm_end int(11));
set cnt_c = ceil((t_end - beg) / period) ;
set i = 0;
WHILE i < cnt_c DO
insert into tmp_times values( beg + period * i, beg + period * (i+1));
SET i = i+1;
END WHILE;
Then I get periods-to-events mapping (user_id + timestamp represent particular event) to temp table and left join it to cohorts table and group the result:
SELECT Concat(tm_start, "_", tm_end) per,
t1.c coh,
Count(DISTINCT( t2.uid ))
FROM tmp_c t1
LEFT JOIN (SELECT *
FROM tmp_times t3
LEFT JOIN (SELECT uid,
tm
FROM events_view
WHERE Locate("level_end:101:win", e) > 0)
t4
ON ( t4.tm > t3.tm_start
AND t4.tm <= t3.tm_end )
WHERE t4.uid IS NOT NULL
ORDER BY t3.tm_start) t2
ON t1.uid = t2.uid
WHERE t2.uid IS NOT NULL
GROUP BY per,
coh
ORDER BY per,
coh;
In my tests this returns the same result as method #1. I can't check the result manually, but I understand how method #1 work more and as far I can see it gives what I want. Method #2 is faster, but I'm not sure it's the best way and it will give the same result even if cohorts will intersect.
Maybe there are well-known common methods to perform a cohort analysis in SQL? Is method #1 I use more reliable than method #2? I work with joins not that often, that's why still do not fully understand joins magic yet.
Method #2 looks like pure magic, and I used to not believe in what I don't understand :)
Thanks for answers!