Replace all occurrences of subquery in select result - mysql

Working on an export from a Sparx EA database in MySQL.
The database contains objects that have notes
select o.Note from t_object o
The result could be
Note
Contains reference to term1 and term2
Another note that mentions term1 only
A note that doesn't mention any terms
There is also a glossary that I can query like this
select g.TERM
from t_glossary g
union
select o.Name
from t_diagram d
join t_diagramobjects dgo
on dgo.Diagram_ID = d.Diagram_ID
join t_object o
on o.Object_ID = dgo.Object_ID
where 1=1
and d.styleEx like '%MDGDgm=Glossary Item Lists::GlossaryItemList;%'
The result of this query
TERM
term1
term2
The requirement is that I underline each word in the notes of the first query that is an exact match to one of the terms in the second query. Underlining can be done by enclosing the word in <u> </u> tags
So the final query result should be
Note
Contains reference to <u>term1</u> and <u>term2</u>
Another note that mentions <u>term1</u> only
A note that doesn't mention any terms
Is there any way to do this in a select query? (so without variables, temp tables, loops, and all that stuff)

I think regular expressions might be a better approach. For your example, you want:
select regexp_replace(note, '(term1|term2)', '<u>$1</u>')
from t_object;
You can easily construct this in MySQL as:
select regexp_replace(note, pattern, '<u>$1</u>')
from t_object cross join
(select concat('(', group_concat(term separator '|'), ')') as pattern
from t_glossary
) g;
Here is a db<>fiddle.
Regular expressions have a key advantage that they give you more flexibility on the word boundaries. The above replaces any occurrence of the terms, regardless of surrounding characters. But you can adjust that using the power of regular expressions.
I might also suggest that such a replacement could be done -- using regular expressions -- at the application layer.
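As a rough sketch of the application-layer variant with word boundaries, assuming Python and its standard re module (the function name underline_terms and the sample list are made up for illustration, not part of the original answer):

```python
import re

def underline_terms(note, terms):
    """Wrap each whole-word occurrence of a glossary term in <u>...</u>."""
    if not terms:
        # An empty alternation would match the empty string, so bail out early.
        return note
    # Longest terms first, so a longer term wins over a shorter one it starts with.
    ordered = sorted(terms, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '+' in a term literal.
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(pattern, r"<u>\1</u>", note)

glossary = ["term1", "term2"]  # hypothetical result of the glossary query
print(underline_terms("Contains reference to term1 and term2", glossary))
```

The \b anchors are what stop term1 from matching inside term10. MySQL 8.0's regular expression engine should accept the same anchors, written as '\\b' inside a string literal, but check that against your server version.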

Here I have replaced every TERM from the t_glossary table that occurs in the note column of the t_object table with <u>TERM</u>.
Schema:
create table t_object(note varchar(500));
insert into t_object
select 'Contains reference to term1 and term2' as Note
union all
select 'Another note that mentions term1 only'
union all
select 'A note that doesn''t mention any terms';
create table t_glossary (TERM varchar(500));
insert into t_glossary
select 'term1 '
union all
select 'term2';
Query:
WITH recursive CTE (note, note2, level) AS
(
SELECT note, note , 0 level
FROM t_object
UNION ALL
SELECT CTE.note,
REPLACE(CTE.note2, g.TERM, concat(' <u>', g.term , '</u> ')), CTE.level + 1
FROM CTE
INNER JOIN t_glossary g ON CTE.note2 LIKE concat('%' , g.TERM , '%') and CTE.note2 not like concat('%<u>', g.term , '</u>%')
)
SELECT DISTINCT note2, note, level
FROM CTE
WHERE level =
(SELECT MAX(level) FROM CTE c WHERE CTE.note = c.note)
Output:
note2
note
level
A note that doesn't mention any terms
A note that doesn't mention any terms
0
Another note that mentions <u>term1 </u> only
Another note that mentions term1 only
1
Contains reference to <u>term1 </u> and <u>term2</u>
Contains reference to term1 and term2
2
db<>fiddle here

Related

how to query by checking if specific fields start with a value from a given array?

(MySQL)
I have a query that checks whether 'phone_number' or 'fax_number' starts with a value from a given array,
let's say const possibleValues = [123,432,645,234]
currently my query runs with the 'or' condition, to check if -
'phone_number' or 'fax_number' that starts with 123
or
'phone_number' or 'fax_number' that starts with 432
or
'phone_number' or 'fax_number' that starts with 645
or
'phone_number' or 'fax_number' that starts with 234
It runs extremely slowly on a big database.
Is there a way to make it run faster?
I'm kinda new to SQL queries,
any help would be highly appreciated!
You can try something like:
SELECT * FROM table_1
WHERE CONCAT(',', `phone_number`, ',') REGEXP ',(123|432|645|234),'
OR CONCAT(',', `fax_number`, ',') REGEXP ',(123|432|645|234),';
Demo
Try creating an in-line table and join with it.
WITH telnostart(telnostart) AS (
SELECT '123'
UNION ALL SELECT '432'
UNION ALL SELECT '645'
UNION ALL SELECT '234'
)
SELECT
*
FROM your_search_table
JOIN telnostart ON (
LEFT(tel_number,3) = telnostart
OR LEFT(fax_number,3) = telnostart
);
You can use a CASE expression to add a flag column:
select *
,case when left(phone_number,3) in ('123','432','645','234') or left(fax_number,3) in ('123','432','645','234') then 1 else 0 end as contact_check_flag
from table_name
As per your requirement, you can filter it or use it elsewhere.
SELECT * FROM table_1
WHERE `phone_number` REGEXP '^(123|432|645|234)'
OR `fax_number` REGEXP '^(123|432|645|234)';
But it won't be fast. (And no regular INDEX will help.)
If the phone numbers are spelled out like in the US: "123-456-7890", then you could use a FULLTEXT(phone_number, fax_number) index and
SELECT * FROM table_1
WHERE MATCH(phone_number, fax_number)
AGAINST('123 432 645 234');
This is likely to be much faster, but not as "general".
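Since the prefixes come from an array, the anchored pattern used in the REGEXP answer can be built rather than typed by hand. A minimal sketch, assuming Python on the application side (prefix_pattern is a hypothetical helper, and the sample numbers are invented):

```python
import re

def prefix_pattern(prefixes):
    """Build an anchored alternation such as ^(123|432) from a list of prefixes."""
    # str() lets the list hold numbers; re.escape guards against special characters.
    return "^(" + "|".join(re.escape(str(p)) for p in prefixes) + ")"

possible_values = [123, 432, 645, 234]
pattern = prefix_pattern(possible_values)
print(pattern)

# Quick local check of what the pattern accepts before sending it to the database.
print(bool(re.search(pattern, "123-456-7890")))
print(bool(re.search(pattern, "912-345-6780")))
```

The resulting string would be passed to the query as a bound parameter; this only saves typing and does not change the performance caveat above.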

How to SQL select excluding strings with N duplicate characters in a row

How to make a select from a table excluding rows with N duplicate characters in a row in a certain column? Let's say N=5
'0000011114BR13471' // Exclude
'554XXXXXXXXXXXXXX' // Exclude
'000111114BR134716' // Exclude
'000011114BR134716' // Include
'11880000000000000' // Exclude
'12345678901200000' // Exclude
'12345678901200001' // Include
I tried many combinations but none of them worked. For example:
SELECT * FROM mytable WHERE not (mycolumn regexp '(.)\1{5,}');
Thank you!
You can use the LIKE to match regular Expression and EXCEPT clause to exclude unwanted Results. A query like this might work
In SQL SERVER
Select * from myTable
EXCEPT
Select * from myTable WHERE ColumnName like '(.)\1{4,}'
In MySQL
Select * from myTable
where ColumnName Not In(
Select ColumnName from myTable WHERE ColumnName RLIKE '(.)\\1{4,}')
Here N=5. The 4 in the regular expression counts the repeats that follow the captured character, so the pattern matches 5 duplicates in a row.
I don't think MySQL supports back-references in regular expressions -- which is a shame for your problem. One method is brute force:
select t.*
from t
where col not regexp '0{5}|1{5}|2{5}|3{5}|4{5}|5{5}|6{5}|7{5}|8{5}|9{5}|X{5}';
Another method is a recursive CTE, which breaks up the string into individual characters and then uses window functions to determine if there are 5 in a row:
with recursive cte as (
select col,left(col, 1) as chr, substr(col, 2) as rest, 1 as lev
from t
union all
select col, left(rest, 1), substr(rest, 2), lev + 1
from cte
where rest <> ''
)
select col
from (select cte.*,
lead(lev, 4) over (partition by col, chr order by lev) as chr_4
from cte
) x
group by col
-- coalesce covers strings where no character occurs 5 times (max() is NULL there)
having coalesce(max(chr_4 = lev + 4), 0) = 0
Here is a db<>fiddle.
In MySQL 8.0:
mycolumn regexp '(.)\\1{4}'
Notes:
Two backslashes are needed.
Since there is 1 selected (.), you need to check for only 4 more, not 5.
The , (meaning "or more") is unnecessary.
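To see the counting in those notes at work, here is a small sketch that runs the same back-reference pattern over the sample values from the question. It assumes Python's re module rather than MySQL, and has_run is a made-up name:

```python
import re

def has_run(value, n=5):
    """Return True if some character repeats at least n times in a row."""
    # (.) captures one character; \1{n-1} demands n-1 further copies right after it.
    return re.search(r"(.)\1{%d}" % (n - 1), value) is not None

samples = [
    "0000011114BR13471",
    "554XXXXXXXXXXXXXX",
    "000111114BR134716",
    "000011114BR134716",
    "11880000000000000",
    "12345678901200000",
    "12345678901200001",
]

kept = [s for s in samples if not has_run(s)]
print(kept)
```

Only the two values marked "Include" in the question survive the filter.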

Replace substrings using a reference table in MYSQL

I am trying to replace substrings within one text column in my table using a reference table.
To my knowledge, the replace(column, string1,string2) function will only work with strings as the second and third input.
Here is a visual of what I am trying to do. To be clear, the reference table I need to use is much larger - otherwise, I would use four replace functions.
EDIT: Thank you to everyone who has pointed out how bad this data model is built. Though I am not an expert on building efficient data models, I do know this one is built terribly. However, the structure of this model is completely out of my control. Apologies for not mentioning that from the get-go.
table1
Farms
Animals
Farm1
Cow, Pig
Farm2
Dog, Cow, Cat
Farm3
Dog
referenceTable
refColumn1
refColumn2
Cow
Moo
Pig
Oink
Dog
Bark
Cat
Meow
And here is what I would like the result column to be..
table1
Farms
Animals
Farm1
Moo, Oink
Farm2
Bark, Moo, Meow
Farm3
Bark
First question on stackoverflow so apologies if I missed anything.
Any help is appreciated! Thank you!
To loop over comma (or ', ' in this case) separated values, you can use a double substring_index and a join against a sequence table (where the sequence is <= the number of joined values in a given row, as determined with char_length/replace):
select t1.Farms, group_concat(rt.refColumn2 order by which.n separator ', ') Animals
from table1 t1
join (select 1 n union select 2 union select 3) which
on ((char_length(t1.Animals)-char_length(replace(t1.Animals,', ','')))/char_length(', '))+1 >= which.n
join referenceTable rt on rt.refColumn1=substring_index(substring_index(t1.Animals,', ',which.n),', ',-1)
group by t1.Farms
Here I use an ad hoc sequence table of 1 through 3, assuming no row will have more than 3 animals; expand as necessary or alternatively use a cte.
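If the data can be post-processed outside the database, the same split / look up / re-join logic is short to write. A minimal sketch in Python, assuming the reference table has been loaded into a dict (translate_list and sounds are hypothetical names):

```python
def translate_list(animals, mapping, sep=", "):
    """Replace each item of a separator-delimited string via a lookup table."""
    # Items missing from the mapping are kept unchanged; the SQL join above
    # would drop them instead, so pick whichever behaviour you need.
    return sep.join(mapping.get(item, item) for item in animals.split(sep))

# Stand-in for referenceTable (refColumn1 -> refColumn2).
sounds = {"Cow": "Moo", "Pig": "Oink", "Dog": "Bark", "Cat": "Meow"}

for farm, animals in [("Farm1", "Cow, Pig"), ("Farm2", "Dog, Cow, Cat"), ("Farm3", "Dog")]:
    print(farm, translate_list(animals, sounds))
```

Unlike the ad hoc sequence table, this has no upper limit on the number of items per row.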
You have a really lousy data model and you should fix it. You should not be storing multiple values in a string column. Each value pair should be on its own row.
Let me assume that someone else created these tables and you have no choice. If that is the case, MySQL has a solution. I think I would suggest:
select t1.*, -- or whatever columns you want
(select group_concat(rt.refColumn2
order by find_in_set(rt.refColumn1, replace(t1.animals, ', ', ','))
separator ', '
)
from referenceTable rt
where find_in_set(rt.refColumn1, replace(t1.animals, ', ', ',')) > 0
)
from table1 t1
I'm more fluent in SQL Server than MySQL; having got a solution working in SQL Server, the real challenge was converting it to a working MySQL version!
See if this meets your needs. It works for your sample data, you may of course need to tweak if it doesn't fully represent your real world data.
with w as (
select *, case when animals like concat('%', refcol1, '%') then locate(refcol1,animals) end pos
from t1
join lateral (select * from t2)t2 on 1=1
)
select farms, group_concat(refcol2 order by pos separator ', ') as Animals
from w
where pos>0
group by farms
order by farms
Working DB<>Fiddle

MySQL query to categorise addresses based on values using a regular expression

I have a table with one column of addresses, and a list of some 10-11 place names.
When I query that table using 'Select * ...', I want to create a new column that holds the place name matched in the address field if one exists, else 'Not Found'.
The table has an address column as below. I want to extract areas from it such as BTM Layout, Wilson Garden.
When I run the select query, the output should be the address field plus one more field giving the location area extracted from the address. If no value matches the address field, it should display 'Area Nt Specified'.
Consider a cross join query (a query with no joins but a list of tables in the FROM clause) between the larger table of addresses (t1) and the smaller table of your 10-11 places (t2) holding the BTM Layout, Wilson Garden... values. This will be scalable instead of manually entering/editing places in an IN clause.
Then use a LIKE expression in a WHERE clause to match the places which are a part of the larger address string. However, to return all original address values with matched places use the LEFT JOIN...NOT NULL query with cross join as derived table (sub).
SELECT `maintable`.`address`, IFNULL(sub.`place`, 'Area Nt Specified') As matchplaces
FROM `maintable`
LEFT JOIN
(SELECT t1.ID, t1.address, t2.place
FROM `maintable` As t1,
(SELECT `place` FROM `placestable`) As t2
WHERE t1.address LIKE Concat('%',t2.place,'%')) As sub
ON `maintable`.ID = sub.ID
WHERE `maintable`.ID IS NOT NULL;
If you really need to use a regular expression, replace the LIKE expression in the derived table with the line below:
WHERE t1.address regexp t2.place
If you have a list of known places, then you can do:
select (case when address regexp '(, BTM Layout|, Bapuji Nagar|, Adugodi)$'
then substring_index(address, ', ', -1)
else 'Not Found'
end)
You can expand the regular expression to include as many places as you like.
Or alternatively, you don't really need a regular expression:
select (case when substring_index(address, ', ', -1) in ('BTM Layout', 'Bapuji Nagar', 'Adugodi', . . .)
then substring_index(address, ', ', -1)
else 'Not Found'
end)
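The "take the last comma-separated part, then look it up" idea from the second query can be sketched outside SQL as well. This assumes Python; area_of and the sample addresses are invented for illustration, and the place list is just the three names quoted above:

```python
KNOWN_AREAS = frozenset({"BTM Layout", "Bapuji Nagar", "Adugodi"})

def area_of(address, known=KNOWN_AREAS, default="Not Found"):
    """Return the trailing ', '-separated part of the address if it is a known area."""
    # rsplit with maxsplit=1 mirrors substring_index(address, ', ', -1).
    last = address.rsplit(", ", 1)[-1]
    return last if last in known else default

for address in ["12 Example Road, BTM Layout", "5 Sample Street, Elsewhere"]:
    print(address, "->", area_of(address))
```

Like the SQL version, this only works when the area is the final part of the address; an area in the middle of the string needs the LIKE or REGEXP approach instead.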

Ordering a Union Query in MS Access SQL

OK I have a particularly nasty union ordering problem so any help would be appreciated.
The scenario is this:
Member Table with the following records (actual data):
REI882
YUI987
POBO37
NUBS26
BTBU12
MZBY10
TYBW54
(These are listed in the order I want them back from my query.)
There are a number of business rules about the construction of these MemberIDs which I believe are unrelated to the sort. They're historic and set in stone. I'm stuck with them. They indicate seniority of the member.
The ordering is done from the last 4 characters in the ID, ascending. The first two characters of the ID are completely meaningless as far as the sort is concerned.
So the topmost possible record is ??A001 (most senior) and the lowest possible record is ??ZZ99 (least senior).
When I query my member table the list I get back must display most senior at top... Obviously a standard sort does not work. This is what I have to date:
The first of these queries deals with sorting members whose ID only has 1 leading letter. The second deals with those with 2 leading letters.
SELECT * FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1)) **check the 4th character is a digit
ORDER BY (Mid([Member.ID],3,1)), (Mid([Member.ID],4,1)), (Mid([Member.ID],5,1)), (Mid([Member.ID],6,1))
) t1
UNION
SELECT * FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=0)) **check the 4th character is a letter
ORDER BY (Mid([Member.ID],3,1)), (Mid([Member.ID],4,1)), (Mid([Member.ID],5,1)), (Mid([Member.ID],6,1))
) t2
But I get CRAZY results with the union! If I run each of the selects individually - no problem my funky (heavily reliant on some nasty string manipulation in access!) sort works exactly as I want it.
I understand this is pretty complicated but I hope I've explained it clearly and that someone is up for some kudos for figuring it out!!!
edit: The result from my query is seemingly random:
YUI987
MZBY10
NUBS26
BTBU12
REI882
POBO37
TYBW54
An ORDER BY inside a SELECT statement that is combined with another SELECT by UNION is not reliable; the ordering has to be applied to the result of the UNION as a whole.
See Specifying a conditional order here
You can use this:
SELECT ID FROM(
(SELECT Member.ID,1 AS T,Left([Member.ID],2) AS Part1, Right([Member.ID],4) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],3,1)))=-1)))
UNION
(SELECT Member.ID,2 AS T,Left([Member.ID],3) AS Part1, Right([Member.ID],3) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1) and ((IsNumeric(Mid([Member.ID],3,1)))=0)))
UNION
(SELECT Member.ID,3 AS T,Left([Member.ID],4) AS Part1, Right([Member.ID],2) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],5,1)))=-1) and ((IsNumeric(Mid([Member.ID],4,1)))=0)))
ORDER BY T,Part1,Part2)
@Justin Kirk: I don't know exactly what your problem is, but I hope this helps.
Why are you not using the RIGHT function?
Something like
SELECT ID
FROM (
SELECT ID
FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1)) **check the 4th character is a digit
) t1
UNION
SELECT ID
FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=0)) **check the 4th character is a letter
) t2
) t3
ORDER BY RIGHT(ID,4)
How about skipping the UNION?
SELECT members.ID
FROM members
ORDER BY Right([ID],3), Right(id,4)
Based on the new rules, this mess may work.
SELECT
Len(IIf([textId] Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left([textid],2),
IIf([textId] Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left([textid],3),
IIf([textId] Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left([textid],4),"_")))) AS Ln,
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left(textid,2),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left(textid,3),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left(textid,4),"_"))) AS Alpha,
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Val(Right(textid,4)),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Val(Right(textid,3)),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Val(Right(textid,2)),0))) AS Numbr,
table.textid
FROM table
ORDER BY
Len(IIf([textId] Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left([textid],2),
IIf([textId] Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left([textid],3),
IIf([textId] Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left([textid],4),"_")))),
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left(textid,2),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left(textid,3),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left(textid,4),"_"))),
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Val(Right(textid,4)),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Val(Right(textid,3)),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Val(Right(textid,2)),0)))