Replace substrings using a reference table in MYSQL


I am trying to replace substrings within one text column in my table using a reference table.
To my knowledge, the replace(column, string1,string2) function will only work with strings as the second and third input.
Here is a visual of what I am trying to do. To be clear, the reference table I need to use is much larger - otherwise, I would use four replace functions.
EDIT: Thank you to everyone who has pointed out how badly this data model is built. Though I am not an expert on building efficient data models, I do know this one is built terribly. However, the structure of this model is completely out of my control. Apologies for not mentioning that from the get-go.
table1
Farms    Animals
Farm1    Cow, Pig
Farm2    Dog, Cow, Cat
Farm3    Dog
referenceTable
refColumn1    refColumn2
Cow           Moo
Pig           Oink
Dog           Bark
Cat           Meow
And here is what I would like the result column to be:
table1
Farms    Animals
Farm1    Moo, Oink
Farm2    Bark, Moo, Meow
Farm3    Bark
First question on stackoverflow so apologies if I missed anything.
Any help is appreciated! Thank you!

To loop over comma (or ', ' in this case) separated values, you can use a double substring_index and a join against a sequence table (where the sequence is <= the number of joined values in a given row, as determined with char_length/replace):
select t1.Farms, group_concat(rt.refColumn2 order by which.n separator ', ') Animals
from table1 t1
join (select 1 n union select 2 union select 3) which
on ((char_length(t1.Animals)-char_length(replace(t1.Animals,', ','')))/char_length(', '))+1 >= which.n
join referenceTable rt on rt.refColumn1=substring_index(substring_index(t1.Animals,', ',which.n),', ',-1)
group by t1.Farms
Here I use an ad hoc sequence table of 1 through 3, assuming no row will have more than 3 animals; expand as necessary, or alternatively generate the sequence with a recursive CTE (MySQL 8.0+).
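To see what the double substring_index and the char_length/replace count are doing, here is a minimal Python sketch that mimics them on one row. The helper names (substring_index, item_count, nth_item) are made up for illustration, and the emulation only covers non-empty delimiters; it is not MySQL's implementation.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    return delim.join(parts[count:])       # everything after the count-th from the end


def item_count(text: str, delim: str) -> int:
    """Number of delimited items, computed the same way as the ON clause."""
    return (len(text) - len(text.replace(delim, ""))) // len(delim) + 1


def nth_item(text: str, delim: str, n: int) -> str:
    """n-th item (1-based): keep the first n items, then take the last of those."""
    return substring_index(substring_index(text, delim, n), delim, -1)


animals = "Dog, Cow, Cat"
sounds = {"Cow": "Moo", "Pig": "Oink", "Dog": "Bark", "Cat": "Meow"}
items = [nth_item(animals, ", ", n) for n in range(1, item_count(animals, ", ") + 1)]
print(", ".join(sounds[a] for a in items))  # Bark, Moo, Meow
```

The range over n plays the role of the sequence table: one pass per item position, capped by the item count for that row.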

You have a really lousy data model and you should fix it. You should not be storing multiple values in a string column. Each value pair should be on its own row.
Let me assume that someone else created these tables and you have no choice. If that is the case, MySQL has a solution. I think I would suggest:
select t1.*, -- or whatever columns you want
(select group_concat(rt.refColumn2
order by find_in_set(rt.refColumn1, replace(t1.animals, ', ', ','))
separator ', '
)
from referenceTable rt
where find_in_set(rt.refColumn1, replace(t1.animals, ', ', ',')) > 0
)
from table1 t1
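As a rough illustration of why the replace(t1.animals, ', ', ',') is needed and how find_in_set drives both the filter and the ordering, here is a small Python sketch. The find_in_set helper is a hand-written stand-in for the MySQL function, and the reference dictionary just copies the sample data from the question.

```python
def find_in_set(needle: str, csv: str) -> int:
    """Mimic MySQL FIND_IN_SET: 1-based position in a comma list, 0 if absent."""
    if not csv:
        return 0
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


reference = {"Cow": "Moo", "Pig": "Oink", "Dog": "Bark", "Cat": "Meow"}


def translate(animals: str) -> str:
    normalized = animals.replace(", ", ",")  # FIND_IN_SET expects bare commas
    hits = [(find_in_set(key, normalized), value) for key, value in reference.items()]
    # Keep matches only (position > 0) and order them by position in the list.
    return ", ".join(value for pos, value in sorted(hits) if pos > 0)


print(translate("Dog, Cow, Cat"))               # Bark, Moo, Meow
print(find_in_set("Cow", "Dog, Cow, Cat"))      # 0, because the item is " Cow"
```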

I'm more fluent in SQL Server than MySQL. Having got a solution working in SQL Server, the real challenge was converting it to a working MySQL version!
See if this meets your needs. It works for your sample data; you may of course need to tweak it if the sample doesn't fully represent your real-world data.
with w as (
select *, case when animals like concat('%', refcol1, '%') then locate(refcol1,animals) end pos
from t1
join lateral (select * from t2)t2 on 1=1
)
select farms, group_concat(refcol2 order by pos separator ',') as Animals
from w
where pos>0
group by farms
order by farms
Working DB<>Fiddle

Related

Counting how many fields (in a row) are filled in SQL

I want to count how many columns in a row are not NULL.
The table is quite big (more than 100 columns), so I would rather not do it manually or with PHP (I don't use PHP), as in the approach from Counting how many MySQL fields in a row are filled (or empty).
Is there a simple query I can use in a select like SELECT COUNT(NOT ISNULL(*)) FROM big_table;
Thanks in advance...

I agree with the comments above:
There is something wrong with the data model if such an analysis is needed.
You can't make it completely automatic.
But I have a recipe that simplifies the process. Only 2 steps are needed to achieve your aim.
Step 0. In step 1 you'll need the name of your table's schema. Normally the devs know which schema the table resides in, but still... Here is how you can find it:
select *
from information_schema.tables
where table_name = 'test_table';
Step 1. First of all you need the list of columns. The list by itself won't help much, but it is all we need to build the SELECT statement. So, let's make the database prepare the SELECT statement for us:
select concat('select (length(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') separator ','),
')) - length(replace(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') separator ','),
'), ''###'', ''''))) / length(''###'')
from test_table')
from information_schema.columns
where table_schema = 'test'
and table_name = 'test_table'
order by table_name,ordinal_position;
Step 2. Execute the statement you got in step 1.
select (length(concat(.. list of cols ..)) -
length(replace(concat(.. list of cols .. ), '###', ''))) / length('###')
from test_table
The select looks tricky but it's simple. First, replace all NULLs with a marker string that you're sure will never appear in those columns; I usually use "###". That is what all the ifnull calls are for.
Next, measure the concatenated string with length. In my case it was 14.
After that, replace every "###" with an empty string and measure the length again. It's 11 now. That is what the length(replace(...)) part does.
Last, divide the difference (14 - 11) by the length of the marker string ("###" is 3 characters). You get 1, which is exactly the number of NULLs in my test row. To get the number of filled columns, subtract that from the total number of columns.
Here's a test case you can play with
Do not hesitate to ask if needed
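The arithmetic above can be checked outside the database. This is a minimal Python sketch of the same length-difference trick; the row values are hypothetical, chosen so the lengths come out to the 14 and 11 mentioned in the answer, and the marker is assumed never to occur in real data.

```python
MARKER = "###"  # assumed never to occur in real column values


def count_nulls(row) -> int:
    """Count None values using the length-difference trick from the answer."""
    joined = "".join(MARKER if value is None else str(value) for value in row)
    stripped = joined.replace(MARKER, "")
    return (len(joined) - len(stripped)) // len(MARKER)


row = ("alpha", None, "beta42")  # hypothetical row: lengths 5 + 3 + 6 = 14
nulls = count_nulls(row)
print(nulls, len(row) - nulls)  # NULL count, then filled-column count
```

If the marker can appear inside a real value, the count is inflated, so the choice of marker matters more than the arithmetic.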

how to pass multiple variables in WHERE ... IN in stored procedure? [duplicate]

I have a column in one of my tables where I store multiple ids separated by commas.
Is there a way in which I can use this column's value in the "IN" clause of a query.
The column(city) has values like 6,7,8,16,21,2
I need to use as
select * from table where e_ID in (Select city from locations where e_Id=?)
I am satisfied with Crozin's answer, but I am open to suggestions, views and options.
Feel free to share your views.

Building on the FIND_IN_SET() example from @Jeremy Smith, you can do it with a join so you don't have to run a subquery.
SELECT * FROM table t
JOIN locations l ON FIND_IN_SET(t.e_ID, l.city) > 0
WHERE l.e_ID = ?
This is known to perform very poorly, since it has to do table-scans, evaluating the FIND_IN_SET() function for every combination of rows in table and locations. It cannot make use of an index, and there's no way to improve it.
I know you said you are trying to make the best of a bad database design, but you must understand just how drastically bad this is.
Explanation: Suppose I were to ask you to look up everyone in a telephone book whose first, middle, or last initial is "J." There's no way the sorted order of the book helps in this case, since you have to scan every single page anyway.
The LIKE solution given by @fthiella has a similar problem with regard to performance. It cannot be indexed.
Also see my answer to Is storing a delimited list in a database column really that bad? for other pitfalls of this way of storing denormalized data.
If you can create a supplementary table to store an index, you can map the locations to each entry in the city list:
CREATE TABLE location2city (
location INT,
city INT,
PRIMARY KEY (location, city)
);
Assuming you have a lookup table for all possible cities (not just those mentioned in the table) you can bear the inefficiency one time to produce the mapping:
INSERT INTO location2city (location, city)
SELECT l.e_ID, c.e_ID FROM cities c JOIN locations l
ON FIND_IN_SET(c.e_ID, l.city) > 0;
Now you can run a much more efficient query to find entries in your table:
SELECT * FROM location2city l
JOIN table t ON t.e_ID = l.city
WHERE l.location = ?;
This can make use of an index. Now you just need to take care that any INSERT/UPDATE/DELETE of rows in locations also updates the corresponding mapping rows in location2city.
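The idea of paying the parsing cost once and then doing keyed lookups can be sketched in Python as follows. The sample locations data is hypothetical, and the dictionary only stands in for the PRIMARY KEY (location, city) index; it is a sketch of the idea, not of how MySQL stores the index.

```python
from collections import defaultdict

# Hypothetical rows of `locations`: e_ID -> comma-separated city ids.
locations = {1: "6,7,8,16,21,2", 2: "2,3"}

# One-time, inefficient pass: split every list into (location, city) pairs,
# which is what the INSERT ... SELECT with FIND_IN_SET produces.
location2city = sorted(
    (loc, int(city))
    for loc, csv in locations.items()
    for city in csv.split(",")
    if city
)

# Index the pairs by location, standing in for the primary key.
by_location = defaultdict(list)
for loc, city in location2city:
    by_location[loc].append(city)


def cities_for(location_id: int) -> list:
    """Keyed lookup; no string parsing at query time."""
    return by_location.get(location_id, [])


print(cities_for(1))  # [2, 6, 7, 8, 16, 21]
```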

From MySQL's point of view you're not storing multiple ids separated by commas - you're storing a text value, which has the exact same meaning as "Hello World" or "I like cakes!" - i.e. it doesn't have any meaning.
What you have to do is to create a separated table that will link two objects from the database together. Read more about many-to-many or one-to-many (depending on your requirements) relationships in SQL-based databases.

Rather than use IN on your query, use FIND_IN_SET (docs):
SELECT * FROM table
WHERE 0 < FIND_IN_SET(e_ID, (
SELECT city FROM locations WHERE e_ID=?))
The usual caveats about first normal form apply (the database shouldn't store multiple values in a single column), but if you're stuck with it, then the above statement should help.

This does not use IN clause, but it should do what you need:
Select *
from table
where
CONCAT(',', (Select city from locations where e_Id=?), ',')
LIKE
CONCAT('%,', e_ID, ',%')
but you have to make sure that e_ID does not contain any commas or any LIKE wildcard characters (% or _).
e.g.
CONCAT(',', '6,7,8,16,21,2', ',') returns ',6,7,8,16,21,2,'
e_ID=1 --> ',6,7,8,16,21,2,' LIKE '%,1,%' ? FALSE
e_ID=6 --> ',6,7,8,16,21,2,' LIKE '%,6,%' ? TRUE
e_ID=21 --> ',6,7,8,16,21,2,' LIKE '%,21,%' ? TRUE
e_ID=2 --> ',6,7,8,16,21,2,' LIKE '%,2,%' ? TRUE
e_ID=3 --> ',6,7,8,16,21,2,' LIKE '%,3,%' ? FALSE
etc.
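To show why both sides are wrapped in commas, here is a small Python comparison of a plain substring test against the comma-wrapped one. The function names are made up for illustration; Python's `in` stands in for LIKE with leading and trailing %.

```python
def naive_match(e_id: int, csv: str) -> bool:
    """Plain substring test, like city LIKE CONCAT('%', e_ID, '%')."""
    return str(e_id) in csv


def wrapped_match(e_id: int, csv: str) -> bool:
    """Comma-wrapped test, like the CONCAT(',', ..., ',') LIKE pattern above."""
    return f",{e_id}," in f",{csv},"


city = "6,7,8,16,21,2"
for e_id in (1, 6, 21, 2, 3):
    print(e_id, naive_match(e_id, city), wrapped_match(e_id, city))
```

For e_id 1 the naive test reports a match (it finds the "1" inside "16" and "21"), while the wrapped test correctly does not.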

I don't know if this is what you want to accomplish. MySQL has a function for concatenating values from a group: GROUP_CONCAT.
You can try something like this:
select * from table where e_ID in (Select GROUP_CONCAT(city SEPARATOR ',') from locations where e_Id=?)

This one is for Oracle; here the string concatenation is done by wm_concat:
select * from table where e_ID in (Select wm_concat(city) from locations where e_Id=?)
Yes, I agree with raheel shan: to use this in an IN clause, we need to turn the column's values into rows. The code below does that job.
select * from table where to_char(e_ID)
in (
select substr(city,instr(city,',',1,rownum)+1,instr(city,',',1,rownum+1)-instr(city,',',1,rownum)-1) from
(
select ','||WM_CONCAT(city)||',' city,length(WM_CONCAT(city))-length(replace(WM_CONCAT(city),','))+1 CNT from locations where e_Id=? ) TST
,ALL_OBJECTS OBJ where TST.CNT>=rownum
) ;

you should use
FIND_IN_SET Returns position of value in string of comma-separated values
mysql> SELECT FIND_IN_SET('b','a,b,c,d');
-> 2

You need to "SPLIT" the city column values. It will be like:
SELECT *
FROM table
WHERE e_ID IN (SELECT TO_NUMBER(
SPLIT_STR(city /*string*/
, ',' /*delimiter*/
, 1 /*start_position*/
)
)
FROM locations);
You can read more about the MySQL split_str function here: http://blog.fedecarg.com/2009/02/22/mysql-split-string-function/
Also, I have used the TO_NUMBER function of Oracle here. Please replace it with a proper MySQL function.

IN takes rows, so searching a comma-separated column will not do what you want. It would work if you provided the data like ('1','2','3'), but you cannot save data like that in your field: whatever you insert into the column is treated as one whole string.

You can create a prepared statement dynamically like this
set @sql = concat('select * from city where city_id in (',
(select cities from location where location_id = 3),
')');
prepare in_stmt from @sql;
execute in_stmt;
deallocate prepare in_stmt;
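Since this approach pastes the column's contents straight into the SQL text, the statement is only as safe as the stored string. If the statement is assembled in application code instead, a cautious sketch might validate the list first. The function name and the strict digits-and-commas rule are assumptions for illustration; the table and column names are the ones from the answer above.

```python
import re

# Accept only digits separated by single commas, e.g. "6,7,8,16,21,2".
ID_LIST = re.compile(r"\d+(?:,\d+)*")


def build_in_query(id_list: str) -> str:
    """Return the SELECT text that the CONCAT in the answer would produce."""
    if not ID_LIST.fullmatch(id_list):
        raise ValueError(f"not a clean comma-separated id list: {id_list!r}")
    return f"select * from city where city_id in ({id_list})"


print(build_in_query("6,7,8,16,21,2"))
```

Note that a stored value with a trailing comma (such as "1,2,") is rejected by this rule, and would also produce invalid SQL if concatenated as-is.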

Ref: Use a comma-separated string in an IN () in MySQL
Recently I faced the same problem and this is how I resolved it.
It worked for me, hope this is what you were looking for.
select * from table_name t where (select (CONCAT(',',(Select city from locations l where l.e_Id=?),',')) as city_string) LIKE CONCAT('%,',t.e_ID,',%');
Example: It will look like this
select * from table_name t where ',6,7,8,16,21,2,' LIKE '%,2,%';

Select All Distinct Words in Column MYSQL

I have a column in which is stored nothing but text separated by one space. There may be one to maybe 5 words in each field of the column. I need a query to return all the distinct words in that column.
Tried:
SELECT DISTINCT tags FROM documents ORDER BY tags
but does not work.
To Elaborate.
I have a column called tags. In it I may have the following entries:
Row 1 Red Green Blue Yellow
Row 2 Red Blue Orange
Row 3 Green Blue Brown
I want to select all the DISTINCT words in the entire column - all fields. It would return:
Red Green Blue Yellow Orange Brown
If I counted each it would return:
2 Red
2 Green
3 Blue
1 Yellow
1 Brown
1 Orange

To fix this I ended up creating a second table where all keywords were inserted on their own row, each along with a record key that tied them back to the original record in the main data table. I then just have to SELECT DISTINCT to get all tags, or I can SELECT DISTINCT with a WHERE clause specifying the original record to get the tags associated with a unique record. Much easier.

There is not a good solution for this. You can achieve this with JSON functions as of MySQL 5.7, but it's a little tricky until 8.0, when MySQL added the JSON_TABLE function, which can convert JSON data to a table-like object that you can select from. How it performs depends on your actual data. Here's a working example:
CREATE TABLE t(raw varchar(100));
INSERT INTO t (raw) VALUES ('this is a test');
You will need to strip the symbols (commas, periods, maybe others) from your text, then replace any whitespace with ",", then wrap the whole thing in [" and "] to format it as JSON. I'm not going to give a full-featured example, because you know better than I do what your data looks like, but something like this (in its simplest form):
SELECT CONCAT('["', REPLACE(raw, ' ', '","'), '"]') FROM t;
With JSON_TABLE, you can do something like this:
SELECT CONCAT('["', REPLACE(raw, ' ', '","'), '"]') INTO @delimited FROM t;
SELECT *
FROM JSON_TABLE(
@delimited,
"$[*]"
COLUMNS(Value varchar(50) PATH "$")
) d;
See this fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=7a86fcc77408ff5dfec7a805c6e4117a
At this point you have a table of the split words, and you can replace SELECT * with whatever counting query you want, probably SELECT Value, count(*) as vol. You will also need to use group_concat to handle multiple rows. Like this:
insert into t (raw) values ('this is also a test'), ('and you can test it');
select concat(
'["',
replace(group_concat(raw SEPARATOR '","'), ' ', '","'),
'"]'
) into @delimited from t;
SELECT Value, count(*) as vol
FROM JSON_TABLE(
@delimited,
"$[*]"
COLUMNS(Value varchar(50) PATH "$")
) d
GROUP BY Value ORDER BY count(*) DESC;
If you are running <8.0, you can still accomplish this, but it will take some hackiness, like generating an arbitrary list of numbers and constructing the paths dynamically from that.
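For anyone who wants to see the string-to-JSON-array step in isolation, this Python sketch performs the same CONCAT/REPLACE construction on the three sample rows and then counts the words. It only illustrates the transformation; json.loads and Counter stand in for JSON_TABLE and GROUP BY.

```python
import json
from collections import Counter

rows = ["this is a test", "this is also a test", "and you can test it"]

# Same shape as: concat('["', replace(group_concat(raw SEPARATOR '","'), ' ', '","'), '"]')
delimited = '["' + '","'.join(rows).replace(" ", '","') + '"]'

words = json.loads(delimited)            # fails if the text contains " or \
counts = Counter(words).most_common()    # like GROUP BY Value ORDER BY count(*) DESC

for word, vol in counts:
    print(word, vol)
```

The same caveat applies in MySQL: a double quote or backslash in the text makes the constructed string invalid JSON, which is why the symbols need stripping or escaping first.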

Retrieve specific string from column in SQL table

Hi I have the following data column where the typical data will look like:
Row 1: RCS CARD: THANK YOU FOR YOUR PURCHASE AT PICK N PAY ON CARD ...1820 FOR R371.71 ON 14-03-2013 AT 09:46. AVAIL CREDIT R67. FOR QUERIES CALL 0861028889
Row 2: RCS CARD: THANK YOU FOR YOUR PURCHASE AT PICK N PAY ON CARD ...6825 FOR R3061.93 ON 14-03-2013 AT 09:45. AVAIL CREDIT R39. FOR QUERIES CALL 0861028889
I need to be able to extract the R371.71 and R3061.93 from row 1 and 2. What is the most accurate way to do this? Keeping in mind that R amount will change from row to row so a simple substring will not work?
Any advice would be extremely helpful.
Thanks,
Jonathan

Well, the proper way to do it is to use a regular expression in an external script/app, since MySQL before 8.0 (which added REGEXP_SUBSTR) doesn't support extracting substrings with regular expressions.
If you do insist on using SQL the only way I could think of is by assuming that the string starts with:
RCS CARD: THANK YOU FOR YOUR PURCHASE AT PICK N PAY ON CARD
and just ignore that part. so the SQL should be:
SELECT SUBSTR(t, LOCATE('FOR', t, 61)+5 ,LOCATE('ON', t, 61)-1-LOCATE('FOR', t, 61)-5)
FROM DATA
Again I would use regexp but you can see it's working in this SQLFiddle: http://sqlfiddle.com/#!2/966ad/7
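For the external-script route, a regular expression along these lines would do it. This is a minimal Python sketch that assumes every message follows the "FOR R<amount> ON <dd-mm-yyyy>" layout of the two sample rows; the pattern and function name are illustrative only.

```python
import re

# "FOR R<amount> ON <dd-mm-yyyy>"; the date anchor skips "FOR YOUR PURCHASE".
AMOUNT = re.compile(r"\bFOR (R\d+(?:\.\d+)?) ON \d{2}-\d{2}-\d{4}\b")


def extract_amount(message: str):
    """Return the purchase amount such as 'R371.71', or None if absent."""
    match = AMOUNT.search(message)
    return match.group(1) if match else None


messages = [
    "RCS CARD: THANK YOU FOR YOUR PURCHASE AT PICK N PAY ON CARD ...1820 "
    "FOR R371.71 ON 14-03-2013 AT 09:46. AVAIL CREDIT R67. FOR QUERIES CALL 0861028889",
    "RCS CARD: THANK YOU FOR YOUR PURCHASE AT PICK N PAY ON CARD ...6825 "
    "FOR R3061.93 ON 14-03-2013 AT 09:45. AVAIL CREDIT R39. FOR QUERIES CALL 0861028889",
]
for message in messages:
    print(extract_amount(message))
```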

If the column in concern has consistent text format as you mentioned in the question, then you can make use of substring_index, locate and substring functions to find the amount value.
select
-- column_name,
substring_index( substring( column_name,
locate( 'FOR R', column_name, 1 )
+ length( 'FOR R' )
- 1
), ' ', 1
) as amount
from table_name
where
column_name like '%RCS CARD: THANK YOU FOR YOUR PURCHASE AT PICK N PAY ON CARD%';
Demo @ MySQL 5.5.32 Fiddle
If you want to extract only the amount without prefix 'R' then, remove the '-1' line from the above query.

MySQL - Using Order By result created by a subquery group_concat or join issue

This is a query I've been puzzling over for quite some time. I've never been able to get it to work quite right, and after about 40 hours of pondering I've gotten to this point.
Setup
For the example issue we have 2 tables, one being...
field_site_id  field_sitename  field_admins
1              Some Site       1,
2              Other Site      1,2,
And the other is admins like...
field_user_id  field_firstname  field_lastname
1              Joe              Bloggs
2              Barry            Wills
Now all this query is designed to do is the following:
List all sites in the database
Using a JOIN and FIND_IN_SET to pull each admin
And GROUP_CONCAT(field_firstname, ' ', field_lastname) with a GROUP BY to build a field with the real user names.
Also allow HAVING to filter on the custom result to narrow the results down further.
All this part works perfectly fine.
What I can't work out is how to sort the results by the GROUP_CONCAT result. I imagine this is because the ORDER BY works before the concat function, so the data doesn't exist yet to order by. What would the alternative be?
Code examples:
SELECT *,
GROUP_CONCAT(DISTINCT field_firstname, ' ', field_lastname ORDER BY field_lastname SEPARATOR ', ') AS field_admins_fullname
FROM `table_sites`
LEFT JOIN `table_admins` ON FIND_IN_SET( `table_admins`.`field_user_id`, `table_sites`.`field_site_id` ) > 0
GROUP BY field_site_id
I also tried a query that used a subquery to gather the group_concat result as below...
( SELECT GROUP_CONCAT(field_firstname, ' ', field_lastname ORDER BY field_lastname ASC SEPARATOR ', ') FROM table_admins
WHERE FIND_IN_SET( `table_admins`.`field_user_id`, `table_sites`.`field_admins` ) > 0
) AS field_admins_fullname
Conclusion
Either way, attempting to ORDER BY field_admins_fullname does not produce the correct results. It doesn't error out, but I assume that's because the given ORDER BY value is blank, so it just does whatever it wants.
Any suggestions would be welcome. If this is just not possible, what would be another recommended indexing methodology?

Two things I see wrong:
1st, is the JOIN. It should be using s.field_admins and not field_site_id :
ON FIND_IN_SET( a.field_user_id, s.field_admins ) > 0
2nd, you should use the CONCAT() function (to concatenate fields from the same row) inside the GROUP_CONCAT().
Try this:
SELECT s.field_site_id
, s.field_sitename
, GROUP_CONCAT( CONCAT(a.field_firstname, ' ', a.field_lastname)
ORDER BY a.field_lastname ASC
SEPARATOR ', '
)
AS field_admins_fullname
FROM table_sites s
LEFT JOIN table_admins a
ON FIND_IN_SET( a.field_user_id, s.field_admins ) > 0
GROUP BY s.field_site_id
Friendly advice:
Don't use         Do use
--------------    --------
table_sites       site
table_admins      admin
field_site_id     site_id
field_sitename    sitename
field_admins      admins
But what should really be stressed is your setup. Having fields that hold comma-separated values leads to this kind of horrible query that uses FIND_IN_SET() for joins and GROUP_CONCAT() for showing results. Horrible to see, difficult to maintain and, most important, very, very slow, as no index can be used.
You should have something like this instead:
Setup suggestion
Table: site
site_id sitename
1 Some Site
2 Other Site
Table: site_admin
site_id admin_id
1 1
2 1
2 2
Table: admin
user_id firstname lastname
1 Joe Bloggs
2 Barry Wills
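To try the suggested three-table setup without touching a MySQL server, here is a small sketch using Python's built-in sqlite3 module. Note the assumptions: it runs on SQLite, not MySQL, so the name concatenation uses || and the grouping and sorting by the combined admin names is done in Python rather than with GROUP_CONCAT. Table and column names follow the setup suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (site_id INTEGER PRIMARY KEY, sitename TEXT);
CREATE TABLE admin (user_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE site_admin (
    site_id INTEGER, admin_id INTEGER, PRIMARY KEY (site_id, admin_id)
);
INSERT INTO site VALUES (1, 'Some Site'), (2, 'Other Site');
INSERT INTO admin VALUES (1, 'Joe', 'Bloggs'), (2, 'Barry', 'Wills');
INSERT INTO site_admin VALUES (1, 1), (2, 1), (2, 2);
""")

# Plain equality joins through the link table; no FIND_IN_SET needed.
rows = conn.execute("""
SELECT s.site_id, s.sitename, a.firstname || ' ' || a.lastname
FROM site s
LEFT JOIN site_admin sa ON sa.site_id = s.site_id
LEFT JOIN admin a ON a.user_id = sa.admin_id
ORDER BY s.site_id, a.lastname
""").fetchall()

# Collect admin names per site, then sort sites by the combined name string.
sites = {}
for site_id, sitename, fullname in rows:
    names = sites.setdefault(site_id, (sitename, []))[1]
    if fullname is not None:  # site without admins
        names.append(fullname)

report = sorted((", ".join(names), sitename) for sitename, names in sites.values())
for admins, sitename in report:
    print(f"{sitename}: {admins}")
```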

I think you need to repeat the complex CONCAT statement you are selecting within the ORDER BY.
So your order by would be more like...
ORDER BY (GROUP_CONCAT(DISTINCT field_firstname, ' ',
field_lastname ORDER BY field_lastname SEPARATOR ', ')) ASC
I have not tried this but I had a similar issue which this seemed to solve but it was much simpler without the DISTINCT etc.

wrong group by, try this ?
GROUP BY field_site_id
