I have a column that stores nothing but text, with the words separated by a single space. Each field in the column may contain one to maybe five words. I need a query that returns all the distinct words in that column.
Tried:
SELECT DISTINCT tags FROM documents ORDER BY tags
but it does not work.
To elaborate:
I have a column called tags. In it I may have the following entries:
Row 1 Red Green Blue Yellow
Row 2 Red Blue Orange
Row 3 Green Blue Brown
I want to select all the DISTINCT words in the entire column - all fields. It would return:
Red Green Blue Yellow Orange Brown
If I counted each it would return:
2 Red
2 Green
3 Blue
1 Yellow
1 Brown
1 Orange
To fix this I ended up creating a second table where all keywords were inserted on their own row, each along with a record key tying them back to the original record in the main data table. Then I just SELECT DISTINCT to get all tags, or SELECT DISTINCT with a WHERE clause specifying the original record to get the tags associated with a unique record. Much easier.
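For reference, a minimal sketch of that second-table approach; the table and column names here (document_tags, doc_id, tag) are made up for illustration:
CREATE TABLE document_tags (
    doc_id INT NOT NULL,          -- key back to the original documents row
    tag    VARCHAR(50) NOT NULL   -- one keyword per row
);
-- All distinct tags across every document:
SELECT DISTINCT tag FROM document_tags ORDER BY tag;
-- Per-tag counts:
SELECT tag, COUNT(*) FROM document_tags GROUP BY tag;
-- Tags tied to one specific document:
SELECT DISTINCT tag FROM document_tags WHERE doc_id = 1;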
There is not a good solution for this. You can achieve this with the JSON functions available as of 5.7, but it's a little tricky until 8.0, when MySQL added the JSON_TABLE function, which converts JSON data into a table-like object you can select from; how it performs will depend on your actual data. Here's a working example:
CREATE TABLE t(raw varchar(100));
INSERT INTO t (raw) VALUES ('this is a test');
You will need to strip the symbols (commas, periods, maybe others) from your text, then replace any whitespace with ",", then wrap the whole thing in [" and "] to format it as JSON. I'm not going to give a fully featured example, because you know your data better than I do, but something like this (in its simplest form):
SELECT CONCAT('["', REPLACE(raw, ' ', '","'), '"]') FROM t;
With JSON_TABLE, you can do something like this:
SELECT CONCAT('["', REPLACE(raw, ' ', '","'), '"]') INTO @delimited FROM t;
SELECT *
FROM JSON_TABLE(
@delimited,
"$[*]"
COLUMNS(Value varchar(50) PATH "$")
) d;
See this fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=7a86fcc77408ff5dfec7a805c6e4117a
At this point you have a table of the split words, and you can replace SELECT * with whatever counting query you want, probably SELECT Value, count(*) as vol. You will also need to use group_concat to handle multiple rows. Like this:
insert into t (raw) values ('this is also a test'), ('and you can test it');
select concat(
'["',
replace(group_concat(raw SEPARATOR '","'), ' ', '","'),
'"]'
) into @delimited from t;
SELECT Value, count(*) as vol
FROM JSON_TABLE(
@delimited,
"$[*]"
COLUMNS(Value varchar(50) PATH "$")
) d
GROUP BY Value ORDER BY count(*) DESC;
If you are running <8.0, you can still accomplish this, but it takes some hackiness, like generating an arbitrary list of numbers and constructing the JSON paths dynamically from them, as in the sketch below.
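A minimal sketch of that pre-8.0 approach, assuming MySQL 5.7 (for the JSON functions) and at most five words per row; the derived-table aliases here are made up:
-- Build the JSON array per row, then pull element n out of it
-- for every n smaller than the array length.
SELECT JSON_UNQUOTE(JSON_EXTRACT(j.doc, CONCAT('$[', n.n, ']'))) AS Value,
       COUNT(*) AS vol
FROM (SELECT CONCAT('["', REPLACE(raw, ' ', '","'), '"]') AS doc FROM t) j
JOIN (SELECT 0 n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) n
  ON n.n < JSON_LENGTH(j.doc)
GROUP BY Value
ORDER BY vol DESC;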
Related
I want to count how many columns in a row are not NULL.
The table is quite big (more than 100 columns), so I would rather not do it manually or with PHP (since I don't use PHP), as in the approach from "Counting how many MySQL fields in a row are filled (or empty)".
Is there a simple query I can use in a select like SELECT COUNT(NOT ISNULL(*)) FROM big_table;
Thanks in advance...
Agree with comments above:
There is something wrong with the data model if such an analysis is needed.
You can't make it completely automatic.
But I have a recipe to simplify the process. Only two steps are needed to achieve your aim.
Step 0. In step 1 you'll need the name of your table's schema. Normally the devs know which schema the table resides in, but still... here is how you can find it:
select *
from information_schema.tables
where table_name = 'test_table';
Step 1. First of all you need to get the list of columns. The list of columns alone won't help you out at all, but it is all we need to build the SELECT statement, right? So let's have the database prepare the SELECT statement for us:
select concat('select (length(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') separator ','),
')) - length(replace(concat(',
group_concat(concat('ifnull(', column_name, ', ''###'')') separator ','),
'), ''###'', ''''))) / length(''###'')
from test_table')
from information_schema.columns
where table_schema = 'test'
and table_name = 'test_table'
order by table_name,ordinal_position;
Step 2. Execute the statement you got in step 1.
select (length(concat(.. list of cols ..)) -
length(replace(concat(.. list of cols .. ), '###', ''))) / length('###')
from test_table
The SELECT looks tricky, but it's simple: first, replace all NULLs with some marker you are sure will never appear in those columns. I usually replace NULLs with "###"; that is what all the IFNULLs are for.
Next, count the characters with LENGTH. In my case it was 14.
After that, replace all "###" with empty strings and count the length again. It's 11 now. That is what the LENGTH(REPLACE(...)) part is for.
Last, just divide the difference (14 - 11) by the length of the replacement string ("###", i.e. 3). You get 1, which is exactly the number of NULLs in my test row.
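To make the arithmetic concrete, here is a hypothetical stand-alone example with literal values standing in for three columns, one of which is NULL:
SELECT (LENGTH(CONCAT(IFNULL('abcde',  '###'),
                      IFNULL(NULL,     '###'),
                      IFNULL('fghijk', '###')))   -- 5 + 3 + 6 = 14
        -
        LENGTH(REPLACE(CONCAT(IFNULL('abcde',  '###'),
                              IFNULL(NULL,     '###'),
                              IFNULL('fghijk', '###')),
                       '###', '')))               -- 14 - 11 = 3
       / LENGTH('###') AS null_count;             -- 3 / 3 = 1 NULL column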
Here's a test case you can play with
Do not hesitate to ask if needed
I am trying to replace substrings within one text column in my table using a reference table.
To my knowledge, the replace(column, string1,string2) function will only work with strings as the second and third input.
Here is a visual of what I am trying to do. To be clear, the reference table I need to use is much larger - otherwise, I would use four replace functions.
EDIT: Thank you to everyone who has pointed out how bad this data model is built. Though I am not an expert on building efficient data models, I do know this one is built terribly. However, the structure of this model is completely out of my control. Apologies for not mentioning that from the get-go.
table1

Farms   Animals
Farm1   Cow, Pig
Farm2   Dog, Cow, Cat
Farm3   Dog

referenceTable

refColumn1   refColumn2
Cow          Moo
Pig          Oink
Dog          Bark
Cat          Meow
And here is what I would like the result column to be..
table1

Farms   Animals
Farm1   Moo, Oink
Farm2   Bark, Moo, Meow
Farm3   Bark
First question on stackoverflow so apologies if I missed anything.
Any help is appreciated! Thank you!
To loop over comma (or ', ' in this case) separated values, you can use a double substring_index and a join against a sequence table (where the sequence number is <= the number of delimited values in a given row, as determined with char_length/replace):
select t1.Farms, group_concat(rt.refColumn2 order by which.n separator ', ') Animals
from table1 t1
join (select 1 n union select 2 union select 3) which
on ((char_length(t1.Animals)-char_length(replace(t1.Animals,', ','')))/char_length(', '))+1 >= which.n
join referenceTable rt on rt.refColumn1=substring_index(substring_index(t1.Animals,', ',which.n),', ',-1)
group by t1.Farms
Here I use an ad hoc sequence table of 1 through 3, assuming no row will have more than 3 animals; expand as necessary, or alternatively use a CTE, as sketched below.
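Roughly, the CTE variant could look like this (MySQL 8.0+); the recursive limit of 10 is an arbitrary assumption about the longest possible list:
with recursive which (n) as (
  select 1
  union all
  select n + 1 from which where n < 10  -- raise if a row can hold more than 10 animals
)
select t1.Farms, group_concat(rt.refColumn2 order by which.n separator ', ') Animals
from table1 t1
join which
  on ((char_length(t1.Animals)-char_length(replace(t1.Animals,', ','')))/char_length(', '))+1 >= which.n
join referenceTable rt on rt.refColumn1=substring_index(substring_index(t1.Animals,', ',which.n),', ',-1)
group by t1.Farms;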
You have a really lousy data model and you should fix it. You should not be storing multiple values in a string column. Each value pair should be on its own row.
Let me assume that someone else created these tables and you have no choice. If that is the case, MySQL has a solution. I think I would suggest:
select t1.*, -- or whatever columns you want
(select group_concat(rt.refColumn2
order by find_in_set(rt.refColumn1, replace(t1.animals, ', ', ','))
separator ', '
)
from referenceTable rt
where find_in_set(rt.refColumn1, replace(t1.animals, ', ', ',')) > 0
)
from table1 t1
I'm more fluent in SQL Server than MySQL; having got a solution working in SQL Server, the real challenge was converting it to a working MySQL version!
See if this meets your needs. It works for your sample data, you may of course need to tweak if it doesn't fully represent your real world data.
with w as (
  select *, case when animals like concat('%', refcol1, '%') then locate(refcol1, animals) end pos
  from t1
  join lateral (select * from t2) t2 on 1=1
)
select farms, group_concat(refcol2 order by pos separator ',') as Animals
from w
where pos > 0
group by farms
order by farms
Working DB<>Fiddle
I have a varchar(255) field within a source table and the following contents:
50339 My great example
2020002 Next ID but different title
202020 Here we go
Now I am processing the data with an INSERT ... SELECT query. From this field I need the INT number at the beginning; it is followed by 2 spaces and then text of variable length, which I also need, but for another field. In general I want to put the text and the ID, which are currently in one field, into two separate fields.
I tried to grab it like this:
SELECT STATUS REGEXP '^(/d{6,8}) ' FROM products_test WHERE STATUS is not null
But then I learned that in MySQL 5.x, REGEXP in a SELECT only returns a boolean; there is no way to extract the matched part.
How could I separate those values within a single SELECT statement, so I can use it in my INSERT SELECT?
From the correct solution of user slaakso another related problem resulted: sometimes the STATUS field is empty, which then yields only one value, but when there is a value I split it into two fields, so the column counts do not match.
My CASE statement based on his solution somehow contains a syntax problem:
CASE STATUS WHEN ''
THEN(
NULL,
NULL
)
ELSE(
cast(STATUS as unsigned),
substring(STATUS, locate(' ', STATUS)+3)
)
END
You can do the following. Note that you need to treat the columns separately:
select
if(ifnull(status, '')!='', cast(status as unsigned), null),
if(ifnull(status, '')!='', substring(status, locate(' ', status)+2), null)
from products_test;
See db-fiddle
schema:
customers(name, mailid, city)
What to find:
Find all customer names consisting of three or more words (for example King George V).
What I tried:
select name from customers
where name like
'%[A-Za-z0-9][A-Za-z0-9]% %[A-Za-z0-9][A-Za-z0-9]% %[A-Za-z0-9][A-Za-z0-9]%'
What is surprising me:
If I try for two words (removing the last %[A-Za-z0-9]% from my query), it works fine, but it does not work for three words :(
MySQL Solution:
If a name has its words separated by the space character, then try the following:
select name from customers
where ( length( name ) - length( replace( name, ' ', '' ) ) + 1 ) >= 3
In T-SQL, the LIKE clause can contain multiple wildcard checks - e.g.:
SELECT * FROM Customers WHERE Name like '% % %'
will return those names that contain two spaces.
If you are consistent with the spacing between names, you could use this logic
SELECT LENGTH(name)-LENGTH(REPLACE(name,' ',''))
FROM customers
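To actually filter on that count (three or more words means at least two spaces), something along these lines should work:
SELECT name
FROM customers
WHERE LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) >= 2;  -- 2 spaces => 3+ words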
Or you can try this too if your SQL doesn't have the LENGTH function (which is the situation I have when I'm doing an online exercise...). Inspired by the answers above:
SELECT name FROM customers WHERE (replace(name,' ','*')) LIKE '%*%*%'
I have read quite a few select+update questions on here but cannot understand how to do it, so I will have to ask from the beginning.
I would like to update a table based on data in another table. Setup is like this:
- TABLE a ( int ; string )
ID   WORD
1    banana
2    orange
3    apple

- TABLE b ( "comma separated" string ; string )
WORDS   TEXTAREA
0       banana                -> 0,1
0       orange apple apple    -> BEST: 0,2,3  ELSE 0,2,3,3
0       banana orange apple   -> 0,1,2,3
Now, for each word in TABLE a, I would like to append ",a.ID" to b.WORDS, like:
SELECT id, word FROM a
(for each) -> UPDATE b SET words = CONCAT(words, ',', a.id) WHERE b.textarea like %a.word%
Or even better: replace the word found in b.textarea with ",a.id", so that it is b.textarea that ends up being a comma-separated string of IDs... But I do not know if that is possible.
Tried this but it is not working. But I think I am getting closer:
UPDATE a, b
SET b.textarea =
replace(b.textarea,a.word,CONCAT(',',a.id))
WHERE a.word IN (b.textarea)
ORDER BY length(a.word) DESC
I ended up doing a workaround. I exported all a.word values to Excel and created an UPDATE for each row, like this:
UPDATE `tx_ogarktiskdocarchive_loebe` SET `temp_dictionay` = replace(lower(temp_dictionay) , lower('Drygalski’s Grønlandsekspedition'), CONCAT(',',191));
Then I pasted the approx. 1000 rows into an SQL file and executed it. Done.
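As a side note, the same list of statements could also be generated by the database itself instead of Excel; a rough sketch, assuming the a.id/a.word columns from the example above (substitute your real target table and column names for b and textarea):
SELECT CONCAT('UPDATE b SET textarea = REPLACE(LOWER(textarea), ',
              QUOTE(LOWER(word)), ', ',
              QUOTE(CONCAT(',', id)), ');') AS stmt
FROM a
ORDER BY LENGTH(word) DESC;  -- longest words first, so shorter words don't clobber them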
I had to do "a cleaner double post" of this one to get the answer.
A solution can be put together based on this manual:
http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat
GROUP_CONCAT will make a comma-separated string based on the fields it concatenates. Perfect. And regarding the preferred solution with no duplicates in the result, there is this example in the manual that filters out duplicates using DISTINCT inside GROUP_CONCAT:
mysql> SELECT student_name,
-> GROUP_CONCAT(DISTINCT test_score
-> ORDER BY test_score DESC SEPARATOR ' ')
-> FROM student
-> GROUP BY student_name;
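Applied to the tables in the question, something along these lines should produce the desired comma-separated ID list per row (a sketch, assuming the a.id/a.word and b.words/b.textarea columns described above, with space-separated words in textarea):
SELECT b.textarea,
       CONCAT('0,', GROUP_CONCAT(DISTINCT a.id ORDER BY a.id SEPARATOR ',')) AS new_words
FROM b
JOIN a
  ON FIND_IN_SET(a.word, REPLACE(b.textarea, ' ', ',')) > 0
GROUP BY b.textarea;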