mysql - How to split comma separated text and create table - mysql

How can I split a comma-separated string from one column and turn it into several rows?
This is my table:
SELECT id,lik FROM `tbl_users_posts` WHERE id=1;
id lik
-------------
1 10,11,12,13,14,15
How can I split the 'lik' column and get this result?
id lik
-------------
1 10
1 11
1 12
1 13
1 14
1 15
That is, show the id in the first column and the pieces of the 'lik' column in the second column, one piece per row.

Unfortunately, MySQL doesn't have a built-in string-split function. One way is to create a temporary table of numbers, as follows, with at least as many rows as the largest number of values in any one row:
create temporary table numbers as (
select 1 as n
union select 2 as n
union select 3 as n
union select 4 as n
union select 5 as n
union select 6 as n
union select 7 as n
union select 8 as n
);
Then you can use substring_index to get the desired result:
select id,
substring_index( substring_index(lik, ',', n),',', -1) as lik
from tbl_users_posts
join numbers on char_length(lik) - char_length(replace(lik, ',', '')) >= n - 1
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=84bc1b4e60a7feea5af0d0b568bc7bcb
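To see why the nested substring_index call picks out the n-th piece, here is a rough Python sketch of the same logic. The helper names substring_index and split_rows are made up for this illustration, and the emulation only aims to cover the behaviour used in the query above (a non-empty delimiter and a positive or negative count).

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL's SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Negative count: everything to the right of the count-th delimiter from the end.
    return delim.join(parts[count:])


def split_rows(row_id: int, lik: str, max_n: int = 8) -> list:
    """Mimic the join: keep n while (number of commas) >= n - 1."""
    commas = len(lik) - len(lik.replace(",", ""))
    return [
        # Inner call keeps the first n pieces, outer call keeps the last of those.
        (row_id, substring_index(substring_index(lik, ",", n), ",", -1))
        for n in range(1, max_n + 1)
        if commas >= n - 1
    ]


rows = split_rows(1, "10,11,12,13,14,15")
for row in rows:
    print(row)
```

The max_n parameter plays the role of the numbers table: any value beyond its size is silently dropped, which is why the table must be at least as long as the longest list.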
Edit.
Another method, if you have MySQL 8+ and the lik string may split into thousands of pieces, is to build the temporary table without a loop by using a recursive CTE, as follows:
CREATE TEMPORARY TABLE numbers WITH RECURSIVE cte AS
( select 1 as n
union all
select n + 1
from cte
where n < 1000 -- stop condition; LIMIT inside a recursive CTE needs MySQL 8.0.19+
)
SELECT * FROM cte;
And then use the same query as above:
select id,
substring_index( substring_index(lik, ',', n),',', -1) as lik
from tbl_users_posts
join numbers on char_length(lik) - char_length(replace(lik, ',', '')) >= n - 1
order by id asc;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9231202418ce9b17aef8609ad6875fbe
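If you want to try the recursive number generator without a MySQL server at hand, the sketch below runs an equivalent CTE through Python's built-in sqlite3 module. SQLite is only a stand-in here (it shares the WITH RECURSIVE syntax but is not MySQL), and it uses a WHERE n < 1000 stop condition.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for MySQL 8+.
conn = sqlite3.connect(":memory:")

numbers = [
    n
    for (n,) in conn.execute(
        """
        WITH RECURSIVE cte(n) AS (
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM cte WHERE n < 1000
        )
        SELECT n FROM cte ORDER BY n
        """
    )
]
conn.close()

print(len(numbers), numbers[0], numbers[-1])
```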

If each lik value is an id that can be found in another table in the database, you can do:
select p.id, t.lik_id
from table_containing_lik t
join tbl_users_posts p on find_in_set(t.lik_id, p.lik)
where p.id=1;

Related

Split string on token and aggregate on split words

I have a field that includes files that have 'words' separated by an underscore, _, such as this:
`file_name`
MY_NEW_MOVIE.mov
HD_VIDEO_720p.mov
720p_DISNEY_MOVIE.mov
LG_TYLERPERRY_FEATURE_HD_8CH_EN_L9714343_16X9_235_2398_FINAL_FRSUB.srt
And I want to split on _ and get the count of each word after the split, meaning:
`word` `count`
MY 1
NEW 1
MOVIE 2
HD 1
VIDEO 1
720p 2
DISNEY 1
Would it be possible/feasible to do this in SQL? So far I have just gotten the perfunctory "remove the file extension", but not sure how I could split on the token and then count that:
select left(file_name, length(file_name) - length(substring_index(file_name, '.', -1))-1) from asset
Additionally,
The result you want can be achieved with a query derived from this answer, which uses a generated numbers table along with SUBSTRING_INDEX to split out all the words in each file_name. This is then used as a derived table to count the occurrence of each word. Note the numbers table must have sufficient values to cover the maximum number of words in a filename (12 for this sample data).
SELECT word, COUNT(*)
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(LEFT(file_name, LENGTH(file_name)-4), '_', numbers.n), '_', -1) AS word
FROM (
select 1 n union all
select 2 union all select 3 union all select 4 union all
select 5 union all select 6 union all select 7 union all
select 8 union all select 9 union all select 10 union all
select 11 union all select 12
) numbers
JOIN asset ON LENGTH(file_name)
- LENGTH(REPLACE(file_name, '_', '')) >= numbers.n - 1
) w
GROUP BY word
Output (for your sample data):
word COUNT(*)
16X9 1
235 1
2398 1
720p 2
8CH 1
DISNEY 1
EN 1
FEATURE 1
FINAL 1
FRSUB 1
HD 2
L9714343 1
LG 1
MOVIE 2
MY 1
NEW 1
TYLERPERRY 1
VIDEO 1
Demo on dbfiddle
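As a cross-check on the output above, here is a small application-side sketch in Python that does the same split-and-count on the sample file names. The function name word_counts is just for illustration, and unlike the SQL it strips whatever extension is present rather than assuming four characters.

```python
import os
from collections import Counter

# Sample data copied from the question.
file_names = [
    "MY_NEW_MOVIE.mov",
    "HD_VIDEO_720p.mov",
    "720p_DISNEY_MOVIE.mov",
    "LG_TYLERPERRY_FEATURE_HD_8CH_EN_L9714343_16X9_235_2398_FINAL_FRSUB.srt",
]


def word_counts(names):
    """Strip the extension, split on '_' and count each word."""
    counts = Counter()
    for name in names:
        stem, _ext = os.path.splitext(name)
        counts.update(stem.split("_"))
    return counts


counts = word_counts(file_names)
for word in sorted(counts):
    print(word, counts[word])
```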
Assuming the filenames always have exactly three components, SUBSTRING_INDEX can get the job done here:
SELECT word, COUNT(*) AS count
FROM
(
SELECT SUBSTRING_INDEX(file_name, '_', 1) AS word FROM asset
UNION ALL
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(file_name, '_', 2), '_', -1) FROM asset
UNION ALL
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(file_name, '_', -1), '.', 1) FROM asset
) t
GROUP BY word;
Demo
Note: This answer was given based on the OP's original sample data, where all filenames had exactly three underscore-separated components. This answer will not work for the updated question.

Redshift nested json extraction

I have a table with two columns, one column named user, one json column named js that looks like this:
{"1":{"partner_id":54,"provider_id":13},
"2":{"partner_id":56,"provider_id":8},
"3":{"partner_id":2719,"provider_id":274}}
I want to select all 'provider_id' values in one column/row. So it should look like this:
user| provider_ids
0001| 13,8,274
0002| 21,36,57,12
How can I do this? Thanks in advance!
The JSON format you provided is not easy to work with.
I created a table for test purposes:
create table json_test as
select '0001' as usr, '{"1":{"partner_id":54,"provider_id":13},
"2":{"partner_id":56,"provider_id":8},
"3":{"partner_id":2719,"provider_id":274}}'
as json_text
union all
select '0002' as usr, '{"1":{"partner_id":54,"provider_id":21},
"2":{"partner_id":56,"provider_id":36},
"3":{"partner_id":56,"provider_id":57},
"4":{"partner_id":2719,"provider_id":12}}'
as json_text;
Query to return results:
with NS AS (
select 1 as n union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10
)
select usr,
listagg(trim(TRIM(split_part(SPLIT_PART(js.json_text, '},', NS.n),'"provider_id":',2)),'}'),',') within group(order by null) AS t
from NS
join json_test js ON true and NS.n <= REGEXP_COUNT(js.json_text, '\\},') + 1
group by usr;
Notes:
1) Do not name a column "user", as it is a reserved keyword.
2) Add as many rows to the NS subquery as the maximum number of provider records in any JSON value.
3) Yes, I know, this isn't very readable SQL :D
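If pulling the rows out and parsing them in application code is an option, the string splitting can be avoided entirely. A minimal Python sketch, assuming the js value arrives as a plain string (the provider_ids helper is a made-up name, and this runs outside Redshift):

```python
import json

# Sample value copied from the question.
js = (
    '{"1":{"partner_id":54,"provider_id":13},'
    '"2":{"partner_id":56,"provider_id":8},'
    '"3":{"partner_id":2719,"provider_id":274}}'
)


def provider_ids(json_text: str) -> str:
    """Return the provider_id values as one comma-separated string, in document order."""
    records = json.loads(json_text)
    return ",".join(str(rec["provider_id"]) for rec in records.values())


print(provider_ids(js))
```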

mysql find numbers in query that are NOT in table

Is there a simple way to compare a list of numbers in my query to a column in a table to return the ones that are NOT in the db?
I have a comma separated list of numbers (1,57, 888, 99, 76, 490, etc etc) that I need to compare to the number column in a table in my DB. SOME of those numbers are in the table, some are not. I need the query to return those that are in my comma separated list, but are NOT in the DB...
I would put the list of numbers to be checked in a table of their own, then use WHERE NOT EXISTS to check whether they exist in the table to be queried. See this SQLFiddle demo for an example of how this might be accomplished:
If you're comfortable with this syntax, you can even avoid putting into a temp table:
SELECT * FROM (
SELECT 1 AS mycolumn
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
UNION
SELECT 6
UNION
SELECT 7
) a
WHERE NOT EXISTS ( SELECT 1 FROM mytable b
WHERE b.mycolumn = a.mycolumn )
UPDATE per comments from OP
If you can insert your very long list of numbers into a table, then query as follows to get the numbers that are not found in the other table:
SELECT mynumber
FROM mytableof37000numbers a
WHERE NOT EXISTS ( SELECT 1 FROM myothertable b
WHERE b.othernumber = a.mynumber)
Alternately
SELECT mynumber
FROM mytableof37000numbers a
WHERE a.mynumber NOT IN ( SELECT b.othernumber FROM myothertable b )
Hope this helps.
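Here is a small runnable sketch of the NOT EXISTS variant, using Python's built-in sqlite3 as a stand-in for MySQL. The table name candidates and the sample values are assumptions made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (mynumber INTEGER)")      # the list to check (hypothetical name)
conn.execute("CREATE TABLE myothertable (othernumber INTEGER)")  # the existing data
conn.executemany(
    "INSERT INTO candidates VALUES (?)",
    [(n,) for n in (1, 57, 888, 99, 76, 490)],
)
conn.executemany("INSERT INTO myothertable VALUES (?)", [(57,), (99,)])

# Keep only the candidate numbers with no matching row in myothertable.
missing = [
    n
    for (n,) in conn.execute(
        """
        SELECT mynumber
        FROM candidates a
        WHERE NOT EXISTS (SELECT 1 FROM myothertable b
                          WHERE b.othernumber = a.mynumber)
        ORDER BY mynumber
        """
    )
]
conn.close()

print(missing)
```

One difference between the two forms worth knowing: if othernumber can be NULL, the NOT IN version returns no rows at all, while NOT EXISTS is unaffected.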
Maybe this is what you are looking for.
Convert your CSV to rows using SUBSTRING_INDEX, then use the NOT IN operator to find the values that are not present in the DB.
Finally, convert the result back to CSV using GROUP_CONCAT.
select group_concat(value) from(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.a, ',', n.n), ',', -1) value
FROM csv t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.a) - LENGTH(REPLACE(t.a, ',', '')))) ou
where value not in (select a from db)
SQLFIDDLE DEMO
CSV TO ROWS referred from this ANSWER
You could use the 'IN' clause of MySQL. Maybe check this out IN clause tutorial

Sql syntax to insert every thousandth positive integer, starting from 1 and up until 1 million?

I have the following table:
CREATE TABLE dummy (
thousand INT(10) UNSIGNED,
UNIQUE(thousand)
);
Is there sql syntax I can use to insert every thousandth positive integer, starting from 1 and up until 1 million? I can achieve this in php, but I was wondering if this was possible without using a stored procedure.
thousand
1
1001
2001
3001
4001
...
998001
999001
insert into dummy
( thousand )
select
PreQuery.thousand
from
( select
@sqlvar thousand,
@sqlvar := @sqlvar + 1000
from
AnyTableWithAtLeast1000Records,
( select @sqlvar := 1 ) sqlvars
limit 1000 ) PreQuery
You can insert from a select statement. Using a MySQL user variable, start with 1. Then join to ANY table in your system that has 1000 (or more) records, just to generate rows. We don't take any actual column from that table; we only need it for the row position. @sqlvar starts at 1 and is returned in the column named thousand. Then 1000 is immediately added to it for the next record in the "AnyTable...".
Unfortunately, mysql doesn't support this with any special SQL function.
You have to populate a table, which isn't such a big deal - it would only be 1000 rows.
You can also hack together a temporary table with unions, but that's hardly elegant - might as well use a table.
Other databases do support it, eg with postgres' generate_series() function, but that is little consolation.
As a side note, I often find it handy to have a table populated with consecutive numbers from 1 up to a large number for just such an occasion, and I would just select 1000 * (num - 1) + 1 from numbers where num <= 1000.
For a one statement query, i.e. without introducing any additional tables and supporting statement, I'd use this approach in 'pseudo' SQL:
SELECT (D1.Digit + D2.Digit + D3.Digit)* 1000 + 1
FROM
(
SELECT 0 AS Digit UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9
) AS D1
CROSS JOIN
(
SELECT 0 AS Digit UNION ALL
SELECT 10 UNION ALL
SELECT 20 UNION ALL
SELECT 30 UNION ALL
SELECT 40 UNION ALL
SELECT 50 UNION ALL
SELECT 60 UNION ALL
SELECT 70 UNION ALL
SELECT 80 UNION ALL
SELECT 90
) AS D2
CROSS JOIN
(
SELECT 0 AS Digit UNION ALL
SELECT 100 UNION ALL
SELECT 200 UNION ALL
SELECT 300 UNION ALL
SELECT 400 UNION ALL
SELECT 500 UNION ALL
SELECT 600 UNION ALL
SELECT 700 UNION ALL
SELECT 800 UNION ALL
SELECT 900
) AS D3
WHERE ((D1.Digit + D2.Digit + D3.Digit)* 1000 + 1) < 1000000
I'm not 100% sure but it should run fine in mysql or perhaps require some minor change.
If you're able to reuse parts of the query, it becomes much prettier, for example in SQL Server, I'd write it as follows:
WITH Digits AS
(
SELECT 0 AS Digit UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9
)
SELECT (D1.Digit + D2.Digit + D3.Digit)* 1000 + 1
FROM (SELECT Digit FROM Digits) AS D1
CROSS JOIN (SELECT Digit * 10 AS Digit FROM Digits) AS D2
CROSS JOIN (SELECT Digit * 100 AS Digit FROM Digits) AS D3
WHERE ((D1.Digit + D2.Digit + D3.Digit)* 1000 + 1) < 1000000
Keep an eye on where multiplication happens, it might be more efficient to multiply in sub-queries, rather than in the resulting expression.
Here is a trick that I usually use for this sort of problem, similar to @sergeBelov's solution:
Create an anchor table, a temp table, and fill it with values from 0 to 9, like so:
CREATE TABLE TEMP (Digit int);
INSERT INTO TEMP VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Then you can do this:
INSERT INTO dummy(thousand)
SELECT 1 + (id - 1) * 1000 AS n
FROM
(
SELECT t3.digit * 100 + t2.digit * 10 + t1.digit + 1 AS id
FROM TEMP AS t1
CROSS JOIN TEMP AS t2
CROSS JOIN TEMP AS t3
) t;
SQL Fiddle Demo
How does this work?
The sequence you are looking for (1, 1001, 2001, ..., 998001, 999001), with 1000 terms, is called an arithmetic progression, and the nth term of the sequence is given by:
a + (n - 1) * d
In your sequence: a = 1, d = 1000
where a is the first term of the sequence, n is the index of the term, and d is the common difference between any two successive terms.
The subquery:
SELECT t3.digit * 100 + t2.digit * 10 + t1.digit + 1 AS id
FROM TEMP AS t1
CROSS JOIN TEMP AS t2
CROSS JOIN TEMP AS t3;
This generates the numbers from 1 to 1000 (the total number of terms in your sequence). The outer select then computes each term of the sequence from these numbers as 1 + (id - 1) * 1000.
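The same arithmetic can be checked quickly outside the database. The Python sketch below mirrors the three-way cross join of the digits table with itertools.product and applies the nth-term formula; the variable names are only for illustration.

```python
from itertools import product

digits = range(10)  # plays the role of the TEMP table (0..9)

# Cross join of three copies of the digits: ids 1..1000.
ids = [t3 * 100 + t2 * 10 + t1 + 1 for t1, t2, t3 in product(digits, repeat=3)]

# nth term of the arithmetic progression with a = 1, d = 1000.
terms = sorted(1 + (i - 1) * 1000 for i in ids)

print(len(terms), terms[:3], terms[-2:])
```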

Counting word occurrences in a table column

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurrences of each word in a group of rows from this table.
If there are 2 rows with these fields:
"I like to eat bananas"
"I don't like to eat like a monkey"
I want to get
word | count()
---------------
like 3
eat 2
to 2
i 2
a 1
Any idea? I am using MySQL 5.2.
@Elad Meidar, I like your question and I found a solution:
SELECT SUM(total_count) as total, value
FROM (
SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
FROM table_name t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
ORDER BY value
) AS x
GROUP BY x.value
) AS y
GROUP BY value
Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1
First we run a query to extract all the words, as explained here by @peterm (follow his instructions if you want to customize the total number of words processed). Then we turn that into a sub-query and COUNT and GROUP BY the value of each word. Finally, another query on top of that groups together words that differ only by trailing punctuation, e.g. hello = hello!, using REPLACE.
I would recommend not to do this in SQL at all. You're loading DB with something that it isn't best at. Selecting a group of rows and doing frequency calculation on the application side will be easier to implement, will work faster and will be maintained with less issues/headaches.
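For what it's worth, the application-side version is only a few lines. A minimal Python sketch, assuming the rows have already been fetched into a list of strings (word_frequencies is a made-up helper name, and the punctuation handling is deliberately simple):

```python
from collections import Counter

# Sample rows copied from the question.
rows = [
    "I like to eat bananas",
    "I don't like to eat like a monkey",
]


def word_frequencies(sentences):
    """Lower-case each sentence, split on whitespace and count the words."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence.lower().split():
            word = token.strip(".,!?")  # drop surrounding punctuation only
            if word:
                counts[word] += 1
    return counts


freq = word_frequencies(rows)
for word, count in freq.most_common():
    print(word, count)
```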
You can try this slightly perverse way:
SELECT
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
FROM table_name -- placeholder for your table
ORDER BY `count` DESC
This query can be very slow. Also, it looks pretty ugly.
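The length-difference trick is easier to see with concrete strings. A rough Python sketch of the same formula (count_occurrences is a name invented for this example):

```python
def count_occurrences(field: str, word: str) -> int:
    """Count occurrences of word in field via the length difference after removing it."""
    if not word:
        raise ValueError("word must not be empty")
    return (len(field) - len(field.replace(word, ""))) // len(word)


print(count_occurrences("I don't like to eat like a monkey", "like"))
print(count_occurrences("likely like", "like"))
```

Note that this counts substrings, not whole words, so "likely" contributes to the count for "like". It also handles one given word per query rather than producing a full frequency table.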
I think you should do it like indexing, with additional table.
Whenever you create, update, or delete a row in your original table, you should update your indexing table. That indexing table should have two columns: the word and its number of occurrences.
I think you are trying to do too much with SQL if all the words are in one field of each row. I recommend to do any text processing/counting with your application after you grab the text fields from the db.