Count multiple occurrences of substrings in MySQL

I'm pulling data from the Twitter API into my DB. There is a column 'hashtags' which stores a list of the hashtags used in the tweet.
Table name: brexittweets
Column: hashtags varchar(500)
I want to count the number of hashtags. For example
Hashtags
Tweet1: ['EUref', 'Brexit', 'poll']
Tweet2: ['Brexit', 'Blair']
Tweet3: ['Brexit', 'Blair', 'EUref']
Result should be:
hashtag count(hashtag)
Brexit 3
EUref 2
Blair 2
poll 1
What I was thinking of doing:
Tried to take substring between quotes ' ', but it occurs multiple times in the same row.
Tried using strpos to find instances of ' ', but it returns only the first instance.
Is there a way to do this with queries? I was thinking of trying out a procedure, but it gets complicated because I need to print these results on a web page using PHP.

If you had normalized your table so that each tag in a tweet is stored on its own row, your problem would be solved easily by using COUNT with GROUP BY.

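Assuming all the tags are separated by ', ', you can count the number of tags in each row. The separator "', '" is 4 characters long, so dividing the length difference by 4 gives the number of separators, and adding 1 gives the number of tags:
SELECT
    hashtags,
    ROUND(
        (
            LENGTH(hashtags)
            - LENGTH(REPLACE(hashtags, "', '", ""))
        ) / 4
    ) + 1 AS count
FROM brexittweets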
Here's the SQL Fiddle.
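The query above gives the number of tags per row, not the per-hashtag totals from the question. One way to get those totals is to unpack the list strings outside SQL. A minimal Python sketch, assuming each hashtags value is a Python-style list literal exactly as shown in the question (the function name count_hashtags and the sample rows are made up for illustration):

```python
import ast
from collections import Counter


def count_hashtags(rows):
    """Count every hashtag across rows stored as list-literal strings."""
    counts = Counter()
    for raw in rows:
        # "['EUref', 'Brexit']" -> ['EUref', 'Brexit']; literal_eval only
        # accepts literals, so it does not execute arbitrary code.
        tags = ast.literal_eval(raw)
        counts.update(tags)
    return counts


# Sample values copied from the question.
rows = [
    "['EUref', 'Brexit', 'poll']",
    "['Brexit', 'Blair']",
    "['Brexit', 'Blair', 'EUref']",
]

hashtag_counts = count_hashtags(rows)
for tag, n in hashtag_counts.most_common():
    print(tag, n)
```

If the tags were stored one per row instead, the same result would come straight from COUNT with GROUP BY, as the answer above says.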

Related

How to select parts of string in MySQL 5.x

I have a varchar(255) field within a source table and the following contents:
50339 My great example
2020002 Next ID but different title
202020 Here we go
Now I am processing the data with an INSERT ... SELECT query. From this field I need the INT number at the beginning of the field. It is followed by 2 spaces and a text of variable length; I need this text as well, but for another field. In general, I want to put the text and the ID, which are now in one field, into two fields.
I tried to grab it like this:
SELECT STATUS REGEXP '^(/d{6,8}) ' FROM products_test WHERE STATUS is not null
But then I learned that MySQL 5.x cannot extract a regexp match within the SELECT statement.
How could I separate those values within a single SELECT statement, so I can use it in my INSERT ... SELECT?
The correct solution from user slaakso led to another, related problem: sometimes the STATUS field is empty, which results in only one inserted value, whereas when there is a value I split it into two fields. So the count does not match.
My CASE statement with his solution somehow contains a syntax problem:
CASE STATUS WHEN ''
THEN(
NULL,
NULL
)
ELSE(
cast(STATUS as unsigned),
substring(STATUS, locate(' ', STATUS)+3)
)
END
You can do the following. Note that you need to treat the columns separately:
select
if(ifnull(status, '')!='', cast(status as unsigned), null),
if(ifnull(status, '')!='', substring(status, locate(' ', status)+2), null)
from products_test;
See db-fiddle
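The two-column idea can be tried without a MySQL server. A rough sketch using Python's built-in sqlite3 module as a stand-in: this is SQLite, not MySQL, so CAST(... AS INTEGER) and INSTR take the place of cast(... as unsigned) and locate. The sample rows and the two-space separator are assumptions based on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products_test (status TEXT)")
conn.executemany(
    "INSERT INTO products_test (status) VALUES (?)",
    [
        ("50339  My great example",),  # ID, two spaces, title
        ("",),                         # empty status
        (None,),                       # NULL status
        ("202020  Here we go",),
    ],
)

# Each output column gets its own expression; a CASE without ELSE
# yields NULL, so empty and NULL statuses produce (NULL, NULL).
split_rows = conn.execute(
    """
    SELECT
        CASE WHEN IFNULL(status, '') != ''
             THEN CAST(status AS INTEGER) END,
        CASE WHEN IFNULL(status, '') != ''
             THEN SUBSTR(status, INSTR(status, '  ') + 2) END
    FROM products_test
    ORDER BY rowid
    """
).fetchall()

for row in split_rows:
    print(row)
```

Because every source row yields exactly two values, the column count of an INSERT ... SELECT stays the same whether or not the status is empty.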

SQL Select if substring occurs then copy until substring else keep original

I have a database with TV guide data, and in my description field (VARCHAR) I sometimes have a '|' followed by the rating. I used to check this in PHP, before converting it all to XML, but I would like to do this in SQL.
So if I have this string:
This is the description | rating pg-13
Then I want to keep
This is the description
but if there is no '|', I want the whole string.
I tried using SUBSTRING, but can't get it to work.
My query now is:
SELECT *, SUBSTRING(`long_description`, 1, POSITION('|' IN `long_description`)) FROM `programs` WHERE station_id = 1
This works only one way: it gives me the string before the '|', but if there is no '|' it gives an empty column.
Based on the use of backticks, you might be using MySQL. If so, substring_index() does exactly what you want:
select substring_index(long_description, '|', 1)
How about this:
SELECT
*,
IF(long_description LIKE '%|%',
SUBSTRING(`long_description`,
1,
POSITION('|' IN `long_description`)),
long_description)
FROM
`programs`
WHERE
station_id = 1
The IF function basically just checks whether the field contains a | and applies your routine if it does. Otherwise it simply returns the complete long_description value.
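One detail worth checking: SUBSTRING(x, 1, POSITION('|' IN x)) uses the position of the '|' as the length, so the result still ends with the '|' itself. A small sketch of the same conditional with the length reduced by one, run against SQLite through Python's sqlite3 module as a stand-in for MySQL (INSTR and SUBSTR are the SQLite spellings; the RTRIM and the sample rows are my own additions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programs (long_description TEXT)")
conn.executemany(
    "INSERT INTO programs (long_description) VALUES (?)",
    [
        ("This is the description | rating pg-13",),
        ("No rating here",),
    ],
)

# Keep the text before the first '|' (without the '|' and without the
# trailing space); return the whole string when there is no '|'.
descriptions = [
    row[0]
    for row in conn.execute(
        """
        SELECT CASE
                 WHEN INSTR(long_description, '|') > 0
                 THEN RTRIM(SUBSTR(long_description, 1,
                                   INSTR(long_description, '|') - 1))
                 ELSE long_description
               END
        FROM programs
        ORDER BY rowid
        """
    )
]

for text in descriptions:
    print(repr(text))
```

In MySQL itself, substring_index(long_description, '|', 1) from the first answer already covers both cases in one call.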

Find all the customer names consisting of three or more words (for example King George V)

schema:
customers(name, mailid, city)
What to find:
Find all customer names consisting of three or more words (for example King George V).
What I tried:
select name from customers
where name like
'%[A-Za-z0-9][A-Za-z0-9]% %[A-Za-z0-9][A-Za-z0-9]% %[A-Za-z0-9][A-Za-z0-9]%'
What surprises me:
If I try it for two words (removing the last %[A-Za-z0-9]% from my query), it works fine, but it does not work for three words :(
MySQL solution:
If the words in a name are separated by a single space character, the number of words is the number of spaces plus one.
Try the following:
select name from customers
where ( length( name )
-
length( replace( name, ' ', '' ) ) + 1
) >= 3
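The space-counting trick is easy to try on sample data. A minimal sketch, using Python's sqlite3 module as a stand-in for MySQL (LENGTH and REPLACE exist in both) and assuming names contain single spaces with none leading or trailing; the sample names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, mailid TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [
        ("King George V",),
        ("Ann Lee",),
        ("Cher",),
        ("Mary Jane Watson Parker",),
    ],
)

# Words = spaces + 1, where spaces = length with spaces minus
# length after removing every space.
three_plus = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
        FROM customers
        WHERE LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 >= 3
        ORDER BY rowid
        """
    )
]

print(three_plus)
```

Names with doubled, leading, or trailing spaces would be over-counted by this approach, so the data may need cleaning first.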
In T-SQL, the LIKE clause can contain multiple wildcard checks, e.g.:
SELECT * FROM Customers WHERE Name like '% % %'
will return those names that contain at least two spaces.
If you are consistent with the spacing between names, you could use this logic
SELECT LENGTH(name)-LENGTH(REPLACE(name,' ',''))
FROM customers
Or you can try this if your SQL dialect doesn't have a length function (which is the situation I ran into while doing an online exercise). Inspired by the answers above:
SELECT name FROM customers WHERE (replace(name,' ','*')) LIKE '%*%*%'

Match comma separated list with input in SQL statement

I'm optimizing my SQL statement to make it faster.
I have a comma separated list of zip codes like
1111, 1112,1115,1112 etc etc
Now in my query I want to check whether the input matches one of those zip codes. If so, it should return the ID of the object that has all those zip codes.
But what is the best way to do this? Right now I'm doing
AND ( loc.loc_zip LIKE '%".$_REQUEST['zip']."%'
Validation of the input will be added of course, but this is just for testing. I have tested this and it seems a bit slow.
Is this the best way to do this?
You should use IN:
select * from Users where userid in (1,2,3,4,45,6,656)
Edit:
If the zip codes are character strings, you can only use IN if you wrap each one in single quotes:
select * from loc where loc.loc_zip in ('1111','1112','1115','1112')
If $_REQUEST['zip'] already contains the quoted values, then:
select * from loc where loc.loc_zip in (.$_REQUEST['zip'].)
If the zip codes are integers, you can use the first statement.
Bear in mind that you must enclose each item of the list in '' or it won't work.
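Since the value comes straight from the request, it is safer to bind it as a parameter than to concatenate it into the SQL string. A rough sketch of an IN lookup with bound values, using Python's sqlite3 module as a stand-in for MySQL and assuming a hypothetical layout with one zip code per row (the table contents and the wanted list are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (id INTEGER, loc_zip TEXT)")
conn.executemany(
    "INSERT INTO loc (id, loc_zip) VALUES (?, ?)",
    [(1, "1111"), (2, "1112"), (3, "2000")],
)

# Values to look for, e.g. after validating the user's input.
wanted = ["1111", "1112", "1115"]

# One "?" per value; the values themselves never become part of the
# SQL text, so quoting and injection are handled by the driver.
placeholders = ", ".join("?" for _ in wanted)
query = f"SELECT id FROM loc WHERE loc_zip IN ({placeholders}) ORDER BY id"

matching_ids = [row[0] for row in conn.execute(query, wanted)]
print(matching_ids)
```

With one zip code per row the comparison is an exact match that an index can serve, which a LIKE '%...%' pattern cannot use.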

Find matches as close to exact as possible in a database - which way is better?

I have a situation:
I have a database (MySQL) which contains products and their codes like this
BLACK SUGAR BS 709
HOT SAUCE AX889/9
TOMY 8861
I got an Excel spreadsheet, which I converted to CSV, that contains the prices for the products. It has 2 columns, code and price, like this:
BS709 23.00
AX 889 /9 10.89
8861 1.69
I made a script to update the product prices by searching the database for the respective product code, using a FOREACH loop and a LIKE '%...%' query:
FOREACH row in CSV, search the database using "WHERE product_code LIKE '%code%'".
This is of course a primitive and not very successful way of updating the prices, because the codes in the CSV are not an exact match (in syntax) for those in the database, so if two products in the DB contain BS709 in their code (e.g. BS70923), I get multiple matches.
Is there a better way of doing this?
You could strip spaces and other characters from the columns using MySQL's REPLACE() before comparing. This will return all exact matches, regardless of any spaces they contain.
SELECT * FROM table WHERE REPLACE( product_code, ' ', '' ) LIKE 'code'
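As a sketch of how the space-stripping comparison could drive the price update, here is a small example using Python's sqlite3 module as a stand-in for MySQL (REPLACE behaves the same for this purpose). The products table, its columns, and the sample rows are assumptions modelled on the question, and spaces are stripped from both sides before an exact comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_code TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (product_code, price) VALUES (?, NULL)",
    [("BS 709",), ("AX889/9",), ("8861",), ("BS70923",)],
)

# Rows as they might come out of the CSV file: (code, price).
csv_rows = [("BS709", 23.00), ("AX 889 /9", 10.89), ("8861", 1.69)]

for code, price in csv_rows:
    # Compare the codes with all spaces removed on both sides, using
    # equality rather than a wildcard LIKE.
    conn.execute(
        """
        UPDATE products
        SET price = ?
        WHERE REPLACE(product_code, ' ', '') = REPLACE(?, ' ', '')
        """,
        (price, code),
    )

prices = dict(conn.execute("SELECT product_code, price FROM products"))
print(prices)
```

Because the comparison is an equality test, BS70923 is left untouched when the CSV row is BS709.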
Given your examples, I would recommend removing all spaces from both, and then just looking for when the beginning or end of a code matches exactly:
where replace(e.code, ' ', '') like concat(replace(db.code, ' ', ''), '%') or
replace(e.code, ' ', '') like concat('%', replace(db.code, ' ', '')) or
replace(db.code, ' ', '') like concat(replace(e.code, ' ', ''), '%') or
replace(db.code, ' ', '') like concat('%', replace(e.code, ' ', ''));
This may not work for the specific case when one code is a prefix of another.
In any case, if the product codes in a spreadsheet are different from the product codes in the database, I think you have bigger problems. If you cannot really fix the spreadsheets, I would recommend that you manually/semi-automatically create a synonyms table in the database. This would have the Excel product code in one column and the correct product code in the other. Then you can do the lookup just by joining this together.
Yes. That is work. But probably less work than struggling with this problem and getting poor results that have to be repeatedly updated.