Populate column with number of substrings in another column - mysql

I have two tables "A" and "B". Table "A" has two columns "Body" and "Number." The column "Number" is empty, the purpose is to populate it.
Table A: Body / Number
ABABCDEF /
IJKLMNOP /
QRSTUVWKYZ /
Table "B" only has one column:
Table B: Values
AB
CD
QR
Here is what I am looking for as a result:
ABABCDEF / 3
IJKLMNOP / 0
QRSTUVWKYZ / 1
In other words, I want to create a query that looks up, for each string in the "Body" column, how many times the substrings in the "Values" column appear.
How would you advise me to do that?

Here's the finished query; explanation will follow:
SELECT
  Body,
  SUM(
    CASE WHEN Value IS NULL THEN 0
         ELSE (LENGTH(Body) - LENGTH(REPLACE(Body, Value, ''))) / LENGTH(Value)
    END
  ) AS Val
FROM (
  SELECT TableA.Body, TableB.Value
  FROM TableA
  LEFT JOIN TableB ON INSTR(TableA.Body, TableB.Value) > 0
) CharMatch
GROUP BY Body
There's a SQL Fiddle here.
Now for the explanation...
The inner query matches TableA strings with TableB substrings:
SELECT TableA.Body, TableB.Value
FROM TableA
LEFT JOIN TableB ON INSTR(TableA.Body, TableB.Value) > 0
Its results are:
BODY VALUE
-------------------- -----
ABABCDEF AB
ABABCDEF CD
IJKLMNOP
QRSTUVWKYZ QR
If you just count these you'll only get a value of 2 for the ABABCDEF string because it just looks for the existence of the substrings and doesn't take into consideration that AB occurs twice.
MySQL doesn't appear to have an OCCURS type function, so to count the occurrences I used the workaround of comparing the length of the string to its length with the target string removed, divided by the length of the target string. Here's an explanation:
REPLACE('ABABCDEF', 'AB', '') ==> 'CDEF'
LENGTH('ABABCDEF') ==> 8
LENGTH('CDEF') ==> 4
So removing all AB occurrences shortens the string by 8 - 4, or 4 characters. Divide the 4 by 2 (LENGTH('AB')) to get the number of AB occurrences: 2
String IJKLMNOP needs special handling. It doesn't contain any of the target values, so the LEFT JOIN gives it a NULL Value and the length arithmetic would produce NULL instead of 0. The CASE inside the SUM protects against this.
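The length-difference arithmetic can be checked outside the database. Below is a minimal Python sketch of the same idea, not the MySQL query itself; the function name `count_occurrences` and the hard-coded sample lists are assumptions made for illustration.

```python
def count_occurrences(body, value):
    """Count non-overlapping occurrences of value in body via the length difference."""
    if not value:
        # Guard: an empty or missing value would otherwise divide by zero.
        return 0
    removed_chars = len(body) - len(body.replace(value, ""))
    return removed_chars // len(value)


bodies = ["ABABCDEF", "IJKLMNOP", "QRSTUVWKYZ"]
values = ["AB", "CD", "QR"]

# Sum the per-substring counts for each body, as the SUM(...) GROUP BY does.
totals = {b: sum(count_occurrences(b, v) for v in values) for b in bodies}
print(totals)
```

With the sample data from the question this gives 3, 0 and 1 for the three strings.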

You want an update query:
update A
set cnt = (select sum((length(a.body) - length(replace(a.body, b.value, ''))) / length(b.value))
           from b
          )
This uses a little trick for counting the number of occurrences of b.value in a given string. It replaces each occurrence with an empty string and measures the difference in length between the two strings. That difference is then divided by the length of the string being replaced.
If you just wanted the number of matches (so the first value would be "2" instead of "3"):
update A
set cnt = (select count(*)
           from b
           where a.body like concat('%', b.value, '%')
          )
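To see how this variant differs from the occurrence count, here is a small Python sketch that counts how many values appear at least once in a string. The name `matching_values` is hypothetical, and Python's `in` is case-sensitive, whereas LIKE may be case-insensitive depending on the collation.

```python
def matching_values(body, values):
    """Number of values that occur at least once in body (like LIKE '%value%')."""
    return sum(1 for v in values if v in body)


values = ["AB", "CD", "QR"]
match_counts = {b: matching_values(b, values)
                for b in ["ABABCDEF", "IJKLMNOP", "QRSTUVWKYZ"]}
print(match_counts)
```

Here ABABCDEF yields 2 rather than 3, because AB is counted once even though it occurs twice.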

Related

count comma-separated values from a column - sql

I want to count the number of values in a comma-separated column.
I have used this
(LENGTH(Col2) - LENGTH(REPLACE(Col2,",","")) + 1)
in my select query.
Demo:
id | mycolumn
1 2,5,8,60
2 4,5,1
3 5,Null,Null
The query result for the first two rows is correct (4 for id 1, 3 for id 2), but for the 3rd row it counts the Null values as well.
Here is what I believe the actual state of your data is:
id | mycolumn
1 2,5,8,60
2 4,5,1
3 NULL
In other words, the entire value for mycolumn in your third record is NULL, likely from doing an operation involving a NULL value. If you actually had the text NULL your current query should still work.
The way to get around this would be to use COALESCE(val, "") when handling the NULL values in your strings.
A crude way of doing it is to replace the occurrences of ',Null' with nothing first:
SELECT a.id, (LENGTH(REPLACE(mycolumn, ',Null', '')) - LENGTH(REPLACE(REPLACE(mycolumn, ',Null', ''),",","")) + 1)
FROM some_table a
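The same arithmetic can be tried in Python as a rough check. This is only a sketch: the function name `count_csv_items` is made up, and returning 0 for a missing (NULL) column is an assumption, not something the query above does.

```python
def count_csv_items(csv_text):
    """Count comma-separated items after dropping ',Null' placeholders."""
    if csv_text is None:
        # Assumption: a NULL column is treated as holding no values.
        return 0
    cleaned = csv_text.replace(",Null", "")
    # Number of commas plus one, computed by the length difference.
    return len(cleaned) - len(cleaned.replace(",", "")) + 1


samples = ["2,5,8,60", "4,5,1", "5,Null,Null", None]
counts = [count_csv_items(s) for s in samples]
print(counts)
```

Note that the formula returns 1 for an empty string, since zero commas plus one is one.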
If the values refer to the id of rows in another table then you can join against that table using FIND_IN_SET and then count the matches (assuming that the string 'Null' is not an id on that other table)
SELECT a.id, COUNT(b.id)
FROM some_table a
INNER JOIN id_list_table b
ON FIND_IN_SET(b.id, a.mycolumn)
GROUP BY a.id

Natural Sorting SQL ORDER BY

Can anyone lend me a hand as to what I should append to my ORDER BY statement to sort these values naturally:
1
10
2
22
20405-109
20405-101
20404-100
X
Z
D
Ideally I'd like something along the lines of:
1
2
10
22
20404-100
20405-101
20405-109
D
X
Z
I'm currently using:
ORDER BY t.property, l.unit_number
where the values are l.unit_number
I've tried doing l.unit_number * 1 and l.unit_number + 0 but they haven't worked.
Should I be doing sort of ORDER conditional, such as Case When IsNumeric(l.unit_number)?
Thank you.
This will do it:
SELECT value
FROM Table1
ORDER BY value REGEXP '^[A-Za-z]+$'
        ,CAST(value as SIGNED INTEGER)
        ,CAST(REPLACE(value,'-','') AS SIGNED INTEGER)
        ,value
The 4 levels of the ORDER BY:
1. REGEXP assigns any alpha line a 1 and non-alphas a 0.
2. SIGNED INT sorts all of the numbers by the portion preceding the dash.
3. SIGNED INT after removing the dash sorts any of the items with the same value before the dash by the portion after the dash. Potentially could replace number 2, but wouldn't want to treat 90-1 the same as 9-01 should the case arise.
4. Sorts the letters alphabetically.
Demo: SQL Fiddle
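The four sort levels can be mirrored as a tuple sort key. The Python sketch below is an approximation for illustration only: `leading_int` is a hypothetical helper that imitates how CAST(... AS SIGNED) reads the leading digits of a string, and it ignores signs and leading whitespace.

```python
import re


def leading_int(text):
    """Parse the leading digits of text, or return 0 if there are none."""
    m = re.match(r"\d+", text)
    return int(m.group()) if m else 0


def natural_key(value):
    return (
        1 if re.fullmatch(r"[A-Za-z]+", value) else 0,  # level 1: letters last
        leading_int(value),                              # level 2: part before the dash
        leading_int(value.replace("-", "")),             # level 3: digits with dash removed
        value,                                           # level 4: plain string order
    )


units = ["1", "10", "2", "22", "20405-109", "20405-101", "20404-100", "X", "Z", "D"]
ordered = sorted(units, key=natural_key)
print(ordered)
```

Sorting the sample values with this key reproduces the order requested in the question.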

Counting comma separated values in TSQL

SCHEMA / DATA for TABLE :
SubscriberId NewsletterIdCsv
------------ ---------------
11 52,52,,52
We have this denormalized data, where I need to count the number of comma separated values, for which I am doing this :
SELECT SUM(len(newsletteridcsv) - len(replace(rtrim(ltrim(newsletteridcsv)), ',','')) +1) as SubscribersSubscribedtoNewsletterCount
FROM TABLE
WHERE subscriberid = 11
Result :
SubscribersSubscribedtoNewsletterCount
--------------------------------------
4
The problem is some of our data has blanks / spaces in between the comma separated values, if I run the above query the expected result should be 3 (as one of the value is blank space), how do I check in my query to exclude the blank spaces?
EDIT :
DATA :
SubscriberId NewsletterIdCsv
------------ ---------------
11 52,52,,52
12 22,23
I need a cumulative SUM instead of just each row's sum, so for the data above I need just a final count, i.e. 5 in this case, excluding the blank space.
Here's one solution, although there may be a more efficient way:
SELECT A.[SubscriberId],
       SUM(CASE WHEN Split.a.value('.', 'VARCHAR(100)') = '' THEN 0 ELSE 1 END) cnt
FROM
(
    SELECT [SubscriberId],
           CAST ('<M>' + REPLACE(NewsletterIdCsv, ',', '</M><M>') + '</M>' AS XML) AS String
    FROM YourTable
) AS A
CROSS APPLY String.nodes ('/M') AS Split(a)
GROUP BY A.[SubscriberId]
And the SQL Fiddle.
Basically it converts your NewsletterIdCsv field to XML and then uses CROSS APPLY to split the data. Finally, it uses CASE to check whether each value is blank and SUM to count the non-blank values. Alternatively, you could probably build a UDF to do something similar.
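The split-then-count logic can be sketched in Python to check the expected numbers. This is an illustration rather than T-SQL; `count_non_blank` is a hypothetical name, and treating whitespace-only items as blank (via `strip`) is an assumption.

```python
def count_non_blank(csv_text):
    """Split on commas and count the items that are not blank."""
    return sum(1 for part in csv_text.split(",") if part.strip())


rows = {11: "52,52,,52", 12: "22,23"}

per_subscriber = {sid: count_non_blank(csv) for sid, csv in rows.items()}
grand_total = sum(per_subscriber.values())
print(per_subscriber, grand_total)
```

For the sample data this gives 3 for subscriber 11, 2 for subscriber 12, and a combined count of 5.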

Select data which have same letters

I'm having trouble with this SQL:
$sql = mysql_query("SELECT $menucompare ,
(COUNT($menucompare ) * 100 / (SELECT COUNT( $menucompare )
FROM data WHERE $ww = $button )) AS percentday FROM data WHERE $ww >0 ");
$menucompare is the name of whichever table field is selected; it contains the data below
$button is the selected week number (let's say week '6')
$ww is the name of the table field that holds the week number
For example, I have data in $menucompare like that:
123456bool
521478bool
122555heel
147788itoo
and I want to select those that have the same word at the end of the data and calculate the percentage.
The output should be like that:
bool -- 50% (2 entries)
heel -- 25% (1 entry)
itoo -- 25% (1 entry)
Any clarification of my SQL would be much appreciated.
I didn't find anything like that around.
Well, keeping data in such a format is probably not the best way; if possible, split the field into 2 separate ones.
First, you need to extract the string part from the end of the field.
if the length of the string / numeric parts is fixed, then it's quite easy;
if not, you should use a regular expression replace, which MySQL versions before 8.0 unfortunately do not provide by default. There's a solution, check this question: How to do a regular expression replace in MySQL?
I'll assume, that numeric part is fixed:
SELECT s.str, CAST(count(s.str) AS decimal) / t.cnt * 100 AS pct
FROM (SELECT substr(entry, 7) AS str FROM data) AS s
JOIN (SELECT count(*) AS cnt FROM data) AS t ON 1=1
GROUP BY s.str, t.cnt;
If you have a regexp_replace function, then substr(entry, 7) should be replaced with regexp_replace(entry, '^[0-9]*', '') to achieve the required result.
Variant with substr can be tested here.
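The two extraction options can be compared side by side in a short Python sketch. It only illustrates the string handling: the list `entries` is the sample data from the question, and the slice `e[6:]` stands in for the 1-based substr(entry, 7).

```python
import re

entries = ["123456bool", "521478bool", "122555heel", "147788itoo"]

# Fixed-width numeric prefix: skip the first six characters.
by_offset = [e[6:] for e in entries]

# Variable-width numeric prefix: strip any leading digits.
by_regex = [re.sub(r"^[0-9]*", "", e) for e in entries]

print(by_offset, by_regex)
```

On this sample both approaches extract the same suffixes; they would differ only if the numeric part varied in length.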
When sorting out problems like this, I would do it in two steps:
Sort out the SQL independently of the presentation language (PHP?).
Sort out the parameterization of the query and the presentation of the results after you know you've got the correct query.
Since this question is tagged 'SQL', I'm only going to address the first question.
The first step is to unclutter the query:
SELECT menucompare,
(COUNT(menucompare) * 100 / (SELECT COUNT(menucompare) FROM data WHERE ww = 6))
AS percentday
FROM data
WHERE ww > 0;
This removes the $ signs from most of the variable bits, and substitutes 6 for the button value. That makes it a bit easier to understand.
Your desired output seems to need the last four characters of the string held in menucompare for grouping and counting purposes.
The data to be aggregated would be selected by:
SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
The divisor in the percentage is the count of such rows, but the sub-stringing isn't necessary to count them, so we can write:
SELECT COUNT(*) FROM Data WHERE ww = 6
This is exactly what you have anyway.
The dividend in the percentage will be the group count of each substring.
SELECT Last4, COUNT(Last4) * 100.0 / (SELECT COUNT(*) FROM Data WHERE ww = 6)
FROM (SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
) AS Week6
GROUP BY Last4
ORDER BY Last4;
When you've demonstrated that this works, you can re-parameterize the query and deal with the presentation of the results.
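Before re-parameterizing, the expected percentages can be verified with a quick Python sketch of the same group-and-divide logic. The sample list and variable names are assumptions for illustration; all four sample rows are treated as belonging to the selected week.

```python
from collections import Counter

week6_entries = ["123456bool", "521478bool", "122555heel", "147788itoo"]

# Group by the last four characters, as SUBSTR(MenuCompare, -4) does.
suffix_counts = Counter(e[-4:] for e in week6_entries)
row_total = sum(suffix_counts.values())

# Each group's count as a percentage of all rows, ordered by suffix.
percentages = {s: c * 100.0 / row_total for s, c in sorted(suffix_counts.items())}
print(percentages)
```

This reproduces the 50% / 25% / 25% split from the question.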

MySQL String Comparison with Percent Output (Position Very Important)

I am trying to compare two entries of 6 digits, each of which can be either 0 or 1 (e.g. 100001 or 011101). If 3 out of 6 match, I want the output to be .5. If 2 out of 6 match, I want the output to be .33, etc.
Note that position matters. A match only occurs when both entries have a 1 in the first position, both have a 0 in the second position etc.
Here are the SQL commands to create the table
CREATE TABLE sim
(sim_key int,
string int);
INSERT INTO sim (sim_key, string)
VALUES (1, 111000);
INSERT INTO sim (sim_key, string)
VALUES (2, 101101);
My desired output is to compare the two strings, which share 50% of the characters, and output 50%.
Is it possible to do this sort of comparison in SQL? Thanks in advance
Have a look at this example.
CREATE TABLE sim (sim_key int, string int);
INSERT INTO sim (sim_key, string) VALUES (1, 111000);
INSERT INTO sim (sim_key, string) VALUES (2, 101101);
select a.string A, b.string B,
       sum(case when Substring(A.string,Pos,1) = Substring(B.string,Pos,1) then 1 else 0 end) Matches,
       count(*) as RowCount,
       (sum(case when Substring(A.string,Pos,1) = Substring(B.string,Pos,1) then 1 else 0 end) /
        count(*) * 100.0) as PercentMatch
from sim A
cross join sim B
inner join (
    select 1 Pos union all select 2 union all select 3
    union all select 4 union all select 5 union all select 6) P
  on P.Pos between 1 and length(A.string)
where A.sim_key= 1 and B.sim_key = 2
group by a.string, b.string
It is crude and probably includes more than required, but it shows how it can be done. It is better to create a numbers table with just the numbers from 1 to 1000 or so, which can be reused in any query that needs a number sequence. Such a table would replace the (select ... union) virtual table used in the inner join.
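The position-by-position comparison that the query performs can be sketched in Python to confirm the expected result. The function name `percent_match` is hypothetical, and the values are handled as strings here rather than as the int column used in the table.

```python
def percent_match(a, b):
    """Percentage of positions at which two equal-length strings agree."""
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches * 100.0 / len(a)


result = percent_match("111000", "101101")
print(result)
```

For the two sample rows, positions 1, 3 and 5 agree, so the result is 50%.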
Instead of storing a value like 10010101 as a decimal integer, convert the binary string to a true integer. To compare, apply bitwise AND, convert the result back to binary, and count the '1' digits to see how many match...
for convert: http://dev.mysql.com/doc/refman/5.5/en/binary-varbinary.html
for compare: http://dev.mysql.com/doc/refman/5.5/en/bit-functions.html bitwise AND
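A caveat worth checking: bitwise AND only counts positions where both values have a 1. The Python sketch below swaps in XOR, which is not what the answer above proposes, so that positions where both values have a 0 also count as matches, as the question requires. The function name `bit_matches` is an assumption for illustration.

```python
def bit_matches(a_bits, b_bits):
    """Count positions where two equal-length binary strings agree, using XOR."""
    width = len(a_bits)
    a, b = int(a_bits, 2), int(b_bits, 2)
    # XOR sets a bit wherever the inputs differ; the rest are matches.
    differing = bin(a ^ b).count("1")
    return width - differing


# AND alone counts only the positions where both strings have a 1.
shared_ones = bin(int("111000", 2) & int("101101", 2)).count("1")
agreeing = bit_matches("111000", "101101")
print(shared_ones, agreeing)
```

For the sample rows AND finds 2 shared ones, while the XOR-based count finds the 3 agreeing positions that give 50%.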
...