MySQL String Comparison with Percent Output (Position Very Important - mysql

I am trying to compare two entries of 6 numbers, each number which can either can be zero or 1 (i.e 100001 or 011101). If 3 out of 6 match, I want the output to be .5. If 2 out of 6 match, i want the output to be .33 etc.
Note that position matters. A match only occurs when both entries have a 1 in the first position, both have a 0 in the second position etc.
Here are the SQL commands to create the table
CREATE TABLE sim
(sim_key int,
string int);
INSERT INTO sim (sim_key, string)
VALUES (1, 111000);
INSERT INTO sim (sim_key, string)
VALUES (2, 101101);
My desired output to compare the two strings, which share 50% of the characters, and output 50%.
Is it possible to do this sort of comparison in SQL? Thanks in advance

Have a look at this example.
CREATE TABLE sim (sim_key int, string int);
INSERT INTO sim (sim_key, string) VALUES (1, 111000);
INSERT INTO sim (sim_key, string) VALUES (2, 101101);
select a.string A, b.string B,
sum(case when Substring(A.string,Pos,1) = Substring(B.string,Pos,1) then 1 else 0 end) Matches,
count(*) as RowCount,
(sum(case when Substring(A.string,Pos,1) = Substring(B.string,Pos,1) then 1 else 0 end) /
count(*) * 100.0) as PercentMatch
from sim A
cross join sim B
inner join (
select 1 Pos union all select 2 union all select 3
union all select 4 union all select 5 union all select 6) P
on P.Pos between 1 and length(A.string)
where A.sim_key= 1 and B.sim_key = 2
group by a.string, b.string
It is crude and probably included more than required but shows how it can be done. It is better to create a numbers table with just numbers from 1 to 1000 or so, that can be used repeatedly in many queries where a number sequence is required. Such a table will replace the (select .. union virtual table used in the inner join)

Instead of keeping 10010101 as integer convert this binary version to true integer when compare use bit logic AND, result convert to binary and count '1' to how many match...
for convert: http://dev.mysql.com/doc/refman/5.5/en/binary-varbinary.html
for compare: http://dev.mysql.com/doc/refman/5.5/en/bit-functions.html bitwise AND
...

Related

SQL query statement Self Join?

new to SQL.
I have the following set of data
A X Y Z
1 Wind 1 1
2 Wind 2 1
3 Hail 1 1
4 Flood 1 1
4 Rain 1 1
4 Fire 1 1
I would like to select all distinct 'A' fields where for all rows that contain A have flood and rain.
So in this example, the query would return only the number 4 since for the set of all rows that contain A = 4 we have Flood and Rain.
I need the values of A where for a given value 'a' in A, there exists rows with 'a' that must contain all of the following fields provided (in the example Flood and Rain).
Please let me know if you need further clarification.
I need the values of A where for a given value 'a' in A, there exists rows with 'a' that must contain all of the following fields provided (in the example Flood and Rain).
You can use aggregation, and filter with a having clause:
select a
from mytable t
where x in ('Flood', 'Rain') -- either one or the other
having count(*) = 2 -- both match
If tuples (a, x) tuples are not unique, then you want having count(distinct x) = 2 instead.
You Shooud use count(distinct X) group by A and having
count(distinct...) avoid situation where you have two time the same value for X
select A
from my_table
WHERE x in ('Flood', 'Rain')
group A
having count(distinct X) = 2

mySQL count occurances of value on multiple fields. How?

I have a table with 5 fields. Each field can store a number from 1 - 59.
Similar to countif in Excel, how do I count the number of times a number from 1 - 59 shows up in all 5 fields?
Here's an example for the count of occurances for the number 1 in all five fields:
SELECT SUM(pick_1 = 1 OR pick_2 = 1 OR pick_3 = 1 OR pick_4 = 1 OR pick_5 = 1) AS total_count_1
FROM tbldraw
Hopefully I made sense.
There was an answer here that had a solution. I think this is just a variation.
Step1: Create a numbers table (1 field, called id, 59 records (values 1 -59))
Step2:
SELECT numbers_table.number as number
, COUNT(tbldraw.pk_record)
FROM numbers_table
LEFT JOIN tbldraw
ON numbers_table.number = tbldraw.pick_1
OR numbers_table.number = tbldraw.pick_2
OR numbers_table.number = tbldraw.pick_3
OR numbers_table.number = tbldraw.pick_4
OR numbers_table.number = tbldraw.pick_5
GROUP BY number
ORDER BY number
How about a two step process? Assuming a table called summary_table ( int id, int ttl), for each number you care about...
insert into summary_table values (1,
(select count(*)
from table
where field1 = 1 or field2 = 1 or field3 = 1 or field4 = 1 or field5 = 1))
do that 59 times, once for each value. You can use a loop in most cases. Then you can select from the summary_table
select *
from summary_table
order by id
That will do it. I leave the coversion of this SQL into a stored procedure for those that know what database is in use.
The ALL() function, which returns true if the preceding operator is true for all parameters, makes the query particularly elegant and succinct.
To find the count a particular number (eg 3):
select count(*)
from tbldraw
where 3 = all (pick_1, pick_2, pick_3, pick_4, pick_5)
To find the count of all such numbers:
select pick_1, count(*)
from tbldraw
where pick_1 = all (pick_2, pick_3, pick_4, pick_5)
group by pick_1

Populate column with number of substrings in another column

I have two tables "A" and "B". Table "A" has two columns "Body" and "Number." The column "Number" is empty, the purpose is to populate it.
Table A: Body / Number
ABABCDEF /
IJKLMNOP /
QRSTUVWKYZ /
Table "B" only has one column:
Table B: Values
AB
CD
QR
Here is what I am looking for as a result:
ABABCDEF / 3
IJKLMNOP / 0
QRSTUVWKYZ / 1
In other words, I want to create a query that looks up, for each string in the "Body" column, how many times the substrings in the "Values" column appear.
How would you advise me to do that?
Here's the finished query; explanation will follow:
SELECT
Body,
SUM(
CASE WHEN Value IS NULL THEN 0
ELSE (LENGTH(Body) - LENGTH(REPLACE(Body, Value, ''))) / LENGTH(Value)
END
) AS Val
FROM (
SELECT TableA.Body, TableB.Value
FROM TableA
LEFT JOIN TableB ON INSTR(TableA.Body, TableB.Value) > 0
) CharMatch
GROUP BY Body
There's a SQL Fiddle here.
Now for the explanation...
The inner query matches TableA strings with TableB substrings:
SELECT TableA.Body, TableB.Value
FROM TableA
LEFT JOIN TableB ON INSTR(TableA.Body, TableB.Value) > 0
Its results are:
BODY VALUE
-------------------- -----
ABABCDEF AB
ABABCDEF CD
IJKLMNOP
QRSTUVWKYZ QR
If you just count these you'll only get a value of 2 for the ABABCDEF string because it just looks for the existence of the substrings and doesn't take into consideration that AB occurs twice.
MySQL doesn't appear to have an OCCURS type function, so to count the occurrences I used the workaround of comparing the length of the string to its length with the target string removed, divided by the length of the target string. Here's an explanation:
REPLACE('ABABCDEF', 'AB', '') ==> 'CDEF'
LENGTH('ABABCDEF') ==> 8
LENGTH('CDEF') ==> 4
So the length of the string with all AB occurrences removed is 8 - 4, or 4. Divide the 4 by 2 (LENGTH('AB')) to get the number of AB occurrences: 2
String IJKLMNOP will mess this up. It doesn't have any of the target values so there's a divide by zero risk. The CASE inside the SUM protects against this.
You want an update query:
update A
set cnt = (select sum((length(a.body) - length(replace(a.body, b.value, '')) / length(b.value))
from b
)
This uses a little trick for counting the number of occurrence of b.value in a given string. It replaces each occurrence with an empty string and counts the difference in length of the strings. This is divided by the length of the string being replaced.
If you just wanted the number of matches (so the first value would be "2" instead of "3"):
update A
set cnt = (select count(*)
from b
where a.body like concat('%', b.value, '%')
)

how to search for a given sequence of rows within a table in SQL Server 2008

The problem:
We have a number of entries within a table but we are only interested in the ones that appear in a given sequence. For example we are looking for three specific "GFTitle" entries ('Pearson Grafton','Woolworths (P and O)','QRX - Brisbane'), however they have to appear in a particular order to be considered a valid route. (See image below)
RowNum GFTitle
------------------------------
1 Pearson Grafton
2 Woolworths (P and O)
3 QRX - Brisbane
4 Pearson Grafton
5 Woolworths (P and O)
6 Pearson Grafton
7 QRX - Brisbane
8 Pearson Grafton
9 Pearson Grafton
So rows (1,2,3) satisfy this rule but rows (4,5,6) don't even though the first two entries (4,5) do.
I am sure there is a way to do this via CTE's but some help would be great.
Cheers
This is very simple using even good old tools :-) Try this quick-and-dirty solution, assuming your table name is GFTitles and RowNumber values are sequential:
SELECT a.[RowNum]
,a.[GFTitle]
,b.[GFTitle]
,c.[GFTitle]
FROM [dbo].[GFTitles] as a
join [dbo].[GFTitles] as b on b.RowNumber = a.RowNumber + 1
join [dbo].[GFTitles] as c on c.RowNumber = a.RowNumber + 2
WHERE a.[GFTitle] = 'Pearson Grafton' and
b.[GFTitle] = 'Woolworths (P and O)' and
c.[GFTitle] = 'QRX - Brisbane'
Assuming RowNum has neither duplicates nor gaps, you could try the following method.
Assign row numbers to the sought sequence's items and join the row set to your table on GFTitle.
For every match, calculate the difference between your table's row number and that of the sequence. If there's a matching sequence in your table, the corresponding rows' RowNum differences will be identical.
Count the rows per difference and return only those where the count matches the number of sequence items.
Here's a query that implements the above logic:
WITH SoughtSequence AS (
SELECT *
FROM (
VALUES
(1, 'Pearson Grafton'),
(2, 'Woolworths (P and O)'),
(3, 'QRX - Brisbane')
) x (RowNum, GFTitle)
)
, joined AS (
SELECT
t.*,
SequenceLength = COUNT(*) OVER (PARTITION BY t.RowNum - ss.RowNum)
FROM atable t
INNER JOIN SoughtSequence ss
ON t.GFTitle = ss.GFTitle
)
SELECT
RowNum,
GFTitle
FROM joined
WHERE SequenceLength = (SELECT COUNT(*) FROM SoughtSequence)
;
You can try it at SQL Fiddle too.

mysql distribution of combinations/values

I have a mysql table which contains some random combination of numbers. For simplicity take the following table as example:
index|n1|n2|n3
1 1 2 3
2 4 10 32
3 3 10 4
4 35 1 2
5 27 1 3
etc
What I want to find out is the number of times a combination has occured in the table. For instance, how many times has the combination of 4 10 or 1 2 or 1 2 3 or 3 10 4 etc occured.
Do I have to create another table that contains all possible combinations and do comparison from there or is there another way to do this?
For a single combination, this is easy:
SELECT COUNT(*)
FROM my_table
WHERE n1 = 3 AND n2 = 10 AND n3 = 4
If you want to do this with multiple combinations, you could create a (temporary) table of them and join that table with you data, something like this:
CREATE TEMPORARY TABLE combinations (
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
n1 INTEGER, n2 INTEGER, n3 INTEGER
);
INSERT INTO combinations (n1, n2, n3) VALUES
(1, 2, NULL), (4, 10, NULL), (1, 2, 3), (3, 10, 4);
SELECT c.n1, c.n2, c.n3, COUNT(t.id) AS num
FROM combinations AS c
LEFT JOIN my_table AS t
ON (c.n1 = t.n1 OR c.n1 IS NULL)
AND (c.n2 = t.n2 OR c.n2 IS NULL)
AND (c.n3 = t.n3 OR c.n3 IS NULL)
GROUP BY c.id;
(demo on SQLize)
Note that this query as written is not very efficient due to the OR c.n? IS NULL clauses, which MySQL isn't smart enough to optimize. If all your combinations contain the same number of terms, you can leave those out, which will allow the query to make use of indexes on the data table.
Ps. With the query above, the combination (1, 2, NULL) won't match (35, 1, 2). However, (NULL, 1, 2) will, so, if you want both, a simple workaround would be to just include both patterns in your table of combinations.
If you actually have many more columns than shown in your example, and you want to match patterns that occur in any set of consecutive columns, then your really should pack your columns into a string and use a LIKE or REGEXP query. For example, if you concatenate all your data columns into a comma-separated string in a column named data, you could search it like this:
INSERT INTO combinations (pattern) VALUES
('1,2'), ('4,10'), ('1,2,3'), ('3,10,4'), ('7,8,9');
SELECT c.pattern, COUNT(t.id) AS num
FROM combinations AS c
LEFT JOIN my_table AS t
ON CONCAT(',', t.data, ',') LIKE CONCAT('%,', c.pattern, ',%')
GROUP BY c.id;
(demo on SQLize)
You could make this query somewhat faster by making the prefixes and suffixes added with CONCAT() part of the actual data in the tables, but this is still going to be a fairly inefficient query if you have a lot of data to search, because it cannot make use of indexes. If you need to do this kind of substring searching on large datasets efficiently, you may want to use something better suited for than specific purpose than MySQL.
You only have three columns in the table, so you are looking for combinations of 1, 2, and 3 elements.
For simplicity, I'll start with the following table:
select index, n1 as n from t union all
select index, n2 from t union all
select index, n3 from t union all
select distinct index, -1 from t union all
select distinct index, -2 from t
Let's call this "values". Now, we want to get all triples from this table for a given index. In this case, -1 and -2 represent NULL.
select (case when v1.n < 0 then NULL else v1.n end) as n1,
(case when v2.n < 0 then NULL else v2.n end) as n2,
(case when v3.n < 0 then NULL else v3.n end) as n3,
count(*) as NumOccurrences
from values v1 join
values v2
on v1.n < v2.n and v1.index = v2.index join
values v3
on v2.n < v3.n and v2.index = v3.index
This is using the join mechanism to generate the combinations.
This method finds all combinations regardless of ordering (so 1, 2, 3 is the same as 2, 3, 1). Also, this ignores duplicates, so it cannot find (1, 2, 2) if 2 is repeated twice.
SELECT
CONCAT(CAST(n1 AS VARCHAR(10)),'|',CAST(n2 AS VARCHAR(10)),'|',CAST(n3 AS VARCHAR(10))) AS Combination,
COUNT(CONCAT(CAST(n1 AS VARCHAR(10)),'|',CAST(n2 AS VARCHAR(10)),'|',CAST(n3 AS VARCHAR(10)))) AS Occurrences
FROM
MyTable
GROUP BY
CONCAT(CAST(n1 AS VARCHAR(10)),'|',CAST(n2 AS VARCHAR(10)),'|',CAST(n3 AS VARCHAR(10)))
This creates a single column that represents the combination of the values within the 3 columns by concatenating the values. It will count the occurrences of each.