Finding matches between 30,000+ data sets

For simplicity's sake, let's use an example of 3 colors with corresponding numbers (in real life there are actually 30,000+ different 'colors' and 254 different 'numbers'):
Red - 0, 1, 2, 3, 10, 15
Green - 0, 2, 3, 20
Blue - 2, 10, 11, 12
I want to find the matches between them (rgb, rg, rb), as well as keep a tally of how many numbers are shared within each set:
rgb = 1
rg = 2
rb = 2
Finally, it needs to determine the ratio of the number of shared numbers to the number of distinct numbers in the set.
rgb = 1/9 (since the distinct numbers are 0, 1, 2, 3, 10, 11, 12, 15, 20)
rg = 2/7 (0, 1, 2, 3, 10, 15, 20)
rb = 2/8 (0, 1, 2, 3, 10, 11, 12, 15)
So the total output would be
match | # of matches | % |
rgb | 1 | 1/9
rg | 2 | 2/7
rb | 2 | 2/8
The algorithm I was able to come up with is to store each color in a table and map its associated numbers to it (e.g. table name red, data 0, 1, 2, 3, 10, 15). Then take the color with the most 'numbers', compare it to every other color's numbers, and find the matches. Once done with that color, you can ignore it completely, move on to the next color, and do the comparisons against the remaining n-1 colors.
Take the example:
1) Select red
2) Does any other color share 0
3) Does any other color share 1
....etc
4) Select blue
5) Does any other color minus red share .....
I know that there has to be a more efficient way to do this, any suggestions?
Thanks for the help.

As there are only 254 possible 'numbers' (or 255 if your 0-254 comment is correct), you can express the set of 'numbers' for each 'colour' as a 256-bit integer. The number of shared numbers for r and g is then just the bitcount of (r and g), and the number of distinct numbers is the bitcount of (r or g). So, using your example,
if R is the bit set for red, G is the bit set for green, etc.:
match | # of matches | % |
rgb | bitcount(R and G and B) | bitcount(R and G and B)/bitcount(R or G or B) |
rg | bitcount(R and G) | bitcount(R and G)/bitcount(R or G) |
rb | bitcount(R and B) | bitcount(R and B)/bitcount(R or B) |
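A minimal Python sketch of this bit-set idea, using the sample data from the question. Python integers have arbitrary width, so a plain int stands in for the 256-bit integer here; the helper names (to_bitset, bitcount, overlap) are made up for illustration. Note that with the sample data as written, red and green share 0, 2 and 3, so this computes 3 shared numbers for rg rather than the 2 listed in the question.

```python
from itertools import combinations


def to_bitset(numbers):
    """Pack small non-negative integers into one int, one bit per number."""
    bits = 0
    for n in numbers:
        bits |= 1 << n
    return bits


def bitcount(x):
    """Count the set bits of a non-negative int."""
    return bin(x).count("1")


def overlap(bitsets):
    """Return (shared, distinct) counts for a list of bit sets."""
    shared = distinct = bitsets[0]
    for b in bitsets[1:]:
        shared &= b      # numbers present in every set so far
        distinct |= b    # numbers present in at least one set
    return bitcount(shared), bitcount(distinct)


# Sample data from the question.
colors = {
    "r": to_bitset([0, 1, 2, 3, 10, 15]),
    "g": to_bitset([0, 2, 3, 20]),
    "b": to_bitset([2, 10, 11, 12]),
}

# Every pair and the one triple of colours.
results = {}
for size in (2, 3):
    for group in combinations("rgb", size):
        name = "".join(group)
        results[name] = overlap([colors[c] for c in group])

for name, (shared, distinct) in results.items():
    print(f"{name} | {shared} | {shared}/{distinct}")
```

Each comparison is then a couple of bitwise operations plus a bit count, instead of a number-by-number lookup.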

Related

How to find the next sequence in binary arithmetically?

https://i.stack.imgur.com/Dhln9.png
Given the array data structure above, with base address 010 (base 2) at position 0, how do you find the next address in the binary sequence in order to find the memory address of a specific position? I don't understand how position 1 = 011 (base 2), 2 = 100 (base 2), 3 = 101 (base 2), etc.

the base address is 010 base2 ... I don't know how 1 = 011 base 2, 2 = 100 base 2, 3 = 101 base 2, etc.
It appears there's just an offset of +2 to what's otherwise an ordinary sequence.
binary 000, 001, 010, 011... + decimal 2 = binary 010, 011, 100, 101...
Just add 2 before you convert to binary and you should have the right location.
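As a small illustration of that offset, here is a Python sketch. It assumes each position occupies exactly one address unit, which is what the sequence in the question suggests; the names BASE_ADDRESS and element_address are made up for this example.

```python
BASE_ADDRESS = 0b010  # base address from the question (decimal 2)


def element_address(index, base=BASE_ADDRESS, width=3):
    """Return the address of position `index` as a zero-padded binary string."""
    return format(base + index, f"0{width}b")


addresses = [element_address(i) for i in range(4)]
print(addresses)
```

For positions 0 to 3 this gives 010, 011, 100, 101, matching the sequence in the question.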

SSRS Case Statement

I have created a table in SSRS of various grades, but I need to look at the value of 2 grades together and assign it an alternate grade. I cannot put this as a CASE in the SELECT because of the way the database is designed: the values are stored not in one row but in multiple rows, so it cannot combine the data in a new column. For example, this is how one student's grades are represented in the DB:
634 Attainment *#1#2#3#4#N/A NULL 1 2
636 Effort A*#A#B#C#N/A NULL A 2
637 Focus EX#ME#WB#N/A NULL EX 1
638 Participation EX#ME#WB#N/A NULL ME 2
639 Groupwork EX#ME#WB#N/A NULL ME 2
640 Rigour EX#ME#WB#N/A NULL ME 2
641 Curiosity EX#ME#WB#N/A NULL ME 2
642 Initiative EX#ME#WB#N/A NULL ME 2
643 Self Organisation EX#ME#WB#N/A NULL ME 2
644 Perseverance EX#ME#WB#N/A NULL ME 2
I have created a table that groups the grades by pupil ID, so each pupil is now represented as one row with a column heading for each grade (Effort, Focus, etc.).
I have tried to do a sum using the ReportItems!Textbox1.Value but I can't use this method as it is not an aggregate function. What I wanted to do was
IF (ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 2) THEN 5
Is there a way to do this?
ADDITIONAL:
I have just tried:
=SWITCH(ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 2, 5,
ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 3, 4,
ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 4, 3,
ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 5, 2,
ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 6, 1,
ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 7, 0,
"NULL"
)
This is returning an Error.

I finally resolved this by using this expression:
=IIF(ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 2, 5,
IIF(ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 3, 4,
IIF(ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 4, 3,
IIF(ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 5, 2,
IIF(ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 6, 1,
IIF(ReportItems!Textbox104.Value + ReportItems!Textbox105.Value = 7, 0,
"NULL"))))))

Compare the digits of two integers in each decimal position

I am not sure I am describing the problem using the correct terms; my math English is not that good.
What I need to do is compare two integers digit by digit, based on the position of each digit (ones, tens, etc.).
For example, see the following table of numbers and the desired comparison result:
number1 | number2 | desired result
-----------------------------------
100 | 101 | 001
443 | 143 | 300
7001 | 8000 | 1001
6001 | 8000 | 2001
19 | 09 | 10
Basically I need the absolute value of subtraction for each digit alone. So for the first example:
1 0 0
1 0 1 -
--------
0 0 1
And second:
4 4 3
1 4 3 -
-------
3 0 0
And third:
7 0 0 1
8 0 0 0 -
---------
1 0 0 1
This needs to be done in mysql. Any ideas please?

This should do the job if your numbers are below 10000.
If they exceed, simply modify the query ;)
SELECT number1,
number2,
REVERSE(CONCAT(ABS(SUBSTRING(REVERSE(number1), 1, 1) - SUBSTRING(REVERSE(number2), 1, 1)),
IF(CHAR_LENGTH(number1) > 1, ABS(SUBSTRING(REVERSE(number1), 2, 1) - SUBSTRING(REVERSE(number2), 2, 1)), ''),
IF(CHAR_LENGTH(number1) > 2, ABS(SUBSTRING(REVERSE(number1), 3, 1) - SUBSTRING(REVERSE(number2), 3, 1)), ''),
IF(CHAR_LENGTH(number1) > 3, ABS(SUBSTRING(REVERSE(number1), 4, 1) - SUBSTRING(REVERSE(number2), 4, 1)), ''))) as `desired result`
FROM numbers
For 3-digit numbers:
SELECT number1,
number2,
CONCAT(
ABS(SUBSTRING(number1, 1, 1) - SUBSTRING(number2, 1,1)),
ABS(SUBSTRING(number1, 2, 1) - SUBSTRING(number2, 2,1)),
ABS(SUBSTRING(number1, 3, 1) - SUBSTRING(number2, 3,1))
)
FROM numbers
Actually, you don't have to reverse the string at all; that came from a more mathematical approach I tried before ;)

If you want to do it with integers only, it can be done this way (for 5 digits as an example; DIV is MySQL's integer division, since / would return a fractional result):
select abs(number1 div 10000 - number2 div 10000) * 10000 +
       abs(number1 div 1000 % 10 - number2 div 1000 % 10) * 1000 +
       abs(number1 div 100 % 10 - number2 div 100 % 10) * 100 +
       abs(number1 div 10 % 10 - number2 div 10 % 10) * 10 +
       abs(number1 % 10 - number2 % 10) as `desired result`
from numbers
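To check the expected results outside the database, the same per-digit rule can be sketched in Python. This is only an illustration of the rule, not a replacement for the MySQL queries; it assumes non-negative integers, and the function name digitwise_abs_diff is made up here. The shorter number is left-padded with zeros so that ones line up with ones, tens with tens, and so on.

```python
def digitwise_abs_diff(number1, number2):
    """Absolute difference of two non-negative ints, taken digit by digit."""
    width = max(len(str(number1)), len(str(number2)))
    digits1 = str(number1).zfill(width)  # pad so positions line up
    digits2 = str(number2).zfill(width)
    return "".join(
        str(abs(int(d1) - int(d2))) for d1, d2 in zip(digits1, digits2)
    )


# Rows from the table in the question.
examples = [(100, 101), (443, 143), (7001, 8000), (6001, 8000), (19, 9)]
for n1, n2 in examples:
    print(n1, n2, digitwise_abs_diff(n1, n2))
```

The result is returned as a string so that leading zeros such as 001 are preserved.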

how to define dynamic nested loop python function

a = [1]
b = [2,3]
c = [4,5,6]
d = [a,b,c]
for x0 in d[0]:
    for x1 in d[1]:
        for x2 in d[2]:
            print(x0,x1,x2)
Result:
1 2 4
1 2 5
1 2 6
1 3 4
1 3 5
1 3 6
Perfect. Now my question is how to turn this into a function, considering of course that there could be more lists of values. The idea is to get a function that dynamically produces the same result.
Is there a way to explain to python: "do 8 nested loops for example"?

You can use itertools to calculate the products for you and can use the * operator to convert your list into arguments for the itertools.product() function.
import itertools
a = [1]
b = [2,3]
c = [4,5,6]
args = [a,b,c]
for combination in itertools.product(*args):
    print(combination)
Output is
(1, 2, 4)
(1, 2, 5)
(1, 2, 6)
(1, 3, 4)
(1, 3, 5)
(1, 3, 6)
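For comparison, here is a sketch of how "n nested loops" can be written by hand with recursion, which is roughly what itertools.product does for you. The function name nested_product is made up for this example; in practice itertools.product is the simpler choice.

```python
import itertools


def nested_product(lists, prefix=()):
    """Yield one tuple per combination, like len(lists) nested for loops."""
    if not lists:
        # No loops left: the prefix is one complete combination.
        yield prefix
        return
    for value in lists[0]:
        # Each value of the outermost list drives the remaining inner loops.
        yield from nested_product(lists[1:], prefix + (value,))


d = [[1], [2, 3], [4, 5, 6]]
result = list(nested_product(d))
for combination in result:
    print(*combination)

# Same combinations, in the same order, as itertools.product.
same_as_itertools = result == list(itertools.product(*d))
```

The function works for any number of lists, so eight lists behave like eight nested loops.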

How to multiply two rows or columns?

a = [1, 2, 3];
b = [3, 2, 1];
c = a * b;
yields
error: operator *: nonconformant arguments (op1 is 1x3, op2 is 1x3)
Why can I not multiply these two rows of the same size?
I shouldn't have to run a for loop for this, but I don't know of another way...
I saw section 1.2.3 here, which indicates (to me at least) that I should be able to do it.

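You made two row vectors, which can't be matrix-multiplied together.
The general form of matrix multiplication is "row dot column": take the dot product of each row of the first matrix with each column of the second. That requires the number of columns of the first (3 here) to equal the number of rows of the second (1 here), which is not the case. Transposing a gives a 3x1 times 1x3 product, which works:
a = [1, 2, 3];
b = [3, 2, 1];
c = a' * b
c =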
3 2 1
6 4 2
9 6 3

I see now that there is a .* operator. I did not know where to find that in the documentation, and it does what I want.
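To make the difference between these products concrete, here they are written out in plain Python for the same two rows. This is only an illustration of what the Octave operators compute, not Octave code; the variable names are made up for this example.

```python
a = [1, 2, 3]
b = [3, 2, 1]

# Element-wise product, what a .* b computes.
elementwise = [x * y for x, y in zip(a, b)]

# Dot product (row times column), what a * b' computes.
dot = sum(elementwise)

# Outer product (column times row), what a' * b computes.
outer = [[x * y for y in b] for x in a]

print(elementwise)
print(dot)
print(outer)
```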

We Keep Coding
