SQL Query for exact match in many to many relation - mysql

I have the following tables (only the required attributes are listed):
medicine (id, name),
generic (id, name),
med_gen (med_id references medicine(id),gen_id references generic(id), potency)
Sample Data
medicine
(1, 'Crocin')
(2, 'Stamlo')
(3, 'NT Kuf')
generic
(1, 'Hexachlorodine')
(2, 'Methyl Benzoate')
med_gen
(1, 1, '100mg')
(1, 2, '50ml')
(2, 1, '100mg')
(2, 2, '60ml')
(3, 1, '100mg')
(3, 2, '50ml')
I want all the medicines which are equivalent to a given medicine. Two medicines are equivalent if they have the same generics and the same potency for each generic. In the above sample data, all three have the same generics, but only 1 and 3 also have the same potency for the corresponding generics. So 1 and 3 are equivalent medicines.
I want to find the equivalent medicines given a medicine id.
NOTE: One medicine may have any number of generics. The medicine table has around 102000 records, the generic table around 2200 and the potency table around 200000 records, so performance is a key point.
NOTE 2: The database used is MySQL.

One way to do it in MySQL is to leverage the GROUP_CONCAT() function:
SELECT g.med_id
FROM
(
-- pair each generic with its potency so that the two cannot be matched independently
SELECT med_id, GROUP_CONCAT(CONCAT(gen_id, ':', potency) ORDER BY gen_id, potency) signature
FROM med_gen
WHERE med_id = 1 -- here 1 is med_id for which you're trying to find analogs
GROUP BY med_id
) o JOIN
(
SELECT med_id, GROUP_CONCAT(CONCAT(gen_id, ':', potency) ORDER BY gen_id, potency) signature
FROM med_gen
WHERE med_id <> 1 -- here 1 is med_id for which you're trying to find analogs
GROUP BY med_id
) g
ON o.signature = g.signature
Output:
| MED_ID |
|--------|
| 3 |
Here is SQLFiddle demo
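An alternative that avoids building strings (GROUP_CONCAT() output is truncated at group_concat_max_len, 1024 bytes by default) is to count matching rows. The following is only a sketch, run against an in-memory SQLite copy of the sample data rather than MySQL. The function name equivalent_medicines and the assumption that (med_id, gen_id) is unique are mine, not part of the question.

```python
import sqlite3


def equivalent_medicines(conn, target_id):
    """Return ids of medicines whose (gen_id, potency) set equals the target's.

    Assumes (med_id, gen_id) is unique in med_gen.
    """
    sql = """
        SELECT m.med_id
        FROM med_gen AS m
        JOIN med_gen AS t
          ON t.med_id = :target
         AND t.gen_id = m.gen_id
         AND t.potency = m.potency
        WHERE m.med_id <> :target
        GROUP BY m.med_id
        -- every row of the target was matched ...
        HAVING COUNT(*) = (SELECT COUNT(*) FROM med_gen WHERE med_id = :target)
        -- ... and the candidate has no extra rows of its own
           AND COUNT(*) = (SELECT COUNT(*) FROM med_gen AS c WHERE c.med_id = m.med_id)
        ORDER BY m.med_id
    """
    return [row[0] for row in conn.execute(sql, {"target": target_id})]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE med_gen (med_id INT, gen_id INT, potency TEXT, "
    "PRIMARY KEY (med_id, gen_id))"
)
conn.executemany(
    "INSERT INTO med_gen VALUES (?, ?, ?)",
    [
        (1, 1, "100mg"), (1, 2, "50ml"),
        (2, 1, "100mg"), (2, 2, "60ml"),
        (3, 1, "100mg"), (3, 2, "50ml"),
    ],
)

print(equivalent_medicines(conn, 1))  # prints [3]
```

The SQL itself uses nothing SQLite-specific, so it should carry over to MySQL with the named parameter replaced by the medicine id. An index on (gen_id, potency, med_id) would probably help at the table sizes mentioned, but that is untested here.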


SSRS - Lookup only on certain columns in a matrix

I have a matrix table with a column group "Application questions" let's say these are in table 1. Some of the questions have unique string values such as: Name, ID number, email address. But others have an integer value that relates to an actual value for a separate lookup table (table 2), for example, the values for the column "Gender" are 1, 2, 3, for Male, Female, Other. Is there a way in the lookup function that I can isolate the columns that only have integer values or alternatively ignore the other columns with unique string values?
Table1
NAME ATTRIBUTE_id ATTRIBUTE
-----------------------------------------
James 5 1
James 6 james@email.com
James 7 8
Table2
Lookup_id ATTRIBUTE_id Description
-----------------------------------------
1 5 Male
2 5 Female
3 5 Other
8 7 New York
9 7 Los Angeles
Output
NAME | Email | Gender | City
-------------------------------------------------------
James james@email.com Male New York
Hope that makes sense!
Thank you.
I think this will be easier to do in your dataset query.
Below I have recreated your sample data and added an extra person in to make sure it's working as expected.
DECLARE @t TABLE (Name varchar(10), AttributeID INT, AttributeMemberID varchar(50))
INSERT INTO @t VALUES
('Mary', 5, '2'),
('Mary', 6, 'Mary@email.com'),
('James', 5, '1'),
('James', 6, 'james@email.com'),
('James', 7, '8')
DECLARE @AttributeMembers TABLE (AttributeMemberID INT, AttributeID int, Description varchar(20))
INSERT INTO @AttributeMembers VALUES
(1, 5, 'Male'),
(2, 5, 'Female'),
(3, 5, 'Other'),
(8, 7, 'New York'),
(9, 7, 'Los Angeles')
I also added a new table which describes what each attribute is. We will use the output from this as column headers in the final SSRS matrix.
DECLARE @Attributes TABLE(AttributeID int, Caption varchar(50))
INSERT INTO @Attributes VALUES
(5, 'Gender'),
(6, 'Email'),
(7, 'City')
Finally we join all three together and get a fairly normalised view of the data. The join is a bit messy, as your current tables use the same column for both integer-based lookups/joins and absolute string values. Hence the CASE in the JOIN:
SELECT
t.Name,
a.Caption,
ISNULL(am.[Description], t.AttributeMemberID) as Label
FROM @t t
JOIN @Attributes a on t.AttributeID = a.AttributeID
LEFT JOIN @AttributeMembers am
on t.AttributeID = am.AttributeID
and
CAST(CASE WHEN ISNUMERIC(t.AttributeMemberID) = 0 THEN 0 ELSE t.AttributeMemberID END as int)
= am.AttributeMemberID
ORDER BY Name, Caption, Label
This gives us the following output...
As you can see, this will be easy to put into a Matrix control in SSRS.
Row group by Name, column group by Caption, and the data cell would be Label.
If you wanted to ensure the order of the columns, you could extend the Attributes table to include a SortOrder column, include this in the query output and use this in SSRS to order the columns by.
Hope that's clear enough.

Using Count and Group By on table field with "-"

I am running a query that currently counts species from the animals table, but I am not getting the desired result with the query listed below. Currently COUNT counts each distinct specie value, which is composed of two words, type and breed (e.g. dog-pitbull), so the query returns 1 for every entry. How could I group the results and count by dog, cat, bird, etc., disregarding breed? SQLFIDDLE
Query
SELECT specie, COUNT(*) as Total FROM animals GROUP BY specie;
Schema
CREATE TABLE animals
(`id` int, `name` varchar(20), `specie` varchar(55))
;
INSERT INTO animals
(`id`, `name`, `specie`)
VALUES
(1, 'dougie', 'dog-poodle'),
(2, 'bonzo', 'dog-pitbull'),
(3, 'cadi', 'cat-persian'),
(4, 'mr.turtle', 'turtle-snapping'),
(5, 'spotty', 'turtle-spotted'),
(6, 'tweety', 'bird-canary')
;
This query will give you the animal type and the count, i.e. 2 dog, 1 cat, 2 turtle, 1 bird.
It looks at the value of specie and returns the part before the first - found, along with the count.
SELECT
SUBSTRING_INDEX(specie,'-',1) AS specie
, COUNT(*) AS total
FROM animals
GROUP BY SUBSTRING_INDEX(specie,'-',1);
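To try the grouping without a MySQL server, here is a rough equivalent against an in-memory SQLite database. SQLite has no SUBSTRING_INDEX, so this sketch swaps in instr()/substr() to cut the prefix before the first '-'. The alias animal_type is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (id INT, name TEXT, specie TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [
        (1, "dougie", "dog-poodle"),
        (2, "bonzo", "dog-pitbull"),
        (3, "cadi", "cat-persian"),
        (4, "mr.turtle", "turtle-snapping"),
        (5, "spotty", "turtle-spotted"),
        (6, "tweety", "bird-canary"),
    ],
)

# Stand-in for SUBSTRING_INDEX(specie, '-', 1): text before the first '-',
# or the whole value when there is no '-' at all.
TYPE_EXPR = """
    CASE WHEN instr(specie, '-') > 0
         THEN substr(specie, 1, instr(specie, '-') - 1)
         ELSE specie
    END
"""

sql = f"""
    SELECT {TYPE_EXPR} AS animal_type, COUNT(*) AS total
    FROM animals
    GROUP BY animal_type
    ORDER BY animal_type
"""
counts = conn.execute(sql).fetchall()
print(counts)  # prints [('bird', 1), ('cat', 1), ('dog', 2), ('turtle', 2)]
```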
USE SUBSTRING_INDEX:
SELECT SUBSTRING_INDEX(specie,'-',1), specie, COUNT(*) AS Total
FROM animals GROUP BY SUBSTRING_INDEX(specie,'-',1);
RESULT:
bird bird-canary 1
cat cat-persian 1
dog dog-poodle 2
turtle turtle-snapping 2
Documentation: SUBSTRING_INDEX

Nested Set Query to retrieve all ancestors of each node

I have a MySQL query that I thought was working fine to retrieve all the ancestors of each node, starting from the top node, down to its immediate node. However when I added a 5th level to the nested set, it broke.
Below are example tables, queries and SQL Fiddles:
Four Level Nested Set:
CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
`tree` int,
`left` int,
`right` int);
INSERT Tree
VALUES
("Food", 1, 1, 18),
('Fruit', 1, 2, 11),
('Red', 1, 3, 6),
('Cherry', 1, 4, 5),
('Yellow', 1, 7, 10),
('Banana', 1, 8, 9),
('Meat', 1, 12, 17),
('Beef', 1, 13, 14),
('Pork', 1, 15, 16);
The Query:
SELECT t0.title node
,(SELECT GROUP_CONCAT(t2.title)
FROM Tree t2
WHERE t2.left<t0.left AND t2.right>t0.right
ORDER BY t2.left) ancestors
FROM Tree t0
GROUP BY t0.title;
The returned result for node Banana is Food,Fruit,Yellow - Perfect. You can see this here SQL Fiddle - 4 Levels
When I run the same query on the 5 level table below, the 5th level nodes come back in the wrong order:
CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
`tree` int,
`left` int,
`right` int);
INSERT Tree
VALUES
("Food", 1, 1, 24),
('Fruit', 1, 2, 13),
('Red', 1, 3, 8),
('Cherry', 1, 4, 7),
('Cherry_pie', 1, 5, 6),
('Yellow', 1, 9, 12),
('Banana', 1, 10, 11),
('Meat', 1, 14, 23),
('Beef', 1, 15, 16),
('Pork', 1, 17, 22),
('Bacon', 1, 18, 21),
('Bacon_Sandwich', 1, 19, 20);
The returned result for Bacon_Sandwich is Bacon,Food,Meat,Pork which is not the right order, it should be Food,Meat,Pork,Bacon - You can see this here SQL Fiddle - 5 Levels
I am not sure what is happening because I don't really understand subqueries well enough. Can anyone shed any light on this?
EDIT AFTER INVESTIGATION:
Woah!! Looks like writing all this out and reading up about ordering with GROUP_CONCAT gave me some inspiration.
Adding ORDER BY to the actual GROUP_CONCAT function and removing from the end of the subquery solved the issue. I now receive Food,Meat,Pork,Bacon for the node Bacon_Sandwich
SELECT t0.title node
,(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
FROM Tree t2
WHERE t2.left<t0.left AND t2.right>t0.right
) ancestors
FROM Tree t0
GROUP BY t0.title;
I still have no idea why though. Having ORDER BY at the end of the subquery works for 4 levels but not for 5?!?!
If someone could explain what the issue is and why moving the ORDER BY fixes it, I'd be most grateful.
First, it's important to understand that you have an implicit GROUP BY. As the MySQL manual puts it:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
To make the point more understandable I'll leave out subqueries and reduce the problem to the banana. Banana is the set [10, 11]. The correct sorted ancestors are those:
SELECT "banana" as node, GROUP_CONCAT(title ORDER by `left`)
FROM Tree WHERE `left` < 10 AND `right` > 11
GROUP BY node;
The ORDER BY must be in GROUP_CONCAT() as you want the aggregation function to sort. ORDER BY outside sorts by the aggregated results (i.e. the result of GROUP_CONCAT()). The fact that it worked until level 4 is just luck. ORDER BY has no effect on an aggregate function. You would get the same results with or without the ORDER BY:
SELECT GROUP_CONCAT(title)
FROM Tree WHERE `left` < 10 AND `right` > 11
/* ORDER BY `left` */
It might help to walk through what
SELECT GROUP_CONCAT(title ORDER BY `left`) FROM Tree WHERE … ORDER BY `left`
does:
1. Get a selection (WHERE), which results in three rows in an undefined order:
("Food")
("Yellow")
("Fruit")
2. Aggregate the result into one row (implicit GROUP BY) so that an aggregate function can be used:
(("Food","Yellow", "Fruit"))
3. Apply the aggregate function (GROUP_CONCAT(title ORDER BY `left`)) to it, i.e. order by `left` and then concatenate:
("Food,Fruit,Yellow")
4. Finally, sort that result (the outer ORDER BY). As it's only one row, sorting changes nothing:
("Food,Fruit,Yellow")
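The same pipeline can be mimicked in plain Python to see why only the inner ORDER BY matters. This is a toy model, not how MySQL is implemented: the TREE dict and the ancestors function are made up for illustration, and the alphabetical read order is an assumption that stands in for "whatever order the rows happen to arrive in".

```python
# In-memory copy of the 5-level Tree table: title -> (left, right).
TREE = {
    "Food": (1, 24), "Fruit": (2, 13), "Red": (3, 8), "Cherry": (4, 7),
    "Cherry_pie": (5, 6), "Yellow": (9, 12), "Banana": (10, 11),
    "Meat": (14, 23), "Beef": (15, 16), "Pork": (17, 22),
    "Bacon": (18, 21), "Bacon_Sandwich": (19, 20),
}


def ancestors(node, order_inside):
    left, right = TREE[node]
    # WHERE: pick the enclosing rows. Reading them alphabetically is an
    # assumption that imitates a scan along the primary key (title).
    rows = sorted(t for t, (l, r) in TREE.items() if l < left and r > right)
    # GROUP_CONCAT(... ORDER BY left): sort the values before concatenating.
    if order_inside:
        rows.sort(key=lambda title: TREE[title][0])
    # Implicit GROUP BY: everything collapses into a single result row.
    result_rows = [",".join(rows)]
    # Outer ORDER BY: sorts result rows, and there is only one of them.
    result_rows.sort()
    return result_rows[0]


print(ancestors("Bacon_Sandwich", order_inside=False))  # Bacon,Food,Meat,Pork
print(ancestors("Bacon_Sandwich", order_inside=True))   # Food,Meat,Pork,Bacon
```

For Banana both variants give Food,Fruit,Yellow, because alphabetical order and tree order happen to coincide there.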
You can get the result using JOIN or SUB-QUERY.
Using JOIN:
SELECT t0.title node, GROUP_CONCAT(t2.title ORDER BY t2.left) ancestors
FROM Tree t0
LEFT JOIN Tree t2 ON t2.left < t0.left AND t2.right > t0.right
GROUP BY t0.title;
Check this SQL FIDDLE DEMO
Using SUB-QUERY:
SELECT t0.title node,
(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
FROM Tree t2 WHERE t2.left<t0.left AND t2.right>t0.right) ancestors
FROM Tree t0
GROUP BY t0.title;
Check this SQL FIDDLE DEMO
OUTPUT
| NODE | ANCESTORS |
|----------------|-----------------------|
| Bacon | Food,Meat,Pork |
| Bacon_Sandwich | Food,Meat,Pork,Bacon |
| Banana | Food,Fruit,Yellow |
| Beef | Food,Meat |
| Cherry | Food,Fruit,Red |
| Cherry_pie | Food,Fruit,Red,Cherry |
| Food | (null) |
| Fruit | Food |
| Meat | Food |
| Pork | Food,Meat |
| Red | Food,Fruit |
| Yellow | Food,Fruit |
In your subquery you put the ORDER BY after the WHERE clause, where it does not affect the output: it sorts the single aggregated row, not the values being concatenated. Without an ORDER BY of its own, GROUP_CONCAT() concatenates the values in whatever order the rows happen to be read, which is not guaranteed.
If you check the output of your first query, the titles happen to come back in ascending order of the title column (most likely because title is the primary key). So the returned result for node Banana is Food,Fruit,Yellow, which only coincidentally matches the tree order.
In your second result, Bacon_Sandwich gives Bacon,Food,Meat,Pork because in ascending order of title Bacon comes before Food.
If you want to order the result by the left column, you have to specify ORDER BY inside the GROUP_CONCAT() function as above. See both of my queries.
I recommend using the JOIN rather than the subquery, for better performance.

Query to get categorised sets of splits

Given this table structure:
CREATE TABLE IF NOT EXISTS splits (
id INT AUTO_INCREMENT,
sector_id INT,
type VARCHAR(100),
percentage INT,
PRIMARY KEY (id),
INDEX (type)
) ENGINE MyISAM;
And this data set:
INSERT INTO splits (sector_id, type, percentage) VALUES
(1, 'Manager', '50'),
(1, 'Sales Rep', '50'),
(2, 'Manager', '75'),
(2, 'Sales Rep', '25'),
(3, 'Manager', '75'),
(3, 'Sales Rep', '25'),
(4, 'Manager', '100'),
(5, 'Manager', '100'),
(6, 'Manager', '100');
How could I return the number of sectors that split in the same way?
Like this:
Split | Number
---------------+-------
50% M / 50% SR | 1
75% M / 25% SR | 2
100% M | 3
So this shows 1 sector (id 1) has a split ratio of 50/50, 2 sectors have a split ratio of 75/25 (ids 2, 3) and 3 sectors have a split ratio of 100/0 (ids 4, 5, 6).
Here is a SQL Fiddle with the database setup: http://sqlfiddle.com/#!2/6b19f/1
What have you tried?
I cannot even think of where to start to solve this problem, so I apologise for not being able to show an attempted solution. I will update this question if I get anywhere.
The reason why I want to do this all in the database (and not the application) is because our automated reporting tools can be pointed to a table/view/query and automatically apply filtering, sorting, charting etc. To do it manually in the application loses all the default functionality.
I don't really understand the problem. Doesn't your DB already contain all the data you want to retrieve?
SELECT
sector_id AS Number,
type,
percentage
FROM
splits
The easiest thing now would be to have your software turn those (type, percentage) tuples into strings. Why do you need the database to create and concatenate this string?
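For reference, the application-side route could look roughly like this. It is a sketch against an in-memory SQLite copy of the question's data; the variable names are invented, and it deliberately does not satisfy the asker's requirement that the reporting tool query the database directly.

```python
import sqlite3
from collections import Counter
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE splits (id INTEGER PRIMARY KEY, sector_id INT, "
    "type TEXT, percentage INT)"
)
conn.executemany(
    "INSERT INTO splits (sector_id, type, percentage) VALUES (?, ?, ?)",
    [
        (1, "Manager", 50), (1, "Sales Rep", 50),
        (2, "Manager", 75), (2, "Sales Rep", 25),
        (3, "Manager", 75), (3, "Sales Rep", 25),
        (4, "Manager", 100), (5, "Manager", 100), (6, "Manager", 100),
    ],
)

# Fetch the raw tuples in a stable order so groupby sees each sector once
# and every sector lists its types in the same sequence.
rows = conn.execute(
    "SELECT sector_id, type, percentage FROM splits ORDER BY sector_id, type"
).fetchall()

# Build one label per sector, then count how many sectors share each label.
split_counts = Counter(
    " / ".join(f"{pct}% {typ}" for _, typ, pct in group)
    for _, group in groupby(rows, key=lambda row: row[0])
)

for label, number in sorted(split_counts.items(), key=lambda item: item[1]):
    print(f"{label} | {number}")
```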
Can there be more than 2 types?
For Postgres I'd use an array of tuples for output:
SELECT
sector_id,
array_agg(row(percentage, type))
FROM
splits
GROUP BY
sector_id
Correct Query:
SELECT
x.y,
COUNT(*) c
FROM (
SELECT
sector_id,
-- order inside GROUP_CONCAT; an ORDER BY in a derived table is not guaranteed to be honoured
GROUP_CONCAT(CONCAT(percentage, '% ', type) ORDER BY type SEPARATOR ' / ') AS y
FROM splits
GROUP BY sector_id
) x
GROUP BY x.y
ORDER BY c
Result will look like this:
50% Manager / 50% Sales Rep | 1
75% Manager / 25% Sales Rep | 2
100% Manager | 3

MySQL: Joins vs. Bitwise operator, and performance thereof

There are a number of questions about this subject, but mine is more specific to performance concerns.
With regard to an object, I want to track a multitude of 'attributes', each with a multitude of discrete 'values' (each attribute has between 3 and 16 valid 'values'). For instance, consider tracking military personnel. The attributes/values might be (not real, I totally made these up):
attribute: {values}
languages_spoken: {english, spanish, russian, chinese, …. }
certificates: {infantry, airborne, pilot, tank_driver…..}
approved_equipment: {m4, rocket_launcher, shovel, super_secret_radio_thingy….}
approved_operations: {reconnaissance, logistics, invasion, cooking, ….}
awards_won: {medal_honor, purple_heart, ….}
… and so on.
One way to do this - the way I want to do this - is to have a personnel table and an attributes table:
personnel table => [id, name, rank, address…..]
personnel_attributes table => [personnel_id, attribute_id, value_id]
along with the associated attributes and values tables.
So if personnel_id=31415 is approved for logistics, there would be the following entry in the personnel_attributes table:
personnel_id | attribute_id | value_id
31415 | 3 | 2
where 3 = attribute_id for "approved_operations" and 2 = value_id for "logistics" (sorry formatting spaces didn't line up.)
Then a search to find all personnel who speak english OR spanish, AND who are infantry OR airborne, AND who can operate a shovel OR super_secret_radio_thingy would be something like:
SELECT t1.personnel_id
FROM personnel_attributes t1, personnel_attributes t2, personnel_attributes t3
WHERE ((t1.attribute_id = 1 and t1.value_id = 1) OR (t1.attribute_id = 1 and t1.value_id = 2))
AND ((t2.attribute_id = 2 and t2.value_id = 1) OR (t2.attribute_id = 2 and t2.value_id = 2))
AND ((t3.attribute_id = 3 and t3.value_id = 3) OR (t3.attribute_id = 3 and t3.value_id = 4))
AND t2.personnel_id = t1.personnel_id
AND t3.personnel_id = t1.personnel_id;
Assuming this isn't a totally stupid way to write the SQL query, the problem is that it's very slow (even with seemingly relevant indexes).
So I am toying with using bitwise operators instead, where each attribute is a column in a table and each value is a bit. The same search would be:
SELECT personnel_id FROM personnel_attributes
WHERE language & b'00000011'
AND certificates & b'00000011'
AND approved_operations & b'00001100';
I know this does a full table scan, but in my experiments with 350,000 sample personnel, and 16 attributes each, the first method took 20 seconds whereas the bitwise method took 38 milliseconds!
Am I doing something wrong here? Are these the performance results I should expect?
Thanks!
Using the bitwise operation will require evaluating all of the rows. I believe your problem can be solved with a change to your original SELECT statement and how you're joining your tables.
To make it a little easier to read, I've changed the attribute values to words instead of integers so it's less confusing while reading through my example, but obviously you can leave them as integers and the concept would still work:
CREATE TABLE PERSONNEL (
ID INT,
NAME VARCHAR(20)
);
CREATE TABLE PERSONNEL_ATTRIBUTES (
PERSONNEL_ID INT,
ATTRIB_ID INT,
ATTRIB_VALUE VARCHAR(20)
);
INSERT INTO PERSONNEL VALUES (1, 'JIM SMITH');
INSERT INTO PERSONNEL VALUES (2, 'JANE DOE');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Spanish');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Russian');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Logistics');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Infantry');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 3, 'Infantry');
SELECT P.ID, P.NAME, PA1.ATTRIB_VALUE AS DESIRED_LANGUAGE, PA2.ATTRIB_VALUE AS APPROVED_OPERATION
FROM PERSONNEL P
JOIN PERSONNEL_ATTRIBUTES PA1 ON P.ID = PA1.PERSONNEL_ID AND PA1.ATTRIB_ID = 1
JOIN PERSONNEL_ATTRIBUTES PA2 ON P.ID = PA2.PERSONNEL_ID AND PA2.ATTRIB_ID = 3
WHERE PA1.ATTRIB_VALUE = 'Spanish' AND (PA2.ATTRIB_VALUE = 'Infantry' OR PA2.ATTRIB_VALUE = 'Airborne');
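The join can be checked quickly against an in-memory SQLite database. This is only a sketch of the query's behaviour on the sample rows, not a benchmark. The index name idx_pa and its column order are a guess at what might suit this access pattern, not something taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personnel (id INT, name TEXT);
    CREATE TABLE personnel_attributes (
        personnel_id INT, attrib_id INT, attrib_value TEXT
    );
    -- Hypothetical index: filter on attribute and value, then reach the person.
    CREATE INDEX idx_pa
        ON personnel_attributes (attrib_id, attrib_value, personnel_id);

    INSERT INTO personnel VALUES (1, 'JIM SMITH'), (2, 'JANE DOE');
    INSERT INTO personnel_attributes VALUES
        (1, 1, 'English'), (1, 1, 'Spanish'), (1, 1, 'Russian'),
        (1, 3, 'Logistics'), (1, 3, 'Infantry'),
        (2, 1, 'English'), (2, 3, 'Infantry');
""")

# One join per attribute being filtered; each alias is pinned to its own
# attribute id, and the value tests use that same alias.
sql = """
    SELECT p.id, p.name, pa1.attrib_value, pa2.attrib_value
    FROM personnel AS p
    JOIN personnel_attributes AS pa1
      ON p.id = pa1.personnel_id AND pa1.attrib_id = 1
    JOIN personnel_attributes AS pa2
      ON p.id = pa2.personnel_id AND pa2.attrib_id = 3
    WHERE pa1.attrib_value = 'Spanish'
      AND pa2.attrib_value IN ('Infantry', 'Airborne')
"""
matches = conn.execute(sql).fetchall()
print(matches)  # prints [(1, 'JIM SMITH', 'Spanish', 'Infantry')]
```

Only Jim Smith qualifies: Jane Doe has an Infantry row but no Spanish row.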
I have the same dilemma: using django-bitfield versus a separate table for flags.
Inspired by your experiment, I used a 3.5m-record table (InnoDB) and ran count() and retrieval queries for both variants. The result was astonishing: approx. 5 sec vs. 40 sec, and the bitfield wins.