Query to get categorised sets of splits - mysql

Given this table structure:
CREATE TABLE IF NOT EXISTS splits (
id INT AUTO_INCREMENT,
sector_id INT,
type VARCHAR(100),
percentage INT,
PRIMARY KEY (id),
INDEX (type)
) ENGINE MyISAM;
And this data set:
INSERT INTO splits (sector_id, type, percentage) VALUES
(1, 'Manager', '50'),
(1, 'Sales Rep', '50'),
(2, 'Manager', '75'),
(2, 'Sales Rep', '25'),
(3, 'Manager', '75'),
(3, 'Sales Rep', '25'),
(4, 'Manager', '100'),
(5, 'Manager', '100'),
(6, 'Manager', '100');
How could I return the number of sectors that split in the same way, like this:
Split | Number
---------------+-------
50% M / 50% SR | 1
75% M / 25% SR | 2
100% M | 3
So this shows 1 sector (id 1) has a split ratio of 50/50, 2 sectors have a split ratio of 75/25 (ids 2, 3) and 3 sectors have a split ratio of 100/0 (ids 4, 5, 6).
Here is a SQL Fiddle with the database setup: http://sqlfiddle.com/#!2/6b19f/1
What have you tried?
I cannot even think of where to start to solve this problem, so I apologise for not being able to show an attempted solution. I will update this question if I get anywhere.
The reason why I want to do this all in the database (and not the application) is because our automated reporting tools can be pointed to a table/view/query and automatically apply filtering, sorting, charting etc. To do it manually in the application loses all the default functionality.

I don't really understand the problem. Your DB already contains all the data you want to retrieve?!
SELECT
sector_id AS Number,
type,
percentage
FROM
splits
The easiest thing would now be to take your software and turn those (type, percentage) tuples into strings. Why do you need the database to create and concatenate this string?
Can there be more than 2 types?
For Postgres I'd use an array of tuples for output:
SELECT
sector_id,
array_agg(row(percentage, type))
FROM
splits
GROUP BY
sector_id
Correct Query:
SELECT
x.y,
COUNT(*) c
FROM (
SELECT
sector_id,
-- ORDER BY inside GROUP_CONCAT fixes the order of the parts within each label,
-- so it does not depend on the row order of a derived table
GROUP_CONCAT(CONCAT(percentage, '% ', type) ORDER BY type SEPARATOR ' / ') AS y
FROM splits
GROUP BY sector_id
) x
GROUP BY x.y
ORDER BY c
Result will look like this:
50% Manager / 50% Sales Rep | 1
75% Manager / 25% Sales Rep | 2
100% Manager | 3
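If the set of types is fixed, the same two-step grouping can also be done without string concatenation. Below is a minimal sketch that runs against an in-memory SQLite copy of the sample data from Python. It assumes only the two types 'Manager' and 'Sales Rep' exist; the column names manager_pct, sales_rep_pct and sectors are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE splits (id INTEGER PRIMARY KEY, sector_id INT, type TEXT, percentage INT)"
)
conn.executemany(
    "INSERT INTO splits (sector_id, type, percentage) VALUES (?, ?, ?)",
    [
        (1, "Manager", 50), (1, "Sales Rep", 50),
        (2, "Manager", 75), (2, "Sales Rep", 25),
        (3, "Manager", 75), (3, "Sales Rep", 25),
        (4, "Manager", 100), (5, "Manager", 100), (6, "Manager", 100),
    ],
)

# Step 1: collapse each sector into one row with one column per type
# (assumption: only these two types exist).
# Step 2: count how many sectors share the same pair of percentages.
QUERY = """
SELECT manager_pct, sales_rep_pct, COUNT(*) AS sectors
FROM (
    SELECT sector_id,
           MAX(CASE WHEN type = 'Manager'   THEN percentage END) AS manager_pct,
           MAX(CASE WHEN type = 'Sales Rep' THEN percentage END) AS sales_rep_pct
    FROM splits
    GROUP BY sector_id
) per_sector
GROUP BY manager_pct, sales_rep_pct
ORDER BY sectors
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

A sector without a 'Sales Rep' row gets NULL (None in Python) in sales_rep_pct, which plays the role of the "100% M" line in the desired output.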

Related

SSRS - Lookup only on certain columns in a matrix

I have a matrix table with a column group "Application questions" let's say these are in table 1. Some of the questions have unique string values such as: Name, ID number, email address. But others have an integer value that relates to an actual value for a separate lookup table (table 2), for example, the values for the column "Gender" are 1, 2, 3, for Male, Female, Other. Is there a way in the lookup function that I can isolate the columns that only have integer values or alternatively ignore the other columns with unique string values?
Table1
NAME ATTRIBUTE_id ATTRIBUTE
-----------------------------------------
James 5 1
James 6 james@email.com
James 7 8
Table2
Lookup_id ATTRIBUTE_id Description
-----------------------------------------
1 5 Male
2 5 Female
3 5 Other
8 7 New York
9 7 Los Angeles
Output
NAME | Email | Gender | City
-------------------------------------------------------
James james@email.com Male New York
Hope that makes sense!
Thank you.
I think this will be easier to do in your dataset query.
Below I have recreated your sample data and added an extra person in to make sure it's working as expected.
DECLARE @t TABLE (Name varchar(10), AttributeID INT, AttributeMemberID varchar(50))
INSERT INTO @t VALUES
('Mary', 5, '2'),
('Mary', 6, 'Mary@email.com'),
('James', 5, '1'),
('James', 6, 'james@email.com'),
('James', 7, '8')
DECLARE @AttributeMembers TABLE (AttributeMemberID INT, AttributeID int, Description varchar(20))
INSERT INTO @AttributeMembers VALUES
(1, 5, 'Male'),
(2, 5, 'Female'),
(3, 5, 'Other'),
(8, 7, 'New York'),
(9, 7, 'Los Angeles')
I also added a new table which describes what each attribute is. We will use the output from this as column headers in the final SSRS matrix.
DECLARE @Attributes TABLE(AttributeID int, Caption varchar(50))
INSERT INTO @Attributes VALUES
(5, 'Gender'),
(6, 'Email'),
(7, 'City')
Finally we join all three together and get a fairly normalised view of the data. The join is a bit messy because your current tables use the same column for both integer-based lookups/joins and absolute string values, hence the CASE in the JOIN.
SELECT
t.Name,
a.Caption,
ISNULL(am.[Description], t.AttributeMemberID) as Label
FROM @t t
JOIN @Attributes a on t.AttributeID = a.AttributeID
LEFT JOIN @AttributeMembers am
on t.AttributeID = am.AttributeID
and
CAST(CASE WHEN ISNUMERIC(t.AttributeMemberID) = 0 THEN 0 ELSE t.AttributeMemberID END as int)
= am.AttributeMemberID
ORDER BY Name, Caption, Label
This gives us the following output...
As you can see, this will be easy to put into a Matrix control in SSRS.
Row group by Name, column group by Caption, and the data cell would be Label.
If you wanted to ensure the order of the columns, you could extend the Attributes table to include a SortOrder column, include this in the query output and use this in SSRS to order the columns by.
Hope that's clear enough.
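For anyone who wants to try the lookup-with-fallback join outside SQL Server, here is a rough sketch using Python's sqlite3 module with the same sample rows. It is not the T-SQL above: SQLite has no ISNUMERIC, so this version compares the lookup id as text instead, and the final pivot into a name-by-caption dictionary only stands in for what the SSRS matrix would do.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (Name TEXT, AttributeID INT, AttributeMemberID TEXT);
CREATE TABLE AttributeMembers (AttributeMemberID INT, AttributeID INT, Description TEXT);
CREATE TABLE Attributes (AttributeID INT, Caption TEXT);
INSERT INTO t VALUES
  ('Mary', 5, '2'), ('Mary', 6, 'Mary@email.com'),
  ('James', 5, '1'), ('James', 6, 'james@email.com'), ('James', 7, '8');
INSERT INTO AttributeMembers VALUES
  (1, 5, 'Male'), (2, 5, 'Female'), (3, 5, 'Other'),
  (8, 7, 'New York'), (9, 7, 'Los Angeles');
INSERT INTO Attributes VALUES (5, 'Gender'), (6, 'Email'), (7, 'City');
""")

# Use the lookup description when one exists, else fall back to the raw value.
# Comparing as text avoids casting free-text values such as e-mail addresses.
rows = conn.execute("""
SELECT t.Name, a.Caption, COALESCE(am.Description, t.AttributeMemberID) AS Label
FROM t
JOIN Attributes a ON a.AttributeID = t.AttributeID
LEFT JOIN AttributeMembers am
  ON am.AttributeID = t.AttributeID
 AND CAST(am.AttributeMemberID AS TEXT) = t.AttributeMemberID
ORDER BY t.Name, a.Caption
""").fetchall()

# Pivot: one entry per person, one key per caption (what the matrix would show).
matrix = defaultdict(dict)
for name, caption, label in rows:
    matrix[name][caption] = label

for name, cells in matrix.items():
    print(name, cells)
```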

How to select a mysql column values which contains Y first and N second

I want a MySQL query that selects rows based on a column containing Y and N values.
Below is my table
If I use this query
"SELECT * from hotel where standard='Y' OR standard='N' group by hotel_code";
This query picks rows based on insert id, but that is not my requirement: for each hotel_code it should pick a 'Y' row first, and only fall back to an 'N' row when there is no 'Y'.
I want to select these particular rows:
2 ---- 123 ------Y
4 -----324 ------Y
6 -----456 ------N or 5 ------456 -- N any row from when N appear
7 -----987 ------Y
Thanks in advance!!!
My previous answer had a problem: it raised an error related to only_full_group_by when the query was executed in MySQL. I have since created a local database myself and come up with the SQL that you need. Here it is.
SELECT min(origin), hotel_code, max(standard) as std from hotel
where standard='Y' OR standard='N'
group by hotel_code
order by std desc;
And after executing the sql, here's the result that I have got.
1 123 Y
3 324 Y
7 987 Y
5 456 N
I am sharing the create table and insert statements so that anyone can check by themselves if the query is okay.
create table hotel (
origin integer auto_increment primary key,
hotel_code integer not null,
standard varchar(1) not null
);
INSERT INTO `hotel` (`origin`, `hotel_code`, `standard`)
VALUES
(1, 123, 'Y'),
(2, 123, 'N'),
(3, 324, 'N'),
(4, 324, 'Y'),
(5, 456, 'N'),
(6, 456, 'N'),
(7, 987, 'N'),
(8, 987, 'Y');
Hope that helps!
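One caveat with the query above: min(origin) is computed independently of max(standard), so the origin shown is not necessarily the id of the 'Y' row (for hotel 324 it shows 3, while the 'Y' row is 4). If the id of the chosen row matters, a correlated subquery is one option. This is only a sketch, run here against SQLite from Python with the same sample data, and it assumes the lowest origin should win when several rows tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel (
    origin INTEGER PRIMARY KEY,
    hotel_code INTEGER NOT NULL,
    standard TEXT NOT NULL
);
INSERT INTO hotel (origin, hotel_code, standard) VALUES
  (1, 123, 'Y'), (2, 123, 'N'),
  (3, 324, 'N'), (4, 324, 'Y'),
  (5, 456, 'N'), (6, 456, 'N'),
  (7, 987, 'N'), (8, 987, 'Y');
""")

# For each hotel_code keep exactly one row: a 'Y' row if there is one
# ('Y' sorts after 'N', hence DESC), otherwise an 'N' row.
# Ties are broken by the lowest origin (an assumption).
rows = conn.execute("""
SELECT h.origin, h.hotel_code, h.standard
FROM hotel h
WHERE h.origin = (
    SELECT h2.origin
    FROM hotel h2
    WHERE h2.hotel_code = h.hotel_code
    ORDER BY h2.standard DESC, h2.origin
    LIMIT 1
)
ORDER BY h.standard DESC, h.hotel_code
""").fetchall()

for row in rows:
    print(row)
```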
Here is what you need:
SELECT origin, hotel_code, CASE COUNT(DISTINCT standart)
WHEN 1 AND standart = "N" THEN "N"
WHEN 1 AND standart = "Y" THEN "Y"
WHEN 2 THEN "Y"
END as standart
FROM hotel
GROUP BY hotel_code ORDER BY standart DESC
Results:
origin hotel_code standart
1 123 Y
3 324 Y
9 888 Y
7 987 Y
6 456 N
SQLFiddle: http://sqlfiddle.com/#!9/17bd53/3/0
There are many ways to solve your problem; for the sake of simplicity I picked this one:
SELECT h2.origin, h2.hotel_code, h2.standard
FROM (SELECT * FROM hotel WHERE standard = 'Y') h1
JOIN hotel h2 on h1.hotel_code = h2.hotel_code
ORDER BY h2.hotel_code, h2.standard;
Enjoy it!

SQL Query for exact match in many to many relation

I have the following tables(only listing the required attributes)
medicine (id, name),
generic (id, name),
med_gen (med_id references medicine(id),gen_id references generic(id), potency)
Sample Data
medicine
(1, 'Crocin')
(2, 'Stamlo')
(3, 'NT Kuf')
generic
(1, 'Hexachlorodine')
(2, 'Methyl Benzoate')
med_gen
(1, 1, '100mg')
(1, 2, '50ml')
(2, 1, '100mg')
(2, 2, '60ml')
(3, 1, '100mg')
(3, 2, '50ml')
I want all the medicines which are equivalent to a given medicine. Medicines are equivalent to each other if they have the same generics as well as the same potency for each generic. In the above sample data, all three have the same generics, but only 1 and 3 also have the same potency for the corresponding generics. So 1 and 3 are equivalent medicines.
I want to find out equivalent medicines given a medicine id.
NOTE : One medicine may have any number of generics. Medicine table has around 102000 records, generic table around 2200 and potency table around 200000 records. So performance is a key point.
NOTE 2 : The database used is MySQL.
One way to do it in MySQL is to leverage GROUP_CONCAT() function
SELECT g.med_id
FROM
(
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY potency) potency
FROM med_gen
WHERE med_id = 1 -- here 1 is med_id for which you're trying to find analogs
) o JOIN
(
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY potency) potency
FROM med_gen
WHERE med_id <> 1 -- here 1 is med_id for which you're trying to find analogs
GROUP BY med_id
) g
ON o.gen_id = g.gen_id
AND o.potency = g.potency
Output:
| MED_ID |
|--------|
| 3 |
Here is SQLFiddle demo
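One thing to watch with the concatenation approach: gen_id and potency are sorted independently, so two medicines with the same generics but swapped potencies would appear to produce identical strings. A pairwise comparison avoids that. The sketch below is one possible formulation, run against SQLite from Python; it assumes (med_id, gen_id) is unique, and medicine 4 is a hypothetical row added only to show the swapped-potency case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE med_gen (
    med_id INT, gen_id INT, potency TEXT,
    PRIMARY KEY (med_id, gen_id)  -- assumption: one potency per generic per medicine
);
INSERT INTO med_gen VALUES
  (1, 1, '100mg'), (1, 2, '50ml'),
  (2, 1, '100mg'), (2, 2, '60ml'),
  (3, 1, '100mg'), (3, 2, '50ml'),
  (4, 1, '50ml'),  (4, 2, '100mg');  -- hypothetical: same values, swapped
""")


def equivalents(conn, med_id):
    """Return ids of medicines with exactly the same (generic, potency) pairs."""
    sql = """
    SELECT b.med_id
    FROM med_gen a
    JOIN med_gen b
      ON b.gen_id = a.gen_id
     AND b.potency = a.potency
     AND b.med_id <> a.med_id
    WHERE a.med_id = :med
    GROUP BY b.med_id
    -- every pair of the given medicine matched ...
    HAVING COUNT(*) = (SELECT COUNT(*) FROM med_gen WHERE med_id = :med)
    -- ... and the candidate has no extra pairs of its own
       AND COUNT(*) = (SELECT COUNT(*) FROM med_gen c WHERE c.med_id = b.med_id)
    ORDER BY b.med_id
    """
    return [row[0] for row in conn.execute(sql, {"med": med_id})]


print(equivalents(conn, 1))
```

On the real tables an index on (gen_id, potency, med_id) would probably be needed for this to perform acceptably, but that has not been measured here.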

Correct way to store uni/bi/trigrams ngrams in RDBMS?

I have a list of unigrams (single word), bigrams (two words), and trigrams (three words) I have pulled out of a bunch of documents. My goal is a statistical analysis report and also a search I can use on these documents.
John Doe
Xeon 5668x
corporate tax rates
beach
tax plan
Porta San Giovanni
The ngrams are tagged by date and document. So for example, I can find relations between bigrams and when their phrases first appeared as well as relations between documents. I can also search for documents that contain these X number of un/bi/trigram phrases.
So my question is how to store them to optimize these searches.
The simplest approach is just a simple string column for each phrase and then I add relations to the document_ngram table each time I find that word/phrase in the document.
table document
{
id
text
date
}
table ngram
{
id
ngram varchar(200);
}
table document_ngram
{
id
ngram_id
document_id
date
}
However, this means that if I want to search through trigrams for a single word I have to use string searching. For example, let's say I wanted all trigrams with the word "summer" in them.
So what if I instead split the words up so that the only thing stored in ngram was a single word, and then added three columns so that all 1-, 2-, and 3-word chains could fit inside document_ngram?
table document_ngram
{
id
word1_id NOT NULL
word2_id DEFAULT NULL
word3_id DEFAULT NULL
document_id
date
}
Is this the correct way to do it? Are there better ways? I am currently using PostgreSQL and MySQL but I believe this is a generic SQL question.
This is how I would model your data (note that 'the' is referenced twice). You could also add weights to the single words.
DROP SCHEMA ngram CASCADE;
CREATE SCHEMA ngram;
SET search_path='ngram';
CREATE table word
( word_id INTEGER PRIMARY KEY
, the_word varchar
, constraint word_the_word UNIQUE (the_word)
);
CREATE table ngram
( ngram_id INTEGER PRIMARY KEY
, n INTEGER NOT NULL -- arity
, weight REAL -- payload
);
CREATE TABLE ngram_word
( ngram_id INTEGER NOT NULL REFERENCES ngram(ngram_id)
, seq INTEGER NOT NULL
, word_id INTEGER NOT NULL REFERENCES word(word_id)
, PRIMARY KEY (ngram_id,seq)
);
INSERT INTO word(word_id,the_word) VALUES
(1, 'the') ,(2, 'man') ,(3, 'who') ,(4, 'sold') ,(5, 'world' );
INSERT INTO ngram(ngram_id, n, weight) VALUES
(101, 6, 1.0);
INSERT INTO ngram_word(ngram_id,seq,word_id) VALUES
( 101, 1, 1)
, ( 101, 2, 2)
, ( 101, 3, 3)
, ( 101, 4, 4)
, ( 101, 5, 1)
, ( 101, 6, 5)
;
SELECT w.*
FROM ngram_word nw
JOIN word w ON w.word_id = nw.word_id
WHERE ngram_id = 101
ORDER BY seq;
RESULT:
word_id | the_word
---------+----------
1 | the
2 | man
3 | who
4 | sold
1 | the
5 | world
(6 rows)
Now, suppose you want to add a 4-gram to the existing (6-gram) data:
INSERT INTO word(word_id,the_word) VALUES
(6, 'is') ,(7, 'lost') ;
INSERT INTO ngram(ngram_id, n, weight) VALUES
(102, 4, 0.1);
INSERT INTO ngram_word(ngram_id,seq,word_id) VALUES
( 102, 1, 1)
, ( 102, 2, 2)
, ( 102, 3, 6)
, ( 102, 4, 7)
;
SELECT w.*
FROM ngram_word nw
JOIN word w ON w.word_id = nw.word_id
WHERE ngram_id = 102
ORDER BY seq;
Additional result:
INSERT 0 2
INSERT 0 1
INSERT 0 4
word_id | the_word
---------+----------
1 | the
2 | man
6 | is
7 | lost
(4 rows)
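With this model, the "all ngrams containing a given word" search from the question becomes a plain join instead of a string search. Here is a small sketch of that lookup using SQLite from Python with the two ngrams inserted above; the helper name ngrams_containing is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (word_id INTEGER PRIMARY KEY, the_word TEXT UNIQUE);
CREATE TABLE ngram (ngram_id INTEGER PRIMARY KEY, n INTEGER NOT NULL, weight REAL);
CREATE TABLE ngram_word (
    ngram_id INTEGER NOT NULL REFERENCES ngram(ngram_id),
    seq      INTEGER NOT NULL,
    word_id  INTEGER NOT NULL REFERENCES word(word_id),
    PRIMARY KEY (ngram_id, seq)
);
INSERT INTO word VALUES
  (1, 'the'), (2, 'man'), (3, 'who'), (4, 'sold'), (5, 'world'), (6, 'is'), (7, 'lost');
INSERT INTO ngram VALUES (101, 6, 1.0), (102, 4, 0.1);
INSERT INTO ngram_word VALUES
  (101, 1, 1), (101, 2, 2), (101, 3, 3), (101, 4, 4), (101, 5, 1), (101, 6, 5),
  (102, 1, 1), (102, 2, 2), (102, 3, 6), (102, 4, 7);
""")


def ngrams_containing(conn, word):
    """Map ngram_id -> full phrase for every ngram that contains `word`."""
    sql = """
    SELECT nw.ngram_id, w.the_word
    FROM ngram_word nw
    JOIN word w ON w.word_id = nw.word_id
    WHERE nw.ngram_id IN (
        SELECT nw2.ngram_id
        FROM ngram_word nw2
        JOIN word w2 ON w2.word_id = nw2.word_id
        WHERE w2.the_word = ?
    )
    ORDER BY nw.ngram_id, nw.seq
    """
    phrases = {}
    for ngram_id, the_word in conn.execute(sql, (word,)):
        phrases.setdefault(ngram_id, []).append(the_word)
    return {ngram_id: " ".join(words) for ngram_id, words in phrases.items()}


print(ngrams_containing(conn, "lost"))
print(ngrams_containing(conn, "man"))
```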
BTW: adding a document-type object to this model will add two additional tables to this model: one for the document, and one for document*ngram. (or in another approach: for document*word) A recursive model would also be a possibility.
UPDATE: the above model will need an additional constraint, which will need triggers (or a rule+ an additional table) to be implemented. Pseudocode:
ngram_word.seq >0 AND ngram_word.seq <= (select ngram.n FROM ngram ng WHERE ng.ngram_id = ngram_word.ngram_id)
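As an illustration of what such a trigger could look like, here is a sketch in SQLite's trigger dialect, driven from Python. PostgreSQL and MySQL use different trigger syntax, so treat this purely as a demonstration of the check itself; the trigger name is made up and only INSERT is covered (an UPDATE trigger would be needed as well).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ngram (ngram_id INTEGER PRIMARY KEY, n INTEGER NOT NULL);
CREATE TABLE ngram_word (
    ngram_id INTEGER NOT NULL REFERENCES ngram(ngram_id),
    seq      INTEGER NOT NULL,
    word_id  INTEGER NOT NULL,
    PRIMARY KEY (ngram_id, seq)
);

-- Reject rows whose seq falls outside 1..n of the parent ngram.
CREATE TRIGGER ngram_word_seq_range
BEFORE INSERT ON ngram_word
BEGIN
    SELECT RAISE(ABORT, 'seq outside 1..n')
    WHERE NEW.seq < 1
       OR NEW.seq > (SELECT n FROM ngram WHERE ngram_id = NEW.ngram_id);
END;

INSERT INTO ngram VALUES (102, 4);
""")


def try_insert(seq):
    """Return True if a word at position `seq` of ngram 102 is accepted."""
    try:
        conn.execute("INSERT INTO ngram_word VALUES (102, ?, 1)", (seq,))
        return True
    except sqlite3.DatabaseError:
        return False


results = [try_insert(1), try_insert(4), try_insert(5), try_insert(0)]
print(results)
```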
One idea would be to modify your original table layout a bit. Consider the ngram varchar(200) column to only contain 1 word of the ngram, add in a word_no (1, 2, or 3) column, and add in a grouping column, so that, for example, the two records for the two words in a bigram are related (give them the same word_group). (In Oracle, I'd pull the word_group numbers from a sequence; I think PostgreSQL would have something similar.)
table document
{
id
text
date
}
table ngram
{
id
word_group
word_no
ngram varchar(200);
}
table document_ngram
{
id
ngram_id
document_id
date
}

MySQL: Joins vs. Bitwise operator, and performance thereof

There are a number of questions about this subject, but mine is more specific to performance concerns.
With regard to an object, I want to track a multitude of 'attributes', each with a multitude of discrete 'values' (each attribute has between 3 and 16 valid 'values'). For instance, consider tracking military personnel. The attributes/values might be (not real, I totally made these up):
attribute: {values}
languages_spoken: {english, spanish, russian, chinese, …. }
certificates: {infantry, airborne, pilot, tank_driver…..}
approved_equipment: {m4, rocket_launcher, shovel, super_secret_radio_thingy….}
approved_operations: {reconnaissance, logistics, invasion, cooking, ….}
awards_won: {medal_honor, purple_heart, ….}
… and so on.
One way to do this - the way I want to do this - is to have a personnel table and an attributes table:
personnel table => [id, name, rank, address…..]
personnel_attributes table => [personnel_id, attribute_id, value_id]
along with the associated attributes and values tables.
So if personnel_id=31415 is approved for logistics, there would be the following entry in the personnel_attributes table:
personnel_id | attribute_id | value_id
31415 | 3 | 2
where 3 = attribute_id for "approved_operations" and 2 = value_id for "logistics" (sorry formatting spaces didn't line up.)
Then a search to find all personnel who speak english OR spanish, AND who are infantry OR airborne, AND who can operate a shovel OR super_secret_radio_thingy would be something like:
SELECT t1.personnel_id
FROM personnel_attributes t1, personnel_attributes t2, personnel_attributes t3
WHERE ((t1.attribute_id = 1 and t1.value_id = 1) OR (t1.attribute_id = 1 and t1.value_id = 2))
AND ((t2.attribute_id = 2 and t2.value_id = 1) OR (t2.attribute_id = 2 and t2.value_id = 2))
AND ((t3.attribute_id = 3 and t3.value_id = 3) OR (t3.attribute_id = 3 and t3.value_id = 4))
AND t2.personnel_id = t1.personnel_id
AND t3.personnel_id = t1.personnel_id;
Assuming this isn't a totally stupid way to write the SQL query, the problem is that it's very slow (even with seemingly relevant indexes).
So I am toying with using bitwise operators instead, where each attribute is a column in a table and each value is a bit. The same search would be:
SELECT personnel_id FROM personnel_attributes
WHERE language & b'00000011'
AND certificates & b'00000011'
AND approved_operations & b'00001100';
I know this does a full table scan, but in my experiments with 350,000 sample personnel, and 16 attributes each, the first method took 20 seconds whereas the bitwise method took 38 milliseconds!
Am I doing something wrong here? Are these the performance results I should expect?
Thanks!
Using the bitwise operation will require evaluating all of the rows. I believe your problem can be solved with a change to your original SELECT statement and how you're joining your tables.
To make it a little easier to read, I've changed attribute values to words instead of integers so it's less confusing while reading through my example, but obviously you can leave them as integers and the concept would still work:
CREATE TABLE PERSONNEL (
ID INT,
NAME VARCHAR(20)
);
CREATE TABLE PERSONNEL_ATTRIBUTES (
PERSONNEL_ID INT,
ATTRIB_ID INT,
ATTRIB_VALUE VARCHAR(20)
);
INSERT INTO PERSONNEL VALUES (1, 'JIM SMITH');
INSERT INTO PERSONNEL VALUES (2, 'JANE DOE');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Spanish');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Russian');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Logistics');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Infantry');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 3, 'Infantry');
SELECT P.ID, P.NAME, PA1.ATTRIB_VALUE AS DESIRED_LANGUAGE, PA2.ATTRIB_VALUE AS APPROVED_OPERATION
FROM PERSONNEL P
JOIN PERSONNEL_ATTRIBUTES PA1 ON P.ID = PA1.PERSONNEL_ID AND PA1.ATTRIB_ID = 1
JOIN PERSONNEL_ATTRIBUTES PA2 ON P.ID = PA2.PERSONNEL_ID AND PA2.ATTRIB_ID = 3
WHERE PA1.ATTRIB_VALUE = 'Spanish' AND (PA2.ATTRIB_VALUE = 'Infantry' OR PA2.ATTRIB_VALUE = 'Airborne');
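When several values per attribute are acceptable (english OR spanish, as in the question), the joins above can return the same person more than once. One variation is to test each attribute with EXISTS, so every person appears at most once. The sketch below runs that variation against SQLite from Python with the same sample rows; the index idx_pa is an assumption added for illustration, and no timing claims are made.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personnel (id INT, name TEXT);
CREATE TABLE personnel_attributes (personnel_id INT, attrib_id INT, attrib_value TEXT);
-- hypothetical covering index for the EXISTS probes below
CREATE INDEX idx_pa ON personnel_attributes (personnel_id, attrib_id, attrib_value);
INSERT INTO personnel VALUES (1, 'JIM SMITH'), (2, 'JANE DOE');
INSERT INTO personnel_attributes VALUES
  (1, 1, 'English'), (1, 1, 'Spanish'), (1, 1, 'Russian'),
  (1, 3, 'Logistics'), (1, 3, 'Infantry'),
  (2, 1, 'English'), (2, 3, 'Infantry');
""")

# One EXISTS per attribute: OR within an attribute (IN list), AND across attributes.
rows = conn.execute("""
SELECT p.id, p.name
FROM personnel p
WHERE EXISTS (
        SELECT 1 FROM personnel_attributes pa
        WHERE pa.personnel_id = p.id
          AND pa.attrib_id = 1
          AND pa.attrib_value IN ('English', 'Spanish'))
  AND EXISTS (
        SELECT 1 FROM personnel_attributes pa
        WHERE pa.personnel_id = p.id
          AND pa.attrib_id = 3
          AND pa.attrib_value IN ('Infantry', 'Airborne'))
ORDER BY p.id
""").fetchall()

print(rows)
```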
I have the same issue: choosing between django-bitfield and a separate table for flags.
Inspired by your experiment, I used a 3.5m-record table (InnoDB) and ran count() and retrieve queries for both variants. The result was astonishing: approx. 5 sec vs. 40 sec; bitfield wins.
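For readers unfamiliar with the bitfield layout being compared here, this is roughly how it works: each value of an attribute gets one bit, a person's column holds the OR of their bits, and "has any of these values" becomes a single AND against a mask. The sketch below uses SQLite from Python; the bit assignments, the table name personnel_flags and the sample people are all made up for the illustration.

```python
import sqlite3

# Hypothetical bit assignments: one bit per value, per attribute.
LANGUAGE_BITS = {"english": 1 << 0, "spanish": 1 << 1, "russian": 1 << 2, "chinese": 1 << 3}
CERTIFICATE_BITS = {"infantry": 1 << 0, "airborne": 1 << 1, "pilot": 1 << 2, "tank_driver": 1 << 3}


def mask(bits, names):
    """OR together the bits for the given value names."""
    result = 0
    for name in names:
        result |= bits[name]
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel_flags ("
    "personnel_id INTEGER PRIMARY KEY, language INTEGER NOT NULL, certificates INTEGER NOT NULL)"
)

# Made-up sample people: (languages, certificates)
people = {
    1: (["english", "russian"], ["pilot"]),
    2: (["spanish"], ["infantry", "airborne"]),
    3: (["chinese"], ["infantry"]),
}
conn.executemany(
    "INSERT INTO personnel_flags VALUES (?, ?, ?)",
    [(pid, mask(LANGUAGE_BITS, langs), mask(CERTIFICATE_BITS, certs))
     for pid, (langs, certs) in people.items()],
)

wanted_languages = mask(LANGUAGE_BITS, ["english", "spanish"])          # 0b0011
wanted_certificates = mask(CERTIFICATE_BITS, ["infantry", "airborne"])  # 0b0011

# A non-zero AND means the person has at least one of the wanted values.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT personnel_id FROM personnel_flags "
        "WHERE (language & ?) != 0 AND (certificates & ?) != 0 "
        "ORDER BY personnel_id",
        (wanted_languages, wanted_certificates),
    )
]
print(matches)
```

The trade-off is that an ordinary index cannot help with the & predicate, so every row is examined; the bitfield version wins in the experiments above because each row is small and only one table is read.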