SQL query to select from two nested sub queries - mysql

This is a pretty hard SQL query on my homework assignment and I am kind of stuck. Any hints would be appreciated.
my query:
SELECT nest1.carname,
nest1.plndescription,
nest1.plancount,
nest2.totalmems,
Round(( nest1.plancount / nest2.totalmems ), 2) AS pct
FROM (SELECT carriers.carname,
p.plndescription,
Count(members.planid)AS plancount
FROM carriers,
plans p,
members
WHERE carriers.carrierid = p.carrierid
AND p.planid = members.planid
GROUP BY carriers.carname,
p.plndescription)nest1
NATURAL JOIN (SELECT carriers.carrierid,
Count(members.planid)AS totalmems
FROM carriers,
plans p,
members
WHERE carriers.carrierid = p.carrierid
AND p.planid = members.planid
GROUP BY carriers.carrierid)nest2
ORDER BY nest1.carname
my tables and relationships;
CREATE TABLE Carriers
( CarrierID varchar2(4) not null,
carName varchar2(35),
carAddress varchar2(50),
carCity varchar2(30),
carStCode varchar2(2),
carZip varchar2(10),
carPhone varchar2(10),
carWebSite varchar2(255),
carContactFirstname varchar2(35),
carContactLastName varchar2(35),
carContactEmail varchar2(255),
CONSTRAINT pk_CarrierID PRIMARY KEY (CarrierID)
);
CREATE TABLE Plans
( PlanID integer not null,
plnDescription varchar2(35),
plnCost decimal (8,2),
CarrierID varchar2(4),
CONSTRAINT pk_PlanID PRIMARY KEY (PlanID),
CONSTRAINT fk_CarrierID FOREIGN KEY (CarrierId) REFERENCES Carriers
);
CREATE TABLE Members
( MemberNo integer not null,
mbrFirstname varchar2(35),
mbrLastName varchar2(35),
mbrStreet varchar2(50),
mbrCity varchar2(30),
mbrState varchar2(2),
mbrZip varchar2(10),
mbrPhoneNo varchar2(10),
PlanID integer,
mbrEmail varchar2(255),
mbrDateEffective date,
employerID integer,
CONSTRAINT pk_MemberNo PRIMARY KEY (MemberNo),
CONSTRAINT fk_PlanID FOREIGN KEY (PlanId) REFERENCES Plans,
CONSTRAINT fk_employerID FOREIGN KEY (employerID) REFERENCES employers
);
the problem :
Create a query that will list all Carriers and their Plans along with a column that displays the number of members in that Plan, the total number of members serviced by the Carrier and the percent of the Carrier’s Members that are in that Plan. For Example – Blue Cross Blue Shield – would display as follows:
correct output:
Carrier                | Plan                  | PlanCount | TotalMems | Pct
Blue Cross Blue Shield | 2-Party Basic Medical | 10        | 22        | 45.45

Being lazy, I wouldn't want to type that much. I didn't check whether these parse correctly, but since it is homework it shouldn't be too easy.
If you were using MySQL, I would do it like this, using session variables with subqueries:
select c.carName As Carrier
, p.plnDescription As Plan
, count(1) As PlanCount
, (select @plnCount:=count(1) from members m where m.PlanId=p.PlanId) TotalMems
, (select @plnCount/count(1) from members) Pct
from carriers c
left join plans p on p.CarrierId=c.CarrierId
But your DDL indicates you are using Oracle (because of varchar2), so something more like:
select c.carName As Carrier
, p.plnDescription As Plan
, PlanCount
, PlanCount/tot Pct
from carriers c
join plans p on p.CarrierId=c.CarrierId
,(select sum(decode(m.planId,p.planId,1,0)) PlanCount, count(1) tot from members)

I would do it with the following. The last subquery is just a count of ALL MEMBERS, regardless of which plan a person is in. It runs on its own in the query execution plan, returns a single record, and is applied to every record of the rest of the query, so it needs no actual JOIN or WHERE clause.
Next, the first part of the FROM does a pre-aggregation on just the plan ID, counting the members per plan.
Then proper JOIN clauses pull in the plan for its description and the carrier for the carrier name, and you order the result by whatever you want...
SELECT
C.CarName,
P.PlnDescription,
PerPlan.PerPlanCount,
( PerPlan.PerPlanCount / AllPlans.AllMembersCount ) as PcntOfAllMembers
from
( select M.PlanID, COUNT(*) as PerPlanCount
from Members M
group by M.PlanID ) PerPlan
Join Plans P
ON PerPlan.PlanID = P.PlanID
Join Carriers C
ON P.CarrierID = C.CarrierID,
( select COUNT(*) AllMembersCount
from Members ) AllPlans
order by
C.CarName,
P.PlnDescription
You had a count of 10 for a plan count on your sample, but nothing to indicate multiple sub-plans for '2-Party Basic Medical', so I don't think that is a real column per the data.
HOWEVER, if the 10 is a representation of how many distinct plans that 'Blue Cross' has, and that '2-Party...' is one of those plans, that would be a different query.
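The report the question asks for (members per plan, members per carrier, and the plan's share of the carrier) can be sketched as below. This is only an illustration under stated assumptions: it runs on SQLite through Python rather than on Oracle or MySQL, the tables are reduced to the columns the report needs, and the carrier ID 'BCBS', the second plan name and the member rows are made up so that the first plan reproduces the 10 / 22 / 45.45 line from the question.

```python
import sqlite3

# Assumption: SQLite stands in for the real server, and the tables are
# cut down to the columns the report needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Carriers (CarrierID TEXT PRIMARY KEY, carName TEXT);
CREATE TABLE Plans (PlanID INTEGER PRIMARY KEY, plnDescription TEXT,
                    CarrierID TEXT REFERENCES Carriers);
CREATE TABLE Members (MemberNo INTEGER PRIMARY KEY,
                      PlanID INTEGER REFERENCES Plans);
INSERT INTO Carriers VALUES ('BCBS', 'Blue Cross Blue Shield');
INSERT INTO Plans VALUES (1, '2-Party Basic Medical', 'BCBS'),
                         (2, 'Family Basic Medical', 'BCBS');
""")
# Hypothetical rows: 10 members in plan 1 and 12 in plan 2 (22 in total).
conn.executemany("INSERT INTO Members VALUES (?, ?)",
                 [(n, 1 if n <= 10 else 2) for n in range(1, 23)])

rows = conn.execute("""
SELECT c.carName, p.plnDescription,
       COUNT(m.MemberNo) AS PlanCount,
       t.TotalMems,
       ROUND(100.0 * COUNT(m.MemberNo) / t.TotalMems, 2) AS Pct
FROM Carriers c
JOIN Plans p ON p.CarrierID = c.CarrierID
LEFT JOIN Members m ON m.PlanID = p.PlanID
-- one row per carrier holding that carrier's total member count
JOIN (SELECT p2.CarrierID, COUNT(*) AS TotalMems
      FROM Plans p2
      JOIN Members m2 ON m2.PlanID = p2.PlanID
      GROUP BY p2.CarrierID) t ON t.CarrierID = c.CarrierID
GROUP BY c.carName, p.plnDescription, t.TotalMems
ORDER BY c.carName, p.plnDescription
""").fetchall()

for row in rows:
    print(row)
```

Joining the per-carrier total on CarrierID (rather than cross joining a single grand total) is what makes the percentage relative to the carrier, which is how the assignment text words it.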

Related

Picking out and showing only the highest value from one column based on three columns in MySQL

Hi!
I have searched everywhere but cannot find the correct way to select only the highest result per student within the same topic. Studentnr is the ID of the student in the table, Emnekode is the topic code and Karakter is the result (grade), with A as highest and F as lowest.
Any suggestions on how I can make it work? I would appreciate it.
Here is the code so far with output under:
SELECT Eksamensresultat.*, Emnenavn, Studiepoeng
FROM Eksamensresultat, Emne
WHERE Eksamensresultat.Emnekode = Emne.Emnekode
ORDER BY RIGHT (Eksamensresultat.Emnekode, 4) ASC;
I want the output to be like this (except result B since A is higher for that student):
Here are the two tables:
CREATE TABLE Emne
(
Emnekode CHAR(8) NOT NULL,
Emnenavn CHAR(40) NOT NULL,
Studiepoeng DECIMAL(3, 1),
CONSTRAINT EmnekodePK PRIMARY KEY(Emnekode)
);
CREATE TABLE Eksamen
(
Emnekode CHAR(8) NOT NULL,
Dato DATE NOT NULL,
Romnr CHAR(4) NOT NULL,
CONSTRAINT EksamenPK PRIMARY KEY(Dato,Emnekode),
CONSTRAINT EksamenEmneFK FOREIGN KEY(Emnekode) REFERENCES Emne(Emnekode),
CONSTRAINT EksamenRomFK FOREIGN KEY(Romnr) REFERENCES Rom(Romnr)
);
Data:
INSERT INTO Emne (Emnekode, Emnenavn, Studiepoeng) VALUES
("PRG1000", "Grunnleggende programmering 1", 7.5),
("PRG1100", "Grunnleggende programmering 2", 7.5),
("WEB1100", "Webutvikling og HCI", 7.5),
("SYS1000", "Systemutvikling", 7.5),
("ORL1100", "Organisering", 7.5);
INSERT INTO Eksamensresultat(Karakter,Studentnr,Emnekode,Dato) VALUES
("A","240202","PRG1000","20210505"),
("C","240202","PRG1100","20210506"),
("B","240202","SYS1000","20210507"),
("A","225087","PRG1100","20210506"),
(NULL,"225087","SYS1000","20210507"),
(NULL,"240225","SYS1000","20210507"),
(NULL,"884642","SYS1000","20210507"),
("C","139959","PRG1000","20210505"),
("B","240202","PRG1000","20210606");
One way is to identify the right row:
SELECT Studentnr, Emnekode, min(Karakter)
FROM Eksamensresultat
GROUP BY 1, 2
then join that with your query to select the correct row of Eksamensresultat (only the group by column and aggregate are valid):
SELECT e.*, Emnenavn, Studiepoeng
FROM Eksamensresultat AS e
JOIN (
SELECT Studentnr, Emnekode, MIN(Karakter) AS Karakter
FROM Eksamensresultat
GROUP BY 1, 2
) AS best USING (Studentnr, Emnekode, Karakter)
JOIN Emne USING (Emnekode)
ORDER BY RIGHT(e.Emnekode, 4) ASC;
The other (more readable) option is to use a window function if your mysql is recent enough. See https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html. I would need a working schema to test that query for you.
If I understand correctly, you want ROW_NUMBER(), so you can choose one row per student/exam with the higher Karakter:
SELECT er.*
FROM (SELECT er.*, e.Emnenavn, e.Studiepoeng,
ROW_NUMBER() OVER (PARTITION BY er.Studentnr, er.Emnekode ORDER BY er.Karakter IS NULL, er.Karakter) as seqnum
FROM Eksamensresultat er JOIN
Emne e
ON er.Emnekode = e.Emnekode
) er
WHERE er.seqnum = 1
ORDER BY RIGHT(Emnekode, 4) ASC;
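For anyone who wants to try the "best grade per student and topic" idea without a MySQL server, here is a small sketch using the rows from the question. It runs on SQLite through Python, which is a swapped-in engine, and the column types of Eksamensresultat are an assumption because the question does not show that table's definition. Since A is the best grade, MIN(Karakter) picks the winner for each (Studentnr, Emnekode) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the question only shows the INSERT statement.
conn.execute("""CREATE TABLE Eksamensresultat (
    Karakter TEXT, Studentnr TEXT, Emnekode TEXT, Dato TEXT)""")
conn.executemany("INSERT INTO Eksamensresultat VALUES (?, ?, ?, ?)", [
    ("A", "240202", "PRG1000", "20210505"),
    ("C", "240202", "PRG1100", "20210506"),
    ("B", "240202", "SYS1000", "20210507"),
    ("A", "225087", "PRG1100", "20210506"),
    (None, "225087", "SYS1000", "20210507"),
    (None, "240225", "SYS1000", "20210507"),
    (None, "884642", "SYS1000", "20210507"),
    ("C", "139959", "PRG1000", "20210505"),
    ("B", "240202", "PRG1000", "20210606"),
])

# Join every result row to the best (alphabetically smallest) grade of
# its student/topic group. Rows whose grade is NULL never match the
# equality test, so ungraded exams drop out of this result.
best = conn.execute("""
SELECT e.Studentnr, e.Emnekode, e.Karakter, e.Dato
FROM Eksamensresultat AS e
JOIN (SELECT Studentnr, Emnekode, MIN(Karakter) AS Karakter
      FROM Eksamensresultat
      GROUP BY Studentnr, Emnekode) AS b
  ON b.Studentnr = e.Studentnr
 AND b.Emnekode = e.Emnekode
 AND b.Karakter = e.Karakter
ORDER BY e.Studentnr, e.Emnekode
""").fetchall()

for row in best:
    print(row)
```

Student 240202 sat PRG1000 twice (A and B); only the A row survives. If ungraded exams should stay in the output, an outer join or an explicit NULL check would be needed instead.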

SQL - Column in field list is ambiguous

I have two tables, BOOKING and WORKER. Basically there is a table for workers and a table that keeps track of what each worker has to do in a time frame, i.e. a booking. I'm trying to check whether a worker is available for a job, so I query the bookings to check whether the requested time has available workers between the start and end dates. However, I get stuck on the next part, which is returning the list of workers that do have that time available. I read that I could join the tables based on a shared column, so I tried an inner join on the WORKER_NAME column, but when I do this I get an "ambiguous" error. This leads me to believe I have misunderstood the concept. Does anyone understand what I'm trying to do and know how to do it, or know why I get the error below? Thanks!
CREATE TABLE WORKER (
ID INT NOT NULL AUTO_INCREMENT,
WORKER_NAME varchar(80) NOT NULL,
WORKER_CODE INT,
WORKER_WAGE INT,
PRIMARY KEY (ID)
)
CREATE TABLE BOOKING (
ID INT NOT NULL AUTO_INCREMENT,
WORKER_NAME varchar(80) NOT NULL,
START DATE NOT NULL,
END DATE NOT NULL,
PRIMARY KEY (ID)
)
query
SELECT *
FROM WORKERS
INNER JOIN BOOKING
ON WORKER_NAME = WORKER_NAME
WHERE (START NOT BETWEEN '2010-10-01' AND '2010-10-10')
ORDER BY ID
#1052 - Column 'WORKER_NAME' in on clause is ambiguous
In your query, the column "worker_name" exists in both tables; in this case, you must reference the table name as part of the column identifier.
SELECT *
FROM WORKERS
INNER JOIN BOOKING
ON workers.WORKER_NAME = booking.WORKER_NAME
WHERE (START NOT BETWEEN '2010-10-01' AND '2010-10-10')
ORDER BY workers.ID
In your query, the WORKER_NAME and ID columns exist in both tables, where WORKER_NAME retains the same meaning and ID is re-purposed; in this case, you must either specify that you are using WORKER_NAME as the join search condition or 'project away' (rename or omit) the duplicate ID column.
Because the ID columns are AUTO_INCREMENT, I assume (hope!) they have no business meaning. Therefore, they could both be omitted, allowing a natural join that will cause duplicate columns to be 'projected away'. This is one of those situations where one wishes SQL had a WORKER ( ALL BUT ( ID ) ) type syntax; instead, one is required to do it longhand. It might be easier in the long run to opt for a consistent naming convention and rename the columns to WORKER_ID and BOOKING_ID respectively.
You would also need to identify a business key to order on e.g. ( START, WORKER_NAME ):
SELECT *
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END FROM BOOKING ) AS B
WHERE ( START NOT BETWEEN '2010-10-01' AND '2010-10-10' )
ORDER BY START, WORKER_NAME;
This is good, but it's returning the start and end times as well. I just want the WORKER rows. I can't take START and END out, because then SQL doesn't recognize the WHERE clause.
Two approaches spring to mind: push the where clause to the subquery:
SELECT *
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END
FROM BOOKING
WHERE START NOT BETWEEN '2010-10-01' AND '2010-10-10' ) AS B
ORDER BY START, WORKER_NAME;
Alternatively, replace SELECT * with a list of columns you want to SELECT:
SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END FROM BOOKING ) AS B
WHERE START NOT BETWEEN '2010-10-01' AND '2010-10-10'
ORDER BY START, WORKER_NAME;
This error occurs when you reference a field that exists in both tables, so you should qualify it with the table name or alias. For instance, in the example below I write cod.coordinator so that the DBMS knows which coordinator I want:
SELECT project__number, surname, firstname, cod.coordinator
FROM coordinators AS co
JOIN hub_applicants AS ap ON co.project__number = ap.project_id
JOIN coordinator_duties AS cod ON co.coordinator = cod.email
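The ambiguity error and its fix can be reproduced in a few lines. The sketch below uses SQLite through Python instead of MySQL, so the error text differs from #1052, and it renames the date columns to START_DATE and END_DATE (an assumption made here because END is a keyword in SQLite). The worker names and bookings are invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WORKER (ID INTEGER PRIMARY KEY, WORKER_NAME TEXT NOT NULL);
CREATE TABLE BOOKING (ID INTEGER PRIMARY KEY, WORKER_NAME TEXT NOT NULL,
                      START_DATE TEXT NOT NULL, END_DATE TEXT NOT NULL);
INSERT INTO WORKER (WORKER_NAME) VALUES ('Ann'), ('Bob');
INSERT INTO BOOKING (WORKER_NAME, START_DATE, END_DATE)
VALUES ('Ann', '2010-10-05', '2010-10-06'),
       ('Bob', '2010-11-01', '2010-11-02');
""")

# Unqualified names: the engine cannot tell which table each one means.
try:
    conn.execute("""SELECT * FROM WORKER INNER JOIN BOOKING
                    ON WORKER_NAME = WORKER_NAME""").fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Qualified names resolve the ambiguity in ON, WHERE and ORDER BY alike.
rows = conn.execute("""
SELECT w.ID, w.WORKER_NAME, b.START_DATE
FROM WORKER AS w
INNER JOIN BOOKING AS b ON w.WORKER_NAME = b.WORKER_NAME
WHERE b.START_DATE NOT BETWEEN '2010-10-01' AND '2010-10-10'
ORDER BY w.ID
""").fetchall()
print(rows)
```

Short table aliases (w, b) keep the qualified version readable, and qualifying ID in the ORDER BY avoids hitting the same error one clause later.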

Mysql deduplicate records in single query

I have the following table:
CREATE TABLE `relations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`relationcode` varchar(25) DEFAULT NULL,
`email_address` varchar(100) DEFAULT NULL,
`firstname` varchar(100) DEFAULT NULL,
`latname` varchar(100) DEFAULT NULL,
`last_contact_date` varchar(25) DEFAULT NULL,
PRIMARY KEY (`id`)
)
In this table there are duplicates: relations with exactly the same relationcode and email_address. They can be in there twice or even 10 times.
I need a query that selects the ids of all records but excludes the ones that are in there more than once. Of those duplicated records, I would like to select only the record with the most recent last_contact_date.
I'm more into Oracle than MySQL. In Oracle I would be able to do it this way:
select * from (
select row_number () over (partition by relationcode order by to_date(last_contact_date,'dd-mm-yyyy')) rank,
id,
relationcode,
email_address ,
last_contact_date
from RELATIONS)
where rank = 1
But I can't figure out how to modify this query to work in MySQL. I'm not even sure it's possible to do the same thing in a single query in MySQL.
Any ideas?
Normal way to do this is a sub query to get the latest record and then join that against the table:-
SELECT RELATIONS.id, RELATIONS.relationcode, RELATIONS.email_address, firstname, latname, last_contact_date
FROM RELATIONS
INNER JOIN
(
SELECT relationcode, email_address, MAX(last_contact_date) AS latest_contact_date
FROM RELATIONS
GROUP BY relationcode, email_address
) Sub1
ON RELATIONS.relationcode = Sub1.relationcode
AND RELATIONS.email_address = Sub1.email_address
AND RELATIONS.last_contact_date = Sub1.latest_contact_date
It is possible to manually generate the kind of rank that your Oracle query uses using variables. Bit messy though!
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM
(
SELECT id, relationcode, email_address, firstname, latname, last_contact_date, @seq:=IF(@relationcode = relationcode AND @email_address = email_address, @seq + 1, 1) AS seq, @relationcode := relationcode, @email_address := email_address
FROM
(
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM RELATIONS
CROSS JOIN (SELECT @seq:=0, @relationcode := '', @email_address :='') Sub1
ORDER BY relationcode, email_address, last_contact_date DESC
) Sub2
) Sub3
WHERE seq = 1
This uses a sub query to initialise the variables. The sequence number is incremented if the relation code and email address are the same as on the previous row; if not, it is reset to 1. Either way it is stored in a field. The outer select then checks the sequence number (as a field, not as the variable) and only returns records where it is 1.
Note that I have done this as multiple sub queries, partly to make it clearer to you, but also to try to force the order in which MySQL executes it. There are a couple of possible issues with how MySQL says it may order the execution of things that could cause a problem. They never have for me, but with sub queries I would hope to force the order.
Here is a method that will work in both MySQL and Oracle. It rephrases the question as: Get me all rows from relations where the relationcode has no larger last_contact_date.
It works something like this:
select r.*
from relations r
where not exists (select 1
from relations r2
where r2.relationcode = r.relationcode and
r2.last_contact_date > r.last_contact_date
);
With the appropriate indexes, this should be pretty efficient in both databases.
Note: This assumes that last_contact_date is stored as a date, not as a string (as in your table example). Storing dates as strings is a really bad idea, and you should fix your data structure.
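A quick way to convince yourself that the NOT EXISTS form keeps exactly one row per duplicate group is to run it on a handful of rows. The sketch below uses SQLite through Python as a stand-in engine, with invented sample rows. Two things differ from the query above and are assumptions of this sketch: the dates are ISO strings (yyyy-mm-dd), which compare correctly as text, and the subquery also matches on email_address, since the question defines duplicates by both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE relations (
    id INTEGER PRIMARY KEY, relationcode TEXT, email_address TEXT,
    last_contact_date TEXT)""")
# Hypothetical rows: ids 1-3 are duplicates of each other, id 4 is unique.
conn.executemany("INSERT INTO relations VALUES (?, ?, ?, ?)", [
    (1, "R1", "a@example.com", "2013-01-10"),
    (2, "R1", "a@example.com", "2013-05-02"),
    (3, "R1", "a@example.com", "2012-11-30"),
    (4, "R2", "b@example.com", "2013-03-15"),
])

# Keep a row only if no other row of the same group has a later date.
ids = [r[0] for r in conn.execute("""
SELECT r.id
FROM relations r
WHERE NOT EXISTS (SELECT 1
                  FROM relations r2
                  WHERE r2.relationcode = r.relationcode
                    AND r2.email_address = r.email_address
                    AND r2.last_contact_date > r.last_contact_date)
ORDER BY r.id
""")]
print(ids)
```

If two duplicates share the same latest date, both are returned; a tie-breaker on id would be needed to force a single row in that case.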

Write a big sql query or handle it through code?

I have 2 tables built in this way:
Trips
- id
- organization_id REQUIRED
- collaboration_organization_id OPTIONAL
...other useless fields...
Organizations
- id
- name REQUIRED
...other useless fields...
Now I have been asked to create this type of report:
I want the sum of all trips for each organization, considering that if
they have a collaboration_organization_id it should count as 0.5;
obviously the organization in collaboration_organization_id gets +0.5
too
So whenever I have a trip that has organization_id AND collaboration_organization_id set, that trip counts as 0.5 for both organizations. If instead only organization_id is set, it counts as 1.
Now my question has two parts:
1.
Is it a good idea to "solve" the problem entirely in SQL?
I already know how to solve it in code; my current idea is "select all trips (only those 3 fields) and start counting in Ruby". Please consider that I'm using Ruby on Rails, so "no, because it will work only on MySQL" could still be a good reason.
2.
If the answer to point 1 is yes, I have no idea how to count 0.5 for each trip where it's required, because COUNT is a "throw-in-and-do-it" function.
I'm not familiar with ruby on rails, but this is how you can do this with MySQL.
Sample data:
CREATE TABLE Trips(
id int not null primary key,
organization_id int not null,
collaboration_organization_id int null
);
INSERT INTO Trips (id,organization_id,collaboration_organization_id)
VALUES
(1,1,5),
(2,1,1),
(3,1,2),
(4,11,1),
(5,1,null),
(6,2,null),
(7,10,null),
(8,6,2),
(9,1,3),
(10,1,4);
MySQL Query:
SELECT organization_id,
sum(CASE WHEN collaboration_organization_id IS null THEN 1 ELSE 0.5 End) AS number
FROM Trips
GROUP BY organization_id;
Try it out via: http://www.sqlfiddle.com/#!2/1b01d/107
EDIT: adding collaboration organization
Sample data:
CREATE TABLE Trips(
id int not null primary key,
organization_id int not null,
collaboration_organization_id int null
);
INSERT INTO Trips (id,organization_id,collaboration_organization_id)
VALUES
(1,1,5),
(2,1,1),
(3,1,2),
(4,11,1),
(5,1,null),
(6,2,null),
(7,10,null),
(8,6,2),
(9,1,3),
(10,1,4);
CREATE TABLE Organizations(
id int auto_increment primary key,
name varchar(30)
);
INSERT INTO Organizations (name)
VALUES
("Org1"),
("Org2"),
("Org3"),
("Org4"),
("Org5"),
("Org6"),
("Org7"),
("Org8"),
("Org9"),
("Org10"),
("Org11"),
("Org12"),
("Org13"),
("Org14"),
("Org15"),
("Org16");
MySQL query:
SELECT O.id, O.name,
sum(CASE WHEN T.collaboration_organization_id IS null THEN 1 ELSE 0.5 End) AS number
FROM Organizations AS O LEFT JOIN Trips AS T
ON T.organization_id = O.id OR T.collaboration_organization_id = O.id
WHERE T.collaboration_organization_id = O.id OR O.id = T.organization_id
GROUP BY O.id;
http://www.sqlfiddle.com/#!2/ee557/15
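The half-trip counting can be checked locally with the sample trips from this answer. The sketch below runs on SQLite through Python rather than MySQL, generates the sixteen organization names in a loop instead of listing them, and writes the join as a plain inner JOIN. Weighting each trip with SUM(CASE ...) instead of COUNT is what makes the 0.5 possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Trips (id INTEGER PRIMARY KEY,
                    organization_id INTEGER NOT NULL,
                    collaboration_organization_id INTEGER);
CREATE TABLE Organizations (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO Trips VALUES (1,1,5),(2,1,1),(3,1,2),(4,11,1),(5,1,NULL),
                         (6,2,NULL),(7,10,NULL),(8,6,2),(9,1,3),(10,1,4);
""")
conn.executemany("INSERT INTO Organizations VALUES (?, ?)",
                 [(n, f"Org{n}") for n in range(1, 17)])

# A trip is credited to an organization when it appears in either role;
# it weighs 1 without a collaborator and 0.5 with one.
totals = dict(conn.execute("""
SELECT O.id,
       SUM(CASE WHEN T.collaboration_organization_id IS NULL
                THEN 1.0 ELSE 0.5 END) AS number
FROM Organizations AS O
JOIN Trips AS T
  ON T.organization_id = O.id OR T.collaboration_organization_id = O.id
GROUP BY O.id
"""))
print(totals)
```

Organizations without any trip do not appear in the result. Note also that trip 2 names organization 1 in both columns and still counts only 0.5 for it, which may or may not be what the report intends.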

Get all posts from a specific category

The Situation
As some of you might already know from my previous questions, I'm currently developing a Blog-system.
This time, I'm stuck at getting all posts from a specific category, with their category.
Database
Here are the SQL-commands to create the three required tables.
Post
create table Post(
headline varchar(100),
date datetime,
content text,
author int unsigned,
public tinyint,
type int,
ID serial,
Primary Key (ID)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
author is the ID of the user who created the post, public determines if the post can be read from everyone or is just a draft and type determines if it's a blog-post (0) or something else.
Category
create table Kategorie(
name varchar(30),
short varchar(200),
ID serial,
Primary Key (name)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Post_Kategorie
create table Post_Kategorie(
post_ID bigint unsigned,
kategorie_ID bigint unsigned,
Primary Key (post_ID, kategorie_ID),
Foreign Key (post_ID) references Post(ID),
Foreign Key (kategorie_ID) references Kategorie(ID)
)ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The Query
This is my current query to get all posts tagged with a specific category, which is determined by the category's ID:
SELECT Post.headline, Post.date, Post.ID,
CONCAT(
"[", GROUP_CONCAT('{"name":"',Kategorie.name,'","id":',Kategorie.ID,'}'), "]"
) as "categorys"
FROM Post
INNER JOIN Post_Kategorie
ON Post.ID = Post_Kategorie.post_ID
INNER JOIN Kategorie
ON Post_Kategorie.kategorie_ID = 2
WHERE Post.public = 1
AND Post.type = 0
GROUP BY Post.headline, Post.date
ORDER BY Post.date DESC
LIMIT 0, 20
The query works for listing all posts tagged with a specific category, but the categorys column gets mixed up: every listed post shows all available categories (every category listed in the Kategorie table).
I'm sure the problem lies in the INNER JOIN condition, but I have no clue where. Please point me in the right direction.
I suspect there might be issues with your CONCAT function, as it mixes different types of quotation marks. I think "[" and "]" should be respectively '[' and ']'.
Otherwise, the problem does seem to be with one of the joins. In particular, INNER JOIN Kategorie does not specify the joining condition, which, I think, should be Post_Kategorie.Kategorie_ID = Kategorie.ID.
The entire query should thus be something like this:
SELECT Post.headline, Post.date, Post.ID,
CONCAT(
"[", GROUP_CONCAT('{"name":"',Kategorie.name,'","id":',Kategorie.ID,'}'), "]"
) as "categorys"
FROM Post
INNER JOIN Post_Kategorie
ON Post.ID = Post_Kategorie.post_ID
INNER JOIN Kategorie
ON Post_Kategorie.Kategorie_ID = Kategorie.ID
WHERE Post.public = 1
AND Post.type = 0
GROUP BY Post.headline, Post.date
HAVING MAX(CASE Post_Kategorie.kategorie_ID WHEN 2 THEN 1 ELSE 0 END) = 1
ORDER BY Post.date DESC
LIMIT 0, 20
The Post_Kategorie.kategorie_ID = 2 condition has been modified to a CASE expression and moved to the HAVING clause, and it is used together with the MAX() aggregate function. This works as follows:
If a post is tagged with a tag or tags belonging to Kategorie.ID = 2, the CASE expression will return 1, and MAX will evaluate to 1 too. Consequently, all the group will be valid and remain in the output.
If no tag the post is tagged with belongs to the said category, the CASE expression will never evaluate to 1, nor will MAX. As a result, the entire group will be discarded.
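The effect of moving the category test into HAVING can be seen on a tiny data set. This is a sketch only: it runs on SQLite through Python rather than MySQL, the tables are reduced to a few columns, the posts and category names are invented, and it groups by Post.ID and returns a plain comma-separated list instead of the JSON-style string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Post (ID INTEGER PRIMARY KEY, headline TEXT);
CREATE TABLE Kategorie (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Post_Kategorie (post_ID INTEGER, kategorie_ID INTEGER);
INSERT INTO Post VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
INSERT INTO Kategorie VALUES (1, 'news'), (2, 'sql');
-- post 1 has both categories, post 2 only 'news', post 3 only 'sql'
INSERT INTO Post_Kategorie VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

# The join keeps every category of every post; HAVING then keeps only
# the groups in which at least one row belongs to category 2.
rows = conn.execute("""
SELECT Post.ID, Post.headline, GROUP_CONCAT(Kategorie.name) AS categorys
FROM Post
JOIN Post_Kategorie ON Post.ID = Post_Kategorie.post_ID
JOIN Kategorie ON Post_Kategorie.kategorie_ID = Kategorie.ID
GROUP BY Post.ID, Post.headline
HAVING MAX(CASE WHEN Post_Kategorie.kategorie_ID = 2 THEN 1 ELSE 0 END) = 1
ORDER BY Post.ID
""").fetchall()

# The order of names inside GROUP_CONCAT is not guaranteed, so compare sets.
result = {post_id: set(names.split(",")) for post_id, _, names in rows}
print(result)
```

Post 1 is listed with both of its categories, post 3 with its single one, and post 2 is dropped. Filtering with WHERE kategorie_ID = 2 instead would also keep posts 1 and 3 but would show only the 'sql' category for each.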