MySQL joins involving aggregate data - mysql

What I require is a SQL query which can report on data from aggregate and singular tables. The current database I have is as follows.
CREATE TABLE IF NOT EXISTS `faults_days` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`employee_id` int(11) NOT NULL,
`day_date` date NOT NULL,
`actioned_calls_out` int(11) NOT NULL,
`actioned_calls_in` int(11) NOT NULL,
`actioned_tickets` int(11) NOT NULL,
)
CREATE TABLE IF NOT EXISTS `faults_departments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(40) NOT NULL,
)
CREATE TABLE IF NOT EXISTS `faults_employees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`team_id` int(11) NOT NULL,
`name` varchar(127) NOT NULL,
)
CREATE TABLE IF NOT EXISTS `faults_qos` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`qos_date` datetime NOT NULL,
`employee_id` int(11) NOT NULL,
`score` double NOT NULL,
)
CREATE TABLE IF NOT EXISTS `faults_teams` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`department_id` int(11) NOT NULL,
`name` varchar(40) NOT NULL,
)
A row in Day tracks a single employee's performance for a single day (number of calls taken, number of tickets actioned). A Qos is a measure of an employee's quality on a day (there can be more than one Qos per day - what I need to obtain is the average score). Also, a Qos can be performed on a day where the employee has no performance entry in the database, and this will still need to be shown on the report.
The required end result is 4 reports, which show the employee performance grouped by different columns. A breakdown of a single employee's performance per day, an employee's total performance over a period of time, a team's performance over a period of time, and a whole department's performance over a period of time.
My problem, is that my current queries are a little convoluted, and require two separate queries for the Day data, and the Qos data. My PHP application then combines the data before outputting the report. What I would like, is a single query which returns both total performance, and average quality scores.
The current queries I have to show employee performances are:
SELECT
`Employee`.`name` ,
`Team`.`name` ,
`Department`.`name` ,
SUM( `Day`.`actioned_calls_in` ) + SUM( `Day`.`actioned_calls_out` ) ,
SUM( `Day`.`actioned_tickets` )
FROM
`faults_days` AS `Day`
JOIN
`faults_employees` AS `Employee` ON `Day`.`employee_id` = `Employee`.`id`
JOIN
`faults_teams` AS `Team` ON `Employee`.`team_id` = `Team`.`id`
JOIN
`faults_departments` AS `Department` ON `Team`.`department_id` = `Department`.`id`
WHERE
`Day`.`day_date` >= '2011-06-01'
AND `Day`.`day_date` <= '2011-06-07'
GROUP BY `Employee`.`id`
WITH ROLLUP
and
SELECT
`Employee`.`name` ,
`Team`.`name` ,
`Department`.`name` ,
COUNT( `Qos`.`score` ) ,
AVG( `Qos`.`score` )
FROM
`faults_qos` AS `Qos`
JOIN
`faults_employees` AS `Employee` ON `Qos`.`employee_id` = `Employee`.`id`
JOIN
`faults_teams` AS `Team` ON `Employee`.`team_id` = `Team`.`id`
JOIN
`faults_departments` AS `Department` ON `Team`.`department_id` = `Department`.`id`
WHERE
`Qos`.`qos_date` >= '2011-06-01'
AND `Qos`.`qos_date` <= '2011-06-07'
GROUP BY `Employee`.`id`
WITH ROLLUP
I have also tried simply joining the Qos table, but because it returns multiple rows it messes up the SUM() totals, and also has problems due to the missing FULL OUTER JOIN functionality.
EDIT:
I've made some small progress with this. It looks like using subqueries is the way to go, but everything I'm doing is pure guesswork. Here's what I've got so far, its only showing a row if there's an entry in both the Day and Qos tables, which is not what I want, and I've no idea how to expand it to include the various groupings described above.
SELECT
`Employee`.`name` ,
`Team`.`name` ,
`Department`.`name`,
`Day`.`Calls`,
`Day`.`Tickets`,
`Qos`.`NumQos`,
`Qos`.`Score`
FROM `faults_employees` AS `Employee`
JOIN
`faults_teams` AS `Team` ON `Employee`.`team_id` = `Team`.`id`
JOIN
`faults_departments` AS `Department` ON `Team`.`department_id` = `Department`.`id`
JOIN
(SELECT
`Day`.`employee_id` AS `eid`,
SUM(`Day`.`actioned_calls_in`) + SUM(`Day`.`actioned_calls_out`) AS `Calls`,
SUM(`Day`.`actioned_tickets`) AS `Tickets`
FROM `faults_days` AS `Day`
WHERE
`Day`.`day_date` = '2011-03-02'
GROUP BY `Day`.`employee_id`
) AS `Day`
ON `Day`.`eid` = `Employee`.`id`
JOIN
(SELECT
`Qos`.`employee_id` AS qid,
COUNT(`Qos`.`id`) AS `NumQos`,
AVG(`Qos`.`score`) AS `Score`
FROM `faults_qos` AS `Qos`
WHERE
`Qos`.`qos_date` = '2011-03-02'
GROUP BY `Qos`.`employee_id`
) AS `Qos`
ON `Qos`.`qid` = `Employee`.`id`
GROUP BY `Employee`.`id`

You do want the left joins on the fault_qos and fault_days subqueries. That's what will give you a result even if there isn't a corresponding row in one or both. A left join says that the value is necessary in the table(s) to the left of the join that are involved in the join but not the one on the right. I haven't tested this, and it's late, so I might not be thinking clearly, but if you change your query to this it should work:
SELECT
`Employee`.`name` ,
`Team`.`name` ,
`Department`.`name`,
`Day`.`Calls`,
`Day`.`Tickets`,
`Qos`.`NumQos`,
`Qos`.`Score`
FROM `faults_employees` AS `Employee`
JOIN
`faults_teams` AS `Team` ON `Employee`.`team_id` = `Team`.`id`
JOIN
`faults_departments` AS `Department` ON `Team`.`department_id` = `Department`.`id`
LEFT JOIN
(SELECT
`Day`.`employee_id` AS `eid`,
SUM(`Day`.`actioned_calls_in`) + SUM(`Day`.`actioned_calls_out`) AS `Calls`,
SUM(`Day`.`actioned_tickets`) AS `Tickets`
FROM `faults_days` AS `Day`
WHERE
`Day`.`day_date` = '2011-03-02'
GROUP BY `Day`.`employee_id`
) AS `Day`
ON `Day`.`eid` = `Employee`.`id`
LEFT JOIN
(SELECT
`Qos`.`employee_id` AS qid,
COUNT(`Qos`.`id`) AS `NumQos`,
AVG(`Qos`.`score`) AS `Score`
FROM `faults_qos` AS `Qos`
WHERE
`Qos`.`qos_date` = '2011-03-02'
GROUP BY `Qos`.`employee_id`
) AS `Qos`
ON `Qos`.`qid` = `Employee`.`id`
GROUP BY `Employee`.`id`

Related

Mysql Inner join - how to use value of first table column in join clause

In my "bookings" table, each booking has a number of persons and an "event_time" , which is one of three time slots which is bookable.
In my query I am trying to return how many free seats there are left for each restaurant and time slot (event_time number)
I select restaurants and do an INNER JOIN to include the bookings table, but I would need access to the "number_of_seats_max" column from the restaurants table inside the inner join, which does not seem possible.
Here is fiddle.
Tables:
CREATE TABLE `restaurants` (
`id` int(10) UNSIGNED NOT NULL,
`title` text COLLATE utf8mb4_unicode_ci,
`number_of_seats_max` int(11) DEFAULT NULL
);
CREATE TABLE `bookings` (
`id` int(10) UNSIGNED NOT NULL,
`event_date` timestamp NULL DEFAULT NULL,
`event_time` int(11) NOT NULL,
`number_of_persons` int(11) NOT NULL,
`restaurant_id` int(11) NOT NULL
);
The below query works, but in this case I have hard coded "80" instead of the max seats column ( r.number_of_seats_max ). Thats the column I need to use. If you put r.number_of_seats_max instead, you get the error "unknown column".
SELECT r.title, r.number_of_seats_max, innerquery.free_seats_left,
innerquery.num_persons_booked
FROM restaurants r
INNER JOIN(
select
restaurant_id,
SUM(number_of_persons) as num_persons_booked,
(80 - SUM(number_of_persons)) AS free_seats_left // <-- 80 is hard coded
from bookings
WHERE event_date = '2019-07-18'
group by event_time,restaurant_id
ORDER BY free_seats_left DESC
) as innerquery
ON innerquery.restaurant_id = r.id;
How can I solve it?
Do the subtraction in the main query, not the subquery.
SELECT r.title, innerquery.event_time, r.number_of_seats_max,
r.number_of_seats_max - innerquery.num_persons_booked AS free_seats_left,
innerquery.num_persons_booked
FROM restaurants r
INNER JOIN(
select
restaurant_id,
event_time,
SUM(number_of_persons) as num_persons_booked
from bookings
WHERE event_date = '2019-07-18'
group by event_time,restaurant_id
) as innerquery
ON innerquery.restaurant_id = r.id
ORDER BY free_seats_left DESC
I added event_time to the SELECT list of both the subquery and the main query, so you can show the available seats for each time slot.

MysqL big table query optimization

I have a chatting application. I have an api which returns list of users who the user talked. But it takes a long to mysql return a list messages when it reachs 100000 rows of data.
This is my messages table
CREATE TABLE IF NOT EXISTS `messages` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`fromid` int(11) NOT NULL,
`toid` int(11) NOT NULL,
`message` text NOT NULL,
`attachments` text NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`date` datetime NOT NULL,
`delete` varchar(50) NOT NULL,
`uuid_read` varchar(250) NOT NULL,
PRIMARY KEY (`_id`),
KEY `fromid` (`fromid`,`toid`,`status`,`delete`,`uuid_read`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=118561 ;
and this is my users table (simplified)
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(50) DEFAULT '',
`sex` tinyint(1) DEFAULT '0',
`status` varchar(255) DEFAULT '',
`avatar` varchar(30) DEFAULT '0',
`last_active` datetime DEFAULT NULL,
`active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=15523 ;
And here is my query (for user with id 1930)
select SQL_CALC_FOUND_ROWS `u_id`, `id`, `login`, `sex`, `birthdate`, `avatar`, `online_status`, SUM(`count`) as `count`, SUM(`nr_count`) as `nr_count`, `date`, `last_mesg` from
(
(select `m`.`fromid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, (COUNT(`m`.`_id`)-SUM(`m`.`status`)) as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`fromid`=`m`.`fromid`) left join `users` as u on `u`.`id`=`m`.`fromid` where `m`.`toid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
UNION
(select `m`.toid as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, 0 as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`toid`=`m`.`toid`) left join `users` as u on `u`.`id`=`m`.`toid` where `m`.`fromid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
order by `date` desc ) as `f` group by `u_id` order by `date` desc limit 0,10
Please help to optimize this query
What I need,
Who user talked to (name, sex, and etc)
What was the last message (from me or to me)
Count of messages (all)
Count of unread messages (only to me)
The query works well, but takes too long.
The output must be like this
You have some design problems on your query and database.
You should avoid keywords as column names, as that delete column or the count column;
You should avoid selecting columns not declared in the group by without an aggregation function... although MySQL allows this, it's not a standard and you don't have any control on what data will be selected;
Your not like construction may cause a bad behavior on your query because '%1930;%' may match 11930; and 11930 is not equal to 1930;
You should avoid like constructions starting and ending with % wildcard, which will cause the text processing to take longer;
You should design a better way to represent a message deletion, probably a better flag and/or another table to save any important data related with the action;
Try to limit your result before the join conditions (with a derived table) to perform less processing;
I tried to rewrite your query the best way I understood it. I've executed my query in a messages table with ~200.000 rows and no indexes and it performed in 0,15 seconds. But, for sure you should create the right indexes to help it perform better when the amount of data increase.
SELECT SQL_CALC_FOUND_ROWS
u.id,
u.login,
u.sex,
u.birthdate,
u.avatar,
u.last_active AS online_status,
g._count,
CASE WHEN m.toid = 1930
THEN g.nr_count
ELSE 0
END AS nr_count,
m.`date`,
m.message AS last_mesg
FROM
(
SELECT
MAX(_id) AS _id,
COUNT(*) AS _count,
COUNT(*) - SUM(m.status) AS nr_count
FROM messages m
WHERE 1=1
AND m.`delete` NOT LIKE '%1930;%'
AND
(0=1
OR m.fromid = 1930
OR m.toid = 1930
)
GROUP BY
CASE WHEN m.fromid = 1930
THEN m.toid
ELSE m.fromid
END
ORDER BY MAX(`date`) DESC
LIMIT 0, 10
) g
INNER JOIN messages AS m ON 1=1
AND m._id = g._id
LEFT JOIN users AS u ON 0=1
OR (m.fromid <> 1930 AND u.id = m.fromid)
OR (m.toid <> 1930 AND u.id = m.toid)
ORDER BY m.`date` DESC
;

Writing a JOIN query with two counts from joined table

I've spent a few hours fighting with this, but I can't get the counts to work. Hopefully someone can help?!
I have a project table and task table, linked on the project_id. I can get the project_id, project_name, and the status_id with the query below:
SELECT
a.project_id,
a.project_name,
b.status_id
FROM project_list as a
INNER JOIN task_list as b
ON a.project_id=b.project_id
I'd like to select a single record for each project and add two count fields based on the status_id. In pseudo code:
SELECT
a.project_id,
a.project_name,
(SELECT COUNT(*) FROM task_list WHERE status_id < 3) as not_completed,
(SELECT COUNT(*) FROM task_list WHERE status_id = 3) as completed
FROM project_list as a
INNER JOIN task_list as b
ON a.project_id=b.project_id
GROUP BY project_id
My create table scripts are below:
CREATE TABLE `project_list` (
`project_id` int(11) NOT NULL AUTO_INCREMENT,
`topic_id` int(11) DEFAULT NULL,
`project_name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`project_id`)
)
CREATE TABLE `task_list` (
`task_id` int(11) NOT NULL AUTO_INCREMENT,
`project_id` int(11) DEFAULT NULL,
`task_name` varchar(45) DEFAULT NULL,
`status_id` int(11) DEFAULT '0',
PRIMARY KEY (`task_id`)
)
Any help is much appreciated. Thanks!
EDIT: ANSWER:
SELECT
a.project_id,
project_name,
SUM(status_id != 3) AS not_completed,
SUM(status_id = 3) AS completed,
SUM(status_id IS NOT NULL) as total
FROM tasks.project_list as a
INNER JOIN tasks.task_list as b
ON a.project_id=b.project_id
GROUP BY a.project_id
The problem is that in your subqueries you are counting all the rows in the whole table rather than just the rows that have the correct project_id. You could fix this by modifying the WHERE clause in each of your subqueries.
(SELECT COUNT(*)
FROM task_list AS c
WHERE c.status_id < 3
AND a.project_id = c.project_id)
However a simpler approach is to use SUM with a boolean condition instead of COUNT to count the rows that match the condition:
SELECT
a.project_id,
a.project_name,
SUM(b.status_id < 3) AS not_completed,
SUM(b.status_id = 3) AS completed,
FROM project_list as a
INNER JOIN task_list as b
ON a.project_id = b.project_id
GROUP BY project_id
This works because TRUE evaluates to 1 and FALSE evaluates to 0.

Need help creating SQL query

I have thus two tables:
CREATE TABLE `workers` (
`id` int(7) NOT NULL AUTO_INCREMENT,
`number` int(7) NOT NULL,
`percent` int(3) NOT NULL,
`order` int(7) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE `data` (
`id` bigint(15) NOT NULL AUTO_INCREMENT,
`workerId` int(7) NOT NULL,
PRIMARY KEY (`id`)
);
I want to return the first worker (order by order ASC) that his number of rows in the table data times percent(from table workers) /100 is smaller than number(from table workers.
I have tried this query:
SELECT workers.id, COUNT(data.id) AS `countOfData`
FROM `workers` as workers, `data` as data
WHERE data.workerId = workers.id
AND workers.percent * `countOfData` < workers.number
LIMIT 1
But I get the error:
#1054 - Unknown column 'countOfData' in 'where clause'
This should work:
SELECT A.id
FROM workers A
LEFT JOIN (SELECT workerId, COUNT(*) AS Quant
FROM data
GROUP BY workerId) B
ON A.id = B.workerId
WHERE (COALESCE(Quant,0) * `percent`)/100 < `number`
ORDER BY `order`
LIMIT 1
You could calculate the number of rows per worker in a subquery. The subquery can be joined to the worker table. If you use a left join, a worker with no data rows will be considered:
select *
from workers w
left join
(
select workerId
, count(*) as cnt
from data
group by
workerId
) d
on w.id = d.workerId
where coalesce(d.cnt, 0) * w.percent / 100 < w.number
order by
w.order
limit 1

sql join on two fields in one table

i have bookings table which has two people- i want to return person_1 as a row, person_2 as a new row but with the person's id related to the people table
This is as far as i got-but doesnt pull in booking info
SELECT people.* FROM (
(select booking.person_1 as id from booking)
union ALL
(select booking.person_2 as id from booking)
) as peopleids
join people on people.id = peopleids.id;
heres my structure
CREATE TABLE IF NOT EXISTS `booking` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`slot` enum('morning_drive','afternoon_loop','return_drive') NOT NULL,
`type` enum('911','vintage_911') NOT NULL,
`car` int(11) NOT NULL,
`person_1` int(11) DEFAULT NULL,
`person_2` int(11) DEFAULT NULL,
`dated` date NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
CREATE TABLE IF NOT EXISTS `people` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(50) NOT NULL,
`last_name` varchar(50) NOT NULL,
`organisation` varchar(100) NOT NULL,
`event_date` date NOT NULL,
`wave` varchar(20) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;
any ideas on how i could get a result set like- person.first_name, person.last_name, person.organisation, booking.dated, person.car, person.slot. im struggling with having two fields and having them to relate them into the one list
update for anyone interested in this and joining a 3rd table
heres my final query with php vars to pull in my certain dates and slots and also join a third table
SELECT peopleids.id,
peopleids.car,
cars.nr,
p.first_name,
p.last_name,
p.organisation,
p.event_date,
p.wave
FROM (SELECT booking.car, booking.person_1 as id FROM booking WHERE booking.dated = '".$date."' AND booking.`slot` = '".$slot."'
union ALL SELECT booking.car, booking.person_2 as id FROM booking WHERE booking.dated = '".$date."' AND booking.`slot` = '".$slot."'
) as peopleids
LEFT JOIN people p ON p.id = peopleids.id LEFT JOIN cars on cars.id = peopleids.car;
SELECT
ag.id,
p.first_name,
p.last_name,
p.organisation,
p.event_date,
p.wave
FROM (
SELECT booking.person_1 as id, booking.Car as car FROM booking
union ALL
SELECT booking.person_2 as id, booking.Car as car FROM booking
) as ag
JOIN people p ON people.id = ag.id;
INNER | LEFT JOIN Cars c ON c.ID = ag.car
select people.first_name as firstname,
people.last_name as lastname,
people.organisation,
booking.dated,
booking.car,
booking.slot from booking
left join people on booking.person_1 = people.id
OR
select people.first_name as firstname,
people.last_name as lastname,
people.organisation,
booking.dated,
booking.car,
booking.slot
from booking
left join people on booking.person_1 = people.id or booking.person_2 = people.id
check that...if you still need help will help you