MySQL self join: get row based on MAX and HAVING

I need to COUNT rows from two tables, but only the rows with the highest g_event_id per agent, and only among events where events.event_id IN (30, 31, 32, 33). Rows with any other event_id should not be taken into account.
My tables:
CREATE TABLE event_parties
(`g_event_id` int, `agent_id` int)
;
INSERT INTO event_parties
(`g_event_id`, `agent_id`)
VALUES
(2917, '2'),
(2918, '2'),
(2919, '2'),
(3067, '3'),
(3078, '3'),
(3079, '1'),
(3082, '1'),
(3917, '2'),
(3918, '2'),
(3919, '2'),
(4067, '3'),
(4078, '3'),
(4079, '1'),
(5067, '3'),
(5078, '3'),
(5079, '1'),
(6067, '3'),
(6078, '3'),
(6079, '1'),
(7067, '3'),
(7078, '3'),
(7079, '1'),
(8067, '3'),
(8078, '3'),
(8079, '1')
;
CREATE TABLE events
(`g_event_id` int, `event_id` int, `event_time` datetime)
;
INSERT INTO events
(`g_event_id`, `event_id`, `event_time`)
VALUES
(2917, '29', '2016-10-19 15:24:25'),
(2918, '31', '2016-10-19 15:24:28'),
(2919, '21', '2016-10-19 15:29:46'),
(3067, '29', '2016-10-20 15:33:46'),
(3078, '23', '2016-10-21 15:29:46'),
(3079, '29', '2016-10-20 15:34:46'),
(3082, '30', '2016-10-21 15:42:46'),
(3917, '29', '2016-10-19 15:24:25'),
(3918, '31', '2016-10-19 15:24:28'),
(3919, '21', '2016-10-19 15:29:46'),
(4067, '29', '2016-10-20 15:33:46'),
(4078, '23', '2016-10-21 15:29:46'),
(4079, '29', '2016-10-20 15:34:46'),
(5067, '29', '2016-10-20 15:33:46'),
(5078, '23', '2016-10-21 15:29:46'),
(5079, '29', '2016-10-20 15:34:46'),
(6067, '29', '2016-10-20 15:33:46'),
(6078, '23', '2016-10-21 15:29:46'),
(6079, '29', '2016-10-20 15:34:46'),
(7067, '29', '2016-10-20 15:33:46'),
(7078, '23', '2016-10-21 15:29:46'),
(7079, '29', '2016-10-20 15:34:46'),
(8067, '29', '2016-10-20 15:33:46'),
(8078, '23', '2016-10-21 15:29:46'),
(8079, '29', '2016-10-20 15:34:46')
;
The select is supposed to give me the status of call center agents: I want to count how many agents (agent_id) are in each state (event_id). As the table "events" just holds the events of the agents, I only need to count the latest g_event_id (the one with the highest value) for each agent_id, and the tricky part is that I only want to count rows where event_id IN (30, 31, 32, 33).
So basically, select the row with the highest g_event_id (and event_id IN (30, 31, 32, 33)) for each agent_id.
I need a JOIN between these two tables with g_event_id as the ID. The field g_event_id is the key and only appears once. I need all fields in table events, and I need the row with the highest g_event_id or the highest event_time.
Like this:
event_id N_events
-------- ----------
31 1
30 1
I have this select so far:
SELECT event_id,
COUNT(events.event_id) AS N_events
FROM event_parties
INNER JOIN events USING (g_event_id)
LEFT JOIN event_parties AS later_event
ON (later_event.agent_id = event_parties.agent_id
AND later_event.g_event_id > event_parties.g_event_id)
WHERE later_event.g_event_id IS NULL AND event_parties.agent_id != 0 AND events.`event_id` IN (30,31,32,33)
GROUP BY events.event_id
The problem with the select above is that it first picks each agent's row with the highest g_event_id overall and only then applies the event_id filter. I want to restrict the rows to events.event_id IN (30, 31, 32, 33) first and then count the rows with the highest g_event_id per agent.
I have been trying to use HAVING after the GROUP BY (HAVING events.event_id IN (30,31,32,33)) without any success.

This query should give you your result:
select e.event_id, count(stats.agent_id) as N_count
from (
select max(p.g_event_id) as g_event_id, p.agent_id
from events e
join event_parties p
on e.g_event_id = p.g_event_id
where e.event_id in (30,31,32,33)
group by p.agent_id
) as stats
join events e
on e.g_event_id = stats.g_event_id
group by e.event_id;
The inner query (stats) first retrieves the latest relevant status of each agent: it will get the largest g_event_id for each agent_id with an event_id in the given range (so at most one row for each agent).
It will then be joined with the events-table to retrieve the actual event_id for this g_event_id; then it counts the number of agents per event_id.
As worked out in the comments, this assumes that g_event_id is the primary key for both tables (but especially for events), and that the newest status is given by the largest g_event_id, not the event_time.
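The behaviour described above can be checked on a subset of the question's sample rows. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere); the query text is the one from the answer, with an ORDER BY added for a stable result.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the query itself is unchanged.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_parties (g_event_id INTEGER PRIMARY KEY, agent_id INTEGER);
CREATE TABLE events (g_event_id INTEGER PRIMARY KEY, event_id INTEGER);
-- Subset of the question's sample rows.
INSERT INTO event_parties VALUES
  (2917, 2), (2918, 2), (2919, 2), (3918, 2),
  (3067, 3), (3078, 3),
  (3079, 1), (3082, 1), (4079, 1);
INSERT INTO events VALUES
  (2917, 29), (2918, 31), (2919, 21), (3918, 31),
  (3067, 29), (3078, 23),
  (3079, 29), (3082, 30), (4079, 29);
""")

status_counts = conn.execute("""
SELECT e.event_id, COUNT(stats.agent_id) AS N_count
FROM (
    -- Latest relevant event per agent: filter first, then take MAX.
    SELECT MAX(p.g_event_id) AS g_event_id, p.agent_id
    FROM events e
    JOIN event_parties p ON e.g_event_id = p.g_event_id
    WHERE e.event_id IN (30, 31, 32, 33)
    GROUP BY p.agent_id
) AS stats
JOIN events e ON e.g_event_id = stats.g_event_id
GROUP BY e.event_id
ORDER BY e.event_id
""").fetchall()

print(status_counts)  # [(30, 1), (31, 1)]
```

Agent 3 has no event in the 30-33 range, so it does not appear in the inner query at all and is not counted.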

Related

SQL query to get number of clients with last statement equal connected

I need to make a SQL query
table 'records' structure:
contact_id(integer),
client_id(integer),
worker_id(integer),
statement_status(varchar),
contact_ts(timestamp)
It has to show the following:
current date
number of clients whose last statement_status was 'interested'
number of clients whose last statement_status was 'not_interested' and whose previous status was 'not_present'
Could somebody help?
sample data:
contact_id client_id contact_ts worker_id statement_status
'1', '181', '2017-09-24 03:38:31.000000', '107', 'voicemail'
'2', '72', '2017-09-23 09:32:38.000000', '10', 'not_interested'
'3', '277', '2017-09-22 07:06:16.000000', '119', 'interested'
'4', '36', '2017-09-21 04:39:57.000000', '118', 'not_present'
'5', '33', '2017-09-20 04:12:12.000000', '161', 'voicemail'
'6', '244', '2017-09-19 02:26:30.000000', '13', 'not_interested'
'7', '346', '2017-09-18 02:30:35.000000', '255', 'interested'
'8', '128', '2017-09-17 06:20:13.000000', '52', 'not_present'
'9', '33', '2017-09-16 08:58:02.000000', '188', 'not_present'
'10', '352', '2017-09-15 08:18:40.000000', '324', 'not_interested'
'11', '334', '2017-09-14 04:27:40.000000', '373', 'interested'
'12', '2', '2017-09-13 08:44:40.000000', '40', 'not_present'
'13', '33', '2017-09-12 03:46:16.000000', '252', 'voicemail'
'14', '366', '2017-09-11 04:31:22.000000', '78', 'not_interested'
'15', '184', '2017-09-10 06:08:01.000000', '289', 'interested'
'16', '184', '2017-09-09 05:45:56.000000', '124', 'not_present'
'17', '102', '2017-09-08 07:09:30.000000', '215', 'voicemail'
'18', '140', '2017-09-07 08:09:18.000000', '196', 'not_interested'
'19', '315', '2017-09-06 05:13:40.000000', '242', 'interested'
'20', '268', '2017-09-05 07:41:40.000000', '351', 'not_present'
'21', '89', '2017-09-04 05:32:05.000000', '232', 'voicemail'
desired output:
Time, interested, not-interested
2017-09-10 06:08:01, 5, 5
I tried something with sub queries, but it obviously doesn't work:
SELECT
GETDATE()
,(select count(*)
from record a
where (select statement_status
from record
where client_id == a.client_id
order by a.contact_ts
limit 1) == "interested"
group by a.contact_id)
,(select count(*)
from record a
where (select (select statement_status
from record
where client_id == a.client_id
order by a.contact_ts
limit 2) order by a.contact_ts desc limit 1) == "interested"
and
(select statement_status
from record
where client_id == a.client_id
order by a.contact_ts
limit 1) == "interested"
group by a.contact_id)
from record b;
How should I use the inner selects?
I must write a poem, because most of my post is a code.
So maybe something from "Dead man"?
“Don't let the sun burn a hole in your ass, William Blake. Rise now, and drive your cart and plough over the bones of the dead!”
;)
Try something like this:
WITH status AS (
SELECT DISTINCT client_id,
first_value(statement_status) OVER w1 AS last_status,
nth_value(statement_status, 2) OVER w1 AS prev_status
FROM records
WINDOW w1 AS (PARTITION BY client_id ORDER BY contact_ts DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
)
SELECT CURRENT_DATE(),
SUM(last_status = 'interested') AS interested,
SUM(last_status = 'not_interested' AND prev_status = 'not_present') AS not_interested
FROM status
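To see what the window functions return, here is a minimal sketch run through Python's sqlite3 module (a stand-in for MySQL, and it assumes the bundled SQLite is 3.25 or newer so that window functions are available). The four clients and their rows are made-up illustration data, not the question's sample, and the date column is left out because only the two counts are of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (client_id INTEGER, statement_status TEXT, contact_ts TEXT);
-- Hypothetical rows: two contacts for clients 1, 2 and 4, one for client 3.
INSERT INTO records VALUES
  (1, 'not_present',    '2017-09-01'),
  (1, 'not_interested', '2017-09-02'),
  (2, 'voicemail',      '2017-09-01'),
  (2, 'interested',     '2017-09-02'),
  (3, 'interested',     '2017-09-01'),
  (4, 'interested',     '2017-09-01'),
  (4, 'not_interested', '2017-09-02');
""")

interested, not_interested = conn.execute("""
WITH status AS (
    -- One row per client: newest status and the one before it (NULL if none).
    SELECT DISTINCT client_id,
           first_value(statement_status) OVER w1 AS last_status,
           nth_value(statement_status, 2) OVER w1 AS prev_status
    FROM records
    WINDOW w1 AS (PARTITION BY client_id ORDER BY contact_ts DESC
                  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
)
SELECT SUM(last_status = 'interested'),
       SUM(last_status = 'not_interested' AND prev_status = 'not_present')
FROM status
""").fetchone()

print(interested, not_interested)  # 2 1
```

Clients 2 and 3 end on 'interested'; only client 1 ends on 'not_interested' directly after 'not_present', while client 4 is excluded because its previous status was 'interested'.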

Order of SQL Query Clauses

I ran into a situation where I don't understand what SQL is doing. I have the following table and want to list every coffee that has the highest number of rating = 5 entries, together with that number.
create table likes
(
CName varchar(30),
UName varchar(30),
Rating int
);
insert into likes (CName, UName, Rating)
values ('Java', 'Klaus', '5'),
('Super', 'Klaus', '5'),
('MP', 'Klaus', '3'),
('Java', 'Marc', '5'),
('Mp', 'Marc', '5'),
('Super', 'Marc', '2'),
('Java', 'Nine', '2'),
('Super', 'Nine', '0'),
('MP', 'Karo', '3'),
('Super', 'Fabian', '4');
However this solution doesn't work as intended
SELECT
favcof.CName, favcof.cnt
FROM
(SELECT l.CName, COUNT(CName) cnt
FROM likes l
WHERE l.rating = 5
GROUP BY CName) favcof
WHERE
favcof.cnt = (SELECT MAX(favcof.cnt))
It executes as if there is no outer where-clause and gives out all sorts of coffees with their amount of rating = 5.
The expression (select max(favcof.cnt)) doesn't do anything. You can just drop the select and you will get favcof.cnt = favcof.cnt.
This is a little complicated, because favcof.cnt = max(favcof.cnt) would generate a syntax error because aggregation functions are not allowed in the where clause. So, the select subquery is actually an aggregation subquery with no from. Because there is only one value, it returns that value.
You want a subquery that recomputes the counts and takes their maximum. This would look like:
SELECT favcof.CName, favcof.cnt
FROM (SELECT l.CName, count(l.CName) as cnt
FROM likes l
WHERE l.rating=5
GROUP BY l.CName
) favcof
WHERE favcof.cnt = (SELECT MAX(favcof2.cnt)
FROM (SELECT l2.CName, count(l2.CName) as cnt
FROM likes l2
WHERE l2.rating=5
GROUP BY l2.CName
) favcof2
);
There are definitely other ways to write this query. However, this should help you understand why your version does not do what you want it to do.
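As one of those other ways, the per-coffee counts can be put in a common table expression so that the aggregation is written only once. This is a sketch rather than the answer's own query; it runs through Python's sqlite3 module as a stand-in, and in MySQL it would need version 8.0 or later, where CTEs are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE likes (CName TEXT, UName TEXT, Rating INTEGER);
INSERT INTO likes (CName, UName, Rating) VALUES
  ('Java', 'Klaus', 5), ('Super', 'Klaus', 5), ('MP', 'Klaus', 3),
  ('Java', 'Marc', 5), ('Mp', 'Marc', 5), ('Super', 'Marc', 2),
  ('Java', 'Nine', 2), ('Super', 'Nine', 0), ('MP', 'Karo', 3),
  ('Super', 'Fabian', 4);
""")

top_coffees = conn.execute("""
WITH favcof AS (
    -- Number of rating = 5 entries per coffee.
    SELECT CName, COUNT(*) AS cnt
    FROM likes
    WHERE Rating = 5
    GROUP BY CName
)
SELECT CName, cnt
FROM favcof
-- The subquery has its own FROM, so MAX really ranges over all coffees.
WHERE cnt = (SELECT MAX(cnt) FROM favcof)
""").fetchall()

print(top_coffees)  # [('Java', 2)]
```

The difference from the question's version is the FROM clause inside the scalar subquery: with it, MAX aggregates over every row of favcof instead of over the single outer row.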
You can do it like this:
DECLARE @likes AS TABLE(CName NVARCHAR(50), UName NVARCHAR(50), Rating INT)
insert into @likes (CName, UName, Rating) values
('Java', 'Klaus', '5'),
('Super', 'Klaus', '5'),
('MP', 'Klaus', '3'),
('Java', 'Marc', '5'),
('Mp', 'Marc', '5'),
('Super', 'Marc', '2'),
('Java', 'Nine', '2'),
('Super', 'Nine', '0'),
('MP', 'Karo', '3'),
('Super', 'Fabian', '4');
SELECT UName, COUNT(CName) Cnt FROM @likes
WHERE Rating = (SELECT MAX(Rating) FROM @likes)
GROUP BY UName

Handling MySql Group By

Below is the mysql code
CREATE TABLE pricing
(
`id` INT NOT NULL AUTO_INCREMENT, `cost` FLOAT NOT NULL,
`valid_on` TIMESTAMP NOT NULL, `quantity` INT NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO pricing (`id`, `cost`, `valid_on`, `quantity`) VALUES
(NULL, '4', '2017-01-01 00:00:00', '1'),
(NULL, '4', '2017-01-02 00:00:00', '1'),
(NULL, '4', '2017-01-03 00:00:00', '1'),
(NULL, '5', '2017-01-04 00:00:00', '2'),
(NULL, '5', '2017-01-05 00:00:00', '2'),
(NULL, '4', '2017-01-06 00:00:00', '2'),
(NULL, '4', '2017-01-07 00:00:00', '3'),
(NULL, '5', '2017-01-08 00:00:00', '3'),
(NULL, '5', '2017-01-09 00:00:00', '3'),
(NULL, '4', '2017-01-10 00:00:00', '3'),
(NULL, '4', '2017-01-11 00:00:00', '3'),
(NULL, '4', '2017-01-12 00:00:00', '2'),
(NULL, '5', '2017-01-13 00:00:00', '2'),
(NULL, '5', '2017-01-14 00:00:00', '2');
When I group by quantity, the following results are displayed:
select quantity, sum(cost) from pricing GROUP BY quantity
1 - 12
2 - 28
3 - 22
But actually I need something like the below results.
quantity start_date end_date cost
1 2017-01-01 00:00:00 2017-01-03 00:00:00 12
2 2017-01-04 00:00:00 2017-01-06 00:00:00 14
3 2017-01-07 00:00:00 2017-01-11 00:00:00 22
2 2017-01-12 00:00:00 2017-01-14 00:00:00 14
Can someone please help me solve this issue...
Try this:
SELECT quantity,
MIN(valid_on) AS start_date, MAX(valid_on) AS end_date,
SUM(cost)
FROM (
SELECT id, cost, valid_on, quantity,
@rn := @rn + 1 AS rn,
@grn := IF(@q = quantity, @grn + 1,
IF(@q := quantity, 1, 1)) AS grp
FROM pricing
CROSS JOIN (SELECT @rn := 0, @q := 0, @grn := 0) AS vars
ORDER BY valid_on, quantity) AS t
GROUP BY rn - grp, quantity
The query uses variables in order to identify islands of consecutive records having the same quantity value. Using the computed grp value, it groups separately each island and calculates start/end dates, as well as the sum of cost.
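The same island idea can also be expressed with window functions instead of variables. The sketch below is not the answer's query: it swaps in the common "difference of two row numbers" variant, which stays constant within each run of equal quantity values. It is run through Python's sqlite3 module as a stand-in (SQLite 3.25+ assumed); in MySQL, window functions need version 8.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pricing (cost INTEGER, valid_on TEXT, quantity INTEGER)")

# The question's 14 sample rows, one per day from 2017-01-01.
costs = [4, 4, 4, 5, 5, 4, 4, 5, 5, 4, 4, 4, 5, 5]
quantities = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 2, 2, 2]
conn.executemany(
    "INSERT INTO pricing VALUES (?, ?, ?)",
    [(c, f"2017-01-{day:02d}", q)
     for day, (c, q) in enumerate(zip(costs, quantities), start=1)],
)

islands = conn.execute("""
SELECT quantity,
       MIN(valid_on) AS start_date,
       MAX(valid_on) AS end_date,
       SUM(cost) AS cost
FROM (
    SELECT cost, valid_on, quantity,
           -- Overall position minus position within the same quantity:
           -- identical for all rows of one consecutive run.
           ROW_NUMBER() OVER (ORDER BY valid_on)
           - ROW_NUMBER() OVER (PARTITION BY quantity ORDER BY valid_on) AS island
    FROM pricing
)
GROUP BY island, quantity
ORDER BY start_date
""").fetchall()

for row in islands:
    print(row)
```

On the sample data this yields the four rows the question asks for, with quantity 2 appearing twice because its two runs get different island values.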
This is a pain to do in MySQL. You need to identify the groups. One method -- which is not particularly efficient -- uses a trick. For each row it counts the number of previous rows where the quantity is different from the given row. This identifies adjacent groups with the same value.
select quantity, sum(cost), min(valid_on) as start_valid_on
from (select p.*,
(select count(*)
from pricing p2
where p2.valid_on < p.valid_on and p2.quantity <> p.quantity
) as grp
from pricing p
) p
group by grp, quantity;
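To make the trick visible, the following sketch prints the grp value that the correlated subquery assigns to every row of the sample data. It uses Python's sqlite3 module as a stand-in for MySQL; the subquery is the one from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pricing (cost INTEGER, valid_on TEXT, quantity INTEGER)")

# The question's 14 sample rows, one per day from 2017-01-01.
costs = [4, 4, 4, 5, 5, 4, 4, 5, 5, 4, 4, 4, 5, 5]
quantities = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 2, 2, 2]
conn.executemany(
    "INSERT INTO pricing VALUES (?, ?, ?)",
    [(c, f"2017-01-{day:02d}", q)
     for day, (c, q) in enumerate(zip(costs, quantities), start=1)],
)

labelled = conn.execute("""
SELECT p.valid_on, p.quantity,
       -- Earlier rows whose quantity differs from this row's quantity.
       (SELECT COUNT(*)
        FROM pricing p2
        WHERE p2.valid_on < p.valid_on AND p2.quantity <> p.quantity) AS grp
FROM pricing p
ORDER BY p.valid_on
""").fetchall()

grp_values = [grp for _, _, grp in labelled]
print(grp_values)  # [0, 0, 0, 3, 3, 3, 6, 6, 6, 6, 6, 8, 8, 8]
```

Within a run the count cannot change, because the rows added in between all share the run's own quantity. The two runs of quantity 2 get 3 and 8, which is why grouping by grp and quantity separates them.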

Use nested subquery to fetch value with multiple condition

I have a table named test_plan (id, unit, num)
I inserted some values
INSERT INTO `test_plan` (`id`, `unit`, `num`) VALUES
('1', '1', '12'),
('2', '1', '13'),
('3', '1', '14'),
('4', '1', '10'),
('5', '2', '10'),
('6', '2', '9'),
('7', '2', '-1'),
('8', '2', '-1'),
('9', '2', '-1'),
('10', '3', '-1'),
('11', '3', '-1'),
('12', '3', '-1');
For each unit I have to fetch the fraction of its rows whose num is not equal to -1.
That is, after running the query it should display, one row per unit, that unit 1 is 100% completed, unit 2 is 40% completed and unit 3 is 0% completed. I can count the rows of each unit, but not how much of it is completed.
I tried JOIN for this
SELECT a.unit, numb / count(*) as frac FROM test_plan as a
LEFT OUTER JOIN (SELECT unit, count(num) as numb FROM test_plan where num != -1 group by unit) as b
ON a.unit = b.unit group by a.unit;
Try this:
select unit,
(sum(case when num = -1 then 0 else 1 end) / count(*)) * 100 as pct_complete
from test_plan group by unit;
There's no need for a nested subquery; the combination of aggregation and the CASE expression is sufficient.
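A quick check of the conditional aggregation on the question's rows, as a sketch run through Python's sqlite3 module. One adjustment is an assumption of this stand-in, not part of the answer: SQLite divides integers as integers, so the expression multiplies by 100.0 before dividing, whereas MySQL's / operator already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_plan (id INTEGER, unit INTEGER, num INTEGER);
INSERT INTO test_plan (id, unit, num) VALUES
  (1, 1, 12), (2, 1, 13), (3, 1, 14), (4, 1, 10),
  (5, 2, 10), (6, 2, 9), (7, 2, -1), (8, 2, -1), (9, 2, -1),
  (10, 3, -1), (11, 3, -1), (12, 3, -1);
""")

pct_by_unit = conn.execute("""
SELECT unit,
       -- Completed rows count as 1, rows with num = -1 as 0.
       SUM(CASE WHEN num = -1 THEN 0 ELSE 1 END) * 100.0 / COUNT(*) AS pct_complete
FROM test_plan
GROUP BY unit
ORDER BY unit
""").fetchall()

print(pct_by_unit)  # [(1, 100.0), (2, 40.0), (3, 0.0)]
```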

SQL query to show top x records with evenly distributed values

I have a database of contacts at companies. Multiple contacts per company in different departments. Each company has turnover and industry data attached to it.
I need to write a query that shows the top 10 most recently added contacts (unix timestamp), but I don't want them all to be Marketing contacts (even if the top 10 are). I would like to look at the top 100 instead and pick 10 contacts that are from different departments. So instead of the top 10 being all Marketing, there might be 2 Marketing, 2 I.T., 2 HR, 2 Personnel.
So my query basically is this:
SELECT DISTINCT `surname`, `job_title`, `company_name`
FROM (`company_database`)
WHERE `employee_code` IN ('6', '7', '8', '9', '10', '11', '12', '13')
AND `turnover_code` IN ('5', '6', '7', '8')
AND `contact_code` IN ('16', '17', '26', '27', '9', '10', '30', '31', '23', '24', '12', '13') AND `industry_code` NOT IN ('22', '17', '35', '36') LIMIT 10
But that simply returns unique rows. What I need is one contact per company and no more than one contact per contact_code type. I also only want 10 rows returned, but obviously, to get one contact per contact code, the query will need to look at more than 10 rows.
Is this possible in just a query? Or should I do something programmatically to apply the logic needed to whittle down the results of a query?
You can work with a temporary table using the MyISAM engine and a trick.
If you create the following temporary table:
create table tmp_company_sequence
( surname varchar(255)
,job_title varchar(255)
,company_name varchar(255)
,date_added date
,contact_code int
,counter int auto_increment
,primary key (contact_code,counter)
) engine=MyISAM;
Now
insert into `tmp_company_sequence`( `surname`, `job_title`, `company_name`,`contact_code`,`date_added`)
SELECT DISTINCT `surname`, `job_title`, `company_name`,`contact_code`,`date_added`
FROM (`company_database`)
WHERE `employee_code` IN ('6', '7', '8', '9', '10', '11', '12', '13')
AND `turnover_code` IN ('5', '6', '7', '8')
AND `contact_code` IN ('16', '17', '26', '27', '9', '10', '30', '31', '23', '24', '12', '13') AND `industry_code` NOT IN ('22', '17', '35', '36')
order by contact_code, date_added desc;
Your temporary table will now hold all the contacts with a counter. The counter is increased for every contact of the same contact_code, so the newest contact with a certain contact code will have counter = 1, the next most recent will have counter = 2, and so on.
You can now do a
select *
from tmp_company_sequence
order by counter asc, date_added desc
limit 10;
This will give you a list of the latest contacts added over all contact_codes.
Edit:
I just realised this could be done with a single query, but it is even more ugly:
SELECT `surname`
, `job_title`
, `company_name`
, `contact_code`
FROM(
SELECT
`surname`
, `job_title`
, `company_name`
, `contact_code`
, `date_added`
-- restart the counter whenever the contact_code changes
, IF(contact_code = @prev_contact_code, @i := @i + 1, @i := 1) AS counter
-- remember this row's contact_code for the next row
, @prev_contact_code := contact_code AS prev_code
FROM
(`company_database`)
,(SELECT @i := 1, @prev_contact_code := NULL) AS vars
WHERE `employee_code` IN ('6', '7', '8', '9', '10', '11', '12', '13')
AND `turnover_code` IN ('5', '6', '7', '8')
AND `contact_code` IN (
'16'
, '17'
, '26'
, '27'
, '9'
, '10'
, '30'
, '31'
, '23'
, '24'
, '12'
, '13'
)
AND `industry_code` NOT IN ('22', '17', '35', '36')
ORDER BY contact_code
, date_added DESC) sub
WHERE counter = 1
ORDER BY date_added DESC
LIMIT 10;
This does basically the same as the option with the temporary table, but it creates the counter on the fly by storing data from the previous row in user-defined variables. It is messy but can be used within a single query.
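The per-contact_code counter can also be produced with ROW_NUMBER() where window functions are available (MySQL 8.0 or later). The sketch below is a swapped-in variant, not the answer's method; it runs through Python's sqlite3 module as a stand-in (SQLite 3.25+ assumed), and the table name contacts and its six rows are hypothetical illustration data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified table; date_added is a unix-style timestamp.
CREATE TABLE contacts (surname TEXT, contact_code INTEGER, date_added INTEGER);
INSERT INTO contacts VALUES
  ('A', 16, 100), ('B', 16, 99), ('C', 16, 98),
  ('D', 17, 50),  ('E', 17, 40),
  ('F', 26, 10);
""")

spread = [row[0] for row in conn.execute("""
SELECT surname
FROM (
    SELECT surname, contact_code, date_added,
           -- 1 = newest contact of its contact_code, 2 = second newest, ...
           ROW_NUMBER() OVER (PARTITION BY contact_code
                              ORDER BY date_added DESC) AS counter
    FROM contacts
)
ORDER BY counter ASC, date_added DESC
LIMIT 4
""")]

print(spread)  # ['A', 'D', 'F', 'B']
```

A plain ORDER BY date_added DESC LIMIT 4 would return A, B, C and D, three of them from contact_code 16; ordering by the counter first takes the newest contact of every code before any code contributes a second one.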