sum a value on another distinct value - mysql

After doing some joins I got a non-grouped result/view like:
id | from | to | who | group | ...
1 | 2012-01-01 12:00:00 | 2012-01-01 14:00:00 | adam | sales
2 | 2012-01-01 12:00:00 | 2012-01-01 15:00:00 | bertil | sales
2 | 2012-01-01 12:00:00 | 2012-01-01 15:00:00 | bertil | admin
...
The result is going to be grouped by DATE(from),
but I'm also interested in the time length, (UNIX_TIMESTAMP(to) - UNIX_TIMESTAMP(from))/3600.
I want a sum of it grouped by day, but I only want each id to count once.
Can I do something like:
SUM((UNIX_TIMESTAMP(to) - UNIX_TIMESTAMP(from))/3600 DISTINCT id)
Or maybe use a variable and an IF()?
Or do I have to wrap my big query in a subquery grouped by id, with the outer query grouped by date?
UPDATE 1: clarifying what I expect to see as an answer
The result above shows the times people worked and which groups they were members of.
After filtering down to the groups that are interesting at the moment,
I want to know how much man-time was spent that day.
For the example data it's 2 hours for adam + 3 hours for bertil = 5 hours,
but bertil is a member of 2 of the interesting groups, so his time is shown twice.
UPDATE 2: a little more data
The above was an attempt at a generalization of this query:
SELECT
worktimes.date AS 'Datum',
COUNT(DISTINCT employments.citizen_id) AS 'Säljare',
SUM(UNIX_TIMESTAMP(TIMESTAMP(worktimes.date, worktimes.death)) - UNIX_TIMESTAMP(TIMESTAMP(worktimes.date, worktimes.birth))) AS 'Mantid',
COUNT(DISTINCT agreements.agreement_id) AS 'Avtal',
COUNT(DISTINCT IF(agreement_status.status_id = 57, agreements.agreement_id, NULL)) AS 'Godkända',
COUNT(DISTINCT IF(agreement_status.status_id = 3, agreements.agreement_id, NULL)) AS 'Ånger',
COUNT(DISTINCT IF(agreement_status.status_id = 4, agreements.agreement_id, NULL)) AS 'Makuleringar',
COUNT(DISTINCT IF(agreement_status.status_id IS NULL, agreements.agreement_id, NULL)) AS 'Övrigt'
FROM worktimes
LEFT JOIN employments USING (employment_id)
LEFT JOIN membership_cache ON (
membership_cache.target_type = 34 AND
membership_cache.target_id IN (136, 138) AND
membership_cache.member_type = 11 AND
membership_cache.member_id = employments.citizen_id AND
DATE(membership_cache.birth) <= worktimes.date AND
(membership_cache.death IS NULL OR worktimes.date < DATE(membership_cache.death))
)
LEFT JOIN agreements ON (
agreements.citizen_id = employments.citizen_id AND
agreements.project_id = 20 AND
agreements.date >= 20110101 AND
DATE(agreements.date) = worktimes.date
)
LEFT JOIN agreement_status ON (
agreements.agreement_id = agreement_status.agreement_id AND
agreement_status.value = 1 AND
agreement_status.death IS NULL AND
agreement_status.status_id IN (3, 4, 57)
)
WHERE
worktimes.death IS NOT NULL AND
membership_cache.member_id IS NOT NULL AND
worktimes.date >= 20110101
GROUP BY DATE(agreements.date)
MySQL EXPLAIN of the above query:
+----+-------------+------------------+-------+------------------------------------------------------+------------------------+---------+-------------------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+------------------------------------------------------+------------------------+---------+-------------------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | membership_cache | range | target,member | target | 10 | NULL | 85 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | employments | ref | employment_id,citizen | citizen | 8 | db.membership_cache.member_id | 1 | Using where; Using index |
| 1 | SIMPLE | worktimes | ref | employment,date,death | employment | 8 | db.employments.employment_id | 34 | Using where |
| 1 | SIMPLE | agreements | ref | project,date,project_customer,agreements_per_citizen | agreements_per_citizen | 8 | db.employments.citizen_id | 36 | |
| 1 | SIMPLE | agreement_status | ref | agreement,status_birth,status_death | agreement | 8 | db.agreements.agreement_id | 1 | |
+----+-------------+------------------+-------+------------------------------------------------------+------------------------+---------+-------------------------------+------+-----------------------------------------------------------+
The problem is that SUM(UNIX_TIMESTAMP(TIMESTAMP(worktimes.date, worktimes.death)) - UNIX_TIMESTAMP(TIMESTAMP(worktimes.date, worktimes.birth))) AS 'Mantid' gets multiplied by the number of matching rows in the tables membership_cache, agreements and agreement_status.
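One way to keep each id from counting more than once is the subquery route mentioned in the question: collapse the joined rows to one row per id first, then group by day. The following is only a minimal sketch, run through Python's sqlite3 module so that it is self-contained. The table name joined and the columns start_ts, end_ts and grp are made up for the example (they stand in for from, to and group), and SQLite's strftime('%s', ...) plays the part of MySQL's UNIX_TIMESTAMP().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the joined, non-grouped result in the question.
conn.execute(
    "CREATE TABLE joined (id INTEGER, start_ts TEXT, end_ts TEXT, who TEXT, grp TEXT)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2012-01-01 12:00:00", "2012-01-01 14:00:00", "adam", "sales"),
        (2, "2012-01-01 12:00:00", "2012-01-01 15:00:00", "bertil", "sales"),
        (2, "2012-01-01 12:00:00", "2012-01-01 15:00:00", "bertil", "admin"),
    ],
)

# Summing directly over the joined rows counts bertil's shift twice.
naive = conn.execute("""
    SELECT date(start_ts) AS day,
           SUM((strftime('%s', end_ts) - strftime('%s', start_ts)) / 3600.0) AS hours
    FROM joined
    GROUP BY date(start_ts)
""").fetchall()

# Reducing to one row per id first makes every shift count exactly once.
deduped = conn.execute("""
    SELECT date(start_ts) AS day,
           SUM((strftime('%s', end_ts) - strftime('%s', start_ts)) / 3600.0) AS hours
    FROM (SELECT DISTINCT id, start_ts, end_ts FROM joined) AS per_id
    GROUP BY date(start_ts)
""").fetchall()

print(naive)    # 8 hours: bertil is counted once per group
print(deduped)  # 5 hours: 2 for adam + 3 for bertil
```

In the real query the same idea would mean computing the worktimes sum in its own derived table per day and joining that to the counts, rather than summing after the row-multiplying joins.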

Related

Get the newest record from MySQL from 2 tables more optimalized [duplicate]

I have a problem with a SQL query.
I have 2 tables.
people
+----+--------+------+
| id | name | val2 |
+----+--------+------+
| 1 | john | 12 |
| 2 | adam | 5 |
| 3 | alfred | 3 |
+----+--------+------+
data
+----+----+----+-----+---------------------+
| id | v1 | v2 | v3 | date |
+----+----+----+-----+---------------------+
| 1 | 4 | 15 | 18 | 2020-10-16 11:15:53 |
| 1 | 2 | 12 | 17 | 2020-10-16 11:22:53 |
| 1 | 3 | 13 | 16 | 2020-10-16 11:32:53 |
| 2 | 1 | 16 | 15 | 2020-10-16 13:22:53 |
| 2 | 3 | 13 | 25 | 2020-10-16 13:42:53 |
| 2 | 4 | 12 | 35 | 2020-10-16 14:12:53 |
| 3 | 1 | 21 | 12 | 2020-10-16 14:12:53 |
| 3 | 2 | 28 | 42 | 2020-10-16 15:12:53 |
| 3 | 4 | 30 | 72 | 2020-10-16 16:12:53 |
+----+----+----+-----+---------------------+
I need a single result with id, name, v1, v2, v3 and date, taken from the row with the newest date, for every object in the first table,
something like this:
RESULT
+----+--------+----+----+-----+---------------------+
| id | name | v1 | v2 | v3 | date |
+----+--------+----+----+-----+---------------------+
| 1 | john | 3 | 13 | 16 | 2020-10-16 11:32:53 |
| 2 | adam | 4 | 12 | 35 | 2020-10-16 14:12:53 |
| 3 | alfred | 4 | 30 | 72 | 2020-10-16 16:12:53 |
+----+--------+----+----+-----+---------------------+
I need the newest record from the second table for every person in the first table.
I tried to do it with this query:
SELECT people.id,
people.name,
data.v1,
data.v2,
data.v3,
max(data.date)
FROM people
JOIN DATA ON people.id = data.id
GROUP BY people.id
I got the newest date, but v1, v2, v3 are arbitrary values from the table.
You want entire rows from data, so aggregation is not an option here. In most databases, your query would fail, because the select and group by clauses are not consistent. But MySQL, somewhat unfortunately, gives developers enough rope to hang themselves with. Your query runs (if SQL mode ONLY_FULL_GROUP_BY is disabled), but is actually equivalent to:
SELECT people.id, people.name, ANY_VALUE(data.v1), ANY_VALUE(data.v2), ANY_VALUE(data.v3), MAX(data.date)
FROM people
JOIN data on people.id = data.id
GROUP BY people.id
Now it is plain to see that the database returns an arbitrary value from the data rows that match the join condition, which may or may not belong to the row that has the latest date.
Instead of grouping, you actually need to filter. One option uses a subquery:
select p.id, p.name, d.v1, d.v2, d.v3, d.date
from people p
inner join data d on d.id = p.id
where d.date = (select max(d1.date) from data d1 where d1.id = d.id)
The upside of this approach is that it works in all versions of MySQL, including pre-8.0, where window functions are not available.
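To see the filtering approach at work, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module and runs the correlated-subquery version. SQLite is used only so the example runs on its own, and the ORDER BY p.id is added here just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, name TEXT, val2 INTEGER);
    CREATE TABLE data (id INTEGER, v1 INTEGER, v2 INTEGER, v3 INTEGER, date TEXT);
    INSERT INTO people VALUES (1, 'john', 12), (2, 'adam', 5), (3, 'alfred', 3);
    INSERT INTO data VALUES
        (1, 4, 15, 18, '2020-10-16 11:15:53'),
        (1, 2, 12, 17, '2020-10-16 11:22:53'),
        (1, 3, 13, 16, '2020-10-16 11:32:53'),
        (2, 1, 16, 15, '2020-10-16 13:22:53'),
        (2, 3, 13, 25, '2020-10-16 13:42:53'),
        (2, 4, 12, 35, '2020-10-16 14:12:53'),
        (3, 1, 21, 12, '2020-10-16 14:12:53'),
        (3, 2, 28, 42, '2020-10-16 15:12:53'),
        (3, 4, 30, 72, '2020-10-16 16:12:53');
""")

# Keep only the data row whose date equals the latest date for that id.
latest = conn.execute("""
    SELECT p.id, p.name, d.v1, d.v2, d.v3, d.date
    FROM people p
    INNER JOIN data d ON d.id = p.id
    WHERE d.date = (SELECT MAX(d1.date) FROM data d1 WHERE d1.id = d.id)
    ORDER BY p.id
""").fetchall()

for row in latest:
    print(row)
```

Each printed row carries v1, v2 and v3 from the same data row as the latest date, which is exactly what the grouped query could not guarantee.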
One simple method uses window functions:
SELECT p.id, p.name, d.v1, d.v2, d.v3, d.date
FROM people p JOIN
(SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY d.date DESC) as seqnum
FROM data d
) d
ON p.id = d.id AND d.seqnum = 1;
Note: It seems strange that the join column in data would be id. I would expect it to be called something like people_id.

Query with dynamic date intervals

Given a statuses table that holds information about products availability, how do I select the date that corresponds to the 1st day in the latest 20 days that the product has been active?
Yes I know the question is hard to follow. I think another way to put it would be: I want to know how many times each product has been sold in the last 20 days that it was active, meaning the product could have been active for years, but I'd only want the sales count from the latest 20 days that it had a status of "active".
It's easily doable on the server side (i.e. getting any collection of products from the DB, iterating over them, performing n+1 queries on the statuses table, etc.), but I have hundreds of thousands of items, so it's imperative to do it in SQL for performance reasons.
table : products
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | Apple |
| 2 | Banana |
| 3 | Grape |
+-------+-----------+
table : statuses
+-------+-------------+---------------+---------------+
| id | name | product_id | created_at |
+-------+-------------+---------------+---------------+
| 1 | active | 1 | 2018-01-01 |
| 2 | inactive | 1 | 2018-02-01 |
| 3 | active | 1 | 2018-03-01 |
| 4 | inactive | 1 | 2018-03-15 |
| 6 | active | 1 | 2018-04-25 |
| 7 | active | 2 | 2018-03-01 |
| 8 | active | 3 | 2018-03-10 |
| 9 | inactive | 3 | 2018-03-15 |
+-------+-------------+---------------+---------------+
table : items (ordered products)
+-------+---------------+-------------+
| id | product_id | order_id |
+-------+---------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 2 | 3 |
| 7 | 2 | 4 |
| 8 | 2 | 5 |
| 9 | 3 | 5 |
+-------+---------------+-------------+
table : orders
+-------+---------------+
| id | created_at |
+-------+---------------+
| 1 | 2018-01-02 |
| 2 | 2018-01-15 |
| 3 | 2018-03-02 |
| 4 | 2018-03-10 |
| 5 | 2018-03-13 |
+-------+---------------+
I want my final results to look like this:
+-------+-----------+----------------------+--------------------------------+
| id | name | recent_sales_count | date_to_start_counting_sales |
+-------+-----------+----------------------+--------------------------------+
| 1 | Apple | 3 | 2018-01-30 |
| 2 | Banana | 0 | 2018-04-09 |
| 3 | Grape | 1 | 2018-03-10 |
+-------+-----------+----------------------+--------------------------------+
So this is what I mean by latest 20 active days for e.g. Apple:
It was last activated at '2018-04-25'. That's 4 days ago.
Before that, it was inactive since '2018-03-15', so all these days until '2018-04-25' don't count.
Before that, it was active since '2018-03-01'. That's 14 more days until '2018-03-15'.
Before that, inactive since '2018-02-01'.
Finally, it was active since '2018-01-01', so it should only count the missing 2 days (4 + 14 + 2 = 20) backwards from '2018-02-01', resulting in date_to_start_counting_sales = '2018-01-30'.
With the '2018-01-30' date in hand, I'm then able to count Apple orders in the last 20 active days: 3.
Hope that makes sense.
Here is a fiddle with the data provided above.
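The backward walk described for Apple can be written down as a short procedure. The sketch below is plain Python rather than SQL and only illustrates the arithmetic; the function name start_of_recent_active_window is made up, the current date of 2018-04-29 is inferred from "4 days ago" in the example, and each product is assumed to have at least one active period.

```python
from datetime import date, timedelta


def start_of_recent_active_window(periods, cap_days):
    """Return the date from which sales should be counted.

    periods: (active_from, active_until) date pairs, oldest first.
    cap_days: how many active days to go back (20 in the question).
    """
    remaining = cap_days
    # Walk the active periods from the most recent one backwards.
    for active_from, active_until in reversed(periods):
        length = (active_until - active_from).days
        if length >= remaining:
            # The window starts somewhere inside this period.
            return active_until - timedelta(days=remaining)
        remaining -= length
    # Fewer than cap_days active days overall: start at the first activation.
    return periods[0][0]


today = date(2018, 4, 29)  # assumed current date, inferred from the question

apple = [
    (date(2018, 1, 1), date(2018, 2, 1)),
    (date(2018, 3, 1), date(2018, 3, 15)),
    (date(2018, 4, 25), today),  # still active
]
banana = [(date(2018, 3, 1), today)]
grape = [(date(2018, 3, 10), date(2018, 3, 15))]

apple_start = start_of_recent_active_window(apple, 20)
banana_start = start_of_recent_active_window(banana, 20)
grape_start = start_of_recent_active_window(grape, 20)
print(apple_start, banana_start, grape_start)
```

For Apple the walk consumes 4 days, then 14, and takes the last 2 from the oldest period, giving 2018-01-30. Banana and Grape come out as 2018-04-09 and 2018-03-10, matching the expected table.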
I've got a standard SQL solution that does not use any window functions, since you are on MySQL 5.
My solution requires 3 stacked views.
It would have been better with a CTE, but your version doesn't support that. I don't like to stack views and always try to avoid it, but sometimes you have no other choice, because older MySQL versions don't accept subqueries in the FROM clause of a view.
CREATE VIEW VIEW_product_dates AS
(
SELECT product_id, created_at AS active_date,
(
SELECT MIN(created_at) -- the first deactivation after this activation
FROM statuses ti
WHERE name = 'inactive' AND ta.created_at < ti.created_at AND ti.product_id=ta.product_id
GROUP BY product_id
) AS inactive_date
FROM statuses ta
WHERE name = 'active'
);
CREATE VIEW VIEW_product_dates_days AS
(
SELECT product_id, active_date, inactive_date, datediff(IFNULL(inactive_date, SYSDATE()),active_date) AS nb_days
FROM VIEW_product_dates
);
CREATE VIEW VIEW_product_dates_days_cumul AS
(
SELECT product_id, active_date, ifnull(inactive_date,sysdate()) AS inactive_date, nb_days,
IFNULL((SELECT SUM(V2.nb_days) + V1.nb_days
FROM VIEW_product_dates_days V2
WHERE V2.active_date >= IFNULL(V1.inactive_date, SYSDATE()) AND V1.product_id=V2.product_id
),V1.nb_days) AS cumul_days
FROM VIEW_product_dates_days V1
);
The final view produces this:
| product_id | active_date | inactive_date | nb_days | cumul_days |
|------------|----------------------|----------------------|---------|------------|
| 1 | 2018-01-01T00:00:00Z | 2018-02-01T00:00:00Z | 31 | 49 |
| 1 | 2018-03-01T00:00:00Z | 2018-03-15T00:00:00Z | 14 | 18 |
| 1 | 2018-04-25T00:00:00Z | 2018-04-29T11:28:39Z | 4 | 4 |
| 2 | 2018-03-01T00:00:00Z | 2018-04-29T11:28:39Z | 59 | 59 |
| 3 | 2018-03-10T00:00:00Z | 2018-03-15T00:00:00Z | 5 | 5 |
So it lists every active period of every product, counts the number of days in each period, and adds up the cumulative active days from that period up to the current date.
Then we can query this final view to get the desired date for each product. I set a variable for your 20 days, so you can change that number easily if you want.
SET @cap_days = 20;
SELECT PD.id, PD.name,
SUM(CASE WHEN o.created_at > PD.date_to_start_counting_sales THEN 1 ELSE 0 END) AS recent_sales_count,
PD.date_to_start_counting_sales
FROM
(
SELECT P.*,
(CASE WHEN LowerCap.max_cumul_days IS NULL
THEN ADDDATE(ifnull(HigherCap.min_inactive_date,sysdate()),(-@cap_days))
ELSE
CASE WHEN LowerCap.max_cumul_days < @cap_days AND HigherCap.min_inactive_date IS NULL
THEN ADDDATE(ifnull(LowerCap.max_inactive_date,sysdate()),(-LowerCap.max_cumul_days))
ELSE ADDDATE(ifnull(HigherCap.min_inactive_date,sysdate()),(LowerCap.max_cumul_days-@cap_days))
END
END) as date_to_start_counting_sales
FROM products P
LEFT JOIN
(
SELECT product_id, MAX(cumul_days) AS max_cumul_days, MAX(inactive_date) AS max_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days <= @cap_days
GROUP BY product_id
) LowerCap ON P.id=LowerCap.product_id
LEFT JOIN
(
SELECT product_id, MIN(cumul_days) AS min_cumul_days, MIN(inactive_date) AS min_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days > @cap_days
GROUP BY product_id
) HigherCap ON P.id=HigherCap.product_id
) PD
LEFT JOIN items i ON PD.id = i.product_id
LEFT JOIN orders o ON o.id = i.order_id
GROUP BY PD.id, PD.name, PD.date_to_start_counting_sales
Returns
| id | name | recent_sales_count | date_to_start_counting_sales |
|----|--------|--------------------|------------------------------|
| 1 | Apple | 3 | 2018-01-30T00:00:00Z |
| 2 | Banana | 0 | 2018-04-09T20:43:23Z |
| 3 | Grape | 1 | 2018-03-10T00:00:00Z |
FIDDLE : http://sqlfiddle.com/#!9/804f52/24
Not sure which version of MySQL you're working with, but if you can use 8.0, that version came out with a lot of functionality that makes things slightly more doable (CTEs, row_number(), partition, etc.).
My recommendation would be to create a view like in this DB-Fiddle Example, query the view on the server side and iterate programmatically. There are ways of doing it in SQL, but it'd be a bear to write and test, and would likely be less efficient.
Assumptions:
Products cannot be sold during inactive date ranges
Statuses table will always alternate status active/inactive/active for each product. I.e. no date ranges where a certain product is both active and inactive.
View Results:
+------------+-------------+------------+-------------+
| product_id | active_date | end_date | days_active |
+------------+-------------+------------+-------------+
| 1 | 2018-01-01 | 2018-02-01 | 31 |
+------------+-------------+------------+-------------+
| 1 | 2018-03-01 | 2018-03-15 | 14 |
+------------+-------------+------------+-------------+
| 1 | 2018-04-25 | 2018-04-29 | 4 |
+------------+-------------+------------+-------------+
| 2 | 2018-03-01 | 2018-04-29 | 59 |
+------------+-------------+------------+-------------+
| 3 | 2018-03-10 | 2018-03-15 | 5 |
+------------+-------------+------------+-------------+
View:
CREATE OR REPLACE VIEW days_active AS (
WITH active_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'active'),
inactive_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'inactive')
SELECT x1.product_id,
x1.created_at AS active_date,
CASE WHEN x2.created_at IS NULL
THEN Curdate()
ELSE x2.created_at
END AS end_date,
CASE WHEN x2.created_at IS NULL
THEN Datediff(Curdate(), x1.created_at)
ELSE Datediff(x2.created_at,x1.created_at)
END AS days_active
FROM active_rn x1
LEFT OUTER JOIN inactive_rn x2
ON x1.rownum = x2.rownum
AND x1.product_id = x2.product_id ORDER BY
x1.product_id);

MySQL Subselect Issue with two tables and aggregate functions into single query

I have 2 tables
Transaction table
+-----+------------+------------+
| TID | CampaignID | DATE |
+-----+------------+------------+
| 1 | 5 | 2016-01-01 |
| 2 | 5 | 2016-01-01 |
| 3 | 2 | 2016-01-01 |
| 4 | 5 | 2016-01-01 |
| 5 | 1 | 2016-01-01 |
| 6 | 1 | 2016-02-02 |
| 7 | 3 | 2016-02-02 |
| 8 | 3 | 2016-02-02 |
| 9 | 5 | 2016-02-02 |
| 10 | 4 | 2016-02-02 |
+-----+------------+------------+
Campaign Table
+------------+---------------------+----------------+
| CampaignID | DailyMaxImpressions | CampaignActive |
+------------+---------------------+----------------+
| 1 | 5 | Y |
| 2 | 5 | Y |
| 3 | 5 | Y |
| 4 | 5 | Y |
| 5 | 1 | Y |
+------------+---------------------+----------------+
What I am trying to do is get a single random campaign where the count in the transaction table is less than the daily max impressions in the campaign table. I might also be passing a date as part of the query for the transaction table.
So for CampaignID 1 there must be 4 transactions or fewer in the transaction table, and CampaignActive must be "Y".
Any help would be appreciated if this can be done in a single statement. ( mysql )
Thanks in advance,
Jeff Godstein
This should get it for you. The basic query selects each campaign that is active. The inner query pre-aggregates the transaction count per campaign for the date in question. From that, the LEFT JOIN lets a campaign be returned either when it does not appear in the subquery at all (no transactions that day) or when it does appear but its count is less than the maximum allowed for that date. The order by RAND() shuffles the qualifying campaigns, and limit 1 keeps a single one of them.
SELECT
c.CampaignID
from
Campaign c
LEFT JOIN
( select
t1.CampaignID,
count(*) as CampCount
from
Transaction t1
where
t1.Date = YourDateParameterValue
group by
t1.CampaignID ) as t
ON c.CampaignID = t.CampaignID
where
c.CampaignActive = 'Y'
AND ( t.CampaignID IS NULL
OR t.CampCount < c.DailyMaxImpressions )
order by
RAND()
limit 1
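As a quick check of the eligibility logic, the sketch below loads the sample tables into an in-memory SQLite database through Python's sqlite3 module. It is an illustration under a few assumptions: the table name is quoted as "Transaction" to keep it clear of the SQL keyword, SQLite's RANDOM() stands in for MySQL's RAND(), a LIMIT 1 is appended so that only one campaign comes back, and the date is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Campaign (CampaignID INTEGER, DailyMaxImpressions INTEGER,
                           CampaignActive TEXT);
    CREATE TABLE "Transaction" (TID INTEGER, CampaignID INTEGER, Date TEXT);
    INSERT INTO Campaign VALUES
        (1, 5, 'Y'), (2, 5, 'Y'), (3, 5, 'Y'), (4, 5, 'Y'), (5, 1, 'Y');
    INSERT INTO "Transaction" VALUES
        (1, 5, '2016-01-01'), (2, 5, '2016-01-01'), (3, 2, '2016-01-01'),
        (4, 5, '2016-01-01'), (5, 1, '2016-01-01'), (6, 1, '2016-02-02'),
        (7, 3, '2016-02-02'), (8, 3, '2016-02-02'), (9, 5, '2016-02-02'),
        (10, 4, '2016-02-02');
""")

# Active campaigns whose transaction count for the day is below their cap.
ELIGIBLE_SQL = """
    SELECT c.CampaignID
    FROM Campaign c
    LEFT JOIN (SELECT t1.CampaignID, COUNT(*) AS CampCount
               FROM "Transaction" t1
               WHERE t1.Date = ?
               GROUP BY t1.CampaignID) AS t
      ON c.CampaignID = t.CampaignID
    WHERE c.CampaignActive = 'Y'
      AND (t.CampaignID IS NULL OR t.CampCount < c.DailyMaxImpressions)
"""

day = "2016-01-01"
eligible = sorted(row[0] for row in conn.execute(ELIGIBLE_SQL, (day,)))
# Same filter, shuffled and cut down to a single campaign.
pick = conn.execute(ELIGIBLE_SQL + " ORDER BY RANDOM() LIMIT 1", (day,)).fetchone()[0]

print(eligible)
print(pick)
```

On 2016-01-01 campaign 5 already has three transactions against a cap of 1, so it is the only one filtered out; campaigns 3 and 4 have no transactions that day and qualify through the IS NULL branch.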

Loop through each record of a table and perform calculations on all other records

I want to calculate a value for NO_TOP_RATING in my table working.
The calculation for NO_TOP_RATING is made by:
For each row, get all other rows that fall within the previous year from ANNDATS_CONVERTED for that record, and have the same ESTIMID as that record.
From those, find the lowest IRECCD value.
Then, count the number of times that the same ANALYST has an IRECCD that matches the lowest IRECCD calculated.
NOTE: This should omit the current row being calculated (so to find the value for row id 1, do not use this row in the calculations) and any records where ANALYST is blank should be ignored altogether.
TABLE working:
| ID | ANALYST | ESTIMID | ANNDATS_CONVERTED | IRECCD | NO_TOP_RATING |
---------------------------------------------------------------------------------
| 1 | DAVE | Brokerage000 | 1998-07-01 | 2 | |
| 2 | DAVE | Brokerage000 | 1998-06-28 | 2 | |
| 3 | DAVE | Brokerage000 | 1998-07-02 | 4 | |
| 4 | DAVE | Brokerage000 | 1998-07-04 | 3 | |
| 5 | SAM | Brokerage000 | 1998-06-14 | 1 | |
| 6 | SAM | Brokerage000 | 1998-06-28 | 4 | |
| 7 | | Brokerage000 | 1998-06-28 | 1 | |
| 8 | DAVE | Brokerage111 | 1998-06-28 | 5 | |
So - when calculating NO_TOP_RATING for record #1:
record #1 would not be included in the calculation, because I want to omit it from the calculation
record #7 would not be included in the calculation at all, because ANALYST is blank
Record #8 would not be included in the calculation, because ESTIMID is not the same as record #1
EXPECTED RESULT:
TABLE working:
| ID | ANALYST | ESTIMID | ANNDATS_CONVERTED | IRECCD | NO_TOP_RATING |
---------------------------------------------------------------------------------
| 1 | DAVE | Brokerage000 | 1998-07-01 | 2 | 0 |
| 2 | DAVE | Brokerage000 | 1998-06-28 | 2 | 0 |
| 3 | DAVE | Brokerage000 | 1998-07-02 | 4 | 0 |
| 4 | DAVE | Brokerage000 | 1998-07-04 | 3 | 0 |
| 5 | SAM | Brokerage000 | 1998-06-14 | 1 | 0 |
| 6 | SAM | Brokerage000 | 1998-06-28 | 4 | 1 |
| 7 | | Brokerage000 | 1998-06-28 | 1 | |
| 8 | DAVE | Brokerage111 | 1998-06-28 | 5 | 0 |
Here is the MySQL I have so far:
UPDATE `working`
SET `working`.`NO_TOP_RATING` =
(
SELECT COUNT(`ID`) FROM (SELECT `ID`,`IRECCD`,`ESTIMID` FROM `working`) AS BB
WHERE
`IRECCD` =
(
SELECT COUNT(`ID`) FROM (SELECT `ID`,`IRECCD`,`ESTIMID`, `ANALYST` FROM `working`) AS ZZ
WHERE
`IRECCD` =
-- this calculates the LOWEST number with same `ESTIMID`
(
SELECT MIN(`IRECCD`)
FROM (SELECT `ID`,`IRECCD`,`ANNDATS_CONVERTED`,`ESTIMID` FROM `working`) AS CC
WHERE
`ANNDATS_CONVERTED` >= DATE_SUB(`ANNDATS_CONVERTED`,INTERVAL 1 YEAR)
AND
`working`.`ESTIMID` = BB.`ESTIMID`
)
-- END this calculates the LOWEST number with same `ESTIMID`
AND
`working`.`ANALYST` = ZZ.`ANALYST`
)
)
WHERE `working`.`ANALYST` != ''
This is working in PHP, looping through each record and evaluating all the other records for each. This involves looping and takes a very long time on a large database. I am trying to achieve the same result with MySQL.
I took a few steps to solve this. The first thing I did was write a JOIN that got all of the rows I needed. I joined the table to itself on several conditions:
The estimid matched
The id value was not the same
The analyst column was not null in either table
The anndats_converted of one table was within the previous year of the other table.
To test, I selected the id from both tables to make sure I was getting proper pairings:
SELECT w.id, wo.id
FROM working w
JOIN working wo
ON w.estimid = wo.estimid
AND w.id != wo.id
AND w.analyst IS NOT NULL
AND wo.analyst IS NOT NULL
AND wo.anndats_converted BETWEEN DATE_SUB(w.anndats_converted, INTERVAL 1 YEAR) AND w.anndats_converted
ORDER BY w.id;
A brief result set showed the following pairings:
| id | id |
+----+----+
| 1 | 2 |
| 1 | 5 |
| 1 | 6 |
| 2 | 5 |
| 2 | 6 |
This seems to match what you wanted. For id #1, row 1 is excluded (because it is the row being calculated), rows 3 and 4 do not fall in the proper date range, row 7 has a null analyst, and row 8 has a different estimid.
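The pairing step can be reproduced on the sample table with a short script. This is a sketch using Python's sqlite3 module, so SQLite's date(x, '-1 year') replaces MySQL's DATE_SUB(x, INTERVAL 1 YEAR); it also assumes the blank ANALYST in row 7 is stored as NULL, which is what the IS NOT NULL conditions rely on, and adds wo.id to the ORDER BY to make the output order fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE working (id INTEGER, analyst TEXT, estimid TEXT, "
    "anndats_converted TEXT, ireccd INTEGER)"
)
conn.executemany(
    "INSERT INTO working VALUES (?, ?, ?, ?, ?)",
    [
        (1, "DAVE", "Brokerage000", "1998-07-01", 2),
        (2, "DAVE", "Brokerage000", "1998-06-28", 2),
        (3, "DAVE", "Brokerage000", "1998-07-02", 4),
        (4, "DAVE", "Brokerage000", "1998-07-04", 3),
        (5, "SAM", "Brokerage000", "1998-06-14", 1),
        (6, "SAM", "Brokerage000", "1998-06-28", 4),
        (7, None, "Brokerage000", "1998-06-28", 1),  # blank analyst, assumed NULL
        (8, "DAVE", "Brokerage111", "1998-06-28", 5),
    ],
)

# Pair every row (w) with the other rows (wo) of the same estimid that fall
# within the year leading up to w's date.
pairs = conn.execute("""
    SELECT w.id, wo.id
    FROM working w
    JOIN working wo
      ON w.estimid = wo.estimid
     AND w.id != wo.id
     AND w.analyst IS NOT NULL
     AND wo.analyst IS NOT NULL
     AND wo.anndats_converted BETWEEN date(w.anndats_converted, '-1 year')
                                  AND w.anndats_converted
    ORDER BY w.id, wo.id
""").fetchall()

print(pairs[:5])
```

The first five pairs are the ones listed in the result set above, and ids 7 and 8 never appear on either side of a pair.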
Then, I used an aggregate function to calculate the minimum ireccd by grouping by the first table:
SELECT w.id, w.analyst, MIN(wo.ireccd) AS min_ireccd
FROM working w
JOIN working wo
ON w.estimid = wo.estimid
AND w.id != wo.id
AND w.analyst IS NOT NULL
AND wo.analyst IS NOT NULL
AND wo.anndats_converted BETWEEN DATE_SUB(w.anndats_converted, INTERVAL 1 YEAR) AND w.anndats_converted
GROUP BY w.id;
The next part was also tricky so I'll explain it in two steps. I joined the above query with the original table, with the only condition that the analyst column matched. What this did was create a Cartesian Product, in a way. The query looked like this:
SELECT *
FROM working w
LEFT JOIN(
SELECT w.id, w.analyst, MIN(wo.ireccd) AS min_ireccd
FROM working w
LEFT JOIN working wo
ON w.estimid = wo.estimid
AND w.id != wo.id
AND w.analyst IS NOT NULL
AND wo.analyst IS NOT NULL
AND wo.anndats_converted BETWEEN DATE_SUB(w.anndats_converted, INTERVAL 1 YEAR) AND w.anndats_converted
GROUP BY w.id) temp ON temp.analyst = w.analyst;
And I saw all possible pairings for each person, like this:
| id | analyst | ireccd | id | analyst | min_ireccd |
+----+---------+--------+----+---------+------------+
| 1 | DAVE | 2 | 8 | DAVE | null |
| 1 | DAVE | 2 | 4 | DAVE | 1 |
| 1 | DAVE | 2 | 1 | DAVE | 1 |
| 1 | DAVE | 2 | 2 | DAVE | 1 |
| 1 | DAVE | 2 | 3 | DAVE | 1 |
Notice that this compares the first DAVE row with all other DAVE rows in the table. Also note that I changed the above inner query to use an outer join so that all rows are considered. If there is nothing to calculate, min_ireccd is null.
The last thing I did was use that result set, and count the number of times the ireccd matched the min_ireccd. I grouped by id, so in the above sample set, it never matches, so the count would be 0. Here is the final query. It leaves null values (row 7) as null because that's what your expected results show:
SELECT w.*, SUM(w.ireccd = temp.min_ireccd) AS NO_TOP_RATING
FROM working w
LEFT JOIN(
SELECT w.id, w.analyst, MIN(wo.ireccd) AS min_ireccd
FROM working w
LEFT JOIN working wo
ON w.estimid = wo.estimid
AND w.id != wo.id
AND w.analyst IS NOT NULL
AND wo.analyst IS NOT NULL
AND wo.anndats_converted BETWEEN DATE_SUB(w.anndats_converted, INTERVAL 1 YEAR) AND w.anndats_converted
GROUP BY w.id) temp ON temp.analyst = w.analyst
GROUP BY w.id;
These are the results I got:

SQL reduce number of columns in inner query

I have a query:
select
count(*), paymentOptionId
from
payments
where
id in (select min(reportDate), id
from payments
where userId in (select distinct userId
from payments
where paymentOptionId in (46,47,48,49,50,51,52,53,54,55,56))
group by userId)
group by
paymentOptionId;
The problem is the "select min(reportDate), id" part: this subquery must return a single column, but I can't figure out how to do that while still grouping to get the minimum.
The data set looks like
+----+--------+--------+-----------+---------------------+--------+----------+-----------------+
| id | userId | amount | userLevel | reportDate | buffId | bankQuot | paymentOptionId |
+----+--------+--------+-----------+---------------------+--------+----------+-----------------+
| 9 | 12012 | 5 | 5 | 2014-02-10 23:07:57 | NULL | NULL | 2 |
| 10 | 12191 | 5 | 6 | 2014-02-10 23:52:12 | NULL | NULL | 2 |
| 11 | 12295 | 5 | 6 | 2014-02-11 00:12:04 | NULL | NULL | 2 |
| 12 | 12295 | 5 | 6 | 2014-02-11 00:12:42 | NULL | NULL | 2 |
| 13 | 12256 | 5 | 6 | 2014-02-11 00:26:25 | NULL | NULL | 2 |
| 14 | 12256 | 5 | 6 | 2014-02-11 00:26:35 | NULL | NULL | 2 |
| 16 | 12510 | 5 | 5 | 2014-02-11 00:42:58 | NULL | NULL | 2 |
| 17 | 12510 | 5 | 5 | 2014-02-11 00:43:08 | NULL | NULL | 2 |
| 18 | 12510 | 18 | 5 | 2014-02-11 00:45:16 | NULL | NULL | 3 |
| 19 | 12510 | 5 | 6 | 2014-02-11 01:00:10 | NULL | NULL | 2 |
+----+--------+--------+-----------+---------------------+--------+----------+-----------------+
select count(*), paymentOptionId
from
(select userId, min(reportdate), paymentOptionId
from payments as t1
group by userId, paymentOptionId) as t2
group by paymentOptionId
Fiddle
It first gets the minimum report date (i.e. the first entry) for every user and every type (so there are two records for a user who has 2 types), and then counts them, grouping by type (i.e. paymentOptionId).
By the way, you can of course trim the columns selected in the subquery in the from clause; they are only there so you can copy-paste it and see the results it gives step by step.
You seem to want to report on various payment options and their counts for the earliest ReportDate for each user.
If so, here is an alternative approach:
select p.paymentOptionId, count(*)
from payments p
where paymentOptionId in (46,47,48,49,50,51,52,53,54,55,56) and
not exists (select 1
from payments p2
where p2.userId = p.userId and
p2.ReportDate < p.ReportDate
)
group by paymentOptionId;
This isn't exactly the same as your query, because this will only report on the list of payment types, whereas you might want the first payment type for anyone who has ever had one of these types. I'm not sure which you want, though.
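The two answers count different things, which is easiest to see by running both on the sample rows. The sketch below uses Python's sqlite3 module and keeps only the four columns the queries touch. Because the sample data contains no payment options in the 46 to 56 range, the list (2, 3) is substituted purely for illustration, and both queries are written to return (paymentOptionId, count) in that order so the outputs are easy to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER, userId INTEGER, reportDate TEXT, "
    "paymentOptionId INTEGER)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?)",
    [
        (9, 12012, "2014-02-10 23:07:57", 2),
        (10, 12191, "2014-02-10 23:52:12", 2),
        (11, 12295, "2014-02-11 00:12:04", 2),
        (12, 12295, "2014-02-11 00:12:42", 2),
        (13, 12256, "2014-02-11 00:26:25", 2),
        (14, 12256, "2014-02-11 00:26:35", 2),
        (16, 12510, "2014-02-11 00:42:58", 2),
        (17, 12510, "2014-02-11 00:43:08", 2),
        (18, 12510, "2014-02-11 00:45:16", 3),
        (19, 12510, "2014-02-11 01:00:10", 2),
    ],
)

# First answer: one row per (user, option), then count rows per option.
per_user_and_option = conn.execute("""
    SELECT paymentOptionId, COUNT(*)
    FROM (SELECT userId, MIN(reportDate) AS first_report, paymentOptionId
          FROM payments
          GROUP BY userId, paymentOptionId) AS t2
    GROUP BY paymentOptionId
    ORDER BY paymentOptionId
""").fetchall()

# Second answer: only each user's very first payment, if its option is listed.
first_payment_only = conn.execute("""
    SELECT p.paymentOptionId, COUNT(*)
    FROM payments p
    WHERE p.paymentOptionId IN (2, 3)  -- illustrative list, not the original one
      AND NOT EXISTS (SELECT 1
                      FROM payments p2
                      WHERE p2.userId = p.userId
                        AND p2.reportDate < p.reportDate)
    GROUP BY p.paymentOptionId
    ORDER BY p.paymentOptionId
""").fetchall()

print(per_user_and_option)
print(first_payment_only)
```

User 12510 paid with option 3 only after earlier option 2 payments, so the first query counts that user under both options while the second counts them under option 2 alone.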