MySQL select count only new id's for each year - mysql

I have a MySQL table that looks like this
id | client_id | date
--------------------------------------
1 | 12 | 02/02/2008
2 | 15 | 12/06/2008
3 | 23 | 11/12/2008
4 | 12 | 18/01/2009
5 | 12 | 03/03/2009
6 | 18 | 02/07/2009
7 | 23 | 08/09/2010
8 | 18 | 02/10/2010
9 | 21 | 30/11/2010
What I am trying to do is get the number of new clients for each year. 2008 has 3 new clients (12, 15, 23), 2009 has 1 new client (18) and 2010 has 1 new client (21).
So far I have this query, which gives me the distinct clients for each year: 3 for 2008, 2 for 2009 and 3 for 2010.
SELECT COUNT(DISTINCT client_id) FROM table GROUP BY YEAR(date)
Any help would be appreciated.

You could use a subquery to get the first year of every client_id grouped by client_id, and then count the occurrence of client_id grouped by year, so:
SELECT COUNT(client_id), YEAR_MIN FROM (
SELECT client_id, MIN(YEAR(date)) AS YEAR_MIN
FROM table
GROUP BY client_id) AS T
GROUP BY YEAR_MIN
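For anyone who wants to try the subquery approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The table name visits and column name visit_date are made up for the example, and SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it. The dates are the question's sample rows rewritten as YYYY-MM-DD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; dates stored as ISO strings.
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, client_id INTEGER, visit_date TEXT)"
)
rows = [
    (1, 12, "2008-02-02"), (2, 15, "2008-06-12"), (3, 23, "2008-12-11"),
    (4, 12, "2009-01-18"), (5, 12, "2009-03-03"), (6, 18, "2009-07-02"),
    (7, 23, "2010-09-08"), (8, 18, "2010-10-02"), (9, 21, "2010-11-30"),
]
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# Inner query: first year each client appears. Outer query: clients per first year.
query = """
SELECT year_min, COUNT(client_id) AS new_clients
FROM (
    SELECT client_id,
           MIN(CAST(strftime('%Y', visit_date) AS INTEGER)) AS year_min
    FROM visits
    GROUP BY client_id
) AS t
GROUP BY year_min
ORDER BY year_min
"""
new_clients_per_year = conn.execute(query).fetchall()
print(new_clients_per_year)
```

On the sample data this gives 3 new clients for 2008 and 1 each for 2009 and 2010, which is what the question asks for.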

So you want to count the first date a client appears in the table. In other words, the row for which no other row exists with an earlier date and the same client. You can do this with an exclusion join.
Then you can count them per year as you're doing now.
SELECT YEAR(t.date) AS yr, COUNT(t.client_id) AS client_count
FROM (
SELECT t1.client_id, t1.date
FROM mytable AS t1
LEFT JOIN mytable AS t2 ON (t1.client_id=t2.client_id AND t1.date > t2.date)
WHERE t2.client_id IS NULL) AS t
GROUP BY yr
You should store dates using the DATE data type, which uses YYYY-MM-DD format. Comparisons with > won't give you chronological order if your dates are stored as strings in DD/MM/YYYY format.
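To see why the string format matters, here is a small Python illustration (not MySQL) comparing two of the question's dates as plain strings and as real dates. The helper parse_ddmmyyyy is made up for the example. In MySQL itself, STR_TO_DATE(date, '%d/%m/%Y') can be used to convert such strings.

```python
from datetime import datetime

def parse_ddmmyyyy(text):
    """Hypothetical helper: turn a DD/MM/YYYY string into a date object."""
    return datetime.strptime(text, "%d/%m/%Y").date()

earlier = "12/06/2008"   # 12 June 2008
later = "02/07/2009"     # 2 July 2009

# As strings, the comparison is character by character, so "0..." sorts before "1...".
string_says_later_is_smaller = later < earlier

# As real dates, the comparison is chronological.
date_says_later_is_greater = parse_ddmmyyyy(later) > parse_ddmmyyyy(earlier)

# ISO (YYYY-MM-DD) strings happen to sort chronologically as well.
iso_earlier = parse_ddmmyyyy(earlier).isoformat()
iso_later = parse_ddmmyyyy(later).isoformat()
iso_strings_order_correctly = iso_later > iso_earlier

print(string_says_later_is_smaller, date_says_later_is_greater, iso_strings_order_correctly)
```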

DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,client_id INT NOT NULL
,date INT NOT NULL
);
INSERT INTO my_table VALUES
(1,12,2008),
(2,15,2008),
(3,23,2008),
(4,12,2009),
(5,12,2009),
(6,18,2009),
(7,23,2010),
(8,18,2010),
(9,21,2010);
SELECT year
, COUNT(*) total
FROM
( SELECT client_id, MIN(date) year FROM my_table GROUP BY client_id ) x
GROUP
BY year;
+------+-------+
| year | total |
+------+-------+
| 2008 | 3 |
| 2009 | 1 |
| 2010 | 1 |
+------+-------+

Related

Select last inserted value of each month for every year from DATETIME

I have a DATETIME column that stores when the values were introduced, as this example shows:
CREATE TABLE IF NOT EXISTS salary (
change_id INT(11) NOT NULL AUTO_INCREMENT,
emp_salary FLOAT(8,2),
change_date DATETIME,
PRIMARY KEY (change_id)
);
I'm going to fill the example like this:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 1 | 200.00 | 2018-06-18 13:17:17 |
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
I want to get the last inserted value of each month for every year.
So for the example I made, this should be the output of the Select:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
Row 1 won't appear because it is an outdated version of row 2.
You could use a self join to pick the group-wise maximum row. In the inner query, select the max of change_date, grouping your data by year and month:
select t.*
from your_table t
join (
select max(change_date) max_change_date
from your_table
group by date_format(change_date, '%Y-%m')
) t1
on t.change_date = t1.max_change_date
If you can use MySQL 8, which supports window functions, you could use a common table expression and the rank() function to pick the row with the highest change_date for each year and month:
with cte as(
select *,
rank() over (partition by date_format(change_date, '%Y-%m') order by change_date desc ) rnk
from your_table
)
select * from cte where rnk = 1;
The below query should work for you.
It uses group by on month and year to find max record for each month and year.
SELECT s1.*
FROM salary s1
INNER JOIN (
SELECT MAX(change_date) maxDate
FROM salary
GROUP BY MONTH(change_date), YEAR(change_date)
) s2 ON s2.maxDate = s1.change_date;
Fiddle link : http://sqlfiddle.com/#!9/1bc20b/15
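Here is a rough, self-contained way to check the join-on-MAX idea against the question's three rows, using Python's sqlite3 module instead of MySQL. SQLite lacks date_format(), so strftime('%Y-%m', ...) is used for the year-month grouping. That substitution is an assumption of this sketch, not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary (change_id INTEGER PRIMARY KEY, emp_salary REAL, change_date TEXT)"
)
conn.executemany(
    "INSERT INTO salary VALUES (?, ?, ?)",
    [
        (1, 200.0, "2018-06-18 13:17:17"),
        (2, 700.0, "2018-06-25 15:20:30"),
        (3, 300.0, "2018-07-02 12:17:17"),
    ],
)

# The derived table holds the latest change_date per year-month;
# joining back on that value keeps only the last row of each month.
query = """
SELECT s1.change_id, s1.emp_salary, s1.change_date
FROM salary AS s1
JOIN (
    SELECT MAX(change_date) AS max_date
    FROM salary
    GROUP BY strftime('%Y-%m', change_date)
) AS s2 ON s2.max_date = s1.change_date
ORDER BY s1.change_date
"""
last_per_month = conn.execute(query).fetchall()
print(last_per_month)
```

Note that if two rows share the exact same maximum timestamp within a month, this join returns both of them.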

Get the count() where created_date is cumulative and date based

I'm aware that there are several answers on SO about cumulative totals. I have experimented and have not found a solution to my problem.
Here is a sqlfiddle.
We have a contacts table with two fields, eid and create_time:
eid create_time
991772 April, 21 2016 11:34:21
989628 April, 17 2016 02:19:57
985557 April, 04 2016 09:56:39
981920 March, 30 2016 11:03:12
981111 March, 30 2016 09:36:48
I would like to select the number of new contacts in each month along with the size of our contacts database at the end of each month. New contacts by year and month is simple enough. For the size of the contacts table at the end of each month I did some research and found what looked to be a straight forwards method:
set @csum = 0;
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts,
(@csum + count(c.eid)) as cumulative_contacts
from
contacts c
group by
yr,
mth
That runs but gives me unexpected results.
If I run:
select count(*) from contacts where date(create_time) < current_date
I get the total number of records in the table, 146.
I therefore expected the final row in my query using @csum to have 146 for April 2016, but it has only 3.
What my goal is for field cumulative_contacts:
For the record with e.g. January 2016.
select count(*) from contacts where date(create_time) < '2016-02-01';
And the record for February would have:
select count(*) from contacts where date(create_time) < '2016-03-01';
And so on
Try this, a slight modification of your SQL:
CREATE TABLE IF NOT EXISTS `contacts` (
`eid` char(50) DEFAULT NULL,
`create_time` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
INSERT INTO `contacts` (`eid`, `create_time`) VALUES
('991772', '2016-04-21 11:34:21'),
('989628', '2016-04-17 02:19:57'),
('985557', '2016-04-04 09:56:39'),
('981920', '2016-03-30 11:03:12'),
('981111', '2016-03-30 09:36:48');
SET @csum = 0;
SELECT t.*, @csum:=(@csum + new_contacts) AS cumulative_contacts
FROM (
SELECT YEAR(c.create_time) AS yr, MONTH(c.create_time) AS mth, COUNT(c.eid) AS new_contacts
FROM contacts c
GROUP BY yr, mth) t
The output is:
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 3 | 2 | 2 |
| 2016 | 4 | 3 | 5 |
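The difference between the question's query and this one comes down to whether the variable is ever assigned. In the question, the variable is only read, so it stays at 0 and every row shows just that month's count (hence 3 for April). With := the variable is updated row by row. A plain Python sketch of the two behaviours, using the monthly counts from the output above:

```python
from itertools import accumulate

# Monthly counts taken from the sample output: March = 2, April = 3.
monthly_new_contacts = [((2016, 3), 2), ((2016, 4), 3)]
counts = [count for _, count in monthly_new_contacts]

# Read-only variable: it is never updated, so each row is just 0 + count.
csum = 0
without_assignment = [csum + count for count in counts]

# Variable updated on every row: a running total.
with_assignment = list(accumulate(counts))

print(without_assignment, with_assignment)
```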
This sql will get the cumulative sum and is pretty efficient. It numbers each row first and then uses that as the cumulative sum.
SELECT s1.yr, s1.mth, s1.new_contacts, s2.cummulative_contacts
FROM
(SELECT
YEAR(create_time) AS yr,
MONTH(create_time) AS mth,
COUNT(eid) AS new_contacts,
MAX(eid) AS max_eid
FROM
contacts
GROUP BY
yr,
mth
ORDER BY create_time) s1 INNER JOIN
(SELECT eid, (@sum:=@sum+1) AS cummulative_contacts
FROM
contacts INNER JOIN
(SELECT @sum := 0) r
ORDER BY create_time) s2 ON max_eid=s2.eid;
--Result sample--
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 1 | 4 | 132 |
| 2016 | 2 | 4 | 136 |
| 2016 | 3 | 7 | 143 |
| 2016 | 4 | 3 | 146 |
Try this:
Here you have a "greater than or equal" join, so each group "contains" all previous values. The "times 12" part converts the whole comparison to months. I offer this solution because it is not MySQL dependent (it can be implemented on many other DBs with minimal or no changes).
select dates.yr, dates.mth, dates.new_contacts, sum(NC.new_contacts) as cumulative_new_contacts
from (
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as dates
left join
(
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as NC
on dates.yr*12+dates.mth >= NC.yr*12+NC.mth
group by
dates.yr,
dates.mth,
dates.new_contacts -- not needed by MySql, present here for other DBs compatibility
order by 1,2
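As a quick check of the portability claim, here is a sketch of the same "greater than or equal" self join running on SQLite through Python's sqlite3 module. Only the year/month extraction changes (strftime instead of year() and month()). The five rows are the sample contacts from this thread, and the Python variable names are just for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (eid TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        ("991772", "2016-04-21 11:34:21"),
        ("989628", "2016-04-17 02:19:57"),
        ("985557", "2016-04-04 09:56:39"),
        ("981920", "2016-03-30 11:03:12"),
        ("981111", "2016-03-30 09:36:48"),
    ],
)

# New contacts per (year, month); used twice in the join below.
monthly = """
SELECT CAST(strftime('%Y', create_time) AS INTEGER) AS yr,
       CAST(strftime('%m', create_time) AS INTEGER) AS mth,
       COUNT(eid) AS new_contacts
FROM contacts
GROUP BY yr, mth
"""

# Each month is joined to itself and to every earlier month,
# so summing the joined counts yields the running total.
query = f"""
SELECT d.yr, d.mth, d.new_contacts, SUM(nc.new_contacts) AS cumulative_contacts
FROM ({monthly}) AS d
LEFT JOIN ({monthly}) AS nc
  ON d.yr * 12 + d.mth >= nc.yr * 12 + nc.mth
GROUP BY d.yr, d.mth, d.new_contacts
ORDER BY d.yr, d.mth
"""
cumulative = conn.execute(query).fetchall()
print(cumulative)
```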

How to query avg for every past 7 days in sql, MySQL?

Say I have a dataset of :
|dateid | value |
|20150101 | 1 |
|20150102 | 2 |
|20150103 | 3.1 |
|20150104 | 4.3 |
|20150105 | 3.1 |
|20150106 | 1 |
|20150107 | 1 |
|20150108 | 1 |
|.... | |
|.... | ... |
|20151001 | 10.3|
I want to query the average of every past 7 days based on a date range.
say for dateid from 20150707 and 20150730, when I select row of 20150707, I also need the average value between 20150701 and 20150707( (1+2+3.1+4.3+1+1+1+1)/7) as well as the value for 20150707(1) like:
select dateid, value, avg(value) as avg_past_7 from mytable where dateid between 20150707 and 20150730 GROUP BY every past_7days.
And when the records are less than 7 rows to count, the avg remains null.
That means if I only have records from 20150707-20150730 in the table, the past_7_day avg for 20150707/8/9/10/11/12 remains null.
Correlated sub-select:
select dateid, value, (select avg(value) from mytable t2
where t2.dateid between (DATE_SUB(date(t1.dateid),INTERVAL 6 day)+0)
and t1.dateid) as avg_past_7
from mytable t1
where dateid between 20150101 and 20150201 order by dateid;
Use DATE_SUB with an interval of 6 days, which together with the current day gives a 7-day window.
I solved the problem with:
select t1.dateid, t1.value, if(count(1)>=7,avg(t2.value),null)
from mytable t1 , mytable t2
where t2.dateid between DATE_SUB(date(t1.dateid),INTERVAL 6 day)+0 and t1.dateid and
t1.dateid between 20150105 and 20150201
group by t1.dateid ,t1.value
order by dateid;
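To make the "null when fewer than 7 rows" rule concrete, here is a small Python sketch of the same logic outside the database. It uses the first eight values from the question's table. The function name trailing_7_day_avg and the dictionary layout are assumptions made for the example.

```python
from datetime import date, timedelta

def trailing_7_day_avg(values_by_day, day):
    """Average over `day` and the 6 days before it, or None if any day is missing."""
    window = [day - timedelta(days=offset) for offset in range(7)]
    present = [values_by_day[d] for d in window if d in values_by_day]
    if len(present) < 7:
        return None  # mirrors if(count(1)>=7, avg(...), null)
    return sum(present) / 7

# Values for 20150101 .. 20150108 from the question.
values = [1, 2, 3.1, 4.3, 3.1, 1, 1, 1]
values_by_day = {date(2015, 1, 1) + timedelta(days=i): v for i, v in enumerate(values)}

avg_jan_6 = trailing_7_day_avg(values_by_day, date(2015, 1, 6))  # only 6 rows available
avg_jan_7 = trailing_7_day_avg(values_by_day, date(2015, 1, 7))
avg_jan_8 = trailing_7_day_avg(values_by_day, date(2015, 1, 8))
print(avg_jan_6, avg_jan_7, avg_jan_8)
```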

SUM a pair of COUNTs from two tables based on a time variable

Been searching for an answer to this for the better part of an hour without much luck. I have two regional tables laid out with the same column names and I can put out a result list for either table based on the following query (swap Table2 for Table1):
SELECT Table1.YEAR, FORMAT(COUNT(Table1.id),0) AS Total
FROM Table1
WHERE Table1.variable='Y'
GROUP BY Table1.YEAR
Ideally I'd like to get a result that gives me a total sum of the counts by year, so instead of:
| REGION 1 | | REGION 2 |
| YEAR | Total | | YEAR | Total |
| 2010 | 5 | | 2010 | 1 |
| 2009 | 2 | | 2009 | 3 |
| | | | 2008 | 4 |
I'd have:
| MERGED |
| YEAR | Total |
| 2010 | 6 |
| 2009 | 5 |
| 2008 | 4 |
I've tried a variety of JOINs and other ideas but I think I'm caught up on the SUM and COUNT issue. Any help would be appreciated, thanks!
SELECT `YEAR`, FORMAT(SUM(`count`), 0) AS `Total`
FROM (
SELECT `Table1`.`YEAR`, COUNT(*) AS `count`
FROM `Table1`
WHERE `Table1`.`variable` = 'Y'
GROUP BY `Table1`.`YEAR`
UNION ALL
SELECT `Table2`.`YEAR`, COUNT(*) AS `count`
FROM `Table2`
WHERE `Table2`.`variable` = 'Y'
GROUP BY `Table2`.`YEAR`
) AS `union`
GROUP BY `YEAR`
You should use a UNION:
SELECT
t.YEAR,
COUNT(*) as TOTAL
FROM (
SELECT *
FROM Table1
UNION ALL
SELECT *
FROM Table2
) t
WHERE t.variable='Y'
GROUP BY t.YEAR;
Select year, sum(counts) from (
SELECT Table1.YEAR, FORMAT(COUNT(Table1.id),0) AS Total
FROM Table1
WHERE Table1.variable='Y'
GROUP BY Table1.YEAR
UNION ALL
SELECT Table2.YEAR, FORMAT(COUNT(Table2.id),0) AS Total
FROM Table2
WHERE Table2.variable='Y'
GROUP BY Table2.YEAR ) GROUP BY year
To improve upon Shehzad's answer:
SELECT YEAR, FORMAT(SUM(counts),0) AS total FROM (
SELECT Table1.YEAR, COUNT(Table1.id) AS counts
FROM Table1
WHERE Table1.variable='Y'
GROUP BY Table1.YEAR
UNION ALL
SELECT Table2.YEAR, COUNT(Table2.id) AS counts
FROM Table2
WHERE Table2.variable='Y'
GROUP BY Table2.YEAR ) AS newTable GROUP BY YEAR
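The count-per-table, UNION ALL, then SUM pattern can be tried out with Python's sqlite3 module. This is only a sketch: the helper fill and the generated rows are made up so that the per-region counts match the question's example (plus one 'N' row per table to show the filter working), and FORMAT() is left out because SQLite does not have it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("Table1", "Table2"):
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, year INTEGER, variable TEXT)"
    )

def fill(table, counts_by_year):
    """Hypothetical helper: insert `count` rows with variable='Y' for each year."""
    for year, count in counts_by_year.items():
        conn.executemany(
            f"INSERT INTO {table} (year, variable) VALUES (?, 'Y')", [(year,)] * count
        )
    # One row that the WHERE clause should exclude.
    conn.execute(f"INSERT INTO {table} (year, variable) VALUES (2010, 'N')")

fill("Table1", {2010: 5, 2009: 2})           # region 1
fill("Table2", {2010: 1, 2009: 3, 2008: 4})  # region 2

# Count per year in each table, stack the results, then add them up per year.
query = """
SELECT year, SUM(counts) AS total
FROM (
    SELECT year, COUNT(id) AS counts FROM Table1 WHERE variable = 'Y' GROUP BY year
    UNION ALL
    SELECT year, COUNT(id) AS counts FROM Table2 WHERE variable = 'Y' GROUP BY year
) AS merged
GROUP BY year
ORDER BY year DESC
"""
merged_totals = conn.execute(query).fetchall()
print(merged_totals)
```

UNION ALL matters here: a plain UNION removes duplicate rows, so if both regions produced the same (year, count) pair, one of them would be dropped before the SUM.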

MySQL grouping by date range with multiple joins

I currently have quite a messy query, which joins data from multiple tables involving two subqueries. I now have a requirement to group this data by DAY(), WEEK(), MONTH(), and QUARTER().
I have three tables: days, qos and employees. An employee is self-explanatory, a day is a summary of an employee's performance on a given day, and qos is a random quality inspection, which can be performed many times a day.
At the moment, I am selecting all employees, and LEFT JOINing day and qos, which works well. However, now, I need to group the data in order to breakdown a team or individual's performance over a date range.
Taking this data:
Employee
id | name
------------------
1 | Bob Smith
Day
id | employee_id | day_date | calls_taken
---------------------------------------------
1 | 1 | 2011-03-01 | 41
2 | 1 | 2011-03-02 | 24
3 | 1 | 2011-04-01 | 35
Qos
id | employee_id | qos_date | score
----------------------------------------
1 | 1 | 2011-03-03 | 85
2 | 1 | 2011-03-03 | 95
3 | 1 | 2011-04-01 | 91
If I were to start by grouping by DAY(), I would need to see the following results:
Day__date | Day__Employee__id | Day__calls | Day__qos_score
------------------------------------------------------------
2011-03-01 | 1 | 41 | NULL
2011-03-02 | 1 | 24 | NULL
2011-03-03 | 1 | NULL | 90
2011-04-01 | 1 | 35 | 91
As you see, Day__calls should be SUM(calls_taken) and Day__qos_score is AVG(score). I've tried using a similar method as above, but as the date isn't known until one of the tables has been joined, it's only displaying a record where there's a day saved.
Is there any way of doing this, or am I going about things the wrong way?
Edit: As requested, here's what I've come up with so far. However, it only shows dates where there's a day.
SELECT COALESCE(`day`.day_date, qos.qos_date) AS Day__date,
employee.id AS Day__Employee__id,
`day`.calls_taken AS Day__Day__calls,
qos.score AS Day__Qos__score
FROM faults_employees `employee`
LEFT JOIN (SELECT `day`.employee_id AS employee_id,
SUM(`day`.calls_taken) AS `calls_in`,
FROM faults_days AS `day`
WHERE employee.id = 7
GROUP BY (`day`.day_date)
) AS `day`
ON `day`.employee_id = `employee`.id
LEFT JOIN (SELECT `qos`.employee_id AS employee_id,
AVG(qos.score) AS `score`
FROM faults_qos qos
WHERE employee.id = 7
GROUP BY (qos.qos_date)
) AS `qos`
ON `qos`.employee_id = `employee`.id AND `qos`.qos_date = `day`.day_date
WHERE employee.id = 7
GROUP BY Day__date
ORDER BY `day`.day_date ASC
The solution I'm coming up with looks like:
SELECT
`date`,
`employee_id`,
SUM(`union`.`calls_taken`) AS `calls_taken`,
AVG(`union`.`score`) AS `score`
FROM ( -- select from union table
(SELECT -- first select all calls taken, leaving qos_score null
`day`.`day_date` AS `date`,
`day`.`employee_id`,
`day`.`calls_taken`,
NULL AS `score`
FROM `employee`
LEFT JOIN
`day`
ON `day`.`employee_id` = `employee`.`id`
)
UNION ALL -- union both tables; ALL keeps duplicate rows so SUM() and AVG() see every record
(
SELECT -- now select qos score, leaving calls taken null
`qos`.`qos_date` AS `date`,
`qos`.`employee_id`,
NULL AS `calls_taken`,
`qos`.`score`
FROM `employee`
LEFT JOIN
`qos`
ON `qos`.`employee_id` = `employee`.`id`
)
) `union`
GROUP BY `union`.`date`, `union`.`employee_id` -- group union table by date and employee
For the UNION to work, we have to set the qos_score field in the day table and the calls_taken field in the qos table to null. If we don't, both calls_taken and score would be selected into the same column by the UNION statement.
After this, I selected the required fields with the aggregation functions SUM() and AVG() from the union'd table, grouping by the date field in the union table.
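Here is a cut-down version of the idea that can be run against the question's sample data with Python's sqlite3 module. It is a sketch with a few assumptions: the join to the employee table is skipped, the alias event_date is made up, UNION ALL is used so that identical rows are not merged before aggregating, and the grouping includes employee_id so it would still work with more than one employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day (id INTEGER PRIMARY KEY, employee_id INTEGER, day_date TEXT, calls_taken INTEGER);
CREATE TABLE qos (id INTEGER PRIMARY KEY, employee_id INTEGER, qos_date TEXT, score INTEGER);
INSERT INTO day VALUES (1, 1, '2011-03-01', 41), (2, 1, '2011-03-02', 24), (3, 1, '2011-04-01', 35);
INSERT INTO qos VALUES (1, 1, '2011-03-03', 85), (2, 1, '2011-03-03', 95), (3, 1, '2011-04-01', 91);
""")

# Each half of the union fills the column it does not have with NULL,
# so both halves share the same four columns and can be aggregated together.
query = """
SELECT u.event_date, u.employee_id, SUM(u.calls_taken) AS calls, AVG(u.score) AS qos_score
FROM (
    SELECT day_date AS event_date, employee_id, calls_taken, NULL AS score FROM day
    UNION ALL
    SELECT qos_date AS event_date, employee_id, NULL AS calls_taken, score FROM qos
) AS u
GROUP BY u.event_date, u.employee_id
ORDER BY u.event_date
"""
per_day = conn.execute(query).fetchall()
for row in per_day:
    print(row)
```

SUM() and AVG() ignore NULLs, which is why a date that only has qos rows comes out with NULL calls and a date that only has day rows comes out with a NULL score. To group by week, month or quarter instead, the date expression in the SELECT and GROUP BY would be swapped for the corresponding period expression.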