Count specific fields with inner join - mysql

I have the following schema:
create table myapp_task
(
title varchar(100) not null,
state varchar(11) not null,
estimate date not null,
my_id int not null auto_increment
primary key,
road_map_id int not null,
create_date date not null,
constraint myapp_task_road_map_id_5e114978_fk_myapp_roadmap_rd_id
foreign key (road_map_id) references myapp_roadmap (rd_id)
);
-- auto-generated definition
create table myapp_roadmap
(
rd_id int not null auto_increment
primary key,
name varchar(50) not null
);
I want to get the week number, the beginning and end of the week of create_date, the number of all tasks, and the number of ready tasks (state = 'ready/in_progress').
Here is my query:
select DISTINCT week(create_date, 1) as week,
SUBDATE(create_date, WEEKDAY(create_date)) as beginofweek,
DATE(create_date + INTERVAL (6 - WEEKDAY(create_date)) DAY) as endofweek,
SUM(state) as number,
SUM(state = 'ready') as ready
from myapp_task
inner join myapp_roadmap
on myapp_task.road_map_id = myapp_roadmap.rd_id;
Actually, I have a problem only with count of ready tasks.

I think you are close:
select week(create_date, 1) as week,
SUBDATE(create_date, WEEKDAY(create_date)) as beginofweek,
DATE(create_date + INTERVAL (6 - WEEKDAY(create_date)) DAY) as endofweek,
count(state) as number,
SUM(CASE WHEN state = 'ready' THEN 1 ELSE 0 END) as ready,
SUM(CASE WHEN state = 'in_progress' THEN 1 ELSE 0 END) as in_progress
FROM myapp_task inner join myapp_roadmap
on myapp_task.road_map_id = myapp_roadmap.rd_id
GROUP BY week, beginofweek, endofweek
Using a CASE expression you can add up just the states that are ready or in_progress separately. Furthermore, the addition of a GROUP BY ensures that the counts are per week. Without the GROUP BY, the aggregates collapse the whole table into a single row and older MySQL versions take the week columns from an arbitrary row, so don't let it guess at what you want here. Also, MySQL 5.7+ enables ONLY_FULL_GROUP_BY by default, so a query like this written without a GROUP BY will raise an error there.
Also got rid of that DISTINCT modifier. Thanks @AaronDietz
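The conditional aggregation above can be tried out without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, so the week start is computed with SQLite date modifiers rather than SUBDATE/WEEKDAY, and the sample rows are made up for illustration:

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myapp_task (title TEXT, state TEXT, create_date TEXT)")
conn.executemany(
    "INSERT INTO myapp_task VALUES (?, ?, ?)",
    [
        ("a", "ready", "2017-02-27"),        # Monday
        ("b", "in_progress", "2017-03-01"),
        ("c", "ready", "2017-03-05"),        # Sunday of the same week
        ("d", "done", "2017-03-06"),         # Monday of the next week
        ("e", "ready", "2017-03-07"),
    ],
)

# 'weekday 0' moves forward to Sunday (if needed); '-6 days' then gives Monday.
weekly_counts = conn.execute(
    """
    SELECT date(create_date, 'weekday 0', '-6 days') AS begin_of_week,
           COUNT(*) AS number,
           SUM(CASE WHEN state = 'ready' THEN 1 ELSE 0 END) AS ready,
           SUM(CASE WHEN state = 'in_progress' THEN 1 ELSE 0 END) AS in_progress
    FROM myapp_task
    GROUP BY begin_of_week
    ORDER BY begin_of_week
    """
).fetchall()

for row in weekly_counts:
    print(row)
```

Each output row holds the Monday of the week, the total number of tasks, and the two per-state counts.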

You should look up the use of aggregate functions.
COUNT is the function that returns the number of rows. One way to get both the total number of tasks and the number of those equal to 'ready' is to join the task table a second time with a different join condition.
The columns that are not aggregated then need to be included in a GROUP BY clause.
select week(r1.create_date, 1) as week,
SUBDATE(r1.create_date, WEEKDAY(r1.create_date)) as beginofweek,
DATE(r1.create_date + INTERVAL (6 - WEEKDAY(r1.create_date)) DAY) as endofweek,
COUNT(r1.state) AS number,
COUNT(r2.state) AS ready
from myapp_roadmap inner join myapp_task r1
on r1.road_map_id = myapp_roadmap.rd_id
left join myapp_task r2
on r2.my_id = r1.my_id and r2.state = 'ready'
group by week, beginofweek, endofweek

Related

MySQL Attendance Calculation

Here I have a table attendances.
I need the result as shown below.
How can I achieve this in MySQL without using any programming language?
The SQL file is Attendances.sql.
We can try a pivot query approach, aggregating by user and date:
SELECT
user_id,
DATE(date_time) AS date,
TIMESTAMPDIFF(MINUTE,
MAX(CASE WHEN status = 'IN' THEN date_time END),
MAX(CASE WHEN status = 'OUT' THEN date_time END)) / 60.0 AS hours
FROM yourTable
GROUP BY
user_id,
DATE(date_time);
The caveats of this answer are many. It assumes that there is only one IN and one OUT entry per user per day. If a period could cross over dates, then my answer might not generate correct results. Also, if an IN or OUT value is missing, then NULL is reported for the hours value.
I achieved it myself by creating a MySQL function and a view.
MySQL view
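To see the pivot and its NULL caveat in action, here is a small sketch using Python's sqlite3 module in place of MySQL. TIMESTAMPDIFF is replaced by a difference of strftime('%s', ...) epoch seconds, which is an SQLite substitute and not part of the original answer, and the rows are invented:

```python
import sqlite3

# SQLite stands in for MySQL; the attendance rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendances (user_id INTEGER, date_time TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO attendances VALUES (?, ?, ?)",
    [
        (1, "2017-03-01 09:00:00", "IN"),
        (1, "2017-03-01 17:30:00", "OUT"),
        (2, "2017-03-01 10:00:00", "IN"),  # no OUT row, so hours becomes NULL
    ],
)

# MAX(CASE ...) pivots the IN and OUT rows of one day into a single row.
hours_per_day = conn.execute(
    """
    SELECT user_id,
           date(date_time) AS day,
           (CAST(strftime('%s', MAX(CASE WHEN status = 'OUT' THEN date_time END)) AS INTEGER)
            - CAST(strftime('%s', MAX(CASE WHEN status = 'IN' THEN date_time END)) AS INTEGER)
           ) / 3600.0 AS hours
    FROM attendances
    GROUP BY user_id, date(date_time)
    ORDER BY user_id
    """
).fetchall()

for row in hours_per_day:
    print(row)
```

User 1 gets 8.5 hours, while user 2 has no OUT entry and therefore gets None (NULL) for hours.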
CREATE OR REPLACE VIEW `view_attendances` AS
SELECT
`a`.`id` AS `a1_id`,
`a`.`user_id` AS `user_id`,
CAST(`a`.`date_time` AS DATE) AS `date`,
`a`.`date_time` AS `in`,
`a2`.`id` AS `a2_id`,
`a2`.`date_time` AS `out`,
(TIMESTAMPDIFF(SECOND,
`a`.`date_time`,
`a2`.`date_time`) / 3600) AS `hours`
FROM
(`attendances` `a`
JOIN `attendances` `a2` ON (((`a`.`is_confirm` = 1)
AND (`a`.`status` = 'IN')
AND (`a2`.`id` = FN_NEXT_OUT_ATTENDANCE_ID(`a`.`user_id`, `a`.`date_time`, `a`.`status`))
AND (a2.status = 'OUT')
AND (CAST(`a`.`date_time` AS DATE) = CAST(`a2`.`date_time` AS DATE)))))
MySQL function
CREATE FUNCTION `fn_next_out_attendance_id`( _user_id INT, _attendance_date_time DATETIME, _status VARCHAR(10) ) RETURNS int(11)
BEGIN
DECLARE _id INT(11);
SELECT
id INTO _id
FROM
attendances
WHERE
is_confirm = 1
AND user_id = _user_id
AND date_time > _attendance_date_time
AND `status` <> _status
ORDER BY
date_time ASC LIMIT 1 ;
RETURN if (_id IS NULL, 0, _id);
END

Change db schema or query to return balance for a given period of time

I have come up with the following schema:
CREATE TABLE products
(
id INT(10) UNSIGNED AUTO_INCREMENT NOT NULL,
name VARCHAR(255) NOT NULL,
quantity INT(10) UNSIGNED NOT NULL,
purchase_price DECIMAL(8,2) NOT NULL,
sell_price DECIMAL(8,2) NOT NULL,
provider VARCHAR(255) NULL,
created_at TIMESTAMP NULL,
PRIMARY KEY (id)
);
# payment methods = {
# "0": "CASH",
# "1": "CREDIT CARD",
# ...
# }
CREATE TABLE orders
(
id INT(10) UNSIGNED AUTO_INCREMENT NOT NULL,
product_id INT(10) UNSIGNED NOT NULL,
quantity INT(10) UNSIGNED NOT NULL,
payment_method INT(11) NOT NULL DEFAULT 0,
created_at TIMESTAMP NULL,
PRIMARY KEY (id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
# status = {
# "0": "PENDING"
# "1": "PAID"
# }
CREATE TABLE invoices
(
id INT(10) UNSIGNED AUTO_INCREMENT NOT NULL,
price INT(10) UNSIGNED NOT NULL,
status INT(10) UNSIGNED NOT NULL DEFAULT 0,
created_at TIMESTAMP NULL,
PRIMARY KEY (id)
);
# payment methods = {
# "0": 'CASH',
# "1": 'CREDIT CARD',
# ...
# }
CREATE TABLE bills
(
id INT(10) UNSIGNED AUTO_INCREMENT NOT NULL,
name VARCHAR(255) NOT NULL,
payment_method INT(10) UNSIGNED NOT NULL DEFAULT 0,
price DECIMAL(8,2) NOT NULL,
created_at TIMESTAMP NULL,
PRIMARY KEY (id)
);
And the following query to select a balance:
SELECT ((orders + invoices) - bills) as balance
FROM
(
SELECT SUM(p.sell_price * o.quantity) as orders
FROM orders o
JOIN products p
ON o.product_id = p.id
) orders,
(
SELECT SUM(price) as invoices
FROM invoices
WHERE status = 1
) invoices,
(
SELECT SUM(price) as bills
FROM bills
) bills;
It's working and returning the right balance, but I want to create a chart using Morris.js, so I need to change it to return a daily or monthly balance for a given period of time in this format:
Daily (2017-02-27 to 2017-03-01)
balance | created_at
--------------------------
600.00 | 2017-03-01
50.00 | 2017-02-28
450.00 | 2017-02-27
And monthly (2017-01 to 2017-03)
balance | created_at
--------------------------
200.00 | 2017-03
250.00 | 2017-02
350.00 | 2017-01
What do I need to change in my schema or query to return results in this way?
http://sqlfiddle.com/#!9/2289a9/2
Any hints are welcomed. Thanks in advance
Include the created_at date in the SELECT list and a GROUP BY clause in each query.
Ditch the old school comma operator for the join operation, and replace it with a LEFT JOIN.
To return dates for which there are no orders (or no payments, or no invoices) we need a separate row source that is guaranteed to return the date values. As an example, we could use an inline view:
SELECT d.created_dt
FROM ( SELECT '2017-02-27' + INTERVAL 0 DAY AS created_dt
UNION ALL SELECT '2017-02-28'
UNION ALL SELECT '2017-03-01'
) d
ORDER BY d.created_dt
The inline view is just an option. If we had a calendar table that contains rows for the three dates we're interested in, we could make use of that instead. What's important is that we have a query that is guaranteed to return to us exactly three rows with the distinct created_at date values we want to return.
Once we have that, we can add a LEFT JOIN to get the value of "bills" for that date.
SELECT d.created_dt
, b.bills
FROM ( SELECT '2017-02-27' + INTERVAL 0 DAY AS created_dt
UNION ALL SELECT '2017-02-28'
UNION ALL SELECT '2017-03-01'
) d
LEFT
JOIN ( SELECT DATE(bills.created_at) AS created_dt
, SUM(bills.price) AS bills
FROM bills
WHERE bills.created_at >= '2017-02-27'
AND bills.created_at < '2017-03-01' + INTERVAL 1 DAY
GROUP BY DATE(bills.created_at)
) b
ON b.created_dt = d.created_dt
ORDER BY d.created_dt
Extending that to add another LEFT JOIN, to get invoices
SELECT d.created_dt
, i.invoices
, b.bills
FROM ( SELECT '2017-02-27' + INTERVAL 0 DAY AS created_dt
UNION ALL SELECT '2017-02-28'
UNION ALL SELECT '2017-03-01'
) d
LEFT
JOIN ( SELECT DATE(bills.created_at) AS created_dt
, SUM(bills.price) AS bills
FROM bills
WHERE bills.created_at >= '2017-02-27'
AND bills.created_at < '2017-03-01' + INTERVAL 1 DAY
GROUP BY DATE(bills.created_at)
) b
ON b.created_dt = d.created_dt
LEFT
JOIN ( SELECT DATE(invoices.created_at) AS created_dt
, SUM(invoices.price) AS invoices
FROM invoices
WHERE invoices.status = 1
AND invoices.created_at >= '2017-02-27'
AND invoices.created_at < '2017-03-01' + INTERVAL 1 DAY
GROUP BY DATE(invoices.created_at)
) i
ON i.created_dt = d.created_dt
ORDER BY d.created_dt
Similarly, we can add a LEFT JOIN to another inline view that returns total orders grouped by DATE(created_at).
It's important that the inline views return distinct values of created_dt, a single row for each date value.
Note that for dev, test and debugging, we can independently execute just the inline view queries.
When a matching row is not returned from a LEFT JOIN, for example no matching row returned from i because there were no invoices on that date, the query is going to return a NULL for the expression i.invoices. To replace the NULL with a zero, we can use the IFNULL function, or the more ANSI standard COALESCE function. For example:
SELECT d.created_dt
, IFNULL(i.invoices,0) AS invoices
, COALESCE(b.bills,0) AS bills
FROM ...
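Putting the daily pieces together, the balance per date could be assembled roughly as follows. This is only a sketch run against SQLite through Python's sqlite3 module rather than MySQL: the tables are cut down to the columns the query needs, the sample rows are invented, the column names order_total, invoice_total and bill_total are made up for the example, and the date-range filters on created_at are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, sell_price REAL);
    CREATE TABLE orders (product_id INTEGER, quantity INTEGER, created_at TEXT);
    CREATE TABLE invoices (price INTEGER, status INTEGER, created_at TEXT);
    CREATE TABLE bills (price REAL, created_at TEXT);

    -- Made-up sample rows for illustration only.
    INSERT INTO products VALUES (1, 10.0);
    INSERT INTO orders VALUES (1, 5, '2017-02-27 10:00:00'), (1, 2, '2017-03-01 09:00:00');
    INSERT INTO invoices VALUES (100, 1, '2017-02-27 12:00:00'),
                                (50, 0, '2017-02-28 12:00:00'),
                                (30, 1, '2017-03-01 08:00:00');
    INSERT INTO bills VALUES (20.0, '2017-02-27 15:00:00'), (40.0, '2017-02-28 11:00:00');
    """
)

# The inline view d guarantees one row per date; each LEFT JOIN adds one total.
daily_balance = conn.execute(
    """
    SELECT d.created_dt,
           COALESCE(o.order_total, 0) + COALESCE(i.invoice_total, 0)
             - COALESCE(b.bill_total, 0) AS balance
    FROM (SELECT '2017-02-27' AS created_dt
          UNION ALL SELECT '2017-02-28'
          UNION ALL SELECT '2017-03-01') d
    LEFT JOIN (SELECT date(ord.created_at) AS created_dt,
                      SUM(p.sell_price * ord.quantity) AS order_total
               FROM orders ord JOIN products p ON p.id = ord.product_id
               GROUP BY date(ord.created_at)) o ON o.created_dt = d.created_dt
    LEFT JOIN (SELECT date(created_at) AS created_dt, SUM(price) AS invoice_total
               FROM invoices WHERE status = 1
               GROUP BY date(created_at)) i ON i.created_dt = d.created_dt
    LEFT JOIN (SELECT date(created_at) AS created_dt, SUM(price) AS bill_total
               FROM bills
               GROUP BY date(created_at)) b ON b.created_dt = d.created_dt
    ORDER BY d.created_dt
    """
).fetchall()

for row in daily_balance:
    print(row)
```

The pending invoice (status 0) on 2017-02-28 is ignored, and the dates without orders or bills still produce a row because the missing totals are replaced by zero.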
To get the results monthly, we'd need a calendar query that returns one row per month. Let's assume we're going to return a DATE value which is the first day of the month. For example:
SELECT d.created_month
FROM ( SELECT '2017-02-01' + INTERVAL 0 DAY AS created_month
UNION ALL SELECT '2017-03-01'
) d
ORDER BY d.created_month
The inline view queries will need to GROUP BY created_month, so they return a single value for each month value. My preference would be to use a DATE_FORMAT function to return the first day of the month, derived from created_at. But there are other ways to do it. The goal is to return a single row for '2017-02-01' and a single row for '2017-03-01'. Note that the date ranges on created_at extend from '2017-02-01' up to (but not including) '2017-04-01', so we get the total for the whole month.
( SELECT DATE_FORMAT(bills.created_at,'%Y-%m-01') AS created_month
, SUM(bills.price) AS bills
FROM bills
WHERE bills.created_at >= '2017-02-01'
AND bills.created_at < '2017-03-01' + INTERVAL 1 MONTH
GROUP BY DATE_FORMAT(bills.created_at,'%Y-%m-01')
) b

MySQL - Group By Year using 2 columns & Count

I have a table like this:
ID | StartDate  | EndDate
----------------------------
1  | 05/01/2012 | 02/03/2013
2  | 06/30/2013 | 07/12/2013
3  | 02/17/2010 | 02/17/2013
4  | 12/10/2012 | 11/16/2013
I'm trying to get a count of the IDs that were active during each year. If an ID was active for multiple years, it would be counted multiple times. I don't want to "hardcode" years into my query because the data spans many years (i.e. I can't use CASE YEAR(StartDate) WHEN x THEN y or IF...).
Desired Result from the table above:
YEAR | COUNT
------------
2010 | 1
2011 | 1
2012 | 3
2013 | 4
I've tried:
SELECT COUNT(ID)
FROM table
WHERE (DATE_FORMAT(StartDate, '%Y-%m') BETWEEN '2013-01' AND '2013-12'
OR DATE_FORMAT(EndDate, '%Y-%m') BETWEEN '2013-01' AND '2013-12')
Of course, this only covers the year 2013. I also tried:
SELECT YEAR(StartDate) AS 'Start Year', YEAR(EndDate) AS 'End Year', COUNT(id)
FROM table
WHERE StartDate IS NOT NULL
GROUP BY YEAR(StartDate);
This, however, gave me just those that started in a given year.
Assuming that there is an auxiliary table that contains consecutive numbers from 1 to X (where X must be greater than the possible number of years in the table):
create table series( x int primary key auto_increment );
insert into series( x )
select null from information_schema.tables;
then the query might look like:
SELECT years.year, count(*)
FROM (
SELECT mm.min_year + s.x - 1 as year
FROM (
SELECT min( year( start_date )) min_year,
max( year( end_date )) max_year
FROM tab
) mm
JOIN series s
ON s.x <= mm.max_year - mm.min_year + 1
GROUP BY mm.min_year + s.x - 1
) years
JOIN tab
ON years.year between year( tab.start_date )
and year( tab.end_date )
GROUP BY years.year
;
see a demo: http://www.sqlfiddle.com/#!2/f49ab/14
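The same idea can be checked against the sample rows from the question. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, so YEAR(...) is replaced by CAST(strftime('%Y', ...) AS INTEGER), the dates are stored in ISO format, and the series table is simply filled with the numbers 1 to 50 (an arbitrary upper bound chosen for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Auxiliary numbers table; 50 is an arbitrary bound that exceeds the year span.
conn.execute("CREATE TABLE series (x INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO series (x) VALUES (?)", [(n,) for n in range(1, 51)])

# The four rows from the question, with dates rewritten as YYYY-MM-DD.
conn.execute("CREATE TABLE tab (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (1, "2012-05-01", "2013-02-03"),
        (2, "2013-06-30", "2013-07-12"),
        (3, "2010-02-17", "2013-02-17"),
        (4, "2012-12-10", "2013-11-16"),
    ],
)

# Build one row per year between the min and max year, then join every
# record whose start/end years enclose that year.
active_per_year = conn.execute(
    """
    SELECT years.yr, COUNT(*)
    FROM (
        SELECT mm.min_year + s.x - 1 AS yr
        FROM (
            SELECT MIN(CAST(strftime('%Y', start_date) AS INTEGER)) AS min_year,
                   MAX(CAST(strftime('%Y', end_date) AS INTEGER)) AS max_year
            FROM tab
        ) mm
        JOIN series s ON s.x <= mm.max_year - mm.min_year + 1
    ) years
    JOIN tab ON years.yr BETWEEN CAST(strftime('%Y', tab.start_date) AS INTEGER)
                             AND CAST(strftime('%Y', tab.end_date) AS INTEGER)
    GROUP BY years.yr
    ORDER BY years.yr
    """
).fetchall()

for row in active_per_year:
    print(row)
```

The counts match the desired result in the question: one active ID in 2010 and 2011, three in 2012 and four in 2013.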

Check Status of the Duplicate Records

Let's say we have a table named record with 4 fields
id (INT 11 AUTO_INC)
email (VAR 50)
timestamp (INT 11)
status (INT 1)
And the table contains the following data
Now we can see that the email address test@example.com was duplicated 4 times (the record with the lowest timestamp is the original one and all copies after that are duplicates). I can easily count the number of unique records using
SELECT COUNT(DISTINCT email) FROM record
I can also easily find out which email address was duplicated how many times using
SELECT email, count(id) FROM record GROUP BY email HAVING COUNT(id)>1
But now the business question is
How many times STATUS was 1 on all the Duplicate Records?
For example:
For test@example.com there was no duplicate record having status 1
For second@example.com there was 1 duplicate record having status 1
For third@example.com there was 1 duplicate record having status 1
For four@example.com there was no duplicate record having status 1
For five@example.com there were 2 duplicate records having status 1
So the sum of all the numbers is 0 + 1 + 1 + 0 + 2 = 4
Which means there were 4 Duplicate records which had status = 1 In table
Question
How many Duplicate records have status = 1 ?
This is a new solution that works better. It removes the first entry for each email and then sums the status of the rest. It's not easy to read; if possible I would write this in a stored procedure, but this works.
select sum(status)
from dude d1
join (select email,
min(ts) as ts
from dude
group by email) mins
using (email)
where d1.ts != mins.ts;
sqlfiddle
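As a quick check of that query, here is a sketch using Python's sqlite3 module instead of MySQL. The table is called record and the timestamp column ts, and the rows are invented, since the data from the question is not available:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (id INTEGER PRIMARY KEY, email TEXT, ts INTEGER, status INTEGER)"
)
conn.executemany(
    "INSERT INTO record (email, ts, status) VALUES (?, ?, ?)",
    [
        ("a@example.com", 1, 1),  # original row, not a duplicate
        ("a@example.com", 2, 0),
        ("a@example.com", 3, 1),  # duplicate with status 1
        ("b@example.com", 1, 0),  # original row, not a duplicate
        ("b@example.com", 2, 1),  # duplicate with status 1
        ("b@example.com", 3, 1),  # duplicate with status 1
        ("c@example.com", 5, 1),  # never duplicated
    ],
)

# Drop the earliest row per email, then sum the status of what is left.
(duplicates_with_status_1,) = conn.execute(
    """
    SELECT SUM(d1.status)
    FROM record d1
    JOIN (SELECT email, MIN(ts) AS ts FROM record GROUP BY email) mins USING (email)
    WHERE d1.ts != mins.ts
    """
).fetchone()

print(duplicates_with_status_1)
```

Note that this relies on status only ever being 0 or 1, so that summing it is the same as counting the rows with status 1.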
original answer below
Your own query to find "which email address was duplicated how many times using"
SELECT email,
count(id) as duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
can easily be modified to answer "How many Duplicate records have status = 1"
SELECT email,
count(id) as duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
Both of these queries count the original row too, so the result is actually "duplicates including the original one". You can subtract 1 from the counts if the original one always has status 1.
SELECT email,
count(id) -1 as true_duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
SELECT email,
count(id) -1 as true_duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
If I am not wrong in understanding then your query should be
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
First we need to get the minimum timestamp and then find duplicate records that are inserted after this timestamp and having status 1.
If you want the total sum then the query is
SELECT SUM( `tot` ) AS `duplicatesWithStatus1`
FROM (
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
) AS t
Hope this is what you want
You can get the count of duplicate records that have status = 1 by
select count(*) as Duplicate_Record_Count
from (select r.email, r.status
from record r
where r.status=1
group by r.email,r.status
having count(r.email)>1 ) t1
The following query will return the duplicate email with status 1 count and timestamp
select r.email,count(*)-1 as Duplicate_Count,min(r.timestamp) as timestamp
from record r
where r.status=1
group by r.email
having count(r.email)>1

more efficient group by for query with Case

I have the following query building a recordset which is used in a pie-chart as a report.
It's not run particularly often, but when it does it takes several seconds, and I'm wondering if there's any way to make it more efficient.
SELECT
CASE
WHEN (lastStatus IS NULL) THEN 'Unused'
WHEN (attempts > 3 AND callbackAfter IS NULL) THEN 'Max Attempts Reached'
WHEN (callbackAfter IS NOT NULL AND callbackAfter > DATE_ADD(NOW(), INTERVAL 7 DAY)) THEN 'Call Back After 7 Days'
WHEN (callbackAfter IS NOT NULL AND callbackAfter <= DATE_ADD(NOW(), INTERVAL 7 DAY)) THEN 'Call Back Within 7 Days'
WHEN (archived = 0) THEN 'Call Back Within 7 Days'
ELSE 'Spoke To'
END AS statusSummary,
COUNT(leadId) AS total
FROM
CO_Lead
WHERE
groupId = 123
AND
deleted = 0
GROUP BY
statusSummary
ORDER BY
total DESC;
I have an index for (groupId, deleted), but I'm not sure it would help to add any of the other fields into the index (if it would, how do I decide which should go first? callbackAfter because it's used the most?)
The table has about 500,000 rows (but will have 10 times that a year from now.)
The only other thing I could think of was to split it out into 6 queries (with the WHEN clause moved into the WHERE), but that makes it take 3 times as long.
EDIT:
Here's the table definition
CREATE TABLE CO_Lead (
objectId int UNSIGNED NOT NULL AUTO_INCREMENT,
groupId int UNSIGNED NOT NULL,
numberToCall varchar(20) NOT NULL,
firstName varchar(100) NOT NULL,
lastName varchar(100) NOT NULL,
attempts tinyint NOT NULL default 0,
callbackAfter datetime NULL,
lastStatus varchar(30) NULL,
createdDate datetime NOT NULL,
archived bool NOT NULL default 0,
deleted bool NOT NULL default 0,
PRIMARY KEY (
objectId
)
) ENGINE = InnoDB;
ALTER TABLE CO_Lead ADD CONSTRAINT UQIX_CO_Lead UNIQUE INDEX (
objectId
);
ALTER TABLE CO_Lead ADD INDEX (
groupId,
archived,
deleted,
callbackAfter,
attempts
);
ALTER TABLE CO_Lead ADD INDEX (
groupId,
deleted,
createdDate,
lastStatus
);
ALTER TABLE CO_Lead ADD INDEX (
firstName
);
ALTER TABLE CO_Lead ADD INDEX (
lastName
);
ALTER TABLE CO_Lead ADD INDEX (
lastStatus
);
ALTER TABLE CO_Lead ADD INDEX (
createdDate
);
Notes:
If leadId cannot be NULL, then change the COUNT(leadId) to COUNT(*). They are logically equivalent, but most versions of the MySQL optimizer are not clever enough to recognize that.
Remove the two redundant callbackAfter IS NOT NULL conditions. If callbackAfter satisfies the second part, it cannot be null anyway.
You could benefit from splitting the query into 6 parts and add appropriate indexes for each one - but depending on whether the conditions at the CASE are overlapping or not, you may have wrong or correct results.
A possible rewrite (mind the different format and check if this returns the same results, it may not!)
SELECT
cnt1 AS "Unused"
, cnt2 AS "Max Attempts Reached"
, cnt3 AS "Call Back After 7 Days"
, cnt4 AS "Call Back Within 7 Days"
, cnt5 AS "Call Back Within 7 Days"
, cnt6 - (cnt1+cnt2+cnt3+cnt4+cnt5) AS "Spoke To"
FROM
( SELECT
( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND lastStatus IS NULL
) AS cnt1
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND attempts > 3 AND callbackAfter IS NULL
) AS cnt2
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND callbackAfter > DATE_ADD(NOW(), INTERVAL 7 DAY)
) AS cnt3
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND callbackAfter <= DATE_ADD(NOW(), INTERVAL 7 DAY)
) AS cnt4
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
AND archived = 0
) AS cnt5
, ( SELECT COUNT(*) FROM CO_Lead
WHERE groupId = 123 AND deleted = 0
) AS cnt6
) AS tmp ;
If it does return correct results, you could add indexes to be used for each one of the subqueries:
For subquery 1: (groupId, deleted, lastStatus)
For subquery 2, 3, 4: (groupId, deleted, callbackAfter, attempts)
For subquery 5: (groupId, deleted, archived)
Another approach would be to keep the query you have (minding only notes 1 and 2 above) and add a wide covering index:
(groupId, deleted, lastStatus, callbackAfter, attempts, archived)
Try removing the index to see if this improves the performance.
Indexes do not necessarily improve performance. If an index matches the WHERE clause, MySQL will often use it. In this case, that means it reads the index and then has to read the data from each page. Those page reads are random rather than sequential, and this random reading can reduce performance on a query that has to read all the pages anyway.