How to join two tables with average function and where clause? SQL - mysql

I have two tables below with the following information
project.analytics
| proj_id | list_date | state
| 1 | 03/05/10 | CA
| 2 | 04/05/10 | WA
| 3 | 03/05/10 | WA
| 4 | 04/05/10 | CA
| 5 | 03/05/10 | WA
| 6 | 04/05/10 | CA
employees.analytics
| employee_id | proj_id | worked_date
| 20 | 1 | 3/12/10
| 30 | 1 | 3/11/10
| 40 | 2 | 4/15/10
| 50 | 3 | 3/16/10
| 60 | 3 | 3/17/10
| 70 | 4 | 4/18/10
What query can I write to determine the average number of unique employees who have worked on the project in the first 7 days that it was listed by month and state?
Desired output:
| list_date | state | # Unique Employees of projects first 7 day list
| March | CA | 1
| April | WA | 2
| July | WA | 2
| August | CA | 1
My Attempt
select
month(list_date),
state_name,
count(*) as Projects,
from projects
group by
month(list_date),
state_name;
I understand the next step is to compute worked_date - list_date and, where the difference is less than 7 days, average the count of employees from the second table, but I'm not sure which query functions to use.

You could use a CASE with a DISTINCT to COUNT the unique employees that worked within the first 7 days of the list_date.
Once you have that total of employees per project, then you can calculate those averages per month & state.
SELECT
MONTHNAME(list_date) as `ListMonth`,
state,
AVG(TotalUniqEmp7Days) AS `Average Unique Employees of projects first 7 day list`
FROM
(
SELECT
proj.proj_id,
proj.list_date,
proj.state,
COUNT(DISTINCT CASE
WHEN emp.worked_date BETWEEN proj.list_date and DATE_ADD(proj.list_date, INTERVAL 6 DAY)
THEN emp.employee_id
END) AS TotalUniqEmp7Days
-- , COUNT(DISTINCT emp.employee_id) AS TotalUniqEmp
FROM project.analytics proj
LEFT JOIN employees.analytics emp ON emp.proj_id = proj.proj_id
GROUP BY proj.proj_id, proj.list_date, proj.state
) AS ProjectTotals
GROUP BY YEAR(list_date), MONTH(list_date), MONTHNAME(list_date), state;
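To check the logic against the sample rows, here is a minimal sketch that runs the same two-step idea (count distinct employees per project within the first 7 days, then average per month and state) on an in-memory SQLite database through Python's sqlite3 module. It rests on several assumptions that are not in the question: the dates are rewritten as ISO strings, the tables are simply named project and employees, and SQLite's date() and strftime() stand in for MySQL's DATE_ADD() and MONTHNAME().

```python
import sqlite3

# Assumption: the question's mm/dd/yy values rewritten as ISO dates (YYYY-MM-DD).
PROJECTS = [
    (1, "2010-03-05", "CA"), (2, "2010-04-05", "WA"), (3, "2010-03-05", "WA"),
    (4, "2010-04-05", "CA"), (5, "2010-03-05", "WA"), (6, "2010-04-05", "CA"),
]
EMPLOYEES = [
    (20, 1, "2010-03-12"), (30, 1, "2010-03-11"), (40, 2, "2010-04-15"),
    (50, 3, "2010-03-16"), (60, 3, "2010-03-17"), (70, 4, "2010-04-18"),
]

# SQLite dialect: date(x, '+6 days') replaces DATE_ADD(x, INTERVAL 6 DAY),
# strftime('%Y-%m', x) replaces YEAR(x)/MONTHNAME(x).
QUERY = """
SELECT strftime('%Y-%m', list_date) AS list_month,
       state,
       AVG(emp_7_days) AS avg_unique_employees
FROM (
    SELECT p.proj_id, p.list_date, p.state,
           COUNT(DISTINCT CASE
               WHEN e.worked_date BETWEEN p.list_date
                                      AND date(p.list_date, '+6 days')
               THEN e.employee_id
           END) AS emp_7_days
    FROM project p
    LEFT JOIN employees e ON e.proj_id = p.proj_id
    GROUP BY p.proj_id, p.list_date, p.state
) AS project_totals
GROUP BY list_month, state
ORDER BY list_month, state
"""


def average_unique_employees():
    """Return (month, state, average) rows for the sample data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE project (proj_id INTEGER, list_date TEXT, state TEXT)")
    con.execute(
        "CREATE TABLE employees (employee_id INTEGER, proj_id INTEGER, worked_date TEXT)"
    )
    con.executemany("INSERT INTO project VALUES (?, ?, ?)", PROJECTS)
    con.executemany("INSERT INTO employees VALUES (?, ?, ?)", EMPLOYEES)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

With these sample rows only employee 30 falls inside a 7-day window (project 1), so projects without qualifying work count as 0 in the average because of the LEFT JOIN.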

I think this is the code that you want
select
p.list_date, p.state,
emp.no_of_unique_emp
from project.analytics p
inner join (
select
t.proj_id,
count(t.employee_id) as no_of_unique_emp
from (
-- distinct employees who worked within the first 7 days after the list date
select distinct e.employee_id, e.proj_id
from employees.analytics e
inner join project.analytics p2 on p2.proj_id = e.proj_id
where datediff(e.worked_date, p2.list_date) between 0 and 6
) t
group by t.proj_id
) emp
on emp.proj_id = p.proj_id

Related

Bring all data from a table with joins with where clause that may not exist in the other table

I'm having a hard time setting up a query(select). Database is not my specialty, so I'm turning to the experts. Let me show what I need.
Relationships: companies 1--N company_server N--1 servers 1--N print
companies
| id | name
| 1 | Company1
| 2 | Company2
| 3 | Company3
company_server
| company | server
| 1 | 1
| 2 | 1
| 3 | 2
| 3 | 3
servers
| id | name
| 1 | Server1
| 2 | Server2
| 3 | Server3
| 4 | Server4
print
| id | page | copy | date | server
| 1 | 2 | 3 | 2020-1-11 | 1
| 2 | 1 | 6 | 2020-1-12 | 3
| 3 | 4 | 5 | 2020-1-13 | 4
| 4 | 5 | 3 | 2020-1-15 | 2
| 5 | 3 | 4 | 2020-1-15 | 4
| 6 | 1 | 2 | 2020-1-16 | 3
| 7 | 2 | 2 | 2020-1-16 | 4
What I need?
Example where date between CAST('2020-1-12' AS DATE) AND CAST('2020-1-15' AS DATE) group by servers.id
| companies | server | sum | percent
------------------------------------------------------------------------------------
| company1,company2 | server1 | sum(page*copy) = 0 or null | 0 or NULL
| company3 | server2 | sum(page*copy) = 15 | 28.30
| company3 | server3 | sum(page*copy) = 6 | 11.32
| NULL | server4 | sum(page*copy) = 32 | 60.38
Few notes:
I need this query for MYSQL;
Every Company is linked to at least one server.
I need result grouped by server. So, every company linked to that server must be concatenated by a comma.
If the company has not yet been registered, the value null should be presented.
The sum (page * copy) must be presented as zero or null (I don't care which) when there was no printing in the date range.
The percentage should be calculated according to the date range entered and not with all records in the database.
The field date is stored as MYSQL DATE.
Experts, I thank you in advance for your help. I currently solve this problem with at least 3 queries to the database, but I am convinced that I could do it with just one query.
I added a fiddle. Sorry, I'm still learning how to use this.
https://www.db-fiddle.com/f/dXej7QCPe9iDopfYd1SfVh/2
Below is the query that more or less shows how far I got. Notice that 'server4' disappears along the way because there are no values for it in print in the period searched. I have the total for the period, but I cannot calculate the percentage.
I'm stuck.
select
*
from
(select
sum(p.copy * p.page) as sum1,
s.name as s_name,
s.id as s_id
from
print p
join servers s on s.id = p.server
where p.date between cast('2020-1-12' as date) and cast('2020-1-15' as date)
group by s.id) as t1
join company_server cs on cs.server = t1.s_id
right join companies c on c.id = cs.company
cross join(
select
sum(p1.copy * p1.page) sum2
from
print p1
where p1.date between cast('2020-1-12' as date) and cast('2020-1-15' as date)
) as c;
I wrote this query before you added the fiddle, so my column names may not match yours. Anyway, this is my solution; I hope it helps.
select group_concat(c.name separator ',') as name_company,
ss.name,
tmp.sum_print as sum,
(tmp.sum_print / tmp2.total) * 100 as percentage
from companies c
inner join company_server cs on c.id = cs.company
-- company_server has no id column; it links to servers through cs.server
right join servers ss on ss.id = cs.server
left join
(
select server, sum(page*copy) as sum_print from print
where date between CAST('2020-1-12' AS DATE) AND CAST('2020-1-15' AS DATE)
group by server
) tmp on tmp.server = ss.id
cross join
(select sum(page*copy) as total from print where date between CAST('2020-1-12' AS DATE) AND CAST('2020-1-15' AS DATE)) tmp2
-- qualify the grouping columns so the query also runs with ONLY_FULL_GROUP_BY
group by ss.id, ss.name, tmp.sum_print, tmp2.total
Group by server and concatenate the company names with commas using GROUP_CONCAT.
You can refer to this image for the JOIN clauses.
https://i.stack.imgur.com/6cioZ.png
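As a rough check of the expected output, the sketch below loads the question's sample rows into an in-memory SQLite database via Python's sqlite3 module and runs a one-query version of the report. It is written under assumptions: dates are zero-padded ISO strings, the query starts from servers and uses LEFT JOINs (older SQLite builds have no RIGHT JOIN), SQLite's two-argument group_concat() replaces MySQL's SEPARATOR syntax, and the function name server_usage is made up for this illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE servers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE company_server (company INTEGER, server INTEGER);
CREATE TABLE print (id INTEGER PRIMARY KEY, page INTEGER, copy INTEGER,
                    date TEXT, server INTEGER);
INSERT INTO companies VALUES (1,'Company1'),(2,'Company2'),(3,'Company3');
INSERT INTO servers VALUES (1,'Server1'),(2,'Server2'),(3,'Server3'),(4,'Server4');
INSERT INTO company_server VALUES (1,1),(2,1),(3,2),(3,3);
INSERT INTO print VALUES
    (1,2,3,'2020-01-11',1),(2,1,6,'2020-01-12',3),(3,4,5,'2020-01-13',4),
    (4,5,3,'2020-01-15',2),(5,3,4,'2020-01-15',4),(6,1,2,'2020-01-16',3),
    (7,2,2,'2020-01-16',4);
"""

QUERY = """
SELECT group_concat(c.name, ',') AS companies,
       s.name AS server_name,
       t.sum_print,
       ROUND(100.0 * t.sum_print / tot.total, 2) AS percent
FROM servers s
LEFT JOIN company_server cs ON cs.server = s.id
LEFT JOIN companies c ON c.id = cs.company
LEFT JOIN (
    -- pages printed per server inside the date range
    SELECT server, SUM(page * copy) AS sum_print
    FROM print
    WHERE date BETWEEN :start AND :end
    GROUP BY server
) t ON t.server = s.id
CROSS JOIN (
    -- total for the same date range, used as the percentage base
    SELECT SUM(page * copy) AS total
    FROM print
    WHERE date BETWEEN :start AND :end
) tot
GROUP BY s.id, s.name, t.sum_print, tot.total
ORDER BY s.id
"""


def server_usage(start, end):
    """Return one (companies, server_name, sum_print, percent) row per server."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(QUERY, {"start": start, "end": end}).fetchall()
    con.close()
    return rows
```

Calling server_usage("2020-01-12", "2020-01-15") keeps all four servers: a server with no prints in the range gets NULL for the sum and percentage, and a server with no linked company gets NULL for the company list.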

SQL Query that selects only one duplicate record, based on the highest date value in that record

I have a table below that shows employee details, along with a 'dateValue', which is a number based off when the employee clocked into work. As you can see, 'Dave' has clocked in twice today, but I only want to see Dave's most recent clock in (The larger the number, the more recent the clock in)
ID is a column in both the 'employee' and 'clock' tables that links the two tables together; it is unique to each employee.
SQL for table below
SELECT e.name, e.country, e.role, e.age, c.dateValue FROM employee e left join clock c on e.ID = c.ID
| e.name | e.country| e.role |e.age| c.dateValue | c.ID |
| Dave | England | Programmer | 45 | 013 | 1 |
| Gary | Scotland | Engineer | 44 | 033 | 2 |
| Brian | USA | Engineer | 67 | 042 | 4 |
| Dave | England | Programmer | 45 | 019 | 1 |
| Lucy | England | Sales | 35 | 033 | 5 |
Desired result:
| e.name | e.country| e.role |e.age| c.dateValue | c.ID |
| Gary | Scotland | Engineer | 44 | 033 | 2 |
| Brian | USA | Engineer | 67 | 042 | 4 |
| Dave | England | Programmer | 45 | 019 | 1 |
| Lucy | England | Sales | 35 | 033 | 5 |
In my desired result, Dave's first clock in is not displayed, as I want to display only one of each employee, whether they've clocked in once, or 100 times today, I only want to show their most recent clock in, where the c.dateValue is the highest, and grouping by e.name
SQL I have tried:
SELECT e.name, e.country, e.role, e.age, c.dateValue FROM employee e left join clock c on e.ID = c.ID group by e.name where MAX(c.dateValue) AS date
SELECT e.name, e.country, e.role, e.age, MAX(c.dateValue) AS date FROM employee e left join clock c on e.ID = c.ID group by e.name
For both attempts of my SQL above, I get the error: " 'employee.country' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause "
You need filtering, not aggregation. I would recommend row_number() (in MySQL, available from version 8.0 onward):
select name, country, role, age, datevalue
from (
select e.*, c.datevalue, row_number() over(partition by e.id order by c.datevalue desc) rn
from employee e
inner join clock c on e.id = c.id
) t
where rn = 1
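If window functions are not available, a different technique gives the same filtering: join each employee to a grouped subquery that holds the highest dateValue per ID. The sketch below is only an illustration of that alternative, run on SQLite through Python's sqlite3 module; the employee IDs are inferred from the c.ID column in the question, and dateValue is stored as an integer.

```python
import sqlite3


def latest_clock_ins():
    """Return one row per employee with that employee's highest dateValue."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE employee (ID INTEGER PRIMARY KEY, name TEXT, country TEXT,
                               role TEXT, age INTEGER);
        CREATE TABLE clock (ID INTEGER, dateValue INTEGER);
        INSERT INTO employee VALUES
            (1,'Dave','England','Programmer',45),
            (2,'Gary','Scotland','Engineer',44),
            (4,'Brian','USA','Engineer',67),
            (5,'Lucy','England','Sales',35);
        INSERT INTO clock VALUES (1,13),(2,33),(4,42),(1,19),(5,33);
    """)
    rows = con.execute("""
        SELECT e.name, e.country, e.role, e.age, latest.dateValue
        FROM employee e
        LEFT JOIN (
            -- one row per employee: the most recent clock-in
            SELECT ID, MAX(dateValue) AS dateValue
            FROM clock
            GROUP BY ID
        ) latest ON latest.ID = e.ID
        ORDER BY e.ID
    """).fetchall()
    con.close()
    return rows
```

Because the aggregation happens inside the subquery, the outer SELECT can list name, country, role and age without a GROUP BY, which avoids the "not contained in either an aggregate function or the GROUP BY clause" error.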

How to write a BigQuery query to find rows where a specific column changes in a table

I need to write a query for a table that records the date when a value changes in a column. The table is such that the following query yields the corresponding result.
SELECT
employeeId,
date,
location,
FROM
MY_TABLE
ORDER BY
employeeId, date, location
Result:
+----+--------------+------------+------------------+
| | employeeId | date | location |
+====+==============+============+==================+
| 0 | 2467 | 2016-04-31 | COUNTRY A |
+----+--------------+------------+------------------+
| 1 | 2467 | 2016-05-31 | COUNTRY A |
+----+--------------+------------+------------------+
| 2 | 2467 | 2016-06-31 | COUNTRY A |
+----+--------------+------------+------------------+
| 3 | 2467 | 2016-07-31 | COUNTRY A |
+----+--------------+------------+------------------+
| 4 | 2467 | 2016-08-31 | COUNTRY B |
+----+--------------+------------+------------------+
| 5 | 2467 | 2017-09-31 | COUNTRY A |
+----+--------------+------------+------------------+
For every employeeId, if the location changes between two dates, I want the old date, old location, new date and new location. Here is the query that I wrote:
WITH
cte AS (
SELECT
employeeId,
date,
location,
FROM
MY_TABLE),
movements AS (
SELECT
a.employeeId AS EMPLOYEEID,
b.employeeId AS EMPLOYEEID_NEW,
a.date AS OLD_DATE,
b.date AS NEW_DATE,
a.location AS OLD_LOCATION,
b.location AS NEW_LOCATION
FROM
cte a
INNER JOIN
cte b
ON
a.employeeId = b.employeeId
WHERE
b.date > a.date
AND DATE_DIFF(b.date, a.date, MONTH) = 1
AND a.location <> b.location
)
SELECT
NEW_DATE,
OLD_DATE,
COUNT(EMPLOYEEID) AS MOVED,
OLD_LOCATION,
NEW_LOCATION
FROM
movements
GROUP BY
NEW_DATE,
OLD_DATE,
EMPLOYEEID,
OLD_LOCATION,
NEW_LOCATION
ORDER BY
MOVED,
NEW_DATE,
OLD_LOCATION,
NEW_LOCATION
I get the following results:
+----+------------+------------+---------+----------------+----------------+
| | NEW_DATE | OLD_DATE | MOVED | OLD_LOCATION | NEW_LOCATION |
+====+============+============+=========+================+================+
| 0 | 2016-07-01 | 2016-06-01 | 1 | COUNTRY A | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
| 1 | 2016-07-01 | 2016-06-30 | 1 | COUNTRY A | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
| 2 | 2016-07-31 | 2016-06-30 | 1 | COUNTRY A | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
| 3 | 2016-07-31 | 2016-06-01 | 1 | COUNTRY A | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
| 4 | 2016-08-01 | 2016-07-01 | 1 | COUNTRY C | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
| 5 | 2016-08-01 | 2016-07-31 | 1 | COUNTRY C | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
| 6 | 2016-08-31 | 2016-07-01 | 1 | COUNTRY C | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
| 7 | 2016-08-31 | 2016-07-31 | 1 | COUNTRY C | COUNTRY B |
+----+------------+------------+---------+----------------+----------------+
The results do not seem to be correct. I highly doubt that the number of movements between two countries is always 1... Can you please have a look at the query and let me know where I am erring? Also, fyi, I have obfuscated the data provided here. I switched around country names and dates, basically.
I was not able to identify within your query how you selected the previous date and location. However, I was able to simplify it and achieve what you aim.
First, I have changed your dummy data a little bit in order to check some cases. Thus, I have used the following:
employeeId|date|location
2467|2016-04-30|COUNTRY A
2467|2016-05-31|COUNTRY A
2467|2016-06-30|COUNTRY B
2467|2016-07-31|COUNTRY A
2467|2016-08-31|COUNTRY B
2467|2017-09-30|COUNTRY A
2468|2017-09-30|COUNTRY A
2468|2017-09-30|COUNTRY A
Notice that the employeeId 2467 changes countries 4 times.
I have created the following script:
WITH data AS (
SELECT employeeid, date, location, (LAG(date) OVER (PARTITION BY employeeid ORDER BY date ASC)) AS prev_date,
(LAG(location) OVER (PARTITION BY employeeid ORDER BY date ASC)) AS prev_loc
FROM `test-proj-261014.bq_load_codelab.employee`
ORDER BY date
)
SELECT * FROM data
WHERE DATE_DIFF(date, prev_date, MONTH)>=1
AND prev_loc IS NOT NULL
AND location<>prev_loc
ORDER BY date
As you can see, I used the LAG() function to select the previous date and location for each row. Note that LAG() returns NULL for the first row of each partition, which is why the condition prev_loc IS NOT NULL is included in the WHERE clause.
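To make the LAG() idea concrete outside BigQuery, here is a small plain-Python sketch of the same pairing: rows are sorted per employee and each row is compared with the one before it. The helper names months_between and location_changes are invented for this example, and months_between is only assumed to approximate DATE_DIFF(..., MONTH) by counting calendar-month boundaries.

```python
from datetime import date
from itertools import groupby

# Sample rows from the answer above: (employeeId, date, location).
ROWS = [
    (2467, date(2016, 4, 30), "COUNTRY A"),
    (2467, date(2016, 5, 31), "COUNTRY A"),
    (2467, date(2016, 6, 30), "COUNTRY B"),
    (2467, date(2016, 7, 31), "COUNTRY A"),
    (2467, date(2016, 8, 31), "COUNTRY B"),
    (2467, date(2017, 9, 30), "COUNTRY A"),
    (2468, date(2017, 9, 30), "COUNTRY A"),
    (2468, date(2017, 9, 30), "COUNTRY A"),
]


def months_between(later, earlier):
    """Calendar-month boundaries crossed (assumed stand-in for DATE_DIFF MONTH)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def location_changes(rows):
    """Return (employeeId, old_date, old_location, new_date, new_location) tuples."""
    changes = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for employee_id, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # zip(group, group[1:]) pairs each row with its predecessor, like LAG().
        # The first row of each employee has no predecessor and is skipped,
        # which plays the role of the prev_loc IS NOT NULL filter.
        for (_, prev_date, prev_loc), (_, cur_date, cur_loc) in zip(group, group[1:]):
            if cur_loc != prev_loc and months_between(cur_date, prev_date) >= 1:
                changes.append((employee_id, prev_date, prev_loc, cur_date, cur_loc))
    return changes
```

Running location_changes(ROWS) yields four changes, all for employee 2467, and none for 2468, whose two rows share the same location.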
I selected employeeid so that the results are easy to check. You can remove this field from the final SELECT statement and retrieve only the fields you need.
Lastly, if you want to check how many times an employee has moved, you will need another query on top of the result above. With COUNT() you cannot also retrieve the old and new dates, because you are counting moves and grouping by employeeid, so you get one number per employeeid. In this case I saved the above result in a temp table called final_output and queried it as follows:
SELECT employeeid, count(employeeid) as MOVED FROM final_output
GROUP BY employeeid
For the sample data, the employee with id 2467 moved 4 times within the time frame analysed.

Select max value from joined table

i need help with a mysql query. My tables:
objects
+---------+--------+
| id | name |
+---------+--------+
| 1 | house 1|
| 2 | house 2|
| 3 | house 3|
+---------+--------+
objects_expire
+----------+-----------+
| object_id| expire |
+----------+-----------+
| 1 | 2014-09-11|
| 1 | 2015-09-11|
| 2 | 2014-09-11|
| 2 | 2015-09-11|
| 2 | 2016-09-11|
| 3 | 2013-09-11|
| 3 | 2014-09-11|
| 3 | 2015-09-15|
+----------+-----------+
Now I need the objects whose max 'expire' is later than 2015-09-04 and earlier than 2015-09-18 (+/- 7 days)
Like this result:
+----------+-----------+-----------+
| object_id| expire | name |
+----------+-----------+-----------+
| 1 | 2015-09-11| house 1 |
| 3 | 2015-09-15| house 3 |
+----------+-----------+-----------+
This is what i have now:
SELECT o.id, MAX(uio.expire) AS object_expires
FROM objects AS o
LEFT JOIN objects_expire AS oe ON oe.object_id = o.id
WHERE expire < '2015-09-18'
AND expires > '2015-09-04'
GROUP BY o.id
But thats not correct.
Thanks for any help!!!
One usual approach is to do the grouping first and then join back. Also, if you do not want to hardcode the dates, you can use the date_sub and date_add functions to get -/+ 7 days from the current date.
select
o.id,
e.mexpire as expire,
o.name
from objects o
join(
select object_id,max(expire) as mexpire
from objects_expire
group by object_id
having mexpire > date_sub(curdate(),interval 7 day) and mexpire < date_add(curdate(),interval 7 day)
)e
on o.id = e.object_id
You need to group, and to use HAVING as a filter for the grouped column
select object_id, max(expire) as expire, name
from objects_expire
left join objects on objects_expire.object_id=objects.id
group by object_id, name
having max(expire) < '2015-09-18'
and max(expire) > '2015-09-04'
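The difference between filtering in WHERE and filtering in HAVING can be seen on the sample rows with a short sketch. It runs on SQLite through Python's sqlite3 module, stores the dates as ISO text (an assumption), and uses two made-up function names, filter_before_grouping and filter_after_grouping.

```python
import sqlite3

SCHEMA = """
CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE objects_expire (object_id INTEGER, expire TEXT);
INSERT INTO objects VALUES (1,'house 1'),(2,'house 2'),(3,'house 3');
INSERT INTO objects_expire VALUES
    (1,'2014-09-11'),(1,'2015-09-11'),
    (2,'2014-09-11'),(2,'2015-09-11'),(2,'2016-09-11'),
    (3,'2013-09-11'),(3,'2014-09-11'),(3,'2015-09-15');
"""


def _run(sql, after, before):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql, (after, before)).fetchall()
    con.close()
    return rows


def filter_before_grouping(after, before):
    """WHERE drops rows first, so MAX() only sees dates inside the window."""
    return _run("""
        SELECT oe.object_id, MAX(oe.expire) AS expire, o.name
        FROM objects_expire oe
        JOIN objects o ON o.id = oe.object_id
        WHERE oe.expire > ? AND oe.expire < ?
        GROUP BY oe.object_id, o.name
        ORDER BY oe.object_id
    """, after, before)


def filter_after_grouping(after, before):
    """HAVING tests the true maximum of each object."""
    return _run("""
        SELECT oe.object_id, MAX(oe.expire) AS expire, o.name
        FROM objects_expire oe
        JOIN objects o ON o.id = oe.object_id
        GROUP BY oe.object_id, o.name
        HAVING MAX(oe.expire) > ? AND MAX(oe.expire) < ?
        ORDER BY oe.object_id
    """, after, before)
```

With the window '2015-09-04' to '2015-09-18', the WHERE version still returns house 2 (its 2015-09-11 row passes the filter even though its real maximum is 2016-09-11), while the HAVING version returns only house 1 and house 3.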

Using MySQL group by clause with where clause

I have two tables, one that store product information and one that stores reviews for the products.
I am now trying to get the number of reviews submitted for the products between two dates, but for some reason I get the same results regardless of the dates I enter.
This is my query:
SELECT
productName,
COUNT(*) as `count`,
avg(rating) as `rating`
FROM `Reviews`
LEFT JOIN `Products` using(`productID`)
WHERE `date` BETWEEN '2015-07-20' AND '2015-07-30'
GROUP BY
`productName`
ORDER BY `count` DESC, `rating` DESC;
This returns:
+------------+---------------------+
| productName| count|rating |
+------------+------+--------------+
| productA | 23 | 4.3333333 |
| productB | 17 | 4.25 |
| productC | 10 | 3.5 |
+------------+---------------------+
Products table:
+---------+-------------+
|productID | productName|
+---------+-------------+
| 1 | productA |
| 2 | productB |
| 3 | productC |
+---------+-------------+
Reviews table
+---------+-----------+--------+---------------------+
|reviewID | productID | rating | date |
+---------+-----------+--------+---------------------+
| 1 | 1 | 4.5 | 2015-07-27 17:47:01|
| 2 | 1 | 3.5 | 2015-07-27 18:54:22|
| 3 | 3 | 2 | 2015-07-28 13:28:37|
| 4 | 1 | 5 | 2015-07-28 18:33:14|
| 5 | 2 | 1.5 | 2015-07-29 11:58:17|
| 6 | 2 | 3.5 | 2015-07-30 15:04:25|
| 7 | 2 | 2.5 | 2015-07-30 18:11:11|
| 8 | 1 | 3 | 2015-07-30 18:26:23|
| 9 | 1 | 3 | 2015-07-30 21:35:05|
| 10 | 1 | 4.5 | 2015-07-31 14:25:47|
| 11 | 3 | 0.5 | 2015-07-31 14:47:48|
+---------+-----------+--------+---------------------+
When I use two random dates that I know for sure are not in the date column, I still get the same results. Even when I try to retrieve records for a single day, I get the same results.
You should not use left join, because by doing so you retrieve all the data from one table. What you should use is something like :
select
productName,
count(*) as `count`,
avg(rating) as `rating`
from
products p,
reviews r
where
p.productID = r.productID
and `date` between '2015-07-20' and '2015-07-30'
group by productName
order by count desc, rating desc;
If the result, given your sample data, that you're looking for is:
| productName | count | rating |
|-------------|-------|--------|
| productA | 5 | 3.8 |
| productB | 3 | 2.5 |
| productC | 1 | 2 |
This is the count and average of reviews made on any date between 2015-07-20 and 2015-07-30 inclusive.
Then there are two issues with your query. First, you need to change the join to an inner join instead of a left join. More importantly, you need to change the date condition, because you are currently excluding reviews that fall on the last date of the range but after midnight.
This happens because your BETWEEN clause compares datetime values with date values, so the comparison ends up being date between '2015-07-20 00:00:00' and '2015-07-30 00:00:00', which excludes every review made later on the last day.
The fix is to either change the date condition so that the end is a day later:
where date >= '2015-07-20' and date < '2015-07-31'
or cast the date column to a date value, which will remove the time part:
where date(date) between '2015-07-20' and '2015-07-30'
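The effect of the three date conditions can be compared on the sample reviews with a short sketch. It uses SQLite through Python's sqlite3 module, where the timestamps are plain text and compared as strings; that is a different mechanism from MySQL's datetime comparison, but it excludes the same rows here. The function name count_reviews and the constant names are assumptions made for the illustration.

```python
import sqlite3

# Sample rows from the question: (reviewID, productID, rating, date).
REVIEWS = [
    (1, 1, 4.5, "2015-07-27 17:47:01"), (2, 1, 3.5, "2015-07-27 18:54:22"),
    (3, 3, 2.0, "2015-07-28 13:28:37"), (4, 1, 5.0, "2015-07-28 18:33:14"),
    (5, 2, 1.5, "2015-07-29 11:58:17"), (6, 2, 3.5, "2015-07-30 15:04:25"),
    (7, 2, 2.5, "2015-07-30 18:11:11"), (8, 1, 3.0, "2015-07-30 18:26:23"),
    (9, 1, 3.0, "2015-07-30 21:35:05"), (10, 1, 4.5, "2015-07-31 14:25:47"),
    (11, 3, 0.5, "2015-07-31 14:47:48"),
]

# The three conditions discussed above.
BETWEEN_DATES = "date BETWEEN '2015-07-20' AND '2015-07-30'"
HALF_OPEN = "date >= '2015-07-20' AND date < '2015-07-31'"
DATE_ONLY = "date(date) BETWEEN '2015-07-20' AND '2015-07-30'"


def count_reviews(where_clause):
    """Count the sample reviews matching a fixed, trusted WHERE clause."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE reviews (reviewID INTEGER, productID INTEGER, "
        "rating REAL, date TEXT)"
    )
    con.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", REVIEWS)
    # The clause is one of the constants above, never user input.
    (count,) = con.execute(
        "SELECT COUNT(*) FROM reviews WHERE " + where_clause
    ).fetchone()
    con.close()
    return count
```

On this data count_reviews(BETWEEN_DATES) misses the four reviews written during 2015-07-30, while HALF_OPEN and DATE_ONLY both include them. The half-open form is usually the friendlier one for indexes, since it leaves the column untouched.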
You are using a LEFT JOIN between your reviews and your products tables. This will result in all the rows of reviews being shown with some rows having all product columns left empty.
You should use INNER JOIN, as this will filter only the wanted results.
(In the end I can only guess, since I don't even know which column belongs to which table ...)
The full query (very similar to Angelo Giannis's solution):
select
productName,
count(*) as `count`,
avg(rating) as `rating`
from
products INNER JOIN reviews USING(productId)
where date between '2015-07-20' and '2015-07-30'
group by productName
order by count desc, rating desc;
Here is a fiddle with my solution and Angelo's (they both work).