MySQL SUM not working correctly after JOIN

I have 2 tables that look like the following:
TABLE 1 (table1): user_id | date
TABLE 2 (hours): accountID | date | hours
And I'm trying to add up the hours by the week. If I use the following statement I get the correct results:
SELECT
SUM(hours) as totalHours
FROM
hours
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
But when I join the two tables, I get a number like 336640 when it should be 12:
SELECT
SUM(hours) as totalHours
FROM
hours
JOIN table1 ON
user_id = accountID
WHERE
accountID = 244
AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY
accountID
Does anyone know why this is?
EDIT: Turns out I just needed to add DISTINCT, thanks!

A JOIN can produce more rows than either input table: its result contains one row for every pair of rows from the two tables that satisfies the ON condition. If several rows in table1 match each row in hours, the join repeats hours.accountID and hours.hours once per match, so SUM(hours) adds each value several times and the total is inflated.
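To make the fan-out concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are hypothetical: three hours rows totalling 12 for account 244 and four table1 rows for user 244. Each hours row therefore appears four times in the join, and the sum comes out four times too large.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user_id INTEGER, date TEXT);
    CREATE TABLE hours (accountID INTEGER, date TEXT, hours INTEGER);
""")

# Hypothetical data: 3 hours rows for account 244 (total 12), 4 table1 rows for user 244.
conn.executemany("INSERT INTO hours VALUES (?, ?, ?)",
                 [(244, "2014-02-03", 4), (244, "2014-02-04", 4), (244, "2014-02-05", 4)])
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(244, "2014-02-01"), (244, "2014-02-02"),
                  (244, "2014-02-03"), (244, "2014-02-04")])

# Sum over hours alone: each row is counted once.
plain = conn.execute("SELECT SUM(hours) FROM hours WHERE accountID = 244").fetchone()[0]

# Sum after the join: each hours row is paired with all 4 matching table1 rows.
joined = conn.execute("""
    SELECT SUM(h.hours)
    FROM hours h
    JOIN table1 t1 ON t1.user_id = h.accountID
    WHERE h.accountID = 244
""").fetchone()[0]

print(plain, joined)  # 12 48
```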

The reason is that the table you are joining to has multiple rows matching each row in the first table, and all of those copies get added together.
The solution is to do the aggregation in a subquery before doing the join:
select totalhours
from (SELECT accountID, SUM(hours) as totalHours
FROM hours
WHERE accountID = 244 AND
date >= '2014-02-02' and date < '2014-02-09'
GROUP BY accountID
) h join
table1 t1
on t1.user_id = h.accountID;
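Here is a minimal sketch of the aggregate-first approach, again using Python's sqlite3 module as a stand-in for MySQL. It uses hypothetical data: 12 hours inside the week, 5 outside it, and four matching table1 rows. The subquery exposes accountID so the outer ON clause can refer to it. The join can now only repeat the already-summed row, once per matching table1 row, so every output row shows 12 rather than an inflated total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user_id INTEGER, date TEXT);
    CREATE TABLE hours (accountID INTEGER, date TEXT, hours INTEGER);
""")
# Hypothetical data: 12 hours in the target week, 5 outside it, 4 matching table1 rows.
conn.executemany("INSERT INTO hours VALUES (?, ?, ?)",
                 [(244, "2014-02-03", 4), (244, "2014-02-04", 4),
                  (244, "2014-02-05", 4), (244, "2014-02-10", 5)])
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(244, "2014-02-0%d" % d) for d in range(1, 5)])

# Aggregate first, then join: the join duplicates only the single summed row.
rows = conn.execute("""
    SELECT h.totalHours
    FROM (SELECT accountID, SUM(hours) AS totalHours
          FROM hours
          WHERE accountID = 244
            AND date >= '2014-02-02' AND date < '2014-02-09'
          GROUP BY accountID) h
    JOIN table1 t1 ON t1.user_id = h.accountID
""").fetchall()

print(rows)  # [(12,), (12,), (12,), (12,)]
```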
I suspect your actual query is more complicated. For instance, no column of table1 is used in this query, so the join only filters or duplicates rows. And the GROUP BY on accountID is unnecessary when you select only one account.

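You should probably use a LEFT JOIN so that rows in hours without a match in table1 are not eliminated.
Also, date BETWEEN ? AND ? can be used instead of date >= ? AND date < ?, but BETWEEN is inclusive at both ends, so the upper bound must be the last day of the range (for example '2014-02-08' for a DATE column) rather than the first day after it.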
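The following is a quick check of the boundary behaviour, sketched with Python's sqlite3 module as a stand-in for MySQL. It uses two hypothetical rows that sit exactly on the week's boundaries. Dates are stored as 'YYYY-MM-DD' text, which compares correctly as strings. With a DATETIME column, an upper bound of '2014-02-08' would also cut off everything after midnight on the 8th, which is one reason the half-open form is often used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (accountID INTEGER, date TEXT, hours INTEGER)")
# Hypothetical rows: one on the first day of the week, one on the first day after it.
conn.executemany("INSERT INTO hours VALUES (?, ?, ?)",
                 [(244, "2014-02-02", 5), (244, "2014-02-09", 7)])

def total(condition):
    """Sum hours for rows matching a fixed WHERE condition (demo only)."""
    return conn.execute("SELECT SUM(hours) FROM hours WHERE " + condition).fetchone()[0]

half_open = total("date >= '2014-02-02' AND date < '2014-02-09'")    # excludes 2014-02-09
between_naive = total("date BETWEEN '2014-02-02' AND '2014-02-09'")  # includes 2014-02-09
between_fixed = total("date BETWEEN '2014-02-02' AND '2014-02-08'")  # last day of the week

print(half_open, between_naive, between_fixed)  # 5 12 5
```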

Related

MySQL: how to join two tables from this example?

Hello everybody.
I have two queries.
Query 1 shows the list of dates, each with the time 22:00, from one table:
SELECT DATE_FORMAT(tt.create_time,"%Y-%m-%d 22:00:00") AS DAY,tt.id
FROM tick tt
GROUP BY DATE_FORMAT(tt.create_time,"%Y-%m-%d")
Query 2 shows the number of records whose create_time is earlier than the date specified in the query:
SELECT COUNT(*) AS count FROM
(SELECT * FROM
(SELECT * FROM tick_history th
WHERE th.create_time < '2019-04-15 22:00:00'
ORDER BY th.id DESC) AS t1
GROUP BY t1.tick_id) AS t2
WHERE t2.state NOT IN (1,4,9) AND t2.queue = 1
Is it possible to combine these two queries into one that returns the dates from the first query in one column and, in a second column, the count from the second query for each of those dates?
That is, as if each date were substituted into the second query and the count calculated.
Is this possible? Any help with the query would be appreciated.

MySQL - Group By Latest and Join First Instance

I've tried a few things but I've ended up confusing myself.
What I am trying to do is find the most recent record for each account in a table and left join the first record after a certain date.
For example:
id | acct_no | created_at | some_other_column
1 | A0001 | 2017-05-21 00:00:00 | x
2 | A0001 | 2017-05-22 00:00:00 | y
3 | A0001 | 2017-05-22 00:00:00 | z
Ideally, I'd like to find the latest record for each acct_no (sorted by created_at DESC) so that there is one result per unique account number. From the records above, that would be record 3, but of course there would be many different account numbers with records on different days.
Then I want to join the same table to find the first record with the same account number on or after a certain date.
For example, record 1 would be returned for a query joining on acct_no A0001 with created_at on or after 2017-05-21 00:00:00, because it is the first result on or after that date; the rows would be sorted by created_at ASC with created_at >= "2017-05-21 00:00:00" (and possibly AND id != latest.id).
It seems quite straightforward, but I just can't get it to work.
I only have my most recent attempt after discarding multiple different queries.
Here I am trying to solve the first part which is to select the most recent of each account number:
SELECT latest.* FROM my_table latest
JOIN (SELECT acct_no, MAX(created_at) FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no
but that still returns all rows rather than the most recent of each.
I did have something using a join on a subquery, but it took so long to run that I quit it before it finished, even though I have indexes on acct_no and created_at. I've also run into problems where columns in the SELECT list are not in the GROUP BY. I know this check can be turned off, but I'm trying to find a way to write the query that doesn't require that.
Try a small edit to your initial query: give MAX(created_at) an alias and also join on it:
SELECT latest.* FROM my_table latest
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no AND latest.created_at = latest2.max_time
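Here is a minimal sketch of this join using Python's sqlite3 module as a stand-in for MySQL. It uses the question's three rows plus one hypothetical second account (B0002). Records 2 and 3 share the latest created_at, so both are returned for A0001. A further tie-breaker, such as the highest id, is needed to get exactly one row per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_table (
    id INTEGER PRIMARY KEY, acct_no TEXT, created_at TEXT, some_other_column TEXT)""")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (1, "A0001", "2017-05-21 00:00:00", "x"),
    (2, "A0001", "2017-05-22 00:00:00", "y"),
    (3, "A0001", "2017-05-22 00:00:00", "z"),
    (4, "B0002", "2017-05-20 00:00:00", "w"),  # hypothetical second account
])

# Groupwise maximum: join each row to its account's MAX(created_at).
rows = conn.execute("""
    SELECT latest.id
    FROM my_table latest
    JOIN (SELECT acct_no, MAX(created_at) AS max_time
          FROM my_table
          GROUP BY acct_no) latest2
      ON latest.acct_no = latest2.acct_no
     AND latest.created_at = latest2.max_time
    ORDER BY latest.id
""").fetchall()

print(rows)  # [(2,), (3,), (4,)]
```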
Here is a different approach. I'm not sure about the performance impact, but I'm hoping that avoiding the self-join and GROUP BY will perform better.
-- Relies on user variables; MySQL does not guarantee the evaluation order of expressions that assign and read them in the same statement.
SELECT * FROM (
SELECT mytable1.*, IF(@temp <> acct_no, 1, 0) selector, @temp := acct_no FROM `mytable1`
JOIN (SELECT @temp := '') a
ORDER BY acct_no, created_at DESC , id DESC
) b WHERE selector = 1
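You need to get the id of the row with the maximum created_at for each account:
SELECT latest.* FROM my_table latest
JOIN (SELECT MAX(t.id) AS id
      FROM my_table t
      JOIN (SELECT acct_no, MAX(created_at) AS max_created
            FROM my_table GROUP BY acct_no) m
        ON t.acct_no = m.acct_no AND t.created_at = m.max_created
      GROUP BY t.acct_no) latest2
ON latest.id = latest2.id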
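One way to express "the highest id among each account's newest rows" without putting an aggregate in the WHERE clause is to nest the MAX(created_at) lookup one level deeper. The sketch below runs such a query through Python's sqlite3 module as a stand-in for MySQL. It uses the question's rows plus a hypothetical account B0002. Ties on created_at are broken by the highest id, so A0001 yields only record 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_table (
    id INTEGER PRIMARY KEY, acct_no TEXT, created_at TEXT, some_other_column TEXT)""")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (1, "A0001", "2017-05-21 00:00:00", "x"),
    (2, "A0001", "2017-05-22 00:00:00", "y"),
    (3, "A0001", "2017-05-22 00:00:00", "z"),
    (4, "B0002", "2017-05-20 00:00:00", "w"),  # hypothetical second account
])

rows = conn.execute("""
    SELECT latest.id, latest.some_other_column
    FROM my_table latest
    JOIN (SELECT MAX(t.id) AS id               -- tie-break: highest id wins
          FROM my_table t
          JOIN (SELECT acct_no, MAX(created_at) AS max_created
                FROM my_table
                GROUP BY acct_no) m           -- newest timestamp per account
            ON t.acct_no = m.acct_no AND t.created_at = m.max_created
          GROUP BY t.acct_no) latest2
      ON latest.id = latest2.id
    ORDER BY latest.id
""").fetchall()

print(rows)  # [(3, 'z'), (4, 'w')]
```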

MySQL right outer join query

I have a question about a query in MySQL.
I have 2 tables: one contains sales rep details such as name and email, and the other contains the sales data, with reportDate, customers served, and a foreign key to the sales rep. Note that reportDate is always a Friday.
So the requirement is this: I need to find sales data for a 13-week period for a given list of sales reps, with 0 as customers served if there is no data for a particular Friday. The query result is consumed by a Java application that relies on 13 rows of data per sales rep.
I have created a table populated with all the Friday dates and wrote an outer join like the one below:
select * from (
select name, customersServed, reportDate
from Sales_Data salesData
join `SALES_REPRESENTATIVE` salesRep on salesRep.`employeeId` = salesData.`employeeId`
where employeeId = 1
) as result
right outer join fridays on fridays.datefield = reportDate
where fridays.datefield between '2014-10-01' and '2014-12-31'
order by datefield
Now my questions:
Is there any way to get the name populated for all 13 rows in the above query?
If there are 2 sales reps, I'd like to use an IN clause and expect 26 rows in total, 13 rows per sales rep (even if there is no record for that person, I'd still like to see 13 rows of nulls), and 39 for 3 sales reps.
Can these be done in MySQL, and if so, can anyone point me in the right direction?
First select the rows you want (every rep paired with every Friday, without customersServed), and then make an outer join to get customersServed.
Something like this:
select records.name, records.datefield, IFNULL(salesData.customersServed,0) AS customersServed
from (
select employeeId, name, datefield
from `SALES_REPRESENTATIVE`, fridays
where fridays.datefield between '2014-10-01' and '2014-12-31'
and employeeId in (...)
) as records
left outer join `Sales_Data` salesData on (salesData.employeeId = records.employeeId and salesData.reportDate = records.datefield)
order by records.name, records.datefield
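Here is a minimal sketch of the generate-then-left-join idea, run through Python's sqlite3 module as a stand-in for MySQL. The reps (Alice and Bob), their employeeIds, the three Fridays, and the single Sales_Data row are hypothetical. The IN list is filled with those ids only for the demo. Every rep gets one row per Friday, and Fridays without sales data show 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SALES_REPRESENTATIVE (employeeId INTEGER, name TEXT);
    CREATE TABLE Sales_Data (employeeId INTEGER, reportDate TEXT, customersServed INTEGER);
    CREATE TABLE fridays (datefield TEXT);
""")
# Hypothetical reps, Fridays, and one sales record (Bob has none).
conn.executemany("INSERT INTO SALES_REPRESENTATIVE VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO fridays VALUES (?)",
                 [("2014-10-03",), ("2014-10-10",), ("2014-10-17",)])
conn.execute("INSERT INTO Sales_Data VALUES (1, '2014-10-10', 7)")

rows = conn.execute("""
    SELECT records.name, records.datefield, IFNULL(salesData.customersServed, 0)
    FROM (SELECT employeeId, name, datefield
          FROM SALES_REPRESENTATIVE, fridays     -- every rep paired with every Friday
          WHERE fridays.datefield BETWEEN '2014-10-01' AND '2014-12-31'
            AND employeeId IN (1, 2)) AS records
    LEFT OUTER JOIN Sales_Data salesData
      ON salesData.employeeId = records.employeeId
     AND salesData.reportDate = records.datefield
    ORDER BY records.name, records.datefield
""").fetchall()

for row in rows:
    print(row)
```

With 13 Fridays in the date range instead of 3, the same query would return 13 rows per rep.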
You'll need two levels of nesting. In your nested query, change the join to salesrep to an outer join so that you have at least 1 record for each rep; then join with fridays without any condition so that you have at least 13 records for each rep; then do a final right outer join with the condition (fridays.datefield = innerfriday.datefield and (reportDate is null or reportDate=innerfriday.datefield)).
This is very inefficient; except for very small data sets, try to do it in application code.

How can I write a query that aggregates a single row with the latest date among multiple sets of rows?

I have a MySQL table with many rows for each person, and I want to write a query that aggregates rows with a special constraint (one row per person).
For example, let's say the table consists of the following data:
name date reason
---------------------------------------
John 2013-04-01 14:00:00 Vacation
John 2013-03-31 18:00:00 Sick
Ted 2012-05-06 20:00:00 Sick
Ted 2012-02-20 01:00:00 Vacation
John 2011-12-21 00:00:00 Sick
Bob 2011-04-02 20:00:00 Sick
I want to see the distribution of the 'reason' column. If I just write a query like the one below
select reason, count(*) as count from table group by reason
then I can see the number of each reason across the whole table:
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in a single reason from each person: the reason from the row with the latest date among that person's records. For example, John's latest reason would be Vacation, while Ted's latest reason would be Sick, and Bob's latest (and only) reason is Sick.
The expected result is shown below (the counts sum to 3 because there are only 3 people):
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query that counts only each person's single latest reason when I want to see the distribution (count) of reasons?
Here are some facts about the table.
The table has tens of millions of rows
Most of the time, each person has one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons while there are millions of distinct names.
The table is partitioned based on date range.
SELECT T.reason, COUNT(*)
FROM
(
SELECT name, MAX(date) AS max_date
FROM table_name
GROUP BY name
) A, table_name T
WHERE T.name = A.name AND T.date = A.max_date
GROUP BY T.reason
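Applied to the sample data from the question, this groupwise-maximum join gives the expected distribution. The sketch uses Python's sqlite3 module as a stand-in for MySQL. The table name absences is hypothetical, and the columns are the question's name, date, and reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, date TEXT, reason TEXT)")
conn.executemany("INSERT INTO absences VALUES (?, ?, ?)", [
    ("John", "2013-04-01 14:00:00", "Vacation"),
    ("John", "2013-03-31 18:00:00", "Sick"),
    ("Ted",  "2012-05-06 20:00:00", "Sick"),
    ("Ted",  "2012-02-20 01:00:00", "Vacation"),
    ("John", "2011-12-21 00:00:00", "Sick"),
    ("Bob",  "2011-04-02 20:00:00", "Sick"),
])

# Find each person's latest date, join back to get that row's reason, then count.
rows = conn.execute("""
    SELECT T.reason, COUNT(*)
    FROM (SELECT name, MAX(date) AS max_date
          FROM absences
          GROUP BY name) A
    JOIN absences T ON T.name = A.name AND T.date = A.max_date
    GROUP BY T.reason
    ORDER BY T.reason
""").fetchall()

print(rows)  # [('Sick', 2), ('Vacation', 1)]
```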
Try this:
select reason, count(*) from
(select reason from table where (name, date) in
(select name, max(date) from table group by name)) t
group by reason
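Matching on date alone, as in date in (select max(date) ... group by name), can pick up an older row of one person whose timestamp happens to equal another person's latest date. Pairing name with date in a row constructor avoids that. The sketch below compares both forms with Python's sqlite3 module as a stand-in for MySQL; row-value IN needs SQLite 3.15 or later. The table absences and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, date TEXT, reason TEXT)")
conn.executemany("INSERT INTO absences VALUES (?, ?, ?)", [
    ("Ann", "2013-01-01 09:00:00", "Sick"),      # Ann's latest row
    ("Ann", "2012-01-01 09:00:00", "Vacation"),  # older, same timestamp as Ben's latest
    ("Ben", "2012-01-01 09:00:00", "Sick"),      # Ben's only row
])

# Date-only match: Ann's older row slips in because its date is Ben's maximum.
date_only = conn.execute("""
    SELECT reason, COUNT(*) FROM
      (SELECT reason FROM absences
       WHERE date IN (SELECT MAX(date) FROM absences GROUP BY name)) t
    GROUP BY reason ORDER BY reason
""").fetchall()

# Name-and-date match: only each person's own latest row qualifies.
paired = conn.execute("""
    SELECT reason, COUNT(*) FROM
      (SELECT reason FROM absences
       WHERE (name, date) IN (SELECT name, MAX(date) FROM absences GROUP BY name)) t
    GROUP BY reason ORDER BY reason
""").fetchall()

print(date_only)  # [('Sick', 2), ('Vacation', 1)]
print(paired)     # [('Sick', 2)]
```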
In MySQL (before version 8.0, which added window functions), it's not very efficient to do this kind of query, since you don't have access to partitioning tools such as RANK() OVER (PARTITION BY ...) in SQL Server or Oracle.
You can still emulate it with a subquery that retrieves the rows matching the condition you need, here the maximum date:
SELECT t.reason, COUNT(1)
FROM
(
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
) maxDateRows
INNER JOIN aTable t ON maxDateRows.name = t.name
AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
Test this query on your samples, but I'm afraid that it will be slow as hell.
For your information, you can do the same thing in a more elegant and much faster way in SQL Server:
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
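For comparison, the same RANK() query runs in SQLite 3.25 or later, and window functions of this form are also available in MySQL 8.0. The sketch below uses Python's sqlite3 module with the question's sample data in a hypothetical aTable. It names the alias rnk because RANK is a reserved word in MySQL 8.0. When several rows tie on a person's latest date, RANK() returns all of them, so ROW_NUMBER() would be the choice if exactly one row per person is required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aTable (name TEXT, adate TEXT, reason TEXT)")
conn.executemany("INSERT INTO aTable VALUES (?, ?, ?)", [
    ("John", "2013-04-01 14:00:00", "Vacation"),
    ("John", "2013-03-31 18:00:00", "Sick"),
    ("Ted",  "2012-05-06 20:00:00", "Sick"),
    ("Ted",  "2012-02-20 01:00:00", "Vacation"),
    ("John", "2011-12-21 00:00:00", "Sick"),
    ("Bob",  "2011-04-02 20:00:00", "Sick"),
])

# Rank each person's rows newest-first, keep rank 1, then count reasons.
rows = conn.execute("""
    SELECT reason, COUNT(1)
    FROM (SELECT name, reason,
                 RANK() OVER (PARTITION BY name ORDER BY adate DESC) AS rnk
          FROM aTable) AS rankTable
    WHERE rnk = 1
    GROUP BY reason
    ORDER BY reason
""").fetchall()

print(rows)  # [('Sick', 2), ('Vacation', 1)]
```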
If you are really stuck with MySQL and the first query is too slow, you can split the problem.
First, run a query that creates a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
Then create an index on both name and maxDate.
Finally, get the results:
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN aTable t ON m.name = t.name
AND m.maxDate = t.adate
GROUP BY t.reason
The solution you are looking for seems to be given by this query:
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
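It is quite fast and simple. Note, however, that MySQL does not guarantee which row of each group SELECT * returns here, so the counted reason is not necessarily the one with the latest date; with the ONLY_FULL_GROUP_BY SQL mode enabled, the query is rejected.
Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia, but I cannot see it...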
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;

How to use query results in another query?

I am trying to write a query that will give me the last entry of each month in a table called transactions. I believe I am halfway there, as I have the following query, which groups all the entries by month and then selects the highest id in each group, which is the last entry for each month.
SELECT max(id),
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm
This gives the correct results:
id yyyymm
100 201006
105 201007
111 201008
118 201009
120 201010
I don't know how to then run a query on the same table that selects the balance column for the rows whose id matches the ids from the first query, to give results like:
id balance date
120 10000 2010-10-08
118 11000 2010-09-29
I've tried subqueries and looked at joins, but I'm not sure how to use them.
You can make your first select an inline view, and then join to it. Something like this (not tested, but should give you the idea):
SELECT x.id
, t.balance
, t.date
FROM transactions t
/* here, we make your select an inline view, then we can join to it */
, (SELECT max(id) id,
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm) x
WHERE t.id = x.id
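Here is a minimal sketch of the inline-view join, run through Python's sqlite3 module as a stand-in for MySQL. SQLite has no EXTRACT(YEAR_MONTH FROM ...), so strftime('%Y%m', date) builds the same month key here. Rows 118 and 120 come from the question's expected output; rows 115 and 119 are hypothetical earlier entries in the same months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, balance INTEGER, date TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (115, 10500, "2010-09-10"),  # hypothetical
    (118, 11000, "2010-09-29"),
    (119, 9800,  "2010-10-02"),  # hypothetical
    (120, 10000, "2010-10-08"),
])

# Inline view x finds the highest id per month; the join fetches that row's balance.
rows = conn.execute("""
    SELECT x.id, t.balance, t.date
    FROM transactions t
    JOIN (SELECT MAX(id) AS id, strftime('%Y%m', date) AS yyyymm
          FROM transactions
          GROUP BY yyyymm) x
      ON t.id = x.id
    ORDER BY x.id DESC
""").fetchall()

print(rows)  # [(120, 10000, '2010-10-08'), (118, 11000, '2010-09-29')]
```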