I have a query with a LEFT JOIN, GROUP BY and ROLLUP, like this:
Select * from
(
Select user_agent,
value,
recoqty,
count(recoqty) as C
from august_2016_search_stats SS
LEFT JOIN august_2016_extra E
on (SS.id = E.stats_id and E.key = 'personalized')
where time >= '2016-08-22 00:00:00' and
time <= '2016-08-22 23:59:59' and
query_type = 'myfeed' and
recoqty = 'topics'
group by recoqty,
user_agent,
value
with ROLLUP
having recoqty is not null
) D
order by C desc;
which gives a result like this:
+------------+-------+---------+------+
| user_agent | value | recoqty | C    |
+------------+-------+---------+------+
| NULL       | NULL  | topics  | 1330 |
| abscdef    | NULL  | topics  | 1330 |
| abscdef    | NULL  | topics  | 1285 |
| abscdef    | 1     | topics  |   25 |
| abscdef    | 0     | topics  |   20 |
+------------+-------+---------+------+
Here, the NULL with count 1285 comes from the LEFT JOIN (no matching row in august_2016_extra), while the NULLs with count 1330 come from the ROLLUP.
However, is there a way to replace the NULL values coming from the LEFT JOIN only, and not the ones from the ROLLUP?
This is a little tricky, because the NULL values coming from your data are indistinguishable from the NULL values produced by the rollup. One possible workaround is to first run a non-aggregating query in which you replace the NULL values in the value column with 'NA', or some other placeholder, using COALESCE. Then aggregate that as a subquery using GROUP BY with rollup. Any NULL left in the value column is then guaranteed to come from the rollup and not from your actual data.
SELECT t.user_agent,
       t.value,
       t.recoqty,
       COUNT(t.recoqty) AS C
FROM
(
    SELECT user_agent,
           COALESCE(value, 'NA') AS value,
           recoqty
    FROM august_2016_search_stats SS
    LEFT JOIN august_2016_extra E
        ON SS.id = E.stats_id AND
           E.key = 'personalized'
    WHERE time >= '2016-08-22 00:00:00' AND
          time <= '2016-08-22 23:59:59' AND
          query_type = 'myfeed' AND
          recoqty = 'topics'
) t
GROUP BY t.recoqty,
         t.user_agent,
         t.value
WITH ROLLUP
HAVING t.recoqty IS NOT NULL;
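With the sample counts from your question, the output should then look something like this (row order aside): the 1285 row produced by the LEFT JOIN now shows 'NA', while the two rollup rows keep their NULLs.
+------------+-------+---------+------+
| user_agent | value | recoqty | C    |
+------------+-------+---------+------+
| NULL       | NULL  | topics  | 1330 |
| abscdef    | NULL  | topics  | 1330 |
| abscdef    | NA    | topics  | 1285 |
| abscdef    | 1     | topics  |   25 |
| abscdef    | 0     | topics  |   20 |
+------------+-------+---------+------+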
I want to SELECT the latest date, the second latest date and the first date FROM table1, where the first date is later than a reference date found in another table, table2, and that reference date should itself be the latest one in table2. I have what is supposed to be a solution, but the problem is that it returns no output if there is ONLY 1 record in table1. Example of the tables:
table1
Reg ID | DateOfAI   | byTechnician
2GP001 | 2015-01-13 | 31
2GP001 | 2015-02-18 | 31
2GP001 | 2017-11-10 | 45
2GP001 | 2017-11-30 | 32
2GP044 | 2017-11-30 | 28
2GP001 | 2017-12-23 | 32
table2
Reg ID | DateOfCalving | DryOffDate
2GP001 | 2016-01-14    |
2GP070 | 2016-01-14    |
2GP065 | 2017-04-08    |
2GP001 | 2017-04-12    |
my expected output would be:
Reg ID | LatestDateOfCalving | 1stDateOfAI | PreviousAIDate | LatestAIDate
2GP001 | 2017-04-12          | 2017-11-10  | 2017-11-30     | 2017-12-23
I have searched everywhere from the moon and back... still no luck. These are the queries that I have used.
The first:
SELECT b.actualDam,COUNT(x.actualDam) AS ilanba, max(b.breedDate) AS huli, max(x.breedDate) AS nex,MIN(x.breedDate) AS una,IFNULL(c.calvingDate,NULL) AS nganak,r.*,h.herdID,a.animalID,a.regID, IFNULL(a.dateOfBirth,NULL) AS buho
FROM x_animal_breeding_rec b
LEFT JOIN x_animal_calving_rec c ON b.recID=c.brecID
LEFT JOIN x_herd_animal_rel r ON b.actualDam=r.animal
LEFT JOIN x_herd h ON r.herd=h.herdID
LEFT JOIN x_animal_main_info a ON b.actualDam=a.animalID
JOIN x_animal_breeding_rec x ON b.actualDam = x.actualDam AND x.breedDate < b.breedDate
WHERE h.herdID = ? AND x.mateType = ? AND x.recFlag = ? GROUP BY b.actualDam
And the second one that I've tried is this code:
SELECT b.recID
, b.actualDam
, b.breedDate
, min(b.breedDate) AS una
, max(b.breedDate) AS huli
, COUNT(b.actualDam) AS sundot
, b.mateType
, b.recFlag
, a.animalID
, a.regID
, h.*
FROM
( SELECT c.recID, c.actualDam
, c.breedDate
, c.mateType
, c.recFlag
, CASE WHEN #prev=c.recID THEN #i:=#i+1 ELSE #i:=1 END i
, #prev:=c.recID prev
FROM x_animal_breeding_rec c
, ( SELECT #prev:=null,#i:=0 ) vars
ORDER BY c.recID,c.breedDate DESC
) b
LEFT JOIN x_animal_main_info a ON b.actualDam=a.animalID
LEFT JOIN x_herd_animal_rel h ON b.actualDam=h.animal
WHERE i <= 2 GROUP BY b.actualDam HAVING h.herd = ? AND b.mateType = ? AND b.recFlag = ? ORDER BY b.breedDate DESC
Another problem here is that the first solution returns a WRONG COUNT, while the second solution returns a CORRECT COUNT but wrong dates. I hope you can give me an idea. Thanks in advance.
The following query answers your question:
SELECT
RegID,
LatestDateOfCalving,
MIN(DateOfAI) AS 1stDateOfAI,
REPLACE(SUBSTRING_INDEX(GROUP_CONCAT(DateOfAI ORDER BY DateOfAI DESC), ',', 2), CONCAT(MAX(DateOfAI), ','), '') AS PreviousAIDate,
MAX(DateOfAI) AS LatestAIDate
FROM (
SELECT
t1.RegID,
LatestDateOfCalving,
DateOfAI,
IF(DateOfAI >= LatestDateOfCalving, 1, 0) AS dates
FROM table1 AS t1
INNER JOIN (
SELECT
RegID,
MAX(DateOfCalving) AS LatestDateOfCalving
FROM table2 GROUP BY RegID
) AS tt2 ON t1.RegID = tt2.RegID) AS x
WHERE dates = 1
GROUP BY RegID
HAVING COUNT(dates) >= 3;
Output:
+--------+---------------------+-------------+----------------+--------------+
| RegID  | LatestDateOfCalving | 1stDateOfAI | PreviousAIDate | LatestAIDate |
+--------+---------------------+-------------+----------------+--------------+
| 2GP001 | 2017-04-12          | 2017-11-10  | 2017-11-30     | 2017-12-23   |
+--------+---------------------+-------------+----------------+--------------+
In a subquery we select RegID and LatestDateOfCalving from table2 in order to have a reference date. Then we join it to table1 and flag each record according to whether DateOfAI is greater than or equal to LatestDateOfCalving (IF(DateOfAI >= LatestDateOfCalving, 1, 0)). We use this subquery in the outer query (SELECT RegID, LatestDateOfCalving, MIN(DateOfAI) AS 1stDateOfAI, MAX(DateOfAI) AS LatestAIDate, ...) and keep only those records where DateOfAI is at or after LatestDateOfCalving (WHERE dates = 1, 1 being the flag for the true condition) and only those RegIDs with at least 3 such records (HAVING COUNT(dates) >= 3). In the outer query I use the REPLACE(SUBSTRING_INDEX(GROUP_CONCAT(...))) construction to extract the PreviousAIDate from a comma-separated list of dates.
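If it helps to see that extraction step on its own, here is a minimal, self-contained sketch with hard-coded dates (not tied to the tables above):
-- GROUP_CONCAT builds '2017-12-23,2017-11-30,2017-11-10',
-- SUBSTRING_INDEX keeps the first two entries,
-- REPLACE then strips the latest date, leaving the second latest.
SELECT REPLACE(
         SUBSTRING_INDEX(GROUP_CONCAT(d ORDER BY d DESC), ',', 2),
         CONCAT(MAX(d), ','),
         '') AS PreviousAIDate
FROM (SELECT '2017-11-10' AS d
      UNION ALL SELECT '2017-11-30'
      UNION ALL SELECT '2017-12-23') AS dates;
-- returns '2017-11-30'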
I'm trying to count the number of sales orders that have been canceled in a time period, but I run into the problem that it doesn't return results that are zero.
My table
+---------------+------------+------------------+
| metrausername | signupdate | cancellationdate |
+---------------+------------+------------------+
| GLO00026      | 2017-06-22 | 2017-03-20       |
| GLO00055      | 2017-06-22 | 2017-04-18       |
| GLO00022      | 2017-06-27 | NULL             |
| GLO00044      | 2017-06-24 | NULL             |
| GLO00005      | 2017-06-26 | NULL             |
+---------------+------------+------------------+
The statement I'm trying to count with:
SELECT metrausername, COUNT(*) AS count FROM salesdata2
WHERE cancellationdate IS NOT NULL
AND signupDate >= '2017-6-21' AND signupDate <= '2017-7-20'
GROUP BY metrausername;
Let me know if any additional information would help
If a metrausername is filtered out by the WHERE clause, it won't appear at all. LEFT JOIN to the aggregation to get round this:
select distinct a1.metrausername, coalesce(a2.counted,0) as counted -- coalesce replaces null with a value
from salesdata2 a1
left join
(
SELECT metrausername, COUNT(*) AS counted
FROM salesdata2
WHERE cancellationdate IS NOT NULL
AND signupDate >= '2017-6-21' AND signupDate <= '2017-7-20'
GROUP BY metrausername
) a2
on a1.metrausername = a2.metrausername
I would just do this by moving the filtering condition into the SELECT. Assuming you really do want the date range (as opposed to also showing users who signed up outside the range), then:
SELECT metrausername, COUNT(cancellationdate) AS count
FROM salesdata2
WHERE signupDate >= '2017-06-21' AND signupDate <= '2017-07-20'
GROUP BY metrausername;
COUNT(<colname>) counts the non-NULL values, so this seems like the simplest approach.
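For instance, on just the five sample rows shown above, the difference would presumably be:
-- COUNT(*) counts every row; COUNT(cancellationdate) skips the NULLs
SELECT COUNT(*)                AS all_rows,        -- 5
       COUNT(cancellationdate) AS cancelled_rows   -- 2
FROM salesdata2;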
I have this query
SELECT
`from_id` as user_id,
MAX(`createdon`) as updated_at,
SUM(`unread`) as new,
u.username,
p.sessionid,
s.access
FROM (
SELECT `from_id`, `createdon`, `unread`
FROM `modx_messenger_messages`
WHERE `to_id` = {$id}
UNION
SELECT `to_id`, `createdon`, 0
FROM `modx_messenger_messages`
WHERE `from_id` = {$id}
ORDER BY `createdon` DESC
) as m
LEFT JOIN `modx_users` as u ON (u.id = m.from_id)
LEFT JOIN `modx_user_attributes` as p ON (p.internalKey = m.from_id)
LEFT JOIN `modx_session` as s ON (s.id = p.internalKey)
GROUP BY `from_id`
ORDER BY `new` DESC, `createdon` DESC;
table
id | message | createdon | from_id | to_id | unread
 1 | test    | NULL      | 5       | 6     | 0
 2 | test2   | NULL      | 6       | 5     | 1
 3 | test3   | NULL      | 6       | 5     | 1
The result is new = 28. Why?
If I remove the joins, new = 2, which is correct.
Though it depends on the actual database, pure SQL says that a statement using GROUP BY requires all non-aggregated columns to be in the GROUP BY. Without including all of them, weird things can happen, which might explain why you get different results. In this case the inflated SUM is most likely caused by the LEFT JOINs matching more than one row per message (for example several modx_user_attributes or modx_session rows per user), so SUM(unread) is computed over the multiplied rows. If you know that the other columns are going to be the same within a user_id, you could do MAX(u.username) or something similar (again, depending on your database server). So I'd try to clean up the SQL statement first, as sketched below.
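A rough sketch of that clean-up, reusing the tables and columns from your query but ignoring the UNION branch for brevity (so it only covers messages sent to {$id}): aggregate the messages first, then join the user tables, so the joins can no longer inflate the SUM. The DISTINCT-like behaviour comes from the GROUP BY:
SELECT m.user_id,
       m.updated_at,
       m.new,
       MAX(u.username)  AS username,   -- MAX() guards against duplicate joined rows
       MAX(p.sessionid) AS sessionid,
       MAX(s.access)    AS access
FROM (
    SELECT `from_id`        AS user_id,
           MAX(`createdon`) AS updated_at,
           SUM(`unread`)    AS new
    FROM `modx_messenger_messages`
    WHERE `to_id` = {$id}
    GROUP BY `from_id`
) AS m
LEFT JOIN `modx_users`           AS u ON u.id = m.user_id
LEFT JOIN `modx_user_attributes` AS p ON p.internalKey = m.user_id
LEFT JOIN `modx_session`         AS s ON s.id = p.internalKey
GROUP BY m.user_id, m.updated_at, m.new
ORDER BY m.new DESC, m.updated_at DESC;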
For example, I have created 3 indexes:
click_date - transaction table, daily_metric table
order_date - transaction table
I want to check whether my query uses the indexes, so I use EXPLAIN and get this result:
+------+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
| id   | select_type  | table        | type  | possible_keys | key        | key_len | ref  | rows   | Extra                                        |
+------+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
|    1 | PRIMARY      | <derived2>   | ALL   | NULL          | NULL       | NULL    | NULL |    668 | Using temporary; Using filesort              |
|    2 | DERIVED      | <derived3>   | ALL   | NULL          | NULL       | NULL    | NULL |    645 |                                              |
|    2 | DERIVED      | <derived4>   | ALL   | NULL          | NULL       | NULL    | NULL |    495 |                                              |
|    4 | DERIVED      | transaction  | ALL   | order_date    | NULL       | NULL    | NULL | 291257 | Using where; Using temporary; Using filesort |
|    3 | DERIVED      | daily_metric | range | click_date    | click_date | 3       | NULL | 812188 | Using where; Using temporary; Using filesort |
|    5 | UNION        | <derived7>   | ALL   | NULL          | NULL       | NULL    | NULL |    495 |                                              |
|    5 | UNION        | <derived6>   | ALL   | NULL          | NULL       | NULL    | NULL |    645 | Using where; Not exists                      |
|    7 | DERIVED      | transaction  | ALL   | order_date    | NULL       | NULL    | NULL | 291257 | Using where; Using temporary; Using filesort |
|    6 | DERIVED      | daily_metric | range | click_date    | click_date | 3       | NULL | 812188 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | <union2,5>   | ALL   | NULL          | NULL       | NULL    | NULL |   NULL |                                              |
+------+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
In the EXPLAIN results I see that the order_date index of the transaction table is not used; do I understand that correctly?
Was the click_date index of the daily_metric table used correctly?
Please tell me how to tell from the EXPLAIN result whether the indexes I created are used properly in the query.
My query:
SELECT
partner_id,
the_date,
SUM(clicks) as clicks,
SUM(total_count) as total_count,
SUM(count) as count,
SUM(total_sum) as total_sum,
SUM(received_sum) as received_sum,
SUM(partner_fee) as partner_fee
FROM (
SELECT
clicks.partner_id,
clicks.click_date as the_date,
clicks,
orders.total_count,
orders.count,
orders.total_sum,
orders.received_sum,
orders.partner_fee
FROM
(SELECT
partner_id, click_date, sum(clicks) as clicks
FROM
daily_metric WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY partner_id , click_date) as clicks
LEFT JOIN
(SELECT
partner_id,
DATE(order_date) as order_dates,
SUM(order_sum) as total_sum,
SUM(customer_paid_sum) as received_sum,
SUM(partner_fee) as partner_fee,
count(*) as total_count,
count(CASE
WHEN status = 1 THEN 1
ELSE NULL
END) as count
FROM
transaction WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
UNION ALL SELECT
orders.partner_id,
orders.order_dates as the_date,
clicks,
orders.total_count,
orders.count,
orders.total_sum,
orders.received_sum,
orders.partner_fee
FROM
(SELECT
partner_id, click_date, sum(clicks) as clicks
FROM
daily_metric WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY partner_id , click_date) as clicks
RIGHT JOIN
(SELECT
partner_id,
DATE(order_date) as order_dates,
SUM(order_sum) as total_sum,
SUM(customer_paid_sum) as received_sum,
SUM(partner_fee) as partner_fee,
count(*) as total_count,
count(CASE
WHEN status = 1 THEN 1
ELSE NULL
END) as count
FROM
transaction WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
WHERE
clicks.partner_id is NULL
ORDER BY the_date DESC
) as t
GROUP BY the_date ORDER BY the_date DESC LIMIT 50 OFFSET 0
Although I can't explain what the EXPLAIN has dumped, I thought there must be an easier solution to what you have and came up with the following. I would suggest the following indexes to optimize your existing query for the WHERE date range and grouping by partner.
Additionally, when a query applies a function to a column, it can't take advantage of an index on that column, as with your DATE(order_date) and DATE(click_date). To let the index be used, compare against the full date/time range instead, from midnight of the first day up to (but not including) the first day after the range. I would typically do this via
x >= someDate AND x < firstDayAfterRange
In your example that would be (note the "less than May 1st", which still covers everything up to April 30th at 11:59:59pm):
click_date >= '2013-04-01' AND click_date < '2013-05-01'
Table          Index
transaction    (order_date, partner_id)
daily_metric   (click_date, partner_id)
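A possible way to create them (the index names here are arbitrary):
CREATE INDEX idx_transaction_order_partner  ON transaction (order_date, partner_id);
CREATE INDEX idx_daily_metric_click_partner ON daily_metric (click_date, partner_id);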
Now, an adjustment. Since your clicks table may have entries the transaction table doesn't, and vice versa, I would adjust this query to do a pre-query of all possible date/partner combinations, then LEFT JOIN that to the respective aggregate queries, such as:
SELECT
AllPartners.Partner_ID,
AllPartners.the_Date,
coalesce( clicks.clicks, 0 ) Clicks,
coalesce( orders.total_count, 0 ) TotalCount,
coalesce( orders.count, 0 ) OrderCount,
coalesce( orders.total_sum, 0 ) OrderSum,
coalesce( orders.received_sum, 0 ) ReceivedSum,
coalesce( orders.partner_fee, 0 ) PartnerFee
from
( select distinct
dm.partner_id,
DATE( dm.click_date ) as the_Date
FROM
daily_metric dm
WHERE
dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
UNION
select
t.partner_id,
DATE(t.order_date) as the_Date
FROM
transaction t
WHERE
t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01' ) AllPartners
LEFT JOIN
( SELECT
dm.partner_id,
DATE( dm.click_date ) sumDate,
sum( dm.clicks) as clicks
FROM
daily_metric dm
WHERE
dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
GROUP BY
dm.partner_id,
DATE( dm.click_date ) ) as clicks
ON AllPartners.partner_id = clicks.partner_id
AND AllPartners.the_date = clicks.sumDate
LEFT JOIN
( SELECT
t.partner_id,
DATE(t.order_date) as sumDate,
SUM(t.order_sum) as total_sum,
SUM(t.customer_paid_sum) as received_sum,
SUM(t.partner_fee) as partner_fee,
count(*) as total_count,
count(CASE WHEN t.status = 1 THEN 1 ELSE NULL END) as COUNT
FROM
transaction t
WHERE
t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01'
GROUP BY
t.partner_id,
DATE(t.order_date) ) as orders
ON AllPartners.partner_id = orders.partner_id
AND AllPartners.the_date = orders.sumDate
order by
AllPartners.the_date DESC
limit 50 offset 0
This way, the first query will be quick on the index to get all possible combinations from EITHER table. Then the left-join will AT MOST join to one row per set. If found, get the number, if not, I am applying COALESCE() so if null, defaults to zero.
CLARIFICATION.
As with your own pre-aggregate queries of "clicks" and "orders", "AllPartners" is the alias for the result of the SELECT DISTINCT of partners and dates within the date range you were interested in. The resulting columns of that query are "partner_id" and "the_date", matching the subsequent subqueries, so it forms the basis for joining to the aggregates of "clicks" and "orders". Since I have these two columns in the alias "AllPartners", I just grab those for the field list, as they are LEFT JOINed to the other aliases and may not exist in either of the respective others.
I have a table 'foo' with a timestamp field 'bar'. How do I get only the oldest timestamp for a query like: SELECT foo.bar from foo? I tried doing something like: SELECT MIN(foo.bar) from foo but it failed with this error
ERROR 1140 (42000) at line 1: Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause
OK, so my query is much more complicated than that and that's why I am having a hard time with it. This is the query with the MIN(a.timestamp):
select distinct a.user_id as 'User ID',
a.project_id as 'Remix Project Id',
prjs.based_on_pid as 'Original Project ID',
(case when f.reasons is NULL then 'N' else 'Y' end)
as 'Flagged Y or N',
f.reasons, f.timestamp, MIN(a.timestamp)
from view_stats a
join (select id, based_on_pid, user_id
from projects p) prjs on
(a.project_id = prjs.id)
left outer join flaggers f on
( f.project_id = a.project_id
and f.user_id = a.user_id)
where a.project_id in
(select distinct b.id
from projects b
where b.based_on_pid in
( select distinct c.id
from projects c
where c.user_id = a.user_id
)
)
order by f.reasons desc, a.user_id, a.project_id;
Any help would be greatly appreciated.
The view_stats table:
+------------+------------------+------+-----+-------------------+----------------+
| Field      | Type             | Null | Key | Default           | Extra          |
+------------+------------------+------+-----+-------------------+----------------+
| id         | int(10) unsigned | NO   | PRI | NULL              | auto_increment |
| user_id    | int(10) unsigned | NO   | MUL | 0                 |                |
| project_id | int(10) unsigned | NO   | MUL | 0                 |                |
| ipaddress  | bigint(20)       | YES  | MUL | NULL              |                |
| timestamp  | timestamp        | NO   |     | CURRENT_TIMESTAMP |                |
+------------+------------------+------+-----+-------------------+----------------+
If you are going to use aggregate functions (like min(), max(), avg(), etc.) you need to tell the database exactly what it should take the min() of. For example, say the data looks like this:
transaction   date
one           8/4/09
one           8/5/09
one           8/6/09
two           8/1/09
two           8/3/09
three         8/4/09
I assume you want the following.
transaction   date
one           8/4/09
two           8/1/09
three         8/4/09
Then to get that you can use the following query. Note the GROUP BY clause, which tells the database how to group the data before taking the min() within each group.
select transaction,
       min(date)
from table
group by transaction
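Applied to your own query, one sketch (assuming every non-aggregated column in the SELECT list belongs in the GROUP BY, which you should double-check against your data; the DISTINCT is no longer needed once you group) would be:
select a.user_id as 'User ID',
       a.project_id as 'Remix Project Id',
       prjs.based_on_pid as 'Original Project ID',
       (case when f.reasons is NULL then 'N' else 'Y' end) as 'Flagged Y or N',
       f.reasons,
       f.timestamp,
       MIN(a.timestamp)
from view_stats a
join (select id, based_on_pid, user_id from projects p) prjs
  on (a.project_id = prjs.id)
left outer join flaggers f
  on (f.project_id = a.project_id and f.user_id = a.user_id)
where a.project_id in
      (select distinct b.id
         from projects b
        where b.based_on_pid in
              (select distinct c.id
                 from projects c
                where c.user_id = a.user_id))
group by a.user_id, a.project_id, prjs.based_on_pid, f.reasons, f.timestamp
order by f.reasons desc, a.user_id, a.project_id;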