GROUP BY HAVING not working as expected - mysql

I'm struggling with what should be a simple query.
An event table stores user activity in an application. Each click generates a new event and datetime stamp. I need to show a list of recently accessed records having the most recent datetime stamp. I need to only show the past 7 days of activity.
The table has an auto-increment field (eventID), which corresponds with the date_event field, so it's better to use that for determining the most recent record in the group.
I found that some records are not appearing in my results with the expected most recent datetime. So I stripped my query down to the basics:
NOTE that the real-life query does not look at custID. I am including it here to narrow down on the problem.
SELECT
el.eventID,
el.custID,
el.date_event
FROM
event_log el
WHERE
el.custID = 12345 AND
el.userID=987
GROUP BY
el.custID
HAVING
MAX( el.eventID )
This is returned:
eventID custID date_event
346290 12345 2013-06-21 09:58:44
Here's the EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE el ref userID,custID,Composite custID 5 const 203 Using where
If I change the query to use HAVING MIN, the results don't change. I should see a different eventID and date_event, as there are dozens of records matching the custID and userID.
SELECT
el.eventID,
el.custID,
el.date_event
FROM
event_log el
WHERE
el.custID = 12345 AND
el.userID=987
GROUP BY
el.custID
HAVING
MIN( el.eventID )
Same results as before:
eventID custID date_event
346290 12345 2013-06-21 09:58:44
No change.
This tells me I have another problem, but I am not seeing what that might be.
Some pointers would be appreciated.

SELECT
el.eventID,
el.custID,
el.date_event
FROM
event_log el
WHERE
el.custID = 12345 AND
el.userID=987 AND
el.eventID IN (SELECT MAX(eventID)
FROM event_log
WHERE custID = 12345
AND userID = 987)
Your query doesn't work because you misunderstand what HAVING does. It evaluates the expression once for each group in the result set and keeps the groups where the expression evaluates to true. The expression MAX(el.eventID) simply returns the maximum event ID in the group, which is non-zero and therefore true; it doesn't compare the current row to that event ID. The same holds for MIN(el.eventID), which is why both queries return the same row.
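To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption on my part; the two later rows are made-up sample data). It selects only the grouped column, so the result does not depend on how either engine picks non-aggregated columns:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log ("
    "eventID INTEGER PRIMARY KEY, custID INT, userID INT, date_event TEXT)"
)
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?, ?)",
    [
        (346290, 12345, 987, "2013-06-21 09:58:44"),
        (346300, 12345, 987, "2013-06-22 10:00:00"),  # made-up row
        (346310, 12345, 987, "2013-06-23 11:30:00"),  # made-up row
    ],
)


def groups_kept(aggregate):
    # HAVING <aggregate>(eventID) only asks "is this value non-zero?" per group.
    sql = (
        "SELECT custID FROM event_log "
        "WHERE custID = 12345 AND userID = 987 "
        f"GROUP BY custID HAVING {aggregate}(eventID)"
    )
    return conn.execute(sql).fetchall()


# Both aggregates are non-zero, so both keep the same single group.
having_max = groups_kept("MAX")
having_min = groups_kept("MIN")

# Comparing each row against the maximum is what selects the latest event.
latest = conn.execute(
    "SELECT eventID, custID, date_event FROM event_log "
    "WHERE custID = 12345 AND userID = 987 "
    "AND eventID = (SELECT MAX(eventID) FROM event_log "
    "               WHERE custID = 12345 AND userID = 987)"
).fetchall()
```

Here having_max and having_min are identical, while latest contains only the row with the highest eventID.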
Another way is:
SELECT
el.eventID,
el.custID,
el.date_event
FROM
event_log el
WHERE
el.custID = 12345 AND
el.userID=987
ORDER BY eventID DESC
LIMIT 1
The more general form that works for multiple custID is:
SELECT el.*
FROM event_log el
JOIN (SELECT custID, max(date_event) maxdate
FROM event_log
WHERE userID = 987
GROUP BY custID) emax
ON el.custID = emax.custID AND el.date_event = emax.maxdate
WHERE el.userID = 987
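A quick way to check the per-customer form is to run it on a few made-up rows. This is only a sketch: it uses Python's sqlite3 module instead of MySQL, and the sample data (two customers plus one event from another user) is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log ("
    "eventID INTEGER PRIMARY KEY, custID INT, userID INT, date_event TEXT)"
)
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?, ?)",
    [
        (1, 111, 987, "2013-06-20 08:00:00"),
        (2, 111, 987, "2013-06-21 09:00:00"),
        (3, 222, 987, "2013-06-19 07:00:00"),
        (4, 222, 987, "2013-06-22 10:00:00"),
        (5, 222, 555, "2013-06-23 12:00:00"),  # different user, must be ignored
    ],
)

# The derived table finds the latest date per customer for this user;
# joining back to event_log recovers the full row for that date.
latest_per_customer = conn.execute(
    """
    SELECT el.eventID, el.custID, el.date_event
    FROM event_log el
    JOIN (SELECT custID, MAX(date_event) AS maxdate
          FROM event_log
          WHERE userID = 987
          GROUP BY custID) emax
      ON el.custID = emax.custID AND el.date_event = emax.maxdate
    WHERE el.userID = 987
    ORDER BY el.custID
    """
).fetchall()
```

Note that if two events of the same customer share the exact same date_event, the join returns both of them.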

You can use a group function in a statement containing no GROUP BY clause, but it would be equivalent to grouping on all rows. But I guess you're looking for the common syntax:
SELECT
MIN(el.eventID) AS `min_eventID`, -- Yes, it is wrong :(
el.custID,
el.date_event
FROM
event_log el
WHERE
el.userID = 987
GROUP BY el.custID;
But disagreements are welcome.
[ Edit ]
I think I didn't show a solution fast enough... but maybe you're rather looking for the fastest solution.
Assuming field date_event defaults to CURRENT_TIMESTAMP (am I wrong?), ordering by date_event would be a waste of time (and money, thus).
I've made some tests with 20K rows and execution time was about 5ms.
SELECT STRAIGHT_JOIN y.*
FROM ((
SELECT MAX(eventId) as eventId
FROM event_log
WHERE userId = 987 AND custId = 12345
)) AS x
INNER JOIN event_log AS y
USING (eventId);
Maybe (possibly, who knows) you didn't get the STRAIGHT_JOIN thing; as documented in the scriptures, STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. Sometimes it's useful.
For your specific situation, we want to filter down to a single eventID first (on table "x") rather than retrieve 99.99% useless rows from table "y".
More disagreements expected in 3, 2, ...

Related

Selecting rows until a column value isn't the same

SELECT product.productID
, product.Name
, product.date
, product.status
FROM product
INNER JOIN shelf ON product.sheldID=shelf.shelfID
WHERE product.weekID = $ID
AND product.date < '$day'
OR (product.date = '$day' AND shelf.expire <= '$time' )
ORDER BY concat(product.date,shelf.expire)
I am trying to stop the SQL statement at a specific value, e.g. Bad.
I have tried using max-date, but I'm finding it hard because I build the timestamp inside the query (by combining date and time).
This example table shows that 3 results should be returned; if the status "Bad" were the first result, then no results should be returned. (They are ordered by date and time.)
ProductID Date status
1 2017-03-27 Good
2 2017-03-27 Good
3 2017-03-26 Good
4 2017-03-25 Bad
5 2017-03-25 Good
I think I may have fixed it by adding this to my while loop.
The query returns the results ordered from present to past by date and time. The while loop checks whether the status column of each row is equal to 'bad'; if it is, it does something (I might be able to fill an array with the data). If not, the loop is broken.
I know it doesn't seem ideal, but it works lol
while ($row = mysqli_fetch_assoc($result)) {
    if ($row['status'] == "bad") {
        $counter += 1;
    } else {
        break;
    }
}
I will provide an answer based only on your output, as if it were a single table. It will give you the main idea of how to solve your problem.
Basically, I created a column called ord that works as a row number (MySQL doesn't support ROW_NUMBER yet, AFAIK). Then I get the minimum ord value for a Bad status, and then I select everything from the data where ord is less than that.
select y.*
from (select ProductID, dt, status, @rw:=@rw+1 ord
from product, (select @rw:=0) a
order by dt desc) y
where y.ord < (select min(ord) ord
from (select ProductID, status, @rin:=@rin+1 ord
from product, (select @rin:=0) a
order by dt desc) x
where status = 'Bad');
Result will be:
ProductID dt status ord
-------------------------------------
1 2017-03-27 Good 1
2 2017-03-27 Good 2
3 2017-03-26 Good 3
Also tested with the use case where the Bad status is the first result, no results will be returned.
See it working here: http://sqlfiddle.com/#!9/28dda/1
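For comparison, the same "everything before the first Bad" rule can be written without user variables. The sketch below is not the query from the answer above: it swaps in a NOT EXISTS subquery, runs on Python's sqlite3 module rather than MySQL, and assumes ties on dt are ordered by ascending ProductID (the answer's ORDER BY leaves ties undefined). The helper name rows_before_first_bad is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (ProductID INTEGER PRIMARY KEY, dt TEXT, status TEXT)"
)


def rows_before_first_bad(rows):
    """Load rows, then return those that precede the first 'Bad' row
    when ordered by dt DESC, ProductID ASC (assumed tie-break)."""
    conn.execute("DELETE FROM product")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", rows)
    return conn.execute(
        """
        SELECT p.ProductID, p.dt, p.status
        FROM product p
        WHERE NOT EXISTS (
            -- a Bad row at the same position or earlier in the ordering
            SELECT 1 FROM product b
            WHERE b.status = 'Bad'
              AND (b.dt > p.dt
                   OR (b.dt = p.dt AND b.ProductID <= p.ProductID))
        )
        ORDER BY p.dt DESC, p.ProductID
        """
    ).fetchall()


sample = [
    (1, "2017-03-27", "Good"),
    (2, "2017-03-27", "Good"),
    (3, "2017-03-26", "Good"),
    (4, "2017-03-25", "Bad"),
    (5, "2017-03-25", "Good"),
]
kept = rows_before_first_bad(sample)

# Same data, but with the very first row marked Bad: nothing should be returned.
bad_first = [(1, "2017-03-27", "Bad")] + sample[1:]
kept_when_bad_first = rows_before_first_bad(bad_first)
```

With the sample table from the question this keeps products 1, 2 and 3, and it returns no rows when the first row is Bad.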

MySQL find minimum and maximum date associated with a record in another table

I am trying to write a query to find the number of miles on a bicycle fork. This number is calculated by taking the distance_reading associated with the date that the fork was installed (the minimum reading_date on or after the Bicycle_Fork.start_date of the Bicycle_Fork record) and subtracting it from the distance_reading associated with the date that the fork was removed (the maximum reading_date on or before the Bicycle_Fork.end_date or, if that is null, the reading closest to today's date). I've managed to restrict the range of odometer readings to the appropriate ones, but I cannot figure out how to find the minimum and maximum date for each odometer within that range. It was easy when I only had to look at records matching the start_date or end_date, but the user is not required to enter a new odometer reading on each date that a part is changed. I've been working on this query for several hours now, and I can't find a way to use MIN() that doesn't just take the single smallest date out of all of the results.
Question: How can I find the minimum reading_date and the maximum reading_date associated with each odometer_id while maintaining the restrictions created by my WHERE clause?
If this is not possible, I plan to store the values retrieved from the first query in an array in PHP and deal with it from there, but I would like to be able to find a solution solely in MySQL.
Here is an SQL fiddle with the database schema and the current state of the query: http://sqlfiddle.com/#!2/015642/1
SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id, Bicycle_Fork.fork_id
FROM Bicycle_Fork
INNER JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id
AND Odometers.bicycle_id = Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE (OdometerReadings.reading_date >= Bicycle_Fork.start_date) AND
((Bicycle_Fork.end_date IS NOT NULL AND OdometerReadings.reading_date<= Bicycle_Fork.end_date) XOR (Bicycle_Fork.end_date IS NULL AND OdometerReadings.reading_date <= CURRENT_DATE()))
This is the old query that didn't take into account the possibility of the database lacking a record that corresponded with the start_date or end_date:
SELECT MaxReadingOdo.distance_reading, MinReadingOdo.distance_reading
FROM
(SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id
FROM Bicycle_Fork
LEFT JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id
AND Odometers.bicycle_id = Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE Bicycle_Fork.start_date = OdometerReadings.reading_date) AS MinReadingOdo
INNER JOIN
(SELECT OdometerReadings.distance_reading, OdometerReadings.reading_date,
OdometerReadings.odometer_id
FROM Bicycle_Fork
LEFT JOIN (Bicycles, Odometers, OdometerReadings)
ON (Bicycles.bicycle_id = Bicycle_Fork.bicycle_id AND Odometers.bicycle_id
= Bicycles.bicycle_id AND OdometerReadings.odometer_id = Odometers.odometer_id)
WHERE Bicycle_Fork.end_date = OdometerReadings.reading_date) AS
MaxReadingOdo
ON MinReadingOdo.odometer_id = MaxReadingOdo.odometer_id
I'm trying to get the following result from the schema. I will eventually sum these into one number, but I've been working with them separately to make it easier to check the values:
min_distance_reading | max_distance_reading | odometer_id
=============================================================
75.5 | 2580.5 | 1
510.5 | 4078.5 | 2
17.5 | 78.5 | 3
I don't understand the final part of the puzzle, but this seems close...
SELECT MIN(ro.distance_reading) min_val
, MAX(ro.distance_reading) max_val
, ro.odometer_id
FROM OdometerReadings ro
JOIN odometers o
ON o.odometer_id = ro.odometer_id
JOIN Bicycle_Fork bf
ON bf.bicycle_id = o.bicycle_id
AND bf.start_date <= ro.reading_date
GROUP
BY ro.odometer_id;
http://sqlfiddle.com/#!2/015642/8
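One possible way to add the missing upper bound is to compare reading_date against COALESCE(end_date, today). The sketch below is only an illustration under several assumptions: it runs on Python's sqlite3 module rather than MySQL (so it uses DATE('now') where MySQL would use CURRENT_DATE()), the table layout is a simplified guess at the fiddle's schema, the rows are made up, and it relies on odometer readings never decreasing over time, so that MIN and MAX of distance_reading match the first and last reading in the range:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical versions of the tables in the question.
    CREATE TABLE Odometers (odometer_id INTEGER PRIMARY KEY, bicycle_id INT);
    CREATE TABLE Bicycle_Fork (fork_id INTEGER PRIMARY KEY, bicycle_id INT,
                               start_date TEXT, end_date TEXT);
    CREATE TABLE OdometerReadings (odometer_id INT, reading_date TEXT,
                                   distance_reading REAL);

    INSERT INTO Odometers VALUES (1, 10), (2, 20);
    INSERT INTO Bicycle_Fork VALUES
        (1, 10, '2013-01-10', '2013-03-01'),
        (2, 20, '2013-02-01', NULL);          -- fork still installed
    INSERT INTO OdometerReadings VALUES
        (1, '2013-01-01',   50.0),            -- before the fork was installed
        (1, '2013-01-15',   75.5),
        (1, '2013-02-10',  900.0),
        (1, '2013-02-25', 2580.5),
        (1, '2013-03-20', 3000.0),            -- after the fork was removed
        (2, '2013-02-05',  510.5),
        (2, '2013-04-01', 4078.5);
    """
)

per_odometer = conn.execute(
    """
    SELECT MIN(r.distance_reading) AS min_val,
           MAX(r.distance_reading) AS max_val,
           r.odometer_id
    FROM OdometerReadings r
    JOIN Odometers o ON o.odometer_id = r.odometer_id
    JOIN Bicycle_Fork bf
      ON bf.bicycle_id = o.bicycle_id
     AND r.reading_date >= bf.start_date
     AND r.reading_date <= COALESCE(bf.end_date, DATE('now'))
    GROUP BY r.odometer_id
    ORDER BY r.odometer_id
    """
).fetchall()
```

If a bicycle can have had several forks, grouping by bf.fork_id as well would keep their ranges apart.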

Why does this query return no result?

I have two tables. The table reseau_stream holds the information about a user's posts. A user can share someone else's post, and the table reseau_share makes that connection (the details of both tables are below). Now, if a user shares someone else's post, I have to order my query using the datetime of reseau_share.
I don't have a lot of MySQL skills, but with some help I finally ended up with the query below. It works only if reseau_share has a row in it. If reseau_share is empty, the query returns 0 results. I really don't understand why. Can anyone identify why? Cheers.
Table reseau_stream
id user_id content datetime
1 100 Lorem Ipsum1 2013-03-04 19:35:02
2 100 Lorem Ipsum2 2013-03-04 12:35:02
Table reseau_share
id user_id target_id stream_id datetime
-------------------- EMPTY ------------------------
The query
SELECT reseau_stream.id,
reseau_stream.user_id,
reseau_stream.content,
IF(reseau_stream.user_id = 100, reseau_stream.datetime, reseau_share.datetime) as datetime
FROM reseau_stream, reseau_share
WHERE reseau_stream.id
IN (
SELECT id
FROM reseau_stream
WHERE user_id = 100
UNION
SELECT stream_id
FROM reseau_share
WHERE user_id = 100
) ORDER BY datetime DESC;
Basically it looks like you need a LEFT JOIN on reseau_share. Right now you have a CROSS JOIN (two comma-separated tables with no join condition), which (a) is causing the zero rows, as @diegoperini has pointed out, and (b) probably isn't what you really want. It's unclear which column relates the two tables. I'll guess it's user_id:
SELECT
reseau_stream.id,
reseau_stream.user_id,
reseau_stream.content,
IF(reseau_stream.user_id = 100, reseau_stream.datetime, reseau_share.datetime) as datetime
FROM reseau_stream
LEFT JOIN reseau_share ON reseau_stream.user_id = reseau_share.user_id
WHERE reseau_stream.id
IN (
SELECT id
FROM reseau_stream
WHERE user_id = 100
UNION
SELECT stream_id -- or whatever
FROM reseau_share
WHERE user_id = 100
)
ORDER BY datetime DESC;
The Cartesian product of a non-empty set with an empty set is an empty set.
Listing multiple tables in a FROM clause joins them by this rule, which ends up with 0 results in your case.
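This is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and joins on stream_id = id, which is only my guess at how the two tables relate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reseau_stream (id INTEGER PRIMARY KEY, user_id INT,
                                content TEXT, "datetime" TEXT);
    CREATE TABLE reseau_share (id INTEGER PRIMARY KEY, user_id INT,
                               target_id INT, stream_id INT, "datetime" TEXT);
    INSERT INTO reseau_stream VALUES
        (1, 100, 'Lorem Ipsum1', '2013-03-04 19:35:02'),
        (2, 100, 'Lorem Ipsum2', '2013-03-04 12:35:02');
    -- reseau_share is deliberately left empty.
    """
)

# Comma-separated tables: 2 rows x 0 rows = 0 rows.
cross_rows = conn.execute(
    "SELECT reseau_stream.id FROM reseau_stream, reseau_share"
).fetchall()

# LEFT JOIN keeps every stream row and fills the share columns with NULL.
left_rows = conn.execute(
    """
    SELECT reseau_stream.id, reseau_share.id
    FROM reseau_stream
    LEFT JOIN reseau_share ON reseau_share.stream_id = reseau_stream.id
    ORDER BY reseau_stream.id
    """
).fetchall()
```

cross_rows comes back empty, while left_rows still has one row per post.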

How can I return the numerical boxplot data of all results using 1 mySQL query?

[tbl_votes]
- id <!-- unique id of the vote -->
- item_id <!-- vote belongs to item <id> -->
- vote <!-- number 1-10 -->
Of course we can fix this by getting:
the smallest observation (so)
the lower quartile (lq)
the median (me)
the upper quartile (uq)
and the largest observation (lo)
..one-by-one using multiple queries but I am wondering if it can be done with a single query.
In Oracle I can use COUNT OVER and RATIO_TO_REPORT, but this is not supported in mySQL.
For those who don't know what a boxplot is: http://en.wikipedia.org/wiki/Box_plot
Any help would be appreciated.
I've found a solution in PostgreSQL using PL/Python.
However, I leave the question open in case someone else comes up with a solution in mySQL.
CREATE TYPE boxplot_values AS (
min numeric,
q1 numeric,
median numeric,
q3 numeric,
max numeric
);
CREATE OR REPLACE FUNCTION _final_boxplot(strarr numeric[])
RETURNS boxplot_values AS
$$
x = strarr.replace("{","[").replace("}","]")
a = eval(str(x))
a.sort()
i = len(a)
return ( a[0], a[i/4], a[i/2], a[i*3/4], a[-1] )
$$
LANGUAGE 'plpythonu' IMMUTABLE;
CREATE AGGREGATE boxplot(numeric) (
SFUNC=array_append,
STYPE=numeric[],
FINALFUNC=_final_boxplot,
INITCOND='{}'
);
Example:
SELECT customer_id as cid, (boxplot(price)).*
FROM orders
GROUP BY customer_id;
cid | min | q1 | median | q3 | max
-------+---------+---------+---------+---------+---------
1001 | 7.40209 | 7.80031 | 7.9551 | 7.99059 | 7.99903
1002 | 3.44229 | 4.38172 | 4.72498 | 5.25214 | 5.98736
Source: http://www.christian-rossow.de/articles/PostgreSQL_boxplot_median_quartiles_aggregate_function.php
Here is an example of calculating the quartiles for e256 value ranges within e32 groups. An index on (e32, e256) is a must in this case:
SELECT
@group:=IF(e32=@group, e32, GREATEST(@index:=-1, e32)) as e32_,
MIN(e256) as so,
MAX(IF(lq_i=(@index:=@index+1), e256, NULL)) as lq,
MAX(IF(me_i=@index, e256, NULL)) as me,
MAX(IF(uq_i=@index, e256, NULL)) as uq,
MAX(e256) as lo
FROM (SELECT @index:=NULL, @group:=NULL) as init, test t
JOIN (
SELECT e32,
COUNT(*) as cnt,
(COUNT(*) div 4) as lq_i, -- lq value index within the group
(COUNT(*) div 2) as me_i, -- me value index within the group
(COUNT(*) * 3 div 4) as uq_i -- uq value index within the group
FROM test
GROUP BY e32
) as cnts
USING (e32)
GROUP BY e32;
If there is no need for grouping, the query will be slightly simpler.
P.S. test is my playground table of random values where e32 is the result of Python's int(random.expovariate(1.0) * 32), etc.
Well I can do it in two queries.
Run the first query to get the positions of the quartiles, and then use LIMIT to get the answers in the second query.
mysql> select floor(count(*)/4) as first_q, floor(count(*)/2) as mid_pos, floor(count(*)/4*3) as third_q from customer_data;
mysql> select min(measure),(select measure from customer_data order by measure limit 0,1) as firstq, (select measure from customer_data order by measure limit 5,1) as median, (select measure from customer_data order by measure limit 8,1) as last_q, max(measure) from customer_data;
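The two steps can be wired together so the offsets are not typed in by hand. This is a rough sketch, not a MySQL-only solution: it uses Python's sqlite3 module, the eight measure values are made up, the helper name five_number_summary is hypothetical, and the quartile positions follow the simple n/4, n/2, 3n/4 index convention used above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_data (measure INTEGER)")
conn.executemany(
    "INSERT INTO customer_data VALUES (?)",
    [(v,) for v in (7, 3, 9, 1, 5, 11, 2, 8)],  # made-up sample values
)


def five_number_summary(conn):
    # Step 1: row count plus smallest and largest observation in one pass.
    n, smallest, largest = conn.execute(
        "SELECT COUNT(*), MIN(measure), MAX(measure) FROM customer_data"
    ).fetchone()

    def value_at(offset):
        # Step 2: fetch the value at a given position in sorted order.
        return conn.execute(
            "SELECT measure FROM customer_data "
            "ORDER BY measure LIMIT 1 OFFSET ?",
            (offset,),
        ).fetchone()[0]

    return (
        smallest,
        value_at(n // 4),      # lower quartile
        value_at(n // 2),      # median
        value_at(n * 3 // 4),  # upper quartile
        largest,
    )


summary = five_number_summary(conn)
```

The helper assumes the table is not empty; with no rows, value_at has nothing to fetch.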

How to use the result of a subquery multiple times in a query

A MySQL query needs the results of a subquery in different places, like this:
SELECT COUNT(*),(SELECT hash FROM sets WHERE ID=1)
FROM sets
WHERE hash=(SELECT hash FROM sets WHERE ID=1)
and XD=2;
Is there a way to avoid the double execution of the subquery (SELECT hash FROM sets WHERE ID=1)?
The subquery always returns a valid hash value.
It is important that the result of the main query also includes the HASH.
First I tried a JOIN like this:
SELECT COUNT(*), m.hash FROM sets s INNER JOIN sets AS m
WHERE s.hash=m.hash AND id=1 AND xd=2;
If XD=2 doesn't match a row, the result is:
+----------+------+
| count(*) | HASH |
+----------+------+
| 0 | NULL |
+----------+------+
Instead of something like (what I need):
+----------+------+
| count(*) | HASH |
+----------+------+
| 0 | 8115e|
+----------+------+
Any ideas? Please let me know! Thank you in advance for any help.
//Edit:
In the end, the query only has to count all the entries in the table that have the same hash value as the entry with ID=1 and where XD=2. If no rows match (this happens if XD is set to another number), it should return 0 and simply the hash value.
SELECT SUM(xd = 2), hash
FROM sets
WHERE id = 1
If id is a PRIMARY KEY (which I assume it is, since you are using a single-record query against it), then you can just drop the SUM:
SELECT xd = 2 AS cnt, hash
FROM sets
WHERE id = 1
Update:
Sorry, got your task wrong.
Try this:
SELECT si.hash, COUNT(so.hash)
FROM sets si
LEFT JOIN
sets so
ON so.hash = si.hash
AND so.xd = 2
WHERE si.id = 1
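To see why the LEFT JOIN version keeps the hash even when nothing matches, here is a small sketch on made-up rows. It runs on Python's sqlite3 module instead of MySQL, and it adds a GROUP BY si.hash that is not in the query above, just so the selected hash is well defined in any engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sets (id INTEGER PRIMARY KEY, hash TEXT, xd INT)")
conn.executemany(
    "INSERT INTO sets VALUES (?, ?, ?)",
    [
        (1, "8115e", 1),
        (2, "8115e", 2),
        (3, "8115e", 2),
        (4, "abcde", 2),  # different hash, never counted
    ],
)


def count_for(xd):
    # The xd filter lives in the ON clause, so the si row with id = 1
    # survives even when no so row matches; COUNT(so.hash) is then 0.
    return conn.execute(
        """
        SELECT si.hash, COUNT(so.hash)
        FROM sets si
        LEFT JOIN sets so
          ON so.hash = si.hash AND so.xd = ?
        WHERE si.id = 1
        GROUP BY si.hash
        """,
        (xd,),
    ).fetchone()


matching = count_for(2)
no_match = count_for(7)
```

Moving the xd condition from ON into WHERE would turn this back into an inner join and lose the hash in the no-match case.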
I normally nest the statements like the following
SELECT Count(ResultA.Hash2) AS Hash2Count,
ResultA.Hash1
FROM (SELECT S.Hash AS Hash2,
(SELECT s2.hash
FROM sets AS s2
WHERE s2.ID = 1) AS Hash1
FROM sets AS S
WHERE S.XD = 2) AS ResultA
WHERE ResultA.Hash2 = ResultA.Hash1
GROUP BY ResultA.Hash1
(this one is hand typed and not tested but you should get the point)
Hash1 is your subquery; once it's nested, you can reference it by its alias in the outer query. It makes the query a little larger, but I don't see that as a big deal.
If I understand correctly what you are trying to get, query should look like this:
select count(case xd when 2 then 1 else null end), hash from sets where id = 1 group by hash
I agree with the other answers, that the GROUP BY may be better, but to answer the question as posed, here's how to eliminate the repetition:
SELECT COUNT(*), h.hash
FROM sets, (SELECT hash FROM sets WHERE ID=1) h
WHERE sets.hash=h.hash
and sets.XD=2;