I have a database that holds readings for devices. I am trying to write a query that can select the latest reading from a device. I have two queries that are seemingly the same and that I'd expect to give the same results; however they do not. The queries are as follows:
First query:
select max(datetime), reading
from READINGS
where device_id = '1234567890'
Second query:
select datetime, reading
from READINGS
where device_id = '1234567890' and datetime = (select max(datetime)
from READINGS
where device_id = '1234567890')
They give different results for the reading attribute. The second one gives the right result, but why does the first give something different?
This is MySQL behaviour at work. In standard SQL, when you use grouping, the columns you select must either appear in the GROUP BY or be wrapped in an aggregate function, e.g. min(), max(). Mixing aggregates and plain columns is not allowed in most other database flavours, but MySQL accepts it unless the ONLY_FULL_GROUP_BY SQL mode is enabled.
The first query will just return an arbitrary reading from the group (typically the first one the server happens to read, but that is not guaranteed), which is most likely wrong.
The second query ties the reading to the maximum timestamp, leading to the correct result.
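For a single device, another way to keep datetime and reading on the same row is to sort newest-first and take one row. The sketch below is only an illustration: it runs the query against an in-memory SQLite database as a stand-in for MySQL (the query itself is plain SQL that MySQL also accepts), and the sample rows are made up.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample data is invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE READINGS (device_id TEXT, datetime TEXT, reading REAL)")
conn.executemany(
    "INSERT INTO READINGS VALUES (?, ?, ?)",
    [
        ("1234567890", "2012-01-01 10:00:00", 20.5),
        ("1234567890", "2012-01-01 12:00:00", 23.0),  # latest for this device
        ("1234567890", "2012-01-01 11:00:00", 21.7),
        ("0000000001", "2012-01-01 13:00:00", 99.9),  # other device, filtered out
    ],
)

# Sort the device's rows newest-first and keep only the top one, so that
# datetime and reading are guaranteed to come from the same row.
latest = conn.execute(
    """
    SELECT datetime, reading
    FROM READINGS
    WHERE device_id = ?
    ORDER BY datetime DESC
    LIMIT 1
    """,
    ("1234567890",),
).fetchone()

print(latest)
```

This only covers one device at a time; for the latest reading of every device in one query, the subquery form from the question is still the one to build on.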
It is because you are not using a GROUP BY reading clause, which you should be using in both queries.
This is normal on MySQL. See the documentation on this:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
Also, read http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html
You can use the EXPLAIN and EXPLAIN EXTENDED commands to learn more about your queries.
Related
This is my first post here, since most of the time I had already found a suitable solution :)
However, this time nothing seems to help properly.
I'm trying to migrate information from a MySQL database I have read-only access to.
My problem is similar to this one: Group by doesn't give me the newest group
I also need to get the latest information out of some tables but my tables have >300k entries therefore checking whether the "time-attribute-value" is the same as in the subquery (like suggested in the first answer) would be too slow (once I did "... WHERE EXISTS ..." and the server hung up).
In addition to that, I can hardly ever find the important information (e.g. time) in a single attribute, and there is never a single primary key. Until now I did it as suggested in the second answer, by joining with a subquery that contains the latest "time-attribute-entry" and some primary keys, but that gets me into a huge mess once I use multiple joins and unions with the results.
Therefore I would prefer using the having statement like here: Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate for the "time-attribute", I noticed that these queries give me two different results (more = 39721, less = 37870):
SELECT COUNT(MATNR) AS MORE
FROM(
SELECT DISTINCT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG
FROM
FKT_LAB
) AS TEMP1
SELECT COUNT(MATNR) AS LESS
FROM(
SELECT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG,
LAB_PDATUM
FROM
FKT_LAB
GROUP BY
LAB_MTKNR,
LAB_STG,
LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
)AS TEMP2
Although both are applied to the same table and use "GROUP BY" / "SELECT DISTINCT" on the same entries.
Any ideas?
If nothing helps and I have to go back to my mess, I will use string variables as placeholders to tidy it up, but then I lose track of how many subqueries, joins and unions I have in one query... how many temporary tables will the server be able to cope with?
Your second query is not doing what you expect it to be doing. This is the query:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) TEMP2;
The problem is the HAVING clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATUM)). What MySQL does is choose an arbitrary value for the column and compare it to the max.
Often, the arbitrary value will not be the maximum value, so the rows get filtered. The reference you give (although an accepted answer) is incorrect. I have put a comment there.
If you want the most recent value, here is a relatively easy way:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
max(LAB_PDATUM) as maxLAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) TEMP2;
It does not, however, affect the outer count.
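To check that point on a small scale, here is a sketch that runs both forms against a few made-up rows. It uses an in-memory SQLite database as a stand-in for MySQL, and the sample values are assumptions for illustration only; the queries themselves are standard SQL.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FKT_LAB (LAB_MTKNR INTEGER, LAB_STG TEXT,"
    " LAB_STGNR INTEGER, LAB_PDATUM TEXT)"
)
conn.executemany(
    "INSERT INTO FKT_LAB VALUES (?, ?, ?, ?)",
    [
        (1, "INF", 10, "2012-01-10"),
        (1, "INF", 10, "2012-06-10"),  # newest row of group 1
        (2, "MAT", 20, "2012-03-01"),  # newest row of group 2
        (2, "MAT", 20, "2012-02-01"),
        (3, "PHY", 30, "2012-05-05"),  # only row of group 3
    ],
)

# Number of distinct (MTKNR, STG, STGNR) combinations.
distinct_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT LAB_MTKNR, LAB_STG, LAB_STGNR FROM FKT_LAB) AS TEMP1
    """
).fetchone()[0]

# One row per group, carrying the newest date of that group.
grouped = conn.execute(
    """
    SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
           MAX(LAB_PDATUM) AS maxLAB_PDATUM
    FROM FKT_LAB
    GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
    ORDER BY LAB_MTKNR
    """
).fetchall()

print(distinct_count, len(grouped))
print(grouped)
```

Because the date is aggregated rather than filtered in HAVING, no group is dropped, so both counts agree.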
To calculate the price of invoices (which have invoice items in a separate table, linked to the invoices), I had written this query:
SELECT `i`.`id`, SUM(ii.unit_price * ii.quantity) invoice_price
FROM (`invoice` i)
JOIN `invoiceitem` ii
ON `ii`.`invoice_id` = `i`.`id`
WHERE `i`.`user_id` = '$user_id'
But it returned only ONE row.
After some research, I found that I had to add GROUP BY i.id at the end of the query. With this, the results were as expected.
In my opinion, even without GROUP BY i.id, nothing is lost and it should work well!
Please tell me in some simple sentences...
Why should I always use the additional GROUP BY i.id? What is lost without it? And, maybe the most practical question: how can I remember that I have left out the GROUP BY?
You have to include the GROUP BY because there are many IDs that went into the sum. If you don't specify it, MySQL just picks one id (typically the first it encounters) and sums across the entire result set. GROUP BY tells MySQL to sum (or, more generally, aggregate) separately for each grouped entity.
Why should I always use GROUP BY?
SUM() and others are aggregate functions. They operate on groups of rows, and GROUP BY is what defines those groups.
What is lost without it?
From the documentation:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
In the end, there is nothing to remember, as these are GROUP BY aggregate functions. You will quickly tell from the result that you have forgotten GROUP BY when you get a single row aggregated over the entire result set instead of one row per group.
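The difference is easy to see on a toy data set. The following is a minimal sketch, assuming a cut-down version of the invoice and invoiceitem tables with made-up rows, run in an in-memory SQLite database as a stand-in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; table contents are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE invoiceitem (invoice_id INTEGER, unit_price INTEGER, quantity INTEGER);
    INSERT INTO invoice VALUES (1, 7), (2, 7), (3, 8);
    INSERT INTO invoiceitem VALUES (1, 10, 2), (1, 5, 1), (2, 3, 4), (3, 100, 1);
    """
)

# Without GROUP BY the whole joined result is one group: a single total.
ungrouped = conn.execute(
    """
    SELECT SUM(ii.unit_price * ii.quantity) AS invoice_price
    FROM invoice i
    JOIN invoiceitem ii ON ii.invoice_id = i.id
    WHERE i.user_id = ?
    """,
    (7,),
).fetchall()

# With GROUP BY i.id the sum is computed once per invoice.
grouped = conn.execute(
    """
    SELECT i.id, SUM(ii.unit_price * ii.quantity) AS invoice_price
    FROM invoice i
    JOIN invoiceitem ii ON ii.invoice_id = i.id
    WHERE i.user_id = ?
    GROUP BY i.id
    ORDER BY i.id
    """,
    (7,),
).fetchall()

print(ungrouped)
print(grouped)
```

The ungrouped total is simply the sum of the per-invoice prices, which is why the single row looks plausible at first glance.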
I have read a few posts on this, but I don't seem to be able to fix my problem.
I am calling two database queries to populate two arrays that run side by side, but they aren't matching, as the order in which the rows come out is different. I believe it has something to do with the GROUP BY, and this may require a subquery, but again I'm a little lost...
Query 1:
SELECT count(bids_bid.total_bid), bidtime_bid, users_usr.company_usr, users_usr.id_usr
FROM bids_bid
INNER JOIN users_usr
ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY user_bid
ORDER BY bidtime_bid ASC
Query 2:
SELECT auction_bid, user_bid, bidtime_bid, bids_bid.total_bid
FROM bids_bid
WHERE auction_bid = 36
ORDER BY bidtime_bid ASC
Even though the 'Order by' is the same the results aren't matching. The users are coming out in a different sequence.
I hope this makes sense, and thanks in advance.
* Update *
I just wanted to add a bit of clarity on the output I want. I need to show only one result per user (user_bid); the second query shows all users' rows. I only need the first query to show the first row entered for each user. So if I could order before the grouping, by min date, that would be ace...
It's to be expected. You're fetching fields that are NOT involved in the grouping and are not part of an aggregate function. MySQL allows such things, but generally the values of the ungrouped/unaggregated columns can be wonky.
Because MySQL is free to choose WHICH of the potentially multiple 'free' rows to use for the actual result row, you will get different results. Generally it picks the first-encountered 'free choice' row, but that's not defined/guaranteed.
You use grouping when you want unique rows in the result set according to some group id (column name). Grouping is usually used with aggregate functions such as min, max, count, sum.
Ordering or an inner query has nothing to do with which row represents a group. I suggest reading some introductory tutorials about grouping and treating SQL as a set-based language; most of set theory applies to SQL, and you'll be fine.
So I was complicating issues that I didn't need to. The solution I found is below.
SELECT users_usr.company_usr,
users_usr.id_usr,
min(bidtime_bid) as minbid
FROM bids_bid
INNER JOIN users_usr ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY id_usr
ORDER BY minbid ASC
Thanks everyone for making me look (try) harder...
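For anyone who wants to try the "first bid per user" idea on sample data, here is a small sketch. It assumes simplified bids_bid and users_usr tables with invented rows and uses an in-memory SQLite database as a stand-in for MySQL; every selected column is either grouped or aggregated, so the result does not depend on which row the server happens to pick.

```python
import sqlite3

# SQLite stands in for MySQL; users and bids are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users_usr (id_usr INTEGER PRIMARY KEY, company_usr TEXT);
    CREATE TABLE bids_bid (user_bid INTEGER, auction_bid INTEGER,
                           bidtime_bid TEXT, total_bid INTEGER);
    INSERT INTO users_usr VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cord');
    INSERT INTO bids_bid VALUES
        (2, 36, '2012-01-01 09:00:00', 100),
        (1, 36, '2012-01-01 09:05:00', 110),
        (2, 36, '2012-01-01 09:10:00', 120),
        (3, 36, '2012-01-01 09:20:00', 130),
        (1, 36, '2012-01-01 09:30:00', 140),
        (3, 35, '2012-01-01 08:00:00', 50);
    """
)

# One row per user: earliest bid time and number of bids in auction 36,
# ordered by when each user first bid.
first_bids = conn.execute(
    """
    SELECT u.company_usr, u.id_usr,
           MIN(b.bidtime_bid) AS minbid,
           COUNT(*) AS bids
    FROM bids_bid b
    INNER JOIN users_usr u ON b.user_bid = u.id_usr
    WHERE b.auction_bid = 36
    GROUP BY u.id_usr, u.company_usr
    ORDER BY minbid ASC
    """
).fetchall()

for row in first_bids:
    print(row)
```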
I have followed the tutorial over at tizag for the MAX() mysql function and have written the query below, which does exactly what I need. The only trouble is I need to JOIN it to two more tables so I can work with all the rows I need.
$query = "SELECT idproducts, MAX(date) FROM results GROUP BY idproducts ORDER BY MAX(date) DESC";
I have this query below, which has the JOIN I need and works:
$query = ("SELECT *
FROM operators
JOIN products
ON operators.idoperators = products.idoperator JOIN results
ON products.idProducts = results.idproducts
ORDER BY drawndate DESC
LIMIT 20");
Could someone show me how to merge the top query with the JOIN element from my second query? I am new to PHP and MySQL, this being my first adventure into a computer language. I have read a lot and tried really hard to get those two queries to work together, but I am at a brick wall. I cannot work out how to add the JOIN element to the first query :(
Could some kind person take pity on a newb and help me?
Try this query.
SELECT
*
FROM
operators
JOIN products
ON operators.idoperators = products.idoperator
JOIN
(
SELECT
idproducts,
MAX(date) AS max_date
FROM results
GROUP BY idproducts
) AS t
ON products.idproducts = t.idproducts
ORDER BY drawndate DESC
LIMIT 20
JOINs function somewhat independently of aggregate functions; they just change the intermediate result-set upon which the aggregate functions operate. I like to point to the way the MySQL documentation is written, which uses the term 'table_reference' in the SELECT syntax and expands on what that means in the JOIN syntax. Basically, any simple query which has a table specified can expand that table to a complete JOIN clause, and the query will operate the same basic way, just with a modified intermediate result-set.
I say "intermediate result-set" to hint at the mindset which helped me understand JOINs and aggregation. Understanding the order in which MySQL builds your final result is critical to knowing how to reliably get the results you want. Generally, it starts by looking at the first row of the first table you specify after 'FROM', and decides if it might match by looking at the 'WHERE' clauses. If the row is not immediately discardable, it attempts to JOIN that row to the first JOIN specified, and repeats the "will this be discarded by WHERE?" check. This repeats for all JOINs, which either add rows to your result set, remove them, or leave just the one, as appropriate for your JOINs, WHEREs and data. This process builds what I am referring to when I say "intermediate result-set". Somewhere between starting and finishing your complete query, MySQL has in its memory a potentially massive table-like structure of data which it built using the process I just described. Only then does it begin to aggregate (GROUP) the results according to your criteria.
So for your query, it depends on what specifically you are going for (not entirely clear in OP). If you simply want the MAX(date) from the second query, you can simply add that expression to the SELECT clause and then add an aggregation spec to the end:
SELECT *, MAX(date)
FROM operators
...
GROUP BY idproducts
ORDER BY ...
Alternatively, you can add the JOIN section of the second query to the first.
I have the following SQL query. It seems to run OK, but I am concerned that as my site grows it may not perform as expected. I would like some feedback on how effective and efficient this query really is:
select * from articles where category_id=XX AND city_id=XXX GROUP BY user_id ORDER BY created_date DESC LIMIT 10;
Basically, what I am trying to achieve is to get the newest articles by created_date, limited to 10. Articles must only be selected if the following criteria are met:
City ID must equal the given value
Category ID must equal the given value
Only one article per user must be returned
Articles must be sorted by date and only the top 10 latest articles must be returned
You've got a GROUP BY clause which only contains one column, but you are pulling all the columns there are without aggregating them. Do you realise that the values returned for the columns not specified in GROUP BY and not aggregated are not guaranteed?
You are also referencing such a column in the ORDER BY clause. Since the values of that column aren't guaranteed, you have no guarantee what rows are going to be returned with subsequent invocations of this script even in the absence of changes to the underlying table.
So, I would at least change the ORDER BY clause to something like this:
ORDER BY MAX(created_date)
or this:
ORDER BY MIN(created_date)
Some potential improvements (for best query performance):
Make sure you have an index on the columns you query. Note: check whether you really need an index on all of them, because every index has a performance cost when the DB has to build and maintain it. For more details take a look here: http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.html
SELECT * selects all columns of the table. SELECT only the ones you really require...
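Putting the points above together, one possible way to get "the newest article per user, top 10 overall" deterministically is to join the table to a derived table holding each user's newest created_date. This is only a sketch under assumptions: the id and title columns and the sample rows are made up, and it runs in an in-memory SQLite database as a stand-in for MySQL. If a user has two articles with exactly the same created_date, both would come back.

```python
import sqlite3

# SQLite stands in for MySQL; id/title columns and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER,
                           category_id INTEGER, city_id INTEGER,
                           created_date TEXT, title TEXT);
    INSERT INTO articles VALUES
        (1, 1, 5, 9, '2012-01-01', 'old by user 1'),
        (2, 1, 5, 9, '2012-03-01', 'new by user 1'),
        (3, 2, 5, 9, '2012-02-01', 'only by user 2'),
        (4, 3, 6, 9, '2012-04-01', 'other category'),
        (5, 2, 5, 8, '2012-05-01', 'other city');
    """
)

category_id, city_id = 5, 9

# The derived table finds each user's newest date within the category/city;
# the join then fetches the full article row that carries that date.
newest_per_user = conn.execute(
    """
    SELECT a.id, a.user_id, a.created_date, a.title
    FROM articles a
    JOIN (SELECT user_id, MAX(created_date) AS newest
          FROM articles
          WHERE category_id = ? AND city_id = ?
          GROUP BY user_id) n
      ON a.user_id = n.user_id AND a.created_date = n.newest
    WHERE a.category_id = ? AND a.city_id = ?
    ORDER BY a.created_date DESC
    LIMIT 10
    """,
    (category_id, city_id, category_id, city_id),
).fetchall()

for row in newest_per_user:
    print(row)
```

Only the needed columns are selected, and the columns used for filtering, grouping and ordering are the natural candidates to look at when deciding on indexes.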