What's wrong with this MySQL query?

I have the following SQL query. It seems to run OK, but I am concerned that as my site grows it may not perform as expected. I would like some feedback on how effective and efficient this query really is:
select * from articles where category_id=XX AND city_id=XXX GROUP BY user_id ORDER BY created_date DESC LIMIT 10;
Basically, what I am trying to achieve is to get the newest articles by created_date, limited to 10. Articles must only be selected if the following criteria are met:
City ID must equal the given value
Category ID must equal the given value
Only one article per user must be returned
Articles must be sorted by date and only the top 10 latest articles must be returned

You've got a GROUP BY clause that contains only one column, but you are selecting all the columns without aggregating them. Do you realise that the values returned for columns that are neither in the GROUP BY nor aggregated are not guaranteed?
You are also referencing such a column in the ORDER BY clause. Since the values of that column aren't guaranteed, you have no guarantee what rows are going to be returned with subsequent invocations of this script even in the absence of changes to the underlying table.
So, I would at least change the ORDER BY clause to something like this:
ORDER BY MAX(created_date)
or this:
ORDER BY MIN(created_date)
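Ordering by MAX(created_date) makes the sort well defined, but the other selected columns can still come from any row in each user's group. (MySQL 5.7.5 and later also enable ONLY_FULL_GROUP_BY by default, which rejects the original SELECT * ... GROUP BY user_id outright, assuming user_id is not the primary key.) A common way to get one whole article per user is the groupwise-maximum pattern: find each user's newest date in a derived table, then join back to fetch that row. The sketch below runs the idea in Python's built-in sqlite3 module as a stand-in for MySQL; the table layout and sample rows are made up, and the SQL should carry over to MySQL largely unchanged.

```python
import sqlite3

# Hypothetical schema and sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    city_id INTEGER NOT NULL,
    title TEXT NOT NULL,
    created_date TEXT NOT NULL
);
INSERT INTO articles (user_id, category_id, city_id, title, created_date) VALUES
    (1, 5, 9, 'u1 old',        '2024-01-01'),
    (1, 5, 9, 'u1 new',        '2024-03-01'),
    (2, 5, 9, 'u2 only',       '2024-02-01'),
    (3, 5, 8, 'u3 other city', '2024-04-01'),
    (3, 5, 9, 'u3 new',        '2024-01-15');
""")

# Derived table "latest": newest created_date per user within the filter.
# The outer query joins back to articles to fetch the full row for that date.
NEWEST_PER_USER = """
SELECT a.*
FROM articles AS a
JOIN (
    SELECT user_id, MAX(created_date) AS max_date
    FROM articles
    WHERE category_id = ? AND city_id = ?
    GROUP BY user_id
) AS latest
  ON  a.user_id = latest.user_id
  AND a.created_date = latest.max_date
WHERE a.category_id = ? AND a.city_id = ?
ORDER BY a.created_date DESC
LIMIT 10
"""

def newest_per_user(category_id, city_id):
    params = (category_id, city_id, category_id, city_id)
    rows = conn.execute(NEWEST_PER_USER, params).fetchall()
    return [row[4] for row in rows]  # column 4 is title

print(newest_per_user(5, 9))
```

If a user has two matching articles with exactly the same newest created_date, both are returned; a tie-breaker such as the highest id would be needed to guarantee exactly one row per user.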

Some potential improvements (for best query performance):
Make sure you have an index on the columns you query. Note: check whether you really need an index on every column, because each index has a cost: the DB has to build it and keep it up to date as rows are inserted, updated and deleted. For more details, take a look here: http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.html
SELECT * selects all columns of the table. SELECT only the ones you really require.
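For this particular query, one composite index covering both equality filters plus the sort column is usually worth more than separate single-column indexes. The sketch below creates such an index and prints the query plan, using Python's sqlite3 as a stand-in (in MySQL you would inspect the plan with EXPLAIN instead). The index name idx_cat_city_date is made up, and the sketch leaves out the GROUP BY user_id, which changes what the optimizer can do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    user_id INTEGER, category_id INTEGER, city_id INTEGER,
    created_date TEXT
)""")
# Equality columns first, then the ORDER BY column, so the index can both
# locate the matching rows and hand them back already sorted by date.
conn.execute(
    "CREATE INDEX idx_cat_city_date ON articles (category_id, city_id, created_date)"
)

def plan(sql, params=()):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the text detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

details = plan(
    "SELECT id, user_id, created_date FROM articles "
    "WHERE category_id = ? AND city_id = ? "
    "ORDER BY created_date DESC LIMIT 10",
    (5, 9),
)
for line in details:
    print(line)
```

If the index also serves the ORDER BY, the plan should not need a separate sorting step (SQLite reports one as a "TEMP B-TREE" line).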

Related

How to get reliable results from first() and last()?

EDIT: See edit below for explanation of why min() and max() are NOT adequate.
=========================
The MS documentation on the functions first() and last() says “Because records are usually returned in no particular order (unless the query includes an ORDER BY clause), the records returned by these functions will be arbitrary.”
Obviously, that makes these functions pretty useless for their intended purpose unless the query includes an ORDER BY. But including that in the query is not a straightforward thing to do because these are "aggregate" functions, so a query that SELECTs on them cannot ORDER BY any other field that is not also submitted to an aggregate function.
I have found that a query based on a single table generally returns results in the order of that table’s primary key. But apparently, that cannot be relied on to always be true and may fail under certain circumstances. There's an excellent discussion of this issue in an article, DFirst/DLast and the Myth of the Sorted Result Set.
That article offers two solutions to this problem:
Option one: you first use the DMin/DMax function to retrieve the value from the “sortable” column ... and use this as an additional criterion in your query to retrieve the target record.
Second option: you first create a query containing just the primary key and the max value of the sortable column (e.g. CustomerId and the maximum order date). Then you create a second query on the orders and join the first query in there on these two fields. The result will be all columns from the orders table, but only for the most recent order of each customer.
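Both options can be sketched in plain SQL. Below, option one uses a correlated subquery in place of DMax, and option two joins the orders table to an aggregate query on customer and max date. It runs in Python's sqlite3 as a stand-in for Access; the orders table, its columns and the rows are invented for illustration, and Access SQL may need small syntax adjustments.

```python
import sqlite3

# Hypothetical orders table and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date TEXT NOT NULL,
    location TEXT NOT NULL
);
INSERT INTO orders (customer_id, order_date, location) VALUES
    (1, '2024-01-05', 'Oslo'),
    (1, '2024-03-10', 'Bergen'),
    (2, '2024-02-20', 'Paris'),
    (2, '2024-02-01', 'Lyon');
""")

# Option one: look up the max date for the current customer (the role DMax
# plays in Access) and use it as an extra criterion.
OPTION_ONE = """
SELECT o.customer_id, o.order_date, o.location
FROM orders AS o
WHERE o.order_date = (SELECT MAX(o2.order_date)
                      FROM orders AS o2
                      WHERE o2.customer_id = o.customer_id)
ORDER BY o.customer_id
"""

# Option two: the first query holds customer key + max date; the second
# query joins the orders table to it on both fields.
OPTION_TWO = """
SELECT o.customer_id, o.order_date, o.location
FROM orders AS o
JOIN (SELECT customer_id, MAX(order_date) AS last_date
      FROM orders
      GROUP BY customer_id) AS last_orders
  ON  o.customer_id = last_orders.customer_id
  AND o.order_date  = last_orders.last_date
ORDER BY o.customer_id
"""

one = conn.execute(OPTION_ONE).fetchall()
two = conn.execute(OPTION_TWO).fetchall()
print(one)
print(two)
```

Both options return every order that ties on a customer's max date, so a tie-breaking column is still needed if exactly one row per customer is required.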
Those instructions are pretty complicated, so I'd need to see them implemented in code before I'd trust myself to use them.
This issue has got to be very common because a lot of businesses need to know the first or last order by a customer that meets some condition. But when I Google "Access query first last "order by"", there are several results that explain the problem, including on StackOverflow, but none that lay out a solution with sample SQL code.
What is the right way to do this, including sample code of doing it?
=========================
Edit:
Many sources online, as well as the comment below by Gustav and the proposed answer by Albert D. Kallal, say you can just use min() and max() instead of first() and last(). Obviously, that's okay if what you want is the value of a field in the record in which that field has the smallest or largest value. That's a trivial problem. What I'm talking about is how to get the value of a field in the record in which some other field has the smallest or largest value.
For example, in the answer by Albert D. Kallal, he wants the first and last tour for each customer, so he can just use min() and max() on the dates of the tours. But what if I want to know the location of the first tour for each customer? Obviously, I can't use min(location). If first() would work in a sensible way and if table [Tours] has the primary key [Date], I should be able to use something like:
(SELECT first(location) from [Tours] where [Customer] = ID_Customer)
I am using code like that and it usually gives me the right answer, but not always. So that is what I need to fix. I understand that I may need to use min() instead of first(). But how do I use min() for this since, as I said, I obviously can't just use min(location)?
I never really grasped what first() and last() do in Access.
As you note, it's rather common to want, say, the last invoice or whatever.
So, say we have a table of Tours. I want the first tour date, and the last tour date.
Well, this query works:
SELECT MAX(FromDate) as LastTourDate, min(FromDate) as FirstTourDate
FROM tblTours
WHERE FromDate is not null
Running the above returns a single row containing both dates.
So, that gets you the min and the max, in one query.
No real need for an ORDER BY.
However, often there is more than one table involved.
So, rather than JUST the overall first and last tour dates, I probably want a list of customers with the first tour and, say, the last tour each one took. But then again, that's a different question.
But again, you can order your main table ANY way you want and still pluck out (pull) the min and max.
So, you can do it this way:
Say, tblMainClient (people, customers, whatever).
Say, tblMyTours: a list of tours they took (child table).
So, the query can look like this:
SELECT tblMainClient.FirstName, tblMainClient.LastName,
(SELECT Min(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS FirstTourDate,
(SELECT MAX(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS LastTourDate
FROM tblMainClient
So, the main query is still tblMainClient: I can order, filter and sort by any column in that main table, but we used two sub-queries to get the first tour date and the last tour date.
So, typically, we can use a sub-query to pull the max (or min) value, but restrict the sub-query to the current row of our parent/main table.
Edit: get the last record, but SOME OTHER column
OK, so say in our simple example we want the last tour, but NOT its date; we want some other column, like the last tour's name.
OK, so we just modify the sub-query to return ONLY the last record, but a different column.
And since dates can repeat (say, 2 invoices on the same day, or two tours starting on the SAME date), we need to ensure that ONLY one record is returned. We do this by using TOP 1, but ALSO add a tie-breaking column to the ORDER BY to be 100% sure that ONLY ONE top record is returned. (In Access, TOP 1 returns every row that ties on the ORDER BY values, so without the tie-breaker you can get more than one.)
So, our query to get the last tour name, based on the most recent tour date?
We can do this:
SELECT FirstName, LastName,
(SELECT TOP 1 TourName FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
AS LastTour
FROM tblMainClient
And that gives us the tour name, but of the last tour.
So, you are certainly not limited to using max() in that sub-query.
However, what happens if we want the Tour Name, Hotel Name, and City of that tour?
In other words, it's certainly reasonable that we may want multiple columns.
There are more ways to do this than flavors of ice cream.
However, I like using the query builder for the first part.
What I do is use the standard query builder, do a join to the child table, and simply select all the columns I need.
So, for the above tblMainClient and their tours from tblMyTours,
I build the join in the query builder.
Note how I added the columns TourName, FromDate, HotelName and City from that child table (tblMyTours).
Now, of course, the above will return 10 rows for anyone who has gone on 10 trips.
So, what we do is add a WHERE clause on the child table: get the LAST PK "id" from tblMyTours, and restrict that child table to that ONE row.
So, the above query builder gives us this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName, tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON
tblMainClient.ID = tblMyTours.Main_id;
(But I did not have to write the above myself.)
So, we add a WHERE clause on that child table join, with a sub-query that returns the CHILD table "id" in place of TourName or TourDate.
So above becomes this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName,
tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON tblMainClient.ID = tblMyTours.Main_id
WHERE tblMyTours.ID =
(SELECT TOP 1 ID FROM tblMyTours
WHERE tblMyTours.Main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
Now, the above is a bit advanced, but we OFTEN want SEVERAL columns. And at least the first part of the query (the two tables and the join) was done using the query builder; I did not have to type that part in.
So, if you want JUST one column, different from the max() criteria column, use TOP 1 with an ORDER BY. Do keep in mind that ONLY ONE RECORD can EVER be returned by that sub-query: if more than one record is returned, the query engine will fail and you get a message to that effect.
So, say you want the last product bought, by invoice date? A customer could buy the same product 2 times, or 2 invoices might fall on the same day. By adding the 2nd ORDER BY column (ID DESC), that TOP 1 will ONLY ever return one row.
So, which of the above two?
Well, if you want just one column from the child table, that's easy. But if you want multiple columns? Then you could probably write up a "messy" solution, but I prefer to just fire up the query builder, join in the child table, and click on the several child values I want. Get the query working, and hey, up to this point it's all 100% GUI.
Then we toss in the EXTRA criteria to restrict that child table row to the ONE last row, be it simple last one based on ID DESC, or say TourDate, or whatever.
And now we get one row per client who has at least one tour, with the details of their most recent tour.
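Here is the same multi-column query as a runnable sketch, using Python's sqlite3 as a stand-in for Access. SQLite has no TOP, so TOP 1 becomes LIMIT 1; the table and column names follow the answer, while the sample rows are invented. Client 1 has two tours on the same date, so the ID DESC tie-breaker decides which one counts as the last, and client 3 has no tours, so the INNER JOIN drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table/column names from the answer; the rows are made up.
conn.executescript("""
CREATE TABLE tblMainClient (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE tblMyTours (
    ID INTEGER PRIMARY KEY, Main_id INTEGER, TourName TEXT,
    FromDate TEXT, HotelName TEXT, City TEXT
);
INSERT INTO tblMainClient VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim'), (3, 'Cy', 'Ng');
INSERT INTO tblMyTours VALUES
    (1, 1, 'Fjords',  '2023-06-01', 'Fjord Inn',  'Bergen'),
    (2, 1, 'Alps',    '2024-02-01', 'Alpenhof',   'Zermatt'),
    (3, 1, 'Alps II', '2024-02-01', 'Bergblick',  'Zermatt'),
    (4, 2, 'Rome',    '2022-09-15', 'Hotel Roma', 'Rome');
""")

# Join parent to child, then keep only the child row whose ID the correlated
# sub-query picks: latest FromDate, highest ID on a date tie.
LAST_TOUR = """
SELECT c.ID, c.FirstName, t.TourName, t.FromDate, t.HotelName, t.City
FROM tblMainClient AS c
INNER JOIN tblMyTours AS t ON c.ID = t.Main_id
WHERE t.ID = (SELECT t2.ID FROM tblMyTours AS t2
              WHERE t2.Main_id = c.ID
              ORDER BY t2.FromDate DESC, t2.ID DESC
              LIMIT 1)
ORDER BY c.ID
"""
rows = conn.execute(LAST_TOUR).fetchall()
for r in rows:
    print(r)
```

In SQLite, LIMIT 1 never returns more than one row even without the tie-breaker, but the tie-breaker still makes the choice deterministic; in Access it is also what keeps TOP 1 from returning the tied rows.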

Is there any way to fetch the last N rows from a MySQL table without using auto-increment field or timestamp?

There are many solutions on Stack Overflow itself where the objective was to read the last n rows of a table using either an auto-increment field or a timestamp. For example, the following query fetches the last ten records from a table named tab, in descending order of the field named id, which is an auto-increment field in the table:
Select * from tab order by id desc limit 10
My question is: Is there any alternative way without having to get an auto increment field or timestamp to accomplish the task to get the same output?
The motivation for this question is that when we store records in a table and then query the database with a simple query that specifies no criteria, like:
Select * from tab
the order of the output is the same as the order in which the records were inserted. So is there any way to get the records in the reverse of the order they were entered into the database?
Data in MySQL is not ordered: you have no guarantee on the order of the records you'll get unless you specify ORDER BY in your query.
So no, unless you order by a timestamp, an id, or some other field, you can't get the last rows, simply because there's no 'last' without an order.
In the SQL world, order is not an inherent property of a set of data. Thus, you get no guarantees from your RDBMS that your data will come back in a certain order -- or even in a consistent order -- unless you query your data with an ORDER BY clause.
So if your data isn't sorted by some id or other column, you cannot track it by its order. There is no guarantee how MySQL will store the data, and hence you cannot get the last n records.
You can also check this article:
Caveats
Ordering of Rows
In the absence of ORDER BY, records may be returned in a different order than the previous MEMORY implementation.
This is not a bug. Any application relying on a specific order without an ORDER BY clause may deliver unexpected results. A specific order without ORDER BY is a side effect of a storage engine and query optimizer implementation which may and will change between minor MySQL releases.

Which row's fields are returned when Grouping with MySQL?

I have a MySQL table with the fields id and string. ids are unique. strings are varchars and are non-unique.
I perform the following query:
SELECT id, string, COUNT( * ) AS frequency
FROM table
GROUP BY string
ORDER BY frequency DESC, id ASC
Questions
Assume the table contains three rows with identical string values, and ids 1, 2, and 3.
Which id is going to be returned ( 1, 2, or 3 )?
Which id is this query going to ORDER BY ( Same as is returned? ... see question 1 )?
Can you control which id is returned / used for ordering? eg. Return the largest id, or the first id from a GROUP.
What I'm ultimately trying to do is get a frequency occurrence for identical strings, order by that frequency, highest to lowest, and on a frequency tie, order by id with the smallest id from the group returned / ordered by. I made the situation more generic to figure out how MySQL handles this situation.
Which id is going to be returned ( 1, 2, or 3 )?
A: For all the records that have the same string, the server will choose whichever id it wants (most likely the fastest to fetch, which is unpredictable). To cite the official documentation:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Much more information in this link.
Which id is this query going to ORDER BY ( Same as is returned? ... see question 1 )?
It makes no sense to find out in what order the data retrieved will be returned as you can't predict the result you are going to get. However, it is very likely that you get the result sorted by the unpredictable ID column.
Can you control which id is returned / used for ordering? eg. Return the largest id, or the first id from a GROUP.
You should assume at this point that you can't. Read the documentation again.
Making things even clearer: you can't predict the result of an improperly used GROUP BY clause. The main issue with MySQL is that it allows you to use it in a non-standard way, but you need to know how to make use of that feature. The main point behind it is to group by fields that you know will always be the same. E.g.:
SELECT id, name, COUNT( * ) AS frequency
FROM table
GROUP BY id
Here, you know name will be the same within each group because id functionally determines name, so you know the result is valid. If you also grouped by name, the query would be more standard but would perform slightly worse in MySQL.
As a final note, in my experience the values returned for the selected, non-grouped fields in those non-standard queries are usually the ones you would get by applying a GROUP BY and then an ORDER BY on that field. That is why it so often seems to work. However, if you keep testing, you will eventually find that this happens only 95% of the time, and you cannot rely on that number.
The documentation says that when not grouping by all non-aggregate columns, one row is returned for each unique combination of the grouped-by columns. The row selected is up to the server, i.e., "random".
However, in practice it is the first row encountered during processing. You can control which is encountered first by selecting from an inner query that is ordered in the order of preference of return.
For example to get the lowest id for each name (yes, undocumented, blah blah, but it works!):
SELECT id, name, COUNT( * ) AS frequency
FROM (select * from table order by id) x
GROUP BY name
ORDER BY frequency DESC, id ASC
I personally am comfortable relying on this behaviour and have never seen or heard of it behaving differently in real life. Many shun this as undocumented and "risky", but if it works, it works.
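If the goal is the one stated in the question (frequency per string, smallest id on ties), the id can be made deterministic by aggregating it with MIN(id) instead of relying on which row the server happens to pick. This avoids the derived-table ordering trick, which MySQL's documentation on derived-table optimization says may be ignored when the outer query is grouped. The sketch uses Python's sqlite3 as a stand-in; the table is named t because TABLE is a reserved word in MySQL, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "t" with the question's id/string columns.
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, string TEXT NOT NULL);
INSERT INTO t (id, string) VALUES
    (1, 'a'), (2, 'a'), (3, 'a'),
    (4, 'b'), (5, 'b'),
    (6, 'c'), (7, 'c'),
    (8, 'd');
""")

# MIN(id) is an aggregate, so each group's id is well defined: the smallest
# id with that string. A distinct alias avoids any ambiguity with t.id.
FREQ = """
SELECT MIN(id) AS first_id, string, COUNT(*) AS frequency
FROM t
GROUP BY string
ORDER BY frequency DESC, first_id ASC
"""
rows = conn.execute(FREQ).fetchall()
for r in rows:
    print(r)
```

Swapping MIN for MAX returns the largest id per group instead, which also covers the "return the largest id" case from the question.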

optimizing a complex query in mysql

I have two questions here, but I am asking them together as I think they are inter-related.
I am working with a complex query (multiple joins + subqueries), and the table is pretty huge as well (around 200,000 records).
Part of this query (a LEFT JOIN) is required to find the record that has the second-lowest value in a certain column among all the records associated with the primary key of the first table. For now I have isolated this part and am thinking along the lines of:
SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 1,1;
But if there is only 1 record in the table, it must return that record instead of NULL. So my first question is: how do I write a query for this?
Secondly, considering the size of the table and the time it's already taking to run even after creating indexes, I expect that adding any more complexity to achieve the above might increase the query time dramatically.
I cannot decompose joins because I need to get some of the columns for the ORDER BY clause (the application has an option to sort the result by these columns, the above column "myvalue" being one of them)
What would be the way(s) to approach this problem ?
Thanks
Something like this might work
COALESCE(
(SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 1,1),
(SELECT id FROM tbl ORDER BY `myvalue` ASC LIMIT 0,1))
COALESCE returns the first non-NULL value from the list provided.
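The question asks for this per row of the first table, so a correlated version of the COALESCE idea may be closer to what is needed. The sketch below picks, for each parent, the child with the second-lowest myvalue, falls back to the lowest when there is only one, and yields NULL when there are none. The parent/tbl schema and the parent_id column are hypothetical, id is added as a tie-breaker, and SQLite stands in for MySQL (MySQL's LIMIT 1,1 means the same as LIMIT 1 OFFSET 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical parent/child schema with made-up rows.
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE tbl (id INTEGER PRIMARY KEY, parent_id INTEGER, myvalue INTEGER);
INSERT INTO parent (id) VALUES (1), (2), (3);
INSERT INTO tbl (id, parent_id, myvalue) VALUES
    (10, 1, 30), (11, 1, 10), (12, 1, 20),
    (20, 2, 99);
""")

# First scalar sub-query: second row in myvalue order (NULL if it doesn't exist).
# Second scalar sub-query: first row, used as the fallback.
SECOND_OR_ONLY = """
SELECT p.id,
       COALESCE(
         (SELECT t.id FROM tbl AS t WHERE t.parent_id = p.id
          ORDER BY t.myvalue ASC, t.id ASC LIMIT 1 OFFSET 1),
         (SELECT t.id FROM tbl AS t WHERE t.parent_id = p.id
          ORDER BY t.myvalue ASC, t.id ASC LIMIT 1)
       ) AS picked_id
FROM parent AS p
ORDER BY p.id
"""
rows = conn.execute(SECOND_OR_ONLY).fetchall()
print(rows)
```

An index on (parent_id, myvalue) would let each sub-query read its rows already in order, which matters at the table size mentioned in the question.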
As for the complexity of the query, post the whole thing so we can take a look at it.

avoid Sorting by the MYSQL IN Keyword

When querying the DB for a set of ids, MySQL does not return the results in the order in which the ids were specified. The query I am using is the following:
SELECT id ,title, date FROM Table WHERE id in (7,1,5,9,3)
In return, the results come back in the order 1,3,5,7,9.
How can I avoid this auto-sorting?
If you want to order your result by id in the order specified in the in clause you can make use of FIND_IN_SET as:
SELECT id ,title, date
FROM Table
WHERE id in (7,1,5,9,3)
ORDER BY FIND_IN_SET(id,'7,1,5,9,3')
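FIND_IN_SET(str, strlist) returns the 1-based position of str in a comma-separated list, or 0 if it is not there, so ordering by it reproduces the order of the list. SQLite has no FIND_IN_SET, so the sketch registers a rough Python emulation of it (an assumption, not MySQL's implementation) and then runs the answer's query against a made-up mytable.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Rough emulation of MySQL's FIND_IN_SET (assumed behaviour):
    1-based position of needle in a comma-separated list, 0 if absent,
    None (NULL) if either argument is NULL."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",")
    needle = str(needle)  # the integer id is compared as a string
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, title) VALUES (?, ?)",
    [(i, f"title {i}") for i in range(1, 10)],
)

rows = conn.execute("""
    SELECT id, title
    FROM mytable
    WHERE id IN (7, 1, 5, 9, 3)
    ORDER BY FIND_IN_SET(id, '7,1,5,9,3')
""").fetchall()
print([r[0] for r in rows])
```

Every row that passes the IN filter is in the list, so none gets position 0 here; without the IN filter, unmatched rows would sort first.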
There is no auto-sorting or default sorting going on. The sorting you're seeing is most likely the natural sorting of rows within the table, i.e., the order they were inserted. If you want the results sorted in some other way, specify it using an ORDER BY clause. There is no way in SQL to specify that a sort order should follow the ordering of items in an IN clause.
The WHERE clause in SQL does not affect the sort order; the ORDER BY clause does that.
If you don't specify a sort order using ORDER BY, SQL will pick its own order, which will typically be the order of the primary key, but could be anything.
If you want the records in a particular order, you need to specify an ORDER BY clause that tells SQL the order you want.
If the order you want is based solely on that odd sequence of IDs, then you'd need to specify that in the ORDER BY clause. It will be tricky to specify exactly that. It is possible, but will need some awkward SQL code, and will slow down the query significantly (due to it no longer using a key to find the records).
If your desired ID sequence is because of some other factor that is more predictable (say for example, you actually want the records in alphabetical name order), you can just do ORDER BY name (or whatever the field is).
If you really want to sort by the ID in an arbitrary sequence, you may need to generate a temporary field which you can use to sort by:
SELECT *,
CASE id
WHEN 7 THEN 1
WHEN 1 THEN 2
WHEN 5 THEN 3
WHEN 9 THEN 4
WHEN 3 THEN 5
END AS mysortorder
FROM mytable
WHERE id in (7,1,5,9,3)
ORDER BY mysortorder;
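Writing the CASE by hand gets tedious whenever the list changes. A sketch of generating it from a list, with the ids bound as parameters rather than pasted into the SQL; the function name fetch_in_given_order and the mytable/title schema are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

def fetch_in_given_order(conn, ids):
    """Hypothetical helper: rows of mytable whose id is in ids, in the order of ids."""
    if not ids:
        return []  # an empty IN () list is a syntax error in MySQL, so skip the query
    in_list = ", ".join("?" for _ in ids)
    # One "WHEN ? THEN <position>" branch per id, mirroring the hand-written CASE.
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(ids)))
    sql = (
        f"SELECT id, title FROM mytable WHERE id IN ({in_list}) "
        f"ORDER BY CASE id {whens} END"
    )
    # Placeholders appear in the IN list first, then in the CASE, so bind ids twice.
    return conn.execute(sql, [*ids, *ids]).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, title) VALUES (?, ?)",
    [(i, f"title {i}") for i in range(1, 10)],
)
print(fetch_in_given_order(conn, [7, 1, 5, 9, 3]))
```

Only the positions (generated integers) are formatted into the SQL text; the ids themselves always travel as bound parameters.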
The behaviour you are seeing is a result of query optimisation. I expect that you have an index on id, so the IN will use the index to return records in the most efficient way. As no ORDER BY has been specified, the database assumes that the order of the returned records is not important and optimises for speed. (Check out EXPLAIN SELECT.)
CodeAddicts or Spudley's answer will give the result you want. An alternative is to assign a priority to the ids in "mytable" (or another table) and use it to order the records as desired.