MySQL WHERE, LIMIT and pagination - mysql

I have tables: documents, languages and document_languages. Documents exist in one or more languages and this relationship is mapped in document_languages.
Imagine now I want to display the documents and all of its languages on a page, and paginate my result set to show 10 records on each page. There will be a WHERE statement, specifying which languages should be retrieved (ex: en, fr, it).
Even though I only want to display 10 documents on the page (LIMIT 10), I have to return more than 10 records if a document has more than one language (which most do).
How can you combine the WHERE statement with the LIMIT in a single query to get the records I need?

Use a subquery to page the documents records first, then join the languages on:
select *
from (select * from documents limit 0, 10) as doc
join document_languages dl on doc.docid = dl.docid
join languages lan on lan.langid = dl.langid;
Check the subquery aliased as doc; add an ORDER BY inside it if the page order matters. More on derived tables and subqueries:
http://dev.mysql.com/doc/refman/5.0/en/from-clause-subqueries.html
http://dev.mysql.com/doc/refman/5.0/en/subqueries.html
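A quick way to sanity-check this pattern outside MySQL is SQLite's in-memory engine (a stand-in here; the table and column names are the ones from the question, the sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE documents (docid INTEGER, title TEXT);
CREATE TABLE languages (langid TEXT, name TEXT);
CREATE TABLE document_languages (docid INTEGER, langid TEXT);
""")
# 25 documents, each available in English and French
cur.executemany("INSERT INTO documents VALUES (?, ?)",
                [(i, f"doc {i}") for i in range(1, 26)])
cur.executemany("INSERT INTO languages VALUES (?, ?)",
                [("en", "English"), ("fr", "French")])
cur.executemany("INSERT INTO document_languages VALUES (?, ?)",
                [(i, lang) for i in range(1, 26) for lang in ("en", "fr")])

# Page the *documents* in a derived table, then join the languages on,
# so LIMIT counts documents rather than document/language rows.
rows = cur.execute("""
SELECT doc.docid, dl.langid
FROM (SELECT * FROM documents ORDER BY docid LIMIT 10) AS doc
JOIN document_languages dl ON doc.docid = dl.docid
JOIN languages lan ON lan.langid = dl.langid
""").fetchall()

print(len(rows))                          # 10 documents x 2 languages = 20 rows
print(len({docid for docid, _ in rows}))  # but only 10 distinct documents
```

The outer query returns more than 10 rows, but they cover exactly 10 documents, which is what the question asks for.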

You can add a counter to each row that tracks how many unique documents have been returned so far, and cut off after 10. You specify which document_id to start from, and the query returns the next 10 documents with all of their language rows.
SELECT document_id,
@docNum := IF(@storedDocumentId <> document_id, @docNum + 1, @docNum) AS doc_num,
@storedDocumentId := document_id
FROM document
JOIN document_languages ON document.id = document_languages.document_id
JOIN (SELECT @docNum := 0, @storedDocumentId := 0) AS document_count
WHERE @docNum < 10
AND document_id >= 1234
ORDER BY document_id;
(Note that MySQL user variables are written with @; a leading # starts a comment.)
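On MySQL 8.0+ (or any engine with window functions) the same "number the documents, keep the first 10" idea is cleaner with DENSE_RANK(), with no user variables at all. A sketch against SQLite 3.25+ as a stand-in, using made-up data and the question's table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE document (id INTEGER);
CREATE TABLE document_languages (document_id INTEGER, langid TEXT);
""")
cur.executemany("INSERT INTO document VALUES (?)", [(i,) for i in range(1, 31)])
cur.executemany("INSERT INTO document_languages VALUES (?, ?)",
                [(i, lang) for i in range(1, 31) for lang in ("en", "fr", "it")])

# DENSE_RANK numbers the documents 1, 2, 3, ... even though each one
# spans several rows; filtering on it keeps whole documents.
rows = cur.execute("""
SELECT document_id, langid FROM (
    SELECT dl.document_id, dl.langid,
           DENSE_RANK() OVER (ORDER BY dl.document_id) AS doc_num
    FROM document d
    JOIN document_languages dl ON d.id = dl.document_id
    WHERE dl.document_id >= 5
) WHERE doc_num <= 10
""").fetchall()

print(len({d for d, _ in rows}))  # 10 documents (ids 5..14), 3 language rows each
```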

I created these tables:
create table documents (iddocument int, name varchar(30));
create table languages (idlang char(2), lang_name varchar(30));
create table document_languages (iddocument int, idlang char(2));
Write a basic query with the GROUP_CONCAT function to collapse each document's languages into one comma-separated value:
select d.iddocument, group_concat(dl.idlang)
from documents d, document_languages dl
where d.iddocument = dl.iddocument
group by d.iddocument;
And finally cap the number of documents with the LIMIT option:
select d.iddocument, group_concat(dl.idlang)
from documents d, document_languages dl
where d.iddocument = dl.iddocument
group by d.iddocument limit 10;
You can find more info about GROUP_CONCAT here: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
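To see the whole pattern work end to end, here is a runnable sketch (SQLite's group_concat behaves like MySQL's GROUP_CONCAT for this purpose; data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE documents (iddocument INTEGER, name TEXT);
CREATE TABLE document_languages (iddocument INTEGER, idlang TEXT);
""")
cur.executemany("INSERT INTO documents VALUES (?, ?)",
                [(i, f"doc {i}") for i in range(1, 16)])
cur.executemany("INSERT INTO document_languages VALUES (?, ?)",
                [(i, lang) for i in range(1, 16) for lang in ("en", "fr", "it")])

# One row per document, languages collapsed to a comma-separated list,
# so LIMIT 10 now really means "10 documents".
rows = cur.execute("""
SELECT d.iddocument, GROUP_CONCAT(dl.idlang)
FROM documents d
JOIN document_languages dl ON d.iddocument = dl.iddocument
GROUP BY d.iddocument
LIMIT 10
""").fetchall()

print(len(rows))  # 10 rows, one per document, e.g. (1, 'en,fr,it')
```

Note that the order of languages inside each concatenated value is not guaranteed unless you use GROUP_CONCAT's ORDER BY clause.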

Hmmmm... if you post your query (SQL statement), it might be easier to spot the error. Your outermost LIMIT clause should do the trick. As Rakesh said, you can use subqueries. However, depending on your data, you may just want simple JOINs (e.g. where a.id = b.id...).
This should be fairly straightforward in MySQL. In the unlikely case that you're doing something "fancy," you can always pull the result sets into variables to be processed by an external language (e.g., Python). If you're literally just trying to limit screen output in an interactive session, check out the "pager" command (I like "pager less;").
Lastly, look into the UNION statement. I hope something here is useful. Good luck!

Related

Single query VS Multiple query : execution time and resource usage

I'd like to know the downsides of using an "IN" restriction with a lot of values in it.
SELECT count(*), mail
FROM event, contacts
WHERE event.contactid = contacts.id
AND event_type = 1
AND mail IN (#1, #2, #3, etc.)
GROUP BY mail;
Also, do you think it would be better to split these queries into multiple ones that are executed in parallel ? What would be the consequences in terms of resource usage and execution time (for example) compared to the first solution ?
Thanks in advance
Never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
Qualify your column names and use table aliases:
SELECT count(*), c.mail -- I am guessing `mail` comes from `contacts`
FROM event e JOIN
contacts c
ON e.contactid = c.id
WHERE e.event_type = 1 AND
c.mail IN (#1, #2, #3, etc.)
GROUP BY c.mail;
There is no downside to having a large IN list (well, within reason -- at some point you may hit query length limits). In fact, MySQL has a nice optimization when constants are used for the IN list. It sorts the list and does a binary search on the values.
That said, if the list is coming from another table/query, then you should not put them in as constants. Instead, you should incorporate the table/query into this query.
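To illustrate that last point: when the addresses already live in a table, join to it rather than interpolating constants. A sketch (SQLite in-memory as a stand-in; target_mails is a made-up table standing in for wherever the list comes from):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE contacts (id INTEGER, mail TEXT);
CREATE TABLE event (contactid INTEGER, event_type INTEGER);
CREATE TABLE target_mails (mail TEXT);
""")
cur.executemany("INSERT INTO contacts VALUES (?, ?)",
                [(1, "a@x.com"), (2, "b@x.com"), (3, "c@x.com")])
cur.executemany("INSERT INTO event VALUES (?, ?)",
                [(1, 1), (1, 1), (2, 1), (3, 2)])
cur.executemany("INSERT INTO target_mails VALUES (?)",
                [("a@x.com",), ("b@x.com",)])

# Join against the table holding the list instead of a giant IN (...).
rows = cur.execute("""
SELECT COUNT(*), c.mail
FROM event e
JOIN contacts c ON e.contactid = c.id
JOIN target_mails t ON c.mail = t.mail
WHERE e.event_type = 1
GROUP BY c.mail
""").fetchall()

print(sorted(rows))  # [(1, 'b@x.com'), (2, 'a@x.com')]
```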

Rails select subquery (without finder_sql, if possible)

I have a model called Object (what it actually is doesn't matter).
It has a default price (a column called "price").
There is also a Schedule object that allows the price to be overridden for specific dates.
I want to determine the MINIMUM price (by definition, the minimum of the default and "current" price) inside the SQL query, just so I can ORDER BY the calculated minimum price.
I want to make my search query as efficient as possible and I was wondering if I can
do something like that:
Object.select("id AS p_id, id, (SELECT MIN(`schedules`.`price`) FROM `schedules` WHERE `schedules`.`object_id` = p_id) AS min_price").limit(5)
But, it generates an odd SQL that looks like this:
SELECT `objects`.`id` AS t0_r0, `objects`.`title` AS t0_r1, `objects`.`created_at` AS t0_r2, `objects`.`updated_at` AS t0_r3, `objects`.`preferences` AS t0_r4 ........ (a lot of columns here) ... ` WHERE `objects`.`id` IN (1, 2, 3, 4 ....)
So, as you can see, it doesn't work. First of all, it loads all the columns from the objects table, and second, it looks horrible.
The reason why I don't want to use finder_sql is that I have a lot of optional parameters and stuff, so using the AR::Relation object is highly preferred prior to fetching the results themselves.
In addition to the above, I have a lot of records in the DB, and loading them all into memory is not a good idea; that is the main reason I want to perform this subquery - to filter out as many records as possible up front.
Can someone help me do this more efficiently?
You can make this easier if you generate the subquery separately and use a join instead of a correlated subquery:
subquery = Schedule.select('MIN(price) as min_price, object_id')
.group(:object_id)
.to_sql
Object.joins("JOIN (#{subquery}) schedules ON objects.id = schedules.object_id")
.select('objects.*, schedules.min_price')
.limit(5)
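The SQL this produces is an ordinary derived-table join, which you can verify outside Rails. A sketch in Python/SQLite (column names follow the question; sample prices are invented, and note SQLite's two-argument MIN() is MySQL's LEAST()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE objects (id INTEGER, price REAL);
CREATE TABLE schedules (object_id INTEGER, price REAL);
""")
cur.executemany("INSERT INTO objects VALUES (?, ?)", [(1, 100.0), (2, 80.0)])
cur.executemany("INSERT INTO schedules VALUES (?, ?)",
                [(1, 90.0), (1, 70.0), (2, 95.0)])

# The derived table computes one min_price per object; joining it is
# usually cheaper than running a correlated subquery for every row.
rows = cur.execute("""
SELECT o.id, MIN(o.price, s.min_price) AS effective_price
FROM objects o
JOIN (SELECT object_id, MIN(price) AS min_price
      FROM schedules GROUP BY object_id) s
  ON o.id = s.object_id
ORDER BY effective_price
LIMIT 5
""").fetchall()

print(rows)  # [(1, 70.0), (2, 80.0)]
```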

Sorting redundancies while fetching MySQL entries using ordered Sphinx Search output

I have a MySQL table that I indexed using Sphinx, with a bunch of columns as attributes that I want to let my users sort their search results by (e.g. name, ratings, etc.).
So I tell Sphinx to do this (for example, in PHP):
$sphinx = new SphinxClient();
// Retrieve $query, $sort_attr, and $order from $_GET
$sphinx->SetMatchMode(SPH_MATCH_ANY);
$sphinx->SetArrayResult(true);
$sphinx->SetSortMode($order, $sort_attr);
$sphinx->SetLimits( /* something reasonable, <1000 */ );
$results_sphinx = $sphinx->Query($query, 'table');
This works and I get my ordered results.
I also want to display all the attributes (and some other columns that should remain unindexed) as part of the search results. This means that I have to fetch each item of the search results from the DB.
So I make the following MySQL call:
SELECT id, colA, colB, [...] FROM table WHERE table.id IN ([IDs returned from Sphinx, in some sorted order])
However, even if the list of IDs returned from Sphinx is in some sorted order according to the attribute columns (e.g. alphabetical order), WHERE ... IN makes no ordering promise of its own; in practice the rows come back in the order of the table's index column, which in this case is the IDs themselves.
The only option I have in mind is to use ORDER BY:
SELECT id, colA, colB, [...] FROM table WHERE table.id IN ([IDs returned from Sphinx, in some sorted order]) ORDER BY [attribute] [DESC|ASC]
This works, but I just made both Sphinx and MySQL sort the same set of data for each search instance. This feels sub-optimal. I don't think I can leave the sorting to the latter MySQL call either, as I intend to have pagination in my results, so the IDs returned from Sphinx have to be in some order to begin with.
Can StackOverflow find me a way to avoid this redundancy? Please pick apart anything that I did above.
Thanks!
How many IDs are you returning at a time? If it isn't many, I would suggest using MySQL's ORDER BY FIELD(), like so:
SELECT id, colA, colB, ... FROM table WHERE table.id IN (id1,id2,id3,...) ORDER BY FIELD(table.id, id1, id2, id3, ...)
I do the exact same thing for my Sphinx/MySQL searches and retrievals, works great, never had a slow query (although I'm only fetching between 6 and 12 IDs at a time).
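If FIELD() ever becomes a bottleneck, or you are on a database without it, another option is to skip the second sort entirely and reorder the fetched rows application-side against the ID list Sphinx returned. A sketch (SQLite in-memory, made-up table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, colA TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, f"row {i}") for i in range(1, 8)])

sphinx_ids = [5, 2, 7]  # pretend Sphinx returned these, already sorted

placeholders = ",".join("?" * len(sphinx_ids))
rows = cur.execute(
    f"SELECT id, colA FROM t WHERE id IN ({placeholders})", sphinx_ids
).fetchall()  # typically comes back in ID order: 2, 5, 7

# Re-impose Sphinx's order in O(n) with a position lookup table.
pos = {id_: i for i, id_ in enumerate(sphinx_ids)}
rows.sort(key=lambda r: pos[r[0]])

print([r[0] for r in rows])  # [5, 2, 7]
```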

Best way to combine multiple advanced mysql select queries

I have multiple select statements from different tables on the same database. I was using multiple, separate queries then loading to my array and sorting (again, after ordering in query).
I would like to combine into one statement to speed up results and make it easier to "load more" (see bottom).
Each query uses SELECT, LEFT JOIN, WHERE and ORDER BY commands which are not the same for each table.
I may not need order by in each statement, but I want the end result, ultimately, to be ordered by a field representing a time (not necessarily the same field name across all tables).
I would want to limit total query results to a number, in my case 100.
I then use a loop through results and for each row I test if OBJECTNAME_ID (ie; comment_id, event_id, upload_id) isset then LOAD_WHATEVER_OBJECT which takes the row and pushes data into an array.
I won't have to sort the array afterwards because it was loaded in order via mysql.
Later in the app, I will "load more" by skipping the first 100, 200 or whatever page*100 is and limit by 100 again with the same query.
The end result from the database would pref look like "this":
RESULT - selected fields from a table - field to sort on is greatest
RESULT - selected fields from a possibly different table - field to sort on is next greatest
RESULT - selected fields from a possibly different table table - field to sort on is third greatest
etc, etc
I see a lot of simpler combined statements, but nothing quite like this.
Any help would be GREATLY appreciated.
The easiest way might be a UNION here ( http://dev.mysql.com/doc/refman/5.0/en/union.html ):
(SELECT a, b, c FROM t1)
UNION
(SELECT d AS a, e AS b, f AS c FROM t2)
ORDER BY a DESC
LIMIT 100
If the two selects can never produce identical rows, use UNION ALL instead; plain UNION has to de-duplicate, which costs extra work.
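A runnable sketch of that shape, including the "load more" paging (SQLite in-memory; the two tables, their differently named time columns, and the 'kind' tag for dispatching to the right loader are all made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE comments (comment_id INTEGER, body TEXT, created INTEGER);
CREATE TABLE uploads (upload_id INTEGER, filename TEXT, uploaded INTEGER);
""")
cur.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                [(1, "first!", 100), (2, "nice", 300)])
cur.executemany("INSERT INTO uploads VALUES (?, ?, ?)",
                [(1, "cat.png", 200), (2, "dog.png", 400)])

# Alias the per-table time columns to one shared sort key (ts); the
# 'kind' column tells the result loop which loader to use.  OFFSET
# would be page * 100 in the question's setup; page 0 here.
rows = cur.execute("""
SELECT 'comment' AS kind, comment_id AS obj_id, created AS ts FROM comments
UNION ALL
SELECT 'upload', upload_id, uploaded FROM uploads
ORDER BY ts DESC
LIMIT 3 OFFSET 0
""").fetchall()

print(rows)  # newest first across both tables
```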

Why does MySQL allow "group by" queries WITHOUT aggregate functions?

Surprise -- this is a perfectly valid query in MySQL:
select X, Y from someTable group by X
If you tried this query in Oracle or SQL Server, you’d get the natural error message:
Column 'Y' is invalid in the select list because it is not contained in
either an aggregate function or the GROUP BY clause.
So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.
There’s even a MySQL configuration parameter to turn off this “looseness”.
http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard.
http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html
My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?
According to this page (the 5.0 online manual), it's for better performance and user convenience.
I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.*) AS posts
FROM user
LEFT OUTER JOIN post ON post.owner_id=user.id
GROUP BY user.id
In this case the user.name will always be unique per user.id, so there is convenience in not requiring the user.name in the GROUP BY clause (although, as you say, there is definite scope for problems)
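That convenience case can be checked directly; SQLite has the same loose GROUP BY, and because user.name is functionally dependent on user.id here, the result is actually deterministic (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER, owner_id INTEGER);
""")
cur.executemany("INSERT INTO user VALUES (?, ?)", [(1, "ann"), (2, "bob")])
cur.executemany("INSERT INTO post VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

# name is neither aggregated nor in GROUP BY, but since it is unique
# per id, every row within a group carries the same name anyway.
rows = cur.execute("""
SELECT u.id, u.name, COUNT(p.id) AS posts
FROM user u
LEFT OUTER JOIN post p ON p.owner_id = u.id
GROUP BY u.id
""").fetchall()

print(sorted(rows))  # [(1, 'ann', 2), (2, 'bob', 1)]
```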
Unfortunately almost all the SQL varieties have situations where they break ANSI and have unpredictable results.
It sounds to me like they intended it to be treated like the "FIRST(Y)" function that many other systems have.
More than likely, this construct is something that the MySQL team regret, but don't want to stop supporting because of the number of applications that would break.
MySQL treats this as a single-column DISTINCT when you use GROUP BY without an aggregate function. With the other options you either make the whole result distinct, or have to use subqueries, etc. The question is whether the results are truly predictable.
Also, good info is in this thread.
From what I have read in the mysql reference page, it says:
"You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group."
I suggest you read this page (a link to the MySQL reference manual):
http://dev.mysql.com/doc/refman/5.5/en//group-by-extensions.html
It's actually quite useful that the other fields don't all have to be in an aggregate function when you group by a field. You can influence which row gets returned by ordering the result first and then grouping it afterwards. For instance, if I wanted to get user login information and see the last time each user logged in, I would do this.
Tables
USER
user_id | name
USER_LOGIN_HISTORY
user_id | date_logged_in
USER_LOGIN_HISTORY has multiple rows per user, so joining users to it would return many rows. As I am only interested in the last entry, I would do this:
select
user_id,
name,
date_logged_in
from (
select
u.user_id,
u.name,
ulh.date_logged_in
from users as u
join user_login_history as ulh
on u.user_id = ulh.user_id
where u.user_id = 1234
order by ulh.date_logged_in desc
)as table1
group by user_id
This would return one row with the name of the user and the last time that user logged in. (Bear in mind this relies on exactly the "looseness" the question describes: MySQL is free to pick any row from the group, and the ordered-subquery trick is not guaranteed to survive ONLY_FULL_GROUP_BY or newer optimizers that drop a derived table's ORDER BY.)