Consistent random ordering in a MySQL query - mysql

I have a database of pictures and I want to let visitors browse the pictures. I have one "next" and one "previous" link.
But what I want is to show every visitor a different order of the pictures. How can I do that? If I use ORDER BY RAND(), the order changes on every request, so I will sometimes show duplicate images.
Can someone help me please? Thank you!

You can try to use seed in random function:
SELECT something
FROM somewhere
ORDER BY rand(123)
Here 123 is the seed. With the same seed, RAND() should return the same sequence of values, so the order stays the same from one page to the next.
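To illustrate why a fixed seed helps with paging, here is a minimal sketch in Python rather than MySQL. It is only an analogue of RAND(123): a seeded generator always produces the same shuffle, so pages cut from it never overlap. The helper names and the picture IDs are made up for the example.

```python
import random

def shuffled_ids(ids, seed):
    """Return a copy of ids in a pseudo-random order fully determined by seed."""
    rng = random.Random(seed)   # private generator, global random state is untouched
    order = list(ids)
    rng.shuffle(order)
    return order

def page_of(ids, seed, page, per_page):
    """Slice one page (0-based) out of the seeded order."""
    order = shuffled_ids(ids, seed)
    start = page * per_page
    return order[start:start + per_page]

picture_ids = range(1, 21)      # hypothetical picture IDs 1..20
first = page_of(picture_ids, seed=123, page=0, per_page=5)
second = page_of(picture_ids, seed=123, page=1, per_page=5)
```

Because both calls use the same seed, `first` and `second` come from the same ordering and share no IDs. A different seed gives a different, equally stable ordering.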

The problem arises from the fact that each page will run RAND() again and has no way of knowing if the returned pictures have already been returned before. You would have to compose your query in such a way that you can filter out the pictures already presented on the previous pages, so that RAND() will have fewer options to choose from.
An idea would be to randomize the pictures, select the IDs, store the IDs in the session, then SELECT using those IDs. This way, each user will have the pictures randomized, but they will be able to paginate through them without re-randomizing them on each page.
So, something like:
SELECT id FROM pictures ORDER BY RAND() LIMIT x if you don't have the IDs in the session already
Store the IDs in the session
SELECT ... FROM pictures WHERE id IN (IDs from session) LIMIT x
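The three steps above could look roughly like this on the application side. This is a sketch in Python with a plain dict standing in for the visitor's session; `get_page` and the key `picture_order` are hypothetical names, and the shuffle plays the role of the one-time ORDER BY RAND().

```python
import random

def get_page(session, all_ids, page, per_page):
    """Return the IDs for one page (0-based), shuffling only on the first call."""
    if "picture_order" not in session:
        order = list(all_ids)
        random.shuffle(order)              # done once per visitor, like ORDER BY RAND()
        session["picture_order"] = order   # remembered for the rest of the visit
    order = session["picture_order"]
    start = page * per_page
    return order[start:start + per_page]

session = {}                               # stand-in for a real per-visitor session
all_ids = list(range(1, 13))               # hypothetical picture IDs
page0 = get_page(session, all_ids, 0, 4)
page1 = get_page(session, all_ids, 1, 4)
page0_again = get_page(session, all_ids, 0, 4)
```

The IDs returned for a page would then go into the WHERE id IN (...) query. Note that IN alone does not return rows in the listed order, so the rows may need to be re-sorted to match the stored list.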
Another idea is to store in session the IDs that the user already saw and filter them out. For example:
SELECT ... FROM pictures ORDER BY RAND() LIMIT x if the session doesn't contain any ID
Append the IDs from the current query to the session
SELECT ... FROM pictures WHERE id NOT IN (IDs from session) ORDER BY RAND() LIMIT x
Another way seems to be to use a seed, as izi points out. I have to say I didn't know about the seed, but it seems to return the exact same results for the exact same value of the seed. So, run your usual query and use RAND(seed) instead of RAND(), where "seed" is a number tied to the visitor. The seed is expected to be an integer, so a session ID that is a string would first have to be turned into a number (for example with CRC32()), or you can simply store a random integer in the session and use that.
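If the seed has to be an integer, one possible way to get a per-visitor seed from a session ID string is a checksum. A small sketch, assuming the session ID is available as text; `seed_from_session` is a hypothetical helper, and a checksum is stable per visitor but not guaranteed unique across visitors.

```python
import zlib

def seed_from_session(session_id):
    """Map a session ID string to a stable 32-bit unsigned integer."""
    return zlib.crc32(session_id.encode("utf-8"))

seed = seed_from_session("abc123")   # made-up session ID
```

The resulting number could then be passed to the query as the RAND() seed, ideally as a bound parameter rather than by string concatenation.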

You can seed the random function as suggested by izi, or keep track of visited images vs non-visited images as suggested by rdineiu.
I'd like to stress that neither option will perform well, however. Either will lead you to sorting your entire table (or the part of it of interest) by an arbitrary criterion and extracting the top n rows, possibly with an offset. It'll be dreadfully slow.
Thus, consider for a moment how important it is that every visitor should get a different image order. It probably isn't that important, as long as things look random. Assuming this is the case, consider this alternative...
Add an extra float field to your table, call it sort_ord. Add an index on it. On every insert or update, assign it a random value. The point here is to end up with a seemingly random order (from the visitor's standpoint) without compromising performance.
Such a setup will allow you to grab the top n rows and paginate your images using an index, rather than by sorting your entire table.
At your option, have a cron job periodically set a new value:
update yourtable
set sort_ord = rand();
Also at your option, create several such fields and assign one to visitors when they visit your site (cookie or session).
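Here is a small sketch of the sort_ord idea, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table layout, the index name and the "continue after the last row seen" style of paging are assumptions made for the example, not something the answer prescribes.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pictures (id INTEGER PRIMARY KEY, sort_ord REAL)")
conn.execute("CREATE INDEX idx_pictures_sort_ord ON pictures (sort_ord, id)")

rng = random.Random(42)  # fixed seed only to keep the demo repeatable
conn.executemany(
    "INSERT INTO pictures (id, sort_ord) VALUES (?, ?)",
    [(i, rng.random()) for i in range(1, 11)],   # random value assigned on insert
)

def next_page(conn, last_ord, last_id, per_page):
    """Fetch the rows that come after (last_ord, last_id) in sort_ord order."""
    return conn.execute(
        "SELECT id, sort_ord FROM pictures "
        "WHERE sort_ord > ? OR (sort_ord = ? AND id > ?) "
        "ORDER BY sort_ord, id LIMIT ?",
        (last_ord, last_ord, last_id, per_page),
    ).fetchall()

seen = []
last_ord, last_id = -1.0, 0          # sentinel below every stored value
while True:
    rows = next_page(conn, last_ord, last_id, 4)
    if not rows:
        break
    seen.extend(row[0] for row in rows)
    last_id, last_ord = rows[-1]     # remember where this page ended
```

Each page only reads the next few index entries after the last one shown, instead of sorting the whole table by a freshly computed random value.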

This will solve:
SELECT DISTINCT RAND() as rnd, [rest of your query] ORDER BY rnd;

Use RAND(SEED). From the docs: "If a constant integer argument N is specified, it is used as the seed value." (http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand).
In the example below the result order is always the same. You simply change the seed (351) and you get a new random order.
SELECT * FROM your_table ORDER BY RAND(351);
You can change the seed every time the user hits the first page.

Without seeing the SQL I'd guess you could try SELECT DISTINCT...

Related

Does pymysql's lastrowid function work properly in cases of multiple users inserting into DB?

I have 2 tables in my DB: purchase order and lines. Every order can have multiple lines (one line for each part ordered). I am developing an application where an order will first be created. I need to get the ID of this order and then insert lines later on (as the user adds the parts). How can I ensure the correct value of the order ID is fetched?
I can't understand exactly what you want, but you should read about ORMs; one may be able to help you.

SQL: paging without repetition from a dynamic table

I have an SQL table called articles from which I load rows divided by pages. My SQL is
SELECT ...
ORDER BY $orderCol DESC
LIMIT $offset, $numPerPage
On page one the limit is 0, $numPerPage, page two it's $numPerPage, 2 * $numPerPage etc.
The problem: When a new row is inserted before page 2 is loaded, the last article from page 1 will be the first article in page 2 etc. How can I avoid this?
I thought about adding a WHERE clause to select articles starting from the last $orderCol, but this field is not unique (it's a date in my case) so I'll miss articles with the same value here. The primary index is also a problem because it's not ordered the same way as $orderCol
It's not necessary that the newly added row will appear at any point. This will require a refresh.
Your LIMIT should be something like below rather. Define a $page variable which will change from 1 .. no.of pages you want.
LIMIT $offset, $page * $numPerPage
OK, in that case you will have to re-calculate your $pages and $numPerPage variable every time (on refresh) and define the paging accordingly.
The solution I found is to add a condition in the WHERE clause.
Rahul suggested that I should count the new articles since the previous query and add that count to the offset. This was tricky, because counting the articles would imply that I can run a query that stops where the previous one began, and if I had a way to do that, I could just as easily have made the second query start where the first ended.
So I realized I needed a new column. I called it date-added and the new condition is date-added < $time-of-first-query for all subsequent queries. Then, the offset is just the number of articles written so far.
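A rough sketch of that fix, again with sqlite3 standing in for MySQL. The column is spelled date_added here (an unquoted hyphen would not be a valid identifier), the timestamps are small integers for readability, and `fetch_page` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, date_added INTEGER)"
)
conn.executemany(
    "INSERT INTO articles (title, date_added) VALUES (?, ?)",
    [(f"article {n}", n) for n in range(1, 7)],   # date_added 1..6 as fake timestamps
)

def fetch_page(conn, first_query_time, offset, per_page):
    """Newest-first page, limited to rows that existed at first_query_time."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE date_added < ? "
        "ORDER BY date_added DESC, id DESC LIMIT ? OFFSET ?",
        (first_query_time, per_page, offset),
    ).fetchall()
    return [row[0] for row in rows]

first_query_time = 7                       # recorded when page one is requested
page1 = fetch_page(conn, first_query_time, 0, 3)
# A new article arrives between the two page loads.
conn.execute("INSERT INTO articles (title, date_added) VALUES ('article 7', 7)")
page2 = fetch_page(conn, first_query_time, 3, 3)
```

Because every page filters on the time of the first query, the article inserted in between does not push older rows onto the next page.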

MYSQL LIMIT. Is it possible to skip certain rows?

Sorry if this question is confusing.
I have inherited a site that is already built, so I can't really do anything too drastic.
The MYSQL query on a certain page uses LIMIT to only show the relevant entries like this:
comtitlesub.idcts = %s LIMIT 1,3
Skipping the first record and displaying the following three records.
I have been asked to add a new record, which is fine, but this is record number 7. Records 5 and 6 are not supposed to display on this page so changing the query to:
comtitlesub.idcts = %s LIMIT 1,6
displays all 6 records as you would expect.
One confusing thing is that I have altered the ID's for each of the records so that my new one is ID 4, and yet this did not make a difference.
Is there a simple way to 'skip' the unwanted records or am I approaching this from the wrong direction?
Add "order by comtitlesub.idcts" at the end of your query, but before the LIMIT clause.
... comtitlesub.idcts = %s ORDER BY comtitlesub.idcts LIMIT 1,6
Basically, changing the ID doesn't reorder the rows. Without an ORDER BY they usually come back in the order they were stored, but that order is not something you can rely on.
Andrew, LIMIT cuts the result according to a specific order. In your case the default order happened to be the same as the ID order; now that you've changed the IDs, you will need to order by ID:
ORDER BY comtitlesub.idcts
I believe the easiest course would be to modify your WHERE clause to exclude the rows you want excluded. For example:
WHERE comtitlesub.idcts = %s AND someothercol NOT IN ('cat','frog','kazoo')
Ideally for maintainability, you would want someothercol to hold stable data rather than a numeric ID which might change as your application data changes.

'Natural sorting' with MySQL?

I'm trying to query a Wordpress database and get the post titles to sort in a correct order.
The titles are formatted like this: Title 1, Title 2.. I need to sort them in ascending order, how can I do this? If I just sort them ascending they will come out like: 1,10,11...
Right now my order by statement is this but it does nothing:
ORDER BY CONVERT(p.post_title,SIGNED) ASC;
Per-row functions are a bad idea in any database that you want to scale well. That's because they have to perform the calculation on every row you retrieve every time you do a select.
The intelligent DBA's way of doing this is to create a whole new column containing the computed sort key, and use an insert/update trigger to ensure it's set correctly. This means the calculation is performed only when needed, and its cost is amortised across all selects.
This is one of the few cases where it's okay to revert from third normal form, since the use of the triggers prevents data inconsistency. Hardly anyone complains about the disk space taken up by their databases; the vast majority of questions concern speed.
And, by using this method and indexing the new column, your queries will absolutely scream along.
So basically, you create another column called natural_title mapped as follows:
title        natural_title
-----        -------------
title 1      title 00001
title 2      title 00002
title 10     title 00010
title 1024   title 01024
ensuring that the mapping function used in the trigger allows for the maximum value allowed. Then you use a query like:
select title from articles
order by natural_title asc
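The mapping in the table above could be produced by something like the following. This is a sketch in Python, not the trigger itself; `natural_key` and the default width of 5 are assumptions, and the width has to cover the largest number you expect.

```python
import re

def natural_key(title, width=5):
    """Zero-pad every run of digits in title to `width` characters."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), title)

titles = ["title 10", "title 2", "title 1024", "title 1"]
ordered = sorted(titles, key=natural_key)
```

Sorting by the padded key as plain text puts "title 2" before "title 10", which is the ordering the natural_title column is meant to provide.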
If the # is always at the end like that you can do some string manipulation to make it work:
SELECT *, CAST(RIGHT(p.post_title,2) AS UNSIGNED) AS TITLE_INDEX
FROM wp_posts p
ORDER BY TITLE_INDEX asc
Might have to tweak it a bit assuming you may have 100+ or a 1000+ numbers as well.

Generate number id from text/url for fast "SELECT"

I have the following problem:
I have a feed capturer that captures news from different sources every half an hour.
I only insert entries that don't have their URLs already in the database (URL is used to see if the record is already in database).
Even with that, I get some repeated entries, because some sites report the same news (usually from a news source like Reuters). I could look for these repeated entries during insertion, but I think this would slow the insertion time even more.
So, I can later find these repeated entries by the title. But I think this search is slow. Then, my idea is to generate a numeric field from the title and then search by this number for repeated titles.
What kind of encoding could I use to encode the titles (I was thinking of something like the reverse of base64)?
I'm supposing that searching for repeated numbers is a lot faster than searching for repeated words. Is that true or not?
Do you suggest a better solution for this problem?
Well, I don't mind having the repeated entries in the database, I just don't want to show them to the user. Like Google, which filters the repeated results but shows them if you want.
I hope I explained It well. Thanks in advance.
Store the MD5 hashes of the URL and the title and build a UNIQUE index on them:
CREATE UNIQUE INDEX ux_mytable_title_url ON mytable (title_hash, url_hash)
INSERT
INTO mytable (url, title, url_hash, title_hash)
VALUES ('url', 'title', MD5('url'), MD5('title'))
To select like Google (one result per title), use this query:
SELECT *
FROM (
    SELECT DISTINCT title_hash
    FROM mytable
) md
JOIN mytable mo
ON mo.title_hash = md.title_hash
AND mo.url_hash =
(
    SELECT url_hash
    FROM mytable mi
    WHERE mi.title_hash = md.title_hash
    ORDER BY
    mi.title_hash, mi.url_hash
    LIMIT 1
)
So you can use a new table containing only the encoded keys based on title and URL; you then have to add a key on it to speed up the search. But I don't think you can use an efficient algorithm to transform strings to numbers.
For the hashing use
SELECT MD5(CONCAT('title', 'url'));
and before every insertion you test if the encoded concatenation of title and url exists on this table.
@Quassnoi can explain better than I, but I think there is no visible difference in performance whether you use a VARCHAR/CHAR or an INT in an index that you later use for GROUPing or another method of finding the duplicates. That way you could use the solution he proposed, but with a normal INDEX instead of a UNIQUE index, and keep the duplicates in the database, filtering them out only when showing results to users.
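The hashing idea discussed above, with the duplicates filtered only at display time, could be sketched on the application side like this. It is a minimal Python example; `title_hash`, `one_per_title` and the sample entries are made up, and it keeps the first entry seen per title rather than reproducing the SQL query.

```python
import hashlib

def title_hash(title):
    """Hex MD5 digest of the title, usable as a fixed-length lookup key."""
    return hashlib.md5(title.encode("utf-8")).hexdigest()

def one_per_title(entries):
    """Keep the first (url, title) pair seen for each title hash."""
    seen = set()
    unique = []
    for url, title in entries:
        key = title_hash(title)
        if key not in seen:
            seen.add(key)
            unique.append((url, title))
    return unique

entries = [
    ("http://example.com/a", "Markets rally"),
    ("http://example.org/b", "Markets rally"),    # same story from another site
    ("http://example.com/c", "Storm warning"),
]
shown = one_per_title(entries)
```

All entries stay stored; only the list shown to the user is reduced to one item per title.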