I have this SQL:
UPDATE products pr
SET pr.product_model_id = (SELECT id FROM product_models pm WHERE pm.category_id = 1 ORDER BY rand() LIMIT 1)
limit 200;
The MySQL server took more than 15 seconds to update these 200 records, and the table contains 220,000 records.
Why is that?
edit:
I have these tables which are empty, but I need to fill them with random information for testing.
Realistic estimates show that I will have:
80 categories
40,000 models
And, around 500,000 products
So, I've manually created:
ALL the categories.
200 models (and used sql to duplicate them to 20k).
200 products (and duplicated them to 250k)
I need them all attached.
DB tables are:
categories {id, ...}
product_models {id, category_id, ...}
products {id, product_model_id, category_id}
The question is a little unusual, but here is a quick thought about the problem.
ORDER BY RAND() does not perform well on large data sets, because MySQL has to generate a random value for every candidate row and then sort on it.
MySQL developers work around this in different ways; see these posts:
How can i optimize MySQL's ORDER BY RAND() function?
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
One quick way is the following (in PHP):
// Get the total number of rows in the category
$result = mysql_query("SELECT COUNT(*) AS count
FROM product_models pm WHERE pm.category_id = 1");
$row = mysql_fetch_array($result);
$total = $row['count'];
// Pick a random offset; LIMIT offsets are 0-based, so the valid range is 0 to $total - 1
$randomvalue = rand(0, $total - 1);
// Assign the model at that offset; note that all 200 products get the same model
$result = mysql_query("UPDATE products pr
SET pr.product_model_id =
(SELECT id FROM product_models pm
WHERE pm.category_id = 1
LIMIT $randomvalue,1)
LIMIT 200");
Hope this will help.
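For anyone who wants to try the count-then-offset idea outside PHP, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for MySQL, a made-up product_models table with 20 rows, and a hypothetical helper name pick_random_model_id; none of that comes from the original post.

```python
import random
import sqlite3


def pick_random_model_id(conn, category_id):
    """Return the id of one random product_models row in the category, or None."""
    # Step 1: count the candidate rows.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM product_models WHERE category_id = ?",
        (category_id,),
    ).fetchone()
    if total == 0:
        return None
    # Step 2: pick a 0-based offset, so the valid range is 0 .. total - 1.
    offset = random.randrange(total)
    # Step 3: fetch exactly one row at that offset (ORDER BY keeps the offset stable).
    row = conn.execute(
        "SELECT id FROM product_models WHERE category_id = ? "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (category_id, offset),
    ).fetchone()
    return row[0]


# Assumed sample data: odd ids belong to category 1, even ids to category 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_models (id INTEGER PRIMARY KEY, category_id INTEGER)")
conn.executemany(
    "INSERT INTO product_models (id, category_id) VALUES (?, ?)",
    [(i, 1 if i % 2 else 2) for i in range(1, 21)],
)
print(pick_random_model_id(conn, 1))
```

The point of the sketch is that no sort on a random value is needed: one COUNT and one offset lookup replace the ORDER BY rand().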
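The 'problem' is the ORDER BY rand(): MySQL has to assign a random value to every matching product_models row and sort them all just to keep one, and because the subquery contains rand() it is most likely re-evaluated for each of the 200 updated rows.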
Related
I have a MySQL database with over 100k rows, and I need a search that looks only at the last 1000 rows: if the value is not found in those 1000 rows, the search should end and return nothing.
Example: if my table looks like this:
id name
1 AL
2 BL
...
1000 P12
1001 P15
And I run a query like this: SELECT * FROM myTable WHERE name = 'AL' ONLY LAST 1000 ROWS ORDER BY id DESC (I don't know the real syntax, so I invented ONLY LAST 1000 ROWS).
This should return an empty result, because I want the query to match only rows within the last 1000, and 'AL' is the 1001st row from the end.
Using LIMIT doesn't work, because it limits the number of rows found, not the number of rows searched.
Is there a way to implement this in MySQL?
Thank you!
As touched on in the comments, you can use OFFSET to skip the last 1000 rows and get the id of the row just before them (the 1001st from the end), then SELECT only records with a larger id.
Something like this:
SELECT name
FROM myTable
WHERE id > (SELECT id FROM myTable ORDER BY id DESC LIMIT 1 OFFSET 1000)
AND name = 'AL'
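To check the boundary behaviour, here is a small sketch in Python. It assumes an in-memory SQLite table as a stand-in for MySQL, 1001 made-up rows where only id 1 is named 'AL', and a hypothetical helper name find_in_last_n; the query itself is the one above with the offset turned into a parameter.

```python
import sqlite3


def find_in_last_n(conn, name, n):
    """Return ids of rows named `name` among the n rows with the highest ids."""
    rows = conn.execute(
        "SELECT id FROM myTable "
        "WHERE id > (SELECT id FROM myTable ORDER BY id DESC LIMIT 1 OFFSET ?) "
        "AND name = ? ORDER BY id DESC",
        (n, name),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, name TEXT)")
# ids 1..1001; only id 1 is named 'AL', so it sits just outside the last 1000 rows.
conn.executemany(
    "INSERT INTO myTable (id, name) VALUES (?, ?)",
    [(i, "AL" if i == 1 else f"P{i}") for i in range(1, 1002)],
)
print(find_in_last_n(conn, "AL", 1000))  # 'AL' is row 1001 from the end
print(find_in_last_n(conn, "P2", 1000))  # id 2 is the oldest row still inside the window
```

One thing to watch: if the table has 1000 rows or fewer, the subquery returns no row, the comparison is against NULL, and the outer query returns nothing at all.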
I have two tables. One is called topic, as shown below:
topic_id   topic_name
1          topic 1
2          topic 2
3          topic 3
and another table called questions, as shown:
q_id   question_name   topic_id
1      question 1      1
2      question 2      1
3      question 3      1
4      question 4      2
5      question 5      2
6      question 6      2
7      question 7      3
8      question 8      3
9      question 9      3
I want to choose 2 random questions from the three given topics. Can someone please help me with this?
Build a list of each topic's question IDs in random order with GROUP_CONCAT([column] ORDER BY RAND()), keep the first two IDs with SUBSTRING_INDEX, and then join the table to itself on that list.
SELECT t.q_id, t.question_name, t.topic_id
FROM questions t
JOIN (
SELECT topic_id, SUBSTRING_INDEX(GROUP_CONCAT(q_id ORDER BY RAND()), ',', 2) as qList
FROM questions GROUP BY topic_id
) tGrouped ON FIND_IN_SET(t.q_id, tGrouped.qList)>0
One can sort the rows randomly and then fetch the top rows from this random order.
For two random questions which could have same topic:
SELECT * FROM questions
ORDER BY RAND()
LIMIT 2
For two random questions which should have different topic:
Use 2 different queries which take as parameters two different topic_ids (t1, t2):
First select 2 random topic ids (similarly to above code):
SELECT topic_id FROM topics
ORDER BY RAND()
LIMIT 2
Then select 2 random questions with these topics ids (2 select statements)
SELECT * FROM questions
WHERE topic_id = t1
ORDER BY RAND()
LIMIT 1
SELECT * FROM questions
WHERE topic_id = t2
ORDER BY RAND()
LIMIT 1
UPDATE (after OP's comment and explanation)
To obtain two random questions from every topic use a variation of the above solutions:
3 select statements (one for each topic):
SELECT * FROM questions
WHERE topic_id = needed_topic_id_here
ORDER BY RAND()
LIMIT 2
repeat the select for every topic_id.
Presumably these select statements could be combined into one big select statement, but I'm not sure at this point.
Note: as pointed out in another answer, selecting randomly in pure SQL can be less efficient; a better solution would be to pre-compute random indices in PHP (or whatever your platform is) and then select the questions with those indices. Since no language is mentioned in the question, I'll leave it here (and point to the other answers for this approach).
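The "repeat the select for every topic_id" step can be sketched as a loop. This is only an illustration in Python, assuming an in-memory SQLite copy of the questions table from the question and a hypothetical helper name random_questions_per_topic. SQLite spells the function RANDOM(); in MySQL it would be RAND().

```python
import sqlite3


def random_questions_per_topic(conn, topic_ids, per_topic=2):
    """Run one ORDER BY RANDOM() LIMIT query per topic and collect the rows."""
    picked = []
    for topic_id in topic_ids:
        rows = conn.execute(
            "SELECT q_id, question_name, topic_id FROM questions "
            "WHERE topic_id = ? ORDER BY RANDOM() LIMIT ?",
            (topic_id, per_topic),
        ).fetchall()
        picked.extend(rows)
    return picked


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (q_id INTEGER PRIMARY KEY, question_name TEXT, topic_id INTEGER)"
)
# Same sample data as in the question: questions 1-3 are topic 1, 4-6 topic 2, 7-9 topic 3.
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [(i, f"question {i}", (i - 1) // 3 + 1) for i in range(1, 10)],
)
for row in random_questions_per_topic(conn, [1, 2, 3]):
    print(row)
```

Each query only sorts the rows of a single topic, so the random sort stays small as long as no single topic is huge.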
You can use ORDER BY RAND() and LIMIT 2 in the query, but it runs painfully slowly on tables that have thousands of records or more.
A better approach for big tables is to get the bounding values of the PK field using the WHERE condition you need, generate 2 random numbers between these bounding values in PHP, then issue 2 MySQL queries to get 2 questions.
Something along these lines:
$query = '
SELECT MIN(q_id) AS min_id, MAX(q_id) AS max_id
FROM questions
WHERE topic_id = 1 # put the filtering you need here
';
// Run the query
// ... use your regular PHP code for database access here ...
// get and store the returned values in PHP variables $minId and $maxId
// Keep the generated random values here to avoid duplicates
$list = array();
// Number of random questions to fetch
$n = 2;
// Get $n random questions from the database
for ($cnt = 0; $cnt < $n; $cnt ++) {
// Generate a new ID that is not in the list
do {
$id = rand($minId, $maxId);
} while (in_array($id, $list));
// Put it into the list to avoid generating it again
$list[] = $id;
// Get the random question
$query = "
SELECT *
FROM questions
WHERE topic_id = 1
AND q_id <= $id
ORDER BY q_id DESC
LIMIT 1
";
// Run the query, get the question
// ... use your regular PHP code for database access here ...
}
No matter which queries you run (these or the ones in the other answers), you need indexes on q_id and on the columns used in the WHERE clause.
I hope q_id is the PK of the table, which means it already has a UNIQUE index.
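Here is a rough Python version of the MIN/MAX approach, in case it helps to see it end to end. It assumes an in-memory SQLite table as a stand-in for MySQL, made-up sample rows with gaps in q_id, and a hypothetical helper name random_questions_by_id. Unlike the PHP outline, it de-duplicates on the returned q_id and caps the number of probes, which is my own addition.

```python
import random
import sqlite3


def random_questions_by_id(conn, topic_id, n):
    """Pick up to n distinct questions by probing random ids between MIN and MAX."""
    min_id, max_id = conn.execute(
        "SELECT MIN(q_id), MAX(q_id) FROM questions WHERE topic_id = ?",
        (topic_id,),
    ).fetchone()
    if min_id is None:
        return []  # no questions for this topic
    found = {}
    # Bound the number of probes so the loop always terminates.
    for _ in range(50 * n):
        if len(found) == n:
            break
        probe = random.randint(min_id, max_id)
        # Take the closest existing row at or below the probe; this tolerates id gaps.
        row = conn.execute(
            "SELECT q_id, question_name FROM questions "
            "WHERE topic_id = ? AND q_id <= ? ORDER BY q_id DESC LIMIT 1",
            (topic_id, probe),
        ).fetchone()
        found[row[0]] = row  # keyed by q_id, so repeated hits are ignored
    return list(found.values())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (q_id INTEGER PRIMARY KEY, question_name TEXT, topic_id INTEGER)"
)
# Assumed sample data with gaps: topic 1 owns q_ids 1, 5 and 9; topic 2 owns 2 and 6.
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [(1, "q1", 1), (2, "q2", 2), (5, "q5", 1), (6, "q6", 2), (9, "q9", 1)],
)
print(random_questions_by_id(conn, 1, 2))
```

Be aware that this is not perfectly uniform: a row followed by a gap in the id sequence is chosen more often than its neighbours.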
Provide 3 topic ids to get 2 questions at random:
select
q.question_name
from
topics t,
questions q
where
t.topic_id = q.topic_id and
t.topic_id in (1, 2, 3) /*define your 3 given topics*/
order by rand() limit 0,2;
I need to show some records sorted by their modified column (latest activity on top).
(Posts with new edits or comments appear at the top.)
The app UI has a Twitter-like 'more' button for infinite scroll. Each 'more' adds the next 10 records to the UI.
The issue is that the pagination index breaks when any of the records still to be shown is modified.
For example:
Suppose I have records A, B, C, ... Z in the Jobs table.
The first time, I show records A-J to the user using
SELECT * FROM Jobs WHERE 1 ORDER BY last_modified DESC LIMIT 0, 10
The second time, if none of the records have been modified,
SELECT * FROM Jobs WHERE 1 ORDER BY last_modified DESC LIMIT 10, 10
will return K-T
But if somebody modifies any record after J before the user clicks the 'more' button,
SELECT * FROM Jobs WHERE 1 ORDER BY last_modified DESC LIMIT 10, 10
will return J-S
Here record J is duplicated. I can hide it by not inserting J into the UI, but then the 'more' button shows only 9 records. This mechanism also fails when a large number of records are updated: if 10 records are modified, the query returns A-J again.
What is the best way to handle this pagination issue?
Keeping a second time stamp fails if a record has multiple updates.
Server cache of queries?
I would do a NOT IN() and a LIMIT instead of just a straight LIMIT with a pre-set offset.
SELECT * FROM Jobs WHERE name NOT IN('A','B','C','D','E','F','G','H','I','J')
ORDER BY last_modified DESC LIMIT 10
This way you still get the 10 most recent records every time, but you need to track which IDs have already been shown and keep excluding them in your SQL query.
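A small sketch of that bookkeeping in Python, assuming an in-memory SQLite Jobs table as a stand-in for MySQL, an integer id column (the query above excludes by name instead), integer timestamps, and a hypothetical helper name next_page:

```python
import sqlite3


def next_page(conn, shown_ids, page_size=10):
    """Return the next page of jobs, skipping every id that was already shown."""
    sql = "SELECT id, name FROM Jobs"
    if shown_ids:
        placeholders = ", ".join("?" for _ in shown_ids)
        sql += f" WHERE id NOT IN ({placeholders})"
    sql += " ORDER BY last_modified DESC, id DESC LIMIT ?"
    return conn.execute(sql, [*shown_ids, page_size]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jobs (id INTEGER PRIMARY KEY, name TEXT, last_modified INTEGER)")
# 26 jobs named A..Z; A is the most recently modified.
conn.executemany(
    "INSERT INTO Jobs VALUES (?, ?, ?)",
    [(i + 1, chr(ord("A") + i), 1000 - i) for i in range(26)],
)

shown = []
page1 = next_page(conn, shown)
shown += [row[0] for row in page1]
# Job K is edited before the user clicks "more", which moves it to the top.
conn.execute("UPDATE Jobs SET last_modified = 2000 WHERE name = 'K'")
page2 = next_page(conn, shown)
shown += [row[0] for row in page2]
print([name for _, name in page1])
print([name for _, name in page2])
```

The second page still has 10 rows and no repeats, at the cost of an exclusion list that grows with every page the user loads.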
Twitter timelines are not paged queries; they are queried by IDs.
This page will help a lot in understanding timeline basics: https://dev.twitter.com/docs/working-with-timelines
Let's say each row has an id field too:
id msg
1 A
2 B
....
The first query will give you 10 posts, and the max post id will be 10.
The next query should be
SELECT * FROM Jobs WHERE id > 10 ORDER BY last_modified DESC LIMIT 0, 10
I don't know the exact solution, but I can give it a try.
First, you need an integer ID column in your Jobs table.
Now send max_id = null along with limit = 10 and offset = 0 from the UI.
If max_id is null, set max_id to (MAX(ID) + 1) of the table:
SELECT (MAX(ID) + 1) INTO max_id FROM Jobs;
Then find the records:
SELECT * FROM Jobs WHERE ID < max_id ORDER BY last_modified DESC LIMIT 10 OFFSET 0;
Return the records to the UI.
Now, in the UI, set max_id = ID of the first record in the response array and offset = offset + limit.
From now on, query with the updated values of max_id and offset:
SELECT * FROM Jobs WHERE ID < max_id ORDER BY last_modified DESC LIMIT 10 OFFSET 10;
I have an online rugby manager game. Every registered user has one team, and each team has 25 players at the beginning. There are friendly, league, cup matches.
On each player's page I want to show the number of:
official games played,
tries,
conversions,
penalties,
dropgoals
and for each of the categories:
in this season;
in his career;
for national team;
for U-20 national team
I have two options:
$query = "SELECT id FROM tbl_matches WHERE type='0' AND standing='1'";
$query = mysql_query($query, $is_connected) or die('Can\'t connect to database.');
while ($match = mysql_fetch_array($query)) {
$query2 = "SELECT COUNT(*) as 'number' FROM tbl_comm WHERE matchid='$match[id]' AND player='$player' and result='5'";
$query2 = mysql_query($query2, $is_connected) or die('Can\'t connect to database.');
$try = mysql_fetch_array($query2);
}
This script searches every official match played by the selected player. It then gets the report for that match (about 20 commentary lines per match) and checks every line to see whether the player scored a try.
The problem is that in a few seasons there could be about 20,000,000 rows in the commentary table. Will my script load slowly (noticeably for users)?
The second option is to create a table of player stats, which would have about 21 columns.
What do you recommend that I do?
Why run all of those separate queries when you can group them together and get your count by id?
select tbl_matches.id,count(*)
from tbl_matches join tbl_comm on tbl_matches.id = tbl_comm.matchid
where tbl_comm.player = '$player'
and tbl_comm.result = '5'
and tbl_matches.type='0'
and tbl_matches.standing='1'
group by tbl_matches.id;
If you need additional columns, just add them to both the select columns and the group by column list.
Also: you should be extremely wary about substituting $player directly into your query. If that string isn't properly escaped, that could be the source of a SQL-injection attack.
EDIT: fixed query per Jonathan's comment
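On the injection point: binding $player as a parameter avoids the escaping question entirely. Below is a sketch in Python rather than PHP, assuming in-memory SQLite tables as a stand-in for MySQL, a handful of made-up rows, and a hypothetical helper name tries_per_match. The query is the grouped join from above, counting rows with result = '5', which the question's code treats as a try.

```python
import sqlite3


def tries_per_match(conn, player):
    """Count result='5' commentary rows per official match for one player."""
    return conn.execute(
        "SELECT m.id, COUNT(*) "
        "FROM tbl_matches m JOIN tbl_comm c ON m.id = c.matchid "
        "WHERE c.player = ? AND c.result = '5' "
        "AND m.type = '0' AND m.standing = '1' "
        "GROUP BY m.id ORDER BY m.id",
        (player,),  # bound as data, never spliced into the SQL text
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_matches (id INTEGER PRIMARY KEY, type TEXT, standing TEXT)")
conn.execute("CREATE TABLE tbl_comm (matchid INTEGER, player TEXT, result TEXT)")
# Assumed sample data: matches 1 and 2 are official (type '0'), match 3 is not.
conn.executemany(
    "INSERT INTO tbl_matches VALUES (?, ?, ?)",
    [(1, "0", "1"), (2, "0", "1"), (3, "1", "1")],
)
conn.executemany(
    "INSERT INTO tbl_comm VALUES (?, ?, ?)",
    [(1, "p7", "5"), (1, "p7", "5"), (1, "p7", "3"),
     (2, "p7", "5"), (3, "p7", "5"), (2, "p9", "5")],
)
print(tries_per_match(conn, "p7"))
print(tries_per_match(conn, "p7' OR '1'='1"))  # treated as a literal player name
```

The second call shows why this matters: the quote characters stay inside the value, so the string simply matches no player instead of rewriting the WHERE clause.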
I have a high score database for a game that tracks every play in a variety of worlds. What I want to do is find out some statistics on the plays, and then find where each world "ranks" according to each other world (sorted by number of times played).
So far I've got all my statistics working fine, however I've run into a problem finding the ranking of each world.
I'm also pretty sure doing this in three separate queries is probably a very slow way to go about this and could probably be improved.
I have a timestamp column (not used here) and the "world" column indexed in the DB schema. Here's a selection of my source:
function getStast($worldName) {
// ## First find the number of wins and some other data:
$query = "SELECT COUNT(*) AS total,
AVG(score) AS avgScore,
SUM(score) AS totalScore
FROM highscores
WHERE world = '$worldName'
AND victory = 1";
$win = $row['total'];
// ## Then find the number of losses:
$query = "SELECT COUNT(*) AS total
FROM highscores
WHERE world = '$worldName'
AND victory = 0";
$loss = $row['total'];
$total = $win + $loss;
// ## Then find the rank (this is the broken bit):
$query="SELECT world, count(*) AS total
FROM highscores
WHERE total > $total
GROUP BY world
ORDER BY total DESC";
$rank = $row['total']+1;
// ## ... Then output things.
}
I believe the specific line of code that's failing me is in the RANK query,
WHERE total > $total
Is it not working because it can't accept a calculated total as an argument in the WHERE clause?
Finally, is there a more efficient way to calculate all of this in a single SQL query?
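I think you might want to use 'having total > $total'. WHERE is evaluated before the rows are grouped, so it cannot see the aggregate alias total; HAVING filters after GROUP BY: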
SELECT world, count(*) AS total
FROM highscores
GROUP BY world
having total > $total
ORDER BY total DESC
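To turn that result into a rank, one option is to count the rows it returns and add one. A minimal sketch in Python, assuming an in-memory SQLite highscores table as a stand-in for MySQL, made-up play counts, and a hypothetical helper name world_rank. It repeats COUNT(*) in the HAVING clause instead of the alias, just to stay portable.

```python
import sqlite3


def world_rank(conn, world_name):
    """Rank a world by play count: 1 + the number of worlds played more often."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM highscores WHERE world = ?", (world_name,)
    ).fetchone()
    busier = conn.execute(
        "SELECT world, COUNT(*) AS total FROM highscores "
        "GROUP BY world HAVING COUNT(*) > ? ORDER BY total DESC",
        (total,),
    ).fetchall()
    return len(busier) + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE highscores (world TEXT, score INTEGER, victory INTEGER)")
# Assumed sample data: 'a' was played 3 times, 'b' twice, 'c' once.
conn.executemany(
    "INSERT INTO highscores VALUES (?, ?, ?)",
    [("a", 10, 1), ("a", 20, 0), ("a", 30, 1), ("b", 5, 1), ("b", 7, 0), ("c", 1, 0)],
)
for world in ("a", "b", "c"):
    print(world, world_rank(conn, world))
```

Note that the rank comes from the number of rows returned, not from the total column of the first row.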