I have a SELECT query that I expect to return millions of results, and I need to randomize these results in MySQL. Doing it in my script after the query obviously uses too much RAM. Can someone please rework this query so that the results come back in random order without using ORDER BY RAND()? I have seen some examples and tried them, but they don't work for me because they all seem to depend on returning the whole table rather than using a WHERE clause. Here is my query:
SELECT * FROM pool
WHERE gender = 'f'
AND (`location` = 'united states' OR `location` = 'us' OR `location` = 'usa');
If you have 10 million rows with ids, are they in a contiguous range? If they are:
Get the lowest id in the range you want with a quick SELECT.
Get the largest id as well.
Generate random numbers in this range using PHP.
Once you have your numbers, run "SELECT * FROM table1 WHERE id IN (the numbers you generated)" or something like that.
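A minimal sketch of these four steps in Python, with SQLite standing in for MySQL and a small made-up pool table (the data and the function name random_rows are hypothetical). It draws distinct ids with random.sample, so the full id range is never materialised, and it re-applies the WHERE filter in the final query, because ids inside the range may belong to rows that don't match or may not exist at all.

```python
import random
import sqlite3

# SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pool (id INTEGER PRIMARY KEY, gender TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO pool (id, gender, location) VALUES (?, ?, ?)",
    [(i, "f" if i % 2 else "m", "usa" if i % 3 else "uk") for i in range(1, 1001)],
)

FILTER = "gender = 'f' AND location IN ('united states', 'us', 'usa')"


def random_rows(conn, k):
    # Steps 1 and 2: one quick query for the id range of the filtered rows.
    lo, hi = conn.execute(f"SELECT MIN(id), MAX(id) FROM pool WHERE {FILTER}").fetchone()
    if lo is None:
        return []
    # Step 3: draw up to k distinct ids without building the whole range in memory.
    ids = random.sample(range(lo, hi + 1), min(k, hi - lo + 1))
    # Step 4: fetch only those ids, re-applying the filter for ids that don't match.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT * FROM pool WHERE id IN ({placeholders}) AND {FILTER}", ids
    ).fetchall()
    random.shuffle(rows)  # IN (...) does not return rows in the order of the ids
    return rows


sample = random_rows(conn, 50)
print(len(sample), "rows")
```

Where matching ids are sparse, as in this made-up data, fewer than k rows come back; drawing a larger k and trimming the result is one way to compensate.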
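If you can use another language, for example PHP, you can generate the ids there with range() and shuffle() and add them to the query, like this:
$ids = range($minid, $maxid);            // every id between the smallest and largest id
shuffle($ids);                           // randomize the order in PHP
$ids = array_slice($ids, 0, $quantity);  // keep only as many ids as you need
$sql = 'SELECT * FROM pool WHERE id IN (' . implode(',', $ids) . ')';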
Or something similar in any language you are using.
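If every matching row is needed in random order, as the question asks, a variation of the same shuffle idea is to hold only the matching ids in memory, shuffle them, and fetch the full rows in batches. A rough Python sketch, again with SQLite standing in for MySQL; shuffled_rows, batch_size and the sample data are illustrative, not from the answer.

```python
import random
import sqlite3


def shuffled_rows(conn, batch_size=500):
    """Yield every matching row in random order without loading all rows at once."""
    # Only the ids are kept in memory; integers are far smaller than full rows.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM pool "
        "WHERE gender = 'f' AND location IN ('united states', 'us', 'usa')"
    )]
    random.shuffle(ids)  # the same idea as PHP's shuffle($ids)
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT * FROM pool WHERE id IN ({placeholders})", batch
        ).fetchall()
        # IN (...) ignores the order of the list, so restore the shuffled order.
        by_id = {row[0]: row for row in rows}
        for row_id in batch:
            yield by_id[row_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pool (id INTEGER PRIMARY KEY, gender TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO pool VALUES (?, ?, ?)",
    [(i, "f" if i % 2 else "m", "us") for i in range(1, 101)],
)
rows = list(shuffled_rows(conn, batch_size=7))
print(len(rows))
```

Memory then scales with the number of matching ids rather than with the size of the rows.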
If you need to do this in a pure MySQL query, here are some alternatives: http://www.electrictoolbox.com/msyql-alternative-order-by-rand/
I am using SphinxSE to run Sphinx queries from MySQL. In my case, I have to combine two different Sphinx queries with an OR condition.
My question is: how can I accurately calculate the total number of results across both queries?
For example, the two queries look like this:
The first query uses the GEODIST function to calculate the distance between two sets of coordinates.
`query` = 'select=GEODIST(lto, lgo, 0.89, -0.001) AS geodist;query=index=index;floatrange=geodist,0,32186.88;...;maxmatches=1000;'
The second query does not use the GEODIST function, but it has a location filter.
`query` = 'query=#location \"london\"/0.8 ;index=index;...;maxmatches=1000;'
The final query is then:
SELECT *
FROM table_name
WHERE `query` = 'select=GEODIST(lto, lgo, 0.89, -0.001) AS geodist;query=index=index;floatrange=geodist,0,32;...;maxmatches=1000;'
OR `query` = 'query=#location \"London\"/0.8 ;index=index;...;maxmatches=1000;'
Given the data in the table below, I expect this query to return two rows.
id;location;lto;lgo
1;london;null;null
2;city;0.89;-0.001
I have tried SHOW STATUS LIKE 'sphinx_total_found', but the result I get is only 1. Is there any other way to find the total number of results?
I also tried to get the total with COUNT. Unfortunately, that did not work for large data because Sphinx returns only 20 matches by default.
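One constraint worth keeping in mind, assuming A is the set of rows matched by the GEODIST query and B the set matched by the location query: the OR of the two returns the union of the two sets, so adding the two per-query totals only gives the right answer when no row matches both.

```latex
|A \cup B| = |A| + |B| - |A \cap B|
```

With the two sample rows, row 2 would be in A and row 1 in B (assuming row 1's NULL coordinates fall outside the distance range), so the total is 1 + 1 - 0 = 2, matching the two rows expected.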
For example, I have a table:
id;name
1;John
2;Mary
3;Cat
4;Cheng
I want the selection to stop right after 3;Cat and still contain all the rows that exist before 3;Cat.
I think this could be described with a query like this:
SELECT * FROM table WHERE condition ORDER BY id LIMIT name = 'Cat'
but of course there is no such construct as LIMIT name = 'Cat' in SQL.
Maybe something else fits?
Currently I'm using an oversized SELECT, but it has to fetch as many as 1,200 rows to be sure it contains at least one of the records I expect.
This is a not-so-bad answer:
https://stackoverflow.com/a/22232897/1475428
The solution might look like this:
SELECT * FROM mytable WHERE id <= (SELECT MIN(id) FROM mytable WHERE name = 'Cat') ORDER BY id
Here MIN() works backwards from the matching row and acts like a conditional LIMIT.
This looks like an ugly way to do it, though, and I still think there might be a better solution.
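A runnable version of the subquery idea, with the FROM clauses a complete query needs, sketched in Python against an in-memory SQLite copy of the sample table (the table name mytable and the helper rows_up_to are assumptions). It also shows what happens when nothing matches: MIN(id) is NULL, id <= NULL is never true, and the query returns no rows instead of falling back to a fixed LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "John"), (2, "Mary"), (3, "Cat"), (4, "Cheng")],
)


def rows_up_to(conn, name):
    # The subquery finds the first id whose name matches; the outer query keeps
    # every row up to and including it. No match -> MIN(id) is NULL -> no rows.
    return conn.execute(
        "SELECT * FROM mytable "
        "WHERE id <= (SELECT MIN(id) FROM mytable WHERE name = ?) "
        "ORDER BY id",
        (name,),
    ).fetchall()


print(rows_up_to(conn, "Cat"))  # [(1, 'John'), (2, 'Mary'), (3, 'Cat')]
print(rows_up_to(conn, "Dog"))  # []
```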
This is quite awkward to do in a single query. That means you probably should not try to do it in a single query.
Sometimes it's simpler to do a complex task in several steps. It's easier to write, it's easier to debug, it's easier to modify if you need to, and it's easier for future programmers to read your code if they need to take over responsibility.
So first query for the condition, and find out the id of the row you want to stop at:
SELECT MIN(id) FROM mytable WHERE name = 'Cat';
This returns either an id value, or else NULL if there is no row matching the condition.
If that result was not NULL, then use that value to run a simple query:
SELECT * FROM mytable WHERE id <= ? ORDER BY id
Else if the result was NULL, then default to a query with the fixed LIMIT you want:
SELECT * FROM mytable ORDER BY id LIMIT ?
If you have special conditions that aren't supported by simple SQL, then break it up into different queries that are each simple, and use a little bit of application logic to choose which query to run.
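The two-step approach above, sketched in Python with SQLite standing in for MySQL; rows_until and default_limit are hypothetical names, and the table holds the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "John"), (2, "Mary"), (3, "Cat"), (4, "Cheng")],
)


def rows_until(conn, name, default_limit):
    # Step 1: find the id of the row to stop at (None if no row matches).
    (stop_id,) = conn.execute(
        "SELECT MIN(id) FROM mytable WHERE name = ?", (name,)
    ).fetchone()
    # Step 2: application logic picks one of two simple queries.
    if stop_id is not None:
        return conn.execute(
            "SELECT * FROM mytable WHERE id <= ? ORDER BY id", (stop_id,)
        ).fetchall()
    return conn.execute(
        "SELECT * FROM mytable ORDER BY id LIMIT ?", (default_limit,)
    ).fetchall()


print(rows_until(conn, "Cat", 2))  # [(1, 'John'), (2, 'Mary'), (3, 'Cat')]
print(rows_until(conn, "Dog", 2))  # [(1, 'John'), (2, 'Mary')]
```

Each query stays trivial to read and debug; the only branching lives in the application code.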
I am trying to retrieve the maximum value of a column using ActiveRecord, but only after ordering and limiting the rows.
My query is:
max_value = current_user.books.order('created_at DESC').limit(365).maximum(:price)
Yet the resulting query is:
(243.0ms) SELECT MAX(`books`.`price`) AS max_id FROM `books` WHERE `books`.`user_id` = 2 LIMIT 365
The ORDER BY is dropped completely, and because LIMIT applies to the single aggregated result row rather than to the books being aggregated, the maximum is taken over all of the user's books instead of the last 365.
There's a curious line in the Active Record code (active_record/relation/calculations.rb) that removes the ordering. I say curious because it refers specifically to PostgreSQL:
# Postgresql doesn't like ORDER BY when there are no GROUP BY
relation = reorder(nil)
You should be able to use pluck to achieve what you want. It can select a single attribute which can be a reference to an aggregate function:
q = current_user.books.order('created_at DESC').limit(365)
max_value = q.pluck("max(price)").first
pluck returns an array of values, so you need first to get the first (and in this case only) value. If there are no results, it returns nil.
According to the Rails Guides, maximum returns the maximum value of the field across your table, so I suppose Active Record tries to optimize your query and ends up changing the order in which your chained methods are applied.
Could you try first querying the 365 rows you want and then getting the maximum?
max_value = (current_user.books.order('created_at DESC').limit(365)).maximum(:price)
I have found the solution thanks to #RubyOnRails on freenode:
max_value = current_user.books.order('created_at DESC').limit(365).pluck(:price).max
Of course, the drawback is that this fetches all 365 prices and calculates the max locally. But I'll survive.
The best and most efficient way is to use a subquery, something like this:
current_user.books.where(id: current_user.books.order('created_at DESC').limit(365)).maximum(:price)
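To see why the LIMIT has to be applied before the aggregate, here is a small SQLite experiment in Python (made-up books data; LIMIT 3 stands in for LIMIT 365). The first query has roughly the shape Active Record generated; the second limits inside a derived table and only then takes the maximum. One MySQL-specific caveat: MySQL does not support LIMIT inside an IN (...) subquery, which is the shape where(id: ...) produces, so on MySQL a derived table in the FROM clause is the safer form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "price REAL, created_at INTEGER)"
)
# Made-up data: user 2 has five books, and the oldest one is the most expensive.
conn.executemany(
    "INSERT INTO books (user_id, price, created_at) VALUES (?, ?, ?)",
    [(2, 99.0, 1), (2, 10.0, 2), (2, 20.0, 3), (2, 30.0, 4), (2, 15.0, 5)],
)

# LIMIT here limits the one aggregated output row, so MAX sees every book.
naive = conn.execute(
    "SELECT MAX(price) FROM books WHERE user_id = 2 LIMIT 3"
).fetchone()[0]

# Limit first (the newest 3 books) in a derived table, then aggregate.
limited = conn.execute(
    "SELECT MAX(price) FROM ("
    "  SELECT price FROM books WHERE user_id = 2"
    "  ORDER BY created_at DESC LIMIT 3"
    ") AS recent"
).fetchone()[0]

print(naive, limited)  # 99.0 30.0
```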
M2014 is a text field in the DB table.
This statement works correctly (returns count = 368)
SELECT count(*) FROM arealist WHERE M2014 = 'Yes'
However, I'm having problems with this statement (returns count = 0). All I have changed is the CONCAT:
SELECT count(*) FROM arealist WHERE concat('M','2014') = 'Yes'
What could be the cause and solution?
You are comparing two strings in the second SELECT statement. CONCAT('M','2014') joins the string literals 'M' and '2014', so the query compares the string 'M2014' to the string 'Yes', not the value of the M2014 column, and that comparison is false for every row. By contrast, consider a statement like this:
SELECT COUNT(*)
FROM AreaList
WHERE M2014 = CONCAT('Y','es')
That statement would return a count of 368. What are you ultimately trying to do with this statement?
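A quick check of the same point in Python with SQLite, which spells CONCAT as ||; the three-row arealist table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arealist (M2014 TEXT)")
conn.executemany("INSERT INTO arealist VALUES (?)", [("Yes",), ("Yes",), ("No",)])

# Compares the column's value to 'Yes'.
by_column = conn.execute(
    "SELECT COUNT(*) FROM arealist WHERE M2014 = 'Yes'").fetchone()[0]
# Compares the literal string 'M2014' to 'Yes' on every row: always false.
by_literal = conn.execute(
    "SELECT COUNT(*) FROM arealist WHERE 'M' || '2014' = 'Yes'").fetchone()[0]
# Concatenating on the value side is harmless: 'Y' || 'es' is just 'Yes'.
by_concat_value = conn.execute(
    "SELECT COUNT(*) FROM arealist WHERE M2014 = 'Y' || 'es'").fetchone()[0]

print(by_column, by_literal, by_concat_value)  # 2 0 2
```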
You can't generate a dynamic column name for the WHERE clause in a plain MySQL query. There are a number of Stack Overflow posts to that effect. Normally, I would arrange my data so that each row carries some sort of date or timestamp, rather than using date-specific columns (I'm assuming M2014 has something to do with the year 2014). Arranged this way, you can select what you need based on whatever date requirements you have.
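A minimal sketch of that row-per-year layout, assuming a hypothetical area_flags table in place of the M2013, M2014, ... columns (Python with SQLite here, but the SQL is ordinary). Because the year becomes data rather than part of a column name, it can be passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical long-format table: one row per area per year.
conn.execute("CREATE TABLE area_flags (area TEXT, year INTEGER, flag TEXT)")
conn.executemany(
    "INSERT INTO area_flags VALUES (?, ?, ?)",
    [
        ("north", 2013, "Yes"), ("north", 2014, "Yes"),
        ("south", 2014, "No"), ("east", 2014, "Yes"),
    ],
)


def count_yes(conn, year):
    # The year is ordinary data now, so no dynamic SQL is needed.
    return conn.execute(
        "SELECT COUNT(*) FROM area_flags WHERE year = ? AND flag = 'Yes'", (year,)
    ).fetchone()[0]


print(count_yes(conn, 2014))  # 2
```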
That said, if your data model is fixed, then your best bet is probably to use another language (C#, Python, whatever) to build the column names you need dynamically and then send the entire query to MySQL. Alternatively, you could write a series of SQL statements, one for each date column you're interested in.
The following query in google turned up a number of relevant results: https://www.google.com/search?client=opera&q=dynamic+column+names+in+sql&sourceid=opera&ie=utf-8&oe=utf-8&channel=suggest&safe=active#channel=suggest&q=dynamic+column+names+in+mysql+where+clause&safe=active
You can do it in PHP like this:
$year = 2014;
// PHP interpolates $year into the column name, giving M2014
$sql = "SELECT count(*) FROM arealist WHERE M{$year} = 'Yes'";
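Whatever language builds the column name, identifiers cannot be bound parameters, so the name ends up in the SQL text. A hedged Python/SQLite sketch that checks the generated name against the table's actual columns before using it; count_yes_for_year and the "M" + year naming rule are assumptions based on the question's M2014. In MySQL the equivalent check could query information_schema.COLUMNS, and identifiers are quoted with backticks rather than double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arealist (M2013 TEXT, M2014 TEXT)")
conn.executemany(
    "INSERT INTO arealist VALUES (?, ?)",
    [("Yes", "Yes"), ("No", "Yes"), ("No", "No")],
)


def count_yes_for_year(conn, year):
    column = f"M{int(year)}"  # assumed naming rule: 'M' followed by the year
    # Only accept a name that really exists as a column of arealist.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(arealist)")}
    if column not in existing:
        raise ValueError(f"no such column: {column}")
    return conn.execute(
        f'SELECT COUNT(*) FROM arealist WHERE "{column}" = ?', ("Yes",)
    ).fetchone()[0]


print(count_yes_for_year(conn, 2014))  # 2
```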
I have a MySQL table with two fields, pd_code and pd_sort (pd_sort defaults to 0). For each product it is possible to specify an order index in pd_sort: -1000, -900 and so on.
When I print out the products in PHP, I would like to sort them like this:
product1 (pd_sort = -100), product2 (pd_sort = -90), etc., and then the remaining products (where pd_sort = 0) sorted by pd_code.
ORDER BY pd_sort,pd_code works only for 2 products.
Any suggestions?
Chris
If I understand right, you should try something like this:
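-- ORDER BY inside a UNION branch does not order the final result, so sort once at the end
(SELECT *, 1 AS sort_group FROM `table` WHERE pd_sort <> 0)
UNION ALL
(SELECT *, 2 AS sort_group FROM `table` WHERE pd_sort = 0)
ORDER BY sort_group, pd_sort, pd_code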
A union as jab suggested should be fairly efficient, even if it does result in two queries rather than one.
If you don't want to use the union for whatever reason, another approach is to have the SELECT generate a column by manipulating the pd_code and pd_sort values, and sort on that column. You haven't given us sample data to work with (other than a couple of pd_sort values), but in most cases it's possible to manipulate the data so that you end up with a sortable value, usually just with concatenations or numeric expressions. In the most complex cases, you can fall back on CASE expressions.
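A sketch of that single-query alternative, using a CASE expression as the computed sort key (Python with SQLite; the products table and its rows are hypothetical, including one positive pd_sort to show that explicitly ordered products come first whatever their sign).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (pd_code TEXT, pd_sort INTEGER DEFAULT 0)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("c-200", 0), ("a-100", 0), ("x-900", -90),
     ("y-800", -100), ("b-300", 0), ("z-50", 5)],
)

rows = conn.execute(
    "SELECT pd_code, pd_sort FROM products "
    "ORDER BY CASE WHEN pd_sort = 0 THEN 1 ELSE 0 END, "  # explicit positions first
    "pd_sort, pd_code"  # then by index; the pd_sort = 0 group falls back to pd_code
).fetchall()
print([code for code, _ in rows])
# ['y-800', 'x-900', 'z-50', 'a-100', 'b-300', 'c-200']
```

In MySQL the shorter ORDER BY pd_sort = 0, pd_sort, pd_code behaves the same way, because a comparison evaluates to 0 or 1.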