I have used SphinxSE to run queries against MySQL. In my case, I have to combine two different Sphinx queries with an OR condition.
My question is: how can I accurately calculate the total number of results across both individual queries?
For example, the two queries look like this:
The first query uses the GEODIST function to calculate the distance between two sets of coordinates.
`query` = 'select=GEODIST(lto, lgo, 0.89, -0.001) AS geodist;query=index=index;floatrange=geodist,0,32186.88;...;maxmatches=1000;'
The second query does not use the GEODIST function, but it has a location filter.
`query` = 'query=#location \"london\"/0.8 ;index=index;...;maxmatches=1000;'
Then, the final query is:
SELECT *
FROM table_name
WHERE `query` = 'select=GEODIST(lto, lgo, 0.89, -0.001) AS geodist;query=index=index;floatrange=geodist,0,32;...;maxmatches=1000;'
OR `query` = 'query=#location \"London\"/0.8 ;index=index;...;maxmatches=1000;'
Based on this query, if the data looked like the table below, I expected the result to be two rows.
id | location | lto  | lgo
---+----------+------+--------
1  | london   | null | null
2  | city     | 0.89 | -0.001
I have tried SHOW STATUS LIKE 'sphinx_total_found'; but the result I get is only 1. Is there another way to find the total number of results?
I also tried to get the total by using COUNT. Unfortunately, it does not work for large data sets because Sphinx has a default result limit of 20.
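For reference, the COUNT attempt looked roughly like this; it never counts past what Sphinx actually returns:

SELECT COUNT(*)
FROM table_name
WHERE `query` = 'select=GEODIST(lto, lgo, 0.89, -0.001) AS geodist;query=index=index;floatrange=geodist,0,32;...;maxmatches=1000;'
   OR `query` = 'query=#location \"London\"/0.8 ;index=index;...;maxmatches=1000;'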
Related
I want to achieve the two scenarios below in a single query.
(Note: this is just for reference; the actual query is different.)
1. SELECT * FROM CUSTOMER.CUSTOMER LIMIT :startingRow, :rowsCount; -- WITH LIMIT
2. SELECT * FROM CUSTOMER.CUSTOMER; -- NO LIMIT
Is it possible to write a single conditional query for this?
If I pass the starting-row and row-count parameters, it should use the first form; if no parameters are passed, it should return all records from the table.
The MySQL manual gives a tip for this:
https://dev.mysql.com/doc/refman/en/select.html
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
The very large value used in this example is 2^64−1, the greatest value of BIGINT UNSIGNED; your table certainly contains fewer rows than that.
In your case, you could use 0 as the default offset and a very large value like that as the default limit.
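A minimal sketch of that single-query approach with a server-side prepared statement (the @start and @cnt session variables and their COALESCE defaults are my own illustration, not part of the question):

-- default to offset 0 and an effectively unlimited row count
SET @start = COALESCE(@start, 0);
SET @cnt   = COALESCE(@cnt, 18446744073709551615);
PREPARE stmt FROM 'SELECT * FROM CUSTOMER.CUSTOMER LIMIT ?, ?';
EXECUTE stmt USING @start, @cnt;
DEALLOCATE PREPARE stmt;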
Speaking for myself, I would just run two different queries: one with a LIMIT clause, and the other with no LIMIT clause. Use some kind of if/then/else structure in your client code to determine which query to run, based on whether the caller has specified the limit parameters or not.
Let's say I have a black-box query whose inner workings I don't really understand, something along the lines of:
SELECT ... FROM ... JOIN ... (denoted as A)
Let's say A returns 500 rows.
I want to get the count of the rows (500 in this case), but only return 50 of them.
How can I write a query built around A that returns both the number 500 and the 50 rows of data?
You can use window functions (available in MySQL 8.0 and later) and a row-limiting clause:
select a.*, count(*) over() total_rows
from ( < your query >) a
order by ??
limit 50
Note that I added an order by clause to the query. Although it is not technically required, it is a best practice: without an order by on a column (or set of columns) that uniquely identifies each row, it is undefined which 50 rows the database returns, and the results may differ across consecutive executions of the same query.
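For example, if the inner query exposes a unique id column (an assumption here, since the query is a black box), ordering by it makes the 50 returned rows deterministic:

select a.*, count(*) over() total_rows
from ( < your query > ) a
order by a.id
limit 50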
This is what SELECT SQL_CALC_FOUND_ROWS is intended to do.
SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name WHERE id > 100 LIMIT 10;
SELECT FOUND_ROWS();
The first query returns the limited set of rows.
The second query calls FOUND_ROWS(), which returns the number of rows matched by the most recent query, i.e. the number of rows that would have been returned if that query had not used LIMIT.
See https://dev.mysql.com/doc/refman/8.0/en/information-functions.html#function_found-rows
However, keep in mind that using SQL_CALC_FOUND_ROWS incurs a significant performance cost. Benchmarks show that it's usually faster to just run two queries:
SELECT COUNT(*) FROM tbl_name WHERE id > 100; -- the count of matching rows
SELECT * FROM tbl_name WHERE id > 100 LIMIT 10; -- the limited result
See https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
There are a few ways you can do this (assuming I am understanding your question correctly). You can run two queries (and point a cursor to each) and then open and return both cursors, or you can run a stored procedure in which the count query is run first, its result is stored in a variable, and then it is used in another query.
Let me know if you would like an example of either of these
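A rough sketch of the stored-procedure variant, reusing tbl_name and the id > 100 filter from the previous answer (the procedure name is made up):

DELIMITER //
CREATE PROCEDURE count_and_page()
BEGIN
  DECLARE total BIGINT;
  -- run the count query first and store the result in a variable
  SELECT COUNT(*) INTO total FROM tbl_name WHERE id > 100;
  -- return the total alongside the limited result set
  SELECT total AS total_rows;
  SELECT * FROM tbl_name WHERE id > 100 LIMIT 10;
END //
DELIMITER ;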
I have a MySQL table with two columns: ID and count. It has an index on the ID field.
Now, if I have to get the sum of all the count values between two IDs, I can write a query like:
Select SUM(count) from table where id between x and y
or I can run
select count from table where id between x and y
and then loop through the results and calculate the sum of the count values in my application code.
Which one is better, considering that speed is essential here? Would indexing on the count column help in any way? Or can I write a different SQL query?
I have around 10,000 requests per second coming in, and I am using a load balancer and 5 servers for this.
The second one is the correct one. There's no need to sum a count, as a COUNT() comes back as a single value; it only needs to be run once.
Unless you have a column named count, in which case you want to sum all the values...
EDIT
Because you are saying you have a column named Count, you would use the first query:
Select SUM(count) from table where id between x and y
Use approach 1, as you save on fetching data from MySQL and iterating over it.
The time taken by MySQL to execute either of your queries would be nearly the same, but the second approach would require looping through the results and summing them: unnecessary overhead.
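As for the indexing question: an index on count alone would not help this query, but a composite index on (id, count) can let MySQL resolve the SUM from the index alone, without touching the table rows. A sketch using the names from the question (count and table collide with keywords, so adjust to your real schema; x and y are the id bounds):

ALTER TABLE `table` ADD INDEX idx_id_count (id, `count`);

-- the sum can now be computed entirely from the covering index
SELECT SUM(`count`) FROM `table` WHERE id BETWEEN x AND y;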
Before I start my question, I'll briefly cover what the problem is:
I have a table that stores around 4 million 'parameter' values. These values have an id, a simulation id, and a parameter id.
The parameter id maps to a parameter table that basically just maps the id to a textual representation of the parameter (x, y, etc.).
The simulation table has around 170k entries that map parameter values to a job.
There is also a score table which stores the score of each simulation; simulations have varying numbers of scores, for example one might have one score and another might have three. The score table has a simulation_id column for selecting these.
Each job has an id and an objective.
Currently I'm trying to select all the parameter_values whose parameter is 'x' and whose job id is 17, and fetch the score of each. The variables of the select will change, but in principle these are the only things I'm interested in.
Currently I'm using this statement:
SELECT simulation.id, value, name,
       (SELECT GROUP_CONCAT(score)
        FROM score
        WHERE score.simulation_id = simulation.id) AS score
FROM simulation, parameter_value, parameter
WHERE simulation.id = parameter_value.simulation_id
  AND simulation.job_id = 17
  AND parameter_value.parameter_id = parameter.id
  AND parameter.name = "$x1"
This works nicely, except that it takes around 3 seconds to execute. Can this be done any faster?
I don't know if it would be faster to run a query before this, pre-calculating the parameter_ids I'm searching for, and then using WHERE parameter_id IN (1,2,3,4) etc.
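Something like this, I mean (the ids 1, 2, 3, 4 are just placeholders for whatever the first select would return):

-- look up the parameter ids once
SELECT id FROM parameter WHERE name = "$x1";

-- then filter on them directly, skipping the join to parameter
SELECT simulation.id, value,
       (SELECT GROUP_CONCAT(score)
        FROM score
        WHERE score.simulation_id = simulation.id) AS score
FROM simulation
JOIN parameter_value ON parameter_value.simulation_id = simulation.id
WHERE simulation.job_id = 17
  AND parameter_value.parameter_id IN (1, 2, 3, 4)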
But I was under the impression SQL would optimize this anyway?
I have created indexes wherever possible but can't get below the 2.7-second mark.
So my questions would be:
Should I pre-calculate some values and avoid the joins?
Is there an alternative to GROUP_CONCAT for getting the scores?
And are there any other optimizations I could make?
I should also add that the scores must be in the same row, or at least returned sorted, so I can easily read them from the result set.
Thanks,
Lewis
I have a SELECT query that I expect millions of results from. I need to randomize these results in MySQL; doing it in my script after the query obviously uses too much RAM. Can someone please rework this query so that the results are random, without using ORDER BY RAND()? I have seen some examples and tried to use them, but they don't work for me since they all seem to depend on returning the whole table rather than using a WHERE clause. Here is my query:
SELECT * FROM pool
WHERE gender = 'f'
AND (`location` = 'united states' OR `location` = 'us' OR `location` = 'usa');
If you have 10 million rows with ids, are they in a contiguous range?
Get the lowest id in the range you want via a quick select.
Get the largest id as well.
Generate random numbers in this range using PHP.
Once you have your numbers, run SELECT * FROM table1 WHERE id IN (the numbers you generated), or something like that.
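Applied to the pool table from the question, the whole thing would look roughly like this (the numbers in the IN list stand for whatever ids you generate; gaps in the id range mean some of them may not match a row):

-- step 1: find the id range matching the filter (fast if id is indexed)
SELECT MIN(id) AS min_id, MAX(id) AS max_id
FROM pool
WHERE gender = 'f'
  AND (`location` = 'united states' OR `location` = 'us' OR `location` = 'usa');

-- step 2: generate random numbers between min_id and max_id in the application,
-- then fetch just those rows
SELECT * FROM pool
WHERE gender = 'f'
  AND (`location` = 'united states' OR `location` = 'us' OR `location` = 'usa')
  AND id IN (7, 19, 4083);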
If you can use another language, for example PHP, you can use its randomization functions to generate ids and add them to the query, like:
$ids = range($minid, $maxid);            // $minid/$maxid from the quick selects above
shuffle($ids);                           // randomize the order
$ids = array_slice($ids, 0, $quantity);  // keep only as many as you need
Or something similar in whatever language you are using.
If you need to do this in a pure MySQL query, then here are some alternatives: http://www.electrictoolbox.com/msyql-alternative-order-by-rand/