My query goes from 15 seconds to 0.05 seconds when I remove the ORDER BY in the following query:
simplified version:
SELECT field1, field2, field3,
FUNC1(1, 2) AS score1,
FUNC2(1, 2) AS score2,
FUNC3(1, 2) AS score3,
FUNC4(1, 2) AS score4
FROM table
WHERE field1 = 1
ORDER BY (score1 * 1 + score2 * 2 + score3 * 2 + score4 * 4) DESC;
I have a couple of stored functions that calculate sub-scores. However, I have to order the result by the total score. In the ORDER BY I multiply the sub-scores by weights (such as * 2) to influence the total score.
I use MySQL 5.6.13
Does anybody have an idea how I can keep the ORDER BY without slowing the query down?
For example, is it possible to store the score# fields and sum them up?
Thanks!
The difference in time arises because MySQL needs to create a sorted temporary table and fill it with data. Since your query runs much slower with ORDER BY, the problem might be the disk where the temporary data is stored. You haven't mentioned how many rows this query returns. You may also try manually performing the steps that MySQL is probably doing: create a temp table with primary key (order_by_result int, n int auto_increment) and insert your select results into it:
Insert into t(order_by_result, n, ...)
select (score1 * 1 + score2 * 2 + score3 * 2 + score4 * 4),null,...
and check how fast it runs. This also tells you whether the problem lies in your storage.
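As a rough sketch of that manual test, assuming the source table is called my_table (a hypothetical name, since the question only shows a placeholder) and that the scores fit in an INT, the temporary table and insert might look like the following. The column list, the column types, and the extra KEY (n) are assumptions for illustration; the extra key is there because InnoDB expects an AUTO_INCREMENT column to be the first column of some index.

```sql
-- Hypothetical table name and column types; FUNC1..FUNC4 are the stored functions from the question.
CREATE TEMPORARY TABLE t (
    order_by_result INT NOT NULL,              -- use DECIMAL or DOUBLE if the scores are fractional
    n               INT NOT NULL AUTO_INCREMENT,
    field1          INT,
    PRIMARY KEY (order_by_result, n),
    KEY (n)                                    -- lets the AUTO_INCREMENT column lead an index
);

-- Passing NULL for n lets MySQL assign the next AUTO_INCREMENT value.
INSERT INTO t (order_by_result, n, field1)
SELECT FUNC1(1, 2) * 1 + FUNC2(1, 2) * 2 + FUNC3(1, 2) * 2 + FUNC4(1, 2) * 4,
       NULL,
       field1
FROM my_table
WHERE field1 = 1;

-- Read the rows back in score order to compare against the original query.
SELECT field1, order_by_result
FROM t
ORDER BY order_by_result DESC;
```

Timing the INSERT and the final SELECT separately should give a hint as to whether the cost sits in evaluating the functions or in writing and sorting the temporary data.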
You could add a total_score column to the table, and define a trigger to update it automatically whenever a row is added or updated. Then index the column, and ORDER BY total_score should be fast.
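A minimal sketch of that idea, assuming a table called my_table (hypothetical name), numeric scores, and functions whose results depend only on the row being written. The constant arguments (1, 2) are copied from the simplified question; in a real schema they would presumably be NEW.column references. The trigger and index names are made up for illustration.

```sql
-- Hypothetical names throughout; FUNC1..FUNC4 are the stored functions from the question.
ALTER TABLE my_table
    ADD COLUMN total_score DOUBLE NOT NULL DEFAULT 0;

-- Backfill the rows that already exist.
UPDATE my_table
SET total_score = FUNC1(1, 2) * 1 + FUNC2(1, 2) * 2 + FUNC3(1, 2) * 2 + FUNC4(1, 2) * 4;

-- Keep the column current on every insert and update.
CREATE TRIGGER my_table_before_insert BEFORE INSERT ON my_table
FOR EACH ROW
SET NEW.total_score = FUNC1(1, 2) * 1 + FUNC2(1, 2) * 2 + FUNC3(1, 2) * 2 + FUNC4(1, 2) * 4;

CREATE TRIGGER my_table_before_update BEFORE UPDATE ON my_table
FOR EACH ROW
SET NEW.total_score = FUNC1(1, 2) * 1 + FUNC2(1, 2) * 2 + FUNC3(1, 2) * 2 + FUNC4(1, 2) * 4;

-- A composite index covers both the filter and the sort.
CREATE INDEX idx_field1_total_score ON my_table (field1, total_score);

SELECT field1, field2, field3, total_score
FROM my_table
WHERE field1 = 1
ORDER BY total_score DESC;
```

If the functions read other tables or take query-time parameters, the stored value can go stale, so this only fits the case where the score is a pure function of the row.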
I think the best solution is to precalculate the values of the functions and store them in the database. This way, the problem of calculating the values of the function on the fly will be transformed into a very simple ordering query.
As lowleveldesing has said, this type of query forces MySQL to calculate the weighted sum (score1 * 1 + score2 * 2 + score3 * 2 + score4 * 4) for every row before giving you any output.
Related
I have two tables (ProductionDetails, Production). I need to populate Production based on calculated values from ProductionDetails. ProductionDetails has the number of buckets gathered; Production needs to be populated with BINS. For example, if I have 100 buckets and I divide those by 30, that equals 3.33 bins. Now I need to create 3 BINS.
select Round( sum(ep.Buckets)/30),ep.lot,ep.crewid
from On_EmpProdDetails ep
group by ep.buckets, ep.lot,ep.crewid
Results in
Bins Lot Crew
3 556790186SOCC1 SOCC1
Now I want to insert 3 unique rows into the Production Table
BinID is auto increment
BinID LOT
1 556790186SOCC1
2 556790186SOCC1
3 556790186SOCC1
You can use a recursive CTE to generate the rows:
insert into production (lot)
with recursive cte as (
select Round( sum(ep.Buckets)/30) as num_bins, ep.lot, ep.crewid, 1 as bin
from On_EmpProdDetails ep
group by ep.buckets, ep.lot,ep.crewid
union all
select num_bins, lot, crewid, bin + 1
from cte
where bin < num_bins
)
select lot
from cte;
Here is a db<>fiddle.
E.g.
The first number is: 429
The second number is: 529
So I want to write a MySQL query that returns exactly either 429 or 529.
I searched Google for this, but it only shows results for a random number within a range.
Any help will be appreciated.
EDIT
My real requirement is this:
INSERT INTO table1(table2_id, status, stage, added_by)
(SELECT id, 'Pending', 'Semifinal', RAND(SELECT 429 UNION SELECT 529) FROM table2)
SELECT * FROM (SELECT 429 UNION SELECT 529) AS tmp ORDER BY RAND() LIMIT 1
Steps:
Select 429 and 529
Apply random order
Return first result
The function is the following (without UNION and ORDER, only math and only one step):
(ROUND(RAND()) * 100) + 429
or
(FLOOR(0 + (RAND() * 2)) * 100) + 429
Refer to MySQL docs
APPENDED
To give a general answer to the question (to select one random integer from any two integers :x and :y):
(FLOOR(0 + (RAND() * 2)) * (:y - :x)) + :x
This approach does not create an in-memory table, sort its rows, or fetch a random row from it.
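Applied to the INSERT from the question, the general formula might be used like this. This is only a sketch that reuses the table and column names shown in the question; RAND() called without a seed is evaluated once per row, so each inserted row gets its own pick.

```sql
-- Each row from table2 gets added_by = 429 or 529, chosen independently.
INSERT INTO table1 (table2_id, status, stage, added_by)
SELECT id,
       'Pending',
       'Semifinal',
       (FLOOR(RAND() * 2) * (529 - 429)) + 429   -- FLOOR(RAND() * 2) is 0 or 1
FROM table2;
```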
Is this data placed within a table? Then something like this might work:
SELECT number FROM table ORDER BY rand() LIMIT 1
I'm trying to optimize an algorithmic query by using a view. I created the view like this:
Simplified version:
CREATE ALGORITHM = MERGE VIEW view_name AS (SELECT field1, field2, field3,
FUNC1(1, 2) AS score1,
FUNC2(1, 2) AS score2,
FUNC3(1, 2) AS score3,
FUNC4(1, 2) AS score4
FROM table
WHERE field1 = 1
); # only 0.003 seconds - approximate 2000 records
But when I try to do a SELECT query like:
SELECT * FROM view_name
ORDER BY (score1 * 1 + score2 * 2 + score3 * 2 + score4 * 4) DESC
LIMIT 0,20; # 9 to 11 seconds
The query takes about 10 seconds to come up with the result.
I already tried a couple of other solutions, but nothing has succeeded, see:
MySQL query slow because of ORDER BY with Stored Functions
If anyone has a clue, suggestion or an answer that would be awesome!
Thanks!
For the last two days, I have been asking questions about rank queries in MySQL. So far, I have working queries to:
query all the rows from a table and order by their rank.
query ONLY one row with its rank
Here is a link for my question from last night
How to get a row rank?
As you might notice, btilly's query is pretty fast.
Here is a query for getting ONLY one row with its rank that I made based on btilly's query.
set @points = -1;
set @num = 0;
select * from (
SELECT id
, points
, @num := if(@points = points, @num, @num + 1) as point_rank
, @points := points as dummy
FROM points
ORDER BY points desc, id asc
) as test where test.id = 3
The above query uses a subquery, so I am worried about performance.
Are there any faster queries that I can use?
Table points
id points
1 50
2 50
3 40
4 30
5 30
6 20
Don't get into a panic about subqueries. Subqueries aren't always slow - only in some situations. The problem with your query is that it requires a full scan.
Here's an alternative that should be faster:
SELECT COUNT(DISTINCT points) + 1
FROM points
WHERE points > (SELECT points FROM points WHERE id = 3)
Add an index on id (I'm guessing that you probably want a primary key here) and another index on points to make this query perform efficiently.
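The two indexes could be created along these lines; the index name idx_points_points is made up for illustration, and the first statement assumes id is not already the primary key.

```sql
-- Skip this statement if id is already the primary key.
ALTER TABLE points ADD PRIMARY KEY (id);

-- Hypothetical index name; supports the range condition on points.
CREATE INDEX idx_points_points ON points (points);

-- Rank of the row with id = 3: count the distinct higher scores and add one.
SELECT COUNT(DISTINCT points) + 1 AS point_rank
FROM points
WHERE points > (SELECT points FROM points WHERE id = 3);
-- With the sample table above, only the score 50 is higher than 40, so point_rank is 2.
```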
*Hey everyone, I am working on a query and am unsure how to make it process as quickly as possible and with as little redundancy as possible. I am really hoping someone there can help me come up with a good way of doing this.
Thanks in advance for the help!*
Okay, so here is what I have as best I can explain it. I have simplified the tables and math to just get across what I am trying to understand.
Basically I have a smallish table that never changes and will always only have 50k records like this:
Values_Table
ID Value1 Value2
1 2 7
2 2 7.2
3 3 7.5
4 33 10
….50000 44 17.2
And a couple tables that constantly change and are rather large, eg a potential of up to 5 million records:
Flags_Table
Index Flag1 Type
1 0 0
2 0 1
3 1 0
4 1 1
….5,000,000 1 1
Users_Table
Index Name ASSOCIATED_ID
1 John 1
2 John 1
3 Paul 3
4 Paul 3
….5,000,000 Richard 2
I need to tie all 3 tables together. The most results that are likely to ever be returned from the small table is somewhere in the neighborhood of 100 results. The large tables are joined on the index and these are then joined to the Values_Table ON Values_Table.ID = Users_Table.ASSOCIATED_ID …. That part is easy enough.
Where it gets tricky for me is that I need to return, as quickly as possible, a list limited to 10 results where Value1 and Value2 are mathematically operated on to return a new_value, where that new_value is less than 10, the result is sorted by that new_value, and any other WHERE conditions I need can be applied to the flags. I also need to be able to page through the results with the limit, e.g. LIMIT 0,10 / 10,10 / 20,10 etc.
In a subsequent (or the same if possible) query I need to get the top 10 count of all types that matched that criteria before the limit was applied.
So for example I want to join all of these and return anything where Value1 + Value2 < 10 AND I also need the count.
So what I want is:
Index Name Flag1 New_Value
1 John 0 9
2 John 0 9
5000000 Richard 1 9.2
The second response would be:
ID (not index) Count
1 2
2 1
I tried this a few ways and ultimately came up with the following somewhat ugly query:
SELECT INDEX, NAME, Flag1, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10
ORDER BY New_Value
LIMIT 0,10
And then for the count:
SELECT ID, COUNT(TYPE) as Count, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10
GROUP BY TYPE
ORDER BY New_Value
LIMIT 0,10
Being able to filter on the different flags in my WHERE clause is important. That may sound obvious, but I mention it because, from what I could see, a quicker method would have been to use a HAVING clause, and I don't believe that will work in certain instances, depending on what I want my WHERE clause to filter against.
And when filtering using the flags table :
SELECT INDEX, NAME, Flag1, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10 AND Flag1 = 0
ORDER BY New_Value
LIMIT 0,10
...filtered count:
SELECT ID, COUNT(TYPE) as Count, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value2) < 10 AND Flag1 = 0
GROUP BY TYPE
ORDER BY New_Value
LIMIT 0,10
That works fine but has to run the math multiple times for each row, and I get the nagging feeling that it is also running the math multiple times on the same row in the Values_table table. My thought was that I should just get only the valid responses from the Values_table first and then join those to the other tables to cut down on the processing; with how SQL optimizes things though I wasn't sure if it might not already be doing that. I know I could use a HAVING clause to only run the math once if I did it that way but I am uncertain how I would then best join things.
My questions are:
Can I avoid running that math twice and still make the query work (or, I suppose, if there is a good way to make the first one work as well, that would be great)?
What is the fastest way to do this, as this is something that will be running very often?
It seems like this should be painfully simple but I am just missing something stupid.
I contemplated pulling into a temp table then joining that table to itself but that seems like I would trade math for iterations against the table and still end up slow.
Thank you all for your help in this and please let me know if I need to clarify anything here!
** To clarify on a question: I can't use a third column with the values pre-calculated because in reality the math is much more complex than addition; I just simplified it for illustration's sake.
Do you have a benchmark query to compare against? Usually it doesn't work to try to outsmart the optimizer. If you have acceptable performance from a starting query, then you can see where extra work is being expended (indicated by disk reads, cache consumption, etc.) and focus on that.
Avoid the temptation to break it into pieces and solve those. That's an antipattern. That includes temp tables especially.
Redundant math is usually ok - what hurts is disk activity. I've never seen a query that needed CPU work reduction on pure calculations.
Gather your results and put them in a temp table:
-- The WHERE clause repeats the expression because a column alias cannot be referenced there.
CREATE TEMPORARY TABLE TempTable AS
SELECT Users_Table.`Index`, Name, Type, ID, Flag1, (Value1 + Value2) AS New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.`Index` = Users_Table.`Index`
WHERE (Value1 + Value2) < 10;
Return the result for the first query (the LIMIT is applied here so that the count below still sees every matching row):
SELECT `Index`, Name, Flag1, New_Value
FROM TempTable
ORDER BY New_Value
LIMIT 0,10;
Return the results for the count of types:
SELECT ID, COUNT(Type)
FROM TempTable
GROUP BY Type;
Is there any chance that you can add a third column to the values_table with the pre-calculated value? Even if the result of your calculation is dependent on other variables, you could run the calculation for the whole table but only when those variables change.
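A sketch of what that could look like, under the assumption that the calculation depends only on Values_Table columns plus a variable that changes rarely. The user variable @some_variable, its value, and the index name are placeholders, and the simplified formula from the question stands in for the real math.

```sql
-- Hypothetical column and index names; the formula is the simplified one from the question.
ALTER TABLE Values_Table
    ADD COLUMN New_Value DOUBLE,
    ADD INDEX idx_values_new_value (New_Value);

-- Re-run this UPDATE only when the variable changes (50k rows, so it should be cheap).
SET @some_variable = 2;   -- placeholder value
UPDATE Values_Table
SET New_Value = Value1 * @some_variable + Value2;

-- The paged query then filters and sorts on the stored column.
SELECT u.`Index`, u.Name, f.Flag1, v.New_Value
FROM Values_Table AS v
JOIN Users_Table AS u ON u.ASSOCIATED_ID = v.ID
JOIN Flags_Table AS f ON f.`Index` = u.`Index`
WHERE v.New_Value < 10
  AND f.Flag1 = 0
ORDER BY v.New_Value
LIMIT 0, 10;
```

Because the small table is filtered through an indexed column first, the expression is no longer evaluated per joined row, which is the redundancy the question was worried about.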