Sorry about the poorly worded question, but I don't know how else to explain this...
MySQL: I have a query with several extremely complex subqueries in it. I am selecting from a table, and I need to find out what "place" each record is in according to a variety of criteria. So I have this:
Select record.id, record.title,
    (select count(*) from (complex-query-that-returns-newer-records)) as agePlace,
    (select count(*) from (complex-query-that-returns-records-with-better-ROI)) as ROIPlace...
From record...
Now the issue is that the query is slow, as I had expected given the amount of crunching required. But I realized that there are situations where the results of 2 subqueries will be the same, and there is no need for me to run the subquery twice (or have it in my code twice). So I would like to wrap one of the subqueries in an IF statement: if the criteria are met, use the value from another column that already calculated that data; else, run the subquery as normal.
I have tried just putting in the other subquery's alias, but it says unknown column totalSales, because the alias is a field of the query, not a column of one of the tables.
Is there any way around this?
UPDATE: I have reposted this as a query refactoring question - thanks for the suggestions: How to refactor select subqueries into joins?
There really isn't a way around this. The SQL engine compiles and runs the whole query, not just part of it, and at compile time it does not know that two subqueries will return the same results.
More likely, you can move the subqueries to the from clause and find optimizations there.
If that is of interest, you should write another question with the actual queries you are using. That is a different question from this one ("how to rephrase this query" rather than "how can I conditionally make this run").
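For illustration, here is the general shape of that refactoring: each per-row scalar subquery becomes a derived table in the FROM clause, computed once and joined back. This is only a sketch; the created_at column is a made-up stand-in for whatever "newer" means in the actual query, which was not posted.

SELECT r.id,
       r.title,
       COALESCE(n.newer_cnt, 0) + 1 AS agePlace  -- place 1 = newest record
FROM record r
LEFT JOIN (
    -- count, for each record, how many records are newer than it
    SELECT older.id, COUNT(*) AS newer_cnt
    FROM record older
    JOIN record newer ON newer.created_at > older.created_at
    GROUP BY older.id
) n ON n.id = r.id;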
I have a large database in which I use LIMIT so as not to fetch all the results of the query every time (it is not necessary). But I have an issue: I need to count the number of results. The dumbest solution is the following, and it works:
We just get the data that we need:
SELECT * FROM table_name WHERE param > 3 LIMIT 10
And then we find the length:
SELECT COUNT(1) FROM table_name WHERE param > 3 LIMIT 10
But this solution bugs me because, unlike the query shown here, the one I actually work with is complex, and I would basically have to run it twice to achieve the result.
Another dumb solution for me was to do:
SELECT COUNT(1), param, anotherparam, additionalparam FROM table_name WHERE param > 3 LIMIT 10
But this results in only one row. At this point I would be OK if it just filled a count column with the same number on every row; I need this information without wasting computation time.
Is there a better way to achieve this?
P.S. By the way, I am not looking to get 10 as the result of COUNT; I need the length without the LIMIT.
You should (probably) run the query twice.
MySQL does have a FOUND_ROWS() function that reports the number of rows matched before the limit. But using this function is often worse for performance than running the query twice!
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
...when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.
There are exceptions to every rule, of course. If you don't have an appropriate index to optimize the query, it could be more costly to run the query twice. The only way to be sure is to repeat the tests shown in that blog, using your data and your query on your server.
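For concreteness, here are the two approaches side by side, using the query from the question (note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17):

-- One query plus FOUND_ROWS(): forces MySQL to find all matching rows
SELECT SQL_CALC_FOUND_ROWS * FROM table_name WHERE param > 3 LIMIT 10;
SELECT FOUND_ROWS();

-- Two separate queries: often faster when an index covers the WHERE clause
SELECT * FROM table_name WHERE param > 3 LIMIT 10;
SELECT COUNT(*) FROM table_name WHERE param > 3;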
This question is very similar to: How can I count the numbers of rows that a MySQL query returned?
See also: https://mariadb.com/kb/en/found_rows/
This is probably the most efficient solution to your problem, but it's best to test it using EXPLAIN with a reasonably sized dataset.
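If you are on MySQL 8.0+, there is also the window-function route, which is essentially what the single-query attempt in the question was reaching for: it attaches the pre-LIMIT total to every returned row in one pass. Whether it beats two separate queries depends on your indexes, so test it:

SELECT param, anotherparam, additionalparam,
       COUNT(*) OVER () AS total_rows  -- total matching rows, ignoring LIMIT
FROM table_name
WHERE param > 3
LIMIT 10;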
I am currently trying to optimize a GROUP BY query because for some reason it is taking forever. What's odd is that when I run a GROUP BY query on another column that has the same number of characters, MySQL can do it with ease. So I have a feeling it has something to do with the data itself. This is my first question. If anyone has any suggestions on how to debug this, that would be awesome.
Assuming it's just an optimization problem, I found this post, which recommends creating an index. I am confused about how this flow would work in my terminal.
Suppose the query I am having trouble with is
SELECT user_id, count(uid) FROM table GROUP BY user_id;
Given his advice, would I just run the previous query and then the following one:
CREATE INDEX ix_temp ON table (uid);
Or would they be the same query? What is the exact flow here? Is there a step I am missing?
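For what it's worth, the usual flow looks like this (a sketch; ix_user_id is just a name picked for illustration, and note that for a GROUP BY on user_id it is an index on user_id, not uid, that can help):

-- One-time DDL; building the index can itself take a while on a big table
CREATE INDEX ix_user_id ON `table` (user_id);

-- Then rerun the original query; the optimizer can now use the index
SELECT user_id, COUNT(uid) FROM `table` GROUP BY user_id;

-- Verify with EXPLAIN that the index is actually being used
EXPLAIN SELECT user_id, COUNT(uid) FROM `table` GROUP BY user_id;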
I have a query as the following:
SELECT SUM(`weight`) as totalgrams,
SUM(`weight`)/1000 as totalkilograms
FROM `item`
which requires the result of the first column's SUM; but since I can't reference totalgrams, I have to repeat the SUM function in the second column's calculation.
The query plan from EXPLAIN is not shown here.
Now, with the second query:
SELECT totalgrams, totalgrams/1000 as totalkilograms
FROM (SELECT SUM(`weight`) as totalgrams
FROM `item`) prequery
I don't need to repeat the SUM but I ended up with a nested query.
The query plan from EXPLAIN is not shown here.
At a glance, it seems better to go with the first query, as it only has one entry in the execution plan; but was SUM calculated twice there (which is redundant and does not scale)?
Or does the system already have an optimization for this and calculates it only once, so the first query is indeed better?
Right now there are only a few rows in the table, so perhaps the difference is not significant in real milliseconds.
But if the table later becomes huge, which query would actually be better?
And does the answer apply to all DBMSs?
This is purely for understanding the SQL workflow; any insight is appreciated.
MySQL materializes subqueries in the FROM clause -- the so-called derived table. In this case, the summary has one row and one column, so it is really no big deal.
Including the SUM() twice in the SELECT does not have this overhead. It is unclear from the EXPLAIN output whether SUM() is calculated once or twice. Probably twice, but there could be an optimization step that eliminates that processing. In any case, SUM() is really cheap. The expensive part is arranging the aggregation, and all the aggregation functions are processed together.
You say this is purely for understanding the workflow, so I'll start my answer by saying that MySQL does have means of optimizing these sorts of operations and will do so, but it isn't perfect and you shouldn't depend on it. [PICKY] The example is not the best, as a SUM operation is trivial anyhow. [/PICKY]
I would say your first solution is the better one, but better still would be to remove the need for the calculation at all. Most of the time when a calculated column is used, it's simpler to code the calculation in the application that receives the result; i.e., if this is called from PHP, let PHP calculate total kilograms instead of MySQL. It's a one-time calculation based on a single return value, and it doesn't matter whether MySQL optimizes it or not. As I said earlier, SUM is inexpensive, so for this particular example it isn't relevant; but if the operation were something more expensive, it would be a factor, and as a general policy we should not assume the triviality of the operation.
If the outside language is an issue, another possibility would be to create an intermediate table and then update that table with the result, as sketched below. In this case (a single row) the overhead makes this less desirable, but if there were many rows in the result table (such as with a GROUP BY), or if you want a general policy, the overhead becomes a non-issue.
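A minimal sketch of that intermediate-table approach (item_totals is a made-up name):

-- Materialize the expensive aggregate once...
CREATE TEMPORARY TABLE item_totals AS
SELECT SUM(`weight`) AS totalgrams
FROM `item`;

-- ...then derive any dependent values from the stored result
SELECT totalgrams, totalgrams / 1000 AS totalkilograms
FROM item_totals;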
I used to write a SELECT clause inside the SELECT clause to avoid joins in the FROM clause. But I am not sure whether that is good coding practice, or whether it will degrade database performance. Below is a query which touches multiple tables, but which I have written using nested SELECT clauses without any JOIN statement. Please let me know if I am making a mistake or if it is OK. At this moment, I am getting accurate results.
SELECT *,
    (SELECT POrderNo FROM PurchaseOrderMST POM
     WHERE POM.POrderID = CET.POrderID) AS POrderNo,
    (SELECT SiteName FROM SiteTRS ST
     WHERE ST.SiteID = CET.SiteID) AS SiteName,
    (SELECT ParticularName FROM ParticularMST PM
     WHERE PM.ParticularID = CET.ParticularID) AS ParticularName
FROM ClaimExpenseTRS CET
WHERE ClaimID = #ClaimID
I'd use joins for this because it is best practice to do so and will be better for the query optimizer.
But for learning, just try to execute the script with a join and without, and see what happens to the query plan and the execution time. Usually this answers your questions right away.
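For reference, a JOIN version of the same query might look like this (a sketch; LEFT JOINs preserve the scalar subqueries' behavior of returning NULL when there is no match, assuming at most one matching row per key, which the original subqueries require anyway):

SELECT CET.*,
       POM.POrderNo,
       ST.SiteName,
       PM.ParticularName
FROM ClaimExpenseTRS CET
LEFT JOIN PurchaseOrderMST POM ON POM.POrderID = CET.POrderID
LEFT JOIN SiteTRS ST ON ST.SiteID = CET.SiteID
LEFT JOIN ParticularMST PM ON PM.ParticularID = CET.ParticularID
WHERE CET.ClaimID = #ClaimID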
Your solution is just fine.
As long as you are only using one column from each "joined" table, and there are no multiple matching rows, it is fine. In some cases it is even better than joining.
(The DB engine can change the direction of a join at any time if you are not using tricks to force a given direction, which can cause performance surprises. This is called query optimization, but if you really know your database, you should be the one to decide how the query should run.)
I think you should JOIN indeed.
Right now you are creating your own JOIN with WHERE and SELECT statements.
I'm trying to figure out which is the more efficient way to get the nth highest record in a MySQL database:
SELECT *
FROM table_name
ORDER BY column_name DESC
LIMIT n - 1, 1
or
SELECT *
FROM table_name AS a
WHERE n - 1 = (
    SELECT COUNT(primary_key_column)
    FROM table_name AS b
    WHERE b.column_name > a.column_name)
There is an index on column_name.
I would think MySQL would perform the LIMIT clause efficiently, and the first option is the way to go.
I'm not too clear on what the 2nd query does exactly, so if that one is more efficient, can someone explain why?
Thanks.
I tried EXPLAIN on both those queries on a database of mine (note: the optimizer may choose different plans for your schema/data) and it definitely looks like the first one wins in every regard: it's simpler to read and understand, and will most likely be faster.
As aaronls said, and EXPLAIN confirms, the second query has a correlated subquery which will require an extra iteration through the entire table for each row.
Since the first one is way easier to read, I'd choose it in a shot. If you do find that it's a bottleneck (after profiling your application), you could give the second a try but I don't see how it could possibly be faster.
I think with the second query it's going to run an inner loop, executing the subquery against each row in table_name. If that is the case, it means something like O(n^2) runtime.
Based on that I would personally go with the first query, but if it were that important to me, I would do some performance testing. Make sure you test against very large data sets as well, to get a good idea of how the performance scales. Something that runs in O(n) can be faster for very small datasets (thanks to constant factors), but something that runs in O(log n) is much better for large data sets.
Run explain on both queries and see which one MySQL thinks is more complicated.
This isn't really an answer but...
Go with the first query assuming your loads aren't super heavy, just because it works and is simple. You can always go back later and change if really necessary.
I would suggest (though I'm not sure about the exact SQL syntax myself) that you compute an additional RANK column on a simple query that orders elements as desired (DESC). Then just select the row where RANK = n.
You can probably do this with a variable that gets incremented I guess. It's basically something that says how many rows come before this row, so it should be very easy to compute.
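A sketch of both ways to compute that rank column, reusing the table and column names from the question (n = 5 is just an example value):

-- MySQL 8.0+: a window function is the reliable way
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY column_name DESC) AS rnk
    FROM table_name t
) ranked
WHERE rnk = 5;

-- Older MySQL: a user-variable counter; note that relying on the order of
-- variable assignment during ORDER BY is fragile and deprecated in MySQL 8
SELECT *
FROM (
    SELECT t.*, (@rn := @rn + 1) AS rnk
    FROM table_name t, (SELECT @rn := 0) init
    ORDER BY column_name DESC
) ranked
WHERE rnk = 5;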
If you're really concerned about efficiency, maybe you should look into implementing a selection algorithm in SQL.
For the 2nd highest record:

SELECT MAX(Price) AS price
FROM OrderDetails
WHERE Price < (SELECT MAX(Price) FROM OrderDetails)

For the nth highest record:

SELECT *
FROM OrderDetails AS a
WHERE n - 1 = (
    SELECT COUNT(OrderNo)
    FROM OrderDetails b
    WHERE b.Price > a.Price)