MySQL: how can I speed up a UDF in my query?

I have one table that has userID, department, er. I've created one simple query to gather all this information.
SELECT table.userID, table.department, table.er FROM table;
Now, I want to group all er's that belong to the same department and perform this calculation
select sum(table.er)/3 as department_er from table group by table.department;
Then add this result as a new column in my first query. To do this I've created a UDF that looks like this
CREATE FUNCTION dptER(dpt VARCHAR(64)) RETURNS FLOAT  -- header added for completeness; parameter type is assumed
BEGIN
  DECLARE department_er FLOAT;
  SET department_er = (SELECT SUM(er) FROM table WHERE table.department = dpt);
  RETURN department_er;
END
Then I used that UDF in this query
SELECT table.userID, table.department, (select dptER(table.department)/3) as department_er FROM table
I've indexed my tables, and more complex queries dropped from 4+ minutes to less than 1 second. This one seems pretty simple, but it has been running for over 10 minutes. Is there a better way to do this, or a way to optimize my UDF?
Forgive my n00b-ness :)

Try a query without a dependent aggregated subquery in the SELECT clause:
SELECT table.userID,
       table.department AS dpt,
       x.department_er
FROM table
JOIN (
    SELECT department,
           SUM(table.er) / 3 AS department_er
    FROM table
    GROUP BY department
) x ON x.department = table.department
This UDF cannot be meaningfully optimized. It may seem to work in simple queries, but in general it can hurt your database's performance.
Imagine that we have a query like this one:
SELECT ....., UDF( some parameters )
FROM table
....
MySQL must call this function for each record retrieved from the table in this query.
If the table contains 1000 records, the function is fired 1000 times, and the query inside the function is also fired 1000 times.
If there are 10,000 records, the function is called 10,000 times.
Even if you optimize the function so that it runs twice as fast, the above query will still fire it 1000 times.
And if 500 users share the same department, the function is still called once per user, recalculating the same value 500 times; 499 of those calls are redundant, because a single call is enough to compute that value.
The only way to optimize such queries is to take the "inner" query out of the UDF and combine it with the main query using joins etc., as shown above.
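If you are on MySQL 8.0 or later (an assumption; the question does not say which version is in use), a window function is another way to avoid the per-row UDF call, since the department total is computed once per partition. A minimal sketch with the column names from the question:
-- assumes MySQL 8.0+; `table` stands for the real table name
SELECT userID,
       department,
       SUM(er) OVER (PARTITION BY department) / 3 AS department_er
FROM table;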

Slow stored MySQL function gets progressively slower with repeated runs

I need to filter a list by whether the person has an appointment. This runs in 0.09 seconds.
select personid from persons p
where EXISTS (SELECT 1 FROM appointments a
WHERE a.personid = p.personid);
Since I use this in more than one query and it actually contains another condition, it seemed convenient to put the filter into a function, so I have
CREATE FUNCTION `has_appt`(pid INT) RETURNS tinyint(1)
BEGIN
  RETURN EXISTS (SELECT 1 FROM appointments WHERE personid = pid);
END
Then I can use
select personid from persons where has_appt(personid)
However, two unexpected things happen. First, the statement using the has_appt() function now takes 2.5 seconds to run. I know there is overhead to a function call, but this seems extreme. Second, if I run the statement repeatedly, it takes about 5 seconds longer each time, so by the 4th time, it is taking over 20 seconds. This happens regardless of how long I wait between tries, but storing the function again resets the time to 2.5 seconds. What can account for the progressive slowness? What state can be affected by simply running it multiple times?
I know the solution is to forget the function and just embed this into my queries, but I want to understand the principle so I can avoid making the same mistake again. Thanks in advance for your help.
I'm using MySQL 8 and Workbench.
Your original query can be replaced by, and sped up by,
SELECT personid FROM appointments;
But the query seems dumb -- why would you want a list of all the ids of people with appointments, but no info about them? Perhaps you over-simplified the query?
If a person might have multiple appointments, then this would be needed, and might not be as fast:
SELECT DISTINCT personid FROM appointments;
As for why the function is so slow... the optimizer does not see what is inside the function. So select personid from persons where has_appt(personid) walks through the entire persons table, calling the function (and the query inside it) once per row.
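In other words, keeping the predicate visible to the optimizer is what matters. A sketch of two equivalent rewrites without the function, using only the tables from the question, so the optimizer can see the appointments lookup directly:
-- the original EXISTS filter from the question
SELECT p.personid
FROM persons p
WHERE EXISTS (SELECT 1 FROM appointments a WHERE a.personid = p.personid);
-- an equivalent join form; DISTINCT guards against people with several appointments
SELECT DISTINCT p.personid
FROM persons p
JOIN appointments a ON a.personid = p.personid;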

How to return only 10 rows via SQL query

I have one query that returns over 180k rows. I need to make a slight change so that it returns about 10 fewer rows.
How do I show only those 10 rows as a result?
I've tried EXCEPT, but it seems to return a lot more than just the 10.
You can use LIMIT. This will show the first n rows. Example:
SELECT * FROM Orders LIMIT 10
If you are trying to do pagination, add OFFSET. It will return 10 rows after skipping the first 20. Example:
SELECT * FROM Orders LIMIT 10 OFFSET 20
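Note that without an ORDER BY, MySQL doesn't guarantee which rows come first, so paging results can shift between runs. A sketch, assuming an order_id column to sort on:
-- order_id is an assumed column; any deterministic sort key works
SELECT * FROM Orders ORDER BY order_id LIMIT 10 OFFSET 20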
MySQL doesn't support EXCEPT (to my knowledge, it was only added as of MySQL 8.0.31).
Probably the most efficient route would be to incorporate the two WHERE clauses into one. I say efficient in the sense of "Do it that way if you're going to run this query in a regular report or production application."
For example:
-- Query 1
SELECT * FROM table WHERE `someDate`>'2016-01-01'
-- Query 2
SELECT * FROM table WHERE `someDate`>'2016-01-10'
-- Becomes (the rows matched by Query 1 but not by Query 2)
SELECT * FROM table WHERE `someDate`>'2016-01-01' AND `someDate`<='2016-01-10'
It's possible you're implying that the queries are quite complicated, and you're after a quick (read: not necessarily efficient) way of getting the difference for a one-off investigation.
That being the case, you could abuse UNION and a sub-query:
(Untested, treat as pseudo-SQL...)
SELECT *
FROM (
    SELECT * FROM table WHERE `someDate`>'2016-01-01'
    UNION ALL
    SELECT * FROM table WHERE `someDate`>'2016-01-10'
) AS sub
GROUP BY `primaryKey`
HAVING COUNT(1) = 1;
It's ugly though. And not efficient.
Assuming the only difference is that one side (I'll call it the "right-hand side") is missing records that the left includes, you could LEFT JOIN the two queries (as subqueries) and filter to rows where the right side is NULL, as sketched below. But that'd be dependent on all those caveats being true.
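A minimal sketch of that anti-join, assuming `primaryKey` uniquely identifies rows in both result sets:
SELECT lhs.*
FROM (SELECT * FROM table WHERE `someDate`>'2016-01-01') AS lhs
LEFT JOIN (SELECT * FROM table WHERE `someDate`>'2016-01-10') AS rhs
       ON rhs.`primaryKey` = lhs.`primaryKey`
WHERE rhs.`primaryKey` IS NULL;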
Temporary tables can be your friend - especially given they're so easily created (and can be indexed):
CREATE TEMPORARY TABLE tmp_xyz AS SELECT ... FROM ... WHERE ...;
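If you go that route, the temporary table can be indexed after it's created (the column name here is illustrative), and then joined or anti-joined against the other result set just like a regular table:
ALTER TABLE tmp_xyz ADD INDEX (`primaryKey`);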

Specific where clause in Mysql query

So I have a MySQL table with over 9 million records. They are call records; each record represents one individual call. The columns are as follows:
CUSTOMER
RAW_SECS
TERM_TRUNK
CALL_DATE
There are others but these are the ones I will be using.
So I need to count the total number of calls for a certain week in a certain term trunk. I then need to sum up the number of seconds for those calls. Then I need to count the total number of calls that were below 7 seconds. I always do this in 2 queries and combine them, but I was wondering if there is a way to do it in one? I'm new to MySQL, so I'm sure my syntax is horrific, but here is what I do...
Query 1:
SELECT CUSTOMER, SUM(RAW_SECS), COUNT(*)
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY CUSTOMER;
Query 2:
SELECT CUSTOMER, COUNT(*)
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2') AND RAW_SECS < 7
GROUP BY CUSTOMER;
Is there any way to combine these two queries into one? Or maybe just a better way of doing it? I appreciate all the help!
There are 2 ways of achieving the expected outcome in a single query:
conditional counting: use a case expression or the if() function inside count() (or sum()) to count only specific records (a sketch follows the query below)
use a self join: left join the table to itself on its id column, and in the join condition restrict the right-hand alias to calls shorter than 7 seconds
The advantage of the 2nd approach is that you may be able to use indexes to speed it up, while the conditional counting cannot use an index for the RAW_SECS < 7 test.
SELECT m1.CUSTOMER, SUM(m1.RAW_SECS), COUNT(m1.CUSTOMER), COUNT(m2.CUSTOMER)
FROM Mytable m1
LEFT JOIN Mytable m2 ON m1.id = m2.id AND m2.RAW_SECS < 7
WHERE m1.TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY m1.CUSTOMER;
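For comparison, a sketch of the first approach (conditional counting), which needs only one pass over the table; MySQL treats the boolean RAW_SECS < 7 as 0 or 1, so summing it counts the short calls:
SELECT CUSTOMER,
       SUM(RAW_SECS),
       COUNT(*),
       SUM(RAW_SECS < 7) AS calls_under_7_secs
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY CUSTOMER;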

Speed of query using FIND_IN_SET on MySql

I have several problems with my query against a catalogue of products.
The query is as follows:
SELECT DISTINCT (cc_id) FROM cms_catalogo
JOIN cms_catalogo_lingua ON ccl_id_prod=cc_id
JOIN cms_catalogo_famiglia ON (FIND_IN_SET(ccf_id, cc_famiglia) != 0)
JOIN cms_catalogo_categoria ON (FIND_IN_SET(ccc_id, cc_categoria) != 0)
JOIN cms_catalogo_sottocat ON (FIND_IN_SET(ccs_id, cc_sottocat) != 0)
LEFT JOIN cms_catalogo_order ON cco_id_prod=cc_id AND cco_id_lingua=1 AND cco_id_sottocat=ccs_id
WHERE ccc_nome='Alpine Skiing' AND ccf_nome='Ski'
I noticed that the first run takes about 4.5 seconds on average, and after that it becomes fast.
I use FIND_IN_SET because in my database, on table "cms_catalogo", the columns "cc_famiglia", "cc_categoria" and "cc_sottocat" contain IDs separated by commas (I know it's stupid).
Example:
Table cms_catalogo
Column cc_famiglia: 1,2,3,4,5
Table cms_catalogo_famiglia
Column ccf_id: 3
Could the slowdown in the query come from using FIND_IN_SET that way?
Would it be faster if, instead of IDs separated by commas, I had a table with the ID as an index?
I cannot explain, however, why the first execution of the query is very slow and then it speeds up.
It is better to use proper relational connections between tables, i.e. join them on indexed key columns rather than comma-separated lists.
If you just want a quick optimisation for this query:
Check EXPLAIN SELECT ... in MySQL to see how your query performs;
Add indexes on the columns ccc_id, ccf_id and ccs_id (a sketch follows this list);
Check EXPLAIN SELECT ... again after the indexes are added.
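A sketch of the index-creation step, assuming those columns are not already primary keys (the index names are illustrative):
ALTER TABLE cms_catalogo_famiglia ADD INDEX idx_ccf_id (ccf_id);
ALTER TABLE cms_catalogo_categoria ADD INDEX idx_ccc_id (ccc_id);
ALTER TABLE cms_catalogo_sottocat ADD INDEX idx_ccs_id (ccs_id);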
The first execution of the query takes much more time because it is a cold run; subsequent executions are served from caches. So you should judge performance by the first execution time.
If it is not a complicated report, the execution time should be less than 50-100 ms; otherwise you can run into overall performance problems, because this is surely not the only query your application runs.

How do I query for the sum of averages?

I have data that resemble stock data and that are updated every hour, so there are 24 entries every day for each stock (just using stocks as an example). But sometimes the data may not be updated.
For example, let's assume we have 3 stocks, A, B, C. And assume that we gather data at various intervals during the day for each stock. The data would look something like this...
row   A     B     C
1     3     4     5
2     3.5   4.1   5
3     2.9   3.8   4.3
What I want is to sum up the average value of each stock for this time period or
Avg(A) + Avg(B) + Avg(C)
In reality I have hundreds of stocks and hundreds of thousands of rows. I need this calculated for a single day.
I tried this (stock names are in an array - stocks = array('A','B','C'))
SELECT SUM(AVG(stock_price)) FROM table WHERE date = [mydate] AND stock_name IN ('".implode("','", $stocks)."') GROUP BY stock_name
but that didn't work. Can someone provide some insight?
Thanks in advance.
Calculate the per-stock averages in a sub-query, then sum them in the main query.
SELECT SUM(average_price) AS total_averages
FROM (SELECT AVG(price) AS average_price
      FROM table
      WHERE <conditions>
      GROUP BY stock_name) AS averages
One way to do it: use an inline view as a row source:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT AVG(t.stock_price) AS avg_stock_price
       FROM table t
       WHERE t.date = [mydate]
         AND t.stock_name IN ('a','b','c')
       GROUP BY t.stock_name
     ) a
You can run just the query from the inline view (aliased as a) to verify the results it returns. The outer query runs against the set of rows returned by the inline view query. (MySQL refers to the inline view, aliased as a, as a "derived table".)
The outer query is effectively like this:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM a
The "trick" is that "a" isn't a regular table, it's a set of rows returned by a query; but in terms of relational algebra theory, it works the same... it's a set or rows. If a were a regular table, we could write:
SELECT b.col
FROM (
SELECT col FROM a
) b
We don't want to do that in MySQL when we don't have to, because of the inefficient way that MySQL processes that. MySQL first runs the inner query (the query in the inline view). MySQL creates a temporary MyISAM table, and inserts the rows returned by the query into the temporary MyISAM table. MySQL then runs the outer query, against that temporary table (which MySQL refers to as a "derived table") to return the result. Creating and populating a temporary table that's a copy of a regular table is a lot of overhead, especially with large sets.
What makes this powerful is that the inline view query can include JOINs, a WHERE clause, aggregates, GROUP BY, whatever. As long as it returns a set of rows (with appropriate column names), we can wrap the query in parens and reference it in another query as if it were a table.
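As an illustration only (the watchlist table and its stock_name column are hypothetical, not from the question), the inline view itself can join and filter before aggregating:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT t.stock_name,
              AVG(t.stock_price) AS avg_stock_price
       FROM table t
       JOIN watchlist w ON w.stock_name = t.stock_name   -- hypothetical table
       WHERE t.date = [mydate]
       GROUP BY t.stock_name
     ) a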