How do I query for the sum of averages? - mysql

I have data that resembles stock data and is updated every hour, so there are 24 entries every day for each stock (I'm just using stock as an example). But sometimes the data may not be updated.
For example, let's assume we have 3 stocks, A, B, and C, and that we gather data at various intervals during the day for each stock. The data would look something like this...
row   A     B     C
1     3     4     5
2     3.5   4.1   5
3     2.9   3.8   4.3
What I want is to sum up the average value of each stock for this time period, in other words:
Avg(A) + Avg(B) + Avg(C)
In reality I have hundreds of stocks and hundreds of thousands of rows, and I need to calculate this for a single day.
I tried this (stock names are in an array - stocks = array('A','B','C'))
SELECT SUM(AVG(stock_price)) FROM table WHERE date = [mydate] AND stock_name IN ('".implode("','", $stocks)."') GROUP BY stock_name
but that didn't work (MySQL rejects nested aggregates like SUM(AVG(...)) with an "invalid use of group function" error). Can someone provide some insight?
Thanks, in advance.

Calculate the per-stock averages in a sub-query, then sum them in the main query.
SELECT SUM(average_price) AS total_averages
FROM (SELECT AVG(price) AS average_price
      FROM table
      WHERE <conditions>
      GROUP BY stock_name) AS averages

One way to do it, use an inline view as a rowsource:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT AVG(t.stock_price) AS avg_stock_price
         FROM table t
        WHERE t.date = [mydate]
          AND t.stock_name IN ('a','b','c')
        GROUP BY t.stock_name
     ) a
You can run just the query from the inline view (aliased as a) to verify the results it returns. The outer query runs against the set of rows returned by the inline view query. (MySQL refers to the inline view, aliased as a, as a "derived table".)
The outer query is effectively like this:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM a
The "trick" is that "a" isn't a regular table, it's a set of rows returned by a query; but in terms of relational algebra theory, it works the same... it's a set or rows. If a were a regular table, we could write:
SELECT b.col
FROM (
       SELECT col FROM a
     ) b
We don't want to do that in MySQL when we don't have to, because of the inefficient way MySQL processes it. MySQL first runs the inner query (the query in the inline view), creates a temporary MyISAM table, and inserts the rows returned by the inner query into that temporary table. MySQL then runs the outer query against that temporary table (which MySQL refers to as a "derived table") to return the result. Creating and populating a temporary table that's a copy of a regular table is a lot of overhead, especially with large sets.
What makes this powerful is that the inline view query can include JOINs, a WHERE clause, aggregates, GROUP BY, whatever. As long as it returns a set of rows (with appropriate column names), we can wrap the query in parens and reference it in another query as if it were a table.
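If you want to confirm how MySQL handles the inline view, prefix the query with EXPLAIN: the materialized inline view is reported with select_type DERIVED. A sketch (here `table` is backticked because TABLE is a reserved word, and [mydate] stays a placeholder):
EXPLAIN
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT AVG(t.stock_price) AS avg_stock_price
         FROM `table` t
        WHERE t.date = [mydate]
          AND t.stock_name IN ('a','b','c')
        GROUP BY t.stock_name
     ) a;
-- the row for the subquery appears with select_type = DERIVED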

Specific where clause in Mysql query

So I have a MySQL table with over 9 million records. They are call records. Each record represents 1 individual call. The columns are as follows:
CUSTOMER
RAW_SECS
TERM_TRUNK
CALL_DATE
There are others but these are the ones I will be using.
So I need to count the total number of calls for a certain week in a certain Term Trunk. I then need to sum up the number of seconds for those calls. Then I need to count the total number of calls that were below 7 seconds. I always do this in 2 queries and combine them, but I was wondering if there is a way to do it in one? I'm new to MySQL so I'm sure my syntax is horrific, but here is what I do...
Query 1:
SELECT CUSTOMER, SUM(RAW_SECS), COUNT(*)
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY CUSTOMER;
Query 2:
SELECT CUSTOMER, COUNT(*)
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2') AND RAW_SECS < 7
GROUP BY CUSTOMER;
Is there any way to combine these two queries into one? Or maybe just a better way of doing it? I appreciate all the help!
There are 2 ways of achieving the expected outcome in a single query:
1. conditional counting: use a CASE expression or the IF() function within COUNT() (or SUM()) to count only specific records (a sketch of this appears below, after the join query)
2. a self join: LEFT JOIN the table to itself on its id field, and in the join condition filter the alias on the right-hand side of the join to calls shorter than 7 seconds
The advantage of the 2nd approach is that you may be able to use indexes to speed it up, while the conditional counting cannot use indexes.
SELECT m1.CUSTOMER, SUM(m1.RAW_SECS), COUNT(m1.customer), COUNT(m2.customer)
FROM Mytable m1
LEFT JOIN Mytable m2 ON m1.id = m2.id AND m2.raw_secs < 7
WHERE m1.TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY m1.CUSTOMER;
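For reference, a sketch of the conditional-counting version (option 1) against the same table; IF() returns NULL for calls of 7 seconds or longer, and COUNT() ignores NULLs, so only the short calls are counted:
SELECT CUSTOMER,
       SUM(RAW_SECS) AS total_secs,
       COUNT(*) AS total_calls,
       COUNT(IF(RAW_SECS < 7, 1, NULL)) AS short_calls
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY CUSTOMER;
In MySQL, SUM(RAW_SECS < 7) would do the same job, since boolean expressions evaluate to 0 or 1.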

MySQL how can I speed up UDF in my query

I have one table that has userID, department, er. I've created one simple query to gather all this information.
SELECT table.userID, table.department, table.er FROM table;
Now, I want to group all er's that belong to the same department and perform this calculation
select sum(table.er)/3 as department_er from table group by table.department;
Then add this result as a new column in my first query. To do this I've created a UDF that looks like this
BEGIN
  DECLARE department_er FLOAT;
  SET department_er = (SELECT SUM(er) FROM table WHERE table.department = dpt);
  RETURN department_er;
END
Then I used that UDF in this query
SELECT table.userID, table.department, (select dptER(table.department)/3) as department_er FROM table
I've indexed my tables, and more complex queries dropped from 4+ minutes to less than 1 second. This one seems pretty simple but is going on 10 minutes to run. Is there a better way to do this, or a way to optimize my UDF?
Forgive my n00b-ness :)
Try a query without a dependent aggregated subquery in the SELECT clause:
select table.userID,
       table.department as dpt,
       x.department_er
from table
join (
       select department,
              (sum(table.er)/3) as department_er
       from table
       group by department
     ) x
  on x.department = table.department
This UDF cannot be optimized. It may seem to work in simple queries, but in general it can hurt your database performance.
Imagine that we have a query like this one:
SELECT ....., UDF( some parameters )
FROM table
....
MySQL must call this function for each record retrieved from the table in this query.
If the table contains 1000 records, the function is fired 1000 times.
And the query within the function is also fired 1000 times.
If 10,000 records, then the function is called 10,000 times.
Even if you optimize the function in such a way that the UDF becomes twice as fast, the above query will still fire it 1000 times.
If 500 users have the same department, it is still called 500 times, once for each user, and calculates the same value for each of them: 499 redundant calls, because only 1 call is required to calculate this value.
The only way to optimize such queries is to take the "inner" query out of the UDF and combine it with the main query using joins, as in the query shown above.

mysql Subquery with JOIN bad performance

My problem is this:
select * from
(
  select * from barcodesA
  UNION ALL
  select * from barcodesB
) as barcodesTOTAL, boxes
where barcodesTOTAL.code = boxes.code;
Table barcodesA has 4000 entries
Table barcodesB has 4000 entries
Table boxes has like 180,000 entries
It takes 30 seconds to process the query.
Another problematic query:
select * from
viewBarcodesTotal, boxes
where viewBarcodesTotal.code = boxes.code;
viewBarcodesTotal is a view containing the UNION ALL of both barcode tables. It also takes forever.
Meanwhile,
select * from barcodesA , boxes where barcodesA.code=boxes.code
UNION ALL
select * from barcodesB , boxes where barcodesB.code=boxes.code
This one takes <1 second.
The question is obviously WHY? Is my code bugged? Is MySQL bugged?
I have to migrate from Access to MySQL, and I would have to rewrite all my code if the first option is bugged.
Add an index on boxes.code if you don't already have one. Joining 8000 records (4K+4K) to the 180,000 will benefit from an index on the 180K side of the equation.
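A minimal sketch, assuming no indexes exist yet (the index names are arbitrary; the barcode-table indexes echo the advice at the end of this answer):
ALTER TABLE boxes ADD INDEX idx_boxes_code (code);
ALTER TABLE barcodesA ADD INDEX idx_barcodesA_code (code);
ALTER TABLE barcodesB ADD INDEX idx_barcodesB_code (code);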
Also, be explicit and specify the fields you need back in your SELECT statements. Using * in a production query is bad form: it encourages not thinking about which fields you need (and how big they might be), not to mention that your example UNIONs two different tables, barcodesA and barcodesB, with potentially different data types and column orders.
The REASON for the performance difference...
The first query says: first, do a complete union of EVERY record in A with EVERY record in B, THEN join that to boxes on the code. The union does not have an index to be optimized against.
In your SECOND query, each table is joined to boxes individually, so the join IS optimized (apparently there IS an index, given the sub-second performance, but I would ensure both barcode tables also have an index on the "code" column).

mysql / matlab: optimize query - removing dates from a list

I have a table with ~3M rows. The columns are date, time, msec, and some other columns with int data. Some unknown fraction of these rows is considered 'invalid' based on their existence in a separate outages table (based on date ranges).
Currently the query does a select * and then uses a huge WHERE clause to remove the invalid date ranges (lots of 'and not ( RecordDate > '2008-08-05' and RecordDate < '2008-08-10' )', and so on). This blows away any chance of using an index.
I'm looking for a better way to limit the results. As it stands now, the query takes several minutes to run.
DELETE b FROM bigtable b
INNER JOIN outages o ON (b.`date` BETWEEN o.datestart AND o.dateend)
WHERE (1=1) -- in some modes MySQL demands a WHERE clause or it will not run
Make sure you have an index on all fields involved in the query.
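If deleting the invalid rows isn't an option, the same join shape can be used as a filter instead; a sketch reusing the table and column names above, keeping only rows that match no outage range:
SELECT b.*
FROM bigtable b
LEFT JOIN outages o ON (b.`date` BETWEEN o.datestart AND o.dateend)
WHERE o.datestart IS NULL;
The LEFT JOIN pairs each row with any outage covering its date; rows with no match come back with NULLs on the outages side, and the WHERE keeps only those.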

Best way to combine multiple advanced mysql select queries

I have multiple select statements from different tables on the same database. I was using multiple, separate queries, then loading the results into my array and sorting them (again, after ordering in the query).
I would like to combine them into one statement to speed up results and make it easier to "load more" (see bottom).
Each query uses SELECT, LEFT JOIN, WHERE, and ORDER BY commands, which are not the same for each table.
I may not need ORDER BY in each statement, but I want the end result to be ordered by a field representing a time (not necessarily the same field name across all tables).
I would want to limit the total query results to a number, in my case 100.
I then loop through the results, and for each row I test whether OBJECTNAME_ID (i.e. comment_id, event_id, upload_id) is set, then call LOAD_WHATEVER_OBJECT, which takes the row and pushes its data into an array.
I won't have to sort the array afterwards because it was loaded in order via MySQL.
Later in the app, I will "load more" by skipping the first 100, 200, or whatever page*100 is, and limiting by 100 again with the same query.
The end result from the database would preferably look like this:
RESULT - selected fields from a table - field to sort on is greatest
RESULT - selected fields from a possibly different table - field to sort on is next greatest
RESULT - selected fields from a possibly different table - field to sort on is third greatest
etc, etc
I see a lot of simpler combined statements, but nothing quite like this.
Any help would be GREATLY appreciated.
The easiest way might be a UNION here (http://dev.mysql.com/doc/refman/5.0/en/union.html):
(SELECT a,b,c FROM t1)
UNION
(SELECT d AS a, e AS b, f AS c FROM t2)
ORDER BY a DESC
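Applied to the "load more" requirement above, each branch can expose its own time field under a shared alias, and pagination happens on the combined result. A sketch with hypothetical table and column names (comments, events, uploads and their *_time columns are assumptions):
(SELECT comment_id AS id, 'comment' AS kind, comment_time AS sort_time FROM comments)
UNION ALL
(SELECT event_id, 'event', event_time FROM events)
UNION ALL
(SELECT upload_id, 'upload', upload_time FROM uploads)
ORDER BY sort_time DESC
LIMIT 100 OFFSET 200;
UNION ALL skips the duplicate check that plain UNION performs, which is safe here since each branch comes from a different table; the kind column tells the loop which LOAD_WHATEVER_OBJECT to call.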