I am trying to create a report to understand the time of day at which orders are placed, so I need to sum and group them by time. For example, I would like the sum of all orders placed between 1:00 and 1:59, then a row with the sum of all orders placed between 2:00 and 2:59, and so on. The field is a DATETIME column, but for the life of me I haven't been able to find the right query to do this. Any suggestions to send me down the right path would be greatly appreciated.
Thanks
If, by luck, this is MySQL, and by "sum of orders" you mean the number of orders rather than their total value:
select date_format(date_field, '%Y-%m-%d %H') as the_hour, count(*)
from my_table
group by the_hour
order by the_hour
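Note that '%Y-%m-%d %H' produces one row per hour of each day. If you want one row per hour of the day across all dates (1:00-1:59, 2:00-2:59, ...), grouping by HOUR(date_field) does that in MySQL. Below is a minimal sketch of both groupings. It uses Python's built-in sqlite3 module as a self-contained stand-in for MySQL, so SQLite's strftime() plays the role of DATE_FORMAT() and HOUR(). The sample rows are made up. If you want the order value rather than the count, replace COUNT(*) with SUM(amount), assuming a hypothetical amount column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, date_field TEXT)")
conn.executemany(
    "INSERT INTO my_table (date_field) VALUES (?)",
    [("2015-03-01 01:05:00",), ("2015-03-01 01:59:59",), ("2015-03-02 01:30:00",),
     ("2015-03-01 02:00:00",), ("2015-03-02 14:45:00",)],
)

# One row per hour of each day (MySQL: DATE_FORMAT(date_field, '%Y-%m-%d %H')).
per_day_hour = conn.execute(
    "SELECT strftime('%Y-%m-%d %H', date_field) AS the_hour, COUNT(*) "
    "FROM my_table GROUP BY the_hour ORDER BY the_hour"
).fetchall()

# One row per hour of the day across all dates (MySQL: HOUR(date_field)).
per_hour_of_day = conn.execute(
    "SELECT CAST(strftime('%H', date_field) AS INTEGER) AS hour_of_day, COUNT(*) "
    "FROM my_table GROUP BY hour_of_day ORDER BY hour_of_day"
).fetchall()

print(per_day_hour)
print(per_hour_of_day)
```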
This kind of grouping (on a calculated expression) will not scale well as the table grows, because the expression has to be computed for every row and a plain index on the column cannot be used for it. If you really need to run this specific GROUP BY/ORDER BY frequently, you should add an extra column storing the hour (a TINYINT UNSIGNED column will suffice) and put an INDEX on that column.
That is, of course, only if your table is getting quite big. If it is small (which cannot be stated as a mere number of records, because it also depends on server configuration and capabilities), you probably won't notice much difference in performance.
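Here is a sketch of the extra-column idea, again using SQLite through Python's sqlite3 as a stand-in for MySQL. The column name order_hour and the index name are hypothetical. In MySQL the column would be TINYINT UNSIGNED. New rows would also need the hour filled in, either by the application or by a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, date_field TEXT)")
conn.executemany(
    "INSERT INTO my_table (date_field) VALUES (?)",
    [("2015-03-01 01:05:00",), ("2015-03-02 01:30:00",), ("2015-03-02 14:45:00",)],
)

# Hypothetical helper column holding only the hour (TINYINT UNSIGNED in MySQL),
# filled once and then indexed, so the GROUP BY no longer evaluates an expression.
conn.execute("ALTER TABLE my_table ADD COLUMN order_hour INTEGER")
conn.execute("UPDATE my_table SET order_hour = CAST(strftime('%H', date_field) AS INTEGER)")
conn.execute("CREATE INDEX idx_my_table_order_hour ON my_table (order_hour)")

rows = conn.execute(
    "SELECT order_hour, COUNT(*) FROM my_table GROUP BY order_hour ORDER BY order_hour"
).fetchall()
print(rows)
```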
Related
I have a MySQL table with nearly 4,000,000 rows containing income transactions for more than 100,000 employees.
Three columns are relevant:
Employee ID [VARCHAR, indexed] (not unique, since one employee can receive more than one income);
Type of Income [also VARCHAR, indexed];
Value of the Income [DECIMAL(10,2)].
What I am trying to do seems very simple to me: sum all income entries per employee, filtered by one type.
For that, I was using the following code:
SELECT
SUM(`value`) AS `SumofValue`,
`type`,
`EmployeeID`
FROM
`Revenue`
GROUP BY `EmployeeID`
HAVING `type` = 'X'
And the result was supposed to be something like this:
SUM            TYPE  EMPLOYEE ID
R$ 250,00      X     250000008377
R$ 5.000,00    X     250000004321
R$ 3.200,00    X     250000005432
R$ 1.600,00    X     250000008765
....
However, this takes a long time. I used LIMIT to restrict the result to 1,000 rows and that works, but by my estimate, running it on the whole table would take about an hour. That seems like far too much time for something that does not look so demanding to me (though I'm probably wrong). On top of that, this is only the first step of a more complex query I intend to run later. That query will also group by Employer ID in addition to Employee ID, because one person can receive income from more than one employer.
Is there any way to optimize this? Is there anything wrong with my code? Is there some trick to speed this operation up? Should I index the income value column as well? If this is a MySQL limitation, is there another option that could handle this better?
I would really appreciate any help.
Thanks in advance
DISCLOSURE: This is an open government database. All of this data is lawfully open to the public.
First, phrase the query using WHERE, rather than HAVING -- filter before doing the aggregation:
SELECT SUM(`value`) AS `SumofValue`,
MAX(type) as type,
EmployeeID
FROM Revenue r
WHERE `type` = 'X'
GROUP BY EmployeeID;
Next, try using this index: (type, EmployeeId, value). At the very least, this is a covering index for the query. MySQL (depending on the version) might be smart enough to use it for the aggregation as well.
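Here is a small sketch of the WHERE-first query with the suggested (type, EmployeeID, value) index. It uses SQLite through Python's sqlite3 as a stand-in for MySQL; in MySQL you would check the plan with EXPLAIN. The sample rows are invented. SQLite's EXPLAIN QUERY PLAN reports whether the index covers the query, although the exact wording of its output can differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Revenue (EmployeeID TEXT, type TEXT, value NUMERIC)")
conn.executemany(
    "INSERT INTO Revenue (EmployeeID, type, value) VALUES (?, ?, ?)",
    [("250000008377", "X", 100), ("250000008377", "X", 150),
     ("250000004321", "X", 5000), ("250000004321", "Y", 999)],
)
# Filter column first, then the grouping column, then the summed column.
conn.execute("CREATE INDEX idx_type_emp_value ON Revenue (type, EmployeeID, value)")

query = """
SELECT SUM(value) AS SumofValue, MAX(type) AS type, EmployeeID
FROM Revenue
WHERE type = 'X'          -- filter rows before aggregating
GROUP BY EmployeeID
ORDER BY EmployeeID
"""
rows = conn.execute(query).fetchall()
# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + query))
print(rows)
print(plan)
```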
Given your schema, why are you using VARCHAR for Employee ID and Type?
You could create a reference table for Type (1 -> X, 2 -> Y, ...), so that the transaction table stores an integer reference to the type instead of a string.
Just create a test table like the one below, load your data into it, and run the same query that was taking an hour. You will see a major change in the execution plan as well.
CREATE TABLE test_transaction
(
Employee_ID BIGINT,
Type SMALLINT,
Income DECIMAL(10,2)
)
Create separate indexes on the Employee_ID and Type columns.
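Here is a sketch of the integer-reference idea. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, so BIGINT, SMALLINT and DECIMAL become SQLite's INTEGER and NUMERIC affinities. The lookup table income_type and its code column are hypothetical names. The test_transaction columns follow the answer, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE income_type (type_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
CREATE TABLE test_transaction (
    Employee_ID INTEGER,
    Type        INTEGER REFERENCES income_type(type_id),
    Income      NUMERIC
);
CREATE INDEX idx_tt_employee ON test_transaction (Employee_ID);
CREATE INDEX idx_tt_type ON test_transaction (Type);
INSERT INTO income_type VALUES (1, 'X'), (2, 'Y');
INSERT INTO test_transaction VALUES
    (250000008377, 1, 250), (250000004321, 1, 3000),
    (250000004321, 1, 2000), (250000004321, 2, 999);
""")

# Resolve the type code once, then filter and group on integer columns only.
rows = conn.execute("""
SELECT t.Employee_ID, SUM(t.Income) AS SumofValue
FROM test_transaction AS t
WHERE t.Type = (SELECT type_id FROM income_type WHERE code = 'X')
GROUP BY t.Employee_ID
ORDER BY t.Employee_ID
""").fetchall()
print(rows)
```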
I have a table named Amounts with columns RowId, CounterId and Amount. It is easy to group by CounterId and get each counter's average value. I'm stuck, though, when I also want the last value of Amount in each group, to know whether it is bigger or smaller than the average. Simply including Amount in the query gives me the first value of Amount in the group, which is useless. Maybe it is easy to do, but I have not found an answer for my problem with just one table. I found how to get only the last Amount in each group with the help of RowId, but how to get both the average and the last value into one result is a mystery to me. Thanks in advance.
Thanks to Ram Bath, I built what I needed; here is the result:
SELECT Kliendid.Id AS Id,
Kliendid.Nimi AS Klient,
MIN(X.Tarbimine) AS Piseim,
AVG(X.Tarbimine) AS Keskmine,
MAX(X.Tarbimine) AS Suureim,
COUNT(X.Tarbimine) AS Kuid,
(
Select Tarbimine
from Naidud A
where A.Id=MAX(X.Id)
) as Viimane
FROM Naidud X
INNER JOIN Kliendid ON Kliendid.ID=X.Klient
INNER JOIN Mooturid ON Mooturid.ID=X.Mootur
WHERE X.Tarbimine>0
AND X.Aeg>'2015-12-31'
AND Mooturid.Kasutusel=1
GROUP BY X.Mootur
HAVING Kuid>5
AND (Viimane=Piseim OR Viimane=Suureim)
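As you can see, my question was simplified. I use Estonian table and column names, and it would have been much harder to help if I had shared this code from the beginning. (Roughly: Kliendid = clients, Nimi = name, Naidud = readings, Tarbimine = consumption, Mooturid = meters, Kasutusel = in use, Aeg = time, Kuid = months, Keskmine = average, Viimane = last; Piseim and Suureim are the smallest and largest values.) Thanks again to all of you.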
Here I am assuming your RowId is unique or is the primary key.
SELECT X.CounterId,
AVG(X.Amount) AS Average_Amount,
(SELECT a.Amount FROM Amounts a WHERE a.RowId = MAX(X.RowId)) AS LastAmount
FROM Amounts X
GROUP BY X.CounterId;
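For comparison, here is the same result written with a derived table instead of the correlated subquery. It aggregates first (the average and the highest RowId per counter), then joins back on that RowId to get the last Amount. This is a substituted technique, not the answer's query. It is shown in SQLite through Python's sqlite3 as a stand-in for MySQL, it relies on the same assumption that a higher RowId means a later row, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Amounts (RowId INTEGER PRIMARY KEY, CounterId INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO Amounts (RowId, CounterId, Amount) VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 20.0), (3, 1, 60.0),   # counter 1: average 30, last 60
     (4, 2, 5.0), (5, 2, 15.0), (6, 2, 4.0)],    # counter 2: average 8, last 4
)

# Aggregate per counter, then join back on the highest RowId to fetch the last Amount.
rows = conn.execute("""
SELECT g.CounterId, g.avg_amount, a.Amount AS last_amount
FROM (
    SELECT CounterId, AVG(Amount) AS avg_amount, MAX(RowId) AS last_row
    FROM Amounts
    GROUP BY CounterId
) AS g
JOIN Amounts AS a ON a.RowId = g.last_row
ORDER BY g.CounterId
""").fetchall()

for counter_id, avg_amount, last_amount in rows:
    trend = "above" if last_amount > avg_amount else "at or below"
    print(counter_id, avg_amount, last_amount, trend)
```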
I have a big table with about 3 million records, each containing a timestamp.
Now I want to select one record per month and it works using this query:
SELECT timestamp, id, gas_used, kwh_used1, kwh_used2 FROM energy
GROUP BY MONTH(timestamp) ORDER BY timestamp ASC
It works but it is very slow.
I have indexes on id and on timestamp.
What can I do to make this query fast?
GROUP BY MONTH(timestamp) forces the engine to compute MONTH() for every record individually, i.e. a sequential (full) scan, because the index on timestamp cannot be used for an expression on that column. That is obviously slow when you have 3 million records.
A common solution is to add an indexed column containing just the criterion you want to select on. However, I strongly suspect that you actually want to select on year and month, unless your database is reset every year: MONTH() alone puts December 2015 and December 2016 in the same group.
To avoid data corruption issues, it may be best to create an insert trigger that automatically fills that field. That way this extra column doesn't interfere with your business logic.
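Here is a sketch of the trigger-filled year-month column. It uses SQLite through Python's sqlite3 as a stand-in for MySQL. The column year_month, the index and the trigger names are hypothetical, and the rows are invented. The sample data shows why year and month are needed together: December 2015 and December 2016 end up in different groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE energy (
    id INTEGER PRIMARY KEY,
    timestamp TEXT,
    gas_used REAL,
    year_month TEXT            -- hypothetical helper column, e.g. '2015-12'
);
CREATE INDEX idx_energy_year_month ON energy (year_month);

-- SQLite cannot assign NEW.year_month in a BEFORE trigger (MySQL can, with
-- SET NEW.year_month = ...), so fill the column right after the insert.
CREATE TRIGGER energy_fill_year_month AFTER INSERT ON energy
BEGIN
    UPDATE energy SET year_month = strftime('%Y-%m', NEW.timestamp)
    WHERE id = NEW.id;
END;
""")
conn.executemany(
    "INSERT INTO energy (timestamp, gas_used) VALUES (?, ?)",
    [("2015-12-01 00:00:00", 1.5), ("2015-12-20 12:00:00", 2.0),
     ("2016-12-01 00:00:00", 3.0), ("2016-01-05 08:30:00", 0.5)],
)
rows = conn.execute(
    "SELECT year_month, COUNT(*) FROM energy GROUP BY year_month ORDER BY year_month"
).fetchall()
print(rows)
```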
It is not good practice to SELECT columns that don't appear in the GROUP BY clause unless they are wrapped in an aggregate function such as MIN(), MAX(), SUM(), etc.
In your query, this applies to the columns:
id, gas_used, kwh_used1, kwh_used2
MySQL is free to take those values from any row in each month's group, so you are not guaranteed to get the "earliest" (by timestamp) row for each month.
More:
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
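If you do need the earliest row of each month, one common pattern is to find the minimum timestamp per year-month and join back to the table, so that every selected column comes from that one row. Below is a sketch in SQLite through Python's sqlite3, used as a stand-in for MySQL (in MySQL, DATE_FORMAT(timestamp, '%Y-%m') would replace strftime()). The sample rows are invented. Rows that share the same earliest timestamp would all be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE energy (id INTEGER PRIMARY KEY, timestamp TEXT, gas_used REAL)")
conn.executemany(
    "INSERT INTO energy (timestamp, gas_used) VALUES (?, ?)",
    [("2015-12-20 12:00:00", 2.0), ("2015-12-01 00:00:00", 1.5),
     ("2016-01-05 08:30:00", 0.5), ("2016-01-31 23:00:00", 4.0)],
)

# Earliest timestamp per year-month, then join back so all columns come
# from that same (earliest) record.
rows = conn.execute("""
SELECT e.id, e.timestamp, e.gas_used
FROM energy AS e
JOIN (
    SELECT strftime('%Y-%m', timestamp) AS ym, MIN(timestamp) AS first_ts
    FROM energy
    GROUP BY ym
) AS m ON e.timestamp = m.first_ts
ORDER BY e.timestamp
""").fetchall()
print(rows)
```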
I have a pricing history table with half a billion records. It is formatted like this:
Id, sku, vendor, price, datetime
What I want is the average price of each product for a given vendor over a certain date range. Most products are updated once every 3 days, but it varies.
So, this is the query I want to run:
SELECT
avg(price)
FROM table
WHERE
vendor='acme'
AND datetime > '12-15-2014'
AND datetime < '12-18-2014'
GROUP BY sku
This 3-day range is broad enough that I will certainly get at least one price sample, but some SKUs may have been sampled more than once, hence the GROUP BY to get only one row per SKU.
The problem is, this query runs and runs and doesn't seem to finish (more than 15 minutes). There are around 500k unique skus.
Any ideas?
edit: corrected asin to sku
For MySQL to optimize this query, you need to create a composite index
(vendor, datetime, asin)
IN THIS PARTICULAR ORDER (it matters).
It is also worth trying another one,
(vendor, datetime, asin, price)
which may perform better because it is a so-called "covering index" (the query can be answered from the index alone, without reading the table rows).
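Indexes with a different column order, such as (datetime, vendor) (suggested in another answer), are useless here because datetime is used in a range comparison: once MySQL scans a range on the first index column, it cannot use the following column (vendor) to narrow that range further.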
A few notes:
The index will help only if the vendor='acme' AND datetime > '12-15-2014' AND datetime < '12-18-2014' filter condition matches a small part of the whole table (say, less than 10%).
MySQL does not support mm-dd-yyyy literals (at least they are not documented; see references), so I assume it must be yyyy-mm-dd instead.
Your comparison does not include the very first moment of December 15th, 2014 (2014-12-15 00:00:00), so you probably wanted datetime >= '2014-12-15' instead.
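Here is a sketch of the covering index together with the corrected date filter (ISO yyyy-mm-dd literals, >= for the start, < for the end). It uses SQLite through Python's sqlite3 as a stand-in for MySQL. The table name pricing_history is borrowed from the other answer, the index name is hypothetical, and the rows are invented. Following the question's edit, sku is used in place of asin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pricing_history (Id INTEGER PRIMARY KEY, sku TEXT, vendor TEXT, "
    "price REAL, datetime TEXT)"
)
conn.executemany(
    "INSERT INTO pricing_history (sku, vendor, price, datetime) VALUES (?, ?, ?, ?)",
    [("A1", "acme", 10.0, "2014-12-15 00:00:00"),    # included: first instant of the range
     ("A1", "acme", 12.0, "2014-12-16 09:00:00"),
     ("B2", "acme", 5.0, "2014-12-17 18:00:00"),
     ("B2", "acme", 7.0, "2014-12-18 00:00:00"),     # excluded: end of range is exclusive
     ("A1", "other", 99.0, "2014-12-16 09:00:00")],  # excluded: different vendor
)
# Equality column first, range column second, then the other columns the
# query reads, so the index alone can answer it (covering index).
conn.execute(
    "CREATE INDEX idx_vendor_dt_sku_price ON pricing_history (vendor, datetime, sku, price)"
)
query = """
SELECT sku, AVG(price)
FROM pricing_history
WHERE vendor = 'acme'
  AND datetime >= '2014-12-15'
  AND datetime <  '2014-12-18'
GROUP BY sku
ORDER BY sku
"""
rows = conn.execute(query).fetchall()
plan = " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + query))
print(rows)
print(plan)
```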
References:
http://dev.mysql.com/doc/refman/5.6/en/range-optimization.html
http://dev.mysql.com/doc/refman/5.6/en/date-and-time-literals.html
You need an index to support your query. I suggest creating an index on vendor and datetime, like so:
CREATE INDEX pricing_history_date_vendor ON pricing_history (datetime, vendor);
Also, I assume you wanted to group by sku rather than undefined column asin.
Not to mention your non-standard SQL date format MM-dd-yyyy as pointed out by others in comments (should be yyyy-MM-dd).
I have an orders table that contains the date and amount of each order. The table is big, with more than 1,000,000 records, and growing.
We need to create a set of queries to calculate certain milestones. Is there a way in MySQL to find the date on which we reached a cumulative milestone of x amount?
For example: we crossed 1M in sales on '2011-01-01'.
Currently we scan the entire table and then use logic in PHP to figure out the date, but it would be great if this could be done in MySQL without reading so many records at once.
There may be more elegant approaches, but you can maintain a row in another table that contains current_sales and the date it occurred. Every time you have a sale, increment the value and store the sale date. If the expected milestones (1 million, 2 million, etc.) are known in advance, you can store them as they occur (in the same table or a different one).
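I think using gunner's logic with a trigger would be a good option, as it reduces the effort of maintaining that row. With some extra application code, you could then send a mail notification about the milestone status; a MySQL trigger cannot send email by itself, but it can, for example, write to a table that an application checks.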
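Here is a sketch combining both answers. An AFTER INSERT trigger maintains a one-row running total, and a table of milestones records the date on which each one is first crossed. It uses SQLite through Python's sqlite3 as a stand-in for MySQL; MySQL trigger syntax differs slightly, for example the mysql client needs a custom DELIMITER for multi-statement trigger bodies. All table, column and trigger names are hypothetical, and the sketch assumes orders are inserted in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount INTEGER);

-- One-row table holding the running total (the "row in another table").
CREATE TABLE sales_total (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    current_sales INTEGER NOT NULL,
    last_sale_date TEXT
);
INSERT INTO sales_total VALUES (1, 0, NULL);

-- Milestones known in advance; reached_on stays NULL until crossed.
CREATE TABLE milestones (threshold INTEGER PRIMARY KEY, reached_on TEXT);
INSERT INTO milestones (threshold) VALUES (1000000), (2000000);

CREATE TRIGGER orders_running_total AFTER INSERT ON orders
BEGIN
    UPDATE sales_total
    SET current_sales = current_sales + NEW.amount,
        last_sale_date = NEW.order_date
    WHERE id = 1;
    UPDATE milestones
    SET reached_on = NEW.order_date
    WHERE reached_on IS NULL
      AND threshold <= (SELECT current_sales FROM sales_total WHERE id = 1);
END;
""")

conn.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [("2010-12-30", 600000), ("2011-01-01", 500000), ("2011-02-10", 950000)],
)
milestones = conn.execute(
    "SELECT threshold, reached_on FROM milestones ORDER BY threshold"
).fetchall()
total = conn.execute("SELECT current_sales FROM sales_total").fetchone()[0]
print(milestones, total)
```

Looking up a milestone date then only requires reading the small milestones table, not scanning the orders table.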