I have a live, filterable report in my web app that queries a list of loans and loan payments in MySQL. The goal is to display each loan in a table row, followed by its loan payments in table columns, each representing the sum of loan payments for that day. We also allow the user to select a date range and an aggregation level (daily / weekly / monthly). If the user chose Sept 1-3 with daily aggregation, the results would look like this:
Loan ID | Sept 1 | Sept 2  | Sept 3
------------------------------------
0001    | $350   | $239.45 | $112
0002    | $100   | $0      | $75
The 2 database tables are Loan and Payment where Payment stores the Loan ID, date, and amount of each payment.
When we run this query over a 60-day range, the response time is ~45 sec. We then tried creating our own pre-aggregated table with 366 columns per year (Loan ID plus daily date columns, each holding the sum of payments for that day). This increased the response time to over 60 sec, and that's not even including weekly or monthly aggregation, which is slower still.
How can we speed this up? We're ideally looking for a 10-15 sec response time, and I have tried every caching / indexing technique I can find, without success.
You should discuss with the business what the actual requirements or practical application of a table with 60 columns would be.
The result table looks fine for the Sept 1-3 example, but for a 60-day date range? Who would look at that table? Would it be better to group by weeks or months?
If the number of loans is limited
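Whatever the business decides, here is a hedged sketch of a more conventional pre-aggregation in MySQL: one narrow row per loan per day instead of 366 columns per year (the loan_payment_daily table and the Payment column names loan_id, payment_date, and amount are assumptions here). The narrow table stays indexable, a 60-day window becomes a simple range scan on the primary key, and the app pivots the handful of resulting rows into columns:

CREATE TABLE loan_payment_daily (
    loan_id    INT UNSIGNED NOT NULL,
    pay_date   DATE NOT NULL,
    total_paid DECIMAL(12,2) NOT NULL DEFAULT 0,
    PRIMARY KEY (loan_id, pay_date)
);

-- refresh in bulk (or maintain incrementally as payments arrive):
INSERT INTO loan_payment_daily (loan_id, pay_date, total_paid)
SELECT loan_id, DATE(payment_date), SUM(amount)
FROM Payment
GROUP BY loan_id, DATE(payment_date)
ON DUPLICATE KEY UPDATE total_paid = VALUES(total_paid);

-- the report is then a plain range scan; weekly/monthly levels
-- are just a further GROUP BY on the same narrow table:
SELECT loan_id, pay_date, total_paid
FROM loan_payment_daily
WHERE pay_date BETWEEN '2019-09-01' AND '2019-09-03';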
Related
I have a set of inventory data where the amount increases at a given rate. For example, the inventory increases by ten units every day. However, from time to time there will be an inventory reduction that could be any amount. I need a query that can find the most recent inventory reduction and return the amount of that reduction.
My table holds the date and amount for numerous item IDs. In theory, what I am trying to do is select all amounts and dates for a given item ID, and then find the most recent reduction between two days' inventory. Because multiple items are tracked, there is no guarantee that the ID column will be consecutive for a set of items.
Researching a solution to this has been completely overwhelming. It seems like window functions might be a good route to try, but I have never used them and don't really have a concept of where to start.
While I could easily return the amounts and do the calculation in PHP, I feel the right thing to do here is to harness SQL, but my experience with more complex queries is limited.
ID  | ItemID | Date       | Amount
1   | 2      | 2019-05-05 | 25
7   | 2      | 2019-05-06 | 26
34  | 2      | 2019-05-07 | 14
35  | 2      | 2019-05-08 | 15
67  | 2      | 2019-05-09 | 16
89  | 2      | 2019-05-10 | 5
105 | 2      | 2019-05-11 | 6
Given the data above, it would be nice to see a result like:
item id | date       | reduction
2       | 2019-05-10 | 11
This is because the most recent inventory reduction is between id 67 and 89 and the amount of the reduction is 11 on May 10th 2019.
In MySQL 8+, you can use lag():
-- prev_amount is the previous row's amount for the same item;
-- a "reduction" is any row where the amount dropped from that prior row
select t.*, (prev_amount - amount) as reduction
from (select t.*,
             lag(amount) over (partition by itemid order by date) as prev_amount
      from t
     ) t
where prev_amount > amount   -- keep only the rows where inventory dropped
order by date desc           -- most recent reduction first
limit 1;
Hope you are well today.
I need some help with this case: for each day and each taxiID, I need the sum of the hours worked with passengers (ent_pickup_time to ent_dropoff_time) and without passengers (ent_requested_time to ent_pickup_time).
For example, taxi 003001 on March 3, 2015 worked 3 hours with a passenger and 3.5 hours without a passenger.
I have tried a million queries, and none of them has worked so far :(
My last query:
SELECT substring(hex(ent_id), 1, 3) AS fleetId, substring(hex(ent_id), 4, 16) AS taxiId,
       ent_requested_time, ent_pickup_time, ent_dropoff_time, ent_fare, ent_distance,
       SEC_TO_TIME(SUM(TIME_TO_SEC(ent_dropoff_time) - TIME_TO_SEC(ent_pickup_time))) AS Con_Pasajero,
       SEC_TO_TIME(SUM(TIME_TO_SEC(ent_pickup_time) - TIME_TO_SEC(ent_requested_time))) AS Sin_pasajero
FROM tf_entities
WHERE DATE(ent_requested_time) = '2015-03-03'
GROUP BY fleetId, taxiId
ORDER BY taxiId ASC
In this query I have to hard-code the date and sum the time differences for that one day, but I want to automate the date grouping.
My desired result would be something like this, ordered by date and taxiId, for example:

Id | Taxi_Id | Date     | time_without_passenger | time_with_passenger | total_time | ent_fare_total | ent_distance_total
03 | 003001  | 2015-3-3 | 00:30:00               | 01:02:00            | 01:32:00   | 40,000         | 20,000
The time without a passenger is the difference between ent_requested_time and ent_pickup_time, and the time with a passenger is between ent_pickup_time and ent_dropoff_time. The total time is the sum of the two.
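Here is a hedged sketch of one way to get this in MySQL, assuming the tf_entities table from the query above and grouping by day instead of hard-coding a single date (the fleetId/taxiId derivations follow the question's hex() convention):

SELECT DATE(ent_requested_time) AS work_date,
       SUBSTRING(HEX(ent_id), 1, 3) AS fleetId,
       SUBSTRING(HEX(ent_id), 4, 16) AS taxiId,
       -- request -> pickup: driving without a passenger
       SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, ent_requested_time, ent_pickup_time))) AS time_without_passenger,
       -- pickup -> dropoff: driving with a passenger
       SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, ent_pickup_time, ent_dropoff_time))) AS time_with_passenger,
       -- request -> dropoff: total of the two
       SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, ent_requested_time, ent_dropoff_time))) AS total_time,
       SUM(ent_fare) AS ent_fare_total,
       SUM(ent_distance) AS ent_distance_total
FROM tf_entities
GROUP BY work_date, fleetId, taxiId
ORDER BY work_date, taxiId;

Using TIMESTAMPDIFF on the full datetimes, rather than TIME_TO_SEC on the time parts, also avoids wrong results for trips that cross midnight.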
Data:
Customer | Ship_Date | Ship_Weight
Peter    | 08/01/14  | 120
Peter    | 08/01/14  | 285

How do I summarize these two rows to get an answer by date:

Customer | Ship_Date | Ship_Weight
Peter    | 08/01/14  | 405
As you can see, there are multiple shipments on a single day. I want to summarize it to show unique ship dates with total ship weight.
I am using MS Access 2007.
SELECT Customer, Ship_Date, Sum(Ship_Weight) as Sum_Weight
From tblMyTable
Group By Customer, Ship_Date
You're going to need to make sure your Ship_Date is in Date format only and not DateTime, otherwise it will group by both Date and Time. If necessary, you may need to format that within the query.
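If Ship_Date does carry a time component, one hedged option (still assuming the tblMyTable name from above) is to strip it with Access's DateValue() function before grouping:

SELECT Customer, DateValue(Ship_Date) AS Ship_Day, Sum(Ship_Weight) AS Sum_Weight
FROM tblMyTable
GROUP BY Customer, DateValue(Ship_Date)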
Compliment of the day.
Based on the previous feedback received: after creating a ticket sales database in MS Access, I want to use a single form to query the price of a particular ticket for a particular month and have the price displayed back in the form in a text field or label.
Below are the sample tables and the query used.
CompanyTable
CompID | CompName
A      | Ann
B      | Bahn
C      | Can
KK     | Seven
...    | ...
TicketTable
TicketCode | TicketDes
10         | Two people
11         | Monthly
12         | Weekend
14         | Daily
TicketPriceTable
ID | TicketCode | Price  | ValidFrom
1  | 10         | $35.50 | 8/1/2010
2  | 10         | $38.50 | 8/1/2011
3  | 11         | $20.50 | 8/1/2010
4  | 11         | $25.00 | 11/1/2011
5  | 12         | $50.50 | 12/1/2010
6  | 12         | $60.50 | 1/1/2011
7  | 14         | $15.50 | 2/1/2010
8  | 14         | $19.00 | 3/1/2011
9  | 10         | $40.50 | 4/1/2012
Used query:
SELECT TicketPriceTable.Price
FROM TicketPriceTable
WHERE (((TicketPriceTable.ValidFrom)=[DATE01]) AND ((TicketPriceTable.TicketCode)=[TCODE01]));
In MS Access, small boxes pop up to enter the parameters when running the query. How can I use a single form to enter the parameters for [DATE01] and [TCODE01], and have the price displayed in the same form in a text field (for further calculations)?
That is, a 'Month' field supplies the input to the [DATE01] parameter,
a 'Ticket Code' field supplies the input for the [TCODE01] parameter,
and a text field shows the output of the query result (the ticket price).
If possible, I would like to use only the month and year, in the format MM/YYYY; the day is not necessary. How can I achieve this in MS Access?
If you have any questions, please don't hesitate to ask.
Thanks very much for your time and anticipated feedback.
You can refer to the values in the form fields by using expressions like: [Forms]![NameOfTheForm]![NameOfTheField]
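Applied to the query above, it could look like this (a sketch; frmTicketPrice, txtDate01, and txtTCode01 are placeholder names for your form and its controls):

SELECT TicketPriceTable.Price
FROM TicketPriceTable
WHERE TicketPriceTable.ValidFrom = [Forms]![frmTicketPrice]![txtDate01]
  AND TicketPriceTable.TicketCode = [Forms]![frmTicketPrice]![txtTCode01];

With the query bound this way, Access reads the parameters from the open form instead of popping up the parameter boxes.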
Entering up to 300 different types of tickets
(Answer to your comment referring to "Accessing data from a ticket database, based on months in MS Access".)
You can use Cartesian products to create a lot of records. If you select two tables in a query but do not join them, the result is a Cartesian product, which means that every record from one table is combined with every record from the other.
Let's add a new table called MonthTable
MonthNr MonthName
1 January
2 February
3 March
... ...
Now if you combine this table containing 12 records with your TicketTable containing 4 records, you will get a result containing 48 records
SELECT M.MonthNr, M.MonthName, T.TicketCode, T.TicketDes
FROM MonthTable M, TicketTable T
ORDER BY M.MonthNr, T.TicketCode
You get something like this
MonthNr MonthName TicketCode TicketDes
1 January 10 Two people
1 January 11 Monthly
1 January 12 Weekend
1 January 14 Daily
2 February 10 Two people
2 February 11 Monthly
2 February 12 Weekend
2 February 14 Daily
3 March 10 Two people
3 March 11 Monthly
3 March 12 Weekend
3 March 14 Daily
... ... ... ...
You can also get the currently valid price for each ticket type like this:
SELECT T.TicketCode, T.Price, X.ActualPeriod AS ValidFrom
FROM (SELECT TicketCode, MAX(ValidFrom) AS ActualPeriod
      FROM TicketPriceTable
      WHERE ValidFrom <= Date()
      GROUP BY TicketCode) X
INNER JOIN TicketPriceTable T
   ON X.TicketCode = T.TicketCode AND X.ActualPeriod = T.ValidFrom
The WHERE ValidFrom <= Date() is there in case you have entered future prices.
Here the subquery selects the actually valid period, i.e. the ValidFrom that applies for each TicketCode. If you find sub-selects a bit confusing, you can also store them as a saved query in Access (or a view in MySQL) and base a subsequent query on them. This has the advantage that you can create them in the query designer.
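For example, a sketch using a hypothetical saved query named qryActualPeriod:

SELECT TicketCode, MAX(ValidFrom) AS ActualPeriod
FROM TicketPriceTable
WHERE ValidFrom <= Date()
GROUP BY TicketCode

Save that as qryActualPeriod in Access; the join then becomes:

SELECT T.TicketCode, T.Price, X.ActualPeriod AS ValidFrom
FROM qryActualPeriod X
INNER JOIN TicketPriceTable T
   ON X.TicketCode = T.TicketCode AND X.ActualPeriod = T.ValidFrom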
Consider not creating all your 300 records physically, but instead generating them dynamically from a Cartesian product.
I'll let you put all the pieces together now.
In Access Forms you can set the RecordSource to be a query, not only a table. This can be either the name of a stored query or a SQL statement. This allows you to have controls bound to different tables through this query.
You can also place subforms on the main form that are bound to other tables than the main form.
You can also display the result of an expression in a TextBox by setting its ControlSource to an expression starting with an equal sign (note that in a ControlSource expression you refer to a control as [cboTicketCode]; Me! only works in VBA code):
=DLookUp("Price", "TicketPriceTable", "TicketCode=" & [cboTicketCode])
You can set the Format property of a TextBox to MM\/yyyy, or use the Format function:
s = Format$(Now, "MM\/yyyy")
I've got a table which keeps track of article views. It has the following columns:
id, article_id, day, month, year, views_count.
Let's say I want to track daily views, each day, for every article. If I have 1,000 user-written articles, the number of rows would come to:
365 (1 year) * 1,000 => 365,000
Which is not too bad. But say the number of articles grows to 1M, and three years pass. The number of rows would then come to:
365 * 3 * 1,000,000 => 1,095,000,000
Obviously, over time, this table will keep growing, and quite fast. What problems will this cause? Or should I not worry, since RDBMSs commonly handle situations like this?
I plan on using the views data in our reports, broken down by months or even years. Should I worry about 1B+ rows in a table?
The question to ask yourself (or your stakeholders) is: do you really need 1-day resolution on older data?
Have a look into how products like MRTG, via RRD, do their logging. The theory is you don't store all the data at maximum resolution indefinitely, but regularly aggregate them into larger and larger summaries.
That allows you to have 1-second resolution for perhaps the last 5-minutes, then 5-minute averages for the last hour, then hourly for a day, daily for a month, and so on.
So, for example, if you have a bunch of records like this for a single article:
year | month | day | count | type
-----+-------+-----+-------+------
2011 | 12 | 1 | 5 | day
2011 | 12 | 2 | 7 | day
2011 | 12 | 3 | 10 | day
2011 | 12 | 4 | 50 | day
You would then, at regular intervals, create a new record (or records) that summarises these data; in this example, just the total count for the month:
year | month | day | count | type
-----+-------+-----+-------+------
2011 | 12 | 0 | 72 | month
Or the average per day:
year | month | day | count | type
-----+-------+-----+-------+------
2011 | 12 | 0 | 2.3 | month
Of course, you may need some flag to indicate the "summarised" status of the data; here I've used a 'type' column to distinguish the "raw" records from the processed ones, which lets you purge the day records as required (a purge sketch follows the INSERT below).
INSERT INTO statistics (article_id, year, month, day, count, type)
SELECT article_id, year, month, 0, SUM(count), 'month'
FROM statistics
WHERE type = 'day'
GROUP BY article_id, year, month
(I haven't tested that query, it's just an example)
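Once a month has been summarised, the purge could look like this (a hedged sketch, assuming MySQL and that every day record before the current month is safe to remove):

DELETE FROM statistics
WHERE type = 'day'
  AND (year < YEAR(CURDATE())
       OR (year = YEAR(CURDATE()) AND month < MONTH(CURDATE())));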
The answer is "it depends", but yes, it will probably be a lot to deal with.
However, this is generally a "cross that bridge when you come to it" problem. It's a good idea to think about what you could do if this becomes a problem in the future, but it's probably too early to actually implement any suggestions until they're necessary.
My suggestion, if it ever comes to that, is to not keep the individual records for longer than X months (where you adjust X according to your needs). Instead, store the aggregated data that you currently feed into your reports: run, say, a daily script that looks at your records, grabs any that are over X months old, creates a "daily_stats" record of some sort, and then deletes the originals (or better yet, archives them somewhere).
This ensures that only X months' worth of data are ever in the db, while you still have quick access to an aggregated form of the stats for long-timeline reports.
It's not something you need to worry about if you put some practices in place:
- Partition the table; this should make archiving easier to do (see the sketch after this list)
- Determine how much data you need at present
- Determine how much data you can archive
- Ensure that the table has the right build, in terms of data types and indexes
- Schedule a time when you will archive partitions that meet the aging requirements
- Schedule index checking (and other table checks)
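For instance, a hedged sketch of the partitioning idea, assuming MySQL range partitioning and a hypothetical article_views schema:

CREATE TABLE article_views (
    article_id  INT UNSIGNED NOT NULL,
    view_date   DATE NOT NULL,
    views_count INT UNSIGNED NOT NULL DEFAULT 0,
    -- the partition key must be part of every unique key
    PRIMARY KEY (article_id, view_date)
)
PARTITION BY RANGE (YEAR(view_date)) (
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION p2012 VALUES LESS THAN (2013),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

Archiving a whole year later becomes a cheap metadata operation:

ALTER TABLE article_views DROP PARTITION p2011;

(Dropping a partition discards its rows, so export them first if you need an archive.)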
If you have a DBA in your team, then you can discuss it with him/her, and I'm sure they'll be glad to assist.
Also, as in many data warehouses (and I just saw @Taryn's post, which I agree with): store aggregated data as well. What to aggregate follows readily from the data you keep in the involved table. If editing/updating of records is a concern, that is all the more reason to set restrictions on how much raw data to keep (that being the data which can still be modified) and to have procedures and jobs in place that check/update the aggregated data daily, with a manual check/update whenever changes are made. That way, data integrity is maintained. Discuss with your DBA what other approaches you can take.
By the way, in case you didn't already know: aggregated data is normally what's needed for weekly or monthly reports, and many other interval-based reports. Granularize your aggregation as needed, but not so much that it becomes tedious or excessive.