I'm designing a statistics tracking system for a sales organization that manages 300+ remote sales locations around the world. The system receives daily reports on sales figures (raw dollar values, and info-stats such as how many of X item were sold, etc.).
I'm using MAMP to build the system.
I'm planning on storing these figures in one big MySQL table, so each row is one day's statistics from one location. Here is a sample:
------------------------------------------------------------------
| LocationID | Date | Sales$ | Item1Sold | Item2Sold | Item3Sold |
------------------------------------------------------------------
| Hawaii     | 3/4  | 100    | 2         | 3         | 4         |
| Turkey     | 3/4  | 200    | 1         | 5         | 9         |
------------------------------------------------------------------
Because the organization will potentially receive a statistics update from each of the 300 locations on a daily basis, I estimate that within a month the table will have around 9,000 records and within a year around 108,000. MySQL table partitioning by year should therefore keep each query scanning on the order of 100,000 records, which I think will allow steady performance over time.
(If anyone sees a problem with the theories in my above 'background data', feel free to mention them as I have no experience with building a large-scale database and this was simply what I have gathered with searching around the net.)
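For reference, here is roughly what I mean by year-based partitioning. This is only a sketch, and the table and column names (daily_figures, report_date, etc.) are placeholders I made up, not a finished schema:

    -- Sketch: the big daily table, partitioned by year.
    -- MySQL requires the partitioning column to be part of every unique key,
    -- hence the composite primary key.
    CREATE TABLE daily_figures (
        location_id INT           NOT NULL,
        report_date DATE          NOT NULL,
        sales_total DECIMAL(12,2) NOT NULL,
        item1_sold  INT           NOT NULL DEFAULT 0,
        item2_sold  INT           NOT NULL DEFAULT 0,
        item3_sold  INT           NOT NULL DEFAULT 0,
        PRIMARY KEY (location_id, report_date)
    )
    PARTITION BY RANGE (YEAR(report_date)) (
        PARTITION p2012 VALUES LESS THAN (2013),
        PARTITION p2013 VALUES LESS THAN (2014),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );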
Now, on the front end, this system is web-based and built primarily in PHP. I plan on using the YUI framework I found online to display the graphs.
What the organization needs to see is daily/weekly graphs of the sales figures of their remote locations, along with 'breakdown' statistics such as items sold (so you can "drill down" into a monetary graph and see what percentage of that income came from item X).
So if I have the statistics by LocationID, it's a fairly simple matter to organize this information by continent. If the system needs to display a graph of the sales figures for all locations in Europe, I can run a query that JOINs a dimension table mapping each LocationID to its "continent" category, sum those figures by date, and display them on the graph. Or, to display weekly information, sum all of the daily reports in a given week and return them to my JS graph object as a JSON array, voila. Pretty simple stuff as far as I can see.
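As a sketch of the kind of query I mean (the names daily_figures, dim_location, continent, etc. are placeholders of mine, not an existing schema):

    -- Weekly sales totals for all European locations.
    SELECT YEARWEEK(df.report_date) AS year_week,
           SUM(df.sales_total)      AS weekly_sales
    FROM   daily_figures df
    JOIN   dim_location  dl ON dl.location_id = df.location_id
    WHERE  dl.continent = 'Europe'
    GROUP  BY YEARWEEK(df.report_date)
    ORDER  BY year_week;

The PHP layer would then just json_encode() the result rows and hand them to the YUI chart.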
Now, my thought was to create "summary" tables for these common queries. When the user wants to pull up the last 3 months of sales for Africa, the query has to go all the way down to the daily level, apply various WHERE and JOIN clauses, sum up the appropriate LocationIDs' figures on a weekly basis, and then display to the user... it just seemed more efficient to have a less granular table. Such a table would need to be automatically updated as new daily reports arrive in the main table.
Here's the sort of hierarchy of data that would then need to exist:
1) Daily Figures by Location
2) Daily Figures by Continent based on Daily Figures by Location
3) Daily Figures for Planet based on Daily Figures by Continent
4) Weekly Figures by Location based on Daily Figures by Location
5) Weekly Figures by Continent based on Weekly Figures by Location
6) Weekly Figures for Planet based on Weekly Figures by Continent
So we have a kind of tree here, with the most granular information at the bottom (in one table, admittedly) and a series of less and less granular tables so that it is easier to fetch the data for long-term queries (partitioning the Daily Figures table by year will be useless if it receives queries for 3 years of weekly figures for the planet).
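To make the summary-table idea concrete, a weekly-by-continent table might look something like the sketch below (again with made-up names; item2/item3 columns would follow the same pattern as item1). The INSERT ... ON DUPLICATE KEY UPDATE shows one way a single day's figures could be rolled up into it, assuming each day is rolled up exactly once:

    CREATE TABLE weekly_continent_figures (
        continent   VARCHAR(32)   NOT NULL,
        year_week   INT           NOT NULL,   -- e.g. 201215, as returned by YEARWEEK()
        sales_total DECIMAL(14,2) NOT NULL DEFAULT 0,
        item1_sold  INT           NOT NULL DEFAULT 0,
        PRIMARY KEY (continent, year_week)
    );

    -- Roll one day's new figures up into the weekly summary.
    INSERT INTO weekly_continent_figures (continent, year_week, sales_total, item1_sold)
    SELECT dl.continent,
           YEARWEEK(df.report_date),
           SUM(df.sales_total),
           SUM(df.item1_sold)
    FROM   daily_figures df
    JOIN   dim_location  dl ON dl.location_id = df.location_id
    WHERE  df.report_date = '2012-04-12'
    GROUP  BY dl.continent, YEARWEEK(df.report_date)
    ON DUPLICATE KEY UPDATE
           sales_total = sales_total + VALUES(sales_total),
           item1_sold  = item1_sold  + VALUES(item1_sold);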
Now, first question: is this necessary at all? Is there a better way to achieve broad-scale query efficiency in the scenario I'm describing?
Assuming that there is no particularly better way to do this, how should I go about it?
I discovered MySQL Triggers, which to me would seem capable of 'cascading the updates' as it were. After an INSERT into the Daily Figures table, a trigger could theoretically read the information of the inserted record and, based on its values, call an UPDATE on the appropriate record of the higher-level table. I.e., $100 made in Georgia on April 12th would prompt the United States table's 'April 10th-April 17th' record to UPDATE with a SUM of all of the daily records in that range, which would of course see the newly entered $100 and the new value would be correct.
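A bare-bones version of such a trigger might look like the sketch below (all names are made up; it adds the new day's figure into the matching weekly summary row rather than re-summing the whole range, and it looks the continent up from a dimension table at insert time):

    DELIMITER $$

    CREATE TRIGGER trg_daily_figures_ai
    AFTER INSERT ON daily_figures
    FOR EACH ROW
    BEGIN
        DECLARE v_continent VARCHAR(32);

        -- Which continent does the inserted location belong to?
        SELECT continent INTO v_continent
        FROM   dim_location
        WHERE  location_id = NEW.location_id;

        -- Add the day's figure to the weekly summary, creating the
        -- weekly row if this is the first report of that week.
        INSERT INTO weekly_continent_figures (continent, year_week, sales_total)
        VALUES (v_continent, YEARWEEK(NEW.report_date), NEW.sales_total)
        ON DUPLICATE KEY UPDATE
               sales_total = sales_total + NEW.sales_total;
    END$$

    DELIMITER ;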
Okay, so that's theoretically possible, but it seems too hard-coded. I want to build the system so that the organization can add/remove locations and set which continent they are in, which would mean the triggers would have to be reconfigured whenever a LocationID is added or moved. Since MySQL doesn't allow multiple triggers for the same event on the same table, I would have to either store the trigger definitions separately, or extract them from the trigger object and parse the particular rule being added or removed in or out, or keep an external array that I handle with PHP before this step, or... basically, a ton of annoying work.
While MySQL triggers initially seemed like my salvation, the more I look into how tricky it will be to implement them in the way that I need, the more it seems like I am totally off the mark in how I am going about this, so I wanted to get some feedback from more experienced database people.
While I would appreciate intelligent answers with technical advice on how to accomplish what I'm trying to do, I will more deeply appreciate wise answers that explain the correct action (even if it's what I'm doing) and why it is correct.
Related
I have a theory question. I have an Access database and I want to track cost by task. Currently I have a task tracker table that stores the user's Hours|HourlyRate and Overtime|OvertimeRate, among other things (work order no, project no, etc.). I don't think this is the best way to store this data, as users could look at the table and see each other's rates; before now it didn't matter much, but I'm about to give this database to more users.
I was thinking of moving the rate data into a separate table linked to the ID of the task table and not giving users access to it, but then I couldn't do an after-update event, as the user won't have permission to write to that table. Either that, or store the rates in a separate database with a start and end date for each rate. For instance:
Ed | Rate $0.01 | StartDate 01/01/1999 | EndDate 12/31/1999
Ed | Rate $0.02 | StartDate 01/01/2000 | EndDate 12/31/2000
This way I can store the costing data in a separate database that the users don't have access to, and just calculate the cost information every time I need it, based on the date and the unique user ID. I was wondering what solutions others have come up with in MS Access for this type of situation.
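For illustration, the on-the-fly calculation I have in mind would be something like this (generic SQL with made-up table and column names; in Access it would be written as an INNER JOIN query, and the OvertimeRate column is my own addition to the hypothetical rates table):

    -- Cost per task, using whichever rate was in effect on the task date.
    SELECT t.TaskID,
           t.WorkOrderNo,
           (t.Hours * r.Rate) + (t.Overtime * r.OvertimeRate) AS TaskCost
    FROM   tblTaskTracker AS t
           INNER JOIN tblRates AS r ON r.UserID = t.UserID
    WHERE  t.TaskDate BETWEEN r.StartDate AND r.EndDate;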
I am fairly new to MySQL. I am working on a project where I want to evaluate the economic efficiency of certain goods in Python. For my first tries, I held my data in a CSV file. I now want to move on to a solid database solution and decided I would use MySQL.
The data consists, for example, of the following columns:
id, GoodName, GoodWeightRaw, GoodWeightProcessed, RawMaterial, SellingPrice.
The "problem" is that there are simple goods and there are combined goods.
Say:
id 1 - copper
id 2 - plastics
Somewhere further down we might have
id 50 - copper cable
Copper cables are made from copper and plastics - therefore the RawMaterial of id 50 would be the goods id 1 and id 2. Also, the raw weight of the copper cables would be the processed weight of the copper plus the processed weight of the plastics.
In the CSV file I use at the moment, those values have all been "hard coded", and if some values of the basic materials change, I have to look up which combined goods they are used in and change the values by hand accordingly.
I wonder whether there is a way in SQL to automatically compute values in a row from values in another row, and to have them updated every time the other rows change.
So far I tried:
First I thought I might create two tables, for basic and combined goods; however, the combined goods would still not update themselves.
I found out that I can create table rows from SELECT statements, and create combined goods from a combination of basic goods this way. However, those rows are also "permanent" once created and would still have to be updated manually.
So is there a clean best practice in SQL for rows created from other rows, which are then updated accordingly when the rows they depend on change?
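One direction I have been considering (I don't know whether it is the idiomatic one, and all names below are made up) is to store only the composition of a combined good and derive its values with a view, so they are recomputed from the basic goods on every read:

    -- Which basic goods a combined good is made from.
    CREATE TABLE good_components (
        combined_id  INT NOT NULL,   -- e.g. 50 = copper cable
        component_id INT NOT NULL,   -- e.g. 1 = copper, 2 = plastics
        PRIMARY KEY (combined_id, component_id)
    );

    -- Raw weight of each combined good = sum of its components' processed
    -- weights; always current because it is computed when queried.
    CREATE VIEW combined_good_weights AS
    SELECT gc.combined_id             AS id,
           SUM(g.GoodWeightProcessed) AS GoodWeightRaw
    FROM   good_components gc
    JOIN   goods g ON g.id = gc.component_id
    GROUP  BY gc.combined_id;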
This is basically a question about what the language is capable of.
I am developing a database to keep track of market trades and provide useful metrics to the user. Most brokers do not supply enough information in the transaction .csv file (which would be imported into this database) to combine strategies and positions in a useful way, or in a way that I envision being useful for users. For instance: combining a buy order on AAPL of 1000 shares that was filled in three separate transactions on the same order (1 order for 1000 shares, filled initially with 200, then 350, then 450 shares). I could only think of assigning a trade group to each of these so that I can group them.
So, in the example above, each transaction would be a separate record; I've created a column in my table with the alias of Trade Group, and each transaction would be assigned 1 under the Trade Group column. The sale of the 1,000 shares, no matter how many transactions it took to fill the order, would also be assigned to trade group 1.
My query combines the shares for both opening and closing transactions by using the trade group and transaction type (buy or sell). If there is a match, 1,000 shares on the buy side, and 1,000 shares on the sell side, then it runs some queries to provide useful data about the trade.
The problem I foresee is that the trade grouping can become cumbersome, since it currently has to be manually input. I would like to develop a counter to automatically increment the trade group of the opening and closing transactions every time the balance of shares reaches 0.
So if both the buy and sell of the above example belonged to trade group 1, and now I decide to open up a position of 2000 shares of AAPL, and subsequently sell them, those transactions would be automatically assigned trade group 2. And now that the balance of shares is 0 again, the next time I open and close a position on AAPL it will be assigned trade group 3.
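To make the idea concrete, here is roughly the sort of query I imagine, assuming window functions are available (MySQL 8+ or similar) and using made-up table and column names:

    -- Derive the trade group at query time: the running share balance per
    -- symbol returns to zero at the end of each round trip, and the next
    -- transaction after a zero balance starts a new group.
    WITH running AS (
        SELECT t.*,
               SUM(CASE WHEN side = 'BUY' THEN shares ELSE -shares END)
                   OVER (PARTITION BY symbol
                         ORDER BY trade_time, trade_id) AS balance
        FROM   transactions t
    )
    SELECT trade_id, symbol, trade_time, side, shares,
           1 + COALESCE(
                 SUM(CASE WHEN balance = 0 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY symbol
                         ORDER BY trade_time, trade_id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                 0)                                     AS trade_group
    FROM   running
    ORDER  BY symbol, trade_time, trade_id;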
That way, I don't need to clutter up my table with something that is manually input which can create mistakes. Instead, the query assigns the trade grouping every time it is run, and the necessary metrics supplied.
Is this something that can be done using SQL alone?
Thanks.
I'm designing an Access .accdb for project management. The project contract stipulates a number of milestones for each project, each with an associated date. The exact number of milestones depends on an "either/or" case of project size, but there is a maximum of 6.
My employer would like to track a [Forecast] date, [Actual] date and [Paid] date for each milestone, meaning a large-sized project (6 milestones, 4 dates each including the contract date) ends up with 24 dates associated with it, often duplicated (if a project runs to time, all four dates will be identical).
Currently, I have tblMilestones, which has a FK linking to tblProject and a record for each Milestone, with the 4 associated dates as fields in the record and a field to mark the milestone as complete or current.
I feel like we're collecting, storing and entering a lot of pretty pointless data - especially the [Forecast] date, for which we collect data from our project managers (not the most reliable of data anyway). Once the milestone is complete and the [Actual] date is entered, the [Forecast] date is pretty meaningless.
I'd rather have the contract date in one table, entered when a new project is added; a reporting table for the changeable forecast date; the Actual date set when the user marks the milestone as complete; and the Paid date drawn from transaction records.
Is this a better design approach? The db is small - less than 50 projects, so part of me thinks I'd just be making things more complicated than they need to be, especially in terms of the extra UI required.
Take a page out of dimensional data warehouse design and store dates in their own table with a DateID:
DateID DateValue
------ ----------
     1 2000-01-01
   ... ...
  9999 2012-12-31
Then turn all your date fields--Forecast, Actual, Paid, etc.--into foreign key references to the date table's DateID field.
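As a sketch (generic SQL; Access DDL wants named CONSTRAINT clauses, and the names here are just examples):

    CREATE TABLE Dates (
        DateID    INT  NOT NULL PRIMARY KEY,
        DateValue DATE NOT NULL UNIQUE        -- the index on DateValue mentioned below
    );

    CREATE TABLE tblMilestones (
        MilestoneID    INT NOT NULL PRIMARY KEY,
        ProjectID      INT NOT NULL,          -- FK to tblProject
        ForecastDateID INT,
        ActualDateID   INT,
        PaidDateID     INT,
        FOREIGN KEY (ForecastDateID) REFERENCES Dates (DateID),
        FOREIGN KEY (ActualDateID)   REFERENCES Dates (DateID),
        FOREIGN KEY (PaidDateID)     REFERENCES Dates (DateID)
    );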
To populate the dates table you can go two ways:
Use some VBA to generate a large set of dates, say 2005-01-01 to 2100-12-31, and insert them into the dates table as a one-time operation.
Whenever someone types in a new date, check the dates table to see if it already exists, and if not, insert it.
Whichever way you do it, you'll obviously need an index on DateValue.
Taking a step back from the actual question, I'm realising that you're trying to fit two different uses into the same database--regular transactional use (as your project management app) and analytical use (tracking several different dates for your milestones--in other words, the milestone completion date is a Slowly Changing Dimension). You might want to consider splitting up these two uses into a regular transactional database and a data warehouse for analysis, and setting up an ETL process to move the data between them.
This way you can track only a milestone completion date and a payment date in your transactional database, while the data warehouse captures changes to the completion date over time, allowing you to do analysis and forecasting without bogging down the performance of the transactional (application) database.
We don't have any existing data warehouse, but we have customers (in the OLTP system) who have been with us for many years and made purchases. How can I populate a customer dimension and then "replay" all the age updates that have occurred over the years, so that the type 2 dimension will have all the updates for those customers?
I want to populate the fact table with sales and refer to the DimCustomerFK, but when our clients query the data I want those customers to have the correct age. If I don't make any changes, a customer will have the same age now as 10 years back, when he placed his first order.
Any ideas how this can be done?
Interesting problem Patrik.
Some options:-
1) Design SQL to parse through your customer/transaction OLTP data to create a daily flat file of customer updates. You will end up with many thousands of fairly small files (obviously depending on the number of customers you have and the date range). Name them Customeryyyymmdd.csv. Then create an ETL suite to read in the flat files in forward date order and apply the type 2 changes, in order, to the DWH.
2) Build a very complex SQL query (I'm waving my hands around here as I don't know your data structures, so I couldn't say how complex this would be) that creates an ordered customer change list that you can pass through an ETL SCD component record by record.
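For example (hand-waving again: assuming a Customers table holding a DateOfBirth and an Orders table holding an OrderDate, and noting the date-diff function varies by platform), the ordered change list for the age attribute could be derived along these lines, one row per customer per distinct age with the date it first applied:

    SELECT c.CustomerID,
           TIMESTAMPDIFF(YEAR, c.DateOfBirth, o.OrderDate) AS AgeAtOrder,
           MIN(o.OrderDate)                                AS EffectiveFrom
    FROM   Customers c
    JOIN   Orders    o ON o.CustomerID = c.CustomerID
    GROUP  BY c.CustomerID,
              TIMESTAMPDIFF(YEAR, c.DateOfBirth, o.OrderDate)
    ORDER  BY c.CustomerID, EffectiveFrom;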
Either seems logically feasible given what you have said so far; hopefully that gives you some ideas to consider towards a more concrete solution.
g/l
Mark.