I am trying to store approximately 11 billion records in the destination table by querying various source tables via inner join.
I need to store the data split by year and quarter, and I need to keep data from the year 2000 onward.
So I would have tables such as:
FinData2015_1
FinData2015_2
FinData2015_3
FinData2015_4
FinData2016_1
FinData2016_2
FinData2016_3
FinData2016_4
I planned to create the physical tables upfront, from the year 2000 through 50 years from now, and implement a Conditional Split component in SSIS.
So I would create 67 tables in all, with 67 split conditions.
See the screenshot below
Is there a better way to do this, i.e. creating each table dynamically only if data for that year and quarter exist?
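One alternative to pre-creating decades of mostly empty tables is to have the load process create each quarterly table only when data for it actually arrives. Below is a rough T-SQL sketch of that idea; the procedure name and the column list are placeholders, not your real destination schema.

-- Creates FinData<year>_<quarter> only if it does not exist yet; the SSIS package
-- can call this from an Execute SQL Task before the data flow runs.
CREATE PROCEDURE dbo.EnsureFinDataTable
    @Year INT,
    @Quarter INT
AS
BEGIN
    DECLARE @TableName SYSNAME =
        N'FinData' + CAST(@Year AS NVARCHAR(4)) + N'_' + CAST(@Quarter AS NVARCHAR(1));

    IF OBJECT_ID(N'dbo.' + @TableName, N'U') IS NULL
    BEGIN
        DECLARE @Sql NVARCHAR(MAX) =
            N'CREATE TABLE dbo.' + QUOTENAME(@TableName) + N' (
                RecordId BIGINT NOT NULL,
                FinYear  INT NOT NULL,
                FinQtr   TINYINT NOT NULL
                -- remaining columns as in the destination schema
            );';
        EXEC sp_executesql @Sql;
    END
END

The destination table name in the data flow could then be supplied through an expression rather than a fixed split output per table. Whether this beats creating everything upfront depends on how much of the 50-year range will ever hold data; table partitioning on a date column is also worth a look if your SQL Server edition supports it.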
I have a table (table1) in a MySQL database with data similar to this:
Category   total_sold   revenue   profit
fruit      32           200       150
veggies    12           50        23
chips      23           170       110
fruit      43           300       180
chips      5            25        15
New data is inserted into the table regularly through a Python script using SQLAlchemy (it appends CSV files to the MySQL table with pandas).
I want to create an aggregated table (table2) based on Category using a query, and I want the aggregation to update whenever I add new records to table1.
For example after executing the query:
CREATE TABLE table2 AS
SELECT Category, AVG(total_sold) avg_sold, AVG(revenue) avg_revenue, AVG(profit) avg_profit
FROM table1
GROUP BY 1
I get table2 as:
Category   avg_sold   avg_revenue   avg_profit
fruit      37.5       250           165
veggies    12         50            23
chips      14         97.5          62.5
Now if I insert several new records to table1, how can I make table2 update automatically? (recalculate all aggregations based on all table1 records including the newly added ones)
I read about creating a trigger, but I'm not sure how to execute it or if it would work in a case like this.
I would appreciate any help.
MySQL does not support materialized views, so you have to do the aggregation yourself: build a separate Summary Table and keep it updated. A simple IODKU (INSERT ... ON DUPLICATE KEY UPDATE) is one way to do that maintenance; the link discusses others, such as a nightly batch update of the day's data.
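As a sketch of that approach, using the column names from table1 above: keep running counts and sums per category in the summary table and compute the averages when you read them, since an average by itself cannot be updated incrementally. The batch_id filter is hypothetical; substitute whatever identifies the rows just appended.

-- Summary table: one row per category with a running count and running sums.
CREATE TABLE table2 (
    Category    VARCHAR(50) NOT NULL PRIMARY KEY,
    row_count   BIGINT NOT NULL,
    sum_sold    DOUBLE NOT NULL,
    sum_revenue DOUBLE NOT NULL,
    sum_profit  DOUBLE NOT NULL
);

-- Run after each batch append from the Python script, limited to the newly
-- loaded rows (batch_id is a hypothetical column; use whatever identifies them).
INSERT INTO table2 (Category, row_count, sum_sold, sum_revenue, sum_profit)
SELECT Category, COUNT(*), SUM(total_sold), SUM(revenue), SUM(profit)
FROM table1
WHERE batch_id = ?
GROUP BY Category
ON DUPLICATE KEY UPDATE
    row_count   = row_count   + VALUES(row_count),
    sum_sold    = sum_sold    + VALUES(sum_sold),
    sum_revenue = sum_revenue + VALUES(sum_revenue),
    sum_profit  = sum_profit  + VALUES(sum_profit);

-- Averages on demand:
SELECT Category,
       sum_sold / row_count    AS avg_sold,
       sum_revenue / row_count AS avg_revenue,
       sum_profit / row_count  AS avg_profit
FROM table2;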
I have a table of "outcomes" of scores for a few hundred people on specific days. E.g.
Person   Date       Score
1        1/1/2021   10
2        1/1/2021   15
1        2/2/2022   20
3        2/2/2022   17
I will need to repeatedly compare each player's average score over a specific date range, e.g. get each player's average score between 1/1/2021 and 12/31/2021.
I know that I could query the averages with the AVG(Score) aggregate function, like SELECT Person, AVG(Score) FROM outcomes WHERE Date < ? GROUP BY Person;
However, since I have hundreds of players, each with possibly hundreds of outcomes, I am worried that repeatedly running this query will produce a lot of row reads. I am considering creating an "averages" table or view with an entry for each player on each unique date, where the Score is the average of that player's outcomes before that date.
Something like:
Person   EndDate    AVG(Score)
1        1/2/2021   10
2        1/2/2021   15
3        1/2/2021   0
1        2/3/2022   15
2        2/3/2022   15
3        2/3/2022   17
I realize that this is essentially at least doubling the amount of storage required, because each outcome will also have the associated "average" entry.
How is this kind of problem often addressed in practice? At what point does creating an "averages" table make sense? When is using the AVG(x) function most appropriate? Should I just add an "average" column to my "outcomes" table?
I was able to implement my query using the aggregate AVG(x) function, but I am concerned about the number of row reads that my database quickly started requiring.
What you are describing is a form of denormalization: storing the result of an aggregation instead of running the query every time you need it.
When to implement this? When running the query cannot be done fast enough to meet your performance goals.
Be cautious about adopting denormalization too soon. It comes with a cost.
The risk is that if your underlying data changes, but your denormalized copy is not updated, then the stored averages will be outdated. You have to decide whether it's acceptable to query outdated aggregate results from the denormalized table, and how often you want to update those stored results. There isn't one answer to this — it's up to your project requirements and your judgment.
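For scale: as a first step before denormalizing, it is often enough to index the columns the aggregate touches so the query reads only the rows it needs. A sketch in generic SQL, using the outcomes table from the question:

-- A covering index: the aggregate can be answered from the index alone,
-- without touching the base table rows.
CREATE INDEX idx_outcomes_person_date_score ON outcomes (Person, Date, Score);

-- Per-player average for an arbitrary date range:
SELECT Person, AVG(Score) AS avg_score
FROM outcomes
WHERE Date >= '2021-01-01' AND Date < '2022-01-01'
GROUP BY Person;

With a few hundred players and a few hundred outcomes each, that is tens of thousands of rows at most, which an indexed aggregate normally handles quickly; a precomputed averages table only starts to pay off once that stops being true.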
So I currently have a database with about 100,000 measurement results from 850 unique tests run on 20 devices at 4 different temperatures.
Now I want to create 4 tables from my database, each containing 850 rows (one per unique test) and 20 columns (one per device) holding that device's result at a single temperature.
I have attached a picture to visualize this below.
Anyone know what kind of SQL operation I should be using to get this output?
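This is a pivot. If your database has a PIVOT keyword you can use that; otherwise conditional aggregation works everywhere. A sketch for one temperature follows, with assumed table and column names (measurements(test_id, device_id, temperature, result)), since the actual schema isn't shown:

-- One output row per test, one column per device, restricted to a single temperature.
SELECT test_id,
       MAX(CASE WHEN device_id = 'device_01' THEN result END) AS device_01,
       MAX(CASE WHEN device_id = 'device_02' THEN result END) AS device_02,
       -- ...repeat for the remaining devices...
       MAX(CASE WHEN device_id = 'device_20' THEN result END) AS device_20
FROM measurements
WHERE temperature = 25
GROUP BY test_id;

Running the same query once per temperature value gives the four output tables.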
I'm using Microsoft Access 2007 to query two separate SharePoint sources.
The first has most of the data I need. The unique ID number for each row in the first has a corresponding column in the second, and an ID from the first can occur multiple times in the second. (It's a mapping between two different databases of defects.)
What I want to do is this: find all the ID's from table one that occur in the second, and list ID's from the second for each item that corresponds with the first. For starters, I want something a bit like this:
Table 1 ID   Table 2 ID's
5            9, 13, 23
10           11, 15
20           8
But there's also more data from Table 1 I want to display for each item.
What I'm getting is this:
Table 1 ID   Table 2 ID   Table 1 Data
5            9            Row 5 Additional Data
5            13           Row 5 Additional Data
5            23           Row 5 Additional Data
10           11           Row 10 Additional Data
10           15           Row 10 Additional Data
20           8            Row 20 Additional Data
What I want is something like this:
Table 1 ID   Table 2 ID's   Table 1 Data
5            9, 13, 23      Row 5 Additional Data
10           11, 15         Row 10 Additional Data
20           8              Row 20 Additional Data
Or perhaps:
Table 1 ID   Table 2 ID's
5            9, 13, 23
             Row 5 Additional Data
10           11, 15
             Row 10 Additional Data
20           8
             Row 20 Additional Data
How can I create a report like that?
Comma-separated list from multiple records
Grouping of multiple data rows into a comma-separated list is not a built-in feature of Access. There are various ways to do this, but I most often see links to Allen Browne's tutorial.
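For example, after pasting Allen Browne's ConcatRelated() function into a standard VBA module, a query along these lines returns one row per Table 1 ID with the Table 2 IDs collapsed into a single comma-separated field. The table and field names here are placeholders for yours:

SELECT t1.ID,
       ConcatRelated("Table2ID", "Table2", "Table1ID = " & t1.ID) AS [Table 2 IDs],
       t1.AdditionalData
FROM Table1 AS t1;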
Multi-line row details
The difference between your last two examples is just a matter of formatting a Form or Report in Design View. A Report (capitalized here) in Access is a specific type of object for generating custom, formatted views of your data, often for printing or read-only viewing, while a Form is a dynamic, on-screen view of your data. I suspect you are using "report" in a more general sense.
First of all, there is no way to make multiple lines using the default Datasheet View of tables and queries. To get multiple lines per row of data, you need to create a Form or Report object in Access. In Design View, you can move the data controls around the detail area to produce multiple lines for each data row. I suggest searching for tutorials on the web for creating Access Forms and Reports.
See Guide to designing reports.
I am maintaining a record of expenses. The expenses tables look like this:
Expenses(id,name)
Expenses_data(id,amount,expense_id)
Expenses are tracked over years, let's say 10 years, and I am saving them per month, so that would be 120 months.
If I had 10 expenses, then expenses_data would have 120 * 10 = 1200 rows.
I want to go from 1200 rows down to 120 rows, with the data laid out the way I enter it in Excel:
id   month     marketing   electricity   bank charges
1    month-1   100         200           300
2    month-2   95.5        5000          100
Please suggest whether this is possible, and how.
I think you probably want to stick with the database structure you already have, but use a query to display the data in the format you wish.
If you think about the number of data points you're storing, there's not much difference between the schema you're after and what you already have: it's still 1200 data points of expenses. And having to alter your schema each time you add an expense column would be pretty invasive.
Sticking with a query for your Excel export lets the database keep its notion of expense categories, and updating the export query to include a new category is much easier than modifying the schema. The necessary JOINs could even be generated programmatically by iterating over an initial query of "What expense categories are known?"
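As a sketch of such an export query (MySQL-style conditional aggregation rather than per-category joins, though the idea is the same), note that it assumes expenses_data also carries a month column, which the schema above doesn't show but the desired output implies:

-- One row per month, one column per expense name.
SELECT d.month,
       SUM(CASE WHEN e.name = 'marketing'    THEN d.amount END) AS marketing,
       SUM(CASE WHEN e.name = 'electricity'  THEN d.amount END) AS electricity,
       SUM(CASE WHEN e.name = 'bank charges' THEN d.amount END) AS bank_charges
FROM expenses_data AS d
JOIN expenses AS e ON e.id = d.expense_id
GROUP BY d.month
ORDER BY d.month;

Adding a new expense then only means adding one more SUM(CASE ...) line, which could itself be generated from SELECT name FROM expenses, as suggested above.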