I have a bit of a dilemma creating unique 1 to many type relationships to build a data warehouse table using cubes/Microsoft Analysis Services.
This is how my data is built
I've got a model with a fact_table that has an order_no, a package_no, and many other columns. For each order + package I want to provide all the performances that are in that package. In our model, a fixed package always contains the same 4 performances, whereas a flex package can contain any 4 performances you want.
The way this is built is
Table1: Package_no | Package_details etc.
Table2: Package_no | Perf_Group_No (this has a unique ID for each package)
Table3: Perf_Group_No | Perf_no
Table4: Perf_no | Perf_details etc. ...
Through the above joins, we can display all the performances in a given package. This works really well when it comes to fixed packages because no matter what you buy if you have package ABC you will always have performances A, B and C. When it comes to Flex packages they are built to include all 25 performances and the place where we actually see what you selected is your Order table.
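As a rough illustration of the join chain described above, here is a minimal sketch using Python's built-in sqlite3 module. Only the four-table layout comes from the post; the table names, the sample rows and the helper performances_in_package are assumptions made up for the example.

```python
import sqlite3

# Hypothetical names and sample rows; only the four-table layout
# (package -> performance group -> performance) comes from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package       (package_no INTEGER PRIMARY KEY, package_details TEXT);
CREATE TABLE package_group (package_no INTEGER, perf_group_no INTEGER);
CREATE TABLE group_perf    (perf_group_no INTEGER, perf_no INTEGER);
CREATE TABLE performance   (perf_no INTEGER PRIMARY KEY, perf_details TEXT);

INSERT INTO package VALUES (1, 'Fixed Package ABC');
INSERT INTO package_group VALUES (1, 100);
INSERT INTO group_perf VALUES (100, 1), (100, 2), (100, 3);
INSERT INTO performance VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'X');
""")

def performances_in_package(package_no):
    """Return the performance details reachable from one package."""
    rows = conn.execute(
        """
        SELECT pf.perf_details
        FROM package p
        JOIN package_group pg ON pg.package_no = p.package_no
        JOIN group_perf gp    ON gp.perf_group_no = pg.perf_group_no
        JOIN performance pf   ON pf.perf_no = gp.perf_no
        WHERE p.package_no = ?
        ORDER BY pf.perf_no
        """,
        (package_no,),
    ).fetchall()
    return [r[0] for r in rows]

fixed_abc = performances_in_package(1)
```

For a fixed package this chain is enough. For a flex package the same chain would return every available performance, which is why the order data is still needed to see what was actually selected.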
We are trying to translate this data into a dimension in a cube.
Person A - purchased order # 12345. They had Fixed Package ABC with performances A, B and C and they had a Flex Package with performances X, Y and Z
Person B - purchased order # 45678. They had Fixed Package ABC with performances A, B and C and they had a Flex Package with performances J, K and L
Person C - purchased order # 12098. They had Fixed Package ABC with performances A, B and C and they had a Flex Package with performances J, K and Z
Having the order_no and package_no helps us identify whether it's a flex or fixed package, but we need the order itself to see which performances were actually purchased in the case of the flex.
Here is what I have
Order_No, Package_No
I need to build a series of views creating a series of joins between the Fact table with order_no/package_no and the performance_no to provide details for each order. However it seems I need to do it one to one and never one to many. No matter what I do, there is always a one to many relationship.
This is the idea we have -- the package table has a reference, and the performances have a reference and their combo creates the linking view/table. The problem is there is still a 1 to many relationship no matter how you slice it.
Any ideas or suggestions would be appreciated.
If more clarification is needed I'd be happy to provide.
If there is a way to create a 1 to many relationship in the CUBES itself, that could work as well but I haven't been able to find a good explanation of how to build that.
Thank you,
Suppose I have some sample data like that shown below (with a lot more entries), and my main use case is to look up a specific ailment and provide a list of waiting times for the different hospitals which offer that treatment.
Not being very experienced at all with DB design, I don't know whether in this example there is an advantage to using separate tables with links between them or if a simple import of the CSV to a single table will suffice.
If I used separate tables, I'm guessing they would be for hospital and ailment perhaps?
I would be very grateful if someone could tell me the best approach for this.
ID,Main Department,Specific Complaint,Hospital ,Waiting time
1,Cardiology,general,Hospital 1,7
2,Cardiology,general,Hospital 2,7
3,Cardiology,general,Hospital 3,7
4,Cardiology,general,Hospital 4,21
5,Cardiology,traumatology,Hospital 1,8
6,Cardiology,traumatology,Hospital 2,7
7,Dermatology,general,Hospital 1,21
8,Dermatology,general,Hospital 2,14
9,Dermatology,general,Hospital 3,21
10,Dermatology,erysipelas,Hospital 1,7
11,Dermatology,erysipelas,Hospital 3,7
...
One detail you must understand: SO is not a teaching site; tutorials abound for that. It is more to address specific problems that arise when developing solutions. That being said, I like this type of question, so here goes.
The type of solution to implement (simple CSV or complete database) depends on the volume of data and the type of reports you require.
CSV is quick to implement.
Database takes more time, but will allow you to produce more complex reports than CSV, through the use of queries.
CSV is often used as a medium to load or extract data, but as for queries it is not as powerful.
A database can be expanded. Ex. today you only consider the name of the hospital. You could expand your table to include the address, phone number, ... You could also expand your model to add insurance company links, doctors, ...
Basic modeling:
Identify your objects. Ex. here I would consider ailment, hospital, complaint.
Identify relations between objects, and their type. Ex. ailment and hospital are linked, and that link is n-n, meaning 1 ailment can be treated in many hospitals, and 1 hospital can treat many ailments.
I am not certain what to do with complaint. In your question you do not specify if all hospitals treat all (ailment - complaint) duos or not. More on that later.
As you define your structure, make sure you apply the normal forms. In most cases, forms 1-3 are enough.
1NF: atomic values and no repeating groups. Ex. you would not create a table with a hospital column and an ailments column holding values separated by commas. 1 line == 1 hospital <-> 1 ailment.
2NF: 1NF is achieved and every non-key attribute depends on the whole primary key. Ex. you should not create a table linking ailment and wait time. The wait time is not dependent on the ailment alone, it is dependent on the combination of ailment and hospital.
3NF: 2NF is achieved and there are no transitive functional dependencies. A transitive dependency means: A is dependent on B, B is dependent on C, so A is transitively dependent on C.
Some critical questions must be answered before you can model your data:
A hospital can treat a certain ailment. In all cases?
Can you have: hospital 1 can treat ailment 1 when the complaint is A and B, but not C?
Ex. all hospitals can provide primary care for cardiac patients, but cardiac surgery can only be performed at some hospitals.
In that case, you cannot link ailment and hospital together directly. A combination of (ailment,complaint) can. And this will impact wait time.
Based on reality, I will link (ailment and complaint) and link this duo to hospital.
Here is my first model, "for fun", which might need to be modified for your needs:
Wait time is in table Hospital_Treads_Ailment_has_Complaint. In my model, a hospital can only estimate the wait time once it knows which ailment and which complaint the patient has.
A final exercise I do to test my model is try the main queries I need. If one query cannot be done with the model, it needs to be changed.
Which hospital treats cardiac problems? Ok, select hospital where ailment == cardiology, complaint == *.
Which hospital can accept patients who have trauma. Ok, select hospital where ailment == *, complaint == trauma.
and so on...
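One way to try the test queries above against a concrete schema is the sketch below, written in Python with the built-in sqlite3 module. It is only an approximation of the model described in this answer: the table name hospital_treats, the column names and the helpers get_id and waiting_times are assumptions, and only a handful of rows from the sample CSV are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospital  (hospital_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE ailment   (ailment_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE complaint (complaint_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- One row per (hospital, ailment, complaint); the wait time depends on all three.
CREATE TABLE hospital_treats (
    hospital_id  INTEGER REFERENCES hospital(hospital_id),
    ailment_id   INTEGER REFERENCES ailment(ailment_id),
    complaint_id INTEGER REFERENCES complaint(complaint_id),
    wait_time    INTEGER,
    PRIMARY KEY (hospital_id, ailment_id, complaint_id)
);
""")

# A few rows copied from the sample CSV: (department, complaint, hospital, wait).
rows = [
    ("Cardiology", "general", "Hospital 1", 7),
    ("Cardiology", "general", "Hospital 4", 21),
    ("Cardiology", "traumatology", "Hospital 1", 8),
    ("Cardiology", "traumatology", "Hospital 2", 7),
    ("Dermatology", "erysipelas", "Hospital 3", 7),
]

def get_id(table, name):
    """Insert the name if it is new and return its surrogate key."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(
        f"SELECT {table}_id FROM {table} WHERE name = ?", (name,)
    ).fetchone()[0]

for ailment, complaint, hospital, wait in rows:
    conn.execute(
        "INSERT INTO hospital_treats VALUES (?, ?, ?, ?)",
        (get_id("hospital", hospital), get_id("ailment", ailment),
         get_id("complaint", complaint), wait),
    )

def waiting_times(ailment, complaint):
    """List (hospital, wait_time) for one ailment/complaint pair."""
    return conn.execute(
        """
        SELECT h.name, t.wait_time
        FROM hospital_treats t
        JOIN hospital h  ON h.hospital_id = t.hospital_id
        JOIN ailment a   ON a.ailment_id = t.ailment_id
        JOIN complaint c ON c.complaint_id = t.complaint_id
        WHERE a.name = ? AND c.name = ?
        ORDER BY t.wait_time, h.name
        """,
        (ailment, complaint),
    ).fetchall()

cardio_trauma = waiting_times("Cardiology", "traumatology")
```

Dropping one of the two conditions in the WHERE clause gives the "complaint == *" or "ailment == *" variants of the query.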
I am working with a client in manufacturing whose products are configurations of the same bunch of parts. I am creating a database that holds all valid products and their Bill of Materials. I need help on deciding a Bill Of Material schedule to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's product;
-Assembly
-Sub-Assembly
-Component
-Part
and items from lower levels are allowed to be associated with any upper level item;
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relation schedule would demand a growing integer sequence of junction tables and queries (0+1+1+2+3...) to display and export the full Bill of Materials which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This only requires one junction table to create infinite levels of many-to-many relationships. I don't know if I trust this solution, but I can't think of any issues other than accidentally making an item its own parent and creating an infinite loop and that it sounds disorganized.
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id 10 "component", which may have a parent_part_id (when looked up itself in this table) of 12 "Assembly". It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes. It's still just two columns (and any extra columns needed for attributes of this relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, and the like.
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
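To give an idea of what walking such a hierarchy with a recursive CTE looks like, here is a small sketch in Python using the built-in sqlite3 module (SQLite is used here only because it ships with Python; it is not one of the engines named above). The two tables and their rows are the ones from this answer; the function name descendants and the depth column are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (part_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE part_hierarchy (part_id INTEGER, parent_part_id INTEGER);
INSERT INTO parts VALUES (1, 'widget'), (10, 'widget component'), (12, 'aircraft carrier');
INSERT INTO part_hierarchy VALUES (1, 10), (10, 12);
""")

def descendants(root_id):
    """Return (part_id, description, depth) for every part below root_id."""
    return conn.execute(
        """
        WITH RECURSIVE tree(part_id, depth) AS (
            -- anchor: direct children of the root
            SELECT part_id, 1 FROM part_hierarchy WHERE parent_part_id = ?
            UNION ALL
            -- recursive step: children of parts already found
            SELECT h.part_id, tree.depth + 1
            FROM part_hierarchy h
            JOIN tree ON h.parent_part_id = tree.part_id
        )
        SELECT tree.part_id, p.description, tree.depth
        FROM tree JOIN parts p ON p.part_id = tree.part_id
        ORDER BY tree.depth
        """,
        (root_id,),
    ).fetchall()

below_carrier = descendants(12)
```

The query text stays the same no matter how many levels are added later. Note that this sketch assumes the data contains no cycles; a part that is its own ancestor would make the recursion run forever.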
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key to itself.
While this is technically correct, navigating it requires a recursive query, which not every database supports. A simple and effective way to make nested parts easy to access is to store a "path" for each component, much like a path in a file system.
For example, say part id 1 is a top level part that has a child whose id is 2, and part id 2 has a child part with id 3, the paths would be:
id parent_id path
1 null /1
2 1 /1/2
3 2 /1/2/3
Doing this means finding the tree of subparts for any part is simply:
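select b.id
from parts a
-- match the part itself plus any path that continues with '/', so that '/1' does not also match '/10'
join parts b on b.path = a.path
             or b.path like concat(a.path, '/%')
where a.id = ?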
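The path lookup can be tried out with the sketch below, in Python with the built-in sqlite3 module. The extra row with id 10 is an assumption added to show a pitfall: a bare prefix match on '/1' would also pick up '/10', so this version matches either the exact path or the path followed by a slash. SQLite spells string concatenation as || rather than concat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (id INTEGER PRIMARY KEY, parent_id INTEGER, path TEXT);
INSERT INTO parts VALUES
    (1, NULL, '/1'),
    (2, 1,    '/1/2'),
    (3, 2,    '/1/2/3'),
    (10, NULL, '/10');   -- unrelated top-level part whose path also starts with '/1'
""")

def subtree(part_id):
    """Return ids of the part and everything beneath it, using the stored path."""
    rows = conn.execute(
        """
        SELECT b.id
        FROM parts a
        JOIN parts b ON b.path = a.path OR b.path LIKE a.path || '/%'
        WHERE a.id = ?
        ORDER BY b.id
        """,
        (part_id,),
    ).fetchall()
    return [r[0] for r in rows]

tree_of_1 = subtree(1)
tree_of_2 = subtree(2)
```

The trade-off of this design is that the path has to be rewritten for a whole subtree whenever a part is moved to a different parent.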
I'm building an accommodation rental site for a specific town.
It will include, Houses, Resorts, Hotels etc.
I'm looking for advice on how best to link Property Features (Air-Con, Swimming Pool etc.) to individual properties.
I have a table of around 50 Property Features set up as feature_id, feature_category, feature_name.
What would be the best way to store which features relate to which property?
Would a column in the property table (prop_features) containing an array of feature_id be the best way?
The only example I've managed to find and be able to dissect the DB showed the features added as feature_1, feature_2 etc. which seemed really inefficient as some properties may only have feature_1 and feature_49 for example.
Each one was added as a column to the property_table.
I'm new to creating databases from scratch, so I'd be very grateful for any advice on how best to start with this section of my project.
(It's also why I'm not having much luck Googling it, as I'm not sure how to put it in more general terms that might yield me a solution).
One solution would be to have an intermediate table that joins properties to features like so:
CREATE TABLE propertyfeatures (property_id INT, feature_id INT);
If we have a property called Acme Hotel (property id 1) that has air conditioning (feature id 2) and swimming pool (feature id 4), the data would look something like:
property_id | feature_id
1 2
1 4
To retrieve features per property (excluding properties without features) a simple query would be:
SELECT
p.property_name,
f.feature_name,
f.feature_category
FROM property AS p
INNER JOIN propertyfeatures AS pf
ON p.property_id = pf.property_id
INNER JOIN features AS f
ON pf.feature_id = f.feature_id
ORDER BY p.property_id, f.feature_id
Note: I have made assumptions about table and column names in your existing database. You'd have to adjust the above accordingly.
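For anyone who wants to experiment with the intermediate table, here is a self-contained sketch in Python using the built-in sqlite3 module. The property and feature names and ids follow the Acme Hotel example; the second property, the category values and the two helper functions are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (property_id INTEGER PRIMARY KEY, property_name TEXT);
CREATE TABLE features (feature_id INTEGER PRIMARY KEY, feature_category TEXT, feature_name TEXT);
CREATE TABLE propertyfeatures (
    property_id INTEGER REFERENCES property(property_id),
    feature_id  INTEGER REFERENCES features(feature_id),
    PRIMARY KEY (property_id, feature_id)   -- a feature is linked to a property at most once
);
INSERT INTO property VALUES (1, 'Acme Hotel'), (2, 'Bare Cabin');
INSERT INTO features VALUES (2, 'Comfort', 'Air-Con'), (4, 'Leisure', 'Swimming Pool');
INSERT INTO propertyfeatures VALUES (1, 2), (1, 4);
""")

def features_of(property_id):
    """Feature names linked to one property, via the intermediate table."""
    rows = conn.execute(
        """
        SELECT f.feature_name
        FROM propertyfeatures pf
        JOIN features f ON f.feature_id = pf.feature_id
        WHERE pf.property_id = ?
        ORDER BY f.feature_id
        """,
        (property_id,),
    ).fetchall()
    return [r[0] for r in rows]

def properties_with(feature_name):
    """Property names that have the given feature (the reverse lookup)."""
    rows = conn.execute(
        """
        SELECT p.property_name
        FROM property p
        JOIN propertyfeatures pf ON pf.property_id = p.property_id
        JOIN features f ON f.feature_id = pf.feature_id
        WHERE f.feature_name = ?
        ORDER BY p.property_id
        """,
        (feature_name,),
    ).fetchall()
    return [r[0] for r in rows]

acme_features = features_of(1)
pool_properties = properties_with("Swimming Pool")
```

Adding a new feature such as Free Wifi is then one INSERT into features plus one row in propertyfeatures per property that offers it, with no schema change.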
The only example I've managed to find and be able to dissect the DB showed the features added as feature_1, feature_2 etc. which seemed really inefficient as some properties may only have feature_1 and feature_49 for example. Each one was added as a column to the property_table.
Although this can be done, you're correct in that it's inefficient, or rather, it's awkward to maintain. It's referred to as pivoting because you're changing unique row values into multiple columns. For example, what if a new feature (e.g. Free Wifi) was added? It's not a case of simply inserting a new row of data as it would be with the intermediate table, you'd have to create a new column to support that.
Not only that, but you would still have to define the feature columns manually or dynamically. For reference, take a look at MySQL Pivot Table which demonstrates both manual and dynamic methods.
One simple way would be to add another table to your database. The keyword for this approach is "junction table", and it is pretty basic in database design. The table has these columns:
property_identifier | feature_identifier (feature_id in your case)
In this table you can display the connection between the properties and specific features.
So you could say property with property_id 1 has a pool (feature_id: 2) and a nice kitchen (feature_id: 23)
So the table would look like this:
property_id | feature_id
1 | 2
1 | 23
I am struggling with some basics of MS Access 2010/2013. I am not sure if I have hit its limits or if I am just using the wrong procedure. I will explain what I need.
Take for example 5 items in a shop, where 3 are items and 2 are a combination of items.
They need to be presented in the same manner, i.e. the combinations of products (bundles) need to have a number in the same series as the products they contain. See figure below:
So far, I have created 2 tables: 1 for stand-alone products and 1 for bundles. Bundles should be able to include other bundles (this is where I get the problems).
If someone orders, let's say, 10 times Item 5, I need Access to count how many Motherboards (Item 1), how many CPUs (Item 2), etc. I need a full list of those items, so I should hopefully get a list that says:
10 x Motherboard
10 x CPU
10 x Cabinet
That way I don't have to dig into each bundle. Hence, as I see it, the relationship runs in a sort of loop.
I identify the items to be either a combination or product by a column with "yes/no".
If you have any suggestions, let me know, or tell me if you think I have hit the limit.
An alternative method is welcome, as well as a sample of a simple Access database.
For the record, the system is gonna be used for huge machines, creating lists of bolts, nuts, electrical equipment etc. The above is only to explain my thoughts.
Best Regards, Emil.
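The expansion described in the question above (an order for 10 x Item 5 turning into a flat list of stand-alone products) can be sketched as a small recursive function. This is plain Python rather than Access, and the contents of bundles 4 and 5 are assumptions, chosen only so that the result reproduces the list given in the post.

```python
from collections import Counter

# Hypothetical catalogue: items 1-3 are stand-alone products, 4 and 5 are bundles.
# The bundle contents are assumptions chosen to reproduce the list in the post.
names = {1: "Motherboard", 2: "CPU", 3: "Cabinet"}
bundles = {
    4: {1: 1, 2: 1},      # bundle 4 = 1 x Motherboard + 1 x CPU
    5: {4: 1, 3: 1},      # bundle 5 = 1 x bundle 4 + 1 x Cabinet
}

def explode(item_id, quantity, _path=()):
    """Expand an item into a Counter of stand-alone products."""
    if item_id in _path:
        # A bundle that (directly or indirectly) contains itself would loop forever.
        raise ValueError(f"bundle {item_id} contains itself")
    if item_id not in bundles:
        return Counter({item_id: quantity})
    total = Counter()
    for child_id, per_bundle in bundles[item_id].items():
        total += explode(child_id, quantity * per_bundle, _path + (item_id,))
    return total

order = explode(5, 10)
order_lines = [f"{order[i]} x {names[i]}" for i in sorted(order)]
```

Here order_lines comes out as the three lines from the question. The same idea maps onto a single items table plus one "bundle contains item" table with a quantity column, which avoids the loop between a products table and a bundles table.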
I recently ran into a problem in our SQL Server 2008 Analysis Services Cube. Imagine you have a simple sales data warehouse with orders and products. Each order can be associated with several products, and each product can be contained in several orders. So the data warehouse consists of at least 3 tables: one for the Products, one for the Orders and one for the reference table, modelling the n:n relationship between both.
The question I want our cube to answer is: How many orders are there which contain both product x and product y?
In SQL, this is easy:
select orderid from dbo.OrderRefProduct
where ProductID = 1
intersect
select orderid from dbo.OrderRefProduct
where ProductID = 3
Since I am fairly proficient in SQL, but a newbie in MDX, I have been unable to implement that in MDX. I have tried using distinct count measures, the MDX-functions intersect and nonempty and subcubes. I also tried duplicating the dimensions logically (by adding the dimension to the cube twice) as well as physically (by duplicating the data source table and the dimension).
On http://www.zeitz.net/thts/intersection.zip, you can download a zip file of 25kB size which contains an SQL script with some test data and the Analysis Services Solution using the tables.
We are using SQL Server 2008 R2 and its Analysis Services counterpart. Performance considerations are not that important, as the data volume is rather low (millions of rows) compared to the other measure groups included in that cube (billions of rows).
The ultimate goal would be to be able to use the desired functionality in standard OLAP (custom calculated measures are ok), since Excel is our primary frontend, and our customers would like to choose their Products from the dimension list and get the correct result in the cube measures. But even a working standalone MDX-Query would greatly help.
Thank you!
Edit March 12th
Did I miss something or can't this be solved somehow?
If it helps to build the MDX, here is another way to get the results in SQL, using subqueries. It can be further nested.
select distinct b.orderid from
(
select distinct orderid from dbo.OrderRefProduct
where ProductID = 1
) a
join dbo.OrderRefProduct b on (a.orderid = b.orderid)
where ProductID = 3
I tried something like this with subcubes in mdx, but didn't manage to succeed.
I've had a go - you can download my solution from here:
http://sdrv.ms/YWtMod
I've added a copy of your Fact table as a "Cross Reference", Aliased the Product1 dimension as a "Cross Reference", set the Dimension references to Product independently from your existing relationships, and specified the Many-to-Many relationships.
It is returning the right answer in Excel (sample attached).
You could extend that pattern as many times as you need.
Good luck!
Mike
Another way to deal with this in SQL (I know the approach works, but I didn't test this query) is to use double negation:
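select distinct o.orderid
from X o
-- keep an order only if no required product is missing from it
where not exists (
select 1
from X p
where p.productid in (id1, id2)
and not exists (
select 1
from X c
where c.orderid = o.orderid
and c.productid = p.productid
)
)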
I'm pretty sure you can do the same in MDX.
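To check the double-negation idea against the INTERSECT query from the question, here is a small sketch in Python using the built-in sqlite3 module. The table name OrderRefProduct comes from the question; the sample rows, the helper table wanted and the function orders_with_all are assumptions made up for this test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderRefProduct (OrderID INTEGER, ProductID INTEGER);
INSERT INTO OrderRefProduct VALUES
    (100, 1), (100, 3),
    (101, 1), (101, 2),
    (102, 3),
    (103, 1), (103, 2), (103, 3);
CREATE TEMP TABLE wanted (ProductID INTEGER);
""")

def orders_with_all(product_ids):
    """Orders for which no wanted product is missing (double negation)."""
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?)", [(p,) for p in product_ids])
    rows = conn.execute(
        """
        SELECT DISTINCT o.OrderID
        FROM OrderRefProduct o
        WHERE NOT EXISTS (
            SELECT 1 FROM wanted w
            WHERE NOT EXISTS (
                SELECT 1 FROM OrderRefProduct c
                WHERE c.OrderID = o.OrderID AND c.ProductID = w.ProductID
            )
        )
        ORDER BY o.OrderID
        """
    ).fetchall()
    return [r[0] for r in rows]

def orders_with_both_intersect(a, b):
    """The INTERSECT formulation from the question, for comparison."""
    rows = conn.execute(
        """
        SELECT OrderID FROM OrderRefProduct WHERE ProductID = ?
        INTERSECT
        SELECT OrderID FROM OrderRefProduct WHERE ProductID = ?
        ORDER BY OrderID
        """,
        (a, b),
    ).fetchall()
    return [r[0] for r in rows]

both = orders_with_all([1, 3])
```

Unlike INTERSECT, the double-negation form does not change shape when a third or fourth product is added; only the contents of the wanted table grow.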