SSAS many-to-many dimension hierarchy

First of all, this is a very simple data warehouse that I made only to ask the following specific question.
Scenario:
I have one fact table, FactSales, and two dimensions, DimShop and DimProduct. The dimensions are separate from each other and both are directly connected to the fact table. Some shops can sell selected products and, vice versa, some products can be sold in specific shops. This gives us a many-to-many relationship. The problem is that when I try to slice my cube, I get all combinations of shops and products.
Question:
How can I create a hierarchy between two separate dimensions in SSAS with a many-to-many relationship? I tried to use a bridge table, but I was unable to configure the hierarchy in SSAS. Is it even possible?

If you're trying to report on "what can happen" rather than "what did happen", you need a separate fact table & cube to represent the relationship between products and the shops that can sell the products. It's not really a hierarchy since it's many to many.
A simple cross reference fact should be fine:
FACT_PRODUCT_SHOP
ProductID
ShopID
Then, when building reports that need to show which products are allowed to be sold in which stores, you can use this fact table. The sales fact only shows "what actually happens".
You can even turn this fact into your Inventory fact table by adding a date, an "In Stock amount", an "On order amount", etc.
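A minimal sketch of the cross-reference fact idea, using Python's built-in sqlite3 purely for illustration (the question is about SSAS, so this only shows the relational side). FACT_PRODUCT_SHOP, ProductID and ShopID come from the answer; the other column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimShop    (ShopID INTEGER PRIMARY KEY, ShopName TEXT);
CREATE TABLE DimProduct (ProductID INTEGER PRIMARY KEY, ProductName TEXT);

-- "What can happen": which product may be sold in which shop.
CREATE TABLE FACT_PRODUCT_SHOP (
    ProductID INTEGER NOT NULL REFERENCES DimProduct(ProductID),
    ShopID    INTEGER NOT NULL REFERENCES DimShop(ShopID),
    PRIMARY KEY (ProductID, ShopID)
);

-- "What did happen": actual sales (Amount is a hypothetical measure).
CREATE TABLE FactSales (
    ProductID INTEGER NOT NULL,
    ShopID    INTEGER NOT NULL,
    Amount    REAL NOT NULL
);

INSERT INTO DimShop VALUES (1, 'North'), (2, 'South');
INSERT INTO DimProduct VALUES (10, 'Bread'), (20, 'Milk');
INSERT INTO FACT_PRODUCT_SHOP VALUES (10, 1), (20, 1), (10, 2);
INSERT INTO FactSales VALUES (10, 1, 5.0), (10, 2, 3.0);
""")

# Only allowed combinations are listed; sales are attached where they exist
# and default to 0 where nothing has been sold yet.
allowed_with_sales = conn.execute("""
    SELECT s.ShopName, p.ProductName, COALESCE(SUM(f.Amount), 0) AS Total
    FROM FACT_PRODUCT_SHOP ps
    JOIN DimShop s    ON s.ShopID = ps.ShopID
    JOIN DimProduct p ON p.ProductID = ps.ProductID
    LEFT JOIN FactSales f
           ON f.ShopID = ps.ShopID AND f.ProductID = ps.ProductID
    GROUP BY s.ShopName, p.ProductName
    ORDER BY s.ShopName, p.ProductName
""").fetchall()
```

Driving the query from the cross-reference table rather than from a cross join of the two dimensions is what keeps the combination South/Milk out of the result.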

It is possible to implement such a design but it may not perform well.
Basically, instead of the product and shop keys in the fact table, you need an alternative key.
This key will be the unique combination of products and shops. That needs to be prepared in the ETL.
In a new dimension named "Shops and Products", on top of this key, you can create two hierarchies, Product and Shop, in the same dimension.
Additionally, you can also create an unnatural hierarchy, as you requested. But since it is an unnatural hierarchy, it may not perform well.
So in addition to the Product and Shop hierarchies, you can provide the following unnatural hierarchies: Shop -> Product and Product -> Shop.
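The ETL step that prepares the combined key could look roughly like the sketch below. It is plain Python rather than an actual SSIS or SSAS artifact, and the names `build_combined_dimension`, `shop_product_key` and the sample rows are all hypothetical. It also derives the pairs from the sales rows for brevity; a real load might derive them from the list of allowed shop/product combinations instead.

```python
# Incoming sales rows as they might arrive from the source system (sample data).
sales_rows = [
    {"shop": "North", "product": "Bread", "amount": 5.0},
    {"shop": "North", "product": "Milk", "amount": 2.0},
    {"shop": "South", "product": "Bread", "amount": 3.0},
    {"shop": "North", "product": "Bread", "amount": 1.5},
]


def build_combined_dimension(rows):
    """Return (dimension, fact); the fact carries only the combined key."""
    key_by_pair = {}
    fact = []
    for row in rows:
        pair = (row["shop"], row["product"])
        # A new key is assigned only the first time a pair is seen.
        key = key_by_pair.setdefault(pair, len(key_by_pair) + 1)
        fact.append({"shop_product_key": key, "amount": row["amount"]})
    # One dimension row per distinct pair; shop and product become attributes
    # on which the Shop and Product hierarchies can be built.
    dimension = [
        {"shop_product_key": key, "shop": shop, "product": product}
        for (shop, product), key in key_by_pair.items()
    ]
    return dimension, fact


shop_product_dim, fact_sales = build_combined_dimension(sales_rows)
```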

Solving a one-to-one problem in database design

Sorry if this is obvious but I'm new to database design.
A customer must make a reservation before renting one or more items, providing details up front such as the reservation dates, item type, etc. The employee checks whether the item is available before allowing the customer to rent it. If it is available, the employee enters the item id, rental date, return date, etc. into the system.
Am I correct in creating two tables for this? One for Reservations (which includes the proposed rental info) and one for Rentals (which includes the actual rental info). And if so, wouldn't these have a one-to-one relationship? How could I get around this one-to-one relationship? Should I merge the two tables?
Firstly, since a reservation may never materialize as a rental, the relationship is not exactly 1:1 but 1:(0-1).
I would think that it's correct that you model them as separate entities since:
They may have different "life cycles".
They most likely have different properties.
A rental will probably be related to a bunch of other entities compared to a reservation. Those FKs will make sense for rentals but not for reservations.
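As a rough illustration of the 1:(0-1) relationship described above, here is a sketch using Python's sqlite3. The table and column names are assumptions, not the asker's schema; the key point is the UNIQUE foreign key on the rentals side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE reservations (
    reservation_id INTEGER PRIMARY KEY,
    customer       TEXT NOT NULL,
    item_type      TEXT NOT NULL,
    date_from      TEXT NOT NULL,
    date_to        TEXT NOT NULL
);
CREATE TABLE rentals (
    rental_id      INTEGER PRIMARY KEY,
    -- NOT NULL + UNIQUE: every rental belongs to exactly one reservation,
    -- and a reservation can have at most one rental.
    reservation_id INTEGER NOT NULL UNIQUE
                   REFERENCES reservations(reservation_id),
    item_id        INTEGER NOT NULL,
    rental_date    TEXT NOT NULL,
    return_date    TEXT
);
INSERT INTO reservations VALUES
    (1, 'Ann', 'bike',  '2024-05-01', '2024-05-03'),
    (2, 'Bob', 'kayak', '2024-05-02', '2024-05-02');
INSERT INTO rentals VALUES (1, 1, 42, '2024-05-01', NULL);
""")

# A reservation that never became a rental simply has no matching row.
rows = conn.execute("""
    SELECT r.customer, rt.item_id
    FROM reservations r
    LEFT JOIN rentals rt ON rt.reservation_id = r.reservation_id
    ORDER BY r.reservation_id
""").fetchall()

# A second rental for the same reservation violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO rentals VALUES (2, 1, 43, '2024-05-02', NULL)")
    second_rental_allowed = True
except sqlite3.IntegrityError:
    second_rental_allowed = False
```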
I might be wrong, but from what I understand you can have just one table for rentals, with a column named status as an enum (0, 1): 0 meaning available and 1 meaning rented. I'm assuming you are not renting the same item at the same time.

How best to implement a 3-dimensional SQL relationship?

This question is an extension of another question I asked regarding many-to-many relationships in MySQL.
I currently have 3 tables that I need to link with a 4th intermediary table:
Stores, Products, and States
My intermediary table, _stores_products_states, combines the id from the other three tables to determine which product is sold by which store and in which state.
Now, as I understand it, I would need to create an entry in _stores_products_states for every possible combination of the three, correct? This would lead to thousands of duplicated values in 1-2 of the columns (though never all 3).
For example, if Best Guy sells both GI Bros and Darbies in all 50 states, that would be 100 entries just for those two products. If those products are sold by another store, they too would have 100 entries.
Is this the correct way to implement this kind of relationship?
EDIT:
This whole setup is basically just to determine the availability of a particular product. A user will search for a product and receive a list of stores that sell that product in their state.
The 4th table is the way!
So if I got it right, your '_stores_products_states' table could even be called sale.
You do not need to create a record for all possible combinations of product, state, and store. You only need to create a record for existing combinations, that is, availability of a product in a store in a state (maybe with things like local price and quantity bolted on).
You will have to store this information one way or another; a 3-relation link table, especially stored as a clustered MySQL index, would be a pretty standard solution, with good performance characteristics.
One thing I wonder about is why you have stores separate from states. I'd expect a store to be associated with a state. With the 3-relation link table, you'd be able to associate the same store with a product in several different states. Is this what your business domain supposes?
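A small sketch of the 3-relation link table, run through Python's sqlite3 instead of MySQL so it is self-contained. Only `_stores_products_states` and the names Best Guy, GI Bros and Darbies come from the question; the column names, the second store and the state codes are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE states   (id INTEGER PRIMARY KEY, code TEXT NOT NULL);

-- One row per combination that actually exists, nothing else.
-- The key starts with product and state because that is the search path.
CREATE TABLE _stores_products_states (
    store_id   INTEGER NOT NULL REFERENCES stores(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    state_id   INTEGER NOT NULL REFERENCES states(id),
    PRIMARY KEY (product_id, state_id, store_id)
);

INSERT INTO stores VALUES (1, 'Best Guy'), (2, 'Other Store');
INSERT INTO products VALUES (1, 'GI Bros'), (2, 'Darbies');
INSERT INTO states VALUES (1, 'CA'), (2, 'NY');
INSERT INTO _stores_products_states (store_id, product_id, state_id) VALUES
    (1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 2);
""")


def stores_selling(product_name, state_code):
    """Return the names of stores that sell the product in the given state."""
    rows = conn.execute("""
        SELECT s.name
        FROM _stores_products_states sps
        JOIN stores s   ON s.id = sps.store_id
        JOIN products p ON p.id = sps.product_id
        JOIN states st  ON st.id = sps.state_id
        WHERE p.name = ? AND st.code = ?
        ORDER BY s.name
    """, (product_name, state_code)).fetchall()
    return [name for (name,) in rows]
```

Calling `stores_selling('GI Bros', 'NY')` covers the search described in the question's edit: a user picks a product and gets the stores selling it in their state.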

Should I move a dimension attribute to its own dimension?

The question I have below is hypothetical, I'm using SSAS:
Let's say I have a dimension (dim_Product) representing products I sell in a store, I have another dimension (dim_Employee) representing the company's employees and I have a fact table (fact_EmployeeSalesPerMonth) showing how many products each employee has sold for this month.
Now, let's say that each product has a category, and I have a requirement from a client to display this category data in a report which pulls data from the warehouse. Let's say the question my client is trying to answer is "What employees are best at selling what categories of what products?".
The category of product from the source system is set by using a drop down list of predefined values. Let's say that the pre-defined values are electronics and hardware. This category text is stored on the dim_Product dimension as a text column.
Now let's say that we add a third category of products (children toys) in our source system, which at the moment, contains no products. My client would like the report to show this category. It's obvious that I'm not bringing through this category data, since there are no products linked to it.
My question is: if this is the requirement, how would I store this data in the warehouse? Where would I store it?
I've considered moving the category data to its own dimension and then having a category key on the fact table pointing to the category dimension, but I'm not sure this is correct. This means that any fact table I create in the future that is linked to dim_Product would need to be linked to the product dimension as well as the product category dimension and have keys pointing to both.
You are addressing several things in your question, so let’s go through it step by step.
Product Category Dimension
A product category is an example of a dimension hierarchy. The first thing I’d recommend is to store the product category in the product dimension table as an additional attribute, possibly alongside other attributes such as subcategory or super-category. That way you can define a product hierarchy with several levels.
The obvious consequence of this design is that if you want to introduce a new category you need at least one (e.g. dummy) product.
The fact table contains only the product ID.
Reporting of Product Category
If you report on the product level, i.e. with the dimensions month, product, and category, you will need a “non-existing” product to fill in the report, so the “dummy” product entry in the dimension table is justified.
To get the dimension entries that have no usage into the report, you can either integrate them into the reporting query or run an additional query (“what was not used?”). Which is better depends on your situation. If you also have to consider product IDs in the fact table that are not defined in the dimension table, you will end up with a full outer join (which could affect performance), so you may find the latter option (an extra query for products from the dimension table that do NOT EXIST in the fact table) more flexible.
If you often report only on the category level (without a product), you may find it useful to define a category table. Especially if the category has other attributes such as a description, it is more convenient to have it in a dedicated table than to recover it from the product dimension with a DISTINCT query.
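To make the two reporting options a bit more concrete, here is a sketch in Python's sqlite3. dim_Product and fact_EmployeeSalesPerMonth are the names from the question; the columns, the dummy product row and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_Product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE fact_EmployeeSalesPerMonth (
    employee_id INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    month       TEXT NOT NULL,
    units_sold  INTEGER NOT NULL
);
INSERT INTO dim_Product VALUES
    (1, 'Radio',  'electronics'),
    (2, 'Hammer', 'hardware'),
    (3, '(no product yet)', 'children toys');  -- dummy row for the new category
INSERT INTO fact_EmployeeSalesPerMonth VALUES
    (100, 1, '2024-01', 4),
    (100, 2, '2024-01', 1);
""")

# Option 1: a separate "what was not used?" query.
unused_products = conn.execute("""
    SELECT p.name, p.category
    FROM dim_Product p
    WHERE NOT EXISTS (
        SELECT 1 FROM fact_EmployeeSalesPerMonth f
        WHERE f.product_id = p.product_id
    )
""").fetchall()

# Option 2: integrate the unused entries into the reporting query itself
# with an outer join that starts from the dimension.
units_per_category = conn.execute("""
    SELECT p.category, COALESCE(SUM(f.units_sold), 0)
    FROM dim_Product p
    LEFT JOIN fact_EmployeeSalesPerMonth f ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```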
Storing the category ID in the Fact Table
The driver for this decision is the dynamics of your product hierarchy. If the product categories change over time, this approach delivers the history of the attribute out of the box. After a product is re-assigned to another category, you can still report the “correct” category in which the product was sold. (But you may also report all sales under the new category, simply ignoring the entry in the fact table and taking the category from the product dimension.) The point here is, IMO, not whether there is a category dimension, but whether it is required to maintain a history of the product attribute (here, the category).
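A sketch of the difference between the two ways of reporting, again in Python's sqlite3. The table layout (dim_Category, fact_Sales, a category_id on both the product dimension and the fact) is an assumption chosen to illustrate the point, not a recommended final design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_Category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_Product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES dim_Category(category_id)
);
CREATE TABLE fact_Sales (
    product_id  INTEGER NOT NULL,
    category_id INTEGER NOT NULL,   -- category at the time of the sale
    units_sold  INTEGER NOT NULL
);
INSERT INTO dim_Category VALUES (1, 'electronics'), (2, 'children toys');
INSERT INTO dim_Product VALUES (1, 'Toy robot', 1);
INSERT INTO fact_Sales VALUES (1, 1, 5);

-- The product is re-assigned, then sold again under the new category.
UPDATE dim_Product SET category_id = 2 WHERE product_id = 1;
INSERT INTO fact_Sales VALUES (1, 2, 3);
""")

# Report by the category stored in the fact: history is preserved.
as_sold = conn.execute("""
    SELECT c.name, SUM(f.units_sold)
    FROM fact_Sales f
    JOIN dim_Category c ON c.category_id = f.category_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Report by the product's current category: the fact's category is ignored.
as_current = conn.execute("""
    SELECT c.name, SUM(f.units_sold)
    FROM fact_Sales f
    JOIN dim_Product p  ON p.product_id = f.product_id
    JOIN dim_Category c ON c.category_id = p.category_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
```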
So if you read my answer and summarize what I recommend, finding that for the topics
Extra category dimension table
Storing category key in fact table
Outer join or extra query to find not used dimensions
the answer is it depends – you got it!
I would make product category a standalone dimension only if it is frequently used without a product list. If it is not frequently used, or if it is mostly used together with a product list, then it would make sense to make it an attribute instead.
Standalone dimensions are generally faster (I saw a 50% reduction in query times when such attributes were used) but also cost more space and load time.
I saw about a 7% size increase per such new dimension in a very big cube we had.
I would avoid them if they are not frequently used. It is about finding the correct balance. In my case, I had 50+ such attributes in one of the dimensions, and that would have made the cube a lot larger, and it is already a large cube anyway.
By the way, by making it a part of Product dimension, a query bringing products corresponding to a category can be resolved by using autoexists instead of fact retrieves.

Relationship database design - object specific many to many, do I solve with self join table or new table

Being new to relational database design, I am trying to clarify one piece of information to properly design this database. Although I am using Filemaker as the platform, I believe this is a universal question.
I am following the logic of ideally having only one-to-many relationships, and using separate tables or join tables to achieve this.
I have a database with multiple products, made by multiple brands, in multiple product categories. I also want this to be as scalable as possible when it comes to reporting, being able to slice and dice the data in as many ways as possible since the needs of the users are constantly changing.
So when I ask the question "Does each Brand have multiple products" I get a yes, and "Does each product have multiple brands" the answer is no. So this is a one to many relationship, but it also seems that a self-join table might give me everything that I need.
This methodology also seems to go down a rabbit hole for other "product related" information such as product category, each product is tied to one product category, but only one product category is related to a product.
So I see 2 possibilities, make three tables and join them with primary and foreign keys, one for Brand, one for Product Category, and one for Products.
Or the second possibility is to create one table that has the brand and product category and product info all in one table (since they are all product related) and simply do self-joins and other query based tables to give me the future reporting requirements that will be changing over time.
I am looking for input from experiences that might point me in the right direction.
Thanks in advance!
Could you ever want to store additional information about a brand (company URL, phone number, etc.) or about a product category (description, etc.)?
If the answer is yes, you definitely want to use three tables. If you don't, you'll be repeating all that information for every single item that belongs to the same brand or same category.
If the answer is no, there is still an advantage to using three tables - it will prevent typos or other spelling inconsistencies from getting into your database. For example, it would prevent you from writing a brand as "Coca Cola" for some items and as "Coca-Cola" for other items. These inconsistencies get harder and harder to find and correct as your database grows. By having each brand only listed once in its own table, it will always be written the same way.
The disadvantage of multiple tables is the SQL for your queries is more complicated. There's definitely a tradeoff, but when in doubt, normalize into multiple tables. You'll learn when it's better to de-normalize with more experience.
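For what it's worth, a three-table layout might look like the sketch below. It uses Python's sqlite3 rather than Filemaker, and every table name, column name and sample row is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE brands (
    brand_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,   -- each brand is spelled exactly once
    url      TEXT                    -- room for extra brand information
);
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    brand_id    INTEGER NOT NULL REFERENCES brands(brand_id),
    category_id INTEGER NOT NULL REFERENCES categories(category_id)
);
INSERT INTO brands VALUES (1, 'Coca-Cola', NULL);
INSERT INTO categories VALUES (1, 'Soft drinks');
INSERT INTO products VALUES (1, 'Cola 330ml', 1, 1), (2, 'Cola 1l', 1, 1);
""")

# The slightly more complicated SQL mentioned above: two joins per report.
products_per_brand = conn.execute("""
    SELECT b.name, c.name, COUNT(*)
    FROM products p
    JOIN brands b     ON b.brand_id = p.brand_id
    JOIN categories c ON c.category_id = p.category_id
    GROUP BY b.name, c.name
""").fetchall()

# A product can only point at an existing brand row, so a misspelled
# brand cannot slip in through the products table.
try:
    conn.execute("INSERT INTO products VALUES (3, 'Cola Zero', 99, 1)")
    bad_brand_accepted = True
except sqlite3.IntegrityError:
    bad_brand_accepted = False
```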
I am not sure where you see room for a self-join here. It seems to me you are saying: I have a table of products; each product has one brand and one (?) category. If that's the case then you need either three tables:
Brands -< Products >- Categories
or - in Filemaker only - you can replace either or both the Brands and the Categories tables with a value list (assuming you won't be renaming brands/categories and at the expense of some reporting capabilities). So really it depends on what type of information you want to get out in the end.
If you truly want your solution to be scalable you need to parse and partition your data now. Otherwise you will be faced with the re-structuring of the solution down the road when the solution grows in size. You will also be faced with parsing and relocating the data to new tables. Since you've also included the SQL and MySQL tags if you plan on connecting Filemaker to an external data source then you will definitely need to up your game structurally.
Building everything in one table is essentially using Filemaker to do Excel work and it won't cut it if you are connecting to SQL, MySQL, etc.
Self join tables are a great tool. However, they should really only be used for calculating small data points and should not be used as pivot points or foundations for your reporting features. It can grow out of control as time goes on and you need to keep your backend clean.
Use summary and sub-summary reporting features to slice product based data.
For retail and general product management solutions, whether it's Filemaker, SQL, or whatever, the "Brand" or "Vendor" is its own table. Then you would have a "Products" table (the match key being the "Brand ID").
The "Product Category" field should be a field in the "Products" table. You can manage the category values by building a standard value list or building a value list based on a "Product Category" table. The second scenario is better for long term administration.

Database design issue - trying to avoid circular reference

I have been at this for a day and a bit now, trying to figure out how to best model the database (MySQL) for an app I'm developing for a friend who owns a bakery. The assumptions are as follows:
many (external) Bakers produce many Products
BakersProducts is updated fortnightly by certain staff who either call bakers for their product prices, or the bakers fax through their pricelist themselves, which the staff then update via a front-end UI.
the manager should be able to generate an order based on the products that she anticipates having.
So the front-end UI must be able to allow the manager to purely choose the products she would like in the order, and then present her with a list of Bakers to choose from for each product in the order.
In other words, Orders_has_Products should also include a reference to BakersProducts.bpID. I'm sure though that if I do this, then I would create a circular reference (of sorts) to Products.
I'm sure I've gone about this the wrong way, and would really appreciate anyone's advice as to how I can restructure my design to accommodate the chosen Product Price, i.e. to include BakersProducts.bpID.
Thank you!
This is not a circular reference, since
Order_has_products references Products
Order_has_products references BakersProducts
BakersProducts references Products
a circular reference would be if, for example,
Order_has_products references Products
Products references BakersProducts
BakersProducts references Order_has_products
Aside from that, circular references are relatively normal in a database (e.g. an Employees table with a manager field, where the manager is herself an employee, is a one-table circular reference).
What your design has is a simple redundancy, because one product is referenced twice in the Order_has_products table - once directly from the Products table, and once via the related BakersProducts record. There is a possibility of the two getting out of sync, but, since you stated that the business rule is that the product is chosen before the baker, it's quite all right.
I would include the productID even if it was the other way around, because a little denormalization can go a long way when speeding up queries, because otherwise you would have to scan the BakersProducts table, even for simple questions, like, 'Did we have any bagels on Wednesday?'
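A sketch of that layout in Python's sqlite3 (the question uses MySQL). The table names and productID, bpID and productName follow the thread; bakerName, price, orderDay and the sample rows are assumptions. The composite foreign key is one optional way to stop the two references from drifting apart; it is an addition for illustration, not something the design requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Products (productID INTEGER PRIMARY KEY, productName TEXT NOT NULL);
CREATE TABLE BakersProducts (
    bpID      INTEGER PRIMARY KEY,
    bakerName TEXT NOT NULL,          -- stand-in for a separate Bakers table
    productID INTEGER NOT NULL REFERENCES Products(productID),
    price     REAL NOT NULL,
    UNIQUE (bpID, productID)          -- target for the composite key below
);
CREATE TABLE Order_has_products (
    orderID   INTEGER NOT NULL,
    orderDay  TEXT NOT NULL,
    productID INTEGER NOT NULL REFERENCES Products(productID),
    bpID      INTEGER,                -- NULL until a baker is chosen
    FOREIGN KEY (bpID, productID)
        REFERENCES BakersProducts (bpID, productID)
);
INSERT INTO Products VALUES (1, 'Bagel'), (2, 'Rye loaf');
INSERT INTO BakersProducts VALUES (1, 'Baker A', 1, 0.80), (2, 'Baker B', 2, 2.50);

-- The manager picks the product first, the baker afterwards.
INSERT INTO Order_has_products VALUES (1, 'Wednesday', 1, NULL);
UPDATE Order_has_products SET bpID = 1 WHERE orderID = 1 AND productID = 1;
""")

# "Did we have any bagels on Wednesday?" without touching BakersProducts.
bagels_on_wednesday = conn.execute("""
    SELECT COUNT(*)
    FROM Order_has_products o
    JOIN Products p ON p.productID = o.productID
    WHERE p.productName = 'Bagel' AND o.orderDay = 'Wednesday'
""").fetchone()[0]

# bpID 2 is Baker B's rye loaf, so pairing it with the bagel is rejected.
try:
    conn.execute("INSERT INTO Order_has_products VALUES (2, 'Thursday', 1, 2)")
    mismatch_accepted = True
except sqlite3.IntegrityError:
    mismatch_accepted = False
```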
I think the mix-up is from a business process standpoint: you're getting requisitions mixed up with orders.
A requisition has a list of products needed without necessarily specifying the supplier of each, whereas an order is directed at a specific supplier, for specific price look-up codes (what bpID seems to represent). One requisition may spawn multiple orders if it is split across multiple suppliers, and even a single product may have its order split across multiple suppliers, perhaps due to vendor volume limits or locality of delivery.
You may want to provide a view of a requisition that shows the order line item(s) generated from each requisition line item, but that is a user interface concern.
One way of solving this is simply to eliminate the Products table, and move productName into the BakersProducts table.
This would essentially only work if you do not expect the bakers to carry the same product, if the products are unique to the bakers.
If you do expect the bakers to carry the same product, then you may want to leave the separate Products table, but instead of having Order_has_Products.Products_productID, I would change it to Order_has_Products.bpID. If/when you need to access the productName (or other product related metadata that may go in that table) you could just do a join between BakersProducts and Products.