Should I move a dimension attribute to its own dimension?


The question below is hypothetical; I'm using SSAS.
Let's say I have a dimension (dim_Product) representing the products I sell in a store, another dimension (dim_Employee) representing the company's employees, and a fact table (fact_EmployeeSalesPerMonth) showing how many products each employee has sold this month.
Now, let's say that each product has a category, and I have a requirement from a client to display this category data in a report which pulls data from the warehouse. Let's say the question my client is trying to answer is "Which employees are best at selling which categories of products?".
The category of product from the source system is set by using a drop down list of predefined values. Let's say that the pre-defined values are electronics and hardware. This category text is stored on the dim_Product dimension as a text column.
Now let's say that we add a third category of products (children toys) in our source system, which at the moment contains no products. My client would like the report to show this category. Obviously I'm not bringing this category through at the moment, since there are no products linked to it.
My question is: if this is the requirement, how would I store this data in the warehouse, and where?
I've considered moving the category data to its own dimension and then having a category key on the fact table pointing to the category dimension, but I'm not sure this is correct. It would mean that any fact table I create in the future that is linked to dim_Product would need to be linked to both the product dimension and the product category dimension, with keys pointing to both.

You are addressing several things in your question, so let's go through it step by step.
Product Category Dimension
A product category is an example of a dimension hierarchy. The first thing I'd recommend is to store the product category in the product dimension table as an additional attribute, possibly alongside other attributes such as subcategory and super-category. This way you can define a product hierarchy with several levels.
The obvious consequence of this design is that if you want to introduce a new category you need at least one (e.g. dummy) product.
The fact table contains only the product ID.
Reporting of Product Category
If you report on a product level, i.e. with the dimensions month, product and category, you will need a “non-existing” product to fill in the report, so the “dummy” product entry in the dimension table is justified.
To get the dimension entries that have no usage into the report, you can either integrate them into the reporting query or run an additional query (“what was not used?”). Which is better depends on your situation. If you also have to consider product IDs in the fact table that are not defined in the dimension table, the integrated approach ends up as a full outer join, which could affect performance, so you may find the extra query (products from the dimension table for which NOT EXISTS a row in the fact table) more flexible.
If you often report only on the category level (without a product), you may find it useful to define a category table. Especially if the category has other attributes, such as a description, it is more convenient to have it in a dedicated table than to recover it from the product dimension with a DISTINCT query.
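The two reporting options above can be sketched with Python's built-in sqlite3 module standing in for the warehouse. Only the table names dim_Product and fact_EmployeeSalesPerMonth come from the question; the column names, the sample rows and the dummy product with key -1 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_Product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE fact_EmployeeSalesPerMonth (
    employee_id INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    month       TEXT NOT NULL,
    units_sold  INTEGER NOT NULL
);
INSERT INTO dim_Product VALUES
    (1,  'Laptop',         'electronics'),
    (2,  'Hammer',         'hardware'),
    (-1, 'No product yet', 'children toys');  -- hypothetical dummy row
INSERT INTO fact_EmployeeSalesPerMonth VALUES
    (10, 1, '2024-01', 5),
    (11, 2, '2024-01', 3);
""")

# Extra query: dimension entries that never appear in the fact table.
unused = conn.execute("""
    SELECT p.product_id, p.product_name, p.category
    FROM dim_Product AS p
    WHERE NOT EXISTS (
        SELECT 1
        FROM fact_EmployeeSalesPerMonth AS f
        WHERE f.product_id = p.product_id
    )
    ORDER BY p.product_id
""").fetchall()

# Integrated query: category totals, with unsold categories shown as 0.
by_category = conn.execute("""
    SELECT p.category, COALESCE(SUM(f.units_sold), 0) AS units
    FROM dim_Product AS p
    LEFT JOIN fact_EmployeeSalesPerMonth AS f
           ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(unused)       # [(-1, 'No product yet', 'children toys')]
print(by_category)  # [('children toys', 0), ('electronics', 5), ('hardware', 3)]
```

A LEFT JOIN is enough here because every fact row is assumed to have a matching dimension row; the full outer join is only needed when that assumption does not hold.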
Storing the category ID in the Fact Table
The driver for this decision is how dynamic your product hierarchy is. If the product categories change over time, this approach delivers a history of the attribute out of the box: after a product has been reassigned to another category, you can still report the “correct” category in which the product was sold. (You may also report all sales under the new category by simply ignoring the entry in the fact table and taking the category from the product dimension.) The point here is IMO not whether there is a category dimension, but whether you are required to maintain a history of the product attribute (here, the category).
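As a rough illustration of that history effect, here is a small sqlite3 sketch. The tables dim_Category and fact_Sales, their columns and the sample product are hypothetical; the only point is that the fact row keeps the category key that was valid at the time of the sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_Category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE dim_Product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES dim_Category(category_id)
);
CREATE TABLE fact_Sales (
    product_id  INTEGER NOT NULL,
    category_id INTEGER NOT NULL,  -- category at the time of the sale
    month       TEXT NOT NULL,
    units_sold  INTEGER NOT NULL
);
INSERT INTO dim_Category VALUES (1, 'electronics'), (2, 'hardware');
INSERT INTO dim_Product VALUES (1, 'Cordless drill', 2);
INSERT INTO fact_Sales VALUES (1, 2, '2024-01', 4);
""")

# The product is reassigned from hardware to electronics, then sold again.
conn.execute("UPDATE dim_Product SET category_id = 1 WHERE product_id = 1")
conn.execute("INSERT INTO fact_Sales VALUES (1, 1, '2024-02', 6)")

# Category as it was when the sale happened (key taken from the fact table).
as_sold = conn.execute("""
    SELECT c.category_name, SUM(f.units_sold)
    FROM fact_Sales AS f
    JOIN dim_Category AS c ON c.category_id = f.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

# Current category only (fact key ignored, product dimension used instead).
as_of_now = conn.execute("""
    SELECT c.category_name, SUM(f.units_sold)
    FROM fact_Sales AS f
    JOIN dim_Product AS p ON p.product_id = f.product_id
    JOIN dim_Category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(as_sold)    # [('electronics', 6), ('hardware', 4)]
print(as_of_now)  # [('electronics', 10)]
```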
So if you read my answer, summarise what I recommend, and find that for the topics
Extra category dimension table
Storing category key in fact table
Outer join or extra query to find unused dimension entries
the answer is “it depends”, then you got it!

I would make product category a standalone dimension only if it is frequently used without a product list. If it is not frequently used, or if it is mostly used together with a product list, then it makes more sense to make it an attribute instead.
Standalone dimensions are generally faster (I saw a 50% reduction in query times when such attributes were used) but also cost more space and load time.
I saw about a 7% size increase per such new dimension in a very big cube we had.
I would avoid them if they are not frequently used. It is about finding the correct balance. In my case, I had 50+ such attributes in one of the dimensions; that would have made the cube a lot larger, and it was already a large cube.
By the way, if you make it part of the Product dimension, a query that brings back the products belonging to a category can be resolved by autoexists instead of fact retrievals.

Related

Stock management system approach

I am currently working on a project relating to a medicine stock management system in VB.NET.
Basically I have 3 tables in a MySQL database that I will link to my program; orders, current stock, and medicine.
Each order has an autoincrementing order reference, delivery date, units ordered and the reference number of the medicine that has been ordered.
The stock table contains all the medicine names which are in stock, how many units are in stock, the cost price and the retail price.
Finally, each medicine has a reference, a name, and a supplier name.
The tasks I would like to perform throughout my program are:
1- Store and add medicines to the system
2- create, edit and view orders
3- view medicines in stock and the amount of units present
4- search for a specific field in each of these tables
I am quite new to object-oriented programming and VB.NET, so I would like to know the best approach to designing this program:
1- Windows Forms-based application with no inheritance, seeing that I have only 1 type of product (separate classes for everything)
2- Windows Forms-based, but with inheritance and an interface
3- any other more efficient approach?
If I were to choose option 2, I would just need a few tips on what my base class should probably be.
Thank you

Well, technically speaking, this is not only a stock management system if you are including orders. Stock is only the part that takes care of stocking items.
What you look for, in a nutshell, is probably:
(Purchase) Orders: Handle their logic separately from the stock logic. You will need Orders (list of orders) and OrdersLines tables. I'm just guessing that you mean purchase orders.
(Customer) Orders: You will need something similar for customer orders if you don't sell the goods in a shop but to partners on invoice.
Item: Table Item, which will hold the details of each medicine: columns like ItemNo, Name, Description, OrderCode, VendorReference, ReferencePicture, Price (if you have different prices for different quantities, you will need a separate table ItemPrices with a foreign key to Item), etc.
Stock: Tables StockCards (each linked to an Item; it stores data like the minimum, maximum and actual stock level, and you might pre-define the stock location) and StockRecords (to record movements of goods in and out of stock). You can also have a separate StockLocations table.
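To make the Item / StockCards / StockRecords split a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL and VB.NET. The column choices (MinLevel, MaxLevel, a signed Quantity per movement) and the sample rows are assumptions, not part of the suggestion above. In this sketch the actual stock level is derived by summing the movements instead of being stored on the card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (
    ItemNo          INTEGER PRIMARY KEY,
    Name            TEXT NOT NULL,
    VendorReference TEXT
);
CREATE TABLE StockCards (
    ItemNo   INTEGER PRIMARY KEY REFERENCES Item(ItemNo),
    MinLevel INTEGER NOT NULL,
    MaxLevel INTEGER NOT NULL
);
CREATE TABLE StockRecords (
    RecordID   INTEGER PRIMARY KEY,
    ItemNo     INTEGER NOT NULL REFERENCES Item(ItemNo),
    Quantity   INTEGER NOT NULL,  -- positive = goods in, negative = goods out
    RecordedOn TEXT NOT NULL
);
INSERT INTO Item VALUES (1, 'Paracetamol 500mg', 'V-001');
INSERT INTO StockCards VALUES (1, 20, 200);
INSERT INTO StockRecords (ItemNo, Quantity, RecordedOn) VALUES
    (1, 100, '2024-01-02'),
    (1, -85, '2024-01-10');
""")

# Current stock per item, plus a flag for items below their minimum level.
levels = conn.execute("""
    SELECT i.Name,
           COALESCE(SUM(r.Quantity), 0) AS OnHand,
           COALESCE(SUM(r.Quantity), 0) < c.MinLevel AS BelowMinimum
    FROM Item AS i
    JOIN StockCards AS c ON c.ItemNo = i.ItemNo
    LEFT JOIN StockRecords AS r ON r.ItemNo = i.ItemNo
    GROUP BY i.ItemNo, i.Name, c.MinLevel
    ORDER BY i.ItemNo
""").fetchall()

print(levels)  # [('Paracetamol 500mg', 15, 1)]
```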
As for the interface, I recommend making a List and a Detail VB.NET form for each table. The List contains the list of items and filters to find what you want. The Detail form shows all the details and lets you edit them. You can then load the forms into, e.g., a TabControl in your main application, and combine them, e.g. put a List into the left panel of a SplitContainer and the Detail into the right one, and use the DataGridView's CellClick event to load the selected item into the Detail module.

SSAS many to many dimension hierarchy

First of all, this is a very simple data warehouse that I made only to ask the following specific question.
Scenario:
I have one fact table, FactSales, and two dimensions, DimShop and DimProduct. They are separate from each other and directly connected to the fact table. Some shops can sell only selected products and, vice versa, some products can be sold only in specific shops. This gives us a many-to-many relationship. The problem is that when I try to slice my cube I get all combinations of shops and products.
Question:
How can I create a hierarchy between two separate dimensions in SSAS with a many-to-many relationship? I tried to use a bridge table but was unable to configure the hierarchy in SSAS. Is it even possible?

If you're trying to report on "what can happen" rather than "what did happen", you need a separate fact table & cube to represent the relationship between products and the shops that can sell the products. It's not really a hierarchy since it's many to many.
A simple cross reference fact should be fine:
FACT_PRODUCT_SHOP
ProductID
ShopID
Then when doing reports that want to see what products are allowed to be sold in what stores, you can use this fact table. The sales fact only shows "what actually happens".
You can even turn this fact into your Inventory fact table by adding a date, an "In Stock amount", an "On order amount", etc.
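On the relational side, the cross-reference fact could look roughly like the sqlite3 sketch below. FACT_PRODUCT_SHOP, ProductID and ShopID are taken from the answer above; the name columns, the Quantity column and the sample rows are assumptions. The SSAS cube configuration itself is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimShop    (ShopID    INTEGER PRIMARY KEY, ShopName    TEXT NOT NULL);
CREATE TABLE DimProduct (ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL);

-- "What can happen": one row per allowed product/shop combination.
CREATE TABLE FACT_PRODUCT_SHOP (
    ProductID INTEGER NOT NULL REFERENCES DimProduct(ProductID),
    ShopID    INTEGER NOT NULL REFERENCES DimShop(ShopID),
    PRIMARY KEY (ProductID, ShopID)
);

-- "What did happen": actual sales.
CREATE TABLE FactSales (
    ProductID INTEGER NOT NULL,
    ShopID    INTEGER NOT NULL,
    Quantity  INTEGER NOT NULL
);

INSERT INTO DimShop VALUES (1, 'North'), (2, 'South');
INSERT INTO DimProduct VALUES (1, 'Bread'), (2, 'Milk');
INSERT INTO FACT_PRODUCT_SHOP VALUES (1, 1), (2, 1), (1, 2);
INSERT INTO FactSales VALUES (1, 1, 7);
""")

# Only the allowed combinations, not the full shop x product cross product.
allowed = conn.execute("""
    SELECT s.ShopName, p.ProductName
    FROM FACT_PRODUCT_SHOP AS x
    JOIN DimShop    AS s ON s.ShopID    = x.ShopID
    JOIN DimProduct AS p ON p.ProductID = x.ProductID
    ORDER BY s.ShopName, p.ProductName
""").fetchall()

print(allowed)  # [('North', 'Bread'), ('North', 'Milk'), ('South', 'Bread')]
```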

It is possible to implement such a design but it may not perform well.
Basically instead of product and shop key in the fact table, you need an alternative key.
This key will be the unique combination of products and shops. That needs to be prepared in the ETL.
In a new dimension named "Shops and Products", on top of this key, you can create 2 hierarchies Product and Shop in the same dimension.
Additionally, you can create an unnatural hierarchy as you requested. But since it is an unnatural hierarchy, it may not perform well.
So in addition to Product and Shop hierarchies, you can provide following unnatural hierarchies: Shop -> Product, Product -> Shop.

How to build database for variant management in a webshop

I am searching for a guideline on how to set up my database for an auction site.
My problem is that there are a lot of different product types, let's say paintings, clothes, computers etc. They have different specifications, and it should be possible to put, for example, just Product A in size L up for auction, or the whole stock of Product B.
How should I build my database for optimal performance - and coding - in this case?

I would suggest the following database/object structure:
[Auction] n..1 [Category] 1..n [Variation Attribute] 1..n [Attribute Value]
An auction then has a category and several attribute values, each of which also refers to its variation attribute:
[Auction] = [Category], [Name], [Description]
[Auction_AttrVal] = [AuctionID], [VarAttrID], [AttrValID]
First of all you can have some kind of category table, which holds items like "Paintings", "Clothes", "Computers". An auction / product is assigned to one category.
Each category then defines variation attributes for this specific category. An example would be "Size" for the category "Clothes" or "CPU" for the category "Computers". You can also add predefined values for the variation attributes to limit the number of variations and avoid differentiations like "3GhZ" vs "3 GhZ".
This mechanism also allows for easy filtering of search results. You select a category and simply load all variation attributes as filters (or add a flag to an attribute to declare it as such) and offer the values for filtering to the end-user.
Furthermore you can make variation attributes for a category mandatory to force users who create the auctions (I'm assuming it's Consumer-to-Consumer) to provide sufficient information for their auction.
The code will probably be quite generic and simple. The database structure is highly flexible and extensible. Performance should be much better than having everything in one table. You should probably create an index on the AuctionID field of the Auction_AttrVal table. Please let me know if the database structure is not explained properly.
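A possible rendering of this structure as tables, sketched with Python's built-in sqlite3 module. The table and key names follow the bracket notation above where it gives them; the ID columns of the other tables, the IsMandatory flag, the index name and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (
    CategoryID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE VariationAttribute (
    VarAttrID   INTEGER PRIMARY KEY,
    CategoryID  INTEGER NOT NULL REFERENCES Category(CategoryID),
    Name        TEXT NOT NULL,
    IsMandatory INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE AttributeValue (
    AttrValID INTEGER PRIMARY KEY,
    VarAttrID INTEGER NOT NULL REFERENCES VariationAttribute(VarAttrID),
    Value     TEXT NOT NULL
);
CREATE TABLE Auction (
    AuctionID   INTEGER PRIMARY KEY,
    CategoryID  INTEGER NOT NULL REFERENCES Category(CategoryID),
    Name        TEXT NOT NULL,
    Description TEXT
);
CREATE TABLE Auction_AttrVal (
    AuctionID INTEGER NOT NULL REFERENCES Auction(AuctionID),
    VarAttrID INTEGER NOT NULL REFERENCES VariationAttribute(VarAttrID),
    AttrValID INTEGER NOT NULL REFERENCES AttributeValue(AttrValID)
);
CREATE INDEX idx_auction_attrval_auction ON Auction_AttrVal (AuctionID);

INSERT INTO Category VALUES (1, 'Clothes');
INSERT INTO VariationAttribute VALUES (1, 1, 'Size', 1);
INSERT INTO AttributeValue VALUES (1, 1, 'M'), (2, 1, 'L');
INSERT INTO Auction VALUES
    (1, 1, 'Blue T-shirt', 'Worn once'),
    (2, 1, 'Red T-shirt',  'New');
INSERT INTO Auction_AttrVal VALUES (1, 1, 2), (2, 1, 1);
""")

# Filters to offer for a category: every attribute with its predefined values.
filters = conn.execute("""
    SELECT va.Name, v.Value
    FROM Category AS c
    JOIN VariationAttribute AS va ON va.CategoryID = c.CategoryID
    JOIN AttributeValue     AS v  ON v.VarAttrID   = va.VarAttrID
    WHERE c.Name = ?
    ORDER BY va.Name, v.Value
""", ("Clothes",)).fetchall()

# Auctions in a category that match one selected filter value.
matches = conn.execute("""
    SELECT a.Name
    FROM Auction AS a
    JOIN Category           AS c  ON c.CategoryID = a.CategoryID
    JOIN Auction_AttrVal    AS av ON av.AuctionID = a.AuctionID
    JOIN VariationAttribute AS va ON va.VarAttrID = av.VarAttrID
    JOIN AttributeValue     AS v  ON v.AttrValID  = av.AttrValID
    WHERE c.Name = ? AND va.Name = ? AND v.Value = ?
    ORDER BY a.Name
""", ("Clothes", "Size", "L")).fetchall()

print(filters)  # [('Size', 'L'), ('Size', 'M')]
print(matches)  # [('Blue T-shirt',)]
```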

Database design issue - trying to avoid circular reference

I have been at this for a day and a bit now, trying to figure out how to best model the database (MySQL) for an app I'm developing for a friend who owns a bakery. The assumptions are as follows:
many (external) Bakers produce many Products
BakersProducts is updated fortnightly by certain staff who either call bakers for their product prices, or the bakers fax through their pricelist themselves, which the staff then update via a front-end UI.
the manager should be able to generate an order based on the products that she anticipates having.
So the front-end UI must be able to allow the manager to purely choose the products she would like in the order, and then present her with a list of Bakers to choose from for each product in the order.
In other words, Orders_has_Products should also include a reference to BakersProducts.bpID. I'm sure, though, that if I do this I would create a circular reference (of sorts) to Products.
I'm sure I've gone about this the wrong way, and would really appreciate anyone's advice as to how I can restructure my design to accommodate the chosen product price, i.e. to include BakersProducts.bpID.
Thank you!

This is not a circular reference, since
Order_has_products references Products
Order_has_products references BakersProducts
BakersProducts references Products
a circular reference would be if, for example,
Order_has_products references Products
Products references BakersProducts
BakersProducts references Order_has_products
Aside from that, circular references are relatively normal in a database (e.g. an Employees table with a manager field, where the manager is herself an employee, is a one-table circular reference).
What your design has is a simple redundancy, because one product is referenced twice in the Order_has_products table: once directly from the Products table, and once via the related BakersProducts record. There is a possibility of the two getting out of sync but, since you stated that the business rule is that the product is chosen before the baker, it's quite all right.
I would include the productID even if it were the other way around, because a little denormalization can go a long way towards speeding up queries. Otherwise you would have to scan the BakersProducts table even for simple questions like 'Did we have any bagels on Wednesday?'
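If the risk of the two references drifting apart is a concern, one option (not part of the design above) is a composite foreign key, so that a chosen bpID must belong to the same product as the order line. Below is a minimal sketch with Python's built-in sqlite3 module; the columns orderID, bakerID and price and the sample rows are assumptions. The same idea should carry over to MySQL with InnoDB, but that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Products (
    productID   INTEGER PRIMARY KEY,
    productName TEXT NOT NULL
);
CREATE TABLE BakersProducts (
    bpID      INTEGER PRIMARY KEY,
    bakerID   INTEGER NOT NULL,
    productID INTEGER NOT NULL REFERENCES Products(productID),
    price     REAL NOT NULL,
    UNIQUE (bpID, productID)  -- target of the composite foreign key below
);
CREATE TABLE Order_has_products (
    orderID   INTEGER NOT NULL,
    productID INTEGER NOT NULL REFERENCES Products(productID),
    bpID      INTEGER,  -- NULL until a baker has been chosen
    FOREIGN KEY (bpID, productID)
        REFERENCES BakersProducts(bpID, productID)
);
INSERT INTO Products VALUES (1, 'Bagel'), (2, 'Rye loaf');
INSERT INTO BakersProducts VALUES (1, 7, 1, 0.80), (2, 7, 2, 2.50);
""")

# Product first, baker later: allowed because bpID is still NULL.
conn.execute("INSERT INTO Order_has_products VALUES (100, 1, NULL)")
conn.execute(
    "UPDATE Order_has_products SET bpID = 1 WHERE orderID = 100 AND productID = 1"
)

# bpID 1 prices the bagel, so pairing it with the rye loaf is rejected.
try:
    conn.execute("INSERT INTO Order_has_products VALUES (100, 2, 1)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

order_lines = conn.execute(
    "SELECT orderID, productID, bpID FROM Order_has_products"
).fetchall()

print(mismatch_rejected)  # True
print(order_lines)        # [(100, 1, 1)]
```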

I think the mix-up is from a business process standpoint: you're getting requisitions mixed up with orders.
A requisition has a list of products needed without necessarily specifying the supplier of each, whereas an order is directed at a specific supplier, for specific price look-up codes (what bpID seems to represent). One requisition may spawn multiple orders if it is split across multiple suppliers, and even a single product may have its order split across multiple suppliers, perhaps due to vendor volume limits or locality of delivery.
You may want to provide a view of a requisition that shows the order line item(s) generated from each requisition line item, but that is a user interface concern.

One way of solving this is simply to eliminate the Products table and move productName into the BakersProducts table.
This only really works if the products are unique to each baker, i.e. if you do not expect different bakers to carry the same product.
If you do expect the bakers to carry the same product, then you may want to leave the separate Products table, but instead of having Order_has_Products.Products_productID, I would change it to Order_has_Products.bpID. If/when you need to access the productName (or other product related metadata that may go in that table) you could just do a join between BakersProducts and Products.

Merge two results in a MySQL query if the records are related by a field value

We have a products table. Users can create new products as copies of existing products.
Instead of simply duplicating this data, we're thinking that, in order to minimize database size, we would store only the differences from the "parent" product. (We're talking thousands of products.)
My thinking is that, for each new "child" product, we create a new record in that same table which has a "parent" field which has the ID of the parent product.
So, when querying for the "child" product, is there a way to merge the results so that any empty fields in the child record will be taken from the parent?
(I hope this makes sense)

Yes, you can do this.
Say, for example, your table is named Products and you want to retrieve the name of a child product. With c as the child row and p as its parent, you can query:
select COALESCE(NULLIF(c.productName, ''), p.productName) as childProductName
from Products c
left join Products p on p.ID = c.ParentID
Similarly you can do this for other fields.

I would anticipate that you'd want to have child products of child products (e.g. product C is based on product B, which is in turn based on product A), and there would be children of those, and so on (especially with user-generated content). This could get out of hand very quickly and require you either to write long, cumbersome queries or to collect the data with code rather than SQL queries.
I'm just offering this as a consideration because savings in size often come at a cost in processing time. Just be sure you consider this before you jump into something that can't easily be undone.
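For chains of arbitrary depth, one option is a recursive common table expression that walks up the parents until it finds a non-empty value. The sketch below uses Python's built-in sqlite3 module; MySQL 8.0 and later also accept WITH RECURSIVE. The colour column, the resolve helper and the sample rows are hypothetical, and the walk has exactly the processing cost mentioned above: one step per ancestor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ID          INTEGER PRIMARY KEY,
    ParentID    INTEGER REFERENCES Products(ID),
    productName TEXT,
    colour      TEXT
);
INSERT INTO Products VALUES
    (1, NULL, 'Widget', 'red'),   -- product A: complete record
    (2, 1,    NULL,     'blue'),  -- product B: overrides colour only
    (3, 2,    NULL,     NULL);    -- product C: inherits everything
""")

# Column names cannot be bound as parameters, so only known names are allowed.
OVERRIDABLE = {"productName", "colour"}


def resolve(product_id, column):
    """Return the nearest non-empty value of `column` along the parent chain."""
    if column not in OVERRIDABLE:
        raise ValueError(f"unknown column: {column}")
    sql = f"""
        WITH RECURSIVE chain(ID, ParentID, val, depth) AS (
            SELECT ID, ParentID, {column}, 0
            FROM Products
            WHERE ID = ?
            UNION ALL
            SELECT p.ID, p.ParentID, p.{column}, chain.depth + 1
            FROM Products AS p
            JOIN chain ON p.ID = chain.ParentID
        )
        SELECT val
        FROM chain
        WHERE val IS NOT NULL AND val <> ''
        ORDER BY depth
        LIMIT 1
    """
    row = conn.execute(sql, (product_id,)).fetchone()
    return row[0] if row else None


print(resolve(3, "productName"))  # Widget (inherited from product A)
print(resolve(3, "colour"))       # blue   (inherited from product B)
```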
