Data From MULTIPLE Fact Tables in ONE Report (one Table) - business-objects

If every fact table in BusinessObjects has to have exactly one assigned context (one context per fact table), then it shouldn't be possible to show the data (measures) from TWO fact tables in ONE REPORT (i.e. in ONE TABLE). Is that right?

Actually, you can select objects from different contexts in the same data provider if you have the option Multiple SQL Statements for each context enabled (located in the SQL options of the Data Foundation). In that case, a separate SQL query is generated for each context and the results are combined in the microcube.
Another option is to create two different data providers within the same document and merge the common dimensions (this may happen automatically if auto-merging is turned on). You'll then be able to combine the two measures in one table along the merged dimensions from both fact tables.
I wouldn't state it as a hard rule that every fact table has to have a context. I only use contexts when I need to, e.g. to prevent a loop or to solve a chasm trap. Beyond that, there is no hard rule that requires the use of contexts. Just like aliases, they're techniques/features that help you define your data model and overcome data-modelling issues.

You can, in fact, use measures from multiple fact tables in the same block, and this is one of the strengths of Business Objects. If the two contexts share common dimensions, you can analyze the measures along the same axes. The fact that two contexts and two SQL statements are used to retrieve the information is completely transparent to the end user.

Related

Implementing inheritance in MySQL: alternatives and a table with only surrogate keys

This is a question that has probably been asked before, but I'm having some difficulty finding my exact case, so I'll explain my situation in search of some feedback.
I have an application that will be registering locations. There are several types of locations, and each location type has a different set of attributes, but I need to associate notes with locations regardless of their type, and also associate other types of content (mostly multimedia entries and comments) with those notes. With this in mind, I came up with a couple of solutions:
Create a table for each location type, and a "notes" table for every location table, linked by a foreign key. This is pretty troublesome, because I would have to create multimedia and comments tables for every notes table, e.g.:
LocationTypeA
    ID
    Attr1
    Attr2

LocationTypeA_Notes
    ID
    Attr1
    ...
    LocationTypeA_fk

LocationTypeA_Notes_Multimedia
    ID
    Attr1
    ...
    LocationTypeA_Notes_fk
And so on. This would be quite tedious to set up, but once it's done, developing on this structure shouldn't be too troublesome.
Create a table with a unique identifier for the location and point content at it, like so:
Location
    ID

LocationTypeA
    ID
    Attr1
    Attr2
    Location_fk

Notes
    ID
    Attr1
    ...
    Location_fk

Multimedia
    ID
    Attr1
    ...
    Notes_fk
As you can see, this is far simpler and also easier to develop, but I just don't like the look of that table with only IDs (honestly, that's the only objection I have to it; it's the option I like the most).
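A minimal MySQL sketch of option 2, assuming InnoDB and keeping the placeholder attribute names from above:

CREATE TABLE Location (
    ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE LocationTypeA (
    ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Attr1 VARCHAR(100),
    Attr2 VARCHAR(100),
    Location_fk INT UNSIGNED NOT NULL UNIQUE,  -- one type row per location
    FOREIGN KEY (Location_fk) REFERENCES Location (ID)
) ENGINE=InnoDB;

CREATE TABLE Notes (
    ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Attr1 TEXT,
    Location_fk INT UNSIGNED NOT NULL,  -- notes attach to any location type
    FOREIGN KEY (Location_fk) REFERENCES Location (ID)
) ENGINE=InnoDB;

CREATE TABLE Multimedia (
    ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Attr1 VARCHAR(255),
    Notes_fk INT UNSIGNED NOT NULL,
    FOREIGN KEY (Notes_fk) REFERENCES Notes (ID)
) ENGINE=InnoDB;

The UNIQUE constraint on Location_fk keeps a location from acquiring two LocationTypeA rows, while Notes and Multimedia hang off the supertype regardless of location type.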
Similar to option 2, but I would have an enormous table of attributes shaped like this:
Location
    ID
    Type

Attribute
    Name
    Value
And so on, or a table for each attribute, à la Drupal. This would be a pain to develop, because it would take several insert/update operations to do anything with a location, and the Attribute table would end up several times bigger than the Location table (or I'd end up with an enormous number of attribute tables). It also has the same issue as the surrogate-keys-only table (except it now has a "type", which I would use to define the location's behaviour programmatically), but it is a pretty solution.
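A sketch of option 3 as a single EAV table in MySQL (the LocationAttribute name is mine, not the question's), together with the pivot query that shows why reads get painful:

CREATE TABLE Location (
    ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Type VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE LocationAttribute (
    Location_fk INT UNSIGNED NOT NULL,
    Name VARCHAR(100) NOT NULL,
    Value VARCHAR(255),
    PRIMARY KEY (Location_fk, Name),
    FOREIGN KEY (Location_fk) REFERENCES Location (ID)
) ENGINE=InnoDB;

-- Reading one location back means pivoting rows into columns,
-- one MAX(CASE ...) per attribute:
SELECT l.ID,
       MAX(CASE WHEN a.Name = 'Attr1' THEN a.Value END) AS Attr1,
       MAX(CASE WHEN a.Name = 'Attr2' THEN a.Value END) AS Attr2
FROM Location l
LEFT JOIN LocationAttribute a ON a.Location_fk = l.ID
GROUP BY l.ID;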
So, to the question: which would be the better solution performance- and scalability-wise? Which would you go with, or which alternatives would you propose? I don't have a problem implementing any of these; options 2 and 3 would be an interesting development, as I've never done anything like that before, but I don't want to go with an option that will collapse on itself when the content grows a bit. You're probably thinking, "why not just use Drupal if you know it works like you expect it to?", and I'm thinking, "you obviously don't know how difficult it is to use Drupal; either that, or you're an expert, which I'm most definitely not".
Also, now that I've written all of this: do you think option 2 is a good idea overall? Do you know of a better way to group entities / simulate inheritance? (Please don't say "just use inheritance!"; I'm restricted to MySQL.)
Thanks for your feedback, I'm sorry if I wrote too much and meant too little.
ORM systems usually use the following strategies, which are mostly the same solutions you listed:
One table per hierarchy
Pros:
Simple approach.
Easy to add new classes, you just need to add new columns for the additional data.
Supports polymorphism by simply changing the type of the row.
Data access is fast because the data is in one table.
Ad-hoc reporting is very easy because all of the data is found in one table.
Cons:
Coupling within the class hierarchy is increased because all classes are directly coupled to the same table.
A change in one class can affect the table which can then affect the other classes in the hierarchy.
Space potentially wasted in the database.
Indicating the type becomes complex when significant overlap between types exists.
Table can grow quickly for large hierarchies.
When to use:
This is a good strategy for simple and/or shallow class hierarchies where there is little or no overlap between the types within the hierarchy.
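A minimal MySQL sketch of this strategy, with a discriminator column and illustrative names:

-- One table per hierarchy: all types share one table; a discriminator
-- column tells them apart, and columns unused by a type stay NULL.
CREATE TABLE Location (
    ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    LocationType VARCHAR(30) NOT NULL,  -- discriminator, e.g. 'A' or 'B'
    Attr1_A VARCHAR(100) NULL,          -- used only by type A
    Attr2_A VARCHAR(100) NULL,          -- used only by type A
    Attr1_B VARCHAR(100) NULL           -- used only by type B
) ENGINE=InnoDB;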
One table per concrete class
Pros:
Easy to do ad-hoc reporting as all the data you need about a single class is stored in only one table.
Good performance to access a single object’s data.
Cons:
When you modify a class you need to modify its table and the table of any of its subclasses. For example if you were to add height and weight to the Person class you would need to add columns to the Customer, Employee, and Executive tables.
Whenever an object changes its role, perhaps you hire one of your customers, you need to copy the data into the appropriate table and assign it a new POID value (or perhaps you could reuse the existing POID value).
It is difficult to support multiple roles and still maintain data integrity. For example, where would you store the name of someone who is both a customer and an employee?
When to use:
When changing types and/or overlap between types is rare.
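A MySQL sketch using the Person/Customer/Employee example from above (column names are illustrative); note how the columns inherited from an abstract Person repeat in every concrete table:

CREATE TABLE Customer (
    CustomerID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,         -- duplicated from Person
    PreferredStatus VARCHAR(20)
) ENGINE=InnoDB;

CREATE TABLE Employee (
    EmployeeID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,         -- duplicated from Person
    Salary DECIMAL(10,2)
) ENGINE=InnoDB;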
One table per class
Pros:
Easy to understand because of the one-to-one mapping.
Supports polymorphism very well as you merely have records in the appropriate tables for each type.
Very easy to modify superclasses and add new subclasses as you merely need to modify/add one table.
Data size grows in direct proportion to growth in the number of objects.
Cons:
There are many tables in the database, one for every class (plus tables to maintain relationships).
Potentially takes longer to read and write data using this technique because you need to access multiple tables. This problem can be alleviated if you organize your database intelligently by putting each table within a class hierarchy on different physical disk-drive platters (this assumes that the disk-drive heads all operate independently).
Ad-hoc reporting on your database is difficult, unless you add views to simulate the desired tables.
When to use:
When there is significant overlap between types or when changing types is common.
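Sketched in MySQL for the same Person example, with each subtype table sharing the supertype key one-to-one (illustrative names):

CREATE TABLE Person (
    PersonID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE Customer (
    PersonID INT UNSIGNED PRIMARY KEY,  -- same key as Person, 1:1
    PreferredStatus VARCHAR(20),
    FOREIGN KEY (PersonID) REFERENCES Person (PersonID)
) ENGINE=InnoDB;

CREATE TABLE Employee (
    PersonID INT UNSIGNED PRIMARY KEY,  -- a person can hold both roles
    Salary DECIMAL(10,2),
    FOREIGN KEY (PersonID) REFERENCES Person (PersonID)
) ENGINE=InnoDB;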
Generic Schema
Pros:
Works very well when database access is encapsulated by a robust persistence framework.
It can be extended to provide metadata to support a wide range of mappings, including relationship mappings. In short, it is the start of a mapping metadata engine.
It is incredibly flexible, enabling you to quickly change the way that you store objects because you merely need to update the meta data stored in the Class, Inheritance, Attribute, and AttributeType tables accordingly.
Cons:
Very advanced technique that can be difficult to implement at first.
It only works for small amounts of data because you need to access many database rows to build a single object.
You will likely want to build a small administration application to maintain the meta data.
Reporting against this data can be very difficult due to the need to access several rows to obtain the data for a single object.
When to use:
For complex applications that work with small amounts of data, or for applications where data access isn't very frequent, or where you can pre-load data into caches.
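A minimal MySQL sketch of such a generic schema, assuming the Class/Inheritance/Attribute/AttributeType tables the answer names plus a value table (all names are illustrative, not a fixed standard):

CREATE TABLE Class (
    ClassID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE Inheritance (
    SuperClassID INT UNSIGNED NOT NULL,
    SubClassID INT UNSIGNED NOT NULL,
    PRIMARY KEY (SuperClassID, SubClassID),
    FOREIGN KEY (SuperClassID) REFERENCES Class (ClassID),
    FOREIGN KEY (SubClassID) REFERENCES Class (ClassID)
) ENGINE=InnoDB;

CREATE TABLE AttributeType (
    AttributeTypeID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL           -- e.g. 'string', 'int', 'date'
) ENGINE=InnoDB;

CREATE TABLE Attribute (
    AttributeID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    ClassID INT UNSIGNED NOT NULL,
    AttributeTypeID INT UNSIGNED NOT NULL,
    Name VARCHAR(100) NOT NULL,
    FOREIGN KEY (ClassID) REFERENCES Class (ClassID),
    FOREIGN KEY (AttributeTypeID) REFERENCES AttributeType (AttributeTypeID)
) ENGINE=InnoDB;

-- Every stored object is a bag of rows here; building one object back
-- up means reading one row per attribute, which is the cost noted above.
CREATE TABLE AttributeValue (
    ObjectID INT UNSIGNED NOT NULL,
    AttributeID INT UNSIGNED NOT NULL,
    Value TEXT,
    PRIMARY KEY (ObjectID, AttributeID),
    FOREIGN KEY (AttributeID) REFERENCES Attribute (AttributeID)
) ENGINE=InnoDB;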

How to decide between row based and column based table structures?

I have a data set which has hundreds of parameters (with more coming in).
If I dump them in one table, it'll probably end up having hundreds of columns (and I'm not even sure how many, at this point).
I could go row-based, with a bunch of meta tables, but somehow a row-based structure feels unintuitive.
One more way would be to stay column-based, but split across multiple tables (divided logically), which seems like a good solution.
Is there any other way to do it? If yes, could you point me to some tutorial? (I am using MySQL.)
EDIT:
Based on the answers, I should clarify one thing: updates and deletes are going to be far less frequent than inserts and selects. Selects will be the bulk of the operations, so selects have to be fast.
I ran across several designs where a fourth option was possible:
Split your columns into searchable and auxiliary
Define a table with only searchable columns, and an extra BLOB column
Put everything in one table: searchable columns go as-is, auxiliary go as a BLOB
We used this approach with BLOBs of XML data, or even binary data representing the entire serialized object. The downside is that your auxiliary columns remain non-searchable for all practical purposes. The upside is that you can add new auxiliary columns at will without changing the schema, and you can later make a previously auxiliary column searchable with a schema change and a very simple migration program.
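A sketch of this hybrid layout, assuming a hypothetical measurement table; MySQL's native JSON type (5.7+) stands in here for the XML/binary BLOBs the answer mentions, but a plain BLOB or TEXT column plays the same role:

-- Indexed, searchable columns plus one opaque column for the rest.
CREATE TABLE measurement (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    device_id INT UNSIGNED NOT NULL,    -- searchable
    recorded_at DATETIME NOT NULL,      -- searchable
    payload JSON,                       -- auxiliary parameters, not searchable
    KEY idx_device_time (device_id, recorded_at)
) ENGINE=InnoDB;

-- Promoting an auxiliary field to a searchable column later is one
-- ALTER plus a backfill out of the payload:
ALTER TABLE measurement ADD COLUMN temperature DECIMAL(6,2) NULL;
UPDATE measurement
SET temperature = JSON_UNQUOTE(JSON_EXTRACT(payload, '$.temperature'));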
It all depends on the kind of data you need to store.
If it's not "relational" at all - for instance, a collection of web pages, documents, etc - it's usually not a good fit for a relational database.
If it's relational, but highly variable in schema - e.g. a product catalogue - you have a number of options:
single table with every possible column (your option 1)
"common" table with the attributes that each type shares, and joined tables for attributes for subtypes
table per subtype
If the data is highly variable and you don't want to make schema changes to accommodate the variations, you can use "entity-attribute-value" (EAV), though this has some significant drawbacks in the context of a relational database. I think this is what you have in mind with option 2.
If the data is indeed relational, and there is at least the core of a stable model in the data, you could of course use traditional database design techniques to come up with a schema. That seems to correspond with option 3.
Does every item in the data set have all those properties? If yes, then one big table might well be fine (although scary-looking).
On the other hand, perhaps you can group the properties. The idea being that if an item has one of the properties in the group, then it has all the properties in that group. If you can create such groupings, then these could be separate tables.
So should they be separate? Yes, unless you can prove that the cost of performing joins is unacceptable. Perform all SELECTs via stored procedures and you can denormalise later on without much trouble.
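A sketch of that last point, assuming hypothetical item and item_dimension tables: callers only ever see the procedure, so the tables behind it can later be merged or denormalized without touching any caller.

CREATE TABLE item (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE item_dimension (
    item_id INT UNSIGNED NOT NULL PRIMARY KEY,
    width DECIMAL(8,2),
    height DECIMAL(8,2),
    FOREIGN KEY (item_id) REFERENCES item (id)
) ENGINE=InnoDB;

DELIMITER //
CREATE PROCEDURE get_item(IN p_id INT UNSIGNED)
BEGIN
    -- the join is an implementation detail hidden from callers
    SELECT i.id, i.name, d.width, d.height
    FROM item i
    LEFT JOIN item_dimension d ON d.item_id = i.id
    WHERE i.id = p_id;
END //
DELIMITER ;

CALL get_item(1);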

How many columns in table to keep? - MySQL

I am stuck between row-based and column-based table design for storing some items, and the decision is which is easier to manage; if columns, how many columns are best to have? For example, I have object metadata: ideally there are 45 pieces of information (after normalization) at the same level that I need to store per object. So are 45 columns in a heavy read/write table good? Can it work flawlessly in a real-world situation of heavy concurrent reads and writes?
If all or most of your columns are filled with data and their number is fixed, then just use 45 fields. There is nothing inherently bad about 45 columns.
If all conditions are met:
There is a possibility of attributes which are neither known nor predictable at design time
The attributes are only occasionally filled (say, 10 or less per entity)
There are many possible attributes (hundreds or more)
No attribute is filled for most entities
then you have a so-called sparse matrix. This (and only this) model is better represented with an EAV table.
"There is a hard limit of 4096 columns per table", it should be just fine.
Taking the "easier to manage" part of the question:
If the property names you are collecting do not change, then columns is just fine. Even if it's sparsely populated, disk space is cheap.
However, if you have up to 45 properties per item (row) but those properties might be radically different from one element to another then using rows is better.
For example taking a product catalog. One product might have color, weight, and height. Another might have a number of buttons or handles. These are obviously radically different properties. Further this type of data suggests that new properties will be added that might only be related to a particular set of products. In this case, rows is much better.
Another option is to go NoSql and utilize a document based database server. This would allow you to set the named "columns" on a per item basis.
All of that said, management of rows will be done by the application. This will require some advanced DB skills. Management of columns will be done by the developer at design time; which is usually easier for most people to get their minds around.
I don't know if I'm correct, but I once read that in MySQL you should keep your tables to a minimum number of columns if possible (see http://dev.mysql.com/doc/refman/5.0/en/data-size.html). Note that this is MySQL-specific advice; I don't know whether the same reasoning applies to other DBMSs such as Oracle, Firebird, or PostgreSQL.
You could take a look at your table with 45 columns, analyze what you truly need, and move the optional fields into another table.
Hope it helps, good luck

Database design for recursive children

This design problem is turning out to be a bit more "interesting" than I'd expected....
For context, I'll be implementing whatever solution I derive in Access 2007 (not much choice--customer requirement. I might be able to talk them into a different back end, but the front end has to be Access (and therefore VBA & Access SQL)). The two major activities that I anticipate around these tables are batch importing new structures from flat files and reporting on the structures (with full recursion of the entire structure). Virtually no deletes or updates (aside from entire trees getting marked as inactive when a new version is created).
I'm dealing with two main tables, and wondering if I really have a handle on how to relate them: Products and Parts (there are some others, but they're quite straightforward by comparison).
Products are made up of Parts. A Part can be used in more than one Product, and most Products employ more than one Part. I think that a normal many-to-many resolution table can satisfy this requirement (mostly--I'll revisit this in a minute). I'll call this Product-Part.
The "fun" part is that many Parts are also made up of Parts. Once again, a given Part may be used in more than one parent Part (even within a single Product). Not only that, I think that I have to treat the number of recursion levels as effectively arbitrary.
I can capture the relations with a m-to-m resolution from Parts back to Parts, relating each non-root Part to its immediate parent part, but I have the sneaking suspicion that I may be setting myself up for grief if I stop there. I'll call this Part-Part. Several questions occur to me:
Am I borrowing trouble by wondering about this? In other words, should I just implement the two resolution tables as outlined above, and stop worrying?
Should I also create Part-Part rows for all the ancestors of each non-root Part, with an extra column in the table to store the number of generations?
Should Product-Part contain rows for every Part in the Product, or just the root Parts? If it's all Parts, would a generation indicator be useful?
I have (just today, from the Related Questions), taken a look at the Nested Set design approach. It looks like it could simplify some of the requirements (particularly on the reporting side), but thinking about generating the tree during the import of hundreds (occasionally thousands) of Parts in a Product import is giving me nightmares before I even get to sleep. Am I better off biting that bullet and going forward this way?
In addition to the specific questions above, I'd appreciate any other commentary on the structural design, as well as hints on how to process this, either inbound or outbound (though I'm afraid I can't entertain suggestions of changing the language/DBMS environment).
Bills of materials and exploded parts lists are always so much fun. I would implement Parts as your main table, with a Boolean field to say a part is "sellable". This removes the first-level recursion difference and the redundancy of Parts that are themselves Products. Then, implement Products as a view of Parts that are sellable.
You're on the right track with the PartPart cross-ref table. Implement a constraint on that table that says the parent Part and the child Part cannot be the same Part ID, to save yourself some headaches with infinite recursion.
Generational differences between BOMs can be maintained by creating a new Part at the level of the actual change, and in any higher levels in which the change must be accommodated (if you want to say that this new Part, as part of its parent hierarchy, results in a new Product). Then update the reference tree of any Part levels that weren't revised in this generational change (to maintain Parts and Products that should not change generationally if a child does). To avoid orphans (unreferenced Parts records that are unreachable from the top level), Parts can reference their predecessor directly, creating a linked list of ancestors.
This is a very complex web, to be sure; persisting tree-like structures of similarly-represented objects usually are. But, if you're smart about implementing constraints to enforce referential integrity and avoid infinite recursion, I think it'll be manageable.
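Since the back end here is Access, the following is only a sketch of the design described above in MySQL-flavoured SQL (enforced CHECK constraints need MySQL 8.0.16+, and WITH RECURSIVE needs MySQL 8.0); the names and the root PartID are hypothetical:

-- Parts as the main table with a "sellable" flag; Products is a view.
CREATE TABLE Part (
    PartID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    IsSellable BOOLEAN NOT NULL DEFAULT FALSE
) ENGINE=InnoDB;

-- The PartPart cross-reference; the CHECK blocks only a part containing
-- itself directly, so deeper cycles still need a check at import time.
CREATE TABLE PartPart (
    ParentPartID INT UNSIGNED NOT NULL,
    ChildPartID INT UNSIGNED NOT NULL,
    Quantity INT UNSIGNED NOT NULL DEFAULT 1,
    PRIMARY KEY (ParentPartID, ChildPartID),
    FOREIGN KEY (ParentPartID) REFERENCES Part (PartID),
    FOREIGN KEY (ChildPartID) REFERENCES Part (PartID),
    CHECK (ParentPartID <> ChildPartID)
) ENGINE=InnoDB;

CREATE VIEW Product AS
SELECT PartID, Name FROM Part WHERE IsSellable;

-- Fully exploding one product's parts list, with a generation counter:
WITH RECURSIVE bom AS (
    SELECT ChildPartID, Quantity, 1 AS Generation
    FROM PartPart
    WHERE ParentPartID = 42             -- hypothetical root Part
    UNION ALL
    SELECT pp.ChildPartID, bom.Quantity * pp.Quantity, bom.Generation + 1
    FROM bom
    JOIN PartPart pp ON pp.ParentPartID = bom.ChildPartID
)
SELECT * FROM bom;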
I would have one part table for atomic parts, then a superpart table with a superpartID and its related subparts. Then you can have a product/superpart table.
If a part is also a superpart, then you just have one row for the superpartID with the same partID.
Maybe 'component' is a better term than superpart. Components could be reused in larger components, for example.
You can find sample Bill of Materials database schemas at
http://www.databaseanswers.org/data_models/
The website offers Access applications for some of the models. Check with the author of the website.

Multiple Mappers for the same class in different databases

I am currently working on a Wikipedia API, which means that we have a database for each language we want to use. The structure of each database is identical; they only differ in their language. The only place where this information is stored is in the name of the database.
When starting with one language, the straightforward approach of using a mapping between the tables and the needed classes (e.g. Page) looked fine. We defined an engine and corresponding metadata. When we added a second database with its own setup for engine and metadata, we ran into the following error:
ArgumentError:
Class '<class 'wp.orm.types.pages.Page'>' already has a primary mapper defined.
Use non_primary=True to create a non primary Mapper. clear_mappers() will remove
*all* current mappers from all classes.
I found an email saying that there must be at least one primary mapper, so using this option for all databases doesn't seem feasible.
The next idea is to use sharding. For that we need a way to distinguish between the databases from the perspective of an instance, as noted in the docs:
"You need a function which can return a single shard id, given an instance to be saved; this is called 'shard_chooser'."
I am stuck here. Is there a way to get the database name, given an object it was loaded from? Or a possibility to add a static attribute based on the engine? The alternative would be to add a language column to every table, which is just ugly.
Am I overlooking other possibilities? Any ideas on how to define multiple mappers for the same class that map against tables in different databases?
I asked this question on a mailing list and got this answer from Michael Bayer:
If you'd like distinct classes to indicate that they "belong" in a different database, and you have very clear lines as to how this is performed, use the "entity_name" concept described at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName . This sounds very much like your use case.
Horizontal sharding is a method of storing many homogeneous instances across multiple databases, with the implication that you're creating one big "virtual" database among partitions - the main concept is that an individual instance gets placed in different partitions based on some ruleset. This is a little like your use case as well, but since you have a very simple delineation, I think the "entity name" approach is easier.
So the basic idea is to generate anonymous subclasses for each desired mapping, distinguished by their entity name. The details can be found in Michael's link above.