I know it is recommended to build a dimension using a view on the source table, because that allows making changes to the dimension contents without opening the SSAS project. However, I cannot create a view in the source system.
How can I load only a subset of data into a dimension?
A named query can be used to filter out unnecessary dimension members. Conceptually it is similar to a database view: the source RDBMS still takes care of the data processing (e.g. the filtering), but the query is defined in the SSAS project.
Open the data source view.
Right-click on the dimension source table and choose Replace Table / With New Named Query... from the context menu.
Add a WHERE clause to the query and make any other changes you need.
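For example, a sketch of what the named query might look like (the table and column names here are purely illustrative, not from your schema):

SELECT CustomerKey, CustomerName, Region
FROM dbo.DimCustomer        -- illustrative source table
WHERE IsActive = 1          -- filter out unwanted members

Only the members matching the WHERE clause will be loaded when the dimension is processed.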
We need to build a huge number of reports, but a lot of the metrics and fields are duplicated. Is it possible to build one (or a few) master reports to incorporate everything, and then, based on which 'report' the user requests, hide/show certain fields?
e.g. master report contains columns 1-100
user 1 needs to run report A, which uses columns 1-20, so hides columns 21-100
user 2 needs to run report B, which uses columns 21-40, so hides columns 1-20 and columns 41-100
Any help would be appreciated!!
Yes, it's possible.
Go to the condition explorer and create a string variable based on:
#sq($account.defaultName)#
Create values for all possible usernames.
Use this variable as a Style variable and set up report presentation (show/hide columns) for all values.
Also, think about using groups rather than named users, with:
#CSVIdentityNameList()#
It will be more complicated, but you won't be tied to the chosen usernames.
I would create report views pointing to the single report, one for each version of the report you want to run. Each view would pass in a different static value for a common parameter, say 'reportType'. The report would then use this static value to change the output returned to the user. This can be accomplished by hiding columns, as Alexey suggested, or you can create multiple pages and use a render variable that tests the value of the 'reportType' parameter and renders the appropriate page. The benefits of this approach over hiding are easier maintenance and a potential performance improvement, since non-displayed columns are not retrieved from the data source; this matters especially if the non-displayed columns force expensive, unnecessary joins.
I am looking for a scripted/automated way (presumably VBA?) to take an Access query and generate some kind of savable, searchable, publishable documentation of the data lineage. So if there were a bunch of layered/nested queries, or even passthrough queries, along the way, I want a way to trace the final fields in the specified query back until I get to the original source tables/fields.
Everything I've found seems to do database documentation focused on how the table relationships are configured. I'm looking for a way to get the documentation of the user-created portion, down to the field. I'm very open-minded on what format the output is in. I'm convinced this must be possible, but haven't had any luck yet.
I'm also open to recommendations for a third-party application if it could do this.
Thanks in advance!
Access does have a built-in dependency feature. The result is a VERY nice tree view of those dependencies, and you can even launch objects from that tree view to navigate the application, so to speak.
The option is found under database tools and is appropriately called Object Dependencies.
The result is displayed as a hierarchical tree view. While you may not want to use Name AutoCorrect, this feature will force on its tracking option. If this is a large application, a significant delay will occur on first run; after that, the results can be viewed instantly. As noted, not only do you get a hierarchical tree view, but objects in the tree view can be clicked to launch the object in question.
And the above will work for a query based on a query, etc., all the way down to the base tables.
https://www.dropbox.com/sh/f73rs3h9u9q2xk5/AAArloN_Cmf_WbPZ4W75I6KVa?dl=0
This is a set of queries I wrote to provide the kind of documentation you're looking for. It seems a bit kludgy, but it works for me. It's not as simple as the other response, but it provides output that can be incorporated into other documentation.
Note - the documentation is out of date with respect to Union queries. The query I have to analyze Union queries seems to pick up only the first two inputs to the Union, so I changed it to a Make Table query, and I manually edit the resulting table to add the missing relationships.
To use the queries:
Copy the table and all the queries into your database
Run the "Mapping Unions Make Table" query
Manually edit the Unions table if necessary
When you run any of the 3 main output queries, you are prompted for the Top object you want to analyze. Enter the name of a query or table to find all the dependencies for that object. The three main outputs are:
Mapping Summary - lists all of the objects that go into the top object and all of the objects that go into them, to a depth of about 10 (depth is controlled in the "Mapping all parents" query)
Mapping summary without duplicates
Mapping summary duplicates
I especially like the 2nd output - this is in a format that can be saved in Excel and input to Visio's Org Chart Wizard to get a simple graphical representation of the relationships. Then the 3rd output query can be used to manually add in the queries that go into more than one other query, which Visio's wizard cannot handle.
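If you'd rather roll your own along the same lines, much of the raw dependency information lives in Access's hidden system tables. A minimal sketch (the attribute codes are undocumented, so verify them against your own database; Type 5 in MSysObjects means a saved query, and Attribute 5 rows in MSysQueries hold the FROM sources in Name1 - Access SQL does not allow comments, hence the explanation here rather than inline):

SELECT o.Name AS QueryName, q.Name1 AS SourceObject
FROM MSysObjects AS o INNER JOIN MSysQueries AS q ON o.Id = q.ObjectId
WHERE o.Type = 5 AND q.Attribute = 5
ORDER BY o.Name;

Running the output back through itself (or into a temp table) lets you walk the chain from the top query down to the base tables.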
I am in the process of setting up a new cube. This cube is going to be very similar to an existing cube, except it will contain only data where the date is not in the past.
The current table has both past and present data. One idea was to create a database view on this table including only rows where the date is >= GETDATE(), but I don't think you can select a database view when setting up a cube; is this right? Another option would be to create a new database table that includes only those records with a present or future date.
The final option would be to filter the current cube, but I would prefer a fresh cube with only this data, as it will predominantly be used in Excel pivot tables, so I want to avoid any filters/MDX if possible.
What would be the best way of achieving this?
Thanks
You can most definitely add a view in your SSAS DSV. You can select views or tables in the object chooser dialog box. You can also write a named query. Many people advise that you should always use views in your DSV for your cube. It creates a layer between the cube and the physical tables.
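For example (all names here are illustrative), the view is just your date filter pushed down into the database:

CREATE VIEW dbo.vwFactCurrentAndFuture AS
SELECT *
FROM dbo.FactEvents                          -- illustrative fact table
WHERE EventDate >= CAST(GETDATE() AS date);  -- keep today and future rows

Point the DSV at this view instead of the base table and the cube only ever sees present and future rows; a named query in the DSV with the same WHERE clause achieves the same thing if you cannot create views.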
Another approach is to use the existing cube as a source for a Power Pivot model and filter the dates in the data model in Excel and then provide that model/Excel file to your users.
Not sure if there's any way to do this, but we're trying to programmatically determine dependencies in our ETL process: specifically, whether modifying a column in our source data set will impact our ETLs and, if so, which ones. For example, given a package 'myPackage' containing a data flow task that draws from 'sourceTable', includes various columns such as 'column1', and ultimately loads 'destinationTable' with 'column1New', is there any way to query the SSIS package itself to determine that column1New is based on column1? (Does lineage provide anything of use here?)
Each column you use in a transformation of your package is assigned an ID. The next component the column is passed down to refers to that column using the lineage ID property, but gives it a new ID of its own.
You could query the XML of your package to trace the path a column takes by creating a map of these IDs. However, this might be difficult to implement in a stable way.
This might help you on your way:
http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx
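As a very rough illustration of the idea: a .dtsx file is just XML, so you can load it and pull out the column and lineage attributes, e.g. in T-SQL. The file path below is hypothetical, and the exact element and attribute names vary between SSIS versions, so treat this strictly as a sketch:

DECLARE @pkg XML;

-- Load the package XML (path is a placeholder)
SELECT @pkg = CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK 'C:\packages\myPackage.dtsx', SINGLE_BLOB) AS src;

-- List every column the data flow knows about, with its lineage ID
SELECT
    c.value('@name',      'nvarchar(400)') AS column_name,
    c.value('@lineageId', 'nvarchar(400)') AS lineage_id
FROM @pkg.nodes('//*[local-name()="inputColumn" or local-name()="outputColumn"]') AS t(c);

Mapping each component's input lineage IDs to its output IDs then gives you the column-level path through the package.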
What is the best way to re-use reports on different tables / datasets?
I have a number of reports built in BIRT, which get their data from a flat (un-normalized) MySQL table, whose data has in turn been imported from an Excel sheet.
In BIRT, I've constructed my query like this, such that I can change the field names and re-use the report:
SELECT * FROM
(SELECT `index` AS "Index", name AS "Name", param1 AS "First Parameter" FROM mytable) t
However, when I switch to a new client's data, I need to change the query to the new data source, and this doesn't seem sustainable or anywhere near good practice.
So... what is a good practice?
Is this a reporting issue, or a database-design issue?
Do I create a standard view that the report connects to?
If I have a standard view, do I create a different view with the same structure for each data table, or keep replacing the view with a reference to the correct data table each time I run the report?
What's annoying is that the Excel sheets keep changing: new columns are added, and different clients name their data differently. Even if I can standardize this, I'd store different clients' data in different tables... so would I need to create a different report for each client, or pass the table name in to the report?
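For concreteness, I picture the "standard view" option looking something like this, re-pointed at each client's table (client_a_table is just a placeholder), though I'm not sure it's the right approach:

CREATE OR REPLACE VIEW report_source AS
SELECT `index` AS "Index", name AS "Name", param1 AS "First Parameter"
FROM client_a_table;  -- placeholder for each client's physical table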
There are two ways and the path you choose is really dictated by how much flexibility you have architecturally.
First, you are on the right track by renaming your selected columns to a common name, since that name is what is used to bind the data to the control on the report.

Have you considered a stored procedure to access the data? This removes the query from the report and allows you to set up the stored proc on any database to return the necessary columns.

If you cannot off-load the query to a stored proc, you can always rely on altering the query text at run time. Because BIRT reports are not compiled (they are XML), you can change the query based on parameters and have it executed for each run of the design. Look at the onCreate event for the Data Set: there you can access this.queryText and do any dynamic string substitution you need via JavaScript. Hidden parameters are a good way to help alter/tune the query.

If you build the Data Set correctly, changing the underlying data can be as easy as changing the Data Source and then re-associating the Data Set with the new Data Source (in the edit Data Set window). I have done this MANY times and it works well.

If you go down this route, I would add the Data Source(s), Data Set(s), and any controls they provide data to, to a report library. With the library you can use the controls in many reports and maintain them in one spot. If you update the library, all the reports using the library get updated as well.
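To make the stored-procedure idea concrete, here is a minimal MySQL sketch (the procedure name is illustrative, and the column names are taken from your query above; each client database gets its own copy returning the same aliases):

DELIMITER $$
CREATE PROCEDURE report_data()  -- illustrative name
BEGIN
    -- Each client's copy maps its own physical columns
    -- to the common aliases the report binds to.
    SELECT `index` AS `Index`,
           name    AS `Name`,
           param1  AS `First Parameter`
    FROM mytable;
END $$
DELIMITER ;

The report then just invokes the procedure (e.g. {call report_data()}), and only the procedure body differs from client to client.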
Alternatively, if you want to really commit to a fully reusable strategy that allows you to build a library of reusable components, you could check out the free Reusable Component Library at BIRT Exchange. In my opinion this strategy would give you the re-use you are looking for, but at the expense of maintainability: it is abstraction to the point of obfuscation, requiring totally generic names for columns and controls that make debugging very difficult. While it would not be my first choice (the option above would be), others have used it successfully, so I thought I would include it here since it directly speaks to your question.