SSIS MDX Query Problem - ssis

Hello all!
I have a small problem with an MDX query.
I am trying to query the Damage Repair Types from my cube. First, here are my dimension and fact table:
Dimension: Damage Repair Type
RepairTypeKey | Name        | RepairTypeAlternateKey | RepairSubTypeAlternateKey | SubName
0             | Unknown     | 0                      | NULL                      | NULL
1             | Repair      | 1                      | 1                         | 1 Boil
2             | Replacement | 2                      | NULL                      | NULL
3             | Repair      | 1                      | 2                         | 2 Boils
4             | Repair      | 1                      | 3                         | 3 Boils
So my fact table "ClaimCosts" has one RepairTypeKey for every claim. I filled the tables and designed a cube. The dimension has a hierarchy with RepairType and SubRepairType. I processed the cube and it works fine:
Damage Repair Type
  Hierarchy
    Members
      All
        Replacement
        Repair
          1 Boil
          2 Boils
          3 Boils
        Unknown
Now I create a query with MDX:
select
  {
    [Measures].[Claim Count],
    [Measures].[Claim Cost Position Count],
    [Measures].[Claim Cost Original],
    [Measures].[Claim Cost Original Average],
    [Measures].[Claim Cost Possible Savings],
    [Measures].[Claim Cost Possible Savings Average],
    [Measures].[Claim Cost Possible Savings Percentage]
  } on 0,
  NON EMPTY {
    NonEmpty([Damage Repair Type].[Hierarchy].Allmembers, ([Measures].[Claim Count]))
  } on 1
from
  Cube
where
  (
    ({StrToMember(#DateFrom) : StrToMember(#DateTo)}),
    ([Claim Document Type].[Document Type].&[4])
  )
When I run the query it works, but too many rows are shown:
Damage Repair Type | Damage Repair Sub Type | Claim Count | ....
NULL |NULL | 200000
Replacement | NULL | 150000
Repair | NULL | 45000
Repair | 1 Boil | 10000
Repair | 2 Boil | 15000
Repair | 3 Boil | 19000
Unknown |NULL | 1000
My problem is the first row (total) and the third row (subtotal). I don't need these rows, but I don't know how to filter them out. I don't need these sums because I already have the children with the right counts.
How can I filter these out? Please help me. It doesn't work!
Sorry for my bad English and thank you!
Alex

Instead of
NonEmpty([Damage Repair Type].[Hierarchy].Allmembers, ([Measures].[Claim Count]))
you can use:
NonEmpty([Damage Repair Type].[Hierarchy].Levels(2).Members, [Measures].[Claim Count])
This excludes the All member. More generally, when you use the members of a level (e.g. [dim].[hier].[lvl].Members) instead of the members of the whole hierarchy (e.g. [dim].[hier].Members), you don't get the aggregate members from the levels above, such as the All member, which is present in every hierarchy except non-aggregatable attribute hierarchies.

Related

MS Access help needed forming a specific report

I have a table with a column for agent names and a column for each of the skills those agents could possibly have. Each skill the agent is assigned shows a 1 in the field under that skill.
Columns look like this:
+---------+----------+----------+----------+
| Name | 'Skill1' | 'Skill2' | 'Skill3' |
+---------+----------+----------+----------+
| John | 1 | | 1 |
| Sam | 1 | 1 | |
| Roberta | 1 | | 1 |
+---------+----------+----------+----------+
I would like to make a query that returns a list of all agent names that have a 1 for each particular skill. The query would return something like this:
+-----------+
| Skill 1 |
+-----------+
| John |
| Sam |
| Roberta |
+-----------+
Additionally I would like to be able to query a single name and retrieve all skills that agent has (all rows the Name column has a 1 in) like this:
+-----------+
| John |
+-----------+
| Skill 1 |
| Skill 3 |
+-----------+
I've done this in Excel using an index but I'm new to Access and not sure how to complete this task.
Thanks in advance.
One of the reasons you are finding this task difficult is that your database is not normalised, so because of the way it is structured you are working against MS Access, not with it.
Consequently, whilst a solution is still possible with the current data, the resulting queries will be painful to construct and will either be full of multiple messy iif statements, or several union queries performing the same operations over & over again, one for each 'skill'.
Then, if you ever wish to add another Skill to the database, all of your queries have to be rewritten!
Whereas, if your database was normalised (as Gustav has suggested in the comments), the task would be a simple one-liner; and what's more, if you add a new skill later on, your queries will automatically output the results as if the skill had always been there.
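To make the contrast concrete, here is a rough sketch of what the union-query approach might look like against the current wide table. It uses Python's built-in sqlite3 module as a stand-in for Access, and the table name Agents is an assumption (the question does not name the table); the rows are the three from the question.

```python
import sqlite3

# In-memory SQLite stands in for Access; the table name "Agents" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Agents (Name TEXT, Skill1 INTEGER, Skill2 INTEGER, Skill3 INTEGER)"
)
conn.executemany(
    "INSERT INTO Agents VALUES (?, ?, ?, ?)",
    [("John", 1, None, 1), ("Sam", 1, 1, None), ("Roberta", 1, None, 1)],
)

# One SELECT branch per skill column, glued together with UNION ALL.
# A new Skill4 column would need a fourth branch in every query like this.
query = """
    SELECT 'Skill1' AS Skill FROM Agents WHERE Name = :name AND Skill1 = 1
    UNION ALL
    SELECT 'Skill2' FROM Agents WHERE Name = :name AND Skill2 = 1
    UNION ALL
    SELECT 'Skill3' FROM Agents WHERE Name = :name AND Skill3 = 1
    ORDER BY Skill
"""
john_skills = [row[0] for row in conn.execute(query, {"name": "John"})]
print(john_skills)
```

The repeated branches are the maintenance cost described above: the query text grows with the number of skill columns.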
Your data has a many-to-many relationship: an agent may have many skills, and a skill may be known by many agents.
As such, the most appropriate way to represent this relationship is using a junction table.
Hence, you would have a table of Agents such as:
tblAgents
+-----+-----------+----------+------------+
| ID | FirstName | LastName | DOB |
+-----+-----------+----------+------------+
| 1 | John | Smith | 1970-01-01 |
| ... | ... | ... | ... |
+-----+-----------+----------+------------+
This would only contain information unique to each agent, i.e. minimising the repeated information between records in the table.
You would then have a table of possible Skills, such as:
tblSkills
+-----+---------+---------------------+
| ID | Name | Description |
+-----+---------+---------------------+
| 1 | Skill 1 | Skill 1 Description |
| 2 | Skill 2 | Skill 2 Description |
| ... | ... | ... |
+-----+---------+---------------------+
Finally, you would have a junction table linking Agents to Skills, e.g.:
tblAgentSkills
+----+----------+----------+
| ID | Agent_ID | Skill_ID |
+----+----------+----------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
+----+----------+----------+
Now, say you want to find out which agents have Skill 1, the query is simple:
select Agent_ID from tblAgentSkills where Skill_ID = 1
What if you want to find out the skills known by an agent? Equally simple:
select Skill_ID from tblAgentSkills where Agent_ID = 1
Of course, these queries will merely return the ID fields as present in the junction table - but since the ID uniquely identifies a record in the tblAgents or tblSkills tables, such ID is all you need to retrieve any other required information:
select
    tblAgents.FirstName,
    tblAgents.LastName
from
    tblAgentSkills inner join tblAgents on
    tblAgentSkills.Agent_ID = tblAgents.ID
where
    tblAgentSkills.Skill_ID = 1
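As a minimal sketch of the junction-table design in action, the snippet below builds the three tables in an in-memory SQLite database (a stand-in for Access) and runs both lookups. The junction rows are the ones shown above; the second and third agents and their last names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblAgents (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE tblSkills (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblAgentSkills (
        ID INTEGER PRIMARY KEY,
        Agent_ID INTEGER REFERENCES tblAgents(ID),
        Skill_ID INTEGER REFERENCES tblSkills(ID)
    );
    -- Agents 2 and 3 are hypothetical sample rows.
    INSERT INTO tblAgents VALUES (1, 'John', 'Smith'), (2, 'Sam', 'Jones'), (3, 'Roberta', 'Lee');
    INSERT INTO tblSkills VALUES (1, 'Skill 1'), (2, 'Skill 2');
    INSERT INTO tblAgentSkills VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2);
""")

# Which agents have Skill 1? Join the junction table to tblAgents.
agents_with_skill_1 = [row[0] for row in conn.execute("""
    SELECT tblAgents.FirstName
    FROM tblAgentSkills INNER JOIN tblAgents
        ON tblAgentSkills.Agent_ID = tblAgents.ID
    WHERE tblAgentSkills.Skill_ID = 1
    ORDER BY tblAgents.ID
""")]

# Which skills does agent 1 have? Join the junction table to tblSkills.
skills_of_agent_1 = [row[0] for row in conn.execute("""
    SELECT tblSkills.Name
    FROM tblAgentSkills INNER JOIN tblSkills
        ON tblAgentSkills.Skill_ID = tblSkills.ID
    WHERE tblAgentSkills.Agent_ID = 1
    ORDER BY tblSkills.ID
""")]

print(agents_with_skill_1, skills_of_agent_1)
```

Adding a new skill is then just a new row in tblSkills; neither query changes.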
To get all agents with Skill1, open the query designer and create a query on the Skills table that outputs AgentName with a criterion of 1 on Skill1. This will generate the following SQL:
SELECT Skills.AgentName
FROM Skills
WHERE (((Skills.Skill1)=1));
If you adjust the names you can also paste this query into the sql pane of the designer to get the query you want.
To get all the skills an agent has, I chose a parameterized query. Open the query designer and create a new query with [Agent] as the criterion on AgentName.
When you run this query it will prompt you for the name of the agent. Make sure to type the agent name exactly. Here is the resulting SQL:
SELECT Skills.AgentName, Skills.Skill1, Skills.Skill2, Skills.Skill3
FROM Skills
WHERE (((Skills.AgentName)=[Agent]));
If you continue working with this query, I would improve the table design by breaking your table into a skills table, an agents table, and a skills&agents table. Then link the skills and agents tables to the skills&agents table in a many-to-many relationship. The query to get all of an agent's skills can then be built in the designer by joining those three tables.

Basic MySQL query is pretty slow. Am I indexing my tables properly?

I'm currently in the process of optimizing a very large MySQL database, around which I'm building a web-based query interface.
The database will have two tables. The first table is already optimized (I believe), and contains information regarding 950 meteorological data observation stations across the US:
Description for: stations (950 records)
+-----------+------------+--------+-------+---------+----------------+
|Field |Type | NULL |KEY | Default | Extra |
+-----------+------------+--------+-------+---------+----------------+
|id |INT |NO |PRI |NULL |auto_increment |
|stationID |char(4) |NO |PRI |NULL | |
|name |varchar(16) |YES | |NULL | |
|state |char(2) |YES |MUL |NULL | |
|lat |float(6,2) |YES | |NULL | |
|lon |float(6,2) |YES | |NULL | |
|elev |INT |YES | |NULL | |
+-----------+------------+--------+-------+---------+----------------+
The other table contains observations collected at these stations from 2014 through 2017 (constructed, not optimized):
Description for: metar_records (359786049 records)
+-----------+------------+--------+-------+---------+----------------+
|Field |Type | NULL |KEY | Default | Extra |
+-----------+------------+--------+-------+---------+----------------+
|auto_id |INT |NO |PRI |NULL |auto_increment |
|stationID |char(4) |NO |MUL |0 | |
|zdatetime |datetime |NO | |NULL | |
|ldatetime |datetime |NO | |NULL | |
|temp |tinyint(4) |YES | |NULL | |
|dew |tinyint(4) |YES | |NULL | |
|wspd |tinyint(3) |YES | |NULL | #unsigned |
|wdir |tinyint(3) |YES | |NULL | #unsigned |
|wgust |tinyint(3) |YES | |NULL | #unsigned |
|VRB |char(3) |YES | |NULL | |
+-----------+------------+--------+-------+---------+----------------+
where stationID is the field upon which the two tables are related. The metar_records has a unique index on ('stationID', 'zdatetime'). A list of metar_records table indexes:
+-------------+--------+---------+------------+-----------+-----------+----------+
|Table |Non_UNQ |Key_name |Seq_in_index|Column_name|Cardinality|Index_type|
+-------------+--------+---------+------------+-----------+-----------+----------+
|metar_records|0 |PRIMARY |1 |auto_id |358374698 |BTREE |
|metar_records|0 |sz_date |1 |stationID |820079 |BTREE |
|metar_records|0 |sz_date |2 |zdatetime |358374698 |BTREE |
|metar_records|1 |stationID|1 |stationID |598288 |BTREE |
+-------------+--------+---------+------------+-----------+-----------+----------+
Here's where I'm really confused: I also have a test table (called metar_test), which is identical to metar_records aside from having no auto_increment field, and has no indexes whatsoever. Execution of SELECT COUNT(*) FROM metar_test; lasts 0.02 seconds at most, whereas SELECT COUNT(*) FROM metar_records; takes roughly 1 minute and 18 seconds to complete.
I understand that having a table this large will result in some long query times, but metar_records is only 3.36 times larger than metar_test -- why is there such a large discrepancy between the SELECT COUNT(*) ... queries for the two tables? I'm not particularly well versed in data storage, but this difference seems unexpectedly large to me.
How can I improve my indexing to optimize the large table size? Is it possible to reduce the query duration from here?
You can try:
select count(stationID)
from metar_records
This should let the query optimizer use the index on stationID and therefore read less data than count(*), which reads through the full data.
I would restructure your tables like this.
Stations.
Id as auto inc.
StationId char (4) unique
Rest...
Metar_records
Id as auto inc
StationId referencing stations.id
Rest...
This way your keys are much shorter and numeric, which should improve performance.
You probably have the "Query cache" turned on. That makes it so that running exactly the same query a second time is very fast. To properly time a query, do this twice and take the second timing:
SELECT SQL_NO_CACHE ...
COUNT(*) is the usual pattern for counting rows. COUNT(col) is slower because it needs to check each col for being NOT NULL.
You have 3 INDEXes on your big table; you need only one:
PRIMARY KEY(stationID, zdatetime)
And, by clustering that way, several likely queries will run faster.
Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.
You should use ENGINE=InnoDB, not ENGINE=MyISAM (see SHOW CREATE TABLE).
SELECT COUNT(*) ... is not a very common query; you should not take much stock in how fast it runs.
PARTITIONing is not likely to help performance. Let's see more of your queries -- to double check my claim. MySQL has no parallel processing, even for PARTITIONed tables.
Also toss the id AUTO_INCREMENT from the Stations table; instead, have PRIMARY KEY(stationID).
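Two of the points above can be illustrated with a small sketch: the composite PRIMARY KEY(stationID, zdatetime), and the fact that COUNT(col) skips NULLs while COUNT(*) counts every row. This uses an in-memory SQLite database rather than MySQL, so it shows the semantics only and says nothing about timings or InnoDB clustering; the station codes and readings are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down metar_records with the composite primary key suggested above.
conn.execute("""
    CREATE TABLE metar_records (
        stationID TEXT NOT NULL,
        zdatetime TEXT NOT NULL,
        temp INTEGER,
        PRIMARY KEY (stationID, zdatetime)
    )
""")
conn.executemany(
    "INSERT INTO metar_records VALUES (?, ?, ?)",
    [
        ("KSEA", "2014-01-01 00:00:00", 5),
        ("KSEA", "2014-01-01 01:00:00", None),  # missing temperature
        ("KPDX", "2014-01-01 00:00:00", 7),
    ],
)

# COUNT(*) counts rows; COUNT(temp) counts only rows where temp IS NOT NULL.
total_rows = conn.execute("SELECT COUNT(*) FROM metar_records").fetchone()[0]
rows_with_temp = conn.execute("SELECT COUNT(temp) FROM metar_records").fetchone()[0]

# The composite key also enforces uniqueness of (stationID, zdatetime).
try:
    conn.execute("INSERT INTO metar_records VALUES ('KSEA', '2014-01-01 00:00:00', 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(total_rows, rows_with_temp, duplicate_rejected)
```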

How can I count values in a range that are greater, equal, less than values in a another range in Excel?

I think my question is really basic but I can't seem to find the answer anywhere. I have a table "Matches" that shows a home team, an away team, the round and the score. Like this:
Round | HomeTeam | HTGoals | ATGoals | AwayTeam
-----------------------------------------------
1 | team1 | 1 | 0 | team4
1 | team2 | 1 | 1 | team5
1 | team3 | 0 | 1 | team6
2 | team4 | 3 | 0 | team1
2 | team5 | 2 | 0 | team2
2 | team6 | 2 | 0 | team3
-----------------------------------------------
(Sorry I can't paste images yet)
I need a second table with Home Team victories, away team victories, and ties, like this:
|Home|Tie|Away
---------------------
Round1|1 |1 |1
Round2|3 |0 |0
But I can't find a formula that will do that comparison automatically. I tried this:
=COUNTIFS(A:A;1;C:C;">"&D:D)
but it doesn't work.
Do you know if there is a way to COUNT the times one range of values (in this case HTgoals) is greater than another range of values (in this case ATGoals) comparing each cell in the first range with the respective cell in the next range?
One more thing, without considering the round, if I just verify the goal comparison, I found a solution that is everywhere:
=SUMPRODUCT(--(C:C>D:D))
And it works for ">" and "<" but it doesn't for "=" so it is not working for me.
I have two possible solutions that I'm trying to avoid but if there is no formula, please let me know.
1- I can create an extra column that compares the scores with an IF / ELSEIF / ELSE that returns "HW", "AW" or "TIE", and then do COUNTIFS() with the round and the results. (I'm trying to avoid this one because my data is coming from a database)
2- I can go ahead and make a procedure in my database (MySQL), which I'm also trying to avoid because I would end up having tons and tons of stored procedures for each competition. I need to have that logic in different workspaces (or spreadsheets)
Three formulas:
Home:
=SUMPRODUCT(($A$2:$A$7=G2)*($C$2:$C$7> $D$2:$D$7))
Tie:
=SUMPRODUCT(($A$2:$A$7=G2)*($C$2:$C$7= $D$2:$D$7))
Away:
=SUMPRODUCT(($A$2:$A$7=G2)*($C$2:$C$7< $D$2:$D$7))
You can also use this one array formula:
=SUMPRODUCT(($A$2:$A$7=$G2)*(CHOOSE(COLUMN(A:A),--($C$2:$C$7> $D$2:$D$7),--($C$2:$C$7= $D$2:$D$7),--($C$2:$C$7< $D$2:$D$7))))
Put it in H2, hit Ctrl-Shift-Enter instead of Enter, then copy/drag over and down.
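For readers who want to check the logic outside Excel, here is a small Python sketch of what the SUMPRODUCT formulas compute: for each row, the "round matches" test and the goal comparison are each 0 or 1, they are multiplied, and the products are summed. The helper name sumproduct_count is made up for this illustration; the data is the Matches table from the question.

```python
import operator

# (Round, HomeTeam, HTGoals, ATGoals, AwayTeam), copied from the question.
matches = [
    (1, "team1", 1, 0, "team4"),
    (1, "team2", 1, 1, "team5"),
    (1, "team3", 0, 1, "team6"),
    (2, "team4", 3, 0, "team1"),
    (2, "team5", 2, 0, "team2"),
    (2, "team6", 2, 0, "team3"),
]


def sumproduct_count(rows, round_no, compare):
    """Mirror of SUMPRODUCT((A=round)*(C <op> D)): booleans multiply as 0/1."""
    return sum((r[0] == round_no) * compare(r[2], r[3]) for r in rows)


# One (Home, Tie, Away) triple per round, like the second table in the question.
table = {
    round_no: (
        sumproduct_count(matches, round_no, operator.gt),  # home win
        sumproduct_count(matches, round_no, operator.eq),  # tie
        sumproduct_count(matches, round_no, operator.lt),  # away win
    )
    for round_no in (1, 2)
}
print(table)
```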
SumProduct will work:
|Home|Tie|Away
---------------------
Round1|a |b |c
Round2|3 |0 |0
a: =SUMPRODUCT(((A:A=1)*(C:C>D:D)))
b: =SUMPRODUCT(((A:A=1)*(C:C=D:D)))
c: =SUMPRODUCT(((A:A=1)*(C:C<D:D)))
I'll let you figure out Round2

Access a parent field from sub query in mysql

I'm trying to access a field being called from the parent query within a nested one and here is my table
TABLE: reminders.
Columns: id:PK, rid:VARCHAR, title:VARCHAR, remind:Integer, start_day:DATE
SELECT id, remind, rid, title
FROM reminders
WHERE DATEDIFF(start_day, NOW()) <= (SELECT LEAST(3, remind))
Basically, the second "remind" column in the LEAST() call is supposed to reference the "remind" value of each row being scanned, but for reasons I can't work out I keep getting unexpected results.
EDIT
In response to Gordon's request that I provide more detailed info: I'm not sure how best to present table data here, but I'll try.
So basically I'm trying to SELECT all items from the reminders table WHERE the difference between the set day (start_day) and today doesn't exceed one of two values: either 3 or the value in the remind column of the current row. If the value set there is less than 3 it should be used; if it exceeds 3, then 3 should be used. Here's a visual of the table.
+---+-----------------+--------------------+-----------------+-------------+
|id | rid | title | start_day | remind |
+---+-----------------+--------------------+-----------------+-------------+
|1 | ER456GH | This is real deep | 2014-01-01 | 10 |
|2 | OUBYV90 | This is also deep | 2014-01-13 | 10 |
|3 | UI90POL | This is deeper | 2014-01-13 | 60 |
|4 | TWEET90 | This is just deep | 2014-01-14 | 0 |
+---+-----------------+--------------------+-----------------+-------------+
So in editing this I realized that there was a bad table entry under remind in the 4th row (remind = 0) that was causing the unexpected results. Sigh. Some serious short-sightedness on my part/lack of sleep I guess. The query does work. Thanks again.
You don't need a subquery here. Does this work?
SELECT id, remind, rid, title
FROM reminders
WHERE DATEDIFF(start_day, NOW()) <= LEAST(3, remind);
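To see how the condition behaves on the sample rows, here is a small Python sketch of the same comparison. It assumes "today" is 2014-01-12 (a date picked for illustration, not stated in the question) and treats DATEDIFF(start_day, NOW()) as start_day minus today in whole days.

```python
from datetime import date

# (id, rid, start_day, remind), copied from the table in the question.
reminders = [
    (1, "ER456GH", date(2014, 1, 1), 10),
    (2, "OUBYV90", date(2014, 1, 13), 10),
    (3, "UI90POL", date(2014, 1, 13), 60),
    (4, "TWEET90", date(2014, 1, 14), 0),
]

today = date(2014, 1, 12)  # assumed "NOW()" for this illustration


def is_due(start_day, remind, today):
    """DATEDIFF(start_day, today) <= LEAST(3, remind), evaluated per row."""
    return (start_day - today).days <= min(3, remind)


due_ids = [rid for rid, _code, start_day, remind in reminders
           if is_due(start_day, remind, today)]
print(due_ids)
```

Under that assumed date, row 4 drops out because LEAST(3, 0) is 0 while the date difference is 2, which matches the remind = 0 entry the asker identified.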

MySQL how to rank objects by similarity of multiple property rows

Hello all and a Happy New Year
SITUATION:
I have some tables in MySQL db:
Scores:
(Unique ID, unique (objectID, metricID))
| ID | ObjectID | MetricID | Score |
|--------+----------+----------+----------|
|0 | 1 | 7 | 0 |
|1 | 5 | 3 | 13 |
|2 | 7 | 2 | 78 |
|3 | 7 | 3 | 22 |
|.....
|--------+----------+----------+----------|
Objects:
(unique ID, unique ObjectName)
| ID | ObjectName |
|--------+------------|
|0 | Ook |
|1 | Oop |
|2 | Oww |
|3 | Oat |
|.....
|--------+------------|
Metrics:
(unique ID, unique MetricName)
| ID | MetricName |
|--------+------------|
|0 | Moo |
|1 | Mar |
|2 | Mee |
|3 | Meep |
|.....
|--------+------------|
For a given object ID:
There will be a number of scores between '0' and 'one per metric'
REQUIREMENT:
For a given ObjectID, I want to return a sorted list based on the following criteria:
Returned rows ranked in order of similarity to the provided object
Returned rows not to include provided object
(this is the hard bit I think) Order of similarity is determined by an object's "score distance" from the provided object based on the numeric offset/difference of its score from the provided object's score for any metric for which there is an entry for both the provided and the currently-examined objects
Contains objectID, Object name, score difference (or something similar)
PROBLEM STATEMENT:
I don't know the correct SQL syntax to use for this, and my experiments so far have failed. I would like to do as much of this work in the DB as possible and have little or none of this work done in nasty for-loops in the code or similar.
ADDITIONAL NON-FUNCTIONALS
At present there are only 200 rows in the Scores table. My calculations show that ultimately there may be up to around 2,000,000 rows, but probably no more.
The Objects table will only ever have up to around 5000 rows
The Metrics table will only ever have up to around 400 rows
Here's an approach to order objects based on their similarity to object 1:
select  other.ObjectID
,       avg(abs(target.Score - other.Score)) as Delta
from    Scores target
join    Scores other
on      other.MetricID = target.MetricID
        and other.ObjectID <> target.ObjectID
where   target.ObjectID = 1
group by
        other.ObjectID
order by
        Delta
Similarity is defined as the average difference in common metrics. Objects that do not share at least one metric with object 1 are not listed. If this answer makes wrong assumptions, feel free to clarify your question :)
Live example at SQL Fiddle.
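For a quick local check, the same self-join can be run against an in-memory SQLite database from Python. The first four Scores rows are the ones shown in the question; the rows with IDs 4 to 6 are invented so that object 1 shares metrics with objects 5 and 7, and so that object 9 shares none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Scores (
        ID INTEGER PRIMARY KEY,
        ObjectID INTEGER,
        MetricID INTEGER,
        Score INTEGER,
        UNIQUE (ObjectID, MetricID)
    )
""")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [
        (0, 1, 7, 0),
        (1, 5, 3, 13),
        (2, 7, 2, 78),
        (3, 7, 3, 22),
        (4, 1, 3, 10),  # hypothetical: object 1 on metric 3
        (5, 1, 2, 70),  # hypothetical: object 1 on metric 2
        (6, 9, 4, 50),  # hypothetical: object 9 shares no metric with object 1
    ],
)

# Same query as above, with the target object passed as a parameter.
ranking = conn.execute("""
    SELECT other.ObjectID,
           AVG(ABS(target.Score - other.Score)) AS Delta
    FROM Scores target
    JOIN Scores other
      ON other.MetricID = target.MetricID
     AND other.ObjectID <> target.ObjectID
    WHERE target.ObjectID = ?
    GROUP BY other.ObjectID
    ORDER BY Delta
""", (1,)).fetchall()
print(ranking)
```

Object 5 ranks first (one shared metric, difference 3), object 7 second (two shared metrics, differences 8 and 12, average 10), and object 9 is absent because the inner join finds no common metric. Joining the result to the Objects table would add the object names.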