Improving the performance of a COUNT operation in MDX - sql-server-2008

I have a fairly complex OLAP database, which basically amounts to a "header" record, and a huge number of "Members" which belong to that header.
My current MDX query gets the aggregated sum "Value" of members sliced by Age band and type of member. Here is the query:
SELECT
(
[Measures].[Member Value]
)
ON COLUMNS,
NON EMPTY
(
{
[Header].[Client Hierarchy].[Group Name].&[The Company]&[UK]&[ABC Group],
[Header].[Client Hierarchy].[Group Name].&[The Company]&[UK]&[DEF Group]
},
[Header].[Member Type].[Member Type],
[Member].[Age Band].[Age Band]
)
ON ROWS
FROM [Cube]
WHERE
(
[Header].[Another Attribute].&[Something],
[Header].[Created Date].&[2010-12-31T00:00:00],
[Member].[A Boolean Attribute].&[False]
)
I am trying to add another measure to this query to get the number of members aggregated in each row of the resultset. I achieved this using this calculated member:
WITH MEMBER [Measures].[Member Count] AS
COUNT(
EXISTING ([Member].[Id].[Id],[Measures].[Member Value])
,EXCLUDEEMPTY
)
And of course added it into the COLUMNS
SELECT
(
[Measures].[Member Value],
[Measures].[Member Count]
)
...
However, this changes the query from taking ~1 second originally to ~1 minute 14 seconds.
I'm thinking this is more to do with my cube structure than the query itself. Does anyone have any hints as to what I need to change in my cube structure, or possibly a more efficient way of querying the same thing? I have seen some examples online of using SUM rather than COUNT, but they were more to do with COUNT and FILTER together.

One possible solution is to add a new measure to your cube that retrieves this information.
Change your fact table by adding a new column holding the [Member].[Id].[Id] key, if it's not already there, and create a 'distinct count' measure on this column -> [Member Count]. This measure then returns the information you're looking for. Note that your fact rows cannot have null values in the column behind [Measures].[Member Value].
Another option is to use the SCOPE functionality of SSAS. I'm not sure this will improve performance, but it's likely.

For the record I determined that I already had a Measure that would sum up my records.
The member table had a measure which represented "The number of members this record represents" This had a 1 in it for all the records I was looking at (single members). Simply adding this measure summed that value across my slice.
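The idea in the answer above, summing a column that holds 1 per member instead of counting members at query time, can be sketched outside SSAS. This is only an illustration in Python with SQLite, not MDX: the table `member_fact` and the column `members_represented` are made-up names standing in for the existing measure.

```python
import sqlite3

def member_counts(rows):
    """Return {(age_band, member_type): (total_value, member_count)}.

    Each row is (age_band, member_type, member_value, members_represented).
    members_represented is 1 for a single member, so summing it counts members.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE member_fact ("
        " age_band TEXT, member_type TEXT,"
        " member_value REAL, members_represented INTEGER)"
    )
    con.executemany("INSERT INTO member_fact VALUES (?, ?, ?, ?)", rows)
    cur = con.execute(
        "SELECT age_band, member_type,"
        "       SUM(member_value), SUM(members_represented)"
        " FROM member_fact"
        " GROUP BY age_band, member_type"
    )
    result = {(band, kind): (value, count) for band, kind, value, count in cur}
    con.close()
    return result

# Hypothetical sample data: four single members across three slices.
sample_rows = [
    ("20-29", "Active", 100.0, 1),
    ("20-29", "Active", 150.0, 1),
    ("30-39", "Active", 200.0, 1),
    ("30-39", "Deferred", 50.0, 1),
]
counts = member_counts(sample_rows)
```

Because the count is an ordinary additive sum, it aggregates the same way as the value measure, with no per-member iteration at query time.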


How to get reliable results from first() and last()?

EDIT: See edit below for explanation of why min() and max() are NOT adequate.
=========================
The MS documentation on the functions first() and last() says “Because records are usually returned in no particular order (unless the query includes an ORDER BY clause), the records returned by these functions will be arbitrary.”
Obviously, that makes these functions pretty useless for their intended purpose unless the query includes an ORDER BY. But including that in the query is not a straightforward thing to do because these are "aggregate" functions, so a query that SELECTs on them cannot ORDER BY any other field that is not also submitted to an aggregate function.
I have found that a query based on a single table generally returns results in the order of that table’s primary key. But apparently, that cannot be relied on to always be true and may fail under certain circumstances. There's an excellent discussion of this issue in an article, DFirst/DLast and the Myth of the Sorted Result Set.
That article offers two solutions to this problem:
Option one; you first use the DMin/DMax-Function to retrieve the value from the “sortable” column ... and use this as an additional criterion to your query to retrieve the target record.
Second option; you first create a query just containing the primary key and the max value of the sortable column (e.g. CustomerId and maximum of order date). Then you create a second query on the orders and join the first query in there on these two fields. The results will be all column from the orders table but only for the most recent order of each customer.
Those instructions are pretty complicated, so I'd need to see an example of them implemented in code in order to trust myself to use them myself.
This issue has got to be very common because a lot of businesses need to know the first or last order by a customer that meets some condition. But when I Google "Access query first last "order by"", there are several results that explain the problem, including on StackOverflow, but none that lay out a solution with sample SQL code.
What is the right way to do this, including sample code of doing it?
=========================
Edit:
Many sources online, as well as the comment below by Gustav and the proposed answer by Albert D. Kallal, say you can just use min() and max() instead of first() and last(). Obviously, that's okay if what you want is the value of a field in the record in which that field has the smallest or largest value. That's a trivial problem. What I'm talking about is how to get the value of a field in the record in which some other field has the smallest or largest value.
For example, in the answer by Albert D. Kallal, he wants the first and last tour for each customer, so he can just use min() and max() on the dates of the tours. But what if I want to know the location of the first tour for each customer? Obviously, I can't use min(location). If first() would work in a sensible way and if table [Tours] has the primary key [Date], I should be able to use something like:
(SELECT first(location) from [Tours] where [Customer] = ID_Customer)
I am using code like that and it usually gives me the right answer, but not always. So that is what I need to fix. I understand that I may need to use min() instead of first(). But how do I use min() for this since, as I said, I obviously can't just use min(location)?
I never really grasped what first() and last() do in Access.
As you note, it's rather common to want, say, the last invoice or whatever.
So, say we have a table of Tours. I want the first tour date, and the last tour date.
Well, this query works:
SELECT MAX(FromDate) as LastTourDate, min(FromDate) as FirstTourDate
FROM tblTours
WHERE FromDate is not null
When I run above, I get this:
So, that gets you the min, and max - and gets you this in one query.
No real need for an ORDER BY.
However, often there is more than one table involved.
So, I might want more than JUST the first and last tour date.
I probably want a list of customers, with the first tour they took and, say, their last tour. But, then again, that's a different question.
But, you again can order your main table ANY way you want, and still pluck out
(pull the min and max).
So, you can do it this way:
Say, tblMainClient (people, customers, whatever).
Say, tblMyTours - a list of tours they took (child table).
So, the query can look like this:
SELECT tblMainClient.FirstName, tblMainClient.LastName,
(SELECT Min(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS FirstTourDate,
(SELECT MAX(FromDate) FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id)
AS LastTourDate
FROM tblMainClient
So, the main query is still on tblMainClient: I can order, filter, and sort by any column in that main table, but we used two sub-queries to get the first tour date and the last tour date. So, it will look, say, like this:
So, typically, we can use a sub-query to pull the max (or min) value, but restrict the sub-query to the one row from our parent/main table.
Edit: Get the last record, but SOME OTHER column
OK, so say in our simple example we want the last tour, but NOT the date; we want some other column, say the last tour name.
So we just modify the sub-query to return ONLY the last record, but a different column.
And since dates can repeat (say, 2 invoices on the same day, or yearly tours that have the SAME name), we need to ensure that ONLY one record is returned. We do this by using TOP 1, but ALSO adding an ORDER BY to be 100%, 200%, 300% sure that ONLY ONE top record is returned.
So, what does our query to get the last tour name, based on the most recent tour date, look like?
We can do this:
SELECT FirstName, LastName,
(SELECT TOP 1 TourName FROM tblMyTours
WHERE tblMyTours.main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
AS LastTour
FROM tblMainClient
And that will give us the tour name of the last tour.
So, you are certainly not limited to using max() in that sub-query.
However, what happens if we want the Tour Name, Hotel Name, and City of that tour?
In other words, it is certainly reasonable that we may well want multiple columns.
There are more ways to do this than flavors of ice cream.
However, I like using the query builder for the first part.
What I do is use the standard query builder, do a join to the table, and simply select all the columns I need.
So, for the above tblMainClient and their tours from tblMyTours,
I build a join using the query builder, like this:
So, note how I added the columns TourName, FromDate, HotelName and City from that child table (tblMyTours).
Now, of course the above will return 10 rows for anyone who has gone on 10 trips.
So, what we do is add a WHERE clause on the child table that gets the LAST pk "id" from tblMyTours and restricts that child table to the ONE row.
So, the above query builder gives us this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName, tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON
tblMainClient.ID = tblMyTours.Main_id;
(but, I did not have to write the above).
So, we add a WHERE clause to that child table join, matching on the CHILD table "id" (in place of TourName or FromDate).
So the above becomes this:
SELECT tblMainClient.ID, tblMainClient.FirstName, tblMainClient.LastName,
tblMyTours.TourName, tblMyTours.FromDate, tblMyTours.HotelName,
tblMyTours.City
FROM tblMainClient
INNER JOIN tblMyTours ON tblMainClient.ID = tblMyTours.Main_id
WHERE tblMyTours.ID =
(SELECT TOP 1 ID FROM tblMyTours
WHERE tblMyTours.Main_id = tblMainClient.id
ORDER BY tblMyTours.FromDate DESC, tblMyTours.ID DESC)
Now, the above is a bit advanced, but OFTEN we want SEVERAL columns. At least the first part of the query (the two tables and the join) was done using the query builder, so I did not have to type that part in.
So, if you want JUST one column, different from the max() criteria, then use TOP 1 with an ORDER BY. Do keep in mind that ONLY ONE RECORD can EVER be returned by that sub-query; if more than one record is returned, the query engine will fail and you get a message to this effect.
So, for a product bought, by invoice date? They could buy the one product 2 times, or 2 invoices on the same day might occur. So, by introducing the 2nd ORDER BY column (ID DESC), that TOP 1 will ONLY ever return one row.
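The tie-break can be checked with a small runnable sketch. It uses Python with SQLite rather than Access, so `TOP 1` becomes `LIMIT 1`; the table and column names follow the answer, while the sample rows are invented for illustration. Client 1 has two tours on the same date, which is the case where ordering by date alone would be ambiguous.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblMainClient (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE tblMyTours (ID INTEGER PRIMARY KEY, main_id INTEGER,
                         TourName TEXT, FromDate TEXT);
INSERT INTO tblMainClient VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
-- Invented sample data: tours 2 and 3 share the same FromDate.
INSERT INTO tblMyTours VALUES
    (1, 1, 'Rome',   '2020-05-01'),
    (2, 1, 'Paris',  '2021-06-01'),
    (3, 1, 'Berlin', '2021-06-01'),
    (4, 2, 'Oslo',   '2019-01-15');
""")

# Correlated sub-query: latest tour per client, with ID DESC as the tie-break
# so that exactly one row is picked even when dates repeat.
last_tours = con.execute("""
SELECT c.FirstName,
       (SELECT t.TourName
          FROM tblMyTours AS t
         WHERE t.main_id = c.ID
         ORDER BY t.FromDate DESC, t.ID DESC
         LIMIT 1) AS LastTour
  FROM tblMainClient AS c
 ORDER BY c.ID
""").fetchall()
```

Here `last_tours` pairs Ann with Berlin (the higher ID among the two tours on 2021-06-01) and Bob with Oslo. The dates are stored as ISO-format text, which sorts chronologically; that is an assumption of this sketch, not something Access requires.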
So, which of the above two?
Well, if you want just one column from the child table: easy. But if you want multiple columns? Then you could probably write up a "messy" solution, but I prefer to just fire up the query builder, join in the child table, and click on the several child values I want. Get the query working, and hey, it's all 100% GUI up to this point.
Then we toss in the EXTRA criteria to restrict that child table to the ONE last row, be it simply the last one based on ID DESC, or on, say, the tour date, or whatever.
And now we get this:

Do we have a workaround to use alias with 'where' in sql

Sales :
Q1) Return the name of the agent who had the highest increase in sales compared to the previous year
A) Initially I wrote the following query
Select name, (sales_2018-sales_2017) as increase
from sales
where increase= (select max(sales_2018-sales_2017)
from sales)
I got an error saying I cannot use increase with the keyword where because "increase" is not a column but an alias
So I changed the query to the following :
Select name, (sales_2018-sales_2017) as increase
from sales
where (sales_2018-sales_2017)= (select max(sales_2018-sales_2017)
from sales)
This query did work, but I feel there should be a better way to write this query, i.e. instead of writing where (sales_2018-sales_2017)= (select max(sales_2018-sales_2017) from sales). So I was wondering if there is a workaround for using an alias with WHERE.
Q2) Suppose the table is as follows, and we are asked to return the id and name of employees who got rating A for 3 consecutive years:
I wrote the following query, and it works:
select id,name
from ratings
where rating_2017='A' and rating_2018='A' and rating_2019='A'
Chaining 3 columns (rating_2017, rating_2018, rating_2019) with AND is easy. I want to know if there is a better way to chain columns with AND when, say, we want to find an employee who has rating 'A' for 10 consecutive years.
Q3) Last but not the least, I'm really interested in learning to write intermediate-complex SQL queries and take my sql skills to next level. Is there a website out there that can help me in this regard ?
1) You are referencing an expression as if it were a table column, so you need to define the expression first (using either an inline view or a CTE for increase). After that you can refer to it in the query.
Eg:
select *
from ( select name, (sales_2018-sales_2017) as increase
from sales
)x
where x.increase= (select max(sales_2018-sales_2017)
from sales)
Another option would be to use analytical (window) functions to get your desired results, if you are on MySQL 8.0:
select *
from ( select name
,(sales_2018-sales_2017) as increase
,max(sales_2018-sales_2017) over(partition by (select null)) as max_increase
from sales
)x
where x.increase=x.max_increase
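The answer mentions a CTE as an alternative to the inline view but only shows the inline view. A minimal sketch of the CTE form follows, run through Python's sqlite3 module so it can be tried without a MySQL server. The CTE name `diffs` and the sample rows are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (name TEXT, sales_2017 INTEGER, sales_2018 INTEGER);
-- Invented sample data.
INSERT INTO sales VALUES ('Ann', 100, 150), ('Bob', 200, 280), ('Cy', 300, 310);
""")

# The expression is written once, inside the CTE; the outer query and the
# sub-query can then both refer to it by its alias.
top_agents = con.execute("""
WITH diffs AS (
    SELECT name, sales_2018 - sales_2017 AS increase
      FROM sales
)
SELECT name, increase
  FROM diffs
 WHERE increase = (SELECT MAX(increase) FROM diffs)
""").fetchall()
```

With this data `top_agents` contains only Bob, whose increase is 80. If two agents tied for the largest increase, both rows would be returned.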
Q2) There are alternative ways to write this. But the basic issue is with the table design, where you are storing each rating year as a new column. Had each year been a row, it would have been easier.
Here is another way
select id,name
from ratings
where length(concat(rating_2017,rating_2018,rating_2019))-
length(replace(concat(rating_2017,rating_2018,rating_2019),'A',''))=3
Q3) Check out some example problems from hackerrank or https://msbiskills.com/tsql-puzzles-asked-in-interview-over-the-years/. You can also search the questions and answers on stackoverflow to get solutions to tough problems people have faced.
Q1 : you can simply order and limit the query results (hence no subquery is necessary) ; also, column aliases are allowed in the ORDER BY clause
SELECT
name,
sales_2018-sales_2017 as increase
FROM sales
ORDER BY increase DESC
LIMIT 1
Q2 : your query is fine ; other options exist, but they will not make it faster or easier to maintain.
Finally, please note that your best option overall would be to modify your database layout : you want to have yearly data in rows, not in columns ; there should be only one column to store the year instead of several. That would make your queries simpler to write and to maintain (and you wouldn’t need to create a new column every new year...)
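The row-per-year layout suggested above can be sketched as follows, again using Python's sqlite3 module. The tables `employee` and `rating`, their columns, and the sample rows are hypothetical, and the query assumes at most one rating per employee per year.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rating (emp_id INTEGER, year INTEGER, rating TEXT);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
-- One row per employee per year instead of one column per year.
INSERT INTO rating VALUES
    (1, 2017, 'A'), (1, 2018, 'A'), (1, 2019, 'A'),
    (2, 2017, 'A'), (2, 2018, 'B'), (2, 2019, 'A');
""")

def all_a_between(con, first_year, last_year):
    """Employees rated 'A' in every year from first_year to last_year."""
    years = last_year - first_year + 1
    return con.execute("""
        SELECT e.id, e.name
          FROM employee AS e
          JOIN rating AS r ON r.emp_id = e.id
         WHERE r.year BETWEEN ? AND ?
           AND r.rating = 'A'
         GROUP BY e.id, e.name
        HAVING COUNT(*) = ?
         ORDER BY e.id
    """, (first_year, last_year, years)).fetchall()

three_years = all_a_between(con, 2017, 2019)
one_year = all_a_between(con, 2017, 2017)
```

Extending the check from 3 to 10 consecutive years only changes the two year parameters; no new columns or extra AND conditions are needed.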

Alteryx to select top N records where N=a value on that group

I'm in a fix with Alteryx. I'm trying to select the top N rows where N=a cell value for that partition. The business question is:
"We need to know, out of our orders (TicketIDs), those that have at
least 1 combination of Type of discount item AND drink AND side."
The SQL query would join this table onto itself and partition to get the TopNtoIncludeInItems for that row; however, I just can't seem to find a way to do this in Alteryx. I've tried the community, but the question has gone unanswered.
In other words, select thusly:
<pseudocode>
for each (TicketID)
for each(Type)
select top(TopNtoIncludeInItems for this.TicketID)
next
next
</pseudocode>
or indeed select just the green records
Here's my solution:
MultiRow Formula: create new field ComboCount (or whatever) as Int32, 0 or empty for rows that don't exist, Group By TicketID and Type, with the Expression [Row-1:ComboCount]+1 ... this counts up each group; we'll want the first TopN of each group, ensuring the group actually has that many, and not going beyond TopN.
Filter on [ComboCount] <= [TopN] ... which excludes unnecessary rows beyond TopN
Summarize: group by TicketID and Type, doing Max(ComboCount) ... if this value is less than TopN for any group, the group should be excluded.
Join the summary back to the earlier pre-summary data on TicketID and Type
Filter on [Max_ComboCount] = [TopN] ... this excludes the groups where any ItemType falls short of TopN
And that's it. Pictorially, this is what my workflow looks like, along with data results based on data similar to that in your screenshot:
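For readers without Alteryx at hand, the five steps above can be sketched in plain Python. This is a literal transcription of the steps as written, not the Alteryx workflow itself: the field names TicketID, Type and TopN come from the thread, while the `Item` field and the sample rows are invented.

```python
from collections import defaultdict

def select_top_n(rows):
    """Apply the five workflow steps to a list of row dicts, in input order."""
    # Step 1 (Multi-Row Formula): running count within each (TicketID, Type).
    running = defaultdict(int)
    counted = []
    for row in rows:
        key = (row["TicketID"], row["Type"])
        running[key] += 1
        counted.append({**row, "ComboCount": running[key]})

    # Step 2 (Filter): drop rows beyond TopN.
    kept = [r for r in counted if r["ComboCount"] <= r["TopN"]]

    # Step 3 (Summarize): Max(ComboCount) per (TicketID, Type).
    max_count = defaultdict(int)
    for r in kept:
        key = (r["TicketID"], r["Type"])
        max_count[key] = max(max_count[key], r["ComboCount"])

    # Steps 4 and 5 (Join + Filter): keep only groups that reached TopN.
    return [r for r in kept
            if max_count[(r["TicketID"], r["Type"])] == r["TopN"]]

# Invented sample: ticket 1 needs 2 of each type but has only 1 side.
sample = [
    {"TicketID": 1, "Type": "Discount", "Item": "D1", "TopN": 2},
    {"TicketID": 1, "Type": "Discount", "Item": "D2", "TopN": 2},
    {"TicketID": 1, "Type": "Discount", "Item": "D3", "TopN": 2},
    {"TicketID": 1, "Type": "Drink", "Item": "K1", "TopN": 2},
    {"TicketID": 1, "Type": "Drink", "Item": "K2", "TopN": 2},
    {"TicketID": 1, "Type": "Side", "Item": "S1", "TopN": 2},
]
selected_items = [r["Item"] for r in select_top_n(sample)]
```

With this sample the third discount item is dropped by step 2, and the Side group is dropped by step 5 because it never reaches TopN.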

mysql group_concat alternative or multiple rows as columns

Before I start my question, I'll briefly cover what the problem is:
I have a table that stores around 4 million 'parameter' values. These values have an id, simulation id and parameter id.
The parameter id maps to a parameter table that basically just maps the id to a text-like representation of the parameter (x, y, etc.).
The simulation table has around 170k entries that map parameter values to a job.
There is also a score table which stores the score of each simulation. Simulations have a varying number of scores: for example, one might have one score and another might have three. The score table has a simulation_id column for selecting this.
Each job has an id and an objective.
Currently I'm trying to select all the parameter_values whose parameter is 'x' and where the job id is 17, and fetch the score for each. The variables of the select will change, but in principle it's only really these things I'm interested in.
Currently im using this statement:
SELECT simulation.id, value, name,
       (SELECT GROUP_CONCAT(score) FROM score
        WHERE score.simulation_id = simulation.id) AS score
FROM simulation, parameter_value, parameter
WHERE simulation.id = parameter_value.simulation_id
  AND simulation.job_id = 17
  AND parameter_value.parameter_id = parameter.id
  AND parameter.name = "$x1"
This works nicely except it's taking around 3 seconds to execute. Can this be done any faster?
I don't know if it would be faster to run a query before this, pre-calculating the parameter_ids I'm searching for, and then doing a WHERE parameter_id IN (1,2,3,4) etc.
But I was under the impression SQL would optimize this anyway?
I have created indexes wherever possible but can't get faster than the 2.7 second mark.
So my question would be:
Should I pre-calculate some values and avoid the joins?
Is there an alternative to group_concat to get the scores?
And are there any other optimizations I could make to this?
I should also add that the scores must be in the same row, or at least returned sorted, so I can easily read them from the result set.
Thanks,
Lewis
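For comparison, one formulation that could be tried against the correlated sub-query is an explicit join with GROUP BY. The sketch below runs it in SQLite through Python, with a tiny invented data set shaped like the tables described in the question. It only shows that the query returns the same kind of result; whether it is any faster on MySQL at this data size is untested here and would need checking with EXPLAIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE simulation (id INTEGER PRIMARY KEY, job_id INTEGER);
CREATE TABLE parameter (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE parameter_value (
    id INTEGER PRIMARY KEY, simulation_id INTEGER,
    parameter_id INTEGER, value REAL);
CREATE TABLE score (simulation_id INTEGER, score INTEGER);
-- Invented sample data: simulations 1 and 2 belong to job 17.
INSERT INTO simulation VALUES (1, 17), (2, 17), (3, 5);
INSERT INTO parameter VALUES (1, '$x1'), (2, '$y1');
INSERT INTO parameter_value VALUES
    (1, 1, 1, 0.5), (2, 1, 2, 9.0), (3, 2, 1, 0.7), (4, 3, 1, 0.1);
INSERT INTO score VALUES (1, 10), (1, 20), (1, 30), (2, 5);
""")

# Explicit joins; the scores are concatenated per parameter_value row.
# LEFT JOIN keeps simulations that have no score yet.
rows = con.execute("""
SELECT s.id, pv.value, p.name, GROUP_CONCAT(sc.score) AS score
  FROM simulation AS s
  JOIN parameter_value AS pv ON pv.simulation_id = s.id
  JOIN parameter AS p ON p.id = pv.parameter_id
  LEFT JOIN score AS sc ON sc.simulation_id = s.id
 WHERE s.job_id = ? AND p.name = ?
 GROUP BY s.id, pv.id, pv.value, p.name
 ORDER BY s.id
""", (17, "$x1")).fetchall()
```

Each result row carries the simulation id, the parameter value, the parameter name and a comma-separated score string. The order of scores inside that string is not guaranteed in this sketch; MySQL's GROUP_CONCAT accepts an ORDER BY inside the call if a sorted list is needed.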

Need a SQL Server 2008 case statement to evaluate a table and return two values

I'm a novice SQL programmer and have been banging my head against this all morning, so please bear with me. My situation is this: I have a table of SKUs that need to be sent to our eCommerce website. Each of these SKUs has a 'quantity', an 'active' value, and a 'discontinued' value. This was easy enough to handle when we were dealing with one SKU at a time, but now I have to send kits, which contain one or more SKUs.
For example, if my Kit's ID is 000920_001449_001718_999999 (a combination of four SKUs) I need to collect data for the entire set of SKUs like so:
Here's the logic I need to incorporate:
If any of the SKUs have null or WEBNO as an IsActive value, the entire kit must return WEBNO. Otherwise, return WEBYES.
If any of the SKUs have null or '1' as an IsDiscontinued value, the entire kit must return IsDiscontinued = '1'. Otherwise, return a 0.
My code is a bit of a mess, but here's what I've managed so far:
SELECT
CASE WHEN 'WEBNO' in
(
SELECT IsActive
FROM #SkusToSend as Sending
RIGHT JOIN
(
SELECT * FROM [eCommerce].[dbo].[Split] (
'000920_001449_001718_999999'
,'_')
) as SplitSkus
on Sending.SKU = SplitSkus.items
) THEN 'WEBNO'
ELSE 'WEBYES'
END
My question is this: Is it possible to write a statement that parses through my example table, returning only one row of 'IsActive' and 'IsDiscontinued'? I've tried using GROUP BY and HAVING statements on those fields, but always get multiple rows returned.
The code I have handles the WEBNO value, but not NULL, and doesn't even start to take into consideration the IsDiscontinued field yet. Is there a concise way to parse this together, or a better way to handle this type of problem?
I think a combination of ISNULL and MIN / MAX should do the trick:
SELECT
MIN(ISNULL(sending.IsActive, 'WEBNO')) AS IsActive,
MAX(ISNULL(sending.IsDiscontinued, 1)) AS IsDiscontinued
FROM
(
SELECT * FROM [eCommerce].[dbo].[Split] (
'000920_001449_001718_999999'
,'_')
) AS SplitSkus
LEFT JOIN #SkusToSend AS Sending
ON Sending.SKU = SplitSkus.items
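The MIN/MAX idea can be tried end to end with a small sketch in Python and SQLite. Several things here are stand-ins rather than the original setup: `COALESCE` replaces T-SQL's `ISNULL`, a plain table `kit_items` filled from Python replaces the `[dbo].[Split]` function, `SkusToSend` replaces the temp table, and the sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SkusToSend (SKU TEXT, IsActive TEXT, IsDiscontinued INTEGER);
CREATE TABLE kit_items (items TEXT);
-- Invented sample data: 001718 has a NULL IsActive, 999999 is missing.
INSERT INTO SkusToSend VALUES
    ('000920', 'WEBYES', 0),
    ('001449', 'WEBYES', 0),
    ('001718', NULL,     0);
""")

def kit_status(con, kit_id):
    """Return one (IsActive, IsDiscontinued) row for the whole kit."""
    con.execute("DELETE FROM kit_items")
    con.executemany("INSERT INTO kit_items VALUES (?)",
                    [(sku,) for sku in kit_id.split("_")])
    # 'WEBNO' sorts before 'WEBYES', so MIN yields WEBNO if any SKU is
    # inactive, NULL, or missing; MAX yields 1 if any SKU is discontinued,
    # NULL, or missing.
    return con.execute("""
        SELECT MIN(COALESCE(s.IsActive, 'WEBNO')),
               MAX(COALESCE(s.IsDiscontinued, 1))
          FROM kit_items AS k
          LEFT JOIN SkusToSend AS s ON s.SKU = k.items
    """).fetchone()

full_kit = kit_status(con, "000920_001449_001718_999999")
good_kit = kit_status(con, "000920_001449")
```

Because there is no GROUP BY, the aggregate query always collapses the kit to a single row, which is what the question asked for.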
I think this would be easier if you had a working example of some sample data in those tables. My guess is that you have a table function splitting a string apart and returning multiple rows, and a temp table that right joins to it, so the query returns every row the function produces even where the temp table has no match (nulls). This can return multiple rows: whenever you expect a single entity on a left or right join and a null sometimes occurs, you get multiples, and likewise if a value is repeated. You would have to ensure that you get only one result, I believe, from your
Case when 'WEBNO' in
(
While the logic may be correct to return the 'WEBNO' answer, it may be repeating the row result multiple times, as the engine may interpret 'this happened' once, twice, three times. You could potentially alleviate this by doing a
'Select Distinct IsActive'
which will make the expression return only a single distinct result for that column.
Again this would be easier if we could see examples of what data those objects contained but this would be my guess.