Stata related -selecting specific rows - extract

I am currently working with a dataset that has information on individuals i = 1,...,N by time t = 1,...,T. I basically have a panel structure in my dataset. However, I want to select only one row of data from each individual. Specifically, I want to select only the last time period t=T for each individual i=1,...,N. How can I 'extract' this specific information from the bigger dataset?

In Stata [not STATA] rows are more properly called observations. You can "select" the last observation in each panel with the generic
bysort id (time) : ... if _n == _N
as under the aegis of by:
the built-in variable _n identifies observations in each panel
its sibling _N is the number of observations in each panel and therefore identifies the last observation in each panel.
This is well documented: e.g. see the help and manual entries explaining the by: prefix.
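For readers who want to see the selection logic spelled out, here is a minimal sketch in Python of the same idea: within each panel, keep the row with the largest time. The sample rows and the function name `last_per_panel` are made up for illustration; in Stata itself the `by:` prefix does this work.

```python
# Hypothetical panel data: (id, time, value) rows in arbitrary order.
rows = [
    (1, 1, 10.0), (1, 3, 12.5), (1, 2, 11.0),
    (2, 2, 7.0), (2, 1, 6.5),
]

def last_per_panel(rows):
    """Return, for each id, the row with the largest time, ordered by id."""
    last = {}
    for row in rows:
        pid, time = row[0], row[1]
        # Keep this row if it is the first seen for pid or is later in time.
        if pid not in last or time > last[pid][1]:
            last[pid] = row
    return [last[pid] for pid in sorted(last)]

print(last_per_panel(rows))
```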

Tableau's functions - how to find an equivalent to IF EXISTS

I'm creating a Tableau Dashboard with 'buttons' which are coloured red or green based on certain criteria and what is selected in the filters. The filters are just a way to select different offices in different regions and when selecting an office the buttons should change colour depending on whether the targets for the different metrics have been hit for that office or not.
The navigation buttons on Tableau won't accommodate this so I've made a work around. For each 'button' I've created a worksheet with just the text of the metric name on the Label mark and a calculated field on the colour mark. I've then added the worksheet to the Dashboard and added an action to go to the corresponding metric dashboard when the 'button' is clicked on.
The issue I'm having is the conditional colouring of one of these metrics. This metric is based on stock levels. For each office there are multiple categories of stock types, each with a corresponding target, with multiple 'bins' in each category. I want the button to turn red if ANY of the combined total of stock in the bins for one category is over the target for that category for that office.
To put the logic in pseudocode:
For the currently filtered data: IF EXISTS(FOR EACH OFFICE( FOR EACH CATEGORY: [SUM(BinValue)< CategoryTarget])) THEN 'Green' ELSE 'Red'
I've tried to translate that logic into Tableau's functions in a calculated field and have the following:
SUM(INT({INCLUDE [Category]:Min([CategoryTarget])} > {INCLUDE [Category]:SUM(BinValue)}))
The colouring is correct when I add the Office Name and Category pills to the worksheet to test my logic; however, when I remove the pills the colouring isn't correct. Something seems to go wrong when I try to sum the number of categories that are within target levels over all offices and targets.
I've tried so many iterations of the following functions and have been going around in circles for days now:
INCLUDE, EXCLUDE, FIXED, IF, SUM, INT
If anyone knows how to do this properly or even just a different way of being able to conditionally colour buttons on a dashboard I would be incredibly grateful.
The structure of my data is as follows with some dummy data as an example:
Region  SubRegion  Office      Category  Bin   BinValue  CategoryTarget
North   NorthWest  Manchester  Toys      B123  30        50
North   NorthWest  Manchester  Toys      B456  40        50
So for a Stock Level metric selecting any of ALL/North/NorthWest/Manchester filter options should flag as red due to the total of the bins in one category in an office being higher than the target amount for that category for that office.
I've updated my calculated field however I'm still having issues with the grouping showing as true/false correctly.
This is what it is now:
MAX( {INCLUDE Category, Office:Sum(BinValue)} > {INCLUDE Category, Office:MIN(CategoryTarget)} )
With True showing as Red and False Green (we want to be below target hence the green).
When working on the example to showcase the issue I managed to get it working.
I ended up using the following logic:
max({EXCLUDE [Bin]:SUM([Bin Value])} > [Category Target])
This meant that even if most of the Offices in the filter were within their stock level targets, if there was one with stock levels over target the 'button' showed as red.
I published the example I've used anyway in case it helps others in the future.
Link to the Tableau Public dashboard:
https://public.tableau.com/views/ConditionalColouring/Dashboard1?:language=en-GB&:useGuest=true&:display_count=y&:origin=viz_share_link
Thank you very much for the help!
To work with logical conditions, such as testing whether a condition holds for any (or every) record in a group of data rows, it helps to understand that Tableau treats the boolean value "True" as greater than the boolean value "False".
Once you get comfortable with that idea, you can use the functions MAX() (or MIN()) to test whether a condition holds for any record (or for every record, respectively). So MAX(False, False, True, False) is True.
So to tell if any records have an actual value below their target, test MAX([Actual Value] < [Target Value])
You can then combine this idea with dimensions on the viz (or LOD calcs if necessary) to group the data records appropriately before testing your conditions. If you work with the same conditions repeatedly, this type of calculation can be very useful for defining sets that get used in multiple places.
One technical caveat: if your condition test ever evaluates to NULL, those null values are ignored by MIN() and MAX(), just as they are by other aggregation functions. So, for example, if you test whether every record satisfies a condition using MIN(), you can get a possibly misleading result when all the non-null values are True: MIN(TRUE, TRUE, NULL, TRUE) = TRUE. If your condition can evaluate to NULL and you don't want to ignore nulls but instead treat them as, say, False, you can use the IFNULL() function to provide a default value for your condition.
As an example, MIN(IFNULL([Actual Value] > [Target Value], FALSE)) returns True only if every record has a value above its target, treating any records with missing values or targets as failing the condition - i.e. not exceeding the target. The choice of whether to have a default value for a condition, and what it should be, are problem dependent of course. If your data does not have null values, you don't have this complication to consider.
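The MAX()/MIN() behaviour described above can be mimicked outside Tableau. The following is only a sketch in Python, with `None` standing in for NULL; the helper names `any_true` and `all_true` are invented for this illustration and are not Tableau functions.

```python
def any_true(values):
    """Mimic MAX() over booleans: nulls (None) are ignored."""
    present = [v for v in values if v is not None]
    return max(present) if present else None

def all_true(values, null_as=None):
    """Mimic MIN() over booleans; null_as plays the role of IFNULL's default."""
    if null_as is not None:
        values = [null_as if v is None else v for v in values]
    present = [v for v in values if v is not None]
    return min(present) if present else None

print(any_true([False, False, True, False]))           # True counts as greater than False
print(all_true([True, True, None, True]))              # the null is ignored
print(all_true([True, True, None, True], null_as=False))  # the null is treated as False
```

The last two calls differ only in how the null is handled, which is the choice the caveat above is about.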
Although you have given very little data, I think this is the calculated field you require:
IF { FIXED [Region], [Sub-Region], [Office], [Category] : SUM([Bin Value])}
> {FIXED [Region], [Sub-Region], [Office], [Category] : MIN([Category Target])} THEN 'RED' ELSE 'GREEN' END
This is based on the assumption that, for every region/sub-region/office/category group, the target value is the same in each row within the group. Therefore MAX, AVG, etc. would all work in place of the MIN used in the calculation.
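To make the grouping explicit, here is a rough Python sketch of the same logic using the two dummy rows from the question: total the bin values per region/sub-region/office/category group, compare each total with that group's target, and colour the button red if any group is over. The function name `button_colour` is hypothetical, and this is an illustration of the logic rather than anything Tableau runs.

```python
from collections import defaultdict

# (region, subregion, office, category, bin, bin_value, category_target)
rows = [
    ("North", "NorthWest", "Manchester", "Toys", "B123", 30, 50),
    ("North", "NorthWest", "Manchester", "Toys", "B456", 40, 50),
]

def button_colour(rows):
    """Return 'Red' if any group's total bin value exceeds its target."""
    totals = defaultdict(int)
    targets = {}
    for region, subregion, office, category, _bin, value, target in rows:
        key = (region, subregion, office, category)
        totals[key] += value
        # MIN of the target within the group (assumed constant per group).
        targets[key] = min(target, targets.get(key, target))
    over = any(totals[key] > targets[key] for key in totals)
    return "Red" if over else "Green"

print(button_colour(rows))
```

With the dummy data the Toys total is 70 against a target of 50, so the sketch reports red, matching the expectation stated in the question.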
I added two rows to your data: [image: sample data with two added rows]
and the result: [image: resulting view]

How to find similar records in ms access database compared to a specific record in the table

I want to find the latest record of each patient and
compare the columns of that record to another record of a specific patient ID
Bring out the similarities
and group the records according to the percentage similarity value
So I want the patients whose records are most similar to that specific patient's to come out on top, with the rest following.
Patient record
In short, sort by a calculated variable. Here is the query and table structure I used:
In my first definition of similarity I weighted each test equally:
CalculatedSimilarity: IIf([TestResults]![Test1]=[TestResult1],1,0)+IIf([TestResults]![Test2]=[TestResult2],1,0)+IIf([TestResults]![Test3]=[TestResult3],1,0)
In my second definition of similarity I doubled the value of the second test and ignored the third:
CalculatedSimilarity2: IIf([TestResults]![Test1]=[TestResult1],1,0)+IIf([TestResults]![Test2]=[TestResult2],2,0)
To display the similarity as a percentage, divide by the total weight of the tests included.
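As a sketch of that scoring outside Access, the following Python mirrors the two weightings above and the division by total weight. The patient labels "A" and "B" and their results are invented sample data, and `similarity` is a hypothetical helper name.

```python
def similarity(reference, candidate, weights):
    """Weighted percentage of tests on which candidate matches reference."""
    total = sum(weights.values())
    matched = sum(
        weight
        for test, weight in weights.items()
        if reference[test] == candidate[test]
    )
    return 100.0 * matched / total

reference = {"Test1": "S", "Test2": "S", "Test3": "S"}
others = {
    "A": {"Test1": "S", "Test2": "R", "Test3": "S"},
    "B": {"Test1": "S", "Test2": "S", "Test3": "R"},
}
equal = {"Test1": 1, "Test2": 1, "Test3": 1}       # first definition
weighted = {"Test1": 1, "Test2": 2, "Test3": 0}    # second definition

# Most similar patients first, as the question asks.
ranked = sorted(
    others,
    key=lambda pid: similarity(reference, others[pid], weighted),
    reverse=True,
)
print(ranked)
```

Under equal weights the two sample patients tie; under the second weighting, "B" ranks first because it matches on the double-weighted test.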
When run against patient 5 who had "S" results on all tests the result is:
If you have many tests, it would be better to construct the query with VBA rather than the designer. This next step gets around a designer bug by linking the query to normally invisible text boxes.
The combobox row source is set to:
SELECT Patients.PatientID, Patients.FirstName, TestResults.Test1, TestResults.Test2, TestResults.test3 FROM Patients INNER JOIN TestResults ON Patients.PatientID = TestResults.PatientID;
Have the combobox set the invisible text boxes:
Private Sub cmbPatient_AfterUpdate()
    ' The Access designer can't see the Column property; work around it by
    ' setting normally invisible text boxes.
    ' Column is zero-based: the row source above supplies columns 0 to 4,
    ' so Column(5) assumes a sixth column has been added to it.
    txttest1 = cmbPatient.Column(2)
    txttest2 = cmbPatient.Column(3)
    txttest3 = cmbPatient.Column(4)
    similarity = cmbPatient.Column(5)
    Me.Requery
End Sub
Change the query parameters to reference the text boxes:
CalculatedSimilarity: IIf([TestResults]![Test1]=[Forms]![Patients]![txttest1],1,0)+IIf([TestResults]![Test2]=[Forms]![Patients]![txttest2],1,0)+IIf([TestResults]![Test3]=[Forms]![Patients]![txttest3],1,0)

Access 2013 Count

I am working on a report in Access 2013. I need to separate the records in a column that contain a value into groups of 20 and assign a name to each group: records 1-20 should be labelled Lot 1, records 21-40 Lot 2, and so on. The report needs to be separated into lots of 20. Alternatively, I can just insert a line after each set of 20 without a name if that makes it easier. I just need something to show a break at sets of 20.
Example: As you can see, the report is separated by welder stencil. When the count in the VT column reaches 20, I need to enter a line or some type of divider to separate the data. Our client is asking us to separate the VT in sets of 20. I don't know the easiest way to accomplish this. I have researched it but haven't found anything.
Example Report with Divisions
Update the report's RecordSource query by adding "Lot" values for each row. There are multiple ways of doing this, but the easiest will be if your records already have a sequential, continuous numerical key. If they do not have such a key, you can research generating such sequential numbers for your query, but it is beyond the scope of this question and no details about the actual data schema were supplied in the question.
Let's imagine that you have such a key column [Seq]. You can use the modulo (Mod) and/or integer division (\, backslash) operators to find the rows that start a new lot of 20, e.g. ([Seq] - 1) Mod 20 = 0.
Generate a lot value for each row. An example SQL snippet: SELECT ("Lot " & ((([Seq] - 1) \ 20) + 1)) As LotNumber ... (the + 1 makes rows 1-20 "Lot 1", as the question asks, rather than "Lot 0").
Utilize Access report sorting and grouping features --grouping on the new Lot field-- to print a line and/or label at the start of each group. You can also have the report start a new page at the beginning or end of such a group.
The details about grouping can be found elsewhere in tutorials and Access documentation and are beyond the scope of this question.
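The lot arithmetic can be checked quickly outside Access. This Python sketch assumes a 1-based sequential key, as in the hypothetical [Seq] column above; `lot_label` and `starts_new_lot` are illustrative names only.

```python
def lot_label(seq, lot_size=20):
    """Label for a 1-based sequence number: 1-20 -> Lot 1, 21-40 -> Lot 2, ..."""
    return f"Lot {(seq - 1) // lot_size + 1}"

def starts_new_lot(seq, lot_size=20):
    """True for the first row of each lot (1, 21, 41, ...)."""
    return (seq - 1) % lot_size == 0

for seq in (1, 20, 21, 40, 41):
    print(seq, lot_label(seq), starts_new_lot(seq))
```

Python's `//` plays the role of the Access backslash operator here, and `%` the role of Mod.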

SAP Web Intelligence - Summary Column Based on Multiple Criteria

I'm new to SAP Web Intelligence and I'm trying to create a report with a summary column based on multiple criteria. Below is my desired output in Excel as an example. I am having trouble coming up with ways to create this summary column (col H)
Link to the example here.
Essentially, I need column H to do the following:
Score = 0
For each cell in Range C:G, if cell isn't empty, get amount of points test is worth based on region the user is in, and add that score to "Score", and show in Column H as a total at the end.
Is this possible in SAP WI? I really really appreciate any help with this (even a push in the right direction).
Thanks!
I believe you're looking for a compounded IF function in a Variable here.
If you're used to using Excel, you will recognise the similar syntax for it:
=IF(TEST;Value if true;Value if false)
You will want to compound that so that you have an IF based on the 'Region', followed by a list of IF statements for each test that add up the points, followed by another list of IF statements for each test based on the other 'Region'.
Something like the following should give you the basic start:
=If([Region]="Oceania";If(IsNull([Test #1]);0;1)+If(IsNull([Test #2]);0;1)+IF(...);If(IsNull([Test #1]);0;2)+If(IsNull([Test #2]);0;0.5)+IF(...))
From there you add the relevant test columns into the sums to get the totals for each row.
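The shape of that nested formula may be easier to see as a lookup. The Python sketch below uses the point values from the formula above (1 and 1 for Oceania, 2 and 0.5 otherwise); the region name "Europe" and the result values are made up, and `score` is a hypothetical helper, not a Web Intelligence function.

```python
# Points per test, taken from the example formula above.
OCEANIA_POINTS = {"Test #1": 1, "Test #2": 1}
OTHER_POINTS = {"Test #1": 2, "Test #2": 0.5}

def score(region, results):
    """Sum the points for every test that has a non-empty (non-None) result."""
    table = OCEANIA_POINTS if region == "Oceania" else OTHER_POINTS
    return sum(
        points
        for test, points in table.items()
        if results.get(test) is not None
    )

print(score("Oceania", {"Test #1": "done", "Test #2": None}))
print(score("Europe", {"Test #1": "done", "Test #2": "done"}))
```

Adding a test then means adding one entry per region table, which corresponds to adding one more If(IsNull(...)) term to each branch of the formula.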

Using running totals in MS access report cumulatively

I am developing a db (MS access 2010) to support a school with a well-defined model for tuition quotation. The list of products is assembled for each quote, then various discounts are applied. The discounts may be a percentage or an absolute dollar amount. So far, so easy. The problem is that their business logic requires:
No limit on number of discounts.
Specific discounts to be applied in a defined sequence (implemented in my case with a "discount ordinal" column, values 1 (first applied) to 100 (last applied).
Each sequential application of a discount is to the running total of the quote. Eg: Total products $1000. Discount: 50%. Value: $500. Subtotal $500.
Subtotal: $500. Discount: $25. Value: $25. Subtotal: $475.
Subtotal: $475. Discount: 10%. Value: $47.50. Subtotal: $427.50.
This appears to be a variation of the "get the value of the field in the previous row" problem, but with the added twist that the "value of the field" is actually a cumulative calculation. It has the flavor of recursion: while discounts remain, subtotal(previous subtotal).
I have no clear idea how to implement this in a report, because the calculation as noted above is self-referential. I'm not looking for code here, just guidance on the general approach to the problem (ie, some kind of global variable, using VBA - in which case, I'm not sure what the "glue" between the query in VBA and the report would be - or some trick with a calculated field although I've spent a lot of time trying to figure one out). Any thoughts?
In that kind of situation, I always create a new table that gets filled when the report opens, and base the report on that table rather than the original one. That way I can do all the calculations I need, even making several passes. The report is then simply a "dump" of the table. Complex totals can be additional columns that are shown only in the totals section.
You could have a table for purchase history that uses an integer to link each purchase, since an autonumber by itself will not link each discount stage.
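So in Excel I would use something like this (a VBA sketch; startingValue and lastDiscount are placeholders for your own values):
' Apply each discount in turn to the running value
v = startingValue
i = 1
Do Until i > lastDiscount      ' > rather than = so the last discount is applied too
    d = ws.Cells(i, 9).Value   ' discount as a fraction, e.g. 0.5 for 50%
    v = v * (1 - d)
    ws.Range("B2").Value = v
    i = i + 1
Loop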
At each stage you could write the discount applied (d) and the value (v) to the table (using DoCmd.RunSQL), but it could be quite slow. You could then order the table by purchase identifier and then by value, descending, since each sequential discount lowers the value and the rows will therefore order correctly.
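To show the running-total logic with both percentage and absolute discounts, here is a minimal Python sketch using the figures from the question ($1000, then 50%, $25, 10%). The tuple layout (ordinal, kind, amount) and the name `apply_discounts` are assumptions made for this illustration; each returned step is the kind of row one could write to the temporary table that the report reads.

```python
from decimal import Decimal

def apply_discounts(total, discounts):
    """Apply discounts in ordinal order; return (ordinal, value, subtotal) steps.

    discounts: iterable of (ordinal, kind, amount), kind is 'percent' or 'amount'.
    """
    subtotal = Decimal(total)
    steps = []
    for ordinal, kind, amount in sorted(discounts, key=lambda d: d[0]):
        amount = Decimal(amount)
        # A percentage applies to the running subtotal, not the original total.
        value = subtotal * amount / 100 if kind == "percent" else amount
        subtotal -= value
        steps.append((ordinal, value, subtotal))
    return steps

steps = apply_discounts("1000", [
    (3, "percent", "10"),
    (1, "percent", "50"),
    (2, "amount", "25"),
])
for ordinal, value, subtotal in steps:
    print(ordinal, value, subtotal)
```

Decimal is used rather than float so that currency amounts such as 47.50 compare exactly.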