Counting duplicate observations in a variable without deleting *SAS* - duplicates

I have a data set of annual patient claims where each patient can be represented more than once because he can have more than one claim per year (ie: a woman could have two claims if she gives birth two times in a year).
I want to count the number of times each patient ID is repeated, but I do not want to eliminate the duplicates or take them out of the data set. Is there a different code to do this?
Thanks!!

data work.claims_data;
input patient_id $ claim_number $;
datalines;
P1 C1
P1 C2
P1 C3
;
run;
proc sql;
select patient_id,count(distinct claim_number) - 1 as cnt
from claims_data
group by patient_id
having cnt > 0;
quit;
Working: SQL procedure above will give patient wise count of distinct claim numbers from the input dataset. If we subtract 1 from each count will give repetition claim count for each patient.
Output:
Patient_ID cnt
P1 2

Related

Query to get the output as shown in designation table

EmpId EmpDeptName
1 Account
2 Sales
3 IT
4 HR
i want output in the format
EmpId Account Sales IT HR
1
2
3
4
If anybody knows how to write query to get this output , please provide me that query.
This type od data transformation is a PIVOT, where you take row data and convert it to columns.
You did not provide many details, but if you wanted the count of employees in each department, you could use something like this:
select Account, Sales, IT, HR
from
(
select EmpDeptName
from employees
) d
pivot
(
count(EmpDeptName)
for EmpDeptName in (Account, Sales, IT, HR)
) p;
This will display the total number of employees in each department as column data instead of rows.

N or more continuous year range

I have to create a report using MySql DB where more than 4 tables are involved. I have one table (S1) with S1_ID and S1_Year_Range (strings like 2001-2002) and another table (S2) with S2_ID(PK), S2_Customer_ID, S1_ID (FK) and other fields for other conditions that can appear in Where clause of my query. There can be more than one row in S2 with the same S2_Customer_ID but different S1_ID. My query is to create a report using VB.net and ask users to enter two values; one number for how many continuous years or bigger (like >= 5 years), and a year range value (like 2011-2012) which is the highest value in the list for all customers.
My report lists customer names (by joining the above query with another table), customer rank and all year range values (highest at the bottom) for that customer in one column for each customer. Any help for this query would be appreciated.
Data and results could be like the following:
S1:
(S1_ID....S1_Year_Range)
(1......2000-2001)
(2......2001-2002)
(3......2002-2003)
(4......2003-2004)
(5......2004-2005)
etc
S2:
(S2_ID.....S2_Customer_ID.....S1_ID)
(1....1....1)
(2....1....2)
(3....1....3)
(4....2....2)
(5....2....3)
(6....2....5)
(7....3....2)
(8....3....3)
(9....3....4)
(10...3....5)
(11...4....3)
(12...4....4)
(13...4....5)
etc
when number 2 and year range (2003-2004) is entered by the user, the result should be the following:
customer 3 with 3 year range values (2003-2004, 2002-2003, and 2001-2002) and customer 4 with 2 year range values (2003-2004 and 2002-2003):
cname3
2001-2002
2002-2003
2003-2004
cname4
2002-2003
2003-2004
I hope you can see the columns of the report correctly.
I finally created a complex query to solve my problem. In the following query, I encoded the user year range value as '2010-2011' and number of continuous years as 14. Also a tiny difference with the question is the table names; table CSP here is the same as table S2 in my question but field names are the same as those in my question.
SELECT CSYWFY.S2_Customer_ID, COUNT(CSYWFY.S2_Customer_ID)
FROM (SELECT S1F.S1_Year_Range, S2.S2_Customer_ID , COUNT(S1F.S1_Year_Range) FROM CSP as S2 INNER JOIN S1 as S1F ON S2.S1_ID = S1F.S1_ID WHERE '2010-2011' IN (SELECT S1N.S1_Year_Range FROM CSP as S2N INNER JOIN S1 as S1N ON S2N.S1_ID = S1N.S1_ID WHERE S2N.S2_Customer_ID = S2.S2_Customer_ID ) GROUP BY S2.S2_Customer_ID ASC , S1F.S1_Year_Range DESC ) CSYWFY
GROUP BY CSYWFY.S2_Customer_ID
HAVING COUNT(CSYWFY.S2_Customer_ID) > 14
HTH

MySQL - The most occuring for the specific day?

I'm stuck on this problem.
Basically I need to find out for each department how to figure out which days had the most sales made in them. The results display the department number and the date of the day and a department number can appear several times in the results if there were several days that have equally made the most sales.
This is what I have so far:
SELECT departmentNo, sDate FROM Department
HAVING MAX(sDate)
ORDER BY departmentNo, sDate;
I tried using the max function to find which dates occurred most. But it only returns one row of values. To clarify more, the dates that has the most sales should appear with the corresponding column called departmentNo. Also, if two dates for department A has equal amount of most sales then department A would appear twice with both dates showing too.
NOTE: only dates with the most sales should appear and the departmentNo.
I've started mySQL for few weeks now but still struggling to grasp the likes of subqueries and store functions. But i'll learn from experiences. Thank you in advance.
UPDATED:
Results I should get:
DepartmentNo Column 1: 1 | Date Column 2: 15/08/2000
DepartmentNo Column 1: 2 | Date Column 2: 01/10/2012
DepartmentNo Column 1: 3 | Date Column 2: 01/06/1999
DepartmentNo Column 1: 4 | Date Column 2: 08/03/2002
DepartmentNo Column 1: nth | Date Column 2: nth date
These are the data:
INSERT INTO Department VALUES ('1','tv','2012-05-20','13:20:01','19:40:23','2');
INSERT INTO Department VALUES ('2','radio','2012-07-22','09:32:23','14:18:51','4');
INSERT INTO Department VALUES ('3','tv','2012-09-14','15:15:43','23:45:38','3');
INSERT INTO Department VALUES ('2','tv','2012-06-18','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('1','radio','2012-06-18','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('2','batteries','2012-06-18','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2012-06-18','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('1','computer','2012-06-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','dishwasher','2011-06-18','13:20:23','19:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','13:20:57','20:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','11:20:57','20:40:23','4');
INSERT INTO Department VALUES ('1','mobile','2012-05-18','13:20:31','19:40:23','4');
INSERT INTO Department VALUES ('1','mouse','2012-05-18','13:20:34','19:40:23','4');
INSERT INTO Department VALUES ('1','radio','2012-05-18','13:20:12','19:40:23','4');
INSERT INTO Department VALUES ('2','lawnmowerphones','2012-05-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','tv','2012-05-12','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('2','radio','2011-05-23','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('1','batteries','2011-05-21','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2011-05-01','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('3','mobile','2011-05-09','13:20:31','19:40:23','4');
For department1 the date 2012-05-18 would appear because that date occurred the most. And for every department, it should only show the one with the most sales, and if same amount of sales appears on the same date then both will appear, e.g. Department 1 will appear twice with both the dates of max sales.
I've tested the following query based on the table and two columns you've provided along with sample data. So, let me describe it for you. The inner-most "PREQUERY" is doing a count by department and date. The results of this will be pre-ordered by Department first, THEN the highest count in DESCENDING ORDER (so highest sales count is listed FIRST), it doesn't matter what date the count happened.
Next, by utilizing MySQL #variables, I'm pre-declaring two to be used in the query. #variables are like inline programming with MySQL. They can be declared once and then changed as applied to each record being processed. So, I'm defaulting to a bogus department value and a zero sales count.
Now, I'm grabbing the results of the PreQuery (Dept, #Sales and Date), but now, adding a test. If it is the FIRST ENTRY for a given department, use that record's "NumberOfSales" and put into the #maxSales variable and store as a final column name "MaxSaleCnt". The next column name uses the #lastDept and is set to whatever the current record's Department # is. So it can be compared to the next record.
If the next record is the same department, then it just keeps whatever the #maxSales value was from the previous, thus keeping the same first count(*) result for ALL entries on each respective department.
Now, the closure. I've added a HAVING clause (not a WHERE as that restricts what records get tested, but HAVING processes AFTER the records are part of the PROCESSED set. So now, it would have all 5 columns. I am saying ONLY KEEP those records where the final NumberOfSales for the record MATCHES the MaxSaleCnt for the department. If one, two or more dates, no problem it returns them all per respective department.
So, one department could have 5 dates with 10 sales each, and another department has 2 dates with only 3 sales each, and another with only 1 date with 6 sales.
select
Final.DepartmentNo,
Final.NumberOfSales,
Final.sDate
from
(select
PreQuery.DepartmentNo,
PreQuery.NumberOfSales,
PreQuery.sDate,
#maxSales := if( PreQuery.DepartmentNo = #lastDept, #maxSales, PreQuery.NumberOfSales ) MaxSaleCnt,
#lastDept := PreQuery.DepartmentNo
from
( select
D.DepartmentNo,
D.sDate,
count(*) as NumberOfSales
from
Department D
group by
D.DepartmentNo,
D.sDate
order by
D.DepartmentNo,
NumberOfSales DESC ) PreQuery,
( select #lastDept := '~',
#maxSales := 0 ) sqlvars
having
NumberOfSales = MaxSaleCnt ) Final
To clarify the "#" and "~" per you final comment. The "#" indicates a local variable to the program (or in this case and in-line sql variable) that can be used in the query. The '~' is nothing more than a simple string that probability would never exist that of any of your departments, so when it is compared to the first qualified record, does an IF( '~' = YourFirstDepartmentNumber, then use this answer, otherwise use this answer).
Now, how do the above work. Lets say the following is the results of your data returned by the inner-most query, grouped and ordered by the most sales at the top going down... SLIGHTLY altered from your data, lets just assume the following to simulate multiple dates on Dept 2 that have the same sales quantity...
Row# DeptNo Sales Date # Sales
1 1 2012-05-18 3
2 1 2012-06-18 2
3 1 2012-05-20 1
4 2 2012-06-18 4
5 2 2011-05-23 4
6 2 2012-05-18 2
7 2 2012-05-12 1
8 3 2011-06-18 2
9 3 2012-09-14 1
Keep track of the actual rows. The innermost query that finishes as alias "PreQuery" returns all the rows in the order you see here. Then, that is joined (implied) with the declarations of the # sqlvariables (special to MySQL, other sql engines dont do this) and starts their values with the lastDept = '~' and the maxSales = 0 (via assignment with #someVariable := result of this side ).
Now, think of the above being handled as a
DO WHILE WE HAVE RECORDS LEFT
Get the department #, Number of Sales and sDate from the record.
IF the PreQuery Record's Department # = whatever is in the #lastDept
set MaxSales = whatever is ALREADY established as max sales for this dept
This basically keeps the MaxSales the same value for ALL in the same Dept #
ELSE
set MaxSales = the # of sales since this is a new department number and is the highest count
END IF
NOW, set #lastDept = the department you just processed to it
can be compared when you get to the next record.
Skip to the next record to be processed and go back to the start of this loop
END DO WHILE LOOP
Now, the reason you need to have the #MaxSales and THEN the #LastDept as returned columns is they must be computed for each record to be used to compare to the NEXT record. This technique can be used for MANY application purposes. If you click on my name, look at my tags and click on the MySQL tag, it will show you the many MySQL answers I've responded to. Many of them do utilize # sqlvariables. In addition, there are many other people who are very good at working queries, so dont just look in one place. As for any question, if you find a good answer that you find helpful, even if you didn't post the question, clicking on an up-arrow next to the answer helps others indicate what really helped them understand and get resolution to questions -- again, even if its not your question. Good luck on your MySQL growth.
I think this can be achieved with a single query, but my experiences for similar functionality have involved either WITH (as defined in SQL'99) using either Oracle or MSSQL.
The best (only?) way to approach a problem like this is to break in into smaller components. (I don't think your provided statement provides all columns, so I'm going to have to make a few assumptions.)
First, how many sales were made for each day for each group:
SELECT department, COUNT(1) AS dept_count, sale_date
FROM orders
GROUP BY department, sale_date
Next, what's the most sales for each department
SELECT tmp.department, MAX(tmp.dept_count)
FROM (
SELECT department, COUNT(1) AS dept_count
FROM orders
GROUP BY department
) AS tmp
GROUP BY tmp.department
Finally, putting the two together:
SELECT a.department, a.dept_count, b.sale_date
FROM (
SELECT tmp.department, MAX(tmp.dept_count) AS max_dept_count
FROM (
SELECT department, COUNT(1) AS dept_count
FROM orders
GROUP BY department
) AS tmp
GROUP BY tmp.department
) AS a
JOIN (
SELECT department, COUNT(1) AS dept_count, sale_date
FROM orders
GROUP BY department, sale_date
) AS b
ON a.department = b.department
AND a.max_dept_count = b.dept_count

View to return row duplicated the # of times represented by amount field value

I have an accounting table that contains a dollar amount field. I want to create a view that will return one row per penny of that amount with some of the other fields from the table.
So as a very simple example let's say I have a row like this:
PK Amount Date
---------------------------
123 4.80 1/1/2012
The query/view should return 480 rows (one for each penny) that all look like this:
PK Date
-----------------
123 1/1/2012
What would be the most performant way to accomplish this? I have a solution that uses a table valued function and a temp table but in the back of my head I keep thinking there has got to be a way to accomplish this with a traditional view. Possibly a creative cross join or something that will return this result without having to declare too many resources in the form of temp tables, and tbf's etc. Any ideas?
With a little help of a numbers table you can do like this:
select PK, [Date]
from YourTable as T
inner join number as N
on N.n between 1 and T.Amount * 100
If you don't have one around and you want to test this you can use master..spt_values.
declare #T table
(
PK int,
Amount money,
[Date] date
)
insert into #T values
(123, 4.80, '20120101')
;with number(n) as
(
select number
from master..spt_values
where type = 'P'
)
select PK, [Date]
from #T as T
inner join number as N
on N.n between 1 and T.Amount * 100
Update:
From an article by Jeff Moden.
The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
A Tally table is nothing more than a table with a single column of
very well indexed sequential numbers starting at 0 or 1 (mine start at
1) and going up to some number. The largest number in the Tally table
should not be just some arbitrary choice. It should be based on what
you think you'll use it for. I split VARCHAR(8000)'s with mine, so it
has to be at least 8000 numbers. Since I occasionally need to generate
30 years of dates, I keep most of my production Tally tables at 11,000
or more which is more than 365.25 days times 30 years.
You could use a CTE, something like this:
WITH duplicationCTE AS
(
SELECT PK, Date, Amount, 1 AS Count
FROM myTable
UNION ALL
SELECT myTable.PK, myTable.Date, myTable.Amount, Count+1
FROM myTable
JOIN duplicationCTE ON myTable.PK = duplicationCTE.PK
WHERE Count+1 <= myTable.Amount*100
)
SELECT PK, Date
FROM duplicationCTE
OPTION (MAXRECURSION 0);
Here is the SqlFiddle
AND, do note the 0. That means that this can run infinitely (dangerous btw) Otherwise, 32676 is the max number of recursions you can set (default is 100). However, if you are running over 32676 loops, then maybe you need to rethink your logic :)

mysql first record retrieval

While very easy to do in Perl or PHP, I cannot figure how to use mysql only to extract the first unique occurence of a record.
For example, given the following table:
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-08-12 09:22:37 2000
John 2010-09-13 10:22:22 500
Sue 2010-09-01 09:07:21 1000
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
I would like to extract the first record for each individual in asc time order.
After sorting the table is ascending time order, I need to extract the first unique record by name.
So the output would be :
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
Is this doable in an easy fashion with mySQL?
I think that something along the lines of
select name, date, time, sale from mytable order by date, time group by name;
will get you what you're looking for
you need to perform a groupwise max or groupwise min
see below or http://pastie.org/973117 for an example
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;
In databases, there really is no "first" or "last" record; think of each record as its own, non-positional entity in the table. The only positions they have are when you give them one, say, using ORDER BY.
This will give you what you want. It might not be efficient, but it works.
select Name, Date, Time, Sale from
(select Name, Date, Time, Sale from MyTable
order by Date asc, Time asc) MyTable_subquery_name
group by Name
Note: MyTable_subquery_name is just a dummy name for the subquery. MySQL will give the error ERROR 1248 (42000): Every derived table must have its own alias without it.
If only GROUP BY and ORDER BY were communicative operations, then this wouldn't have to be a subquery.