Identify unique levels of categorical variable - unique

I have a list of person ids, and the types of medicines they got on specific dates.
I would like to create a variable count whereby I can give the indicator 1 to the first drug that occurs, 2 to the second unique drug and 3 to the third unique drug. When the first drug occurs after the second and third, I want it to still have the indicator 1. Likewise for unique drug 2, it should maintain the value 2 throughout the person's whole medication history, and the same for drug 3.
+-------------------------------------+
| p_id date agent_~e count |
|-------------------------------------|
38. | 1001 13dec2001 thiazide 1|
39. | 1001 12apr2002 thiazide 1|
40. | 1001 15jul2002 thiazide 1|
41. | 1001 28aug2002 arb 2|
42. | 1001 26sep2002 CCB 3|
|-------------------------------------|
43. | 1001 26sep2002 arb 2|
44. | 1001 10oct2002 CCB 3|
45. | 1001 10oct2002 thiazide 1|
46. | 1001 10oct2002 arb 2|
47. | 1001 10dec2002 CCB 3|
|-------------------------------------|
48. | 1001 10dec2002 arb 2|
+-------------------------------------+
Because each person has a different set of drugs, I think I need quite a general solution as opposed to something like
gen count = 1 if agent_type == "thiazide".
For example, person two is below and they have a very different drug history to person one above.
+-------------------------------+
| p_id date agent_t~e |
|-------------------------------|
207. | 2001 08jul1999 ace_inhib |
208. | 2001 02aug1999 ace_inhib |
209. | 2001 25aug1999 ace_inhib |
210. | 2001 22oct1999 ace_inhib |
211. | 2001 18nov1999 CCB |
|-------------------------------|
212. | 2001 18nov1999 ace_inhib |
213. | 2001 14dec1999 CCB |
214. | 2001 12jan2000 CCB |
215. | 2001 03feb2000 CCB |
216. | 2001 03feb2000 arb |
|-------------------------------|
217. | 2001 02mar2000 CCB |
+-------------------------------+

"Unique" is a common misnomer here; strictly, it means occurring once only, which is not what you mean at all. "Distinct" is a much better word: for a discussion in Stata context, see here.
Please find out about dataex from SSC to be able to show data examples that can be copied and pasted directly. Yours required some engineering to be made easy to use.
Your problem is already a Stata FAQ found here. It is a good idea to look through the FAQs before posting.
* Example generated by -dataex-. To install: ssc install dataex
clear
input float p_id str8 agent_type float(wanted date)
1001 "thiazide" 1 15322
1001 "thiazide" 1 15442
1001 "thiazide" 1 15536
1001 "arb" 2 15580
1001 "CCB" 3 15609
1001 "arb" 2 15609
1001 "CCB" 3 15623
1001 "thiazide" 1 15623
1001 "arb" 2 15623
1001 "CCB" 3 15684
1001 "arb" 2 15684
2001 "ace_inhi" 1 14433
2001 "ace_inhi" 1 14458
2001 "ace_inhi" 1 14481
2001 "ace_inhi" 1 14539
2001 "CCB" 2 14566
2001 "ace_inhi" 1 14566
2001 "CCB" 2 14592
2001 "CCB" 2 14621
2001 "CCB" 2 14643
2001 "arb" 3 14643
2001 "CCB" 2 14671
end
format date %td
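* first date on which each person received each drug
bysort p_id agent_type (date) : gen firstdate = date[1]
* one group per person-drug pair, ordered by person, first date, then drug name
egen group = group(p_id firstdate agent_type)
* running count of distinct groups within each person
bysort p_id (group date agent_type): gen count = sum(group != group[_n-1])
assert count == wanted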
Note that the code takes care of the possibility that two or more drugs are first used on the same day by the same person.
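For readers who want to check the logic outside Stata, here is a minimal Python sketch of the same idea: find the first date of each person-drug pair, then number the drugs within each person by that first date, breaking ties by drug name. The function name `distinct_order` and the shortened sample rows are illustrative assumptions, not part of the original answer; dates are kept as Stata daily date integers.

```python
from collections import defaultdict

def distinct_order(rows):
    """rows: list of (p_id, date, agent_type); returns one rank per row."""
    # First date on which each (person, drug) pair appears.
    first_date = {}
    for p_id, date, agent in rows:
        key = (p_id, agent)
        if key not in first_date or date < first_date[key]:
            first_date[key] = date

    # Collect each person's drugs together with their first dates.
    per_person = defaultdict(list)
    for (p_id, agent), date in first_date.items():
        per_person[p_id].append((date, agent))

    # Rank drugs within person by first date, ties broken by drug name.
    rank = {}
    for p_id, drugs in per_person.items():
        for i, (_, agent) in enumerate(sorted(drugs), start=1):
            rank[(p_id, agent)] = i

    return [rank[(p_id, agent)] for p_id, _, agent in rows]

# A shortened subset of the example data above.
rows = [
    (1001, 15322, "thiazide"),
    (1001, 15580, "arb"),
    (1001, 15609, "CCB"),
    (1001, 15609, "arb"),
    (1001, 15623, "thiazide"),
    (2001, 14433, "ace_inhib"),
    (2001, 14566, "CCB"),
    (2001, 14566, "ace_inhib"),
    (2001, 14643, "arb"),
]
print(distinct_order(rows))
```

As in the Stata code, a drug keeps its number even when it reappears after later drugs.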

Related

MySQL - Get season from current month

I have the following table of seasons:
| id | name | start_month | end_month |
------------------------------------------
| 101 | Summer | 12 | 2 |
| 102 | Winter | 6 | 8 |
| 103 | Spring | 9 | 11 |
| 104 | Fall | 3 | 5 |
I need to get the season by month. Say current month is 2 (February), I want Summer to be the output.
I can get the other seasons to work simply by using the where condition start_month <= 4 and end_month >= 4 (for April, say). But this won't work with Summer, since that season crosses into the next year.
What do I have to do to handle the case of Summer?
One solution I thought of was to use dates such as 1980-12-01 instead of month numbers, together with the between function, but that gets a bit complicated on the user end.
It'd be great if it could work with just month numbers.
You could do:
(month(d) between start_month and end_month) or
(start_month>end_month and (month(d)>=start_month or month(d)<=end_month))
See db-fiddle
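The wrap-around condition can be tried out with a small Python sketch. It uses an in-memory SQLite database instead of MySQL (an assumption made only to keep the example self-contained) and passes the month number in directly as a parameter `:m` rather than calling month(d). The helper name `season_for` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seasons (id INTEGER, name TEXT, "
    "start_month INTEGER, end_month INTEGER)"
)
conn.executemany(
    "INSERT INTO seasons VALUES (?, ?, ?, ?)",
    [
        (101, "Summer", 12, 2),
        (102, "Winter", 6, 8),
        (103, "Spring", 9, 11),
        (104, "Fall", 3, 5),
    ],
)

# First branch: seasons inside one calendar year.
# Second branch: seasons that wrap around the year end (start > end).
QUERY = """
SELECT name FROM seasons
WHERE (:m BETWEEN start_month AND end_month)
   OR (start_month > end_month
       AND (:m >= start_month OR :m <= end_month))
"""

def season_for(month):
    """Return the season name for a month number 1..12, or None."""
    row = conn.execute(QUERY, {"m": month}).fetchone()
    return row[0] if row else None

print(season_for(2))   # Summer
print(season_for(4))   # Fall
```

The second branch only ever matches rows where start_month is greater than end_month, so the ordinary seasons are unaffected by it.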

Displaying matching pairs & return

I am stumped on a question in my assignment.
On a single table (Condo_Unit), we have several columns - CondoID, UnitNum, SqrFt (Square Feet) etc.
I need to find a query that can display the UnitNum of any pair of condos which have the same square footage. For example, condos 305 & 409 both have a square footage of 1500. The output must show both condos in a pair.
At this stage, I can generate a list showing only one of the pair duplicated across two result columns (i.e. unit 305 is shown twice, not 305 | 409) using:
SELECT UnitNum, UnitNum
FROM condo_unit
GROUP BY SqrFt
HAVING Count(SqrFt) >1;
Sample data includes:
Condo ID | UnitNum | SqrFt
1 | 102 | 675
2 | 201 | 1030
3 | 305 | 1500
4 | 409 | 1500
5 | 104 | 1030
6 | 207 | 870
From this data, we can see that units 201 & 104 are a matching pair, as are 305 & 409.
Results should show:
1st Unit | 2nd Unit
201 | 104
305 | 409
The current results I am getting are:
1st Unit | 2nd Unit
201 | 201
305 | 305
Is anyone able to assist, or is further clarification needed?
Query:
SELECT DISTINCT
    LEAST(a.UnitNum, b.UnitNum) AS "1st Unit",
    GREATEST(a.UnitNum, b.UnitNum) AS "2nd Unit"
FROM Condo_Unit a
JOIN Condo_Unit b
    ON a.SqrFt = b.SqrFt AND a.CondoID <> b.CondoID;
This code will help you
select GROUP_CONCAT(UnitNum,'&'),SqrFt from Condo_Unit group by SqrFt ORDER BY SqrFt
This should do.
select GROUP_CONCAT(UnitNum SEPARATOR '|') as UnitName from condo_unit
group by SqrFt HAVING Count(SqrFt) >1;
DEMO FOR ANSWER
OUTPUT :
+----------+
| UnitName |
+----------+
| 201|104 |
+----------+
| 305|409 |
+----------+
You can use whatever separator you want. I have given pipe symbol |.
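To get the two units in separate columns, as the question asks, a self-join is one option. The sketch below runs it against the sample data in an in-memory SQLite database (an assumption for the sake of a runnable example; the join itself is plain SQL). Comparing a.CondoID < b.CondoID keeps each pair once and puts the lower CondoID first, which matches the requested output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Condo_Unit (CondoID INTEGER, UnitNum INTEGER, SqrFt INTEGER)"
)
conn.executemany(
    "INSERT INTO Condo_Unit VALUES (?, ?, ?)",
    [
        (1, 102, 675),
        (2, 201, 1030),
        (3, 305, 1500),
        (4, 409, 1500),
        (5, 104, 1030),
        (6, 207, 870),
    ],
)

# Join the table to itself on equal square footage; the strict
# inequality drops self-matches and mirrored duplicates.
pairs = conn.execute(
    """
    SELECT a.UnitNum, b.UnitNum
    FROM Condo_Unit AS a
    JOIN Condo_Unit AS b
      ON a.SqrFt = b.SqrFt AND a.CondoID < b.CondoID
    ORDER BY a.CondoID, b.CondoID
    """
).fetchall()
print(pairs)  # [(201, 104), (305, 409)]
```

If three or more units share a square footage, this returns every pair among them rather than one row per group.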

Get Number of A's in Result Table - MySQL

Here is the situation. At my school, each of the 17 classes prepares an Excel sheet with the marks for each subject in the term-end test. I combine them into an Access table, export all the data to Excel again, save it as a CSV file, and import it into a MySQL database using phpMyAdmin. I now have a result table as follows.
| ID | Name | Religion | Sinhala | science | english | maths | History | Categery 1 | Categery 2 | Categery 3 | Total | Average | Rank |
|----|------|----------|---------|---------|---------|-------|---------|------------|------------|------------|-------|---------|------|
| 1 | manoj | 45 | 65 | 78 | 98 | 67 | 67 | 63 | 76 | 64 | 654 | 62 | 12 |
The sectional head needs the number of students who got >75 in all subjects,
and the number of students who got >75 in 8 subjects out of 9.
I need to retrieve the number of As and Bs (marks >=75) from this table.
For example: student names and their number of As
Total number of As for all 9 subjects - 45
Total number of As for all 8 subjects (any 8 subjects) - 45
Total number of As for all 7 subjects (any 7 subjects) - 45
I tried the following SQL statement:
SELECT COUNT(SELECT COUNT()
FROM result
WHERE religion >=75
AND Math >=75)
FROM result
I read about the same scenario on Stack Overflow:
Access 2010
That one gets part of the way, but I can't adapt it to my scenario.
Use GROUP BY studentName and SUM(grade = 'A') AS numberOfAs.
[Quick answer bc question is quickly formatted]
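The answer above assumes one row per student and grade. With the wide table from the question (one column per subject), a similar effect can be had by adding up one comparison per subject column, since a comparison evaluates to 1 or 0 in both MySQL and SQLite. The following is only a sketch under assumptions: it uses an in-memory SQLite database, simplified lowercase column names (cat1 to cat3 stand in for the three category columns), and two made-up students alongside the sample row.

```python
import sqlite3

# Hypothetical, simplified column names for the nine subjects.
SUBJECTS = ["religion", "sinhala", "science", "english", "maths",
            "history", "cat1", "cat2", "cat3"]

conn = sqlite3.connect(":memory:")
columns = ", ".join(f"{s} INTEGER" for s in SUBJECTS)
conn.execute(f"CREATE TABLE result (name TEXT, {columns})")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("manoj", 45, 65, 78, 98, 67, 67, 63, 76, 64),       # row from the question
        ("student_b", 80, 80, 80, 80, 80, 80, 80, 80, 80),   # made-up data
        ("student_c", 80, 80, 80, 80, 80, 80, 80, 80, 70),   # made-up data
    ],
)

# Each (subject >= 75) is 1 or 0, so the sum is the number of As.
a_count = " + ".join(f"({s} >= 75)" for s in SUBJECTS)

# Number of As per student.
per_student = conn.execute(
    f"SELECT name, {a_count} AS num_as FROM result ORDER BY name"
).fetchall()

# How many students have 9 As, 8 As, and so on.
by_count = conn.execute(
    f"""
    SELECT num_as, COUNT(*) AS students
    FROM (SELECT {a_count} AS num_as FROM result) AS t
    GROUP BY num_as
    ORDER BY num_as DESC
    """
).fetchall()

print(per_student)
print(by_count)
```

The threshold 75 and the column list would need adjusting to the real table; column names containing spaces would have to be quoted.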

What is the logic to represent sub-items from a box in a stock (warehouse) database?

For example: I buy a 100 kg bag of rice, record one entry for the complete bag, and then give out 20 or 30 kilograms from that bag. How can I achieve this in the stock database?
The structure can differ depending on how your application behaves; there is no single right way. But here is an example to give you an idea:
inventory table:
id
date
merchandise_id
amount (negative for exit and positive for entry)
inventory_id (this is null for entry and includes the id of entry for exits)
Sample data:
id | date | merchandise_id | amount | inventory_id
-----------------------------------------------------------------
1 | 2016-06-01 | 32 | 100 | NULL
2 | 2016-06-03 | 32 | -20 | 1
3 | 2016-06-04 | 32 | -30 | 1
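With this layout, what is left of each bag is the entry amount plus the (negative) exits that point back to it. A minimal sketch, assuming an in-memory SQLite database and the sample rows above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE inventory (
        id INTEGER PRIMARY KEY,
        date TEXT,
        merchandise_id INTEGER,
        amount INTEGER,        -- positive for entry, negative for exit
        inventory_id INTEGER   -- NULL for entries, id of the entry for exits
    )
    """
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2016-06-01", 32, 100, None),
        (2, "2016-06-03", 32, -20, 1),
        (3, "2016-06-04", 32, -30, 1),
    ],
)

# Remaining amount per entry (per bag): entry amount plus its exits.
remaining = conn.execute(
    """
    SELECT e.id, e.amount + COALESCE(SUM(x.amount), 0) AS remaining
    FROM inventory AS e
    LEFT JOIN inventory AS x ON x.inventory_id = e.id
    WHERE e.inventory_id IS NULL
    GROUP BY e.id, e.amount
    """
).fetchall()

# Total stock per merchandise: entries and exits simply add up.
stock = conn.execute(
    "SELECT merchandise_id, SUM(amount) FROM inventory GROUP BY merchandise_id"
).fetchall()

print(remaining)  # [(1, 50)]
print(stock)      # [(32, 50)]
```

The LEFT JOIN keeps bags that have had nothing taken out yet; COALESCE turns their missing exit total into 0.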

SQL GROUP BY query issue

I have a table with the following columns:
DriverNumber; DriverName; CarNumber; DriverConditions; LogonTime; VehicleID
This table has an entry for each LogonTime for each DriverNumber, and a driver can log on to different vehicles.
For example:
93070495 Mehar 189 Parcel, V, Wheelchair, M50, Special, Animal, COD P... Jan 2 2014 07:40:26:197AM 1029
93070495 Mehar 189 Parcel, V, Wheelchair, M50, Special, Animal, COD P... Jan 7 2014 08:09:50:097AM 1029
25184313 Kerry 895 Parcel, Cheques, V, Wheelchair, Special, Animal, C... Jan 3 2014 05:00:26:600PM 970
What I essentially want to do is show how many times a DriverNumber logs into each car.
This is what I have done so far:
SELECT DriverNumber, DriverName, CarNumber, DriverConditions, LogonTime,
count(DriverNumber) as DriverCount
FROM SilverDrivers
WHERE DriverNumber > 0
GROUP BY CarNumber
This gives me something close to what I am after, but it only shows one CarNumber per DriverNumber, e.g.:
DRIVER HDL | DRIVER NAME | CAR NUMBER | DRIVER CONDITIONS | NUMBER OF LOGONS
98749492 | Manpreet | 3 | Parcel | 10
32176467 | Mark | 19 | Wheelchair | 7
92173581 | Varinder | 46 | Parcel | 1
What I want it to look like is:
DRIVER HDL | DRIVER NAME | CAR NUMBER | DRIVER CONDITIONS | NUMBER OF LOGONS
98749492 | Manpreet | 3 | Parcel | 7
98749492 | Manpreet | 12 | Parcel | 3
32176467 | Mark | 19 | Wheelchair | 4
32176467 | Mark | 214 | Wheelchair | 3
92173581 | Varinder | 46 | Parcel | 1
You should also group by driver number to get the count you want. Also, add all the other columns to your GROUP BY clause and remove columns that wouldn't be unique from SELECT (I left only DriverName as I assume it's always the same for one DriverNumber).
SELECT DriverNumber, DriverName, CarNumber, count(*) as DriverCount
FROM SilverDrivers
WHERE DriverNumber > 0
GROUP BY DriverNumber, DriverName, CarNumber
You want either
GROUP BY DriverNumber, CarNumber
or
GROUP BY CarNumber, DriverNumber
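The effect of grouping by both driver and car can be checked with a small sketch. It uses an in-memory SQLite database and a cut-down SilverDrivers table with only three columns; the first three rows follow the sample in the question, and the fourth row (Mehar on car 200) is made up so that one driver appears on two cars.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SilverDrivers "
    "(DriverNumber INTEGER, DriverName TEXT, CarNumber INTEGER)"
)
conn.executemany(
    "INSERT INTO SilverDrivers VALUES (?, ?, ?)",
    [
        (93070495, "Mehar", 189),
        (93070495, "Mehar", 189),
        (25184313, "Kerry", 895),
        (93070495, "Mehar", 200),  # made-up row: same driver, second car
    ],
)

# One output row per (driver, car) combination, with its logon count.
logons = conn.execute(
    """
    SELECT DriverNumber, DriverName, CarNumber, COUNT(*) AS DriverCount
    FROM SilverDrivers
    WHERE DriverNumber > 0
    GROUP BY DriverNumber, DriverName, CarNumber
    ORDER BY DriverNumber, CarNumber
    """
).fetchall()

for row in logons:
    print(row)
```

Grouping by CarNumber alone would instead collapse all drivers of a car into one row.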