Tag duplicates with first occurrence ID - duplicates

I'm using the cluster command and am running into difficulties due to insufficient memory. To get around this problem I would like to delete all duplicate observations.
I would like to cluster on the variables A, B and C, and I identify duplicate values like so:
/* Create dummy data */
input id A B C
1 1 1 1
2 1 1 1
3 1 1 1
4 2 2 2
5 2 2 2
6 2 2 2
7 2 2 2
8 3 3 3
9 3 3 3
10 4 4 4
end
sort A B C id
duplicates tag A B C, gen(dup_tag)
I would like to add a variable dup_id which tells me that ids 2 and 3 are duplicates of id 1, ids 5, 6 and 7 of id 4, and so on. How could I do this?
/* Desired result */
id A B C dup_id
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 2 2 2 4
5 2 2 2 4
6 2 2 2 4
7 2 2 2 4
8 3 3 3 8
9 3 3 3 8
10 4 4 4 10

duplicates is a wonderful command (see its manual entry for why I say that), but you can do this directly:
bysort A B C : gen tag = _n == 1
tags the first occurrence of duplicates of A B C as 1 and all others as 0. For the other way round use _n > 1, _n != 1, or whatever.
EDIT:
So then the id of tagged observations is just
by A B C: gen dup_id = id[1]
For basic technique with by: see (e.g.) this discussion

You can refer to the first observation in each group of A B C using the subscript [1] on id. Note the (id) argument in bysort: it sorts the observations by id within each group, while the groups themselves are identified by A, B, and C only.
clear
input id A B C
1 1 1 1
2 1 1 1
3 1 1 1
4 2 2 2
5 2 2 2
6 2 2 2
7 2 2 2
8 3 3 3
9 3 3 3
10 4 4 4
end
bysort A B C (id): gen dup_id = id[1]
li, noobs sepby(dup_id)
yielding
  +-------------------------+
  | id   A   B   C   dup_id |
  |-------------------------|
  |  1   1   1   1        1 |
  |  2   1   1   1        1 |
  |  3   1   1   1        1 |
  |-------------------------|
  |  4   2   2   2        4 |
  |  5   2   2   2        4 |
  |  6   2   2   2        4 |
  |  7   2   2   2        4 |
  |-------------------------|
  |  8   3   3   3        8 |
  |  9   3   3   3        8 |
  |-------------------------|
  | 10   4   4   4       10 |
  +-------------------------+
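For readers without Stata at hand, the same grouping logic can be sketched in plain Python. This is only an analogue of `bysort A B C (id): gen dup_id = id[1]`, not Stata code; the names `rows`, `first_id_per_group` and `unique_rows` are made up for illustration.

```python
# Sketch only: a plain-Python analogue of
#   bysort A B C (id): gen dup_id = id[1]
# Each tuple is (id, A, B, C), copied from the example data above.
rows = [
    (1, 1, 1, 1), (2, 1, 1, 1), (3, 1, 1, 1),
    (4, 2, 2, 2), (5, 2, 2, 2), (6, 2, 2, 2), (7, 2, 2, 2),
    (8, 3, 3, 3), (9, 3, 3, 3),
    (10, 4, 4, 4),
]


def first_id_per_group(rows):
    """Map each id to the smallest id that shares its (A, B, C) values."""
    first = {}
    # Sorting the tuples orders them by id first, so the first id seen
    # for each (A, B, C) key is the smallest one in that group.
    for id_, a, b, c in sorted(rows):
        first.setdefault((a, b, c), id_)
    return {id_: first[(a, b, c)] for id_, a, b, c in rows}


dup_id = first_id_per_group(rows)

# Keeping one observation per group is then a matter of keeping the
# rows whose id equals their dup_id.
unique_rows = [r for r in rows if dup_id[r[0]] == r[0]]
```

Here `dup_id` reproduces the dup_id column listed above, and `unique_rows` holds one row per distinct combination of A, B and C, which is what the original question ultimately wants to feed into clustering.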

Related

MS Access Sum and Averaging Columns with Same Information as New Column

So I have the following chart here where I have columns:
a b c d
1 1 1 3
1 1 1 4
1 1 1 5
2 2 2 1
2 2 2 3
2 2 2 3
3 3 3 4
3 3 3 5
3 3 3 6
What I want to do is add and average column d where columns a,b,c contain the same values. How would I go about doing this?
I imagine it would be something like
Select SUM(Table.d) where a = b AND b = c AS e
Try this:
SELECT a, b, c, SUM(d) AS sum_d, AVG(d) AS avg_d FROM [Table] GROUP BY a, b, c
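A quick way to check the GROUP BY approach against the sample data is to run it in an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite stands in for MS Access, and the table name `t` is a placeholder.

```python
import sqlite3

# Sketch only: SQLite is used here as a stand-in for MS Access,
# and the table name `t` is a placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 3), (1, 1, 1, 4), (1, 1, 1, 5),
        (2, 2, 2, 1), (2, 2, 2, 3), (2, 2, 2, 3),
        (3, 3, 3, 4), (3, 3, 3, 5), (3, 3, 3, 6),
    ],
)

# One output row per distinct (a, b, c), with the sum and mean of d.
result = conn.execute(
    "SELECT a, b, c, SUM(d), AVG(d) FROM t GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

for row in result:
    print(row)
```

Selecting a, b and c alongside the aggregates makes it clear which group each sum and average belongs to.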

What is the best way to store mysql tree relation changes?

I found a way to do this, but I don't like it: every time one relation changes, all tree records must be duplicated instead of just one. If anyone has ideas for another approach, or for optimizing mine, you are welcome to share. Thanks!
The simplest self-referencing tree table:
`categories`
id|parent_id|label|
--+---------+------
1 NULL A
2 1 B
3 2 C
4 3 D
5 4 E
"A > B > C > D > E" nested set
I want to save all relation changes. I think that I need to do it in this way:
add table revisions
`revisions`
id | created |
---+----------------------
1 2015-02-02 09:00:00
2 2015-03-02 09:01:00
3 2015-04-02 09:00:00
4 2015-04-07 09:02:00
change table categories
`categories`
id|label|
--+-------
1 A
2 B
3 C
4 D
5 E
add table category_revisions
`category_revisions`
id|rev_id|category_id|parent_category_id|
--+------+-----------+-------------------
1 1 1 null
2 1 2 1
3 1 3 2
4 1 4 3
5 1 5 4 # A > B > C > D > E
6 2 1 null
7 2 2 1
8 2 3 1
9 2 4 3
10 2 5 4 # A > B
# > C > D > E

Duplicate Value Avoid in sql query

item qty
1201-10-005-A 1
1110-01-006-A 1
1112-01-006-A 1
1202-01-008-A 1
1202-01-023-A 1
G-1000-00-003-A 1
Q-2252-00-004-D 1
1150-01-002-A 1
1201-01-009-A 1
1201-01-010-A 1
1201-01-012-A 1
1201-01-013-A 1
1201-02-005-A 1
1201-02-006-A 1
1201-04-001-A 1
1201-05-001-A 1
1201-06-002-A 1
1201-06-003-A 1
1201-06-004-A 1
1201-07-001-A 1
1201-07-002-A 1
1201-07-005-A 1
1201-07-006-A 1
1201-07-009-A 1
1201-07-007-A 1
1201-06-004-A 2
1201-07-001-A 2
1201-07-002-A 2
1201-07-005-A 2
1201-07-006-A 2
1201-07-007-A 2
1201-07-009-A 2
1201-10-005-A 2
1202-01-008-A 2
1202-01-023-A 2
1110-01-006-A 2
1201-06-004-A 3
1201-07-001-a 3
1201-07-002-A 3
1201-07-005-A 3
1201-07-006-a 3
1201-07-007-A 3
1201-07-009-A 3
1201-10-005-A 3
1202-01-008-A 3
1202-01-023-A 3
1110-01-006-A 3
1130-03-009-A 3
1201-06-004-A 4
1201-07-001-A 4
1201-07-002-A 4
1201-07-005-A 4
1201-07-006-A 4
1201-07-007-A 4
1201-07-009-A 4
1201-10-005-A 4
1202-01-008-A 4
1202-01-023-A 4
1110-01-006-A 4
1130-03-009-A 4
1110-01-006-A 5
1130-03-009-A 5
1201-01-009-A 1
0004-08-107-A 1
0010-08-012-A 1
1000-00-003-B 1
When the same item repeats, I want to show only its maximum quantity value.
You need to use Group By:
select item, max(qty) as max_qty
from table
group by item
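The effect of the GROUP BY can be checked on a handful of rows taken from the listing above. This is a sketch, assuming an in-memory SQLite database and a placeholder table name `items`; only a subset of the rows is loaded.

```python
import sqlite3

# Sketch only: a subset of the rows from the question, loaded into an
# in-memory SQLite table. The table name `items` is a placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("1201-10-005-A", 1),
        ("1110-01-006-A", 1),
        ("1201-07-001-A", 1),
        ("1201-10-005-A", 4),
        ("1201-07-001-A", 4),
        ("1110-01-006-A", 5),
        ("0004-08-107-A", 1),
    ],
)

# One row per item, carrying only the largest qty seen for that item.
result = conn.execute(
    "SELECT item, MAX(qty) AS max_qty FROM items GROUP BY item ORDER BY item"
).fetchall()

for row in result:
    print(row)
```

Note that a few items in the question appear with a lowercase suffix (for example 1201-07-001-a). Whether those group together with the uppercase spelling depends on the database's collation, so normalizing the case first may be needed.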

mysql- 3 random records for main departments with multiple level depth of subdepartments

I am trying to write a complex query in MySQL. I have the following two tables:
departments              employees
id  parent  title        id  name  department  status
----------------------   ----------------------------
1   0       Health       1   abc   3           1
2   0       Sports       2   def   3           1
3   0       Education    3   ghi   5           1
4   1       Physical     4   jkl   10          1
5   1       Mental       5   kkk   6           1
6   2       Football     6   lll   6           1
7   2       Baseball     7   sss   8           1
8   2       Beachball    8   xxx   6           1
9   2       Hockey       9   yyy   6           1
10  4       ENT          10  zzz   7           1
11  0       Finance      11  mnb   11          1
The departments table has four main departments (i.e., parent = 0) with multiple levels of sub-departments.
Currently I achieve this with a PHP function that runs several queries, but I would still like to know whether it is possible to fetch it with a single query.
What is the best way to randomly select at most 3 employees with status 1 for each main department?
The expected result should be something like
id name department maindep
1 abc 3 3
2 def 3 3
3 ghi 5 1
4 jkl 10 1
5 kkk 6 2
7 sss 8 2
10 zzz 7 2
11 mnb 11 11
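The logic being asked for (walk up to the main department, then pick at most 3 employees per main department at random) can be sketched in application code. This is not a MySQL query and not a definitive answer; it is a plain-Python illustration of the intended result, and the names `main_department` and `sample_per_main` are hypothetical.

```python
import random
from collections import defaultdict

# Sketch only: the two tables from the question as Python data.
# departments maps id -> parent (0 marks a main department).
departments = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 2, 10: 4, 11: 0}

# employees rows are (id, name, department, status).
employees = [
    (1, "abc", 3, 1), (2, "def", 3, 1), (3, "ghi", 5, 1), (4, "jkl", 10, 1),
    (5, "kkk", 6, 1), (6, "lll", 6, 1), (7, "sss", 8, 1), (8, "xxx", 6, 1),
    (9, "yyy", 6, 1), (10, "zzz", 7, 1), (11, "mnb", 11, 1),
]


def main_department(dep_id):
    """Follow parent links until a department with parent 0 is reached."""
    while departments[dep_id] != 0:
        dep_id = departments[dep_id]
    return dep_id


def sample_per_main(employees, k=3, rng=random):
    """Pick at most k active employees at random for each main department."""
    groups = defaultdict(list)
    for emp_id, name, dep, status in employees:
        if status == 1:
            groups[main_department(dep)].append((emp_id, name, dep))
    picked = []
    for main, members in groups.items():
        for emp_id, name, dep in rng.sample(members, min(k, len(members))):
            picked.append((emp_id, name, dep, main))
    return sorted(picked)


# A seeded generator is used only to make the sketch repeatable.
result = sample_per_main(employees, rng=random.Random(0))
for row in result:
    print(row)
```

Each output row is (id, name, department, maindep), matching the layout of the expected result above. Which three employees are chosen for the Sports branch varies with the random seed.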

How to order by max(a,b) in SQL?

Consider the following table:
id a b
--------------
1 5 1
2 2 3
3 4 2
4 3 6
5 0 1
6 2 2
I would like to order it by max(a,b) in descending order, so that the result will be:
id a b
--------------
4 3 6
1 5 1
3 4 2
2 2 3
6 2 2
5 0 1
What SQL query would perform such an ordering?
Use GREATEST:
SELECT *
FROM table
ORDER BY GREATEST(a, b) DESC
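The ordering can be checked against the sample table with a short script. This is a sketch under one notable assumption: it runs on SQLite, which has no GREATEST function, so the two-argument scalar form of MAX(a, b) is swapped in to play the same role. The table name `t` is a placeholder.

```python
import sqlite3

# Sketch only: SQLite has no GREATEST, so its multi-argument scalar
# MAX(a, b) stands in for GREATEST(a, b). Table name `t` is a placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 5, 1), (2, 2, 3), (3, 4, 2), (4, 3, 6), (5, 0, 1), (6, 2, 2)],
)

# Sort rows by the larger of a and b, largest first.
result = conn.execute("SELECT id, a, b FROM t ORDER BY MAX(a, b) DESC").fetchall()

for row in result:
    print(row)
```

The resulting row order matches the desired output listed in the question.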