I have data of the following type (the other variables can be completely random):
Name Member Other variables
AAA 0
AAA 0
AAA 1
BBB 0
BBB 0
CCC 1
Note that the 1 can occur in a group of duplicates at any location, but only one will occur per block of duplicates.
I would like to eliminate duplicates as follows:
If Member is 1 and the name occurs only once, there is no duplicates problem (e.g., CCC).
If Member is 0 for all duplicates, that's all right too (e.g., BBB).
If one row in a duplicate set has Member equal to 1 and the rest have 0, then all other rows in that set need to be set to 1 (e.g., AAA).
I have tried the duplicates command and custom routines using _N, _n, etc., but none of them work because I don't know how to loop over one set of duplicates at a time (I have also looked into foreach, etc.).
The final result should look like this:
Name Member Other variables
AAA 1
AAA 1
AAA 1
BBB 0
BBB 0
CCC 1
One thing I was thinking of was that if I can somehow work with one group at a time, I can apply max() to the member column for each block of duplicates, and that will yield what I want. However, the issue is that I don't know how to work with one group at a time.
Bonus:
If I can also eliminate duplicates after this change and arrive at the set below, that will be a nice bonus. But I think that I know how to get there once the above step is clear.
Name Member Other variables
AAA 1
BBB 0
CCC 1
The following works for me:
clear
input str3 Name Member
AAA 0
AAA 0
AAA 1
BBB 0
BBB 0
CCC 1
end
bysort Name (Member) : egen Wanted1 = max(Member)
or
bysort Name (Member) : generate Wanted2 = Member[_N]
Both produce the desired output:
list, sepby(Name)
+-----------------------------------+
| Name Member Wanted1 Wanted2 |
|-----------------------------------|
1. | AAA 0 1 1 |
2. | AAA 0 1 1 |
3. | AAA 1 1 1 |
|-----------------------------------|
4. | BBB 0 0 0 |
5. | BBB 0 0 0 |
|-----------------------------------|
6. | CCC 1 1 1 |
+-----------------------------------+
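Note that within each by-group _n runs from 1 to _N, so _N always indexes the last observation of the group. Because the data are sorted by Member within Name, that last observation holds the largest value of Member (assuming Member has no missing values, which sort last), which is why Member[_N] matches the egen max() result here.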
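For readers who do not use Stata, the "maximum within each group, copied to every row" idea can be sketched in plain Python. This is only an illustration of the logic, not a translation of what egen does internally; the names rows, group_max and wanted are made up for the sketch.

```python
# Rows mirror the example data: (Name, Member).
rows = [("AAA", 0), ("AAA", 0), ("AAA", 1),
        ("BBB", 0), ("BBB", 0), ("CCC", 1)]

# First pass: record the largest Member seen for each Name.
group_max = {}
for name, member in rows:
    group_max[name] = max(member, group_max.get(name, member))

# Second pass: attach the group's maximum to every row of that group.
wanted = [(name, member, group_max[name]) for name, member in rows]

for row in wanted:
    print(row)
```

The two passes correspond to what the by-group prefix arranges in Stata: the statistic is computed once per Name and then applied to each observation in that group.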
Bonus:
* Within each Name group, keep only the first observation
bysort Name (Member) : drop if _n > 1
list, sepby(Name)
+-----------------------------------+
| Name Member Wanted1 Wanted2 |
|-----------------------------------|
1. | AAA 0 1 1 |
|-----------------------------------|
2. | BBB 0 0 0 |
|-----------------------------------|
3. | CCC 1 1 1 |
+-----------------------------------+
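The bonus step amounts to "keep the first row of each Name". A rough Python sketch of that logic, assuming the rows are already sorted by Name and carry the Wanted1 value computed earlier (the variable names are hypothetical):

```python
# (Name, Member, Wanted1), already sorted by Name and then Member.
rows = [("AAA", 0, 1), ("AAA", 0, 1), ("AAA", 1, 1),
        ("BBB", 0, 0), ("BBB", 0, 0), ("CCC", 1, 1)]

seen = set()
deduped = []
for name, member, wanted1 in rows:
    # Keep a row only the first time its Name appears.
    if name not in seen:
        seen.add(name)
        deduped.append((name, member, wanted1))

for row in deduped:
    print(row)
```

As in the listing above, the surviving AAA row still shows Member 0; the corrected value lives in Wanted1.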
Related
I want to sort the user records according to a city chosen from a drop-down list. For example, if I pass city_id 22 in my query, I want all the rows whose city_ids contain 22 first, followed by the rest of the rows.
I know WHERE FIND_IN_SET('22', city_ids) will match the correct rows, but it will not return all rows, so I want to achieve this with ORDER BY instead.
I have tried ORDER BY FIND_IN_SET('22', city_ids), but it's not working. How do I fix this? Is there a better way?
User Table:
Id Name city_ids
1 AAAAA 10,22,30
2 BBBBB 11,28
3 CCCCC 15,22,44
4 DDDDD 19,99,
5 EEEEE 55,27,22
Want Sorted Output like below:
Id Name city_ids
1 AAAAA 10,22,30
3 CCCCC 15,22,44
5 EEEEE 55,27,22
2 BBBBB 11,28
4 DDDDD 19,99,
You can do:
ORDER BY (FIND_IN_SET('22', city_ids) > 0) DESC
This puts matches first.
Then you should fix your data model. It is broken, broken, broken. Storing lists of ids in a string is wrong for many reasons:
The data types are (presumably) wrong. The ids are numbers and should not be stored as strings.
Storing multiple values in a column is not the SQL way to store things.
Ids should have properly declared foreign key relationships, which you cannot declare.
SQL does not have very good functions for processing strings.
The resulting queries cannot take advantage of indexes or partitioning, impeding performance.
SQL has this really great data structure for storing lists of things. It is called a table, not a string column.
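To make the "use a table" advice concrete, here is a minimal sketch of a normalized layout. It runs on SQLite through Python's sqlite3 module rather than MySQL, and the table and column names (users, user_cities, user_id, city_id) are assumptions for illustration, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One row per (user, city) pair instead of a comma-separated string.
    CREATE TABLE user_cities (
        user_id INTEGER NOT NULL REFERENCES users(id),
        city_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, city_id)
    );

    INSERT INTO users VALUES
        (1, 'AAAAA'), (2, 'BBBBB'), (3, 'CCCCC'), (4, 'DDDDD'), (5, 'EEEEE');
    INSERT INTO user_cities VALUES
        (1, 10), (1, 22), (1, 30), (2, 11), (2, 28), (3, 15), (3, 22),
        (3, 44), (4, 19), (4, 99), (5, 55), (5, 27), (5, 22);
""")

# Users linked to the chosen city sort first, then everyone else by id.
query = """
    SELECT u.id, u.name
    FROM users AS u
    ORDER BY EXISTS (SELECT 1 FROM user_cities AS uc
                     WHERE uc.user_id = u.id AND uc.city_id = ?) DESC,
             u.id
"""
ordered = conn.execute(query, (22,)).fetchall()
print(ordered)
```

With this layout the ids are stored as integers, the link to users is a declared foreign key, and the (user_id, city_id) key gives the lookup an index to work with.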
The expression:
FIND_IN_SET('22', city_ids) > 0
will return 1 for all rows where '22' exists in column city_ids and 0 for the others.
So, after that, you need to add one more sort level to order by id ascending:
ORDER BY
FIND_IN_SET('22', city_ids) > 0 DESC,
id
See the demo.
Results:
| Id | Name | city_ids |
| --- | ----- | -------- |
| 1 | AAAAA | 10,22,30 |
| 3 | CCCCC | 15,22,44 |
| 5 | EEEEE | 55,27,22 |
| 2 | BBBBB | 11,28 |
| 4 | DDDDD | 19,99 |
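The ordering can be checked without a MySQL server by emulating FIND_IN_SET in SQLite. In the sketch below, find_in_set is a hand-written stand-in registered as a SQL function, not MySQL's implementation, and the table is named users here purely for the example.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position, or 0 if absent."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",")
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, city_ids TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "AAAAA", "10,22,30"), (2, "BBBBB", "11,28"), (3, "CCCCC", "15,22,44"),
     (4, "DDDDD", "19,99,"), (5, "EEEEE", "55,27,22")],
)

# The comparison yields 1 for matching rows and 0 otherwise, so DESC puts
# matches first; id breaks ties within each of the two groups.
rows = conn.execute(
    "SELECT id, name, city_ids FROM users "
    "ORDER BY FIND_IN_SET('22', city_ids) > 0 DESC, id"
).fetchall()
for row in rows:
    print(row)
```

Sorting on the raw FIND_IN_SET value instead would order rows by the position of 22 inside the string, with non-matches (0) first in ascending order, which is one plausible reason the original attempt did not give the wanted result.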
What I want to do is this:
case_1 case_2 case_3 Final
0 0 0 0
0 0 1 3
0 1 0 2
1 0 0 1
1 1 0 2
1 0 1 3
0 1 1 3
1 1 1 3
That means when case_1 is 0, case_2 is 0 and case_3 is 0, the final col has value 0.
Similarly, when case_1 is 1, case_2 is 1 and case_3 is 1, the final cols will be 3.
And so forth.
What I ended up typing in SQL is awkward:
Select *,
case when case_1>0 and case_2>0 and case_3>0 then 3 else 0 end,
case when case_1>0 and case_2>0 and case_3=0 then 2 else 0 end,
case when case_1>0 and case_2=0 and case_3=0 then 1 else 0 end,
....
....
....
from mytable;
Now this is seriously bad, I know that. Is there a better way to write this?
From the example, it looks like the priority is case 3 -> case 2 -> case 1. In that case, you can do something like this:
SELECT *,
CASE WHEN case_3 > 0 THEN 3
WHEN case_2 > 0 THEN 2
WHEN case_1 > 0 THEN 1
ELSE 0 END AS `Final`
FROM mytable;
It looks like you want the position of the rightmost nonzero column, if any:
select *,
case when case_3>0 then 3 else
case when case_2>0 then 2 else
case when case_1>0 then 1 else 0 end
end
end final
from tbl
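One way to gain confidence in the priority rule is to run it against all eight rows of the question's truth table. The sketch below does that on SQLite via Python's sqlite3 module; the CASE expression is standard SQL, but the harness around it is only an illustration.

```python
import sqlite3

# (case_1, case_2, case_3, expected Final) copied from the question.
truth_table = [
    (0, 0, 0, 0), (0, 0, 1, 3), (0, 1, 0, 2), (1, 0, 0, 1),
    (1, 1, 0, 2), (1, 0, 1, 3), (0, 1, 1, 3), (1, 1, 1, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (case_1 INTEGER, case_2 INTEGER, case_3 INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [row[:3] for row in truth_table])

# The first matching WHEN wins, so case_3 takes priority over case_2 and case_1.
computed = conn.execute("""
    SELECT case_1, case_2, case_3,
           CASE WHEN case_3 > 0 THEN 3
                WHEN case_2 > 0 THEN 2
                WHEN case_1 > 0 THEN 1
                ELSE 0 END AS final
    FROM mytable
    ORDER BY rowid
""").fetchall()

print(computed == truth_table)
```

Because a searched CASE stops at the first true condition, three WHEN branches are enough to cover all eight combinations.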
For what it's worth, electrical engineering knows this problem as "generating a Boolean expression from a truth table."
I'm going a different direction from the other answerers.
Create yourself a tiny lookup table with eight rows and four columns, like this
SELECT * FROM final
| case_1 | case_2 | case_3 | Final |
|--------|--------|--------|-------|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 3 |
| 0 | 1 | 0 | 2 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 0 | 2 |
| 1 | 0 | 1 | 3 |
| 0 | 1 | 1 | 3 |
| 1 | 1 | 1 | 3 |
Then join it to your main data table to look up the final value, like this (http://sqlfiddle.com/#!9/4de009/1/0).
SELECT a.Name, b.Final
FROM test a
JOIN final b ON a.case_1 = b.case_1
AND a.case_2 = b.case_2
AND a.case_3 = b.case_3
Performance? Not a problem on an eight-row lookup table. SQL is made for this.
Flexibility? If your rules for computing Final change, all you have to do is update the table. You don't have to do the Boolean expression simplification again.
Complexity? Well, yes, it's more complex than a nested bunch of CASE or IF statements. But it's easier to read.
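A small runnable sketch of the lookup-table approach, using SQLite through Python's sqlite3 module. The three rows in test (row_a, row_b, row_c) are made-up sample data, and the composite primary key on final is an addition for the sketch, not something the answer specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final (
        case_1 INTEGER, case_2 INTEGER, case_3 INTEGER, Final INTEGER,
        PRIMARY KEY (case_1, case_2, case_3)
    );
    INSERT INTO final VALUES
        (0, 0, 0, 0), (0, 0, 1, 3), (0, 1, 0, 2), (1, 0, 0, 1),
        (1, 1, 0, 2), (1, 0, 1, 3), (0, 1, 1, 3), (1, 1, 1, 3);

    -- Hypothetical main data table.
    CREATE TABLE test (Name TEXT, case_1 INTEGER, case_2 INTEGER, case_3 INTEGER);
    INSERT INTO test VALUES
        ('row_a', 0, 0, 0), ('row_b', 1, 0, 1), ('row_c', 1, 1, 0);
""")

# Each data row picks up its Final value from the matching lookup row.
results = conn.execute("""
    SELECT a.Name, b.Final
    FROM test a
    JOIN final b ON a.case_1 = b.case_1
                AND a.case_2 = b.case_2
                AND a.case_3 = b.case_3
    ORDER BY a.Name
""").fetchall()
print(results)
```

Note that an inner join silently drops any data row whose combination is missing from the lookup table, so the lookup table needs a row for every combination that can occur.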
Is it possible to set things up so that inserting a row is rejected when another row already has the same value in one column and a specific value, say 1, in a second column?
I want to have a table like this:
Table1_id: | Table1_name:
1 | Edgar
2 | Rudy
Table2_id: | Table2_Table1_id |Table2_active(bit):
1 | 1 | 1 (this is okay)
2 | 1 | 0 (this is okay)
3 | 1 | 1 (this is NOT okay) // shouldn't work
4 | 1 | 0 (this is still okay)
5 | 2 | 1 (this is okay)
6 | 2 | 0 (this is okay)
7 | 2 | 1 (this is NOT okay) // shouldn't work
8 | 2 | 0 (this is still okay)
I want to be able to have as many (x,0s) as I want but only be able to have ONE (x,1)
How could this be accomplished?
edit: possible solution:
CREATE TRIGGER active_check BEFORE INSERT ON table2
IF EXISTS (SELECT table2_id FROM table2 WHERE table2_table1_id = NEW.table2_table1_id AND table2_active = 1)
BEGIN
ROLLBACK TRANSACTION;
RETURN
END;
Would my use of New.table2_table1_id work here even though it's in its own statement? If not how could I get around that?
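The question does not say which database is in use, and trigger syntax differs between systems. As one concrete illustration, the sketch below shows the same idea in SQLite's dialect, run through Python's sqlite3 module. NEW can be referenced inside the trigger body there; the error message and the try_insert helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (
        table2_id        INTEGER PRIMARY KEY,
        table2_table1_id INTEGER NOT NULL,
        table2_active    INTEGER NOT NULL CHECK (table2_active IN (0, 1))
    );

    -- Reject a new active row when the same table1 id already has one.
    CREATE TRIGGER active_check BEFORE INSERT ON table2
    BEGIN
        SELECT RAISE(ABORT, 'only one active row allowed per table1 id')
        WHERE NEW.table2_active = 1
          AND EXISTS (SELECT 1 FROM table2
                      WHERE table2_table1_id = NEW.table2_table1_id
                        AND table2_active = 1);
    END;
""")

def try_insert(table1_id, active):
    """Return True if the insert succeeded, False if the trigger rejected it."""
    try:
        conn.execute(
            "INSERT INTO table2 (table2_table1_id, table2_active) VALUES (?, ?)",
            (table1_id, active),
        )
        return True
    except sqlite3.DatabaseError:
        return False

# (1,1) ok, (1,0) ok, second (1,1) rejected, (1,0) ok, (2,1) ok.
outcomes = [try_insert(1, 1), try_insert(1, 0), try_insert(1, 1),
            try_insert(1, 0), try_insert(2, 1)]
print(outcomes)
```

This only guards INSERT; an UPDATE that flips a row to active would need a matching BEFORE UPDATE trigger. The trigger body here is SQLite-specific; MySQL, for instance, raises errors inside triggers with SIGNAL instead.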
I have a table (type is tinyint(1)):
id | name | type
-----------------
1 | John | 1
2 | Peter | 1
3 | Bob | 2
After calling a SELECT * FROM user WHERE type <> 1 I get 0 rows. Bob's line should have been returned.
I've tried NOT IN (1), != 1, but no success.
All three forms of the query work, whether the column is an int or a text type. Check out the fiddles:
http://sqlfiddle.com/#!2/d4bb1/1
and
http://sqlfiddle.com/#!2/9b070/2
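In case the fiddles are unavailable, here is a small local check of the same claim. It uses SQLite through Python's sqlite3 module rather than MySQL, so it only shows that the three predicates behave alike on this data; the table is called users in the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, type INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "John", 1), (2, "Peter", 1), (3, "Bob", 2)])

# The three predicates from the question.
predicates = ["type <> 1", "type != 1", "type NOT IN (1)"]
results = {
    predicate: conn.execute(
        f"SELECT name FROM users WHERE {predicate}"
    ).fetchall()
    for predicate in predicates
}

for predicate, rows in results.items():
    print(predicate, rows)
```

Since all three return Bob's row here, an empty result on the real table suggests the stored data or the query actually being executed differs from what the question shows.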
I currently have some SQL that brings back tags. They should have distinct ids, but they don't, so my current data looks like this:
Microsoft | GGG | 1 | 167
Microsoft | GGG | 1 | 2
Microsoft | GGG | 1 | 1
What I would like is to have only one row come back, with the final column concatenated into a delimited list like:
Microsoft | GGG | 1 | 167, 2, 1
I am using MySQL 5 for this.
Use GROUP_CONCAT() for this, with a GROUP BY covering the other three columns:
SELECT
name, -- Microsoft
other, -- GGG
other2, -- 1
GROUP_CONCAT(id) AS ids
FROM tbl
GROUP BY name, other, other2
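A quick way to try the grouping locally is SQLite, which has a similar group_concat aggregate. The sketch below uses Python's sqlite3 module. One difference to keep in mind: SQLite takes the separator as a second argument, while MySQL uses the SEPARATOR keyword inside GROUP_CONCAT. The column names follow the answer's placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT, other TEXT, other2 INTEGER, id INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)",
                 [("Microsoft", "GGG", 1, 167),
                  ("Microsoft", "GGG", 1, 2),
                  ("Microsoft", "GGG", 1, 1)])

# One output row per (name, other, other2), with the ids joined by ", ".
rows = conn.execute("""
    SELECT name, other, other2, group_concat(id, ', ') AS ids
    FROM tbl
    GROUP BY name, other, other2
""").fetchall()
print(rows)
```

The order of the ids inside the concatenated string is not guaranteed unless the query asks for one.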