I want to eliminate duplicates from a database, based on an identifier, an order and a condition.
More precisely, I have data with several observations. I have sometimes a condition that makes me want to keep that observation anyway (let fix it condition=1), but then also keep the observation with the same identifier even if this condition does not hold (condition=0).
But if I have for one identifier several observations where condition=0 then I want to elminate duplicates, with criterion being having the greatest order.
Without the condition I can do that
proc sort data=have;
by identifier descending order;
run;
proc sort nudopkey data=have;
by identifier;
run;
But how to incorporate my condition in this ?
Edit 1 : add a database example :
data Test;
input identifier $ order condition;
datalines;
1023 1 0
1023 2 0
1064 2 0
1064 1 0
1098 1 0
1098 1 1
;
Then I want to keep
1023 2 0
1064 2 0
1098 1 0
1098 1 1
Edit 2 : tried to precise my conditions
I presume you want to eliminate duplicates only when the condition for all records for an identifier is set to 0. In that case you want to keep the record with the maximum order and eliminate all other records with the same identifier.
Proc sql;
create table want as
select *
from test
group by identifier
having max (condition) ne 0
or order eq max (order)
;
Quit;
This will keep all rows for an Identifier where the maximum condition = 1,
or in the case of those where maximum condition = 0, select the row with the maximum order.
Is that what you want?
Some of this depends on how you define 'condition'. Is your condition easily verifiable on every record for that identifier? Then you can do something like this.
Evaluate the condition.
For records where it is true (you want to remove the duplicate), set flag=0. For records where it is not true, increment the condition flag by one.
If the condition is true for all records in that ID, all will have the same value (flag=0) and nodupkey on by identifier flag; will remove extras. If the condition is false for all records, those will not be removed. If it's true for some and false for some, and you want to remove only some of the records with that identifier (only the duplicates where it is true), then you have to make sure that either it's sorted to have all of the condition=true records at top, or have a separate flag counter that determines what value the flag will be (since it sometimes will go to 0 in the middle, so 0 0 0 1 2 3 0 4 5 6 is what you want, not 0 0 0 1 2 3 0 0 1 2 ).
Perhaps easier to see is to do it within a datastep. After sorting by identifier descending order:
data want;
set have;
by identifier descending order;
if (condition=true) and not (first.identifier) then delete;
run;
This will, again, work if either condition=true is always at the top, or if it's always consistent within one ID group. If it's inconsistent and mixed, then you need to keep track of whether you've kept one where it was true (assuming you want to), or it might delete all records where it is true; use a separate variable to keep track of how many you've kept. first.identifier will be 1/TRUE for the first record for that identifier only, not taking into account the condition. You could also create the flag, then sort by identifier flag descending order; and guarantee the condition=true are at the top (either by making flag=0 for true, or sorting by descending flag.)
Related
SQLFiddle
In this fiddle example i m trying to use WHERE o.d_id = 1 or 2,3,4 etc but if i use d_id= 0 it should show the total result or we can say the where clause should not work if d_id = 0.
You can do a conditional style WHERE clause like this....
WHERE (?inputvalue = 0 OR ?inputvalue = d_id)
In this case the ?inputvalue is whatever search term you want. You'll provide it from your programming language.
edit Notice that in many queries you can specify an input value just once. For example, if all you want is WHERE d_id = 17 that's easy to specify. You just put the 17 in and you're done.
But, if you follow my suggestion you need to repeat the input value. You need WHERE (0 = 17 OR d_id = 17). That will, when you give it 17, filter your result set to find the seventeens.
And, if you give it 0, like this WHERE ( 0 = 0 OR d_id = 0 ) it will find all results, not filtering on just the zeroes. That's good because there aren't any zeros.
Probably, that's not how WHERE works. If you write a statement like WHERE id = 0, you ask your database to return just the rows where the id column is set to 0. If you want your database to return all values, you have to skip this condition completly
I was doing a query with MySQL to save all objects returned, but I'd like identify these objects based in statements of the block WHERE, that is, if determined object to satisfy the specific characteristic I'd like create one column and in this column I assignment the value 0 or 1 in the row corresponding the object if it satisfy or not satisfy these characteristic.
This is my script:
SELECT
s.id, al.ID, al.j, al.k, al.r, gal.i
FROM
datas as al
WHERE
AND s.id = al.ID
AND al.j between 1 and 1
AND al.k BETWEEN 15 and 16
AND al.r BETWEEN 67 and 72
The script above is working perfectly and I can to save all objects which it return.
So, I'd like to know if is there a way add in the query above, on block WHERE, the following statement,
( Flags & (dbo.environment('cool') +
dbo.environment('ok') -
dbo.environment('source')) ) = 25
and ((al_pp x al_pp1)-0.5/3=11
and determined the objects that satisfy or not these condition with 0 or 1 in a new column created in Table saved.
I read some tutorials about this and saw some attempts with IF, CASE, ADD COLUMN or WHEN, but none of these solved.
Thanks in advance
MySQL has if function, see here
So you can simply use it in your query:
SELECT IF(( Flags & (dbo.fPhotoFlags('SATURATED') +
dbo.fPhotoFlags('BRIGHT') +
dbo.fPhotoFlags('EDGE')) ) = 0
and petroRad_r < 18
and ((colc_u - colc_g) - (psfMag_u - psfMag_g)) < -0.4
, 1 --// VALUE IF TRUE
, 0 --// VALUE IF FALSE
) as conditional_column, ... rest of your query
What SQLite statement do I need to get the column name WHERE there is a value?
COLUMN NAME: ALPHA BRAVO CHARLIE DELTA ECHO
ROW VALUE: 0 1 0 1 1
All I want in my return is: Bravo, Delta, Echo.
Your request is not entirely clear, but you appear to be asking for a SELECT statement that will return not data but rather columns names, and not a predictable number of values but rather a number values that depend on the data in the table.
For instance,
A B C D E
0 1 0 1 1
would return (B,D,E) whereas
A B C D E
1 0 1 0 0
would return (A, C).
If that's what you're asking, this is not something that SQL does. SQL retrieves data from the table and an SQL result set always has the same number of columns per row.
To accomplish your goal, you would have to retrieve all columns that might have a value in the table and then, in your program code, check for the value in each column and accrue a list of column names that had values.
Also, consider what happens when there is more than one row to examine and the distribution of values differ. In other words, what's the expected result if the data looks like this:
A B C D E
- - - - -
0 1 0 1 1
1 0 1 0 0
[Also, note that all the columns in your example have values, some 0, some 1. What you really want is a list of column names where the column contains a value of 1.]
Finally, consider that your inability to easily get the results you need from your data might indicate a flaw in the data model you're using. For instance, if you were to structure your data like this:
TagName TagValue
------- --------
Alpha 0
Bravo 1
Charlie 0
Delta 1
Echo 1
you could then obtain your results with SELECT TagName FROM Tags WHERE TagValue = 1.
Furthermore, if 0 and 1 are really the only two possible values (indicating boolean "presence" or "absence" of the tag) then you could remove the TagValue column and the rows for Alpha and Charlie entirely (you'd INSERT a row into the table to add tag and DELETE a row to remove it).
A design along these lines seems to model your data more accurately and allows you to entire new tags to the system without having to issue an ALTER TABLE command.
http://sqlfiddle.com/#!9/1407e/1
SELECT CONCAT(IF(ALPHA,'ALPHA,',''),
IF(BRAVO,'BRAVO,',''),
IF(CHARLIE,'CHARLIE,',''),
IF(DELTA,'DELTA,',''),
IF(ECHO,'ECHO',''))
FROM table1
I want to query my database so that I don't order by or order by desc but instead seek rows when a given field is different from 0 first and only after get others ordered by if there are no rows with field != "0". What is the best way to accomplish this or even is it possible to do so?
This is supposing the field can have values from -100 to 0 to 100
With example:
Considering LIMIT equals 3 and field in consideration is field1
field1
10
0
-20
-40
Another set:
field1
0
0
-100
50
In the first set, the results extracted will be 10, -20 and -40 (by any order), while in the second set they should be -100, 50 and 0 (any of the zeros, by any order).
First I'd like to check whether there exists != 0 in the database and if true extract those and only after 0's until I fill the LIMIT
something like this?
its not clear AT ALL what you want so I'm only guessing here.
I want to query my database so that I don't order by or order by desc but instead seek rows when a given field is different from 0
seek rows to me means find all rows that are not 0.
only after get others ordered by if there are no rows with field != "0".
to me that means order the rows in a sub query where the field is not 0.
with that in mind maybe this query?
SELECT * FROM table t
WHERE field IN (
SELECT field
from table t
WHERE field <> 0
ORDER BY field
)
without data It's not clear what you want. but i think something like this is what you were asking.. you could also try
SELECT * FROM table t
WHERE field <> 0
ORDER BY field
Picture a table with fields (Id, Valid, Value)
Valid = boolean
Value = number from 0 to 100
What I want is a report that counts the number of records where (valid = 0), and then gives me the total number of cases where (value < 70) and the number of cases where (value >= 70).
The problem is that the "value" field could be empty on some of the records and I only want the records where the value field is not empty.
I know that the second value (value>=70) is going to be calculated, but the problem is that I can't simply do (total number of records - number of records where value < 70), because there's the problem with the records where "value" is null...
And then I want to create graphic with these values, to see the percentage of records below and above 70.
"The problem is that the "value" field could be empty on some of the records and I only want the records where the value field is not empty."
Use a WHERE clause to exclude rows whose "value" field is Null.
Here is sample data for tblMetraton.
Id Valid Valu
1 -1 2
2 0 4
3 -1 6
4 0
5 0 90
I used Valu for the field name because Value is a reserved word.
You can use the Count() function in a query and take advantage of the fact it only counts non-Null values. So use Count() with an IIf() expression which returns a non-Null value (I used 1) for the condition you want to match and Null otherwise.
SELECT
Count(IIf(Valid=0,1,Null)) AS valid_false,
Count(IIf(Valu<70,1,Null)) AS below_70,
Count(IIf([Valu]>=70,1,Null)) AS at_least70
FROM tblMetraton AS m
WHERE (((m.[Valu]) Is Not Null));
Based on my sample data in tblMetraton, that query gives me this result set.
valid_false below_70 at_least70
2 3 1
If my sample data does not address the variability you're dealing with, and/or if my query results don't match your requirements, please show us you own sample data and the results you want based on that sample.