Suppose I have a table like this:
| id | dt | name | value |
|----|------------|------|-------|
| 0 | 2017-01-14 | foo | one |
| 1 | 2017-01-17 | bar | two |
| 2 | 2017-01-18 | john | five |
| 3 | 2017-01-19 | doe | ten |
(The rows are not necessarily ordered by dt.)
I want to be able to select some values from the previous row (based on date). This can be achieved with the following query:
SELECT
    *,
    (SELECT
         name
     FROM
         example e2
     WHERE
         e2.dt < e1.dt
     ORDER BY dt DESC
     LIMIT 1
    ) AS prev_name
FROM example e1
with resulting table:
| id | dt | name | value | prev_name |
|----|------------|------|-------|-----------|
| 0 | 2017-01-14 | foo | one | (null) |
| 1 | 2017-01-17 | bar | two | foo |
| 2 | 2017-01-18 | john | five | bar |
| 3 | 2017-01-19 | doe | ten | john |
Now, this works just fine. However, I would prefer to be able to select multiple columns from the previous row easily, giving a result like:
| id | dt | name | value | prev_name | prev_value | prev_dt |
|----|------------|------|-------|-----------|------------|------------|
| 0 | 2017-01-14 | foo | one | (null) | (null) | (null) |
| 1 | 2017-01-17 | bar | two | foo | one | 2017-01-14 |
| 2 | 2017-01-18 | john | five | bar | two | 2017-01-17 |
| 3 | 2017-01-19 | doe | ten | john | five | 2017-01-18 |
This can of course be accomplished by simply copying the subquery (SELECT [..] FROM example e2 ...) into the query multiple times, but I guess this is not the preferable way to go. I have found several questions on SO addressing either the "how to select records from a previous row" or the "how to select multiple columns using subqueries" problem, but not both. The latter problem is mostly solved by using a JOIN statement, but I think this cannot be combined with the "previous row" case. So my question is: what would be a better way to produce the last result, rather than copying a subquery for every column we need?
EDIT. As an extra constraint that I did not include in the original question, "previous" need not mean the immediately preceding row, but rather "the previous row that satisfies a condition". So suppose my table contains an extra boolean column b:
| id | dt | name | value | b |
|----|------------|------|-------|---|
| 0 | 2017-01-14 | foo | one | 1 |
| 1 | 2017-01-17 | bar | two | 0 |
| 2 | 2017-01-18 | john | five | 1 |
| 3 | 2017-01-19 | doe | ten | 0 |
I would want the "previous row" to be the previous row with b = 1, so the desired result would be:
| id | dt | name | value | b | prev_name | prev_value | prev_dt |
|----|------------|------|-------|---|-----------|------------|------------|
| 0 | 2017-01-14 | foo | one | 1 | (null) | (null) | (null) |
| 1 | 2017-01-17 | bar | two | 0 | foo | one | 2017-01-14 |
| 2 | 2017-01-18 | john | five | 1 | foo | one | 2017-01-14 |
| 3 | 2017-01-19 | doe | ten | 0 | john | five | 2017-01-18 |
I think this can still be accomplished with James Scott's answer, by only updating the variables when b = 1 using an IF statement, but maybe another solution is possible in this case.
EDIT. SQLfiddle
Something like this will return the id of the 'previous' row.
SELECT x.*
     , MAX(y.id) prev_id
  FROM example x
  LEFT
  JOIN example y
    ON y.id < x.id
   AND y.b = 1
 GROUP
    BY x.id;
I'll leave the business of returning the rest of the data associated with this row as an exercise for the reader.
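One way to finish that exercise, as a rough sketch: compute prev_id as above in a derived table, then join back to example to pull the other columns. The sketch runs the SQL through Python's sqlite3 module as a stand-in for MySQL, and it assumes (as in the sample data) that id order matches dt order.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE example (id INTEGER PRIMARY KEY, dt TEXT, name TEXT, value TEXT, b INTEGER);
INSERT INTO example VALUES
  (0, '2017-01-14', 'foo',  'one',  1),
  (1, '2017-01-17', 'bar',  'two',  0),
  (2, '2017-01-18', 'john', 'five', 1),
  (3, '2017-01-19', 'doe',  'ten',  0);
""")

query = """
SELECT x.id, x.dt, x.name, x.value, x.b,
       p.name AS prev_name, p.value AS prev_value, p.dt AS prev_dt
FROM (
    -- id of the latest earlier row with b = 1 (NULL if there is none)
    SELECT x.id AS id, MAX(y.id) AS prev_id
    FROM example x
    LEFT JOIN example y ON y.id < x.id AND y.b = 1
    GROUP BY x.id
) m
JOIN example x ON x.id = m.id
LEFT JOIN example p ON p.id = m.prev_id  -- fetch the previous row's columns
ORDER BY x.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Any number of columns can be taken from p without repeating a subquery per column.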
This looks like a good use case for session variables if you only want the previous row; you can use ORDER BY to get different results.
SET @VDt := NULL, @VName := NULL, @VValue := NULL;
SELECT id,
       @VName prev_name,
       @VValue prev_value,
       @VDt prev_dt,
       @VDt := dt dt,
       @VName := `name` `name`,
       @VValue := `value` `value`
FROM example;
I messed this up when I first posted; note that each variable must be assigned only after its value from the previous row has been returned. To reorder the columns (if desired), you can wrap this query in another one that reorders the result columns.
Let me know if you need anything else,
Regards,
James
Related
I have been looking around quite a lot but I can't seem to find a solution to this problem.
I got two tables:
|---------------------|-------------------|
| ID | Value |
|---------------------|-------------------|
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
| 4 | NULL |
|---------------------|-------------------|
...
|---------------------|-------------------|
| ID | Value |
|---------------------|-------------------|
| 1 | 7 |
| 1 | 18 |
| 2 | 21 |
| 2 | 2 |
| 4 | 103 |
|---------------------|-------------------|
...
Basically what I wanna do is update the NULL-fields from the first table with the smallest value from the second table where there are matching IDs.
So that in the end it looks something like this:
|---------------------|-------------------|
| ID | Value |
|---------------------|-------------------|
| 1 | 7 |
| 2 | 2 |
| 3 | NULL |
| 4 | 103 |
|---------------------|-------------------|
...
I tried out a bunch of things but failed. Can anyone help me?
You could use a subquery:
UPDATE t1
INNER JOIN (SELECT ID, MIN(Value) AS minimum FROM t2 GROUP BY ID) tempt2 ON t1.ID = tempt2.ID
SET t1.Value = tempt2.minimum;
Basically, the subquery looks up the minimum value in the second table for each ID; you alias that derived table as tempt2 and join on it.
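To check the idea against the sample data, here is a small sketch using Python's sqlite3 module. SQLite does not accept MySQL's UPDATE ... JOIN syntax, so the sketch swaps in a correlated subquery that should give the same result here; the WHERE Value IS NULL filter is an addition that restricts the update to the NULL fields, as the question describes.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER, Value INTEGER);
CREATE TABLE t2 (ID INTEGER, Value INTEGER);
INSERT INTO t1 VALUES (1, NULL), (2, NULL), (3, NULL), (4, NULL);
INSERT INTO t2 VALUES (1, 7), (1, 18), (2, 21), (2, 2), (4, 103);
""")

# For each NULL row in t1, look up the smallest matching value in t2.
# IDs with no match in t2 (ID 3) stay NULL because MIN over no rows is NULL.
conn.execute("""
UPDATE t1
SET Value = (SELECT MIN(t2.Value) FROM t2 WHERE t2.ID = t1.ID)
WHERE Value IS NULL
""")

result = conn.execute("SELECT ID, Value FROM t1 ORDER BY ID").fetchall()
print(result)
```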
I have the following sample data:
| key_id | name | name_id | data_id |
+--------+-------+---------+---------+
| 1 | jim | 23 | 098 |
| 2 | joe | 24 | 098 |
| 3 | john | 25 | 098 |
| 4 | jack | 26 | 098 |
| 5 | jim | 23 | 091 |
| 6 | jim | 23 | 090 |
I have tried this query:
INSERT INTO temp_table
SELECT
    DISTINCT @key_id,
    name,
    name_id,
    @data_id FROM table1;
I am trying to dedupe a table by all fields in a row.
My desired output:
| key_id | name | name_id | data_id |
+--------+-------+---------+---------+
| 1 | jim | 23 | 098 |
| 2 | joe | 24 | 098 |
| 3 | john | 25 | 098 |
| 4 | jack | 26 | 098 |
What I'm actually getting:
| key_id | name | name_id | data_id |
+--------+-------+---------+----------+
| 1 | jim | 23 | NULL |
| 2 | joe | 24 | NULL |
| 3 | john | 25 | NULL |
| 4 | jack | 26 | NULL |
I am able to dedupe the table, but I am setting the 'data_id' value to NULL by attempting to override the field with '@'.
Is there any way to select distinct on all fields while keeping the value for 'data_id'? I will take the highest (MAX) data_id if possible.
If you only want one row returned for a specific value (in this case, name), one option you have is to group by that value. This seems like a good approach because you also said you wanted the largest data_id for each name, so I would suggest grouping and using the MAX() aggregate function like this:
SELECT name, name_id, MAX(data_id) AS data_id
FROM myTable
GROUP BY name, name_id;
The only thing you should be aware of is the possibility that a name occurs multiple times under different name_ids. If that is possible in your table, you could group by the name_id too, which is what I did.
Since you stated you're not interested in the key_id but only the name, I just excluded it from the query altogether to get this:
| name | name_id | data_id |
+-------+---------+---------+
| jim | 23 | 098 |
| joe | 24 | 098 |
| john | 25 | 098 |
| jack | 26 | 098 |
Here is the SQL Fiddle example.
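A quick way to try the grouping query on the sample rows, sketched with Python's sqlite3 module as a stand-in for MySQL. Storing data_id as TEXT (to keep the leading zero) and the ORDER BY name_id are assumptions made for the sketch.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (key_id INTEGER, name TEXT, name_id INTEGER, data_id TEXT);
INSERT INTO myTable VALUES
  (1, 'jim',  23, '098'),
  (2, 'joe',  24, '098'),
  (3, 'john', 25, '098'),
  (4, 'jack', 26, '098'),
  (5, 'jim',  23, '091'),
  (6, 'jim',  23, '090');
""")

# One row per (name, name_id), keeping the largest data_id in each group.
rows = conn.execute("""
SELECT name, name_id, MAX(data_id) AS data_id
FROM myTable
GROUP BY name, name_id
ORDER BY name_id
""").fetchall()
for row in rows:
    print(row)
```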
RENAME TABLE myTable TO Old_myTable,
             myTable2 TO myTable;
INSERT INTO myTable
SELECT *
FROM Old_myTable
GROUP BY name, name_id;
This groups my table by the values I want to dedupe while still keeping the structure and ignoring the 'data_id' column.
Thanks for your help; I'm stuck on this problem.
Let me explain. I have this kind of table:
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-05-28 15:35:01 | value 1 | value 2 |
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
| bbb | 2013-04-15 07:15:23 | value 1 | value 2 |
And I want to select (with subqueries) this:
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
I tried a combination of subqueries with IN/NOT IN in the WHERE clause and GROUP BY/HAVING, but I'm not able to obtain a proper result.
I also have another question: if someone has already faced this kind of problem, I would be glad to hear how they solved it.
The records in the first table you see above are frequently (every ten minutes) deleted and re-inserted. My aim is to make a copy (or maybe a view) of the result (without the duplicate entries), which will be used 24/7 by a Postfix mail server. I have heard that big views (with many subqueries) hurt performance, which means a table would be preferable. The problem is that if I have to rebuild the table every ten minutes, there will be a short downtime during which Postfix cannot read it.
Waiting for your advice; thanks already.
EDIT:
Based on @Ed Gibbs's answer, here is a better sample:
Source table:
| domain | creationdate | value 1 | value 2 |
|------------|---------------------|---------|---------|
| google.com | 2013-05-28 15:35:01 | john | mary |
| google.com | 2013-04-30 12:10:10 | patrick | edward |
| yahoo.fr | 2011-04-02 13:10:10 | britney | garry |
| ebay.com | 2012-02-12 10:48:10 | harry | mickael |
| ebay.com | 2013-04-15 07:15:23 | bill | alice |
With your query the result is the source table.
Desired result :
| domain | value 1 | value 2 |
|------------|---------|---------|
| google.com | patrick | edward |
| yahoo.fr | britney | garry |
| ebay.com | harry | mickael |
I want to keep the oldest domain (with the min creation date) with its own value1 and 2.
New question!
I made a view of the desired result based on your answer.
The result looks like this:
| domain | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X |
| yahoo.fr | britney | Y |
| ebay.com | harry | Z |
I also have a table with this kind of entries:
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| john@google.com | britney | Y |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
Assume that (in this sample) the %@google.com emails with foreign key Y aren't good records: only the %@google.com emails with foreign key X are good, because that is the domain/foreign-key pair I chose with the creationdate selection. How can I select only the emails whose domain and foreign key are referenced in my new view?
Desired result:
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
I tried a join using CONCAT('%','@',domain) and foreign_key = foreign_key, but it doesn't give me what I want.
Based on your sample data and results, a GROUP BY will give you the results you're after:
SELECT
    domain,
    MIN(creationdate) AS creationdate,
    value1,
    value2
FROM mytable
GROUP BY domain, value1, value2
Addendum: @Arka provided updated sample data where the value 1 and value 2 columns have different values (in the original they were the same). That changes the query to this:
SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
    SELECT domain, MIN(creationdate)
    FROM mytable
    GROUP BY domain)
The subquery gets a list of the earliest creationdate for each domain, and the outer query only selects rows where the domain and creationdate match the subquery values.
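The row-value IN query can be tried on the updated sample data with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL (SQLite accepts the same (a, b) IN (SELECT ...) form in reasonably recent versions); the column names value1 and value2 and the ORDER BY domain are assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT);
INSERT INTO mytable VALUES
  ('google.com', '2013-05-28 15:35:01', 'john',    'mary'),
  ('google.com', '2013-04-30 12:10:10', 'patrick', 'edward'),
  ('yahoo.fr',   '2011-04-02 13:10:10', 'britney', 'garry'),
  ('ebay.com',   '2012-02-12 10:48:10', 'harry',   'mickael'),
  ('ebay.com',   '2013-04-15 07:15:23', 'bill',    'alice');
""")

# Keep only the rows whose (domain, creationdate) pair is the earliest per domain.
rows = conn.execute("""
SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
    SELECT domain, MIN(creationdate)
    FROM mytable
    GROUP BY domain)
ORDER BY domain
""").fetchall()
for row in rows:
    print(row)
```

If two rows of one domain shared the same earliest creationdate, both would be returned.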
I've the following table:
| id | Name | Date of Birth | Date of Death | Result |
| 1 | John | 3546565 | 3548987 | |
| 2 | Mary | 5233654 | 5265458 | |
| 3 | Lewis| 6546876 | 6548752 | |
| 4 | Mark | 6546546 | 6767767 | |
| 5 | Steve| 6546877 | 6548798 | |
And I need to do this for the whole table:
Result = 1, if( current_row(Date of Birth) - row_above_current_row(Date of Death))>X else 0
To make things easier, I guess, I created the same table above but with 2 extra id fields: id_minus_one and id_plus_one
Like this:
| id | id_minus_one | id_plus_one |Name | Date_of_Birth | Date_of_Death | Result |
| 1 | 0 | 2 |John | 3546565 | 3548987 | |
| 2 | 1 | 3 |Mary | 5233654 | 5265458 | |
| 3 | 2 | 4 |Lewis| 6546876 | 6548752 | |
| 4 | 3 | 5 |Mark | 6546546 | 6767767 | |
| 5 | 4 | 6 |Steve| 6546877 | 6548798 | |
So my approach would be something like (in pseudo code):
for id=1, ignore result. (Because there is no row above)
for id=2, Result = 1 if( (Where id=2).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death )>X else 0
for id=3, Result = 1 if( (Where id=3).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death)>X else 0
and so on for the whole table...
Just ignore id_plus_one if there is no need for it, I'll use it later for the same thing. So, if I manage to do this for id_minus_one I'll manage for id_plus_one as they are the same algorithm.
My question is how to translate that pseudocode into SQL; I can't find a way to relate both ids in a single SELECT.
Thank you!
As you describe this, it is just a self join with some logic on the select:
select t.*,
       ((t.date_of_birth - tprev.date_of_death) > x) as flag
from t left outer join
     t tprev
     on t.id_minus_one = tprev.id
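A small sketch of this self join on the sample rows, run through Python's sqlite3 module as a stand-in for MySQL. The threshold value 1000000 for x is purely hypothetical, chosen only so that both outcomes appear; the first row has no row above it, so its flag comes out as NULL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, id_minus_one INTEGER, id_plus_one INTEGER,
                Name TEXT, Date_of_Birth INTEGER, Date_of_Death INTEGER);
INSERT INTO t VALUES
  (1, 0, 2, 'John',  3546565, 3548987),
  (2, 1, 3, 'Mary',  5233654, 5265458),
  (3, 2, 4, 'Lewis', 6546876, 6548752),
  (4, 3, 5, 'Mark',  6546546, 6767767),
  (5, 4, 6, 'Steve', 6546877, 6548798);
""")

x = 1000000  # hypothetical threshold, not taken from the question

# Join each row to the row above it and compare birth against that row's death.
rows = conn.execute("""
SELECT t.id, t.Name,
       (t.Date_of_Birth - tprev.Date_of_Death) > ? AS flag
FROM t
LEFT JOIN t AS tprev ON t.id_minus_one = tprev.id
ORDER BY t.id
""", (x,)).fetchall()
for row in rows:
    print(row)
```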
I know there are similar questions out there but I can't find this particular case among them. If someone knows where this is answered please hook me up with a link. Otherwise here goes.
I have a table which has two fields I'm interested in: code and id. Neither of these is unique, though there is a company field which combines with the id field to make the primary key.
I need a list of all codes and names which have more than one name associated with the same code. So if my data looks like this:
| code | name | company |
+---------------+---------------+----------------------------------------------------+
| 00009 | name | 1 |
| 00009 | name | 2 |
| 00009 | name | 3 |
| 00009 | name | 4 |
| 00009 | diff name | 1 |
| 00014 | Foo | 2 |
| 00014 | foo | 3 |
| 00014 | foo | 4 |
| 00014 | foo | 5 |
| 00014 | foo | 6 |
| 00015 | barbaz | 1 |
| 00015 | barbaz | 2 |
| 00015 | barbaz | 3 |
| 00015 | barbaz | 4 |
| 00015 | bar baz | 5 |
| 00017 | foo | 1 |
| 00018 | bar | 1 |
I need my results to look like this:
| code | name |
+---------------+-------------------------------------------------------------------+
| 00009 | name |
| 00009 | diff name |
| 00014 | Foo |
| 00014 | foo |
| 00015 | barbaz |
| 00015 | bar baz |
I have tried several things including
SELECT DISTINCT t1.code, t1.name from items t1
inner join (select t2.code from items t2 group by t2.name ) t2
on (t1.code = t2.code)
group by t1.code order by t1.code;
Which of course is wrong. Thanks for any insight!
[edit] I forgot to mention one detail. I only want to list results that have more than one unique name entry for a given code. I've updated the initial data (which does not change my desired results).
[edited for typos]
Use HAVING with COUNT(DISTINCT):
SELECT code, name
FROM items
WHERE code IN (
    SELECT code
    FROM items
    GROUP BY code
    HAVING COUNT(DISTINCT name) > 1
)
ORDER BY code, name
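The sketch below tries the HAVING COUNT(DISTINCT ...) filter on the sample data, using Python's sqlite3 module as a stand-in for MySQL. Two things in it are assumptions rather than part of the answer: the outer SELECT adds DISTINCT so that each code/name pair appears once, as in the desired output, and SQLite compares text case-sensitively by default, so 'Foo' and 'foo' count as different names (a case-insensitive MySQL collation may treat them as the same).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code TEXT, name TEXT, company INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("00009", "name", 1), ("00009", "name", 2), ("00009", "name", 3),
    ("00009", "name", 4), ("00009", "diff name", 1),
    ("00014", "Foo", 2), ("00014", "foo", 3), ("00014", "foo", 4),
    ("00014", "foo", 5), ("00014", "foo", 6),
    ("00015", "barbaz", 1), ("00015", "barbaz", 2), ("00015", "barbaz", 3),
    ("00015", "barbaz", 4), ("00015", "bar baz", 5),
    ("00017", "foo", 1), ("00018", "bar", 1),
])

# Inner query: codes that have more than one distinct name.
# Outer query: every distinct (code, name) pair for those codes.
rows = conn.execute("""
SELECT DISTINCT code, name
FROM items
WHERE code IN (
    SELECT code
    FROM items
    GROUP BY code
    HAVING COUNT(DISTINCT name) > 1
)
ORDER BY code, name
""").fetchall()
for row in rows:
    print(row)
```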