Select duplicate and keep the oldest (not based on ID) - mysql

Thanks for your help; I'm stuck on this problem.
Let me explain: I have this kind of table:
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-05-28 15:35:01 | value 1 | value 2 |
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
| bbb | 2013-04-15 07:15:23 | value 1 | value 2 |
And I want to select (with subqueries) this:
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
I tried a combination of subqueries with IN/NOT IN in the WHERE clause and GROUP BY/HAVING, but I'm not able to obtain a proper result.
I also have another question: if someone has already faced this kind of problem, I would be glad to hear how they solved it.
The records in the first table above are deleted and re-inserted frequently (every ten minutes). My aim is to make a copy (or maybe a view) of the result, without the duplicate entries, which will be used 24/7 by a Postfix mail server. I heard that big views (with many subqueries) decrease performance, which means a table would be preferable. The thing is, if I have to rebuild the table every ten minutes there will be a little downtime, and Postfix will not be able to read the table.
Waiting for your advice; thanks in advance.
EDIT:
Based on @Ed Gibbs's answer, here is a better sample:
Source table:
| domain | creationdate | value 1 | value 2 |
|------------|---------------------|---------|---------|
| google.com | 2013-05-28 15:35:01 | john | mary |
| google.com | 2013-04-30 12:10:10 | patrick | edward |
| yahoo.fr | 2011-04-02 13:10:10 | britney | garry |
| ebay.com | 2012-02-12 10:48:10 | harry | mickael |
| ebay.com | 2013-04-15 07:15:23 | bill | alice |
With your query the result is the source table.
Desired result:
| domain | value 1 | value 2 |
|------------|---------|---------|
| google.com | patrick | edward |
| yahoo.fr | britney | garry |
| ebay.com | harry | mickael |
I want to keep the oldest row for each domain (the one with the minimum creationdate), together with its own value 1 and value 2.
New question!
I made a view of the desired result based on your answer.
The result looks like this:
| domain | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X |
| yahoo.fr | britney | Y |
| ebay.com | harry | Z |
I also have a table with entries like these:
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| john@google.com | britney | Y |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
Assume that (in this sample) the %@google.com emails with foreign_key Y aren't good records: only the %@google.com emails with foreign_key X are good, because google.com with X is the domain I chose with the creationdate selection. How could I select only the emails whose domain/foreign_key pair is referenced in my new view?
Desired result:
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
I tried a CONCAT('%','@',domain) and a foreign_key=foreign_key join, but it doesn't give me what I want.

Based on your sample data and results, a GROUP BY will give you the results you're after:
SELECT
domain,
MIN(creationdate) AS creationdate,
value1,
value2
FROM mytable
GROUP BY domain, value1, value2
Addendum: @Arka provided updated sample data where the value 1 and value 2 columns have different values (in the original they were the same). That changes the query to this:
SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
SELECT domain, MIN(creationdate)
FROM mytable
GROUP BY domain)
The subquery gets a list of the earliest creationdate for each domain, and the outer query only selects rows where the domain and creationdate match the subquery values.
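A minimal sketch of both queries, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example runs on its own). It loads the updated sample data and shows that grouping on value1/value2 returns every row, while the (domain, creationdate) IN (...) filter keeps one row per domain. The row-value IN syntax needs SQLite 3.15 or later; dates are stored as ISO strings, so MIN() orders them correctly.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption, for a runnable example).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("google.com", "2013-05-28 15:35:01", "john", "mary"),
        ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
        ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
        ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
        ("ebay.com", "2013-04-15 07:15:23", "bill", "alice"),
    ],
)

# First answer: grouping on value1/value2 too keeps every row once the values differ.
grouped = conn.execute(
    """SELECT domain, MIN(creationdate), value1, value2
       FROM mytable
       GROUP BY domain, value1, value2"""
).fetchall()

# Addendum: keep only the row whose creationdate is the earliest for its domain.
oldest = conn.execute(
    """SELECT domain, creationdate, value1, value2
       FROM mytable
       WHERE (domain, creationdate) IN (
           SELECT domain, MIN(creationdate)
           FROM mytable
           GROUP BY domain)
       ORDER BY domain"""
).fetchall()

print(len(grouped))
for row in oldest:
    print(row)
```

If two rows of the same domain share the earliest creationdate, both are returned; breaking such ties would need an extra rule.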

Related

Selecting multiple columns from previous row in MySQL

Suppose I have a table like this:
| id | dt | name | value |
|----|------------|------|-------|
| 0 | 2017-01-14 | foo | one |
| 1 | 2017-01-17 | bar | two |
| 2 | 2017-01-18 | john | five |
| 3 | 2017-01-19 | doe | ten |
(where dt need not necessarily be ordered)
I want to be able to select some values of the previous row (based on date). Such a functionality can be achieved by the following query:
SELECT
*,
(SELECT
name
FROM
example e2
WHERE
e2.dt < e1.dt
ORDER BY dt DESC
LIMIT 1
) as prev_name
FROM example e1
with resulting table:
| id | dt | name | value | prev_name |
|----|------------|------|-------|-----------|
| 0 | 2017-01-14 | foo | one | (null) |
| 1 | 2017-01-17 | bar | two | foo |
| 2 | 2017-01-18 | john | five | bar |
| 3 | 2017-01-19 | doe | ten | john |
Now, this works just fine. However, it would be preferable if I could easily select multiple columns from the previous row, resulting in a result like:
| id | dt | name | value | prev_name | prev_value | prev_dt |
|----|------------|------|-------|-----------|------------|------------|
| 0 | 2017-01-14 | foo | one | (null) | (null) | (null) |
| 1 | 2017-01-17 | bar | two | foo | one | 2017-01-14 |
| 2 | 2017-01-18 | john | five | bar | two | 2017-01-17 |
| 3 | 2017-01-19 | doe | ten | john | five | 2017-01-18 |
This can of course be accomplished by simply copying the subquery (SELECT [..] FROM example e2 ...) into the query multiple times, but I guess this is not the preferable way to go. I have found several questions on SO addressing either the "how to select records from a previous row" or the "how to select multiple columns using subqueries" problem, but not both. The latter problem is mostly solved with a JOIN, but I don't think that can be combined with the "previous row" case. So my question is: what would be a better way to produce the last result, rather than copying a subquery for every column we need?
EDIT: As an extra constraint that I did not include in the original question, "previous" could actually mean not the immediately previous row but "the previous row that satisfies a condition". So suppose my table contains an extra boolean column b:
| id | dt | name | value | b |
|----|------------|------|-------|---|
| 0 | 2017-01-14 | foo | one | 1 |
| 1 | 2017-01-17 | bar | two | 0 |
| 2 | 2017-01-18 | john | five | 1 |
| 3 | 2017-01-19 | doe | ten | 0 |
I would want the "previous row" to be the previous row with b = 1, so the desired result would be:
| id | dt | name | value | b | prev_name | prev_value | prev_dt |
|----|------------|------|-------|---|-----------|------------|------------|
| 0 | 2017-01-14 | foo | one | 1 | (null) | (null) | (null) |
| 1 | 2017-01-17 | bar | two | 0 | foo | one | 2017-01-14 |
| 2 | 2017-01-18 | john | five | 1 | foo | one | 2017-01-14 |
| 3 | 2017-01-19 | doe | ten | 0 | john | five | 2017-01-18 |
I think this can still be accomplished with James Scott's answer, by only updating the variables when b = 1 using an IF() expression, but maybe there is another solution in this case.
EDIT. SQLfiddle
Something like this will return the id of the 'previous' row.
SELECT x.*
, MAX(y.id) prev_id
FROM example x
LEFT
JOIN example y
ON y.id < x.id
AND y.b = 1
GROUP
BY x.id;
I'll leave the business of returning the rest of the data associated with this row as an exercise for the reader.
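One way to finish that exercise, sketched below with SQLite standing in for MySQL (an assumption for the sake of a runnable example): treat the answer's query as a derived table and LEFT JOIN the table back on prev_id, so all columns of the previous b = 1 row come from a single join instead of one subquery per column. Like the answer, it uses id order as "previous", which matches dt order in the sample data.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (id INTEGER PRIMARY KEY, dt TEXT, name TEXT, value TEXT, b INTEGER)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?, ?)",
    [
        (0, "2017-01-14", "foo", "one", 1),
        (1, "2017-01-17", "bar", "two", 0),
        (2, "2017-01-18", "john", "five", 1),
        (3, "2017-01-19", "doe", "ten", 0),
    ],
)

rows = conn.execute(
    """
    SELECT e.id, e.dt, e.name, e.value, e.b,
           p.name AS prev_name, p.value AS prev_value, p.dt AS prev_dt
    FROM (
        -- The answer's query: id of the latest earlier row with b = 1
        SELECT x.id, MAX(y.id) AS prev_id
        FROM example x
        LEFT JOIN example y ON y.id < x.id AND y.b = 1
        GROUP BY x.id
    ) AS t
    JOIN example e ON e.id = t.id
    -- LEFT JOIN so rows without a qualifying previous row get NULLs
    LEFT JOIN example p ON p.id = t.prev_id
    ORDER BY e.id
    """
).fetchall()

for row in rows:
    print(row)
```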
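This looks like a good use case for session variables if you only want the previous row; you can use ORDER BY to get different results.
SET @VDt := NULL, @VName := NULL, @VValue := NULL;
-- Each row first returns the variables (still holding the previous row's values), then overwrites them with its own values.
-- Note: MySQL does not guarantee the evaluation order of expressions involving user variables, and assigning them inside a SELECT is deprecated in MySQL 8.0.
SELECT id, @VName prev_name, @VValue prev_value, @VDt prev_dt,
       @VDt := dt dt, @VName := `name` `name`, @VValue := `value` `value`
FROM example;
I messed this up when I first posted: note that the variables must be assigned after their previous-row values have been returned. To reorder the columns (if desired), you can wrap this query in another one that reorders the result columns.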
Let me know if you need anything else,
Regards,
James
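A different technique, not the one in the answer above: on MySQL 8.0+ (and SQLite 3.25+, used here only to keep the sketch runnable), the LAG() window function returns a column from the preceding row in a given order, so no session variables or repeated subqueries are needed. On its own it does not handle the "previous row with b = 1" constraint from the edit.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL 8.0 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, dt TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [
        (0, "2017-01-14", "foo", "one"),
        (1, "2017-01-17", "bar", "two"),
        (2, "2017-01-18", "john", "five"),
        (3, "2017-01-19", "doe", "ten"),
    ],
)

# LAG(col) OVER (ORDER BY dt) reads col from the row just before, in dt order;
# the first row has no predecessor, so it gets NULL (None in Python).
rows = conn.execute(
    """
    SELECT id, dt, name, value,
           LAG(name)  OVER (ORDER BY dt) AS prev_name,
           LAG(value) OVER (ORDER BY dt) AS prev_value,
           LAG(dt)    OVER (ORDER BY dt) AS prev_dt
    FROM example
    ORDER BY dt
    """
).fetchall()

for row in rows:
    print(row)
```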

Remove duplicates SQL while ignoring key and selecting max of specified column

I have the following sample data:
| key_id | name | name_id | data_id |
+--------+-------+---------+---------+
| 1 | jim | 23 | 098 |
| 2 | joe | 24 | 098 |
| 3 | john | 25 | 098 |
| 4 | jack | 26 | 098 |
| 5 | jim | 23 | 091 |
| 6 | jim | 23 | 090 |
I have tried this query:
INSERT INTO temp_table
SELECT
DISTINCT @key_id,
name,
name_id,
@data_id FROM table1;
I am trying to dedupe a table by all fields in a row.
My desired output:
| key_id | name | name_id | data_id |
+--------+-------+---------+---------+
| 1 | jim | 23 | 098 |
| 2 | joe | 24 | 098 |
| 3 | john | 25 | 098 |
| 4 | jack | 26 | 098 |
What I'm actually getting:
| key_id | name | name_id | data_id |
+--------+-------+---------+---------+
| 1 | jim | 23 | NULL |
| 2 | joe | 24 | NULL |
| 3 | john | 25 | NULL |
| 4 | jack | 26 | NULL |
I am able to dedupe the table, but I am setting the data_id value to NULL by attempting to override the field with '@'.
Is there any way to select distinct on all fields while keeping the value of data_id? I will take the highest (MAX) data_id if possible.
If you only want one row returned for a specific value (in this case, name), one option you have is to group by that value. This seems like a good approach because you also said you wanted the largest data_id for each name, so I would suggest grouping and using the MAX() aggregate function like this:
SELECT name, name_id, MAX(data_id) AS data_id
FROM myTable
GROUP BY name, name_id;
The only thing you should be aware of is the possibility that a name occurs multiple times under different name_ids. If that is possible in your table, you could group by the name_id too, which is what I did.
Since you stated you're not interested in the key_id but only the name, I just excluded it from the query altogether to get this:
| name | name_id | data_id |
+-------+---------+---------+
| jim | 23 | 098 |
| joe | 24 | 098 |
| john | 25 | 098 |
| jack | 26 | 098 |
Here is the SQL Fiddle example.
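A runnable sketch of that GROUP BY query, with SQLite standing in for MySQL (assumption). data_id is stored as zero-padded text such as '098' (an assumption based on the sample values), so MAX() compares strings, which matches numeric order as long as the width is fixed. An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (key_id INTEGER, name TEXT, name_id INTEGER, data_id TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, "jim", 23, "098"),
        (2, "joe", 24, "098"),
        (3, "john", 25, "098"),
        (4, "jack", 26, "098"),
        (5, "jim", 23, "091"),
        (6, "jim", 23, "090"),
    ],
)

# One row per (name, name_id), keeping the largest data_id in each group.
rows = conn.execute(
    """SELECT name, name_id, MAX(data_id) AS data_id
       FROM myTable
       GROUP BY name, name_id
       ORDER BY name_id"""
).fetchall()

for row in rows:
    print(row)
```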
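RENAME TABLE myTable TO Old_myTable,
             myTable2 TO myTable;
-- myTable2 has to exist already (presumably an empty copy of myTable).
-- SELECT * with GROUP BY is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5);
-- otherwise non-grouped columns such as data_id come from an indeterminate row of each group.
INSERT INTO myTable
SELECT *
FROM Old_myTable
GROUP BY name, name_id;
This groups the rows by the values I want to dedupe while keeping the table structure, and leaves the data_id column out of the deduplication.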
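As an alternative sketch (not the approach above), the deduplicated rows can be written with every column either grouped or aggregated: MIN(key_id) keeps the lowest key per group and MAX(data_id) the highest data_id, which reproduces the desired output in the question and, unlike SELECT *, is also accepted by MySQL with ONLY_FULL_GROUP_BY. It reuses the question's table1 and temp_table names and runs on SQLite as a stand-in (assumption).

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
for table in ("table1", "temp_table"):
    conn.execute(
        f"CREATE TABLE {table} (key_id INTEGER, name TEXT, name_id INTEGER, data_id TEXT)"
    )
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "jim", 23, "098"),
        (2, "joe", 24, "098"),
        (3, "john", 25, "098"),
        (4, "jack", 26, "098"),
        (5, "jim", 23, "091"),
        (6, "jim", 23, "090"),
    ],
)

# Every selected column is grouped or aggregated, so the result is deterministic.
conn.execute(
    """INSERT INTO temp_table (key_id, name, name_id, data_id)
       SELECT MIN(key_id), name, name_id, MAX(data_id)
       FROM table1
       GROUP BY name, name_id"""
)
rows = conn.execute("SELECT * FROM temp_table ORDER BY key_id").fetchall()

for row in rows:
    print(row)
```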

MYSQL select all unique records by eliminating duplicate values by other column

I have a table named graph.
Its columns are: wwid_a, wwid_b, active, date_added
The values are:
+--------+--------+--------+---------------------+
| wwid_a | wwid_b | active | date_added |
+--------+--------+--------+---------------------+
| 1943 | 402158 | 1 | 2014-03-05 09:08:51 |
| 1943 | 402209 | 1 | 2014-03-05 09:08:52 |
| 1943 | 402464 | 1 | 2014-03-05 09:08:52 |
| 402158 | 1943 | 1 | 2014-03-05 09:08:5 |
| 402209 | 1943 | 1 | 2014-03-05 09:08:59 |
| 402464 | 1943 | 1 | 2014-03-05 09:08:58 |
+--------+--------+--------+---------------------+
Basically, each entry has a duplicate record with the wwid_a and wwid_b values swapped.
I want a SELECT query that returns each unique record, eliminating the duplicates where wwid_a and wwid_b are swapped.
Something like:
+--------+--------+--------+---------------------+
| wwid_a | wwid_b | active | date_added |
+--------+--------+--------+---------------------+
| 1943 | 402158 | 1 | 2014-03-05 09:08:51 |
| 1943 | 402209 | 1 | 2014-03-05 09:08:52 |
| 1943 | 402464 | 1 | 2014-03-05 09:08:52 |
+--------+--------+--------+---------------------+
If, indeed, all are duplicated, then this might be the most efficient way to remove them:
select g.*
from graph g
where g.wwid_a < g.wwid_b;
If you are concerned that this might not always be true, then you have a couple of options. The not exists logic might be best:
select g.*
from graph g
where g.wwid_a < g.wwid_b or
not exists (select 1
from graph g2
where g2.wwid_a = g.wwid_b and
g2.wwid_b = g.wwid_a
);
That is, keep a row if its wwid_a is smaller than its wwid_b, or keep it if the matching swapped row doesn't exist.
This will work best on larger data with an index on graph(wwid_a, wwid_b).
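The two queries above can be checked with a small sketch on SQLite (a stand-in for MySQL, assumption), using only the wwid columns from the sample. The row (500, 20) is hypothetical and has no swapped twin; the simple < filter drops it, while the NOT EXISTS version keeps it.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (wwid_a INTEGER, wwid_b INTEGER)")
# Composite index suggested in the answer, used by the NOT EXISTS lookup
conn.execute("CREATE INDEX idx_graph_ab ON graph (wwid_a, wwid_b)")
conn.executemany(
    "INSERT INTO graph VALUES (?, ?)",
    [
        (1943, 402158), (1943, 402209), (1943, 402464),
        (402158, 1943), (402209, 1943), (402464, 1943),
        (500, 20),  # hypothetical row without a swapped twin
    ],
)

# Simple version: assumes every pair appears in both directions.
simple = conn.execute(
    "SELECT wwid_a, wwid_b FROM graph g WHERE g.wwid_a < g.wwid_b ORDER BY wwid_a, wwid_b"
).fetchall()

# Robust version: also keeps rows whose swapped twin is missing.
robust = conn.execute(
    """
    SELECT g.wwid_a, g.wwid_b
    FROM graph g
    WHERE g.wwid_a < g.wwid_b
       OR NOT EXISTS (SELECT 1
                      FROM graph g2
                      WHERE g2.wwid_a = g.wwid_b
                        AND g2.wwid_b = g.wwid_a)
    ORDER BY g.wwid_a, g.wwid_b
    """
).fetchall()

print(simple)
print(robust)
```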

Calculating Date columns of MySQL in VB.Net?

I'm using VB.Net 2010 and MySQL.
I have two tables in a MySQL database: 'CAR' and 'CAR_RENT'.
From VB.Net I want to do the following calculations:
I want to calculate the Total_fee column in CAR_RENT, which is the Rental_fee_day column from the CAR table multiplied by the number of days between Issue_date and Return_date in CAR_RENT.
I also want to calculate the Penalty_fee column of CAR_RENT from the number of days past Return_date; it should be Rental_fee_day * number_of_exceeded_days for the specified client.
Both should be calculated automatically when the program runs.
I know the code I tried is not the proper way to do it, so there's no need to post it here. I need your help, please!
TABLE:CAR
+-----------+----------+---------------+--------+----------------+
| Car_id | Plate_no | Model | color | Rental_fee_day |
| 100 | 25534 | Tesla Model S | Black | $3500 |
| 101 | 25535 | Audi A6 | Black | $2100 |
| 103 | 35625 | BMW 3 Series | silver | $2000 |
+-----------+----------+---------------+--------+----------------+
TABLE:CAR_RENT
+-----------+--------+------------+-------------+-----------+-------------+
| Client_id | Car_id | Issue_date | Return_date | Total_fee | Penalty_fee |
+-----------+--------+------------+-------------+-----------+-------------+
| 1 | 103 | 2014-02-01 | 2014-02-10 | | |
| 1 | 100 | 2014-02-01 | 2014-02-15 | | |
| 3 | 101 | 2014-02-18 | 2014-02-30 | | |
+-----------+--------+------------+-------------+-----------+-------------+
You should check out MySQL's DATEDIFF() function.
You can use it together with a join to get the information you need.
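A minimal sketch of the Total_fee calculation, with SQLite standing in for MySQL (assumption). SQLite has no DATEDIFF(), so the day count comes from a julianday() difference; in MySQL the same number would be DATEDIFF(Return_date, Issue_date). Fees are stored as plain numbers rather than "$3500", the third rental is left out because 2014-02-30 isn't a valid date, and Penalty_fee is not computed because the table has no column for the actual return date. Whether the issue day itself should be billed is a business rule the question doesn't settle.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CAR (Car_id INTEGER PRIMARY KEY, Rental_fee_day INTEGER)")
conn.executemany("INSERT INTO CAR VALUES (?, ?)", [(100, 3500), (101, 2100), (103, 2000)])
conn.execute(
    """CREATE TABLE CAR_RENT (Client_id INTEGER, Car_id INTEGER,
                              Issue_date TEXT, Return_date TEXT, Total_fee INTEGER)"""
)
conn.executemany(
    "INSERT INTO CAR_RENT (Client_id, Car_id, Issue_date, Return_date) VALUES (?, ?, ?, ?)",
    [(1, 103, "2014-02-01", "2014-02-10"), (1, 100, "2014-02-01", "2014-02-15")],
)

# Total_fee = daily fee of the rented car * whole days between issue and return.
conn.execute(
    """
    UPDATE CAR_RENT
    SET Total_fee = (
        SELECT c.Rental_fee_day
               * CAST(julianday(CAR_RENT.Return_date) - julianday(CAR_RENT.Issue_date) AS INTEGER)
        FROM CAR c
        WHERE c.Car_id = CAR_RENT.Car_id
    )
    """
)
fees = conn.execute("SELECT Car_id, Total_fee FROM CAR_RENT ORDER BY Car_id").fetchall()
print(fees)
```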

Select without duplicate from a specific string/key

Thanks to @Ed Gibbs I managed to solve my first problem on this case (Select duplicate and keep the oldest (not based on ID)).
I am now facing a new problem I cannot solve.
I have two tables: "domain", which is free of duplicates, and "email", which contains duplicates. In the first table I had a column called "creationdate" which I used as a filter. In the second table I don't have such a filter, but (I think) some of the information could be used as one.
Table domain :
| domain | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X |
| yahoo.com | britney | Y |
| ebay.com | harry | Z |
Table email :
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| john@google.com | britney | Y |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
As you can see in the first table, the domain google.com is handled by foreign_key X. In the email table, the records "john@google.com,patrick,X" and "harry@google.com,mary,X" are fine because they match the right foreign_key. The problem is records like "john@google.com,britney,Y": Y isn't the foreign_key associated with the domain google.com, so I want to remove it.
Here is the desired table :
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com | patrick | X |
| harry@google.com | mary | X |
| mickael@google.com | jack | X |
| david@ebay.com | walter | Z |
| alice@yahoo.com | brian | Y |
How can I select this data without the wrong records? I think the key to the problem is a CONCAT/SUBSTRING, but I can't figure out how to do it.
Thanks for your help.
To get the domain out of a well-formed email address, you can use the SUBSTRING_INDEX() function, then use a simple join on both the foreign key and the domain:
SELECT email.* FROM email
JOIN domain ON email.foreign_key = domain.foreign_key
AND SUBSTRING_INDEX(email.email, '@', -1) = domain.domain;
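A runnable sketch of the same join on SQLite (a stand-in for MySQL, assumption). SQLite has no SUBSTRING_INDEX(), so the domain is taken with substr() and instr() instead; this returns the text after the first '@', whereas SUBSTRING_INDEX(..., -1) returns the text after the last one, which is the same for addresses containing a single '@'. The tables follow the sample data, with value 1 renamed to value1.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain (domain TEXT, value1 TEXT, foreign_key TEXT)")
conn.executemany(
    "INSERT INTO domain VALUES (?, ?, ?)",
    [("google.com", "patrick", "X"), ("yahoo.com", "britney", "Y"), ("ebay.com", "harry", "Z")],
)
conn.execute("CREATE TABLE email (email TEXT, value1 TEXT, foreign_key TEXT)")
conn.executemany(
    "INSERT INTO email VALUES (?, ?, ?)",
    [
        ("john@google.com", "patrick", "X"),
        ("john@google.com", "britney", "Y"),
        ("harry@google.com", "mary", "X"),
        ("mickael@google.com", "jack", "X"),
        ("david@ebay.com", "walter", "Z"),
        ("alice@yahoo.com", "brian", "Y"),
    ],
)

# Keep an email only if both its foreign_key and its domain match a row in domain.
rows = conn.execute(
    """
    SELECT e.email, e.value1, e.foreign_key
    FROM email e
    JOIN domain d
      ON e.foreign_key = d.foreign_key
     -- substr/instr take everything after the first '@'
     AND substr(e.email, instr(e.email, '@') + 1) = d.domain
    ORDER BY e.rowid
    """
).fetchall()

for row in rows:
    print(row)
```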