How to delete duplicate data from MySQL except the latest data

I want to delete duplicate records from a MySQL table.
If (date, url, price, hotelName) is the same, I want to remove all but one of the rows.
I have a table like this:
id | hotelName | price | url | date |
-------------------------------------------------
1 | abcd | 20$ | abcd.com | 21 jan 2019 |
2 | abcd | 24$ | abcd.com | 22 jan 2019 |
3 | wzyz | 10$ | wzyz.com | 21 jan 2019 |
4 | abcd | 20$ | abcd.com | 21 jan 2019 |
5 | wzyz | 15$ | wzyz.com | 22 jan 2019 |
6 | wzyz | 15$ | wzyz.com | 22 jan 2019 |
In this table you can see that the duplicate records are ids [1, 4] and [5, 6].
I want to delete the duplicate records from this table, keeping only the latest one.
After deleting, the table should look like this:
id | hotelName | price | url | date |
-------------------------------------------------
2 | abcd | 24$ | abcd.com | 22 jan 2019 |
3 | wzyz | 10$ | wzyz.com | 21 jan 2019 |
4 | abcd | 20$ | abcd.com | 21 jan 2019 |
6 | wzyz | 15$ | wzyz.com | 22 jan 2019 |

If your table is not too big, this is a short and straightforward syntax:
DELETE t1
FROM
mytable t1
CROSS JOIN mytable t2
WHERE
t1.id < t2.id
AND t1.hotelName = t2.hotelName
AND t1.date = t2.date
AND t1.url = t2.url
AND t1.price = t2.price;
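Another solution, which is less resource-consuming. MySQL does not let a DELETE read its own target table in a subquery (error 1093, "You can't specify target table ... for update in FROM clause"), so the grouped subquery is wrapped in a derived table:
DELETE FROM mytable
WHERE id NOT IN (
SELECT max_id FROM (
SELECT MAX(t.id) AS max_id FROM mytable t GROUP BY t.hotelName, t.date, t.url, t.price
) AS keep_ids
);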
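To check the two approaches above without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite has no multi-table DELETE ... JOIN, so the self-join version is written as an equivalent correlated EXISTS; SQLite also lets the NOT IN subquery read the target table directly. Table and column names come from the question; the in-memory database and the helper function names are assumptions of the sketch.

```python
import sqlite3

# Sample rows from the question (prices kept as text, as shown there).
ROWS = [
    (1, "abcd", "20$", "abcd.com", "21 jan 2019"),
    (2, "abcd", "24$", "abcd.com", "22 jan 2019"),
    (3, "wzyz", "10$", "wzyz.com", "21 jan 2019"),
    (4, "abcd", "20$", "abcd.com", "21 jan 2019"),
    (5, "wzyz", "15$", "wzyz.com", "22 jan 2019"),
    (6, "wzyz", "15$", "wzyz.com", "22 jan 2019"),
]


def make_db():
    """Build a fresh in-memory copy of mytable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, hotelName TEXT,"
        " price TEXT, url TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def dedupe_exists(conn):
    # Self-join idea: delete a row if a duplicate with a higher id exists.
    conn.execute("""
        DELETE FROM mytable
        WHERE EXISTS (
            SELECT 1 FROM mytable AS t2
            WHERE t2.id > mytable.id
              AND t2.hotelName = mytable.hotelName
              AND t2.date = mytable.date
              AND t2.url = mytable.url
              AND t2.price = mytable.price)
    """)


def dedupe_not_in(conn):
    # Keep only MAX(id) of each (hotelName, date, url, price) group.
    conn.execute("""
        DELETE FROM mytable
        WHERE id NOT IN (
            SELECT MAX(id) FROM mytable
            GROUP BY hotelName, date, url, price)
    """)


def remaining_ids(dedupe):
    conn = make_db()
    dedupe(conn)
    return [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]


print(remaining_ids(dedupe_exists))  # [2, 3, 4, 6]
print(remaining_ids(dedupe_not_in))  # [2, 3, 4, 6]
```

Both variants leave exactly the rows listed in the question's expected table.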

I strongly recommend group by and join for this purpose:
delete t from t join
(select date, url, price, hotelName, max(id) as max_id
from t
group by date, url, price, hotelName
) tt
using (date, url, price, hotelName)
where t.id < tt.max_id;
I assume that by "latest" you mean "keep the one with the largest id".
If you have a large amount of data, the delete can be expensive. In that case, creating a temporary table, truncating the original, and re-inserting the rows to keep might perform better.
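The temporary-table route can be sketched the same way, again with Python's sqlite3 standing in for MySQL. SQLite has no TRUNCATE TABLE, so an unqualified DELETE plays that role here; the scratch table name rows_to_keep is hypothetical.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, "abcd", "20$", "abcd.com", "21 jan 2019"),
    (2, "abcd", "24$", "abcd.com", "22 jan 2019"),
    (3, "wzyz", "10$", "wzyz.com", "21 jan 2019"),
    (4, "abcd", "20$", "abcd.com", "21 jan 2019"),
    (5, "wzyz", "15$", "wzyz.com", "22 jan 2019"),
    (6, "wzyz", "15$", "wzyz.com", "22 jan 2019"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, hotelName TEXT,"
    " price TEXT, url TEXT, date TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

# 1. Copy the rows to keep (highest id per duplicate group) into a temp table.
conn.execute("""
    CREATE TEMP TABLE rows_to_keep AS
    SELECT * FROM mytable
    WHERE id IN (SELECT MAX(id) FROM mytable
                 GROUP BY hotelName, date, url, price)
""")
# 2. Empty the original table (an unqualified DELETE stands in for TRUNCATE).
conn.execute("DELETE FROM mytable")
# 3. Put the kept rows back and drop the scratch table.
conn.execute("INSERT INTO mytable SELECT * FROM rows_to_keep")
conn.execute("DROP TABLE rows_to_keep")
conn.commit()

ids = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(ids)  # [2, 3, 4, 6]
```

In MySQL, TRUNCATE TABLE causes an implicit commit and cannot be rolled back, so it is worth keeping the copied rows (or a backup) until the re-insert has succeeded.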

Related

How can I merge the Monthly Sales and Monthly Purchase tables into one single table?

I have two tables: Monthly Sales(month, total_sales) and Monthly Purchase(month, total_purchase). I need to combine both tables and output (month, total_sales, total_purchase).
Monthly_Sales:
+-------+-------+
| Month | sales |
+-------+-------+
| Jan   | 50000 |
| Mar   | 20000 |
| Jun   | 10000 |
+-------+-------+
Monthly_Purchase:
+-------+----------+
| Month | purchase |
+-------+----------+
| Jan   | 50000    |
| Feb   | 60000    |
| Mar   | 40000    |
+-------+----------+
Output:
+----+----------+---------+
| Month | sales | purchase|
+----+----------+---------+
| Jan | 50000 | 50000 |
| Feb | NULL | 60000 |
| Mar | 20000 | 40000 |
| Jun | 10000 | NULL |
+----+----------+---------+
I tried to achieve this using FULL OUTER JOIN, but it does not give the expected result:
SELECT Table1.month, Table1.sales, Table2.purchase FROM (SELECT month, sales from Monthly_Sales) as Table1
FULL OUTER JOIN (SELECT month, purchase from Monthly_Purchase) as Table2
ON Table1.month = Table2.month;
So what should I do?
MySQL does not support FULL OUTER JOIN. Instead, you can use union all and group by:
select month, sum(sales) as sales, sum(purchase) as purchase
from ((select month, sales, null as purchase
from Monthly_Sales
) union all
(select month, null, purchase
from Monthly_Purchase
)
) sp
group by month;
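A quick sketch of the UNION ALL + GROUP BY idea on the question's data, with Python's sqlite3 standing in for MySQL. One dialect difference: SQLite does not accept parentheses around the members of a UNION ALL, so they are dropped here; the logic is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Monthly_Sales (month TEXT, sales INTEGER);
    CREATE TABLE Monthly_Purchase (month TEXT, purchase INTEGER);
    INSERT INTO Monthly_Sales VALUES ('Jan', 50000), ('Mar', 20000), ('Jun', 10000);
    INSERT INTO Monthly_Purchase VALUES ('Jan', 50000), ('Feb', 60000), ('Mar', 40000);
""")

# Stack both tables, padding the column each one lacks with NULL,
# then collapse to one row per month. SUM over only NULLs gives NULL.
rows = conn.execute("""
    SELECT month, SUM(sales) AS sales, SUM(purchase) AS purchase
    FROM (
        SELECT month, sales, NULL AS purchase FROM Monthly_Sales
        UNION ALL
        SELECT month, NULL, purchase FROM Monthly_Purchase
    ) AS sp
    GROUP BY month
""").fetchall()

merged = {month: (sales, purchase) for month, sales, purchase in rows}
print(merged["Feb"])  # (None, 60000)
print(merged["Jun"])  # (10000, None)
```

GROUP BY on the month text orders nothing chronologically; a month-number column would be needed to sort Jan, Feb, Mar, Jun in calendar order.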

SUM columns from different tables and show the result in one row per year

I have two tables named table1 and table2. I've tried to sum some columns, but the result is wrong.
I've tried the following MySQL query:
SELECT t1.year
, SUM(t1.deposit) TOTALDEPOSIT
, SUM(t1.interest) TOTALINTEREST
, SUM(t2.otherinterest) TOTALOTHER
FROM table1 t1
LEFT
JOIN table2 t2
ON t1.year = t2.year
GROUP
BY t1.year
But the SUM results are not accurate.
My tables are below:
table1
| table1id| year| deposit| interest|
|---------|-----|--------|---------|
| 1|2019 | 20 | 1 |
| 2|2019 | 20 | 2 |
| 3|2019 | 20 | 1 |
| 3|2019 | 20 | 2 |
| 3|2020 | 20 | 3 |
| 3|2020 | 20 | 4 |
table2
| table2id| year | otherinterest|
|---------|------|--------------|
| 1 | 2019 | 10 |
| 2 | 2019 | 10 |
The expected result is:
| YEAR | TOTALDEPOSIT | TOTALINTEREST | TOTALOTHER |
|------|--------------|---------------|------------|
| 2019 | 80 | 6 | 20 |
| 2020 | 40 | 7 | |
But my query gives this result:
| YEAR | TOTALDEPOSIT | TOTALINTEREST | TOTALOTHER |
|------|--------------|---------------|------------|
| 2019 | 160 | 12 | 80 |
| 2020 | 40 | 7 | |
Could anyone please help me fix this query?
Your query doesn’t work correctly because the intermediate join result is not what you expect.
Let’s try this query:
SELECT *
FROM table1 t1
LEFT JOIN table2 t2
ON t1.year = t2.year
Result will be:
+----------+------+---------+----------+----------+------+---------------+
| table1id | year | deposit | interest | table2id | year | otherinterest |
+----------+------+---------+----------+----------+------+---------------+
| 1 | 2019 | 20 | 1 | 1 | 2019 | 10 |
| 2 | 2019 | 20 | 2 | 1 | 2019 | 10 |
| 3 | 2019 | 20 | 1 | 1 | 2019 | 10 |
| 3 | 2019 | 20 | 2 | 1 | 2019 | 10 |
| 1 | 2019 | 20 | 1 | 2 | 2019 | 10 |
| 2 | 2019 | 20 | 2 | 2 | 2019 | 10 |
| 3 | 2019 | 20 | 1 | 2 | 2019 | 10 |
| 3 | 2019 | 20 | 2 | 2 | 2019 | 10 |
| 3 | 2020 | 20 | 3 | NULL | NULL | NULL |
| 3 | 2020 | 20 | 4 | NULL | NULL | NULL |
+----------+------+---------+----------+----------+------+---------------+
So we have 10 rows, not 6. You can see, for example, that the sum of deposits for 2019 is 160, the same number as in your "wrong" result.
This is because, for each record in table1 where the year is 2019, the join condition (t1.year = t2.year) is true twice.
In other words, for each table1 row where the year equals 2019, the result contains two rows: one with table2id=1 and another with table2id=2.
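The row multiplication is easy to reproduce. This sketch loads the question's data into an in-memory SQLite database (a stand-in for MySQL), counts the joined rows, and reruns the original aggregate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (table1id INTEGER, year INTEGER, deposit INTEGER, interest INTEGER);
    CREATE TABLE table2 (table2id INTEGER, year INTEGER, otherinterest INTEGER);
    INSERT INTO table1 VALUES
        (1, 2019, 20, 1), (2, 2019, 20, 2), (3, 2019, 20, 1),
        (3, 2019, 20, 2), (3, 2020, 20, 3), (3, 2020, 20, 4);
    INSERT INTO table2 VALUES (1, 2019, 10), (2, 2019, 10);
""")

# Each of the four 2019 rows in table1 matches both 2019 rows in table2.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM table1 t1 LEFT JOIN table2 t2 ON t1.year = t2.year"
).fetchone()[0]
print(joined_rows)  # 10  (4 * 2 for 2019, plus 2 unmatched 2020 rows)

# The original aggregate therefore sums the multiplied rows.
wrong = conn.execute("""
    SELECT t1.year, SUM(t1.deposit), SUM(t1.interest), SUM(t2.otherinterest)
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.year = t2.year
    GROUP BY t1.year
    ORDER BY t1.year
""").fetchall()
print(wrong)  # [(2019, 160, 12, 80), (2020, 40, 7, None)]
```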
A subquery is a bit less wordy than a join.
drop table if exists t,t1;
create table t
(table1id int, year int, deposit int, interest int);
insert into t values
( 1,2019 , 20 , 1),
( 2,2019 , 20 , 2),
( 3,2019 , 20 , 1),
( 3,2019 , 20 , 2),
( 3,2020 , 20 , 3),
( 3,2020 , 20 , 4);
create table t1
( table2id int, year int, otherinterest int);
insert into t1 values
( 1 , 2019 , 10 ),
( 2 , 2019 , 10 );
select t.year,sum(deposit),sum(interest),
(select sum(otherinterest) from t1 where t1.year = t.year) otherinterest
FROM t
group by t.year;
+------+--------------+---------------+---------------+
| year | sum(deposit) | sum(interest) | otherinterest |
+------+--------------+---------------+---------------+
| 2019 | 80 | 6 | 20 |
| 2020 | 40 | 7 | NULL |
+------+--------------+---------------+---------------+
2 rows in set (0.00 sec)
Just use a simple subquery and it will work.
SELECT A.year, SUM(A.deposit) TOTALDEPOSIT, SUM(A.interest) TOTALINTEREST,
(SELECT SUM(B.otherinterest) FROM table2 B WHERE B.year= A.year) TOTALOTHER
FROM table1 A
GROUP BY A.year
You should join the aggregated result from each table, e.g.:
select distinct t1.year, tt1.totaldeposit, tt1.totalinterest, tt2.otherinterest
from table1 t1
inner join (
select year, sum(deposit) totaldeposit, sum(interest) totalinterest
from table1
group by year
) tt1 On t1.year = tt1.year
left join (
select year, sum(otherinterest) otherinterest
from table2
group by year
) tt2 On t1.year = tt2.year
A simple GROUP BY with LEFT JOIN is what you need, but the order of those operations should be the reverse of what you have: aggregate each table first, then join :)
select t1.year,
t1.deposit totaldeposit,
t1.interest totalinterest,
t2.otherinterest totalother
from (
select year, sum(deposit) deposit, sum(interest) interest
from table1
group by year
) t1 left join (
select year, sum(otherinterest) otherinterest
from table2
group by year
) t2 on t1.year = t2.year
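Both kinds of fix (a correlated subquery for the second table, or aggregating each table to one row per year before joining) can be checked against the question's data with this sketch. SQLite stands in for MySQL, and the query constant names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (table1id INTEGER, year INTEGER, deposit INTEGER, interest INTEGER);
    CREATE TABLE table2 (table2id INTEGER, year INTEGER, otherinterest INTEGER);
    INSERT INTO table1 VALUES
        (1, 2019, 20, 1), (2, 2019, 20, 2), (3, 2019, 20, 1),
        (3, 2019, 20, 2), (3, 2020, 20, 3), (3, 2020, 20, 4);
    INSERT INTO table2 VALUES (1, 2019, 10), (2, 2019, 10);
""")

# Fix 1: aggregate table1 directly; fetch table2's total per year in a correlated subquery.
SUBQUERY = """
    SELECT A.year, SUM(A.deposit), SUM(A.interest),
           (SELECT SUM(B.otherinterest) FROM table2 B WHERE B.year = A.year)
    FROM table1 A
    GROUP BY A.year
    ORDER BY A.year
"""

# Fix 2: reduce each table to one row per year first, then LEFT JOIN.
PRE_AGGREGATED = """
    SELECT t1.year, t1.deposit, t1.interest, t2.otherinterest
    FROM (SELECT year, SUM(deposit) AS deposit, SUM(interest) AS interest
          FROM table1 GROUP BY year) t1
    LEFT JOIN (SELECT year, SUM(otherinterest) AS otherinterest
               FROM table2 GROUP BY year) t2
      ON t1.year = t2.year
    ORDER BY t1.year
"""

via_subquery = conn.execute(SUBQUERY).fetchall()
via_join = conn.execute(PRE_AGGREGATED).fetchall()
print(via_subquery)  # [(2019, 80, 6, 20), (2020, 40, 7, None)]
print(via_join == via_subquery)  # True
```

Because each side is collapsed to one row per year before the join, no row can be matched more than once.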

MySQL: LEFT JOIN two SELECTs to "fill gaps" in dates

Let's say I have a table "calendar"
+------------+
| day_date |
+------------+
| 2015-01-01 |
| 2015-01-02 |
| 2015-01-03 |
| .......... |
| 2015-07-14 |
| 2015-07-15 |
+------------+
With this query I can select the WEEKs I need:
SELECT WEEK(day_date,1) AS NUM_WEEK,
YEAR(day_date) AS YEAR,
STR_TO_DATE(CONCAT(YEAR(day_date),WEEK(day_date,1),' Monday'), '%X%V %W') AS date_start
FROM calendar
GROUP BY NUM_WEEK
And this is the result:
+----------+------+------------+
| NUM_WEEK | YEAR | date_start |
+----------+------+------------+
| 29 | 2015 | 2015-07-20 |
| 30 | 2015 | 2015-07-27 |
| 31 | 2015 | 2015-08-03 |
| 32 | 2015 | 2015-08-10 |
| 33 | 2015 | 2015-08-17 |
| 34 | 2015 | 2015-08-24 |
| 35 | 2015 | 2015-08-31 |
| 36 | 2015 | 2015-09-07 |
| 37 | 2015 | 2015-09-14 |
| 38 | 2015 | 2015-09-21 |
| 39 | 2015 | 2015-09-28 |
| 40 | 2015 | 2015-10-05 |
| 41 | 2015 | 2015-10-12 |
| 42 | 2015 | 2015-10-19 |
| 43 | 2015 | 2015-10-26 |
+----------+------+------------+
Now I have another table, transaction:
+----+------------+--------+---------------------+
| id | id_account | amount | date_transaction |
+----+------------+--------+---------------------+
| 1 | 283 | 150 | 2015-06-21 15:50:47 |
| 2 | 283 | 47.74 | 2015-07-23 15:55:44 |
| 3 | 281 | 21.55 | 2015-08-24 12:27:11 |
| 4 | 283 | 11.22 | 2015-08-25 10:00:54 |
+----+------------+--------+---------------------+
There are gaps in the dates.
With a similar query:
SELECT WEEK(date_transaction,1) AS NUM_WEEK,
YEAR(date_transaction) AS YEAR,
STR_TO_DATE(CONCAT(YEAR(date_transaction),WEEK(date_transaction,1),' Monday'), '%X%V %W')
AS date_start,
transaction.id_account,
SUM(amount) as total FROM transaction
INNER JOIN account ON account.id_account = transaction.id_account
WHERE amount > 0 AND transaction.id_account
IN ( SELECT id_account FROM account WHERE id_customer = 12 )
GROUP BY id_account, WEEK(date_transaction,1)
I get this result (the data probably don't match the previous tables; it is just to explain):
+----------+------+------------+-----------+----------+
| NUM_WEEK | YEAR | date_start | idAccount | total |
+----------+------+------------+-----------+----------+
| 29 | 2015 | 2015-07-20 | 281 | 22377.00 |
| 30 | 2015 | 2015-07-27 | 281 | 11550.00 |
| 32 | 2015 | 2015-08-04 | 281 | 4500.00 |
| 30 | 2015 | 2015-07-27 | 283 | 1500 |
+----------+------+------------+-----------+----------+
What I would like to get, by RIGHT (or LEFT) JOINing the two tables, is:
1. The min (and max) WEEK, so I can... (see 2)
2. Fill the gaps for missing WEEKs with NULL values.
For example, in a more complicated result set:
+----------+------+------------+-----------+----------+
| NUM_WEEK | YEAR | date_start | idAccount | total |
+----------+------+------------+-----------+----------+
| 29 | 2015 | 2015-07-20 | 281 | 22377.00 |
| 30 | 2015 | 2015-07-27 | 281 | 11550.00 |
| 31 | 2015 | 2015-07-02 | 281 | NULL |
| 32 | 2015 | 2015-08-09 | 281 | 4500.00 |
| 29 | 2015 | 2015-08-09 | 283 | NULL |
| 30 | 2015 | 2015-07-16 | 283 | 1500 |
| 31 | 2015 | 2015-07-16 | 283 | NULL |
| 32 | 2015 | 2015-07-16 | 283 | NULL |
+----------+------+------------+-----------+----------+
Note, for example, that id=283 now has NULL at WEEK 29, 31 and 32, just as id=281 has NULL in WEEK 31.
I also prepared an SQLFiddle here: http://sqlfiddle.com/#!9/a8fdc/3
Thank you very much.
I took a look at your question and came up with this solution. Here is what your query could look like:
SELECT t1.NUM_WEEK, t1.`YEAR`, t1.date_start, t1.id_account, t2.total
FROM (SELECT c.NUM_WEEK, c.`YEAR`, c.date_start, a.id_account
FROM (SELECT WEEK(day_date,1) AS NUM_WEEK,
YEAR(day_date) AS `YEAR`,
STR_TO_DATE(CONCAT(YEAR(day_date),WEEK(day_date,1),' Monday'), '%X%V %W') AS date_start,
(SELECT GROUP_CONCAT(id_account) FROM account WHERE id_customer=12) AS accounts_id
FROM calendar
GROUP BY NUM_WEEK) c
INNER JOIN account a
ON FIND_IN_SET(a.id_account, c.accounts_id)
ORDER BY a.id_account, c.NUM_WEEK) t1
LEFT JOIN
(SELECT WEEK(t.date_transaction,1) AS NUM_WEEK,
YEAR(t.date_transaction) AS `YEAR`,
STR_TO_DATE(CONCAT(YEAR(t.date_transaction),WEEK(t.date_transaction,1),' Monday'), '%X%V %W') AS date_start,
t.id_account, SUM(t.amount) AS total
FROM `transaction` t
INNER JOIN account a
ON a.id_account = t.id_account
WHERE t.amount > 0 AND
t.id_account IN (SELECT id_account FROM account WHERE id_customer = 12)
GROUP BY id_account, WEEK(date_transaction,1)) t2
ON t1.NUM_WEEK = t2.NUM_WEEK AND t1.YEAR = t2.YEAR AND t1.id_account = t2.id_account;
Here is an SQL Fiddle for that so you can check the result. I hope that is what you are looking for.
A little explanation:
The first thing I did was slightly modify your first query, which extracts data from the calendar table, by adding one new column called accounts_id. That query now looks like this:
SELECT WEEK(day_date,1) AS NUM_WEEK,
YEAR(day_date) AS `YEAR`,
STR_TO_DATE(CONCAT(YEAR(day_date),WEEK(day_date,1),' Monday'), '%X%V %W') AS date_start,
(SELECT GROUP_CONCAT(id_account) FROM account WHERE id_customer=12) AS accounts_id
FROM calendar
GROUP BY NUM_WEEK
Please pay attention to this line in the SELECT statement:
(SELECT GROUP_CONCAT(id_account) FROM account WHERE id_customer=12) AS accounts_id
Note that when you select for a specific customer, you need to change the customer ID in this line too!
Here is a Fiddle so you can check the result this query produces.
This is necessary because we need to connect each week with each account to get the desired result.
The next step is to extend the previous query to split the accounts_id column (see the result of the previous query) so that we get one row for each value in that column. The extended query looks like this:
SELECT c.NUM_WEEK, c.`YEAR`, c.date_start, a.id_account
FROM (SELECT WEEK(day_date,1) AS NUM_WEEK,
YEAR(day_date) AS `YEAR`,
STR_TO_DATE(CONCAT(YEAR(day_date),WEEK(day_date,1),' Monday'), '%X%V %W') AS date_start,
(SELECT GROUP_CONCAT(id_account) FROM account WHERE id_customer=12) AS accounts_id
FROM calendar
GROUP BY NUM_WEEK) c
INNER JOIN account a
ON FIND_IN_SET(a.id_account, c.accounts_id)
ORDER BY a.id_account, c.NUM_WEEK
You can see the output in this Fiddle.
After that, all we need to do is LEFT JOIN this query with the query you already wrote in your question (the last one).
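The same shape (build every week/account pair first, then LEFT JOIN the aggregated transactions) can be sketched with Python's sqlite3 as a stand-in for MySQL. Two assumptions differ from the answer: a plain CROSS JOIN filtered on id_customer replaces the GROUP_CONCAT/FIND_IN_SET step (it yields the same pairs), and the week number is stored in a num_week column instead of being computed with MySQL's WEEK(date, 1), which SQLite does not reproduce. The weeks table, the extra account 999 and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the weeks derived from the calendar table.
    CREATE TABLE weeks (num_week INTEGER PRIMARY KEY);
    INSERT INTO weeks VALUES (29), (30), (31), (32);

    CREATE TABLE account (id_account INTEGER PRIMARY KEY, id_customer INTEGER);
    INSERT INTO account VALUES (281, 12), (283, 12), (999, 7);

    -- Week number stored directly; "transaction" is quoted because it is a keyword.
    CREATE TABLE "transaction" (id_account INTEGER, num_week INTEGER, amount REAL);
    INSERT INTO "transaction" VALUES
        (281, 29, 22377.00), (281, 30, 11550.00), (281, 32, 4500.00),
        (283, 30, 1500.00), (999, 30, 5.00);
""")

rows = conn.execute("""
    SELECT w.num_week, a.id_account, tx.total
    FROM weeks w
    CROSS JOIN account a                 -- every (week, account) pair
    LEFT JOIN (
        SELECT id_account, num_week, SUM(amount) AS total
        FROM "transaction"
        WHERE amount > 0
        GROUP BY id_account, num_week
    ) tx ON tx.id_account = a.id_account AND tx.num_week = w.num_week
    WHERE a.id_customer = ?              -- keep only this customer's accounts
    ORDER BY a.id_account, w.num_week
""", (12,)).fetchall()

for row in rows:
    print(row)  # weeks without transactions come back with total = None
```

Because the week/account grid is built before the LEFT JOIN, every missing week survives as a row with a NULL total.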
There might be a better solution, or this one could be improved a little, but I don't have much time to deal with that now, and this is the first thing that crossed my mind...
Good luck!
P.S. Be careful when you use MySQL keywords such as YEAR, TRANSACTION, etc. as column or table names; that can cause you trouble. If you have to use them as column or table names, mark them with backquotes (as `YEAR`).

MySQL query for getting distinct and latest records

I have made a view (joining four tables) like below:
ID | BookID | date | points |
1 | 11 | 2014-11-01 | 15 |
1 | 11 | 2015-01-01 | 16 |
1 | 11 | 2014-12-01 | 17 |
1 | 12 | 2014-02-11 | 18 |
1 | 12 | 2014-03-11 | 19 |
1 | 12 | 2014-04-11 | 15 |
1 | 13 | 2014-12-23 | 121 |
1 | 14 | 2014-01-15 | 113 |
1 | 14 | 2014-02-08 | 112 |
I want the result of this view to look like this:
ID | BookID | Date | points |
1 | 11 | 2015-01-01 | 16 |
1 | 12 | 2014-04-11 | 15 |
1 | 13 | 2014-12-23 | 121 |
1 | 14 | 2014-02-08 | 112 |
It should return each distinct BookID with its max date, along with the points for that row.
So far I have tried GROUP BY with a join and GROUP BY on the date, but it is getting a bit much, as I am unable to find a solution.
My Query is:
SELECT m1.* FROM viewPoints m1 LEFT JOIN viewPoints m2
ON (m1.BookID = m2.BookID AND m1.Date < m2.Date)
WHERE m1.ID= 1 and m2.Date IS NULL
ORDER BY m1.BookID
Any help is appreciated! Thanks in advance.
Maybe this is what you want?
select v.*
from viewPoints v
join (
select
BookID,
max(date) max_date
from viewPoints
where points is not null
group by BookID
) v2 on v.BookID = v2.BookID and v.date = v2.max_date
where v.points is not null
order by v.BookID
Sample SQL Fiddle
Sample output:
| ID | BOOKID | DATE | POINTS |
|----|--------|---------------------------------|--------|
| 1 | 11 | January, 01 2015 00:00:00+0000 | 16 |
| 1 | 12 | April, 11 2014 00:00:00+0000 | 15 |
| 1 | 13 | December, 23 2014 00:00:00+0000 | 121 |
| 1 | 14 | February, 08 2014 00:00:00+0000 | 112 |
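The join-on-max-date approach can be checked with a sketch that loads the view's sample rows into a plain SQLite table (an assumption; the real view joins four tables, and SQLite stands in for MySQL). ISO-formatted date strings compare correctly as text, which is what MAX() relies on here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE viewPoints (ID INTEGER, BookID INTEGER, date TEXT, points INTEGER);
    INSERT INTO viewPoints VALUES
        (1, 11, '2014-11-01', 15), (1, 11, '2015-01-01', 16), (1, 11, '2014-12-01', 17),
        (1, 12, '2014-02-11', 18), (1, 12, '2014-03-11', 19), (1, 12, '2014-04-11', 15),
        (1, 13, '2014-12-23', 121),
        (1, 14, '2014-01-15', 113), (1, 14, '2014-02-08', 112);
""")

latest = conn.execute("""
    SELECT v.ID, v.BookID, v.date, v.points
    FROM viewPoints v
    JOIN (
        SELECT BookID, MAX(date) AS max_date   -- latest date per book
        FROM viewPoints
        WHERE points IS NOT NULL
        GROUP BY BookID
    ) v2 ON v.BookID = v2.BookID AND v.date = v2.max_date
    WHERE v.points IS NOT NULL
    ORDER BY v.BookID
""").fetchall()

for row in latest:
    print(row)
```

Joining on both BookID and the max date is what keeps one book's latest date from matching another book's row.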
SELECT *
FROM tablename
-- match on (bookid, date) so one book's max date cannot pick up another book's row
WHERE (bookid, DATE)
IN (
SELECT bookid, MAX( DATE )
FROM tablename
GROUP BY bookid
)
ORDER BY DATE DESC
CREATE VIEW BOOKLIST AS
SELECT m1.* FROM viewPoints m1 LEFT JOIN viewPoints m2
ON (m1.BookID = m2.BookID AND m1.Date < m2.Date)
WHERE m1.ID = 1 AND m2.Date IS NULL
ORDER BY m1.BookID;
SELECT DISTINCT ID, BookID, Date, points FROM BOOKLIST
WHERE Date BETWEEN 'start date' AND 'end date';

MySQL: Finding Maximum Value for a Column for each location and server

I have the following table:
++++++++++++++++++++++++++++++++++++++++++++++++++
| location | server | datetime | max_cpu |
++++++++++++++++++++++++++++++++++++++++++++++++++
| Chicago | 1 | 2013-05-01 00:00 | 10 |
| Chicago | 1 | 2013-05-01 01:00 | 15 |
| Chicago | 1 | 2013-05-01 02:00 | 11 |
| Chicago | 2 | 2013-05-01 00:00 | 8 |
| Chicago | 2 | 2013-05-01 01:00 | 12 |
| Chicago | 2 | 2013-05-01 02:00 | 13 |
| Atlanta | 1 | 2013-05-01 00:00 | 11 |
| Atlanta | 1 | 2013-05-01 01:00 | 12 |
| Atlanta | 1 | 2013-05-01 02:00 | 19 |
| Atlanta | 2 | 2013-05-01 00:00 | 21 |
| Atlanta | 2 | 2013-05-01 01:00 | 15 |
| Atlanta | 2 | 2013-05-01 02:00 | 17 |
I need the maximum CPU for each box in each location for a given day, e.g.
++++++++++++++++++++++++++++++++++++++++++++++++++
| location | server | datetime | max_cpu |
++++++++++++++++++++++++++++++++++++++++++++++++++
| Chicago | 1 | 2013-05-01 01:00 | 15 |
| Chicago | 2 | 2013-05-01 02:00 | 13 |
| Atlanta | 1 | 2013-05-01 02:00 | 19 |
| Atlanta | 2 | 2013-05-01 00:00 | 21 |
I know how to do this for a single criterion (e.g. just location) and tried to expand on that (see below), but it is not giving me the output I need.
SELECT a.location, a.server, a.datetime, a.max_cpu
FROM mytable as a INNER JOIN
(
SELECT location, server, max(max_cpu) as max_cpu
FROM mytable
GROUP BY location, server
)
AS b ON
(
a.location = b.location
AND a.server = b.server
AND a.max_cpu = b.max_cpu
)
You can do this by finding the max cpu and joining back to the original table.
It seems that you want the time of the max as well as the amount (this is not clearly stated in the text, but clear in the results):
select t.*
from mytable t join
(select location, server, DATE(datetime) as thedate, MAX(max_cpu) as maxmaxcpu
from mytable t
group by location, server, DATE(datetime)
) lsd
on lsd.location = t.location and lsd.server = t.server and
lsd.thedate = DATE(t.datetime) and lsd.maxmaxcpu = t.max_cpu
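This calculates the max CPU on each day and then joins back to get the appropriate row or rows in the original data. If there is more than one record with the max, you'll get all the records. If you only want one, you can add group by location, server, DATE(datetime) to the query (with ONLY_FULL_GROUP_BY enabled, the default since MySQL 5.7.5, the non-grouped columns would then need ANY_VALUE() or another aggregate).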
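A sketch of the per-day maximum joined back to the original rows, using the question's sample data in an in-memory SQLite database as a stand-in for MySQL. SQLite's date() function, like MySQL's DATE(), truncates 'YYYY-MM-DD HH:MM' to the day, so the query carries over almost verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (location TEXT, server INTEGER, datetime TEXT, max_cpu INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("Chicago", 1, "2013-05-01 00:00", 10), ("Chicago", 1, "2013-05-01 01:00", 15),
        ("Chicago", 1, "2013-05-01 02:00", 11), ("Chicago", 2, "2013-05-01 00:00", 8),
        ("Chicago", 2, "2013-05-01 01:00", 12), ("Chicago", 2, "2013-05-01 02:00", 13),
        ("Atlanta", 1, "2013-05-01 00:00", 11), ("Atlanta", 1, "2013-05-01 01:00", 12),
        ("Atlanta", 1, "2013-05-01 02:00", 19), ("Atlanta", 2, "2013-05-01 00:00", 21),
        ("Atlanta", 2, "2013-05-01 01:00", 15), ("Atlanta", 2, "2013-05-01 02:00", 17),
    ],
)

# Max per (location, server, day), then join back to recover the timestamp.
peaks = conn.execute("""
    SELECT t.location, t.server, t.datetime, t.max_cpu
    FROM mytable t
    JOIN (
        SELECT location, server, DATE(datetime) AS thedate, MAX(max_cpu) AS maxmaxcpu
        FROM mytable
        GROUP BY location, server, DATE(datetime)
    ) lsd
      ON lsd.location = t.location AND lsd.server = t.server
     AND lsd.thedate = DATE(t.datetime) AND lsd.maxmaxcpu = t.max_cpu
    ORDER BY t.location, t.server
""").fetchall()

for row in peaks:
    print(row)
```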
This better answers the "for a given day" part of the question. Since you can ignore the time, this avoids the DATE() workaround and is a tad simpler; DISTINCT removes exact duplicate rows, although ties at different times for the same server still appear once each:
select distinct a.location, a.server, a.datetime, a.max_cpu
from
mytable a
inner join (
select location, server, max(max_cpu) as max
from mytable
where
datetime >= ? -- start of day
and datetime < ? -- start of next day
group by location, server
) b on a.location=b.location and a.server=b.server and a.max_cpu = b.max
where
a.datetime >= ? -- start of day
and a.datetime < ? -- start of next day
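The ? placeholders above are bind parameters. Here is a sketch of binding them from Python's sqlite3 (standing in for a MySQL driver). The extra Chicago row on 2013-05-02 is made-up data, added only to show that the window filters it out; both halves of the query use the same window, so the two values are passed twice. The timestamps are stored as 'YYYY-MM-DD HH:MM' text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (location TEXT, server INTEGER, datetime TEXT, max_cpu INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("Chicago", 1, "2013-05-01 00:00", 10), ("Chicago", 1, "2013-05-01 01:00", 15),
        ("Chicago", 1, "2013-05-01 02:00", 11), ("Chicago", 2, "2013-05-01 00:00", 8),
        ("Chicago", 2, "2013-05-01 01:00", 12), ("Chicago", 2, "2013-05-01 02:00", 13),
        ("Atlanta", 1, "2013-05-01 00:00", 11), ("Atlanta", 1, "2013-05-01 01:00", 12),
        ("Atlanta", 1, "2013-05-01 02:00", 19), ("Atlanta", 2, "2013-05-01 00:00", 21),
        ("Atlanta", 2, "2013-05-01 01:00", 15), ("Atlanta", 2, "2013-05-01 02:00", 17),
        ("Chicago", 1, "2013-05-02 00:00", 99),  # hypothetical next-day row
    ],
)

QUERY = """
    SELECT DISTINCT a.location, a.server, a.datetime, a.max_cpu
    FROM mytable a
    JOIN (
        SELECT location, server, MAX(max_cpu) AS day_max
        FROM mytable
        WHERE datetime >= ? AND datetime < ?     -- start of day, start of next day
        GROUP BY location, server
    ) b ON a.location = b.location AND a.server = b.server AND a.max_cpu = b.day_max
    WHERE a.datetime >= ? AND a.datetime < ?
    ORDER BY a.location, a.server
"""

start, end = "2013-05-01 00:00", "2013-05-02 00:00"
day_peaks = conn.execute(QUERY, (start, end, start, end)).fetchall()
for row in day_peaks:
    print(row)
```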
Query (returns a single row per location, server and date only if max_cpu is unique within that group; ties return every matching row):
SQLFiddle example
SELECT t1.*
FROM Table1 t1
WHERE t1.max_cpu = (SELECT MAX(t2.max_cpu)
FROM Table1 t2
WHERE t2.location = t1.location
AND t2.server = t1.server
AND DATE(t2.datetime) = DATE(t1.datetime))
Result:
| LOCATION | SERVER | DATETIME | MAX_CPU |
------------------------------------------------------------
| Chicago | 1 | May, 01 2013 01:00:00+0000 | 15 |
| Chicago | 2 | May, 01 2013 02:00:00+0000 | 13 |
| Atlanta | 1 | May, 01 2013 02:00:00+0000 | 19 |
| Atlanta | 2 | May, 01 2013 00:00:00+0000 | 21 |