How to retrieve odd rows from the table?
In the base table, each Cr_id always appears twice.
Base table
I want a SELECT statement that retrieves only the rows with c_id = 1 where that row is the first one for its Cr_id, as shown in the output table.
Output table
Comparing the base table with the output table should make clear what I want. Thanks.
Just testing for the minimum date per cr_id should be enough:
drop table if exists t;
create table t(c_id int,cr_id int,dt date);
insert into t values
(1,56,'2020-12-17'),(56,56,'2020-12-17'),
(1,8,'2020-12-17'),(56,8,'2020-12-17'),
(123,78,'2020-12-17'),(1,78,'2020-12-18');
select c_id,cr_id,dt
from t
where c_id = 1 and
dt = (select min(dt) from t t1 where t1.cr_id = t.cr_id);
+------+-------+------------+
| c_id | cr_id | dt         |
+------+-------+------------+
|    1 |    56 | 2020-12-17 |
|    1 |     8 | 2020-12-17 |
+------+-------+------------+
2 rows in set (0.002 sec)
What you're looking for could be "partition by", at least if you're working on mssql.
(In the future, please include more background, such as which database you are using; SQL dialects differ.)
https://codingsight.com/grouping-data-using-the-over-and-partition-by-functions/
I have an old query lying around that can put a sorting index on data that lacks one, although the underlying reason is 99.9% sure to be bad data design.
Typically I use this query to remove bad data, but you may rewrite it as a join instead, so that you can identify the data you need.
The reason I'm not putting that rewritten answer here is to point out that bad data design results in more work when reading the data afterwards, which seems to be the real root cause here.
DELETE t
FROM
(
SELECT ROW_NUMBER () OVER (PARTITION BY column_1 ,column_2, column_3 ORDER BY column_1,column_2 ,column_3 ) AS Seq
FROM Table
)t
WHERE Seq > 1
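As a rough illustration of turning that numbering query into a SELECT, here is a minimal sketch using Python's sqlite3 module as a stand-in database (an assumption; the question does not name its engine, and window functions need SQLite 3.25 or newer). The helper names make_sample_db and first_rows_for are hypothetical, the sample rows are the ones from the first answer, and ordering by SQLite's rowid is an assumed way of defining "first".

```python
import sqlite3

# Sample rows taken from the first answer: (c_id, cr_id, dt).
SAMPLE_ROWS = [
    (1, 56, "2020-12-17"), (56, 56, "2020-12-17"),
    (1, 8, "2020-12-17"), (56, 8, "2020-12-17"),
    (123, 78, "2020-12-17"), (1, 78, "2020-12-18"),
]


def make_sample_db():
    """Build an in-memory table t holding the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c_id INTEGER, cr_id INTEGER, dt TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def first_rows_for(conn, c_id):
    """Return rows with the given c_id that are the first row of their cr_id.

    "First" is assumed to mean insertion order, approximated by SQLite's rowid.
    """
    sql = """
        SELECT c_id, cr_id, dt
        FROM (
            SELECT c_id, cr_id, dt,
                   ROW_NUMBER() OVER (PARTITION BY cr_id ORDER BY rowid) AS seq
            FROM t
        ) AS numbered
        WHERE seq = 1 AND c_id = ?
        ORDER BY cr_id
    """
    return conn.execute(sql, (c_id,)).fetchall()
```

Calling first_rows_for(make_sample_db(), 1) keeps the rows for cr_id 8 and 56 and drops the one for cr_id 78, whose first row belongs to c_id 123.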
Briefly: database imported from foreign source, so I cannot prevent duplicates, I can only prune and clean the database.
Foreign db changes daily, so, I want to automate the pruning process.
It resides on:
MariaDB v10.4.6, managed predominantly through the phpMyAdmin GUI v4.9.0.1 (both pretty much up to date as of this writing).
This is a radio browsing database.
It has multiple columns, but only a few are important to me:
StationID (a unique entry number; because of this primary key the db does not consider new entries duplicates, every row is unique)
There are no row numbers.
Name, url, home-page, country, etc.
I want to remove entries with duplicated urls, based on the following:
a duplicated url may have a country attached, but some country values are NULL (empty),
so I want to remove all duplicates except one that contains a country name, if there is such an entry. If there is none, keep just one row for the url, regardless of name (names are multilingual, so some duplicated urls also have different names, which I do not care about).
StationID (unique number, but not consecutive, also this is primary db key)
Name (variable, least important)
url (variable, but I do want to remove the duplicates)
country (variable, frequently NULL/empty, I want to eliminate those with empty entries as much as possible, if possible)
One url has to stay by any means (not to be deleted)
I have tried a multitude of queries; some work for SELECT but not for DELETE, and some hang my machine when executed. Here are some queries I tried (remember I use MariaDB, not Oracle or MS SQL):
SELECT * from `radio`.`Station`
WHERE (`radio`.`Station`.`Url`, `radio`.`Station`.`Name`) IN (
SELECT `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
FROM `radio`.`Station`
GROUP BY `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
HAVING COUNT(*) > 1)
This one should show all duplicated entries (not just one row per group), but it hangs my machine.
This query gets me as close as possible:
SELECT *
FROM `radio`.`Station`
WHERE `radio`.`Station`.`StationID` NOT IN (
SELECT MAX(`radio`.`Station`.`StationID`)
FROM `radio`.`Station`
GROUP BY `radio`.`Station`.`Url`,`radio`.`Station`.`Name`,`radio`.`Station`.`Country`)
However this query lists more entries:
SELECT *, COUNT(`radio`.`Station`.`Url`) FROM `radio`.`Station` GROUP BY `radio`.`Station`.`Name`,`radio`.`Station`.`Url` HAVING (COUNT(`radio`.`Station`.`Url`) > 1);
But all of these queries group them and display only one row.
I also tried UNION, INNER JOIN, but failed.
I also tried WITH cte AS..., but phpMyAdmin does not accept this query, and the mariadb CLI did not accept it either.
I also tried something of this kind, published at oracle blog, which did not work, and I really had no clue what was what in this function:
select *
from (
select f.*,
count(*) over (
partition by `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
) ct
from `radio`.`Station` f
)
where ct > 1
I did not know what f.* was, and the query did not accept ct.
Given
drop table if exists radio;
create table radio
(stationid int,name varchar(3),country varchar(3),url varchar(3));
insert into radio values
(1,'aaa','uk','a/b'),
(2,'bbb','can','a/b'),
(3,'bbb',null,'a/b'),
(4,'bbb',null,'b/b'),
(5,'bbb',null,'b/b');
You could give the null countries a unique value (using coalesce), fortunately stationid is unique so:
select t.stationid,t.name,t.country,t.url
from radio t
join
(select url,max(coalesce(country,stationid)) cntry from radio t group by url) s
on s.url = t.url and s.cntry= coalesce(t.country,t.stationid);
Yields
+-----------+------+---------+------+
| stationid | name | country | url  |
+-----------+------+---------+------+
|         1 | aaa  | uk      | a/b  |
|         5 | bbb  | NULL    | b/b  |
+-----------+------+---------+------+
2 rows in set (0.00 sec)
Translated to a delete
delete t from radio t
join
(select url,max(coalesce(country,stationid)) cntry from radio t group by url) s
on s.url = t.url and s.cntry <> coalesce(t.country,t.stationid);
MariaDB [sandbox]> select * from radio;
+-----------+------+---------+------+
| stationid | name | country | url  |
+-----------+------+---------+------+
|         1 | aaa  | uk      | a/b  |
|         5 | bbb  | NULL    | b/b  |
+-----------+------+---------+------+
2 rows in set (0.00 sec)
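For comparison, the same pruning rule can be written without mixing country and stationid in one coalesce. The following is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than MariaDB, the helper names are hypothetical, and ties between several rows that all have a country are broken by the highest stationid (a choice the question does not specify, so it keeps station 2 where the query above keeps station 1).

```python
import sqlite3

# Same sample data as above: (stationid, name, country, url).
RADIO_ROWS = [
    (1, "aaa", "uk", "a/b"),
    (2, "bbb", "can", "a/b"),
    (3, "bbb", None, "a/b"),
    (4, "bbb", None, "b/b"),
    (5, "bbb", None, "b/b"),
]


def make_radio_db():
    """Build an in-memory radio table with the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE radio (stationid INTEGER PRIMARY KEY, name TEXT, country TEXT, url TEXT)"
    )
    conn.executemany("INSERT INTO radio VALUES (?, ?, ?, ?)", RADIO_ROWS)
    return conn


def prune_duplicate_urls(conn):
    """Keep exactly one row per url, preferring rows that have a country.

    A row survives only if no "better" row exists for the same url. Better means:
    it has a country while this row has none, or both rows agree on having or
    lacking a country and the other row has the higher stationid (assumed tie-break).
    """
    conn.execute(
        """
        DELETE FROM radio
        WHERE stationid NOT IN (
            SELECT r.stationid
            FROM radio AS r
            WHERE NOT EXISTS (
                SELECT 1
                FROM radio AS b
                WHERE b.url = r.url
                  AND ( (b.country IS NOT NULL AND r.country IS NULL)
                     OR ((b.country IS NULL) = (r.country IS NULL)
                         AND b.stationid > r.stationid) )
            )
        )
        """
    )
    return conn.execute(
        "SELECT stationid, country, url FROM radio ORDER BY stationid"
    ).fetchall()
```

Because the tie-break is explicit, two rows with the same url and the same country cannot both survive, which the max(coalesce(...)) join does not guarantee.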
Fix 2 problems at once:
Dup rows already in table
Dup rows can still be put in table
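Do this for each table (new, real and old are placeholder table names, x and y stand for the columns that must be unique together; REAL is a reserved word, so it needs backticks if used literally):
CREATE TABLE new LIKE `real`;
ALTER TABLE new ADD UNIQUE(x,y); -- will prevent future dups
INSERT IGNORE INTO new -- IGNORE dups
SELECT * FROM `real`;
RENAME TABLE `real` TO old, new TO `real`;
DROP TABLE old;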
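The copy-and-swap steps above can be tried out in miniature. This sketch assumes SQLite via Python's sqlite3 module, where INSERT OR IGNORE plays the role of MariaDB's INSERT IGNORE and ALTER TABLE ... RENAME replaces RENAME TABLE; the table items, its columns and the helper names are made up for illustration.

```python
import sqlite3


def make_items_db():
    """In-memory table with one duplicated (x, y) pair (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (x INTEGER, y INTEGER, note TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, 2, "first"), (1, 2, "second"), (3, 4, "third")],
    )
    conn.commit()
    return conn


def rebuild_with_unique_key(conn):
    """Copy items into a new table with UNIQUE(x, y), then swap the tables."""
    conn.executescript(
        """
        CREATE TABLE items_new (x INTEGER, y INTEGER, note TEXT, UNIQUE (x, y));
        -- Rows that would violate the unique key are skipped silently,
        -- so the first row seen for each (x, y) pair is the one kept.
        INSERT OR IGNORE INTO items_new
            SELECT x, y, note FROM items ORDER BY rowid;
        ALTER TABLE items RENAME TO items_old;
        ALTER TABLE items_new RENAME TO items;
        DROP TABLE items_old;
        """
    )
```

After the swap the unique key stays on the table, so later imports can use the same ignore-on-conflict insert and never reintroduce a duplicate. Note that this keeps whichever duplicate is read first, not necessarily the one with the most complete data.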
I need to delete around 300,000 duplicates in my database. I want to check the Card_id column for duplicates, then check for duplicate timestamps. Then delete one copy and keep one. Example:
| Card_id | Time |
| 1234 | 5:30 |
| 1234 | 5:45 |
| 1234 | 5:30 |
| 1234 | 5:45 |
So remaining data would be:
| Card_id | Time |
| 1234 | 5:30 |
| 1234 | 5:45 |
I have tried several different delete statements, and merging into a new table but with no luck.
UPDATE: Got it working!
Alright after many failures I got this to work for DB2.
delete from(
select card_id, time, row_number() over (partition by card_id, time) rn
from card_table) as A
where rn > 1
rn increments when there are duplicates for card_id and time. The duplicated, or second rn, will be deleted.
I strongly suggest you take this approach:
create temporary table tokeep as
select distinct card_id, time
from t;
truncate table t;
insert into t(card_id, time)
select *
from tokeep;
That is, store the data you want. Truncate the table, and then regenerate it. By truncating the table, you get to keep triggers and permissions and other things linked to the table.
This approach should also be faster than deleting many, many duplicates.
If you are going to do that, you ought to insert a proper id as well:
create temporary table tokeep as
select distinct card_id, time
from t;
truncate table t;
alter table t add column id int auto_increment primary key;
insert into t(card_id, time)
select *
from tokeep;
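A small end-to-end sketch of this keep, empty and reload approach, assuming SQLite through Python's sqlite3 module as a stand-in (SQLite has no TRUNCATE, so DELETE FROM t is used instead). The helper names and the in-memory table are illustrative only.

```python
import sqlite3


def make_card_db():
    """In-memory table t holding the duplicated sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (card_id INTEGER, time TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45")],
    )
    conn.commit()
    return conn


def dedupe_by_rebuild(conn):
    """Store the distinct rows, empty the table, then load the distinct rows back."""
    conn.execute(
        "CREATE TEMP TABLE tokeep AS SELECT DISTINCT card_id, time FROM t"
    )
    with conn:  # commit the delete and reload together, or roll both back
        conn.execute("DELETE FROM t")  # stand-in for TRUNCATE TABLE t
        conn.execute(
            "INSERT INTO t (card_id, time) SELECT card_id, time FROM tokeep"
        )
    conn.execute("DROP TABLE tokeep")
    return conn.execute("SELECT card_id, time FROM t ORDER BY time").fetchall()
```

Running the delete and the reload in one transaction is a design choice of this sketch; on engines where TRUNCATE cannot be rolled back, keep the temporary table until the reload has been verified.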
If you have no primary key or candidate key, there is probably no way to do this with a single statement. Try the solution below.
Create table with duplicates
select Card_id,Time
into COPY_YourTable
from YourTable
group by Card_id,Time
having count(1)>1
Remove duplicates using COPY_YourTable
delete from YourTable
where exists
(
select 1
from COPY_YourTable c
where c.Card_id = YourTable.Card_id
and c.Time = YourTable.Time
)
Copy data without duplicates
insert into YourTable
select Card_id,Time
from COPY_YourTable
I have a database called test with tables called x, y and z.
How do I select from x, y and z, which each have a column called date, and check whether a particular date is present?
Is there any built-in function that does this?
Update:
Select the date column from all tables in a database called test.
Thanks in advance!!
As far as I know, in SQL you cannot 'select a table'; you can select some column(s) from one or many tables at once. The result of such a query is another (temporary) table that you retrieve the data from.
Please be more specific about what exactly you want to do (e.g.: "I want to select a column 'z' from table 'tableA' and column 'y' from table 'tableB'") - then I'm sure your question has a pretty simple answer :)
SELECT x.date AS x_date, y.date AS y_date, z.date AS z_date FROM x,y,z;
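That produces a result with one column per table (note that listing the tables with commas is a cross join, so every row of x is combined with every row of y and z):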
+---------+---------+---------+
| x_date  | y_date  | z_date  |
+---------+---------+---------+
|         |         |         |
|         |         |         |
+---------+---------+---------+
Alternatively, you can get everything in one column by issuing this query:
SELECT date FROM x
UNION ALL
SELECT date FROM y
UNION ALL
SELECT date FROM z;
That produces a result:
+-------+
| date  |
+-------+
|       |
|       |
+-------+
In the example above you would also get duplicate values in the single column. If you want to avoid duplicates, replace 'UNION ALL' with 'UNION'.
I'm still not sure I understood what you really want to achieve, but I hope that helps.
Also take a look at:
http://www.w3schools.com/sql/sql_union.asp
http://www.sql-tutorial.net/SQL-JOIN.asp
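To check for one particular date, the UNION ALL form can be extended with a literal naming the source table and a WHERE clause per branch. The sketch below assumes SQLite via Python's sqlite3 module and invents the sample dates and the helper names; only the table names x, y, z and the column name date come from the question.

```python
import sqlite3


def make_test_db():
    """In-memory stand-in for the test database with tables x, y and z."""
    conn = sqlite3.connect(":memory:")
    for table in ("x", "y", "z"):
        conn.execute(f"CREATE TABLE {table} (date TEXT)")
    # Made-up sample dates for illustration.
    conn.executemany("INSERT INTO x VALUES (?)", [("2011-01-01",), ("2011-01-02",)])
    conn.executemany("INSERT INTO y VALUES (?)", [("2011-01-03",)])
    conn.executemany("INSERT INTO z VALUES (?)", [("2011-01-02",), ("2011-01-02",)])
    return conn


def tables_containing(conn, day):
    """Return the sorted names of the tables that hold at least one row for day."""
    sql = """
        SELECT 'x' AS source FROM x WHERE date = ?
        UNION ALL
        SELECT 'y' FROM y WHERE date = ?
        UNION ALL
        SELECT 'z' FROM z WHERE date = ?
    """
    rows = conn.execute(sql, (day, day, day)).fetchall()
    return sorted({source for (source,) in rows})
```

With the sample data, looking up 2011-01-02 reports tables x and z; a date found nowhere yields an empty list.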
I would like to run a query on a table whose content looks like this:
id | col1 | col2 | col3
-----------------------
1 | i_11 | i_12 | i_13
2 | i_21 | i_22 | i_23
3 | i_31 | i_32 | i_33
.. | ... | ... | ...
SELECT col1 FROM table WHERE id IN
(SELECT id-1, id+1 FROM table WHERE col1='xxx' AND col2='yyy' AND col3='zzz')
The aim is to get an interval [id-1, id+1] based on the id column which returns the content stored in col1 for id-1 and id+1. The subquery works but I guess I have a problem with the query itself, since I'm having an error "Operand should contain only one column". I understand it, but I don't see any other way to do it in one query ?
I'm quite sure there's a pretty easy solution but I can't figure it out for the moment, even after having carefully read other posts about multiples columns' subqueries...
Thank you for any help :-)
The only way I can think to do it right now is like this:
SELECT col1
FROM table T
WHERE id BETWEEN (SELECT id FROM table WHERE col1='xxx' AND col2='yyy' AND col3='zzz') -1
and (SELECT id FROM table WHERE col1='xxx' AND col2='yyy' AND col3='zzz') +1
Your problem is that the subquery returns the two values as two columns of a single row rather than as two rows of one column. IN compares id against a one-column set, so it cannot treat 1,3 in a single row as a set of two items. A cast may also be needed.
This should work.
SELECT col1 FROM table WHERE id in
(
select cast(id as int) -1 from table where col1='i_21'
union
select cast(id as int) +1 from table where col1='i_21'
)
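Another way to express the same idea is a self-join, which avoids repeating the subquery. This is a sketch only, assuming SQLite via Python's sqlite3 module; the table is called tbl here because table is a reserved word, and the helper names are hypothetical.

```python
import sqlite3


def make_grid_db():
    """In-memory copy of the sample table from the question, named tbl."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [
            (1, "i_11", "i_12", "i_13"),
            (2, "i_21", "i_22", "i_23"),
            (3, "i_31", "i_32", "i_33"),
        ],
    )
    return conn


def neighbour_values(conn, col1, col2, col3):
    """Return col1 of the rows whose id is one below or one above the matching row."""
    sql = """
        SELECT n.col1
        FROM tbl AS m                                   -- m: the matching row
        JOIN tbl AS n ON n.id IN (m.id - 1, m.id + 1)   -- n: its neighbours
        WHERE m.col1 = ? AND m.col2 = ? AND m.col3 = ?
        ORDER BY n.id
    """
    return [value for (value,) in conn.execute(sql, (col1, col2, col3))]
```

Unlike the BETWEEN version, this leaves out the matching row itself, and it still works if several rows match the three column values.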
I've got a really big problem, and it stems from a table with 50k+ records.
This table looks something like this (+15 or so more columns that aren't too important):
table_1
date | name | email | num_x | num_y
I also have another table ON A DIFFERENT DB (same server) that looks something like this (+1 not important column):
table_2
name | comment | status
table_1 is updated daily with new entries (it is a feed table for use on other projects), which means there are a lot of repeat "name" rows. This is intended. table_2 contains comments and status notes about "name"s, but no repeat "name"s.
I need to write a query that will select all "name"s from table_1 where the total of all num_x + num_y > X. So, for example, if this were a few rows...
2010-11-19 | john.smith | john.smith#example.com | 20 | 20
2010-11-19 | joel.schmo | joel.schmo#example.com | 10 | 10
2010-11-18 | john.smith | john.smith#example.com | 20 | 20
2010-11-18 | joel.schmo | joel.schmo#example.com | 10 | 10
.. and I needed to find all "name"s with total num_x + num_y > 50, then I'd return
john.smith | john.smith#example.com | 80 . I would also return john.smith's status and comment from the other DB.
I wrote a query that I believe works fine, but it's problematic because it takes forever and a day to run. I also successfully retrieve records from the other db (I don't have that listed below).
SELECT
name,
email,
SUM(num_x + num_y) AS total
FROM
table_1
GROUP BY
name
HAVING
SUM(num_x + num_y) > 100
ORDER BY
total ASC
Is there a better way to go about this?
Thanks everyone!
Dylan
Why do you repeat the sum in HAVING rather than reuse total? Unless I'm missing something, there is no difference in the results, and avoiding the second sum would save time.
If you can skip the ORDER BY clause and don't mind the slightly different select, I think you'll get some speed-up by splitting up the sum. I have a small database and have tested that it's a valid query and that the results are correct, but it's not nearly large enough to quantify the performance difference.
SELECT
name,
email,
SUM(num_x) as sumX, SUM(num_y) AS sumY
FROM
table_1
GROUP BY
name
HAVING
sumX + sumY > 100
An index on name is a no-brainer. That's the simplest thing that will speed it up.
Create an index for name, this will improve the performance:
ALTER TABLE `table_1` ADD INDEX (`name`);
But redesigning your databases would be my recommendation. Create an artificial key for names, something like id_name | name | email, with id_name being an auto_increment integer; this way you'll get better performance.
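To make the index suggestion concrete, here is a minimal sketch that builds the feed table, adds the index on name and runs the grouped query. It assumes SQLite through Python's sqlite3 module rather than MySQL, uses the four sample rows from the question, and the index and helper names are invented for illustration. It says nothing about the actual speed-up on 50k+ rows, which would have to be measured on the real data.

```python
import sqlite3


def make_feed_db():
    """In-memory table_1 with the question's sample rows and an index on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_1 (date TEXT, name TEXT, email TEXT, num_x INTEGER, num_y INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)",
        [
            ("2010-11-19", "john.smith", "john.smith#example.com", 20, 20),
            ("2010-11-19", "joel.schmo", "joel.schmo#example.com", 10, 10),
            ("2010-11-18", "john.smith", "john.smith#example.com", 20, 20),
            ("2010-11-18", "joel.schmo", "joel.schmo#example.com", 10, 10),
        ],
    )
    # The index lets the engine read rows grouped by name without a separate sort.
    conn.execute("CREATE INDEX idx_table_1_name ON table_1 (name)")
    return conn


def names_over_total(conn, threshold):
    """Return (name, email, total) for names whose summed num_x + num_y exceeds threshold."""
    sql = """
        SELECT name, MIN(email) AS email, SUM(num_x + num_y) AS total
        FROM table_1
        GROUP BY name
        HAVING SUM(num_x + num_y) > ?
        ORDER BY total ASC
    """
    return conn.execute(sql, (threshold,)).fetchall()
```

MIN(email) is used only so that every selected column is either grouped or aggregated, which keeps the query valid on engines that reject bare columns in a grouped select.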
Try:
SELECT
name,
email,
num_x + num_y AS total
FROM
table_1
WHERE
num_x + num_y > 100
ORDER BY
total ASC
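Just getting rid of the grouping should make quite a significant difference. Note that this changes the meaning: it filters individual rows where num_x + num_y > 100 instead of totalling all rows per name.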
Maybe change the database so that the sum is computed and stored every time you change x or y, but it really depends on how often you change them...
Otherwise you can try to compute the sum only once...
But I don't see why you do an ORDER BY on only one table if you've got a primary key...