MySQL query, allowing duplicates

I have a table named Data (id, url). One of the APIs in my project returns a list of ids, which may contain duplicates. For the sake of this question, let's assume the list is (1, 1, 2, 3, 4, 4).
I am trying to find the urls associated with these ids.
My first, naive attempt was to use an IN clause:
SELECT url from Data where id in ( 1, 1, 2, 3, 4, 4);
This returns four rows, i.e. the urls for ids 1, 2, 3 and 4.
What I want is six rows, one for each id in the list (duplicates need to be retained).
I understand that the IN clause is not helpful in this situation. Could anyone please point me in the right direction?
I could fire a separate query for each id by iterating over the list, but that is a last resort for me.
UPDATE: Adding more details about the table:
mysql> desc Data;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| url | varchar(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> select * from Data;
+----+-------+
| id | url |
+----+-------+
| 1 | a.com |
| 2 | b.com |
| 3 | c.com |
| 4 | d.com |
+----+-------+
4 rows in set (0.00 sec)
mysql> select url from Data where id in(1,1,2,3,4,4);
+-------+
| url |
+-------+
| a.com |
| b.com |
| c.com |
| d.com |
+-------+
4 rows in set (0.00 sec)
What I want is:
+-------+
| url |
+-------+
| a.com |
| a.com |
| b.com |
| c.com |
| d.com |
| d.com |
+-------+

It's not pretty, but you can join against a derived table built with UNION ALL, which keeps the duplicates:
select url
from Data
inner join
( SELECT id FROM (
SELECT 1 as id UNION ALL
SELECT 1 as id UNION ALL
SELECT 2 as id UNION ALL
SELECT 3 as id UNION ALL
SELECT 4 as id UNION ALL
SELECT 4 as id
) as list_table ) as table2
on (Data.id = table2.id);
I found pretty much no way to select values from a list or join a table to a list directly, but you could check out this SO question.
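If the id list already lives in application code, another option is to fetch each distinct id once with IN and expand the duplicates on the client side. Here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for a MySQL connection, and the helper name urls_for_ids is made up for illustration. With a MySQL driver the placeholder is usually %s rather than ?.

```python
import sqlite3

def urls_for_ids(conn, ids):
    """Return one url per entry in ids, keeping duplicates and order."""
    distinct = sorted(set(ids))
    if not distinct:
        return []
    placeholders = ", ".join("?" for _ in distinct)
    rows = conn.execute(
        f"SELECT id, url FROM Data WHERE id IN ({placeholders})", distinct
    ).fetchall()
    url_by_id = dict(rows)
    # Expand back to the original list; ids missing from the table are skipped.
    return [url_by_id[i] for i in ids if i in url_by_id]

# In-memory stand-in for the Data table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO Data (id, url) VALUES (?, ?)",
    [(1, "a.com"), (2, "b.com"), (3, "c.com"), (4, "d.com")],
)

result = urls_for_ids(conn, [1, 1, 2, 3, 4, 4])
print(result)
```

This issues a single query no matter how many duplicates the list contains, at the cost of doing the expansion outside the database.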

This works well:
SELECT url
from table1
where id in ( 1, 1, 2, 3, 4, 4);
Demo
To drop the unique index on id:
alter table Data drop index PRI;
To drop the primary key:
ALTER TABLE Data DROP INDEX `PRIMARY`;

Related

how to omit mysql rows that match another table when neither is unique

I have two tables, one that holds potential items, the other holds completed items.
The potential item table currently contains the records that have also been added to the completed items table. I want to remove (either by deleting or selecting new results) the already completed items from the list of potential items.
In both tables, items may appear multiple times, and I only want to remove the number of items that are completed, not all that match.
The real data set is much larger, of course, but here are samples.
Potential items:
mysql> select * from stack;
+----------+------+------+
| stack_id | type | name |
+----------+------+------+
| 3 | a | aa |
| 4 | b | bb |
| 5 | c | cc |
| 6 | d | dd |
| 7 | a | aa |
| 8 | b | bb |
+----------+------+------+
6 rows in set (0.00 sec)
Completed items
mysql> select * from temp;
+----------+------+------+
| item_id | type | name |
+----------+------+------+
| 1 | a | aa |
| 2 | b | bb |
| 6 | b | bb |
+----------+------+------+
3 rows in set (0.00 sec)
The IDs between tables do not correlate, so they should be ignored as far as finding matches.
I want to omit 1 instance of a/aa and 2 of b/bb since those have been completed and exist in the other table.
When I try this:
mysql> select stack.* from stack where (type,name) not in (select type,name from temp);
I get this:
+----------+------+------+
| stack_id | type | name |
+----------+------+------+
| 5 | c | cc |
| 6 | d | dd |
+----------+------+------+
2 rows in set (0.03 sec)
But this omitted both instances of type="a" and name="aa" and I want to only omit one of them (since it only exists once in the completed items table)
How do I get this?
+----------+------+------+
| stack_id | type | name |
+----------+------+------+
| 5 | c | cc |
| 6 | d | dd |
| 7 | a | aa |
+----------+------+------+
I don't care which instance of a/aa is deleted (whether id=7 or id=3).
The best I've been able to come up with is to use PHP rather than MySQL to loop through each record in temp and delete from stack with LIMIT 1.
But I'd rather not have to run code for this. I'd like to do it in queries, since that works better in my workflow.
Thanks!
CREATE TABLE `test`.`stack` (
`stack_id` INT UNSIGNED NOT NULL,
`type` VARCHAR(45) NULL,
`name` VARCHAR(45) NULL,
PRIMARY KEY (`stack_id`));
CREATE TABLE `test`.`temp` (
`item_id` INT UNSIGNED NOT NULL,
`type` VARCHAR(45) NULL,
`name` VARCHAR(45) NULL,
PRIMARY KEY (`item_id`));
And then something like this:
select
min(stack_id), type, name
from stack _s
inner join
(
select min(item_id) item_id, type, name
from temp
group by type, name
) _t using(type, name)
group by _s.type, _s.name
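This gives you only the first stack row for each (type, name) pair that appears in temp:
+----------+------+------+
| stack_id | type | name |
+----------+------+------+
| 3 | a | aa |
| 4 | b | bb |
+----------+------+------+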
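For reference, the rule the question asks for (drop one stack row for each completed row with the same type and name) is a multiset difference. The sketch below spells out that logic in plain Python so the expected result is unambiguous. It is not a MySQL query, and the function name remaining_items is made up for illustration.

```python
from collections import Counter

def remaining_items(stack, completed):
    """Drop one stack row per completed row with the same (type, name)."""
    to_skip = Counter((t, n) for _id, t, n in completed)
    kept = []
    for stack_id, t, n in stack:
        if to_skip[(t, n)] > 0:
            to_skip[(t, n)] -= 1  # this row is accounted for by a completed item
        else:
            kept.append((stack_id, t, n))
    return kept

# Sample rows copied from the question.
stack = [(3, "a", "aa"), (4, "b", "bb"), (5, "c", "cc"),
         (6, "d", "dd"), (7, "a", "aa"), (8, "b", "bb")]
temp = [(1, "a", "aa"), (2, "b", "bb"), (6, "b", "bb")]

kept = remaining_items(stack, temp)
print(kept)
```

On the sample data this keeps the rows with stack_id 5, 6 and 7, which matches the result the question asks for.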

MySQL 5.7 JSON column update

I am using MySQL 5.7. I have a table with a JSON column.
MySQL [test_db]> select * from mytable;
+----+-------+---------------------+
| id | name | hobby |
+----+-------+---------------------+
| 1 | Rahul | {"Game": "Cricket"} |
| 2 | Sam | null |
+----+-------+---------------------+
Here, for row id = 2, I want to insert data. I ran:
update mytable set hobby = JSON_SET(hobby, '$.Game', 'soccer') where id = 2;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
It seems like the data was inserted properly, but when I checked:
MySQL [test_db]> select * from mytable;
+----+-------+---------------------+
| id | name | hobby |
+----+-------+---------------------+
| 1 | Rahul | {"Game": "Cricket"} |
| 2 | Sam | null |
+----+-------+---------------------+
the data had not been inserted. Can anybody give me a hint about what I am missing here?
Thanks.
hobby is NULL, and you can't set a property on NULL, so use the IF() function to convert NULL to an empty object first (or initialize hobby as an empty object instead of NULL):
UPDATE mytable
SET hobby = JSON_SET(IF(hobby IS NULL, '{}', hobby), '$.Game', 'soccer')
WHERE id = 2;
Alternatively, use COALESCE:
UPDATE mytable
SET hobby = JSON_SET(COALESCE(hobby, '{}'), '$.Game', 'soccer')
WHERE id = 2;
See dbfiddle here.

My mysql statement to query by primary key sometimes returns more than one row, so what happened?

My schema is this:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_name` varchar(10) NOT NULL,
`account_type` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
INSERT INTO user VALUES (1, "zhangsan", "premiumv"), (2, "lisi", "premiumv"), (3, "wangwu", "p"), (4, "maliu", "p"), (5, "hengqi", "p"), (6, "shuba", "p");
I have the following 6 rows in the table:
+----+-----------+--------------+
| id | user_name | account_type |
+----+-----------+--------------+
| 1 | zhangsan | premiumv |
| 2 | lisi | premiumv |
| 3 | wangwu | p |
| 4 | maliu | p |
| 5 | hengqi | p |
| 6 | shuba | p |
+----+-----------+--------------+
Here is the MySQL query I use to look up a row by id:
SELECT * FROM user WHERE id = floor(rand()*6) + 1;
I expect it to return one row, but the actual result is unpredictable. It returns 0 rows, 1 row, or sometimes more than one row. Can somebody help clarify this? Thanks!
You're testing each row against a different random number, so sometimes multiple rows will match. To fix this, calculate the random number once in a subquery.
SELECT u.*
FROM user AS u
JOIN (SELECT floor(rand()*6) + 1 AS r) AS r
ON u.id = r.r
This method of selecting a random row from a table seems like a poor design. If there are any gaps in the id sequence (which can happen easily -- MySQL doesn't guarantee that they'll always be sequential, and deleting rows will leave gaps) then it could return an empty result. The usual way to select a random row from a table is with:
SELECT *
FROM user
ORDER BY RAND()
LIMIT 1
The WHERE part must be evaluated for each row to see if there is a match. Because of this, the rand() function is evaluated for every row. Getting an inconsistent number of rows seems reasonable.
If you add LIMIT 1 to your query, the probability of returning rows from the end diminishes.
It's because the WHERE condition id = floor(rand()*6) + 1 is evaluated against every row in the table to see whether it matches. The random value can be different each time it is compared against a row.
You can test with a table that has same values in the column used in WHERE clause, and you can see the result:
select * from test;
+------+------+
| id | name |
+------+------+
| 1 | a |
| 2 | b |
| 1 | c |
| 2 | d |
| 1 | e |
| 2 | f |
+------+------+
select * from test where id = floor(rand()*2) + 1;
+------+------+
| id | name |
+------+------+
| 1 | a |
| 2 | d |
| 1 | e |
+------+------+
In the above example, the expression floor(rand()*2) + 1 returns 1 when matching against the first row (with name = 'a'), so that row is included in the result set. But it then returns 2 when matching against the fourth row (with name = 'd'), so that row is also included, even though its id differs from the id of the first row in the result set.
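The difference between the two behaviours can be mimicked outside the database. Below is a small Python sketch in which the function names where_per_row and where_once are made up for illustration and draw stands in for floor(rand()*6) + 1. It only models the evaluation order, not MySQL itself.

```python
import random

def where_per_row(ids, draw):
    """Mimic WHERE id = <expr>: the expression is re-evaluated for every row."""
    return [i for i in ids if i == draw()]

def where_once(ids, draw):
    """Mimic the subquery fix: evaluate the expression once, then compare."""
    r = draw()
    return [i for i in ids if i == r]

ids = [1, 2, 3, 4, 5, 6]
rng = random.Random(0)
draw = lambda: rng.randint(1, 6)  # stands in for floor(rand()*6) + 1

print(where_per_row(ids, draw))  # may hold zero, one, or several ids
print(where_once(ids, draw))     # exactly one id, because ids has no gaps
```

With per-row evaluation every row gets its own independent 1-in-6 chance of matching, which is why the row count varies from run to run.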

MySQL database relationship without an ID

Hi StackOverflow community,
I have these two tables:
tbl_users
ID_user (PRIMARY KEY)
Username (UNIQUE)
Password
...
tbl_posts
ID_post (PRIMARY KEY)
Owner (UNIQUE)
Description
...
Why does everybody always build database relationships with foreign keys on IDs? What if I want to relate Username to Owner instead of using ID_user in both tables?
Username is UNIQUE, and Owner is the username of the creator of the post.
Can it be done like that? Is there something to correct or improve? Maybe I have a misconception.
I would appreciate detailed and understandable answers.
Thank you in advance.
The reason is primarily data integrity. The argument concerning performance is a little misleading. While neither exhaustive nor definitive, I hope this little example will shed some light on that fact:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(i INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,s CHAR(12) NOT NULL UNIQUE
);
STEP1:
INSERT IGNORE INTO my_table (s)
SELECT CONCAT(CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
,CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
);
STEP2:
INSERT IGNORE INTO my_table (s)
SELECT CONCAT(CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
,CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97),CHAR((RAND()*26)+97)
)
FROM my_table;
[REPEAT STEP 2 SEVERAL TIMES]
SELECT COUNT(*) FROM my_table;
+----------+
| COUNT(*) |
+----------+
| 16384 |
+----------+
1 row in set (0.01 sec)
SELECT * FROM my_table ORDER BY i LIMIT 12;
+----+------------+
| i | s |
+----+------------+
| 1 | kkxeehxsvy |
| 2 | iuyhrk{vaq |
| 3 | ngpedelooc |
| 4 | irkbyqgkhc |
| 6 | yqkcifcxdz |
| 7 | sgezlgvjjq |
| 8 | blavbvxbnl |
| 9 | wdbtqvgvgt |
| 13 | pakzpbnhxr |
| 14 | vpoy{gdwyd |
| 15 | ezlhz{drwg |
| 16 | ncwcwbpudh |
+----+------------+
SELECT * FROM my_table x JOIN my_table y ON y.i < x.i ORDER BY x.i,y.i LIMIT 1;
+---+------------+---+------------+
| i | s | i | s |
+---+------------+---+------------+
| 2 | iuyhrk{vaq | 1 | kkxeehxsvy |
+---+------------+---+------------+
1 row in set (1 min 22.60 sec)
SELECT * FROM my_table x JOIN my_table y ON y.s < x.s ORDER BY x.s,y.s LIMIT 1;
+-------+------------+------+------------+
| i | s | i | s |
+-------+------------+------+------------+
| 21452 | aabetdlvum | 6072 | aabdnegtav |
+-------+------------+------+------------+
1 row in set (1 min 13.59 sec)
So, we have two queries doing essentially the same thing (a comparison of 270 million values). The first joins the table to itself on an integer value. The second joins the table to itself on a string value. Both columns are indexed. As you can see, in this example, the string join actually performs better than the integer join - even though the hit on the CPU may actually be greater!
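On the question of whether Username and Owner can be related directly: a foreign key does not have to point at the numeric ID, it can reference any uniquely indexed column. The sketch below shows the integrity check in action, using Python's sqlite3 module as a stand-in for MySQL/InnoDB. The table and column names follow the question, but the UNIQUE on Owner is dropped here (an assumption, so that one user can own several posts), and the helper try_insert_post is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE tbl_users ("
    " ID_user INTEGER PRIMARY KEY,"
    " Username TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE tbl_posts ("
    " ID_post INTEGER PRIMARY KEY,"
    " Owner TEXT NOT NULL REFERENCES tbl_users (Username),"
    " Description TEXT)"
)
conn.execute("INSERT INTO tbl_users (Username) VALUES ('alice')")
conn.execute("INSERT INTO tbl_posts (Owner, Description) VALUES ('alice', 'first post')")

def try_insert_post(owner):
    """Return True if the post was accepted, False if the FK rejected it."""
    try:
        conn.execute(
            "INSERT INTO tbl_posts (Owner, Description) VALUES (?, ?)",
            (owner, "orphan?"),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_post("alice"))
print(try_insert_post("nobody"))
```

The trade-off with a natural key like this is that renaming a user means every referencing row has to change too, which is one reason surrogate IDs are the common default.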

MySQL, how to merge duplicate table entries

I have a large table with ~14M entries. The table type is MyISAM and not InnoDB.
Unfortunately, I have some duplicate entries in this table, which I found with the following query:
SELECT device_serial, temp, tstamp, COUNT(*) c FROM up_logs GROUP BY device_serial, temp, tstamp HAVING c > 1
To avoid these duplicates in the future, I want to convert my current index to a unique constraint using this SQL statement:
ALTER TABLE up_logs DROP INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL,
ADD UNIQUE INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL ( `tstamp` , `device_serial` );
But before that, I need to clean up my duplicates!
My question is: how can I keep only one entry from each set of duplicates? Keep in mind that my table contains 14M entries, so I would like to avoid loops if possible.
Any comments are welcome!
Creating a new unique key over the columns that need to be unique, using ALTER IGNORE, removes the duplicates automatically:
ALTER IGNORE TABLE `table_name`
ADD UNIQUE KEY `key_name`(`column_1`,`column_2`);
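The IGNORE keyword keeps the statement from aborting on the first duplicate-key error. Instead, for each set of rows that collide on the unique key, only the first row is kept and the rest are deleted. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions.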
MySQL allows subqueries in UPDATE/DELETE statements, but not if they refer to the table being modified, so I'd create a copy of the original table first. Then:
DELETE FROM original_table
WHERE id NOT IN(
SELECT MIN(id) FROM copy_table
GROUP BY column1, column2, ...
);
But I could imagine that copying a table with 14M entries takes some time... selecting the items to keep when copying might make it faster:
INSERT INTO copy_table
SELECT * FROM original_table
GROUP BY column1, column2, ...;
and then
DELETE FROM original_table
WHERE id NOT IN(
SELECT id FROM copy_table
);
It has been some time since I last used MySQL and SQL in general, so I'm quite sure there is something with better performance, but this should work ;)
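The "keep the lowest id per group, delete the rest" idea can be tried out on a toy table. The sketch below uses Python's sqlite3 module as a stand-in for MySQL. It assumes up_logs has an auto-increment id column (the question does not show one), and the sample rows are invented. SQLite accepts the self-referencing subquery directly, whereas MySQL rejects it with error 1093 unless the subquery is wrapped in a derived table or run against a copy, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE up_logs ("
    " id INTEGER PRIMARY KEY,"
    " device_serial TEXT, temp REAL, tstamp TEXT)"
)
conn.executemany(
    "INSERT INTO up_logs (device_serial, temp, tstamp) VALUES (?, ?, ?)",
    [
        ("A", 20.5, "2012-01-01 00:00:00"),
        ("A", 20.5, "2012-01-01 00:00:00"),  # duplicate of the first row
        ("B", 19.0, "2012-01-01 00:00:00"),
        ("A", 20.5, "2012-01-01 00:00:00"),  # another duplicate
    ],
)

# Keep the lowest id in each (device_serial, temp, tstamp) group.
conn.execute(
    "DELETE FROM up_logs WHERE id NOT IN ("
    " SELECT MIN(id) FROM up_logs"
    " GROUP BY device_serial, temp, tstamp)"
)

remaining = conn.execute(
    "SELECT id, device_serial FROM up_logs ORDER BY id"
).fetchall()
print(remaining)
```

After the delete, one row per group is left, so a unique index on the grouping columns can be added without collisions.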
This is how you can delete duplicate rows. I'll show my example, and you'll need to adapt it to your code. I have an actors table (actor_2) with an ID, and I want to delete the rows with a repeated first_name:
mysql> select actor_id, first_name from actor_2;
+----------+-------------+
| actor_id | first_name |
+----------+-------------+
| 1 | PENELOPE |
| 2 | NICK |
| 3 | ED |
....
| 199 | JULIA |
| 200 | THORA |
+----------+-------------+
200 rows in set (0.00 sec)
-Now I use a user variable called @a to return the ID when a row has the same first_name as the previous row (NULL if it is not a repeat).
mysql> select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name;
+---------------+----------------+
| first_names | @a:=first_name |
+---------------+----------------+
| NULL | ADAM |
| 71 | ADAM |
| NULL | AL |
| NULL | ALAN |
| NULL | ALBERT |
| 125 | ALBERT |
| NULL | ALEC |
| NULL | ANGELA |
| 144 | ANGELA |
...
| NULL | WILL |
| NULL | WILLIAM |
| NULL | WOODY |
| 28 | WOODY |
| NULL | ZERO |
+---------------+----------------+
200 rows in set (0.00 sec)
-Now we can get only the IDs of the duplicates:
mysql> select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1;
+-------------+
| first_names |
+-------------+
| NULL |
| 71 |
| NULL |
...
| 28 |
| NULL |
+-------------+
200 rows in set (0.00 sec)
-The final step, let's DELETE!
mysql> delete from actor_2 where actor_id in (select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1);
Query OK, 72 rows affected (0.01 sec)
-Now let's check our table:
mysql> select count(*) from actor_2 group by first_name;
+----------+
| count(*) |
+----------+
| 1 |
| 1 |
| 1 |
...
| 1 |
+----------+
128 rows in set (0.00 sec)
It works. If you have any questions, write back.
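The user-variable trick above boils down to "sort by name, then flag every row whose name equals the previous row's name". The plain-Python sketch below shows that logic on its own. The function name duplicate_ids and the sample rows are made up for illustration. In MySQL itself, the order in which user-variable assignments inside a SELECT are evaluated is not guaranteed, so the SQL version deserves a test on your server version.

```python
def duplicate_ids(rows):
    """rows: (actor_id, first_name) pairs. Return the ids to delete, keeping
    the first row seen for each first_name after sorting by name."""
    previous = None  # plays the role of @a
    to_delete = []
    for actor_id, first_name in sorted(rows, key=lambda r: r[1]):
        if first_name == previous:
            to_delete.append(actor_id)
        previous = first_name
    return to_delete

# Hypothetical sample; only the names are borrowed from the output above.
rows = [(1, "PENELOPE"), (2, "NICK"), (3, "ED"), (4, "NICK"), (5, "ED"), (6, "ED")]
print(duplicate_ids(rows))
```

Every id returned belongs to a row that repeats an earlier first_name, so deleting those ids leaves one row per name.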