Get difference of unique and duplicate set - SQL - mysql

I have two similar data sets (table, view, CTE), one of which contains unique rows (guaranteed by DISTINCT or GROUP BY), the second contains duplicates (no primary key constraint involved).
How can I get the difference of the two data sets in MySQL 8, so that I get only the rows that are duplicated in the second set?
Say I have a table called Animals, which stores NAME and SPECIES.
+---------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | NULL | |
| NAME | varchar(255) | YES | | NULL | |
| SPECIES | varchar(255) | YES | | NULL | |
+---------+--------------+------+-----+---------+-------+
ANIMALS
+----+---------+-------------+
| ID | NAME | SPECIES |
+----+---------+-------------+
| 1 | Lion | Carnivorous |
| 2 | Giraffe | Herbivores |
| 3 | Zebra | Herbivores |
| 4 | Trutle | Herbivores |
| 5 | Tiger | Carnivorous |
| 6 | Bear | Carnivorous |
+----+---------+-------------+
With that in place, I define the view DUPLICATED.
CREATE VIEW DUPLICATED AS
SELECT * FROM ANIMALS
UNION ALL
SELECT * FROM ANIMALS WHERE SPECIES = 'Carnivorous';
(Duplicates every Carnivorous in the set)
DUPLICATED
+---------+-------------+-----+
| NAME | SPECIES | CNT |
+---------+-------------+-----+
| Lion | Carnivorous | 2 |
| Tiger | Carnivorous | 2 |
| Bear | Carnivorous | 2 |
| Giraffe | Herbivores | 1 |
| Zebra | Herbivores | 1 |
| Trutle | Herbivores | 1 |
+---------+-------------+-----+
Now I want to get the difference of SELECT * FROM ANIMALS and DUPLICATED or vice versa, essentially getting all Carnivorous rows from ANIMALS.

Basically, you can group by whatever combination of fields guarantees the uniqueness of a record in your result, and keep only the groups that occur more than once. I will demonstrate this with a general example; you can get my drift and apply it to your query.
SELECT field1, field2, field3, COUNT(*)
FROM MyTable
GROUP BY field1, field2, field3
HAVING COUNT(*) > 1
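Applied to the question's data, the same pattern might look like the sketch below. It assumes the DUPLICATED view really exposes ID, NAME and SPECIES (as its SELECT * definition suggests) rather than the NAME/SPECIES/CNT summary shown in the question; the CNT alias is only for illustration.

```sql
-- Rows that occur more than once in DUPLICATED, i.e. the rows the view
-- adds on top of ANIMALS. Grouping on every column treats two rows as
-- duplicates only when they match in all fields.
SELECT ID, NAME, SPECIES, COUNT(*) AS CNT
FROM DUPLICATED
GROUP BY ID, NAME, SPECIES
HAVING COUNT(*) > 1
ORDER BY ID;
```

With the sample data this should return Lion, Tiger and Bear, each with CNT = 2.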

How to avoid temporary table on group by with join?

I have two tables, for example Department and Members.
Department table description:
CREATE TABLE `Department` (
`code` int(10) DEFAULT NULL,
`name` char(100) DEFAULT NULL,
KEY `code_index` (`code`),
KEY `name_index` (`name`)
)
Department table values:
+------+-------------+
| code | name |
+------+-------------+
| 1 | Production |
| 2 | Development |
| 3 | Management |
+------+-------------+
Members table description:
CREATE TABLE `Members` (
`department_code` int(10) DEFAULT NULL,
`name` char(100) DEFAULT NULL,
KEY `department_code_index` (`department_code`),
KEY `name_index` (`name`)
)
Members table values:
+-----------------+----------------+
| department_code | name |
+-----------------+----------------+
| 1 | Ross Geller |
| 1 | Monica Geller |
| 1 | Phoebe Buffay |
| 1 | Rachel Green |
| 1 | Chandler Bing |
| 1 | Joey Tribianni |
| 2 | Janice |
| 2 | Gunther |
| 2 | Cathy |
| 2 | Emily |
| 2 | Fun Bobby |
| 2 | Heckles |
| 3 | Paolo |
| 3 | Mike Hannigan |
| 3 | Carol |
| 3 | Susan |
| 3 | Richard |
| 3 | Tag |
+-----------------+----------------+
I want to get the department code and name for a given set of members. As I only need the departments, I used the query below.
mysql> select Department.code, Department.name, Members.department_code from Department left join Members on (Department.code=Members.department_code) where Members.name in ('Rachel Green', 'Gunther', 'Paolo') group by Department.code;
+------+-------------+-----------------+
| code | name | department_code |
+------+-------------+-----------------+
| 1 | Production | 1 |
| 2 | Development | 2 |
| 3 | Management | 3 |
+------+-------------+-----------------+
This works fine, and EXPLAIN gives me the execution plan below.
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
| 1 | SIMPLE | Department | NULL | ALL | code_index | NULL | NULL | NULL | 3 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | Members | NULL | ref | department_code_index,name_index | department_code_index | 5 | test.Department.code | 1 | 16.67 | Using where |
+----+-------------+------------+------------+------+----------------------------------+-----------------------+---------+----------------------+------+----------+---------------------------------+
But the GROUP BY uses a temporary table, which may degrade performance if the Members table contains a lot of rows. I guess better indexing would help here, but I can't work out what it should be. Any help will be appreciated.
Thanks in advance!
You can avoid the group by over all the data using a subquery:
select d.code, d.name
from Department d
where exists (select 1
from Members m
where d.code = m.department_code and
m.name in ('Rachel Green', 'Gunther', 'Paolo')
);
With an index on members(department_code, name), this should be much faster.
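The index mentioned above could be created as follows. This is only a sketch: the index name department_code_name_index is made up here, and whether the optimizer actually picks it depends on your data and MySQL version, so it is worth checking with EXPLAIN.

```sql
-- Composite index: the EXISTS subquery can look up by department_code
-- and check name from the index alone.
CREATE INDEX department_code_name_index
    ON Members (department_code, name);

-- Re-check the plan; if the rewrite helps, the Extra column should no
-- longer report "Using temporary; Using filesort".
EXPLAIN
SELECT d.code, d.name
FROM Department d
WHERE EXISTS (SELECT 1
              FROM Members m
              WHERE m.department_code = d.code
                AND m.name IN ('Rachel Green', 'Gunther', 'Paolo'));
```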

Performance issue MYSQL SELECT on view

I'm facing a SELECT performance issue with MySQL.
I have two tables, "domain" and "email", which contain duplicates. These tables are frequently updated (INSERT/DELETE) by different sources (approximately every ten minutes).
My primary objective was to build two views on those tables without any duplicates. I know a view is just a stored query, but it is the only way I see to keep the result dynamic; creating a new table without duplicates every ten minutes would be mad (or maybe not?).
Both views are used by another process (postfix) to check whether a recipient is allowed. When I run a simple query
SELECT email FROM emailview WHERE email = 'john#google.com';
the query takes 3-4 seconds. By contrast, if I run the SELECT directly on the email table (with the duplicates in it), it takes 0.01 seconds.
How can I improve SELECT performance so that querying the view is almost as fast as querying the table directly?
Here are the details of the architecture (InnoDB engine; value 1 is arbitrary and doesn't really matter):
Domain Table :
| field | type | null | key |
|--------------|--------------|------|------|
| domain | varchar(255) | NO | NULL |
| creationdate | datetime | NO | NULL |
| value 1 | varchar(255) | NO | NULL |
| source_fkey | varchar(255) | MUL | NULL |
| domain | creationdate | value 1 | source_fkey |
|------------|---------------------|-----------------------|
| google.com | 2013-05-28 15:35:01 | john | Y |
| google.com | 2013-04-30 12:10:10 | patrick | X |
| yahoo.com | 2011-04-02 13:10:10 | britney | Z |
| ebay.com | 2012-02-12 10:48:10 | harry | Y |
| ebay.com | 2013-04-15 07:15:23 | bill | X |
Domain View (duplicate domain are removed using the oldest creation date) :
CREATE VIEW domainview AS
SELECT domain.domain, creationdate, value1, source_fkey
FROM domain
WHERE (domain, creationdate) IN (SELECT domain, MIN(creationdate)
FROM domain GROUP BY domain);
| domain | creationdate | value 1 | source_fkey |
|------------|---------------------|-----------------------|
| google.com | 2013-04-30 12:10:10 | patrick | X |
| yahoo.com | 2011-04-02 13:10:10 | britney | Z |
| ebay.com | 2012-02-12 10:48:10 | harry | Y |
Email table :
| field | type | null | key |
|--------------|--------------|------|------|
| email | varchar(255) | NO | NULL |
| source_fkey | varchar(255) | MUL | NULL |
| email | foreign_key |
|--------------------|-------------|
| john#google.com | X |
| john#google.com | Y | <-- duplicate from wrong foreign/domain
| harry#google.com | X |
| mickael#google.com | X |
| david#ebay.com | Y |
| alice#yahoo.com | Z |
Email View (legit emails and emails from domain/foreign_key of the domain view) :
CREATE VIEW emailview AS
SELECT email.email, email.foreign_key
FROM email, domainview
WHERE email.foreign_key = domainview.foreign_key
AND SUBSTRING_INDEX(email.email,'#',-1) = domainview.domain;
| email | foreign_key |
|--------------------|-------------|
| john#google.com | X |
| harry#google.com | X |
| mickael#google.com | X |
| david#ebay.com | Y |
| alice#yahoo.com | Z |
There are no unique constraints and no indexes; the only primary key is in the table that the foreign key references.
Thanks for help.
Previous discussion : Select without duplicate from a specific string/key
Both queries are slow: the first because of the subselect in the IN clause, which MySQL did not optimize well before 5.6; the second because it uses a function in the WHERE clause, which prevents an ordinary index on email from being used for that comparison.
In the first query, you can replace the subselect with a join.
In the second, it's best to store the domain in a separate column and use that for the comparison.
Make sure you have composite indexes on the fields used in the JOIN, WHERE and GROUP BY clauses.
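One way to replace the IN subselect with a join is a self anti-join, sketched below. It keeps the column names used in the question's view (value1, source_fkey); the aliases d and older are only illustrative. Like the original, it keeps every row that shares the oldest creationdate of its domain.

```sql
-- A row is kept when no other row of the same domain has an earlier
-- creationdate, i.e. when the LEFT JOIN finds no match.
CREATE OR REPLACE VIEW domainview AS
SELECT d.domain, d.creationdate, d.value1, d.source_fkey
FROM domain d
LEFT JOIN domain older
       ON older.domain = d.domain
      AND older.creationdate < d.creationdate
WHERE older.domain IS NULL;
```

An index on domain(domain, creationdate) would probably be needed for this to pay off; the sketch has not been benchmarked against the original view.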

MySQL tables are not aligned right

I have a question which relates to MySQL. The problem can be seen in these two images:
http://imgur.com/NrOwSxS,yPo9Cra
http://imgur.com/NrOwSxS,yPo9Cra#1
Does anyone know why MySQL is doing this? It should show up as a nice and neat table, not this bundle of gibberish. Thanks in advance! :D
First, to show that there's nothing really wrong, try this query:
SELECT firstname FROM contact_info
That should look good. Now try this:
SELECT firstname, lastname FROM contact_info
That's how you pick individual columns.
Really you want to capture output to a file, this page shows you how: The MySQL Command-Line Tool
Then you can learn to use other programs to format it nicely.
I assume you created your table somewhat like this:
create table automobile (make char(10),model char(10),year int, color char(10), style char(50), MSRP int);
insert into automobile values ('Ford','Mustang',2006,'Blue','Convertible',27000);
insert into automobile values ('Toyota','Prius',2005,'Silver','Hybrid',22000);
insert into automobile values ('Toyota','Camry',2006,'Blue','Sedan',26000);
insert into automobile values ('Dodge','1500',2005,'Green','Pickup',26000);
so a
describe automobile
will show you your columns as:
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| make | char(10) | YES | | NULL | |
| model | char(10) | YES | | NULL | |
| year | int(11) | YES | | NULL | |
| color | char(10) | YES | | NULL | |
| style | char(50) | YES | | NULL | |
| MSRP | int(11) | YES | | NULL | |
+-------+----------+------+-----+---------+-------+
As long as the combined width of your columns is smaller than your terminal's width, you should see the expected result:
mysql> select * from automobile;
+--------+---------+------+--------+-------------+-------+
| make | model | year | color | style | MSRP |
+--------+---------+------+--------+-------------+-------+
| Ford | Mustang | 2006 | Blue | Convertible | 27000 |
| Toyota | Prius | 2005 | Silver | Hybrid | 22000 |
| Toyota | Camry | 2006 | Blue | Sedan | 26000 |
| Dodge | 1500 | 2005 | Green | Pickup | 26000 |
+--------+---------+------+--------+-------------+-------+
if you'd like the result smaller then pick the columns you'd like to see e.g.
select make,model from automobile
mysql> select make,model from automobile;
+--------+---------+
| make | model |
+--------+---------+
| Ford | Mustang |
| Toyota | Prius |
| Toyota | Camry |
| Dodge | 1500 |
+--------+---------+
to make the content of a column smaller you may use the left string function
select left(make,4) as make, left(model,5) as model,left(style,5) as style from automobile;
+------+-------+-------+
| make | model | style |
+------+-------+-------+
| Ford | Musta | Conve |
| Toyo | Prius | Hybri |
| Toyo | Camry | Sedan |
| Dodg | 1500 | Picku |
+------+-------+-------+
You can try supplying a line terminator at the end of the statement.
mysql> LOAD DATA LOCAL INFILE 'file_path' INTO TABLE table_name LINES TERMINATED BY '\r\n';
The line terminator depends on how the file was written; on Windows, most editors use '\r\n'.

How to get all enum values of a column side by side, separated by commas, in a MySQL query

In MySQL, I need to get the enum values side by side in one column when I run a query with GROUP BY, as shown below.
The table looks like this:
mysql> describe tabex;
+---------+----------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+----------------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| personid| int(11) | YES | | NULL | |
| color | enum('red','blue','white') | YES | | NULL | |
+---------+----------------------------+------+-----+---------+----------------+
The rows are shirts: personid identifies the person, and color is the color of that person's shirt.
The data in the table is as follows:
mysql> select * from tabex;
+----+----------+-------+
| id | personid | color |
+----+----------+-------+
| 1 | 1 | red |
| 2 | 1 | white |
| 3 | 2 | blue |
| 4 | 2 | red |
+----+----------+-------+
4 rows in set (0.00 sec)
When I run a query, I get results like this:
mysql> select personid , color from tabex group by personid;
+----------+-------+
| personid | color |
+----------+-------+
| 1 | red |
| 2 | blue |
+----------+-------+
But I want the result to look like this:
+----------+-------------+
|personid | color |
+----------+-------------+
|1 | red,white |
|2 | blue,red |
| | |
+----------+-------------+
How can I get the result above using GROUP BY and an aggregate function (if one exists for ENUM)?
That is, I want to aggregate the enum values per group, the same way COUNT or SUM aggregate numbers with GROUP BY.
The GROUP_CONCAT() aggregate function does what you want:
SELECT personid, GROUP_CONCAT(color) colors
FROM tabex
GROUP BY personid
This works with any kind of field, not just ENUM.
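If the order of the values or the separator matters, GROUP_CONCAT accepts both inside the call. A small sketch on the same table (ordering by id is an assumption about the order you want):

```sql
-- ORDER BY id lists the colors in insertion order; ordering by color
-- would instead follow the position in the ENUM definition
-- (red, blue, white).
SELECT personid,
       GROUP_CONCAT(color ORDER BY id SEPARATOR ',') AS colors
FROM tabex
GROUP BY personid;
```

With the sample rows this gives red,white for person 1 and blue,red for person 2. Note that the result is cut off at group_concat_max_len (1024 by default), which can be raised per session if needed.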

MySQL - how to add a second unique string key?

Here is my goods table.
+----------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------+------+-----+---------+-------+
| ID | decimal(18,0) | NO | PRI | | |
| gkey | varchar(255) | YES | MUL | NULL | |
| GOODS | decimal(18,0) | YES | MUL | NULL | |
Column ID is auto-increment.
GOODS is the id of a goods category.
Here is the goods category table.
+-------+---------------------+
| ID | NAME |
+-------+---------------------+
| 1 | book |
| 2 | phone |
+-------+---------------------+
My question: in the goods table, gkey must also be a unique key of the form prefix-id, where the id starts from 1 for each category (BOOK-1, BOOK-2, ..., PHONE-1, PHONE-2, ...). It should be assigned when a new goods record is inserted into the goods table.
Like that:
+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
How can I generate that GKEY with PHP and MySQL?
Thanks.
You can use the UNIQUE constraint. For more information check here.
I don't think you want to be doing this at all. Instead, goods.gkey should be a foreign key to goods_category.id. If the insertion order is important, you can add an insert_date date field to the goods table.
From there you can run all sorts of queries, with the added bonus of having referential integrity on your tables.
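First, you should do everything on the database side, so no PHP. That would be my advice.
Since MySQL has no sequences, this is what I would suggest. Count the existing keys with the prefix:
SELECT COUNT(id)
FROM GOODS
WHERE GKEY LIKE 'BOOK-%'
Then insert, taking the highest existing number plus one:
-- 1 is the id of the book category; add any other columns as needed
INSERT INTO goods (gkey, goods)
SELECT CONCAT('BOOK-', COALESCE(MAX(CAST(SUBSTRING(gkey, 6) AS UNSIGNED)), 0) + 1), 1 FROM goods WHERE gkey LIKE 'BOOK-%';
This way you always get the next available number, provided no two inserts run at the same time.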
You can do it, but you need to be quite careful to make sure that two simultaneous inserts don't get given the same gkey. I'd design it with these two design points:
In the GoodsCategory table, have a counter which helps to generate the next gkey.
Use locks so that only one process can generate a new key at any one time.
Here are the details:
1. Add a couple of columns to the GoodsCategory table:
+----------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------+------+-----+---------+-------+
| ID | TINYINT | NO | PRI | | |
| NAME | VARCHAR(80) | NO | | | |
| KeyCode | CHAR(5) | NO | | | |
| NextID | INT | NO | | 0 | |
+----------------------+---------------+------+-----+---------+-------+
The KeyCode has 'BOOK' or 'PHONE', and the NextID field stores an int which is used to generate the next key of that type i.e. if the table looks like this:
+----------------+------------+---------+--------+
| ID | NAME | KeyCode | NextID |
+----------------+------------+---------+--------+
| 1 | book | BOOK | 3 |
| 2 | phone | PHONE | 7 |
+----------------+------------+---------+--------+
Then the next time you add a book, it should be given gkey 'BOOK-3', and NextID is incremented to 4.
2. The locking will need to be done in a stored routine, because it involves multiple statements in a transaction and uses local variables too. The core of it should look something like this:
START TRANSACTION;
SELECT KeyCode, NextID INTO v_kc, v_int FROM GoodsCategory WHERE ID = 2 FOR UPDATE;
SET v_newgkey = CONCAT(v_kc, '-', v_int);
INSERT INTO goods (gkey, goods, ...) VALUES (v_newgkey, 2, etc);
UPDATE GoodsCategory SET NextID = NextID + 1 WHERE ID = 2;
COMMIT;
The FOR UPDATE bit is crucial; have a look at the Usage Examples in the MySQL manual, where it discusses how to use locks to generate an ID without interference from another process.
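Wrapped in a stored procedure, the steps above might look like the following sketch. The procedure name add_goods and the parameter p_category are hypothetical, only the gkey and goods columns are filled in, and goods.ID is assumed to be auto-increment as stated in the question.

```sql
DELIMITER //

CREATE PROCEDURE add_goods(IN p_category INT)
BEGIN
  DECLARE v_kc CHAR(5);
  DECLARE v_int INT;
  DECLARE v_newgkey VARCHAR(255);

  -- Undo the partial work and pass the error on if any statement fails.
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    ROLLBACK;
    RESIGNAL;
  END;

  START TRANSACTION;

  -- Lock the category row so concurrent callers wait here.
  SELECT KeyCode, NextID INTO v_kc, v_int
  FROM GoodsCategory
  WHERE ID = p_category
  FOR UPDATE;

  -- CONCAT builds the key; '+' would be numeric addition in MySQL.
  SET v_newgkey = CONCAT(v_kc, '-', v_int);

  INSERT INTO goods (gkey, goods) VALUES (v_newgkey, p_category);
  UPDATE GoodsCategory SET NextID = NextID + 1 WHERE ID = p_category;

  COMMIT;
END //

DELIMITER ;
```

Calling CALL add_goods(2); would then create the next PHONE-n key. The sketch does not check that the category exists, and a UNIQUE index on goods.gkey is still a sensible safety net.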