I'm facing a SELECT performance problem with MySQL.
I have two tables, "domain" and "email", which contain duplicates. These tables are frequently updated (INSERT/DELETE) by different sources (approximately every ten minutes).
My primary objective was to build two views on those tables without any duplicates. I know a view is just a stored query, but this is my only way to keep it dynamic; creating a new table without duplicates every ten minutes would be mad (or maybe not?).
Both views are used by another process (postfix) to check whether the recipient is an allowed one. When I try to run a simple query
SELECT email FROM emailview WHERE email = 'john#google.com'
the query takes 3-4 seconds. By contrast, if I run my SELECT directly on the email table (duplicates included), it takes 0.01 seconds.
How can I improve SELECT performance so that querying the view is almost as fast as querying the table directly?
Here are the details of the architecture (InnoDB engine; value 1 is arbitrary and doesn't really matter):
Domain Table :
| field | type | null | key |
|--------------|--------------|------|------|
| domain | varchar(255) | NO | NULL |
| creationdate | datetime | NO | NULL |
| value 1 | varchar(255) | NO | NULL |
| source_fkey | varchar(255) | MUL | NULL |
| domain | creationdate | value 1 | source_fkey |
|------------|---------------------|---------|-------------|
| google.com | 2013-05-28 15:35:01 | john | Y |
| google.com | 2013-04-30 12:10:10 | patrick | X |
| yahoo.com | 2011-04-02 13:10:10 | britney | Z |
| ebay.com | 2012-02-12 10:48:10 | harry | Y |
| ebay.com | 2013-04-15 07:15:23 | bill | X |
Domain view (duplicate domains are removed, keeping the row with the oldest creation date):
CREATE VIEW domainview AS
SELECT domain.domain, creationdate, value1, source_fkey
FROM domain
WHERE (domain, creationdate) IN (SELECT domain, MIN(creationdate)
FROM domain GROUP BY domain);
| domain | creationdate | value 1 | source_fkey |
|------------|---------------------|---------|-------------|
| google.com | 2013-04-30 12:10:10 | patrick | X |
| yahoo.com | 2011-04-02 13:10:10 | britney | Z |
| ebay.com | 2012-02-12 10:48:10 | harry | Y |
Email table :
| field | type | null | key |
|--------------|--------------|------|------|
| email | varchar(255) | NO | NULL |
| source_fkey | varchar(255) | MUL | NULL |
| email | foreign_key |
|--------------------|-------------|
| john#google.com | X |
| john#google.com | Y | <-- duplicate from wrong foreign/domain
| harry#google.com | X |
| mickael#google.com | X |
| david#ebay.com | Y |
| alice#yahoo.com | Z |
Email View (legit emails and emails from domain/foreign_key of the domain view) :
CREATE VIEW emailview AS
SELECT email.email, email.foreign_key
FROM email, domainview
WHERE email.foreign_key = domainview.foreign_key
AND SUBSTRING_INDEX(email.email,'#',-1) = domainview.domain;
| email | foreign_key |
|--------------------|-------------|
| john#google.com | X |
| harry#google.com | X |
| mickael#google.com | X |
| david#ebay.com | Y |
| alice#yahoo.com | Z |
There are no unique constraints and no indexes; the only primary key is in the table where the foreign_key is.
Thanks for your help.
Previous discussion : Select without duplicate from a specific string/key
Both queries are slow: the first because of the subselect in the IN clause, which MySQL does not optimize well before version 5.6; the second because it uses a function in the WHERE clause, which prevents an index from being used on that column.
In the first query, you can replace the subselect with a join.
In the second, it's best to store the domain in a separate column and use that column for the comparison.
Make sure you have composite indexes on the fields used in the JOIN, WHERE and GROUP BY clauses.
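The three suggestions above can be tried out in a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it shows the shape of the rewritten views, not MySQL timings. The table and view names come from the question; the `email_domain` column, the index names, and the use of `source_fkey` in both tables are assumptions made for illustration. The subselect is replaced by a self anti-join (keep a row only if no older row exists for the same domain).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain (domain TEXT, creationdate TEXT, value1 TEXT, source_fkey TEXT);
-- email_domain is a hypothetical extra column holding the part after '#'
CREATE TABLE email (email TEXT, email_domain TEXT, source_fkey TEXT);

-- Composite indexes on the columns used for joining and filtering
CREATE INDEX idx_domain_date ON domain (domain, creationdate);
CREATE INDEX idx_email_lookup ON email (email, email_domain, source_fkey);

-- Oldest row per domain via a self anti-join instead of IN (subselect)
CREATE VIEW domainview AS
SELECT d.domain, d.creationdate, d.value1, d.source_fkey
FROM domain d
LEFT JOIN domain older
  ON older.domain = d.domain AND older.creationdate < d.creationdate
WHERE older.domain IS NULL;

-- Join on the stored domain column instead of SUBSTRING_INDEX(...)
CREATE VIEW emailview AS
SELECT e.email, e.source_fkey
FROM email e
JOIN domainview dv
  ON dv.source_fkey = e.source_fkey AND dv.domain = e.email_domain;
""")

conn.executemany("INSERT INTO domain VALUES (?, ?, ?, ?)", [
    ("google.com", "2013-05-28 15:35:01", "john", "Y"),
    ("google.com", "2013-04-30 12:10:10", "patrick", "X"),
    ("yahoo.com", "2011-04-02 13:10:10", "britney", "Z"),
    ("ebay.com", "2012-02-12 10:48:10", "harry", "Y"),
    ("ebay.com", "2013-04-15 07:15:23", "bill", "X"),
])

emails = [
    ("john#google.com", "X"), ("john#google.com", "Y"),
    ("harry#google.com", "X"), ("mickael#google.com", "X"),
    ("david#ebay.com", "Y"), ("alice#yahoo.com", "Z"),
]
# Fill the domain column once at insert time rather than on every SELECT
conn.executemany(
    "INSERT INTO email VALUES (?, ?, ?)",
    [(addr, addr.split("#")[-1], key) for addr, key in emails],
)

rows = conn.execute(
    "SELECT email, source_fkey FROM emailview WHERE email = ?",
    ("john#google.com",),
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM emailview").fetchone()[0]
print(rows, total)
```

With the sample rows from the question, only the `X` entry for john#google.com survives, and the view holds five rows in total. On a real MySQL server it is worth confirming with EXPLAIN that the indexes are actually used.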
Related
I have two similar data sets (table, view, CTE), one of which contains unique rows (guaranteed by DISTINCT or GROUP BY), the second contains duplicates (no primary key constraint involved).
How can I get the difference of two data sets so that I only get the duplicates of the second set in MySql 8?
Say I have a table called Animals, which stores NAME and SPECIES.
+---------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | NULL | |
| NAME | varchar(255) | YES | | NULL | |
| SPECIES | varchar(255) | YES | | NULL | |
+---------+--------------+------+-----+---------+-------+
ANIMALS
+----+---------+-------------+
| ID | NAME | SPECIES |
+----+---------+-------------+
| 1 | Lion | Carnivorous |
| 2 | Giraffe | Herbivores |
| 3 | Zebra | Herbivores |
| 4 | Trutle | Herbivores |
| 5 | Tiger | Carnivorous |
| 6 | Bear | Carnivorous |
+----+---------+-------------+
With that in place, I define the view DUPLICATED.
CREATE VIEW DUPLICATED AS
SELECT * FROM ANIMALS
UNION ALL
SELECT * FROM ANIMALS WHERE SPECIES = "Carnivorous";
(Duplicates every Carnivorous in the set)
DUPLICATED
+---------+-------------+-----+
| NAME | SPECIES | CNT |
+---------+-------------+-----+
| Lion | Carnivorous | 2 |
| Tiger | Carnivorous | 2 |
| Bear | Carnivorous | 2 |
| Giraffe | Herbivores | 1 |
| Zebra | Herbivores | 1 |
| Trutle | Herbivores | 1 |
+---------+-------------+-----+
Now I want to get the difference between SELECT * FROM ANIMALS and DUPLICATED (or vice versa), essentially getting all Carnivorous rows from ANIMALS.
Basically, you can group by whatever combination of fields guarantees the uniqueness of a record in your result. You haven't provided your queries or your table's schema, so I will try to demonstrate this with a general example. You can get my drift and apply it to your query.
SELECT field1, field2, field3, COUNT(*)
FROM MyTable
GROUP BY field1, field2, field3
HAVING COUNT(*) > 1
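As a rough illustration of the GROUP BY ... HAVING pattern above, here is a small runnable sketch using Python's built-in sqlite3 module in place of MySQL. `MyTable` and `field1`..`field3` are the placeholder names from the answer; the sample rows are invented purely for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (field1 TEXT, field2 TEXT, field3 TEXT)")
# Hypothetical sample data: ('a', 'b', 'c') appears three times, ('x', 'y', 'z') twice
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    ("a", "b", "c"),
    ("a", "b", "c"),
    ("a", "b", "c"),
    ("x", "y", "z"),
    ("x", "y", "z"),
    ("p", "q", "r"),
])

# Group on the fields that define "the same record" and keep groups seen more than once
duplicates = conn.execute("""
    SELECT field1, field2, field3, COUNT(*) AS cnt
    FROM MyTable
    GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1
    ORDER BY field1
""").fetchall()
print(duplicates)
```

Only the two repeated combinations are returned, each with its occurrence count; the unique `('p', 'q', 'r')` row is filtered out by HAVING.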
I have a table in MySQL named logs:
+---------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+-------+
| domain | varchar(50) | YES | MUL | NULL | |
| sid | varchar(100) | YES | MUL | NULL | |
| email | varchar(100) | YES | MUL | NULL | |
+---------------+---------------+------+-----+---------+-------+
The following are sample rows from the table
+-----+------------------+--------+
| sid | email            | domain |
+-----+------------------+--------+
| 1   | xxx123#yahoo.com | xxx    |
| 2   | xxx123#yahoo.com | xxx    |
| 2   | yyy123#yahoo.com | yyy    |
| 2   | yyy123#yahoo.com | yyy    |
| 3   | zzz123#yahoo.com | zzz    |
| 4   | qqq123#yahoo.com | qqq    |
| 2   | ppp123#yahoo.com | ppp    |
+-----+------------------+--------+
I want a query something like
select * from logs
where sid IN (select sid from logs
where domain="xxx" AND email="xxx123#yahoo.com")
Desired output
+-----+------------------+--------+
| sid | email            | domain |
+-----+------------------+--------+
| 1   | xxx123#yahoo.com | xxx    |
| 2   | xxx123#yahoo.com | xxx    |
| 2   | yyy123#yahoo.com | yyy    |
| 2   | yyy123#yahoo.com | yyy    |
| 2   | ppp123#yahoo.com | ppp    |
+-----+------------------+--------+
I can do it using joins, but is there any way to get the results without joins, or is there a more optimized version of this query?
You can use WHERE EXISTS:
select l1.* from logs l1
where exists(
select 1 from logs l2
where l1.sid = l2.sid
and l2.domain = 'xxx'
and l2.email = 'xxx123#yahoo.com'
);
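To check that the EXISTS form returns the same rows as the original IN query, the following sketch loads the sample rows from the question into an in-memory SQLite database (Python's sqlite3 module, used here as a stand-in for MySQL). It only compares results; it says nothing about how either form performs on a real MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (sid TEXT, email TEXT, domain TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", [
    ("1", "xxx123#yahoo.com", "xxx"),
    ("2", "xxx123#yahoo.com", "xxx"),
    ("2", "yyy123#yahoo.com", "yyy"),
    ("2", "yyy123#yahoo.com", "yyy"),
    ("3", "zzz123#yahoo.com", "zzz"),
    ("4", "qqq123#yahoo.com", "qqq"),
    ("2", "ppp123#yahoo.com", "ppp"),
])

# Original form from the question: sid IN (subquery)
in_rows = conn.execute("""
    SELECT * FROM logs
    WHERE sid IN (SELECT sid FROM logs
                  WHERE domain = 'xxx' AND email = 'xxx123#yahoo.com')
""").fetchall()

# Correlated EXISTS form from the answer
exists_rows = conn.execute("""
    SELECT l1.* FROM logs l1
    WHERE EXISTS (SELECT 1 FROM logs l2
                  WHERE l1.sid = l2.sid
                    AND l2.domain = 'xxx'
                    AND l2.email = 'xxx123#yahoo.com')
""").fetchall()

print(sorted(in_rows) == sorted(exists_rows), len(exists_rows))
```

Both queries return the five rows with sid 1 or 2, matching the desired output in the question.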
First, get a proper id on those rows. Second, have you tried it? It looks like it should work. I have no idea why you want that, though.
If it actually doesn't work try this structure, could be faster:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
WHERE conditions
) AS subquery
)
Do you want the whole table as result or just one column?
If I understand your question correctly, I would simply use:
SELECT * FROM logs WHERE domain="xxx" AND email="xxx123#yahoo.com"
Or if you want only the sid just replace the * with sid.
And if all sids are numbers, why don't you use int or something similar as the column type?
It seems like you are doing something redundant. Judging by your request, you seem to be looking for
select * from logs where domain="xxx" AND email="xxx123#yahoo.com"
I don't know why you are using the first part of the SQL statement, since this is not a join with other tables.
Or am I missing something?
I have a db table with 10 fields, of which only a few change very often. I would like to open the current record, update it, and enter it back into the db with a new auto id while retaining the previous record.
If I INSERT a new record, I have to re-enter all the info even though it may not have changed.
If I UPDATE a record, it overwrites the previous values.
Any suggestions would be greatly appreciated.
How about a suitably modified version of
insert into table select * from table where id='current_id'
Sample query: say I want to read the first record, copy student and subject, but change marks, and insert the result as a new record:
insert into score(student,subject,marks)
select student,subject,'30' from score where id=1;
Sample Data
+---------+---------+-------+----+
| student | subject | marks | id |
+---------+---------+-------+----+
| A1 | phy | 20 | 1 |
| A1 | bio | 87 | 2 |
| A2 | che | 24 | 3 |
| A3 | che | 50 | 4 |
+---------+---------+-------+----+
Table structure:
+---------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+----------------+
| student | varchar(20) | YES | | NULL | |
| subject | varchar(20) | YES | | NULL | |
| marks | int(1) | YES | | NULL | |
| id | int(10) | NO | PRI | NULL | auto_increment |
+---------+-------------+------+-----+---------+----------------+
You can do it via two different queries: SELECT and INSERT, or you can use INSERT INTO SELECT construction. Check documentation for complete reference http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
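The INSERT ... SELECT approach from the answers above can be sketched end to end with the sample `score` table. This uses Python's built-in sqlite3 module rather than MySQL, with SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` standing in for MySQL's `auto_increment`; treat it as an illustration of the idea, not a drop-in script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE score (
        student TEXT,
        subject TEXT,
        marks   INTEGER,
        id      INTEGER PRIMARY KEY AUTOINCREMENT
    )
""")
conn.executemany(
    "INSERT INTO score (student, subject, marks) VALUES (?, ?, ?)",
    [("A1", "phy", 20), ("A1", "bio", 87), ("A2", "che", 24), ("A3", "che", 50)],
)

# Copy the unchanged columns from row 1 and supply a new value for marks;
# the id column is left out so the database assigns a fresh one.
conn.execute("""
    INSERT INTO score (student, subject, marks)
    SELECT student, subject, 30 FROM score WHERE id = 1
""")

old_row = conn.execute("SELECT student, subject, marks, id FROM score WHERE id = 1").fetchone()
new_row = conn.execute("SELECT student, subject, marks, id FROM score ORDER BY id DESC LIMIT 1").fetchone()
print(old_row, new_row)
```

The original row keeps its marks of 20, and a new row with the same student and subject but marks of 30 is added under the next id.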
I'm using MySQL and I'm stuck on what I hope is a simple problem.
I need to select data from a table if one condition is true or another condition is true.
One select I tried returns data, but a lot more data than it should. Indeed the table contains just 66 records however my query is returning 177 records. I'm misunderstanding something.
I need to select data if ['city' is equal to a value and 'type' is golden] or 'type' is within a category called 'charms'
I've tried this query
SELECT b.*
FROM bubbles b, bubble_types bt
WHERE
b.city = 10
AND b.type = 'golden'
OR bt.category = 'charm'
AND bt.type = b.type;
and this one (which doesn't work at all but may be closer to the mark?)
SELECT b.*
IF(b.city = 10, b.type = 'golden'),
IF(bt.category = 'charm', bt.type = b.type)
FROM bubbles b, bubble_types bt;
Hopefully what I want makes sense?
I should get about 10 rows from the 66 of those bubbles in city 10 that are 'golden', or those bubbles whose type field puts them in category 'charm'.
Thanks;
Edit: sample table data for bubble_types:
+----+----------+------------+
| id | category | type |
+----+----------+------------+
| 1 | bubble | golden |
| 2 | charm | teleport |
| 3 | charm | blow |
| 4 | badge | reuser |
| 5 | badge | winner |
| 6 | badge | loothunter |
| 7 | charm | freeze |
| 8 | badge | reuser |
| 9 | badge | winner |
| 10 | badge | loothunter |
+----+----------+------------+
mysql> describe bubbles;
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| city | bigint(20) | YES | | NULL | |
| type | varchar(32) | YES | | NULL | |
| taken_by | bigint(20) | YES | | NULL | |
| taken_time | bigint(20) | YES | | NULL | |
| label | varchar(256) | YES | | NULL | |
| description | varchar(16384) | YES | | NULL | |
| created | datetime | YES | | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
You are so close! Take the "WHERE"-ness of your first go with the parentheses of your second (and add an appropriate ON clause to your JOIN):
SELECT b.*
FROM bubbles b
JOIN bubble_types bt
ON b.type = bt.type
WHERE
(b.city = 10 AND b.type = 'golden')
OR
(bt.category = 'charm' AND bt.type = b.type);
The devil is in the details of the precedence of AND and OR in the WHERE clause (AND binds more tightly than OR). When in doubt, use parentheses to make your intentions explicit.
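To see why the first attempt returns more rows than the table holds, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The `bubble_types` rows are taken from the question; the six `bubbles` rows are invented for the demonstration. Without a join condition, every golden bubble in city 10 is paired with every row of `bubble_types`, which inflates the result; the ON clause removes that multiplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bubble_types (id INTEGER PRIMARY KEY, category TEXT, type TEXT);
CREATE TABLE bubbles (id INTEGER PRIMARY KEY, city INTEGER, type TEXT);
""")
conn.executemany("INSERT INTO bubble_types VALUES (?, ?, ?)", [
    (1, "bubble", "golden"), (2, "charm", "teleport"), (3, "charm", "blow"),
    (4, "badge", "reuser"), (5, "badge", "winner"), (6, "badge", "loothunter"),
    (7, "charm", "freeze"), (8, "badge", "reuser"), (9, "badge", "winner"),
    (10, "badge", "loothunter"),
])
# Hypothetical bubbles: one golden bubble in city 10, three charms, two others
conn.executemany("INSERT INTO bubbles VALUES (?, ?, ?)", [
    (1, 10, "golden"), (2, 10, "teleport"), (3, 20, "golden"),
    (4, 20, "blow"), (5, 10, "winner"), (6, 30, "freeze"),
])

# First attempt from the question: comma join with no join condition
cross_rows = conn.execute("""
    SELECT b.* FROM bubbles b, bubble_types bt
    WHERE b.city = 10 AND b.type = 'golden'
       OR bt.category = 'charm' AND bt.type = b.type
""").fetchall()

# Version from the answer: explicit JOIN ... ON plus parentheses
joined_rows = conn.execute("""
    SELECT b.* FROM bubbles b
    JOIN bubble_types bt ON b.type = bt.type
    WHERE (b.city = 10 AND b.type = 'golden')
       OR (bt.category = 'charm' AND bt.type = b.type)
""").fetchall()

print(len(cross_rows), len(joined_rows))
```

With this data the comma join yields 13 rows (the golden bubble in city 10 is repeated once per `bubble_types` row), while the joined query yields the expected 4.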
Here is my goods table.
+----------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------+------+-----+---------+-------+
| ID | decimal(18,0) | NO | PRI | | |
| gkey | varchar(255) | YES | MUL | NULL | |
| GOODS | decimal(18,0) | YES | MUL | NULL | |
Column ID is auto-increment.
GOODS is the id of a goods category.
Here is the goods category table.
+-------+---------------------+
| ID | NAME |
+-------+---------------------+
| 1 | book |
| 2 | phone |
+-------+---------------------+
My question: in the goods table, gkey also needs to be a unique key of the form prefix-id (where the id starts from 1), like BOOK-1, BOOK-2, ... PHONE-1, PHONE-2, ..., assigned when a new goods record is inserted into the goods table.
Like that:
+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
How to get that GKEY in PHP+MYSQL?
thanks.
You can use the UNIQUE constraint. For more information check here.
I don't think you want to be doing this at all. Instead, good.gkey should be a foreign key to goods_category.id. If the insertion order is important, you can add an insert_date date field to the goods table.
From there you can run all sorts of queries, with the added bonus of having referential integrity on your tables.
First, you should do everything on the database side, so no PHP. That would be my advice.
Then, since MySQL has no sequences, this is what I would suggest:
SELECT COUNT(id)
FROM GOODS
WHERE GKEY LIKE('BOOK-%')
Then you insert
INSERT INTO goods (gkey, ...)
SELECT CONCAT('BOOK-', COALESCE(MAX(CAST(SUBSTRING_INDEX(gkey, '-', -1) AS UNSIGNED)), 0) + 1), ...
FROM goods
WHERE gkey LIKE 'BOOK-%';
This way you will always have the next number available.
You can do it, but you need to be quite careful to make sure that two simultaneous inserts don't get given the same gkey. I'd design it with these two design points:
In the GoodsCategory table, have a counter which helps to generate the next gkey.
Use locks so that only one process can generate a new key at any one time.
Here are the details:
1. Add a couple of columns to the GoodsCategory table:
+----------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------+------+-----+---------+-------+
| ID | TINYINT | NO | PRI | | |
| NAME | VARCHAR(80) | NO | | | |
| KeyCode | CHAR(5) | NO | | | |
| NextID | INT | NO | | 0 | |
+----------------------+---------------+------+-----+---------+-------+
The KeyCode column holds 'BOOK' or 'PHONE', and the NextID column stores an int which is used to generate the next key of that type. For example, if the table looks like this:
+----------------+------------+---------+--------+
| ID | NAME | KeyCode | NextID |
+----------------+------------+---------+--------+
| 1 | book | BOOK | 3 |
| 2 | phone | PHONE | 7 |
+----------------+------------+---------+--------+
Then the next time you add a book, it should be given gkey 'BOOK-3', and NextID is incremented to 4.
2. The locking will need to be done in a stored routine, because it involves multiple statements in a transaction and uses local variables too. The core of it should look something like this:
START TRANSACTION;
SELECT KeyCode, NextID INTO v_kc, v_int FROM GoodsCategory WHERE ID = 2 FOR UPDATE;
SET v_newgkey = CONCAT(v_kc, '-', v_int);
INSERT INTO goods (gkey, goods, ...) VALUES (v_newgkey, 2, etc);
UPDATE GoodsCategory SET NextID = NextID + 1 WHERE ID = 2;
COMMIT;
The FOR UPDATE bit is crucial; have a look at the Usage Examples in the MySQL manual, where it discusses how to use locks to generate an ID without interference from another process.
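The counter-plus-lock design can be sketched outside a stored routine as well. The version below uses Python's built-in sqlite3 module, which has no `SELECT ... FOR UPDATE`; as an assumption made for this sketch, `BEGIN IMMEDIATE` takes SQLite's write lock up front and plays the role of the row lock. The `add_goods` helper is hypothetical, and the table layouts are simplified from the answer.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so BEGIN/COMMIT/ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE GoodsCategory (ID INTEGER PRIMARY KEY, NAME TEXT, KeyCode TEXT, NextID INTEGER);
CREATE TABLE goods (ID INTEGER PRIMARY KEY AUTOINCREMENT, gkey TEXT UNIQUE, goods INTEGER);
INSERT INTO GoodsCategory VALUES (1, 'book', 'BOOK', 3), (2, 'phone', 'PHONE', 7);
""")

def add_goods(conn, category_id):
    """Hypothetical helper: allocate the next gkey for a category and insert a goods row."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the counter
    try:
        key_code, next_id = conn.execute(
            "SELECT KeyCode, NextID FROM GoodsCategory WHERE ID = ?", (category_id,)
        ).fetchone()
        gkey = f"{key_code}-{next_id}"
        conn.execute("INSERT INTO goods (gkey, goods) VALUES (?, ?)", (gkey, category_id))
        conn.execute("UPDATE GoodsCategory SET NextID = NextID + 1 WHERE ID = ?", (category_id,))
        conn.execute("COMMIT")
        return gkey
    except Exception:
        conn.execute("ROLLBACK")  # counter and goods row are undone together
        raise

keys = [add_goods(conn, 1), add_goods(conn, 2), add_goods(conn, 1)]
next_book = conn.execute("SELECT NextID FROM GoodsCategory WHERE ID = 1").fetchone()[0]
print(keys, next_book)
```

Because the read of NextID, the insert, and the increment happen inside one transaction that holds the lock, two callers cannot be handed the same counter value. The UNIQUE constraint on gkey is an extra safety net, not a replacement for the lock.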