I have two tables (MyISAM):
create table A (email varchar(50));
create table B (email varchar(50), key email (email));
Table A has 130K records
Table B has 20K records
Why does this SQL statement take a very long time? (It ran for more than two minutes before I aborted the query with Ctrl+C.)
The statement is:
select count(*) from user A, tmp B where A.email=B.email;
Thanks
I'd guess that the query optimizer has nothing to go on. Why don't you try defining indexes on the email columns?
In general, joining on strings is more expensive than joining on shorter data types like int.
You could speed up this query by making sure both email columns are indexed.
If table A has an int ID field, then table B should store that ID instead of storing the email string again. That would shrink the database and, along with indexes, give much faster queries than joining on a string ever could.
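A minimal sketch of that indexing advice, assuming the actual table names are user and tmp as they appear in the original query:

ALTER TABLE user ADD INDEX idx_user_email (email);
ALTER TABLE tmp ADD INDEX idx_tmp_email (email);

-- With both email columns indexed, the join can be resolved from the indexes:
SELECT COUNT(*) FROM user A JOIN tmp B ON A.email = B.email;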
Here is my SQL query. The table system_mailer is used for logging sent e-mails. When I search for some data, the query takes around 10 seconds. Is there any way to optimize this query?
SELECT `ID`
FROM `system_mailer`
WHERE `group` = 'selling_center'
AND `group_parameter_1` = '1'
AND `group_parameter_2` = '2138'
The timing is around a couple of seconds; how could it be optimised?
You might find the following index on 4 columns would help performance:
CREATE INDEX idx ON system_mailer (`group`, group_parameter_1, group_parameter_2, ID);
MySQL should be able to use this index on your current query. By the way, if you are using InnoDB, and ID is the primary key, then you might be able to drop it from the explicit index definition, and just use this:
CREATE INDEX idx ON system_mailer (`group`, group_parameter_1, group_parameter_2);
Please avoid naming your columns and tables after reserved MySQL keywords like group. Because of this design decision, you will now be forced to escape that column name with backticks forever (ugly).
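If you do decide to rename, something like this would work (the column type here is an assumption, since the question does not show the table definition):

ALTER TABLE system_mailer CHANGE `group` mail_group VARCHAR(64);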
Just be sure you have a composite index on table system_mailer covering the columns
(`group`, `group_parameter_1`, `group_parameter_2`)
You can also add redundancy by appending ID to the index so the query can be answered from the index alone, without touching the table data (a covering index):
(`group`, `group_parameter_1`, `group_parameter_2`, ID)
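To confirm the index is actually being used, check the query plan:

EXPLAIN SELECT `ID`
FROM `system_mailer`
WHERE `group` = 'selling_center'
AND `group_parameter_1` = '1'
AND `group_parameter_2` = '2138';

-- With the covering index in place, the Extra column should show "Using index".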
I'm going to create a MySQL database that will potentially hold thousands of records. Before I build it, I'd like to know the best way to approach this. How can I make the database fast, and at what point does a MySQL database start to slow down when fetching data?
Are there any free solutions to this?
Thanks for reading :)
Thousands of records is absolutely nothing for a DB.
Create proper indexes on the columns you put conditions on in your queries. Example database table:
CREATE TABLE persons (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(200),
    gender CHAR(1)
);
Now imagine the table holds thousands of person records, and you want to select all the data for a single person by name.
The query would be:
select * from persons
where name = 'John'
If you put an index on the name column then the query will perform much faster.
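A one-line sketch of that index (the index name is just illustrative):

CREATE INDEX idx_persons_name ON persons (name);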
Assume I have this table:
create table table_a (
id int,
name varchar(25),
address varchar(25),
primary key (id)
) engine = innodb;
When I run this query:
select * from table_a where id >= 'x' and name = 'test';
How will MySQL process it? Will it pull all the ids first (assume 1000 rows) and then apply the where clause name = 'test'?
Or does it apply the where clause at the same time as it looks for the ids?
Since id is the PK (and there is no index on name), MySQL will load all rows that satisfy the id criterion into memory and then filter that result set by the name criterion. Adding a composite index containing both fields would mean it only loads the records that satisfy both criteria. Adding a separate single-column index on the name field may not result in an index merge operation, in which case that index would have no effect.
Do you have indexes on either column? That may affect the execution plan. The other thing is that you might cast 'x' to an integer to ensure a numeric comparison instead of a string comparison.
For the best result, you should have a single index which includes both of the columns id and name.
In your case, I can't say what effect the primary key has on that query; it depends on the DBMS and version. If you really don't want to add more indexes (because more indexes mean slower writes and updates), just populate your table with around 10,000,000 random rows, run the query, and see the effect.
You can compare the execution times by executing the query first with id first in the where clause and then interchanging so name comes first. To see an example of MySQL performance with indexes, check this out: http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
You can get information on how the query is processed by running EXPLAIN on the query.
If the idea is to optimize that query then you might want to add an index like:
alter table table_a add unique index name_id_idx (name, id);
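With that index in place, EXPLAIN should confirm it is used; for example (substituting a numeric value for the 'x' placeholder):

EXPLAIN SELECT * FROM table_a WHERE id >= 1 AND name = 'test';

-- The key column of the output should show name_id_idx.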
I have a MyISAM table in MySQL which consists of two fields (f1 integer unsigned, f2 integer unsigned) and contains 320 million rows. I have an index on f2. Every week I insert about 150,000 rows into this table. I would like to know how often I need to run ANALYZE and OPTIMIZE on this table (as it would probably take a long time and block the table in the meantime). I never delete or update rows; I only insert new ones every week. Also, I am not using this table in any joins, so, based on this information, are ANALYZE and OPTIMIZE really required?
Thanks in advance,
Tim
ANALYZE TABLE analyzes and stores the key distribution for a table; OPTIMIZE TABLE reorganizes the physical storage of table data and index data to reduce space and improve I/O efficiency.
If you never, ever delete or update the data in your table and only insert new rows, you won't need ANALYZE or OPTIMIZE.
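If you ever do need them (for example after a bulk delete), the statements themselves are simple; note that OPTIMIZE TABLE locks a MyISAM table while it runs (my_table below is a placeholder):

ANALYZE TABLE my_table;  -- refreshes the index statistics
ANALYZE is fast; the following can take a long time on a large table:
OPTIMIZE TABLE my_table; -- defragments the data file and reclaims space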
I have a table of the form
CREATE TABLE data (
    pk INT PRIMARY KEY AUTO_INCREMENT,
    dt BLOB
);
It has about 160,000 rows and about 2GB of data in the blob column (avg. 14kb per blob). Another table has foreign keys into this table.
Something like 3000 of the blobs are identical, so what I want is a query that will give me a remap table allowing me to remove the duplicates.
The naive approach took about an hour on 30-40k rows:
SELECT a.pk, MIN(b.pk)
FROM data AS a
JOIN data AS b
ON a.dt=b.dt
WHERE b.pk < a.pk
GROUP BY a.pk;
I happen to have, for other reasons, a table that has the sizes of the blobs:
CREATE TABLE sizes
(
    fk INT, -- note: non-unique
    sz INT
    -- other cols
);
By building one index on fk and another on sz, the direct query using that table takes about 24 seconds with 50k rows:
SELECT da.pk, MIN(db.pk)
FROM data AS da
JOIN data AS db
JOIN sizes AS sa
JOIN sizes AS sb
ON
    sa.sz = sb.sz
    AND da.pk = sa.fk
    AND db.pk = sb.fk
WHERE
    sb.fk < sa.fk
    AND da.dt = db.dt
GROUP BY da.pk;
However, that is doing a full table scan on da (the data table). Given that the hit rate should be fairly low, I'd think an index scan would be better. With that in mind, I added a third copy of data as a fifth join to get that, and lost about 3 seconds.
OK so for the question: Am I going to get much better than the second select? If so, how?
A bit of a corollary: if I have a table whose key columns get very heavy use but whose remaining columns are only rarely used, will I ever be better off adding another join of that table to encourage an index scan rather than a full table scan?
Xgc on #mysql (irc.freenode.net) points out that adding a utility table like sizes, but with a unique constraint on fk, might help a lot. Some fun with triggers and whatnot might even make it not too bad to keep up to date.
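A hypothetical sketch of that utility table, kept current with an insert trigger (all names here are illustrative):

CREATE TABLE blob_sizes (
    fk INT NOT NULL PRIMARY KEY,
    sz INT NOT NULL,
    INDEX idx_sz (sz)
);

CREATE TRIGGER data_after_insert AFTER INSERT ON data
FOR EACH ROW
    INSERT INTO blob_sizes (fk, sz) VALUES (NEW.pk, LENGTH(NEW.dt));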
You can always apply a hashing function (MD5 or SHA1) to your data and then compare the hashes.
The question is whether you can store the hashes in your database.
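A sketch of that approach, assuming you can add a column to data (the column and index names are illustrative):

ALTER TABLE data ADD COLUMN dt_md5 CHAR(32);
UPDATE data SET dt_md5 = MD5(dt);
CREATE INDEX idx_data_md5 ON data (dt_md5);

-- The remap query then joins on the short indexed hash, comparing the blobs
-- only to rule out the rare hash collision:
SELECT a.pk, MIN(b.pk)
FROM data AS a
JOIN data AS b ON a.dt_md5 = b.dt_md5 AND a.dt = b.dt
WHERE b.pk < a.pk
GROUP BY a.pk;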