Large MySQL InnoDB table - Fast LIKE queries

I've got a large (1 million+ rows) MySQL InnoDB table with the following columns:
id int(11)
name varchar(255)
If I perform the following query it takes a very long time (well over 3 minutes; I got bored waiting and stopped it).
SELECT name FROM large_table WHERE name LIKE '%peter%'
I have added an index on the name column, which has made it perform much better (0.009 seconds).
I just want to confirm that this is a suitable work-around for the issue. Or are you reading this cringing?
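The speedup is worth understanding before relying on it. A B-tree index can only be seeked when the pattern has a literal prefix; with a leading % the engine still has to read every entry, so the index most likely helped by letting MySQL scan the much smaller index instead of the full rows. Below is a minimal sketch of that principle, using SQLite as a stand-in engine (the table and pattern mirror the question; the behaviour shown is generic B-tree behaviour, not MySQL output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX idx_name ON large_table(name)")
# Lets SQLite use the (BINARY-collated) index for prefix LIKE patterns.
con.execute("PRAGMA case_sensitive_like = ON")

def plan(sql):
    # EXPLAIN QUERY PLAN rows: (id, parent, notused, detail)
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Leading wildcard: no seek is possible, so the engine scans
# (the whole table, or at best the whole index).
p1 = plan("SELECT name FROM large_table WHERE name LIKE '%peter%'")
# Literal prefix: the index can be seeked to the 'peter' range.
p2 = plan("SELECT name FROM large_table WHERE name LIKE 'peter%'")
print(p1)
print(p2)
```

For genuine word search, a MySQL FULLTEXT index queried with MATCH ... AGAINST avoids the scan entirely, at the cost of matching whole words rather than arbitrary substrings.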

Related

Faster reporting where source is MySQL

We have a MySQL master-slave architecture with around 1000 tables. 5 or 6 tables in our DB are around 30 to 40 GB each. We cannot join one 30 GB table to another 30 GB table, as the query never returns a result.
What we do: select the required data from one table and then find the matching data in the other table in chunks. This gives us results, but it is slow.
After joining the two tables in chunks we process them further. We use a few more joins as well, depending on the use case.
Current DB architecture: 5 master servers, 100 slave servers.
1. How can we make this faster? Indexing is not an issue here; we are already using it.
2. Do we need some big-data approach to get faster results?
EDIT: Query Details Below
Query: SELECT COUNT(*) FROM A, B WHERE A.id = B.uid;
Table A is 30 GB and has 51 columns. id is the primary key, an auto-increment integer.
Table B is 27 GB and has 48 columns. uid (int(11)) has a non-unique index.
MyISAM is used.
That's an awful query. It will either:
Scan all of A
For each id, look up (randomly) the uid in B's index.
or
Scan all of B's index on uid
For each uid, look up (randomly) the id in A (in the PK, hence in the data).
In either case,
the 30 GB of A will all be touched
much of the uid index of B will be touched
Step 1 will be a linear scan
Step 2 will be random probes, presumably involving lots of I/O.
Please explain the intent of the query; maybe we can help you reformulate it to achieve the same or a similar purpose.
Meanwhile, how much RAM do you have? What is the setting of innodb_buffer_pool_size? And are the tables InnoDB?
The query will eventually return a result, unless some "timeout" kills it.
Is id an AUTO_INCREMENT? Or is uid a "UUID"? (UUIDs make performance worse, but there are some minor tips to help.)
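Both access patterns described above can be reproduced in miniature. A sketch using SQLite as a stand-in engine (table shapes and row counts are invented for illustration): the plan shows exactly the scan-one-side, probe-the-other shape, which is why the cost is dominated by touching the whole of the larger side.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, payload TEXT)")
con.execute("CREATE TABLE B (uid INTEGER, payload TEXT)")
con.execute("CREATE INDEX idx_b_uid ON B(uid)")
# Tiny stand-ins for the 30 GB / 27 GB tables.
con.executemany("INSERT INTO A(id, payload) VALUES (?, ?)",
                [(i, "a") for i in range(1000)])
con.executemany("INSERT INTO B(uid, payload) VALUES (?, ?)",
                [(i % 500, "b") for i in range(2000)])

sql = "SELECT COUNT(*) FROM A JOIN B ON A.id = B.uid"
(count,) = con.execute(sql).fetchone()
# The plan scans one side and probes the other via an index/PK lookup.
plan = " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))
print(count, plan)
```

In the full-size case the scan is unavoidable for this query shape; what the answer is probing for is whether the random lookups fit in the buffer pool or turn into disk I/O.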

MySQL Locking Tables with millions of rows

I've been running a website with a large amount of data involved.
Users save data like ip, id, and date to the server, and it is stored in a MySQL database. Each entry is stored as a single row in a table.
Right now there are approximately 24 million rows in the table.
Problem 1:
Things are getting slow now, as a full table scan can take many minutes, even though I have already indexed the table.
Problem 2:
If a user runs a SELECT on the table, it can block all other users' access to the site until the query is complete (as the table is locked).
Our server
32 GB RAM
12-core CPU with 24 threads
the table uses the MyISAM engine
EXPLAIN SELECT SUM(impresn), SUM(rae), SUM(reve), `date` FROM `publisher_ads_hits` WHERE date between '2015-05-01' AND '2016-04-02' AND userid='168' GROUP BY date ORDER BY date DESC
To comment on the point from @Max P.: if you write to MyISAM tables, ALL SELECTs are blocked, since there is only a table-level lock. If you use InnoDB, there is row-level locking that only locks the rows that are needed. Also show us the EXPLAIN of your queries, because you may need to create some new indexes. MySQL can generally use only one index per table in a query, so if you use more fields in the WHERE condition it can be useful to have a composite index over those fields.
According to the EXPLAIN, the query doesn't use an index. Try adding a composite index on (userid, date).
If you have many update and delete operations, try changing the engine to InnoDB.
The basic problem is the full table scan. Some suggestions are:
Partition the table based on date and don't keep more than 6-12 months of data in the live system
Add an index on userid
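The composite-index suggestion can be sanity-checked on a toy copy of the table. A sketch using SQLite as a stand-in engine (column names follow the query above; the data is invented): with an index on (userid, date), the equality on userid plus the date range is answered by a single index range scan instead of a full table scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE publisher_ads_hits (
    userid INTEGER, date TEXT, impresn INTEGER, rae INTEGER, reve INTEGER)""")
# Composite index: equality column first, range column second.
con.execute("CREATE INDEX idx_user_date ON publisher_ads_hits(userid, date)")
con.executemany("INSERT INTO publisher_ads_hits VALUES (?,?,?,?,?)",
                [(168, f"2015-0{m}-01", 1, 2, 3) for m in range(1, 7)])

sql = ("SELECT SUM(impresn), SUM(rae), SUM(reve), date FROM publisher_ads_hits "
       "WHERE date BETWEEN '2015-05-01' AND '2016-04-02' AND userid = 168 "
       "GROUP BY date ORDER BY date DESC")
plan = " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))
rows = con.execute(sql).fetchall()
print(plan)
print(rows)
```

The same ordering matters in MySQL: an index on (date, userid) would be far less useful here, because the range condition on date would stop the userid part of the index from being used.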

query execution taking too long for altering table

I am using MySQL.
I have a table called address, and the table has a column called zip5 which is of type varchar(6).
I am using query
alter table address change zip5 zip5 varchar(14);
but the query execution is taking too long; I have been waiting almost 15 minutes for it to finish. The address table has 9.7 million records. Does it take this long for this amount of data, or am I doing something wrong here?
Hmm, I don't know why, but
ALTER TABLE address MODIFY zip5 varchar(14)
seems to be a bit faster. At least on my system with a comparable table structure.
ALTER TABLE makes a copy of your table. Perhaps the bottleneck is your HD? Do you use SSDs? Or is your temporary table storage not on a fast disk? Is the disk full?
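The "makes a copy of your table" point is the heart of the answer: a column change that cannot be done in place rebuilds the whole table, so the time is proportional to the 9.7 million rows, not to the size of the change. A sketch of that copy-and-swap, using SQLite as a stand-in (SQLite's TEXT has no declared length, so the type change itself is elided; the point is the full-row copy):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Stand-in for the original table (zip5 varchar(6) in MySQL).
con.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, zip5 TEXT)")
con.executemany("INSERT INTO address VALUES (?, ?)",
                [(i, "12345") for i in range(1000)])

# What a table-rebuilding ALTER effectively does:
# 1. create the new table with the changed definition (varchar(14) in MySQL),
con.execute("CREATE TABLE address_new (id INTEGER PRIMARY KEY, zip5 TEXT)")
# 2. copy every row -- this is the part that scales with table size,
con.execute("INSERT INTO address_new SELECT id, zip5 FROM address")
# 3. swap the tables.
con.execute("DROP TABLE address")
con.execute("ALTER TABLE address_new RENAME TO address")

(n,) = con.execute("SELECT COUNT(*) FROM address").fetchone()
print(n)
```

On modern MySQL versions, many column changes can run with ALGORITHM=INPLACE and skip the copy; when the copy is unavoidable, the disk speed questions in the answer above are exactly the right ones to ask.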

Alternative to inner join on very slow server 2

I have a very simple MySQL query on a remote Windows 7 server on which I cannot change most of the parameters. I need to execute it only once now, to create a table, but in upcoming projects I'll be confronted with the same issue.
The query is the following, and it has been running for 24 hours now; it's a basic filtering query:
CREATE TABLE compute_us_gum_2013P13
SELECT A.HHID, UPC, DIVISION, yearweek, CAL_DT, RETAILER, DEAL, RAW_PURCH_QTY,
UNITS,VOL,GROSS_DOL,NET_DOL, CREATE_DATE
FROM work_us_gum_2013P13_digital_purchases_with_yearweek A
INNER JOIN compute_us_gum_2013_digital_panelists B
on A.hhid = B.hhid;
Table A is quite big, around 250 million rows.
Table B is 5 million rows.
hhid is indexed on both tables. I haven't put a unique index on table B, though I could; would it change things dramatically?
My 12 GB of RAM is completely saturated (actually there is 1 GB free, but I think MySQL can't touch it). Of course I closed everything I could, and the processor is basically unused. The status of the query has been stuck on "Sending data" most of the time.
Table A also has a covering index on 7 columns that I could drop, as it's not used, but I don't think that would change anything, would it?
One big issue is that I cannot test a lot of things, because I wouldn't know whether something works until it works, and I think this query will take long no matter what. Also, I don't want to waste the computation time that has already been spent.
If it helps, I could also keep only the columns HHID, UPC and yearweek (resp. bigint(20), bigint(20) and int(11)); the columns I would drop are only decimals and dates.
And what if I split table B into several parts? The operation is only a filtering one, so it can be done in several steps. Would I save time? Even if I don't gain time but don't lose any either, at least I could see my progress.
Another possibility would be to directly delete rows (and, if really necessary, columns) from table A, so I wouldn't have to write another table. Would that be faster?
I can change some database parameters if I send an email to my client, but that takes some time and is not suitable for a lot of tweaking and testing.
Any solution would be appreciated, even the dirtiest one :), I'm really stuck here.
EDIT:
Explain gives me this:
id  select_type  table  type   possible_keys  key    key_len  ref             rows     Extra
1   SIMPLE       B      index  hhidx          hhidx  8        NULL            5003865  Using index
1   SIMPLE       A      ref    hhidx          hhidx  8        ncsmars.B.hhid  6
What is the engine? Is it InnoDB?
What are the primary keys of both tables?
Do both primary keys start with HHID? (If HHID is not a candidate key for a table, you can create a composite key and put that field first.)
If both PKs start with HHID and you then join the tables on that field, disk seeks will be reduced dramatically, so you should see much better performance. If you cannot alter both tables, start with the smaller one: alter its PK to have HHID in the first position, then check the execution plan.
ALTER TABLE compute_us_gum_2013_digital_panelists ADD PRIMARY KEY(HHID, [other necessary fields (if any)])
I suppose it will be better than before.
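The clustering idea can be sketched in miniature. In SQLite, used here as a stand-in, WITHOUT ROWID tables store rows in primary-key order, loosely analogous to InnoDB's clustered primary key; with both tables keyed on hhid first, the join probes become ordered key lookups instead of scattered seeks (the table names, columns and sizes below are invented stand-ins for the question's tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Rows physically ordered by hhid, like an InnoDB clustered PK.
con.execute("CREATE TABLE panelists (hhid INTEGER PRIMARY KEY) WITHOUT ROWID")
con.execute("""CREATE TABLE purchases (
    hhid INTEGER, upc INTEGER, yearweek INTEGER,
    PRIMARY KEY (hhid, upc, yearweek)) WITHOUT ROWID""")
con.executemany("INSERT INTO panelists VALUES (?)",
                [(h,) for h in range(100)])
con.executemany("INSERT INTO purchases VALUES (?,?,?)",
                [(h, u, 201301) for h in range(200) for u in range(3)])

sql = ("SELECT A.hhid, A.upc, A.yearweek FROM purchases A "
       "JOIN panelists B ON A.hhid = B.hhid")
plan = " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))
rows = con.execute(sql).fetchall()
print(len(rows), plan)
```

Because rows sharing an hhid sit next to each other on disk, each probe reads one locality instead of a random page, which is exactly the seek reduction the answer describes.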

Loading one table in MySQL is ridiculously slow

For clarity, all other tables in the DB work as expected and load ~2 million rows in a fraction of a second. The one table of just ~600 rows is taking 10+ minutes to load in Navicat.
I can't think of any possible reason for this. There are just 4 columns. One of them is a large text field, but I've worked with large text fields before and they've never been this slow.
Running EXPLAIN SELECT * FROM parser_queue I get:
id  select_type  table         type  possible_keys  key   key_len  ref   rows  Extra
1   SIMPLE       parser_queue  ALL   NULL           NULL  NULL     NULL  658
The profile tells me that 453 seconds are spent 'Sending data'.
I've also got this in the 'Status' tab. I don't understand most of it, but these numbers are much higher than for my other tables.
Bytes_received 31
Bytes_sent 32265951
Com_select 1
Created_tmp_files 16
Handler_read_rnd_next 659
Key_read_requests 9018487
Key_reads 3928
Key_write_requests 310431
Key_writes 4290
Qcache_hits 135077
Qcache_inserts 14289
Qcache_lowmem_prunes 4133
Qcache_queries_in_cache 983
Questions 1
Select_scan 1
Table_locks_immediate 31514
The data stored in the text field is about 12000 chars on average.
There is a primary, auto increment int id field, a tinyint status field, a text field, and a timestamp field with on update current timestamp.
OK I will try out both answers, but I can answer the questions quickly first:
Primary key on the ID field is the only key. This table is used for queuing, with ~50 records added/deleted per hour, but I only created it yesterday. Could it become corrupted in such a short time?
It is MyISAM
More work trying to isolate the problem:
REPAIR TABLE did nothing
OPTIMIZE TABLE did nothing
Created a temp table; queries were about 50% slower on the temp table.
Deleted the table and rebuilt it. SELECT * takes 18 seconds with just 4 rows.
Here is the SQL I used to create the table:
CREATE TABLE IF NOT EXISTS `parser_queue` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL DEFAULT '1',
`data` text NOT NULL,
`last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
Stranger still, everything seems fine on my local box. The slowness only happens on the dev site.
For clarity: there are more than 100 tables on the dev site and this is the only one that is funky.
OK I have disabled all cron jobs which use this table. SHOW PROCESSLIST does not reveal any locks on the table.
Changing the engine to InnoDB did not produce any significant improvement (86 seconds vs 94 for MyISAM)
any other ideas? . . .
Running SHOW PROCESSLIST during the query reveals that it spends most of its time in 'writing to net'.
If you suspect corruption somewhere, you can try either (or both) of the following:
CREATE TABLE temp SELECT * FROM parser_queue;
This will create a new table "identical" to the previous one, except it will be recreated. Alternatively (or maybe after you've made a copy), you can try:
REPAIR TABLE parser_queue;
You may also want to try optimizing the table; it might have gotten fragmented since you are using it as a queue.
OPTIMIZE TABLE parser_queue;
You can determine whether the table is fragmented by running SHOW TABLE STATUS LIKE 'parser_queue' and checking whether the Data_free column shows a high number.
Update
You say you are storing gzcompressed data in the TEXT columns. Try changing the TEXT column to BLOB instead, which is meant to handle binary data, such as compressed text.
The name gives away that you are using the table for queueing (lots of inserts and deletes, maybe?). Maybe you have had the table a while and it's heavily fragmented. If my assumptions are correct, try OPTIMIZE TABLE parser_queue;
You can read more about this in the manual:
http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
Right, the problem seems to have been only this: the text fields were too huge.
Running
SELECT id, status, last_updated FROM parser_queue
takes less time than
SELECT data FROM parser_queue WHERE id = 6
Since all the queries I will be running return only one row, the slowdown will not affect me much. I'm already using gzcompress on the stored data, so I don't think there is much more I could do anyway.
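The two take-aways (store the gzcompressed payload in a binary column, and keep the wide column out of routine queries) can be sketched together. This uses SQLite and Python's zlib as stand-ins for MySQL and gzcompress; the table mirrors the CREATE TABLE above, with data changed to a BLOB as the update suggests:

```python
import sqlite3
import zlib

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE parser_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    status INTEGER NOT NULL DEFAULT 1,
    data BLOB NOT NULL)""")

# Compressed payload goes into a binary column, not TEXT.
payload = ("some 12000-char parser payload " * 400).encode()
con.execute("INSERT INTO parser_queue (data) VALUES (?)",
            (zlib.compress(payload),))

# Routine query: never ships the big column at all.
row = con.execute("SELECT id, status FROM parser_queue").fetchone()
# Fetch one big row only when it is actually needed, decompress client-side.
(blob,) = con.execute("SELECT data FROM parser_queue WHERE id = ?",
                      (row[0],)).fetchone()
restored = zlib.decompress(blob).decode()
print(row, len(restored))
```

This is also why SELECT id, status, last_updated was fast while SELECT data was slow: the cost was never the row lookup, it was shipping megabytes of text over the wire, which is what 'writing to net' was reporting.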