How to improve query speed with big data in MySQL?

The table structure is as follows:
When I run this query, the execution time is about 2-3 minutes:
select id,name,infohash,files from tb_torrent where id between 0 and 10000;
The table has just over 200,000 rows. Why is the query so slow, and how can I fix it?

The unnecessary use of BIGINT is not enough to explain the sluggishness. Let's look for other issues.
Does that "key" icon mean that there is an index on id? Is it, perchance, the PRIMARY KEY?
What ENGINE is in use? If it is MyISAM, then you have the drawback of the PK not being 'clustered' with the data, thereby making the 10K lookups slower.
What will you do with 10K rows? Think of the network costs, and the memory costs in the client.
But maybe this is the real problem... If this is InnoDB and the TEXT columns are "big", then the values are stored "off record". That costs another disk hit to fetch each big text value. Change them to VARCHAR(...) with a realistic maximum length.
How much RAM do you have? What is the value of innodb_buffer_pool_size? Did you time the query twice? (The first run would be I/O-bound; the second might be hitting the cache.) How big (in MB or GB) is the table?
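One way to answer that last question is to compare the table's size with the buffer pool. Below is a minimal Python sketch. It assumes you have already copied DATA_LENGTH and INDEX_LENGTH for the table from information_schema.TABLES, and the value of innodb_buffer_pool_size from SHOW VARIABLES. The function name and the sample numbers are hypothetical, and the comparison is only a rough indicator.

```python
def table_size_report(data_length: int, index_length: int, buffer_pool_size: int) -> dict:
    """Summarize table size and whether it could fit in the InnoDB buffer pool.

    All inputs are byte counts (hypothetical values copied from
    information_schema.TABLES and SHOW VARIABLES).
    """
    total = data_length + index_length
    return {
        "total_mb": total / 1024**2,
        "total_gb": total / 1024**3,
        # If the whole table fits, a second run of the query can be served from RAM.
        "fits_in_buffer_pool": total <= buffer_pool_size,
    }


if __name__ == "__main__":
    # Hypothetical: 3 GiB of data, 100 MiB of index, a 128 MiB buffer pool.
    print(table_size_report(3 * 1024**3, 100 * 1024**2, 128 * 1024**2))
```

If the table is much larger than the buffer pool, a timing that varies a lot between the first and second run is expected.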

Related

Queries fast after creating an index but slow after a few minutes MySQL

I have several tables with ~15 million rows. When I create an index on the id column and then execute a simple query like SELECT * FROM my_table WHERE id = 1, I retrieve the data within one second. But after a few minutes, if I execute the query with a different id, it takes over 15 seconds.
I'm sure it is not the query cache, because I'm trying different ids all the time to make sure I'm not retrieving from the cache. Also, I used EXPLAIN to make sure the index is being used.
The specs of the server are:
CPU: Intel Dual Xeon 5405 Harpertown 2.0GHz Quad Core
RAM: 8GB
Hard drive 2: 146GB SAS (15k rpm)
Another thing I noticed is that if I execute REPAIR TABLE my_table, the queries complete within one second again. I assume something is being cached, either the table or the index. If so, is there any way to tell MySQL to keep it cached? Is it normal, given the specs of the server, to take around 13 seconds on an indexed table? The index is not unique, and each query returns around 3000 rows.
NOTE: I'm using MyISAM and I know there won't be any write in these tables, all the queries will be to read data.
SOLVED: Thank you for your answers. As many of you pointed out, it was the key_buffer_size. I also reordered the tables by the same column as the index so the records are not scattered, and now I'm executing the queries consistently in under 1 second.
Please provide:
SHOW CREATE TABLE
SHOW VARIABLES LIKE '%buffer%';
Likely causes:
key_buffer_size (when using MyISAM) is not 20% of RAM; or innodb_buffer_pool_size is not 70% of available RAM (when using InnoDB).
Another query (or group of queries) is coming in and "blowing out the cache" (key_buffer or buffer_pool). Look for such queries.
When using InnoDB, you don't have a PRIMARY KEY. (It is really important to have such.)
For 3000 rows to take 15 seconds to load, I deduce:
The cache for the table (not necessarily for the index) was blown out, and
The 3000 rows were scattered around the table (hence fetching one row does not help much in finding subsequent rows).
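The arithmetic behind that deduction can be sketched in a few lines of Python. The per-read latency is an assumption, not a measurement. A 15k rpm drive completes one revolution in 4 ms, so its average rotational latency is 2 ms. Add an assumed ~3 ms average seek, and a worst case where each of the 3000 rows needs its own random read.

```python
def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def scattered_fetch_seconds(rows: int, rpm: int, avg_seek_ms: float) -> float:
    """Time to fetch `rows` rows when each row needs its own random disk read.

    Assumes a cold cache and that no two rows share a block (worst case).
    """
    per_read_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return rows * per_read_ms / 1000


if __name__ == "__main__":
    # 15k rpm SAS drive from the question; the 3 ms seek time is assumed.
    print(rotational_latency_ms(15_000))               # → 2.0
    print(scattered_fetch_seconds(3000, 15_000, 3.0))  # → 15.0
```

When the matching rows sit next to each other (for example after reordering the table by the indexed column, as the asker eventually did), many rows share one block, and the number of random reads drops sharply.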
Memory allocation blog: http://mysql.rjweb.org/doc.php/memory
Is it normal, given the specs of the server, to take around 13 seconds on an indexed table?
The high variance in response time indicates that something is amiss. With only 8 GB of RAM and 15 million rows, you might not have enough RAM to keep the index in memory.
Is swap enabled on the server? This could explain the extreme jump in response time.
Investigate the memory situation with a tool like top, htop or glances.

Handling huge MyISAM table for optimisation

I have a huge (and growing) MyISAM table (700 million rows = 140 GB).
CREATE TABLE `keypairs` (
`ID` char(60) NOT NULL,
`pair` char(60) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM
I changed the table option to ROW_FORMAT=FIXED, because both columns are always filled to their maximum length (60). And yes, ID is sadly a string and not an INT.
SELECT queries are pretty ok in speed efficiency.
The databases and the MySQL engine are all on 127.0.0.1/localhost (nothing remote).
Sadly, INSERT is slow as hell. I won't even talk about trying to LOAD DATA with millions of new rows... it takes days.
There won't be any concurrent reads on it. All SELECTs are done one by one, only by my local server (it is not for clients' use).
(For info: file sizes are .MYD = 88 GB, .MYI = 53 GB, .TMM = 400 MB.)
How could I speed up inserts into that table?
Would it help to PARTITION that huge table? (How, then?)
I heard MyISAM uses a "structure cache" for .frm files, and that a line in the config file helps MySQL keep all the .frm files in memory (in the partitioned case). Would that help too? Actually, my .frm file is only 9 KB for 700 million rows.
What about a string shortening/compression function for the ID string (same idea as rainbow tables)? Even if it lowers the maximum number of unique IDs, I will never reach the maximum of 60 chars anyway, so maybe it's an idea? But before creating a new unique ID, I would of course have to check that the shortened string doesn't already exist in the DB.
Same idea as shortening the ID strings: what about using md5() on the ID? Would a shorter string be faster in that case, or not?
Sort the incoming data before doing the LOAD. This will improve the cacheability of the PRIMARY KEY(id).
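Below is a minimal Python sketch of that pre-sort. It assumes the input is a tab-separated file with the ID in the first column; the file names and layout are hypothetical. It loads the whole file into memory. For a file larger than RAM, an external sort (such as GNU sort) would be the usual tool instead.

```python
import csv


def sort_tsv_by_id(src_path: str, dst_path: str) -> None:
    """Rewrite a TSV file with its rows ordered by the first column (ID).

    Sorting before LOAD DATA means consecutive inserts touch neighbouring
    PRIMARY KEY blocks, which are then likely to still be in the key cache.
    """
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    rows.sort(key=lambda row: row[0])  # plain string (code point) order
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst, delimiter="\t", lineterminator="\n").writerows(rows)
```

This uses plain character order. A case-insensitive collation on the column could order mixed-case IDs differently. For hex IDs written in one consistent case, the text order matches the byte order of their UNHEX() values.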
PARTITIONing is unlikely to help, unless there is some useful pattern to ID.
PARTITIONing will not help for single-row insert nor for single-row fetch by ID.
If the strings are not a constant width of 60, you are wasting space and speed by saying CHAR instead of VARCHAR. Change that.
MyISAM's FIXED is useful only if there is a lot of 'churn' (deletes+inserts, and/or updates).
Smaller means more cacheable means less I/O means faster.
The .frm is an encoding of the CREATE TABLE; it is not relevant for this discussion.
A simple compress/zip/whatever will almost always compress text strings longer than 10 characters. And they can be uncompressed, losslessly. What do your strings look like? 60-character English text will shrink to 20-25 bytes.
MD5 is a "digest", not a "compression". You cannot recover the string from its MD5. Anyway, it would take 16 bytes after converting to BINARY(16).
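Here is a quick Python check of those sizes, using the standard hashlib module; the sample ID is made up. The raw MD5 digest is 16 bytes, which is what a BINARY(16) column would hold. Its hex form is 32 characters. Nothing in either form lets you recover the 60-character original.

```python
import hashlib


def md5_sizes(value: str) -> tuple[int, int]:
    """Return (raw digest length, hex digest length) for the MD5 of `value`."""
    h = hashlib.md5(value.encode("ascii"))
    return len(h.digest()), len(h.hexdigest())


if __name__ == "__main__":
    sample_id = "a" * 60  # hypothetical 60-character ID
    print(md5_sizes(sample_id))  # → (16, 32)
```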
The PRIMARY KEY is a BTree. If ID is somewhat "random", then the 'next' ID (unless the input is sorted) is likely not to be cached. No, the BTree is not rebalanced all the time.
Turning the PRIMARY KEY into a secondary key (after adding an AUTO_INCREMENT) will not speed things up -- it still has to update the BTree with ID in it!
How much RAM do you have? For your situation, and for this LOAD, set MyISAM's key_buffer_size to about 70% of available RAM, but not bigger than the .MYI file. I recommend a big key_buffer because that is where the random accesses are occurring; the .MYD is only being appended to (assuming you have never deleted any rows).
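That sizing rule can be written as a small Python helper. It only restates the rule of thumb: about 70% of available RAM, capped at the .MYI size. The 16 GiB in the example is hypothetical, because the question does not say how much RAM the server has.

```python
GIB = 1024**3


def myisam_key_buffer_for_load(available_ram: int, myi_file_size: int) -> int:
    """Rule-of-thumb key_buffer_size (bytes) for a big MyISAM LOAD.

    About 70% of the RAM available to MySQL, but no more than the .MYI
    file: a key cache larger than the index has nothing extra to hold.
    """
    return min(int(available_ram * 0.7), myi_file_size)


if __name__ == "__main__":
    # Hypothetical 16 GiB of available RAM; the .MYI file in the question is 53 GB.
    size = myisam_key_buffer_for_load(16 * GIB, 53 * GIB)
    print(size / GIB)
```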
We do need to see your SELECTs to make sure these changes are not destroying performance somewhere else.
Make sure you are using CHARACTER SET latin1 or ascii; utf8 would waste a lot more space with CHAR.
Switching to InnoDB will double, maybe triple, the disk space for the table (data+index). Therefore, it will probably slow down. But a mitigating factor is that the PK is "clustered" with the data, so you are not updating two things for each row inserted. Note that key_buffer_size should be lowered to 10M and innodb_buffer_pool_size should be set to 70% of available RAM.
(My bullet items apply to InnoDB except where MyISAM is specified.)
In using InnoDB, it would be good to try to insert 1000 rows per transaction. Less than that leads to more transaction overhead; more than that leads to overrunning the undo log, causing a different form of slowdown.
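Below is a minimal Python sketch of that batching. It does not depend on any particular driver: it only groups rows and builds a parameterized multi-row INSERT for the keypairs table. The %s placeholder style is the one used by MySQL drivers such as PyMySQL and mysql-connector-python. The actual execute and COMMIT calls are left as comments, because they depend on your connection setup.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (ID, pair)


def batches(rows: Iterable[Row], size: int = 1000) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows; one list = one transaction."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_sql(batch_len: int) -> str:
    """Build one multi-row INSERT with a placeholder pair per row."""
    values = ", ".join(["(%s, %s)"] * batch_len)
    return f"INSERT INTO keypairs (ID, pair) VALUES {values}"


if __name__ == "__main__":
    data = [(f"id{i}", f"pair{i}") for i in range(2500)]
    for chunk in batches(data):
        sql = insert_sql(len(chunk))
        params = [v for row in chunk for v in row]  # flattened for the driver
        # cursor.execute(sql, params); connection.commit()  # hypothetical driver calls
        print(len(chunk))  # 1000, 1000, then 500
```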
Hex ID
Since ID is always 60 hex digits, declare it to be BINARY(30) and pack them via UNHEX(...) and fetch via HEX(ID). Test via WHERE ID = UNHEX(...). That will shrink the data about 25%, and MyISAM's PK by about 40%. (25% overall for InnoDB.)
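The size arithmetic can be checked client-side in Python, where bytes.fromhex and bytes.hex play the roles of MySQL's UNHEX() and HEX(); the sample ID is made up. Packing halves the ID column. With pair still CHAR(60), each fixed-length row shrinks from 120 to 90 bytes of column data, which is where the roughly 25% comes from.

```python
def pack_id(hex_id: str) -> bytes:
    """Client-side equivalent of UNHEX(): 60 hex digits -> 30 raw bytes."""
    if len(hex_id) != 60:
        raise ValueError("expected 60 hex digits")
    return bytes.fromhex(hex_id)


def unpack_id(raw: bytes) -> str:
    """Like HEX(), except MySQL's HEX() returns upper-case digits."""
    return raw.hex()


if __name__ == "__main__":
    hex_id = "0123456789abcdef" * 3 + "0123456789ab"  # hypothetical 60-digit ID
    raw = pack_id(hex_id)
    print(len(raw))                  # → 30
    print(unpack_id(raw) == hex_id)  # → True
    # Column bytes per row: (60 + 60) before, (30 + 60) after.
    print((120 - 90) / 120)          # → 0.25
```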
To do just the conversion to BINARY(30):
CREATE TABLE new (
ID BINARY(30) NOT NULL,
`pair` char(60) NOT NULL
-- adding the PK later is faster for MyISAM
) ENGINE=MyISAM;
INSERT INTO new
SELECT UNHEX(ID),
pair
FROM keypairs;
ALTER TABLE new ADD
PRIMARY KEY (`ID`); -- For InnoDB, I would do differently
RENAME TABLE keypairs TO old,
new TO keypairs;
DROP TABLE old;
Tiny RAM
With only 2GB of RAM, a MyISAM-only dataset should use something like key_buffer_size=300M and innodb_buffer_pool_size=0. For InnoDB-only: key_buffer_size=10M and innodb_buffer_pool_size=500M. Since ID is probably some kind of digest, it will be very random. The small cache and the random key combine to mean that virtually every insert will involve a disk I/O. My first estimate would be more like 30 hours to insert 10M rows. What kind of drives do you have? SSDs would make a big difference if you don't already have such.
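The 30-hour figure follows from assuming about one random disk I/O per insert. Here is a back-of-envelope Python version. The ~10 ms per random I/O on a spinning disk (about 100 I/Os per second) is an assumption, not a number from the question.

```python
def load_hours(rows: int, ms_per_random_io: float, ios_per_row: float = 1.0) -> float:
    """Estimated hours to insert `rows` rows when each needs random disk I/O."""
    seconds = rows * ios_per_row * ms_per_random_io / 1000
    return seconds / 3600


if __name__ == "__main__":
    # 10M rows at ~10 ms per random I/O on a hard disk (assumed).
    print(round(load_hours(10_000_000, 10.0), 1))  # → 27.8
    # The same load if each I/O took ~0.1 ms, closer to an SSD (assumed).
    print(round(load_hours(10_000_000, 0.1), 2))
```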
The other thing to do to speed up the INSERTs is to sort by ID before starting the LOAD. But that gets tricky with the UNHEX. Here's what I recommend.
1) Create a MyISAM table, tmp, with ID BINARY(30) and pair, but no indexes. (Don't worry about key_buffer_size; it won't be used.)
2) LOAD the data into tmp.
3) ALTER TABLE tmp ORDER BY ID; This will sort the table. There is still no index. I think, without proof, that this will be a filesort, which is much faster than "repair by key buffer" for this case.
4) INSERT INTO keypairs SELECT * FROM tmp; This will maximize the caching by feeding rows to keypairs in ID order.
Again, I have carefully spelled out things so that it works well regardless of which Engine keypairs is. I expect step 3 or 4 to take the longest, but I don't know which.
Optimizing a table requires that you optimize for specific queries. You can't determine the best optimization strategy unless you have specific queries in mind. Any optimization improves one type of query at the expense of other types of queries.
For example, if your query is SELECT SUM(pair) FROM keypairs (a query that would have to scan the whole table anyway), partitioning won't help, and just adds overhead.
If we assume your typical query is inserting or selecting one keypair at a time by its primary key, then yes, partitioning can help a lot. It all depends on whether the optimizer can tell that your query will find its data in a narrow subset of partitions (ideally one partition).
Also make sure to tune MyISAM. There aren't many tuning options:
Allocate key_buffer_size as high as you can spare to cache your indexes. Though I haven't ever tried anything higher than about 10GB, and I can't guarantee that MyISAM key buffers are stable at 53GB (the size of your MYI file).
Pre-load the key buffers: https://dev.mysql.com/doc/refman/5.7/en/cache-index.html
Size read_buffer_size and read_rnd_buffer_size appropriately given the queries you run. I can't give a specific value here, you should test different values with your queries.
Size bulk_insert_buffer_size to something large if you want to speed up LOAD DATA INFILE. It's 8MB by default, I'd try at least 256MB. I haven't experimented with that setting, so I can't speak from experience.
I try not to use MyISAM at all. MySQL is definitely trying to deprecate its use.
...is there a MySQL command to ALTER TABLE and automatically add an auto-increment INT ID column?
Yes, see my answer to https://stackoverflow.com/a/251630/20860
First, your primary key is not incremental: new keys don't arrive in increasing order.
Which means, roughly, that on every insert the index has to be rebalanced.
No wonder it slows to a crawl on a table of this size.
And such an engine...
So, second: what's the point of keeping that old MyISAM junk?
For example, you don't mind losing a row or two (or a dozen) in case of a crash? And so on and so forth, even setting aside that the current MySQL maintainer (Oracle Corp) explicitly discourages the use of MyISAM.
So, here are possible solutions:
1) Switch to InnoDB;
2) If you can't give up the char ID, then:
Add an auto-increment numeric key and make it the PRIMARY KEY; then the index would be clustered and the cost of an insert would drop significantly;
Turn your current key into a secondary index;
3) If you can give it up, the answer is obvious.

Disk usage for optimizing a partitioned MySQL-Table

I have a large MyISAM table with 3 million rows that has a size of 31 GB due to a 10KB blob in each row. The table has already 30 partitions. I want to optimize the table since I am going to remove rows with some old data and resize the blobs.
My question is about the disk usage while optimizing:
If I optimize the whole table, does MySQL step through the partitions and optimize only one partition at a time, so that it only needs the extra space of one small partition? Or do I have to optimize one or a few partitions at a time to avoid needing so much extra disk space while optimizing?
Optimizing a single partition of ~1 GB takes only a few seconds, and I could not see any heavy disk usage.
(My answer assumes InnoDB. Even if I am overly pessimistic, the 'solution' should work fine for MyISAM.)
For InnoDB, keep in mind the issues of innodb_file_per_table.
OPTIMIZE will build a copy of the entire table.
Solution: If you are tight on space, you can optimize one partition at a time by doing
ALTER TABLE ... REORGANIZE PARTITION ...
INTO ( PARTITION ... );
Yes, you will need to build the ... for one partition at a time, and execute them one at a time.
(Do not use OPTIMIZE PARTITION; that will optimize the entire table.)
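Building those statements by hand for 30 partitions is tedious, so here is a minimal Python sketch that prints one statement per partition. It assumes RANGE partitioning, with the partition names and upper bounds listed in order. The table name t and the partitions shown are hypothetical; the real ones would come from SHOW CREATE TABLE.

```python
from typing import List, Tuple


def reorganize_statements(table: str, partitions: List[Tuple[str, str]]) -> List[str]:
    """One ALTER TABLE ... REORGANIZE PARTITION per partition, rebuilt under the same name.

    `partitions` holds (name, upper bound) pairs for a RANGE-partitioned table;
    use "MAXVALUE" as the bound of the last partition if it has one.
    """
    return [
        f"ALTER TABLE {table} REORGANIZE PARTITION {name} "
        f"INTO (PARTITION {name} VALUES LESS THAN ({bound}));"
        for name, bound in partitions
    ]


if __name__ == "__main__":
    parts = [("p0", "100000"), ("p1", "200000"), ("p2", "MAXVALUE")]
    for stmt in reorganize_statements("t", parts):
        print(stmt)  # execute these one at a time
```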
Would you like to elaborate on what your table is like? I may want to talk you out of partitioning or talk you into a different way of partitioning.

MySQL select * from tables where primary_key in (1,2,...) very slow

I am having trouble with one of my queries which is very slow from time to time.
SELECT * FROM table WHERE primary_key IN (1,2,...)
It is sometimes very slow, even 5 s for about 100 entries. Obviously we have an index on the primary key. Is there any way to optimize this query? The table is fairly big, about 100 million entries.
Can you post an EXPLAIN, perhaps there's something fishy going on.
Also, if this is a MyISAM table and you have many concurrent INSERTs, the table will be locked during the inserts, so your SELECTs may block during this time.
For a 100M-record table, you can't really do much in terms of query optimization, as the query is already simple. What you could look into is MySQL config tuning.
If your table is InnoDB/XtraDB (and if it's not, I'd recommend using those engines), you might want to look into the innodb_buffer_pool_size variable. It's essential that the index being used fits into memory to get the best performance for your queries.

Fastest MySQL performance updating a single field in a single indexed row

I'm trying to get the fastest performance from an application that repeatedly updates indexed rows, replacing data in a VARCHAR field. Each update writes data of the same size as before (so a single row never grows). To my utter confusion, I have found that performance is directly related to the size of the field itself and is nowhere near the performance of replacing data in a filesystem file directly, i.e., a 1k field updates more than an order of magnitude faster than a 50k field (within the row size limit). If the row exists in the database and its size is not changing, why would an update incur so much overhead?
I am using InnoDB and have disabled binary logging. I've ruled out communication overhead by using SQL-generated strings. I tried MyISAM, and it was roughly 2-3x faster but still too slow. I understand the database has overhead, but again, I am simply replacing data in a single field with data of equal size. What is the DB doing other than directly replacing bits?
Rough performance numbers:
81 updates/sec (60k string)
1111 updates/sec (1k string)
filesystem performance:
1428 updates/sec (60k string)
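Converting the numbers above into payload throughput makes the comparison clearer. Below is a small Python calculation that uses only the figures quoted in the question, treating "60k" and "1k" as 60 KB and 1 KB.

```python
def kb_per_second(updates_per_sec: float, field_kb: float) -> float:
    """Payload throughput implied by an update rate and a field size."""
    return updates_per_sec * field_kb


if __name__ == "__main__":
    print(kb_per_second(81, 60))    # database, 60k string → 4860
    print(kb_per_second(1111, 1))   # database, 1k string  → 1111
    print(kb_per_second(1428, 60))  # filesystem, 60k      → 85680
```

By this measure, the 60k updates move more data per second than the 1k updates (about 4.9 MB/s versus 1.1 MB/s). The part that needs explaining is the gap of roughly 18x to the filesystem.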
The updates I'm doing are INSERT ... ON DUPLICATE KEY UPDATE. Straight UPDATEs are roughly 50% faster but still ridiculously slow for what they are doing.
Can any experts out there enlighten me? Any way to improve these numbers?
I addressed a question in the DBA StackExchange concerning using CHAR vs VARCHAR. Please read all the answers, not just mine.
Keep something else in mind as well. InnoDB stores every table as a clustered index. That index is the PRIMARY KEY or, if the table has none, a hidden index called GEN_CLUST_INDEX built on an internal row ID. If you change anything in the primary key, this will give the clustered index a real workout getting reorganized.