I have a question about which storage engine to choose for my database tables. I have a table with 28 million records. I will insert the data once after creating the table; after that, no insert, update, or delete operations will ever take place. Only select operations.
I have a query like the one below:
SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0#v','0%d','13#' ,'148' ,'1e1','1sx','1v$','1#c','1?b','1?k','226','2kl','2ue','2*l','2?4','36h','3au','3us','4d~') GROUP BY `indexVal`
This counts how many times a particular result appeared in the search. In InnoDB, this operation took 5 seconds. This is too much, because my original dataset will be in the billions.
For this kind of work, which MySQL storage engine do you recommend?
More than the storage engine, having the proper index in place seems important.
In your case, CREATE INDEX idx_1 ON key_word (indexVal, hashed_word) should help.
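A minimal sketch of that idea, assuming the column is named indexVal as in the question's query. The index names are made up, and the second index (leading with hashed_word) is an alternative that is not taken from the answer; running EXPLAIN on your own data shows which one the optimizer actually picks.

```sql
-- Index as suggested in the answer (column names taken from the question's query).
CREATE INDEX idx_1 ON key_word (indexVal, hashed_word);

-- Alternative (an assumption, not from the answer): lead with the column used
-- in the WHERE clause so the IN list can be resolved by index lookups.
CREATE INDEX idx_2 ON key_word (hashed_word, indexVal);

-- Check which index is chosen and whether "Using index" appears in Extra.
EXPLAIN
SELECT indexVal, COUNT(indexVal)
FROM key_word
WHERE hashed_word IN ('001', '01v', '0ji')
GROUP BY indexVal;
```

Both indexes contain every column the query reads, so either can act as a covering index; once you know which one the plan uses, the other can be dropped.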
And if the data truly never changes, you could even pre-compute and cache some of those results.
For example
CREATE TABLE counts AS
SELECT indexVal, hashed_word, COUNT(indexVal) AS cnt
FROM key_word
GROUP BY indexVal, hashed_word;
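Reading from such a summary table could look roughly like this. It is a sketch that assumes the count column is aliased as cnt when the table is built; the alias and the index name are hypothetical.

```sql
-- Assumes counts(indexVal, hashed_word, cnt), where cnt holds the
-- pre-computed COUNT(indexVal) per (indexVal, hashed_word) pair.
CREATE INDEX idx_counts ON counts (hashed_word, indexVal, cnt);

-- Same result as the original query, but summing pre-computed counts
-- instead of counting the 28 million base rows.
SELECT indexVal, SUM(cnt) AS total
FROM counts
WHERE hashed_word IN ('001', '01v', '0ji')
GROUP BY indexVal;
```

Because the data never changes after the initial load, the summary table has to be built only once.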
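For read-only data, ARCHIVE gives the smallest footprint, but it is a poor fit for this query: it supports no indexes other than one on an AUTO_INCREMENT column, so every SELECT is a full table scan. MyISAM has traditionally been the suggestion for SELECT-only tables.
The following advice applies to MyISAM: don't use varchar but fixed-size char columns, and you may get better performance.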
Sure, and it is even faster if the data is loaded in memory instead of read from disk.
I'm writing a MySQL query to check for existing records in the final table: if a record exists I update it first, and then I insert the records that are not yet present in the final table. The issue is that the join takes too long to execute, and since I'm running it in AWS Lambda it times out, meaning it takes more than 15 minutes. I'm not using any indexes here; I couldn't, because we have customers who use unique constraints on different columns.
select count(Staging.EmployeeId)
from Staging
inner join Final on Staging.EmployeeId = Final.EmployeeId
where Staging.status='V'
and Staging.StagingId >= 66518110
and Staging.StagingId <= 66761690
and Staging.EmployeeId is not null
and Staging.EmployeeId <> '' ;
I'm processing ranges of about 250k records at a time and having no luck with the above query. Could anyone suggest how to speed it up? I cannot use an index, so I'm looking for other options to optimize the query. Thanks in advance.
Creating indexes to support the search conditions and the join conditions would be the most common and the most effective way to optimize this query.
But you said you can't use indexes. This seems like an inadvisable limitation, but so be it.
Your options are therefore:
Allocate more RAM to the InnoDB buffer pool and pre-cache your table data pages, so your table-scans at least occur in RAM and do not have to wait for disk I/O.
Upgrade your server to one with faster CPUs.
Delete data until your table-scans take less time.
I mean no disrespect, but frankly, your question is like asking how to start a fire with wet newspaper.
"unique constraint on different columns" -- this does not necessarily prohibit adding indexes. You must have some indexes, whether they are UNIQUE or not.
Staging: INDEX(status, StagingId, EmployeeId)
Final: INDEX(EmployeeId)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
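As a sketch, the two suggested indexes could be added like this. The index names are hypothetical; the table and column names come from the query in the question.

```sql
-- Composite index covering the filter columns and the join column on Staging.
ALTER TABLE Staging ADD INDEX idx_staging_status_id_emp (status, StagingId, EmployeeId);

-- Index on the join column of Final.
ALTER TABLE Final ADD INDEX idx_final_emp (EmployeeId);

-- Re-check the plan; the type and key columns should no longer show full scans.
EXPLAIN
SELECT COUNT(Staging.EmployeeId)
FROM Staging
INNER JOIN Final ON Staging.EmployeeId = Final.EmployeeId
WHERE Staging.status = 'V'
  AND Staging.StagingId BETWEEN 66518110 AND 66761690
  AND Staging.EmployeeId IS NOT NULL
  AND Staging.EmployeeId <> '';
```

These are plain (non-unique) indexes, so they do not interfere with whatever unique constraints individual customers define.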
If any of those columns is the PRIMARY KEY, then my advice may not be correct.
Are the tables 1:1? If not, are they 1:many, and which table is the "one"?
I want to change the engine of a 2-million-row table from MyISAM to InnoDB. I am afraid of this long-running operation, so I created an InnoDB table with the same structure and now I want to copy all the data from the old table to the new one. What is the fastest way? INSERT ... SELECT? What about START TRANSACTION? Please help. I don't want to hang my server.
Do yourself a favor: copy the whole setup to your local machine and try it all out there. You'll have a much better idea of what you are getting into. Just be aware of potential differences in hardware between your production server and your local machine.
The fastest way is probably the most straightforward way:
INSERT INTO table2 SELECT * FROM table1;
I suspect that you cannot do it any faster than what is built into the ALTER. And it does have to copy over all the data and rebuild all the indexes.
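For comparison, the two approaches look roughly like this. It is a sketch using the table1/table2 names from the answer; table1_old is a hypothetical name for the retired MyISAM table. Run one option or the other, not both.

```sql
-- Option 1: convert in place; MySQL copies the rows and rebuilds the indexes.
ALTER TABLE table1 ENGINE=InnoDB;

-- Option 2: copy into the pre-created InnoDB table, then swap the names
-- in a single RENAME statement.
INSERT INTO table2 SELECT * FROM table1;
RENAME TABLE table1 TO table1_old, table2 TO table1;
```

With option 2, any rows written to table1 while the copy is running are not carried over, so it is safest when writes can be paused.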
Be sure to have innodb_buffer_pool_size raised to prepare for InnoDB. And lower key_buffer_size to allow room. Suggest 35% and 12% of RAM, respectively, for the transition. After all tables are converted, suggest 70% and a mere 20MB.
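To make the percentages concrete, here is a sketch that assumes a dedicated server with 16 GiB of RAM (a made-up figure; scale the numbers to your own machine).

```sql
-- During the transition: roughly 35% for InnoDB and 12% for MyISAM.
SET GLOBAL innodb_buffer_pool_size = 5632 * 1024 * 1024;   -- about 5.5 GiB
SET GLOBAL key_buffer_size         = 1920 * 1024 * 1024;   -- about 1.9 GiB

-- After all tables are converted: roughly 70% and 20 MB.
SET GLOBAL innodb_buffer_pool_size = 11264 * 1024 * 1024;  -- 11 GiB
SET GLOBAL key_buffer_size         = 20 * 1024 * 1024;     -- 20 MB
```

Resizing the buffer pool at runtime requires MySQL 5.7.5 or later; on older versions, and to make the values survive a restart, set them in the server configuration file instead.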
One slight speedup is to do some select that fetches the entire table and the entire PRIMARY KEY (if it can be cached). This will do some I/O before really starting. Example: SELECT avg(id) FROM tbl where id is the primary key. And SELECT avg(foo) FROM tbl where foo is not indexed but is numeric. These will force a full scan of the PK index and the data, thereby caching the stuff that the ALTER will have to read.
Other tips on converting: http://mysql.rjweb.org/doc.php/myisam2innodb .
I have this query
SELECT id, alias, parent FROM `content`
Is there a way to optimize this query so that 'type' in the EXPLAIN output is something other than 'ALL'?
id - primary, unique
id - index
parent - index
alias - index
....
Note that this query will almost never return more than 1500 rows.
Thank you
Your query is fetching all the rows, so by definition it's going to report "ALL" as the query type in the EXPLAIN report. The only other possibility is the "index" query type, an index-scan that visits every entry in the index. But that's virtually the same cost as a table-scan.
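To see the difference between the two access types, you could compare the plans before and after adding a covering index. This is a sketch that assumes an InnoDB table, where secondary indexes implicitly carry the primary key; the index name is hypothetical.

```sql
-- Without a covering index the plan is expected to report type = ALL.
EXPLAIN SELECT id, alias, parent FROM content;

-- Hypothetical covering index: (alias, parent) plus the implicit primary key id.
ALTER TABLE content ADD INDEX idx_alias_parent (alias, parent);

-- The plan may now report type = index with "Using index" in Extra,
-- but every index entry is still visited.
EXPLAIN SELECT id, alias, parent FROM content;
```

Either way all rows are read, so for a table of about 1500 rows the extra index is unlikely to be worth its maintenance cost.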
There's a saying that the fastest SQL query is one that you don't run at all, because you get the data some other way.
For example, if the data is in a cache of some type. If your data has no more than 1500 rows, and it doesn't change frequently, it may be a good candidate for putting in memory. Then you run the SQL query only if the cached data is missing.
There are a couple of common options:
The MySQL query cache is an in-memory cache maintained in the MySQL server, and purged automatically when the data in the table changes. Note that it was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.
Memcached is a popular in-memory key-value store used frequently by applications that also use MySQL. It's very fast. Another option is Redis, which is similar but is also backed by disk storage.
Turn OFF log_queries_not_using_indexes; it clutters the slow log with red herrings like what you got.
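A quick way to check and change the setting at runtime is sketched below; it assumes an account with the privileges needed to set global variables.

```sql
-- Show the current value.
SHOW VARIABLES LIKE 'log_queries_not_using_indexes';

-- Stop logging queries merely because they do not use an index.
SET GLOBAL log_queries_not_using_indexes = OFF;
```

A value set with SET GLOBAL is lost on restart, so also set it in the server configuration file if the change should be permanent.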
0.00XX seconds -- good enough not to worry.
ALL is actually optimal for fetching multiple columns from 'all' rows of a table.
I am using MySQL, and I have a table named cars in my dev_db database.
I inserted about 6,000,000 rows into the table (a large amount of data) using bulk inserts like the following:
INSERT INTO cars (cid, name, msg, date)
VALUES (1, 'blabla', 'blabla', '2001-01-08'),
(11, 'blabla', 'blabla', '2001-11-28'),
... ,
(3, 'blabla', 'blabla', '2010-06-03');
After inserting this large amount of data into my cars table,
I decided to also optimize the table, like this:
OPTIMIZE TABLE cars;
I waited 53 minutes for the optimization. It finally finished, and the mysql console showed me the following message:
The Msg_text says that this table does not support optimize... , which leaves me with two questions:
1. Does the mysql message above means the 53min I waited actually did nothing useful??
2. is it necessary to optimize my table after large amount data insertion? and why?
Optimize is useful if you have removed or overwritten rows, or if you have changed indexes. If you have only inserted data, there is no need to optimize.
The MySQL Optimize Table command will effectively de-fragment a mysql
table and is very useful for tables which are frequently updated
and/or deleted.
Also look here: http://www.dbtuna.com/article.php?id=15
It looks like you have an InnoDB table, which doesn't support OPTIMIZE TABLE.
As you can read in the output, InnoDB does not support optimize as such.
Instead it does a recreate + analyze: the table and its indexes are rebuilt and the index statistics are refreshed.
The result is much the same and should not really bother you; you end up with optimized indexes.
However, you only ever have to optimize your indexes if you delete rows or update indexed fields.
If you only ever insert, your B-trees stay in good shape and do not need optimization.
So:
Does the mysql message above means the 53min I waited actually did nothing useful??
The time spent waiting was useless, but not for the reason you think.
If there is anything to optimize, MySQL will do it.
is it necessary to optimize my table after large amount data insertion? and why?
No, never.
The reason is that MySQL (InnoDB) stores its data and indexes in B-trees.
A B-tree stays balanced by construction, so lookups remain O(log n); what degrades over time is page utilization, because deleted or moved entries leave pages partly empty and make the tree larger than it needs to be.
That kind of fragmentation mainly builds up if you delete rows or alter the values of indexed fields.
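If you want to check whether a rebuild is likely to reclaim anything before waiting on one, a rough sketch is to look at the table's metadata first. The dev_db and cars names come from the question, and DATA_FREE is only an approximation of allocated but unused space.

```sql
-- Inspect the table's size and reported free space.
SELECT TABLE_NAME, ENGINE, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'dev_db'
  AND TABLE_NAME = 'cars';

-- An explicit rebuild, comparable to what OPTIMIZE TABLE ends up doing for InnoDB.
ALTER TABLE cars ENGINE=InnoDB;
```

For a table that has only ever been bulk-inserted into, DATA_FREE is usually small relative to DATA_LENGTH, which is consistent with the advice to skip the rebuild.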
so I executed this query on a table:
EXPLAIN SELECT COUNT(*) FROM `table`;
and the 'rows' column of the output is displayed as NULL (whereas usually it shows how many rows the query went through)...
Does this mean that the COUNT command is instantaneous and therefore does not need to go through any rows whatsoever?
If your table uses the MyISAM storage engine, then yes, that query resolves in constant time. The row count is part of the table metadata, so the table itself does not have to be examined.
From: http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#COUNT.28.2A.29
Depending on what engine you are using, and I'm guessing it's MyISAM, a quick index count is executed rather than actually counting all the rows.
Many database engines use an index scan to get the count. If you're using MyISAM (fairly likely), it just reads a number in the engine index and returns it. Nearly instantaneous.
Edit:
InnoDB does not store an exact row count, so it has to scan an index (or the whole table) to count the rows. It will therefore almost always be slower than an engine that keeps the count in its metadata, unless you're comparing queries with a WHERE clause.
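To find out which case applies to your table, you can check its engine first. This sketch uses `table` as the placeholder name from the question and assumes the table lives in the currently selected database.

```sql
-- TABLE_ROWS is exact for MyISAM but only an estimate for InnoDB.
SELECT ENGINE, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'table';

-- For MyISAM, the Extra column typically reads "Select tables optimized away",
-- which is why no row estimate is shown.
EXPLAIN SELECT COUNT(*) FROM `table`;
```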