mysql query optimisation on huge record - mysql

i'm writing mysql query for checking any existing record in final table, if so then i will update it first and then insert those records which are not present in final table. issue here is using join its taking more time to execute and since using this in aws lambda its timing out means taking more than 15 mins. i'm not using any index here since i couldn't because we have cusomters who uses the unique constraint on different columns.
select count(Staging.EmployeeId)
from Staging
inner join Final on Staging.EmployeeId = Final.EmployeeId
where Staging.status='V'
and Staging.StagingId >= 66518110
and Staging.StagingId <= 66761690
and Staging.EmployeeId is not null
and Staging.EmployeeId <> '' ;
I'm looking in range of 250k records at once and no luck using above query. could anyone suggest how to speed up above query. I cannot use index, so looking for other option to optimize above query. thanks in advance

Creating indexes to support the search conditions and the join conditions would be the most common and the most effective way to optimize this query.
But you said you can't use indexes. This seems like an inadvisable limitation, but so be it.
Your options are therefore:
Allocate more RAM to the InnoDB buffer pool and pre-cache your table data pages, so your table-scans at least occur in RAM and do not have to wait for disk I/O.
Upgrade your server to one with faster CPUs.
Delete data until your table-scans take less time.
I mean no disrespect, but frankly, your question is like asking how to start a fire with wet newspaper.

"unique constraint on different columns" -- this does not necessarily prohibit adding indexes. You must have some indexes, whether they are UNIQUE or not.
Staging: INDEX(status, StagingId, EmployeeId)
Final: INDEX(EmployeeId)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
If any of those columns is the PRIMARY KEY, then my advice may not be correct.
Are the tables 1:1? If not, are the 1:many, and which table is the "one"?

Related

Fastest MySQL Storage Engine for `Select` queries

I am having a question about "which storage device to choose" for my database tables. I have a table with 28 million records. I will insert data after creating the table, after that, no insert - update -delete operation will take place. Never. Only select operations.
I have a query like below
SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0#v','0%d','13#' ,'148' ,'1e1','1sx','1v$','1#c','1?b','1?k','226','2kl','2ue','2*l','2?4','36h','3au','3us','4d~') GROUP BY `indexVal`
This counts how many number of times a particular result appeared in search. In InnoDB, this operation took 5 seconds. This is too much, because my orifginal dataset will be in billions.
To do this kind of work, which MySQL storage you recommend?
More than the storage engine, having the proper index in place seems important.
In your case, CREATE INDEX idx_1 ON key_word (index_val, hashed_word) should help.
And if the data truly never changes, you could even pre-compute and cache some of those results.
For example
CREATE TABLE counts AS SELECT index_val, hashed_word, count(index_val)
FROM key_word
GROUP BY index_val, hashed_word
For SELECT-only queries, ARCHIVE is the fastest storage engine.
As it is MyISAM-based, and the following advice is for MyISAM as well, don't use varchar but fixed-size char columns, and you will get better performance.
Sure, even faster if it's the data is loaded in memory, instead read from disk.

Different ways to create index in mysql?

If i use
CREATE INDEX status_index ON eligible_users (status)
or
CREATE INDEX status ON eligible_users (status)
its the same thing no difference?
Also if i create alot of indexes will it actually help with queries or slow down?
Both statements you wrote do the same exact thing, only difference is the name of the index.
As for usefulness, it depends on the database setup and usage. Indexes are useful to speed up queries, but they have to be maintained on every INSERT/UPDATE, so it depends. There are a lot of resources available online about this wide topic.
An index can make or break a query. The execution time for certain queries can go from minutes to fractions of a second just by adding the correct indexes. In case you need to improve a query you can always prepend EXPLAIN to it, to see what MySQL's execution plan is: it will show what indexes the query uses (if any) and will help you troubleshoot some bottlenecks.
As said, an index is useful but is not free. It has to be kept up to date, so every time you insert or modify data in an indexed field, then the index must be updated too.
Generally in cases where you have a lot of reads and (relatively) few writes, indexes help a lot. But unnecessary indexes can degrade performance instead of improving it.
The short syntax for creating a single column index on column col from table tbl is:
CREATE INDEX [index_name] ON tbl (col)
Full details available in the MySQL Manual.

Do mysql update queries benefit from an index?

I have a table which I do mainly updates and I'm wondering if update queries would benefit from having an index on the where column and the updated column or an index on just where column?
Just on the where column. An index on the update column will actually slow down your query because the index has to be updated along with the data. An index on the where column will speed up updates, and selects, but slow down some insertions.
Indices also cause overhead when you delete rows. In general they are a good thing though on columns you are using WHERE on a lot, and they are basically necessary on columns you do joins on, or ORDER BY
Not a straight forward answer for this one. So here goes.
UPDATE table SET ColumnA = 'something'
if an index exists on ColumnA then you will have a slight performance hit as there will be two write operations for each row. First the data in the table and then the write for the index update.You can even have several indexes that each have ColumnA as part of the index which mean you will have several writes in addition to the table row. You can see how having more than a few indexes can start to really slow your updates down.
But if ColumnA is not indexed at all then it will be a single write for each row only.
UPDATE table SET ColumnA = 'something' WHERE ColumnB = 'something else'
For this query if an index exists on ColumnB and not on ColumnA, it will be very fast to locate the record (called a seek) and a single write to update, and as the index doesn't care about columnA, it wont need updating.But if you index ColumnA and not ColumnB, You will read every row in the table first (called a scan and normally a bad thing) which while a read is faster than a write it is still very slow, then it will write to the table and then another write for the index. Basically the slowest way of doing things.
DELETE table WHERE ColumnB = 'somethingelse'
Now if you have an index on any column in this table two writes, delete from table and a update/delete of the record in the index. Again if ColumnB is not indexed, you will scan the table then delete the row(s) from the table and update indexes if any.
INSERT INTO table (ColumnA, ColumnB) VALUES ('something','something else')
If no indexes exist, a single write to the table and it's done.
Again, if indexes do exist, then an extra write for each one.
I haven't mentioned the primary key unique constraints, because you really cant get around them when you need a primary key, but every record must be checked to see if something already exists with that key before insert. Which will be a fast primary key index seek, but nevertheless, its another step in the process. The less steps the faster it will be.
Now back to yours, Basically, if you need to update a specific record, an index will help you locate that record faster than scanning the entire table. The the time saved to locate the record will be much more then the time lost updating the indexes. If you are only inserting and never reading, then indexes will slow you down. It becomes a balance thing. If you need to read specific records, then an index will help immensely. But the more indexes, the slower the writes get.
Most people here don't know how indexes work in MySQL.
It depends on with storage engine you are using. InnoDB uses indexes completely different from MyISAM. This is because MySQL implements indexes on the storage engine level not the MySQL server level.
I'm afraid most people here are giving you answers based on other databases in which indexes work differently from MySQL.
InnoDB
In the case of InnoDB. This is because whenever a row is updated in InnoDB, the index has to be updated as well, as InnoDB's indexes have to be sequential, so it has to find out which page node of the index it is supposed to be in and inserted there. At times that particular page maybe full, so it has to split the page, wasting both space and increasing the time. This happens no matter which column you index because InnoDB uses clustered indexes, where the index stores the data of the entire row.
MyISAM
In the case of MyISAM, it does not have this problem. MyISAM actually uses only 1 column index, even though you can set multiple uniques on more than 1 column. Also MyISAM's index is not stored sequentially so updates are very quick. Likewise inserts are quick as well, as MyISAM just inserts it at the end of the row.
Conclusion
So in regard to your question, you should consider your schema design instead of worrying about whether the query would use the indexes. If you are updating mostly on a table, I suggest you not use InnoDB unless if you need row-level locking, high concurrency, and transactions. Otherwise MyISAM would be much better for update tasks. And no if you are using InnoDB indexes do not really help with updating, especially if the table is very large.

Mysql - Index Performances

Is there any performance issues if you create an index with multiple columns, or should you do 1 index per column?
There's nothing inherently wrong with a multi-column index -- it depends completely on how you're going to query the data. If you have an index on colA+colB, it will help for queries like where colA='value' and colA='value' and colB='value' but it's not going to help for queries like colB='value'.
Advantages of MySQL Indexes
Generally speaking, MySQL indexing into database gives you three advantages:
Query optimization: Indexes make search queries much faster.
Uniqueness: Indexes like primary key index and unique index help to avoid duplicate row data.
Text searching: Full-text indexes in MySQL version 3.23.23, users have the opportunity to optimize searching against even large amounts of text located in any field indexed as such.
Disadvantages of MySQL indexes
When an index is created on the column(s), MySQL also creates a separate file that is sorted, and contains only the field(s) you're interested in sorting on.
Firstly, the indexes take up disk space. Usually the space usage isn’t significant, but because of creating index on every column in every possible combination, the index file would grow much more quickly than the data file. In the case when a table is of large table size, the index file could reach the operating system’s maximum file size.
Secondly, the indexes slow down the speed of writing queries, such as INSERT, UPDATE and DELETE. Because MySQL has to internally maintain the “pointers” to the inserted rows in the actual data file, so there is a performance price to pay in case of above said writing queries because every time a record is changed, the indexes must be updated. However, you may be able to write your queries in such a way that do not cause the very noticeable performance degradation.

optimize a mySQL DB using indexes

i was wondering, if i add one index for each field in every table of my DB, will that make my queries run faster?
or do i have to analyze my queries and create indexes only when required?
Adding an index on each column will probably make most of your queries faster, but it's not necessarily the best approach. It is better to tune your indexes to your specific queries, using EXPLAIN and performance measurements to guide you in adding the correct indexes.
In particular you need to understand when you shouldn't index a column, and when you need multi-column indexes.
I would advise reading the MySQL manual for optimization of SELECT statements which explains under what conditions indexes can be used.
The more indexes you have, the heavier inserting/updating gets. So it's a tradeoff. The select queries that cannot use an index now will get quicker ofcourse, but if you check what fields you're joining on (or using in a where) you will not trade off that much
(and, ofcourse, there is the disk-space, but most of the time I don't really care bout that: ) )
Another point is that MySql can only use a single index for a query, so if your query is
SELECT * FROM table WHERE status = 1 AND col1='foob' AND col2 = 'bar'
MySql will use 1 of the indexes, and filter out the rest reading the data from the table.
If you have queries like this, its better to create a composite index on (status, col1, col2)
Adding index on every field in every table is not smart.
You should add indexes ONLY on columns that you use in the WHERE clause in select OR on which you sort.
Often, the best results are achieved by using multi-column indexes that are specific to your SQL selects.
There are also a partial indexes with limit on the length of field which can also be used to optimize performance and reduce the index site.
Every unnecessary index will slow down the database during the insert because on every insert, every index has to be updated.
Also the more indexes you have, the more chances you have of data corruption. And lastly, indexes take extra storage space on disk, sometimes a lot of space.
Also MySQL tries to keep indexes in memory. If you have unnecessary indexes, there is a good change MySQL will end up using up the available memory with unnecessary indexes in which case your performance will degrade considerable.
Creating the right kind of indexes is probably the single most important optimization technique. That's why when someone asks something like this I thought it was a joke.
This question can only be asked by someone who have not read a single book on MySQL. Just get a good book and read it, then you will not have to ask questions like this.