How to index a large table for fast queries in MySQL

I have a table that holds a huge amount of data. There are 9 columns in that table (bp_detail) plus 1 ID column, which is my primary key. I am fetching data using the query
select * from bp_detail
So what do I need to do to fetch the data quickly? Should I create indexes? If yes, on which column(s)?
I am also using that table (bp_detail) in an inner join with another table (extras) to get records based on a WHERE clause. The query I am using is:
select * from bp_detail bp inner join extras e
on (bp.id = e.bp_id)
where bp.id = '4' or bp.name = 'john'
I have joined these tables with a foreign key from extras.bp_id to bp_detail.id, so in this case what should I do to get the data quickly? Right now I have an index on the "name" column in the extras table.
Guidance would be highly appreciated.

If you are selecting all records, you gain nothing by indexing any column. An index makes filtering and ordering by the database engine quicker. Imagine a large book with 20,000 pages: with an index on the first page listing chapter names and page numbers, you can quickly navigate through the book. The same applies to a database, since it is nothing more than a collection of records kept one after another.
You are planning to join tables though. The filtering takes place when JOINING:
on (bp.id = e.bp_id)
and in the WHERE:
where bp.id = '4' or bp.name = 'john'
(Anyway, is there any reason why you are filtering by both the ID and the NAME? The ID should be unique enough.)
Usually table IDs are primary keys, so the bp.id side of the join is covered. If you plan to filter by the name frequently, consider adding an index there too. You ought to read up on how database indexes work as well.
Regarding the name index, the lookup speed depends on the search type. If you use an = equality search, it will be very quick. It will be quite quick with a trailing wildcard too (e.g. name LIKE 'john%'), but quite slow with a wildcard on both sides (e.g. name LIKE '%john%').
Anyway, is your database actually large? Without much data, and if your application is not read-intensive, this feels like the beginner's mistake called premature optimization.
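A minimal sketch of that name index and the three lookup types just described (the index name is illustrative):
CREATE INDEX idx_bp_detail_name ON bp_detail (name);
-- very quick: equality can seek straight into the index
SELECT * FROM bp_detail WHERE name = 'john';
-- still quick: a trailing wildcard can use an index range scan
SELECT * FROM bp_detail WHERE name LIKE 'john%';
-- slow: a leading wildcard forces scanning every name
SELECT * FROM bp_detail WHERE name LIKE '%john%';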

Depending on your search criteria: if you are just selecting all of the data, then the primary key is enough. To enhance the join part, you can create an index on e.bp_id. I could help you more if you shared the tables' schema.
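Assuming the column names from the question, that join-side index would be:
create index idx_extras_bp_id on extras (bp_id);
Note that on InnoDB, declaring the foreign key on extras.bp_id already creates an index on that column if none exists, so check with SHOW INDEX FROM extras before adding another.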

Optimize query through the order of columns in index

I have a table holding a domain and an id.
The query is:
select distinct domain
from user
where id = '1'
Using the index with column order idx_domain_id is faster than using idx_id_domain.
If the order of execution is (FROM clause, WHERE clause, GROUP BY clause, HAVING clause, SELECT clause, ORDER BY clause), then the query should be faster when the index leads with the WHERE column rather than the SELECT one.
From 15:00 to 17:00, this video shows the same query I am working on:
https://serversforhackers.com/laravel-perf/mysql-indexing-three
The table has 4.6 million rows.
(Screenshots showed the query time using idx_domain_id, and the time after changing the column order.)
This is your query:
select distinct domain
from user
where id = '1';
You are observing that the index user(domain, id) is faster than user(id, domain).
Why might this be the case? First, this could simply be an artifact of how you are doing the timing. If your table is really small (i.e. the data fits on a single data page), then indexes are generally not very useful for improving performance.
Second, if you are only running the queries once, then the first time you run the query, you might have a "cold cache". The second time, the data is already stored in memory, so it runs faster.
Other issues can come up as well. You don't specify what the timings are. Small differences can be due to noise and might be meaningless.
You don't provide enough information to give a more definitive explanation. That would include:
Repeated timings run on cold caches.
Size information on the table and the number of matching rows.
Layout information, particularly the type of id.
Explain plans for the two queries.
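For that last item, the plans can be captured in MySQL like this (a sketch using the names from the question):
EXPLAIN SELECT DISTINCT domain FROM user WHERE id = '1';
-- Run once with idx_domain_id in place and once with idx_id_domain,
-- then compare the key, rows, and Extra columns of the two outputs.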
select distinct domain
from user
where id = '1'
Since id is the PRIMARY KEY, there is at most one row involved. Hence, the keyword DISTINCT is useless.
And the most useful index is what you already have, PRIMARY KEY(id). It will drill down the BTree to find id='1' and deliver the value of domain that is sitting right there.
On the other hand, consider
select distinct domain
from user
where something_else = '1'
Now, the obvious index is INDEX(something_else, domain). This is optimal for the WHERE clause, and it is "covering" (meaning that all the columns needed by the query exist in the index). Swapping the columns in the index will be slower. Meanwhile, since there could be multiple rows, DISTINCT means something. However, DISTINCT is not the logical thing to use here.
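A sketch of that covering index, using the names from the example above:
ALTER TABLE user ADD INDEX idx_se_domain (something_else, domain);
-- The WHERE clause seeks on something_else; domain is read directly
-- from the index entry, so the table rows are never touched.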
Concerning your title question (order of columns): The = columns in the WHERE clause should come first. (More details in the link below.)
DISTINCT means to gather all the rows, then de-duplicate them. Why go to that much effort when this gives the same answer:
select domain
from user
where something_else = '1'
LIMIT 1
This hits only one row, not all the 1s.
Read my Indexing Cookbook.
(And, yes, Gordon has a lot of good points.)

SQL: indexing strings for query speed

I want to query for strings as quickly as possible, where the searched-for string could have many thousands of matches in the database. There will also be other conditions to be met, and the returned result set will likely never be over 100 rows.
The option I came up with is to have two tables:
Table items
id | name_id
and
Table names
id | string
where id on both tables is AUTO_INCREMENT, and names.string is unique. name_id is of course not unique and would have an index.
And I will select from the items table like:
SELECT names.string FROM items
JOIN names
ON items.name_id = names.id
WHERE names.string = 'abcdefg'
AND items.someColumn = 'someValue' /* to further filter my results */
My intention is to save space and improve speed, of course.
Will this be faster than just using a simple SELECT * from items WHERE name='abcdefg'?
Are there any problems I should address, based on the information I've given?
Also, I'm not searching for substrings, just the full match, i.e. names.string = 'abcdefg'.
And feel free to ignore this question, since it's fairly broad, but are there other approaches that would be noticeably faster (preferably without a significant increase in complexity)?
There are many good reasons for creating a reference table with string values. Such normalization is generally a good idea.
However, it is not needed for performance reasons in your case. If you have queries that are like:
SELECT items.*
FROM items
WHERE items.string = 'abcdefg' AND items.someColumn = <somevalue>;
Then either of the following two indexes is best for this query:
items(string, somecolumn)
items(somecolumn, string)
This is true in both MySQL and SQL Server.
You can index string columns, and using the index will be efficient for most queries. (There are issues if you start mixing collations, but let's put issues like that aside for this answer.)
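For the single-table layout in that query, one of the suggested composite indexes could be created like this (the index name is illustrative):
CREATE INDEX idx_items_string_col ON items (string, someColumn);
-- Equality lookups on string, optionally narrowed by someColumn,
-- seek into the B-tree instead of scanning the table.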

Speed up a self-joined SQL query

I have a relatively simple query I am trying to run on a table:
select distinct(a.question_id || a.app_name)
from quick_stats a
join quick_stats b on a.question_id = b.question_id
and a.app_name != b.app_name;
Unfortunately, the query is taking a very long time to run.
I believe this is because there are about 4 million records in the table and since it must check each record against every other record in the table this means there are 16 trillion checks.
How can I write this query so it doesn't make so many checks?
It's mostly a table design issue.
Check if question_id and app_name are indexed
Keep in mind: the fewer columns an index contains, the better your performance is.
An index is stored in a separate structure (a B-tree in MySQL) that points to the full row in your table.
That said: if you have indexed question_id and app_name, your query searches that separate, smaller structure and does not have to read the full table with all of its columns.
A very useful source about how to index a table correctly is: http://use-the-index-luke.com/welcome
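A minimal sketch of such an index for the query above (the index name is illustrative):
create index idx_qs_question_app on quick_stats (question_id, app_name);
-- The self-join on question_id and the comparison of app_name can then be
-- resolved from this index alone, without reading full rows from quick_stats.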

How to make my MySQL SUM() query faster

I have about 1 million rows in a users table, with columns A, AA, B, BB, C, CC, D, DD, E, EE, F, FF (for example) holding int values 0 and 1 that I want to count:
SELECT
CityCode,SUM(A),SUM(B),SUM(C),SUM(D),SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode
Result: 8 rows in set (24.49 sec).
How can I make my statement faster?
Use EXPLAIN to see the execution plan of your query.
Create at least one index. If possible, make CityCode the primary key.
Try this one
SELECT CityCode,SUM(A),SUM(B),SUM(C),SUM(D), SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode,A,B,C,D,E,F,AA,BB,CC,DD,EE,FF
Create an index on the CityCode column.
I believe it is not because of SUM(). Try running select CityCode from users group by CityCode; it should take nearly the same time...
Use better hardware
increase the cache size - if you use the InnoDB engine, increase the innodb_buffer_pool_size value
refactor your query to limit the number of users (if business logic permits that, of course)
You have no WHERE clause, which means the query has to scan the whole table. This will make it slow on a large table.
You should consider how often you need to do this and what the impact of it being slow is. Some suggestions are:
Don't change anything - if it doesn't really matter
Have a table which contains the same data as "users", but without any of the other columns that you aren't interested in querying. It will still be slow, but not as slow, especially if the omitted columns are large.
(InnoDB) use CityCode as the first part of the primary key for table "users", that way it can do a PK scan and avoid any sorting (may still be too slow)
Create and maintain some kind of summary table, but you'll need to update it each time a user changes (or tolerate stale data); see the sketch after this list.
But be sure that this optimisation is absolutely necessary.
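A minimal sketch of the summary-table option (the table name is illustrative; only two of the twelve SUM columns are shown):
CREATE TABLE user_city_sums AS
SELECT CityCode,
       SUM(A) AS sum_a,
       SUM(AA) AS sum_aa -- ...and so on for the remaining columns
FROM users
GROUP BY CityCode;
-- Reads become a cheap scan of 8 rows; the cost moves to refreshing this
-- table whenever a row in users changes.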

Multi-column database indexes and query speed

I'm deploying a Rails application that aggregates coupon data from various third-party providers into a searchable database. Searches are conducted across four fields for each coupon: headline, coupon code, description, and expiration date.
Because some of these third-party providers do a rather bad job of keeping their data sorted, and because I don't want duplicate coupons to creep into my database, I've implemented a unique compound index across those four columns. That prevents the same coupon from being inserted into my database more than once.
Given that I'm searching against these columns (via simple WHERE column LIKE '%whatever%' matching for the time being), I want these columns to each individually benefit from the speed gains to be had by indexing them.
So here's my question: will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column? Or will it only guarantee uniqueness among the rows?
Complicating the matter somewhat is that I'm developing in Rails, so my question pertains both to SQLite3 and MySQL (and whatever we might port over to in the future), rather than one specific RDBMS.
My guess is that the indexes will speed up searching across individual columns, but I really don't have enough "under the hood" database expertise to feel confident in that judgement.
Thanks for lending your expertise.
will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column?
Nope. The order of the columns in the index is very important. Let's suppose you have an index like this: create unique index index_name on table_name (headline, coupon_code, description, expiration_date)
In this case these queries will use the index
select * from table_name where headline = 1
select * from table_name where headline = 1 and coupon_code = 2
and these queries won't use the unique index:
select * from table_name where coupon_code = 1
select * from table_name where description = 1 and coupon_code = 2
So the rule is something like this: when you have multiple fields indexed together, you have to specify a leftmost prefix of the index (the first k fields) to be able to use it.
So if you want to be able to search on any one of these fields alone, then you should create an index on each of them separately (besides the combined unique index), as sketched below.
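A sketch of those individual indexes (index names are illustrative; for long string columns MySQL may require a prefix length, e.g. description(50)):
create index idx_coupon_code on table_name (coupon_code);
create index idx_description on table_name (description);
create index idx_expiration_date on table_name (expiration_date);
-- headline is already covered as the leftmost column of the unique index.
Note that none of these B-tree indexes will help the LIKE '%whatever%' double-wildcard searches from the question, for the reason given next.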
Also, be careful with the LIKE operator.
This will use the index: SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
This will not: SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
Index usage: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Multiple-column indexes: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html