How to create index in SQL to increase performance - mysql

I have around 200,000 rows in a database table. When I execute my search query, it takes around 4-5 seconds to show the results on the next page. I want the query to run faster so that results load in under 2 seconds. The table has around 16 columns.
This is the query I used to create the table:
Create table xml(
PID int not null,
Percentdisc int not null,
name varchar(100) not null,
brand varchar(30) not null,
store varchar(30) not null,
price int not null,
category varchar(20) not null,
url1 varchar(300) not null,
emavail varchar(100) not null,
dtime varchar(100) not null,
stock varchar(30) not null,
description varchar(200) not null,
avail varchar(20) not null,
tags varchar(30) not null,
dprice int not null,
url2 varchar(300),
url3 varchar(300),
sid int primary key auto_increment);
Select query which I'm using
select * from feed where (name like '%Baby%' And NAME like '%Bassinet%')
I don't have much knowledge of database indexing or how to use it to increase performance. Please advise which index to use.

A regular index isn't going to help here: LIKE with a leading wildcard is not sargable. http://en.wikipedia.org/wiki/Sargable

A wildcard % at the start of the pattern makes any regular index on the column useless for that search.
The more characters there are before the first wildcard, the more selective (and faster) the index lookup is.
Anyway, you can add an index to the existing table:
ALTER TABLE feed ADD INDEX (NAME);
This query will not use the index on the NAME column, even after it is created, because each pattern has a leading % character:
select * from feed where (name like '%Baby%' And NAME like '%Bassinet%')
This one can use the index because the leading % is removed. Note that it is a different search: it only matches names that start with 'Baby'.
select * from feed where name like 'Baby%'

There's a good read here.
LIKE does not use a full-text index. If you want full-text searching, use the MySQL full-text search functions (MATCH ... AGAINST); see the MySQL documentation on full-text search.
Here's the syntax for adding INDEX in MySQL:
ALTER TABLE `feed`
ADD INDEX (`Name`);
MySQL MATCH examples (MATCH ... AGAINST relies on a FULLTEXT index on the column):
Prefix matches (words starting with each term, e.g. Babylonian, Bassinette):
SELECT * FROM `feed` WHERE MATCH (NAME) AGAINST ('+Baby* +Bassinet*' IN BOOLEAN MODE);
Whole-word matches:
SELECT * FROM `feed` WHERE MATCH (NAME) AGAINST ('+Baby +Bassinet' IN BOOLEAN MODE);
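The MATCH queries in this answer depend on a FULLTEXT index, which is a different kind of index from the plain one created with ADD INDEX. A minimal sketch, assuming the table is `feed` and the column is `name` as in the question; the index name `ft_feed_name` is made up for illustration, and InnoDB tables need MySQL 5.6 or newer for FULLTEXT support:

```sql
-- Assumption: table `feed`, column `name`, as in the question.
-- `ft_feed_name` is a hypothetical index name.
ALTER TABLE `feed` ADD FULLTEXT INDEX `ft_feed_name` (`name`);

-- Boolean-mode search: both words are required, each matched as a word prefix.
SELECT *
FROM `feed`
WHERE MATCH (`name`) AGAINST ('+Baby* +Bassinet*' IN BOOLEAN MODE);
```

Keep in mind that this matches whole words or word prefixes only. It is not a drop-in replacement for LIKE '%Baby%', which also matches text in the middle of a word.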

In your case a regular index is not useful. A LIKE search with a leading wildcard does not use it, whereas a direct comparison such as columnname = 'Ajay' does search the index (if one exists). The reason is that an index is ordered by the stored value starting from its first character, so it cannot locate text that begins somewhere in the middle.
You can use full-text search for this, defining the full-text index on only those columns you need to search. FTS is useful and returns results faster as the amount of data grows.
For how to enable FTS, see the link below (note that it describes SQL Server, not MySQL).
http://blog.sqlauthority.com/2008/09/05/sql-server-creating-full-text-catalog-and-index/

Related

MySQL GROUP BY with Using Temporary unnecessarily?

I am trying to optimize a query. Using EXPLAIN tells me it is Using temporary. This is really inefficient given the size of the table (20m+ records). Looking at the MySQL documentation Internal Temporary Tables I don't see anything that would imply the need for a Temporary table in my query. I also tried setting the ORDER BY to the same as the GROUP BY, but still says Using Temporary and query takes forever to run. I am using MySQL 5.7.
Is there a way to avoid using a temporary table for this query:
SELECT url,count(*) as sum
FROM `digital_pageviews` as `dp`
WHERE `publisher_uuid` = '8b83120e-3e19-4c34-8556-7b710bd7b812'
GROUP BY url
ORDER BY NULL;
This is my table schema:
create table digital_pageviews
(
id int unsigned auto_increment
primary key,
visitor_uuid char(36) null,
publisher_uuid char(36) default '' not null,
property_uuid char(36) null,
ip_address char(15) not null,
referrer text null,
url_delete text null,
url varchar(255) null,
url_tmp varchar(255) null,
meta text null,
date_created timestamp not null,
date_updated timestamp null
)
collate = utf8_unicode_ci;
create index digital_pageviews_url_index
on digital_pageviews (url);
create index ndx_date_created
on digital_pageviews (date_created);
create index ndx_property_uuid
on digital_pageviews (property_uuid);
create index ndx_publisher_uuid
on digital_pageviews (publisher_uuid);
create index ndx_visitor_uuid_page
on digital_pageviews (visitor_uuid);
The reason it needs a temporary table is that it cannot both filter by publisher_uuid and sort on a column without an index to do both. The first step is to filter by publisher_uuid, so it uses the index on publisher_uuid.
However, next it has to group by and order the records, which will require a temporary table because it cannot use an index which will do this. The reason it cannot use an index is that it already used the publisher_uuid, which is not indexed on the url field to do the group by or on the field you are ordering by.
To filter where publisher_uuid = '8b83120e-3e19-4c34-8556-7b710bd7b812', group by url, and order by url, create an index with these fields in this order: publisher_uuid, url. The name ndx_publisher_uuid is already taken by the existing single-column index, so use a new name:
create index ndx_publisher_uuid_url
on digital_pageviews (publisher_uuid, url);
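To check whether the composite index on (publisher_uuid, url) has the intended effect, the plan can be inspected again. This is only a sketch of the check; the alias `page_count` is made up, and what the optimizer actually chooses depends on the data and the MySQL version:

```sql
-- Re-run EXPLAIN after creating the composite index on (publisher_uuid, url).
-- In the output, look at the `key` column (which index was chosen) and at
-- `Extra` (whether "Using temporary" is still listed).
EXPLAIN
SELECT url, COUNT(*) AS page_count
FROM digital_pageviews AS dp
WHERE publisher_uuid = '8b83120e-3e19-4c34-8556-7b710bd7b812'
GROUP BY url
ORDER BY NULL;
```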

Simple SQL query lasts forever

I am using MySQL Workbench and MySQL server on an Ubuntu 18 machine with 16 GB RAM.
I have a schema named ips, and two tables, say table1 and table2.
Both tables have two fields, ip and description, both of type string. There are a lot of records: table1 has 779938 and table2 has 136657.
I need a join query that counts the ips in table2 whose description starts with str1 and contains neither str2 nor str3, and whose description in table1 does not start with str1 and contains either str2 or str3.
This is my query:
SELECT COUNT(`table2`.`ip`)
FROM `ips`.`table2`, `ips`.`table1`
WHERE `table2`.`ip` = `table1`.`ip`
AND (LOWER(`table1`.`description`) NOT LIKE 'str1%'
AND (LOWER(`table1`.`description`) LIKE '%-str2-%'
OR LOWER(`table1`.`description`) LIKE '%-str3-%'
)
)
AND (LOWER(`table2`.`description`) LIKE 'str1%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str2-%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str3-%'
);
However, the query never ends. The duration shows ? and I never get a result. Can you please help?
EDIT:
Here are the SHOW CREATE TABLE and EXPLAIN outputs:
1) SHOW CREATE TABLE ips.table2;
CREATE TABLE `table2` (
`ip` varchar(500) DEFAULT NULL,
`description` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
2) SHOW CREATE TABLE ips.table1;
CREATE TABLE `table1` (
`ip` varchar(500) DEFAULT NULL,
`description` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
3) EXPLAIN <query>
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, table2, , ALL, , , , , 136109, 100.00, Using where
1, SIMPLE, table1, , ALL, , , , , 786072, 10.00, Using where; Using join buffer (Block Nested Loop)
EDIT 2:
The data in the ip field are strings in this format: str.str.str.str
The description field is in this format: str1-str2-str3-str4
The previous answer about indexing might well optimise the query, but I have to accept the answer that I actually used to solve the problem. Thanks to @Raymond Nijland for being the first to point out the indexing issue, which reminded me of primary keys.
The source of the problem is that neither table in the query had a primary key. A primary key must be on a column that is unique and not null. In my case the ip field was ready to serve as the primary key. Since I use MySQL Workbench, I right-clicked each table, clicked Alter Table, and checked the primary key box for the appropriate field.
That solved my problem.
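For readers who do not use the Workbench dialog, the same change can probably be expressed in plain SQL. This is a sketch under the assumption that every ip value is unique and non-NULL in both tables; if that does not hold, the statements fail instead of silently fixing the data:

```sql
-- Assumption: `ip` is unique and never NULL in both tables.
-- The column is made NOT NULL explicitly, then promoted to primary key.
ALTER TABLE `ips`.`table1`
  MODIFY `ip` VARCHAR(500) NOT NULL,
  ADD PRIMARY KEY (`ip`);

ALTER TABLE `ips`.`table2`
  MODIFY `ip` VARCHAR(500) NOT NULL,
  ADD PRIMARY KEY (`ip`);
```

With the primary keys in place, the join on ip can look rows up by key instead of comparing every row of one table against every row of the other.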
You are getting the ALL access type in the execution plan because the optimizer is not using any index. It is performing a full table scan on both tables.
A full table scan can be the better choice when a query selects a large share of the rows (as a rough rule of thumb, more than about 5%). In your case that could happen if the prefix "str1" were a single letter. If it has more than one character, using an index could greatly improve performance.
Now, the comparison you are performing is not a simple one. You are not comparing the value of a column but the result of an expression: LOWER(table1.description). Therefore you need to create virtual columns and index them if you want this query to be fast. This is available in MySQL 5.7 and newer:
alter table table1 add lower_desc varchar(500)
generated always as (LOWER(description)) virtual;
create index ix1 on table1 (lower_desc);
alter table table2 add lower_desc varchar(500)
generated always as (LOWER(description)) virtual;
create index ix2 on table2 (lower_desc);
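These indexes will make your queries faster when the prefix has two or more characters. Get the execution plan again. The ALL access type should no longer appear where an index applies (you should see an index-based type such as range or ref instead). Note that only the prefix condition LIKE 'str1%' can use such an index; NOT LIKE and patterns with a leading % cannot.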
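One caveat: the optimizer may not substitute an indexed generated column for a LOWER(description) expression inside a LIKE predicate, so referencing lower_desc directly is the safer option. A sketch of the query rewritten that way, assuming the lower_desc columns defined above exist on both tables (the aliases t1 and t2 are introduced here only for readability):

```sql
-- Assumption: both tables have the generated column `lower_desc`.
SELECT COUNT(t2.ip)
FROM ips.table2 AS t2
JOIN ips.table1 AS t1 ON t1.ip = t2.ip
WHERE t2.lower_desc LIKE 'str1%'              -- prefix match: can use the index
  AND t2.lower_desc NOT LIKE '%-str2-%'
  AND t2.lower_desc NOT LIKE '%-str3-%'
  AND t1.lower_desc NOT LIKE 'str1%'
  AND (t1.lower_desc LIKE '%-str2-%'
       OR t1.lower_desc LIKE '%-str3-%');
```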
Incidentally, the comma join already works because of the WHERE condition on ip, but an explicit JOIN states the intent more clearly (I added the third line and dropped the now-redundant WHERE condition):
SELECT COUNT(`table2`.`ip`)
FROM `ips`.`table2`
JOIN `ips`.`table1` on `ips`.`table1`.ip = `ips`.`table2`.ip
WHERE (LOWER(`table1`.`description`) NOT LIKE 'str1%'
AND (LOWER(`table1`.`description`) LIKE '%-str2-%'
OR LOWER(`table1`.`description`) LIKE '%-str3-%'
)
)
AND (LOWER(`table2`.`description`) LIKE 'str1%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str2-%'
AND LOWER(`table2`.`description`) NOT LIKE '%-str3-%'
);
Also, to optimize the join performance you'll need one (or both) of the indexes shown below:
create index ix3 on table1 (ip);
create index ix4 on table2 (ip);

Selecting results by not exact match

I need to figure out the best way to select records from db by a string that's not matching exactly the string in db.
The one stored in db is:
So-Fi (S. 1st St. District), 78704 (South Austin), Bouldin Creek, South Congress
And the one I have to match with is:
$myArea = 'So-Fi-S-1st-St-District-78704-South-Austin-Bouldin-Creek-South-Congress';
The $myArea is actually a value taken from db and formatted for SEO-friendly URL on a different page.
I've tried
SELECT * FROM t1 WHERE area = REPLACE('".$myArea."', '-', '')
But clearly there's no match, since I cannot tame $myArea and format it back to what it was in the db.
Is there a way to remove all punctuation and such leaving only alphanumerics in db before selecting?
Doing lookups like this will guarantee you some headaches; there are too many special cases that you'll be unable to cover.
Why don't you add a "slug" field to your database, where you store the SEO-friendly string? This way you do a direct lookup on the slug without having to do a lot of string manipulation.
Example of database table:
CREATE TABLE `locations` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`slug` varchar(255) NOT NULL,
`location` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;
Then you do lookups like this:
SELECT location from locations where slug = :slug;
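The lookup only stays fast if the slug column is indexed, which the table definition above does not do yet. A possible sketch, assuming slugs are unique per location; the index name `uq_locations_slug` is made up, and the sample row reuses the strings from the question:

```sql
-- Assumption: each slug identifies exactly one location.
ALTER TABLE `locations` ADD UNIQUE KEY `uq_locations_slug` (`slug`);

-- Store the SEO-friendly string next to the original text when the row is created.
INSERT INTO `locations` (`slug`, `location`) VALUES
  ('So-Fi-S-1st-St-District-78704-South-Austin-Bouldin-Creek-South-Congress',
   'So-Fi (S. 1st St. District), 78704 (South Austin), Bouldin Creek, South Congress');

-- Direct lookup by slug; no string manipulation needed.
SELECT `location`
FROM `locations`
WHERE `slug` = 'So-Fi-S-1st-St-District-78704-South-Austin-Bouldin-Creek-South-Congress';
```

In application code the slug value should be passed as a bound parameter (the :slug placeholder above) rather than concatenated into the SQL string.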

Optimal search query and structure for querying large set of data

I've created a file indexer which simply inserts filenames into a specified table. Now I'm considering the best way to search for the filenames. There could be 100000+ files in the table, so performance is important.
File names vary in length: 10, 20, 50 or more characters. At least for now, my test dataset has no files with spaces in their names. Users can do a partial search; for example, looking for '1001' should return the file named 10_1001_20_30_40_50.
My current table structure:
CREATE TABLE `file` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`id_category` int(10) unsigned NOT NULL,
`filename` varchar(255) NOT NULL,
`file_ext` varchar(3) NOT NULL,
`date_added` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`,`id_category`),
KEY `idx_file_filename` (`filename`) USING BTREE,
KEY `fk_file_1_idx` (`id_category`),
FULLTEXT KEY `filename` (`filename`)
) ENGINE=MyISAM AUTO_INCREMENT=24974 DEFAULT CHARSET=utf8;
INSERT INTO `file` (`id`,`id_category`,`filename`,`file_ext`,`date_added`) VALUES (22474,14199,'095_98_1002_1003_148_98_1001_003','pdf','2016-03-19 19:02:12');
INSERT INTO `file` (`id`,`id_category`,`filename`,`file_ext`,`date_added`) VALUES (22475,14199,'095_98_1002_1003_148_98_1001_001','pdf','2016-03-19 19:02:11');
I've tried to use MATCH () AGAINST (), but it turned out not to be a good idea if the strings contain no spaces and you want a "string contains" search like:
SELECT id, filename FROM `file` WHERE MATCH(filename) AGAINST ('1002*' IN BOOLEAN MODE);
This is not going to return what I need. What I'm considering is to use FULLTEXT by splitting all filenames on import into parts of length 3 (the minimum string length a user can provide), separated by spaces, and then use queries like this:
SELECT * FROM `file` WHERE MATCH(filename) AGAINST ('100*' IN BOOLEAN MODE);
Of course I can leave filenames as they are and use LIKE operator:
SELECT * FROM `file` WHERE filename LIKE '%100%'
but there are a lot of negative opinions about using LIKE for large data sets. I'm curious whether my solution of adding spaces to file names is a good idea.
Attempting to use FULLTEXT: requires space, limits you (mostly) to full "words", gets inefficient with "short" words, misses "stop words", etc.
LIKE '%100%', though inefficient because it must test every row, is what you need.
You imply that all the relevant parts of the filenames are numbers? And that you only want to test for whole parts? That is 22_100_33 will be searched for 22, 100, and 33, but not for 2, 10, 00, etc?? If all that is the case, then LIKE will not work correctly. Example: 101_1000 will be caught by LIKE '%100%'.
So, maybe you want to build an "inverted index": for 10_1001_20_30_40_50, you would have 6 rows in a table (10, 1001, etc.), together with either the rest of the columns or some id(s) for joining to the file table.
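A minimal sketch of such an inverted index, assuming the parts are the pieces between underscores. The table name `file_part`, its columns, and the file id 1 are all made up for illustration:

```sql
-- Hypothetical inverted-index table: one row per (part, file) pair.
CREATE TABLE file_part (
  part    VARCHAR(50)     NOT NULL,
  file_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (part, file_id)
);

-- Rows for the filename 10_1001_20_30_40_50, assumed to have id 1 in `file`.
-- INSERT IGNORE skips a part that repeats within the same filename.
INSERT IGNORE INTO file_part (part, file_id) VALUES
  ('10', 1), ('1001', 1), ('20', 1), ('30', 1), ('40', 1), ('50', 1);

-- Whole-part search: an equality lookup on the primary key, no table scan.
SELECT f.id, f.filename
FROM file_part AS p
JOIN `file` AS f ON f.id = p.file_id
WHERE p.part = '1001';
```

This only answers whole-part searches. A search for '100' would not find the part '1001' unless the condition is changed to a prefix match such as p.part LIKE '100%'.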
"there is a lot negative opinions about using LIKE for large data sets"
Chances are it would be good enough for your case; I would test it first.
If you really want to speed it up, I can think of one option, but the sacrifices would be huge: memory, insertion times, maintainability, flexibility, complexity... You can build an "inverted index" of suffixes. The table would look like this (a sketch in MySQL syntax; with InnoDB the primary key is the clustered index):
CREATE TABLE Pref(
prefix varchar(255) NOT NULL,
fileid bigint(20) unsigned NOT NULL,
PRIMARY KEY (prefix, fileid)
);
and have data like this
'095_98_1002_1003_148_98_1001_003', 22474
'95_98_1002_1003_148_98_1001_003', 22474
'5_98_1002_1003_148_98_1001_003', 22474
'_98_1002_1003_148_98_1001_003', 22474
'98_1002_1003_148_98_1001_003', 22474
...
'03', 22474
'3', 22474
The clustered primary key covers both columns, so the rows are ordered by prefix and you can change the infix search '%abcd%' into the prefix search 'abcd%'. The query would then have the form
SELECT id, filename FROM `file`
WHERE id IN (SELECT fileid FROM Pref WHERE prefix like 'abcd%')
You just have to create triggers to keep it in sync with the main table. Beware that when you delete rows from this table, you should avoid searching by fileid without the prefix specified, or the performance would be a disaster (the primary key starts with prefix, so a lookup on fileid alone cannot use it).
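For an initial load of the Pref table (before any triggers take over), the suffixes can be generated in plain SQL. This is only a sketch: the helper table `digit` is made up, and it is used to build the positions 1 to 1000 so that the statement also works on MySQL versions without recursive CTEs:

```sql
-- Hypothetical helper table holding the digits 0..9.
CREATE TABLE digit (d TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO digit (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- One row per start position in each filename:
-- SUBSTRING(filename, pos) is the suffix that begins at that position.
INSERT INTO Pref (prefix, fileid)
SELECT SUBSTRING(f.filename, n.pos), f.id
FROM `file` AS f
JOIN (
  SELECT a.d + 10 * b.d + 100 * c.d + 1 AS pos   -- positions 1..1000
  FROM digit AS a
  CROSS JOIN digit AS b
  CROSS JOIN digit AS c
) AS n ON n.pos <= CHAR_LENGTH(f.filename);
```

A filename of length N produces N rows, which is where the memory cost mentioned above comes from.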

Can I create a MySQL index for LIKE searches with both left and right wildcards?

I’m using MySQL 5.5.37. I have a table with a column
`NAME` varchar(100) COLLATE utf8_bin NOT NULL
and I intend to have partial searches on the name column like
select * FROM organnization where name like '%abc%'
Note that I want to match the string "abc" occurring anywhere, not necessarily at the beginning. Given this, is there any index I can use on the column to optimize query execution?
If you expect only a few matching rows, you can still create an index on the name column to speed up queries, with the help of the primary key.
If your table has a primary key like
org_id int not null auto_increment primary key,
name varchar(100) COLLATE utf8_bin NOT NULL,
`desc` varchar(200) COLLATE utf8_bin NOT NULL,
size int,
....
you can create an index on (name, org_id) and write your query like this:
select * from orgnizations o1 join (select org_id from orgnizations where name like '%abcd%' ) o2 using (org_id)
This should be faster than your original query: the subquery still has to check every name, but it can read the names from the index, which is much smaller than the full rows, and only the matching rows are then fetched by primary key.
If you only need one or two other columns along with the name search, you can include those columns in the name index and run queries like
select org_id, name, size from orgnizations where name like '%abcd%'
which can be answered from the index alone and should still be faster than a full table scan.
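The index this answer describes could be created and checked roughly as follows. The index name `ix_org_name_id` is made up, and the table name `orgnizations` is taken from the answer's queries as-is:

```sql
-- Hypothetical index name; the columns follow the answer: (name, org_id).
CREATE INDEX ix_org_name_id ON orgnizations (name, org_id);

-- Check the plan for the inner query. If the optimizer reads only the index,
-- the Extra column of the EXPLAIN output is expected to mention "Using index".
EXPLAIN
SELECT org_id
FROM orgnizations
WHERE name LIKE '%abcd%';
```

Every index entry is still examined, so this reduces the amount of data read rather than the number of comparisons.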