My original question asked where best to set the transaction isolation level to READ UNCOMMITTED, but after some advice it seems that my initial idea of using that as a solution was incorrect.
DDL
CREATE TABLE `tblgpslog` (
`GPSLogID` BIGINT(20) NOT NULL AUTO_INCREMENT,
`DTSaved` DATETIME NULL DEFAULT NULL,
`PrimaryAssetID` BIGINT(20) NULL DEFAULT NULL,
`SecondaryAssetID` BIGINT(20) NULL DEFAULT NULL,
`ThirdAssetID` BIGINT(20) NULL DEFAULT NULL,
`JourneyType` CHAR(1) NOT NULL DEFAULT 'B',
`DateStamp` DATETIME NULL DEFAULT NULL,
`Status` VARCHAR(50) NULL DEFAULT NULL,
`Location` VARCHAR(255) NULL DEFAULT '',
`Latitude` DECIMAL(11,8) NULL DEFAULT NULL,
`Longitude` DECIMAL(11,8) NULL DEFAULT NULL,
`GPSFix` CHAR(2) NULL DEFAULT NULL,
`Speed` BIGINT(20) NULL DEFAULT NULL,
`Heading` INT(11) NULL DEFAULT NULL,
`LifeOdometer` BIGINT(20) NULL DEFAULT NULL,
`Extra` VARCHAR(20) NULL DEFAULT NULL,
`BatteryLevel` VARCHAR(5) NULL DEFAULT '--',
`Ignition` TINYINT(4) NOT NULL DEFAULT '1',
`Radius` INT(11) NOT NULL DEFAULT '0',
`GSMLatitude` DECIMAL(11,8) NOT NULL DEFAULT '0.00000000',
`GSMLongitude` DECIMAL(11,8) NOT NULL DEFAULT '0.00000000',
PRIMARY KEY (`GPSLogID`),
UNIQUE INDEX `GPSLogID` (`GPSLogID`),
INDEX `SecondaryUnitID` (`SecondaryAssetID`),
INDEX `ThirdUnitID` (`ThirdAssetID`),
INDEX `DateStamp` (`DateStamp`),
INDEX `PrimaryUnitIDDateStamp` (`PrimaryAssetID`, `DateStamp`, `Status`),
INDEX `Location` (`Location`),
INDEX `DTSaved` (`DTSaved`),
INDEX `PrimaryAssetID` (`PrimaryAssetID`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=153076364
;
The original query is as follows
SELECT L.GPSLogID, L.DateStamp, L.Status, Location, Latitude, Longitude, GPSFix, Speed, Heading, LifeOdometer, BatteryLevel, Ignition, L.Extra
FROM tblGPSLog L
WHERE PrimaryAssetID = 183 AND L.GPSLogID > 147694199
ORDER BY DateStamp ASC
LIMIT 100;
"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
"1","SIMPLE","L","index_merge","PRIMARY,GPSLogID,PrimaryUnitIDDateStamp,PrimaryAssetID","PrimaryAssetID,PRIMARY","9,8",\N,"96","Using intersect(PrimaryAssetID,PRIMARY); Using where; Using filesort"
This caused problems a few months ago, and after some investigation I changed the query to the one below, but that is now behaving very similarly.
EXPLAIN SELECT GPSLogID, DateStamp, tmpA.Status, Location, Latitude, Longitude, GPSFix, Speed, Heading, LifeOdometer, BatteryLevel, Ignition, tmpA.Extra,
PrimaryAssetID FROM (SELECT L.GPSLogID, L.DateStamp, L.Status, Location, Latitude, Longitude, GPSFix, Speed, Heading, LifeOdometer,
BatteryLevel, Ignition, L.Extra, PrimaryAssetID
FROM tblGPSLog L
WHERE L.GPSLogID > 147694199) AS tmpA
WHERE PrimaryAssetID = 183
ORDER BY DateStamp ASC;
"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
"1","PRIMARY","<derived2>","ALL",\N,\N,\N,\N,"5380842","Using where; Using filesort"
"2","DERIVED","L","range","PRIMARY,GPSLogID","PRIMARY","8",\N,"8579290","Using where"
Thanks for any advice.
Jim
I believe setting tx isolation to READ UNCOMMITTED, will stop the SELECT from locking the table.
Why would you believe that READ UNCOMMITTED will accomplish that?
SELECT is already non-locking by default in all isolation levels except for SERIALIZABLE.
That is, SELECT is always non-locking unless you use FOR UPDATE or FOR SHARE / LOCK IN SHARE MODE. When using SERIALIZABLE isolation level, SELECT is implicitly converted to a locking SELECT FOR SHARE. See https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html
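To make the distinction concrete, here is a minimal sketch of a non-locking read next to the two locking reads, run against the table from the question. The literal GPSLogID value is only a placeholder, and on MySQL 8.0 the shared form can also be written FOR SHARE.

```sql
-- Check the current isolation level
-- (on versions before 5.7.20 the variable is called tx_isolation).
SELECT @@transaction_isolation;

START TRANSACTION;

-- Plain SELECT: a non-locking read. It takes no row locks under
-- READ UNCOMMITTED, READ COMMITTED, or REPEATABLE READ.
SELECT GPSLogID, DateStamp FROM tblgpslog WHERE GPSLogID = 147694200;

-- Locking reads: these take row locks that are held until COMMIT/ROLLBACK.
SELECT GPSLogID, DateStamp FROM tblgpslog WHERE GPSLogID = 147694200 LOCK IN SHARE MODE;
SELECT GPSLogID, DateStamp FROM tblgpslog WHERE GPSLogID = 147694200 FOR UPDATE;

COMMIT;
```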
I strongly recommend never using READ UNCOMMITTED. Your transaction can read uncommitted work by other transactions, which means you can read inconsistent data (partially completed transactions) and dirty data (changes from transactions that are eventually rolled back). There is no advantage to doing this, and there is a real potential for queries to return wrong results.
What makes you think locking is the cause of your performance problem? Have you observed an increase in lock time in the slow query log?
It's more common for performance problems to be caused by poor query optimization or insufficient system resources.
If your database has become slower after 8+ years, I would guess that the database has grown until the active data set no longer fits in RAM.
Re your comment:
Is there a tool or way to investigate this further? I know the query that causing the issue, just can't determine why
There are many tools and ways to investigate. There are books on this subject like High Performance MySQL, and whole companies devoted to creating performance monitoring tools, like Percona and VividCortex.
I can't guess at a suggestion without knowing more specific details. If you want more help, can you please edit your original question above and add:
The SQL query that is having trouble.
The output of EXPLAIN <query> for the query that's having trouble.
The output of SHOW CREATE TABLE <tablename> for each table referenced by the query. You can run this statement in the MySQL client.
That's for starters.
Your statements
its rare that an SELECT would hit the table while INSERT is happening and even if it does, it wouldn't cause any great issues.
DELETE statements are scheduled once a week only at off peak hours,
equate to "Changing the isolation mode won't help much."
I recommend setting long_query_time=1 and turning on the slowlog. Later, look through the slowlog with pt-query-digest to find the few "worst" queries. Then let's discuss improving them.
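A minimal sketch of turning the slow log on at runtime, assuming an account with sufficient privileges. To survive a restart, the same settings would also need to go into the [mysqld] section of the config file.

```sql
-- Enable the slow query log without restarting the server.
SET GLOBAL slow_query_log = 'ON';

-- Log every statement that runs longer than 1 second.
-- The new global value applies to connections opened after this point.
SET GLOBAL long_query_time = 1;

-- Find out where the log goes, so the file can later be fed to pt-query-digest.
SHOW VARIABLES LIKE 'log_output';
SHOW VARIABLES LIKE 'slow_query_log_file';
```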
More
INDEX `PrimaryUnitIDDateStamp` (`PrimaryAssetID`, `DateStamp`, `Status`),
INDEX `PrimaryAssetID` (`PrimaryAssetID`)
The first of those takes care of the second, so the second is unnecessary.
PRIMARY KEY (`GPSLogID`),
UNIQUE INDEX `GPSLogID` (`GPSLogID`),
A PK is a UNIQUE key, so chuck the second of those. That extra unique index slows down inserts and wastes disk space.
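As a sketch, both redundant indexes could be dropped in one statement. The index names are taken from the SHOW CREATE TABLE output in the question. It is worth trying this on a copy of the table first, since the EXPLAIN in the question shows the optimizer currently choosing the single-column PrimaryAssetID index.

```sql
-- Drop the duplicate of the primary key and the index that is a
-- left prefix of PrimaryUnitIDDateStamp.
ALTER TABLE tblgpslog
  DROP INDEX `GPSLogID`,
  DROP INDEX `PrimaryAssetID`;
```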
In this, I see no reason to have a query and subquery:
SELECT GPSLogID, DateStamp, tmpA.Status, Location, Latitude,
Longitude, GPSFix, Speed, Heading, LifeOdometer, BatteryLevel,
Ignition, tmpA.Extra, PrimaryAssetID
FROM
( SELECT L.GPSLogID, L.DateStamp, L.Status, Location, Latitude,
Longitude, GPSFix, Speed, Heading, LifeOdometer, BatteryLevel,
Ignition, L.Extra, PrimaryAssetID
FROM tblGPSLog L
WHERE L.GPSLogID > 147694199
) AS tmpA
WHERE PrimaryAssetID = 183
ORDER BY DateStamp ASC;
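For comparison, a flattened version with both conditions in a single WHERE clause might look like the sketch below. It is essentially the original query from the question without the LIMIT. Whether it performs well still depends on which index the optimizer picks.

```sql
-- Same result set as the derived-table version, but without
-- materializing millions of rows into a temporary table first.
SELECT GPSLogID, DateStamp, Status, Location, Latitude, Longitude,
       GPSFix, Speed, Heading, LifeOdometer, BatteryLevel, Ignition,
       Extra, PrimaryAssetID
FROM tblgpslog
WHERE PrimaryAssetID = 183
  AND GPSLogID > 147694199
ORDER BY DateStamp ASC;
```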
A pair of DECIMAL(11,8) adds up to 12 bytes, and is overkill for lat&lng. See this for smaller alternatives.
The table has been growing in size, correct? And, after it got so big, performance took a nose dive? Shrinking datatypes to shrink the table is one approach, albeit a temporary fix.
Using intersect(PrimaryAssetID,PRIMARY) -- Almost always, it is better to build a composite index than to use "Index merge intersect".
Although
INDEX `PrimaryAssetID` (`PrimaryAssetID`)
should have been equivalent to
INDEX `PrimaryAssetID` (`PrimaryAssetID`, GPSLogID)
something is preventing it. Suggest you add this 2-column composite index. Perhaps a large percentage of rows have PrimaryAssetID = 183?? If convenient, please do SELECT COUNT(*) FROM tblgpslog WHERE PrimaryAssetID = 183
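A sketch of the suggested composite index plus the row-count check. The index name PrimaryAsset_GPSLog is made up for illustration, and on a table of this size the ALTER should be scheduled for a quiet period.

```sql
-- Equality column first, range column second, so one index
-- can satisfy both conditions of the WHERE clause.
ALTER TABLE tblgpslog
  ADD INDEX PrimaryAsset_GPSLog (PrimaryAssetID, GPSLogID);

-- How many rows belong to this asset? A large share of the table
-- would explain why the optimizer avoids the single-column index.
SELECT COUNT(*) FROM tblgpslog WHERE PrimaryAssetID = 183;

-- Re-check the plan afterwards; ideally "Using intersect" is gone.
EXPLAIN
SELECT GPSLogID, DateStamp, Status
FROM tblgpslog
WHERE PrimaryAssetID = 183 AND GPSLogID > 147694199
ORDER BY DateStamp ASC
LIMIT 100;
```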
Will you be purging 'old' data from this log? If so, the optimal way involves PARTITIONing; see this.
EDIT 2: now that we have optimized the db and narrowed down in MySQL - Why is phpMyAdmin extremely slow with this query that is super fast in php/mysqli?
EDIT 1: there are two solutions that helped us. One on database level (configuration) and one on query level. I could of course only accept one as the best answer, but if you are having similar problems, look at both.
We have a database that has been running perfectly fine for years. However, right now, we have a problem that I don't understand. Is it a mysql/InnoDB configuration problem? And we currently have nobody for system maintenance (I am a programmer).
The table TitelDaggegevens is a few gigabytes in size, with about 12,000,000 records, so nothing extraordinary.
If we do:
SELECT *
FROM TitelDaggegevens
WHERE fondskosten IS NULL
AND (datum BETWEEN 20200401 AND 20200430)
it runs fine, within a few tenths of a second.
The result: 52 records.
Also if we add ORDER BY datum or if we order by any other non-indexed field: all is well, same speed.
However, if I add ORDER BY id (id being the primary key), suddenly the query takes 15 seconds for the same 52 records.
And when I ORDER BY another indexed field, the query time increases to 4-6 minutes. For ordering 52 records. On an indexed field.
I have no clue what is going on. EXPLAIN doesn't help me. I optimized/recreated the table, checked it, and restarted the server. All to no avail. I am absolutely no expert on configuring MySQL or InnoDB, so I have no clue where to start the search.
I am just hoping that maybe someone recognises this and can point me into the right direction.
SHOW TABLE STATUS WHERE Name = 'TitelDaggegevens'
Gives me:
I know this is a very vague problem, but I am not able to pin it down more specifically. I enabled the logging for slow queries but the table slow_log stays empty. I'm lost.
Thank you for any ideas where to look.
This might be a help to someone who knows something about it, but not really to me, phpmyadmins 'Advisor':
EXPLAIN outputs were requested in the comments and in one of the answers:
1) Without ORDER BY and with ORDER BY datum (which is in the WHERE and has an index):
2) With ORDER BY plus any field other than datum (indexed or not, so the same for both quick and slow queries).
The table structure:
CREATE TABLE `TitelDaggegevens` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`isbn` decimal(13,0) NOT NULL,
`datum` date NOT NULL,
`volgendeDatum` date DEFAULT NULL,
`prijs` decimal(8,2) DEFAULT NULL,
`prijsExclLaag` decimal(8,2) DEFAULT NULL,
`prijsExclHoog` decimal(8,2) DEFAULT NULL,
`stadiumDienstverlening` char(2) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`stadiumLevenscyclus` char(1) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`gewicht` double(7,3) DEFAULT NULL,
`volume` double(7,3) DEFAULT NULL,
`24uurs` tinyint(1) DEFAULT NULL,
`UitgeverCode` varchar(4) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`imprintId` int(11) DEFAULT NULL,
`distributievormId` tinyint(4) DEFAULT NULL,
`boeksoort` char(1) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`publishingStatus` tinyint(4) DEFAULT NULL,
`productAvailability` tinyint(4) DEFAULT NULL,
`voorraadAlles` mediumint(8) unsigned DEFAULT NULL,
`voorraadBeschikbaar` mediumint(8) unsigned DEFAULT NULL,
`voorraadGeblokkeerdEigenaar` smallint(5) unsigned DEFAULT NULL,
`voorraadGeblokkeerdCB` smallint(5) unsigned DEFAULT NULL,
`voorraadGereserveerd` smallint(5) unsigned DEFAULT NULL,
`fondskosten` enum('depot leverbaar','depot onleverbaar','POD','BOV','eBoek','geen') COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ISBN+datum` (`isbn`,`datum`) USING BTREE,
KEY `UitgeverCode` (`UitgeverCode`),
KEY `Imprint` (`imprintId`),
KEY `VolgendeDatum` (`volgendeDatum`),
KEY `Index op voorraad om maxima snel te vinden` (`isbn`,`voorraadAlles`) USING BTREE,
KEY `fondskosten` (`fondskosten`),
KEY `Datum+isbn+fondskosten` (`datum`,`isbn`,`fondskosten`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=16519430 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci
Have this to handle the WHERE entirely:
INDEX(fondskosten, Datum)
Note: the = is first, then the range.
Fetch the *. Note: If there are big TEXT or BLOB columns that you don't need, spell out the SELECT list so you can avoid them. They may be stored "off-record", hence take longer to fetch.
An optional ORDER BY. If it is on Datum, then there is no extra effort. If it is on any other column, then there will be a sort. But a sort of 52 rows will be quite fast (milliseconds).
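The steps above can be sketched as follows. The index name fondskosten_datum is hypothetical. The EXPLAIN is only there to confirm that the new index is chosen even when sorting on id.

```sql
-- The "=" style test (IS NULL) comes first, then the range column.
ALTER TABLE TitelDaggegevens
  ADD INDEX fondskosten_datum (fondskosten, datum);

-- The slow variant from the question; with the index above only the
-- matching rows should be read before the (tiny) sort.
EXPLAIN
SELECT *
FROM TitelDaggegevens
WHERE fondskosten IS NULL
  AND datum BETWEEN 20200401 AND 20200430
ORDER BY id;
```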
Notes:
If you don't have fondskosten IS NULL or you have some other test, then all bets are off. We have to start over in designing the optimal composite index.
USE/FORCE INDEX -- use this as a last resort.
Always provide SHOW CREATE TABLE when needing to discuss a query.
The Advisor has some good stuff, but without any clues of what is "too big", it is rather useless.
I suspect all the other discussions failed to realize that there are far more than 52 rows for the given Datum range. That is fondskosten IS NULL is really part of the problem and solution.
For people searching for tweaks in similar cases, these are the changes the specialist made to the database that sped it up considerably. (Mind you, this is for a database with hundreds of tables and many very complex, large queries, sometimes joining over 15 tables, but without a massive number of records. The database is only 37 gigabytes.)
[mysqld]
innodb_buffer_pool_size=2G
innodb_buffer_pool_instances=4
innodb_flush_log_at_trx_commit=2
tmp_table_size=64M
max_heap_table_size=64M
join_buffer_size=4M
sort_buffer_size=8M
optimizer_search_depth=5
The optimizer_search_depth was DECREASED to minimize the time the optimizer needs for the complex queries.
After restarting the server, (regularly) run all queries that are the result of running this query:
SELECT CONCAT('OPTIMIZE TABLE `', TABLE_SCHEMA , '`.`', TABLE_NAME ,'`;') AS query
FROM INFORMATION_SCHEMA.TABLES
WHERE DATA_FREE/DATA_LENGTH > 2 AND DATA_LENGTH > 4*1024*1024
(This first set is best run when the server is offline or lightly used if you have large tables. It rebuilds, and thus optimizes, the tables that need it.)
And then:
SELECT CONCAT('ANALYZE TABLE `', TABLE_SCHEMA , '`.`', TABLE_NAME ,'`;') AS query
FROM INFORMATION_SCHEMA.TABLES
WHERE DATA_FREE/DATA_LENGTH > 2 AND DATA_LENGTH > 1*1024*1024
(This second series of queries is much lighter and less intrusive, but may still help speed up some queries by refreshing the index statistics that the optimizer uses to choose its query plans.)
It looks like the ORDER BY variants use 3 different execution plans:
ORDER BY id - Extra: Using index condition; Using where; Using filesort. MySQL uses a filesort to resolve the ORDER BY, even though the rows are already sorted. It takes 15 seconds.
ORDER BY Datum or another non-indexed field - Extra: Using index condition; Using where. MySQL uses the Datum index to resolve the ORDER BY. It takes well under a second.
ORDER BY index_field - Extra: Using index condition; Using where; Using filesort. MySQL uses a filesort to resolve the ORDER BY. The rows are unsorted. It takes a few minutes.
This is only my guess. Only EXPLAIN can tell what's really going on.
Influencing ORDER BY Optimization
UPD:
Could you check this query with each of the ORDER BY clauses?
SELECT *
FROM TitelDaggegevens USE INDEX FOR ORDER BY (Datum)
WHERE fondskosten IS NULL
AND (Datum BETWEEN 20200401 AND 20200430)
Also, you may try increasing the sort_buffer_size:
If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing.
On Linux, there are thresholds of 256KB and 2MB where larger values may significantly slow down memory allocation, so you should consider staying below one of those values.
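A minimal sketch of how that check could be done. The 1 MB value is only an illustrative choice that stays below the 2 MB threshold mentioned above, and it is set for the current session so other connections are unaffected.

```sql
-- Cumulative counter since server start; sample it twice
-- to see how fast it grows.
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';

-- Current setting, in bytes.
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Try a different value for this connection only, then re-run the slow query.
SET SESSION sort_buffer_size = 1024 * 1024;
```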
Say I have a table like the one below:
CREATE TABLE `hadoop_apps` (
`clusterId` smallint(5) unsigned NOT NULL,
`appId` varchar(35) COLLATE utf8_unicode_ci NOT NULL,
`user` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
`queue` varchar(35) COLLATE utf8_unicode_ci NOT NULL,
`appName` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`submitTime` datetime NOT NULL COMMENT 'App submission time',
`finishTime` datetime DEFAULT NULL COMMENT 'App completion time',
`elapsedTime` int(11) DEFAULT NULL COMMENT 'App duration in milliseconds',
PRIMARY KEY (`clusterId`,`appId`,`submitTime`),
KEY `hadoop_apps_ibk_finish` (`finishTime`),
KEY `hadoop_apps_ibk_queueCluster` (`queue`,`clusterId`),
KEY `hadoop_apps_ibk_userCluster` (`user`(8),`clusterId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
mysql> SELECT COUNT(*) FROM hadoop_apps;
This returns a count of 158593816.
So I am trying to understand what is inefficient about the below query and how I can improve it.
mysql> SELECT * FROM hadoop_apps WHERE DATE(finishTime)='10-11-2013';
Also, what's the difference between these two queries?
mysql> SELECT * FROM hadoop_apps WHERE user='foobar';
mysql> SELECT * FROM hadoop_apps HAVING user='foobar';
WHERE DATE(finishTime)='10-11-2013';
This is a problem for the optimizer because anytime you put a column into a function like this, the optimizer doesn't know if the order of values returned by the function will be the same as the order of values input to the function. So it can't use an index to speed up lookups.
To solve this, refrain from putting the column inside a function call like that, if you want the lookup against that column to use an index.
Also, you should use MySQL standard date format: YYYY-MM-DD.
WHERE finishTime BETWEEN '2013-10-11 00:00:00' AND '2013-10-11 23:59:59'
What is the difference between [conditions in WHERE and HAVING clauses]?
The WHERE clause is for filtering rows.
The HAVING clause is for filtering results after applying GROUP BY.
See SQL - having VS where
If WHERE works, it is preferred over HAVING. The former is done earlier in the processing, thereby cutting down on the amount of data to shovel through. OK, in your one example, there may be no difference between them.
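A small sketch where the two clauses really do different jobs. The clusterId value and the threshold of 1000 are arbitrary placeholders.

```sql
SELECT `user`, COUNT(*) AS apps
FROM hadoop_apps
WHERE clusterId = 1        -- filters individual rows before grouping
GROUP BY `user`
HAVING COUNT(*) > 1000;    -- filters whole groups after aggregation
```

The condition on COUNT(*) cannot be moved into WHERE, because the count does not exist until the rows have been grouped.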
I cringe whenever I see a DATETIME in a UNIQUE key (your PK). Can't the app have two rows in the same second? Is that a risk you want to take?
Even changing to DATETIME(6) (microseconds) could be risky.
Regardless of what you do in that area, I recommend this pattern for testing:
WHERE finishTime >= '2013-10-11'
AND finishTime < '2013-10-11' + INTERVAL 1 DAY
It works "correctly" for DATE, DATETIME, and DATETIME(6), etc. Other flavors add an extra midnight or miss parts of a second. And it avoids hassles with leapdays, etc, if the interval is more than a single day.
KEY `hadoop_apps_ibk_userCluster` (`user`(8),`clusterId`)
is bad. It won't get past user(8). And prefixing like that is often useless. Let's see the query that tempted you to build that key; we'll come up with a better one.
158M rows with 4 varchars. And they sound like values that don't have many distinct values? Build lookup tables and replace them with SMALLINT UNSIGNED (2 bytes, 0..64K range) or other small id. This will significantly shrink the table, thereby making it faster.
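As a rough sketch of the lookup-table idea for the user column only. The table name hadoop_users and the column userId are hypothetical. Migrating the 158M-row table to reference the new ids is a separate, much heavier step that is not shown here.

```sql
-- One row per distinct user name; 2-byte ids allow up to 65535 users.
CREATE TABLE hadoop_users (
  userId SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `user` VARCHAR(64) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (userId),
  UNIQUE KEY (`user`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- Populate it from the existing data.
INSERT INTO hadoop_users (`user`)
SELECT DISTINCT `user` FROM hadoop_apps;

-- Sanity check: if this is close to 65535, use a wider id type instead.
SELECT COUNT(*) FROM hadoop_users;
```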
I have a MySQL database of around 50 GB with millions of rows. Here is my table structure:
CREATE TABLE `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`mac` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`firstTime` datetime DEFAULT NULL,
`lastTime` datetime DEFAULT NULL,
`locid` int(11) DEFAULT NULL,
`client_id` int(11) DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`isOut` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_logs_on_location_id` (`location_id`),
KEY `index_logs_on_client_id` (`client_id`),
KEY `macID` (`macID`)
) ENGINE=InnoDB AUTO_INCREMENT=39537721 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I am looking for ways to avoid full table scans. I tried adding an index on the mac column. However, when I run EXPLAIN on my queries, possible_keys and key are always NULL when I don't use client_id in the WHERE clause; otherwise the only index used is the one on client_id or location_id, which has no significant effect on execution time. I mainly use these types of queries (grouping, sorting, etc.):
SELECT mac,COUNT(mac),DATE(lastTime)
FROM logs
WHERE client_id = 1
GROUP BY mac,DATE(lastTime)
When you consider this type of table structure, how can I optimize my table to execute queries faster? I'm open to all suggestions. Thank you
Whether MySQL (or Oracle, SQL Server, Postgres, MariaDB, DB2 and others) uses an index depends on how unique the data in the mac column is and how those values are distributed. The database engines mentioned use a cost-based optimizer, which estimates the cost of each candidate plan and executes the one with the lowest estimated cost. Sometimes the estimate is wrong. It can be influenced by tuning database parameters, but this can have unexpected side effects on other queries.
The second way to influence the result is to change the data structure.
The third and most feasible way is to influence the execution plan by providing a hint. For this, let's assume an index is present on mac and lastTime so that the db engine only needs to load this index to do its job:
CREATE INDEX idx_mac_nn_1 ON logs(mac,lastTime);
The assumed to be optimized query is (so your version without the client_id column)
SELECT mac,COUNT(mac),DATE(lastTime)
FROM logs FORCE INDEX (idx_mac_nn_1)
GROUP BY mac,DATE(lastTime);
This should then push MySQL to use the index: with FORCE INDEX, a table scan is considered only if there is no way to use the named index.
For this query:
SELECT mac, COUNT(mac), DATE(lastTime)
FROM logs
WHERE client_id = 1
GROUP BY mac, DATE(lastTime)
You want an index on (client_id, mac, lastTime). I would suggest a covering index, if you don't mind the extra space required.
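A sketch of that index, with a made-up name. Because the query only touches client_id, mac and lastTime, this index is already covering for it, which EXPLAIN should report as "Using index" in the Extra column.

```sql
-- Equality column first, then the grouping columns.
ALTER TABLE logs
  ADD INDEX idx_client_mac_lasttime (client_id, mac, lastTime);

-- Verify that the new index is picked and that the table rows are not read.
EXPLAIN
SELECT mac, COUNT(mac), DATE(lastTime)
FROM logs
WHERE client_id = 1
GROUP BY mac, DATE(lastTime);
```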
We have a data set that is fairly static in a MySQL database, but the read times are terrible (even with indexes on the columns being queried). The theory is that since rows are stored randomly (or sometimes in order of insertion), the disk head has to scan around to find different rows, even if it knows where they are due to the index, instead of just reading them sequentially.
Is it possible to change the order data is stored in on disk so that it can be read sequentially? Unfortunately, we can't add a ton more RAM at the moment to have all the queries cached. If it's possible to change the order, can we define an order within an order? As in, sort by a certain column, then sort by another column if the first column is equal.
Could this have something to do with the indices?
Additional details: non-relational single-table database with 16 million rows, 1 GB of data total, 512 mb RAM, MariaDB 5.5.30 on Ubuntu 12.04 with a standard hard drive. Also this is a virtualized machine using OpenVZ, 2 dedicated core E5-2620 2Ghz CPU
Create syntax:
CREATE TABLE `Events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`provider` varchar(10) DEFAULT NULL,
`location` varchar(5) DEFAULT NULL,
`start_time` datetime DEFAULT NULL,
`end_time` datetime DEFAULT NULL,
`cost` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `provider` (`provider`),
KEY `location` (`location`),
KEY `start_time` (`start_time`),
KEY `end_time` (`end_time`),
KEY `cost` (`cost`)
) ENGINE=InnoDB AUTO_INCREMENT=16321002 DEFAULT CHARSET=utf8;
Select statement that takes a long time:
SELECT *
FROM `Events`
WHERE `Events`.start_time >= '2013-05-03 23:00:00' AND `Events`.start_time <= '2013-06-04 22:00:00' AND `Events`.location = 'Chicago'
Explain select:
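"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
"1","SIMPLE","Events","ref","location,start_time","location","18","const","3684","Using index condition; Using where"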
MySQL will usually pick only one index per table upon which to filter. It tracks the cardinality of each index and chooses the one that is likely to be the most selective (i.e. has the highest cardinality): in this case, it has chosen the location index, but that still leaves an estimated 3,684 records that must be fetched and then filtered Using where to find those that match the desired range of start_time.
You should try creating a composite index over (location, start_time):
ALTER TABLE Events ADD INDEX (location, start_time)
I have a fairly big MySQL database, 300 MB, with 4 tables: the first is about 200 MB and the second about 80 MB.
There are 150,000 records in the first table and 200,000 in the second.
I also use inner joins between them.
A SELECT takes 3 seconds now that I have optimized it and added indexes (before that it took about 20-30 seconds).
That is a good enough result, but I need more, because the page takes 7-8 seconds to load (3-4 for the select, 1 for the count, 1 for other small queries, and 1-2 for page generation).
So what should I do next? Would Postgres take less time than MySQL? Or would it be better to use memcached, although that could take a lot of memory (there are too many sorting variants)?
Does anybody have another idea? I would be glad to hear it :)
OK. I see we need queries:)
I renamed fields for table_1.
CREATE TABLE `table_1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`field` varchar(2048) DEFAULT NULL,
`field` varchar(2048) DEFAULT NULL,
`field` int(10) unsigned DEFAULT NULL,
`field` text,
`field` text,
`field` text,
`field` varchar(128) DEFAULT NULL,
`field` text,
`field` text,
`field` text,
`field` text,
`field` text,
`field` varchar(128) DEFAULT NULL,
`field` text,
`field` varchar(4000) DEFAULT NULL,
`field` varchar(4000) DEFAULT NULL,
`field` int(10) unsigned DEFAULT '1',
`field` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`field` text,
`new` tinyint(1) NOT NULL DEFAULT '0',
`applications` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `indexNA` (`new`,`applications`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=153235 DEFAULT CHARSET=utf8;
CREATE TABLE `table_2` (
`id_record` int(10) unsigned NOT NULL AUTO_INCREMENT,
`catalog_name` varchar(512) NOT NULL,
`catalog_url` varchar(4000) NOT NULL,
`parent_id` int(10) unsigned NOT NULL DEFAULT '0',
`checked` tinyint(1) NOT NULL DEFAULT '0',
`level` int(10) unsigned NOT NULL DEFAULT '0',
`work` int(10) unsigned NOT NULL DEFAULT '0',
`update` int(10) unsigned NOT NULL DEFAULT '1',
`type` int(10) unsigned NOT NULL DEFAULT '0',
`hierarchy` varchar(512) DEFAULT NULL,
`synt` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_record`,`type`) USING BTREE,
KEY `rec` (`id_record`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=14504 DEFAULT CHARSET=utf8;
CREATE TABLE `table_3` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_table_1` int(10) unsigned NOT NULL,
`id_category` int(10) unsigned NOT NULL,
`work` int(10) unsigned NOT NULL DEFAULT '1',
`update` int(10) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `site` (`id_table_1`,`id_category`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=203844 DEFAULT CHARSET=utf8;
The queries are:
1) get general count (takes less than 1 sec):
SELECT count(table_1.id) FROM table_1
INNER JOIN table_3 ON table_3.id_table_id = table_1.id
INNER JOIN table_2 ON table_2.id_record = table_3.id_category
WHERE ((table_2.type = 0)
AND (table_3.work = 1 AND table_2.work = 1)
AND (table_1.new = 1))AND 1 IN (table_1.applications)
2) get the list for a page with a limit (it takes from 3 to 7 seconds, depending on the count):
SELECT table_1.field, table_1.field, table_1.field, table_1.field, table_2.catalog_name FROM table_1
INNER JOIN table_3 ON table_3.id_table_id = table_1.id
INNER JOIN table_2 ON table_2.id_record = table_3.id_category
WHERE ((table_2.type = 0)
AND (table_3.work = 1 AND table_2.work = 1)
AND (table_1.new = 1))AND 1 IN (table_1.applications) LIMIT 10 OFFSET 10
Do Not Change DBMS
I would not suggest changing your DBMS; it may be very disruptive. If you have used MySQL-specific queries that are not compatible with Postgres, you might need to redo them along with your whole indexing scheme. Even then, a performance improvement is not guaranteed.
Caching is a Good Option
Caching is a really good idea. It takes load off your DBMS. It is best suited to workloads with heavy reads and light writes, because objects then stay in the cache longer. Memcached is a really good caching mechanism and is really simple. Rapidly scaling sites (like Facebook and the like) make heavy use of Memcached to take load off the database.
How to Scale-up Really Big Time
You do not have a very heavy data set, so most likely caching will be enough. The next step beyond caching is a NoSQL-based solution like Cassandra. We use Cassandra in one of our applications where we have heavy read and write (50:50) operations and the database is really large and fast growing. Cassandra gives good performance. But I guess in your case Cassandra is overkill.
But...
Before you dive into any serious changes, I would suggest really looking into indexes. Try scaling vertically. Look into slow queries (search for the slow query logging directive). Hopefully MySQL will be faster after optimizing these things and you will not need additional tools.
You should look into indexing specific to the most frequent/time consuming queries you use. Check this post on indexing for mysql.
Aside from all the suggestions others have offered, I've slightly altered the queries, although I'm not positive about the performance impact under MySQL. I've added STRAIGHT_JOIN so the optimizer doesn't try to decide for you in which order to join the tables.
Next, I moved the "AND" conditions into the respective JOIN clauses for tables 2 & 3.
Finally, the join from table 1 to 3 had (in your post)
table_3.id_table_id = table_1.id
instead of
table_3.id_table_1 = table_1.id
Additionally, I can't predict the performance, but it may help to have a stand-alone index on just the "new" column, so the exact match is resolved first without regard to the "applications" column. I don't know if the compound index is causing an issue, since you are using an "IN" for applications, which is not truly an indexable search.
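A sketch of that stand-alone index, with a hypothetical name, plus a note on what the IN test actually does. With a single-item list, IN is just an equality comparison, so the varchar column is converted to a number for every row.

```sql
-- Stand-alone index on the exact-match column.
ALTER TABLE table_1 ADD INDEX idx_new (`new`);

-- These two filters behave the same: IN with one item is an equality test.
SELECT COUNT(*) FROM table_1 WHERE `new` = 1 AND 1 IN (applications);
SELECT COUNT(*) FROM table_1 WHERE `new` = 1 AND applications = 1;

-- Assumption: if applications actually holds a comma-separated list
-- such as '5,1', the comparison above misses that row, and
-- FIND_IN_SET would be the matching test instead.
SELECT COUNT(*) FROM table_1 WHERE `new` = 1 AND FIND_IN_SET('1', applications) > 0;
```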
Here are the modified queries:
SELECT STRAIGHT_JOIN
count(table_1.id)
FROM
table_1
JOIN table_3
ON table_1.id = table_3.id_table_1
AND table_3.work = 1
JOIN table_2
ON table_3.id_category = table_2.id_record
AND table_2.type = 0
AND table_2.work = 1
WHERE
table_1.new = 1
AND 1 IN (table_1.applications);
SELECT STRAIGHT_JOIN
table_1.field,
table_1.field,
table_1.field,
table_1.field,
table_2.catalog_name
FROM
table_1
JOIN table_3
ON table_1.id = table_3.id_table_1
AND table_3.work = 1
JOIN table_2
ON table_3.id_category = table_2.id_record
AND table_2.type = 0
AND table_2.work = 1
WHERE
table_1.new = 1
AND 1 IN (table_1.applications)
LIMIT 10 OFFSET 10
You should also optimize your query.
Without a look into the statements this question can only be answered using theoretical approaches. Just a few ideas to take into consideration...
The SELECT-Statement...
First of all, make sure that your query is as "good" as it can be. Are there any indexes you might have missed? Do the indexed columns being compared have the same field types, and so on? Can you perhaps narrow the query down so the database has less to work on?
The Query cache...
If your query is repeated pretty often, it might help to use the Query cache or - in case you're already using it - give it more RAM.
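A quick way to see whether the query cache is in use and how well it is doing, as a sketch. Note that the query cache was removed in MySQL 8.0, so this only applies to older versions and to MariaDB.

```sql
-- Is the cache enabled, and how much memory does it have?
SHOW VARIABLES LIKE 'query_cache%';

-- Hit and insert counters; many Qcache_lowmem_prunes suggest it is too small.
SHOW GLOBAL STATUS LIKE 'Qcache%';
```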
The Hardware...
Of course, some RDBMSs are slower or faster than others, depending on their strengths and weaknesses, but once your query has been optimized as far as it can go, the only way to make it faster is to scale up the database server (better CPU, better I/O and so on, depending on where the bottleneck is).
Other Factors...
If this all is maxed out, maybe try speeding up the other components (1-2 secs for page generation looks pretty slow to me).
For all of the factors mentioned there is a huge number of ideas and posts on stackoverflow.com.
That is actually not such a big database, certainly not too much for your database system. As comparison, the database that we are using is currently around 40 GB. It's an MS SQL Server, though, so it's not directly comparable, but there is no dramatic difference between the database systems.
My guess is that you haven't been completely successful in using indexes to speed up the query. You should look at the execution plan for the query and see if you can spot which part of the execution is taking most of the time.