Longest MySQL queries for worst-case testing

I have a big MySQL database (planned size is about one million entries) and I want to test its performance by creating the worst query (longest calculation time) I can.
For now it is a database with two tables:
CREATE TABLE user (ID BIGINT NOT NULL AUTO_INCREMENT,
createdAt DATETIME NULL DEFAULT NULL,
lastAction DATETIME NULL DEFAULT NULL,
ip TEXT NULL DEFAULT NULL,
browser TEXT NULL DEFAULT NULL,
PRIMARY KEY (ID));
CREATE TABLE evt (ID BIGINT AUTO_INCREMENT,
UID BIGINT NULL DEFAULT NULL,
timeStamp DATETIME NULL DEFAULT NULL,
name TEXT NULL DEFAULT NULL,
PRIMARY KEY (ID),
FOREIGN KEY (UID)
REFERENCES user(ID));
It's populated and runs locally, so no network connection is involved.
Are there any rules of thumb on how to create horrible queries?
My worst query for now was:
SELECT user.browser, evt.name, count(*) as AmountOfActions
FROM evt
JOIN user ON evt.UID = user.ID
GROUP BY user.browser, evt.name
ORDER BY AmountOfActions DESC

The number one cost in a query is disk hits. So, make a table big enough that it cannot be cached in RAM. And/or do a cross join (etc.) such that an intermediate table is too big to be cached in RAM.
A common problem on this forum is lots of joins followed by a GROUP BY, or lots of joins plus an ORDER BY on the big intermediate result.
Here's a double-whammy -- join two tables (each too big to be cached) on a UUID.
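For instance, here is a sketch against the schema above (not from the original answer): a self-join on the un-indexed TEXT column name behaves almost like a cross join, and the GROUP BY plus ORDER BY then run over that huge intermediate result:
-- Deliberately awful: no index on name (TEXT), so every row of a is
-- compared against every row of b, and the grouped result is filesorted.
SELECT a.name, COUNT(*) AS pairs
FROM evt AS a
JOIN evt AS b
  ON a.name = b.name
 AND a.ID <> b.ID
GROUP BY a.name
ORDER BY pairs DESC;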

Related

How to avoid full table scan

I have a MySQL database around 50 GB in size with millions of rows. Here is my table structure:
CREATE TABLE `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`mac` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`firstTime` datetime DEFAULT NULL,
`lastTime` datetime DEFAULT NULL,
`locid` int(11) DEFAULT NULL,
`client_id` int(11) DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`isOut` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_logs_on_location_id` (`locid`),
KEY `index_logs_on_client_id` (`client_id`),
KEY `macID` (`mac`)
) ENGINE=InnoDB AUTO_INCREMENT=39537721 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I was looking for ways to avoid full table scans. I tried adding an index on the mac column. However, when I run EXPLAIN on my queries, possible_keys and key are always NULL when I don't use client_id in the WHERE clause; otherwise the only index used is client_id or locid, which has no significant effect on execution time. I mainly use these types of queries (grouping, sorting, etc.):
SELECT mac,COUNT(mac),DATE(lastTime)
FROM logs
WHERE client_id = 1
GROUP BY mac,DATE(lastTime)
Given this type of table structure, how can I optimize my table so queries execute faster? I'm open to all suggestions. Thank you.
Whether MySQL (or Oracle, SQL Server, Postgres, MariaDB, DB2, and others) uses an index depends on how unique the data in the mac column is and how that uniqueness is distributed. The database engines mentioned use a cost-based optimizer, which estimates the cost of each candidate solution and executes the one with the lowest cost. Sometimes the estimates are wrong. They can be influenced by playing with database parameters, but that can have unexpected side effects on other queries.
The second way to influence the result is to change the data structure.
The third, and most feasible, way is to influence the execution plan by providing a hint. For this, let's assume an index is present on mac and lastTime, so that the db engine only needs to read the index to do its job:
CREATE INDEX idx_mac_nn_1 ON logs(mac,lastTime);
The query to be optimized is then (your version without the client_id filter):
SELECT mac,COUNT(mac),DATE(lastTime)
FROM logs FORCE INDEX (idx_mac_nn_1)
GROUP BY mac,DATE(lastTime);
This should then force MySQL to use the index no matter what.
For this query:
SELECT mac, COUNT(mac), DATE(lastTime)
FROM logs
WHERE client_id = 1
GROUP BY mac, DATE(lastTime)
You want an index on (client_id, mac, lastTime). I would suggest a covering index, if you don't mind the extra space required.
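A sketch of that covering index (the name is illustrative): client_id comes first for the WHERE clause, then mac and lastTime for the GROUP BY, so the whole query can be answered from the index alone:
-- Covering index for: WHERE client_id = ? GROUP BY mac, DATE(lastTime).
-- The name idx_client_mac_lasttime is illustrative.
ALTER TABLE logs ADD INDEX idx_client_mac_lasttime (client_id, mac, lastTime);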

ORDER BY ignored as there is a user-defined clustered index in the table

I have a table
CREATE TABLE `tableMain` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`value1` varchar(45) NOT NULL,
`value2` varchar(50) NOT NULL,
`value3` int NOT NULL,
`value4` timestamp NOT NULL,
`value5` int NOT NULL,
PRIMARY KEY (`id`)
)
So I create that table and I want it to always be ordered by value2; if two rows have the same value2 it should sort by value3, and after that by value4.
So I try to do it like this:
ALTER TABLE tableMain
ORDER BY value2 ASC, value3 ASC, value4 ASC
And when I run that code I get an error:
Error Code: 1105. ORDER BY ignored as there is a user-defined clustered index in the table 'tableMain'
I want to add that I got this as homework for school, and others who have the same task can run this ALTER TABLE line. So I'm a little confused and don't know what to do.
If we want rows returned in a particular sequence, we can specify a suitable ORDER BY clause in the query.
The idea that rows need to be "ordered" in a table flies in the face of relational database theory. (A relation is a set of tuples; altering the "order" of tuples within a relation does not alter the relation.)
Translating the theory into practice, with the InnoDB storage engine, it doesn't make sense to specify ORDER BY as a table attribute, since an InnoDB table will always be ordered (rows stored in sequence), arranged by its cluster index.
In the case of the MyISAM storage engine, specifying ORDER BY may improve performance of some queries. The ALTER TABLE ... ORDER BY statement only reorganizes the table one time. The "order" of the rows may not be preserved when subsequent DELETE, INSERT and UPDATE statements are run.
To reiterate: if we need rows returned in a particular order, we should not depend on the "order" that rows are physically stored in a table. It's imperative that we include an ORDER BY on the query.
To really improve performance with large tables, adding appropriate indexes is the way to go.
As to why your classmates get the statement to run, and your statement returns an error... the most likely explanation is that their tables are using the MyISAM storage engine, while your table is using the InnoDB storage engine.
(Whatever the assignment is, changing the storage engine of your table cannot be the right answer... the MyISAM storage engine is an appropriate choice for some use cases; but InnoDB is the most appropriate choice for traditional "relational database" use cases.)
If your requirement is that your InnoDB table "always be ordered by" a set of columns (for whatever reason), then have the cluster index include those columns as the leading columns. You can do that by declaring those columns as the leading columns of the PRIMARY KEY of your table, and creating a UNIQUE INDEX on the id column:
CREATE TABLE `tableMain`
( `id` INT(11) NOT NULL AUTO_INCREMENT
, `value1` VARCHAR(45) NOT NULL
, `value2` VARCHAR(50) NOT NULL
, `value3` INT NOT NULL
, `value4` TIMESTAMP NOT NULL
, `value5` INT NOT NULL
, PRIMARY KEY (`value2`,`value3`,`value4`,`id`)
, UNIQUE KEY `tableMain_UX1` (`id`)
)
In reality, we'd never do this... because any secondary indexes are going to include the PRIMARY KEY values as the "pointer" back to the cluster index, and that's going to be an incredible waste of resources. In practice, we'd leave id as the PRIMARY KEY of the table, and create a secondary index on the other columns...
CREATE TABLE `tableMain`
( `id` INT(11) NOT NULL AUTO_INCREMENT
, `value1` VARCHAR(45) NOT NULL
, `value2` VARCHAR(50) NOT NULL
, `value3` INT NOT NULL
, `value4` TIMESTAMP NOT NULL
, `value5` INT NOT NULL
, PRIMARY KEY (`id`)
, KEY `tableMain_IX1` (`value2`,`value3`,`value4`)
)
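With the secondary index in place, a query that asks for the order explicitly can often read it straight from tableMain_IX1. A covering query like this sketch (selecting only indexed columns) makes that likely, though the optimizer's cost estimate has the final say:
-- tableMain_IX1 can satisfy this ORDER BY without a filesort, since the
-- query touches only columns contained in the index.
SELECT value2, value3, value4
FROM tableMain
ORDER BY value2, value3, value4;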

GROUP BY query - why so slow?

I am trying to run a GROUP BY query on a large table (more than 8 million rows). However, I can reduce the need to group all the data by date. I have a view that captures the dates I require, and this limits the query, but it's not much better.
Finally I need to join to another table to pick up a field.
I am showing the query, the create on the main table and the query explain below.
Main Query:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
INNER JOIN summary_max
ON pgi_raw_data.wsp_channel = summary_max.wsp_channel
AND pgi_raw_data.dated > summary_max.race_date
INNER JOIN pgi_accounts
ON pgi_raw_data.account = pgi_accounts.account
GROUP BY pgi_raw_data.event_id
ORDER BY NULL
The create table:
CREATE TABLE `pgi_raw_data` (
`event_id` char(25) NOT NULL DEFAULT '',
`wsp_channel` varchar(5) NOT NULL,
`dated` date NOT NULL,
`time` time DEFAULT NULL,
`program` varchar(5) NOT NULL,
`track` varchar(25) NOT NULL,
`raceno` tinyint(2) NOT NULL,
`detail` varchar(30) DEFAULT NULL,
`ticket` varchar(20) NOT NULL DEFAULT '',
`breed` varchar(12) NOT NULL,
`pool` varchar(10) NOT NULL,
`gross` decimal(11,2) NOT NULL,
`refunds` decimal(11,2) NOT NULL,
`handle` decimal(11,2) NOT NULL,
`payout` decimal(11,4) NOT NULL,
`rebate` decimal(11,4) NOT NULL,
`profit` decimal(11,4) NOT NULL,
`account` mediumint(10) NOT NULL,
PRIMARY KEY (`event_id`,`ticket`),
KEY `idx_account` (`account`),
KEY `idx_wspchannel` (`wsp_channel`,`dated`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This is my view for summary_max:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW
`summary_max` AS select `pgi_summary_tbl`.`wsp_channel` AS
`wsp_channel`, max(`pgi_summary_tbl`.`race_date`) AS `race_date`
from `pgi_summary_tbl` group by `pgi_summary_tbl`.`wsp_channel`
And the EXPLAIN output:
id  select_type  table            type  possible_keys               key             key_len  ref                                    rows    Extra
1   PRIMARY      <derived2>       ALL   NULL                        NULL            NULL     NULL                                   6       Using temporary
1   PRIMARY      pgi_raw_data     ref   idx_account,idx_wspchannel  idx_wspchannel  7        summary_max.wsp_channel                470690  Using where
1   PRIMARY      pgi_accounts     ref   PRIMARY                     PRIMARY         3        gf3data_momutech.pgi_raw_data.account  29      Using index
2   DERIVED      pgi_summary_tbl  ALL   NULL                        NULL            NULL     NULL                                   42282   Using temporary; Using filesort
Any help on indexing would be appreciated.
At a minimum you need indexes on these fields:
pgi_raw_data.wsp_channel,
pgi_raw_data.dated,
pgi_raw_data.account,
pgi_raw_data.event_id,
summary_max.wsp_channel,
summary_max.race_date,
pgi_accounts.account
The general (not always) rule is anything you are sorting, grouping, filtering or joining on should have an index.
Also: pgi_summary_tbl.wsp_channel
Also, why the order by null?
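As a sketch of the missing piece: pgi_raw_data already has idx_account and idx_wspchannel, and event_id already leads its primary key, so the main gap is on the summary data. Note that summary_max is a view, so the index goes on its base table pgi_summary_tbl (the index name is illustrative):
-- Lets the MAX(race_date)-per-wsp_channel view be resolved from the
-- index alone. The name is illustrative.
CREATE INDEX idx_summary_chan_date ON pgi_summary_tbl (wsp_channel, race_date);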
The first thing is to be sure that you have indexes on pgi_summary_tbl(wsp_channel, race_date) and pgi_accounts(account). For this query, you don't need indexes on these columns in the raw data.
MySQL has a tendency to use indexes even when they are not the most efficient path. I would start by looking at the performance of the "full" query, without the joins:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
-- pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
GROUP BY pgi_raw_data.event_id
If this has better performance, you may have a situation where the indexes are working against you. The specific problem is called "thrashing". It occurs when a table is too big to fit into memory. Often, the fastest way to deal with such a table is to just read the whole thing. Accessing the table through an index can result in an extra I/O operation for most of the rows.
If this works, then do the joins after the aggregate. Also, consider getting more memory, so the whole table will fit into memory.
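A hedged sketch of "do the joins after the aggregate": aggregate pgi_raw_data in a derived table first, then join the much smaller result. This is not strictly equivalent to the original when the joins filter or duplicate rows, and it relies on MySQL's loose GROUP BY semantics just as the original query does; it only illustrates the shape of the rewrite:
SELECT agg.wsp_channel,
       'IOM' AS wsp,
       agg.dated,
       pgi_accounts.`master`,
       agg.event_id,
       agg.breed,
       agg.handle, agg.payout, agg.rebate, agg.profit
FROM (
      -- Aggregate the big table once, before any joins
      SELECT wsp_channel, dated, event_id, breed, account,
             SUM(handle) AS handle, SUM(payout) AS payout,
             SUM(rebate) AS rebate, SUM(profit) AS profit
      FROM pgi_raw_data
      GROUP BY event_id
     ) AS agg
INNER JOIN summary_max
        ON agg.wsp_channel = summary_max.wsp_channel
       AND agg.dated > summary_max.race_date
INNER JOIN pgi_accounts
        ON agg.account = pgi_accounts.account;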
Second, if you have to deal with this type of data, then partitioning the table by date may prove to be a very useful option. This will allow you to significantly reduce the overhead of reading the large table. You do have to be sure that the summary table can be read the same way.
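A sketch of the partitioning idea (the boundary dates are invented; note that MySQL requires the partitioning column to appear in every unique key, so the primary key has to be extended first):
-- Extend the PK so it includes the partitioning column.
ALTER TABLE pgi_raw_data
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (event_id, ticket, dated);

-- Range-partition by date; the boundaries here are illustrative.
ALTER TABLE pgi_raw_data
  PARTITION BY RANGE (TO_DAYS(dated)) (
    PARTITION p2013 VALUES LESS THAN (TO_DAYS('2014-01-01')),
    PARTITION p2014 VALUES LESS THAN (TO_DAYS('2015-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  );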

Optimization of a MySQL view

I want to join two MySQL tables and store the result as a view, so I can address this view in an application instead of querying the two tables. But this view turns out to be extremely slow.
These are my tables:
CREATE TABLE spectrumsets (
setid INT(11) NOT NULL,
timestampdt INT(11) NULL DEFAULT NULL,
timestampd INT(10) UNSIGNED NOT NULL,
timestampt INT(10) UNSIGNED NOT NULL,
device INT(11) NOT NULL,
methodname VARCHAR(50) NOT NULL,
PRIMARY KEY (setid),
UNIQUE INDEX setid_idx (setid),
UNIQUE INDEX timestamp_device_idx (timestampd, timestampt, device),
INDEX device_fk (device),
INDEX timestampd_idx (timestampd),
CONSTRAINT device_fk FOREIGN KEY (device)
REFERENCES spectrumdevices (deviceid)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
CREATE TABLE spectrumdata (
valueid INT(11) NOT NULL AUTO_INCREMENT,
spectrumset INT(11) NOT NULL,
wavelength DOUBLE NULL DEFAULT NULL,
intensity DOUBLE NULL DEFAULT NULL,
PRIMARY KEY (valueid),
INDEX spectrumset_idx (spectrumset),
CONSTRAINT spectrumset_fk FOREIGN KEY (spectrumset)
REFERENCES spectrumsets (setid)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
And this is my view:
SELECT spectrumsets.timestampd,spectrumsets.timestampt,spectrumsets.device,
spectrumdata.wavelength,spectrumdata.intensity
FROM spectrumdata INNER JOIN spectrumsets ON spectrumdata.spectrumset=
spectrumsets.setid
WHERE spectrumdata.wavelength>0
ORDER BY spectrumsets.timestampd,spectrumsets.timestampt,spectrumsets.device,
spectrumdata.wavelength
A SELECT COUNT(*) on my machine takes 385.516 seconds and returns 82923705 records, so it's a rather large dataset.
I already found this link but still don't fully understand what's wrong.
UPDATE:
EXPLAIN gives this results:
"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
"1","SIMPLE","spectrumsets","index","PRIMARY,setid_idx","timestamp_device_idx","12",NULL,"327177","Using index; Using temporary; Using filesort"
"1","SIMPLE","spectrumdata","ref","spectrumset_idx","spectrumset_idx","4","primprod.spectrumsets.setid","130","Using where"
Explain suggests that the query is hitting the indices for the join (which is good), but then using a temporary table and file sort for the rest of the query.
This is for two reasons:
the where clause isn't hitting the index
the order by clause isn't hitting the index
In a comment, you say that removing the where clause has led to a big improvement; that suggests you need the compound index on spectrumset, wavelength, assuming wavelength has a decent number of possible values (if it's just 10 values, an index may not do anything).
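A sketch of that compound index (the name is illustrative):
-- Join column first, then the WHERE column, so the wavelength filter
-- can be applied inside the index.
CREATE INDEX spectrumset_wavelength_idx ON spectrumdata (spectrumset, wavelength);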
If you leave the "order by" clause out of your view, it should go a lot faster - and there's a good case for letting sort order be determined by the query extracting data, not the view. I'm guessing most queries will be very selective about the data - limiting to a few timestamps; by embedding the order by in the view, you pay the price for sorting every time.
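For illustration, a sketch of that split (the view name is hypothetical): keep the join and the WHERE in the view, and let each consuming query sort, and ideally filter, for itself:
CREATE VIEW spectrum_joined AS
SELECT s.timestampd, s.timestampt, s.device,
       d.wavelength, d.intensity
FROM spectrumdata d
INNER JOIN spectrumsets s ON d.spectrumset = s.setid
WHERE d.wavelength > 0;

-- A consumer sorts (and filters) only what it needs; the date literal
-- here is made up for the example.
SELECT * FROM spectrum_joined
WHERE timestampd = 20140101
ORDER BY timestampd, timestampt, device, wavelength;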
If you really must have the "order by" in the view, create an index that includes all fields in the order of the "order by", with the join at the front. For instance:
UNIQUE INDEX timestamp_device_idx (setid, timestampd, timestampt, device),

Optimizing a MySQL database on a large table with frequent writes and fulltext columns?

My team runs a shopping information site, and as we've started to grow, we're starting to experience issues with query response time on our product table impacting display speed.
The main issue we experience is when a user "saves" an item, triggering an update query while other users are searching on the existing FULLTEXT indexes:
UPDATE product SET listed = listed+1 WHERE product_id = XX
For example, I just ran the update in 0.01 seconds with no other queries hitting the table, but a few minutes ago, with a large FULLTEXT request also running, the same update took 23 seconds.
I assume this is because the table is MyISAM and can't do row-level locks.
Our product table contains over 3.5 million records and will double by the end of the month. After that it should level off to 2-5% monthly increases.
CREATE TABLE product (
product_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
category_id INT UNSIGNED NOT NULL DEFAULT '0',
title VARCHAR (100) NOT NULL DEFAULT '',
short_desc VARCHAR (255) NOT NULL DEFAULT '',
description TEXT NOT NULL,
msrp DECIMAL (6,2) NOT NULL DEFAULT '000.00',
rating DECIMAL(3,2) NOT NULL DEFAULT '0.0',
reviews INT UNSIGNED NOT NULL DEFAULT '0',
listed INT UNSIGNED NOT NULL DEFAULT '0',
sku VARCHAR(75) NOT NULL DEFAULT '0',
upc VARCHAR(20) NOT NULL DEFAULT '0',
updateddate DATETIME NOT NULL,
PRIMARY KEY (product_id),
KEY title (title),
KEY category_id (category_id),
KEY listed (listed),
KEY mfrg_id (mfrg_id),
KEY identifier (identifier),
FULLTEXT INDEX (title),
FULLTEXT INDEX (description)
) ENGINE = MYISAM;
The database runs on a dedicated server that only hosts our site. We are planning to move the database into a replication structure with a Dual Proc, 16gb RAM server for the query box [slave] and the current "web" server handling the writes [dual proc, 4gb ram].
I'm not a DB expert [clearly] and from researching have become wary of running InnoDB at the same time as MyISAM [replication & backup implications?], but it does seem like splitting the product table to house the main information [InnoDB] and the fulltext descriptions [MyISAM] separately may help dramatically?
If you have an idea and need more info please comment and I will provide back more details.
Thank you
You're right about MyISAM. It uses table locks instead of row-level locks like InnoDB.
So from what you say, the full-text query is taking a long time, and the update has to wait for it to finish.
You should really consider switching to InnoDB (the easy solution) or moving your full-text search somewhere else, like Solr, Elasticsearch, or Sphinx.
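A sketch of the easy option (note that InnoDB only gained FULLTEXT support in MySQL 5.6; on older versions the FULLTEXT indexes would have to be dropped, or the search moved out, first):
-- Rebuilds the 3.5M-row table in InnoDB; expect a long rebuild and
-- plan for downtime or an online-schema-change tool.
ALTER TABLE product ENGINE=InnoDB;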
Also, you should check your slow query log and optimize all of those queries.
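A sketch of turning the slow query log on at runtime (the threshold is illustrative; these settings revert at restart unless also set in my.cnf):
-- Log statements slower than 1 second to the slow query log.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;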