Simplifying the database/table structure, I have a situation with two tables where we store 'items' and item properties (the relation between the two is 1-N).
I'm trying to optimize the following query, which fetches the latest items in the hotdeals section. To do that we have the item_property table, which stores each item's sections along with a lot of other item metadata.
NOTE: the table structure can't be changed to optimize the query, i.e. we can't simply add the section as a column in the item table, since an item can belong to an unlimited number of sections.
Here's the structure of both tables:
CREATE TABLE `item` (
`iditem` int(11) unsigned NOT NULL AUTO_INCREMENT,
`itemname` varchar(200) NOT NULL,
`desc` text NOT NULL,
`ok` int(11) NOT NULL DEFAULT '10',
`date_created` datetime NOT NULL,
PRIMARY KEY (`iditem`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `item_property` (
`iditem` int(11) unsigned NOT NULL,
`proptype` varchar(64) NOT NULL DEFAULT '',
`propvalue` varchar(200) NOT NULL DEFAULT '',
KEY `iditem` (`iditem`,`proptype`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
And here's the query:
SELECT *
FROM item
JOIN item_property ON item.iditem=item_property.iditem
WHERE
item.ok > 70
AND item_property.proptype='section'
AND item_property.propvalue = 'hotdeals'
ORDER BY item.date_created desc
LIMIT 20
Which would be the best indexes to optimize this query?
Right now the optimizer (per EXPLAIN) uses a temporary table and a filesort, processing a ton of rows (the size of the join).
Both tables are MyISAM at the moment, but they can be changed to InnoDB if that's really necessary to optimize the queries.
Thanks
What are the types of the item_property.proptype and item_property.propvalue columns?
If they contain a limited number of options, make them ENUM (if they are not already). ENUM values are stored internally as compact integers, which makes them cheap to compare and index.
And (of course) you should have the item_property.iditem and item.date_created columns indexed as well. This will increase the size of the tables, but will considerably speed up the queries that join and sort on these fields.
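As a sketch of that advice: iditem lookups are already covered by the existing KEY `iditem` (`iditem`,`proptype`) on item_property, so the missing piece is an index on item.date_created. A second, extended item_property index is also shown, purely as an assumption about what could help the section filter further:
-- Sketch only. The date_created index follows the advice above; the second
-- index is an extra assumption that lets the section lookup be resolved
-- entirely from the index.
ALTER TABLE item ADD INDEX idx_date_created (date_created);
ALTER TABLE item_property ADD INDEX idx_section (proptype, propvalue, iditem);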
A note about data correctness:
One of the big benefits of a NOT NULL is to prevent your program from creating a row that doesn't have all columns properly specified. Having a DEFAULT renders that useless.
Is it ever OK to have a blank proptype or propvalue? What does a blank in those fields mean? If it's OK not to have a proptype set, then remove the NOT NULL constraint. If you must always have a proptype set, then having DEFAULT '' will not save you from the case of inserting a row but forgetting to set proptype.
In most cases, you want either NOT NULL or DEFAULT 'something' on your columns, but not both.
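A tiny illustration of that point (assuming strict SQL mode, and proptype declared NOT NULL with no DEFAULT):
-- With NOT NULL and no DEFAULT, forgetting the column fails loudly
-- instead of silently storing an empty string:
INSERT INTO item_property (iditem, propvalue) VALUES (1, 'hotdeals');
-- ERROR 1364 (HY000): Field 'proptype' doesn't have a default value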
I am new to MySQL and databases overall. Is it possible to create a table where a column is the sum of two other columns from two other tables?
For instance, suppose I have this `books` table:
CREATE TABLE `books` (
`book_id` int(100) NOT NULL,
`book_name` varchar(20) NOT NULL,
`book_author` varchar(20) NOT NULL,
`book_co-authors` varchar(20) NOT NULL,
`book_edition` tinyint(4) NOT NULL,
`book_creation` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`book_amount` int(100) NOT NULL COMMENT 'Amount of book copies in both University libraries'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
How can I make the book_amount column be the sum of the two book_amount columns from the library1 and library2 tables where book_id = book_id?
Here is library1:
CREATE TABLE `library1` (
`book_id` int(11) NOT NULL,
`book_amount` int(11) NOT NULL,
`available_amount` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
You can define a column with whatever type you want, so long as it's valid, and then populate it as you wish with data from other tables. This is generally called "denormalizing": under ideal circumstances you'd keep that data in the source tables and compute it on demand, so there's never a chance of your saved value and the source data falling out of sync.
You can also define a VIEW, which is like a saved query that behaves as if it were a table. This can do all sorts of things, like dynamically query other tables, and present the result as a column. A "materialized view" is something some databases support, where the view is automatically updated and saved based on implicit triggers. A non-materialized view is slower, but in your case the speed difference might not be a big deal.
So you have options in how you represent this.
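For illustration, a non-materialized view along those lines might look like this (names taken from the question; library2 is assumed to have the same shape as library1):
-- Sketch of the view approach: the total is computed on demand,
-- so it can never drift out of sync with library1/library2.
CREATE VIEW book_totals AS
SELECT b.book_id,
       l1.book_amount + l2.book_amount AS book_amount
FROM books b
JOIN library1 l1 ON l1.book_id = b.book_id
JOIN library2 l2 ON l2.book_id = b.book_id;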
One thing to note is that you should use plain INT as your default "integer" type, not wonky things like INT(100). The number on an integer column is only a display-width hint, not a storage size; an INT can hold at most about 10 digits (roughly ±2.1 billion), so INT(100) is wildly out of line.
Not directly; however, there are a few ways to achieve what you're after.
Either create a pseudo-column in your SELECT clause which adds the other two columns:
select *, columna+columnb AS `addition` from books
Don't forget to swap out columna and columnb for the names of your columns, and addition for the name you'd like the pseudo-column to have.
Alternatively, you could use a view to add the pseudo-field automatically in the same way. However, views do not have their own indexes, so lookups against them and joins to them can get slow very easily.
You could also use triggers to set the values upon insert and update, or simply calculate the value within the language that inserts into the DB.
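As a rough sketch of the trigger option (this assumes a library2 table shaped like library1; matching INSERT/DELETE triggers, and triggers on library2, would be needed as well):
-- Keep books.book_amount in sync when library1 changes.
DELIMITER //
CREATE TRIGGER library1_amount_sync
AFTER UPDATE ON library1
FOR EACH ROW
BEGIN
    UPDATE books
    SET book_amount = NEW.book_amount
        + COALESCE((SELECT l2.book_amount FROM library2 l2
                    WHERE l2.book_id = NEW.book_id), 0)
    WHERE book_id = NEW.book_id;
END//
DELIMITER ;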
The following query will work if the library1 and library2 tables have a schema similar to the books table:
-- No GROUP BY needed, assuming book_id is unique within each library table.
INSERT INTO books
SELECT l1.book_id,
       l1.book_name,
       l1.book_authors,
       l1.book_co_authors,
       l1.book_edition,
       l1.book_creation,
       (l1.book_amount + l2.book_amount)
FROM library1 l1
INNER JOIN library2 l2
        ON l1.book_id = l2.book_id;
I have a table for storing stats. Currently it is populated with about 10 million rows a day; at the end of the day these are copied to a daily stats table and deleted. For this reason I can't have an auto-incrementing primary key.
This is the table structure:
CREATE TABLE `stats` (
`shop_id` int(11) NOT NULL,
`title` varchar(255) CHARACTER SET latin1 NOT NULL,
`created` datetime NOT NULL,
`mobile` tinyint(1) NOT NULL DEFAULT '0',
`click` tinyint(1) NOT NULL DEFAULT '0',
`conversion` tinyint(1) NOT NULL DEFAULT '0',
`ip` varchar(20) CHARACTER SET latin1 NOT NULL,
KEY `shop_id` (`shop_id`,`created`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I have a key on shop_id, created, ip, but I'm not sure which columns I should use to create the optimal index and increase lookup speed further.
The query below takes about 12 seconds with no key and about 1.5 seconds using the index above:
SELECT DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane')) AS `date`, COUNT(*) AS `views`
FROM `stats`
WHERE `created` <= '2017-07-18 09:59:59'
AND `shop_id` = '17515021'
AND `click` != 1
AND `conversion` != 1
GROUP BY DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane'))
ORDER BY DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane'));
If there is no column (or combination of columns) that is guaranteed unique, then do have an AUTO_INCREMENT id. Don't worry about truncating/deleting. (However, if the id does not reset, you probably need to use BIGINT, not INT UNSIGNED to avoid overflow.)
Don't use id alone as the primary key; instead, use PRIMARY KEY(shop_id, created, id), INDEX(id).
That unconventional PK will help with performance in 2 ways, while being unique (due to the addition of id). The INDEX(id) is to keep AUTO_INCREMENT happy. (Whether you DELETE hourly or daily is a separate issue.)
Build a Summary table based on each hour (or minute). It will contain the count for such -- 400K/hour or 7K/minute. Augment it each hour (or minute) so that you don't have to do all the work at the end of the day.
The summary table can also filter on click and/or conversion. Or it could keep both, if you need them.
If click/conversion have only two states (0 & 1), don't say != 1, say = 0; the optimizer is much better at = than at !=.
If they are 2-state and you change to =, then this becomes viable and much better: INDEX(shop_id, click, conversion, created) -- created must be last.
Don't bother with TZ when summarizing into the Summary table; apply the conversion later.
Better yet, don't use DATETIME, use TIMESTAMP so that you won't need to convert (assuming you have TZ set correctly).
After all that, if you still have issues, start over on the Question; there may be further tweaks.
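Putting those points together, a rough sketch (assuming click and conversion really only take the values 0 and 1):
-- Sketch of the suggestions above: surrogate id, unconventional PK,
-- and a composite index with created last.
ALTER TABLE stats
    ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (shop_id, created, id),
    ADD INDEX (id),
    ADD INDEX (shop_id, click, conversion, created);

-- The query rewritten with = 0 so the composite index can be used fully.
SELECT DATE(CONVERT_TZ(created, 'UTC', 'Australia/Brisbane')) AS `date`,
       COUNT(*) AS `views`
FROM stats
WHERE shop_id = 17515021
  AND click = 0
  AND conversion = 0
  AND created <= '2017-07-18 09:59:59'
GROUP BY DATE(CONVERT_TZ(created, 'UTC', 'Australia/Brisbane'))
ORDER BY DATE(CONVERT_TZ(created, 'UTC', 'Australia/Brisbane'));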
In your WHERE clause, put first the column that returns the smallest set of results, and so on, and create the index in the same order.
You have
WHERE created <= '2017-07-18 09:59:59'
AND shop_id = '17515021'
AND click != 1
AND conversion != 1
If created returns the smallest set compared to the other three columns then you are good; otherwise move the most selective column to the first position in your WHERE clause, then choose the second column by the same reasoning, and create the index to match your WHERE clause.
If you think the order is fine, then create an index:
KEY created_shopid_click_conversion (created, shop_id, click, conversion);
I have a table which is going to dynamically change. It might have thousands of columns in it.
I need a way to find the distinct values in each column whenever I run a SQL Query.
There is no possible justification for a table with this many columns. Your schema design is wrong.
With good hardware and appropriate indexes, MySQL can handle millions, or even hundreds of millions, of rows, but columns are another story. Even if there were some design justification for thousands of columns, there are physical limits within MySQL that would prohibit it, roughly as follows (my figures might not be up to date with the latest versions of MySQL; I'll try to verify later):
4,096 columns: theoretical or 'hard' limit for the number of columns.
2,829 columns: actual physical limit for a MyISAM table.
1,000 columns: hard limit for InnoDB.
Assume you have a table as follows:
CREATE TABLE `table_no1` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `column0001` INT(11) NOT NULL DEFAULT '0',
  `column0002` INT(11) NOT NULL DEFAULT '0',
  `column0003` INT(11) NOT NULL DEFAULT '0',
  `column0004` INT(11) NOT NULL DEFAULT '0',
  ...
  `column2000` INT(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
);
I also assume that you have a good program to analyze this data, and that it is for that program you are building these queries.
You can use the following queries to get the distinct values:
select DISTINCT(column0001) from table_no1;
select DISTINCT(column0002) from table_no1;
But that's not efficient, so you need to build these queries dynamically from your program. To do that you first have to find the column names, which you can get with the following query:
select COLUMN_NAME from information_schema.`COLUMNS`
where TABLE_SCHEMA='your_database_name' and TABLE_NAME='table_no1'
The above query returns the column names of that table. From those you can build your DISTINCT queries dynamically and retrieve each column's distinct values.
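If you prefer, the statements themselves can also be generated inside MySQL; a small sketch (the database/table names are the question's placeholders):
-- Generate one "SELECT DISTINCT ..." statement per column; the calling
-- program can then execute each generated statement.
SELECT CONCAT('SELECT DISTINCT `', COLUMN_NAME, '` FROM `table_no1`;') AS stmt
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_database_name'
  AND TABLE_NAME = 'table_no1';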
I have a table with 8 columns, as shown below in the create statement.
Rows have to be unique, that is, no two rows can have exactly the same value in every column. To this end I made all of the columns together the primary key.
However, performing the select shown below takes extremely long since, I suppose, MySQL has to scan every row to find results. As the table is pretty large, this takes a lot of time.
Do you have any suggestion how I could increase performance?
EDIT create statement:
CREATE TABLE `volatilities` (
`instrument` varchar(45) NOT NULL DEFAULT '',
`baseCurrencyId` int(11) NOT NULL,
`tenor` varchar(45) NOT NULL DEFAULT '',
`tenorUnderlying` varchar(45) NOT NULL DEFAULT '',
`strike` double NOT NULL DEFAULT '0',
`evalDate` date NOT NULL DEFAULT '0000-00-00',
`volatility` double NOT NULL DEFAULT '0',
`underlying` varchar(45) NOT NULL DEFAULT '',
PRIMARY KEY (`instrument`,`baseCurrencyId`,`tenor`,`tenorUnderlying`,`strike`,`evalDate`,`volatility`,`underlying`)) ENGINE=InnoDB DEFAULT CHARSET=utf8
Select statement:
SELECT evalDate,
max(case when strike = 0.25 then volatility end) as '0.25'
FROM niels_testdb.volatilities
WHERE
instrument = 'Swaption' and tenor = '3M'
and tenorUnderlying = '3M' and strike = 0.25
GROUP BY
evalDate
One of your requirements is that all rows have to be unique, which is why you created the table with a composite primary key over all columns. But note that your table WOULD still allow duplicated values in individual columns, as long as the rows as a whole are unique.
Take a look at this SQL Fiddle: http://sqlfiddle.com/#!2/85ae6
In there you'll see that the instrument and tenor columns do have duplicate values.
I'd say you need to investigate more how unique keys work and what primary keys are.
My suggestion is to re-think your requirements and investigate what needs to be unique and why, and then use a different structure to support your decision. A composite primary key over every column, in this case, is not the way to go.
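Purely as a sketch of what "a different structure" could look like (not a recommendation to keep the all-column uniqueness requirement): a surrogate primary key plus a secondary index shaped like the query's WHERE clause.
-- Sketch only: surrogate key instead of the all-column primary key,
-- plus an index matching the filter columns of the query above.
ALTER TABLE volatilities
    DROP PRIMARY KEY,
    ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (id),
    ADD INDEX idx_lookup (instrument, tenor, tenorUnderlying, strike, evalDate);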
I am trying to run a grouping query on a large table (more than 8 million rows). However, I can reduce the need to group all the data by filtering on date. I have a view that captures the dates I require, and this limits the query, but it's not much better.
Finally I need to join to another table to pick up a field.
I am showing the query, the CREATE statement for the main table, and the query's EXPLAIN output below.
Main Query:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
INNER JOIN summary_max
ON pgi_raw_data.wsp_channel = summary_max.wsp_channel
AND pgi_raw_data.dated > summary_max.race_date
INNER JOIN pgi_accounts
ON pgi_raw_data.account = pgi_accounts.account
GROUP BY pgi_raw_data.event_id
ORDER BY NULL
The create table:
CREATE TABLE `pgi_raw_data` (
`event_id` char(25) NOT NULL DEFAULT '',
`wsp_channel` varchar(5) NOT NULL,
`dated` date NOT NULL,
`time` time DEFAULT NULL,
`program` varchar(5) NOT NULL,
`track` varchar(25) NOT NULL,
`raceno` tinyint(2) NOT NULL,
`detail` varchar(30) DEFAULT NULL,
`ticket` varchar(20) NOT NULL DEFAULT '',
`breed` varchar(12) NOT NULL,
`pool` varchar(10) NOT NULL,
`gross` decimal(11,2) NOT NULL,
`refunds` decimal(11,2) NOT NULL,
`handle` decimal(11,2) NOT NULL,
`payout` decimal(11,4) NOT NULL,
`rebate` decimal(11,4) NOT NULL,
`profit` decimal(11,4) NOT NULL,
`account` mediumint(10) NOT NULL,
PRIMARY KEY (`event_id`,`ticket`),
KEY `idx_account` (`account`),
KEY `idx_wspchannel` (`wsp_channel`,`dated`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This is my view for summary_max:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `summary_max` AS
select `pgi_summary_tbl`.`wsp_channel` AS `wsp_channel`,
       max(`pgi_summary_tbl`.`race_date`) AS `race_date`
from `pgi_summary_tbl`
group by `pgi_summary_tbl`.`wsp_channel`;
And the EXPLAIN output for the query:
id  select_type  table            type  possible_keys               key             key_len  ref                                    rows    Extra
1   PRIMARY      <derived2>       ALL                                                                                                6       Using temporary
1   PRIMARY      pgi_raw_data     ref   idx_account,idx_wspchannel  idx_wspchannel  7        summary_max.wsp_channel                470690  Using where
1   PRIMARY      pgi_accounts     ref   PRIMARY                     PRIMARY         3        gf3data_momutech.pgi_raw_data.account  29      Using index
2   DERIVED      pgi_summary_tbl  ALL                                                                                                42282   Using temporary; Using filesort
Any help on indexing would be appreciated.
At a minimum you need indexes on these fields:
pgi_raw_data.wsp_channel,
pgi_raw_data.dated,
pgi_raw_data.account
pgi_raw_data.event_id,
summary_max.wsp_channel,
summary_max.race_date,
pgi_accounts.account
The general (though not absolute) rule is that anything you are sorting, grouping, filtering, or joining on should have an index.
Also: pgi_summary_tbl.wsp_channel
Also, why the order by null?
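Translated into DDL, most of that list is already covered by the existing keys (idx_wspchannel, idx_account, the pgi_raw_data primary key starting with event_id, and the pgi_accounts primary key). A sketch of the one piece that is missing, noting that summary_max is a view, so the index has to go on its base table:
-- Sketch: support the view's GROUP BY / MAX and the join on wsp_channel.
ALTER TABLE pgi_summary_tbl ADD INDEX idx_channel_date (wsp_channel, race_date);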
The first thing is to be sure that you have indexes on pgi_summary_tbl(wsp_channel, race_date) and pgi_accounts(account). For this query, you don't need indexes on these columns in the raw data.
MySQL has a tendency to use indexes even when they are not the most efficient path. I would start by looking at the performance of the "full" query, without the joins:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
-- pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
GROUP BY pgi_raw_data.event_id
If this has better performance, you may have a situation where the indexes are working against you. The specific problem is called "thrashing". It occurs when a table is too big to fit into memory. Often, the fastest way to deal with such a table is to just read the whole thing. Accessing the table through an index can result in an extra I/O operation for most of the rows.
If this works, then do the joins after the aggregate. Also, consider getting more memory, so the whole table will fit into memory.
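A sketch of the "aggregate first, join after" shape applied to this query (note: this only produces the same totals if the summary_max and pgi_accounts joins don't actually filter out any raw rows):
-- Aggregate the big table once, then join the small results.
SELECT r.wsp_channel,
       'IOM' AS wsp,
       r.dated,
       a.`master`,
       r.event_id,
       r.breed,
       r.handle, r.payout, r.rebate, r.profit
FROM (
    SELECT wsp_channel, dated, account, event_id, breed,
           SUM(handle) AS handle, SUM(payout) AS payout,
           SUM(rebate) AS rebate, SUM(profit) AS profit
    FROM pgi_raw_data
    GROUP BY event_id
) AS r
INNER JOIN summary_max s
        ON r.wsp_channel = s.wsp_channel AND r.dated > s.race_date
INNER JOIN pgi_accounts a
        ON r.account = a.account;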
Second, if you have to deal with this type of data, then partitioning the table by date may prove to be a very useful option. This will allow you to significantly reduce the overhead of reading the large table. You do have to be sure that the summary table can be read the same way.
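A minimal partitioning sketch (the year boundaries are placeholders; note that MySQL requires the partitioning column to be part of every unique key, so the primary key would have to be extended to include dated first):
-- Extend the PK so the partitioning column is included, then partition by date.
ALTER TABLE pgi_raw_data
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (event_id, ticket, dated);

ALTER TABLE pgi_raw_data
    PARTITION BY RANGE (TO_DAYS(dated)) (
        PARTITION p2013  VALUES LESS THAN (TO_DAYS('2014-01-01')),
        PARTITION p2014  VALUES LESS THAN (TO_DAYS('2015-01-01')),
        PARTITION pmax   VALUES LESS THAN MAXVALUE
    );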