I am currently building an aggregate MySQL table which is built from 4 different tables. The largest table (accel) has 7.8 million rows and the other 3 have fewer than 5 million rows each. The tables that I am using contain duplicate rows, e.g.:
creatorId, capabilities, frequency_MHz, rssi, dutyCycleLevel
'X^6%g9#tg!Q:]0uqkwcOc)==', '[WEP]', '2412', '-72', '-3733'
'X^6%g9#tg!Q:]0uqkwcOc)==', '[WEP]', '2437', '-54', '-3733'
All 4 tables contain a creatorId and a dutyCycleLevel. Depending on the values stored in other columns, I perform some operations on the values and then copy the result into my new table. Everything has to be grouped by the creatorId and the dutyCycleLevel of one main table ('accel'), so that we end up with a single row per (creatorId, dutyCycleLevel) pair regardless of how many duplicates exist (i.e. if 'abc' appears in 10 different rows, 'abc' will only appear once in the final table). The problem I am encountering is the huge processing time to build the table: I left my machine running overnight and it still has not completed (it has been running for 24 hours now). Here is my query:
DROP TABLE `BoxCounting_aggregate`;
CREATE TABLE `SHED5`.`BoxCounting_aggregate` (
`creatorId` VARCHAR(55) DEFAULT NULL COMMENT '',
`timestamp` DATETIME DEFAULT NULL COMMENT '',
`latitude` DOUBLE NULL DEFAULT NULL COMMENT '',
`longitude` DOUBLE NULL DEFAULT NULL COMMENT '',
`norm_accel` FLOAT NULL DEFAULT NULL COMMENT '',
`std_dev_accel` FLOAT NULL DEFAULT NULL COMMENT '',
`batteryStatus` FLOAT NULL DEFAULT NULL COMMENT '',
`wifi_seen` INT(11) NULL DEFAULT NULL COMMENT '',
`dutyCycle` INT(11) NULL DEFAULT NULL COMMENT ''
);
INSERT INTO BoxCounting_aggregate
(
creatorId, timestamp, latitude, longitude, norm_accel, std_dev_accel, batteryStatus, wifi_seen, dutyCycle
)
(
SELECT
location.creatorId,
location.timestamp,
AVG(location.latitude),
AVG(location.longitude),
sqrt(pow(AVG(accel.accel_x),2)+pow(AVG(accel.accel_y),2)+pow(AVG(accel.accel_z),2)),
STD(accel.accel_z),
case battery.pluggedInDescription
when 'Not Plugged' then 0
when 'Plugged USB' then 0.5
when 'Plugged AC' then 1
else null
end,
COUNT(wifi.dutyCycleLevel),
location.dutyCycleLevel
FROM SHED5.location, SHED5.accel, SHED5.wifi, SHED5.battery
GROUP BY location.creatorId, location.dutyCycleLevel
);
I am grouping by creatorId and dutyCycleLevel since those two columns are the most important to keep track of. I am using AVG on latitude and longitude since I want the averaged location of all the records stored in the table. As I said, creatorId and dutyCycleLevel appear multiple times. I do not think there is anything syntactically wrong with my query, but it is definitely an inefficient way of doing what I am trying to do. All 4 tables have indexes but no primary keys, since creatorId contains duplicates and therefore cannot be used as the primary key. Any suggestions for improving the processing time, or anything I should change syntactically in the query?
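For reference, the FROM clause above lists the four tables with no join conditions, so MySQL builds the Cartesian product of all four tables before grouping, which is why the query never finishes. A minimal sketch of the same INSERT with explicit joins, assuming (creatorId, dutyCycleLevel) is the pair of columns the four tables share, as the question describes:
INSERT INTO BoxCounting_aggregate
(creatorId, timestamp, latitude, longitude, norm_accel, std_dev_accel, batteryStatus, wifi_seen, dutyCycle)
SELECT
    a.creatorId,
    MAX(l.timestamp),                -- one representative timestamp per group
    AVG(l.latitude),
    AVG(l.longitude),
    sqrt(pow(AVG(a.accel_x),2)+pow(AVG(a.accel_y),2)+pow(AVG(a.accel_z),2)),
    STD(a.accel_z),
    AVG(case b.pluggedInDescription  -- averaged so the result is deterministic per group
        when 'Not Plugged' then 0
        when 'Plugged USB' then 0.5
        when 'Plugged AC' then 1
        else null
        end),
    COUNT(w.dutyCycleLevel),
    a.dutyCycleLevel
FROM SHED5.accel a
JOIN SHED5.location l ON l.creatorId = a.creatorId AND l.dutyCycleLevel = a.dutyCycleLevel
JOIN SHED5.battery b ON b.creatorId = a.creatorId AND b.dutyCycleLevel = a.dutyCycleLevel
JOIN SHED5.wifi w ON w.creatorId = a.creatorId AND w.dutyCycleLevel = a.dutyCycleLevel
GROUP BY a.creatorId, a.dutyCycleLevel;
A composite index on (creatorId, dutyCycleLevel) on each of the four tables would also be needed for these joins to run in reasonable time.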
Related
I have a table for storing stats. Currently it is populated with about 10 million rows by the end of the day, then copied to a daily stats table and deleted. For this reason I can't have an auto-incrementing primary key.
This is the table structure:
CREATE TABLE `stats` (
`shop_id` int(11) NOT NULL,
`title` varchar(255) CHARACTER SET latin1 NOT NULL,
`created` datetime NOT NULL,
`mobile` tinyint(1) NOT NULL DEFAULT '0',
`click` tinyint(1) NOT NULL DEFAULT '0',
`conversion` tinyint(1) NOT NULL DEFAULT '0',
`ip` varchar(20) CHARACTER SET latin1 NOT NULL,
KEY `shop_id` (`shop_id`,`created`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I have a key on (shop_id, created, ip), but I'm not sure which columns I should use to create the optimal index to increase lookup speed further.
The query below takes about 12 seconds with no key and about 1.5 seconds using the index above:
SELECT DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane')) AS `date`, COUNT(*) AS `views`
FROM `stats`
WHERE `created` <= '2017-07-18 09:59:59'
AND `shop_id` = '17515021'
AND `click` != 1
AND `conversion` != 1
GROUP BY DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane'))
ORDER BY DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane'));
If there is no column (or combination of columns) that is guaranteed unique, then do have an AUTO_INCREMENT id. Don't worry about truncating/deleting. (However, if the id does not reset, you probably need to use BIGINT, not INT UNSIGNED to avoid overflow.)
Don't use id as the primary key; instead, use PRIMARY KEY(shop_id, created, id), INDEX(id).
That unconventional PK will help performance in two ways while still being unique (thanks to the addition of id). The INDEX(id) is there to keep AUTO_INCREMENT happy. (Whether you DELETE hourly or daily is a separate issue.)
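A sketch of that change against the table above, using BIGINT per the overflow note (all in one ALTER, since an AUTO_INCREMENT column must be part of a key):
ALTER TABLE stats
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (shop_id, created, id),
  ADD INDEX (id);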
Build a Summary table keyed on each hour (or minute). It will hold the counts for the roughly 400K rows/hour (or 7K/minute) coming in. Augment it each hour (or minute) so that you don't have to do all the work at the end of the day.
The summary table can also filter on click and/or conversion. Or it could keep both, if you need them.
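A minimal sketch of such a Summary table and an hourly roll-up; the table name and the literal hour bounds (which a scheduler would supply) are illustrative:
CREATE TABLE stats_hourly (
  shop_id int(11) NOT NULL,
  hr datetime NOT NULL,          -- start of the hour, in UTC
  views int unsigned NOT NULL,
  PRIMARY KEY (shop_id, hr)
) ENGINE=InnoDB;
-- Run once per hour, just after the hour ends:
INSERT INTO stats_hourly (shop_id, hr, views)
SELECT shop_id, '2017-07-18 09:00:00', COUNT(*)
FROM stats
WHERE click = 0 AND conversion = 0
  AND created >= '2017-07-18 09:00:00'
  AND created <  '2017-07-18 10:00:00'
GROUP BY shop_id
ON DUPLICATE KEY UPDATE views = views + VALUES(views);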
If click/conversion have only two states (0 & 1), don't say != 1, say = 0; the optimizer is much better at = than at !=.
If they are 2-state and you change to =, then this index becomes viable and much better: INDEX(shop_id, click, conversion, created) -- created must be last.
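Concretely, that would look something like this (a sketch; the index name is arbitrary):
ALTER TABLE stats
  ADD INDEX shop_flags_created (shop_id, click, conversion, created);
SELECT DATE(CONVERT_TZ(created, 'UTC', 'Australia/Brisbane')) AS `date`, COUNT(*) AS `views`
FROM stats
WHERE shop_id = '17515021'
  AND click = 0
  AND conversion = 0
  AND created <= '2017-07-18 09:59:59'
GROUP BY `date`
ORDER BY `date`;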
Don't bother with TZ when summarizing into the Summary table; apply the conversion later.
Better yet, don't use DATETIME, use TIMESTAMP so that you won't need to convert (assuming you have TZ set correctly).
After all that, if you still have issues, start over on the Question; there may be further tweaks.
In your WHERE clause, put first the column that narrows the result set the most, then the next most selective, and so on, and create the index with its columns in that same order.
You have
WHERE created <= '2017-07-18 09:59:59'
AND shop_id = '17515021'
AND click != 1
AND conversion != 1
If created narrows the result set more than the other 3 columns do, then this order is good; otherwise move the most selective column to the first position, choose the second column by the same reasoning, and create the index to match your WHERE clause.
If you think the order is fine, then create the index
KEY created_shopid_click_conversion (created, shop_id, click, conversion);
I have a table with 8 columns, as shown below in the create statement.
Rows have to be unique; that is, no two rows can have the exact same value in every column. To this end I defined a composite primary key over all the columns.
However, performing the select shown below takes extremely long since, I suppose, MySQL has to scan every row to find results. As the table is pretty large, this takes a lot of time.
Do you have any suggestion how I could increase performance?
EDIT: create statement:
CREATE TABLE `volatilities` (
`instrument` varchar(45) NOT NULL DEFAULT '',
`baseCurrencyId` int(11) NOT NULL,
`tenor` varchar(45) NOT NULL DEFAULT '',
`tenorUnderlying` varchar(45) NOT NULL DEFAULT '',
`strike` double NOT NULL DEFAULT '0',
`evalDate` date NOT NULL DEFAULT '0000-00-00',
`volatility` double NOT NULL DEFAULT '0',
`underlying` varchar(45) NOT NULL DEFAULT '',
PRIMARY KEY (`instrument`,`baseCurrencyId`,`tenor`,`tenorUnderlying`,`strike`,`evalDate`,`volatility`,`underlying`)) ENGINE=InnoDB DEFAULT CHARSET=utf8
Select statement:
SELECT evalDate,
max(case when strike = 0.25 then volatility end) as '0.25'
FROM niels_testdb.volatilities
WHERE
instrument = 'Swaption' and tenor = '3M'
and tenorUnderlying = '3M' and strike = 0.25
GROUP BY
evalDate
One of your requirements is that all the rows need to be unique, and that is why you created the table with a composite primary key over all columns. But your table WOULD still allow duplicate values in any individual column, as long as the rows as a whole are unique.
Take a look at this SQL Fiddle post: http://sqlfiddle.com/#!2/85ae6
In there you'll see that the columns instrument and tenor do have duplicate values.
I'd say you need to investigate more how unique keys work and what primary keys are.
My suggestion is to re-think your requirements, investigate what needs to be unique and why, and choose a different structure to support your decision. A composite primary key over every column, in this case, is not the way to go.
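That said, as a starting point on the performance side, a secondary index matching the query's equality predicates would let MySQL avoid the full scan (a sketch; the index name is arbitrary):
ALTER TABLE volatilities
  ADD INDEX idx_lookup (instrument, tenor, tenorUnderlying, strike, evalDate);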
I have 1 main table and two tables that hold multiple pieces of dynamic information about the first table.
The first table, called 'items', holds the main information. Then there are two tables (ratings and indexes) that hold values for a dynamic set of audiences and time periods.
What I want:
When I query for those items, I want the result to include additional columns with the values from the ratings and indexes tables.
I have code like this:
SELECT items.*, ratings.val AS rating, indexes.val AS idx
FROM items,ratings,indexes
WHERE items.date>=1349902800000 AND items.date <=1349989199000
AND ratings.period_start <= items.date
AND ratings.period_end > items.date
AND ratings.auditory = 'kids'
AND indexes.period_start <= items.date
AND indexes.period_end > items.date
AND indexes.auditory = 'kids'
ORDER BY indexes.added, ratings.added DESC
The tables look something like this
items:
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(200) DEFAULT NULL,
`date` bigint(40) DEFAULT NULL,
PRIMARY KEY (`id`)
ratings:
`id` bigint(50) NOT NULL AUTO_INCREMENT,
`period_start` bigint(50) DEFAULT NULL,
`period_end` bigint(50) DEFAULT NULL,
`val` float DEFAULT NULL,
`auditory` varchar(200) DEFAULT NULL,
`added` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
All dates, except the 'added' fields (which are plain TIMESTAMPs), are BIGINTs holding milliseconds since the epoch, as returned by AS3's Date.getTime().
So, what is the correct way to get this accomplished?
The only thing I'm not seeing is the unique correlation of any individual item to its ratings. I would think the ratings table would need an ItemID column to link back to items. As it stands now, if you have 100 items within a given time period, say 3 months, and you just add all the ratings/reviews without associating them with the actual item, you are stuck. Put the ItemID in, add it to your WHERE condition, and you should be good to go.
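A sketch of the query with that link in place; the item_id column on ratings and indexes is hypothetical, per the suggestion above:
SELECT items.*, ratings.val AS rating, indexes.val AS idx
FROM items
JOIN ratings ON ratings.item_id = items.id
  AND ratings.period_start <= items.date
  AND ratings.period_end > items.date
  AND ratings.auditory = 'kids'
JOIN indexes ON indexes.item_id = items.id
  AND indexes.period_start <= items.date
  AND indexes.period_end > items.date
  AND indexes.auditory = 'kids'
WHERE items.date >= 1349902800000 AND items.date <= 1349989199000
ORDER BY indexes.added, ratings.added DESC;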
Simplifying the database/table structure, I have a situation with two tables where we store 'items' and item properties (the relation between the two is 1-N).
I'm trying to optimize the following query, which fetches the latest items in the hotdeals section. To support this, the item_property table stores each item's sections along with many other pieces of item metadata.
NOTE: the table structure can't be changed to optimize the query; i.e., we can't simply add the section as a column in the item table, because each item can belong to an unlimited number of sections.
Here's the structure of both tables:
CREATE TABLE `item` (
`iditem` int(11) unsigned NOT NULL AUTO_INCREMENT,
`itemname` varchar(200) NOT NULL,
`desc` text NOT NULL,
`ok` int(11) NOT NULL DEFAULT '10',
`date_created` datetime NOT NULL,
PRIMARY KEY (`iditem`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `item_property` (
`iditem` int(11) unsigned NOT NULL,
`proptype` varchar(64) NOT NULL DEFAULT '',
`propvalue` varchar(200) NOT NULL DEFAULT '',
KEY `iditem` (`iditem`,`proptype`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
And here's the query:
SELECT *
FROM item
JOIN item_property ON item.iditem=item_property.iditem
WHERE
item.ok > 70
AND item_property.proptype='section'
AND item_property.propvalue = 'hotdeals'
ORDER BY item.date_created desc
LIMIT 20
Which would be the best indexes to optimize this query?
Right now EXPLAIN shows Using temporary and Using filesort, and the query processes a ton of rows (the size of the join).
Both tables are MyISAM at the moment, but they can be changed to InnoDB if that's really necessary to optimize the queries.
Thanks
What are the types of the item_property.proptype and item_property.propvalue columns?
If they contain a limited set of values, make them ENUM (if they are not already). ENUM values are stored internally as small integers, so they compare faster and take less space than strings.
And (of course) you should also have the item_property.iditem and item.date_created columns indexed. This will increase the size of the tables, but it will considerably speed up queries that join and sort on these fields.
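For the query in the question, something along these lines is a reasonable starting point (a sketch; index names are arbitrary):
ALTER TABLE item_property
  ADD INDEX prop_lookup (proptype, propvalue, iditem);
ALTER TABLE item
  ADD INDEX idx_date_created (date_created);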
A note about data correctness:
One of the big benefits of a NOT NULL is to prevent your program from creating a row that doesn't have all columns properly specified. Having a DEFAULT renders that useless.
Is it ever OK to have a blank proptype or propvalue? What does a blank in those fields mean? If it's OK not to have a proptype set, then remove the NOT NULL constraint. If you must always have a proptype set, then having DEFAULT '' will not save you from inserting a row while forgetting to set proptype.
In most cases, you want either NOT NULL or DEFAULT 'something' on your columns, but not both.
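In other words, for a column that must always be set explicitly, keep NOT NULL and drop the DEFAULT (a sketch against the item_property table above):
-- With strict SQL mode, an INSERT that omits proptype now fails
-- instead of silently storing ''.
ALTER TABLE item_property
  MODIFY proptype varchar(64) NOT NULL;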
First, here are the two tables I've created (sans irrelevant columns):
CREATE TABLE users_history1 (
circuit tinyint(1) unsigned NOT NULL default '0',
userh_season smallint(4) unsigned NOT NULL default '0',
userh_userid int(11) unsigned NOT NULL default '0',
userh_rank varchar(2) NOT NULL default 'D',
userh_wins int(11) NOT NULL default '0',
userh_losses int(11) NOT NULL default '0',
userh_points int(11) NOT NULL default '1000',
KEY (circuit, userh_userid),
KEY (userh_season)
) ENGINE=MyISAM;
CREATE TABLE users_ladders1 (
circuit tinyint(1) unsigned NOT NULL default '0',
userl_userid int(11) unsigned NOT NULL default '0',
userl_rank char(2) NOT NULL default 'D',
userl_wins smallint(3) NOT NULL default '0',
userl_losses smallint(3) NOT NULL default '0',
userl_points smallint(4) unsigned NOT NULL default '1000',
PRIMARY KEY (circuit, userl_userid),
KEY (userl_userid)
) ENGINE=MyISAM;
Some background: these tables hold data for a competitive ladder where players are compared against each other in standings ordered by points. users_history1 contains records from previous seasons. users_ladders1 contains records from the current season. I'm trying to create a page on my site where players are ranked on the average points of their previous records and current record. Here is the main standings page for a 1v1 ladder:
http://vilegaming.com/league.x/standings1/3
I want to select from the two tables an ordered list of players based on their average points across their users_ladders1 and users_history1 records. I really have no idea how to select from two tables in one query, but I'll try, as generically as possible, to illustrate it:
SELECT userh_userid, SUM(userh_points)
FROM users_history1
GROUP BY userh_userid
ORDER BY SUM(userh_points)
It needs the GROUP BY since some players may have played in multiple previous seasons.
SELECT userl_userid, userl_points
FROM users_ladders1
ORDER BY userl_points
I want to be able to combine both tables in one query so I can get the data as rows ordered by total points, and if possible also divide the total points by the number of records for each player so I can get the average.
You'll want to use a UNION SELECT:
SELECT p.id, COUNT(p.id), SUM(p.points)
FROM (SELECT userh_userid AS id, userh_points AS points
      FROM users_history1
      UNION ALL -- ALL keeps rows with identical (id, points) from being collapsed
      SELECT userl_userid, userl_points
      FROM users_ladders1) AS p
GROUP BY p.id
The subquery is the important part: it gives you a single result set combining the current and history tables. You can then select from it and use COUNT and SUM to compute your averages.
My MySQL syntax is quite rusty, so please excuse it. I haven't had a chance to run this, so I'm not even sure if it executes, but it should be enough to get you started.
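Extending that idea to the ordering and averaging the question asks for might look like this (a sketch building on the query above):
SELECT p.id,
       COUNT(*) AS records,
       SUM(p.points) AS total_points,
       SUM(p.points) / COUNT(*) AS avg_points
FROM (SELECT userh_userid AS id, userh_points AS points
      FROM users_history1
      UNION ALL
      SELECT userl_userid, userl_points
      FROM users_ladders1) AS p
GROUP BY p.id
ORDER BY avg_points DESC;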
If you want to merge two tables, selecting particular columns from one table and all columns from the other:
e.g.
Table names = test1, test2
query:
SELECT test1.column1, test1.column2, test2.* FROM test1, test2
If you want to merge on a particular condition:
query:
SELECT test1.column1, test1.column2, test2.* FROM test1, test2 WHERE test2.column3 = '(whatever condition you want to pass)'
SELECT col1 FROM test1 WHERE id = '1'
UNION
SELECT col1 FROM test2
UNION can also be used to combine the results of two tables.
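Note that FROM test1, test2 with no join condition produces the Cartesian product of the two tables. If the tables share a key (a hypothetical id column here), an explicit join is usually what's wanted:
SELECT test1.column1, test1.column2, test2.*
FROM test1
JOIN test2 ON test2.id = test1.id;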