First, here are the two tables I've created (sans irrelevant columns):
CREATE TABLE users_history1 (
circuit tinyint(1) unsigned NOT NULL default '0',
userh_season smallint(4) unsigned NOT NULL default '0',
userh_userid int(11) unsigned NOT NULL default '0',
userh_rank varchar(2) NOT NULL default 'D',
userh_wins int(11) NOT NULL default '0',
userh_losses int(11) NOT NULL default '0',
userh_points int(11) NOT NULL default '1000',
KEY (circuit, userh_userid),
KEY (userh_season)
) ENGINE=MyISAM;
CREATE TABLE users_ladders1 (
circuit tinyint(1) unsigned NOT NULL default '0',
userl_userid int(11) unsigned NOT NULL default '0',
userl_rank char(2) NOT NULL default 'D',
userl_wins smallint(3) NOT NULL default '0',
userl_losses smallint(3) NOT NULL default '0',
userl_points smallint(4) unsigned NOT NULL default '1000',
PRIMARY KEY (circuit, userl_userid),
KEY (userl_userid)
) ENGINE=MyISAM;
Some background: these tables hold data for a competitive ladder where players are ranked against each other in standings ordered by points. users_history1 contains records stored from previous seasons; users_ladders1 contains records from the current season. I'm trying to create a page on my site where players are ranked by the average points of their previous records and their current record. Here are the main standings for a 1v1 ladder:
http://vilegaming.com/league.x/standings1/3
I want to select from the two tables an ordered list of players, ranked by their average points across their users_ladders1 and users_history1 records. I really have no idea how to select from two tables in one query, but I'll try to illustrate it as generically as possible.
I'm using hyphens instead of underscores throughout the examples since SO renders them weirdly.
SELECT userh-points
FROM users-history1
GROUP BY userh-userid
ORDER BY (total userh-points for the user)
Needs the GROUP BY since some players may have played in multiple previous seasons.
SELECT userl-points
FROM users-ladders1
ORDER BY userl-points
I want to be able to combine both tables in a query so I can get the data in form of rows ordered by total points, and if possible also divide the total points by the number of unique records for the player so I can get the average.
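You'll want to use a UNION ALL inside a subquery (a plain UNION would silently drop duplicate rows, such as two seasons in which a player finished with the same points):
SELECT p.id, COUNT(*) AS records, SUM(p.points) AS total, AVG(p.points) AS average
FROM (SELECT userh_userid AS id, userh_points AS points
FROM users_history1
UNION ALL SELECT userl_userid, userl_points
FROM users_ladders1) AS p
GROUP BY p.id
ORDER BY average DESC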
The sub query is the important part. It will give you a single table with the results of both the current and history tables combined. You can then select from that table and do COUNT and SUM to get your averages.
My MySQL syntax is quite rusty, so please excuse it. I haven't had a chance to run this, so I'm not even sure if it executes, but it should be enough to get you started.
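To see how the choice between UNION and UNION ALL affects the counts and averages, here is a small sketch. It uses Python's built-in sqlite3 module purely so it can run without a MySQL server; the tables are cut down to the relevant columns and the sample rows are made up. The SQL itself is plain enough that it should carry over to MySQL, but treat that as an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_history1 (userh_season INT, userh_userid INT, userh_points INT);
    CREATE TABLE users_ladders1 (userl_userid INT, userl_points INT);
    -- made-up data: player 1 finished two past seasons on exactly 1000 points
    INSERT INTO users_history1 VALUES (1, 1, 1000), (2, 1, 1000), (1, 2, 1200);
    INSERT INTO users_ladders1 VALUES (1, 1300), (2, 1400);
""")

# {union} is filled in with either UNION ALL or UNION below.
QUERY = """
    SELECT p.id, COUNT(*) AS records, SUM(p.points) AS total, AVG(p.points) AS average
    FROM (SELECT userh_userid AS id, userh_points AS points FROM users_history1
          {union}
          SELECT userl_userid, userl_points FROM users_ladders1) AS p
    GROUP BY p.id
    ORDER BY average DESC, p.id
"""

rows_all = conn.execute(QUERY.format(union="UNION ALL")).fetchall()
rows_dedup = conn.execute(QUERY.format(union="UNION")).fetchall()

print(rows_all)    # player 1 keeps all three records
print(rows_dedup)  # player 1's two identical history rows collapse into one
```

With UNION ALL, player 1 averages 1100 over three records; with plain UNION the two identical history rows are merged and the average becomes 1150 over two records, which is probably not what you want for a standings page.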
If you want to combine two tables, selecting particular columns from one table and all columns from the other:
e.g.
Table names = test1, test2
query:
SELECT test1.column1, test1.column2, test2.* FROM test1, test2
Note that without a WHERE clause this returns every combination of rows from the two tables. If you want to restrict the result with a condition on a particular column:
query:
SELECT test1.column1, test1.column2, test2.* FROM test1, test2 WHERE test2.column3 = '(whatever condition you want to pass)'
SELECT col1 FROM test1 WHERE id = '1'
UNION
SELECT col1 FROM test2
A UNION like this can also be used to combine two tables, as long as both SELECTs return the same number of columns.
I have to insert data into a temporary table using a join of two tables. It takes 30 minutes to execute completely because one of my tables, hist_data_app, has approximately 300 million records. I would like to know how I can optimize the query to make it faster.
The first table contains the changelog data with some specific fields, and the other table has all the data related to it. Below are the creation statements for both tables.
CREATE TABLE `hist_data_app` (
`product_id` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`application_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`year_id` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
`history_id` BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
`field_name` VARCHAR(60) NOT NULL COLLATE 'utf8_unicode_ci',
`old_value` TEXT NOT NULL COLLATE 'utf8_unicode_ci',
`new_value` TEXT NOT NULL COLLATE 'utf8_unicode_ci',
`comments` TEXT NOT NULL,
INDEX `ps` (`product_id`, `history_id`)
)
CREATE TABLE `history_log` (
`history_id` BIGINT(20) UNSIGNED NOT NULL DEFAULT '0',
`history_hash` CHAR(32) NOT NULL COLLATE 'utf8_unicode_ci',
`type` ENUM('products','brands','partnames','mc_partnames','applications') NOT NULL,
`user_id` SMALLINT(5) UNSIGNED NOT NULL DEFAULT '0',
`stamp` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
`source` TINYINT(1) UNSIGNED NOT NULL DEFAULT '0',
`source_data` TEXT NOT NULL COLLATE 'utf8_unicode_ci',
`description` TEXT NOT NULL COLLATE 'utf8_unicode_ci',
PRIMARY KEY (`history_id`),
INDEX `Types` (`type`)
)
This is my explain result
EXPLAIN
SELECT DISTINCT a.product_id
, a.history_id
, a.comments
, a.field_name
FROM history_log b
JOIN hist_data_app a
ON a.history_id = b.history_id
GROUP
BY product_id;
id  select_type  table  type    possible_keys  key      key_len  ref                        rows       Extra
1   SIMPLE       a      ALL     NULL           NULL     NULL     NULL                       278327646  Using temporary; Using filesort
1   SIMPLE       b      eq_ref  PRIMARY        PRIMARY  8        LONGBOW_data.a.history_id  1          Using index
history_app_data table
product_id application_id year_id history_id
598865023 12813220945 92 16777304
598865023 12813220945 93 16777304
598865023 12813221222 93 16777304
598865023 12815428123 94 16777304
598865023 12813221833 92 16777304
598865023 12813221833 93 16777304
598865023 12815457549 92 16777304
598865023 12815457549 93 16777304
598865023 12815457549 94 16777304
Based on your response to my comment, you should be good with what you have set up now. Since your primary consideration is a single product, the index on your hist_data_app table has product_id in the first position, which is what you want, and having history_id in it as well will help with the join to the log table. Just add your WHERE clause and test for some products.
SELECT
hd.product_id,
hd.history_id,
hd.comments,
hd.field_name
FROM
hist_data_app hd
where
-- or whatever single product ID you want
hd.product_id = 12345
Now, I don't know if you really need distinct, but can add that back in no problem. Since you are not pulling any columns from the log table, you don't even need to join to that table either. Since you have no aggregations (such as sum, count, avg, etc), you don't need a group by. I think you had that just in your sample query while pulling ALL your data down so you would not have 300 million rows returned.
Now, to get a better test on time, you may want to sample some of the products that DO have the most records, to see how long the query takes for a single product in the worst case. To find those, you may want
SELECT
hd.product_id,
count(*) totalRecsPerProduct
FROM
hist_data_app hd
group by
hd.product_id
order by
count(*) desc
limit 10
This will get the top 10 products with the most records out of your 300 million, then you can run the prior query against these and see how much REAL time it takes to get the results back. I think you will see the performance is fine for your single-product need.
The query is ill-formed; we should not discuss it until you fix that problem. Read about "only_full_group_by".
It is perhaps never 'correct' to have both GROUP BY and DISTINCT in the same SELECT.
You have the "explode-implode" syndrome. This is when you do a JOIN, which builds a large temp table, followed by a GROUP BY to shrink back to what you perhaps had to start with in one of the tables.
The resultset is still huge; what do you plan to do with the result?
You really should have a PRIMARY KEY on every table. If this is unique (history_id, product_id), make it the PK. (Note that I swapped the order, as discussed in other Comments.)
year_id -- Is that the normalization of a YEAR? Not worth it. Simply have the year in the table; no extra lookup.
How big is the other table? (I may be barking up the wrong Optimization.)
This might give you the same data, but a lot faster:
SELECT a.product_id , a.history_id , a.comments , a.field_name
FROM ( SELECT DISTINCT history_id FROM history_log ) AS b
JOIN hist_data_app a ON a.history_id = b.history_id
GROUP BY product_id;
You will have to make some structural changes to the tables; plan for some downtime.
If you only need data on Hammers, please show us the WHERE clause that would limit the output. Optimizing for that query may be significantly different than the 30-minute query in your Question!
Fix most of what I and others have suggested, then come back with a new Question with fresh schema, etc. (This Q&A is getting too messy to keep going.)
I am currently building an aggregate MySQL table which is built from 4 different tables. The largest table (accel) has 7.8 million rows and the other 3 have fewer than 5 million rows. The tables that I am using have duplicate rows, e.g.
creatorId, capabilities, frequency_MHz, rssi, dutyCycleLevel
'X^6%g9#tg!Q:]0uqkwcOc)==', '[WEP]', '2412', '-72', '-3733'
'X^6%g9#tg!Q:]0uqkwcOc)==', '[WEP]', '2437', '-54', '-3733'
All 4 tables contain a creatorId and a dutyCycleLevel. Depending on the other values stored in other columns, I am doing some operations on the values and then copying the result in my new table. Everything has to be grouped by the creatorId and the dutyCycleLevel of one main table (namely 'accel') such that we only get one final creatorId along with one final dutyCycleLevel for all the duplicates (i.e. if 'abc' appears 10 times in 10 different rows, 'abc' will only appear once in the final table). The problem that I am encountering is the huge processing time to build the table. I have left my machine overnight and it is still not completed (it has been running for 24 hours now). Here is my query:
DROP TABLE `BoxCounting_aggregate`;
CREATE TABLE `SHED5`.`BoxCounting_aggregate` (
`creatorId` VARCHAR(55) DEFAULT NULL COMMENT '',
`timestamp` DATETIME DEFAULT NULL COMMENT '',
`latitude` DOUBLE NULL DEFAULT NULL COMMENT '',
`longitude` DOUBLE NULL DEFAULT NULL COMMENT '',
`norm_accel` FLOAT NULL DEFAULT NULL COMMENT '',
`std_dev_accel` FLOAT NULL DEFAULT NULL COMMENT '',
`batteryStatus` FLOAT NULL DEFAULT NULL COMMENT '',
`wifi_seen` INT(11) NULL DEFAULT NULL COMMENT '',
`dutyCycle` INT(11) NULL DEFAULT NULL COMMENT ''
);
INSERT INTO BoxCounting_aggregate
(
creatorId, timestamp, latitude, longitude, norm_accel, std_dev_accel, batteryStatus, wifi_seen, dutyCycle
)
(
SELECT
location.creatorId,
location.timestamp,
sqrt(pow(AVG(accel.accel_x),2)+pow(AVG(accel.accel_y),2)+pow(AVG(accel.accel_z),2)),
STD(accel.accel_z),
case battery.pluggedInDescription
when 'Not Plugged' then 0
when 'Plugged USB' then 0.5
when 'Plugged AC' then 1
else null
end,
COUNT(wifi.dutyCycleLevel),
location.dutyCycleLevel
FROM SHED5.location, SHED5.accel, SHED5.wifi, SHED5.battery
GROUP BY location.creatorId, location.dutyCycleLevel
);
I am grouping by creatorId and dutyCycleLevel since those two columns are the most important to keep a record of. I am using AVG on latitude and longitude since I want the averaged location of all the records stored in the table. Like I said, creatorId and dutyCycleLevel appear multiple times. I do not think there is anything syntactically wrong with my query, but it is definitely an inefficient way of doing what I am trying to do. All 4 tables have indexes but do not have primary keys since, with creatorId as the main column containing duplicates, I cannot use it as the primary key. Any suggestions for improving the processing time? Or is there anything that I have to change syntactically in the query?
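One thing that stands out in the query above is that the four tables are listed with commas and no join condition, which produces every combination of rows before the GROUP BY runs. The sketch below illustrates the effect on tiny made-up data and shows one possible alternative: aggregate each table on its own, then join the per-group results. It runs on Python's built-in sqlite3 rather than MySQL, uses only three cut-down tables, and assumes the tables relate to each other through creatorId and dutyCycleLevel, which the question implies but does not state outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (creatorId TEXT, dutyCycleLevel INT, latitude REAL, longitude REAL);
    CREATE TABLE accel    (creatorId TEXT, dutyCycleLevel INT, accel_z REAL);
    CREATE TABLE wifi     (creatorId TEXT, dutyCycleLevel INT);
    -- made-up rows for a single creatorId / dutyCycleLevel pair
    INSERT INTO location VALUES ('abc', 1, 10.0, 20.0), ('abc', 1, 12.0, 22.0);
    INSERT INTO accel    VALUES ('abc', 1, 0.5), ('abc', 1, 1.5), ('abc', 1, 1.0);
    INSERT INTO wifi     VALUES ('abc', 1), ('abc', 1);
""")

# A comma join with no condition pairs every row with every other row: 2 * 3 * 2.
cross_rows = conn.execute("SELECT COUNT(*) FROM location, accel, wifi").fetchone()[0]

# Aggregate each table on its own first, then join the small per-group results.
# The join keys (creatorId, dutyCycleLevel) are an assumption about the schema.
summary = conn.execute("""
    SELECT a.creatorId, a.dutyCycleLevel, l.avg_lat, l.avg_lon, a.avg_z, w.wifi_seen
    FROM (SELECT creatorId, dutyCycleLevel, AVG(accel_z) AS avg_z
          FROM accel GROUP BY creatorId, dutyCycleLevel) AS a
    LEFT JOIN (SELECT creatorId, dutyCycleLevel,
                      AVG(latitude) AS avg_lat, AVG(longitude) AS avg_lon
               FROM location GROUP BY creatorId, dutyCycleLevel) AS l
           ON l.creatorId = a.creatorId AND l.dutyCycleLevel = a.dutyCycleLevel
    LEFT JOIN (SELECT creatorId, dutyCycleLevel, COUNT(*) AS wifi_seen
               FROM wifi GROUP BY creatorId, dutyCycleLevel) AS w
           ON w.creatorId = a.creatorId AND w.dutyCycleLevel = a.dutyCycleLevel
""").fetchall()

print(cross_rows)
print(summary)
```

Seven rows already turn into 12 combinations here; with millions of rows per table the intermediate result is astronomically large, which would be consistent with a query that does not finish in 24 hours. Pre-aggregating also keeps COUNT honest, since each wifi row is counted once instead of once per matching accel and location row.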
I have one main table and two tables that hold dynamic information related to it.
The first table, called 'items', holds the main information. The two other tables (ratings and indexes) hold values for a dynamic number of audiences and time periods.
What I want:
When I query for those items, I want the result to have additional columns taken from the ratings and indexes tables.
I have the code like this
SELECT items.*, ratings.val AS rating, indexes.val AS idx
FROM items,ratings,indexes
WHERE items.date>=1349902800000 AND items.date <=1349989199000
AND ratings.period_start <= items.date
AND ratings.period_end > items.date
AND ratings.auditory = 'kids'
AND indexes.period_start <= items.date
AND indexes.period_end > items.date
AND indexes.auditory = 'kids'
ORDER BY indexes.added, ratings.added DESC
The tables look something like this
items:
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(200) DEFAULT NULL,
`date` bigint(40) DEFAULT NULL,
PRIMARY KEY (`id`)
ratings:
`id` bigint(50) NOT NULL AUTO_INCREMENT,
`period_start` bigint(50) DEFAULT NULL,
`period_end` bigint(50) DEFAULT NULL,
`val` float DEFAULT NULL,
`auditory` varchar(200) DEFAULT NULL,
`added` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
All dates, except the 'added' fields which are simple TIMESTAMPs, are stored as BIGINT: milliseconds as returned by Date.getTime() in AS3 (i.e. since January 1, 1970).
So, what is the correct way to get this accomplished?
The only thing I'm not seeing is the unique correlation of any individual ITEM to its ratings... I would think the ratings table would need an "ItemID" to link back to items. As it stands now, if you have 100 items within a given time period say 3 months... and just add all the ratings / reviews, but don't associate those ratings to the actual Item, you are stuck. Put the ItemID in and add that to your WHERE condition and you should be good to go.
I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from @Gordon, below:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
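As a sanity check that the LEFT JOIN ... IS NULL form returns the same unread articles as the original NOT IN subquery, here is a small sketch on made-up rows. It runs on Python's built-in sqlite3 with cut-down versions of the two tables, so it says nothing about MySQL's timing; it only compares the result sets. Putting the uid filter in the ON clause instead of a derived table is a variation of mine, not something from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channelitem (ciid INTEGER PRIMARY KEY, cid INT, name TEXT, creationdate TEXT);
    CREATE TABLE uninet_channelitem_read (ciid INT, uid INT, PRIMARY KEY (ciid, uid));
    -- made-up data: user 1030 has read article 2; article 4 is in another channel
    INSERT INTO channelitem VALUES
        (1, 117, 'a', '2012-01-01'), (2, 308, 'b', '2012-01-02'),
        (3, 310, 'c', '2012-01-03'), (4, 999, 'd', '2012-01-04');
    INSERT INTO uninet_channelitem_read VALUES (2, 1030), (3, 42);
""")

# Original form: exclude read articles with a NOT IN subquery.
not_in = conn.execute("""
    SELECT ciid FROM channelitem
    WHERE ciid NOT IN (SELECT ciid FROM uninet_channelitem_read WHERE uid = 1030)
      AND cid IN (117, 308, 310)
    ORDER BY creationdate DESC LIMIT 100
""").fetchall()

# Anti-join form: keep only rows that found no match in the read table.
anti_join = conn.execute("""
    SELECT ci.ciid FROM channelitem ci
    LEFT JOIN uninet_channelitem_read cr ON cr.ciid = ci.ciid AND cr.uid = 1030
    WHERE cr.ciid IS NULL AND ci.cid IN (117, 308, 310)
    ORDER BY ci.creationdate DESC LIMIT 100
""").fetchall()

print(not_in, anti_join)
```

Both queries return articles 3 and 1, newest first, so switching forms is safe as far as results go; whether it is actually faster on your data is something only EXPLAIN and a timing run on the real tables can tell you.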
The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing
I have a problem constructing a mysql query:
I have this table "tSubscriber" where I store the subscribers for my newsletter mailing list.
The table looks like this (simplified):
--
-- Table structure for tSubscriber
--
CREATE TABLE tSubscriber (
fId INT UNSIGNED NOT NULL AUTO_INCREMENT,
fSubscriberGroupId INT UNSIGNED NOT NULL,
fEmail VARCHAR(255) NOT NULL DEFAULT '',
fDateConfirmed DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
fDateUnsubscribed TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (fId),
INDEX (fSubscriberGroupId)
) ENGINE=MyISAM;
Now what I want to accomplish is to have a diagram showing the subscriptions and unsubscriptions per month per subscriber group.
So I need to extract the year and months from the fDateConfirmed, fDateUnsubscribed dates, count them and show the count sorted by month and year for a subscriber group.
I think this SQL query gets quite complex and I just can't get my head around it. Is this even possible with one query?
You will need two separate queries, one for subscriptions and the other for unsubscriptions:
SELECT COUNT(*), YEAR(fDateConfirmed), MONTH(fDateConfirmed) FROM tSubscriber GROUP BY YEAR(fDateConfirmed), MONTH(fDateConfirmed)
SELECT COUNT(*), YEAR(fDateUnsubscribed), MONTH(fDateUnsubscribed) FROM tSubscriber GROUP BY YEAR(fDateUnsubscribed), MONTH(fDateUnsubscribed)
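If a single result set per subscriber group and month is more convenient for the diagram, one possible sketch is to stack the two kinds of events with UNION ALL and sum them. The version below runs on Python's built-in sqlite3 with made-up rows; substr(..., 1, 7) stands in for MySQL's YEAR() and MONTH(), and skipping rows that still hold the '0000-00-00 00:00:00' default is my assumption about how "not yet confirmed" and "never unsubscribed" are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tSubscriber (
        fId INTEGER PRIMARY KEY AUTOINCREMENT,
        fSubscriberGroupId INT NOT NULL,
        fEmail TEXT NOT NULL DEFAULT '',
        fDateConfirmed TEXT NOT NULL DEFAULT '0000-00-00 00:00:00',
        fDateUnsubscribed TEXT NOT NULL DEFAULT '0000-00-00 00:00:00'
    );
    -- made-up sample subscribers
    INSERT INTO tSubscriber (fSubscriberGroupId, fEmail, fDateConfirmed, fDateUnsubscribed) VALUES
        (1, 'a@example.com', '2012-01-05 10:00:00', '0000-00-00 00:00:00'),
        (1, 'b@example.com', '2012-01-20 10:00:00', '2012-02-03 09:00:00'),
        (1, 'c@example.com', '2012-02-11 10:00:00', '0000-00-00 00:00:00'),
        (2, 'd@example.com', '2012-02-15 10:00:00', '2012-02-28 09:00:00');
""")

# Each subscriber contributes one "subscribed" event and, if set, one "unsubscribed" event.
per_month = conn.execute("""
    SELECT fSubscriberGroupId, ym,
           SUM(subscribed) AS subscribed, SUM(unsubscribed) AS unsubscribed
    FROM (SELECT fSubscriberGroupId, substr(fDateConfirmed, 1, 7) AS ym,
                 1 AS subscribed, 0 AS unsubscribed
          FROM tSubscriber
          WHERE fDateConfirmed <> '0000-00-00 00:00:00'
          UNION ALL
          SELECT fSubscriberGroupId, substr(fDateUnsubscribed, 1, 7), 0, 1
          FROM tSubscriber
          WHERE fDateUnsubscribed <> '0000-00-00 00:00:00') AS events
    GROUP BY fSubscriberGroupId, ym
    ORDER BY fSubscriberGroupId, ym
""").fetchall()

for row in per_month:
    print(row)
```

Each output row is (group id, 'YYYY-MM', subscriptions, unsubscriptions). Months with no events at all simply do not appear, so the charting code would need to fill those gaps with zeros.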