I have a MySQL query I wrote that displays the data I want it to, but it takes at least 30 secs - 1 min to run.
I researched to find out how to created the nested SELECT query with the COUNT that I needed in order to display the data I required. The SQL is also part of a web page I have, and when I go from page to page it takes the same amount of time to load. I am sure there is a more efficient way to write the query so it loads fast, as there are only about 1,500 records in the ttb_shows table and about 11k in the ttb_books table. Below is the query.
-- DDL
CREATE TABLE `ttb_books` (
`book_id` int(11) NOT NULL,
`book_name` varchar(255) NOT NULL DEFAULT '',
`cover_image` varchar(255) DEFAULT NULL,
`show_id` int(11) NOT NULL DEFAULT '0',
`state_id` int(7) NOT NULL DEFAULT '0',
`notes` text,
`year` varchar(255) DEFAULT NULL,
`publisher` varchar(255) DEFAULT NULL,
`status_id` int(7) NOT NULL DEFAULT '0',
`no_pages` varchar(255) DEFAULT NULL,
`footer` text,
`opt1` varchar(255) NOT NULL DEFAULT '$5-10',
`opt2` varchar(255) DEFAULT NULL,
`opt3` varchar(255) DEFAULT NULL,
`opt4` varchar(255) DEFAULT NULL,
`opt5` varchar(255) DEFAULT NULL,
`owned` int(1) DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `ttb_shows` (
`show_id` int(11) NOT NULL,
`show_name` varchar(255) NOT NULL DEFAULT '',
`date_added` datetime NOT NULL DEFAULT '0000-00-00 00:00:00'
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- QUERY
SELECT ttb_shows.show_id, ttb_shows.show_name, ttb_shows.date_added,
COUNT(ttb_books.book_id) AS books,
(SELECT COUNT(ttb_books.owned) AS owned FROM ttb_books WHERE (owned=1 AND ttb_books.show_id = ttb_shows.show_id))
FROM ttb_shows LEFT JOIN ttb_books ON ttb_shows.show_id = ttb_books.show_id
GROUP BY ttb_shows.show_id, ttb_shows.show_name, ttb_shows.date_added
Thank you to all who are able to help with this. It is really appreciated!
You could avoid the subquery for owner using sum based on case
SELECT ttb_shows.show_id
, ttb_shows.show_name
, ttb_shows.date_added
, COUNT(ttb_books.book_id) AS books
, sum( case when ttb_books.owned = 1 then 1 else 0 end) AS owned
FROM ttb_shows
LEFT JOIN ttb_books ON ttb_shows.show_id = ttb_books.show_id
GROUP BY ttb_shows.show_id, ttb_shows.show_name, ttb_shows.date_added
Your query can be optimized by you only. It seems that so many left outer will obviously slow the output. If you can either avoid so many left outers or make small chunks of cases to fetch out data and then fetch the Final output.
Related
Important updade (explanation):
I realized that my query having single DESC order is 10 times slower that the same query with ASC order. The ordered field has an index. Is it normal behavior?
Original question with queries:
I have a mysql table with a few hundred of product items. It's suprising (for me) how 2 similar sql queries differs in terms of performance. I don't know why. Can you please give me a hint or explain why the difference is so huge?
This query takes 3ms:
SELECT
*
FROM
`product_items`
WHERE
(product_items.shop_active = 1)
AND (product_items.active = 1)
AND (product_items.active_category_id is not null)
AND (has_picture is not null)
AND (price_orig is not null)
AND (category_min_discount IS NOT NULL)
AND (product_items.slug is not null)
AND `product_items`.`active_category_id` IN (6797, 5926, 5806, 6852)
ORDER BY
price asc
LIMIT 1
But the following query takes already 169ms... Only difference is that the order clause contains 2 columns. "Price" value has each product, while "price top" has roughly only 1% of products.
SELECT
*
FROM
`product_items`
WHERE
(product_items.shop_active = 1)
AND (product_items.active = 1)
AND (product_items.active_category_id is not null)
AND (has_picture is not null)
AND (price_orig is not null)
AND (category_min_discount IS NOT NULL)
AND (product_items.slug is not null)
AND `product_items`.`active_category_id` IN (6797, 5926, 5806, 6852)
ORDER BY
price asc,
price_top desc
LIMIT 1
The table structure looks like this:
CREATE TABLE `product_items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`shop_id` int(11) DEFAULT NULL,
`item_id` varchar(255) DEFAULT NULL,
`productname` varchar(255) DEFAULT NULL,
`description` text,
`url` text,
`url_hash` varchar(255) DEFAULT NULL,
`img_url` text,
`price` decimal(10,2) DEFAULT NULL,
`price_orig` decimal(10,2) DEFAULT NULL,
`discount` decimal(10,2) DEFAULT NULL,
`discount_percent` decimal(10,2) DEFAULT NULL,
`manufacturer` varchar(255) DEFAULT NULL,
`delivery_date` varchar(255) DEFAULT NULL,
`categorytext` text,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`active_category_id` int(11) DEFAULT NULL,
`shop_active` int(11) DEFAULT NULL,
`active` int(11) DEFAULT '0',
`price_top` decimal(10,2) NOT NULL DEFAULT '0.00',
`attention_priority` int(11) DEFAULT NULL,
`attention_priority_over` int(11) DEFAULT NULL,
`has_picture` varchar(255) DEFAULT NULL,
`size` varchar(255) DEFAULT NULL,
`category_min_discount` int(11) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_product_items_on_url_hash` (`url_hash`),
KEY `index_product_items_on_shop_id` (`shop_id`),
KEY `index_product_items_on_active_category_id` (`active_category_id`),
KEY `index_product_items_on_productname` (`productname`),
KEY `index_product_items_on_price` (`price`),
KEY `index_product_items_on_discount_percent` (`discount_percent`),
KEY `index_product_items_on_price_top` (`price_top`)
) ENGINE=InnoDB AUTO_INCREMENT=1715708 DEFAULT CHARSET=utf8;
UPDATE
I realized that the difference is mainly in the type of ordering: if I use asc+asc for both columns the query takes around 6ms, if I use asc+desc or desc+asc, the query takes around 160ms..
Thank you.
If creating an index to help ORDER BY doesn't help, try creating an index that helps both WHERE and ORDER BY:
CREATE INDEX product_items_i1 ON product_items (
shop_active,
active,
active_category_id,
has_picture,
price_orig,
category_min_discount,
slug,
price,
price_top DESC
)
Obviously, this is a bit clunky, and you'll have to balance the performance gain for the query with the price of maintaining the index.
trying to query a large table (senddb.order_histories) that has close to 50M rows and this is the MySQL query I am using:
FIRST APPROACH- inner join:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join (
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b
on a.id = b.order_line_item_id
and a.fulfillment_status = '2';
EXPLAIN output :
SECOND APPROACH- nested select:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
where a.fulfillment_status = '2'
and a.id in (
select b.order_line_item_id from(
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where
order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b);
I believe nested select is a bad approach on large data but i anyhow added it here because it worked on my sample set. Anyway both the queries eventually time out after 600 seconds with the message : Error Code: 2013. Lost connection to MySQL server during query.
I would like to know if there are any ways to alter the query to make it run faster. I have already tried reducing the columns in the inner select / inner join but that should not really be an issue IMO. I also looked up a solution that says "create a clustered index" but i wasn't really able to follow. Any help is appreciated.
TABLE order_histories :
order_histories CREATE TABLE `order_histories` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`order_status_description` varchar(255) DEFAULT NULL,
`datetime_stamp` datetime DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`fulfillment_location` int(8) DEFAULT NULL,
`order_status` int(8) DEFAULT NULL,
`user_id` int(8) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`order_line_item_id` int(11) DEFAULT NULL,
`pooled` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `order_histories_ecash_idx` (`order_number`),
KEY `order_line_item_id` (`order_line_item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=454738178 DEFAULT CHARSET=latin1
TABLE order_line_items :
order_line_items CREATE TABLE `order_line_items` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`sku_id` int(8) DEFAULT NULL,
`original_price` float DEFAULT NULL,
`dept_description` varchar(100) DEFAULT NULL,
`description` varchar(100) DEFAULT NULL,
`quantity_ordered` int(8) DEFAULT NULL,
`gift_indicator` char(1) DEFAULT NULL,
`gift_wrap_flag` char(1) DEFAULT NULL,
`shipping_record_flag` char(1) DEFAULT NULL,
`gift_comments` varchar(100) DEFAULT NULL,
`item_status` char(1) DEFAULT NULL,
`tax_amount` float DEFAULT NULL,
`tax_rate` float DEFAULT NULL,
`upc` varchar(20) DEFAULT NULL,
`final_price` float DEFAULT NULL,
`line_number` int(8) DEFAULT NULL,
`master_line_number` int(8) DEFAULT NULL,
`gift_wrap_flag_type` char(1) DEFAULT NULL,
`color_code` varchar(4) DEFAULT NULL,
`size_id` varchar(6) DEFAULT NULL,
`width_id` varchar(6) DEFAULT NULL,
`brand` varchar(15) DEFAULT NULL,
`vpn` varchar(30) DEFAULT NULL,
`dept_number` int(8) DEFAULT NULL,
`class_number` int(8) DEFAULT NULL,
`non_merch_item` char(1) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`chain_id` int(11) DEFAULT NULL,
`fulfillment_location` int(11) DEFAULT NULL,
`fulfillment_date` datetime DEFAULT NULL,
`fulfillment_status` int(11) DEFAULT NULL,
`fulfillment_sales_associate` int(11) DEFAULT NULL,
`gift_wrap_line_number` int(11) DEFAULT NULL,
`shipping_type` int(11) DEFAULT NULL,
`order_track_info_id` int(11) DEFAULT NULL,
`store_tlog_updated` varchar(1) DEFAULT NULL,
`shipping_tlx_code` int(11) DEFAULT NULL,
`store_closed` tinyint(1) DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`deal_based_index` int(11) DEFAULT NULL,
`tlog_calc_ret_price` float DEFAULT NULL,
`tlog_amount` float DEFAULT NULL,
`tlog_retail_price` float DEFAULT NULL,
`tlog_ext_amount` float DEFAULT NULL,
`tlog_flag_1` int(11) DEFAULT NULL,
`tlog_flag_2` int(11) DEFAULT NULL,
`tlog_flag_3` int(11) DEFAULT NULL,
`time_remaining` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_line_items_ecash_idx` (`order_number`),
KEY `order_line_item_fulfillment_location_idx` (`fulfillment_location`),
KEY `order_line_item_fulfillment_status_idx` (`fulfillment_status`),
KEY `upc_idx` (`upc`),
KEY `sku_id_idx` (`sku_id`),
KEY `order_line_items_idx001` (`order_number`,`id`,`fulfillment_status`),
KEY `order_track_info_id` (`order_track_info_id`),
KEY `shipping_type_idx` (`shipping_type`,`non_merch_item`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=11367052 DEFAULT CHARSET=latin1
This query can be simplified:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join senddb.order_histories b on a.id = b.order_line_item_id
where b.order_status in ('x','y','z')
and b.fulfillment_location = 'abcd'
and a.fulfillment_status = '2';
Since you're only selecting values from table a, you don't need to select specific values from table b and can instead just apply your conditions. Outside of this, you need to ensure that b.order_line_item_id has an index on it. You can find more about creating indexes here. I'm not an expert in MySQL but something similar to this should work if senddb.order_histories.order_line_item_id isn't already the primary key.
CREATE INDEX IX_order_histories_order_line_item_id ON order_histories (order_line_item_id);
You need to read up the optimization section of the MySQL docs. It contains a lot of information on how you can optimize your queries and data sets. The main idea here is to add indexes to the fields that are being used as the criteria in the WHERE clause of the SQL statements.
Basically, both of your alternatives are using a "sub-SELECT, not an INNER JOIN.
The syntax of a true JOIN is one of the following:
SELECT ...
FROM X INNER JOIN Y USING (field_list)
... or ...
SELECT ...
FROM X INNER JOIN Y ON (x.field1 = y.field2) ...
But in both cases the objects being joined are tables or views.
I'm going to presume ... admittedly, without checking ... that Nick Larsen's answer #1 adequately re-expresses your original query using JOINs.
(Notice how, in his answer, the shorthand identifiers A and B are introduced as referring to each of the two table-names mentioned in his query.)
Firstly, you need to decide if a 50 million resultset is what you are asking for. Big data tables are not there so that you can select all their rows. They are there so that you can ask them questions using sql queries. SQL is a query language, it's not a data loading language.
What's your purpose? If you want to copy the data you can do that by loading the data, for example, 1000 rows per query in a for loop. if you are loading the data for processing, you can do that in the same way.
If you want to derive statistical information, you can use outer join and return a low number of rows, using aggregate functions. But you shouldn't do that either, what you "should" do is to decide what you want from the table and preferably, run aggregate functions to store useful information in a different table. (mostly SELECT INTO queries) You should never need to join a table of 50 million records in the first place.
Telling you how to do something wrong using indexes wouldn't be the right thing here.
I have two tables:
`search_chat` (
`pubchatid` varchar(255) NOT NULL,
`profile` varchar(255) DEFAULT NULL,
`prefs` varchar(255) DEFAULT NULL,
`init` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`session` varchar(255) DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`uid` int(10) DEFAULT NULL,
PRIMARY KEY (`pubchatid`)
and
`chats` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`chatlog` varchar(255) DEFAULT NULL,
`block` varchar(2) DEFAULT '',
`whenadded` datetime DEFAULT NULL,
`pubchatid1` varchar(255) DEFAULT NULL,
`pubchatid2` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
So basically people chat with each other through a search system based on prefrences. The further they are apart, the worse it is. So the query I have is simple:
SELECT *
FROM search_chat
WHERE levenshtein(profile, "[user_prefs]") < 20
AND pubchatid <> "[user_pubchatid]"
ORDER BY
levenshtein(profile, "[user_prefs]")
LIMIT 1
It is a shitty query in itself, but it does the job (everything between "[]" is a variable I put in, just to make it clear).
As you can see this query only makes a selection between two peoples preferences (prefs) and how they are (profile). So far so good.
I have been bugging around some to make this query also check if they have had previous chats. That is where "chats" comes in. I can not get the query to check for a proper user and see if they have an open chat.
In chats, the "search_chat.pubchatid" can be either "chats.pubchatid1" or "chats.pubchatid2"
So somehow I have got to make these two work, making chats rule out options in search_chat.
Do you want something like this:
-- ... ( start of query as per your question )
and not exists (
select *
from chats
where ( ( chats.pubchatid1 = search_chat.pubchatid )
or ( chats.pubchatid2 = search_chat.pubchatid ) )
and -- ... add any restriction on how recent the chat was
)
I want to fetch the videos that top viewed from the last week,
my client has the following table :
CREATE TABLE `videos` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`album` int(10) unsigned NOT NULL DEFAULT '0',
`name` varchar(225) NOT NULL,
`title` varchar(255) NOT NULL DEFAULT '',
`uploading_user` int(8) NOT NULL DEFAULT '2715',
`host` tinyint(1) unsigned NOT NULL DEFAULT '0',
`host_url` varchar(255) NOT NULL DEFAULT '',
`active` tinyint(1) unsigned NOT NULL DEFAULT '0',
`featured` tinyint(1) unsigned NOT NULL DEFAULT '0',
`date_added` date NOT NULL DEFAULT '0000-00-00',
`view` int(125) NOT NULL,
`rating` int(125) NOT NULL,
`rating_count` int(125) NOT NULL,
`category` varchar(225) NOT NULL,
`genre` varchar(225) NOT NULL,
`playlist` varchar(225) NOT NULL,
`video_image` varchar(225) NOT NULL,
`votecount` int(5) DEFAULT NULL,
`banner_image` varchar(255) NOT NULL DEFAULT 'IB_header_solo4.jpg',
`bg_image` varchar(255) NOT NULL DEFAULT 'IB_bck_hd_sect.jpg',
`bg_color` varchar(255) NOT NULL DEFAULT '#000000',
`user_video` int(1) NOT NULL DEFAULT '0',
`description` varchar(1000) DEFAULT NULL,
`country` varchar(30) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`location` varchar(50) DEFAULT NULL,
`reported` int(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1531 DEFAULT CHARSET=latin1
in this table date_added field for on which date we have added this video,
and i have updated the table on every video view in the view field, means i have the total views of the particular video.
now what i want, how can i fetch the result of top views videos from last week?
can i have to add another date field on which i add the id of the viewed videos?
or any other alternate solution?
If you want to get a list of videos with the number of views in the last week, you need to store views separately. The views column in the table would only give you access to the total number of views but it does not give any indication as to when those views were generated.
I would use a table with video_id, datetime and number of views to store the views. By using a DATETIME column and the requirement that the combination video_id and DATETIME is unique, you can store views down to the second and create more statistics later.
I think it's best you keep a separate table to store who viewed the videos, adding it to this table you will be limited with the number of people you can associate as viewers. Regarding your query. It would be better if you keep a seperate field that stores weekly views.
assuming you had a field called week_views
SELECT required_fields FROM videos ORDER BY week_views DESC
SELECT * (or required fields) FROM videos
WHERE date_added >= CURDATE() - INTERVAL WEEKDAY(CURDATE()) +7 DAY
AND date_added < CURDATE() - INTERVAL WEEKDAY(CURDATE()) DAY
ORDER BY view DESC
Something like that should select the videos from last week, Monday to Sunday (not necessarily last 7 days.)
i'm having a bit of trouble trying to write an sql query. i have a table for listings, now all i am trying to do here is make it so that listings that have been paid for will show up at the top of the results.
so say i am on province page and the province ID is 23, i need to make the listings that have upgrade_province='yes' and upgrade_status='paid' and id_province='23' show up as the first results ordered by date, and everything else below those ordered by date. if upgrade_province='yes' and upgrade_status='unpaid' then it needs to show that listing as a normal listing.
heres the table structure:
`id` int(11) NOT NULL auto_increment,
`id_user` int(11) NOT NULL,
`contact_email` varchar(255) NOT NULL,
`price` varchar(255) NOT NULL,
`title` varchar(255) NOT NULL,
`title_slug` varchar(255) NOT NULL,
`description` text NOT NULL,
`id_province` int(11) NOT NULL,
`id_city` int(11) NOT NULL,
`map_address` varchar(255) NOT NULL,
`id_type` int(11) NOT NULL,
`id_contract` int(11) NOT NULL,
`bedrooms` varchar(255) NOT NULL,
`bathrooms` varchar(255) NOT NULL,
`image_main` varchar(255) NOT NULL,
`image_2` varchar(255) NOT NULL,
`image_3` varchar(255) NOT NULL,
`image_4` varchar(255) NOT NULL,
`image_5` varchar(255) NOT NULL,
`status` varchar(255) NOT NULL default 'Active',
`upgrade_urgent` varchar(255) NOT NULL default 'no',
`upgrade_city` varchar(255) NOT NULL default 'no',
`upgrade_province` varchar(255) NOT NULL default 'no',
`upgrade_price` varchar(255) NOT NULL,
`upgrade_status` varchar(255) NOT NULL default 'unpaid',
`num_views` int(11) NOT NULL default '0',
`num_replies` int(11) NOT NULL default '0',
`date_posted` int(11) NOT NULL,
`date_upgrade` int(11) NOT NULL default '0',
`mod_status` varchar(255) NOT NULL default 'approved',
i can't seem to figure out how i can make it treat unpaid ads as normal even if their upgrade_province='yes'
You can use arbitrary expressions in ORDER BY clauses. The boolean true/false values will get treated as '0' and '1' integers which can be ordered with ASC/DESC as usual, so
...rest of query...
ORDER BY (upgrade_province='yes' and upgrade_status='paid' and id_province='23') DESC,
... other order clauses ...
Any records which evaluate to TRUE in that 'and' sequence will evaluate to a '1', which when ordered DESC will appear first in the results. Anything which comes out false is cast to 0 and will sort lower.
If I'm understanding his desired result, the answer Marc B offered is not appropriate. Specifying multiple order by clauses causes secondary sorts on collisions of the first and so on.
My understanding of the question is there are two distinct sets of things that need to be sorted differently.
Also the order by clause is incorrect. He wants that to be part of the where clause and then ordered by date after the set is identified.
I would suggest trying
select .. from table where (upgrade_province='yes' and upgrade_status='paid' and id_province='23') order by date desc
union
select .. from table where (upgrade_province!='yes' and upgrade_status!='paid' and id_province!='23') order by date desc
Note: the way you negate the second part of the union might need the ands changed to ors depending on what exactly you want.
Either way to point is to independently select the two different sets and sort them with in themselves. Something similar could also be accomplished on the application level issuing multiple queries and having the app sort it out as desired.