Nested selects in MySQL

Setting:
Each page on my site has four widgets that are arranged in different orders (1-4).
I have a table 'content' and a table 'widgets'. I have a bridging table, content_widgets, that maps content.id to widgets.id.
Problem:
What I want to do is run a query that selects * from content along with additional columns widget_1, widget_2, widget_3, widget_4, each containing the id of the widget linked to that page.
I've been trying some nested selects all morning and can't seem to crack it. I've copied the MySQL dumps of the involved tables below :-).
CREATE TABLE `content` (
`id` int(11) NOT NULL auto_increment,
`permalink` varchar(64) character set latin1 NOT NULL,
`parent` int(11) NOT NULL default '1',
`title` varchar(128) character set latin1 NOT NULL,
`content` text character set latin1,
`content_type` varchar(16) NOT NULL default 'page',
PRIMARY KEY (`id`),
FULLTEXT KEY `title` (`title`,`content`,`meta_description`,`meta_keywords`)
);
CREATE TABLE `widgets` (
`id` int(11) unsigned NOT NULL auto_increment,
`title` varchar(64) default NULL,
`text` varchar(256) default NULL,
`image` varchar(128) default NULL,
`target` varchar(128) default NULL,
`code` varchar(32) default NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `content_widgets` (
`content_id` int(11) NOT NULL,
`widget_id` int(11) NOT NULL,
`order` tinyint(4) NOT NULL
);
thanks a lot!

You don't need a nested query, just a join. Assuming that you want to start with a content record and return the matching widgets...
SELECT c.*, w.*
FROM content c
LEFT JOIN (
content_widgets cw INNER JOIN widgets w
ON cw.widget_id=w.id
) ON c.id=cw.content_id
WHERE c.id=....
Although a simple inner join is a better idea if you know you've got the widgets:
SELECT c.*, w.*
FROM content c, content_widgets cw, widgets w
WHERE cw.widget_id=w.id
AND c.id=cw.content_id
AND c.id=....
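Both queries above return one row per widget rather than one row per page. If you want exactly the shape asked for in the question (the content columns plus widget_1 to widget_4), one common technique is conditional aggregation over content_widgets.`order`. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere; the trimmed tables and sample rows are hypothetical, and the SELECT itself should carry over to MySQL largely unchanged.

```python
import sqlite3

# SQLite stands in for MySQL here; the tables are trimmed to the columns
# the pivot needs, and the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE widgets (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE content_widgets (
    content_id INTEGER NOT NULL,
    widget_id  INTEGER NOT NULL,
    `order`    INTEGER NOT NULL
);
INSERT INTO content VALUES (1, 'Home'), (2, 'About');
INSERT INTO widgets VALUES (10, 'A'), (11, 'B'), (12, 'C'), (13, 'D');
INSERT INTO content_widgets VALUES
    (1, 12, 1), (1, 10, 2), (1, 13, 3), (1, 11, 4),
    (2, 11, 1), (2, 10, 2);
""")

# Conditional aggregation: each MAX(CASE ...) picks the widget in one slot;
# slots with no widget come back as NULL (None in Python).
PIVOT_SQL = """
SELECT c.id, c.title,
       MAX(CASE WHEN cw.`order` = 1 THEN cw.widget_id END) AS widget_1,
       MAX(CASE WHEN cw.`order` = 2 THEN cw.widget_id END) AS widget_2,
       MAX(CASE WHEN cw.`order` = 3 THEN cw.widget_id END) AS widget_3,
       MAX(CASE WHEN cw.`order` = 4 THEN cw.widget_id END) AS widget_4
FROM content c
LEFT JOIN content_widgets cw ON cw.content_id = c.id
GROUP BY c.id, c.title
ORDER BY c.id
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

In MySQL you could select c.* instead of listing columns, since grouping by the primary key c.id makes the other content columns functionally dependent on it (MySQL 5.7.5 and later accept this even with ONLY_FULL_GROUP_BY enabled).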

Related

How to store translations in MySQL to use JOIN?

I have a table that contains all translations of words:
CREATE TABLE `localtexts` (
`Id` int(11) NOT NULL,
`Lang` char(2) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT 'pe',
`Text` varchar(300) DEFAULT NULL,
`ShortText` varchar(100) NOT NULL,
`DbVersion` timestamp NOT NULL DEFAULT current_timestamp(),
`Status` int(11) NOT NULL DEFAULT 1
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
As an example, here is a table that refers to localtexts:
CREATE TABLE `composes` (
`Status` int(11) NOT NULL DEFAULT 1,
`Id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The table above has a foreign key Id referencing localtexts.Id. When I need to get the word in English, I do:
SELECT localtexts.text,
composes.status
FROM composes
LEFT JOIN localtexts ON composes.Id = localtexts.Id
WHERE localtexts.Lang = 'en';
I'm concerned about the performance of this design when there are a lot of tables that need to be joined with localtexts.
You might find that adding the following index to the localtexts table would speed up the query:
CREATE INDEX idx ON localtexts (Lang, id, text);
This index covers the WHERE clause, join, and SELECT.
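One way to check that claim is to look at the query plan. The sketch below is a stand-in under stated assumptions: it uses SQLite through Python's sqlite3 instead of MySQL (in MySQL you would run EXPLAIN and look for "Using index" in the Extra column), with simplified column types and made-up rows. With the index in place, the plan should show localtexts being read through idx as a covering index, meaning the table rows themselves are not touched.

```python
import sqlite3

# SQLite stands in for MySQL; column types are simplified and the rows are
# hypothetical. The index mirrors the one suggested above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE localtexts (
    Id        INTEGER NOT NULL,
    Lang      TEXT    NOT NULL DEFAULT 'pe',
    Text      TEXT,
    ShortText TEXT    NOT NULL DEFAULT ''
);
CREATE TABLE composes (Status INTEGER NOT NULL DEFAULT 1, Id INTEGER NOT NULL);
CREATE INDEX idx ON localtexts (Lang, Id, Text);
INSERT INTO localtexts (Id, Lang, Text)
VALUES (1, 'en', 'apple'), (1, 'pe', 'sib'), (2, 'en', 'pear');
INSERT INTO composes (Status, Id) VALUES (1, 1), (1, 2);
""")

QUERY = """
SELECT localtexts.Text, composes.Status
FROM composes
LEFT JOIN localtexts ON composes.Id = localtexts.Id
WHERE localtexts.Lang = 'en'
"""

# Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep the text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
print(plan)

rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
```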

Query large table with 50 million rows

I'm trying to query a large table (senddb.order_histories) that has close to 50M rows, and this is the MySQL query I am using:
FIRST APPROACH- inner join:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join (
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b
on a.id = b.order_line_item_id
and a.fulfillment_status = '2';
SECOND APPROACH- nested select:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
where a.fulfillment_status = '2'
and a.id in (
select b.order_line_item_id from(
select order_line_item_id,
order_number,
order_status,
order_status_description,
action,
modified_by,
created_at,
max(updated_at) as updated_at
from senddb.order_histories
where
order_status in ('x','y','z')
and fulfillment_location = 'abcd'
group by order_line_item_id) as b);
I believe a nested select is a bad approach on large data, but I added it here anyway because it worked on my sample set. In any case, both queries eventually time out after 600 seconds with the message: Error Code: 2013. Lost connection to MySQL server during query.
I would like to know if there are any ways to alter the query to make it run faster. I have already tried reducing the columns in the inner select / inner join, but that should not really be an issue IMO. I also looked up a solution that says "create a clustered index", but I wasn't really able to follow it. Any help is appreciated.
TABLE order_histories:
CREATE TABLE `order_histories` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`order_status_description` varchar(255) DEFAULT NULL,
`datetime_stamp` datetime DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`fulfillment_location` int(8) DEFAULT NULL,
`order_status` int(8) DEFAULT NULL,
`user_id` int(8) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`order_line_item_id` int(11) DEFAULT NULL,
`pooled` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `order_histories_ecash_idx` (`order_number`),
KEY `order_line_item_id` (`order_line_item_id`)
) ENGINE=InnoDB AUTO_INCREMENT=454738178 DEFAULT CHARSET=latin1;
TABLE order_line_items:
CREATE TABLE `order_line_items` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`order_number` varchar(24) DEFAULT NULL,
`sku_id` int(8) DEFAULT NULL,
`original_price` float DEFAULT NULL,
`dept_description` varchar(100) DEFAULT NULL,
`description` varchar(100) DEFAULT NULL,
`quantity_ordered` int(8) DEFAULT NULL,
`gift_indicator` char(1) DEFAULT NULL,
`gift_wrap_flag` char(1) DEFAULT NULL,
`shipping_record_flag` char(1) DEFAULT NULL,
`gift_comments` varchar(100) DEFAULT NULL,
`item_status` char(1) DEFAULT NULL,
`tax_amount` float DEFAULT NULL,
`tax_rate` float DEFAULT NULL,
`upc` varchar(20) DEFAULT NULL,
`final_price` float DEFAULT NULL,
`line_number` int(8) DEFAULT NULL,
`master_line_number` int(8) DEFAULT NULL,
`gift_wrap_flag_type` char(1) DEFAULT NULL,
`color_code` varchar(4) DEFAULT NULL,
`size_id` varchar(6) DEFAULT NULL,
`width_id` varchar(6) DEFAULT NULL,
`brand` varchar(15) DEFAULT NULL,
`vpn` varchar(30) DEFAULT NULL,
`dept_number` int(8) DEFAULT NULL,
`class_number` int(8) DEFAULT NULL,
`non_merch_item` char(1) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`modified_by` varchar(32) DEFAULT NULL,
`chain_id` int(11) DEFAULT NULL,
`fulfillment_location` int(11) DEFAULT NULL,
`fulfillment_date` datetime DEFAULT NULL,
`fulfillment_status` int(11) DEFAULT NULL,
`fulfillment_sales_associate` int(11) DEFAULT NULL,
`gift_wrap_line_number` int(11) DEFAULT NULL,
`shipping_type` int(11) DEFAULT NULL,
`order_track_info_id` int(11) DEFAULT NULL,
`store_tlog_updated` varchar(1) DEFAULT NULL,
`shipping_tlx_code` int(11) DEFAULT NULL,
`store_closed` tinyint(1) DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`deal_based_index` int(11) DEFAULT NULL,
`tlog_calc_ret_price` float DEFAULT NULL,
`tlog_amount` float DEFAULT NULL,
`tlog_retail_price` float DEFAULT NULL,
`tlog_ext_amount` float DEFAULT NULL,
`tlog_flag_1` int(11) DEFAULT NULL,
`tlog_flag_2` int(11) DEFAULT NULL,
`tlog_flag_3` int(11) DEFAULT NULL,
`time_remaining` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_line_items_ecash_idx` (`order_number`),
KEY `order_line_item_fulfillment_location_idx` (`fulfillment_location`),
KEY `order_line_item_fulfillment_status_idx` (`fulfillment_status`),
KEY `upc_idx` (`upc`),
KEY `sku_id_idx` (`sku_id`),
KEY `order_line_items_idx001` (`order_number`,`id`,`fulfillment_status`),
KEY `order_track_info_id` (`order_track_info_id`),
KEY `shipping_type_idx` (`shipping_type`,`non_merch_item`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=11367052 DEFAULT CHARSET=latin1;
This query can be simplified:
select a.id,
a.order_number,
a.sku_id,
a.fulfillment_status,
a.modified_by,
a.created_at,
a.updated_at
from senddb.order_line_items a
inner join senddb.order_histories b on a.id = b.order_line_item_id
where b.order_status in ('x','y','z')
and b.fulfillment_location = 'abcd'
and a.fulfillment_status = '2';
Since you're only selecting values from table a, you don't need to select specific values from table b; you can just apply your conditions. Beyond this, you need to ensure that b.order_line_item_id has an index on it. You can find more about creating indexes in the CREATE INDEX section of the MySQL reference manual. I'm not an expert in MySQL, but something similar to this should work if senddb.order_histories.order_line_item_id isn't already indexed. (The posted schema does already define KEY `order_line_item_id` on that column, so check that first.)
CREATE INDEX IX_order_histories_order_line_item_id ON order_histories (order_line_item_id);
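One caveat worth checking before adopting the simplified query: it returns a line item once per matching order_histories row, whereas the original GROUP BY version returned it once. If a line item can have several matching history rows, a semi-join with EXISTS (or SELECT DISTINCT) keeps one row per line item. The sketch below runs both forms against SQLite via Python's sqlite3 as a stand-in for MySQL; the trimmed tables are hypothetical, and the integer status and location values replace the question's ('x','y','z') and 'abcd' placeholders.

```python
import sqlite3

# Trimmed, hypothetical versions of the two tables. The integer status and
# location values stand in for the question's ('x','y','z') and 'abcd'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_line_items (
    id INTEGER PRIMARY KEY, order_number TEXT, fulfillment_status INTEGER
);
CREATE TABLE order_histories (
    id INTEGER PRIMARY KEY,
    order_line_item_id INTEGER,
    order_status INTEGER,
    fulfillment_location INTEGER
);
CREATE INDEX order_line_item_id ON order_histories (order_line_item_id);
INSERT INTO order_line_items VALUES (1, 'A100', 2), (2, 'A101', 2), (3, 'A102', 1);
INSERT INTO order_histories VALUES
    (10, 1, 5, 7), (11, 1, 6, 7), (12, 2, 9, 7), (13, 3, 5, 7);
""")

# Plain join: line item 1 has two matching history rows, so it appears twice.
JOIN_SQL = """
SELECT a.id FROM order_line_items a
INNER JOIN order_histories b ON a.id = b.order_line_item_id
WHERE b.order_status IN (5, 6) AND b.fulfillment_location = 7
  AND a.fulfillment_status = 2
ORDER BY a.id
"""

# Semi-join: EXISTS only asks whether a matching history row exists,
# so each line item is returned at most once.
EXISTS_SQL = """
SELECT a.id FROM order_line_items a
WHERE a.fulfillment_status = 2
  AND EXISTS (SELECT 1 FROM order_histories b
              WHERE b.order_line_item_id = a.id
                AND b.order_status IN (5, 6)
                AND b.fulfillment_location = 7)
ORDER BY a.id
"""

join_ids = [r[0] for r in conn.execute(JOIN_SQL)]
exists_ids = [r[0] for r in conn.execute(EXISTS_SQL)]
print(join_ids, exists_ids)
```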
You need to read up on the optimization section of the MySQL docs. It contains a lot of information on how you can optimize your queries and data sets. The main idea here is to add indexes to the columns that are used as criteria in the WHERE clause of your SQL statements.
Basically, both of your alternatives are using a "sub-SELECT," not an INNER JOIN.
The syntax of a true JOIN is one of the following:
SELECT ...
FROM X INNER JOIN Y USING (field_list)
... or ...
SELECT ...
FROM X INNER JOIN Y ON (X.field1 = Y.field2) ...
But in both cases the objects being joined are tables or views.
I'm going to presume ... admittedly, without checking ... that Nick Larsen's answer #1 adequately re-expresses your original query using JOINs.
(Notice how, in his answer, the shorthand identifiers A and B are introduced as referring to each of the two table-names mentioned in his query.)
Firstly, you need to decide whether a 50-million-row result set is what you are asking for. Big data tables are not there so that you can select all their rows. They are there so that you can ask them questions using SQL queries. SQL is a query language; it's not a data loading language.
What's your purpose? If you want to copy the data, you can do that by loading it in batches, for example 1,000 rows per query in a loop. If you are loading the data for processing, you can do that in the same way.
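A minimal sketch of such a batch-copy loop, assuming a hypothetical order_histories_copy target table and using SQLite (via Python's sqlite3) in place of MySQL. It uses keyset pagination (WHERE id > last_id ORDER BY id LIMIT n), which is a swapped-in choice rather than something stated above: unlike a growing LIMIT offset, n, where the server must step over all the skipped rows, each batch seeks directly to its starting point through the primary key.

```python
import sqlite3

# Hypothetical source and target tables; only two columns are copied.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_histories (id INTEGER PRIMARY KEY, order_number TEXT);
CREATE TABLE order_histories_copy (id INTEGER PRIMARY KEY, order_number TEXT);
""")
conn.executemany(
    "INSERT INTO order_histories (id, order_number) VALUES (?, ?)",
    [(i, f"ORD{i}") for i in range(1, 2501)],
)
conn.commit()


def copy_in_batches(conn, batch_size=1000):
    """Copy rows batch by batch, resuming after the last id seen."""
    last_id, copied, batches = 0, 0, 0
    while True:
        # Keyset pagination: seek past last_id via the primary key instead of
        # using a growing OFFSET.
        rows = conn.execute(
            "SELECT id, order_number FROM order_histories"
            " WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT INTO order_histories_copy (id, order_number) VALUES (?, ?)",
            rows,
        )
        conn.commit()  # keep each batch in its own short transaction
        copied += len(rows)
        batches += 1
        last_id = rows[-1][0]
    return copied, batches


result = copy_in_batches(conn)
print(result)
```

With 2,500 source rows and a batch size of 1,000, the loop copies three batches and then stops when the next query comes back empty.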
If you want to derive statistical information, you can use an outer join and return a small number of rows using aggregate functions. But you shouldn't do that either; what you "should" do is decide what you want from the table and, preferably, run aggregate functions that store the useful information in a different table (in MySQL, mostly INSERT ... SELECT or CREATE TABLE ... SELECT queries). You should never need to join a table of 50 million records in the first place.
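The "aggregate once, store the result" idea can be sketched as follows. The summary table name and its columns are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL; the INSERT ... SELECT statement itself is also valid MySQL syntax. A scheduled job could refresh such a table, and reports would then read a handful of summary rows instead of scanning the full history table.

```python
import sqlite3

# Hypothetical schema: a trimmed order_histories plus an illustrative
# summary table; neither the table name nor its columns come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_histories (
    id INTEGER PRIMARY KEY,
    order_line_item_id INTEGER,
    order_status INTEGER,
    fulfillment_location INTEGER
);
CREATE TABLE order_status_summary (
    fulfillment_location INTEGER,
    order_status INTEGER,
    history_rows INTEGER,
    line_items INTEGER,
    PRIMARY KEY (fulfillment_location, order_status)
);
INSERT INTO order_histories (order_line_item_id, order_status, fulfillment_location)
VALUES (1, 5, 7), (1, 6, 7), (2, 5, 7), (3, 5, 8);
""")

# Aggregate once and store the result; later reports read the small table.
conn.execute("""
INSERT INTO order_status_summary
    (fulfillment_location, order_status, history_rows, line_items)
SELECT fulfillment_location, order_status,
       COUNT(*), COUNT(DISTINCT order_line_item_id)
FROM order_histories
GROUP BY fulfillment_location, order_status
""")
conn.commit()

summary = conn.execute(
    "SELECT * FROM order_status_summary ORDER BY fulfillment_location, order_status"
).fetchall()
print(summary)
```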
Telling you how to do something wrong using indexes wouldn't be the right thing here.

MySQL Left join take too long for huge data

I have two tables: a property table and its related photos. One property may have many photos, but I only want one of its related photos (any one will do). When I use a LEFT JOIN, the MySQL query becomes too slow.
Here is my query:
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
CREATE TABLE IF NOT EXISTS `sp_property_images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`property_id` varchar(100) NOT NULL,
`filepath_name` text,
`label_name` varchar(45) DEFAULT NULL,
`primary` char(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `property_id` (`property_id`),
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=12941 ;
CREATE TABLE IF NOT EXISTS `sp_property` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`propertytype` varchar(50) NOT NULL,
`adv_type` varchar(45) NOT NULL,
`property_name` text,
`division` varchar(45) NOT NULL,
`township` varchar(45) NOT NULL,
`property_price` decimal(20,2) unsigned DEFAULT NULL,
`price_type` varchar(45) NOT NULL,
`availability` varchar(100) DEFAULT NULL,
`property_address` text,
`p_dimension_length` varchar(45) NOT NULL,
`p_dimension_width` varchar(45) NOT NULL,
`p_dimension_sqft` varchar(45) NOT NULL,
`p_dimension_acre` varchar(45) NOT NULL,
`floor` varchar(45) NOT NULL,
`phone` varchar(100) DEFAULT NULL,
`aircorn` varchar(45) NOT NULL,
`ownership` varchar(45) NOT NULL,
`bedroom` varchar(45) NOT NULL,
`bathroom` varchar(45) NOT NULL,
`special_feature` text,
`amentites` text,
`property_detail` text,
`submit_date` datetime DEFAULT NULL,
`published` varchar(45) NOT NULL,
`published_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`agent_id` varchar(45) NOT NULL,
`source` varchar(45) NOT NULL,
`contact_name` varchar(100) NOT NULL,
`contact_no` varchar(100) NOT NULL,
`contact_address` text NOT NULL,
`contact_email` varchar(100) NOT NULL,
`unit_type` varchar(100) DEFAULT NULL,
`map_lat` varchar(100) DEFAULT NULL,
`map_long` varchar(100) DEFAULT NULL,
`show_map` varchar(3) DEFAULT 'no',
`total_view` bigint(20) NOT NULL DEFAULT '0',
`feature_listing` varchar(10) NOT NULL DEFAULT 'no',
`new_homes_id` int(11) NOT NULL,
`publish_price` int(1) NOT NULL DEFAULT '0',
`usd` decimal(20,2) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=18524 ;
Have you added indices on your tables? You would need three indices on the following columns:
article_photo.a_id for grouping and joining
article_photo.p_id for sorting
article.a_id for joining (although this is hopefully already the PK of your table)
The result of joins is not guaranteed to be sorted in any order, so you probably want to move your ORDER BY clause from the subquery to the outer query:
SELECT * from `article`
LEFT JOIN (
SELECT * from `article_photo`
GROUP BY `a_id`) as images
ON article.a_id = images.a_id
ORDER BY images.p_id DESC
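Also, you have no guarantee which article_photo row you will get, since you select non-aggregated columns without an aggregate function (MySQL permits this only when the ONLY_FULL_GROUP_BY SQL mode is disabled, and many other databases reject such queries).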
Now that the question contains the real tables and all the information essential to answering, here's my take. First, here's your query:
SELECT `sp_property`.`id` as propertyid, `sp_property`.`property_name`, `sp_property`.`property_price`, `sp_property`.`adv_type`, `sp_property`.`usd`, `images`.`filepath_name`
FROM (`sp_property`)
LEFT JOIN (select id, Max(property_id) as pid,filepath_name
from sp_property_images
group by property_id) `images`
ON `images`.`pid` = `sp_property`.`id`
WHERE `sp_property`.`published` = 'yes'
GROUP BY `propertyid`
ORDER BY `sp_property`.`feature_listing` desc, `submit_date` desc
LIMIT 1,20
Let's see: you are joining sp_property_images.property_id with sp_property.id. These columns have different types (varchar vs. int), and I suspect this results in a severe performance penalty, since the values have to be converted to the same type before they can be compared.
Then, you are filtering by sp_property.published, so I suggest adding an index on this column as well. Also, examine if you really need to have this column as varchar. A bool/bit flag probably suffices as well (if it doesn't, an enum might be a better choice still).
Ordering benefits from an index too. Add an index spanning both columns sp_property.feature_listing and sp_property.submit_date.
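The effect of such an index on the sort can be seen in a query plan. This sketch uses SQLite through Python's sqlite3 as a stand-in for MySQL, and the index is a variation on the suggestion above (its name is hypothetical): it puts the equality column published first and the two ORDER BY columns after it, so one index can serve both the filter and the ordering. Before the index exists, the plan needs a temporary B-tree to sort; afterwards it should not. In MySQL the corresponding signal is "Using filesort" disappearing from the EXPLAIN output.

```python
import sqlite3

# Trimmed, hypothetical sp_property table (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sp_property (
    id INTEGER PRIMARY KEY,
    property_name TEXT,
    published TEXT NOT NULL,
    feature_listing TEXT NOT NULL DEFAULT 'no',
    submit_date TEXT
)
""")

QUERY = """
SELECT id FROM sp_property
WHERE published = 'yes'
ORDER BY feature_listing DESC, submit_date DESC
LIMIT 20
"""


def plan_details(conn, sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan_details(conn, QUERY)
# Equality column first, then the ORDER BY columns in order.
conn.execute(
    "CREATE INDEX idx_published_feature_date"
    " ON sp_property (published, feature_listing, submit_date)"
)
after = plan_details(conn, QUERY)
print(before)
print(after)
```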
If all of the above still doesn't help, you might have to remove the sub-select. It might prevent the MySQL engine from recognizing (and using!) the index you have defined on the sp_property_images.property_id column.
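If you do drop the GROUP BY derived table, one alternative (not from the question) is to pick one photo per property deterministically, for example the one with the lowest image id, using a correlated subquery that can use the index on sp_property_images.property_id. The sketch below runs it in SQLite via Python's sqlite3 as a stand-in for MySQL; the trimmed tables and rows are hypothetical, and property_id is declared as an integer to match sp_property.id. It uses LIMIT 20; note that LIMIT 1,20 in the question skips the first row.

```python
import sqlite3

# Trimmed, hypothetical tables. property_id is declared INTEGER here so
# it matches sp_property.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sp_property (
    id INTEGER PRIMARY KEY, property_name TEXT, published TEXT,
    feature_listing TEXT, submit_date TEXT
);
CREATE TABLE sp_property_images (
    id INTEGER PRIMARY KEY, property_id INTEGER NOT NULL, filepath_name TEXT
);
CREATE INDEX property_id ON sp_property_images (property_id);
INSERT INTO sp_property VALUES
    (1, 'Flat A', 'yes', 'yes', '2020-01-02'),
    (2, 'House B', 'yes', 'no', '2020-01-05'),
    (3, 'Plot C', 'no', 'no', '2020-01-03'),
    (4, 'Flat D', 'yes', 'no', '2020-01-01');
INSERT INTO sp_property_images VALUES
    (100, 1, 'a1.jpg'), (101, 1, 'a2.jpg'), (102, 2, 'b1.jpg');
""")

# For each property, the correlated subquery finds the lowest image id
# (via the property_id index); the LEFT JOIN keeps properties with no image.
QUERY = """
SELECT p.id AS propertyid, p.property_name, i.filepath_name
FROM sp_property p
LEFT JOIN sp_property_images i
       ON i.id = (SELECT MIN(i2.id) FROM sp_property_images i2
                  WHERE i2.property_id = p.id)
WHERE p.published = 'yes'
ORDER BY p.feature_listing DESC, p.submit_date DESC
LIMIT 20
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```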
This simple query will give you what you asked for, all articles with their photos:
SELECT ar.a_id, ar.a_title, ap.p_id, ap.photo_name
FROM article ar
JOIN article_photo ap on ar.a_id = ap.a_id
There's no reason for a LEFT JOIN and grouping there, or do you want to aggregate the photos per article (for example, count them)?

MySQL query issue with combining two tables

I have two tables:
CREATE TABLE `search_chat` (
`pubchatid` varchar(255) NOT NULL,
`profile` varchar(255) DEFAULT NULL,
`prefs` varchar(255) DEFAULT NULL,
`init` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`session` varchar(255) DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`uid` int(10) DEFAULT NULL,
PRIMARY KEY (`pubchatid`)
);
and
CREATE TABLE `chats` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`chatlog` varchar(255) DEFAULT NULL,
`block` varchar(2) DEFAULT '',
`whenadded` datetime DEFAULT NULL,
`pubchatid1` varchar(255) DEFAULT NULL,
`pubchatid2` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
);
So basically people chat with each other through a search system based on preferences. The further apart a profile is from the user's preferences, the worse the match. So the query I have is simple:
SELECT *
FROM search_chat
WHERE levenshtein(profile, "[user_prefs]") < 20
AND pubchatid <> "[user_pubchatid]"
ORDER BY
levenshtein(profile, "[user_prefs]")
LIMIT 1
It is a shitty query in itself, but it does the job (everything between "[]" is a variable I put in, just to make it clear).
As you can see, this query only compares one person's preferences (prefs) with what other people are like (profile). So far so good.
I have been fiddling around trying to make this query also check whether they have had previous chats. That is where "chats" comes in. I cannot get the query to check for a suitable user and see whether they already have an open chat.
In chats, "search_chat.pubchatid" can appear as either "chats.pubchatid1" or "chats.pubchatid2".
So somehow I have to combine these two, so that chats rules out candidates in search_chat.
Do you want something like this:
-- ... ( start of query as per your question )
and not exists (
select *
from chats
where ( ( chats.pubchatid1 = search_chat.pubchatid )
or ( chats.pubchatid2 = search_chat.pubchatid ) )
and -- ... add any restriction on how recent the chat was
)
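Here is a runnable sketch of the full query, using SQLite through Python's sqlite3 as a stand-in for MySQL. levenshtein() is not built into MySQL or SQLite, so the sketch registers a Python implementation under that name (in MySQL it would be a stored function you define yourself). It also assumes you only want to rule out candidates who have already chatted with the current user, in either column order, rather than anyone with any chat at all; the tables are trimmed and their contents hypothetical.

```python
import sqlite3


def levenshtein(a, b):
    """Classic edit distance (insert/delete/substitute, cost 1 each)."""
    if a is None or b is None:
        return None  # propagate SQL NULL
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call levenshtein(x, y).
conn.create_function("levenshtein", 2, levenshtein)
conn.executescript("""
CREATE TABLE search_chat (pubchatid TEXT PRIMARY KEY, profile TEXT, prefs TEXT);
CREATE TABLE chats (id INTEGER PRIMARY KEY, pubchatid1 TEXT, pubchatid2 TEXT);
INSERT INTO search_chat VALUES
    ('u1', 'abc', 'abd'), ('u2', 'abd', 'xyz'), ('u3', 'abx', 'xyz');
INSERT INTO chats (pubchatid1, pubchatid2) VALUES ('u2', 'u1');
""")
conn.execute("INSERT INTO search_chat VALUES (?, ?, ?)", ("u4", "z" * 24, "xyz"))

# Exclude anyone who already has a chat with :me, whichever column they are in.
MATCH_SQL = """
SELECT s.pubchatid
FROM search_chat AS s
WHERE levenshtein(s.profile, :prefs) < 20
  AND s.pubchatid <> :me
  AND NOT EXISTS (
      SELECT 1
      FROM chats AS c
      WHERE (c.pubchatid1 = s.pubchatid AND c.pubchatid2 = :me)
         OR (c.pubchatid1 = :me AND c.pubchatid2 = s.pubchatid)
  )
ORDER BY levenshtein(s.profile, :prefs)
LIMIT 1
"""


def find_match(conn, me, my_prefs):
    row = conn.execute(MATCH_SQL, {"me": me, "prefs": my_prefs}).fetchone()
    return row[0] if row else None


print(find_match(conn, "u1", "abd"))
```

Here u2 has the closest profile but already has a chat with u1, so u3 is returned instead; u4 is too far away to qualify.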

Avoid UNION for two almost identical tables in MySQL

I'm not very good at MySQL, and I'm going to write a query to count the messages sent by a user, based on their type and the is_auto field.
Messages can be of type "small text message" or "newsletter". I created two entities with a few fields that differ between them. The important one is messages_count, which is absent from the newsletter table and is used in the query:
CREATE TABLE IF NOT EXISTS `small_text_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
And:
CREATE TABLE `newsletter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` varchar(78) DEFAULT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
I ended up with a UNION query. Can this query be shortened or optimized, given that the only difference is messages_count, which should always be 1 for a newsletter?
SELECT
CONCAT('sms_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(messages_count * (customers_count + recipients_count)) AS count
FROM small_text_message WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
UNION
SELECT
CONCAT('newsletter_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(customers_count + recipients_count) AS count
FROM newsletter WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
I don't see any easy way to avoid a UNION (or UNION ALL) operation and still return the specified result set.
I would recommend you use a UNION ALL operator in place of the UNION operator. Then the execution plan will not include the step that eliminates duplicate rows. (You already have GROUP BY operations on each query, and there is no way that those two queries can produce an identical row.)
Otherwise, your query looks fine just as it is written.
(It's always a good thing to consider the question, might there be a better way? To get the result set you are asking for, from the schema you have, your query looks about as good as it's going to get.)
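If the goal is mainly a shorter statement, one variation is to UNION ALL just the raw rows, with a literal 1 standing in for messages_count on the newsletter side, and aggregate once. The sketch below runs it in SQLite via Python's sqlite3 as a stand-in for MySQL (so it uses || and CASE instead of CONCAT and IF), with trimmed tables and made-up rows. It is not necessarily faster: depending on the MySQL version, the derived table may be materialized before the WHERE clause is applied, so compare EXPLAIN output before switching.

```python
import sqlite3

# Trimmed versions of the two tables with made-up rows for user 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE small_text_message (
    id INTEGER PRIMARY KEY, messages_count INTEGER NOT NULL,
    status TEXT NOT NULL, recipients_count INTEGER NOT NULL,
    customers_count INTEGER NOT NULL, is_auto INTEGER NOT NULL,
    user_id INTEGER NOT NULL
);
CREATE TABLE newsletter (
    id INTEGER PRIMARY KEY, status TEXT NOT NULL,
    recipients_count INTEGER NOT NULL, customers_count INTEGER NOT NULL,
    is_auto INTEGER NOT NULL, user_id INTEGER NOT NULL
);
INSERT INTO small_text_message VALUES
    (1, 2, 'sent', 3, 1, 0, 1), (2, 1, 'sent', 5, 0, 1, 1),
    (3, 3, 'pending', 10, 10, 0, 1);
INSERT INTO newsletter VALUES (1, 'sent', 100, 20, 0, 1), (2, 'sent', 7, 3, 0, 1);
""")

COUNTS_SQL = """
SELECT kind || '_' || CASE WHEN is_auto = 0 THEN 'user' ELSE 'auto' END AS subtype,
       SUM(messages_count * (customers_count + recipients_count)) AS count
FROM (
    SELECT 'sms' AS kind, messages_count, customers_count, recipients_count,
           is_auto, status, user_id
    FROM small_text_message
    UNION ALL
    -- newsletters always count as one message each
    SELECT 'newsletter', 1, customers_count, recipients_count,
           is_auto, status, user_id
    FROM newsletter
) AS m
WHERE status <> 'pending' AND user_id = ?
GROUP BY kind, is_auto
ORDER BY subtype
"""

rows = conn.execute(COUNTS_SQL, (1,)).fetchall()
print(rows)
```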
If you are looking for more general DB advice, I recommend restructuring the tables to factor the common elements into one table, perhaps called outbound_communication or something similar, with all of your common fields, and then having "sub tables" for the specific types to hold the fields that are unique to each type. It does mean a simple JOIN is necessary to select all of the fields you want, but then again, it's normalized and actually makes situations like this one easier (one table holds all of the entities of interest). Additionally, you have the option of writing that JOIN just once as a view, and then your existing code would not even need to change: it would see the two tables as if nothing had changed.
CREATE TABLE IF NOT EXISTS `outbound_communication` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `small_text_message` (
`outbound_communication_id` int(11) NOT NULL,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (`outbound_communication_id`)
REFERENCES `outbound_communication` (`id`)
) ENGINE=InnoDB;
CREATE TABLE `newsletter` (
`outbound_communication_id` int(11) NOT NULL,
`subject` varchar(78) DEFAULT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (`outbound_communication_id`)
REFERENCES `outbound_communication` (`id`)
) ENGINE=InnoDB;
Then selecting a text msg is like this:
SELECT *
FROM outbound_communication AS parent
JOIN small_text_message
ON parent.id = small_text_message.outbound_communication_id
WHERE parent.id = 1234;
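With that design, the original report no longer needs a UNION. The sketch below assumes every outbound_communication row has exactly one child row, in either small_text_message or newsletter, so a missing small_text_message row can be read as "newsletter"; newsletters then count as one message via COALESCE. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, with trimmed tables and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outbound_communication (
    id INTEGER PRIMARY KEY, status TEXT NOT NULL,
    recipients_count INTEGER NOT NULL, customers_count INTEGER NOT NULL,
    is_auto INTEGER NOT NULL, user_id INTEGER NOT NULL
);
CREATE TABLE small_text_message (
    outbound_communication_id INTEGER PRIMARY KEY
        REFERENCES outbound_communication (id),
    messages_count INTEGER NOT NULL
);
CREATE TABLE newsletter (
    outbound_communication_id INTEGER PRIMARY KEY
        REFERENCES outbound_communication (id),
    subject TEXT
);
INSERT INTO outbound_communication VALUES
    (1, 'sent', 3, 1, 0, 1), (2, 'sent', 5, 0, 1, 1), (3, 'pending', 10, 10, 0, 1),
    (4, 'sent', 100, 20, 0, 1), (5, 'sent', 7, 3, 0, 1);
INSERT INTO small_text_message VALUES (1, 2), (2, 1), (3, 3);
INSERT INTO newsletter VALUES (4, 'Spring'), (5, 'Summer');
""")

# Assumption: every parent row has exactly one child row, so "no sms row"
# means "newsletter", and a newsletter counts as one message.
COUNTS_SQL = """
SELECT CASE WHEN s.outbound_communication_id IS NULL
            THEN 'newsletter_' ELSE 'sms_' END
       || CASE WHEN o.is_auto = 0 THEN 'user' ELSE 'auto' END AS subtype,
       SUM(COALESCE(s.messages_count, 1)
           * (o.customers_count + o.recipients_count)) AS count
FROM outbound_communication AS o
LEFT JOIN small_text_message AS s ON s.outbound_communication_id = o.id
WHERE o.status <> 'pending' AND o.user_id = ?
GROUP BY subtype
ORDER BY subtype
"""

rows = conn.execute(COUNTS_SQL, (1,)).fetchall()
print(rows)
```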
The nature of the query is inherently the union of the data from the small text message and the newsletter tables, so the UNION query is the only realistic formulation. There's no join of relevance between the two tables, for example.
So, I think you're very much on the right lines with your query.
Why are you worried about a UNION?