Why does my index not work as expected in MySQL?

I really want to know why my index is not working.
I have two tables, post and post_log.
create table post
(
id int auto_increment
primary key,
comment int null,
is_used tinyint(1) default 1 not null,
is_deleted tinyint(1) default 0 not null
);
create table post_log
(
id int auto_increment
primary key,
post_id int not null,
created_at datetime not null,
user int null,
constraint post_log_post_id_fk
foreign key (post_id) references post (id)
);
create index post_log_created_at_index
on post_log (created_at);
When I run the query below, the created_at index works well.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.is_used is TRUE
AND p.is_deleted is FALSE;
When I run the query below, the index is not used and the post table does a full scan.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.is_used = 1
AND p.is_deleted = 0;
And the query below doesn't work either.
explain
SELECT *
FROM post p
INNER JOIN post_log pl ON p.id = pl.post_id
WHERE pl.created_at > DATE('2022-06-01')
AND pl.created_at < DATE('2022-06-08')
AND p.comment = 111;
What is the difference between 'tinyint = 1' and 'tinyint IS TRUE'?
And why does the first query behave as expected while the others don't?

When making the query plan, MySQL has to decide whether to first filter the post_log table using the index, or first filter the post table using the is_used and is_deleted columns.
= 1 tests for the specific value 1, while IS TRUE is true for any non-zero value. I guess it decides that when you're searching for specific values, it will be more efficient to filter the post table first because there will likely be fewer matches (since these columns aren't indexed, it doesn't know that 0 and 1 are the only values).
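You can see the semantic difference directly; MySQL returns 1 for a true comparison and 0 for a false one:
SELECT 2 = 1 AS equals_one, 2 IS TRUE AS is_true; -- 0, 1
SELECT 1 = 1 AS equals_one, 1 IS TRUE AS is_true; -- 1, 1
SELECT 0 = 1 AS equals_one, 0 IS TRUE AS is_true; -- 0, 0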

Related

MySQL. How to make a selection by multiple columns

I have a database with the following base structure.
create table objects
(
id int auto_increment primary key
);
create table object_attribute_values
(
id int auto_increment primary key,
object_id int not null,
attribute_id int not null,
value varchar(255) null
);
create table attributes
(
id int auto_increment primary key,
attribute varchar(20) null
);
And let's say the attributes table has 3 rows:
id | attribute
1 | color
2 | rating
3 | size
I need to select all objects that have color = 'black', rating IN (5, 10), and size = 10.
I understand how to get all black objects:
SELECT o.id
FROM objects o
INNER JOIN object_attribute_values oav ON oav.object_id = o.id
INNER JOIN attributes a ON a.id = oav.attribute_id
WHERE a.attribute = 'color' AND oav.value = 'black'
The result should be like this:
object_id | attributes
1 | color:black,rating:6,size:10
7 | color:black,rating:6,size:10
12 | color:black,rating:9,size:10
What you are dealing with is a key/value table. I don't like them much, because they make querying data more complex and don't guarantee consistency (data type, obligatory/optional values) the way normal columns do. But sometimes they are necessary.
Anyway, the typical way to query key/value tables is by aggregation:
SELECT
o.id as object_id,
GROUP_CONCAT(CONCAT(a.attribute, ':', oav.value) ORDER BY a.id SEPARATOR ';') AS attributes
FROM objects o
INNER JOIN object_attribute_values oav ON oav.object_id = o.id
INNER JOIN attributes a ON a.id = oav.attribute_id
GROUP BY o.id
HAVING SUM(a.attribute = 'color' AND oav.value = 'black') > 0;
The HAVING clause looks for all objects that have color = black. Others are dismissed. This works because, in MySQL, true = 1 and false = 0, so we can just add up the condition results.
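To require all three attributes from the question (color = 'black', rating IN (5, 10), size = 10), you can add one such aggregated condition per attribute. A sketch, untested against the original data:
SELECT
o.id as object_id,
GROUP_CONCAT(CONCAT(a.attribute, ':', oav.value) ORDER BY a.id SEPARATOR ';') AS attributes
FROM objects o
INNER JOIN object_attribute_values oav ON oav.object_id = o.id
INNER JOIN attributes a ON a.id = oav.attribute_id
GROUP BY o.id
HAVING SUM(a.attribute = 'color' AND oav.value = 'black') > 0
AND SUM(a.attribute = 'rating' AND oav.value IN ('5', '10')) > 0
AND SUM(a.attribute = 'size' AND oav.value = '10') > 0;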
Yes, you can do this. The knack you need is the concept that there are two ways of getting tables out of the table server. One way is:
FROM tableA A
The other way is a derived table:
FROM (SELECT col1 AS name1, col2 AS name2 FROM ...) B
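Applied to this question, that second form (one derived table per required attribute) might look like the following sketch, assuming the schema above; untested:
SELECT DISTINCT o.id
FROM objects o
INNER JOIN (SELECT oav.object_id FROM object_attribute_values oav
INNER JOIN attributes a ON a.id = oav.attribute_id
WHERE a.attribute = 'color' AND oav.value = 'black') col ON col.object_id = o.id
INNER JOIN (SELECT oav.object_id FROM object_attribute_values oav
INNER JOIN attributes a ON a.id = oav.attribute_id
WHERE a.attribute = 'rating' AND oav.value IN ('5', '10')) rat ON rat.object_id = o.id
INNER JOIN (SELECT oav.object_id FROM object_attribute_values oav
INNER JOIN attributes a ON a.id = oav.attribute_id
WHERE a.attribute = 'size' AND oav.value = '10') siz ON siz.object_id = o.id;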

How to get values through connected tables?

I have a question. I have two tables: the first one contains comments, and the second contains the comment id and the album id the comment was left for.
CREATE TABLE `review` (`id` VARCHAR(32) NOT NULL,
`user_id` VARCHAR(32) NOT NULL, `comment` MEDIUMTEXT NOT NULL,
PRIMARY KEY (`id`) )
CREATE TABLE `review_album` (`review_id` VARCHAR(32) NOT NULL,
`album_id` VARCHAR(32) NOT NULL, PRIMARY KEY (`review_id`,
`album_id`), INDEX `review_album_review_idx` (`review_id`) )
I tried this way:
SELECT * from review_album JOIN review WHERE album_id = '300001'
But I got each result two times.
How can I get comment text for a specific album_id?
The general syntax is:
SELECT column-names
FROM table-name1 JOIN table-name2
ON column-name1 = column-name2
WHERE condition
The general syntax with INNER is:
SELECT column-names
FROM table-name1 INNER JOIN table-name2
ON column-name1 = column-name2
WHERE condition
Note: The INNER keyword is optional: it is the default as well as the most commonly used JOIN operation.
Reference: https://www.dofactory.com/sql/join
Try with INNER JOIN:
SELECT *
FROM review_album
JOIN review ON review_album.review_id=review.id
WHERE album_id = '300001'
Reference
You have forgotten the ON condition. Every time you write a join you should specify the join condition, otherwise you get every possible combination of rows.
However, the solution:
SELECT *
FROM review_album RA
JOIN review R ON RA.review_id = R.id
WHERE album_id = '300001'
Here the documentation for join https://www.w3schools.com/sql/sql_join.asp
try using this :
SELECT *
FROM review_album ra
JOIN review r ON ra.review_id = r.id
WHERE album_id = '300001'

How to optimize this SELECT?

I have one-to-many tables Payment and PaymentFlows to keep track of payment workflows.
Different managers are interested in certain workflows only. So whenever a payment reaches a certain workflow, a list is provided to them.
For example,
Payment 1 - A) Apply
B) Checked
C) Approved by Manager
D) Approved by CFO
E) Cheque issued
Payment 2 - A) Apply
B) Checked
C) Approved by Manager
Payment 3 - A) Apply
B) Checked
C) Approved by Manager
Payment 4 - A) Apply
B) Checked
To show all payments at workflow C, what I did is:
class Payment < ActiveRecord::Base
def self.search_by_workflow(flow_code)
self.find_by_sql(["SELECT * FROM payments P INNER JOIN (
SELECT payment_id FROM (
SELECT * FROM (
SELECT * FROM payment_flows F
ORDER BY F.payment_flow_id DESC
) latest GROUP BY payment_id
) flows WHERE flows.code = ?
) IDs ON IDs.payment_id = P.payment_id ORDER BY P.payment_id DESC LIMIT 100;", flow_code])
end
end
so:
#payments = Payment.search_by_workflow('Approved by Manager')
returns: Payment 2 and 3
However, the performance is not very good (5 to 7 seconds for 15,000 payments and 55,000 workflows).
How can I improve the performance?
UPDATE (with table structures):
CREATE TABLE `payments` (
`payment_id` int(11) NOT NULL,
`payment_type_code` varchar(50) default 'PETTY_CASH',
`status` varchar(16) NOT NULL default '?',
PRIMARY KEY (`payment_id`),
KEY `status` (`status`),
KEY `payment_type_code` (`payment_type_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `payment_flows` (
`payment_flow_id` int(11) NOT NULL,
`payment_id` int(11) default NULL,
`code` varchar(64) default NULL,
`status` varchar(255) NOT NULL default 'new',
PRIMARY KEY (`payment_flow_id`),
KEY `payment_id` (`payment_id`),
KEY `code` (`code`),
KEY `status` (`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
UPDATE (with named_scope):
named_scope :by_workflows, lambda { |workflows| { :conditions => [ "EXISTS (
SELECT 'FLOW'
FROM payment_flows pf
WHERE pf.payment_id = payments.payment_id
AND pf.proc_code IN (:flows)
AND NOT EXISTS (
SELECT 'OTHER'
FROM payment_flows pfother
WHERE pfother.payment_id = pf.payment_id
AND pfother.payment_flow_id > pf.payment_flow_id
)
)", { :flows => workflows } ]}
}
for convenience, e.g.:
Payment.by_workflows(['Approved by Manager', 'Approved by CFO']).count
Try this:
SELECT * FROM payment p
WHERE EXISTS(
SELECT 'FLOW'
FROM payment_flows pf
WHERE pf.payment_id = p.payment_id
AND pf.code = flow_code
AND NOT EXISTS(
SELECT 'OTHER'
FROM payment_flows pf2
WHERE pf2.payment_id = pf.payment_id
AND pf2.payment_flow_id > pf.payment_flow_id
)
)
Pay attention: in the query, flow_code is a placeholder for the code you want to search for.
I've added a main EXISTS condition checking for the presence of flow_code, and a nested NOT EXISTS condition checking that the same payment has no later payment_flow_id.
Tell me whether the performance is better.
It looks like you are defining the "latest" payment_flows row for a given payment to be the row with the largest value of payment_flow_id.
For better performance, you could replace a couple of your indexes on payment_flows
by ADDING these indexes:
... ON payment_flows (code, payment_id, payment_flow_id)
... ON payment_flows (payment_id, payment_flow_id)
and DROPPING these (now redundant) indexes:
... ON payment_flows (code)
... ON payment_flows (payment_id)
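In MySQL that can be done in a single ALTER TABLE, reusing the key names from the table definition above (the new index names here are arbitrary):
ALTER TABLE payment_flows
ADD INDEX idx_code_payment_flow (code, payment_id, payment_flow_id),
ADD INDEX idx_payment_flow (payment_id, payment_flow_id),
DROP INDEX `code`,
DROP INDEX `payment_id`;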
I would suggest this query:
SELECT p.*
FROM payments p
JOIN ( SELECT c.payment_id
, MAX(c.payment_flow_id) AS flow_id
FROM payment_flows c
WHERE c.code = :flow_code /* <-- query parameter */
GROUP BY c.payment_id
ORDER BY c.code DESC, c.payment_id DESC
) d
ON d.payment_id = p.payment_id
LEFT
JOIN payment_flows n
ON n.payment_id = d.payment_id
AND n.payment_flow_id > d.flow_id
WHERE n.payment_id IS NULL
ORDER BY d.payment_id DESC
LIMIT 100
The inline view query "d" gets the payment_flow_id (if any) for the specified code (:flow_code), so it returns only the payments that are at least that far in the processing flow.
The query uses an anti-join pattern to exclude rows that have a payment_flow_id that is "later" than the one for the specified code.
The anti-join is an outer join, to return all rows from the left side along with matching rows from the right side, with a condition in the WHERE clause that excludes all rows that had a matching row. (Note the inequality comparison: only rows with a "later" payment_flow_id value would be a match.)
There's no guarantee that this will be faster.
But with the suggested index improvements, it should get you nice looking EXPLAIN output. (The EXPLAIN output gives you a pretty good handle on the access plan that will be used by the query.)

How to write correct sql with left join on some tables?

Good day.
The table structures, and the error when executing the query, are on SQLFiddle.
I have some sql queries:
First query:
SELECT
n.Type AS Type,
n.UserIdn AS UserIdn,
u.Username AS Username,
n.NewsIdn AS NewsIdn,
n.Header AS Header,
n.Text AS Text,
n.Tags AS Tags,
n.ImageLink AS ImageLink,
n.VideoLink AS VideoLink,
n.DateCreate AS DateCreate
FROM News n
LEFT JOIN Users u ON n.UserIdn = u.UserIdn
Second query:
SELECT
IFNULL(SUM(Type = 'up'),0) AS Uplikes,
IFNULL(SUM(Type = 'down'),0) AS Downlikes,
(IFNULL(SUM(Type = 'up'),0) - IFNULL(SUM(Type = 'down'),0)) AS SumLikes
FROM Likes
WHERE NewsIdn = NewsIdn -- only for example; in the main SQL, NewsIdn = the NewsIdn value from the News row
ORDER BY UpLikes DESC
And the third query:
SELECT
count(*) as Favorit
FROM Favorites
WHERE NewsIdn = NewsIdn -- only for example; in the main SQL, NewsIdn = the NewsIdn value from the News row
I would like to combine these queries: display all rows from the News table together with the number of Uplikes, Downlikes and Favorit for each NewsIdn value from News (i.e. for each row of News), ordered by Uplikes DESC.
Please tell me how to do this.
P.S.: as a result I would like values like these:
TYPE USERIDN USERNAME NEWSIDN HEADER TEXT TAGS IMAGELINK VIDEOLINK DATECREATE UPLIKES DOWNLIKES SUMLIKES FAVORIT
image 346412 test 260806 test 1388152519.jpg December, 27 2013 08:55:27+0000 2 0 2 2
image 108546 test2 905554 test2 1231231111111111111111111 123. 123 1388153493.jpg December, 27 2013 09:11:41+0000 1 0 1 0
text 108546 test2 270085 test3 123 .123 December, 27 2013 09:13:30+0000 1 0 1 0
image 108546 test2 764955 test4 1388192300.jpg December. 27 2013 19:58:22+0000 0 1 -1 0
First, your table structures use varchar(30) for all the "Idn" columns. It appears those are actually ID keys to the other tables and should be integers for better indexing and joining performance.
Second, this type of process, especially web-based, is a perfect example of where to DENORMALIZE the values for likes, dislikes, and favorites by having those columns as counters directly on the record (e.g. on the News table). When a person likes, dislikes or marks a favorite, stamp the counter right away and be done with it. For the first pass you can do a bulk SQL update, but also add triggers on the table to automatically keep the counts up to date. This way you just query the table directly and order by whatever you need, instead of having to aggregate all like/dislike records joined to all news to see which is best. Having an index on the News table will be your best bet.
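For instance, assuming you add a hypothetical up_likes counter column to News (the column name is illustrative), the one-time bulk update mentioned here could look like this:
ALTER TABLE News ADD COLUMN up_likes INT NOT NULL DEFAULT 0;
UPDATE News n
SET n.up_likes = (SELECT COUNT(*) FROM Likes l
WHERE l.NewsIdn = n.NewsIdn AND l.Type = 'up');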
Now, that said, with your existing table constructs you can do this via pre-aggregate queries, joining them as aliases in the SQL FROM clause... something like:
SELECT
N.Type,
N.UserIdn,
U.UserName,
N.NewsIdn,
N.Header,
N.Text,
N.Tags,
N.ImageLink,
N.VideoLink,
N.DateCreate,
COALESCE( SumL.UpLikes, 0 ) as Uplikes,
COALESCE( SumL.DownLikes, 0 ) as DownLikes,
COALESCE( SumL.NetLikes, 0 ) as NetLikes,
COALESCE( Fav.FavCount, 0 ) as FavCount
from
News N
JOIN Users U
ON N.UserIdn = U.UserIdn
LEFT JOIN ( select
L.NewsIdn,
SUM( L.Type = 'up' ) as UpLikes,
SUM( L.Type = 'down' ) as DownLikes,
SUM( ( L.Type = 'up' ) - ( L.Type = 'down' )) as NetLikes
from
Likes L
group by
L.NewsIdn ) SumL
ON N.NewsIdn = SumL.NewsIdn
LEFT JOIN ( select
F.NewsIdn,
COUNT(*) as FavCount
from
Favorites F
group by
F.NewsIdn ) Fav
ON N.NewsIdn = Fav.NewsIdn
order by
SumL.UpLikes DESC
Again, I do not understand why you would have an auto-increment numeric ID column for the News table and then ANOTHER value for it, NewsIdn, as a varchar. I would just have this and your other tables reference the News.ID column directly... why have two columns representing the same thing? And obviously, each table you run aggregates on (likes, favorites) should have indexes on any criteria you join or aggregate on (hence the NewsIdn column, UserIdn, etc.).
And a final reminder: this type of query is ALWAYS running aggregates against your ENTIRE TABLE of likes and favorites EVERY TIME, so I suggest going with denormalized columns to hold the counts, updated when someone selects them. You can always go back to the raw tables if you ever want to show or change a particular person's like/dislike/favorite status.
You'll have to look into reading on triggers, as each database has its own syntax for handling them.
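Continuing the earlier sketch, a MySQL trigger keeping that hypothetical up_likes counter current on inserts might look roughly like this (trigger and column names are illustrative):
DELIMITER //
CREATE TRIGGER likes_after_insert AFTER INSERT ON Likes
FOR EACH ROW
BEGIN
-- bump the denormalized counter whenever an 'up' like is recorded
IF NEW.Type = 'up' THEN
UPDATE News SET up_likes = up_likes + 1 WHERE NewsIdn = NEW.NewsIdn;
END IF;
END//
DELIMITER ;
You would add similar triggers (or branches) for downvotes, favorites, deletes and updates.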
As for table structures, this is a SIMPLIFIED version of what I would have (I removed many other columns from your SQLFiddle sample):
CREATE TABLE IF NOT EXISTS `News` (
id int(11) NOT NULL AUTO_INCREMENT,
UserID integer NOT NULL,
... other fields
`DateCreate` datetime NOT NULL,
PRIMARY KEY ( id ),
KEY ( UserID )
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=5 ;
extra key on the User ID in case you wanted all news activity created by a specific user.
CREATE TABLE IF NOT EXISTS `Users` (
id int(11) NOT NULL AUTO_INCREMENT,
other fields...
PRIMARY KEY ( id ),
KEY ( LastName, Name )
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=5 ;
additional key in case you want to do a search by a user's name
CREATE TABLE IF NOT EXISTS `Likes` (
id int(11) NOT NULL AUTO_INCREMENT,
UserId integer NOT NULL,
NewsID integer NOT NULL,
`Type` enum('up','down') NOT NULL,
`IsFavorite` enum('yes','no') NOT NULL,
`DateCreate` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY ( UserID ),
KEY ( NewsID, IsFavorite )
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=6 ;
additional keys here for joining and/or aggregates. I've also added a flag column for being a favorite. This could remove the need for a separate Favorites table, since it would hold the same basic content as the LIKES. So someone could LIKE/DISLIKE a given news item, and ALSO mark it as a FAVORITE that they want to be able to reference quickly.
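For instance, with the IsFavorite flag above, the favourites count can come straight from Likes; a sketch against this proposed table, not the original schema:
SELECT NewsID, COUNT(*) AS FavCount
FROM Likes
WHERE IsFavorite = 'yes'
GROUP BY NewsID;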
Now, how do these table structures get simplified for querying? Each table has its own "id" column, but any OTHER table uses the tableNameID (UserID, NewsID, LikesID or whatever), and that is what you join on.
select ...
from
News N
Join Users U
on N.UserID = U.ID
Join Likes L
on N.ID = L.NewsID
Integer columns are easier and more commonly identifiable by others when writing queries... Does this make a little more sense?
SELECT
n.Type AS Type,
n.UserIdn AS UserIdn,
u.Username AS Username,
n.NewsIdn AS NewsIdn,
n.Header AS Header,
n.Text AS Text,
n.Tags AS Tags,
n.ImageLink AS ImageLink,
n.VideoLink AS VideoLink,
n.DateCreate AS DateCreate,
IFNULL(SUM(Likes.Type = 'up'),0) AS Uplikes,
IFNULL(SUM(Likes.Type = 'down'),0) AS Downlikes,
(IFNULL(SUM(Likes.Type = 'up'),0) - IFNULL(SUM(Likes.Type = 'down'),0)) AS SumLikes,
COUNT(DISTINCT Favorites.id) as Favorit
FROM News n
LEFT JOIN Users u ON n.UserIdn = u.UserIdn
LEFT JOIN Likes ON Likes.NewsIdn = n.NewsIdn
LEFT JOIN Favorites ON n.NewsIdn=Favorites.NewsIdn
GROUP BY n.NewsIdn

Mysql query to check if all sub_items of a combo_item are active

I am trying to write a query that looks through all combo_items and only returns the ones where all sub_items that it references have Active=1.
I think I should be able to count how many sub_items there are in a combo_item total and then compare it to how many are Active, but I am failing pretty hard at figuring out how to do that...
My table definitions:
CREATE TABLE `combo_items` (
`c_id` int(11) NOT NULL,
`Label` varchar(20) NOT NULL,
PRIMARY KEY (`c_id`)
)
CREATE TABLE `sub_items` (
`s_id` int(11) NOT NULL,
`Label` varchar(20) NOT NULL,
`Active` int(1) NOT NULL,
PRIMARY KEY (`s_id`)
)
CREATE TABLE `combo_refs` (
`r_id` int(11) NOT NULL,
`c_id` int(11) NOT NULL,
`s_id` int(11) NOT NULL,
PRIMARY KEY (`r_id`)
)
So for each combo_item, there are at least 2 rows in the combo_refs table linking to the multiple sub_items. My brain is about to make bigbadaboom :(
I would just join the three tables as usual and then, per combo item, sum up the total number of sub-items and the number of active sub-items:
SELECT ci.c_id, ci.Label, SUM(1) AS total_sub_items, SUM(si.Active) AS active_sub_items
FROM combo_items AS ci
INNER JOIN combo_refs AS cr ON cr.c_id = ci.c_id
INNER JOIN sub_items AS si ON si.s_id = cr.s_id
GROUP BY ci.c_id
Of course, instead of using SUM(1) you could just say COUNT(ci.c_id), but I wanted an analog of SUM(si.Active).
The approach proposed assumes Active to be 1 (active) or 0 (not active).
To get only those combo-items whose sub-items are all active, compare the two counts, for example with a HAVING clause (filtering with WHERE si.Active = 1 alone would return combo-items that have at least one active sub-item). Depends on what you are looking for actually:
SELECT ci.c_id, ci.Label
FROM combo_items AS ci
INNER JOIN combo_refs AS cr ON cr.c_id = ci.c_id
INNER JOIN sub_items AS si ON si.s_id = cr.s_id
GROUP BY ci.c_id
HAVING SUM(si.Active) = COUNT(*)
By the way, INNER JOIN ensures that there is at least one sub-item per combo-item at all.
(I have not tested it.)
See this answer:
MySQL: Selecting foreign keys with fields matching all the same fields of another table
Select ...
From combo_items As C
Where Exists (
Select 1
From sub_items As S1
Join combo_refs As CR1
On CR1.s_id = S1.s_id
Where CR1.c_id = C.c_id
)
And Not Exists (
Select 1
From sub_items As S2
Join combo_refs As CR2
On CR2.s_id = S2.s_id
Where CR2.c_id = C.c_id
And S2.Active = 0
)
The first subquery ensures that at least one sub_item exists. The second ensures that none of the sub_items are inactive.