Refactor query to work with MySQL (unknown column error)

Schema:
CREATE TABLE IF NOT EXISTS `user` (
`id` BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
`deleted` TIMESTAMP NOT NULL,
`email` VARCHAR(254) NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS `userVersion` (
`userId` BIGINT UNSIGNED,
`effective` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`created` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`name` VARCHAR(100) NOT NULL,
PRIMARY KEY (`userId`, `effective`, `created`),
FOREIGN KEY (`userId`) REFERENCES `user`(`id`)
);
The query I'm trying to perform:
SELECT u.id
FROM `user` u
INNER JOIN userVersion uv
    ON u.id = uv.userId
    AND uv.effective = (
        SELECT MAX(uv1.effective)
        FROM userVersion uv1
        WHERE uv1.userId = u.id
            AND uv1.effective <= NOW())
    AND uv.created = (
        SELECT MAX(uv2.created)
        FROM userVersion uv2
        WHERE uv2.userId = u.id
            AND uv2.effective = uv1.effective
            AND uv2.created <= NOW())
I'm getting an unknown column error for uv1.effective (referenced right before the last line). I believe this query works in other databases (e.g. Oracle), but it doesn't seem to work with MySQL. How could I change this query to get the same behavior?
PS: The created column is supposed to represent when the row was inserted in the database while effective is supposed to represent when that row should start being used (this allows me to add changes in the present that will work in the future).
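The uv1 alias only exists inside the first subquery, so the second subquery cannot reference uv1.effective; that is what MySQL is complaining about. One way to rewrite it (a sketch, not tested against your data) is to correlate the second subquery with the joined row uv instead, which is in scope for both subqueries:
SELECT u.id
FROM `user` u
INNER JOIN userVersion uv
    ON u.id = uv.userId
    AND uv.effective = (
        SELECT MAX(uv1.effective)
        FROM userVersion uv1
        WHERE uv1.userId = u.id
            AND uv1.effective <= NOW())
    AND uv.created = (
        SELECT MAX(uv2.created)
        FROM userVersion uv2
        WHERE uv2.userId = u.id
            AND uv2.effective = uv.effective -- refer to the joined row, not uv1
            AND uv2.created <= NOW())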

Related

Use Indexes For Join on Indexed DATETIME and Indexed DATE columns

EDIT
I misread my initial error and blamed the unused INDEX on the wrong columns.
I was able to recreate the issue that I saw, and the solution that ysth suggested worked.
Below are the CREATE TABLE statements, inserts into the tables, and two queries - one that has the error and one with the fix that does not.
# Make tables and indices
DROP TABLE IF EXISTS a;
DROP TABLE IF EXISTS b;
create table a
(
DT DATE,
USER INT,
COMMENT_SENTIMENT INT,
PRIMARY KEY (USER, DT));
CREATE INDEX a_DT_USER_IDX ON a (DT,USER);
create table b
(
id int auto_increment primary key,
DT DATETIME(6),
USER mediumtext,
COMMENT_SENTIMENT INT);
CREATE INDEX b_DT_USER_IDX ON b (DT);
CREATE UNIQUE INDEX b_DT_USER ON b (USER(16), DT);
# Insert some dummy data
INSERT INTO a VALUES('2023-01-01', 5, 4);
INSERT INTO b VALUES(NULL, '2023-01-01 00:00:00', 5, 4);
# Explain that shows the issue I was seeing.
EXPLAIN
SELECT *
FROM a
JOIN b
ON a.DT = b.DT
AND a.USER = b.USER;
# Out
# 1,SIMPLE,a,,ALL,"PRIMARY,a_DT_USER_IDX",,,,1,100,
# 1,SIMPLE,b,,ref,"b_DT_USER,b_DT_USER_IDX",b_DT_USER_IDX,9,a.DT,1,100,Using index condition; Using where
[2023-01-24 18:00:14] [HY000][1739] Cannot use ref access on index 'b_DT_USER' due to type or collation conversion on field 'USER'
[2023-01-24 18:00:14] [HY000][1003] /* select#1 */ select `a`.`DT` AS `DT`, `a`.`USER` AS `USER`,`a`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT`,`b`.`id` AS `id`,`b`.`DT` AS `DT`,`b`.`USER` AS `USER`,`b`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT` from `a` join `b` where ((`a`.`DT` = `b`.`DT`) and (`a`.`USER` = `b`.`USER`))
# Explain with the fix ysth suggested
EXPLAIN
SELECT *
FROM a
JOIN b
ON a.DT = b.DT
AND a.USER = CAST(b.USER AS DECIMAL );
# 1,SIMPLE,a,,ALL,"PRIMARY,a_DT_USER_IDX",,,,1,100,
# 1,SIMPLE,b,,ref,b_DT_USER_IDX,b_DT_USER_IDX,9,a.DT,1,100,Using index condition; Using where
# [2023-01-24 18:04:24] [HY000][1003] /* select#1 */ select `a`.`DT` AS `DT`,`a`.`USER` AS `USER`,`a`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT`,`b`.`id` AS `id`,`b`.`DT` AS `DT`,`b`.`USER` AS `USER`,`b`.`COMMENT_SENTIMENT` AS `COMMENT_SENTIMENT` from `a` join `b` where ((`a`.`DT` = `b`.`DT`) and (`a`.`USER` = cast(`b`.`USER` as decimal(10,0))))
# [2023-01-24 18:04:24] 2 rows retrieved starting from 1 in 359 ms (execution: 250 ms, fetching: 109 ms)
__
The information below is incorrect. Please see the edit above for the issue I was having and its solution.
I have three tables a, b, and c in my MySQL 5.7 database. SHOW CREATE statements for each table are:
CREATE TABLE `a` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` date DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `b` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` datetime DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `b_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `c` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` date DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `b_DT_USER_IDX` (`DT`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Table a has a DATE column a.DT, table b has a DATETIME column b.DT, and table c has a DATE column c.DT.
All of these DT columns are indexed.
As a caveat, while b.DT is a DATETIME, all of the 'time' portions in it are 00:00:00 and they always will be. It probably should be a DATE, but I cannot change it.
I want to join table a and table b on their DT columns, but explain tells me that their indices are not used:
Cannot use ref access on index 'b.DT_datetime_index' due to type or collation conversion on field 'DT'
When I join table a and b on a.DT and b.DT
SELECT *
FROM a
JOIN b
ON a.DT = b.DT;
The result is much slower than when I do the same with a and c
SELECT *
FROM a
JOIN c
ON a.DT = c.DT;
Is there a way to use the indices in join from the first query on a.DT = b.DT, specifically without altering the tables? I'm not sure if b.DT having only 00:00:00 for the time portion could be relevant in a solution.
The end goal is a faster select using this join.
Thank you!
-- What I've done section --
I compared the joins between a.DT = b.DT and a.DT = c.DT, and saw the time difference.
I also tried wrapping b's DT column with DATE(b.DT), but explain gave the same issue, which is pretty expected.
MySQL won't use an index to join DATE and DATETIME columns.
You can create a virtual column with the corresponding DATE and use that.
CREATE TABLE `b` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`DT` datetime DEFAULT NULL,
`USER` int(11) DEFAULT NULL,
`COMMENT_SENTIMENT` int(11) DEFAULT NULL,
`DT_DATE` DATE AS (DATE(DT)),
PRIMARY KEY (`id`),
KEY `b_DT_USER_IDX` (`DT_DATE`,`USER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SELECT *
FROM a
JOIN b
ON a.DT = b.DT_DATE;
Assuming you want to read a and join b rows, you can just do
SELECT *
FROM a
JOIN b
ON b.DT = timestamp(a.DT);
If the other way around, then
SELECT *
FROM b
JOIN a
ON a.DT = date(b.DT);
No need for a virtual column.
Virtually any function call is "not sargable". That is, CAST(b.USER AS DECIMAL) prevents the use of an index.
Do not mix strings and ints in comparisons. The string will be converted to numeric. If the string is a literal, such as '123', then the Optimizer is smart enough to do that once. If it is a column name, it must check every row.
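As a small illustration against the b table from the edit above (where USER is MEDIUMTEXT), comparing the column to a string literal can still use the (USER(16), DT) prefix index, while comparing it to a number forces a conversion of USER on every row:
-- can use the b_DT_USER prefix index: the literal is compared as a string
SELECT * FROM b WHERE b.USER = '5';
-- cannot use it: USER must be converted to a number for every row
SELECT * FROM b WHERE b.USER = 5;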
Tip: If you are likely to test for one user and a range of dates, then this works better than the opposite order.
INDEX(user, dt)
(You may need an index starting with dt for other queries.)
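Against the original b table from the question below (where USER is an int), that tip would look something like this sketch (the index name is made up):
ALTER TABLE b ADD INDEX b_USER_DT_IDX (USER, DT); -- hypothetical name; complements the existing (DT, USER) index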

Any way to do this query faster with big data

This query takes around 2.23 seconds and feels a bit slow ... is there any way to make it faster?
Our member.id, member_id, membership_id, valid_to, and valid_from columns have indexes as well.
select *
from member
where (member.id in ( select member_id from member_membership mm
INNER JOIN membership m ON mm.membership_id = m.id
where instr(organization_chain, 2513) and m.valid_to > NOW() and m.valid_from < NOW() ) )
order by id desc
limit 10 offset 0
EXPLANATION OF WHAT THE QUERY DOES: every member has many member_memberships, and member_membership connects to another table called membership, where we have the membership details. So the query should get all members that have a valid membership and where the organization id 2513 exists in member_membership.
Tables as following:
CREATE TABLE `member` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) DEFAULT NULL,
`last_name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
CREATE TABLE `member_membership` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`membership_id` int(11) DEFAULT NULL,
`member_id` int(11) DEFAULT NULL,
`organization_chain` text DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `member_membership_to_membership` (`membership_id`),
KEY `member_membership_to_member` (`member_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
CREATE TABLE `membership` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`valid_to` datetime DEFAULT NULL,
`valid_from` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `valid_to` (`valid_to`),
KEY `valid_from` (`valid_from`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
ALTER TABLE `member_membership` ADD CONSTRAINT `member_membership_to_membership` FOREIGN KEY (`membership_id`) REFERENCES `membership` (`id`);
ALTER TABLE `member_membership` ADD CONSTRAINT `member_membership_to_member` FOREIGN KEY (`member_id`) REFERENCES `member` (`id`);
Here is the EXPLAIN output => https://i.ibb.co/xjrcYWR/EXPLAIN.png
Relations
member has many member_membership
membership has many member_membership
So member_membership is like a join table for member and membership.
Well, I found a way to get it down to around 800 ms ... like this. Is this a good way, or is there more we can do?
select *
from member
where (member.id in ( select member_id from member_membership mm FORCE INDEX (PRIMARY)
INNER JOIN membership m ON mm.membership_id = m.id
where instr(organization_chain, 2513) and m.valid_to > NOW() and m.valid_from < NOW() ) )
order by id desc
limit 10 offset 0
NEW UPDATE... and I think this solves the issue... 15 ms :)
I added FORCE INDEX.
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
select *
from member
where (member.id in ( select member_id from member_membership mm FORCE INDEX (member_membership_to_member)
INNER JOIN membership m FORCE INDEX (organization_to_membership) ON mm.membership_id = m.id
where instr(organization_chain, 2513) and m.valid_to > NOW() and m.valid_from < NOW() ) )
order by id desc
limit 10 offset 0
How big is organization_chain? If you don't need TEXT, use a reasonably sized VARCHAR so that it could be in an index. Better yet, is there some way to get 2513 in a column by itself?
Don't use id int(11) NOT NULL AUTO_INCREMENT in a many-to-many table; instead, make the two foreign key columns the PRIMARY KEY.
Put the ORDER BY and LIMIT in the subquery.
Don't use IN ( SELECT ...), use a JOIN.
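For example, a sketch of the JOIN form from the last point (it may need SELECT DISTINCT, since a member with several matching memberships would otherwise be returned more than once):
SELECT DISTINCT m.*
FROM member m
INNER JOIN member_membership mm ON mm.member_id = m.id
INNER JOIN membership ms ON ms.id = mm.membership_id
WHERE INSTR(mm.organization_chain, 2513)
  AND ms.valid_to > NOW()
  AND ms.valid_from < NOW()
ORDER BY m.id DESC
LIMIT 10 OFFSET 0;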

Mysql: Get latest row by two dates

Maybe someone could give me advice on how to achieve my goal.
I'm using MySQL
I have a table with historical data
CREATE TABLE `history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`parent_id` int(11) DEFAULT NULL,
`from_dt` date NOT NULL,
`date_create` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`approved` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `parent_id` (`parent_id`)
)
Is there an easier way to get a dataset with the latest record for each user (user_id) in this table, where from_dt is less than NOW()?
from_dt could contain any date, so there might be records in the future and in the past.
What I got for now:
SELECT * FROM `history` right join (SELECT
history.user_id, MAX(date_create)
FROM
history
RIGHT JOIN
(SELECT
user_id, MAX(from_dt) max_from
FROM
history
WHERE
from_dt < NOW()
GROUP BY user_id , from_dt) AS hf ON hf.max_from = history.from_dt
AND hf.user_id = history.user_id
GROUP BY user_id) as hdt on hdt.user_id = history.user_id
But joining the table to itself three times looks a little messy to me, because I also have to join additional data here (like user info, etc.).
Many thanks,
Max
You can simply try this -
SELECT H1.*
FROM `history` H1
INNER JOIN (SELECT user_id, MAX(from_dt) max_from
            FROM history
            WHERE from_dt < NOW()
            GROUP BY user_id) users
ON H1.user_id = users.user_id AND H1.from_dt = users.max_from
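If you are on MySQL 8.0 or later (an assumption, since no version is given), window functions avoid the self-joins entirely and handle both dates: pick, per user, the row with the latest from_dt before NOW(), breaking ties on the latest date_create. A sketch:
SELECT *
FROM (
    SELECT h.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY from_dt DESC, date_create DESC
           ) AS rn
    FROM history h
    WHERE from_dt < NOW()
) ranked
WHERE rn = 1;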

mySql subtract row of different table

I want to subtract values between rows of two different tables:
I have created a view called leave_taken and a table called leave_balance.
I want this result from both tables:
leave_taken.COUNT(*) - leave_balance.balance
and group by leave_type_id_leave_type
Code for both tables:
-----------------View Leave_Taken-----------
CREATE ALGORITHM = UNDEFINED DEFINER=`1`@`localhost` SQL SECURITY DEFINER
VIEW `leave_taken`
AS
select
`leave`.`staff_leave_application_staff_id_staff` AS `staff_leave_application_staff_id_staff`,
`leave`.`leave_type_id_leave_type` AS `leave_type_id_leave_type`,
count(0) AS `COUNT(*)`
from
(
`leave`
join `staff` on((`staff`.`id_staff` = `leave`.`staff_leave_application_staff_id_staff`))
)
where (`leave`.`active` = 1)
group by `leave`.`leave_type_id_leave_type`;
----------------Table leave_balance----------
CREATE TABLE IF NOT EXISTS `leave_balance` (
`id_leave_balance` int(11) NOT NULL AUTO_INCREMENT,
`staff_id_staff` int(11) NOT NULL,
`leave_type_id_leave_type` int(11) NOT NULL,
`balance` int(3) NOT NULL,
`date_added` date NOT NULL,
PRIMARY KEY (`id_leave_balance`),
UNIQUE KEY `id_leave_balance_UNIQUE` (`id_leave_balance`),
KEY `fk_leave_balance_staff1` (`staff_id_staff`),
KEY `fk_leave_balance_leave_type1` (`leave_type_id_leave_type`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
------- Table leave ----------
CREATE TABLE IF NOT EXISTS `leave` (
`id_leave` int(11) NOT NULL AUTO_INCREMENT,
`staff_leave_application_id_staff_leave_application` int(11) NOT NULL,
`staff_leave_application_staff_id_staff` int(11) NOT NULL,
`leave_type_id_leave_type` int(11) NOT NULL,
`date` date NOT NULL,
`active` int(11) NOT NULL DEFAULT '1',
`date_updated` date NOT NULL,
PRIMARY KEY (`id_leave`,`staff_leave_application_id_staff_leave_application`,`staff_leave_application_staff_id_staff`),
KEY `fk_table1_leave_type1` (`leave_type_id_leave_type`),
KEY `fk_table1_staff_leave_application1` (`staff_leave_application_id_staff_leave_application`,`staff_leave_application_staff_id_staff`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=32 ;
Well, I still don't think you've provided enough information. It would be very helpful to have some sample data and your expected output (in tabular format). That said, I may have something you can start working with. This query finds all staff members, calculates their current leave (grouped by type), and determines the difference between that and their balance by leave type. Take a look at it, and more importantly (perhaps) the sqlfiddle here that I used, which has the sample data in it (very important for determining if this is the correct path for your data).
SELECT
staff.id_staff,
staff.name,
COUNT(`leave`.id_leave) AS leave_count,
leave_balance.balance,
(COUNT(`leave`.id_leave) - leave_balance.balance) AS leave_difference,
`leave`.leave_type_id_leave_type AS leave_type
FROM
staff
JOIN `leave` ON staff.id_staff = `leave`.staff_leave_application_staff_id_staff
JOIN leave_balance ON
(
staff.id_staff = leave_balance.staff_id_staff
AND `leave`.leave_type_id_leave_type = leave_balance.leave_type_id_leave_type
)
WHERE
`leave`.active = 1
GROUP BY
staff.id_staff, leave_type;
Good luck!

MySQL query killing my server

Looking at this query there's got to be something bogging it down that I'm not noticing. I ran it for 7 minutes and it only updated 2 rows.
//set product count for makes
$tru->query->run(array(
'name' => 'get-make-list',
'sql' => 'SELECT id, name FROM vehicle_make',
'connection' => 'core'
));
while($tempMake = $tru->query->getArray('get-make-list')) {
$tru->query->run(array(
'name' => 'update-product-count',
'sql' => 'UPDATE vehicle_make SET product_count = (
SELECT COUNT(product_id) FROM taxonomy_master WHERE v_id IN (
SELECT id FROM vehicle_catalog WHERE make_id = '.$tempMake['id'].'
)
) WHERE id = '.$tempMake['id'],
'connection' => 'core'
));
}
I'm sure this query can be optimized to perform better, but I can't think of how to do it.
vehicle_make = 45 rows
taxonomy_master = 11,223 rows
vehicle_catalog = 5,108 rows
All tables have appropriate indexes
UPDATE: I should note that this is a 1-time script so overhead isn't a big deal as long as it runs.
CREATE TABLE IF NOT EXISTS `vehicle_make` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`product_count` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=46 ;
CREATE TABLE IF NOT EXISTS `taxonomy_master` (
`product_id` int(10) NOT NULL,
`v_id` int(10) NOT NULL,
`vehicle_requirement` varchar(255) DEFAULT NULL,
`is_sellable` enum('True','False') DEFAULT 'True',
`programming_override` varchar(25) DEFAULT NULL,
PRIMARY KEY (`product_id`,`v_id`),
KEY `idx2` (`product_id`),
KEY `idx3` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `vehicle_catalog` (
`v_id` int(10) NOT NULL,
`id` int(11) NOT NULL,
`v_make` varchar(255) NOT NULL,
`make_id` int(11) NOT NULL,
`v_model` varchar(255) NOT NULL,
`model_id` int(11) NOT NULL,
`v_year` varchar(255) NOT NULL,
PRIMARY KEY (`v_id`,`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx` (`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx2` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Update: The successful query to get what I needed is here....
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
Without the tables/columns, this is my best guess from reverse-engineering the given queries:
UPDATE vehicle_make m
JOIN (SELECT v.make_id, COUNT(t.product_id) AS cnt
      FROM taxonomy_master t
      INNER JOIN vehicle_catalog v ON t.v_id=v.id
      GROUP BY v.make_id) counts ON counts.make_id = m.id
SET m.product_count = counts.cnt;
The given code loops over each make and then runs a query that counts the products for each one. My answer just does them all in one query and should be a lot faster.
Have an index for each of these:
vehicle_make.id, covering name
vehicle_catalog.id, covering make_id
taxonomy_master.v_id
EDIT
give this a try:
CREATE TEMPORARY TABLE CountsOf (
id int(11) NOT NULL
, CountOf int(11) NOT NULL DEFAULT 0
);
INSERT INTO CountsOf
(id, CountOf )
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
UPDATE vehicle_make, CountsOf
SET vehicle_make.product_count = CountsOf.CountOf
WHERE vehicle_make.id = CountsOf.id;
Instead of using a nested query, you can split this query into 2 or 3 queries, and in PHP feed the result of the inner query into the outer query; it's faster!
@haim-evgi Separating the queries will not increase the speed significantly; it will just shift the load from the DB server to the web server and create the overhead of moving data between the two servers.
I am not sure how such a query runs for 7 minutes with the appropriate indexes. Could you please show the structure of the tables involved in these queries?
Seems like you need the following indices:
INDEX BTREE('make_id') on vehicle_catalog
INDEX BTREE('v_id') on taxonomy_master
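In the schema posted above, taxonomy_master already has idx3 on v_id, but vehicle_catalog has no index on make_id; a sketch of adding that one (the index name is made up):
ALTER TABLE vehicle_catalog ADD INDEX idx_make_id (make_id); -- hypothetical index name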