I have tables Alpha and Beta. Beta belongs to Alpha.
create table Alpha
(
id int auto_increment primary key
);
create table Beta
(
id int auto_increment primary key,
alphaId int null,
orderValue int,
constraint Alpha_ibfk_1 foreign key (alphaId) references Alpha (id)
);
Here are a few test records:
insert into Alpha (id) values (1);
insert into Alpha (id) values (2);
insert into Beta (id, alphaId, orderValue) values (1, 1, 23);
insert into Beta (id, alphaId, orderValue) values (2, 1, 43);
insert into Beta (id, alphaId, orderValue) values (3, 2, 73);
I want to paginate them in a way that makes sense for my application logic. So when I set LIMIT 2, for example, I expect to get a list of two Alpha records and their related records, but in fact when I run this with LIMIT 2:
select *
from Alpha
inner join Beta on Alpha.id = Beta.alphaId
order by Beta.orderValue
limit 2;
I get only one Alpha record (id 1) and its related data:

Alpha.id  Beta.id  alphaId  orderValue
1         1        1        23
1         2        1        43
What I want is a way for the LIMIT to count only distinct Alpha records, so the query returns both Alpha records together with all of their related Beta rows, something like this:

Alpha.id  Beta.id  alphaId  orderValue
1         1        1        23
1         2        1        43
2         3        2        73
Is it possible to do this in MySQL in one query? Maybe in a different RDBMS? Or is using multiple queries the only option?
=== EDIT
The reason for this requirement is that I want to create an API with paging that returns Alpha records together with their related Beta records. The problem is that the way LIMIT works does not make sense from the user's standpoint: "Hey, I said I want 2 Alpha records with their related data, not 1. What is this?"
There are a couple of issues with your example:
Your foreign key seems to be wrongly established.
Limiting almost always requires an explicit ordering of the rows; otherwise the result is unstable and non-reproducible.
Anyway, having said that, you can place a limit on rows for table Alpha and then perform the join against table Beta.
For example:
select *
from (
select *
from Alpha
order by id
limit 2 -- this limit only affects table Alpha
) x
join Beta b on b.alphaId = x.id;
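If you also want the related Beta rows back in a predictable order, you can add an ORDER BY to the outer query as well; the limit still only applies to Alpha:

select *
from (
    select *
    from Alpha
    order by id
    limit 2 -- this limit only affects table Alpha
) x
join Beta b on b.alphaId = x.id
order by x.id, b.orderValue;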
Problem
Is there a way to automatically count related tables (one-to-many), where the count will automatically increment or decrement depending on whether we add or remove a related item?
I could obviously just do a COUNT, but for performance reasons that is expensive: there are millions of records and the query runs many, many times.
As a solution, rather than counting every time, I would create a counter that adds 1 when a new related item is added and subtracts 1 when an item is deleted or de-referenced.
To do that, I could just create another table that serves as a counter, and query that table instead of counting.
Is there a better way, preferably one that is automatic?
Example
schema
create table object
(
object_id int auto_increment primary key,
name varchar(120) not null
);
create table item
(
item_id varchar(63) not null,
object_id int not null,
primary key (object_id, item_id)
);
insert into object (name) VALUES ("hello");
insert into item (item_id, object_id) VALUES
("item1", 1),
("item2", 1),
("item3", 1),
("item4", 1);
Object "hello" has 4 items:
select count(*) from item where object_id = 1;
-- output: 4
However, as a workaround, I found I can maintain a counter from the back end (Python): each time a CRUD operation runs, the counter is updated. For example:
Counter schema
create table item_counter
(
counter bigint NOT NULL DEFAULT 0,
object_id int NOT NULL primary key
);
So now, in the ORM code, we would do something like this (again, it will be handled in Python, but that doesn't matter; this is just an example):
-- create object AND item_counter
insert into object (name) VALUES ("hello");
-- create in the same time a counter
insert into item_counter (object_id) VALUES ((SELECT object_id FROM object where name = "hello"));
-- create items
insert into item (item_id, object_id) VALUES ("item1", 2);
update item_counter set counter = counter + 1 where object_id = 2;
insert into item (item_id, object_id) VALUES ("item2", 2);
update item_counter set counter = counter + 1 where object_id = 2;
insert into item (item_id, object_id) VALUES ("item3", 2);
update item_counter set counter = counter + 1 where object_id = 2;
insert into item (item_id, object_id) VALUES ("item4", 2);
update item_counter set counter = counter + 1 where object_id = 2;
-- select the counter instead
select counter from item_counter where object_id = 2;
In Python it would look something like this:
# pseudo ORM code
class ItemORM:
    def save(self, item_id, object_id):
        self.orm.save(item_id, object_id)
        counter = self.orm.get_counter(object_id)
        counter.add()
So, is there a better way, especially, something that MySQL may do automatically?
More Context
In case you want to know why: I am working in a large code base with lots of legacy code and API dependencies used by external clients, and there is currently no counter implementation; it is merely my idea for overcoming the problem. Changing a large part of the code is very risky, so tweaking some MySQL tables may be a better solution.
MySQL alone doesn't have any automatic way of doing what you describe. It's up to you to implement a solution, or use an additional technology with MySQL.
I can think of at least the following solutions:
Use triggers to maintain your counter: increment on INSERT, decrement on DELETE. You might run into throughput problems if your rate of concurrent inserts and deletes is very high. (A one-off backfill for pre-existing rows is sketched after this list.)
create trigger ti after insert on item
for each row
update item_counter
set counter = counter + 1
where object_id = NEW.object_id;
create trigger td after delete on item
for each row
update item_counter
set counter = counter - 1
where object_id = OLD.object_id;
Demo: https://dbfiddle.uk/tgJM0I4m
Use a materialized view. PlanetScale recently offered a product (in beta as of November 2022) that does something like a materialized view, but it's more complex than that. Read their description: https://planetscale.com/blog/how-planetscale-boost-serves-your-sql-queries-instantly
Use a data warehouse. An OLTP database like MySQL typically stores tables as rows, so it's optimized for CRUD type queries. A data warehouse or OLAP database stores tables as columns, so aggregate queries like the one you describe are its strength. But this means setting up a separate database technology, and syncing your MySQL data to the data warehouse continually.
WARP is a MySQL 8.0 storage engine that seems to combine both a column-store and materialized views. See https://www.leapdb.com/. I worked with the developer who created WARP; he's brilliant.
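One more note on the trigger option referenced above: if the item table already contains rows, item_counter has to be seeded once before the triggers take over (an assumption on my part, since the question doesn't say whether data already exists). A minimal backfill using the schema from the question:

insert into item_counter (object_id, counter)
select o.object_id, count(i.object_id)
from object o
left join item i on i.object_id = o.object_id
group by o.object_id;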
Given MySQL tables something like this [1]:
CREATE TABLE beta ( -- created first so alpha's foreign key can reference it
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, type_info INT NOT NULL
) ENGINE=InnoDB;

CREATE TABLE alpha (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, beta_id INT NULL
, type_info INT NOT NULL
, CONSTRAINT fk_alpha_beta FOREIGN KEY (beta_id) REFERENCES beta (id)
) ENGINE=InnoDB;
My alpha table contains many rows, and I want to create corresponding rows in table beta, and store the ids in the alpha.beta_id column.
I can create the rows in beta in one shot like this:
INSERT INTO beta (type_info)
SELECT type_info FROM alpha
WHERE beta_id IS NULL;
How can I do this in such a way that I can put the generated beta ids into the beta_id columns of the corresponding rows in alpha?
The rows in beta do not have unique characteristics that would allow me to identify them retrospectively.
If it were only one row I was creating, I could use something like:
UPDATE alpha
SET beta_id=LAST_INSERT_ID()
WHERE beta_id IS NULL;
But that does not seem to be generalisable to multiple rows.
This question is vaguely similar, but it is using INSERT ... VALUES and isn't applicable to a use case with INSERT ... SELECT.
This question has similar requirements, and I know I can use ROW_COUNT to see how many rows were inserted, but that doesn't give me a correspondence between the rows I added and the rows in alpha that they were created for.
I'm left with writing a non-SQL script that will pull all the information out, create the beta rows one at a time, and then update rows in alpha individually. But I would rather do this in-SQL, and it seems like a fairly obvious use case.
[1] Obviously our actual schema is somewhat more complicated, but I have tried to boil it down to the information necessary to demonstrate the question.
You can assume that a multi-row INSERT generates consecutive auto-increment id's, unless innodb_auto_inc_lock_mode=2. See https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html
LAST_INSERT_ID() will tell you the first id generated by the insert, and the others will each be 1 greater, as many as the number of rows inserted.
Then you can UPDATE alpha to set these, but be sure to use ORDER BY to specify both the order of copying and the order of UPDATE so they remain the same.
INSERT INTO beta (type_info)
SELECT type_info FROM alpha
WHERE beta_id IS NULL
ORDER BY id;
SET @id = LAST_INSERT_ID()-1;
UPDATE alpha
SET beta_id=(@id := @id + 1)
WHERE beta_id IS NULL
ORDER BY id;
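Note that this pair of statements assumes nothing else inserts into alpha (or changes beta_id) between them. One way to guarantee that, sketched here as just one of several options, is to hold table locks around both statements:

LOCK TABLES alpha WRITE, beta WRITE;

-- run the INSERT ... SELECT, SET @id, and UPDATE from above here

UNLOCK TABLES;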
If there is a one-to-one mapping, you can join back using the type column:
update alpha a join
beta b
on a.type_info = b.type_info
set a.beta_id = b.id;
EDIT:
MySQL doesn't have great facilities for this. But you can add an alpha_id (perhaps temporarily) and do:
alter table beta add column alpha_id int;
INSERT INTO beta (alpha_id, type_info)
SELECT a.id, a.type_info
FROM alpha a
WHERE beta_id IS NULL;
update alpha a join
beta b
on a.id = b.alpha_id
set a.beta_id = b.id;
You can remove the column after the update, if you want.
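For completeness, the cleanup is a one-liner once alpha.beta_id has been populated:

alter table beta drop column alpha_id;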
I have a table which consists of columns for users, categories and amount.
A user can buy an amount of products from each category. I want to store only the very last purchase.
User Category Amount
1 100 15
1 103 25
Imagine that this user has just bought 30 pieces from category 100 or from category 110, i.e. either an existing category or a new one. This can be handled using the following pseudo code:
SELECT amount FROM table WHERE user=1 AND category=100
if row exists
UPDATE table SET amount=30 WHERE user=1 AND category=100
else
INSERT INTO table (user, category, amount) VALUES(1, 100, 30)
The other way is to always delete the old value (ignoring the case where it does not exist) and always insert a new one.
DELETE FROM table WHERE user=1 AND category=100
INSERT INTO table VALUES(1, 100, 30)
Which of these patterns is preferred from performance point of view?
Does it matter which PK and FK exists?
MySQL supports REPLACE, so there is no need for delete+insert or update. But this assumes a unique key or primary key on your table as a reference:
REPLACE
INTO yourtable (user, category, amount)
VALUES (1, 100, 30);
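The question doesn't show the table's keys, so this part is an assumption: if (user, category) is not already covered by a primary or unique key, you would add one first so REPLACE knows which row to replace, for example:

ALTER TABLE yourtable ADD PRIMARY KEY (user, category);
-- or, if the table already has a primary key:
-- ALTER TABLE yourtable ADD UNIQUE KEY uq_user_category (user, category);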
I'm using MySQL and I have a database that I'm trying to use as the canonical version of data that I will be storing in different indexes. That means that I have to be able to retrieve a set of data quickly from it by primary key, but I also need to sort it on the way out. I can't figure out how to let MySQL efficiently do that.
My table looks like the following:
CREATE TABLE demo.widgets (
id INT AUTO_INCREMENT,
-- lots more information I need
awesomeness INT,
PRIMARY KEY (id),
INDEX IDX_AWESOMENESS (awesomeness),
INDEX IDX_ID_AWESOMENESS (id, awesomeness)
);
And I want to do something along the lines of:
SELECT *
FROM demo.widgets
WHERE id IN (
1,
2,
3,
5,
8,
13 -- you get the idea
)
ORDER BY awesomeness
LIMIT 50;
But unfortunately I can't seem to get good performance out of this. It always has to resort to a filesort. Is there a way to get better performance from this setup, or do I need to consider a different database?
This is explained in the documentation ORDER BY Optimization. In particular, if the index used to find rows for the WHERE clause is not the same index (or index prefix) that provides the ORDER BY order, MySQL cannot use an index to avoid the sort.
To get an optimized fetch-and-sort query like that, you need a key whose leading column is the sort column, followed by the primary key, like so:
create table if not exists test.fetch_sort (
id int primary key,
val int,
key val_id (val, id)
);
insert into test.fetch_sort
values (1, 10), (2, 5), (3, 30);
explain
select *
from test.fetch_sort
where id in (1, 2, 3)
order by val;
This will give a query that only uses the index for searching/sorting.
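Translating this back to the widgets table from the question is mostly a matter of letting the optimizer use the sort-ordered index; whether it picks that plan on its own depends on your data, so treat the hint below as something to verify with EXPLAIN rather than a guaranteed fix. In InnoDB, the secondary index IDX_AWESOMENESS already stores the primary key id next to awesomeness, so it plays the same role as val_id above:

SELECT *
FROM demo.widgets FORCE INDEX (IDX_AWESOMENESS)
WHERE id IN (1, 2, 3, 5, 8, 13)
ORDER BY awesomeness
LIMIT 50;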
I am developing an app that needs to match people together. Each person can only match with one other person. So, in the table below I am trying to make it so that values from user1 & user2 are unique across both fields:
CREATE TABLE `match_table` (
`user1` int(11) NOT NULL,
`user2` int(11) NOT NULL,
UNIQUE KEY `user2` (`user2`),
UNIQUE KEY `user1` (`user1`));
So for example the following INSERT statement should ignore rows 2 and 4. Or at a minimum I need to be able to flag those rows to be ignored. Note that row 5 is OK because rows 2 and 4 have been ignored.
INSERT IGNORE INTO match_table (user1, user2)
VALUES
(1,2),
(2,3),
(4,5),
(6,4),
(3,6);
Is there any index that can accomplish this? .. Otherwise is there some UPDATE I could do after insertion that could flag the ones I want to ignore?
Assuming that the matching has no direction, you can use your existing table design but store the match both ways in the table:
INSERT IGNORE INTO match_table (user1, user2)
VALUES
(1,2),
(2,1),
(2,3),
(3,2),
(4,5),
(5,4),
(6,4),
(4,6),
(3,6),
(6,3);
If you just fetch all rows you will get each match twice. You can avoid this as follows:
SELECT * FROM match_table WHERE user1 < user2;

user1  user2
1      2
3      6
4      5
One potential issue with this design is that it is possible to insert a row with a user who matches himself.
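If you are on MySQL 8.0.16 or later, where CHECK constraints are actually enforced, you can rule that case out at the database level:

ALTER TABLE match_table
    ADD CONSTRAINT chk_no_self_match CHECK (user1 <> user2);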
If you are just asking for a database constraint, you can get that with a composite UNIQUE INDEX over (`user1`, `user2`).
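For reference, a sketch of that constraint in DDL form:

ALTER TABLE match_table
    ADD UNIQUE KEY uq_user1_user2 (user1, user2);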