Table Schema
For the two tables, the CREATE queries are given below:
Table1: (file_path_key, dir_path_key)
create table Table1(
file_path_key varchar(500),
dir_path_key varchar(500),
primary key(file_path_key))
engine = innodb;
Table2: (file_path_key, hash_key)
create table Table2(
file_path_key varchar(500) not null,
hash_key bigint(20) not null,
foreign key (file_path_key) references Table1(file_path_key) on update cascade on delete cascade)
engine = innodb;
Objective:
Given a file_path F and its dir_path string D, I need to find all
file names that have at least one hash in common with the set of hashes of F but
whose directory is not D. If a file F1 shares multiple hashes
with F, it should be repeated that many times.
Note that the file_path_key column in Table1 and the hash_key column in Table2 are indexed.
In this particular case, Table1 has around 350,000 entries and Table2 has 31,167,119 entries, which makes my current query slow:
create table temp
as select hash_key from Table2
where file_path_key = F;
select s1.file_path_key
from Table1 as s1
join Table2 as s2
on s1.file_path_key = s2.file_path_key
join temp on temp.hash_key = s2.hash_key
where s1.dir_path_key != D;
How can I speed up this query?
I do not understand the purpose of the temp table, but remember that a table created with CREATE ... SELECT does not have any indexes. So at the very least fix that statement to:
CREATE TABLE temp (INDEX(hash_key)) ENGINE=InnoDB AS
SELECT hash_key FROM Table2 WHERE file_path_key = F;
Otherwise the second SELECT performs a full join against temp, which might be very slow.
I would also suggest using a numeric primary key (INT, BIGINT) in Table1 and referencing it from Table2 rather than the text column. E.g.:
create table Table1(
id int not null auto_increment primary key,
file_path_key varchar(500),
dir_path_key varchar(500),
unique key(file_path_key))
engine = innodb;
create table Table2(
file_id int not null,
hash_key bigint(20) not null,
foreign key (file_id) references Table1(id)
on update cascade on delete cascade) engine = innodb;
Queries joining the two tables may be a lot faster if integer columns, rather than text ones, are used in the join predicate.
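As a sanity check of the join logic (not the performance), here is a minimal sketch of the whole lookup as a single statement, using Python's sqlite3 in place of MySQL; the sample paths and hashes are made up for illustration:

```python
import sqlite3

# Hypothetical in-memory version of the schema, using SQLite instead of MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (
    file_path_key TEXT PRIMARY KEY,
    dir_path_key  TEXT
);
CREATE TABLE Table2 (
    file_path_key TEXT NOT NULL REFERENCES Table1(file_path_key),
    hash_key      INTEGER NOT NULL
);
CREATE INDEX idx_t2_hash ON Table2(hash_key);

INSERT INTO Table1 VALUES ('/a/F', '/a'), ('/b/F1', '/b'), ('/a/F2', '/a');
-- F has hashes 1,2,3; F1 shares two of them; F2 sits in the same dir as F
INSERT INTO Table2 VALUES ('/a/F', 1), ('/a/F', 2), ('/a/F', 3),
                          ('/b/F1', 1), ('/b/F1', 2),
                          ('/a/F2', 1);
""")

# Single-statement version of the temp-table approach: files sharing a hash
# with F, excluding files whose directory is D; repeated once per shared hash.
rows = conn.execute("""
    SELECT s1.file_path_key
    FROM Table1 AS s1
    JOIN Table2 AS s2 ON s1.file_path_key = s2.file_path_key
    JOIN Table2 AS f  ON f.hash_key = s2.hash_key
    WHERE f.file_path_key = ?
      AND s1.dir_path_key <> ?
    ORDER BY s1.file_path_key
""", ('/a/F', '/a')).fetchall()
print(rows)   # /b/F1 appears twice: it shares hashes 1 and 2 with F
```

In a real MySQL deployment the optimizer may or may not prefer this over the indexed temp table, so it is worth comparing both plans with EXPLAIN.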
Issue:
I'm using PostgreSQL Database.
I have one table (Albums) that should be linked to two other tables (Clients, Domains), so that either a Client or a Domain can have an Album. But the owner column in the Albums table can hold only a single foreign key. How can I solve this issue?
Dream: a single Album can be owned by exactly one Client or Domain. I need to fix this with foreign keys. Albums: id | owner (multiple foreign keys -> Clients:id or Domains:id) --> cannot do this | name. I just need some smart rework.
Tables (currently only a Domain can have an Album):
Albums
Clients
Domains
Albums (the table with the foreign key as it stands):
id | owner (foreign key -> Domains:id) | name
Clients:
id | first_name | last_name
Domains:
id | owner | name
Add two FK columns and a CHECK constraint to enforce that exactly one of them is NOT NULL...
Something like this:
CREATE TABLE albums (
id serial PRIMARY KEY,
client_id integer,
domain_id integer,
name varchar(255) NOT NULL,
FOREIGN KEY (client_id) REFERENCES clients(id),
FOREIGN KEY (domain_id) REFERENCES domains(id),
CHECK ((client_id IS NULL) <> (domain_id IS NULL))
);
To query you can use something like this:
SELECT a.id, COALESCE(c.id, d.id) AS owner_id, COALESCE(c.name, d.name) AS owner_name,
a.name AS title
FROM albums a
LEFT JOIN clients c ON a.client_id = c.id
LEFT JOIN domains d ON a.domain_id = d.id
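A minimal sketch of the CHECK-based XOR, using Python's sqlite3 instead of PostgreSQL (the `(x IS NULL) <> (y IS NULL)` trick behaves the same way in both); the sample names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE albums (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER REFERENCES clients(id),
    domain_id INTEGER REFERENCES domains(id),
    name      TEXT NOT NULL,
    -- exactly one owner: one side NULL, the other NOT NULL
    CHECK ((client_id IS NULL) <> (domain_id IS NULL))
);
INSERT INTO clients VALUES (1, 'alice');
INSERT INTO domains VALUES (1, 'example.org');
""")

# Both single-owner inserts pass the CHECK:
conn.execute("INSERT INTO albums (client_id, name) VALUES (1, 'ok-client')")
conn.execute("INSERT INTO albums (domain_id, name) VALUES (1, 'ok-domain')")

# No owner at all, or two owners at once, is rejected:
for bad in ("INSERT INTO albums (name) VALUES ('no owner')",
            "INSERT INTO albums (client_id, domain_id, name) VALUES (1, 1, 'two owners')"):
    try:
        conn.execute(bad)
        print("accepted:", bad)
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```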
@e_i_pi's version
CREATE TABLE entities (
id serial PRIMARY KEY,
type integer -- could be any other type
-- any other "common" values
);
CREATE TABLE client_entities (
id integer PRIMARY KEY, -- at INSERT this comes from table `entities`
name varchar(255) NOT NULL
);
CREATE TABLE domain_entities (
id integer PRIMARY KEY, -- at INSERT this comes from table `entities`
name varchar(255) NOT NULL
);
CREATE TABLE albums (
id serial PRIMARY KEY,
owner_id integer REFERENCES entities(id), -- maybe NOT NULL?
name varchar(255) NOT NULL
);
Query:
SELECT a.id, owner_id, COALESCE(c.name, d.name) AS owner_name, a.name AS title
FROM albums a
LEFT JOIN entities e ON a.owner_id = e.id
LEFT JOIN client_entities c ON e.id = c.id AND e.type = 1 -- depending on the type of `type`
LEFT JOIN domain_entities d ON e.id = d.id AND e.type = 2
Righto, so as suggested in the comment on @UsagiMiyamoto's answer, there is a way to do this that allows declaration of entity types, with cascading. Note that this solution doesn't support unlimited entity types, as we need to maintain concrete FK constraints. There is a way to do this with unlimited entity types, but it involves triggers and quite a bit of nastiness.
Here's the easy to understand solution:
-- Start with a test schema
DROP SCHEMA IF EXISTS "entityExample" CASCADE;
CREATE SCHEMA IF NOT EXISTS "entityExample";
SET SEARCH_PATH TO "entityExample";
-- We'll need this to enforce constraints
-- (Note it must return FALSE, not NULL, on a mismatch: "SELECT TRUE WHERE $1 = $2"
-- yields NULL when the values differ, and a NULL CHECK silently passes.)
CREATE OR REPLACE FUNCTION is_entity_type(text, text) RETURNS boolean AS $$
SELECT $1 = $2;
$$ LANGUAGE sql;
-- Unique entity types
CREATE TABLE "entityTypes" (
name TEXT NOT NULL,
CONSTRAINT "entityTypes_ukey" UNIQUE ("name")
);
-- Our client entities
CREATE TABLE clients (
id integer PRIMARY KEY,
name TEXT NOT NULL
);
-- Our domain entities
CREATE TABLE domains (
id integer PRIMARY KEY,
name TEXT NOT NULL
);
-- Our overarching entities table, which maintains FK constraints against clients and domains
CREATE TABLE entities (
id serial PRIMARY KEY,
"entityType" TEXT NOT NULL,
"clientID" INTEGER CHECK (is_entity_type("entityType", 'client')),
"domainID" INTEGER CHECK (is_entity_type("entityType", 'domain')),
CONSTRAINT "entities_entityType" FOREIGN KEY ("entityType") REFERENCES "entityTypes" (name) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT "entities_clientID" FOREIGN KEY ("clientID") REFERENCES "clients" (id) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT "entities_domainID" FOREIGN KEY ("domainID") REFERENCES "domains" (id) ON DELETE CASCADE ON UPDATE CASCADE
);
-- Our albums table, which now can have one owner, but of a dynamic entity type
CREATE TABLE albums (
id serial PRIMARY KEY,
"ownerEntityID" integer,
name TEXT NOT NULL,
CONSTRAINT "albums_ownerEntityID" FOREIGN KEY ("ownerEntityID") REFERENCES "entities"("id")
);
-- Put the entity type in
INSERT INTO "entityTypes" ("name") VALUES ('client'), ('domain');
-- Enter our clients and domains
INSERT INTO clients VALUES (1, 'clientA'), (2, 'clientB');
INSERT INTO domains VALUES (50, 'domainA');
-- Make sure the clients and domains are registered as entities
INSERT INTO entities ("entityType", "clientID")
SELECT
'client',
"clients".id
FROM "clients"
ON CONFLICT DO NOTHING
;
INSERT INTO entities ("entityType", "domainID")
SELECT
'domain',
"domains".id
FROM "domains"
ON CONFLICT DO NOTHING
;
If you don't like the idea of inserting twice (once into clients, once into entities, for example) you can have a trigger on inserts into the clients table, or alternatively create an insert function that inserts into both tables at once.
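A minimal sketch of that trigger idea, using Python's sqlite3 rather than PostgreSQL (the trigger syntax differs slightly, but the shape is the same); the trigger name and sample data are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entityTypes (name TEXT PRIMARY KEY);
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE entities (
    id         INTEGER PRIMARY KEY,
    entityType TEXT NOT NULL REFERENCES entityTypes(name),
    clientID   INTEGER REFERENCES clients(id)
);
INSERT INTO entityTypes VALUES ('client'), ('domain');

-- One trigger instead of a second manual INSERT: every new client row
-- is mirrored into entities automatically.
CREATE TRIGGER clients_to_entities AFTER INSERT ON clients
BEGIN
    INSERT INTO entities (entityType, clientID) VALUES ('client', NEW.id);
END;
""")

conn.execute("INSERT INTO clients VALUES (1, 'clientA')")
conn.execute("INSERT INTO clients VALUES (2, 'clientB')")
rows = conn.execute(
    "SELECT entityType, clientID FROM entities ORDER BY clientID").fetchall()
print(rows)   # each client got an entities row without a second INSERT
```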
Here is a stripped-down example of my problem:
I create 2 tables, which are connected via a 'grouping table'.
CREATE TABLE table1
(
t1_pk INT(11) AUTO_INCREMENT NOT NULL,
t1_entry VARCHAR(150),
PRIMARY KEY (t1_pk)
) engine = innodb;
CREATE TABLE table2
(
t2_pk int(11) AUTO_INCREMENT NOT NULL,
t2_entry VARCHAR(150),
PRIMARY KEY (t2_pk)
) engine = innodb;
CREATE TABLE grouping
(
grouping_pk INT(11) AUTO_INCREMENT NOT NULL,
t1_fk INT(11) NOT NULL,
t2_fk INT(11) NOT NULL,
PRIMARY KEY (grouping_pk),
CONSTRAINT table1_fk FOREIGN KEY (t1_fk) REFERENCES table1 (t1_pk) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT table2_fk FOREIGN KEY (t2_fk) REFERENCES table2 (t2_pk) ON DELETE CASCADE ON UPDATE CASCADE
) engine = innodb;
Now I want to delete all the entries from grouping, table1 and table2 where table1.t1_entry is "abc".
I try to do it like this:
DELETE FROM grouping
WHERE grouping.grouping_pk IN (SELECT
temp.grouping_pk
FROM (SELECT grouping.grouping_pk,
grouping.t1_fk,
grouping.t2_fk,
table1.t1_pk,
table1.t1_entry,
table2.t2_pk,
table2.t2_entry
FROM grouping
LEFT OUTER JOIN table1 ON grouping.t1_fk = table1.t1_pk
LEFT OUTER JOIN table2 ON grouping.t2_fk = table2.t2_pk
WHERE table1.t1_entry LIKE 'abc'
) AS temp)
As a result, the entries are deleted in the grouping table, but not in table1 and table2.
My question is now: how can I select records and delete the result set from all three tables? I feel like a dummy, because I can't figure this out by myself.
In the grouping table definition, table2_fk references table1 instead of table2. That may be the problem...
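For illustration, here is a minimal sketch in Python's sqlite3 (not MySQL) of how the cascade actually flows: deleting the parent table1 row removes the matching grouping rows automatically, but the table2 side has to be collected and deleted explicitly, since ON DELETE CASCADE only propagates from parent to child:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this for FK enforcement
conn.executescript("""
CREATE TABLE table1 (t1_pk INTEGER PRIMARY KEY, t1_entry TEXT);
CREATE TABLE table2 (t2_pk INTEGER PRIMARY KEY, t2_entry TEXT);
CREATE TABLE grouping (
    grouping_pk INTEGER PRIMARY KEY,
    t1_fk INTEGER NOT NULL REFERENCES table1(t1_pk) ON DELETE CASCADE,
    t2_fk INTEGER NOT NULL REFERENCES table2(t2_pk) ON DELETE CASCADE
);
INSERT INTO table1 VALUES (1, 'abc'), (2, 'def');
INSERT INTO table2 VALUES (10, 'x'), (20, 'y');
INSERT INTO grouping VALUES (100, 1, 10), (101, 2, 20);
""")

# Collect the table2 keys first, then let the cascade handle grouping:
t2_keys = [r[0] for r in conn.execute(
    "SELECT t2_fk FROM grouping JOIN table1 ON t1_fk = t1_pk "
    "WHERE t1_entry = 'abc'")]
conn.execute("DELETE FROM table1 WHERE t1_entry = 'abc'")  # cascades into grouping
conn.executemany("DELETE FROM table2 WHERE t2_pk = ?", [(k,) for k in t2_keys])

print(conn.execute("SELECT COUNT(*) FROM grouping").fetchone())
print(conn.execute("SELECT COUNT(*) FROM table2").fetchone())
```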
Closing this one now...
Conclusion:
I have to rework my data model for a better solution to my problem, based on Barmar's and Spencer7593's comments.
Thanks for the much appreciated help!
I need to retrieve a report of the affected rows when a table has been altered with the following commands:
1.- Changing the engine:
ALTER TABLE <table> ENGINE=INNODB;
2.- Adding constraints:
ALTER TABLE table_name ADD PRIMARY KEY (columns);
ALTER TABLE table_name DROP PRIMARY KEY;
ALTER TABLE table_name ADD FOREIGN KEY fk_symbol (columns) REFERENCES parent_table (columns);
ALTER TABLE table_name DROP FOREIGN KEY fk_symbol;
3.- Adding a UNIQUE constraint.
A primary or unique key failure means duplicates, so look for those first; if you have NULLs in there you'll need to sort them out as well.
E.g. given MyTable(KeyField int not null), then:
Select m.KeyField From MyTable m
Inner Join (Select KeyField, Count(*) As NumberOfTimes
From MyTable Group By KeyField) Duplicates
On m.KeyField = Duplicates.KeyField
Where Duplicates.NumberOfTimes > 1
Then you'll have to come up with something to do with them. Delete or rekey.
Foreign key failures are just an outer join query where the lookup key is null.
E.g. given MyTable(KeyField int not null, ForeignKeyField int not null) and
MyLookUpTable(LookUpKey int not null, Description VarChar(32) not null), then:
Select KeyField From MyTable
Left Join MyLookUpTable On MyTable.ForeignKeyField = MyLookUpTable.LookUpKey
Where MyLookUpTable.LookUpKey Is Null
Again you'll have to decide what to do with them. You could delete them, but this might help.
One way is to insert a "Missing" record in the lookup table, grab its key, then do an update with a join. So given that key is 999:
Update m
Set ForeignKeyField = 999
From MyTable m
Left Join MyLookUpTable On m.ForeignKeyField = MyLookUpTable.LookUpKey
Where MyLookUpTable.LookUpKey Is Null
Now you can dig out the 999s and deal with them at your leisure.
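A minimal end-to-end sketch of the orphan-hunting and re-keying steps, using Python's sqlite3 (the T-SQL `UPDATE ... FROM` join is rewritten as a `NOT IN` subquery, which SQLite supports); the sample keys are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyLookUpTable (LookUpKey INTEGER PRIMARY KEY, Description TEXT NOT NULL);
CREATE TABLE MyTable (KeyField INTEGER PRIMARY KEY, ForeignKeyField INTEGER NOT NULL);
INSERT INTO MyLookUpTable VALUES (1, 'known');
INSERT INTO MyTable VALUES (1, 1), (2, 42);   -- row 2 points nowhere
""")

# Step 1: find the orphans, i.e. rows whose lookup key has no match.
orphans = conn.execute("""
    SELECT m.KeyField FROM MyTable m
    LEFT JOIN MyLookUpTable l ON m.ForeignKeyField = l.LookUpKey
    WHERE l.LookUpKey IS NULL
""").fetchall()
print(orphans)   # only KeyField 2 is orphaned

# Step 2: re-key them to a sentinel 'Missing' record instead of deleting.
conn.execute("INSERT INTO MyLookUpTable VALUES (999, 'Missing')")
conn.execute("""
    UPDATE MyTable SET ForeignKeyField = 999
    WHERE ForeignKeyField NOT IN (SELECT LookUpKey FROM MyLookUpTable)
""")
print(conn.execute(
    "SELECT ForeignKeyField FROM MyTable WHERE KeyField = 2").fetchone())
```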
Is there a way to create a unique index across tables in a MySQL database?
By unique, I mean,
table A ids=1,2,5,7...
table B ids=3,4,6,8...
I think it is not possible by directly creating a constraint, but there are some solutions for getting unique IDs.
One suggestion would be to use a TRIGGER. Create a BEFORE INSERT trigger on both tables which checks the ID for existence during INSERT or UPDATE.
Second, create a third table which contains the UNIQUE numbers, and have the other tables, TableA and TableB, reference the third one.
Like JW says, this would probably work well with a third table. In MySQL, you can use an AUTO_INCREMENT field to make this easier. A very simple example would be using these tables (simple versions with just names, without knowing any further info):
CREATE TABLE a
(
`id` int,
`name` varchar(100),
PRIMARY KEY(`id`)
)
ENGINE = INNODB;
CREATE TABLE b
(
`id` int,
`name` varchar(100),
PRIMARY KEY(`id`)
)
ENGINE = INNODB;
CREATE TABLE c
(
`id` int auto_increment,
`intable` varchar(10) not null,
PRIMARY KEY(`id`)
)
ENGINE = INNODB;
Then, when you want to insert a value into either table, do (a sample inserting 'Joe' into a):
INSERT INTO c (`intable`) VALUES ('a');
INSERT INTO a (`id`, `name`)
SELECT LAST_INSERT_ID() AS id, 'Joe' AS name;
The only reason for the intable entry in c is so you know which table it was created for. Any sort of value to insert into c can be used instead.
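A minimal sketch of the shared-sequence idea in Python's sqlite3, where `cursor.lastrowid` plays the role of MySQL's LAST_INSERT_ID(); the `insert_into` helper is made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
-- Table c hands out IDs that are unique across a and b.
CREATE TABLE c (id INTEGER PRIMARY KEY AUTOINCREMENT, intable TEXT NOT NULL);
""")

def insert_into(table, name):
    # Grab a fresh cross-table ID from c, then use it in the target table.
    cur = conn.execute("INSERT INTO c (intable) VALUES (?)", (table,))
    new_id = cur.lastrowid                 # SQLite's LAST_INSERT_ID() equivalent
    conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (new_id, name))
    return new_id

ids = [insert_into("a", "Joe"), insert_into("b", "Ann"), insert_into("a", "Sue")]
print(ids)   # no ID is ever reused between a and b
```

One caveat either way: two sessions interleaving their two INSERTs still serialize on table c, so the sequence table can become a hot spot under heavy concurrent writes.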
I have two related tables, like in the example script below:
-- Creating table 'Product'
CREATE TABLE [dbo].[Product] (
[Id] int IDENTITY(1,1) NOT NULL,
[Text] nvarchar(max) NOT NULL,
[AvgRating] decimal(18,2) NOT NULL
);
GO
-- Creating table 'Review'
CREATE TABLE [dbo].[Review] (
[Id] int IDENTITY(1,1) NOT NULL,
[Text] nvarchar(max) NOT NULL,
[Rating] decimal(18,2) NOT NULL,
[Product_Id] int NOT NULL
);
GO
-- Creating primary key on [Id] in table 'Product'
ALTER TABLE [dbo].[Product]
ADD CONSTRAINT [PK_Product]
PRIMARY KEY CLUSTERED ([Id] ASC);
GO
-- Creating primary key on [Id] in table 'Review'
ALTER TABLE [dbo].[Review]
ADD CONSTRAINT [PK_Review]
PRIMARY KEY CLUSTERED ([Id] ASC);
GO
-- Creating foreign key on [Product_Id] in table 'Review'
ALTER TABLE [dbo].[Review]
ADD CONSTRAINT [FK_ProductReview]
FOREIGN KEY ([Product_Id])
REFERENCES [dbo].[Product]
([Id])
ON DELETE NO ACTION ON UPDATE NO ACTION;
-- Creating non-clustered index for FOREIGN KEY 'FK_ProductReview'
CREATE INDEX [IX_FK_ProductReview]
ON [dbo].[Review]
([Product_Id]);
GO
I would like to recompute AvgRating on the Product row whenever a user inserts, updates, or deletes a review. The most obvious brute-force approach that comes to my mind is to pull the data from the database, compute the average on the client side, and then update the product row manually. Is it possible to do this automatically on the database server without having to pull the data? If so, how? The database is hosted on MS SQL Server 2008.
You could create a trigger on the Review table. After an insert, update, or delete on Review it recomputes the average rating.
CREATE TRIGGER TRG_REVIEW
ON Review
AFTER INSERT, UPDATE, DELETE
AS
;WITH ratings
AS
(
SELECT product_id, AVG(Rating) AS rating
FROM Review R
WHERE EXISTS (SELECT * FROM INSERTED WHERE product_id = R.product_id AND UPDATE(rating))
OR EXISTS (SELECT * FROM DELETED WHERE product_id = R.product_id)
GROUP BY product_id
)
UPDATE P set AvgRating = COALESCE(R.rating,0)
FROM Product P
INNER JOIN ratings R
ON P.id = R.Product_id
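For a feel of the mechanism, here is a minimal sketch in Python's sqlite3 rather than SQL Server. SQLite triggers fire per row and have no INSERTED/DELETED pseudo-tables, so only the INSERT case is shown, keyed off NEW.Product_Id; the T-SQL version above covers all three statements at once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Id INTEGER PRIMARY KEY, Text TEXT,
                      AvgRating REAL NOT NULL DEFAULT 0);
CREATE TABLE Review  (Id INTEGER PRIMARY KEY, Rating REAL NOT NULL,
                      Product_Id INTEGER NOT NULL REFERENCES Product(Id));

-- SQLite triggers fire per row, so NEW.Product_Id identifies the product.
CREATE TRIGGER trg_review_ins AFTER INSERT ON Review
BEGIN
    UPDATE Product
    SET AvgRating = (SELECT AVG(Rating) FROM Review
                     WHERE Product_Id = NEW.Product_Id)
    WHERE Id = NEW.Product_Id;
END;

INSERT INTO Product (Id, Text) VALUES (1, 'widget');
INSERT INTO Review (Rating, Product_Id) VALUES (4.0, 1), (2.0, 1);
""")
print(conn.execute("SELECT AvgRating FROM Product WHERE Id = 1").fetchone())
```

A production version would also need UPDATE and DELETE triggers (and, in the DELETE case, a COALESCE to reset products whose last review disappears, just as the T-SQL version does).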