Formulating a query to detect inconsistent data - mysql

I have the following table structure:
CREATE TABLE a (
a_id int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (a_id)
);
CREATE TABLE b (
b_id int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (b_id)
);
CREATE TABLE `cross` (
a_id int(10) unsigned NOT NULL,
b_id int(10) unsigned NOT NULL,
PRIMARY KEY (a_id),
KEY (b_id),
CONSTRAINT FOREIGN KEY (a_id) REFERENCES a (a_id),
CONSTRAINT FOREIGN KEY (b_id) REFERENCES b (b_id)
);
CREATE TABLE prices (
a_id int(10) unsigned NOT NULL,
price int(10) NOT NULL,
PRIMARY KEY (a_id),
CONSTRAINT FOREIGN KEY (a_id) REFERENCES a (a_id)
);
I would like to retrieve every b_id value for which there are inconsistent prices. A b_id value 'B' has an inconsistent price if the following conditions both hold:
There exist two a_id values (say, 'A1' and 'A2') such that table cross contains both ('A1', 'B') and ('A2', 'B'). (For any b_id value, there may be zero or more rows in cross.)
Either 'A1' and 'A2' correspond to rows of prices that have different values of price, or else exactly one of 'A1' and 'A2' corresponds to an entry in prices.
Because of restrictions imposed by the hosting provider, I cannot use stored procedures with this database. I haven't figured out a sensible way to do this with SQL queries. So far, I've resorted to retrieving all relevant data and scanning for inconsistencies in Perl. That's a lot of data retrieval. Is there a better way? (I'm using InnoDB, if it makes a difference.)

/* Condition 1 and Condition 2a */
SELECT
c.b_id
FROM
`cross` AS c
JOIN prices AS p ON (p.a_id = c.a_id)
GROUP BY
c.b_id
HAVING
COUNT(c.a_id) > 1 AND
MAX(p.price) != MIN(p.price)
UNION
/* Condition 1 and Condition 2b */
SELECT
c.b_id
FROM
`cross` AS c
LEFT JOIN prices AS p ON (p.a_id = c.a_id)
GROUP BY
c.b_id
HAVING
COUNT(c.a_id) > 1 AND
SUM(IF(p.price IS NULL, 0, 1)) = 1;
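One caveat with the second branch: it only fires when exactly one row in the group has a price. A b_id with three a_id values, two priced identically and one unpriced, still contains a pair where exactly one side has a price, yet the SUM comes out as 2. Below is a minimal sketch of a single-pass variant that may cover that case. It runs against an in-memory SQLite database from Python purely so that it is self-contained; the sample rows are made up, and the table is quoted as "cross" here where MySQL would use backticks. The HAVING clause uses only standard aggregates, so it should carry over to MySQL, but that is an assumption and has not been checked against the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "cross" (a_id INTEGER PRIMARY KEY, b_id INTEGER NOT NULL);
    CREATE TABLE prices (a_id INTEGER PRIMARY KEY, price INTEGER NOT NULL);

    -- Hypothetical sample data, not taken from the question.
    INSERT INTO "cross" (a_id, b_id) VALUES
        (1, 10), (2, 10),            -- b=10: same price twice, consistent
        (3, 20), (4, 20),            -- b=20: different prices, inconsistent
        (5, 30), (6, 30), (7, 30),   -- b=30: two equal prices, one missing, inconsistent
        (8, 40), (9, 40),            -- b=40: no prices at all, consistent
        (11, 50);                    -- b=50: single row, consistent
    INSERT INTO prices (a_id, price) VALUES
        (1, 5), (2, 5), (3, 5), (4, 6), (5, 7), (6, 7);
""")

query = """
    SELECT c.b_id
    FROM "cross" AS c
    LEFT JOIN prices AS p ON p.a_id = c.a_id
    GROUP BY c.b_id
    HAVING COUNT(*) > 1
       AND (
            -- some, but not all, rows of the group have a price
            (COUNT(p.a_id) > 0 AND COUNT(p.a_id) < COUNT(*))
            -- or the prices that do exist disagree
            OR MIN(p.price) <> MAX(p.price)
       )
    ORDER BY c.b_id
"""
inconsistent = [row[0] for row in conn.execute(query)]
print(inconsistent)
```

COUNT(p.a_id) counts only the rows that found a match in prices, so comparing it with COUNT(*) detects "some priced, some not" for groups of any size, and a single LEFT JOIN pass replaces the UNION.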

Related

Find records, which have several specific records in a joined table

I have two tables as follows:
create table gift_certificate
(
id int auto_increment
primary key,
name varchar(64) not null,
description mediumtext not null,
price decimal default 0 not null,
duration int default 1 not null,
create_date datetime not null,
last_update_date datetime not null
)
and
create table tag
(
id int auto_increment
primary key,
name varchar(64) not null,
constraint tag_name_uindex
unique (name)
)
with a linking table:
create table gift_certificate__tag
(
certificate_id int not null,
tag_id int not null,
primary key (certificate_id, tag_id),
constraint gift_certificate__tag_gift_certificate_id_fk
foreign key (certificate_id) references gift_certificate (id),
constraint gift_certificate__tag_tag_id_fk
foreign key (tag_id) references tag (id)
)
I need to search for gift certificates that have all of several tags (an “and” condition). So far I have only come up with a solution for a single tag:
select distinct gc.*, tag.* from gift_certificate gc
left outer join gift_certificate__tag joint on gc.id=joint.certificate_id
left outer join tag on joint.tag_id=tag.id
where tag.name='puppy'
order by gc.id desc;
I would be grateful for any help.
You can aggregate the joint table by certificate and use HAVING to only keep certificates that have all the tags. Then select all matching certificates using an IN clause. For instance:
select *
from gift_certificate
where id in
(
select joint.certificate_id
from gift_certificate__tag joint
join tag on joint.tag_id=tag.id
group by joint.certificate_id
having max(case when tag.name = 'puppy' then 1 else 0 end) = 1
and max(case when tag.name = 'something' then 1 else 0 end) = 1
);
As true = 1 and false = 0 in MySQL, you can shorten the expression to
having max(tag.name = 'puppy')
if you find this readable. Or
having sum(tag.name = 'puppy') > 0
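If the list of tags varies in length, another common shape is to filter to the wanted tags first and then require that the number of distinct matches equals the number of tags asked for. The sketch below is only an illustration: it uses Python's sqlite3 module with a cut-down version of the tables and invented rows, and the `wanted` list is a hypothetical input that is assumed to contain no duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gift_certificate (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE gift_certificate__tag (
        certificate_id INTEGER NOT NULL REFERENCES gift_certificate (id),
        tag_id INTEGER NOT NULL REFERENCES tag (id),
        PRIMARY KEY (certificate_id, tag_id)
    );
    -- Hypothetical sample rows.
    INSERT INTO gift_certificate (id, name) VALUES
        (1, 'both'), (2, 'puppy only'), (3, 'none');
    INSERT INTO tag (id, name) VALUES
        (1, 'puppy'), (2, 'something'), (3, 'other');
    INSERT INTO gift_certificate__tag (certificate_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
""")

wanted = ["puppy", "something"]  # assumed to be free of duplicates
placeholders = ", ".join("?" for _ in wanted)
query = f"""
    SELECT gc.id
    FROM gift_certificate AS gc
    WHERE gc.id IN (
        SELECT joint.certificate_id
        FROM gift_certificate__tag AS joint
        JOIN tag ON joint.tag_id = tag.id
        WHERE tag.name IN ({placeholders})
        GROUP BY joint.certificate_id
        -- every wanted tag must be present for this certificate
        HAVING COUNT(DISTINCT tag.name) = ?
    )
    ORDER BY gc.id
"""
matching = [row[0] for row in conn.execute(query, (*wanted, len(wanted)))]
print(matching)
```

Only the placeholder list is built dynamically; the tag names themselves are passed as bound parameters, so adding a third tag means changing `wanted` and nothing else.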

Left Join Specific Row

This is probably fairly simple, but I cannot find the right search terms. If it is a duplicate, great, because I'm sure there will be an answer somewhere.
I have the following tables set up:
CREATE TABLE IF NOT EXISTS `customer` (
`id` int(6) unsigned auto_increment NOT NULL,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `billing_run` (
`id` int(6) unsigned auto_increment NOT NULL,
`date` datetime NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `invoice` (
`id` int(6) unsigned auto_increment NOT NULL,
`billing_run_id` int(6) unsigned NOT NULL,
`customer_id` int(6) unsigned NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (billing_run_id) REFERENCES billing_run(id),
FOREIGN KEY (customer_id) REFERENCES customer(id)
) DEFAULT CHARSET=utf8;
with the following data
insert into customer (name) values ('test customer');
insert into billing_run (date) values ('2019-01-01 12:00:00');
insert into billing_run (date) values ('2019-02-01 12:00:00');
insert into billing_run (date) values ('2019-03-01 12:00:00');
insert into invoice (customer_id,billing_run_id) values (1,1);
SQLFiddle here -> http://sqlfiddle.com/#!9/a54162/5
I want to get the customer records that do not have an invoice related to the billing_run with an id of 2.
My query
select c.id from customer c
left join invoice i on i.customer_id = c.id
left join billing_run br on br.id = i.billing_run_id and br.id = 2
where i.id is null
returns 0 records. Why?
First you join the table customer (1 row) with the table invoice (1 row).
This join will return 1 row because there is a match between the columns in the ON clause:
on i.customer_id = c.id
(both i.customer_id and c.id have the value 1 in your sample data).
So there is no row where i.id is null.
The next join, to the table billing_run, is a left join, so it does not change the rows already produced by the first two tables.
So the condition:
where i.id is null
returns no rows.
The correct condition (which you had in the original fiddle) is:
where br.id is null
because the join to the table billing_run will return a non matching row for the condition:
on br.id = i.billing_run_id and br.id = 2
because there is no i.billing_run_id = 2 in invoice.
You will want an exclusive WHERE clause; this returns the one row that you want.
select * from customer c
where c.id not in (Select customer_id from invoice i LEFT JOIN billing_run br on
i.billing_run_id=br.id WHERE br.id=2 and br.id is not null)
http://sqlfiddle.com/#!9/a54162/14
You don't need the billing_run table. So I think you intend:
select c.id
from customer c left join
invoice i
on i.customer_id = c.id and i.billing_run_id = 2
where i.id is null
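The same "customers with no invoice in run 2" question can also be phrased with NOT EXISTS, which some people find easier to read than the left join plus IS NULL test. Here is a small sketch using the sample data from the question, run through Python's sqlite3 module only so that it is self-contained; the helper name `customers_without_invoice` is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE billing_run (id INTEGER PRIMARY KEY, date TEXT NOT NULL);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        billing_run_id INTEGER NOT NULL REFERENCES billing_run (id),
        customer_id INTEGER NOT NULL REFERENCES customer (id)
    );
    -- Same rows as in the question.
    INSERT INTO customer (name) VALUES ('test customer');
    INSERT INTO billing_run (date) VALUES
        ('2019-01-01 12:00:00'), ('2019-02-01 12:00:00'), ('2019-03-01 12:00:00');
    INSERT INTO invoice (customer_id, billing_run_id) VALUES (1, 1);
""")

query = """
    SELECT c.id
    FROM customer AS c
    WHERE NOT EXISTS (
        SELECT 1
        FROM invoice AS i
        WHERE i.customer_id = c.id
          AND i.billing_run_id = ?
    )
"""

def customers_without_invoice(run_id):
    """Return ids of customers that have no invoice in the given billing run."""
    return [row[0] for row in conn.execute(query, (run_id,))]

print(customers_without_invoice(2))  # the customer has no invoice in run 2
print(customers_without_invoice(1))  # the customer does have one in run 1
```

As with the last query above, the billing_run table is not needed, because invoice already carries billing_run_id.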

MYSQL: left Join and sum two tables where one table has two columns referring to the first table

I am trying to create a procedure where my transfer table is joined to my account table. In my transfer table, there are two FK columns that reference the account table id column.
account table:
CREATE TABLE account (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(30) NOT NULL,
number VARCHAR(30) NOT NULL DEFAULT '',
description VARCHAR(255) NOT NULL DEFAULT '',
is_active BIT(1) NOT NULL DEFAULT b'1',
PRIMARY KEY (id),
UNIQUE account_name (name, number)
);
transfer table:
CREATE TABLE transfer (
id INT NOT NULL AUTO_INCREMENT,
date DATE NOT NULL,
from_account INT NULL,
to_account INT NULL,
amount DECIMAL(12, 2) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (from_account)
REFERENCES account(id),
FOREIGN KEY (to_account)
REFERENCES account(id)
);
get_account procedure:
CREATE PROCEDURE get_account()
SELECT a.*,
(SUM(t.amount) - SUM(f.amount)) AS balance
FROM account a
LEFT JOIN transfer f
ON a.id = f.from_account
LEFT JOIN transfer t
ON a.id = t.to_account
GROUP BY a.id;
I am trying to subtract the total of the from_account column from the total of the to_account column. I can get the sum of just one column, but when I try to get both, it returns NULL.
This seems like it should be easy, but I can't figure it out.
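Two things are probably going on here. SUM over a side with no matching rows yields NULL, and NULL in a subtraction makes the whole balance NULL. Joining transfer twice also multiplies rows: an account with one incoming and two outgoing transfers produces two joined rows, so the incoming amount is counted twice. One possible way around both, sketched below, is to aggregate each direction in its own derived table and wrap the totals in COALESCE. The sketch uses Python's sqlite3 module with a reduced schema and invented rows; the aliases `total_in` and `total_out` are made up for the example, and whether this fits the real procedure is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE transfer (
        id INTEGER PRIMARY KEY,
        from_account INTEGER REFERENCES account (id),
        to_account INTEGER REFERENCES account (id),
        amount NUMERIC NOT NULL
    );
    -- Hypothetical sample rows.
    INSERT INTO account (id, name) VALUES
        (1, 'checking'), (2, 'savings'), (3, 'unused');
    INSERT INTO transfer (from_account, to_account, amount) VALUES
        (NULL, 1, 100),  -- deposit into checking
        (1, 2, 30),      -- checking to savings
        (1, 2, 20);      -- checking to savings
""")

query = """
    SELECT a.id,
           COALESCE(t.total_in, 0) - COALESCE(f.total_out, 0) AS balance
    FROM account AS a
    LEFT JOIN (
        -- one row per receiving account, so the join cannot multiply rows
        SELECT to_account AS account_id, SUM(amount) AS total_in
        FROM transfer
        GROUP BY to_account
    ) AS t ON t.account_id = a.id
    LEFT JOIN (
        -- one row per sending account
        SELECT from_account AS account_id, SUM(amount) AS total_out
        FROM transfer
        GROUP BY from_account
    ) AS f ON f.account_id = a.id
    ORDER BY a.id
"""
balances = {account_id: balance for account_id, balance in conn.execute(query)}
print(balances)
```

With these rows the original double join would report 150 for account 1 (the 100 deposit counted twice, minus 50), while the pre-aggregated version gives 50, and the account with no transfers gets 0 instead of NULL.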

How to properly index table

We used to index our tables based on the WHERE clause. That worked fine in our MSSQL days, but now we are using MySQL and things are different: subqueries have terrible performance. Consider this table:
# 250K records per day
create table t_101(
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`transaction_date` datetime not null,
`memo_1` nvarchar(255) not null,
`memo_2` nvarchar(255) not null,
`product_id` bigint not null,
#many more columns
PRIMARY KEY (`id`),
key `index.t_101.101`(`transaction_date`, `product_id`, `memo_1`),
key `index.t_101.102`(`transaction_date`, `product_id`, `memo_2`)
)ENGINE=MyIsam;
A temporary table where I store condition values:
# 150 records
create temporary table `temporary.user.accessibleProducts`
(
product_id bigint not null,
PRIMARY KEY (`product_id`)
)Engine=MyIsam;
And this is the original query:
select
COUNT(a.id) as rowCount_la1,
COUNT(DISTINCT a.product_id) as productCount
from t_101 a
where a.transaction_date = '2017-05-01'
and a.product_id in(select xa.product_id from `temporary.user.accessibleProducts` xa)
and a.memo_1 <> '';
It takes 7 seconds to execute, while this query:
select
COUNT(a.id) as rowCount_la1,
COUNT(DISTINCT a.product_id) as productCount
from t_101 a
inner join `temporary.user.accessibleProducts` b on b.product_id = a.product_id
where a.transaction_date = '2017-05-01'
and a.memo_1 <> '';
takes 0.063 seconds to execute. Even though 0.063 seconds is acceptable, I'm worried about the indexes. Given the above, how do I index t_101 properly?
We are using MySQL 5.5.42.

MySQL using IN/FIND_IN_SET to read multiple rows in sub query

I have two tables, locations and location_groups:
CREATE TABLE locations (
location_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(63) UNIQUE NOT NULL
);
INSERT INTO locations (name)
VALUES
('london'),
('bristol'),
('exeter');
CREATE TABLE location_groups (
location_group_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
location_ids VARCHAR(255) NOT NULL,
user_ids VARCHAR(255) NOT NULL,
name VARCHAR(63) NOT NULL
);
INSERT INTO location_groups (location_ids, user_ids, name)
VALUES
('1', '1,2,4', 'south east'),
('2,3', '2', 'south west');
What I am trying to do is return all location_ids for all of the location_groups where the given user_id exists. I'm using CSV to store the location_ids and user_ids in the location_groups table. I know this isn't normalised, but this is how the database is and it's out of my control.
My current query is:
SELECT location_id
FROM locations
WHERE FIND_IN_SET(location_id,
(SELECT location_ids
FROM location_groups
WHERE FIND_IN_SET(2,location_groups.user_ids)) )
Now this works fine if the user_id = 1, for example (as only one location_groups row is returned), but if I search for user_id = 2, I get an error saying the subquery returns more than 1 row, which is expected, as user 2 is in 2 location_groups. I understand why the error is being thrown; I'm trying to work out how to solve it.
To clarify: when searching for user_id 1 in location_groups.user_ids, the location_id 1 should be returned. When searching for user_id 2, the location_ids 1, 2 and 3 should be returned.
I know this is a complicated query so if anything isn't clear just let me know. Any help would be appreciated! Thank you.
You could use GROUP_CONCAT to combine the location_ids in the subquery.
SELECT location_id
FROM locations
WHERE FIND_IN_SET(location_id,
(SELECT GROUP_CONCAT(location_ids)
FROM location_groups
WHERE FIND_IN_SET(2,location_groups.user_ids)) )
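One thing to keep in mind with GROUP_CONCAT is that MySQL truncates its result at group_concat_max_len (1024 by default), so a user who belongs to many groups could silently lose locations. A join on FIND_IN_SET avoids building the combined string at all. The sketch below runs the idea against SQLite from Python so that it is self-contained; SQLite has no FIND_IN_SET, so a rough Python stand-in (`find_in_set`, written for this example and not a faithful copy of every MySQL edge case) is registered under that name. In MySQL the built-in function would be used directly.

```python
import sqlite3

def find_in_set(needle, csv):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position, or 0 if absent."""
    if needle is None or csv is None:
        return None
    items = csv.split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE locations (
        location_id INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    INSERT INTO locations (name) VALUES ('london'), ('bristol'), ('exeter');
    CREATE TABLE location_groups (
        location_group_id INTEGER PRIMARY KEY,
        location_ids TEXT NOT NULL,
        user_ids TEXT NOT NULL,
        name TEXT NOT NULL
    );
    INSERT INTO location_groups (location_ids, user_ids, name) VALUES
        ('1', '1,2,4', 'south east'),
        ('2,3', '2', 'south west');
""")

query = """
    SELECT DISTINCT l.location_id
    FROM locations AS l
    JOIN location_groups AS g
      ON FIND_IN_SET(l.location_id, g.location_ids) > 0
    WHERE FIND_IN_SET(?, g.user_ids) > 0
    ORDER BY l.location_id
"""

def locations_for_user(user_id):
    """Return the location ids reachable through any group containing user_id."""
    return [row[0] for row in conn.execute(query, (user_id,))]

print(locations_for_user(1))
print(locations_for_user(2))
```

DISTINCT is there because a location could appear in more than one of the user's groups. Neither variant can use an index on the CSV columns, so both will scan location_groups.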
Alternatively, use the problems with writing the query as an example of why normalization is good. Heck, even if you do use this query, it will run more slowly than a query on properly normalized tables; you could use that to show why the tables should be restructured.
For reference (and for other readers), here's what a normalized schema would look like (some additional alterations to the base tables are included).
The compound fields in the location_groups table could simply be separated into additional rows to achieve 1NF, but this wouldn't be in 2NF, as the name column would be dependent on only the location part of the (location, user) candidate key. (Another way of thinking of this is the name is an attribute of the regions, not the relations between regions/groups, locations and users.)
Instead, these columns will be split off into two additional tables for 1NF: one to connect locations and regions, and one to connect users and regions. It may be that the latter should be a relation between users and locations (rather than regions), but that's not the case with the current schema (which could be another problem of the current, non-normalized schema). The region-location relation is one-to-many (since each location is in one region). From the sample data, we see the region-user relation is many-many. The location_groups table then becomes the region table.
-- normalized from `location_groups`
CREATE TABLE regions (
`id` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`name` VARCHAR(63) UNIQUE NOT NULL
);
-- slightly altered from original
CREATE TABLE locations (
`id` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`name` VARCHAR(63) UNIQUE NOT NULL
);
-- missing from original sample
CREATE TABLE users (
`id` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`name` VARCHAR(63) UNIQUE NOT NULL
);
-- normalized from `location_groups`
CREATE TABLE location_regions (
`region` INT UNSIGNED,
`location` INT UNSIGNED UNIQUE NOT NULL,
PRIMARY KEY (`region`, `location`),
FOREIGN KEY (`region`)
REFERENCES regions (id)
ON DELETE restrict ON UPDATE cascade,
FOREIGN KEY (`location`)
REFERENCES locations (id)
ON DELETE cascade ON UPDATE cascade
);
-- normalized from `location_groups`
CREATE TABLE user_regions (
`region` INT UNSIGNED NOT NULL,
`user` INT UNSIGNED NOT NULL,
PRIMARY KEY (`region`, `user`),
FOREIGN KEY (`region`)
REFERENCES regions (id)
ON DELETE restrict ON UPDATE cascade,
FOREIGN KEY (`user`)
REFERENCES users (id)
ON DELETE cascade ON UPDATE cascade
);
Sample data:
INSERT INTO regions (`name`)
VALUES
('South East'),
('South West'),
('North East'),
('North West');
INSERT INTO locations (`name`)
VALUES
('London'),
('Bristol'),
('Exeter'),
('Hull');
INSERT INTO users (`name`)
VALUES
('Alice'),
('Bob'),
('Carol'),
('Dave'),
('Eve');
-- ---- Location-Region relation ----
-- temporary table used to map natural keys to surrogate keys
CREATE TEMPORARY TABLE loc_rgns (
`location` VARCHAR(63) UNIQUE NOT NULL,
`region` VARCHAR(63) NOT NULL
);
-- Hull added to demonstrate correctness of desired query
INSERT INTO loc_rgns (region, location)
VALUES
('South East', 'London'),
('South West', 'Bristol'),
('South West', 'Exeter'),
('North East', 'Hull');
-- map natural keys to surrogate keys for final relationship
INSERT INTO location_regions (`location`, `region`)
SELECT loc.id, rgn.id
FROM locations AS loc
JOIN loc_rgns AS lr ON loc.name = lr.location
JOIN regions AS rgn ON rgn.name = lr.region;
-- ---- User-Region relation ----
-- temporary table used to map natural keys to surrogate keys
CREATE TEMPORARY TABLE usr_rgns (
`user` INT UNSIGNED NOT NULL,
`region` VARCHAR(63) NOT NULL,
UNIQUE (`user`, `region`)
);
-- user 3 added in order to demonstrate correctness of desired query
INSERT INTO usr_rgns (`user`, `region`)
VALUES
(1, 'South East'),
(2, 'South East'),
(2, 'South West'),
(3, 'North West'),
(4, 'South East');
-- map natural keys to surrogate keys for final relationship
INSERT INTO user_regions (`user`, `region`)
SELECT ur.`user`, rgn.id
FROM usr_rgns AS ur
JOIN regions AS rgn ON rgn.name = ur.region;
Now, the desired query for the normalized schema:
SELECT DISTINCT loc.id
FROM locations AS loc
JOIN location_regions AS lr ON loc.id = lr.location
JOIN user_regions AS ur ON lr.region = ur.region
WHERE ur.`user` = 2
;
Result:
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+