I am trying to write the query below in Vertica:
`SELECT a.*
FROM a
WHERE a.country = 'India'
  AND a.language = 'Hindi'
  AND (CASE
         WHEN (a.spoken = true
               AND EXISTS (SELECT 1
                           FROM b
                           WHERE b.country = a.country
                             AND b.language = a.language
                             AND (CASE
                                    WHEN (a.population < b.population
                                          AND a.statsyear > b.statsyear)
                                      THEN true -- pick recent stats
                                    WHEN (a.population > b.population)
                                      THEN true
                                    ELSE false
                                  END)))
           THEN true
         WHEN (a.written = true)
           THEN true
         ELSE false
       END)`
It does not work, because the outer-query field "a.population" cannot be referenced inside the CASE expression of the inner query. I tried rewriting it with an OR clause, but Vertica does not allow that either.
How can I rewrite this?
I created the tables below on a local MySQL box.
Example tables and results:
CREATE TABLE TableA
(
id INT,
country VARCHAR(20),
language VARCHAR(20),
spoken INT,
written INT,
population INT,
stats INT
);
insert into TableA values(1,'India','Hindi',1,0,9,2010);
insert into TableA values(2,'India','Hindi',1,0,11,2011);
insert into TableA values(3,'India','Hindi',1,0,10,2012);
insert into TableA values(4,'India','Hindi',0,1,10,2013);
insert into TableA values(5,'India','Hindi',1,1,10,2012);
insert into TableA values(6,'India','English',1,1,10,2012);
CREATE TABLE TableB
(
id INT,
country VARCHAR(20),
language VARCHAR(20),
population INT,
stats INT
);
insert into TableB values(1,'India','Hindi',10,2009);
insert into TableB values(2,'India','Hindi',10,2011);
insert into TableB values(3,'India','Hindi',10,2012);
I rewrote the query in a slightly different way:
select distinct a.id
from (
SELECT a.*
FROM TableA a
WHERE a.country="India"
AND a.language ="Hindi" ) a, TableB b
WHere ( CASE WHEN a.written=1 THEN
TRUE
WHEN ( (a.spoken = 1) AND (a.country=b.country) AND (a.language=b.language)) THEN
(case WHEN ((a.population < b.population) AND (a.stats > b.stats)) THEN
TRUE
WHEN (a.population > b.population) THEN
TRUE
ELSE
FALSE
END)
ELSE
FALSE
END)
I got the results below:
1,2,4,5
This is what I need. Could you please help me write it in a more efficient manner?
Boolean logic equivalent:
SELECT DISTINCT a.*
FROM TableA a
left join TableB b on a.country=b.country AND a.language=b.language
WHERE a.country='India'
AND a.language ='Hindi'
AND (
a.written=1
OR
(a.spoken = 1 AND (
(a.population < b.population AND a.stats > b.stats)
OR
a.population > b.population
))
)
;
Result:
+----+---------+----------+--------+---------+------------+-------+
| id | country | language | spoken | written | population | stats |
+----+---------+----------+--------+---------+------------+-------+
| 1 | India | Hindi | 1 | 0 | 9 | 2010 |
| 2 | India | Hindi | 1 | 0 | 11 | 2011 |
| 4 | India | Hindi | 0 | 1 | 10 | 2013 |
| 5 | India | Hindi | 1 | 1 | 10 | 2012 |
+----+---------+----------+--------+---------+------------+-------+
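The same boolean logic can also be written as a semi-join with EXISTS, which avoids the DISTINCT that the join needs. The following is a minimal sketch that runs the sample data through SQLite via Python's standard `sqlite3` module; SQLite stands in for MySQL here, and whether Vertica accepts this particular correlated EXISTS has not been verified.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INT, country TEXT, language TEXT,
                     spoken INT, written INT, population INT, stats INT);
CREATE TABLE TableB (id INT, country TEXT, language TEXT,
                     population INT, stats INT);
INSERT INTO TableA VALUES
  (1,'India','Hindi',1,0,9,2010), (2,'India','Hindi',1,0,11,2011),
  (3,'India','Hindi',1,0,10,2012), (4,'India','Hindi',0,1,10,2013),
  (5,'India','Hindi',1,1,10,2012), (6,'India','English',1,1,10,2012);
INSERT INTO TableB VALUES
  (1,'India','Hindi',10,2009), (2,'India','Hindi',10,2011),
  (3,'India','Hindi',10,2012);
""")

# written = 1 qualifies on its own; spoken = 1 qualifies only if some
# matching TableB row satisfies one of the two population conditions.
query = """
SELECT a.id
FROM TableA a
WHERE a.country = 'India'
  AND a.language = 'Hindi'
  AND (a.written = 1
       OR (a.spoken = 1
           AND EXISTS (SELECT 1
                       FROM TableB b
                       WHERE b.country = a.country
                         AND b.language = a.language
                         AND ((a.population < b.population AND a.stats > b.stats)
                              OR a.population > b.population))))
ORDER BY a.id
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)  # [1, 2, 4, 5]
```

Because each TableA row is tested once against TableB rather than multiplied by it, no duplicate rows are produced in the first place.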
I am trying to select columns whose names are stored as the content of another column. I'm using MySQL 5.6.
Let's say I have "table1":
+------+------------+------------+---------------+---------------+
| id | val_int1 | val_int2 | val_string1 | val_string2 |
+------+------------+------------+---------------+---------------+
| 1 | 70 | 88 | xxx | yyy |
+------+------------+------------+---------------+---------------+
And "table2":
+------+--------+----------+
| id | type | ref_id |
+------+--------+----------+
| 10 | i1 | 1 |
| 20 | s2 | 1 |
+------+--------+----------+
What I want to do is join table1 and table2, where the table2.type field contains the name of the table1 column I want to select. An additional problem is that the type field only contains abbreviations, which I have to expand.
This ends up in the following SQL statement:
SELECT
t1.id,
IF(t2.type REGEXP 'i[0-9]+', REPLACE(t2.type, 'i', 'val_int'), REPLACE(t2.type, 's', 'val_string'))
FROM
table1 t1, table2 t2
WHERE
t1.id = t2.ref_id AND t1.id = 1
The result is that the REPLACE functions return val_int1 and val_string2 as plain strings; they are not treated as column names.
What I really expect is:
+-----+-------+
| 1 | 70 |
| 1 | yyy |
+-----+-------+
You need some sort of case expression:
select t1.id,
(case when t2.type = 'i1' then cast(t1.val_int1 as char)
when t2.type = 'i2' then cast(t1.val_int2 as char)
when t2.type = 's1' then t1.val_string1
when t2.type = 's2' then t1.val_string2
end) as val
from table1 t1 join
table2 t2
on t1.id = t2.ref_id;
You are likely to complain "oh, I have so many columns". Basically, too bad. You have a poor database design. You are trying to do a partial match on strings and column names. Even a dynamic SQL solution is not very feasible.
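As a minimal sketch of the CASE approach, the following runs the question's sample rows through SQLite using Python's standard `sqlite3` module. SQLite is only a stand-in for MySQL 5.6 here, so the cast target is `TEXT` rather than MySQL's `CHAR`, and the join condition on `ref_id` is taken from the question's own WHERE clause.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INT, val_int1 INT, val_int2 INT,
                     val_string1 TEXT, val_string2 TEXT);
CREATE TABLE table2 (id INT, type TEXT, ref_id INT);
INSERT INTO table1 VALUES (1, 70, 88, 'xxx', 'yyy');
INSERT INTO table2 VALUES (10, 'i1', 1), (20, 's2', 1);
""")

# Each abbreviation is mapped explicitly to one column; the integer
# columns are cast so every branch yields the same (text) type.
query = """
SELECT t1.id,
       CASE t2.type
         WHEN 'i1' THEN CAST(t1.val_int1 AS TEXT)
         WHEN 'i2' THEN CAST(t1.val_int2 AS TEXT)
         WHEN 's1' THEN t1.val_string1
         WHEN 's2' THEN t1.val_string2
       END AS val
FROM table1 t1
JOIN table2 t2 ON t2.ref_id = t1.id
WHERE t1.id = 1
ORDER BY t2.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, '70'), (1, 'yyy')]
```

An abbreviation with no matching branch yields NULL, which makes unmapped types easy to spot in the output.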
Using a CASE expression, this is how I would solve it (SQL Server syntax):
DECLARE @table1 TABLE
(
id INT,
val_int1 INT,
val_int2 INT,
val_string1 NVARCHAR(100),
val_string2 NVARCHAR(100)
)
INSERT INTO @table1 VALUES
(1,70,88,'xxx','yyy')
DECLARE @table2 TABLE
(
id INT,
type NVARCHAR(MAX),
ref_id INT
)
INSERT INTO @table2 VALUES
(10,'i1',1),
(20,'s2',1)
SELECT
id,
CASE WHEN type = 'i1' THEN CAST((SELECT TOP 1 val_int1 FROM @table1) AS NVARCHAR(100)) ELSE
CASE WHEN type = 'i2' THEN CAST((SELECT TOP 1 val_int2 FROM @table1) AS NVARCHAR(100)) ELSE
CASE WHEN type = 's1' THEN (SELECT TOP 1 val_string1 FROM @table1) ELSE
(SELECT TOP 1 val_string2 FROM @table1) END END END
FROM @table2 t2
OUTPUT:
10 70
20 yyy
I have a table like the below:
Site | Name | ID
A | Mike | 1
A | Mary | 2
A | Mary | 3
B | Mary | 1
B | Rich | 2
I'd like to find all the duplicate names within a Site. So I'm trying to return:
Site | Name | ID
A | Mary | 2
A | Mary | 3
I've tried this:
SELECT DISTINCT Site, Name, ID
from table
group by ID having count(*) > 1
The results are wrong because Sites A and B are counted together. I want to find duplicates only within each Site, not across Sites.
You can use exists or in:
select t.*
from t
where exists (select 1
from t t2
where t2.site = t.site and t2.name = t.name and t2.id <> t.id
);
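To check the EXISTS query against the sample data, here is a minimal sketch using Python's standard `sqlite3` module, with SQLite as a stand-in for the asker's database. The second query, a join against a grouped subquery, is an alternative added here for comparison and is not part of the answer above.

```python
import sqlite3

# In-memory SQLite database holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (site TEXT, name TEXT, id INT);
INSERT INTO t VALUES ('A','Mike',1), ('A','Mary',2), ('A','Mary',3),
                     ('B','Mary',1), ('B','Rich',2);
""")

# A row is a duplicate if another row shares its site and name.
exists_sql = """
SELECT t.site, t.name, t.id
FROM t
WHERE EXISTS (SELECT 1
              FROM t t2
              WHERE t2.site = t.site AND t2.name = t.name AND t2.id <> t.id)
ORDER BY t.site, t.id
"""

# Alternative: find (site, name) pairs occurring more than once, then
# join back to fetch every row of each such pair.
grouped_sql = """
SELECT t.site, t.name, t.id
FROM t
JOIN (SELECT site, name
      FROM t
      GROUP BY site, name
      HAVING COUNT(*) > 1) d
  ON d.site = t.site AND d.name = t.name
ORDER BY t.site, t.id
"""
exists_rows = conn.execute(exists_sql).fetchall()
grouped_rows = conn.execute(grouped_sql).fetchall()
print(exists_rows)   # [('A', 'Mary', 2), ('A', 'Mary', 3)]
print(grouped_rows)  # [('A', 'Mary', 2), ('A', 'Mary', 3)]
```

Note that the EXISTS form relies on `id` differing between the duplicate rows, while the grouped form only counts rows per (site, name) pair.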
You can try NOT EXISTS with a subquery that uses HAVING.
Duplicate names, ignoring Site (every row except the one with the lowest ID for each name):
CREATE TABLE T(
Site varchar(5),
Name varchar(5),
ID int
);
insert into t values ('A','Mike', 1);
insert into t values ('A','Mary', 2);
insert into t values ('A','Mary', 3);
insert into t values ('B','Mary', 1);
insert into t values ('B','Rich', 2);
Query 1:
SELECT * FROM T t1 WHERE
NOT exists
(
SELECT 1
FROM T tt
where t1.name = tt.name
group by Name
HAVING MIN(tt.ID) = t1.ID
)
Results:
| Site | Name | ID |
|------|------|----|
| A | Mary | 2 |
| A | Mary | 3 |
Duplicates within each Site (note that this keeps only the later rows of each duplicate group, not the row with the lowest ID):
Query 1:
SELECT t1.*
FROM T t1
WHERE not exists
(
SELECT 1
FROM T tt
where t1.name = tt.name and t1.Site = tt.Site
group by Name,Site
HAVING MIN(tt.ID) = t1.ID
)
Results:
| Site | Name | ID |
|------|------|----|
| A | Mary | 3 |
I have a fairly big table (10,000+ records) that looks more or less like this:
| id | name | contract_no | status |
|----|-------|-------------|--------|
| 1 | name1 | 1022 | A |
| 2 | name2 | 1856 | B |
| 3 | name3 | 1322 | C |
| 4 | name4 | 1322 | C |
| 5 | name5 | 1322 | D |
contract_no is a foreign key which of course can appear in several records and each record will have a status of either A, B, C, D or E.
What I want is to get a list of all the contract numbers, where ALL the records referencing that contract are in status C, D, E, or a mix of those, but if any of the records are in status A or B, omit that contract number.
Is it possible to do this with a SQL query? Or would it be better to export the data and run this analysis in another language such as Python or R?
Post aggregate filtering should do the trick
SELECT contract_no FROM t
GROUP BY contract_no
HAVING SUM(status='A')=0
AND SUM(status='B')=0
You can use group by with having to get such contract numbers.
select contract_no
from yourtable
group by contract_no
having count(distinct case when status in ('C','D','E') then status end) >= 1
and count(case when status = 'A' then 1 end) = 0
and count(case when status = 'B' then 1 end) = 0
Not as elegant as the other two answers, but more expressive:
SELECT DISTINCT contract_no
FROM the_table t1
WHERE NOT EXISTS (
SELECT *
FROM the_table t2
WHERE t2.contract_no = t1.contract_no
AND t2.status IN ('A', 'B')
)
Or
SELECT DISTINCT contract_no
FROM the_table
WHERE contract_no NOT IN (
SELECT contract_no
FROM the_table
WHERE status IN ('A', 'B')
)
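The approaches above can be compared on the question's sample rows. This is a minimal sketch using Python's standard `sqlite3` module with SQLite as a stand-in for MySQL; the table name `t` is illustrative, and the two status checks of the first answer are folded into a single `IN` test.

```python
import sqlite3

# In-memory SQLite database holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INT, name TEXT, contract_no INT, status TEXT);
INSERT INTO t VALUES
  (1,'name1',1022,'A'), (2,'name2',1856,'B'),
  (3,'name3',1322,'C'), (4,'name4',1322,'C'), (5,'name5',1322,'D');
""")

# Post-aggregate filter: the comparison yields 0 or 1 per row, so the
# sum counts how many rows of the contract are in status A or B.
having_sql = """
SELECT contract_no
FROM t
GROUP BY contract_no
HAVING SUM(status IN ('A', 'B')) = 0
ORDER BY contract_no
"""

# Anti-join: keep contracts for which no row in status A or B exists.
not_exists_sql = """
SELECT DISTINCT contract_no
FROM t t1
WHERE NOT EXISTS (SELECT 1
                  FROM t t2
                  WHERE t2.contract_no = t1.contract_no
                    AND t2.status IN ('A', 'B'))
ORDER BY contract_no
"""
having_result = [r[0] for r in conn.execute(having_sql)]
not_exists_result = [r[0] for r in conn.execute(not_exists_sql)]
print(having_result)      # [1322]
print(not_exists_result)  # [1322]
```

Summing a boolean comparison works in MySQL and SQLite; on databases without that behaviour, the `count(case when ... end)` form shown earlier is the portable equivalent.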
I have exports of person data that I would like to import into a table with historization.
I wrote the individual SQL steps, but two questions arise:
1. There is a step where I get an unexpected date.
2. I would like to avoid submitting the steps manually and use a stored procedure instead.
The tables are:
Table to be filled with historization:
CREATE TABLE person (
id INTEGER DEFAULT NULL
, name VARCHAR(50) DEFAULT NULL
, effective_dt DATE DEFAULT NULL
, expiry_dt DATE DEFAULT NULL
);
Table with person data to be imported:
CREATE TABLE person_stg (
id INTEGER DEFAULT NULL
, name VARCHAR(50) DEFAULT NULL
, export_dt DATE DEFAULT NULL
, import_flag TINYINT DEFAULT 0
);
-- Several exports which have to be imported
INSERT INTO person_stg (id, name, export_dt) VALUES
(1,'Jonn' , '2000-01-01')
, (2,'Marry' , '2000-01-01')
, (1,'John' , '2000-01-05')
, (2,'Marry' , '2000-01-06')
, (2,'Mary' , '2000-01-10')
, (3,'Samuel', '2000-01-10')
, (2,'Maria' , '2000-01-15')
;
The following first step (1) populates the table person with the first state of the person:
INSERT INTO person
SELECT a.id, a.name, a.export_dt, '9999-12-31' expiry_dt
FROM person_stg a
LEFT JOIN person_stg b
ON a.id = b.id
AND a.export_dt > b.export_dt
WHERE b.id IS NULL
;
SELECT * FROM person ORDER BY id, effective_dt;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 9999-12-31 |
| 2 | Marry | 2000-01-01 | 9999-12-31 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
Step (2) changes the expiry date:
-- (2) Update expiry_dt where changes happened
UPDATE
person a
, person_stg b
SET a.expiry_dt = SUBDATE(b.export_dt,1)
WHERE a.id = b.id
AND a.name <> b.name
AND a.expiry_dt = '9999-12-31'
AND b.export_dt = (SELECT MIN(b.export_dt)
FROM person_stg c
WHERE b.id = c.id
AND c.import_flag = 0
)
;
SELECT * FROM person ORDER BY id, effective_dt;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
The third step (3) inserts the second status of person data:
-- (3) Insert new exports which have changes
INSERT INTO person
SELECT a.id, a.name, a.export_dt, '9999-12-31' expiry_dt
FROM person_stg a
INNER JOIN person b
ON a.id = b.id
AND b.expiry_dt = SUBDATE(a.export_dt,1)
AND a.export_dt > b.effective_dt
AND a.import_flag = 0
;
SELECT * FROM person ORDER BY id, effective_dt;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 1 | John | 2000-01-05 | 9999-12-31 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 2 | Mary | 2000-01-10 | 9999-12-31 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
And the last step (4) flags in person_stg which records were imported:
-- (4) Define imported records
UPDATE
person_stg a
, person b
SET import_flag = 1
WHERE a.id = b.id
AND a.export_dt = b.effective_dt
;
So far, so good. If I repeat step (2), I get the following table:
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 1 | John | 2000-01-05 | 9999-12-31 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 2 | Mary | 2000-01-10 | 1999-12-31 | <--- ??? Should be 2000-01-14
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
Mary/2000-01-10 got expiry_dt 1999-12-31 instead of 2000-01-14. I don't understand how this can happen.
So, my questions are:
(1a) Why does this update of the expiry date produce this strange date?
(1b) Is there perhaps better code than (2)?
(2) How can I repeat steps (2) to (4) automatically? I only need some hints for a stored procedure.
If I understand what you want to do, you don't need a multi-step process. You are just looking for the "end date" for each record. Here is a method that uses correlated subqueries:
SELECT p.*, export_dt as effdate,
COALESCE((SELECT export_dt - interval 1 day
FROM person_stg p2
WHERE p2.id = p.id AND
p2.export_dt > p.export_dt
ORDER BY p2.export_dt
LIMIT 1
), '9999-12-31') as enddate
FROM person_stg p;
You can also do something using variables.
I'm not sure if this answers your question, because it replaces the whole process with a simpler query.
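To see what the correlated subquery produces for the staging data, here is a minimal sketch using Python's standard `sqlite3` module. SQLite stands in for MySQL, so `export_dt - interval 1 day` is written as `date(export_dt, '-1 day')`, and dates are stored as ISO text (an assumption of this sketch).

```python
import sqlite3

# In-memory SQLite database holding the question's staging rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_stg (id INT, name TEXT, export_dt TEXT);
INSERT INTO person_stg VALUES
  (1,'Jonn','2000-01-01'), (2,'Marry','2000-01-01'),
  (1,'John','2000-01-05'), (2,'Marry','2000-01-06'),
  (2,'Mary','2000-01-10'), (3,'Samuel','2000-01-10'),
  (2,'Maria','2000-01-15');
""")

# For each export, the expiry date is the day before the next export of
# the same id, or the open-ended date if there is no later export.
query = """
SELECT p.id, p.name, p.export_dt AS effective_dt,
       COALESCE((SELECT date(p2.export_dt, '-1 day')
                 FROM person_stg p2
                 WHERE p2.id = p.id AND p2.export_dt > p.export_dt
                 ORDER BY p2.export_dt
                 LIMIT 1), '9999-12-31') AS expiry_dt
FROM person_stg p
ORDER BY p.id, p.export_dt
"""
history = conn.execute(query).fetchall()
for row in history:
    print(row)
```

One difference from the multi-step process: every export becomes its own row, so the unchanged "Marry" export of 2000-01-06 ends the first "Marry" row on 2000-01-05 instead of 2000-01-09. If unchanged re-exports should be merged, they would have to be filtered out first.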
I found a solution using a cursor, which I had never used before. First I wrote a stored procedure (SP), sp_add_record, which, given an id and export_dt from person_stg, updates the expiry date, inserts a new status, or inserts a new element. This stored procedure is then called from a second SP that uses a cursor (curs_add_records):
CALL curs_add_records();
SELECT * FROM person;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 1 | John | 2000-01-05 | 9999-12-31 |
| 2 | Mary | 2000-01-10 | 2000-01-14 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
| 2 | Maria | 2000-01-15 | 9999-12-31 |
+----+--------+--------------+------------+
The advantage of this procedure is that I can load the table with the same code, regardless of whether it is an initial load (population load) or an incremental one.
Literature I used:
Djoni Darmawikarta: Dimensional Data Warehousing with MySQL (DWH issues)
Ben Forta: MariaDB Crash Course (SP issues)
What follows are the SPs I used.
PS: Was it appropriate to answer my own question?
DELIMITER //
DROP PROCEDURE IF EXISTS sp_add_record //
CREATE PROCEDURE sp_add_record(
IN p_id INTEGER
, IN p_export_dt DATE
)
BEGIN
-- Change expiry_dt
UPDATE
person p
, person_stg s
SET p.expiry_dt = SUBDATE(p_export_dt,1)
WHERE p.id = s.id
AND p.id = p_id
AND s.export_dt = p_export_dt
AND p.effective_dt <= p_export_dt
AND ( p.name <> s.name )
AND p.expiry_dt = '9999-12-31'
;
-- Add new status
INSERT INTO person
SELECT s.id, s.name, s.export_dt, '9999-12-31' expiry_dt
FROM
person p
, person_stg s
WHERE p.id = s.id
AND p.id = p_id
AND s.export_dt = p_export_dt
AND ( p.name <> s.name )
-- does an entry exist with the new expiry_dt?
AND EXISTS (SELECT *
FROM person p2
WHERE p2.id = p.id
AND p.expiry_dt = SUBDATE(p_export_dt,1)
)
-- an entry with an open expiry_dt should not exist
AND NOT EXISTS (SELECT *
FROM person p3
WHERE p3.id = p.id
AND p3.expiry_dt = '9999-12-31'
)
;
-- Add new id
INSERT INTO person
SELECT s.id, s.name, s.export_dt, '9999-12-31' expiry_dt
FROM person_stg s
WHERE s.export_dt = p_export_dt
AND s.id = p_id
-- Add new id from stage if it does not exist in person
AND s.id NOT IN (SELECT p3.id
FROM person p3
WHERE p3.id = s.id
AND p3.expiry_dt = '9999-12-31'
)
;
END
//
DELIMITER ;
DELIMITER //
DROP PROCEDURE IF EXISTS curs_add_records //
CREATE PROCEDURE curs_add_records()
BEGIN
-- Local variables
DECLARE done BOOLEAN DEFAULT 0;
DECLARE p_id INTEGER;
DECLARE p_export_dt DATE;
-- Cursor
DECLARE c1 CURSOR
FOR
SELECT id, export_dt
FROM person_stg
ORDER BY export_dt, id
;
-- Declare continue handler
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
-- Open cursor
OPEN c1;
-- Loop through all rows
REPEAT
-- Get record
FETCH c1 INTO p_id, p_export_dt;
-- Call add record procedure only if a row was actually fetched
IF NOT done THEN
CALL sp_add_record(p_id,p_export_dt);
END IF;
-- End of loop
UNTIL done END REPEAT;
-- Close cursor
CLOSE c1;
END;
//
DELIMITER ;
I have 2 tables.
CREATE TABLE designs
( game_id INT NOT NULL,
des_id INT NOT NULL,
PRIMARY KEY(game_id, des_id),
FOREIGN KEY(game_id) REFERENCES Game(id)
ON UPDATE CASCADE);
CREATE TABLE designer
( name VARCHAR(30) NOT NULL,
id INT NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY(id) REFERENCES designs(des_id)
ON UPDATE CASCADE);
Let's say I have this data:
designs:
0---0
0---1
1---2
2---3
2---4
.............................
designer:
Bob---0
Jill---1
Bob---2
Rob---3
Jill---4
After the update, I would like the "designs" table to look like:
0---0
0---1
1---0
2---3
2---1
What update query would I need to accomplish this?
Some queries I tried are:
UPDATE designs
SET des_id = (
SELECT a.id
FROM designer as a
JOIN designer as b
ON a.name=b.name AND a.id < b.id
WHERE des_id = b.id);
...
UPDATE `designs` as a
JOIN designer as b
ON a.des_id=b.id
SET a.des_id = b.id
WHERE b.id = (
SELECT c.id
FROM designer as c
LEFT JOIN designer as d
ON c.name=d.name
WHERE c.id<d.id)
Here's one idea. Note that it uses an undocumented hack in the form of a 'group by/order by' trick:
UPDATE designs d
JOIN
( select d1.id matcher_id
, d2.id select_id
from `designer` d1
JOIN designer d2
ON d1.name = d2.name
group
by d1.id
Order
by d2.id
) x
ON x.matcher_id = d.des_id
SET d.des_id = select_id
Your LEFT JOIN idea is almost right, but here's another idea which is faster...
DROP TABLE IF EXISTS designs;
CREATE TABLE designs
( game_id INT NOT NULL
, designer_id INT NOT NULL
, PRIMARY KEY(game_id, designer_id)
);
DROP TABLE IF EXISTS designers;
CREATE TABLE designers
( name VARCHAR(30) NOT NULL
, designer_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
);
INSERT INTO designs VALUES
(1,1),
(1,2),
(2,3),
(3,4),
(3,5);
INSERT INTO designers VALUES
('Bob',1),
('Jill',2),
('Bob',3),
('Rob',4),
('Jill',5);
SELECT * FROM designs;
+---------+-------------+
| game_id | designer_id |
+---------+-------------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 3 | 5 |
+---------+-------------+
SELECT * FROM designers;
+------+-------------+
| name | designer_id |
+------+-------------+
| Bob | 1 |
| Jill | 2 |
| Bob | 3 |
| Rob | 4 |
| Jill | 5 |
+------+-------------+
UPDATE designs g
JOIN designers d
ON d.designer_id = g.designer_id
JOIN designers x ON x.name = d.name
JOIN
( SELECT name
, MIN(designer_id) min_designer_id
FROM designers
GROUP
BY name
) y
ON y.name = x.name
AND y.min_designer_id = x.designer_id
SET g.designer_id = x.designer_id;
SELECT * FROM designs;
+---------+-------------+
| game_id | designer_id |
+---------+-------------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |
| 3 | 4 |
+---------+-------------+
Actually, in the special case of an UPDATE, I think this will work just as well, and I'm not really sure that it's any less performant...
UPDATE designs g
JOIN designers x
ON x.designer_id = g.designer_id
JOIN designers y
ON y.name = x.name
AND y.designer_id < x.designer_id
SET g.designer_id = y.designer_id;
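For engines without multi-table UPDATE syntax, the same remapping can be expressed with a correlated subquery. The following is a minimal sketch using Python's standard `sqlite3` module with the sample data from this answer; SQLite is a stand-in here, and the subquery form is a substitute for the UPDATE ... JOIN shown above rather than the same statement.

```python
import sqlite3

# In-memory SQLite database holding the answer's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE designs (game_id INT NOT NULL, designer_id INT NOT NULL,
                      PRIMARY KEY (game_id, designer_id));
CREATE TABLE designers (name TEXT NOT NULL, designer_id INTEGER PRIMARY KEY);
INSERT INTO designs VALUES (1,1), (1,2), (2,3), (3,4), (3,5);
INSERT INTO designers VALUES
  ('Bob',1), ('Jill',2), ('Bob',3), ('Rob',4), ('Jill',5);
""")

# For each design row, look up the designer's name and replace the id
# with the lowest id that carries the same name.
conn.execute("""
UPDATE designs
SET designer_id = (SELECT MIN(x.designer_id)
                   FROM designers d
                   JOIN designers x ON x.name = d.name
                   WHERE d.designer_id = designs.designer_id)
""")
rows = conn.execute(
    "SELECT game_id, designer_id FROM designs ORDER BY game_id, designer_id"
).fetchall()
print(rows)  # [(1, 1), (1, 2), (2, 1), (3, 2), (3, 4)]
```

Two caveats apply to this sketch: a design row whose designer_id is missing from designers would receive NULL and violate the NOT NULL constraint, and if one game listed both ids of the same person, the update would collide with the primary key.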