get more fields from multiple things in sql - mysql

I need to get several pieces of information in a single query, if possible.
Let's say these are the columns:
Name - c1 - c2 - c3 - c4
What I need is the top value in each column. So given rows like
Paulus - 50 - 0 - 0 - 0
John - 0 - 50 - 0 - 0
Anne - 0 - 0 - 50 - 0
Chris - 0 - 0 - 0 - 50
And my query should return something like:
Paulus - c1 (50) - John - c2 (50) - Anne - c3 (50) - Chris - c4 (50)
Name - c1 - name - c2
I've tried: SELECT Name, c1, Name, c2 FROM table ORDER BY c1 DESC, c2 DESC
But it just doesn't work. I know it all looks vague, but I hope someone can understand my question.

You could create a stored procedure or function that builds a temporary table in a WHILE loop, adding one column per iteration, and returns the result at the end.
However, I recommend a different approach: fetch the data as it is and reshape it in the view layer, since the change is only needed for display to the end user.

You can do it with a subquery and series of self joins. A subquery will return the max numbers within the c1, ..., c4 columns. Then join your table 4x on the previous subquery to get the record(s) where c1, ..., c4 columns match the maximums.
However, please heed @strawberry's advice and normalise your data structure.
I'll give you a sample code for 2 fields:
select t1.name as maxc1name, m.maxc1, t2.name as maxc2name, m.maxc2 from
(select max(c1) as maxc1, max(c2) as maxc2 from mytable) m
inner join mytable t1 on m.maxc1=t1.c1
inner join mytable t2 on m.maxc2=t2.c2
The catch: what happens if there is a tie in any of the columns, so that more than one record in your table matches one of the max values? You have not defined what you want to do in that case, so I'm not dealing with it either.
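For what it's worth, here is a rough sketch of the two-column sample extended to all four columns. It runs against an in-memory SQLite database from Python purely so it can be tried without a server (that substitution is my assumption, not part of the answer above); the SQL itself uses nothing SQLite-specific, so it should carry over to MySQL.

```python
import sqlite3

# In-memory SQLite is a stand-in for MySQL; table and column names follow the sample above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, c1 INT, c2 INT, c3 INT, c4 INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [("Paulus", 50, 0, 0, 0), ("John", 0, 50, 0, 0),
     ("Anne", 0, 0, 50, 0), ("Chris", 0, 0, 0, 50)],
)

# One subquery computes the four maxima; each self join finds the row(s) holding one of them.
query = """
SELECT t1.name, m.maxc1, t2.name, m.maxc2, t3.name, m.maxc3, t4.name, m.maxc4
FROM (SELECT MAX(c1) AS maxc1, MAX(c2) AS maxc2,
             MAX(c3) AS maxc3, MAX(c4) AS maxc4
      FROM mytable) AS m
JOIN mytable AS t1 ON t1.c1 = m.maxc1
JOIN mytable AS t2 ON t2.c2 = m.maxc2
JOIN mytable AS t3 ON t3.c3 = m.maxc3
JOIN mytable AS t4 ON t4.c4 = m.maxc4
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With ties, every combination of tied rows comes back as its own result row, so the row count multiplies.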

Leaving the schema requirement as is, try:
create table `c_data` (
`Name` varchar(24) not null,
`c1` int(5) not null,
`c2` int(5) not null,
`c3` int(5) not null,
`c4` int(5) not null
);
insert c_data values
('Paulus', 50, 50, 0, 0),
('John', 0, 50, 0, 0),
('Anne', 0, 0, 50, 0),
('Chris', 0, 0, 0, 50);
select
name,
case greatest(c1, c2, c3, c4)
when c1 then concat('c1', ' (', c1, ')')
when c2 then concat('c2', ' (', c2, ')')
when c3 then concat('c3', ' (', c3, ')')
when c4 then concat('c4', ' (', c4, ')')
end as top_column
from c_data
order by name;
As @Shadow states, you still have to decide how to handle ties. This query just takes the first column with the shared max value (see Paulus).
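To make the tie rule concrete, here is a small Python sketch of what the CASE expression does for a single row. The function name and the dict layout are made up for illustration; only the "first matching column wins" behaviour is taken from the query.

```python
def top_column(row):
    """Mimic CASE GREATEST(c1..c4) WHEN c1 ... END for one row (hypothetical helper)."""
    columns = ["c1", "c2", "c3", "c4"]           # checked in this order, like the WHEN branches
    best = max(row[c] for c in columns)          # GREATEST(c1, c2, c3, c4)
    first = next(c for c in columns if row[c] == best)  # first WHEN branch that matches
    return f"{first} ({best})"

paulus = {"c1": 50, "c2": 50, "c3": 0, "c4": 0}  # tie between c1 and c2
john = {"c1": 0, "c2": 50, "c3": 0, "c4": 0}
print(top_column(paulus), top_column(john))
```

Paulus has a tie between c1 and c2, and c1 is reported only because it is checked first.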
BUT, consider normalizing your table as has been suggested. The data will be stored more efficiently and the table will be more versatile for future queries. Ties still need to be handled. This version takes the first data_type from an ascending alphanumeric sort.
create table c_data
(
name varchar(24),
data_type varchar(8),
data_value int(5)
);
insert into c_data values
('Paulus', 'c1', 50),
('Paulus', 'c2', 50),
('John', 'c2', 50),
('Anne', 'c3', 50),
('Chris', 'c4', 50);
select
name,
concat(min(data_type), ' (', max(data_value), ')') as top_column
from c_data
group by name
order by name;

Related

How to INSERT a value based on the current date and a generated sequence number in MySQL?

I have this MySQL table:
CREATE TABLE bills
(
id_interess INT UNSIGNED NOT NULL,
id_bill VARCHAR(30) NULL,
PRIMARY KEY (id_interess)
) ENGINE=InnoDB;
And now I want to be able to manually insert unique integer for id_interess and automatically generate id_bill so that it consists of a current date and an integer (integer resets on a new year using trigger) like this:
id_interess |id_bill    |
------------+-----------+
1           |20170912-1 |
2           |20171030-2 |
6           |20171125-3 |
10          |20171231-4 |
200         |20180101-1 |
3           |20180101-2 |
8           |20180102-3 |
If anyone has direct solution to this using only one query, I would be very glad! I only came up with a solution that uses three queries, but I still get some errors...
My newbie attempt: I created an additional column id_bill_tmp which holds integer part of id_bill like this:
CREATE TABLE bills
(
id_interess INT UNSIGNED NOT NULL,
id_bill_tmp INT UNSIGNED NULL,
id_bill VARCHAR(30) NULL,
PRIMARY KEY (id_interess)
) ENGINE=InnoDB;
Table from above would in this case look like this (note that on new year id_bill_tmp is reset to 1 and therefore I can't use AUTO_INCREMENT which can only be used on keys and keys need unique values in a column):
id_interess |id_bill_tmp |id_bill    |
------------+------------+-----------+
1           |1           |20170912-1 |
2           |2           |20171030-2 |
6           |3           |20171125-3 |
10          |4           |20171231-4 |
200         |1           |20180101-1 |
3           |2           |20180101-2 |
8           |3           |20180102-3 |
So for example to insert 1st row from the above table, table would have to be empty, and I would insert a value in three queries like this:
1st query:
INSERT INTO bills (id_interess) VALUES (1);
I do this first because I don't know how to increment a nonexistent value for id_bill_tmp and this helped me to first get id_bill_tmp = NULL:
id_interess |id_bill_tmp |id_bill    |
------------+------------+-----------+
1           |[NULL]      |[NULL]     |
2nd query
Now I try to increment id_bill_tmp to become 1 - I tried two queries both fail saying:
table is specified twice both as a target for 'update' and as a separate source for data
These are the queries I tried:
UPDATE bills
SET id_bill_tmp = (SELECT IFNULL(id_bill_tmp, 0)+1 AS id_bill_tmp FROM bills)
WHERE id_interess = 1;
UPDATE bills
SET id_bill_tmp = (SELECT max(id_bill_tmp)+1 FROM bills)
WHERE id_interess = 1;
3rd query:
The final step would be to reuse id_bill_tmp as integer part of id_bill like this:
UPDATE bills
SET id_bill = concat(curdate()+0,'-',id_bill_tmp)
WHERE id_interess = 1;
so that I finally get
id_interess |id_bill_tmp |id_bill    |
------------+------------+-----------+
1           |1           |20170912-1 |
So if anyone can help me with the 2nd query or even present a solution with a single query or even without using column id_bill_tmp it would be wonderful.
Solution #1 - with the extra column
Demo
http://rextester.com/GOTPA70741
SQL
INSERT INTO bills (id_interess, id_bill_tmp, id_bill) VALUES (
1, -- (Change this value appropriately for each insert)
IF(LEFT((SELECT id_bill FROM
(SELECT MAX(CONCAT(LEFT(id_bill, 8),
LPAD(SUBSTR(id_bill, 10), 10, 0))) AS id_bill
FROM bills) b1), 4) = DATE_FORMAT(CURDATE(),'%Y'),
IFNULL(
(SELECT id_bill_tmp
FROM (SELECT id_bill_tmp
FROM bills
WHERE CONCAT(LEFT(id_bill, 8),
LPAD(SUBSTR(id_bill, 10), 10, 0)) =
(SELECT MAX(CONCAT(LEFT(id_bill, 8),
LPAD(SUBSTR(id_bill, 10), 10, 0)))
FROM bills)) b2),
0),
0)
+ 1,
CONCAT(DATE_FORMAT(CURDATE(),'%Y%m%d'), '-' , id_bill_tmp));
Notes
The query looks slightly more complicated than it actually is because MySQL won't let you directly use a subselect from the same table that's being inserted into. This is circumvented by wrapping another subselect around it, as described here.
Solution #2 - without the extra column
Demo
http://rextester.com/IYES40010
SQL
INSERT INTO bills (id_interess, id_bill) VALUES (
1, -- (Change this value appropriately for each insert)
CONCAT(DATE_FORMAT(CURDATE(),'%Y%m%d'),
'-' ,
IF(LEFT((SELECT id_bill
FROM (SELECT MAX(CONCAT(LEFT(id_bill, 8),
LPAD(SUBSTR(id_bill, 10), 10, 0))) AS id_bill
FROM bills) b1), 4) = DATE_FORMAT(CURDATE(),'%Y'),
IFNULL(
(SELECT id_bill_tmp
FROM (SELECT SUBSTR(MAX(CONCAT(LEFT(id_bill, 8),
LPAD(SUBSTR(id_bill, 10), 10, 0))), 9)
AS id_bill_tmp
FROM bills) b2),
0),
0)
+ 1));
Notes
This is along the same lines as above but gets the numeric value that would have been in id_bill_tmp by extracting from the right part of id_bill from the 10th character position onwards via SUBSTR(id_bill, 10).
Step by step breakdown
1. CONCAT(...) assembles the string by concatenating its parts together.
2. DATE_FORMAT(CURDATE(),'%Y%m%d') formats the current date as yyyymmdd (e.g. 20170923).
3. The IF(..., <x>, <y>) is used to check whether the most recent date that is already present is for the current year: if it is, the numeric part should continue by incrementing the sequence; otherwise it is reset to 1.
4. LEFT(<date>, 4) gets the year from the most recent date by extracting the first 4 characters of id_bill.
5. SELECT MAX(...) AS id_bill FROM bills gets the most recent date + sequence number from id_bill and gives this an alias of id_bill. (See the notes above about why the subquery also needs to be given an alias (b1) and then wrapped in another SELECT.) See the two steps below for how a string is constructed such that MAX can be used for the ordering.
6. CONCAT(LEFT(id_bill, 8), ...) constructs a string that can be used for the above ordering by combining the date part with the sequence number padded with zeros, e.g. 201709230000000001.
7. LPAD(SUBSTR(id_bill, 10), 10, 0) pads the sequence number with zeros (e.g. 0000000001) so that MAX can be used for the ordering. (See the comment by Paul Spiegel to understand why this needs to be done, e.g. so that sequence number 10 is ordered just after 9 rather than just after 1.)
8. DATE_FORMAT(CURDATE(),'%Y') formats the current date as a year (e.g. 2017) for the IF comparison mentioned in (3) above.
9. IFNULL(<x>, <y>) is used for the very first row: no existing row will be found, so the result will be NULL. In this case the numeric part should begin at 1.
10. SELECT SUBSTR(MAX(...), 9) AS id_bill_tmp FROM bills selects the most recent date + sequence number from id_bill (as described above) and then extracts its sequence number, which always starts at character position 9. Again, this subquery needs to be aliased (b2) and wrapped in another SELECT.
11. + 1 increments the sequence number. (Note that this is always done, since 0 is used in the cases described above where the sequence number should be set to 1.)
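If it helps to see the breakdown above without the nested subselects, here is the same logic as a plain Python sketch. The function name and the list-of-strings input are my own assumptions for illustration; this is not a replacement for the INSERT, just the decision logic spelled out.

```python
from datetime import date

def next_bill_id(existing_ids, today):
    """Return the next 'yyyymmdd-n' id; n restarts at 1 in a new year (illustrative helper)."""
    def sort_key(bill_id):
        day, seq = bill_id.split("-")
        # Same idea as CONCAT(LEFT(id_bill, 8), LPAD(SUBSTR(id_bill, 10), 10, 0)):
        # zero-pad the sequence so that 10 sorts after 9, not after 1.
        return day + seq.zfill(10)

    seq = 0                                        # the IFNULL(..., 0) / IF(..., ..., 0) default
    if existing_ids:
        latest = max(existing_ids, key=sort_key)   # SELECT MAX(...)
        if latest[:4] == today.strftime("%Y"):     # LEFT(..., 4) = current year
            seq = int(latest.split("-")[1])
    return f"{today:%Y%m%d}-{seq + 1}"             # the final + 1 and CONCAT

print(next_bill_id(["20170912-9", "20170913-10"], date(2017, 12, 31)))
```

Like the SQL, it relies on the most recent id being the largest one, so it assumes rows are inserted in date order.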
If you are certain to be inserting in chronological order, then this will both bump the number and eliminate the need for the annual trigger:
DROP FUNCTION IF EXISTS fcn46309431;
DELIMITER //
CREATE FUNCTION fcn46309431 (_max VARCHAR(22))
RETURNS VARCHAR(22)
DETERMINISTIC
SQL SECURITY INVOKER
BEGIN
RETURN
CONCAT(DATE_FORMAT(CURDATE(), "%Y%m%d"), '-',
IF( LEFT(_max, 4) = YEAR(CURDATE()),
SUBSTRING_INDEX(_max, '-', -1) + 1,
1 ) );
END
//
DELIMITER ;
INSERT INTO se46309431 (id_interess, id_bill)
SELECT 149, fcn46309431(MAX(id_bill)) FROM se46309431;
SELECT * FROM se46309431;
(If you might insert out of date order, then the MAX(..) can mess up.)
A similar solution is shown here: https://www.percona.com/blog/2008/04/02/stored-function-to-generate-sequences/
What you could do is to create a sequence with table, as shown there:
delimiter //
create function seq(seq_name char (20)) returns int
begin
update seq set val=last_insert_id(val+1) where name=seq_name;
return last_insert_id();
end
//
delimiter ;
CREATE TABLE `seq` (
`name` varchar(20) NOT NULL,
`val` int(10) unsigned NOT NULL,
PRIMARY KEY (`name`)
)
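Then you need to populate the sequence values for each year. Seed each counter with 0, because seq() increments the value before returning it, so the first call then yields 1:
insert into seq values('2017',0);
insert into seq values('2018',0);
insert into seq values('2019',0);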
...
(only need to do this once)
Finally, this should work:
insert into bills (id_interess, id_bill)
select
123,
concat(date_format(now(), '%Y%m%d-'), seq(date_format(now(), '%Y')));
Just replace 123 with some real/unique/dynamic id and you should be good to go.
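Here is a rough sketch of the same per-year counter idea, run from Python against an in-memory SQLite database so it can be tried without a server. The helper names are made up, and INSERT OR IGNORE is SQLite syntax that I am using to create a year's counter on first use; treat both as assumptions rather than part of the answer above.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL seq table
conn.execute("CREATE TABLE seq (name TEXT PRIMARY KEY, val INTEGER NOT NULL)")

def next_val(conn, seq_name):
    """Increment the named counter and return its new value (hypothetical helper)."""
    with conn:  # one transaction: commit on success, roll back on error
        # Create the counter at 0 on first use, so the first value handed out is 1.
        conn.execute("INSERT OR IGNORE INTO seq (name, val) VALUES (?, 0)", (seq_name,))
        conn.execute("UPDATE seq SET val = val + 1 WHERE name = ?", (seq_name,))
        (val,) = conn.execute("SELECT val FROM seq WHERE name = ?", (seq_name,)).fetchone()
    return val

def make_bill_id(conn, day):
    """Build 'yyyymmdd-n' where n comes from the counter named after the year."""
    return f"{day:%Y%m%d}-{next_val(conn, day.strftime('%Y'))}"

first = make_bill_id(conn, date(2017, 12, 30))
second = make_bill_id(conn, date(2017, 12, 31))
new_year = make_bill_id(conn, date(2018, 1, 1))
print(first, second, new_year)
```

Because each year has its own counter row, the reset on New Year happens without any trigger.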
I think you should redesign your approach to make life easier.
I would design your table as follows:
id_interess |id_counter |id_bill |
------------+--------------+-----------+
1 |1 |20170912 |
2 |2 |20171231 |
3 |1 |20180101 |
Your desired output for the first row would be "20170912-1", but you would merge id_counter and id_bill in your SQL-Query or in your application logic, not directly in a table (here is why).
Now you can write your SQL-Statements for that table.
Furthermore, I would advise not to store the counter in the table. You should only read the records' id and date from your database and calculate the id_counter in your application (or even in your SQL-Query).
You could also declare your column id_counter as auto_increment and reset it each time, see here.
One approach that needs only a single query is to store just the bill date in the table whenever you insert a record, and to generate the sequence number for id_bill when you display the records.
Schema
CREATE TABLE IF NOT EXISTS `bill` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`bill_date` date NULL
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Query
select a.id,concat(DATE_FORMAT(a.bill_date,"%Y%m%d"),'-',a.no) id_bill
from(
select b.*,count(b2.bill_date) no
from bill b
join bill b2 ON (EXTRACT(YEAR FROM b.bill_date) = EXTRACT(YEAR FROM b2.bill_date)
and b.bill_date >= b2.bill_date)
group by b.id
order by b.bill_date,no
) a
The inner query returns the rank of each record within its year by joining the table to itself; the outer query just formats the data as desired.
If there can be more than one entry for the same date, the auto-increment id column can be used in the inner query to handle this case.
Updated Query
select a.id,concat(DATE_FORMAT(a.bill_date,"%Y%m%d"),'-',a.no) id_bill
from(
select b.*,count(b2.bill_date) no
from bill b
join bill b2 ON (EXTRACT(YEAR FROM b.bill_date) = EXTRACT(YEAR FROM b2.bill_date)
and b.id >= b2.id)
group by b.id
order by b.bill_date,no
) a
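As a sanity check of the self-join ranking, here is a sketch that runs from Python against in-memory SQLite. The sample rows are invented, and strftime and || are SQLite substitutes for EXTRACT/DATE_FORMAT and CONCAT. Unlike the updated query, it ranks by date first and uses id only to break ties within a date, which is my own variation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.execute("CREATE TABLE bill (id INTEGER PRIMARY KEY, bill_date TEXT)")
conn.executemany(
    "INSERT INTO bill VALUES (?, ?)",
    [(1, "2017-09-12"), (2, "2017-10-30"), (3, "2017-10-30"),
     (4, "2018-01-01"), (5, "2018-01-02")],  # made-up sample data
)

# For each bill, count the bills of the same year that come at or before it.
query = """
SELECT b.id,
       strftime('%Y%m%d', b.bill_date) || '-' || COUNT(*) AS id_bill
FROM bill AS b
JOIN bill AS b2
  ON strftime('%Y', b2.bill_date) = strftime('%Y', b.bill_date)
 AND (b2.bill_date < b.bill_date
      OR (b2.bill_date = b.bill_date AND b2.id <= b.id))
GROUP BY b.id, b.bill_date
ORDER BY b.id
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

The two bills dated 2017-10-30 get consecutive numbers, and the counter starts again at 1 for 2018.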
The following solution requires generated (virtual) columns (available in MySQL 5.7 and MariaDB).
CREATE TABLE bills (
id_interess INT UNSIGNED NOT NULL,
bill_dt DATETIME DEFAULT CURRENT_TIMESTAMP,
bill_year YEAR AS (year(bill_dt)),
year_position INT UNSIGNED NULL,
id_bill VARCHAR(30) AS (concat(date_format(bill_dt, '%Y%m%d-'), year_position)),
PRIMARY KEY (id_interess),
INDEX (bill_year, year_position)
) ENGINE=InnoDB;
bill_year and id_bill are not stored in the table. They are derived from other columns. However - bill_year is stored in the index, which we need to get the last position for a specific year efficiently (it would also work without the index).
To insert a new row with the current timestamp:
insert into bills(id_interess, year_position)
select 1, coalesce(max(year_position), 0) + 1
from bills
where bill_year = year(now());
You can also use a custom timestamp or date:
insert into bills(id_interess, bill_dt, year_position)
select 10, '2016-01-01', coalesce(max(year_position), 0) + 1
from bills
where bill_year = year('2016-01-01')
Demo: https://www.db-fiddle.com/f/8pFKQb93LqNPNaD5UhzVwu/0
To get even simpler inserts, you can create a trigger which will calculate year_position:
CREATE TRIGGER bills_after_insert BEFORE INSERT ON bills FOR EACH ROW
SET new.year_position = (
SELECT coalesce(max(year_position), 0) + 1
FROM bills
WHERE bill_year = year(coalesce(new.bill_dt, now()))
);
Now your insert statement would look like:
insert into bills(id_interess) values (1);
or
insert into bills(id_interess, bill_dt) values (11, '2016-02-02');
And the select statements:
select id_interess, id_bill
from bills
order by id_bill;
Demo: https://www.db-fiddle.com/f/55yqMh4E1tVxbpt9HXnBaS/0
Update
If you really, really need to keep your schema, you can try the following insert statement:
insert into bills(id_interess, id_bill)
select
@id_interess,
concat(
date_format(@date, '%Y%m%d-'),
coalesce(max(substr(id_bill, 10) + 1), 1)
)
from bills
where id_bill like concat(year(@date), '%');
Replace @id_interess and @date accordingly. For @date you can use CURDATE() but also any other date you want. There is no issue inserting dates out of order. You can even insert dates from 2016 when entries for 2017 already exist.
Demo: http://rextester.com/BXK47791
The LIKE condition in the WHERE clause can use an index on id_bill (if you define one), so the query only needs to read the entries from the same year. But there is no way to determine the last counter value efficiently with this schema. The engine will need to read all rows for the specified year, extract the counter and search for the MAX value. Besides the complexity of the insert statement, this is one more reason to change the schema.

How to select all records based on non-duplication of one column

I just cannot seem to find an answer for this deceptively simple question. Most every solution either deletes all the duplicates, selects all the duplicates, or selects all the records except the duplicates. How can I select all rows such that, in this example, the "name" column values are unique, while selecting the first record of any duplicate set and ignoring the remaining duplicates of that same name? I do need all the values from all the columns in all the records in the selected record set.
Given the set of records:
pk fk name secs note
1 100 cat 90 gray
2 111 dog 123 mix
3 233 fish 75 gold
4 334 dog 932 black
5 238 cow 90 stray
6 285 cat 90 stray
The returned set should be:
pk fk name secs note
1 100 cat 90 gray
2 111 dog 123 mix
3 233 fish 75 gold
5 238 cow 90 stray
-- SQL
drop table if exists foo;
create table foo (
pk int unsigned,
fk int unsigned,
name varchar(10),
secs int,
note varchar(10),
primary key (pk)
) engine=innodb default charset=utf8;
insert into foo
(pk, fk, name, secs, note)
values
(1, 100, 'cat', 90, 'gray'),
(2, 111, 'dog', 123, 'mix'),
(3, 233, 'fish', 75, 'gold'),
(4, 334, 'dog', 932, 'black'),
(5, 238, 'cow', 90, 'stray'),
(6, 285, 'cat', 90, 'stray');
Here is a query that returns what you're looking for:
SELECT
F.*
FROM
foo F
INNER JOIN
( SELECT
MIN(F2.pk) AS `pk`
,F2.name
FROM foo F2
GROUP BY F2.name
) T
ON T.pk = F.pk;
Hope this will help you.
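I think you are looking for this:
select * from foo group by name;
Note that MySQL only accepts this when the ONLY_FULL_GROUP_BY SQL mode is disabled, and which row of each group is returned is not guaranteed.
In PostgreSQL the same thing can be written with DISTINCT ON:
select distinct on (name) * from foo order by name, pk;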
With window functions (SQL Server, or MySQL 8.0 and later) it will be
select pk, fk, name, secs, note
from (select row_number()
over(partition by name order by pk) as rn, foo.*
from foo) as tbl
where rn=1
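One more variant that needs neither GROUP BY nor window functions: keep a row only if no row with the same name has a smaller pk. This is just a sketch, run from Python against in-memory SQLite so it can be tried quickly; the NOT EXISTS query is plain SQL and should behave the same on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.execute(
    "CREATE TABLE foo (pk INTEGER PRIMARY KEY, fk INT, name TEXT, secs INT, note TEXT)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?, ?)",
    [(1, 100, "cat", 90, "gray"), (2, 111, "dog", 123, "mix"),
     (3, 233, "fish", 75, "gold"), (4, 334, "dog", 932, "black"),
     (5, 238, "cow", 90, "stray"), (6, 285, "cat", 90, "stray")],
)

# Keep a row only when no earlier row (smaller pk) shares its name.
query = """
SELECT f.pk, f.fk, f.name, f.secs, f.note
FROM foo AS f
WHERE NOT EXISTS (SELECT 1 FROM foo AS g
                  WHERE g.name = f.name AND g.pk < f.pk)
ORDER BY f.pk
"""
first_rows = conn.execute(query).fetchall()
for row in first_rows:
    print(row)
```

"First" here means lowest pk, which matches the expected output in the question.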

how to avoid missing rows? and strange case of Distinct not working in mysql

a very simple SQL statement is returning two rows instead of the one i would expect.
select distinct(F_NAME) from tab2 where F_NAME like '%SMITH%'
F_NAME
CLAIRE_SMITH
CLAIRE_SMITH
The text is the same, tried lowercase, trim and some other text functions to see if i can find a difference, but no joy. I've also re-entered the data by hand and by using an update function. I've also checked character encoding to make sure nothing weird was happening, both are varchar(90) and latin1_general_ci
The name actually exists in two tables I am looking at with around 5 rows in the tab1 table and 100 row in tab2
The problem came to light when joining^1 the tab1 and tab2 tables on F_NAME=F_NAME: CLAIRE_SMITH didn't appear in the results, yet every other person in tab1 and tab2 is returned.
^1 right join, implicit join, left join, right join, left outer join and right outer join.
tab1
F_NAME, F_DEPTNO, F_AGE
CLAIRE_SMITH, 1, 17
BOB_JONES, 2, 37,
SUE_JENKINS, 2, 29,
tab2
F_ID, F_NAME, F_VALUE1, F_VALUE2, F_VALUE3
1, CLAIRE_SMITH, 10, 11, 15
2, BOB_JONES, 15, 11, 15
3, SUE_JENKINS, 20, 13, 14
4, CLAIRE_SMITH, 10, 11, 15
5, BOB_JONES, 15, 11, 15
6, SUE_JENKINS, 20, 13, 14
What I am trying to do is sum the values in tab2 and group by F_NAME, whilst adding in some values from tab1. Unfortunately there is no foreign key in tab2 that I can use to aid the join; the data is provided as is and it is all I have to work with.
my query outputs correctly for all people except CLAIRE_SMITH who does not appear.
SELECT a.F_DEPTNO, a.F_NAME, sum(b.F_VALUE1), sum(b.F_VALUE2), sum(b.F_VALUE3)
FROM TAB1 a, TAB2 b WHERE a.F_NAME=b.F_NAME
GROUP BY a.F_ID, a.F_NAME
ORDER BY a.F_ID ASC
can anyone explain what might be happening?
1) the name appears to be the same but is returned as two distinct instances
2) the join excludes this person.
thanks
My reputation is too low to comment. This query might be interesting:
select id1.f_name = id4.f_name as names_equal
from ( select f_name from tab2 where F_ID=1 ) id1
cross join ( select f_name from tab2 where F_ID=4 ) id4
This would indicate whether the db considers those two values equal (query will return 1) or not (query will return 0)
Use
select hex(F_NAME) from tab2 where F_NAME like '%SMITH%'
in order to find the difference.
The HEX function returns a hexadecimal string representation of the value, so any invisible character (e.g. a trailing space) is revealed.
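Here is a small sketch of what that looks like in practice, using Python with in-memory SQLite as a stand-in for MySQL. The trailing non-breaking space in the second row is purely my assumption to reproduce the symptom; the real data may contain some other invisible character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.execute("CREATE TABLE tab2 (F_ID INTEGER PRIMARY KEY, F_NAME TEXT)")
conn.executemany(
    "INSERT INTO tab2 VALUES (?, ?)",
    [(1, "CLAIRE_SMITH"),
     (4, "CLAIRE_SMITH\u00a0")],  # assumed culprit: a trailing non-breaking space
)

# DISTINCT sees two different values even though they print identically.
distinct_names = conn.execute(
    "SELECT DISTINCT F_NAME FROM tab2 WHERE F_NAME LIKE '%SMITH%'"
).fetchall()

# hex() exposes the extra bytes at the end of the second value.
hex_rows = conn.execute("SELECT F_ID, hex(F_NAME) FROM tab2 ORDER BY F_ID").fetchall()
for f_id, hexed in hex_rows:
    print(f_id, hexed)
```

In this sketch the second value ends in C2A0 (the UTF-8 bytes of a non-breaking space); in a latin1 column the same character would show up as a single A0.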
DISTINCT is not a function.
That aside, I cannot reproduce this problem. Perhaps there's something you've neglected to mention...
CREATE TABLE tab2
( F_ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, F_NAME VARCHAR(15) NOT NULL
, F_VALUE1 INT NOT NULL
, F_VALUE2 INT NOT NULL
, F_VALUE3 INT NOT NULL
);
INSERT INTO tab2 VALUES
(1, 'CLAIRE_SMITH', 10, 11, 15),
(2, 'BOB_JONES', 15, 11, 15),
(3, 'SUE_JENKINS', 20, 13, 14),
(4, 'CLAIRE_SMITH', 10, 11, 15),
(5, 'BOB_JONES', 15, 11, 15),
(6, 'SUE_JENKINS', 20, 13, 14);
SELECT DISTINCT f_name FROM tab2 WHERE f_name LIKE '%smith%';
+--------------+
| f_name |
+--------------+
| CLAIRE_SMITH |
+--------------+
http://sqlfiddle.com/#!2/d3b27/1

Coalesce equivalent for nth not null value - MySQL

I have been tearing my hair out over this issue. I am working with an existing data set and need to remove all the null values from the columns in table A and shunt them across so they are ordered like in table B
I need something which is equivalent to Coalesce but to retrieve the nth value so I can get the result sorted like in table B
What I have:
Table A
Name  CURRENT  OCT12  SEPT12  AUG12  JUL12   JUN12   MAY12  APR12
------------------------------------------------------------------
A     NULL     NULL   Aug-12  NULL   NULL    Jun-12  NULL   Apr-12
B     Nov-12   NULL   Aug-12  NULL   Jul-12  Jun-12  NULL   Apr-12
What I need:
Table B
Name  Change1  Change2  Change3  Change4  Change5  Change6
----------------------------------------------------------
A     Aug-12   Jun-12   Apr-12   NULL     NULL     NULL
B     Nov-12   Aug-12   Jul-12   Jun-12   Apr-12   NULL
Code-wise, it would be something like:
Select
first non-null value as Change1
,second non-null value as Change2
,third non-null value as Change3
,fourth non-null value as Change4
,fifth non-null value as Change5...etc..
from Table_A
I am using MySQL and i have no idea how to reference the nth non null value in order to call them into Table_B
Does anyone have any ideas?
I am not sure I would recommend using this solution. Normalizing your data is always a better choice, but I wanted to answer using plain SQL with some string functions. This query should return what you are looking for:
SELECT
Name,
Changes,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(Changes, ',', 1)), ',', 1)) as Change1,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(Changes, ',', 2)), ',', 1)) as Change2,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(Changes, ',', 3)), ',', 1)) as Change3,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(Changes, ',', 4)), ',', 1)) as Change4,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(Changes, ',', 5)), ',', 1)) as Change5,
REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING_INDEX(Changes, ',', 6)), ',', 1)) as Change6
FROM (
SELECT
Name,
CONCAT_WS(',', CURRENT, OCT12, SEPT12, AUG12, JUL12, JUN12, MAY12, APR12, ',') as Changes
FROM
TableA
) s
I'm concatenating all values in a comma separated string, with two commas at the end of the string (one comma would be enough anyway, but it's easier to put two and just ignore the last one...), and since I'm using CONCAT_WS it will automatically skip null values, and the resulting string will be something like Aug-12,Jun-12,Apr-12,,.
Then in the outer query I'm extracting the n-th element of the string, using SUBSTRING_INDEX. I would recommend normalizing your database, but if you need a quick fix this solution might be a good starting point.
See it working here.
Please notice that I am not returning NULL values where there are no changes, but I am returning empty strings instead. This can be changed if you need.
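If the reshaping can happen in application code instead, the "n-th non-null value" idea is only a couple of lines. A minimal Python sketch, assuming each row arrives as a list with None for NULL (the function name is made up):

```python
def compact_non_null(values, width):
    """Move the non-None values to the front, in order, and pad with None up to width."""
    kept = [v for v in values if v is not None][:width]
    return kept + [None] * (width - len(kept))

# Rows A and B from Table A, columns CURRENT .. APR12
row_a = [None, None, "Aug-12", None, None, "Jun-12", None, "Apr-12"]
row_b = ["Nov-12", None, "Aug-12", None, "Jul-12", "Jun-12", None, "Apr-12"]

print(compact_non_null(row_a, 6))
print(compact_non_null(row_b, 6))
```

Unlike the string-based query, this keeps real NULLs (None) in the unused Change columns rather than empty strings.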
If you don't want to use string functions, you can try this SQL Server (T-SQL) query using UNPIVOT and ROW_NUMBER partitioning. Note that it will not run on MySQL as written:
CREATE TABLE #TableA
(
"Name" VARCHAR(10),
"CURRENT" VARCHAR(10),
OCT12 VARCHAR(10),
SEPT12 VARCHAR(10),
AUG12 VARCHAR(10),
JUL12 VARCHAR(10),
JUN12 VARCHAR(10),
MAY12 VARCHAR(10),
APR12 VARCHAR(10)
)
INSERT INTO #TableA
("Name", "CURRENT", OCT12, SEPT12, AUG12, JUL12, JUN12, MAY12, APR12)
VALUES
('A', NULL, NULL, 'Aug-12', NULL, NULL, 'Jun-12', NULL, 'Apr-12'),
('B', 'Nov-12', NULL, 'Aug-12', NULL, 'Jul-12', 'Jun-12', NULL, 'Apr-12')
SELECT * FROM #TableA;
Select "Name",
Min(Case row_num When 1 Then data End) Change1,
Min(Case row_num When 2 Then data End) Change2,
Min(Case row_num When 3 Then data End) Change3,
Min(Case row_num When 4 Then data End) Change4,
Min(Case row_num When 5 Then data End) Change5,
Min(Case row_num When 6 Then data End) Change6
From
(
select "Name",data,DBColumnName,
ROW_NUMBER() OVER (PARTITION BY "Name" ORDER BY "Name") row_num
From #TableA
unpivot (data for DBColumnName in ("CURRENT",OCT12,SEPT12,AUG12,JUL12,JUN12,MAY12,APR12) ) as z
) TableB
group by "Name";
References:
-- TSQL Pivot without aggregate function
-- https://www.sqlservertutorial.net/sql-server-window-functions/sql-server-row_number-function/
-- https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15

Correct way to store uni/bi/trigrams ngrams in RDBMS?

I have a list of unigrams (single words), bigrams (two words), and trigrams (three words) that I have pulled out of a bunch of documents. My goal is a statistical analysis report and also a search I can use on these documents.
John Doe
Xeon 5668x
corporate tax rates
beach
tax plan
Porta San Giovanni
The ngrams are tagged by date and document. So for example, I can find relations between bigrams and when their phrases first appeared as well as relations between documents. I can also search for documents that contain these X number of un/bi/trigram phrases.
So my question is how to store them to optimize these searches.
The simplest approach is just a simple string column for each phrase and then I add relations to the document_ngram table each time I find that word/phrase in the document.
table document
{
id
text
date
}
table ngram
{
id
ngram varchar(200);
}
table document_ngram
{
id
ngram_id
document_id
date
}
However, this means that if I want to search through trigrams for a single word, I have to use string searching. For example, let's say I wanted all trigrams with the word "summer" in them.
So what if I instead split the phrases up so that ngram stores only single words, and added three columns to document_ngram so that all 1-, 2-, and 3-word chains fit?
table document_ngram
{
id
word1_id NOT NULL
word2_id DEFAULT NULL
word3_id DEFAULT NULL
document_id
date
}
Is this the correct way to do it? Are there better ways? I am currently using PostgreSQL and MySQL, but I believe this is a generic SQL question.
This is how I would model your data (note that 'the' is referenced twice) You could also add weights to the single words.
DROP SCHEMA ngram CASCADE;
CREATE SCHEMA ngram;
SET search_path='ngram';
CREATE table word
( word_id INTEGER PRIMARY KEY
, the_word varchar
, constraint word_the_word UNIQUE (the_word)
);
CREATE table ngram
( ngram_id INTEGER PRIMARY KEY
, n INTEGER NOT NULL -- arity
, weight REAL -- payload
);
CREATE TABLE ngram_word
( ngram_id INTEGER NOT NULL REFERENCES ngram(ngram_id)
, seq INTEGER NOT NULL
, word_id INTEGER NOT NULL REFERENCES word(word_id)
, PRIMARY KEY (ngram_id,seq)
);
INSERT INTO word(word_id,the_word) VALUES
(1, 'the') ,(2, 'man') ,(3, 'who') ,(4, 'sold') ,(5, 'world' );
INSERT INTO ngram(ngram_id, n, weight) VALUES
(101, 6, 1.0);
INSERT INTO ngram_word(ngram_id,seq,word_id) VALUES
( 101, 1, 1)
, ( 101, 2, 2)
, ( 101, 3, 3)
, ( 101, 4, 4)
, ( 101, 5, 1)
, ( 101, 6, 5)
;
SELECT w.*
FROM ngram_word nw
JOIN word w ON w.word_id = nw.word_id
WHERE ngram_id = 101
ORDER BY seq;
RESULT:
word_id | the_word
---------+----------
1 | the
2 | man
3 | who
4 | sold
1 | the
5 | world
(6 rows)
Now, suppose you want to add a 4-gram to the existing (6-gram) data:
INSERT INTO word(word_id,the_word) VALUES
(6, 'is') ,(7, 'lost') ;
INSERT INTO ngram(ngram_id, n, weight) VALUES
(102, 4, 0.1);
INSERT INTO ngram_word(ngram_id,seq,word_id) VALUES
( 102, 1, 1)
, ( 102, 2, 2)
, ( 102, 3, 6)
, ( 102, 4, 7)
;
SELECT w.*
FROM ngram_word nw
JOIN word w ON w.word_id = nw.word_id
WHERE ngram_id = 102
ORDER BY seq;
Additional result:
INSERT 0 2
INSERT 0 1
INSERT 0 4
word_id | the_word
---------+----------
1 | the
2 | man
6 | is
7 | lost
(4 rows)
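With this layout, the "all n-grams containing a given word" search from the question becomes an equality lookup instead of a string search. A rough sketch using Python and in-memory SQLite with the sample data above; the helper name is made up and the ngram table is left out to keep it short:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; the query itself is plain SQL
conn.executescript("""
CREATE TABLE word (word_id INTEGER PRIMARY KEY, the_word TEXT UNIQUE);
CREATE TABLE ngram_word (
    ngram_id INTEGER NOT NULL,
    seq      INTEGER NOT NULL,
    word_id  INTEGER NOT NULL REFERENCES word(word_id),
    PRIMARY KEY (ngram_id, seq)
);
INSERT INTO word VALUES
    (1,'the'),(2,'man'),(3,'who'),(4,'sold'),(5,'world'),(6,'is'),(7,'lost');
INSERT INTO ngram_word VALUES
    (101,1,1),(101,2,2),(101,3,3),(101,4,4),(101,5,1),(101,6,5),
    (102,1,1),(102,2,2),(102,3,6),(102,4,7);
""")

query = """
SELECT DISTINCT nw.ngram_id
FROM ngram_word AS nw
JOIN word AS w ON w.word_id = nw.word_id
WHERE w.the_word = ?
ORDER BY nw.ngram_id
"""

def ngrams_containing(word):
    """Return the ids of all n-grams that contain the given word (hypothetical helper)."""
    return [row[0] for row in conn.execute(query, (word,))]

print(ngrams_containing("the"), ngrams_containing("lost"), ngrams_containing("summer"))
```

DISTINCT matters because a word can occur more than once in the same n-gram, as 'the' does in 101.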
BTW: adding a document-type object to this model will add two additional tables to this model: one for the document, and one for document*ngram. (or in another approach: for document*word) A recursive model would also be a possibility.
UPDATE: the above model will need an additional constraint, which will need triggers (or a rule+ an additional table) to be implemented. Pseudocode:
ngram_word.seq >0 AND ngram_word.seq <= (select ngram.n FROM ngram ng WHERE ng.ngram_id = ngram_word.ngram_id)
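As one possible way to enforce that pseudocode, here is a trigger sketch. It is written for SQLite (run from Python) because that is easy to try out; RAISE(ABORT, ...) is SQLite syntax, so PostgreSQL or MySQL would need their own trigger dialect. The trigger name and the helper function are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ngram (ngram_id INTEGER PRIMARY KEY, n INTEGER NOT NULL);
CREATE TABLE ngram_word (
    ngram_id INTEGER NOT NULL REFERENCES ngram(ngram_id),
    seq      INTEGER NOT NULL,
    word_id  INTEGER NOT NULL,
    PRIMARY KEY (ngram_id, seq)
);
-- Reject rows whose seq is outside 1..n for the n-gram they belong to.
CREATE TRIGGER ngram_word_seq_range
BEFORE INSERT ON ngram_word
BEGIN
    SELECT RAISE(ABORT, 'seq out of range for this ngram')
    WHERE NEW.seq < 1
       OR NEW.seq > (SELECT n FROM ngram WHERE ngram_id = NEW.ngram_id);
END;
INSERT INTO ngram VALUES (102, 4);
""")

def try_insert(ngram_id, seq, word_id):
    """Return True if the row was accepted, False if the database rejected it."""
    try:
        conn.execute("INSERT INTO ngram_word VALUES (?, ?, ?)", (ngram_id, seq, word_id))
        return True
    except sqlite3.DatabaseError:
        return False

print(try_insert(102, 4, 7), try_insert(102, 5, 1), try_insert(102, 0, 1))
```

An UPDATE trigger with the same condition would be needed as well if seq or n can change after insertion.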
One idea would be to modify your original table layout a bit. Let the ngram varchar(200) column contain only one word of the ngram, add a word_no column (1, 2, or 3), and add a grouping column so that, for example, the two records for the two words in a bigram are related (give them the same word_group). (In Oracle, I'd pull the word_group numbers from a sequence; I think PostgreSQL has something similar.)
table document
{
id
text
date
}
table ngram
{
id
word_group
word_no
ngram varchar(200);
}
table document_ngram
{
id
ngram_id
document_id
date
}