How to break data of columns in multiple rows in SQL - sql-server-2008

I have a table in which there are some columns with a single value and some columns with multiple values in a string separated by a character '&'.
CREATE TABLE [dbo].[tbl_data](
[SessionNumber] [float] NULL,
[Patientnumber] [nvarchar](max) NULL,
[Operationnumber] [nvarchar](max) NULL)
INSERT INTO tbl_data([SessionNumber],[Patientnumber],[Operationnumber]) values (3000010815,'0021234360&0010426450','A&B')
INSERT INTO tbl_data([SessionNumber],[Patientnumber],[Operationnumber]) values (3000010816,'0060570630&0077815550&0002201160','C&D&E')
I want to get the data in the following way:
SessionNumber | PatientNumber | Operationnumber
3000010815 | 0021234360 | A
3000010815 | 0010426450 | B
3000010816 | 0060570630 | C
3000010816 | 0077815550 | D
3000010816 | 0002201160 | E
That is, each row should be split into as many rows as there are values separated by '&'. I tried creating a table-valued function that splits a string into rows by a separator, but I don't know how to use it for multiple columns and multiple rows at the same time.

As SQL Server 2008 does not have a built in Split-function you might try this:
The trick is to transform your A&C&E into <x>A</x><x>C</x><x>E</x>.
This can easily be split with XML methods.
Attention:
If your real operation codes might contain one of the characters <, > or &, you must replace them with &lt;, &gt; and &amp;.
EDIT new code
This code will create two numbered sub-sets and join them together.
If performance matters there would be a faster approach with a numbers (tally) table to read the elements according to their position in the XML
CREATE TABLE [dbo].[tbl_data](
[SessionNumber] [float] NULL,
[Patientnumber] [nvarchar](max) NULL,
[Operationnumber] [nvarchar](max) NULL)
INSERT INTO tbl_data([SessionNumber],[Patientnumber],[Operationnumber]) values (3000010815,'0021234360&0010426450','A&B')
INSERT INTO tbl_data([SessionNumber],[Patientnumber],[Operationnumber]) values (3000010816,'0060570630&0077815550&0002201160','C&D&E');
WITH Splitted AS
(
SELECT SessionNumber
,CAST('<x>' + REPLACE(Patientnumber,'&','</x><x>') + '</x>' AS XML) AS CastedPNr
,CAST('<x>' + REPLACE(Operationnumber,'&','</x><x>') + '</x>' AS XML) AS CastedONr
FROM tbl_data
)
,NumberedPN AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY SessionNumber ORDER BY (SELECT NULL)) AS PNInx
,PNr.value('.','nvarchar(max)') AS PNr -- read as text; casting to int would drop the leading zeros
,SessionNumber AS PNS
FROM Splitted
CROSS APPLY CastedPNr.nodes('x') AS One(PNr)
)
,NumberedON AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY SessionNumber ORDER BY (SELECT NULL)) AS ONInx
,ONr.value('.','varchar(max)') AS ONr
,SessionNumber AS ONS
FROM Splitted
CROSS APPLY CastedONr.nodes('x') AS One(ONr)
)
SELECT SessionNumber
,p.PNr
,o.ONr
FROM tbl_data AS d
INNER JOIN NumberedPN AS p ON d.SessionNumber=p.PNS
INNER JOIN NumberedON AS o ON d.SessionNumber=o.ONS AND p.PNInx=o.ONInx;

If you have a split() function, then you can use apply:
select d.*, s.*
from tbl_data d outer apply
dbo.split(d.operationnumber, '&') s;
Note: You should not be storing multiple values in a single column. You should be using a junction table instead.
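As a rough illustration of that note, here is a minimal sketch of a normalized layout for the sample data. The table name tbl_session_operation, the constraint name, and the column types are assumptions made for this example, not something taken from the question.

```sql
-- Hypothetical normalized layout: one row per (session, patient, operation).
CREATE TABLE dbo.tbl_session_operation (
    SessionNumber   bigint      NOT NULL,
    Patientnumber   varchar(10) NOT NULL,  -- stored as text so leading zeros survive
    Operationnumber varchar(10) NOT NULL,
    CONSTRAINT PK_tbl_session_operation
        PRIMARY KEY (SessionNumber, Patientnumber, Operationnumber)
);

-- The two sample rows from the question, already split into single values.
INSERT INTO dbo.tbl_session_operation (SessionNumber, Patientnumber, Operationnumber)
VALUES (3000010815, '0021234360', 'A'),
       (3000010815, '0010426450', 'B'),
       (3000010816, '0060570630', 'C'),
       (3000010816, '0077815550', 'D'),
       (3000010816, '0002201160', 'E');

-- No splitting is needed any more; a plain SELECT returns one value per row.
SELECT SessionNumber, Patientnumber, Operationnumber
FROM dbo.tbl_session_operation
ORDER BY SessionNumber;
```

With this layout the pairing of patient and operation is explicit in each row instead of depending on the position of a value inside a delimited string.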

Related

How to find which values are not in a column from the list?(SQL) [duplicate]

I have a list of values:
('WEQ7EW', 'QWE7YB', 'FRERH4', 'FEY4B', .....)
and the dist table with a dist_name column.
and I need to create SQL query which would return values from the list which don't exist in the dist_name column.
You need to use a left join. This requires creating a derived table with the values you care about. Here is the typical syntax:
select v.val
from (values ('WEQ7EW'), ('QWE7YB'), ('FRERH4'), ('FEY4B')
) v(val) left join
t
on t.col = v.val
where t.col is null;
Not all databases support the values() table constructor, but they allow some method for creating a derived table. In MySQL, this looks like:
select v.val
from (select 'WEQ7EW' as val union all
select 'QWE7YB' as val union all
select 'FRERH4' as val union all
select 'FEY4B' as val
) v left join
t
on t.col = v.val
where t.col is null;
You would typically put this list of values in a derived table, and then use not exists. In MySQL:
select v.dist_name
from (
select 'WEQ7EW' as dist_name
union all select 'QWE7YB'
union all ...
) v
where not exists (select 1 from dist d where d.dist_name = v.dist_name)
Or if you are running a very recent version (8.0.19 or higher), you can use the VALUES ROW() syntax:
select v.dist_name
from (values row('WEQ7EW'), row('QWE7YB'), ...) v(dist_name)
where not exists (select 1 from dist d where d.dist_name = v.dist_name)
SELECT TRIM(TRAILING ',' FROM result) result
FROM ( SELECT @tmp:=REPLACE(@tmp, CONCAT(words.word, ','), '') result
FROM words, (SELECT @tmp:='WEQ7EW,QWE7YB,FRERH4,FEY4B,') arg
) perform
ORDER BY LENGTH(result) LIMIT 1;
The list of values to be cleared from existing values is provided as CSV string with final comma and without spaces before/after commas ('WEQ7EW,QWE7YB,FRERH4,FEY4B,' in shown code).
If CSV contains duplicate values all of them will be removed whereas non-removed duplicates won't be compacted. The relative arrangement of the values will stay unchanged.
Remember that this query performs full table scan, so it is not applicable to huge tables because it will be slow.

What is the proper MySQL way to take data from 4 rows, 1 column, and separate into 9 columns?

I've studied and tried days worth of SQL queries to find "something" that will work. I have a table, apj32_facileforms_subrecords, that uses 7 columns. All the data I want to display is in 1 column - "value". The "record" displays the number of the entry. The "title" is what I would like to appear in the header row, but that's not as important as "value" to display in 1 row based upon "record" number.
I've tried a lot of CONCAT and various Pivot queries, but nothing seems to do more than "get close" to what I'd like as the end result.
Here's a screen shot of the table:
The output "should" be linear, so that 1 row contains 9 columns:
Project; Zipcode; First Name; Last Name; Address; City; Phone; E-mail; Trade (in that order). And the values in the 9 columns come from "value" as they relate to the "record" number.
I know there are a LOT of examples that are similar, but nothing I've found covers taking all the values from "value" and CONCAT to 1 row.
This works to get all the data I want - SELECT record,value FROM apj32_facileforms_subrecords WHERE (record IN (record,value)) ORDER BY record
But the values are still in multiple rows. I can play with that query to get just the values, but I'm still at a loss to get them into 1 row. I'll keep playing with that query to see if I can figure it out before one of the experts here shows me how simple it is to do that.
Any help would be appreciated.
Using SQL to flatten an EAV model representation into a relational representation can be somewhat convoluted, and not very efficient.
Two commonly used approaches are conditional aggregation and correlated subqueries in the SELECT list. Both approaches call out for careful indexing for suitable performance with large sets.
correlated subqueries example
Here's an example of the correlated subquery approach, to get one value of the "zipcode" attribute for some records
SELECT r.id
, ( SELECT v1.value
FROM `apj32_facileforms_subrecords` v1
WHERE v1.record = r.id
AND v1.name = 'zipcode'
ORDER BY v1.value LIMIT 0,1
) AS `Zipcode`
FROM ( SELECT 1 AS id ) r
Extending that, we repeat the correlated subquery, changing the attribute identifier ('firstname' in place of 'zipcode'). It looks like we could also reference it by element, e.g. v2.element = 2
SELECT r.id
, ( SELECT v1.value
FROM `apj32_facileforms_subrecords` v1
WHERE v1.record = r.id
AND v1.name = 'zipcode'
ORDER BY v1.value LIMIT 0,1
) AS `Zipcode`
, ( SELECT v2.value
FROM `apj32_facileforms_subrecords` v2
WHERE v2.record = r.id
AND v2.name = 'firstname'
ORDER BY v2.value LIMIT 0,1
) AS `First Name`
, ( SELECT v3.value
FROM `apj32_facileforms_subrecords` v3
WHERE v3.record = r.id
AND v3.name = 'lastname'
ORDER BY v3.value LIMIT 0,1
) AS `Last Name`
FROM ( SELECT 1 AS id UNION ALL SELECT 2 ) r
returns something like
id Zipcode First Name Last Name
-- ------- ---------- ---------
1 98228 David Bacon
2 98228 David Bacon
conditional aggregation approach example
We can use GROUP BY to collapse multiple rows into one row per entity, and use conditional tests in expressions to "pick out" attribute values with aggregate functions.
SELECT r.id
, MIN(IF(v.name = 'zipcode' ,v.value,NULL)) AS `Zip Code`
, MIN(IF(v.name = 'firstname' ,v.value,NULL)) AS `First Name`
, MIN(IF(v.name = 'lastname' ,v.value,NULL)) AS `Last Name`
FROM ( SELECT 1 AS id UNION ALL SELECT 2 ) r
LEFT
JOIN `apj32_facileforms_subrecords` v
ON v.record = r.id
GROUP
BY r.id
For more portable syntax, we can replace MySQL IF() function with more ANSI standard CASE expression, e.g.
, MIN(CASE v.name WHEN 'zipcode' THEN v.value END) AS `Zip Code`
Note that MySQL does not support SQL Server PIVOT syntax, or Oracle MODEL syntax, or Postgres CROSSTAB or FILTER syntax.
To extend either of these approaches to be dynamic, returning a resultset with a variable number of columns and a variety of column names, is not possible in the context of a single SQL statement. We could separately execute SQL statements to retrieve information that would allow us to dynamically construct a SQL statement of the form shown above, with an explicit set of columns to be returned.
The approaches outlined above return a more traditional relational model (individual columns, each with a value).
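A minimal sketch of that two-step idea, using MySQL prepared statements, might look like the following. It assumes the table has the name, title, record and value columns referenced above, that every name has a non-NULL title, and that the table is not empty (otherwise @cols is NULL and the PREPARE fails). The user variables @cols and @sql are names chosen for this example.

```sql
-- Allow a longer column list than the default GROUP_CONCAT limit.
SET SESSION group_concat_max_len = 100000;

-- Step 1: build one "MIN(CASE ...) AS `title`" expression per attribute name.
SELECT GROUP_CONCAT(
         CONCAT('MIN(CASE v.name WHEN ', QUOTE(a.name),
                ' THEN v.value END) AS `', REPLACE(a.title, '`', '``'), '`')
         ORDER BY a.name SEPARATOR ', ')
  INTO @cols
  FROM ( SELECT name, MIN(title) AS title
           FROM `apj32_facileforms_subrecords`
          GROUP BY name ) a;

-- Step 2: assemble the conditional aggregation statement and run it.
SET @sql = CONCAT('SELECT v.record, ', @cols,
                  ' FROM `apj32_facileforms_subrecords` v GROUP BY v.record');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

The statement that finally runs still has an explicit, fixed column list; only its text is generated from the data.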
non-relational munge of attributes and values into a single string
If we have some special delimiters, we could munge together a representation of the data using GROUP_CONCAT function
As a rudimentary example:
SELECT r.id
, GROUP_CONCAT(v.title,'=',v.value ORDER BY v.name) AS vals
FROM ( SELECT 1 AS id ) r
LEFT
JOIN `apj32_facileforms_subrecords` v
ON v.record = r.id
AND v.name in ('zipcode','firstname','lastname')
GROUP
BY r.id
To return two columns, something like
id vals
-- ---------------------------------------------------
1 First Name=David,Last Name=Bacon,Zip Code=98228
We need to be aware that the return from GROUP_CONCAT is limited to group_concat_max_len bytes. And here we have just squeezed the balloon, moving the problem to some later processing, to parse the resulting string. If we have any equal signs or commas that appear in the values, it's going to make a mess of parsing the result string. So we will have to properly escape any delimiters that appear in the data, so that GROUP_CONCAT expression is going to get more involved.
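One way to sidestep the delimiter problem, swapping in a different technique from GROUP_CONCAT, is to let MySQL produce JSON, which escapes keys and values itself. This is only a sketch: it assumes MySQL 5.7.22 or later (where JSON_OBJECTAGG is available) and that title is never NULL, since a NULL key raises an error.

```sql
-- One JSON object per record, with title as the key and value as the value.
SELECT v.record AS id
     , JSON_OBJECTAGG(v.title, v.value) AS vals
  FROM `apj32_facileforms_subrecords` v
 WHERE v.name IN ('zipcode','firstname','lastname')
 GROUP
    BY v.record
```

The client then parses a standard format instead of a hand-rolled one, but the result is still a single string column rather than a relational row.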

mysql group concat into multiple fields

I have a recipe table, called recipes. There is the IDRecipe field and other parameters of the recipe except the categories. Categories are multi dimensional, so I have another table that connects one to many with one recipe. It is called category table (table 1 below). As you will see below, one recipe can have multiple categories in multiple dimensions. So I have another table (table 2) that describes the categories and dimensions, also below:
-- Table 1
CREATE TABLE `recepti_kategorije` (
`IDRecipe` int(11) NOT NULL,
`IDdimenzija` int(11) NOT NULL,
`IDKategorija` int(11) NOT NULL,
KEY `Iskanje` (`IDdimenzija`,`IDKategorija`,`IDRecipe`) USING BTREE,
KEY `izvlecek_recept` (`IDdimenzija`,`IDRecipe`),
KEY `IDRecipe` (`IDRecipe`,`IDdimenzija`,`IDKategorija`) USING BTREE,
KEY `kategorija` (`IDKategorija`,`IDdimenzija`,`IDRecipe`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_slovenian_ci;
INSERT INTO `recepti_kategorije` VALUES
(1,1,1),
(1,1,2),
(1,2,3),
(1,3,2);
-- Table 2
CREATE TABLE `recepti_dimenzije` (
`IDDimenzija` int(11) NOT NULL,
`IDKategorija` int(11) NOT NULL,
`Ime` char(50) COLLATE utf8_slovenian_ci NOT NULL,
KEY `IDDmenzija` (`IDDimenzija`,`IDKategorija`) USING BTREE,
KEY `IDKategorija` (`IDKategorija`,`IDDimenzija`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_slovenian_ci;
INSERT INTO `recepti_dimenzije` VALUES
(1,1,'cheese'),
(1,2,'eggs'),
(1,3,'meat'),
(1,4,'vegetables'),
(2,1,'main dish'),
(2,2,'sweet'),
(2,3,'soup'),
(3,1,'summer'),
(3,2,'winter');
-- Table 3
CREATE TABLE `recepti_dimenzije_glavne` (
`IDDimenzija` int(11) NOT NULL,
`DimenzijaIme` char(50) COLLATE utf8_slovenian_ci DEFAULT NULL,
PRIMARY KEY (`IDDimenzija`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_slovenian_ci;
INSERT INTO `recepti_dimenzije_glavne` VALUES
(1,'ingredient'),
(2,'type'),
(3,'season');
Table 2 is the key table to find out the legend of each dimensions and each category.
So from this example we see that my recipe with ID1 has the tag: cheese and eggs from dimension 1 and is soup for winter season.
Now on my recipes page I need to get all this out to print the names of each dimension together with all the category names.
Table 3 (shown above) holds the names of the dimensions themselves.
Now what I need is a query that would get me at the same time for recipe ID=1 all the dimensions group concatenated with names, like:
ingredient: cheese, eggs | type: soup | season: winter
I tried doing a query for each of them in SELECT statement and it works, but I need 8 select queries (in total I have 8 dimensions, for the example I only wrote 3), my select query is:
SELECT
r.ID,
(
SELECT
group_concat(ime SEPARATOR ', ')
FROM
recepti_kategorije rkat
JOIN recepti_dimenzije rd ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
WHERE
rkat.IDRecipe = r.ID
AND rkat.IDDimenzija = 1
ORDER BY
ime ASC
) AS ingredient,
(
SELECT
group_concat(ime SEPARATOR ', ')
FROM
recepti_kategorije rkat
JOIN recepti_dimenzije rd ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
WHERE
rkat.IDRecipe = r.ID
AND rkat.IDDimenzija = 2
ORDER BY
ime ASC
) AS type,
(
SELECT
group_concat(ime SEPARATOR ', ')
FROM
recepti_kategorije rkat
JOIN recepti_dimenzije rd ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
WHERE
rkat.IDRecipe = r.ID
AND rkat.IDDimenzija = 3
ORDER BY
ime ASC
) AS season
FROM
recipes r
WHERE
r.ID = 1
That works, but it is rather slow: EXPLAIN shows it examining about 6-8 rows each time, the query is long, and I don't get the names of the dimensions out because I would need another join.
What would be the optimal way to get all the dimensions separated into fields and concatenated with the category names? I need this optimised because it runs for a recipe page that is requested about once per second, so I cannot afford a slow query here. And what indexes do I need to make this fast?
Something like below, not sure I typed the table/column names right or not, but should be easy to debug:
SELECT c.ID,GROUP_CONCAT(CONCAT(d.DimenzijaIme,': ',c.imes) SEPARATOR ' | ')
FROM (
SELECT
r.ID,rkat.IDDimenzija,
group_concat(rd.ime ORDER BY rd.ime SEPARATOR ', ') AS imes
FROM recepti_kategorije rkat
JOIN recepti_dimenzije rd
ON rd.IDKategorija = rkat.IDKategorija
AND rd.IDDimenzija = rkat.IDdimenzija
INNER JOIN recipes r
ON r.ID=rkat.IDRecipe
GROUP BY r.ID,rkat.IDDimenzija) c
INNER JOIN recepti_dimenzije_glavne d
ON d.IDDimenzija=c.IDDimenzija
GROUP BY c.ID

MySQL Query Fixing/Optimisation for my configuration table

I have a MySQL table that holds the configuration of my project. Each configuration change creates a new entry, so that I have a history of all changes and of who made them.
CREATE TABLE `configurations` (
`name` varchar(255) NOT NULL,
`value` text NOT NULL,
`lastChange` datetime NOT NULL,
`changedBy` bigint(32) NOT NULL,
KEY `lastChange` (`lastChange`),
KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `configurations` (`name`, `value`, `lastChange`, `changedBy`) VALUES
('activePageLimit', 'activePageLimit-old-value', '2016-01-06 12:25:05', 1096775260340178),
('activePageLimit', 'activePageLimit-new-value', '2016-01-06 12:27:57', 1096775260340178),
('customerLogo', 'customerLogo-old-value', '2016-02-06 00:00:00', 1096775260340178),
('customerLogo', 'customerLogo-new-value', '2016-01-07 00:00:00', 1096775260340178);
Right now i have a problem with my select query, that should return all names and their latest value (ordered by lastChange).
| name | value | lastChange |
|-----------------|---------------------------|---------------------------|
| customerLogo | customerLogo-new-value | January, 07 2016 00:00:00 |
| activePageLimit | activePageLimit-new-value | January, 06 2016 12:27:57 |
My current Query is:
SELECT `name`, `value`, `lastChange`
FROM (
SELECT `name`, `value`, `lastChange`
FROM `configurations`
ORDER BY `lastChange` ASC
) AS `c`
GROUP BY `name` DESC
But unfortunately this does not always return the right values, and I don't like using a subquery; there has to be a cleaner and faster way to do this.
I also created a SQL-Fiddle for you as a playground: http://sqlfiddle.com/#!9/f1dc9/1/0
Is there any other clever solution i missed?
Your method is documented to return indeterminate results (because you have columns in the select that are not in the group by).
Here are three alternatives. The first is standard SQL, using an explicit aggregation to get the most recent change.
SELECT c.*
FROM configurations c JOIN
(SELECT `name`, MAX(`lastChange`) as maxlc
FROM `configurations`
GROUP BY name
) mc
ON c.name = mc.name and c.lastChange = mc.maxlc ;
The second is also standard SQL, using not exists:
select c.*
from configurations c
where not exists (select 1
from configurations c2
where c2.name = c.name and c2.lastchange > c.lastchange
);
The third uses a hack which is available in MySQL (and it assumes that the value does not have any commas in this version and is not too long):
select name, max(lastchange),
substring_index(group_concat(value order by lastchange desc), ',', 1) as value
from configurations
group by name;
Use this version carefully, because it is prone to error (for instance, the intermediate group_concat() result could exceed a MySQL parameter, which would then have to be re-set).
There are other methods -- such as using variables. But these three should be sufficient for you to consider your options.
If we want to avoid SUBQUERY the only other option is JOIN
SELECT cc.name, cc.value, cc.lastChange FROM configurations cc
JOIN (
SELECT name, value, lastChange
FROM configurations
ORDER BY lastChange ASC
) c on c.value = cc.value
GROUP BY cc.name DESC
You have two requirements: a historical log, and a "state". Keep them in two different tables, in spite of that providing redundant information.
That is, have one table that faithfully records who changed what when.
Have another table that faithfully specifies the current state for the configuration.
Plan A: INSERT into the Log and UPDATE the State whenever anything happens.
Plan B: UPDATE the State and use a TRIGGER to write to the Log.
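A minimal sketch of Plan B could look like this. It treats the existing configurations table as the Log; the configuration_state table and the trigger names are hypothetical, chosen only for illustration.

```sql
-- Hypothetical State table: exactly one row per setting, holding the current value.
CREATE TABLE `configuration_state` (
  `name` varchar(255) NOT NULL,
  `value` text NOT NULL,
  `lastChange` datetime NOT NULL,
  `changedBy` bigint NOT NULL,
  PRIMARY KEY (`name`)
);

-- Copy every new setting into the Log (the existing `configurations` table).
CREATE TRIGGER `configuration_state_ai`
AFTER INSERT ON `configuration_state`
FOR EACH ROW
  INSERT INTO `configurations` (`name`, `value`, `lastChange`, `changedBy`)
  VALUES (NEW.`name`, NEW.`value`, NEW.`lastChange`, NEW.`changedBy`);

-- Copy every change of an existing setting into the Log as well.
CREATE TRIGGER `configuration_state_au`
AFTER UPDATE ON `configuration_state`
FOR EACH ROW
  INSERT INTO `configurations` (`name`, `value`, `lastChange`, `changedBy`)
  VALUES (NEW.`name`, NEW.`value`, NEW.`lastChange`, NEW.`changedBy`);

-- Reading the current configuration no longer needs any grouping.
SELECT `name`, `value`, `lastChange`
FROM `configuration_state`
ORDER BY `lastChange` DESC;
```

Because each trigger body is a single statement, no DELIMITER change is needed when running this from the mysql client.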

Mysql deduplicate records in single query

I have the following table:
CREATE TABLE `relations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`relationcode` varchar(25) DEFAULT NULL,
`email_address` varchar(100) DEFAULT NULL,
`firstname` varchar(100) DEFAULT NULL,
`latname` varchar(100) DEFAULT NULL,
`last_contact_date` varchar(25) DEFAULT NULL,
PRIMARY KEY (`id`)
)
In this table there are duplicates: relations with exactly the same relationcode and email_address. They can be in there twice or even 10 times.
I need a query that selects the ids of all records but excludes the duplicates. Of those duplicated records, I would like to select only the one with the most recent last_contact_date.
I'm more into Oracle than Mysql, In Oracle I would be able to do it this way:
select * from (
select row_number () over (partition by relationcode order by to_date(last_contact_date,'dd-mm-yyyy')) rank,
id,
relationcode,
email_address ,
last_contact_date
from RELATIONS)
where rank = 1
But I can't figure out how to modify this query to work in MySQL. I'm not even sure it's possible to do the same thing in a single query in MySQL.
Any ideas?
Normal way to do this is a sub query to get the latest record and then join that against the table:-
SELECT RELATIONS.id, RELATIONS.relationcode, RELATIONS.email_address, RELATIONS.firstname, RELATIONS.latname, RELATIONS.last_contact_date
FROM RELATIONS
INNER JOIN
(
SELECT relationcode, email_address, MAX(last_contact_date) AS latest_contact_date
FROM RELATIONS
GROUP BY relationcode, email_address
) Sub1
ON RELATIONS.relationcode = Sub1.relationcode
AND RELATIONS.email_address = Sub1.email_address
AND RELATIONS.last_contact_date = Sub1.latest_contact_date
It is possible to manually generate the kind of rank that your Oracle query uses using variables. Bit messy though!
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM
(
SELECT id, relationcode, email_address, firstname, latname, last_contact_date, @seq:=IF(@relationcode = relationcode AND @email_address = email_address, @seq + 1, 1) AS seq, @relationcode := relationcode, @email_address := email_address
FROM
(
SELECT id, relationcode, email_address, firstname, latname, last_contact_date
FROM RELATIONS
CROSS JOIN (SELECT @seq:=0, @relationcode := '', @email_address :='') Sub1
ORDER BY relationcode, email_address, last_contact_date DESC
) Sub2
) Sub3
WHERE seq = 1
This uses a sub query to initialise the variables. The sequence number is incremented if the relation code and email address are the same as on the previous row; if not, it is reset to 1. The result is stored in a field. The outer select then checks the sequence number (as a field, not as the variable name) and returns only the records where it is 1.
Note that I have done this as multiple sub queries, partly to make it clearer to you, but also to try to force the order in which MySQL executes it. There are a couple of possible issues with how MySQL says it may order the execution of things. They have never caused a problem for me, but with the sub queries I would hope to force the order.
Here is a method that will work in both MySQL and Oracle. It rephrases the question as: Get me all rows from relations where the relationcode has no larger last_contact_date.
It works something like this:
select r.*
from relations r
where not exists (select 1
from relations r2
where r2.relationcode = r.relationcode and
r2.last_contact_date > r.last_contact_date
);
With the appropriate indexes, this should be pretty efficient in both databases.
Note: This assumes that last_contact_date is stored as a date not as a string (as in your table example). Storing dates as strings is just a really bad idea and you should fix your data structure
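If the server is MySQL 8.0 or later, which is an assumption since the question does not state a version, window functions are available and the Oracle query from the question can be carried over almost directly. The sketch below assumes last_contact_date holds strings in dd-mm-yyyy form, as the to_date() call in the question suggests, and it partitions by both relationcode and email_address, following the question's definition of a duplicate.

```sql
SELECT id, relationcode, email_address, last_contact_date
FROM (
    SELECT r.id, r.relationcode, r.email_address, r.last_contact_date,
           ROW_NUMBER() OVER (
               PARTITION BY r.relationcode, r.email_address
               -- newest contact first; id breaks ties so exactly one row gets rn = 1
               ORDER BY STR_TO_DATE(r.last_contact_date, '%d-%m-%Y') DESC, r.id DESC
           ) AS rn
    FROM relations r
) ranked
WHERE rn = 1;
```

Unlike the not exists version, this returns exactly one row per group even when two duplicates share the same last_contact_date.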