MySQL data transfer with re-formatting - mysql

I have this table synonym_temp:
id | synonyms
----------------------------
1 | ebay,online,dragon
2 | auto, med
And I want to transfer it to synonym table but each synonym should be inserted separately. Like:
id | synonym
----------------------------
1 | ebay
1 | online
1 | dragon
2 | auto
2 | med
How should I do that? Thanks in advance.

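Use PHP:
// Note: the mysql_* functions were removed in PHP 7; use the mysqli_* equivalents there.
$records = mysql_query("SELECT id, synonyms FROM synonym_temp");
// mysql_query() returns a result resource, so fetch the rows one by one
while ($row = mysql_fetch_assoc($records))
{
    $id = (int) $row['id'];
    // split the comma-separated list into individual synonyms
    foreach (explode(',', $row['synonyms']) as $syn)
    {
        // trim() removes the blank after the comma in values like "auto, med"
        $syn = mysql_real_escape_string(trim($syn));
        mysql_query("INSERT INTO synonym (id, synonym) VALUES ($id, '$syn')");
    }
}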

You can do this with a Numbers or Tally table which contains a sequential list of integers:
Select T.id, Trim(Substring(T.synonyms, N.Value, Locate(',', Concat(T.synonyms, ','), N.Value) - N.Value)) As synonym
From Numbers As N
Cross Join synonym_temp As T
Where N.Value <= Char_Length(T.synonyms)
And Substring(Concat(',', T.synonyms), N.Value, 1) = ','
In the above case, my Numbers table is structured like so:
Create Table Numbers( Value int not null primary key )
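The position arithmetic behind the Numbers join can be checked outside the database. The following is a minimal Python sketch, not part of the original answer; the function name split_with_tally is hypothetical. It walks the integers 1..len(s) the way the cross join walks N.Value, keeps the positions where a synonym starts, and cuts up to the next comma.

```python
def split_with_tally(synonyms):
    """Emulate the Numbers-table split using 1-based positions, as in SQL."""
    padded_front = "," + synonyms  # a comma at position n here means a synonym starts at n
    padded_back = synonyms + ","   # guarantees a closing comma after the last synonym
    parts = []
    for n in range(1, len(synonyms) + 1):  # plays the role of N.Value
        if padded_front[n - 1] == ",":
            # 1-based position of the next comma at or after n
            end = padded_back.index(",", n - 1) + 1
            parts.append(synonyms[n - 1:end - 1].strip())
    return parts


for row_id, synonyms in [(1, "ebay,online,dragon"), (2, "auto, med")]:
    for synonym in split_with_tally(synonyms):
        print(row_id, synonym)
```

Each printed pair corresponds to one row that would go into the synonym table.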

Related

Creating a column being the multiple of others

I need some help. I have 2 columns in a MySQL query result: one with text and another with decimal values, like this:
select desc, value from table a
|5,50 % | 2984.59 |
|Subs | 10951.70 |
|Isent | 3973.17 |
|13,30 % | 560.26 |
For the rows that contain a %, I want to multiply the value by the percentage and create a third result column, rounded up to two decimal places. See below:
2984,59 * 0,055 = 164,15245
560,26 * 0,133 = 74,51458
I need an SQL query that shows something like this:
+-------+-----------+-----------+
|5,50 % | 2984,59 | 164,16 |
|Subs | 10951,70 | 0 or NULL |
|Isent | 3973,17 | 0 or NULL |
|13,30% | 560,26 | 74,52 |
+-------+-----------+-----------+
How can I do it?
Thanks so much for the help.
It would be better to store the percentage as a numeric value in the first place, because converting it in every query costs time.
Your percentages use a comma as the decimal separator, but MySQL needs a dot there.
If desc isn't always a number, you can use the MySQL trick of adding 0 to it: the string is converted using its leading numeric part, and strings that do not start with a number become 0.
SELECT `desc`, `value`, (REPLACE(`desc`,',','.') + 0) * `value` / 100 FROM val
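+---------+-------+-----------------------------------------------+
| desc    | value | (REPLACE(`desc`,',','.') + 0) * `value` / 100 |
+---------+-------+-----------------------------------------------+
| 5,50 %  | 2985  | 164.175                                       |
| Subs    | 10952 | 0                                             |
| Isent   | 3973  | 0                                             |
| 13,30 % | 560   | 74.48                                         |
+---------+-------+-----------------------------------------------+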
fiddle
SELECT `desc`, `value`, CEIL((REPLACE(`desc`,',','.') + 0) * `value`) / 100 FROM val
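+---------+-------+-----------------------------------------------------+
| desc    | value | CEIL((REPLACE(`desc`,',','.') + 0) * `value`) / 100 |
+---------+-------+-----------------------------------------------------+
| 5,50 %  | 2985  | 164.18                                              |
| Subs    | 10952 | 0                                                   |
| Isent   | 3973  | 0                                                   |
| 13,30 % | 560   | 74.48                                               |
+---------+-------+-----------------------------------------------------+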
fiddle
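The fiddle output above was produced with whole-number values (2985, 560), so it does not match the figures in the question exactly. As an independent check of the expected numbers, here is a minimal Python sketch using exact decimal arithmetic; the function name percent_share is hypothetical and the rows are copied from the question.

```python
from decimal import Decimal, ROUND_CEILING


def percent_share(desc, value):
    """Return value * percentage, rounded up to 2 decimals, or None for non-percentage rows."""
    if "%" not in desc:
        return None
    # "5,50 %" -> Decimal("5.50"): drop the sign, use a dot as decimal separator
    pct = Decimal(desc.replace("%", "").replace(",", ".").strip())
    share = Decimal(value) * pct / 100
    return share.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


rows = [("5,50 %", "2984.59"), ("Subs", "10951.70"),
        ("Isent", "3973.17"), ("13,30 %", "560.26")]
for desc, value in rows:
    print(desc, value, percent_share(desc, value))
```

With the unrounded values from the question this gives 164.16 and 74.52 for the two percentage rows and None for the others.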

Convert JSON string in to separate fields

I have a table with two columns:
create table customerData (id bigint IDENTITY(1,1) NOT NULL, rawData varchar(max))
here the rawData will save the json format data in string, for example below will be the data in that column:
insert into customerData
values ('[{"customerName":"K C Nalina","attendance":"P","collectedAmount":"757","isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"1917889","totalDue":"757"},{"customerName":"Mahalakshmi","attendance":"P","collectedAmount":"881","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"430833","totalDue":"757"}]'),
('[{"customerName":"John","attendance":"P","collectedAmount":"700","isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"192222","totalDue":"788"},{"customerName":"weldon","attendance":"P","collectedAmount":"771","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"435874","totalDue":"757"}]')
Expected result :
I need these customerName, customerNumber, loanDisbProduct to be shown in separate fields for each rows.
Also to note the customer details inside rawData for each row will be more than two in many cases.
I don't know how to shred the data inside rawData column.
And I'm using SQL server 2012 and it doesn't support JSON data so I have to manipulate the string and get the field.
Based on a Red-Gate blog post: first define a view as follows (I use this view to generate a new uniqueidentifier inside the function, because NEWID() cannot be called directly in a user-defined function):
CREATE VIEW getNewID as SELECT NEWID() AS new_id
Then create a function as follows (this function is the same as the one in the Red-Gate blog post, but I have changed it a little to include the identifier):
CREATE FUNCTION dbo.parseJSON( @JSON NVARCHAR(MAX))
RETURNS @hierarchy TABLE
(
Element_ID INT IDENTITY(1, 1) NOT NULL, /* internal surrogate primary key gives the order of parsing and the list order */
SequenceNo [int] NULL, /* the place in the sequence for the element */
Parent_ID INT null, /* if the element has a parent then it is in this column. The document is the ultimate parent, so you can get the structure from recursing from the document */
Object_ID INT null, /* each list or object has an object id. This ties all elements to a parent. Lists are treated as objects here */
Name NVARCHAR(2000) NULL, /* the Name of the object */
StringValue NVARCHAR(MAX) NOT NULL,/*the string representation of the value of the element. */
ValueType VARCHAR(10) NOT NULL, /* the declared type of the value represented as a string in StringValue*/
Identifier UNIQUEIDENTIFIER NOT NULL
)
AS
BEGIN
DECLARE
@FirstObject INT, --the index of the first open bracket found in the JSON string
@OpenDelimiter INT,--the index of the next open bracket found in the JSON string
@NextOpenDelimiter INT,--the index of subsequent open bracket found in the JSON string
@NextCloseDelimiter INT,--the index of subsequent close bracket found in the JSON string
@Type NVARCHAR(10),--whether it denotes an object or an array
@NextCloseDelimiterChar CHAR(1),--either a '}' or a ']'
@Contents NVARCHAR(MAX), --the unparsed contents of the bracketed expression
@Start INT, --index of the start of the token that you are parsing
@end INT,--index of the end of the token that you are parsing
@param INT,--the parameter at the end of the next Object/Array token
@EndOfName INT,--the index of the start of the parameter at end of Object/Array token
@token NVARCHAR(200),--either a string or object
@value NVARCHAR(MAX), -- the value as a string
@SequenceNo int, -- the sequence number within a list
@Name NVARCHAR(200), --the Name as a string
@Parent_ID INT,--the next parent ID to allocate
@lenJSON INT,--the current length of the JSON String
@characters NCHAR(36),--used to convert hex to decimal
@result BIGINT,--the value of the hex symbol being parsed
@index SMALLINT,--used for parsing the hex value
@Escape INT,--the index of the next escape character
@Identifier UNIQUEIDENTIFIER
DECLARE @Strings TABLE /* in this temporary table we keep all strings, even the Names of the elements, since they are 'escaped' in a different way, and may contain, unescaped, brackets denoting objects or lists. These are replaced in the JSON string by tokens representing the string */
(
String_ID INT IDENTITY(1, 1),
StringValue NVARCHAR(MAX)
)
SELECT--initialise the characters to convert hex to ascii
@characters='0123456789abcdefghijklmnopqrstuvwxyz',
@SequenceNo=0, --set the sequence no. to something sensible.
/* firstly we process all strings. This is done because [{} and ] aren't escaped in strings, which complicates an iterative parse. */
@Parent_ID=0,
@Identifier = (SELECT new_id FROM dbo.getNewID)
WHILE 1=1 --forever until there is nothing more to do
BEGIN
SELECT
@start=PATINDEX('%[^a-zA-Z]["]%', @json collate SQL_Latin1_General_CP850_Bin);--next delimited string
IF @start=0 BREAK --no more so drop through the WHILE loop
IF SUBSTRING(@json, @start+1, 1)='"'
BEGIN --Delimited Name
SET @start=@Start+1;
SET @end=PATINDEX('%[^\]["]%', RIGHT(@json, LEN(@json+'|')-@start) collate SQL_Latin1_General_CP850_Bin);
END
IF @end=0 --either the end or no end delimiter to last string
BEGIN-- check if ending with a double slash...
SET @end=PATINDEX('%[\][\]["]%', RIGHT(@json, LEN(@json+'|')-@start) collate SQL_Latin1_General_CP850_Bin);
IF @end=0 --we really have reached the end
BEGIN
BREAK --assume all tokens found
END
END
SELECT @token=SUBSTRING(@json, @start+1, @end-1)
--now put in the escaped control characters
SELECT @token=REPLACE(@token, FromString, ToString)
FROM
(SELECT '\b', CHAR(08)
UNION ALL SELECT '\f', CHAR(12)
UNION ALL SELECT '\n', CHAR(10)
UNION ALL SELECT '\r', CHAR(13)
UNION ALL SELECT '\t', CHAR(09)
UNION ALL SELECT '\"', '"'
UNION ALL SELECT '\/', '/'
) substitutions(FromString, ToString)
SELECT @token=Replace(@token, '\\', '\')
SELECT @result=0, @escape=1
--Begin to take out any hex escape codes
WHILE @escape>0
BEGIN
SELECT @index=0,
--find the next hex escape sequence
@escape=PATINDEX('%\x[0-9a-f][0-9a-f][0-9a-f][0-9a-f]%', @token collate SQL_Latin1_General_CP850_Bin)
IF @escape>0 --if there is one
BEGIN
WHILE @index<4 --there are always four digits to a \x sequence
BEGIN
SELECT --determine its value
@result=@result+POWER(16, @index)
*(CHARINDEX(SUBSTRING(@token, @escape+2+3-@index, 1),
@characters)-1), @index=@index+1 ;
END
-- and replace the hex sequence by its unicode value
SELECT @token=STUFF(@token, @escape, 6, NCHAR(@result))
END
END
--now store the string away
INSERT INTO @Strings (StringValue) SELECT @token
-- and replace the string with a token
SELECT @JSON=STUFF(@json, @start, @end+1,
'@string'+CONVERT(NCHAR(5), @@identity))
END
-- all strings are now removed. Now we find the first leaf.
WHILE 1=1 --forever until there is nothing more to do
BEGIN
SELECT @Parent_ID=@Parent_ID+1, @Identifier=(SELECT new_id FROM dbo.getNewID)
--find the first object or list by looking for the open bracket
SELECT @FirstObject=PATINDEX('%[{[[]%', @json collate SQL_Latin1_General_CP850_Bin)--object or array
IF @FirstObject = 0 BREAK
IF (SUBSTRING(@json, @FirstObject, 1)='{')
SELECT @NextCloseDelimiterChar='}', @type='object'
ELSE
SELECT @NextCloseDelimiterChar=']', @type='array'
SELECT @OpenDelimiter=@firstObject
WHILE 1=1 --find the innermost object or list...
BEGIN
SELECT
@lenJSON=LEN(@JSON+'|')-1
--find the matching close-delimiter proceeding after the open-delimiter
SELECT
@NextCloseDelimiter=CHARINDEX(@NextCloseDelimiterChar, @json,
@OpenDelimiter+1)
--is there an intervening open-delimiter of either type
SELECT @NextOpenDelimiter=PATINDEX('%[{[[]%',
RIGHT(@json, @lenJSON-@OpenDelimiter)collate SQL_Latin1_General_CP850_Bin)--object
IF @NextOpenDelimiter=0
BREAK
SELECT @NextOpenDelimiter=@NextOpenDelimiter+@OpenDelimiter
IF @NextCloseDelimiter<@NextOpenDelimiter
BREAK
IF SUBSTRING(@json, @NextOpenDelimiter, 1)='{'
SELECT @NextCloseDelimiterChar='}', @type='object'
ELSE
SELECT @NextCloseDelimiterChar=']', @type='array'
SELECT @OpenDelimiter=@NextOpenDelimiter
END
---and parse out the list or Name/value pairs
SELECT
@contents=SUBSTRING(@json, @OpenDelimiter+1,
@NextCloseDelimiter-@OpenDelimiter-1)
SELECT
@JSON=STUFF(@json, @OpenDelimiter,
@NextCloseDelimiter-@OpenDelimiter+1,
'@'+@type+CONVERT(NCHAR(5), @Parent_ID))
WHILE (PATINDEX('%[A-Za-z0-9@+.e]%', @contents collate SQL_Latin1_General_CP850_Bin))<>0
BEGIN
IF @Type='object' --it will be a 0-n list containing a string followed by a string, number,boolean, or null
BEGIN
SELECT
@SequenceNo=0,@end=CHARINDEX(':', ' '+@contents)--if there is anything, it will be a string-based Name.
SELECT @start=PATINDEX('%[^A-Za-z@][@]%', ' '+@contents collate SQL_Latin1_General_CP850_Bin)--AAAAAAAA
SELECT @token=RTrim(Substring(' '+@contents, @start+1, @End-@Start-1)),
@endofName=PATINDEX('%[0-9]%', @token collate SQL_Latin1_General_CP850_Bin),
@param=RIGHT(@token, LEN(@token)-@endofName+1)
SELECT
@token=LEFT(@token, @endofName-1),
@Contents=RIGHT(' '+@contents, LEN(' '+@contents+'|')-@end-1)
SELECT @Name=StringValue FROM @strings
WHERE string_id=@param --fetch the Name
END
ELSE
SELECT @Name=null,@SequenceNo=@SequenceNo+1
SELECT
@end=CHARINDEX(',', @contents)-- a string-token, object-token, list-token, number,boolean, or null
IF @end=0
--HR Engineering notation bugfix start
IF ISNUMERIC(@contents) = 1
SELECT @end = LEN(@contents) + 1
Else
--HR Engineering notation bugfix end
SELECT @end=PATINDEX('%[A-Za-z0-9@+.e][^A-Za-z0-9@+.e]%', @contents+' ' collate SQL_Latin1_General_CP850_Bin) + 1
SELECT
@start=PATINDEX('%[^A-Za-z0-9@+.e][A-Za-z0-9@+.e]%', ' '+@contents collate SQL_Latin1_General_CP850_Bin)
--select @start,@end, LEN(@contents+'|'), @contents
SELECT
@Value=RTRIM(SUBSTRING(@contents, @start, @End-@Start)),
@Contents=RIGHT(@contents+' ', LEN(@contents+'|')-@end)
IF SUBSTRING(@value, 1, 7)='@object'
INSERT INTO @hierarchy
(Name, SequenceNo, Parent_ID, StringValue, Object_ID, ValueType, Identifier)
SELECT @Name, @SequenceNo, @Parent_ID, SUBSTRING(@value, 8, 5),
SUBSTRING(@value, 8, 5), 'object' , @Identifier
ELSE
IF SUBSTRING(@value, 1, 6)='@array'
INSERT INTO @hierarchy
(Name, SequenceNo, Parent_ID, StringValue, Object_ID, ValueType, Identifier)
SELECT @Name, @SequenceNo, @Parent_ID, SUBSTRING(@value, 7, 5),
SUBSTRING(@value, 7, 5), 'array' , @Identifier
ELSE
IF SUBSTRING(@value, 1, 7)='@string'
INSERT INTO @hierarchy
(Name, SequenceNo, Parent_ID, StringValue, ValueType, Identifier)
SELECT @Name, @SequenceNo, @Parent_ID, StringValue, 'string', @Identifier
FROM @strings
WHERE string_id=SUBSTRING(@value, 8, 5)
ELSE
IF @value IN ('true', 'false')
INSERT INTO @hierarchy
(Name, SequenceNo, Parent_ID, StringValue, ValueType, Identifier)
SELECT @Name, @SequenceNo, @Parent_ID, @value, 'boolean', @Identifier
ELSE
IF @value='null'
INSERT INTO @hierarchy
(Name, SequenceNo, Parent_ID, StringValue, ValueType, Identifier)
SELECT @Name, @SequenceNo, @Parent_ID, @value, 'null', @Identifier
ELSE
IF PATINDEX('%[^0-9]%', @value collate SQL_Latin1_General_CP850_Bin)>0
INSERT INTO @hierarchy
(Name, SequenceNo, Parent_ID, StringValue, ValueType,Identifier)
SELECT @Name, @SequenceNo, @Parent_ID, @value, 'real', @Identifier
ELSE
INSERT INTO @hierarchy
(Name, SequenceNo, Parent_ID, StringValue, ValueType, Identifier)
SELECT @Name, @SequenceNo, @Parent_ID, @value, 'int', @Identifier
if @Contents=' ' Select @SequenceNo=0
END
END
INSERT INTO @hierarchy (Name, SequenceNo, Parent_ID, StringValue, Object_ID, ValueType, Identifier)
SELECT '-',1, NULL, '', @Parent_ID-1, @type, @Identifier
--
RETURN
END
Finally, If we have this table and data:
DECLARE @customerData TABLE (jsonValue NVARCHAR(MAX))
INSERT INTO @customerData
VALUES ('[{"customerName":"K C Nalina","attendance":"P","collectedAmount":"757","isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"1917889","totalDue":"757"},{"customerName":"Mahalakshmi","attendance":"P","collectedAmount":"881","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"430833","totalDue":"757"}]'),
('[{"customerName":"John","attendance":"P","collectedAmount":"700", "isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"192222","totalDue":"788"},{"customerName":"weldon","attendance":"P","collectedAmount":"771","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"435874","totalDue":"757"}]')
We can simply parse the JSON value as below:
;WITH jsonValue AS(
SELECT * FROM @customerData
CROSS APPLY(SELECT * FROM dbo.parseJSON(jsonvalue)) AS d
WHERE d.Name IN('customerName', 'customerNumber', 'loanDisbProduct')
)
,openResult AS(
SELECT i.Name, i.StringValue, i.Identifier FROM jsonValue AS i
)
SELECT
MAX(K.CustomerName) AS CustomerName,
MAX(K.CustomerNumber) AS CustomerNumber,
MAX(K.LoanDisbProduct) AS LoanDisbProduct
FROM (
SELECT
CASE WHEN openResult.Name='customerName' THEN openResult.StringValue ELSE NULL END AS CustomerName,
CASE WHEN openResult.Name='customerNumber' THEN openResult.StringValue ELSE NULL END AS CustomerNumber,
CASE WHEN openResult.Name='loanDisbProduct' THEN openResult.StringValue ELSE NULL END AS LoanDisbProduct,
openResult.Identifier
FROM openResult
) AS K
GROUP BY K.Identifier
And we will get the following output:
CustomerName | CustomerNumber | LoanDisbProduct
------------------------------------------------------
John | 192222 | null
Mahalakshmi | 430833 | Emergency Loan
K C Nalina | 1917889 | null
weldon | 435874 | Emergency Loan
If you do not know how many customers each row contains, you shouldn't shred each customer into separate fields of one row; return at least one row per customer.
Here is a start on shredding the data. I am using the dbo.STRING_SPLIT function from this page:
First I split by {} in the JSON, then I split by ','. That gives the attribute name and value for each id, with the customers numbered within each row.
I could have split on ',' the same way as for '{...}', but I chose to use a function for this.
Everything relies on the JSON always having the same structure. For more robust parsing, SQL Server 2016+ would be recommended.
DROP TABLE IF EXISTS #customerData
create table #customerData (id bigint IDENTITY(1,1) NOT NULL, rawData varchar(max))
INSERT INTO #customerData
VALUES ('[{"customerName":"K C Nalina","attendance":"P","collectedAmount":"757","isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"1917889","totalDue":"757"},{"customerName":"Mahalakshmi","attendance":"P","collectedAmount":"881","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"430833","totalDue":"757"}]'),
('[{"customerName":"John","attendance":"P","collectedAmount":"700","isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"192222","totalDue":"788"},{"customerName":"weldon","attendance":"P","collectedAmount":"771","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"435874","totalDue":"757"}]')
;
WITH cte AS
(
SELECT id
, REPLACE(REPLACE(REPLACE(REPLACE(SUBSTRING(rawData, CHARINDEX('{', rawData), CHARINDEX('}', rawData) - CHARINDEX('{', rawData)), '{', ''), '[', ''), '}', ''), ']', '') person
, SUBSTRING(rawData, CHARINDEX('}', rawData) + 1, LEN(rawData)) personrest
, 1 nr
FROM #customerData
UNION ALL
SELECT id
, REPLACE(REPLACE(REPLACE(REPLACE(SUBSTRING(personrest, CHARINDEX('{', personrest), CHARINDEX('}', personrest) - CHARINDEX('{', personrest)), '{', ''), '[', ''), '}', ''), ']', '')
, SUBSTRING(personrest, CHARINDEX('}', personrest) + 1, LEN(personrest)) personrest
, nr + 1
FROM cte
WHERE CHARINDEX('}', personrest) > 0
AND CHARINDEX('{', personrest) > 0
)
SELECT id
, a.nr CustomerOrder
, LEFT([value], CHARINDEX(':', [value]) - 1)
, SUBSTRING([value], CHARINDEX(':', [value]) + 1, LEN([value]))
FROM cte a
CROSS APPLY (
SELECT *
FROM dbo.STRING_SPLIT(REPLACE(a.person, '"', ''), ',')
) b
The result is:
+─────+────────────────+──────────────────+─────────────────+
| id | CustomerOrder | Attribute | value |
+─────+────────────────+──────────────────+─────────────────+
| 1 | 1 | customerName | K C Nalina |
| 1 | 1 | attendance | P |
| 1 | 1 | collectedAmount | 757 |
| 1 | 1 | isOverdrafted | false |
| 1 | 1 | loanDisbProduct | null |
| 1 | 1 | paidBy | Y |
| 1 | 1 | customerNumber | 1917889 |
| 1 | 1 | totalDue | 757 |
| 2 | 1 | customerName | John |
| 2 | 1 | attendance | P |
| 2 | 1 | collectedAmount | 700 |
| 2 | 1 | isOverdrafted | false |
| 2 | 1 | loanDisbProduct | null |
| 2 | 1 | paidBy | Y |
| 2 | 1 | customerNumber | 192222 |
| 2 | 1 | totalDue | 788 |
| 2 | 2 | customerName | weldon |
| 2 | 2 | attendance | P |
| 2 | 2 | collectedAmount | 771 |
| 2 | 2 | isOverdrafted | false |
| 2 | 2 | loanDisbProduct | Emergency Loan |
| 2 | 2 | paidBy | Y |
| 2 | 2 | customerNumber | 435874 |
| 2 | 2 | totalDue | 757 |
| 1 | 2 | customerName | Mahalakshmi |
| 1 | 2 | attendance | P |
| 1 | 2 | collectedAmount | 881 |
| 1 | 2 | isOverdrafted | false |
| 1 | 2 | loanDisbProduct | Emergency Loan |
| 1 | 2 | paidBy | Y |
| 1 | 2 | customerNumber | 430833 |
| 1 | 2 | totalDue | 757 |
+─────+────────────────+──────────────────+─────────────────+
The best option would be to upgrade to v2016+. With native JSON support this would be easy...
On v2012 you have to hack around. It might be a better choice to use another tool for this. But if you have to stick to T-SQL, I would try to transform the JSON into attribute-centered XML like this:
DECLARE @customerData TABLE (id bigint IDENTITY(1,1) NOT NULL, rawData varchar(max));
insert into @customerData
values ('[{"customerName":"K C Nalina","attendance":"P","collectedAmount":"757","isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"1917889","totalDue":"757"},{"customerName":"Mahalakshmi","attendance":"P","collectedAmount":"881","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"430833","totalDue":"757"}]'),
('[{"customerName":"John","attendance":"P","collectedAmount":"700","isOverdrafted":false,"loanDisbProduct":null,"paidBy":"Y","customerNumber":"192222","totalDue":"788"},{"customerName":"weldon","attendance":"P","collectedAmount":"771","isOverdrafted":false,"loanDisbProduct":"Emergency Loan","paidBy":"Y","customerNumber":"435874","totalDue":"757"}]')
--the query
SELECT cd.id
,B.*
FROM @customerData cd
CROSS APPLY(SELECT REPLACE(REPLACE(REPLACE(cd.rawData,'false','"0"'),'true','"1"'),'null','"#NULL"')) A(JustStringValues)
CROSS APPLY(SELECT CAST(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(JustStringValues,'[',''),']',''),'},{"',' /><x '),'{"','<x '),'}',' />'),'","','" '),'":"','="') AS XML)) B(SinlgeRow)
--the result
<x customerName="K C Nalina" attendance="P" collectedAmount="757" isOverdrafted="0" loanDisbProduct="#NULL" paidBy="Y" customerNumber="1917889" totalDue="757" />
<x customerName="Mahalakshmi" attendance="P" collectedAmount="881" isOverdrafted="0" loanDisbProduct="Emergency Loan" paidBy="Y" customerNumber="430833" totalDue="757" />
<x customerName="John" attendance="P" collectedAmount="700" isOverdrafted="0" loanDisbProduct="#NULL" paidBy="Y" customerNumber="192222" totalDue="788" />
<x customerName="weldon" attendance="P" collectedAmount="771" isOverdrafted="0" loanDisbProduct="Emergency Loan" paidBy="Y" customerNumber="435874" totalDue="757" />
The idea in short:
We replace the non-quoted values (false, true, null) with a quoted placeholder.
We use various replacements to get the attribute-centered XML.
Use this query to get the values:
SELECT cd.id
,OneCustomer.value('@customerName','nvarchar(max)') AS CustomerName
,OneCustomer.value('@attendance','nvarchar(max)') AS Attendance
--more attributes
FROM @customerData cd
CROSS APPLY(SELECT REPLACE(REPLACE(REPLACE(cd.rawData,'false','"0"'),'true','"1"'),'null','"#NULL"')) A(JustStringValues)
CROSS APPLY(SELECT CAST(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(JustStringValues,'[',''),']',''),'},{"',' /><x '),'{"','<x '),'}',' />'),'","','" '),'":"','="') AS XML)) B(SinlgeRow)
CROSS APPLY B.SinlgeRow.nodes('/x') AS C(OneCustomer);
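If using another tool is an option, as suggested above, the shredding can be done outside the database. Below is a minimal Python sketch using the standard json module. The function name shred is hypothetical, and the sample string is a shortened copy of the first row from the question; reading rawData from the table and writing the result back are left out.

```python
import json


def shred(row_id, raw_data,
          fields=("customerName", "customerNumber", "loanDisbProduct")):
    """Yield one tuple per customer object in the JSON array stored in raw_data."""
    for customer in json.loads(raw_data):
        # a JSON null, or a missing key, becomes None
        yield (row_id,) + tuple(customer.get(field) for field in fields)


# shortened copy of the first rawData value from the question
raw = ('[{"customerName":"K C Nalina","isOverdrafted":false,'
       '"loanDisbProduct":null,"customerNumber":"1917889"},'
       '{"customerName":"Mahalakshmi","isOverdrafted":false,'
       '"loanDisbProduct":"Emergency Loan","customerNumber":"430833"}]')

for record in shred(1, raw):
    print(record)
```

This produces one record per customer regardless of how many customers a row contains, and it does not depend on the order of the keys.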

MYSQL query average price

I have to calculate the average price of a house in Groningen.
However, the price is not stored as a number but as a string (with some additional information), and it uses a point ('.') as the thousands separator.
Price is stored as 'Vraagprijs' in Dutch.
The table results are:
€ 95.000 k.k.
€ 116.500 v.o.n.
€ 115.000 v.o.n.
and so on...
My query:
SELECT AVG(SUBSTRING(value,8,8)) AS AveragePrice_Groningen
FROM properties
WHERE name = 'vraagprijs'
AND EXISTS (SELECT *
FROM estate
WHERE pc_wp LIKE '%Groningen%'
AND properties.woid = estate.id);
The result is:
209.47509187620884
But it has to be:
20947509187620,884
How can I get this done?
The AVG(SUBSTRING(value,8,8)) doesn't work, because the number does not always start at the same position.
Sample:
MariaDB [yourSchema]> SELECT *,SUBSTRING(`value`,8,8), SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1) FROM properties;
+----+-----------------------+------------------------+----------------------------------------------------------+
| id | value | SUBSTRING(`value`,8,8) | SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1) |
+----+-----------------------+------------------------+----------------------------------------------------------+
| 1 | € 95.000 k.k. | 95.000 k | 95.000 |
| 2 | € 116.500 v.o.n. | 116.500 | 116.500 |
| 3 | € 115.000 v.o.n. | 115.000 | 115.000 |
+----+-----------------------+------------------------+----------------------------------------------------------+
3 rows in set (0.00 sec)
MariaDB [yourSchema]>
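Change it to the following (the REPLACE removes the thousands separator, otherwise 95.000 is read as 95.0):
AVG(REPLACE(SUBSTRING_INDEX(SUBSTRING_INDEX(`value`, ' ', -2),' ',1), '.', ''))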
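Try using CAST ... AS DECIMAL and a split function to get the right part of the string. SPLIT_STR is not built into MySQL; it has to be created as a user-defined function. The REPLACE removes the thousands separator before the cast:
SELECT AVG( CAST(REPLACE(SPLIT_STR(value,' ', 2), '.', '') AS DECIMAL)) AS AveragePrice_Groningen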
FROM properties
WHERE name = 'vraagprijs'
AND EXISTS (SELECT *
FROM estate
WHERE pc_wp LIKE '%Groningen%'
AND properties.woid = estate.id);
You entered the data with the . as thousands separator, which is normal in Dutch, but MySQL follows the English convention, where the . is the decimal separator and the , separates thousands.
Enter the data into your database as 215000.000, etc. and you should get normal values as the answer.
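To see what the average should be for the three sample rows, the parsing can be reproduced outside MySQL. This is a minimal Python sketch under the assumption that every value contains exactly one amount written with '.' as the thousands separator; the function name parse_price is hypothetical.

```python
import re


def parse_price(value):
    """Extract the whole-euro amount from strings such as '€ 95.000 k.k.'."""
    match = re.search(r"\d{1,3}(?:\.\d{3})*", value)
    if match is None:
        return None
    # drop the thousands separators before converting
    return int(match.group().replace(".", ""))


values = ["€ 95.000 k.k.", "€ 116.500 v.o.n.", "€ 115.000 v.o.n."]
prices = [parse_price(v) for v in values]
average = sum(prices) / len(prices)
print(prices, round(average, 2))
```

Treating 95.000 as ninety-five thousand rather than ninety-five is what moves the average from the hundreds into the hundreds of thousands.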

Two methods of performing cohort analysis in MySQL using joins

I am building a cohort analysis processor. Input parameters: time range and step, a condition (initial event) to extract cohorts, and an additional condition (retention event) to check after every N hours/days/months. Output: a cohort analysis grid, like this:
0h | 16h | 32h | 48h | 64h | 80h | 96h |
cohort #00 15 | 6 | 4 | 1 | 1 | 2 | 2 |
cohort #01 1 | 35 | 8 | 0 | 2 | 0 | 1 |
cohort #02 0 | 3 | 31 | 11 | 5 | 3 | 0 |
cohort #03 0 | 0 | 4 | 27 | 7 | 6 | 2 |
cohort #04 0 | 1 | 1 | 4 | 29 | 4 | 3 |
Basically:
fetch cohorts: the unique users who did "something 1" in each period of length time_step, starting from time_begin.
find how many of them (in each cohort) did "something 2" after N seconds, N*2 seconds, N*3 seconds, and so on until now.
In short, I have 2 solutions. The first is too slow because it runs a heavy select with joins for each data step: day 1, day 2, day 3, etc. The second tries to optimize this by joining the results for all data steps to the cohorts in a single query. It seems to work, but I'm not sure it's the best way or that it gives the same result even if cohorts intersect. Please check it out.
Here's the whole story.
I have a table of > 100,000 events, something like this:
#user-id, timestamp, event_name
events_view (uid varchar(64), tm int(11), e varchar(64))
example input row:
"user_sampleid1", 1423836540, "level_end:001:win"
To make a cohort analysis, I first extract cohorts: for example, users who sent the special event '1st_launch' in 10-hour periods starting from 2015-02-13 and ending with 2015-02-16. All code in this post is simplified and shortened to show the idea.
DROP TABLE IF EXISTS tmp_c;
create temporary table tmp_c (uid varchar(64), tm int(11), c int(11) );
set beg = UNIX_TIMESTAMP('2015-02-13 00:00:00');
set en = UNIX_TIMESTAMP('2015-02-16 00:00:00');
select min(tm) into t_start from events_view ;
select max(tm) into t_end from events_view ;
if beg < t_start then
set beg = t_start;
end if;
if en > t_end then
set en = t_end;
end if;
set period = 3600 * 10;
set cnt_c = ceil((en - beg) / period) ;
/*works quick enough*/
WHILE i < cnt_c DO
insert into tmp_c (
select uid, min(tm), i from events_view where
locate("1st_launch", e) > 0 and tm > (beg + period * i)
AND tm <= (beg + period * (i+1)) group by uid );
SET i = i+1;
END WHILE;
Different cohorts may contain the same user ids, though usually a user exists in only one cohort. Within each cohort, users are unique.
Now I have temp table like this:
user_id | 1st timestamp | cohort_no
uid1 1423836540 0
uid2 1423839540 0
uid3 1423841160 1
uid4 1423841460 2
...
uidN 1423843080 M
Then I need to divide the time range into periods again and calculate, for each period, how many users from each cohort have sent the event "level_end:001:win".
For each small period I select all unique users who have sent the "level_end:001:win" event and left join them to the tmp_c cohorts table. So I have something like this:
user_id | 1st timestamp | cohort_no | user_id | other fields...
uid1 1423836540 0 uid1
uid2 1423839540 0 null
uid3 1423841160 1 null
uid4 1423841460 2 uid4
...
uidN 1423843080 M null
This way I see how many users from my cohorts are among those who have sent "level_end:001:win"; the rows without a match are excluded by the where clause: where t2.uid is not null.
Finally I group the rows and get the count of users in each cohort who have sent "level_end:001:win" in this particular period.
Here's the code:
DROP TABLE IF EXISTS tmp_res;
create temporary table tmp_res (uid varchar(64) CHARACTER SET cp1251 NOT NULL, c int(11), cnt int(11) );
set i = 0;
set cnt_c = ceil((t_end - beg) / period) ;
WHILE i < cnt_c DO
insert into tmp_res
select concat(beg + period * i, "_", beg + period * (i+1)), c, count(distinct(uid)) from
(select t1.uid, t1.c from tmp_c t1 left join
(select uid, min(tm) from events_view where
locate("level_end:001:win", e) > 0 and
tm > (beg + period * i) AND tm <= (beg + period * (i+1)) group by uid ) t2
on t1.uid = t2.uid where t2.uid is not null) t3
group by c;
SET i = i+1;
END WHILE;
/*getting result of the first method: tooo slooooow!*/
select * from tmp_res;
The result I've got (it's OK that some cohorts do not appear in some periods):
"1423832400_1423890000","1","35"
"1423832400_1423890000","2","3"
"1423832400_1423890000","3","1"
"1423832400_1423890000","4","1"
"1423890000_1423947600","1","21"
"1423890000_1423947600","2","50"
"1423890000_1423947600","3","2"
"1423947600_1424005200","1","9"
"1423947600_1424005200","2","24"
"1423947600_1424005200","3","70"
"1423947600_1424005200","4","6"
"1424005200_1424062800","1","7"
"1424005200_1424062800","2","15"
"1424005200_1424062800","3","21"
"1424005200_1424062800","4","32"
"1424062800_1424120400","1","7"
"1424062800_1424120400","2","13"
"1424062800_1424120400","3","24"
"1424062800_1424120400","4","18"
"1424120400_1424178000","1","10"
"1424120400_1424178000","2","12"
"1424120400_1424178000","3","18"
"1424120400_1424178000","4","14"
"1424178000_1424235600","1","6"
"1424178000_1424235600","2","7"
"1424178000_1424235600","3","9"
"1424178000_1424235600","4","12"
"1424235600_1424293200","1","6"
"1424235600_1424293200","2","8"
"1424235600_1424293200","3","9"
"1424235600_1424293200","4","5"
"1424293200_1424350800","1","5"
"1424293200_1424350800","2","3"
"1424293200_1424350800","3","11"
"1424293200_1424350800","4","10"
"1424350800_1424408400","1","8"
"1424350800_1424408400","2","5"
"1424350800_1424408400","3","7"
"1424350800_1424408400","4","7"
"1424408400_1424466000","2","6"
"1424408400_1424466000","3","7"
"1424408400_1424466000","4","3"
"1424466000_1424523600","1","3"
"1424466000_1424523600","2","4"
"1424466000_1424523600","3","8"
"1424466000_1424523600","4","2"
"1424523600_1424581200","2","3"
"1424523600_1424581200","3","3"
It works but it takes too much time to process because there are many queries here instead of one, so I need to rewrite it.
I think it can be rewritten with joins, but I'm still not sure how.
I decided to make a temporary table and write period boundaries in it:
DROP TABLE IF EXISTS tmp_times;
create temporary table tmp_times (tm_start int(11), tm_end int(11));
set cnt_c = ceil((t_end - beg) / period) ;
set i = 0;
WHILE i < cnt_c DO
insert into tmp_times values( beg + period * i, beg + period * (i+1));
SET i = i+1;
END WHILE;
Then I get periods-to-events mapping (user_id + timestamp represent particular event) to temp table and left join it to cohorts table and group the result:
SELECT Concat(tm_start, "_", tm_end) per,
t1.c coh,
Count(DISTINCT( t2.uid ))
FROM tmp_c t1
LEFT JOIN (SELECT *
FROM tmp_times t3
LEFT JOIN (SELECT uid,
tm
FROM events_view
WHERE Locate("level_end:001:win", e) > 0)
t4
ON ( t4.tm > t3.tm_start
AND t4.tm <= t3.tm_end )
WHERE t4.uid IS NOT NULL
ORDER BY t3.tm_start) t2
ON t1.uid = t2.uid
WHERE t2.uid IS NOT NULL
GROUP BY per,
coh
ORDER BY per,
coh;
In my tests this returns the same result as method #1. I can't check the result manually, but I understand better how method #1 works, and as far as I can see it gives what I want. Method #2 is faster, but I'm not sure it's the best way or that it will give the same result even if cohorts intersect.
Are there well-known common methods to perform a cohort analysis in SQL? Is method #1 more reliable than method #2? I don't work with joins very often, so I still do not fully understand how they behave.
Method #2 looks like pure magic to me, and I tend not to trust what I don't understand :)
Thanks for answers!
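One way to gain confidence in either SQL method is to compare it with a straightforward reference computation on a small hand-made data set, including a user who belongs to two cohorts. The following is a minimal Python sketch of such a reference, not a statement about which SQL method is correct. The names bucket and cohort_grid are hypothetical; the interval bounds follow the conditions used above (tm > beg + period * i AND tm <= beg + period * (i+1)), and the result counts distinct users per (period, cohort) pair, as both methods do.

```python
from collections import defaultdict


def bucket(tm, beg, period):
    """Index i of the interval (beg + period*i, beg + period*(i+1)] that contains tm."""
    return (tm - beg - 1) // period


def cohort_grid(events, beg, en, period, start_event, retention_event):
    """events: list of (uid, tm, e). Returns {(period_no, cohort_no): distinct users}."""
    cohorts_of = defaultdict(set)  # uid -> cohort numbers, like tmp_c
    for uid, tm, e in events:
        if start_event in e and beg < tm <= en:
            cohorts_of[uid].add(bucket(tm, beg, period))
    users = defaultdict(set)       # (period_no, cohort_no) -> distinct uids
    for uid, tm, e in events:
        if retention_event in e and tm > beg:
            for cohort_no in cohorts_of.get(uid, ()):
                users[(bucket(tm, beg, period), cohort_no)].add(uid)
    return {key: len(uids) for key, uids in users.items()}


# tiny made-up data set: u1 launches in cohorts 0 and 1, u2 in cohort 1, u3 never launches
events = [
    ("u1", 5, "1st_launch"), ("u1", 14, "1st_launch"), ("u2", 15, "1st_launch"),
    ("u1", 7, "level_end:001:win"), ("u1", 12, "level_end:001:win"),
    ("u2", 18, "level_end:001:win"), ("u3", 8, "level_end:001:win"),
]
grid = cohort_grid(events, beg=0, en=20, period=10,
                   start_event="1st_launch", retention_event="level_end:001:win")
print(sorted(grid.items()))
```

Loading the same rows into events_view and running both methods should give the same pairs and counts if they agree with this reading of the requirements.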

split characters and numbers in MySQL

I have a column in my table like this,
students
--------
abc23
def1
xyz567
......
and so on. Now I need only the names as output:
students
--------
abc
def
xyz
How can I get this in MySQL? Thanks in advance.
You can do it with string functions and some CAST() magic:
SELECT
SUBSTR(
name,
1,
CHAR_LENGTH(name) - CHAR_LENGTH(
IF(
@c:=CAST(REVERSE(name) AS UNSIGNED),
@c,
''
)
)
)
FROM
students
for example:
SET @name:='abc12345';
mysql> SELECT SUBSTR(@name, 1, CHAR_LENGTH(@name) - CHAR_LENGTH(IF(@c:=CAST(REVERSE(@name) AS UNSIGNED), @c, ''))) AS name;
+------+
| name |
+------+
| abc |
+------+
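For comparison, the intended result can be written down directly as "remove all trailing digits". Here is a minimal Python sketch (the function name strip_trailing_digits is hypothetical) that can be used to check the SQL output on sample values. One difference worth knowing: the CAST(REVERSE(...)) trick drops trailing zeros of the numeric suffix, because 'abc120' reverses to '021cba', which casts to 21, so the SQL would return 'abc1' for that value.

```python
def strip_trailing_digits(name):
    """Remove every digit at the end of the string, keeping the leading letters."""
    return name.rstrip("0123456789")


for student in ["abc23", "def1", "xyz567", "abc120"]:
    print(student, "->", strip_trailing_digits(student))
```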