Ordering by a non-numeric hierarchy string numerically - mysql

I have a table with data representing a tree structure, with one column indicating the row's position in the hierarchical tree. Each level is separated with a -.
1
1-1
2
2-1
2-2
2-2-1
2-2-2
2-2-2-1
The tree is retrieved in order simply with an ORDER BY on this column. This falls down as soon as any level has 10 or more items, because the column is sorted alphabetically: MySQL sorts 10 before 3.
Actual result:
1
1-10
1-3
2
Desired result:
1
1-3
1-10
2
There could be any number of levels of depth to the values.
Is it possible to sort this data numerically in MySQL?

I think your best shot is to convert the data into something that sorts naturally. If your tree structure will always have fewer than 100 children per level, you could create a function like the one below and use GetTreeStructureSort(columnName) in the ORDER BY clause. (If three-digit numbers are possible, you would need to adjust the padding accordingly.) Note that the function is written in SQL Server (T-SQL) syntax, so it would need porting to run on MySQL.
CREATE FUNCTION GetTreeStructureSort
(
-- Add the parameters for the function here
@structure varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
DECLARE @sort varchar(500)
-- Add a hyphen to the beginning and end to make all the numbers from 1 to 9 easily replaceable
SET @sort = '-' + @structure + '-'
-- Replace each instance of a one-digit number with a two-digit representation
SELECT @sort = REPLACE(@sort, '-1-', '-01-')
SELECT @sort = REPLACE(@sort, '-2-', '-02-')
SELECT @sort = REPLACE(@sort, '-3-', '-03-')
SELECT @sort = REPLACE(@sort, '-4-', '-04-')
SELECT @sort = REPLACE(@sort, '-5-', '-05-')
SELECT @sort = REPLACE(@sort, '-6-', '-06-')
SELECT @sort = REPLACE(@sort, '-7-', '-07-')
SELECT @sort = REPLACE(@sort, '-8-', '-08-')
SELECT @sort = REPLACE(@sort, '-9-', '-09-')
-- Strip off the first and last hyphens that were added at the beginning.
SELECT @sort = SUBSTRING(@sort, 2, LEN(@sort) - 2)
-- Return the result of the function
RETURN @sort
END
This would convert these results:
1
1-10
1-3
2
into this:
01
01-03
01-10
02
I tested this with the following code:
DECLARE @something varchar(255)
set @something = '1-10-3-21'
SELECT dbo.GetTreeStructureSort(@something)
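For anyone who wants to try the padding idea outside the database, here is a minimal Python sketch. The name tree_sort_key and the width parameter are my own and not part of the answer above; it assumes every level is a non-negative integer with at most `width` digits, and pads each level so that a plain string sort gives the numeric order.

```python
def tree_sort_key(path: str, width: int = 2) -> str:
    """Zero-pad every '-'-separated level to a fixed width (hypothetical helper)."""
    return "-".join(part.zfill(width) for part in path.split("-"))


rows = ["1", "1-10", "1-3", "2"]
ordered = sorted(rows, key=tree_sort_key)
print(ordered)
```

Padding every level, rather than replacing only the digits 1 to 9, also covers a level numbered 0, and a larger width handles three-digit levels.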

Related

MySQL function to generate custom IDs

I need some help in creating a MySQL function
This function generates a unique 5-character ID for each user, starting from A0001, A0002, then B0001, C0001, and so on. The problem is what happens once it reaches F9999: as written, my function would produce G0000 next.
But my requirement is that the ID can't go past the letter F.
A user ID can't be longer than 5 characters, and the only letters we can use are A to F.
So I came up with a solution: move on to a range like AA000, AA001, AA002, ... and then AB000, AB001, AB002, ..., AF999, BA000, etc.
This is the function I currently use to generate the user ID:
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `getNextID`() RETURNS varchar(10) CHARSET utf8
BEGIN
set @prefix := (select COALESCE(max(left(id, 1)), 'A') from users where left(id, 1) < 1);
set @highest := (select max(CAST(right(id, 4) AS UNSIGNED))+1 from users where left(id, 1) = @prefix);
if @highest > 9999 then
set @prefix := CHAR(ORD(@prefix)+1);
set @highest := 0;
end if;
RETURN concat( @prefix , LPAD( @highest, 4, 0 ) );
END$$
DELIMITER ;
Your ID can be thought of as a hexadecimal number consisting of letters only, followed by a decimal number. Each hexadecimal digit starts a new series of decimal numbers, because the ID has a fixed length of 5.
The first subproblem is to find the maximum ID, because it should be assumed that F9998 < F9999 < AA000 < AA001. We can calculate H*10000 + D with H being the hexadecimal part and D the decimal part of the ID to get the right order.
SELECT id
FROM (
SELECT 'AB999' as id UNION
SELECT 'AA000' UNION
SELECT 'F9999' UNION
SELECT 'AAA00' UNION
SELECT 'FFFF9' UNION
SELECT 'FFFF8' UNION
SELECT 'FFFD3') user
ORDER BY conv(regexp_substr(id, '^[A-F]*'), 16, 10) * 10000 + CAST(substring(id, length(regexp_substr(id, '^[A-F]*')) + 1) AS unsigned) DESC
LIMIT 1;
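To make the ordering rule H*10000 + D easier to check by hand, here is a small Python sketch of the same calculation. The function name id_rank is hypothetical, and it assumes every ID is a run of the letters A to F followed by decimal digits.

```python
import re


def id_rank(user_id: str) -> int:
    """Rank an ID as H * 10000 + D (hypothetical helper mirroring the ORDER BY above)."""
    letters = re.match(r"[A-F]*", user_id).group()  # leading hexadecimal part
    digits = user_id[len(letters):]                 # trailing decimal part
    h = int(letters, 16) if letters else 0
    d = int(digits) if digits else 0
    return h * 10000 + d


ids = ["AB999", "AA000", "F9999", "AAA00", "FFFF9", "FFFF8", "FFFD3"]
highest = max(ids, key=id_rank)
print(highest)
```

Because a longer letter prefix always gives a larger H, the fixed factor of 10000 is enough for ordering even though the decimal part gets shorter.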
The second subproblem is to find the successor of a given ID. We calculate the decimal number as above, but this time with the correct factor (10^n, with n being the length of the decimal part), add one to it, and convert it back to the hex/dec representation. The hexadecimal part may then contain 0s and 1s, which have to be replaced by 'A'. Whenever the hex part gets longer, the decimal part consists of 0s only, so we can simply return a substring of the desired length, which drops the surplus trailing zero:
DELIMITER //
CREATE FUNCTION nextId(id VARCHAR(5)) RETURNS VARCHAR(5) NO SQL
BEGIN
set @hexStr := regexp_substr(id, '^[A-F]*');
set @digits := length(id) - length(@hexStr);
set @decimalPart := CAST(right(id, @digits) AS UNSIGNED);
set @factor := pow(10, @digits);
set @hexPart := conv(@hexStr, 16, 10);
set @n := @hexPart * @factor + @decimalPart + 1; -- ID increased by 1
set @decimalPart := mod(@n, @factor);
set @hexStr := regexp_replace(conv(floor(@n / @factor), 10, 16), '[01]', 'A');
return substring(concat(@hexStr, lpad(@decimalPart, @digits, '0')), 1, length(id));
END;
//
DELIMITER ;
Using this function
SELECT id, nextId(id) next_id
FROM (
SELECT 'F9998' as id UNION
SELECT 'F9999' UNION
SELECT 'AA999' as id UNION
SELECT 'AB000' UNION
SELECT 'AB999' UNION
SELECT 'AF999' UNION
SELECT 'FF999' UNION
SELECT 'AAA00') user;
results in
id     next_id
F9998  F9999
F9999  AA000
AA999  AB000
AB000  AB001
AB999  AC000
AF999  BA000
FF999  AAA00
AAA00  AAA01
Here's a fiddle
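The successor logic can also be traced step by step in Python. This is only a sketch that mirrors the arithmetic of the nextId function; the name next_id is hypothetical, and it assumes the input is a run of the letters A to F followed by decimal digits.

```python
import re


def next_id(user_id: str) -> str:
    """Return the ID that follows user_id (sketch of the same arithmetic as nextId)."""
    letters = re.match(r"[A-F]*", user_id).group()
    n_digits = len(user_id) - len(letters)
    factor = 10 ** n_digits
    decimal_part = int(user_id[len(letters):]) if n_digits else 0
    hex_part = int(letters, 16) if letters else 0

    n = hex_part * factor + decimal_part + 1  # ID increased by 1

    # A carry in the hexadecimal part produces the digits 0 and 1; both become 'A'.
    new_hex = re.sub(r"[01]", "A", format(n // factor, "X"))
    new_dec = str(n % factor).zfill(n_digits) if n_digits else ""

    # When the hex part grew by one letter, cutting to the original length
    # drops one of the trailing zeros.
    return (new_hex + new_dec)[: len(user_id)]


for sample in ["F9998", "F9999", "AA999", "AF999", "FF999", "AAA00"]:
    print(sample, next_id(sample))
```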

Teradata Masking - Retain all characters at position 1,4,8,12,16 .... in a string and mask remaining characters with 'X'

I have a requirement to mask every character of a variable-length string with 'X', except the characters at positions 1, 4, 8, 12, 16, and so on.
For example:
Input string - 'John Doe'
Output string - 'JXXn XXe'
SPACE between the two strings must be retained.
Kindly help or reach out for more details if required.
I think maybe an external function would be best here, but if that's too much to bite off, you can get crafty with strtok_split_to_table, xml_agg and regexp_replace to rip the string apart, replace out characters using your criteria, and stitch it back together:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
FROM
(
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
) stringshred
GROUP BY outkey
This won't be fast on a large data set, but it might suffice depending on how much data you have to process.
Breaking this down:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
This CTE is just adding a comma between every character of our incoming string using that regexp_replace function. Your name will come out like J,o,h,n, ,D,o,e. You can ignore the sys_calendar part, I just put that in so it would spit out exactly 1 record for testing.
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
This subquery is the important bit. Here we create a record for every character in your incoming name. strtok_split_to_table is doing the work here splitting that incoming name by comma (which we added in the CTE)
The CASE expression applies your criteria: it keeps the character at position 1, at every multiple of 4, and any space, and swaps everything else for 'X'.
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
Finally we use XMLAGG to combine the many records back into one string in a single record. Because XMLAGG adds a space in between each character we have to hit it a couple of times with regexp_replace to flip those spaces back to nothing.
So... it's ugly, but it does the job.
The code above spits out:
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
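If you want to check the expected output independently of Teradata, the masking rule itself is easy to state in a few lines of Python. This is just a reference sketch of the rule (keep position 1, every multiple of 4, and spaces); the function name mask is my own.

```python
def mask(text: str) -> str:
    """Keep 1-based positions 1, 4, 8, 12, ... and all spaces; mask the rest with 'X'."""
    return "".join(
        ch if pos == 1 or pos % 4 == 0 or ch == " " else "X"
        for pos, ch in enumerate(text, start=1)
    )


print(mask("this is a test of this functionality"))
print(mask("John Doe"))
```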
I couldn't think of a solution, but then @JNevill inspired me with his idea to add a comma to each character :-)
SELECT
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2)
,'(\([^ ])', 'X')
,'(\(|\[)')
,'this is a test of this functionality' AS inputString
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
The 1st RegExp_Replace starts at the 2nd character (keep the 1st character as-is) and processes groups of (up to) 4 characters adding either a ( (characters #1,#2,#4, to be replaced by X unless it's a space) or [ (character #3, no replacement), which results in :
t(h(i[s( (i(s[ (a( (t[e(s(t( [o(f( (t[h(i(s( [f(u(n(c[t(i(o(n[a(l(i(t[y(
Of course, this assumes that neither character exists in your input data; otherwise you have to choose different ones.
The 2nd RegExp_Replace replaces the ( and the following character with X unless it's a space, which results in:
tXX[s( XX[ X( X[eXX( [oX( X[hXX( [fXXX[tXXX[aXXX[y(
Now there are some ( and [ left, which are removed by the 3rd RegExp_Replace.
As I still consider myself a beginner in regular expressions, there are probably better solutions :-)
Edit:
In older Teradata versions not all parameters were optional, so you might have to supply values for them:
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2, 0, 'c')
,'(\([^ ])', 'X', 1, 0, 'c')
,'(\(|\[)', '', 1, 0, 'c')
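The three regex passes can be replayed in Python to see what each one does. This is a sketch using Python's re module rather than Teradata's RegExp_Replace, so details may differ between the two engines; the function name mask_with_markers is hypothetical, and it assumes the input contains neither ( nor [.

```python
import re


def mask_with_markers(text: str) -> str:
    """Replay the three replacement passes on everything after the first character."""
    head, rest = text[:1], text[1:]  # the first character is kept as-is

    # Pass 1: in each group of up to 4 characters, prefix characters 1, 2 and 4
    # with '(' (candidates for masking) and character 3 with '[' (always kept).
    marked = re.sub(r"(.)(.)?(.)?(.)?", r"(\1(\2[\3(\4", rest)

    # Pass 2: '(' followed by anything except a space becomes 'X'.
    masked = re.sub(r"\([^ ]", "X", marked)

    # Pass 3: remove the markers that are left over.
    return head + re.sub(r"[(\[]", "", masked)


print(mask_with_markers("this is a test of this functionality"))
print(mask_with_markers("John Doe"))
```

One thing I noticed while tracing this in Python: when the last group holds exactly one character (input length 4k+2, for example 'ab'), pass 1 leaves "([" behind, pass 2 turns that into an extra X, and the result is one character too long. Whether Teradata behaves the same way is untested here, so treat that edge case with care.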

Extracting more than one date from a text field

I have a SQL query that works perfectly for what I need: extracting the first date from a text field. This is a free-form text field tied to a status; when the right conditions align, my query looks for a date and, if it is in the correct format, extracts it.
Sometimes the date will be input as a range of dates or a comma separated list of dates. I would like to know if there is a way to extract the last date in the case of a date range or the other dates in a list of dates?
The current query extracts the date in 3 steps, using temp tables. Here are snippets for each step.
In the first step it looks for the word 'proposed' and grabs a number of characters after:
,SUBSTRING(al.Comments,
PATINDEX('%proposed%',al.Comments)+9,17) [DateFirstPass]
In the second step the query extracts the date:
,LEFT(
SUBSTRING(p1.DateFirstPass, PATINDEX('%[0-9/]%', p1.DateFirstPass), 10), --string (only numbers and forward slash) MAX of 10 chars (mm/dd/yyyy)
PATINDEX('%[^0-9/]%', SUBSTRING(p1.DateFirstPass, PATINDEX('%[0-9/]%', p1.DateFirstPass), 10) + 'X')-1) --char length after negating nonvalid characters '%[^0-9/]%'
[DateSecondPass]
In the last pass it adds in the year if it is missing:
CASE
WHEN ISDATE(p2.DateSecondPass) = 1
THEN CAST(p2.DateSecondPass AS DATE)
WHEN ISDATE(p2.DateSecondPass + '/' + this.yr) = 1
THEN p2.DateSecondPass + '/' + this.yr --Adds missing year
END [DateThirdPass]
FROM
#ProposedDateParse2 p2
CROSS JOIN (VALUES (CAST(YEAR(GETDATE()) AS varchar(4)))) this(yr)
Grab a copy of patternSplitCM
Here's the code:
-- PatternSplitCM will split a string based on a pattern of the form
-- supported by LIKE and PATINDEX
--
-- Created by: Chris Morris 12-Oct-2012
CREATE FUNCTION dbo.PatternSplitCM
(
@List VARCHAR(8000) = NULL,
@Pattern VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH numbers AS (
SELECT TOP(ISNULL(DATALENGTH(@List), 0))
n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),
[Matched]
FROM (
SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)
FROM numbers
CROSS APPLY (
SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END
) y
) d
GROUP BY [Matched], Grouper
Solution against a variable:
DECLARE @string varchar(8000) =
'Blah blah 3/1/2017, 12/19/2018,1/2/2020,1111/11111/1111/111/111 blah blah';
SELECT TOP (1) Item
FROM dbo.patternSplitCM(@string, '[0-9/]')
WHERE [Matched] = 1 AND ISDATE(item) = 1
ORDER BY -ItemNumber;
Results:
item
------
1/2/2020
Example against a table:
DECLARE @table table (someid int identity, sometext varchar(1000));
INSERT @table(sometext) VALUES
('Blah blah 3/1/2017, 12/19/2018,1/2/2020,1111/11111/1111/111/111 blah blah'),
('Yada yada 1/1/12, 12/31/1999 call me at 555-1212!');
SELECT t.someid, getLastDate.Item
FROM @table t
CROSS APPLY
(
SELECT TOP (1) Item
FROM dbo.patternSplitCM(t.sometext, '[0-9/]')
WHERE [Matched] = 1 AND ISDATE(item) = 1
ORDER BY -ItemNumber
) getLastDate;
Results:
someid Item
----------- -----------
1 1/2/2020
2 12/31/1999
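The same idea (split the text into runs of digits and slashes, keep the runs that are valid dates, take the last one) can be sketched in Python for comparison. The function name last_date is hypothetical, and datetime.strptime with two assumed formats (m/d/yyyy and m/d/yy) stands in for ISDATE, which accepts more formats than this.

```python
import re
from datetime import datetime


def last_date(text: str):
    """Return the last run of digits/slashes that parses as a date, or None."""
    # Walk the runs from the end so the first valid hit is the last date in the text.
    for run in reversed(re.findall(r"[0-9/]+", text)):
        for fmt in ("%m/%d/%Y", "%m/%d/%y"):  # assumed formats, not ISDATE's full set
            try:
                datetime.strptime(run, fmt)
                return run
            except ValueError:
                continue
    return None


samples = [
    "Blah blah 3/1/2017, 12/19/2018,1/2/2020,1111/11111/1111/111/111 blah blah",
    "Yada yada 1/1/12, 12/31/1999 call me at 555-1212!",
]
for s in samples:
    print(last_date(s))
```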

mysql + all numbers from one field to Sum

Is it possible to sum the digits in a string and sort by that?
Example values: 19, 21
19 Should be transformed to 10. Explanation: 1+9=10
21 Should be transformed to 3. Explanation: 2+1= 3
After calculating these results, the table needs to be sorted by the resulting values (using ORDER BY).
Originally, I have those values stored as JSON array, so it's ["1","9"] and ["2","1"], and in order to parse the JSON I'm using replace as follows:
REPLACE(REPLACE(REPLACE(item_qty, '["', ''), '"]', ''), '","', '')
How about trying something like:
SELECT (
-- the third argument of SUBSTRING is a length, so 1 picks out a single digit
SUBSTRING('["1","9"]', 3, 1) +
SUBSTRING('["1","9"]', 7, 1)
) AS sumOfDigits;
And then, if the value ["1","9"] is stored in a column named json and the table is named table, you can do:
SELECT * FROM (
SELECT `table`.*, (
SUBSTRING(`json`, 3, 1) +
SUBSTRING(`json`, 7, 1)
) AS sumOfDigits
FROM `table`
) tmp
ORDER BY sumOfDigits;
I would define a function to do the sum, as follows:
DELIMITER //
CREATE FUNCTION add_digits
(
number INTEGER
) RETURNS INTEGER
BEGIN
DECLARE my_sum INTEGER;
SET my_sum = 0;
SET number = ABS(number);
WHILE (number > 0) DO
SET my_sum = my_sum + (number MOD 10);
SET number = number DIV 10;
END WHILE;
RETURN my_sum;
END //
DELIMITER ;
You could also create a function that works directly on your JSON string, parsing it for digits and adding their values.
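As an illustration of working directly on the JSON string, here is a minimal Python sketch of the parse-and-add logic. It is not MySQL code; the name digit_sum is hypothetical, and it assumes the column holds a JSON array of strings such as ["1","9"].

```python
import json


def digit_sum(item_qty: str) -> int:
    """Sum every digit found in a JSON array of strings, e.g. '["1","9"]' gives 10."""
    return sum(
        int(ch)
        for element in json.loads(item_qty)
        for ch in element
        if ch.isdigit()
    )


rows = ['["1","9"]', '["2","1"]']
ordered = sorted(rows, key=digit_sum)
print(ordered)
```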

Count flags for a variable (big) number of columns

I have a table which looks like this: http://i.stack.imgur.com/EyKt3.png
And I want a result like this:
Condition COL
ted1 4
ted2 1
ted3 2
I.e., the count of the number of '1' only in this case.
I want to know the total no. of 1's only (check the table), neglecting the 0's. It's like if the condition is true (1) then count +1.
Also consider: what if there are many columns? I want to avoid typing expressions for every single one, like in this case ted1 to ted80.
Using proc means is the most efficient method:
proc means data=have noprint;
var ted:; *captures anything that starts with Ted;
output out=want sum =;
run;
proc print data=want;
run;
Try this
select
sum(case when ted1=1 then 1 else 0 end) as ted1,
sum(case when ted2=1 then 1 else 0 end) as ted2,
sum(case when ted3=1 then 1 else 0 end) as ted3
from table
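To show the counting logic without typing one expression per column, here is a small Python sketch. The sample rows are made up (the original table is only available as an image) and were chosen to reproduce the counts in the question; the function name count_flags and the prefix parameter are hypothetical.

```python
def count_flags(rows, prefix="ted"):
    """Count how many rows have the value 1 in each column whose name starts with prefix."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            if col.startswith(prefix):
                counts[col] = counts.get(col, 0) + (1 if val == 1 else 0)
    return counts


# Hypothetical data, not taken from the question's screenshot.
rows = [
    {"ted1": 1, "ted2": 0, "ted3": 1},
    {"ted1": 1, "ted2": 1, "ted3": 0},
    {"ted1": 1, "ted2": 0, "ted3": 1},
    {"ted1": 1, "ted2": 0, "ted3": 0},
    {"ted1": 0, "ted2": 0, "ted3": 0},
]
print(count_flags(rows))
```

Selecting columns by prefix plays the same role as `ted:` in the proc means answer: the list of columns never has to be written out.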
In PostgreSQL (tested with version 9.4) you could unpivot with a VALUES expression in a LATERAL subquery. You'll need dynamic SQL.
This works for any table with any number of columns matching any pattern as long as selected columns are all numeric or all boolean. Only the value 1 (true) is counted.
Create this function once:
CREATE OR REPLACE FUNCTION f_tagcount(_tbl regclass, col_pattern text)
RETURNS TABLE (tag text, tag_ct bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT
'SELECT l.tag, count(l.val::int = 1 OR NULL)
FROM ' || _tbl || ', LATERAL (VALUES '
|| string_agg( format('(%1$L, %1$I)', attname), ', ')
|| ') l(tag, val)
GROUP BY 1
ORDER BY 1'
FROM pg_catalog.pg_attribute
WHERE attrelid = _tbl
AND attname LIKE col_pattern
AND attnum > 0
AND NOT attisdropped
);
END
$func$;
Call:
SELECT * FROM f_tagcount('tbl', 'ted%');
Result:
tag | tag_ct
-----+-------
ted1 | 4
ted2 | 1
ted3 | 2
The 1st argument is a valid table name, possibly schema-qualified. Defense against SQL-injection is built into the data type regclass.
The 2nd argument is a LIKE pattern for the column names. Hence the wildcard %.
Related:
Select columns with particular column names in PostgreSQL
SELECT DISTINCT on multiple columns