Ordering an existing database by numbers in a string - mysql

This one is a bit of a nightmare. I'm working on a frontend for an existing database, and I'm having to jump through hoops to make sure that data is displayed in the correct order. It'd make my life a whole lot simpler if I could just order by Id, but the Ids have little or no correlation to the data.
Here's what I mean
ID DATA
357 "7-1-5: Sensitive Information I can't share"
2521 "30-2-8-17: Yet more sensitive Information"
6002 "9-30: There's a 10 behind the colon, because I hate you"
8999 "2-2-4: This was populated in no particular order"
9001 "30-3: More Info."
I'm trying to get it ordered like this
ID DATA
0001 "2-2-4: This was populated in no particular order"
0002 "7-1-5: Sensitive Information I can't share"
0003 "9-30: There's a 10 behind the colon, because I hate you"
0004 "30-2-8-17: Yet more sensitive Information"
0005 "30-3: More Info."
Basically, I need it to sort by each 1 to 2 digit number that's separated by dashes, again and again, so that 1-3 comes after 1-2-1, which comes after 1-1-50.
Like I said in the beginning, I'm a frontend guy, so executing stuff in MySQL is more than I can do alone. Any help would be immensely appreciated.
Edit: I just realized there are foreign keys in a separate table pointing to this one, making things just that much worse.

Try this query:
SELECT col
FROM yourTable
ORDER BY SUBSTRING(col, INSTR(col, '"') + 1, INSTR(col, ':') - INSTR(col, '"') - 1)
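The SUBSTRING(...) term in the ORDER BY clause extracts just the ids from the text. Presumably you want them sorted numerically, from left to right. Be aware that the extracted value is still a varchar, so MySQL compares it as a string, character by character, rather than numerically; an id starting with 30 therefore sorts before one starting with 7.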
For your sample data, this produced the following output:
ID 8999 DATA "2-2-4: This was populated in no particular order"
ID 2521 DATA "30-2-8-17: Yet more sensitive Information"
ID 357 DATA "7-1-5: Sensitive Information I can't share"
ID 6002 DATA "9-30: There's a 10 behind the colon, because I hate you"
Fiddle is down as of the writing of this answer, but I tested the query in MySQL Workbench and it seems to work well.
Edit:
If you want to assign a new ID to each record, you can create a new table (newTable) with an auto-increment ID column. Then you can use INSERT INTO ... SELECT along with the above ORDER BY logic to populate the table. The ID field should be incremented automatically by MySQL.
INSERT INTO newTable (`id`, `col`)
SELECT NULL, col
FROM yourTable
ORDER BY SUBSTRING(col, INSTR(col, '"') + 1, INSTR(col, ':') - INSTR(col, '"') - 1)
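To see what ordering by the extracted prefix alone does, here is a minimal Python sketch (not MySQL; the list simply copies the prefixes from the question's sample rows). It sorts the prefixes as plain strings, which is how a varchar expression is compared, and is only meant as an illustration under that assumption:

```python
# Prefixes copied from the question's sample rows.
prefixes = ["7-1-5", "30-2-8-17", "9-30", "2-2-4", "30-3"]

# Plain string comparison works character by character,
# so "30-..." sorts before "7-..." because "3" < "7".
as_strings = sorted(prefixes)

print(as_strings)
```

The string order is not the numeric order the question asks for, so each dash-separated part has to be converted to a number before comparing.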

Something like this should work, but it is very delicate; all the fields calculated in the outer SELECT (after the *) must be performed in that exact order. Note that the calculations aliased nl#, p#, and r# (except r0) repeat exactly... so the query is not as complicated as it initially appears.
SELECT *
, @r := dataOrd AS r0 -- @r is "remaining string"
, @nextSep := INSTR(@r, '-') AS nl1
, CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p1
, @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r1
, @nextSep := INSTR(@r, '-') AS nl2
, CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p2
, @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r2
, @nextSep := INSTR(@r, '-') AS nl3
, CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p3
, @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r3
, @nextSep := INSTR(@r, '-') AS nl4
, CAST(CASE @nextSep WHEN 0 THEN @r ELSE SUBSTR(@r, 1, @nextSep-1) END AS UNSIGNED) AS p4
, @r := CASE @nextSep WHEN 0 THEN '' ELSE SUBSTRING(@r, @nextSep+1) END AS r4
FROM
(
SELECT *, SUBSTR(`DATA`, 1, INSTR(`DATA`, ':') - 1) AS dataOrd
FROM yourTable
) AS sepSubQ
ORDER BY p1, p2, p3, p4
;
Technically, the last @r assignment (aliased r4) is unnecessary, but it completes the pattern that will need to be repeated if you have to handle more than 4 ordering "parts"; in that case you just repeat the last three field calculations (with incremented aliases).
If you want to be rid of the "working" fields, you can wrap this in another outer query that only selects the fields you wanted from the original table and the pX fields from the above query; technically, you don't even need to select the pX fields, as the ordering will already be performed by this query, or can be done in the wrapper without selecting them.
SELECT `ID`, `DATA`
FROM ([the query above]) AS subQ
ORDER BY p1, p2, p3, p4
;
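The same part-by-part ordering can be checked outside the database. This is a rough Python sketch of the idea, not a translation of the query: `sort_key` and `max_parts` are names made up here, and padding missing parts with 0 is an assumption that mirrors how the query casts an empty remainder to 0.

```python
def sort_key(data, max_parts=4):
    """Turn '30-2-8-17: text' into (30, 2, 8, 17), padding short prefixes with 0."""
    prefix = data.split(":", 1)[0]                # text before the first colon
    parts = [int(p) for p in prefix.split("-")]   # numeric value of each dash-separated part
    parts += [0] * (max_parts - len(parts))       # missing parts count as 0
    return tuple(parts[:max_parts])

# Sample rows copied from the question.
rows = [
    "7-1-5: Sensitive Information I can't share",
    "30-2-8-17: Yet more sensitive Information",
    "9-30: There's a 10 behind the colon, because I hate you",
    "2-2-4: This was populated in no particular order",
    "30-3: More Info.",
]

ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row)
```

Comparing tuples of integers gives the left-to-right numeric order the question describes, so 1-1-50 comes before 1-2-1, which comes before 1-3.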

Related

SQL delete unnecessary characters from a row and parse a table into a new table

I have created the below code:
with t as (select *,
case
when `2kids`= '1' then '2kids' else'' end as new_2kids,
case
when `3kids`= '1' then '3kids' else'' end as new_3kids,
case
when kids= '1' then 'kids' else'' end as new_kids
from test.family)
select concat_ws('/',new_2kids, new_3kids, new_kids) as 'nc_kids'
from t;
If I run this query my output will be:
nc_kids
2kids/new_3kids/
2kids//
/new_3kids/new_kids
2kids/new_3kids/new_kids
How can I remove all the unnecessary '/' characters that are not followed by a character?
For example:
nc_kids
2kids/new_3kids
2kids
new_3kids/new_kids
2kids/new_3kids/new_kids

concat_ws() ignores nulls, so you can just turn the empty strings into null values at concatenation time:
select concat_ws('/',
nullif(new_2kids, ''),
nullif(new_3kids, ''),
nullif(new_kids, '')
) as nc_kids
from t;
Better yet, fix the case expressions so they produce null values instead of empty strings in the first place:
with t as (
select f.*,
case when `2kids`= 1 then '2kids' end as new_2kids,
case when `3kids`= 1 then '3kids' end as new_3kids,
case when kids = 1 then 'kids' end as new_kids
from test.family f
)
select concat_ws('/',new_2kids, new_3kids, new_kids) as nc_kids
from t;
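The difference between an empty string and a null is the whole point here, so a small model may help. This Python sketch is only an approximation of CONCAT_WS (the helper name `concat_ws` is made up, `None` stands in for NULL, and a NULL separator is not modelled):

```python
def concat_ws(sep, *values):
    """Model of CONCAT_WS: None (NULL) arguments are skipped, empty strings are not."""
    return sep.join(v for v in values if v is not None)

with_empty = concat_ws("/", "2kids", "", "")      # empty strings still get separators
with_null = concat_ws("/", "2kids", None, None)   # NULLs are dropped entirely

print(with_empty)
print(with_null)
```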

User-defined function sorting column problem

I took a user-defined function from the internet that locates the nth occurrence of a string, so that I can sort the name column in my database. I am using MySQL 5.5, not the latest version. Here is my sample database: https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=bcb32a6b47d0d5b061fd401d0888bdc3
My problem is that I want to sort the name column by its numeric prefix, but the SQL query below doesn't work.
select t.id,t.name
from
(
select t.*, cast((case when col1_col2_ref > 0
then
substring_index(modified_name,'-',1)
else
modified_name
end
) as unsigned) col1
, cast((case when col1_col2_ref > 0
and col3_ref > 0
then
substr(modified_name,(col1_col2_ref + 1),(col3_ref - (col1_col2_ref + 1)))
when col1_col2_ref > 0
then
substr(modified_name,(col1_col2_ref + 1))
end) as unsigned) col2
, cast((case when col3_ref > 0
and col4_ref > 0
then
substr(modified_name,(col3_ref + 1),(col4_ref - (col3_ref + 1)))
when col3_ref > 0
then
substr(modified_name,(col3_ref + 1))
end) as unsigned) col3
, cast((case when col4_ref > 0
then
substr(modified_name,(col4_ref + 1))
end) as unsigned) col4
from
(
select t.*,substring_index(name,' ',1) modified_name
,locate('-',name,1) col1_col2_ref
,locate('/',name,1) col3_ref
,locate('/',name,locate('/',name,1)+1) col4_ref
from filing_code_management t
) t
) t
order by col1,col2,col3,col4
It gives me the result below, which is not sorted properly.
Output 1
Actually I want the output sample like below:
Output 2
Output 3
This is the earlier version, where sorting the name column worked: https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=6b12a4d42359cb30f27a5bfb9d0c8210. After I inserted new data, it stopped working. For example, it fails on new data like (R)100-6-2-2 Mesyuarat Majlis Kerajaan Negeri (MMKN) JKK, where () appears in front, or on new data like 100-1-1 Penggubalan/Penyediaan/Pindaan Undang-Undang/Peraturan, where / appears between the words.
I hope someone can guide me to solve this problem. Thanks.

You should be able to adapt the following code to your needs (tested at your DB Fiddle!). I've used the file_name column instead of the name column to slightly simplify building the sort fields, as it seems the file name is always repeated in the first part of the name field anyway.
This would be quite a bit simpler using regular expression support, but I note that the version of MySQL you are using doesn't have this feature (I think it arrives in MySQL 8.0, if I'm not mistaken).
SELECT id,
num_hyphens,
CAST(SUBSTRING_INDEX(CONCAT(file_name_adj,'-'), '-', 1) AS UNSIGNED) AS sort1,
CAST(CASE WHEN num_hyphens = 0
THEN '0'
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(file_name_adj,'-', 2), '-',-1)
END AS UNSIGNED) AS sort2,
CAST(CASE WHEN num_hyphens <= 1
THEN '0'
ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(file_name_adj,'-', 3), '-',-1)
END AS UNSIGNED) AS sort3,
CAST(CASE WHEN num_hyphens <= 2
THEN '0'
ELSE SUBSTRING_INDEX(file_name_adj, '-', -1)
END AS UNSIGNED) AS sort4,
file_name,
name
FROM (
SELECT id, name, MID(file_name, instr(file_name, ')') + 1) AS file_name_adj, file_name,
LENGTH(file_name) - LENGTH(REPLACE(file_name, '-', '')) AS num_hyphens
FROM filing_code_management
) t1
ORDER BY sort1, sort2, sort3, sort4
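To sanity-check the expected order without the database, the sort fields can be sketched in Python. This is an illustration only: `leading_int` and `file_sort_key` are hypothetical helpers, the sample file names are partly invented, and it assumes a file name has at most four dash-separated parts.

```python
import re

def leading_int(text):
    """Mimic CAST(... AS UNSIGNED): read the leading digits, 0 if there are none."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else 0

def file_sort_key(file_name):
    """Build (sort1, sort2, sort3, sort4) from a file name such as '(R)100-6-2-2'."""
    adjusted = file_name[file_name.find(")") + 1:]    # drop a prefix such as "(R)", if any
    parts = [leading_int(p) for p in adjusted.split("-")][:4]
    parts += [0] * (4 - len(parts))                   # missing parts sort as 0
    return tuple(parts)

names = ["100-10", "100-2", "(R)100-1-1", "(R)100-6-2-2"]
ordered = sorted(names, key=file_sort_key)
print(ordered)
```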

Teradata Masking - Retain all characters at position 1,4,8,12,16 .... in a string and mask remaining characters with 'X'

I have a requirement where I need to mask all but characters in position 1,4,8,12,16.. for a variable length string with 'X'
For example:
Input string - 'John Doe'
Output String - 'JXXn XXe'
SPACE between the two strings must be retained.
Kindly help or reach out for more details if required.

I think maybe an external function would be best here, but if that's too much to bite off, you can get crafty with strtok_split_to_table, xml_agg and regexp_replace to rip the string apart, replace out characters using your criteria, and stitch it back together:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
FROM
(
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
) stringshred
GROUP BY outkey
This won't be fast on a large data set, but it might suffice depending on how much data you have to process.
Breaking this down:
WITH cte AS (SELECT REGEXP_REPLACE('this is a test of this functionality', '(.)', '\1,') AS fullname FROM Sys_Calendar.calendar WHERE calendar_date = CURRENT_DATE)
This CTE is just adding a comma between every character of our incoming string using that regexp_replace function. Your name will come out like J,o,h,n, ,D,o,e. You can ignore the sys_calendar part, I just put that in so it would spit out exactly 1 record for testing.
SELECT
tokennum,
outkey,
CASE WHEN tokennum = 1 OR tokennum mod 4 = 0 OR token = ' ' THEN token ELSE 'X' END AS tokenout
FROM TABLE (strtok_split_to_table(cte.fullname, cte.fullname, ',')
RETURNS (outkey VARCHAR(200), tokennum integer, token VARCHAR(200) CHARACTER SET UNICODE)) AS d
This subquery is the important bit. Here we create a record for every character in your incoming name. strtok_split_to_table is doing the work here, splitting that incoming name by comma (which we added in the CTE).
The CASE expression applies your criteria: it keeps the character at position 1, at every multiple of 4, and any space, and swaps every other character for 'X'.
SELECT
REGEXP_REPLACE(REGEXP_REPLACE((XMLAGG(tokenout ORDER BY tokennum) (VARCHAR(200))), '(.) (.)', '\1\2') , '(.) (.)', '\1\2')
Finally we use XMLAGG to combine the many records back into one string in a single record. Because XMLAGG adds a space in between each character we have to hit it a couple of times with regexp_replace to flip those spaces back to nothing.
So... it's ugly, but it does the job.
The code above spits out:
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
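For comparison, the masking rule itself is short when written procedurally. This Python sketch is not Teradata code; `mask` is a made-up helper that restates the CASE logic (keep position 1, every multiple of 4, and spaces) so the expected output can be checked:

```python
def mask(text):
    """Keep characters at positions 1, 4, 8, 12, ... and every space; mask the rest with 'X'."""
    out = []
    for pos, ch in enumerate(text, start=1):          # positions are 1-based, as in the question
        keep = pos == 1 or pos % 4 == 0 or ch == " "
        out.append(ch if keep else "X")
    return "".join(out)

masked = mask("this is a test of this functionality")
print(masked)
```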

I couldn't think of a solution, but then @JNevill inspired me with his idea to add a comma to each character :-)
SELECT
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2)
,'(\([^ ])', 'X')
,'(\(|\[)')
,'this is a test of this functionality' AS inputString
tXXs XX X XeXX oX XhXX fXXXtXXXaXXXy
The 1st RegExp_Replace starts at the 2nd character (keep the 1st character as-is) and processes groups of (up to) 4 characters adding either a ( (characters #1,#2,#4, to be replaced by X unless it's a space) or [ (character #3, no replacement), which results in :
t(h(i[s( (i(s[ (a( (t[e(s(t( [o(f( (t[h(i(s( [f(u(n(c[t(i(o(n[a(l(i(t[y(
Of course this assumes that neither character exists in your input data; otherwise you have to choose different ones.
The 2nd RegExp_Replace replaces the ( and the following character with X unless it's a space, which results in:
tXX[s( XX[ X( X[eXX( [oX( X[hXX( [fXXX[tXXX[aXXX[y(
Now there are some (& [ left which are removed by the 3rd RegExp_Replace.
As I still consider myself a beginner in regular expressions, there are probably better solutions :-)
Edit:
In older Teradata versions not all parameters were optional, then you might have to add values for those:
RegExp_Replace(
RegExp_Replace(
RegExp_Replace(inputString, '(.)(.)?(.)?(.)?', '(\1(\2[\3(\4', 2, 0, 'c')
,'(\([^ ])', 'X', 1, 0, 'c')
,'(\(|\[)', '', 1, 0, 'c')

SSRS 2008 R2 - all parameters multi value AND optional

I've got a report that has 6 parameters. All parameters need to be optional and 3 have to be multi-value. One of the optional parameters is a dropdown, the rest are manually keyed in text boxes.
The Where clause below works when there are multiple @VendorNum values and one @FullJA value, but fails with multiple @FullJA values regardless of the @VendorNum count.
Parameters:
@VendorNum - keyed manually by user (space delimited) - optional, can be multivalue
@FullJA - keyed manually by user (space delimited) - optional, can be multivalue
@BU - optional, can be multivalue - when @JA is populated, this will auto-populate, if @JA isn't populated it's a dropdown with all selected.
@JA3 - keyed by user - optional, single value
@StartDate and @EndDate - optional single values
select * from some_table
WHERE
/*FULL JA*/
(
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND LEFT(JA, 7) IN (@FullJA)
AND BU IN(@BU)
AND @JA3 IS NULL
)
OR
/*DATE RANGE*/
(
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND LEN(ISNULL(CONVERT(VARCHAR(20), Cleared_When), '0')) >= @ClearedOnly
AND ad.Audit_Publish_Date >= ISNULL(@StartDate, '2015-01-01')
AND ad.Audit_Publish_Date <= ISNULL(@EndDate, '2025-12-31')
AND BU IN (@BU)
AND @FullJA IS NULL
AND @JA3 IS NULL
)
/*BUS UNIT AND JA3*/
OR (
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND BU IN (@BU)
AND ad.Audit_Publish_Date >= ISNULL(@StartDate, '2015-01-01')
AND ad.Audit_Publish_Date <= ISNULL(@EndDate, '2025-12-31')
AND LEFT(JA, 3) = (@JA3)
AND @FullJA IS NULL
)
/*BUS UNIT ONLY*/
OR (
SUBSTRING(VendorNum, PATINDEX('%[^0]%', VendorNum + '.'), LEN(VendorNum)
) IN (@VendorNum)
AND BU IN (@BU)
AND ad.Audit_Publish_Date >= ISNULL(@StartDate, '2015-01-01')
AND ad.Audit_Publish_Date <= ISNULL(@EndDate, '2025-12-31')
AND @JA3 IS NULL
AND @FullJA IS NULL
)
The dataset parameter values for @FullJA and @VendorNum are both
=IIF(InStr(Parameters!FullJA.Value," ")>0,SPLIT(Parameters!FullJA.Value," "),Parameters!FullJA.Value) and all params are set as NOT multivalue, with nulls allowed.
Any help would be greatly appreciated. I've written over 200 reports for this project and this is the only one that is really grinding my gears!
Thanks!

I would approach this by building up some temp tables / table variables, to hold the potentially multi-valued variables, and then joining to those tables. This has the advantage of you being able to insert all possible values, in the case they have omitted the variable. So, you'd split your strings and put them into those tables (something along the lines of this example) if given the variable, and otherwise just do an insert into to populate your temp table / table variable.
For a split function, I prefer something like this:
create FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
SELECT r.value('.','VARCHAR(MAX)') as Item
FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(REPLACE(REPLACE(@s,'& ','&amp; '),'<','&lt;'), @sep, '</r><r>') + '</r></root>') as valxml) x
CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
)
GO
GRANT SELECT
ON OBJECT::[dbo].[Split] TO PUBLIC
AS [dbo];
I would then put those variables into a table using something like this (my separator is a ", "):
select ltrim(rtrim(ppl.Item)) as PersonName
into #gppl
from dbo.Split(', ', @PersonListForCompare) as ppl
You would do something more like:
select ltrim(rtrim(vnd.Item)) as VendorNum
into #vendorNums
from dbo.Split(', ', @VendorNum) as vnd
You would then join to that temp table just like any other table & use it to limit your results that way. In your case, you want to put in all vendors (possibly) if they didn't give you any input. So, you'd do something like:
create table #vendorNums (VendorNum varchar(64)) --I have no idea, here, what this data looks like
if @VendorNum is not null and datalength(@VendorNum) > 0
insert into #vendorNums (VendorNum)
select ltrim(rtrim(vnd.Item))
from dbo.Split(', ', @VendorNum) as vnd
else
insert into #vendorNums (VendorNum)
select VendorNum
from dbo.Vendors
That said, I think that you could use your select from dbo.Split directly as a table in a join, rather than putting it into the temp table. Only problem would be you'd have to be sure you had data in there to split, or else you're going to have a bunch of combinations to get the right match-up of null parameters vs. filled ones.
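The "split if given, otherwise use everything" rule is the core of this approach, so here it is as a small Python sketch. It is not T-SQL: `resolve_filter` and `known_vendors` are hypothetical names, and the space separator follows the question's description of how users key the values.

```python
def resolve_filter(raw_value, all_values, sep=" "):
    """Return the values to filter on: the split input, or every known value if the input is empty."""
    if raw_value is None or raw_value.strip() == "":
        return list(all_values)                        # parameter omitted: allow everything
    return [item.strip() for item in raw_value.split(sep) if item.strip()]

known_vendors = ["1001", "1002", "1003"]               # hypothetical vendor numbers
chosen = resolve_filter("1001 1003", known_vendors)
everything = resolve_filter(None, known_vendors)

print(chosen)
print(everything)
```

With every parameter resolved to a list this way, the query only needs a join or IN test per parameter, rather than one branch for each combination of null and non-null inputs.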

mySQL extract number or string depending on value

Please take a look at a table below:
I need "the last part" of "what I have" to be number if it's all number and string if it contains character.
In Excel I've achieved this with the following function (as shown above):
=IFERROR(VALUE(TEXT(D2;"0"));TEXT(D2;"0"))
However I want to do this in mySQL in order to compute more effectively.
I've "floated" somewhere around CASE with CAST or CONVERT and also the TRIM functions, but I haven't been able to put together anything sensible.
A nice "bonus" would be to extract "the last part" by looking for the last "_" character (so the first "_" from the right of the string), but I have no idea at all how to achieve that.

Use SUBSTRING_INDEX.
Query
SELECT
CASE
WHEN SUBSTRING_INDEX(records.data, '_', -1) > 1 # is int check '00004949' returns 1
THEN SUBSTRING_INDEX(records.data, '_', -1) + 0 # converts '00004949' to 4949
ELSE SUBSTRING_INDEX(records.data, '_', -1)
END
AS word
FROM (
SELECT "TRA_PL_NWL_EMA_NWLY_DAI_000_20170610_IN1_01P002bc" AS DATA
UNION
SELECT "TRA_PL_NWL_EMA_NWLY_DAI_000_2017_0909_JET_00004949" AS DATA
) records
Result
word
----------
01P002bc
4949

Following query will somewhat achieve the task:
SELECT
case
when SUBSTRING_INDEX(value, "_", -1) REGEXP('(^[0-9]+$)')
then Trim(Leading 0 from SUBSTRING_INDEX(value, "_", -1))
else SUBSTRING_INDEX(value, "_", -1)
end as Value
From yourtable;
Hope it helps!
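The rule both answers implement (take the text after the last underscore, then treat it as a number only if it is all digits) can be written as a short Python sketch for checking expected results. `last_part` is a made-up helper name; the sample strings are the ones used in the first answer.

```python
import re

def last_part(value):
    """Return the text after the last '_' as an int if it is all digits, else as a string."""
    tail = value.rsplit("_", 1)[-1]          # everything after the last underscore
    if re.fullmatch(r"[0-9]+", tail):        # same test as the REGEXP '^[0-9]+$'
        return int(tail)                     # int() drops the leading zeros
    return tail

number = last_part("TRA_PL_NWL_EMA_NWLY_DAI_000_2017_0909_JET_00004949")
text = last_part("TRA_PL_NWL_EMA_NWLY_DAI_000_20170610_IN1_01P002bc")
print(number, text)
```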

I hope this is what you want.
At least it does what you have asked for.
SELECT
CASE
WHEN (
CONVERT(
substring(
txt,
LENGTH(txt) - LOCATE('_', REVERSE(txt))+2,
length(txt)
)
, signed integer
)
) = 0
THEN substring(
txt,
LENGTH(txt) - LOCATE('_', REVERSE(txt))+2,
length(txt)
)
ELSE CONVERT(
substring(
txt,
LENGTH(txt) - LOCATE('_', REVERSE(txt))+2,
length(txt)
)
, signed integer
)
END as NUMBER
from test.test
This is my test Table and result of SQL:
txt NUMBER
DA_DA_ADAD_ADAD_ADAD_asd123 asd123
DA_DA_ADAD_ADAD_ADAD_000123 123
DA_DA_ADAD_ADAD_ADAD_444 444
DA_DA_ADAD_ADAD_ADAD_bsd123 bsd123
DA_DA_ADAD_ADAD_ADAD_csd123 csd123
DA_DA_ADAD_ADAD_ADAD_dsd123 dsd123
