A vendor writes user defined data into a a single column as XML, I need to write an SQL query (2008 / 2012 / 2014) that pulls data from that column for a 3rd party application, here's an example of what is in the column
<udf><udf_date_ppe>15/12/2019</udf_date_ppe><udf_text_ppn>300965994</udf_text_ppn><udf_date_ved>8/12/2016</udf_date_ved><udf_text_vtno>417 - Working holiday</udf_text_vtno><udf_text_ppi>Taiwan</udf_text_ppi></udf>
The problem is I need to grab all the actual data not the XML and the XML isn't stored in the same order meaning I have to dynamically figure out the lengths to trim left and right. For example, I want the date only inbetween this XML
<udf_date_ppe>15/12/2019</udf_date_ppe>
but I don't know how many characters are before it. Once I figure out how to do one I can replicate for the other fields, this is only one user defined filed but at least the XML isn't going to change. I only have view access to the server.
Bit of a pain I know but any help is appreciated.
If you just want to grab the 10 characters appearing inside the <udf_date_ppe> tags, you can use SQL Server's string functions and the following query:
SELECT SUBSTRING(col, CHARINDEX('<udf_date_ppe>', col) + 14, 10)
FROM yourTable
This assumes that there is only a single <udf_date_ppe> tag in the column.
Related
I want to select only the area code from a list of column entries populated by phone numbers. This is what I have:
SELECT LEFT(phone, 3) AS areacode, COUNT(phone) AS count
FROM registration
GROUP BY areacode;
The problem is, the entries aren't consistent. So some phone numbers start as +123-456-7899, and others with (123)-456-7899, and others with no symbol at the beginning.
So my question is: is there a way that I can ensure the SELECT LEFT starts at the first integer?
Thanks!
There are somethings that SQL is just not meant for. This is one. I would select the phone number into a string, and do some pattern matching in your programming language of choice to find the area code.
-OR-
Change your table such that area code is a different column.
Two options (neither of which being SQL):
Select all phone numbers and use a programming language of your choice to programatically strip out the unnecessary characters.
Clean the input to strip out all unnecessary characters prior to inserting them into the database
SQL is not the best way to do this, rather, SQL + programming
There actually is a way to do this in SQL that was intentionally designed for this exact purpose.
SELECT SUBSTRING(office_phone_number, 1, 3) FROM contact;
Of course, this depends on how the number is stored in the table. If parenthesis are present, your starting position would be off.
Here is more information:
MySQL substring function
I have a table with a decimal column with a lenght = 9 and decimals = 2.
If I put a value of 21.59 (for example) it works ok.
If I put 52.00 it writes only 52. I need to keep 52.00 instead.
Master question: Can the database store the value this way? Instead of
using format/cast in select to retrieve the value...
As noted bellow, this make sense:
"You shouldn't worry about display formatting issues at the database
level but at the ... display level"
Use the FORMAT function:
select format(mycolumn, 2) from mytable;
This also has the effect of adding thousand's separator into the number, so you would get output like 123,456.70. There are workarounds if this doesn't work for you.
Given that MySQL doesn't have the world's best facilities for formatting numbers, display issues like this are usually handled in client code.
I ran into a problem with SQL Server Integration Services 2012's new string function in the Expression Editor called TOKEN().
This is supposed to help you parse a delimited record. If the record comes out of a flat file, you can do this with the Flat File Source. In this case, I am dealing with old delimited import records that were stored as strings in a database VARCHAR field. Now they need to be extracted, massaged, and re-exported as delimited strings. For example:
1^Apple^0001^01/01/2010^Anteater^A1
2^Banana^0002^03/15/2010^Bear^B2
3^Cranberry^0003^4/15/2010^Crow^C3
If these strings are in a column called OldImportRecord, the delimiter is a caret (as shown), and we wish to put the fifth field into a Derived Column, we would use an expression like:
TOKEN(OldImportRecord,"^",5)
This returns Anteater, Bear, Crow, etc. In fact, we can create Derived Columns for each of the fields in this record (note that the index is one-based), change them as needed, and then build another delimited record for export.
Here's the problem. What if some of our data includes some empty strings (or Nulls rendered as empty strings)?
4^^0004^6/15/2010^Duck^D4
The TOKEN() fails to count the adjacent column delimiters, which throws off the column count. Now it only sees five columns instead of six columns. Our TOKEN(OldImportRecord,"^",5) returns "D4" instead of the intended "Duck". When we extract the fourth column, we wind up trying to put "Duck" into a Date column, and all sorts of fun ensues.
Here's a partial workaround:
TOKEN(REPLACE(OldImportRecord,"^^","^ ^"),"^",5)
Notice this misses every second delimiter pair, so it will fail for a string like "5^^^^Emu^E5", which looks like"5^ ^^ ^Emu^E5" after the REPLACE(). The column count is still wrong.
So here's my full workaround. This includes two nested REPLACE statements(), an RTRIM() to remove the superfluous spaces, and a DT_STR cast because I would like to keep the result in VARCHAR:
(DT_STR,255,1252)RTRIM(TOKEN(REPLACE(REPLACE(OldImportRecord,"^^","^ ^"),"^^","^ ^"),"^",5))
I am posting this for information, since others may also run into this problem.
Does anyone have a better workaround, or even a real solution?
Reason for the issue:
TOKEN method in SSIS uses the implementation of strtok function in C++. I gathered this information while reading the book Microsoft® SQL Server® 2012 Integration Services. It is mentioned as note on page 113 (I like this book! Lots of nice information.).
I searched for the implementation of strtok function and I found the following links.
INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.
The answers to the following SO questions point out that strtok function is designed to ignore consecutive delimiters.
Need to know when no data appears between two token separators using strtok()
strtok_s behaviour with consecutive delimiters
I think that the TOKEN and TOKENCOUNT functions are working as per design but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
Original Post - Above section is an update:
I created a simple package in SSIS 2012 based on your data inputs. As you had described in your question, the TOKEN function does not behave as intended. I agree with you that the function doesn't seem to work. This post is not an answer to your original issue.
Here is an alternative way to write the expression in a relatively simpler fashion. This will only work if the last segment in your input record will always have a value (say A1, B2, C3 etc.).
Expression can be rewritten as:
This statement will take the input record as the parameter, the delimiter caret (^) as the second parameter. The third parameter calculates the total number segments in the records when split by the delimiter. If you have data in the last segment, you are guaranteed to have two segments. You can then subtract 1 to fetch the penultimate segment.
(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)
I created a simple package with data flow task. OLE DB source retrieves the data and the derived transformation parses and splits the data as per the screenshot below. The output is then inserted into the destination table. You can see the source and destination tables in the last screenshot. Destination table has two columns. The first column stores the penultimate segment data and the segments count based on the delimiter (which again isn't correct). You can notice that the last record didn't fetch the correct results. If the last record didn't have the value 8, then the above expression will fail because the expression will evaluate to zero index.
Hope that helps to simplify your expression.
If you don't hear from anyone else, I would recommend logging this issue in Microsoft Connect website.
Create table and populate scripts:
CREATE TABLE [dbo].[SourceTable](
[OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DestinationTable](
[NewImportRecord] [varchar](50) NOT NULL,
[CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.SourceTable (OldImportRecord) VALUES
('1^Apple^0001^01/01/2010^Anteater^A1'),
('2^Banana^0002^03/15/2010^Bear^B2'),
('3^Cranberry^0003^4/15/2010^Crow^C3'),
('4^^0004^6/15/2010^Duck^D4'),
('5^^^^Emu^E5'),
('6^^^^Geese^F6'),
('^^^^Pheasant^G7'),
('8^^^^Sparrow^');
GO
Derived column transformation inside data flow task:
Data in source and destination tables:
Not only does TOKEN skip adjacent delimiters, it also skips leading and trailing delimiters as well. So, using your example, if you had a field "good" field that looks like this:
1^Apple^0001^01/01/2010^Anteater^A1
Followed by one with adjacent and leading delimiters like this:
^^^0004^6/15/2010^Duck^
TOKENCOUNT would only find two delimiters and you'd end up with 0004 assigned to Token1, 6/15/2010 for Token2, and Duck for Token3.
I used a different kind of replace. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or training, I used replace to surround the delimiters with characters I absolutely wouldn't find in my text. The following Expression works well for me. It's wordy, but it is what it is.
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")
Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.
I am teaching a guy to polish matura and they have to use Access sometimes.
I have a table with columns pyt_1, pyt_5. I would like to display only these rows in which these columns are equal.
In SQL it looks like that:
SELECT pyt_1, pyt_5
FROM Ankiety
WHERE pyt_1=pyt_5;
but he has to know how to do it in query design
I chose the pyt_1, pyt_5 column and added criteria pyt_1 = pyt_5 but ACCESS automatically converts it into string, how to avoid that?
He converted my SQL into query design but it's weird.
Your problem is that you write = pyt_5 into the criteria field for the column pyt_1, and Access automatically converts it into = "pyt_5", correct?
If yes, you need to put pyt_5 in braces ([ ]).
This makes the Access query editor recognize it as a column name.
Here is a screenshot (German Access version with German labels, but you get the idea):
I process data from a legacy system with SSIS before importing data into a SQL Server 2008 db.
A currency field is so formatted:
000000xxx.xx
I need to remove the leading zeros. Note that the actual currency value has not a fixed number of digit. So, for instance, it could be xxxxx.xx or x.xx or xxxxxxxxxxx.xx
I found this answer, but the derived column tool display only limited string functions, like the Access formula wizard.
So my question:
how may I use PATINDEX in the DTS derived column tool
otherwise, how may I remove leading zeros from input column
how are you querying the data? I suppose you are using direct table access on your OleDBSource component, right? I do not advise that because a tiny change on the table may break your package.
I would right a view containing the select from your source table and on this select I would use the PATINDEX function and inside the package, I would select the view instead of the table.
This is a good approach because if you need to do a minor change, you can change the view instead of the package
If you dont want to write a view, fine, instead of selecting "data access mode" = "table or view", select "sql command" and write your sql directly on the package.
Are you not able to cast the data into a numeric equivalent type in the source system? That should be a quick way to drop the leading zeros. SELECT CAST(myCurrency AS decimal(18,2)) AS leadingZerosDropped FROM myTable
Otherwise, the dead simple SSIS way to do it is to use a Data Conversion task and cast it to a numeric type (DT_CY/Currency or DT_Numeric/Numeric worked just fine)
Source query
SELECT '000000111.11' AS stringCurrency
UNION ALL SELECT '0.22'
UNION ALL SELECT '03.33'
UNION ALL SELECT '004.44'
UNION ALL SELECT '0005.55'
UNION ALL SELECT '00000000000000000006.66'
Data Conversion transformation
I created a new column, currencyCurrency which was the stringCurrency with a data type of currency [DT_CY] applied.
Results
stringCurrency currencyCurrency
000000111.11 111.11
0.22 0.22
03.33 3.33
004.44 4.44
0005.55 5.55
00000000000000000006.66 6.66
I fully support not using table access mode and in general I'm in favor of pushing work onto the source system but this seems like a low effort if the source system doesn't allow for conversion or if you have to resort to mucking about with strings.