SQL - Parse HTML Data in Column - html

I have a column named "Message". This column contains HTML code. I need to parse this HTML in SQL and split it into 5 different columns: "Name", "Surname", "Email", "Telephone" and "Message". Here is the HTML format that I need to parse:
<html>
<body>
<br><br>
<table>
<tr>
<td>NameSurname</td>
<td>kaydi peldi sord</td>
</tr>
<tr>
<td>Email</td>
<td>...@gmail.com</td>
</tr>
<tr>
<td>Telephone</td>
<td>535...5464</td>
</tr>
<tr>
<td colspan=2>Message</td>
</tr>
<tr>
<td colspan=2>Benfica-Fenerbahçe</td>
</tr>
</table>
</body>
</html>
First, split NameSurname into Name and Surname. The rule is to split at the last space (in this sample, it should be "Name: kaydi peldi", "Surname: sord"), then insert the other columns directly. How can I do that? Thanks for your answers!

I'm a year late, it's not pretty, and it's definitely not 100% safe, but this does the job for me on the rare occasions I need to parse HTML. Create this function first.
CREATE FUNCTION dbo.StringBetweenTwoPatterns (@PrePattern varchar(max), @PostPattern varchar(max), @String varchar(max))
RETURNS varchar(max)
AS
BEGIN
    -- PATINDEX needs a wildcard on both sides to match anywhere in the string
    DECLARE @WildPre varchar(max) = '%' + @PrePattern + '%'
    DECLARE @WildPost varchar(max) = '%' + @PostPattern + '%'
    IF PATINDEX(@WildPre, @String) > 0
    BEGIN
        -- Everything after the first occurrence of @PrePattern
        DECLARE @Right varchar(max) = SUBSTRING(@String, PATINDEX(@WildPre, @String) + LEN(@PrePattern), LEN(@String))
        -- Look for @PostPattern only after @PrePattern, so LEFT never gets a negative length
        IF PATINDEX(@WildPost, @Right) > 0
            RETURN LEFT(@Right, PATINDEX(@WildPost, @Right) - 1)
    END
    RETURN NULL
END
GO
When you call this function, you have to keep full formatting and white space in the search strings, so it's going to look like this:
SELECT [NameSurname] = dbo.StringBetweenTwoPatterns('<td>NameSurname</td>
<td>', '</td>', [Message])
FROM dbo.YourTable -- placeholder name: use the table that holds the [Message] column
Splitting Name and Surname is something you should be able to extrapolate from the SUBSTRING, RIGHT, LEFT, and PATINDEX examples above. Or just google some other answers for that.
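For anyone who wants to see the two steps spelled out, here is a minimal sketch in Python rather than T-SQL (the function names are made up for illustration). It mirrors the "string between two patterns" idea and then applies the asker's rule of splitting at the last space. It assumes the cell layout from the question, including the line break between the two td cells.

```python
def string_between(text, pre, post):
    """Return the text between the first `pre` and the next `post`, or None."""
    start = text.find(pre)
    if start == -1:
        return None
    start += len(pre)
    end = text.find(post, start)  # only look for `post` after `pre`
    if end == -1:
        return None
    return text[start:end]


def split_on_last_space(full_name):
    """Split 'a b c' into ('a b', 'c'); a single word gets an empty surname."""
    cleaned = full_name.strip()
    name, sep, surname = cleaned.rpartition(" ")
    if not sep:
        return cleaned, ""
    return name, surname


# Sample fragment copied from the question's HTML
message = "<tr>\n<td>NameSurname</td>\n<td>kaydi peldi sord</td>\n</tr>"
full_name = string_between(message, "<td>NameSurname</td>\n<td>", "</td>")
name, surname = split_on_last_space(full_name)
print(name, "|", surname)  # kaydi peldi | sord
```

In T-SQL the same "last space" position is usually found by combining REVERSE with CHARINDEX, but that part is left to the reader as in the answer above.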

Specify the culture used within the "With Clause" of OpenJSON

I live in Denmark. Here the thousands separator is a dot (.), and we use a comma (,) as the decimal separator.
I know that you can use TRY_PARSE to convert a varchar into a money/float value.
An example:
declare
@JSON varchar(max)=
'
{
    "Data Table":
    [
        {
            "Value" : "27.123,49"
        }
    ]
}
'
select
    TRY_PARSE(Value as money using 'da-dk') "Correct Value"
FROM OpenJson(@JSON, '$."Data Table"')
WITH
(
    "Value" nvarchar(255) N'$."Value"'
)
select
    Value "Wrong Value"
FROM OpenJson(@JSON, '$."Data Table"')
WITH
(
    "Value" money N'$."Value"'
)
This query gives me two results
My question is: can I control the culture in the WITH clause of OPENJSON, so that I get the correct result without having to use TRY_PARSE?
Target: SQL Server 2019
Not directly in OPENJSON(), no. ECMA-404 JSON Data Interchange Syntax specifically defines the decimal point as the U+002E . character - and doesn't provide for cultural allowances - which is why you're having to define culture-specific values as strings in the first place.
The correct way to do it is to use TRY_PARSE or TRY_CONVERT, e.g.:
select try_parse('27.123,49' as money using 'da-DK')
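To illustrate why the value has to travel as a string and be converted afterwards, here is a small Python sketch (not T-SQL; parse_danish_number is a hypothetical helper). It reads the JSON from the question and converts the Danish-formatted text by dropping the dot thousands separators and turning the decimal comma into a point. It assumes the input really is da-DK formatted and does not validate digit grouping.

```python
import json
from decimal import Decimal, InvalidOperation


def parse_danish_number(text):
    """Parse a da-DK style number such as '27.123,49'; return None on failure."""
    # Remove thousands separators first, then turn the decimal comma into a point
    normalized = text.strip().replace(".", "").replace(",", ".")
    try:
        return Decimal(normalized)
    except InvalidOperation:
        return None


payload = '{"Data Table": [{"Value": "27.123,49"}]}'
values = [parse_danish_number(row["Value"]) for row in json.loads(payload)["Data Table"]]
print(values)  # [Decimal('27123.49')]
```

Returning None for unparsable input loosely mirrors what TRY_PARSE does when it returns NULL instead of raising an error.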

SQL Server - transforming a string into JSON Object in routine

I am currently working on a way to get a distribution list for all users registered to our application. We are using a SQL server to store information, however, due to a move to a non-relational DB schema in the near future, most of our data is stored in one field as a JSON string (It's a VARCHAR(max) field that looks like JSON). When we serve this data back to our Java controllers, we convert the String into a JSON Object. As my question would most likely indicate, the list of users is located in this JSON string. While I know I can just do SELECT JSON_DATA FROM MYTABLE to get all entries of this field, convert it in Java, and get the users field that way, I would be essentially returning a TON of data and throwing away 95% of it.
I was wondering if there is a way in SQL Server to parse a JSON string? Essentially what I want to do is with the following table:
<table style="width:100%" border="1">
<tr>
<th>ID</th>
<th>JSON_DATA</th>
</tr>
<tr>
<td>1</td>
<td>{"data":data,"users":["User1", "User2"]}</td>
</tr>
<tr>
<td>2</td>
<td>{"data":data2,"users":["User2", "User3"]}</td>
</tr>
</table>
I want to return from my SQL routine a list of all unique users.
I think this might give you what you need:
SELECT JSON_QUERY([fieldName], '$.users')
Here's a link to check out too: https://learn.microsoft.com/en-us/sql/t-sql/functions/json-query-transact-sql?view=sql-server-2017
Without native JSON support, I'm afraid you're looking at good ol' string parsing. This should get you part of the way: it returns a single string with
"User1", "User2", "User2", "User3"
DECLARE @ThisUserString VARCHAR(255)
, @FullUserString VARCHAR(255) = ''
DECLARE @xmlSTring VARCHAR(MAX) =
'<table style="width:100%" border="1">
<tr>
<th>ID</th>
<th>JSON_DATA</th>
</tr>
<tr>
<td>1</td>
<td>{"data":data,"users":["User1", "User2"]}</td>
</tr>
<tr>
<td>2</td>
<td>{"data":data2,"users":["User2", "User3"]}</td>
</tr>
</table>'
WHILE CHARINDEX('[', @xmlSTring) > 0
BEGIN
    -- Find the next set of users, like ["User1", "User2"]
    SELECT @ThisUserString =
        SUBSTRING(
            @xmlSTring
            , /*start*/ CHARINDEX('[', @xmlSTring)
            , /*length*/ CHARINDEX(']', @xmlSTring) - CHARINDEX('[', @xmlSTring) + 1
        )
    -- Add them to the list of all users, like "User1", "User2"
    SELECT @FullUserString += ', ' +
        SUBSTRING(
            @xmlSTring
            , /*start*/ CHARINDEX('[', @xmlSTring) + 1
            , /*length*/ CHARINDEX(']', @xmlSTring) - CHARINDEX('[', @xmlSTring) - 1
        )
    -- And remove this set from the string so our WHILE loop will end sometime:
    SET @xmlSTring = REPLACE(@xmlSTring, @ThisUserString, '')
END
SET @FullUserString = RIGHT(@FullUserString, LEN(@FullUserString) - 2) -- remove the initial comma
SELECT @FullUserString
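The string-parsing answer stops at a combined list that still contains duplicates. As a rough illustration of the last step, here is a Python sketch (not SQL) that reads the "users" array from each row and keeps the first occurrence of every user. The rows are a hypothetical stand-in for the table in the question. Note that the "data" values are quoted here so that the payload is valid JSON, which the sample in the question is not.

```python
import json

# Hypothetical stand-in for (ID, JSON_DATA) rows; "data" values quoted to be valid JSON
rows = [
    (1, '{"data": "data", "users": ["User1", "User2"]}'),
    (2, '{"data": "data2", "users": ["User2", "User3"]}'),
]


def unique_users(rows):
    """Collect users across all rows, keeping first-seen order and dropping repeats."""
    seen = []
    for _row_id, json_data in rows:
        for user in json.loads(json_data).get("users", []):
            if user not in seen:
                seen.append(user)
    return seen


print(unique_users(rows))  # ['User1', 'User2', 'User3']
```

Doing this in the application still means fetching every JSON_DATA value, which is what the asker wanted to avoid, so treat it only as a description of the target result.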

HTML table output

I've got an XML file that contains this data:
<?xml version="1.0" encoding="iso-8859-1" ?>
<university>
<lecture>
<class>English</class>
<hours>3</hours>
<pupils>30</pupils>
</lecture>
<lecture>
<class>Math</class>
<hours>4</hours>
<pupils>27</pupils>
</lecture>
<lecture>
<class>Science</class>
<hours>2</hours>
<pupils>25</pupils>
</lecture>
</university>
Is it possible to get XQuery code that produces the output below?
<table>
<tr>
<td>English class runs for 3 with 30 pupils</td>
<td>Math class runs for 4 with 27 pupils</td>
<td>Science class runs for 2 with 25 pupils</td>
</tr>
</table>
EDIT: Below is my attempt:
let $classroom := doc("uni.xml")/university/lecture/class
let $hr := doc("uni.xml")/university/lecture/hours
let $pupl := doc("uni.xml")/university/lecture/pupils
<tr>
<table>
for $a in $classroom,
$b in $hr,
$c in $pupl
return <td>{$a} class runs for {$b} with {$c} pupils</td>
</tr>
</table>
I get an error saying I need to return a value, which I already did. If I take out the tr and table tags, it works, but gives me an endless loop result.
Responding to the updated XML, and keeping a bit of your attempted XQuery style, which uses variables, I'd suggest something like this:
<table>
<tr>
{
for $lecture in doc("uni.xml")/university/lecture
let $class := $lecture/class/string()
let $hours := $lecture/hours/string()
let $pupils := $lecture/pupils/string()
return
<td>{$class} class runs for {$hours} with {$pupils} pupils</td>
}
</tr>
</table>
Demo : http://www.xpathtester.com/xquery/33e31b342098712e331285bda3010e0c
Basically, the query loops through the common parent elements that contain all the needed information: the lecture elements. It then uses simple relative XPath/XQuery expressions to extract the data from each lecture and produce the required output.
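The same "one loop over the parent element" idea can be sketched outside XQuery. The Python below is only an illustration (lectures_to_table and the inline UNI_XML string are made up, standing in for uni.xml). It visits each lecture once and reads its children relative to that lecture. That is also why the original attempt exploded: three independent for clauses pair every class with every hours and pupils value, giving 3 x 3 x 3 = 27 cells instead of 3.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's data, standing in for uni.xml
UNI_XML = """<university>
  <lecture><class>English</class><hours>3</hours><pupils>30</pupils></lecture>
  <lecture><class>Math</class><hours>4</hours><pupils>27</pupils></lecture>
  <lecture><class>Science</class><hours>2</hours><pupils>25</pupils></lecture>
</university>"""


def lectures_to_table(xml_text):
    """Build <table><tr><td>...</td>...</tr></table> with one cell per lecture."""
    root = ET.fromstring(xml_text)
    table = ET.Element("table")
    row = ET.SubElement(table, "tr")
    for lecture in root.findall("lecture"):
        cell = ET.SubElement(row, "td")
        # Children are looked up relative to the current lecture only
        cell.text = "{} class runs for {} with {} pupils".format(
            lecture.findtext("class"),
            lecture.findtext("hours"),
            lecture.findtext("pupils"),
        )
    return ET.tostring(table, encoding="unicode")


print(lectures_to_table(UNI_XML))
```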

How to extract all rows with concatenated cells from a table using Xpath?

I have an html table:
<table class="info">
<tbody>
<tr><td class="name">Year</td><td>2011</td></tr>
<tr><td class="name">Storey</td><td>3</td></tr>
<tr><td class="name">Area</td><td>170</td></tr>
<tr><td class="name">Condition</td><td>Renovated</td></tr>
<tr><td class="name">Bathroom</td><td>2</td></tr>
</tbody>
</table>
In this table, the data is organized so that each row contains 2 cells enclosed in <td> tags. The first cell contains the data type, for example the year the house was built. The second cell contains the value itself, here 2011.
I want to extract the data so that each data type is paired with its value, like this:
Year - 2011
Storey - 3
Area - 170
Condition - Renovated
Bathroom - 2
For now I am using Xpath's concatenation function concat. Here is my Xpath expression:
concat(//table[@class="info"]//tr//td[contains(@class, 'name')]/text() , ' - ', //table[@class="info"]//tr//td[not(contains(@class, 'name'))]/text())
This XPath returns this result:
Year - 2011
My table contains 5 rows, but my XPath expression returned only the 1st row with concatenated cells.
However, the 2 XPath expressions that I pass to the concat function each return all rows when evaluated individually.
These are the 2 XPath expressions:
//table[@class="info"]//tr//td[contains(@class, 'name')]/text()
and
//table[@class="info"]//tr//td[not(contains(@class, 'name'))]/text()
Both of these expressions return all rows with the required information. When I pass these two expressions to the concat function, it returns only the 1st row.
How can I get all rows with concatenated cells using XPath? I guess it is not possible using XPath only. Do I have to do it with the help of a programming language such as PHP, or can a newer version of XPath or some more sophisticated expression help me in this case?
If you use Java:
1. Get a DOM document.
2. Loop over the rows by index:
int i = 1;
while (true)
{
    // Stop as soon as there is no i-th row
    if (xpath.compile("//tr[" + i + "]").evaluate(document, XPathConstants.NODE) == null) break;
    XPathExpression expr = xpath.compile("concat(//tr[" + i + "]/td[@class='name']/text(), ' - ', //tr[" + i + "]/td[not(@class='name')]/text())");
    String resX = (String) expr.evaluate(document, XPathConstants.STRING);
    System.out.println(resX);
    i++;
}
Another option:
Get every tr:
String expression = "//table[@class=\"info\"]//tr";
XPathExpression expr = xpath.compile(expression);
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
and then, inside a loop over those rows, evaluate a relative expression against each one:
for (int temp1 = 0; temp1 < nodes.getLength(); temp1++) {
    Node nodeSegment = nodes.item(temp1);
    if (nodeSegment.getNodeType() == Node.ELEMENT_NODE) {
        ...
        // eElement is presumably the current row (nodeSegment) cast to Element in the elided code
        // Relative paths (no leading //) are resolved against that row only
        expr = xpath.compile("concat(td[@class='name']/text(), ' - ', td[not(@class='name')]/text())");
        String resX = (String) expr.evaluate(eElement, XPathConstants.STRING);
        System.out.println(resX);
    }
}
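Some background on why the single concat call gives only one line: in XPath 1.0, concat converts each argument to a string, and the string value of a node-set is the string value of its first node in document order, so only "Year" and "2011" survive. Looping over the rows and building the pair per row avoids that. Below is a minimal Python sketch of that per-row approach (rows_as_pairs is a made-up name), assuming the table markup is well-formed enough for an XML parser.

```python
import xml.etree.ElementTree as ET

# The table from the question, assumed to be well-formed markup
TABLE_HTML = """<table class="info">
<tbody>
<tr><td class="name">Year</td><td>2011</td></tr>
<tr><td class="name">Storey</td><td>3</td></tr>
<tr><td class="name">Area</td><td>170</td></tr>
<tr><td class="name">Condition</td><td>Renovated</td></tr>
<tr><td class="name">Bathroom</td><td>2</td></tr>
</tbody>
</table>"""


def rows_as_pairs(markup):
    """Return one 'label - value' string per table row."""
    root = ET.fromstring(markup)
    lines = []
    for tr in root.iter("tr"):
        # Both lookups are relative to the current row
        label = tr.find("td[@class='name']")
        value = next((td for td in tr.findall("td") if td.get("class") != "name"), None)
        if label is not None and value is not None:
            lines.append("{} - {}".format(label.text, value.text))
    return lines


for line in rows_as_pairs(TABLE_HTML):
    print(line)
```

Real-world HTML is often not well-formed XML, in which case a dedicated HTML parser would be needed before this kind of loop.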

"0" to date- ssis, derived column

I have a flat file whose field comes in as a string (fixed delimited) and contains data like this: "20130505" or "0" (only one digit). My destination has to be a date. I split the value with SUBSTRING in a derived column and convert it to a date in a format like this
(2013-04-05):
ISNULL(FIELD_1519_Out) ? (DT_DBDATE)(SUBSTRING(FIELD_1519,1,4) + "-" +
SUBSTRING(FIELD_1519,5,2) + "-" + SUBSTRING(FIELD_1519,7,2)) : (DT_DBDATE)
(SUBSTRING(FIELD_1519_Out,1,4) + "-" + SUBSTRING(FIELD_1519_Out,5,2) + "-" +
SUBSTRING(FIELD_1519_Out,7,2)).
My question is: how can I transform "0" to the required length in order to convert it to a date? Since there is only one digit ("0"), I cannot continue with the same SUBSTRING logic.
Thanks
See the example below, this may help you. My source data is as shown below. The column [yourDate] is VARCHAR(8).
I developed an SSIS package as below. The output of the data viewer is what the derived column gives.
Derived column content:
(DT_STR,8,1252)RIGHT(("0000000" + yourDate),8)
You can use such a derived column in your DFT. I am still not sure whether this is what you are looking for.
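One thing to keep in mind with the padding approach: "00000000" has the right length but is still not a valid calendar date, so the later cast to a date would presumably still need special handling. As a hedged illustration of the alternative, here is a Python sketch (not an SSIS expression; parse_yyyymmdd is a made-up name) that treats "0" and anything else that is not a valid 8-digit date as a missing value. Whether NULL is acceptable in the destination column is an assumption.

```python
from datetime import date, datetime


def parse_yyyymmdd(raw):
    """Return a date for 'YYYYMMDD' text, or None for '0', blanks, or invalid values."""
    text = raw.strip()
    # Anything that is not exactly 8 digits (such as the placeholder '0') is treated as missing
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Right length, but not a real calendar date (for example '00000000')
        return None


for raw in ["20130505", "0", "00000000"]:
    print(repr(raw), "->", parse_yyyymmdd(raw))
```

In the derived column itself, the equivalent would be a conditional that checks the length of the field before doing the SUBSTRING conversion and yields a NULL date otherwise.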