Byte length of a string in xslt - html

Is there any XSLT function to retrieve the byte length of a string?
For example: i ♥ u
Character length obtained by string-length = 5
Byte length which I need = 7 bytes.

Assuming there is support for the EXPath binary module then you can use bin:length(bin:encode-string('i ♥ u')), as in
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
xmlns:bin="http://expath.org/ns/binary">
<xsl:template name="main" match="/">
<xsl:value-of select="for $enc in ('UTF-8', 'UTF-16') return bin:length(bin:encode-string('i ♥ u', $enc))"/>
</xsl:template>
</xsl:transform>

You could also play some tricks with iri-to-uri().
Try this:
Apply iri-to-uri() to the string
Convert any %xx sequences in the result to a single ASCII character using the replace() function
The length of the resulting string is the number of bytes in the UTF-8 representation of the original string.
For example string-length(replace(iri-to-uri('§'), '%..', '%')) => 2
Also tested on your example.
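For readers who want to check the trick outside XSLT, here is a minimal Python sketch of the same idea. It assumes that `urllib.parse.quote` with `safe=""` can stand in for iri-to-uri(): it escapes more characters than iri-to-uri() does, but it still emits exactly one %xx per UTF-8 byte. The function name is made up for this illustration.

```python
import re
from urllib.parse import quote


def utf8_length_via_percent_encoding(text: str) -> int:
    """Count UTF-8 bytes by percent-encoding and collapsing each %xx."""
    # Every byte that is not an unreserved ASCII character becomes %xx.
    escaped = quote(text, safe="")
    # Collapse each three-character %xx sequence to a single character.
    collapsed = re.sub(r"%..", "%", escaped)
    # One character now corresponds to one UTF-8 byte of the input.
    return len(collapsed)


print(utf8_length_via_percent_encoding("i ♥ u"))
print(utf8_length_via_percent_encoding("§"))
```

The result can be cross-checked against `len(text.encode("utf-8"))`.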

And here's another approach (again assuming UTF-8 encoding):
sum(for $c in string-to-codepoints($in)
return (1 + number($c>127) + number($c>2047) + number($c>65535)))
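The codepoint formula above can be sanity-checked with a short Python sketch. The function names are hypothetical; the second helper simply encodes the string and counts bytes, which gives a reference value to compare against.

```python
def utf8_length_by_formula(text: str) -> int:
    """Apply the same thresholds as the XPath expression: 127, 2047, 65535."""
    return sum(
        1 + (cp > 127) + (cp > 2047) + (cp > 65535)
        for cp in map(ord, text)
    )


def encoded_length(text: str, encoding: str = "utf-8") -> int:
    """Reference value: encode the string and count the resulting bytes."""
    return len(text.encode(encoding))


sample = "i ♥ u"
print(utf8_length_by_formula(sample), encoded_length(sample))
# UTF-16 without a byte order mark, for comparison with the EXPath answer.
print(encoded_length(sample, "utf-16-le"))
```

Each comparison in the formula adds one byte once the codepoint crosses the boundary of the next longer UTF-8 sequence.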

How to make a single-quoted string act like a double-quoted string in Ruby?

I have a file that contains HTML code in which the tags are encoded like this:
\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
The decoded HTML should be:
<div data-name="region-name" class="main-id">UK</div>
In Ruby, I used the cgi library's unescapeHTML, but it does not work: when it reads the content, it does not recognize the encoded tags. Here is an example:
require 'cgi'
single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
double_quoted_string = "\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e"
puts 'unescape single_quoted_string ' + CGI.unescapeHTML(single_quoted_string)
puts 'unescape double_quoted_string ' + CGI.unescapeHTML(double_quoted_string)
The output of the previous code is:
unescape single_quoted_string \x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
unescape double_quoted_string <div data-name="region-name" class="main-id">UK</div>
My question is, how can I make the single_quoted_string act as if its content is double-quoted to make the function understand the encoded tags?
Thanks
Ruby's parser allows certain escape sequences in string literals.
The double-quoted string literal "\x3c" is recognized as containing a hexadecimal pattern \xnn which represents the single character <. (0x3C in ASCII)
The single-quoted string literal '\x3c' however is treated literally, i.e. it represents four characters: \, x, 3, and c.
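Python draws a similar line between ordinary and raw string literals, which may help illustrate the distinction. This is an analogy only, not Ruby code, and the variable names are made up.

```python
# An ordinary literal: the parser turns \x3c into one character.
escaped = "\x3c"
# A raw literal: the backslash sequence is kept as four characters.
literal = r"\x3c"

print(len(escaped), escaped)
print(len(literal), list(literal))
```

In both languages the conversion happens when the source code is parsed, so a string read from a file at run time never goes through it.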
how can I make the single_quoted_string act as if its content is double-quoted
You can't. In order to turn these four characters into < you have to parse the string yourself:
str = '\x3c'
str[2, 2] #=> "3c" take hex part
str[2, 2].hex #=> 60 convert to number
str[2, 2].hex.chr #=> "<" convert to character
You can apply this to gsub:
str = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
str.gsub(/\\x\h{2}/) { |m| m[2, 2].hex.chr }
#=> "<div data-name=\"region-name\" class=\"main-id\">UK</div>"
/\\x\h{2}/ matches a literal backslash (\\) followed by x and two ({2}) hex characters (\h).
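For comparison, the same parse-it-yourself approach can be sketched in Python. The function and constant names are hypothetical; the regex mirrors the Ruby one (a literal backslash, x, then two hex digits).

```python
import re

# A literal backslash, "x", and exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace each literal \\xNN sequence with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = r"\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e"
print(decode_hex_escapes(raw))
```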
Just for reference, a CGI encoded string would look like this:
str = "<div data-name=\"region-name\" class=\"main-id\">UK</div>"
CGI.escapeHTML(str)
#=> "&lt;div data-name=&quot;region-name&quot; class=&quot;main-id&quot;&gt;UK&lt;/div&gt;"
It uses &...; style character references.
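As a point of comparison, Python's standard `html` module behaves the same way: it works with &...; references and leaves literal \xNN text alone. This is a sketch in Python, not the Ruby CGI API.

```python
import html

markup = '<div data-name="region-name" class="main-id">UK</div>'

# Escaping produces &lt; &gt; &quot; style references.
escaped = html.escape(markup)
print(escaped)

# Unescaping reverses that.
round_trip = html.unescape(escaped)

# A literal backslash-x sequence is not an HTML reference, so it is untouched.
untouched = html.unescape(r"\x3cdiv\x3e")
print(untouched)
```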
Your problem has nothing to do with HTML: \x3c represents the character with hex code 3c in the ASCII table.
Double-quoted strings look for these patterns and convert them to the corresponding characters; single-quoted strings keep them as literal text.
You can check for yourself that CGI is not doing anything.
CGI.unescapeHTML(double_quoted_string) == double_quoted_string
The easiest way I know to solve your problem is through gsub
def convert(str)
str.gsub(/\\x(\h\h)/) do
[Regexp.last_match(1)].pack("H*")
end
end
single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
puts convert(single_quoted_string)
What convert does is take every hex-escaped pair of digits and pack it into the corresponding character.

Customize JSON created by CL_SXML_STRING_WRITER

I create JSON like this to extract any table (name "randomly" decided at runtime, its name is in variable iv_table_name):
FIELD-SYMBOLS <itab> TYPE STANDARD TABLE.
DATA ref_itab TYPE REF TO data.
DATA(iv_table_name) = 'SCARR'.
CREATE DATA ref_itab TYPE STANDARD TABLE OF (iv_table_name).
ASSIGN ref_itab->* TO <itab>.
SELECT *
INTO TABLE <itab>
FROM (iv_table_name).
DATA results_json TYPE TABLE OF string.
DATA sub_json TYPE string.
DATA(lo_json_writer) = cl_sxml_string_writer=>create( type = if_sxml=>co_xt_json ).
CALL TRANSFORMATION id
SOURCE result = <itab>
RESULT XML lo_json_writer.
cl_abap_conv_in_ce=>create( )->convert(
EXPORTING
input = lo_json_writer->get_output( )
IMPORTING
data = sub_json ).
The result variable sub_json looks like this:
{"RESULT":
[
{"MANDT":"220","AUFNR":"0000012", ...},
{"MANDT":"220","AUFNR":"0000013", ...},
...
]
}
Is there a way to avoid the surrounding dictionary and get the result like this?
[
{"MANDT":"220","AUFNR":"0000012", ...},
{"MANDT":"220","AUFNR":"0000013", ...},
...
]
Background:
I used this:
sub_json = /ui2/cl_json=>serialize( data = <lt_result> pretty_name = /ui2/cl_json=>pretty_mode-low_case ).
But the performance of /ui2/cl_json=>serialize( ) is not good.
If you really want to use it just as a tool for extracting table records, then you could write your own ID transformation in STRANS. It could look like this; let us name it Z_JSON_TABLE_CONTENTS (create it with type XSLT):
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:sap="http://www.sap.com/sapxsl"
>
<xsl:output method="text" encoding="UTF-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="RESULT">
[
<xsl:for-each select="*">
{
<xsl:for-each select="*">
"<xsl:value-of select="local-name()" />": "<xsl:value-of select="text()" />"<xsl:if test="position() != last()">,</xsl:if>
</xsl:for-each>
}<xsl:if test="position() != last()">,</xsl:if>
</xsl:for-each>
]
</xsl:template>
</xsl:transform>
Then you could use it like this:
REPORT ZZZ.
FIELD-SYMBOLS <itab> TYPE STANDARD TABLE.
DATA ref_itab TYPE REF TO data.
DATA(iv_table_name) = 'SCARR'.
CREATE DATA ref_itab TYPE STANDARD TABLE OF (iv_table_name).
ASSIGN ref_itab->* TO <itab>.
SELECT *
INTO TABLE <itab>
FROM (iv_table_name).
DATA results_json TYPE TABLE OF string.
DATA sub_json TYPE string.
DATA g_string TYPE string.
DATA(g_document) = cl_ixml=>create( )->create_document( ).
DATA(g_ref_stream_factory) = cl_ixml=>create( )->create_stream_factory( ).
DATA(g_ostream) = g_ref_stream_factory->create_ostream_cstring( g_string ).
CALL TRANSFORMATION Z_JSON_TABLE_CONTENTS
SOURCE result = <itab>
RESULT XML g_ostream.
DATA(g_json_parser) = new /ui5/cl_json_parser( ).
g_json_parser->parse( g_string ).
I've found no answer as to whether it's possible to omit the initial "RESULT" tag in full sXML, but my opinion is NO.
Now, here is a solution that follows the KISS principle:
REPLACE ALL OCCURRENCES OF REGEX '^\{"RESULT":|\}$' IN sub_json WITH ``.
There's also this alternative form (slightly slower):
sub_json = replace( val = sub_json regex = '^\{"RESULT":|\}$' with = `` occ = 0 ).
ADDENDUM about performance:
I measured that for a string of 880K characters, the following code with the exact number of positions to remove (10 leading characters and 1 trailing character) is 6 times faster than regex (could vary based on version of ABAP kernel), but maybe it won't be noticeable compared to the rest of the program:
SHIFT sub_json LEFT BY 10 PLACES CIRCULAR.
REPLACE SECTION OFFSET strlen( sub_json ) - 11 OF sub_json WITH ``.
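To make the string manipulation above easier to follow, here is a small Python sketch of the same two ideas (regex and fixed offsets), plus a parse-and-reserialize variant for reference. The sample JSON is a made-up, shortened version of the output shown in the question, and the variable names are hypothetical.

```python
import json
import re

# Hypothetical sample, shortened from the question's output.
wrapped = '{"RESULT":[{"MANDT":"220","AUFNR":"0000012"},{"MANDT":"220","AUFNR":"0000013"}]}'

# Regex variant: drop the leading {"RESULT": and the final closing brace.
by_regex = re.sub(r'^\{"RESULT":|\}$', "", wrapped)

# Fixed-offset variant: 10 leading characters and 1 trailing character.
prefix = '{"RESULT":'
by_slicing = wrapped[len(prefix):-1]

# Reference variant: parse, take the inner list, serialize again.
by_parsing = json.dumps(json.loads(wrapped)["RESULT"], separators=(",", ":"))

print(by_regex)
```

The offset variant avoids scanning the string with a regex engine, which is consistent with the measurement above, but it depends on the wrapper always being exactly `{"RESULT":` and `}`.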
Just a bit of manual work and voila!
DATA(writer) = CAST if_sxml_writer( cl_sxml_string_writer=>create( type = if_sxml=>co_xt_json ) ).
DATA(components) =
CAST cl_abap_structdescr( cl_abap_typedescr=>describe_by_name( iv_table_name ) )->components.
writer->open_element( name = 'object' ).
LOOP AT <itab> ASSIGNING FIELD-SYMBOL(<line>).
LOOP AT components ASSIGNING FIELD-SYMBOL(<fs_comp>).
ASSIGN COMPONENT <fs_comp>-name OF STRUCTURE <line> TO FIELD-SYMBOL(<fs_val>).
writer->open_element( name = 'str' ).
writer->write_attribute( name = 'name' value = CONV string( <fs_comp>-name ) ).
writer->write_value( CONV string( <fs_val> ) ).
writer->close_element( ).
ENDLOOP.
ENDLOOP.
writer->close_element( ).
DATA(xml_json) = CAST cl_sxml_string_writer( writer )->get_output( ).
sub_json = cl_abap_codepage=>convert_from( source = xml_json codepage = `UTF-8` ).
This produces no surrounding list and no dictionary. If you want each line in a separate dictionary, it is easily adjustable.
If you use the ID call transformation, then whatever node you pass to the transformation is added by default. We cannot skip this, but you can remove it in one of the following ways.
Replace: use a regex, or a literal match with a REPLACE FIRST OCCURRENCE statement, and then remove the last closing brace }, the way you did.
Find: you can simply use the statement below:
FIND REGEX '(\[.*\])' in sub_json SUBMATCHES sub_json.

Powershell - xml

I have an input XML file which contains normal HTML names for various characters, e.g. Double Quote = &quot; etc.
<Notes>Double Quote &quot; Single Quote &apos; Ampersand &amp;</Notes>
Before
<?xml version="1.0" encoding="UTF-8"?>
<OrganisationUnits>
<OrganisationUnitsRow num="8">
<OrganisationId>ACME24/7HOME</OrganisationId>
<OrganisationName>ACME LTD</OrganisationName>
<Notes>Double Quote &quot; Single Quote &apos; Ampersand &amp; </Notes>
<Sector>P</Sector>
<SectorDesc>Private Private &amp; Voluntary</SectorDesc>
</OrganisationUnitsRow>
</OrganisationUnits>
After
<?xml version="1.0" encoding="UTF-8"?>
<OrganisationUnits>
<OrganisationUnitsRow num="8">
<OrganisationId>ACME24/7HOME</OrganisationId>
<OrganisationName>ACME LTD</OrganisationName>
<Notes>Double Quote " Single Quote ' Ampersand &amp;</Notes>
<Sector>P</Sector>
<SectorDesc>Private Private &amp; Voluntary</SectorDesc>
</OrganisationUnitsRow>
</OrganisationUnits>
I am treating the file as XML and it gets processed OK, nothing very fancy.
$xml = [xml](Get-Content $path\$File)
foreach ($CMCAddressesRow in $xml.OrganisationUnits.OrganisationUnitsRow) {
blah
blah
}
$xml.Save("$path\$File")
When the output is saved, all the HTML codes like &quot; get replaced by ".
How can I retain the original HTML &quot; codes? And, more importantly, why is this happening?
What you're referring to is called "character entities". PowerShell converts them on import, so you can work with the actual characters these entities represent, and converts on export only what must be encoded in the XML file. Quotation characters don't need to be encoded in a node value, so they're not being encoded on export.
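The same round-trip behaviour can be observed with other XML parsers. The sketch below uses Python's `xml.etree.ElementTree` rather than PowerShell's XML classes, so it is only an analogy; the sample element is taken from the question.

```python
import xml.etree.ElementTree as ET

source = "<Notes>Double Quote &quot; Single Quote &apos; Ampersand &amp;</Notes>"

# On parse, the entities are resolved to the characters they stand for.
node = ET.fromstring(source)
parsed_text = node.text
print(parsed_text)

# On serialization, only what must be escaped in element text is escaped.
saved = ET.tostring(node, encoding="unicode")
print(saved)
```

The quotes come back as plain characters because they are legal in element text, while the ampersand is written as &amp; again because a bare & would not be well-formed.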

how to escape xml space in service?

Hi, I am working with XSLT, and I call a service
like this:
<xsl:for-each select="ext">
<config type="2" liveserver="XXX.com" localserver="XXX.com" httpuri="/myservices/jsonrequesthomenew?companyid=homepage&amp;outputtype=xml" params="" readtimeout="10000"/>
</xsl:for-each>
If I check this live URL, it shows this result:
<?xml version="1.0" encoding="UTF-8"?>
<indexes>
<data>
<sensex>
<CloseIndexValue>25719.58</CloseIndexValue>
<trend>equal</trend>
<premarket>false</premarket>
<DateTime>03:53 PM | 10 Sep 2015</DateTime>
<CurrentIndexValue>25622.17</CurrentIndexValue>
<Segment>BSE</Segment>
<OpenIndexValue>25522.96</OpenIndexValue>
<IndexName>SENSEX</IndexName>
<PercentChange>-0.38</PercentChange>
<NetChange>-97.41</NetChange>
</sensex>
<nifty>
<CloseIndexValue>7818.60</CloseIndexValue>
<trend>equal</trend>
<premarket>false</premarket>
<DateTime>03:53 PM | 10 Sep 2015</DateTime>
<CurrentIndexValue>7788.10</CurrentIndexValue>
<Segment>NSE</Segment>
<OpenIndexValue>7729.05</OpenIndexValue>
<IndexName>CNX NIFTY</IndexName>
<PercentChange>-0.39</PercentChange>
<NetChange>-30.50</NetChange>
</nifty>
<USD>
<DateTime>2015-09-10 15:48:06.0</DateTime>
<netChange>0.05</netChange>
<percentChange>0.08</percentChange>
<name>USD/INR</name>
<bidprice>66.47</bidprice>
</USD>
<silver>
<ClosePrice>35294.00</ClosePrice>
<trend>negative</trend>
<OpenPrice>35391.00</OpenPrice>
<ExpiryDate>2015-12-04</ExpiryDate>
<SpotSymbol>SSILVERAHM</SpotSymbol>
<LastTradedPrice>35475.00</LastTradedPrice>
<DateTime>10-September-2015 15:46:32</DateTime>
<Symbol>SILVER</Symbol>
<PercentChange>0.51</PercentChange>
<CommodityName>Silver</CommodityName>
<NetChange>181.00</NetChange>
<SpotPrice>34912.0</SpotPrice>
<PriceQuotationUnit>1 KGS </PriceQuotationUnit>
</silver>
<marketstatus>
<currentMarketStatus>Live</currentMarketStatus>
</marketstatus>
<gold>
<ClosePrice>26057.00</ClosePrice>
<trend>positive</trend>
<OpenPrice>26143.00</OpenPrice>
<ExpiryDate>2015-10-05</ExpiryDate>
<SpotSymbol>SGOLDAHM</SpotSymbol>
<LastTradedPrice>26067.00</LastTradedPrice>
<DateTime>10-September-2015 15:46:15</DateTime>
<Symbol>GOLD</Symbol>
<PercentChange>0.04</PercentChange>
<CommodityName>Gold</CommodityName>
<NetChange>10.00</NetChange>
<SpotPrice>26003.0</SpotPrice>
<PriceQuotationUnit>10 GRMS </PriceQuotationUnit>
</gold>
<DXY Index>
<DateTime>2015-09-10 15:49:21.0</DateTime>
<netChange>0.1</netChange>
<percentChange>0.1</percentChange>
<name>DXY Index</name>
<bidprice>96.11</bidprice>
</DXY Index>
</data>
</indexes>
But this service is not called from my XSLT file. Why? Can you please help me?
I validated this XML data at http://www.xmlvalidation.com/index.php?id=1&L=0
and it shows this error:
Errors in the XML document:
1: 1931 Attribute name "Index" associated with an element type "DXY" must be followed by the ' = ' character.
How can I resolve this error on the front end?
The error you are getting is because of this element in your XML
<DXY Index>
<DateTime>2015-09-10 15:49:21.0</DateTime>
<netChange>0.1</netChange>
<percentChange>0.1</percentChange>
<name>DXY Index</name>
<bidprice>96.11</bidprice>
</DXY Index>
DXY Index is not a valid element name, as you can't have spaces in element names.
The XML needs to be corrected so that the name is something like DXYIndex or DXY-Index, although the name you actually use will depend on what the XML is used for.
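A quick way to confirm the diagnosis is to run a well-formedness check on a reduced version of the document. The sketch below uses Python's `xml.etree.ElementTree`; the helper name and the shortened sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Reduced samples: a space in the element name versus a corrected name.
broken = "<data><DXY Index><name>DXY Index</name></DXY Index></data>"
fixed = "<data><DXYIndex><name>DXY Index</name></DXYIndex></data>"

print(is_well_formed(broken), is_well_formed(fixed))
```

The space is fine inside the text of the name element; it is only the element name itself that must not contain one.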

How to iterate through DOM elements that match a css class using xpath?

I'm processing an HTML page with a variable number of p elements with a css class "myclass", using Python + Selenium RC.
When I try to select each node with this xpath:
//p[@class='myclass'][n]
(with n a natural number)
I get only the first p element with this css class for every n, unlike the situation if I iterate through selecting ALL p elements with:
//p[n]
Is there any way I can iterate through elements by css class using xpath?
XPath 1.0 doesn't provide an iterating construct.
Iteration can be performed on the selected node-set in the language that is hosting XPath.
Examples:
In XSLT 1.0:
<xsl:for-each select="someExpressionSelectingNodes">
<!-- Do something with the current node -->
</xsl:for-each>
In C#:
using System;
using System.IO;
using System.Xml;
public class Sample {
public static void Main() {
XmlDocument doc = new XmlDocument();
doc.Load("booksort.xml");
XmlNodeList nodeList;
XmlNode root = doc.DocumentElement;
nodeList=root.SelectNodes("descendant::book[author/last-name='Austen']");
//Change the price on the books.
foreach (XmlNode book in nodeList)
{
book.LastChild.InnerText="15.95";
}
Console.WriteLine("Display the modified XML document....");
doc.Save(Console.Out);
}
}
XPath 2.0 has its own iteration construct:
for $varname1 in someExpression1,
$varname2 in someExpression2,
. . . . . . . . . . .
$varnameN in someExpressionN
return
SomeExpressionUsingTheVarsAbove
Now that I look again at this question, I think the real problem is not in iterating, but in using //.
This is a FAQ:
//p[@class='myclass'][1]
selects every p element that has a class attribute with value "myclass" and that is the first such child of its parent. Therefore this expression may select many p elements, none of which is really the first such p element in the document.
When we want to get the first p element in the document that satisfies the above predicate, one correct expression is:
(//p)[@class='myclass'][1]
Remember: The [] operator has a higher priority (precedence) than the // abbreviation.
Whenever you need to index the nodes selected by //, always put the expression to be indexed in parentheses.
Here is a demonstration:
<nums>
<a>
<n x="1"/>
<n x="2"/>
<n x="3"/>
<n x="4"/>
</a>
<b>
<n x="5"/>
<n x="6"/>
<n x="7"/>
<n x="8"/>
</b>
</nums>
The XPath expression:
//n[@x mod 2 = 0][1]
selects the following two nodes:
<n x="2" />
<n x="6" />
The XPath expression:
(//n)[@x mod 2 = 0][1]
selects exactly the first n element in the document with the wanted property:
<n x="2" />
Try this first with the following transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="//n[@x mod 2 = 0][1]"/>
</xsl:template>
</xsl:stylesheet>
and the result is two nodes.
<n x="2" />
<n x="6" />
Now, change the XPath expression as below and try again:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="(//n)[@x mod 2 = 0][1]"/>
</xsl:template>
</xsl:stylesheet>
and the result is what we really wanted -- the first such n element in the document:
<n x="2" />
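The difference between the two expressions can also be reproduced in plain Python. The sketch below is an emulation, not an XPath evaluation: `xml.etree.ElementTree` supports only a subset of XPath (no `mod`, for instance), so the predicate is applied in Python code. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<nums>
  <a><n x="1"/><n x="2"/><n x="3"/><n x="4"/></a>
  <b><n x="5"/><n x="6"/><n x="7"/><n x="8"/></b>
</nums>"""

root = ET.fromstring(DOC)


def is_even(node: ET.Element) -> bool:
    """Stand-in for the XPath predicate [@x mod 2 = 0]."""
    return int(node.get("x")) % 2 == 0


# Like //n[@x mod 2 = 0][1]: the position is counted per parent,
# so every parent contributes its own first matching child.
first_per_parent = []
for parent in root.iter():
    matches = [n for n in parent.findall("n") if is_even(n)]
    if matches:
        first_per_parent.append(matches[0])

# Like (//n)[@x mod 2 = 0][1]: collect all n elements in document
# order first, filter them, then take the first of the whole list.
all_matches = [n for n in root.iter("n") if is_even(n)]
first_in_document = all_matches[:1]

print([n.get("x") for n in first_per_parent])
print([n.get("x") for n in first_in_document])
```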
Maybe all your p elements with this class are at the same level, so with //p[@class='myclass'] you receive the array of paragraphs with the specified class. You can then iterate through it using indexes, i.e.
//p[@class='myclass'][1], //p[@class='myclass'][2],...,//p[@class='myclass'][last()]
I don't think you're using the "index" for its real purpose. The //p[selection][index] syntax is actually telling you which element within its parent it should be. So //p[selection][1] is saying that your selected p must be the first matching p child of its parent. //p[selection][2] is saying it must be the second. Depending on your HTML, it's likely this isn't what you want.
Given that you're using Selenium and Python, there are a couple of ways to do what you want, and you can look at this question to see them (there are two options given there, one in Selenium JavaScript, the other using the server-side Selenium calls).
Here's a C# code snippet that might help you out.
The key here is the Selenium function GetXpathCount(). It should return the number of occurrences of the Xpath expression you are looking for.
You can enter //p[@class='myclass'] in XPather or any other XPath analysis tool to verify that multiple results are indeed returned. Then you just iterate through the results in your code.
In my case, it was all the list items in a UL that needed to be iterated, i.e. //li[@class='myclass']/ul/li, so based on your requirements it should be something like:
int numProductsInLeftNav = Convert.ToInt32(selenium.GetXpathCount("//p[@class='myclass']"));
List<string> productsInLeftNav = new List<string>();
for (int i = 1; i <= numProductsInLeftNav; i++) {
// Parenthesize so that [i] indexes the whole result set, not each parent's children.
string productName = selenium.GetText("xpath=(//p[@class='myclass'])[" + i + "]");
productsInLeftNav.Add(productName);
}