Parsing through text with incorrect attribute definition - html

While trying to parse an HTML document as XML (I added an XML declaration at the beginning), I've run into a problem with an attribute inside the tags.
<tr>
<td class="yfnc_tabledata1" nowrap align="right">Jun 4, 2013</td>
<td class="yfnc_tabledata1" align="right">453.22</td>
<td class="yfnc_tabledata1" align="right">454.43</td>
<td class="yfnc_tabledata1" align="right">447.39</td>
<td class="yfnc_tabledata1" align="right">449.31</td>
<td class="yfnc_tabledata1" align="right">10,454,600</td>
<td class="yfnc_tabledata1" align="right">449.31</td>
</tr>
While normally it wouldn't matter (since my XSLT code doesn't actually reference it), I am getting an error:
ERROR: 'Attribute name "nowrap" associated with an element type "td" must be followed by the ' = ' character.'
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Attribute name "nowrap" associated with an element type "td" must be followed by the ' = ' character.'
So I was wondering whether there's a way to suppress or ignore those errors. (I'm looking for a way that doesn't involve a separate pass over the document to remove every nowrap first.)
(For reference, XML: http://pastebin.com/TLD4bZkq , XSLT: http://pastebin.com/dPzDzeAX )

The data you're trying to process isn't XML, so the XML parser is right to produce an error.
Depending on which XSLT processor you're using and how you call it, you might be able to use an HTML parser instead of an XML parser to parse your HTML into a DOM tree, and then pass that tree to the XSLT processor rather than having the processor parse the file itself.
But remember that XSLT expects namespace-well-formed XML, and if the parser's output doesn't conform to this you will have problems. For example, in Java (which is what I'm most familiar with), for a DOM Document to be usable by XSLT it must have been produced by a namespace-aware parser, even if the document in question doesn't actually use any namespaces.
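A minimal sketch of the "parse it as HTML, not XML" idea. It is written in Python with only the standard library rather than in Java, and the names HtmlToTree and html_to_xml, the short VOID_ELEMENTS list, and the synthetic root wrapper are assumptions made for illustration. A dedicated HTML parser copes with far more malformed markup than this does; the point is only that a valueless attribute such as nowrap can be given a value before the tree reaches anything that insists on XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that HTML allows without a closing tag (deliberately short list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlToTree(HTMLParser):
    """Builds an ElementTree from lenient HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # synthetic wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # A valueless attribute such as `nowrap` arrives with value None;
        # give it its own name as value so the result is well-formed XML.
        fixed = {name: (name if value is None else value) for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, fixed)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):  # text that follows a child element
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def html_to_xml(html):
    parser = HtmlToTree()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


row = '<tr><td class="yfnc_tabledata1" nowrap align="right">Jun 4, 2013</td></tr>'
print(html_to_xml(row))
```

The serialized result carries nowrap="nowrap", so it can be handed to an XML parser or an XSLT processor without the "must be followed by the ' = ' character" error.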


How to select elements containing special characters in XPath?

I am trying to exclude three <td> elements from a result set:
<td>
🥇
</td>
<td>
🥈
</td>
<td>
🥉
</td>
I've tried using, for example:
td[not(contains(., '&#129352;'))]
but the element I don't want still comes back...
In the XPath expression, you need to use the escape conventions of the host language. Using &-escaping is fine if the host is XSLT, but if it's JavaScript, for example, you'll need to use backslash escaping.
To avoid the labyrinth of escaping conventions, just use the literal Unicode characters themselves, which can be looked up and then copied and pasted from sites such as Compart:
Char Entity Ref    Literal Unicode    XPath
&#129351;          🥇                 //td[not(contains(.,'🥇'))]
&#129352;          🥈                 //td[not(contains(.,'🥈'))]
&#129353;          🥉                 //td[not(contains(.,'🥉'))]
Here's a single XPath 2.0+ expression that will select all td elements in the document except those consisting of only the targeted special characters:
//td[not(normalize-space() = ('🥇', '🥈', '🥉'))]
In XPath 1.0, you'd have to write out the clauses separately:
//td[not(normalize-space() = '🥇') and
     not(normalize-space() = '🥈') and
     not(normalize-space() = '🥉')]
Rearrange via De Morgan's laws per taste. Go back to contains() if you truly want to test via substring containment rather than string value equality.
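To illustrate the point about host-language escaping, here is a small sketch that assumes Python as the host (the question does not say which host it uses). It shows three spellings that all produce the same character, then applies the same test as the normalize-space() predicate. The XPath subset in Python's standard-library ElementTree has no not(), contains() or normalize-space(), so the predicate is mirrored in plain Python here instead of being passed as an XPath string; the sample row and the name MEDALS are made up for the example.

```python
import html
import xml.etree.ElementTree as ET

# Three spellings of the same character, U+1F947:
literal = "🥇"
python_escape = "\U0001F947"                # Python's own string escape
from_char_ref = html.unescape("&#129351;")  # XML/HTML numeric character reference
print(literal == python_escape == from_char_ref)

MEDALS = {"\U0001F947", "\U0001F948", "\U0001F949"}

# Inside XML source the &-escapes are fine: the XML parser decodes them.
doc = ET.fromstring(
    "<tr><td> &#129351; </td><td>&#129352;</td><td>&#129353;</td><td>Alice</td></tr>"
)


def normalize_space(text):
    """Trim and collapse whitespace, roughly what XPath normalize-space() does."""
    return " ".join(text.split())


# Keep every td whose normalized string value is not one of the medals.
kept = [td for td in doc.iter("td")
        if normalize_space("".join(td.itertext())) not in MEDALS]
print([td.text for td in kept])
```

An &-escape inside a Python string literal would stay as the nine literal characters "&#129351;" and never match, which is the failure mode the answer describes.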

Is there a way to have an Xquery in an XSLT stylesheet which will be executed upon transformation?

I have an XML file which I've been trying to transform with both XQuery and XSLT at the same time.
The document basically encodes two different types of text according to TEI standards. The first part is a philological study which I have written about an epic poem, and the second part is a scholarly edition of said poem.
<text>
<front><!-- chapters of the study --></front>
<body>
<lg n="1">
<l n="1.a">first line of the poem</l>
<l n="1.a">second line with <distinct>interesting stuff</distinct></l></lg>
<!-- rest of the poem-->
</body></text>
My main goal is to transform this with XSLT into a nicely formatted html document, and for the most part it works.
Now, the study discusses data from the edition ("This interesting stuff occurs quite often in our poem, as is shown in the following table"). Since all the "interesting stuff" is marked up (see example above), I can easily create those tables using a combination of HTML and XQuery:
<table>
<tr>
<td>Verse Number</td>
<td>Interesting Stuff</td>
</tr>
{
for $case in doc("mydocument.xml")//distinct
return
<tr>
<td>{data($case/ancestor::l/@n)}</td>
<td>{$case}</td></tr>
}
</table>
The easy way at the moment would be to change the XQuery so that it creates a TEI-conformant XML table and to copy that manually into the document. Then the XSLT will work smoothly, just as it does for the few static tables that I have. But most of my tables should be dynamic: I want the numbers to change if I change something in the edition. This should happen every time a new reader opens the formatted text in the browser (i.e., each time the XSLT transformation is executed).
I tried combining the code as follows:
<xsl:template match="table[@type='query']">
{ (: the XQuery-HTML instructions from above go here :) }
</xsl:template>
It creates a table at the right place, but before it and in the cells it just repeats the XQuery instructions. I've been looking for similar questions, but I found only the reverse process, i.e. how to use XQuery to create XSLT (for example this: calling XQuery from XSLT, building XSLT dynamically in XQuery?), which does not help with my problem.
Is there a way to combine the two codes?
Thanks in advance for your help!
There are various ways you can combine XSLT and XQuery. You can have XSLT tasks and XQuery tasks in the same pipeline, or you can invoke XQuery functions from XSLT (for example using load-xquery-module() in XSLT 3.0). But for the case you're describing, it's simplest to just replace the FLWOR expression with an equivalent xsl:for-each:
<xsl:for-each select='doc("mydocument.xml")//distinct'>
<xsl:variable name="case" select="."/>
<tr>
<td>{$case/ancestor::l/@n}</td>
<td>{$case}</td>
</tr>
</xsl:for-each>
Note: XSLT 3.0 allows the curly-brace syntax (you need to specify expand-text="yes") but the semantics are slightly different from XQuery - it means "value-of" rather than "copy-of".

Angular 2 ngClass: "Got interpolation ({{}}) where expression was expected" when trying to display JSON in HTML

I'm new to Angular 2 and was hoping to get some help from the community. I'm currently trying to implement a dynamic/conditional ngClass on a <tr> element of my HTML view. The truthy value used is a variable, and its original value comes from a JSON object set on my component:
<td [ngClass] = "{weak : {{jsonInMyComponenet.element}}, neutral : !{{jsonInMyComponenet.element}}}" ></td>
When I use the code above I get this error:
Got interpolation ({{}}) where expression was expected
If I remove the curly brackets I get no errors, but the page doesn't render the element, so I can't see either the weak or the neutral class being applied. What am I doing wrong?
Don't use [...] and {{...}} together. Use one or the other.
<td [ngClass] = "{'weak' : jsonInMyComponenet.element, 'neutral' : !jsonInMyComponenet.element}" ></td>
{{...}} is for string interpolation. [...] interprets the value as an expression.

Special Characters in HTML Element

What I'm trying to do is output a percent sign (%) directly into a <td> tag. Below is my code:
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td class="item_container" %%=v(@Item_Container_Style)=%%>
...
When I test the XSL I get the following error:
SAXParseException: Expected an attribute name (Set_A_Custom.xsl, line 205, column 38)
So basically it's seeing "%%=v(@Item_Container_Style)=%%" as invalid HTML, but I need this code to be there.
If you are wondering why I am doing this, it is because I am writing the XSL to output HTML that contains AMPscript (an ExactTarget proprietary scripting language). You don't need to know anything about AMPscript to help me out, though; I just need to output the percent sign (%) in the HTML and everything will work.
Any ideas? For the record I'm using XSLT 1.0. Thanks all!
An XSLT stylesheet must itself be well-formed XML, so you can't include this kind of construct directly in the stylesheet. If the XSLT processor you're using supports disable-output-escaping then you would be able to do something like
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<xsl:text disable-output-escaping="yes"><![CDATA[<td class="item_container" %%=v(@Item_Container_Style)=%%>]]></xsl:text>
...
<xsl:text disable-output-escaping="yes"><![CDATA[</td>]]></xsl:text>
</tr>
</table>
If it does not allow disable-output-escaping then your only option is to use the text output method, and write all the tags you want to output as text with the angle brackets escaped (or in CDATA).
What I'm trying to do is output a percent sign (%) directly into a <td> tag.
Not possible with the "html" or "xml" output modes. XSLT has been designed to create syntactically sane HTML; you cannot make it do anything else.
Of course you could switch to the "text" output mode and do whatever you like, but generating HTML this way is a lot harder.
Alternatively you can use disable-output-escaping, if your XSLT processor supports it, but this will quickly turn your XSLT stylesheet into a mess if you need to do it in many places.
That being said, here's a proposal. In XSLT you use the "html" output mode and this:
<td
class="item_container"
amp-1="%%=v({@Item_Container_Style})%%"
amp-2="%%=v({@Some_Other_Element})%%"
>
some text %%=v(<xsl:value-of select="Other_Stuff" />)%% more text
</td>
That is syntactically valid XSLT which covers both cases (multiple placeholders in attributes, multiple placeholders in the text) and creates syntactically valid HTML:
<td
class="item_container"
amp-1="%%=v(item container style content)%%"
amp-2="%%=v(some other element content)%%"
>
some text %%=v(other stuff)%% more text
</td>
and then you use a post-processing step to convert that HTML into AMPscript:
Regex-replace \bamp-\d+="(%%[\s\S]*?%%)" with $1, which results in
<td
class="item_container"
%%=v(item container style content)%%
%%=v(some other element content)%%
>
some text %%=v(other stuff)%% more text
</td>
Handling HTML with regular expressions is generally strongly discouraged, but this might just be a narrow-enough use case.
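The post-processing step could be sketched like this. Python is an assumption here (the answer does not say which tool runs the replacement), and the sample string simply repeats the placeholder output shown above. The pattern is the one given in the answer; only the group reference changes, because Python's re module writes it as \1 rather than $1.

```python
import re

# Output of the XSLT step, with the AMPscript parked in placeholder attributes.
html_out = '''<td
  class="item_container"
  amp-1="%%=v(item container style content)%%"
  amp-2="%%=v(some other element content)%%"
>'''

# Strip the amp-N="..." wrapper and keep only the %%...%% payload.
unwrapped = re.sub(r'\bamp-\d+="(%%[\s\S]*?%%)"', r'\1', html_out)
print(unwrapped)
```

Ordinary attributes such as class are left alone because the pattern only matches attribute names of the form amp- followed by digits.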
AMPscript appears to have a standards-based syntax as an alternative to its proprietary syntax:
Delimiter Comparison
The table below demonstrates the similarities between standard AMPscript delimiters and server-side delimiters.
Standard AMPscript Delimiter    Tag-based AMPscript Delimiter
%%[                             <script runat=server language=ampscript>
etc.
Does this help you?

PowerShell: modifying HTML from ConvertTo-HTML

I have a script that generates an array of objects that I want to email out in HTML format. That part works fine. I am trying to modify the HTML string to make certain rows a different font color.
Part of the HTML string looks like this (2 rows only):
<tr>
<td>ABL - Branch5206 Daily OD Report</td>
<td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.pdf'</td>
<td>13124</td>
<td>4/23/2013 8:05:34 AM</td>
<td>29134</td>
<td>0</td>
<td>Delivered</td>
</tr>
<tr>
<td>ABL - Branch5206 Daily OD Report</td>
<td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.xls'</td>
<td>15716</td>
<td>4/23/2013 8:05:34 AM</td>
<td>29134</td>
<td>0</td>
<td>Delivered</td>
</tr>
I tried regex to add a font color to the beginning and end of the rows where the row ends with "Delivered":
$email = [regex]::Replace($email, "<tr><td>(.*?)Delivered</td></tr>", '<tr><font color = green><td>$1Delivered</td></font></tr>')
This didn't work (I am not sure if you can set font color for a whole row like that).
Any ideas on how to do this easily and efficiently? I have to do it for several different statuses (like Delivered).
Disclaimer: HTML cannot be parsed with regular expressions. A regular expression will NOT provide a general solution to this problem. If your HTML structure is well known and you don't have any other <tr></tr> elements, though, the following might work. On that note, is there some reason you can't modify the HTML generation to do this, instead of waiting until the HTML has already been generated?
Try this command:
PS > $email = $email -replace '(?s)<tr>(.*?)<td>Delivered</td>(.*?)</tr>','<tr style="color: #FF0000">$1<td>Delivered</td>$2</tr>'
The first string is the pattern. The (?s) tells the parser to allow . to accept newlines; this is called "single line" mode. Then it grabs a <tr> element that contains the string <td>Delivered</td>. The two capture groups grab everything else in the <tr> element around the <td>Delivered</td> string. Take note of the question marks following the *s. * by itself is greedy and matches as much text as possible; *? matches as little text as possible. If we just used * here, it would treat your entire string as one match and only replace the first <tr>.
The second string is the replacement. It puts the <tr> element and its contents back in place with an added style attribute, using $1 and $2 to reinsert the captured text.
One other minor note is the quoting. I tend toward single quotes anyway, but in this case, you're likely to have double quotes in the replacement string. So single quotes are probably the way to go.
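The greedy-versus-lazy point can be checked with a small sketch. It uses Python's re module rather than PowerShell's .NET regex engine (both support (?s) and the *? quantifier), and the two-row sample string is made up for the example.

```python
import re

html = ("<tr><td>A</td><td>Delivered</td></tr>"
        "<tr><td>B</td><td>Delivered</td></tr>")

# Lazy quantifiers stop at the first place the rest of the pattern can match.
lazy = re.findall(r"(?s)<tr>.*?<td>Delivered</td>.*?</tr>", html)

# Greedy quantifiers run as far as they can, swallowing both rows at once.
greedy = re.findall(r"(?s)<tr>.*<td>Delivered</td>.*</tr>", html)

print(len(lazy), len(greedy))
```

With the lazy pattern each row is a separate match, so each row gets its own replacement; with the greedy pattern the whole string is a single match.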
As for how you could do this for different statuses, regular expressions really aren't designed for conditional content like that; it's like trying to use a screwdriver as a drill. You can hard-code several replaces, or loop over status/color pairs and build your pattern and replacement strings from them. A full-blown HTML parser would be more efficient if you can find one for .NET; you might try to get away with an XML parser if you can guarantee it's valid XML. Or, going back to my question at the beginning, you could modify the HTML generation. If your e-mails are few in number, though, this may not be a bottleneck worth addressing. Development time spent is also costly. See if it's fast enough and try a different route if not.
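The "loop over status/color pairs" idea could look roughly like the sketch below, again in Python rather than PowerShell. The function name colorize_rows and the "Failed"/"red" pair are hypothetical; only "Delivered" comes from the question. One deliberate difference from the pattern above: instead of a plain .*?, each gap is written as (?:(?!</tr>).)*? so that a match cannot start in one row and end in a later one when the first row has a different status.

```python
import re

# Hypothetical status-to-colour pairs; only "Delivered" comes from the question.
STATUS_COLORS = {"Delivered": "green", "Failed": "red"}

# Lazily matches any characters but refuses to run past the current row's </tr>.
INSIDE_ROW = r"(?:(?!</tr>).)*?"


def colorize_rows(html, status_colors=STATUS_COLORS):
    """Add a style attribute to every <tr> that contains a matching status cell."""
    for status, color in status_colors.items():
        pattern = re.compile(
            "<tr>(" + INSIDE_ROW + "<td>" + re.escape(status) + "</td>"
            + INSIDE_ROW + ")</tr>",
            re.DOTALL,  # same effect as the (?s) prefix
        )
        html = pattern.sub(
            lambda m, color=color: f'<tr style="color: {color}">{m.group(1)}</tr>',
            html,
        )
    return html


rows = (
    "<tr>\n<td>Report A</td>\n<td>Pending</td>\n</tr>\n"
    "<tr>\n<td>Report B</td>\n<td>Delivered</td>\n</tr>\n"
)
print(colorize_rows(rows))
```

The same caveat applies as for the single replace: this relies on rows starting with a bare <tr> and on the status sitting alone in its own <td>.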
Credit where it's due: I took the HTML style attribute from @FrankieTheKneeMan.