Tcl XML parsing error

I am trying to parse an XML file using the dom package, but I get this error:
unterminatedattribute {invalid attribute list around line 4}
Here is a simple test:
package require dom;
set XML "
<Top>
<Name name='name' />
<Group number=1>
<Member name='name1' test='test1' l=100/>
</Group>
</Top>"
set doc [::dom::parse $XML]
set root [$doc cget -documentElement]
set node [$root cget -firstChild]
puts "[$node cget -nodeValue]"

That “XML” is actually formally invalid; all attribute values must be quoted. If you can, fix that.
set XML "
<Top>
<Name name='name' />
<Group number='1'>
<Member name='name1' test='test1' l='100'/>
</Group>
</Top>"
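Any conforming XML parser makes the same distinction as the Tcl dom package. As a quick cross-check outside Tcl, here is a minimal Java sketch using the JDK's built-in DOM parser (javax.xml.parsers); the class and method names are hypothetical. It returns the parser's error message for the unquoted version and null for the quoted one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks well-formedness with the JDK's DOM parser.
final class QuotedAttributeCheck {

    /** Returns null if the XML is well-formed, otherwise the parser's error message. */
    static String wellFormednessError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unquoted = "<Top><Name name='name' /><Group number=1>"
                + "<Member name='name1' test='test1' l=100/></Group></Top>";
        String quoted = "<Top><Name name='name' /><Group number='1'>"
                + "<Member name='name1' test='test1' l='100'/></Group></Top>";
        System.out.println("unquoted: " + wellFormednessError(unquoted));
        System.out.println("quoted:   " + wellFormednessError(quoted));
    }
}
```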
If you can't fix that, you might try using tDOM instead in HTML mode (which is a lot laxer about well-formedness constraints, though it also lower-cases all element and attribute names). Mind you, even with that it still fails on your particular input document:
% package require tdom
0.8.3
% set doc [dom parse -html $XML]
error "Unterminated element 'group' (within 'member')" at position 114
">
<group number=1>
<member name='name1' test='test1' l=100/>
</group> <--Error--
</Top>"
Fixing your document is the #1 thing to do!

The problem is that attribute values must be enclosed in double (") or single (') quotes. After fixing your XML, the parsing was successful.
I usually don't use the dom package; instead I use the tdom package.
The tdom package has a -html option that enables loose parsing.

Related

How to create a "Line Feed (LF)" in XML and XSL files

I have an XML file and I would like to insert a new line in the text "Sample Text 123", like this:
Sample
Text 123
I've already tried everything, I mean &#xA &#xD \n, but nothing works:
<?xml version="1.0" encoding="UTF-8" ?>
<item>
<text>Address</text>
<data>
Sample
Text 123
</data>
</item>
A newline (aka line break or end-of-line, EOL) is a special character or character sequence that marks the end of a line of text. The exact codes used vary across operating systems:
Operating system: end-of-line (EOL) marker
Unix: LF
Mac OS up to version 9: CR
Windows, DOS: CR+LF
You can use &#xA; for line feed (LF) or &#xD; for carriage return (CR), and an XML parser will replace it with the respective character when handing off the parsed text to an application. These can be added manually, as you show in your example, but are particularly convenient when you need to add newlines programmatically within a string:
Common programming languages:
LF: "&#xA;"
CR: "&#xD;"
XSLT:
LF: <xsl:text>&#xA;</xsl:text>
CR: <xsl:text>&#xD;</xsl:text>
Or, if you want to see it in the XML immediately, simply put it in literally:
<?xml version="1.0" encoding="UTF-8" ?>
<item>
<text>Address</text>
<data>
Sample
Text 123
</data>
</item>
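To confirm that it is the parser, not the application, that turns a character reference into a line break, here is a small Java sketch using the JDK's DOM parser (the class name is hypothetical). It reads the data element and returns its text, in which &#xA; has become a real LF character.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LineFeedReference {

    /** Parses the XML and returns the text content of the first <data> element. */
    static String dataText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("data").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<item><text>Address</text><data>Sample&#xA;Text 123</data></item>";
        // The reference &#xA; arrives in the application as '\n'.
        System.out.println(dataText(xml).equals("Sample\nText 123")); // prints true
    }
}
```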
Newline still not showing up?
Keep in mind that how an application interprets text, including newlines, is up to it. If you find that your newlines are being ignored, it might be that the application automatically runs together text separated by newlines.
HTML browsers, for example, will ignore newlines (and will normalize space within text such that multiple spaces are consolidated). To break lines in HTML,
use <br/>; or
wrap the block in an element such as a div or p, which by default causes a line break after the enclosed text, or in an element such as pre, which by default typically preserves whitespace and line breaks; or
use CSS styling such as white-space to control newline rendering.
XML application not cooperating?
If an XML application isn't respecting your newlines, and working within the application's processing model isn't helping, another possible recourse is to use CDATA to tell the XML parser not to parse the text containing the newline.
<?xml version="1.0" encoding="UTF-8" ?>
<item>
<text>Address</text>
<data>
<![CDATA[Sample
Text 123]]>
</data>
</item>
or, if HTML markup is recognized downstream:
<?xml version="1.0" encoding="UTF-8" ?>
<item>
<text>Address</text>
<data>
<![CDATA[Sample <br/>
Text 123]]>
</data>
</item>
Whether this helps will depend upon application-defined semantics of one or more stages in the pipeline of XML processing that the XML passes through.
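At the XML level, a CDATA section only changes how the parser reads the text; the application still receives plain text. A minimal Java sketch (hypothetical class name, JDK DOM parser) shows that the newline survives and that <br/> inside CDATA comes through as literal characters rather than as an element, which is why the second variant only helps if a later stage treats the text as HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class CdataText {

    /** Parses the XML and returns the first <data> element. */
    static Element dataElement(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("data").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element data = dataElement(
                "<item><data><![CDATA[Sample <br/>\nText 123]]></data></item>");
        // The CDATA content is returned verbatim, newline and "<br/>" included.
        System.out.println(data.getTextContent());
        // No <br> element was created; "<br/>" is just text.
        System.out.println(data.getElementsByTagName("br").getLength()); // prints 0
    }
}
```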
Bottom line
A newline (aka line break or end-of-line, EOL) can be added much like any character in XML, but be mindful of
differing OS conventions
differing XML application semantics

BIML Annotation Tag not matching

I'm dumping the schema from the table into a Tag Annotation for the package.
<Annotation AnnotationType="Tag" Tag="PackageSchema">
<#=Table.Schema#>
</Annotation>
In the BIML for creating a master package I'm creating a sequence container for each schema and putting the packages in the corresponding container. At least that's what I'm asking it to do.
<Package Name="01-Master" ConstraintMode="Linear">
<Tasks>
<# foreach (var SchemaNode in RootNode.Schemas) { #>
<Container Name="SEQC <#=SchemaNode.Name#>" ConstraintMode = "Parallel">
<Tasks>
<# foreach (var Pckg in RootNode.Packages.Where(pkgschema => pkgschema.GetTag("PackageSchema")==SchemaNode.Name)) { #>
<ExecutePackage Name="EP <#=Pckg.Name#>" DelayValidation="true">
<ExternalProjectPackage Package="<#=Pckg.Name#>.dtsx">
</ExternalProjectPackage>
</ExecutePackage>
<# } #>
</Tasks>
</Container>
<# } #>
</Tasks>
</Package>
When that runs I get a Master package with empty sequence containers. I took the Where clause out of the package foreach, and it generates, but puts all packages in every container. I put the GetTag in the name of the package just to make sure it picked it up correctly.
<# foreach (var Pckg in RootNode.Packages) { #>
<ExecutePackage Name="EP <#=Pckg.Name#>" DelayValidation="true">
<ExternalProjectPackage Package="<#=Pckg.Name#>.dtsx--<#=Pckg.GetTag("PackageSchema")#>--<#=SchemaNode.Name#>">
The tag was put into the package name but it is padded with lots of space around it.
<ExecutePackage Name="EP Application_TransactionTypes" DelayValidation="true">
<ExternalProjectPackage Package="Application_TransactionTypes.dtsx-- Application --Application" />
</ExecutePackage>
<ExecutePackage Name="EP Purchasing_PurchaseOrderLines" DelayValidation="true">
<ExternalProjectPackage Package="Purchasing_PurchaseOrderLines.dtsx-- Purchasing --Application" />
</ExecutePackage>
So I'm guessing that the padded value is why the RootNode.Packages.Where is not matching up to the schema name. I can't figure out how to trim off the spaces though. I tried putting a trim() in different places but the BIML engine complains about it. I was able to get rid of the leading spaces by taking the tabs out in front of the actual annotation in the BIML but it still pads the end.
Any ideas on why the tag is getting padded, or maybe I'm completely off base here and it's not the spaces around the tag.
This is one of those "rock and a hard place" situations. In a much earlier version, we actually automatically trimmed annotation tag values to remove the leading and trailing whitespace. This caused issues for users in scenarios where they really needed that whitespace.
There are a few workarounds for this:
As Bill pointed out, just delete the whitespace in your BimlScript.
If you really want the whitespace in the BimlScript for readability, wrap the value in a CDATA block so that the newlines outside of the CDATA block are ignored. The syntax for this would be:
<Annotation AnnotationType="Tag" Tag="PackageSchema">
<![CDATA[<#=Table.Schema#>]]>
</Annotation>
Alternatively, if you want to keep the whitespace for readability and don't like CDATA, you can trim the whitespace from the annotation value at the point of use. The syntax for this would be:
<#=Pckg.GetTag("PackageSchema").Trim()#>
This is one of the ugly places where Biml/XML formatting is biting you in the backside:
<Annotation AnnotationType="Tag" Tag="PackageSchema">
<#=Table.Schema#>
</Annotation>
If you change that definition to the following, does everything "magically" work?
<Annotation AnnotationType="Tag" Tag="PackageSchema"><#=Table.Schema#></Annotation>
I assume so because I ran into a similar issue with package parameters...
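The padding can be reproduced with any XML parser: an element's text content includes the newlines and indentation around the value. The Java sketch below is only an XML-level analogy (it does not run Biml, and Biml's own annotation handling is as described in the answers above); the class name is hypothetical and the schema name Application is taken from the question's output. The padded value fails the same equality check that the Where filter performs, while the trimmed or inline value passes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class AnnotationWhitespace {

    /** Returns the full text content of the root element, whitespace included. */
    static String tagValue(String annotationXml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(annotationXml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String padded = tagValue("<Annotation AnnotationType=\"Tag\" Tag=\"PackageSchema\">\n"
                + "    Application\n"
                + "</Annotation>");
        String inline = tagValue(
                "<Annotation AnnotationType=\"Tag\" Tag=\"PackageSchema\">Application</Annotation>");
        String schemaName = "Application"; // hypothetical SchemaNode.Name
        System.out.println(padded.equals(schemaName));        // false: newline and indentation kept
        System.out.println(padded.trim().equals(schemaName)); // true: like .Trim() at the point of use
        System.out.println(inline.equals(schemaName));        // true: nothing to trim
    }
}
```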

Using SQL Server xml.modify on an XML document with escaped XML

I want to make modifications to an XML document using SQL Server's XML.modify. My problem is that my XML document uses some escaped XML, so "<" appears as "&lt;" and ">" appears as "&gt;". I want to know whether it is possible to set the value of an element that is surrounded by escaped XML. An example of what I'm dealing with is below:
Declare @myDoc as xml;
Set @myDoc = '<Root>
<ProductDescription ProductID="1" ProductName="Road Bike">
<Features>
<BikeLight>False</BikeLight>
&lt;BikeHorn&gt;True&lt;/BikeHorn&gt;
</Features>
</ProductDescription>
</Root>' ;
I know I can edit the value of the BikeLight element by using
set @myDoc.modify('replace value of (/Root/ProductDescription/Features/BikeLight/text())[1] with "True"')
but trying to do the same with BikeHorn only returns the XML document, unmodified. Is it possible to modify the value of elements surrounded by escaped XML? Any help would be appreciated, thanks. Also, just to note that in my actual code all elements under Features would be surrounded by escaped XML.
The problem is you don't have a node called <BikeHorn>, you have a complex node <Features> which contains some text of its own in addition to a <BikeLight> child node. So you need to modify <Features> to change the value of BikeHorn:
set @myDoc.modify('
replace value of (/Root/ProductDescription/Features/text())[1]
with "<BikeHorn>False</BikeHorn>"')
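The same structure is visible from any DOM API. A minimal Java sketch (hypothetical class name, JDK DOM parser, not SQL Server) shows that after parsing there is no BikeHorn element at all; the escaped markup is simply the text of Features, which is why the answer targets Features/text() rather than a BikeHorn path.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class EscapedFeatures {

    static final String SAMPLE = "<Root><ProductDescription ProductID=\"1\" ProductName=\"Road Bike\">"
            + "<Features><BikeLight>False</BikeLight>&lt;BikeHorn&gt;True&lt;/BikeHorn&gt;</Features>"
            + "</ProductDescription></Root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the first text-node child of the first element with this name, or null. */
    static Node firstTextChild(Document doc, String elementName) {
        for (Node n = doc.getElementsByTagName(elementName).item(0).getFirstChild();
                n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // The escaped markup did not become an element ...
        System.out.println(doc.getElementsByTagName("BikeHorn").getLength()); // prints 0
        // ... it is the text of <Features>, comparable to (/Root/.../Features/text())[1].
        Node text = firstTextChild(doc, "Features");
        System.out.println(text.getNodeValue()); // prints <BikeHorn>True</BikeHorn>
        // Replacing that text node's value is the DOM analogue of "replace value of".
        text.setNodeValue("<BikeHorn>False</BikeHorn>");
        System.out.println(text.getNodeValue());
    }
}
```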

Set TextView text from html-formatted string resource in XML

I have some fixed strings inside my strings.xml, something like:
<resources>
<string name="somestring">
<B>Title</B><BR/>
Content
</string>
</resources>
and in my layout I've got a TextView which I'd like to fill with the html-formatted string.
<TextView android:id="@+id/formattedtext"
android:layout_width="fill_parent"
android:layout_height="wrap_content"
android:text="@string/htmlstring"/>
If I do this, the content of formattedtext is just the content of somestring, stripped of any HTML tags and thus unformatted.
I know that it is possible to set the formatted text programmatically with
.setText(Html.fromHtml(somestring));
because I use this in other parts of my program where it is working as expected.
To call this function I need an Activity, but at the moment my layout is just a simple more or less static view in plain XML and I'd prefer to leave it that way, to save me from the overhead of creating an Activity just to set some text.
Am I overlooking something obvious? Is it not possible at all? Any help or workarounds welcome!
Edit: I just tried some things, and it seems that HTML formatting in XML has some restrictions:
tags must be written lowercase
some tags which are mentioned here do not work, e.g. <br/> (it's possible to use \n instead)
Just in case anybody finds this, there's a nicer alternative that's not documented (I tripped over it after searching for hours, and finally found it in the bug list for the Android SDK itself). You CAN include raw HTML in strings.xml, as long as you wrap it in
<![CDATA[ ...raw html... ]]>
Edge Cases:
Characters like apostrophe ('), double-quote ("), and ampersand (&) only need to be escaped if you want them to appear in the rendered text AS themselves, but they COULD be plausibly interpreted as HTML.
' and " can be represented as \' and \", or &apos; and &quot;.
< and > always need to be escaped as &lt; and &gt; if you literally want them to render as '<' and '>' in the text.
Ampersand (&) is a little more complicated.
Ampersand followed by whitespace renders as ampersand.
Ampersand followed by one or more characters that don't form a valid HTML entity code renders as an ampersand followed by those characters. So &qqq; renders as &qqq;, but &lt;1 renders as <1.
Example:
<string name="nice_html">
<![CDATA[
<p>This is a html-formatted \"string\" with <b>bold</b> and <i>italic</i> text</p>
<p>This is another paragraph from the same \'string\'.</p>
<p>To be clear, 0 &lt; 1, & 10 &gt; 1</p>
]]>
</string>
Then, in your code:
TextView foo = (TextView)findViewById(R.id.foo);
foo.setText(Html.fromHtml(getString(R.string.nice_html), Html.FROM_HTML_MODE_LEGACY));
IMHO, this is several orders of magnitude nicer to work with :-)
August 2021 update: My original answer used Html.fromHtml(String), which was deprecated in API 24. The alternative fromHtml(String,int) form is suggested as its replacement.
FROM_HTML_MODE_LEGACY is likely to work... but one of the other flags might be a better choice for what you want to do.
On a final note, if you'd prefer to render Android Spanned text suitable for use in a TextView using Markdown syntax instead of HTML, there are now multiple third-party libraries that make it easy, including https://noties.io/Markwon.
As the top answer here is suggesting something wrong (or at least too complicated), I feel this should be updated, although the question is quite old:
When using String resources in Android, you just have to call getString(...) from Java code or use android:text="@string/..." in your layout XML.
Even if you want to use HTML markup in your Strings, you don't have to change a lot:
The only characters that you need to escape in your String resources are:
double quotation mark: " becomes \"
single quotation mark: ' becomes \'
ampersand: & becomes &amp; or &#38;
That means you can add your HTML markup without escaping the tags:
<string name="my_string"><b>Hello World!</b> This is an example.</string>
However, to be sure, you should only use <b>, <i> and <u> as they are listed in the documentation.
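As an illustration of those three rules, here is a hypothetical Java helper (not part of the Android SDK) that escapes plain text for a <string> resource. It handles only the three characters listed above and deliberately leaves < and > alone, so that tags such as <b> remain markup.

```java
// Hypothetical helper, not an Android API: applies only the three escaping rules above.
final class StringResourceEscaper {

    /** Escapes text for an Android <string> resource body. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;  // " becomes \"
                case '\'': out.append("\\'"); break;  // ' becomes \'
                case '&': out.append("&amp;"); break; // & becomes &amp;
                default: out.append(c);               // '<' and '>' pass through, so tags stay markup
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String body = escape("<b>Tom & Jerry's \"show\"</b>");
        // prints <string name="my_string"><b>Tom &amp; Jerry\'s \"show\"</b></string>
        System.out.println("<string name=\"my_string\">" + body + "</string>");
    }
}
```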
If you want to use your HTML strings from XML, just keep on using android:text="@string/...", it will work fine.
The only difference is that, if you want to use your HTML strings from Java code, you have to use getText(...) instead of getString(...) now, as the former keeps the style and the latter will just strip it off.
It's as easy as that. No CDATA, no Html.fromHtml(...).
You will only need Html.fromHtml(...) if you escaped the special characters of your HTML markup (for example, writing &lt;b&gt; instead of <b>). In that case, use it together with getString(...). This can be necessary if you want to pass the String to String.format(...).
This is all described in the docs as well.
Edit:
There is no difference between getText(...) with unescaped HTML (as I've proposed) and CDATA sections combined with Html.fromHtml(...).
See the following graphic for a comparison:
Escape your HTML tags ...
<resources>
<string name="somestring">
&lt;B&gt;Title&lt;/B&gt;&lt;BR/&gt;
Content
</string>
</resources>
Android does not have a specification to indicate the type of resource string (e.g. text/plain or text/html). There is a workaround, however, that will allow the developer to specify this within the XML file.
Define a custom attribute to specify that the android:text attribute is html.
Use a subclassed TextView.
Once you define these, you can express yourself with HTML in xml files without ever having to call setText(Html.fromHtml(...)) again. I'm rather surprised that this approach is not part of the API.
This solution works to the degree that the Android Studio simulator will display the text as rendered HTML.
res/values/strings.xml (the string resource as HTML)
<resources>
<string name="app_name">TextViewEx</string>
<string name="string_with_html"><![CDATA[
<em>Hello</em> <strong>World</strong>!
]]></string>
</resources>
layout.xml (only the relevant parts)
Declare the custom attribute namespace, and add the android_ex:isHtml attribute. Also use the subclass of TextView.
<RelativeLayout
...
xmlns:android_ex="http://schemas.android.com/apk/res-auto"
...>
<tv.twelvetone.samples.textviewex.TextViewEx
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="@string/string_with_html"
android_ex:isHtml="true"
/>
</RelativeLayout>
res/values/attrs.xml (define the custom attributes for the subclass)
<resources>
<declare-styleable name="TextViewEx">
<attr name="isHtml" format="boolean"/>
<attr name="android:text" />
</declare-styleable>
</resources>
TextViewEx.java (the subclass of TextView)
package tv.twelvetone.samples.textviewex;
import android.content.Context;
import android.content.res.TypedArray;
import android.support.annotation.Nullable;
import android.text.Html;
import android.util.AttributeSet;
import android.widget.TextView;
public class TextViewEx extends TextView {
public TextViewEx(Context context, @Nullable AttributeSet attrs) {
super(context, attrs);
TypedArray a = context.obtainStyledAttributes(attrs, R.styleable.TextViewEx, 0, 0);
try {
boolean isHtml = a.getBoolean(R.styleable.TextViewEx_isHtml, false);
if (isHtml) {
String text = a.getString(R.styleable.TextViewEx_android_text);
if (text != null) {
setText(Html.fromHtml(text));
}
}
} catch (Exception e) {
e.printStackTrace();
} finally {
a.recycle();
}
}
}
Latest update:
Html.fromHtml(string);//deprecated as of Android N (API 24)
The following code supports Android N and above, and falls back to the old call on earlier versions:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
textView.setText(Html.fromHtml(yourHtmlString,Html.FROM_HTML_MODE_LEGACY));
}
else
{
textView.setText(Html.fromHtml(yourHtmlString));
}
String termsOfCondition="<font color=#cc0029>Terms of Use </font>";
String commma="<font color=#000000>, </font>";
String privacyPolicy="<font color=#cc0029>Privacy Policy </font>";
Spanned text=Html.fromHtml("I am of legal age and I have read, understood, agreed and accepted the "+termsOfCondition+commma+privacyPolicy);
secondCheckBox.setText(text);
I have another case where I have no chance to put CDATA into the XML, because I receive the HTML string from a server.
Here is what I get from a server:
<p>The quick brown <br />
fox jumps <br />
over the lazy dog<br />
</p>
It seems to be more complicated but the solution is much simpler.
private TextView textView;
protected void onCreate(Bundle savedInstanceState) {
.....
textView = (TextView) findViewById(R.id.text); //need to define in your layout
String htmlFromServer = getHTMLContentFromAServer();
textView.setText(Html.fromHtml(htmlFromServer).toString());
}
Hope it helps!
If you want to show an HTML string in a view such as a TextView in an Android app, you can use this code:
Kotlin
var stringvalue = "Your String"
yourTextView.text = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
Html.fromHtml(stringvalue, Html.FROM_HTML_MODE_COMPACT)
} else {
Html.fromHtml(stringvalue)
}
Java
String stringvalue = "Your String";
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
yourTextView.setText(Html.fromHtml(stringvalue, Html.FROM_HTML_MODE_COMPACT));
} else {
yourTextView.setText(Html.fromHtml(stringvalue));
}

Is it possible to have HTML text or CDATA inside an XML attribute?

I keep getting "XML parser failure: Unterminated attribute" with my parser when I attempt to put HTML text or CDATA inside my XML attribute. Is there a way to do this or is this not allowed by the standard?
No, the markup denoting a CDATA section is not permitted as the value of an attribute.
According to the specification, this prohibition is indirect rather than direct. The spec says that an attribute value must not contain a literal open angle bracket; open angle brackets and ampersands must be escaped. Therefore you cannot insert a CDATA section. Womp womp.
A CDATA section is interpreted only when it is in the text content of an element.
Attributes can only have plain text inside, no tags, comments, or other structured data. You need to escape any special characters by using character entities. For example:
<code text="&lt;a href=&quot;/&quot;&gt;">
That would give the text attribute the value <a href="/">. Note that this is just plain text so if you wanted to treat it as HTML you'd have to run that string through an HTML parser yourself. The XML DOM wouldn't parse the text attribute for you.
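Most XML libraries do this escaping for you when you set an attribute through an API instead of writing the markup by hand. A minimal Java sketch (hypothetical class name, using the JDK's DOM and javax.xml.transform serializer) stores the HTML snippet as a plain attribute value, serializes it, and parses it back: the serialized form contains entities, and the round trip returns the original text unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class HtmlInAttribute {

    /** Builds <code text="..."/> with the given plain-text value and serializes it. */
    static String serialize(String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element code = doc.createElement("code");
            code.setAttribute("text", value); // stored as plain text; the serializer escapes it
            doc.appendChild(code);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the XML and returns the text attribute of the root element. */
    static String readText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getAttribute("text");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = serialize("<a href=\"/\">");
        System.out.println(xml);           // the markup characters appear as entities
        System.out.println(readText(xml)); // prints <a href="/">
    }
}
```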
CDATA is unfortunately an ambiguous thing to say here. There are "CDATA Sections", and "CDATA Attribute Type".
Your attribute value can be of type CDATA with the "CDATA Attribute Type".
Here is an xml that contains a "CDATA Section" (aka. CDSect):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<elemke>
<![CDATA[
foo
]]>
</elemke>
Here is an xml that contains a "CDATA Attribute Type" (as AttType):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE elemke [
<!ATTLIST brush wood CDATA #REQUIRED>
]>
<elemke>
<brush wood="guy&#xA;threep"/>
</elemke>
You cannot use a "CDATA Section" for an attribute value. This is wrong: <brush wood=<![CDATA[foo]]>/>
You can use the "CDATA Attribute Type" as your attribute's type. I think this is actually what happens in the usual case, so your attribute value is of type CDATA. For an element like <brush wood="guy&#xA;threep"/>, the raw bytes of the .xml file contain guy&#xA;threep, but when the file is processed, the attribute value in memory will be
guy
threep
Your problem may lie in 1) producing a right xml file and 2) configuring a "xml processor" to produce an output you want.
For example, if you write the raw .xml file by hand, you need to put these escapes inside the attribute value in the raw file, as I wrote <brush wood="guy&#xA;threep"/> here, instead of <brush wood="guy (newline) threep"/>.
Then the parser will actually give you a newline; I've tried this with a processor.
You can try it with a processor like Saxon or, for a poor man's experiment, with a browser: when I opened the XML in Firefox, it displayed the newline as a space, but copying the value into a text editor showed the newline. (With a better-suited processor you could probably save the output directly.)
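The difference between a literal newline and &#xA; inside an attribute follows from attribute-value normalization in the XML spec (quoted further down). A minimal Java sketch, with a hypothetical class name and the JDK's DOM parser standing in for Saxon or a browser, makes it visible: the literal newline comes back as a space, while the character reference comes back as a real newline.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class AttributeNewlines {

    /** Parses a single-element document and returns its wood attribute. */
    static String wood(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getAttribute("wood");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A literal newline in the attribute is normalized to a space ...
        System.out.println(wood("<brush wood=\"guy\nthreep\"/>").equals("guy threep"));    // prints true
        // ... while the character reference survives as a real newline.
        System.out.println(wood("<brush wood=\"guy&#xA;threep\"/>").equals("guy\nthreep")); // prints true
    }
}
```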
Now the "only" thing you need to do is make sure you handle this CDATA appropriately. For example, if you have an XSL stylesheet that produces HTML, you can use something like this .xsl for such an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="split">
<xsl:param name="list" select="''" />
<xsl:param name="separator" select="'&#xA;'" />
<xsl:if test="not($list = '' or $separator = '')">
<xsl:variable name="head" select="substring-before(concat($list, $separator), $separator)" />
<xsl:variable name="tail" select="substring-after($list, $separator)" />
<xsl:value-of select="$head"/>
<br/><xsl:text>&#xA;</xsl:text>
<xsl:call-template name="split">
<xsl:with-param name="list" select="$tail" />
<xsl:with-param name="separator" select="$separator" />
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="brush">
<html>
<xsl:call-template name="split">
<xsl:with-param name="list" select="@wood"/>
</xsl:call-template>
</html>
</xsl:template>
</xsl:stylesheet>
Run in a browser, or with a processor like Saxon Home Edition 9.5 (java -jar saxon9he.jar -s:eg2.xml -xsl:eg2.xsl -o:eg2.html), this would produce the following HTML-like output:
<html>guy<br>
threep<br>
</html>
which will look like this in a browser:
guy
threep
Here I am using a recursive template 'split' from Tomalak (thanks to Mads Hansen), because my target processor supports neither string-join nor tokenize, which are XPath 2.0 only.
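If Saxon or a browser is not at hand, the JDK's built-in XSLT 1.0 processor (javax.xml.transform) should be able to run the same recursive split. The sketch below embeds a copy of the stylesheet above; the class name is hypothetical, and the DOCTYPE is left out of the input because an attribute with no declaration is treated as CDATA anyway. Exact whitespace and serialization of the HTML output may differ between processors, so only the content is worth comparing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SplitTemplateDemo {

    static final String XML = "<elemke><brush wood=\"guy&#xA;threep\"/></elemke>";

    static final String XSL = String.join("\n",
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
            "  <xsl:template name=\"split\">",
            "    <xsl:param name=\"list\" select=\"''\"/>",
            "    <xsl:param name=\"separator\" select=\"'&#xA;'\"/>",
            "    <xsl:if test=\"not($list = '' or $separator = '')\">",
            "      <xsl:variable name=\"head\" select=\"substring-before(concat($list, $separator), $separator)\"/>",
            "      <xsl:variable name=\"tail\" select=\"substring-after($list, $separator)\"/>",
            "      <xsl:value-of select=\"$head\"/>",
            "      <br/><xsl:text>&#xA;</xsl:text>",
            "      <xsl:call-template name=\"split\">",
            "        <xsl:with-param name=\"list\" select=\"$tail\"/>",
            "        <xsl:with-param name=\"separator\" select=\"$separator\"/>",
            "      </xsl:call-template>",
            "    </xsl:if>",
            "  </xsl:template>",
            "  <xsl:template match=\"brush\">",
            "    <html>",
            "      <xsl:call-template name=\"split\">",
            "        <xsl:with-param name=\"list\" select=\"@wood\"/>",
            "      </xsl:call-template>",
            "    </html>",
            "  </xsl:template>",
            "</xsl:stylesheet>");

    /** Applies the stylesheet to the document and returns the serialized result. */
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```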
If an attribute is not a tokenized or enumerated type, it is processed as CDATA. The details for how the attribute is processed can be found in the Extensible Markup Language (XML) 1.0 (Fifth Edition).
3.3.1 Attribute Types
XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types are more constrained. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3.3 Attribute-Value Normalization.
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' [VC: ID]
[VC: One ID per Element Type]
[VC: ID Attribute Default]
| 'IDREF' [VC: IDREF]
| 'IDREFS' [VC: IDREF]
| 'ENTITY' [VC: Entity Name]
| 'ENTITIES' [VC: Entity Name]
| 'NMTOKEN' [VC: Name Token]
| 'NMTOKENS' [VC: Name Token]
...
3.3.3 Attribute-Value Normalization
Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.
1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
   - For a character reference, append the referenced character to the normalized value.
   - For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
   - For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
   - For another character, append the character to the normalized value.
If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.
All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA.
It is an error if an attribute value contains a reference to an entity for which no declaration has been read.
We can't use CDATA in an attribute, but we can embed HTML by escaping it with character entities.
Here is one example.
To achieve this: <span class="abc"></span>
use XML code like this:
<xmlNode attributeName="&lt;span class=&quot;abc&quot;&gt;Your Text&lt;/span&gt;"></xmlNode>
Yes, you can, if you encode the content within the XML tags.
That is, use &amp; &lt; &gt; &quot; &apos; so that it will not be seen as markup inside your markup.