Break up Long title attribute Value - html

I have an web page which has tooltip set as follows:
title="Tel: {%- recordFields.providerTel || 'N/A' %} Email: {%-recordFields.providerEmail || 'N/A' %}"
The line occupies 142 columns...
Is there a way to break up the title string in the source so that it can span multiple lines?
Something along these lines:
title="Tel: {%- recordFields.providerTel || 'N/A' %} \
Email: {%-recordFields.providerEmail || 'N/A' %}"

In a comment on the question I asked:
You only want it to span multiple lines in the source, right? The actual value shouldn't have newlines (e.g., when used)?
and you said:
Let's say that. In this particular case, I also want a newline in the output, but I'll remove it for clarity.
It's a really fundamental part of the question. :-)
If you do want the newlines, the answer is easy but (to my mind) unsatisfying: Just put them in, literally:
<div title="Tel: {%- recordFields.providerTel || 'N/A' %}
Email: {%-recordFields.providerEmail || 'N/A' %}">...</div>
Live example. Note that it's important not to have leading whitespace on the next line, because that whitespace is part of the attribute value. This is what makes it unsatisfying to me, because having that subsequent line start at column 0 in something that's otherwise indented seems unclean (and some tools will fight with you, trying to indent it).
If you don't want the newlines in the attribute's value, I'm not aware of a way to do it. According to the HTML specification, an attribute's value is "...Attribute values are a mixture of text and character references...", and if we follow that link for "text" it doesn't say anything about putting source-only linebreaks in the value.
Since you seem to be using some kind of templating engine, if it runs server-side then you could of course define a property on the values object to hold the title string:
title="{%- getTitleFor(recordFields) %}"
...but that moves the content out of your HTML source (where content generally belongs) into your server-side language source, so it's not a great alternative.

Related

Create an IF statement for a Variable to see if it contains a range of letters

{% if val|first in 'ABCDEFG' %}
This works but doing it this way seems a little clumsy to me. Especially with the number of if statements I will be creating.
I've been searching for a solution for hours but can't quite find what I need. I imagine there is an option to help clean up my script.
Is there a way to make this something like: { if val|first in 'A-G' %} ? Bonus points if you can help me make it case insensitive.
I hope this is the answer you are looking for.
{% if val|first|lower in 'a'-'g' %}
This will convert the first character of val to uppercase and check if it is within the range of A-G. You can make it case insensitive by converting it to either upper or lower.

set auto indent on newline between auto closed HTML tag in vim (<div>|</div>) [duplicate]

I most IDEs and modern text editors (Sublime Text 3) the cursor is correctly indented after inserting a newline in between an html tag (aka 'expanding" the tag):
Before:
<div>|</div>
After pressing CR:
<div>
|
</div>
But in Vim, this is what I get:
<div>
|</div>
How can I get the same behaviour in Vim like in most other editors (see above)?
The only correct behavior of <CR> in insert mode is to break the line at the cursor.
What you want is an enhanced behavior and you need to add something to your config to get it: a mapping, a short function or a full fledged plugin.
When I started to use vim, that behavior was actually one of the first things I added to my vimrc. I've changed it many times in the past but this mapping has been quite stable for a while:
inoremap <leader><CR> <CR><C-o>==<C-o>O
I've used <leader><CR> to keep the normal behavior of <CR>.
Here is a small function that seems to do what you want:
function! Expander()
let line = getline(".")
let col = col(".")
let first = line[col-2]
let second = line[col-1]
let third = line[col]
if first ==# ">"
if second ==# "<" && third ==# "/"
return "\<CR>\<C-o>==\<C-o>O"
else
return "\<CR>"
endif
else
return "\<CR>"
endif
endfunction
inoremap <expr> <CR> Expander()
This little snippet will remap Enter in insert mode to test whether or not the cursor is between > and < and act accordingly if it is. Depending on your indent settings the \<Tab> may need to be removed.
It will not play nice with other plugins that might be also be mapping the Enter key so be aware that there is probably more work to do if you want that compatibility.
function EnterOrIndentTag()
let line = getline(".")
let col = getpos(".")[2]
let before = line[col-2]
let after = line[col-1]
if before == ">" && after == "<"
return "\<Enter>\<C-o>O\<Tab>"
endif
return "\<Enter>"
endfunction
inoremap <expr> <Enter> EnterOrIndentTag()
I have only tested the simple cases (beginning of the line, end of the line, inside and outside of ><), there are probably edge cases that this would not catch.
#RandyMorris and #romainl have posted good solutions for your exact problem.
There are some other possibilities you might be interested in if you are typing out these tags yourself: there's the ragtag.vim plugin for HTML/XML editing.
With ragtag.vim you type this to create your "before" situation (in insert mode):
div<C-X><Space>
To create your "after" situation you would instead type:
div<C-X><Enter>
So if you know beforehand that you are going to "expand" the tag, typing just the element name and the combo CtrlX followed by Enter is enough.
There are also other more advanced plugins to save keystrokes when editing HTML, such as ZenCoding.vim and Sparkup.
Since no one have mentioned it I will. There is excellent plugin that does exactly that
delemitmate

preventing xss hole in django

In the Django docs it says:
Django templates escape specific characters which are particularly
dangerous to HTML. While this protects users from most malicious
input, it is not entirely foolproof. For example, it will not protect
the following:
<style class={{ var }}>...</style>
If var is set to 'class1 onmouseover=javascript:func()', this can
result in unauthorized JavaScript execution, depending on how the
browser renders imperfect HTML.
How can I prevent this?
I'm not especially familiar with Django, but it looks to me like the error they intended to point out is that there are no quotes around the attribute value, meaning that the space in the example value causes the rest of the string (onmouseover=...) to be interpreted as a separate attribute. Instead, you should put quotes like so:
<style class="{{ var }}">...</style>
If I understand correctly, this would be safe since all the characters that could interfere with the quoting are escaped. You might want to verify that interpretation; for example, write <span title="{{ var }}">foo</span>, run the template with foo set to <>"'&, and then make sure that they're properly escaped in the HTML and that the title appears in the browser with the original characters.
One thing you can do is not allow variable classes. You can use something like
<style class={% if class_foo %}foo{% elif class_bar %}bar{% else %}baz{% endif %}>...</style>
There are also filters available to prevent xss elsewhere: https://docs.djangoproject.com/en/dev/ref/templates/builtins/#std:templatefilter-escape

Newlines not being interpreted when getting from sqlite db?

I'm using Flask and Sqlite.
I take some string, which contains newlines, and store it in the db. At some later point I get it from the db and include it on some page, and the string shows up without newlines. What's with that?
For example if I have
{{ entry.content }}
in my template, and the entry that was stored had content "hello\nhello", it displays "hellohello" on the page.
However if I have
{{ entry.content.replace('\r\n','<br />') }}
or
{{ entry.content.replace('\r\n','
') }}
in my template, it will display "hellohello" or "hello
hello" on the page.
So my impression is that the newline characters just aren't being interpreted and displayed by the browser. What am I doing wrong?
Try {{ entry.content|safe }} so Flask/Jinja doesn't escape your HTML.
(Be careful, though, as any user entered content, including script tags, will be output as-is. If you really want to be cautious and only allow tags you might want to do write your own scrubber: Jinja2 escape all HTML but img, b, etc)

Regex: Extracting readable (non-code) text and URLs from HTML documents

I am creating an application that will take a URL as input, retrieve the page's html content off the web and extract everything that isn't contained in a tag. In other words, the textual content of the page, as seen by the visitor to that page. That includes 'masking' out everything encapsuled in <script></script>, <style></style> and <!-- -->, since these portions contain text that is not enveloped within a tag (but is best left alone).
I have constructed this regex:
(?:<(?P<tag>script|style)[\s\S]*?</(?P=tag)>)|(?:<!--[\s\S]*?-->)|(?:<[\s\S]*?>)
It correctly selects all the content that i want to ignore, and only leaves the page's text contents. However, that means that what I want to extract won't show up in the match collection (I am using VB.Net in Visual Studio 2010).
Is there a way to "invert" the matching of a whole document like this, so that I'd get matches on all the text strings that are left out by the matching in the above regex?
So far, what I did was to add another alternative at the end, that selects "any sequence that doesn't contain < or >", which then means the leftover text. I named that last bit in a capture group, and when I iterate over the matches, I check for the presence of text in the "text" group. This works, but I was wondering if it was possible to do it all through regex and just end up with matches on the plain text.
This is supposed to work generically, without knowing any specific tags in the html. It's supposed to extract all text. Additionally, I need to preserve the original html so the page retains all its links and scripts - i only need to be able to extract the text so that I can perform searches and replacements within it, without fear of "renaming" any tags, attributes or script variables etc (so I can't just do a "replace with nothing" on all the matches I get, because even though I am then left with what I need, it's a hassle to reinsert that back into the correct places of the fully functional document).
I want to know if this is at all possible using regex (and I know about HTML Agility Pack and XPath, but don't feel like).
Any suggestions?
Update:
Here is the (regex-based) solution I ended up with: http://www.martinwardener.com/regex/, implemented in a demo web application that will show both the active regex strings along with a test engine which lets you run the parsing on any online html page, giving you parse times and extracted results (for link, url and text portions individually - as well as views where all the regex matches are highlighted in place in the complete HTML document).
what I did was to add another alternative at the end, that selects "any sequence that doesn't contain < or >", which then means the leftover text. I named that last bit in a capture group, and when I iterate over the matches, I check for the presence of text in the "text" group.
That's what one would normally do. Or even simpler, replace every match of the markup pattern with and empty string and what you've got left is the stuff you're looking for.
It kind of works, but there seems to be a string here and there that gets picked up that shouldn't be.
Well yeah, that's because your expression—and regex in general—is inadequate to parse even valid HTML, let alone the horrors that are out there on the real web. First tip to look at, if you really want to chase this futile approach: attribute values (as well as text content in general) may contain an unescaped > character.
I would like to once again suggest the benefits of HTML Agility Pack.
ETA: since you seem to want it, here's some examples of markup that looks like it'll trip up your expression.
<a href=link></a> - unquoted
<a href= link></a> - unquoted, space at front matched but then required at back
- very common URL char missing in group
- more URL chars missing in group
<a href=lïnk></a> - IRI
<a href
="link"> - newline (or tab)
<div style="background-image: url(link);"> - unquoted
<div style="background-image: url( 'link' );"> - spaced
<div style="background-image: url('link');"> - html escape
<div style="background-image: ur\l('link');"> - css escape
<div style="background-image: url('link\')link');"> - css escape
<div style="background-image: url(\
'link')"> - CSS folding
<div style="background-image: url
('link')"> - newline (or tab)
and that's just completely valid markup that won't match the right link, not any of the possible invalid markup, markup that shouldn't but does match a link, or any of the many problems with your other technique of splitting markup from text. This is the tip of the iceberg.
Regex is not reliable for retrieving textual contents of HTML documents. Regex cannot handle nested tags. Supposing a document doesn't contain any nested tag, regex still requires every tags are properly closed.
If you are using PHP, for simplicity, I strongly recommend you to use DOM (Document Object Model) to parse/extract HTML documents. DOM library usually exists in every programming language.
If you're looking to extract parts of a string not matched by a regex, you could simply replace the parts that are matched with an empty string for the same effect.
Note that the only reason this might work is because the tags you're interested in removing, <script> and <style> tags, cannot be nested.
However, it's not uncommon for one <script> tag to contain code to programmatically append another <script> tag, in which case your regex will fail. It will also fail in the case where any tag isn't properly closed.
You cannot parse HTML with regular expressions.
Parsing HTML with regular expressions leads to sadness.
I know you're just doing it for fun, but there are so many packages out there than actually do the parsing the right way, AND do it reliably, AND have been tested.
Don't go reinventing the wheel, and doing it a way that is all but guaranteed to frustrate you down the road.
OK, so here's how I'm doing it:
Using my original regex (with the added search pattern for the plain text, which happens to be any text that's left over after the tag searches are done):
(?:(?:<(?P<tag>script|style)[\s\S]*?</(?P=tag)>)|(?:<!--[\s\S]*?-->)|(?:<[\s\S]*?>))|(?P<text>[^<>]*)
Then in VB.Net:
Dim regexText As New Regex("(?:(?:<(?<tag>script|style)[\s\S]*?</\k<tag>>)|(?:<!--[\s\S]*?-->)|(?:<[\s\S]*?>))|(?<text>[^<>]*)", RegexOptions.IgnoreCase)
Dim source As String = File.ReadAllText("html.txt")
Dim evaluator As New MatchEvaluator(AddressOf MatchEvalFunction)
Dim newHtml As String = regexText.Replace(source, evaluator)
The actual replacing of text happens here:
Private Function MatchEvalFunction(ByVal match As Match) As String
Dim plainText As String = match.Groups("text").Value
If plainText IsNot Nothing AndAlso plainText <> "" Then
MatchEvalFunction = match.Value.Replace(plainText, plainText.Replace("Original word", "Replacement word"))
Else
MatchEvalFunction = match.Value
End If
End Function
Voila. newHtml now contains an exact copy of the original, except every occurrence of "Original word" in the page (as it's presented in a browser) is switched with "Replacement word", and all html and script code is preserved untouched. Of course, one could / would put in a more elaborate replacement routine, but this shows the basic principle. This is 12 lines of code, including function declaration and loading of html code etc. I'd be very interested in seeing a parallel solution, done in DOM etc for comparison (yes, I know this approach can be thrown off balance by certain occurrences of some nested tags quirks - in SCRIPT rewriting - but the damage from that will still be very limited, if any (see some of the comments above), and in general this will do the job pretty darn well).
For Your Information,
Instead of Regex, With JQuery , Its possible to extract text alone from a html markup. For that you can use the following pattern.
$("<div/>").html("#elementId").text()
You can refer this JSFIDDLE