Surrounding text with tag and populating tag - html

I have several lines of text, in them there is a word or words that are capitalized like this:
Hello HOW ARE YOU good to see you
I am FINE
Is there a tool that can go through the text and surround all those capitalized words with an HTML anchor tag?
and
I guess more difficult: can it also populate the href with a lowercased, space(s) removed version of that capitalized text?
Any help on one or both questions is appreciated.

It took me a while, but here it is in javascript: http://jsfiddle.net/RdJ4E/4/
I'm sure you will find a way to tune the code. Good luck!

Is this a beginning? Matching all uppercased words is trivial with regex, and if you pass the String.replace method a callback function instead of a string, you can do whatever you want with the matched string.
myString.replace(/(\b[A-Z\s]+\b)/g, function(result, match){
// Lowercase the matched text and URI-encode it for use in the href
var stripped = encodeURI(result.trim().toLowerCase());
return ' <a href="'+stripped+'">'+result.trim()+'</a> ';
});
http://jsfiddle.net/mwxnC/2/
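To make the callback idea concrete, here is a minimal sketch that also removes the spaces for the href, as the question asks. The function name linkUppercaseRuns is my own, and I'm assuming that a "capitalized word" means two or more capital letters in a row, so that a lone "I" or the "H" in "Hello" is left alone.

```javascript
// Wrap runs of ALL-CAPS words in an anchor whose href is the
// lowercased run with whitespace removed.
// Assumption: a run is one or more words of at least two capitals each.
function linkUppercaseRuns(text) {
  const pattern = /\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b/g;
  return text.replace(pattern, (match) => {
    const href = match.toLowerCase().replace(/\s+/g, '');
    return `<a href="${href}">${match}</a>`;
  });
}

console.log(linkUppercaseRuns('Hello HOW ARE YOU good to see you'));
console.log(linkUppercaseRuns('I am FINE'));
```

If single capital letters should be linked too, the {2,} quantifiers would need to be relaxed, at the cost of matching words like "I".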

Related

How to get Emmet recognize text replacement or aliases in class names?

For example, I want to get ".row>.small-9-centered.small-3-centered" by writing ".row>div.s9c.s3c". I am trying to tell Emmet to replace "s9c" with "small-9-centered".
Any suggestions? If it is not possible, do you know any way to script a plugin that does this by detecting the text after a "." (dot) and before another dot or ">"?
I was looking for a text replacement, or something like a variable replacement in Sublime, but I didn't find anything. Even if I had, it might not be recognized by the Emmet parser. That is why I was looking for something like snippet abbreviations but only for text, since Emmet snippets always assume the abbreviation is a tag and add closing tags.
Thank you in advance,
You can do it with this line of JavaScript: str.replace("s9c", "small-9-centered");. It looks for a string and replaces it with another. Note that with a string as the first argument, only the first occurrence is replaced. See fiddle: http://jsfiddle.net/JchE5/448/
javascript:
function myFunction() {
var str = document.getElementById("change").innerHTML;
var res = str.replace("s9c", "small-9-centered");
document.getElementById("change").innerHTML = res;
}
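If there are several aliases, the same idea can be driven from a lookup table. This is only a sketch of plain string preprocessing, not an Emmet API: the aliases table and the expandAliases name are hypothetical, and the expanded string would still have to be handed to Emmet afterwards.

```javascript
// Hypothetical alias table: short class name -> full class name.
const aliases = {
  s9c: 'small-9-centered',
  s3c: 'small-3-centered',
};

// Expand every ".name" whose name is in the table. The name ends at the
// next dot, ">" or any other character outside [A-Za-z0-9_-].
function expandAliases(abbreviation, table) {
  return abbreviation.replace(/\.([A-Za-z0-9_-]+)/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(table, name) ? '.' + table[name] : whole
  );
}

console.log(expandAliases('.row>div.s9c.s3c', aliases));
```

Class names that are not in the table, like "row" here, are passed through unchanged.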

Remove first line from HTML Markup Field using RegEx

I have a single text field that contains HTML markup. The system that generates this field content always seems to generate a first line with a non-visible carriage return value in it and I can't seem to prevent it from doing so.
Does anyone know of a way (perhaps using a Regular Expression), to remove that first line from this text field?
I'd prefer to leave all other instances of the carriage return values in the field as is, so if it's a RegEx statement that will just remove the first line of a text field, that would work for me.
Any suggestions most welcomed.
Cheers,
Wayne
The trim method, which in most programming languages removes leading and trailing whitespace (including carriage returns), is usually used for this. You did not state which language you will be doing this in...
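Since no language was given, here is a sketch in JavaScript of the regex variant. It assumes the unwanted first line holds nothing but optional spaces or tabs followed by a line break. The function name removeLeadingBlankLine is made up for the example.

```javascript
// Remove only the first line, and only if it is blank: optional spaces
// or tabs followed by CRLF, a bare CR, or LF. Later line breaks are kept.
function removeLeadingBlankLine(text) {
  return text.replace(/^[ \t]*(?:\r\n|\r|\n)/, '');
}

const field = '\r\n<p>First paragraph</p>\r\n<p>Second paragraph</p>';
console.log(JSON.stringify(removeLeadingBlankLine(field)));
```

Because the pattern is anchored with ^ and has no g flag, it can match at most once, at the very start of the field.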

How do I put two spaces after every period in our HTML?

I need there to be two spaces after every period in every sentence in our entire site (don't ask).
One way to do it is to embark on manually adding a &nbsp; after every single period. This will take several hours.
We can't just find and replace every period, because we have concatenations in PHP and other cases where there is a period and then a space, but it's not in a sentence.
Is there a way to do this...and have everything still work in Internet Explorer 6?
[edit] - The tricky part is that in the code, there are lines of PHP that include dots with spaces around them like this:
<?php echo site_url('/css/' . $some_name .'.css');?>
I definitely don't want extra spaces to break lines like that, so I would be happy adding two visible spaces after each period in all P tags.
As we all know, HTML collapses white space, but it only does this for display. The extra spaces are still there. So if the source material was created with two spaces after each period, then some of the substitution methods being suggested can be made to work reliably: search for "period-space-space" and replace it with something more suitable, like period-space-&emsp14;. Please note that you shouldn't use &nbsp; because it can prevent proper wrapping at margins. (If you're using ragged right, the margin change won't be noticeable as long as you use the nbsp BEFORE the space.)
You can also wrap each sentence in a span and use the :after selector to add a space and format it to be wide with "word-spacing". Or you can wrap the space between sentences itself in a span and style that directly.
I've written a javascript solution for blogger that does this on the fly, looks for period-space-space, and replaces it with a spanned, styled space that appears wider.
If however your original material doesn't include this sort of thing then you'll have to study up on sentence boundary detection algorithms (which are not so simple), and then modify one to also not trip over PHP code.
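For the case where the source does contain two spaces after each period, the "period-space-space" substitution can be sketched like this. The function name and the sentence-space class are placeholders of mine, and the width of that class would still have to be set in CSS.

```javascript
// Replace "period, space, space" with a period followed by a span
// wrapping a single space, so the gap can be styled wider in CSS.
// Assumption: the class name "sentence-space" is hypothetical.
function widenSentenceSpaces(html) {
  return html.replace(/\. {2}/g, '.<span class="sentence-space"> </span>');
}

console.log(widenSentenceSpaces('One.  Two. Three.  Four.'));
```

A period followed by a single space, as in PHP concatenations, is left untouched because the pattern requires exactly the two-space form.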
You might be able to use the JavaScript split method or regex depending on the scope of the text.
Here's the split method:
var el = document.getElementById("mydiv");
if (el){
el.innerText = el.innerText.split(".").join(".\xA0 ");
}
Test case:
Hello world.Insert spaces after the period.Using the split method.
Result:
Hello world. Insert spaces after the period. Using the split method.
Have you thought about using an output buffer? ob_start($callback)
Not tested, but if you stick this before any output (or better yet, offload the function):
<?php
function processDots($buffer)
{
return (str_replace(".", ".&nbsp; ", $buffer));
}
ob_start("processDots");
?>
and add this to the end of the output:
<?php ob_end_flush(); ?>
Might just work :)
If you're not opposed to a "post processing"/"javascript" solution:
var nodes = $('*').contents().map(function(a, b) {
return (b.nodeType === Node.TEXT_NODE ? b : null);
});
$.each(nodes, function(i,node){
node.data = node.data.replace(/(\.\s)/g, '.\u00A0\u00A0');
});
Using jQuery for the sake of brevity, but not required.
p.s. I saw your comment that not every period followed by a space should be treated the same, but this is about as good as it gets. Otherwise, you're going to need a much more bullet-proof approach.
Incorporate something like this into your PHP file, where $text is the string holding the paragraph text:
<?php $text = preg_replace('/\. {1,2}(?=[A-Z])/', '.&nbsp; ', $text); ?>
This searches for the beginning of each new sentence, as in .spacespaceA-Z or .spaceA-Z, and replaces the period and spaces with a period, a non-breaking space and a normal space. [note: Capital letter is not replaced]
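The same "period, one or two spaces, capital letter" rule can be tried out in JavaScript before touching the site. This is a sketch only: the function name is mine, and it treats any period followed by a capital as a sentence boundary, which abbreviations like "Mr. Smith" will fool.

```javascript
// Replace a period followed by one or two spaces and a capital letter
// with: period, non-breaking space, normal space. The lookahead keeps
// the capital letter out of the match, so it is not replaced.
function spaceSentences(text) {
  return text.replace(/\. {1,2}(?=[A-Z])/g, '.\u00A0 ');
}

console.log(JSON.stringify(spaceSentences('See you. Bye.  Now go.')));
console.log(spaceSentences("site_url('/css/' . $some_name .'.css')"));
```

The PHP concatenation from the question is not changed, because the character after its period and space is "$" rather than a capital letter.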

regex: selecting everything but img tag

I'm trying to select some text using regular expressions leaving all img tags intact.
I've found the following code that selects all img tags:
/<img[^>]+>/g
but actually having a text like:
This is an untagged text.
<p>this is my paragraph text</p>
<img src="http://example.com/image.png" alt=""/>
this is a link
using the code above will select the img tag only
/<img[^>]+>/g #--> using this code will result in:
<img src="http://example.com/image.png" alt=""/>
but I would like to use some regex that select everything but the image like:
/magical regex/g # --> results in:
This is an untagged text.
<p>this is my paragraph text</p>
this is a link
I've also found this code:
/<(?!img)[^>]+>/g
which selects all tags except the img one. but in some cases I will have untagged text or text between tags so this won't work for my case. :(
is there any way to do it?
Sorry but I'm really new to regular expressions so I'm really struggling for few days trying to make it work but I can't.
Thanks in advance
UPDATE:
Ok so for the ones thinking I would like to parse it, sorry I don't want it, I just want to select text.
Another thing, I'm not using any language in specific, I'm using Yahoo Pipes, which only provides regex and some string tools to accomplish the job, but it doesn't involve any programming code.
for better understanding here is the way regex module works in yahoo pipes:
http://pipes.yahoo.com/pipes/docs?doc=operators#Regex
UPDATE 2
Fortunately I've been able to strip the text near the img tag, but on a step-by-step basis as #Blixt recommended, like:
<(?!img)[^>]+> , replace with "" #-> strips out every tag that is not img
(?s)^[^<]*(.*), replace with $1 #-> removes all the text before the img tag
(?s)^([^>]+>).*, replace with $1 #-> removed all the text after the img tag
the problem with this is that it will only catch the first img tag, and then I would have to catch the others manually by hard-coding it, so I'm still not sure if this is the best solution.
The regexp you have to find the image tags can be used with a replace to get what you want.
Assuming you are using PHP:
$htmlWithoutIMG = preg_replace('/<img[^>]+>/', '', $html);
If you are using Javascript:
var htmlWithoutIMG = html.replace(/<img[^>]+>/g, '');
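This takes your text, finds the <img> tags and replaces them with nothing, i.e. it deletes them from the text, leaving what you want. The < and > characters do not need escaping. Note that PHP's preg_replace replaces all matches by default and does not accept a g modifier.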
Regular expression matches have a single start and length. This means the result you want is impossible in a single match (since you want the result to end at one point, then continue later).
The closest you can get is to use a regular expression that matches everything from start of string up to start of <img> tag, everything between <img> tags and everything from end of <img> tag to end of string. Then you could get all matches from that regular expression (in your example, there would be two matches).
The above answer is assuming you can't modify the result. If you can modify the result, simply replace the <img> tags with the empty string to get your result.
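Where a language is available (this will not help inside Yahoo Pipes itself), the "several matches" idea can be sketched by splitting on the <img> tags instead of matching around them. The function name textOutsideImages is hypothetical.

```javascript
// Split the markup at every <img ...> tag and keep the non-empty
// pieces, i.e. everything before, between and after the images.
function textOutsideImages(html) {
  return html.split(/<img\b[^>]*>/i).filter((piece) => piece.trim() !== '');
}

const sample = [
  'This is an untagged text.',
  '<p>this is my paragraph text</p>',
  '<img src="http://example.com/image.png" alt=""/>',
  'this is a link',
].join('\n');

console.log(textOutsideImages(sample));
```

For the sample from the question this yields two pieces, one on each side of the image, which mirrors the two matches described above.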

Invisible Delimiter for Strings in HTML

I need a way to identify certain strings in HTML markup. I know what the strings are, but it is possible that they could be substrings of other strings in the document. To find them, I output a special delimiter character (currently using \032). On page load, we go through the HTML and record the location of the strings, and remove the delimiter.
Unfortunately, most browsers show the delimiter character until we can find and remove them all. I'd like to avoid that if possible. Is there a character or string that will be preserved in the HTML content (so a comment won't work) but won't be visible to the user? It also needs to be something that is fairly unlikely to appear next to a string, so something like &nbsp; wouldn't work either.
EDIT: Sorry, I forgot to mention that the strings will be in attributes, so any sort of tag won't work.
&zwnj; - zero-width non-joiner (see http://htmlhelp.org/reference/html40/entities/special.html)
On the off chance that this already appears in your text, double it up (eg: &zwnj;&zwnj;mytext&zwnj;&zwnj;)
Edit in response to comment: works in Firefox 3. Note that you have to search for the Unicode value of the entity.
<html>
<body>
<div id="test">
This is a &zwnj;test
</div>
<script type="application/javascript">
var myDiv = document.getElementById("test");
var content = myDiv.innerHTML;
var pos = content.indexOf("\u200C");
alert(pos);
</script>
</body>
</html>
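Going one step further than indexOf, the record-and-remove pass from the question could look roughly like this. It is a sketch that assumes each marked string is wrapped in exactly one zero-width non-joiner on each side. The stripDelimiters name and the shape of the returned object are my own.

```javascript
// Find every string wrapped in U+200C delimiters, record where it ends
// up in the cleaned text, and remove the delimiters.
function stripDelimiters(text) {
  const found = [];
  const pattern = /\u200C([^\u200C]*)\u200C/g;
  const clean = text.replace(pattern, (whole, inner, offset) => {
    // Every earlier match removed two delimiter characters, so shift
    // the original offset back by that amount.
    found.push({ value: inner, index: offset - 2 * found.length });
    return inner;
  });
  return { clean, found };
}

const result = stripDelimiters('a \u200Cfoo\u200C b \u200Cbar\u200C');
console.log(result.clean);
console.log(result.found);
```

The recorded index values refer to positions in the cleaned string, so they stay valid after the delimiters are gone.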
You could insert them into <span> elements. This will work only for in-page text (not attributes, or the like).
Otherwise, you could insert a whitespace character that your program doesn't already output as part of the HTML, like a tab character (\x09), a vertical tab (\x0b), a bare carriage return (\x0d) — without a newline beside it, ala Windows text encoding — or, just a null byte (\x00).
What I would insert, since it is not visible in the browser, is an empty pair of tags with a special id, like <span id="delimiter" class="Delimiter"></span>. This will not show up in the rendered content, but it will still be present in the document. You don't need to remove them.
You could use left-to-right (LTR) marks. Is this for some sort of XSS testing? If so, this might be of interest: Taint support for PHP