Can I replace placeholder text in a rendered HTML page dynamically? - html

I wish I could think of a better way to word my question, but basically here is what I want to do: in an HTML file, I would like to fill the body with a specific string multiple times. For example:
<div>
This is some content. XXX
</div>
<div>
This is some more content. XXX
</div>
<div>
This is even more content. XXX
</div>
Then, I would like some script to go through the page, and replace every instance of the string (in this case XXX but it could be anything) with an incrementing number, so, like:
<div>
This is some content. 001
</div>
<div>
This is some more content. 002
</div>
<div>
This is even more content. 003
</div>
This is a simple example of course, and you might be thinking well that's dumb, just type the numbers. But obviously this is simpler than what I'm intending to do, and right now what I'm building, the order of all the content has not been decided yet, so things could move up or down in their placement on the page, but I'd like all the numbers to be sequential in order of their appearance on the page.
So, final thoughts: I am super sure there's a way better way to do this than I'm even thinking of, methodology wise (i.e., make an XML table or something). I am definitely open to ANY suggestion on how to do this, but I am kind of an idiot so if your answer is "pff this would be super easy in Ruby just use Ruby", that's not gonna really get me where I need to be. Also if this has already been answered, it was hard to think of how to word the question to search for previous answers so I apologize in advance if I didn't find the pre-existing answer when I was searching.

You can easily do this with CSS counters, sample here:
CSS
ul {
counter-reset:list;
}
li:after {
counter-increment:list;
content: " (" counter(list) ")";
}
For some more advanced examples visit the MDN documentation page.

You could use PHP to achieve this. If you've had no experience with it, it integrates with HTML easily. Basically you write your HTML as usual, but you name the file .php instead of .html. Then you insert PHP snippets where you need them, for example: <p>I can count to <?php nextNumber(); ?></p>.
At the top of the page, insert another script block that defines the counter function:
<?php
$i = 1;
$places = 4;
function nextNumber() {
GLOBAL $i, $places;
print str_pad($i++,$places,'0',STR_PAD_LEFT);
}
?>
This may be better than CSS: the numbers are generated on the server, so it's not browser-dependent.
Change $places to the number of digits you'd like to have (for leading zeros).
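If you would rather keep the literal XXX placeholders from the question and replace them in order of appearance, a small JavaScript function can do it. This is only a sketch: the function name numberPlaceholders and its parameters are made up for illustration, and it works on an HTML string rather than on the live DOM.

```javascript
// Replace every occurrence of a placeholder with a zero-padded running number.
// numberPlaceholders, placeholder and width are illustrative names, not an existing API.
function numberPlaceholders(html, placeholder = 'XXX', width = 3) {
  // Escape regex metacharacters so any placeholder string is matched literally.
  const escaped = placeholder.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  let counter = 0;
  // The callback runs once per match, left to right, so numbers follow page order.
  return html.replace(new RegExp(escaped, 'g'), () =>
    String(++counter).padStart(width, '0')
  );
}

const page = '<div>This is some content. XXX</div><div>This is some more content. XXX</div>';
console.log(numberPlaceholders(page));
```

In a browser you could feed it document.body.innerHTML and assign the result back, with the caveat that doing so rebuilds the DOM nodes and drops any event listeners attached to them.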

Related

Regex that extracts a little text from a huge HTML page and ignore tags and the rest

I want to write a regex that extracts only the text content from HTML and removes everything else. I have a huge web page with a lot of data, but I will show only an excerpt:
<div class="flex-row column"><div class="max-360 a-small"> <img class='width-50' src="irobot.io"><h3 class="pad-30">Do you think like a robot?</h3><p> This is not the problem, the problem is about the human failure.</p></div><div class="max-500> <img class='width-50' src="irobot.io"><h3 class="pad-30">
"
I need to get back only:
Do you think like a robot? This is not the problem, the problem is about the human failure.
Can someone help me? I tried something like:
Regex: Do you.*[^<]\b<
But I have never worked with regex before.
Thank you!
Edit: you mentioned that you need the solution to work in Python or Java. All my suggestions stay the same; you should use an XML/HTML parser instead of a regular expression. Python has Lib/xml, and there are multiple options in Java.
If you must use a regular expression, the syntax is the same for Java. In Python, you can use a pattern like (?:<([^> ]*)[^>]*>)(.*)(?:<\/?\1[^>]*>) with all the same restrictions I mentioned for JavaScript/Java. Try it out!
You can try to parse the text contents from a tag using regular expressions, but it's not recommended in most cases. If you must use a regular expression, you can try something like (?:<(?<tag>[^> ]*)[^>]*>)(?<text>.*)(?:<\/?\k<tag>[^>]*>) on a single tag at a time.
Try it out!
const pattern = /(?:<(?<tag>[^> ]*)[^>]*>)(?<text>.*)(?:<\/?\k<tag>[^>]*>)/;
const matches = pattern.exec(document.body.innerHTML);
console.log('The whole tag: ', matches[0]);
console.log('The tag type: ', matches[1]);
console.log('The text content: ', matches[2]);
<h3 class="pad-30">Do you think like a robot?</h3>
This will not work over multiple nested tags. There are better options available if you want to parse the whole tree in one action. For instance:
The browser's document tree parser already provides extensive navigation options including the ability to return the result of concatenating all visible textNode contents given a starting node.
You can use the innerText or textContent properties like so (open the full screen version):
console.log('Option 1:');
console.log(document.querySelector('body').innerText);
console.log('Option 2:');
console.log(document.querySelector('body').textContent);
code {
background-color: lightgray;
border-radius: 0.5em;
padding: 0.25em 0.5em;
}
<p>Do you <em>need</em> to use a <code>regular expression</code> for this task or would one of the following be suitable?</p>
<ol>
<li>Use the innerText property of the <code>body</code> node like this: <code>document.querySelector("body").innerText;</code></li>
<li>Use the textContent property to also get the contents of <code>script</code> tags in the body like this: <code>document.querySelector('body').textContent;</code></li>
</ol>
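If no DOM is available (for example when the page source is just a string in a script), a rough fallback is to strip everything that looks like a tag and collapse the remaining whitespace. The sketch below is an assumption-laden shortcut, not a parser: the name stripTags is made up, and it will misbehave on script or style contents, comments, and a literal ">" inside attribute values.

```javascript
// Rough tag stripper for a trusted HTML fragment held in a string.
// stripTags is an illustrative name; prefer a real HTML parser when one is available.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, ' ') // replace each tag with a space so adjacent words stay separate
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
}

const fragment =
  '<div class="max-360 a-small"> <img class="width-50" src="irobot.io">' +
  '<h3 class="pad-30">Do you think like a robot?</h3>' +
  '<p> This is not the problem, the problem is about the human failure.</p></div>';
console.log(stripTags(fragment));
```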

RegEx to substitute tag names, leaving the content and attributes intact

I would like to replace opening and closing tag, leaving the content of tags and its attribute intact.
Here is what I have:
<div class="QText">Text to be kept</div>
to be replaced with
<span class="QText">Text to be kept</span>
I tried this expression which finds all expressions I want but there seems to be no way to replace found expressions.
<div class="QText">(.*?)</div>
Thanks in advance.
I think @AmitJoki's answer will work well enough in certain circumstances, but if you only want to replace div elements when they have an attribute or a specific set of attributes, then you would want to use a regex replacement with backreferences. How you specify and refer to a backreference, unfortunately, depends upon your chosen editor. Visual Studio has the most unusual and annoying "flavor" of regex I know of, while Dreamweaver has a fairly typical implementation (both, as well as, I imagine, whatever editor you're using, do regex replacement; you just have to know the menu item or keystroke to bring up the dialog).
If memory serves, Dreamweaver has replacement options when you hit Ctrl+F, while in Visual Studio you have to hit Ctrl+H, so try those.
Once you get a "Find" and "Replace" box, you would put something like what you have in your last example above: <div class="QText">(.*?)</div> or perhaps <div class="(QText|RText|SText)">(.*?)</div> into your "Find" box, then put something like <span class="QText">\1</span> or <span class="\1">\2</span> in the "Replacement" box. A few utilities might use $1 to refer to a backreference rather than \1, but you'll have to look up the help or experiment to be sure.
If you are using a language to run this expression, you need to tell us which language.
If you are using a specific editor to run this expression, you need to tell us which editor.
...and never forget the prevailing wisdom on regex and HTML
Just replace div.
var s="<div class='QText'>Text to be kept</div>";
alert(s.replace(/div/g,"span"));
Demo: http://jsfiddle.net/9sgvP/
Mark it as answer if it helps ;)
Posted as requested
If it's going to be literal like that, capture what's to be kept, then replace the rest:
Find: <div( class="QText">.*?</)div>
Replace: <span$1span>
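The same find/replace pair can be tried in JavaScript, where $1 is the backreference syntax in the replacement string. This is a minimal sketch with a made-up function name (divToSpan); it assumes the QText divs contain no nested div elements and no line breaks.

```javascript
// Swap <div class="QText">...</div> for <span class="QText">...</span>,
// keeping the attribute and the content via capture group 1.
// divToSpan is an illustrative name.
function divToSpan(html) {
  return html.replace(/<div( class="QText">.*?<\/)div>/g, '<span$1span>');
}

console.log(divToSpan('<div class="QText">Text to be kept</div>'));
console.log(divToSpan('<div class="Other">left alone</div>'));
```

Unlike a blanket replace of "div", this leaves other div elements (and any text that happens to contain the word) untouched.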

Good way to store formatted text in DB to output later

I write news for my website and format it like this:
[h1]News[/h1]
[red]Happy New Year[/red]
[white]Happy New Year[/white]
The news is stored as-is in the MySQL DB.
Then when it's called by my website, a function converts every code into HTML format.
[h1][/h1] = <h1></h1>
[red][/red] = <font color=red></font>
I haven't been happy with this method for a long time, and now such tags are obsolete in HTML5.
Instead of using <font>, I should move the styling to CSS.
I'm very beginner with PHP, MySQL, CSS, HTML...really, but I'm trying and learning.
So, what I need is the best solution for this matter.
I was thinking to create a CSS rule like:
span.news-red { color: red; }
span.news-white { color: white; }
And then use them in the code for red text, etc.
Is this an effective solution or just a palliative?
Thank you.
EDIT
I have these two functions to convert the format of my text so that it can be output for the visitor.
1st = Converts [white-text][/white-text] into <font color=white></font>:
$string = preg_replace("/\[white-text\](\S+?)\[\/white-text\]/si","<font color=white>\\1</font>", $string);
2nd = Converts [url][/url] into a link:
$string = preg_replace("/\[url\](\S+?)\[\/url\]/si","\\1", $string);
Problems:
WHITE-TEXT - It only changes the color of one word phrases.
URL - It works fine, but I would like to be able to write anything in the readable part of the URL.
In general, you want text styles that are shared across the site, named for why you are applying them rather than for how they look. If I were you, I would store names in the DB that describe what the text is. Then, if you decide that red is just a horrible choice of color, you can change it very easily just by editing the CSS.
Not knowing why you chose to make something red, I can't give you a more specific answer, other than to use a CSS class name that relates to why you chose red rather than to the color itself.
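On the one-word problem from the question's edit: \S+? matches only non-whitespace characters, so the pattern stops working as soon as the tagged phrase contains a space. Using [\s\S]+? (any character, non-greedy) lifts that restriction. The sketch below shows the idea in JavaScript rather than PHP; the function name bbcodeToHtml and the news-* class names are assumptions taken from the question's proposal, and the input is not HTML-escaped, which real user-supplied text would need.

```javascript
// Convert a few custom [tag]...[/tag] codes to HTML that is styled through CSS classes.
// bbcodeToHtml and COLOR_TAGS are illustrative names.
const COLOR_TAGS = ['red', 'white'];

function bbcodeToHtml(text) {
  // [\s\S]+? matches across spaces and line breaks, unlike \S+?.
  let html = text.replace(/\[h1\]([\s\S]+?)\[\/h1\]/gi, '<h1>$1</h1>');
  for (const color of COLOR_TAGS) {
    const pattern = new RegExp(`\\[${color}\\]([\\s\\S]+?)\\[\\/${color}\\]`, 'gi');
    html = html.replace(pattern, `<span class="news-${color}">$1</span>`);
  }
  return html;
}

console.log(bbcodeToHtml('[h1]News[/h1] [red]Happy New Year[/red]'));
```

The same character-class change should carry over to the preg_replace calls, since PCRE supports [\s\S] as well.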

How do I put two spaces after every period in our HTML?

I need there to be two spaces after every period in every sentence in our entire site (don't ask).
One way to do it is to embark on manually adding a &nbsp; after every single period. This will take several hours.
We can't just find and replace every period, because we have concatenations in PHP and other cases where there is a period and then a space, but it's not in a sentence.
Is there a way to do this...and have everything still work in Internet Explorer 6?
[edit] - The tricky part is that in the code, there are lines of PHP that include dots with spaces around them like this:
<?php echo site_url('/css/' . $some_name .'.css');?>
I definitely don't want extra spaces to break lines like that, so I would be happy adding two visible spaces after each period in all P tags.
As we all know, HTML collapses white space, but it only does this for display. The extra spaces are still there in the source. So if the source material was created with two spaces after each period, then some of the substitution methods being suggested can be made to work reliably: search for "period-space-space" and replace it with something more suitable, like period-space-&emsp14;. Please note that you shouldn't use &nbsp; because it can prevent proper wrapping at margins. (If you're using ragged right, the margin change won't be noticeable as long as you put the nbsp BEFORE the space.)
You can also wrap each sentence in a span and use the :after selector to add a space and format it to be wide with "word-spacing". Or you can wrap the space between sentences itself in a span and style that directly.
I've written a javascript solution for blogger that does this on the fly, looks for period-space-space, and replaces it with a spanned, styled space that appears wider.
If however your original material doesn't include this sort of thing then you'll have to study up on sentence boundary detection algorithms (which are not so simple), and then modify one to also not trip over PHP code.
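For the case where the source already has two spaces after each sentence, the period-space-space substitution described above can be sketched as a one-line string replacement. The function name widenSentenceSpaces is made up for illustration, and the code assumes it is run only over paragraph text, not over PHP source.

```javascript
// Replace "period, space, space" with "period, space, &emsp14;" so the gap survives
// HTML whitespace collapsing. widenSentenceSpaces is an illustrative name.
function widenSentenceSpaces(html) {
  return html.replace(/\. {2}/g, '. &emsp14;');
}

console.log(widenSentenceSpaces('First sentence.  Second sentence.  Third.'));
```

A period followed by a single space, as in a PHP concatenation like 'a' . $b, is left untouched because the pattern requires exactly the two-space sequence.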
You might be able to use the JavaScript split method or regex depending on the scope of the text.
Here's the split method:
var el = document.getElementById("mydiv");
if (el){
el.innerText = el.innerText.split(".").join(".\xA0 ");
}
Test case:
Hello world.Insert spaces after the period.Using the split method.
Result:
Hello world. Insert spaces after the period. Using the split method.
Have you thought about using output buffering? ob_start($callback)
Not tested, but if you stick this before any output (or better yet, offload the function):
<?php
function processDots($buffer)
{
// Assumes the intent is a non-breaking space plus a normal space after each "period-space".
return str_replace(". ", ".&nbsp; ", $buffer);
}
ob_start("processDots");
?>
and add this to end of input:
<?php ob_end_flush(); ?>
Might just work :)
If you're not opposed to a "post processing"/"javascript" solution:
var nodes = $('*').contents().map(function(a, b) {
return (b.nodeType === Node.TEXT_NODE ? b : null);
});
$.each(nodes, function(i,node){
node.data = node.data.replace(/(\.\s)/g, '.\u00A0\u00A0');
});
Using jQuery for the sake of brevity, but not required.
p.s. I saw your comment about not all periods and a space are to be treated equal, but this is about as good as it gets. otherwise, you're going to need a lot better/more bullet-proof approach.
Incorporate something like this into your PHP file, where $text (a placeholder name) holds the paragraph text:
<?php $text = preg_replace('/\. {1,2}(?=[A-Z])/', '.&nbsp; ', $text); ?>
This searches for the beginning of each new sentence, i.e. a period followed by one or two spaces and a capital letter (.spacespaceA-Z or .spaceA-Z), and replaces the period and spaces with a period, a non-breaking space, and a normal space. [note: the capital letter is not replaced]

RegEx: Link Twitter-Name Mentions to Twitter in HTML

I want to do THIS, just a little bit more complicated:
Lets say, I have an HTML input:
Don't break!
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.
You can't reach me at blam4c@example.com.
Is there a good RegEx to replace the Twitter username mentions with links to Twitter, but leave @example (the email address at the bottom) AND @test (in the link title, i.e. in HTML tags) alone?
It probably should also try not to add links inside existing links, i.e. not break this:
Hello @someone there!
My current attempt is to add ">" at the beginning of the string, then use this RegEx:
Search: '/>([^<]*\s)\@([a-z0-9_]+)([\s,.!?])/i'
Replace: '>\1@\2\3'
Then remove the ">" I added in step 1.
But that won't match anything but the "@blam4c". I know WHY it does so, that's not the problem.
I would like to find a solution that finds and replaces all twitter user name mentions without destroying the HTML. Maybe it might even be better to code this without RegEx?
First, keep the angle brackets out of your regexps.
Use a HTML parser and xpath to select the text nodes you are interested in processing, then consider a regexp for matching only #refs in those nodes.
I'll leave it to other people to try and give a specific answer to the regex part.
I agree with ddaa, there's almost no sane way to attack this without stripping the html links out first.
Presumably you'd be starting out with an actual Twitter message, which cannot by definition include any manually entered hyperlinks.
For example, here's how I found this question (the link resolves to this question so don't bother clicking it!)
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1
In this case, it's easy:
var msg = "Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1";
// Verbatim string (@"...") so that \w reaches the regex engine unescaped.
var html = Regex.Replace(msg, @"(?<!\w)(@(\w+))", "$1");
(this might need some tweaking, I'd like to test it against a corpus, but it seems correct for the average Twitter message)
As for your more complicated cases (with HTML markup embedded in the tweets), I have no idea. Way too hard for me.
This regexp might work a bit better: /\B\@([\w\-]+)/gim
Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/4/
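Putting the suggestions in this thread together, one way to keep the regex away from the markup is to split the input into tags and text, and only rewrite the text parts. The sketch below assumes mentions are written as @name, uses a made-up function name (linkMentions), and assumes https://twitter.com/NAME as the profile URL format. The \B before @ is what skips email addresses: between "c" and "@" in blam4c@example.com there is a word boundary, so \B does not match there.

```javascript
// Link @mentions in text, leaving anything inside <...> (tag names, attributes) untouched.
// linkMentions and the URL format are assumptions for illustration.
function linkMentions(html) {
  return html
    .split(/(<[^>]*>)/) // the capturing group keeps the tags, at odd indices
    .map((part, i) =>
      i % 2 === 1
        ? part // a tag: leave as-is
        : part.replace(/\B@(\w+)/g, '<a href="https://twitter.com/$1">@$1</a>')
    )
    .join('');
}

console.log(linkMentions('<p title="@test">Hi @codinghorror, mail blam4c@example.com</p>'));
```

This still links mentions that sit inside an existing a element, so it does not cover the nested-link case from the question; handling that needs a real HTML parser that knows which element each text node belongs to.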