How to match with RegExp outside of HTML Tags [duplicate] - html

Closed 10 years ago.
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
How can I match alphanumeric words that are outside of an HTML tag, instead of matching every word?
Example:
<div id="mariano mariano mariano" nota="mariano/mariano">mariano was looking forward Mariano. I want to match this "Mariano" too. Mariano</div>
In this example I want to match every "Mariano" that is outside the tag itself, not the ones inside the id and nota attributes.
I think the key to this problem is looking ahead: if the regex finds a "<" before a ">", the word is outside a tag and should match; if it finds a ">" before a "<", the word is inside a tag.
But I couldn't manage to write a regex for this.
I failed when trying to combine this regex (?<=^|>)[^><]+?(?=<|$) with another one.
My final, lowest-quality solution was:
var searchFor = new RegExp("((?<=^|>)" + termino + ")","ig");
var searchFor2 = new RegExp("(" + termino + "(?=<|$))","ig");
var searchFor3 = new RegExp("(?<=^|[\\s\\.;,])" + termino + "(?=[\\s\\.;,]|$)","ig");
but those three don't cover every case.
Edit: I'm working with JavaScript:
<script>
container.find("p, span, div, .texto").each(function() {
var containerText = $(this).html();
for (var i = 0; i < terms.length; i++) {
var termino = terms[i];
// 1st case: ">termino" should become ">Pedro" (the lookbehind leaves the ">" in place)
var searchFor = new RegExp("((?<=^|>)" + termino + ")","ig");
containerText = containerText.replace(searchFor,"Pedro");
// 2nd case: "termino<" should become "Pedro<" (the lookahead leaves the "<" in place)
var searchFor2 = new RegExp("(" + termino + "(?=<|$))","ig");
containerText = containerText.replace(searchFor2,"Pedro");
// 3rd case: termino preceded and followed by whitespace or . ; ,
var searchFor3 = new RegExp("(?<=^|[\\s\\.;,])" + termino + "(?=[\\s\\.;,]|$)","ig");
containerText = containerText.replace(searchFor3,"Pedro");
}
$(this).html(containerText);
});
</script>

A few things:
Welcome to Stack Overflow!
Please search for existing questions before asking. There are numerous results about parsing XML with regex.
Don't use regular expressions to parse XML/HTML! Try XPath instead:
var termino = "Mariano"; // e.g. one of your terms; define it however you did before
// Give me every text node directly inside a div whose content contains termino.
// Text nodes never include tag markup, so words inside attributes are not matched.
// Note: contains() is case-sensitive, and termino must not contain a double quote.
var iterator = document.evaluate('//div/text()[contains(., "' + termino + '")]', document, null, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);
try {
// init thisNode to the first item in the iterator
var thisNode = iterator.iterateNext();
// go through all items, alert their content (which should contain termino)
while (thisNode) {
alert(thisNode.textContent);
thisNode = iterator.iterateNext();
}
}
catch (e) {
// iterateNext() throws if the document is modified during iteration
console.error('Error: Document tree modified during iteration ' + e);
}
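If a regex is still preferred for simple markup, one common workaround is to let the regex match either a whole tag or the search term, and hand tags back unchanged. This is a minimal sketch, not the answer's method; `replaceOutsideTags` and `escapeRegExp` are hypothetical helpers, and it assumes well-formed markup with no ">" inside attribute values, no literal "<" in the text, and no script/style/comment blocks to skip.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace `term` (case-insensitively) only where it appears outside tags.
// The first alternative captures a whole tag such as <div id="...">;
// captured tags are returned unchanged, so attribute values are never modified.
// Assumes well-formed markup with no ">" inside attribute values.
function replaceOutsideTags(html, term, replacement) {
  const re = new RegExp("(<[^>]*>)|" + escapeRegExp(term), "gi");
  return html.replace(re, (match, tag) => (tag !== undefined ? tag : replacement));
}

const input = '<div id="mariano mariano" nota="mariano/mariano">mariano was looking forward Mariano.</div>';
console.log(replaceOutsideTags(input, "mariano", "Pedro"));
// → <div id="mariano mariano" nota="mariano/mariano">Pedro was looking forward Pedro.</div>
```

The term also matches inside longer words ("Marianos"); wrapping the escaped term in `\b` would restrict it to whole words.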

Related

xpath in apps script?

I made a formula to extract some Wikipedia data in Google Sheets, which works fine. Here is the formula:
=regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Geography')]]"))),"\[[^\]]+\]","")&char(10)&char(10)&iferror(regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Education')]]"))),"\[[^\]]+\]",""))
Where D2 is a URL like https://en.wikipedia.org/wiki/Abbeville,_Alabama
This extracts some Geography and Education data from the Wikipedia page. The trouble is that IMPORTXML only runs a few times before it fails due to quota limits.
So I thought it might be better to use Apps Script, which has much higher limits on fetching and parsing. However, I could not see a good way of using XPath in Apps Script. Older posts on the web discuss a deprecated service called Xml, but it no longer seems to work. There is a service called XmlService which looks like it may do the job, but you can't just plug in an XPath expression, and it looks like a lot of work to get to the result. Are there any solutions where you can just plug in XPath?
Here is an alternative solution I use in cases like this.
I used XmlService only for parsing the content, not for XPath. The code relies on the element tags and has been fairly consistent in my tests, although it might need tweaks when certain tags appear in the result; you might have to add them to the exclusion condition.
I tested the code below on both links:
https://en.wikipedia.org/wiki/Abbeville,_Alabama#Geography
https://en.wikipedia.org/wiki/Montgomery,_Alabama#Education
In my tests, the formula above did not return the proper output for the second link, while the code does (maybe because that section is too long).
Code:
function getGeoAndEdu(path) {
var data = UrlFetchApp.fetch(path).getContentText();
// match the page line by line (each match is at most 100000 characters); if output is cut off, increase the number
var regex = /.{1,100000}/g;
var results = [];
// flag to determine if matches should be added
var foundFlag = false;
var m;
do {
m = regex.exec(data);
// exec() returns null after the last chunk, so check m before using it
if (m && foundFlag) {
// if another header is found during generation of data, stop appending the matches
if (matchTag(m[0], "<h2>"))
foundFlag = false;
// exclude tables, sub-headers and divs containing image description
else if(matchTag(m[0], "<div") || matchTag(m[0], "<h3") ||
matchTag(m[0], "<td") || matchTag(m[0], "<th"))
continue;
else
results.push(m[0]);
}
// start capturing if either IDs are found
if (m != null && (matchTag(m[0], "id=\"Geography\"") ||
matchTag(m[0], "id=\"Education\""))) {
foundFlag = true;
}
} while (m);
var output = results.map(function (str) {
// clean tags for XmlService
str = str.replace(/<[^>]*>/g, '').trim();
var decode = XmlService.parse('<d>' + str + '</d>');
// convert html entity codes (e.g. &#160;) to text
return decode.getRootElement().getText();
// filter blank results due to cleaning and empty sections
// separate data and remove citations before returning output
}).filter(result => result.trim().length > 1).join("\n").replace(/\[\d+\]/g, '');
return output;
}
// check if tag is found in string
function matchTag(string, tag) {
var regex = RegExp(tag);
return string.match(regex) && string.match(regex)[0] == tag;
}
Output: screenshots not reproduced here; they compared the ending of the formula's output, the ending of the script's output, and the ending of the Education section on Wikipedia.
Note:
UrlFetchApp is still subject to quotas, but they should be more generous than IMPORTXML's limits, depending on your account type.
Reference:
Apps Script Quotas
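Both answers need to turn HTML entity codes into text: the first wraps the cleaned string in `<d>` and `</d>` and lets XmlService.parse decode it, the second leaves it to the caller. XML predefines only five named entities (&amp; &lt; &gt; &quot; &apos;), so a named HTML entity such as &nbsp; would likely make XmlService.parse throw. A plain-regex decoder is one alternative. This is a swapped-in technique, not part of either answer; `decodeEntities` is hypothetical and `NAMED` is a deliberately minimal table, not a full HTML5 entity list.

```javascript
// Hypothetical, deliberately small table of named entities; extend as needed.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00A0" };

// Decode decimal (&#160;) and hex (&#xA0;) numeric references plus the
// named entities in NAMED. Unknown names are left untouched.
function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range code points as-is instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    const name = body.toLowerCase();
    return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : match;
  });
}

console.log(decodeEntities("5&#160;km &amp; rising")); // → 5 km & rising (with a no-break space)
```

Apps Script's V8 runtime supports `const`, arrow functions and `String.fromCodePoint`, so the same function should run there unchanged.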
Sorry, I got very busy this week, so I didn't reply. I took a look at your answer, which seems to work fine, but it was quite code-heavy. I wanted something I would understand, so I coded my own solution. Not that mine is any simpler; it's just my own code, so it's easier for me to follow:
function getTextBetweenTags(html, paramatersInFirstTag, paramatersInLastTag) { //finds text values between 2 tags and removes internal tags to leave plain text.
//eg getTextBetweenTags(html,[['class="mw-headline"'],['id="Geography"']],[['class="wikitable mw-collapsible mw-made-collapsible"']])
// **Note: you may want to replace &#number; entities with the characters they encode
var openingTagPos = null;
var closingTagPos = null;
var previousChar = '';
var readingTag = false;
var newTag = '';
var tagEnd = false;
var regexFirstTagParams = [];
var regexLastTagParams = [];
//prepare regexes to test for parameters in opening and closing tags. put regexes in arrays so each condition can be tested separately
for (var i in paramatersInFirstTag) {
regexFirstTagParams.push(new RegExp(escapeRegex(paramatersInFirstTag[i][0])))
}
for (var i in paramatersInLastTag) {
regexLastTagParams.push(new RegExp(escapeRegex(paramatersInLastTag[i][0])))
}
var startTagIndex = null;
var endTagIndex = null;
var matches = 0;
for (var i = 0; i < html.length - 1; i++) {
var nextChar = html.charAt(i);
if (nextChar == '<' && previousChar != '\\') {
readingTag = true;
}
if (nextChar == '>' && previousChar != '\\') { //if end of tag found, check tag matches start or end tag
readingTag = false;
newTag += nextChar;
//test for firstTag
if (startTagIndex == null) {
var alltestsPass = true;
for (var j in regexFirstTagParams) {
if (!regexFirstTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
startTagIndex = i + 1;
//console.log('Start Tag',startTagIndex)
matches++;
}
}
//test for lastTag
else if (startTagIndex != null) {
var alltestsPass = true;
for (var j in regexLastTagParams) {
if (!regexLastTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
endTagIndex = i + 1;
matches++;
}
}
if(startTagIndex && endTagIndex) break;
newTag = '';
}
if (readingTag) newTag += nextChar;
previousChar = nextChar;
}
if (matches < 2) return 'No matches';
else return html.substring(startTagIndex, endTagIndex).replace(/<[^>]+>/g, '');
}
function escapeRegex(string) {
if (string == null) return string;
return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
My function takes an array of attributes for the start tag and an array of attributes for the end tag. It returns the text in between, with any tags inside it removed. One issue I also noticed was that there were often HTML entities (e.g. &#160;), so they need to be replaced. I did that outside the function above.
The function could easily be improved to check the tag type (e.g. h2), but that wasn't necessary for the Wikipedia case.
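The same idea (take the text between a start tag and the next end tag, then strip inner tags) can be sketched with plain substring search instead of a character-by-character scanner. This is a simplified alternative, not the answer's function: `textBetween` is hypothetical, matches each marker as a single literal string, and assumes both markers occur inside tags. Like getTextBetweenTags, it includes any heading text right after the start tag.

```javascript
// Return the plain text between the tag containing startMarker and the next
// tag containing endMarker, with all inner tags removed; null if not found.
function textBetween(html, startMarker, endMarker) {
  const start = html.indexOf(startMarker);
  if (start === -1) return null;
  // Content begins right after the ">" that closes the start tag.
  const contentStart = html.indexOf(">", start + startMarker.length);
  if (contentStart === -1) return null;
  const end = html.indexOf(endMarker, contentStart + 1);
  if (end === -1) return null;
  // Content ends at the "<" that opens the tag containing endMarker.
  const contentEnd = html.lastIndexOf("<", end);
  return html.slice(contentStart + 1, contentEnd).replace(/<[^>]+>/g, "").trim();
}

const page = '<h2><span class="mw-headline" id="Geography">Geography</span></h2>' +
  '<p>Area: 5 km<sup>[1]</sup>.</p>' +
  '<h2><span class="mw-headline" id="Economy">Economy</span></h2>';
console.log(textBetween(page, 'id="Geography"', 'class="mw-headline"')); // → GeographyArea: 5 km[1].
```

Citation markers such as [1] survive, so the `\[\d+\]` cleanup from the first answer would still be needed afterwards.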
Here is a function where I call the above function. The html variable is just the result of UrlFetchApp.fetch('some wikipedia city url').getContentText();
function getWikiTexts(html) {
var geography = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Geography']], [['class="mw-headline"']]);
var economy = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Economy']], [['class="mw-headline"']]);
var education = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Education']], [['class="mw-headline"']]);
var returnString = '';
if (geography != 'No matches' && !/Wikipedia/.test(geography)) returnString += geography + '\n';
if (economy != 'No matches' && !/Wikipedia/.test(economy)) returnString += economy + '\n';
if (education != 'No matches' && !/Wikipedia/.test(education)) returnString += education + '\n';
return returnString
}
Thanks for posting your answer.

Setting a custom font in a <title> tag? Is it possible? [duplicate]

This question already has answers here:
Can we set style to title tag in header
(6 answers)
Closed 7 years ago.
I want to create a custom font that contains my logo as a symbol, and set it on the <title> tag so it displays in browsers.
— So, is it possible?
— Any tricks for it?
Thanks a lot for any help and ideas!
There is no way to change the font of the title. Sadly, the W3C standard does not allow this.
The only thing I know of is changing document.title dynamically with JavaScript.
Here are two examples, but you have to try them in your own environment, because document.title is not available in Stack Overflow's built-in snippet runner.
var titleText = " t i t l e ";
var pos = 0;
var blinkCount = 0;
var blink = [".....", ".. ..", ". ."];
var doScrollingTimeout = null;
var doBlinkTimeout = null;
function DoScrolling() {
// stop any running animation, including a previous scroll loop
clearTimeout(doBlinkTimeout);
clearTimeout(doScrollingTimeout);
document.title = titleText.substring(pos, titleText.length) + blink[0] + titleText.substring(0, pos);
pos++;
if (pos > titleText.length) {
pos = 0;
}
// pass the function itself rather than a string to be evaluated
doScrollingTimeout = window.setTimeout(DoScrolling, 150);
}
function DoBlink() {
// stop any running animation, including a previous blink loop
clearTimeout(doScrollingTimeout);
clearTimeout(doBlinkTimeout);
document.title = blink[blinkCount % 3] + titleText.slice(0, blinkCount) + titleText.charAt(blinkCount).toUpperCase() + titleText.slice(blinkCount + 1, titleText.length - 1) + blink[blinkCount % 3];
blinkCount++;
if (blinkCount == titleText.length) {
blinkCount = 0;
}
doBlinkTimeout = window.setTimeout(DoBlink, 350);
}
DoScrolling();
Copy the JavaScript above and the HTML below into a local file to try them:
<button onclick="DoScrolling();">Scroll</button>
<button onclick="DoBlink()">Blink</button>
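The scrolling effect is just a string rotation, so the frame computation can be pulled out into a pure function that is easy to test outside the browser. A minimal sketch; `marqueeFrame` is a hypothetical name, and it builds the same string DoScrolling builds inline from titleText, pos and blink[0].

```javascript
// Return `text` rotated left by `pos` characters, with `separator` placed
// between the end of the text and its wrapped-around start.
// pos cycles through 0..text.length, matching DoScrolling's reset rule.
function marqueeFrame(text, pos, separator) {
  const p = pos % (text.length + 1);
  return text.substring(p) + separator + text.substring(0, p);
}

console.log(marqueeFrame("abc", 1, "-")); // → bc-a

// Hypothetical browser usage:
// let frame = 0;
// setInterval(function () { document.title = marqueeFrame(" t i t l e ", frame++, "....."); }, 150);
```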

Using jQuery to find <em> tags and adding content within them

The users on my review type of platform highlight titles (of movies, books etc) in <em class="title"> tags. So for example, it could be:
<em class="title">Pacific Rim</em>
Using jQuery, I want to grab the content within this em element and put it inside a hyperlink. To clarify, with jQuery I want to get this result:
<em class="title"><a href="http://domain.com/?=Pacific+Rim">Pacific Rim</a></em>
How can I do this?
Try this:
var ems = document.querySelectorAll("em.title");
for (var i = 0; i < ems.length; ++i) {
if (ems[i].querySelector("a") === null) {
var em = ems[i],
text = jQuery(em).text();
var before = text[0] == " ";
var after = text[text.length-1] == " ";
text = text.trim();
while (em.nextSibling && em.nextSibling.className && em.nextSibling.className.indexOf("title") != -1) {
var tmp = em;
em = em.nextSibling;
tmp.parentNode.removeChild(tmp);
text += jQuery(em).text().trim();
++i;
}
var link = text.replace(/[^a-z \-\d']+/gi, "").replace(/\s+/g, "+");
var innerHTML = "<a target=\"_blank\" href=\"http://domain.com/?=" + link + "\">" + text + "</a>";
innerHTML = before ? " " + innerHTML: innerHTML;
innerHTML = after ? innerHTML + " " : innerHTML;
ems[i].innerHTML = innerHTML;
}
}
Here's a fiddle
Update: http://jsfiddle.net/1t5efadk/14/
Final: http://jsfiddle.net/186hwg04/8/
$("em.title").each(function() {
var content = $(this).text();
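// encode first, then turn the encoded spaces (%20) into "+"; replacing spaces with "+" before encoding would turn them into %2B
var parameter_string = encodeURIComponent(content.trim()).replace(/%20/g, "+");
var new_content = '<a href="http://domain.com/?=' + parameter_string + '">' + content + '</a>';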
$(this).html(new_content);
});
If you want to remove any kind of punctuation, refer to this other question.
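The link-building step itself can be isolated in a plain function, which makes the encoding order easy to check: encode the text first, then turn the encoded spaces (%20) into "+", and HTML-escape the visible text. A sketch assuming the http://domain.com/?= pattern from the question; `buildTitleLink` and `escapeHtml` are hypothetical helpers.

```javascript
// Escape the characters that matter inside element content and attributes.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Build the anchor markup for one title. encodeURIComponent handles &, ?, #
// and non-ASCII characters; %20 is then turned into "+" as the question wants.
function buildTitleLink(title) {
  const text = title.trim();
  const param = encodeURIComponent(text).replace(/%20/g, "+");
  return '<a href="http://domain.com/?=' + param + '">' + escapeHtml(text) + "</a>";
}

console.log(buildTitleLink("Pacific Rim")); // → <a href="http://domain.com/?=Pacific+Rim">Pacific Rim</a>

// In the page (jQuery):
// $("em.title").html(function () { return buildTitleLink($(this).text()); });
```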
$('em.title').html(function(i,html) {
return $('<a/>',{href:'http://domain.com/?='+html.trim().replace(/\s/g,'+'),text:html});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<em class="title">Pacific Rim</em>
UPDATE 1
The following updated version will perform the following:
Grab the contents of the em element
Combine it with the contents of the next element, if that is an em, and remove that element
Create a query string parameter from this with the following properties
Remove the characters ,.&
Remove html
Append the query parameter to a predetermined URL and wrap the unmodified contents in an a element with the new URL.
DEMO
$('em.title:not(:has(a))').html(function() {
$(this).append( $(this).next('em').html() ).next('em').remove();
var text = $(this).text().trim().replace(/[\.,&]/g,'');
return $('<a/>',{href:'http://domain.com/?par='+encodeURIComponent(text),html:$(this).html()});
});
Or DEMO
$('em.title:not(:has(a))').html(function() {
$(this).append( $(this).next('em').html() ).next('em').remove();
var text = $(this).text().trim().replace(/[\.,&]/g,'').replace(/\s/g,'+');
return $('<a/>',{href:'http://domain.com/?par='+text,html:$(this).html()});
});
UPDATE 2
Per the comments, the above versions have two issues:
They merge two elements even when a text node separates them.
They process an em element that is already wrapped in an a element.
The following version resolves those two issues:
DEMO
$('em.title:not(:has(a))').filter(function() {
return !$(this).parent().is('a');
}).html(function() {
var nextNode = this.nextSibling;
nextNode && nextNode.nodeType != 3 &&
$(this).append( $(this).next('em').html() ).next('em').remove();
var text = $(this).text().trim().replace(/[\.,&]/g,'').replace(/\s/g,'+');
return $('<a/>',{href:'http://domain.com/?par='+text,html:$(this).html()});
});
Actually, if you just want to add a click event to em.title, I suggest something like this:
$("em.title").click(function() {
var q = $(this).text();
window.location.href = "http://www.domain.com/?=" + q.replace(/ /g, "+");
});
This adds less HTML to the page and is simpler.
In addition, you may want to add some CSS to em.title, like:
em.title{
cursor:pointer;
}
Something like this?
$(document).ready(function(){
var link = $('em').text(); //or $('em.title') if you want
var link2 = link.replace(/\s/g,"+");
$('em').html('<a href="http://domain.com/?=' + link2 + '">' + link + '</a>');
});
Of course, you can replace the document-ready handler with any other kind of handler.
$('.title').each(function() {
var $this = $(this),
text = $this.text(),
textEnc = encodeURIComponent(text);
$this.empty().html('<a href="http://domain.com/?=' + textEnc + '">' + text + '</a>');
});
DEMO

JavaScript regex for JSON

This is all so confusing: I've seen many examples of how to do different things, but I cannot find a valid example for what I am trying to do.
I'm using YQL stock quotes to get just the major indexes: DOW, S&P 500 and NASDAQ.
The project gets the data and works, but I need to determine whether each quote's change is + or - (up or down).
If the market is up or flat, I want to add a CSS class that makes it green; if it is down, a CSS class that makes it red.
One other issue: this only seems to work when I place the function between the head and the body, not inside the head or inside the body.
<script type="text/javascript">
function stock_quotes(obj)
{
var items = obj.query.results.quote;
var output = '';
var num_quotes = items.length;
items[0].symbol = "DOW ";
items[1].symbol = "NASDAQ ";
items[2].symbol = "S&P 500 ";
//var posquote = {"\d\.?\d{0,9}\.\d{0,9}\s\+"};
//var negquote = {"\d\.?\d{0,9}\.\d{0,9}\s\-"};
for (var i = 0; i < num_quotes; i++) {
var link = items[i].url;
var symbl = items[i].symbol;
var Change_PercentChange = items[i].Change_PercentChange;
var LastTradePriceOnly = items[i].LastTradePriceOnly;
output += "<table><tr><td>" + "<a href='" + link + "'>" + symbl + "</a>" + LastTradePriceOnly + " " + Change_PercentChange + "</td></tr></table>";
}
// Place the quotes in the results div
document.getElementById('results').innerHTML = output;
}
</script>
This is the HTML with the query
<div id='results'></div>
<script type="text/javascript" src='http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20(%22DOW%22%2C%22%5EIXIC%22%2C%22%5EGSPC%22)%0A%09%09&format=json&diagnostics=true&env=http%3A%2F%2Fdatatables.org%2Falltables.env&callback=stock_quotes'></script>
Ideally I'd like to predefine the HTML elements, which would make it easier to set the CSS class, but one headache at a time.
In the end, this seemed to work just fine:
var match_nas_neg = nas_result.match(/\-/);
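A sketch of picking the CSS class from the quote itself, assuming (not verified against the YQL documentation) that Change_PercentChange begins with the sign of the change, e.g. "-12.30 - -0.07%". If the field contains a " - " separator like that, `/\-/` would match every quote, so this reads only the leading number. `quoteClass` and the class names "quote-up" and "quote-down" are hypothetical.

```javascript
// Up or flat -> "quote-up" (green); down -> "quote-down" (red);
// unparsable values -> no class.
function quoteClass(changePercentChange) {
  // parseFloat reads only the leading number, including its + or - sign.
  const change = parseFloat(changePercentChange);
  if (Number.isNaN(change)) return "";
  return change < 0 ? "quote-down" : "quote-up";
}

console.log(quoteClass("-12.30 - -0.07%")); // → quote-down

// Inside the stock_quotes loop (hypothetical):
// output += "<table><tr><td class='" + quoteClass(Change_PercentChange) + "'>" + ... + "</td></tr></table>";
```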

How do I find whether a string contains a character more than 6 times in Flex?

I want to implement an algorithm/validation. How can I find out whether a string contains a specific character more than 6 times in Flex?
There are two ways I can think of:
Use RegExp and .replace() like this:
var ch:String = "a"; //Character, that must be checked
var text:String = "This is an example to show how many times '"+ch+"' occurred.";
//Matches non-`ch` characters
var regexp:RegExp = new RegExp("[^"+ch+"]","g");
//Replacing non-`ch` characters with empty string
var timesOccured:Number = text.replace(regexp,"").length;
trace(text, ": " ,timesOccured );
Use RegExp and .match() like this:
var ch:String = "a"; //Character, that must be checked
var text:String = "This is an example to show how many times '"+ch+"' occurred.";
//Matches `ch` characters
var regexp:RegExp = new RegExp(ch,"g");
var matches:Array = text.match(regexp);
var timesOccured:Number = 0;
//`matches` can be 'null', so we are performing additional check
if( matches ){
timesOccured = matches.length;
}
trace(text, ": " ,timesOccured );
Now when you have timesOccured, you could easily do your validation:
if( timesOccured > 6 ){
//Do some stuff
}else{
//Do other stuff
}
Warning: If ch is a special character in regular expressions, such as . + ( ] \ ^ $ and so on, you need to escape it before passing it to the RegExp constructor:
ch = ch.replace(new RegExp("[.*+?^$|()\\[\\]{}\\\\]", "g"), "\\$&");
A simpler alternative to regular expressions is the following:
var str:String = "This is an example to show how many...";
//find occurrences for character 'a'
trace("Occurrences: " + (str.split('a').length - 1));
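Since the validation only needs to know whether the count exceeds a limit, a plain loop can stop as soon as it sees one occurrence too many, and it needs no regex escaping at all. A sketch in JavaScript; `hasMoreThan` is a hypothetical helper, and the same loop translates directly to ActionScript 3 using `str.charAt(i)`.

```javascript
// True as soon as `ch` has been seen more than `limit` times; false otherwise.
function hasMoreThan(str, ch, limit) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    if (str[i] === ch && ++count > limit) return true; // early exit
  }
  return false;
}

console.log(hasMoreThan("banana bandana", "a", 6)); // → false (exactly 6)
```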