Extracting metadata from Web Pages [duplicate] - html

I was wondering if there is a way in JavaScript to process a page's HTML source code and pull out the specific tags that I want.
Sorry if this sounds too simple; I am new to programming.

If you have the HTML in a string, then you can use:
var str = '<p>Hello</p>'; // your HTML text goes here
var div = document.createElement('div');
div.innerHTML = str; // the browser parses the string into DOM nodes
var dom = div.firstChild; // dom is the first parsed node; note that <html>, <head> and <body> wrappers are dropped by innerHTML
// you can manipulate it using standard DOM methods
Alternatively, use jQuery. jQuery is a library that helps you manipulate and access HTML elements more easily. First, add this to the head of your document:
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js"></script>
This is a reference to the jQuery library. Then, do:
var foo = $("<html>Your html here</html>");
Or, if your html is in a variable (e.g. str), you can do:
var foo = $(str);
Then, you can manipulate and parse foo in a number of ways. For example, to remove all paragraph elements inside it, you would use
foo.find('p').remove();
Or, to remove the paragraph element with id="bar", use:
foo.find('p#bar').remove();
Once you are done with your modifications, you can get the new HTML text using:
foo.html();
Why is your html in a string? Is it not the html of the current page?

Use the DOM; it can pull data from web pages if you know their structure.

Related

Trying to pull through P tags from render section to place else where

Hi, I'm using RenderSection in MVC. Is it possible to store any paragraph tags separately in a separate variable? My current code is this:
var menuText1= RenderSection("text`", false).ToHtmlString();
and i push the content on the front end like this:
@Html.Raw(menuText1)
The actual content in menuText1 consists of several anchor tags and one paragraph tag. Is it possible to pull through both sets of content separately?
You could consider using a DOM parser. You can install HTML Agility Pack from the HTML Agility Pack NuGet package.
Once it is installed, you can load your HTML into an HtmlDocument object. Then you can get the tags as shown below.
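using System.Linq;      // for ToList()
using HtmlAgilityPack;  // from the HTML Agility Pack NuGet package
var doc = new HtmlDocument();
doc.LoadHtml("Your HTML");
var pTags = doc.DocumentNode.Descendants("p").ToList(); // all <p> elements
var aTags = doc.DocumentNode.Descendants("a").ToList(); // all <a> elements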
Hope that can help!

Setting HTML tag attribute with value containing HTML entity, via knockoutjs

How can I set the title attribute of an HTML tag via knockoutjs in a way that will cause any HTML entities in the tag contents to be evaluated and displayed (i.e. not escaped)?
Example:
<div data-bind="attr: { title: titleObservable }"></div>
In the above example, if titleObservable contains an HTML entity, it will not be rendered, rather the entity name will be displayed. See this fiddle for a working example. Notice that when you hover over the div, the title text contains &#39 instead of the apostrophe symbol.
I know that when setting the contents of an HTML tag with the knockoutjs text binding that HTML is escaped for security reasons (see this thread). I am assuming that this is what is happening to the entity in my title attribute. I also know that I can just embed the apostrophe directly into the title attribute, but I would like to know if there is a way that I can do this with the HTML entities (due to certain limitations on the project I am working on).
The only way to have HTML entities interpreted from JavaScript (which Knockout bindings use) is through innerHTML. All other access to the DOM works with plain text, so entity references are treated as literal characters.
I suggest that you update your code to use plain text within your model and only use HTML entities within actual HTML documents. But if you cannot do so, you can use a custom binding handler that converts from HTML to text before setting the DOM property. Here's one I just made that sets the title.
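ko.bindingHandlers.myTitle = {
    update: function(element, valueAccessor) {
        var value = ko.utils.unwrapObservable(valueAccessor());
        // Let the browser decode the entities by parsing the value as HTML
        var d = document.createElement('div');
        d.innerHTML = value;
        // textContent is the standard property; innerText is the older fallback
        element.title = d.textContent || d.innerText || '';
    }
};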
Example: http://jsfiddle.net/mbest/TMSHB/2/
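If no DOM is available (for example, when the model is prepared outside the browser), the same conversion can be approximated without innerHTML. The following is only a minimal sketch: decodeEntities and NAMED_ENTITIES are hypothetical names, not part of Knockout, and the table covers just a handful of named entities. It also requires the trailing semicolon, so a bare &#39 is left untouched.

```javascript
// Hypothetical helper: decodes numeric character references and a small,
// hand-picked set of named entities. It is not a full HTML entity decoder.
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

function decodeEntities(text) {
  return String(text).replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched rather than throwing.
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Entity names are case-sensitive; unknown names are left as-is.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities('It&#39;s a title'));
```

The decoded string can then be stored in the observable and bound with the ordinary attr binding.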

How to convert HTML tags (eg: &lt;) into their correct tags (eg: <) in contentEditable?

How can I convert tags in a contentEditable div, so that users can write into the div and, once done, click a button to have the code displayed?
Eg: A user enters <li>lol</li> and it outputs &lt;li&gt;lol&lt;/li&gt;. Can I make it output the correct HTML?
Actually, if you use .text() to retrieve the contents of your div, you keep the html markup.
Demo: http://codepen.io/anon/pen/pzFEa
(function(){
    $('button').on('click', function(){
        var value = $('div').text();
        alert(value);
    });
})();
Or do you mean that you get HTML entities server-side? If so, please clarify.
At any rate, I recommend that you not let your users type HTML. Instead, let them use Markdown.
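To illustrate why .text() returns the markup as typed while reading the div as HTML returns entities, here is a rough sketch of the escaping that HTML serialization applies to text. escapeHtml is a hypothetical helper written for this illustration, not a jQuery function.

```javascript
// Hypothetical helper mirroring what HTML serialization does to text:
// the characters that would start markup are replaced by entities.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

var typed = '<li>lol</li>';      // what the user typed into the div
var asHtml = escapeHtml(typed);  // roughly what reading the div as HTML gives

console.log(typed);   // <li>lol</li>
console.log(asHtml);  // &lt;li&gt;lol&lt;/li&gt;
```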

Regular expression for getting HREF from footer

I have a requirement to get the last href in the HTML code, that is, the href in the footer of the page.
Is there a regular expression that does this directly?
No regex; use the :last jQuery selector instead.
Demo:
foo
bar
var link = $("a:last");
You could use plain JavaScript for this (if you don't need it to be a jQuery object):
var links = document.links;
var lastLink = links[links.length - 1];
var lastHref = lastLink.href;
alert(lastHref);
JS Fiddle demo.
Disclaimer: the above code only works with JavaScript, since HTML itself has no regex or DOM manipulation capacity. If you need to use a different technology, please leave a comment or edit your question to include the relevant tags.
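One caveat with the plain JavaScript version: on a page with no links, links[links.length - 1] is undefined and reading .href from it throws. A minimal guarded sketch follows; lastHref is a hypothetical helper name, and plain objects stand in for anchor elements so that it runs outside a browser too.

```javascript
// Hypothetical helper: returns the href of the last entry in an array-like
// collection of links, or null when the collection is empty.
function lastHref(links) {
  if (!links || links.length === 0) {
    return null;
  }
  return links[links.length - 1].href;
}

// In a browser this would be called as lastHref(document.links).
// Plain objects stand in for anchor elements here.
var sample = [
  { href: 'https://example.com/a' },
  { href: 'https://example.com/footer' }
];
console.log(lastHref(sample)); // https://example.com/footer
console.log(lastHref([]));     // null
```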
It's not a good idea to parse HTML with regular expressions. Have a look at HtmlParser to parse HTML.

Is there any way I can put a variable inside a value tag in HTML?

I would like to have a
<input type="text" value=VARIABLENAME />.
Is there any way I can do this? Putting value="VARIABLENAME" just uses the name of the variable as the text. I would like to assign the content of the variable to the value property instead.
EDIT: The variable comes from the text content of one of my tables. I got it by doing something like this in my script tag.
selectedScheduleName = e.target.childNodes[0].wholeText;
Thank you.
Yes, you can assign a value to the input's value property from a variable, e.g.:
theInput.value = theVariable;
You do this in the JavaScript, after getting a reference to the input element.
So for instance, if you give the input an id value of "foo", you can do this:
document.getElementById("foo").value = theVariable;
...within a script tag. (Be sure that the input has already been added to the DOM first, either by putting the script after it — the bottom of the body tag is good — or by using window's load event or, if you use a library that supports one, some kind of "dom ready" event.)
The element doesn't have to have an id, if you can get at it via getElementsByTagName or by the form element's elements array, etc., etc.
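The timing point above matters because getElementById returns null when the element is not in the DOM yet. A small sketch of a guarded setter, assuming a hypothetical helper named setValueById; the fakeDoc object is only a stand-in so the example runs outside a browser, where you would pass document instead.

```javascript
// Hypothetical helper: sets the value of the element with the given id.
// `doc` is anything exposing getElementById, normally the global `document`.
function setValueById(doc, id, value) {
  var el = doc.getElementById(id);
  if (!el) {
    return false; // element is not in the DOM (yet)
  }
  el.value = value;
  return true;
}

// Stand-in for a real document and input element.
var fakeInput = { value: '' };
var fakeDoc = {
  getElementById: function (id) { return id === 'foo' ? fakeInput : null; }
};

setValueById(fakeDoc, 'foo', 'Testing');
console.log(fakeInput.value); // Testing
```

Returning false instead of throwing makes it easy to notice that the script ran before the input existed.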
Handy references:
DOM2 Core specification (well supported cross-browser)
DOM2 HTML specification (reasonably well supported cross-browser)
DOM3 Core specification (not quite as well supported cross-browser yet)
The HTML5 specification now has IDL for the HTML DOM objects in it directly (supplanting/supplementing the DOM2 HTML spec), such as for HTMLElement, HTMLFormElement, and HTMLInputElement
If as you say you have a JavaScript variable you wish to apply:
var foo = "Testing";
document.getElementById('ElementID').value = foo;
or using jQuery:
var foo = "Testing";
$("#ElementID").val(foo);