URL to an unnamed part of a web page - html

I'd like to refer to a specific part of a web page which I am not the author of, and which is not tagged with the NAME attribute. The part I have in mind could be specified, e.g., as the location where a certain word appears, which could be reached manually with a Find operation. I imagine something like
http://somesite.com#search-for:foo-bar
Is there some feature in HTML allowing for this?

No.
You can only link to elements with an id and to <a> elements with a name.
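Roughly how a browser resolves a fragment, sketched on a hypothetical plain-object tree ({ tag, id, name, children }) instead of a real DOM: it looks for the first element whose id matches, then for the first <a> whose name matches, and repeats both lookups with the percent-decoded form. This is a simplification (browsers also special-case things like "#top"), but it shows that nothing in the lookup searches the page's text, so a fragment like the question's #search-for:foo-bar matches nothing.

```javascript
// Hypothetical node shape: { tag, id?, name?, children? } stands in for the DOM.
// Simplified from the browser's fragment lookup; "#top" and similar cases are ignored.

// Depth-first search in document order; returns the first matching node.
function findFirst(node, predicate) {
  if (predicate(node)) return node;
  for (const child of node.children || []) {
    const found = findFirst(child, predicate);
    if (found) return found;
  }
  return null;
}

function findFragmentTarget(root, fragment) {
  const raw = fragment.replace(/^#/, "");
  if (raw === "") return null; // empty fragment: nothing to look up
  const keys = [raw];
  try {
    keys.push(decodeURIComponent(raw)); // percent-decoded form, tried second
  } catch (e) {
    // malformed percent-encoding: only the raw form is tried
  }
  for (const key of keys) {
    // An element whose id matches wins; otherwise an <a> whose name matches.
    const byId = findFirst(root, (n) => n.id === key);
    if (byId) return byId;
    const byName = findFirst(root, (n) => n.tag === "a" && n.name === key);
    if (byName) return byName;
  }
  return null;
}

const page = {
  tag: "body",
  children: [
    { tag: "h2", id: "intro" },
    { tag: "a", name: "label" },
    { tag: "p", id: "café" },
  ],
};

console.log(findFragmentTarget(page, "#intro").tag); // h2
console.log(findFragmentTarget(page, "#caf%C3%A9").id); // café
console.log(findFragmentTarget(page, "#search-for:foo-bar")); // null
```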

What is tppabs attribute?

As the title suggests, I encountered this attribute today. I searched for it but couldn't find anything informative. It looks something like this:
<a href="activities.html" tppabs="http://www.dreamguys.co.in/hrms/activities.html">
What is the meaning of "tppabs" attribute?
From:
http://www.tenmax.com/teleport/support.htm
Q: I notice that the HTML pages Teleport creates will have "tppabs" tags in them. What are these and can I remove them?
A: The tppabs tags are created and used by Teleport as part of its Link Localization system. You can prevent the tags from being inserted by turning OFF Link Localization on the Project Properties, Browsing/Mirroring page; but then the links between files may not work correctly in the offline copy.
This is not a standard attribute; it is most likely a custom attribute, so you will need to look at your code to find out more.
Elements (like div and a) have starting and ending tags, and starting tags can contain attributes.
As for the status of tppabs, it has never existed in any HTML specification. It is inserted into markup by Teleport Pro and contains the absolute form of a URL. This allows the software to locate a resource once the document has been downloaded. It serves no purpose as far as HTML is concerned.
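If you already have pages saved by Teleport and want them without the attribute, a minimal sketch is a regular-expression replace over the markup. This assumes the attribute values are plain quoted URLs (a regex is not an HTML parser), and since Teleport uses these attributes itself, stripping them presumably only makes sense for copies you no longer plan to update with Teleport.

```javascript
// Removes tppabs="..." (or tppabs='...') attributes from saved HTML.
// Assumes each value contains no copy of its own quote character,
// which holds for ordinary URLs.
function stripTppabs(html) {
  return html.replace(/\s+tppabs\s*=\s*("[^"]*"|'[^']*')/gi, "");
}

const saved =
  '<a href="activities.html" tppabs="http://www.dreamguys.co.in/hrms/activities.html">';
console.log(stripTppabs(saved)); // <a href="activities.html">
```

The relative href is left untouched, so the link keeps working in the offline copy.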

Algorithm to develop an article extractor

I have undertaken a project that will extract the main content from any webpage. For example, if I input the URL of a news article, it will return only the article part. The first step is getting the source code of the given URL; there are many ways to do that. After getting the HTML code of the page, I keep only the part inside the <body> tag, because the article will obviously be somewhere inside the body.
Next, I select each div element and check how much text it contains. At the end, I select the div with the most text inside it.
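The "div with the most text" heuristic can be sketched as below. To stay runnable without a parser, it works on a hypothetical plain-object tree ({ tag, id, text, children }) rather than a real DOM. It also deviates from the description in one deliberate way: text inside nested divs is not counted toward the outer div, because with a full text count the outermost wrapper div would always score at least as high as anything inside it.

```javascript
// Hypothetical node shape: { tag, id?, text?, children? }; text is the node's own text.

// Length of the text belonging to `node`, not descending into nested <div>s.
function ownTextLength(node) {
  let total = (node.text || "").trim().length;
  for (const child of node.children || []) {
    if (child.tag !== "div") total += ownTextLength(child);
  }
  return total;
}

// Walk the whole tree and return the <div> with the highest ownTextLength.
function divWithMostText(root) {
  let best = null;
  let bestScore = -1;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.tag === "div") {
      const score = ownTextLength(node);
      if (score > bestScore) {
        best = node;
        bestScore = score;
      }
    }
    for (const child of node.children || []) stack.push(child);
  }
  return best;
}

const body = {
  tag: "body",
  children: [
    { tag: "div", id: "wrapper", children: [
      { tag: "div", id: "nav", text: "Home About Contact" },
      { tag: "div", id: "article", children: [
        { tag: "p", text: "First paragraph of the news story." },
        { tag: "p", text: "Second paragraph with more details." },
      ] },
    ] },
  ],
};

console.log(divWithMostText(body).id); // article
```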
Another approach I am considering: for each <p> element, check its parent, and at the end select the div that has the most direct <p> children. To understand it better, check this tree: Tree of an HTML
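The second idea (for each <p>, look at its parent and tally per div) could look like this, again on a hypothetical plain-object tree ({ tag, id, text, children }) rather than a real DOM. Only direct children are counted, as described.

```javascript
// Hypothetical node shape: { tag, id?, text?, children? }.
// For every <p>, credit its parent if the parent is a <div>; return the top div.
function divWithMostParagraphs(root) {
  const counts = new Map(); // div node -> number of direct <p> children
  const stack = [{ node: root, parent: null }];
  while (stack.length > 0) {
    const { node, parent } = stack.pop();
    if (node.tag === "p" && parent !== null && parent.tag === "div") {
      counts.set(parent, (counts.get(parent) || 0) + 1);
    }
    for (const child of node.children || []) {
      stack.push({ node: child, parent: node });
    }
  }
  let best = null;
  let bestCount = 0;
  for (const [div, count] of counts) {
    if (count > bestCount) {
      best = div;
      bestCount = count;
    }
  }
  return best; // null if no <div> has a direct <p> child
}

const page = {
  tag: "body",
  children: [
    { tag: "div", id: "sidebar", children: [{ tag: "p", text: "Advertisement" }] },
    { tag: "div", id: "story", children: [
      { tag: "h1", text: "Headline" },
      { tag: "p", text: "Paragraph one." },
      { tag: "p", text: "Paragraph two." },
      { tag: "p", text: "Paragraph three." },
    ] },
  ],
};

console.log(divWithMostParagraphs(page).id); // story
```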
I know these methods are basic, which is why I am asking. I would like the community's suggestions: what approaches do you use?
I like the idea of implementing your own 'News' crawler...
A few suggestions:
Check the source (right-click > 'Inspect' in Chrome) of some popular sites (e.g. The New York Times); look for common HTML element names, ids or classes they use to identify the different blocks in the HTML, for instance divs with 'story' or 'story-body' ids.
I would go with the word count, but also use a dictionary of common phrases that are likely to appear in a news article.
I would search for the block between the 'header' and the 'footer', excluding the comments section and advertisements (again, by searching the id or class values of the elements).
Start crawling from the main page; it will probably have links to the subpages or articles. Once you have the reference (e.g. a headline or article name), it will help you navigate within the subpage itself.
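Several of these suggestions (skip blocks whose id or class looks like comments, ads, header or footer; favour 'story'-like names; rank by word count) can be combined into one scoring pass. This is only a sketch: the candidate blocks are hypothetical input already extracted from a page, the patterns are illustrative guesses rather than any real site's markup, and the factor of 2 is arbitrary. The dictionary-of-phrases idea is left out.

```javascript
// Hypothetical input: blocks like { id?, className?, text } taken from a page.
const SKIP_PATTERN = /comment|advert|sidebar|header|footer/i;
const ARTICLE_PATTERN = /story|article/i;

function wordCount(text) {
  const words = text.trim().split(/\s+/);
  return words[0] === "" ? 0 : words.length;
}

function pickArticleBlock(blocks) {
  let best = null;
  let bestScore = -Infinity;
  for (const block of blocks) {
    const label = `${block.id || ""} ${block.className || ""}`;
    if (SKIP_PATTERN.test(label)) continue; // comments, ads, header, footer
    let score = wordCount(block.text);
    if (ARTICLE_PATTERN.test(label)) score *= 2; // arbitrary boost for 'story'-like names
    if (score > bestScore) {
      best = block;
      bestScore = score;
    }
  }
  return best;
}

const blocks = [
  { id: "header", text: "Site name navigation links home world politics business" },
  { id: "story-body", text: "Officials said on Monday that the new bridge would open next spring." },
  { className: "comments", text: "Great article! I agree with every single word written here, and more." },
  { className: "related", text: "More news" },
];

console.log(pickArticleBlock(blocks).id); // story-body
```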
In any case, I suggest working with the Java jsoup library, which will make your life easier; use it with its jQuery-like selectors.
Good luck.

Making a direct link on the page

I want to do this:
this is the list
- Option 1 how to keep warm
- Option 2 how to keep cold
Way down in the document comes the answer:
here is the answer for keeping you warm (this is where I want option 1 to take me)
You're looking for named anchor/named target/bookmark anchor links. Their format is essentially the same as that of normal anchors; however, instead of pointing to a page, the href points to the ID of the element you want to jump to, preceded by a hash mark (#).
For example:
<a href="#option1">Option 1 how to keep warm</a>
Then further on down the page:
<h4 id="option1">here is the answer for keeping you warm</h4>
See: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a, specifically the section that says:
href: This is the single required attribute for anchors defining a hypertext source link. It indicates the link target, either a URL or a URL fragment. A URL fragment is a name preceded by a hash mark (#), which specifies an internal target location (an ID) within the current document. URLs are not restricted to Web (HTTP)-based documents. URLs might use any protocol supported by the browser. For example, file, ftp, and mailto work in most user agents.

Retrieve all hashes in a page for URL use

I am trying to copy a link from this site (Stack Overflow), but I would like the link to include a hash so that when someone clicks it, they go directly to the answer I want them to see. How can I find the hashes in a page?
Example:
http://www.blahblah.com/index.php#label
How can I know that there is a #label, and how can I find it?
The value of the hash is simply the value of the id attribute of an element on the page.
You can see them in the source or the DOM inspector.
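In the browser's DevTools console, document.querySelectorAll("[id], a[name]") lists every element a fragment could point to. For saved page source, a rough sketch with regular expressions can collect the same values; it is not a real HTML parser (it ignores comments, scripts and unusual quoting), so treat the result as a starting list rather than a guarantee.

```javascript
// Collect every id="..." value and every <a name="..."> value from HTML source.
function listFragmentTargets(html) {
  const targets = new Set();
  // \s before id/name avoids matching attributes such as data-id.
  const idPattern = /<[a-z][^>]*?\sid\s*=\s*["']([^"']+)["']/gi;
  const namePattern = /<a\b[^>]*?\sname\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(idPattern)) targets.add(match[1]);
  for (const match of html.matchAll(namePattern)) targets.add(match[1]);
  return [...targets];
}

const source =
  '<div id="question"></div><a name="label"></a>' +
  '<span data-id="x"></span><div class="answer" id="answer-42"></div>';

console.log(listFragmentTargets(source).join(", ")); // question, answer-42, label
```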
Are you looking for something like this?
var hash = window.location.hash;
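window.location.hash only describes the page the script is running in. For an arbitrary link such as the example above, the standard URL class (available in browsers and Node.js) exposes the same hash property.

```javascript
// Parse a link and read its fragment, including the leading "#".
const link = new URL("http://www.blahblah.com/index.php#label");
console.log(link.hash); // #label
// Without the "#": the id (or <a> name) to look for on the target page.
console.log(link.hash.slice(1)); // label
```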
There might not be a simple answer for you here. In a pure HTML context (i.e. excluding JavaScript functionality), the hash would reference an anchor on the page like this:
<a name="label"></a>
So you could just look for named anchors.
Now, if you are talking about JavaScript functionality, it gets much more complex. With JavaScript you can take a hash like that and make it do any number of things (like showing a hidden element with id="label", or downloading some content asynchronously based on that hash). So there might not be an easy way to determine the allowable values.
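To illustrate why: with script-driven hashes, the set of meaningful values lives in the script, not in the markup. In this hypothetical sketch the handlers only return descriptions instead of touching the DOM; in a page they would typically be triggered from a "hashchange" listener.

```javascript
// Hypothetical routing table: these hashes "exist" only because the script says so.
const routes = new Map([
  ["#label", () => 'show the hidden element with id="label"'],
  ["#comments", () => "load comments asynchronously"],
]);

function handleHash(hash) {
  const handler = routes.get(hash);
  return handler ? handler() : null; // null: this script ignores the hash
}

// In a browser this might be wired up as:
// window.addEventListener("hashchange", () => handleHash(window.location.hash));
console.log(handleHash("#label"));
```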

Link to a section of a webpage

I want to make a link that when clicked, sends you to a certain line on the page (or another page). I know this is possible, but how do I do it?
Your jump link looks like this:
<a href="#div_id">jump link</a>
Then create the target element:
<div id="div_id"></div>
The jump link will take you to that div.
A hash (fragment) at the end of the URL brings the visitor to the element with that ID, e.g.:
http://stackoverflow.com/questions/8424785/link-to-a-section-of-a-webpage#answers
would bring you to where the div with the ID 'answers' begins. You can also use the name attribute on anchor (<a>) tags to create the same effect.
Resource
The fragment identifier (also known as: Fragment IDs, Anchor Identifiers, Named Anchors) introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document.
Link to fragment identifier
The URI syntax also allows an optional query part introduced by a question mark (?). In URIs with both a query and a fragment, the fragment follows the query.
Link to fragment with a query
When a Web browser requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent (Web browser) processes the resource according to the document type and fragment value.
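Both points (the fragment comes after the query, and it is not part of what is sent to the server) can be checked with the standard URL class available in browsers and Node.js. The example URL is made up.

```javascript
const url = new URL("https://example.com/docs/page.html?lang=en#section-2");
console.log(url.search); // ?lang=en
console.log(url.hash); // #section-2
// The HTTP request carries the path and query, never the fragment.
console.log(url.pathname + url.search); // /docs/page.html?lang=en

// Order matters: once "#" appears, everything after it is fragment.
const swapped = new URL("https://example.com/docs/page.html#section-2?lang=en");
console.log(swapped.search === ""); // true
console.log(swapped.hash); // #section-2?lang=en
```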
Named anchors (<a name="fragment">) are deprecated in XHTML 1.0; the id attribute is the suggested replacement: <div id="fragment"></div>
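If you are a user rather than a site developer, you can do it as follows (this relies on the browser's text fragment support; older browsers simply load the page without jumping):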
https://example.com/index.html#:~:text=foo
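The search text after :~:text= has to be percent-encoded, and because the text directive syntax uses "-" and "," as separators (for optional prefix, end and suffix parts), those characters inside the search text must be encoded as well. A minimal sketch, assuming any existing fragment on the page URL can be dropped:

```javascript
function textFragmentUrl(pageUrl, text) {
  const url = new URL(pageUrl);
  url.hash = ""; // sketch assumption: discard any existing fragment
  // encodeURIComponent already encodes "," and "&" but leaves "-" alone,
  // and "-" is a separator in the text directive, so encode it explicitly.
  const encoded = encodeURIComponent(text).replace(/-/g, "%2D");
  return `${url.href}#:~:text=${encoded}`;
}

console.log(textFragmentUrl("https://example.com/index.html", "foo-bar"));
// https://example.com/index.html#:~:text=foo%2Dbar
```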
Simple:
Use a <section> element with an id, for example <section id="tips">,
and link to it with <a href="#tips">Visit the Useful Tips Section</a>
w3school.com/html_links