HTML Page Text search and navigation without pre-embedded tags - html

I'm looking for ideas/solutions for the following scenario:
I'm a website developer who is given 150-ish HTML pages by a 3rd party, who updates and re-issues the HTML pages from time to time.
I'm looking for a way to implement search functionality for these pages and then navigate to that location within the page.
I don't want to add navigation tags to the HTML pages, as these would be lost when the 3rd party re-issues the pages.
Ideally, I would like to have a search string, search the html files, then return a list of results (kinda like Google results) then when the user clicks on the link for a particular result, the page opens and navigates to the result location within the page.
I'm familiar with C#/JavaScript/jQuery.
Any ideas/suggestions to achieve this would be welcome... or confirmation that this can't be done :)

Don't Google, Bing, and other search engines provide APIs that let you use them to index the site then use their search capabilities to show results on only your site?
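If a hosted search engine is not an option, the search can also be done over the untouched files. The following is a minimal sketch, not a definitive implementation: it strips the tags from each page, looks for the search string, and builds a link that ends in a URL text fragment (`#:~:text=...`), which browsers that support text fragments use to scroll to and highlight the match, so no navigation tags need to be added to the 3rd-party pages. The names `page_text` and `search_pages`, and the idea of passing the pages as a dict of URL to HTML source, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class TextExtractor(HTMLParser):
    """Collects text content, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def page_text(html):
    """Return the page text with runs of whitespace collapsed to one space."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def search_pages(pages, query, context=30):
    """pages: dict of URL -> HTML source (hypothetical input shape).

    Returns one result per page containing the query: the page URL, a link
    with a text fragment pointing at the first match, and a short snippet.
    """
    results = []
    needle = query.lower()
    for url, html in pages.items():
        text = page_text(html)
        # Simple case-insensitive match; assumes lower() keeps string length.
        pos = text.lower().find(needle)
        if pos == -1:
            continue
        matched = text[pos:pos + len(query)]
        # '-', ',' and '&' are special in text fragments and must be encoded;
        # quote() leaves '-' alone, so it is replaced explicitly.
        fragment = quote(matched, safe="").replace("-", "%2D")
        snippet = text[max(0, pos - context):pos + len(query) + context]
        results.append({"url": url, "link": url + "#:~:text=" + fragment,
                        "snippet": snippet})
    return results
```

For example, `search_pages({"a.html": "<p>The quick brown fox.</p>"}, "Brown Fox")` yields one result whose link is `a.html#:~:text=brown%20fox`. Because the index is rebuilt from the files themselves, it survives the pages being re-issued. In C# the same steps would apply on the server side.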

Related

How can I use Google to find all unknown webpages that point to one specific known web page?

It's easy to find all child pages of one webpage, but it is not trivial to get all parent pages of one webpage. How can I do that using Google?
You can't, and Google cannot help you, as it doesn't index all of the web.
At best it follows links on other pages, or is initiated by someone wanting to have something indexed explicitly.
Create a server. Put an HTML page on it that no other page on your server has a link to. Name the page with some non-guessable UUID in the name.
Google will not find this unless they start to randomly change parts of URLs to test for existing pages (a lengthy process).
Within that page you can have links pointing to other pages. It is a parent page for those pages, is a web page, and will not be found via Google.

Searching for pages in my site

I have an HTML site I made at work for local use only. It meets our requirements; however, I now have a large number of pages (174 and growing), and I want to make a search function that appears on the home page and searches for the text entered. If the text is found, it should open the matching page; if not, it should redirect to a page stating "not found".
The trouble is that when I search online, all I get is how to build search engines or other irrelevant things.
Does anybody here know how to read the contents of a textbox and, if the text matches a site page name, open that page?
Do you know any server-side programming language?
I guess you are working with only HTML, so I would suggest you learn a server-side programming language and visit the following links to get an idea of how things work.
http://www.codetrip.info/codetrip/104/Web-Development/What-is-the-difference-between-a-static-and-Dynamic-Web-site
http://answers.yahoo.com/question/index?qid=20130529053614AAbetfc
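As a rough illustration of the lookup step a server-side script would perform, here is a minimal sketch in Python. It assumes the page name is the file name without its extension and that the "not found" page is called `notfound.html`; both, along with the function name `resolve_page`, are hypothetical.

```python
def resolve_page(query, page_names, not_found="notfound.html"):
    """Return the page whose name matches the text box input.

    query: the raw text the user typed.
    page_names: file names of the site's pages, e.g. ["index.html", ...].
    Falls back to the not-found page when nothing matches.
    """
    wanted = query.strip().lower()
    if not wanted:
        return not_found
    for name in page_names:
        # Compare against the file name without its extension.
        stem = name.rsplit(".", 1)[0].lower()
        if stem == wanted:
            return name
    return not_found
```

The server would then redirect the browser to the returned page. Matching on page contents rather than page names needs an index of the page text instead of this simple name comparison.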

How to find the parent page of a webpage

I have a webpage that cannot be reached through my website.
Say, my website is www.google.com and the webpage that I cannot access using the website is like www.google.com/iamaskingthis/asdasd. This webpage appears on the google results when I type its content, however there is nothing which sends me to that page on my website.
I've already tried analyzing the page source to find its parent location but I can't seem to find it. I want to delete that page, but since I cannot find it, I can't destroy it either.
Thank you
You can use a robots.txt file to prevent search engine bots from visiting a page, and thus not showing search results for it.
For example, you can create a robots.txt file in the root of your website and add the following content to it:
User-agent: *
Disallow: /mysecretpage.html
More details at: http://www.robotstxt.org/robotstxt.html
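To check that a rule like the one above blocks what you expect, Python's standard `urllib.robotparser` module can evaluate it locally. This is only a sketch: the host `www.example.com` and the user agent string are placeholders, and the result shows how a crawler that obeys robots.txt would read the rule, not what any particular search engine will do.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example.
rules = """\
User-agent: *
Disallow: /mysecretpage.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# www.example.com is a placeholder host used only for illustration.
secret_allowed = parser.can_fetch("Googlebot", "http://www.example.com/mysecretpage.html")
index_allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")

print("secret page crawlable:", secret_allowed)
print("index page crawlable:", index_allowed)
```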
There is no such concept as a 'parent page'. If you mean the link through which Google found the page, please keep in mind that it need not be under your control: if I put a link to www.google.com/iamaskingthis/asdasd on a page on my website and the Googlebot crawls it, it will know about it.
To make it short: There is no reliable way of hiding a page on a website. Use authentication, if you want to restrict access.
Google will crawl the page even if the button is gone, as it already has the page stored in its records. The only way to stop Google from crawling it is either robots.txt or simply deleting it from the server (via FTP or your hosting's control panel).

live content from html to html

I'm using UIWebView to display data from my organization (publicity and legal). However, I only want to pull specific data from the HTML file rather than loading the whole URL. For example, I want to pull the "News" section of the HTML and keep the user on that page, without letting them go to other parts of the website (e.g. home page, contact us), while still allowing them to view the PDF articles linked from the HTML file.
I've asked around and read up on the DOM and screen scraping, but it seems that the data pulled is stored in a database instead.
Is there any way that I can pull just the HTML "News" section with the PDF URLs into my customized HTML file and have it update live? For example, every 30 seconds it would refresh and pull information from the website so that the content and list of PDFs stay up to date: if 3 new articles are added to the main website, my customized HTML file would also refresh and update my article list.
If anyone can point me to a specific method that allows live HTML-to-HTML data passing, that would be great, and I can go and do more research on it. I'm currently very lost and confused, as it is my first time doing this. Any help/feedback will be very much appreciated :)
EDIT: For example, google map or google search. I don't want to use the whole google webpage, just taking the important thing that i want like the search result or map display.
This will involve quite a lot of learning on your part: you'll have to learn HTML, the DOM, JavaScript, and iOS/UIWebView.
Let's leave the live refresh part for now; I'll post another answer or an edit about that later on.
That's not going to be easy either (check out my earlier posting today, "iOS Run Code Once a Day", on background execution issues that will affect you unless the update only takes place in the foreground).
You will have to do something like this. Note that I've never tried this, nor seen postings here from people who have. In theory it should work, but there will be a lot of learning, as I've said, and lots of trial and error. It's a big task when you're not familiar with these things.
1) Download the HTML page and load it in a UIWebView, but keep that UIWebView hidden so users can't see it.
2) When the page has loaded, its DOM will be accessible.
3) You can use Javascript to access the DOM and look for the parts you want.
How you inject and run the Javascript in UIWebView can be answered in a separate question (this answer will get too long if all the exact details are included).
4) Remove the parts of the DOM you are not interested in. Or use events to make only the parts you are interested in appear; jQuery can probably help here.
5) Display the UIWebView
Alternatively, the HTML could be saved to a file and string parsing could be used to search for the bits you are looking for and create a new HTML file from them. I think this would get very messy; it is better to take advantage of the fact that UIWebView will parse the HTML page and create the DOM for you.
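To make the "keep only the part you want" idea concrete, here is a language-neutral sketch in Python rather than iOS code. It lets an HTML parser do the work instead of string searching and keeps only the element with a given id. It assumes the News section is wrapped in an element with `id="news"`, which is a guess about the page, and that tags inside the section are properly closed. The names `SectionExtractor` and `extract_section` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SectionExtractor(HTMLParser):
    """Re-emits only the element with the given id, including its children."""

    def __init__(self, section_id):
        super().__init__()
        self.section_id = section_id
        self.depth = 0      # 0 means we are outside the wanted section
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.chunks.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag not in VOID_TAGS and dict(attrs).get("id") == self.section_id:
            self.chunks.append(self.get_starttag_text())
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.chunks.append("</%s>" % tag)
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            # The parser hands over unescaped text, so escape it again.
            self.chunks.append(escape(data, quote=False))


def extract_section(html, section_id="news"):
    """Return the HTML of the element with this id, or '' if it is absent."""
    parser = SectionExtractor(section_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

The returned fragment, links to the PDFs included, can then be placed into your own HTML file. Re-running the extraction on a timer would give the periodic refresh, subject to the background execution limits mentioned above.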

dynamic sitemap : xml or html

I've created a website with dynamic content, and I want Google to know about all my pages, so I've submitted a file "mysitemap.xml" via Webmaster Tools.
Basically, my links are like mysite.com/one-id/one-name , with one-id an id between 1 and 2000 (but will be greater with the time...).
I'm wondering if I need to create a page on my website (a kind of html sitemap), which will list all these links to help google bots to find my web pages, or is it enough for google to have the xml sitemap?
The problem is that the html sitemap will be very ugly and only a "for google" page, so I want to avoid this...
No, Google only requires a sitemap.
Sitemap is for Search Engines and Navigation is for humans.
Sitemap includes page's content type, update frequency, last modified, etc.
Navigation may include dropdown menus, hyperlinks, etc.
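Since the pages are dynamic, the XML sitemap can be generated from the same data that drives the site. A minimal sketch, assuming the pages are available as (id, name, last-modified date) tuples and that the site lives at `http://mysite.com`; the function name `build_sitemap` and the input shape are hypothetical, while the `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, base_url="http://mysite.com"):
    """Build sitemap XML for URLs of the form base_url/<id>/<name>.

    pages: iterable of (page_id, name, lastmod) tuples, where lastmod is a
    'YYYY-MM-DD' string or None (hypothetical input shape).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_id, name, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = "%s/%s/%s" % (base_url, page_id, name)
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    (1, "first-page", "2014-01-01"),
    (2, "second-page", None),
])
print(sitemap_xml)
```

Regenerating the file whenever pages are added keeps it current as the ids grow. The sitemap protocol limits a single file to 50,000 URLs, so a much larger site would need several files and a sitemap index.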