I am really wondering if I can use search on an HTML website. The pages are static. I just want users to be able to search the contents of my site, with the results shown within my site itself. Is there any way I can achieve this? I can use PHP on my server.
Google search can be implemented, but it takes you to Google's page to show the results.
You're better off not creating your own search engine - there are plenty of good ones that can be integrated into your site, and they will be better than anything you could write yourself.
Google is the most popular search engine, so you might as well use that. As an alternative to customising the HTML results page, you could use the Google AJAX Search API - it performs your search and inserts the results into a specific element on your page. (Don't forget about people with JavaScript turned off, however...)
I like easy and fast, so consider Google Custom Search.
Possibilities are:
Database Extraction (http://www.ibm.com/developerworks/library/os-php-sphinxsearch/)
Search Indexer (http://framework.zend.com/manual/en/zend.search.lucene.html#zend.search.lucene.overview)
Custom Search Crawler (see the sketch below)
As for customising Google Custom Search, see: http://googlecustomsearch.blogspot.com/2009/10/structured-custom-search.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FSyga+%28Google+Custom+Search%29
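For the third option, here is a toy sketch of a custom crawler in PHP - just enough to show the idea. The start URL and link pattern are illustrative; a real crawler would also need robots.txt handling, politeness delays, and proper HTML parsing:

<?php
// A toy crawler: fetch pages breadth-first and record each page's visible
// text for later indexing. Start URL and link pattern are illustrative.
$queue = ['http://example.com/'];
$seen = [];
$index = []; // url => plain text, ready to be indexed

while ($queue && count($seen) < 50) { // hard cap, since this is a sketch
    $url = array_shift($queue);
    if (isset($seen[$url])) {
        continue;
    }
    $seen[$url] = true;

    $html = @file_get_contents($url);
    if ($html === false) {
        continue;
    }
    $index[$url] = strip_tags($html);

    // Enqueue same-site links found on the page.
    if (preg_match_all('/href="(http:\/\/example\.com\/[^"]*)"/', $html, $m)) {
        foreach ($m[1] as $link) {
            $queue[] = $link;
        }
    }
}

Once you have $index, either of the first two options (a database-backed full-text search or a Lucene-style indexer) can take over the actual searching.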
I think you can customize Google's result page.
Well, if you have PHP available, I would definitely suggest using that. If I were you, I would go through some PHP tutorials and learn the basics.
W3Schools has some great tutorials.
Then, I would do some searches on building a text-based database for your site, or use a clever solution like this one. You can build a small database with metadata, store it in a text file, and that should get you going. Good luck.
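To make that concrete, here is a minimal sketch of the text-file approach, assuming a hypothetical index.txt where each line holds a page URL, a title, and some keywords, separated by tabs:

<?php
// search.php - a minimal sketch of searching a tab-separated metadata file.
// Assumes a hypothetical index.txt with one page per line, e.g.:
//   /about.html<TAB>About Us<TAB>company history team contact
$query = strtolower(trim($_GET['q'] ?? ''));

if ($query !== '') {
    foreach (file('index.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $parts = explode("\t", $line);
        if (count($parts) < 3) {
            continue; // skip malformed lines
        }
        list($url, $title, $keywords) = $parts;
        // Match the query against the title and the keyword list.
        if (strpos(strtolower($title . ' ' . $keywords), $query) !== false) {
            printf('<a href="%s">%s</a><br>',
                htmlspecialchars($url), htmlspecialchars($title));
        }
    }
}

Because the file only stores metadata rather than full page bodies, a linear scan like this stays fast enough for a modest static site.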
I'm currently creating a website for a school project, and I would like to add a search bar that lets visitors look for a word across the website. It would work like a traditional Ctrl+F (Command+F), but over every HTML file on the site, and then present the results either in a pop-up or on a different page.
I believe this requires specific software, but I don't know which, and I'm pretty sure I don't know how to use it.
Thanks for any insight on how to do that!
You have plenty of options. The particular choice will depend on your site's specifics (is the content static or dynamic, etc.). Either way, you will need to build a search index. I suggest using something almost ready to go: see Elasticsearch and a crawler for it.
One easy way would be to use Google Custom Search if the website will be online.
Another option would be to build or use an existing PHP search script. In this case your web host must support PHP. For static web pages this is not the best option, as the script has to search through all the files every time something is searched (see the sketch below).
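To illustrate the drawback, here is a minimal sketch of such a brute-force script; the directory and file pattern are illustrative:

<?php
// naive-search.php - a minimal sketch of a brute-force search over static
// pages. It re-reads every HTML file on each request, which is exactly why
// this approach scales poorly.
$query = trim($_GET['q'] ?? '');

if ($query !== '') {
    foreach (glob(__DIR__ . '/*.html') as $path) {
        $text = strip_tags(file_get_contents($path));
        if (stripos($text, $query) !== false) {
            $name = htmlspecialchars(basename($path));
            echo '<a href="' . $name . '">' . $name . '</a><br>';
        }
    }
}

Every search request costs a full read of the whole site, so with many pages an index-based solution quickly becomes the better choice.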
It will be a lot easier if you use a CMS (Content Management System) to build your site; they usually have search functionality included or available as an add-on.
For more information, also check out this post.
I am trying to create a web comic aggregation website using HTML5, CSS3, and JavaScript. I would like users to be able to view comics of different dates from different websites, all in one place. After some research, it seems I'm probably going to need to use an RSS feed to accomplish this. However, I don't fully understand the capabilities and usage of an RSS feed.
First, would it be possible to pull images from comic websites in an automated and orderly fashion using an RSS feed? Or would I need to use something else? Or would it not be possible at all? If it is possible with an RSS feed, I'm also somewhat confused about the general implementation. Would the code for the feed be in HTML, or JavaScript, or both? Would I need to use existing libraries and/or APIs? Is there existing code with a similar enough function that I could use as a starting point?
Thanks
You are headed in the right direction - RSS is a standard format used to update users/readers about newly published content.
I'm sure you've searched it already, but its Wikipedia page is quite informative. Basically, it is a standardisation and extension of XML, allowing for a uniform way to distribute and read material in an automated fashion.
There are other formats along the same lines, such as Atom.
So, for your purpose, the main thing to understand is that you want to READ RSS feeds rather than write/make one (although you could make one as well, combining the comics you've found). For example, at the bottom of xkcd you can see two links - one for an RSS feed and another for an Atom feed. You need to find websites like that, which publish RSS/Atom feeds of comic strips, and write your site to read their feeds and update itself with the new content. You could even automate the way your site discovers feeds by finding (or creating) a feed of comic feeds - your site would look up this one feed, which would contain links to other feeds that are all relevant to you.
You could also put up a backend on a server that would fetch the feeds and update a database from which the front-end would fetch the content through one linking point, but let's stick with the technologies you've mentioned - a client-based website - for now.
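If you do go the backend route one day, fetching a feed server-side takes only a few lines. A minimal PHP sketch, pointed at xkcd's public RSS feed, with error handling kept to a bare minimum:

<?php
// fetch-feed.php - a minimal sketch of the server-side option: pull a feed
// and print each entry. simplexml_load_file parses the RSS XML directly.
$feed = simplexml_load_file('https://xkcd.com/rss.xml');
if ($feed === false) {
    die('Could not load the feed.');
}

foreach ($feed->channel->item as $item) {
    echo '<h2><a href="' . htmlspecialchars((string) $item->link) . '">'
        . htmlspecialchars((string) $item->title) . '</a></h2>';
    // In many comic feeds (xkcd included) the image is embedded as HTML
    // inside <description>, so echoing it renders the comic.
    echo $item->description;
}

In real use you would cache the fetched entries in a database or file instead of hitting the feed on every page view.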
To read and parse the feeds, you can look at the answer here, which recommends jFeed, a plugin for jQuery (jQuery is a very popular JavaScript library, if you don't know it).
I'm pretty sure that answers your questions, but let's address them again, breaking them down one by one:
would it be possible to pull images from comic websites in an automated and orderly fashion using an RSS feed?
Yes! As you can see in the feed of xkcd I've linked above, it is both possible and widely used to pull/distribute images using RSS (and Atom) feeds.
would I need to use something else?
You could use Atom, which is a different standard but much the same idea (also an extension of XML, and jFeed can still read it).
would it not be possible at all?
It is possible. Do not worry. Stay calm and code away.
If it is possible with an RSS feed, I'm also confused somewhat about the general implementation. Would the code for the feed be in HTML, or JavaScript, or both?
Do not confuse the feed's code with yours. Your code should READ the feed, not be it. The feed itself, as explained above, is written in a standard form of XML called RSS (or Atom, if you go with that). That is what your code reads. For your own code, see the next question/answer.
Would I need to use existing libraries and/or APIs? Is there existing code with a similar enough function that I could use as a starting point?
As mentioned above, you can use jQuery and its jFeed plugin.
Hope that helps and is not confusing.
I want to add all of the websites indexed by Google to a Google Custom Search engine. If I tried to add the sites manually from the Google Custom Search control panel, it would take more than my lifetime to include all of them. Is there any way I can do that, with any trick? Thanks in advance.
I don't think there is a solution to your question. Here is why:
Google provides their search tool to enable people to search their own site and a list of relevant sites of their choosing. However, if Google allowed you to use their entire search tool on your site, there would be no reason to visit Google anymore, and their advertising revenue would plummet.
One thing you could consider is choosing the sites you want to add, and then creating a script that automatically adds more sites based on what your users search for.
Where it says "Sites to search", don't go into the submenus for adding specific sites. There is a dropdown off to the side that will change from "Search Only Included Sites" to "Search Entire Web but emphasize included sites".
I think this may be what you're looking for:
http://www.google.com/cse/
According to Google, searching the entire web is not possible:
http://support.google.com/customsearch/bin/answer.py?hl=en&answer=1210656
However, it is possible to include every known top-level domain in the search engine. According to:
http://support.google.com/customsearch/bin/answer.py?hl=en&answer=70322
the site patterns would look like *.com, *.ch, *.pl, and so on for each of the many TLDs listed here: http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
More importantly, if Google forbids whole-web search, then any hack around it won't last long.
I'm trying to find websites that have the Disqus commenting platform implemented (see a post on avc.com for reference). All Disqus comments are contained within a div with the id "disqus_thread". I've tried searching for words that appear within Disqus' comments interface, such as "Real-time updating is" and "Subscribe by email", but it appears Google doesn't index those words.
Is there a search engine for markup, or an easy way to quickly scrape many sites for specific markup? Thanks.
I am currently building a tool capable of that. It is based on Selenium, which you could also use for your goal, though that would involve some development effort. At the moment, however, I am not aware of a search engine capable of this.
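If you already have a list of candidate URLs, a small script can run the check for you. A minimal PHP sketch - the URL list is illustrative:

<?php
// check-disqus.php - a minimal sketch: fetch each page and look for the
// div with id="disqus_thread" that Disqus renders its comments into.
// The URL list is illustrative.
$urls = ['http://avc.com/', 'http://example.com/'];

foreach ($urls as $url) {
    $html = @file_get_contents($url);
    if ($html === false) {
        echo $url . " - could not fetch\n";
        continue;
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from messy real-world markup
    $xpath = new DOMXPath($doc);
    $found = $xpath->query('//div[@id="disqus_thread"]')->length > 0;
    echo $url . ($found ? " - uses Disqus\n" : " - no disqus_thread div\n");
}

For more than a handful of sites you would want a Selenium-style tool as mentioned above, since pages that build their markup with JavaScript won't show the div to a plain HTTP fetch.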
I'm looking for ways to prevent parts of a page from being indexed - specifically, comments, since they heavily skew what an entry appears to be about based on what users have written. This makes Google searches against the site return lots of irrelevant pages.
Here are the options I'm considering so far:
1) Load comments using JavaScript to prevent search engines from seeing them.
2) Use user agent sniffing to simply not output comments for crawlers.
3) Use search-engine-specific markup to hide parts of the page. This solution seems quirky at best, though. Allegedly, this can be done to prevent Yahoo! from indexing specific content:
<div class="robots-nocontent">
This content will not be indexed!
</div>
That is a very ugly way to do it. I read about a Google solution that looks better, but I believe it only works with the Google Search Appliance (can someone confirm this?):
<!--googleoff: all-->
This content will not be indexed!
<!--googleon: all-->
Does anyone have other methods to recommend? Which of the three above would be the best way to go? Personally, I'm leaning towards #2: while it might not work for all search engines, it's easy to target the biggest ones, and it has no side effects for users unless they're deliberately trying to impersonate a web crawler.
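For reference, a rough sketch of what #2 might look like in PHP (the bot list is illustrative, not exhaustive):

<?php
// A minimal sketch of option #2: skip rendering the comments when the
// User-Agent looks like a known crawler. The bot list is illustrative.
$bots = ['Googlebot', 'Slurp', 'msnbot', 'bingbot'];
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

$isCrawler = false;
foreach ($bots as $bot) {
    if (stripos($ua, $bot) !== false) {
        $isCrawler = true;
        break;
    }
}

if (!$isCrawler) {
    include 'comments.php'; // hypothetical template that renders the comments
}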
I would go with your JavaScript option. It has two advantages:
1) Bots don't see it.
2) It speeds up your page load time (load the comments asynchronously and unobtrusively, e.g. via jQuery)... page load times have a much-underrated positive effect on your search rankings.
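A minimal sketch of the server side of that, assuming a hypothetical comments.php endpoint that the page pulls in with jQuery's .load() (the hard-coded comments stand in for a real database query):

<?php
// comments.php - a minimal sketch of an endpoint returning the comment
// markup for one entry, fetched asynchronously by the page, e.g.:
//   $('#comments').load('comments.php?entry=42');
// $entryId would select which entry's comments to load; the hard-coded
// comments below are illustrative stand-ins for a database query.
$entryId = (int) ($_GET['entry'] ?? 0);
$comments = [
    ['author' => 'alice', 'text' => 'Great post!'],
    ['author' => 'bob',   'text' => 'Thanks for sharing.'],
];

foreach ($comments as $c) {
    echo '<div class="comment"><strong>'
        . htmlspecialchars($c['author']) . '</strong>: '
        . htmlspecialchars($c['text']) . '</div>';
}

Since the comment markup only arrives via the script, most crawlers never see it (though note the caveat in the next answer about engines getting better at executing JavaScript).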
JavaScript is an option, but engines are getting better at reading JavaScript. To be honest, I think you're reading too much into this. Engines love unique content: the more content you have on each page, the better, and if the users are providing it... it's the holy grail.
Just because a commenter made a reference to Star Wars on your toaster review doesn't mean you're not going to rank for the toaster model; it just means you might also rank for "Star Wars toaster".
Another idea: you could show comments only to people who are logged in. CollegeHumor does the same, I believe - they show the number of comments a post has, but you have to log in to see them.
googleoff and googleon are for the Google Search Appliance, which is a search engine Google sells to companies that need to search through their own internal documents. They have no effect on the live Google site.
I think number 1 is actually the best solution. Search engines don't like it when you serve them different material than you serve your users, so number 2 could get you kicked out of the search listings altogether.
This is the first I have heard of search engines providing a method for informing them that part of a page is irrelevant.
Google does have a feature that lets webmasters declare which parts of their site a search engine should use to find pages when crawling:
http://www.google.com/webmasters/
http://www.sitemaps.org/protocol.php
You might be able to relatively de-emphasize some things on the page by specifying the most relevant keywords using META tag(s) in the HEAD section of your HTML pages. I think that is more in line with the engineering philosophy used to architect search engines in the first place.
Look at Google's Search Engine Optimization tips. They spell out clearly what they will and will not let you do to influence how they index your site.