How does Google crawl a dynamic page?

How does Google crawl a dynamic page? - html

I specifically mean a dynamic page that depends on GET variable. My website holds most of the data in a database, and depending on GET variable, it prints different results. How do I make it so that Google can see all results in database and index them?

How does Google crawl a dynamic page?
The same way as any other page
I specifically mean a dynamic page that depends on GET variable.
If you have a single variable, then it probably won't cause any issues. Having lots of parameters in the query string can cause Google to decide it is probably not a useful page to crawl.
How do I make it so that Google can see all results in database and index them??
Link to them.

To have google index all of your results, create a sitemap.xml of all your links and place it at the root of your website. If you have a lot of links / pages to set, maybe make a function in php that automatically makes the sitemap.xml. There are plugins to wordpress that do this, maybe download one and have a look at the source if you need an example.

Related

Using a database to store and get html pages for website

I've got a lack of understanding at the moment. I'm developing a website with many articles and instead of creating a .html page for every article, I thought about storing the text into a database and get it from there (somehow) again.
I'm totally unsure if it is the common way to store text in a database. How do all of the "big" websites handle the mass of articles they publish? They won't create single pages neither but instead using a database, I guess.
But how can I achieve this? How can I store whole html files with divs and jquery and stuff into a database and get them when clicking on a link? Might XML be a keyword?

First of all, you need to clearly understand how things should work.
Clearly the approach of creating a page per article cannot work for multiple reasons:
if you have a huge number of articles you'll need to have a huge number of pages
if you need to change something small in design, you'll need to make that change for every single stored article
What you need to do is to create a more generic page, which has all the common stuff for all articles in it (a place for title, a place for content). The articles themselves can be stored in a database. When opening a page for a specific article, your application should place the title and content in the right place in that page.
This approach is universal _ it will work for any number of articles.

The keywords you are looking for are : Dynamic, Content Management.
In order to achieve this, you should learn a scripting language, PHP for example.
You will find a lot of tutorials to get started and how to make your website a bit more dynamic.
But you were right about the database part, most blogging systems and other content providers use databases to store all of this in data tables. PHP (and some other languages) would allow you to interface the database and the content you provide to your users.

You should look into using a web development framework like ruby on rails. Rails has templating that essentially let's you define variables inside of your html (e.g. "text of article").
As for storing the text of the article, the way I do things like that is to store them in a file on my server and then fetch that file using AJAX and then insert into an html file.

Most sites accomplish this by having templates, in which the common-to-every-page html is stored in a file. Page-specific data (article text, etc.) is stored in the database and "inserted" into the relevant parts of the template before returning to the client.

download word press and check how it work! it will help you
http://wordpress.org/download/

Preemptively getting pages with HTML5 offline manifest or just their data

Background
I have a (glorified) CRUD application that I'd like to enable HTML5 offline support with. The cache-manifest system looks simple yet powerful, but I'm curious about how I can allow users to access data while offline.
For example, suppose I have these pages for the entity "Case" (i.e. this is CRM case-management software):
http://myapplication.com/Case
http://myapplication.com/Case/{id}
http://myapplication.com/Case/Create
The first URI contains a paged listing of all cases, using the querystring parameters pageIndex and pageSize, e.g. /Case?pageIndex=2&pageSize=20.
The second URI is the template for editing individual cases, e.g. /Case/1 or /Case/56.
Finally, /Case/Create is the form used to create cases.
The Problem
I would like all three to be available offline.
/Case
The simple way would be to add /Case to the cache-manifest, however that would break paging (as the links wouldn't work).
I think I could instead add something like /Case/AllData which is an XML resource, which is cached and if offline then a script on /Case would use this XML data to populate the list and provide for pagination.
If I go for the latter, how can I have this XML data stored in the in-browser SQL database instead of as a cached resource? I think using the SQL database would be more resilient.
/Case/{id}
This is more complicated. There is the simple solution of manually adding /Case/1, /Case/2, /Case/3 etc... to /Case/1234, but there can be hundreds or even thousands of cases so this isn't very practical.
I think the system should provide access to the 30 most recent cases, for example. As above, how can I store this data in the database?
Also, how would this work? If I don't explicitly add /Case/34 to the manifest and the user clicks on to /Case/34 how can I get the browser to load a page that my JavaScript will populate based on the browser's SQL database data and not display the offline message?
/Case/Create
This one is more simple - as it's just an empty page and on the <form>'s submit action my script would detect if it's offline, and if it is offline then it would add it to the browser's SQL database. Does this sound okay?
Thanks!

I think you need to be looking at a LocalStorage database (though it does have some downsides), but there are other alternatives such as WebSQL and IndexedDB.
Also I don't think you should be using numeric Id's if you are allowing people to create as you will get Primary Key conflicts, it is probably best to use something like a GUID.
Another thing you need is the ability to push those new cases onto the server. there could be multiple...
Can they be edited? If they can I think you really need to be thinking about synchronization and conflict resolution hard very hard if that is the case.
Shameless self promotion, I have a project that is designed to handle these very issues, though it's not done, it's close. You can see it (with an ugly but very functional) demo at https://github.com/forbesmyester/SyncIt

What does it mean to "Index a page"?

For example in the sentence: "This tells Google how to index the page" what does Index the page mean in the grand scheme of things. Why would a page have an 'index.' What is it useful for?

Google servers are constantly visiting pages on the Internet (crawling) and reading their contents. Based on the contents Google builds an internal index, which is basically a data structure mapping from keywords to pages containing them (very simplified). Also when the crawler discovers hyperlinks, it will follow them and repeat the process on linked pages. This process happens all the time on thousands of servers.
In general, the term indexing means analyzing large amounts of data and building some sort of index to access the data in more efficient way based on some search criteria. Compare it with database indexes.

i guess you are asking the question of whats the need for indexing with google? Here it is why?
After creating a website that is very beautiful and have all good features. But as i guess You would have know that web is all about connecting the Webpages! And you have created a site, in which you can only look at it. If the world want to know about your site, the next step will be hosting! After that obviously you have to do index your webpage to any search engine, say for example google. Now your site will be indexed according to the google bot, i cant explain how bot works! And if the person searching your site name in any engine then that engine with the help of indexing can retrive your page as the result :) This is how you connect to the WEB!

This simply means Google is reading your page, figuring out what content is on it (via the page structure, links, etc.) assigning a page rank to it, among other things, and adding it to their database.
There is no specific terminology here.
See Web Crawler: http://en.wikipedia.org/wiki/Web_crawler

In short Index page this is page that originate from table of content that help to search materials in older to access data or information within the given basket of data that can be book or web-page easily.

Best Way To Partial Search in SQL 2008

I've looked into SQL 2008's built-in Full-Text search, and also Lucene.NET.. but I don't think they'll do what I need to do. And I just want to make sure I'm building my program as efficient as possible.
So here's the dream. I want to have a single textbox on a page (like google) and allow the user to enter ANYTHING in. And based on their text, I will search 10's of tables to find what they're looking for.
Example. My database contains thousands of locations, each of which have multiple names / codes. Within each location, there is tonnes of data associated with them.
So if the user wants to display all the locations with the codes that contain "VM" ("CD-VM01", "CD-VM02", "CD-VM03", etc).. they should be able to. Or if they want to find all the locations in Toronto, they just type Toronto.. I want to make the search as easy as possible for people. (I've found that people don't like thinking)..
Plus it ends up being easier to scale to more search options if I can just search the database, and not have to add new fields to a search screen.
So if I don't use Full Text search (which I can't for partial) the only thing I can see that i'm left with is "Like" .. is that right? is that my only option?

I guess the question is, even if you were able to do this in the database, how would you handle it in the UI?
Most likely every search result from a different table will have different attributes that need to be displayed in order for the end user to understand what it is.
The Google search box only needs to search one thing - the content of web pages - and return one type of result - web page URLs and excerpts. Fundamentally you are trying to search for many different things, and so you'll most likely need to handle each case separately.
Alternatively, you could maintain a denormalized search table that contains only the search text and the common attributes you think need to be displayed with each hit. Maintain it either with a scheduled task or with triggers. You'd be able to use FTS on this as well.
Update
Some of the comments express some uncertainty over what SQL Server Full-Text Search is capable of. FTS can most definitely search for a single string anywhere within the text of a column, and can do other things as well (proximity search, free-text search, etc.) If you're just getting started then I'd recommend the TechNet pages on the subject, the documentation is very comprehensive.
In particular I'd suggest having a look at the section on Configuring Catalogs and the Getting Started page (Cole's Notes: you have to create catalogs - writing CONTAINS queries without them won't get you very far). Then take a look at the querying page. I'd be very surprised if you can't find answers to any and all of your questions there.
If you still can't get it to work, I would post a new question with the specifics of your problem - what you've tried, what you're expecting, and what's happening instead.

I believe Lucene does exactly what you're looking for. You can add an index from any external data source (including multiple database tables), then query that index and you'll get back pointers to the matching records.
The drawback is that unlike with full-text indexing, you're responsible for building and maintaining the index yourself.
You can see an example of how Lucene.NET might be used.

It appears that the easiest / quickest solution for this exact problem would be to use LIKE.

Opening MySql database to search engines

Most of my content on my web application gets stored in MySql database. I want to open this content for search engine to index it.
What is the best solution to do this.
Best could be either performance oriented or ease of implementation.
Thanks in advance!

You can also create a sitemaps xml file that could sit at example.com/sitemaps.xml and contain a dump of all blog posts, products, user profiles etc etc in a format google can understand (more so than a normal webpage).
You can also ping a url to tell google to come check your sitemap whenever you add or edit content.

Assuming you are talking about web based search engines (such as Google), then they index webpages.
Make webpages for all entries in the database and link to them.

Like David said, a webpage should be available for each resource. Not only to force indexing, but also as a "landing page" to which the search result will then direct you. This can then of course be a redirect to another page.
The pages can be dynamic of course, but make sure that they are reference somewhere on your site so the spiders can reach them.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008