It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
What does "indexing" mean? How is it useful to a web crawler?
Internal indexing in a database and the index that a web crawler looks at are two different concepts.
Indexing in a database is a method to make looking up records by certain columns faster.
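As a rough illustration of the database case, the sketch below uses Python's built-in `sqlite3` module and asks SQLite for its query plan before and after an index is created. The `pages` table, its columns, and the index name `idx_pages_url` are made up for this example.

```python
import sqlite3

# Hypothetical table of pages, looked up by URL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO pages (url, title) VALUES (?, ?)",
    [(f"http://example.com/{i}", f"Page {i}") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan for a query as one string (the last column is the detail text)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT title FROM pages WHERE url = ?"
args = ("http://example.com/500",)

# Without an index on url, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup, args)

# With an index on url, it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_pages_url ON pages (url)")
plan_after = query_plan(conn, lookup, args)

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should describe a table scan and the second should mention the index.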
Indexing in the context of web crawlers involves storing web pages and building an index based on what's in them so as to access them by content more easily.
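For the web crawler case, one common way to build such an index is an inverted index, which maps each word to the pages that contain it. The sketch below is a toy version under that assumption; the page texts, URLs, and function names are invented for illustration and real search engines do far more (ranking, stemming, compression).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical crawled pages: URL -> extracted text.
pages = {
    "http://example.com/a": "Web crawlers fetch pages and follow links",
    "http://example.com/b": "A database index speeds up lookups",
    "http://example.com/c": "Crawlers build an index of page content",
}
index = build_index(pages)

print(sorted(search(index, "index")))
print(sorted(search(index, "crawlers index")))
```

Looking up a word is then a dictionary access instead of a scan over every stored page.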
http://en.wikipedia.org/wiki/Index_(database)
http://en.wikipedia.org/wiki/Web_indexing
Related
Closed 10 years ago.
Is it faster to search your web page content if the content is stored in HTML files or in a SQL database?
It depends on how you want to search through it. I would prefer to store it in a database and implement full-text search.
Closed 10 years ago.
For a site like Facebook, will there be a separate table for each user, or are all users stored in a single table? Which is more efficient?
SQL doesn't work like that. The paradigm is to have tables that store information about groups of entities, such as companies, people, compact discs. Having a table per entity, i.e. one table for every user, doesn't really make sense, and would be very hard to use.
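A minimal sketch of that layout, using Python's built-in `sqlite3`: one `users` table for all users and one `posts` table whose rows point back to their owner. The table names, column names, and sample rows are assumptions made for the example, not a description of how any real site stores its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table for the whole group of entities, one row per user.
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Per-user data lives in shared tables and refers to the user by id.
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body    TEXT NOT NULL
    );
    -- An index on user_id keeps "all posts of one user" fast.
    CREATE INDEX idx_posts_user ON posts (user_id);
    """
)

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO posts (user_id, body) VALUES (?, ?)",
    [(1, "hello"), (2, "hi"), (1, "again")],
)

# Fetching one user's data is a filtered query, not a separate table.
alice_posts = [
    row[0]
    for row in conn.execute("SELECT body FROM posts WHERE user_id = ? ORDER BY id", (1,))
]
print(alice_posts)
```

Adding a new user is then a single `INSERT`, with no schema change.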
Closed 10 years ago.
How can one efficiently store the WebGraph in a relational database such as MySQL for playing with algorithms like PageRank? I am thinking of creating two tables: one for URLs, where only distinct URLs are stored, and an outgoing-links table that stores, for each URL, the URLs it links to. Any ideas or suggestions for efficient storage?
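The two-table layout described in the question could look roughly like the sketch below. It uses Python's built-in `sqlite3` rather than MySQL so that it runs without a server; the table names (`urls`, `links`), the helper functions, the tiny sample graph, and the damping factor of 0.85 are all assumptions for illustration, and the PageRank loop is a plain in-memory power iteration, not a tuned implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE urls (
        id  INTEGER PRIMARY KEY,
        url TEXT NOT NULL UNIQUE          -- each distinct URL stored once
    );
    CREATE TABLE links (
        src INTEGER NOT NULL REFERENCES urls(id),
        dst INTEGER NOT NULL REFERENCES urls(id),
        PRIMARY KEY (src, dst)            -- one row per outgoing link
    );
    """
)


def url_id(url):
    """Return the id for a URL, inserting it first if it is new."""
    conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    return conn.execute("SELECT id FROM urls WHERE url = ?", (url,)).fetchone()[0]


def add_link(src, dst):
    conn.execute("INSERT OR IGNORE INTO links (src, dst) VALUES (?, ?)", (url_id(src), url_id(dst)))


def pagerank(damping=0.85, iterations=50):
    """Load the graph from the tables and run a simple power iteration."""
    names = dict(conn.execute("SELECT id, url FROM urls"))
    out = {node: [] for node in names}
    for src, dst in conn.execute("SELECT src, dst FROM links"):
        out[src].append(dst)
    n = len(names)
    rank = {node: 1.0 / n for node in names}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in names if not out[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in names}
        for src, targets in out.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return {names[node]: score for node, score in rank.items()}


# Hypothetical four-link graph: a -> b, a -> c, b -> c, c -> a.
for s, d in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]:
    add_link(s, d)

ranks = pagerank()
print(max(ranks, key=ranks.get))
```

Storing integer ids in `links` instead of full URL strings keeps the edge table small, which matters because it is by far the larger of the two tables.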
There are specific databases that were created for such a purpose. Take a look at
http://neo4j.org/
https://launchpad.net/giraffedb
Closed 11 years ago.
This is a general question about the "big" database players.
I want to know how these DBMSs manage deleted records. In particular: do they free the space and reuse it?
Hope not to be off topic.
Thank you!
In general, they mark a deleted record as deleted, and they keep track of how much free space is available (and where), and will reuse the storage eventually (for new or updated records), but not necessarily in a timely manner.
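The general idea can be shown with a toy model. The class below is not how any particular DBMS is implemented; it is a simplified sketch, with invented names, of a storage file that marks deleted records, remembers which slots are free, and reuses them for later inserts instead of shrinking.

```python
class RecordFile:
    """Toy record store: deleted slots are tracked in a free list and reused."""

    def __init__(self):
        self.slots = []        # slot number -> record, or None if deleted
        self.free_slots = []   # slot numbers available for reuse

    def insert(self, record):
        """Store a record, preferring a previously freed slot over growing the file."""
        if self.free_slots:
            slot = self.free_slots.pop()
            self.slots[slot] = record
        else:
            slot = len(self.slots)
            self.slots.append(record)
        return slot

    def delete(self, slot):
        """Mark a record as deleted; the space stays allocated until it is reused."""
        if self.slots[slot] is None:
            raise KeyError(f"slot {slot} is already deleted")
        self.slots[slot] = None
        self.free_slots.append(slot)

    def size(self):
        """Number of allocated slots, including deleted ones."""
        return len(self.slots)


f = RecordFile()
a = f.insert("record a")
b = f.insert("record b")
c = f.insert("record c")

f.delete(b)
size_after_delete = f.size()    # the file does not shrink on delete

d = f.insert("record d")        # the freed slot is reused
size_after_reuse = f.size()
print(size_after_delete, d, size_after_reuse)
```

This also shows why a file can stay large after a big delete: the space is only given back to new records, not to the operating system, until some separate compaction step runs.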
Closed 9 years ago.
When I read about cloud computing, I often see these terms: on-premise and off-premise applications. I tried searching for them on Google, but had no luck. Can anyone please explain these terms to me?
On premises means on location, whereas off premises means remote (in the cloud). For instance, if an application runs on an "on-premises" server, the server is physically located at the company. An off-premises solution is hosted in the cloud or at a centralized location.
You can always check Wikipedia for explanations of technical terms:
http://en.wikipedia.org/wiki/On-premises_software
http://en.wikipedia.org/wiki/Cloud_computing