I am making a website that uses content from Wikipedia, so I need a copy of the Wikipedia database on my localhost. How can I get it?
I cannot find any SQL file. Also, what is the difference between a database dump and a .sql database file?
Wikipedia do not expose their base databases to the public, so you can't do that.
They also rather object to people 'stealing' large portions of their content - but I don't think that they actually do much about it. (Some content is licenced from original copyright holders. There would be objections about you copying stuff like that; I think it's mostly images though.)
Your only option is to scrape their website by downloading the HTML they generate. This will probably not be very useful and won't make you many friends.
Probably the best thing to do is create an account at Wikipedia, go to something like the Community Portal (link in the top left bar) and then the Village Pump (where you ask questions), and ask them.
A friend of mine and I are in our senior year and will be starting a senior project soon. We had the idea to do a data analysis and data visualization project for it. Our project involves reading a CSV file that is updated every 2 minutes, parsing that data, then storing it in a database. Once that data is stored we want to run some analysis on it and provide an API through which we could access that data to visualize it in some way. Our end goal would be to build an Android app that displays some of the raw data from the CSV and the analysis in a user-friendly format.
I talked to another CS major and he explained that I would need a few different servers to accomplish this: one for storage, another for analysis, and another for some type of queue that would make sure things don't get screwy while we are doing scraping and analysis.
The problem is, I don't really know where to start with this. I've done some work with a SQL database and a PHP front end before, but nothing with multiple servers. I've heard of tools used in big data projects, like Hadoop, but I'm not exactly sure where they fit in. If someone could point me to a resource of some kind that explains how I would start to structure this kind of project, or explain it themselves, that would be awesome!
Since you don't have much experience with these things you'll probably want to look at projects like Cloudera. Specifically their resources page has a nice set of videos and articles.
Another source of solid information (that I personally use) is clicking on a Stack Overflow tag and selecting the votes option. Many good questions on a plethora of big data topics already exist.
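To get a feel for the read, parse, store and analyse steps from the question before worrying about multiple servers, here is a minimal sketch in Python. It assumes SQLite as a stand-in for whatever database you end up with, and the column names `sensor`, `taken_at` and `value` are made up for illustration; your CSV will have its own. It is only a starting point under those assumptions, not a recommended architecture.

```python
import csv
import io
import sqlite3


def ingest_csv(conn, csv_text):
    """Parse CSV text and store each row in the readings table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "sensor TEXT, taken_at TEXT, value REAL, "
        "PRIMARY KEY (sensor, taken_at))"
    )
    rows = [
        (r["sensor"], r["taken_at"], float(r["value"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # INSERT OR REPLACE means polling the same file twice adds no duplicates.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?, ?)", rows)
    return len(rows)


def average_by_sensor(conn):
    """Stand-in for the analysis step: mean value per sensor."""
    cur = conn.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
    )
    return dict(cur.fetchall())


# Hypothetical sample data in place of the real CSV file.
sample = (
    "sensor,taken_at,value\n"
    "a,2024-01-01T00:00,1.0\n"
    "a,2024-01-01T00:02,3.0\n"
    "b,2024-01-01T00:00,5.0\n"
)
conn = sqlite3.connect(":memory:")
ingest_csv(conn, sample)
ingest_csv(conn, sample)  # a second poll of the unchanged file
print(average_by_sensor(conn))
```

In a real setup you would call something like `ingest_csv` on a timer every two minutes and put the query behind whatever API your Android app talks to.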
I've got a lack of understanding at the moment. I'm developing a website with many articles and instead of creating a .html page for every article, I thought about storing the text into a database and get it from there (somehow) again.
I'm totally unsure whether storing text in a database is the common way to do this. How do all of the "big" websites handle the mass of articles they publish? They won't create single pages either, but will use a database instead, I guess.
But how can I achieve this? How can I store whole html files with divs and jquery and stuff into a database and get them when clicking on a link? Might XML be a keyword?
First of all, you need to clearly understand how things should work.
Clearly the approach of creating a page per article cannot work for multiple reasons:
if you have a huge number of articles you'll need to have a huge number of pages
if you need to change something small in design, you'll need to make that change for every single stored article
What you need to do is to create a more generic page, which has all the common stuff for all articles in it (a place for title, a place for content). The articles themselves can be stored in a database. When opening a page for a specific article, your application should place the title and content in the right place in that page.
This approach is universal: it will work for any number of articles.
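The generic-page idea can be sketched roughly like this. It is only an illustration in Python with SQLite; the `articles` table, its columns and the `render_article` function are names I made up, and the same pattern applies in PHP or any other server-side language.

```python
import html
import sqlite3
from string import Template

# One generic page shared by every article; only title and content vary.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    '<body><h1>$title</h1><div class="content">$content</div></body></html>'
)


def render_article(conn, article_id):
    """Fetch one article and place it into the generic page."""
    row = conn.execute(
        "SELECT title, body FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return None  # the caller would turn this into a 404 page
    title, body = row
    # The title is escaped; the body is assumed to be trusted HTML
    # written by the site owner, so it is inserted as-is.
    return PAGE.substitute(title=html.escape(title), content=body)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    ("Cats & Dogs", "<p>Hello</p>"),
)
page = render_article(conn, 1)
print(page)
```

A design change then means editing `PAGE` once, and a new article is just a new row in the table.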
The keywords you are looking for are: Dynamic, Content Management.
In order to achieve this, you should learn a scripting language, PHP for example.
You will find a lot of tutorials to get started and how to make your website a bit more dynamic.
But you were right about the database part: most blogging systems and other content providers use databases to store all of this in data tables. PHP (and some other languages) would allow you to connect the database to the content you serve to your users.
You should look into using a web development framework like Ruby on Rails. Rails has templating that essentially lets you define variables inside of your HTML (e.g. "text of article").
As for storing the text of the article, the way I do things like that is to store it in a file on my server, fetch that file using AJAX, and then insert it into an HTML page.
Most sites accomplish this by having templates, in which the common-to-every-page html is stored in a file. Page-specific data (article text, etc.) is stored in the database and "inserted" into the relevant parts of the template before returning to the client.
Download WordPress and check how it works! It will help you.
http://wordpress.org/download/
I am trying to find a way for a user to come to my site and fill in a form, and when they submit the form a new webpage is made. I want it to create a new webpage in an admin area so I can view what they have submitted without having to trawl through my databases. I am assuming this is possible because the concept is hardly new, but hours of scanning Google have left me empty-handed on any remotely close tutorial or anything of that nature. Perhaps I simply do not know how to word it. I am very new at forms, but I am assuming this has something to do with the form action. Are there tutorials for this that someone can link me to, or can someone give me a quick explanation? I can figure out the work for myself, I just need a point in the right direction. Thank you.
You're going to need to learn about 1) persistent storage (a database), 2) a server-side programming language (HTML is purely for creating the structure of a web page), and ... I dunno, a lot more. I would suggest you actually look at a CMS (content management system) and see if that gets you where you want to go.
Databases don't interact with HTML in that way without some sort of application sitting between the site and the database. It doesn't have to be a PHP application, but something is going to have to store and get data from the db, and something is going to have to dynamically create these pages you want. And that's going to be some sort of programming language -- or a content management system like Drupal.
(Also, don't forget about security, support, etc. You write the app, you have to support it. =)
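To make the "application sitting between the site and the database" a bit more concrete, here is a minimal sketch in Python with SQLite. The `submissions` table, the `name` and `message` form fields and both function names are hypothetical, and the web server part is left out entirely: the functions just take the raw form body and return HTML. Note that the admin page is generated from the database on request rather than saved as a new file per submission.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def save_submission(conn, form_body):
    """Store one URL-encoded form post (what the form action receives)."""
    fields = parse_qs(form_body)
    name = fields.get("name", [""])[0]
    message = fields.get("message", [""])[0]
    # Placeholders (?) keep user input out of the SQL text itself.
    with conn:
        conn.execute(
            "INSERT INTO submissions (name, message) VALUES (?, ?)",
            (name, message),
        )


def render_admin_page(conn):
    """Build the admin view of all submissions on the fly."""
    rows = conn.execute(
        "SELECT id, name, message FROM submissions ORDER BY id"
    ).fetchall()
    # Escape everything users typed before putting it into HTML.
    items = "".join(
        f"<li>#{sid} {html.escape(name)}: {html.escape(message)}</li>"
        for sid, name, message in rows
    )
    return f"<html><body><h1>Submissions</h1><ul>{items}</ul></body></html>"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissions (id INTEGER PRIMARY KEY, name TEXT, message TEXT)"
)
save_submission(conn, "name=Ann&message=Hello+%3Cb%3Eworld%3C%2Fb%3E")
admin_html = render_admin_page(conn)
print(admin_html)
```

A real admin area would also need a login in front of it, which this sketch does not attempt.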
One of my sites is a social networking site running on MySQL. I use postal code and country information to geolocate users using a webservice. This webservice also allows you to download all their many tables of information so that you can access it locally. My site has gotten big enough that I wish to do this now.
My question is, should I create a new database on my site for all of this postal code and country information and all its tables, or should I incorporate those tables into my existing database for my social networking site?
What are the pros/cons either way?
When you're talking about scaling and want to know about other databases like NoSQL, you might find this article interesting: http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
I'd vote in favor of a separate database if you planned to use the data as read-only and put a web service in front of it to access it. Users would search it based on a small handful of parameters (e.g. address info to get lat/lon data).
I'd say put it in the existing database if you planned to JOIN it with other information in your current schema.
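As a rough illustration of the JOIN case, here is a sketch using SQLite from Python. The `users` and `postal_codes` tables and their columns are invented for the example and will not match the web service's actual download format; the same query shape should carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, name TEXT, country TEXT, postal_code TEXT
    );
    CREATE TABLE postal_codes (
        country TEXT, postal_code TEXT, lat REAL, lon REAL,
        PRIMARY KEY (country, postal_code)
    );
    INSERT INTO users (name, country, postal_code)
        VALUES ('ann', 'US', '10001'), ('bob', 'US', '99999');
    INSERT INTO postal_codes VALUES ('US', '10001', 40.75, -73.99);
    """
)


def locate_users(conn):
    """Geolocate every user in one query; unknown codes come back as NULL."""
    return conn.execute(
        "SELECT u.name, p.lat, p.lon FROM users u "
        "LEFT JOIN postal_codes p "
        "ON p.country = u.country AND p.postal_code = u.postal_code "
        "ORDER BY u.name"
    ).fetchall()


located = locate_users(conn)
print(located)
```

With the lookup tables behind a separate read-only service instead, the same result would take one request per user (or a batch call) plus application code to merge the answers.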
It will probably live on the same disk,
so disk space is not an issue.
If you query the tables in a completely separate manner, then there is no impact on the existing site.
If you query things together, then it is easier when everything is in one database.
Overall, administration of one database is easier than two.
I think it's a no-brainer... they go in one DB.
Most of the content in my web application is stored in a MySQL database. I want to open this content up so that search engines can index it.
What is the best way to do this?
Best could be either performance oriented or ease of implementation.
Thanks in advance!
You can also create a sitemap XML file that could sit at example.com/sitemaps.xml and list the URLs of all blog posts, products, user profiles, etc. in a format Google can understand (more so than a normal webpage).
You can also ping a URL to tell Google to come check your sitemap whenever you add or edit content.
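Generating such a sitemap from the database could look roughly like this. It is a sketch in Python with SQLite rather than MySQL; the `posts` table, its `slug` and `updated` columns and the `/posts/` URL layout are assumptions made for the example. The `urlset`, `url`, `loc` and `lastmod` elements and the namespace are the ones defined by the sitemaps.org protocol.

```python
import sqlite3
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(conn, base_url):
    """Return sitemap XML with one <url> entry per row in posts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    query = "SELECT slug, updated FROM posts ORDER BY id"
    for slug, updated in conn.execute(query):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url}/posts/{slug}"
        ET.SubElement(url, "lastmod").text = updated
    return ET.tostring(urlset, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT, updated TEXT)")
conn.executemany(
    "INSERT INTO posts (slug, updated) VALUES (?, ?)",
    [("hello", "2024-01-01"), ("second-post", "2024-02-15")],
)
sitemap_xml = build_sitemap(conn, "https://example.com")
print(sitemap_xml)
```

Each `loc` should point at a page that actually exists on the site, so this goes hand in hand with having one page per database entry.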
Assuming you are talking about web based search engines (such as Google), then they index webpages.
Make webpages for all entries in the database and link to them.
Like David said, a webpage should be available for each resource. Not only to force indexing, but also as a "landing page" to which the search result will then direct you. This can then of course be a redirect to another page.
The pages can be dynamic of course, but make sure that they are referenced somewhere on your site so the spiders can reach them.