Parsing web-site [closed] - html

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
So, I have a website. The links have the following structure: http://example.com/1, http://example.com/2, http://example.com/3, etc. Each of these pages has a simple table. How can I automatically download every single page to my computer? Thanks.
P.S. I know that some of you may tell me to google it. But I don't know what I'm actually looking for (I mean, what to type in the search field).

Use wget (http://www.gnu.org/software/wget/) to scrape the site.

Check out the wget command line tool. It will let you download and save web pages.
Beyond that, your question is too broad for the Stack Overflow community to be of much help.

You could write a simple app that loops through all the URLs and pulls down the HTML. For a Java example, take a look at: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html
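The same loop can be sketched in Python with only the standard library. This is a minimal sketch, assuming the pages really are numbered consecutively under http://example.com/ as in the question; the names `page_url` and `download_pages`, the output folder, and the injectable `fetch` parameter are made up for illustration.

```python
from pathlib import Path
from urllib.request import urlopen


def page_url(base, n):
    """Build the URL of page n, e.g. http://example.com/3."""
    return f"{base.rstrip('/')}/{n}"


def download_pages(base, first, last, out_dir, fetch=None):
    """Save pages first..last (inclusive) as <n>.html files in out_dir."""
    # By default fetch over HTTP; a different fetch function can be passed in
    # (handy for trying the loop without touching the network).
    if fetch is None:
        fetch = lambda url: urlopen(url, timeout=30).read()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for n in range(first, last + 1):
        data = fetch(page_url(base, n))   # raw bytes of the page
        target = out / f"{n}.html"
        target.write_bytes(data)
        saved.append(target)
    return saved
```

Something like `download_pages("http://example.com", 1, 50, "pages")` would then leave `pages/1.html` through `pages/50.html` on disk. The upper bound 50 is a placeholder; the question does not say how many pages there are.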

Related

MediaWiki Portal Creation [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Link to the site with the issue is here.
It looks like I got the portal created correctly. The instructions for this on MediaWiki seem better than on Wikipedia. What I don't get is why the link in the box adds "Template:" to the fullpagename. Because of that, the edit button does not go to the page that contains the content for the box.
Followed instructions here on portal creation.
Got the portal templates from here.
The box shows Template:Portal:Phantom Jump/Intro; it should be Portal:Phantom Jump/Intro.
The code {{{{FULLPAGENAME}}/Intro}} is transformed into {{Portal:Phantom Jump/Intro}}, which is understood as “transclude the template Portal:Phantom Jump/Intro”, since most of the time, when transcluding, you do want to transclude a template. If that's not what you want, you need to override that with a leading colon: {{:{{FULLPAGENAME}}/Intro}} will transclude the page Portal:Phantom Jump/Intro.
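The rule described above can be modelled in a few lines of Python. This is only a toy model of the title lookup, not MediaWiki's actual parser code; the function name `resolve_transclusion` and the default namespace list are assumptions for illustration, and the list deliberately leaves out "Portal" to mirror a wiki where that namespace is not set up.

```python
def resolve_transclusion(target, namespaces=("Template", "Help", "User")):
    """Return the page title that {{target}} would pull in (simplified model)."""
    if target.startswith(":"):
        # A leading colon means: take the title literally,
        # do not fall back to the Template: namespace.
        return target[1:]
    prefix, sep, _rest = target.partition(":")
    if sep and prefix in namespaces:
        # The title already names a namespace the wiki knows about.
        return target
    # Default case: a bare name is looked up as a template.
    return "Template:" + target


print(resolve_transclusion("Portal:Phantom Jump/Intro"))
print(resolve_transclusion(":Portal:Phantom Jump/Intro"))
```

The first call yields the unwanted Template:Portal:Phantom Jump/Intro from the question, the second yields Portal:Phantom Jump/Intro.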

Private (invisible) html page [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
How would I upload an HTML file to my website and not have it be visible to the world? I don't want it showing up on Google or Bing, and I don't want any weird web spider matrix bot thing being able to see it. I don't want it password protected. I just want it to be invisible, and I want to be the only person who knows the URL.
It would be something like:
My-Website.com/INVISIBLE.html
Your webpage My-Website.com/INVISIBLE.html stays unknown to the world unless you tell someone about it or link to it. To keep search engines from crawling it, you could use a robots.txt file, which is documented at http://www.robotstxt.org/robotstxt.html. However, not all search engines respect the robots.txt file.
Adding a robots.txt file to the root of your site should do it:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
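To see what such a rule does, here is a minimal sketch that checks a robots.txt body with Python's standard `urllib.robotparser`. The two-line robots.txt content and the helper name `is_allowed` are assumptions made up for this example; the page name comes from the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, served from the site root as /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /INVISIBLE.html
"""


def is_allowed(robots_txt, url, agent="*"):
    """Return True if a well-behaved crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(ROBOTS_TXT, "http://My-Website.com/INVISIBLE.html"))
print(is_allowed(ROBOTS_TXT, "http://My-Website.com/index.html"))
```

Keep in mind that robots.txt itself is publicly readable, so listing the page there also tells anyone who opens the file that the page exists.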

How did this website do their splash page/age verification? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I am looking at this website - http://www.shopmss.com/ - and I was wondering how they did the splash page, age verification and store all on the same URL 'shopmss.com'. You click through 3 screens before you get back to the store.
My secondary question is, can you do this without setting a cookie? For example, with JavaScript that appends something to the URL in the address bar? Or something with mod_rewrite?
EDIT: I thought this was a relevant question to ask because I was exploring the best practice to accomplish the task, I figured it would have something technical. My bad.
The site is setting a cookie called BX. That could be tracking a session, which would let them display different content based on the state of the session.
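Roughly, the cookie idea could work like the sketch below: the server reads a cookie on every request to the same URL and decides which screen to send back. This is a guess at the general technique, not how that site is actually built; the cookie name `step` and the screen names are invented for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical screens, all served from the same URL.
SCREENS = ("splash", "age_check", "store")


def current_step(cookie_header):
    """Read the visitor's progress from the Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("step")  # "step" is an illustrative cookie name
    if morsel is None or not morsel.value.isdigit():
        return 0  # no usable cookie yet: start at the first screen
    return min(int(morsel.value), len(SCREENS) - 1)


def screen_for(cookie_header):
    """Pick which screen to render for this request."""
    return SCREENS[current_step(cookie_header)]


def advance(cookie_header):
    """Build the Set-Cookie value that moves the visitor one screen forward."""
    out = SimpleCookie()
    out["step"] = str(min(current_step(cookie_header) + 1, len(SCREENS) - 1))
    return out["step"].OutputString()
```

A first visit (no cookie) gets the splash screen; each click-through sends back the value from `advance`, so the next request to the same URL renders the next screen.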
They are using a frameset. Check the source.

How can I run an HTML5 validator against an entire website? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I've been using HTML5 in websites for about a year now, but the W3C doesn't offer an option to check whether an entire domain is valid. There are tools out there that do this for HTML4, but they don't help with HTML5.
Is there an online service or browser extension that can solve this problem? I've looked but couldn't find any.
Did you see the one I wrote? It's called HTML Validator Pro, and it uses an instance of the Validator.nu engine on our server. It checks up to 50 pages for free. I don't know the size of your domain, so I can't say whether that meets your requirements, but I hope so! Please let me know if it works for you and send me any feedback you have.
Thank You
Looking around online, I found a service here: http://html5.validator.nu that provides HTML 5 verification for the entire domain. Have you also seen Total Validator? http://www.totalvalidator.com It also seems to do what you are looking to accomplish.
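If you would rather script it yourself, one possible approach is to POST each page's HTML to the Validator.nu web service and count the errors in its JSON report. This is a minimal sketch, assuming the `https://validator.nu/?out=json` endpoint accepts the document as the request body and answers with a `messages` list; the function names and the injectable `send` parameter are made up, and the public service may apply rate limits or require a User-Agent header.

```python
import json
from urllib.request import Request, urlopen

VALIDATOR = "https://validator.nu/?out=json"  # assumed endpoint, see note above


def build_request(html):
    """Wrap one page's HTML (bytes) in a POST request for the validator."""
    return Request(
        VALIDATOR,
        data=html,
        headers={"Content-Type": "text/html; charset=utf-8"},
    )


def count_errors(report_json):
    """Count the messages of type 'error' in a JSON report."""
    messages = json.loads(report_json).get("messages", [])
    return sum(1 for m in messages if m.get("type") == "error")


def validate_site(pages, send=None):
    """Map each URL in pages (url -> HTML bytes) to its number of errors."""
    if send is None:
        send = lambda req: urlopen(req, timeout=60).read().decode("utf-8")
    return {url: count_errors(send(build_request(html)))
            for url, html in pages.items()}
```

The `pages` dictionary would come from whatever crawl or sitemap you already have; collecting the URLs of a whole domain is outside this sketch.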

How do free webhosters enforce ads? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
How do those webhosting companies enforce ads on your page?
I'd love to enforce a specific piece of html code on a webserver.
So, how do they?
They might use PHP's auto-append and auto-prepend file settings, depending on the exact solution you are referring to.
You basically use them to call another file (HTML, PHP, etc.) which is appended or prepended to the page (at the bottom or the top).
I did it once years ago and it worked.
Maybe stick the adsense code in the appended/prepended file.
See: http://www.maheshchari.com/php-auto-append-prepend-file-using-htaccess/
James
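The same idea, forcing a snippet onto every page at the server level, can be sketched outside PHP as well. Below is a small Python WSGI middleware that appends a fixed piece of HTML to every HTML response. It is a different technique from the PHP settings mentioned above and not necessarily what any real host does; the class name `AppendFooter` and the snippet are invented for illustration.

```python
AD_SNIPPET = b"<!-- hypothetical ad block -->"


class AppendFooter:
    """WSGI middleware that appends a snippet to every text/html response."""

    def __init__(self, app, snippet=AD_SNIPPET):
        self.app = app
        self.snippet = snippet

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the status and headers until the body is final.
            captured["status"] = status
            captured["headers"] = list(headers)

        # Run the wrapped app and buffer its whole body.
        body = b"".join(self.app(environ, capture))
        headers = captured["headers"]
        is_html = any(name.lower() == "content-type" and value.startswith("text/html")
                      for name, value in headers)
        if is_html:
            body += self.snippet
        # Content-Length must match the (possibly longer) body.
        headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]
```

Wrapping a site is then a one-liner such as `application = AppendFooter(application)`. The sketch buffers the full response in memory and ignores the legacy `write()` callable, so it suits small pages rather than streaming responses.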