Is there a tool or service I can use to see how a website looked a year ago, given access to the website's HTML code? For instance, say I save nba.com's HTML code today; in a year's time, can I see how that HTML visually appeared today?
I'm asking because, instead of scraping websites daily to obtain info and manually checking whether the info is in the same position, what's stopping me from scraping a website's HTML code once and retroactively looking back at the saved markup to access the information I need?
This site has a history of what web pages looked like in olden times...
https://archive.org/web/
And they do provide an API:
https://archive.org/help/wayback_api.php
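For example, the availability endpoint returns the snapshot closest to a given timestamp as JSON. A minimal sketch in Python using only the standard library (nba.com and the date are just placeholders):

    import json
    import urllib.request

    # Ask the Wayback Machine for the snapshot of nba.com closest to
    # 1 January 2015 (timestamp format: YYYYMMDDhhmmss).
    api = "https://archive.org/wayback/available?url=nba.com&timestamp=20150101"
    with urllib.request.urlopen(api) as resp:
        data = json.load(resp)

    snapshot = data.get("archived_snapshots", {}).get("closest")
    if snapshot and snapshot.get("available"):
        print(snapshot["url"])        # e.g. https://web.archive.org/web/...
        print(snapshot["timestamp"])  # when the capture was made
    else:
        print("No archived snapshot found")

The returned URL renders the page as it looked at capture time, with archived CSS and images, which is something raw saved HTML alone won't give you.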
Related
I am helping remodel a website and was wondering if it is possible to scrape just the text out of the entire site. Doing it one page at a time using Data Scraper is possible, but there are hundreds of pages that need to be worked on. Is there a way to get them all in one scrape? Or further suggestions?
If I understand your question correctly, there is a standalone program called HTTrack (https://www.httrack.com/) that will download an entire website to your local computer. I've used it successfully in the past when I needed to grab everything.
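If what you ultimately need is just the text, a short script can post-process the mirror HTTrack produces. A rough sketch, assuming the mirror lives in ./mirror and the beautifulsoup4 package is installed (the path is an assumption, not anything HTTrack guarantees):

    from pathlib import Path
    from bs4 import BeautifulSoup

    # Walk every HTML file in the mirror and write a .txt file
    # next to it containing only the visible text.
    for page in Path("mirror").rglob("*.html"):
        html = page.read_text(encoding="utf-8", errors="ignore")
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style"]):  # drop non-visible content
            tag.decompose()
        text = soup.get_text(separator="\n", strip=True)
        page.with_suffix(".txt").write_text(text, encoding="utf-8")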
I want to display the current date in yyyymmdd format on our company's internal wiki page. I have looked through the MediaWiki documentation for guidance, but it seems that I need to link to some library or template. In general, it's a little difficult to navigate MediaWiki's documentation and learn the necessary "programming language". Please advise.
{{CURRENTYEAR}}{{CURRENTMONTH}}{{CURRENTDAY2}}
Due to MediaWiki and browser caching, these variables frequently show when the page was cached rather than the current time.
Source: http://www.mediawiki.org/wiki/Help:Magic_words
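If the cached value becomes a problem, a page can be re-rendered on demand by appending action=purge to its URL; a hypothetical example (the wiki hostname and page name are placeholders):

    https://wiki.example.com/index.php?title=Main_Page&action=purge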
What would be the best way to handle this situation?
The company has two types of products, and therefore two separate webpages, one to serve each:
products-professional.html
products-consumer.html
The company has changed its structure and no longer wants to list the products separately; the new page is:
products.html
According to Google Webmaster Tools, some sites have links to our old pages. I've added redirects to point them to the new page, but the errors still show in Google Webmaster Tools. I don't want errors.
You should:
monitor GWT errors and add missing redirects
try to contact as many of the users linking to the old URLs as possible and ask them to fix their links
Since the second point is hard to achieve 100%, your redirects have to be bulletproof (see the sketch below), and even then Google can find some weird URLs from a year ago and report errors.
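As a sketch of the first point, assuming the site runs on Apache with mod_alias available, two permanent redirects in .htaccess would look like this:

    Redirect 301 /products-professional.html /products.html
    Redirect 301 /products-consumer.html /products.html

The 301 status tells Google the move is permanent, so the old URLs should eventually drop out of the error report once they are recrawled.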
I have seen this extension in some URLs and I would like to know what it is used for.
It seems odd, but I couldn't find any information about it. I think it is specific to some plug-in.
It seems to be connected to the 'ShareThis' buttons on those websites.
I found this page, which gives a fairly comprehensive explanation:
This tag is mainly used for tracking URL sharing on various social networks, so every time anyone copies your blog content, they get a URL ending with #sthash and an extension such as .dpuf or .dpbs.
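If these fragments get in the way, for instance when deduplicating scraped URLs, they can be stripped programmatically. A small sketch using Python's standard library (the URL is made up):

    from urllib.parse import urldefrag

    url = "http://example.com/post.html#sthash.aB3dEfGh.dpuf"
    clean, fragment = urldefrag(url)  # split off everything after '#'
    print(clean)     # http://example.com/post.html
    print(fragment)  # sthash.aB3dEfGh.dpuf

The fragment is never sent to the server anyway; it only matters to client-side scripts such as the ShareThis widget.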
I am working on an intranet project that allows our company's web authors to edit the content of our internet site. I am currently trying to get a rollback feature to work. I currently show the current content and the backup content side by side on the page so the author can see what will happen if they roll back. I was wondering if there is some sort of comparer (or an easy way of implementing one) that will highlight the differences between the HTML code of the two files.
Thanks
John Resig implemented a nice algorithm in JavaScript for isolating differences between files, which can be found here.
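If a server-side alternative is acceptable, Python's standard difflib module can produce a side-by-side HTML diff out of the box. This is a different technique from Resig's, offered only as a sketch (the file names are placeholders):

    import difflib

    # Read both versions of the page as lists of lines.
    old = open("backup.html", encoding="utf-8").read().splitlines()
    new = open("current.html", encoding="utf-8").read().splitlines()

    # HtmlDiff renders a side-by-side table with changes highlighted,
    # ready to embed in the rollback preview page.
    html = difflib.HtmlDiff(wrapcolumn=80).make_file(old, new, "backup", "current")
    with open("diff.html", "w", encoding="utf-8") as f:
        f.write(html)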