Extracting an article from the BBC website [closed] - html

I want to extract an article, say this one:
http://www.bbc.com/news/magazine-32156264
and only display the article content, with no BBC header or footer. How would I do this? I'm thinking of putting it in an iframe.

As you ask specifically about the BBC:
You are allowed to display the RSS feed of BBC headlines - you could use the WordPress RSS Links widget to do this.
You certainly aren't allowed to just copy someone else's story (or start removing branding etc.) – which is quite reasonable.
Note: the BBC doesn't have an API for news, but some providers do, e.g. The Guardian's Open Platform. Again, there will usually be strict restrictions on how you can display things, required branding, and what you are/aren't allowed to change.
Correct approach: choose one or two relevant quotes you find interesting, highlight those, and make sure you have a prominent link back to the original article.

First of all, there will be legal issues. Second, your page rank will suffer because of the duplicate content.
If you have already considered the above, you could fetch the page with a PHP cURL request, parse the response to pull out the target data (for example with a regular expression or a DOM parser), and finally output the retrieved data.
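A minimal sketch of that approach, assuming PHP with the cURL and DOM extensions enabled; the XPath selector for the article body is a guess and would need to be adjusted to the real page markup:

<?php
$url = 'http://www.bbc.com/news/magazine-32156264';

// Fetch the page over cURL.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// Parsing with DOMDocument tends to be more robust than a regular expression.
$doc = new DOMDocument();
@$doc->loadHTML($html);   // suppress warnings from imperfect real-world markup
$xpath = new DOMXPath($doc);

// Hypothetical selector for the article body; inspect the actual page to find the right one.
foreach ($xpath->query("//div[contains(@class, 'story-body')]") as $node) {
    echo $doc->saveHTML($node);
}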
Or you can use the APIs of other news providers, as williamt mentioned.

Related

Multilingual Website HTML best solutions [closed]

I am currently building an HTML website and I need to make it multilingual, so I am asking: what are the best solutions?
Should I use subdomains such as http://en.mysite.com, or keep it simple, e.g.:
mysite.com/en/index.htm
mysite.com/fr/index.htm
Should I translate everything for every language, or are there tools to auto-translate?
Finally, how can I make the website detect the user's location and redirect them to their language?
Generally, I would say that to rank higher in search engines it is better to avoid subdomains.
Since you only have static HTML, automatic redirection is not a great option. For instance, if someone in Paris decides to visit the English version, you would need to remember that choice in a cookie to avoid annoying repeated redirections.
Instead of redirecting, you can still suggest a language to the visitor (according to their location). This is possible with the Google Loader: https://developers.google.com/loader/
You could use /your/path?lang=en.
To detect the user's language, see: https://stackoverflow.com/a/8199791/1500022
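A minimal sketch of the ?lang= idea combined with a suggestion taken from the Accept-Language request header, assuming a small PHP front controller in front of the static pages (the supported languages and the default are assumptions):

<?php
$supported = ['en', 'fr'];
$default   = 'en';

// 1. An explicit choice via /your/path?lang=en wins.
$lang = isset($_GET['lang']) ? $_GET['lang'] : null;

// 2. Otherwise, take a crude guess from the Accept-Language request header.
if (!in_array($lang, $supported, true)) {
    $header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
    $lang   = strtolower(substr($header, 0, 2));   // first two letters only
}

// 3. Fall back to the default language.
if (!in_array($lang, $supported, true)) {
    $lang = $default;
}

// Serve the matching static page, e.g. mysite.com/en/index.htm.
readfile(__DIR__ . "/$lang/index.htm");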

Ad filtering server-side [closed]

I'm working on a web application where I display HTML from other websites. Before displaying the final version I'd like to get rid of the ads.
Any ideas or suggestions on how to accomplish this? It doesn't need to be a super-efficient filtering tool. I was thinking of porting some of the filters defined by Adblock Plus to Ruby and returning the parsed document with some help from Nokogiri.
Let's say I use the super-wildcard filter ad. That's not an official Adblock filter, but for simplicity I'll use it here. The idea would then be to remove all elements for which any attribute matches the filter, e.g. src="http://ad.foo.com?my-ad.gif", href="http://ad.foo.com", class="annoying-ad", etc.
The Nokogiri command for this filter would be:
doc.xpath("//*[@*[contains(., 'ad')]]").each { |element| element.remove }
I applied the filter to a sample page (before/after screenshots omitted), and the result was not that bad. Note, though, that the global wildcard filter also got rid of valid elements like headers, because they have attributes such as id="masthead".
So I think this approach is OK for my case. The question now is: which filters should I use? Adblock Plus has a huge list of filters and I don't feel like iterating over all of them. I'm thinking of grabbing the top 10-20 and parsing the documents based on those. Is there a list out there of the most popular ones? If so, I haven't been able to find it.
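A rough sketch of running a handful of filters in the spirit of the question; the substring list below is just a guess at common ad-related tokens, not taken from the official Adblock Plus lists, and it will also hit some legitimate elements:

require 'nokogiri'
require 'open-uri'

# Hypothetical shortlist of ad-related substrings (guesses, not real filter lists).
FILTERS = %w[advert doubleclick adserver sponsor banner].freeze

doc = Nokogiri::HTML(URI.open('http://example.com/'))

# Remove every element that has any attribute containing one of the tokens.
FILTERS.each do |token|
  doc.xpath("//*[@*[contains(., '#{token}')]]").each(&:remove)
end

puts doc.to_html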

What to take into account when deploying a complete redesign of a website, regarding search engines? [closed]

I am planning a redesign of my site (4-5 years old, with PageRank 3-4). No URLs will change, meaning the same content will stay under the same URLs. But I am still concerned, because I have heard that changing the HTML structure across a whole site can have some effect, mainly negative, and there is no way to change the design and layout without changing the HTML structure.
Could you please sum up the things to take into account when redesigning a website in a search-engine-friendly way?
I could go into some detail, but basically check your site with this tool to get a detailed breakdown: http://nibbler.silktide.com/. Run it on the current site and on your test site (preferably on a test domain, i.e. test.mywebsite.com).
Basic things not to do: do not use HTML tables for anything but displaying data in a grid, and do not use semantic HTML where it is not needed, since it is meant to highlight things as important.
Order of importance of tags on a page, from most to least important: H1, H2, H3, B.
Make sure your HTML is valid and that you have all the appropriate meta tags in place, as per the W3C standard you choose for your design.
Content is key: keyword density and page themes are what matter, so don't dilute a page; if you have more to say, add a new page.
Make sure you add a sitemap, submit it to all the search engines, and have a robots.txt file pointing to your XML sitemap (see the example below).
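For example, a robots.txt at the site root can point crawlers at the sitemap (the URL is a placeholder):

User-agent: *
Disallow:

Sitemap: http://www.mywebsite.com/sitemap.xml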
For anything I said that you didn't understand, Google the key phrases and you will find more implementation detail.

Optimize HTML page with a TON of links [closed]

I maintain an HTML page that contains a list of links to photo galleries. Over the past few years it has gone from a small page to a list that contains HUNDREDS of links. My fear is that this has affected the SEO of the page as a whole, with spiders interpreting it as a link farm. Of course, I have no real way of knowing for sure, but I have started to suspect it.
Is there an efficient, simple way to deal with a large number of links in a manner that is still easy for the user to browse? While having hundreds of links one on top of the other may not be the best-looking method, it's easy to search since they are all in chronological order. I am looking for a way to keep the page simple without creating more of a maintenance nightmare for myself.
One idea I had was to use XML to store the links and some kind of dropdown, so that when a spider hits the page it would not see a mountain of links, just a reference to the XML.
Use a "pager" script to show, say 10 at a time. They are available in every web framework or you could quickly hack up your own.
...Or how about this: put the links in separate file(s) (or otherwise store them outside of the page: a database, a flat file, etc.) and load them via an Ajax call as needed. Say, something like a 'Category A' button that, when clicked, loads the links into a div, as in the sketch below. That should keep them out of view for spiders.
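A minimal sketch of that idea, assuming each category's links live in a separate HTML fragment (the file name links-category-a.html is a placeholder):

<button onclick="loadLinks('links-category-a.html')">Category A</button>
<div id="links"></div>

<script>
  // Fetch a fragment of links and inject it into the div on demand,
  // so the full list is never part of the initial page a crawler sees.
  function loadLinks(url) {
    fetch(url)
      .then(function (response) { return response.text(); })
      .then(function (html) {
        document.getElementById('links').innerHTML = html;
      });
  }
</script>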
Then there's this: http://www.robotstxt.org/meta.html and this: http://en.wikipedia.org/wiki/Nofollow
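In markup terms, those two references come down to a page-level robots meta tag and per-link rel="nofollow" attributes (the href below is a placeholder):

<!-- Page-level hint to crawlers: don't index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- Per-link hint: don't follow this particular link -->
<a href="galleries/2015-summer.html" rel="nofollow">Summer 2015 gallery</a>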

Google auto-recognizing menus? [closed]

I wondered what markup would achieve the following on Google: somehow they recognize the menu items and show them as part of the search result, but I couldn't find an easy way to do it.
Attached screenshot (not included).
Basically, you are asking how to cause "sitelinks" to appear for your website. Unfortunately, as far as SEO is concerned, there isn't any special markup you can use to make these appear. They will be shown if Google's algorithm determines it is appropriate to show them; otherwise, they won't be.
For more information, see the following help article from the Google webmaster tools:
http://www.google.com/support/webmasters/bin/answer.py?answer=47334&topic=8523
There isn't anything special about the markup. Google needs to be able to crawl the site and be able to determine the site's structure based on how pages link to each other. In addition, you can tell Google how the site is structured by submitting a sitemap to them. This is a simple step you can do to encourage Google to build this structure in their search results. Be patient for the results to occur, however, as it can take a while.
A good, logical site navigation tree and breadcrumbs on internal pages may help Google correctly detect your "menu". HTML5 may also be a good idea, as a way of telling the search engine "Hi, I'm the nav".
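For what it's worth, a sketch of that last point using the HTML5 nav element (the link targets are placeholders):

<nav>
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/products/">Products</a></li>
    <li><a href="/about/">About</a></li>
    <li><a href="/contact/">Contact</a></li>
  </ul>
</nav>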