What are dpuf (extension) files?

I have seen this extension in some URLs and I would like to know what it is used for.
It seems odd, but I couldn't find any information about it. I suspect it is specific to some plug-in.

It seems to be connected to the 'ShareThis' buttons on websites.
I found this page, which gives a fairly comprehensive explanation:
This tag is mainly used for tracking URL sharing on various social networks, so every time someone copies your blog content, they get a URL ending with #sthash and a .dpuf or .dpbs extension.
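Based on that description, here is a minimal sketch of stripping the tracking fragment from a copied URL; the exact #sthash.<id>.dpuf pattern is an assumption inferred from the explanation above:

import re

def strip_sthash(url):
    # Remove a trailing ShareThis fragment like #sthash.AbC123.dpuf or .dpbs
    return re.sub(r"#sthash\.\w+\.(dpuf|dpbs)$", "", url)

print(strip_sthash("http://example.com/post#sthash.AbC123.dpuf"))
# -> http://example.com/post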

Related

Possible to embed Tumblr into another website?

I've been able to embed my latest 10 Tumblr posts into a website, but it doesn't include all of the functionality (comments, re-blogs, shares, etc.) of Tumblr. I'm really looking to do that, but I can't find an answer anywhere.
I know a lot of programming languages, so I'll take a solution in any language. The website IS built from scratch, so a WordPress plugin won't help.
EDIT: Just to confirm (based on the comments/questions below), we've followed the API documentation. We've got plenty of APIs working, but this one doesn't. We've tried gems, a JavaScript version, the API with OAuth and tokens, and more attempts than I can recall.
It's easy to do in WordPress, and if we were doing it as a subdomain of a site, that would be possible. But the client (pro bono) wants it embedded on a page that does lots of other things. Maybe there's a JavaScript library we don't know about? Some other secret means of doing it? But the API (at least with the available documentation) isn't working. Heck, even if you could direct us to a site where someone has Tumblr embedded on a non-WordPress/Tumblr website, that would be helpful. We could inspect the code.
We've got Twitter, Google Maps, and plenty of other APIs working. I swear we aren't idiots, and the answer to this isn't as easy as it appears.
THANKS!
If you want a clear example of how to use the JSON, check this link; it helped me a ton:
http://janzheng.com/2013/06/tumblr_integration.html
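For a concrete starting point, here is a minimal sketch of pulling posts from the Tumblr v2 API and rendering them as plain HTML for embedding. The blog identifier and API key are placeholders, and note that the interactive features asked about (comments, reblogs) are not exposed this way:

import requests

API_KEY = "YOUR_API_KEY"      # placeholder; register an app with Tumblr to get one
BLOG = "example.tumblr.com"   # placeholder blog identifier

def fetch_posts(limit=10):
    # The v2 posts endpoint returns JSON with the posts under response.posts
    url = f"https://api.tumblr.com/v2/blog/{BLOG}/posts"
    resp = requests.get(url, params={"api_key": API_KEY, "limit": limit})
    resp.raise_for_status()
    return resp.json()["response"]["posts"]

def render(posts):
    # Render each post as a simple <article>; a real embed would need
    # per-type handling (photo, quote, link, reblog, etc.)
    chunks = []
    for p in posts:
        title = p.get("title") or p.get("summary") or ""
        body = p.get("body", "")
        chunks.append(f"<article><h2>{title}</h2>{body}</article>")
    return "\n".join(chunks)

print(render(fetch_posts()))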

Getting web page content, similar to Readability as a service

I'm looking for some facility for extracting clean HTML content from different pages (blog articles, magazines, etc.). The basic idea is how 'Reader' in iOS Safari works.
From this answer I gathered that iOS Safari uses Readability for content parsing. Unfortunately, the API does not include any methods for parsing; instead it saves a bookmark and retrieves its content, which does not suit me much.
Another answer here suggests using https://www.readability.com/api/content/v1/parser, but it does not work for me.
Any suggestions for similar services?
Have a look at Tranquility. It is a Firefox add-on, so you can look at the source. You can download the XPI and unpack it, then look into content/tranquility.js and the related files in content/.
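If a Python library is acceptable instead of a hosted service, here is a minimal sketch using the readability-lxml package (a port of Arc90's Readability; pip install readability-lxml requests). The article URL is a placeholder:

import requests
from readability import Document

def clean_html(url):
    html = requests.get(url, timeout=10).text
    doc = Document(html)
    # title() and summary() return the extracted headline and the
    # cleaned article HTML, similar to what Safari Reader shows
    return doc.title(), doc.summary()

title, body = clean_html("https://example.com/some-article")  # placeholder URL
print(title)
print(body[:500])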

Finding number of pages of a website

I want to find the number of pages of a website. Usually I look for a sitemap, but I just encountered a site which does not have one, so I am out of ideas for finding its total page count. I tried to Google the URL, but that did not help much. Is there any other way to find out the number of pages of a website?
Thanks in advance.
Ask Google: "site:yourdomain.com"
This gives you all indexed pages.
Or use the free tool Xenu. It crawls the whole site; a sketch of what such a crawler does is at the end of this thread. But it won't find pages which have no internal links pointing to them. You can also export a sitemap with it.
I was about to suggest the same thing :) If this is a website you own, you can also add it to Google Webmaster Tools. It will show you lots of things about your site, including the number of links, pages, search terms, etc. It's very useful and free of charge.
I have found a better solution myself. You can go to Google Advanced Search and restrict the search results to your domain name, leaving everything else empty. It will give you the list of all pages cached by Google.
You could also try A1 Website Analyzer. But with all link-checker software, you will have to make sure you configure it correctly to obey or not obey (whatever your needs are) robots.txt, noindex, and nofollow instructions (a common source of confusion in my experience).
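As promised above, here is a minimal sketch of what a crawler like Xenu does under the hood: start at the home page, follow same-domain links breadth-first, and count unique pages. It assumes requests and beautifulsoup4 are installed, the start URL is a placeholder, and, like any crawler, it misses pages with no inbound internal links:

from collections import deque
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

def count_pages(start_url, limit=500):
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    while queue and len(seen) < limit:
        url = queue.popleft()
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        # Only parse HTML responses for further links
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return len(seen)

print(count_pages("https://example.com/"))  # placeholder domain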

Crawling data or using API

How do these sites gather all their data: questionhub, bigresource, thedevsea, developerbay?
Is it legal to show data in a frame as bigresource does?
#amazed
EDITED: fixed some spelling issues, 2011-03-10
How do these sites gather all data: questionhub, bigresource ...
Here's a very general sketch of what is probably happening in the background at a website like questionhub.com (see the code sketch after this list):
1. Spider program (google "spider program" to learn more)
   a. Configured to start reading web pages at stackoverflow.com (for example).
   b. Run the program so it goes to the home page of stackoverflow.com and starts visiting all the links it finds on those pages.
   c. Returns the HTML data from all of those pages.
2. Search index program
   Reads the HTML data returned by the spider and creates a search index, storing the words it found AND the URLs those words were found at.
3. User interface web page
   Provides a feature-rich user interface so you can search the sites that have been spidered.
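A minimal sketch of steps 1 and 2, assuming requests and beautifulsoup4 are installed and using example.com as a placeholder; a production system (e.g. Lucene/Solr) adds ranking, stemming, politeness delays, and much more:

from collections import defaultdict
from urllib.parse import urljoin, urlparse
import re
import requests
from bs4 import BeautifulSoup

def spider(start_url, max_pages=50):
    # Step 1: fetch pages, following links within the same domain
    domain = urlparse(start_url).netloc
    seen, queue, pages = {start_url}, [start_url], {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        pages[url] = html
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    # Step 2: inverted index mapping each word to the URLs containing it
    index = defaultdict(set)
    for url, html in pages.items():
        text = BeautifulSoup(html, "html.parser").get_text()
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

# Step 3 (the user interface) then reduces to a lookup:
index = build_index(spider("https://example.com/"))
print(sorted(index.get("python", [])))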
Is it legal to show data in a frame as bigresource does?
To be technical, "it all depends" ;-)
Normally, websites want to be visible in Google, so why not other search engines too? Just as Google displays part of the text that was found when a site was spidered, questionhub.com (or others) has chosen to show more of the text found on the original page, possibly keeping the formatting that was in the original HTML OR changing the formatting to fit their standard visual styling.
A remote site can 'request' that spiders NOT go through some or all of its web pages by adding rules in a well-known file called robots.txt. Spiders do not have to honor robots.txt, but a vigilant website will track the IP addresses of spiders that do not honor its robots.txt file and then block those IP addresses from looking at anything on the website. You can find plenty of information about robots.txt here on Stack Overflow OR by running a query on Google.
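Here is a minimal sketch of honoring robots.txt with Python's standard-library robotparser; the domain and user-agent string are placeholders:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()

# A well-behaved spider checks every URL against the rules before fetching it
if rp.can_fetch("MySpider/1.0", "https://example.com/some/page.html"):
    print("allowed to crawl")
else:
    print("disallowed by robots.txt")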
There are several industries (besides Google) built around what you are asking. There are tags on Stack Overflow for search-engine and search; read some of those questions/answers. Lucene/Solr are open-source search engine components. There is a companion open-source spider, but the name eludes me right now. Good luck.
I hope this helps.
P.S. As you appear to be a new user, if you get an answer that helps, please remember to mark it as accepted, or give it a + (or -) as a useful answer. This goes for your other posts here too ;-)

Web site as image/clip art library with reference?

As a software developer, I have built many web applications and I blog about my programming experiences. I would like to use pictures in many cases. A picture is worth a thousand words, and pictures are a universal language!
You could create your own clip art images or download graphics (actually, many open clip art/image libraries are available, the Open Clip Art Library for example). However, your time and artistic skill are limited, and you can only keep a limited library of images.
I wish there were an open art/image library website with permanent references available, so that you could just add a simple reference in your HTML page like this, the same way you can use graphics from other people's websites:
<img src="http://OpenArtLibrary.net/icon/work/DoItYourself.png".../>
This way, there is no need to waste time downloading and uploading images, and no wasted disk space on your computer or others' (no duplication). Just one place with a huge variety of images available, open for people to use, free or with some reasonable fee. People could vote on the popularity of the art/images as well.
Is there any such website available?
Typically, sites discourage this. What it really does is shift the bandwidth cost to the hosting site. There have been cases where sites with pictures have analyzed the referrer to determine whether images are being linked to from other sites, then served an image with text claiming the image is being 'stolen'.
The point of that is: the idea isn't very well liked.
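For illustration, here is a minimal sketch of that referrer check, using Flask as a stand-in server (real sites usually do this in Apache/nginx config instead); the allowed domains and file names are placeholders:

from flask import Flask, request, send_from_directory

app = Flask(__name__)
ALLOWED_HOSTS = ("example.com", "www.example.com")  # placeholder domains

@app.route("/images/<path:name>")
def image(name):
    referrer = request.referrer or ""
    # If the request came from another site, serve a notice image
    # instead of the real one
    if referrer and not any(host in referrer for host in ALLOWED_HOSTS):
        return send_from_directory("static", "hotlinked-notice.png")
    return send_from_directory("static", name)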
However, some sites, like the W3C, allow you to link to their certification images. It all depends on what you are linking to.
It is hard to think of a business doing this, as there doesn't seem to be a revenue aspect.
Even if some users were charged fees, there's a lot of work involved in checking/verifying who has paid via referrer strings. Maybe you have a new business plan.
Update:
Oh, I have a friend who always sends me emails with links to Flickr. Maybe their license lets you link to images on their site. Something for you to check out.
Update:
This text, "photo hosting sites", makes for an interesting, relevant Google search.
Thanks for Chris's explanation; I could accept it as an answer. However, I raised this question because I really don't like to "steal" images. I can see it is hard to charge fees, but there are so many open resources available on the web. Actually, I found one, the Open Clip Art Library, which allows people to contribute and share images. I found many good pictures there and downloaded them. I may contribute some when I create images for my blog, so I'll let people use mine.
Flickr is an open social place for people to store and share pictures. As long as pictures are shared there, especially by individuals, I think you can use and link to images there. Still, you have to do the work: creating and uploading. Actually, I tried another open social site called Dropbox. I can create a public folder there and add my pictures for sharing. All those sites have one common problem: they require a personal account and may not be available if the account is inactive for a certain period of time (90 days for Dropbox?).
That's why I asked this question here on Stack Overflow. I hope some people may know of some hosts or other alternative options. Maybe it is just as Chris said: "the idea isn't very well liked".
Actually, I realize that the Open Clip Art Library I mentioned above does provide an image-hosting-like service. If you click on anyone's picture download link, it opens a new tab or window to display the graphic, and that display has its own URL. I have created a new user name and submitted my picture. It works well; I can include the graphics in my test web page. Not sure how long the URL will be there, but it looks like a permanent one.
Try searching for Creative Commons licensed works. People often upload and share photos on places such as Flickr under a Creative Commons license, which allows you to remix, reference, or use them in your own projects, blogs, or sites.
There are different types of licenses under CC, with some asking you not to use the works if you're going to be making money from them or engaging in commercial activity.
You just have to give a nod back to the original author when using items under CC, and if you link back to them, that's just good karma.