What is this meta content property "og"?

I was looking at the source code for a web page and this is the 3rd time I've seen this in the header:
<meta content="http://www.example.com/cmswp/wp-content/uploads/2013/01/01/something.jpg" property="og:image">
What is this and what is it for?

This is metadata for the Open Graph protocol.
With og:image you provide an image URL that should represent your object within the graph.
The Open Graph protocol enables any web page to become a rich object in a social graph. For instance, this is used on Facebook to allow any web page to have the same functionality as any other object on Facebook.
While many different technologies and schemas exist and could be combined together, there isn't a single technology which provides enough information to richly represent any web page within the social graph. The Open Graph protocol builds on these existing technologies and gives developers one thing to implement. Developer simplicity is a key goal of the Open Graph protocol which has informed many of the technical design decisions.
Basically, these meta tags will be used by social networks to represent your web page wherever they need to.
Visit the following site to see how Facebook interprets og meta tags in real time and creates an object that represents a page. Try google.com in it.
https://developers.facebook.com/tools/debug/
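To get a feel for what such a consumer does, here is a minimal sketch in Python (the language choice is mine; any would do) that fetches a page and collects its og:* meta properties, roughly what the debugger above does. The URL is a placeholder. Standard library only.

# Minimal sketch: fetch a page and collect its og:* meta properties,
# roughly what a social network's scraper does.
from html.parser import HTMLParser
from urllib.request import urlopen

class OGParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content"):
                self.og[prop] = attrs["content"]

html = urlopen("http://www.example.com/").read().decode("utf-8", "replace")
parser = OGParser()
parser.feed(html)
print(parser.og)  # e.g. {'og:image': 'http://.../something.jpg', ...}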


Related

MediaWiki API - How to determine which portal(s) a Page belongs to?

I wish to determine whether a given Wikipedia page belongs to a certain Wikipedia Portal using the MediaWiki API. So far, I have been experimenting with the page properties of the API but I cannot seem to find a way to derive what Portal a given page belongs to.
As an example, on the Wikipedia page for Cake, at the very bottom of the page, I can press Show on the section Cakes, and a bunch of links to different cake pages show up. There I can also see that all of these belong to the Food portal. It is that information that I wish to extract from a given page using the MediaWiki API.
As far as I know, there is actually no formal definition of "belonging to a portal" in Wikipedia. As opposed to categories, which are part of the MediaWiki software, portals are custom pages for Wikipedia that are aimed at making it easier to explore a topic.
Instead of a formal definition, though, you can use a heuristic and determine the connection between the page and some portal based on one of them linking to the other. There are API endpoints for both directions; a combined sketch follows the queries below.
(Note: 100 is the ID of the "Portal" namespace.)
Which portal pages are linked from the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&titles=Cake%7CPizza&plnamespace=100
Which portal pages link to the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=linkshere&titles=Cake%7CPizza&lhnamespace=100
(though as you can see, many unrelated portals link to "Cake" and none link to "Pizza")
A combined query for both directions
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links%7Clinkshere&titles=Cake%7CPizza&plnamespace=100&lhnamespace=100
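Here is a rough sketch of the combined query in Python, assuming the requests library (the stdlib urllib would work just as well):

# Ask the MediaWiki API which Portal pages "Cake" and "Pizza" link to,
# and which Portal pages link to them, in one request.
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "prop": "links|linkshere",
    "titles": "Cake|Pizza",
    "plnamespace": 100,  # outgoing links, Portal namespace only
    "lhnamespace": 100,  # incoming links, Portal namespace only
}
data = requests.get(API, params=params).json()
for page in data["query"]["pages"].values():
    portals = {l["title"] for l in page.get("links", [])}
    portals |= {l["title"] for l in page.get("linkshere", [])}
    print(page["title"], "->", sorted(portals))

(For pages with many links you would also have to follow the API's continuation parameters; that is omitted here.)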
So through some more investigation I found the answer:
I ended up using the Revisions property in the API. This allows me to give a series of page titles that I want to investigate and have the wikitext of each page returned to me in JSON format. Then I can just search for lines containing Portal and figure out what portal (if any) the page belongs to.
If anyone is in a similar situation, here is an example query to the API:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Bread|Bubble_tea|Pizza&format=json&redirects&rvprop=content&rvslots=main
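A rough sketch of this approach in Python, again assuming the requests library (the regex is a heuristic, not a real wikitext parser):

# Fetch the wikitext of each page and grep it for Portal links.
import re
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Bread|Bubble_tea|Pizza",
    "format": "json",
    "redirects": 1,
    "rvprop": "content",
    "rvslots": "main",
}
data = requests.get(API, params=params).json()
for page in data["query"]["pages"].values():
    wikitext = page["revisions"][0]["slots"]["main"]["*"]
    portals = set(re.findall(r"\[\[Portal:([^|\]]+)", wikitext))
    print(page["title"], "->", sorted(portals) or "no portal links found")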

Website "vCard-ish" presentation

I'm not sure how common this implementation is, but some time ago I saw a couple of sites with text or HTML file(s) that allowed metadata to be gathered about external links.
For example (fictitious): if I linked a post to YouTube, a request was sent to that address and gathered some basic information given by the external server, presenting text and a logo that were embedded on the site I was posting on.
What is this called?
One service I definitely remember implementing this was Battlefield's Battlelog: if a link was posted in the feed, Battlelog would send a request to the external server and present the external information, including two rows of text and a logo, below the link.
I would say it's some sort of vCard; at the time, I think I also looked it up as being a W3C standard or RFC.
Edit: The closest match I could find is Open Graph; if there is no W3C alternative, this is it.

Flash/HTML Architecture: SEO Implications?

A client of mine has a full-Flash site and an HTML site (wordpress). Currently, the HTML site lives at http://www.domain.com, while the Flash site lives at http://www.domain.com/flash (swfobject detection at http://www.domain.com redirects flash users to the flash URL). The client isn't entirely pleased with this arrangement in terms of SEO, as links to their site sometimes point to http://www.domain.com and sometimes to http://www.domain.com/flash.
In a few weeks, the client will be rolling out a new version of their Flash site, which features deeplinking, among other things. Instead of living in its own folder off of the domain, the full-Flash site will be a "progressively enhanced" version of the HTML site, so if a user supports Flash, all HTML content will be replaced by Flash content.
Once the new site is launched, each page/URL in the Flash site will have a corresponding HTML page/URL; for example, the Flash content at http://www.domain.com/#/about/clients corresponds to the HTML content at http://www.domain.com/about/clients.
We're going to implement a 301 redirect so the old /flash path points to the domain itself, but we're not sure how to proceed in terms of redirects between the HTML and Flash versions of the site. One possibility would be to simply do client-side detection of capabilities and redirect the user to the appropriate version; under that scenario, a non-Flash-capable client that attempts to visit http://www.domain.com/#/about/clients would be JS-redirected to http://www.domain.com/about/clients, and a Flash-capable client visiting http://www.domain.com/about/clients would be JS-redirected to http://www.domain.com/#/about/clients.
Is this a reasonable approach? Are there any potential SEO red flags that we should be aware of before proceeding?
Thanks for your consideration!
The redirect from /#/about/clients to /about/clients sounds reasonable, but applying the reverse could cause problems - if your Flash detection doesn't work correctly (perhaps Flash is blocked etc.) then you may send the user into an infinite redirect loop.
Personally, I would recommend that non-hash links always load their content as expected, in a static manner. If the user then navigates, you may either end up with a URL like /about/clients#/ if they went to the home page (this shouldn't be an issue, as crawlers will never end up visiting pages this way), or you can have them redirect to / the next time they navigate.
IMHO, I'd say that a pure JavaScript solution to the hash problem would be easier to manage as there are already many good examples of this.
Also consider using #! instead of # - this 'hash-bang' technique is being pushed by Google as a way of identifying to search engines that your hash is important and that its contents differ from what you would see without the hash part. Google can already point to specific parts of a page using # and if you follow the hash-bang technique on the client and server-side, it will be able to index your AJAX/Flash links just like regular links (see the implementation details and the requirements you need to fulfill).
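For the server-side half, here is a minimal sketch in Python using Flask (an assumption; any framework works). Under Google's scheme, a crawler that sees http://www.domain.com/#!/about/clients requests http://www.domain.com/?_escaped_fragment_=/about/clients instead, and the server must answer with the static HTML equivalent. The page contents here are hypothetical stand-ins.

from flask import Flask, request, abort

app = Flask(__name__)

# Hypothetical stand-ins for the real static HTML pages.
STATIC_PAGES = {
    "/about/clients": "<html><body>About our clients ...</body></html>",
}

@app.route("/")
def index():
    fragment = request.args.get("_escaped_fragment_")
    if fragment is not None:
        # A crawler asking for the snapshot behind #!/about/clients etc.
        page = STATIC_PAGES.get(fragment)
        if page is None:
            abort(404)
        return page
    return "<html><body>Flash-detection landing page ...</body></html>"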

Is there any way of making JSON data readable by a Google spider?

Is it possible to make JSON data readable by a Google spider?
Say, for instance, that I have a JSON feed that contains the data for an e-commerce site. This JSON data is used to populate a human-readable page in the user's browser. (I.e., the translation from JSON data to the displayed page is done inside the user's browser; not my choice, just what I've been given to work with; it's an old legacy CGI application and not an actual server-side scripting language.)
My concern here is that the Google spiders will not be able to pick up or directly link to the item in question; when a user clicks on it in Google, they will be presented with an index page full of all the items rather than being linked directly to the item they clicked on.
Is there any way of "informing" the Google spider in the JSON that it should feed the user a different link?
While Google does crawl and index JavaScript in some circumstances, it's still best to serve "normal" (X)HTML content if at all possible. In this case, it would help to know the rest of the site's setup, in particular: is the JSON content just used to create a feed of links to the product pages (with static content), or are all product pages also generated from JSON feeds? If the feed is only used to point to the actual product pages (which are static), then one way to make the product pages discoverable could be to create an HTML sitemap page or some other alternate form of navigation. An XML Sitemap file can also help, but I would recommend not using it as the sole way of making the product pages discoverable.
If all of the content is only accessible through JSON feeds, then I think you will have to make some bigger changes if you want that content to be accessible through search results.
One way to handle it could also be to use the new JavaScript crawling/indexing proposal, which basically would result in a headless browser being set up between your site and Google: http://code.google.com/web/ajaxcrawling/ (whether setting this up or revamping the rest of the site is easier is hard to say :-))
You should make a wrapper page in server-side code around the JSON data, and respond to requests with either the wrapper or the regular version depending on the User-Agent.
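A minimal sketch of that wrapper idea in Python using Flask (an assumption; the feed URL and field names are placeholders): the server fetches the JSON feed and renders plain HTML, so each item gets a crawlable URL of its own.

import json
from urllib.request import urlopen
from flask import Flask

app = Flask(__name__)
FEED_URL = "http://www.example.com/products.json"  # placeholder

@app.route("/product/<item_id>")
def product(item_id):
    items = json.load(urlopen(FEED_URL))
    item = next((i for i in items if i.get("id") == item_id), None)
    if item is None:
        return "Not found", 404
    # Plain HTML that both users and crawlers can read and link to.
    return "<html><body><h1>%s</h1><p>%s</p></body></html>" % (
        item["name"], item["description"])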

Needed: flexible yet secure user HTML embedding technique

Our software manages libraries, museums, archives, etc. We'd like to let the users (namely the catalogers, not the visitors) add some embedded content such as Google maps, YouTube videos, etc. We'd like the solution to be as flexible as possible, as each embedded-content provider has its own format. OTOH, we'd rather not allow the users to enter raw HTML, as this poses an XSS security risk and, in the case of erroneous HTML, might break our surrounding web page.
I started looking into Google Maps today, and couldn't find a way to handle it. I don't want to let the users just copy the embedding HTML snippet into an item; I can't embed the link URL provided, as Google won't allow it; and I can't let the user specify the coordinates, as I don't want to use the Google Maps JS API (which means providing a built-in solution which we'll have to maintain).
The question in not specifically about Google Maps, but Google Maps is quite representative. I'd love to hear suggestions for a flexible-yet-secure HTML embedding technique.
Thanks,
Eran
Would Caja work for you?
Caja (pronounced "KA-ha") is "virtual iframes": it allows you to put untrusted third-party HTML and JavaScript inline in your page and still be secure. Caja gives stricter control over what the code can do:
no redirects to phishing pages: the window object the untrusted code has is a fake one created by the containing page
no malware: all requests to URLs are proxied
no XSS: dynamic HTML sanitization
Caja also allows the untrusted code more power than is safe to give to code currently in iframes. Here are some possibilities:
floating frames ("info windows")
frames don't have to be rectangular
frames can communicate without the current awkward protocols
a reader could broadcast geographic information about the current article; a maps gadget jumps to the location, while a news gadget gets local stories and a weather gadget pulls up the weather
similarly for financial info or entertainment info
an extensible syntax highlighter could have plugins that can mark up text but not leak the contents to another website
can be a bit channel (can only send information) or a code channel (can send functions)
hosting page can control who talks to whom
Markdown or another lightweight markup language for the markup, plus custom macros for embedding allowed snippets (like it is done on wordpress.com to embed YouTube videos); a sketch follows.
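A minimal sketch of that macro idea in Python, assuming a hypothetical [youtube id=...] shortcode: everything the cataloger types is escaped, and only the whitelisted macro expands to an embed.

import re

YOUTUBE = re.compile(r"\[youtube id=([A-Za-z0-9_-]{11})\]")

def expand(text):
    # Escape everything first so stray HTML can't reach the page...
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # ...then expand only the known-safe macro into an iframe embed.
    return YOUTUBE.sub(
        r'<iframe src="https://www.youtube.com/embed/\1" width="560" height="315"></iframe>',
        text)

print(expand("Watch this: [youtube id=dQw4w9WgXcQ] <script>alert(1)</script>"))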