I would like to know whether Open Graph markup is W3C valid.
I'm getting the following error when I try to validate it:
Line 14, Column 17: there is no attribute "PROPERTY"
<meta property="og:site_name" content="sitename">
If it's not valid, will it affect my PageRank or my ranking in other search engines?
Is it possible to cloak those properties?
It's not valid in the normal HTML doctypes, but there is a doctype you can use to validate XHTML documents including Open Graph:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
See this question: Html validation error for property attribute
No, it isn't. That is why the validator reports an error.
<html version="HTML+RDFa 1.1" lang="en">
<head>
<title>Example Document</title>
</head>
<body>
<p>Moved to example.org.</p>
</body>
</html>
This seems to work:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="http://rdf.data-vocabulary.org/#">
This solves the issue:
<!DOCTYPE html>
<html vocab="http://www.w3.org/2011/rdfa-context/rdfa-1.1">
You can then use lines like this in your HTML:
<meta property="og:title dc:title" content="m.clinic.pt - Está em boas mãos!">
or use prefixes from any of the other vocabularies listed there (http://www.w3.org/2011/rdfa-context/rdfa-1.1):
cat: http://www.w3.org/ns/dcat#
qb: http://purl.org/linked-data/cube#
grddl: http://www.w3.org/2003/g/data-view#
ma: http://www.w3.org/ns/ma-ont#
owl: http://www.w3.org/2002/07/owl#
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfa: http://www.w3.org/ns/rdfa#
rdfs: http://www.w3.org/2000/01/rdf-schema#
rif: http://www.w3.org/2007/rif#
rr: http://www.w3.org/ns/r2rml#
skos: http://www.w3.org/2004/02/skos/core#
skosxl: http://www.w3.org/2008/05/skos-xl#
wdr: http://www.w3.org/2007/05/powder#
void: http://rdfs.org/ns/void#
wdrs: http://www.w3.org/2007/05/powder-s#
xhv: http://www.w3.org/1999/xhtml/vocab#
xml: http://www.w3.org/XML/1998/namespace
xsd: http://www.w3.org/2001/XMLSchema#
prov: http://www.w3.org/ns/prov#
sd: http://www.w3.org/ns/sparql-service-description#
org: http://www.w3.org/ns/org#
gldp: http://www.w3.org/ns/people#
cnt: http://www.w3.org/2008/content#
dcat: http://www.w3.org/ns/dcat#
earl: http://www.w3.org/ns/earl#
ht: http://www.w3.org/2006/http#
ptr: http://www.w3.org/2009/pointers#
cc: http://creativecommons.org/ns#
ctag: http://commontag.org/ns#
dc: http://purl.org/dc/terms/
dc11: http://purl.org/dc/elements/1.1/
dcterms: http://purl.org/dc/terms/
foaf: http://xmlns.com/foaf/0.1/
gr: http://purl.org/goodrelations/v1#
ical: http://www.w3.org/2002/12/cal/icaltzd#
og: http://ogp.me/ns#
rev: http://purl.org/stuff/rev#
sioc: http://rdfs.org/sioc/ns#
v: http://rdf.data-vocabulary.org/#
vcard: http://www.w3.org/2006/vcard/ns#
schema: http://schema.org/
describedby: http://www.w3.org/2007/05/powder-s#describedby
license: http://www.w3.org/1999/xhtml/vocab#license
role: http://www.w3.org/1999/xhtml/vocab#role
You can validate the result with http://validator.w3.org/ or http://html5.validator.nu/.
So instead of this:
<div vocab="http://schema.org/" typeof="Product">
<img property="image" src="dell-30in-lcd.jpg" />
<span property="name">Dell UltraSharp 30" LCD Monitor</span>
</div>
You can have this:
<!-- The schema: prefix is defined in the vocabulary http://www.w3.org/2011/rdfa-context/rdfa-1.1 -->
<div typeof="schema:Product">
<img property="schema:image" src="dell-30in-lcd.jpg" />
<span property="schema:name">Dell UltraSharp 30" LCD Monitor</span>
</div>
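To make the prefix mechanism concrete, here is a minimal Python sketch of how a prefixed term such as schema:Product or og:title maps to a full IRI. It is only an illustration, not how any particular RDFa processor is implemented: the names PREFIXES, expand_term and expand_property are made up for this example, and only a handful of prefixes from the list above are included.

```python
# A few prefixes copied from the list above; the real initial context has more.
PREFIXES = {
    "og": "http://ogp.me/ns#",
    "dc": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_term(term, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI; return the term unchanged otherwise."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def expand_property(value, prefixes=PREFIXES):
    """A property attribute may hold several space-separated terms."""
    return [expand_term(term, prefixes) for term in value.split()]


print(expand_term("schema:Product"))
print(expand_property("og:title dc:title"))
```

Unknown prefixes (and absolute IRIs such as http://...) are returned as they are, which is a simplification of what a real processor does.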
Some resources:
http://www.w3.org/TR/rdfa-primer/
http://manu.sporny.org/2012/mythical-differences/
http://rdfa.info/
I'm using Apache FOP to convert an HTML document to PDF.
I need to right-align part of my document. I was using the http://webcoder.info/downloads/xhtml2fo.xsl stylesheet, but the float feature is not working.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<title>W22</title>
</head>
<body>
<p style="float: right">
Invoice No : $invoiceId <br />
Date : $date <br />
Client Ref : $customerAccount <br />
Email : $email
</p>
</body>
</html>
I'm trying to write generated HTML to a file. I'm using with-html-output-to-string, but I can't figure out how to make it work. I'm not sure whether I should use a file stream with with-open-file, or what the syntax should look like. I've been struggling with this for a day, but the code just doesn't run.
CL-USER> (who:with-html-output-to-string (out nil :prologue t :indent t)
(:html
(:head
(:title "home"))
(:body
(:p "Hello cl."))))
"<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">
<html>
<head>
<title>home
</title>
</head>
<body>
<p>Hello cl.
</p>
</body>
</html>"
I have to display a file in HTML table format.
I tried this but I cannot get any output.
use CGI qw(:standard);
my $line;
print '<HTML>';
print "<head>";
print "</head>";
print "<body>";
print "<p>hello perl am html</p>";
print "</body>";
print "</html>";
A CGI program must output the HTTP headers before it outputs any content. At a minimum, it must supply an HTTP Content-Type header.
Add:
my $q = CGI->new;
print $q->header('text/html; charset=utf-8');
… before you output any HTML.
(You should also write valid HTML, so include a Doctype and <title>).
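The header-then-blank-line-then-body rule is not specific to Perl. As a rough sketch in Python (the function name build_cgi_response and the sample page are invented for this illustration), a CGI response could be assembled like this:

```python
import sys


def build_cgi_response(body_html, charset="utf-8"):
    """Return the header block, a blank line, then the body, in that order."""
    headers = f"Content-Type: text/html; charset={charset}\r\n"
    # The empty line is what tells the server that the headers are finished.
    return headers + "\r\n" + body_html


if __name__ == "__main__":
    page = (
        "<!DOCTYPE html>\n"
        "<html><head><title>Hello</title></head>"
        "<body><p>hello</p></body></html>\n"
    )
    sys.stdout.write(build_cgi_response(page))
```

If the body were written before the Content-Type line, the web server would have no valid header block to parse, and the program would appear to produce no output.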
You should use the CGI module once you have loaded it. It makes it much simpler to follow the correct rules for an HTTP page.
As has been observed, you need to print an HTTP header before the HTML body. You can do that with print $cgi->header, which defaults to a content type of text/html and a character set of ISO-8859-1; that is adequate for many simple HTML pages. The start_html call also generates a <meta> element within the HTML that contains the same information.
This short program shows the idea. I have added a trivial table that shows how you could include that in the page. As you can see, the CGI code is much simpler than the corresponding HTML.
use strict;
use warnings;
use CGI qw/ :standard /;
print header;
print
start_html('My Title'),
p('Hello Perl am HTML'),
table(
Tr([
td([1, 2, 3]),
td([4, 5, 6]),
])
),
end_html
;
Output:
Content-Type: text/html; charset=ISO-8859-1
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>My Title</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
<p>Hello Perl am HTML</p><table><tr><td>1</td> <td>2</td> <td>3</td></tr> <tr><td>4</td> <td>5</td> <td>6</td></tr></table>
</body>
</html>
How about this:
use CGI;
use strict;
my $q = CGI->new;
print $q->header.$q->start_html(-title=>'MyTitle');
my $tableSettings = {-border=>1, -cellpadding=>0, -cellspacing=>0};
print $q->table($tableSettings, $q->Tr($q->td(['column1', 'column2', 'column3'])));
print $q->end_html;
Output:
Content-Type: text/html; charset=ISO-8859-1
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>MyTitle</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
<table border="1" cellspacing="0" cellpadding="0"><tr><td>column1</td> <td>column2</td> <td>column3</td></tr></table>
</body>
</html>
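Neither answer shows the step of turning the file's contents into table rows. The following Python sketch illustrates that step under a loud assumption: the question does not say what the file looks like, so comma-separated lines are assumed here, and rows_to_html_table and the sample data are hypothetical names and values for the example.

```python
import csv
import html
import io


def rows_to_html_table(rows):
    """Render an iterable of rows (each a list of strings) as an HTML table."""
    lines = ["<table>"]
    for row in rows:
        # Escape each cell so characters like < and & cannot break the markup.
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical input: comma-separated text standing in for the file's contents.
sample = io.StringIO("1,2,3\n4,5,6\n")
print(rows_to_html_table(csv.reader(sample)))
```

In a real CGI script this output would still have to come after the Content-Type header and inside a complete HTML page.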
I'm trying to get the top of the HTML generated from my MultiMarkdown file to look like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Test of markdown</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<link rel="stylesheet" type="text/css" href="../main.css" />
</head>
I know how to add the following metadata:
Title: Test of markdown
CSS: ../main.css
Quotes language: english
which gives me:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>Test of markdown</title>
<link type="text/css" rel="stylesheet" href="../main.css"/>
</head>
But I'm not sure how to add the rest. I would appreciate any help. Thanks.
I can't find any native Markdown way to do this, but you could run a small script over the generated HTML if you really need it.
This is a simple Python 3 option that might get you started. It could be improved in many ways, but I wanted to keep it simple. An obvious idea would be to give it a folder and have it process every HTML file in that folder. I hope this gives the idea.
Example code:
filepath = input('What is the full file path to the file? - ')

# Replacement for the HTML5 doctype that MultiMarkdown emits.
htmldoctype = ' '.join([
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"',
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">',
    '\n'
])
# Replacement for the bare <html> tag.
htmlinfo = '<html xmlns="http://www.w3.org/1999/xhtml">\n'

inlines = []

# Read the file, swapping the doctype and <html> lines as they are found.
try:
    with open(filepath, mode='r', encoding='utf-8') as infile:
        for line in infile:
            if line.strip() == '<!DOCTYPE html>':
                inlines.append(htmldoctype)
            elif line.strip() == '<html>':
                inlines.append(htmlinfo)
            else:
                inlines.append(line)
except Exception:
    print('something went wrong in get')
    # Stop here so a failed read cannot overwrite the file with nothing.
    raise SystemExit(1)

# Write the modified lines back to the same file.
try:
    with open(filepath, mode='w', encoding='utf-8') as outfile:
        for line in inlines:
            outfile.write(line)
except Exception:
    print('something went wrong in write')
Input:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>Test of markdown</title>
<link type="text/css" rel="stylesheet" href="../main.css"/>
</head>
<body>
test
</body>
</html>
Output:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8"/>
<title>Test of markdown</title>
<link type="text/css" rel="stylesheet" href="../main.css"/>
</head>
<body>
test
</body>
</html>
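The folder idea mentioned in the answer could look roughly like the sketch below. It is one possible arrangement, not part of the original script: the names rewrite_header and process_folder are invented here, and it assumes the generated files end in .html and are UTF-8 encoded.

```python
from pathlib import Path

XHTML_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">\n'
)
XHTML_HTML_TAG = '<html xmlns="http://www.w3.org/1999/xhtml">\n'


def rewrite_header(text):
    """Swap the HTML5 doctype and bare <html> tag for their XHTML forms."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == '<!DOCTYPE html>':
            out.append(XHTML_DOCTYPE)
        elif stripped == '<html>':
            out.append(XHTML_HTML_TAG)
        else:
            out.append(line)
    return ''.join(out)


def process_folder(folder):
    """Rewrite every *.html file directly inside folder; return changed paths."""
    changed = []
    for path in sorted(Path(folder).glob('*.html')):
        original = path.read_text(encoding='utf-8')
        updated = rewrite_header(original)
        if updated != original:  # leave files alone if nothing matched
            path.write_text(updated, encoding='utf-8')
            changed.append(path)
    return changed
```

Keeping the text transformation in its own function makes it easy to check on a string before letting it touch any files.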
There are lots of examples of how to strip HTML tags from a document using Ruby; Hpricot and Nokogiri both have inner_text methods that remove all the HTML for you easily and quickly.
What I am trying to do is the opposite: remove all the text from an HTML document, leaving just the tags and their attributes.
I considered looping through the document and setting inner_html to nil, but that would have to be done in reverse, because the inner_html of the first element (the root) is the entire rest of the document. Ideally I'd start at the innermost elements and set inner_html to nil while moving up through the ancestors.
Does anyone know a neat trick for doing this efficiently? I was thinking that regexes might do it, but probably not as efficiently as an HTML tokenizer/parser.
This works too:
doc = Nokogiri::HTML(your_html)
doc.xpath("//text()").remove
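For comparison, the same parser-based idea can be sketched in Python with the standard library's html.parser. This is an illustration rather than a port of the Nokogiri call: the class name TagsOnly and the function strip_text are made up for the example, and unlike Nokogiri it does not repair or re-serialize the document; it simply re-emits the tags it sees and drops text and comments.

```python
from html.parser import HTMLParser


class TagsOnly(HTMLParser):
    """Collect tags (with their original attributes) and discard all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_decl(self, decl):
        # Keep declarations such as the doctype.
        self.parts.append(f"<!{decl}>")


def strip_text(markup):
    parser = TagsOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(strip_text("<div>foo bar</div><p>I like <em>this</em> stuff</p>"))
```

Because text is never collected, the contents of script and style elements are dropped along with ordinary text.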
You can scan the string to create an array of "tokens", and then only select those that are html tags:
>> some_html
=> "<div>foo bar</div><p>I like <em>this</em> stuff <a href='http://foo.bar'> long time</a></p>"
>> some_html.scan(/<\/?[^>]+>|[\w\|`~!@#\$%^&*\(\)\-_\+=\[\]{}:;'",\.\/?]+|\s+/).select { |t| t =~ /<\/?[^>]+>/ }.join("")
=> "<div></div><p><em></em><a href='http://foo.bar'></a></p>"
==Edit==
Or even better, just scan for html tags ;)
>> some_html.scan(/<\/?[^>]+>/).join("")
=> "<div></div><p><em></em><a href='http://foo.bar'></a></p>"
To grab everything not in a tag, you can use Nokogiri like this:
doc.search('//text()').text
Of course, that will grab stuff like the contents of <script> or <style> tags, so you could also remove blacklisted tags:
blacklist = ['title', 'script', 'style']
nodelist = doc.search('//text()')
blacklist.each do |tag|
nodelist -= doc.search('//' + tag + '/text()')
end
nodelist.text
You could also whitelist if you preferred, but that's probably going to be more time-intensive:
whitelist = ['p', 'span', 'strong', 'i', 'b'] #The list goes on and on...
nodelist = Nokogiri::XML::NodeSet.new(doc)
whitelist.each do |tag|
nodelist += doc.search('//' + tag + '/text()')
end
nodelist.text
You could also just build a huge XPath expression and do one search. I honestly don't know which way is faster, or if there is even an appreciable difference.
I just came up with this, but @andre-r's solution is so much better!
#!/usr/bin/env ruby
require 'nokogiri'
def strip_text doc
  Nokogiri(doc).tap { |doc|
    doc.traverse do |node|
      node.content = nil if node.text?
    end
  }.to_s
end

require 'test/unit'
require 'yaml'

class TestHTMLStripping < Test::Unit::TestCase
  def test_that_all_text_gets_strippped_from_the_document
    dirty, clean = YAML.load DATA
    assert_equal clean, strip_text(dirty)
  end
end
__END__
---
- |
  <!DOCTYPE html>
  <html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>
  <head>
  <meta http-equiv='Content-type' content='text/html; charset=UTF-8' />
  <title>Test HTML Document</title>
  <meta http-equiv='content-language' content='en' />
  </head>
  <body>
  <h1>Test <abbr title='Hypertext Markup Language'>HTML</abbr> Document</h1>
  <div class='main'>
  <p>
  <strong>Test</strong> <abbr title='Hypertext Markup Language'>HTML</abbr> <em>Document</em>
  </p>
  </div>
  </body>
  </html>
- |
  <!DOCTYPE html>
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title></title>
  <meta http-equiv="content-language" content="en">
  </head>
  <body><h1><abbr title="Hypertext Markup Language"></abbr></h1><div class="main"><p><strong></strong><abbr title="Hypertext Markup Language"></abbr><em></em></p></div></body>
  </html>