Sphinx Read the Docs - Use icon-only in texts - html

I want to use some informative icons that depict specific meanings. Hover your mouse over an icon and you see a tooltip showing what it means. Allow non-graphical user agents (e.g. browsers that do not support CSS, text-to-speech devices) to read it too.
Suppose you write a list:
Chicken sandwich
Tofu sandwich - vegan
Chicken salad
House salad - vegan
and you want to change this as below, where 🥒 means "vegan".
Chicken sandwich
Tofu sandwich - 🥒
Chicken salad
House salad - 🥒
One idea I came up with is to write
* Chicken sandwich
* Tofu sandwich - &&VEGAN&&
...
then replace the string &&VEGAN&& with <span class="vegan">VEGAN</span>.
Also add custom CSS snippets:
span.vegan {
display: none;
}
span.vegan:hover {
visibility: visible;
/* some position settings for the tool tip go here */
}
span.vegan::before {
display: inline;
font-family: "Font Awesome";
content: "\f06c";
}
The string replacement can be done each time I run make html, by following it with a shell command:
find /path/to/build/html/ -name '*.html' -exec \
sed -i 's/\&\&VEGAN\&\&/\<span class="vegan"\>VEGAN\<\/span\>/g' {} \;
Does Sphinx or Read the Docs already have this sort of feature?

Create a substitution definition with one of the built-in directives. Use replace:: to substitute other text, image:: to insert an image, and unicode:: to insert a (special) Unicode character.
In the original question, I wanted the cucumber character to appear wherever |vegan| is written. Put the following definition either in the same reStructuredText source document or in rst_prolog in conf.py; the latter makes it available in every document of the project.
.. |vegan| unicode:: U+1F952
* Chicken sandwich
* Tofu sandwich - |vegan|
* Chicken salad
* House salad - |vegan|
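For the project-wide variant, the conf.py entry might look like the fragment below. This is a minimal sketch: rst_prolog is a standard Sphinx configuration value whose text is included at the beginning of every source file, and the substitution name |vegan| is simply the one used in the example above.

```python
# conf.py (fragment)
# rst_prolog is included at the start of every reStructuredText source
# file, so the |vegan| substitution can be used in any document.
rst_prolog = """
.. |vegan| unicode:: U+1F952
"""
```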
Run make html and open the generated HTML file in a browser. Each |vegan| is rendered as the cucumber character.
Restriction: Simply replacing the substitution text with other text or an image leaves web accessibility concerns.
As stated in the original question, none of these directives can add a title attribute. In the HTML file, you will want |vegan| to be replaced by the following HTML code:
<span title="Vegan">&#x1f952;</span>
The example directive only produces 🥒. With the span, hovering the mouse over the cucumber character in a web browser shows a tooltip saying "Vegan". It also lets you see the meaning of the character even if your browser doesn't support some special characters, including that emoji.
Workaround 1. Build the HTML anyway and post-process it: find every cucumber character and replace it with the HTML code. In Bash, use
find /path/to/build/html/ -name '*.html' -exec \
sed -i 's/🥒/<span title="vegan">🥒<\/span>/g' {} \;
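The same post-processing step can also be sketched in Python, which avoids shell quoting issues. The function names add_tooltip and process_build are hypothetical, and the build directory is whatever your make html writes to. Like the sed command, this is not idempotent: run it once on a fresh build, otherwise the icon is wrapped twice.

```python
from pathlib import Path


def add_tooltip(html: str, icon: str = "\U0001F952", title: str = "Vegan") -> str:
    """Wrap every occurrence of icon in a span that carries a title attribute."""
    wrapped = f'<span title="{title}">{icon}</span>'
    return html.replace(icon, wrapped)


def process_build(build_dir: str) -> int:
    """Rewrite each .html file under build_dir in place; return how many changed."""
    changed = 0
    for path in Path(build_dir).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = add_tooltip(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling process_build("/path/to/build/html") right after make html would then play the role of the find/sed pair.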
Workaround 2. Use a different directive for each output format. (Still being researched; answers are welcome on the Stack Overflow question "Sphinx: Use a different directive for a different output format".)

Is there a way to add hyperlinks from within JSON data?

I am trying to implement a proof of concept for a personal project which is a hypertextual blog. The project is built using expressjs and I am storing the blog data in a JSON file. Part of the structure of the JSON file is as follows:
{
"id": 1,
"name_kanji": "井上和 ",
[...]
"trivia": [ "Favorite food are Mikan and bell pepper stuffed with meat",
"Lightstick Colors are red and white",
"Favorite color is mustard yellow",
"Favorite Nogizaka46song is Arigachi na Renai",
"Her favorite animes are 86 Eighty Six, Vivy and No Game No Life",
"She likes anime songs, vocaloid, Yonezu Kenshi and Yorushika",
"Her favorite mangas are Magi and Tokimeki Tonight",
"She was the first Nogizaka46 5th Generation Member to be announced"
],
"tv_participation": ["Nogizaka Shin Star Tanjo!",
"Nogizaka Under Construction",
"Music Blood"],
"single_participation": ["Actually...",
"Suki to Iu no wa Rock daze!",
"Koko ni wa Nai Mono"],
[...]
}
Ideally, I want to be able to add links to specific words in the JSON file, and then render them as hyperlinks within pug.
As a more specific example:
block col-2
  ul
    each val in member.trivia
      li= val
I am trying to add hyperlinks to specific words within the JSON file, and then render them as hyperlinks within a Pug template. For example, I want to turn the words 'Mikan', 'Lightstick Colors', 'Arigachi na Renai' etc. into hyperlinks so that I can connect them to other pages. I have tried adding the hyperlink directly in the JSON file but it did not work. Is there a straightforward way to do this that I am not aware of?
If you store the full HTML anchor tag as escaped text within a JSON text string, you can use unescaped interpolation to render the link:
JSON
{
"myProperty": "Lorem <a href=\"#\">Ipsum</a>"
}
Pug
ul
  li!= myProperty
HTML
<ul>
  <li>Lorem <a href="#">Ipsum</a></li>
</ul>
It's not very pretty, though. And be sure to heed the warnings about unescaped interpolation in the Pug docs.
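One way to reduce the risk of unescaped output is to keep the JSON as plain text and build the anchors in code, escaping everything else. The sketch below is in Python only for brevity; the same idea ports to a small JavaScript helper called before rendering the Pug template. The function name linkify, the links mapping, and the URL are all hypothetical.

```python
import html
import re


def linkify(text: str, links: dict[str, str]) -> str:
    """Escape text for HTML, wrapping each known term in an anchor tag."""
    if not links:
        return html.escape(text)
    # Try longer terms first so overlapping terms prefer the longer match.
    terms = sorted(links, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Plain text between matches is escaped, never trusted as HTML.
        parts.append(html.escape(text[pos:match.start()]))
        href = html.escape(links[match.group(0)], quote=True)
        parts.append(f'<a href="{href}">{html.escape(match.group(0))}</a>')
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


print(linkify("Favorite food are Mikan & more", {"Mikan": "/food/mikan"}))
```

Only the result of such a helper would then be passed to the unescaped != output, so markup typed into the JSON by accident is still escaped.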
Is there a reason you're writing and storing the blog data in JSON and not in Pug?

Does text from a rich text editor not inherit styles when rendered in an HTML document?

Just to make things clear, I have used an RTE in the backend to store a description. Later, through an API, I receive the description along with other details as a response. The styles are intact up to this point, for example bold headings. But when I render it in the HTML document using the innerHTML property, all I see is unformatted text. The headings are not bold anymore.
Here's a part of response:
</p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">Features</span> \n </p>\r\n\n\r\n<p>Gives even skin tone, smoother complexion and sculpted facial features.
Clearly, font-weight: bold can be seen here. But the rendered version does not contain those styles.
Here's the full response:
{
"cart_count":2,
"images":[
],
"success":true,
"message":"Sucessfully",
"data":{
"product_id":1,
"name":"Dr G Butterfly Gua Sha",
"category_id":1,
"category":"Skin Tool",
"description":"<p>Dr G Butterfly Rose Quartz Gua Sha is a beauty and wellness tool designed to heal and enhance natural beauty. It lifts and sculpts your face, drains the lymph node, which reduces puffy eyes and face. By scraping with repeated strokes on the surface of the skin, this tool helps stimulate muscles and increases the blood flow. \n </p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">Features</span> \n </p>\r\n\n\r\n<p>Gives even skin tone, smoother complexion and sculpted facial features. Reduces the signs of ageing and gives younger-looking skin. Increases lymphatic function. Stimulates blood circulation. Improves the appearance of dark circles and reduces under-eye puffiness. </p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">How To Use \n</span></p>\r\n\n\r\n<p>Apply Dr G oil or Dr G gel as per your skin type covering the face and neck. </p>\r\n<p>Hold the butterfly gua sha tool firmly and sweep across gently up and out, starting with the neck, cheeks, jawline, chin, around the mouth, and slowly glide under the eyes, across your eyebrows and from your forehead up to your hairline. </p>\r\n<p>You can sweep it 3-5 times per area. </p>\r\n<p>Recommended at least a few times a week for best results. </p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">About Dr G</span> \n </p>\r\n\n\r\n<p>Dr G offers luxury skincare products, backed by over a decade of dermatology expertise and on-ground practice. Made for Indian weather conditions, with variants for different skin types, including sensitive skin, and to address specific skin concerns - these innovative products are a perfect balance of nature and science. Drawing from ancient Ayurveda and combining natural extracts with skin-safe science, Dr G's range of products bridge modern skincare with holistic science.</p>",
"short_description":"Sculpts, Tones, Reduces Puffiness, Lifts",
"max_quantity":500,
"status":1,
"in_stock":1,
"measurement":[
{
"is_cart":true,
"ordered_quantity":2,
"is_wish":false,
"discounted_price":1400.0,
"weight":"200 Gram",
"price":1400.0,
"prod_id":1,
"percentage":100,
"max_quantity":500
}
]
}
}
The HTML from your response isn't valid. You can easily test this by copying the HTML string from your response into a text file with an .html extension (index.html, for example) and opening it in your browser. Or use a validator like this one: https://www.freeformatter.com/html-validator.html
Let's pick one part from the HTML string which has wrong characters and gets displayed unformatted:
<span style=\"font-weight: bold;\">Features</span> \n
If you remove the backslashes \ here, this piece gets rendered correctly:
<span style="font-weight: bold;">Features</span> \n
I would recommend encoding the HTML before sending it to the frontend. You could use Base64, which is easy to encode on the backend and to decode on the frontend before displaying it.
If these "wrong" characters are already there when you receive this HTML (on your backend), you have to parse it first to clean it.
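As a rough illustration, the sketch below shows two things in Python. First, the backslashes in the response above look like ordinary JSON string escapes; if that is the case, parsing the body with a JSON parser removes them, so it is worth checking whether the frontend inserts the parsed value or the raw text. Second, it shows the Base64 round trip suggested here. The shortened raw_body is an assumption made for illustration, not the real payload.

```python
import base64
import json

# A shortened response body as it travels over the wire (assumed for illustration).
raw_body = r'{"description": "<p><span style=\"font-weight: bold;\">Features</span></p>"}'

# Parsing the JSON turns each \" escape back into a plain double quote.
description = json.loads(raw_body)["description"]
print(description)

# Base64 round trip: encode on the backend, decode on the frontend.
encoded = base64.b64encode(description.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded == description)
```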

How to fold/unfold HTML tags with Vim

Is there some plugin to fold HTML tags in Vim?
Or is there another way to set up a shortcut to fold or unfold HTML tags?
I would like to fold/unfold html tags just like I do with indentation folding.
I have found zfat (or, equally, zfit) works well for folding with HTML documents. za will toggle (open or close) an existing fold. zR opens all the folds in the current document, zM effectively re-enables all existing folds marked in the document.
If you find yourself using folds extensively, you could make some handy keybindings for yourself in your .vimrc.
If you indent your HTML the following should work:
set foldmethod=indent
The problem with this, I find, is there are too many folds. To get around this I use zO and zc to open and close nested folds, respectively.
See :help fold-indent for more information:
The folds are automatically defined by the indent of the lines.
The foldlevel is computed from the indent of the line, divided by the
'shiftwidth' (rounded down). A sequence of lines with the same or higher fold
level form a fold, with the lines with a higher level forming a nested fold.
The nesting of folds is limited with 'foldnestmax'.
Some lines are ignored and get the fold level of the line above or below it,
whichever is lower. These are empty or white lines and lines starting
with a character in 'foldignore'. White space is skipped before checking for
characters in 'foldignore'. For C use "#" to ignore preprocessor lines.
When you want to ignore lines in another way, use the 'expr' method. The
indent() function can be used in 'foldexpr' to get the indent of a line.
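The arithmetic in the help text can be sketched as follows. This is only an illustration of "indent divided by 'shiftwidth', rounded down", not Vim's implementation: the function name indent_fold_levels is hypothetical, blank lines are left as None instead of borrowing a neighbour's level, and 'foldignore' is not modelled.

```python
def indent_fold_levels(lines, shiftwidth=2, tabstop=8):
    """Return the indent-based fold level of each line (None for blank lines)."""
    levels = []
    for line in lines:
        if not line.strip():
            # Vim assigns blank lines the lower level of the neighbouring
            # lines; this sketch leaves them undecided.
            levels.append(None)
            continue
        expanded = line.expandtabs(tabstop)
        indent = len(expanded) - len(expanded.lstrip(" "))
        levels.append(indent // shiftwidth)  # integer division rounds down
    return levels


html_lines = ["<div>", "  <p>", "    text", "  </p>", "</div>"]
print(indent_fold_levels(html_lines))
```

With deeply nested HTML the levels grow quickly, which is why indent folding tends to produce many nested folds.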
Folding HTML with foldmethod=syntax, which is simpler.
This answer is based on "HTML syntax folding in vim" by @Ingo Karkat.
Set your fold method to syntax with the following on the Vim command line:
:set foldmethod=syntax
or put the setting in ~/.vim/after/ftplugin/html.vim:
setlocal foldmethod=syntax
Also note that, so far, the default syntax script only folds a multi-line
tag itself, not the text between the opening and closing tags.
So, this gets folded:
<div
class="foo"
id="bar"
>
And this doesn't
<div>
<b>text between here</b>
</div>
To fold the text between tags as well, you need to extend the syntax script with
the following, best placed in ~/.vim/after/syntax/html.vim.
The syntax folding is performed between all but void HTML elements
(those which don't have a closing sibling, like <br>):
syntax region htmlFold start="<\z(\<\(area\|base\|br\|col\|command\|embed\|hr\|img\|input\|keygen\|link\|meta\|para\|source\|track\|wbr\>\)\@![a-z-]\+\>\)\%(\_s*\_[^/]\?>\|\_s\_[^>]*\_[^>/]>\)" end="</\z1\_s*>" fold transparent keepend extend containedin=htmlHead,htmlH\d
Install the js-beautify command (JavaScript version):
npm -g install js-beautify
wget --no-check-certificate https://www.google.com.hk/ -O google.index.html
js-beautify -f google.index.html -o google.index.bt.html
http://www.google.com.hk original HTML:
js-beautify and vim fold:
An add-on to the answer by James Lai.
Initially my foldmethod was syntax, so zfat wouldn't work.
The solution is to set foldmethod to manual:
:setlocal foldmethod=manual
To check which foldmethod is in use:
:setlocal foldmethod?
First set foldmethod=syntax, then try zfit to fold from the start tag and zo to unfold tags. It works well on my Vim.

What would I use to remove escaped html from large sets of data

Our database is filled with articles retrieved from RSS feeds. I was unsure of what data I would be getting, and how much filtering was already setup (WP-O-Matic Wordpress plugin using the SimplePie library). This plugin does some basic encoding before insertion using Wordpress's built in post insert function which also does some filtering. Between the RSS feed's encoding, the plugin's encoding using PHP, Wordpress's encoding and SQL escaping, I'm not sure where to start.
The data is usually at the end of the field after the content I want to keep. It is all on one line, but separated out for readability:
<img src="http://feeds.feedburner.com/~ff/SoundOnTheSound?i=xFxEpT2Add0:xFbIkwGc-fk:V_sGLiPBpWU" border="0"></img>
<img src="http://feeds.feedburner.com/~ff/SoundOnTheSound?d=qj6IDK7rITs" border="0"></img>
<img src="http://feeds.feedburner.com/~ff/SoundOnTheSound?i=xFxEpT2Add0:xFbIkwGc-fk:D7DqB2pKExk"
Notice how some of the images are escaped and some aren't. I believe this has to do with the last part being cut off so as to be unrecognizable as an HTML tag, which then caused it to be HTML-encoded while the actual img tags were left alone.
Another record has only this in one of the fields, which means the RSS feed gave me nothing for the item (filtered out now, but I have a bunch of records like this):
<img src="http://farm3.static.flickr.com/2183/2289902369_1d95bcdb85.jpg" alt="post_img" width="80"
All extracted samples are on one line, but broken up for readability. Otherwise, they are copied exactly from the database from the command line mysql client.
Question: What is the best way to work with the above escaped html (or portion of an html tag), so I can then remove it without affecting the content?
I want to remove it, because the images at the end of the field are usually images that have nothing to do with content. In the case of the feedburner ones, feedburner adds those to every single article in a feed. Other times, they're broken links surrounding broken images. The point is not the valid html img tags which can be removed easily. It's the mangled tags which if unencoded will not be valid html, which will not be parsable with your standard html parsers.
[EDIT]
If it was just a matter of pulling the html I wanted out and doing a strip_tags and reinserting the data, I wouldn't be asking this question.
The portion that I have a problem with is that what used to be an img tag was HTML-encoded and the end cut off. If it's decoded it will not be an HTML tag, so I cannot parse it the usual way.
With all the <img src=" crap, I can't get my head around searching for it other than SELECT ID, post_content FROM table WHERE post_content LIKE '<img' which at least gets me those posts. But when I get the data, I need a way to find it, remove it, but keep the rest of the content.
[/EDIT]
[EDIT 2]
<img src="http://farm4.static.flickr.com/3162/2735565872_b8a4e4bd17.jpg" alt="post_img" width="80" />Through the first two months of the year, the volume of cargo handled at Port of Portland terminals has increased 46 percent as the port?s marine cargo business shows signs of recovering from a dismal 2009.<div>
<img src="http://feeds.feedburner.com/~ff/bizj_portland?d=yIl2AUoC8zA" border="0"></img> <img src="http://feeds.feedburner.com/~ff/bizj_portland?i=YIs66yw13JE:_zirAnH6dt8:V_sGLiPBpWU" border="0"></img> <img src="http://feeds.feedburner.com/~ff/bizj_portland?i=YIs66yw13JE:_zirAnH6dt8:F7zBnMyn0Lo" border="0"></img> <a href="http://feeds.bizjournals.com/~ff/bizj_portland?a=YIs66yw13JE:_zirAnH6dt8:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/bizj_portland?d=qj6IDK7rITs"
The part I want to keep:
<img src="http://farm4.static.flickr.com/3162/2735565872_b8a4e4bd17.jpg" alt="post_img" width="80" />Through the first two months of the year, the volume of cargo handled at Port of Portland terminals has increased 46 percent as the port?s marine cargo business shows signs of recovering from a dismal 2009.
To reiterate: It's not about removing the valid html img tags. That's easy. I need to be able to find specifically the <img src="http://feeds.feedburner.com/~ff/bizj_portland?d=qj6IDK7rITs" if it's part of the pattern of img tag img tag mangled img tag or anchor img anchor img img mangled image etc etc, but not remove <img if it is indeed part of the article. Out of the few dozen samples I've reviewed, it's been pretty consistent that this mangled img tag is at the end of the field.
The other one is the single mangled image tag. It's consistently a mangled flickr img tag, but as above, I can't just search for <img as it could be a valid part of the content.
The problem lies in that I can't simply decode it and parse it as HTML, because it will not be valid html.
[/EDIT 2]
The best way is to:
Install HTML::Entities from CPAN and use that to unescape the URIs.
Install HTML::Parser from CPAN and use that to parse and remove the URIs after they're unescaped.
Regexes are not a suitable tool for this task.
Question updated...
To extract the data you want, you could use this approach:
use HTML::Entities qw/decode_entities/;
my $decoded = decode_entities $raw;
if ($decoded =~ s{ (<img .+? (?:>.+?</img>|/>)) } {}x) { # grab the image
my $img = $1;
$decoded =~ s{<.+?>} {}xg; # strip complete tags
$decoded =~ s{< [^>]+? $} {}x; # strip trailing noise
print $img.$decoded;
}
Using a regex to parse HTML is generally frowned upon, however, in this case, it is more about stripping out segments that match a pattern. After testing the regexes on a larger set of data, you should have an idea of what might need to be tweaked.
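For readers who do not use Perl, the two central steps of that approach (decode the entities, then strip a trailing tag that never closes) can be sketched in Python. This is not a port of the whole snippet: the function name strip_truncated_tail and the sample string are hypothetical, and the pattern only targets a cut-off tag at the very end of the field.

```python
import html
import re


def strip_truncated_tail(raw: str) -> str:
    """Decode HTML entities, then drop a trailing tag that is cut off before '>'."""
    decoded = html.unescape(raw)
    # Matches a '<' followed only by non-bracket characters up to the end
    # of the string, i.e. a tag that never reaches its closing '>'.
    return re.sub(r"<[^<>]*\Z", "", decoded)


sample = 'Cargo volume rose.&lt;img src=&quot;http://example.invalid/a.jpg&quot; alt=&quot;x&quot;'
print(strip_truncated_tail(sample))
```

Complete tags inside the article are left untouched, because they contain a closing '>' before the end of the string.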
Hope this helps.
I wouldn't strip it out. It's far from unrecoverable junk.
First apply HTML::Entities::decode_entities conditionally (use the occurrence of &lt; at the very start of the field as a heuristic), then let HTML::Tidy::LibXML->clean(…, 'UTF-8', 1) reconstruct the markup as intended. clean returns a whole document, but it's trivial to extract just the needed img element.
Your best bet will be to recollect all of the articles that are in the database so that they aren't truncated and corrupted. If this is not an option then...
Based on your examples above, it looks like you're stripping out everything that follows the text content of each article. In your example the text content is followed by a DIV tag and a bunch of IMG tags that may or may not have been truncated and/or converted into HTML entities.
If all of your records are similar you can strip out everything after the Text content by removing the final div tag and everything that follows it using perl like this:
my $article = magic_to_get_an_article();
$article =~ s/<div>.*//s;
magic_to_store_article($article);
If your records include anything more complex than this you're better off using an HTML parsing module and reading the documentation carefully to find out how it handles invalid HTML.
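The same cut can be sketched in Python. One detail worth noting: the Perl substitution above cuts at the first <div> it finds, while the prose speaks of the final one. The hypothetical helper below cuts at the final <div>; for records that contain a single <div>, as in the sample, both give the same result.

```python
def keep_before_last_div(article: str) -> str:
    """Return the article up to (not including) its final <div>, if there is one."""
    head, separator, _tail = article.rpartition("<div>")
    # rpartition returns an empty separator when "<div>" does not occur.
    return head if separator else article


sample = 'Through the first two months of the year.<div> <img src="x" border="0"></img> <img src="y"'
print(keep_before_last_div(sample))
```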
How about a stupid simple Perl find and replace on the var containing your data...
foreach my $line (@lines) {
    # turn the escaped angle brackets back into real ones
    $line =~ s/&lt;/</g;
    $line =~ s/&gt;/>/g;
}
Given the sample input and output you give at the end of your post, the following will get you the desired output:
#!/usr/bin/perl
use strict; use warnings;
use HTML::TokeParser::Simple;
my $parser = HTML::TokeParser::Simple->new( \*DATA );
if ( my $tag = $parser->get_tag('img') ) {
print $tag->as_is;
print $parser->get_text('div');
}
__DATA__
<img src="http://farm4.static.flickr.com/3162/2735565872_b8a4e4bd17.jpg" alt="post_img" width="80" />Through the first two months of the year, the volume of cargo handled at Port of Portland terminals has increased 46 percent as the port?s marine cargo business shows signs of recovering from a dismal 2009.<div> <img src="http://feeds.feedburner.com/~ff/bizj_portland?d=yIl2AUoC8zA" border="0"></img> <img src="http://feeds.feedburner.com/~ff/bizj_portland?i=YIs66yw13JE:_zirAnH6dt8:V_sGLiPBpWU" border="0"></img> <img src="http://feeds.feedburner.com/~ff/bizj_portland?i=YIs66yw13JE:_zirAnH6dt8:F7zBnMyn0Lo" border="0"></img> <a href="http://feeds.bizjournals.com/~ff/bizj_portland?a=YIs66yw13JE:_zirAnH6dt8:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/bizj_portland?d=qj6IDK7rITs"
Output:
<img src="http://farm4.static.flickr.com/3162/2735565872_b8a4e4bd17.jpg" alt="post_img" width="80" />Through the first two months of the year, the volume of cargo handled at Port of Portland terminals has increased 46 percent as the port?s marine cargo business shows signs of recovering from a dismal 2009.
However, I am puzzled as to the size and scope of each chunk you are supposed to process.

What Unicode character do you use in your website? (instead of image icons)

I am looking for characters that could replace image icons, for example ✘ (x mark) and ✔ (tick). Is there perhaps a symbol for "draft" or "new message"?
EDIT:
Fav: ❤
Draft: ✍
Message: ✉
To find useful symbols, I have two great resources:
http://shapecatcher.com
Allows you to draw a shape, which it then searches for similarly shaped unicode symbols.
https://www.fileformat.info/info/unicode/block/index.htm
Lists unicode by the character blocks (using an embedded unicode font to maximize compatibility for display) and has a "display a certain block with images" functionality that allows you to review symbol blocks.
Both are quite useful though I often end up using shapecatcher these days just because it's a fun break just to be able to draw the shape that you want and have the site pull it up for you. At least, sometimes it will put it up.
Misc. Symbols Blocks
http://shapecatcher.com/unicode/block/Miscellaneous_Symbols_And_Pictographs is also a great category of unicode symbols, though as with all unicode, you may have to test compatibility.
https://www.fileformat.info/info/unicode/block/miscellaneous_symbols/images.htm is the block of the miscellaneous symbols, for comparison.
⌚ U+231A WATCH
⌛ U+231B HOURGLASS
♟ U+265F BLACK CHESS PAWN
⚷ U+26B7 CHIRON
★ U+2605 BLACK STAR
✓ U+2713 CHECK MARK
☑ U+2611 BALLOT BOX WITH CHECK
✕ U+2715 MULTIPLICATION X
☒ U+2612 BALLOT BOX WITH X
⚠ U+26A0 WARNING SIGN
These are also good symbols to add to the list.
Edit: In 2019, I would instead recommend using a robust icon pack, in either SVG or font-file form; the presentation of Unicode characters is often less controllable for web developers.
stackoverflow.com used to use "●" (U+25CF BLACK CIRCLE) for badges.
There are tons of useful characters in Unicode:
✆ U+2706 TELEPHONE LOCATION SIGN
✉ U+2709 ENVELOPE
☎ U+260E BLACK TELEPHONE and ☏ U+260F WHITE TELEPHONE
✎ U+270E LOWER RIGHT PENCIL
⌛ U+231B HOURGLASS
⌨ U+2328 KEYBOARD
←
↑
→
↓
↔
↕
↖
↗
↘
↙
just to name a few...
Why not just peruse the whole list?
I've used the block-arrows:
U+25b2 ▲, U+25ba ►, U+25bc ▼, U+25c4 ◄
Look at http://unicode.org/charts#symbols for some ideas. I'm not sure what would work for "draft" or "new message" but there is a lot to choose from there.
Some symbols might not be supported by the font selected into the browser page. Even if they are, a lot of them look really bad at small sizes. You're better off using an image if you can.
http://unicode-table.com/ is great too, but for icons designed specifically for web design, I recommend http://kudakurage.com/ligature_symbols/.
Twitter Bootstrap uses × (&times;) for close buttons.
I would suggest using custom font like https://github.com/FortAwesome/Font-Awesome
You can also have svg/png version https://github.com/encharm/Font-Awesome-SVG-PNG
There are also other svg icons
https://github.com/iconic/open-iconic
https://github.com/outpunk/evil-icons
Pure css icons https://github.com/saeedalipoor/icono
For Material Design you have static svg icons https://google.github.io/material-design-icons/ and animated:
http://tympanus.net/Development/AnimatedSVGIcons/
http://tympanus.net/Development/IconHoverEffects/
http://tympanus.net/Development/AnimatedCheckboxes/
https://alexk111.github.io/SVG-Morpheus/
I am surprised no one has posted Unicode emojis yet:
Range U+1F600 - U+1F64F
Just some from the list:
😁 :U+1F601: GRINNING FACE WITH SMILING EYES &#128513;
😂 :U+1F602: FACE WITH TEARS OF JOY &#128514;
😃 :U+1F603: SMILING FACE WITH OPEN MOUTH &#128515;
😄 :U+1F604: SMILING FACE WITH OPEN MOUTH AND SMILING EYES &#128516;
😅 :U+1F605: SMILING FACE WITH OPEN MOUTH AND COLD SWEAT &#128517;
😆 :U+1F606: SMILING FACE WITH OPEN MOUTH AND TIGHTLY-CLOSED EYES &#128518;
😷 :U+1F637: FACE WITH MEDICAL MASK &#128567;
Also have a look at this list of cool icons from Supplemental list
☣ : U+2623: BIOHAZARD SIGN &#9763;
☢ : U+2622: RADIOACTIVE SIGN &#9762;
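Entries like the ones above can be generated rather than looked up by hand. The sketch below uses Python's standard unicodedata module; the helper name describe is hypothetical, and unicodedata.name raises ValueError for characters that have no name in the Unicode database bundled with the interpreter.

```python
import unicodedata


def describe(char: str) -> tuple[str, str, str]:
    """Return (code point, official Unicode name, decimal HTML reference) for char."""
    code = ord(char)
    return (f"U+{code:04X}", unicodedata.name(char), f"&#{code};")


for symbol in ["\u2623", "\u2622", "\U0001F601"]:
    print(symbol, *describe(symbol))
```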
I've used the magnifying glass icon as the body of an anchor to link to a cool interactive page for some data analysis that allowed a user to pair arbitrary data selections much like this example.
🔎
Because it is a link, the default underline somewhat obscured the Unicode glyph. That effect was negligible for our internal tool but might be suboptimal for something public-facing.