Does text from a rich text editor not inherit styles when rendered in an HTML document? - html

Just to make things clear, I have used an RTE in the backend to store a description. Later, through an API, I receive the description along with other details in a response. The styles are intact up to this point, for example bold headings. But when I render it in the HTML document using the innerHTML property, all I see is unformatted text. The headings are not bold anymore.
Here's a part of response:
</p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">Features</span> \n </p>\r\n\n\r\n<p>Gives even skin tone, smoother complexion and sculpted facial features.
Clearly, style="font-weight: bold;" can be seen here. But the rendered version does not contain those styles.
Here's the full response:
{
"cart_count":2,
"images":[
],
"success":true,
"message":"Sucessfully",
"data":{
"product_id":1,
"name":"Dr G Butterfly Gua Sha",
"category_id":1,
"category":"Skin Tool",
"description":"<p>Dr G Butterfly Rose Quartz Gua Sha is a beauty and wellness tool designed to heal and enhance natural beauty. It lifts and sculpts your face, drains the lymph node, which reduces puffy eyes and face. By scraping with repeated strokes on the surface of the skin, this tool helps stimulate muscles and increases the blood flow. \n </p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">Features</span> \n </p>\r\n\n\r\n<p>Gives even skin tone, smoother complexion and sculpted facial features. Reduces the signs of ageing and gives younger-looking skin. Increases lymphatic function. Stimulates blood circulation. Improves the appearance of dark circles and reduces under-eye puffiness. </p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">How To Use \n</span></p>\r\n\n\r\n<p>Apply Dr G oil or Dr G gel as per your skin type covering the face and neck. </p>\r\n<p>Hold the butterfly gua sha tool firmly and sweep across gently up and out, starting with the neck, cheeks, jawline, chin, around the mouth, and slowly glide under the eyes, across your eyebrows and from your forehead up to your hairline. </p>\r\n<p>You can sweep it 3-5 times per area. </p>\r\n<p>Recommended at least a few times a week for best results. </p>\r\n\n\r\n<p><span style=\"font-weight: bold;\">About Dr G</span> \n </p>\r\n\n\r\n<p>Dr G offers luxury skincare products, backed by over a decade of dermatology expertise and on-ground practice. Made for Indian weather conditions, with variants for different skin types, including sensitive skin, and to address specific skin concerns - these innovative products are a perfect balance of nature and science. Drawing from ancient Ayurveda and combining natural extracts with skin-safe science, Dr G's range of products bridge modern skincare with holistic science.</p>",
"short_description":"Sculpts, Tones, Reduces Puffiness, Lifts",
"max_quantity":500,
"status":1,
"in_stock":1,
"measurement":[
{
"is_cart":true,
"ordered_quantity":2,
"is_wish":false,
"discounted_price":1400.0,
"weight":"200 Gram",
"price":1400.0,
"prod_id":1,
"percentage":100,
"max_quantity":500
}
]
}
}

The HTML from your response isn't valid. You can easily test this by copying the HTML string from your response into a text file with an .html extension (index.html, for example) and opening it in your browser. Or use a validator like this one: https://www.freeformatter.com/html-validator.html
Let's pick one part from the HTML string which has wrong characters and gets displayed unformatted:
<span style=\"font-weight: bold;\">Features</span> \n
If you remove the backslashes (\) here, this piece gets rendered correctly:
<span style="font-weight: bold;">Features</span> \n
I would recommend encoding the HTML before sending it to the frontend. You could use Base64, which can easily be encoded in the backend and decoded on the frontend before displaying it.
If these "wrong" characters are already there when you receive this HTML (on your backend), you have to parse it first to clean it.
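Before re-encoding anything, it may be worth checking whether the backslashes are only JSON string escaping. Inside a JSON string, \" and \n are escape sequences, and they disappear once the body is parsed as JSON instead of being inserted as raw text. The sketch below assumes the response has the shape shown in the question; extractDescription, sampleBody and the "description" element id are made-up names for illustration.

```javascript
// Hypothetical helper: parse the raw response body and return the HTML description.
function extractDescription(rawBody) {
  // JSON.parse turns the escape \" into " and \n into a real newline.
  const response = JSON.parse(rawBody);
  return response.data.description;
}

// Shortened, hypothetical copy of the response as it appears on the wire.
// The doubled backslashes are only needed because this is a JavaScript string literal.
const sampleBody =
  '{"success":true,"data":{"description":"<p><span style=\\"font-weight: bold;\\">Features</span> \\n </p>"}}';

const html = extractDescription(sampleBody);
console.log(html);

// In a browser, the parsed string could then be assigned as usual, e.g.
// document.getElementById("description").innerHTML = html; // assumed element id
```

If the logged string still contains backslashes after parsing, the description was probably escaped twice somewhere upstream, and cleaning it on the backend as described above would be the next thing to look at.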

Related

Is it possible to train Tesseract v5 to OCR Egyptian licence plates?

I'm working on a project to OCR Egyptian licence plates written in the Arabic alphabet and Arabic-Indic numerals. The traineddata from https://github.com/Shreeshrii/tessdata_arabic gives an accuracy of 60% for letters and 70% for numbers. I'm guessing the poor accuracy is because the font on the plates is different. Also, the letters are written separately (أ هـ ج)(ل ل ص) on the plates, while they are usually connected in textbooks (أهج)(للص). The detected plates also have different lighting conditions, and the letters may not be clear, since a plate can be dirty or distorted.
Here's a sample that's recognised with an extra apostrophe at the beginning ('ل ل ص ٦٢٩) after preprocessing the image to grayscale and then to black and white. The correct characters are (ل ل ص ٦٢٩).
Another sample of the plates I am trying to recognise, with black and white preprocessing. This one fails: it's recognised as (ط ئ ؤ د ١٢). The characters on the plate are (ط ج د ١٢٦٤).
Should I try different preprocessing? Or should I retrain the existing traineddata for the different font (I searched for the font name but couldn't find it)? Or train from scratch, as the plate images have a lot of noise and differ in brightness/contrast?

Markup text blocks with multiple citations in HTML

I'm undertaking an ambitious project to organize laws in my country starting with the Constitution.
The text body is arranged in short blocks of text called articles, and each article can be modified by a constitutional amendment.
The idea is that when someone highlights a particular article, they get a summary of all the constitutional amendments that went into creating this block of text.
Question: What would be the best way to demarcate a specific block and cite it?
For example, what would be the best way to replace the span tags:
<p>
<span data-ammendment-id="1">
<span data-ammendment-id="2">
10A. Right to fair trial:
For the determination of his civil rights and obligations or in any criminal charge against him a person shall be entitled to a fair trial and due process.
</span></span>
</p>
<p>
<span data-ammendment-id="1">
11 Slavery, forced labour, etc. prohibited
(1) Slavery is non-existent and forbidden and no law shall permit or facilitate its introduction into Pakistan in any form.
(2) All forms of forced labour and traffic in human beings are prohibited.
(3) No child below the age of fourteen years shall be engaged in any factory or mine or any other hazardous employment.
(4) Nothing in this Article shall be deemed to affect compulsory service:-
(a) by any person undergoing punishment for an offence against any law; or
(b) required by any law for public purpose provided that no compulsory service shall be of a cruel nature or incompatible with human dignity.
</span>
</p>
These tags aren't valid here:
cite: doesn't provide a way to demarcate a block of text
details and summary: aren't appropriate here, since I'm looking to cite the appropriate amendment here
blockquote and cite: This would be perfect, except that this is the markup for the original document itself, and so quoting the text wouldn't be appropriate. Especially since some amendments are iterative ("change THIS with THAT at line X") instead of declarative ("replace article X with Y")
del & ins: Unfortunately, this is best used in a diff, which the final version of the document isn't. It must read like plain text.

Sublime Text 2 and Emmet and wrapping lines

I often get text from clients and want to quickly format the text with html tags and wrap each text line at a specified number of characters. I've just installed Sublime Text 2 and it's pretty nice, but one of the things I really want to do I can't quite figure out.
I want to take long paragraphs, wrap each paragraph in a paragraph p tag, and then wrap the lines so they don't run off the screen. So here's what I'm doing:
Copy and paste text from my client into editor (2 paragraphs for this example).
Select text.
Using Emmet, enter "p*" which puts p tags at the beginning of each paragraph and /p at the end of each paragraph.
Select text.
Press Alt+Q to wrap text.
The text wraps but it's corrupted because the opening angle bracket "<" from the /p tag is appended to the beginning of each line and the opening angle bracket is missing from the /p tag.
<p>Our swimming lessons run on a perpetual monthly enrollment system,
<making year-round lessons affordable and convenient. Our online
<registration system allows you to sign up at your convenience and
<monitor your account details easily./p>
<p>Our highly trained swim instructors teach our unique, proven
<curriculum in stages, encouraging swimmers to master the fundamentals
<of every important swimming skill. We continuously encourage
<progression and advancement as each swimmer becomes more confident in
<the water. Our program blends important water safety skills, buoyancy
<principles and correct stroke technique./p>
Help! What am I doing wrong?
Here's what you can do:
1. Paste content from client.
2. Select the text and hit Alt+Q to wrap. You'll now have two cursors, one at the end of each paragraph.
3. Select Selection -> Expand Selection to Paragraph (I'll show you how to make a shortcut later). Both paragraphs are now selected, each as a selection region.
4. Bring up Emmet with Ctrl+Shift+G and enter p (not p*).
5. Hit Enter and you should have two wrapped paragraphs surrounded by <p></p> tags:
<p>Our swimming lessons run on a perpetual monthly enrollment system, making
year-round lessons affordable and convenient. Our online registration system
allows you to sign up at your convenience and monitor your account details
easily.</p>
<p>Our highly trained swim instructors teach our unique, proven curriculum in
stages, encouraging swimmers to master the fundamentals of every important
swimming skill. We continuously encourage progression and advancement as each
swimmer becomes more confident in the water. Our program blends important
water safety skills, buoyancy principles and correct stroke technique.</p>
To create a keyboard shortcut for Expand Selection to Paragraph, go to Preferences -> Key Bindings - User and add the following:
{ "keys": ["ctrl+alt+shift+p"], "command": "expand_selection_to_paragraph" }
If the file is empty, wrap the line above in square brackets []:
[
{ "keys": ["ctrl+alt+shift+p"], "command": "expand_selection_to_paragraph" }
]
Save the file, and you should now be able to use Ctrl+Alt+Shift+P for step 3 above. Feel free to change the key combination if you wish, but be aware that it may conflict with other built-in or plugin combos.
Note: I tested all this on Sublime Text 3, but it should work the same in ST2.
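For comparison, the same order of operations (wrap each paragraph first, then surround it with p tags) can be sketched outside the editor. This is only an illustration, not what Sublime Text or Emmet do internally: wrapText and wrapAndTag are made-up names, the wrap is a simple greedy word wrap, and paragraphs are assumed to be separated by blank lines.

```javascript
// Greedy word wrap: pack as many words as fit within `width` columns per line.
function wrapText(text, width) {
  const words = text.trim().split(/\s+/);
  const lines = [];
  let current = "";
  for (const word of words) {
    if (current === "") {
      current = word;
    } else if (current.length + 1 + word.length <= width) {
      current += " " + word;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines.join("\n");
}

// Wrap every paragraph, then add the tags, so the tags never end up mid-line.
// Assumption: paragraphs are separated by one or more blank lines.
function wrapAndTag(source, width) {
  return source
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((paragraph) => "<p>" + wrapText(paragraph, width) + "</p>")
    .join("\n");
}

console.log(wrapAndTag("Our swimming lessons run all year.\n\nOur instructors are highly trained.", 20));
```

Because the tags are added after wrapping, the first and last line of each paragraph can exceed the width by the length of the tag, which matches the editor output shown above.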

Can OpenNLP use HTML tags as part of the training?

I'm creating a training set for the TokenNameFinder using HTML documents converted into plain text, but my precision is low and I want to use the HTML tags as part of the training, for example words in bold and sentences with different margin sizes.
Will OpenNLP accept and use those tags to create rules?
Is there another way to make use of those tags to improve precision?
It is not clear what you mean by using HTML tags to train OpenNLP.
The training input is an annotated, tokenized sentence:
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of <START:company> Elsevier N.V. <END> , the Dutch publishing group .
To train an OpenNLP model using the standard tooling, you need annotations that follow this convention. Note that the annotations do not follow the XML standard.
You can embed annotations directly in the HTML documents you will use for training. It might even help the classifier with the extra context, but I've never read any experimental results about it.
You should keep in mind that the training data should be tokenized. This means that you should include white space between words and punctuation, as well as between text elements and HTML tags:
<p> <i> Mr . <START:person> Vinken <END> </i> is chairman of <b> <START:company> Elsevier N.V. <END> </b> , the Dutch publishing group .
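A rough sketch of the "white space between text elements and HTML" part could look like the following. It only pads tags with spaces and collapses repeated white space; it does not split words from punctuation, which is better left to a real tokenizer. padTags is a made-up helper name, and the regular expression assumes that no tag contains a ">" inside an attribute value.

```javascript
// Hypothetical helper: make sure every tag (HTML or <START:...>/<END>) is
// surrounded by white space, then collapse runs of white space to one space.
function padTags(annotatedHtml) {
  return annotatedHtml
    .replace(/<[^>]+>/g, " $& ") // "$&" inserts the matched tag itself
    .replace(/\s+/g, " ")
    .trim();
}

const line =
  "<p><i>Mr . <START:person> Vinken <END></i> is chairman of " +
  "<b><START:company> Elsevier N.V. <END></b> , the Dutch publishing group .";

console.log(padTags(line));
```

The input here is assumed to be annotated already; the helper only normalizes spacing so that tags and words end up as separate white-space-delimited tokens.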

What Unicode character do you use in your website? (instead of image icons)

I am looking for characters that could replace image icons, for example ✘ (x mark) and ✔ (tick). Is there maybe a symbol for "draft" or "new message"?
EDIT:
Fav: ❤
Draft: ✍
Message: ✉
To find useful symbols, I have two great resources:
http://shapecatcher.com
Allows you to draw a shape and then searches for similarly shaped Unicode symbols.
https://www.fileformat.info/info/unicode/block/index.htm
Lists Unicode characters by block (using an embedded Unicode font to maximize compatibility for display) and has a "display a certain block with images" function that lets you review symbol blocks.
Both are quite useful, though I often end up using shapecatcher these days, just because it's a fun break to draw the shape you want and have the site pull it up for you. At least, sometimes it will pull it up.
Misc. Symbols Blocks
http://shapecatcher.com/unicode/block/Miscellaneous_Symbols_And_Pictographs is also a great category of unicode symbols, though as with all unicode, you may have to test compatibility.
https://www.fileformat.info/info/unicode/block/miscellaneous_symbols/images.htm is the block of the miscellaneous symbols, for comparison.
⌚ U+231A WATCH
⌛ U+231B HOURGLASS
♟ U+265F BLACK CHESS PAWN
⚷ U+26B7 CHIRON
★ U+2605 BLACK STAR
✓ U+2713 CHECK MARK
☑ U+2611 BALLOT BOX WITH CHECK
✕ U+2715 MULTIPLICATION X
☒ U+2612 BALLOT BOX WITH X
⚠ U+26A0 WARNING SIGN
These are also good symbols to add to the list.
Edit: In 2019 I would now recommend using a robust icon pack, either in SVG form or font-file form; the presentation of Unicode is often less controllable for web developers.
stackoverflow.com used to use "●" (U+25CF BLACK CIRCLE) for badges.
There are tons of useful characters in Unicode:
✆ U+2706 TELEPHONE LOCATION SIGN
✉ U+2709 ENVELOPE
☎ U+260E BLACK TELEPHONE and ☏ U+260F WHITE TELEPHONE
✎ U+270E LOWER RIGHT PENCIL
⌛ U+231B HOURGLASS
⌨ U+2328 KEYBOARD
←
↑
→
↓
↔
↕
↖
↗
↘
↙
just to name a few...
Why not just peruse the whole list?
I've used the block-arrows:
U+25b2 ▲, U+25ba ►, U+25bc ▼, U+25c4 ◄
Look at http://unicode.org/charts#symbols for some ideas. I'm not sure what would work for "draft" or "new message" but there is a lot to choose from there.
Some symbols might not be supported by the font selected into the browser page. Even if they are, a lot of them look really bad at small sizes. You're better off using an image if you can.
http://unicode-table.com/ is great too, but for symbols designed as web design icons, I recommend http://kudakurage.com/ligature_symbols/.
Twitter Bootstrap uses &times; (×) for close buttons.
I would suggest using a custom font like https://github.com/FortAwesome/Font-Awesome
An SVG/PNG version is also available: https://github.com/encharm/Font-Awesome-SVG-PNG
There are also other svg icons
https://github.com/iconic/open-iconic
https://github.com/outpunk/evil-icons
Pure css icons https://github.com/saeedalipoor/icono
For Material Design you have static svg icons https://google.github.io/material-design-icons/ and animated:
http://tympanus.net/Development/AnimatedSVGIcons/
http://tympanus.net/Development/IconHoverEffects/
http://tympanus.net/Development/AnimatedCheckboxes/
https://alexk111.github.io/SVG-Morpheus/
I am surprised no one has posted Unicode emojis yet:
Range U+1F600 - U+1F64F
Just some from the list:
😁 :U+1F601: GRINNING FACE WITH SMILING EYES &#128513;
😂 :U+1F602: FACE WITH TEARS OF JOY &#128514;
😃 :U+1F603: SMILING FACE WITH OPEN MOUTH &#128515;
😄 :U+1F604: SMILING FACE WITH OPEN MOUTH AND SMILING EYES &#128516;
😅 :U+1F605: SMILING FACE WITH OPEN MOUTH AND COLD SWEAT &#128517;
😆 :U+1F606: SMILING FACE WITH OPEN MOUTH AND TIGHTLY-CLOSED EYES &#128518;
😷 :U+1F637: FACE WITH MEDICAL MASK &#128567;
Also have a look at this list of cool icons from the Supplemental list:
☣ : U+2623: BIOHAZARD SIGN &#9763;
☢ : U+2622: RADIOACTIVE SIGN &#9762;
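The relationship between the U+ notation and the numeric HTML references in these lists is just hexadecimal versus decimal. A small sketch, with describeCodePoint as a made-up helper name:

```javascript
// Hypothetical helper: given a code point, return the character itself,
// its U+ notation, and the decimal and hexadecimal HTML character references.
function describeCodePoint(codePoint) {
  const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
  return {
    char: String.fromCodePoint(codePoint),
    notation: "U+" + hex,
    decimalEntity: "&#" + codePoint + ";",
    hexEntity: "&#x" + hex + ";",
  };
}

console.log(describeCodePoint(0x1f601));
console.log(describeCodePoint(0x2713));
```

String.fromCodePoint is used rather than String.fromCharCode because the emoji range lies above U+FFFF.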
I've used the magnifying glass icon as the body of an anchor linking to a cool interactive page for data analysis, which allowed a user to pair arbitrary data selections, much like this example.
🔎
Because it was a link, the default underline somewhat obscured the Unicode glyph. That effect was negligible for our internal tool, but it might be suboptimal for something public-facing.