How do I decode HTML entities in Swift? - json

I am pulling a JSON file from a site and one of the strings received is:
The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi
How can I convert things like &#8216 into the correct characters?
I've made a Xcode Playground to demonstrate it:
import UIKit
var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)
let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary
var a = dataDictionary["posts"] as NSArray
println(a[0]["title"])

This answer was last revised for Swift 5.2 and iOS 13.4 SDK.
There's no straightforward way to do that, but you can use NSAttributedString magic to make this process as painless as possible (be warned that this method will strip all HTML tags as well).
Remember to initialize NSAttributedString from main thread only. It uses WebKit to parse HTML underneath, thus the requirement.
// This is a[0]["title"] in your case
let htmlEncodedString = "The Weeknd <em>‘King Of The Fall’</em>"
guard let data = htmlEncodedString.data(using: .utf8) else {
return
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return
}
// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string
extension String {
init?(htmlEncodedString: String) {
guard let data = htmlEncodedString.data(using: .utf8) else {
return nil
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
}
let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
let decodedString = String(htmlEncodedString: encodedString)

#akashivskyy's answer is great and demonstrates how to utilize NSAttributedString to decode HTML entities. One possible disadvantage
(as he stated) is that all HTML markup is removed as well, so
<strong> 4 < 5 & 3 > 2</strong>
becomes
4 < 5 & 3 > 2
On OS X there is CFXMLCreateStringByUnescapingEntities() which does the job:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. # "
let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded, nil) as String
println(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. #
but this is not available on iOS.
Here is a pure Swift implementation. It decodes character entities
references like < using a dictionary, and all numeric character
entities like &#64 or &#x20ac. (Note that I did not list all
252 HTML entities explicitly.)
Swift 4:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ Substring : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : Substring, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : Substring) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self[position...].range(of: "&") {
result.append(contentsOf: self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
guard let semiRange = self[position...].range(of: ";") else {
// No matching ';'.
break
}
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(contentsOf: entity)
}
}
// Copy remaining characters to `result`:
result.append(contentsOf: self[position...])
return result
}
}
Example:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. # "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. #
Swift 3:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : String, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.range(of: "&", range: position ..< endIndex) {
result.append(self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.range(of: ";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.append(self[position ..< endIndex])
return result
}
}
Swift 2:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(string : String, base : Int32) -> Character? {
let code = UInt32(strtoul(string, nil, base))
return Character(UnicodeScalar(code))
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
result.appendContentsOf(self[position ..< ampRange.startIndex])
position = ampRange.startIndex
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.endIndex]
position = semiRange.endIndex
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.appendContentsOf(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.appendContentsOf(self[position ..< endIndex])
return result
}
}

Swift 4
String extension computed variable
Without extra guard, do, catch, etc...
Returns the original strings if decoding fails
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}

Swift 3 version of #akashivskyy's extension,
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}

Swift 2 version of #akashivskyy's extension,
extension String {
init(htmlEncodedString: String) {
if let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding){
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
do{
if let attributedString:NSAttributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil){
self.init(attributedString.string)
}else{
print("error")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}catch{
print("error: \(error)")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}else{
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}
}

I was looking for a pure Swift 3.0 utility to escape to/unescape from HTML character references (i.e. for server-side Swift apps on both macOS and Linux) but didn't find any comprehensive solutions, so I wrote my own implementation: https://github.com/IBM-Swift/swift-html-entities
The package, HTMLEntities, works with HTML4 named character references as well as hex/dec numeric character references, and it will recognize special numeric character references per the W3 HTML5 spec (i.e. € should be unescaped as the Euro sign (unicode U+20AC) and NOT as the unicode character for U+0080, and certain ranges of numeric character references should be replaced with the replacement character U+FFFD when unescaping).
Usage example:
import HTMLEntities
// encode example
let html = "<script>alert(\"abc\")</script>"
print(html.htmlEscape())
// Prints ”<script>alert("abc")</script>"
// decode example
let htmlencoded = "<script>alert("abc")</script>"
print(htmlencoded.htmlUnescape())
// Prints ”<script>alert(\"abc\")</script>"
And for OP's example:
print("The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi "
Edit: HTMLEntities now supports HTML5 named character references as of version 2.0.0. Spec-compliant parsing is also implemented.

Swift 4 Version
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}

extension String{
func decodeEnt() -> String{
let encodedData = self.dataUsingEncoding(NSUTF8StringEncoding)!
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
return attributedString.string
}
}
let encodedString = "The Weeknd ‘King Of The Fall’"
let foo = encodedString.decodeEnt() /* The Weeknd ‘King Of The Fall’ */

Swift 4:
The total solution that finally worked for me with HTML code and newline characters and single quotes
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
Usage:
let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded
I then had to apply some more filters to get rid of single quotes (for example, don't, hasn't, It's, etc.), and new line characters like \n:
var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)

This would be my approach. You could add the entities dictionary from https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555 Michael Waterfall mentions.
extension String {
func htmlDecoded()->String {
guard (self != "") else { return self }
var newStr = self
let entities = [
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
]
for (name,value) in entities {
newStr = newStr.stringByReplacingOccurrencesOfString(name, withString: value)
}
return newStr
}
}
Examples used:
let encoded = "this is so "good""
let decoded = encoded.htmlDecoded() // "this is so "good""
OR
let encoded = "this is so "good"".htmlDecoded() // "this is so "good""

Elegant Swift 4 Solution
If you want a string,
myString = String(htmlString: encodedString)
add this extension to your project:
extension String {
init(htmlString: String) {
self.init()
guard let encodedData = htmlString.data(using: .utf8) else {
self = htmlString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error.localizedDescription)")
self = htmlString
}
}
}
If you want an NSAttributedString with bold, italic, links, etc.,
textField.attributedText = try? NSAttributedString(htmlString: encodedString)
add this extension to your project:
extension NSAttributedString {
convenience init(htmlString html: String) throws {
try self.init(data: Data(html.utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil)
}
}

Swift 5.1 Version
import UIKit
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
Also, if you want to extract date, images, metadata, title and description, you can use my pod named:
.
Readability kit

Swift 4
I really like the solution using documentAttributes. However, it is may too slow for parsing files and/or usage in table view cells. I can't believe that Apple does not provide a decent solution for this.
As a workaround, I found this String Extension on GitHub which works perfectly and is fast for decoding.
So for situations in which the given answer is to slow, see the solution suggest in this link:
https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555
Note: it does not parse HTML tags.

Computed var version of #yishus' answer
public extension String {
/// Decodes string with HTML encoding.
var htmlDecoded: String {
guard let encodedData = self.data(using: .utf8) else { return self }
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
return attributedString.string
} catch {
print("Error: \(error)")
return self
}
}
}

Swift 4
func decodeHTML(string: String) -> String? {
var decodedString: String?
if let encodedData = string.data(using: .utf8) {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
} catch {
print("\(error.localizedDescription)")
}
}
return decodedString
}

Swift 4.1 +
var htmlDecoded: String {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey.documentType : NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey.characterEncoding : String.Encoding.utf8.rawValue
]
let decoded = try? NSAttributedString(data: Data(utf8), options: attributedOptions
, documentAttributes: nil).string
return decoded ?? self
}

Swift 4
extension String {
var replacingHTMLEntities: String? {
do {
return try NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
} catch {
return nil
}
}
}
Simple Usage
let clean = "Weeknd ‘King Of The Fall&#8217".replacingHTMLEntities ?? "default value"

Updated answer working on Swift 3
extension String {
init?(htmlEncodedString: String) {
let encodedData = htmlEncodedString.data(using: String.Encoding.utf8)!
let attributedOptions = [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}

Have a look at HTMLString - a library written in Swift that allows your program to add and remove HTML entities in Strings
For completeness, I copied the main features from the site:
Adds entities for ASCII and UTF-8/UTF-16 encodings
Removes more than 2100 named entities (like &)
Supports removing decimal and hexadecimal entities
Designed to support Swift Extended Grapheme Clusters (→ 100% emoji-proof)
Fully unit tested
Fast
Documented
Compatible with Objective-C

Objective-C
+(NSString *) decodeHTMLEnocdedString:(NSString *)htmlEncodedString {
if (!htmlEncodedString) {
return nil;
}
NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *attributes = #{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: #(NSUTF8StringEncoding)};
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:attributes documentAttributes:nil error:nil];
return [attributedString string];
}

Swift 3.0 version with actual font size conversion
Normally, if you directly convert HTML content to an attributed string, the font size is increased. You can try to convert an HTML string to an attributed string and back again to see the difference.
Instead, here is the actual size conversion that makes sure the font size does not change, by applying the 0.75 ratio on all fonts:
extension String {
func htmlAttributedString() -> NSAttributedString? {
guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
guard let attriStr = try? NSMutableAttributedString(
data: data,
options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
documentAttributes: nil) else { return nil }
attriStr.beginEditing()
attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
(value, range, stop) in
if let font = value as? UIFont {
let resizedFont = font.withSize(font.pointSize * 0.75)
attriStr.addAttribute(NSFontAttributeName,
value: resizedFont,
range: range)
}
}
attriStr.endEditing()
return attriStr
}
}

Swift 4
extension String {
mutating func toHtmlEncodedString() {
guard let encodedData = self.data(using: .utf8) else {
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
}
}

Use:
NSData dataRes = (nsdata value )
var resString = NSString(data: dataRes, encoding: NSUTF8StringEncoding)

Related

iOS 11 changes font family while converting HTML to NSAttributedString

I have a string extension that converts HTML to an NSAttributedString, but i add my own custom font to the HTML string from an API before formatting in the following way.
let testString = "<span style="font-family: MuseoSans-900"> This is an html String <i> with italics </i> and a custom font </span>"
guard let htmlData = testString.data(using: .utf8) else { return nil }
do {
let attrString = try NSAttributedString(data: htmlData, options: [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue,
], documentAttributes: nil)
let formattedHTMLString = NSMutableAttributedString(attributedString: HTMLString)
formattedHTMLString.beginEditing()
formattedHTMLString.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, formattedHTMLString.length), options: []){
(_ value: Any?, _ range: NSRange, _ stop: UnsafeMutablePointer) -> Void in
if value != nil {
var oldFont = value as? UIFont
print("\(oldFont?.fontName)")
}
}
return formattedHTMLString
} catch let error as NSError {
print(error.localizedDescription)
return nil
}
On iOS 10 < the attributed string is returned with the entire string having the same font (in this case MuseoSans-900) including the italized part of the html string but on iOS 11 it updates and changes the font to MuseoSans-300 for the italics part of the html string and keeps the rest of the string MuseoSans-900 during the conversion to NSAttributed string..I do not know why this is happening and any help would be greatly appreciated.

Display attributed text in textview in swift 3?

I want to display text in Italic ,bold which is received from server .
< p > < b > < i > hello < /i > < /b > < i >world< /i >< /p > so,
responseObj!["text"] = "<p><b><i>hello</i></b><i>world</></p>"
if let postText:String = responseObj!["text"] as? String{
let str = try NSAttributedString(data: postText.data(using: String.Encoding.unicode, allowLossyConversion: true)!, options: [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType], documentAttributes: nil)
self.textView.attributedText = str
}
when i am adding like this means my text displays normal without applying bold & italic . I want text to be displayed in bold ,italic .
EDIT: Updated Swift 4
Create an extension for String:
extension String {
func htmlAttributedString(fontSize: CGFloat = 17.0) -> NSAttributedString? {
let fontName = UIFont.systemFont(ofSize: fontSize).fontName
let string = self.appending(String(format: "<style>body{font-family: '%#'; font-size:%fpx;}</style>", fontName, fontSize))
guard let data = string.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
guard let html = try? NSMutableAttributedString (
data: data,
options: [NSAttributedString.DocumentReadingOptionKey.documentType: NSAttributedString.DocumentType.html],
documentAttributes: nil) else { return nil }
return html
}
}
Invoke it anywhere in your project:
text.htmlAttributedString() // will set fontSize with 17.0 which is the default value
text.htmlAttributedString(fontSize: 14.0) // pass your required fontSize
Note, that let string = self.appending(String(format: "<style>body{font-family: '%#'; font-size:%fpx;}</style>", fontName, fontSize)) is used to keep the string font as same as the font used in default iOS platform.
To set the string on UITextView use:
textView.attributedText = textToBeConverted.htmlAttributedString() ?? ""
RESULT:
Assign Attributed text to UITextView from below function,
//HTML to Attributed text
func stringFromHtml(string: String) -> NSAttributedString? {
do {
let data = string.data(using: String.Encoding.utf8, allowLossyConversion: true)
if let d = data {
let str = try NSAttributedString(data: d,
options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
documentAttributes: nil)
return str
}
} catch {
}
return nil
}
//This will give you attributed text which you have to assign to your UITextView. and whatever string having html tag that you have to pass inside this function.
//Below code will conver attributed text to HTML Text
//Attributed to HTML text
let attrStr = self.txtView.attributedText
let documentAttributes = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
do {
let htmlData = try attrStr?.data(from: NSMakeRange(0, (attrStr?.length)!), documentAttributes:documentAttributes)
if let htmlString = String(data:htmlData!, encoding:String.Encoding.utf8) {
print(htmlString)
}
}
catch {
print("error creating HTML from Attributed String")
}

Convert all HTML entities to their applicable characters in SWIFT 3 [duplicate]

I am pulling a JSON file from a site and one of the strings received is:
The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi
How can I convert things like &#8216 into the correct characters?
I've made a Xcode Playground to demonstrate it:
import UIKit
var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)
let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary
var a = dataDictionary["posts"] as NSArray
println(a[0]["title"])
This answer was last revised for Swift 5.2 and iOS 13.4 SDK.
There's no straightforward way to do that, but you can use NSAttributedString magic to make this process as painless as possible (be warned that this method will strip all HTML tags as well).
Remember to initialize NSAttributedString from main thread only. It uses WebKit to parse HTML underneath, thus the requirement.
// This is a[0]["title"] in your case
let htmlEncodedString = "The Weeknd <em>‘King Of The Fall’</em>"
guard let data = htmlEncodedString.data(using: .utf8) else {
return
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return
}
// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string
extension String {
init?(htmlEncodedString: String) {
guard let data = htmlEncodedString.data(using: .utf8) else {
return nil
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
}
let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
let decodedString = String(htmlEncodedString: encodedString)
#akashivskyy's answer is great and demonstrates how to utilize NSAttributedString to decode HTML entities. One possible disadvantage
(as he stated) is that all HTML markup is removed as well, so
<strong> 4 < 5 & 3 > 2</strong>
becomes
4 < 5 & 3 > 2
On OS X there is CFXMLCreateStringByUnescapingEntities() which does the job:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. # "
let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded, nil) as String
println(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. #
but this is not available on iOS.
Here is a pure Swift implementation. It decodes character entities
references like < using a dictionary, and all numeric character
entities like &#64 or &#x20ac. (Note that I did not list all
252 HTML entities explicitly.)
Swift 4:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ Substring : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : Substring, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : Substring) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self[position...].range(of: "&") {
result.append(contentsOf: self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
guard let semiRange = self[position...].range(of: ";") else {
// No matching ';'.
break
}
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(contentsOf: entity)
}
}
// Copy remaining characters to `result`:
result.append(contentsOf: self[position...])
return result
}
}
Example:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. # "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. #
Swift 3:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : String, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.range(of: "&", range: position ..< endIndex) {
result.append(self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.range(of: ";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.append(self[position ..< endIndex])
return result
}
}
Swift 2:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(string : String, base : Int32) -> Character? {
let code = UInt32(strtoul(string, nil, base))
return Character(UnicodeScalar(code))
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
result.appendContentsOf(self[position ..< ampRange.startIndex])
position = ampRange.startIndex
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.endIndex]
position = semiRange.endIndex
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.appendContentsOf(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.appendContentsOf(self[position ..< endIndex])
return result
}
}
Swift 4
String extension computed variable
Without extra guard, do, catch, etc...
Returns the original strings if decoding fails
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
Swift 3 version of #akashivskyy's extension,
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
Swift 2 version of #akashivskyy's extension,
extension String {
init(htmlEncodedString: String) {
if let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding){
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
do{
if let attributedString:NSAttributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil){
self.init(attributedString.string)
}else{
print("error")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}catch{
print("error: \(error)")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}else{
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}
}
I was looking for a pure Swift 3.0 utility to escape to/unescape from HTML character references (i.e. for server-side Swift apps on both macOS and Linux) but didn't find any comprehensive solutions, so I wrote my own implementation: https://github.com/IBM-Swift/swift-html-entities
The package, HTMLEntities, works with HTML4 named character references as well as hex/dec numeric character references, and it will recognize special numeric character references per the W3 HTML5 spec (i.e. € should be unescaped as the Euro sign (unicode U+20AC) and NOT as the unicode character for U+0080, and certain ranges of numeric character references should be replaced with the replacement character U+FFFD when unescaping).
Usage example:
import HTMLEntities
// encode example
let html = "<script>alert(\"abc\")</script>"
print(html.htmlEscape())
// Prints ”<script>alert("abc")</script>"
// decode example
let htmlencoded = "<script>alert("abc")</script>"
print(htmlencoded.htmlUnescape())
// Prints ”<script>alert(\"abc\")</script>"
And for OP's example:
print("The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi "
Edit: HTMLEntities now supports HTML5 named character references as of version 2.0.0. Spec-compliant parsing is also implemented.
Swift 4 Version
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
extension String{
func decodeEnt() -> String{
let encodedData = self.dataUsingEncoding(NSUTF8StringEncoding)!
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
return attributedString.string
}
}
let encodedString = "The Weeknd ‘King Of The Fall’"
let foo = encodedString.decodeEnt() /* The Weeknd ‘King Of The Fall’ */
Swift 4:
The total solution that finally worked for me with HTML code and newline characters and single quotes
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
Usage:
let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded
I then had to apply some more filters to get rid of single quotes (for example, don't, hasn't, It's, etc.), and new line characters like \n:
var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)
This would be my approach. You could add the entities dictionary from https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555 Michael Waterfall mentions.
extension String {
func htmlDecoded()->String {
guard (self != "") else { return self }
var newStr = self
let entities = [
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
]
for (name,value) in entities {
newStr = newStr.stringByReplacingOccurrencesOfString(name, withString: value)
}
return newStr
}
}
Examples used:
let encoded = "this is so "good""
let decoded = encoded.htmlDecoded() // "this is so "good""
OR
let encoded = "this is so "good"".htmlDecoded() // "this is so "good""
Elegant Swift 4 Solution
If you want a string,
myString = String(htmlString: encodedString)
add this extension to your project:
extension String {
init(htmlString: String) {
self.init()
guard let encodedData = htmlString.data(using: .utf8) else {
self = htmlString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error.localizedDescription)")
self = htmlString
}
}
}
If you want an NSAttributedString with bold, italic, links, etc.,
textField.attributedText = try? NSAttributedString(htmlString: encodedString)
add this extension to your project:
extension NSAttributedString {
convenience init(htmlString html: String) throws {
try self.init(data: Data(html.utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil)
}
}
Swift 5.1 Version
import UIKit
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
Also, if you want to extract date, images, metadata, title and description, you can use my pod named:
.
Readability kit
Swift 4
I really like the solution using documentAttributes. However, it is may too slow for parsing files and/or usage in table view cells. I can't believe that Apple does not provide a decent solution for this.
As a workaround, I found this String Extension on GitHub which works perfectly and is fast for decoding.
So for situations in which the given answer is to slow, see the solution suggest in this link:
https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555
Note: it does not parse HTML tags.
Computed var version of #yishus' answer
public extension String {
/// Decodes string with HTML encoding.
var htmlDecoded: String {
guard let encodedData = self.data(using: .utf8) else { return self }
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
return attributedString.string
} catch {
print("Error: \(error)")
return self
}
}
}
Swift 4
func decodeHTML(string: String) -> String? {
var decodedString: String?
if let encodedData = string.data(using: .utf8) {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
} catch {
print("\(error.localizedDescription)")
}
}
return decodedString
}
Swift 4.1 +
var htmlDecoded: String {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey.documentType : NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey.characterEncoding : String.Encoding.utf8.rawValue
]
let decoded = try? NSAttributedString(data: Data(utf8), options: attributedOptions
, documentAttributes: nil).string
return decoded ?? self
}
Swift 4
extension String {
var replacingHTMLEntities: String? {
do {
return try NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
} catch {
return nil
}
}
}
Simple Usage
let clean = "Weeknd ‘King Of The Fall&#8217".replacingHTMLEntities ?? "default value"
Updated answer working on Swift 3
extension String {
init?(htmlEncodedString: String) {
let encodedData = htmlEncodedString.data(using: String.Encoding.utf8)!
let attributedOptions = [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
Have a look at HTMLString - a library written in Swift that allows your program to add and remove HTML entities in Strings
For completeness, I copied the main features from the site:
Adds entities for ASCII and UTF-8/UTF-16 encodings
Removes more than 2100 named entities (like &)
Supports removing decimal and hexadecimal entities
Designed to support Swift Extended Grapheme Clusters (→ 100% emoji-proof)
Fully unit tested
Fast
Documented
Compatible with Objective-C
Objective-C
+(NSString *) decodeHTMLEnocdedString:(NSString *)htmlEncodedString {
if (!htmlEncodedString) {
return nil;
}
NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *attributes = #{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: #(NSUTF8StringEncoding)};
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:attributes documentAttributes:nil error:nil];
return [attributedString string];
}
Swift 3.0 version with actual font size conversion
Normally, if you directly convert HTML content to an attributed string, the font size is increased. You can try to convert an HTML string to an attributed string and back again to see the difference.
Instead, here is the actual size conversion that makes sure the font size does not change, by applying the 0.75 ratio on all fonts:
extension String {
func htmlAttributedString() -> NSAttributedString? {
guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
guard let attriStr = try? NSMutableAttributedString(
data: data,
options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
documentAttributes: nil) else { return nil }
attriStr.beginEditing()
attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
(value, range, stop) in
if let font = value as? UIFont {
let resizedFont = font.withSize(font.pointSize * 0.75)
attriStr.addAttribute(NSFontAttributeName,
value: resizedFont,
range: range)
}
}
attriStr.endEditing()
return attriStr
}
}
Swift 4
extension String {
mutating func toHtmlEncodedString() {
guard let encodedData = self.data(using: .utf8) else {
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
}
}
Use:
NSData dataRes = (nsdata value )
var resString = NSString(data: dataRes, encoding: NSUTF8StringEncoding)

Replace html characters in Swift without lag [duplicate]

I am pulling a JSON file from a site and one of the strings received is:
The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi
How can I convert things like &#8216 into the correct characters?
I've made a Xcode Playground to demonstrate it:
import UIKit
var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)
let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary
var a = dataDictionary["posts"] as NSArray
println(a[0]["title"])
This answer was last revised for Swift 5.2 and iOS 13.4 SDK.
There's no straightforward way to do that, but you can use NSAttributedString magic to make this process as painless as possible (be warned that this method will strip all HTML tags as well).
Remember to initialize NSAttributedString from main thread only. It uses WebKit to parse HTML underneath, thus the requirement.
// This is a[0]["title"] in your case
let htmlEncodedString = "The Weeknd <em>‘King Of The Fall’</em>"
guard let data = htmlEncodedString.data(using: .utf8) else {
return
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return
}
// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string
extension String {
init?(htmlEncodedString: String) {
guard let data = htmlEncodedString.data(using: .utf8) else {
return nil
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
}
let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
let decodedString = String(htmlEncodedString: encodedString)
#akashivskyy's answer is great and demonstrates how to utilize NSAttributedString to decode HTML entities. One possible disadvantage
(as he stated) is that all HTML markup is removed as well, so
<strong> 4 < 5 & 3 > 2</strong>
becomes
4 < 5 & 3 > 2
On OS X there is CFXMLCreateStringByUnescapingEntities() which does the job:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. # "
let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded, nil) as String
println(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. #
but this is not available on iOS.
Here is a pure Swift implementation. It decodes character entities
references like < using a dictionary, and all numeric character
entities like &#64 or &#x20ac. (Note that I did not list all
252 HTML entities explicitly.)
Swift 4:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ Substring : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : Substring, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : Substring) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self[position...].range(of: "&") {
result.append(contentsOf: self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
guard let semiRange = self[position...].range(of: ";") else {
// No matching ';'.
break
}
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(contentsOf: entity)
}
}
// Copy remaining characters to `result`:
result.append(contentsOf: self[position...])
return result
}
}
Example:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. # "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. #
Swift 3:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : String, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.range(of: "&", range: position ..< endIndex) {
result.append(self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.range(of: ";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.append(self[position ..< endIndex])
return result
}
}
Swift 2:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "#"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(string : String, base : Int32) -> Character? {
let code = UInt32(strtoul(string, nil, base))
return Character(UnicodeScalar(code))
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("#") --> "#"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
result.appendContentsOf(self[position ..< ampRange.startIndex])
position = ampRange.startIndex
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.endIndex]
position = semiRange.endIndex
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.appendContentsOf(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.appendContentsOf(self[position ..< endIndex])
return result
}
}
Swift 4
String extension computed variable
Without extra guard, do, catch, etc...
Returns the original strings if decoding fails
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
Swift 3 version of #akashivskyy's extension,
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
Swift 2 version of #akashivskyy's extension,
extension String {
init(htmlEncodedString: String) {
if let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding){
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
do{
if let attributedString:NSAttributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil){
self.init(attributedString.string)
}else{
print("error")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}catch{
print("error: \(error)")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}else{
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}
}
I was looking for a pure Swift 3.0 utility to escape to/unescape from HTML character references (i.e. for server-side Swift apps on both macOS and Linux) but didn't find any comprehensive solutions, so I wrote my own implementation: https://github.com/IBM-Swift/swift-html-entities
The package, HTMLEntities, works with HTML4 named character references as well as hex/dec numeric character references, and it will recognize special numeric character references per the W3 HTML5 spec (i.e. € should be unescaped as the Euro sign (unicode U+20AC) and NOT as the unicode character for U+0080, and certain ranges of numeric character references should be replaced with the replacement character U+FFFD when unescaping).
Usage example:
import HTMLEntities
// encode example
let html = "<script>alert(\"abc\")</script>"
print(html.htmlEscape())
// Prints ”<script>alert("abc")</script>"
// decode example
let htmlencoded = "<script>alert("abc")</script>"
print(htmlencoded.htmlUnescape())
// Prints ”<script>alert(\"abc\")</script>"
And for OP's example:
print("The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | #TheWeeknd | #SoPhi "
Edit: HTMLEntities now supports HTML5 named character references as of version 2.0.0. Spec-compliant parsing is also implemented.
Swift 4 Version
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
extension String{
func decodeEnt() -> String{
let encodedData = self.dataUsingEncoding(NSUTF8StringEncoding)!
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
return attributedString.string
}
}
let encodedString = "The Weeknd ‘King Of The Fall’"
let foo = encodedString.decodeEnt() /* The Weeknd ‘King Of The Fall’ */
Swift 4:
The total solution that finally worked for me with HTML code and newline characters and single quotes
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
Usage:
let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded
I then had to apply some more filters to get rid of single quotes (for example, don't, hasn't, It's, etc.), and new line characters like \n:
var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)
This would be my approach. You could add the entities dictionary from https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555 Michael Waterfall mentions.
extension String {
func htmlDecoded()->String {
guard (self != "") else { return self }
var newStr = self
let entities = [
""" : "\"",
"&" : "&",
"&apos;" : "'",
"<" : "<",
">" : ">",
]
for (name,value) in entities {
newStr = newStr.stringByReplacingOccurrencesOfString(name, withString: value)
}
return newStr
}
}
Examples used:
let encoded = "this is so "good""
let decoded = encoded.htmlDecoded() // "this is so "good""
OR
let encoded = "this is so "good"".htmlDecoded() // "this is so "good""
Elegant Swift 4 Solution
If you want a string,
myString = String(htmlString: encodedString)
add this extension to your project:
extension String {
init(htmlString: String) {
self.init()
guard let encodedData = htmlString.data(using: .utf8) else {
self = htmlString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error.localizedDescription)")
self = htmlString
}
}
}
If you want an NSAttributedString with bold, italic, links, etc.,
textField.attributedText = try? NSAttributedString(htmlString: encodedString)
add this extension to your project:
extension NSAttributedString {
convenience init(htmlString html: String) throws {
try self.init(data: Data(html.utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil)
}
}
Swift 5.1 Version
import UIKit
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
Also, if you want to extract date, images, metadata, title and description, you can use my pod named:
.
Readability kit
Swift 4
I really like the solution using documentAttributes. However, it is may too slow for parsing files and/or usage in table view cells. I can't believe that Apple does not provide a decent solution for this.
As a workaround, I found this String Extension on GitHub which works perfectly and is fast for decoding.
So for situations in which the given answer is to slow, see the solution suggest in this link:
https://gist.github.com/mwaterfall/25b4a6a06dc3309d9555
Note: it does not parse HTML tags.
Computed var version of #yishus' answer
public extension String {
/// Decodes string with HTML encoding.
var htmlDecoded: String {
guard let encodedData = self.data(using: .utf8) else { return self }
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
return attributedString.string
} catch {
print("Error: \(error)")
return self
}
}
}
Swift 4
func decodeHTML(string: String) -> String? {
var decodedString: String?
if let encodedData = string.data(using: .utf8) {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
} catch {
print("\(error.localizedDescription)")
}
}
return decodedString
}
Swift 4.1 +
var htmlDecoded: String {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey.documentType : NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey.characterEncoding : String.Encoding.utf8.rawValue
]
let decoded = try? NSAttributedString(data: Data(utf8), options: attributedOptions
, documentAttributes: nil).string
return decoded ?? self
}
Swift 4
extension String {
var replacingHTMLEntities: String? {
do {
return try NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
} catch {
return nil
}
}
}
Simple Usage
let clean = "Weeknd ‘King Of The Fall&#8217".replacingHTMLEntities ?? "default value"
Updated answer working on Swift 3
extension String {
init?(htmlEncodedString: String) {
let encodedData = htmlEncodedString.data(using: String.Encoding.utf8)!
let attributedOptions = [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
Have a look at HTMLString - a library written in Swift that allows your program to add and remove HTML entities in Strings
For completeness, I copied the main features from the site:
Adds entities for ASCII and UTF-8/UTF-16 encodings
Removes more than 2100 named entities (like &)
Supports removing decimal and hexadecimal entities
Designed to support Swift Extended Grapheme Clusters (→ 100% emoji-proof)
Fully unit tested
Fast
Documented
Compatible with Objective-C
Objective-C
+(NSString *) decodeHTMLEnocdedString:(NSString *)htmlEncodedString {
if (!htmlEncodedString) {
return nil;
}
NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *attributes = #{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: #(NSUTF8StringEncoding)};
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:attributes documentAttributes:nil error:nil];
return [attributedString string];
}
Swift 3.0 version with actual font size conversion
Normally, if you directly convert HTML content to an attributed string, the font size is increased. You can try to convert an HTML string to an attributed string and back again to see the difference.
Instead, here is the actual size conversion that makes sure the font size does not change, by applying the 0.75 ratio on all fonts:
extension String {
func htmlAttributedString() -> NSAttributedString? {
guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
guard let attriStr = try? NSMutableAttributedString(
data: data,
options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
documentAttributes: nil) else { return nil }
attriStr.beginEditing()
attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
(value, range, stop) in
if let font = value as? UIFont {
let resizedFont = font.withSize(font.pointSize * 0.75)
attriStr.addAttribute(NSFontAttributeName,
value: resizedFont,
range: range)
}
}
attriStr.endEditing()
return attriStr
}
}
Swift 4
extension String {
mutating func toHtmlEncodedString() {
guard let encodedData = self.data(using: .utf8) else {
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
}
catch {
print("Error: \(error)")
}
}
Use:
NSData dataRes = (nsdata value )
var resString = NSString(data: dataRes, encoding: NSUTF8StringEncoding)

HTML string to formatted text in Swift

Hello I have a string that I want to write to a text document in Swift. Because the text document has line breaks, however, the string seems to be in HTML format.
Here is what I want to do:
//Covert str1 to str2
var str1 = "1<div>2</div><div>3</div><div>4</div>"
var str2 = "1
2
3
4" //Text document will have text that looks like this (With line breaks)
You can try with NSAttributedString:
let htmlString = "1<div>2</div><div>3</div><div>4</div>"
NSAttributedString(data: htmlString.dataUsingEncoding(NSUnicodeStringEncoding)!,
options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
documentAttributes: nil,
error: nil);
this will produce:
"1\n2\n3\n4\n"
Swift 3.0: You can pop this directly into Xcode's playground to check it out.
let htmlString = "1<div>2</div><div>3</div><div>4</div>"
func getHtmlFormat(fromString htmlString: String) -> NSAttributedString? {
guard let htmlData = htmlString.data(using: String.Encoding.unicode) else
{
print("Could not get htmlData")
return nil
}
do {
return try NSAttributedString(data: htmlData, options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType], documentAttributes: nil)
} catch let error {
print(error)
return nil
}
}
let textView = UITextView(frame: CGRect(x: 0, y: 0, width: 500, height: 500))
textView.attributedText = getHtmlFormat(fromString: htmlString)