Need a function for converting Slack Markdown to HTML

Is there an available package that can convert Slack Block Kit to HTML?
Or if someone has a function for this, could you please help?

If anyone is looking for something similar, here's the solution:
function slackMarkdownToHtml(markdown) {
  // Replace asterisks with bold tags
  let html = markdown.replace(/\*(.+?)\*/g, '<b>$1</b>');
  // Replace underscores with italic tags
  html = html.replace(/_(.+?)_/g, '<i>$1</i>');
  // Replace tildes with strike-through tags
  html = html.replace(/~(.+?)~/g, '<s>$1</s>');
  // Wrap each run of consecutive list lines in one <ul>/<ol>, one <li> per line
  const wrapList = (tag, marker) => (block) =>
    `<${tag}>` +
    block.trimEnd().split('\n').map((line) => '<li>' + line.replace(marker, '') + '</li>').join('') +
    `</${tag}>` + (block.endsWith('\n') ? '\n' : '');
  // Replace "- item" lines with an unordered list
  html = html.replace(/(?:^- .*(?:\n|$))+/gm, wrapList('ul', /^- /));
  // Replace "1. item" lines with an ordered list
  html = html.replace(/(?:^\d+\. .*(?:\n|$))+/gm, wrapList('ol', /^\d+\. /));
  // Replace Slack's link syntax <url|text> with anchor tags
  html = html.replace(/<(https?:\/\/[^|>]+)\|([^>]+)>/g, '<a href="$1">$2</a>');
  // Also accept Markdown-style [text](url) links
  html = html.replace(/\[(.+?)\]\((.+?)\)/g, '<a href="$2">$1</a>');
  return html;
}
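The question also asks about Block Kit. A Block Kit message is a JSON array of blocks rather than a single mrkdwn string, so a converter has to walk the blocks and send only the `mrkdwn` text objects through a function like the one above. A minimal sketch, assuming only `header`, `section` (its `text` field only, not `fields` or accessories), `divider` and `context` blocks need support. The function names and the HTML tag chosen for each block are illustrative, and the mrkdwn converter is passed in as a parameter:

```javascript
// Escape text that must not be interpreted as HTML (Block Kit plain_text)
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Render one Block Kit text object; mrkdwn text goes through the supplied converter
function renderText(textObj, mrkdwnToHtml) {
  if (!textObj) return '';
  return textObj.type === 'mrkdwn' ? mrkdwnToHtml(textObj.text) : escapeHtml(textObj.text);
}

// Convert an array of Block Kit blocks to an HTML string.
// Only header, section (its text field), divider and context blocks are handled;
// any other block type is skipped. The HTML tag used for each block is arbitrary.
function blocksToHtml(blocks, mrkdwnToHtml) {
  return blocks
    .map((block) => {
      switch (block.type) {
        case 'header':
          return '<h3>' + renderText(block.text, mrkdwnToHtml) + '</h3>';
        case 'section':
          return '<p>' + renderText(block.text, mrkdwnToHtml) + '</p>';
        case 'divider':
          return '<hr>';
        case 'context':
          return '<small>' + (block.elements || [])
            .filter((el) => el.type === 'mrkdwn' || el.type === 'plain_text')
            .map((el) => renderText(el, mrkdwnToHtml))
            .join(' ') + '</small>';
        default:
          return '';
      }
    })
    .filter((html) => html !== '')
    .join('\n');
}
```

Usage would look like blocksToHtml(message.blocks, slackMarkdownToHtml). Passing the converter in keeps the block walker independent of how mrkdwn itself is translated.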
Also, the reverse: HTML to Slack Markdown
function htmlToSlackMarkdown(html) {
  // Replace <br> tags with newline characters
  let markdown = html.replace(/<br\s*\/?>/gi, '\n');
  // Replace bold tags with asterisks
  markdown = markdown.replace(/<\/?(b|strong)>/gi, '*');
  // Replace italic tags with underscores
  markdown = markdown.replace(/<\/?(i|em)>/gi, '_');
  // Replace strike-through tags with tildes
  markdown = markdown.replace(/<\/?(s|strike|del)>/gi, '~');
  // Replace each ordered list with numbered lines, counting from 1 in every list
  markdown = markdown.replace(/<ol>([\s\S]*?)<\/ol>/gi, (_, items) => {
    let n = 0;
    return items.replace(/\s*<li>([\s\S]*?)<\/li>\s*/gi, (_m, text) => `${++n}. ${text}\n`);
  });
  // Replace each unordered list with "- " lines
  markdown = markdown.replace(/<ul>([\s\S]*?)<\/ul>/gi, (_, items) =>
    items.replace(/\s*<li>([\s\S]*?)<\/li>\s*/gi, '- $1\n'));
  // Replace anchor tags with Slack's link syntax <url|text>
  markdown = markdown.replace(/<a href="(.+?)">(.+?)<\/a>/g, '<$1|$2>');
  return markdown;
}
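One detail neither function handles: Slack treats &, < and > as control characters in message text, and its formatting documentation asks for literal occurrences to be sent as &amp;, &lt; and &gt;. A small sketch of both directions (the function names are illustrative). The escaping belongs on plain text before link syntax such as <url|text> is added, otherwise the link brackets get escaped too:

```javascript
// Escape the three characters Slack treats as control characters.
// '&' must be escaped first so the entities added afterwards are not re-escaped.
function escapeForSlack(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Reverse the escaping when reading text back out of a Slack message.
// '&amp;' is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
function unescapeFromSlack(text) {
  return text.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
}
```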

Related

Turn a file name into a clickable list

I need to turn the names of the files in a folder into clickable text. Right now, the file name is on one line and the link is on another.
What is this called? Which keywords should I use?
import os

html = '<html><body>'
subset = []
lastFile = None
for file in os.listdir():
    if file.endswith(".html"):
        subset.append(file)
for r in subset:
    if not lastFile:
        html += '<h3>%s</h3>' % r
        html += '<a href="%s">%s</a>' % (r, r)
You can just wrap the <h3> tag in an anchor tag. Using your code, do something like this:
import os

html = '<html><body>'
subset = []
lastFile = None
for file in os.listdir():
    if file.endswith(".html"):
        subset.append(file)
for r in subset:
    if not lastFile:
        html += '<a href="%s">' % r
        html += '<h3>%s</h3></a>' % r
html += '</body></html>'

How do I not escape HTML special characters in QDomText?

I'm trying to create an HTML document containing JavaScript using Qt XML. Here is the relevant part of my Qt code:
QDomDocument document;
//Create head, body, etc
QDomElement script = document.createElement("script");
script.setAttribute("type", "text/javascript");
body.appendChild(script); //body is the QDomElement representing the <body> tag
QDomText scriptText = document.createTextNode("if(x < 0){\n/*do something*/\n}");
script.appendChild(scriptText);
QFile file("C:\\foo.html");
file.open(QIODevice::WriteOnly);
QTextStream stream(&file);
stream << document.toString();
The problem is that the < character in the JavaScript code is escaped and replaced with &lt;, giving the following output, which isn't valid JavaScript:
<script type="text/javascript">
if(x &lt; 0){
/*do something*/
}
</script>
I've searched the Qt documentation for a solution, but haven't found anything.
A workaround could be to replace &lt; with < when writing the file, using stream << document.toString().replace("&lt;", "<"), but there might also be occurrences of &lt; outside the JavaScript code that I want to leave alone.
I can also think of a few JavaScript tricks to check whether a number is negative without using any special HTML characters, for example if(String(x).indexOf('-') != -1), but I would like to know if there is a better way.
My question is: how do I create a QDomText object with text containing special HTML characters like <, >, &, etc. without them being escaped by QDomDocument::toString()?
You can put the JavaScript code in a CDATA section:
QDomElement script = document.createElement("script");
script.setAttribute("type", "text/javascript");
body.appendChild(script);
QString js = "if(x < 0){\n/*do something*/\n}";
QDomCDATASection data = document.createCDATASection(js);
script.appendChild(data);
then remove the unwanted text right after:
QString text = document.toString();
text.replace("<![CDATA[", "\n");
text.replace("]]>", "\n");
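The global replace above also strips CDATA markers from any other part of the output. If that matters, the unwrapping can be limited to CDATA sections that form the whole body of a <script> element. Below is a JavaScript sketch of that regex post-processing step; the function name is illustrative, and the same pattern would need porting to Qt's QRegularExpression, which is untested here:

```javascript
// Unwrap a CDATA section only when it is the entire body of a <script> element,
// leaving CDATA anywhere else in the document untouched.
function unwrapScriptCdata(markup) {
  return markup.replace(
    /(<script\b[^>]*>)\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*(<\/script>)/gi,
    (_, open, code, close) => open + '\n' + code + '\n' + close
  );
}
```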

Highlight part of a string that has HTML tags as text: AngularJS

I have:
$scope.text = "<b>TESTNAME</b>"; (This can be any string; the point is that the string can contain HTML tags written as text.)
The bold tags are part of the text and need to be displayed as text, not rendered as HTML.
Now suppose someone enters a search string (any string can be entered):
$scope.searchTerm = "NAME";
Then I want $scope.text to be modified so that I see <b>TESTNAME</b> with the substring "NAME" highlighted.
My highlight function does:
$scope.text = $scope.text.replace(new RegExp("(" + $scope.searchTerm + ")","gi"), "<span class='highlighted'>\$1</span>");
and in the HTML I wrote:
<span ng-bind-html="text"></span>
However, the problem now is that the <b> and </b> tags also get rendered as HTML and bold the string between them.
How can this be handled?
EDIT
To keep the b tags from rendering as HTML, I converted the angle brackets to their HTML entities using this:
$scope.text = $scope.text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
after the first replace function above. Now when $scope.text is rendered using ng-bind-html, the b tags are rendered only as text.
However, the span tags I added are now also rendered as text, because the angle brackets were replaced globally.
EDIT
Another approach I tried was to escape the angle brackets before adding the span tags for highlighting. So my highlight function became:
$scope.text = $scope.text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
$scope.searchTerm = $scope.searchTerm.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
$scope.text = $scope.text.replace(new RegExp("(" + $scope.searchTerm + ")","gi"), "<span class='highlighted'>\$1</span>");
However, the issue with this is that if the user searches for the string lt or gt, then because of the replacements done for < and >, the highlight spans are also inserted inside the &lt; and &gt; entities, and the result is not as expected.
Please check working example: DEMO
Controller:
var app = angular.module('plunker', ["ngSanitize"]);
app.controller('MainCtrl', function($scope) {
$scope.searchTerm = "NAME";
$scope.content = "TESTNAME";
$scope.matchClass = 'bold';
var re = new RegExp($scope.searchTerm, 'g');
$scope.content = $scope.content.replace(re, '<span class="' + $scope.matchClass + '">' + $scope.searchTerm + '</span>');
});
HTML
<body ng-controller="MainCtrl">
<p ng-bind-html="content"></p>
</body>
CSS
.bold {
font-weight: bold;
}
Edit: new solution:
$scope.text = $scope.text.replace(new RegExp("(" + $scope.searchTerm + ")","gi"), "long-random-string-one$1long-random-string-two");
// Any encoding goes here
$scope.text = $scope.text.replace(/long-random-string-one/g, "<span class='highlighted'>").replace(/long-random-string-two/g, "</span>");
The idea is to insert two strings that won't be changed by the encoding, and are unique enough that they are extremely unlikely to be present in the text you are searching. Replace them with GUIDs if you want.
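Another way to avoid both the lt/gt problem and the placeholder strings is to split the raw text on the search term first and escape each piece separately, so the escaping can never create or break a match. A sketch, where the highlighted class comes from the question and the helper names are illustrative; it also escapes the search term before building the RegExp, since characters like "." or "(" would otherwise be treated as regex syntax:

```javascript
// Escape the characters that are special in HTML text
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Escape characters that have special meaning inside a RegExp
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Highlight case-insensitive matches of term in raw text and return safe HTML
function highlight(text, term) {
  if (!term) return escapeHtml(text);
  // With a capturing group, split() keeps the matches at the odd indices
  const parts = text.split(new RegExp('(' + escapeRegExp(term) + ')', 'gi'));
  return parts
    .map((part, i) => (i % 2 === 1
      ? "<span class='highlighted'>" + escapeHtml(part) + '</span>'
      : escapeHtml(part)))
    .join('');
}
```

The result would then be assigned to $scope.text and bound with ng-bind-html as in the question.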

Loop Through HTML Elements and Nodes

I'm working on an HTML page highlighter project but ran into problems when a search term is the name of an HTML tag attribute or a class/ID name; e.g. if the search terms are "media OR class OR content", my find and replace does this:
<link href="/css/DocHighlighter.css" <span style='background-color:yellow;font-weight:bold;'>media</span>="all" rel="stylesheet" type="text/css">
<div <span style='background-color:yellow;font-weight:bold;'>class</span>="container">
I'm using Lucene for highlighting and my current code (sort of):
InputStreamReader xmlReader = new InputStreamReader(xmlConn.getInputStream(), "UTF-8");
Highlighter hl = null;
if (searchTerms != null && !searchTerms.isEmpty()) {
    QueryScorer qryScore = new QueryScorer(qp.parse(searchTerms));
    hl = new Highlighter(new SimpleHTMLFormatter(hlStart, hlEnd), qryScore);
}
if (xmlReader != null) {
    BufferedReader br = new BufferedReader(xmlReader);
    String inputLine;
    while ((inputLine = br.readLine()) != null) {
        String tmp = inputLine.trim();
        if (hl != null) {
            StringReader strReader = new StringReader(tmp);
            HTMLStripCharFilter htm = new HTMLStripCharFilter(strReader.markSupported() ? strReader : new BufferedReader(strReader));
            String tHL = hl.getBestFragment(analyzer, "", htm);
            tmp = (tHL == null ? tmp : tHL);
        }
        xmlDoc += tmp;
    }
    br.close();
}
As you can see (if you understand Lucene highlighting), this does an indiscriminate find/replace. Since my document is HTML and the search terms are chosen by users, there is no way for me to restrict parsing to certain elements or tags. Also, since the find/replace loops over the input and appends the HTML to a string (the return type of the method), I have to keep all HTML tags and values in place and in order. I've tried using Jsoup to loop through the page, but it handles the HTML tag as one big result. I also tried TagSoup to clean up the broken HTML caused by the problem, but it doesn't work correctly. Does anyone know how to loop through the elements and nodes (data values) of an HTML document?
I've been having the most luck with this:
StringBuilder sb = new StringBuilder();
sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html>");
Document doc = Jsoup.parse(txt.getResult());
Elements elements = doc.getAllElements();
for (Element e : elements) {
    // skip Jsoup's synthetic document root
    if (!(e.tagName().equalsIgnoreCase("#root"))) {
        sb.append("<" + e.tagName() + e.attributes() + ">" + e.ownText() + "\n");
    }// end if
}// end for
return sb;
The one snag is that the nesting isn't always "repaired" properly, though it comes close. I'm still working on this.
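The underlying problem is that the highlighter runs over raw markup, so attribute names and values get matched too. One way around it is to apply the highlighting only to the text between tags. A rough JavaScript sketch of that idea; it splits on tags with a regex and therefore assumes simple markup (no '>' inside attribute values, no comments or <script> contents). A real HTML parser that visits only text nodes is the safer choice. The function name and the hl class are hypothetical:

```javascript
// Escape characters that have special meaning inside a RegExp
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Highlight terms only in the text between tags, never inside the tags themselves
function highlightOutsideTags(html, terms, open = '<span class="hl">', close = '</span>') {
  if (terms.length === 0) return html;
  const re = new RegExp('(' + terms.map(escapeRegExp).join('|') + ')', 'gi');
  // Splitting on a capturing group keeps the tags at the odd indices
  return html
    .split(/(<[^>]*>)/)
    .map((chunk, i) => (i % 2 === 1 ? chunk : chunk.replace(re, open + '$1' + close)))
    .join('');
}
```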

How to preserve whitespace indentation of text enclosed in HTML <pre> tags excluding the current indentation level of the <pre> tag in the document?

I'm trying to display my code on a website but I'm having problems preserving the whitespace indentation correctly.
For instance, given the following snippet:
<html>
  <body>
    Here is my code:
    <pre>
      def some_funtion
        return 'Hello, World!'
      end
    </pre>
  </body>
</html>
This is displayed in the browser as:
Here is my code:
      def some_funtion
        return 'Hello, World!'
      end
When I would like it displayed as:
Here is my code:
def some_funtion
  return 'Hello, World!'
end
The difference is that the current indentation level of the HTML pre tag is being added to the indentation of the code. I'm using nanoc as a static website generator and Google Prettify to add syntax highlighting.
Can anyone offer any suggestions?
PRE is intended to preserve whitespace exactly as it appears (unless altered by white-space in CSS, which doesn't have enough flexibility to support formatting code).
Before
Formatting is preserved, but so is all the indentation outside of the PRE tag. It would be nice to have whitespace preservation that used the location of the tag as a starting point.
After
Contents are still formatted as declared, but the extraneous leading whitespace caused by the position of the PRE tag within the document is removed.
I have come up with the following plugin to solve the issue of wanting to remove superfluous whitespace caused by the indentation of the document outline. This code uses the first line inside the PRE tag to determine how much it has been indented purely due to the indentation of the document.
This code works in IE7, IE8, IE9, Firefox, and Chrome. I have tested it briefly with the Prettify library to combine the preserved formatting with pretty printing. Make sure that the first line inside the PRE actually represents the baseline level of indenting that you want to ignore (or, you can modify the plugin to be more intelligent).
This is rough code. If you find a mistake or it does not work the way you want, please fix/comment; don't just downvote. I wrote this code to fix a problem that I was having and I am actively using it so I would like it to be as solid as possible!
/*!
 *** prettyPre ***/
(function( $ ) {
  $.fn.prettyPre = function( method ) {
    var defaults = {
      ignoreExpression: /\s/ // what should be ignored?
    };
    var methods = {
      init: function( options ) {
        // return the jQuery set so that calls can be chained
        return this.each( function() {
          var context = $.extend( {}, defaults, options );
          var $obj = $( this );
          var usingInnerText = true;
          var text = $obj.get( 0 ).innerText;
          // some browsers support innerText...some don't...some ONLY work with innerText.
          if ( typeof text == "undefined" ) {
            text = $obj.html();
            usingInnerText = false;
          }
          // use the first line as a baseline for how many unwanted leading whitespace characters are present
          var superfluousSpaceCount = 0;
          var currentChar = text.substring( 0, 1 );
          while ( context.ignoreExpression.test( currentChar ) ) {
            currentChar = text.substring( ++superfluousSpaceCount, superfluousSpaceCount + 1 );
          }
          // split
          var parts = text.split( "\n" );
          var reformattedText = "";
          // reconstruct
          var length = parts.length;
          for ( var i = 0; i < length; i++ ) {
            // cleanup, and don't append a trailing newline if we are on the last line
            reformattedText += parts[i].substring( superfluousSpaceCount ) + ( i == length - 1 ? "" : "\n" );
          }
          // modify original
          if ( usingInnerText ) {
            $obj.get( 0 ).innerText = reformattedText;
          }
          else {
            // This does not appear to execute code in any browser but the onus is on the developer to not
            // put raw input from a user anywhere on a page, even if it doesn't execute!
            $obj.html( reformattedText );
          }
        } );
      }
    };
    if ( methods[method] ) {
      return methods[method].apply( this, Array.prototype.slice.call( arguments, 1 ) );
    }
    else if ( typeof method === "object" || !method ) {
      return methods.init.apply( this, arguments );
    }
    else {
      $.error( "Method " + method + " does not exist on jQuery.prettyPre." );
    }
  };
} )( jQuery );
This plugin can then be applied using a standard jQuery selector:
<script>
$( function() { $("PRE").prettyPre(); } );
</script>
Indenting With Comments
Since browsers ignore comments, you can use them to indent your pre tag contents.
Solution
<html>
  <body>
    <main>
      Here is my code with hack:
      <pre>
<!-- -->def some_function
<!-- -->  return 'Hello, World!'
<!-- -->end
      </pre>
      Here is my code without hack:
      <pre>
        def some_function
          return 'Hello, World!'
        end
      </pre>
    </main>
  </body>
</html>
NOTE: a main wrapper was added to provide enough space for the comments.
Advantages
No JavaScript required
Can be added statically
Minification won't affect the indentation and reduces file size
Disadvantages
Requires a minimum amount of space for the comments
Not very elegant unless build tools are used
Removing Indentation With Node
A better solution is to remove the leading white-space using either your build process or back-end rendering process. If you are using node.js, then you can use a stream I wrote called predentation. You can use any language you want to build a similar tool.
Before
<html>
  <body>
    Here is my code:
    <pre>
      def some_function
        return 'Hello, World!'
      end
    </pre>
  </body>
</html>
After
<html>
  <body>
    Here is my code:
    <pre>
def some_function
  return 'Hello, World!'
end
    </pre>
  </body>
</html>
Advantages
Seamless way to write pre tags
Smaller output file size
Disadvantages
Requires a build step in your workflow
Does not handle non-pre elements that get white-space: pre from CSS
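The predentation stream itself isn't shown here. As a rough illustration of what such a build step does, this hypothetical JavaScript function removes the indentation shared by all non-blank lines inside each <pre> element of an HTML string. It assumes <pre> elements are not nested, indentation uses one consistent character (all spaces or all tabs), and there is no '>' inside the <pre> tag's attribute values:

```javascript
// Remove the indentation shared by every non-blank line of a block of text
function dedent(text) {
  const lines = text.split('\n');
  const indents = lines
    .filter((line) => line.trim() !== '')
    .map((line) => line.match(/^[ \t]*/)[0].length);
  const common = indents.length ? Math.min(...indents) : 0;
  // Whitespace-only lines shorter than the common indent simply become empty
  return lines.map((line) => line.slice(common)).join('\n');
}

// Dedent the contents of every <pre> element in an HTML string
function dedentPreBlocks(html) {
  return html.replace(/(<pre\b[^>]*>)([\s\S]*?)(<\/pre>)/gi,
    (_, open, body, close) => open + dedent(body) + close);
}
```

Using the smallest indentation, rather than the first line's, means a first line that is itself nested deeper than the rest does not throw off the result.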
Removing Indentation With JavaScript
See this answer to remove indentation with JavaScript
Advantages
Possible to target elements with white-space: pre
Disadvantages
JavaScript can be disabled
White-space adds to the file size
Managed to do this with JavaScript. It works in Internet Explorer 9 and Chrome 15; I haven't tested older versions. It should work in Firefox 11 when support for outerHTML is added (see here); meanwhile, there are some custom implementations available on the web. An exercise for the reader is to get rid of trailing indentation (until I make time to finish it and update this answer).
I'll also mark this as community wiki for easy editing.
Please note that you'll have to reformat the example to use tabs as indentation, or change the regex to work with spaces.
<!DOCTYPE html>
<html>
<head>
<title>Hello, World!</title>
</head>
<body>
<pre>
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Hello World Example&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
Hello, World!
&lt;/body&gt;
&lt;/html&gt;
</pre>
<pre>
class HelloWorld
{
public static int Main(String[] args)
{
Console.WriteLine(&quot;Hello, World!&quot;);
return 0;
}
}
</pre>
<script>
  var pre_elements = document.getElementsByTagName('pre');
  for (var i = 0; i < pre_elements.length; i++)
  {
    var content = pre_elements[i].innerHTML;
    var tabs_to_remove = '';
    // count the tabs at the start of the first line
    while (content.indexOf('\t') === 0)
    {
      tabs_to_remove += '\t';
      content = content.substring(1);
    }
    // remove the same number of tabs after every newline
    var re = new RegExp('\n' + tabs_to_remove, 'g');
    content = content.replace(re, '\n');
    pre_elements[i].outerHTML = '<pre>' + content + '</pre>';
  }
</script>
</body>
</html>
This can be done in four lines of JavaScript:
var pre= document.querySelector('pre');
//insert a span in front of the first letter. (the span will automatically close.)
pre.innerHTML= pre.textContent.replace(/(\w)/, '<span>$1');
//get the new span's left offset:
var left= pre.querySelector('span').getClientRects()[0].left;
//move the code to the left, taking into account the body's margin:
pre.style.marginLeft= (-left + pre.getClientRects()[0].left)+'px';
<body>
Here is my code:
<pre>
def some_funtion
return 'Hello, World!'
end
</pre>
</body>
If you're okay with changing the innerHTML of the element:
Given:
<pre>
<code id="the-code">
def some_funtion
return 'Hello, World!'
end
</code>
</pre>
Which renders as:
def some_funtion
return 'Hello, World!'
end
The following vanilla JS:
// get block however you want.
var block = document.getElementById("the-code");
// drop blank lines at the start and end (blank lines inside the code are kept)
var lines = block.innerHTML.split('\n');
while (lines.length && lines[0].trim() === '') lines.shift();
while (lines.length && lines[lines.length - 1].trim() === '') lines.pop();
// use the leading whitespace of the first non-empty line
// as the amount that needs to be removed
var leadingWhiteSpace = lines.length ? lines[0].match(/^[ \t]*/)[0] : '';
// strip exactly that prefix from every line that starts with it
var code = lines
  .map(l => l.startsWith(leadingWhiteSpace) ? l.slice(leadingWhiteSpace.length) : l)
  .join('\n');
// update the inner HTML with the edited code
block.innerHTML = code;
Will result in:
<pre>
<code id="the-code">def some_funtion
return 'Hello, World!'
end</code>
</pre>
And will render as:
def some_funtion
return 'Hello, World!'
end
I also found that if you're using Haml, you can use the preserve method. For example:
preserve yield
This will preserve the whitespace in the produced yield which is usually markdown containing the code blocks.
I decided to come up with something more concrete than changing the way pre or code works. The regex \s*\n[\t\s]* finds the first newline character \n together with any whitespace before it (the \s* cleans up extra whitespace at the end of a line of code, before the newline, which I noticed yours had) and the tab or whitespace characters after it ([\t\s]* means zero or more tab or whitespace characters). Because that match() call has no global flag (no g after the regex), it returns only the first occurrence, and that value is stored in pattern. pattern is then used in a global replace that finds every instance of it and replaces it with \n. So if the first newline is followed by two tab characters, pattern is \n\t\t, and every \n\t\t in that pre code element (the each function runs this for every element) is replaced with \n.
$("pre code").each(function(){
  var html = $(this).html();
  // the first newline plus the whitespace around it, e.g. "\n\t\t"
  var pattern = html.match(/\s*\n[\t\s]*/);
  if (pattern) {
    $(this).html(html.replace(new RegExp(pattern[0], "g"), '\n'));
  }
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<body>
Here is some code:
<pre><code>
Here is some fun code!
More code
One tab
One more tab
Two tabs and an extra newline character precede me
</code></pre>
</body>
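<script>
$(function () {
  $("pre[name='pre']").each(function () {
    var html = $(this).html();
    // count the leading spaces on the first line
    var leading = html.split('\n')[0].match(/^ +/);
    var blankLen = leading ? leading[0].length : 0;
    // strip that many spaces from the start of every line
    $(this).html($.trim(html.replace(new RegExp('^ {' + blankLen + '}', 'gm'), '')));
  });
});
</script>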
<div>
<pre name="pre">
1
2
3
</pre>
</div>
This is cumbersome, but it works if code folding is important to you:
<pre>def some_funtion</pre>
<pre> return 'Hello, World!'</pre>
<pre>end</pre>
In your CSS:
pre { margin:0 }
In Vim, you can write your code normally and then run:
:s/\t\t\([^\n]\+\)/<pre>\1<\/pre>/
on each line (or use :%s instead of :s to run it on every line of the file).
If you are using this on a code block like:
<pre>
<code>
...
</code>
</pre>
You can use CSS like this to offset the large amount of whitespace at the front:
pre code {
  position: relative;
  left: -95px; /* or whatever you want */
}
The pre tag preserves all the whitespace you typed in the body, whereas without pre, HTML makes the browser collapse that whitespace. Try this; I have used the paragraph tag.
Output:
Here is my code:
def some_function
return 'Hello, World!'
end
<html>
<body>
Here is my code:
<p>
def some_function<br>
<pre> return 'Hello, World!'<br></pre>
end
</p>
</body>
</html>