Jekyll post HTML not getting formatted correctly

I am generating my blog using GitHub Pages. Some of the posts in the blog do not render properly, while others work.
All the formatting and display is lost, and the page ends up showing the actual .md file instead of the HTML.
You can see the issue here: http://pratikvasani.github.io/archive/2015/01/16/Valuetype-or-reference-type/
The .md file in question is here: https://github.com/pratikvasani/pratikvasani.github.io/blob/master/_posts/2015-01-16-Valuetype-or-reference-type.md
---
layout: post
title: Value type
date: 2015-12-25
summary: Value types.
categories: MVC6 Localization
---
This looks like a amateur question , but don't be surprised if you get the answer wrong.
There is more to understand in value types and reference types than saying
Value types are data types which are stored on the stack while Reference types are stored on the Heap.
While this is a very popular statement which is used to differentiate value types and reference types , it is not entirely true.
....
Could anyone tell me what's wrong? I have checked the format of the .md file and it's correct. The encoding is also fine.

Your file is encoded as UTF-8 with a BOM. As the Jekyll documentation notes, UTF-8 files must be saved without a BOM.
Remove the BOM and it works.
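One way to remove the BOM is a small script. The following is a minimal sketch, not something from the original answer: the function name strip_utf8_bom is hypothetical, and it assumes the file is small enough to read into memory. It checks for the three BOM bytes (EF BB BF) at the start of the file and rewrites the file without them, so the front matter's opening --- becomes the very first thing in the file.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper, for illustration only.)
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):  # the bytes EF BB BF
        # Rewrite the file without the first three bytes.
        file_path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

It could be run over every file in _posts before pushing; many editors also offer a "UTF-8 without BOM" save option that achieves the same thing.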

There was an unprintable character at the beginning of the file.
I've created a PR.
You could easily have noticed it just by comparing how GitHub renders the Markdown of the problematic file and of another post file.

Related

How do you save a JSON response with Emojis as Unicode?

Currently I am scraping Instagram comments for a sentiment analysis project, and am using an Instagram scraper. It is supposed to output a comment file but it doesn't, so a workaround is to find the query URL in the log file and paste it into a browser.
An example URL would be this https://www.instagram.com/graphql/query/?query_hash=33ba35852cb50da46f5b5e889df7d159&variables={%22shortcode%22:%22CMex-IGn1G-%22,%22first%22:50,%22after%22:%22QVFCaERkTm84aWF3T1Exbmw5V0xhb05haVBEY2JaYmxhSTNGWVZ4M2RQWi0yVzVUSExlUlRYOUtsOVEtM0trRzBmSGxyYjdJV094a1hlYm1aLXZjdkVpZQ==%22}.
On Firefox I am able to view the JSON response and am also able to download it through two ways:
CTRL + A to select all and paste into a JSON file.
Download webpage as a JSON file.
The issue with these methods is that neither retains the emoji data. The first loses the emojis, which end up stored as question marks (???) rather than as Unicode. I assumed this was related to the encoding, so I tried pasting the raw response into Unicode-encoded files. That keeps the emojis as literal emoji characters (🙌👏😍), but still not as Unicode.
The second method saves either only the message {"message":"rate limited","status":"fail"} or some other incorrect format.
The thing is that a few months ago I scraped some pages and managed to save the comments with the emojis stored in the Unicode format. This is frustrating because I know it can be done, but I can't remember how I did it; it would have been something basic, like the methods outlined above.
I am out of ideas and would greatly appreciate any help. Thank you.
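If "stored in the Unicode format" means \uXXXX escape sequences (an assumption, since the question does not say so explicitly), one possible approach is to re-serialize the saved JSON text with Python's json module, whose ensure_ascii=True option writes every non-ASCII character as an escape. This is only a sketch: the function name to_escaped_json is hypothetical, and it assumes the pasted response text still contains the real emoji characters rather than question marks.

```python
import json


def to_escaped_json(raw_text):
    """Re-serialize JSON text so non-ASCII characters become \\uXXXX escapes.

    `raw_text` is assumed to be the JSON response as a str that still
    contains the actual emoji characters. (Hypothetical helper.)
    """
    data = json.loads(raw_text)
    # ensure_ascii=True is the default; it is spelled out here because
    # it is the option that produces the escapes.
    return json.dumps(data, ensure_ascii=True, indent=2)
```

Characters outside the Basic Multilingual Plane, which includes most emojis, come out as a surrogate pair of two escapes; json.loads turns them back into the original characters.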

Freemarker CSV generation - CSV with Chinese text truncates the csv contents

I have this very weird problem. I'm using Java 8, Struts2 and Freemarker 2.3.23 to generate reports in CSV and HTML file formats (via .csv.ftl and .html.ftl templates, both saved in UTF-8 encoding), with data coming from a PostgreSQL database.
The data has Chinese characters in it. When I generate the report in HTML format, it is complete and the Chinese characters are displayed properly. But when the report is generated as CSV, I have observed that:
If I run the app with the -Dfile.encoding=UTF-8 VM option, the Chinese characters are generated properly but the report is incomplete (i.e. the text is truncated near the end)
If I run the app without the -Dfile.encoding=UTF-8 VM option, the Chinese characters are displayed as question marks (?????) but the report is complete
Also, the app uses StringWriter to write the data to the CSV and HTML templates.
So, what could be the problem? Am I hitting Java character limits? I do not see any errors in the logs either. I'd appreciate your help. Thanks in advance.
UPDATE:
The StringWriter returns the data in whole, however when writing the data to the OutputStream, this is where some of the data gets lost.
ANOTHER UPDATE:
It looks like the issue is with contentLength (the app is a webapp and the CSV is served as a file download), which was computed from the data as a String using String.length(). String.length() returns the number of UTF-16 code units, not the number of bytes, so for Chinese text it reports a smaller value than the byte count of the UTF-8 output and the download is cut short.
I was able to resolve the issue by computing contentLength with String.getBytes("UTF-8").length.
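The gap between character count and byte count can be illustrated with a short sketch. It is written in Python rather than Java, so it shows the effect rather than the app's actual code: Python's len() counts code points where Java's String.length() counts UTF-16 code units, but the two agree for the characters used here. The function name content_length and the sample strings are made up for illustration.

```python
def content_length(body, encoding="utf-8"):
    """Return the number of bytes `body` occupies once encoded.

    This is the value a Content-Length header needs, as opposed to
    the number of characters in the string. (Hypothetical helper.)
    """
    return len(body.encode(encoding))


ascii_text = "abc"
chinese_text = "报告"  # two characters, three bytes each in UTF-8

# For ASCII the two measures agree: 3 characters, 3 bytes.
assert len(ascii_text) == content_length(ascii_text) == 3

# For Chinese text they do not: 2 characters, 6 bytes.
assert len(chinese_text) == 2
assert content_length(chinese_text) == 6
```

A Content-Length based on the character count would tell the client to stop reading after 2 bytes in the second case, which matches the truncation described above.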

MediaWiki filepath Magic Word doesn't work for some file types

I'm trying to use the MediaWiki filepath magic word so that I can create some template links that pass a specific MediaWiki file. Unfortunately, with certain file types, filepath just returns nothing.
The file I'm trying to get the path for that's failing is a text file in this case. I have confirmed that I am using the correct filename as I can create a regular file link using [[File:Name.txt]], and {{filepath:Image.png}} works properly.
Example of what I'm trying to accomplish:
[http://server/processfile.php?path={{filepath:<filename>}} Process A File]
Is this a known issue? Is there an easy way that I can debug what's happening here?
After digging around a bunch more, I was able to resolve the issue. It turns out that even though MediaWiki would accept the file, it was being assigned a random MIME type because it was a .yaml file.
After updating mime.types and mime.info in MediaWiki and adding the mime type (text/yaml) to my IIS configuration, I was able to get the downloads working and the file links showing up.
Full disclosure: I may have been using an incorrectly cased file name even though I said that I was using the correct file name. :P

UWC is unable to convert the HTML tags to correct form in Confluence

I am using the UWC to convert our MediaWiki data to Confluence. I am able to create text files for all the data successfully. But I found that some pages have HTML tags in the text files
eg - Title of Page
After checking thoroughly, I found out that this is the way the data is stored in the MediaWiki database, so it gets carried forward to the text files created.
When this text file is used to create pages in Confluence, the tags don't get converted to h1 or h2. Thus the page created in Confluence contains the data with the HTML tags in it.
So my question is: does the UWC support this type of syntax, or do I need to write my own parser to handle such occurrences of HTML tags?
Secondly, I have a problem with the bullet points as well. Say there are 4 bullet points under a topic: for 3 of them the style is maintained, while the last is converted to a literal "*". This happens only sometimes, on some pages.
eg-
In the text file
* Topic 1
* Topic 2
* Topic 3
In Confluence
Topic 1
Topic 2
.
* Topic 3
Thirdly, the attachments are not getting carried over to Confluence. I tried to search for the attachments location in MediaWiki but could not locate it.
Any help would be appreciated.
The UWC does its best with the data it receives. I'd suggest looking at the MediaWiki properties file to tweak the HTML conversions, and at any regex you can use to cleanse your data more successfully.
UWC does work well, but think of it as the basis of a conversion that requires some caressing rather than a complete out-of-the-box solution.
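As an illustration of the kind of regex cleanup suggested above, here is a minimal sketch that could be run over the exported text files before feeding them to the UWC. It is not part of the UWC and makes several assumptions: that the stray tags are HTML headings such as <h1>...</h1> (the tags in the example above did not survive), that rewriting them as MediaWiki heading syntax (= Title =, == Title ==) lets the normal conversion handle them, and the function name html_headings_to_mediawiki is hypothetical.

```python
import re

# Matches <h1>...</h1> through <h6>...</h6>, with optional attributes.
# The backreference \1 makes sure the closing tag has the same level.
HEADING_RE = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def html_headings_to_mediawiki(text):
    """Rewrite HTML heading tags as MediaWiki heading markup.

    (Hypothetical pre-processing helper, for illustration only.)
    """
    def replace(match):
        marker = "=" * int(match.group(1))  # level 2 becomes "=="
        title = match.group(2).strip()
        return f"{marker} {title} {marker}"

    return HEADING_RE.sub(replace, text)
```

Text without heading tags passes through unchanged, so the function can be applied to every exported file.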

Character encoding not being picked up

http://www.mamstore.co.uk/bin/pxisapi1.exe/catalogue?level=805838
Look where it is meant to say "£5 T-shirts". Instead the '£' comes up as an invalid character, yet the exact same character is shown correctly just below, on the products.
I am getting the same when I pull a PHP file's contents in with jQuery. The actual PHP file shows the characters correctly (without any head/body set, etc.), but as soon as I pull it into the site it suddenly has issues with them.
It's stored in a SQL DB on a custom-built CMS/WMS system.
Any suggestions would be much appreciated.
Cheers
Your page is encoded as UTF, but the character in the breadcrumbs is encoded as ISO. What encoding do you have in your database?
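The mismatch can be reproduced in a few lines. This sketch assumes that "UTF" means UTF-8 and "ISO" means ISO-8859-1, neither of which the answer states explicitly, and the function names are hypothetical. In ISO-8859-1 the pound sign is the single byte A3, which is not a valid sequence on its own in UTF-8, so a browser decoding the page as UTF-8 shows a replacement character instead.

```python
def decode_as_utf8(raw):
    """Decode bytes the way a UTF-8 page would be read.

    Invalid byte sequences become U+FFFD, the replacement character.
    """
    return raw.decode("utf-8", errors="replace")


def latin1_to_utf8(raw):
    """Re-encode ISO-8859-1 bytes as UTF-8 (assumes the input really is ISO-8859-1)."""
    return raw.decode("iso-8859-1").encode("utf-8")


# The breadcrumb text as it would be stored in ISO-8859-1.
breadcrumb = "£5 T-shirts".encode("iso-8859-1")  # b'\xa35 T-shirts'

# Read as UTF-8, the pound sign is lost.
assert decode_as_utf8(breadcrumb) == "\ufffd5 T-shirts"

# After conversion to UTF-8 it survives.
assert decode_as_utf8(latin1_to_utf8(breadcrumb)) == "£5 T-shirts"
```

The same reasoning applies in the other direction: whichever encoding the database and the page use, they need to agree, or the text needs converting where it crosses from one to the other.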