Ruby encoding question - html

I'm saving scraped data to a web app, and here's a sample param:
400\xB0F.
This is the 'degree' character from a website, but when I put that into my model I get the dreaded "invalid byte sequence in UTF-8" error.
Since it's coming from the web, I thought I might try some client-side encoding, so JavaScript turns that into: 400%B0F. This can at least be saved by ActiveRecord with no issue, but Rails seems to be escaping it again on the way out, so those escape sequences aren't decoded by the browser and my show method displays the entire encoded string.
Where should I be cleaning up my input data, and what methods might be the best to use for unpredictable input?
Thanks!

Years ago I had, and solved, this very same problem in Builder. Take a look at the to_xs method: http://builder.rubyforge.org/classes/String.html#M000007
You can require builder and use it directly (you might want to pass false to disable escaping, or you will get entity-escaped output). Either that, or simply steal and adapt the source.
Update: here is the original, standalone, library:
http://intertwingly.net/stories/2005/09/28/xchar.rb
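On Ruby 1.9 and later, roughly the same fallback idea can be sketched with the built-in encoding API instead of Builder: treat the input as UTF-8, and if the bytes are not valid UTF-8, reinterpret them as Windows-1252 (a superset of Latin-1, where byte 0xB0 is the degree sign). The Windows-1252 guess is an assumption about the scraped pages, and the helper name `to_utf8` is hypothetical.

```ruby
# Minimal sketch: normalize scraped text to valid UTF-8 before saving it.
# Assumption: input that is not valid UTF-8 came from a Windows-1252 page.
def to_utf8(raw)
  utf8 = raw.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  # Reinterpret the same bytes as Windows-1252 and transcode them to UTF-8;
  # any byte without a Windows-1252 mapping is replaced with "?".
  raw.dup.force_encoding("Windows-1252")
     .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

clean = to_utf8("400\xB0F".b)
puts clean # => 400°F

# Optional, in the spirit of to_xs: escape & < > and turn every
# non-ASCII character into a numeric character reference.
puts clean.encode("US-ASCII", xml: :text)
```

Cleaning at this point (before the value reaches the model) means ActiveRecord only ever sees valid UTF-8, and Rails' normal output escaping then works as expected.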

Perhaps you can use a binary form (as for a file upload) with enctype="multipart/form-data" in the form tag. That way, you could handle this data as binary data.
It perhaps depends on what you do with this data.

URI.unescape was the trick, after I encoded it client-side
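Note that URI.unescape was deprecated and then removed in Ruby 3.0. A minimal sketch of the same server-side decode with URI.decode_www_form_component, assuming the value was percent-encoded in the browser. It also shows why the client-side function matters: JavaScript's encodeURIComponent sends the UTF-8 bytes of the degree sign (%C2%B0), while the legacy escape() sends the single Latin-1 byte %B0, which has to be reinterpreted (here, as Windows-1252, an assumption) before it is valid UTF-8.

```ruby
require "uri"

# encodeURIComponent("400°F") yields "400%C2%B0F": UTF-8 bytes, decodes directly.
from_component = URI.decode_www_form_component("400%C2%B0F")
puts from_component # => 400°F

# escape("400°F") yields "400%B0F": one Latin-1 byte, not valid UTF-8 on its own.
# Assumption: decode it as Windows-1252, then transcode to UTF-8.
from_escape = URI.decode_www_form_component("400%B0F", "Windows-1252").encode("UTF-8")
puts from_escape # => 400°F

# Caveat: decode_www_form_component also turns "+" into a space,
# which matches how HTML forms encode spaces.
```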

Related

How to fetch special chars from Website with Google Apps Script

I'm fetching a website, but all the special characters in the string returned by .getContentText() or .getContentText("UTF-8") come back as HTML entities (’ shows up as an entity, and so on).
I've really run out of ideas, and to be honest I don't quite understand at which point this encoding happens. I could solve it by "manually" replacing all the occurrences, but that doesn't seem very clean. Thanks a lot for your help.
var response = UrlFetchApp.fetch("https://podtail.com/de/top-podcasts/de/");
var html = response.getContentText();
Your sample code suggests that you are retrieving the HTML source code of a specific page. That source code writes ’ and friends as HTML entities, so the data will be in that format. It is unclear why you would need to decode those HTML entities.
If you really need to fully decode the HTML in Google Apps Script, you will need a parser of fairly respectable complexity. There are some shortcuts you can try if your app has an HTML user interface of its own, but it would probably make more sense to use a library like the one by Mathias Bynens (mathiasbynens).
If you only want to replace some HTML entities with their non-encoded equivalents, you may want to just use String.replace().
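The "replace only the entities you care about" approach can be sketched in Ruby (the language of the main question on this page) rather than Apps Script JavaScript; the idea carries over to String.replace() with a callback. The entity table is a hypothetical starting point, not the full HTML5 list, and a real parser or library remains the better choice for complete decoding.

```ruby
# Hypothetical table: add whichever named entities your pages actually use.
NAMED_ENTITIES = {
  "amp" => "&", "lt" => "<", "gt" => ">", "quot" => "\"",
  "rsquo" => "\u2019", "lsquo" => "\u2018",
  "ldquo" => "\u201C", "rdquo" => "\u201D"
}.freeze

# Decimal (&#8217;), hex (&#x2019;) and named (&rsquo;) references in one pass,
# so that "&amp;rsquo;" correctly becomes the literal text "&rsquo;".
ENTITY_PATTERN = /&#(\d+);|&#x(\h+);|&(\w+);/i

def codepoint_to_s(codepoint, fallback)
  codepoint.chr(Encoding::UTF_8)
rescue RangeError
  fallback # e.g. a surrogate such as &#xD800; is left as written
end

def replace_entities(html)
  html.gsub(ENTITY_PATTERN) do |match|
    if $1 then codepoint_to_s($1.to_i, match)
    elsif $2 then codepoint_to_s($2.hex, match)
    else NAMED_ENTITIES.fetch($3, match) # unknown names stay untouched
    end
  end
end

puts replace_entities("It&rsquo;s &#8220;fine&#x201D; &amp; done &copy;")
# => It’s “fine” & done &copy;
```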

How to generate Windows 1250 encoded content using LogicApp in Azure

As far as I know, Logic Apps work with the UTF-8 character set, and I need to compose the file in the Windows-1250 code page. How can I do this?
The json_encode function doesn't appear among the built-in expressions/functions in Logic Apps.
There is no file-encoding setting in Logic Apps; content can be encoded in whatever format you need, and the app may not even be aware of it.
If you have a specific situation, you could do the content-type conversion in an Azure Function using a StreamWriter. Here is a blog and a question about the conversion.
Logic Apps also support Azure Functions integration.
Hope this helps; if you have other questions, please let me know.
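Outside Logic Apps, the conversion itself is a plain transcode. A minimal Ruby sketch (the main question's language), standing in for what an Azure Function with a StreamWriter would do; the function name and the policy of replacing unmappable characters with "?" are assumptions.

```ruby
# Minimal sketch: produce Windows-1250 bytes from UTF-8 text.
# Assumption: characters with no Windows-1250 mapping may be replaced with "?".
def to_windows_1250(text)
  text.encode("Windows-1250", undef: :replace, invalid: :replace, replace: "?")
end

polish = "Łódź"
cp1250 = to_windows_1250(polish)
puts polish.bytesize # => 7 (UTF-8 uses two bytes each for Ł, ó, ź)
puts cp1250.bytesize # => 4 (one byte per character in Windows-1250)

# Decoding the bytes back gives the original text, so nothing was lost.
puts cp1250.encode("UTF-8") == polish # => true
```

The resulting string's raw bytes (for example via File.binwrite) are what a Windows-1250 consumer expects.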

How to pass a backslash in an HTML form

I know very little HTML. I have a backend application that does a MongoDB lookup, and I am building a simple HTML screen with forms that pass values to a web service, which runs the Mongo query and shows the reply on the screen.
When I pass a file path in a form field like this
\\test.server.com\filetest\test
in my web service app, I see the value coming in as
%5c%5Ctest.server.com%5cfiletest%5ctest
How can I get the value without this translation?
As a matter of fact, I was hoping it would come in like this
\\\\test.server.com\\filetest\\test
as that is how things got stored in mongo.
You cannot pass a backslash as-is, because URLs may only contain a limited set of ASCII characters. This means that when you need to pass special characters like Ü, or characters that must be escaped in URLs (such as spaces and backslashes), you need a way to represent them with ASCII symbols.
In your case the value is URL-encoded, and each backslash is converted to %5c (0x5C is the ASCII code of the backslash). To turn them back into '\', you need to either:
Decode them in your server-side code. This is your best bet, and how you do it depends on the technology your backend uses. In PHP, for example, you can use the urldecode function (see here).
Decode the characters in MongoDB itself before querying. You would have to implement this yourself, because I'm not aware of any functionality that does it out of the box.
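For a Ruby backend, the server-side decode can be sketched as below. The assumption in the second half is that the doubled backslashes seen in Mongo are only the escaped form used when a value is displayed as JSON (or in the shell), not extra characters stored in the data.

```ruby
require "uri"
require "json"

incoming = "%5c%5Ctest.server.com%5cfiletest%5ctest"

# Percent-decoding turns each %5C / %5c back into a single backslash.
path = URI.decode_www_form_component(incoming)
puts path # => \\test.server.com\filetest\test

# Assumption: Mongo's doubled backslashes are just the JSON-escaped display form,
# which the decoded value already produces when serialized.
puts JSON.generate(path) # => "\\\\test.server.com\\filetest\\test"
```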
More info on URL encoding can be found here.
Hope this helps!

Can URL #anchor contain binary data?

I'm trying to encode web page state in the #anchor. Right now I am base64-encoding a JSON string, but it sometimes gets too long (10K+). Apparently I hit some kind of URL length limit, and it just doesn't work right (the string gets cut off and the JSON data structure can't be reconstructed).
I talked with some of my buddies, and they said to try bzip or gzip. I tried that, but now my #anchor is binary data.
I haven't been able to decode it properly, and I'm not sure it was even sent correctly as part of the URL.
Does anyone know how to put binary data in an #anchor, whether it's a good idea, or how to come up with an alternative working solution for my problem?
I would not bother with all of this.
Use Local Storage for your large data, and pass a reference to that data through your anchor.
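If you still want to try compression, the binary output has to be turned back into URL-safe text before it goes into the fragment. A minimal Ruby sketch of the round trip (a browser implementation would need JavaScript equivalents): JSON, then zlib deflate, then URL-safe Base64 without padding. It only shrinks the fragment when the JSON is repetitive, and it does not remove URL length limits, so the reference-to-stored-data approach above is still the more robust option.

```ruby
require "json"
require "zlib"
require "base64"

# Serialize, compress, and make the binary output safe for a URL fragment.
def encode_state(state)
  compressed = Zlib::Deflate.deflate(JSON.generate(state), Zlib::BEST_COMPRESSION)
  Base64.urlsafe_encode64(compressed, padding: false) # only A-Z a-z 0-9 - _
end

# Reverse the steps; inflate returns binary, so label it as UTF-8 for JSON.
def decode_state(fragment)
  json = Zlib::Inflate.inflate(Base64.urlsafe_decode64(fragment))
  JSON.parse(json.force_encoding(Encoding::UTF_8))
end

state = { "tab" => "settings",
          "rows" => Array.new(200) { |i| { "id" => i, "open" => false } } }
fragment = encode_state(state)
puts fragment.length < JSON.generate(state).length # repetitive JSON compresses well
puts decode_state(fragment) == state
```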

Setting the TextCodepage property of the WorkbookSaveAsArgs (Workbook) object

Folks, I'm trying to set the TextCodepage property of WorkbookSaveAsArgs, which is passed as the argument to the Workbook SaveAs method. I use it to convert .xls files to .csv files. However, this property takes a generic object, and I don't know how to set it properly. The MSDN documentation only says "Ignored for all languages in Microsoft Excel." or "Not used in U.S. English Excel." When my documents are converted, invalid characters are generated because my input files are in Portuguese. Thus, I need an encoding that supports this language. Any suggestions?
I can only suggest saving with FileFormat:=xlUnicodeText instead of xlCSV, but use a ".csv" file extension when you do so. This should preserve your Portuguese characters.
This works for me: use "FileFormat:=xlCSVUTF8". Try it and see whether it works for you; I also suggest reading the documentation: "https://learn.microsoft.com/pt-br/office/vba/api/excel.xlfileformat"
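Whatever reads a file saved with xlUnicodeText has to expect tab-separated UTF-16 text with a byte-order mark, even when the file has a .csv extension. A minimal Ruby sketch of such a reader, assuming exactly that format (UTF-16LE with BOM, tab-delimited); the function name and the sample data are hypothetical.

```ruby
require "csv"
require "tempfile"

# Assumption: the file came from SaveAs with FileFormat:=xlUnicodeText, i.e.
# tab-separated UTF-16LE text with a byte-order mark, even if named ".csv".
def read_unicode_text(path)
  text = File.binread(path).force_encoding("UTF-16LE").encode("UTF-8")
  CSV.parse(text.delete_prefix("\uFEFF"), col_sep: "\t")
end

# Self-contained demo: write a small file in that format, then read it back.
rows = Tempfile.create(["export", ".csv"]) do |file|
  sample = "\uFEFFNome\tCidade\r\nJoão\tSão Paulo\r\n".encode("UTF-16LE")
  File.binwrite(file.path, sample)
  read_unicode_text(file.path)
end
p rows # the Portuguese characters survive the round trip
```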