Reading HTML contents of a URL in OCaml

I would like to write an OCaml function which takes a URL and returns a string made up of the contents of the HTML file at that location. Any ideas?
Thanks a lot!
Best,
Surikator.

I've done both of those things using ocurl and Nethtml.
Use ocurl to read the contents of the URL (there are tons of options; this is the minimum):
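exception IO_ERROR of string

let string_of_uri uri =
  try
    let connection = Curl.init () and write_buff = Buffer.create 1763 in
    (* Append each received chunk; libcurl expects the number of bytes consumed. *)
    Curl.set_writefunction connection
      (fun x -> Buffer.add_string write_buff x; String.length x);
    Curl.set_url connection uri;
    Curl.perform connection;
    (* Release this handle; Curl.global_cleanup is meant for program shutdown. *)
    Curl.cleanup connection;
    Buffer.contents write_buff
  with _ -> raise (IO_ERROR uri)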
and Nethtml to parse the result (you might need to set up a DTD for Nethtml.parse):
let parse_html_string uri =
  let ch = new Netchannels.input_string (string_of_uri uri) in
  let docs = Nethtml.parse ~return_pis:false ch in
  ch # close_in ();
  docs
Cheers!

Related

Using JsonProvider from Fsharp.Data for Binance-Request

I'm trying to work with the Binance-Connector for .NET. In the F#-Examples, we have the following:
let f =
    let loggerFactory = LoggerFactory.Create(fun (builder:ILoggingBuilder) ->
        builder.AddConsole() |> ignore
    )
    let logger = loggerFactory.CreateLogger()
    let loggingHandler = new BinanceLoggingHandler(logger)
    let httpClient = new HttpClient(loggingHandler)
    let market = new Market(httpClient)
    let result = market.TwentyFourHrTickerPriceChangeStatistics() |> Async.AwaitTask |> Async.RunSynchronously
    0
result is a string and looks something like this:
{"symbol":"ETHBTC","priceChange":"0.00013400","priceChangePercent":"0.179","weightedAvgPrice":"0.07444089","prevClosePrice":"0.07467300","lastPrice":"0.07480700","lastQty":"0.04640000","bidPrice":"0.07480600","bidQty":"6.01380000","askPrice":"0.07480700","askQty":"48.54320000","openPrice":"0.07467300","highPrice":"0.07531600","lowPrice":"0.07357000","volume":"80296.33090000","quoteVolume":"5977.33041290","openTime":1650281947747,"closeTime":1650368347747,"firstId":335177449,"lastId":335313233,"count":135785}
This works as intended, but of course I want to work with the result.
So I tried to deserialize it with the JsonProvider:
type Simple = JsonProvider<result>
For some reason this doesn't work. The error is reported on result and says that the value or constructor is not defined (FS0039).
A sample in the docs for JsonProvider is given as follows:
type Simple = JsonProvider<""" { "name":"John", "age":94 } """>
let simple = Simple.Parse(""" { "name":"Tomas", "age":4 } """)
simple.Age
simple.Name
How can I correctly cast the json to a type?
Best regards
JsonProvider<...> takes a static string parameter, which must be a compile-time constant: either a sample string or a sample file (local or accessible online) from which the provider infers the types. A runtime value such as result cannot be used there.
So in your case it would be:
let [<Literal>] BinanceSample = """ [{"symbol":"ETHBTC","priceChange":"-0.00163000"}] """
type Binance = JsonProvider<BinanceSample>
Then you should be able to parse the JSON with:
let parsed = Binance.Parse(result)
PS: try to provide a JSON sample that is as complete as possible.

Alamofire string parameter hardcoded works but passing as string parameter does not

I'm attempting to make an Alamofire service call to retrieve a list of items in JSON. The issue I am having is that any time I type in a special character such as ', somewhere along the way the string is turned into a Unicode string while the request is sent. When I type in o'sheas, it comes back that I'm searching for O\U2019sheas
func sendGetRequest(passedInString: String) {
    PARAMETERS["searchTxt"] = passedInString
    Alamofire.request(URL, method: .get , parameters: PARAMETERS, headers: HEADER)
        .validate(statusCode: 200..<400)
        .responseJSON { response in
            debugPrint(response.request!)
            switch response.result {
            // GETTING NO RESULTS BECAUSE THE REQUEST IS TURNING the o'sheas into O\U2019sheas
But the odd thing is, if I just replace this:
PARAMETERS["searchTxt"] = passedInString
with a hardcoded string (the one I'm typing initially and getting no results)
PARAMETERS["searchTxt"] = "o'sheas"
...it works just fine and does not convert this to O\U2019sheas. Any idea how to fix this, and more importantly, why this is happening when passed in as a String parameter as opposed to hard coded string?
UPDATE:
I've tried adding the URLEncoding.default as the encoding parameter, same results.
I've also tried creating a Swift String extension and applying it to searchTxt before passing it as a parameter, like so:
PARAMETERS["searchTxt"] = passedInString.escapeSpecialCharacters()
extension String {
    mutating func escapeSpecialCharacters() {
        // 1. Define replacements
        let replacementCharacters = ["'" : "\'"]
        // 2. Cycle through each replacement defined in the replacementCharacters array and remodify the string accordingly.
        replacementCharacters.keys.forEach {
            // 3. Replacing self with new modified string
            self = self.replacingOccurrences(of: $0, with: replacementCharacters[$0]!)
        }
    }
}
But same results.
If you are sure this is a Unicode problem, you can use Encoder and Decoder functionality to handle it.
Or, if you know your string will always have the same form, use a regular expression to insert the '\'.
I think the problem is with the GET request: the string is placed in the URL as a query parameter, and the "'" may not be handled well there.
Try building the URL with URLComponents:
var urlComponents = URLComponents()
urlComponents.scheme = scheme
urlComponents.host = host
urlComponents.path = path
urlComponents.queryItems = [queryItem]
where queryItem is a URLQueryItem, for example:
URLQueryItem(name: "Param", value: "eter")
Then pass urlComponents.url (an optional, so unwrap it first) to
Alamofire.request(urlComponents.url, method: .get, headers: HEADERS) {...}
Try the lines of code below; they may help.
let originalString = "o'sheas"
// addingPercentEncoding returns an optional String
var escapedString = originalString.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed)
PARAMETERS["searchTxt"] = escapedString
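One possible explanation, offered as an assumption rather than a diagnosis: the O\U2019sheas in the debug output suggests that the typed text contains a typographic apostrophe (U+2019) rather than the ASCII one (U+0027), which would explain why the hardcoded literal behaves differently. The idea of normalizing the input before sending it is sketched below in Rust, purely for illustration; normalize_apostrophes is a hypothetical helper and not part of Alamofire or any other library.

```rust
/// Hypothetical helper: map typographic single quotes (U+2018, U+2019)
/// to the ASCII apostrophe (U+0027) and leave everything else untouched.
fn normalize_apostrophes(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{2018}' | '\u{2019}' => '\'',
            other => other,
        })
        .collect()
}
```

With this helper, a typed "o\u{2019}sheas" and the literal "o'sheas" normalize to the same search text. The equivalent in Swift would be a replacingOccurrences call on the input string before it is stored in PARAMETERS.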

How to serialize to a stream with JSON.NET in F#?

I need to send JSON to WebResourceResponse:
override this.ShouldInterceptRequest(view:WebView, request:IWebResourceRequest) =
    let rows = Customer.fakeData 1 // Array of records
    let st = Shared.jsonToStream(rows)
    new WebResourceResponse("application/javascript", "UTF-8", st)
But I don't see how to do that. I use Json.NET and F#.
When I run:
let jsonToStream(value:'T) =
    let serializer = new JsonSerializer()
    let std = new IO.MemoryStream()
    let sw = new IO.StreamWriter(std)
    let json = new JsonTextWriter(sw)
    serializer.Serialize(json, value)
    //std.Position <- 0L
    std
the response returns as a blank string.
You have the following issues:
You need to dispose of your StreamWriter and JsonTextWriter so that the serialized JSON is flushed to the underlying stream. This can be done by replacing let with use.
However, you need to do so without closing the underlying stream std, since you are going to read from it later.
Having done so, you need to reset the position of the std stream after sw and json have gone out of scope and been disposed. If you try to reset the position before then it won't work.
Thus the following will work:
let jsonToStream(value:'T) =
    let serializer = new JsonSerializer()
    let std = new IO.MemoryStream()
    (   use sw = new StreamWriter(std, new UTF8Encoding(false, true), 1024, true)
        use json = new JsonTextWriter(sw, CloseOutput = false)
        serializer.Serialize(json, value))
    std.Position <- 0L
    std
Note the use of parentheses to restrict the scope of sw and json so that std.Position can be reset after they go out of scope. jsonToStream(rows) will now return an open MemoryStream containing complete, serialized JSON and positioned at the beginning.
Sample working f# fiddle.
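The same flush-then-rewind pattern can be sketched in Rust using only the standard library. This is an analogy for illustration, not Json.NET code, and text_to_stream is a hypothetical name. The buffered writer is scoped so that it is flushed and dropped before the in-memory stream is rewound and returned.

```rust
use std::io::{self, BufWriter, Cursor, Seek, SeekFrom, Write};

/// Write `text` through a buffered writer into an in-memory stream,
/// then return the stream positioned at the start.
fn text_to_stream(text: &str) -> io::Result<Cursor<Vec<u8>>> {
    let mut stream = Cursor::new(Vec::new());
    {
        // The writer only borrows `stream`, so dropping the writer
        // cannot close or consume the underlying buffer.
        let mut writer = BufWriter::new(&mut stream);
        writer.write_all(text.as_bytes())?;
        // Flush explicitly so the buffered bytes reach `stream` and any
        // I/O error is reported instead of being ignored on drop.
        writer.flush()?;
    } // `writer` goes out of scope here.

    // Rewind only after the writer is gone; until then the position
    // sits at the end of the written data.
    stream.seek(SeekFrom::Start(0))?;
    Ok(stream)
}
```

Reading from the returned stream yields the full text from the first byte. Skipping the flush or the rewind gives the same symptom as in the question: a reader that sees nothing.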

Read json file in and write without indentation

The following code takes a folder of JSON files (saved with indentation), opens each one, reads its content, serializes it to JSON again and writes the result to a new file.
The same task works in Python, so it is not the data. But the Rust version you see here does not:
extern crate rustc_serialize;
use rustc_serialize::json;
use std::io::Read;
use std::fs::read_dir;
use std::fs::File;
use std::io::Write;
use std::io;
use std::str;

fn write_data(filepath: &str, data: json::Json) -> io::Result<()> {
    let mut ofile = try!(File::create(filepath));
    let encoded: String = json::encode(&data).unwrap();
    try!(ofile.write(encoded.as_bytes()));
    Ok(())
}

fn main() {
    let root = "/Users/bling/github/data/".to_string();
    let folder_path = root + &"texts";
    let paths = read_dir(folder_path).unwrap();
    for path in paths {
        let input_filename = format!("{}", path.unwrap().path().display());
        let output_filename = str::replace(&input_filename, "texts", "texts2");
        let mut data = String::new();
        let mut f = File::open(input_filename).unwrap();
        f.read_to_string(&mut data).unwrap();
        let json = json::Json::from_str(&data).unwrap();
        write_data(&output_filename, json).unwrap();
    }
}
Can you spot an error in my code, or did I get some language concepts wrong? Is the rustc-serialize crate used incorrectly? In the end it does not work as expected, which was to outperform Python.
± % cargo run --release --verbose
Fresh rustc-serialize v0.3.16
Fresh fileprocessing v0.1.0 (file:///Users/bling/github/rust/fileprocessing)
Running `target/release/fileprocessing`
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: SyntaxError("unescaped control character in string", 759, 55)', ../src/libcore/result.rs:736
Process didn't exit successfully: `target/release/fileprocessing` (exit code: 101)
Why does it throw an error? Is my JSON serialization done wrong?
Can I get the object it fails on? What about encoding?
...is the code right, or is there something obviously wrong that someone with more experience can spot?
Wild guess: if the same input file can be parsed by other JSON parsers (e.g. in Python), you may be hitting a rustc-serialize bug that was fixed in https://github.com/rust-lang-nursery/rustc-serialize/pull/142. Try to update?
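The panic message reports SyntaxError("unescaped control character in string", 759, 55), where the two numbers are presumably the line and column of the offending input. To find which file and which value triggers it, one option is to scan each input for raw control characters inside string literals before parsing. The following is a minimal Rust sketch under that assumption. find_control_chars_in_strings is a hypothetical helper, not part of rustc-serialize, and its column counting may not match the parser's exactly.

```rust
/// Return the 1-based (line, column) of every raw control character
/// (U+0000 to U+001F) that appears inside a JSON string literal.
fn find_control_chars_in_strings(text: &str) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    let (mut line, mut col) = (1usize, 0usize);
    let mut in_string = false; // currently between two unescaped quotes
    let mut escaped = false; // previous character was a backslash
    for c in text.chars() {
        col += 1;
        if in_string {
            if (c as u32) < 0x20 {
                hits.push((line, col));
            }
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
        }
        if c == '\n' {
            line += 1;
            col = 0;
        }
    }
    hits
}
```

Calling this on the contents of each file before json::Json::from_str and printing the file name together with any hits would narrow the failure down to a specific value. A raw tab or newline inside a string is a typical culprit, since JSON requires those to be written as \t and \n.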

F#: System.Net.WebException

I am new to programming and F# is my first language. I am currently still very unfamiliar with .NET APIs.
As a beginner's project, I want to scrape a website. I want to write a function that, given a specific URL, automatically downloads all the HTML contents on that page. However, if the URL is invalid, rather than throwing a System.Net.WebException message, I want to return the Boolean output "False" instead.
Here is the relevant part of my code:
let noSuchURL (url: string) =
    let html = downloadHtmlFromUrl url
    let regexPattern = @"<title>Page not found</title>"
    let matchResult = Regex.IsMatch(html, regexPattern)
    matchResult
(I have tested the downloadHtmlFromUrl function in F# interactive, and it works fine.)
I realised that the code above does not return a Boolean value in the event that the address is invalid. Instead, System.Net.WebException is thrown, with the message "System.Net.WebException: The remote server returned an error: (404) Not Found".
What changes can I make to get a Boolean output?
Maybe catch the exception?
let noSuchURL (url: string) =
    try
        let html = downloadHtmlFromUrl url
        let regexPattern = @"<title>Page not found</title>"
        let matchResult = Regex.IsMatch(html, regexPattern)
        matchResult
    with :? System.Net.WebException -> false
One caveat: this program will return false whenever a WebException is raised, no matter the reason. If you want to return false only for certain failures, such as a 404 response or a host name that cannot be resolved, you'll have to look closer at the WebException (note that ProtocolError covers any HTTP error status, not just 404):
let noSuchURL (url: string) =
    try
        let html = downloadHtmlFromUrl url
        let regexPattern = @"<title>Page not found</title>"
        let matchResult = Regex.IsMatch(html, regexPattern)
        matchResult
    with
    | :? System.Net.WebException as e
        when e.Status = WebExceptionStatus.ProtocolError ||
             e.Status = WebExceptionStatus.NameResolutionFailure
        -> false
For more on exceptions in F#, take a look at https://msdn.microsoft.com/en-us/library/dd233194.aspx.