I am trying to parse a string to return the text between two occurrences of a delimiter. For example, my string is: "faultstring>Item not valid: The specified Standard SIP1 Profile was not found faultstring>"
I want to write a function that will return the string: Item not valid: The specified Standard SIP1 Profile was not found
I am new to Tcl and your help is very much appreciated.
Please let me know.
Thanks.
Assuming there is no faultstring> inside the interesting string, and that there might be some uninteresting garbage before and after the specified fragment:
set testString "faultstring>Item not valid: The specified Standard SIP1 Profile was not found faultstring>"
if {[regexp {faultstring>(.*)faultstring>} $testString _ extracted]} {
    # string trim drops the space captured just before the closing faultstring>
    puts "Got it: [string trim $extracted]"
}
The answer may vary for other assumptions.
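For comparison, the same extraction can be done without a regex. Below is a minimal sketch in Rust (std only); the helper name between is hypothetical and it assumes the marker appears at least twice. Like the greedy (.*) in the Tcl pattern, it keeps everything from the end of the first marker to the start of the last one, then trims surrounding whitespace.

```rust
/// Returns the text between the first and the last occurrence of `marker`,
/// with surrounding whitespace trimmed. Returns None if `marker` does not
/// occur at least twice (mirrors the greedy `(.*)` in the Tcl regex).
fn between<'a>(s: &'a str, marker: &str) -> Option<&'a str> {
    let start = s.find(marker)? + marker.len();
    let end = s.rfind(marker)?;
    if end < start {
        return None; // only one (non-overlapping) occurrence of the marker
    }
    Some(s[start..end].trim())
}

fn main() {
    let test_string =
        "faultstring>Item not valid: The specified Standard SIP1 Profile was not found faultstring>";
    if let Some(extracted) = between(test_string, "faultstring>") {
        println!("Got it: {}", extracted);
    }
}
```

Using rfind for the end marker is what makes this "greedy"; switching it to a second find after start would give the non-greedy variant instead.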
Related
So I'm writing a JSON parser in OCaml, and I need to get a slice of a string. More specifically, I need to get the first n characters of a string so I can pattern-match with them.
Here's an example string:
"null, \"field2\": 25}"
So, how could I use just a couple lines of OCaml code to get just the first 4 characters (the null)?
P.S. I've already thought about using something like input.[0..4], but I'm not entirely sure how that works; I'm reasonably new to OCaml and the ML family.
Using the built-in String.sub function should do the job:
let example_string = "null, \"field2\": 25}"
(*val example_string : string = "null, \"field2\": 25}" *)
let first_4 = String.sub example_string 0 4
(*val first_4 : string = "null" *)
I suggest you look at the official documentation:
https://caml.inria.fr/pub/docs/manual-ocaml/libref/String.html
And if you are not doing this for self-teaching, I would strongly suggest using one of the available libraries for the purpose, such as yojson (https://ocaml-community.github.io/yojson/yojson/Yojson/index.html).
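For comparison, here is the same "first n characters" check sketched in Rust (std only; first_n is a hypothetical helper). OCaml's String.sub raises Invalid_argument when the requested range does not fit in the string, whereas Rust's str::get returns None, so a too-short input simply falls through to an ordinary match arm. Both count bytes, not Unicode characters.

```rust
/// Returns the first `n` bytes of `s` as a slice, or None if `s` is shorter
/// than `n` bytes or `n` falls inside a multi-byte UTF-8 character.
fn first_n(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    let example_string = "null, \"field2\": 25}";
    // Pattern-match on the 4-byte prefix, as the question wants to do.
    match first_n(example_string, 4) {
        Some("null") => println!("got a null"),
        Some(other) => println!("other prefix: {}", other),
        None => println!("input too short"),
    }
}
```

When only a fixed keyword matters, example_string.starts_with("null") is the simpler test.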
I have an MPGW (Multi-Protocol Gateway) where the request is JSON.
I save the content in a context variable with JSON.stringify(json).
The problem is that when the JSON contains an emoji, e.g. \uD83D\uDE0D, the variable is no longer a string; it becomes binary and the emojis are shown as dots.
I need to use the content of the variable later to calculate an HMAC, so it has to look exactly like the original JSON.
Is there any way to get around this?
Help would be much appreciated.
We are running firmware: IDG.7.5.2.9
/Jocke D
OK, from your comment I can conclude that it is stringify() that messes it up. This behavior follows the cookbook for escaping (there is an RFC describing this)...
Try adding your own stringify() function that handles Unicode better:
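// Like JSON.stringify, but optionally escapes every character from U+007F
// upwards as \uXXXX so the stored value stays plain ASCII.
function JSON_stringify(s, emit_unicode) {
    var json = JSON.stringify(s);
    return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
        function (c) {
            // c is one UTF-16 code unit, so an emoji yields two escapes
            // (a surrogate pair). toString(16) gives lowercase hex; uppercase
            // the hex digits if the original JSON used \uD83D-style escapes,
            // because the HMAC must see identical bytes.
            return '\\u' + ('0000' + c.charCodeAt(0).toString(16)).slice(-4);
        }
    );
}
ctx.setVar('json', JSON_stringify(json, false));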
Something like that...
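To see why one emoji turns into two escapes, here is the same transformation sketched in Rust (std only; escape_non_ascii is a hypothetical helper, not DataPower code). Each character from U+007F upwards is written as one \uXXXX escape per UTF-16 code unit, so U+1F60D becomes the surrogate pair \ud83d\ude0d. Like toString(16), it emits lowercase hex, which matters for the HMAC if the original request used uppercase escapes such as \uD83D\uDE0D.

```rust
/// Escapes every character at or above U+007F as `\uXXXX` using UTF-16
/// code units, like the JS replace callback: characters outside the BMP
/// (such as emoji) become a surrogate pair of two escapes.
fn escape_non_ascii(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    for c in json.chars() {
        if (c as u32) < 0x7f {
            out.push(c);
        } else {
            let mut buf = [0u16; 2];
            for unit in c.encode_utf16(&mut buf).iter() {
                // {:04x} gives zero-padded lowercase hex, matching toString(16).
                out.push_str(&format!("\\u{:04x}", *unit));
            }
        }
    }
    out
}

fn main() {
    let stringified = "{\"text\":\"😍\"}";
    println!("{}", escape_non_ascii(stringified));
}
```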
I am very new to Rust, and I'm trying to build an HTML parser.
I first tried to parse the string and put the tags in a HashMap<&str, i32>,
and then I figured out that I have to take care of letter case,
so I added tag.to_lowercase(), which creates a String. From there my brain started to panic.
Below is my code snippet.
fn html_parser<'a>(html:&'a str, mut tags:HashMap<&'a str, i32>) -> HashMap<&'a str, i32>{
    let re = Regex::new("<[:alpha:]+?[\\d]*[:space:]*>+").unwrap();
    let mut count;
    for caps in re.captures_iter(html) {
        if !caps.at(0).is_none(){
            let tag = &*(caps.at(0).unwrap().trim_matches('<').trim_matches('>').to_lowercase());
            count = 1;
            if tags.contains_key(tag){
                count = *tags.get_mut(tag).unwrap() + 1;
            }
            tags.insert(tag,count);
        }
    }
    tags
}
which throws this error,
src\main.rs:58:27: 58:97 error: borrowed value does not live long enough
src\main.rs:58 let tag:&'a str = &*(caps.at(0).unwrap().trim_matches('<').trim_matches('>').to_lowercase());
^~~~~~~~~~~~~~~~~~~
src\main.rs:49:90: 80:2 note: reference must be valid for the lifetime 'a as defined on the block at 49:89...
src\main.rs:49 fn html_parser<'a>(html:&'a str, mut tags:HashMap<&'a str, i32>)-> HashMap<&'a str, i32>{
src\main.rs:58:99: 68:6 note: ...but borrowed value is only valid for the block suffix following statement 0 at 58:98
src\main.rs:58 let tag:&'a str = &*(caps.at(0).unwrap().trim_matches('<').trim_matches('>').to_lowercase());
src\main.rs:63
...
error: aborting due to previous error
I read about lifetimes in Rust but still cannot understand this situation.
If anyone has a good HTML tag regex, please recommend it so that I can use it.
To understand your problem it is useful to look at the function signature:
fn html_parser<'a>(html: &'a str, mut tags: HashMap<&'a str, i32>) -> HashMap<&'a str, i32>
From this signature we can see, roughly, that both the accepted and the returned hash maps may only be keyed by subslices of html. However, in your code you are attempting to insert a string slice that is completely unrelated (in the lifetime sense) to html:
let tag = &*(caps.at(0).unwrap().trim_matches('<').trim_matches('>').to_lowercase());
The first problem here (your particular error is about exactly this problem) is that you're attempting to take a slice out of a temporary String returned by to_lowercase(). This temporary string is only alive during this statement, so when the statement ends, the string is deallocated, and its references would become dangling if this were not prohibited by the compiler. So, the correct way to write this assignment is as follows:
let tag = caps.at(0).unwrap().trim_matches('<').trim_matches('>').to_lowercase();
let tag = &*tag;
(or you can just keep the owned String from the first line and convert it to a slice where it is used)
However, your code is not going to work even after this change. The to_lowercase() method allocates a new String which is unrelated to html in terms of lifetime. Therefore, any slice you take out of it will necessarily have a lifetime shorter than 'a. Hence it is not possible to insert such a slice as a key into the map, because the data it points to may not be valid after this function returns (and in this particular case, it will be invalid).
It is hard to tell what the best way to fix this problem is, because it may depend on the overall architecture of your program, but the simplest way would be to create a new HashMap<String, i32> inside the function:
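use std::collections::HashMap;
use regex::Regex; // regex crate, 1.x API

fn html_parser(html: &str, tags: HashMap<&str, i32>) -> HashMap<String, i32> {
    // Copy the borrowed keys into owned Strings so the result owns all its keys.
    let mut result: HashMap<String, i32> = tags.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    // ASCII classes need their own brackets in Rust's regex syntax: [[:alpha:]].
    let re = Regex::new(r"<[[:alpha:]]+?\d*[[:space:]]*>+").unwrap();
    for caps in re.captures_iter(html) {
        if let Some(cap) = caps.get(0) {
            let tag = cap
                .as_str()
                .trim_matches('<')
                .trim_matches('>')
                .to_lowercase();
            // get() yields Option<&i32>; copied() turns it into Option<i32>.
            let count = result.get(&tag).copied().unwrap_or(0) + 1;
            result.insert(tag, count);
        }
    }
    result
}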
I've also changed the code for it to be more idiomatic (if let instead of if something.is_none(), unwrap_or() instead of mutable local variables, etc.). This is a more or less direct translation of your original code.
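When the map is built locally like this, the HashMap entry API can replace the separate get/insert pair. Below is a std-only sketch; count_tags is a hypothetical helper, and it assumes the raw tag strings have already been extracted (a plain list stands in for the regex matches so the example runs without the regex crate). Because the keys are owned Strings produced by to_lowercase(), nothing in the map borrows from the input.

```rust
use std::collections::HashMap;

/// Counts tags case-insensitively. Keys are owned `String`s created by
/// `to_lowercase()`, so the map does not borrow from the input at all.
fn count_tags<'a, I>(raw_tags: I) -> HashMap<String, i32>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = HashMap::new();
    for raw in raw_tags {
        let tag = raw.trim_matches('<').trim_matches('>').to_lowercase();
        // entry() inserts 0 the first time a tag is seen, then we increment in place.
        *counts.entry(tag).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Hypothetical stand-in for the regex matches (caps.get(0) in real code).
    let matches = vec!["<P>", "<p>", "<DIV>", "<br>"];
    let counts = count_tags(matches);
    println!("{:?}", counts.get("p"));
}
```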
As for parsing HTML with regexes, I just cannot resist providing a link to this answer. Seriously consider using a proper HTML parser instead of relying on regexes.
I would like to parse an HTML document and print each of the paragraphs to a log file as an individual entry. So far I have:
let parseTextFile (path) =
    let fileText = File.ReadAllText(path)
    fileText.Split('<p>') |> Seq.iter (fun m -> logEmail(m))
Unfortunately for me, string.Split does not do what I want here: it seems to exist only to split a string on a single-character delimiter. How can I split the file using a delimiter longer than a single character? It would also be nice to match something more than just <p>, because with only that I will have a </p> at the end of each paragraph. With a regex or some sort of more complex matcher I could pick out exactly what lies between the <p> tags.
Try using specific libraries for parsing html, for example HtmlAgilityPack.
As wmeyer said, you need to use a different overload of the .Split() method on strings. In fact, the code you posted won't even compile, because '<p>' is not a string literal; you need to use "<p>" instead (single quotes are for character literals).
Here's how to use the correct overload of .Split():
open System.IO

let parseTextFile path =
    let fileText = File.ReadAllText path
    fileText.Split ([| "<p>"; |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Seq.iter logEmail
For a quick test in F# Interactive:
> "First paragraph<p>Second paragraph.<p><p>Third paragraph.<p>"
    .Split ([| "<p>"; |], System.StringSplitOptions.RemoveEmptyEntries);;
val it : string [] =
  [|"First paragraph"; "Second paragraph."; "Third paragraph."|]
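The same "split on a multi-character delimiter, drop empty pieces" behavior can be sketched in Rust for comparison (std only; split_remove_empty is a hypothetical helper). Rust's str::split already accepts a string delimiter, so only the empty-entry filtering has to be added explicitly.

```rust
/// Splits `text` on a multi-character delimiter and drops empty pieces,
/// similar to String.Split(string[], StringSplitOptions.RemoveEmptyEntries).
fn split_remove_empty<'a>(text: &'a str, delimiter: &str) -> Vec<&'a str> {
    text.split(delimiter).filter(|piece| !piece.is_empty()).collect()
}

fn main() {
    let text = "First paragraph<p>Second paragraph.<p><p>Third paragraph.<p>";
    for paragraph in split_remove_empty(text, "<p>") {
        println!("{}", paragraph);
    }
}
```

As with the F# version, this only cuts on <p>; a trailing </p> would remain in each piece.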
Finally, as @ntr said, you're much, much better off using a library like the HTML Agility Pack for parsing HTML. Its parsers are very robust and will save you a lot of trouble.
I received this error, and I couldn't find any reasonable answer to this question, so I thought I'd write a summary of the problem.
If you run this snippet in irb:
JSON.parse( nil )
You'll see the following error:
TypeError: can't convert nil into String
I was kind of expecting the function to return nil, and not a TypeError. If you convert all input using to_s, then you'll see the octet error:
JSON::ParserError: A JSON text must at least contain two octets!
That's all well and good. If you don't know what an octet is, read this post for a summary and solution:
What is a JSON octet and why are two required?
Solution
The variable you're passing in is an empty string. Don't attempt to use an empty string in the JSON.parse method.
Question
So, now that I know the cause of the error, what pattern should I use to handle this? I'm a bit loath to monkey patch the JSON library to allow nil values. Any suggestions would be greatly appreciated.
parsed = json && json.length >= 2 ? JSON.parse(json) : nil
But really the library should be able to handle this case and return nil. Web browsers with built-in JSON support seem to work just like you expect after all.
Or, to do it with an only slightly intrusive mini patch:
module JSON
  def self.parse_nil(json)
    JSON.parse(json) if json && json.length >= 2
  end
end
parsed = JSON.parse_nil(json)
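With Rails' ActiveSupport, Object#presence returns nil for nil and for blank strings, so either of these works:
data.presence && JSON.parse(data)
JSON.parse(data.presence || '{}')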
According to json.org
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
So the smallest top-level JSON texts are {} and [], which are two octets (bytes) long.
IMO, the best solution would be to make sure the argument to JSON.parse is either a stringified object or a stringified array. :-)
hash = JSON.parse(json) rescue {}
array = JSON.parse(json) rescue []
string = JSON.parse(json) rescue ''