Function which removes html color from a string with sscanf - html

I have a dilemma: how can I write a format that removes this kind of color code from my string (e.g. {dd2e22}) using sscanf, which is the only function I want to use? The string provided will be some arbitrary text, such as:
Te{dd2e22}xt is {3f53ec}here
The format I tried:
sscanf(buf,"%[^\{[0-9a-fA-F]{6,8}\}]s",output);
This isn't working; the result is only the first character, "T".

Try using the format specifier:
"%*6X"
Analysis:
% -- starts a format specifier.
* -- tells scanf not to assign field to variable.
6X -- says that the field is at most 6 hex digits wide.
See scanf format specifier

result are only first character "T".
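Well, a scanset ends at the first ], so the format actually contains the inverted set [^\{[0-9a-fA-F], which excludes '{', '[' and every hex digit. The next character is 'e', which is a hex digit and thus doesn't match the inverted set specified by '^', so scanning stops after "T".
This task can be achieved with a regular expression. The C++ standard library provides appropriate tools in the <regex> header.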
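To illustrate the regular-expression route, here is a minimal sketch. It is written in Python rather than C or C++ purely for brevity, and the names COLOR_TAG and strip_color_tags are made up for this example. The pattern assumes a color tag is a '{', 6 to 8 hex digits, and a '}', as in the question's attempt.

```python
import re

# Matches one color tag: '{', then 6 to 8 hex digits, then '}'.
# (Hypothetical name, not from the original post.)
COLOR_TAG = re.compile(r"\{[0-9a-fA-F]{6,8}\}")


def strip_color_tags(text: str) -> str:
    """Return text with every {rrggbb}-style tag removed."""
    return COLOR_TAG.sub("", text)


print(strip_color_tags("Te{dd2e22}xt is {3f53ec}here"))  # Text is here
```

The same pattern should carry over to std::regex_replace in C++, but that is not tested here.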


String.IndexOf() returns unexpected value - cannot extract substring between two search strings

I'm writing a script to manipulate some proper names in a web story, to help my reading tool pronounce them correctly.
I get the content of a webpage via
$webpage = (Invoke-WebRequest -URI 'https://wanderinginn.com/2018/03/20/4-20-e/').Content
This $webpage should be of type String.
Now
$webpage.IndexOf('<div class="entry-content">')
returns correct value, yet
$webpage.IndexOf("Previous Chapter")
returns an unexpected value, and I need some explanation why, or how I can find the error myself.
In theory, the script should cut out the "body" of the page, run it through a list of proper nouns I want to replace, and push the result into an .htm file.
It all works, except for the value of IndexOf("Prev...").
Edit:
After Invoke-WebRequest I can run
Set-Clipboard $webpage
and paste this into Notepad++, where I can find both 'div class="entry-content"' and 'Previous Chapter'.
If I do something like
Set-Clipboard $webpage.substring(
$webpage.IndexOf('<div class="entry-content">'),
$webpage.IndexOf('Previous Chapter')
)
I would expect PowerShell to correctly determine the first instance of each of those strings and cut between them. Therefore my clipboard should now have my desired content, yet the string goes further than the first occurrence.
tl;dr
You had a misconception about how the String.Substring() method works: the second argument must be the length of the substring to extract, not the end index (character position) - see below.
As an alternative, you can use a more concise (albeit more complex) regex operation with -replace to extract the substring of interest in a single operation - see below.
Overall, it's better to use an HTML parser to extract the desired information, because string processing is brittle (HTML allows variations in whitespace, quoting style, ...).
As Lee_Dailey points out, you had a misconception about how the String.Substring() method works: its arguments are:
a starting index (0-based character position),
from which a substring of a given length should be returned.
Instead, you tried to pass another index as the length argument.
To fix this, you must subtract the lower index from the higher one, so as to obtain the length of the substring you want to extract:
A simplified example:
# Sample input from which to extract the substring
# '>>this up to here'
# or, better,
# 'this up to here'.
$webpage = 'Return from >>this up to here<<'
# WRONG (your attempt):
# *index* of 2nd substring is mistakenly used as the *length* of the
# substring to extract, which in this case even *breaks*, because a length
# that exceeds the bounds of the string is specified.
$webpage.Substring(
$webpage.IndexOf('>>'),
$webpage.IndexOf('<<')
)
# OK, extracts '>>this up to here'
# The difference between the two indices is the correct length
# of the substring to extract.
$webpage.Substring(
($firstIndex = $webpage.IndexOf('>>')),
$webpage.IndexOf('<<') - $firstIndex
)
# BETTER, extracts 'this up to here'
$startDelimiter = '>>'
$endDelimiter = '<<'
$webpage.Substring(
($firstIndex = $webpage.IndexOf($startDelimiter) + $startDelimiter.Length),
$webpage.IndexOf($endDelimiter) - $firstIndex
)
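For comparison, the same index arithmetic can be sketched in Python. Note the difference in convention: a Python slice takes an end index, whereas .NET's Substring() takes a length, which is exactly the distinction that caused the confusion above. The function name extract_between is made up for this example, and unlike the PowerShell code it searches for the end delimiter only after the start delimiter.

```python
def extract_between(text: str, start_delim: str, end_delim: str) -> str:
    """Return the text between the first start_delim and the next end_delim."""
    start = text.find(start_delim)
    if start == -1:
        raise ValueError(f"start delimiter {start_delim!r} not found")

    # Skip past the opening delimiter so it is not part of the result.
    begin = start + len(start_delim)

    # Search for the closing delimiter only after the opening one.
    end = text.find(end_delim, begin)
    if end == -1:
        raise ValueError(f"end delimiter {end_delim!r} not found")

    # The equivalent .NET call would pass the length, i.e. end - begin.
    return text[begin:end]


print(extract_between("Return from >>this up to here<<", ">>", "<<"))
# this up to here
```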
General caveats re .Substring():
In the following cases this .NET method throws an exception, which PowerShell surfaces as a statement-terminating error; that is, by default the statement itself is terminated, but execution continues:
If you specify an index that is outside the bounds of the string (a 0-based character position less than 0 or one greater than the length of the string):
'abc'.Substring(4) # ERROR "startIndex cannot be larger than length of string"
If you specify a length whose endpoint would fall outside the bounds of the string (if the index plus the length yields an index that is greater than the length of the string).
'abc'.Substring(1, 3) # ERROR "Index and length must refer to a location within the string"
That said, you could use a single regex (regular expression) to extract the substring of interest, via the -replace operator:
$webpage = 'Return from >>this up to here<<'
# Outputs 'this up to here'
$webpage -replace '^.*?>>(.*?)<<.*', '$1'
The key is to have the regex match the entire string and extract the substring of interest via a capture group ((...)) whose value ($1) can then be used as the replacement string, effectively returning just that.
For more information about -replace, see this answer.
Note: In your specific case an additional tweak is needed, because you're dealing with a multiline string:
$webpage -replace '(?s).*?<div class="entry-content">(.*?)Previous Chapter.*', '$1'
Inline option ((?...)) s ensures that metacharacter . also matches newline characters (so that .* matches across lines), which it doesn't by default.
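The effect of the s option can be seen in a small Python sketch, where the re.DOTALL flag plays the same role as the inline (?s). The sample page text and the name extract_body are invented for illustration. Because re.search() returns just the match, the surrounding .* parts of the -replace pattern are not needed here.

```python
import re

# Invented sample standing in for the downloaded page.
page = (
    "header\n"
    '<div class="entry-content">\n'
    "first line\n"
    "second line\n"
    "Previous Chapter\n"
    "footer"
)

PATTERN = r'<div class="entry-content">(.*?)Previous Chapter'


def extract_body(text: str, dot_matches_newline: bool = True):
    """Return the captured text between the two markers, or None."""
    flags = re.DOTALL if dot_matches_newline else 0
    match = re.search(PATTERN, text, flags)
    return match.group(1) if match else None


print(repr(extract_body(page)))         # '\nfirst line\nsecond line\n'
print(extract_body(page, False))        # None: '.' stops at the newline
```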
Note that you may have to apply escaping to the search strings to embed in the regex, if they happen to contain regex metacharacters (characters with special meaning in the context of a regex):
With embedded literal strings, \-escape characters as needed; e.g., escape .txt as \.txt
If a string to embed comes from a variable, apply [regex]::Escape() to its value first; e.g.:
$var = '.txt'
# [regex]::Escape() yields '\.txt', which ensures
# that '.txt' doesn't also match '_txt'
'a_txt a.txt' -replace ('a' + [regex]::Escape($var)), 'a.csv'
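The same pitfall and fix can be sketched in Python, where re.escape() is the counterpart of [regex]::Escape(). The helper name replace_suffix is hypothetical.

```python
import re


def replace_suffix(text: str, suffix: str, escape: bool = True) -> str:
    """Replace 'a' + suffix with 'a.csv', optionally escaping the suffix."""
    needle = re.escape(suffix) if escape else suffix
    return re.sub("a" + needle, "a.csv", text)


# Escaped: the '.' is literal, so only 'a.txt' is replaced.
print(replace_suffix("a_txt a.txt", ".txt"))                # a_txt a.csv

# Unescaped: '.' matches any character, so 'a_txt' is replaced too.
print(replace_suffix("a_txt a.txt", ".txt", escape=False))  # a.csv a.csv
```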

HTML pattern in character#2354 format

How can I make a pattern which requires the user to type in the format
string#number
The string can contain any character other than #
e.g. robert#2345
bob34#7805
linda_2#3444
!3eve3!#5545
This is what I have right now
pattern="[]+#[0-9]$"
Thanks for the help in advance.
string#number
The string can contain any character other than #
Based on your description of the problem, you can use the following pattern:
^[^#]+#\d+$
However, this allows any string before the #. You might want to be a bit more restrictive - e.g. no whitespace:
^[^#\s]+#\d+$
Or perhaps only some "whitelisted" characters, e.g.:
^[a-zA-Z0-9_!:.,-]+#\d+$
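A quick way to check the first pattern against the examples from the question is a small Python sketch. The function name is_valid_tag is made up here. re.fullmatch() is used because, as far as I know, the HTML pattern attribute must also match the entire value, so the ^ and $ anchors are implied in both cases.

```python
import re

# Anything except '#', one or more times, then '#', then one or more digits.
TAG_PATTERN = re.compile(r"[^#]+#\d+")


def is_valid_tag(value: str) -> bool:
    """True if value has the form string#number with no '#' in the string."""
    return TAG_PATTERN.fullmatch(value) is not None


for sample in ["robert#2345", "bob34#7805", "linda_2#3444", "!3eve3!#5545"]:
    print(sample, is_valid_tag(sample))   # all True

print(is_valid_tag("robert2345"))   # False: no '#'
print(is_valid_tag("#2345"))        # False: empty string part
print(is_valid_tag("a#b#12"))       # False: 'b' is not a digit
```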

Regular expression to exclude number from list

I would like to use regex pattern with <input> form element.
In this pattern I would like to put a range of numbers that is not allowed as input.
For instance, I would have a list of numbers {1,4,10}, and the allowed input is any number except these.
I've managed to create this regex:
[^(1|4|10)]
But that also excludes everything containing 0, 1 or 4, such as 10.
If negative lookaheads are allowed, then you can try the following regex:
^(?!(?:1|4|10)$)\d+$
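The behaviour of the lookahead can be checked with a short Python sketch. The name is_allowed is hypothetical, and Python's re module is used only as a convenient way to try the pattern.

```python
import re

# The lookahead rejects the input only if the WHOLE value is 1, 4 or 10;
# any other run of digits is accepted.
ALLOWED = re.compile(r"^(?!(?:1|4|10)$)\d+$")


def is_allowed(value: str) -> bool:
    """True for any number except the excluded 1, 4 and 10."""
    return ALLOWED.match(value) is not None


print([n for n in ["1", "4", "10"] if is_allowed(n)])          # []
print([n for n in ["2", "11", "40", "100"] if is_allowed(n)])  # ['2', '11', '40', '100']
```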
You don't need to use a character class here (i.e. the [ ]): a character class matches a single character from the listed set, whereas | on its own already means 'this alternative or that alternative'.
Instead, use:
^(1|4|10)$
The ^ matches the start of a string, and the $ matches the end of the string, so this will only match 1 (with nothing else), 4 (with nothing else) or 10 (with nothing else).
By the way, to test a regex, you can use an online tester such as https://regex101.com/.

problems using replaceText for special characters: [ ]

I want to replace "\cite{foo123a}" with "[1]" and back again. So far I was able to replace text with the following command:
body.replaceText('.cite{foo}', '[1]');
but I did not manage to use
body.replaceText('\cite{foo}', '[1]');
body.replaceText('\\cite{foo}', '[1]');
Why?
I cannot get the back conversion to work at all:
body.replaceText('[1]', '\\cite{foo}');
This replaces only the "1", not the [ ], which means the [] are interpreted as a regex character set. Escaping them does not help:
body.replaceText('\[1\]', '\\cite{foo}');//no effect, still a char set
body.replaceText('/\[1\]/', '\\cite{foo}');//no matches
The documentation states
A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.
Can I find a full description of what is supported and what not somewhere?
I'm not familiar with Google Apps Script, but this looks like ordinary regular expression troubles.
Your second conversion is not working because the string literal '\[1\]' is just the same as '[1]'. You want to quote the text \[1\] as a string literal, which means '\\[1\\]'. Slashes inside a string literal have no special meaning, so with '/\[1\]/' you have written a pattern which matches the text /1/.
Your first conversion is not working because {...} denotes a quantifier, not literal braces, so you need \\\\cite\\{foo\\}. (The four backslashes are because to match a literal \ in a regular expression is \\, and to make that a string literal it is \\\\ — two escaped backslashes.)
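The two layers of escaping can be seen in isolation in a Python sketch. This is not Google Apps Script and says nothing about which regex features replaceText supports; it only illustrates the string-literal layer versus the regex layer. The function names are made up, and a function is used as the replacement because Python's re.sub() also interprets backslashes in replacement strings.

```python
import re

# Regex layer: \\ is a literal backslash, \{ and \} are literal braces.
# String-literal layer: each of those backslashes is doubled once more.
CITE_PATTERN = "\\\\cite\\{foo\\}"
assert CITE_PATTERN == r"\\cite\{foo\}"  # same pattern as a raw string


def cite_to_number(text: str) -> str:
    return re.sub(CITE_PATTERN, "[1]", text)


def number_to_cite(text: str) -> str:
    # \[ and \] make the brackets literal instead of a character class.
    return re.sub(r"\[1\]", lambda m: r"\cite{foo}", text)


source = r"see \cite{foo} here"
print(cite_to_number(source))                  # see [1] here
print(number_to_cite(cite_to_number(source)))  # see \cite{foo} here

# Unescaped, [1] is a character class matching just the character 1.
print(re.sub("[1]", "X", "[1]"))               # [X]
```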

iPhone: Decode characters like \U05de

I used SBJsonParser to parse a JSON string.
Inside, instead of Hebrew chars, I got a string full of chars in a form like \U05de.
What would be the best way to decode these back to Hebrew chars,
so I can put them on controls like UIFieldView?
Eventually I ran a loop iterating over the string, looking for the chars \u.
In the loop, when such a substring was detected, I took a range of 6 characters starting at that index,
giving me a substring, for example \u05de, that needed to be fixed.
On this string I ran the method [str JSONValue], which gave me the correct char; then I simply replaced all occurrences of \u05de (for example) with the corrected char.
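The loop described above amounts to finding each backslash-u sequence followed by four hex digits and replacing it with the character it names. A minimal sketch of that idea, in Python rather than Objective-C and with a made-up function name, might look like this. It assumes every escape has exactly four hex digits and does not combine surrogate pairs.

```python
import re

# A backslash, 'u' or 'U', then exactly four hex digits (captured).
UNICODE_ESCAPE = re.compile(r"\\[uU]([0-9a-fA-F]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Replace each \\uXXXX sequence with the character it encodes."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = r"\u05de\U05D4 ok"   # raw string: the backslashes are literal
print(decode_unicode_escapes(raw) == "\u05de\u05d4 ok")  # True
```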