Remove All Unnecessary Whitespaces from JSON String with regex (in AutoHotKey) - json

How do I remove ALL unnecessary white-spaces from a JSON String (in AutoHotkey)?
I assume that I need to use regExReplace with some clever regex in order to NOT touch the white-spaces that are part of the values.
A simple example would be:
Before:
g_config :=
{
FuzzySearch:{
enable: true,
keysMAXperEntry : 6,
o = {
keyString: "Hello World"
}
}
} ;
After: g_config:={FuzzySearch:{enable:true,keysMAXperEntry:6,o={keyString:"Hello World"}}};
Basically, I'm looking for a way to minify and pack the string as tight as possible without changing any data.
First I tried searching for [\n]+ and replacing it with "" (nothing). I developed that here:
https://www.regextester.com/?fam=106988
and the same here:
https://regex101.com/r/dZnHaZ/1
Best try: I then reused the approach from
https://www.codeproject.com/Questions/1230349/Remove-extra-space-in-json-string
and adapted it to:
https://regex101.com/r/EYFHy9/4
Problem: this regex also removes the spaces inside a value.
Is there a better way to do this?

Along the lines of what @Aaron mentioned, here is some slow-as-hell AHK code that will look at each individual character and remove it if it's a space or line break, except between quotes. It starts at your cursor and ends once there is nothing left to copy (or rather, one second after).
;;; For speed? Maybe?? ...It's still slow :(
ListLines Off
SetBatchLines , -1
SetKeyDelay , -1 , -1
SetMouseDelay , -1
SetDefaultMouseSpeed , 0
SetWinDelay , -1
SetControlDelay , -1
SendMode , Input
f1::
Loop
{
clipboard := ""
Send , +{right}^c
ClipWait , 1
If ErrorLevel
Break
bT := ( clipboard = """" ) ? !bT : bT
Send , % ( !bT && ( clipboard = "`r`n" || clipboard = A_Space || clipboard = A_Tab )) ? "{del}" : "{right}"
}
Return
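The same quote-toggling idea can also be applied to the whole string in memory instead of going through the clipboard one character at a time. Below is a minimal sketch in Python (not AutoHotkey). The function name strip_outside_quotes is made up for illustration, and unlike the script above it assumes that a backslash-escaped quote inside a value should not end the string.

```python
def strip_outside_quotes(text: str) -> str:
    """Drop every whitespace character that is not inside a double-quoted string."""
    out = []
    in_string = False   # True while we are between two double quotes
    escaped = False     # True right after a backslash inside a string
    for ch in text:
        if in_string:
            out.append(ch)          # keep everything inside a value verbatim
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)          # outside strings, keep only non-whitespace
    return "".join(out)


sample = '''g_config :=
{
FuzzySearch:{
enable: true,
keysMAXperEntry : 6,
o = {
keyString: "Hello World"
}
}
} ;'''

print(strip_outside_quotes(sample))
```

A single pass like this avoids the regex problem from the question, because the scanner always knows whether it is inside a value.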

Related

Replace HTML escape sequence with its single character equivalent in C

My program is loading some news articles from the web. I then have an array of HTML documents representing these articles. I need to parse them and show only the relevant content on the screen. That includes converting all HTML escape sequences into readable symbols. So I need some function similar to unescape in JavaScript.
I know there are libraries in C to parse HTML.
But is there some easy way to convert HTML escape sequences like &amp; or &#33; to just & and !?
This is something that you typically wouldn't use C for. I would have used Python. Here are two questions that could be a good start:
What's the easiest way to escape HTML in Python?
How do you call Python code from C code?
But apart from that, the solution is to write a proper parser. There are lots of resources out there on that topic, but basically you could do something like this:
parseFile()
    while not EOF
        ch = readNextCharacter()
        if ch == '\'
            readNextCharacter()
        elseif ch == '&'
            readEscapeSequence()
        else
            output += ch

readEscapeSequence()
    seq = ""
    ch = readNextCharacter()
    while ch != ';'
        seq += ch
        ch = readNextCharacter()
    replace = lookupEscape(seq)
    output += replace
Note that this is only pseudo code to get you started
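Since Python was suggested above, here is a rough sketch of how the pseudo code might look there. The table ESCAPES and the function unescape_simple are hypothetical names with only a handful of entries, and unknown sequences are assumed to be copied through unchanged. Python's standard library already ships html.unescape, which covers the full set of named and numeric entities.

```python
import html

# Hypothetical, deliberately tiny lookup table; a real one would be much larger.
ESCAPES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}


def unescape_simple(text: str) -> str:
    """Replace '&name;' sequences found in ESCAPES; copy everything else as-is."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "&":
            end = text.find(";", i + 1)
            name = text[i + 1:end] if end != -1 else None
            if name in ESCAPES:
                out.append(ESCAPES[name])
                i = end + 1          # skip past the whole escape sequence
                continue
        out.append(ch)               # ordinary character or unknown sequence
        i += 1
    return "".join(out)


print(unescape_simple("a &amp; b &lt;c&gt;"))
# The standard-library function handles numeric references as well:
print(html.unescape("walk&#33; &amp; chat"))
```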
Just wrote and tested a version that does this (crudely). Didn't take long.
You'll want something like this:
typedef struct {
    int gotLen; // save myriad calls to strlen()
    char *got;
    char *want;
} trx_t;

trx_t lut[] = {
    { 5, "&amp;", "&" },
    { 5, "&#33;", "!" },
    { 8, "&dagger;", "*" },
};
const int nLut = sizeof lut/sizeof lut[0];
And then a loop with two pointers that copies characters within the same buf, sniffing for the '&' that triggers a search of the replacement table. If found, copy the replacement string to the destination and advance the source pointer to skip past the HTML token. If not found, then the LUT may need additional tokens.
Here's a beginning...
void replace( char *buf ) {
    char *pd = buf, *ps = buf;
    while( *ps )
        if( *ps != '&' )
            *pd++ = *ps++;
        else {
            // EDIT: Credit @Craig Estey
            if( ps[1] == '#' ) {
                if( ps[2] == 'x' || ps[2] == 'X' ) {
                    /* decode hex value and save as char(s) */
                } else {
                    /* decode decimal value and save as char(s) */
                }
                /* advance pointers and continue */
            }
            for( int i = 0; i < nLut; i++ )
                /* not giving it all away */
            /* handle "found" and "not found" in LUT */
        }
    *pd = '\0';
}
This was the test program
int main() {
    char str[] = "The fox &amp; hound&dagger; went for a walk&#33; &amp; chat.";
    puts( str );
    replace( str );
    puts( str );
    return 0;
}
and this was the output
The fox &amp; hound&dagger; went for a walk&#33; &amp; chat.
The fox & hound* went for a walk! & chat.
The "project" is to write the interesting bit of the code. It's not difficult.
Caveat: This only works when the replacement is no longer than the escape sequence it replaces. Otherwise you need two buffers.

Need guidance and tips on oxmlelement

I am currently manipulating a Word document using python-docx, and I would like to standardize every table/figure caption so that it either matches its corresponding heading number or is simply incremented, depending on the user's choice.
I am currently stuck on the field codes, as I also need the fields to update automatically. Below is a sample field code taken from the document, which is what I would like to achieve:
Table { STYLEREF 1 \s }-{ SEQ Table \* ARABIC \s 1} Random Heading Name
I based my code on this GitHub link:
from docx.oxml.shared import OxmlElement
from docx.oxml.ns import qn

# document is an existing docx.Document instance
paragraph = document.add_paragraph('Table ', style='Caption')
run = paragraph.add_run()
r = run._r
# first field: { STYLEREF 1 \s }
fldChar = OxmlElement('w:fldChar')
fldChar.set(qn('w:fldCharType'), 'begin')
r.append(fldChar)
instrText = OxmlElement('w:instrText')
instrText.text = r' STYLEREF 1 \s '  # raw string keeps the backslash
r.append(instrText)
fldChar = OxmlElement('w:fldChar')
fldChar.set(qn('w:fldCharType'), 'end')
r.append(fldChar)
# attempt to put something between the two fields
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
r.append(instrText)
# second field: { SEQ Table \* ARABIC \s 1 }
fldChar = OxmlElement('w:fldChar')
fldChar.set(qn('w:fldCharType'), 'begin')
r.append(fldChar)
instrText = OxmlElement('w:instrText')
instrText.text = r' SEQ Table \* ARABIC \s 1'
r.append(instrText)
fldChar = OxmlElement('w:fldChar')
fldChar.set(qn('w:fldCharType'), 'end')
r.append(fldChar)
I have tried using preserve and separate, but I can't seem to get the "-" that I need to sit between the two field codes.
So right now, my end product is:
Table { STYLEREF 1 \s }{ SEQ Table \* ARABIC \s 1} Random Heading Name
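One possible direction, sketched here with only the standard library so that the resulting XML structure is visible: give each field its own begin / instrText / separate / cached result / end sequence, and put the "-" in an ordinary w:t text run between the two fields rather than in a w:instrText. The helper names (add_field, build_caption) and the cached value "1" are assumptions for illustration, not part of python-docx.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def w(tag: str) -> str:
    """Qualified name in the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def add_fld_char(p, kind):
    run = ET.SubElement(p, w("r"))
    ET.SubElement(run, w("fldChar"), {w("fldCharType"): kind})


def add_text(p, text):
    run = ET.SubElement(p, w("r"))
    t = ET.SubElement(run, w("t"), {XML_SPACE: "preserve"})
    t.text = text


def add_field(p, instruction, cached="1"):
    """Complex field: begin, instruction, separate, cached result, end."""
    add_fld_char(p, "begin")
    run = ET.SubElement(p, w("r"))
    instr = ET.SubElement(run, w("instrText"), {XML_SPACE: "preserve"})
    instr.text = instruction
    add_fld_char(p, "separate")
    add_text(p, cached)          # placeholder until Word updates the field
    add_fld_char(p, "end")


def build_caption(heading):
    p = ET.Element(w("p"))
    add_text(p, "Table ")
    add_field(p, r" STYLEREF 1 \s ")
    add_text(p, "-")             # literal text between the two fields
    add_field(p, r" SEQ Table \* ARABIC \s 1 ")
    add_text(p, " " + heading)
    return p


caption = build_caption("Random Heading Name")
print(ET.tostring(caption, encoding="unicode"))
```

In python-docx terms this would roughly correspond to building the first field in one run, calling paragraph.add_run('-'), and then building the second field in a further run; that mapping is an untested assumption.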

SSRS Insert Space Between Numeric and Alpha Characters

I am having an issue where a field is stored in our database as '##ABC' with no space between the number and letters. The number can be anything from 1-100 and the letters can be any combination, so no consistency of beginning letter or numeric length.
I am trying to find a way to insert a space between the number and letters.
For example, '1DRM' would transform to '1 DRM'
'35PLT' would transform to '35 PLT'
Does anyone know of a way to accomplish this?
You can use regular expressions like the one below (assuming your pattern is digits-characters)
= System.Text.RegularExpressions.Regex.Replace( Fields!txt.Value, "(\d)(\D)", "$1 $2")
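To see what that pattern does outside of SSRS, here is the same substitution as a small Python sketch (the function name space_number_letter is made up for illustration). Note that (\D) matches any non-digit, so a value that already contains a space after the number would get a second one. Replacing \D with [A-Za-z] would avoid that, if letters are all that can follow.

```python
import re


def space_number_letter(value: str) -> str:
    """Insert a space wherever a digit is directly followed by a non-digit."""
    return re.sub(r"(\d)(\D)", r"\1 \2", value)


for sample in ("1DRM", "35PLT"):
    print(sample, "->", space_number_letter(sample))
```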
Unfortunately, there's no built in function to do this.
Fortunately, Visual Studio lets you create functions to help with things like this.
You can add Visual Basic custom code by opening the Report Properties and going to the Code tab.
You would just need to write some code to go through some text input character by character. If it finds a number and a letter in the next character, add a space.
Here's what I wrote in a few minutes that seems to work:
Function SpaceNumberLetter(ByVal Text1 AS String) AS String
    DIM F AS INTEGER
    IF LEN(Text1) < 2 THEN GOTO EndFunction
    F = 1
CheckCharacter:
    IF ASC(MID(Text1, F, 1)) >= 48 AND ASC(MID(Text1, F, 1)) <=57 AND ASC(MID(Text1, F + 1, 1)) >= 65 AND ASC(MID(Text1, F + 1, 1)) <=90 THEN Text1 = LEFT(Text1, F) + " " + MID(Text1, F+1, LEN(Text1))
    F = F + 1
    IF F < LEN(Text1) THEN GOTO CheckCharacter
EndFunction:
    SpaceNumberLetter = Text1
End Function
Then you call the function from your text box expression:
=CODE.SpaceNumberLetter("56EF78GH12AB34CD")
Result: 56 EF78 GH12 AB34 CD
I used literal text to test, but you'd pass your field instead.

AutoHotkey: Change clipboard format from html to text

How can I switch back to rich text after the several regex replacements at the end of my script (so that I can paste rich text and not HTML)?
This script copies the selected text, retrieves its HTML, and reformats it (removes/changes some HTML tags). So when I paste the clipboard at the end, I get raw HTML. How can I change the clipboard format back so that I get formatted text when I paste? I would like to be able to paste the clipboard into MS Word, for example, instead of only into an HTML editor.
What the current code does: text format → html format
What I am looking for: text format → html format → text format
This may seem strange, but I need access to the HTML tags in order to format them.
!^+k:: ; Alt(!) Ctrl(^) Shift(+) and k
clipboard =
Send, {CTRLDOWN}c{CTRLUP}{ESC}
ClipWait
; Change clipboard content from text to html with tag
ClipboardGet_HTML( byref Data ) { ; http://www.autohotkey.com/forum/viewtopic.php?p=392624#392624
    If CBID := DllCall( "RegisterClipboardFormat", Str,"HTML Format", UInt )
        If DllCall( "IsClipboardFormatAvailable", UInt,CBID ) <> 0
            If DllCall( "OpenClipboard", UInt,0 ) <> 0
                If hData := DllCall( "GetClipboardData", UInt,CBID, UInt )
                    DataL := DllCall( "GlobalSize", UInt,hData, UInt )
                    , pData := DllCall( "GlobalLock", UInt,hData, UInt )
                    , Data := StrGet( pData, dataL, "UTF-8" )
                    , DllCall( "GlobalUnlock", UInt,hData )
    DllCall( "CloseClipboard" )
    Return dataL ? dataL : 0
}
If ClipboardGet_HTML( Data ){
    ; MsgBox, % Data
    clipboard = %Data%
    ; Parse HTML to remove tag attributes. Because I want to apply a style to <span id="textmark..., I first mark those spans with a unique string so that I can restyle them after the parsing
    HHSpanid := RegExReplace(clipboard, "<span id=""textmark", "TO2BE2REPLACED$0")
    HHSpanidclass := RegExReplace(HHSpanid, "<span class=""textmark", "TO2BE2REPLACED$0")
    Replacehtmlmarker := RegexReplace(HHSpanidclass, "<(p|span|div|img|h1|h2|h3|h4|h5|h6|h7|a|label|blockquote|form|svg|path|input|header|sup|br|iframe|button|time|nav)\K [^>]+(?=>)")
    RemoveImg := RegExReplace(Replacehtmlmarker, "<img>", "")
    ReplaceHHSpan := RegExReplace(RemoveImg, "TO2BE2REPLACED<span>", "<span style=""color: black;background-color: #ffff00;"">")
    clipboard = %ReplaceHHSpan%
    ClipWait
    return
}
Else SoundBeep
Thanks a lot for your help!
Check out the WinClip class.
Specifically SetHTML( html, source = "" ) and Paste( plainText = "" ).

Eliminate html tags from values

I'm trying to strip HTML tags from a value displayed in an SSRS report.
My solution so far:
=(new System.Text.RegularExpressions.Regex("<[^>]*>")).Replace((new System.Text.RegularExpressions.Regex("< STYLE >. *< /STYLE >")).Replace(Fields!activitypointer1_description.Value,""),"")
The problem is that the second expression ("< STYLE >. *< /STYLE >" without the spaces) which should be executed first doesn't do anything. The result contains the styles from the html without the tags attached.
I'm out of ideas.
You need to add RegexOptions.Singleline, because by default the dot (.) in a regular expression does not match newline characters, so .* stops at the end of the line. Here's an example of a console program you can run to verify it:
string description = @"<b>this is some
text</b><style>and
this is style</style>";
Console.WriteLine(
    (new Regex( "<[^>]*>", RegexOptions.IgnoreCase | RegexOptions.Singleline ))
        .Replace(
            (new Regex( "<STYLE>.*</STYLE>", RegexOptions.IgnoreCase | RegexOptions.Singleline ))
                .Replace( description
                    , "" )
            , "" )
);
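The effect of the single-line option can be seen in other regex engines too. As a rough illustration in Python, where the corresponding flag is re.DOTALL, the sketch below shows that the style block only disappears once the dot is allowed to match newlines. The function name strip_html is made up, and the non-greedy .*? is an assumption added so that several style blocks would not be merged into one match.

```python
import re

html_text = "<b>this is some\ntext</b><style>and\nthis is style</style>"


def strip_html(text: str) -> str:
    """Remove <style>...</style> blocks first, then any remaining tags."""
    no_style = re.sub(r"<style>.*?</style>", "", text,
                      flags=re.IGNORECASE | re.DOTALL)
    return re.sub(r"<[^>]*>", "", no_style)


# Without DOTALL the dot cannot cross the line break, so nothing is removed.
unchanged = re.sub(r"<style>.*?</style>", "", html_text, flags=re.IGNORECASE)
print(unchanged == html_text)
print(strip_html(html_text))
```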