AutoHotkey: Change clipboard format from HTML to text

How can I switch back to rich text after the several regex replacements at the end of my script (so that I can paste rich text and not HTML)?
This script copies the selected text, retrieves its HTML, and formats it (removes or changes some HTML tags). So when I paste the clipboard at the end, I get HTML source. How can I change the clipboard format back so that I get formatted text when I paste? I would like to be able to paste the clipboard into MS Word, for example, instead of only into an HTML editor.
The current code: text format → html format
The code I am looking for: text format → html format → text format
The code I am looking for may seem strange, but I need access to the HTML tags in order to format them.
!^+k:: ; Alt(!) Ctrl(^) Shift(+) and k
    clipboard =
    Send, {CTRLDOWN}c{CTRLUP}{ESC}
    ClipWait
    ; Change clipboard content from text to HTML with tags
    ClipboardGet_HTML( byref Data ) { ; http://www.autohotkey.com/forum/viewtopic.php?p=392624#392624
        If CBID := DllCall( "RegisterClipboardFormat", Str,"HTML Format", UInt )
            If DllCall( "IsClipboardFormatAvailable", UInt,CBID ) <> 0
                If DllCall( "OpenClipboard", UInt,0 ) <> 0
                    If hData := DllCall( "GetClipboardData", UInt,CBID, UInt )
                        DataL := DllCall( "GlobalSize", UInt,hData, UInt )
                        , pData := DllCall( "GlobalLock", UInt,hData, UInt )
                        , Data := StrGet( pData, dataL, "UTF-8" )
                        , DllCall( "GlobalUnlock", UInt,hData )
        DllCall( "CloseClipboard" )
        Return dataL ? dataL : 0
    }
    If ClipboardGet_HTML( Data ){
        ; MsgBox, % Data
        clipboard = %Data%
        ; Parse the HTML to remove tag attributes. Because I want to apply a style to <span id="textmark..., I first prefix those tags with a unique string that lets me restyle them after the parsing.
        HHSpanid := RegExReplace(clipboard, "<span id=""textmark", "TO2BE2REPLACED$0")
        HHSpanidclass := RegExReplace(HHSpanid, "<span class=""textmark", "TO2BE2REPLACED$0")
        Replacehtmlmarker := RegexReplace(HHSpanidclass, "<(p|span|div|img|h1|h2|h3|h4|h5|h6|h7|a|label|blockquote|form|svg|path|input|header|sup|br|iframe|button|time|nav)\K [^>]+(?=>)")
        RemoveImg := RegExReplace(Replacehtmlmarker, "<img>", "")
        ReplaceHHSpan := RegExReplace(RemoveImg, "TO2BE2REPLACED<span>", "<span style=""color: black;background-color: #ffff00;"">")
        clipboard = %ReplaceHHSpan%
        ClipWait
        return
    }
    Else SoundBeep
Thanks a lot for your help!

Check out the WinClip class.
Specifically SetHTML( html, source = "" ) and Paste( plainText = "" ).

Related

Replace HTML escape sequence with its single character equivalent in C

My program is loading some news article from the web. I then have an array of html documents representing these articles. I need to parse them and show on the screen only the relevant content. That includes converting all html escape sequences into readable symbols. So I need some function which is similar to unEscape in JavaScript.
I know there are libraries in C to parse html.
But is there some easy way to convert html escape sequences like &amp; or &#33; to just & and !?
This is something that you typically wouldn't use C for. I would have used Python. Here are two questions that could be a good start:
What's the easiest way to escape HTML in Python?
How do you call Python code from C code?
But apart from that, the solution is to write a proper parser. There are lots of resources out there on that topic, but basically you could do something like this:
parseFile()
    while not EOF
        ch = readNextCharacter()
        if ch == '\'
            readNextCharacter()
        elseif ch == '&'
            readEscapeSequence()
        else
            output += ch
readEscapeSequence()
    seq = ""
    ch = readNextCharacter();
    while ch != ';'
        seq += ch
        ch = readNextCharacter();
    replace = lookupEscape(seq)
    output += replace
Note that this is only pseudo code to get you started
Just wrote and tested a version that does this (crudely). Didn't take long.
You'll want something like this:
typedef struct {
    int gotLen; // save myriad calls to strlen()
    const char *got;
    const char *want;
} trx_t;
trx_t lut[] = {
    { 5, "&amp;", "&" },
    { 5, "&#33;", "!" },
    { 8, "&dagger;", "*" },
};
const int nLut = sizeof lut/sizeof lut[0];
And then a loop with two pointers that copies characters within the same buf, sniffing for the '&' that triggers a search of the replacement table. If found, copy the replacement string to the destination and advance the source pointer to skip past the HTML token. If not found, then the LUT may need additional tokens.
Here's a beginning...
void replace( char *buf ) {
    char *pd = buf, *ps = buf;
    while( *ps )
        if( *ps != '&' )
            *pd++ = *ps++;
        else {
            // EDIT: Credit @Craig Estey
            if( ps[1] == '#' ) {
                if( ps[2] == 'x' || ps[2] == 'X' ) {
                    /* decode hex value and save as char(s) */
                } else {
                    /* decode decimal value and save as char(s) */
                }
                /* advance pointers and continue */
            }
            for( int i = 0; i < nLut; i++ ) {
                /* not giving it all away */
            }
            /* handle "found" and "not found" in LUT */
        }
    *pd = '\0';
}
This was the test program
int main() {
    char str[] = "The fox &amp; hound&dagger; went for a walk&#33; &amp; chat.";
    puts( str );
    replace( str );
    puts( str );
    return 0;
}
and this was the output
The fox &amp; hound&dagger; went for a walk&#33; &amp; chat.
The fox & hound* went for a walk! & chat.
The "project" is to write the interesting bit of the code. It's not difficult.
Caveat: Only works when substitution length is shorter or equal to target length. Otherwise need two buffers.
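The scan described above (copy characters until an '&' is found, then try a numeric reference or a table lookup) can be sketched as follows. This is only an illustration, written in JavaScript rather than C to keep it short; the function name decodeEntities, the ENTITY_TABLE contents, and the 10-character limit on an entity body are assumptions made for the example, not part of the answer's code. Because it builds a new string, it does not share the in-place length caveat.

```javascript
// Hypothetical lookup table; only a few named entities are listed, extend as needed.
const ENTITY_TABLE = { amp: "&", lt: "<", gt: ">", quot: '"', dagger: "*" };

function decodeEntities(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const end = ch === "&" ? text.indexOf(";", i + 1) : -1;
    let replacement = null;
    // Only look at short "&...;" runs (assumed limit: 10 characters).
    if (end !== -1 && end - i <= 10) {
      const body = text.slice(i + 1, end);
      let code = -1;
      if (/^#[xX][0-9a-fA-F]+$/.test(body)) {
        code = parseInt(body.slice(2), 16);   // hexadecimal reference
      } else if (/^#[0-9]+$/.test(body)) {
        code = parseInt(body.slice(1), 10);   // decimal reference
      } else if (Object.prototype.hasOwnProperty.call(ENTITY_TABLE, body)) {
        replacement = ENTITY_TABLE[body];     // named entity found in the table
      }
      if (code >= 0 && code <= 0x10FFFF) {
        replacement = String.fromCodePoint(code);
      }
    }
    if (replacement === null) {
      // Not an '&', or an unknown sequence: copy the character unchanged.
      out += ch;
      i += 1;
    } else {
      out += replacement;
      i = end + 1; // skip past the terminating ';'
    }
  }
  return out;
}
```

Unknown sequences are left untouched, which makes it easy to spot tokens that still need a table entry.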

AutoHotkey - Building a Clipboardsaving Function

What I want to do is to build a function, that I can use to paste something (in this case URLs). Normally I would just use the send command or the sendinput command, but they are kind of slow which is a little bit annoying. That's why I want to avoid it and use the clipboard instead.
Here is my function:
ClipPaster(CustomClip){
    ClipSaved := ClipboardAll ;Saving the current clipboard
    Clipboard := %CustomClip% ;Overwriting the current clipboard
    Send, ^{v}{Enter} ;pasting it into the search bar
    Clipboard := Clipsaved ;Recovering the old clipboard
}
Here is how I'm using the function:
RAlt & b::
    Send, ^{t} ;Open a new tab
    ClipPaster("chrome://settings/content/images") ;Activating my clipboard function
return
RAlt & g::
    Send, ^{t} ;Open a new tab
    ClipPaster("https://translate.google.com/#en/es/violin") ;Activating my clipboard function
return
Then, when I try to use the function, I get an error:
Error: The following variable name contains an illegal character: "chrome://settings/content/images"
Line:
-->1934: Clipboard := %CustomClip%
What am I doing wrong here?
You get this error message because
Variable names in an expression are NOT enclosed in percent signs.
https://www.autohotkey.com/docs/Variables.htm#Expressions
ClipPaster(CustomClip){
    ClipSaved := ClipboardAll ; Saving the current clipboard
    Clipboard := "" ; empty the clipboard (start off empty to allow ClipWait to detect when the text has arrived)
    ; Variable names in an expression are NOT enclosed in percent signs:
    Clipboard := CustomClip ; Overwriting the current clipboard
    ClipWait 1 ; wait max. 1 second for the clipboard to contain data
    if (!ErrorLevel) ; If NOT ErrorLevel clipwait found data on the clipboard
        Send, ^v{Enter} ; pasting it into the search bar
    Sleep, 300
    Clipboard := Clipsaved ; Recovering the old clipboard
    ClipSaved := "" ; Free the memory
}
https://www.autohotkey.com/docs/misc/Clipboard.htm#ClipboardAll

Remove All Unnecessary Whitespaces from JSON String with regex (in AutoHotKey)

How do I remove ALL unnecessary white-spaces from a JSON String (in AutoHotkey)?
I assume that I need to use regExReplace with some clever regex in order to NOT touch the white-spaces that are part of the values.
A simple example would be:
Before:
g_config :=
{
FuzzySearch:{
enable: true,
keysMAXperEntry : 6,
o = {
keyString: "Hello World"
}
}
} ;
After: g_config:={FuzzySearch:{enable:true,keysMAXperEntry:6,o={keyString:"Hello World"}}};
Basically, I'm looking for a way to minify and pack the string as tight as possible without changing any data.
First I tried searching for [\n]+ and replacing it with "" (nothing). Developed here:
https://www.regextester.com/?fam=106988
the same here
https://regex101.com/r/dZnHaZ/1
Best try: Then I reused this
https://www.codeproject.com/Questions/1230349/Remove-extra-space-in-json-string
to:
https://regex101.com/r/EYFHy9/4
Problem: this regEx also removes the spaces in a value.
Is there a better way to do this?
Along the lines of what @Aaron mentioned, here is some slow-as-hell AHK code that will look at each individual character and remove it if it's a space or line break, except between quotes. It starts at your cursor and ends once there is nothing left to copy (or rather, one second after).
;;; For speed? Maybe?? ...It's still slow :(
ListLines Off
SetBatchLines , -1
SetKeyDelay , -1 , -1
SetMouseDelay , -1
SetDefaultMouseSpeed , 0
SetWinDelay , -1
SetControlDelay , -1
SendMode , Input
f1::
    Loop
    {
        clipboard := ""
        Send , +{right}^c
        ClipWait , 1
        If ErrorLevel
            Break
        bT := ( clipboard = """" ) ? !bT : bT
        Send , % ( !bT && ( clipboard = "`r`n" || clipboard = A_Space || clipboard = A_Tab )) ? "{del}" : "{right}"
    }
Return
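The same character-by-character rule (drop whitespace unless it sits between double quotes) can also be applied to a string in memory instead of driving the editor through the clipboard. The sketch below shows that logic in JavaScript purely as an illustration; the name stripOutsideQuotes is hypothetical, and the handling of backslash-escaped quotes inside a value is an added assumption that the AHK loop above does not make.

```javascript
// Remove all whitespace that is not inside a double-quoted value.
function stripOutsideQuotes(source) {
  let out = "";
  let inString = false; // true while between an opening and closing quote
  let escaped = false;  // true right after a backslash inside a string (assumption: \" does not end the value)
  for (const ch of source) {
    if (inString) {
      out += ch; // everything inside a value is kept verbatim
      if (escaped) {
        escaped = false;
      } else if (ch === "\\") {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
      out += ch;
    } else if (!/\s/.test(ch)) {
      out += ch; // keep non-whitespace, drop spaces, tabs and line breaks
    }
  }
  return out;
}
```

Applied to the "Before" text from the question, this yields the "After" line, with the space in "Hello World" preserved.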

Generating CSV files from Javascript with UTF-8 encoding and strings containing line breaks

I have a javascript program that can export HTML data tables to CSV.
This data is UTF-8 encoded so that it can contain a wide variety of character sets (English, French, Arabic, Japanese...).
The contents of particular cells in the data tables often contain line breaks.
Before writing to a file, I apply the following function to the contents of each table cell:
function OutputFilter (Text)
{
    Text = Text.replace (/<br>/g , "\n"); // replace line breaks with new line
    from = new RegExp ("<[^>]+>","g"); // replace any remaining HTML tags
    to = "" ;
    Text = Text.replace (from,to);
    Text = Text.replace (/&quot;/g, '"' ); // unescape all double quotes
    Text = Text.replace (/&apos;/g, "'" ); // unescape all single quotes
    Text = Text.replace (/&lt;/g , "<" ); // unescape all <
    Text = Text.replace (/&gt;/g , ">" ); // unescape all >
    Text = Text.replace (/&amp;/g , "&" ); // unescape all ampersands
    var allQuotes = new RegExp ('"',"g"); //Text = Text.replace (/"/g, '""');
    var twoQuotes = '""' ;
    Text = Text.replace (allQuotes, twoQuotes); // duplicate all double quotes
    var commaInText = Text.indexOf ("," ) != -1;
    var quoteInText = Text.indexOf ('"' ) != -1;
    var newlineInText = Text.indexOf ("\n") != -1;
    var commasRequired = commaInText || quoteInText || newlineInText ;
    if (commasRequired) {return '"' + Text + '"' } else {return Text};
};
Now I am fairly sure I am heading in the correct direction because Mac Numbers reads the output CSVs correctly (or as I think they should be read).
However Excel for Mac behaves differently, as follows
If I double click to open the file, it recognises the converted line breaks (to "\n"), but does not recognise the UTF-8 encoding
If I import the data using the File/Import dialog, it recognises the UTF-8 encoding, but does not recognise the converted line breaks.
I am happy to just use MacOS Numbers but as half the world or more use Excel, I would like to be able to generate a CSV file that worked in Excel.
All help gratefully received
Thanks
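For reference, the quoting rule used in OutputFilter (double every quote, then wrap the field in quotes if it contains a comma, quote, or line break) can be isolated from the HTML clean-up as in the sketch below. The names csvField and toCsv are hypothetical. Prefixing the file with a byte order mark (U+FEFF) and separating records with CRLF are assumptions: a BOM is a commonly used hint for UTF-8 detection and CRLF is the record separator in RFC 4180, but whether either changes how a particular Excel for Mac version treats the file is not verified here.

```javascript
// Quote one field following the usual CSV convention.
function csvField(text) {
  const needsQuotes = /[",\r\n]/.test(text);  // comma, quote or line break present
  const doubled = text.replace(/"/g, '""');   // escape quotes by doubling them
  return needsQuotes ? '"' + doubled + '"' : doubled;
}

// Join rows (arrays of strings) into one CSV document.
function toCsv(rows) {
  const BOM = "\uFEFF"; // assumption: helps spreadsheet applications detect UTF-8
  const lines = rows.map(function (row) {
    return row.map(csvField).join(",");
  });
  return BOM + lines.join("\r\n") + "\r\n";
}
```

A cell such as `say "hi"` becomes `"say ""hi"""`, and a cell with an embedded "\n" stays on one logical record because it is wrapped in quotes.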

How could I, in Windows, designate a file of a type like GIF to be opened in MSIE via a specific HTML doc?

Windows "viewers" (like Windows Live Photo Gallery or Windows Photo Viewer) have not supported GIF animation since the days of Windows XP. The handiest way I know now to view animation of a GIF is to open it with MSIE -- but THAT, unlike Windows Photo Viewer, does not let me "scroll" through a directory to view other image files. It occurred to me that I could create a scripted HTML document that would perform that "scrolling" through the directory, but I don't know of a way to set it up so that by right-clicking an animated GIF file in my "Recent Items" (or elsewhere), and selecting "Open with...", that one of the options in that group would be the HTML doc I had created, to be opened in MSIE and given the name of the file I had right-clicked on (in the location.search property, for example), so that it would display THAT animated GIF initially, but then, by my script in the HTML document, would let me scroll through the directory to view other image files as well. Also, I would want this option to be available for any type of image file, so that I could initially view, say, a JPEG file, but then subsequent "directory scrolling" could include GIFs or BMPs, etc. IS there a way to do that?
As the saying goes, "Don't get me started!" :)
I hadn't actually planned on having the batch write to the HTML file, but given that approach, I decided to put my javascript into a JS file, and have the batch write code that would reference it, thus:
@echo ^<html^>^<body onkeydown='kdn(event.keyCode)'^>^<span id='im'^>^<img style='display:none' src=%1^>^</span^>^<script src='c:/wind/misc/peruse.js'^>^</script^> > c:\wind\misc\peruse.htm
@start c:\wind\misc\peruse.htm
I found that the only way to handle the backslashes in %1 was to store it directly to an img src, as you did; however, I wanted more detailed code for the img than I wanted to write at this stage, so I set it to be invisible and placed it inside an id'd span for later elaboration by my script. It's nice to know about %~p1 but I don't really need it here.
And here is a rudimentary script (in peruse.js) for folder navigation that it calls up:
document.bgColor = 'black';
f = ('' + document.images[0].src).substr(8);
document.getElementById('im').innerHTML = '<table height=100% width=100% cellspacing=0 cellpadding=0><tr><td align="center" valign="middle"><img src="' + f + '" onMouseDown="self.focus()"></td></tr></table>';
fso = new ActiveXObject("Scripting.FileSystemObject");
d = fso.GetFolder(r = f.substr(0, (b = f.lastIndexOf('/')) + (b < 3)));
if(b > 2) r += '/';
b = (document.title = f.substr(++b)).toLowerCase();
for(n = new Enumerator(d.files) , s = [] , k = -1 , x = '..jpg.jpeg.gif.bmp.png.art.wmf.'; !n.atEnd(); n.moveNext()) {
    if(x.indexOf((p = n.item().name).substr(p.lastIndexOf('.') + 512 & 511).toLowerCase() + '.') > 0) {
        s[++k] = p.toLowerCase() + '*' + p
    }
}
for(s.sort() , i = 0 , j = k++ , status = k + ' file' + (k > 1 ? 's' : '') , z = 100; (x = s[n = (i + j) >> 1].substr(0, s[n].indexOf('*'))) != b; ) {
    x < b ? i = (i == n) + n : j = n
}
document.title = (n + 1) + ': ' + document.title;
function kdn(e, x) {
    if(k > 1 && ((x = e % 63) == 37 || x == 39)) {
        document.images[0].src = r + (x = s[n = (n + x - 38 + k) % k].substr(s[n].indexOf("*") + 1));
        e = 12;
        document.title = (n + 1) + ': ' + x;
        setTimeout("status+=''", 150)
    };
    if(e == 12 || e == 101 || e == 107 || e == 109) {
        document.images[0].style.zoom = (z = e < 107 ? 100 : e == 107 ? z * 1.2 : z / 1.2) + '%'
    }
}
self.focus()
It sets the page background to black,
recovers the path-and-filename into f (with the problematical backslashes converted to forward slashes),
sets up table code so the image appears in the center of the window,
accesses the filesystemobject, and, with the path portion extracted from f into r,
sets the page title to just the filename (with the lowercase name stored to b),
and iterates the folder, checking for any image file,
creates an array s of all those files, with names in lowercase followed by their original case-format,
sorts the array case-blind, and binary-searches the array for the original file (as b) so it knows where to proceed from,
and prefixes the number-within-folder to the page title;
then the keydown function uses the left and right arrows to move backward and forward in the folder, with wraparound,
and uses the numpad+ and - to enlarge or shrink the image, and numpad-5 to reset the size (which also occurs for every new image).
It still remains, though, that I'd like to know of a way to simply pass the original %1 info to an HTML file, without writing a file in the process. I might expect it to be a way to have it "appended to the web address", as is done with info following a ? which gets placed in location.search. I don't know if the command line for iexplore.exe could have a parameter for passing info to location.search.
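If the file name did arrive after a ?, the receiving side in the page script could look roughly like the sketch below. It only covers reading the value: the function name fileFromSearch is hypothetical, the value is assumed to be URL-encoded, and nothing here settles whether iexplore.exe accepts a query string on a local file path. The code sticks to var and plain functions so that it stays close to what older script engines accept.

```javascript
// Turn a location.search value such as "?C%3A%5Cpics%5Ccat.gif" into a path
// with forward slashes, or return null when no value was passed.
function fileFromSearch(search) {
  if (!search || search.charAt(0) !== "?" || search.length < 2) {
    return null;
  }
  var raw = decodeURIComponent(search.slice(1)); // drop the leading "?" and decode
  return raw.replace(/\\/g, "/");                // backslashes to forward slashes
}
```

In the page it would be called as fileFromSearch(location.search), and the result could replace the value that peruse.js currently recovers from document.images[0].src.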