How to build complex vs code snippet variable transforms? - json

I'm trying to write a code snippet for VS Code that takes a given file name, removes a piece of the name, and capitalizes the first letter. For example:
Input:
example.model.js
Output:
Example
Output I'm getting:
${TM_FILENAME_BASE/(.*).[model]+$//capitalize//}
I'm able to remove the trailing half of the file name with the following string
"${TM_FILENAME_BASE/(.*)\\.[model]+$/$1/}"
I tried to take this a step further with the following but it doesn't seem to work.
"${TM_FILENAME_BASE/(.*)\\.[model]+$/${1:/capitalize/}/}"
Based on the documentation I'm not sure where I'm going wrong.
https://code.visualstudio.com/docs/editor/userdefinedsnippets#_transform-examples
Any ideas on what I'm missing here? Also are there any tools that could help build these kinds of complex expressions?
Thanks

It looks like I was writing the grammar incorrectly by adding a trailing slash (/) after capitalize. The correct way is below:
"${TM_FILENAME_BASE/(.*)\\.[model]+$/${1:/capitalize}/}"
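If you want to sanity-check the regex part of a transform outside the editor, here is a rough Python sketch. It only emulates the transform: VS Code uses JavaScript regular expressions, and the helper names `capitalize` and `transform_filename_base` are made up for this example. It assumes `/capitalize` uppercases just the first character of the capture, and that TM_FILENAME_BASE for example.model.js is example.model.

```python
import re


def capitalize(text):
    # Assumed behaviour of the /capitalize format: uppercase the first character only.
    return text[:1].upper() + text[1:]


def transform_filename_base(filename_base):
    # Same pattern as the snippet: capture everything before the last ".<letters from m,o,d,e,l>".
    return re.sub(r"(.*)\.[model]+$", lambda m: capitalize(m.group(1)), filename_base)


print(transform_filename_base("example.model"))
```

Note that [model]+ is a character class, so it matches any run of the letters m, o, d, e, l (for instance ".demo" would match too). If only ".model" should match, \\.model$ is the stricter pattern.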

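With this regex (.*)\\.[model]+$, (.*) captures the whole word.
For example, it will capture example in example.model.js, and /capitalize then uppercases only the first letter of that capture, giving Example (use /upcase if you want EXAMPLE).
If you would rather capture the first character on its own, keep the rest in a second group so it is not dropped:
"${TM_FILENAME_BASE/(.)(.*)\\.[model]+$/${1:/capitalize}$2/}"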

Related

Simple macros for HTML

My HTML file contains the following code in many places:
&nbsp;&nbsp;&nbsp;
It is too short and it doesn't really make sense to replace it with a code like
<span class="three-spaces"></span>
I would like to replace it with something like
##TS##
or
%%TS%%
and the file should start with something like:
SET TS = "&nbsp;&nbsp;&nbsp;"
Is there any way to write the HTML this way? I am not looking for compiling a source file into a HTML. I am looking for a solution that allows directly writing macros into HTML files.
Later edit: I'm coming with another example:
I also need to transform
lnk(http://www.example.com)
into
<a target="_blank" href="http://www.example.com">http://www.example.com</a>
Instead of telling him WHY he should not do something, how about telling him HOW he could do it? Maybe his example is not an appropriate need for it, but there's other situations where being able to create a macro would be nice.
For example... I have an HTML page that I'm working on that deals with unit conversions and quite often, I'm having to type things like "cm/in" as "<sup>cm</sup>/<sub>in</sub>" or for volumes "cu-cm/cu-in" as "<sup>cm<sup>3</sup></sup>/<sub>in<sup>3</sup></sub>". It would be really nice from a typing and readability standpoint if I could create macros that were just typed as "%%cm-per-in%%", "%%cc-per-cu-in%%" or something like that.
So, the line in the 'sed' file might look like this:
s/%%cc-per-cu-in%%/<sup>cm<sup>3<\/sup><\/sup>\/<sub>in<sup>3<\/sup><\/sub>/g
Since the "/" is a field separator for the substitute command, you need to explicitly quote it with the backslash character ("\") within the replacement portion of the substitute command.
The way that I have handled things like this in the past was to either write my own preprocessor to make the changes or if the "sed" utility was available, I would use it. So for this sort of thing, I would basically have a "pre-HTML" file that I edited and after running it through "sed" or the preprocessor, it would generate an HTML file that I could copy to the web server.
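For anyone without sed at hand, a home-grown preprocessor of the kind described above can be very small. This is only a sketch in Python, not the tool I actually used: the names `MACROS`, `MACRO_RE` and `expand_macros` are invented here, and the two macro bodies are the unit markup from this answer.

```python
import re

# Macro table for the "pre-HTML" file; extend as needed.
MACROS = {
    "cm-per-in": "<sup>cm</sup>/<sub>in</sub>",
    "cc-per-cu-in": "<sup>cm<sup>3</sup></sup>/<sub>in<sup>3</sup></sub>",
}

# A macro call looks like %%name%%, where name is letters, digits or '-'.
MACRO_RE = re.compile(r"%%([A-Za-z0-9-]+)%%")


def expand_macros(text, macros=MACROS):
    def replace(match):
        # Unknown names are flagged loudly instead of being passed through.
        return macros.get(match.group(1), "<B>***MACRO-ERROR***</B>")

    return MACRO_RE.sub(replace, text)


print(expand_macros("This is a test of cm-per-in %%cm-per-in%% as an alternative."))
```

You would run the pre-HTML file through expand_macros and write the result out as the real HTML file, the same workflow as with sed.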
Now, you could create a javascript function that would do the text substitution for you, but in my opinion, it is not as nice looking as an actual preprocessor macro substitution. For example, to do what I was doing in the sed script, I would need to create a function that would take as a parameter the short form "nickname" for the longer HTML that would be generated. For example:
function S( x )
{
    if (x == "cc-per-cu-in") {
        document.write("<sup>cm<sup>3</sup></sup>/<sub>in<sup>3</sup></sub>");
    } else if (x == "cm-per-in") {
        document.write("<sup>cm</sup>/<sub>in</sub>");
    } else {
        document.write("<B>***MACRO-ERROR***</B>");
    }
}
And then use it like this:
This is a test of cc-per-cu-in <SCRIPT>S("cc-per-cu-in");</SCRIPT> and
cm-per-in <SCRIPT>S("cm-per-in");</SCRIPT> as an alternative to sed.
This is a test of an error <SCRIPT>S("cc-per-in");</SCRIPT> for a
missing macro substitution.
This generates the following:
This is a test of cc-per-cu-in cm3/in3
and cm-per-in cm/in as an alternative to sed. This is a test of an error MACRO-ERROR for a missing macro substitution.
Yeah, it works, but it is not as readable as if you used a 'sed' substitution.
So, decide for yourself... Which is more readable...
This...
This is a test of cc-per-cu-in <SCRIPT>S("cc-per-cu-in");</SCRIPT> and
cm-per-in <SCRIPT>S("cm-per-in");</SCRIPT> as an alternative to sed.
Or this...
This is a test of cc-per-cu-in %%cc-per-cu-in%% and
cm-per-in %%cm-per-in%% as an alternative to sed.
Personally, I think the second example is more readable and worth the extra trouble to have pre-HTML files that get run through sed to generate the actual HTML files... But, as the saying goes, "Your mileage may vary"...
EDITED: One more thing that I forgot about in the initial post that I find useful when using a pre-processor for the HTML files -- Timestamping the file... Often I'll have a small timestamp placed on a page that says the last time it was modified. Instead of manually editing the timestamp each time, I can have a macro (such as "%%DATE%%", "%%TIME%%", "%%DATETIME%%") that gets converted to my preferred date/time format and put in the file.
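A timestamp macro can be sketched the same way. The function name `expand_timestamps` and the date formats below are just assumptions for illustration; pick whatever format you prefer.

```python
from datetime import datetime


def expand_timestamps(text, now=None):
    # 'now' can be passed in explicitly, which makes the output reproducible.
    now = now or datetime.now()
    replacements = {
        "%%DATETIME%%": now.strftime("%Y-%m-%d %H:%M"),
        "%%DATE%%": now.strftime("%Y-%m-%d"),
        "%%TIME%%": now.strftime("%H:%M"),
    }
    for macro, value in replacements.items():
        text = text.replace(macro, value)
    return text


print(expand_timestamps("Last modified: %%DATETIME%%"))
```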
Since my background is in 'C' and UNIX, if I can't find a way to do something in HTML, I'll often just use one of the command line tools under UNIX or write a small 'C' program to do it. My HTML editing is always in 'vi' (or 'vim' on the PC) and I find that I am often creating tables for alignment of various portions of the HTML page. I got tired of typing all the TABLE, TR, and TD tags, so I created a simple 'C' program called 'table' that I can execute via the '!}' command in 'vi', similar to how you execute the 'fmt' command in 'vi'. It takes as parameters the number of rows & columns to create, whether the column cells are to be split across two lines, how many spaces to indent the tags, and the column widths and generates an appropriately indented TABLE tag structure. Just a simple utility, but saves on the typing.
Instead of typing this:
<TABLE>
  <TR>
    <TD width=200>
    </TD>
    <TD width=300>
    </TD>
  </TR>
  <TR>
    <TD>
    </TD>
    <TD>
    </TD>
  </TR>
  <TR>
    <TD>
    </TD>
    <TD>
    </TD>
  </TR>
</TABLE>
I can type this:
!}table -r 3 -c 2 -split -w 200 300
Now, with respect to the portion of the original question about being able to create a macro to do HTML links, that is also possible using 'sed' as a pre-processor for the HTML files. Let's say that you wanted to change:
%%lnk(www.stackoverflow.com)
to:
<a href="www.stackoverflow.com">www.stackoverflow.com</a>
you could create this line in the sed script file:
s/%%lnk(\(.*\))/<a href="\1">\1<\/a>/g
'sed' uses regular expressions and they are not what you might call 'pretty', but they are powerful if you know what you are doing.
One slight problem with this example is that it requires the macro to be on a single line (i.e. you cannot split the macro across lines) and if you call the macro multiple times in a single line, you get a result that you might not be expecting. Instead of doing the macro substitution multiple times, it assumes the argument to the macro starts with the first '(' of the first macro invocation and ends with the last ')' of the last macro invocation. I'm not a sed regular expression expert, so I haven't figured out how to fix this yet. For the multiple line portion though, a possible fix would be to replace all the LF characters in the file with some other special character that would not normally be used, run sed on that result, and then convert the special characters back to LF characters. Of course, the problem there is that the entire file would be a single line and if you are invoking the macro, it is going to have the results that I described above. I suspect awk would not have that problem, but I have never had a need to learn awk.
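One way around the "first '(' to last ')'" behaviour is to stop the argument at the first closing parenthesis, i.e. use [^)]* instead of .* (in sed that would be \([^)]*\) in place of \(.*\)). Here is the same idea as a Python sketch; `LNK_RE` and `expand_links` are names made up for the example, and it assumes the argument itself never contains a ')'.

```python
import re

# [^)]* cannot run past the first ')', so each macro call is matched separately.
# It also matches newlines, so the whole file can be processed as one string.
LNK_RE = re.compile(r"%%lnk\(([^)]*)\)")


def expand_links(text):
    return LNK_RE.sub(r'<a href="\1">\1</a>', text)


print(expand_links("See %%lnk(www.stackoverflow.com) and %%lnk(www.example.com) for details."))
```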
Upon further reflection, I think there might be an easier solution to both the multi-line and multiple invocation of a macro on a single line -- the 'm4' macro preprocessor that comes with the 'C' compiler (e.g. gcc). I haven't tested it much to see what the downside might be, but it seems to work well enough for the tests that I have performed. You would define a macro as such in your pre-HTML file:
define(`LNK', `<a href="$1">$1</a>')
And yeah, it does use the backwards single quote character to start the text string and the normal single quote character to end the text string.
The only problem that I've found so far is that for the macro names, it only allows the characters 'A'-'Z', 'a'-'z', '0'-'9', and '_' (underscore). Since I prefer to type '-' instead of '_', that is a definite disadvantage to me.
Technically inline JavaScript with a <script> tag could do what you are asking. You could even look into the many templating solutions available via JavaScript libraries.
That would not actually provide any benefit, though. JavaScript changes what is ultimately displayed, not the file itself. Since your use case does not change the display it wouldn't actually be useful.
It would be more efficient to consider why &nbsp; is appearing in the first place and fix that.
This …
My html file contains in many places the code &nbsp;&nbsp;&nbsp;
… is actually what is wrong in your file!
&nbsp; is not meant to be used for layout purposes; you should fix that and use CSS to lay it out correctly instead.
&nbsp; is meant to stop a line break between words that are separated by a space. For example, numbers and their unit: 5 liters can end up with 5 at the end of the line and liters on the next line.
To keep that together you would use 5&nbsp;liters. That's what you use &nbsp; for and nothing else, especially not for layout purposes.
To still answer your question:
HTML is a markup language not a programming language. That means it is descriptive/static and not functional/dynamic. If you try to generate HTML dynamically you would need to use something like PHP or JavaScript.
Just an observation from a novice. If everyone did as purists suggest (i.e.-the right way), then the web would still be using the same coding conventions it was using 30 years ago. People do things, innovate, and create new ways, then new standards, and deprecate others all the time. Just because someone says "spaces are only for separating words...and nothing else" is silly. For many, many years, when people typed letters, they used one space between words, and two spaces between end punctuation and the next sentence. That changed...yeah, things change. There is absolutely nothing wrong with using spaces and non-breaking spaces in ways which assist layout. It is neither useful nor elegant for someone to use a long span with style over and over and over, rather than simple spaces. You can think it is, and your club of do it right folks might even agree. But...although "right", they are also being rather silly about it. Question: Will a page with 3 non-breaking spaces validate? Interesting.

Regex has unexpected results. I am new to this, fair warning

The following regex does match what I am looking for, but it will also match all file extensions (just the file extensions) of anything ending with gif|jpg|png
webcomic"\ssrc="http://www\.explosm\.net/[a-zA-Z/]+\.gif|png|jpg"\s
I am using it on the source of the following page, which is a webcomic that is updated daily:
http://www.explosm.net/comics/
Today, the end goal would be the following, and only the following:
webcomic" src="http://www.explosm.net/db/files/Comics/Kris/lawyer.gif"
I'm just getting my feet wet with regex, have browsed a few websites but can't figure this one out. I don't get why just the file extensions are getting matched, when their file paths/urls do not match the rest of my pattern.
Any help appreciated
Well, the problem that jumps right out at me is the end there. gif|png|jpg should really be (gif|jpg|png) - with what you have now, the string can match webcomic"\ssrc="http://www\.explosm\.net/[a-zA-Z/]+\.gif, or it can match just png or jpg"\s. With the parentheses, it will match webcomic"\ssrc="http://www\.explosm\.net/[a-zA-Z/]+\. followed by (gif or jpg or png), and then followed by "\s.
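You can see the difference with a quick test. This is a sketch in Python rather than whatever tool you are using, and the sample markup (including the alt text and the example.org image) is made up so that both cases show up.

```python
import re

# Hypothetical page fragment: one unrelated .png image plus the comic itself.
html = (
    '<img alt="logo" src="http://example.org/logo.png"> '
    '<img alt="Cyanide and Happiness, a daily webcomic" '
    'src="http://www.explosm.net/db/files/Comics/Kris/lawyer.gif" border=0>'
)

prefix = r'webcomic"\ssrc="http://www\.explosm\.net/[a-zA-Z/]+\.'

# Without parentheses the alternatives are: <prefix>gif | png | jpg"\s
loose = re.compile(prefix + r'gif|png|jpg"\s')
# With a group, only the extension alternates.
grouped = re.compile(prefix + r'(?:gif|png|jpg)"\s')

loose_matches = loose.findall(html)
grouped_matches = grouped.findall(html)

print(loose_matches)
print(grouped_matches)
```

The loose pattern picks up the bare "png" from the unrelated image as well as the comic, while the grouped pattern only matches the comic's src.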
That last bit
gif|png|jpg
means "match any of the three". If you want it to match just gif, write just gif.
I'd try a regex like this:
\shttp://www.explosm.net\/[a-zA-Z]+\.(gif|png|jpg|jpeg)\s

extracting double quotes from html tags with a regex

I'm extracting some content from a website with this pattern:
([^+]+)
and it outputs
< img src=""http://www."" border=""0""/>
with double quotes. What is wrong with my query?
Your problem only makes sense if you modify your regexp.
But first of all, beware:
In general, what you are trying to achieve is not feasible using regexes. They are the wrong tool for this job, and you will not come up with a 100% correct solution using them.
having said this, try to replace ([^+]+) with (([^<!--]+([^<]|<[^!]|<![^-]|<!-[^-]))+). note that this regex assumes the following:
there are no html comments inside the message portion
there are no strings containing html comment openings inside the message portion
the message portion is a valid html fragment
(otherwise it would match eg. <!-<!-- / message -->)
you have been warned.
btw, the dquote doubling must be a standard escape mechanism of the imacro environment.

Trademark symbol is displayed as raw text

If you visit www.startwire.com you'll see in the center of the page (in the yellow box, under the video) the following:
StartWire&trade;
In our dev and stage environments, this is not an issue, but it is in production. What could possibly be causing this?
If you look at the page source, you will see &amp;trade; - you are double encoding the entity.
This should be simply &trade;.
In the HTML you have:
<h2>Sign-up now. StartWire&amp;trade; is completely FREE.</h2>
whereas the correct version would be:
<h2>Sign-up now. StartWire&trade; is completely FREE.</h2>
Notice the extraneous amp;. It looks like you are double encoding something on the server.
If you check your page source it says:
&amp;trade;
This probably means that it took &trade; and transformed that into HTML. So the & becomes &amp;. This is probably due to the use of an htmlentities() function.
Make sure you do not do this conversion twice...
A possible cause of this is that you are taking the contents from a database and that you have encoded the entries before inserting them into the database and you encode them a second time when you retrieve them from this database.
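To illustrate what double encoding does, here is a small Python sketch. It uses Python's html module as a stand-in for whatever encoding function the site actually calls (that part is an assumption), and html.unescape to approximate what a browser would display.

```python
import html

markup = "StartWire&trade;"           # already valid HTML: the entity renders as the trademark sign
double_encoded = html.escape(markup)  # an extra encoding pass escapes the ampersand itself

# html.unescape decodes one level, roughly what the browser shows to the visitor.
shown_correct = html.unescape(markup)
shown_broken = html.unescape(double_encoded)

print(double_encoded)
print(shown_correct)
print(shown_broken)
```

The second pass turns & into &amp;, so the browser decodes it back to the literal text &trade; instead of the symbol.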
Is the content being "HTML encoded" (or whatever they call it) automatically, somewhere in the script? Because this is what appears in the HTML: &amp;trade;.
My suggestion would be to just use the symbol in your code (™). If that doesn't work, try escaping the & of &trade; using \ (so that it becomes \&trade;).
Not sure, but I checked your site and it looks like you have written
&amp;trade;
Simply write &trade;

How can I extract the HREF value from an HTML link?

My text file contains 2 lines:
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="yahoo.com.jp/">yahoo.com.jp/</A>
</PRE><HR>
In my Perl script, I have:
my $String =~ /.*(HREF=")(.*)(">)/;
print "$2";
and my output is the following:
Output 1: yahoo.com.jp
Output 2: ><HR>
What I am trying to achieve is have my Perl script automatically extract the string inside the <A Href="">
As I am very new to regex, I want to ask if my regex is a badly formed one? If so can someone provide some suggestion to make it look nicer?
Secondly, I do not know why my second output is "><HR>", I thought the expected behavior is that output2 will be skipped since it does not contain HREF=". Obviously I am very wrong.
Thanks for the help.
To answer your specific question about why your regex isn't working, you're using .*, which is "greedy" - it will by default match as much as you can. Alternatives would be using the non-greedy form, .*?, or be a bit more exacting about what you're trying to match. For instance, [^"]* will match anything that's not a double quote, which seems to be what you're looking for.
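Here is the difference between the three forms on a line with two links. It is a sketch in Python rather than Perl (the regex syntax for these constructs is the same), and the sample line with its second example.org link is made up.

```python
import re

# Hypothetical input: two links on one line.
line = (
    '<IMG SRC="/icons/folder.gif" ALT="[DIR]"> '
    '<A HREF="yahoo.com.jp/">yahoo.com.jp/</A> <A HREF="example.org/">x</A>'
)

# Greedy: .* runs on to the LAST '">' on the line.
greedy = re.search(r'HREF="(.*)">', line).group(1)

# Non-greedy: .*? stops at the first '">'.
lazy = re.search(r'HREF="(.*?)">', line).group(1)

# Negated class: cannot cross a double quote at all.
negated = re.search(r'HREF="([^"]*)"', line).group(1)

print(greedy)
print(lazy)
print(negated)
```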
But yes, the other posters are correct - using regular expressions to do anything non-trivial in HTML parsing is a recipe for disaster. Technically you can do it properly, especially in Perl 5.10 (which has more advanced regular expression features), but it's usually not worth the headache.
Using regular expressions to parse HTML works just often enough to lull you into a false sense of security. You can get away with it for simple cases where you control the input but you're better off using something like HTML::Parser instead.
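To show what the parser approach looks like, here is a rough equivalent using Python's standard html.parser module. It is not HTML::Parser itself, just the same idea in another language, and the names `HrefCollector` and `extract_hrefs` are invented for this sketch.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, so <A HREF=...> works too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(markup):
    collector = HrefCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


sample = '<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="yahoo.com.jp/">yahoo.com.jp/</A>\n</PRE><HR>'
print(extract_hrefs(sample))
```

Lines without a link simply contribute nothing, so there is no stale match to worry about.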
If I may, I'd like to suggest the simplest way of doing this (it may not be the fastest or lightest-weight way): HTML::TreeBuilder::XPath
It gives you the power of XPath in non-well-formed HTML.
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new_from_file( 'D:\Archive\XPath.pm.htm' );
my @hrefs = $tree->findvalues( '//div[@class="noprint"]/a/@href' );
print "The links are: ", join( ',', @hrefs ), "\n";
When trying to match against HTML (or XML) with a regex you have to be careful about using .*. Rarely ever do you want a .* because star is a greedy modifier that will match as far as it can. As Gumbo showed, use the character class specifier [^"]* to match all characters except a quote. This will match up to the end quote. You may also want to use something similar for matching the angle bracket. Try this:
/HREF="([^"]*)"[^>]*>/i
That should match much more consistently.