Finding a specific link from a site - html

I'm trying to find a specific link on a web page using the Windows command line and tools. I think Xidel can do what I want to do.
In the page, the link is used like this:
file: 'http://link.link/index.txt'
Note: there's only one line like this. Now if I can set something like
file: '{%link}'
then I'll be able to extract the link. Also, if I want to change index.txt to something like root.txt and then use aria2 to download http://link.link/root.txt, what do I need to do?
(I don't have any experience with any of these tools or with command-line scripts. I just want to build something that does this, and only this. Some alternatives already exist, but I want to do it myself. I did search and have an idea of how I can do it, but extracting the exact URL seems to be the hardest part, since I couldn't find anything in Xidel's docs that might help.)

Xidel is meant to extract data from HTML/XML/JSON files, but it can also extract from CSV and plain-text files if you know how to use the $raw variable and Xidel/XQuery functions such as extract(), tokenize() and replace().
Post the URL or the source (or part thereof) of the webpage and I'll see how I can help you.
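Xidel's own extract() and replace() would be the natural route here. As a rough cross-check of the same two steps, the sketch below pulls the URL out of the single file: '...' line and then swaps index.txt for root.txt. It is a minimal Python sketch and assumes three things: Python is available on the Windows machine, aria2c is on PATH, and the URL is single-quoted as in the question. The helper names and the sample page text are hypothetical.

```python
import re
from typing import List, Optional

# Assumption: the page contains exactly one line of the form
#     file: 'http://link.link/index.txt'
# with single quotes, as described in the question.
FILE_RE = re.compile(r"file:\s*'([^']+)'")


def extract_link(page_source: str) -> Optional[str]:
    """Return the URL quoted after `file:`, or None if no such line exists."""
    match = FILE_RE.search(page_source)
    return match.group(1) if match else None


def swap_filename(url: str, old: str = "index.txt", new: str = "root.txt") -> str:
    """Replace the last path segment `old` with `new`; leave other URLs unchanged."""
    if url.endswith("/" + old):
        return url[: -len(old)] + new
    return url


def aria2_command(url: str) -> List[str]:
    """Build an argument list for aria2c (assumed to be on PATH)."""
    return ["aria2c", url]


if __name__ == "__main__":
    # Hypothetical page source. A real page could be fetched with
    # urllib.request.urlopen(page_url).read().decode().
    sample = "player.setup({\n  file: 'http://link.link/index.txt',\n});"
    link = extract_link(sample)
    if link is not None:
        cmd = aria2_command(swap_filename(link))
        print(" ".join(cmd))
        # To actually download: subprocess.run(cmd, check=True)
```

The command is built as an argument list rather than a single string, so it can be handed to subprocess.run without worrying about shell quoting on Windows.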


PhpStorm generate template from code selection

I frequently use PhpStorm's Extract variable & method refactorings. Is there a way to add/extend functionality that could create a new template file from the selected code, prompt for desired template path, and create an include/require statement for that template?
I'm asking either for an entry point into coding this functionality, or extending existing functionality. Or maybe it's already available and I missed it.
As @Ástþór mentioned, there is no way to change the refactoring templates.
You can use surround live templates to emulate this behavior. This won't find or replace duplicates, but maybe it's close enough to what you want.
Add a surround live template under Settings > Editor > Live Templates (open Settings with Ctrl+Alt+S).
Edit the template variables to get a nicer UX.
Select the code you want to extract, then choose Code > Surround with Live Template from the menu or press Ctrl+Alt+J.
Adjust the templates to your needs.
See also: Live template variables (PhpStorm documentation).
HTH
No, there isn't. You can ask this question at https://intellij-support.jetbrains.com/hc/en-us/community/topics/200366979-IntelliJ-IDEA-Open-API-and-Plugin-Development
Other useful sources: https://www.jetbrains.org/intellij/sdk/docs/basics/getting_started.html & https://confluence.jetbrains.com/display/PhpStorm/Setting-up+environment+for+PhpStorm+plugin+development

How to change format of Warning admonition or add Caution in Sphinx HTML output

This seems like it should be straightforward but I've been prowling the documentation and web and haven't found the answer.
I want to output HTML doc from Sphinx. Ideally I'd like to have three levels of "note" type highlighted text boxes. ReST defines several "admonitions": (http://docutils.sourceforge.net/docs/ref/rst/directives.html#admonitions) but most of the Sphinx HTML themes include special formatting only for Note and Warning. (I am using one of the preinstalled themes, Classic.)
I have two questions:
1) How can I customize the color behind Warning in my documents?
2) How can I add a formatting style for Caution?
I see that these all end up with tags like <div class="admonition warning"> ... in the HTML output. But I can't find where the formatting for that class is defined. Is it in a stylesheet? Is it in a layout.html file or some other file?
Is there anything that explains how the various files in themes actually interact with each other? I haven't found a good primer. (I am no expert on css-based HTML either, so maybe that's part of the problem.)
Okay, I figured out more and have a working workaround. (I'm still not sure how I'm supposed to handle this.)
Looks like my HTML code is reading directly from a few cascading stylesheets stored along with the output in a directory called _static. There's classic.css, which inherits from basic.css.
I don't understand how these relate to the files named like basic.css_t that live in the Python Sphinx install.
To change things, should I (A) try altering the _t files? or (B) create an altered local copy of classic.css that lives in my source directory?
If I go with B, more questions.
Will it be overwritten by the values in the css_t template at build time? (I guess this is easy enough to test)
Is it good practice to use the same filename for a modified version of that stylesheet?
Here's a workaround that avoids those questions and seems to be doing what I want - from this: https://github.com/snide/sphinx_rtd_theme/issues/117
I created an override stylesheet that includes just the formatting I want to change.
I stored it in the _static of my source directory.
I defined it in my conf.py as follows:
html_context = {
    'css_files': [
        '_static/theme_overrides.css',
    ],
}
Now, that GitHub discussion said this isn't a solution for every theme (including the RTD theme mentioned in the question), but I think I'm safe for now.
What more should I know?
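Sphinx 1.8 added a dedicated html_css_files setting, so the same override can be expressed without touching html_context. The minimal conf.py sketch below assumes the override stylesheet is saved as _static/theme_overrides.css in the source directory. That file itself would contain ordinary CSS rules for the classes seen in the HTML output, for example div.admonition.warning (to change the Warning colour) and div.admonition.caution (to give Caution its own style). This approach leaves the theme's _t templates alone.

```python
# conf.py (sketch): assumes Sphinx 1.8 or later.

# The question uses the preinstalled Classic theme.
html_theme = "classic"

# Everything in <source>/_static is copied into the build's _static directory.
html_static_path = ["_static"]

# Extra stylesheets, relative to html_static_path. They are normally linked
# after the theme's own stylesheet, so equally specific rules here take effect.
html_css_files = ["theme_overrides.css"]
```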

Run a regular expression into a new file (or another existing file)

I would like to take some stuff from file A and reformat it to stick into file B using regular expressions. I am kind of new to vim so this may be a dumb question but I could not find the solution to this anywhere. I guess I am searching for the wrong phrases. Anyway, here are the details of what I want to do. I have a static html page that I would like to have an RSS feed for. Luckily, this page is mostly links to various news items, so creating the RSS will be pretty easy.
I have the regular expression ready:
:%s/^<a href="\(.\{-}\)".title="\(.\{-}\)">\(.\{-}\)<\/a>/<title>\3<\/title>\r<link>\1<\/link>\r<description>\2<\/description>
My problem is that I do not want to make the changes in the HTML file I am searching. I want the changes to go into another file, new or existing. How do I make this happen? Or is this method completely off?
Oh and by the way, this expression takes something like this in the html file:
<a href="http://linktosomesite.com" title="Description of link">Title of Link</a>
and turns it into this in the xml file:
<title>Title of Link</title>
<link>http://linktosomesite.com</link>
<description>Description of link</description>
Bonus: It would be really nice if I can place this within another file, say starting at line 5.
PS: I know this is a vim and regex question, but I'm tagging it html and rss because I noticed people ask static-HTML-to-RSS questions there.
Why not just copy your file and then use sed/replace on the copied file?
It sounds like you want to write a transform. There are many transform tools; you could certainly do it with sed and awk, for example. But I think the easiest way would be XSLT (you could use xsltproc or Saxon...).
Here's an example template:
<xsl:template match="a">
<title><xsl:value-of select="text()"/></title>
<link><xsl:value-of select="@href"/></link>
<description><xsl:value-of select="@title"/></description>
</xsl:template>
It matches each a element and outputs a title, link, and description, filled in from the element's text() node and its href and title attributes.
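If installing an XSLT processor isn't convenient, the same idea (match every a element and emit title, link and description) can be sketched with Python's standard-library html.parser. This is a swapped-in technique, not XSLT. The class and function names are hypothetical, and the output is XML-escaped the way an XSLT processor would escape text output.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from xml.sax.saxutils import escape


class AnchorCollector(HTMLParser):
    """Record href, title and text for every <a> element (like match="a")."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: List[Dict[str, str]] = []
        self._current: Optional[Dict[str, str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self._current = {
                "href": attributes.get("href") or "",
                "title": attributes.get("title") or "",
                "text": "",
            }

    def handle_data(self, data):
        # Text may arrive in several chunks; accumulate it while inside <a>.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.anchors.append(self._current)
            self._current = None


def anchors_to_items(html_text: str) -> str:
    """Emit <title>/<link>/<description> for each anchor, XML-escaped."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return "\n".join(
        f"<title>{escape(a['text'])}</title>\n"
        f"<link>{escape(a['href'])}</link>\n"
        f"<description>{escape(a['title'])}</description>"
        for a in collector.anchors
    )
```

Unlike a line-based regex, the parser does not care whether each link sits on its own line or how the attributes are spaced.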
Just run your substitution and save as another file:
$ vim file.html
:%s/^<a href="\(.\{-}\)".title="\(.\{-}\)">\(.\{-}\)<\/a>/<title>\3<\/title>\r<link>\1<\/link>\r<description>\2<\/description>
:w file.rss
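:q!
(Use :q! rather than :q. The changed buffer was written to file.rss, not back to file.html, so Vim still considers it modified, and file.html stays untouched.)
That's how I would do it in any editor, by the way.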
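For the bonus part (writing into another, existing file starting at line 5), a small script may be easier than an editor session. Below is a minimal Python sketch of the same substitution. It assumes one link per line, as the ^-anchored vim pattern does. The file names file.html and feed.xml and the helper names are hypothetical. Like the vim version, it does not XML-escape characters such as & in the output.

```python
import re
from pathlib import Path
from typing import List

# Python version of the vim substitution: three non-greedy groups for the
# href, the title attribute and the link text. \s+ between the attributes is
# slightly more lenient than the single "." in the vim pattern.
LINK_RE = re.compile(r'^<a href="(.*?)"\s+title="(.*?)">(.*?)</a>', re.MULTILINE)


def links_to_items(html_text: str) -> List[str]:
    """Turn each matching <a> line into an RSS title/link/description block."""
    return [
        f"<title>{text}</title>\n<link>{href}</link>\n<description>{title}</description>"
        for href, title, text in LINK_RE.findall(html_text)
    ]


def insert_lines(text: str, new_lines: List[str], line_no: int = 5) -> str:
    """Return `text` with `new_lines` inserted so the first one starts at line `line_no`."""
    lines = text.splitlines()
    lines[line_no - 1:line_no - 1] = new_lines
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; file.html is only read, never modified.
    html_path, rss_path = Path("file.html"), Path("feed.xml")
    if html_path.exists() and rss_path.exists():
        items = links_to_items(html_path.read_text())
        rss_path.write_text(insert_lines(rss_path.read_text(), items))
```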

Pulling out some text from a giant HTML file using Nokogiri/xpath

I am scraping a website and am trying to pull out certain elements from the HTML. In the sites I am scraping, there are script tags with a bunch of info in them; however, I am only interested in one part inside these tags. The line basically looks like:
'image':'http://ut5.example.com/t/231/3_b_643435.jpg',
There is other content above and below it. This line differs in every page source, except (obviously) for the domain and some of the subfolders that store the images.
How would I go about searching the source for this specific line and cutting out just the URL? I feel I would need regular expressions, since the URLs are dynamic.
The gsub method does something similar to what I want, with its ability to take a /regex/. But I don't want to replace anything; I just want to find that URL in the source code using a /regex/ and copy it.
Based on your comments, this is what you're looking for, I guess (the URL is in capture group 1, without the trailing quote and comma):
var regex = /'image':'(http[^']+)'/;
Example http://jsfiddle.net/Km9ZB/
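Because the goal is to copy the URL rather than replace it, a regex search with a capture group is enough. The minimal Python sketch below assumes the value is always single-quoted, as in the sample line. find_image_url is a hypothetical name. In Ruby the equivalent lookup would be source[/'image':'(http[^']+)'/, 1], which returns the first capture group or nil.

```python
import re
from typing import Optional

# Assumption: the value is single-quoted exactly as in
#     'image':'http://ut5.example.com/t/231/3_b_643435.jpg',
# The capture group stops at the closing quote, so the trailing ', is excluded.
IMAGE_RE = re.compile(r"'image'\s*:\s*'(https?://[^']+)'")


def find_image_url(source: str) -> Optional[str]:
    """Return the URL assigned to 'image' in the page source, or None."""
    match = IMAGE_RE.search(source)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical page fragment with other keys around the one we want.
    page = """<script>
      var cfg = {
        'title':'Example',
        'image':'http://ut5.example.com/t/231/3_b_643435.jpg',
        'width':640
      };
    </script>"""
    print(find_image_url(page))
```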

How can I convert URLs in text to HTML links?

I'm writing a forum-type discussion board in Perl and would like to automatically turn http://www.google.com into an HTML link. This should also be safe and err on the side of security. Is there a quick, easy, and safe way to add links automatically?
Try something like this:
use Regexp::Common qw /URI/;
$text =~ s|($RE{URI}{HTTP})(?!</a>)|<a href="$1">$1</a>|g;
The key here is using Regexp::Common::URI, which probably has a more thorough URL matcher than anything I could come up with. I also use a negative lookahead assertion at the end to make sure the URL is not already in a link. That last part isn't exactly thorough, since somebody could write something like this:
<a href="http://www.mysite.com">http://www.mysite.com is my website</a>
To do this correctly, you'd need to parse the entire submission text and only substitute URLs that are not already part of a link.
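On the security side of the question, one approach avoids the "already inside a link" problem entirely: treat the whole submission as plain text. HTML-escape everything, then wrap bare http(s) URLs in anchors, so user-supplied markup (including existing a tags) can never reach the page unescaped. The minimal Python sketch below illustrates that idea. The URL pattern is a deliberately simple assumption, not the Regexp::Common matcher. For example, it will include a trailing period in the link, and it only accepts the http and https schemes, so javascript: URLs are never linked.

```python
import html
import re

# Deliberately conservative: only http(s) URLs, stopping at whitespace,
# quotes and angle brackets. An assumption, not a full RFC 3986 matcher.
URL_RE = re.compile(r"""https?://[^\s<>"']+""")


def linkify(text: str) -> str:
    """Escape user text as HTML and turn bare http(s) URLs into links."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Escape the plain text that precedes the URL.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0))  # also escapes & and quotes
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    # Escape whatever follows the last URL.
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Each piece of the raw text is escaped separately, so a URL never has to be recognized inside already-escaped text such as &quot;.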