How to extract data from a Google Drive link with iMacros?

I am using iMacros for automating some form-filling web tasks.
To avoid hard-coding certain parameters, I had planned to place the parameters in an .htm file and extract them online before starting with the rest of the scripts. This way, I believe, I can maintain the parameters for multiple running instances of the script from a single place.
However, I have come across a problem while extracting the .htm file that I uploaded to Google Drive.
Below is the link that I want to extract data from:
https://drive.google.com/file/d/0B_GgQPGYiDg8UVBTOEYyVGk1Yk0
But it looks like neither the EXTRACT command nor the iMacros browser is able to extract the contents from this link.
One alternative is to host the .htm file on a free web hosting platform, but that doesn't seem worth it for a single file. There should be a simpler alternative to this.
Hint: When I view the source of the page, I see no tag that contains my data, only some JavaScript functions.

If you want to stay with Drive, you can extract the full content of that file with
TAG POS=1 TYPE=DIV ATTR=class:drive-viewer-text-content EXTRACT=TXT
You will get the full HTML code inside your file and will have to parse that for the data you want. If you only really need the data in the TDs, would making it a plain text file work for you? Then you could place the individual TD contents on a single line each to make parsing your extracted data easier.
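If it helps, here is a rough sketch of that parsing step (in Python, just to show the idea; in iMacros itself this would live inside an EVAL() JavaScript snippet, as in the answer below, and the sample string is a hypothetical EXTRACT result):
import re

extracted = '<TABLE><TR><TD>am</TD><TD>Yes</TD><TD>Blocked by Administrator</TD></TR></TABLE>'  # hypothetical EXTRACT output
params = re.findall(r'<TD>([^<]*)</TD>', extracted, flags=re.IGNORECASE)
print(params)  # ['am', 'Yes', 'Blocked by Administrator']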

I think it's better to simplify the parameters and just keep them as an array. iMacros is able to extract the text easily. Please find my code below, where I extract the text, do some basic processing, and finally return an array with the parameters.
CODE:
VERSION BUILD=9030808 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
SET !TIMEOUT_PAGE 10
URL GOTO=https://drive.google.com/file/d/0B_GgQPGYiDg8UVBTOEYyVGk1Yk0/edit
TAG POS=1 TYPE=PRE ATTR=TXT:<?xml<SP>version="1.0"?><html><SP><head><SP><title>Parameter* EXTRACT=TXT
SET !VAR1 EVAL("var s='{{!EXTRACT}}';s=s.match(/<TD>([^<]+)<.TD>/gm);s=s.map(e => e.replace('<TD>', '').replace('</TD>', ''));s;")
PROMPT {{!VAR1}}
OUTPUT:
am|am|Yes|Blocked by Administrator|Y|Y|Y|N|N|N|N|N|N|N|N|N|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y
,
pm|pm|Yes|Blocked by Administrator|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|N|N|N|N|N|N|N|N|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y
,
dt|dt|Yes|Blocked by Administrator|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y|Y

Seeking Guidance

I'm new to this and not certain which platform to use to achieve my desired outcome (e.g. PHP, JavaScript, etc.), but I'm a fast learner.
I add videos to my YouTube channel daily. After this I update two separate webpages where I manually embed the newest video URL.
Question:
I would like to automate this work process. What is the best approach (e.g. CSS, JavaScript, PHP, etc.) that I can use to "get" the most recent YouTube video URL and embed it into my webpage(s) automatically?
I hope I explained this properly. Let me know if you need any additional information. Thanks in advance for any guidance you can offer!!!
(1) Get the link of the latest video on your channel:
You can request a channel's feed from YouTube using
https://www.youtube.com/feeds/videos.xml?channel_id=XXXXX
where XXXXX is the channel's ID (as shown in the browser's address bar).
The first entry in the XML document is the latest video.
Use the JavaScript Fetch API to load the XML, or have a JS function call a PHP script that reads back this XML document.
After loading it correctly, you'll have a string (text) copy of that same document in whatever variable you put it into. The idea here is to edit the text in code (instead of highlighting and replacing the URL in a text editor): the code should find and replace the URL, then save the edited text as a new HTML file (overwriting the old one using PHP).
With JavaScript, either use its string functions to extract the URL or follow a tutorial on parsing XML to extract the data.
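A minimal sketch of step (1), shown in Python purely for illustration (the answer suggests the JS Fetch API or PHP); it assumes the feed's first <entry> carries the watch URL in its <link> element, and CHANNEL_ID is a placeholder:
import urllib.request
import xml.etree.ElementTree as ET

CHANNEL_ID = "XXXXX"  # placeholder: your channel's ID
feed_url = "https://www.youtube.com/feeds/videos.xml?channel_id=" + CHANNEL_ID

with urllib.request.urlopen(feed_url) as resp:
    feed = ET.fromstring(resp.read())

ns = {"atom": "http://www.w3.org/2005/Atom"}
first_entry = feed.find("atom:entry", ns)                    # first entry = latest video
latest_url = first_entry.find("atom:link", ns).get("href")   # assumed: <link href="..."> holds the watch URL
print(latest_url)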
(2) To update the webpages: (use PHP)
Option 1 is to load the old page and use PHP string functions to replace the text of the old link with the latest one, then write the edited text out as a file (overwriting the older HTML file).
Option 2 is to have a "template" document already stored as a string in your code. Then simply replace (or add, if needed) the URL of the new video, and have PHP save the string as an HTML file, overwriting the old .html.
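A matching sketch of step (2), option 1 (again in Python rather than PHP, purely to show the idea; index.html and the old URL are placeholders):
from pathlib import Path

page = Path("index.html")                                     # placeholder: the page to update
old_url = "https://www.youtube.com/watch?v=OLD_VIDEO_ID"      # placeholder: the currently embedded link

html = page.read_text(encoding="utf-8")
html = html.replace(old_url, latest_url)                      # latest_url from the step (1) sketch
page.write_text(html, encoding="utf-8")                       # overwrite the old HTML file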
Use this service; I think it will be the easiest way: https://latestyoutu.be/
You can find your channel ID by clicking Settings > Advanced > Account Information (https://support.google.com/youtube/answer/3250431) and pasting it into the site. This is probably the most hassle-free way of doing what it seems you want to do.

All paragraphs are empty in an opened document in python-docx

I do the following:
from docx import Document
document = Document('text.docx')
document.paragraphs[42].text
And it gives me '' whatever number I enter, and a for loop to find and replace a word does not work. But if I save the document with document.save('text2.docx'), the saved document is not empty.
The document is relatively big and contains a lot of different formatting, images, tables, and styles.
My task is to find and replace a word in a .docx document (with some correction of the following word), so I will be glad if you suggest another tool.
I ran into this problem and was able to read the document using docx2txt: https://pypi.org/project/docx2txt/
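For what it's worth, a small sketch of the docx2txt route; docx2txt.process() returns the document's text (including table contents) as one string, which you can then search. Note it only extracts text, so writing the replacement back into the .docx still needs another tool:
import docx2txt

text = docx2txt.process("text.docx")    # full text of the document as a single string
print("word" in text)                   # check the word can actually be found
fixed = text.replace("word", "replacement")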

Regex: img src urls that don't have multiple paths

Going through another crazy website migration!
I have HTML img src URLs that look like this:
http://blog.example.com/imagename.jpg
The image format can be jpg, png, or gif.
We need a regex that finds every URL that has the domain and then "/imagename.jpg" immediately after it.
I'm very new to regex; what would the expression be?
Better Alternative for WordPress Migrations
If you are moving your website and want to replace all references to the old site with the new domain, I suggest you use David Coveney's Serialized Search & Replace DB v2.1.0. You'll want to run this on a new copy of the database; always have a backup handy. Import the database on the destination server, then run the tool - you don't even have to upload the server files.
When I do this coming from a development server to live domain, I usually do two search & replaces:
One for URLs, very basic:
Search: mywebsite.devserver.com
Replace: my-new-website.com
And one for file paths:
Search: /vhosts/devserver.com/mywebsite
Replace: /vhosts/my-new-website.com/httpdocs
(Note: This is assuming the majority of the file path is the same for both servers. Your search & replace paths may need to be more accurate)
The reason you want a serialized search and replace is that some data is stored in PHP-serialized format, and if you change a value with a text editor or in MySQL directly, it may no longer unserialize afterwards.
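A quick illustration of that pitfall (not from the answer itself): PHP stores strings as s:<length>:"<value>";, so a plain find & replace leaves the old length prefix in place and the value no longer unserializes:
old = 's:30:"http://mywebsite.devserver.com";'                 # length prefix 30 matches the URL
naive = old.replace("mywebsite.devserver.com", "my-new-website.com")
print(naive)   # s:30:"http://my-new-website.com"; -> string is now 25 chars, so it won't unserialize
correct = 's:25:"http://my-new-website.com";'                  # what a serialization-aware tool writes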
Regex Answer
Select images hosted by blog.example.com with the following regex pattern:
((http|https)://blog\.example\.com/[^ \r\n]+\.(jpg|jpeg|png|gif))
Which basically searches for this: http(s)://blog.example.com/*.(jpg/png/etc)
Matches the URLs in the following examples:
http://blog.example.com/imagename.jpg
http://blog.example.com/favicon.png
http://blog.example.com/uploads/2013/05/kitten.gif
https://blog.example.com/ssl-secure.png
This is my favorite gif https://blog.example.com/some-hilarious-image.gif hahaha
DOES NOT match any of these:
blog.example.com/google.png
https://blog.google.com/google.png
our website is http://blog.example.com and has an image named /imagename.png
http://blog.example.com/
WHY it doesn't match those (by line):
Does not include http(s)://
Hosted by google
Paragraph text, where the URL is split into two parts
Not an image
$1 returns the full URL of the image.
I tested this on RegexTester.com. You can copy the pattern in the top field, and all of the examples in the box below. The red highlights are matches.
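If you want to run the pattern outside a tester, here is a short Python sketch of using it (the same idea works with preg_match_all in PHP or String.match in JavaScript); group 1 is the answer's $1, the full image URL:
import re

pattern = re.compile(r'((http|https)://blog\.example\.com/[^ \r\n]+\.(jpg|jpeg|png|gif))')
text = 'This is my favorite gif https://blog.example.com/some-hilarious-image.gif hahaha'
for m in pattern.finditer(text):
    print(m.group(1))   # https://blog.example.com/some-hilarious-image.gif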
There are many good suggestions already (and why would a WordPress site hardcode the domain name into its links? But that's not our problem right now). If you need a regex, then try this:
(?<=<img).+(?<=src=["'])(.+(?:jpe?g|gif|png))
EXPLAINED:
(?<=<img).+(?<=src=["']) - makes sure we're inside an <img> tag, up to the src attribute
(.+(?:jpe?g|gif|png)) - captures everything up to the required extension
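A quick check of that pattern in Python (the img tag here is just a made-up example); group 1 is the captured image URL:
import re

html = '<img class="pic" src="http://blog.example.com/uploads/2013/05/kitten.gif" alt="cute">'
m = re.search(r'(?<=<img).+(?<=src=["\'])(.+(?:jpe?g|gif|png))', html)
if m:
    print(m.group(1))   # http://blog.example.com/uploads/2013/05/kitten.gif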

Pulling out some text from a giant HTML file using Nokogiri/xpath

I am scraping a website and am trying to pull out certain elements from the HTML. On the sites I am scraping, there are script tags with a bunch of info in them; however, there is one part inside these tags that I am interested in. The line basically looks like:
'image':'http://ut5.example.com/t/231/3_b_643435.jpg',
With some stuff above and below it. Now, this is different for each page source, except for (obviously) the domain and some of the subfolders that store the images.
How would I go about looking through the source for this specific line and cutting out just the URL? I feel I would need to use regular expressions, as the URLs are dynamic.
The "gsub" method does something similar to what I want, with its ability to use a /regex/. But I don't want to replace anything; I just want to find that URL in the source code using a /regex/ and copy it.
According to your comments, this is what you're looking for, I guess:
var regex = /http.+/;
Example http://jsfiddle.net/Km9ZB/
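If you key the pattern on the 'image' label from the question instead of just http, you can grab only that URL. A rough sketch, shown in Python for brevity; the same regex works with Ruby's String#match inside the Nokogiri script:
import re

source = "'image':'http://ut5.example.com/t/231/3_b_643435.jpg',"   # the line from the page source
m = re.search(r"'image':'([^']+)'", source)
if m:
    print(m.group(1))   # http://ut5.example.com/t/231/3_b_643435.jpg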

using encodeURI to display an entire page

Hi, I am making a Chrome extension where I save a page to the database as a string and then open it later as a data URI scheme like:
d = 'data:text/html;charset=utf-8,'+encodeURI('HTML TEXT')
location.reload(d);
The problem with this is that the page (say its name is http://X/) in which I executed the above command loses the JavaScript files in its head.
I considered using document.write(d), if d has a string appended to it with the <head>...</head> of http://X/.
But this opens up a big XSS vulnerability. At this point I am trying to think of whitelisting tags when I save the original page... is there another way?
I'm not sure what you mean by http://X/, but if you want the copied website to retain its origin (i.e. have the code you give it run exactly as if it were downloaded from http://X/), then I'm afraid it's not possible with standard DOM methods (it would be a security vulnerability that bypasses the same-origin policy).
If you want to run 3rd party sourcecode safely, then use this:
<iframe sandbox src="data:…"></iframe>
You could modify the source and insert <base href="http://X/"> in there to make relative URLs work properly.