I want to scrape all of the URLs out of the .json bookmark backup that Firefox creates and output them to a .txt file.
Here's a sample of one of the objects located in the file:
{"index":1,"title":"Bookmarks Toolbar","id":3,"parent":1,"dateAdded":1219177758531250,"lastModified":1288873459187000,"annos":[{"name":"bookmarkProperties/description","flags":0,"expires":4,"mimeType":null,"type":3,"value":"Add bookmarks to this folder to see them displayed on the Bookmarks Toolbar"}],"type":"text/x-moz-place-container","root":"toolbarFolder","children":[{"title":"","id":25,"parent":3,"dateAdded":1224693644437500,"lastModified":1236888979406250,"annos":[{"name":"placesInternal/GUID","flags":0,"expires":4,"mimeType":null,"type":3,"value":"{f6066e21-10ff-46a2-af7a-2891f8dca345}0"}],"type":"text/x-moz-place","uri":"http://www.google.com/"}
These objects are comma-separated and should all contain at least one member that contains a string whose value is the url of the bookmark.
Here's a sample of what the .txt file would have in it:
http://www.google.com
http://www.yahoo.com
http://www.etc.com
Ideally, I'm interested in seeing if this can be pulled off using any scripting tools available within a generic Windows XP "environment".
If Windows can't cut it, what would be the quickest & easiest solution to this?
Is there a website or program that can do pattern matching or regex to parse the file and do a search & replace, before I go and install something like Active Perl or Strawberry Perl and write a script for it?
Another way I found is the method at the following site:
http://forums.mozillazine.org/viewtopic.php?f=38&t=1057265&sid=66d981cc79d1ff63644e0cdd5b665a37
Basically you do the following:
(1) Create a Firefox bookmark with the following as the location:
javascript:(function(){var E=document.getElementsByTagName('PRE')[0],T=E.innerHTML,i=0,r1,r2;t=new Array();while(/("uri":"([^"]*)")/g.exec(T)){r1=RegExp.$1;r2=RegExp.$2;if(/^https?:/.exec(r2)){t[i++]='['+(i)+']:<a href='+r2+'>'+r2+'<\/a>';}}with(window.open().document){for(i=0;t[i];i++)write(t[i]+'<br>');close();}})();
(2) Open a blank Firefox tab.
(3) Drag your Firefox json file into the blank tab; this should open the json file.
(4) Go to the bookmark you created in step 1.
(5) You should now have a list of "clickable urls" for all your bookmarks.
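For comparison, the same extraction can be sketched outside the browser. The following is a minimal Python sketch, not part of the original answers. It assumes the backup is valid JSON shaped like the sample above (nested "children" lists, with bookmarks carrying a "uri" key). The function names collect_uris and export_uris are made up for illustration. Like the bookmarklet, it keeps only http and https entries.

```python
import json


def collect_uris(node, found=None):
    """Recursively gather every http(s) "uri" value from a bookmark tree."""
    if found is None:
        found = []
    if isinstance(node, dict):
        uri = node.get("uri")
        # Skip non-web entries, as the bookmarklet above does.
        if isinstance(uri, str) and uri.startswith(("http://", "https://")):
            found.append(uri)
        for child in node.get("children") or []:
            collect_uris(child, found)
    elif isinstance(node, list):
        for item in node:
            collect_uris(item, found)
    return found


def export_uris(json_path, txt_path):
    """Read the JSON backup and write one URL per line to txt_path."""
    with open(json_path, encoding="utf-8") as src:
        tree = json.load(src)
    uris = collect_uris(tree)
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(uris) + "\n")
    return uris
```

A call such as export_uris("Bookmarks.json", "FFBookmarks.txt") would then produce the text file; both file names are placeholders to adjust for your system.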
If you have Excel, it's probably easy to do a text-to-columns split on the " character:
http://office.microsoft.com/en-us/excel-help/split-names-by-using-convert-text-to-columns-HA001149851.aspx
Given that the format (order of fields) is always the same, you should have the URLs somewhere near the last column.
I haven't tested this.
NOTE: Verify/correct all below file paths to match your system.
@Echo Off
Rem FFExportBookmarks.bat
SetLocal EnableDelayedExpansion
Set JSONFile="%APPDATA%\Mozilla\Firefox\Profiles\xyz42pdq.default\bookmarkbackups\Bookmarks.json"
Set FavOut="%USERPROFILE%\My Documents\FFBookmarks.txt"
Set JSONTemp="%Temp%\JSONTemp.txt"
Echo.> %JSONTemp%
Set JSONTemp1="%Temp%\JSONTemp1.txt"
Echo.> %JSONTemp1%
For /f "Delims=" %%N In ('Type %JSONFile%') Do (
Set JSONInput=%%N
Rem Filter double " and other delimiters
Set JSONInput=!JSONInput:"=!
Set JSONInput=!JSONInput: =!
Set JSONInput=!JSONInput:^,= !
Set JSONInput=!JSONInput:[= !
Set JSONInput=!JSONInput:]= !
Set JSONInput=!JSONInput:{= !
Set JSONInput=!JSONInput:}= !
For %%K In (!JSONInput!) Do For /f "Tokens=1* Delims=:" %%X In ("%%K") Do (
If /i "%%X"=="uri" Echo %%Y >> %FavOut%
)
)
Start "" %FavOut%
It wasn't very quick, but it's plenty dirty!
I'm very new to .bat files and have excitedly created some to copy, move and rename documents.
Despite searching, I'm getting stuck with a more complex command, largely because the document I'm trying to modify is pipe-delimited rather than 'normal' csv...
My question: can I, and if so how do I, take an existing pipe-delimited csv that always has the same number of columns and add a column onto the end with today's date (DD/MM/YYYY) in it for every row?
$ awk -F, 'NF>1{$0 = "\"YYYY-MM-DD\"" FS $0}1' file
sed 'N;s/","/","YYYY-MM-DD HH:MM:SS","/5' file
I can't seem to get anything to even modify the document at the moment :-(
Your batch attempt isn't that bad:
@echo off
for /f "delims=" %%a in ('type "file.csv"') do (
>>"fileout.csv" echo.%%a|%time%
)
Just a few adjustments:
@echo off
(for /f "delims=" %%a in (file.csv) do (
echo(%%a^|%time%
))>"fileout.csv"
for /f is able to process the contents of a file directly (no need for type, although it works fine)
Redirecting (>>) inside the loop is slow, because the file is opened and closed every time you write to it (although it works). It's much faster to open and close the file only once (especially with large files).
echo. is not secure (although it works fine in most circumstances), best option is echo(.
The pipe, | is a special character and in this case needs escaping with the caret, ^.
Note: for /f skips empty lines, so they will not be in the new file. The same goes for lines that start with ; (the default EOL character).
Edit for "adding |Date to the Header":
@echo off
<file.csv set /p header=
>fileout.csv echo %header%^|DATE
(for /f "skip=1 delims=" %%a in (file.csv) do (
echo(%%a^|%date%
))>>"fileout.csv"
<file.csv set /p header= is a way to read just one (the first) line of a file to a variable. Write it back with the new data appended (or leave it unchanged - your comment isn't quite clear about that). Use skip=1 to skip the first line with further processing.
Don't forget to change ))>"fileout.csv" to ))>>"fileout.csv".
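For comparison, here is a minimal Python sketch of the same idea (it is not part of the original answer). It treats the first non-empty line as the header and appends |DATE to it, then appends today's date in DD/MM/YYYY form to every other row. The names add_date_column and convert_file are invented for this example, and the file names are the ones used above. Unlike for /f, it passes blank lines through unchanged rather than dropping them.

```python
from datetime import date


def add_date_column(lines, today=None, header_label="DATE"):
    """Append a date column to pipe-delimited lines; the first line is the header."""
    stamp = (today or date.today()).strftime("%d/%m/%Y")
    result = []
    header_done = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # Blank lines are kept as they are (for /f would skip them).
            result.append(line)
        elif not header_done:
            result.append(f"{line}|{header_label}")
            header_done = True
        else:
            result.append(f"{line}|{stamp}")
    return result


def convert_file(src="file.csv", dst="fileout.csv"):
    """Read src, append the column, and write the result to dst."""
    with open(src, encoding="utf-8") as handle:
        rows = add_date_column(handle)
    with open(dst, "w", encoding="utf-8") as handle:
        handle.write("\n".join(rows) + "\n")
```

Writing to a separate output file, as the batch version does, keeps the original intact if something goes wrong part-way through.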
I have a large number of html files spread across several folders.
The problem is that I have to remove tags from those html files.
I can't figure out how to do that.
I searched the internet and found nothing.
Is there any cmd script that will open every html file and remove the tags, or replace them with another tag of my choice?
Thanks for the help.
Well, after 8 hours of personal experiments, it should now work as @user3551620 wanted. I updated my answer because the specification in the question changed: the user told me he wants to run this script in system folders, so I had to deal with paths that contain spaces, such as "Program Files (x86)". Remember that if you run this script in system folders, you should run it as an administrator, because it creates a temp file and performs other writes that need permission.
Now correct code should work as follows:
setlocal enabledelayedexpansion
rem get path
SET mypath=%~dp0*.html
set /p old=old string ?
set /p new=new string ?
rem loop over every html file in the folder where this script lives, and its subfolders
for /f "delims=" %%f in ('dir /b /s "%mypath:~0,-1%"') do (
rem copy the file line by line to a temp file, replacing the old string with the new one
for /f "delims=" %%a in ('type "%%f"') do (
set str=%%a
set str=!str:%old%=%new%!
>> tempfileXXX.txt echo !str!
)
rem empty the original file
break>"%%f"
rem copy every line of the temp file back to the original file
for /f "delims=" %%a in (tempfileXXX.txt) do (
set str=%%a
>> "%%f" echo !str!
)
rem clear the temp file
break>tempfileXXX.txt
)
rem delete the temp file
del tempfileXXX.txt
pause
You have to run this script.bat in the folder where you need to perform the replacement. When run, the script asks first for the string (tag) to be replaced and then for the string to replace it with. Remember that when you use the "<" and ">" signs to write a tag, you must enter the special "^" character before every "<" sign. Used this way, the script can, for example, replace all html tags with php tags in all .html files in your folder.
Additional problems, though not big ones:
you must rerun the script for closing tags, but you can now program that on your own
after the script copies line by line, you may notice that it removes newline characters (for /f skips empty lines), but it is not a huge problem to work out how to fix that
I believe the script above could be made smarter so that it doesn't need a temp file; more experienced programmers could comment below this post on this issue.
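On the question of avoiding the temp file, one possible approach is to read each file fully into memory and write it back in place. The Python sketch below is an illustration under assumptions, not a drop-in replacement for the batch script. It assumes the files are small enough to load whole and that the old and new strings can be encoded as UTF-8. The name replace_in_html_files is made up. Working on raw bytes means line endings and blank lines are left exactly as they were.

```python
from pathlib import Path


def replace_in_html_files(root, old, new, encoding="utf-8"):
    """Replace `old` with `new` in every .html file under root (recursively)."""
    old_bytes = old.encode(encoding)
    new_bytes = new.encode(encoding)
    changed = []
    for path in Path(root).rglob("*.html"):
        data = path.read_bytes()
        if old_bytes in data:
            # Rewrite the file in place; untouched bytes keep their line endings.
            path.write_bytes(data.replace(old_bytes, new_bytes))
            changed.append(path)
    return changed
```

Calling replace_in_html_files(".", "<b>", "<strong>") and then again with "</b>" and "</strong>" would handle opening and closing tags in two passes. No "^" escaping is needed here because the strings never pass through cmd.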
So I am just playing with batch files and was curious whether it is possible to create a batch file that opens the Google Chrome browser and, without my typing into the search box, puts a variable from my batch file into the search box. Does anyone know if that's possible? Thanks.
@echo off
cd c:\program files (x86)\google\application
start chrome.exe www.youtube.com
I can open the web browser, and I can even change the code to store the variable, but I need to know how to send that variable to the search engine. YouTube is just the website I left it at.
If you use Google as your search engine, try passing the keyword like this:
@echo off
start "" chrome.exe www.google.com#q=batch
If you want to add more than one keyword, just join them with the + sign:
@echo off
start "" chrome.exe www.google.com#q=batch+vbscript+HTA
You could send a raw HTTP request as follows:
start "" "c:\Program Files (x86)\Google\Chrome\Application\chrome.exe" "https://www.youtube.com/results?search_query=simon's cat"
Below is one possible way to make it more readable in a batch script (the approach is not systematic: for example, the site variable joins the protocol and host name together). Use the engine value appropriate for the particular host, as it can vary between servers:
@ECHO OFF >NUL
SETLOCAL EnableExtensions EnableDelayedExpansion
set "chromepath=c:\Program Files (x86)\Google\Chrome\Application" & rem path to chrome
set "site=https://www.youtube.com"
set "engine=results?search_query"
set "search=simon's cat" & rem string to search
start "" "!chromepath!\chrome.exe" "!site!/!engine!=!search!"
Delayed expansion is used because any variable in the code above could contain characters that are special to cmd.
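Special characters matter inside the URL as well as inside cmd: spaces, apostrophes and ampersands in the search string should be percent-encoded. As a hedged illustration outside batch, the Python sketch below builds the same site/engine/search URL with the standard urllib.parse.quote_plus function. The names build_search_url and open_search are hypothetical, and webbrowser.open uses the system default browser, which is not necessarily Chrome.

```python
import webbrowser
from urllib.parse import quote_plus


def build_search_url(query, site="https://www.youtube.com",
                     engine="results?search_query"):
    """Join site, engine and the percent-encoded query into one URL."""
    return f"{site}/{engine}={quote_plus(query)}"


def open_search(query, **kwargs):
    """Open the search in the default browser (an assumption: may not be Chrome)."""
    return webbrowser.open(build_search_url(query, **kwargs))
```

For a different host, pass its own site and engine values, for example site="https://www.google.com" and engine="search?q".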
If you want the simplest alternative, you can just copy the full link and paste it in:
@echo off
start chrome "youtube.com/results?search_query=funny+videos"
To open another tab, separate first address with a space and type "google.com" next to it.
If you want to open another browser, such as Firefox, at the same time, type on a new line: start firefox "stackoverflow.com"
It'll look something like this:
@echo off
start chrome "youtube.com/results?search_query=funny+videos" "google.com"
start firefox "stackoverflow.com"
@echo off
set tmp="%*"
IF %tmp% == "" (
GOTO :query
) ELSE (
GOTO :replace
)
:replace
set url=%*
REM set url=%url: =+%
echo %url%
GOTO :search
:query
set /p url=Input search Keywords:
GOTO :search
:search
echo Search query confirmed: %*
echo Attaching to process..
tasklist /nh|findstr "chrome.exe" && start "" "chrome.exe" "? %url%"
REM tasklist /nh|findstr "chrome.exe" && start "" "chrome.exe" "www.google.com/search?q=%url%"
Here is the batch file I use for accomplishing this.
It will attach the search tab to an open process of chrome and search for %* arguments.
If you don't pass arguments, it will ask for some.
> url installing pycharm on ubuntu
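There is a commented-out method as well that replaces spaces with '+' and then searches with a raw HTTP URL instead of the "? %url%" option.
To switch, delete REM from the two commented-out lines (the set url=%url: =+% line and the last start line) and delete the start line that uses "? %url%".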
@echo off
echo Welcome to my Search Engine!
echo Type 1 Keyword to Search. Use +s instead of spaces
set/p "keyw=Keyword is "
start "" "https://www.google.com/search?q=%keyw%&sourceid=ie7&rls=com.microsoft:en-US:IE-Address&ie=&oe="
I have multiple csv files that I need to search for special characters such as the exclamation mark (!). If the character is found, I want to delete the information between the commas, using a .bat file. The email address always seems to be where people screw up. Example: 233dd123dde3,Valid,boxer,Nov-13,Philip Smith,andrew!@myaxxus.net,16666
A suggestion with sed for Windows:
sed -i.bak "s/[^,]*![^,]*//" *.csv
That seems like a drastic measure to completely drop the entire value if a single character exists, but it can be done.
Note that you must account for the fact that the first value does not have a leading comma, and the last value does not have a trailing comma.
This solution will not properly handle quoted values containing commas.
I'm using a hybrid JScript/batch utility called REPL.BAT that performs a regex search and replace on stdin and writes the result to stdout. It is pure script that works on any modern Windows from XP onward, with no 3rd party executable required. Full documentation is embedded within the utility.
Assuming that REPL.BAT is in your current directory, or better yet, somewhere within your path:
@echo off
for %%F in (*.csv) do (
type "%%F" | repl "(^|,)[^,]*![^,]*(,|$)" "$1$2" >"%%F.new"
move /y "%%F.new" "%%F" >nul
)
EDIT
Now that I see Endoro's sed solution, I realize that the default greedy match means you don't have to explicitly match the commas. The following simpler regex works just as well:
@echo off
for %%F in (*.csv) do (
type "%%F" | repl "[^,]*![^,]*" "" >"%%F.new"
move /y "%%F.new" "%%F" >nul
)
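As noted above, the regex approaches do not properly handle quoted values that contain commas. If that case matters, a CSV-aware parser is one way around it. The following is a minimal Python sketch using the standard csv module, offered as an alternative technique rather than as what REPL.BAT or sed do. The function name blank_fields_with is invented for this example, and it works on text already read from a file.

```python
import csv
import io


def blank_fields_with(text, bad_char="!"):
    """Return CSV text in which every field containing bad_char is emptied."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # A quoted field such as "x, y" arrives here as one value, commas included.
        writer.writerow(["" if bad_char in field else field for field in row])
    return out.getvalue()
```

To process a whole file, read it with open(name, newline="", encoding="utf-8"), pass the text through the function, and write the result to a new file before replacing the original, much like the .new and move steps above.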
I have a process that downloads a file from a web browser. It always has the same name (I can't change that), so each file gets downloaded as file([latestnumber]).
So in this directory I have:
joe.pdf
joe(1).pdf
joe(2).pdf
etc . . .
I would now like a script to take the "latest file" (joe(2).pdf in this case) and copy it to another directory.
Something like GetLatestFile("joe") and copy to "X:\mydirectory".
Can anyone think of an easy way to do this?
Do you have a preference as to what language you write your script in?
I wouldn't go by the name of the file, I'd choose whatever scripting language you are going to use, loop through the directory and look at the file attributes for each file to pick out the latest one, then move it to your target directory. This would be fairly trivial in a .NET console application with the classes available in the System.IO namespace. (namely the DirectoryInfo, FileInfo and File classes)
Try this: XCOPY C:\BATCH\*.* C:\UPLOAD /M
Put the code in a text file and rename it as whateveryouwant.bat and execute.
Be sure to edit the source and destination folder to your liking.
Is this what you're looking for ?
So, as it is enough to get the latest filename sorted by date, I suggest something like:
#echo off & setLocal enabledelayedexpansion
for /f "tokens=* delims= " %%a in ('dir /b/a-d/o-d') do (
set N=%%~Fa
goto :done
)
:done
echo !N!
Replace the last echo command with the "copy ..." command, or whatever you want to do with the newest file.
HTH!
Edit> If the files are not in the current directory, change the "dir" command accordingly
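The idea suggested earlier in this thread, picking the newest file by its attributes rather than by its name, can also be sketched in Python. This is a minimal illustration under assumptions: copy_latest is a made-up name, "latest" is taken to mean the most recent modification time, and the pattern and directories are placeholders.

```python
import shutil
from pathlib import Path


def copy_latest(source_dir, pattern, target_dir):
    """Copy the most recently modified file matching pattern into target_dir."""
    candidates = [p for p in Path(source_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Choose by modification time, not by the number in the file name.
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(latest, target))
```

For the example in the question, this could be called as copy_latest(r"C:\downloads", "joe*.pdf", r"X:\mydirectory"), where the source directory is a guess at your setup.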
This uses sed and regular expressions:
http://gnuwin32.sourceforge.net/packages/sed.htm
It generates a bat file that does the job.
I've put the bat file in c:\crp so that it doesn't itself become the latest file.
As a demonstration, I've created a latest file called latestfile.txt.
You can see the line that generates copyit.bat, and you can amend it so the file goes exactly where you want.
C:\>md c:\crp
C:\>copy /y con latestfile.txt
fgfdgd^Z
1 file(s) copied.
C:\>dir /o-d/a-d/b | find /N /V "QWERTY" | find "[1]" | sed -e s/\[1\]\(.*\)/copy\d32\1\d32c:\\newdir/>c:\crp\copyit.bat
C:\>type c:\crp\copyit.bat
copy latestfile.txt c:\newdir