How to manage multiple 'executable and datadir profiles' for parallelizing the launch of scrapers?


I am using Playwright and am having difficulty launching 4 scripts at the same time.
I have used these variables for the local browser:
let CHROMIUM_DATA_DIR = `/Users/yo/dataDir/datadir${this.cmd}`
let CHROMIUM_EXEC_PATH = `/Applications/Google-Chrome${this.cmd}.app/Contents/MacOS/Google Chrome`
I made four copies of the same data directory and the same executable, and just renamed the files/directories.
It does not work well. What would you recommend to quickly scale up the launch of the scrapers? How can I install several Chrome instances and manage a data directory for each (to save login sessions, etc.)?
Thanks

Since you are using Playwright, you can use persistent contexts.
You do not need to create your own data directories or executables by copying them. Simply pass the location of an empty directory when launching the browser, and Playwright will populate it itself, storing any session data.
I do not use Node.js, but just to give an idea, here is sample code in Python:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch_persistent_context(user_data_dir=r'C:\Users\me\Desktop\dir', headless=False)
    page = browser.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    browser.close()
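For the original goal of running four scrapers at once, a minimal sketch along these lines may help. It assumes Playwright's Python async API and gives each scraper its own persistent profile directory, since a Chromium user data directory is generally meant to be used by one browser instance at a time. The names profile_dirs, scrape and main, the profile-N folder layout, and the headless setting are illustrative assumptions, not part of the answer above.

```python
import asyncio
from pathlib import Path


def profile_dirs(base, count):
    """Create (if needed) and return one profile directory per scraper."""
    dirs = [Path(base) / f"profile-{i}" for i in range(count)]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs


async def scrape(playwright, user_data_dir, url):
    """Open url in a persistent context stored in user_data_dir; return the title."""
    context = await playwright.chromium.launch_persistent_context(
        user_data_dir=str(user_data_dir), headless=True
    )
    try:
        page = await context.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        # Closing the context writes session data back to user_data_dir.
        await context.close()


async def main(base, urls):
    """Run one scraper per URL concurrently, each with its own profile."""
    # Imported here so the directory helper works without Playwright installed.
    from playwright.async_api import async_playwright

    dirs = profile_dirs(base, len(urls))
    async with async_playwright() as p:
        return await asyncio.gather(
            *(scrape(p, d, u) for d, u in zip(dirs, urls))
        )
```

It could be started with something like asyncio.run(main("/Users/yo/dataDir", ["http://playwright.dev"] * 4)). Reusing the same base directory on later runs should keep each profile's login session, and no copies of the browser executable are needed.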

Related

How to pass directives to snappy_ec2 created clusters

We have a need to set some directives in the snappy config files for the various components (servers, locators, etc).
The snappy_ec2 scripts do a good job of creating all of the configs and keeping them in sync across the cluster, but I need to find a serviceable method to add directives to the auto-generated scripts.
What is the preferred method using this script?
Example: Add the following to the 'servers' file:
-gemfirexd.disable-getall-local-index=true
Or perhaps I should add these strings to an environments file such as
snappy-env.sh
TIA
-doug

Have you tried adding the directives directly in the servers (or locators or leads) file and placing this file under (SNAPPY_DIR)/ec2/deploy/home/ec2-user/snappydata/? The script would read the conf files under this dir at the time of launching the cluster.
You'll need to specify it for each server you want to launch, with the name of the server as shown below. See the 'Specifying properties' section in the README, if you have not already done so. For example:
{{SERVER_0}} -heap-size=4096m -locators={{LOCATOR_0}}:9999,{{LOCATOR_1}}:9888 -J-Dgemfirexd.disable-getall-local-index=true
{{SERVER_1}} -heap-size=4096m -locators={{LOCATOR_0}}:9999,{{LOCATOR_1}}:9888 -J-Dgemfirexd.disable-getall-local-index=true
If you want it to apply to all the servers, simply put it in snappy-env.sh as you mentioned (as SERVER_STARTUP_OPTIONS) and place the file under the directory mentioned above.
We could have read the conf files directly from (SNAPPY_DIR)/conf/ instead of making users copy it to above location, but we may release the ec2 scripts as a separate package, in future, so that the users do not have to download the entire distribution.
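If many servers need the same per-server line, the servers file could be generated rather than typed by hand. The sketch below only builds text in the format of the two example lines above; the function name server_lines and its parameters are hypothetical and are not part of the snappy_ec2 scripts.

```python
def server_lines(count, locators, extra_options, heap_size="4096m"):
    """Build one 'servers' conf line per server, in the format shown above."""
    locator_list = ",".join(locators)
    lines = []
    for i in range(count):
        # %-formatting keeps the literal double braces of the placeholders.
        parts = [
            "{{SERVER_%d}}" % i,
            "-heap-size=" + heap_size,
            "-locators=" + locator_list,
        ]
        parts.extend(extra_options)
        lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    for line in server_lines(
        2,
        ["{{LOCATOR_0}}:9999", "{{LOCATOR_1}}:9888"],
        ["-J-Dgemfirexd.disable-getall-local-index=true"],
    ):
        print(line)
```

The output would still have to be saved as the servers file in the deploy directory named in the answer.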

Where does GM_setValue store data?

Where does GM_setValue store its data in Chrome? I've tried to determine where the data is going but couldn't figure it out. I monitored with Process Monitor and saw that when I stored a value, Chrome was updating a chrome_iwoeoiifoi2h3iofhufsdfnvdf type of file. I opened that up with an SQLite browser, but the data was not there. I've looked at all the recently modified files trying to find the data but could not find it.
Latest Chrome/TM.

In Tampermonkey, GM_setValue() data is stored in a LevelDB database that can be found in the User Data Directory tree.
Once in Chrome's "User Data Directory" (EG: C:\Users\USER_JOE\AppData\Local\Google\Chrome\User Data\Default\),
navigate to the Local Extension Settings\dhdgffkkebhmkfjojejmpbldmpobfkfo folder.
(gcalenpjmijncebpfijmoaglllgpjagf for the Tampermonkey Beta.)
There you will find a LevelDB database, usually named CURRENT. You can manipulate it with tools like LevelDB JSON, but external support for LevelDB currently appears to be spotty and I did not find any working tools for Windows yet (might have to compile your own).
You can also use the Chrome Storage Area Explorer extension to explore the data.
As of Tampermonkey 4.3.6, you can see an individual script's data with the Storage tab in the built-in script editor: (if the 'Storage' tab is not visible, edit Tampermonkey Settings > General Config mode > Advanced)
OLD, Pre November-ish 2015:
Before, about November 2015, data was stored in a Web SQL database in databases\chrome-extension_dhdgffkkebhmkfjojejmpbldmpobfkfo_0.
Once you have navigated to the correct folder, you will typically see two files. On my machine, they are currently just named 4 and 6. These are both SQLite files (the backend for Chrome's Web SQL implementation) and can be inspected with a SQLite viewer/utility.
The (normally) larger file, 6 on my machine, is a somewhat disturbing list of 94-thousand userscripts! I'm not sure what purpose it serves, but haven't investigated it much.
The smaller file (initially, at least), 4 on my machine, is where all the information about/for your userscripts is kept. This includes any data set by GM_setValue().
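For the older Web SQL layout, the SQLite files could also be read from a script instead of a viewer. This is a minimal sketch using Python's standard sqlite3 module; the function name dump_table is hypothetical, the table name config is the one this answer refers to, and no column layout is assumed. It opens the file read-only, and it is probably safest to run it on a copy or with Chrome closed.

```python
import sqlite3
from pathlib import Path


def dump_table(db_path, table="config"):
    """Return every row of one table from an SQLite file, opened read-only."""
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    # A file: URI with mode=ro keeps the database from being modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        return connection.execute('SELECT * FROM "%s"' % table).fetchall()
    finally:
        connection.close()
```

Calling dump_table on the smaller of the two files would list the rows that hold the userscript settings and stored values.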
For example, if I install and run this userscript:
// ==UserScript==
// @name _GM_setValue demo
// @match https://stackoverflow.com/questions/*
// @grant GM_setValue
// ==/UserScript==
GM_setValue ('foo', 'bar');
And then I inspect the config table in file 4, I will see four entries like this:
(Screenshot: DB entries for sample script)
The one you want is the @st row. Notice how it has the GM_setValue data encoded? :

Protect Air application content

On Mac OS, I see that all the content of my application is readable (MXML and AS files).
Indeed, by right-clicking the application, you can see all of its content, and so all of its files.
So it's very dangerous for a company to distribute an AIR application like that.
Is there a solution to protect those files?
Thanks

It is not possible to protect your code 100%. After all, if the computer can run it, it can be decompiled, regardless of the language. However, you can make it more difficult.
One method is to encrypt the SWF as stated in another answer. But all the "attacker" needs to do is find the key, and then they can decrypt all your SWFs.
Another method is to use obfuscators. Obfuscators don't depend on encryption, nor do they prevent decompiling; they just make it harder to understand what gets decompiled.
For example, if you had a method called saveInvoice(), the obfuscator would rename it to aa1() or something like that, making it difficult to guess what that function does. It basically turns everything into spaghetti code.
You can use a decompiler to see what can be obtained from a SWF file (which is a lot), and play with obfuscators to see if they meet your expectations.
An example of one is http://www.kindi.com/, which I'm not endorsing by the way; it just shows up quickly on Google.

There are loads of decompilers that can read all your code. However, one person came up with an encryption solution that might be worth a try. (It's for desktop AIR applications.)
Have a look at this post: http://forums.adobe.com/message/3510525#3510525
Quoted text (in case the page is erased):
The method I use will allow you to encrypt most of your source code using
a key that is unique to every computer. The initial download of my
software is a simple AIR app that does not contain the actual program.
It is more like a shell that first retrieves a list of the client's MAC
addresses and the user-entered activation code that is created at time
of purchase. This is sent to the server and logged. The activation code
is saved to a file client side. At the server the MAC address and
activation key are used to create the encryption key. The bulk of the
program code is then encrypted using that key, then divided into parts
and sent back to the client. The client puts the parts back together
and saves the encrypted file. At runtime the shell finds the MAC
address list and the activation key, then using the same method as the server
gets the encryption key and decrypts the program file. Run a simple
check to make sure it loaded. For encryption I found an AES method that
works in PHP and JavaScript.
Next I use this code to load the program:
var loader = air.HTMLLoader.createRootWindow(true, options, true, windowBounds);
loader.cacheResponse=false;
loader.placeLoadStringContentInApplicationSandbox=true;
loader.loadString(page);
This method makes it very difficult to copy
to another computer, although since I wrote it I know there are some
weaknesses in the security, but to make it harder I obfuscated the shell
code. It at least keeps most from pirating. However, there are issues
with this that I have found. First, I was using networkInfo to get the
list of MAC addresses, but this failed on a test Windows XP computer.
When the wireless was off it did not return the MAC. I was not able
to recreate this in Vista or 7. Not sure if it could happen. It was not
tested on a Mac computer. To fix this (at least for Windows), I
wrote a simple bat file that gets the MAC list, then converted it to
an exe which is included. This does force you to create native
installers. Call the exe with this:
var nativeProcessStartupInfo = new air.NativeProcessStartupInfo();
var file = air.File.applicationDirectory.resolvePath("findmac.exe");
nativeProcessStartupInfo.executable = file;
process = new air.NativeProcess();
process.start(nativeProcessStartupInfo);
process.addEventListener(air.ProgressEvent.STANDARD_OUTPUT_DATA, onOutputData);
process.addEventListener(air.ProgressEvent.STANDARD_ERROR_DATA, onErrorData);
process.addEventListener(air.NativeProcessExitEvent.EXIT, onExit);
process.addEventListener(air.IOErrorEvent.STANDARD_OUTPUT_IO_ERROR, onIOError);
process.addEventListener(air.IOErrorEvent.STANDARD_ERROR_IO_ERROR, onIOError);
Put the list together in the onOutputData event using array.push and
continue in the onExit event. Using findmac.exe will return the
same info every time (that I know of). Beware, though, that using the
native install will break the standard application update process, so
you will have to write your own. My updates are processed the same way
as above. This is the contents of the .bat file to get the MAC list:
@Echo off
SETLOCAL
SET MAC =
SET Media = Connected
FOR /F "Tokens=1-2 Delims=:" %%a in ('ipconfig /all^| FIND "Physical Address"') do @echo %%b
ENDLOCAL
Using this method makes it simple to implement a try-before-you-buy
method. At runtime, if there is no activation code, get the try-me version from
the server instead of the full version.
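The quoted post does not say how the MAC addresses and the activation code are combined into an encryption key. As a hedged illustration only, the sketch below derives a 32-byte key with PBKDF2-HMAC-SHA256 from Python's standard hashlib; the function name derive_key, the salt, the separator characters, and the iteration count are all assumptions rather than the poster's method. The same function would have to run on the server and in the client shell so both arrive at the same key.

```python
import hashlib


def derive_key(mac_addresses, activation_code,
               salt=b"example-salt", iterations=100_000):
    """Derive a 32-byte key from a machine's MAC list and an activation code."""
    # Sort and lower-case so the order in which the OS reports adapters
    # does not change the resulting key.
    normalized = ",".join(sorted(m.strip().lower() for m in mac_addresses))
    material = (normalized + "|" + activation_code).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", material, salt, iterations, dklen=32)
```

A key tied to the full MAC list changes whenever an adapter is added or removed, which is related to the wireless-off failure described in the quote.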

How can I get a Windows batch or Perl script to run when a file is added to a directory?

I am trying to write a script that will parse a local file and upload its contents to a MySQL database. Right now, I am thinking that a batch script that runs a Perl script would work, but am not sure if this is the best method of accomplishing this.
In addition, I would like this script to run immediately when the data file is added to a certain directory. Is this possible in Windows?
Thoughts? Feedback? I'm fairly new to Perl and Windows batch scripts, so any guidance would be appreciated.

You can use Win32::ChangeNotify. Your script will be notified when a file is added to the target directory.

Checking a folder for newly created files can be implemented using the WMI functionality. Namely, you can create a Perl script that subscribes to the __InstanceCreationEvent WMI event that traces the creation of the CIM_DirectoryContainsFile class instances. Once that kind of event is fired, you know a new file has been added to the folder and can process it as you need.
These articles provide more information on the subject and contain VBScript code samples (hope it won't be hard for you to convert them to Perl):
How Can I Automatically Run a Script Any Time a File is Added to a Folder?
WMI and File System Monitoring

The function you want is ReadDirectoryChangesW. A quick search for a perl wrapper yields this Win32::ReadDirectoryChanges module.
Your script would look something like this:
use Win32::ReadDirectoryChanges;
$rdc = new Win32::ReadDirectoryChanges(path => $path,
                                       subtree => 1,
                                       filter => $filter);
while(1) {
    @results = $rdc->read_changes;
    while (scalar @results) {
        my ($action, $filename) = splice(@results, 0, 2);
        ... run script ...
    }
}

You can easily achieve this in Perl using File::ChangeNotify. This module is to be found on CPAN: http://search.cpan.org/dist/File-ChangeNotify/lib/File/ChangeNotify.pm
You can run the code as a daemon or as a service, make it watch one or more directories and then automatically execute some code (or start up a script) if some condition matches.
Best of all, it's cross-platform, so should you want to switch to a Linux machine or a Mac, it would still work.

It wouldn't be too hard to put together a small C# application that uses the FileSystemWatcher class to detect files being added to a folder and then spawn the required script. It would certainly use less CPU / system resources / hard disk bandwidth than polling the folder at regular intervals.
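For comparison, the polling approach that this answer contrasts with can be sketched with the Python standard library alone. The function names new_files and watch, the one-second interval, and the max_polls parameter are illustrative assumptions. The sketch is simpler than a notification API but rescans the folder on every pass.

```python
import os
import time


def new_files(directory, seen):
    """Return names that appeared since the last call and update seen in place."""
    current = set(os.listdir(directory))
    added = sorted(current - seen)
    seen.clear()
    seen.update(current)
    return added


def watch(directory, handler, interval=1.0, max_polls=None):
    """Call handler(path) for each file that appears in directory."""
    seen = set(os.listdir(directory))  # files already present are ignored
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        for name in new_files(directory, seen):
            handler(os.path.join(directory, name))
        polls += 1
```

A file can show up in the listing before its writer has finished, so a handler that parses the file may need to wait or retry.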

You need to consider what is a sufficient heuristic for determining "modified".
In increasing order of cost and accuracy:
file size (file content can still be changed as long as size is maintained)
file timestamp (If you aren't running ntpd time is not monotonic)
file sha1sum (bulletproof but expensive)
I would run ntpd, and then loop over the timestamps, and then compare the checksum if the timestamp changes. This can cover a lot of ground in little time.
These methods are not appropriate for a computer security application, they are for file management on a sane system.
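The tiered check described above can be sketched as follows: compare size first, then the timestamp, and compute the SHA-1 only when the timestamp differs. The function names and the dictionary layout of the stored snapshot are illustrative assumptions.

```python
import hashlib
import os


def sha1_of(path, chunk_size=65536):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(path):
    """Record size, modification time, and checksum for later comparison."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime": st.st_mtime_ns, "sha1": sha1_of(path)}


def is_modified(path, previous):
    """Cheap checks first; fall back to the checksum only when needed."""
    st = os.stat(path)
    if st.st_size != previous["size"]:
        return True   # size changed, so the content changed
    if st.st_mtime_ns == previous["mtime"]:
        return False  # same size and timestamp: assume unchanged
    return sha1_of(path) != previous["sha1"]
```

As the answer notes, the shortcut on an unchanged timestamp is a heuristic for file management, not a security check.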

Get the application's path

I've recently searched how I could get the application's directory in Java. I've finally found the answer but I've needed surprisingly long because searching for such a generic term isn't easy. I think it would be a good idea to compile a list of how to achieve this in multiple languages.
Feel free to up/downvote if you (don't) like the idea and please contribute if you like it.
Clarification:
There's a fine distinction between the directory that contains the executable file and the current working directory (given by pwd under Unix). I was originally interested in the former but feel free to post methods for determining the latter as well (clarifying which one you mean).

In Java the calls
System.getProperty("user.dir")
and
new java.io.File(".").getAbsolutePath();
return the current working directory.
The call to
getClass().getProtectionDomain().getCodeSource().getLocation().getPath();
returns the path to the JAR file containing the current class, or the CLASSPATH element (path) that yielded the current class if you're running directly from the filesystem.
Example:
Your application is located at
C:\MyJar.jar
Open the shell (cmd.exe) and cd to C:\test\subdirectory.
Start the application using the command java -jar C:\MyJar.jar.
The first two calls return 'C:\test\subdirectory'; the third call returns 'C:\MyJar.jar'.
When running from a filesystem rather than a JAR file, the result will be the path to the root of the generated class files, for instance
c:\eclipse\workspaces\YourProject\bin\
The path does not include the package directories for the generated class files.
A complete example to get the application directory without .jar file name, or the corresponding path to the class files if running directly from the filesystem (e.g. when debugging):
String applicationDir = getClass().getProtectionDomain().getCodeSource().getLocation().getPath();
if (applicationDir.endsWith(".jar"))
{
    applicationDir = new File(applicationDir).getParent();
}
// else we already have the correct answer

In .NET (C#, VB, …), you can query the current Assembly instance for its Location. However, this has the executable's file name appended. The following code sanitizes the path (using System.IO and using System.Reflection):
Directory.GetParent(Assembly.GetExecutingAssembly().Location)
Alternatively, you can use the information provided by AppDomain to search for referenced assemblies:
System.AppDomain.CurrentDomain.BaseDirectory
VB allows another shortcut via the My namespace:
My.Application.Info.DirectoryPath

In Windows, use the WinAPI function GetModuleFileName(). Pass in NULL for the module handle to get the path for the current module.

Python
import os
path = os.path.dirname(os.path.abspath(__file__))
That gets the directory containing the current module.
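To keep the distinction from the question explicit, here is a small sketch that separates the directory of a source file from the current working directory and from the interpreter's own path. It uses only pathlib and sys; the function names are illustrative assumptions.

```python
import sys
from pathlib import Path


def directory_of(file_path):
    """Directory containing the given file, with symlinks resolved."""
    return Path(file_path).resolve().parent


def working_directory():
    """Current working directory (what pwd prints under Unix)."""
    return Path.cwd()


def interpreter_path():
    """Path of the Python executable running the script, not of the script."""
    return Path(sys.executable)
```

Inside a module, directory_of(__file__) gives the folder the module lives in, which stays the same no matter which directory the program was started from.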

Objective-C Cocoa (Mac OS X, I don't know for iPhone specificities):
NSString * applicationPath = [[NSBundle mainBundle] bundlePath];

In Java, there are several ways to find a path for the application. One is to employ System.getProperty, which returns the current working directory:
System.getProperty("user.dir");
Another possibility, which also returns the current working directory, is the use of java.io.File:
new java.io.File("").getAbsolutePath();
Yet another possibility returns the location the current class was loaded from:
getClass().getProtectionDomain().getCodeSource().getLocation().getPath();

In VB6, you can get the application path using the App.Path property.
Note that this will not have a trailing \ EXCEPT when the application is in the root of the drive.
In the IDE:
?App.Path
C:\Program Files\Microsoft Visual Studio\VB98

In .Net you can use
System.IO.Directory.GetCurrentDirectory
to get the current working directory of the application, and in VB.NET specifically you can use
My.Application.Info.DirectoryPath
to get the directory of the exe.

Delphi
In Windows applications:
Unit Forms;
path := ExtractFilePath(Application.ExeName);
In console applications:
Independent of language, the first command line parameter is the fully qualified executable name:
Unit System;
path := ExtractFilePath(ParamStr(0));

Libc
In *nix type environment (also Cygwin in Windows):
#include <unistd.h>
char *getcwd(char *buf, size_t size);
char *getwd(char *buf); //deprecated
char *get_current_dir_name(void);
See man page

Unix
In Unix one can find the path to the executable that was started from the zeroth element of the argument vector (argv[0], or $0 in a shell script). It is not necessarily an absolute path, so you would need to combine the current working directory (in the shell: pwd) and/or the PATH variable with that value.
The value is of limited use in Unix though, as the executable can for example be called through a symbolic link, and only the initial link shows up in that value. In general, applications on Unix are not very robust if they use this for any interesting thing (such as loading resources). On Unix, it is common to use hard-coded locations for things, for example a configuration file in /etc where the resource locations are specified.
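The combination of the invocation name, the working directory, and PATH could look roughly like this in Python. The function name resolve_invocation is hypothetical; shutil.which performs the PATH lookup. Symbolic links are deliberately not followed, which mirrors the limitation described above.

```python
import os
import shutil


def resolve_invocation(argv0, cwd=None, path=None):
    """Best-effort absolute path for the name a program was started with."""
    if os.path.dirname(argv0):
        # The name has a directory part: interpret it relative to cwd.
        base = cwd if cwd is not None else os.getcwd()
        return os.path.abspath(os.path.join(base, argv0))
    # Bare command name: search PATH (or the given path string) for it.
    found = shutil.which(argv0, path=path)
    return os.path.abspath(found) if found else None
```

A script would typically call resolve_invocation(sys.argv[0]); wrapping the result in os.path.realpath would additionally resolve symlinks.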

In bash, the 'pwd' command returns the current working directory.

In PHP :
<?php
echo __DIR__; //same as dirname(__FILE__). will return the directory of the running script
echo $_SERVER["DOCUMENT_ROOT"]; // will return the document root directory under which the current script is executing, as defined in the server's configuration file.
echo getcwd(); //will return the current working directory (it may differ from the current script location).
?>

In Android, it's
getApplicationInfo().dataDir;
To get the SD card, I use
Environment.getExternalStorageDirectory();
Environment.getExternalStoragePublicDirectory(String type);
where the latter is used to store a specific type of file (Audio / Movies etc.). You have constants for these strings in the Environment class.
Basically, for anything to do with the app, use the ApplicationInfo class, and for anything to do with data on the SD card / external directory, use the Environment class.
Docs :
ApplicationInfo ,
Environment

In Tcl
Path of current script:
set path [info script]
Tcl shell path:
set path [info nameofexecutable]
If you need the directory of any of these, do:
set dir [file dirname $path]
Get current (working) directory:
set dir [pwd]

Java:
Only this works for me on all systems (Windows, Linux, Mac OS X):
public static File getApplicationDir()
{
    URL url = ClassLoader.getSystemClassLoader().getResource(".");
    File applicationDir = null;
    try {
        applicationDir = new File(url.toURI());
    } catch(URISyntaxException e) {
        applicationDir = new File(url.getPath());
    }
    return applicationDir;
}

In Ruby, the following snippet returns the directory of the current source file:
path = File.dirname(__FILE__)

In CFML there are two functions for accessing the path of a script:
getBaseTemplatePath()
getCurrentTemplatePath()
Calling getBaseTemplatePath returns the path of the 'base' script - i.e. the one that was requested by the web server.
Calling getCurrentTemplatePath returns the path of the current script - i.e. the one that is currently executing.
Both paths are absolute and contain the full directory+filename of the script.
To determine just the directory, use the function getDirectoryFromPath( ... ) on the results.
So, to determine the directory location of an application, you could do:
<cfset Application.Paths.Root = getDirectoryFromPath( getCurrentTemplatePath() ) />
Inside of the onApplicationStart event for your Application.cfc
To determine the path where the app server running your CFML engine is at, you can access shell commands with cfexecute, so (bearing in mind above discussions on pwd/etc) you can do:
Unix:
<cfexecute name="pwd"/>
For Windows, create a pwd.bat containing the text @cd, then:
<cfexecute name="C:\docume~1\myuser\pwd.bat"/>
(Use the variable attribute of cfexecute to store the value instead of outputting to screen.)

In cmd (the Microsoft command line shell)
You can get the name of the script with %0 (may be relative to pwd).
This gets the directory of the script:
set oldpwd=%cd%
cd %0\..
set app_dir=%cd%
cd %oldpwd%
If you find any bugs (which you will), then please fix or comment.

I released https://github.com/gpakosz/whereami which solves the problem in C and gives you:
the path to the current executable
the path to the current module (differs from path to executable when calling from a shared library).
It uses GetModuleFileNameW on Windows, parses /proc/self/maps on Linux and Android and uses _NSGetExecutablePath or dladdr on Mac and iOS.

Note on answer 20 above, regarding Mac OS X only: if an executable JAR is turned into an "app" via the OS X Jar Bundler, then getClass().getProtectionDomain().getCodeSource().getLocation(); will NOT return the current directory of the app, but will add the internal directory structure of the app to the response. This internal structure of an app is /theCurrentFolderWhereTheAppReside/Contents/Resources/Java/yourfile
Perhaps this is a little bug in Java. Anyway, one must use method one or two to get the correct answer, and both will deliver the correct answer even if the app is started, e.g., via a shortcut located in a different folder or on the desktop.
carl
SoundPimp.com

We Keep Coding
