Run R silently from command line, export results to JSON - json

How might I call an R script from the shell (e.g. from Node.js exec) and export results as JSON (e.g. back to Node.js)?
The R code below basically works. It reads data, fits a model, converts the parameter estimates to JSON, and prints them to stdout:
#!/usr/bin/Rscript --quiet --slave
install.packages("cut", repos="http://cran.rstudio.com/");
install.packages("Hmisc", repos="http://cran.rstudio.com/");
install.packages("rjson", repos="http://cran.rstudio.com/");
library(rjson)
library(reshape2);
data = read.csv("/data/records.csv", header = TRUE, sep=",");
mylogit <- glm( y ~ x1 + x2 + x3, data=data, family="binomial");
params <- melt(mylogit$coefficients);
json <- toJSON(params);
json
Here's how I'd like to call it from Node...
var exec = require('child_process').exec;
exec('./model.R', function(err, stdout, stderr) {
var params = JSON.parse(stdout); // FAIL! Too much junk in stdout
});
Except the R process won't stop printing to stdout. I've tried --quiet --slave --silent which all help a little but not enough. Here's what's sent to stdout:
The downloaded binary packages are in
/var/folders/tq/frvmq0kx4m1gydw26pcxgm7w0000gn/T//Rtmpyk7GmN/downloaded_packages
The downloaded binary packages are in
/var/folders/tq/frvmq0kx4m1gydw26pcxgm7w0000gn/T//Rtmpyk7GmN/downloaded_packages
[1] "{\"value\":[4.04458733165933,0.253895751245782,-0.1142272181932,0.153106007464742,-0.00289013062471735,-0.00282580664375527,0.0970325223603164,-0.0906967639834928,0.117150317941983,0.046131890754108,6.48538603593323e-06,6.70646151749708e-06,-0.221173770066275,-0.232262366060079,0.163331098409235]}"
What's the best way to use R scripts on the command line?
Running R --silent --slave CMD BATCH model.R per the post below still results in a lot of extraneous text printed to model.Rout:
Run R script from command line

Those options only stop R's own system messages from printing, they won't stop another R function doing some printing. Otherwise you'll stop your last line from printing and you won't get your json to stdout!
Those messages are coming from install.packages, so try:
install.packages(-whatever-, quiet=TRUE)
which claims to reduce the amount of output. If it reduces it to zero, job done.
If not, then you can redirect stdout with sink, or run things inside capture.output.

Related

Connection issues in Storage trigger GCF

For my application, new file uploaded to storage is read and the data is added to a main file. The new file contains 2 lines, one a header and other an array whose values are separated by a comma. The main file will need maximum of 265MB. The new files will have maximum of 30MB.
def write_append_to_ecg_file(filename,ecg,patientdata):
file1 = open('/tmp/'+ filename,"w+")
file1.write(":".join(patientdata))
file1.write('\n')
file1.write(",".join(ecg.astype(str)))
file1.close()
def storage_trigger_function(data,context):
#Download the segment file
download_files_storage(bucket_name,new_file_name,storage_folder_name = blob_path)
#Read the segment file
data_from_new_file,meta = read_new_file(new_file_name, scale=1, fs=125, include_meta=True)
print("Length of ECG data from segment {} file {}".format(segment_no,len(data_from_new_file)))
os.remove(new_file_name)
#Check if the main ecg_file_exists
file_exists = blob_exists(bucket_name, blob_with_the_main_file)
print("File status {}".format(file_exists))
data_from_main_file = []
if ecg_file_exists:
download_files_storage(bucket_name,main_file_name,storage_folder_name = blob_with_the_main_file)
data_from_main_file,meta = read_new_file(main_file_name, scale=1, fs=125, include_meta=True)
print("ECG data from main file {}".format(len(data_from_main_file)))
os.remove(main_file_name)
data_from_main_file = np.append(data_from_main_file,data_from_new_file)
print("data after appending {}".format(len(data_from_main_file)))
write_append_to_ecg_file(main_file,data_from_main_file,meta)
token = upload_files_storage(bucket_name,main_file,storage_folder_name = main_file_blob,upload_file = True)
else:
write_append_to_ecg_file(main_file,data_from_new_file,meta)
token = upload_files_storage(bucket_name,main_file,storage_folder_name = main_file_blob,upload_file = True)
The GCF is deployed
gcloud functions deploy storage_trigger_function --runtime python37 --trigger-resource patch-us.appspot.com --trigger-event google.storage.object.finalize --timeout 540s --memory 8192MB
For the first file, I was able to read the file and write the data to the main file. But after uploading the 2nd file, its giving Function execution took 70448 ms, finished with status: 'connection error' On uploading the 3rd file, it gives the Function invocation was interrupted. Error: memory limit exceeded. Despite of deploying the function with 8192MB memory, I am getting this error. Can I get some help on this.

How to Deploy model using plumber API from CMD line R?

New to using plumber API, trying to deploy R model, I have saved the R Model and a test data (OneRecord). Ran the plumber API from CMD line, 127.0.0.1:8000 returns Error "{"error":["500 - Internal server error"]}"
and the terminal shows an error of "simpleError in if (opts$show.learner.output) identity else capture.output: argument is of length zero"
My R Code
#plumb_test.R
library(plumber)
#Simple msg command
#* #apiTitle Plumber Example API
#* Echo back the input
#* #param msg The message to echo
#* #get /echo
function(msg=""){
list(msg = paste0("The message is: '", msg, "'"))
}
#My Model
#* #get /run
function(){
rf_prediction <- predict(readRDS("rf_unwrap.rds"), newdata = as.data.frame(readRDS("Test_data.Rds")))
rf_prediction$data
}
R Code for plumber run
library(plumber)
pr <- plumb("plumb_test.R")
pr$run(port=8000)
msg working properly
http://127.0.0.1:8000/echo?msg=hellohru
returns me
{"msg":["The message is: 'hellohru'"]}
But my model returns
{"error":["500 - Internal server error"]}
in the terminal I am getting
> pr$run(port=8000)
Starting server to listen on port 8000
<simpleError in if (opts$show.learner.output) identity else capture.output: argument is of length zero>
I am running from windows cmd line as follows
C:\R\R-3.5.2\bin>r -f plumb_run.R
All the files were in bin folder (model, test data, plumb R scripts)
Expecting the output of prediction, not sure what the error means.
loading mlr library along with plumber and using print in the function, everything worked
library(plumber)
library(mlr)
#My Model
#* #get /run
function(){
print(predict(readRDS("rf_unwrap.rds"), newdata = as.data.frame(readRDS("Test_data.Rds")))$data)
}

How do I find out what's buffering the communication from qemu to pexpect?

I have a Python2 program that runs qemu with a FreeBSD image.
expect()ing lines out output works.
However, expect()ing output that does not have its line terminated (such as when waiting for a prompt like login:) does not, this times out.
I suspect something in the communication between qemu and my program is doing line buffering, but how do I find out which of them it is? Candidates that I can think of:
FreeBSD itself. I find that unlikely, it shows prompts when running interactively, and qemu's -nographics options shouldn't make a difference for the emulated VM (but I may be wrong).
Something in the setup of the pty. I have zero experience with ptys. If that's the issue, this would be a bug in pexpect since pexpect is setting the pty up.
A bug in pexpect.
Something in my own script... but I have no clue what that could be.
For reference, here's the stripped-down code (including download and unpack, should anybody want to play with it):
#! /usr/bin/env python2
import os
import pexpect
import re
import sys
import time
def run(cmd):
'''Run command, log to stdout, no timeout, return the status code.'''
print('run: ' + cmd)
(output, rc) = pexpect.run(
cmd,
withexitstatus=1,
encoding='utf-8',
logfile=sys.stdout,
timeout=None
)
if rc != 0:
print('simple.py: Command failed with return code: ' + rc)
exit(rc)
download_path = 'https://download.freebsd.org/ftp/releases/VM-IMAGES/12.0-RELEASE/amd64/Latest'
image_file = 'FreeBSD-12.0-RELEASE-amd64.qcow2'
image_file_xz = image_file + '.xz'
if not os.path.isfile(image_file_xz):
run('curl -o %s %s/%s' % (image_file_xz, download_path, image_file_xz))
if not os.path.isfile(image_file):
# Reset image file to initial state
run('xz --decompress --keep --force --verbose ' + image_file_xz)
#cmd = 'qemu-system-x86_64 -snapshot -monitor none -display curses -chardev stdio,id=char0 ' + image_file
cmd = 'qemu-system-x86_64 -snapshot -nographic ' + image_file
print('interact with: ' + cmd)
child = pexpect.spawn(
cmd,
timeout=90, # FreeBSD takes roughly 60 seconds to boot
maxread=1,
)
child.logfile = sys.stdout
def expect(pattern):
result = child.expect([pexpect.TIMEOUT, pattern])
if result == 0:
print("timeout: %d reached when waiting for: %s" % (child.timeout, pattern))
exit(1)
return result - 1
if False:
# This does not work: the prompt is not visible, then timeout
expect('login: ')
else:
# Workaround, tested to work:
expect(re.escape('FreeBSD/amd64 (freebsd)')) # Line before prompt
time.sleep(1) # MUCH longer than actually needed, just to be safe
child.sendline('root')
# This will always time out, and terminate the script
expect('# ')
print('We want to get here but cannot')

Using the LDAvis package in R to create a gist file of the result

I'm using LDAvis for topic modeling and trying to use the as.gist option to create a gist. When serVis executes there is a timeout in curl::curl_fetch_memory after about 10 seconds. If I immediately execute serVis again I get a different error 'Problems parsing JSON' and from then on whenever serVis is run that same error recurs.
If I start all over with a fresh workspace the same behavior occurs. The first time serVis is run, curl::curl_fetch_memory times out after about 10 seconds. Subsequent executions return 'Problems parsing JSON'.
If I don't use the as.gist option it works fine, but of course doesn't create a gist.
Very rarely, it works and a gist is created. If I change parameters to reduce the size of the JSON object it usually works, which makes me think it may be related to size.
I have explored the various RCurlOptions timeout settings. Currently, they are set as
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem",
package = "RCurl"),
connecttimeout = 300, timeout = 3000,
followlocation = TRUE, dns.cache.timeout = 300))
Below is a console listing with debug set on curl::curl_fetch_memory.
> json <- createJSON(phi = cases$phi,
+ theta = cases$theta,
+ doc.len .... [TRUNCATED]
> serVis(json, open.browser = TRUE, as.gist = TRUE, description = 'APM Community')
debugging in: curl::curl_fetch_memory(url, handle = handle)
debug: {
output <- .Call(R_curl_fetch_memory, url, handle)
res <- handle_response_data(handle)
res$content <- output
res
}
Browse[2]> output <- .Call(R_curl_fetch_memory, url, handle)
Error: Timeout was reached
Browse[2]> output <- .Call(R_curl_fetch_memory, url, handle)
Browse[2]> rawToChar(output)
[1] "{\"message\":\"Problems parsing JSON\",\"documentation_url\":\"https://developer.github.com/v3\"}"
Browse[2]>
.
.
exiting from: curl::curl_fetch_memory(url, handle = handle)
Error: Problems parsing JSON
Any hints on how to debug this problem?

How to get list of changed files since last build in Jenkins/Hudson

I have set up Jenkins, but I would like to find out what files were added/changed between the current build and the previous build. I'd like to run some long running tests depending on whether or not certain parts of the source tree were changed.
Having scoured the Internet I can find no mention of this ability within Hudson/Jenkins though suggestions were made to use SVN post-commit hooks. Maybe it's so simple that everyone (except me) knows how to do it!
Is this possible?
I have done it the following way. I am not sure if that is the right way, but it seems to be working. You need to get the Jenkins Groovy plugin installed and do the following script.
import hudson.model.*;
import hudson.util.*;
import hudson.scm.*;
import hudson.plugins.accurev.*
def thr = Thread.currentThread();
def build = thr?.executable;
def changeSet= build.getChangeSet();
changeSet.getItems();
ChangeSet.getItems() gives you the changes. Since I use accurev, I did List<AccurevTransaction> accurevTransList = changeSet.getItems();.
Here, the modified list contains duplicate files/names if it has been committed more than once during the current build window.
The CI server will show you the list of changes, if you are polling for changes and using SVN update. However, you seem to want to be changing the behaviour of the build depending on which files were modified. I don't think there is any out-of-the-box way to do that with Jenkins alone.
A post-commit hook is a reasonable idea. You could parameterize the job, and have your hook script launch the build with the parameter value set according to the changes committed. I'm not sure how difficult that might be for you.
However, you may want to consider splitting this into two separate jobs - one that runs on every commit, and a separate one for the long-running tests that you don't always need. Personally I prefer to keep job behaviour consistent between executions. Otherwise traceability suffers.
echo $SVN_REVISION
svn_last_successful_build_revision=`curl $JOB_URL'lastSuccessfulBuild/api/json' | python -c 'import json,sys;obj=json.loads(sys.stdin.read());print obj["'"changeSet"'"]["'"revisions"'"][0]["'"revision"'"]'`
diff=`svn di -r$SVN_REVISION:$svn_last_successful_build_revision --summarize`
You can use the Jenkins Remote Access API to get a machine-readable description of the current build, including its full change set. The subtlety here is that if you have a 'quiet period' configured, Jenkins may batch multiple commits to the same repository into a single build, so relying on a single revision number is a bit naive.
I like to keep my Subversion post-commit hooks relatively simple and hand things off to the CI server. To do this, I use wget to trigger the build, something like this...
/usr/bin/wget --output-document "-" --timeout=2 \
https://ci.example.com/jenkins/job/JOBID/build?token=MYTOKEN
The job is then configured on the Jenkins side to execute a Python script that leverages the BUILD_URL environment variable and constructs the URL for the API from that. The URL ends up looking like this:
https://ci.example.com/jenkins/job/JOBID/BUILDID/api/json/
Here's some sample Python code that could be run inside the shell script. I've left out any error handling or HTTP authentication stuff to keep things readable here.
import os
import json
import urllib2
# Make the URL
build_url = os.environ['BUILD_URL']
api = build_url + 'api/json/'
# Call the Jenkins server and figured out what changed
f = urllib2.urlopen(api)
build = json.loads(f.read())
change_set = build['changeSet']
items = change_set['items']
touched = []
for item in items:
touched += item['affectedPaths']
Using the Build Flow plugin and Git:
final changeSet = build.getChangeSet()
final changeSetIterator = changeSet.iterator()
while (changeSetIterator.hasNext()) {
final gitChangeSet = changeSetIterator.next()
for (final path : gitChangeSet.getPaths()) {
println path.getPath()
}
}
With Jenkins pipelines (pipeline supporting APIs plugin 2.2 or above), this solution is working for me:
def changeLogSets = currentBuild.changeSets
for (int i = 0; i < changeLogSets.size(); i++) {
def entries = changeLogSets[i].items
for (int j = 0; j < entries.length; j++) {
def entry = entries[j]
def files = new ArrayList(entry.affectedFiles)
for (int k = 0; k < files.size(); k++) {
def file = files[k]
println file.path
}
}
}
See How to access changelogs in a pipeline job.
Through Groovy:
<!-- CHANGE SET -->
<% changeSet = build.changeSet
if (changeSet != null) {
hadChanges = false %>
<h2>Changes</h2>
<ul>
<% changeSet.each { cs ->
hadChanges = true
aUser = cs.author %>
<li>Commit <b>${cs.revision}</b> by <b><%= aUser != null ? aUser.displayName : it.author.displayName %>:</b> (${cs.msg})
<ul>
<% cs.affectedFiles.each { %>
<li class="change-${it.editType.name}"><b>${it.editType.name}</b>: ${it.path} </li> <% } %> </ul> </li> <% }
if (!hadChanges) { %>
<li>No Changes !!</li>
<% } %> </ul> <% } %>
#!/bin/bash
set -e
job_name="whatever"
JOB_URL="http://myserver:8080/job/${job_name}/"
FILTER_PATH="path/to/folder/to/monitor"
python_func="import json, sys
obj = json.loads(sys.stdin.read())
ch_list = obj['changeSet']['items']
_list = [ j['affectedPaths'] for j in ch_list ]
for outer in _list:
for inner in outer:
print inner
"
_affected_files=`curl --silent ${JOB_URL}${BUILD_NUMBER}'/api/json' | python -c "$python_func"`
if [ -z "`echo \"$_affected_files\" | grep \"${FILTER_PATH}\"`" ]; then
echo "[INFO] no changes detected in ${FILTER_PATH}"
exit 0
else
echo "[INFO] changed files detected: "
for a_file in `echo "$_affected_files" | grep "${FILTER_PATH}"`; do
echo " $a_file"
done;
fi;
It is slightly different - I needed a script for Git on a particular folder...
So, I wrote a check based on jollychang.
It can be added directly to the job's exec shell script. If no files are detected it will exit 0, i.e. SUCCESS... this way you can always trigger on check-ins to the repository, but build when files in the folder of interest change.
But... If you wanted to build on-demand (i.e. clicking Build Now) with the changed from the last build.. you would change _affected_files to:
_affected_files=`curl --silent $JOB_URL'lastSuccessfulBuild/api/json' | python -c "$python_func"`
Note: You have to use Jenkins' own SVN client to get a change list. Doing it through a shell build step won't list the changes in the build.
It's simple, but this works for me:
$DirectoryA = "D:\Jenkins\jobs\projectName\builds" ####Jenkind directory
$firstfolder = Get-ChildItem -Path $DirectoryA | Where-Object {$_.PSIsContainer} | Sort-Object LastWriteTime -Descending | Select-Object -First 1
$DirectoryB = $DirectoryA + "\" + $firstfolder
$sVnLoGfIle = $DirectoryB + "\" + "changelog.xml"
write-host $sVnLoGfIle
I tried to add that to comments but code in comments is no way:
Just want to prettify code from heroin's answer:
def changedFiles = []
def changeLogSets = currentBuild.changeSets
for (entries in changeLogSets) {
for (entry in entries) {
for (file in entry.affectedFiles) {
echo "Found changed file: ${file.path}"
changedFiles += "${file.path}"
}
}
}
Keep in mind for some cases git plugin returns empty changeSet, like:
First run in newly created branch
'Build now' button build
Refer to https://issues.jenkins-ci.org/browse/JENKINS-26354 for more details.