Save R plot to web server - MySQL

I'm trying to create a procedure which extracts data from a MySQL server (using the RODBC package), performs some statistical routines on that data in R, then saves the generated plots back to the server so that they can be retrieved in a web browser via a little bit of PHP and web magic.
My plan is to save the plot in a MySQL BLOB field by using the RODBC package to execute a SQL INSERT INTO statement. I think I can insert the data directly as a string. The problem is: how do I get that data string, and will this even work? My best thought is to use the savePlot function to save a temp file and then read it back in somehow.
Anybody tried this before or have suggestions on how to approach this?

Regardless of whether you think this is a terrible idea, here is a working answer I was able to piece together from this post:
## open connection
library(RODBC)
channel <- odbcConnect("")
## generate a plot and save it to a temp file
x <- rnorm(100,0,1)
hist(x, col="light blue")
savePlot("temp.jpg", type="jpeg")
## read back in the temp file as binary
plot_binary <- paste(readBin("temp.jpg", what="raw", n=1e6), collapse="")
## insert it into a table
sqlQuery(channel, paste("insert into test values (1, x'",plot_binary,"')", sep=""))
## close connection
odbcClose(channel)
Before implementation, I'll make sure to do some soul searching to decide whether this approach should be used rather than the server's file system.

Storing images in databases is often frowned upon. To create an in-memory file in R you can use a textConnection (or, since an image is binary data, a rawConnection). That will give you the string. It will work as long as you don't forget to set the proper MIME type and treat the data as binary.
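For the binary case specifically, here is a rough sketch of building the hex string in memory (note the plot itself still has to be rendered through a file-based graphics device, since base R devices write to paths rather than connections; the temp file is removed immediately afterwards):
## render the plot to a throwaway file, then pull the bytes into memory
tmp <- tempfile(fileext = ".png")
png(tmp)
hist(rnorm(100), col = "light blue")
dev.off()
plot_raw <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
file.remove(tmp)
## each raw byte prints as two hex digits, so this yields the body of a MySQL x'...' literal
plot_hex <- paste(plot_raw, collapse = "")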

Saving the plot to the server's file system and writing the filename into the database will also work. There's also a project called RApache that may help. Plus, Jeroen Ooms has some online demos, including a web interface for Hadley Wickham's famous R graphics package ggplot2.
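If you go the file-system route, a minimal sketch of the idea looks like this (the DSN, directory, and table/column names below are made up for illustration; it assumes the web server can read the directory R writes to):
library(RODBC)
channel <- odbcConnect("mydsn")  # hypothetical DSN
## save the plot under the web server's document root
fname <- paste0("plot_", format(Sys.time(), "%Y%m%d%H%M%S"), ".png")
png(file.path("/var/www/html/plots", fname))
hist(rnorm(100), col = "light blue")
dev.off()
## store only the filename; the PHP side serves the image directly
sqlQuery(channel, paste0("insert into plots (fname) values ('", fname, "')"))
odbcClose(channel)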

Related

How do I download gridded SST data?

I've recently been introduced to R and am trying out the heatwaveR package. I get an error when loading ERDDAP data ... Here's the code I have used so far:
library(rerddap)
library(ncdf4)
info(datasetid = "ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon", url = "https://www.ncei.noaa.gov/erddap/")
And I get the following error:
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) :
schannel: next InitializeSecurityContext failed: SEC_E_INVALID_TOKEN (0x80090308) - The token supplied to the function is invalid
I would like some help with this. I'm new to this website too, so I apologize if the above question is not up to standards (code to be typed in a grey box, etc.).
Someone directed this post to my attention from the heatwaveR issues page on GitHub. Here is the answer I provided for them:
I do not manage the rerddap package, so I can't say exactly why it may be giving you this error. But I can say that I have noticed lately that the OISST data are often not available on the ERDDAP server in question. I (attempt to) download fresh data every day and am often denied with an error similar to the one you posted. It's gotten to the point where I had to insert some logic gates into my download script so that it tells me the data aren't currently being hosted before it tries to download them. I should also point out that one may download the "final" data from this server, which have roughly a two-week delay from the present day, as well as the "preliminary (prelim)" data, which are near-real-time but haven't gone through all of the QC steps yet. These two products are accounted for in the following code:
# First download the list of data products on the server
server_data <- rerddap::ed_datasets(which = "griddap", "https://www.ncei.noaa.gov/erddap/")$Dataset.ID
# Check if the "final" data are currently hosted
if(!"ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon" %in% server_data)
  stop("Final data are not currently up on the ERDDAP server")
# Check if the "prelim" data are currently hosted
if(!"ncdc_oisst_v2_avhrr_prelim_by_time_zlev_lat_lon" %in% server_data)
  stop("Prelim data are not currently up on the ERDDAP server")
If the data are available I then check the times/dates available with these two lines:
# Download final OISST meta-data
final_info <- rerddap::info(datasetid = "ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon", url = "https://www.ncei.noaa.gov/erddap/")
# Download prelim OISST meta-data
prelim_info <- rerddap::info(datasetid = "ncdc_oisst_v2_avhrr_prelim_by_time_zlev_lat_lon", url = "https://www.ncei.noaa.gov/erddap/")
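(To pull the available time range out of those metadata objects, something along these lines should work, assuming the info objects expose the dimension attributes under $alldata as in current rerddap versions:)
# The "actual_range" attribute of the time dimension shows the dates currently served
final_info$alldata$time$value[final_info$alldata$time$attribute_name == "actual_range"]
prelim_info$alldata$time$value[prelim_info$alldata$time$attribute_name == "actual_range"]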
I ran this now and it looks like the data are currently available. Is your error from today, or from a day or two ago? The availability seems to cycle over the week, but I haven't quite made sense of any pattern yet. It is also important to note that about a day before the data go dark they are filled with all sorts of massive errors. So I've also had to add error trapping into my code that stops the data aggregation process once it detects temperatures in excess of some massive number. In this case it is something like 1e90, but the number isn't consistent, meaning it is not a missing-value placeholder.
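For what it's worth, that error trap amounts to a sanity check on the downloaded values before they enter the aggregation; a minimal sketch of the idea (the cutoff value and the oisst_dat object name are placeholders, not the actual values from my script):
## stop the aggregation if the server starts returning nonsense values
if(max(oisst_dat$temp, na.rm = TRUE) > 100)
  stop("Implausible temperatures detected; the data are likely about to go dark")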
To manually see for yourself if the data are being hosted you can go to this link and scroll to the bottom:
https://www.ncei.noaa.gov/erddap/griddap/index.html
All the best,
-Robert

R: jsonlite's stream_out function producing incomplete/truncated JSON file

I'm trying to load a really big JSON file into R. Since the file is too big to fit into memory on my machine, I found that using the jsonlite package's stream_in/stream_out functions is really helpful. With these functions, I can subset the data first in chunks without loading it, write the subset data to a new, smaller JSON file, and then load that file as a data.frame. However, this intermediary JSON file is getting truncated (if that's the right term) while being written with stream_out. I will now attempt to explain with further detail.
What I'm attempting:
I have written my code like this (following an example from documentation):
con_out <- file(tmp <- tempfile(), open = "wb")
stream_in(file("C:/User/myFile.json"), handler = function(df){
  df <- df[which(df$Var > 0), ]
  stream_out(df, con_out, pagesize = 1000)
}, pagesize = 5000)
myData <- stream_in(file(tmp))
As you can see, I open a connection to a temporary file, read my original JSON file with stream_in and have the handler function subset each chunk of data and write it to the connection.
The problem
This procedure runs without any problems until I try to read it in with myData <- stream_in(file(tmp)), upon which I receive an error. Manually opening the new, temporary JSON file reveals that the bottom-most line is always incomplete. Something like the following:
{"Var1":"some data","Var2":3,"Var3":"some othe
I then have to manually remove that last line after which the file loads without issue.
Solutions I've tried
I've tried reading the documentation thoroughly and looking at the stream_out function, and I can't figure out what may be causing this issue. The only slight clue I have is that the stream_out function automatically closes the connection upon completion, so maybe it's closing the connection while some other component is still writing?
I inserted a print statement to output the tail() end of the data.frame at every chunk inside the handler function, to rule out problems with the intermediary data.frame. The data.frame is produced flawlessly at every interval, and I can see that the final two or three rows of the data.frame are getting truncated while being written to file (i.e., they're not being written). Notice that it's the very end of the entire data.frame (after stream_out has rbinded everything) that is getting chopped.
I've tried playing around with the pagesize arguments, including trying very large numbers, no number, and Inf. Nothing has worked.
I can't use jsonlite's other functions like fromJSON because the original JSON file is too large to read without streaming and it is actually in minified(?)/ndjson format.
System info
I'm running R 3.3.3 x64 on Windows 7 x64, with 6 GB of RAM and an AMD Athlon II 4-core 2.6 GHz CPU.
Treatment
I can still deal with this issue by manually opening the JSON files and correcting them, but it's leading to some data loss and it's not allowing my script to be automated, which is an inconvenience as I have to run it repeatedly throughout my project.
I really appreciate any help with this; thank you.
I believe this does what you want; it is not necessary to do the extra stream_out/stream_in.
library(jsonlite)
library(dplyr)  # for %>% and bind_rows()

myData <- new.env()
stream_in(file("MOCK_DATA.json"), handler = function(df){
  idx <- as.character(length(myData) + 1)
  myData[[idx]] <- df[which(df$id %% 2 == 0), ] ## change back to your filter
}, pagesize = 200) ## change back to 1000
myData <- myData %>% as.list() %>% bind_rows()
(I created some mock data in Mockaroo: I generated 1,000 lines, hence the small pagesize, to check that everything worked with more than one chunk. The filter I used was even IDs because I was too lazy to create a Var column.)
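For completeness: your hunch about the connection in the question is worth testing. Since you open con_out yourself, the tail of the output can sit in the connection's buffer until the connection is closed, which would explain a truncated last line. Here is a sketch of the original pipeline with an explicit close() before re-reading (an assumption on my part, not something I've verified against your data):
library(jsonlite)
con_out <- file(tmp <- tempfile(), open = "wb")
stream_in(file("C:/User/myFile.json"), handler = function(df){
  df <- df[which(df$Var > 0), ]
  stream_out(df, con_out, pagesize = 1000)
}, pagesize = 5000)
close(con_out)  # flush anything still buffered before reading the file back
myData <- stream_in(file(tmp))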

When using IBM iSeries Client Access to create a .CSV file from an AS400 db file, can I squeeze out leading blanks?

I have a very simple iSeries AS400 DDS file defined in this way:
A R U110055R TEXT('POSITIVE PAY')
A ACCT55 9A
A SERL55 10A
A ISSD55 10A
A AMT55 13A
A NAME55 50A
I am using Data Transfer from System i (IBM i Access for Windows V6R1) to output a .CSV file to the desktop. It will then be used by our banking PC software. I cannot find a setting or a file type that will yield the data without leading blanks. It consistently holds the field sizes of the database file (without trailing blanks).
I need:
"192345678","311","07/22/2016","417700","ALICE BROWN CO."
"192345678","2887","07/22/2016","4124781","BARBIE LLC."
"192345678","2888","07/22/2016","4766","ROBERT BLUE, INC."
"192345678","2889","07/22/2016","71521","NANCYS COOKIES, INC"
"192345678","312","07/22/2016","67041","FRANKS MARKET"
But I get:
"192345678"," 311","07/22/2016"," 11417700","ALICE BROWN CO."
"192345678"," 887","07/22/2016"," 4124781","BARBIE LLC."
"192345678"," 888","07/22/2016"," 3204766","ROBERT BLUE, INC."
"192345678"," 301","07/22/2016"," 2971521","NANCY, INC"
"192345678"," 890","07/22/2016"," 967041","FRANKS MARKET"
The Data Options button of the Data Transfer from IBM i application brings up a window where you can explicitly specify the SQL statement used to extract data.
You can use the SQL TRIM() function there...
Limitation of DDS:
Use of TRIM in DDS
The answer seems to be "You can't." If I were doing this, I'd probably create a SQL VIEW with a DDL statement rather than try to get it done with DDS.
Solution: (W3Schools) or (IBM Knowledge Center)
CREATE VIEW U110055V1 AS
  SELECT TRIM(ACCT55),
         TRIM(SERL55),
         TRIM(ISSD55),
         TRIM(AMT55),
         TRIM(NAME55)
    FROM U110055
  RCDFMT U110055V1R;
Suggestion:
Export the newly created View to a .CSV file (refer here or here)
Share the IFS file on the network for easy, consistent, automated access (no manual export with i Access required).

How to upload multiple JSON files into CouchDB

I am new to CouchDB. I need to get 60 or more JSON files in a minute from a server.
I have to upload these JSON files to CouchDB individually as soon as I receive them.
I installed CouchDB on my Linux machine.
I hope someone can help me with my requirement.
If possible, can someone help me with pseudo code?
My idea:
Write a Python script that uploads all the JSON files to CouchDB. Each JSON file must become its own document, and the data present in the JSON must be inserted into CouchDB unchanged (the specified format with values in a file).
Note:
These JSON files are transactional; one file is generated every second, so I need to read each file and upload it to CouchDB in the same format, and on successful upload archive the file to a different folder on the local system.
Python program to parse the JSON files and insert them into CouchDB:
import sys
import glob
import errno, time, os
import couchdb, simplejson
import json
from pprint import pprint

couch = couchdb.Server() # Assuming localhost:5984
#couch.resource.credentials = (USERNAME, PASSWORD)
# If your CouchDB server is running elsewhere, set it up like this:
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']

path = 'C:/Users/Desktop/CouchDB_Python/Json_files/*.json'
#dirPath = 'C:/Users/VijayKumar/Desktop/CouchDB_Python'
files = glob.glob(path)
for file1 in files:
    #dirs = os.listdir( dirPath )
    file2 = glob.glob(file1)
    for name in file2: # 'file' is a builtin type, 'name' is a less-ambiguous variable name.
        try:
            with open(name) as f: # No need to specify 'r': this is the default.
                #sys.stdout.write(f.read())
                data = json.load(f)
                db.save(data)
                pprint(data)
            #time.sleep(2)
        except IOError as exc:
            if exc.errno != errno.EISDIR: # Do not fail if a directory is found, just ignore it.
                raise # Propagate other kinds of IOError.
I would use the CouchDB bulk API, even though you have specified that you need to send them to the db one by one. For example, implementing a simple queue that gets sent out every, say, 5-10 seconds via a bulk doc call will greatly increase the performance of your application.
There is one obvious quirk: you need to know the IDs of the docs that you want to get from the DB. But for the PUTs it is perfect. (That is not entirely true either: you can get ranges of docs using a bulk operation if the IDs you are using for your docs can be sorted nicely.)
From my experience working with CouchDB, I have a hunch that you are dealing with transactional documents in order to compile them into some sort of sum result and act on that data accordingly (maybe creating the next transactional doc in the series). For that you can rely on CouchDB by using 'reduce' functions on the views you create. It takes a little practice to get a reduce function working properly, and it is highly dependent on what you actually want to achieve and what data you are prepared to emit from the view, so I can't really provide you with more detail on that.
So in the end the app logic would go something like this:
get _design/someDesign/_view/yourReducedView
calculate new transaction
add transaction to queue
onTimeout
  send all in transaction queue
If I got the first part wrong (why you are using transactional docs), all that would really change in my app logic is the part where you get those transactional docs.
Also, before writing your own 'reduce' function, have a look at the built-in ones (they are a lot faster than anything outside of the db engine can do):
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
EDIT:
Since you are starting out, I strongly recommend having a look at the CouchDB Definitive Guide.
NOTE FOR LATER:
Here is one hidden stone (well, maybe not so much a hidden stone, but not an obvious thing to look out for for a newcomer in any case). When you write a reduce function, make sure that it does not produce too much output for a query without boundaries. This will extremely slow down the entire view, even when you provide reduce=false when getting stuff from it.
So you need to get JSON documents from a server and send them to CouchDB as you receive them. A Python script would work fine. Here is some pseudo-code:
loop (until no more docs)
  get new JSON doc from server
  send JSON doc to CouchDB
end loop
In Python, you could use requests to send the documents to CouchDB and probably to get the documents from the server as well (if it is using an HTTP API).
You might want to check out the pycouchdb module for Python 3. I've used it myself to upload lots of JSON objects into a CouchDB instance. My project does pretty much the same as you describe, so you can take a look at my project Pyro on GitHub for details.
My class looks like this:
import sys
import pycouchdb

class MyCouch:
    """ COMMUNICATES WITH COUCHDB SERVER """

    def __init__(self, server, port, user, password, database):
        # ESTABLISHING CONNECTION
        self.server = pycouchdb.Server("http://" + user + ":" + password + "@" + server + ":" + port + "/")
        self.db = self.server.database(database)

    def check_doc_rev(self, doc_id):
        # CHECKS REVISION OF SUPPLIED DOCUMENT
        try:
            rev = self.db.get(doc_id)
            return rev["_rev"]
        except Exception:
            return -1

    def update(self, all_computers):
        # UPDATES DATABASE WITH JSON STRING
        try:
            result = self.db.save_bulk(all_computers, transaction=False)
            sys.stdout.write(" Updating database")
            sys.stdout.flush()
            return result
        except Exception as ex:
            sys.stdout.write("Updating database")
            sys.stdout.write("Exception: ")
            print(ex)
            sys.stdout.flush()
            return None
Let me know in case of any questions - I will be more than glad to help if you find some of my code usable.

How to export a data frame in R to a table in MySQL

I tried sqlSave() in RODBC, but it runs super slowly. Is there an alternative way of doing this?
You could look at the RMySQL package. I am using it, and it offers quite some convenience for loading and reading data from a MySQL database. That being said, it is limited in the queries you can use (e.g. HAVING is not possible, IIRC). I can't say it's super quick or that my data is that big, but it's several tens of MB of text and it works fine. It depends on what you expect. However, it's convenient:
library(RMySQL)

con <- dbConnect(MySQL(), user="user", password="pass",
                 dbname="mydb", host="localhost", client.flag=CLIENT_MULTI_STATEMENTS)
dbListTables(con)
yourtable <- dbReadTable(con, "sometable")
# write it back
dbWriteTable(con, "yourTableinMySQL", yourtable, overwrite=TRUE)
# watch out with the overwrite argument, it does what it says :)
dbDisconnect(con)
yourtable will be a data.frame. Sometimes it bugs me that the modes are not set like I'd expect, but I have a custom-made function for that. I just need to improve it, then I'll post it here.
http://cran.r-project.org/web/packages/RMySQL/
You need to have the MySQL client libraries installed before you try installing RMySQL. Without knowing what OS and version you use, it is really not possible to give a better answer.