Web2Py - generic view for CSV?

For web2py there are generic views, e.g. for JSON.
I could not find a sample for CSV.
Looking at the web2py manual, sections 10.1.2 and 10.1.6, it is written:
'.. define a "generic.csv" file, but one would have to specify the name of the object to be serialized ("animals" in the example)'
Looking at the generic.pdf view:
{{
import os
from gluon.contrib.generics import pdf_from_html
filename = '%s/%s.html' % (request.controller,request.function)
if os.path.exists(os.path.join(request.folder,'views',filename)):
html=response.render(filename)
else:
html=BODY(BEAUTIFY(response._vars))
pass
=pdf_from_html(html)
}}
and also the CSV code specified in the manual (chapter 10.1.6):
{{
import cStringIO
stream=cStringIO.StringIO()
animals.export_to_csv_file(stream)
response.headers['Content-Type']='application/vnd.ms-excel'
response.write(stream.getvalue(), escape=False)
}}
Massimo writes: 'web2py does not provide a "generic.csv";'
He is not fully against it, but...
So let's try to build one and deactivate it when necessary.
The generic view should look similar to the following
(well, better call it pseudocode, as it is not working):
{{
import os
from gluon.contrib.generics export export_to_csv_file(stream)
filename = '%s/%s' % (request.controller,request.function)
if os.path.exists(os.path.join(request.folder,'views',filename)):
csv=response.render(filename)
else:
csv=BODY(BEAUTIFY(response._vars))
pass
= export_to_csv_file(stream)
}}
What's wrong?
Or is there a sample?
Is there a reason not to have a generic.csv?

Adapting the generic.pdf code so literally as above would not work for CSV output, as the generic.pdf code is first executing the standard HTML template and then simply converting the generated HTML to a PDF. This approach does not make sense for CSV, as CSV requires data of a particular structure.
As stated in the documentation:
Notice that one could also define a "generic.csv" file, but one would
have to specify the name of the object to be serialized ("animals" in
the example). This is why we do not provide a "generic.csv" file.
The execution of a view is triggered by a controller action returning a dictionary. The keys of the dictionary become available as variables in the view execution environment (the entire dictionary is also available as response._vars). If you want to create a generic.csv view, you therefore need to establish some conventions about what variables are in the returned dictionary as well as the possible structure(s) of the returned data.
For example, the controller could return something like dict(data=mydata). The code in generic.csv would then access the data variable and could convert it to CSV. In that case, there would have to be some convention about the structure of data -- perhaps it could be required to be a list of dictionaries or a DAL Rows object (or optionally either one).
Another possible convention is for the controller to return something like dict(columns=mycolumns, rows=myrows), where columns is a list of column names and rows is a list of lists containing the data for each row.
The point is, there is no universal convention for what the controller might return and how that can be converted into CSV, so you first need to decide on some conventions and then write generic.csv accordingly.
For example, here is a very simple generic.csv that would work only if the controller returns dict(rows=myrows), where myrows is a DAL Rows object:
{{
import cStringIO
stream=cStringIO.StringIO()
rows.export_to_csv_file(stream)
response.headers['Content-Type']='application/vnd.ms-excel'
response.write(stream.getvalue(), escape=False)
}}

I tried:
# Sample from Web2Py manual 10.1.1, page 464
def count():
    session.counter = (session.counter or 0) + 1
    return dict(counter=session.counter, now=request.now)

# and my own creation from a SQL table (if possible used for JSON and CSV):
def csv_rt_bat_c_x():
    battdat = db().select(db.csv_rt_bat_c.rec_time, db.csv_rt_bat_c.cellnr,
                          db.csv_rt_bat_c.volt_act, db.csv_rt_bat_c.id).as_list()
    return dict(battdat=battdat)

Both times I get an error when trying CSV. It works for /default/count.json but not for /default/count.csv.
I suppose the requirement:
dict(rows=myrows)
"where myrows is a DAL Rows object" is not met.

Related

Palantir foundry code workbook, export individual xmls from dataset

I have a dataset which has an XML column, and I am trying to export the individual XMLs as files, with the filename taken from another column, using Code Workbook.
I filtered the rows I want using the code below:
def prepare_input(xml_with_debug):
    from pyspark.sql import functions as F

    filter_column = "key"
    filter_value = "test_key"
    df_filtered = xml_with_debug.filter(filter_value == F.col(filter_column))

    approx_number_of_rows = 1
    sample_percent = float(approx_number_of_rows) / df_filtered.count()
    df_sampled = df_filtered.sample(False, sample_percent, seed=0)

    important_columns = ["key", "xml"]
    return df_sampled.select([F.col(c).cast("string").alias(c) for c in important_columns])
It works up to here. For the last part I tried the following in a Python task, but it complained about the parameters (I probably set it up wrongly). And even if it worked, it would produce a single file, I think.
from transforms.api import transform, Input, Output

@transform(
    output=Output("/path/to/python_csv"),
    my_input=Input("/path/to/input")
)
def my_compute_function(output, my_input):
    output.write_dataframe(my_input.dataframe().coalesce(1), output_format="csv", options={"header": "true"})
I am trying to set it up in the Code Workbook GUI.
My question, I guess, is: what should the code in the last Python task (write_file) be, after prepare_input, so that I can extract the individual XMLs (and, if possible, zip them into a single file for download)?
You can access the output dataset filesystem and write files into it in whatever format you want.
The documentation for that can be found here: https://www.palantir.com/docs/foundry/code-workbook/transforms-unstructured/#writing-files
(If you want to do it from a code repository it's very similar https://www.palantir.com/docs/foundry/transforms-python/unstructured-files/#writing-files)
By doing that you can create multiple different files or you can create a single zip file and write it into a dataset.
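For illustration, here is a hedged sketch in the code-repository (transforms-python) style from the second link; in Code Workbook the wiring of inputs and outputs differs, but the filesystem write is analogous. The dataset paths, column names, and the zip step are assumptions, not the asker's real values.

import io
import zipfile

from transforms.api import transform, Input, Output


@transform(
    output=Output("/path/to/xml_files_output"),   # hypothetical output dataset
    source=Input("/path/to/prepared_input"),      # hypothetical: result of prepare_input
)
def write_file(output, source):
    rows = source.dataframe().select("key", "xml").collect()

    # One file per row, written into the output dataset's filesystem.
    for row in rows:
        with output.filesystem().open("%s.xml" % row["key"], "w") as f:
            f.write(row["xml"])

    # Optionally, also a single zip containing all of the XMLs.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for row in rows:
            zf.writestr("%s.xml" % row["key"], row["xml"])
    with output.filesystem().open("all_xmls.zip", "wb") as dst:
        dst.write(buf.getvalue())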

How do I get django "Data too long for column '<column>' at row" errors to print the actual value?

I have a Django application. Sometimes in production I get an error when uploading data that one of the values is too long. It would be very helpful for debugging if I could see which value was the one that went over the limit. Can I configure this somehow? I'm using MySQL.
It would also be nice if I could enable/disable this on a per-model or column basis so that I don't leak user data to error logs.
When creating model instances from outside sources, one must take care to validate the input or have other guarantees that this data cannot violate constraints.
If you do not call at least full_clean() on the model and instead call save() directly, you bypass Django's validators and only get alerted to the problem by the database driver, at which point it is harder to obtain diagnostics:
import json

from django.core.exceptions import ValidationError
from django.db import models


class JsonImportManager(models.Manager):
    # 'import' is a reserved word in Python, so the method needs a different name.
    def import_json(self, json_string: str) -> int:
        data_list = json.loads(json_string)  # list of objects => list of dicts
        failed = 0
        for data in data_list:
            obj = self.model(**data)
            try:
                obj.full_clean()  # runs field and model validation before touching the DB
            except ValidationError as e:
                print(e.message_dict)  # or use a better formatting function
                failed += 1
            else:
                obj.save()
        return failed
This is of course very simple, but it's a good boilerplate to get started with.
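For completeness, a hypothetical usage with a model wired to this manager (the model and field are invented for illustration). Because the manager is attached per model, this also gives you the per-model opt-in the question asks about:

class Article(models.Model):
    title = models.CharField(max_length=100)  # hypothetical model and field
    objects = JsonImportManager()

# failed = Article.objects.import_json('[{"title": "ok"}, {"title": "far too long ..."}]')
# Each ValidationError's message_dict names the offending field and value,
# so you can log exactly what was too long without touching other models.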

Saving JSON data to DB in Zotonic

I'm trying to write a small app that retrieves a JSON file (it contains a list of items, which all have some properties), saves its contents to the DB and then displays some of it later on. I have Zotonic up and running, and generating some HTML is no problem.
At the moment I'm stuck trying to figure out how to define a custom resource and how to get the data from the JSON into the DB. Once the data is there I should be fine; that part seems well covered by the documentation.
I wrote some standalone Erlang scripts that fetch the data, and I noticed that Zotonic has a library for decoding JSON, so that part should be fine. Any tips on where to put which code or where to look further?
The z_db module allows for creating custom tables by using:
z_db:create_table(Table, Cols, Context).
The Table variable is your table name which can be either an atom or a list containing a single atom.
The Cols is a list of column definitions, which are defined by records. Currently the record definition (you can find this in include/zotonic.hrl) is:
-record(column_def, {name, type, length, is_nullable=true, default, primary_key}).
See Erlang docs on records for more info on records
Example code which I put in users/sites/[sitename]/models/m_[sitename].erl:
init(Context) ->
    case z_db:table_exists(?table, Context) of
        false ->
            z_db:create_table(tablename,
                [
                    #column_def{name=id, type="serial"},
                    #column_def{name=gid, type="integer", is_nullable=false},
                    #column_def{name=magnitude, type="real"},
                    #column_def{name=depth, type="real"},
                    #column_def{name=location, type="character varying"},
                    #column_def{name=time, type="integer"},
                    #column_def{name=date, type="integer"}
                ], Context);
        true -> ok
    end,
    ok.
Pay attention to what options of the record you specify. Most of the errors I got were e.g. from specifying a length on the integer fields.
The models/m_sitename:init/1 does not get called on site start. The sitename:init/1 does get called so I call the init function there to ensure the table exists. Example:
init(Context) ->
    m_sitename:init(Context).
It is called by Zotonic with the Context variable of the site automatically. You can get this variable manually as well with z:c(sitename).. So if you call the m_sitename:init(Context). from somewhere else you would do:
m_sitename:init(z:c(sitename)).
Next, insertion in the DB can be done with:
z_db:insert(Table, PropList, Context).
Where Table is again an atom or a list containing a single atom representing the table name. Context is the same as above.
PropList is a property list which is a list containing tuples consisting of two elements where the first is an atom and the second is its associated value/property. Example:
PropList = [
    {row, Value},
    {anotherrow, AnotherValue}
].
Table = tablename.
Context = z:c(sitename).
z_db:insert(Table, PropList, Context).
See Erlang docs on Property Lists for more info on property lists.
=== The dependencies have been updated so if you build from source the step directly below is no longer needed ===
The JSON part is a bit more tricky. Included with Zotonic are mochijson2 and, as a secondary dependency, jiffy. The latest version of jiffy contains jiffy:decode/2, which allows you to specify maps as a return type. Much more readable than the standard {struct, {struct, <<"">>}} monster. To update to the latest version, edit the line in deps/twerl/rebar.config that says
{jiffy, ".*", {git, "https://github.com/davisp/jiffy.git", {tag, "0.8.3"}}},
to
{jiffy, ".*", {git, "https://github.com/davisp/jiffy.git", {tag, "0.14.3"}}},
Now run z:m(). in the Zotonic shell. (you must do this after every change in your code).
Now check in the Zotonic shell if there is a jiffy:decode/2 available by typing jiffy: <tab>, it will show a list of available functions and their arity.
To retrieve a JSON file from the internet run:
{ok, {{_, 200, _}, _, Body}} = httpc:request(get, {"url-to-JSON-here", []}, [], [])
Which will yield the variable Body with the contents. See Erlang docs on http client for more info on this call.
Next convert the contents of Body to Erlang terms with:
JsonData = jiffy:decode(Body, [return_maps]).
What you have to do next depends a lot on the structure of your JSON resource. Keep in mind that everything is now in binary UTF-8 encoded strings! If you print JsonData to screen (just enter JsonData. in your Zotonic/Erlang shell) you will see a lot of maps like #{<<"key">> => <<"value">>}.
My data was nested so I had to extract the needed data like this:
[{_,ItemList}|_] = ListData.
This gave me a list of maps, and in order to deal with them as individual items I used the following function:
get_maps([]) ->
    done;
get_maps([First|Rest]) ->
    Map = maps:get(<<"properties">>, First),
    case is_map(Map) of
        true ->
            map_to_proplist(Map),
            get_maps(Rest);
        false -> done
    end,
    done;
get_maps(_) ->
    done.
As you might remember, the z_db:insert/3 function needs a property list to populate rows; that is what the call to map_to_proplist/1 is for. How this function looks is completely dependent on how your data looks, but as an example here is what worked for me:
map_to_proplist(Map) ->
    case is_map(Map) of
        true ->
            {Value1,_} = string:to_integer(binary_to_list(maps:get(<<"key1">>, Map))),
            {Value2,_} = string:to_float(binary_to_list(maps:get(<<"key2">>, Map))),
            {Value3,_} = string:to_float(binary_to_list(maps:get(<<"key3">>, Map))),
            Value4 = binary_to_list(maps:get(<<"key4">>, Map)),
            {Value5,_} = string:to_integer(binary_to_list(maps:get(<<"key5">>, Map))),
            {Value6,_} = string:to_integer(binary_to_list(maps:get(<<"key6">>, Map))),
            PropList = [{rowname1, Value1}, {rowname2, Value2}, {rowname3, Value3},
                        {rowname4, Value4}, {rowname5, Value5}, {rowname6, Value6}],
            m_sitename:insert_items(PropList, z:c(sitename)),
            ok;
        false ->
            ok
    end.
See the documentation on string:to_integer/1 and string:to_float/1 as to why the tuples are needed when casting (both return a {Value, Rest} tuple). The call to m_sitename:insert_items(PropList, z:c(sitename)) calls z_db:insert/3 in models/m_sitename.erl, but wrapped in a catch:
insert_items(PropList, Context) ->
    (catch z_db:insert(?table, PropList, Context)).
Ok, quite a long post but this should get you up and running if you were looking for this answer.
The above was done with Zotonic 0.13.2 on Erlang/OTP 18.
A repost (except the JSON part) of my post in the Zotonic Developers group.

How to upload multiple JSON files into CouchDB

I am new to CouchDB. I need to get 60 or more JSON files in a minute from a server.
I have to upload these JSON files to CouchDB individually as soon as I receive them.
I installed CouchDB on my Linux machine.
I hope someone can help me with my requirement.
If possible, can someone help me with pseudocode?
My idea:
Write a Python script that uploads all JSON files to CouchDB.
Each JSON file must become its own document, and the data present in the
JSON must be inserted into CouchDB unchanged
(the specified format with values in a file).
Note:
These JSON files are transactional; one file is generated every second,
so I need to read each file, upload it in the same format into CouchDB, and
on successful upload archive the file into a different folder on the local system.
Python program to parse the JSON and insert it into CouchDB:
import sys
import glob
import errno, time, os
import couchdb, simplejson
import json
from pprint import pprint

couch = couchdb.Server()  # Assuming localhost:5984
# couch.resource.credentials = (USERNAME, PASSWORD)
# If your CouchDB server is running elsewhere, set it up like this:
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']

path = 'C:/Users/Desktop/CouchDB_Python/Json_files/*.json'
# dirPath = 'C:/Users/VijayKumar/Desktop/CouchDB_Python'
files = glob.glob(path)
for file1 in files:
    # dirs = os.listdir(dirPath)
    file2 = glob.glob(file1)
    for name in file2:  # 'file' is a builtin type, 'name' is a less-ambiguous variable name.
        try:
            with open(name) as f:  # No need to specify 'r': this is the default.
                # sys.stdout.write(f.read())
                json_data = f
                data = json.load(json_data)
                db.save(data)
                pprint(data)
                json_data.close()
            # time.sleep(2)
        except IOError as exc:
            if exc.errno != errno.EISDIR:  # Do not fail if a directory is found, just ignore it.
                raise  # Propagate other kinds of IOError.
I would use CouchDB bulk API, even though you have specified that you need to send them to db one by one. For example, by implementing a simple queue that gets sent out every say 5 - 10 seconds via a bulk doc call will greatly increase performance of your application.
There is obviously a quirk in that: you need to know the IDs of the docs that you want to get from the DB. But for the PUTs it is perfect. (That is not entirely true either; you can get ranges of docs using a bulk operation if the IDs you are using for your docs can be sorted nicely.)
From my experience working with CouchDB, I have a hunch that you are dealing with transactional documents in order to compile them into some sort of summed result and act on that data accordingly (maybe creating the next transactional doc in the series). For that you can rely on CouchDB by using 'reduce' functions on the views you create. It takes a little practice to get a reduce function working properly, and it is highly dependent on what it is you actually want to achieve and what data you are prepared to emit from the view, so I can't really provide you with more detail on that.
So in the end the app logic would go something like that:
get _design/someDesign/_view/yourReducedView
calculate new transaction
add transaction to queue
onTimeout
send all in transaction queue
If I got the first part (why you are using transactional docs) wrong, all that would really change is the part of my app logic where you get those transactional docs.
Also, before writing your own 'reduce' function, have a look at the built-in ones (they are a lot faster than anything outside of the db engine can do):
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
EDIT:
Since you are starting, I strongly recommend to have a look at CouchDB Definitive Guide.
NOTE FOR LATER:
Here is one hidden stone (well, maybe not so much a hidden stone, but not an obvious thing for a newcomer to look out for in any case). When you write a reduce function, make sure it does not produce too much output for a query without boundaries. This will slow down the entire view dramatically, even when you pass reduce=false when querying it.
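To make the queue idea concrete, here is a minimal sketch of flushing a batch of queued documents through the _bulk_docs endpoint using the requests library. The URL, database name, and the idea of calling flush_queue() from a 5-10 second timer are assumptions, not part of the original answer.

import requests

COUCH_URL = "http://localhost:5984"  # assumed local CouchDB
DB_NAME = "mydb"                     # hypothetical database name

queue = []

def enqueue(doc):
    queue.append(doc)

def flush_queue():
    # POST the whole batch in one request instead of one PUT per document.
    if not queue:
        return
    resp = requests.post(
        "%s/%s/_bulk_docs" % (COUCH_URL, DB_NAME),
        json={"docs": queue},
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()
    queue.clear()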
So you need to get JSON documents from a server and send them to CouchDB as you receive them. A Python script would work fine. Here is some pseudo-code:
loop (until no more docs)
get new JSON doc from server
send JSON doc to CouchDB
end loop
In Python, you could use requests to send the documents to CouchDB and probably to get the documents from the server as well (if it is using an HTTP API).
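A minimal sketch of the "send JSON doc to CouchDB" step with requests, assuming a local server and a database named mydb (both placeholders):

import requests

def send_to_couchdb(doc):
    # POSTing to the database URL lets CouchDB assign the document ID.
    resp = requests.post("http://localhost:5984/mydb", json=doc)  # assumed URL and db name
    resp.raise_for_status()
    return resp.json()  # contains the new document's id and rev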
You might want to check out the pycouchdb module for Python 3. I've used it myself to upload lots of JSON objects into a CouchDB instance. My project does pretty much the same as you describe, so you can take a look at my project Pyro on GitHub for details.
My class looks like this:
import sys

import pycouchdb


class MyCouch:
    """ COMMUNICATES WITH COUCHDB SERVER """

    def __init__(self, server, port, user, password, database):
        # ESTABLISHING CONNECTION
        self.server = pycouchdb.Server("http://" + user + ":" + password + "@" + server + ":" + port + "/")
        self.db = self.server.database(database)

    def check_doc_rev(self, doc_id):
        # CHECKS REVISION OF SUPPLIED DOCUMENT
        try:
            rev = self.db.get(doc_id)
            return rev["_rev"]
        except Exception as inst:
            return -1

    def update(self, all_computers):
        # UPDATES DATABASE WITH JSON STRING
        try:
            result = self.db.save_bulk(all_computers, transaction=False)
            sys.stdout.write(" Updating database")
            sys.stdout.flush()
            return result
        except Exception as ex:
            sys.stdout.write("Updating database")
            sys.stdout.write("Exception: ")
            print(ex)
            sys.stdout.flush()
            return None
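A hypothetical usage of the class (host, credentials and database name are placeholders):

couch = MyCouch("localhost", "5984", "admin", "secret", "mydb")
docs = [{"type": "transaction", "value": 1}, {"type": "transaction", "value": 2}]
couch.update(docs)  # bulk-saves the list of JSON documents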
Let me know in case of any questions - I will be more than glad to help if you will find some of my code usable.

Issue with passing JSON to a view in Web2py

I wrote a very simple function to pull data from the ESPN API and display it in default/index. However, default/index is a blank page.
At this point I'm not even trying to parse through the JSON - I just want to see something on my browser.
default.py:
import urllib2
import json

# espn_uri being pulled from models/db.py
def index():
    r = urllib2.Request(espn_uri)
    opener = urllib2.build_opener()
    f = opener.open(r)
    status = json.load(f)
    return dict(status)
default/index.html:
{{status}}
Thank you!
Try: return dict(status=status)
return dict(status) works because status is itself a dict, and dict(status) just copies it. But it probably has no key named status, or at least nothing interesting.
And yes, you need =.
As JLundell advises, first return paired data via the dictionary:
return dict(my_status=status)
Second, as you've worked out, use the following in index.html to access the returned variable rather than a local one. Make sure you use the equals sign here, or nothing will display:
{{=my_status}}
When it comes to JSON, you can return the data using
return my_status.json()
Several other options are available to return data as a list, or to return HTML.
Finally, I recommend that you make use of jQuery and AJAX ($.ajax), so that the AJAX return value can be easily assigned to a JS object. This will also allow you to handle success or errors in the form of JS functions.
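For reference, a minimal corrected controller along the lines of both answers; espn_uri is assumed to be defined in a model, as in the question, and with web2py's generic views enabled the same action would also serve /default/index.json:

import urllib2
import json

def index():
    # espn_uri is assumed to come from models/db.py, as in the question.
    f = urllib2.urlopen(espn_uri)
    status = json.load(f)
    return dict(status=status)  # named key, so {{=status}} and generic.json both work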