I'm pulling data from the NHTSA API, using a JSON format. I'm then creating a named tuple from this data and a few other sources and using this as a record to insert into a MySQL database.
The NHTSA API uses '' to designate a null value, which is not an accepted value for this particular column in the database. The column only allows a float datatype.
When creating my named tuple, is there a way to substitute None if a specific value is returned? I.e. if the API call returns '', use None instead?
The error returned is:
Failed inserting object into MySQL table Error while executing statement: Data truncated for column 'weight' at row 1
Tuples are immutable, so you need to create a new tuple.
Here's an example:
old = (1,2,'ABC','','','','text')
new = tuple(None if x == '' else x for x in old)
Now new contains:
(1, 2, 'ABC', None, None, None, 'text')
To replace one specific field value in a namedtuple / NamedTuple more easily, you can use the _replace() method.
Point = namedtuple('Point', 'x,y')
p = Point(x=11, y=22)
p = p._replace(x=33)
print(p)
It will print:
Point(x=33, y=22)
_replace() substitutes the field specified by the keyword argument with its value and returns a new namedtuple with that value and the rest of the values copied from the old namedtuple.
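Applied to the original question, here is a minimal sketch of the same '' to None substitution done while building the namedtuple record; the field names and the API payload below are assumptions for illustration, not the real NHTSA response:

from collections import namedtuple

# Hypothetical record layout; substitute your real column names
VehicleRecord = namedtuple('VehicleRecord', 'vin make weight')

# Hypothetical API payload; the API returns '' for null values
api_row = {'vin': '1HGCM82633A004352', 'make': 'HONDA', 'weight': ''}

# Map the API's '' null marker to None before building the record
cleaned = {k: (None if v == '' else v) for k, v in api_row.items()}
record = VehicleRecord(**cleaned)

# record.weight is now None, which the MySQL driver sends as NULL, so the
# FLOAT column no longer rejects the value, e.g.:
# cursor.execute("INSERT INTO vehicles (vin, make, weight) VALUES (%s, %s, %s)", record)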
I'm trying to read a .csv file with the following format on a Mac:
;lon;lat
0;55,245594;25,066697
1;55,135613;25,070419
2;55,275683;25,203425
What I am doing so far is:
$call csv2gdx coords.csv id=d index=1 values=2..lastCol useHeader=y
sets
i
c /x,y/
;
parameters
dloc(i,c) 'locations'
;
$gdxin clients_csv.gdx
$load ___ ?
What I want to do is read the lat/lon coordinates into the parameter dloc so that each i has a pair of coordinates c, i.e. lat and lon.
Example output:
x y
i1 17.175 84.327
Running your code produces an error from csv2gdx:
*** ErrNr = 15 Msg = Values(s) column number exceeds column count; Index = 2, ColCnt = 1
By default, csv2gdx expects the entries to be separated by commas, which is not the case in your data. You could define a semicolon or tab as the separator by means of an option, but if the data really has the format you posted, you do not need to call csv2gdx at all. You could just include the data directly, like this:
Sets
i
c
;
Table dloc(i<,c<) 'locations'
$include coords.csv
;
Display dloc;
EDIT after change of input data format:
The error message is still the same, and so is the reason: you use a different field separator than the default one. If you switch that using the option fieldSep=semiColon, you will realize that your decimal separator is also non-default for csv2gdx. But this can be changed as well. Here is the whole code (with an adjusted csv2gdx call and adjustments for data loading). Note that the sets i and c get implicitly defined when loading dloc with the < syntax in the declaration of dloc.
$call csv2gdx coords.csv id=d index=1 values=2..lastCol useHeader=y fieldSep=semiColon decimalSep=comma
Sets
i
c
;
parameters
dloc(i<,c<) 'locations'
;
$gdxin coords.gdx
$load dloc=d
Display dloc;
$exit
I am trying out YugaByte's Cassandra API (YCQL) and am interested in using the JSONB data type extensions.
But I am having trouble both updating an attribute in an existing JSONB column as well as adding a new attribute to an existing JSONB column.
Is this supported in YugaByte? Here is what I tried:
Consider the following example, which has one row with a simple key and a JSONB column.
cqlsh:k> CREATE TABLE T (key int PRIMARY KEY, value jsonb);
cqlsh:k> INSERT INTO T(key, value) VALUES(1, '{"author": "Charles", "title": "Hello World"}');
cqlsh:k> SELECT * FROM T;
key | value
-----+--------------------------------------------
1 | {"author":"Charles","title":"Hello World"}
(1 rows)
So far so good.
If I try to update an existing attribute inside the doc, I see the following error:
cqlsh:k> UPDATE T SET value->'author' = 'Bruce' WHERE key=1;
InvalidRequest: Error from server: code=2200 [Invalid query] message="SQL error: \
Invalid Arguments. Corruption: JSON text is corrupt: Invalid value.
If I try to add a new attribute to an existing JSONB column, I get the following error:
cqlsh:k> UPDATE T SET value->'price' = '10' WHERE key=1;
InvalidRequest: Error from server: code=2200 [Invalid query] message="SQL error: \
Execution Error. Could not find member:
Is this supported, and if so what is the correct syntax?
When updating a string value you must enclose the new value in double quotes inside the single quotes. For example:
cqlsh:k> UPDATE T SET value->'author' = '"Bruce"' WHERE key=1;
cqlsh:k> SELECT * FROM T;
key | value
-----+------------------------------------------
1 | {"author":"Bruce","title":"Hello World"}
(1 rows)
Regarding the second question, on the ability to add new attributes:
For UPDATE, currently (as of 1.1) YugaByte DB allows updating a specific attribute/field if that attribute/field already exists, but does not allow adding new attributes to an existing JSONB column. If you need the latter, you have to read the old value into the app and write the new JSON back in its entirety, as sketched below.
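A minimal read-modify-write sketch using the Python cassandra-driver against the YCQL port (9042); the keyspace, table, and attribute names follow the example above, and the assumption that the driver hands back the jsonb column as a JSON string is mine:

import json
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1'], port=9042).connect('k')

# Read the current document (assumed to arrive as a JSON string)
row = session.execute("SELECT value FROM t WHERE key = 1").one()
doc = json.loads(row.value)

# Add the new attribute client-side
doc['price'] = 10

# Write the whole document back
session.execute("UPDATE t SET value = %s WHERE key = 1", (json.dumps(doc),))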
It is possible to preserve insertion order when parsing a JSON struct with a
Poco::JSON::Parser( new Poco::JSON::ParseHandler( true ) ): the non-default ParseHandler parameter preserveObjectOrder = true is handed over to the Poco::JSON::Objects so that they keep a private list of keys sorted in insertion order.
An object can then be serialized via Object::stringify() to look just like the source JSON string. Fine.
What, however, is the official way to step through a Poco::JSON::Object and access its internals in insertion order? Object::getNames() and begin()/end() use the alphabetical order of keys, not insertion order -- is there another way to access the values, or do I have to patch Poco?
As you already said:
Poco::JSON::ParseHandler goes into the Poco::JSON::Parser-constructor.
Poco::JSON::Parser::parse() creates a Poco::Dynamic::Var.
From that you'll extract a Poco::JSON::Object::Ptr.
The Poco::JSON::Object has the method getNames(). Beginning with this commit it seems to preserve the order, if it was requested via the ParseHandler. (Poco::JSON::Object::getNames 1.8.1, Poco::JSON::Object::getNames 1.9.0)
So now it should work as expected to use:
for (auto const & name : object->getNames()) {
    auto const & value = object->get(name); // or one of the other get-methods
    // ... do things ...
}
What is the DynamoDB equivalent of
SELECT MAX(RANGE_KEY) FROM MYTABLE WHERE PRIMARYKEY = "value"
The best I can come up with is
from boto.dynamodb2.table import Table as awsTable
tb = awsTable("MYTABLE")
rs = list(tb.query_2(PRIMARYKEY__eq="value", reverse=True, limit=1))
MAXVALUE = rs[0][RANGE_KEY]
Is there a better way to do this?
That's the correct way.
Because the records matched by the hash key are sorted by the range key, getting the first one in descending order gives you the record with the maximum range key.
Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order; otherwise, the results are returned in order of ASCII character code values. By default, the sort order is ascending. To reverse the order use the ScanIndexForward parameter set to false.
Query and Scan Operations - Amazon DynamoDB : http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
NOTE: Setting the reverse parameter to true via boto API is equivalent to setting ScanIndexForward to false via the native AWS API.
If someone is looking for how to do it in Java:

QuerySpec querySpec = new QuerySpec();
querySpec.withKeyConditionExpression("PRIMARYKEY = :key")
         .withValueMap(new ValueMap()
             .withString(":key", primaryKeyValue));
// Descending order, so the single item returned has the maximum range key
querySpec.withScanIndexForward(false);
querySpec.withMaxResultSize(1);
In boto3 you can do it this way:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('MYTABLE')

kce = Key('table_id').eq(tableId) & Key('range').between(start, end)
output = table.query(KeyConditionExpression=kce, ScanIndexForward=False, Limit=1)

output contains the row with the maximum value of the range key between start and end. For the minimum value, change ScanIndexForward to True.
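For the original question, where the query restricts only the hash key, a minimal boto3 sketch along the same lines; the table and attribute names mirror the pseudo-SQL above:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('MYTABLE')

# Query on the hash key only; descending sort plus Limit=1 returns the item
# with the maximum range key for that hash key.
resp = table.query(
    KeyConditionExpression=Key('PRIMARYKEY').eq('value'),
    ScanIndexForward=False,
    Limit=1,
)
max_item = resp['Items'][0] if resp['Items'] else None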
Very similar to this question MySQL Dynamic Query Statement in Python
However, instead of two lists, what I am looking to do is use a dictionary.
Let's say I have this dictionary:
instance_insert = {
    # sql column     variable value
    'instance_id' : 'instance.id',
    'customer_id' : 'customer.id',
    'os'          : 'instance.platform',
}
I want to populate a MySQL database with an INSERT statement, using the SQL column entry as the column name and the variable entry as the name of the variable that holds the value to be inserted into the table.
I'm kind of lost because I don't understand exactly what this statement does, but it was pulled from the question I linked, where two lists were used to do what he wanted:
sql = "INSERT INTO instance_info_test VALUES (%s);" % ', '.join('?' for _ in instance_insert)
cur.execute (sql, instance_insert)
Also, I would like it to be dynamic in the sense that I can add or remove columns from the dictionary.
Before you post, you might want to try searching for something more specific to your question. For instance, when I Googled "python mysqldb insert dictionary", I found a good answer on the first page, at http://mail.python.org/pipermail/tutor/2010-December/080701.html. Relevant part:
Here's what I came up with when I tried to make a generalized version of the above:
import sys
import json
import MySQLdb

def add_row(cursor, tablename, rowdict):
    # XXX tablename not sanitized
    # XXX test for allowed keys is case-sensitive
    # filter out keys that are not column names
    cursor.execute("describe %s" % tablename)
    allowed_keys = set(row[0] for row in cursor.fetchall())
    keys = allowed_keys.intersection(rowdict)

    if len(rowdict) > len(keys):
        unknown_keys = set(rowdict) - allowed_keys
        print >> sys.stderr, "skipping keys:", ", ".join(unknown_keys)

    columns = ", ".join(keys)
    values_template = ", ".join(["%s"] * len(keys))

    sql = "insert into %s (%s) values (%s)" % (
        tablename, columns, values_template)
    values = tuple(rowdict[key] for key in keys)
    cursor.execute(sql, values)

filename = ...
tablename = ...

db = MySQLdb.connect(...)
cursor = db.cursor()

with open(filename) as instream:
    row = json.load(instream)
    add_row(cursor, tablename, row)
Peter
If you know your inputs will always be valid (the table name is valid, the columns are present in the table), and you're not importing from a JSON file as the example does, you can simplify this function, but it will still accomplish what you want. While it may initially seem like DictCursor would be helpful, DictCursor is useful for returning a dictionary of values; it can't execute from a dict.
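For illustration, here is a simplified sketch of the same idea that trusts the dictionary keys as column names; the connection parameters and the literal values standing in for instance.id, customer.id, and instance.platform are placeholders:

import MySQLdb

def insert_from_dict(cursor, tablename, rowdict):
    # Column names come straight from the dict keys; only do this with trusted
    # keys, because identifiers cannot be passed as query parameters.
    columns = ", ".join(rowdict.keys())
    placeholders = ", ".join(["%s"] * len(rowdict))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (tablename, columns, placeholders)
    cursor.execute(sql, tuple(rowdict.values()))

instance_insert = {
    'instance_id': 'i-0abc123',   # placeholder for instance.id
    'customer_id': 42,            # placeholder for customer.id
    'os': 'linux',                # placeholder for instance.platform
}

db = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
cursor = db.cursor()
insert_from_dict(cursor, "instance_info_test", instance_insert)
db.commit()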