boto VPC get_all_dhcp_options() filter by domain-name - boto

Should be a pretty simple thing, but I couldn't find a clue anywhere. I want to run get_all_dhcp_options() but set filters so that only those that match a certain domain-name are returned.

If anyone has a better way, please let me know.
I understand that getting the answer directly, as #garnaat did, is simple, but how to find the filter key in the first place is not so obvious.
What I did here is show how to find the key/value easily. This is a general way of finding keys in boto, not only for VPC; the same approach works for EC2, S3, etc.
$ python
>>> import boto.vpc
>>> c = boto.vpc.connect_to_region('us-west-2') # or whatever
>>> c.get_all_dhcp_options()
[DhcpOptions:dopt-12dc23d1, DhcpOptions:dopt-426e82c7]
>>> for dhcp in c.get_all_dhcp_options():
...     print dir(dhcp)
...
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'add_tag', 'add_tags', 'connection', 'endElement', 'id', 'item', 'options', 'region', 'remove_tag', 'remove_tags', 'startElement', 'tags']
It makes sense that the domain name should be in options, but you can verify this by logging in to the AWS Management Console and clicking VPC => DHCP Options Sets; the domain name appears in the Options column. So continue debugging:
>>> for dhcp in c.get_all_dhcp_options():
...     print dhcp.options
...
{u'domain-name': [u'us-west-2.compute.internal'], u'domain-name-servers': [u'AmazonProvidedDNS']}
{u'domain-name': [u'abc.example.com xyz.example.com'], u'domain-name-servers': [u'10.0.0.1', u' 10.0.0.2'], u'ntp-servers': [u'10.0.0.1', u' 10.0.0.2']}
...
So now you have the key, which is domain-name, and you also know the other keys, domain-name-servers and ntp-servers. Now you can apply the filter with confidence:
>>> c.get_all_dhcp_options(filters={'key': 'domain-name', 'value': 'us-west-2.compute.internal'})
[DhcpOptions:dopt-426e82c7]
Once you get the expected result, you can add the above steps to your code.
Good luck.
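If you want to wrap those exploration steps into something reusable, the matching can also be done client-side by inspecting the options dict directly. This is a minimal sketch assuming boto 2; find_dhcp_options_by_domain is just an illustrative helper name, not part of boto:
import boto.vpc

def find_dhcp_options_by_domain(region, domain):
    # Hypothetical helper: return the DhcpOptions sets whose
    # domain-name option contains the given domain.
    c = boto.vpc.connect_to_region(region)
    return [dhcp for dhcp in c.get_all_dhcp_options()
            if domain in dhcp.options.get('domain-name', [])]

matches = find_dhcp_options_by_domain('us-west-2', 'us-west-2.compute.internal')
print(matches)  # e.g. [DhcpOptions:dopt-426e82c7]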

This should work for you:
import boto.vpc
c = boto.vpc.connect_to_region('us-west-2') # or whatever
c.get_all_dhcp_options(filters={'key': 'domain-name', 'value': 'us-west-2.compute.internal'})
Obviously, use whatever domain name makes sense in your situation.
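The call returns a list of DhcpOptions objects, so a short follow-up might look like this (a sketch reusing the region and domain name from above):
import boto.vpc

c = boto.vpc.connect_to_region('us-west-2')  # or whatever
matches = c.get_all_dhcp_options(
    filters={'key': 'domain-name', 'value': 'us-west-2.compute.internal'})
if matches:
    print(matches[0].id)  # e.g. dopt-426e82c7, as in the session above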
Finding out which filter key/values are supported by a given API call is another problem. Unfortunately, boto does not provide a way to do this mainly because there is no API that provides this information. However, if you have the AWSCLI, it has help information that comes directly from the service documentation so it does list the available filters.
To find what filters are supported you have to know that the get_all_* method calls in boto map to the Describe* API calls in EC2 and those, in turn, are available in the AWSCLI as describe-* commands. So:
aws ec2 describe-dhcp-options help
will display all of the help for the API call, including the supported filters. It's a roundabout way of getting the info, but it's better than looking it up in the API docs.

Related

Most efficient way to handle lists saved in a file python3

I have been looking for information on Google and Stack Overflow, but I didn't find a good solution.
I need to handle a list (add elements, delete elements, ...), but saved in a file. This is in order to avoid losing the list when execution finishes, because I need to run my Python script periodically. Here are the alternatives I found, but they all have problems:
Shelve module: I can't find how to delete a single element of the list (such as list.pop()) instead of deleting the whole list.
pprint.pformat(): to modify information, I need to delete the whole document and save the modified information, which is very inefficient.
json: tedious for just a list and doesn't seem to solve my problem.
So, what is the best way to handle a list, doing things as easy as mylist.pop(), while keeping the changes in a file efficiently?
Since this has never been answered before, here is an efficient way: the pysos package can handle disk-backed lists with inserts and deletes in constant time.
pip install pysos
Code example:
import pysos
db = pysos.List('somefile')
db.append('saved in the file 0')
db.append('saved in the file 1')
db.append('saved in the file 2')
print(db[1]) # => 'saved in the file 1'
del db[1]
print(db[1]) # => 'saved in the file 2'
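Since the whole point is persistence across executions, reopening the same file in a later run should give you the list back. A quick sketch under the same assumptions as above:
import pysos

# In a later run of the script, the contents are still on disk.
db = pysos.List('somefile')
print(db[0])  # => 'saved in the file 0'
db.append('added in a later run')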

Erhversstyrelsens CVR-API with elasticsearch and R?

For a while I have been trying to access data from the Danish Company Register, which is stored behind an Elasticsearch REST API. I have tried using httr and the elastic package for R with no luck.
I think I have succeeded in connecting using
elastic::connect("distribution.virk.dk/CVR-permanent/", es_port=" ", es_user="username", es_pwd="pass")
I can use the Count function to check the indexes for companies, participants and production units:
Count(index="virksomhed")
[1] 1617466
The issue comes when I use Search(). Below is my input
Search(index=virksomhed, q="422900")
where 422900 is a NACE code. I have tried various kinds of searches with no luck, though, and have also tried copying JSON search strings into the body.
My result is always the same:
Error in if (gsub("\.","",ping()$version$number) <500) {: argument is of length zero
Is there any help out there? httr, jsonlite, anything?

kafka-python 1.3.3: KafkaProducer.send with explicit key fails to send message to broker

(Possibly a duplicate of Can't send a keyedMessage to brokers with partitioner.class=kafka.producer.DefaultPartitioner, although the OP of that question didn't mention kafka-python. And anyway, it never got an answer.)
I have a Python program that has been successfully (for many months) sending messages to the Kafka broker, using essentially the following logic:
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               retries=3)
...
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg)
Recently, I tried to upgrade it to send messages to different partitions based on a definite key value extracted from the message:
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr],
                               key_serializer=str.encode,
                               retries=3)
...
try:
    key = some_message[0]
except:
    key = None
msg = json.dumps(some_message)
res = producer.send(some_topic, value=msg, key=key)
However, with this code, no messages ever make it out of the program to the broker. I've verified that the key value extracted from some_message is always a valid string. Presumably I don't need to define my own partitioner, since, according to the documentation:
The default partitioner implementation hashes each non-None key using the same murmur2 algorithm as the java client so that messages with the same key are assigned to the same partition.
Furthermore, with the new code, when I try to determine what happened to my send by calling res.get (to obtain a kafka.FutureRecordMetadata), that call throws a TypeError exception with the message descriptor 'encode' requires a 'str' object but received a 'unicode'.
(As a side question, I'm not exactly sure what I'd do with the FutureRecordMetadata if I were actually able to get it. Based on the kafka-python source code, I assume I'd want to call either its succeeded or its failed method, but the documentation is silent on the point. The documentation does say that the return value of send "resolves to" RecordMetadata, but I haven't been able to figure out, from either the documentation or the code, what "resolves to" means in this context.)
Anyway: I can't be the only person using kafka-python 1.3.3 who's ever tried to send messages with a partitioning key, and I have not seen anything on teh Intertubes describing a similar problem (except for the SO question I referenced at the top of this post).
I'm certainly willing to believe that I'm doing something wrong, but I have no idea what that might be. Is there some additional parameter I need to supply to the KafkaProducer constructor?
The fundamental problem turned out to be that my key value was a unicode, even though I was quite convinced that it was a str. Hence the selection of str.encode for my key_serializer was inappropriate, and was what led to the exception from res.get. Omitting the key_serializer and calling key.encode('utf-8') was enough to get my messages published, and partitioned as expected.
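For reference, here is a minimal sketch of the working variant described above; the broker address, topic and message are placeholders, and kafka-python 1.3.3 under Python 2 (as in the question) is assumed:
import json
import kafka

some_addr = 'localhost:9092'                 # placeholder broker address
some_topic = 'my-topic'                      # placeholder topic
some_message = ['some-key', {'payload': 1}]  # placeholder message

# No key_serializer: the key is explicitly encoded to bytes instead.
producer = kafka.KafkaProducer(bootstrap_servers=[some_addr], retries=3)

try:
    key = some_message[0]
except (IndexError, KeyError):
    key = None

msg = json.dumps(some_message)
res = producer.send(some_topic,
                    value=msg,
                    key=key.encode('utf-8') if key is not None else None)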
A large contributor to the obscurity of this problem (for me) was that the kafka-python 1.3.3 documentation does not go into any detail on what a FutureRecordMetadata really is, nor what one should expect in the way of exceptions its get method can raise. The sole usage example in the documentation:
# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')
# Block for 'synchronous' sends
try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    # Decide what to do if produce request failed...
    log.exception()
    pass
suggests that the only kind of exception it will raise is KafkaError, which is not true. In fact, get can and will (re-)raise any exception that the asynchronous publishing mechanism encountered in trying to get the message out the door.
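Given that, a slightly more defensive version of the documented pattern might look like this (a sketch only; producer is assumed to be an already-configured KafkaProducer, and Exception is deliberately broad because get() can re-raise whatever the background send hit):
import logging

log = logging.getLogger(__name__)

future = producer.send('my-topic', value=b'raw_bytes', key=b'some-key')
try:
    record_metadata = future.get(timeout=10)
except Exception:
    # get() re-raises whatever the asynchronous send actually failed with,
    # which is not limited to KafkaError, so catch broadly and log it.
    log.exception('produce request failed')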
I also faced the same error. Once I added json.dumps while sending the key, it worked.
producer.send(
    topic="first_topic",
    key=json.dumps(key).encode('utf-8'),
    value=json.dumps(msg).encode('utf-8'),
).add_callback(on_send_success).add_errback(on_send_error)

Simperium Data Dictionary or Decoder Ring for Return Value on "all" call?

I've looked through all of the Simperium API docs for all of the different programming languages and can't seem to find this. Is there any documentation for the data returned from an ".all" call (e.g. api.todo.all(:cv=>nil, :data=>false, :username=>false, :most_recent=>false, :timeout=>nil) )?
For example, this is some data returned:
{"ccid"=>"10101010101010101010101010110101010",
"o"=>"M",
"cv"=>"232323232323232323232323232",
"clientid"=>"ab-123123123123123123123123",
"v"=>{
"date"=>{"o"=>"+", "v"=>"2015-08-20T00:00:00-07:00"},
"calendar"=>{"o"=>"+", "v"=>false},
"desc"=>{"o"=>"+", "v"=>"<p>test</p>\r\n"},
"location"=>{"o"=>"+", "v"=>"Los Angeles"},
"id"=>{"o"=>"+", "v"=>43}
},
"ev"=>1,
"id"=>"abababababababababababababab/10101010101010101010101010110101010"}
I can figure out some of it just from context or from the name of the key, but a lot of it is guesswork and trial and error. The one that concerns me is the value returned for the "o" key. I assume that a value of "M" is modify and a value of "+" is add. I've also run into "-" for delete, and just recently discovered that there is also a "! '-'" which is also a delete, but I don't know what else it signifies. What other values can be returned in the "o" key? Are there other keys/values that can be returned but are rare? Is there documentation that details what can be returned (that would be the most helpful)?
If it matters, I am using the Ruby API but I think this is a question that, if answered, can be helpful for all APIs.
The response you are seeing is a list of all of the changes which have occurred in the given bucket since some point in its history. In the case where cv is blank, it tries to get the full history.
You can find some of the details in the protocol documentation though it's incomplete and focused on the WebSocket message syntax (the operations are the same however as with the HTTP API).
The information provided by the v parameter is the result of applying the JSON-diff algorithm to the data between changes. With this diff information you can reconstruct the data at any given version as the changes stream in.
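To make that concrete, here is a small, purely illustrative Python sketch of applying the per-field "o"/"v" entries from a change like the one above to a local copy of an object. It is based only on the operation codes observed in the question ('+' add, '-' remove), not on official Simperium documentation, so treat the interpretation as an assumption:
def apply_field_changes(doc, field_changes):
    # field_changes is the "v" dict from a change record; each entry
    # carries its own "o" (operation) and "v" (value).
    updated = dict(doc)
    for field, change in field_changes.items():
        op = change.get('o')
        if op == '+':                 # field added
            updated[field] = change['v']
        elif op == '-':               # field removed
            updated.pop(field, None)
        else:                         # other ops treated here as a plain set
            updated[field] = change.get('v')
    return updated

change_v = {
    'location': {'o': '+', 'v': 'Los Angeles'},
    'id': {'o': '+', 'v': 43},
}
print(apply_field_changes({}, change_v))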

How to upload multiple JSON files into CouchDB

I am new to CouchDB. I need to get 60 or more JSON files a minute from a server.
I have to upload these JSON files to CouchDB individually, as soon as I receive them.
I installed CouchDB on my Linux machine.
I hope someone can help me with my requirement.
If possible, can someone help me with pseudo code?
My idea:
Write a Python script that uploads all the JSON files to CouchDB.
Each JSON file must become its own document, and the data present in
the JSON must be inserted into CouchDB unchanged
(the same format and values as in the file).
Note:
These JSON files are transactional; one file is generated every second,
so I need to read each file, upload it in the same format to CouchDB, and on
successful upload archive the file into a different folder on the local system.
Python program to parse the JSON and insert it into CouchDB:
import sys
import glob
import errno, time, os
import couchdb, simplejson
import json
from pprint import pprint

couch = couchdb.Server()  # Assuming localhost:5984
# couch.resource.credentials = (USERNAME, PASSWORD)
# If your CouchDB server is running elsewhere, set it up like this:
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']

path = 'C:/Users/Desktop/CouchDB_Python/Json_files/*.json'
# dirPath = 'C:/Users/VijayKumar/Desktop/CouchDB_Python'
files = glob.glob(path)
for name in files:  # 'file' is a builtin type, 'name' is a less-ambiguous variable name.
    try:
        with open(name) as f:  # No need to specify 'r': this is the default.
            data = json.load(f)
            db.save(data)
            pprint(data)
        # time.sleep(2)
    except IOError as exc:
        if exc.errno != errno.EISDIR:  # Do not fail if a directory is found, just ignore it.
            raise  # Propagate other kinds of IOError.
I would use the CouchDB bulk API, even though you have specified that you need to send them to the db one by one. For example, implementing a simple queue that gets flushed every, say, 5-10 seconds via a bulk docs call will greatly increase the performance of your application. See the sketch below.
There is obviously a quirk there, which is that you need to know the IDs of the docs that you want to get from the DB. But for PUTs it is perfect. (That is not entirely true, either: you can get ranges of docs using a bulk operation if the IDs you use for your docs can be sorted nicely.)
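As a rough illustration of that queue-plus-bulk idea (a sketch only, using the couchdb-python module from your script; db.update() is its bulk-docs call, and the flush interval is an arbitrary choice):
import couchdb

couch = couchdb.Server('http://localhost:5984/')  # assumed local server
db = couch['mydb']                                # assumed database name

queue = []

def enqueue(doc):
    # Collect incoming JSON docs instead of saving them one by one.
    queue.append(doc)

def flush_queue():
    # Send everything collected so far in a single _bulk_docs request.
    if not queue:
        return []
    results = db.update(queue)  # bulk update via couchdb-python
    del queue[:]
    return results

# Call flush_queue() from a timer, e.g. every 5-10 seconds as suggested above.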
From my experience working with CouchDB, I have a hunch that you are dealing with transactional documents in order to compile them into some sort of sum result and act on that data accordingly (maybe creating the next transactional doc in the series). For that you can rely on CouchDB by using 'reduce' functions on the views you create. It takes a little practice to get a reduce function working properly, and it is highly dependent on what you actually want to achieve and what data you are prepared to emit from the view, so I can't really provide more detail on that.
So in the end the app logic would go something like this:
get _design/someDesign/_view/yourReducedView
calculate new transaction
add transaction to queue
onTimeout
send all in transaction queue
If I got the first part (why you are using transactional docs) wrong, all that would really change is the part of the app logic where you get those transactional docs.
Also, before writing your own 'reduce' function, have a look at the built-in ones (they are a lot faster than anything outside the db engine can do).
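For example, a view using the built-in _count reduce could be saved and queried like this (a sketch; the design and view names simply echo the pseudo-logic above and are made up):
import couchdb

couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']

design_doc = {
    '_id': '_design/someDesign',
    'views': {
        'yourReducedView': {
            'map': "function(doc) { emit(doc.type, 1); }",
            'reduce': '_count',  # built-in reduce, runs inside the db engine
        }
    },
}
db.save(design_doc)

# Query the reduced view, grouped by key:
for row in db.view('someDesign/yourReducedView', group=True):
    print(row.key, row.value)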
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
EDIT:
Since you are just starting, I strongly recommend having a look at the CouchDB Definitive Guide.
NOTE FOR LATER:
Here is one hidden stone (well, maybe not so much a hidden stone as something a newcomer would not obviously look out for). When you write a reduce function, make sure that it does not produce too much output for a query without boundaries. This will extremely slow down the entire view, even when you pass reduce=false when getting stuff from it.
So you need to get JSON documents from a server and send them to CouchDB as you receive them. A Python script would work fine. Here is some pseudo-code:
loop (until no more docs)
    get new JSON doc from server
    send JSON doc to CouchDB
end loop
In Python, you could use requests to send the documents to CouchDB and probably to get the documents from the server as well (if it is using an HTTP API).
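Here is a minimal sketch of that loop with requests, assuming CouchDB on localhost:5984 and a database named mydb; get_new_doc_from_server is a hypothetical placeholder for however you actually receive the documents:
import requests

COUCH_URL = 'http://localhost:5984/mydb'  # assumed database URL

def send_to_couchdb(doc):
    # POSTing to the database creates a new document with a generated _id.
    resp = requests.post(COUCH_URL, json=doc)
    resp.raise_for_status()
    return resp.json()

while True:
    doc = get_new_doc_from_server()  # hypothetical: fetch the next JSON doc
    if doc is None:                  # no more docs
        break
    send_to_couchdb(doc)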
You might want to check out the pycouchdb module for Python 3. I've used it myself to upload lots of JSON objects into a CouchDB instance. My project does pretty much the same as you describe, so you can take a look at my project Pyro on GitHub for details.
My class looks like this:
class MyCouch:
    """ COMMUNICATES WITH COUCHDB SERVER """

    def __init__(self, server, port, user, password, database):
        # ESTABLISHING CONNECTION
        self.server = pycouchdb.Server("http://" + user + ":" + password + "@" + server + ":" + port + "/")
        self.db = self.server.database(database)

    def check_doc_rev(self, doc_id):
        # CHECKS REVISION OF SUPPLIED DOCUMENT
        try:
            rev = self.db.get(doc_id)
            return rev["_rev"]
        except Exception as inst:
            return -1

    def update(self, all_computers):
        # UPDATES DATABASE WITH JSON STRING
        try:
            result = self.db.save_bulk(all_computers, transaction=False)
            sys.stdout.write(" Updating database")
            sys.stdout.flush()
            return result
        except Exception as ex:
            sys.stdout.write("Updating database")
            sys.stdout.write("Exception: ")
            print(ex)
            sys.stdout.flush()
            return None
Let me know in case of any questions - I will be more than glad to help if you find some of my code usable.
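A quick usage sketch of the class above (connection details and documents are placeholders, and it assumes import sys and import pycouchdb at the top of the module):
couch = MyCouch('localhost', '5984', 'admin', 'secret', 'mydb')

docs = [
    {'_id': 'doc1', 'value': 1},
    {'_id': 'doc2', 'value': 2},
]
couch.update(docs)                   # bulk upload via save_bulk
print(couch.check_doc_rev('doc1'))   # current revision, or -1 if missing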