I am trying to get Elasticsearch CloudWatch metrics using boto, but whatever I do I get no values back. Below is a snippet of my code; the same code works if I use it for RDS metrics, for example.
import datetime

import boto.ec2.cloudwatch
import boto.regioninfo

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(minutes=5)
metric = "CPUUtilization"
region = boto.regioninfo.RegionInfo(
    name='ap-southeast-1',
    endpoint='monitoring.ap-southeast-1.amazonaws.com')
conn = boto.ec2.cloudwatch.CloudWatchConnection(region=region)
data = conn.get_metric_statistics(60, start, end, metric, "AWS/ES", "Average", {"DomainName": "My-es-name"})
print data
[]
However, if I change the namespace to RDS (with the proper dimension value) it works fine. This is about as simple as the code gets, and I am not sure what is wrong here. Can anyone help me figure it out? What am I doing wrong?
Thanks
I found the solution.
To pull Elasticsearch metrics for a specific domain name, you also need to include your ClientId in the dimensions.
My examples below are in boto3, but to run this with your code (boto2) I believe you only need to amend the dimensions as follows, assuming your syntax was otherwise right:
data = conn.get_metric_statistics(60, start, end, metric, "AWS/ES", "Average", {"ClientId":"My-client-id", "DomainName": "My-es-name"})
Try the code below (boto3). It worked for me.
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.resource('cloudwatch', region_name='ap-southeast-1')
cpu = cloudwatch.Metric('AWS/ES', 'CPUUtilization')
cpu_usage = cpu.get_statistics(
    Dimensions=[
        {'Name': 'ClientId', 'Value': 'YOUR-CLIENT-ID'},
        {'Name': 'DomainName', 'Value': 'YOUR-DOMAIN-NAME'}
    ],
    StartTime=(datetime.utcnow() - timedelta(minutes=5)).isoformat(),
    EndTime=datetime.utcnow().isoformat(),
    Period=60,
    Statistics=['Average']
)
If you prefer to use a client, use the following instead:
client = boto3.client('cloudwatch', region_name='ap-southeast-1')
response = client.get_metric_statistics(
    Namespace='AWS/ES',
    MetricName='CPUUtilization',
    Dimensions=[
        {'Name': 'ClientId', 'Value': 'YOUR-CLIENT-ID'},
        {'Name': 'DomainName', 'Value': 'YOUR-DOMAIN-NAME'}
    ],
    StartTime=(datetime.utcnow() - timedelta(minutes=5)).isoformat(),
    EndTime=datetime.utcnow().isoformat(),
    Period=60,
    Statistics=['Average']
)
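In case it helps: the ClientId dimension for AWS/ES metrics is your AWS account ID. A quick way to look it up programmatically, assuming your credentials are already configured, is via STS:

import boto3

# The ClientId dimension used by AWS/ES metrics is the AWS account ID;
# STS returns it for the currently configured credentials.
account_id = boto3.client('sts').get_caller_identity()['Account']
print(account_id)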
I am trying to load the airbnb_nyc data set from a GCS bucket to a BigQuery table. Link to the dataset.
I am using the following code:

import csv

import apache_beam as beam
from apache_beam.io.textio import ReadFromText


def parse_file(element):
    # Parse one CSV line into a list of field values.
    for line in csv.reader([element], delimiter=','):
        return line


class DataIngestion2:
    def parse_method2(self, values):
        # Zip the column names with the parsed values into a row dict.
        row1 = dict(
            zip(('id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude',
                 'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month',
                 'calculated_host_listings_count', 'availability_365'),
                values))
        return row1


with beam.Pipeline(options=pipeline_options) as p:
    lines = (p | 'Read' >> ReadFromText(known_args.input, skip_header_lines=1)
               | 'parse' >> beam.Map(parse_file))
    pipeline2 = lines | 'Format to Dict _ original CSV' >> beam.Map(lambda x: data_ingestion2.parse_method2(x))
    pipeline2 | 'Load2' >> beam.io.WriteToBigQuery(table_spec, schema=table_schema,
                                                   write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                                                   create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
But my output in the BigQuery table is wrong. I am only getting values for the first two columns, and the remaining 14 columns show NULL. I am not able to figure out what I am doing wrong. Can someone help me find the error in my logic? I basically want to know how to transfer a CSV from a GCS bucket to BigQuery through a Dataflow pipeline.
Thank you
You can use the ReadFromText method and then create your own transform by extending beam.DoFn. The code is attached below for reference.
https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromText
Note that you can use gs:// for GCS in file_pattern.
More details about ParDo and DoFn:
https://beam.apache.org/documentation/programming-guide/#pardo
import csv

import apache_beam as beam
from apache_beam.io.gcp.bigquery import WriteToBigQuery
from apache_beam.io.gcp.gcsio import GcsIO

COLUMN_NAMES = ['id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude', 'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']


def files(path='gs://some/path'):
    # List every object under the given GCS prefix.
    return list(GcsIO(storage_client='<ur storage client>').list_prefix(path=path).keys())


def transform_csv(element):
    # element is one GCS object path from files(); parse the whole file.
    rows = []
    with open(element, newline='\r\n') as f:
        itr = csv.reader(f, delimiter=',', quotechar='"')
        next(itr)  # skip the header row
        for row in itr:
            rows.append(row)
    return rows


def to_dict(element):
    # element is the list of rows from one file; yield one dict per row so
    # downstream transforms (e.g. WriteToBigQuery) receive row dicts.
    for item in element:
        yield dict(zip(COLUMN_NAMES, item))


with beam.Pipeline() as p:
    read = (
        p
        | 'read-file' >> beam.Create(files())
        | 'transform-dict' >> beam.Map(transform_csv)
        | 'list-to-dict' >> beam.FlatMap(to_dict)
        | 'print' >> beam.Map(print)
        # | 'write-to-bq' >> WriteToBigQuery(schema=COLUMN_NAMES, table='ur table', project='', dataset='')
    )
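For completeness, since the ParDo/DoFn link above is not otherwise illustrated: the list-to-dict step could equally be written as a small DoFn. A hypothetical sketch, reusing COLUMN_NAMES from the snippet above:

import apache_beam as beam


class ToDictFn(beam.DoFn):
    # Hypothetical DoFn equivalent of the to_dict function above.
    def process(self, element):
        # element is the list of parsed rows from one file.
        for item in element:
            yield dict(zip(COLUMN_NAMES, item))

It would replace the FlatMap step as | 'list-to-dict' >> beam.ParDo(ToDictFn()).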
EDIT 1: ReadFromText supports \r\n as a newline char, but that fails to consider the condition where the column data itself contains \r\n. The code above has been updated accordingly.
EDIT 2: GcsIO error fixed.
Note - I have used GcsIO for getting the list of files. Details here.
Please up-vote and mark as answer if this helps.
Let me suggest another approach for this use case. BigQuery offers a special feature for loading data from Google Cloud Storage (GCS) into BigQuery. You can load data in several formats, and CSV is among them.
There is a nice tutorial in the Google documentation explaining how to do it. You do not have to use Dataflow or apache_beam; the process is available through the BigQuery API itself.
This works in many languages, but you do not have to use any language at all, as the same process can be done from the console or via the Cloud SDK using the bq command. Everything can be found in the mentioned tutorial.
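For illustration, a minimal sketch with the Python BigQuery client; the bucket, dataset, and table names are placeholders you would substitute:

from google.cloud import bigquery

client = bigquery.Client()

# Let BigQuery parse the CSV straight from GCS; autodetect infers the schema.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    'gs://your-bucket/airbnb_nyc.csv',        # placeholder source file
    'your-project.your_dataset.airbnb_nyc',   # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish

The equivalent Cloud SDK invocation would be along the lines of bq load --autodetect --skip_leading_rows=1 --source_format=CSV your_dataset.airbnb_nyc gs://your-bucket/airbnb_nyc.csv.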
I'm currently trying to create a JSON object to insert into a MongoDB database. However, when I create the JSON object, the script appears to reach the end of the creation and then never continues on to the insert statement that would place the object in the database. Here is an example of my code. For security reasons, I've changed some of the sensitive values to 99999, but I've verified that the authorisation section of the code works.
client = MongoClient('9999999', 999999)  # LAN
db = client.999999
site = client.99999
db.authenticate('9999999', '999999')  # user authentication
collection = site.9999999

test = {
    'Timestamp': timestamp(),
    'Device Type': device,
    'Instrument Serial': instrumentserial,
    'Calibration Date': cal,
    'Survey Date': dateofsurvey,
    'Survey Time': timeofsurvey,
    'Site Name': nameofsite,
    'Survey Location': survloc,
    'Frequency Survey Reference': survref,
    'Voltage': v,
    'Insulation': ins,
    'Tuner Select': tunerselect,
    'Start Frequency': frequency,
    'Reference Level': referencelevel,
    'RBW': rbw,
    'Instrument Mode': instrumentmode,
    'Detector Mode': detectormode,
    'Sweep Rate': sweeprate,
    'Trigger Mode': triggermode,
    'Trigger Level': triggerlevel,
}

collection.insert_one(test)
All the variables have been parsed from an XML file. Any help would be greatly appreciated, as the code appears to just stop.
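Since there is no traceback to go on, one way to narrow this down is to make the insert report its outcome explicitly. A small debugging sketch, reusing the collection and test objects from above:

# Debugging sketch: surface the result or the error instead of failing silently.
try:
    result = collection.insert_one(test)
    print('Inserted document with _id:', result.inserted_id)
except Exception as exc:
    # BSON encoding problems (e.g. a non-serializable value in `test`) surface here.
    print('Insert failed:', exc)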
I am trying to parse data in our organization's HR system for people's profiles. I am using Selenium in Python to do the work, but I have encountered some difficulties.
I have the list of URLs and I want to extract the people who report to them. The dummy data is displayed below (same structure for all URLs):
[{'Name': 'Jon Doe','prop': {'ID': '5646'},'ManagerName': 'Kat'},
{'Name': 'Maokai','prop': {'ID': '48521'},'ManagerName': 'Malphite'},
{'Name': 'Ryze','prop': {'ID': '43547'},'ManagerName': 'Wukong'},
{'Name': 'Zed','prop': {'ID': '98244'},'ManagerName': 'Annie'}]
I tried the code below, but it only extracts info for the 10th URL... the output lists don't aggregate. Can anyone tell me what is wrong with the code and how to fix it?
driver = webdriver.Chrome(executable_path=r'xxx\chromedriver.exe')
for url in URL_lst[:10]:
    driver.get(url)
    time.sleep(10)
    data = json.loads(driver.find_element_by_tag_name('body').text)
    NAME_lst = []
    ID_lst = []
    Manager_lst = []
    for profile in data:
        NAME_lst.append(profile['Name'])
        ID_lst.append(profile['prop']['ID'])
        Manager_lst.append(profile['ManagerName'])
df_outputs = pd.DataFrame({'NAME': NAME_lst,
                           'ID': ID_lst,
                           'Manager': Manager_lst})
The expected output would be the aggregation of the outputs for all 10 URLs.
For security reasons, I cannot post the URLs. Thanks for understanding.
Looks like an indentation issue. Check this once:
driver = webdriver.Chrome(executable_path=r'xxx\chromedriver.exe')
# Initialise the lists once, before the loop, so every URL appends to them.
NAME_lst = []
ID_lst = []
Manager_lst = []
for url in URL_lst[:10]:
    driver.get(url)
    time.sleep(10)
    data = json.loads(driver.find_element_by_tag_name('body').text)
    for profile in data:
        NAME_lst.append(profile['Name'])
        ID_lst.append(profile['prop']['ID'])
        Manager_lst.append(profile['ManagerName'])
OK. I found the solution myself:
driver = webdriver.Chrome(executable_path=r'xxx\chromedriver.exe')
output = []
for url in URL_lst[:10]:
    driver.get(url)
    time.sleep(10)
    data = json.loads(driver.find_element_by_tag_name('body').text)
    output.append(data)
Then, create loops to append the info, as sketched below.
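A minimal sketch of those follow-up loops, assuming the same field names as the dummy data above, to flatten output into one DataFrame:

import pandas as pd

# Flatten the per-URL lists collected in `output` into one DataFrame.
NAME_lst, ID_lst, Manager_lst = [], [], []
for data in output:          # one entry per URL
    for profile in data:     # one profile dict per person
        NAME_lst.append(profile['Name'])
        ID_lst.append(profile['prop']['ID'])
        Manager_lst.append(profile['ManagerName'])

df_outputs = pd.DataFrame({'NAME': NAME_lst,
                           'ID': ID_lst,
                           'Manager': Manager_lst})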
I have a list. I am trying to JSON dump and load it and then get specific data out of it, but it's not working.
x = [
    AttributeDict({
        'address': '0xf239F8424AffCbf9CC08Bd0110F0Df011Bcd2e68',
        'logIndex': 0,
        'args': AttributeDict({
            '_value': 63
        }),
        'transactionHash': HexBytes('0x96d06e0f112247fd584cfe9fbdf726d172ec0703bad3604c1182e0abcb67a45a'),
        'event': 'Energy',
        'blockHash': HexBytes('0x3ee6e9f4d682d9a99a94828e9ad7eb7e009e464aed980cd6c3055f62703599fa'),
        'blockNumber': 1327084,
        'transactionIndex': 0
    })
]
This is the response above.
I need to get the "_value" out of it, so I first did the dump:
y = json.dumps(x)
and then the load:
z = json.loads(y)
but I am not getting any data by putting e.g.
z['AttributeDict']
How can I get that out of it? Thanks.
So the answer is: there is a module in web3.py called web3.datastructures, and with the code below you can get the values out of it.
import web3.datastructures as wd

res = wd.AttributeDict(x[0])
print(res['args']['_value'])
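Note that AttributeDict already behaves like a regular read-only mapping, so (assuming the same x as above) you can also read the field directly without re-wrapping it:

print(x[0]['args']['_value'])  # AttributeDict supports plain dict-style access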
I am trying to write my own custom API view and I am struggling a bit with querysets and JSON. It shouldn't be that complicated, but I am still stuck. I am also confused by some strange behaviour of the loop I coded.
Here is my view:
@api_view()
def BuildingGroupHeatYear(request, pk, year):
    passed_year = str(year)
    building_group_object = get_object_or_404(BuildingGroup, id=pk)
    buildings = building_group_object.buildings.all()
    for item in buildings:
        demand_heat_item = item.demandheat_set.filter(year=passed_year).values('building_id', 'year', 'demand')
        print(demand_heat_item)
        print(type(demand_heat_item))
    return Response(demand_heat_item)
OK, so this actually gives me back exactly what I want, namely:
{'building_id': 1, 'year': 2019, 'demand': 230.3}{'building_id': 1, 'year': 2019, 'demand': 234.0}
OK, great, but why? Shouldn't the data be overwritten each time the loop goes over it?
Also, when I check the type of demand_heat_item I get back a queryset: <class 'django.db.models.query.QuerySet'>.
But this is an API view, so I would like to get JSON back. Shouldn't that throw an error?
And how could I do this so that I get the same data structure back as JSON?
I tried to rewrite it like this, but without success, because I can't serialize it:
@api_view()
def BuildingGroupHeatYear(request, pk, year):
    passed_year = str(year)
    building_group_object = get_object_or_404(BuildingGroup, id=pk)
    buildings = building_group_object.buildings.all()
    demand_list = []
    for item in buildings:
        demand_heat_item = item.demandheat_set.filter(year=passed_year).values('building_id', 'year', 'demand')
        demand_list.append(demand_heat_item)
    json_data = json.dumps(demand_list)
    return Response(json_data)
I also tried with JsonResponse and the JSON decoder.
But maybe there is a better way to do this? Or maybe my question is formulated more clearly like this: how can I get the data out of the loop and return it as JSON?
Any help is much appreciated. Thanks in advance!
Also, I tried the following:
for item in buildings:
    demand_heat_item = item.demandheat_set.filter(year=passed_year).values('building_id', 'year', 'demand')
json_data = json.dumps(list(demand_heat_item))
return Response(json_data)
That gives me this weird response, which I don't really want:
"[{\"building_id\": 1, \"year\": 2019, \"demand\": 230.3}, {\"building_id\": 1, \"year\": 2019, \"demand\": 234.0}]"