My Lambda is not able to access MySQL RDS. What to do? - mysql

I have an architecture where the lambda would run when a irs data file is put on the S3 bucket, I can easily connect to my RDS on my local machine but for some very weird reason the Lambda is not able to access it and giving error:
"errorMessage": "2022-11-15T22:22:51.919Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Task timed out after 60.06 seconds"
2-11-15T22:21:53.402Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 URI updated to: https://irs-data.s3.amazonaws.com/?prefix=index&encoding-type=url
[DEBUG] 2022-11-15T22:21:53.402Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Calculating signature using v4 auth.
[DEBUG] 2022-11-15T22:21:53.402Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 CanonicalRequest:
GET
/
encoding-type=url&prefix=index
host:irs-data.s3.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20221115T222153Z
x-amz-security-token:IQoJb3JpZ2luX2VjEP///////////wEaCXVzLWVhc3QtMSJHMEUCIQC9x2awzo/kQIantnRem2kmylKVHw5fBV+ylz/PQeP0DwIgQHovdX5Jv9/cpe/PAaWDTBZGcc3TxXGUALQRJCh1XMsq7QII9///////////ARAAGgw5NzU4MjIzMjkxNDIiDDegGWv5Wxk3ihIEdCrBAlqSbCaW/e4tIn2SK5gAcePArZf5Ij7o1qhoqEyG2boXivxftDkd7vM3RGg9lK2YaMEx9ku3mCBFpS03T5zlbr2EnaQjRuvEZzdHBKY79qqbUOCqcITmYkQQK+GSCoAyfnckjbjY0yORD41/7OS6wRa9pRKzu0ib8V/aE8Uln5Eem9ylYSn7LdyNWanD2I0CNfYNMV+Xx0bduAhVyXP6HjXikjTG5e2gqlA61xQmq4NMXyRixxINUk47R1FWBqPnYVqQWOIPW1HKcbj26qlW+JJyh530ML1RK3qqkssnH7c0LGu8rJz9Ag9wldHcRlODljZcaOmX7OlErdwIImGoeb99ngcVKVrCc+QnegTQolsoAhU3AG68LrZrmY/zRborttAslMzeUpiZ4fkA86QKJJDdpJEL/sZc/ZXzBMCj2x/ZozD+odCbBjqeAVPiKRQMCuBUqK8LlnALW2ki6RwMyS8WmGFpSoDjUYcyFDhMkHSa8TnTa+0gdertafyc4c4NPfsWFBYTLavdkgmACCkug75ENt3LWAgpGvBMxp6f2hiZKjJzqQnOE6VofIUXU8PLycB+L9uaJuYplLuMoRmjURtHFj5whMZrGclS0+V9/eH2ep8x9SAiFIJ1yOimmox6FTw2DhvpuE8U
host;x-amz-content-sha256;x-amz-date;x-amz-security-token
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
[DEBUG] 2022-11-15T22:21:53.439Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 StringToSign:
AWS4-HMAC-SHA256
20221115T222153Z
20221115/us-east-1/s3/aws4_request
5d61ffe01d9d6dd6aee4b1faeecbf21721efb8696f94f969389c93b05579847c
[DEBUG] 2022-11-15T22:21:53.439Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Signature:
d6e70d2c6350adfa7231bd7b2a63e5ac2fd83583f5dde1dfada2b08854d493d2
[DEBUG] 2022-11-15T22:21:53.439Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Sending http request: <AWSPreparedRequest stream_output=False, method=GET, url=https://irs-data.s3.amazonaws.com/?prefix=index&encoding-type=url, headers={'User-Agent': b'Boto3/1.19.10 Python/3.9.13 Linux/4.14.255-285-225.501.amzn2.x86_64 exec-env/AWS_Lambda_python3.9 Botocore/1.22.12 Resource', 'X-Amz-Date': b'20221115T222153Z', 'X-Amz-Security-Token': b'IQoJb3JpZ2luX2VjEP///////////wEaCXVzLWVhc3QtMSJHMEUCIQC9x2awzo/kQIantnRem2kmylKVHw5fBV+ylz/PQeP0DwIgQHovdX5Jv9/cpe/PAaWDTBZGcc3TxXGUALQRJCh1XMsq7QII9///////////ARAAGgw5NzU4MjIzMjkxNDIiDDegGWv5Wxk3ihIEdCrBAlqSbCaW/e4tIn2SK5gAcePArZf5Ij7o1qhoqEyG2boXivxftDkd7vM3RGg9lK2YaMEx9ku3mCBFpS03T5zlbr2EnaQjRuvEZzdHBKY79qqbUOCqcITmYkQQK+GSCoAyfnckjbjY0yORD41/7OS6wRa9pRKzu0ib8V/aE8Uln5Eem9ylYSn7LdyNWanD2I0CNfYNMV+Xx0bduAhVyXP6HjXikjTG5e2gqlA61xQmq4NMXyRixxINUk47R1FWBqPnYVqQWOIPW1HKcbj26qlW+JJyh530ML1RK3qqkssnH7c0LGu8rJz9Ag9wldHcRlODljZcaOmX7OlErdwIImGoeb99ngcVKVrCc+QnegTQolsoAhU3AG68LrZrmY/zRborttAslMzeUpiZ4fkA86QKJJDdpJEL/sZc/ZXzBMCj2x/ZozD+odCbBjqeAVPiKRQMCuBUqK8LlnALW2ki6RwMyS8WmGFpSoDjUYcyFDhMkHSa8TnTa+0gdertafyc4c4NPfsWFBYTLavdkgmACCkug75ENt3LWAgpGvBMxp6f2hiZKjJzqQnOE6VofIUXU8PLycB+L9uaJuYplLuMoRmjURtHFj5whMZrGclS0+V9/eH2ep8x9SAiFIJ1yOimmox6FTw2DhvpuE8U', 'X-Amz-Content-SHA256': b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': b'AWS4-HMAC-SHA256 Credential=ASIA6GM4LCU3GISECH6S/20221115/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=d6e70d2c6350adfa7231bd7b2a63e5ac2fd83583f5dde1dfada2b08854d493d2'}>
[DEBUG] 2022-11-15T22:21:53.459Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Certificate path: /var/task/botocore/cacert.pem
[DEBUG] 2022-11-15T22:21:53.459Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Starting new HTTPS connection (1): irs-data.s3.amazonaws.com:443
2022-11-15T22:22:51.919Z 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Task timed out after 60.06 seconds
END RequestId: 9f20c035-5a47-4c6f-be9f-407b4a43aee6
REPORT RequestId: 9f20c035-5a47-4c6f-be9f-407b4a43aee6 Duration: 60061.89 ms Billed Duration: 60000 ms Memory Size: 128 MB Max Memory Used: 116 MB Init Duration: 1040.64 ms
Lambda Code with S3 data:
import pandas as pd
import boto3
import os
from dotenv import load_dotenv
import logging
import sys
import time
import datetime as dt
import io
import pymysql
####### LOADING ENVIRONMENT VARIABLES #######
load_dotenv()
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
BUCKET = os.getenv('BUCKET')
BUCKET_PREFIX = os.getenv('BUCKET_PREFIX')
# Credentials to database connection
hostname= os.getenv('HOSTNAME')
dbname= os.getenv('DATABASE')
uname= os.getenv('USERNAME')
pwd= os.getenv('PASSWORD')
def lambda_handler(event, context):
try:
logger.info("TEST")
logger.info(BUCKET)
s3 = boto3.resource('s3')
# assigning the bucket:
my_bucket = s3.Bucket(BUCKET)
data_list = []
for my_bucket_object in my_bucket.objects.filter(Prefix=BUCKET_PREFIX):
if my_bucket_object.key.endswith(".csv"):
key=my_bucket_object.key
body=my_bucket_object.get()['Body'].read()
temp_data = pd.read_csv(io.BytesIO(body))
data_list.append(temp_data)
# concatenating all the files together:
df = pd.concat(data_list)
# Connect to MySQL Database
connection = pymysql.connect(host=hostname,user=uname,password=pwd,database=dbname)
cursor = connection.cursor()
# Truncate the table everytime before an ETL:
sql_trunc = "TRUNCATE TABLE `irs990`"
cursor.execute(sql_trunc)
# commit the results
connection.commit()
# creating columns from the dataframe:
cols = "`,`".join([str(i) for i in df.columns.tolist()])
# adding dataframe to mysql RDS
for i,row in df.iterrows():
sql = "INSERT INTO `irs990` (`" +cols + "`) VALUES (" + "%s,"*(len(row)-1) + "%s)"
cursor.execute(sql, tuple(row))
connection.commit()
# checking if data was successfully written:
sql = "SELECT * FROM `irs990`"
cursor.execute(sql)
result = cursor.fetchall()
for i in result:
print(i)
# closing MySQL connection:
connection.close()
except Exception as e:
logging.error(e)
My Lambda VPC details:
2
My RDS details:
3
4
5
Can somebody please help me what to do? I am assigning the lambda the same VPC as the RDS, I tried using the same security group as well and making sure the outbound IP address of lambda is in the inbound rules for RDS. But nothing :(

The proper security configuration should be:
A Security Group on the AWS Lambda function (Lambda-SG) that permits All outbound access (which is the default configuration)
A Security Group on the Amazon RDS database (DB-SG) that permits inbound connections on port 3306 from Lambda-SG
That is, DB-SG should specifically reference Lambda-SG. This will then permit the incoming connection from the Lambda function.
Merely putting the Lambda function and the RDS database "in the same Security Group" is insufficient because security groups apply to each resource individually. Unless the security group allows a connection from 'itself', this will not permit the desired access. Much better to use two security groups as described above.

Related

Mocking sqlalchemy.create_engine fails to mock the function

I want to test my DBConnect class, which manages and returns connections made with sqlalchemy.
I want to specifically test that the constructor of DBConnect calls sqlalchemy.create_engine and stores the engine in the connection.
My test function is this:
from unittest import mock
from catcom.db_connect import DBConnect
import os
class MockEngine:
def connect(self):
return "test_connection"
#mock.patch("sqlalchemy.create_engine", return_value=MockEngine())
def test_from_key_uses_correct_key(mock):
print(mock)
db = DBConnect("user", "pw", "127.0.0.1", "1234", "db")
assert db.connection == "test_connection"
However, I see a response of:
E sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "127.0.0.1", port 1234 failed: Connection refused
E Is the server running on that host and accepting TCP/IP connections?
How can I mock the create_engine method without it being actually called?
UPDATE
Adding the DB Connect class and relevant methods.
from sqlalchemy import create_engine
import os
class DBConnect:
"""Manages database connection using SQLalchemy"""
def __init__(self, user: str, password: str, host: str, port: str, database: str):
self.user = user
self.password = password
self.host = host
self.port = port
self.database = database
self.connection = self.connect()
def connect(self):
"""creates a connection object that is used to query database using SQLalchemy core
automatically created in __init__"""
engine = create_engine(
f"postgresql+psycopg2://{self.user}:{self.password}#{self.host}:{self.port}/{self.database}"
)
return engine.connect()
You are importing create_engine out of the sqlalchemy namespace with from sqlalchemy import create_engine, which puts it in the module's namespace.
That's what you need to patch, see where to patch.
#mock.patch('catcom.db_connect.create_engine', return_value='test_success')
Although, patching catcom.db_connect.create_engine to return 'test_success' will cause a problem when the DBConnect.connect method tries to call the connect method of the engine (which will be a string).

How can I connect to a moto mysql instance?

I created an RDS DB Instance using a mocked boto3 rds client. Here's how I set it up in my conftest.py
#pytest.fixture
def aws_credentials():
"""Mocked AWS Credentials for moto."""
os.environ["AWS_ACCESS_KEY_ID"] = "testing"
os.environ["AWS_SECRET_ACCESS_KEY"] = "testing"
os.environ["AWS_SECURITY_TOKEN"] = "testing"
os.environ["AWS_SESSION_TOKEN"] = "testing"
#pytest.fixture
def rds_client(aws_credentials, aws_region):
with mock_rds():
client = boto3.client("rds", region_name=aws_region)
yield client
Following the example here (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.Connecting.Python.html) I set up my mysql connector like this:
db_instance = rds_client.create_db_instance(DBInstanceIdentifier="TestDBInstanceIdentifier",
DBInstanceClass="db.m4.large", Engine="mysql",
MasterUsername="root", DBName="TestDBName")
print("RDS Instance-----------------------------------------------------")
print(db_instance)
host = db_instance['DBInstance']['Endpoint']['Address']
port = db_instance['DBInstance']['Endpoint']['Port']
user = db_instance['DBInstance']['MasterUsername']
dbname = db_instance['DBInstance']['DBName']
print("Starting connection")
token = rds_client.generate_db_auth_token(DBHostname=host, Port=port, DBUsername=user, Region=aws_region)
mydb = mysql.connector.connect(host=host, database=dbname, user=user, passwd=token, port=port)
However, the connector can't find the DB:
FAILED test_read_rds_db - mysql.connector.errors.DatabaseError: 2005 (HY000): Unknown MySQL server host 'TestDBInstanceIdentifier.aaaaaaaaaa.us-east-1.rds.amazonaws.com' (8)
Has someone been able to set this up before?
Moto does not offer this functionality. It mocks the AWS API, but does not expose any RDBMS-functionality.
You could look into Localstack instead. It uses Moto in the background to mock calls to AWS, but offers features on top of that such as the ability to connect to an RDS instance.
See the docs here: https://docs.localstack.cloud/aws/rds/

I am trying to call a DAG( wrtitten in Python) using Cloud Function(Python 3.7) and getting error "405 Method Not Found" Could someone help here?

I have used below code available on the Google Cloud Docs platform:
from google.auth.transport.requests import Request
from google.oauth2 import id_token
import requests
IAM_SCOPE = 'https://www.googleapis.com/auth/iam'
OAUTH_TOKEN_URI = 'https://www.googleapis.com/oauth2/v4/token'
def trigger_dag(data, context=None):
"""Makes a POST request to the Composer DAG Trigger API
When called via Google Cloud Functions (GCF),
data and context are Background function parameters.
For more info, refer to
https://cloud.google.com/functions/docs/writing/background#functions_background_parameters-python
To call this function from a Python script, omit the ``context`` argument
and pass in a non-null value for the ``data`` argument.
"""
# Fill in with your Composer info here
# Navigate to your webserver's login page and get this from the URL
# Or use the script found at
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/composer/rest/get_client_id.py
client_id = '87431184677-jitlhi9o0u9sin3uvdebqrvqokl538aj.apps.googleusercontent.com'
# This should be part of your webserver's URL:
# {tenant-project-id}.appspot.com
webserver_id = 'b368a47a354ddf2f6p-tp'
# The name of the DAG you wish to trigger
dag_name = 'composer_sample_trigger_response_dag'
webserver_url = (
#'https://'
webserver_id
+ '.appspot.com/admin/airflow/tree?dag_id='
+ dag_name
#+ '/dag_runs'
)
# Make a POST request to IAP which then Triggers the DAG
make_iap_request(
webserver_url, client_id, method='POST', json={"conf": data, "replace_microseconds": 'false'})
# This code is copied from
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/iap/make_iap_request.py
# START COPIED IAP CODE
def make_iap_request(url, client_id, method='GET', **kwargs):
"""Makes a request to an application protected by Identity-Aware Proxy.
Args:
url: The Identity-Aware Proxy-protected URL to fetch.
client_id: The client ID used by Identity-Aware Proxy.
method: The request method to use
('GET', 'OPTIONS', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE')
**kwargs: Any of the parameters defined for the request function:
https://github.com/requests/requests/blob/master/requests/api.py
If no timeout is provided, it is set to 90 by default.
Returns:
The page body, or raises an exception if the page couldn't be retrieved.
"""
# Set the default timeout, if missing
if 'timeout' not in kwargs:
kwargs['timeout'] = 90
# Obtain an OpenID Connect (OIDC) token from metadata server or using service
# account.
google_open_id_connect_token = id_token.fetch_id_token(Request(), client_id)
# Fetch the Identity-Aware Proxy-protected URL, including an
# Authorization header containing "Bearer " followed by a
# Google-issued OpenID Connect token for the service account.
resp = requests.request(
method, url,
headers={'Authorization': 'Bearer {}'.format(
google_open_id_connect_token)}, **kwargs)
if resp.status_code == 403:
raise Exception('Service account does not have permission to '
'access the IAP-protected application.')
elif resp.status_code != 200:
raise Exception(
'Bad response from application: {!r} / {!r} / {!r}'.format(
resp.status_code, resp.headers, resp.text))
else:
return resp.text
# END COPIED IAP CODE
You encounter the error "405 method not found" because you are trying to send a request to your_webserver_id.appspot.com/admin/airflow/tree?dag_id=composer_sample_trigger_response_dag which is the URL seen "Tree view" in the airflow webserver.
To properly send request to the Airflow API you need to construct the webserver_url just like in the documentation in Trigger DAGs in Cloud Functions. The webserver_url was constructed to use Trigger a new DAG endpoint to send requests. So if you'd like to trigger the DAG you can stick with the code below.
Airflow run DAG endpoint:
POST https://airflow.apache.org/api/v1/dags/{dag_id}/dagRuns
webserver_url = (
'https://'
+ webserver_id
+ '.appspot.com/api/experimental/dags/'
+ dag_name
+ '/dag_runs'
)
Moving forward, if you would like to perform different operations using the Airflow API you can check the Airflow REST reference

What is wrong in the following Lambda code that throws up module error?

Using the following code to use make an API that connects to Amazon AWS. This is the AMazon Lambda code that I use-
import boto3
import json
import requests
from requests_aws4auth import AWS4Auth
region = 'us-east-1'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region,
service, session_token=credentials.token)
host = 'XXX.com'
index = 'items'
url = 'https://' + host + '/' + index + '/_search'
# Lambda execution starts here
def handler(event, context):
# Put the user query into the query DSL for more accurate search results.
# Note that certain fields are boosted (^).
query = {
"query": {
"multi_match": {
"query": event['queryStringParameters']['q'],
}
}
}
# ES 6.x requires an explicit Content-Type header
headers = { "Content-Type": "application/json" }
# Make the signed HTTP request
r = requests.get(url, auth=awsauth, headers=headers,
data=json.dumps(query))
# Create the response and add some extra content to support CORS
response = {
"statusCode": 200,
"headers": {
"Access-Control-Allow-Origin": '*'
},
"isBase64Encoded": False
}
# Add the search results to the response
response['body'] = r.text
return response
This should connect to an AWS ES cluster with endpoint XXX.com
Getting output when trying to test -
START RequestId: f640016e-e4d6-469f-b74d-838b9402968b Version: $LATEST
Unable to import module 'index': Error
at Function.Module._resolveFilename (module.js:547:15)
at Function.Module._load (module.js:474:25)
at Module.require (module.js:596:17)
at require (internal/module.js:11:18)
END RequestId: f640016e-e4d6-469f-b74d-838b9402968b
REPORT RequestId: f640016e-e4d6-469f-b74d-838b9402968b Duration:
44.49 ms Billed Duration: 100 ms Memory Size: 128 MB Max
Memory Used: 58 MB
While creating a Lambda function, we need to specify a handler, which is a function in your code, that the AWS Lambda service can invoke when the given Lambda function is executed.
By default, a Python Lambda function is created with handler as lambda_function.lambda_handler which signifies that the service must invoke lambda_handler function contained inside lambda_function module.
From the error you're receiving, it seems that the handler is incorrectly configured to something like index.<something>, and since there's no Python module called index in your deployment package, Lambda is unable to import the same in order to start the execution.
If i am getting things correctly to connect to a AWS ES cluster you need something of this sort
import gitlab
import logging
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3
#from aws_requests_auth.aws_auth import AWSRequestsAuth
LOGGER = logging.getLogger()
ES_HOST = {'host':'search-testelasticsearch-xxxxxxxxxx.eu-west-2.es.amazonaws.com', 'port': 443}
def lambda_handler(event, context):
LOGGER.info('started')
dump2={
'number': 9
}
service = 'es'
credentials = boto3.Session().get_credentials()
print('-------------------------------------------')
print(credentials.access_key)
print(credentials.secret_key)
print('--------------------------------------------------------')
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, "eu-west-2", service, session_token=credentials.token)
es = Elasticsearch(hosts=[ES_HOST], http_auth = awsauth, use_ssl = True, verify_certs = True, connection_class = RequestsHttpConnection)
DAVID_INDEX = 'test_index'
response = es.index(index=DAVID_INDEX, doc_type='is_this_important?', body=dump2, id='4')

Database connection error while celery worker remains idle for 24 hours

I have a django based web application where I am using Kafka to process some orders. Now I use Celery Workers to assign a Kafka Consumer to each topics. Each Kafka Consumer is assigned to a Kafka topic in the form of a Kafka tasks. However after a day or so, when I am submitting a task I am getting the following error :
_mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
The above exception was the direct cause of the following exception:
Below is how my tasks.py file looks like :
#shared_task
def init_kafka_consumer(topic):
try:
if topic is None:
raise Exception("Topic is none, unable to initialize kafka consumer")
logger.info("Spawning new task to subscribe to topic")
params = []
params.append(topic)
background_thread = Thread(target=sunscribe_consumer, args=params)
background_thread.start()
except Exception :
logger.exception("An exception occurred while reading message from kafka")
def sunscribe_consumer(topic) :
try:
if topic is None:
raise Exception("Topic is none, unable to initialize kafka consumer")
conf = {'bootstrap.servers': "localhost:9092", 'group.id': 'test', 'session.timeout.ms': 6000,
'auto.offset.reset': 'earliest'}
c = Consumer(conf)
logger.info("Subscribing consumer to topic "+str(topic[0]))
c.subscribe(topic)
# Read messages from Kafka
try:
while True:
msg = c.poll(timeout=1.0)
if msg is None:
continue
if msg.error():
raise KafkaException(msg.error())
else:
try:
objs = serializers.deserialize("json", msg.value())
for obj in objs:
order = obj.object
order = BuyOrder.objects.get(id=order.id) #Getting an error while accessing DB
if order.is_pushed_to_kafka :
return
order.is_pushed_to_kafka = True
order.save()
from web3 import HTTPProvider, Web3, exceptions
w3 = Web3(HTTPProvider(INFURA_MAIN_NET_ETH_URL))
processBuyerPayout(order,w3)
except Exception :
logger.exception("An exception occurred while de-serializing message")
except Exception :
logger.exception("An exception occurred while reading message from kafka")
finally:
c.close()
except Exception :
logger.exception("An exception occurred while reading message from kafka")
Is there anyway that I could check if database connection exists as soon as a task is received and if not, I can re-establish the connection?
According to https://github.com/celery/django-celery-results/issues/58#issuecomment-418413369
and comments above putting this code:
from django.db import close_old_connections
close_old_connections()
which is closing old connection and opening new one inside your task should helps.