How to debug knex.js without having to pollute my db? - mysql

Does anyone know anything about debugging with knex.js and MySQL? I'm trying to do a lot of things and test out stuff, and I keep polluting my test database with random data. Ideally, I'd just like to see what the output query would be instead of running it against the actual database to see if it actually worked.
I can't find anything too helpful in their docs. They mention passing { debug: true } as one of the options in your initialization settings, but they don't really explain what it does.
I am a junior developer, so maybe some of this is not meant to be understood by juniors, but at the end of the day it's just not clear at all what steps I should take to be able to just see what queries would have been run instead of running the real queries and polluting my db.
const result = await db().transaction(trx =>
  trx.insert(mapToSnakeCase(address), 'id').into('addresses')
    .then(addressId =>
      trx.insert({ addresses_id: addressId, display_name: displayName }, 'id')
        .into('chains'))
    .toString()
);

You can build a knex query, but until you await it, attach a .then(), or run .asCallback((err, rows) => {}), the query is just an object.
So you could do
let localVar = 8
let query = knex('table-a').select().where('id', localVar)
console.log(query.toString())
// outputs a string 'select * from table-a where id = 8'
This does not hit the database, and is synchronous. Make as many of these as you want!
As soon as you do await query, or query.then(rows => {}), or query.asCallback((err, rows) => {}), you are awaiting the DB results, starting the promise chain, or defining the callback. That is when the database is hit.
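For example, picking up the query built above:

// Reusing `query` from the snippet above: only now does knex talk to MySQL
const rows = await query;                          // promise form
// query.then(rows => { /* use rows */ });         // equivalent promise chain
// query.asCallback((err, rows) => { /* ... */ }); // callback form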
Turning on debug: true when initializing just writes the results of query.toSQL() to the console as the queries run against the actual DB. Sometimes an app makes a lot of queries, and if one goes wrong this is a way to see why a DB call failed (but it is VERY verbose, so it's typically not left on all the time).
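For reference, this is roughly where that option goes when initializing knex (just a sketch; the connection values are placeholders for your own settings):

const knex = require('knex')({
  client: 'mysql',
  connection: {
    host: '127.0.0.1',   // placeholder connection settings
    user: 'app',
    password: 'secret',
    database: 'test_db'
  },
  debug: true            // log each query's toSQL() output as it executes
});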
In our app's tests, we do actually test against the database because unit testing this type of stuff is a mess. We use knex's migrations on a test database that is brought down and up every time the tests run. So it always starts clean (or with known seed data), and if a test fails the DB is in the same state to be manually inspected. While we create a lot of test data in a testing run, it's cleaned up before the next test.
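If you go that route, a minimal sketch of the per-test reset looks something like this (assuming you already have knex migrations and seed files; the module path and hooks are hypothetical, from a Mocha/Jest-style runner):

// test/setup.js
const knex = require('../db/knex'); // hypothetical module exporting the configured knex instance

beforeEach(async () => {
  await knex.migrate.rollback(); // undo the last migration batch
  await knex.migrate.latest();   // rebuild the schema
  await knex.seed.run();         // load the known seed data
});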

Related

How can I make a select function more performant in pyspark?

When I use the following function, it takes up to 10 seconds to execute. Is there any way to make it run quicker?
from pyspark.sql import functions as f

def select_top_20(df, col):
    most_data = df.groupBy(col).count().sort(f.desc("count"))
    top_20_count = most_data.limit(20).drop("count")
    top_20 = [row[col] for row in top_20_count.collect()]
    return top_20
Hard to answer in general; the code seems fine to me.
It depends on how the input DataFrame was created:
- If it was directly read from a data source (Parquet, a database, or similar), it is an I/O problem and there is not much you can do.
- If the DataFrame went through some processing before the function is executed, you might inspect that part. Lazy evaluation in Spark means that all of this processing is redone from scratch when you execute this function (not only the commands listed in the function): reading the data from disk, processing, everything. Persisting or caching the DataFrame somewhere in between might speed you up considerably (see the sketch below).
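A minimal sketch of that idea, assuming df is the DataFrame produced by the earlier transformations and "some_column" is a placeholder:

# Cache the prepared DataFrame so the upstream transformations are not recomputed
# by every action (count/collect) that runs inside select_top_20
df = df.cache()
df.count()  # force materialization once, so later actions reuse the cached data

top_values = select_top_20(df, "some_column")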

Do MySQL database transactions break Laravel PHPUnit tests using RefreshDatabase or DatabaseTransactions?

I'm belatedly writing some PHPUnit feature tests for a project. One of these hits a number of routes which modify the database - and until now, I've been using database transactions in some of these routes.
If I use either the RefreshDatabase or DatabaseTransactions trait in my test, it fails with a strange error:
Illuminate\Database\Eloquent\ModelNotFoundException: No query results for model [App\Models\ClassM]
If I remove the traits (so the data will persist), the test passes.
I went through and removed all my database transactions from the relevant routes, and the test now does pass with RefreshDatabase.
My suspicion is that the problem is that MySQL doesn't support nested transactions, and both traits use transactions. But that would mean that if I want to run tests with RefreshDatabase (by far the best option!), I can't use transactions at all in my code.
Is that correct? It seems a major limitation.
I've found a workaround - and it seems a little strange. I was mostly using the "manual" transaction methods - e.g.
DB::beginTransaction();
try {
    $user = User::create([...]); // etc.
    DB::commit();
} catch (...) {
    DB::rollBack();
}
I simply switched to the closure method:
DB::transaction(function () {
    $user = User::create([...]); // etc.
});
No idea why that would have fixed it!

AWS Lambda - MySQL caching

I have a Lambda that uses RDS. I wanted to improve it and use Lambda connection caching. I have found several articles and implemented it on my side, to the best of my knowledge. But now I am not sure this is the right way to go.
I have a Lambda (running Node 8) which has several files used with require. I will start from the main function and follow the exact path until I reach the MySQL initializer. Everything will be super simple, showing only the flow of the code that runs MySQL:
Main Lambda:
const jobLoader = require('./Helpers/JobLoader');

exports.handler = async (event, context) => {
    const emarsysPayload = event.Records[0];
    let validationSchema;
    const body = jobLoader.loadJob('JobName');
    ...
    return;
    ...//
Job Code:
const MySQLQueryBuilder = require('../Helpers/MySqlQueryBuilder');

exports.runJob = async (params) => {
    const data = await MySQLQueryBuilder.getBasicUserData(userId);
MySQLBuilder:
const mySqlConnector = require('../Storage/MySqlConnector');

class MySqlQueryBuilder {
    async getBasicUserData (id) {
        let query = `
            SELECT * from sometable WHERE id= ${id}
        `;
        return mySqlConnector.runQuery(query);
    }
}
And finally, the connector itself:
const mySqlConnector = require('promise-mysql');

const pool = mySqlConnector.createPool({
    host: process.env.MY_SQL_HOST,
    user: process.env.MY_SQL_USER,
    password: process.env.MY_SQL_PASSWORD,
    database: process.env.MY_SQL_DATABASE,
    port: 3306
});

exports.runQuery = async query => {
    const con = await pool.getConnection();
    const result = await con.query(query); // await the rows before releasing the connection
    con.release();
    return result;
};
I know that measuring performance will show the actual results, but today is Friday and I will not be able to run this on Lambda until late next week... And really, it would be an awesome start to the weekend knowing I am in the right direction... or not.
Thanks for the input.
The first thing would be to understand how require works in Node.js. I do recommend you go through this article if you're interested in knowing more about it.
Now, once you have required your connection, you have it for good and it won't be required again. This matches what you're looking for as you don't want to overwhelm your database by creating a new connection every time.
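In other words, anything created at module level, like the pool in your MySqlConnector, is created once per container and then reused by every invocation that lands on that same warm container. A minimal sketch of the pattern (paths follow your project layout):

// Required, and therefore executed, once per container - not once per invocation
const mySqlConnector = require('./Storage/MySqlConnector');

exports.handler = async (event) => {
    // Reuses the pool that MySqlConnector created when it was first required
    return mySqlConnector.runQuery('SELECT 1');
};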
But, there is a problem...
Lambda Cold Starts
Whenever you invoke a Lambda function for the first time, it will spin up a container with your function inside it and keep it alive for approximately 5 mins. It's very likely (although not guaranteed) that you will hit the same container every time as long as you are making 1 request at a time. But what happens if you have 2 requests at the same time? Then another container will be spun up in parallel with the previous, already warmed up container. You have just created another connection on your database and now you have 2 containers. Now, guess what happens if you have 3 concurrent requests? Yes! One more container, which equals one more DB connection.
As long as there are new requests to your Lambda functions, by default they will scale out to meet demand (you can configure it in the console to limit execution to as many concurrent executions as you want, respecting your account limits).
You cannot safely ensure a fixed number of connections to your database simply by requiring your code upon a function's invocation. The good thing is that this is not your fault. This is just how Lambda functions behave.
...one other approach is to cache the data you want in a real caching system, like ElastiCache, for example. You could then have one Lambda function triggered by a CloudWatch Event that runs at a certain frequency. This function would then query your DB and store the results in your external cache. This way you make sure your DB connection is only opened by one Lambda at a time, because it respects the CloudWatch Event, which runs only once per trigger.
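A rough sketch of that shape, assuming a Redis-backed ElastiCache cluster and the node-redis v4 client (the table, cache key, and environment variables are placeholders):

// Lambda triggered on a schedule by a CloudWatch Event; it refreshes the cache
const mysql = require('promise-mysql');
const { createClient } = require('redis');

exports.handler = async () => {
    const con = await mysql.createConnection({ host: process.env.MY_SQL_HOST /* ...other settings */ });
    const cache = createClient({ url: process.env.REDIS_URL });
    await cache.connect();

    const rows = await con.query('SELECT id, display_name FROM sometable');
    await cache.set('sometable:all', JSON.stringify(rows), { EX: 300 }); // expire after 5 minutes

    await cache.quit();
    await con.end();
};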
EDIT: after the OP sent a link in the comment section, I have decided to add a bit more info to clarify what the mentioned article wants to say.
From the article:
"Simple. You ARE able to store variables outside the scope of our
handler function. This means that you are able to create your DB
connection pool outside of the handler function, which can then be
shared with each future invocation of that function. This allows for
pooling to occur."
And this is exactly what you're doing. And this works! But the problem is if you have N connections (Lambda Requests) at the same time. If you don't set any limits, by default, up to 1000 Lambda functions can be spun up concurrently. Now, if you then make another 1000 requests simultaneously in the next 5 minutes, it's very likely you won't be opening any new connections, because they have already been opened on previous invocations and the containers are still alive.
Adding to the answer above by Thales Minussi but for a Python Lambda. I am using PyMySQL and to create a connection pool I added the connection code above the handler in a Lambda that fetches data. Once I did this, I was not getting any new data that was added to the DB after an instance of the Lambda was executed. I found bugs reported here and here that are related to this issue.
The solution that worked for me was to add a conn.commit() after the SELECT query execution in the Lambda.
According to the PyMySQL documentation, conn.commit() is supposed to commit any changes, but a SELECT does not make changes to the DB. So I am not sure exactly why this works.
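My guess is that the long-lived connection keeps an implicit transaction open, and under InnoDB's default REPEATABLE READ isolation that transaction keeps reading the same snapshot; committing ends the snapshot, so the next query sees fresh rows. Either way, a minimal sketch of the pattern (connection details and table/column names are placeholders):

import pymysql

# Created at module level, outside the handler, so warm containers reuse the connection
conn = pymysql.connect(
    host="...", user="...", password="...", database="...",
    cursorclass=pymysql.cursors.DictCursor,
)

def handler(event, context):
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM sometable WHERE id = %s", (event["id"],))
        rows = cur.fetchall()
    conn.commit()  # end the implicit transaction so later invocations see newly added rows
    return rows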

How do I get Rails 4.x streaming to work with MySQL when testing?

I created a new Rails 4.2.1 test project to try out the new streaming feature (the 'Live' one which I read about here). This project is set up to use MySQL for the database (I also tried SQLite but couldn't repro the issue with it). The project is simple, consisting only of: 1) a model Test with two attributes (both strings), 2) a simple route resources :tests, and 3) a simple controller tests_controller with one action, index. The model and controller were generated by the standard Rails generators, and only the controller was modified, as follows:
class TestsController < ApplicationController
  include ActionController::Live

  def index
    response.headers['Content-Type'] = 'application/json'
    response.stream.write('{"count": 5, "tests": [')
    Test.find_each do |test|
      response.stream.write(test.to_json)
      response.stream.write(',')
    end
    response.stream.write(']}')
    response.stream.close
  end
end
When I run rails s and test by hand, everything seems fine. But when I add a test (shown below) I get a strange error:
1) Error:
TestsControllerTest#test_index:
ActiveRecord::StatementInvalid: Mysql2::Error: This connection is in use by: #<Thread:0x007f862a4a7e48#/Users/xxx/.rvm/gems/ruby-2.2.2/gems/actionpack-4.2.1/lib/action_controller/metal/live.rb:269 sleep>: ROLLBACK
The test is:
require 'test_helper'

class TestsControllerTest < ActionController::TestCase
  test "index" do
    #request.headers['Accept'] = 'application/json'
    get :index
    assert_response :success
  end
end
Note that the error is intermittent, coming up only about half the time. Also, even though testing by hand doesn't cause any errors, I'm worried that errors will occur when multiple clients hit the API at the same time. Any suggestions as to what's going on here would be much appreciated.
Pretty old, but you need to actually check out a new database connection, since ActionController::Live executes the action in a new thread:
The final caveat is that your actions are executed in a separate thread than the main thread. Make sure your actions are thread safe, and this shouldn't be a problem (don't share state across threads, etc).
https://github.com/rails/rails/blob/861b70e92f4a1fc0e465ffcf2ee62680519c8f6f/actionpack/lib/action_controller/metal/live.rb
You can even use an around_filter/around_action for this.
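A hedged sketch of what that can look like in the streaming action, checking a connection out from the pool for the worker thread (everything else is the controller from the question):

def index
  response.headers['Content-Type'] = 'application/json'
  # Check out a connection for this thread instead of sharing the caller's connection
  ActiveRecord::Base.connection_pool.with_connection do
    response.stream.write('{"count": 5, "tests": [')
    Test.find_each do |test|
      response.stream.write(test.to_json)
      response.stream.write(',')
    end
    response.stream.write(']}')
  end
ensure
  response.stream.close
end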

SQL Alchemy + Testing webserver with InnoDB fails

I am currently trying to move my DB tables over to InnoDB from MyISAM. I am having timing issues with requests and cron jobs that are running on the server, which is leading to some errors. I am quite sure that transaction support will help me with the problem, so I am transitioning to InnoDB.
I have a suite of tests which make calls to our web service's REST API and receive XML responses. The test suite is fairly thorough; it's written in Python and uses SQLAlchemy to query information from the database. When I change the tables in the system from MyISAM to InnoDB, however, the tests start failing. The tests aren't failing because the system isn't working; they are failing because the ORM is not correctly querying the rows from the database I am testing on. When I step through the code I can see the correct results in the database, but the ORM is not returning them at all.
Basic flow is:
class UnitTest(unittest.TestCase):
    def setUp(self):
        # Create a test object in DB that gets affected by the web server
        testObject = Obj(foo='one')
        self.testId = testObject.id
        session.add(testObject)
        session.commit()

    def tearDown(self):
        # Clean up after the test
        testObject = session.query(Obj).get(self.testId)
        session.delete(testObject)
        session.commit()

    def test_web_server(self):
        # Ensure the initial state of the object.
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'one'

        # This will make a simple HTTP get call on an url that will modify the DB
        response = server.request.increment_foo(self.testId)

        # This one fails, the object still has a foo of 'one'
        # When I stop here in a debugger though, and look at the database,
        # The row in question actually has the correct value in the database.
        # ????
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'two'
With MyISAM tables storing the object, this test passes. However, when I change to InnoDB tables, this test does not pass. What is more interesting is that when I step through the code in the debugger, I can see that the database has what I expect, so it's not a problem in the web server code. I have tried nearly every combination of expire_all, autoflush, autocommit, etc., and still can't get this test to pass.
I can provide more info if necessary.
Thanks,
Conrad
The problem is that you put the line self.testId = testObject.id before the new object is added to the session, flushed, and given an ID by SQLAlchemy. Thus self.testId is always None. Move this line below session.commit().
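A minimal sketch of the corrected setUp, per the above:

def setUp(self):
    # Create a test object in DB that gets affected by the web server
    testObject = Obj(foo='one')
    session.add(testObject)
    session.commit()               # the flush/commit assigns the primary key
    self.testId = testObject.id    # now populated instead of None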