Exception when using a bootstrap-storage-plugins.json file for a storage plugin in Apache Drill

I want to add a storage plugin for MongoDB in Apache Drill. After reading the docs, I learned that I can do this programmatically in two ways:
1. the REST API
2. a bootstrap-storage-plugins.json configuration file
I am using the second approach in my Java code.
The relevant portion of my code:
Connection conn = new Driver().connect("jdbc:drill:zk=local",null);
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("show databases");
while (rs.next())
{
String SCHEMA_NAME = rs.getString("SCHEMA_NAME");
System.out.println(SCHEMA_NAME);
}
bootstrap-storage-plugins.json:
{
"type": "mongo",
"connection": "mongodb://localhost:27017/",
"enabled": true
}
But when I run
"select * from mongo.testDB.`testCollection`";
I get the following exception:
org.apache.calcite.sql.validate.SqlValidatorException
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'mongo.testDB.testCollection' not found
Aug 12, 2015 3:47:05 AM org.apache.calcite.runtime.CalciteException
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 19: Table 'mongo.testDB.testCollection' not found
java.sql.SQLException: PARSE ERROR: From line 1, column 15 to line 1, column 19: Table 'mongo.testDB.testCollection' not found
bootstrap-storage-plugins.json is on my classpath. Do I need to provide any additional information?
Edit:
I tried the SHOW DATABASES query, and it does not list any schemas from MongoDB. It only shows:
INFORMATION_SCHEMA
cp.default
dfs.default
dfs.root
dfs.tmp
sys

Your query looks like a query on the file system. Using the mongo storage plugin configuration, there's no workspace or file, so try making your query look like this:
SELECT * FROM testCollection;
Make sure you're using the right database name and that your collection is listed (SHOW DATABASES and SHOW TABLES).
This published correction of the Drill doc might help.
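The bootstrap file in the question is a bare plugin object with no plugin name. Drill's bundled bootstrap files appear to wrap each configuration in a top-level "storage" map keyed by the plugin name, and that name is what becomes the schema prefix (mongo in mongo.testDB). Drill also seems to read the bootstrap file only when no storage configurations have been persisted yet, so a configuration saved by an earlier run may take precedence. Both points are assumptions worth checking against the Drill documentation. A minimal Python sketch that writes the question's MongoDB settings in that assumed layout (bootstrap_plugins is a hypothetical helper):

```python
import json


def bootstrap_plugins(name: str, config: dict) -> str:
    """Hypothetical helper: nest one plugin config under "storage", keyed by plugin name."""
    return json.dumps({"storage": {name: config}}, indent=2)


# The settings from the question, unchanged.
mongo_config = {
    "type": "mongo",
    "connection": "mongodb://localhost:27017/",
    "enabled": True,
}

# Assumption: "mongo" becomes the schema prefix used in queries such as mongo.testDB.testCollection.
text = bootstrap_plugins("mongo", mongo_config)
print(text)
```

Saving the printed JSON as bootstrap-storage-plugins.json on the classpath is the assumed next step; SHOW DATABASES should then list the mongo schemas if the plugin loaded.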

Related

Why won't my extracted data from Spotify's API store in MySQL database?

I have connected to Spotify's API in Python to extract the top twenty tracks of a searched artist. I am trying to store the data in a MySQL database named 'spotify_api' (managed with MySQL Workbench), in a table I created called 'spotify'. Before I added the code that connects to MySQL, my code worked correctly and extracted the list of tracks, but now I can't get it to store the data in my database. Below is the code I have written to both extract the data and store it in my database:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import mysql.connector
mydb = mysql.connector.connect(
host = "localhost",
user = "root",
password = "(removed for question)",
database = "spotify_api"
)
mycursor = mydb.cursor()
sql = 'DROP TABLE IF EXISTS spotify_api.spotify;'
mycursor.execute(sql)
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id="(removed for question)",
                                                           client_secret="(removed for question)"))
results = sp.search(q='sza', limit=20)
for idx, track in enumerate(results['tracks']['items']):
    print(idx, track['name'])
    sql = "INSERT INTO spotify_api.spotify (tracks, items) VALUES (" + \
        str(idx) + ", '" + track['name'] + "');"
    mycursor.execute(sql)
mydb.commit()
print(mycursor.rowcount, "record inserted.")
mycursor.execute("SELECT * FROM spotify_api.spotify;")
myresult = mycursor.fetchall()
for x in myresult:
print(x)
mycursor.close()
Every time I run my code in the VS Code terminal, I receive an error stating that my table doesn't exist. This is what it states:
"mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'spotify_api.spotify' doesn't exist"
I'm not sure what I need to fix in my code or in my database to eliminate this error and get my data stored in my table. I created two columns in the table, 'tracks' and 'items', but I'm not sure whether the issue lies in my database or in my code.
Well, it seems pretty clear. You ran
DROP TABLE IF EXISTS spotify_api.spotify;
...
INSERT INTO spotify_api.spotify (tracks, items) VALUES ...
We won't even raise the spectre of a track title containing an apostrophe, or of little Bobby Tables, here, even though building the INSERT by string concatenation invites both.
You DROPped the table, then tried to INSERT into it.
That won't work.
You'll need to CREATE TABLE before the INSERT.
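A minimal sketch of the fix, using Python's built-in sqlite3 module in place of mysql.connector so it runs without a server: recreate the table right after the DROP, then insert with placeholders instead of string concatenation. The track names are made-up sample data. With mysql.connector the placeholder is %s rather than ?, and the CREATE TABLE would name the schema (spotify_api.spotify) and use MySQL column types such as INT and VARCHAR.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL 'spotify_api' schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Drop the old table, then recreate it before any INSERT runs.
cur.execute("DROP TABLE IF EXISTS spotify")
cur.execute("CREATE TABLE spotify (tracks INTEGER, items TEXT)")

# Hypothetical titles standing in for the Spotify search results;
# the second one contains an apostrophe.
track_names = ["Track A", "Little Ol' Track"]

for idx, name in enumerate(track_names):
    # Placeholders let the driver escape values, so the apostrophe is harmless.
    cur.execute("INSERT INTO spotify (tracks, items) VALUES (?, ?)", (idx, name))
conn.commit()

rows = cur.execute("SELECT * FROM spotify ORDER BY tracks").fetchall()
for row in rows:
    print(row)
conn.close()
```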

MySQL binding multiple parameters to a single query

I have a MySQL database, and I'm connecting to it from a .NET app using Dapper. I have the following code:
await connection.ExecuteAsync(
"DELETE FROM my_data_table WHERE somedata IN (@data)",
new { data = datalist.Select(a => a.dataitem1).ToArray() },
trans);
When I do this with more than a single value, I get the following error:
MySqlConnector.MySqlException: 'Operand should contain 1 column(s)'
Is what I'm trying to do possible in MySQL / Dapper, or do I have to issue one query per row I want to delete?
Your original code was almost fine. You just need to remove the parentheses around the parameter. Dapper will insert those for you:
await connection.ExecuteAsync(
"DELETE FROM my_data_table WHERE somedata IN @data",
new { data = datalist.Select(a => a.dataitem1).ToArray() },
trans);
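Conceptually, Dapper's list support rewrites an IN clause whose parameter is an array into a parenthesized list of individual parameters, one per element. With an extra pair of parentheses in the SQL, MySQL most likely sees a multi-column row value inside IN (...), which would explain the "Operand should contain 1 column(s)" error. The Python sketch below imitates that rewrite with sqlite3 and ? placeholders; expand_in is a hypothetical helper, not Dapper's actual code.

```python
import re
import sqlite3
from typing import List, Sequence, Tuple


def expand_in(sql: str, name: str, values: Sequence) -> Tuple[str, List]:
    """Rewrite 'IN @name' as 'IN (?, ?, ...)' with one placeholder per value."""
    if not values:
        # Choice made for this sketch: refuse empty lists rather than emit 'IN ()'.
        raise ValueError("IN list must not be empty")
    placeholders = "(" + ", ".join("?" for _ in values) + ")"
    # \b stops '@data' from also matching a longer name such as '@data2'.
    expanded = re.sub(rf"IN @{re.escape(name)}\b", "IN " + placeholders, sql)
    return expanded, list(values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data_table (somedata TEXT)")
conn.executemany("INSERT INTO my_data_table VALUES (?)", [("a",), ("b",), ("c",), ("d",)])

sql, params = expand_in("DELETE FROM my_data_table WHERE somedata IN @data", "data", ["a", "c"])
print(sql)  # DELETE FROM my_data_table WHERE somedata IN (?, ?)
conn.execute(sql, params)

remaining = [row[0] for row in conn.execute("SELECT somedata FROM my_data_table ORDER BY somedata")]
print(remaining)  # ['b', 'd']
```

A single statement deletes every listed row, so there is no need for one query per row.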

MySQL query in a finite loop with SQLAlchemy loses the connection

I'm building a finance web app with Flask, SQLAlchemy and MySQL.
I query a set of trading data for each stock in a list, but the connection is lost after a few iterations (randomly, somewhere in the first 5 to 10).
for stock in stocks:  # about 2000~3000 items
    engine = create_engine(os.getenv('MYSQL_DATABASE_URI'))
    sql = 'select * from ...'
    df = pd.read_sql_query(sql, engine)
    return df
The error message is as follows:
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Any direction would be greatly appreciated!
I would suggest reusing a single engine object. Create the engine once, before the loop:
engine = create_engine(os.getenv('MYSQL_DATABASE_URI'))
for stock in stocks:  # about 2000~3000 items
    sql = 'select * from ...'
    df = pd.read_sql_query(sql, engine)
    return df
That said, this still doesn't make much sense: the return statement is inside the loop, so the function returns after the first iteration and the loop never finishes.
Maybe make a list and return that instead?
engine = create_engine(os.getenv('MYSQL_DATABASE_URI'))
values = []
for stock in stocks:  # about 2000~3000 items
    sql = 'select * from ...'
    values.append(pd.read_sql_query(sql, engine))
return values
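The same create-once, query-in-the-loop, return-after-the-loop pattern, sketched with Python's built-in sqlite3 module standing in for SQLAlchemy and MySQL so it runs without a server. The trades table and its columns are hypothetical placeholders for the elided query; binding the stock symbol as a parameter also keeps the SQL text identical across iterations. With pandas, the equivalent would be passing the one engine to pd.read_sql_query on each iteration and, if a single frame is wanted, combining the results with pd.concat.

```python
import sqlite3
from typing import Dict, List, Tuple


def fetch_per_stock(conn: sqlite3.Connection, stocks: List[str]) -> Dict[str, List[Tuple]]:
    """Query each stock over one shared connection; return only after the loop ends."""
    results: Dict[str, List[Tuple]] = {}
    for stock in stocks:
        results[stock] = conn.execute(
            "SELECT day, close FROM trades WHERE symbol = ? ORDER BY day", (stock,)
        ).fetchall()
    return results


# One connection, created once, outside the loop (hypothetical 'trades' schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, day INTEGER, close REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAA", 1, 10.0), ("AAA", 2, 10.5), ("BBB", 1, 20.0)],
)

data = fetch_per_stock(conn, ["AAA", "BBB"])
print(data)
conn.close()
```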

Slick Prepared Statement

I'm using Slick 3.0.0-M1 with "com.zaxxer" % "HikariCP" % "2.4.3".
Slick prepares a statement for every query (as the logging shows), which is bad:
"Preparing statement: select * from ..."
My configuration tells Slick / Hikari to cache prepared statements:
myDB {
url = "jdbc:mysql://...
user = ...
...
connectionPool = HikariCP
queueSize = 50000
maxConnections = 50
properties.cachePrepStmts = true
properties.prepStmtCacheSize = 20000
properties.prepStmtCacheSqlLimit = 100000
}
The logs seem to indicate these properties are read:
configuration:
...
dataSourceName..................
dataSourceClassName.............
dataSourceProperties............
{password=<masked>,
prepStmtCacheSqlLimit=100000,
cachePrepStmts=true,
prepStmtCacheSize=20000}
maximumPoolSize.................50
poolName..........................
The db object is instantiated and used in a test:
val db = Database.forConfig("", config.getConfig("myDB"))
val qTemplate = StaticQuery[(Int), MyRow] + "select * from table_name where num=?"
db.withSession{ implicit session =>
(0 until 100).foreach{ case i =>
qTemplate(2).foreach(println)
}
}
For every call to qTemplate(2), Slick logs 'Preparing Statement...'. Why is the template not cached?
First, I suggest you use a stable release of Slick, such as 3.1.1, if you want the latest: http://slick.typesafe.com/
Normally, the connection pool does connection caching and the database does statement caching. Using a PreparedStatement with placeholders like ? should be sufficient for MySQL to cache the statement. The database must parse each statement, and the SQL text must be identical each time for the cached statement to be reused. The point of the feature is to avoid parsing the same statement over and over when only the parameter values change.
The HikariCP documentation on statement caching explains this further: https://github.com/brettwooldridge/HikariCP. There is more MySQL-specific information here: https://github.com/brettwooldridge/HikariCP/wiki/MySQL-Configuration
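To illustrate why the statement text must look the same each time, here is a toy Python model of a prepared-statement cache keyed by the exact SQL text, with a size limit and a maximum SQL length loosely mirroring the prepStmtCacheSize and prepStmtCacheSqlLimit settings in the question. It is a sketch under those assumptions, not how MySQL Connector/J or the MySQL server actually implements caching; StatementCache and its methods are hypothetical.

```python
from collections import OrderedDict


class StatementCache:
    """Toy model of a prepared-statement cache keyed by the exact SQL text."""

    def __init__(self, max_size: int, max_sql_length: int) -> None:
        self.max_size = max_size              # loosely like prepStmtCacheSize
        self.max_sql_length = max_sql_length  # loosely like prepStmtCacheSqlLimit
        self._cache = OrderedDict()
        self.prepare_count = 0                # how many times a statement was "parsed"

    def _prepare(self, sql: str) -> tuple:
        self.prepare_count += 1
        return ("prepared", sql)  # stand-in for a real parsed-statement handle

    def get(self, sql: str) -> tuple:
        if len(sql) > self.max_sql_length:
            return self._prepare(sql)          # too long to cache: prepare every time
        if sql in self._cache:
            self._cache.move_to_end(sql)       # cache hit: mark as most recently used
            return self._cache[sql]
        stmt = self._prepare(sql)              # cache miss: prepare and remember
        self._cache[sql] = stmt
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)    # evict the least recently used entry
        return stmt


cache = StatementCache(max_size=2, max_sql_length=100)

# Same text every time, with the value bound separately: prepared once.
template = "select * from table_name where num=?"
for _ in range(100):
    cache.get(template)
print(cache.prepare_count)  # 1

# Values inlined into the text: each variant is a different cache key.
for i in range(3):
    cache.get(f"select * from table_name where num={i}")
print(cache.prepare_count)  # 4
```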
I would just upgrade Slick and continue doing what you are doing.
Eric

Insert query failing when using a parameter in the associated select statement in SQL Server CE

INSERT INTO voucher (voucher_no, account, party_name, rece_amt, particulars, voucher_date, voucher_type, cuid, cdt)
SELECT voucher_rec_no, @account, @party_name, @rece_amt, @particulars, @voucher_date, @voucher_type, @cuid, @cdt
FROM auto_number
WHERE (auto_no = 1)
Error:
A parameter is not allowed in this location. Ensure that the '@' sign is in a valid location or that parameters are valid at all in this SQL statement.
I've just stumbled upon this while trying to fix the same issue. I know it's late, but assuming you get this error when executing the query via .NET, make sure you set SqlCeParameter.DbType; if it is not specified, you get the exception listed above.
Example (assuming cmd is a SqlCeCommand; SqlCeCommand and SqlCeParameter live in the System.Data.SqlServerCe namespace, while DbType and ParameterDirection come from System.Data):
SqlCeParameter param = new SqlCeParameter();
param.ParameterName = "@SomeParameterName";
param.Direction = ParameterDirection.Input;
param.DbType = DbType.String; // this is the important bit to avoid the exception
param.Value = kvp.Value;
cmd.Parameters.Add(param);
Obviously, you'd want to set the DB type to match the type of your parameter.