I'd like to connect to prestodb through the SQLAlchemy interface. I'm running prestodb==0.7.0 and SQLAlchemy==1.4.20, and SQLAlchemy doesn't seem to have prestodb baked in:
NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:presto
Not much luck registering prestodb manually either:
import os
from sqlalchemy.dialects import registry
from sqlalchemy.engine import create_engine
import prestodb
from prestodb.dbapi import Connection
registry.register('presto', 'prestodb.dbapi', 'Connection')
port = 8889
user = os.environ["USER"]
engine = create_engine(f'presto://{user}@presto:{port}/hive',
                       connect_args={'protocol': 'https', 'requests_kwargs': {'verify': False}})
db = engine.raw_connection()
# AttributeError: type object 'Connection' has no attribute 'get_dialect_cls'
Any ideas?
If you have a look at the Dialects docs you will see that Presto is an external dialect and needs to be installed separately. The Presto dialect is supported through PyHive and can be installed using pip install 'pyhive[presto]'.
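Once PyHive is installed, the presto:// URL scheme resolves without any manual registration. A minimal sketch, assuming the same coordinator host, port, and TLS settings as in the question:

import os
from sqlalchemy import create_engine, text

port = 8889
user = os.environ["USER"]
# PyHive's Presto dialect passes protocol and requests_kwargs through connect_args
engine = create_engine(f'presto://{user}@presto:{port}/hive',
                       connect_args={'protocol': 'https', 'requests_kwargs': {'verify': False}})
with engine.connect() as conn:
    print(conn.execute(text('SHOW SCHEMAS')).fetchall())

As an aside, registry.register expects a Dialect class rather than a DBAPI Connection, which is what the AttributeError in the question is complaining about; with PyHive installed, the equivalent manual registration would be registry.register('presto', 'pyhive.sqlalchemy_presto', 'PrestoDialect').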
Please help me. I want to use the webdriver module from selenium with chromedriver.exe on Windows.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

# driver = webdriver.Firefox()
driver = webdriver.Chrome(executable_path=r'D:\download\chromedriver_win32\chromedriver.exe')
After executing this simple code I get:
Message: unknown error: cannot find Chrome binary
I used this code, and the important part is to update Chrome to Version 87.0.4280.88 (Official Build) (32-bit), because after opening the browser I got the error "Getting Radio failed" and could not use driver.close().
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
options = webdriver.ChromeOptions()
options.add_argument("--remote-debugging-port=9222")
options.binary_location = "D:\\download\\GoogleChromePortable2\\GoogleChromePortable.exe"
chrome_driver_path = "D:\\download\\chromedriver_win32\\chromedriver87.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path, options=options)
driver.get('https://python.org')
time.sleep(13)
driver.close()
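Note that executable_path is deprecated in Selenium 4, so if you are on a newer selenium release the same setup goes through a Service object instead. A short sketch under that assumption, reusing the paths from above:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.binary_location = "D:\\download\\GoogleChromePortable2\\GoogleChromePortable.exe"
# Service wraps the chromedriver executable path in Selenium 4
driver = webdriver.Chrome(service=Service("D:\\download\\chromedriver_win32\\chromedriver87.exe"),
                          options=options)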
I am trying to import the Stanford NER tagger but am getting this error:
NLTK was unable to find the
C:/Users/.../stanford-ner-2018-10-16/classifiers/all.7class.distsim.crf.ser
file! Use software specific configuration parameters or set the
STANFORD_MODELS environment variable.
Below is the code:
from nltk.tag import StanfordNERTagger
import os
java_path = "C:/Program Files/Java/jre1.8.0_201/bin/java.exe"
os.environ['JAVAHOME'] = java_path  # you can find this Java path with 'echo %PATH%' in a terminal
st = StanfordNERTagger('C:/Users/.../stanford-ner-2018-10-16/classifiers/all.7class.distsim.crf.ser',
                       'C:/Users/.../stanford-ner-2018-10-16/stanford-ner.jar', encoding='utf-8')
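The error message itself suggests a route worth checking: the classifiers in the Stanford NER distribution ship as .ser.gz files, not .ser, and setting STANFORD_MODELS lets NLTK resolve a bare model filename on its own. A sketch of that route, assuming the standard stanford-ner-2018-10-16 layout and the usual 7-class model name (the directory below is a hypothetical placeholder; substitute your real path):

import os
from nltk.tag import StanfordNERTagger

ner_dir = "C:/Users/.../stanford-ner-2018-10-16"  # hypothetical placeholder path
os.environ['JAVAHOME'] = "C:/Program Files/Java/jre1.8.0_201/bin/java.exe"
os.environ['STANFORD_MODELS'] = ner_dir + "/classifiers"
# note the .ser.gz extension; NLTK resolves the bare filename via STANFORD_MODELS
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz',
                       ner_dir + '/stanford-ner.jar', encoding='utf-8')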
I am learning pyspark, and trying to connect to a mysql database.
But I am getting a java.lang.ClassNotFoundException: com.mysql.jdbc.Driver exception while running the code. I have spent a whole day trying to fix it; any help would be appreciated :)
I am using PyCharm Community Edition with Anaconda and Python 3.6.3.
Here is my code:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://192.168.0.11:3306/my_db_name",
    driver="com.mysql.jdbc.Driver",
    dbtable="billing",
    user="root",
    password="root").load()
Here is the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o27.load.
: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
This got asked 9 months ago at the time of writing, but since there's no answer, here it goes. I was in the same situation, searched Stack Overflow over and over, and tried different suggestions, but the answer is absurdly simple in the end: you just have to COPY the MySQL driver into the "jars" folder of Spark!
You can download it here: https://dev.mysql.com/downloads/connector/j/5.1.html
I'm using the 5.1 version even though 8.0 exists, because I had some problems running the latest version with Spark 2.3.2 (and also other problems running Spark 2.4 on Windows 10).
Once downloaded, you can just copy it into your Spark folder:
E:\spark232_hadoop27\jars\ (use your own drive:\folder_name -- this is just an example)
You should have two files:
E:\spark232_hadoop27\jars\mysql-connector-java-5.1.47-bin.jar
E:\spark232_hadoop27\jars\mysql-connector-java-5.1.47.jar
After that, the following code launched through PyCharm or a Jupyter notebook should work (as long as you have a MySQL database set up, that is):
import findspark
findspark.init()
import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
dataframe_mysql = spark.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/uoc2",
    driver="com.mysql.jdbc.Driver",
    dbtable="company",
    user="root",
    password="password").load()
dataframe_mysql.show()
Bear in mind that I'm currently working locally with my Spark setup, so no real clusters are involved, and also no "production" kind of code that gets submitted to such a cluster. For something more elaborate this answer could help: MySQL read with PySpark.
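If copying files around is not an option, Spark can also fetch the connector from Maven at session startup through the spark.jars.packages setting. A minimal sketch, assuming the 5.1.47 coordinates from above and internet access on the driver:

from pyspark.sql import SparkSession

# Spark downloads mysql-connector-java from Maven Central before the session starts
spark = (SparkSession.builder
         .config("spark.jars.packages", "mysql:mysql-connector-java:5.1.47")
         .getOrCreate())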
On my computer, @Kondado's solution works only if I change the driver in the options:
driver = 'com.mysql.cj.jdbc.Driver'
I am using Spark 2.4.0 on Windows. I downloaded mysql-connector-java-8.0.15.jar, the Platform Independent version, from here, and copied it to 'C:\spark-2.4.0-bin-hadoop2.7\jars\'.
My code in Pycharm looks like this:
#import findspark # not necessary
#findspark.init() # not necessary
from pyspark import SparkConf, SparkContext, sql
from pyspark.sql import SparkSession
sc = SparkSession.builder.getOrCreate()
sqlContext = sql.SQLContext(sc)
source_df = sqlContext.read.format('jdbc').options(
    url='jdbc:mysql://localhost:3306/database1',
    driver='com.mysql.cj.jdbc.Driver',  # instead of com.mysql.jdbc.Driver
    dbtable='table1',
    user='root',
    password='****').load()
print(source_df)
source_df.show()
I don't know how to add the jar file to the classpath (can someone tell me how?), so I put it in the SparkSession config and it works fine.
spark = SparkSession \
    .builder \
    .appName('test') \
    .master('local[*]') \
    .enableHiveSupport() \
    .config("spark.driver.extraClassPath", "<path to mysql-connector-java-5.1.49-bin.jar>") \
    .getOrCreate()
df = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost/<database_name>") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", <table_name>) \
    .option("user", <user>) \
    .option("password", <password>) \
    .load()
df.show()
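To answer the parenthetical question: besides extraClassPath, the jar can be handed to Spark with the spark.jars setting (or the equivalent --jars flag of spark-submit). A sketch, assuming a local path to the connector jar:

from pyspark.sql import SparkSession

# spark.jars ships the listed jars to the driver and executors
spark = SparkSession.builder \
    .config("spark.jars", "/path/to/mysql-connector-java-5.1.49-bin.jar") \
    .getOrCreate()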
This worked for me: pyspark with MSSQL.
Java version is 1.7.0_191
PySpark version is 2.1.2
Download the below jar files
sqljdbc41.jar
mssql-jdbc-6.2.2.jre7.jar
Paste the above jars inside the jars folder of the virtual environment:
test_env/lib/python3.6/site-packages/pyspark/jars
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Practise').getOrCreate()
url = 'jdbc:sqlserver://your_host_name:your_port;databaseName=YOUR_DATABASE_NAME;useNTLMV2=true;'
df = spark.read.format('jdbc') \
    .option('url', url) \
    .option('user', 'your_db_username') \
    .option('password', 'your_db_password') \
    .option('dbtable', 'YOUR_TABLE_NAME') \
    .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
    .load()
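Once the session is up, a quick sanity check confirms the load:

df.printSchema()  # column names and types pulled from the SQL Server table
df.show(5)        # first five rows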
I installed web2py as source and wanted to use DAL without the rest of the framework.
But DAL does not connect to MySQL:
>>> DAL('mysql://user1:user1@localhost/test_rma')
...
RuntimeError: Failure to connect, tried 5 times:
'NoneType' object has no attribute 'connect'
Whereas MySQLdb can connect to the database with the same credentials:
>>> import MySQLdb
>>> db = MySQLdb.connect(host='localhost', user='user1', passwd='user1', db='test_rma')
A similar problem with MSSQL was solved by explicitly setting the driver object, so I tried the same solution:
>>> from gluon.dal import MySQLAdapter
>>> print MySQLAdapter.driver
None
>>> driver = globals().get('MySQLdb',None)
>>> print MySQLAdapter.driver
None
But still the driver is None.
OK, I found the solution to the problem. I had to write:
MySQLAdapter.driver = globals().get('MySQLdb',None)
instead of
driver = globals().get('MySQLdb',None)
I misread that line in the original question.
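Put together, a working session looks like this (a sketch, assuming MySQLdb is importable in the namespace where you call DAL, as in the original question):

>>> import MySQLdb
>>> from gluon.dal import DAL, MySQLAdapter
>>> MySQLAdapter.driver = globals().get('MySQLdb', None)
>>> db = DAL('mysql://user1:user1@localhost/test_rma')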