PyDeequ Error 'VerificationRunBuilder' object has no attribute 'verificationRun' - pydeequ

Using:
pip install pydeequ==1.0.1
deequ-2.0.0-spark-3.1.jar and
deequ-1.0.7_scala-2.12_spark-3.0.0.jar (tried both)
Spark Version 3.1
Python 3.8
Spark config:
spark = (SparkSession
    .builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())
This is the code:
import pydeequ
from pydeequ.repository import *
from pydeequ.analyzers import *
from pydeequ.checks import *
from pydeequ.verification import *
## Initialize the Deequ Validation API
checkResult = VerificationSuite(spark).onData(df)
checkResult.addCheck(Check(spark, CheckLevel.Warning, "DQ Validation").isComplete(column_to_validate))
checkResult.addCheck(Check(spark, CheckLevel.Warning, "DQ Validation").hasDataType(column_to_validate, ConstrainableDataTypes.Integral))
.......
checkResult.run()
checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()
This code is similar to: https://pydeequ.readthedocs.io/en/latest/README.html#constraint-verification
But this gives the following error. Why?
~/cluster-env/clonedenv/lib/python3.8/site-packages/pydeequ/verification.py in checkResultsAsDataFrame(cls, spark_session, verificationResult, forChecks, pandas)
135
136 df = spark_session._jvm.com.amazon.deequ.VerificationResult.checkResultsAsDataFrame(
--> 137 spark_session._jsparkSession, verificationResult.verificationRun, forChecks
138 )
139 sql_ctx = SQLContext(
AttributeError: 'VerificationRunBuilder' object has no attribute 'verificationRun'
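A likely cause, judging from the traceback: checkResult still holds the VerificationRunBuilder returned by onData()/addCheck(), while checkResultsAsDataFrame expects the VerificationResult returned by run(). As a minimal sketch of the pattern in the PyDeequ README (assuming the checks are combined on a single Check object):
check = (Check(spark, CheckLevel.Warning, "DQ Validation")
    .isComplete(column_to_validate)
    .hasDataType(column_to_validate, ConstrainableDataTypes.Integral))

# run() returns the VerificationResult that checkResultsAsDataFrame expects
checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(check)
    .run())

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()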

Related

geomesa - unable to initialise spark sql session using geomesa pyspark

I am trying to install geomesa for pyspark, and I get an error while initialising.
Command: geomesa_pyspark.init_sql(spark)
~/opt/anaconda3/envs/geomesa-pyspark/lib/python3.7/site-packages/geomesa_pyspark/__init__.py in init_sql(spark)
113
114 def init_sql(spark):
--> 115 spark._jvm.org.apache.spark.sql.SQLTypes.init(spark._jwrapped)
TypeError: 'JavaPackage' object is not callable
I used the following to install:
pyspark == 2.4.8
geomesa_pyspark using https://repo.eclipse.org/content/repositories/geomesa-releases/org/locationtech/geomesa/
geomesa_pyspark-2.4.0.tar.gz
geomesa-accumulo-spark-runtime_2.11-2.4.0.jar
python 3.7
import geomesa_pyspark
conf = geomesa_pyspark.configure(
    jars=['./jars/geomesa-accumulo-spark-runtime_2.11-2.4.0.jar', './jars/postgresql-42.3.1.jar', './jars/geomesa-spark-sql_2.11-2.4.0.jar'],
    packages=['geomesa_pyspark', 'pytz'],
    spark_home='/Users/user/opt/anaconda3/envs/geomesa-pyspark/lib/python3.7/site-packages/pyspark'
).setAppName('MyTestApp')
spark = (SparkSession
    .builder
    .config(conf=conf)
    .config('spark.driver.memory', '15g')
    .config('spark.executor.memory', '15g')
    .config('spark.default.parallelism', '10')
    .config('spark.sql.shuffle.partitions', '10')
    .master("local")
    .getOrCreate()
)
I replaced
jars=['./jars/geomesa-accumulo-spark-runtime_2.11-2.4.0.jar', './jars/postgresql-42.3.1.jar', './jars/geomesa-spark-sql_2.11-2.4.0.jar'],
with
jars=['./jars/geomesa-accumulo-spark-runtime_2.11-2.4.0.jar'],
and for PostgreSQL I passed .option("driver", "org.postgresql.Driver") while loading data through pyspark, which fixed the issue.
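For reference, a minimal sketch of what passing the driver option on a JDBC read can look like; the connection URL, credentials, and table name below are placeholders, not from the original post:
# Load a PostgreSQL table through pyspark with an explicit JDBC driver class
df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")   # placeholder connection URL
    .option("dbtable", "my_table")                            # placeholder table name
    .option("user", "user")
    .option("password", "password")
    .option("driver", "org.postgresql.Driver")                # the option that fixed the issue
    .load())
df.show()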

AttributeError: module 'carla' has no attribute 'Client'

I've been trying to play around with the Carla self-driving car environment but I run into "AttributeError: module 'carla' has no attribute 'Client'" when I try running the code from this tutorial: https://pythonprogramming.net/control-camera-sensor-self-driving-autonomous-cars-carla-python/.
I have made a few changes to the code, including changing the .egg file to its exact file path within my computer.
This is my code:
import glob
import os
import sys
try:
    sys.path.append(glob.glob('C:\Downloads\CARLA_0.9.9.4\WindowsNoEditor\PythonAPI\carla\dist\carla-0.9.9-py3.7-win-amd64.egg'))
except IndexError:
    pass
import carla
actor_list = []
#try:
client = carla.Client("localhost", 2000)
client.set_timeout(2.0)
world = client.get_world()
blueprint_library = world.get_blueprint_library()
#finally:
for actor in actor_list:
    actor.destroy()
print("All cleaned up!")
Just for reference, I'm running on Windows 10 with Anaconda3 and Python 3.7.7, and I'm using Carla version 0.9.9.4. Thanks in advance!
Just correct your folder path. You would need to rename the path in your file structure like this (remove all "." from the folder and file names):
path = glob.glob('C:\Downloads\CARLA_0994\WindowsNoEditor\PythonAPI\carla\dist\carla-099-py37-win-amd64.egg')[0]
sys.path.append(path)
Full Example:
import glob
import os
import sys
try:
    path = glob.glob('C:\Downloads\CARLA_0994\WindowsNoEditor\PythonAPI\carla\dist\carla-099-py37-win-amd64.egg')[0]
    sys.path.append(path)
except IndexError:
    pass
import carla
actor_list = []
try:
    # Connect to the simulator running on localhost:2000
    client = carla.Client("localhost", 2000)
    client.set_timeout(5.0)
    world = client.get_world()
    blueprint_library = world.get_blueprint_library()
    print("Map = ", world.get_map())
finally:
    # Destroy any actors that were spawned
    for actor in actor_list:
        actor.destroy()
    print("All cleaned up!")

Hazm: POSTagger(): ArgumentError: argument 2: <class 'TypeError'>: wrong type

I get an error when running the code below. Could you give me some help?
from __future__ import unicode_literals
from hazm import *
tagger = POSTagger(model='resources/postagger.model')
tagger.tag(word_tokenize('ما بسیار کتاب موانیم'))
Error:
---------------------------------------------------------------------------
ArgumentError Traceback (most recent call last)
<ipython-input-16-1d74d781e0c1> in <module>
1 tagger = POSTagger(model='resources/postagger.model')
----> 2 tagger = POSTagger()
3 tagger.tag(word_tokenize('ما بسیار کتاب موانیم'))
~/.local/lib/python3.6/site-packages/hazm/SequenceTagger.py in __init__(self, patterns, **options)
21 def __init__(self, patterns=[], **options):
22 from wapiti import Model
---> 23 self.model = Model(patterns='\n'.join(patterns), **options)
24
25 def train(self, sentences):
~/.local/lib/python3.6/site-packages/wapiti/api.py in __init__(self, patterns, encoding, **options)
283 self._model = _wapiti.api_new_model(
284 ctypes.pointer(self.options),
--> 285 self.patterns
286 )
287
ArgumentError: argument 2: <class 'TypeError'>: wrong type
I am using Ubuntu 18.04 on Windows 10. Also, I put the mentioned files in a resources folder beside the code.
Python 3.6.9
hazm package
I have no problem running the Chunker from this package:
chunker = Chunker(model='resources/chunker.model')
tagged = tagger.tag(word_tokenize('واقعا ک بعضیا چقد بی درکن و ادعا دارن فقط بنده خدا لابد دسترسی نداره ب دکتری چیزی نگران شد'))
tree2brackets(chunker.parse(tagged))
It's because of the wapiti package! wapiti does not support Python 3 and only works with Python 2. If you need a POS tagger, you should use another POS tagger package!

converting h5 file to csv using h5py

How can I convert an h5 file to its corresponding csv file? These h5 files are the output of DeepLabCut.
import h5py
import pandas as pd
path_to_file = '/scratch3/3d_pose/animalpose/experiments/moth-filtered-Mona-2019-12-06_10k_iters/videos/mothDLC_resnet50_moth-filteredDec6shuffle1_10000.h5'
f = h5py.File(path_to_file, 'r')
print(f.keys())
I get:
scratch/sjn-p3/anaconda/anaconda3/bin/python /scratch3/pycharm-2019.3/plugins/python/helpers/pydev/pydevconsole.py --mode=client --port=42112
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/scratch3/3d_pose/animalpose/moth_original'])
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 6.2.1
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
[GCC 7.3.0] on linux
runfile('/scratch3/3d_pose/animalpose/moth_original/converth5tocsv.py', wdir='/scratch3/3d_pose/animalpose/moth_original')
/scratch3/pycharm-2019.3/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py:21: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
module = self._system_import(name, *args, **kwargs)
KeysView(<HDF5 file "mothDLC_resnet50_moth-filteredDec6shuffle1_10000.h5" (mode r)>)
So I am not sure how to proceed when I don't get the keys listed.
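One possible approach, as a minimal sketch: DeepLabCut output .h5 files are typically pandas DataFrames stored in HDF format, so they can be read with pandas (via pytables) and written out as csv. The output filename below is only an illustration.
import pandas as pd

path_to_file = '/scratch3/3d_pose/animalpose/experiments/moth-filtered-Mona-2019-12-06_10k_iters/videos/mothDLC_resnet50_moth-filteredDec6shuffle1_10000.h5'
# read_hdf loads the DataFrame stored in the file (works without a key if the file holds a single object)
df = pd.read_hdf(path_to_file)
df.to_csv(path_to_file.replace('.h5', '.csv'))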

how to upload and read a zip file containing training and testing images data from google colab from my pc

I am new to Google Colab. I am implementing pretrained vgg16 and resnet50 models using pytorch, but I am unable to load my file and read it, as it returns an error that no directory was found.
I uploaded the data through the file browser, and I also tried uploading it using
from google.colab import files
uploaded = files.upload()
The file got uploaded, but when I tried to unzip it (it is a zip file) using
!unzip content/cropped_months
it says
no file found
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.transforms import *
from torch.optim import lr_scheduler
from torch.autograd import Variable
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
from google.colab import files
uploaded = files.upload()
!unzip content/cropped_months
data_dir = 'content/cropped_months'

#Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),transforms.RandomResizedCrop(224),transforms.RandomHorizontalFlip(),transforms.ToTensor(),transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(256),transforms.CenterCrop(224),transforms.ToTensor(),transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])])

#pass transform here-in
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

#data loaders
trainloader = torch.utils.data.DataLoader(train_data, batch_size=8, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=8, shuffle=True)
print("Classes: ")
class_names = train_data.classes
print(class_names)
first error
unzip: cannot find or open content/cropped_months,
content/cropped_months.zip or content/cropped_months.ZIP.
second error
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
in ()
16
17 #pass transform here-in
---> 18 train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
19 test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)
20
2 frames
/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py
in _find_classes(self, dir)
114 if sys.version_info >= (3, 5):
115 # Faster and available in Python 3.5 and above
--> 116 classes = [d.name for d in os.scandir(dir) if d.is_dir()]
117 else:
118 classes = [d for d in os.listdir(dir) if os.path.isdir(os.path.join(dir, d))]
FileNotFoundError: [Errno 2] No such file or directory:
'content/cropped_months (1)/train'
You are probably trying to access the wrong path. In my notebook, the file was uploaded to the working directory.
Use google.colab.files to upload the zip.
from google.colab import files
files.upload()
Upload your file. Google Colab will display where it was saved:
Saving dummy.zip to dummy.zip
Then just run !unzip:
!unzip dummy.zip
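If you prefer to extract into a specific directory and check the result in the same cell, here is a minimal sketch using the filename Colab reports; the target directory and the train/test layout inside the zip are assumptions:
from google.colab import files
import zipfile, os

uploaded = files.upload()                      # choose cropped_months.zip in the dialog
zip_name = next(iter(uploaded))                # the exact name Colab saved the upload under
with zipfile.ZipFile(zip_name) as zf:
    zf.extractall('/content/cropped_months')   # assumed target directory
print(os.listdir('/content/cropped_months'))   # confirm train/ and test/ are present
data_dir = '/content/cropped_months'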
I think you can use the PySurvival library, which is compatible with Torch; here is the link:
https://square.github.io/pysurvival/miscellaneous/save_load.html