TextBlob OCR throws a 404 error when trying to translate to another language

I have a simple five-line program to translate text to English via TextBlob (after OCR). But for some reason, it throws a 404 error:
from textblob import TextBlob
text = u"おはようございます。"
tb = TextBlob(text)
translated = tb.translate(to="en")
print(translated)
TextBlob is installed and the version is 0.15.3:
$ pip install -U textblob
$ python -m textblob.download_corpora
Thank you

Something has changed in the Google API used by TextBlob. You can see the 'official' discussion and suggested solution here: https://github.com/sloria/TextBlob/issues/395
In summary, until the issue gets fixed by the TextBlob author, in translate.py you should change
url = "http://translate.google.com/translate_a/t?client=webapp&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&dt=at&ie=UTF-8&oe=UTF-8&otf=2&ssel=0&tsel=0&kc=1"
to
url = "http://translate.google.com/translate_a/t?client=te&format=html&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&dt=at&ie=UTF-8&oe=UTF-8&otf=2&ssel=0&tsel=0&kc=1"

Related

I am trying to download data from the Chrome browser using chromedriver in a Python file

I am getting this error while using the Chrome driver to download Google Images. The chromedriver is on the PATH and the executable is downloaded, yet the issue persists.
The operating system is Windows 10, and chromedriver was installed through pip as well.
Input:
if __name__ == '__main__':
    chrome_driver = 'C:\\Users\\320086442\\AppData\\Local\\Continuum\\anaconda3\\Lib\\site-packages\\selenium\\webdriver\\chrome'
    # download the emotion data
    data_dir = 'C:\\Users\\320086442\\Downloads\\emotion-master\\image_net'
    emotions = {'angry': ['angry', 'furious', 'resentful', 'irate'],
                'disgusted': ['disgusted', 'sour', 'grossed out'],
                'happy': ['happy', 'smiling', 'cheerful', 'elated', 'joyful'],
                'sad': ['sad', 'depressed', 'sorrowful', 'mournful', 'grieving', 'crying'],
                'surprised': ['surprised', 'astonished', 'shocked', 'amazed']}
    download_emotions(emotions, data_dir, chrome_driver)
    # download the pseudo Imagenet data
    imagenet_labels = []
    with open('C:\\Users\\320086442\\Downloads\\emotion-master\\image_net\\imagenet_labels.txt', 'r') as file:
        for line in file:
            imagenet_labels.append(line.strip())
    data_dir = 'C:\\Users\\320086442\\Downloads\\emotion-master'
    imagenet_label_file = 'C:\\Users\\320086442\\Downloads\\emotion-master\\image_net\\imagenet_labels.txt'
    download_fake_imagenet(imagenet_labels, data_dir, chrome_driver)
Output:
C:\Users\320086442\AppData\Local\Continuum\anaconda3\python.exe C:/Users/320086442/Downloads/emotion-master/download_data.py
Downloading images for: angry human face ...
Looks like we cannot locate the path the 'chromedriver' (use the '--chromedriver' argument to specify the path to the executable.) or google chrome browser is not installed on your machine (exception: use options instead of chrome_options)
Process finished with exit code 0

I need a solution, please.
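As a starting point, the error text itself hints at two things to check: the driver must be given the path of the chromedriver executable (not the selenium package directory), and newer Selenium releases expect the options keyword instead of the deprecated chrome_options. A minimal sketch, assuming Selenium 3 and a hypothetical executable path:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Hypothetical path: this must point at chromedriver.exe itself,
# not at selenium's webdriver\chrome package directory.
chrome_driver = 'C:\\tools\\chromedriver.exe'

options = Options()  # pass `options`, not the deprecated `chrome_options`
driver = webdriver.Chrome(executable_path=chrome_driver, options=options)
driver.get('https://www.google.com')
driver.quit()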

I am getting the following error message:
Warning: Environment variable SUMO_HOME is not set, using built in type maps.
Warning: Environment variable SUMO_HOME is not set, schema resolution will use slow website lookups.
Error: unable to open file 'https://sumo.dlr.de/xsd/types_file.xsd'
In file 'built in type map'
At line/column 1/0.
The types could not be loaded from 'built in type map'.
Quitting (on error).
What could be causing this?
Error: unable to open file 'https://sumo.dlr.de/xsd/types_file.xsd'
It's http, not https. Please see https://sumo.dlr.de/wiki/Networks/PlainXML and download the file: $ wget http://sumo.dlr.de/xsd/types_file.xsd
My test (I created a test dir sumo/TEST_COMMANDS/ with some default files plus the wget-downloaded types_file.xsd):
$ cd sumo/ && export SUMO_HOME="$PWD" && cd TEST_COMMANDS/
$ netconvert --node-files=input_nodes.nod.xml --edge-files=input_edges.edg.xml \
--connection-files=input_connections.con.xml --type-files=types_file.xsd \
--output-file=MySUMONet.net.xml
The terminal reply is: Success. And the file MySUMONet.net.xml (61.4 kB) is created.

Azure Batch: Elevating the user privileges during pool creation using the Azure CLI

I need to mount Azure File Storage to Linux pools when they are being spun up. I am following the instructions given here to achieve that: mounting Azure-File Storage to Batch. Specifically, in my Azure CLI script, under the pool's start task command, I am inserting something that looks like this:
--start-task-command-line="apt-get update && apt-get install cifs-utils && mkdir -p {} && mount -t cifs {} {} -o vers=3.0,username={},password={},dir_mode=0777,file_mode=0777,serverino".format(_COMPUTE_NODE_MOUNT_POINT, _STORAGE_ACCOUNT_SHARE_ENDPOINT, _COMPUTE_NODE_MOUNT_POINT, _STORAGE_ACCOUNT_NAME, _STORAGE_ACCOUNT_KEY)
but when I run the tasks with the auto-user that Batch uses by default, I get an error in the stderr.txt file mentioning that it was unable to create the "/mnt/MyAzureFileshare" directory, so my guess is the mount did not occur during the pool creation process. I saw a very similar question to the one I am facing: setting custom user identity for tasks. Even the official Microsoft documentation goes over this in detail: Run Tasks under User accounts in Batch. But none of them shed light on how to achieve this using the Azure CLI.
Installing the packages required to mount Azure File Storage needs sudo privileges, and I am unable to get those through the Azure CLI. In order to recreate the error, I would recommend having a look at this: app to replicate the issue
What I want to achieve is:
1) Create a pool with Azure File Storage mounted on it and elevate the privileges of the auto-user to the admin level using the Azure CLI
2) Run tasks with the same auto-user with admin privileges using the Azure CLI
Update 1:
I was able to mount Azure File Storage with Batch using the Azure CLI. However, I am still not able to populate the Azure File Storage with the output files of the app that I deployed on the Batch nodes, and I get no error in the stderr.txt files.
The output of the stderr.txt file is:
WARNING: In "login" auth mode, the following arguments are ignored: --account-key
Alive[################################################################] 100.0000%
Finished[#############################################################] 100.0000%
pdf--->png: 0%| | 0/1 [00:00<?, ?it/s]
pdf--->png: 100%|##########| 1/1 [00:00<00:00, 1.16it/s]WARNING: In "login" auth mode, the following arguments are ignored: --account-key
WARNING: uploading /mnt/batch/tasks/workitems/pdf-processing-job-2018-10-29-15-36-15/job-1/mytask-0/wd/png_files-2018-10-29-15-39-25/akronbeaconjournal_20180108_AkronBeaconJournal_0___page---0.png
Alive[################################################################] 100.0000%
Finished[#############################################################] 100.0000%
The Python App that was deployed on the Batch Nodes is:
import os
import fitz
import subprocess
import argparse
import time
from tqdm import tqdm
import sentry_sdk
import sys
import datetime

def azure_active_directory_login(azure_username,azure_password,azure_tenant):
    try:
        azure_login_output=subprocess.check_output(["az","login","--service-principal","--username",azure_username,"--password",azure_password,"--tenant",azure_tenant])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("Invalid Azure Login Credentials")
        sys.exit("Invalid Azure Login Credentials")

def download_from_azure_blob(azure_storage_account,azure_storage_account_key,input_azure_container,file_to_process,pdf_docs_path):
    file_to_download=os.path.join(input_azure_container,file_to_process)
    try:
        subprocess.check_output(["az","storage","blob","download","--container-name",input_azure_container,"--file",os.path.join(pdf_docs_path,file_to_process),"--name",file_to_process,
                                 "--account-key",azure_storage_account_key,"--account-name",azure_storage_account,"--auth-mode","login"])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("unable to download the pdf file")
        sys.exit("unable to download the pdf file")

def pdf_to_png(input_folder_path,output_folder_path):
    pdf_files=[x for x in os.listdir(input_folder_path) if x.endswith((".pdf",".PDF"))]
    pdf_files.sort()
    for pdf in tqdm(pdf_files,desc="pdf--->png"):
        doc=fitz.open(os.path.join(input_folder_path,pdf))
        page_count=doc.pageCount
        for f in range(page_count):
            page=doc.loadPage(f)
            pix = page.getPixmap()
            if pdf.endswith(".pdf"):
                png_filename=pdf.split(".pdf")[0]+"___"+"page---"+str(f)+".png"
                pix.writePNG(os.path.join(output_folder_path,png_filename))
            elif pdf.endswith(".PDF"):
                png_filename=pdf.split(".PDF")[0]+"___"+"page---"+str(f)+".png"
                pix.writePNG(os.path.join(output_folder_path,png_filename))

def upload_to_azure_blob(azure_storage_account,azure_storage_account_key,output_azure_container,png_docs_path):
    try:
        subprocess.check_output(["az","storage","blob","upload-batch","--destination",output_azure_container,"--source",png_docs_path,
                                 "--account-key",azure_storage_account_key,"--account-name",azure_storage_account,"--auth-mode","login"])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("Unable to upload file to the container")

def upload_to_fileshare(png_docs_path):
    try:
        subprocess.check_output(["cp","-r",png_docs_path,"/mnt/MyAzureFileShare/"])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("unable to upload to azure file share ")

if __name__=="__main__":
    # Credentials
    sentry_sdk.init("<Sentry Creds>")
    azure_username=<azure_username>
    azure_password=<azure_password>
    azure_tenant=<azure_tenant>
    azure_storage_account=<azure_storage_account>
    azure_storage_account_key=<azure_account_key>
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument("input_azure_container",type=str,help="Location to download files from")
        parser.add_argument("output_azure_container",type=str,help="Location to upload files to")
        parser.add_argument("file_to_process",type=str,help="file link in azure blob storage")
        args = parser.parse_args()
        timestamp = time.time()
        timestamp_humanreadable= datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d-%H-%M-%S')
        task_working_dir=os.getcwd()
        file_to_process=args.file_to_process
        input_azure_container=args.input_azure_container
        output_azure_container=args.output_azure_container
        pdf_docs_path=os.path.join(task_working_dir,"pdf_files"+"-"+timestamp_humanreadable)
        png_docs_path=os.path.join(task_working_dir,"png_files"+"-"+timestamp_humanreadable)
        os.mkdir(pdf_docs_path)
        os.mkdir(png_docs_path)
    except Exception as e:
        sentry_sdk.capture_exception(e)
    azure_active_directory_login(azure_username,azure_password,azure_tenant)
    download_from_azure_blob(azure_storage_account,azure_storage_account_key,input_azure_container,file_to_process,pdf_docs_path)
    pdf_to_png(pdf_docs_path,png_docs_path)
    upload_to_azure_blob(azure_storage_account,azure_storage_account_key,output_azure_container,png_docs_path)
    upload_to_fileshare(png_docs_path)
The upload_to_fileshare() call in the Python app above should initiate the upload, but in my case nothing happens, and there is no error from the copy operation in the stderr.txt files.
Please let me know a way to troubleshoot this issue.
It does not look like the run-elevated parameter is exposed as a command-line argument in the CLI. You can, however, pass a JSON file to the --json argument, formatted as the REST API object, to get the full functionality.
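For example, a pool definition passed as JSON could request an elevated auto-user for the start task. A minimal sketch, assuming the Batch REST API pool object; the pool id, VM image, and command line are placeholders:
{
  "id": "mypool",
  "vmSize": "STANDARD_A1",
  "virtualMachineConfiguration": {
    "imageReference": {
      "publisher": "Canonical",
      "offer": "UbuntuServer",
      "sku": "16.04-LTS"
    },
    "nodeAgentSKUId": "batch.node.ubuntu 16.04"
  },
  "targetDedicatedNodes": 1,
  "startTask": {
    "commandLine": "/bin/bash -c 'apt-get update && apt-get install -y cifs-utils && ...'",
    "userIdentity": {
      "autoUser": {
        "scope": "pool",
        "elevationLevel": "admin"
      }
    },
    "waitForSuccess": true
  }
}
which would then be created with something like:
az batch pool create --json-file pool.json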

Service Account and PyDrive

I would like to use the Google Drive API to store some backups on it using a cronjob. I just don't understand how I can use PyDrive with a service account. I generated the service account file and put it in the same directory as my script, as client_secret.json.
Using this code :
#!/usr/bin/env python
# -*- coding: utf8 -*-
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

def main():
    gauth = GoogleAuth()
    drive = GoogleDrive(gauth)
    f = drive.CreateFile({'parent': 'toto'})
    f.SetContentFile('test.drive.py')
    f.Upload()

if __name__ == '__main__':
    main()
Result
pydrive.settings.InvalidConfigError: Invalid client secrets file Invalid file format.
Well ok. Then I look at other posts on SO, and find these two :
Automate Verification Process
The code from the first answer on this question returns this :
Traceback (most recent call last):
File "test.drive.py", line 4, in <module>
from oauth2client.client import SignedJwtAssertionCredentials
ImportError: cannot import name SignedJwtAssertionCredentials
And then this one :
Automating pydrive verification process
Which just... Doesn't help me much.
Where should I start? What should I do? Could someone give me an example with PyDrive and service authentication, just to upload a file?
EDIT :
After some more research, it seems I needed to install pycrypto to fix the import error described above. I don't know why, as it is not specified in the error message.
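For reference, here is a minimal sketch of uploading a file with PyDrive and a service account, assuming oauth2client 2.x (where SignedJwtAssertionCredentials was replaced by ServiceAccountCredentials); the key file name and title are placeholders:
from oauth2client.service_account import ServiceAccountCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

# Hypothetical key file: the JSON key downloaded for the service account.
scopes = ['https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scopes)

gauth = GoogleAuth()
gauth.credentials = credentials  # bypass the interactive OAuth flow
drive = GoogleDrive(gauth)

f = drive.CreateFile({'title': 'backup.tar.gz'})
f.SetContentFile('backup.tar.gz')
f.Upload()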

JSON API for PyPi - how to list packages?

There is a JSON API for PyPI which allows getting data for packages:
http://pypi.python.org/pypi/<package_name>/json
http://pypi.python.org/pypi/<package_name>/<version>/json
However, is it possible to get a list of all PyPI packages (or, for example, recent ones) with a GET call?
The easiest way to do this is to use the simple index at PyPI, which lists all packages without overhead. You can then request the JSON for each package individually by performing a GET request to the URLs mentioned in your question.
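A minimal sketch of that approach, assuming the requests package; the simple index returns one anchor tag per package, and per-package metadata lives at the JSON URLs above:
import requests

# Fetch the simple index (plain HTML, one <a> tag per package).
index_html = requests.get("https://pypi.org/simple/").text

# Then fetch metadata for an individual package from the JSON API.
info = requests.get("https://pypi.org/pypi/requests/json").json()
print(info["info"]["version"])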
I know that you asked for a way to do this from the JSON API, but you can use the XML-RPC api to get this info very easily, without having to parse HTML.
try:
    import xmlrpclib
except ImportError:
    import xmlrpc.client as xmlrpclib

client = xmlrpclib.ServerProxy('https://pypi.python.org/pypi')
# get a list of package names
packages = client.list_packages()
I tried this answer, but it's not working on Python 3.6.
I found one solution that parses the HTML using the lxml package, but you have to install it first via pip:
pip install lxml
Then try the following snippet:
from lxml import html
import requests
response = requests.get("https://pypi.org/simple/")
tree = html.fromstring(response.content)
package_list = [package for package in tree.xpath('//a/text()')]
NOTE: To make tasks like this simple, I've implemented my own Python module. It can be installed using pip:
pip install jk_pypiorgapi
The module is very simple to use. After instantiating an object representing the API interface you can make use of it:
import jk_pypiorgapi
api = jk_pypiorgapi.PyPiOrgAPI()
n = len(api.listAllPackages())
print("Number of packages on pypi.org:", n)
This module also provides capabilities for downloading information about specific packages as provided by pypi.org:
import jk_pypiorgapi
import jk_json
api = jk_pypiorgapi.PyPiOrgAPI()
jData = api.getPackageInfoJSON("jk_pypiorgapi")
jk_json.prettyPrint(jData)
This feature might be helpful as well.
As of PEP 691, you can now grab this through the Simple API if you request a JSON response.
curl --header 'Accept: application/vnd.pypi.simple.v1+json' https://pypi.org/simple/ | jq
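The same request from Python, as a minimal sketch assuming the requests package; per PEP 691 the package names are carried under a "projects" key:
import requests

headers = {"Accept": "application/vnd.pypi.simple.v1+json"}
resp = requests.get("https://pypi.org/simple/", headers=headers)
names = [p["name"] for p in resp.json()["projects"]]
print(len(names), "packages on pypi.org")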
Here's a Bash one-liner:
curl -sG -H 'Host: pypi.org' -H 'Accept: application/json' https://pypi.org/pypi/numpy/json | awk -F "description\":\"" '{ print $2 }' | cut -d ',' -f 1
# NumPy is a general-purpose array-processing package designed to...