Using PDF2Image in Code Repository on Palantir Foundry

I am trying to use the pdf2image library in a Code Repository on Palantir Foundry. Calling the function convert_from_bytes raises this error:
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Does anyone know how to reference the poppler path and get rid of this error?
Thanks!
Here is the code:
from pdf2image import convert_from_bytes
import pytesseract

def extract_pdf_text(input_bytes, language='eng', dpi=200):
    # Render each PDF page to an image (this step needs poppler on PATH)
    pages = convert_from_bytes(input_bytes, dpi=dpi)
    pdf_pages = ''
    for page in pages:
        # OCR the page image and append its text
        pdf_pages += pytesseract.image_to_string(page, lang=language)
    return pdf_pages
And the meta.yaml for reference:
# If you need to modify the runtime requirements for your package,
# update the 'requirements.run' section in this file
package:
  name: "{{ PACKAGE_NAME }}"
  version: "{{ PACKAGE_VERSION }}"
source:
  path: ../src
requirements:
  # Tools required to build the package. These packages are run on the build system and include
  # things such as revision control systems (Git, SVN) make tools (GNU make, Autotool, CMake) and
  # compilers (real cross, pseudo-cross, or native when not cross-compiling), and any source pre-processors.
  # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#build
  build:
    - python 3.8.*
    - setuptools
  # Packages required to run the package. These are the dependencies that are installed automatically
  # whenever the package is installed.
  # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#run
  run:
    - python 3.8.*
    - transforms {{ PYTHON_TRANSFORMS_VERSION }}
    - transforms-expectations
    - transforms-verbs
    - pytesseract
    - pdfplumber
    - googletrans
    - regex
    - pdf2image
    - langdetect
    - pandas
    - numpy
    - selenium
    - requests
    - pypdf2
    - poppler
build:
  script: python setup.py install --single-version-externally-managed --record=record.txt

I found the problem when inspecting the CI checks: they failed before poppler was pulled. After I cleaned up meta.yaml and the checks succeeded, everything worked fine.
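One way to confirm that poppler actually made it into the environment is to look for its command-line tools on PATH before calling pdf2image. The sketch below is only an illustration: the helper names find_tools and missing_tools are made up here, and it assumes the relevant executables are pdfinfo and pdftoppm, the two poppler utilities pdf2image shells out to.

```python
import shutil

# Poppler command-line tools that pdf2image relies on (assumed list).
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def find_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools=POPPLER_TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [tool for tool, path in find_tools(tools).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All poppler tools found:", find_tools())
```

If the tools are reported missing even though poppler is listed in meta.yaml, the failed CI checks described above are a likely place to look.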

Related

Deploy Flask Application using python 3.7

I followed this guide to deploy my Python application:
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-flask.html
I got this error when running the command eb deploy:
2021/09/10 17:26:49.653297 [INFO] extracting /opt/elasticbeanstalk/deployment/app_source_bundle to /var/app/staging/
2021/09/10 17:26:49.653315 [INFO] Running command /bin/sh -c /usr/bin/unzip -q -o /opt/elasticbeanstalk/deployment/app_source_bundle -d /var/app/staging/
2021/09/10 17:26:49.656613 [INFO] finished extracting /opt/elasticbeanstalk/deployment/app_source_bundle to /var/app/staging/ successfully
2021/09/10 17:26:49.657975 [ERROR] An error occurred during execution of command [app-deploy] - [StageApplication]. Stop running the command. Error: chown /var/app/staging/env/lib/python3.7/collections: no such file or directory
2021/09/10 17:26:49.657984 [INFO] Executing cleanup logic
2021/09/10 17:26:49.658080 [INFO] CommandService Response: {"status":"FAILURE","api_version":"1.0","results":[{"status":"FAILURE","msg":"Engine execution has encountered an error.","returncode":1,"events":[{"msg":"Instance deployment failed. For details, see 'eb-engine.log'.","timestamp":1631294809,"severity":"ERROR"}]}]}
It is frustrating, because it apparently cannot find Python 3.7 to run. Can anyone help me get past this?
application.py
# app.py
from flask import Flask

application = Flask(__name__)

@application.route("/")
def hello():
    return "Hello World!"

if __name__ == '__main__':
    application.run()
requirements.txt
click==7.1.2
Flask==1.1.2
itsdangerous==1.1.0
Jinja2==2.11.2
MarkupSafe==1.1.1
Werkzeug==1.0.1

This happened to me while following the tutorial as well.
The problem, in my case, was that my virtual environment folder venv was getting swept up in the deploy and breaking it.
I thought that I could put the path in .ebignore to prevent this, but I had not read the .ebignore documentation carefully:
If .ebignore isn't present, but .gitignore is, the EB CLI ignores files specified in .gitignore. If .ebignore is present, the EB CLI doesn't read .gitignore.
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3-configuration.html#eb-cli3-ebignore
In my case, I just removed the .ebignore file completely, added a line for venv into my .gitignore file, ran eb deploy again and everything worked.
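The precedence rule quoted from the documentation can be written down as a small Python sketch. This is only a model of the quoted sentence, not the EB CLI's own code, and effective_ignore_file is a hypothetical helper name.

```python
from pathlib import Path
import tempfile


def effective_ignore_file(project_dir):
    """Return the ignore file that applies under the quoted rule, or None.

    If .ebignore exists it wins and .gitignore is not read at all;
    otherwise .gitignore is used when present.
    """
    root = Path(project_dir)
    ebignore = root / ".ebignore"
    gitignore = root / ".gitignore"
    if ebignore.is_file():
        return ebignore
    if gitignore.is_file():
        return gitignore
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A project with only a .gitignore that excludes the virtual environment.
        (Path(tmp) / ".gitignore").write_text("venv/\n")
        print(effective_ignore_file(tmp))
```

So a venv entry in .gitignore has no effect as long as an .ebignore file exists, even an empty one.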

Pipeline fails when running Packer fix

I'm trying to run Packer (1.7) in an Azure DevOps pipeline.
The pkr.hcl file passes validation on my PC running Packer 1.7.3. The pipeline runs Packer 1.7.2.
The YAML task in the pipeline reads like this:
- task: PackerBuild@1
  inputs:
    templateType: 'custom'
    customTemplateLocation: 'ComboBoxes.pkr.hcl'
    imageUri: 'ssi-dev-combobox'
    imageId: <full resource ID>
When run in the pipeline it reads:
Current installed packer version is 1.7.2.
Running packer fix command
/usr/local/bin/packer fix -validate=false /home/vsts/work/1/s/ComboBoxes.pkr.hcl
Error parsing template: invalid character '#' looking for beginning of value
##[error]Packer fix command failed with error : ''. This could happen if task does not support packer version.
The # is the first character in the .pkr.hcl file, and changing the beginning of the file changes which character is reported as invalid.
Why is it trying to run "packer fix" instead of "packer build"?

So it turns out that the Packer task in Azure Pipelines doesn't work with current versions of Packer.
Run Packer as part of a script task instead.
- task: PowerShell@2
  displayName: 'Packer build'
  inputs:
    targetType: 'inline'
    script: 'packer build $(build.artifactstagingdirectory)/ComboBoxes.pkr.hcl'
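The message "invalid character '#' looking for beginning of value" has the shape of a JSON parse error, which suggests that the fix step is reading the HCL file as if it were a JSON template. As a rough illustration in Python (this is not Packer's code; first_json_error and the sample template text are made up for the example), a JSON parser rejects an HCL file at its very first character, whatever that character happens to be:

```python
import json


def first_json_error(text):
    """Return (message, position) of the first JSON parse error, or None if the text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.pos
    return None


if __name__ == "__main__":
    # Hypothetical HCL2 template text that starts with a comment.
    hcl_with_comment = '# ComboBoxes template\nsource "azure-arm" "example" {}\n'
    # The same template with the leading comment removed.
    hcl_without_comment = 'source "azure-arm" "example" {}\n'

    print(first_json_error(hcl_with_comment))
    print(first_json_error(hcl_without_comment))
    print(first_json_error('{"builders": []}'))
```

Both HCL inputs fail at position 0, which matches the observation that changing the start of the file only changes which character is reported.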

changing build string for conda package

When I first ran conda-build myrecipe, the tar.bz2 was named 'mypackage-version-py38_0.tar.bz2'. Every time I have run it since, the file is named 'mypackage-version-py38head00f5_0.tar.bz2'. Why is 'head00f5' now being added to the build string?
Here is my meta.yaml
package:
  name: mypackage
  version: "0.0.0.dev1"
source:
  path: ../
build:
  number: 0
requirements:
  build:
    - python
    - setuptools
    - numpy
  run:
    - python
    - numpy
test:
  imports:
    - mypackage

Since version 3.0, conda-build has been adding hashes to the build string. See "Differentiating packages built with different variants" in the conda-build documentation for more details.
The hash always starts with h, followed by 7 hexadecimal digits. In your particular case it happens to spell the word head, which I'm guessing is part of the confusion. That is just a coincidence.
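To make the structure of the build string visible, here is a small sketch that splits it into its parts, assuming the layout described above (an optional prefix such as py38, then h plus 7 hexadecimal digits, then an underscore and the build number). The function name split_build_string and the regular expression are illustrative only, not part of conda-build.

```python
import re

# prefix (e.g. "py38"), hash ("h" + 7 hex digits), build number.
BUILD_STRING = re.compile(r"^(?P<prefix>.*?)(?P<hash>h[0-9a-f]{7})_(?P<number>\d+)$")


def split_build_string(build_string):
    """Return (prefix, hash, build_number), or None if no hash component is found."""
    match = BUILD_STRING.match(build_string)
    if match is None:
        return None
    return match.group("prefix"), match.group("hash"), int(match.group("number"))


if __name__ == "__main__":
    print(split_build_string("py38head00f5_0"))
    print(split_build_string("py38_0"))
```

For the build string in the question, the hash component is head00f5: the letter h followed by the hexadecimal digits ead00f5.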

Why rustc did not include libmariadb into release binary?

I thought the Rust compiler uses static linking and includes all the dependent libraries at compile time (hence the executable size).
But when I tried to use the compiled binary in a Docker scratch image, with actix, a MySQL client, and diesel with the mysql feature enabled, this error pops up:
error while loading shared libraries: libmariadb.so.3: cannot open shared object file: No such file or director
My dockerfile:
FROM rust:1.43 as builder
WORKDIR /var/app
RUN apt-get update && apt-get install -y libclang-dev clang libmariadb-dev-compat libmariadb-dev
COPY Cargo.toml Cargo.lock diesel.toml ./
COPY src src
RUN cargo install diesel_cli --no-default-features --features mysql
RUN cp /usr/local/cargo/bin/diesel diesel
RUN cargo build --release
FROM ubuntu
USER 1000
WORKDIR /var/app
COPY --from=builder --chown=1000:1000 /var/app/target/release/sniper_api app
COPY --from=builder --chown=1000:1000 /var/app/diesel diesel
CMD ["./app"]
My cargo:
[dependencies]
actix-rt = "1.0.0"
actix-web = "2.0.0"
actix-http = "1.0.1"
serde = { version = "1.0.112", features=["derive"] }
dotenv = "0.15.0"
config = "0.10.1"
diesel = { version = "1.4.2", features = ["mysql","r2d2"]}
futures = "0.3.5"
r2d2 = "0.8.8"
r2d2_mysql = "18.0.0"
env_logger = "0.7.1"
But if I use an ubuntu/debian/etc. image as the runtime and install libmariadb-dev-compat and libmariadb-dev, everything is fine. Is there a way to get a true single binary with a MySQL connector in Rust?

Quoting the question: "I thought rust compiler uses static binding and includes all the dependent libraries at compile time (hence executable size)."
This only applies to Rust libraries. For libraries written in other languages, there is generally little rustc can do.
In this case, diesel provides MySQL/MariaDB support via the mysqlclient-sys crate. There is currently an open issue, with an accompanying PR, to support static linking for this library, but it has not been merged yet.

Cannot Build the site on the preview server

I am a beginner with Jekyll. I am following the documentation's advice to build the site on the preview server.
Here is what I did:
- Installed the latest version of Ruby
$ ruby -v -> ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]
$ gem --version -> 2.6.14
- Installed Jekyll
$ gem install jekyll bundler -> Version of jekyll : jekyll-3.6.2
When I try to build the site on the preview server (bundle exec jekyll serve), I get the following error:
jekyll 3.6.2 | Error: (/Users/admin/Documents/Perso/Site-Internet/Jekyll/inger/_config.yml): did not find expected key while parsing a block mapping at line 16 column 1
Line 16 is the first uncommented line that should be processed. It is this one:
title: Inger Hair at Home at Aix en Provence (line 16)
I don't see what could be wrong. Thank you in advance for your answers.

YAML files are sensitive to whitespace. In your _config.yml you have:
# Exclude from processing.
# The following items will not be processed, by default. Create a custom list
# to override the default setting.
 exclude:
  - Gemfile
  - Gemfile.lock
  - node_modules
  - vendor/bundle/
  - vendor/cache/
  - vendor/gems/
  - vendor/ruby/
  - Inger-Analytics-feb0aa8b73d1.json
  - .gitignore
When it should be:
# Exclude from processing.
# The following items will not be processed, by default. Create a custom list
# to override the default setting.
exclude:
  - Gemfile
  - Gemfile.lock
  - node_modules
  - vendor/bundle/
  - vendor/cache/
  - vendor/gems/
  - vendor/ruby/
  - Inger-Analytics-feb0aa8b73d1.json
  - .gitignore
Note the stray space before exclude: in the first version. Top-level keys must all start in the same column, so with the space removed the file should parse.
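A stray leading space like this is hard to spot by eye. The following is a rough heuristic, not a YAML parser, that flags indented keys appearing after a top-level "key: value" line, which cannot have nested children. The function name suspicious_indented_keys and the sample config text are made up for this illustration, and the check ignores many YAML features (flow collections, multi-line scalars, and so on).

```python
import re

# A mapping key at the start of a line: optional indent, key name, colon, optional rest.
KEY_LINE = re.compile(r"^(?P<indent>[ \t]*)(?P<key>[A-Za-z_][\w.-]*):(?P<rest>\s.*|)$")


def suspicious_indented_keys(yaml_text):
    """Return (line_number, key) for indented keys that follow a top-level 'key: value' line."""
    findings = []
    parent_can_nest = True  # whether the last top-level key may have children
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not change the state
        match = KEY_LINE.match(line)
        if match is None:
            continue  # list items and other lines are ignored by this heuristic
        if match.group("indent"):
            if not parent_can_nest:
                findings.append((number, match.group("key")))
            continue
        value = match.group("rest").strip()
        # A top-level key with no value (or only a comment) may have nested children.
        parent_can_nest = value == "" or value.startswith("#")
    return findings


if __name__ == "__main__":
    config = "title: Inger Hair at Home\n# Exclude from processing.\n exclude:\n  - Gemfile\n"
    print(suspicious_indented_keys(config))
```

A real YAML parser gives the authoritative answer; this only points at the line worth looking at.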
