I want to load nodes from a CSV file:
LOAD CSV FROM 'file:///Downloads/template_algorithmes.csv' AS line
MATCH (p:Person {name:'Raf'})
CREATE (al:Algorithm {name: line[1], project:line[2], description:line[3], input:line[4], output:line[5], remark:line[9]}), (p)-[:WORKED_ON]->(al)
But it responds with:
Couldn't load the external resource at: file:/var/lib/neo4j/import/Downloads/template_algorithmes_TEITGEN_raphael.csv
Indeed, the file is in /Downloads/, not in /var/lib/, which doesn't even have a neo4j folder:
bash-5.1$ cd /var/lib/
abrt/ cni/ dnf/ games/ initramfs/ misc/ PackageKit/ rpm-state/ tpm2-tss/
AccountsService/ color/ dnsmasq/ gdm/ iscsi/ mlocate/ plymouth/ samba/ udisks2/
alsa/ colord/ docker/ geoclue/ kdump/ net-snmp/ polkit-1/ selinux/ unbound/
alternatives/ containerd/ fedora-third-party/ gssproxy/ libvirt/ NetworkManager/ portables/ sss/ upower/
authselect/ containers/ flatpak/ hp/ lockdown/ nfs/ power-profiles-daemon/ systemd/ xkb/
bluetooth/ dbus/ fprint/ httpd/ logrotate/ openvpn/ private/ texmf/
chrony/ dhclient/ fwupd/ hyperv/ machines/ os-prober/ rpm/ tpm/
You can change the import directory in the neo4j.conf configuration file located in <NEO4J_HOME>/conf, then restart/bounce your server. A file:/// URL in LOAD CSV is resolved relative to that import directory, so either point the setting at the folder containing your CSV or copy the file into the existing import directory.
#default
#dbms.directories.import=import
dbms.directories.import=<new location>
I want to load multiple CSV files matching certain names into a dataframe. Currently I am looping through the whole folder, creating a list of filenames, loading those CSVs into a list of dataframes, and then concatenating that list.
The approach I want to use (if possible) is to bypass all that code and read all the files in a one-liner kind of approach.
I know this can be done easily for a single level of subfolders, but my subfolder structure is as follows:
Root Folder
├── Subfolder1
│   └── Subfolder 2
│       ├── X01.csv
│       ├── Y01.csv
│       └── Z01.csv
├── Subfolder3
│   └── Subfolder4
│       ├── X01.csv
│       └── Y01.csv
└── Subfolder5
    ├── X01.csv
    └── Y01.csv
I want to read all the "X01.csv" files while reading from the Root Folder.
Is there a way I can read all the required files with code something like the below?
filepath = "rootpath" + "/**/X*.csv"
df = spark.read.format("com.databricks.spark.csv").option("recursiveFilelookup","true").option("header","true").load(filepath)
This code works fine for a single level of subfolders, but is there any equivalent for multi-level folders? I thought the "recursiveFileLookup" option would look across all levels of subfolders, but apparently that is not how it works.
Currently I am getting a
Path not found ... filepath
exception.
Any help please?
Have you tried using the glob.glob function?
You can use it to search for files that match certain criteria inside a root path, and pass the list of files it finds to spark.read.csv function.
For example, I've recreated the folder structure from your example inside a Google Colab environment.
To get a list of all CSV files matching the criteria you've specified, you can use the following code:
import glob
rootpath = './Root Folder/'
# The following line of code looks through all files
# inside the rootpath recursively, trying to match the
# pattern specified. In this case, it tries to find any
# CSV file that starts with the letters X, Y, or Z,
# and ends with 2 numbers (ranging from 0 to 9).
glob.glob(rootpath + "**/[XYZ][0-9][0-9].csv", recursive=True)
# Returns:
# ['./Root Folder/Subfolder5/Y01.csv',
# './Root Folder/Subfolder5/X01.csv',
# './Root Folder/Subfolder1/Subfolder 2/Y01.csv',
# './Root Folder/Subfolder1/Subfolder 2/Z01.csv',
# './Root Folder/Subfolder1/Subfolder 2/X01.csv']
Now you can combine this with spark.read.csv's ability to read a list of files to get the answer you're looking for:
import glob
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
rootpath = './Root Folder/'
spark.read.csv(glob.glob(rootpath + "**/[XYZ][0-9][0-9].csv", recursive=True), inferSchema=True, header=True)
Note
You can specify more general patterns like:
glob.glob(rootpath + "**/*.csv", recursive=True)
To return a list of all csv files inside any subdirectory of rootpath.
Additionally, to consider only the files directly inside rootpath (no recursion), you could use something like:
glob.glob(rootpath + "*.csv")
and rootpath + "*/*.csv" would look exactly one subdirectory level down.
Edit
Based on your comments to this answer, does something like this work on Databricks?
from notebookutils import mssparkutils as ms
from py4j.protocol import Py4JJavaError

# Databricks has an equivalent, dbutils.fs.ls,
# that works similarly to mssparkutils.fs, based on
# the following page of its documentation:
# https://docs.databricks.com/dev-tools/databricks-utils.html#ls-command-dbutilsfsls
def scan_dir(
    initial_path: str,
    search_str: str,
    account_name: str = "",
):
    """Scan a directory and its subdirectories for a string.

    Parameters
    ----------
    initial_path : str
        The path to start the search from. Accepts either a valid container name
        or the entire connection string.
    search_str : str
        The string to search for in file names.
    account_name : str, optional
        The name of the account used to access the container folders.
        This value is only used when `initial_path` does not already
        conform to the format:
        "abfss://<container>@<account_name>.dfs.core.windows.net/"

    Raises
    ------
    FileNotFoundError
        If the `initial_path` informed doesn't exist.
    ValueError
        If `initial_path` is not a string.
    """
    if not isinstance(initial_path, str):
        raise ValueError(
            f'`initial_path` needs to be of type string, not {type(initial_path)}'
        )
    elif not initial_path.startswith('abfss'):
        initial_path = f'abfss://{initial_path}@{account_name}.dfs.core.windows.net/'
    try:
        fdirs = ms.fs.ls(initial_path)
    except Py4JJavaError as exc:
        raise FileNotFoundError(
            f'The path you informed "{initial_path}" doesn\'t exist'
        ) from exc
    found = []
    for path in fdirs:
        p = path.path
        if path.isDir:
            # Recurse into subdirectories; p is already a full abfss path,
            # so account_name is not needed for the recursive call.
            found = [*found, *scan_dir(p, search_str)]
        if search_str.lower() in path.name.lower():
            # Keep the folder that contains the matching file.
            found = [*found, p.replace(path.name, "")]
    return list(set(found))
Example:
# Change .parquet to .csv
spark.read.parquet(*scan_dir("abfss://CONTAINER_NAME@ACCOUNTNAME.dfs.core.windows.net/ROOT/FOLDER/", ".parquet"))
The method above worked on Azure Synapse.
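For the CSV case in the original question, a rough sketch of the equivalent call (the container, account, and folder names are placeholders, and spark is assumed to be the notebook's session). Note that scan_dir returns the folders that contain a match, so pathGlobFilter is used to keep only the X01.csv files inside them, and spark.read.csv takes the list of paths directly rather than unpacked arguments:
# Placeholder abfss path; scan_dir returns the folders containing files
# whose names include "x01.csv", and pathGlobFilter then restricts the
# read to exactly those files inside the returned folders.
df = (
    spark.read
    .option("header", "true")
    .option("pathGlobFilter", "X01.csv")
    .csv(scan_dir("abfss://CONTAINER_NAME@ACCOUNTNAME.dfs.core.windows.net/ROOT/FOLDER/", "x01.csv"))
)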
I am doing SCA analysis using Fortify for an Axway API. The code is written in JSON and XML format. Upon scanning, Fortify is able to scan the .xml files only, not the JSON files.
Can anyone tell me if there is a plugin or any other way through which I can scan the JSON files as well?
The GitHub URL of the sample code: https://github.com/amolmandloi037/axway-swagger-maven.git
Below is the command I am using in a Jenkins Scripted Pipeline:
stage('Fortify scan') {
    pom = readMavenPom file: "pom.xml"
    fortify_name = pom.artifactId
    fortify_version = pom.version
    withCredentials([usernamePassword(credentialsId: 'Fortify', passwordVariable: 'password', usernameVariable: 'username')]) {
        bat """
        dir
        sourceandlibscanner -auto -bt none -scan -sonatype -iqrl https://{fortify_url} --nexusauth {username}:{password} -iqappid ${fortify_name} -stage build -r sonatype_result.json -f result.fpr
        """
    }
    fortifyUpload appName: fortify_name, appversion: fortify_version, resultsFile: 'result.fpr'
}
Scenario:
I have InSpec Profile-A (10 controls), Profile-B (15 controls), and Profile-C (5 controls).
Profile-A depends on Profile-B and Profile-C.
I have a file in Profile-A which I am parsing with inspec.profile.file('test.json') and executing the 10 controls in the same profile.
I have to pass the same file to Profile-B and Profile-C so that I can execute the other set of tests in each profile as part of the profile dependency.
I am able to successfully parse the test.json file in Profile-A, as the file is in the correct folder path:
myjson = json(content: inspec.profile.file('test.json'))
puts myjson
I have followed the InSpec documentation to set up the profile dependency and inputs to the dependent profiles:
https://docs.chef.io/inspec/inputs/
Issue:
The issue is that I am able to pass single input values (like a string, an array, etc.) to the dependent profiles, but I am not able to pass the entire JSON file so that it can be parsed and the controls executed.
I have tried the following in the profile metadata files:
# ProfileB inspec.yml
name: profile-b
inputs:
  - name: file1
  - name: file2

# wrapper inspec.yml
name: profile-A
depends:
  - name: profile-b
    path: ../profile-b
inputs:
  - name: file1
    value: "json(content: inspec.profile.file('test.json'))"
    profile: profile-b
  - name: file2
    value: "File.read('/path/to/test.json')"
    profile: profile-b
Error:
When I try to load file1 and file2 in profile-b with the following:
jsonfile1 = input('file1')
jsonfile2 = input('file2')
puts jsonfile1
puts jsonfile2
I get the error: no implicit conversion of nil to integer
Goal:
I should be able to pass the file from Profile-A to Profile-B or Profile-C so that the respective dependent profile controls are executed.
I have a configuration file with:
{path, "/mnt/test/"}.
{name, "Joe"}.
The path and the name can be changed by a user. As far as I know, there is a way to store those variables in a module by using file:consult/1 inside
-define(VARIABLE, <parsing of the config file>).
Are there any better ways to read a config file when the module starts working, without writing a parsing function in -define? (As far as I know, according to Erlang developers, putting complicated functions in -define is not the best approach.)
If you only need to read the config when you start the application, you can use the application config file, which is referenced in rebar.config:
{profiles, [
    {local, [
        {relx, [
            {dev_mode, false},
            {include_erts, true},
            {include_src, false},
            {vm_args, "config/local/vm.args"},
            {sys_config, "config/local/yourapplication.config"}
        ]}
    ]}
]}.
More info about this here: rebar3 configuration.
The next step is to create yourapplication.config and store it in your application folder: /app/config/local/yourapplication.config
This configuration file should have a structure like this example:
[
    {yourapplicationname, [
        {path, "/mnt/test/"},
        {name, "Joe"}
    ]}
].
When your application is started, you can get the config values with:
{ok, "/mnt/test/"} = application:get_env(yourapplicationname, path)
{ok, "Joe"} = application:get_env(yourapplicationname, name)
Now you can -define these variables like:
-define(VARIABLE,
    case application:get_env(yourapplicationname, path) of
        {ok, Data} -> Data;
        _ -> undefined
    end
).
I have a bunch of files in one directory; each file has many entries like this:
{"DateTimeStamp":"2017-07-20T21:52:00.767-0400","Host":"Server","Code":"test101","use":"stats"}
I need to be able to read each file and form a data frame from the JSON entries. Sometimes the lines in a file may not be complete, and my script fails. How can I modify this script to account for incomplete lines in the files?
library(data.table)   # provides rbindlist

path <- "C:/JsonFiles"
filenames <- list.files(path, pattern="*Data*", full.names=TRUE)
dflist <- lapply(filenames, function(i) {
  jsonlite::fromJSON(
    paste0("[",
           paste0(readLines(i), collapse=","),
           "]"), flatten=TRUE
  )
})
mq <- rbindlist(dflist, use.names=TRUE, fill=TRUE)