My doc-tests in Julia require a qualification with the module name, despite calling using my_module everywhere. If I do not qualify the functions, I get
ERROR: UndefVarError: add not defined
Here is the setup that gives this error. The directory structure with tree is:
.
|____docs
| |____make.jl
| |____src
| | |____index.md
|____src
| |____my_module.jl
The file docs/make.jl is:
using Documenter, my_module
makedocs(
modules = [my_module],
format = :html,
sitename = "my_module.jl",
doctest = true
)
The file docs/src/index.md is:
# Documentation
```#meta
CurrentModule = my_module
DocTestSetup = quote
using my_module
end
```
```#autodocs
Modules = [my_module]
```
The file src/my_module.jl is:
module my_module
"""
add(x, y)
Dummy function
# Examples
```jldoctest
julia> add(1, 2)
3
```
"""
function add(x::Number, y::Number)
return x + y
end
end
If I qualify the doc-test in the src/my_module.jl with my_module.add(1,2), then it works fine.
How can I avoid qualifying function names in doc-tests?
Use a setup name block
This is untested, but something like this should work:
module my_module
"""
add(x, y)
Dummy function
# Examples
```#setup abc
import my_module: add
```
```jldoctest abc
julia> add(1, 2)
3
```
"""
function add(x::Number, y::Number)
return x + y
end
end
Following the comments in this thread, the problem is that the add function is not exported, so it is not brought into scope with using. You can add this line near the top of src/my_module.jl, after the module declaration:
export add
And then the doc-testing works.
Related
I want to Load Multiple CSV files matching certain names into a dataframe. Currently i am looping through the whole folder and creating a list of filenames and then loading those csv's into the dataframe list and then concatenating that dataframe.
The approach i want to use (if possible) is to bypass all the code and read all files in a one liner kind of approach.
I know this can be done easily for single level of subfolders, but my subfolder structure is as follows
Root Folder
|
Subfolder1
|
Subfolder 2
|
X01.csv
Y01.csv
Z01.csv
|
Subfolder3
|
Subfolder4
|
X01.csv
Y01.csv
|
Subfolder5
|
X01.csv
Y01.csv
I want to read all "X01.csv" files while reading from Root Folder.
Is there a way i can read all the required files in code something like the below
filepath = "rootpath" + "/**/X*.csv"
df = spark.read.format("com.databricks.spark.csv").option("recursiveFilelookup","true").option("header","true").load(filepath)
This code works fine for single level of subfolders, is there any equivalent of this for multi level folders ? i thought the "recursiveFilelookup" option would look across all levels of subfolders, but apparently this is not the way it works.
Currently i am getting a
Path not found ... filepath
exception
any help please
Have you tried using the glob.glob function?
You can use it to search for files that match certain criteria inside a root path, and pass the list of files it finds to spark.read.csv function.
For example, I've recreated the folder structure from your example inside a Google Colab environment:
To get a list of all CSV files matching the criteria you've specified, you can use the following code:
import glob
rootpath = './Root Folder/'
# The following line of code looks through all files
# inside the rootpath recursively, trying to match the
# pattern specified. In this case, it tries to find any
# CSV file that starts with the letters X, Y, or Z,
# and ends with 2 numbers (ranging from 0 to 9).
glob.glob(rootpath + "**/[X|Y|Z][0-9][0-9].csv", recursive=True)
# Returns:
# ['./Root Folder/Subfolder5/Y01.csv',
# './Root Folder/Subfolder5/X01.csv',
# './Root Folder/Subfolder1/Subfolder 2/Y01.csv',
# './Root Folder/Subfolder1/Subfolder 2/Z01.csv',
# './Root Folder/Subfolder1/Subfolder 2/X01.csv']
Now you can combine this with spark.read.csv capability of reading a list of files to get the answer you're looking for:
import glob
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
rootpath = './Root Folder/'
spark.read.csv(glob.glob(rootpath + "**/[X|Y|Z][0-9][0-9].csv", recursive=True), inferSchema=True, header=True)
Note
You can specify more general patterns like:
glob.glob(rootpath + "**/*.csv", recursive=True)
To return a list of all csv files inside any subdirectory of rootpath.
Additionally, to consider only the immediate subdirectories files, you could use something like:
glob.glob(rootpath + "*.csv", recursive=True)
Edit
Based on your comments to this answer, does something like this works on Databricks?
from notebookutils import mssparkutils as ms
# databricks has a module called dbutils.fs.ls
# that works similarly to mssparkutils.fs, based on
# the following page of its documentation:
# https://docs.databricks.com/dev-tools/databricks-utils.html#ls-command-dbutilsfsls
def scan_dir(
initial_path: str,
search_str: str,
account_name: str,
):
"""Scan a directory and subdirectories for a string.
Parameters
----------
initial_path : str
The path to start the search. Accepts either a valid container name,
or the entire connection string.
search_str : str
The string to search.
account_name : str
The name of the account to access the container folders.
This value is only used, when the `initial_path`, doesn't
conform with the format: "abfss://<initial_path>#<account_name>.dfs.core.windows.net/"
Raises
------
FileNotFoundError
If the `initial_path` informed doesn't exist.
ValueError
If `initial_path` is not a string.
"""
if not isinstance(initial_path, str):
raise ValueError(
f'`initial_path` needs to be of type string, not {type(initial_path)}'
)
elif not initial_path.startswith('abfss'):
initial_path = f'abfss://{initial_path}#{account_name}.dfs.core.windows.net/'
try:
fdirs = ms.fs.ls(initial_path)
except Py4JJavaError as exc:
raise FileNotFoundError(
f'The path you informed \"{initial_path}\" doesn\'t exist'
) from exc
found = []
for path in fdirs:
p = path.path
if path.isDir:
found = [*found, *scan_dir(p, search_str)]
if search_str.lower() in path.name.lower():
# print(p.split('.net')[-1])
found = [*found, p.replace(path.name, "")]
return list(set(found))
Example:
# Change .parquet to .csv
spark.read.parquet(*scan_dir("abfss://CONTAINER_NAME#ACCOUNTNAME.dfs.core.windows.net/ROOT/FOLDER/", ".parquet"))
This method above worked for on Azure Synapse:
I follow a python course on finance about portfolio theory. I have to create a function with a nested function in it.
My problem is I have a error message of "neg_sharpe_ratio() missing 2 required positional arguments: 'er' and 'cov'" whereas to my mind 'er' and 'cov' are already defined in my function msr below. So I understand how they are missing.
from scipy.optimize import minimize
def msr(riskfree_rate, er, cov):
n= er.shape[0]
init_guess= np.repeat(1/n, n)
bounds=((0.00, 1.0),)*n
weights_sum_to_1 = {
'type' :'eq' , #
'fun' : lambda weights: np.sum(weights) - 1 ##
}
def neg_sharpe_ratio(weights,riskfree_rate, er, cov):
r = erk.portfolio_return(weights, er)
vol = erk.portfolio_vol(weights,cov)
return -(r-riskfree_rate)/vol
results = minimize( neg_sharpe_ratio, init_guess,
args=(cov,), method="SLSQP",
options={'disp': False},
constraints=( weights_sum_to_1),
bounds=bounds
)
return results.x
TypeError: neg_sharpe_ratio() missing 2 required positional arguments: 'er' and 'cov'
The function neg_sharpe_ratio is able to reference any of the variables passed in and made by the function msr without needing those same variables passed into it itself. Therefore you should be able to remove the paramters riskfree_rate, er, and cov from the neq_sharpe_ratio function definition and have it work, as those variables are passed into its parent function, leaving you with:
def neg_sharpe_ratio(weights):
For those who might be interested, I find my mistake..
Indeed, I forgot to define correctly the arguments of my function neg_share_ratio in my function minimize.
Here is the code amended:
from scipy.optimize import minimize
def msr(riskfree_rate, er, cov):
n= er.shape[0]
init_guess= np.repeat(1/n, n)
bounds=((0.00, 1.0),)*n
weights_sum_to_1 = {
'type' :'eq' , #
'fun' : lambda weights: np.sum(weights) - 1 ##
}
def neg_sharpe_ratio(weights,riskfree_rate, er, cov):
r = erk.portfolio_return(weights, er)
vol = erk.portfolio_vol(weights,cov)
return -(r-riskfree_rate)/vol
results = minimize( neg_sharpe_ratio, init_guess,
args=(weights,riskfree_rate,er,cov), method="SLSQP",
options={'disp': False},
constraints=( weights_sum_to_1),
bounds=bounds
)
return results.x code here
I have the following closure:
def get!(Item, id) do
Enum.find(
#items,
fn(item) -> item.id == id end
)
end
As I believe this looks ugly and difficult to read, I'd like to give this a name, like:
def get!(Item, id) do
defp has_target_id?(item), do: item.id = id
Enum.find(#items, has_target_id?/1)
end
Unfortunately, this results in:
== Compilation error in file lib/auction/fake_repo.ex ==
** (ArgumentError) cannot invoke defp/2 inside function/macro
(elixir) lib/kernel.ex:5238: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:4155: Kernel.define/4
(elixir) expanding macro: Kernel.defp/2
lib/auction/fake_repo.ex:28: Auction.FakeRepo.get!/2
Assuming it is possible, what is the correct way to do this?
The code you posted has an enormous amount of syntax errors/glitches. I would suggest you start with getting accustomed to the syntax, rather than trying to make Elixir better by inventing the things that nobody uses.
Here is the correct version that does what you wanted. The task might be accomplished with an anonymous function, although I hardly see a reason to make a perfectly looking idiomatic Elixir look ugly.
defmodule Foo do
#items [%{id: 1}, %{id: 2}, %{id: 3}]
def get!(id) do
has_target_id? = fn item -> item.id == id end
Enum.find(#items, has_target_id?)
end
end
Foo.get! 1
#⇒ %{id: 1}
Foo.get! 4
#⇒ nil
You can do this:
def get!(Item, id) do
Enum.find(
#items,
&compare_ids(&1, id)
)
end
defp compare_ids(%Item{}=item, id) do
item.id == id
end
But, that's equivalent to:
Enum.find(
#items,
fn item -> compare_ids(item, id) end
)
and may not pass your looks ugly and difficult to read test.
I was somehow under the impression Elixir supports nested functions?
Easy enough to test:
defmodule A do
def go do
def greet do
IO.puts "hello"
end
greet()
end
end
Same error:
$ iex a.ex
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
** (ArgumentError) cannot invoke def/2 inside function/macro
(elixir) lib/kernel.ex:5150: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:3906: Kernel.define/4
(elixir) expanding macro: Kernel.def/2
a.ex:3: A.go/0
wouldn't:
defp compare_ids(item, id), do: item.id == id
be enough? Is there any advantage to including %Item{} or making
separate functions for returning both true and false conditions?
What you gain by specifying the first parameter as:
func(%Item{} = item, target_id)
is that only an Item struct will match the first parameter. Here is an example:
defmodule Item do
defstruct [:id, :name, :description]
end
defmodule Dog do
defstruct [:id, :name, :owner]
end
defmodule A do
def go(%Item{} = item), do: IO.inspect(item.id, label: "id: ")
end
In iex:
iex(1)> item = %Item{id: 1, name: "book", description: "old"}
%Item{description: "old", id: 1, name: "book"}
iex(2)> dog = %Dog{id: 1, name: "fido", owner: "joe"}
%Dog{id: 1, name: "fido", owner: "joe"}
iex(3)> A.go item
id: : 1
1
iex(4)> A.go dog
** (FunctionClauseError) no function clause matching in A.go/1
The following arguments were given to A.go/1:
# 1
%Dog{id: 1, name: "fido", owner: "joe"}
a.ex:10: A.go/1
iex(4)>
You get a function clause error if you call the function with a non-Item, and the earlier an error occurs, the better, because it makes debugging easier.
Of course, by preventing the function from accepting other structs, you make the function less general--but because it's a private function, you can't call it from outside the module anyway. On the other hand, if you wanted to call the function on both Dog and Item structs, then you could simply specify the first parameter as:
|
V
func(%{}=thing, target_id)
then both an Item and a Dog would match--but not non-maps.
What you gain by specifying the first parameter as:
|
V
func(%Item{id: id}, target_id)
is that you let erlang's pattern matching engine extract the data you need, rather than calling item.id as you would need to do with this definition:
func(%Item{}=item, target_id)
In erlang, pattern matching in a parameter list is the most efficient/convenient/stylish way to write functions. You use pattern matching to extract the data that you want to use in the function body.
Going even further, if you write the function definition like this:
same variable name
| |
V V
func(%Item{id: target_id}, target_id)
then erlang's pattern matching engine not only extracts the value for the id field from the Item struct, but also checks that the value is equal to the value of the target_id variable in the 2nd argument.
Defining multiple function clauses is a common idiom in erlang, and it is considered good style because it takes advantage of pattern matching rather than logic inside the function body. Here's an erlang example:
get_evens(List) ->
get_evens(List, []).
get_evens([Head|Tail], Results) when Head rem 2 == 0 ->
get_evens(Tail, [Head|Results]);
get_evens([Head|Tail], Results) when Head rem 2 =/= 0 ->
get_evens(Tail, Results);
get_evens([], Results) ->
lists:reverse(Results).
I am using karate automation tool for service automation.. and i am trying to extract department id from the json response which i have stored in variable ...
def departmentId = getstorewalk.departments[*].id
getstorewalk is my variable in which json response is stored
following is the json response
{"walkzz":"001","zz":zz,"ddd":"zz","zz":{"zz":"zz","who":{"zz":"11","zz":"zz"}},"departments":[{"id":need to extract this id,"name":"zz","someorder":1,"zzs":[{"zz":zz,"name":"zz (zz, zz, zz, zz & zz)","someorder":zz,"zz":[{"zz":51,"name":"zz Spread","someorder":16,"zz":"Available","zz":[{"zz":"1223","zz":"zz 30g","zz":3,"zz":0,"zz":"stale","zz":false,"zz":true,"zz":[],"zz":{"zz":2,"zz":"zz","zz":6,"zzName":"Core zz (zz)","department":56,"zz":"015 zz Spreads","zz":"zz Spread","subzz":100,"somezz":"zz & zz","zz":{},"zz_":100},"zz":{"zz":"2017-09-21T11:09:15.524Z","who":{"zz":"11","zz":"zz"}},"action":{"zz":"Include"},"zz":[{"capturezz":375716,"zz":"Gap","qty":15,"zz":"zz","zz":{"zz":"zz","type":"N","name":"zz","sequence":1},"zz":{"zz":"211","who":{"zz":"11","zz":"zz"}}}]}]}]}]}]}
i have used the following in the background :
def getstorewalk = callonce read('classpath:zz/zz/zz.feature')
def departmentId = getstorewalk.departments[*].id
and error is listed below:
com.jayway.jsonpath.PathNotFoundException: Missing property in path $['departments']
at com.jayway.jsonpath.internal.path.PathToken.handleObjectProperty(PathToken.java:72)
at com.jayway.jsonpath.internal.path.PropertyPathToken.evaluate(PropertyPathToken.java:77)
at com.jayway.jsonpath.internal.path.RootPathToken.evaluate(RootPathToken.java:62)
at com.jayway.jsonpath.internal.path.CompiledPath.evaluate(CompiledPath.java:53)
at com.jayway.jsonpath.internal.path.CompiledPath.evaluate(CompiledPath.java:61)
at com.jayway.jsonpath.JsonPath.read(JsonPath.java:187)
at com.jayway.jsonpath.internal.JsonContext.read(JsonContext.java:164)
at com.jayway.jsonpath.internal.JsonContext.read(JsonContext.java:151)
at com.intuit.karate.Script.evalJsonPathOnVarByName(Script.java:339)
at com.intuit.karate.Script.eval(Script.java:262)
at com.intuit.karate.Script.eval(Script.java:182)
at com.intuit.karate.Script.assign(Script.java:606)
at com.intuit.karate.Script.assign(Script.java:537)
at com.intuit.karate.StepDefs.def(StepDefs.java:268)
at ?.* def departmentId = getstorewalk.departments[*].id(C:/Karate/zz/zz/src/test/java/zz/zz/zz.feature:11)
This is working for me, you can try paste the 3 lines below in a Karate file. By the way this is a good tip for troubleshooting, you can test snippets like this without needing to make HTTP calls.
* def getstorewalk = {"walkzz":"001","zz":zz,"ddd":"zz","zz":{"zz":"zz","who":{"zz":"11","zz":"zz"}},"departments":[{"id":need to extract this id,"name":"zz","someorder":1,"zzs":[{"zz":zz,"name":"zz (zz, zz, zz, zz & zz)","someorder":zz,"zz":[{"zz":51,"name":"zz Spread","someorder":16,"zz":"Available","zz":[{"zz":"1223","zz":"zz 30g","zz":3,"zz":0,"zz":"stale","zz":false,"zz":true,"zz":[],"zz":{"zz":2,"zz":"zz","zz":6,"zzName":"Core zz (zz)","department":56,"zz":"015 zz Spreads","zz":"zz Spread","subzz":100,"somezz":"zz & zz","zz":{},"zz_":100},"zz":{"zz":"2017-09-21T11:09:15.524Z","who":{"zz":"11","zz":"zz"}},"action":{"zz":"Include"},"zz":[{"capturezz":375716,"zz":"Gap","qty":15,"zz":"zz","zz":{"zz":"zz","type":"N","name":"zz","sequence":1},"zz":{"zz":"211","who":{"zz":"11","zz":"zz"}}}]}]}]}]}]}
* def departmentId = getstorewalk.departments[*].id
* print departmentId
which logs:
19:40:55.135 [main] INFO com.intuit.karate - [print] ["need to extract this id"]
So, clearly you have made a mistake assigning the response to getstorewalk.
My guess is you intended to do this:
def result = callonce read('classpath:zz/zz/zz.feature')
def departmentId = result.response.departments[*].id
Please read the documentation on calling other feature files carefully and you will probably realize what you are doing wrong.
I am interested in using the monotone spline, but I get an error when R tries to use it. I am using R 2.12.0, and the method 'monoH.FC' says that it has been supported since 2.8.0
Reproducible example (same result for more complicated (x,y) relationships)
x<-1:2
y<-1:2
spline(x,y,method="monoH.FC")
Error in spline(x, y, method = "monoH.FC") : invalid interpolation method
What I have tried
?spline returns:
...
Usage:
...
spline(x, y = NULL, n = 3*length(x), method = "fmm",
xmin = min(x), xmax = max(x), xout, ties = mean)
...
Arguments:
method: specifies the type of spline to be used. Possible values are
‘"fmm"’, ‘"natural"’, ‘"periodic"’ and ‘"monoH.FC"’.
...
But the spline function itself indicates that the 'monoH.FC' method is not supported:
...
method <- pmatch(method, c("periodic", "natural", "fmm"))
if (is.na(method))
stop("invalid interpolation method")
...
Question
How can I use method = 'monoH.FC' with spline?
Use splinefun; it supports method=monoH.FC.
The last example in ?spline shows you how to do it.
## An example of monotone interpolation
n <- 20
set.seed(11)
x. <- sort(runif(n)) ; y. <- cumsum(abs(rnorm(n)))
plot(x.,y.)
curve(splinefun(x.,y.)(x), add=TRUE, col=2, n=1001)
curve(splinefun(x.,y., method="mono")(x), add=TRUE, col=3, n=1001)
legend("topleft", paste("splinefun( \"", c("fmm", "monoH.CS"), "\" )", sep=''),
col=2:3, lty=1)