In Elixir, How can I extract a lambda to a named function when the lambda is in a closure? - function

I have the following closure:
def get!(Item, id) do
Enum.find(
#items,
fn(item) -> item.id == id end
)
end
As I believe this looks ugly and difficult to read, I'd like to give this a name, like:
def get!(Item, id) do
defp has_target_id?(item), do: item.id = id
Enum.find(#items, has_target_id?/1)
end
Unfortunately, this results in:
== Compilation error in file lib/auction/fake_repo.ex ==
** (ArgumentError) cannot invoke defp/2 inside function/macro
(elixir) lib/kernel.ex:5238: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:4155: Kernel.define/4
(elixir) expanding macro: Kernel.defp/2
lib/auction/fake_repo.ex:28: Auction.FakeRepo.get!/2
Assuming it is possible, what is the correct way to do this?

The code you posted has an enormous amount of syntax errors/glitches. I would suggest you start with getting accustomed to the syntax, rather than trying to make Elixir better by inventing the things that nobody uses.
Here is the correct version that does what you wanted. The task might be accomplished with an anonymous function, although I hardly see a reason to make a perfectly looking idiomatic Elixir look ugly.
defmodule Foo do
#items [%{id: 1}, %{id: 2}, %{id: 3}]
def get!(id) do
has_target_id? = fn item -> item.id == id end
Enum.find(#items, has_target_id?)
end
end
Foo.get! 1
#⇒ %{id: 1}
Foo.get! 4
#⇒ nil

You can do this:
def get!(Item, id) do
Enum.find(
#items,
&compare_ids(&1, id)
)
end
defp compare_ids(%Item{}=item, id) do
item.id == id
end
But, that's equivalent to:
Enum.find(
#items,
fn item -> compare_ids(item, id) end
)
and may not pass your looks ugly and difficult to read test.

I was somehow under the impression Elixir supports nested functions?
Easy enough to test:
defmodule A do
def go do
def greet do
IO.puts "hello"
end
greet()
end
end
Same error:
$ iex a.ex
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
** (ArgumentError) cannot invoke def/2 inside function/macro
(elixir) lib/kernel.ex:5150: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:3906: Kernel.define/4
(elixir) expanding macro: Kernel.def/2
a.ex:3: A.go/0

wouldn't:
defp compare_ids(item, id), do: item.id == id
be enough? Is there any advantage to including %Item{} or making
separate functions for returning both true and false conditions?
What you gain by specifying the first parameter as:
func(%Item{} = item, target_id)
is that only an Item struct will match the first parameter. Here is an example:
defmodule Item do
defstruct [:id, :name, :description]
end
defmodule Dog do
defstruct [:id, :name, :owner]
end
defmodule A do
def go(%Item{} = item), do: IO.inspect(item.id, label: "id: ")
end
In iex:
iex(1)> item = %Item{id: 1, name: "book", description: "old"}
%Item{description: "old", id: 1, name: "book"}
iex(2)> dog = %Dog{id: 1, name: "fido", owner: "joe"}
%Dog{id: 1, name: "fido", owner: "joe"}
iex(3)> A.go item
id: : 1
1
iex(4)> A.go dog
** (FunctionClauseError) no function clause matching in A.go/1
The following arguments were given to A.go/1:
# 1
%Dog{id: 1, name: "fido", owner: "joe"}
a.ex:10: A.go/1
iex(4)>
You get a function clause error if you call the function with a non-Item, and the earlier an error occurs, the better, because it makes debugging easier.
Of course, by preventing the function from accepting other structs, you make the function less general--but because it's a private function, you can't call it from outside the module anyway. On the other hand, if you wanted to call the function on both Dog and Item structs, then you could simply specify the first parameter as:
|
V
func(%{}=thing, target_id)
then both an Item and a Dog would match--but not non-maps.
What you gain by specifying the first parameter as:
|
V
func(%Item{id: id}, target_id)
is that you let erlang's pattern matching engine extract the data you need, rather than calling item.id as you would need to do with this definition:
func(%Item{}=item, target_id)
In erlang, pattern matching in a parameter list is the most efficient/convenient/stylish way to write functions. You use pattern matching to extract the data that you want to use in the function body.
Going even further, if you write the function definition like this:
same variable name
| |
V V
func(%Item{id: target_id}, target_id)
then erlang's pattern matching engine not only extracts the value for the id field from the Item struct, but also checks that the value is equal to the value of the target_id variable in the 2nd argument.
Defining multiple function clauses is a common idiom in erlang, and it is considered good style because it takes advantage of pattern matching rather than logic inside the function body. Here's an erlang example:
get_evens(List) ->
get_evens(List, []).
get_evens([Head|Tail], Results) when Head rem 2 == 0 ->
get_evens(Tail, [Head|Results]);
get_evens([Head|Tail], Results) when Head rem 2 =/= 0 ->
get_evens(Tail, Results);
get_evens([], Results) ->
lists:reverse(Results).

Related

Substring question on mips assembly language

Please help as soon as possible...
Write a MIPS assembly language program that prompts the user to input two strings (each should be no longer than 50 characters including the null terminator). Your program should determine whether the second string is a substring of the first. If it is, then your program should print out the first index in which the second string appears in the first. For example, if the first string is “Hello World” and the second string is “lo”, then the program should print out 3, i.e. the starting index of “lo” in “Hello World.” If the second string is not contained in the first string, then your program should print out -1.
To be able to understand what you have to implement at assembly level, the first thing you should do, is understanding the high-level algorithm. Why?
It's easier for you to see all the cases and edge-cases in time!
To look back at what have I been trying to do again? in the middle of programming the Assembly version.
To quickly see which variables you (certainly) need.
I wrote following program (forgive me, python has been a while for me):
def is_in_string_1(string, substring):
"""
aaba: a, b, ab, ba, aba (last one will fail)
"""
length = len(string)
sublength = len(substring)
if (sublength == 0):
return True
j = 0
for i in range(0, length):
if string[i] != substring[j]:
j = 0
else:
j += 1
if j == sublength:
return True
return False
def is_in_string_2(string, substring):
"""
aaba: a, b, ab, ba, aba
"""
length = len(string)
sublength = len(substring)
for i in range(0, length + 1 - sublength): # string: aabc; substring: c; length: 4; sublength: 1; indexes: 0,1,2,3;
is_substring = True
for j in range(0, sublength): # for the sake of the algorithm, no slicing here
if string[i+j] != substring[j]:
is_substring = False
break
if is_substring:
return True
return False
stringTuples = [
("aaa", "aa"),
("aaa", "aaa"),
("aaa", "aab"),
("abc", "bc"),
("abc", "a"),
("abc", "abc"),
("aaa", ""),
("aaa", "b")
]
for stringTuple in stringTuples:
print(
stringTuple[0],
stringTuple[1],
':',
is_in_string_1(stringTuple[0], stringTuple[1]),
is_in_string_2(stringTuple[0], stringTuple[1]),
sep='\t'
)
I first thought I could optimize the standard solution (is_in_string_2), leaving out the second for-loop (is_in_string_1), but after some thinking I already found out it would fail (the edge-case wasn't even in any of my test-data!) - so I left it as an example for how important it is that you use a correct algorithm.
The program produces the following output:
aaa aa : True True
aaa aaa : True True
aaa aab : False False
abc bc : True True
abc a : True True
abc abc : True True
aaa : True True
aaa b : False False
aaba aba : False True
Notice how all output was correct, except for the last line, where the first algorithm is wrong.
Before you continue:
You have to make your own len() function in MIPS; note that all string are (if I remember correctly) null terminated, so you'll have to count all non-null characters of a string, or in other words, count how many characters precede the null-terminator.
You should use j, $ra and jr calls to go to a "function" or subroutines. (Search)
While in one, you can call other functions using the stack. (Search)

Elixir - JasonHelpers - How can I send a keyword list to json_map?

I have a data structure that I want to convert to json and preserve the key order.
For example:
%{ x: 1, a: 5} should be converted to "{\"x\": 1, \"a\": 5}"
Poison does it without any problem. But when I upgrade to Jason, it changes to "{\"a\": 5, \"x\": 1}".
So I use JasonHelpers json_map to preserve the order like this:
Jason.Helpers.json_map([x: 1, a: 5])
It creates a fragment with correct order.
However, when I use a variable to do this:
list = [x: 1, a: 5]
Jason.Helpers.json_map(list)
I have an error:
** (Protocol.UndefinedError) protocol Enumerable not implemented for {:list, [line: 15], nil} of type Tuple.
....
QUESTION: How can I pass a pre-calculated list into Jason.Helpers.json_map ?
The calculation is complicated, so I don't want to repeat the code just to use json_map, but use the function that returns a list.
json_map/1 is a macro, from its docs:
Encodes a JSON map from a compile-time keyword.
It is designed for compiling JSON at compile-time, which is why it doesn't work with your runtime variable.
Support for encoding keyword lists was added to the Jason library a year ago, but it looks like it hasn't been pushed to hex yet. I managed to get it work by pulling the latest code from github:
defp deps do
[{:jason, git: "https://github.com/michalmuskala/jason.git"}]
end
Then by creating a struct that implements Jason.Encoder (adapted from this solution by the Jason author):
defmodule OrderedObject do
defstruct [:value]
def new(value), do: %__MODULE__{value: value}
defimpl Jason.Encoder do
def encode(%{value: value}, opts) do
Jason.Encode.keyword(value, opts)
end
end
end
Now we can encode objects with ordered keys:
iex(1)> Jason.encode!(OrderedObject.new([x: 1, a: 5]))
"{\"x\":1,\"a\":5}"
I don't know if this is part of the public API or just an implementation detail, but it appears you have some control of the order when implementing the Jason.Encoder protocol for a struct.
Let's say you've defined an Ordered struct:
defmodule Ordered do
#derive {Jason.Encoder, only: [:a, :x]}
defstruct [:a, :x]
end
If you encode the struct, the "a" key will be before the "x" key:
iex> Jason.encode!(%Ordered{a: 5, x: 1})
"{\"a\":5,\"x\":1}"
Let's reorder the keys we pass in to the :only option:
defmodule Ordered do
#derive {Jason.Encoder, only: [:x, :a]}
defstruct [:a, :x]
end
If we now encode the struct, the "x" key will be before the "a" key:
iex> Jason.encode!(%Ordered{a: 5, x: 1})
"{\"x\":1,\"a\":5}"

How do functions work in Elixir?

In book "Programming Elixir" Dave Thomas shows an example
handle_open = fn
{:ok, file} -> "Read data: #{IO.read(file, :line)}"
{_, error} -> "Error: #{:file.format_error(error)}"
end
handle_open.(File.open("code/intro/hello.exs"))
handle_open.(File.open("nonexistent"))
I can't understand why second call of function goes into second flow.
Also why we don't pass an argument into func:
handle_open = fn(file)
?
This is a multi-clause anonymous function. handle_open has 2 clauses, one to match arguments with the pattern {:ok, file} and one with {_, error}. The first pattern that matches the arguments is executed. The given code is almost [1] equivalent to the following:
handle_open = fn arg ->
case arg do
{:ok, file} -> ...
{_, error} -> ...
end
end
So if the file exists, File.open will return {:ok, file} and the first clause will be executed. If it doesn't, {:error, error} will be returned which will match the second clause and execute that.
[1]: "Almost" because the error raised when none of the patterns match will be slightly different for the two cases.
This actually just mimics the pattern-match behaviour of Elixir.
Like you can have pattern match on your Module functions as well.
defmodule MyModule do
def read_file(path) do
File.open(path) |> handle_file()
end
defp handle_file({:ok, file}) do
// File there..
end
defp handle_file({_, error}) do
// Could not open file.. because of `error`
end
end
Like in the book example, I would try to open the file inside a case clause and delegate the outcome to the appropriate function:
defmodule FileReader do
def read_file(path) do
case File.open(path) do
{:ok, file} -> handle_file(file)
{_, error} -> handle_error(error)
end
end
defp handle_file(file) do
// File there..
end
defp handle_error(error) do
//
end
end

What is the simplest way to do upsert with Ecto (MySQL)

Doing upsert is common in my app and I want to implement the cleanest and simple way to implement upsert.
Should I use fragments to implement native sql upsert?
Any idiomatic ecto way to do upsert?
You can use Ecto.Repo.insert_or_update/2, please note that for this to work, you will have to load existing models from the database.
model = %Post{id: 'existing_id', ...}
MyRepo.insert_or_update changeset
# => {:error, "id already exists"}
Example:
result =
case MyRepo.get(Post, id) do
nil -> %Post{id: id} # Post not found, we build one
post -> post # Post exists, using it
end
|> Post.changeset(changes)
|> MyRepo.insert_or_update
case result do
{:ok, model} -> # Inserted or updated with success
{:error, changeset} -> # Something went wrong
end
In my case insert_or_update raised an error due to the unique index constraint 🤔
What did work for me was Postgres v9.5 upsert through on_conflict parameter:
(considering unique column is called user_id)
changeset
|> MyRepo.insert(
on_conflict: :replace_all,
conflict_target: :user_id
)
If you're looking to upsert by something other than id, you can swap in get_by for get like this:
model = %User{email: "existing_or_new_email#heisenberg.net", name: "Cat", ...}
model |> User.upsert_by(:email)
# => {:found, %User{...}} || {:ok, %User{...}}
defmodule App.User do
alias App.{Repo, User}
def upsert_by(%User{} = record_struct, selector) do
case User |> Repo.get_by({selector, record_struct |> Map.get(selector)}) do
nil -> %User{} # build new user struct
user -> user # pass through existing user struct
end
|> User.changeset(record_struct |> Map.from_struct)
|> Repo.insert_or_update
end
end
On the off chance you're looking for a flexible approach that works across models and for multiple selectors (ie country + passport number), check out my hex package EctoConditionals!

Parse a MySQL insert statement with multiple rows [duplicate]

I need a regular expression to select all the text between two outer brackets.
Example:
START_TEXT(text here(possible text)text(possible text(more text)))END_TXT
^ ^
Result:
(text here(possible text)text(possible text(more text)))
I want to add this answer for quickreference. Feel free to update.
.NET Regex using balancing groups:
\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)
Where c is used as the depth counter.
Demo at Regexstorm.com
Stack Overflow: Using RegEx to balance match parenthesis
Wes' Puzzling Blog: Matching Balanced Constructs with .NET Regular Expressions
Greg Reinacker's Weblog: Nested Constructs in Regular Expressions
PCRE using a recursive pattern:
\((?:[^)(]+|(?R))*+\)
Demo at regex101; Or without alternation:
\((?:[^)(]*(?R)?)*+\)
Demo at regex101; Or unrolled for performance:
\([^)(]*+(?:(?R)[^)(]*)*+\)
Demo at regex101; The pattern is pasted at (?R) which represents (?0).
Perl, PHP, Notepad++, R: perl=TRUE, Python: PyPI regex module with (?V1) for Perl behaviour.
(the new version of PyPI regex package already defaults to this → DEFAULT_VERSION = VERSION1)
Ruby using subexpression calls:
With Ruby 2.0 \g<0> can be used to call full pattern.
\((?>[^)(]+|\g<0>)*\)
Demo at Rubular; Ruby 1.9 only supports capturing group recursion:
(\((?>[^)(]+|\g<1>)*\))
Demo at Rubular  (atomic grouping since Ruby 1.9.3)
JavaScript  API :: XRegExp.matchRecursive
XRegExp.matchRecursive(str, '\\(', '\\)', 'g');
Java: An interesting idea using forward references by #jaytea.
Without recursion up to 3 levels of nesting:
(JS, Java and other regex flavors)
To prevent runaway if unbalanced, with * on innermost [)(] only.
\((?:[^)(]|\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\))*\)
Demo at regex101; Or unrolled for better performance (preferred).
\([^)(]*(?:\([^)(]*(?:\([^)(]*(?:\([^)(]*\)[^)(]*)*\)[^)(]*)*\)[^)(]*)*\)
Demo at regex101; Deeper nesting needs to be added as required.
Reference - What does this regex mean?
RexEgg.com - Recursive Regular Expressions
Regular-Expressions.info - Regular Expression Recursion
Mastering Regular Expressions - Jeffrey E.F. Friedl 1 2 3 4
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.
But there is a simple algorithm to do this, which I described in more detail in this answer to a previous question. The gist is to write code which scans through the string keeping a counter of the open parentheses which have not yet been matched by a closing parenthesis. When that counter returns to zero, then you know you've reached the final closing parenthesis.
You can use regex recursion:
\(([^()]|(?R))*\)
[^\(]*(\(.*\))[^\)]*
[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring enclosed in brackets, and [^\)]* matches everything that isn't a closing bracket at the end of the string. Note that this expression does not attempt to match brackets; a simple parser (see dehmann's answer) would be more suitable for that.
This answer explains the theoretical limitation of why regular expressions are not the right tool for this task.
Regular expressions can not do this.
Regular expressions are based on a computing model known as Finite State Automata (FSA). As the name indicates, a FSA can remember only the current state, it has no information about the previous states.
In the above diagram, S1 and S2 are two states where S1 is the starting and final step. So if we try with the string 0110 , the transition goes as follows:
0 1 1 0
-> S1 -> S2 -> S2 -> S2 ->S1
In the above steps, when we are at second S2 i.e. after parsing 01 of 0110, the FSA has no information about the previous 0 in 01 as it can only remember the current state and the next input symbol.
In the above problem, we need to know the no of opening parenthesis; this means it has to be stored at some place. But since FSAs can not do that, a regular expression can not be written.
However, an algorithm can be written to do this task. Algorithms are generally falls under Pushdown Automata (PDA). PDA is one level above of FSA. PDA has an additional stack to store some additional information. PDAs can be used to solve the above problem, because we can 'push' the opening parenthesis in the stack and 'pop' them once we encounter a closing parenthesis. If at the end, stack is empty, then opening parenthesis and closing parenthesis matches. Otherwise not.
(?<=\().*(?=\))
If you want to select text between two matching parentheses, you are out of luck with regular expressions. This is impossible(*).
This regex just returns the text between the first opening and the last closing parentheses in your string.
(*) Unless your regex engine has features like balancing groups or recursion. The number of engines that support such features is slowly growing, but they are still not a commonly available.
It is actually possible to do it using .NET regular expressions, but it is not trivial, so read carefully.
You can read a nice article here. You also may need to read up on .NET regular expressions. You can start reading here.
Angle brackets <> were used because they do not require escaping.
The regular expression looks like this:
<
[^<>]*
(
(
(?<Open><)
[^<>]*
)+
(
(?<Close-Open>>)
[^<>]*
)+
)*
(?(Open)(?!))
>
I was also stuck in this situation when dealing with nested patterns and regular-expressions is the right tool to solve such problems.
/(\((?>[^()]+|(?1))*\))/
This is the definitive regex:
\(
(?<arguments>
(
([^\(\)']*) |
(\([^\(\)']*\)) |
'(.*?)'
)*
)
\)
Example:
input: ( arg1, arg2, arg3, (arg4), '(pip' )
output: arg1, arg2, arg3, (arg4), '(pip'
note that the '(pip' is correctly managed as string.
(tried in regulator: http://sourceforge.net/projects/regulator/)
I have written a little JavaScript library called balanced to help with this task. You can accomplish this by doing
balanced.matches({
source: source,
open: '(',
close: ')'
});
You can even do replacements:
balanced.replacements({
source: source,
open: '(',
close: ')',
replace: function (source, head, tail) {
return head + source + tail;
}
});
Here's a more complex and interactive example JSFiddle.
Adding to bobble bubble's answer, there are other regex flavors where recursive constructs are supported.
Lua
Use %b() (%b{} / %b[] for curly braces / square brackets):
for s in string.gmatch("Extract (a(b)c) and ((d)f(g))", "%b()") do print(s) end (see demo)
Raku (former Perl6):
Non-overlapping multiple balanced parentheses matches:
my regex paren_any { '(' ~ ')' [ <-[()]>+ || <&paren_any> ]* }
say "Extract (a(b)c) and ((d)f(g))" ~~ m:g/<&paren_any>/;
# => (「(a(b)c)」 「((d)f(g))」)
Overlapping multiple balanced parentheses matches:
say "Extract (a(b)c) and ((d)f(g))" ~~ m:ov:g/<&paren_any>/;
# => (「(a(b)c)」 「(b)」 「((d)f(g))」 「(d)」 「(g)」)
See demo.
Python re non-regex solution
See poke's answer for How to get an expression between balanced parentheses.
Java customizable non-regex solution
Here is a customizable solution allowing single character literal delimiters in Java:
public static List<String> getBalancedSubstrings(String s, Character markStart,
Character markEnd, Boolean includeMarkers)
{
List<String> subTreeList = new ArrayList<String>();
int level = 0;
int lastOpenDelimiter = -1;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == markStart) {
level++;
if (level == 1) {
lastOpenDelimiter = (includeMarkers ? i : i + 1);
}
}
else if (c == markEnd) {
if (level == 1) {
subTreeList.add(s.substring(lastOpenDelimiter, (includeMarkers ? i + 1 : i)));
}
if (level > 0) level--;
}
}
return subTreeList;
}
}
Sample usage:
String s = "some text(text here(possible text)text(possible text(more text)))end text";
List<String> balanced = getBalancedSubstrings(s, '(', ')', true);
System.out.println("Balanced substrings:\n" + balanced);
// => [(text here(possible text)text(possible text(more text)))]
The regular expression using Ruby (version 1.9.3 or above):
/(?<match>\((?:\g<match>|[^()]++)*\))/
Demo on rubular
The answer depends on whether you need to match matching sets of brackets, or merely the first open to the last close in the input text.
If you need to match matching nested brackets, then you need something more than regular expressions. - see #dehmann
If it's just first open to last close see #Zach
Decide what you want to happen with:
abc ( 123 ( foobar ) def ) xyz ) ghij
You need to decide what your code needs to match in this case.
"""
Here is a simple python program showing how to use regular
expressions to write a paren-matching recursive parser.
This parser recognises items enclosed by parens, brackets,
braces and <> symbols, but is adaptable to any set of
open/close patterns. This is where the re package greatly
assists in parsing.
"""
import re
# The pattern below recognises a sequence consisting of:
# 1. Any characters not in the set of open/close strings.
# 2. One of the open/close strings.
# 3. The remainder of the string.
#
# There is no reason the opening pattern can't be the
# same as the closing pattern, so quoted strings can
# be included. However quotes are not ignored inside
# quotes. More logic is needed for that....
pat = re.compile("""
( .*? )
( \( | \) | \[ | \] | \{ | \} | \< | \> |
\' | \" | BEGIN | END | $ )
( .* )
""", re.X)
# The keys to the dictionary below are the opening strings,
# and the values are the corresponding closing strings.
# For example "(" is an opening string and ")" is its
# closing string.
matching = { "(" : ")",
"[" : "]",
"{" : "}",
"<" : ">",
'"' : '"',
"'" : "'",
"BEGIN" : "END" }
# The procedure below matches string s and returns a
# recursive list matching the nesting of the open/close
# patterns in s.
def matchnested(s, term=""):
lst = []
while True:
m = pat.match(s)
if m.group(1) != "":
lst.append(m.group(1))
if m.group(2) == term:
return lst, m.group(3)
if m.group(2) in matching:
item, s = matchnested(m.group(3), matching[m.group(2)])
lst.append(m.group(2))
lst.append(item)
lst.append(matching[m.group(2)])
else:
raise ValueError("After <<%s %s>> expected %s not %s" %
(lst, s, term, m.group(2)))
# Unit test.
if __name__ == "__main__":
for s in ("simple string",
""" "double quote" """,
""" 'single quote' """,
"one'two'three'four'five'six'seven",
"one(two(three(four)five)six)seven",
"one(two(three)four)five(six(seven)eight)nine",
"one(two)three[four]five{six}seven<eight>nine",
"one(two[three{four<five>six}seven]eight)nine",
"oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
"ERROR testing ((( mismatched ))] parens"):
print "\ninput", s
try:
lst, s = matchnested(s)
print "output", lst
except ValueError as e:
print str(e)
print "done"
You need the first and last parentheses. Use something like this:
str.indexOf('('); - it will give you first occurrence
str.lastIndexOf(')'); - last one
So you need a string between,
String searchedString = str.substring(str1.indexOf('('),str1.lastIndexOf(')');
because js regex doesn't support recursive match, i can't make balanced parentheses matching work.
so this is a simple javascript for loop version that make "method(arg)" string into array
push(number) map(test(a(a()))) bass(wow, abc)
$$(groups) filter({ type: 'ORGANIZATION', isDisabled: { $ne: true } }) pickBy(_id, type) map(test()) as(groups)
const parser = str => {
let ops = []
let method, arg
let isMethod = true
let open = []
for (const char of str) {
// skip whitespace
if (char === ' ') continue
// append method or arg string
if (char !== '(' && char !== ')') {
if (isMethod) {
(method ? (method += char) : (method = char))
} else {
(arg ? (arg += char) : (arg = char))
}
}
if (char === '(') {
// nested parenthesis should be a part of arg
if (!isMethod) arg += char
isMethod = false
open.push(char)
} else if (char === ')') {
open.pop()
// check end of arg
if (open.length < 1) {
isMethod = true
ops.push({ method, arg })
method = arg = undefined
} else {
arg += char
}
}
}
return ops
}
// const test = parser(`$$(groups) filter({ type: 'ORGANIZATION', isDisabled: { $ne: true } }) pickBy(_id, type) map(test()) as(groups)`)
const test = parser(`push(number) map(test(a(a()))) bass(wow, abc)`)
console.log(test)
the result is like
[ { method: 'push', arg: 'number' },
{ method: 'map', arg: 'test(a(a()))' },
{ method: 'bass', arg: 'wow,abc' } ]
[ { method: '$$', arg: 'groups' },
{ method: 'filter',
arg: '{type:\'ORGANIZATION\',isDisabled:{$ne:true}}' },
{ method: 'pickBy', arg: '_id,type' },
{ method: 'map', arg: 'test()' },
{ method: 'as', arg: 'groups' } ]
While so many answers mention this in some form by saying that regex does not support recursive matching and so on, the primary reason for this lies in the roots of the Theory of Computation.
Language of the form {a^nb^n | n>=0} is not regular. Regex can only match things that form part of the regular set of languages.
Read more # here
I didn't use regex since it is difficult to deal with nested code. So this snippet should be able to allow you to grab sections of code with balanced brackets:
def extract_code(data):
""" returns an array of code snippets from a string (data)"""
start_pos = None
end_pos = None
count_open = 0
count_close = 0
code_snippets = []
for i,v in enumerate(data):
if v =='{':
count_open+=1
if not start_pos:
start_pos= i
if v=='}':
count_close +=1
if count_open == count_close and not end_pos:
end_pos = i+1
if start_pos and end_pos:
code_snippets.append((start_pos,end_pos))
start_pos = None
end_pos = None
return code_snippets
I used this to extract code snippets from a text file.
This do not fully address the OP question but I though it may be useful to some coming here to search for nested structure regexp:
Parse parmeters from function string (with nested structures) in javascript
Match structures like:
matches brackets, square brackets, parentheses, single and double quotes
Here you can see generated regexp in action
/**
* get param content of function string.
* only params string should be provided without parentheses
* WORK even if some/all params are not set
* #return [param1, param2, param3]
*/
exports.getParamsSAFE = (str, nbParams = 3) => {
const nextParamReg = /^\s*((?:(?:['"([{](?:[^'"()[\]{}]*?|['"([{](?:[^'"()[\]{}]*?|['"([{][^'"()[\]{}]*?['")}\]])*?['")}\]])*?['")}\]])|[^,])*?)\s*(?:,|$)/;
const params = [];
while (str.length) { // this is to avoid a BIG performance issue in javascript regexp engine
str = str.replace(nextParamReg, (full, p1) => {
params.push(p1);
return '';
});
}
return params;
};
This might help to match balanced parenthesis.
\s*\w+[(][^+]*[)]\s*
This one also worked
re.findall(r'\(.+\)', s)