Julia: How to find the longest word in a given string? - function

I am very new to Julia. I got this challenge from the web:
How can I find the longest word in a given string?
I would like to build a function that returns the longest word, even in cases where punctuation is used.
I was trying the following code:
function LongestWord(sen::String)
sentence =maximum(length(split(sen, "")))
word= [(x, length(x)) for x in split(sen, " ")]
return((word))
end
LongestWord("Hello, how are you? nested, punctuation?")
But I haven't managed to find the solution.

You can use regex too. It only needs a slight change from @Bogumil's answer:
julia> function LongestWord2(sen::AbstractString)
words = matchall(r"\w+", sen)
words[findmax(length.(words))[2]]
end
LongestWord2 (generic function with 1 method)
julia> LongestWord2("Hello, how are you? nested, punctuation?")
"punctuation"
This way you get rid of the punctuation and get the raw word back.
To consolidate the comments here's some further explanation:
matchall() takes a regex, in this case r"\w+", which matches word-like substrings (runs of letters, digits and underscores), and returns an array of the strings that match. Note that matchall was removed in Julia 1.0; there, [m.match for m in eachmatch(r"\w+", sen)] builds the same array.
length.() combines the length function with ., which broadcasts the operation across all elements of the array. So we're computing the length of each array element (word).
findmax() returns a tuple of length 2 whose second element is the index of the maximum. I use it to index into the words array and return the longest word.

I understand that you want to retain punctuation and want to split only on space (" "). If this is the case then you can use findmax. Note that I have changed the order of length(x) and x. In this way you will find the longest word, and among words of equal maximum length you will find the word that is last when using string comparison. Also I put AbstractString in the signature of the function as it will work on any string:
julia> function LongestWord(sen::AbstractString)
word = [(length(x), x) for x in split(sen, " ")]
findmax(word)[1][2]
end
LongestWord (generic function with 1 method)
julia> LongestWord("Hello, how are you? nested, punctuation?")
"punctuation?"
This is the simplest solution but not the fastest (you could instead scan the original string with the findnext function, looking for consecutive occurrences of space, without building a vector of words).
Another approach (even shorter; note that indmax was renamed to argmax in Julia 1.0):
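The scan suggested above can be sketched as follows. This is a Python illustration rather than Julia, with the hypothetical name longest_word_scan and str.find standing in for findnext. It keeps only the start and length of the best candidate and never builds a list of words. Unlike the findmax version, it returns the first word of maximum length when there is a tie.

```python
def longest_word_scan(sentence):
    """Return the longest space-delimited word without building a word list."""
    best_start, best_len = 0, 0
    start = 0
    while start <= len(sentence):
        # Find the next space at or after `start`; treat end of string as a delimiter.
        end = sentence.find(" ", start)
        if end == -1:
            end = len(sentence)
        # Strict comparison: on ties the earlier word is kept.
        if end - start > best_len:
            best_start, best_len = start, end - start
        start = end + 1
    return sentence[best_start:best_start + best_len]


print(longest_word_scan("Hello, how are you? nested, punctuation?"))
```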
julia> function LongestWord3(sen::AbstractString)
word = split(sen, " ")
word[indmax(length.(word))]
end
LongestWord3 (generic function with 1 method)
julia> LongestWord3("Hello, how are you? nested, punctuation?")
"punctuation?"

My version specifically defines what symbols are allowable (in this case letters, numbers and spaces):
ALLOWED_SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 \t\n"
function get_longest_word(text::String)::String
letters = Vector{Char}()
for symbol in text
if uppercase(symbol) in ALLOWED_SYMBOLS
push!(letters, symbol)
end
end
words = split(join(letters))
return words[indmax(length.(words))]
end
@time get_longest_word("Hello, how are you? nested, punctuation?")
"punctuation"
I doubt it's the most efficient code in the world, but it pulls 'ANTIDISESTABLISHMENTARIANISM' out of a 45,000-word dictionary in about 0.1 seconds. Of course, it won't tell me if there is more than one word of the maximum length! That's a problem for another day...
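For the open problem of several words sharing the maximum length, one possible approach is to compute the maximum length first and then keep every word that reaches it. The sketch below is in Python rather than Julia; the name longest_words is hypothetical, and it assumes the same allowed symbols as above (ASCII letters and digits).

```python
import re


def longest_words(text):
    """Return every word of maximum length, in order of appearance."""
    # Words are runs of ASCII letters and digits; everything else separates them.
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        return []
    longest = max(len(word) for word in words)
    return [word for word in words if len(word) == longest]


print(longest_words("Hello, how are you? nested, punctuation?"))
```

A word that occurs twice is reported twice; wrap the result in a set if that is not wanted.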

Related

NetSuite Saved Search: REGEXP_SUBSTR Pattern troubles

I am trying to break down a string that looks like this:
|5~13~3.750~159.75~66.563~P20~~~~Bundle A~~|
Here is a second example for reference:
|106~10~0~120~1060.000~~~~~~~|
Here is a third example of a static sized item:
|3~~~~~~~~~~~5:12|
Example 4:
|3~23~5~281~70.250~upper r~~~~~~|
|8~22~6~270~180.000~center~~~~~~|
|16~22~1~265~353.333~center~~~~~~|
Sometimes there are multiple lines in the same string.
I am not super familiar with setting up patterns for regexp_substr and would love some assistance with this!
The string will always have '|' at the beginning and end and 11 '~'s used to separate the numeric/text values which I am hoping to obtain. Also some of the numeric characters have decimals while others do not. If it helps the values are separated like so:
|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|
As you can see, if a value isn't specified it shows as blank, although another string may have it. It's rare for all of the values to have data.
For this specific case I believe regexp_substr will be my best option but if someone has another suggestion I'd be happy to give it a shot!
This is the formula(Text) I was able to come up with so far:
REGEXP_SUBSTR({custbody_msm_cut_list},'[[:alnum:]. ]+|$',1,1)
This allows me to pull all the matches held in the strings, but if some fields are excluded it makes presenting the correct data difficult.
TRIM(REGEXP_SUBSTR({custbody_msm_cut_list}, '^\|(([^~]*)~){1}',1,1,'i',2))
From the start of the string, match the pipe character |, then match anything except a tilde ~, then match the tilde. Repeat N times {1}. Return the last of these repeats.
You can control how many tildes are processed by changing the integer in the braces {1}.
For example:
TRIM(REGEXP_SUBSTR('|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|', '^\|(([^~]*)~){1}',1,1,'i',2))
returns "Quantity"
TRIM(REGEXP_SUBSTR('|Quantity~ Feet~ Inch~~~ Piece mark~ Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|', '^\|(([^~]*)~){7}',1,1,'i',2))
returns "Punch Pattern"
The final value Pitch is a slightly special case as it is not followed by a tilde:
TRIM(REGEXP_SUBSTR('|~~~~~~~~~~ Radius~ Pitch|', '^\|(([^~]*)~){11}([^\|]*)',1,1,'i',3))
Adapted and improved from https://stackoverflow.com/a/70264782/7885772
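To experiment with the pattern outside NetSuite, the same idea can be tried in Python, whose re module also keeps the last repetition of a repeated capture group. This is only a sketch: nth_field is a hypothetical helper, it assumes the 12-field layout described in the question, and it reads only the first line of a multi-line value.

```python
import re


def nth_field(cut_list, n):
    """Return the n-th (1-based) tilde-separated value of a |a~b~...~l| string."""
    if n < 12:
        # Repeat "anything but a tilde, then a tilde" n times; group 2 holds the last repeat.
        match = re.match(r'^\|(([^~]*)~){%d}' % n, cut_list)
        group = 2
    else:
        # The 12th value is followed by the closing pipe, not by a tilde.
        match = re.match(r'^\|(([^~]*)~){11}([^|]*)', cut_list)
        group = 3
    return match.group(group).strip() if match else None


header = ('|Quantity~ Feet~ Inch~ Unit inches~ Total feet~ Piece mark~ '
          'Punch Pattern~ Notch~ Punch~ Bundling~ Radius~ Pitch|')
print(nth_field(header, 1))
print(nth_field(header, 12))
```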

Returning min in list of lists error

Trying to return the min value in this function, but I keep getting the error "TypeError: my_min() takes 1 positional argument but 3 were given." Advice on what I need to change?
def my_min(xss):
    min = xss[0]
    for i in xss:
        if i < min:
            min = i
    return min
You need to wrap the content within another list as [[2,3],[],[4,1,-5]]. Hence your function call should be:
my_min([[2,3],[],[4,1,-5]])
You are getting this error because, without the outer list, the three lists are treated as three separate arguments to the function, but your function is defined to accept just one.
The simplest way to find the minimum number in a list of lists is to use the built-in min function:
def my_min(xss):
    return min([min(item) for item in xss if item])
As I mentioned in my comment, your first problem is that you were calling your method with three lists when you should be calling it as a list-of-lists. To be more explicit, you should be calling as:
my_min([[2,3],[],[4,1,-5]])
Now, the next problem you are facing is that you are only iterating through the first level of your list. This is incorrect. You have to realize you have a list of lists. So, to stick to the logic in your solution, you need to iterate again inside your for loop to find the minimum value of each list inside your list. I renamed your variable to min_val because min shadows the built-in min:
def my_min(xss):
    min_val = None
    # iterate over the first level
    for i in xss:
        # i now is each sublist in your list
        for j in i:
            if min_val is None or j < min_val:
                min_val = j
    return min_val
print(my_min([[], [2,3],[],[4,1,-5]]))
result -> -5
Per the comment by Blckknght, you can actually still pass three separate lists to your method by declaring the parameter with a *, which collects all positional arguments into one tuple. So:
def my_min(*xss):
If you pass your separate lists like you originally were, it will work, and you will end up with a tuple of lists. The solution will not change, it just adds the ability to pass to your method the way you were originally doing it. Here is the doc on argument unpacking.
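Put together with the nested loop from above, the starred version might look like the following sketch. Returning None when every list is empty is a choice made here for illustration, not something the question specifies.

```python
def my_min(*xss):
    """Smallest number across any number of lists; None if all of them are empty."""
    min_val = None
    # xss is a tuple holding every list that was passed positionally.
    for sublist in xss:
        for value in sublist:
            if min_val is None or value < min_val:
                min_val = value
    return min_val


print(my_min([2, 3], [], [4, 1, -5]))
```

An existing list of lists can still be passed by unpacking it at the call site: my_min(*[[2, 3], [], [4, 1, -5]]).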
Your function takes only a single argument. You can declare multiple parameters by separating them with commas:
my_min(a,b,c) #takes 3 arguments instead of one
Or just call the function multiple times if you want to test the function for different test cases
my_min([2,3])
my_min([])
my_min([4,1,-5])
You can also use lists within lists if you are looking to do that
As mentioned by others, the first thing to do is to pass my_min a list by wrapping your input like this:
my_min([[2,3],[],[4,1,-5]])
Then, what you need to do is find the min recursively. I have also added a check for an empty list
from math import inf
def my_min(xss):
    if not xss:
        return inf
    min = inf
    for i in xss:
        if isinstance(i, list):
            i = my_min(i)
        if i < min:
            min = i
    return min
my_min([[2,3],[],[4,1,-5]])

Map and Filter in Haskell

I have two lists of tuples which are as follows: [(String,Integer)] and [(Float,Integer)]. Each list has several tuples.
For every Integer that has a Float in the second list, I need to check if its Integer matches the Integer in the first list, and if it does, return the String - although this function needs to return a list of Strings, i.e. [String] with all the results.
I have already defined a function which returns a list of Integers from the second list (for the comparison on the integers in the first list).
This should be solvable using "higher-order functions". I've spent a considerable amount of time playing with map and filter but haven't found a solution!
You have a list of Integers from the second list. Let's call this ints.
Now you need to do two things--first, filter the (String, Integer) list so that it only contains pairs with corresponding integers in the ints list and secondly, turn this list into just a list of String.
These two steps correspond to the filter and map respectively.
First, you need a function to filter by. This function should take a (String, Integer) pair and return whether the integer is in the ints list. So it should have a type of:
check :: (String, Integer) -> Bool
Writing this should not be too difficult. Once you have it, you can just filter the first list by it.
Next, you need a function to transform a (String, Integer) pair into a String. This will have type:
extract :: (String, Integer) -> String
This should also be easy to write. (A standard function like this actually exists, but if you're just learning it's healthy to figure it out yourself.) You then need to map this function over the result of your previous filter.
I hope this gives you enough hints to get the solution yourself.
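For readers who want to see the overall filter-then-map shape without the Haskell solution being given away, here is a rough Python analogue. The names find_firsts, named and measured are made up for this sketch, and the lambdas play the roles of check and extract.

```python
def find_firsts(named, measured):
    """Names from `named` whose integer also occurs in `measured`."""
    # The integers taken from the second list (called ints in the text above).
    ints = [i for _, i in measured]
    # Step 1: filter, keeping only pairs whose integer occurs in ints.
    selected = filter(lambda pair: pair[1] in ints, named)
    # Step 2: map, reducing each remaining pair to its string.
    return list(map(lambda pair: pair[0], selected))


print(find_firsts([("a", 1), ("b", 2), ("c", 3)], [(0.5, 3), (1.5, 1)]))
```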
One can see in this example how important it is to describe the problem accurately, not only to others but foremost to oneself.
You want the Strings from the first list, whose associated Integer does occur in the second list.
With such problems it is important to do the solutions in small steps. Most often one cannot write down a function that does it right away, yet this is what many beginners think they must do.
Start out by writing the type signature you need for your function:
findFirsts :: [(String, Integer)] -> [(Float, Integer)] -> [String]
Now, from the problem description, we can deduce that we essentially have two things to do:
Transform a list of (String, Integer) to a list of String.
Select the entries we want.
Hence, the basic skeleton of our function looks like:
findFirsts sis fis = map ... selected
  where
    selected = filter isWanted sis
    isWanted :: (String, Integer) -> Bool
    isWanted (_,i) = ....
You'll need the functions fst, elem and snd to fill out the empty spaces.
Side note: I personally would prefer to solve this with a list comprehension, which often results in more readable code (for me, anyway) than a combination of map and filter with nontrivial filter criteria.
Half of the problem is to get the string list if you have a single integer. There are various possibilities to do this, e.g. using filter and map. However you can combine both operations using a "fold":
findAll x axs = foldr extract [] axs where
  extract (a,y) runningList | x==y      = a:runningList
                            | otherwise = runningList
--usage:
findAll 2 [("a",2),("b",3),("c",2)]
--["a","c"]
For a fold you have a start value (here []) and an operation that combines the running values successively with all list elements, either starting from the left (foldl) or from the right (foldr). Here this operation is extract, and you use it to decide whether to add the string from the current element to the running list or not.
Having this part done, the other half is trivial: You need to get the integers from the (Float,Integer) list, call findAll for all of them, and combine the results.
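The fold and the final combining step can also be mimicked in Python, which may help when tracing the order of the results. This is only an analogue: functools.reduce over the reversed list stands in for foldr, and find_firsts is a hypothetical name for the "other half".

```python
from functools import reduce


def find_all(x, pairs):
    """Right fold: prepend the name of every pair whose integer equals x."""
    def extract(running, pair):
        name, y = pair
        return [name] + running if x == y else running
    # Folding from the right means visiting the last element first.
    return reduce(extract, reversed(pairs), [])


def find_firsts(named, measured):
    """Call find_all for every integer of the second list and combine the results."""
    result = []
    for _, i in measured:
        result.extend(find_all(i, named))
    return result


print(find_all(2, [("a", 2), ("b", 3), ("c", 2)]))
```

Because matches are prepended while walking from the right, the names come out in their original order.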

How to remove digits from the end of the string using SQL

Please, could you answer my question.
How to remove digits from the end of the string using SQL?
For example, the string '2Ga4la2009' must be converted to '2Ga4la'. The problem is that we can't simply trim them because we don't know how many digits are at the end of the string.
Best regards, Galina.
This seems to work:
select left( concat('2Ga4la2009','1'), length(concat('2Ga4la2009','1')) - length(convert(convert(reverse(concat('2Ga4la2009','1')),unsigned),char)))
The concat('myvalue', '1') is to protect against numbers that end in 0s.
The reverse flips it around so the number is at the front.
The inner convert changes the reversed string to a number, dropping the trailing chars.
The outer convert turns the numeric part back to characters, so you can get the length.
Now you know the length of the numeric portion, and you can determine the number of characters of the original value to chop off with the "left()" function.
Ugly, but it works. :-)
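The steps are easier to follow when written out procedurally. The sketch below mirrors them in Python (reverse the string, measure the run of digits at its front, chop that many characters off the original). It is not SQL, and strip_trailing_digits is a hypothetical name.

```python
def strip_trailing_digits(value):
    """Remove the run of ASCII digits at the end of `value`."""
    # Reversing puts the trailing digits at the front, as REVERSE() does.
    reversed_value = value[::-1]
    digit_count = 0
    for ch in reversed_value:
        if ch not in "0123456789":
            break
        digit_count += 1
    # Equivalent of LEFT(value, length - number_of_trailing_digits).
    return value[:len(value) - digit_count]


print(strip_trailing_digits("2Ga4la2009"))
```

Counting characters directly avoids the trailing-zero problem that the concat('...', '1') trick works around in the SQL version.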
Take a look at this: http://www.mysqludf.org/lib_mysqludf_preg/
And if you for some reason can't use UDF, and don't want to do it on the db client side, you can always do the following:
Find the position of the first letter from the end (e.g. the smallest nonzero result of the 26 LOCATEs, one per letter, on the string's reverse)
Do LEFT(@string, @string_length - @result_of_step_1 + 1)
You don't have to do any special handling in case there aren't any digits at the end of the string, because in this case the first letter is at position 1 of the reversed string and LEFT returns the whole string.
Cheers

Parsing and formatting search results

Search:
Scripting+Language Web+Pages Applications
Results:
...scripting language originally...producing dynamic web pages. It has...graphical applications....purpose scripting language that is...d creating web pages as output...
Suppose I want a value that represents the number of characters to allow as padding on either side of the matched terms, and another value that represents how many matches will be shown in the result (i.e., I want to see only the first 5 matches, nothing more).
How exactly would you go about doing this?
This is pretty language-agnostic, but I will be implementing the solution in a PHP environment, so please restrict answers to options that do not require a specific language or framework.
Here's my thought process: create an array from the search words. Determine which search word has the lowest index regarding where it's found in the article-body. Gather that portion of the body into another variable, and then remove that section from the article-body. Return to step 1. You might even add a counter to each word, skipping it when the counter reaches 3 or so.
Important:
The solution must match all search terms in a non-linear fashion. Meaning, term one should be found after term two if it exists after term two. Likewise, it should be found after term 3 as well. Term 3 should be found before term 1 and 2, if it happens to exist before them.
The solution should allow me to declare "Only allow up to three matches for each term, then terminate the summary."
Extra Credit:
Get the padding-variable to optionally pad words, rather than chars.
My thought process:
Create a results array that supports non-unique name/value pairs (PHP supports this in its standard array object)
Loop through each search term and find its character starting position in the search text
Add an item to the results array that stores this character position it has just found with the actual search term as the key
When you've found all the search terms, sort the array ascending by value (the character position of the search term)
Now, the search results will be in order that they were found in the search text
Loop through the results array and use the specified word padding to get words on each side of the search term while also keeping track of the word count in a separate name/value pair
Pseudocode, or my best attempt at it:
function string GetSearchExcerpt(searchText, searchTerms, wordPadding = 0, searchLimit = 3)
{
    results = new array()
    startIndex = 0
    foreach (searchTerm in searchTerms)
    {
        charIndex = searchText.FindByIndex(searchTerm, startIndex) // finds 1st position of searchTerm starting at startIndex
        results.Add(searchTerm, charIndex)
        startIndex = charIndex + 1
    }
    results = results.SortByValue()
    lastSearchTerm = ""
    searchTermCount = new array()
    outputText = ""
    foreach (searchTerm => charIndex in results)
    {
        searchTermCount[searchTerm]++
        if (searchTermCount[searchTerm] <= searchLimit)
        {
            // WordPadding is a simple function that moves left or right a given number of words starting at a specified character index and returns those words
            outputText += "..." + WordPadding(-wordPadding, charIndex) + "<strong>" + searchTerm + "</strong>" + WordPadding(wordPadding, charIndex)
        }
    }
    return outputText
}
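A runnable variant of this idea, written in Python purely for illustration, could look like the sketch below. It makes several assumptions that are not in the pseudocode: padding is counted in characters rather than words, all terms are combined into one regular expression so that matches arrive in document order, terms beyond the per-term limit are skipped rather than ending the summary, and the text is not HTML-escaped. The name search_excerpt and its parameters are hypothetical.

```python
import re


def search_excerpt(text, terms, padding=10, limit_per_term=3):
    """Join padded, highlighted snippets around term matches, in document order."""
    if not terms:
        return ""
    # A single alternation finds every term in one left-to-right pass.
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    counts = {}
    snippets = []
    for match in pattern.finditer(text):
        key = match.group(0).lower()
        counts[key] = counts.get(key, 0) + 1
        if counts[key] > limit_per_term:
            continue  # this term has already been shown often enough
        start = max(0, match.start() - padding)
        end = min(len(text), match.end() + padding)
        snippets.append(text[start:match.start()]
                        + "<strong>" + match.group(0) + "</strong>"
                        + text[match.end():end])
    return "..." + "...".join(snippets) + "..." if snippets else ""


body = "PHP is a scripting language for web pages. A scripting language is fun."
print(search_excerpt(body, ["scripting language", "web pages"], padding=5, limit_per_term=1))
```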
Personally I would convert the search terms into Regular Expressions and then use a Regex Find-Replace to wrap the matches in strong tags for the formatting.
Most likely the RegEx route would be your best bet. So in your example, you would end up with three separate RegEx values.
Since you want a non-language dependent solution I will not put the actual expressions here as the exact syntax varies by language.