Is there a way to sort a list of ordered tuples of strings into a list of strings while maintaining the order of the strings in the tuples? - nltk

I have a list of tuples. Each tuple is ordered so that its first string appears before its second. So for the first tuple in the list, 'gi-' linearly precedes 'ba-'.
list = [('gi-', 'ba-'), ('be-', 'ke-'), ('be-', 'ba-'), ('gi-', 'ke-'), … ]
I am trying to write a function that returns ordered_list. Notice that I expect some strings to be unordered with respect to each other, such as 'ke-' and 'ba-'.
ordered_list = ['gi-', 'be-', {'ke-', 'ba-'} … ]
I'm not sure which sorting method to use for this task. This is an NLTK project that is similar to forming a Cinque-style order of adverbs.
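One way to read this task is as a topological sort of a partial order, where items that are not ordered relative to each other share a layer. The sketch below is a layered variant of Kahn's algorithm in plain Python. The function name layered_order is hypothetical, and the input is only the four pairs shown in the question, so 'gi-' and 'be-' land in the same layer because no pair shown orders them.

```python
from collections import defaultdict

def layered_order(pairs):
    """Group items into layers so every item follows all of its predecessors.

    Items inside one layer are not ordered relative to each other by `pairs`.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    items = set()
    for before, after in pairs:
        items.update((before, after))
        if after not in successors[before]:  # ignore duplicate pairs
            successors[before].add(after)
            indegree[after] += 1

    # Start with the items that nothing precedes.
    layer = {item for item in items if indegree[item] == 0}
    layers = []
    placed = 0
    while layer:
        layers.append(sorted(layer))  # sorted only to make output stable
        placed += len(layer)
        next_layer = set()
        for item in layer:
            for follower in successors[item]:
                indegree[follower] -= 1
                if indegree[follower] == 0:
                    next_layer.add(follower)
        layer = next_layer

    if placed != len(items):
        raise ValueError("the pairs contain a cycle, so no order exists")
    return layers

pairs = [("gi-", "ba-"), ("be-", "ke-"), ("be-", "ba-"), ("gi-", "ke-")]
print(layered_order(pairs))  # [['be-', 'gi-'], ['ba-', 'ke-']]
```

If the full data contains a pair such as ('gi-', 'be-'), those two strings would separate into their own layers. Contradictory pairs raise an error instead of producing a wrong order.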

Related

How to store centrality values in a file sequentially?

I'm using the network extension "nw". I calculated a centrality metric (betweenness) and I'm trying to print the values of the nodes sequentially to a CSV file.
I want the result to consist of two columns: the first is the turtle id and the second is the betweenness of that turtle, and so on for every turtle.
to save
file-open "turtles.csv"
Let Dk1 [ nw:betweenness-centrality] of turtle-set sort turtles
if is-number? Dk1 [ set Dk1 precision Dk1 2 ]
file-print(word "betweenness-centrality: " Dk1)
file-close ;
end
The result of this code changes every time it is executed, and the values differ from what appears in the world.
Remember that agentsets in NetLogo are unordered while lists are ordered. The sort primitive returns a list. In this case, sort turtles returns a list of turtles sorted by who number. However, if you then turn that list back into a turtle-set you'll lose the ordered properties of the list.
Instead of using the of primitive to get a list of betweenness-centrality values in an agentset, just iterate over the list that is returned by sort. For example:
foreach sort turtles [ a-turtle -> show [who] of a-turtle ]

jq: groupby and nested json arrays

Let's say I have: [[1,2], [3,9], [4,2], [], []]
I would like to know the scripts to get:
The number of nested lists which are/are not non-empty, i.e. I want to get: [3,2]
The number of nested lists which do/do not contain the number 3, i.e. I want to get: [1,4]
The number of nested lists for which the sum of the elements is/isn't less than 4, i.e. I want to get: [3,2]
In other words, basic examples of partitioning nested data.
Since stackoverflow.com is not a coding service, I'll confine this response to the first question, with the hope that it will convince you that learning jq is worth the effort.
Let's begin by refining the question about the counts of the lists "which are/are not empty" to emphasize that the first number in the answer should correspond to the number of empty lists (2), and the second number to the rest (3). That is, the required answer should be [2,3].
Solution using built-in filters
The next step might be to ask whether group_by can be used. If the ordering did not matter, we could simply write:
group_by(length==0) | map(length)
This returns [3,2], which is not quite what we want. It's now worth checking the documentation about what group_by is supposed to do. On checking the details at https://stedolan.github.io/jq/manual/#Builtinoperatorsandfunctions, we see that by design group_by does indeed sort by the grouping value.
Since in jq, false < true, we could fix our first attempt by writing:
group_by(length > 0) | map(length)
That's nice, but since group_by is doing so much work when all we really need is a way to count, it's clear we should be able to come up with a more efficient (and hopefully less opaque) solution.
An efficient solution
At its core the problem boils down to counting, so let's define a generic tabulate filter for producing the counts of distinct string values. Here's a def that will suffice for present purposes:
# Produce a JSON object recording the counts of distinct
# values in the given stream, which is assumed to consist
# solely of strings.
def tabulate(stream):
  reduce stream as $s ({}; .[$s] += 1);
An efficient solution can now be written down in just two lines:
tabulate(.[] | length==0 | tostring )
| [.["true", "false"]]
QED
p.s.
The function named tabulate above is sometimes called bow (for "bag of words"). In some ways, that would be a better name, especially as it would make sense to reserve the name tabulate for similar functionality that would work for arbitrary streams.
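For comparison, the same counting idea can be sketched in Python with collections.Counter. This is only an illustration of the approach, not part of the jq answer. The helper name partition_counts is hypothetical, and it reports the count of matching lists first, then the rest.

```python
from collections import Counter

def partition_counts(items, predicate):
    """Return [number of items satisfying predicate, number that do not]."""
    counts = Counter(bool(predicate(item)) for item in items)
    return [counts[True], counts[False]]

data = [[1, 2], [3, 9], [4, 2], [], []]

# Empty lists first, then the rest (the refined first question).
print(partition_counts(data, lambda xs: len(xs) == 0))  # [2, 3]
# Lists that contain the number 3, then those that do not.
print(partition_counts(data, lambda xs: 3 in xs))       # [1, 4]
# Lists whose elements sum to less than 4, then the rest.
print(partition_counts(data, lambda xs: sum(xs) < 4))   # [3, 2]
```

As with tabulate, each list is visited once and nothing is sorted or grouped.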

Capture a value from a repeating group on every iteration (as opposed to just last occurrence)

How does one capture a value recursively with regex, where value is a part of a group that repeats?
I have a serialized array in mysql database
These are 3 examples of a serialized array
a:2:{i:0;s:2:"OR";i:1;s:2:"WA";}
a:1:{i:0;s:2:"CA";}
a:4:{i:0;s:2:"CA";i:1;s:2:"ID";i:2;s:2:"OR";i:3;s:2:"WA";}
a:1 stands for array:{number of elements}
then in between {} i:0 means element 0, i:1 means element 1 etc.
then the actual value s:2:"CA" means string with length of 2
so I have 2 elements in first array, 1 element in the second and 4 elements in the last
I have this data in mysql database and I DO NOT HAVE an option to parse this with back-end code - this has to be done in mysql (10.0.23-MariaDB-log)
the repeating pattern is inside of the curly braces
the number of repeats is variable (as in 3 examples each has a different number of repeating patterns),
the number of repeating patterns is defined by the number at 3rd position (if that helps)
for the first example it's a:2:
and so there are 2 repeating blocks:
i:0;s:2:"OR";
i:1;s:2:"WA";
I only care to extract the two-letter values inside the quotes (here OR and WA)
So I came up with this regex
^a:(?:\d+):\{(?:i:(?:\d+);s:(?:\d+):\"(\w\w)\";)+}$
it captures the values I want all right but problem is it only captures the last one in each repeating group
so going back to the example what would be captured is
WA
CA
WA
What I would want is
OR|WA
CA
CA|ID|OR|WA
these are the language specific regex functions available to me:
https://mariadb.com/kb/en/library/regular-expressions-functions/
I don't care which one is used to solve the problem
Ultimately I need this in a sensible form that can be presented to the client, e.g. CA,ID,OR or CA|ID|OR
Current thoughts are perhaps this isn't possible in a one liner, and I have to write a multi-step function where
extract the repeating portion between the curly braces
then somehow iterate over each repeating portion
then use the regex on each
then return the results as one string with separated elements
I doubt if such a capture is possible. However, this would probably do the job for your specific purpose.
REGEXP_REPLACE(
  REGEXP_REPLACE(
    REGEXP_REPLACE(str1, '^a:\\d+:\{', ''),
    'i:\\d+;s:\\d+:\"(\\w\\w)\";',
    '\\1,'
  ),
  '\,?\}$',
  ''
)
Basically, this takes the input string (or column) str1 and works in three steps:
remove the first part
replace every cell with the string you want
remove the last 2 characters, ,}
and voila! You get a string CA,ID,OR.
Afternote
It may or may not work well when the original array before serialised is empty (it depends how it is serialised).
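The behaviour described in the question and the three replacement steps can be tried out quickly outside the database. The sketch below uses Python's re module, which is an assumption for illustration only, since its regex engine is not the one MariaDB uses. The function name extract_codes is hypothetical.

```python
import re

# The pattern from the question: one capture group inside a repeated group.
WHOLE = re.compile(r'^a:(?:\d+):\{(?:i:(?:\d+);s:(?:\d+):"(\w\w)";)+}$')

def extract_codes(serialized):
    """Apply the same three replacements as the nested REGEXP_REPLACE calls."""
    s = re.sub(r'^a:\d+:\{', '', serialized)          # 1. remove the first part
    s = re.sub(r'i:\d+;s:\d+:"(\w\w)";', r'\1,', s)   # 2. keep only each value
    return re.sub(r',?\}$', '', s)                    # 3. remove the trailing ",}"

example = 'a:2:{i:0;s:2:"OR";i:1;s:2:"WA";}'

# A repeated capture group keeps only its last match.
print(WHOLE.match(example).group(1))  # WA
print(extract_codes(example))         # OR,WA
```

Matching each repeating block separately avoids the last-match-only limit, because every block gets its own replacement.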

Endeca need to return all its values under one dimension

I need to retrieve all values under one dimension (e.g. Product.category) in Endeca and return them as a JSON object to the content assembler. Can someone provide an optimal way to achieve this?
This is a tricky one, particularly because I'm assuming the product.category is a hierarchical dimension.
With a regular navigation query (such as a search results page), there's no way to bring back every level of a hierarchical dimension at once. However, using a Dimension search (and if you have --compoundDimSearch turned OFF), you can make a query like this: D=*&Dn=0&Di=10001 (where 10001 might be the dimension ID for product.category).
That will bring back every value in the dimension.
What you could do is maybe make / extend the DimensionSearchResultsHandler to help you out. In the preprocess() method, you would construct a query like the one above.
Then in the process method, you'd do something like:
ENEQueryResults results = executeMdexRequest(mMdexRequest);
NavigationState navigationState = getNavigationState();
navigationState.inform(results);
DimensionSearchResults dimensionSearchResults = new DimensionSearchResults(cartridgeConfig);
DimensionSearchResultsBuilder.build(
getActionPathProvider(),
dimensionSearchResults,
navigationState,
results.getDimensionSearch(),
cartridgeConfig.getDimensionList(),
cartridgeConfig.getMaxResults(),
cartridgeConfig.isShowCountsEnabled());
return dimensionSearchResults;
That will help you build out the Assembler objects for the results. Then if you made an Assembler query that returns JSON, these results would be returned as well.
One big caveat: The results above aren't nicely formatted. What I mean is that this will bring back every leaf value and its ancestors. If you wanted to create a nice hierarchical display, you'd have to do a bunch of formatting yourself.

Map and Filter in Haskell

I have two lists of tuples which are as follows: [(String,Integer)] and [(Float,Integer)]. Each list has several tuples.
For every Integer that has a Float in the second list, I need to check if its Integer matches the Integer in the first list, and if it does, return the String - although this function needs to return a list of Strings, i.e. [String] with all the results.
I have already defined a function which returns a list of Integers from the second list (for the comparison on the integers in the first list).
This should be solvable using "higher-order functions". I've spent a considerable amount of time playing with map and filter but haven't found a solution!
You have a list of Integers from the second list. Let's call this ints.
Now you need to do two things--first, filter the (String, Integer) list so that it only contains pairs with corresponding integers in the ints list and secondly, turn this list into just a list of String.
These two steps correspond to the filter and map respectively.
First, you need a function to filter by. This function should take a (String, Integer) pair and return whether the integer is in the ints list. So it should have a type of:
check :: (String, Integer) -> Bool
Writing this should not be too difficult. Once you have it, you can just filter the first list by it.
Next, you need a function to transform a (String, Integer) pair into a String. This will have type:
extract :: (String, Integer) -> String
This should also be easy to write. (A standard function like this actually exists, but if you're just learning it's healthy to figure it out yourself.) You then need to map this function over the result of your previous filter.
I hope this gives you enough hints to get the solution yourself.
One can see in this example how important it is to describe the problem accurately, not only to others but foremost to oneself.
You want the Strings from the first list, whose associated Integer does occur in the second list.
With such problems it is important to do the solutions in small steps. Most often one cannot write down a function that does it right away, yet this is what many beginners think they must do.
Start out by writing the type signature you need for your function:
findFirsts :: [(String, Integer)] -> [(Float, Integer)] -> [String]
Now, from the problem description, we can deduce that we essentially have two things to do:
Transform a list of (String, Integer) to a list of String
Select the entries we want.
Hence, the basic skeleton of our function looks like:
findFirsts sis fis = map ... selected
  where
    selected = filter isWanted sis
    isWanted :: (String, Integer) -> Bool
    isWanted (_,i) = ....
You'll need the functions fst, elem and snd to fill out the empty spaces.
Side note: I personally would prefer to solve this with a list comprehension, which often results in more readable code (for me, anyway) than a combination of map and filter with nontrivial filter criteria.
Half of the problem is to get the string list if you have a single integer. There are various possibilities to do this, e.g. using filter and map. However you can combine both operations using a "fold":
findAll x axs = foldr extract [] axs
  where
    extract (a,y) runningList
      | x == y    = a : runningList
      | otherwise = runningList
--usage:
findAll 2 [("a",2),("b",3),("c",2)]
--["a","c"]
For a fold you have a start value (here []) and an operation that combines the running values successively with all list elements, either starting from the left (foldl) or from the right (foldr). Here this operation is extract, and you use it to decide whether to add the string from the current element to the running list or not.
Having this part done, the other half is trivial: You need to get the integers from the (Float,Integer) list, call findAll for all of them, and combine the results.
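For readers who want to check the overall logic outside Haskell, the filter-then-map plan from the answers can be sketched in Python. This is only an illustration of the idea, not a replacement for writing the Haskell version. The name find_firsts mirrors findFirsts above, and the sample data is made up.

```python
def find_firsts(sis, fis):
    """Return the strings from sis whose integer also occurs in fis."""
    # The integers from the second list, i.e. what snd would pick out.
    wanted = {i for _, i in fis}
    # Filter step: keep only the pairs whose integer is wanted.
    selected = filter(lambda pair: pair[1] in wanted, sis)
    # Map step: keep only the string of each remaining pair (what fst does).
    return list(map(lambda pair: pair[0], selected))

sis = [("a", 2), ("b", 3), ("c", 2)]
fis = [(1.5, 2), (0.1, 7)]
print(find_firsts(sis, fis))  # ['a', 'c']

# The same result written as a comprehension, as the side note suggests.
print([s for s, i in sis if i in {j for _, j in fis}])  # ['a', 'c']
```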