I received help on a related question previously on this forum and am wondering if there is a similarly straightforward way to resolve a more complex issue.
Given the following snippet, is there a way to merge the partial sentence (the one that does not end with a "[punctuation mark][white space]" pattern) with its remainder, based on the matching TextSize? When I tried to adapt the answer from the related question I quickly ran into issues. Basically, I am looking to translate a rule such as: if .Text !endswith("[punctuation mark][white space]") then .Text + next .Text where .TextSize matches.
{
"Text": "Was it political will that established social democratic policies in the 1930s and ",
"Path": "P",
"TextSize": 9
},
{
"Text": "31 Lawrence Mishel and Jessica Schieder, Economic Policy Institute website, May 24, 2016 at (https://www.epi.org/publication/as-union-membership-has-fallen-the-top-10-percent-have-been-getting-a-larger-share-of-income/). ",
"Path": "Footnote",
"TextSize": 8
},
{
"Text": "Fig. 9.2 Higher union membership has been associated with a higher share of income to lower income brackets (the lower 90%) and a lower share of income to the top 10% of earners. ",
"Path": "P",
"TextSize": 8
},
{
"Text": "1940s, or that undermined them after the 1970s? Or was it abundant and cheap energy resources that enabled social democratic policies to work until the 1970s, and energy constraints that forced a restructuring of policy after the 1970s? ",
"Path": "P",
"TextSize": 9
},
{
"Text": "Recall that my economic modeling discussed in Chap. 6 shows that, even with no change in the assumption related to labor \u201cbargaining power,\u201d you can explain a shift from increasing to declining income equality (higher equality expressed as a higher wage share) by a corresponding shift from a period of rapidly increasing per capita resource consumption to one of constant per capita resource consumption. ",
"Path": "P",
"TextSize": 9
}
The result I'm looking for would be as follows:
{
"Text": "Was it political will that established social democratic policies in the 1930s and 1940s, or that undermined them after the 1970s? Or was it abundant and cheap energy resources that enabled social democratic policies to work until the 1970s, and energy constraints that forced a restructuring of policy after the 1970s? ",
"Path": "P",
"TextSize": 9
},
{
"Text": "31 Lawrence Mishel and Jessica Schieder, Economic Policy Institute website, May 24, 2016 at (https://www.epi.org/publication/as-union-membership-has-fallen-the-top-10-percent-have-been-getting-a-larger-share-of-income/). ",
"Path": "Footnote",
"TextSize": 8
},
{
"Text": "Fig. 9.2 Higher union membership has been associated with a higher share of income to lower income brackets (the lower 90%) and a lower share of income to the top 10% of earners. ",
"Path": "P",
"TextSize": 8
},
{
"Text": "Recall that my economic modeling discussed in Chap. 6 shows that, even with no change in the assumption related to labor \u201cbargaining power,\u201d you can explain a shift from increasing to declining income equality (higher equality expressed as a higher wage share) by a corresponding shift from a period of rapidly increasing per capita resource consumption to one of constant per capita resource consumption. ",
"Path": "P",
"TextSize": 9
}
The following, which assumes the input is a valid JSON array, merges each incomplete .Text with at most one successor. It can easily be modified to merge multiple .Text values together, as shown in Part 2 below.
Part 1
# input and output: an array of {Text, Path, TextSize} objects.
# Attempt to merge the .Text of the $i-th object with the .Text of a subsequent compatible object.
# If a merge is successful, the subsequent object is removed.
def attempt_to_merge_next($i):
  .[$i].TextSize as $class
  | first( (range($i+1; length) as $j | select(.[$j].TextSize == $class) | $j) // null) as $j
  | if $j then .[$i].Text += .[$j].Text | del(.[$j])
    else .
    end;

reduce range(0; length) as $i (.;
  if .[$i] == null then .
  elif .[$i].Text|test("[,.?:;]\\s*$")|not
  then attempt_to_merge_next($i)
  else .
  end)
Part 2
Using the above def:
def merge:
  def m($i):
    if $i >= length then .
    elif .[$i].Text|test("[,.?:;]\\s*$")|not
    then attempt_to_merge_next($i) as $x
      | if ($x|length) == length then m($i+1)
        else $x|m($i)
        end
    else m($i+1)
    end ;
  m(0);
merge
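For readers who want to cross-check the jq result, the same rule can be sketched in plain JavaScript. This is only an illustration under stated assumptions: mergePartialSentences is a made-up name, the punctuation pattern is copied from the jq above, and the input is assumed to be an array of objects with Text and TextSize properties.

```javascript
// Hypothetical helper, not part of the jq answer: keep appending the next
// item with the same TextSize until the current Text ends in punctuation.
function mergePartialSentences(items) {
  const endsSentence = /[,.?:;]\s*$/;
  // Copy the objects so the input array is left untouched.
  const result = items.map(item => ({ ...item }));
  for (let i = 0; i < result.length; i++) {
    while (!endsSentence.test(result[i].Text)) {
      // Find the first later item whose TextSize matches.
      const j = result.findIndex(
        (other, k) => k > i && other.TextSize === result[i].TextSize
      );
      if (j === -1) break; // nothing left to merge with
      result[i].Text += result[j].Text;
      result.splice(j, 1); // the merged item is removed
    }
  }
  return result;
}

mergePartialSentences([
  { Text: 'one and ', TextSize: 9 },
  { Text: 'Note. ', TextSize: 8 },
  { Text: 'two. ', TextSize: 9 }
]);
// [ { Text: 'one and two. ', TextSize: 9 },
//   { Text: 'Note. ', TextSize: 8 } ]
```

Like Part 2, this sketch keeps merging until the text ends with one of the listed punctuation marks or no matching item remains.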
I want to build an array from the received data in which objects with the same identifier are grouped together.
pool.query('SELECT * FROM columnslist INNER JOIN tableslist ON columnKey = keyTable', (error, result) => {
response.send(result);
});
After executing this code, I get the expected result:
[
{
"id": 1,
"columnOne": "qeqeqq qq wq qw wwqqwdqd",
"columnTwo": "qdqdqdq wdqdqwqwd",
"columnThree": "dqwdq qw qqsvds",
"columnFour": "svsdvsxcvscsv svd ds",
"columnFive": "sdvsdvsdvs ds sdd",
"columnKey": 1,
"keyTable": 1,
"name": "Test"
},
{
"id": 3,
"columnOne": "qdqwdwq",
"columnTwo": "dqdqd",
"columnThree": "qdqdwq",
"columnFour": "wdqwdqwq",
"columnFive": "wdqdqw",
"columnKey": 2,
"keyTable": 2,
"name": "Qqeqeqeq"
},
{
"id": 4,
"columnOne": "qdqwdwq",
"columnTwo": "dqdqd",
"columnThree": "qdqdwq",
"columnFour": "wdqwdqwq",
"columnFive": "wdqdqw",
"columnKey": 2,
"keyTable": 2,
"name": "Qqeqeqeq"
}
]
How can I process the response to get the following result? As described above, I need the outer array to contain one inner array per identifier, holding the objects that share it:
[
[{"keyTable": 1,"name": "Test"...}],
[{"columnKey": 2,"keyTable": 2...}, {"columnKey": 2,"keyTable": 2...}]
]
Thanks!
In PHP, you can typecast an object into an associative array like this:
$array = (array) $yourObject;
In your case:
$array = (array) $result;
This should help.
You can use JSON_ARRAYAGG() and GROUP BY (JSON_ARRAYAGG() is available from MySQL 5.7.22). For your example, the query is:
SELECT ALL
columnKey, keyTable,
JSON_ARRAYAGG(JSON_OBJECT(
'id', id,
'columnOne', columnOne,
'columnTwo', columnTwo,
'columnThree', columnThree,
'columnFour', columnFour,
'columnFive', columnFive,
'columnKey', columnKey,
'keyTable', keyTable,
'name', name
)) AS jsonData
FROM columnslist
INNER JOIN tableslist ON columnKey = keyTable
GROUP BY columnKey, keyTable
https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_json-arrayagg
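If changing the query is not an option, the grouping can also be done in Node.js after the query returns. The following is a minimal sketch, not a definitive implementation: groupRowsByKey is a made-up helper name, and it assumes that result is a plain array of row objects like the one shown in the question.

```javascript
// Hypothetical helper: collect rows that share the same value for `key`
// into one inner array, keeping the order in which the values first appear.
function groupRowsByKey(rows, key) {
  const groups = new Map();
  for (const row of rows) {
    const value = row[key];
    if (!groups.has(value)) groups.set(value, []);
    groups.get(value).push(row);
  }
  return Array.from(groups.values());
}

groupRowsByKey([
  { id: 1, keyTable: 1, name: 'Test' },
  { id: 3, keyTable: 2, name: 'Qqeqeqeq' },
  { id: 4, keyTable: 2, name: 'Qqeqeqeq' }
], 'keyTable');
// [ [ { id: 1, keyTable: 1, name: 'Test' } ],
//   [ { id: 3, keyTable: 2, name: 'Qqeqeqeq' },
//     { id: 4, keyTable: 2, name: 'Qqeqeqeq' } ] ]
```

Inside the query callback from the question, this would be used as response.send(groupRowsByKey(result, 'keyTable')).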
I have a search field and I want to add some complex functionality using underscore.js.
Sometimes users search for a whole "sentence" like "Samsung galaxy A20s ultra". I want to filter JSON data using any of the words in the search string and sort by results that contain more of the words.
Sample data:
var phones = [
{name: "Samsung A10s", id: 845},
{name: "Samsung galaxy", id: 839},
{name: "Nokia 7", id: 814},
{name: "Samsung S20s ultra", id: 514},
{name: "Apple iphone ultra", id: 159},
{name: "LG S20", id: 854}];
What is the best way to do it in underscore?
In this answer, I'll be building a function searchByRelevance that takes two arguments:
a JSON array of phones with name and id properties, and
a search string,
and which returns a new JSON array, with only the phones of which the name has at least one word in common with the search string, sorted such that the phones with the most common words come first.
Let's first identify all the subtasks and how you could implement them with Underscore. Once we've done that, we can compose them into the searchByRelevance function. At the end, I'll also say a few words about how we might determine what is "best".
Subtasks
Split a string into words
You don't need Underscore for this. Strings have a builtin split method:
"Samsung galaxy A20s ultra".split(' ')
// [ 'Samsung', 'galaxy', 'A20s', 'ultra' ]
However, if you have a whole array of strings and you want to split them all, so you get an array of arrays, you can do so using _.invoke:
_.invoke([
'Samsung A10s',
'Samsung galaxy',
'Nokia 7',
'Samsung S20s ultra',
'Apple iphone ultra',
'LG S20'
], 'split', ' ')
// [ [ 'Samsung', 'A10s' ],
// [ 'Samsung', 'galaxy' ],
// [ 'Nokia', '7' ],
// [ 'Samsung', 'S20s', 'ultra' ],
// [ 'Apple', 'iphone', 'ultra' ],
// [ 'LG', 'S20' ] ]
Find the words that two arrays have in common
If you have two arrays of words,
var words1 = [ 'Samsung', 'galaxy', 'A20s', 'ultra' ],
words2 = [ 'Apple', 'iphone', 'ultra' ];
then you can get a new array with just the words they have in common using _.intersection:
_.intersection(words1, words2) // [ 'ultra' ]
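To make concrete what _.intersection computes for two arrays, here is a rough plain-JavaScript sketch of the same idea. The name commonWords is made up for illustration and is not part of Underscore; the sketch only handles two arrays, whereas _.intersection accepts any number.

```javascript
// Hypothetical helper, not an Underscore function: return the unique words
// of the first array that also occur in the second array.
function commonWords(wordsA, wordsB) {
  const lookup = new Set(wordsB);
  return [...new Set(wordsA)].filter(word => lookup.has(word));
}

commonWords(
  ['Samsung', 'galaxy', 'A20s', 'ultra'],
  ['Apple', 'iphone', 'ultra']
);
// [ 'ultra' ]
```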
Count the number of words in an array
This is again something you don't need Underscore for:
[ 'Samsung', 'A10s' ].length // 2
But if you have multiple arrays of words, you can get the word counts for all of them using _.map:
_.map([
[ 'Samsung', 'A10s' ],
[ 'Samsung', 'galaxy' ],
[ 'Nokia', '7' ],
[ 'Samsung', 'S20s', 'ultra' ],
[ 'Apple', 'iphone', 'ultra' ],
[ 'LG', 'S20' ]
], 'length')
// [ 2, 2, 2, 3, 3, 2 ]
Sort an array by some criterion
_.sortBy does this. For example, here is how to sort the phones data by id:
_.sortBy(phones, 'id')
// [ { name: 'Apple iphone ultra', id: 159 },
// { name: 'Samsung S20s ultra', id: 514 },
// { name: 'Nokia 7', id: 814 },
// { name: 'Samsung galaxy', id: 839 },
// { name: 'Samsung A10s', id: 845 },
// { name: 'LG S20', id: 854 } ]
To sort descending instead of ascending, you can first sort ascending and then reverse the result using the builtin reverse method:
_.sortBy(phones, 'id').reverse()
// [ { name: 'LG S20', id: 854 },
// { name: 'Samsung A10s', id: 845 },
// { name: 'Samsung galaxy', id: 839 },
// { name: 'Nokia 7', id: 814 },
// { name: 'Samsung S20s ultra', id: 514 },
// { name: 'Apple iphone ultra', id: 159 } ]
You can also pass a criterion function. The function receives the current item and it can do anything, as long as it returns a string or number to use as the rank of the current item. For example, this sorts the phones by the last letter of the name (using _.last):
_.sortBy(phones, function(phone) { return _.last(phone.name); })
// [ { name: 'LG S20', id: 854 },
// { name: 'Nokia 7', id: 814 },
// { name: 'Samsung S20s ultra', id: 514 },
// { name: 'Apple iphone ultra', id: 159 },
// { name: 'Samsung A10s', id: 845 },
// { name: 'Samsung galaxy', id: 839 } ]
Group the elements of an array by some criterion
Instead of sorting directly, we might also first only group the items by a criterion. Here's grouping the phones by the first letter of the name, using _.groupBy and _.first:
_.groupBy(phones, function(phone) { return _.first(phone.name); })
// { S: [ { name: 'Samsung A10s', id: 845 },
// { name: 'Samsung galaxy', id: 839 },
// { name: 'Samsung S20s ultra', id: 514 } ],
// N: [ { name: 'Nokia 7', id: 814 } ],
// A: [ { name: 'Apple iphone ultra', id: 159 } ],
// L: [ { name: 'LG S20', id: 854 } ] }
We have seen that we can pass keys to sort or group by, or a function that returns something to use as a criterion. There is a third option which we can use here instead of the function above:
_.groupBy(phones, ['name', 0])
// { S: [ { name: 'Samsung A10s', id: 845 },
// { name: 'Samsung galaxy', id: 839 },
// { name: 'Samsung S20s ultra', id: 514 } ],
// N: [ { name: 'Nokia 7', id: 814 } ],
// A: [ { name: 'Apple iphone ultra', id: 159 } ],
// L: [ { name: 'LG S20', id: 854 } ] }
Getting the keys of an object
This is what _.keys is for:
_.keys({name: "Samsung A10s", id: 845}) // [ 'name', 'id' ]
You can also do this with the standard Object.keys. _.keys works in old environments where Object.keys doesn't. Otherwise, they are interchangeable.
Turn an array of things into other things
We have previously seen the use of _.map to get the lengths of multiple arrays of words. In general, it takes an array or object and something that you want to be done with each element of that array or object, and it will return an array with the results:
_.map(phones, 'id')
// [ 845, 839, 814, 514, 159, 854 ]
_.map(phones, ['name', 0])
// [ 'S', 'S', 'N', 'S', 'A', 'L' ]
_.map(phones, function(phone) { return _.last(phone.name); })
// [ 's', 'y', '7', 'a', 'a', '0' ]
Note the similarity with _.sortBy and _.groupBy. This is a general pattern in Underscore: you have a collection of something and you want to do something with each element, in order to arrive at some sort of result. The thing you want to do with each element is called the "iteratee". Underscore has a function that ensures you can use the same iteratee shorthands in all functions that work with an iteratee: _.iteratee.
Sometimes you may want to do something with each element of a collection and combine the results in a way that is different from what _.map, _.sortBy and the other Underscore functions already do. In this case, you can use _.reduce, the most general function of them all. For example, here's how we can create a mixture of the names of the phones, by taking the first letter of the name of the first phone, the second letter of the name of the second phone, and so forth:
_.reduce(phones, function(memo, phone, index) {
return memo + phone.name[index];
}, '')
// 'Sakse0'
The function that we pass to _.reduce is invoked for each phone. memo is the result that we've built so far. The result of the function is used as the new memo for the next phone that we process. In this way, we build our string one phone at a time. The last argument to _.reduce, '' in this case, sets the initial value of memo so we have something to start with.
Concatenate multiple arrays into a single one
For this we have _.flatten:
_.flatten([
[ 'Samsung', 'A10s' ],
[ 'Samsung', 'galaxy' ],
[ 'Nokia', '7' ],
[ 'Samsung', 'S20s', 'ultra' ],
[ 'Apple', 'iphone', 'ultra' ],
[ 'LG', 'S20' ]
])
// [ 'Samsung', 'A10s', 'Samsung', 'galaxy', 'Nokia', '7',
// 'Samsung', 'S20s', 'ultra', 'Apple', 'iphone', 'ultra',
// 'LG', 'S20' ]
Putting it all together
We have an array of phones and a search string, we want to somehow compare each of those phones to the search string, and finally we want to combine the results of that so we get the phones by relevance. Let's start with the middle part.
Does "each of those phones" ring a bell? We are creating an iteratee! We want it to take a phone as its argument, and we want it to return the number of words that its name has in common with the search string. This function will do that:
function relevance(phone) {
return _.intersection(phone.name.split(' '), searchTerms).length;
}
This assumes that there is a searchTerms variable defined outside of the relevance function. It has to be an array with the words in the search string. We'll deal with this in a moment; let's address how to combine our results first.
While there are many ways possible, I think the following is quite elegant. I start with grouping the phones by relevance,
_.groupBy(phones, relevance)
but I want to omit the group of phones that have zero words in common with the search string:
var groups = _.omit(_.groupBy(phones, relevance), '0');
Note that I'm omitting the string key '0', not the number key 0, because the result of _.groupBy is an object, and the keys of an object are always strings.
Now we need to order the remaining groups by the number of matching words. We know the number of matching words for each group by taking the keys of our groups,
_.keys(groups)
and we can sort these ascending first, but we must take care to cast them back to numbers, so that we will sort 2 before 10 (numerical comparison) instead of '10' before '2' (lexicographical comparison):
_.sortBy(_.keys(groups), Number)
then we can reverse this in order to arrive at the final order of our groups.
var tiers = _.sortBy(_.keys(groups), Number).reverse();
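The difference between the two kinds of comparison is easy to check with the builtin sort method, which compares elements as strings unless it is given a comparator. A small illustration with made-up tier keys (not taken from the phones data):

```javascript
// Made-up keys for illustration, as strings, the way _.keys returns them.
var tierKeys = ['2', '10', '1'];

// The default sort compares as strings, so '10' ends up before '2'.
var lexicographic = tierKeys.slice().sort();
// [ '1', '10', '2' ]

// Comparing as numbers gives the ascending order we want before reversing.
var numeric = tierKeys.slice().sort(function(a, b) {
  return Number(a) - Number(b);
});
// [ '1', '2', '10' ]
```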
Now we just need to transform this sorted array of keys into an array with the actual groups of phones. To do this, we can use _.map and _.propertyOf:
_.map(tiers, _.propertyOf(groups))
Finally, we only need to flatten this into one big array, in order to have our search results by relevance.
_.flatten(_.map(tiers, _.propertyOf(groups)))
Let's wrap all of this up into our searchByRelevance function. Remember that we still needed to define searchTerms outside of our relevance iteratee:
function searchByRelevance(phones, searchString) {
    var searchTerms = searchString.split(' ');
    function relevance(phone) {
        return _.intersection(phone.name.split(' '), searchTerms).length;
    }
    var groups = _.omit(_.groupBy(phones, relevance), '0');
    var tiers = _.sortBy(_.keys(groups), Number).reverse();
    return _.flatten(_.map(tiers, _.propertyOf(groups)));
}
Now put it to the test!
searchByRelevance(phones, 'Samsung galaxy A20s ultra')
// [ { name: 'Samsung galaxy', id: 839 },
// { name: 'Samsung S20s ultra', id: 514 },
// { name: 'Samsung A10s', id: 845 },
// { name: 'Apple iphone ultra', id: 159 } ]
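As a cross-check, the same ranking can be sketched without Underscore. This is not the approach of this answer, just a comparison under stated assumptions: searchByRelevancePlain and samplePhones are made-up names, and the sketch relies on Array.prototype.sort being stable (guaranteed since ES2019) so that phones with equal relevance keep their input order.

```javascript
// Sample data copied from the question.
const samplePhones = [
  { name: 'Samsung A10s', id: 845 },
  { name: 'Samsung galaxy', id: 839 },
  { name: 'Nokia 7', id: 814 },
  { name: 'Samsung S20s ultra', id: 514 },
  { name: 'Apple iphone ultra', id: 159 },
  { name: 'LG S20', id: 854 }
];

// Hypothetical plain-JavaScript variant of searchByRelevance.
function searchByRelevancePlain(phones, searchString) {
  const searchTerms = new Set(searchString.split(' '));
  return phones
    .map(phone => ({
      phone,
      // number of distinct words in the name that occur in the search string
      relevance: new Set(
        phone.name.split(' ').filter(word => searchTerms.has(word))
      ).size
    }))
    .filter(entry => entry.relevance > 0)
    // most matching words first; ties keep their input order
    .sort((a, b) => b.relevance - a.relevance)
    .map(entry => entry.phone);
}

searchByRelevancePlain(samplePhones, 'Samsung galaxy A20s ultra');
// [ { name: 'Samsung galaxy', id: 839 },
//   { name: 'Samsung S20s ultra', id: 514 },
//   { name: 'Samsung A10s', id: 845 },
//   { name: 'Apple iphone ultra', id: 159 } ]
```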
What is "best"?
If you measure "goodness" by the number of lines of code, then less code is generally better. We implemented searchByRelevance above in just nine lines of code, so that seems pretty good.
It is, however, a bit dense. The number of lines increases, but the readability improves a bit, if we use chaining:
function searchByRelevance(phones, searchString) {
    var searchTerms = searchString.split(' ');
    function relevance(phone) {
        return _.intersection(phone.name.split(' '), searchTerms).length;
    }
    var groups = _.chain(phones)
        .groupBy(relevance)
        .omit('0');
    return groups.keys()
        .sortBy(Number)
        .reverse()
        .map(_.propertyOf(groups.value()))
        .flatten()
        .value();
}
Yet another dimension of "goodness" is performance. Could searchByRelevance be faster? To get a sense of this, we usually take the smallest and most frequent operation, and we calculate how often we'll be executing that operation for a given size of input.
The main thing we'll be doing a lot in searchByRelevance is comparing words. This is not the smallest operation, because comparing words consists of comparing letters, but because words in English tend to be short, we can pretend for now that comparing two words is our smallest and most executed operation. This makes the calculations a bit easier.
For each phone, we will be comparing each word in its name with each word in our search string. If we have 100 phones, and the average phone name has 3 words, and the search string has 5 words, then we will be making 100 * 3 * 5 = 1500 word comparisons.
Computers are fast, so 1500 is nothing. Generally, if the number of times you execute your smallest step remains under 100000 (100k), you probably won't even notice a delay unless that smallest step is very expensive.
However, the number of word comparisons will grow quite explosively with larger inputs. If we have 20000 (20k) phones, 5 words in the average name and a search string of 10 words, we are already making a million word comparisons. That could mean staring at your screen for a few seconds before the results come in.
Can we write a variant of searchByRelevance that can search 20k phones with long names in an eyeblink? Yes, and in fact we can probably also do a million or more! I won't go into the details line by line, but we can get much better speed by using appropriate lookup structures:
// lookup table by word in the name
function createIndex(phones) {
    return _.reduce(phones, function(lookup, phone) {
        _.each(phone.name.split(' '), function(word) {
            var matchingPhones = (lookup[word] || []);
            matchingPhones.push(phone.id);
            lookup[word] = matchingPhones;
        });
        return lookup;
    }, {});
}
// search using lookup tables
function searchByRelevance(phonesById, idsByWord, searchString) {
    var groups = _.chain(searchString.split(' '))
        .map(_.propertyOf(idsByWord))
        .compact()
        .flatten()
        .countBy()
        .pairs()
        .groupBy('1');
    return groups.keys()
        .sortBy(Number)
        .reverse()
        .map(_.propertyOf(groups.value()))
        .flatten(true) // only one level of flattening
        .map('0')
        .map(_.propertyOf(phonesById))
        .value();
}
To use this, we create the lookup tables once, then reuse them for each search. We need to recreate the lookup tables only if the JSON data of phones changes.
var phonesById = _.indexBy(phones, 'id'); // index by id, so ids can be looked up
var idsByWord = createIndex(phones);
searchByRelevance(phonesById, idsByWord, 'Samsung galaxy A20s ultra')
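// [ { name: 'Samsung S20s ultra', id: 514 },
// { name: 'Samsung galaxy', id: 839 },
// { name: 'Apple iphone ultra', id: 159 },
// { name: 'Samsung A10s', id: 845 } ]
// (within a tier, the order follows ascending id, because countBy stores
// the ids as integer-like object keys)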
searchByRelevance(phonesById, idsByWord, 'Apple')
// [ { name: 'Apple iphone ultra', id: 159 } ]
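The same idea can also be sketched without Underscore, using Map for both lookup tables. Treat this as an illustration under stated assumptions rather than a replacement for the code above: buildIndex, searchIndex and indexedPhones are made-up names. Because a Map keeps insertion order and the builtin sort is stable, phones with equal relevance come out in the order in which they were first matched.

```javascript
// Sample data copied from the question.
const indexedPhones = [
  { name: 'Samsung A10s', id: 845 },
  { name: 'Samsung galaxy', id: 839 },
  { name: 'Nokia 7', id: 814 },
  { name: 'Samsung S20s ultra', id: 514 },
  { name: 'Apple iphone ultra', id: 159 },
  { name: 'LG S20', id: 854 }
];

// Hypothetical helper: build both lookup tables in one pass.
function buildIndex(phones) {
  const phonesById = new Map();
  const idsByWord = new Map();
  for (const phone of phones) {
    phonesById.set(phone.id, phone);
    // a Set avoids counting a word twice if a name repeats it
    for (const word of new Set(phone.name.split(' '))) {
      if (!idsByWord.has(word)) idsByWord.set(word, []);
      idsByWord.get(word).push(phone.id);
    }
  }
  return { phonesById, idsByWord };
}

// Hypothetical helper: count matching search terms per phone id,
// then return the phones with the highest counts first.
function searchIndex(index, searchString) {
  const counts = new Map();
  for (const term of new Set(searchString.split(' '))) {
    for (const id of index.idsByWord.get(term) || []) {
      counts.set(id, (counts.get(id) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => index.phonesById.get(id));
}

const phoneIndex = buildIndex(indexedPhones);
searchIndex(phoneIndex, 'Samsung galaxy A20s ultra').map(phone => phone.id);
// [ 839, 514, 845, 159 ]
```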
To appreciate how much faster this is, let's count the smallest operations again. In createIndex, the smallest most frequent operation is storing an association between a word and the id of a phone. We do this once for each phone, for each word in its name. In searchByRelevance, the smallest most frequent operation is incrementing the relevance of a given phone in the countBy step. We do this once for each word in the search string, for each phone that matches that word.
We can estimate the number of matching phones for a given search string if we make some reasonable assumptions. The most frequent words in the phone names are probably the brands, such as "Samsung" and "Apple". Since there are at least ten brands, we can assume that the number of phones that match a given search term is generally less than 10% of the total number of phones. So the time it takes to execute one search is the number of words in the search string, times the number of phones, times 10% (i.e., divided by 10).
So if we have 100 phones with on average 3 words in the name, then indexing takes 100 * 3 = 300 times storing an association in the idsByWord lookup table. Performing a search with 5 words in the search string takes only 5 * 100 * 10% = 50 relevance increments. This is already much faster than the 1500 word comparisons we needed without lookup tables, although the human behind the computer will not notice the difference in this case.
The speed advantage of the approach with the lookup table further increases with larger inputs:
┌───────────────────┬───────┬────────┬───────┐
│ Problem size │ Small │ Medium │ Large │
├───────────────────┼───────┼────────┼───────┤
│ phones │ 100 │ 20k │ 1M │
│ words per name │ 3 │ 5 │ 8 │
│ search terms │ 5 │ 10 │ 15 │
├───────────────────┼───────┼────────┼───────┤
│ w/o lookup tables │ │ │ │
│ word comparisons │ 1500 │ 1M │ 120M │
├───────────────────┼───────┼────────┼───────┤
│ w/ lookup tables │ │ │ │
│ associations │ 300 │ 100k │ 8M │
│ increments │ 50 │ 20k │ 1.5M │
└───────────────────┴───────┴────────┴───────┘
This is, in fact, still underestimating the speed advantage, since the percentage of phones that match a given search term is likely to drop as the number of phones increases.
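The figures in the table follow directly from the counting rules described above, which can be written down as a small function. This is only a back-of-the-envelope model: estimateCosts is a made-up name, and matchPercent stands for the assumption that a search term matches about 10% of the phones.

```javascript
// Rough cost model from the text, not a benchmark.
// matchPercent is the assumed share of phones matching one search term.
function estimateCosts(phoneCount, wordsPerName, searchTermCount, matchPercent) {
  return {
    // without lookup tables: every name word against every search term
    wordComparisons: phoneCount * wordsPerName * searchTermCount,
    // building the index: one association per word per phone
    associations: phoneCount * wordsPerName,
    // searching the index: one increment per search term per matching phone
    increments: searchTermCount * phoneCount * matchPercent / 100
  };
}

estimateCosts(100, 3, 5, 10);
// { wordComparisons: 1500, associations: 300, increments: 50 }
estimateCosts(20000, 5, 10, 10);
// { wordComparisons: 1000000, associations: 100000, increments: 20000 }
estimateCosts(1000000, 8, 15, 10);
// { wordComparisons: 120000000, associations: 8000000, increments: 1500000 }
```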
Lookup tables make searching much faster. But is it better? As I said before, for small problem sizes, the speed difference will not be noticeable. A disadvantage of the lookup tables is that they require more code, which makes it a bit harder to understand and takes more effort to maintain. They also require a lookup table as large as the number of associations, which means we will be using much more additional memory than before.
To conclude, what is "best" always depends on a tradeoff between different constraints, such as code size, speed and memory usage. It is up to you to decide how you want to weigh these constraints relative to each other.