Beginner RegExp question. I have a JSON in my NiFi ExtractText and there are 2 fields I want to extract. How would I use a regex to do this?
[
{
"id": "12erf3-312331-233"
},
[
{
"id": "1234",
"id2": "1234",
"id3": "1234"
},
{
"id": "1234",
"id2": "1234",
"id3": "1234"
},
{
"id": "1234",
"id2": "1234",
"id3": "1234"
},
{
"id": "1234",
"id2": "1234",
"id3": "5555"
}
]
]
get the first id: "12erf3-312331-233" (this is a uuid)
get all the second array: [ { "id": "1234" ... "id3": "5555" } ]
To achieve this, you can use the ReplaceText processor. Set Search Value to (?s).*("id": )(".*").*}.*(\[.*\]).*\] and Replacement Value to {$1$2,data:$3}. Make sure Evaluation Mode is set to "Entire text".
To check the regex quickly, you can use the link I used: https://regex101.com/r/p6T2Vw/1
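The Search Value above can also be tried outside NiFi. The following is a minimal sketch in plain JavaScript, not NiFi itself: it assumes the FlowFile content is pretty-printed like the question's JSON (shortened here to two inner objects), uses the `s` flag in place of the inline `(?s)` because JavaScript regex literals do not accept that form, and quotes `data` in the replacement so the result parses as JSON.

```javascript
// Same shape as the question's content, shortened to two inner objects
const input = JSON.stringify([
  { id: '12erf3-312331-233' },
  [
    { id: '1234', id2: '1234', id3: '1234' },
    { id: '1234', id2: '1234', id3: '5555' },
  ],
], null, 2)

// The "s" (dotAll) flag stands in for the inline (?s) of the Java regex
const pattern = /.*("id": )(".*").*}.*(\[.*\]).*\]/s

// Assumption: "data" is quoted here, unlike the original Replacement Value,
// so that the output is valid JSON
const output = input.replace(pattern, '{$1$2,"data":$3}')
const parsed = JSON.parse(output)

console.log(parsed.id)          // the first id
console.log(parsed.data.length) // number of objects in the second array
```

Only the first "id" can match, because it is the only one followed by both a closing brace and an opening bracket.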
Related
I want to read a JSON file in a shell script, get the id based on the name attribute, and store it in a variable, without using "jq". Please suggest how this can be done. The JSON looks like:
{
"elements": [
{
"id": "1",
"internalId": "AA",
"name": "SampleService"
},
{
"id": "2",
"internalId": "BB",
"name": "Loan_Evaluation"
},
{
"id": "3",
"internalId": "CC",
"name": "Miniloan Service"
}
]
}
I have an array of objects. I am able to sort the items using lodash's _.orderBy().
However, in one scenario I have to sort by subject, which is an array of objects. The items inside the subject array are already sorted by name.
Since subject is an array of objects, I need to use its first item for sorting.
[
{
"id": "1",
"name": "peter",
"subject": [
{
"id": "1",
"name": "maths"
},
{
"id": "2",
"name": "social"
}
]
},
{
"id": "2",
"name": "david",
"subject": [
{
"id": "2",
"name": "physics"
},
{
"id": "3",
"name": "science"
}
]
},
{
"id": "3",
"name": "Justin",
"subject": [
]
}
]
You can use _.get() to extract the name (or id) of the first item in subject. If no item exists, _.get() returns undefined, which could be replaced with a default value. In this case, we don't want to use an empty string as the default, since that would change the order. Instead, I check whether the value is a string: if it is, I convert it to lower case; if not, I return it as is.
const arr = [{"id":"1","name":"peter","subject":[{"id":"1","name":"maths"},{"id":"2","name":"social"}]},{"id":"2","name":"david","subject":[{"id":"2","name":"physics"},{"id":"3","name":"science"}]},{"id":"3","name":"Justin","subject":[]}]
const result = _.orderBy(arr, o => {
const name = _.get(o, 'subject[0].name')
return _.isString(name) ? name.toLowerCase() : name
})
console.log(result)
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.js"></script>
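To see why the default value matters, here is a lodash-free sketch. The helper names `key` and `sortByKey` are made up for illustration, and the comparator only approximates how I understand _.orderBy to treat undefined (sorted last in ascending order); it is not lodash's implementation.

```javascript
const arr = [
  { id: '1', name: 'peter', subject: [{ id: '1', name: 'maths' }, { id: '2', name: 'social' }] },
  { id: '2', name: 'david', subject: [{ id: '2', name: 'physics' }, { id: '3', name: 'science' }] },
  { id: '3', name: 'Justin', subject: [] },
]

// Sort key: lower-cased name of the first subject, or the given fallback
const key = (o, fallback) => {
  const name = o.subject.length ? o.subject[0].name : fallback
  return typeof name === 'string' ? name.toLowerCase() : name
}

// Ascending sort on a copy; undefined keys are placed last
const sortByKey = (items, fallback) => [...items].sort((a, b) => {
  const x = key(a, fallback)
  const y = key(b, fallback)
  if (x === undefined || y === undefined) return (x === undefined) - (y === undefined)
  return x < y ? -1 : x > y ? 1 : 0
})

const withUndefined = sortByKey(arr, undefined).map(o => o.name)
const withEmptyString = sortByKey(arr, '').map(o => o.name)
console.log(withUndefined)   // [ 'peter', 'david', 'Justin' ]
console.log(withEmptyString) // [ 'Justin', 'peter', 'david' ]
```

With an empty string as the fallback, the item without subjects jumps to the front, which is the order change the answer warns about.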
Use _.sortBy with an iteratee function. The function can look at the subject key of the item it receives (I think it's the subject you want to compare?)
Since the question is also tagged ES6, here is a JS-only solution using Array.sort:
let arr = [ { "id": "1", "name": "peter", "subject": [ { "id": "1", "name": "maths" }, { "id": "2", "name": "social" } ] }, { "id": "2", "name": "david", "subject": [ { "id": "2", "name": "physics" }, { "id": "3", "name": "science" } ] }, { "id": "3", "name": "Justin", "subject": [] }, ]
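// Note: Array.sort sorts arr in place
const result = arr.sort((a, b) =>
  a.subject.length && b.subject.length
    ? a.subject[0].name.localeCompare(b.subject[0].name)
    // items without subjects go last; two such items compare as equal
    : b.subject.length - a.subject.length)
console.log(result)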
I have the following db:
{
"a": [{
"name": "foo",
"thing": [{
"name": "bar",
"lyrics": ["1", "2", "3"]
}]
}, {
"name": "abc",
"thing": [{
"name": "123",
"list": ["one", "two"]
}]
}]
}
I can't seem to query it correctly. These two queries return the same thing, the entire db:
db.test.find({"a.name":"abc"})
db.test.find({"a.name":"foo"})
How do I find one collection instead of the whole db?
I would expect the first query to return:
{
"name": "abc",
"thing": [{
"name": "123",
"list": ["one", "two"]
}]
}
The two queries return the same document, because both queries match it.
This is one document
[{
"a": [{
"name": "foo",
"thing": [{
"name": "bar",
"lyrics": ["1", "2", "3"]
}]
}, {
"name": "abc",
"thing": [{
"name": "123",
"list": ["one", "two"]
}]
}]
}]
These are two documents
[{
"a": [{
"name": "foo",
"thing": [{
"name": "bar",
"lyrics": ["1", "2", "3"]
}]
}]
},
{
"a": [{
"name": "abc",
"thing": [{
"name": "123",
"list": ["one", "two"]
}]
}]
}]
You can get stats on a collection like so:
db.test.stats()
"count" will tell you how many documents there are.
Edit: To add to this, in your collection "test" the document has one field, "a", which is an array holding objects (subdocuments). It has two array elements:
First
{
"name": "foo",
"thing": [{
"name": "bar",
"lyrics": ["1", "2", "3"]
}]
}
Second
{
"name": "abc",
"thing": [{
"name": "123",
"list": ["one", "two"]
}]
}
Everything inside the curly braces {..}, including the braces themselves, is one single document, i.e. your whole database contains only one document, which you receive for any matching query. To get the desired result, you have to rewrite your JSON as an array of documents inside square brackets [..].
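The point made in the answers above, that a query returns whole matching documents rather than the matching array element, can be mimicked in plain JavaScript. This is only a stand-in for illustration: `find` and `matches` are hypothetical helpers, not MongoDB's API.

```javascript
// A document matches when any element of its "a" array has the given name
const matches = (doc, name) => doc.a.some(el => el.name === name)

// Stand-in for a query: returns every whole document that matches
const find = (collection, name) => collection.filter(doc => matches(doc, name))

const oneDocument = [
  { a: [
    { name: 'foo', thing: [{ name: 'bar', lyrics: ['1', '2', '3'] }] },
    { name: 'abc', thing: [{ name: '123', list: ['one', 'two'] }] },
  ] },
]

const twoDocuments = [
  { a: [{ name: 'foo', thing: [{ name: 'bar', lyrics: ['1', '2', '3'] }] }] },
  { a: [{ name: 'abc', thing: [{ name: '123', list: ['one', 'two'] }] }] },
]

// Both queries hit the same single document
console.log(find(oneDocument, 'abc')[0] === find(oneDocument, 'foo')[0]) // true

// With two documents, only the one containing "abc" comes back
console.log(find(twoDocuments, 'abc').length) // 1
```

If the data has to stay in one document, MongoDB's positional projection, for example db.test.find({"a.name": "abc"}, {"a.$": 1}), may be worth checking in the documentation as a way to return only the first matching array element.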
I have a JSON file that stores my data, and I convert it to CSV to edit it. But when I convert it back to JSON, the structure is lost. How can I convert my CSV back to the same structure as my original JSON?
JSON
{
"product": [
{
"id": "item0001",
"category": "12",
"name": "Name1",
"tag": "tag1",
"more": [
{
"id": "1",
"name": "AL"
},
{
"id": "1",
"name": "BS"
}
],
"active": true
},
{
"id": "item0002",
"categoryId": "13",
"name": "Name2",
"tag": "tag2",
"size": "2",
"more": [
{
"id": "2",
"name": "DL"
},
{
"id": "2",
"name": "AS"
}
],
"active": true
}
]
}
CSV
id,categoryId,name,shortcut,more/0/optionId,more/0/price,more/1/optionId,more/1/price,active,more/2/optionId,more/2/price,spanSize
item0001,ab92d2c6-010e-4182-844d-65050e746617,Name1,Shortcut1,1,60,1,70,TRUE,,,
item0002,ab92d2c6-010e-4182-844d-65050e746617,Name2,Shortcut2,2,60,2,70,TRUE,2,2,4
You can use Miller (mlr) to convert your file both ways
https://miller.readthedocs.io/en/latest/flatten-unflatten/
first from JSON to CSV
mlr --ijson --ocsv cat test.json > test.csv
then edit CSV (Visidata is a very nice command line tool for the job)
then convert it back to JSON
mlr --icsv --ojson cat test.csv > test_v2.json
If you want JSON Lines output instead, use --ojsonl
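For readers curious what "unflattening" involves, here is a hand-rolled sketch in JavaScript. It is not Miller's implementation: `unflatten` is a hypothetical helper, and it assumes the `/` separator and 0-based indices seen in the question's CSV header (for example more/0/optionId).

```javascript
// Rebuild a nested object from flattened path keys such as "more/0/optionId".
// Numeric path segments become array indices; empty CSV cells are skipped.
function unflatten(flat, sep = '/') {
  const root = {}
  for (const [path, value] of Object.entries(flat)) {
    if (value === '') continue
    const keys = path.split(sep)
    let node = root
    keys.forEach((key, i) => {
      if (i === keys.length - 1) {
        node[key] = value
      } else {
        // Create an array when the next segment is a number, otherwise an object
        if (node[key] === undefined) node[key] = /^\d+$/.test(keys[i + 1]) ? [] : {}
        node = node[key]
      }
    })
  }
  return root
}

// One CSV row, already parsed into header/value pairs
const row = {
  id: 'item0001',
  name: 'Name1',
  'more/0/optionId': '1',
  'more/0/price': '60',
  'more/1/optionId': '1',
  'more/1/price': '70',
  'more/2/optionId': '',
  'more/2/price': '',
}

const nested = unflatten(row)
console.log(JSON.stringify(nested, null, 2))
```

All values stay strings here; restoring types such as the boolean `active` would need an extra step.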
I want to index & search nested json in solr. Here is my json code
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
When I try to index it, I get the error "Error parsing JSON field value. Unexpected OBJECT_START".
When we tried to index it using a multivalued field, we weren't able to search on that field; it returns "Undefined Field".
Also, please advise whether I need to make any changes to the schema.xml file.
You are nesting child documents within your document. You need to use the proper syntax for nested child documents in JSON:
[
{
"id": "1",
"title": "Solr adds block join support",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "2",
"comments": "SolrCloud supports it too!"
}
]
},
{
"id": "3",
"title": "Lucene and Solr 4.5 is out",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "4",
"comments": "Lots of new features"
}
]
}
]
Have a look at this article which describes JSON child documents and block joins.
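As a rough illustration of that reshaping, the sketch below moves a nested array from the question's document under the _childDocuments_ key. It is plain JavaScript, not a Solr client: `toSolrParent` is a hypothetical helper, the child id scheme is an assumption (the question's child objects have no id), and the socialtags and topic objects are left out of the sample.

```javascript
// Move a nested array of objects under the _childDocuments_ key.
// Assumption: each child gets an id derived from the parent id and its position.
function toSolrParent(doc, childField) {
  const { [childField]: children = [], ...parent } = doc
  return {
    ...parent,
    _childDocuments_: children.map((child, i) => ({
      id: `${doc.id}_${childField}_${i + 1}`,
      ...child,
    })),
  }
}

const doc = {
  id: '44444',
  headline: 'testing US',
  generaltags: [
    { type: 'person', name: 'Jayalalitha', relevance: '0.334', count: 1 },
    { type: 'person', name: 'Kumar', relevance: '0.234', count: 1 },
  ],
}

const solrDoc = toSolrParent(doc, 'generaltags')
console.log(JSON.stringify([solrDoc], null, 2))
```

The printed array has the same outer shape as the example above: a list of parent documents, each carrying its children in _childDocuments_.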
Using the format mentioned by #qux, you will get the error "Expected: OBJECT_START but got ARRAY_START at [16]" with "code": 400, because JSON starting with [....] is parsed as a JSON array.
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
The above format is correct.
Regarding searching: use the index to search for the elements of the JSON array.
A workaround is to keep the whole JSON object inside another JSON object and then index it.
I was suggesting keeping the whole data inside another JSON object. You can try the following:
{
"data": [
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
]
}
see the syntax in http://yonik.com/solr-nested-objects/
$ curl http://localhost:8983/solr/demo/update?commitWithin=3000 -d '
[
{id : book1, type_s:book, title_t : "The Way of Kings", author_s : "Brandon Sanderson",
cat_s:fantasy, pubyear_i:2010, publisher_s:Tor,
_childDocuments_ : [
{ id: book1_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
stars_i:5, author_s:yonik,
comment_t:"A great start to what looks like an epic series!"
}
,
{ id: book1_c2, type_s:review, review_dt:"2014-03-15T12:00:00Z",
stars_i:3, author_s:dan,
comment_t:"This book was too long."
}
]
}
]'
Supported from Solr 5.3.