Elasticsearch: searching for a word like "brother" that is present more than once in an array of JSON

If we write a nested query to get "brother", will we get both occurrences present in the array of JSON, or only a single occurrence?
For example, if we query for the word "brother" and it is present more than twice in the array of JSON, will it show only one occurrence of "brother" or multiple occurrences?
Since we are handling multiple files, we need the ID of the file in which the word is present; and if a single word is present in a file more than once, is it possible to get that word from that single file more than once?
I tried to get the same file ID more than once, because that file contained the word more than once across multiple JSON objects, but I'm getting the file ID only once...

If you are using Kibana SQL, you will search for any occurrence in each Elasticsearch doc. If you search for "brother", you will get one occurrence from each doc that contains the word "brother". If you have 3 docs, and one has the word you are searching for 3 times while the other 2 have the word once each, you will still get 3 hits.

You have to use a for loop in a programming language; note that inner_hits returns at most 3 matches per document by default, so this way you can access up to 3 inner hits in the same file. Load the response JSON in Python, for example, and loop over the hits.
The code - we can fetch them this way:
# 'data' is the parsed search response; 'key' is the name of the nested field
for i in range(len(data["hits"]["hits"])):
    try:
        uniq_id = data["hits"]["hits"][i]["_id"]
        start = data["hits"]["hits"][i]["inner_hits"]["key"]["hits"]["hits"][0]["fields"]["key.start"][0]
        check = data["hits"]["hits"][i]["inner_hits"]["key"]["hits"]["total"]["value"]
    except (KeyError, IndexError):
        continue  # skip docs that lack the expected inner hits
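To actually get every occurrence back per file, you can raise the inner_hits size in the nested query itself. Here is a minimal sketch with the Python Elasticsearch client (exact call signatures vary across client versions); the index name "files", the nested path "key", and the field "key.word" are hypothetical placeholders:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust the host as needed

# Nested query; setting "size" inside inner_hits lifts the default cap of
# 3 matches per document (index and field names are hypothetical)
query = {
    "query": {
        "nested": {
            "path": "key",
            "query": {"match": {"key.word": "brother"}},
            "inner_hits": {"size": 100}
        }
    }
}
data = es.search(index="files", body=query)
for hit in data["hits"]["hits"]:
    file_id = hit["_id"]
    for inner in hit["inner_hits"]["key"]["hits"]["hits"]:
        print(file_id, inner["_source"])  # one line per occurrence in the file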

Related

Power Automate Desktop - Convert a data table (with multiple rows) to JSON

I've been researching the best way to convert a data table from Excel (with multiple rows) to JSON.
I found a solution on here that appears to "mostly" work, but I am not familiar enough with JSON to know if it's converting multiple rows correctly.
Here is the data table that I am starting with (from Excel).
Here are the steps I took to convert this to JSON
Step 1: Set variable called INVObject to be empty to initialize it
Step 3: Added a For each to loop through each Data Row in the Data Table
Step 4: Added a Set Variable to set the INVObject (Custom Object) to the Data Table for each loop in the For each
Step 5: Convert the Custom Object INVObject to JSON
Results: there is one row/object with all 3 rows from the Data table on the same line.
If you scroll to the right, the 2nd row eventually starts, and then the 3rd row.
I was expecting to see 3 lines/rows/objects to represent the 3 different rows in the Data table.
Can someone provide some insight as to whether I am doing something wrong, or if this is the expected result for multiple rows?
Thank You!
There is an option in Actions under Variables: 'Convert Custom Object to JSON'
https://learn.microsoft.com/en-us/power-automate/desktop-flows/actions-reference/variables#convertcustomobjecttojson
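For reference, converting a three-row table should produce a JSON array with one object per row, along these lines (column names invented for illustration):
[
  { "Invoice": "INV-001", "Amount": 100.00 },
  { "Invoice": "INV-002", "Amount": 250.50 },
  { "Invoice": "INV-003", "Amount": 75.25 }
]
If all three rows appear on one long line instead, that may simply be minified JSON (no line breaks) rather than a wrong conversion - which would match the "scroll to the right" behaviour described above.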

Extract comma-separated values from JSON Records within a List with PowerQuery

As part of a tool I am creating for my team I am connecting to an internal web service via PowerQuery.
The web service returns nested JSON, and I have trouble parsing the JSON data to the format I am looking for. Specifically, I have a problem with extracting the content of records in a column to a comma separated list.
The data
The data contains details related to a specific "race" (race_id). What I want to focus on is the information in the driver_codes, which is a List of Records. The number of records varies from 0 to 4, and each record is structured as id: 50000 (50000 could be any 5-digit number). So it could be:
id: 10000
id: 20000
id: 30000
As requested, an example snippet of the raw JSON:
<race>
<race_id>ABC123445</race_id>
<begin_time>2018-03-23T00:00:00Z</begin_time>
<vehicle_id>gokart_11</vehicle_id>
<driver_code>
<id>90200</id>
</driver_code>
<driver_code>
<id>90500</id>
</driver_code>
</race>
I want it to be structured as:
10000,20000,30000
The problem
When I choose "Extract values" on the column with the list, then I get the following message:
Expression.Error: We cannot convert a value of type Record to type Text.
If I instead choose "Expand to new rows", then duplicate rows are created for each unique driver code. I now have several rows per unique race_id, but what I wanted was one row per unique race_id and a concatenated list of driver codes.
What I have tried
I have tried grouping the data by the race_id, but the operations allowed when grouping data do not include concatenating rows.
I have also tried unpivoting the column, but that leaves me with the same problem: I still get multiple rows.
I have googled (and Stack Overflowed) this issue extensively without luck. It might be that I am using the wrong keywords, however, so I apologize if a duplicate exists.
UPDATE: What I have tried based on the answers so far
I tried Alexis Olson's excellent and very detailed method, but I end up with the following error:
Expression.Error: We cannot convert the value "id" to type Number. Details:
Value=id
Type=Type
The error comes from using either of these lines of M code (one with a List.Transform and one without):
= Table.Group(#"Renamed Columns", {"race_id", "begin_time", "vehicle_id"},
{{"DriverCodes", each Text.Combine([driver_code][id], ","), type text}})
= Table.Group(#"Renamed Columns", {"race_id", "begin_time", "vehicle_id"},
{{"DriverCodes", each Text.Combine(List.Transform([driver_code][id], each Number.ToText(_)), ","), type text}})
NB: if I do not write [driver_code][id] but only [id] then I get another error saying that column [id] does not exist.
Here's the JSON equivalent to the XML example you gave:
{"race": {
"race_id": "ABC123445",
"begin_time": "2018-03-23T00:00:00Z",
"vehicle_id": "gokart_11",
"driver_code": [
{ "id": "90200" },
{ "id": "90500" }
]}}
If you load this into the query editor, convert it to a table, and expand out the Value record, you'll have a table with race_id, begin_time, vehicle_id, and a driver_code column holding a list of records.
At this point, choose Expand to New Rows, and then expand the id column so that each driver id gets its own row.
At this point, you can apply the trick #mccard suggested. Group by the first columns and aggregate over the last using, say, max.
This last step produces M code like this:
= Table.Group(#"Expanded driver_code1",
{"Name", "race_id", "begin_time", "vehicle_id"},
{{"id", each List.Max([id]), type text}})
Instead of this, you want to replace List.Max with Text.Combine as follows:
= Table.Group(#"Changed Type",
{"Name", "race_id", "begin_time", "vehicle_id"},
{{"id", each Text.Combine([id], ","), type text}})
Note that if your id column is not in text format, then this will throw an error. To fix this, insert a step before you group rows using Transform tab > Data Type: Text to convert the type. Another option is to use List.Transform inside your Text.Combine like this:
Text.Combine(List.Transform([id], each Number.ToText(_)), ",")
Either way, you should end up with one row per race_id and a comma-separated list of driver codes.
An approach would be to use the Advanced Editor and change the operation done when grouping the data directly there in the code.
First, create the grouping using one of the operations available in the menu. For instance, create a column "Sum" using the Sum operation. It will give an error, but we should get the starting code to work on.
Then, open the Advanced Editor and find the code corresponding to the operation. It should be something like:
{{"Sum", each List.Sum([driver_codes]), type text}}
Change it to:
{{"driver_codes", each Text.Combine([driver_codes], ","), type text}}

NiFi : Regular Expression in ExtractText gets CSV header instead of data

I'm working on a flow where I get CSV files. I want to put the records into different directories based on the first field in the CSV record.
For ex, the CSV file would look like this
country,firstname,lastname,ssn,mob_num
US,xxxx,xxxxx,xxxxx,xxxx
UK,xxxx,xxxxx,xxxxx,xxxx
US,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
I want to get the value of the first field, i.e., country, and put those records into a particular directory: US records go to the US directory, UK records go to the UK directory, and so on.
The flow that I have right now is:
GetFile ----> SplitText (line split count = 1 & header line count = 1) ----> ExtractText (line = (.+)) ----> PutFile (Directory = \tmp\data\${line:getDelimitedField(1)}). I need the header to be replicated across all the split files for a different purpose, so I have to keep it.
The incoming CSV file gets split into multiple flow files with the header successfully. However, the regex I have given in the ExtractText processor evaluates against the split flow files' CSV header instead of the record. So instead of getting US or UK in the "line" attribute, I always get "country", and all the files go to \tmp\data\country. How can I resolve this?
I believe getDelimitedField only works on a single line and is likely not moving past the newline in your split file.
I would advocate for a slightly different approach: alter your ExtractText to find the country code through a regular expression, avoiding the need to include the entire contents of the file as an attribute.
Using a regex of ^.*\n+(\w+) will match past the header line and capture the first set of word characters (up to the comma) of the record, placing them in the attribute you specify for capture group 1 (e.g. country.1).
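As a quick sanity check outside NiFi, the same pattern can be exercised with Python's re module (the sample content below is invented for illustration):
import re

# One split flow file: the replicated header plus a single record
content = "country,firstname,lastname,ssn,mob_num\nUS,xxxx,xxxxx,xxxxx,xxxx"

match = re.search(r"^.*\n+(\w+)", content)
if match:
    print(match.group(1))  # prints "US", the value ExtractText stores in country.1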
I have created a template that should get the value you are looking for available at https://github.com/apiri/nifi-review-collateral/blob/master/stackoverflow/42022249/Extract_Country_From_Splits.xml

Best way to parse a big and intricated Json file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendar of 4730 Airbnb rooms, identified by their IDs.
Here is an example of one JSON file: https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess it implies a series of nested forEach statements, but I can't find the right way to do this with OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries that confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution with R (or Python) would also be fine for me.
Rather than just creating your project as text and parsing it out with GREL...
The best way is to select the JSON record part that you want to work with using our visual importer wizard for JSON and XML files (you can even use a URL pointing to a JSON file, as in your example). (A video tutorial shows how here: https://www.youtube.com/watch?v=vUxdB-nl0Bw )
Select the JSON part that contains the records you want to parse and work with (this can be any repeating part; just select one of them and OpenRefine will extract all the rest).
Limit the number of data rows that you want to load in during creation, or leave the default of all rows.
Click Create Project, and now you're in Rows mode. However, if you think Records mode might be better suited for context, just import the project again as JSON and then select the next outside area of the content, perhaps a larger array that contains a key field. In the example, the key field would probably be the Date, which is why I highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for all, and add it to our wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OpenRefine (OR) array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of days in the month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OR can't store arrays directly in a cell - it has to be a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing the JSON expression for one month.
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from the example URL above, this gives me 441 rows for the single ID, each containing the JSON describing the availability & price for a single day. At this point you can use the 'fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
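If you'd rather do this step in Python (which the question allows), the same nested walk is just two loops. A rough sketch, assuming a saved copy of the API response and using the field names referenced above; the exact date and price field names in the real payload are assumptions:
import json

with open("calendar.json") as f:  # a saved copy of the API response
    data = json.load(f)

rows = []
for month in data["calendar_months"]:  # twelve months, as in the GREL above
    for day in month["days"]:          # the array of days in each month
        # "available" is used above; "date" and "price" are assumed field names
        rows.append((day.get("date"), day.get("available"), day.get("price")))

print(len(rows), "day records extracted")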

Import CSV column from different file into new file

I have 2 CSV files that are almost identical, with the following differences:
The first has a column, "date".
The second doesn't have "date" and also has 50 fewer rows than the first (the lists are keyed by "email").
They are lists of subscribers with the date created. The second, however, is the updated list with the subscribers who wanted to be removed taken out, but it no longer has the date created.
Is there any way to import the "date" column from the 1st CSV into the 2nd CSV by making a reference to the "email" column, so I can get the correct date for each subscriber?
Sorry, there doesn't seem to be a ready-made command-line tool available (building one is probably an evening's worth of effort).
You could look at different ways; one complex way is to load the files into database tables, do the merge (using a select and join on the two tables), and export the result back as CSV.
The simplest I could think of is to use R (given that you have header names in your CSVs):
# read.csv assumes a header row by default
csv1_data <- read.csv('/path/to/csv1.csv')
csv2_data <- read.csv('/path/to/csv2.csv')
# join on the shared "email" column so each subscriber gets the right date
merged_csv <- merge(csv1_data, csv2_data, by = "email")
write.table(merged_csv, file = "/path/to/merged_csv.csv", sep = ",", row.names = FALSE)
The first 2 lines load the data into R, the merge joins the two tables on the shared "email" column, and the final line exports the result as a CSV file with the headers.
Hope this helps!
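If you'd rather do it in Python, a rough pandas equivalent looks like this (the paths are placeholders; it assumes both files share an "email" column header):
import pandas as pd

# Load both CSVs; the first has the "date" column, the second doesn't
csv1 = pd.read_csv("/path/to/csv1.csv")
csv2 = pd.read_csv("/path/to/csv2.csv")

# Left join on "email": keeps every row of the second (updated) file and
# carries the "date" column over from the first
merged = csv2.merge(csv1[["email", "date"]], on="email", how="left")
merged.to_csv("/path/to/merged.csv", index=False)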