ArangoDB search by string field - json

I want to do a simple text search on a specific field of a specific collection in ArangoDB. Something like this (in SQL):
SELECT * FROM procedures WHERE procedures.name LIKE '%hemogram%'
I need to search a string field of an object (a document?) that is part of an array, which is itself a field of my actual document:
[
  {
    "name": "Unimed",
    "procedures": [
      {
        "type": "Exames",
        "name": "Endoscopia"
      },
      {
        "type": "Exame",
        "name": "Hemograma"
      }
    ]
  }
]
I want to retrieve, for example, all procedures whose name is LIKE a given string, searching across all documents of my clinics collection.
I've been reading about fulltext indexes but I couldn't understand how to create or how to use them.
Any help would be great!
EDIT
I almost got what I wanted. My problem now is returning just the information I want.
FOR clinic IN clinics
  FILTER LIKE(clinic.procedures[*].name, '%hemogram%', true)
  RETURN {
    clinic_name: clinic.name,
    procedure: clinic.procedures
  }
This returns all the procedures of a given clinic (procedures is an array inside a clinic), not only the procedures whose name field is LIKE my search string. How can I achieve this?

ArangoDB does matching in a similar way to SQL; you just have to address the exact field you want to run the LIKE on. (Note that every document has to be an object at the top level, with the mandatory attributes such as _key; the outer array in your example is not a valid document.) Since procedures is an array, expand it with the [*] operator, and pass true as the third argument of LIKE() to make the match case-insensitive (your data contains "Hemograma" with a capital H):

FOR document IN myCollection
  FILTER LIKE(document.procedures[*].name, '%hemogram%', true)
  RETURN document

You might want to rethink your data model: your documents currently store two different types of entities, clinics and procedures.
You could store clinics in a clinics collection and their procedures in a procedures collection (the separation into two collections also enables optional de-duplication). Then link clinics to procedures, either by an array of procedure _ids in each clinic record, or by using an edge collection to link clinic and procedure documents with edges (see the sketch below).
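If you go that route, a minimal arangosh/AQL sketch could look like the following (assuming ArangoDB 3.x; the edge collection name "offers" and the fulltext index are illustrative assumptions, not something from the question). A fulltext index, which you asked about, matches whole words and prefixes rather than arbitrary %substring% patterns:

// arangosh: create the collections and a fulltext index on procedures.name
db._create("procedures");
db._createEdgeCollection("offers"); // edges from clinics to procedures
db.procedures.ensureIndex({ type: "fulltext", fields: ["name"] });

// AQL: find procedures by word prefix, then walk back to their clinics
FOR proc IN FULLTEXT(procedures, "name", "prefix:hemo")
  FOR clinic IN 1..1 INBOUND proc offers
    RETURN { clinic_name: clinic.name, procedure: proc }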
If you want to keep the current data model, use the following AQL query:
FOR clinic IN clinics
  FOR proc IN clinic.procedures
    FILTER LIKE(proc.name, "%hemo%", true)
    RETURN MERGE(
      UNSET(clinic, "procedures"),
      { procedure: proc }
    )
It's not possible to use the abbreviated syntax (the [*] operator) in your case.
The approach above removes the procedures attribute from the clinic document and adds a new attribute procedure holding the matched procedure object as its value. The catch is that multiple results will be returned per clinic if LIKE() matches more than a single procedure. You could add LIMIT 1 below the FILTER, but that may not be desired.
To return a single result per clinic instead, with the procedures attribute reduced to the matching procedures, a subquery is required:
FOR clinic IN clinics
  LET p = (
    FOR proc IN clinic.procedures
      FILTER LIKE(proc.name, "%hemo%", true)
      RETURN proc
  )
  RETURN MERGE(
    clinic,
    { procedures: p }
  )
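With the example data from the question, the result would have the following shape (a sketch; system attributes like _key and _id omitted). Note that clinics without any match would still be returned, just with an empty procedures array:

[
  {
    "name": "Unimed",
    "procedures": [
      { "type": "Exame", "name": "Hemograma" }
    ]
  }
]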

Related

Using json_extract in SQLite to pull data from parent and child objects

I'm starting to explore the JSON1 library for SQLite and have so far been successful with the basic queries I've created. I'm now looking to create a more complicated query that pulls data from multiple levels.
Here's the example JSON object I'm starting with (and most of the data is very similar).
{
  "height": 140.0,
  "id": "cp",
  "label": {
    "bind": "cp_label"
  },
  "type": "color_picker",
  "user_data": {
    "my_property": 2
  },
  "uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
  "value": {
    "bind": "cp_color"
  },
  "width": 200.0
}
This JSON object is buried about seven levels deep in a JSON structure, and I pulled it from the larger JSON construct using an SQL statement like this:
SELECT value FROM forms, json_tree(forms.formJSON, '$.root')
WHERE type = 'object'
AND json_extract(value, '$.id') = #sControlID
-- In this example, #sControlID is a variable that represents the `id` value we're looking for, which is 'cp'
But what I really need to pull from this object are the following:
the value from key type ("color_picker" in this example)
the values from keys bind ("cp_color" and "cp_label" in this example)
the keys value and label (which have values of {"bind":"<string>"} in this example)
For that last item, the key name (value and label in this case) can be any number of keywords, but no matter the keyword, the value will be an object of the form {"bind":"<some_string>"}. Also, there could be multiple keys that have a bind object associated with them, and I'd need to return all of them.
For the first two items, the keywords will always be type and bind.
With the JSON example above, I'd ideally like to retrieve two rows:
type          key    value
color_picker  value  cp_color
color_picker  label  cp_label
When I use json_extract methods, I end up retrieving the object {"bind":"cp_color"} from the json_tree table, but I also need to retrieve the data from the parent object. I feel like I need to do some kind of union, but my attempts have so far been unsuccessful. Any ideas here?
Note: if the {"bind":"<string>"} object doesn't exist as a child of the parent object, I don't want any rows returned.
Well, I was on the right track and eventually figured it out. I created a separate query for each of the items I was looking for, then INNER JOINed all the json_tree tables from each of the queries so that all the required fields were available, and finally json_extracted the required data from each of the JSON fields I needed. In the end, it gave me exactly what I was looking for, though I'm sure it could be written more efficiently.
For anyone interested, this is what the final query ended up looking like:
SELECT IFNULL(json_extract(parent.value, '$.type'), '_window_'),
       child.key,
       json_extract(child.value, '$.bind')
FROM (
    SELECT json_tree.*
    FROM nui_forms, json_tree(nui_forms.formJSON, '$')
    WHERE type = 'object'
      AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID
) parent
INNER JOIN (
    SELECT json_tree.*
    FROM nui_forms, json_tree(nui_forms.formJSON, '$')
    WHERE type = 'object'
      AND json_extract(value, '$.bind') != 'NULL'
      AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID
) child
ON child.parent = parent.id;
If you have any tips on reducing its complexity, feel free to comment!

Creating nodes and relations from JSON (dynamically)

I've got a couple hundred JSONs in a structure like the following example:
{
  "JsonExport": [
    {
      "entities": [
        {
          "identity": "ENTITY_001",
          "surname": "SMIT",
          "entityLocationRelation": [
            {
              "parentIdentification": "PARENT_ENTITY_001",
              "typeRelation": "SEEN_AT",
              "locationIdentity": "LOCATION_001"
            },
            {
              "parentIdentification": "PARENT_ENTITY_001",
              "typeRelation": "SEEN_AT",
              "locationIdentity": "LOCATION_002"
            }
          ],
          "entityEntityRelation": [
            {
              "parentIdentification": "PARENT_ENTITY_001",
              "typeRelation": "FRIENDS_WITH",
              "childIdentification": "ENTITY_002"
            }
          ]
        },
        {
          "identity": "ENTITY_002",
          "surname": "JACKSON",
          "entityLocationRelation": [
            {
              "parentIdentification": "PARENT_ENTITY_002",
              "typeRelation": "SEEN_AT",
              "locationIdentity": "LOCATION_001"
            }
          ]
        },
        {
          "identity": "ENTITY_003",
          "surname": "JOHNSON"
        }
      ],
      "identification": "REGISTRATION_001",
      "locations": [
        {
          "city": "LONDON",
          "identity": "LOCATION_001"
        },
        {
          "city": "PARIS",
          "identity": "LOCATION_002"
        }
      ]
    }
  ]
}
With these JSON's, I want to make a graph consisting of the following nodes: Registration, Entity and Location. This part I've figured out and made the following:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..locations.*" ) YIELD value AS locations
MERGE(l:Locations{identity:locations.identity, name:locations.city})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..entities.*" ) YIELD value AS entities
MERGE(e:Entities {name:entities.surname, identity:entities.identity})
All the entities and locations should have a relation with the registration. I thought I could do this by using the following code:
MERGE (e)-[:REGISTERED_ON]->(r)
MERGE (l)-[:REGISTERED_ON]->(r)
However, this code doesn't give the desired output: it creates extra "empty" nodes and doesn't connect anything to the registration node. So the first question is: how do I connect the location and entity nodes to the registration node? And in light of the other JSONs, the entities and locations should only be linked to their specific registration.
Furthermore, I would like to create the entity -> location relation and the entity -> entity relation, using the given type of relation (SEEN_AT or FRIENDS_WITH) as the label for each relation. How can this be done? I'm kind of lost at this point and don't see how to solve it. If someone could guide me in the right direction I would be much obliged.
Variable names (like e and r) are not stored in the DB, and are bound to values only within individual queries. MERGE on a pattern with an unbound variable will just create the entire pattern (including creating an empty node for unbound node variables).
When you MERGE a node, you should only specify its unique identifying property, to avoid duplicates. Any other properties you want to set at creation time should be set using ON CREATE SET.
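A minimal Cypher sketch of both points (the label, key, and values mirror the question's data; the first statement shows the pitfall, not a recommendation):

// e and r are unbound here, so MERGE creates two brand-new empty nodes
// plus a relationship between them, instead of connecting existing nodes
MERGE (e)-[:REGISTERED_ON]->(r)

// instead: merge on the unique key only, and set the remaining
// properties only when the node is first created
MERGE (e:Entities {identity: "ENTITY_001"})
ON CREATE SET e.name = "SMIT"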
It is inefficient to parse through the JSON data 3 times to get different parts of it. And it is especially inefficient the way your query was doing it, since each subsequent CALL/MERGE group of clauses is executed multiple times (every previous CALL produces multiple rows, so the number of rows grows multiplicatively). You could use aggregation to get around that, but it is unnecessary in your case, since you can do the entire query in a single pass through the JSON data.
This may work for you:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file, "$.JsonExport.*") YIELD value AS data
MERGE (r:Registration {id: data.identification})
FOREACH (ent IN data.entities |
  MERGE (e:Entities {identity: ent.identity})
  ON CREATE SET e.name = ent.surname
  MERGE (e)-[:REGISTERED_ON]->(r)
  FOREACH (loc1 IN ent.entityLocationRelation |
    MERGE (l1:Locations {identity: loc1.locationIdentity})
    MERGE (e)-[:SEEN_AT]->(l1))
  FOREACH (ent2 IN ent.entityEntityRelation |
    MERGE (e2:Entities {identity: ent2.childIdentification})
    MERGE (e)-[:FRIENDS_WITH]->(e2))
)
FOREACH (loc IN data.locations |
  MERGE (l:Locations {identity: loc.identity})
  ON CREATE SET l.name = loc.city
  MERGE (l)-[:REGISTERED_ON]->(r)
)
For simplicity, it hard-codes the FRIENDS_WITH and REGISTERED_ON relationship types, as MERGE only supports hard-coded relationship types.
Playing with Neo4j/Cypher I've learned some new things and came to another solution for the problem. Based on the given example data, the following creates the nodes and edges dynamically.
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
CALL apoc.merge.node(['Registration'], {id:data.identification}, {},{}) YIELD node AS vReg
UNWIND data.entities AS ent
CALL apoc.merge.node(['Person'], {id:ent.identity}, {}, {id:ent.identity, surname:ent.surname}) YIELD node AS vPer1
UNWIND ent.entityEntityRelation AS entRel
CALL apoc.merge.node(['Person'],{id:entRel.childIdentification},{id:entRel.childIdentification},{}) YIELD node AS vPer2
CALL apoc.merge.relationship(vPer1, entRel.typeRelation, {},{},vPer2) YIELD rel AS ePer
UNWIND data.locations AS loc
CALL apoc.merge.node(['Location'], {id:loc.identity}, {name:loc.city}) YIELD node AS vLoc
UNWIND ent.entityLocationRelation AS locRel
CALL apoc.merge.relationship(vPer1, locRel.typeRelation, {},{},vLoc) YIELD rel AS eLoc
CALL apoc.merge.relationship(vLoc, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg1
CALL apoc.merge.relationship(vPer1, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg2
CALL apoc.merge.relationship(vPer2, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg3
RETURN vPer1,vPer2, vReg, vLoc, eLoc, eReg1, eReg2, eReg3

Is there a way to write U-SQL queries without using EXTRACT?

I have a Get Metadata activity output which is a JSON listing of the blobs in my container. I want to feed these names into my ForEach activity, where some U-SQL query is performed on each blob according to its file name. Is this possible?
You need to include either a SELECT or an EXTRACT. Since you are pulling from files, you are going to want to use EXTRACT.
If I understand your question correctly, you want to run different U-SQL scripts based on the file name.
There are a couple ways to do this:
1) Use If Condition activities in Data Factory to call different U-SQL scripts based on the file name. Nesting the if statements allows you to have more than two options, and there are several string manipulation functions to help you with this. Say one path is taken when @item() contains 'a':
{
  "name": "<Name of the activity>",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@equals(item(), '<file name>')",
      "type": "Expression"
    },
    "ifTrueActivities": [
      {
        "<U-SQL script 1>"
      }
    ],
    "ifFalseActivities": [
      {
        "<U-SQL script 2>"
      }
    ]
  }
}
2) The second option is to use a single U-SQL script and do the split from there. Again, string manipulation functions can help via pattern matching. This has some organizational advantages: you can store the unique scripts in stored procedures, and the U-SQL script simply checks the file name passed in and calls the relevant stored proc.
// This would be added by Data Factory
DECLARE @fileName string = "/Samples/Data/SearchLog.tsv";

IF @fileName == "/Samples/Data/SearchLog.tsv"
THEN
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    OUTPUT @searchlog
    TO @fileName
    USING Outputters.Csv();
ELSE
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM @fileName
        USING Extractors.Tsv();

    OUTPUT @searchlog
    TO "/output/SearchLogResult1.csv"
    USING Outputters.Csv();
END;
Something to think about: Data Lake Analytics is going to be more efficient if you can combine multiple files into one statement, and you can have multiple EXTRACT and OUTPUT statements in a single script. I would encourage you to explore whether you could use pattern matching in your EXTRACT statements to split the U-SQL processing without needing the foreach loop in Data Factory, as sketched below.
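A minimal sketch of that idea, reusing the SearchLog schema from above (the {FileName} token is U-SQL's file-set syntax; it becomes a virtual column you can filter on, and the filter is pushed down so only matching files are read):

@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,
            Urls string,
            ClickedUrls string,
            FileName string // virtual column filled in from the path
    FROM "/Samples/Data/{FileName}.tsv"
    USING Extractors.Tsv();

@subset =
    SELECT UserId, Region, Query
    FROM @searchlog
    WHERE FileName == "SearchLog";

OUTPUT @subset
TO "/output/SearchLogResult1.csv"
USING Outputters.Csv();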

Returning MySQL data as an OBJECT rather than an ARRAY (Knex)

Is there a way to get the output of a MySQL query to list rows in the following structure
{
  1: {voo: bar, doo: dar},
  2: {voo: mar, doo: har}
}
as opposed to
[
  {id: 1, voo: bar, doo: dar},
  {id: 2, voo: mar, doo: har}
]
which I then have to loop through to create the desired object?
I should add that within each row I am also concatenating results to form an object, and from what I've experimented with, you can't GROUP_CONCAT inside a GROUP_CONCAT. As follows:
knex('table')
  .select(
    'table.id',
    'table.name',
    knex.raw(
      `CONCAT("{", GROUP_CONCAT(DISTINCT
        '"', table.voo, '"', ':', '"', table.doo, '"'),
      "}") AS object`
    )
  )
  .groupBy('table.id')
Could GROUP BY be leveraged in any way to achieve this? Generally I'm inexperienced with SQL and don't know what's possible and what's not.
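If nothing server-side works out, here is a minimal sketch of the post-query transform, assuming the rows come back from knex as a plain array of objects, as in the question:

// turn [{id: 1, voo: 'bar', doo: 'dar'}, ...] into {1: {voo: 'bar', doo: 'dar'}, ...}
function keyById(rows) {
  return rows.reduce((acc, { id, ...rest }) => {
    acc[id] = rest; // the id becomes the key; the remaining columns become the value
    return acc;
  }, {});
}

knex('table')
  .select('table.id', 'table.voo', 'table.doo')
  .then(rows => keyById(rows));

On MySQL 5.7.22+ you may also be able to build the object server-side with JSON_OBJECTAGG(id, ...), though combining it with the GROUP_CONCAT trick above is untested here.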

How do I search for a specific string in a JSON Postgres data type column?

I have a column named params in a table named reports which contains JSON.
I need to find which rows contain the text 'authVar' anywhere in the JSON array. I don't know the path or level in which the text could appear.
I just want to search through the JSON with a standard LIKE operator.
Something like:
SELECT * FROM reports
WHERE params LIKE '%authVar%'
I have searched and googled and read the Postgres docs. I don't understand the JSON data type very well, and figure I am missing something easy.
The JSON looks something like this.
[
  {
    "tileId": 18811,
    "Params": {
      "data": [
        {
          "name": "Week Ending",
          "color": "#27B5E1",
          "report": "report1",
          "locations": {
            "c1": 0,
            "c2": 0,
            "r1": "authVar",
            "r2": 66
          }
        }
      ]
    }
  }
]
In Postgres 11 or earlier it is possible to recursively walk through an unknown JSON structure, but it would be rather complex and costly. I would propose the brute-force method, which should work well:
select *
from reports
where params::text like '%authVar%';
-- or
-- where params::text like '%"authVar"%';
-- if you are looking for the exact value
The query is very fast, but it may return unexpected extra rows when the searched string is part of one of the keys rather than a value; e.g. a (hypothetical) document containing the key "authVarEnabled" would also match.
In Postgres 12+ recursive searching in JSONB is quite convenient with the new jsonpath feature.
Find a string value containing authVar:
select *
from reports
where jsonb_path_exists(params::jsonb, '$.** ? (@.type() == "string" && @ like_regex "authVar")');
The jsonpath broken down:
$.**                    find any value at any level (recursive descent)
? ( ... )               where (filter expression)
@.type() == "string"    the value is a string
&&                      and
@ like_regex "authVar"  the value contains 'authVar'
Or find the exact value:
select *
from reports
where jsonb_path_exists(params::jsonb, '$.** ? (@ == "authVar")');
Read more in the documentation:
The SQL/JSON Path Language
jsonpath Type