Regex Operation to Extract html Strings


I have the following html code:
'"height": { "@type": "QuantitativeValue", "value": "6-1" },\n
"weight": {"@type": "QuantitativeValue", "value": "195 lbs" }\n}\n'
I want to create a Regex that'll extract the height and weight values (6-1 and 195 lbs). What re expression can do this?

If you don't have anything else with the pattern "value": "", then just use:
value":\s"?(.*)"
https://regex101.com/r/clWVkg/1
If you do, then you can specify that you only want the values from height and weight caught:
(height|weight).*"value":\s"?(.*)"
https://regex101.com/r/5ShdKO/1
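This checks for the word height or weight first, then skips everything up to "value": before capturing the value. Note that (.*) is greedy rather than lazy: it runs to the last double quote on the line, which works here as long as each value sits on its own line. The value is in the second capture group (the first holds height or weight); in the simpler pattern above it is the first group.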
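A minimal Python sketch of the same idea, assuming the input contains real newlines and that the sample string below is a fair reconstruction of the question's data. The variable names are illustrative, and the value group is tightened to [^"]* so it stops at the next quote instead of depending on greedy backtracking:

```python
import re

# Reconstructed sample input (an assumption about the real data).
text = (
    '"height": { "@type": "QuantitativeValue", "value": "6-1" },\n'
    '"weight": {"@type": "QuantitativeValue", "value": "195 lbs" }\n}\n'
)

# Group 1: the key name; group 2: the text between the quotes after "value":.
pattern = re.compile(r'"(height|weight)".*?"value":\s*"([^"]*)"')

# findall returns one (key, value) tuple per match.
values = {name: value for name, value in pattern.findall(text)}
print(values)
```

If the data is complete JSON, parsing it with a JSON parser and reading the keys directly is likely to be more robust than any regex.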

Related

Regex: Remove Commas within quotes

I'm using NiFi and I have a series of JSONs that look like this:
{
"url": "RETURNED URL",
"repository_url": "RETURNED URL",
"labels_url": "RETURNED URL",
"comments_url": "RETURNED URL",
"events_url": "RETURNED URL",
"html_url": "RETURNED URL",
"id": "RETURNED_ID",
"node_id": "RETURNED id",
"number": 10,
...
"author_association": "xxxx",
"active_lock_reason": null,
"body": "text text text, text text, text text text, text, text text",
"performed_via_github_app": null
}
My focus is on the "body" attribute. Because I'm merging them into one giant JSON to convert into a csv, I need the commas within the "body" text to go away (to help with possible NLP later down the road as well). I know I can just use the replace text, but capturing the commas themselves is the part I'm struggling with. So far I have the following:
((?<="body"\s:\s").*(?=",))
Every guide I look at, though, doesn't match the commas within the quotes. Any suggestions?

You can use
(\G(?!^)|\"body\"\s*:\s*\")([^\",]*),
In case there are escape sequences in the string use
(\G(?!^)|\"body\"\s*:\s*\")([^\",\\]*(?:\\.[^\",\\]*)*),
See the regex demo (and regex demo #2), replace with $1$2.
Details:
(\G(?!^)|\"body\"\s*:\s*\") - Group 1 ($1): either the end of the previous match, or "body", zero or more whitespace characters, :, zero or more whitespace characters, and the opening "
([^\",]*) - Group 2 ($2): any zero or more chars other than " and ,
, - a comma (to be removed/replaced).
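For readers testing this outside NiFi: Python's built-in re module does not support \G, so the sketch below swaps in a different technique. It matches the whole "body" string once and strips the commas in a replacement callback. The function name and sample string are hypothetical.

```python
import re

# Group 1: key and opening quote; group 2: string contents, allowing
# backslash escapes such as \" ; group 3: the closing quote.
BODY = re.compile(r'("body"\s*:\s*")((?:[^"\\]|\\.)*)(")')

def strip_body_commas(json_text: str) -> str:
    """Remove commas inside the "body" value, leaving all other commas alone."""
    return BODY.sub(
        lambda m: m.group(1) + m.group(2).replace(",", "") + m.group(3),
        json_text,
    )

# Hypothetical sample with an escaped quote inside the body text.
sample = '{"id": "1", "body": "text, more \\"quoted, text\\", end", "x": null}'
cleaned = strip_body_commas(sample)
print(cleaned)
```

The commas that separate the JSON members are untouched because they fall outside group 2.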

how to extract properly when sqlite json has value as an array

I have a SQLite database, and one of its fields stores a complete JSON object. I need to run some JSON select queries against it. As you can see in my JSON below,
the ALL key has a value that is an array. We need to extract some data, such as all comments where the "pod" field is fb. How do I extract this properly when the JSON value is an array?
select json_extract(data,'$."json"') from datatable ; gives me the entire thing. Then I do
select json_extract(data,'$."json"[0]') but I don't want to do it manually. I want to iterate.
Kindly suggest some sources where I can study this and work on it.
MY JSON
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
}]
}
]
}
create table datatable ( path string , data json1 );
insert into datatable values("1" , json('<abovejson in a single line>'));

Simple List
Where your JSON represents a "simple" list of comments, you want something like:
select key, value
from datatable, json_each( datatable.data, '$.ALL' )
where json_extract( value, '$.pod' ) = 'fb' ;
which, using your sample data, returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
The use of json_each() returns a row for every element of the input JSON (datatable.data), starting at the path $.ALL (where $ is the top-level, and ALL is the name of your array: the path can be omitted if the top-level of the JSON object is required). In your case, this returns one row for each comment entry.
The fields of this row are documented at 4.13. The json_each() and json_tree() table-valued functions in the SQLite documentation: the two we're interested in are key (very roughly, the "row number") and value (the JSON for the current element). The latter will contain elements called comments and pod, etc.
Because we are only interested in elements where pod is equal to fb, we add a where clause, using json_extract() to get at pod (where $.pod is relative to value returned by the json_each function).
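The same query can be tried from Python's standard sqlite3 module. This is a sketch that assumes the bundled SQLite build includes the JSON functions (most recent builds do); the sample document is a trimmed copy of the question's data, and the column types are simplified to text.

```python
import json
import sqlite3

# Trimmed copy of the question's JSON (top-level list only).
doc = {
    "ALL": [
        {"comments": "your site is awesome", "pod": "passcode"},
        {"comments": "your channel is good", "data": ["youTube"], "pod": "library"},
        {"comments": "you like everything", "data": ["facebook"], "pod": "fb"},
    ]
}

conn = sqlite3.connect(":memory:")
conn.execute("create table datatable (path text, data text)")
conn.execute("insert into datatable values (?, json(?))", ("1", json.dumps(doc)))

# json_each yields one row per element of the array at $.ALL.
rows = conn.execute(
    """
    select key, value
    from datatable, json_each(datatable.data, '$.ALL')
    where json_extract(value, '$.pod') = 'fb'
    """
).fetchall()

for key, value in rows:
    print(key, json.loads(value)["comments"])
```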
Nested List
If your JSON contains nested elements (something I didn't notice at first), then you need to use the json_tree() function instead of json_each(). Whereas the latter will only iterate over the immediate children of the node specified, json_tree() will descend recursively through all children from the node specified.
To give us some data to work with, I have augmented your test data with an extra element:
create table datatable ( path string , data json1 );
insert into datatable values("1" , json('
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
},
{
"comments": "inserted by TripeHound",
"data": ["facebook"],
"pod": "fb"
}]
}
]
}
'));
If we simply switch to using json_tree(), then we see that a simple query (with no where clause) returns every element of the source JSON, at every level of nesting:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' ) limit 10 ;
ALL|[{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"},{"comments":"your channel is good","data":["youTube"],"pod":"library"},{"comments":"you like everything","data":["facebook"],"pod":"fb"},{"data":["twitter"],"pod":"tw","ALL":[{"data":[{"codeLevel":"3"}],"pod":"mo","pod2":"p"},{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}]}]
0|{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"}
comments|your site is awesome
pod|passcode
originalDirectory|case1
1|{"comments":"your channel is good","data":["youTube"],"pod":"library"}
comments|your channel is good
data|["youTube"]
0|youTube
pod|library
Because JSON objects are mixed in with simple values, we can no longer simply add where json_extract( value, '$.pod' ) = 'fb' because this produces errors when value does not represent an object. The simplest way around this is to look at the type values returned by json_each()/json_tree(): these will be the string object if the row represents a JSON object (see above documentation for other values).
Adding this to the where clause (and relying on "short-circuit evaluation" to prevent json_extract() being called on non-object rows), we get:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
which returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
1|{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}
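A hedged variant of the nested query, run through Python's sqlite3 module (again assuming the JSON functions are available). Here SQL only selects the object rows and the pod test is applied in Python, so the result does not depend on the order in which SQLite evaluates the where clause. The sample data is a shortened, hypothetical version of the augmented document.

```python
import json
import sqlite3

# Shortened, hypothetical version of the nested sample data.
doc = {
    "ALL": [
        {"comments": "you like everything", "data": ["facebook"], "pod": "fb"},
        {
            "data": ["twitter"],
            "pod": "tw",
            "ALL": [
                {"data": [{"codeLevel": "3"}], "pod": "mo"},
                {"comments": "nested comment", "data": ["facebook"], "pod": "fb"},
            ],
        },
    ]
}

conn = sqlite3.connect(":memory:")
conn.execute("create table datatable (path text, data text)")
conn.execute("insert into datatable values (?, json(?))", ("1", json.dumps(doc)))

# json_tree walks every level; keep only rows that are JSON objects.
object_rows = conn.execute(
    """
    select value
    from datatable, json_tree(datatable.data, '$.ALL')
    where type = 'object'
    """
).fetchall()

# Filter on pod in Python rather than in SQL.
fb_comments = [
    obj["comments"]
    for (value,) in object_rows
    for obj in [json.loads(value)]
    if obj.get("pod") == "fb"
]
print(fb_comments)
```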
If desired, we could use json_extract() to break apart the returned objects:
.mode column
.headers on
.width 30 15 5
select json_extract( value, '$.comments' ) as Comments,
json_extract( value, '$.data' ) as Data,
json_extract( value, '$.pod' ) as POD
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
Comments Data POD
------------------------------ --------------- -----
you like everything ["facebook"] fb
inserted by TripeHound ["facebook"] fb
Note: If your structure contained other objects, of different formats, it may not be sufficient to simply select for type = 'object': you may have to devise a more subtle filtering process.

Get last element of array by parsing JSON with Neo4j APOC

Short task description: I need to get the last element of an array/list of one of the fields in nested JSON, here the input JSON file:
{
"origin": [{
"label": "Alcohol drinks",
"tag": [],
"type": "string",
"xpath": []
},
{
"label": "Wine",
"tag": ["red", "white"],
"type": "string",
"xpath": ["Alcohol drinks"]
},
{
"label": "Port wine",
"tag": ["Portugal", "sweet", "strong"],
"type": "string",
"xpath": ["Alcohol drinks", "Wine"]
},
{
"label": "Sandeman Cask 33",
"tag": ["red", "expensive"],
"type": "string",
"xpath": ["Alcohol drinks", "Wine", "Port wine"]
}
]
}
I need to get the last element of "xpath" field, in order to create relationship with appropriate "label". Here is the code, which creates connection to all elements mentioned in "xpath", I need just connection to the last one:
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
MERGE(label:concept{name:or.label})
ON CREATE SET label.type = or.type
FOREACH(tagName IN or.tag | MERGE(tag:concept{name:tagName})
MERGE (tag)-[r:link]-(label)
ON CREATE SET r.Weight=1
ON MATCH SET r.Weight=r.Weight+1)
FOREACH(xpathName IN or.xpath | MERGE (xpath:concept{name:xpathName})
MERGE (label)-[r:link]-(xpath))
Probably there is something like:
apoc.agg.last(or.xpath)
but that returns just an array of arrays of all "xpath" values from all 4 records of "origin".
I would appreciate any help; there are probably some workarounds (not necessarily the one I proposed) that solve this issue. Thank you in advance!
N.B. All this should be done from an app, not from within Neo4j browser.

Probably the easiest way would be to split this query into two queries if you want to only take the xpath array of the last element in the origin object.
Query: 1
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
MERGE(label:concept{name:or.label})
ON CREATE SET label.type = or.type
FOREACH(tagName IN or.tag | MERGE(tag:concept{name:tagName})
MERGE (tag)-[r:link]-(label)
ON CREATE SET r.Weight=1
ON MATCH SET r.Weight=r.Weight+1)
Query 2:
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
WITH value.origin[-1] as or
MATCH(label:concept{name:or.label})
FOREACH(xpathName IN or.xpath | MERGE (xpath:concept{name:xpathName})
MERGE (label)-[r:link]-(xpath))
Combining these two queries into a single one feels hacky anyway and I would avoid it, but I guess you can do the following.
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
MERGE(label:concept{name:or.label})
ON CREATE SET label.type = or.type
FOREACH(tagName IN or.tag | MERGE(tag:concept{name:tagName})
MERGE (tag)-[r:link]-(label)
ON CREATE SET r.Weight=1
ON MATCH SET r.Weight=r.Weight+1)
// Any aggregation function will break the UNWIND loop
// and return a single row as we want to write it only once
WITH value.origin[-1] as last, count(*) as agg
FOREACH(xpathName IN last.xpath |
MERGE(label:concept{name:last.label})
MERGE (xpath:concept{name:xpathName})
MERGE (label)-[r:link]-(xpath))

Sounds like you're looking for the last() function? This will return the last element of a list.
In this case, since you UNWIND the origin to 4 rows, you'll get the last element of the list for each of those rows.
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
RETURN last(or.xpath) as last
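To make the per-row behaviour concrete, here is a small Python sketch of the result the question seems to be after: each label paired with the last entry of its own xpath list. This is plain Python rather than Cypher, the data is a trimmed copy of the question's file, and the name parents is illustrative.

```python
# Trimmed copy of the "origin" array from the question.
origin = [
    {"label": "Alcohol drinks", "xpath": []},
    {"label": "Wine", "xpath": ["Alcohol drinks"]},
    {"label": "Port wine", "xpath": ["Alcohol drinks", "Wine"]},
    {"label": "Sandeman Cask 33", "xpath": ["Alcohol drinks", "Wine", "Port wine"]},
]

# Map each label to the last xpath entry, or None when xpath is empty.
parents = {
    entry["label"]: (entry["xpath"][-1] if entry["xpath"] else None)
    for entry in origin
}

for label, parent in parents.items():
    print(label, "->", parent)
```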

JSON Schema - allow null with regex pattern

Would like to allow null on an optional date property where the date format is validated with a regex expression. Is this even possible?
"dateOfRetirement": {
"description": "Optional. Format: yyyy-MM-dd.",
"type": ["string", "null"],
"pattern": "^\\d{4}-\\d{2}-\\d{2}$"
}

To get that, you have to add an alternative to your regex.
Your regex becomes (assuming your regex syntax has no error!):
^(\\d{4}-\\d{2}-\\d{2}|null)$
Steps done:
encapsulate the original regex in parentheses (())
add an or-operator to the regex (|)
add the second alternative, null, after the or-operator
In the end the regex allows either a valid date format or the text null.

I don't think that will work when the value is an actual null, as in "column": null.
The regex only accounts for the string "null", as in "column": "null".
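As far as the JSON Schema specification goes, pattern constrains string instances only and is skipped for every other type, so the schema in the question should already accept a real null without changing the regex. The sketch below imitates just the two keywords involved; is_valid is a toy helper written for this illustration, not a real validator or any library's API.

```python
import re

# The schema from the question, reduced to the two relevant keywords.
schema = {"type": ["string", "null"], "pattern": r"^\d{4}-\d{2}-\d{2}$"}

def is_valid(instance, schema) -> bool:
    """Toy check for "type" and "pattern" only (illustration, not a validator)."""
    type_names = {str: "string", type(None): "null"}
    if type_names.get(type(instance)) not in schema["type"]:
        return False
    # "pattern" applies to strings only; other instance types skip it.
    if isinstance(instance, str) and not re.search(schema["pattern"], instance):
        return False
    return True

for candidate in ("2020-01-31", None, "null", "31/01/2020"):
    print(repr(candidate), is_valid(candidate, schema))
```

Under these assumptions a real null and a well-formed date pass, while the string "null" fails, which is the distinction the comment above draws.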

JSONPath get the id of a parent element by a sub-child value

Given the following JSON I want to get the id field of the parent by an equals text compare of a sub-child element:
{
"datapoints": [{
"id": "default.1",
"definedBy": "default/0.1",
"featureValues": {
"bui.displayname": "Health status",
"bui.visibility": "normal",
"default.access": "r",
"default.basetype": "text",
"default.description": "Aggregated health status",
"default.format": "text/plain",
"default.name": "health_status",
"default.restriction": "re:(OK|WARN|ERROR|UNKNOWN)"
}
}, {
"id": "kdl.240",
"definedBy": "kdl/0.9",
"featureValues": {
"bui.displayname": "Delta K",
"bui.visibility": "normal",
"default.access": "rw",
"default.basetype": "real",
"default.description": "Delta K",
"default.name": "Delta_K",
"default.privacy": "false",
"default.restriction": "b32"
}
}
]
}
My first goal is to get the correct data point by a sub-child text compare like:
$['datapoints'][*]['featureValues'][?(@['default.name']=='Delta_K')]
It seems not to work when I test it on http://jsonpath.com/
To get all the data points I used this successfully:
$['datapoints'][*]['featureValues']['default.name']
My goal is to get the id value of the data point whose featureValues child element default.name equals Delta_K. In the example this would be kdl.240.

I could only solve the first part of my question by using:
$['datapoints'][*][?(@['default.name']=='Delta_K')]
During my research I found that JsonPath does not support getting the parent of a filtered node. In Chapter 7 "Conclusion" of http://www.baeldung.com/guide-to-jayway-jsonpath it's written:
Although JsonPath has some drawbacks, such as a lack of operators for reaching parent or sibling nodes, it can be highly useful in a lot of scenarios.
Also further SO posts couldn't help me.
Getting parent of matched element with jsonpath
Using jsonpath to get parent node

The following code is working for me on https://jsonpath.com :
$.datapoints[?(@.featureValues['default.name']=='Delta_K')].id

You need to find a node containing a featureValues attribute that contains a default.name attribute that matches your text. Using the suggestion at the end of https://github.com/json-path/JsonPath/issues/287 you can get what you want with:
$..[?(@.featureValues[?(@['default.name']=='Delta_K')] empty false)].id

cat jsonData.json | jq '.datapoints[] | select(.featureValues["default.name"] == "Delta_K") | .id'
see also:
https://github.com/adriank/ObjectPath/issues/70
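When a JSONPath implementation cannot filter on a child and return the parent, the same lookup is short in plain Python. A minimal sketch, using a trimmed copy of the question's data; the helper name ids_by_feature is hypothetical:

```python
# Trimmed copy of the question's document.
data = {
    "datapoints": [
        {"id": "default.1", "featureValues": {"default.name": "health_status"}},
        {"id": "kdl.240", "featureValues": {"default.name": "Delta_K"}},
    ]
}

def ids_by_feature(data, key, wanted):
    """Return the id of every datapoint whose featureValues[key] equals wanted."""
    return [
        dp["id"]
        for dp in data["datapoints"]
        if dp.get("featureValues", {}).get(key) == wanted
    ]

matching_ids = ids_by_feature(data, "default.name", "Delta_K")
print(matching_ids)
```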
