Kafka Connect SMT: regex replace to convert a JSON object to a string

Consider a message on a Kafka topic: event: { "x":1, "y":2, "c": "abc"}. I would like to convert the event object to a string: event: "{\"x\": 1, \"y\":2, \"c\":\"abc\"}".
I am looking to use a regex transformation to capture everything between the curly braces.
However, I get the error: Invalid value "{.*?}" for configuration regex: Invalid regex: Illegal repetition near index 0 "{.*?}".
I am not sure how to solve this.
This is the connector config so far:
transforms=escapeBrace
transforms.escapeBrace.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.escapeBrace.regex="\{.*?\}"
transforms.escapeBrace.replacement="\"$1\""

RegexRouter is only for modifying the topic name, not the data within any record. (Topic names cannot contain braces, slashes, or quotes.)
You'll want to try using Kafka Streams, or some other consumer to modify the data before sending to a new topic. Writing your own SMT is another option.
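As a rough illustration of the per-record change such a consumer would make, here is a minimal Python sketch. It only shows the transformation on a record value; no Kafka client is involved, and the function name stringify_field and the default field name "event" are assumptions made for the example.

```python
import json


def stringify_field(record_value: str, field: str = "event") -> str:
    """Replace the nested object stored under `field` with its JSON string form."""
    record = json.loads(record_value)
    # json.dumps turns the nested object into a string; when the whole
    # record is serialized again, the inner quotes are escaped automatically.
    record[field] = json.dumps(record[field])
    return json.dumps(record)


if __name__ == "__main__":
    print(stringify_field('{"event": {"x": 1, "y": 2, "c": "abc"}}'))
```

Serializing with a JSON library instead of a regex avoids having to escape the inner quotes by hand.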

Related

Correct delimiter expression for csv next line in Spring Cloud Dataflow Filesplitter Processor

I'm using the Spring Cloud Dataflow server and I am polling csv files with the Time source and the http client processor.
Now I want to split the polled csv file and pipeline the content as single line-by-line messages. Since the HttpClientProcessor polls entire files only, I'm using a FileSplitter processor in order to achieve that, but I'm stuck. The relevant FileSplitter options are delimiters and expression.
The delimiters options hint says
When expression is null, delimiters to use when tokenizing {@link String} payloads.
The expression options hint says
A SpEL expression for splitting payloads.
I've tried many values beyond a simple \n for both options, without success.
The expression option always fails with the following error:
Failed to bind properties under 'splitter.expression' to org.springframework.expression.Expression:
Property: splitter.expression
Value: \n
Origin: System Environment Property "SPRING_APPLICATION_JSON"
Reason: failed to convert java.lang.String to org.springframework.expression.Expression
With the delimiters option the applications start successfully, but my log sink does not receive any input. Apparently the delimiters I supplied do not match what is actually used as the newline character in the csv.
Does anyone have an idea what I have to supply for delimiters or expression in order to split the csv message line by line?
Implementing my own FileSplitter processor app seems like overkill, but I will do it if need be...
Here is an example of working syntax, using the expression option with \\n:
stream create --name name --definition "http --port=9090 | splitter --expression=payload.split('\\n') | custom | log"
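For comparison, the same line-by-line split can be sketched outside Spring in plain Python. This is only an illustration of the splitting logic, not of the splitter app itself; the function name split_csv_lines is hypothetical, and the sketch assumes that no CSV field contains a quoted, embedded line break.

```python
import re


def split_csv_lines(payload: str) -> list:
    """Split a CSV payload into lines, accepting \\r\\n, \\r or \\n as line endings."""
    # Empty strings (for example after a trailing newline) are dropped.
    return [line for line in re.split(r"\r\n|\r|\n", payload) if line]


if __name__ == "__main__":
    for line in split_csv_lines("a,b\r\nc,d\ne,f\n"):
        print(line)
```

Accepting several line-ending styles matters when the file was produced on a different platform than the one reading it.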

Getting error while converting json file to a data frame using jsonlite

I am using the tweetscores package for R to get a list of tweets from Twitter. The tweets are stored in JSON format. While converting them to a data frame I get a lexical error:
Error: lexical error: inside a string, '\' occurs before a character which it may not.
Is there a solution to this error?
A part of the json file text
":[{"text":["MUFC"],"indices":[[83],[88]]}],"symbols":[],"user_mentions":[],"urls":[]},"metadata":{"iso_language_code":["en"],"result_type":["recent"]},"source":["http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"],"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":[7.32108114527322e+017],"id_str":["732108114527322112"],"name":["wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww(^o^)/"],"screen_name":["SukiSukinal"],"location":["+6222"],"description":["Alliansi osaosi ngevote kagak. katanya sih fans a.k.a + "],"url":null,"entities":{"description":{"urls":[]}},"protected":[false],"followers_count":[163],"friends_count":[107],"listed_count":[4],"created_at":["Mon May 16 07:19:11 +0000
JSON does not allow a bare backslash inside a string: a backslash must start a valid escape sequence such as \", \\, \n or \uXXXX. Replace any stray '\' character with '\\'. See http://www.json.org/ for more info.
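The escaping rule can be sketched in Python (the question uses R, so this only illustrates the idea). The helper name escape_stray_backslashes is hypothetical, and the sketch makes two simplifying assumptions: every backslash in the text sits inside a JSON string, and any \u is treated as valid without checking the four hex digits that should follow.

```python
import json
import re

# Characters that may legally follow a backslash in a JSON string.
_VALID_ESCAPES = set('"\\/bfnrtu')


def escape_stray_backslashes(raw: str) -> str:
    """Double every backslash that does not start a valid JSON escape."""

    def fix(match):
        follower = match.group(1)
        if follower in _VALID_ESCAPES:
            return match.group(0)  # already a valid escape, keep it
        return "\\\\" + follower  # stray backslash, double it

    # Consuming the backslash together with the next character keeps an
    # already escaped backslash (two in a row) from being doubled again.
    return re.sub(r"\\(.?)", fix, raw, flags=re.DOTALL)


if __name__ == "__main__":
    broken = '{"path": "C:\\data"}'  # the JSON text contains \d, which is invalid
    print(json.loads(escape_stray_backslashes(broken))["path"])
```

A repair like this is a workaround; if the producer of the file can be fixed to emit valid JSON, that is the more reliable route.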
You likely have an incomplete json string, which may be caused by the package or by an interrupted connection to Twitter's API. A complete json string returned from Twitter should look something like the following:
which I got using rtweet's stream_tweets() function. With a complete string returned by Twitter's REST or stream API, you should be able to convert the data using basically any json parser (e.g., jsonlite::fromJSON()).

how to escape special character '?' while inserting into memsql

I am trying to insert a JSON payload into a memsql JSON-type column, but it fails for the following reason.
My JSON content contains a '?' character.
I tried to escape the '?' in the following ways, but neither worked.
The exception I am getting is:
Root Exception stack trace:
java.lang.IndexOutOfBoundsException: Index: 0
Ex payload: "question mark content?"
1. #[org.mule.util.StringUtils.replace(payload,"?","\\?")]
Result: "question mark content\?"
2. #[org.mule.util.StringUtils.replace(payload,"?","\?")]
Result: not allowed to use the above expression
If I use the payload "question mark content", it is inserted successfully.
How can I escape '?' in my JSON content when saving it into memsql?
'\?' itself is an escape sequence, so to achieve this you have to use "\\?", which produces "\?" and should work with memsql.
#[org.mule.util.StringUtils.replace(payload,"?","\\\\?")]
Hope this helps.
From your exception, it looks like you are calling replace on the payload but not assigning the result to anything.
Going off of the documentation at:
http://grepcode.com/file/repo1.maven.org/maven2/commons-lang/commons-lang/2.4/org/apache/commons/lang/StringUtils.java#3457
It basically says that the method replaces the items in a string and returns the result as a new string. Based on what I can tell from the stack trace, it seems as though you are passing a null or uninitialized variable to something that tries to read str[0], which produces the index-out-of-bounds error.
The way to correct this would be to do something like:
payload = org.mule.util.StringUtils.replace(payload,"?","\\\\?")
Which should replace any instance of ? with \? and re-write it to the payload variable. That said, it sounds like payload may actually be null when you're evaluating it later in your program, which could be indicative of a larger issue.
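The point about the return value can be illustrated with an analogy in Python, whose strings are immutable in the same way as Java's. This is not Mule expression syntax, and the variable names are made up for the example.

```python
payload = "question mark content?"

# replace() returns a new string; the original is left untouched,
# so calling it without using the result has no effect.
payload.replace("?", "\\?")
unchanged = payload

# Keeping the result is what actually yields the escaped text.
# In source code "\\?" denotes two characters: a backslash and a question mark.
escaped = payload.replace("?", "\\?")

if __name__ == "__main__":
    print(unchanged)
    print(escaped)
```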

Import JSON using WITH in Neo4j

I'm trying to use a JSON, to eventually import it into Neo4j.
I use something like the following; it's a big JSON string:
WITH [
{"fullname":"Full name","note":"f","addr":[],"phone":[],"email":[{"value":"mail#city.com"}],"first_name":"","last_name":""},
..
] AS contacts
The first contact is colored mostly orange, then the other contacts turn green, then black.
I get the following error:
Invalid input '"': expected whitespace, an identifier, UnsignedDecimalInteger, a property key name or '}'
I can view my JSON file with http://jsonviewer.stack.hu/ and it looks fine.
Do I need to escape some kind of character, so that Neo4j understands it?
Edit:
Based on Martin's answer, I removed the quotes using a PHP regex taken from the question "Remove double-quotes from a json_encoded string on the keys".
Remove the quotation marks around the keys. The error message tells you that it expects a property key. Cypher does not use JSON here.
WITH [
{fullname:"Full name",note:"f",addr:[],phone:[],
email:[{value:"mail#city.com"}],
first_name:"",last_name:""}
] AS contacts
RETURN contacts
A neo4j driver or client library will handle data passed from dictionary like structures as parameters: https://neo4j.com/docs/developer-manual/current/cypher/#cypher-parameters
If you want to work with JSON and maybe load it from external sources you should have a look at the APOC procedures: https://neo4j-contrib.github.io/neo4j-apoc-procedures/.
This, for example, converts a JSON string to a map that can be used in Cypher: https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_from_tojson
CALL apoc.convert.fromJsonMap(
'{"fullname":"Full name","note":"f","addr":[],"phone":[],
"email":[{"value":"mail#city.com"}],"first_name":"","last_name":""}'
)
YIELD value
RETURN value
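If the data has to be embedded as a literal, as in the edit to the question, the key unquoting can also be done by walking the parsed data instead of applying a regex to the text. The following Python sketch shows one way; the function name to_cypher_literal is hypothetical, and it assumes that JSON string escapes are acceptable inside a double-quoted Cypher string. Passing the data as a query parameter, as suggested above, avoids this step entirely.

```python
import json
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_cypher_literal(value) -> str:
    """Render parsed JSON data as a Cypher literal with unquoted map keys."""
    if isinstance(value, dict):
        parts = []
        for key, item in value.items():
            # Keys that are not plain identifiers are wrapped in backticks.
            name = key if _IDENTIFIER.match(key) else "`" + key.replace("`", "``") + "`"
            parts.append(name + ": " + to_cypher_literal(item))
        return "{" + ", ".join(parts) + "}"
    if isinstance(value, list):
        return "[" + ", ".join(to_cypher_literal(item) for item in value) + "]"
    # json.dumps renders strings, numbers, true/false and null.
    return json.dumps(value)


if __name__ == "__main__":
    contacts = json.loads('[{"fullname": "Full name", "addr": [], "first_name": ""}]')
    print("WITH " + to_cypher_literal(contacts) + " AS contacts RETURN contacts")
```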

Parsing large JSON file with Scala and JSON4S

I'm working with Scala in IntelliJ IDEA 15 and trying to parse a large twitter record json file and count the total number of hashtags. I am very new to Scala and the idea of functional programming. Each line in the json file is a json object (representing a tweet). Each line in the file starts like so:
{"in_reply_to_status_id":null,"text":"To my followers sorry..
{"in_reply_to_status_id":null,"text":"#victory","in_reply_to_screen_name"..
{"in_reply_to_status_id":null,"text":"I'm so full I can't move"..
I am most interested in a property called "entities", which contains a property called "hashtags" with a list of hashtags. Here is an example:
"entities":{"hashtags":[{"text":"thewayiseeit","indices":[0,13]}],"user_mentions":[],"urls":[]},
I've browsed the various scala frameworks for parsing json and have decided to use json4s. I have the following code in my Scala script.
import org.json4s.native.JsonMethods._
var json: String = ""
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line
val data = parse(json)
My logic here is that I am trying to read each line from twitter38.json into a string and then parse the entire string with parse(). The parse function is throwing an error claiming:
"Type mismatch, expected: Nothing, found:String."
I have seen examples that use parse() on strings that hold json objects such as
val jsontest =
"""{
|"name" : "bob",
|"age" : "50",
|"gender" : "male"
|}
""".stripMargin
val data = parse(jsontest)
but I have received the same error. I am coming from an object oriented programming background, is there something fundamentally wrong with the way I am approaching this problem?
You have most likely incorrectly imported dependencies to your Intellij project or modules into your file. Make sure you have the following lines imported:
import org.json4s.native.JsonMethods._
Even if you import this module correctly, parse(json: String) will not work for you, because the JSON you have built is malformed. Your JSON string will look like this:
"""{"in_reply_...":"someValue1"}{"in_reply_...":"someValues2"}"""
but should look as follows to be a valid json that can be parsed:
"""{{"in_reply_...":"someValue1"},{"in_reply_...":"someValues2"}}"""
i.e. you need starting and ending brackets for the JSON, and a comma between each line of tweets. Please read the json4s documentation for more information.
Although being almost 6 years old, I think this question deserves another try.
JSON format has a few misunderstandings in people's minds, especially how they are stored and how they are read back.
JSON documents are stored either as a single object holding all the other fields, or as an array of multiple objects, possibly in the same format. This second part is important, because arrays in almost every programming language are written with square brackets and values separated by commas (note that here I used a person object as my single value):
[
{"name":"John","surname":"Doe"},
{"name":"Jane","surname":"Doe"}
]
Also note that keys and string values are enclosed in double quotes when written to a file, while brackets, numbers, booleans and null are not.
However, there is another convention, not part of the JSON standard itself but popular for transferring datasets (often called JSON Lines or newline-delimited JSON), in which every object, or document in NoSQL/MongoDB terms, is stored on its own line, like this:
{"name":"John","surname":"Doe"}
{"name":"Jane","surname":"Doe"}
For this question, the OP has a file written in this second form but uses an algorithm meant for reading the first form. The following code makes a few simple changes that turn the lines into a single JSON array before parsing:
var json: String = "["
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line + ","
json=json.splitAt(json.length()-1)._1
json+= "]"
val data = parse(json)
PS: although @sbrannon has the correct idea, the example he/she gave mistakenly uses curly braces instead of square brackets to surround the data.
EDIT: I added json=json.splitAt(json.length()-1)._1 because the loop above leaves a trailing comma, which causes a parse error under the JSON format definition.
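An alternative to building one large array is to parse each line on its own, which also keeps memory use low for big files. Here is a minimal sketch of that idea in Python rather than Scala/json4s; the function name count_hashtags is hypothetical, while the field names "entities" and "hashtags" are taken from the sample in the question.

```python
import json


def count_hashtags(lines) -> int:
    """Count hashtags across an iterable of lines, one JSON tweet per line."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Tolerate tweets where "entities" or "hashtags" is missing or null.
        entities = tweet.get("entities") or {}
        total += len(entities.get("hashtags") or [])
    return total


if __name__ == "__main__":
    sample = [
        '{"text": "#victory", "entities": {"hashtags": [{"text": "victory"}]}}',
        '{"text": "no tags", "entities": {"hashtags": []}}',
    ]
    print(count_hashtags(sample))
```

To run it on the file from the question, an open file object can be passed directly, since iterating over a file yields its lines.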