Translate T-SQL JSON query to U-SQL

Hi, I am trying to translate this logic from a T-SQL JSON query to U-SQL, but I am not able to achieve the same results. Any help will be appreciated.
select N'{
  "rowid": "1",
  "freeresponses": {
    "fr1": "1.1",
    "fr2": "1.1",
    "fr3": "1.3",
    "fr4": "1.4",
    "fr5": "1.4"
  }
}' as jsontext
into dbo.tmp#;

SELECT convert(int, JSON_VALUE(jsontext, '$.rowid')) as rowid,
       c.[key], c.[value]
FROM dbo.tmp#
CROSS APPLY openjson(json_query(jsontext, '$.freeresponses')) as c;
The result looks like this in SQL Server:
rowid | key | value
1     | fr1 | 1.1
1     | fr2 | 1.1
1     | fr3 | 1.3
1     | fr4 | 1.4
1     | fr5 | 1.4
To achieve the same result in U-SQL I have tried the below, which errors.
REFERENCE ASSEMBLY Staging.[Newtonsoft.Json];
REFERENCE ASSEMBLY Staging.[Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @inputmasterfileset string = "/sourcedata/samplefreeresponse.txt";
DECLARE @outputfreeresponse string = "/sourcedata/samplefreeresponseoutput.txt";

@freeresponse1 =
    EXTRACT rowid string,
            freeresponses string
    FROM @inputmasterfileset
    USING new JsonExtractor("");

@freeresponse2 =
    SELECT rowid,
           JsonFunctions.JsonTuple(freeresponses).Values AS freeresponses
    FROM @freeresponse1;

@freeresponse3 =
    SELECT rowid,
           JsonFunctions.JsonTuple(free) AS values
    FROM @freeresponse2
    CROSS APPLY
    EXPLODE(freeresponses) AS c(free);

OUTPUT @freeresponse3
TO @outputfreeresponse
USING Outputters.Text('|', outputHeader:true, quoting:false);
The catch is, I don't know how the keys are named in the JSON document, so I cannot specify JsonFunctions.JsonTuple(free)["fr1"] in stage 3 of the code, and I want the same result as I got in T-SQL.
Much appreciated.

I have resolved it myself. It was confusion between SQL.MAP and SQL.ARRAY.
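For anyone hitting the same wall, here is a minimal sketch of the fix, assuming the JsonFunctions.JsonTuple helper from Microsoft.Analytics.Samples.Formats: JsonTuple returns a SQL.MAP<string, string>, and EXPLODE over a map yields key/value pairs directly, whereas taking .Values first (as in stage 2 above) turns it into a SQL.ARRAY and throws the keys away. The column names frkey/frvalue are just illustrative:
@freeresponse2 =
    SELECT rowid,
           JsonFunctions.JsonTuple(freeresponses) AS frmap  // keep the SQL.MAP; do not take .Values
    FROM @freeresponse1;

@freeresponse3 =
    SELECT rowid,
           frkey,
           frvalue
    FROM @freeresponse2
    CROSS APPLY EXPLODE(frmap) AS c(frkey, frvalue);  // a MAP explodes into (key, value) pairs
This way the key names (fr1, fr2, ...) never need to be known in advance, matching the OPENJSON behaviour of the T-SQL version.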


Calling another feature and reading data from CSV, not in Examples

I usually call another feature and read data from a CSV in the Examples, like below.
Scenario Outline:
* call read('classpath:controller/Controller.feature')
Examples:
|read('classpath:com/testdata/Test.csv')|
This time I still want to read data from the CSV, but use Examples for another purpose, like below. Is it still possible to read data from the CSV? Maybe by passing it as a parameter?
Scenario Outline:
* call read('classpath:controller/Controller.feature'){read('classpath:com/testdata/Test.csv')}
Examples:
| gain  | spend |
| 12000 | 12008 |
| 3400  | 4655  |
I know it works this way, but I have to pass index [0], and if I have more test data in the CSV it won't work:
Scenario Outline:
* def testData = read('classpath:com/testdata/Test.csv')
* call read('classpath:controller/Controller.feature'){ "name": "#(testData[0].name)", "age": "#(testData[0].age)"}
Examples:
| gain  | spend |
| 12000 | 12008 |
| 3400  | 4655  |
I'll just give one tip. When you use Examples, the row index is available as a variable called __num: https://github.com/karatelabs/karate#scenario-outline-enhancements
So you can do things like this:
Feature:
Scenario Outline:
* def data = [{ id: 0 }, { id: 1 }]
* match (data[__num].id) == temp
Examples:
| temp! |
| 0 |
| 1 |
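Applying that tip to the CSV case in the question, something like this should work (a sketch, assuming each row of Test.csv lines up with the Examples row of the same index):
Scenario Outline:
* def testData = read('classpath:com/testdata/Test.csv')
* call read('classpath:controller/Controller.feature') { "name": "#(testData[__num].name)", "age": "#(testData[__num].age)" }
Examples:
| gain  | spend |
| 12000 | 12008 |
| 3400  | 4655  |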

How to fix never-ending execution in a Cypher query - Neo4j Graph Database?

I'm dealing with the import of the Common Weakness Enumeration Catalog (.json file) into the Neo4j graph database, using the Cypher query language and the APOC library. Although I import the fields Weaknesses, Views, and External_References properly, I have an execution problem (without any error) with the import of the field Categories, which executes without ever ending. Below I present the structure of the .json file and my Cypher code.
"Weakness_Catalog": {
"Weaknesses": {"Weakness":[...]}
"Categories": {"Category":[...]}
"Views": {"View":[...]}
"External_References": {"External_Reference":[...]}
}
Cypher Query
After several tests I think that the logic error is between the last two parts [with value ... (catRef)]; without them, the query executes pretty well, in normal time. I've also changed a setting parameter in the db configuration file due to an error (cypher.lenient_create_relationship = true). And I tested different import sequences, with the same bad results (weakness, categories, views, ext. references, etc.).
call apoc.load.json(files) yield value
unwind value.Weakness_Catalog.Weaknesses.Weakness as weakness
merge (i:GeneralInfo_CWE {Name:value.Weakness_Catalog.Name, Version:value.Weakness_Catalog.Version,
Date:value.Weakness_Catalog.Date, Schema:'https://cwe.mitre.org/data/xsd/cwe_schema_v6.4.xsd'})
merge(w:CWE {Name:'CWE-' + weakness.ID})
set w.Extended_Name=weakness.Name, w.Abstraction=weakness.Abstraction,
w.Structure=weakness.Structure, w.Status=weakness.Status, w.Description=weakness.Description,
w.Extended_Description= apoc.convert.toString(weakness.Extended_Description),
w.Likelihood_Of_Exploit=weakness.Likelihood_Of_Exploit,
w.Background_Details=apoc.convert.toString(weakness.Background_Details.Background_Detail),
w.Modes_Of_Introduction=[value in weakness.Modes_Of_Introduction.Introduction | value.Phase],
w.Submission_Date=weakness.Content_History.Submission.Submission_Date,
w.Submission_Name=weakness.Content_History.Submission.Submission_Name,
w.Submission_Organization=weakness.Content_History.Submission.Submission_Organization,
w.Modifications=[value in weakness.Content_History.Modification | apoc.convert.toString(value)],
w.Alternate_Terms=apoc.convert.toString(weakness.Alternate_Terms),
w.Notes=[value in weakness.Notes.Note | apoc.convert.toString(value)],
w.Affected_Resources=[value in weakness.Affected_Resources.Affected_Resource | value],
w.Functional_Areas=[value in weakness.Functional_Areas.Functional_Area | value]
merge (w)-[:belongsTo]->(i)
with w, weakness, value
unwind weakness.Related_Weaknesses.Related_Weakness as Rel_Weakness
match (cwe:CWE) where cwe.Name='CWE-' + Rel_Weakness.CWE_ID
merge (w)-[:Related_Weakness{Nature:Rel_Weakness.Nature}]->(cwe)
with w, weakness, value
unwind weakness.Applicable_Platforms as appPl
foreach (lg in appPl.Language |
merge(ap:Applicable_Platform{Type:'Language', Prevalence:lg.Prevalence,
Name:coalesce(lg.Name, 'NOT SET'), Class:coalesce(lg.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (tch in appPl.Technology |
merge(ap:Applicable_Platform{Type:'Technology', Prevalence:tch.Prevalence,
Name:coalesce(tch.Name, 'NOT SET'), Class:coalesce(tch.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (arc in appPl.Architecture |
merge(ap:Applicable_Platform{Type:'Architecture', Prevalence:arc.Prevalence,
Name:coalesce(arc.Name, 'NOT SET'), Class:coalesce(arc.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (os in appPl.Operating_System |
merge(ap:Applicable_Platform{Type:'Operating System', Prevalence:os.Prevalence,
Name:coalesce(os.Name, 'NOT SET'), Class:coalesce(os.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value
foreach (example in weakness.Demonstrative_Examples.Demonstrative_Example |
merge(ex:Demonstrative_Example {Intro_Text:apoc.convert.toString(example.Intro_Text)})
set ex.Body_Text=[value in example.Body_Text | apoc.convert.toString(value)],
ex.Example_Code=[value in example.Example_Code | apoc.convert.toString(value)]
merge (w)-[:hasExample]->(ex))
with w, weakness, value
foreach (consequence in weakness.Common_Consequences.Consequence |
merge (con:Consequence{CWE:w.Name, Scope:[value in consequence.Scope | value]})
set con.Impact=[value in consequence.Impact | value],
con.Note=consequence.Note, con.Likelihood=consequence.Likelihood
merge(w)-[:hasConsequence]->(con))
with w, weakness, value
foreach (dec in weakness.Detection_Methods.Detection_Method |
merge(d:Detection_Method {Method:dec.Method})
merge(w)-[wd:canBeDetected{Description:apoc.convert.toString(dec.Description)}]->(d)
set wd.Effectiveness=dec.Effectiveness, wd.Effectiveness_Notes=dec.Effectiveness_Notes,
wd.Detection_Method_ID=dec.Detection_Method_ID)
with w, weakness, value
foreach (mit in weakness.Potential_Mitigations.Mitigation |
merge(m:Mitigation {Description:apoc.convert.toString(mit.Description)})
set m.Phase=[value in mit.Phase | value], m.Strategy=mit.Strategy,
m.Effectiveness=mit.Effectiveness, m.Effectiveness_Notes=mit.Effectiveness_Notes,
m.Mitigation_ID=mit.Mitigation_ID
merge(w)-[:hasMitigation]->(m))
with w, weakness, value
foreach (rap in weakness.Related_Attack_Patterns.Related_Attack_Pattern |
merge(cp:CAPEC {Name:rap.CAPEC_ID})
merge(w)-[:RelatedAttackPattern]->(cp))
with w, weakness, value
foreach (reference in value.Weakness_Catalog.External_References.External_Reference |
merge(r:External_Reference{Reference_ID:reference.Reference_ID})
set r.Author=[value in reference.Author | value], r.Title=reference.Title,
r.Edition=reference.Edition, r.URL=reference.URL,
r.Publication_Year=reference.Publication_Year, r.Publisher=reference.Publisher)
with w, weakness, value
unwind weakness.References.Reference as exReference
match (ref:External_Reference) where ref.Reference_ID=exReference.External_Reference_ID
merge(w)-[:hasExternal_Reference]->(ref)
with value
unwind value.Weakness_Catalog.Views.View as view
merge (v:CWE_VIEW{ViewID:view.ID})
set v.Name=view.Name, v.Type=view.Type, v.Status=view.Status,
v.Objective=apoc.convert.toString(view.Objective), v.Filter=view.Filter,
v.Notes=apoc.convert.toString(view.Notes),
v.Submission_Name=view.Content_History.Submission.Submission_Name,
v.Submission_Date=view.Content_History.Submission.Submission_Date,
v.Submission_Organization=view.Content_History.Submission.Submission_Organization,
v.Modification=[value in view.Content_History.Modification | apoc.convert.toString(value)]
foreach (value in view.Audience.Stakeholder |
merge (st:Stakeholder{Type:value.Type})
merge (v)-[rel:usefulFor]->(st)
set rel.Description=value.Description)
with v, view, value
unwind (case view.Members.Has_Member when [] then [null] else view.Members.Has_Member end) as members
optional match (MemberWeak:CWE{Name:'CWE-' + members.CWE_ID})
merge (v)-[:hasMember{ViewID:members.View_ID}]->(MemberWeak)
with v, view, value
unwind (case view.References.Reference when [] then [null] else view.References.Reference end) as viewExReference
optional match (viewRef:External_Reference{Reference_ID:viewExReference.External_Reference_ID})
merge (v)-[:hasExternal_Reference{ViewID:v.ViewID}]->(viewRef)
with value
unwind value.Weakness_Catalog.Categories.Category as category
merge (c:CWE_Category{CategoryID:category.ID})
set c.Name=category.Name, c.Status=category.Status, c.Summary=apoc.convert.toString(category.Summary),
c.Notes=apoc.convert.toString(category.Notes), c.Submission_Name=category.Content_History.Submission.Submission_Name,
c.Submission_Date=category.Content_History.Submission.Submission_Date,
c.Submission_Organization=category.Content_History.Submission.Submission_Organization,
c.Modification=[value in category.Content_History.Modification | apoc.convert.toString(value)]
with c, category
unwind (case category.References.Reference when [] then [null] else category.References.Reference end) as categoryExReference
optional match (catRef:External_Reference{Reference_ID:categoryExReference.External_Reference_ID})
merge (c)-[:hasExternal_Reference{CategoryID:c.CategoryID}]->(catRef)
So, the problem was that every time I use WITH, I'm working in nested loops. The more nested loops, the slower the query will be. A good way to speed it up is to create simpler queries when possible.
For example, for the JSON file:
"Weakness_Catalog": {
"Weaknesses": {"Weakness":[...]}
"Categories": {"Category":[...]}
"Views": {"View":[...]}
"External_References": {"External_Reference":[...]}
}
I will execute one query for Weaknesses, one for Categories, one for Views, and one for External_References.
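As a sketch, the standalone Categories query is just the category part of the original query lifted out on its own (files is the same parameter passed to apoc.load.json above); the Weaknesses, Views, and External_References queries follow the same pattern:
call apoc.load.json(files) yield value
unwind value.Weakness_Catalog.Categories.Category as category
merge (c:CWE_Category{CategoryID:category.ID})
set c.Name=category.Name, c.Status=category.Status,
    c.Summary=apoc.convert.toString(category.Summary),
    c.Notes=apoc.convert.toString(category.Notes),
    c.Submission_Name=category.Content_History.Submission.Submission_Name,
    c.Submission_Date=category.Content_History.Submission.Submission_Date,
    c.Submission_Organization=category.Content_History.Submission.Submission_Organization,
    c.Modification=[value in category.Content_History.Modification | apoc.convert.toString(value)]
with c, category
unwind (case category.References.Reference when [] then [null] else category.References.Reference end) as categoryExReference
optional match (catRef:External_Reference{Reference_ID:categoryExReference.External_Reference_ID})
merge (c)-[:hasExternal_Reference{CategoryID:c.CategoryID}]->(catRef)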

Checking Discord messages with SQL

So I've been trying to check each message against a SQL table. My SQL table structure is down below.
Example:
| id                 | triggervalue       | triggermessage |
| 633666515413237791 | hello, world, test | Trigger works! |
(the array-like string is something like: hello, world, test)
I want to check each incoming message against each row's triggervalue column, to see if the message contains one of the strings from that array-like string.
Here is what I've done:
I tried to merge every single array-like string into one array, then send(triggermessage) for the row whose array contains the found word, checking word by word.
connection.query(`SELECT * FROM triggervalue`, (err, rows) => {
    let array = [];
    for (let i = 0; i < rows.length; i++) {
        // the column is named triggervalue in the table above, not jsonstring
        let received = rows[i].triggervalue;
        // strip surrounding brackets, then split "hello, world, test" into words
        const intarray = received.replace(/^\[|\]$/g, "").split(", ");
        array = array.concat(intarray); // concat returns a new array; reassign it
        // continue code here...
    }
});
However, I can't get the triggermessage of the same row as the found array. How would I go about it? I've been stuck here for quite a while... Sorry if this way of asking is wrong, thanks!
(Sorry if my English is bad)
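One way to keep the row association is to loop over the rows and test each row's own words against the message, so a match can reply with the triggermessage from the same row. A sketch, assuming a discord.js-style message object and the table shown above:
connection.query(`SELECT * FROM triggervalue`, (err, rows) => {
    if (err) throw err;
    for (const row of rows) {
        // split this row's "hello, world, test" string into single words
        const words = row.triggervalue.split(", ");
        // "message" is assumed to come from your client's message event;
        // if the message contains any of this row's words, reply with the
        // triggermessage stored in the same row
        if (words.some(word => message.content.includes(word))) {
            message.channel.send(row.triggermessage);
            break; // stop after the first matching row
        }
    }
});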

Spark RDD filter by querying MySQL

I use Spark Streaming to stream data from Kafka, and I want to filter the data based on data in MySQL.
For example, I get data from Kafka just like:
{"id":1, "data":"abcdefg"}
and there is data in MySQL like this:
id | state
1 | "success"
I need to query MySQL to get the state for that id.
I can define a connection to MySQL inside the filter function, and it works. The code looks like this:
def isSuccess(x):
    id = x["id"]
    sql = """
        SELECT *
        FROM Test
        WHERE id = "{0}"
        """.format(id)
    conn = mysql_connection(......)
    result = rdbi.query_one(sql)
    if result == None:
        return False
    else:
        return True

successRDD = rdd.filter(isSuccess)
But this will create a connection for every row of the RDD, and will waste a lot of computing resources. How can I do this in filter?
I suggest you use mapPartitions, available in Apache Spark, to prevent the initialization of a MySQL connection for every row.
This is the MySQL table that I created:
create table test2(id varchar(10), state varchar(10));
With the following values:
+------+---------+
| id | state |
+------+---------+
| 1 | success |
| 2 | stopped |
+------+---------+
Use the following PySpark code as a reference:
import MySQLdb

data1 = [["1", "afdasds"], ["2", "dfsdfada"], ["3", "dsfdsf"]]  # sample data; in your case this is the streaming data
rdd = sc.parallelize(data1)

def func1(partition):
    # one MySQL connection per partition instead of one per row
    con = MySQLdb.connect(host="127.0.0.1", user="root", passwd="yourpassword", db="yourdb")
    c = con.cursor()
    c.execute("select * from test2;")
    data = c.fetchall()
    states = {}
    for x in data:
        states[x[0]] = x[1]
    list1 = []
    for x in partition:
        if x[0] in states:
            list1.append([x[0], x[1], states[x[0]]])
        else:
            list1.append([x[0], x[1], "none"])  # "none" if the streamed id has no match in the table
    return iter(list1)

print(rdd.mapPartitions(func1).filter(lambda x: "none" not in x[2]).collect())
The output that I got was:
[['1', 'afdasds', 'success'], ['2', 'dfsdfada', 'stopped']]

Why does reading a CSV file with empty values lead to an IndexOutOfBoundsException?

I have a CSV file with the following structure:
Name  | Val1 | Val2 | Val3 | Val4 | Val5
John  | 1    | 2    |
Joe   | 1    | 2    |
David | 1    | 2    | 10   | 11
I am able to load this into an RDD fine. I tried to create a schema and then a DataFrame from it, and I get an IndexOutOfBounds error.
The code is something like this:
val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
When I try to perform an action on rowRDD, it gives the error.
Any help is greatly appreciated.
This is not an answer to your question, but it may help to solve your problem.
From the question I see that you are trying to create a DataFrame from a CSV.
Creating a DataFrame from a CSV can be easily done using the spark-csv package.
With spark-csv, the below Scala code can be used to read a CSV:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(csvFilePath)
For your sample data I got the following result:
+-----+----+----+----+----+----+
| Name|Val1|Val2|Val3|Val4|Val5|
+-----+----+----+----+----+----+
| John| 1| 2| | | |
| Joe| 1| 2| | | |
|David| 1| 2| | 10| 11|
+-----+----+----+----+----+----+
You can also use inferSchema with the latest version. See this answer.
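For example (a sketch using the same reader as above; inferSchema is a standard spark-csv option):
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(csvFilePath)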
Empty values are not the issue if the CSV file contains a fixed number of columns and your CSV looks like this (note the empty field separated with its own commas):
David,1,2,10,,11
The problem is that your CSV file contains 6 columns, yet with:
val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
you try to read 7 columns. Just change your mapping to:
val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5)))
And Spark will take care of the rest.
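For completeness, a minimal sketch of the create-a-schema-then-a-DataFrame flow the question describes (a sketch only: the column names come from the sample file, and all fields are read as strings for simplicity):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// one nullable string column per header field
val schema = StructType(
  Seq("Name", "Val1", "Val2", "Val3", "Val4", "Val5")
    .map(name => StructField(name, StringType, nullable = true)))

val rowRDD = sc.textFile(csvFilePath)
  .map(_.split(",", -1))                        // -1 keeps trailing empty fields
  .map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5)))

val df = sqlContext.createDataFrame(rowRDD, schema)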
A possible solution to that problem is replacing the missing values with Double.NaN. Suppose I have a file example.csv with these columns in it:
David,1,2,10,,11
You can read the CSV file as a text file as follows:
val fileRDD = sc.textFile("example.csv").map { x =>
  val y = x.split(",")
  // replace empty fields with Double.NaN (note: this assumes numeric fields;
  // a non-numeric column such as Name would need separate handling)
  y.map(k => if (k == "") Double.NaN else k.toDouble)
}
And then you can use your code to create a DataFrame from it.
You can do it as follows.
val rowRDD = sc.textFile(csvFilePath)
  .map(_.split(delimiter_of_file, -1))
  .map(p =>
    Row(
      p(0),
      p(1),
      p(2),
      p(3),
      p(4),
      p(5)))
Split using the delimiter of your file. When you set -1 as the limit, it considers all the empty fields.
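As a quick illustration of what the -1 limit changes (plain Scala, no Spark needed):
"David,1,2,10,,11".split(",")     // Array(David, 1, 2, 10, "", 11): inner empties are kept either way
"John,1,2,,,".split(",")          // Array(John, 1, 2): trailing empties are dropped!
"John,1,2,,,".split(",", -1)      // Array(John, 1, 2, "", "", ""): all six fields are kept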