I am building a TestCafe web automation project with a data-driven framework, using NPM/Node.js with Visual Studio Code. I have a JSON file and a SQL query, and I want to point the WHERE clause of the SQL query at the JSON data. This is very new to me; any suggestions are welcome.
I saw that there is an npm json-sql Node.js library, but I have not used it; any example would help me.
JSON file:
[
{
venid='ABC'
status='Open'
},
{
venid='IGH'
status='Closed'
},
SQL query:
-- running the query; the query will take data from the JSON file
Select *
From table
Where VendorId = <jsondata>
and Inventorystatus = <jsonData>
I'm not sure I understand your question correctly, but if you want to parse JSON data using SQL Server, you should use the JSON capabilities introduced in SQL Server 2016. Most scripting languages have enough capabilities to work with the JSON format, so you can pass JSON data directly to SQL Server.
Below is a basic example which you can use as a starting point (note that your JSON is not valid):
Table with data:
CREATE TABLE #Data (
VendorId nvarchar(3),
InventoryStatus nvarchar(10)
)
INSERT INTO #Data
(VendorId, InventoryStatus)
VALUES
(N'ABC', N'Open'),
(N'ABC', N'Closed'),
(N'IGH', N'Open'),
(N'IGH', N'Closed')
T-SQL:
-- Valid JSON
-- [{"venid":"ABC", "status":"Open"}, {"venid":"IGH", "status":"Closed"}]
DECLARE @json nvarchar(max)
-- Read JSON from file. Try to send JSON data directly to SQL Server.
-- Additional permissions are needed to execute the next statement.
--SELECT @json = BulkColumn
--FROM OPENROWSET (BULK '<Path>\JsonData.json', SINGLE_CLOB) as j
-- Read JSON from variable
SELECT @json = N'[{"venid":"ABC", "status":"Open"}, {"venid":"IGH", "status":"Closed"}]'
-- Use JSON data in a statement
SELECT d.*
FROM #Data d
JOIN OPENJSON(@json) WITH (
VendorId nvarchar(3) '$.venid',
InventoryStatus nvarchar(10) '$.status'
) j ON (d.VendorId = j.VendorId) AND (d.InventoryStatus = j.InventoryStatus)
Output:
---------------------------
VendorId InventoryStatus
---------------------------
ABC Open
IGH Closed
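As a side note, OPENJSON can also be used without a WITH clause; it then returns generic key/value/type columns, one row per array element, which is handy when you don't yet know the JSON's shape. A minimal sketch with the same JSON:
-- each array element becomes one row; [value] holds the element's JSON text
SELECT j.[key], j.[value], j.[type]
FROM OPENJSON(N'[{"venid":"ABC", "status":"Open"}, {"venid":"IGH", "status":"Closed"}]') j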
I am currently assigned to a task to transform our active non-normalized table into a normalized one. We decided to use database triggers to facilitate the bulk migration and subsequent data changes until we discontinue the old table.
Below are the structure and sample of our old table:
SELECT * FROM TabHmIds;
ID EntitlementID TabId HmId
1 101 201 301
2 102 202 302
The required structure and sample of our new table should look like:
SELECT * FROM tab_integration;
id tab_id integration_id metadata
1 201 1 { "paid_id": {"entitlement_id": 101, "id": 301} }
2 202 1 { "paid_id": {"entitlement_id": 102, "id": 302} }
The following is what I have done in my INSERT trigger so far:
CREATE TRIGGER tab_integration_after_insert AFTER INSERT ON `TabHmIds`
FOR EACH ROW
BEGIN
DECLARE var_metadata JSON;
DECLARE var_new_metadata JSON;
DECLARE var_hm_metadata JSON;
DECLARE var_integration_id INT(11);
SELECT metadata, integration_id
INTO var_metadata, var_integration_id
FROM `go`.`tab_integration` gti
WHERE gti.`tab_id` = NEW.`TabId`;
SET var_hm_metadata = JSON_OBJECT('entitlement_id', NEW.`EntitlementId`, 'id', NEW.`HmId`);
IF var_integration_id = 1 THEN
IF var_metadata IS NULL THEN
SET var_new_metadata = JSON_OBJECT('paid_id', var_hm_metadata);
ELSE
SET @paid_id = JSON_UNQUOTE(JSON_EXTRACT(var_metadata, '$.paid_id'));
SET var_new_metadata = JSON_ARRAY_APPEND(var_metadata, '$.paid_id', var_hm_metadata);
END IF;
END IF;
UPDATE `tab_integration` gti SET `metadata` = var_new_metadata WHERE `tab_id` = NEW.`TabId`;
END
However, what I get is this:
SELECT * FROM tab_integration;
id tab_id integration_id metadata
1 201 1 { "paid_id": "{\"entitlement_id\": 101, \"id\": 301}" }
2 202 1 { "paid_id": "{\"entitlement_id\": 102, \"id\": 302}" }
From the table above, the JSON object is parsed into a STRING. I am aware that JSON_OBJECT parses the passed value into a string, so I used JSON_UNQUOTE(JSON_EXTRACT(…)) to convert the paid_id path value to JSON, but it does not get parsed into JSON. I also tried JSON_MERGE_PRESERVE to put the JSON object under the paid_id path, but I end up getting:
{"paid_id": [], "entitlement_id": 101, "id": 301 }
I also tried to put the JSON array into a temporary table using JSON_TABLE, modify the values there, and convert that temporary table back to JSON using JSON_ARRAYAGG. But Workbench keeps saying I have a syntax error, even though I copied examples directly from the web.
I also tried CASTing a well-formed string to JSON, but Workbench also throws a syntax error.
I have spent a week trying to resolve this data structure in MySQL.
Database scripting is not my strong suit, and I am new to the JSON functions in MySQL. Thank you in advance to those who reply.
In case it's needed: MySQL Workbench reports my server version as 10.4.13-MariaDB, but the script should work in MySQL 5.7.
I found the answer to my problem.
Before inserting the new JSON data, I cast it to CHAR and REPLACEd the characters that were making it a plain string. I did the following and it worked!
# After performing the needed JSON manipulations, cleanse the JSON string to JSON.
SET var_new_metadata = CAST(var_new_metadata AS CHAR);
SELECT REPLACE(REPLACE(REPLACE(var_new_metadata, '\\', ''), '\"\{', '{'), '\}\"', '}') INTO var_new_metadata;
After cleansing the data, the UPDATE call still worked, and I tried some more JSON manipulations afterwards and, yes, it still works!
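For illustration, here is a standalone sketch of that cleanup applied to a literal; the @nested variable and its sample value are made up to mirror the stringified metadata shown above:
-- hypothetical demo (MySQL/MariaDB): strip backslashes, then the quotes around the nested object
SET @nested := '{"paid_id": "{\\"entitlement_id\\": 101, \\"id\\": 301}"}';
SELECT REPLACE(REPLACE(REPLACE(@nested, '\\', ''), '"{', '{'), '}"', '}') AS cleaned;
-- cleaned: {"paid_id": {"entitlement_id": 101, "id": 301}}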
I have the below string in a column in a Hive table, which I am trying to query using Apache Drill:
{"cdrreasun":"52","cdxscarc":"20150407161405","cdrend":"20150407155201","cdrdnrar.1un":"24321.70","servlnqlp":"54.201.25.50","men":"42403","xa:lnqruup":"3","cemcau":"120","accuuncl":"21","cdrc:
5","volcuca":"1.7"}
I want to retrieve all values for the key cdrreasun using Apache Drill SQL.
I can't use FLATTEN on the column, as it says "Flatten does not work with inputs of non-list types."
I can't use KVGEN either, as it works only with the MAP datatype.
Drill has the function convert_fromJSON, which converts a string to a JSON object. For more details about this function and examples of usage, see https://drill.apache.org/docs/data-type-conversion/#convert_to-and-convert_from
For the example you specified, you can run:
convert_fromJSON(colWithJsonText)['cdrreasun']
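Wrapped in a full query, that might look like the sketch below; the table name is a placeholder, and colWithJsonText stands for the string column holding the JSON text:
-- table and column names here are placeholders
SELECT convert_fromJSON(t.colWithJsonText)['cdrreasun'] AS cdrreasun
FROM dfs.tmp.SOME_TABLE t;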
I figured it out, hope it will be helpful for others.
We have to do it in 3 steps if the datatype is of type MAP:
KVGEN() -> FLATTEN() -> convert_from()
If it's of type STRING, then the KVGEN() step is not needed.
SELECT ratinggrouplist
,t3.cdrlist3.cdrreason AS cdrreason
,t3.cdrlist3.cdrstart AS cdrstart
,t3.cdrlist3.cdrend AS cdrend
,t3.cdrlist3.cdrduration AS cdrduration
FROM (
SELECT ratinggrouplist, convert_from(t2.cdrlist2.`element`, 'JSON') AS cdrlist3
FROM (
SELECT ratinggrouplist ,flatten(t1.cdrlist1.`value`) AS cdrlist2
FROM (
SELECT ratinggrouplist, kvgen(cdrlist) AS cdrlist1
FROM dfs.tmp.SOME_TABLE
) AS t1
) AS t2
) AS t3;
Using Python 2.7, I want to pass a query from BigQuery to ML Predict, which has a specific formatting requirement.
First: Is there an easier way to go directly from the BigQuery query to JSON in the correct format, so it can be passed to requests.post() instead of going through pandas (from what I understand, pandas is still not supported for GCP Standard)?
Second: Is there a way to construct the query to go directly to a JSON format and then modify the JSON to reflect the ML Predict JSON requirements?
Currently my code looks like this:
#I used the bigquery to dataframe option here to view the output.
#I would like to not use pandas in the end code.
logs = log_data.execute(output_options=bq.QueryOutput.dataframe()).result()
data = logs.to_json(orient='index')
print data
'{"0":{"end_time":"2018-04-19","device":"iPad","device_os":"iOS","device_os_version":"5.1.1","latency":0.150959,"megacycles":140.0,"cost":"1.3075e-08","device_brand":"Apple","device_family":"iPad","browser_version":"5.1","app":"567","ua_parse":"0"}}'
#The JSON needs to be in this format according to google documentation.
#data = {
# 'instances': [
# {
# 'key':'',
# 'end_time': '2018-04-19',
# 'device': 'iPad',
# 'device_os': 'iOS',
# 'device_os_version': '5.1.1',
# 'latency': 0.150959,
# 'megacycles':140.0,
# 'cost':'1.3075e-08',
# 'device_brand':'Apple',
# 'device_family':'iPad',
# 'browser_version':'5.1',
# 'app':'567',
# 'ua_parse':'40.9.8'
# }
# ]
#}
So all I would need to change is the leading key '0' to 'instances' and I should be all set to pass it into requests.post().
Is there a way to accomplish this?
Edit - adding the BigQuery query:
%%bq query -n log_data
WITH `my.table` AS (
SELECT ARRAY<STRUCT<end_time STRING, device STRING, device_os STRING, device_os_version STRING, latency FLOAT64, megacycles FLOAT64,
cost STRING, device_brand STRING, device_family STRING, browser_version STRING, app STRING, ua_parse STRING>>[] instances
)
SELECT TO_JSON_STRING(t)
FROM `my.table` AS t
WHERE end_time >='2018-04-19'
LIMIT 1
data = log_data.execute().result()
Thanks to @MikhailBerlyant I have adjusted my query and code to look like this:
%%bq query -n log_data
SELECT [TO_JSON_STRING(t)] AS instance
FROM `yourproject.yourdataset.yourtable` AS t
WHERE end_time >='2018-04-19'
LIMIT 1
But when I run logs = log_data.execute().result(), I get a QueryResultsTable back, which results in this error when passing it into requests.post():
TypeError: QueryResultsTable job_zfVEiPdf2W6msBlT6bBLgMusF49E is not JSON serializable
Is there a way within execute() to just return the JSON?
First: Is there an easier way to go directly from the BigQuery query to JSON in the correct format
See example below
#standardSQL
WITH yourTable AS (
SELECT ARRAY<STRUCT<id INT64, type STRING>>[(1, 'abc'), (2, 'xyz')] instances
)
SELECT TO_JSON_STRING(t)
FROM yourTable t
with the result in the format you asked for:
{"instances":[{"id":1,"type":"abc"},{"id":2,"type":"xyz"}]}
The above demonstrates the query and how it will work.
In your real case, you should use something like the one below:
SELECT TO_JSON_STRING(t)
FROM `yourproject.yourdataset.yourtable` AS t
WHERE end_time >='2018-04-19'
LIMIT 1
hope this helps :o)
Update based on comments
SELECT [TO_JSON_STRING(t)] AS instance
FROM `yourproject.yourdataset.yourtable` t
WHERE end_time >='2018-04-19'
LIMIT 1
I wanted to add this in case someone has the same problem I had, or is at least stuck on where to go once you have the query.
I was able to write a function that formats the query result in the way Google ML Predict wants it to be passed into requests.post(). This is most likely a horrible way to accomplish this, but I could not find a direct way to go from BigQuery to ML Predict in the correct format.
from google.cloud import bigquery as gcb  # assuming the google-cloud-bigquery client, under the alias used below

def logs(query):
    client = gcb.Client()
    query_job = client.query(query)
    CSV_COLUMNS = 'end_time,device,device_os,device_os_version,latency,megacycles,cost,device_brand,device_family,browser_version,app,ua_parse'.split(',')
    for row in query_job.result():
        var = list(row)
        l1 = dict(zip(CSV_COLUMNS, var))
        l1.update({'key': ''})
    # with LIMIT 1 in the query there is a single row, so l1 holds that row
    l2 = {'instances': [l1]}
    return l2
I am building a DataSnap server in Delphi XE2, and I'm having trouble figuring out the best way to insert the JSON object that the client sends me into a database.
Here's the format of the object I'm receiving:
{"PK":0,"FIELD1":"EXAMPLE","FIELD2":5, "DATE":""}
The solution I found was the following code:
with qryDBMethods do
begin
  SQL.Text := 'SELECT * FROM Table';
  Open;
  Append;
  FieldByName('PK').AsInteger := StrToInt(newId.ToString);
  FieldByName('FIELD1').AsString := Object.Get('FIELD1').JsonValue.Value;
  FieldByName('FIELD2').AsInteger := StrToInt(Object.Get('FIELD2').JsonValue.Value);
  FieldByName('DATE').AsDateTime := Now;
  Post;
end;
After that, I would have my query component made into a JSON object and returned to my client. The problem is, this is going to be a big application with dense tables, so doing a "SELECT *" every time I want to insert something isn't ideal. What is the best way to do this?
Try using an INSERT statement instead.
with qryDBMethods do
begin
  SQL.Text := 'INSERT INTO Table (PK, FIELD1, FIELD2) VALUES (:PK, :FIELD1, :FIELD2)';
  ParamByName('PK').Value := StrToInt(newId.ToString);
  ParamByName('FIELD1').Value := Object.Get('FIELD1').JsonValue.Value;
  ParamByName('FIELD2').Value := StrToInt(Object.Get('FIELD2').JsonValue.Value);
  ExecSQL();
end;
If the problem is the amount of data loaded when you open a Select * From Table, why not do something like Select * From Table Where 1 <> 1?
This way you would be able to insert while not loading any result.
Another option would be to make an INSERT script instead.
Is there a way to do a bulk update on a collection with LINQ? Currently, if I have a List<myObject> and I want to update column1 to equal TEST for every row in the list, I set up a foreach loop and then, for each individual object, set the value and save it. This works fine, but I was wondering if there is some LINQ method out there where I could do something like myObject.BulkUpdate(columnName, value)?
Your requirement here is entirely possible using Linq expressions and Terry Aney's excellent library on this topic.
Batch Updates and Deletes with LINQ to SQL
An update in terms of the example you gave would be as follows:
using BTR.Core.Linq;
...
Context.myObjects.UpdateBatch
(
Context.myObjects.Where(x => x.columnName != value),
x => new myObject { columnName = value}
);
Edit (2017-01-20): It's worth noting this is now available in the form of a NuGet package at https://www.nuget.org/packages/LinqPost/.
Install-Package LinqPost
Sounds like you're using LINQ To SQL, and you've got the basics laid out already.
LINQ To SQL is about abstracting tables into classes, and doesn't really provide the 'silver bullet' or one-liner you are looking for.
The only way to achieve your one-liner would be to make a stored proc that takes the column name and new value, and implement that logic yourself.
db.MassUpdateTableColumn("Customer", "Name", "TEST");
....
CREATE PROC MassUpdateTableColumn
@TableName varchar(100), @ColumnName varchar(100), @NewVal varchar(100)
AS
/*your dynamic SQL to update a table column with a new val. */
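For illustration, a hedged sketch of what that dynamic SQL body could look like, using QUOTENAME and sp_executesql to limit the injection risk:
-- illustrative sketch: quote the identifiers, parameterize the value
DECLARE @sql nvarchar(max) =
    N'UPDATE ' + QUOTENAME(@TableName) +
    N' SET ' + QUOTENAME(@ColumnName) + N' = @NewVal';
EXEC sp_executesql @sql, N'@NewVal varchar(100)', @NewVal = @NewVal;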
Otherwise, it's as you describe:
List<Customer> myCusts = db.Customers.ToList();
foreach(Customer c in myCusts)
{
c.Name = "TEST";
}
db.SubmitChanges();
LINQ to SQL (or EF for that matter), is all about bringing objects into memory, manipulating them, and then updating them with separate database requests for each row.
In cases where you don't need to hydrate the entire object on the client, it is much better to use server side operations (stored procs, TSQL) instead of LINQ. You can use the LINQ providers to issue TSQL against the database. For example, with LINQ to SQL you can use context.ExecuteCommand("Update table set field=value where condition"), just watch out for SQL Injection.
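As a concrete sketch, the set-based TSQL behind the earlier example (setting every customer's Name to TEST) is a single statement rather than one round trip per row:
-- one statement updates all rows server-side
UPDATE Customer SET Name = 'TEST';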
EF Core 7.0 introduces Bulk Update and Bulk Delete.
For example, consider the following LINQ query terminated with a call to ExecuteUpdateAsync:
var priorToDateTime = new DateTime(priorToYear, 1, 1);
await context.Tags
.Where(t => t.Posts.All(e => e.PublishedOn < priorToDateTime))
.ExecuteUpdateAsync(s => s.SetProperty(t => t.Text, t => t.Text + " (old)"));
This generates SQL to immediately update the "Text" column of all tags for posts published before the given year:
UPDATE [t]
SET [t].[Text] = [t].[Text] + N' (old)'
FROM [Tags] AS [t]
WHERE NOT EXISTS (
SELECT 1
FROM [PostTag] AS [p]
INNER JOIN [Posts] AS [p0] ON [p].[PostsId] = [p0].[Id]
WHERE [t].[Id] = [p].[TagsId] AND [p0].[PublishedOn] < @__priorToDateTime_1)