Querying JSON data stored in a column - json

The software I'm using saves a copy of the record data, which I believe is JSON, into a separate table whenever I save records in the database.
What I want is to be able to query the JSON data contained in the DATASETS column separately.
I'm using SQL Server 2012.
This is the query I tried so far:
SELECT TOP 1 IND, SNAPSHOTDATE, DATASETS, USERNAME, OWNERFORM
FROM TBLSNAPSHOTS
Result:
105 2018-09-14 02:59:34.000 { "Datasets": [{"Name": "TBLSTOKLAR","Lines": [{"IND": "102","STOKNO": "","MALINCINSI": "TITIZ PLASTIK BUYUK KASIK 10 ADET","STOKKODU": "8691262708050","ANABIRIM": "102","BIRIMEX": "102","ALTSEVIYE": "","KRITIKSEVIYE": "","USTSEVIYE": "","DEPOSEVIYESI": "True","URETICI": "","AYLIKVADE": "0","SERINO": "","DEPO": "1","STOKGRUBU": "","GARANTI": "0","PRIM": "0","IPTAL": "False","STOKTIPI": "0","STOKTAKIP": "0","TEMINYERI": "1","RAFOMRU": "0","RESIM": "","KALAN": "0","REZERV": "0","KOD1": "","KOD2": "","KOD3": "","KOD4": "","KOD5": "","KOD6": "","KOD7": "","KOD8": "","KOD9": "","KOD10": "","TAKSITSAYISI": "0","ISTIHBARAT": "","FIYATYOK": "","DELETED": "","ALISFIYATI": "0","ESKIALISFIYATI": "0","SONALISTARIHI": "","SONSATISTARIHI": "","KARTINACILMATARIHI": "14.09.18 ı. 02:57:58","DEVIRIND": "","MALIYET": "1","KDVGRUBU": "1","AKTIF": "False","ISCILIKIND": "0","ISCILIKBIRIMIND": "0","ISCILIKACIKLAMA": "","ISCILIKSTOKKODU": "","ALISFIYATIDEGISMETARIHI": "","STATUS": "1","DALISFIYATI": "","APB": "","OIV": "0","KARORANI": "0","OTV": "0","ISK": "0","STOKGRUPTANIMI": "","ISKSATISFIYATI2": "0","ISKSATISFIYATI3": "0","ALISKDVORANI": "18","ALISISKORANI": "","SIPARISALINMASIN": "False","SIPARISVERILMESIN": "False","P1": "","P2": "","P3": "","SATISKOSULU": "","DEFAULTALISFIYATI": "","DEFAULTALISFIYATIDEGISMESTARIHI": "","KDVGRUBUT": "","HEDEFSATISFIAYTI": "","KURUMISKONTOSU": "","TICARIISKONTO": "","ITSBILDIRIMI": "False","MAXISKORANI": "","IMALATCISATISFIYATI": "","DKUR": "1","ACILSEVK": "False","SOGUKSEVK": "False","ICMIKTAR": "","TICARISEKIL": "","MAXISKTUTAR": "","TAXE": "","KOD11": "","DAPB": "","IKINCIEL": "","ETICARET": "","STOKNEVI": "0","OTVORANSAL": "True","POZ": "","YAZARKASA": "False","KOD12": "","KOD13": "","KOD14": "","KOD15": "","KOD16": "","KOD17": "","KOD18": "","KOD19": "","KOD20": "","KOD21": "","UID": "{0DE71D73-E447-45B0-BF6A-1D312DBAFDD2}"}]}]} ADMIN frmEdtStok

In SQL Server 2012 - no, you can't directly query the JSON. SQL Server 2016 added functions that let you do this:
https://learn.microsoft.com/en-us/sql/t-sql/functions/json-query-transact-sql?view=sql-server-2017
But if you need to stay on 2012, you are limited to string-parsing it (don't do this), or writing/finding a CLR function that parses it using .NET code and returns the results.
If you simply must do it quickly, there are some hacky solutions to parse it, like so: https://www.red-gate.com/simple-talk/sql/t-sql-programming/consuming-json-strings-in-sql-server/ but don't expect it to work smoothly with complex JSON.
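For reference, if upgrading is an option, here is a minimal sketch of what the 2016+ built-ins look like against the table from the question. The JSON paths are assumptions inferred from the sample row above and may need adjusting:

-- SQL Server 2016+ only: extract individual values from the DATASETS column.
-- Table and column names come from the question; the JSON paths are guesses
-- based on the sample output.
SELECT TOP 1
    IND,
    SNAPSHOTDATE,
    JSON_VALUE(DATASETS, '$.Datasets[0].Name')              AS DatasetName,
    JSON_VALUE(DATASETS, '$.Datasets[0].Lines[0].STOKKODU') AS StokKodu
FROM TBLSNAPSHOTS;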

AWS Athena and handling json

I have millions of files in the following (poorly structured) JSON format:
{
  "3000105002": [
    {
      "pool_id": "97808",
      "pool_name": "WILDCAT (DO NOT USE)",
      "status": "Zone Permanently Plugged",
      "bhl": "D-12-10N-05E 902 FWL 902 FWL",
      "acreage": ""
    },
    {
      "pool_id": "96838",
      "pool_name": "DRY & ABANDONED",
      "status": "Zone Permanently Plugged",
      "bhl": "D-12-10N-05E 902 FWL 902 FWL",
      "acreage": ""
    }
  ]
}
I've tried to write an Athena DDL that would accommodate this type of structure (especially the varying api field) with this:
CREATE EXTERNAL TABLE wp_info (
api:array < struct < pool_id:string,
pool_name:string,
status:string,
bhl:string,
acreage:string>>)
LOCATION 's3://foo/'
After trying to generate a table with this, the following error is thrown:
Your query has the following error(s):
FAILED: ParseException line 2:12 cannot recognize input near ':' 'array' '<' in column type
What is a workable solution to this issue? Note that the api string is different in every one of the million files, and the key name "api" does not actually appear within any of the files, so I hope there is a way for Athena to accommodate just the string-typed values in these data.
If you don't have control over the JSON format that you are receiving, and you don't have a streaming service in the middle to transform the JSON into something simpler, you can use regex functions to retrieve the relevant data that you need.
A simple way to do it is a Create-Table-As-Select (CTAS) query that converts the data from its complex JSON format into a simpler table format.
CREATE TABLE new_table
WITH (
external_location = 's3://path/to/ctas_partitioned/',
format = 'Parquet',
parquet_compression = 'SNAPPY')
AS SELECT
regexp_extract(line, '"pool_id": "(\d+)"', 1) as pool_id,
regexp_extract(line, '"pool_name": "([^"]*)"', 1) as pool_name,
...
FROM json_lines_table;
Queries against the new table will also perform better, since the data is stored in Parquet format.
Note that you can also update the table when you get new data, by running the CTAS query again with external_location set to 's3://path/to/ctas_partitioned/part=01' or any other partition scheme.
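For completeness, the json_lines_table referenced above is assumed to be a plain single-column table over the raw files, so that each line of text becomes one row for the regexes to scan. A sketch, with placeholder names and location:

-- Hypothetical helper table: one string column holding each raw line.
-- The default text serde splits on '\001', which should not occur in JSON,
-- so the whole line lands in the single column.
CREATE EXTERNAL TABLE json_lines_table (
  line string
)
LOCATION 's3://foo/';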

Max date row using Talend Data Integration

Using Talend Data Integration, the following data is read in from a CSV file:
Time A B C
18:22.0 -0.30107087 3.79899955 6.52000999
18:23.5 -9.648633 0.84515321 1.50116444
18:25.0 -6.01184034 7.66982508 4.42568207
18:25.9 -9.22426033 3.12263751 5.10443783
18:26.7 -9.00698662 4.03901815 0.01316811
18:27.4 -4.31255579 6.25724602 5.02961922
18:28.2 -2.67013335 7.5932107 5.41628265
18:28.8 -1.76213241 6.26981592 7.44536877
18:29.5 -2.18590617 5.58567238 6.23928976
18:30.3 0.42078096 3.1429882 8.46290493
18:30.9 0.36391866 3.02926373 8.86752415
18:31.6 0.35673606 3.07176089 8.93396378
18:32.4 0.35374331 3.05081153 8.93994904
18:33.0 0.38187516 3.05799413 8.89745235
18:33.7 0.32920274 3.03644633 8.9315691
18:34.4 0.37529111 3.07475352 8.93575954
18:35.0 0.40342298 3.07654929 8.86393356
18:35.7 0.35254622 3.05260706 8.9267807
How do I extract only the max date row (18:35.7 0.35254622 3.05260706 8.9267807) and load it into a JSON file?
You can achieve this by sorting your file on the Time column (a reverse alphanumeric sort puts the most recent Time on top), then taking only the first row and writing it to a JSON file, for example with a tFileInputDelimited -> tSortRow -> tSampleRow -> tFileOutputJSON flow. Like so:
tSampleRow_1 config: (screenshot not included in this copy; the range would keep only line 1)

MySQL query to retrieve tabular data in json format

I have a table like the one below in a MySQL database:
user-name mail
ganesh g#g.com
gani gani#gani.com
gan gan#gan.com
I need a query to retrieve the above table in JSON format.
Example:
[{
  user-name: "ganesh",
  mail: "g#g.com"
},
{
  user-name: "gani",
  mail: "gani#gani.com"
},
{
  user-name: "gan",
  mail: "gan#gan.com"
}]
I need help to do the above.
It's not recommended to do such things in the DBMS; do it in the script that is loading the data instead. If you're wrapping some legacy code you can't edit, then wrap it with more code to format the data.
If all that fails, do something like this: http://www.thomasfrank.se/mysql_to_json.html
SELECT
CONCAT("[",
GROUP_CONCAT(
CONCAT("{username:'",username,"'"),
CONCAT(",email:'",email,"'}")
)
,"]")
AS json FROM users;
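Alternatively, if you are on MySQL 5.7.22 or later, the built-in JSON functions produce properly quoted, valid JSON without the string gymnastics. A sketch assuming the question's columns and a table named users (substitute `user-name` and mail for the username/email columns used in the snippet above):

SELECT JSON_ARRAYAGG(
         JSON_OBJECT('user-name', `user-name`, 'mail', mail)
       ) AS json
FROM users;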

Load only a few values from complex JSON object in Pig Latin

I have a complex JSON file that looks like this: http://pastebin.com/4UfadbqS
I would like to load only a few values from these JSON objects using Pig Latin. I tried doing it like this:
mydata = LOAD 'data.json'
USING JsonLoader('id:chararray, created_at:chararray,
user: {(language:chararray)}');
STORE mydata
INTO 'output';
But it seems that Pig is just taking the first three values from the JSON and saving them (it does not recognize the column name as a key). Is there a way to achieve this? Or should I just list ALL the values from the JSON in Pig and filter them afterwards?
There are a few problems with the above approach:
1. JsonLoader always expects the full schema of your input, but you gave only three fields.
2. JsonLoader expects the entire input on a single line, but your input spans multiple lines.
3. JsonLoader does not support nested schemas, but your input contains nesting.
To solve all of the above problems you can use the third-party elephant-bird library.
Download the jar files (elephant-bird-pig-4.1.jar and elephant-bird-hadoop-compat-4.1.jar) from this link
http://www.java2s.com/Code/Jar/e/elephant.htm and try the approach below.
I copied your entire input and formatted it as a single line, as below.
input.json
{"filter_level":"medium","retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":488927960280211456,"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"created_at":"Tue Jul 15 06:08:04 +0000 2014","favorite_count":0,"place":null,"coordinates":null,"text":"RT #BulleyBufton: #MinaANDMaya PLEASE RT /VOTE BULLEY. Last day to help me win my old rescue #HilbraesDogs £5k https://t.co/Y8g47fLYY1 http\u2026","contributors":null,"retweeted_status":{"filter_level":"low","contributors":null,"text":"#MinaANDMaya PLEASE RT /VOTE BULLEY. Last day to help me win my old rescue #HilbraesDogs £5k https://t.co/Y8g47fLYY1 http://t.co/DDco9wVXtP","geo":null,"retweeted":false,"in_reply_to_screen_name":"MinaANDMaya","possibly_sensitive":false,"truncated":false,"lang":"en","entities":{"trends":[],"symbols":[],"urls":[{"expanded_url":"https://www.animalfriendsquote.co.uk/fb-worldcup/","indices":[93,116],"display_url":"animalfriendsquote.co.uk/fb-worldcup/","url":"https://t.co/Y8g47fLYY1"}],"hashtags":[],"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":455},"large":{"w":706,"resize":"fit","h":946},"medium":{"w":600,"resize":"fit","h":803}},"id":488926730481332224,"media_url_https":"https://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","media_url":"http://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","expanded_url":"http://twitter.com/BulleyBufton/status/488926827394904064/photo/1","indices":[117,139],"id_str":"488926730481332224","type":"photo","display_url":"pic.twitter.com/DDco9wVXtP","url":"http://t.co/DDco9wVXtP"}],"user_mentions":[{"id":132204038,"name":"Mina*Bad Yoga Kitty*","indices":[0,12],"screen_name":"MinaANDMaya","id_str":"132204038"},{"id":2308374684,"name":"Julianna Kaminski","indices":[75,88],"screen_name":"HilbraesDogs","id_str":"2308374684"}]},"in_reply_to_status_id_str":null,"id":488926827394904064,"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android<\/a>","in_reply_to_user_id_str":"132204038","favorited":false,"in_reply_to_status_id":null,"retweet_count":6,"created_at":"Tue Jul 15 06:03:34 +0000 2014","in_reply_to_user_id":132204038,"favorite_count":3,"id_str":"488926827394904064","place":null,"user":{"location":"CHICAGO , USA","default_profile":false,"statuses_count":8868,"profile_background_tile":true,"lang":"en","profile_link_color":"AD54E8","profile_banner_url":"https://pbs.twimg.com/profile_banners/225136520/1403608773","id":225136520,"following":null,"favourites_count":5082,"protected":false,"profile_text_color":"3D1957","verified":false,"description":"I'm Bulley, I'm proof that there is always hope.\r\nI was in rescue kennels in UK for 9yrs. #ada_bscakes took me in.\r\nWe've moved to America to start a new life.","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"BULLEY","profile_background_color":"0A0A0A","created_at":"Fri Dec 10 19:55:17 +0000 2010","default_profile_image":false,"followers_count":3421,"profile_image_url_https":"https://pbs.twimg.com/profile_images/486614595457789952/gtcLac9w_normal.jpeg","geo_enabled":true,"profile_background_image_url":"http://pbs.twimg.com/profile_background_images/378800000166829702/isbjd7O4.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/378800000166829702/isbjd7O4.jpeg","follow_request_sent":null,"url":null,"utc_offset":-39600,"time_zone":"International Date Line West","notifications":null,"profile_use_background_image":true,"friends_count":3702,"profile_sidebar_fill_color":"7AC3EE","screen_name":"BulleyBufton","id_str":"225136520","profile_image_url":"http://pbs.twimg.com/profile_images/486614595457789952/gtcLac9w_normal.jpeg","listed_count":29,"is_translator":false},"coordinates":null},"geo":null,"entities":{"trends":[],"symbols":[],"urls":[{"expanded_url":"https://www.animalfriendsquote.co.uk/fb-worldcup/","indices":[111,134],"display_url":"animalfriendsquote.co.uk/fb-worldcup/","url":"https://t.co/Y8g47fLYY1"}],"hashtags":[],"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":455},"large":{"w":706,"resize":"fit","h":946},"medium":{"w":600,"resize":"fit","h":803}},"id":488926730481332224,"media_url_https":"https://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","media_url":"http://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","expanded_url":"http://twitter.com/BulleyBufton/status/488926827394904064/photo/1","source_status_id_str":"488926827394904064","indices":[139,140],"source_status_id":488926827394904064,"id_str":"488926730481332224","type":"photo","display_url":"pic.twitter.com/DDco9wVXtP","url":"http://t.co/DDco9wVXtP"}],"user_mentions":[{"id":225136520,"name":"BULLEY","indices":[3,16],"screen_name":"BulleyBufton","id_str":"225136520"},{"id":132204038,"name":"Mina*Bad Yoga Kitty*","indices":[18,30],"screen_name":"MinaANDMaya","id_str":"132204038"},{"id":2308374684,"name":"Julianna Kaminski","indices":[93,106],"screen_name":"HilbraesDogs","id_str":"2308374684"}]},"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android<\/a>","favorited":false,"in_reply_to_user_id":null,"retweet_count":0,"id_str":"488927960280211456","user":{"location":"","default_profile":false,"statuses_count":1370,"profile_background_tile":true,"lang":"zh-tw","profile_link_color":"038544","profile_banner_url":"https://pbs.twimg.com/profile_banners/2272804116/1404662156","id":2272804116,"following":null,"favourites_count":2000,"protected":false,"profile_text_color":"333333","verified":false,"description":"No More Sorrow","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Winnie","profile_background_color":"14DBBA","created_at":"Thu Jan 02 10:13:01 +0000 2014","default_profile_image":false,"followers_count":311,"profile_image_url_https":"https://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg","geo_enabled":false,"profile_background_image_url":"http://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg","follow_request_sent":null,"url":null,"utc_offset":null,"time_zone":null,"notifications":null,"profile_use_background_image":true,"friends_count":455,"profile_sidebar_fill_color":"DDEEF6","screen_name":"winnie341881","id_str":"2272804116","profile_image_url":"http://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg","listed_count":0,"is_translator":false}}
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'id' AS ID,myMap#'created_at' AS createdAT,myMap#'user' AS User;
DUMP B;
Output:
(488927960280211456,Tue Jul 15 06:08:04 +0000 2014,[location#,default_profile#false,profile_background_tile#true,statuses_count#1370,lang#zh-tw,profile_link_color#038544,profile_banner_url#https://pbs.twimg.com/profile_banners/2272804116/1404662156,id#2272804116,following#,protected#false,favourites_count#2000,profile_text_color#333333,contributors_enabled#false,description#No More Sorrow,verified#false,name#Winnie,profile_sidebar_border_color#000000,profile_background_color#14DBBA,created_at#Thu Jan 02 10:13:01 +0000 2014,default_profile_image#false,followers_count#311,geo_enabled#false,profile_image_url_https#https://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg,profile_background_image_url#http://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg,profile_background_image_url_https#https://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg,follow_request_sent#,url#,utc_offset#,time_zone#,notifications#,friends_count#455,profile_use_background_image#true,profile_sidebar_fill_color#DDEEF6,screen_name#winnie341881,id_str#2272804116,profile_image_url#http://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg,is_translator#false,listed_count#0])
In the elephant-bird library all the values are stored as key/value pairs (i.e., the MAP datatype), so it is easy to extract the required fields from the loaded data.
In the above Pig script I extracted the values of 'id', 'created_at' and 'user' as per your need.
Suppose you want to extract some fields from the 'user' data (e.g., 'friends_count' and 'followers_count'); in that case you need to project the 'user' field and extract the required data. Sample code below.
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'user' AS User;
C = FOREACH B GENERATE User#'friends_count', User#'followers_count';
DUMP C;
Output:
(455,311)

MySQL Query and POSIXct in Loop

I have the following MySQL query using RMySQL. All the database parameters are set beforehand and the query works well. Is there a possibility to put it in a loop to get multiple zoo objects from more than one dpname? Thanks!
dpname.df<-"%Name%"
paste.query.df<-paste("select handle from db.connect where dpname like '",dpname.df,"'",sep='')
handle.df<-dbGetQuery(dbLT,paste.query.df)
paste.query2.df<-paste("select time,value from db.data where handle='",handle.df,"' and time between '",x,"' and '",y,"'",sep='')
df <- dbGetQuery(dbLT,paste.query2.df)
df$time<-as.POSIXct(df$time,format="%Y-%m-%d %H:%M:%S")
df.zoo<-zoo(df[,-1],df[,1])
I tried to set a function with mapply:
query<-function(x,y,dpname.df)
{
paste.query.df<-paste("select handle from db.connect where dpname like '",dpname.df,"'",sep='')
handle.df<-dbGetQuery(dbLT,paste.query.df)
paste.query2.df<-paste("select time,value from db.data where handle='",handle.df,"' and time between '",x,"' and '",y,"'",sep='')
dbGetQuery(dbLT,paste.query2.df)
}
which I can run with mapply(query, x, y, dpname.df).
But I can't get multiple outputs, one for each query. Is it possible to set up another list with the output names? Then I could also put the zoo and POSIXct steps in my function. Thanks!
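No answer is recorded here, but as a minimal sketch of one approach, assuming (as in the question) that dbLT is an open RMySQL connection and x and y are "YYYY-mm-dd HH:MM:SS" strings: move the POSIXct and zoo steps into the function so it returns one zoo object, then call it once per dpname with lapply and name the resulting list.

# Sketch only: the dpname patterns below are hypothetical placeholders.
library(RMySQL)
library(zoo)

query <- function(x, y, dpname.df) {
  paste.query.df <- paste("select handle from db.connect where dpname like '", dpname.df, "'", sep = "")
  handle.df <- dbGetQuery(dbLT, paste.query.df)
  paste.query2.df <- paste("select time,value from db.data where handle='", handle.df$handle[1], "' and time between '", x, "' and '", y, "'", sep = "")
  df <- dbGetQuery(dbLT, paste.query2.df)
  df$time <- as.POSIXct(df$time, format = "%Y-%m-%d %H:%M:%S")
  zoo(df[, -1], df[, 1])   # return the zoo object for this dpname
}

dpnames <- c("%NameA%", "%NameB%")                      # hypothetical patterns
results <- lapply(dpnames, function(d) query(x, y, d))  # one zoo object each
names(results) <- dpnames                               # access as results[["%NameA%"]]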