I have a CSV file that I am converting to JSON array format. Below are the row-wise operations in the Expression transformation for the two fields.
region
country
Json(output port): '{'||'"region": '||'"'||Region||'",'||'"Country": '||'"'||Country||'"'||'},'
output:
{"region": "Australia and Oceania", "Country": "Tuvalu"},
This output is saved in a text file with session file properties as fixed width.
second mapping expression:
JSON(input)
V_JSON_start(variable port):INSTR(JSON,'{',1,1)
V_JSON_end(variable port):instr(JSON,'}',1,10)
O_Json(output port):'['||substr(JSON,V_JSON_start,V_JSON_end)||']'
output:
[{"region": "Australia and Oceania","Country": "Tuvalu"},
{"region": "Central America and the Caribbean","Country": "Grenada"}]
When I try to fetch the next 10 records as JSON, it pulls 20 records instead of 10.
Below is the expression:
JSON(input)
V_JSON_start(variable port):INSTR(JSON,'{',1,11)
V_JSON_end(variable port):instr(JSON,'}',1,20)
O_Json(output port):'['||substr(JSON,V_JSON_start,V_JSON_end)||']'
Please take a look and point out what I am missing.
Input: flat file (CSV with two fields, region and country)
Expected output: 5 sessions, each producing 10 records in JSON format
e.g. [{"region":"value","country":"value"},
{"region":"value","country":"value"}]
session1(csv to json) -->session 2--session3--session4--session5--session6(all parallel sessions using the file of 1st session 5 records in json format)
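One likely cause, assuming Informatica's SUBSTR(string, start, length) semantics, where the third argument is a length and not an end position: the first mapping only works because it starts at position 1, so the end position and the length happen to coincide. A minimal Python sketch of the same arithmetic follows; `instr` and `substr` are hypothetical stand-ins for the 1-based Informatica functions, and the sample records are made up.

```python
def instr(s, sub, start, occurrence):
    """1-based position of the nth occurrence of sub in s, or 0 if absent (mimics INSTR)."""
    pos = start - 1
    for _ in range(occurrence):
        pos = s.find(sub, pos)
        if pos == -1:
            return 0
        pos += 1  # 1-based position of this match, and the next search offset
    return pos


def substr(s, start, length):
    """1-based substring of the given length (mimics SUBSTR(string, start, length))."""
    return s[start - 1:start - 1 + length]


# Made-up input: 30 records concatenated the way the first session writes them.
records = ['{"region": "r%d","Country": "c%d"},' % (i, i) for i in range(1, 31)]
data = "".join(records)

start = instr(data, "{", 1, 11)  # position of the 11th opening brace
end = instr(data, "}", 1, 20)    # position of the 20th closing brace

wrong = substr(data, start, end)              # end position passed as a length
right = substr(data, start, end - start + 1)  # length computed from both positions

print(wrong.count("{"), right.count("{"))
```

With this sample data the first slice contains 20 opening braces and the second contains 10. Under the same assumption, the expression would become substr(JSON, V_JSON_start, V_JSON_end - V_JSON_start + 1).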
I would like to know whether there is a query to select values from all of my XML data fields. Around 1k rows contain XML data, and all of them have almost the same structure. With ExtractValue I was able to extract one data field, but as soon as my subquery returns more than one row it breaks.
Here is an example of the XML data in my database:
<EDLXML version="1.0.0" type="variable">
<properties id="template_variables">
<deliveredDuration>4444</deliveredDuration>
<deliveredNum>1</deliveredNum>
<comment/>
<projectname>cdfkusen</projectname>
<name>kral_schalke_trenink</name>
<order_id>372846</order_id>
<cutlistId>2763_ID</cutlistId>
<bcutlistId>51ddgf7a6-1268-1gdfged-95e6-5254000e8e1a</bcutlistId>
<num>1</num>
<duration>177760</duration>
<quotaRelevantDuration>0</quotaRelevantDuration>
<organisationUid>OrgName</organisationUid>
<organisationQuota>333221233</organisationQuota>
<organisationUsedQuota>123</organisationUsedQuota>
<organisationContingentIrrelevantQuotaUsed>54</organisationContingentIrrelevantQuotaUsed>
<userDbId>7xxxx84-eb9b-11fdsb-9ddd1-52cccccde1a</userDbId>
<userId>xxxx</userId>
<userRights>RH_DBC</userRights>
<firstName>DThom</firstName>
<lastName>Test</lastName>
<userMail>xxx#ccc.cz</userMail>
<language>English</language>
<orderTimestamp>1659448080</orderTimestamp>
<stitching>false</stitching>
<transcode>NO</transcode>
<destination>Standard</destination>
<collaboration>private</collaboration>
<premiumUser>false</premiumUser>
<priority>normal</priority>
<userMail2>xxx#ccc.cz</userMail2>
<cutlistItems>
<cutListId>125124_KFC</cutListId>
<cutListItemId cutlistItemDeliveryStatus="✔" cutlistItemDStatusMessage="delivered">112799</cutListItemId>
<bmarkerId>8f16ff80-1269-11ed-95e6-5254000e8e1a</bmarkerId>
<videoId>2912799</videoId>
<counter>1</counter>
<frameInSpecified>true</frameInSpecified>
<frameIn>15638</frameIn>
<frameOutSpecified>true</frameOutSpecified>
<frameOut>20082</frameOut>
<tcIn>00:10:25:13</tcIn>
<tcOut>00:13:23:07</tcOut>
<duration>177760</duration>
<BroadcastDate>2021-07-24</BroadcastDate>
<eventDate>2021-07-24</eventDate>
<resolutionFacet>HD</resolutionFacet>
<provider>DBC</provider>
<technicalrightholders>RH_DBC</technicalrightholders>
<rights>DBC</rights>
<materialType>DP</materialType>
<targetFilename>kral_schalke_trenink</targetFilename>
</cutlistItems>
</properties>
</EDLXML>
I get the right value from the query if I run:
SELECT ExtractValue((SELECT job_xml from cutlist where job_xml is not null LIMIT 1), '//deliveredNum');
But when I increase the LIMIT, I get back: Subquery returns more than 1 row.
ExtractValue expects two string arguments. When your subquery returns more than one row, you are no longer passing a single string as the first argument; you are passing a set of results.
Instead of calling ExtractValue once for the whole query, call it once per row:
SELECT ExtractValue(job_xml, '//deliveredNum')
FROM cutlist
WHERE job_xml IS NOT NULL;
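The per-row idea can also be sketched outside the database. The Python example below is an illustration only, not MySQL's ExtractValue: the `rows` list is a hypothetical stand-in for the job_xml column, and the extraction function takes exactly one document per call, just as the query above passes one row's XML per call.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the job_xml values of three rows (one is NULL).
rows = [
    "<EDLXML><properties><deliveredNum>1</deliveredNum></properties></EDLXML>",
    None,
    "<EDLXML><properties><deliveredNum>3</deliveredNum></properties></EDLXML>",
]


def extract_value(xml_text, tag):
    """Return the text of the first <tag> element found anywhere in one document."""
    element = ET.fromstring(xml_text).find(".//" + tag)
    return element.text if element is not None else None


# One call per row, skipping NULLs like the WHERE clause does.
delivered = [extract_value(x, "deliveredNum") for x in rows if x is not None]
print(delivered)
```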
The software I'm using saves a copy of the data, in what I think is JSON, to a separate table whenever I save records in the database.
What I want to do is query the JSON data contained in the DATASETS column separately.
My server is SQL Server 2012.
This is the query I have tried so far:
SELECT TOP 1 IND, SNAPSHOTDATE, DATASETS, USERNAME, OWNERFORM
FROM TBLSNAPSHOTS
CODE RESULT:
105 2018-09-14 02:59:34.000 { "Datasets": [{"Name": "TBLSTOKLAR","Lines": [{"IND": "102","STOKNO": "","MALINCINSI": "TITIZ PLASTIK BUYUK KASIK 10 ADET","STOKKODU": "8691262708050","ANABIRIM": "102","BIRIMEX": "102","ALTSEVIYE": "","KRITIKSEVIYE": "","USTSEVIYE": "","DEPOSEVIYESI": "True","URETICI": "","AYLIKVADE": "0","SERINO": "","DEPO": "1","STOKGRUBU": "","GARANTI": "0","PRIM": "0","IPTAL": "False","STOKTIPI": "0","STOKTAKIP": "0","TEMINYERI": "1","RAFOMRU": "0","RESIM": "","KALAN": "0","REZERV": "0","KOD1": "","KOD2": "","KOD3": "","KOD4": "","KOD5": "","KOD6": "","KOD7": "","KOD8": "","KOD9": "","KOD10": "","TAKSITSAYISI": "0","ISTIHBARAT": "","FIYATYOK": "","DELETED": "","ALISFIYATI": "0","ESKIALISFIYATI": "0","SONALISTARIHI": "","SONSATISTARIHI": "","KARTINACILMATARIHI": "14.09.18 ı. 02:57:58","DEVIRIND": "","MALIYET": "1","KDVGRUBU": "1","AKTIF": "False","ISCILIKIND": "0","ISCILIKBIRIMIND": "0","ISCILIKACIKLAMA": "","ISCILIKSTOKKODU": "","ALISFIYATIDEGISMETARIHI": "","STATUS": "1","DALISFIYATI": "","APB": "","OIV": "0","KARORANI": "0","OTV": "0","ISK": "0","STOKGRUPTANIMI": "","ISKSATISFIYATI2": "0","ISKSATISFIYATI3": "0","ALISKDVORANI": "18","ALISISKORANI": "","SIPARISALINMASIN": "False","SIPARISVERILMESIN": "False","P1": "","P2": "","P3": "","SATISKOSULU": "","DEFAULTALISFIYATI": "","DEFAULTALISFIYATIDEGISMESTARIHI": "","KDVGRUBUT": "","HEDEFSATISFIAYTI": "","KURUMISKONTOSU": "","TICARIISKONTO": "","ITSBILDIRIMI": "False","MAXISKORANI": "","IMALATCISATISFIYATI": "","DKUR": "1","ACILSEVK": "False","SOGUKSEVK": "False","ICMIKTAR": "","TICARISEKIL": "","MAXISKTUTAR": "","TAXE": "","KOD11": "","DAPB": "","IKINCIEL": "","ETICARET": "","STOKNEVI": "0","OTVORANSAL": "True","POZ": "","YAZARKASA": "False","KOD12": "","KOD13": "","KOD14": "","KOD15": "","KOD16": "","KOD17": "","KOD18": "","KOD19": "","KOD20": "","KOD21": "","UID": "{0DE71D73-E447-45B0-BF6A-1D312DBAFDD2}"}]}]} ADMIN frmEdtStok
In SQL Server 2012, no, you can't query the JSON directly. SQL Server 2016 added functions that let you do this:
https://learn.microsoft.com/en-us/sql/t-sql/functions/json-query-transact-sql?view=sql-server-2017
If you need to stay on 2012, you are limited to parsing it as a string (don't do this), or to writing or finding a CLR function that parses it with .NET code and returns the results.
If you simply must do it quickly, there are some hacky solutions for parsing it, such as https://www.red-gate.com/simple-talk/sql/t-sql-programming/consuming-json-strings-in-sql-server/, but don't expect them to work smoothly with complex JSON.
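If parsing in application code is acceptable (an assumption; the options above are a CLR function or T-SQL string parsing), the structure is easy to walk with a real JSON parser. The sketch below uses Python rather than .NET and a shortened copy of the DATASETS value from the question; `iter_lines` is a hypothetical helper name.

```python
import json

# Shortened copy of the DATASETS value from the question (most keys omitted).
datasets_text = (
    '{ "Datasets": [{"Name": "TBLSTOKLAR","Lines": [{"IND": "102",'
    '"MALINCINSI": "TITIZ PLASTIK BUYUK KASIK 10 ADET",'
    '"STOKKODU": "8691262708050"}]}]}'
)


def iter_lines(text):
    """Yield (dataset name, line dict) pairs from one DATASETS value."""
    for dataset in json.loads(text)["Datasets"]:
        for line in dataset["Lines"]:
            yield dataset["Name"], line


for name, line in iter_lines(datasets_text):
    print(name, line["IND"], line["STOKKODU"])
```

Each dataset carries a table name and a list of row dictionaries, so the pairs produced here could be filtered or loaded into a regular table afterwards.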
I am trying to use Spark to read a CSV file in a Jupyter notebook. So far I have:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[4]").getOrCreate()
reviews_df = spark.read.option("header","true").csv("small.csv")
reviews_df.collect()
This is what reviews_df looks like:
[Row(reviewerID=u'A1YKOIHKQHB58W', asin=u'B0001VL0K2', overall=u'5'),
Row(reviewerID=u'A2YB0B3QOHEFR', asin=u'B000JJSRNY', overall=u'5'),
Row(reviewerID=u'AAI0092FR8V1W', asin=u'B0060MYKYY', overall=u'5'),
Row(reviewerID=u'A2TAPSNKK9AFSQ', asin=u'6303187218', overall=u'5'),
Row(reviewerID=u'A316JR2TQLQT5F', asin=u'6305364206', overall=u'5')...]
But each row of the DataFrame contains the column names. How can I reformat the data so that it becomes:
[(u'A1YKOIHKQHB58W', u'B0001VL0K2', u'5'),
(u'A2YB0B3QOHEFR', u'B000JJSRNY', u'5')....]
A DataFrame always returns Row objects; that's why calling collect() on a DataFrame shows:
Row(reviewerID=u'A1YKOIHKQHB58W', asin=u'B0001VL0K2', overall=u'5')
To get what you want, you can do:
reviews_df.rdd.map(lambda row : (row.reviewerID,row.asin,row.overall)).collect()
This returns a list of tuples holding each row's values.
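Because pyspark.sql.Row is a subclass of tuple, converting by position gives the same result as naming each field. The sketch below uses a namedtuple as a stand-in for Row so that it runs without Spark; the `collected` list mimics what collect() returns.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (which is also a tuple subclass), so no Spark is needed.
Row = namedtuple("Row", ["reviewerID", "asin", "overall"])

collected = [
    Row("A1YKOIHKQHB58W", "B0001VL0K2", "5"),
    Row("A2YB0B3QOHEFR", "B000JJSRNY", "5"),
]

# Field-by-field, as in the lambda above.
by_name = [(row.reviewerID, row.asin, row.overall) for row in collected]

# By position: works for any number of columns without listing them.
by_position = [tuple(row) for row in collected]

print(by_position)
```

On a real DataFrame the positional form would presumably be reviews_df.rdd.map(tuple).collect(), which avoids hard-coding the column names.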
Because I cannot use spark-csv, I have manually created a DataFrame from the CSV as follows:
raw_data = sc.textFile("data/ALS.csv").cache()
csv_data = raw_data.map(lambda l: l.split(","))
header = csv_data.first()
csv_data = csv_data.filter(lambda line: line != header)
row_data = csv_data.map(lambda p: Row(
    location_history_id=p[0],
    user_id=p[1],
    latitude=p[2],
    longitude=p[3],
    address=p[4],
    created_at=p[5],
    valid_until=p[6],
    timezone_offset_secs=p[7],
    opening_times_id=p[8],
    timezone_id=p[9]))
location_df = sqlContext.createDataFrame(row_data)
location_df.registerTempTable("locations")
I need only two columns:
lati_longi_df = sqlContext.sql("""SELECT latitude, longitude FROM locations""")
rdd_lati_longi = lati_longi_df.map(lambda data: Vectors.dense([float(c) for c in data]))
rdd_lati_longi.take(2) returns:
[DenseVector([-6.2416, 106.7949]),
DenseVector([-6.2443, 106.7956])]
Now it seems that everything is ready for KMeans training:
clusters = KMeans.train(rdd_lati_longi, 10, maxIterations=30,
    runs=10, initializationMode="random")
But I get the following error:
IndexError: list index out of range
First three lines of ALS.csv:
location_history_id,user_id,latitude,longitude,address,created_at,valid_until,timezone_offset_secs,opening_times_id,timezone_id
Why don't you let Spark parse the CSV instead? You can enable CSV support with something like this:
pyspark --packages com.databricks:spark-csv_2.10:1.4.0
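One plausible reason the manual split fails (an assumption, since the data rows are not shown): splitting lines and commas by hand miscounts fields when a quoted address contains a comma or a line break, and blank lines yield a single empty field, so p[9] may not exist. take(2) would still succeed because it only evaluates the first rows, while training touches every row. A sketch with Python's csv module on made-up data, using four columns for brevity:

```python
import csv
import io

# Made-up sample: a quoted address containing a comma and a line break,
# followed by a blank line.
text = (
    "id,latitude,longitude,address\n"
    '1,-6.2416,106.7949,"Jl. Example 1,\nJakarta"\n'
    "\n"
)

# Naive approach, comparable to textFile(...).map(lambda l: l.split(",")).
naive = [line.split(",") for line in text.splitlines()]

# csv.reader honours quoting; blank lines come back as empty rows and are skipped.
parsed = [row for row in csv.reader(io.StringIO(text, newline="")) if row]

print([len(fields) for fields in naive])
print([len(fields) for fields in parsed])
```

The naive split produces rows of varying length, while the parser returns four fields for both the header and the data row. A dedicated CSV reader such as the spark-csv package is meant to handle these cases in the same way.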