Querying the last column in a text file using Apache Drill

I am unable to get the value of the last column. Selecting columns[2] or the last column by name does not work. Any suggestions?
I have a text file like this:
PolicyID~Name~AccountID
1~ABC Holdings~12
2~XYZ Insurance~2
When I run SELECT * FROM dfs.root, it works.
When I select either AccountID or columns[2], I get an error.
How do I query the last column in Drill? I get the following error when I query it:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index: 32384, length: 4 (expected: range(0, 16384)) Fragment 0:0 [Error Id: dba982b9-0517-4ab2-a61f-dbf737c253da on 10.100.94.28:31010]

Values from xml data field in mysql

I would like to know whether there is a query to select values from all of my XML data fields. Around 1k rows contain XML data, and all of them have almost the same structure. With ExtractValue I was able to extract one field, but as soon as my subquery returns more than one row it breaks.
Here is an example of the XML data in my database:
<EDLXML version="1.0.0" type="variable">
<properties id="template_variables">
<deliveredDuration>4444</deliveredDuration>
<deliveredNum>1</deliveredNum>
<comment/>
<projectname>cdfkusen</projectname>
<name>kral_schalke_trenink</name>
<order_id>372846</order_id>
<cutlistId>2763_ID</cutlistId>
<bcutlistId>51ddgf7a6-1268-1gdfged-95e6-5254000e8e1a</bcutlistId>
<num>1</num>
<duration>177760</duration>
<quotaRelevantDuration>0</quotaRelevantDuration>
<organisationUid>OrgName</organisationUid>
<organisationQuota>333221233</organisationQuota>
<organisationUsedQuota>123</organisationUsedQuota>
<organisationContingentIrrelevantQuotaUsed>54</organisationContingentIrrelevantQuotaUsed>
<userDbId>7xxxx84-eb9b-11fdsb-9ddd1-52cccccde1a</userDbId>
<userId>xxxx</userId>
<userRights>RH_DBC</userRights>
<firstName>DThom</firstName>
<lastName>Test</lastName>
<userMail>xxx#ccc.cz</userMail>
<language>English</language>
<orderTimestamp>1659448080</orderTimestamp>
<stitching>false</stitching>
<transcode>NO</transcode>
<destination>Standard</destination>
<collaboration>private</collaboration>
<premiumUser>false</premiumUser>
<priority>normal</priority>
<userMail2>xxx#ccc.cz</userMail2>
<cutlistItems>
<cutListId>125124_KFC</cutListId>
<cutListItemId cutlistItemDeliveryStatus="&#10004" cutlistItemDStatusMessage="delivered">112799</cutListItemId>
<bmarkerId>8f16ff80-1269-11ed-95e6-5254000e8e1a</bmarkerId>
<videoId>2912799</videoId>
<counter>1</counter>
<frameInSpecified>true</frameInSpecified>
<frameIn>15638</frameIn>
<frameOutSpecified>true</frameOutSpecified>
<frameOut>20082</frameOut>
<tcIn>00:10:25:13</tcIn>
<tcOut>00:13:23:07</tcOut>
<duration>177760</duration>
<BroadcastDate>2021-07-24</BroadcastDate>
<eventDate>2021-07-24</eventDate>
<resolutionFacet>HD</resolutionFacet>
<provider>DBC</provider>
<technicalrightholders>RH_DBC</technicalrightholders>
<rights>DBC</rights>
<materialType>DP</materialType>
<targetFilename>kral_schalke_trenink</targetFilename>
</cutlistItems>
</properties>
</EDLXML>
I get the right value if I run:
SELECT ExtractValue((SELECT job_xml from cutlist where job_xml is not null LIMIT 1), '//deliveredNum');
But when I increase the limit, I get: Subquery returns more than 1 row.
ExtractValue expects two string arguments. When your subquery returns more than one row, you are no longer passing a single string as the first argument; you are passing a set of results.
Instead of calling ExtractValue once for the entire query, call it once for every row, like this:
SELECT ExtractValue(job_xml, '//deliveredNum')
FROM cutlist
WHERE job_xml IS NOT NULL
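
The same per-row idea can be sketched outside MySQL. This is only an illustration in Python: the rows list is a hypothetical stand-in for the job_xml column, and delivered_num is a made-up helper, not part of any MySQL API.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the job_xml column of three cutlist rows.
rows = [
    "<EDLXML><properties><deliveredNum>1</deliveredNum></properties></EDLXML>",
    None,  # a row where job_xml IS NULL
    "<EDLXML><properties><deliveredNum>3</deliveredNum></properties></EDLXML>",
]

def delivered_num(job_xml):
    """Return the text of the first <deliveredNum> element, or None if absent."""
    node = ET.fromstring(job_xml).find(".//deliveredNum")
    return None if node is None else node.text

# One call per row (skipping NULLs), mirroring ExtractValue(job_xml, ...)
# in the SELECT list rather than one call on the whole result set.
values = [delivered_num(x) for x in rows if x is not None]
print(values)
```

The function receives exactly one document per call, which is the same contract ExtractValue has for its first argument.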

Error when trying to split a column using powerquery in Azure Data Factory -UserQuery : Expression.Error: An error occurred invoking 'Table.AddColumn'

I get the following error when trying to split a column by a space delimiter in Power Query in Data Factory:
UserQuery : Expression.Error: An error occurred invoking 'Table.AddColumn': We can't get the expression for the specified value.
What is causing this and how would I go about resolving it?
Many thanks
The Power Query itself is:
let
Source = dedupedetipscsv,
#"Split Column by Delimiter" = Table.SplitColumn(Source, "Candidate", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Candidate.1", "Candidate.2"}),
#"Split Column by Delimiter1" = Table.SplitColumn(Table.TransformColumnTypes(#"Split Column by Delimiter", {{"ApprovedDate", type text}}, "en-GB"), "ApprovedDate", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"ApprovedDate.1", "ApprovedDate.2"})
in
#"Split Column by Delimiter1"
Note: Power Query will split the column into as many columns as needed. The name of the new columns will contain the same name as the original column. A suffix that includes a dot and a number that represents the split sections of the original column will be appended to the name of the new columns.
The Table.AddColumn step might refer to a variable that is a list. You need to refer to #"Renamed Columns", which is the last step that results in a table.
See: Split columns by delimiter into columns.
An alternative to splitting by length and by position is listed below (M script workarounds):
Table.AddColumn(Source, "First characters", each Text.Start([Email], 7), type text)
Table.AddColumn(#"Inserted first characters", "Text range", each Text.Middle([Email], 4, 9), type text)
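
For readers less familiar with M, the two text functions used above can be approximated in Python as follows. This is a rough sketch, assuming Text.Middle takes a zero-based offset followed by a character count; text_start, text_middle and the sample address are hypothetical names used only for illustration.

```python
def text_start(text, count):
    """Rough analogue of Text.Start: the first `count` characters."""
    return text[:count]

def text_middle(text, start, count):
    """Rough analogue of Text.Middle: `count` characters from zero-based `start`."""
    return text[start:start + count]

email = "jane.doe@example.com"  # hypothetical sample value

first_characters = text_start(email, 7)   # like Text.Start([Email], 7)
text_range = text_middle(email, 4, 9)     # like Text.Middle([Email], 4, 9)
print(first_characters, text_range)
```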

Azure Data Factory - MS Access as Source Database - Error

My source is an Access database.
I generate the source query dynamically as 'Select * from <tableName>'.
However, the source table has field names containing spaces and the destination is of type .parquet, so the Data Factory pipeline fails with the error below.
For example, if the table Employee has a column 'First Name':
{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=UserErrorJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java,
message: java.lang.IllegalArgumentException:field ended by ';': expected ';' but got 'Area' at line 0: message adms_schema { optional binary RAM Area\ntotal entry:10\r
\norg.apache.parquet.schema.MessageTypeParser.check(MessageTypeParser.java:215)\r\norg.apache.parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:188)\r
\norg.apache.parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:112)\r\norg.apache.parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:100)\r
\norg.apache.parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:93)\r\norg.apache.parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:83)\r
\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.getSchema(ParquetWriterBuilderBridge.java:187)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.build
(ParquetWriterBuilderBridge.java:159)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.open(ParquetWriterBridge.java:13)\r
\ncom.microsoft.datatransfer.bridge.parquet.ParquetFileBridge.createWriter(ParquetFileBridge.java:27)\r
\n,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "Copy Table Data to Sink",
"details": []
}
If I change the query to SELECT [First Name] as FirstName from Employee, it works fine.
Because I am generating the query dynamically, I was using '*'.
Is there a setting on the sink (.parquet) to ignore spaces in column names?
EDIT: There is some information at https://issues.apache.org/jira/browse/SPARK-4521, but I am not sure how to handle this in ADF.
See also: https://github.com/MicrosoftDocs/azure-docs/issues/28320
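
One way to keep the query dynamic while still using the aliasing workaround described above is to build the SELECT list from the column names. The following Python sketch only shows the string construction. It assumes the column names are already available from some metadata lookup that is not shown, and aliased_select is a hypothetical helper, not an ADF feature.

```python
import re

def aliased_select(table, columns):
    """Build a SELECT that aliases every column to a name without spaces.

    Assumption: stripping non-word characters yields unique, non-empty
    aliases. Collisions and empty aliases are not handled in this sketch.
    """
    parts = []
    for name in columns:
        alias = re.sub(r"\W+", "", name)  # "First Name" -> "FirstName"
        parts.append(f"[{name}] AS {alias}")
    return f"SELECT {', '.join(parts)} FROM [{table}]"

query = aliased_select("Employee", ["First Name", "RAM Area", "ID"])
print(query)
```

The bracket quoting follows the Access-style syntax used in the question.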

Loading CSV Neo4j "Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1'"

I am using the grades.csv data from the link below:
https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html
I noticed that all the strings in the CSV file were wrapped in "" and that this caused the error message:
Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1
So I removed the "" from the headers.
The code I was trying to run:
LOAD CSV WITH HEADERS FROM 'file:///grades.csv' AS row
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
error message:
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected Any, Map, Node, Relationship, Point, Duration, Date, Time, LocalTime, LocalDateTime or DateTime but was List<String> (line 2, column 24 (offset: 65))
"MERGE (t:Test1 {Test1: row.Test1})
Basically, you cannot merge a node using a null property value. In your case, Test1 must be null for one or more lines in your file. If you don't see blank values for Test1, check whether there is a blank line at the end of the file.
You can also add a null check before the MERGE using WHERE, like this:
LOAD CSV ...
WITH row
WHERE row.Test1 IS NOT NULL
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
The issues are:
The file is missing a comma after the Test1 value in the row for "Airpump".
The file has white spaces between the values in each row. (Search for the regexp ", +" and replace with ",".)
Your query should work after fixing the above issues.
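
The whitespace fix above can also be scripted before the file is loaded. This is a minimal Python sketch, assuming no field contains a comma inside quotes (the simple regexp would alter such fields). It does not repair the missing comma, which still has to be fixed by hand. clean_csv_text and the sample text are hypothetical.

```python
import re

def clean_csv_text(raw):
    """Drop blank lines and remove the spaces that follow each comma."""
    cleaned = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # a blank line would otherwise produce a null Test1
        cleaned.append(re.sub(r", +", ",", line.rstrip()))
    return "\n".join(cleaned) + "\n"

# Hypothetical sample shaped like the file described above.
sample = "Last name, First name, Test1\nAlfalfa,   Aloysius,   40.0\n\n"
result = clean_csv_text(sample)
print(result)
```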

Active Record: JSON Query

Inside my database model, I've got a json field which has the following structure:
json_field: {"data"=>{"key_1"=>"value1", "key_2"=>"value"} }
Trying to query this using select:
Model.select(:id, "json_field -> 'data'")
Model.select(:id, "json_field -> 'data' as data")
yields the array of objects, but without the json field selected.
#<ActiveRecord::Relation [#<Model id: 1, Model id: 2 ...>]
Thanks for any help.
This:
#<ActiveRecord::Relation [#<Model id: 1, Model id: 2 ...>]
is the result of calling inspect on the query, and inspect only displays columns that the model knows about. The model queries the table for its columns during startup, so it only knows about columns that are actually in the table.
ActiveRecord creates column accessor methods on the fly using method_missing, so it can create methods for things in a query that aren't columns in the actual table.
So your data is there; you just have to ask for it by name, for example:
Model.select(:id, "json_field -> 'data' as data").map(&:data)
will give you the data values.