How to add more storage plugins programmatically in Apache Drill? - apache-drill

I tried the Drill JDBC driver to query programmatically.
The relevant portion of the code:
Connection conn = new Driver().connect("jdbc:drill:zk=local", getDefaultProperties());
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("show databases");
while (rs.next())
{
    String SCHEMA_NAME = rs.getString("SCHEMA_NAME");
    System.out.println(SCHEMA_NAME);
}

public static Properties getDefaultProperties()
{
    final Properties properties = new Properties();
    properties.setProperty(ExecConstants.HTTP_ENABLE, "false");
    return properties;
}
Everything worked fine while I had only the cp and dfs storage plugins. Output of the above query:
INFORMATION_SCHEMA
cp.default
dfs.default
dfs.root
dfs.tmp
sys
But when I added mongo as a storage plugin with this configuration:
{
  "type": "mongo",
  "connection": "mongodb://localhost:27017/",
  "enabled": false
}
I got the following exception:
java.sql.SQLException: Failure in starting embedded Drillbit: java.lang.RuntimeException: Unable to deserialize "/tmp/drill/sys.storage_plugins/mongo.sys.drill"
at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:109)
at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:66)
at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69)
at net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126)
at org.apache.drill.jdbc.Driver.connect(Driver.java:78)
at com.mkyong.App.main(App.java:37)
Caused by: java.lang.RuntimeException: Unable to deserialize "/tmp/drill/sys.storage_plugins/mongo.sys.drill"
at org.apache.drill.exec.store.sys.local.FilePStore.get(FilePStore.java:140)
at org.apache.drill.exec.store.sys.local.FilePStore$Iter$DeferredEntry.getValue(FilePStore.java:219)
at org.apache.drill.exec.store.StoragePluginRegistry.createPlugins(StoragePluginRegistry.java:168)
at org.apache.drill.exec.store.StoragePluginRegistry.init(StoragePluginRegistry.java:132)
at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:244)
at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:100)
... 5 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type id 'mongo' into a subtype of [simple type, class org.apache.drill.common.logical.StoragePluginConfig]
at [Source: [B#21318883; line: 2, column: 3]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:849)
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:167)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:99)
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:84)
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:132)
at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:41)
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1269)
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:912)
at org.apache.drill.exec.store.sys.serialize.JacksonSerializer.deserialize(JacksonSerializer.java:44)
at org.apache.drill.exec.store.sys.local.FilePStore.get(FilePStore.java:138)
... 10 more
Also, how can I add plugin configuration programmatically?
Edit: Similar behaviour for hive.

Drill provides a REST API. I've used the curl command shown in the docs:
curl -X POST -/json" -d '{"name":"myplugin", "config": {"type": "file", "enabled": false, "connection": "file:///", "workspaces": { "root": { "location": "/", "writable": false, "defaultInputFormat": null}}, "formats": null}}' http://localhost:8047/storage/myplugin.json
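The same endpoint can also be called from Java if you want to add the plugin configuration programmatically. Below is a minimal sketch using plain HttpURLConnection, assuming a Drillbit is running with its web UI/REST API enabled on localhost:8047 (note that the embedded Drillbit in the question sets ExecConstants.HTTP_ENABLE to "false", so this targets a regular Drillbit rather than that embedded one):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateDrillPlugin {
    public static void main(String[] args) throws Exception {
        // Same JSON body as the curl example above
        String body = "{\"name\":\"myplugin\", \"config\": {\"type\": \"file\", \"enabled\": false, "
                + "\"connection\": \"file:///\", \"workspaces\": { \"root\": { \"location\": \"/\", "
                + "\"writable\": false, \"defaultInputFormat\": null}}, \"formats\": null}}";

        URL url = new URL("http://localhost:8047/storage/myplugin.json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}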

Related

AWS GlueStudio to Snowflake JDBC: An error occurred while calling pyWriteDynamicFrame. No suitable driver

I'm trying to move data from a Data Catalog table (MySQL) through AWS Glue (in visual mode of GlueStudio) into a Snowflake table.
For this, I'm following this guide: Performing data transformations using Snowflake and AWS Glue [1]
I'm following every part of it, but when executing my job, I get the error
An error occurred while calling xxx.pyWriteDynamicFrame. No suitable driver
(yes, other than the stack trace, there's no more info in the error message)
I've tested everything I could think of, like:
Having access to the S3 bucket of the driver
Having network access
I've tried an incomplete JDBC URL (no password) and the error says so
I've tried giving the wrong password, and I get the appropriate error
One thing I've found is that a lot of the issues I've come across are reported for AWS Glue as a script (not for the visual editor), and in many of them two jars are used: the Snowflake JDBC driver and the Snowflake Spark Connector.
Even though the tutorial I followed [1] isn't that clear, I tried having both files in my 'drivers' bucket, but I still get the same error.
I've tried many versions of both files to no avail.
So I have no idea what the problem might be (I even tried on a different AWS account and a different Snowflake account so I could have full access to resources).
Has anyone tried this setup?
I'm using:
AWS GlueStudio (June 2021)
Snowflake cloud version 5.22.1 (got that with select CURRENT_VERSION(); in Snowflake)
Snowflake JDBC driver v3.13.4
Snowflake spark connector v2.9.0-spark_2.4 for scala v2.12
My connection string: jdbc:snowflake://xxx00000.snowflakecomputing.com/?user=${Username}&password=${Password}&warehouse=${wh}&db=${db}&schema=${schema}
[Edit Jun 23] As for @jonlegend's comment:
I'm using the visual editor for this job, so I'm not in control of the code implementation. Nevertheless, I'll post the generated code:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## #params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## #type: DataSource
## #args: [database = "mydb_name", table_name = "mytable_name", transformation_ctx = "DataSource0"]
## #return: DataSource0
## #inputs: []
DataSource0 = glueContext.create_dynamic_frame.from_catalog(database = "mydb_name", table_name = "mytable_name", transformation_ctx = "DataSource0")
## #type: ApplyMapping
## #args: [mappings = [("account_number", "string", "account_number", "string"), ("user_id", "int", "user_id", "int"), ("description", "string", "description", "string"), ("id", "int", "id", "int"), ("group_account_id", "int", "group_account_id", "int"), ("updated", "timestamp", "updated", "timestamp")], transformation_ctx = "Transform0"]
## #return: Transform0
## #inputs: [frame = DataSource0]
Transform0 = ApplyMapping.apply(frame = DataSource0, mappings = [("account_number", "string", "account_number", "string"), ("user_id", "int", "user_id", "int"), ("description", "string", "description", "string"), ("id", "int", "id", "int"), ("group_account_id", "int", "group_account_id", "int"), ("updated", "timestamp", "updated", "timestamp")], transformation_ctx = "Transform0")
## #type: DataSink
## #args: [connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0"]
## #return: DataSink0
## #inputs: [frame = Transform0]
DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0")
job.commit()
Also, I can share the stack trace:
ERROR [main] glue.ProcessLauncher (Logging.scala:logError(70)): Error from Python:Traceback (most recent call last):
File "/tmp/test_job_v7.py", line 30, in <module>
DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options =
{
"dbTable": "myschema.myTargetTable",
"connectionName": "snowflake-connection-v7"
}
, transformation_ctx = "DataSink0")
File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 653, in from_options
format_options, transformation_ctx)
File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 281, in write_dynamic_frame_from_options
format, format_options, transformation_ctx)
File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 304, in write_from_options
return sink.write(frame_or_dfc)
File "/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py", line 35, in write
return self.writeFrame(dynamic_frame_or_dfc, info)
File "/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py", line 31, in writeFrame
return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + "_errors")
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o104.pyWriteDynamicFrame.
: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:104)
at org.apache.spark.sql.jdbc.glue.GlueJDBCOptions.<init>(GlueJDBCOptions.scala:14)
at org.apache.spark.sql.jdbc.glue.GlueJDBCOptions.<init>(GlueJDBCOptions.scala:17)
at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCRecordWriterFactory.<init>(PartnerJDBCDataSink.scala:78)
at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCDataSink.createWriterFactory(PartnerJDBCDataSink.scala:32)
at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCDataSink.createWriterFactory(PartnerJDBCDataSink.scala:23)
at com.amazonaws.services.glue.marketplace.connector.GlueCustomDataSink.defaultWriteDynamicFrame(CustomDataSink.scala:68)
at com.amazonaws.services.glue.marketplace.connector.GlueCustomDataSink.writeDynamicFrame(CustomDataSink.scala:61)
at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Finally, I'm looking at the error logs AND the Job logs, and the error is the same. Previous messages in both logs don't help either.
A few suggestions:
Ensure your JDBC driver is referenced in your code. I am not sure how to do this in the visual editor, but in the code, change the following line:
DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0")
to:
DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = {
"dbTable":"myschema.myTargetTable",
"connectionName":"snowflake-connection-v7",
"customJdbcDriverS3Path": "Amazon S3 path of the custom JDBC driver",
"customJdbcDriverClassName":"class name of the driver"
}, transformation_ctx = "DataSink0")
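For reference, the Snowflake JDBC driver class name is net.snowflake.client.jdbc.SnowflakeDriver. A hypothetical filled-in version might look like the following (the S3 path is only an example; point it at wherever your snowflake-jdbc jar actually lives):

DataSink0 = glueContext.write_dynamic_frame.from_options(
    frame = Transform0,
    connection_type = "custom.jdbc",
    connection_options = {
        "dbTable": "myschema.myTargetTable",
        "connectionName": "snowflake-connection-v7",
        # Example values -- adjust the bucket/key to your own driver jar
        "customJdbcDriverS3Path": "s3://somebucket/aws-glue-drivers/snowflake-jdbc-3.13.4.jar",
        "customJdbcDriverClassName": "net.snowflake.client.jdbc.SnowflakeDriver"
    },
    transformation_ctx = "DataSink0")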
Also ensure that your Glue job has IAM permissions to the S3 folder where the driver is located. If you are using the default service-role/AWSGlueServiceRole, just ensure that the string "aws-glue-" appears somewhere in the S3 path, e.g. "s3://somebucket/aws-glue-drivers/mydriver.jar".

How to connect to a MySQL database with Entity Framework Core?

With the goal of using ASP.NET Core MVC with a MySQL database, I downloaded the specific EF Core provider for MySQL.
Then I registered the DbContext service in the Startup file:
services.AddDbContext<NawrasContext>(options =>
    options.UseMySql(Configuration.GetConnectionString("DefaultConnection")));
and this is my appsettings.json:
{
  "Logging": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "AllowedHosts": "*",
  "ConnectionStrings": {
    "DefaultConnection": "Server=s.mysql.db;Database=s2019;Uid=s2019;Pwd=pass;"
  }
}
I have successfully added my first migration, but when I try to update the database, I get this error:
fail: Microsoft.EntityFrameworkCore.Database.Connection[20004]
An error occurred using the connection to database '' on server 's.mysql.db'.
MySql.Data.MySqlClient.MySqlException (0x80004005): Unable to connect to any of the specified MySQL hosts.
at MySqlConnector.Core.ServerSession.ConnectAsync(ConnectionSettings cs, ILoadBalancer loadBalancer, IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\Core\ServerSession.cs:line 440
What am I doing wrong? Why is the error telling me:
An error occurred using the connection to database '' on server 's.mysql.db'.
when the database name is specified in the connection string?
You can solve the issue using these steps:
Use the NuGet package named "Pomelo.EntityFrameworkCore.MySql"
Register the service in Startup.cs:
services.AddCors();
services.AddDbContext<NawrasContext>(options => options.UseMySql(Configuration.GetConnectionString("DefaultConnection")));
Use this connection string format in appsettings.json:
{
  "Logging": {
    "IncludeScopes": false,
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "ConnectionStrings": {
    "DefaultConnection": "Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword; Encrypt=true;"
  }
}
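One more note, hedged because it depends on the package version you installed: with Pomelo.EntityFrameworkCore.MySql 5.0 and later, UseMySql also requires a ServerVersion argument, so the registration would look roughly like this:

// Pomelo 5.0+ signature: pass the server version explicitly (AutoDetect connects once to find it)
var connectionString = Configuration.GetConnectionString("DefaultConnection");
services.AddDbContext<NawrasContext>(options =>
    options.UseMySql(connectionString, ServerVersion.AutoDetect(connectionString)));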
I had the same issue using a Mac with MAMP as the MySQL server. The fix for me was to enable "Allow network access to MySQL => Only from this Mac", and then the server variable in the connection string looked like this:
"server=/Applications/MAMP/tmp/mysql/mysql.sock;port=8889;user=root;password=MyAwesomePassword;database=MyAwesomeDb;"
So basically, try putting the mysql.sock path instead of localhost.
Hope this helps someone, thanks.

Spark scala dataframe read and show multiline json file

I am trying to read and show JSON file data in Spark using Scala. I can read the file successfully, but when I call dataframe.show() it throws an error. Code is below.
I see that reading a multiline JSON file got easier from Spark version 2.2, hence this approach.
import java.sql.{Date, Timestamp}
import java.text.SimpleDateFormat
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql._

object MostTrendingVideoOnADay {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)

    val spark = SparkSession
      .builder()
      .appName("youtube")
      .master("local[*]")
      .getOrCreate()

    val usCategory = spark.read
      .option("multiline", true)
      .option("mode", "PERMISSIVE")
      .json("G:/Apache Spark/DataSets/youtube/US_category_id.json")

    usCategory.printSchema()
    usCategory.show()

    spark.stop()
  }
}
JSON File:
{
  "kind": "youtube#videoCategoryListResponse",
  "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/S730Ilt-Fi-emsQJvJAAShlR6hM\"",
  "items": [
    {
      "kind": "youtube#videoCategory",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/Xy1mB4_yLrHy_BmKmPBggty2mZQ\"",
      "id": "1",
      "snippet": {
        "channelId": "UCBR8-60-B28hp2BmDPdntcQ",
        "title": "Film & Animation",
        "assignable": true
      }
    },
    {
      "kind": "youtube#videoCategory",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/UZ1oLIIz2dxIhO45ZTFR3a3NyTA\"",
      "id": "2",
      "snippet": {
        "channelId": "UCBR8-60-B28hp2BmDPdntcQ",
        "title": "Autos & Vehicles",
        "assignable": true
      }
    }
  ]
}
Error:
Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor
driver): java.io.FileNotFoundException: File
file:/G:/Apache%20Spark/DataSets/youtube/US_category_id.json does not
exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE
tableName' command in SQL or by recreating the Dataset/DataFrame
involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:245)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2861)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2150)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2150)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2842)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2841)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2150)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2363)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:241)
at org.apache.spark.sql.Dataset.show(Dataset.scala:637)
at org.apache.spark.sql.Dataset.show(Dataset.scala:596)
at org.apache.spark.sql.Dataset.show(Dataset.scala:605)
at MostTrendingVideoOnADay$.main(MostTrendingVideoOnADay.scala:21)
at MostTrendingVideoOnADay.main(MostTrendingVideoOnADay.scala)
Caused by: java.io.FileNotFoundException: File file:/G:/Apache%20Spark/DataSets/youtube/US_category_id.json does not
exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE
tableName' command in SQL or by recreating the Dataset/DataFrame
involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
As seen in your log, the failure is java.io.FileNotFoundException: File file:/G:/Apache%20Spark/DataSets/youtube/US_category_id.json does not exist.
Note the space in the path (encoded as Apache%20Spark), which is what causes the issue. Can you remove the space from the path?
Make it something like ApacheSpark or Apache_Spark; this should solve the issue, as in the sketch below.
Hope this helps!
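A minimal example of the corrected read, assuming the folder has been renamed to ApacheSpark:

// Same options as before, only the space has been removed from the directory name
val usCategory = spark.read
  .option("multiline", true)
  .option("mode", "PERMISSIVE")
  .json("G:/ApacheSpark/DataSets/youtube/US_category_id.json")

usCategory.show()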

log4js configure function error [NodeJs]

I am getting the following error in the console window when running my Node.js application. I am using log4js for logging. The runtime platform is Express.
Error:
undefined:1
?{
^
SyntaxError: Unexpected token ? in JSON at position 0
Code:
var log4js = require('log4js');
var mylog = log4js.configure('log4jsConfig.json');
logger = log4js.getLogger("absolute-logger");
Config json:
{
  "appenders": [
    {
      "type": "file",
      "absolute": true,
      "filename": "c:/temp/log_file.log",
      "maxLogSize": 20480,
      "backups": 10,
      "category": "absolute-logger"
    }
  ]
}
Question:
Any thoughts? I am thinking this is a parsing error, but I am not sure how to resolve it.
Thanks,
Are you on a Windows system? Sometimes parsers get upset with \r\n (carriage returns). Try converting the file to UNIX-style (newline-only) line endings.
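If converting the line endings alone doesn't fix it, one workaround is to read and normalize the file yourself and hand the parsed object to log4js.configure, which also accepts a configuration object instead of a filename. A minimal sketch, assuming the JSON itself is valid and only stray Windows artifacts are in the way (the '?' at position 0 in the error looks like a byte-order mark):

var fs = require('fs');
var log4js = require('log4js');

// Read the config, strip a UTF-8 BOM if present, and normalize CRLF line endings
var raw = fs.readFileSync('log4jsConfig.json', 'utf8')
    .replace(/^\uFEFF/, '')
    .replace(/\r\n/g, '\n');

log4js.configure(JSON.parse(raw));
var logger = log4js.getLogger('absolute-logger');
logger.info('logging configured');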

Getting an error in all figway python scripts

I cloned the GitHub project of FIGWAY in order to query the attributes of the entities from Orion, but I'm getting an error in all the Python scripts:
File "GetEntity.py", line 37, in <module>
config = ConfigParser.RawConfigParser(allow_no_value=True)
TypeError: __init__() got an unexpected keyword argument 'allow_no_value'
I called it like this: python GetEntity.py Room
Some tips to investigate what is going on:
You should be using Python 2.7 to run these scripts. Can you please let me know which version and OS you are using? (See the quick version check after the sample output below.)
We updated FIGWAY last week. Can you please clone it again if you did it before?
You should be using the new scripts in the folder: /python-IDAS4/ContextBroker
With the previous assumptions you should get something like this (as long as that entity does not exist on that ContextBroker at that time):
i6#raspberrypi ~/github/fiware-figway/python-IDAS4/ContextBroker $ python GetEntity.py Room
* Asking to http://130.206.80.40:1026/ngsi10/queryContext
* Headers: {'Fiware-Service': 'OpenIoT', 'content-type': 'application/json', 'accept': 'application/json', 'X-Auth-Token': 'NULL'}
* Sending PAYLOAD:
{
"entities": [
{
"type": "",
"id": "Room",
"isPattern": "false"
}
],
"attributes": []
}
...
* Status Code: 200
* Response:
{
"errorCode" : {
"code" : "404",
"reasonPhrase" : "No context element found"
}
}
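Regarding the first tip above: the allow_no_value keyword was only added to ConfigParser in Python 2.7, so an older interpreter (e.g. 2.6) produces exactly that TypeError. A quick check you can run, sketched here:

import sys
import ConfigParser  # Python 2 module name, as used by the FIGWAY scripts

print(sys.version)  # should report 2.7.x

# allow_no_value was added in Python 2.7; on 2.6 this line raises the TypeError above
config = ConfigParser.RawConfigParser(allow_no_value=True)
print("ConfigParser accepted allow_no_value")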