I am trying to follow a tutorial in SparkR. I followed the setup as required, but as soon as I try the function "read.json(path)" I get the following error:
"Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)..."
I am running R 3.3.2 and Java JDK 1.8 as requested in the tutorial.
I have attached an image of the code and the results (RStudio, with the code on the left and the console output on the right).
Is my Java installation being found, and is it the right version?
Solution:
Make sure the spark-submit or sparkR instance is running.
Put the JSON file on Hadoop HDFS using an hdfs://... path:
hadoop-2.0.2\bin> hadoop fs -put "/example/../people.json" "/user/../people.json"
Then use
people <- read.df(sqlContext, "/user/../people.json", "json")
to read the JSON and create the DataFrame 'people'.
The above steps worked for me after I made the necessary changes in the example dataframe.R.
Following on from this question, I'd like to open a related one to verify an answer.
Originally my code looked like
string path2key = "C:/Users/me/Downloads/project-id-1-letters.json";
string jsonString = File.ReadAllText(path2key);
Console.WriteLine(jsonString); // this works and prints correctly
// I presume I'm passing json as a string???
var credential = GoogleCredential.FromJson(jsonString);
but I was getting the following error
System.InvalidOperationException: 'Error deserializing JSON credential data.'
FileNotFoundException: Could not load file or assembly 'System.Security.Permissions, Version=0.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'. The system cannot find the file specified.
I'm using .NET Framework 4.5, meaning I'm somewhat limited in updating packages to the latest and greatest.
The other answer is correct. I was about to post the same answer, but @serge beat me to it.
The class GoogleCredential has a FromJson member that takes a string argument. In your original code you were trying to pass an object.
For the error in your comment to the answer:
FileNotFoundException: Could not load file or assembly 'System.Security.Permissions
That error can be corrected by adding the System.Security.Permissions package via the NuGet package manager, or via the CLI:
dotnet add package System.Security.Permissions
I have written a small custom producer for Kafka in Scala, and it is giving the error below. I have attached the code for reference.
Name: Compile Error
Message: <console>:61: error: not found: type KafkaProducer
val producer = new KafkaProducer[String, String](props)
^
I think I need to import the relevant package. I tried several imports but could not find the correct one.
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val producer = new KafkaProducer[String, String](props)
for (i <- 1 to 10) {
  // producer.send(new ProducerRecord[String, String]("jin", "test", "test"));
  val record = new ProducerRecord[String, String]("jin", "key", "the end ")
  producer.send(record)
}
producer.close()
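The snippet references a props object that is not shown. A minimal sketch of how it might be configured — the broker address is an assumption, and the serializers are the standard string serializers from kafka-clients:

import java.util.Properties

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // assumed broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")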
I can't install a Scala kernel for Jupyter right now, but based on this GitHub example you should add Kafka as a dependency; then the library might be recognized:
%%configure -f
{
"conf": {
"spark.jars.packages": "org.apache.spark:spark-streaming_2.11:2.1.0,org.apache.bahir:spark-streaming-twitter_2.11:2.1.0,org.apache.spark:spark-streaming-kafka-0-8_2.10:2.1.0,com.google.code.gson:gson:2.4",
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.11"
}
}
If this doesn't work, try downloading the whole notebook from the repo and running it yourself to see if something else is needed.
@Arthur, the magic command %%configure -f did not work in the Jupyter notebook. I tried downloading the whole notebook from the repo, but that also did not work. Luckily, I was reading the Apache Toree documentation on adding dependencies and found the %AddDeps command. After putting the dependencies into the Jupyter notebook in the format below, I managed to run the code.
%AddDeps org.apache.kafka kafka-clients 1.0.0
%AddDeps org.apache.spark spark-core_2.11 2.3.0
Just for the information of others: when we compile the code with SBT, we need to comment out these lines in the Jupyter notebook, since we add the dependencies in the build.sbt file instead.
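A minimal build.sbt sketch of those same dependencies (coordinates and versions taken from the %AddDeps lines above; assumes scalaVersion 2.11 to match the _2.11 artifact):

libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka-clients" % "1.0.0",
  "org.apache.spark" % "spark-core_2.11" % "2.3.0"
)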
Thanks, Arthur, for pointing me in the right direction!
I am upgrading from Spark 1.6 to Spark 2 and am having an issue reading CSV files. In Spark 1.6 I would use something like this to read a CSV file:
val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header", "true")
.load(fileName)
Now I use the following code as given in the documentation:
val df = spark.read
.option("header", "true")
.csv(fileName)
This results in the following error when running:
"Exception in thread "main" java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name."
I assume this is because I still had the spark-csv dependency; however, I removed that dependency and rebuilt the application, and I still get the same error. How is the Databricks dependency still being found once I have removed it?
The error message means you are passing the --packages com.databricks:spark-csv_2.11:1.5.0 option when you run spark-shell, or you have those jars on your classpath. Please check your classpath and remove them.
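If the application is built with sbt (an assumption), that typically means dropping the spark-csv artifact and relying on Spark 2's built-in CSV reader — a sketch, with the Spark version as a placeholder:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.1.0" // assumed Spark 2.x version; ships the built-in CSV source
  // "com.databricks" %% "spark-csv" % "1.5.0" // remove this line
)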
I didn't add any jars to my class path.
I use this to load a CSV file in the Spark shell (2.3.1):
val df = spark.sqlContext.read.csv("path")
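Outside the shell, a minimal sketch of the same read in a standalone Spark 2.x application — the app name is a placeholder and fileName is assumed to hold the CSV path:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CsvReadExample") // placeholder app name
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .csv(fileName) // fileName holds the CSV path, as in the question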
I am building a simple Weka application using the Weka 3.7.13 dev version with Maven. I have set up the pom.xml to pull the Weka library from the repository.
I want to run a simple clustering algorithm using Weka and connect it to a MySQL database. As per the documentation, Weka needs a DatabaseUtils.props file which contains its JDBC driver. I have set that up.
(A screenshot of my project structure was attached to the original post.)
I can retrieve the DatabaseUtils.props file using my code below, but somehow Weka itself cannot recognise the JDBC URL and driver name which I have already configured inside it.
DatabaseUtils.props file:
# JDBC driver (comma-separated list)
jdbcDriver=com.mysql.jdbc.Driver
# database URL
jdbcURL=jdbc:mysql://127.0.0.1:3306/datamining
Java code with which I trigger the Weka module:
public void dm() {
    int cluster = 4;
    InstanceQuery clusterQuery = null;
    try {
        // load the DatabaseUtils.props file from the classpath
        ClassLoader classLoader = getClass().getClassLoader();
        File fileProps = new File(classLoader.getResource("DatabaseUtils.props").getFile());

        // set up the data preparation functionality, then connect to the database
        clusterQuery = new InstanceQuery();
        clusterQuery.setUsername("test");
        clusterQuery.setPassword("test");
        clusterQuery.setQuery("SELECT * FROM TEMP_KMEANS");
        clusterQuery.setCustomPropsFile(fileProps);
        Instances data = clusterQuery.retrieveInstances(); // opens the DB connection; the SQLException surfaces here
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But when executed, the function returns the error: java.sql.SQLException: No suitable driver found for jdbc:idb=experiments.prp
It seems that my DatabaseUtils.props file cannot override the default value jdbc:idb=experiments.prp.
Any ideas why this error is thrown?
Any feedback or answers would be much appreciated.
I created a DatabaseUtils.props file in the src/main/resources folder and wrote the following code. It worked for me; I'm using Weka 3.8.0.
File file = new File(getClass().getClassLoader().getResource("DatabaseUtils.props").toURI());
InstanceQuery query = new InstanceQuery();
query.setCustomPropsFile(file);
query.setUsername(MYSQL_USER);
query.setPassword(MYSQL_PASSWORD);
query.setQuery("select * from Action");
instances = query.retrieveInstances();
System.out.println(instances.toString());
The documentation says "These may be changed by creating a java properties file called DatabaseUtils.props in user.home or the current directory". Your properties file does not match those requirements (a properties file in a jar is not on the filesystem). See also Chapter 14 of the Weka manual.
An alternative solution is to set the URL in code using setDatabaseURL. As long as the JDBC driver is on the classpath this should work.
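For illustration, a sketch of that alternative in Scala (the same calls work identically from Java; the URL, credentials, and query below are taken from the question):

import weka.experiment.InstanceQuery

val query = new InstanceQuery()
query.setDatabaseURL("jdbc:mysql://127.0.0.1:3306/datamining") // URL from the question's props file
query.setUsername("test")
query.setPassword("test")
query.setQuery("SELECT * FROM TEMP_KMEANS")
val data = query.retrieveInstances() // works as long as the MySQL JDBC driver is on the classpath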
I am trying to work with Jerkson in Play with Scala 2.10.
I want to load data fixtures based on JSON files. For this procedure I'm trying to load the JSON with the "parse" command from Jerkson.
That ultimately fails.
I'm doing this in the "override def onStart(app: Application)" function. The error:
NoClassDefFoundError: Could not initialize class com.codahale.jerkson.Json$
Any guesses why this is happening? I have the following libs in my dependencies:
"com.codahale" % "jerkson_2.9.1" % "0.5.0",
"com.cloudphysics" % "jerkson_2.10" % "0.6.3"
My parsing command is:
com.codahale.jerkson.Json.parse[Map[String,Any]](json)
Thanks in advance
A NoClassDefFoundError generally means there is some sort of issue with the classpath. For starters, if you are running on Scala 2.10, I would remove the following line from your sbt file:
"com.codahale" % "jerkson_2.9.1" % "0.5.0"
Then make sure the com.cloudphysics jerkson jar file is available on your app's classpath and try your test again.
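In sbt terms, the remaining dependency might look like this (coordinates and version taken from the question):

libraryDependencies ++= Seq(
  "com.cloudphysics" % "jerkson_2.10" % "0.6.3"
)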