I(scala beginner) was searching, but i couldn't find a fitting way to solve this following problem.
The Enumeration object(never changes):
object EyeColorEnum extends Enumeration{
val Blue = Value("blue")
val Brown = Value("brown")
val Gray = Value("gray")
val Green = Value("green")
}
The Json Array(case1):
"eyeColor": ["blue", "gray", "green"]
The Json Array(case2):
"eyeColor": []
The Json Array(case3):
"eyeColor": ["orange", "pink", "green"]
This solution should be a json validation for the field "eyeColor".
Case1 and case 2 are valid.
Case 3 is invalid.
for (i <- 1 to(jsonArray.value.length - 1)) {
for (j <- 1 to(jsonArray.value.length - 1)) {
if(jsonArray(i).as[String] == enumArray(j).toString) {
// Item from A exists in B
true
} else {
// Item from A does not exist in B
checker = checker + 1
}
}
}
These for are not working how i want them to work.
Is there probably a way more easy way to get this job done?
Thank u very much.
The reason why that is not working is that for comprehensions used in this way are doing .foreach calls against the Range instance you've created in the parens (which returns Unit rather than the value you're attempting to return):
(1 to (jsonArray.value.length - 1)).foreach{i => ...}
It sounds like you want a combination of the forall and exists methods that collections in Scala provide.
Is there somebody who could point out how to avoid the var?
This is my ugly solution:
def apply(enum: Enumeration, jsonArray: JsArray): Boolean = {
val enumArray = enum.values.toArray
var found = 0
jsonArray.value.length match {
case 0 => true
case _ =>
(0 to (jsonArray.value.length - 1)).foreach{i =>
(0 to (enumArray.length - 1)).foreach{j =>
if (jsonArray(i).as[String] == enumArray(j).toString) {
found +=1
}
}
}
found == jsonArray.value.length
}
}
Related
Thanks to the great help from Tenfour04, I've got wonderful code for handling CSV files.
However, I am in trouble like followings.
How to call these functions?
How to initialize 2-dimensional array variables?
Below is the code that finally worked.
MainActivity.kt
package com.surlofia.csv_tenfour04_1
import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import java.io.File
import java.io.IOException
import com.surlofia.csv_tenfour04_1.databinding.ActivityMainBinding
var chk_Q_Num: MutableList<Int> = mutableListOf (
0,
1, 2, 3, 4, 5,
6, 7, 8, 9, 10,
11, 12, 13, 14, 15,
16, 17, 18, 19, 20,
)
var chk_Q_State: MutableList<String> = mutableListOf (
"z",
"a", "b", "c", "d", "e",
"f", "g", "h", "i", "j"
)
class MainActivity : AppCompatActivity() {
private lateinit var binding: ActivityMainBinding
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
// setContentView(R.layout.activity_main)
binding = ActivityMainBinding.inflate(layoutInflater)
val view = binding.root
setContentView(view)
// Load saved data at game startup. It will be invalid if performed by other activities.
val filePath = filesDir.path + "/chk_Q.csv"
val file = File(filePath)
binding.fileExists.text = isFileExists(file).toString()
if (isFileExists(file)) {
val csvIN = file.readAsCSV()
for (i in 0 .. 10) {
chk_Q_Num[i] = csvIN[i][0].toInt()
chk_Q_State[i] = csvIN[i][1]
}
}
// Game Program Run
val csvOUT = mutableListOf(
mutableListOf("0","OK"),
mutableListOf("1","OK"),
mutableListOf("2","OK"),
mutableListOf("3","Not yet"),
mutableListOf("4","Not yet"),
mutableListOf("5","Not yet"),
mutableListOf("6","Not yet"),
mutableListOf("7","Not yet"),
mutableListOf("8","Not yet"),
mutableListOf("9","Not yet"),
mutableListOf("10","Not yet")
)
var tempString = ""
for (i in 0 .. 10) {
csvOUT[i][0] = chk_Q_Num[i].toString()
csvOUT[i][1] = "OK"
tempString = tempString + csvOUT[i][0] + "-->" + csvOUT[i][1] + "\n"
}
binding.readFile.text = tempString
// and save Data
file.writeAsCSV(csvOUT)
}
// https://www.techiedelight.com/ja/check-if-a-file-exists-in-kotlin/
private fun isFileExists(file: File): Boolean {
return file.exists() && !file.isDirectory
}
#Throws(IOException::class)
fun File.readAsCSV(): List<List<String>> {
val splitLines = mutableListOf<List<String>>()
forEachLine {
splitLines += it.split(", ")
}
return splitLines
}
#Throws(IOException::class)
fun File.writeAsCSV(values: List<List<String>>) {
val csv = values.joinToString("\n") { line -> line.joinToString(", ") }
writeText(csv)
}
}
chk_Q.csv
0,0
1,OK
2,OK
3,Not yet
4,Not yet
5,Not yet
6,Not yet
7,Not yet
8,Not yet
9,Not yet
10,Not yet
1. How to call these functions?
The code below seems work well.
Did I call these funtions in right way?
Or are there better ways to achieve this?
read
if (isFileExists(file)) {
val csvIN = file.readAsCSV()
for (i in 0 .. 10) {
chk_Q_Num[i] = csvIN[i][0].toInt()
chk_Q_State[i] = csvIN[i][1]
}
}
write
file.writeAsCSV(csvOUT)
2. How to initialize 2-dimensional array variables?
val csvOUT = mutableListOf(
mutableListOf("0","OK"),
mutableListOf("1","OK"),
mutableListOf("2","OK"),
mutableListOf("3","Not yet"),
mutableListOf("4","Not yet"),
mutableListOf("5","Not yet"),
mutableListOf("6","Not yet"),
mutableListOf("7","Not yet"),
mutableListOf("8","Not yet"),
mutableListOf("9","Not yet"),
mutableListOf("10","Not yet")
)
I would like to know the clever way to use a for loop instead of writing specific values one by one.
For example, something like bellow.
val csvOUT = mutableListOf(mutableListOf())
for (i in 0 .. 10) {
csvOUT[i][0] = i
csvOUT[i][1] = "OK"
}
But this gave me the following error message:
Not enough information to infer type variable T
It would be great if you could provide an example of how to execute this for beginners.
----- Added on June 15, 2022. -----
[Question 1]
Regarding initialization, I got an error "keep stopping" when I executed the following code.
The application is forced to terminate.
Why is this?
val csvOUT: MutableList<MutableList<String>> = mutableListOf(mutableListOf())
for (i in 0 .. 10) {
csvOUT[i][0] = "$i"
csvOUT[i][1] = "OK"
}
[Error Message]
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.surlofia.csv_endzeit_01/com.surlofia.csv_endzeit_01.MainActivity}: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
In my opinion there are basically two parts to your question. First you need an understanding of the Kotlin type system including generics. Secondly you want some knowledge about approaches to the problem at hand.
type-system and generics
The function mutableListOf you're using is generic and thus needs a single type parameter T, as can be seen by definition its taken from the documentation:
fun <T> mutableListOf(): MutableList<T>
Most of the time the Kotlin compiler is quite good at type-inference, that is guessing the type used based on the context. For example, I do not need to provide a type explicitly in the following example, because the Kotlin compiler can infer the type from the usage context.
val listWithInts = mutableListOf(3, 7)
The infered type is MutableList<Int>.
However, sometimes this might not be what one desires. For example, I might want to allow null values in my list above. To achieve this, I have to tell the compiler that it should not only allow Int values to the list but also null values, widening the type from Int to Int?. I can achieve this in at least two ways.
providing a generic type parameter
val listWithNullableInts = mutableListOf<Int?>(3, 7)
defining the expected return type explicitly
val listWithNullableInts: MutableList<Int?> = mutableListOf(3, 7)
In your case the compiler does NOT have enough information to infer the type from the usage context. Thus you either have to provide it that context, e.g. by passing values of a specific type to the function or using one of the two options named above.
initialization of multidimensional arrays
There are questions and answers on creating multi-dimensional arrays in Kotlin on StackOverflow already.
One solution to your problem at hand might be the following.
val csvOUT: MutableList<MutableList<String>> = mutableListOf(mutableListOf())
for (i in 0 .. 10) {
csvOUT[i][0] = "$i"
csvOUT[i][1] = "OK"
}
You help the Kotlin compiler by defining the expected return type explicitly and then add the values as Strings to your 2D list.
If the dimensions are fixed, you might want to use fixed-size Arrays instead.
val csvArray = Array(11) { index -> arrayOf("$index", "OK") }
In both solutions you convert the Int index to a String however.
If the only information you want to store for each level is a String, you might as well use a simple List<String and use the index of each entry as the level number, e.g.:
val csvOut = List(11) { "OK" }
val levelThree = csvOut[2] // first index of List is 0
This would also work with more complicated data structures instead of Strings. You simply would have to adjust your fun File.writeAsCSV(values: List<List<String>>) to accept a different type as the values parameter.
Assume a simple data class you might end up with something along the lines of:
data class LevelState(val state: String, val timeBeaten: Instant?)
val levelState = List(11) { LevelState("OK", Instant.now()) }
fun File.writeAsCSV(values: List<LevelState>) {
val csvString = values
.mapIndexed { index, levelState -> "$index, ${levelState.state}, ${levelState.timeBeaten}" }
.joinToString("\n")
writeText(csvString)
}
If you prefer a more "classical" imperative approach, you can populate your 2-dimensional Array / List using a loop like for in.
val list: MutableList<MutableList<String>> = mutableListOf() // list is now []
for (i in 0..10) {
val innerList: MutableList<String> = mutableListOf()
innerList.add("$i")
innerList.add("OK")
innerList.add("${Instant.now()}")
list.add(innerList)
// list is after first iteration [ ["0", "OK", "2022-06-15T07:03:14.315Z"] ]
}
The syntax listName[index] = value is just syntactic sugar for the operator overload of the set operator, see the documentation on MutableList for example.
You cannot access an index, that has not been populated before, e.g. during the List's initialization or by using add; or else you're greeted with a IndexOutOfBoundsException.
If you want to use the set operator, one option is to use a pre-populated Array as such:
val array: Array<Array<String>>> = Array(11) {
Array(3) { "default" }
} // array is [ ["default, "default", "default"], ...]
array[1][2] = "myValue"
However, I wouldn't recommend this approach, as it might lead to left over, potentially invalid initial data, in case one misses to replace a value.
The json I have looks like this:
{
"cards": [
{
"card_id":"1234567890",
"card_status":"active",
"card_expiration":{
"formatted":"01/20"
},
"debit":{
"masked_debit_card_number":"1111 **** **** 1111",
}
},
{
"card_id":"1234567891",
"card_status":"active",
"card_expiration":null,
"debit":{
"masked_debit_card_number":"2222 **** **** 2222",
}
}
]
}
I'm trying to retrieve all card_expiration fields values using this function:
def getExpirations(json: Json) =
root
.cards
.each
.filter(root.card_status.string.exist(_ == "active"))
.card_expiration
.selectDynamic("formatted")
.string
.getAll(json)
The thing is, the above expression only returns 1 result - for the first card, but I really need to get something like List(Some("01/20"), None)! What can I do in this situation?
The issue is that by the time you've done the formatted step you're no longer matching the null expiration. You could do something like this:
import io.circe.Json, io.circe.optics.JsonPath.root
def getExpirations(json: Json) =
root
.cards
.each
.filter(root.card_status.string.exist(_ == "active"))
.card_expiration
.as[Option[Map[String, String]]]
.getAll(json)
Or:
import io.circe.Json, io.circe.generic.auto._, io.circe.optics.JsonPath.root
case class Expiration(formatted: String)
def getExpirations(json: Json) =
root
.cards
.each
.filter(root.card_status.string.exist(_ == "active"))
.card_expiration
.as[Option[Expiration]]
.getAll(json)
And then:
scala> getExpirations(io.circe.jawn.parse(doc).right.get)
res0: List[Option[Expiration]] = List(Some(Expiration(01/20)), None)
Without more context, it's not clear in my view that this is a good use case for circe-optics. You're probably better off decoding into case classes, or maybe using cursors. If you can provide more information it'd be easier to tell.
My requirement is to convert two string and create a JSON file(using spray JSON) and save in a resource directory.
one input string contains the ID and other input strings contain the score and topic
id = "alpha1"
inputstring = "science 30 math 24"
Expected output JSON is
{“ContentID”: “alpha1”,
“Topics”: [
{"Score" : 30, "TopicID" : "Science" },
{ "Score" : 24, "TopicID" : "math”}
]
}
below is the approach I have taken and am stuck in the last place
Define the case class
case class Topic(Score: String, TopicID: String)
case class Model(contentID: String, topic: Array[Topic])
implicit val topicJsonFormat: RootJsonFormat[Topic] = jsonFormat2(Topic)
implicit val modelJsonFormat: RootJsonFormat[Model] = jsonFormat2(Model)
Parsing the input string
val a = input.split(" ").zipWithIndex.collect{case(v,i) if (i % 2 == 0) =>
(v,i)}.map(_._1)
val b = input.split(" ").zipWithIndex.collect{case(v,i) if (i % 2 != 0) =>
(v,i)}.map(_._1)
val result = a.zip(b)
And finally transversing through result
paired foreach {case (x,y) =>
val tClass = Topic(x, y)
val mClassJsonString = Topic(x, y).toJson.prettyPrint
out1.write(mClassJsonString.toString)
}
And the file is generated as
{"Score" : 30, "TopicID" : "Science" }
{ "Score" : 24, "TopicID" : "math”}
The problem is I am not able to add the contentID as needed above.
Adding ContentId inside foreach is making contentID added multiple time.
You're calling toJson inside foreach creating strings and then you're appending it to buffer.
What you probably wanted to do is to create a class (ADT) hierarchy first and then serialize it:
val topics = paired.map(Topic)
//toArray might be not necessary if topics variable is already an array
val model = Model("alpha1", topics.toArray)
val json = model.toJson.prettyPrint
out1.write(json.toString)
I have an object like this:
val aa = parse(""" { "vals" : [[1,2,3,4], [4,5,6,7], [8,9,6,3]] } """)
I want to access the value '1' in the first JArray.
println(aa.values ???)
How is this done?
Thanks
One way would be :
val n = (aa \ "vals")(0)(0).extract[Int]
println(n)
Another way is to parse the whole json using a case class :
implicit val formats = DefaultFormats
case class Numbers(vals: List[List[Int]])
val numbers = aa.extract[Numbers]
This way you can access the first value of the first list however you like :
for { list <- numbers.vals.headOption; hd <- list.headOption } println(hd)
// or
println(numbers.vals.head.head)
// or ...
I have a RDD[Map[String,Int]] where the keys of the maps are the column names. Each map is incomplete and to know the column names I would need to union all the keys. Is there a way to avoid this collect operation to know all the keys and use just once rdd.saveAsTextFile(..) to get the csv?
For example, say I have an RDD with two elements (scala notation):
Map("a"->1, "b"->2)
Map("b"->1, "c"->3)
I would like to end up with this csv:
a,b,c
1,2,0
0,1,3
Scala solutions are better but any other Spark-compatible language would do.
EDIT:
I can try to solve my problem from another direction also. Let's say I somehow know all the columns in the beginning, but I want to get rid of columns that have 0 value in all maps. So the problem becomes, I know that the keys are ("a", "b", "c") and from this:
Map("a"->1, "b"->2, "c"->0)
Map("a"->3, "b"->1, "c"->0)
I need to write the csv:
a,b
1,2
3,1
Would it be possible to do this with only one collect?
If you're statement is: "every new element in my RDD may add a new column name I have not seen so far", the answer is obviously can't avoid a full scan. But you don't need to collect all elements on the driver.
You could use aggregate to only collect column names. This method takes two functions, one is to insert a single element into the resulting collection, and another one to merge results from two different partitions.
rdd.aggregate(Set.empty[String])( {(s, m) => s union m.keySet }, { (s1, s2) => s1 union s2 })
You will get back a set of all column names in the RDD. In a second scan you can print the CSV file.
Scala and any other supported language
You can use spark-csv
First lets find all present columns:
val cols = sc.broadcast(rdd.flatMap(_.keys).distinct().collect())
Create RDD[Row]:
val rows = rdd.map {
row => { Row.fromSeq(cols.value.map { row.getOrElse(_, 0) })}
}
Prepare schema:
import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
val schema = StructType(
cols.value.map(field => StructField(field, IntegerType, true)))
Convert RDD[Row] to Data Frame:
val df = sqlContext.createDataFrame(rows, schema)
Write results:
// Spark 1.4+, for other versions see spark-csv docs
df.write.format("com.databricks.spark.csv").save("mycsv.csv")
You can do pretty much the same thing using other supported languages.
Python
If you use Python and final data fits in a driver memory you can use Pandas through toPandas() method:
rdd = sc.parallelize([{'a': 1, 'b': 2}, {'b': 1, 'c': 3}])
cols = sc.broadcast(rdd.flatMap(lambda row: row.keys()).distinct().collect())
df = sqlContext.createDataFrame(
rdd.map(lambda row: {k: row.get(k, 0) for k in cols.value}))
df.toPandas().save('mycsv.csv')
or directly:
import pandas as pd
pd.DataFrame(rdd.collect()).fillna(0).save('mycsv.csv')
Edit
One possible way to the second collect is to use accumulators to either build a set of all column names or to count these where you found zeros and use this information to map over rows and remove unnecessary columns or to add zeros.
It is possible but inefficient and feels like cheating. The only situation when it makes some sense is when number of zeros is very low, but I guess it is not the case here.
object ColsSetParam extends AccumulatorParam[Set[String]] {
def zero(initialValue: Set[String]): Set[String] = {
Set.empty[String]
}
def addInPlace(s1: Set[String], s2: Set[String]): Set[String] = {
s1 ++ s2
}
}
val colSetAccum = sc.accumulator(Set.empty[String])(ColsSetParam)
rdd.foreach { colSetAccum += _.keys.toSet }
or
// We assume you know this upfront
val allColnames = sc.broadcast(Set("a", "b", "c"))
object ZeroColsParam extends AccumulatorParam[Map[String, Int]] {
def zero(initialValue: Map[String, Int]): Map[String, Int] = {
Map.empty[String, Int]
}
def addInPlace(m1: Map[String, Int], m2: Map[String, Int]): Map[String, Int] = {
val keys = m1.keys ++ m2.keys
keys.map(
(k: String) => (k -> (m1.getOrElse(k, 0) + m2.getOrElse(k, 0)))).toMap
}
}
val accum = sc.accumulator(Map.empty[String, Int])(ZeroColsParam)
rdd.foreach { row =>
// If allColnames.value -- row.keys.toSet is empty we can avoid this part
accum += (allColnames.value -- row.keys.toSet).map(x => (x -> 1)).toMap
}