Groovy - Cleaning up functions - function

I have a gross looking function and am wanting to clean it up. All it contains is some maps, a for loop, and if statements.
Basically I have a map that I would like to grab only certain information from but one of the keys that I need from it changes by one number in each map.
I was thinking maybe a simple switch statement or something should fix it, but I often get confused with simplifying such things.
Here is what the function looks like:
public void separateBooksFromList() {
book1 = [:] //map for book1
book2 = [:] //map for book2
book3 = [:] //map for book3
book4 = [:] //map for book4
book5 = [:] //map for book5
book6 = [:] //map for book6
book7 = [:] //map for book7
book8 = [:] //map for book8
book9 = [:] //map for book9
book10 = [:] //map for book10
lastModified = new Date(dataFile.lastModified()) //last time the file was scanned
readDate = new Date() //current date the text file was read
for(int i = 0; i < bookList.size(); i++) {
if(i==0) {
book1['lastScan'] = lastModified
book1['readDate'] = readDate
book1['bookNumber'] = bookList['Book Number 0'][i] // <- only part of the map that changes
book1['bookTitle'] = bookList['Book Title'][i]
}
if(i==1) {
book2['lastScan'] = lastModified
book2['readDate'] = readDate
book2['bookNumber'] = bookList['Book Number 1'][i] // <- only part of the map that changes
book2['bookTitle'] = bookList['Book Title'][i]
}
if(i==2) {
book3['lastScan'] = lastModified
book3['readDate'] = readDate
book3['bookNumber'] = bookList['Book Number 2'][i] // <- only part of the map that changes
book3['bookTitle'] = bookList['Book Title'][i]
}
if(i==3) {
book4['lastScan'] = lastModified
book4['readDate'] = readDate
book4['bookNumber'] = bookList['Book Number 3'][i] // <- only part of the map that changes
book4['bookTitle'] = bookList['Book Title'][i]
}
if(i==4) {
book5['lastScan'] = lastModified
book5['readDate'] = readDate
book5['bookNumber'] = bookList['Book Number 4'][i] // <- only part of the map that changes
book5['bookTitle'] = bookList['Book Title'][i]
}
if(i==5) {
book6['lastScan'] = lastModified
book6['readDate'] = readDate
book6['bookNumber'] = bookList['Book Number 5'][i] // <- only part of the map that changes
book6['bookTitle'] = bookList['Book Title'][i]
}
if(i==6) {
book7['lastScan'] = lastModified
book7['readDate'] = readDate
book7['bookNumber'] = bookList['Book Number 6'][i] // <- only part of the map that changes
book7['bookTitle'] = bookList['Book Title'][i]
}
if(i==7) {
book8['lastScan'] = lastModified
book8['readDate'] = readDate
book8['bookNumber'] = bookList['Book Number 7'][i] // <- only part of the map that changes
book8['bookTitle'] = bookList['Book Title'][i]
}
if(i==8) {
book9['lastScan'] = lastModified
book9['readDate'] = readDate
book9['bookNumber'] = bookList['Book Number 8'][i] // <- only part of the map that changes
book9['bookTitle'] = bookList['Book Title'][i]
}
if(i==9) {
book10['lastScan'] = lastModified
book10['readDate'] = readDate
book10['bookNumber'] = bookList['Book Number 9'][i] // <- only part of the map that changes
book10['bookTitle'] = bookList['Book Title'][i]
}
}
}
As you can see, it's quite an ugly function.
Am I able to do a simple switch statement or something to cut down the code and make it look more professional?

There are a few things you can do here. You can use a list of books rather than book1, book2, etc. That saves you from repeating yourself so much.
This is almost the same as your code, but with that change. It creates a list with one map per entry in bookList (10, assuming 10 books), and each map has the same keys you had before.
public void separateBooksFromList() {
lastModified = new Date(dataFile.lastModified()) //last time the file was scanned
readDate = new Date() //current date the text file was read
// Use a list of Maps instead of separate variables
int numberOfBooks = bookList.size()
def books = []
numberOfBooks.times {
books[it] = [
lastScan: lastModified,
readDate: readDate,
bookNumber: bookList["Book Number $it"][it],
bookTitle: bookList["Book Title"][it]
]
}
}


Android Room how to query related table?

First thing, my data is POKEMON!!! enjoy 😉
I need to do this on the database side; filtering and sorting the returned data isn't an option because I'm using Paging.
I'm using Room and my database is working well, but I now want to query the pokemonType list in my relation.
Given this data class:
data class PokemonWithTypesAndSpecies @JvmOverloads constructor(
@Ignore
var matches : Int = 0,
@Embedded
val pokemon: Pokemon,
@Relation(
parentColumn = Pokemon.POKEMON_ID,
entity = PokemonType::class,
entityColumn = PokemonType.TYPE_ID,
associateBy = Junction(
value = PokemonTypesJoin::class,
parentColumn = Pokemon.POKEMON_ID,
entityColumn = PokemonType.TYPE_ID
)
)
val types: List<PokemonType>,
@Relation(
parentColumn = Pokemon.POKEMON_ID,
entity = PokemonSpecies::class,
entityColumn = PokemonSpecies.SPECIES_ID,
associateBy = Junction(
value = PokemonSpeciesJoin::class,
parentColumn = Pokemon.POKEMON_ID,
entityColumn = PokemonSpecies.SPECIES_ID
)
)
val species: PokemonSpecies?
)
I can get my data with a simple query and even search it
@Query("SELECT * FROM Pokemon WHERE pokemon_name LIKE :search")
fun searchPokemonWithTypesAndSpecies(search: String): LiveData<List<PokemonWithTypesAndSpecies>>
But now what I want is to add filtering on pokemon types, which as you can see is a list (probably loaded in a transaction under the hood) and lives in a separate table. So given a list of strings called filters I want to:
only return pokemon that contain an item in filters
sort pokemon by both number of matching types and by ID
So I want my tests to look something like this
val bulbasaurSpeciesID = 1
val squirtleSpeciesID = 2
val charmanderSpeciesID = 3
val charizardSpeciesID = 4
val pidgeySpeciesID = 5
val moltresSpeciesID = 6
val bulbasaurID = 1
val squirtleID = 2
val charmanderID = 3
val charizardID = 4
val pidgeyID = 5
val moltresID = 6
val grassTypeID = 1
val poisonTypeID = 2
val fireTypeID = 3
val waterTypeID = 4
val flyingTypeID = 5
val emptySearch = "%%"
val allPokemonTypes = listOf(
"normal",
"water",
"fire",
"grass",
"electric",
"ice",
"fighting",
"poison",
"ground",
"flying",
"psychic",
"bug",
"rock",
"ghost",
"dark",
"dragon",
"steel",
"fairy",
"unknown",
)
@Before
fun createDb() {
val context = ApplicationProvider.getApplicationContext<Context>()
db = Room.inMemoryDatabaseBuilder(
context, PokemonRoomDatabase::class.java,
).setTransactionExecutor(Executors.newSingleThreadExecutor())
.allowMainThreadQueries()
.build()
pokemonDao = db.pokemonDao()
speciesDao = db.pokemonSpeciesDao()
speciesJoinDao = db.pokemonSpeciesJoinDao()
pokemonTypeDao = db.pokemonTypeDao()
pokemonTypeJoinDao = db.pokemonTypeJoinDao()
}
@After
@Throws(IOException::class)
fun closeDb() {
db.close()
}
@Test
@Throws(Exception::class)
fun testFiltering() {
insertPokemonForFilterTest()
val pokemon =
pokemonDao.searchAndFilterPokemon(search = emptySearch, filters = allPokemonTypes)
.getValueBlocking(scope)
assertThat(pokemon?.size, equalTo(6)) // This fails list size is 9 with the current query
val pokemonFiltered =
pokemonDao.searchAndFilterPokemon(search = emptySearch, filters = listOf("fire", "flying"))
.getValueBlocking(scope)
assertThat(pokemonFiltered?.size, equalTo(4))
assertThat(pokemonFiltered!![0].pokemon.name, equalTo("charizard")) // matches 2 filters and ID is 4
assertThat(pokemonFiltered!![1].pokemon.name, equalTo("moltres")) // matches 2 filters and ID is 6
assertThat(pokemonFiltered!![2].pokemon.name, equalTo("charmander")) // matches one filter and ID is 3
assertThat(pokemonFiltered!![3].pokemon.name, equalTo("pidgey")) // matches one filter and ID is 5
}
private fun insertPokemonForFilterTest() = runBlocking {
insertBulbasaur()
insertSquirtle()
insertCharmander()
insertCharizard()
insertMoltres()
insertPidgey()
}
private fun insertBulbasaur() = runBlocking {
val bulbasaur = bulbasaur()
val grassJoin = PokemonTypesJoin(pokemon_id = bulbasaurID, type_id = grassTypeID)
val poisonJoin = PokemonTypesJoin(pokemon_id = bulbasaurID, type_id = poisonTypeID)
val bulbasaurSpeciesJoin =
PokemonSpeciesJoin(pokemon_id = bulbasaurID, species_id = bulbasaurSpeciesID)
pokemonDao.insertPokemon(bulbasaur.pokemon)
speciesDao.insertSpecies(bulbasaur.species!!)
speciesJoinDao.insertPokemonSpeciesJoin(bulbasaurSpeciesJoin)
pokemonTypeDao.insertPokemonType(pokemonType = bulbasaur.types[0])
pokemonTypeDao.insertPokemonType(pokemonType = bulbasaur.types[1])
pokemonTypeJoinDao.insertPokemonTypeJoin(grassJoin)
pokemonTypeJoinDao.insertPokemonTypeJoin(poisonJoin)
}
private fun insertSquirtle() = runBlocking {
val squirtle = squirtle()
val squirtleSpeciesJoin =
PokemonSpeciesJoin(pokemon_id = squirtleID, species_id = squirtleSpeciesID)
val waterJoin = PokemonTypesJoin(pokemon_id = squirtleID, type_id = waterTypeID)
pokemonDao.insertPokemon(squirtle.pokemon)
speciesDao.insertSpecies(squirtle.species!!)
speciesJoinDao.insertPokemonSpeciesJoin(squirtleSpeciesJoin)
pokemonTypeDao.insertPokemonType(pokemonType = squirtle.types[0])
pokemonTypeJoinDao.insertPokemonTypeJoin(waterJoin)
}
private fun insertCharmander() = runBlocking {
val charmander = charmander()
val fireJoin = PokemonTypesJoin(pokemon_id = charmanderID, type_id = fireTypeID)
val charmanderSpeciesJoin =
PokemonSpeciesJoin(pokemon_id = charmanderID, species_id = charmanderSpeciesID)
pokemonDao.insertPokemon(charmander.pokemon)
speciesDao.insertSpecies(charmander.species!!)
speciesJoinDao.insertPokemonSpeciesJoin(charmanderSpeciesJoin)
pokemonTypeDao.insertPokemonType(pokemonType = charmander.types[0])
pokemonTypeJoinDao.insertPokemonTypeJoin(fireJoin)
}
private fun insertCharizard() = runBlocking {
val charizard = charizard()
val charizardSpeciesJoin =
PokemonSpeciesJoin(pokemon_id = charizardID, species_id = charizardSpeciesID)
val fireJoin = PokemonTypesJoin(pokemon_id = charizardID, type_id = fireTypeID)
val flyingJoin = PokemonTypesJoin(pokemon_id = charizardID, type_id = flyingTypeID)
pokemonDao.insertPokemon(charizard.pokemon)
speciesDao.insertSpecies(charizard.species!!)
speciesJoinDao.insertPokemonSpeciesJoin(charizardSpeciesJoin)
pokemonTypeDao.insertPokemonType(pokemonType = charizard.types[0])
pokemonTypeDao.insertPokemonType(pokemonType = charizard.types[1])
pokemonTypeJoinDao.insertPokemonTypeJoin(fireJoin)
pokemonTypeJoinDao.insertPokemonTypeJoin(flyingJoin)
}
private fun insertPidgey() = runBlocking {
val pidgey = pidgey()
val pidgeySpeciesJoin =
PokemonSpeciesJoin(pokemon_id = pidgeyID, species_id = pidgeySpeciesID)
val flyingJoin = PokemonTypesJoin(pokemon_id = pidgeyID, type_id = flyingTypeID)
pokemonDao.insertPokemon(pidgey.pokemon)
speciesDao.insertSpecies(pidgey.species!!)
speciesJoinDao.insertPokemonSpeciesJoin(pidgeySpeciesJoin)
pokemonTypeDao.insertPokemonType(pokemonType = pidgey.types[0])
pokemonTypeJoinDao.insertPokemonTypeJoin(flyingJoin)
}
private fun insertMoltres() = runBlocking {
val moltres = moltres()
val moltresSpeciesJoin =
PokemonSpeciesJoin(pokemon_id = moltresID, species_id = moltresSpeciesID)
val fireJoin = PokemonTypesJoin(pokemon_id = moltresID, type_id = fireTypeID)
val flyingJoin = PokemonTypesJoin(pokemon_id = moltresID, type_id = flyingTypeID)
pokemonDao.insertPokemon(moltres.pokemon)
speciesDao.insertSpecies(moltres.species!!)
speciesJoinDao.insertPokemonSpeciesJoin(moltresSpeciesJoin)
pokemonTypeDao.insertPokemonType(pokemonType = moltres.types[0])
pokemonTypeDao.insertPokemonType(pokemonType = moltres.types[1])
pokemonTypeJoinDao.insertPokemonTypeJoin(fireJoin)
pokemonTypeJoinDao.insertPokemonTypeJoin(flyingJoin)
}
fun bulbasaur(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
pokemon = Pokemon(id = bulbasaurID, name = "bulbasaur"),
species = PokemonSpecies(
id = bulbasaurSpeciesID,
species = "Seed pokemon",
pokedexEntry = "There is a plant seed on its back right from the day this Pokémon is born. The seed slowly grows larger."
),
types = listOf(
PokemonType(id = poisonTypeID, name = "poison", slot = 1),
PokemonType(id = grassTypeID, name = "grass", slot = 2)
)
)
fun squirtle(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
pokemon = Pokemon(id = squirtleID, name = "squirtle"),
species = PokemonSpecies(
id = squirtleSpeciesID,
species = "Turtle pokemon",
pokedexEntry = "Small shell pokemon"
),
types = listOf(PokemonType(id = waterTypeID, name = "water", slot = 1))
)
fun charmander(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
pokemon = Pokemon(id = charmanderID, name = "charmander"),
species = PokemonSpecies(
id = charmanderSpeciesID,
species = "Fire lizard pokemon",
pokedexEntry = "If the flame on this pokemon's tail goes out it will die"
),
types = listOf(PokemonType(id = fireTypeID, name = "fire", slot = 1))
)
fun charizard(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
pokemon = Pokemon(id = charizardID, name = "charizard"),
species = PokemonSpecies(
id = charizardSpeciesID,
species = "Fire flying lizard pokemon",
pokedexEntry = "Spits fire that is hot enough to melt boulders. Known to cause forest fires unintentionally"
),
types = listOf(
PokemonType(id = fireTypeID, name = "fire", slot = 1),
PokemonType(id = flyingTypeID, name = "flying", slot = 2)
)
)
fun moltres(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
pokemon = Pokemon(id = moltresID, name = "moltres"),
species = PokemonSpecies(
id = moltresSpeciesID,
species = "Fire bird pokemon",
pokedexEntry = "Known as the legendary bird of fire. Every flap of its wings creates a dazzling flash of flames"
),
types = listOf(
PokemonType(id = fireTypeID, name = "fire", slot = 1),
PokemonType(id = flyingTypeID, name = "flying", slot = 2)
)
)
fun pidgey(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
pokemon = Pokemon(id = pidgeyID, name = "pidgey"),
species = PokemonSpecies(
id = pidgeySpeciesID,
species = "Bird pokemon",
pokedexEntry = "Pidgey is a Flying Pokémon. Among all the Flying Pokémon, it is the gentlest and easiest to capture. A perfect target for the beginning Pokémon Trainer to test his Pokémon's skills."
),
types = listOf(PokemonType(id = flyingTypeID, name = "flying", slot = 1))
)
And the query would be
@Query("SELECT * FROM Pokemon INNER JOIN PokemonType, PokemonTypesJoin ON Pokemon.pokemon_id = PokemonTypesJoin.pokemon_id AND PokemonType.type_id = PokemonTypesJoin.type_id WHERE pokemon_name LIKE :search AND type_name IN (:filters) ORDER BY pokemon_id ASC")
fun searchAndFilterPokemon(search: String, filters: List<String>): LiveData<List<PokemonWithTypesAndSpecies>>
I'm guessing this doesn't work because at this point Room hasn't collected the types from the other table, and it's probably not even querying a list. I think this part
type_name IN (:filters)
is checking a column against a list, when what I want is a list against a list 🤷‍♂️ But honestly I'm happy to just say I've fallen and can't get up 🤣 Can anyone help? Any help appreciated.
I may have gotten some column names wrong, but try this query:
@Query("""SELECT pok.id, pok.name FROM Pokemon AS pok
INNER JOIN PokemonTypesJoin AS p_join ON pok.id = p_join.pokemon_id
INNER JOIN PokemonType AS pok_type ON pok_type.id = p_join.type_id
WHERE pok.name LIKE :search AND pok_type.name IN (:filters)
GROUP BY pok.id, pok.name ORDER BY count(*) DESC, pok.id ASC""")
Have you compared Room to Cmobilecom-JPA for android? JPA is very good at query relationships. The advantage of using JPA (standard) is obvious, making your code reusable on android, server side java, or swing project.
Thanks to @serglytikhonov my query works and now looks like this
@Query("""SELECT * FROM Pokemon
INNER JOIN PokemonTypesJoin
ON Pokemon.pokemon_id = PokemonTypesJoin.pokemon_id
INNER JOIN PokemonType
ON PokemonType.type_id = PokemonTypesJoin.type_id
WHERE pokemon_name LIKE :search AND type_name IN (:filters)
GROUP BY Pokemon.pokemon_id, Pokemon.pokemon_name
ORDER BY count(*) DESC, pokemon_id ASC""")
fun searchAndFilterPokemon(search: String, filters: List<String>): LiveData<List<PokemonWithTypesAndSpecies>>
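The main pieces are the count(*) and the GROUP BY: grouping collapses the joined rows to one row per pokemon, and count(*) is then the number of matching types. Many thanks.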
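To make the effect of GROUP BY plus ORDER BY count(*) DESC easier to see, here is a rough in-memory sketch of the same ranking in plain JavaScript. It is only an illustration and not Room code: the `pokemon` array and the `searchAndFilter` function are hypothetical stand-ins built from the test data above.

```javascript
// Hypothetical in-memory stand-in for the Pokemon rows joined to their types.
const pokemon = [
  { id: 1, name: "bulbasaur", types: ["poison", "grass"] },
  { id: 2, name: "squirtle", types: ["water"] },
  { id: 3, name: "charmander", types: ["fire"] },
  { id: 4, name: "charizard", types: ["fire", "flying"] },
  { id: 5, name: "pidgey", types: ["flying"] },
  { id: 6, name: "moltres", types: ["fire", "flying"] },
];

// Mirrors the query: keep pokemon with at least one matching type,
// then order by number of matching types (descending) and by id (ascending).
function searchAndFilter(rows, filters) {
  const wanted = new Set(filters);
  return rows
    .map(p => ({ ...p, matches: p.types.filter(t => wanted.has(t)).length }))
    .filter(p => p.matches > 0)
    .sort((a, b) => b.matches - a.matches || a.id - b.id);
}

const result = searchAndFilter(pokemon, ["fire", "flying"]).map(p => p.name);
console.log(result); // charizard, moltres, charmander, pidgey
```

Each pokemon appears once no matter how many of its types match, which is what the GROUP BY achieves on the SQL side.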

how to parse multiple JSON structures in spark program

I am working on parsing logs (JSON format) in Scala and don't know how to proceed. I may get different kinds of logs to process.
How do I write/design my code to handle different types of JSON structures?
Can I give my Scala program a schema and let it parse?
I wrote some code using ObjectMapper that reads through the nodes, but I want a more structure-agnostic approach.
I am not sure where to start. Please point me to some reading or examples. Searching Google and Stack Overflow turned up too many examples, which is confusing as I am also still learning Scala.
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Calendar;
import org.apache.spark.sql.hive.HiveContext
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.databind.JsonMappingException;
import org.apache.spark.rdd.RDD;
sc.setLogLevel("OFF");
val args = sc.getConf.get("spark.driver.args").split("\\s+")
args.foreach(println);
var envStr = "dev";
var srcStr = "appm"
val RootFolderStr = "/source_folder/";
val DestFolderStr = "/dest_folder/";
val dateformatter = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss.SSS'Z'");
val formatter = new SimpleDateFormat("yyyy-MM-dd");
val theMonthFormatter = new SimpleDateFormat("yyyy-MM");
var fromDay: Date = formatter.parse("2018-04-29");
var toDay: Date = formatter.parse("2018-05-01");
if (args.length < 2) {
printf("usage: need at least 2 parameters in spark.driver.args");
sys.exit(2);
}
envStr = args(0).toLowerCase();
srcStr = args(1).toLowerCase();
if (args.length == 4) {
fromDay = formatter.parse(args(2));
toDay = formatter.parse(args(3));
}
if (args.length == 2) {
// default to be yesterday to today
toDay = formatter.parse(formatter.format(Calendar.getInstance().getTime()));
val previousDay = Calendar.getInstance();
previousDay.add(Calendar.DATE, -1);
fromDay = formatter.parse(formatter.format(previousDay.getTime()));
}
// get the sub-folder for the monthly partition
val monthFolder = theMonthFormatter.format(fromDay);
var rootFolder = RootFolderStr.replaceAll("ENV", envStr) + monthFolder;
rootFolder = rootFolder.replaceAll("SRC", srcStr);
val destFolder = DestFolderStr.replaceAll("ENV", envStr);
var toCalendar = Calendar.getInstance();
toCalendar.setTime(toDay);
toCalendar.add(Calendar.DATE, 1);
// need to consider the case across the month boundary
val toDay2 = formatter.parse(formatter.format(toCalendar.getTime()));
// filter out .tmp files and 0-size files
// .tmp files are not safe to read from, it's possible that the files are under updating by Flume job and the message data is incomplete
// when the Spark job starts to read from it.
val pathInfos = FileSystem.get(sc.hadoopConfiguration).listStatus(new Path(rootFolder));
// filter out the 0-length files, .tmp files which is of today
val allfiles = pathInfos.filter(fileStatus => {
if (fileStatus.getLen == 0)
false
else {
val aPath = fileStatus.getPath().getName();
// use the modification time is more accurate.
val lastTime = fileStatus.getModificationTime();
val aDate = new Date(lastTime);
// all files between fromDay and toDay2
aDate.after(fromDay) && aDate.before(toDay2);
}
}
).map(_.getPath.toString);
case class event_log(
time_stp: Long,
msg_sze: Int,
msg_src: String,
action_path: String,
s_code: Int,
s_desc: String,
p_code: String,
c_id: String,
m_id: String,
c_ip: String,
c_gp: String,
gip: String,
ggip: String,
rbody: String
);
def readEvents(fileList: Array[String], msgSrc: String, fromTS: Long, toTS: Long): RDD[(event_log)] = {
val records =
sc.sequenceFile[Long, String](fileList.mkString(","))
.filter((message) => {
(message._1 >= fromTS && message._1 < toTS);
}
)
val eventLogs = records.map((message) => {
val time_stp = message._1;
var msg_sze = message._2.length();
var c_id = ""
var m_id = "";
var p_code = "";
var c_ip = "";
var c_gp = "";
var gip = "";
var ggip = "";
var rbody = "";
var action_path = "";
var s_code: Int = 200;
var s_desc = "";
try {
// parse the message
val mapper = new ObjectMapper();
val aBuff = message._2.getBytes();
val root = mapper.readTree(aBuff);
var aNode = root.path("rbody");
rbody = aNode.textValue();
if (rbody != null && rbody.length() > 0) {
val mapper_2 = new ObjectMapper();
val aBuff_2 = rbody.getBytes();
var root2 = mapper_2.readTree(aBuff_2);
aNode = root2.path("p_code");
if (aNode != null && aNode.isValueNode())
p_code = String.valueOf(aNode.intValue());
aNode = root2.path("mkay");
if (aNode != null && aNode.isObject()) {
root2 = aNode;
}
{
aNode = root2.get("c_id");
if (aNode != null && aNode.isValueNode())
c_id = aNode.textValue();
aNode = root2.get("m_id");
if (aNode != null && aNode.isValueNode()) {
m_id = String.valueOf(aNode.intValue());
}
}
}
aNode = root.path("c_ip");
c_ip = aNode.textValue();
aNode = root.path("c_gp");
c_gp = aNode.textValue();
aNode = root.path("gip");
gip = aNode.textValue();
aNode = root.path("ggip");
ggip = aNode.textValue();
aNode = root.path("action_path");
action_path = aNode.textValue();
aNode = root.path("s_code");
val statusNodeValue = aNode.textValue().trim();
s_code = Integer.valueOf(statusNodeValue.substring(0, 3));
s_desc = statusNodeValue.substring(3).trim();
}
catch {
// return empty string as indicator that it's not a well-formatted JSON message
case jex: JsonParseException => {
msg_sze = 0
};
case ioEx: java.io.IOException => {
msg_sze = 0
};
case rtEx: JsonMappingException => {
msg_sze = 0
};
}
event_log(time_stp, msg_sze, msgSrc, action_path, s_code, s_desc,
p_code, c_id, m_id,
c_ip, c_gp, gip, ggip,
rbody);
});
eventLogs;
}
val hiveContext = new HiveContext(sc)
if (allfiles.length == 0)
sys.exit(3);
val fromTime = fromDay.getTime();
val toTime = toDay.getTime();
val events = readEvents(allfiles, srcStr, fromTime, toTime);
val df = hiveContext.createDataFrame(events).coalesce(1);
df.write.parquet(destFolder);
sys.exit(0);

Index Match Large Array Google Script Taking Very Long

I have the function below where I am trying to scrape 4 websites, and then combine the results into a spreadsheet. Is there a faster way to match over a large array that doesn't use the INDEX/MATCH formulas? My desired output would be (obviously this is an example):
MLBID | FG_ID | PA | K | K% | wOBA
12345 | 12345 | 12 | 5 | 41.7% | .300
While the code I have below works, it takes far too long and reaches the 6-minute limit of Google Script. The matching that I am trying to do is with ~4000 rows. I have commented my code as much as possible.
function minors_batting_stats() {
//this is the spreadsheet where I have a list of all of the IDs -- MLB and FG
var ids = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Player List");
//this is the output sheet
var mb18vR_sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("2018 minors bat vs R");
//various URLs I am trying to scrape
var mb18vR_PA_url = 'https://www.mlb.com/prospects/stats/search?level=11&level=12&level=13&level=14&level=15&level=16&pitcher_throws=R&batter_stands=&game_date_gt=&game_date_lt=&season=2017&home_away=&draft_year=&prospect=&player_type=batter&sort_by=results&sort_order=desc&group_by=name&min_pa=&min_pitches=#results'
var mb18vR_SO_url = 'https://www.mlb.com/prospects/stats/search?pa_result=strikeout&level=11&level=12&level=13&level=14&level=15&level=16&pitcher_throws=R&batter_stands=&game_date_gt=&game_date_lt=&season=2017&home_away=&draft_year=&prospect=&player_type=batter&sort_by=results&sort_order=desc&group_by=name&min_pa=&min_pitches=#results'
var mb18vR_wOBA_url = 'https://www.mlb.com/prospects/stats/search?level=11&level=12&level=13&level=14&level=15&level=16&pitcher_throws=R&batter_stands=&game_date_gt=&game_date_lt=&season=2017&home_away=&draft_year=&prospect=&player_type=batter&sort_by=woba&sort_order=desc&group_by=name&min_pa=&min_pitches=#results'
//creating an array for each scrape
var res = [];
var res1 = [];
var res2 = [];
var res3 = [];
//getting the MLB and FG ids from the spreadsheet
var mlbids = ids.getRange(1, 11, ids.getLastRow()).getValues();
var fgids = ids.getRange(1,9, ids.getLastRow()).getValues();
//scraping SO against RHP
var content_SO = UrlFetchApp.fetch(mb18vR_SO_url).getContentText();
var e_SO = Parser.data(content_SO).from('tbody').to('</tbody>').build();
var rows_SO = Parser.data(e_SO).from('<tr class="player_row"').to('</tr>').iterate();
for (var i=0; i<rows_SO.length; i++) { //rows.length
res1[i] = [];
res1[i][0] = Parser.data(rows_SO[i]).from('/player/').to('/').build();
var SOs = Parser.data(rows_SO[i]).from('<td align="left">').to('</td>').iterate();
res1[i][1] = SOs[1];
}
//scraping wOBA against RHP
var content_wOBA = UrlFetchApp.fetch(mb18vR_wOBA_url).getContentText();
var e_wOBA = Parser.data(content_wOBA).from('tbody').to('</tbody>').build();
var rows_wOBA = Parser.data(e_wOBA).from('<tr class="player_row"').to('</tr>').iterate();
for (var i=0; i<rows_wOBA.length; i++) { //rows.length
res2[i] = [];
res2[i][0] = Parser.data(rows_wOBA[i]).from('/player/').to('/').build();
var wOBAs = Parser.data(rows_wOBA[i]).from('<td align="left">').to('</td>').iterate();
res2[i][1] = wOBAs[2];
}
//scraping PA against RHP
var content = UrlFetchApp.fetch(mb18vR_PA_url).getContentText();
var e = Parser.data(content).from('tbody').to('</tbody>').build();
var rows = Parser.data(e).from('<tr class="player_row"').to('</tr>').iterate();
for (var i=0; i<rows.length; i++) { //rows.length
res[i] = [];
res[i][0] = Parser.data(rows[i]).from('/player/').to('/').build();
res[i][1] = [];
//matching the MLB_ID with FG_ID
var mlbID = res[i][0];
for(var j = 0; j<mlbids.length;j++){
if(mlbids[j] == mlbID){
res[i][1] = fgids[j];
}
}
var PAs = Parser.data(rows[i]).from('<td align="left">').to('</td>').iterate();
res[i][2] = PAs[1];
//matching the MLB_ID from PA (res) with SO (res1)
res[i][3] = 0;
for (var w=0; w<res1.length; w++) {
if (res[i][0] == res1[w][0]) {
res[i][3] = res1[w][1];
}
}
//Calculating K%
res[i][4] = res[i][3] / res[i][2]
//matching the MLB_ID from PA (res) with wOBA (res2)
res[i][5] = 0;
for (var v=0; v<res2.length; v++) {
if (res[i][0] == res2[v][0]) {
res[i][5] = res2[v][1];
}
}
}
//pasting values
mb18vR_sheet.getRange(2, 1, res.length, res[0].length).setValues(res);
}
The issue you have is that you are forcing your script to loop through large datasets many many times for each row of compared data. A better approach is to build a lookup object, which maps between a desired unique identifier and the row of the data array you want to access:
/**
 * Make an object from an Array[][] that has a unique identifier in one of the columns.
 * @param {Array[]} data The 2D array of data to index, e.g. [ [r1c1, r1c2, ...], [r2c1, r2c2, ...], ... ]
 * @param {number} idColumn The column in the data array that is a unique row identifier,
 *     e.g. the column index that contains the product's serial number, in a data
 *     array that has only a single row per unique product.
 * @return {Object} An object that maps between an id and a row index, such that
 *     `object[id]` = the row index for the specific row in data that has id = id
 */
function makeKey(data, idColumn) {
  if(!data || !data.length || !data[0].length)
    throw new TypeError("Input data argument is not Array[][]");
  // Assume the first column is the column with the unique identifier if not given by the caller.
  if(idColumn === undefined)
    idColumn = 0;
  var key = {};
  for(var r = 0, rows = data.length; r < rows; ++r) {
    var id = data[r][idColumn];
    // Test for presence rather than truthiness: row index 0 is a valid stored value.
    if (key.hasOwnProperty(id))
      throw new Error("ID is not unique for id='" + id + "'");
    key[id] = r;
  }
  return key;
}
Usage:
var database = someSheet.getDataRange().getValues();
var lookup = makeKey(database, 3); // here we say that the 4th column has the unique values.
var newData = /* read a 2D array from somewhere */;
for(var r = 0, rows = newData.length; r < rows; ++r) {
  var id = newData[r][3];
  var existingIndex = lookup[id];
  // Compare against undefined: an index of 0 is falsy but still a valid match.
  if (existingIndex !== undefined) {
    var oldDataRow = database[existingIndex];
  } else {
    // No existing data.
  }
}
By making a lookup object for your data arrays, you no longer have to re-search them and make comparisons, because you did the search once and stored the relationship, rather than discarding it every time. Note that the key that was made is based on a specific (and unique) property of the data. Without that relationship, this particular indexing approach won't work - but a different one will.
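Applied to the arrays in the question, the idea could look roughly like the sketch below. It stores the looked-up value directly instead of a row index, which is a small variation on makeKey. The `indexBy` helper and all sample values are hypothetical; only the array shapes (`res`, `res1`, `res2`, `mlbids`, `fgids`) follow the question's code.

```javascript
// Index a 2D array: value in idColumn -> value in valueColumn.
function indexBy(rows, idColumn, valueColumn) {
  var index = {};
  for (var i = 0; i < rows.length; i++) {
    index[rows[i][idColumn]] = rows[i][valueColumn];
  }
  return index;
}

// Hypothetical sample data in the shapes the question builds.
var mlbids = [["111"], ["222"], ["333"]];    // getValues() returns one-element rows
var fgids = [["fg-a"], ["fg-b"], ["fg-c"]];
var res = [["222", 12], ["333", 40]];        // [MLB ID, PA]
var res1 = [["333", 10], ["222", 5]];        // [MLB ID, SO]
var res2 = [["222", ".300"]];                // [MLB ID, wOBA]

// Build each lookup once, outside the main loop.
var fgByMlb = {};
for (var j = 0; j < mlbids.length; j++) {
  fgByMlb[mlbids[j][0]] = fgids[j][0];
}
var soByMlb = indexBy(res1, 0, 1);
var wobaByMlb = indexBy(res2, 0, 1);

// One pass over the PA rows; every match is a property lookup, not an inner loop.
var output = res.map(function (row) {
  var id = row[0];
  var pa = row[1];
  var so = soByMlb.hasOwnProperty(id) ? soByMlb[id] : 0;
  var woba = wobaByMlb.hasOwnProperty(id) ? wobaByMlb[id] : 0;
  var fg = fgByMlb.hasOwnProperty(id) ? fgByMlb[id] : "";
  return [id, fg, pa, so, so / pa, woba]; // MLBID, FG_ID, PA, K, K%, wOBA
});
```

With n scraped rows and m rows per lookup source, the nested loops do on the order of n × m comparisons, while this does roughly n + m steps per source.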

F# CsvTypeProvider extracting the same columns from slightly different csv-files

I am creating a program that reads football matches from different CSV files. The columns I am interested in are present in all the files, but the files have a varying number of columns.
This left me creating a separate mapping function for each variation of file, with a different sample for each type:
type GamesFile14 = CsvProvider<"./data/sample_14.csv">
type GamesFile15 = CsvProvider<"./data/sample_15.csv">
type GamesFile1617 = CsvProvider<"./data/sample_1617.csv">
let mapRows14 (rows:seq<GamesFile14.Row>) = rows |> Seq.map ( fun c -> { Division = c.Div; Date = DateTime.Parse c.Date;
HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF };
AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF };
Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } } )
let mapRows15 (rows:seq<GamesFile15.Row>) = rows |> Seq.map ( fun c -> { Division = c.Div; Date = DateTime.Parse c.Date;
HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF };
AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF };
Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } } )
let mapRows1617 (rows:seq<GamesFile1617.Row>) = rows |> Seq.map ( fun c -> { Division = c.Div; Date = DateTime.Parse c.Date;
HomeTeam = { Name = c.HomeTeam; Score = c.FTHG; Shots = c.HS; ShotsOnTarget = c.HST; Corners = c.HC; Fouls = c.HF };
AwayTeam = { Name = c.AwayTeam; Score = c.FTAG; Shots = c.AS; ShotsOnTarget = c.AST; Corners = c.AC; Fouls = c.AF };
Odds = { H = float c.B365H; U = float c.B365D; B = float c.B365A } } )
These are again consumed by the loadGames function:
let loadGames season resource =
if season.Year = 14 then GamesFile14.Load(resource).Rows |> mapRows14
else if season.Year = 15 then GamesFile15.Load(resource).Rows |> mapRows15
else GamesFile1617.Load(resource).Rows |> mapRows1617
It seems to me that there must be better ways to get around this problem.
Is there any way I could make my mapping function more generic so that I don't have to repeat the same function over and over?
Is it possible to create the CsvProvider on the fly based on the resource, or do I need to explicitly declare a sample for each variation of my csv-files like in the code above?
Other suggestions?
In your scenario, you might get better results from FSharp.Data's CsvFile type. It takes a more dynamic approach to CSV parsing, using the dynamic ? operator for data access. You lose some of the type-safety guarantees that the type provider gives you, since each separate CSV file will be loaded into the same CsvRow type, which means that you can't guarantee at compile time that any given column will be in a file, and you have to be prepared for runtime errors. Note also that ? gives you each value as a string, so any numeric fields in your record will need an explicit conversion. But in your case, that's just what you want, because it would allow your three functions to be rewritten like this:
let mapRows14 rows = rows |> Seq.map ( fun c -> { Division = c?Div; Date = DateTime.Parse c?Date;
HomeTeam = { Name = c?HomeTeam; Score = c?FTHG; Shots = c?HS; ShotsOnTarget = c?HST; Corners = c?HC; Fouls = c?HF };
AwayTeam = { Name = c?AwayTeam; Score = c?FTAG; Shots = c?AS; ShotsOnTarget = c?AST; Corners = c?AC; Fouls = c?AF };
Odds = { H = float c?B365H; U = float c?B365D; B = float c?B365A } } )
Give CsvFile a try and see if it solves your problem.
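To illustrate the general idea of looking columns up by name at run time, independent of FSharp.Data, here is a small JavaScript sketch. Everything in it is hypothetical: `parseCsv` is a naive parser that assumes no quoted fields, `mapRow` reads only a few of the shared columns, and the two sample files are made up.

```javascript
// Parse simple CSV text (assumes no quoted fields) into objects keyed by header name.
function parseCsv(text) {
  var lines = text.trim().split("\n");
  var header = lines[0].split(",");
  return lines.slice(1).map(function (line) {
    var cells = line.split(",");
    var row = {};
    header.forEach(function (name, i) { row[name] = cells[i]; });
    return row;
  });
}

// One mapper for every file variant: it only touches the shared columns,
// and a missing column surfaces as a run-time error instead of a compile error.
function mapRow(row) {
  ["Div", "HomeTeam", "AwayTeam", "FTHG", "FTAG"].forEach(function (name) {
    if (!(name in row)) throw new Error("Missing column: " + name);
  });
  return {
    division: row.Div,
    home: row.HomeTeam,
    away: row.AwayTeam,
    homeGoals: Number(row.FTHG), // values arrive as strings, so convert explicitly
    awayGoals: Number(row.FTAG)
  };
}

// Two made-up files with different column sets but the same shared columns.
var file14 = "Div,HomeTeam,AwayTeam,FTHG,FTAG\nE0,Arsenal,Chelsea,2,1";
var file15 = "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,Referee\nE0,01/01/15,Leeds,Hull,0,3,Smith";
var games = parseCsv(file14).concat(parseCsv(file15)).map(mapRow);
```

The trade-off is the same as with CsvFile: one mapping function covers every file layout, at the cost of finding a missing column only when the file is actually read.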

Sort object by another object value inside

I have run into a complication with sorting in my application.
I have a data object like the following (called pCList):
Object[0]:
Id: 1
comp: Test
med: xyz
condition: valueObject.Condition
Object[1]:
Id: 2
comp: Test1
med: pqr
condition: valueObject.Condition
Object[2]:
Id: 3
comp: Test
med: abc
condition: valueObject.Condition
The condition VO has data like:
condition data1:
conId: 001
cond: abcds
condition data2:
conId: 001
cond: trdfd
condition data3:
conId: 001
cond: dsdsds
For normal sorting I do the following:
var sort:ISort = new Sort();
var sortField:ISortField = new SortField("med");
sort.fields = [sortField];
if(pCList != null)
{
pCList.sort = sort;
pCList.refresh();
}
This sorts pCList by med.
But now I want to sort by condition.cond,
so that the item with cond value abcds comes first, then dsdsds, then trdfd, and so on.
I have tried:
var sort:ISort = new Sort();
var sortField:ISortField = new SortField("condition.cond");
sort.fields = [sortField];
But it did not work. Any help is greatly appreciated.
ISort has a property compareFunction, that can be used for custom sorting. See example below.
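var sort:ISort = new Sort();
sort.compareFunction = function(a:Object, b:Object, fields:Array = null):int {
// Property name follows the question's data ("condition"); ActionScript is case-sensitive.
var conditionA:String = a.condition.cond;
var conditionB:String = b.condition.cond;
if (conditionA < conditionB) {
return -1;
} else if (conditionA > conditionB) {
return 1;
} else {
return 0;
}
};
// Apply the sort to the collection, as in the question.
pCList.sort = sort;
pCList.refresh();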