Conversion Function - function

I am trying to create a function that converts any number type to float32 and anything else to 0f.
I first tried using match on type, but the type system infers the type as boolean, even when I set the type as generic.
let numtype x =
match box x with
//| :? sbyte | :? byte | :? int16 | :? int32 | :? int64 | :? uint16 | :? uint32 | :? uint64 | :? nativeint | :? unativeint | :? decimal | :? double | :? float | :? float32 -> float 32 x
| :? char | :? string | :? unit -> 0f
| :? bool -> 0f
| _ -> float32 x
numtype true
Script1.fsx(63,9): error FS0001: This expression was expected to have type 'int' but here has type 'bool'
Then, I tried using a error catching, but it balks at converting from generic
let convToNum<'a> (x:'a) =
try
match x with
| :? float32 -> x
with
//| :? System.Exception -> 0f
| _ -> 0f
convToNum 1
Script1.fsx(76,11): error FS0008: This runtime coercion or type test from type 'a to float32 involves an indeterminate type based on information prior to this program point. Runtime type tests are not allowed on some types. Further type annotations are needed.
Does anyone have a suggestion?
updated based on comments

I think the problem is that you're calling float32 on the original x parameter instead of using the cast value. Try something like this instead:
let numtype (x : obj) =
match x with
| :? sbyte as x' -> float32 x'
| :? int16 as x' -> float32 x'
| :? int32 as x' -> float32 x'
| :? int64 as x' -> float32 x'
| :? uint32 as x' -> float32 x'
| :? uint64 as x' -> float32 x'
| :? nativeint as x' -> float32 x'
| :? unativeint as x' -> float32 x'
| :? decimal as x' -> float32 x'
| :? double as x' -> float32 x'
| :? float32 as x' -> x'
| _ -> 0f
You can also use the built-in System.Convert type:
open System
let numtype (x : obj) =
try Convert.ToSingle(x)
with _ -> 0f

Related

to_html() formatters, float and string together

I'm trying to to_html(formatters={column: function}) in order to make visual changes.
df = pd.DataFrame({'name':['Here',4.45454,5]}, index=['A', 'B', 'C'])
def _form(val):
value = 'STRING' if isinstance(val, str) else '{:0.0f}'.format(val)
return value
df.to_html(formatters={'name':_form})
I get
name
A
STRING
B
4.45454
C
5
instead of
| | name |
| -------- | -------------- |
| A | STRING |
| B | 4 |
| C | 5 |
Problem here is float value doesn't change.
On the other hand when I have all values float or integers, it gives desired result:
df = pd.DataFrame({'name':[323.322,4.45454,5]}, index=['A', 'B', 'C'])
def _form(val):
value = 'STRING' if isinstance(val, str) else '{:0.0f}'.format(val)
return value
df.to_html(formatters={'name':_form})
How can it be fixed?
Thank you.

the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)

I am doing my assignment that's include make connection with the database in Rust. I am using the latest version of mysql crate: mysql ="18.2.0".My Database connection is successful as I print the pool variable. I write my own code for table student but I get the error. Then i paste the code of documentation, I recieve the following error with'?' operator:
I am connecting the database in rust for the first time. Any help is appreciated.
warning: unused import: `std::io`
--> src/main.rs:2:5
|
2 | use std::io;
| ^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
--> src/main.rs:17:12
|
14 | / fn insert(){
15 | |
16 | |
17 | | let pool = Pool::new("mysql://root:root#localhost:3306/Rust_testing")?;
| | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot use the `?` operator in a function that returns `()`
... |
58 | |
59 | | }
| |_- this function should return `Result` or `Option` to accept `?`
|
= help: the trait `std::ops::Try` is not implemented for `()`
= note: required by `std::ops::Try::from_error`
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
--> src/main.rs:19:16
|
14 | / fn insert(){
15 | |
16 | |
17 | | let pool = Pool::new("mysql://root:root#localhost:3306/Rust_testing")?;
18 | |
19 | | let mut conn = pool.get_conn()?;
| | ^^^^^^^^^^^^^^^^ cannot use the `?` operator in a function that returns `()`
... |
58 | |
59 | | }
| |_- this function should return `Result` or `Option` to accept `?`
|
= help: the trait `std::ops::Try` is not implemented for `()`
= note: required by `std::ops::Try::from_error`
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
--> src/main.rs:22:1
|
14 | / fn insert(){
15 | |
16 | |
17 | | let pool = Pool::new("mysql://root:root#localhost:3306/Rust_testing")?;
... |
22 | /| conn.query_drop(
23 | || r"CREATE TEMPORARY TABLE payment (
24 | || customer_id int not null,
25 | || amount int not null,
26 | || account_name text
27 | || )")?;
| ||________^ cannot use the `?` operator in a function that returns `()`
... |
58 | |
59 | | }
| |_- this function should return `Result` or `Option` to accept `?`
|
= help: the trait `std::ops::Try` is not implemented for `()`
= note: required by `std::ops::Try::from_error`
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
--> src/main.rs:38:1
|
14 | / fn insert(){
15 | |
16 | |
17 | | let pool = Pool::new("mysql://root:root#localhost:3306/Rust_testing")?;
... |
38 | /| conn.exec_batch(
39 | || r"INSERT INTO payment (customer_id, amount, account_name)
40 | || VALUES (:customer_id, :amount, :account_name)",
41 | || payments.iter().map(|p| params! {
... ||
45 | || })
46 | || )?;
| ||__^ cannot use the `?` operator in a function that returns `()`
... |
58 | |
59 | | }
| |_- this function should return `Result` or `Option` to accept `?`
|
= help: the trait `std::ops::Try` is not implemented for `()`
= note: required by `std::ops::Try::from_error`
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
--> src/main.rs:49:25
|
14 | / fn insert(){
15 | |
16 | |
17 | | let pool = Pool::new("mysql://root:root#localhost:3306/Rust_testing")?;
... |
49 | | let selected_payments = conn
| |_________________________^
50 | || .query_map(
51 | || "SELECT customer_id, amount, account_name from payment",
52 | || |(customer_id, amount, account_name)| {
53 | || Payment { customer_id, amount, account_name }
54 | || },
55 | || )?;
| ||______^ cannot use the `?` operator in a function that returns `()`
... |
58 | |
59 | | }
| |_- this function should return `Result` or `Option` to accept `?`
|
= help: the trait `std::ops::Try` is not implemented for `()`
= note: required by `std::ops::Try::from_error`
error: aborting due to 5 previous errors; 1 warning emitted
For more information about this error, try `rustc --explain E0277`.
error: could not compile `class-09`.
Here is the code, i copy from documentation to test:
use std::io;
use mysql::prelude::*;
use mysql::*;
#[derive(Debug, PartialEq, Eq)]
struct Payment {
customer_id: i32,
amount: i32,
account_name: Option<String>,
}
fn insert(){
let pool = Pool::new("mysql://root:root#localhost:3306/Rust_testing")?;
let mut conn = pool.get_conn()?;
// Let's create a table for payments.
conn.query_drop(
r"CREATE TEMPORARY TABLE payment (
customer_id int not null,
amount int not null,
account_name text
)")?;
let payments = vec![
Payment { customer_id: 1, amount: 2, account_name: None },
Payment { customer_id: 3, amount: 4, account_name: Some("foo".into()) },
Payment { customer_id: 5, amount: 6, account_name: None },
Payment { customer_id: 7, amount: 8, account_name: None },
Payment { customer_id: 9, amount: 10, account_name: Some("bar".into()) },
];
// Now let's insert payments to the database
conn.exec_batch(
r"INSERT INTO payment (customer_id, amount, account_name)
VALUES (:customer_id, :amount, :account_name)",
payments.iter().map(|p| params! {
"customer_id" => p.customer_id,
"amount" => p.amount,
"account_name" => &p.account_name,
})
)?;
// Let's select payments from database. Type inference should do the trick here.
let selected_payments = conn
.query_map(
"SELECT customer_id, amount, account_name from payment",
|(customer_id, amount, account_name)| {
Payment { customer_id, amount, account_name }
},
)?;
println!("Yay!");
}
fn main(){
insert();
}
and when i write my code without the ? operator, I got the following error:
warning: unused import: `std::io`
--> src/main.rs:2:5
|
2 | use std::io;
| ^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
error[E0599]: no method named `query_drop` found for enum `std::result::Result<mysql::conn::pool::PooledConn, mysql::error::Error>` in the current scope
--> src/main.rs:32:6
|
32 | conn.query_drop(
| ^^^^^^^^^^ method not found in `std::result::Result<mysql::conn::pool::PooledConn, mysql::error::Error>`
error[E0599]: no method named `exec_batch` found for enum `std::result::Result<mysql::conn::pool::PooledConn, mysql::error::Error>` in the current scope
--> src/main.rs:48:6
|
48 | conn.exec_batch(
| ^^^^^^^^^^ method not found in `std::result::Result<mysql::conn::pool::PooledConn, mysql::error::Error>`
error[E0599]: no method named `query_map` found for enum `std::result::Result<mysql::conn::pool::PooledConn, mysql::error::Error>` in the current scope
--> src/main.rs:60:6
|
60 | .query_map(
| ^^^^^^^^^ method not found in `std::result::Result<mysql::conn::pool::PooledConn, mysql::error::Error>`
warning: unused import: `mysql::prelude`
--> src/main.rs:4:5
|
4 | use mysql::prelude::*;
| ^^^^^^^^^^^^^^
error: aborting due to 3 previous errors; 2 warnings emitted
For more information about this error, try `rustc --explain E0599`.
error: could not compile `class-09`.
As the compiler is telling you: you are missing a return type in your function.
The ? operator will return (propagate) the error if any, but for that to work you need to have a return type that can be constructed with the error type.
For prototyping, you can just call unwrap. But this approach should be carefully considered when writing production code, as it will just crash the program when the function returns an error.
find more here

How to call a function from another file and fetch in web

I'm new to Rust and still I am learning things. There is a rust application with main.rs and routes.rs. main.rs file has server configuration and routes.rs has methods with paths.
main.rs
#[macro_use]
extern crate log;
use actix_web::{App, HttpServer};
use dotenv::dotenv;
use listenfd::ListenFd;
use std::env;
mod search;
#[actix_rt::main]
async fn main() -> std::io::Result<()> {
dotenv().ok();
env_logger::init();
let mut listenfd = ListenFd::from_env();
let mut server = HttpServer::new(||
App::new()
.configure(search::init_routes)
);
server = match listenfd.take_tcp_listener(0)? {
Some(listener) => server.listen(listener)?,
None => {
let host = env::var("HOST").expect("Host not set");
let port = env::var("PORT").expect("Port not set");
server.bind(format!("{}:{}", host, port))?
}
};
info!("Starting server");
server.run().await
}
routes.rs
use crate::search::User;
use actix_web::{get, post, put, delete, web, HttpResponse, Responder};
use serde_json::json;
extern crate reqwest;
extern crate serde;
use reqwest::Error;
use serde::{Deserialize};
use rocket_contrib::json::Json;
use serde_json::Value;
// mod bargainfindermax;
#[get("/users")]
async fn find_all() -> impl Responder {
HttpResponse::Ok().json(
vec![
User { id: 1, email: "tore#cloudmaker.dev".to_string() },
User { id: 2, email: "tore#cloudmaker.dev".to_string() },
]
)
}
pub fn init_routes(cfg: &mut web::ServiceConfig) {
cfg.service(find_all);
}
Now what I want is I want to fetch an API using a method in another separate rs file (fetch_test.rs) and route it in the routes.rs file. Then I want to get the response from a web browser by running that route path(link).
How can I do these things ?? I searched everywhere, but I found nothing helpful. And sometimes I didn't understand some documentations also.
**Update.
fetch_test.rs
extern crate reqwest;
use hyper::header::{Headers, Authorization, Basic, ContentType};
pub fn authenticate() -> String {
fn construct_headers() -> Headers {
let mut headers = Headers::new();
headers.set(
Authorization(
Basic {
username: "HI:ABGTYH".to_owned(),
password: Some("%8YHT".to_owned())
}
)
);
headers.set(ContentType::form_url_encoded());
headers
}
let client = reqwest::Client::new();
let resz = client.post("https://api.test.com/auth/token")
.headers(construct_headers())
.body("grant_type=client_credentials")
.json(&map)
.send()
.await?;
}
Errors.
Compiling sabre-actix-kist v0.1.0 (E:\wamp64\www\BukFlightsNewLevel\flights\APIs\sabre-actix-kist)
error[E0425]: cannot find value `map` in this scope
--> src\search\routes\common.rs:28:12
|
28 | .json(&map)
| ^^^ not found in this scope
error[E0728]: `await` is only allowed inside `async` functions and blocks
--> src\search\routes\common.rs:25:12
|
4 | pub fn authenticate() -> String {
| ------------ this is not `async`
...
25 | let resz = client.post("https://api-crt.cert.havail.sabre.com/v2/auth/token")
| ____________^
26 | | .headers(construct_headers())
27 | | .body("grant_type=client_credentials")
28 | | .json(&map)
29 | | .send()
30 | | .await?;
| |__________^ only allowed inside `async` functions and blocks
error[E0277]: the trait bound `std::result::Result<search::routes::reqwest::Response, search::routes::reqwest::Error>: std::future::Future` is not satisfied
--> src\search\routes\common.rs:25:12
|
25 | let resz = client.post("https://api-crt.cert.havail.sabre.com/v2/auth/token")
| ____________^
26 | | .headers(construct_headers())
27 | | .body("grant_type=client_credentials")
28 | | .json(&map)
29 | | .send()
30 | | .await?;
| |__________^ the trait `std::future::Future` is not implemented for `std::result::Result<search::routes::reqwest::Response, search::routes::reqwest::Error>`
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
--> src\search\routes\common.rs:25:12
|
4 | / pub fn authenticate() -> String {
5 | |
6 | | let res = reqwest::get("http://api.github.com/users")
7 | | .expect("Couldnt")
... |
25 | | let resz = client.post("https://api-crt.cert.havail.sabre.com/v2/auth/token")
| |____________^
26 | || .headers(construct_headers())
27 | || .body("grant_type=client_credentials")
28 | || .json(&map)
29 | || .send()
30 | || .await?;
| ||___________^ cannot use the `?` operator in a function that returns `std::string::String`
31 | |
32 | | }
| |_- this function should return `Result` or `Option` to accept `?`
|
= help: the trait `std::ops::Try` is not implemented for `std::string::String`
= note: required by `std::ops::Try::from_error`
error[E0308]: mismatched types
--> src\search\routes\common.rs:4:26
|
4 | pub fn authenticate() -> String {
| ------------ ^^^^^^ expected struct `std::string::String`, found `()`
| |
| implicitly returns `()` as its body has no tail or `return` expression
**Update Again.
extern crate reqwest;
use hyper::header::{Headers, Authorization, Basic, ContentType};
fn construct_headers() -> Headers {
let mut headers = Headers::new();
headers.set(
Authorization(
Basic {
username: "HI:ABGTYH".to_owned(),
password: Some("%8YHT".to_owned())
}
)
);
headers.set(ContentType::form_url_encoded());
headers
}
pub async fn authenticate() -> Result<String, reqwest::Error> {
let client = reqwest::Client::new();
let resz = client.post("https://api.test.com/auth/token")
.headers(construct_headers())
.body("grant_type=client_credentials")
.json(&map)
.send()
.await?;
}
**New Error.
error[E0425]: cannot find value `map` in this scope
--> src\search\routes\common.rs:24:12
|
24 | .json(&map)
| ^^^ not found in this scope
error[E0277]: the trait bound `impl std::future::Future: search::routes::serde::Serialize` is not satisfied
--> src\search\routes.rs:24:29
|
24 | HttpResponse::Ok().json(set_token)
| ^^^^^^^^^ the trait `search::routes::serde::Serialize` is not implemented for `impl std::future::Future`
error[E0308]: mismatched types
--> src\search\routes\common.rs:22:14
|
22 | .headers(construct_headers())
| ^^^^^^^^^^^^^^^^^^^ expected struct `search::routes::reqwest::header::HeaderMap`, found struct `hyper::header::Headers`
|
= note: expected struct `search::routes::reqwest::header::HeaderMap`
found struct `hyper::header::Headers`
error[E0599]: no method named `json` found for struct `search::routes::reqwest::RequestBuilder` in the current scope
--> src\search\routes\common.rs:24:6
|
24 | .json(&map)
| ^^^^ method not found in `search::routes::reqwest::RequestBuilder`
error[E0308]: mismatched types
--> src\search\routes\common.rs:18:63
|
18 | pub async fn authenticate() -> Result<String, reqwest::Error> {
| _______________________________________________________________^
19 | |
20 | | let client = reqwest::Client::new();
21 | | let resz = client.post("https://api.test.com/auth/token")
... |
27 | |
28 | | }
| |_^ expected enum `std::result::Result`, found `()`
|
= note: expected enum `std::result::Result<std::string::String, search::routes::reqwest::Error>`
found unit type `()`
Can I clarify your question? As I understand you already know how to use functions from another file. Do you need to know how to make API requests and pass a result form a request as Response?
Firstly, you need to create fetch_test.rs with using for example reqwest lib:
let client = reqwest::Client::new();
let res = client.post("http://httpbin.org/post")
.json(&map)
.send()
.await?;
Map result or pass it as it is.
Return result in routes.rs: HttpResponse::Ok().json(res)
I hope it will help you.

Which data type maps to MySQL's bit(1)? [duplicate]

I used Java before, so some columns' type in database table is bit(1). But now I want to use beego to rebuild my project and I don't want to alter my database table (need do much). I use beego's orm in my project. So which Go type should I use?
Table like this and the deleted column has the question:
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| id | varchar(255) | NO | PRI | NULL | |
| created_time | datetime | YES | | NULL | |
| deleted | bit(1) | NO | | NULL | |
| updated_time | datetime | YES | | NULL | |
| icon_class | varchar(255) | YES | | NULL | |
| mark | varchar(255) | YES | | NULL | |
| name | varchar(255) | YES | | NULL | |
| parent | varchar(255) | YES | | NULL | |
+--------------+--------------+------+-----+---------+-------+
go struct like this:
type BaseModel struct {
Id string `orm:"pk";form:"id"`
CreatedTime time.Time `orm:"auto_now_add;type(datetime)";form:"-"`
UpdatedTime time.Time `orm:"auto_now;type(datetime)";form:"-"`
Deleted bool `form:"-"`
}
When I use bool in my code, the error like this:
`[0]` convert to `*orm.BooleanField` failed, field: shareall-go/models.Category.BaseModel.Deleted err: strconv.ParseBool: parsing "\x00": invalid syntax
Sqlx has created also a custom bool datatype for such situations and it works fine. Link to related code
// BitBool is an implementation of a bool for the MySQL type BIT(1).
// This type allows you to avoid wasting an entire byte for MySQL's boolean type TINYINT.
type BitBool bool
// Value implements the driver.Valuer interface,
// and turns the BitBool into a bitfield (BIT(1)) for MySQL storage.
func (b BitBool) Value() (driver.Value, error) {
if b {
return []byte{1}, nil
} else {
return []byte{0}, nil
}
}
// Scan implements the sql.Scanner interface,
// and turns the bitfield incoming from MySQL into a BitBool
func (b *BitBool) Scan(src interface{}) error {
v, ok := src.([]byte)
if !ok {
return errors.New("bad []byte type assertion")
}
*b = v[0] == 1
return nil
}
So which Go type should I use?
Generally, this depends on how you're using the data more than how it's stored. As you eluded to, you tried using it as a Bool (which makes sense) but got an error.
The problem is that MySQL expresses a BIT differently than a BOOL, and the Go MySQL driver expects a MySQL BOOL. You can fix this by using a custom type that implements the sql.Scanner interface. Since you presumably have only two (or maybe three, if you count NULL) inputs, it should be fairly easy. Note this code is incomplete and untested. It is meant to serve as a guide, not a copy-and-paste solution.
type MyBool bool
func (b *MyBool) Scan(src interface{}) error {
str, ok := src.(string)
if !ok {
return fmt.Errorf("Unexpected type for MyBool: %T", src)
}
switch str {
case "\x00":
v := false
*b = v
case "\x01":
v := true
*b = v
}
return nil
}
I have no idea why Tarmo's solution doesn't work for me. But after a little bit modification, it works.
type BitBool bool
func (bb BitBool) Value() (driver.Value, error) {
return bool(bb), nil
}
func (bb *BitBool) Scan(src interface{}) error {
if src == nil {
// MySql NULL value turns into false
*bb = false
return nil
}
bs, ok := src.([]byte)
if !ok {
return fmt.Errorf("Not byte slice!")
}
*bb = bs[0] == 1
return nil
}
In this way, I can do the following
var isVip BitBool
row := db.QueryRow("SELECT is_vip FROM user WHERE user_id = '12345'")
err := row.Scan(&isVip)
var isVip BitBool = true
rows, err := db.Query("SELECT username FROM user WHERE is_vip = ?", isVip)

Extract Nested Array/Struct Spark DataFrame [duplicate]

How Can I query an RDD with complex types such as maps/arrays?
for example, when I was writing this test code:
case class Test(name: String, map: Map[String, String])
val map = Map("hello" -> "world", "hey" -> "there")
val map2 = Map("hello" -> "people", "hey" -> "you")
val rdd = sc.parallelize(Array(Test("first", map), Test("second", map2)))
I thought the syntax would be something like:
sqlContext.sql("SELECT * FROM rdd WHERE map.hello = world")
or
sqlContext.sql("SELECT * FROM rdd WHERE map[hello] = world")
but I get
Can't access nested field in type MapType(StringType,StringType,true)
and
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes
respectively.
It depends on a type of the column. Lets start with some dummy data:
import org.apache.spark.sql.functions.{udf, lit}
import scala.util.Try
case class SubRecord(x: Int)
case class ArrayElement(foo: String, bar: Int, vals: Array[Double])
case class Record(
an_array: Array[Int], a_map: Map[String, String],
a_struct: SubRecord, an_array_of_structs: Array[ArrayElement])
val df = sc.parallelize(Seq(
Record(Array(1, 2, 3), Map("foo" -> "bar"), SubRecord(1),
Array(
ArrayElement("foo", 1, Array(1.0, 2.0, 2.0)),
ArrayElement("bar", 2, Array(3.0, 4.0, 5.0)))),
Record(Array(4, 5, 6), Map("foz" -> "baz"), SubRecord(2),
Array(ArrayElement("foz", 3, Array(5.0, 6.0)),
ArrayElement("baz", 4, Array(7.0, 8.0))))
)).toDF
df.registerTempTable("df")
df.printSchema
// root
// |-- an_array: array (nullable = true)
// | |-- element: integer (containsNull = false)
// |-- a_map: map (nullable = true)
// | |-- key: string
// | |-- value: string (valueContainsNull = true)
// |-- a_struct: struct (nullable = true)
// | |-- x: integer (nullable = false)
// |-- an_array_of_structs: array (nullable = true)
// | |-- element: struct (containsNull = true)
// | | |-- foo: string (nullable = true)
// | | |-- bar: integer (nullable = false)
// | | |-- vals: array (nullable = true)
// | | | |-- element: double (containsNull = false)
array (ArrayType) columns:
Column.getItem method
df.select($"an_array".getItem(1)).show
// +-----------+
// |an_array[1]|
// +-----------+
// | 2|
// | 5|
// +-----------+
Hive brackets syntax:
sqlContext.sql("SELECT an_array[1] FROM df").show
// +---+
// |_c0|
// +---+
// | 2|
// | 5|
// +---+
an UDF
val get_ith = udf((xs: Seq[Int], i: Int) => Try(xs(i)).toOption)
df.select(get_ith($"an_array", lit(1))).show
// +---------------+
// |UDF(an_array,1)|
// +---------------+
// | 2|
// | 5|
// +---------------+
Additionally to the methods listed above Spark supports a growing list of built-in functions operating on complex types. Notable examples include higher order functions like transform (SQL 2.4+, Scala 3.0+, PySpark / SparkR 3.1+):
df.selectExpr("transform(an_array, x -> x + 1) an_array_inc").show
// +------------+
// |an_array_inc|
// +------------+
// | [2, 3, 4]|
// | [5, 6, 7]|
// +------------+
import org.apache.spark.sql.functions.transform
df.select(transform($"an_array", x => x + 1) as "an_array_inc").show
// +------------+
// |an_array_inc|
// +------------+
// | [2, 3, 4]|
// | [5, 6, 7]|
// +------------+
filter (SQL 2.4+, Scala 3.0+, Python / SparkR 3.1+)
df.selectExpr("filter(an_array, x -> x % 2 == 0) an_array_even").show
// +-------------+
// |an_array_even|
// +-------------+
// | [2]|
// | [4, 6]|
// +-------------+
import org.apache.spark.sql.functions.filter
df.select(filter($"an_array", x => x % 2 === 0) as "an_array_even").show
// +-------------+
// |an_array_even|
// +-------------+
// | [2]|
// | [4, 6]|
// +-------------+
aggregate (SQL 2.4+, Scala 3.0+, PySpark / SparkR 3.1+):
df.selectExpr("aggregate(an_array, 0, (acc, x) -> acc + x, acc -> acc) an_array_sum").show
// +------------+
// |an_array_sum|
// +------------+
// | 6|
// | 15|
// +------------+
import org.apache.spark.sql.functions.aggregate
df.select(aggregate($"an_array", lit(0), (x, y) => x + y) as "an_array_sum").show
// +------------+
// |an_array_sum|
// +------------+
// | 6|
// | 15|
// +------------+
array processing functions (array_*) like array_distinct (2.4+):
import org.apache.spark.sql.functions.array_distinct
df.select(array_distinct($"an_array_of_structs.vals"(0))).show
// +-------------------------------------------+
// |array_distinct(an_array_of_structs.vals[0])|
// +-------------------------------------------+
// | [1.0, 2.0]|
// | [5.0, 6.0]|
// +-------------------------------------------+
array_max (array_min, 2.4+):
import org.apache.spark.sql.functions.array_max
df.select(array_max($"an_array")).show
// +-------------------+
// |array_max(an_array)|
// +-------------------+
// | 3|
// | 6|
// +-------------------+
flatten (2.4+)
import org.apache.spark.sql.functions.flatten
df.select(flatten($"an_array_of_structs.vals")).show
// +---------------------------------+
// |flatten(an_array_of_structs.vals)|
// +---------------------------------+
// | [1.0, 2.0, 2.0, 3...|
// | [5.0, 6.0, 7.0, 8.0]|
// +---------------------------------+
arrays_zip (2.4+):
import org.apache.spark.sql.functions.arrays_zip
df.select(arrays_zip($"an_array_of_structs.vals"(0), $"an_array_of_structs.vals"(1))).show(false)
// +--------------------------------------------------------------------+
// |arrays_zip(an_array_of_structs.vals[0], an_array_of_structs.vals[1])|
// +--------------------------------------------------------------------+
// |[[1.0, 3.0], [2.0, 4.0], [2.0, 5.0]] |
// |[[5.0, 7.0], [6.0, 8.0]] |
// +--------------------------------------------------------------------+
array_union (2.4+):
import org.apache.spark.sql.functions.array_union
df.select(array_union($"an_array_of_structs.vals"(0), $"an_array_of_structs.vals"(1))).show
// +---------------------------------------------------------------------+
// |array_union(an_array_of_structs.vals[0], an_array_of_structs.vals[1])|
// +---------------------------------------------------------------------+
// | [1.0, 2.0, 3.0, 4...|
// | [5.0, 6.0, 7.0, 8.0]|
// +---------------------------------------------------------------------+
slice (2.4+):
import org.apache.spark.sql.functions.slice
df.select(slice($"an_array", 2, 2)).show
// +---------------------+
// |slice(an_array, 2, 2)|
// +---------------------+
// | [2, 3]|
// | [5, 6]|
// +---------------------+
map (MapType) columns
using Column.getField method:
df.select($"a_map".getField("foo")).show
// +----------+
// |a_map[foo]|
// +----------+
// | bar|
// | null|
// +----------+
using Hive brackets syntax:
sqlContext.sql("SELECT a_map['foz'] FROM df").show
// +----+
// | _c0|
// +----+
// |null|
// | baz|
// +----+
using a full path with dot syntax:
df.select($"a_map.foo").show
// +----+
// | foo|
// +----+
// | bar|
// |null|
// +----+
using an UDF
val get_field = udf((kvs: Map[String, String], k: String) => kvs.get(k))
df.select(get_field($"a_map", lit("foo"))).show
// +--------------+
// |UDF(a_map,foo)|
// +--------------+
// | bar|
// | null|
// +--------------+
Growing number of map_* functions like map_keys (2.3+)
import org.apache.spark.sql.functions.map_keys
df.select(map_keys($"a_map")).show
// +---------------+
// |map_keys(a_map)|
// +---------------+
// | [foo]|
// | [foz]|
// +---------------+
or map_values (2.3+)
import org.apache.spark.sql.functions.map_values
df.select(map_values($"a_map")).show
// +-----------------+
// |map_values(a_map)|
// +-----------------+
// | [bar]|
// | [baz]|
// +-----------------+
Please check SPARK-23899 for a detailed list.
struct (StructType) columns using full path with dot syntax:
with DataFrame API
df.select($"a_struct.x").show
// +---+
// | x|
// +---+
// | 1|
// | 2|
// +---+
with raw SQL
sqlContext.sql("SELECT a_struct.x FROM df").show
// +---+
// | x|
// +---+
// | 1|
// | 2|
// +---+
fields inside array of structs can be accessed using dot-syntax, names and standard Column methods:
df.select($"an_array_of_structs.foo").show
// +----------+
// | foo|
// +----------+
// |[foo, bar]|
// |[foz, baz]|
// +----------+
sqlContext.sql("SELECT an_array_of_structs[0].foo FROM df").show
// +---+
// |_c0|
// +---+
// |foo|
// |foz|
// +---+
df.select($"an_array_of_structs.vals".getItem(1).getItem(1)).show
// +------------------------------+
// |an_array_of_structs.vals[1][1]|
// +------------------------------+
// | 4.0|
// | 8.0|
// +------------------------------+
user defined types (UDTs) fields can be accessed using UDFs. See Spark SQL referencing attributes of UDT for details.
Notes:
depending on a Spark version some of these methods can be available only with HiveContext. UDFs should work independent of version with both standard SQLContext and HiveContext.
generally speaking nested values are a second class citizens. Not all typical operations are supported on nested fields. Depending on a context it could be better to flatten the schema and / or explode collections
df.select(explode($"an_array_of_structs")).show
// +--------------------+
// | col|
// +--------------------+
// |[foo,1,WrappedArr...|
// |[bar,2,WrappedArr...|
// |[foz,3,WrappedArr...|
// |[baz,4,WrappedArr...|
// +--------------------+
Dot syntax can be combined with wildcard character (*) to select (possibly multiple) fields without specifying names explicitly:
df.select($"a_struct.*").show
// +---+
// | x|
// +---+
// | 1|
// | 2|
// +---+
JSON columns can be queried using get_json_object and from_json functions. See How to query JSON data column using Spark DataFrames? for details.
Once You convert it to DF, u can simply fetch data as
val rddRow= rdd.map(kv=>{
val k = kv._1
val v = kv._2
Row(k, v)
})
val myFld1 = StructField("name", org.apache.spark.sql.types.StringType, true)
val myFld2 = StructField("map", org.apache.spark.sql.types.MapType(StringType, StringType), true)
val arr = Array( myFld1, myFld2)
val schema = StructType( arr )
val rowrddDF = sqc.createDataFrame(rddRow, schema)
rowrddDF.registerTempTable("rowtbl")
val rowrddDFFinal = rowrddDF.select(rowrddDF("map.one"))
or
val rowrddDFFinal = rowrddDF.select("map.one")
here was what I did and it worked
case class Test(name: String, m: Map[String, String])
val map = Map("hello" -> "world", "hey" -> "there")
val map2 = Map("hello" -> "people", "hey" -> "you")
val rdd = sc.parallelize(Array(Test("first", map), Test("second", map2)))
val rdddf = rdd.toDF
rdddf.registerTempTable("mytable")
sqlContext.sql("select m.hello from mytable").show
Results
+------+
| hello|
+------+
| world|
|people|
+------+