When using dbWriteTable() in the RMySQL package, logical values are written as 0 regardless of their value. I would expect TRUE values to be written as 1:
# Setup
# con is a valid MySQLConnection object
> df <- data.frame(string = 'Testing Logical Values',
+                  t_lgl = TRUE,
+                  f_lgl = FALSE,
+                  stringsAsFactors = FALSE)
> df
                  string t_lgl f_lgl
1 Testing Logical Values  TRUE FALSE
> class(df[,2])
[1] "logical"
# Test
# This schema has no tables until dbWriteTable() is called
> dbWriteTable(con,'test_table',df)
[1] TRUE
# Result
> dbReadTable(con,'test_table')
                  string t_lgl f_lgl
1 Testing Logical Values     0     0
> class(dbReadTable(con,'test_table')[,2])
[1] "integer"
Shouldn't the t_lgl value return 1 since it was TRUE and passed to dbWriteTable() as a logical?
The RMySQL package is being phased out in favour of RMariaDB.
Using RMariaDB I am able to successfully write logical values to a MySQL database like this:
con <- dbConnect(RMariaDB::MariaDB(), group = "my-db")
dbWriteTable(con, "test", data.frame(a=T, b=F), overwrite = T)
I'm using renderDataTable in my Shiny app to display the contents of a data.table, vals$content4table, which is stored in a reactiveValues object.
It can happen that vals$content4table is a datatable with no columns.
In that case I get an error when using formatCurrency, because it looks for a column that does not exist.
Is there any way to check whether the datatable has columns (with ifelse, say) to avoid the error?
Here is a piece of my server code.
# initialising vals$content4table when launching the app
vals <- reactiveValues(content4table = if (someBoolean) {
  data.table(NULL)
} else {
  data.table("Column1" = "whatever", "Currency" = 1000)
})
output$TableInUI <- DT::renderDataTable(
  datatable(vals$content4table) %>%
    ifelse(nrow(vals$content4table) > 0,
           formatCurrency(2, currency = "", interval = 3, mark = ",", digits = 0),
           fnothing())
)
# where fnothing is defined as
fnothing <- function(df) return(df)
The above code doesn't work and gives this error:
Warning: Error in ifelse: unused argument (fnothing())
You could use req:
output$TableInUI <- DT::renderDataTable({
  req(isTRUE(ncol(vals$content4table) > 0))
  vals$content4table
})
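If you still want the currency formatting whenever the column exists, note that ifelse() is a vectorized function, not a control-flow construct, which is why it rejects fnothing() as an unused argument. An ordinary if works; a minimal sketch, assuming column 2 is the currency column:
output$TableInUI <- DT::renderDataTable({
  dt <- datatable(vals$content4table)
  # only apply the formatter when the currency column actually exists
  if (ncol(vals$content4table) >= 2) {
    dt <- formatCurrency(dt, 2, currency = "", interval = 3, mark = ",", digits = 0)
  }
  dt
})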
When executing the following code to insert multiple values into MySQL from Python through a variable, I get an 'Incorrect number of arguments executing prepared statement' error after executing result = cursor.executemany(sql_insert_query, records_to_insert).
If I remove prepared=True, the error becomes:
'Not all parameters were used in the SQL statement'
import mysql.connector
connection = mysql.connector.connect(host='localhost',
                                     database='majorprediction',
                                     user='root',
                                     password='')
records_to_insert = [('x'),
                     ('y'),
                     ('z')]
sql_insert_query = """ INSERT INTO majorpred (Major)
VALUES (%s) """
cursor = connection.cursor(prepared=True)
result = cursor.executemany(sql_insert_query, records_to_insert)
connection.commit()
Can anyone point out where the problem is?
You are passing a list of strings instead of a list of tuples: ('x') is just the string 'x', the parentheses do nothing. For instance, if you run:
for record in records_to_insert:
    print(record, isinstance(record, str), isinstance(record, tuple))
You will get:
x True False
y True False
z True False
To create a tuple with a single element in Python, you need a trailing comma:
records_to_insert = [
    ('x',),
    ('y',),
    ('z',)
]
If you have a list of values and want to wrap each of them in a one-element tuple, you can do as follows (note that tuple(e) would split a multi-character string into individual characters, so (e,) is the safer form):
list_of_elements = list("list_of_characters")
tuples = [
    (e,) for e in list_of_elements
]
Hope this helps!
I would like to scan an HBase table and see integers as strings (not their binary representation). I can do the conversion, but I have no idea how to write a scan statement using the Java API from the HBase shell:
org.apache.hadoop.hbase.util.Bytes.toString(
  "\x48\x65\x6c\x6c\x6f\x20\x48\x42\x61\x73\x65".to_java_bytes)
org.apache.hadoop.hbase.util.Bytes.toString("Hello HBase".to_java_bytes)
I would be very happy to see examples of scan and get that search binary data (longs) and output normal strings. I am using the HBase shell, not Java.
HBase stores data as untyped byte arrays. Therefore, if you perform a table scan, the data will be displayed in a common format (an escaped hexadecimal string), e.g.: "\x48\x65\x6c\x6c\x6f\x20\x48\x42\x61\x73\x65" -> Hello HBase
If you want to get back the typed value from the serialized byte array you have to do this manually.
You have the following options:
Java code (Bytes.toString(...))
hack the to_string function in $HBASE_HOME/lib/ruby/hbase/table.rb:
replace toStringBinary with toInt for non-meta tables
write a get/scan JRuby function which converts the byte array to the appropriate type
Since you want it in the HBase shell, consider the last option:
Create a file get_result.rb:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.ResultScanner
import org.apache.hadoop.hbase.client.Result
import java.util.ArrayList
# Simple function equivalent to scan 'test', {COLUMNS => 'c:c2'}
def get_result()
  htable = HTable.new(HBaseConfiguration.new, "test")
  rs = htable.getScanner(Bytes.toBytes("c"), Bytes.toBytes("c2"))
  output = ArrayList.new
  output.add "ROW\t\t\t\t\t\tCOLUMN+CELL"
  rs.each { |r|
    r.raw.each { |kv|
      row = Bytes.toString(kv.getRow)
      fam = Bytes.toString(kv.getFamily)
      ql = Bytes.toString(kv.getQualifier)
      ts = kv.getTimestamp
      val = Bytes.toInt(kv.getValue)
      output.add " #{row} \t\t\t\t\t\t column=#{fam}:#{ql}, timestamp=#{ts}, value=#{val}"
    }
  }
  output.each { |line| puts "#{line}\n" }
end
load it in the HBase shell and use it:
require '/path/to/get_result'
get_result
Note: modify/enhance/fix the code according to your needs
Just for completeness' sake, it turns out that the call Bytes::toStringBinary gives the hex-escaped sequence you get in HBase shell:
\x0B\x2_SOME_ASCII_TEXT_\x10\x00...
Whereas Bytes::toString will try to deserialize the byte array to a string assuming UTF-8, which will look more like:
\u8900\u0710\u0115\u0320\u0000_SOME_UTF8_TEXT_\u4009...
You can add a scan_counter command to the HBase shell.
First:
add the following to /usr/lib/hbase/lib/ruby/hbase/table.rb (after the scan function):
#----------------------------------------------------------------------------------------------
# Scans whole table or a range of keys and returns rows matching specific criteria, with values as numbers
def scan_counter(args = {})
  unless args.kind_of?(Hash)
    raise ArgumentError, "Arguments should be a hash. Failed to parse #{args.inspect}, #{args.class}"
  end
  limit = args.delete("LIMIT") || -1
  maxlength = args.delete("MAXLENGTH") || -1
  if args.any?
    filter = args["FILTER"]
    startrow = args["STARTROW"] || ''
    stoprow = args["STOPROW"]
    timestamp = args["TIMESTAMP"]
    columns = args["COLUMNS"] || args["COLUMN"] || get_all_columns
    # honor CACHE_BLOCKS => false (a plain "|| true" would discard it)
    cache = args.key?("CACHE_BLOCKS") ? args["CACHE_BLOCKS"] : true
    versions = args["VERSIONS"] || 1
    timerange = args["TIMERANGE"]
    # Normalize column names
    columns = [columns] if columns.class == String
    unless columns.kind_of?(Array)
      raise ArgumentError.new("COLUMNS must be specified as a String or an Array")
    end
    scan = if stoprow
      org.apache.hadoop.hbase.client.Scan.new(startrow.to_java_bytes, stoprow.to_java_bytes)
    else
      org.apache.hadoop.hbase.client.Scan.new(startrow.to_java_bytes)
    end
    columns.each { |c| scan.addColumns(c) }
    scan.setFilter(filter) if filter
    scan.setTimeStamp(timestamp) if timestamp
    scan.setCacheBlocks(cache)
    scan.setMaxVersions(versions) if versions > 1
    scan.setTimeRange(timerange[0], timerange[1]) if timerange
  else
    scan = org.apache.hadoop.hbase.client.Scan.new
  end
  # Start the scanner
  scanner = @table.getScanner(scan)
  count = 0
  res = {}
  iter = scanner.iterator
  # Iterate results
  while iter.hasNext
    if limit > 0 && count >= limit
      break
    end
    row = iter.next
    key = org.apache.hadoop.hbase.util.Bytes::toStringBinary(row.getRow)
    row.list.each do |kv|
      family = String.from_java_bytes(kv.getFamily)
      qualifier = org.apache.hadoop.hbase.util.Bytes::toStringBinary(kv.getQualifier)
      column = "#{family}:#{qualifier}"
      cell = to_string_scan_counter(column, kv, maxlength)
      if block_given?
        yield(key, "column=#{column}, #{cell}")
      else
        res[key] ||= {}
        res[key][column] = cell
      end
    end
    # One more row processed
    count += 1
  end
  return ((block_given?) ? count : res)
end
#----------------------------------------------------------------------------------------
# Helper methods

# Returns a list of column names in the table
def get_all_columns
  @table.table_descriptor.getFamilies.map do |family|
    "#{family.getNameAsString}:"
  end
end

# Checks if the current table is one of the 'meta' tables
def is_meta_table?
  tn = @table.table_name
  org.apache.hadoop.hbase.util.Bytes.equals(tn, org.apache.hadoop.hbase.HConstants::META_TABLE_NAME) ||
    org.apache.hadoop.hbase.util.Bytes.equals(tn, org.apache.hadoop.hbase.HConstants::ROOT_TABLE_NAME)
end

# Returns family and (when it has one) qualifier for a column name
def parse_column_name(column)
  split = org.apache.hadoop.hbase.KeyValue.parseColumn(column.to_java_bytes)
  return split[0], (split.length > 1) ? split[1] : nil
end

# Make a String of the passed kv
# Intercept cells whose format we know, such as info:regioninfo in .META.
def to_string(column, kv, maxlength = -1)
  if is_meta_table?
    if column == 'info:regioninfo' or column == 'info:splitA' or column == 'info:splitB'
      hri = org.apache.hadoop.hbase.util.Writables.getHRegionInfoOrNull(kv.getValue)
      return "timestamp=%d, value=%s" % [kv.getTimestamp, hri.toString]
    end
    if column == 'info:serverstartcode'
      if kv.getValue.length > 0
        str_val = org.apache.hadoop.hbase.util.Bytes.toLong(kv.getValue)
      else
        str_val = org.apache.hadoop.hbase.util.Bytes.toStringBinary(kv.getValue)
      end
      return "timestamp=%d, value=%s" % [kv.getTimestamp, str_val]
    end
  end
  val = "timestamp=#{kv.getTimestamp}, value=#{org.apache.hadoop.hbase.util.Bytes::toStringBinary(kv.getValue)}"
  (maxlength != -1) ? val[0, maxlength] : val
end

# Same as to_string, but renders the cell value as a long (for counters)
def to_string_scan_counter(column, kv, maxlength = -1)
  if is_meta_table?
    if column == 'info:regioninfo' or column == 'info:splitA' or column == 'info:splitB'
      hri = org.apache.hadoop.hbase.util.Writables.getHRegionInfoOrNull(kv.getValue)
      return "timestamp=%d, value=%s" % [kv.getTimestamp, hri.toString]
    end
    if column == 'info:serverstartcode'
      if kv.getValue.length > 0
        str_val = org.apache.hadoop.hbase.util.Bytes.toLong(kv.getValue)
      else
        str_val = org.apache.hadoop.hbase.util.Bytes.toStringBinary(kv.getValue)
      end
      return "timestamp=%d, value=%s" % [kv.getTimestamp, str_val]
    end
  end
  val = "timestamp=#{kv.getTimestamp}, value=#{org.apache.hadoop.hbase.util.Bytes::toLong(kv.getValue)}"
  (maxlength != -1) ? val[0, maxlength] : val
end
Second:
add the following file, called scan_counter.rb, to /usr/lib/hbase/lib/ruby/shell/commands/:
#
# Copyright 2010 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
module Shell
  module Commands
    class ScanCounter < Command
      def help
        return <<-EOF
Scan a table whose cell values are stored as longs; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS. If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.
Some examples:
hbase> scan_counter '.META.'
hbase> scan_counter '.META.', {COLUMNS => 'info:regioninfo'}
hbase> scan_counter 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan_counter 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
hbase> scan_counter 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled. Examples:
hbase> scan_counter 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
        EOF
      end

      def command(table, args = {})
        now = Time.now
        formatter.header(["ROW", "COLUMN+CELL"])
        count = table(table).scan_counter(args) do |row, cells|
          formatter.row([row, cells])
        end
        formatter.footer(now, count)
      end
    end
  end
end
Finally:
register scan_counter in /usr/lib/hbase/lib/ruby/shell.rb.
Replace the existing command-group definition (you can identify it by 'DATA MANIPULATION COMMANDS') with this:
Shell.load_command_group(
  'dml',
  :full_name => 'DATA MANIPULATION COMMANDS',
  :commands => %w[
    count
    delete
    deleteall
    get
    get_counter
    incr
    put
    scan
    scan_counter
    truncate
  ]
)
I am currently writing a script that downloads a bunch of .csv's from an FTP server and then loads each .csv into a MySQL database as its own table.
I download the .csv's from the FTP server using RCurl and place them all in my working directory. To create a table from each .csv, I use the sqlSave function from the RODBC package, with the table name set to the .csv's name. This works fine whenever a .csv name is under 18 characters, but when it is longer it fails. And by "fails", I mean R crashes. To track down the bug, I called debug on sqlSave.
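(For reference, debug() just flags a function so that the next call drops into the browser; unexported helpers work the same way via :::.)
debug(sqlSave)                  # browser opens on the next sqlSave() call
debug(RODBC:::odbcTableExists)  # also works for unexported functions
# undebug(sqlSave)              # turn it off again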
I found that there are at least two functions that sqlSave calls that cause R to crash. The first is RODBC:::odbcTableExists, which is a non-visible function. Here is the code for the function:
RODBC:::odbcTableExists
function (channel, tablename, abort = TRUE, forQuery = TRUE,
          allowDot = attr(channel, "interpretDot"))
{
    if (!odbcValidChannel(channel))
        stop("first argument is not an open RODBC channel")
    if (length(tablename) != 1)
        stop(sQuote(tablename), " should be a name")
    tablename <- as.character(tablename)
    switch(attr(channel, "case"),
           nochange = {},
           toupper = tablename <- toupper(tablename),
           tolower = tablename <- tolower(tablename))
    isExcel <- odbcGetInfo(channel)[1L] == "EXCEL"
    hasDot <- grepl(".", tablename, fixed = TRUE)
    if (allowDot && hasDot) {
        parts <- strsplit(tablename, ".", fixed = TRUE)[[1]]
        if (length(parts) > 2)
            ans <- FALSE
        else {
            res <- if (attr(channel, "isMySQL"))
                sqlTables(channel, catalog = parts[1], tableName = parts[2])
            else sqlTables(channel, schema = parts[1], tableName = parts[2])
            ans <- is.data.frame(res) && nrow(res) > 0
        }
    }
    else if (!isExcel) {
        res <- sqlTables(channel, tableName = tablename)
        ans <- is.data.frame(res) && nrow(res) > 0
    }
    else {
        res <- sqlTables(channel)
        tables <- stables <- if (is.data.frame(res))
            res[, 3]
        else ""
        if (isExcel) {
            tables <- sub("^'(.*)'$", "\\1", tables)
            tables <- unique(c(tables, sub("\\$$", "", tables)))
        }
        ans <- tablename %in% tables
    }
    if (abort && !ans)
        stop(sQuote(tablename), ": table not found on channel")
    enc <- attr(channel, "encoding")
    if (nchar(enc))
        tablename <- iconv(tablename, to = enc)
    if (ans && isExcel) {
        dbname <- if (tablename %in% stables)
            tablename
        else paste(tablename, "$", sep = "")
        if (forQuery)
            paste("[", dbname, "]", sep = "")
        else dbname
    }
    else if (ans) {
        if (forQuery && !hasDot)
            quoteTabNames(channel, tablename)
        else tablename
    }
    else character(0L)
}
This fails here when the table name is over 18 characters in length:
res <- sqlTables(channel, tableName = tablename)
I have fixed it by changing this to:
res <- sqlTables(channel, tablename)
I then reassign the function with the same name (odbcTableExists) in the RODBC namespace, with this code change, using assignInNamespace.
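For reference, a minimal sketch of that reassignment, where my_odbcTableExists stands for the edited copy (the name is made up):
# print the unexported function so its body can be copied and edited
print(RODBC:::odbcTableExists)
# swap the edited copy back into the package namespace
assignInNamespace("odbcTableExists", my_odbcTableExists, ns = "RODBC")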
RODBC:::odbcTableExists no longer causes an issue. However, R still crashes when sqlwrite is called from within sqlSave(). I called debug on sqlwrite, and I found that RODBC:::odbcColumns (another non-visible function) causes the crash when table names are too long. Unfortunately, I am not sure how to change RODBC:::odbcColumns to avoid the bug like I did before.
I am using R 2.15.1, and the platform is x86_64-pc-mingw32/x64 (64-bit). I should also note that I am trying to run this on a work computer; if I run the exact same code on my personal computer, R does not crash (no bug). The work computer runs Windows 7 Professional, and my home computer runs Windows 7 Home Premium with R 2.14.1.
I love this hack (I too have Windows 7 Professional and R 2.15.1 at work), and it does not crash anymore, but it causes another problem after I replaced that line and used assignInNamespace. Also, for some reason I had to replace odbcValidChannel with RODBC:::odbcValidChannel and quoteTabNames with RODBC:::quoteTabNames.
But when I used sqlSave, I got the following error:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
no parameters, so nothing to update
I don't even use odbcUpdate anywhere in my code, and RODBC:::sqlSave does not have an odbcUpdate call inside.
Any thoughts?
thank you,
-Alex
I just started using the R package RMySQL in order to get around some memory limitations on my computer. I am trying to take a matrix with 100 columns in R (called data.df), then make a new table in a MySQL database that has "100 choose 2" (= 4950) columns, where each column is a linear combination of two columns from the initial matrix. So far I have something like this:
countnumber <- 1
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "myDB")
temp <- as.data.frame(data.df[,1] - data.df[,2])
colnames(temp) <- paste(pairs[[countnumber]][1], pairs[[countnumber]][2], sep = "")
dbWriteTable(con, "spreadtable", temp, row.names=T, overwrite = T)
for (i in 1:(n-1)) {
  for (j in (i+1):n) {
    if (!((i == 1) && (j == 2))) { # this excludes the first iteration, already taken care of
      temp <- as.data.frame(data.df[,i] - data.df[,j])
      colnames(temp) <- "hola"
      dbWriteTable(con, "spreadtable", value = temp, append = TRUE, overwrite = FALSE, row.names = FALSE)
      countnumber <- countnumber + 1
    }
  }
}
I've also tried toying with the field.types argument of RMySQL::dbWriteTable(), which was suggested in "RMySQL dbWriteTable with field.types". Sadly, it hasn't helped me much.
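(For reference, field.types takes a named list of MySQL column types, one entry per data frame column; a minimal sketch with a made-up column name spread1:)
# hypothetical example: pin the column type instead of letting RMySQL guess
dbWriteTable(con, "spreadtable", temp,
             field.types = list(spread1 = "double"),
             row.names = FALSE, overwrite = TRUE)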
Questions:
Is making your own sql database a valid solution to the memory-bound nature of R, even if it has 4950 columns?
Is the dbWriteTable() the proper function to be using here?
Assuming the answer is "yes" to both of the previous questions...why isn't this working?
Thanks for any help.
[EDIT]: code with error output:
names <- as.data.frame(index)
names <- t(names)
# dim(names) is 1 x 409
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "taylordatabase")
dbGetQuery(con, dbBuildTableDefinition(MySQL(), name = "spreadtable", obj = names, row.names = F))
# I would prefer these to be double types with 8 decimal places instead of text
# dim(temp) is 1 x 409
temp <- as.data.frame(data.df[,1] - (ratios[countnumber] * data.df[,2]))
temp <- t(temp)
temp <- as.data.frame(temp)
dbWriteTable(con, name = "spreadtable", temp, append = T)
The table is created successfully in the database (I will change variable type later), but the dbWriteTable() line produces the error:
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'row_names' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
If I make a slight change, I get a different error message:
dbWriteTable(con, name = "spreadtable", temp, append = T, row.names = F)
and
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'X2011_01_03' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
I just want to use "names" as a bunch of column labels (they were initially dates). The actual data I would like to be "temp".
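Reading the two errors together: the first append fails because dbWriteTable() defaults to row.names = TRUE and looks for a row_names column the table was not created with; the second fails because the data frame's column names (the dates, coerced to names like X2011_01_03) do not match the columns defined in the table. A minimal sketch of a consistent write path, assuming the table was created from the same names object:
# hypothetical fix: no row names anywhere, and column names that match
# the table definition exactly
colnames(temp) <- colnames(names)
dbWriteTable(con, name = "spreadtable", value = temp,
             append = TRUE, row.names = FALSE)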
Having a table with 4950 columns is OK; the problem is which columns you actually need.
If you always "SELECT *", you will eventually exhaust all your system memory (in the case that the table has that many columns).
Why not give us the error message if you have encountered any problems?
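In practice that means selecting only the columns needed for the task at hand; a short sketch with made-up column names:
# pull just two of the 4950 spread columns rather than SELECT *
res <- dbGetQuery(con, "SELECT `AB`, `CD` FROM spreadtable LIMIT 1000")
head(res)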