I get the following warnings when trying to save a simple dataframe to MySQL:
C:...\anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\x92\xE9t\xE9)' for column 'VARIABLE_VALUE' at row 518")
result = self._query(query)
And
C:...anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (3719, "'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.")
result = self._query(query)
Environment info: I use MySQL 8 and Python 3.6 (pymysql 0.9.2, SQLAlchemy 1.2.1).
I visited posts like the one linked below, none of which seem to give a solution for avoiding this warning.
MySQL “incorrect string value” error when save unicode string in Django -> the indication there is to use UTF8.
N.B.: The collation of the table in MySQL doesn't seem to be set to the one I specified in the create_db function of the Connection class.
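For reference, here is a minimal way to check what collation the table actually ended up with (the credentials are the same placeholders as in the script below, and random_df is the table it creates):

import pymysql

# Ask information_schema which charset/collation the saved table actually uses.
conn = pymysql.connect(host="host_name", user="username", password="password")
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_collation FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name = %s",
        ("raw_data", "random_df"),
    )
    print(cur.fetchone())   # prints the collation actually in use
conn.close()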
The executable code:
import DataEngine.db.Connection as connection
import random
import pandas as pd

if __name__ == "__main__":
    conn = connection.Connection(host="host_name", port="3306", user="username", password="password")
    conn.create_db("raw_data")
    conn.establish("raw_data")
    l1 = []
    for i in range(10):
        l_nested = []
        for j in range(10):
            l_nested.append(random.randint(0, 100))
        l1.append(l_nested)
    df = pd.DataFrame(l1)
    conn.save(df, "random_df")
    df2 = conn.retrieve("random_df")
    print(df2)
So the dataframe that is persisted in the database is:
index 0 1 2 3 4 5 6 7 8 9
0 0 11 57 75 45 81 70 91 66 93 96
1 1 51 43 3 64 2 6 93 5 49 40
2 2 35 80 76 11 23 87 19 32 13 98
3 3 82 10 69 40 34 66 42 24 82 59
4 4 49 74 39 61 14 63 94 92 82 85
5 5 50 47 90 75 48 77 17 43 5 29
6 6 70 40 78 60 29 48 52 48 39 36
7 7 21 87 41 53 95 3 31 67 50 30
8 8 72 79 73 82 20 15 51 14 38 42
9 9 68 71 11 17 48 68 17 42 83 95
My Connection class:
import sqlalchemy
import pymysql
import pandas as pd


class Connection:

    def __init__(self: object, host: str, port: str, user: str, password: str):
        self.host = host
        self.port = port
        self.user = user
        self.password = password
        self.conn = None

    def create_db(self: object, db_name: str, charset: str = "utf8mb4", collate: str = "utf8mb4_unicode_ci", drop_if_exists: bool = True):
        c = pymysql.connect(host=self.host, user=self.user, password=self.password)
        if drop_if_exists:
            c.cursor().execute("DROP DATABASE IF EXISTS " + db_name)
        c.cursor().execute("CREATE DATABASE " + db_name + " CHARACTER SET=" + charset + " COLLATE=" + collate)
        c.close()
        print("Database %s created with a %s charset" % (db_name, charset))

    def establish(self: object, db_name: str, charset: str = "utf8mb4"):
        self.conn = sqlalchemy.create_engine(
            "mysql+pymysql://" + self.user + ":" + self.password + "@" + self.host + ":" + self.port + "/" + db_name +
            "?charset=" + charset)
        print("Connection with database : %s has been established as %s at %s." % (db_name, self.user, self.host))
        print("Charset : %s" % charset)

    def retrieve(self, table):
        df = pd.read_sql_table(table, self.conn)
        return df

    def save(self: object, df: "pandas.DataFrame", table: str, if_exists: str = "replace", chunksize: int = 10000):
        df.to_sql(name=table, con=self.conn, if_exists=if_exists, chunksize=chunksize)
Some elements that might help:
Well, hex 92 and e9 are not valid utf8mb4 (UTF-8). Perhaps you were expecting ’été, assuming CHARACTER SETs cp1250, cp1256, cp1257, or latin1.
Find out where that text is coming from, and let's decide whether it is valid latin1. Then we can fix the code to declare that the client is really using latin1, not utf8mb4. Or we can fix the client to use UTF-8, which would probably be better in the long run.
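As a quick check (a minimal sketch; the bytes are taken from the warning, and cp1250 is one of the code pages mentioned above in which they are printable):

# Decode the offending bytes with cp1250 versus UTF-8. This only demonstrates
# the mismatch; it is not a fix.
raw = b"\x92\xe9t\xe9"
print(raw.decode("cp1250"))      # '’été' -> the text the client probably meant to send
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print(err)                   # shows why MySQL rejects these bytes as utf8mb4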
Related
I am an economist struggling with coding and data scraping.
I am scraping data from the main and unique table on this webpage (https://www.oddsportal.com/basketball/europe/euroleague-2013-2014/results/). I can retrieve all the information in the td HTML tags with Python Selenium by referring to the class element. The same goes for the th tag, where the information on the date and stage of the competition is stored. In my final dataset, I would like to have the information stored in the th tag in two rows (date and stage of the competition) next to the other rows in the table. Basically, for each match, I would like to have the date and the stage of the competition in rows and not as the head of each group of matches.
The only solution I came up with is to index all the rows (with both th and td tags) and build a while loop to append the information in the th tags to the td rows whose index is lower than the next index for the th tag. Hope I made myself clear (if not I will try to give a more graphical explanation). However, I am not able to code such a logic construct due to my poor coding abilities. I do not know if I need two loops to iterate through different tags (td and th) and, if so, how to do that. If you have any easier solution, it is more than welcome!
Thanks in advance for the precious help!
code below:
from selenium import webdriver
import time
import pandas as pd
# Season to filter
seasons_filt = ['2013-2014', '2014-2015', '2015-2016','2016-2017', '2017-2018', '2018-2019']
# Define empty data
data_keys = ["Season", "Match_Time", "Home_Team", "Away_Team", "Home_Odd", "Away_Odd", "Home_Score",
"Away_Score", "OT", "N_Bookmakers"]
data = dict()
for key in data_keys:
    data[key] = list()
del data_keys
# Define 'driver' variable and launch browser
#path = "C:/Users/ALESSANDRO/Downloads/chromedriver_win32/chromedriver.exe"
#path office pc
path = "C:/Users/aldi/Downloads/chromedriver.exe"
driver = webdriver.Chrome(path)
# Loop through pages based on page_num and season
for season_filt in seasons_filt:
    page_num = 0
    while True:
        page_num += 1
        # Get url and navigate it
        page_str = (1 - len(str(page_num))) * '0' + str(page_num)
        url = "https://www.oddsportal.com/basketball/europe/euroleague-" + str(season_filt) + "/results/#/page/" + page_str + "/"
        driver.get(url)
        time.sleep(3)

        # Check if page has no data
        if driver.find_elements_by_id("emptyMsg"):
            print("Season {} ended at page {}".format(season_filt, page_num))
            break

        try:
            # Teams
            for el in driver.find_elements_by_class_name('name.table-participant'):
                el = el.text.strip().split(" - ")
                data["Home_Team"].append(el[0])
                data["Away_Team"].append(el[1])
                data["Season"].append(season_filt)

            # Scores
            for el in driver.find_elements_by_class_name('center.bold.table-odds.table-score'):
                el = el.text.split(":")
                if el[1][-3:] == " OT":
                    data["OT"].append(True)
                    el[1] = el[1][:-3]
                else:
                    data["OT"].append(False)
                data["Home_Score"].append(el[0])
                data["Away_Score"].append(el[1])

            # Match times
            for el in driver.find_elements_by_class_name("table-time"):
                data["Match_Time"].append(el.text)

            # Odds
            i = 0
            for el in driver.find_elements_by_class_name("odds-nowrp"):
                i += 1
                if i % 2 == 0:
                    data["Away_Odd"].append(el.text)
                else:
                    data["Home_Odd"].append(el.text)

            # N_Bookmakers
            for el in driver.find_elements_by_class_name("center.info-value"):
                data["N_Bookmakers"].append(el.text)

            # TODO think of inserting the dates list in the dataframe even if it has a different size (19 rows and not 50)
        except:
            pass
driver.quit()
data = pd.DataFrame(data)
data.to_csv("data_odds.csv", index = False)
I would like to add this information to my dataset as two additional rows:
for el in driver.find_elements_by_class_name("first2.tl")[1:]:
el = el.text.strip().split(" - ")
data["date"].append(el[0])
data["stage"].append(el[1])
A few things I would change here.
Don't overwrite variables. You store elements in your el variable, then you overwrite the element with your strings. It may work for you here, but you may get yourself into trouble with that practice later on, especially since you are iterating through those elements. It makes it hard to debug too.
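For instance, keeping the element and the parsed strings in separate variables (the name parts is just an example):

for el in driver.find_elements_by_class_name('name.table-participant'):
    parts = el.text.strip().split(" - ")   # el stays the WebElement, parts holds the strings
    data["Home_Team"].append(parts[0])
    data["Away_Team"].append(parts[1])
    data["Season"].append(season_filt)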
I know Selenium has ways to parse the html, but I personally feel BeautifulSoup is a tad easier to parse with and is a little more intuitive if you are simply trying to pull data out of the html. So I went with BeautifulSoup's .find_previous() to get the tags that precede the games, which makes it possible to pick up your date and stage content.
Lastly, I like to construct a list of dictionaries to make up the data frame. Each item in the list is a dictionary of key:value pairs where the key is the column name and the value is the data. You sort of do the opposite in creating a dictionary of lists. There is nothing wrong with that, but if the lists don't have the same length, you'll get an error when trying to create the dataframe. Whereas with my way, if for whatever reason a value is missing, it will still create the dataframe and just have a null or NaN for the missing data.
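A quick standalone illustration of that difference (not part of the scraper):

import pandas as pd

# dict of lists: every list must have the same length, otherwise pandas raises a ValueError
# pd.DataFrame({"a": [1, 2, 3], "b": [4, 5]})      # error: lists have unequal lengths

# list of dicts: a missing key simply becomes NaN in that column
print(pd.DataFrame([{"a": 1, "b": 4}, {"a": 2}]))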
There may be more work you need to do with the code to go through the pages, but this gets you the data in the form you need.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
import pandas as pd
from bs4 import BeautifulSoup
import re
# Season to filter
seasons_filt = ['2013-2014', '2014-2015', '2015-2016','2016-2017', '2017-2018', '2018-2019']
# Define 'driver' variable and launch browser
path = "C:/Users/ALESSANDRO/Downloads/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(path)
rows = []
# Loop through pages based on page_num and season
for season_filt in seasons_filt:
    page_num = 0
    while True:
        page_num += 1
        # Get url and navigate it
        page_str = (1 - len(str(page_num))) * '0' + str(page_num)
        url = "https://www.oddsportal.com/basketball/europe/euroleague-" + str(season_filt) + "/results/#/page/" + page_str + "/"
        driver.get(url)
        time.sleep(3)

        # Check if page has no data
        if driver.find_elements_by_id("emptyMsg"):
            print("Season {} ended at page {}".format(season_filt, page_num))
            break

        try:
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            table = soup.find('table', {'id': 'tournamentTable'})
            trs = table.find_all('tr', {'class': re.compile('.*deactivate.*')})

            for each in trs:
                teams = each.find('td', {'class': 'name table-participant'}).text.split(' - ')
                scores = each.find('td', {'class': re.compile('.*table-score.*')}).text.split(':')

                ot = False
                for score in scores:
                    if 'OT' in score:
                        ot = True
                scores = [x.replace('\xa0OT', '') for x in scores]

                matchTime = each.find('td', {'class': re.compile('.*table-time.*')}).text

                # Odds
                i = 0
                for each_odd in each.find_all('td', {'class': "odds-nowrp"}):
                    i += 1
                    if i % 2 == 0:
                        away_odd = each_odd.text
                    else:
                        home_odd = each_odd.text

                n_bookmakers = soup.find('td', {'class': 'center info-value'}).text

                date_stage = each.find_previous('th', {'class': 'first2 tl'}).text.split(' - ')
                date = date_stage[0]
                stage = date_stage[1]

                row = {'Season': season_filt,
                       'Home_Team': teams[0],
                       'Away_Team': teams[1],
                       'Home_Score': scores[0],
                       'Away_Score': scores[1],
                       'OT': ot,
                       'Match_Time': matchTime,
                       'Home_Odd': home_odd,
                       'Away_Odd': away_odd,
                       'N_Bookmakers': n_bookmakers,
                       'Date': date,
                       'Stage': stage}

                rows.append(row)
        except:
            pass
driver.quit()
data = pd.DataFrame(rows)
data.to_csv("data_odds.csv", index = False)
Output:
print(data.head(15).to_string())
Season Home_Team Away_Team Home_Score Away_Score OT Match_Time Home_Odd Away_Odd N_Bookmakers Date Stage
0 2013-2014 Real Madrid Maccabi Tel Aviv 86 98 False 18:00 -667 +493 7 18 May 2014 Final Four
1 2013-2014 Barcelona CSKA Moscow 93 78 False 15:00 -135 +112 7 18 May 2014 Final Four
2 2013-2014 Barcelona Real Madrid 62 100 False 19:00 +134 -161 7 16 May 2014 Final Four
3 2013-2014 CSKA Moscow Maccabi Tel Aviv 67 68 False 16:00 -278 +224 7 16 May 2014 Final Four
4 2013-2014 Real Madrid Olympiacos 83 69 False 18:45 -500 +374 7 25 Apr 2014 Play Offs
5 2013-2014 CSKA Moscow Panathinaikos 74 44 False 16:00 -370 +295 7 25 Apr 2014 Play Offs
6 2013-2014 Olympiacos Real Madrid 71 62 False 18:45 +127 -152 7 23 Apr 2014 Play Offs
7 2013-2014 Maccabi Tel Aviv Olimpia Milano 86 66 False 17:45 -217 +179 7 23 Apr 2014 Play Offs
8 2013-2014 Panathinaikos CSKA Moscow 73 72 False 16:30 -106 -112 7 23 Apr 2014 Play Offs
9 2013-2014 Panathinaikos CSKA Moscow 65 59 False 18:45 -125 +104 7 21 Apr 2014 Play Offs
10 2013-2014 Maccabi Tel Aviv Olimpia Milano 75 63 False 18:15 -189 +156 7 21 Apr 2014 Play Offs
11 2013-2014 Olympiacos Real Madrid 78 76 False 17:00 +104 -125 7 21 Apr 2014 Play Offs
12 2013-2014 Galatasaray Barcelona 75 78 False 17:00 +264 -333 7 20 Apr 2014 Play Offs
13 2013-2014 Olimpia Milano Maccabi Tel Aviv 91 77 False 18:45 -286 +227 7 18 Apr 2014 Play Offs
14 2013-2014 CSKA Moscow Panathinaikos 77 51 False 16:15 -303 +247 7 18 Apr 2014 Play Offs
I'm trying to import a txt file into Octave which contains a matrix of data.
The matrix looks like this:
49 ..1. ...1.......... ..... 49
47 ..12 ...1...... ... ..... 47
45 ..12....1...... 2....1... 45
43 ....2....1...... 2...1.... 43
41 .1..2.. .........2. .1..... 41
39 .1.12.2....1.....2. .1..... 39
37 .1..2.22...1.....2. .1..... 37
35 .1. 2222...2....2....1.1... 35
33 ....22.2...2....2....12.... 33
31 ....22.2...2..........21... 31
29 .....2.2...2.....2....21... 29
27 ........222222....2....21.... 27
25 .......22.2222....2.22.2..... 25
23 .......22.2222....2.2..2..... 23
21 .......222.222....2.2........ 21
19 ........22.222....2.......... 19
17 ..........2.2.2...22........... 17
15 ............................... 15
13 .......................2....... 13
11 .......................2......2 11
9 ........................2.....222 9
7 . ................. ..... ......222. 7
5 .................. ....1.. 5
3 ....... ......... .... 3
1 1
This is actually a map/coordinate system (y-axis = azimuth, x-axis = latitude) which I have to plot (blank = no data, . = no effects, 1 = weak effects, 2 = strong effects).
The result should look like this.
Because I failed to import this txt file, I changed it into this:
49;1;1;1;1;1;1;1;2;2;3;2;1;2;2;2;3;2;2;2;2;2;2;2;2;2;2;1;2;2;2;2;2;1;1;1;1;1;1;1;49
47;1;1;1;1;1;1;1;2;2;3;4;1;2;2;2;3;2;2;2;2;2;2;1;2;2;2;1;2;2;2;2;2;1;1;1;1;1;1;1;47
45;1;1;1;1;1;1;1;2;2;3;4;2;2;2;2;3;2;2;2;2;2;2;1;4;2;2;2;2;3;2;2;2;1;1;1;1;1;1;1;45
43;1;1;1;1;1;1;2;2;2;2;4;2;2;2;2;3;2;2;2;2;2;2;1;4;2;2;2;3;2;2;2;2;1;1;1;1;1;1;1;43
39;1;1;1;1;1;1;2;3;2;3;4;2;4;2;2;2;2;3;2;2;2;2;2;4;2;1;2;3;2;2;2;2;2;1;1;1;1;1;1;39
37;1;1;1;1;1;1;2;3;2;2;4;2;4;4;2;2;2;3;2;2;2;2;2;4;2;1;2;3;2;2;2;2;2;1;1;1;1;1;1;37
and so on.
This is working with my code.
RawMap = dlmread('C:\Desktop\2576.map', ';', 0:80, 0:24)
Map = flipud(RawMap)
pcolor(Map(:,2:end-1))
I don't want to have to convert the file like this by hand, and I don't want to change the code to do it automatically, so I need to get the original file imported.
Any suggestions?
Thanks
Here is one approach to parse the file meaningfully:
S = fileread('testo.txt');       % read the whole file as one string
S = strsplit (S, "\n");          % split into lines
S = strvcat( S );                % pad lines with blanks to equal length -> char matrix
S = double(S);                   % work with the numeric character codes
S = S(:, 4:end-4);               % drop the numeric row labels at both margins
S( S == double(" ") ) = 0;       % blank -> no data
S( S == double(".") ) = 1;       % .     -> no effects
S( S == double("1") ) = 2;       % 1     -> weak effects
S( S == double("2") ) = 3;       % 2     -> strong effects
pcolor(S); axis ij;
My current code is:
count1 = 0
for i in range(30):
    if i % 26 == 0:
        b = [i+1, i+2, i+3, i+4, i+5, i+6, i+7, i+8, i+9, i+10]
        count1 += 1
        print([count1])
        print(*b, sep=' ')
    elif (i-10) % 26 == 0:
        b = [i+1, i+2, i+3, i+4, i+5, i+6, i+7, i+8, i+9]
        count1 += 1
        print([count1])
        print(*b, sep=' ')
    elif (i-16) % 32 == 0:
        b = [i+1, i+2, i+3, i+4, i+5, i+6, i+7, i+8, i+9, i+10]
        count1 += 1
        print([count1])
        print(*b, sep=' ')
which produces lines:
[1]
1 2 3 4 5 6 7 8 9 10
[2]
11 12 13 14 15 16 17 18 19
[3]
17 18 19 20 21 22 23 24 25 26
[4]
27 28 29 30 31 32 33 34 35 36
I'd like to output these lines in a simple text file. I'm familiar with the open and write functions, but do not know how to apply them to my specific example.
Thanks!
On GNU/Linux systems, execute the program in the console and add > followed by the name of the file.
Example:
Assuming that you are in the directory which contains the executable:
./[name of the program] > [name of the file]
./helloworld > helloworld.txt
This will save all the text printed to the console in a text file.
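If you would rather write the file from within Python itself, here is a minimal sketch using open() and print(file=...), with the branches of the original loop condensed (the file name output.txt is just an example):

count1 = 0
with open("output.txt", "w") as f:               # open the file once, before the loop
    for i in range(30):
        if i % 26 == 0 or (i - 16) % 32 == 0:
            b = [i + k for k in range(1, 11)]    # ten numbers, as in the original 10-element branches
        elif (i - 10) % 26 == 0:
            b = [i + k for k in range(1, 10)]    # nine numbers
        else:
            continue
        count1 += 1
        print([count1], file=f)                  # print() accepts a file handle instead of stdout
        print(*b, sep=' ', file=f)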
I'm trying to write to a hex file using PB12.5. I'm able to write to it without any issues, but through testing I noticed I will need to send a null value (00) to the file at certain points.
I know that if I assign null to a string it will null out the entire string, so I tried using a Blob where I can insert a null value when needed (BlobEdit(blb_data, ll_pos, CharA(0))).
But BlobEdit() automatically inserts a null value between each position. I don't want this, as it's causing issues while I'm trying to update the hex file. I just need to add my CharA(lb_byte) to each consecutive position in the Blob.
Is there any way around this or is PB just unable to do this? Below is the code:
ll_test = 1
ll_pos = 1
ll_length = Len(ls_output)

Do While ll_pos <= (ll_length)
    ls_data = Mid(ls_output, ll_pos, 2)
    lb_byte = Event ue_get_decimal_value_of_hex(ls_data)
    ll_test = BlobEdit(blb_data, ll_test, CharA(lb_byte), EncodingANSI!)
    ll_pos = ll_pos + 2
Loop
Hex file appears as follows:
16 35 2D D8 08 45 29 18 35 27 76 25 30 55 66 85 44 66 57 A4 67 99
After Blob update:
16 00 48 00 5D 00 C3 92 00 08 00 48 00 51 00 E2
I hope this helps:
//////////////////////////////////////////////////////////////////////////
// Function:    f_longtohex
// Description: LONG to HEXADECIMAL
// Scope:       public
// Arguments:   as_number   // long value to convert to hexadecimal
//              as_digitos  // number of digits to return
// Return:      String
// Example:
//    f_longtohex(198, 2) --> 'C6'
//    f_longtohex(198, 4) --> '00C6'
//////////////////////////////////////////////////////////////////////////
long ll_temp0, ll_temp1
char lc_ret

if isnull(as_digitos) then as_digitos = 2

IF as_digitos > 0 THEN
    ll_temp0 = abs(as_number / (16 ^ (as_digitos - 1)))
    ll_temp1 = ll_temp0 * (16 ^ (as_digitos - 1))

    IF ll_temp0 > 9 THEN
        lc_ret = char(ll_temp0 + 55)
    ELSE
        lc_ret = char(ll_temp0 + 48)
    END IF

    RETURN lc_ret + f_longtohex(as_number - ll_temp1, as_digitos - 1)
END IF

RETURN ''
OK, to set the scene, I have written a function to import multiple tables from MySQL (using RODBC) and run randomForest() on them.
This function is run on multiple databases (as separate instances).
In one particular database, and one particular table, the "error in as.POSIXlt.character(x, tz,.....): character string not in a standard unambiguous format" error is thrown. The function runs on around 150 tables across two databases without any issues except this one table.
Here is a head() print from the table:
MQLTime bar5 bar4 bar3 bar2 bar1 pat1 baXRC
1 2014-11-05 23:35:00 184 24 8 24 67 147 Flat
2 2014-11-05 23:57:00 203 184 204 67 51 147 Flat
3 2014-11-06 00:40:00 179 309 49 189 75 19 Flat
4 2014-11-06 00:46:00 28 192 60 49 152 147 Flat
5 2014-11-06 01:20:00 309 48 9 11 24 19 Flat
6 2014-11-06 01:31:00 24 177 64 152 188 19 Flat
And here is the function:
GenerateRF <- function(db, countstable, RFcutoff) {

  'load required libraries'
  library(RODBC)
  library(randomForest)
  library(caret)
  library(ff)
  library(stringi)

  'connection and data preparation'
  connection <- odbcConnect ('TTODBC', uid='root', pwd='password', case="nochange")

  'import count table and check if RF is allowed to be built'
  query.str <- paste0 ('select * from ', db, '.', countstable, ' order by RowCount asc')
  row.counts <- sqlQuery (connection, query.str)

  'Operate only on tables that have >= RFcutoff'
  for (i in 1:nrow (row.counts)) {

    table.name <- as.character (row.counts[i,1])
    col.count <- as.numeric (row.counts[i,2])
    row.count <- as.numeric (row.counts[i,3])

    if (row.count >= 20) {

      'Delete old RFs and DFs for input pattern'
      if (file.exists (paste0 (table.name, '_RF.Rdata'))) {
        file.remove (paste0 (table.name, '_RF.Rdata'))
      }
      if (file.exists (paste0 (table.name, '_DF.Rdata'))) {
        file.remove (paste0 (table.name, '_DF.Rdata'))
      }

      'import and clean data'
      query.str2 <- paste0 ('select * from ', db, '.', table.name, ' order by mqltime asc')
      raw.data <- sqlQuery(connection, query.str2)

      'partition data into training/test sets'
      set.seed(489)
      index <- createDataPartition(raw.data$baXRC, p=0.66, list=FALSE, times=1)
      data.train <- raw.data [index,]
      data.test <- raw.data [-index,]

      'find optimal trees to grow (without outcome and dates)'
      data.mtry <- as.data.frame (tuneRF (data.train [, c(-1,-col.count)], data.train$baXRC, ntreetry=100,
                                          stepFactor=.5, improve=0.01, trace=TRUE, plot=TRUE, dobest=FALSE))
      best.mtry <- data.mtry [which (data.mtry[,2] == min (data.mtry[,2])), 1]

      'compress df'
      data.ff <- as.ffdf (data.train)

      'run RF. Originally set to 1000 trees but M1 dataset is too large for laptop. Maybe train at the lab?'
      data.rf <- randomForest (baXRC~., data=data.ff[,-1], mtry=best.mtry, ntree=500, keep.forest=TRUE,
                               importance=TRUE, proximity=FALSE)

      'generate and print variable importance plot'
      varImpPlot (data.rf, main = table.name)

      'predict on test data'
      data.test.pred <- as.data.frame( predict (data.rf, data.test, type="prob"))

      'get dates and name date column'
      data.test.dates <- data.frame (data.test[,1])
      colnames (data.test.dates) <- 'MQLTime'

      'attach dates to prediction df'
      data.test.res <- cbind (data.test.dates, data.test.pred)

      'force date coercion to attempt negating unambiguous format error'
      data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")

      'delete row names, coerce to dataframe, generate row table name and export outcomes to MySQL'
      rownames (data.test.res) <- NULL
      data.test.res <- as.data.frame (data.test.res)
      root.table <- stri_sub(table.name, 0, -5)
      sqlUpdate (connection, data.test.res, tablename = paste0(db, '.', root.table, '_outcome'), index = "MQLTime")

      'save RF and test df/s for future use; save latest version of row_counts to MQL4 folder'
      save (data.rf, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_RF.Rdata'))
      save (data.test, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_DF.Rdata'))
      write.table (row.counts, paste0("C:/Users/user/AppData/Roaming/MetaQuotes/Terminal/71FA4710ABEFC21F77A62A104A956F23/MQL4/Files/", db, "_m1_rowcounts.csv"), sep = ",", col.names = F,
                   row.names = F, quote = F)

      'end of conditional block'
    }

    'end of for loop'
  }

  'close all connection to MySQL'
  odbcCloseAll()

  'clear workspace'
  rm(list=ls())

  'end of function'
}
At this line:
data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
I have tried coercing MQLTime using various functions including: as.character(), as.POSIXct(), as.POSIXlt(), as.Date(), format(), as.character(as.Date())
and have also tried:
"%y" vs "%Y" and "%OS" vs "%S"
All variants seem to have no effect on the error and the function is still able to run on all other tables. I have checked the table manually (which contains almost 1500 rows) and also in MySQL looking for NULL dates or dates like "0000-00-00 00:00:00".
Also, if I run the function line by line in the R terminal, this offending table is processed without any problems, which just confuses the hell out of me.
I've exhausted all the functions/solutions I can think of (and also all those I could find through Dr. Google) so I am pleading for help here.
I should probably mention that the MQLTime column is stored as varchar() in MySQL. This was done to try to get around issues with type conversions between R and MySQL.
SHOW VARIABLES LIKE "%version%";
innodb_version, 5.6.19
protocol_version, 10
slave_type_conversions,
version, 5.6.19
version_comment, MySQL Community Server (GPL)
version_compile_machine, x86
version_compile_os, Win32
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
Edit: str() output on the data as imported from MySQL, showing MQLTime is already in POSIXct format:
> str(raw.data)
'data.frame': 1472 obs. of 8 variables:
$ MQLTime: POSIXct, format: "2014-11-05 23:35:00" "2014-11-05 23:57:00" "2014-11-06 00:40:00" "2014-11-06 00:46:00" ...
$ bar5 : int 184 203 179 28 309 24 156 48 309 437 ...
$ bar4 : int 24 184 309 192 48 177 48 68 60 71 ...
$ bar3 : int 8 204 49 60 9 64 68 27 192 147 ...
$ bar2 : int 24 67 189 49 11 152 27 56 437 67 ...
$ bar1 : int 67 51 75 152 24 188 56 147 71 0 ...
$ pat1 : int 147 147 19 147 19 19 147 19 147 19 ...
$ baXRC : Factor w/ 3 levels "Down","Flat",..: 2 2 2 2 2 2 2 2 2 3 ...
So I have tried declaring stringsAsFactors = FALSE in the dataframe operations and this had no effect.
Interestingly, if the offending table is removed from processing through an additional conditional statement in the first 'if' block, the function stops on the table immediately preceding the blocked table.
If both the original and the new offending tables are removed from processing, then the function stops on the table immediately prior to them. I have never seen this sort of behavior before and it really has me stumped.
I watched system resources during the function and they never seem to max out.
Could this be a problem with the 'for' loop and not necessarily date formats?
There appears to be some egg on my face. The table following the table where the function was stopping had a row with value '0000-00-00 00:00:00'. I added another statement in my MySQL function to remove these rows when pre-processing the tables. Thanks to those that had a look at this.