Force MySQL queries to be characters not numeric in R

I'm using RODBC to interface R with a MySQL database and have encountered a problem. I need to join two tables based on unique ID numbers (IDNUM below). The issue is that the ID numbers are 20-digit integers and R wants to round them. OK, no problem, I'll just pull these IDs as character strings instead of numeric using CAST(blah AS CHAR).
But R sees the incoming character strings as numbers and thinks "hey, I know these are character strings... but these character strings are just numbers, so I'm pretty sure this guy wants me to store this as numeric, let me fix that for him" then converts them back into numeric and rounds them. I need to force R to take the input as given and can't figure out how to make this happen.
Here's the code I'm using (Interval is a vector that contains a beginning and an ending timestamp, so this code is meant to pull data only from a chosen time period):
test = sqlQuery(channel, paste("SELECT CAST(table1.IDNUM AS CHAR),PartyA,PartyB FROM
table1, table2 WHERE table1.IDNUM=table2.IDNUM AND table1.Timestamp>=",Interval[1],"
AND table2.Timestamp<",Interval[2],sep=""))

You will most likely want to read the documentation for the function you are using at ?sqlQuery, which includes notes about the following two relevant arguments:
as.is which (if any) columns returned as character should be
converted to another type? Allowed values are as for read.table. See
‘Details’.
and
stringsAsFactors logical: should columns returned as character and
not excluded by as.is and not converted to anything else be converted
to factors?
In all likelihood you want to specify the columns in question in as.is.
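The rounding itself comes from double-precision floating point, which is what R's numeric type uses, so the IDs only stay intact while they remain character data. As a rough illustration in Python (not the R code from the question; the ID value and the helper name survives_double are made up for this sketch), a 20-digit integer does not come back unchanged after passing through a double:

```python
def survives_double(id_text: str) -> bool:
    """Return True if the integer in id_text is unchanged after a trip through a double."""
    # Parse the digits as a double, then print it back without a fractional part.
    return format(float(id_text), ".0f") == id_text


example_id = "12345678901234567891"  # made-up 20-digit ID, not from the question

# Doubles of this magnitude are spaced 2048 apart, so an odd ID cannot be stored exactly.
print(survives_double(example_id))
# A 10-digit ID is well inside the exactly representable range (below 2**53).
print(survives_double("1234567890"))
```

Two IDs that differ only in their last few digits can land on the same double, which is why a join on the rounded values is unreliable.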

Related

How to convert date from csv file into integer

I have to send data from csv into SQL DB.
The problem starts when I try to convert the data into int. It wasn't my idea, and I really can't do much about this datatype. When I try to do the conversion, this problem pops up:
Data Conversion 2: Data conversion failed while converting column
"pr_czas" (387) to column "C pr_dCz_id" (14). The conversion returned
status value 2 and status text "The value could not be converted
because of a potential loss of data.".
I already tried ignoring this problem, but then other problems came up, so there is no other way than solving it.
I have to convert this data from the csv file, which is str 50, into int 4.
It must be int4; that is one of the requirements. I don't know what to do.
This is the data I'm trying to put into int4. Look at pr_czas.
This is the data's datatype.
Before this I tried to do the same thing with just DD.MM.YYYY but got the same result...
Given an input column named [pr_czas] that contains string values that look like 31.01.2020 00:00, which appears to be a date and time in the format "DD.MM.YYYY HH:MM", I would like to express that as a whole number DDMMYYYYHHMM.
Add a derived column to your data flow and call this new_pr_czas
The logic I'm going to use is a series of REPLACE statements and cast the final result to an integer. Replace the period, replace the colon and the space - all with nothing
(DT_I8)REPLACE(REPLACE(REPLACE([pr_czas], ".", ""), ":", ""), " ", "")
This is an easy case, but there are some things to note.
An integer/int32/I4 has a maximum value of 2,147,483,647 (roughly 2 billion).
310120200000 is too large to fit into that space, so you would need to make that a bigint/int64/I8. If I remember your previous question, you were having troubles with a lookup task, so this data type mismatch might hurt you there.
The other thing to be aware of is that leading zeros will be dropped when converted to a number because they are not significant. If you need to retain the leading zeros, then you're working with a string data type. This is an advantage of working with the ISO standard ordering, where the year comes first, but if your data expects DD first, then far be it from me to say otherwise.
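Both caveats can be checked with a few lines of Python. This is only a sketch of the arithmetic, not SSIS code; the helper names strip_separators and fits_int32 and the second sample value are made up for the illustration.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def strip_separators(pr_czas: str) -> str:
    """Remove the period, colon and space, like the nested REPLACE calls."""
    return pr_czas.replace(".", "").replace(":", "").replace(" ", "")


def fits_int32(number: int) -> bool:
    """Return True if number is within the signed 32-bit integer range."""
    return INT32_MIN <= number <= INT32_MAX


late = strip_separators("31.01.2020 00:00")
print(late, fits_int32(int(late)))   # 12 digits, outside the I4 range

early = strip_separators("01.02.2020 00:00")  # hypothetical value with a leading zero
print(early, int(early))             # the leading zero is lost once it is a number
```

After the cast, a value for the 1st of a month has one digit fewer than a value for the 31st, so the result is no longer fixed-width.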
If you need to slice your date into another format, then you'll want to have a few derived columns. The first one will generate a string column for each piece of pr_czas: year, month, day, hour and minute. You'll use the SUBSTRING function for this and FINDSTRING to find the period, space and colon.
The next derived column will be used to put those string pieces back into the new format and cast that to I8. Why? Because you can't debug doing it all in one shot, but you can put a data viewer between two derived columns to figure out where a slice went awry.
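The two-step idea (slice first, reassemble second) might look like this in Python. It is a sketch only: the function names are hypothetical, and the year-first target layout YYYYMMDDHHMM is an assumption chosen for the example, not something the question asks for.

```python
def slice_pr_czas(pr_czas: str) -> dict:
    """Step 1: split 'DD.MM.YYYY HH:MM' into named string pieces."""
    date_part, time_part = pr_czas.split(" ")
    day, month, year = date_part.split(".")
    hour, minute = time_part.split(":")
    return {"year": year, "month": month, "day": day, "hour": hour, "minute": minute}


def reassemble(parts: dict) -> int:
    """Step 2: join the pieces year-first (assumed layout) and cast to an integer."""
    ordered = ("year", "month", "day", "hour", "minute")
    return int("".join(parts[name] for name in ordered))


pieces = slice_pr_czas("31.01.2020 00:00")
print(pieces)              # inspect the slices before combining them
print(reassemble(pieces))
```

Printing the intermediate dictionary plays the same role as the data viewer between the two derived columns: a bad slice is visible before it is buried in the final number.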

how to convert values from database to be numeric?

I'm grabbing all the values of a column using:
> myValues <- dbGetQuery(mydb,"select average_Medicare_allowed_amt from STAGING_MEDICAREPUF")
Because the values are defined as varchar, when I do a summary(myValues), R does not recognize that the values are numerical:
Assuming I have no access to the backend schema, and am unable to cast the varchars to decimals, is it possible to first convert myValues to be numerical and then get a summary?
In MySQL, I find that the easiest way to convert to a number value is to simply add zero:
select (average_Medicare_allowed_amt + 0) as average_Medicare_allowed_amt
Note the use of the column alias. This allows you to refer to the resulting value using the same name.
MySQL does "silent" conversion. If it encounters an error or a non-numeric character, then the conversion stops. So, 'abc' + 0 returns 0 instead of generating an error.
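The effect of this silent conversion on a summary can be approximated on the client side. The Python sketch below is only a rough emulation of the behavior described above, not MySQL's actual parser; lenient_number and the sample values are made up for the illustration.

```python
import re
from statistics import mean

# Optional sign, then digits with an optional fraction, at the start of the text.
_NUMERIC_PREFIX = re.compile(r"\s*[-+]?(\d+(\.\d*)?|\.\d+)")


def lenient_number(text: str) -> float:
    """Convert the leading numeric part of text; return 0.0 if there is none."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group()) if match else 0.0


raw_values = ["12.50", "7", "abc", "3.25 USD"]  # hypothetical varchar contents
numbers = [lenient_number(value) for value in raw_values]
print(numbers)
print(mean(numbers))
```

Note that the non-numeric entry counts as 0 in the mean, so a summary computed this way can be skewed by bad rows without any warning.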
And, regarding your comment, I have never heard of "cast()" permissions in any database.

mysql replace last character if match

I have a table called media with a column called accounts_used in which the rows appear in the following format
68146, 67342, 60577, 61506, 67194, 67034, 63484, 49113, 61518, 66971, 67511,
67351, 63621, 67725, 63638, 68141, 66114, 67262, 67537, 67537, 61765, 63701,
67087, 62641, 61294, 67063, 67049, 67038, 67170, 67147, 67289, 61264, 67091,
63690, 63505, 63505, 49172, 52313, 67070, 66945, 67234, 62265, 61368, 67870,
67211, 67586, 49240, 67538, 67538, 67809, 67183, 67164, 62712, 67519, 66895,
67693, 60266, 60266, 67593, 67031, 67137, 62570, 60682, 61195, 67569, 67569,
67069, 62082, 67345, 61748, 61553, 52029, 66877, 62630, 67196, 67196, 67196,
67196, 67196, 67196, 66873, 63677, 68174, 67127, 63594, 67107, 60419, 66601,
68156, 67203, 68161, 60233, 66586, 52654, 63570, 66887, 67191, 60877, 52108,
67131, 61784, 67566, 67162, 67073, 67092, 67064, 60133, 66907, 67559, 66846,
60490, 60347, 66558, 48737, 61539, 67236, 68135, 67238 , 63656, 67585, 67512
If the row has a comma at the end I want to remove this, so for example if the row looks like the following
1,2,3,4,5,6,
I want to replace it to just this
1,2,3,4,5,6
Is this possible to do using just a simple query?
It is a bad idea to store lists of ids in rows. But, you are doing it. You can fix this by doing:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 1)
where accounts_used like '%,';
Instead, you should have a MediaAccounts table, with one row per media and account pair.
EDIT:
Possibly, the row ends with a ', ' rather than just a comma:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 2)
where accounts_used like '%, ';
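For comparison, here is the same cleanup done outside the database, plus the split into one row per media and account that a MediaAccounts table would hold. This is a Python sketch with hypothetical helper names and a made-up media id; unlike the queries above, it also drops any trailing whitespace.

```python
def trim_trailing_comma(accounts_used: str) -> str:
    """Drop trailing whitespace and then a single trailing comma, if present."""
    trimmed = accounts_used.rstrip()
    return trimmed[:-1] if trimmed.endswith(",") else trimmed


def to_rows(media_id: int, accounts_used: str) -> list:
    """Turn a comma-separated id list into (media_id, account_id) pairs."""
    cleaned = trim_trailing_comma(accounts_used)
    return [(media_id, int(part)) for part in cleaned.split(",") if part.strip()]


print(trim_trailing_comma("1,2,3,4,5,6,"))
print(to_rows(7, "68146, 67342, 60577, "))  # 7 is a made-up media id
```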
We faced a similar string-replacement issue with a large dataset of bibliographic entries, where we also needed to trim extraneous punctuation from many strings that had been imported verbatim from another system. Many of the records in our dataset also contained Unicode characters, so we needed a SQL query that would find the records to be updated and then update them in a way that was Unicode (multibyte character) compatible under MySQL.
In testing with our dataset, I found that searching for the relevant records using MySQL's LEFT() and RIGHT() substring functions performed better than using a LIKE pattern-match query. Additionally, MySQL's LENGTH() function returns the number of bytes in a string rather than the number of characters. The distinction is important when dealing with string fields that may contain multibyte character sequences, because MySQL's substring functions count characters, not bytes. Using LENGTH() therefore did not work in our case, where many of the strings under test contained multibyte characters. These requirements resulted in an UPDATE query with the format presented below:
UPDATE media
SET accounts_used = LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1)
WHERE RIGHT(accounts_used, 1) = ',';
The query selects records in the media table where the accounts_used column ends with a comma. The WHERE RIGHT(accounts_used, 1) = ',' clause does the filtering: RIGHT() returns a substring of the specified length starting from the right of the provided string or column. The LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1) call then performs the trim, removing the last character from the accounts_used value: LEFT() returns a substring of the specified length starting from the left of the provided string or column.
Here the use of the multibyte-aware CHAR_LENGTH() function, rather than the basic LENGTH() function, was important in our case because so many records in our dataset contained multibyte characters. If you are only dealing with ASCII or another single-byte character set, then LENGTH() would work perfectly; in that case CHAR_LENGTH() and LENGTH() return the same count and can be used interchangeably. When dealing with data that could contain multibyte characters, or if in doubt, use CHAR_LENGTH() instead, as it returns an accurate character count in either case.
Please note that the column and field names used in the example query above match those noted in the original question, and should be modified as needed to suit your own dataset needs.
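The byte-versus-character difference is easy to reproduce outside MySQL. The Python sketch below mimics the two variants of the expression, assuming the column is stored as UTF-8; the function names and the sample string are made up for the illustration.

```python
def trim_by_byte_length(value: str) -> str:
    """Mimic LEFT(value, LENGTH(value) - 1): count bytes, but cut by characters."""
    byte_length = len(value.encode("utf-8"))  # assumed UTF-8 storage
    return value[: byte_length - 1]


def trim_by_char_length(value: str) -> str:
    """Mimic LEFT(value, CHAR_LENGTH(value) - 1): count and cut by characters."""
    char_length = len(value)
    return value[: char_length - 1]


sample = "M\u00fcller,"  # "Müller," where the u-umlaut takes two bytes in UTF-8
print(trim_by_byte_length(sample))  # comma survives: the byte count overshoots by one
print(trim_by_char_length(sample))  # comma removed
```

With one two-byte character in the string, the byte-based length is one too high, so the "trim" keeps the whole string and the trailing comma is never removed.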

Why phone numbers in MySQL database are being truncated

I have created a database table in MySQL with two columns named "landPhone" and "mobilePhone" to store phone numbers (in the format 123-456-8000 for land and 098-765-6601 for mobile). The data type of both columns is set to VARCHAR(30). The data have been inserted into the table, but when I query it, I find that the phone numbers have been truncated. For the two values above, it shows only the first 3 digits (123) for landPhone and only the first 2 digits after removing the leading '0' (98) for mobilePhone.
Why is this happening?
Phone numbers are not actually numbers; they are strings that happen to contain digits (and, in your case, dashes). If you try to interpret one as a number, two things typically happen:
Leading zeros are forgotten.
Everything from the first non-digit to the end of the string is stripped off.
That sounds exactly like the result you're describing. Even if you end up stuffing the result into a string field, it's too late -- the data has already been corrupted.
Make sure you're not treating phone numbers as integers at any point in the process.
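The two effects listed above are enough to reproduce the values from the question. This Python sketch only imitates that kind of string-to-integer conversion (as_integer_prefix is a made-up helper, not a MySQL function):

```python
import re


def as_integer_prefix(text: str) -> int:
    """Keep the leading run of digits and read it as an integer; 0 if there is none."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else 0


print(as_integer_prefix("123-456-8000"))  # everything from the first dash on is gone
print(as_integer_prefix("098-765-6601"))  # the leading zero is gone as well
```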
You must use
insert into sample values('123-456-8000', '098-765-6601' )
instead of
insert into sample values(123-456-8000, 098-765-6601 )
see this SQLFiddle.
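One way to read the unquoted version: without quotes the dashes are minus signs, so each value is an arithmetic expression rather than a phone number. The Python sketch below shows the same arithmetic; it is only an illustration, and because a literal such as 098 is not valid Python, the leading-zero value is written as int("098").

```python
# The unquoted "phone numbers" are just subtractions.
land = 123 - 456 - 8000
mobile = int("098") - 765 - 6601

print(land, mobile)
```

What ends up stored after that depends on the column type, but the original digits are already lost before the insert happens.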
Thanks, all, for your solutions. As cHao suspected, it was me who made the mistake. When I first created the table, I declared the datatype of the phone columns as INT; later I corrected them to VARCHAR().
When I dropped the table and inserted the same data into the new table, it worked fine.
"That sounds exactly like the result you're describing. Even if you end up stuffing the result into a string field, it's too late -- the data has already been corrupted." (cHao)
A question, to understand: why doesn't MySQL override the previous datatype with the new one?

Using REGEX to alter field data in a mysql query

I have two databases, both containing phone numbers. I need to find all instances of duplicate phone numbers, but the formats of database 1 vary wildly from the format of database 2.
I'd like to strip out all non-digit characters and just compare the two 10-digit strings to determine if it's a duplicate, something like:
SELECT b.phone as barPhone, sp.phone as SPPhone FROM bars b JOIN single_platform_bars sp ON sp.phone.REGEX = b.phone.REGEX
Is such a thing even possible in a mysql query? If so, how do I go about accomplishing this?
EDIT: Looks like it is, in fact, a thing you can do! Hooray! The following query returned exactly what I needed:
SELECT b.phone, b.id, sp.phone, sp.id
FROM bars b JOIN single_platform_bars sp ON REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(b.phone,' ',''),'-',''),'(',''),')',''),'.','') = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')',''),'.','')
MySQL (before version 8.0, which added REGEXP_SUBSTR and REGEXP_REPLACE) doesn't support returning the "match" of a regular expression. The MySQL REGEXP operator returns 1 or 0, depending on whether an expression matched a regular expression test or not.
You can use the REPLACE function to replace a specific character, and you can nest those calls. But it would be unwieldy for all "non-digit" characters. If you want to remove spaces, dashes, and open and close parens, for example:
REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')','')
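The limitation of the nested REPLACE approach shows up as soon as a character appears that is not on the list. A small Python sketch of the difference (the function names and sample phone strings are made up for the illustration):

```python
import re


def strip_listed(phone: str) -> str:
    """Remove only the listed characters, like the nested REPLACE calls."""
    for character in (" ", "-", "(", ")"):
        phone = phone.replace(character, "")
    return phone


def digits_only(phone: str) -> str:
    """Remove every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", phone)


print(strip_listed("(555) 123-4567"), digits_only("(555) 123-4567"))
print(strip_listed("555.123.4567"), digits_only("555.123.4567"))  # periods slip through
```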
One approach is to create a user-defined function that returns just the digits from a string. But if you don't want to create a user-defined function...
This can be done in native MySQL. This approach is a bit unwieldy, but it is workable for strings of "reasonable" length.
SELECT CONCAT(IF(SUBSTR(sp.phone,1,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,1,1),'')
,IF(SUBSTR(sp.phone,2,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,2,1),'')
,IF(SUBSTR(sp.phone,3,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,3,1),'')
,IF(SUBSTR(sp.phone,4,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,4,1),'')
,IF(SUBSTR(sp.phone,5,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,5,1),'')
) AS phone_digits
FROM single_platform_bars sp
To unpack that a bit... we extract a single character from the first position in the string, check if it's a digit, if it is a digit, we return the character, otherwise we return an empty string. We repeat this for the second, third, etc. characters in the string. We concatenate all of the returned characters and empty strings back into a single string.
Obviously, the expression above checks only the first five characters of the string. You would need to extend it, adding a line for each position you want to check.
And unwieldy expressions like this can be included in a predicate (in a WHERE clause). (I've just shown it in the SELECT list for convenience.)
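The character-by-character logic, and its use as a join condition, can be sketched in Python as follows. The table contents, ids and function names are made up for the illustration; only the technique (test each position, keep digits, concatenate, then match on the result) comes from the answer above.

```python
def digits_by_position(phone: str) -> str:
    """Check each character in turn and keep it only if it is a digit 0-9."""
    return "".join(ch if "0" <= ch <= "9" else "" for ch in phone)


def find_duplicates(bars, single_platform_bars):
    """Pair up ids from both lists whose phones match once reduced to digits."""
    by_digits = {}
    for sp_id, sp_phone in single_platform_bars:
        by_digits.setdefault(digits_by_position(sp_phone), []).append(sp_id)
    pairs = []
    for bar_id, bar_phone in bars:
        for sp_id in by_digits.get(digits_by_position(bar_phone), []):
            pairs.append((bar_id, sp_id))
    return pairs


bars = [(1, "(555) 123-4567"), (2, "555-000-1111")]          # hypothetical rows
single_platform_bars = [(10, "555.123.4567"), (11, "5559998888")]
print(find_duplicates(bars, single_platform_bars))
```

Unlike the SQL expression, the loop here is not limited to a fixed number of positions, which is the main thing a stored function would buy you on the database side.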
Older versions of MySQL (before 8.0, which added REGEXP_REPLACE) don't support such string operations natively. You will either need to use a UDF or else create a stored function that iterates over a string parameter, concatenating to its return value every digit that it encounters.