I have a column query_params of type TEXT, but the values are stored as unicode string literals. Each value in the column is prefixed with a u, and I'm struggling to remove it.
Is there a way to remove the u from the values and convert each dictionary of values to columns?
For example, the query SELECT query_params FROM api_log LIMIT 2 returns two rows:
{
u'state': u'CA',
u'page_size': u'1000',
u'market': u'Western',
u'requested_at': u'2014-10-28T00:00:00+00:00'
},
{
u'state': u'NY',
u'page_size': u'1000',
u'market': u'Eastern',
u'requested_at': u'2014-10-28T00:10:00+00:00'
}
Is it possible to handle unicode in Postgres and convert the values to columns:
state | page_size | market | requested_at
------+-----------+----------+---------------------------
CA | 1000 | Western | 2014-10-28T00:00:00+00:00
NY | 1000 | Eastern | 2014-10-28T00:10:00+00:00
Thanks for any help.
Remove the u prefixes and replace the single quotes with double quotes to get properly formatted JSON. Then you can use the ->> operator to extract its attributes:
select
v->>'state' as state,
v->>'page_size' as page_size,
v->>'market' as market,
v->>'requested_at' as requested_at
from (
select regexp_replace(query_params, 'u\''([^\'']*)\''', '"\1"', 'g')::json as v
from api_log
) s;
Test the solution in SqlFiddle.
Read about POSIX Regular Expression in the documentation.
Find an explanation of the regexp expression in regex101.com.
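To check the substitution outside the database, here is a minimal Python sketch of the same rewrite. The function name parse_query_params is hypothetical, and the sketch assumes, like the regexp above, that no key or value contains a single quote.

```python
import json
import re

def parse_query_params(text):
    """Turn a u'...'-style dictionary string into a Python dict via JSON."""
    # Same idea as the regexp_replace above: u'abc' -> "abc".
    # Assumption: keys and values never contain a single quote.
    as_json = re.sub(r"u'([^']*)'", r'"\1"', text)
    return json.loads(as_json)

row = ("{u'state': u'CA', u'page_size': u'1000', "
       "u'market': u'Western', u'requested_at': u'2014-10-28T00:00:00+00:00'}")
params = parse_query_params(row)
print(params["state"], params["page_size"], params["market"], params["requested_at"])
```

Once the text parses as JSON, each key can be read like a column, which is what the ->> operator does on the Postgres side.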
I have a list of dictionaries within a pandas column to designate landing pages for a particular keyword.
keyword | 07-31-2019 | landing_pages |
cloud api | 50 | [{'url' : 'www.example.com', 'date' : '07-31-2019'}, {'url' ... ]|
database | 14 | [{'url' : 'www.example.com/2', 'date' : '08-30-2019'} ... ]|
*There are actually many date columns, but I've only shown 1 for example.
My issue is that I already have columns for each date, so I want to extract the landing pages as a list and have that as a new column.
keyword | 07-31-2019 | landing_pages
cloud api | 50 | www.example.com, www.example.com/other
database | 14 | www.example.com/2, www.example.com/3
So far, I've tried using json_normalize, which gave me a new table of dates and landing pages. I've tried getting the values with list comprehension, but that gave me the wrong result as well. One way I can think of is to use loops to solve the problem, but I'm concerned that's not efficient. How can I do this efficiently?
Use a generator expression with join to extract the url values (if the data are dictionaries):
df['landing_pages'] = df['landing_pages'].apply(lambda x: ', '.join(y['url'] for y in x))
print (df)
keyword 07-31-2019 landing_pages
0 cloud api 50 www.example.com
1 database 14 www.example.com/2
If that does not work because the values are string representations of dictionaries:
import ast
df['landing_pages'] = df['landing_pages'].apply(lambda x: ', '.join(y['url'] for y in ast.literal_eval(x)))
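To see what ast.literal_eval does here without a DataFrame, here is a small sketch on a single cell value. The sample string and the helper name join_urls are made up for illustration.

```python
import ast

def join_urls(cell):
    """Parse a string repr of a list of dicts and join the 'url' values."""
    pages = ast.literal_eval(cell)  # evaluates Python literals only, not arbitrary code
    return ', '.join(page['url'] for page in pages)

cell = ("[{'url': 'www.example.com', 'date': '07-31-2019'}, "
        "{'url': 'www.example.com/other', 'date': '08-30-2019'}]")
print(join_urls(cell))
```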
EDIT: If you want the url with the most recent date, create a DataFrame from the dictionaries, adding the index value of each row under a new key i. Then convert the date strings to datetimes, use DataFrameGroupBy.idxmax to get the index of the maximum datetime per group, select those rows with DataFrame.loc, and finally assign the url column to the original DataFrame:
L = [dict(x, **{'i':k}) for k, v in df['landing_pages'].items() for x in v]
df1 = pd.DataFrame(L)
df1['date'] = pd.to_datetime(df1['date'])
df['url by max date'] = df1.loc[df1.groupby('i')['date'].idxmax()].set_index('i')['url']
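The same selection can be sketched in plain Python, which may help when checking the pandas result on a few rows. This is not the pandas approach above, only the logic it implements for a single row. The helper name and sample data are hypothetical, and the dates are assumed to be in MM-DD-YYYY format as in the question.

```python
from datetime import datetime

def url_by_max_date(pages):
    """Return the url of the entry with the most recent date."""
    latest = max(pages, key=lambda p: datetime.strptime(p['date'], '%m-%d-%Y'))
    return latest['url']

pages = [
    {'url': 'www.example.com', 'date': '07-31-2019'},
    {'url': 'www.example.com/other', 'date': '08-30-2019'},
]
print(url_by_max_date(pages))
```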
I am trying to get the integer value between two specific strings, but I am a bit stuck.
Example full string:
"The real ABC4_ string is probably company secret."
I need to get the "4" between "ABC" and "_". First I came up with the following script:
select substring_index(substring_index('The real ABC4_ string is probably company secret.', 'ABC', -1),'_', 1);
It gives me 4, perfect! But the problem is that it fails if ABC occurs more than once in the string. I can't simply increase the counter either, since I don't know how many times it will occur in the future. I have to get the first occurrence of this regex: ABC[DIGIT]_
I've seen the REGEXP_SUBSTR function, but since we use a MySQL version older than 8.0, I can't use it either.
Any suggestions?
Thanks!
Without using Regex, here is an approach using LOCATE(), and other string functions:
SET @input_string = 'The real ABC4_ string is probably company secret.';
SELECT TRIM(LEADING 'ABC'
FROM SUBSTRING_INDEX(
SUBSTR(@input_string FROM
LOCATE('ABC', @input_string)
)
,'_', 1
)
) AS number_extracted;
| number_extracted |
| ---------------- |
| 4 |
View on DB Fiddle
Another way of (ab)using the LOCATE() function:
select substr('The real ABC4_ string is probably company secret.',
locate('ABC', 'The real ABC4_ string is probably company secret.') + 3,
locate('_','The real ABC4_ string is probably company secret.') -
locate('ABC', 'The real ABC4_ string is probably company secret.') - 3) AS num;
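A rough Python sketch of the LOCATE()-style logic, which can be handy for checking edge cases before writing the SQL. The function name is hypothetical. Unlike the second query, it searches for the underscore only after the first ABC. Like both queries, it looks only at the first ABC and does not verify that a digit sits in between.

```python
def extract_after_abc(text, start_marker='ABC', end_marker='_'):
    """Return the text between the first start_marker and the next end_marker."""
    start = text.find(start_marker)        # like LOCATE('ABC', ...), but 0-based
    if start == -1:
        return None
    start += len(start_marker)             # skip past 'ABC'
    end = text.find(end_marker, start)     # first '_' after the marker
    if end == -1:
        return None
    return text[start:end]

print(extract_after_abc('The real ABC4_ string is probably company secret.'))
```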
The select below requires:
Dollar amount field 11 characters with leading zeros
No comma, decimal, nor negative sign
When the leading-zero padding and the REPLACE of the decimal, comma, and negative sign are working, I receive an error that varchar cannot be used with SUM. When I CAST as NUMERIC, all the commas, decimals, and negative signs show.
CASE WHEN psg_postingtransactions.[AMOUNT] != 0 THEN CAST(REPLACE(REPLACE(REPLICATE('0',12-LEN(RTRIM(psg_postingtransactions.[AMOUNT])))+RTRIM(psg_postingtransactions.[AMOUNT]),'.',''), '-', '0') AS NUMERIC) ELSE '00000000000' END
In MySQL try this:
LPAD(FLOOR(ABS(AMOUNT)*100),11,'0')
However, your SQL looks like T-SQL (MS SQL Server) and not MySQL; see this SQL Fiddle
Assuming you do want it for T-SQL, try this:
RIGHT(CONCAT('00000000000',CAST(ABS([AMOUNT])*100 AS INT)),11) AS LEADZERO
ABS() removes the minus
*100 will remove 2 decimals once result is cast to integer
CONCAT() places string of zeros in front of the integer
RIGHT(...,11) takes only the 11 chars of the string we need for output
Or you can use FLOOR() instead of the CAST():
RIGHT(CONCAT('00000000000',floor(ABS([AMOUNT])*100)),11) AS LEADZERO
CREATE TABLE psg_postingtransactions
([AMOUNT] decimal(11,2))
;
INSERT INTO psg_postingtransactions
([AMOUNT])
VALUES
(0),
(123.45),
(45678.1),
(-12.56)
;
Query 1:
SELECT
RIGHT(CONCAT('00000000000',cast(ABS([AMOUNT])*100 AS INT)),11) AS LEADZERO
FROM psg_postingtransactions
Results:
| LEADZERO |
|--------------|
| 00000000000 |
| 00000012345 |
| 00004567810 |
| 00000001256 |
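The same four steps (absolute value, times 100, integer, left-pad and keep the rightmost 11 characters) can be sketched in Python to check values by hand. The function name lead_zero is hypothetical, and Decimal is used here as an assumption to mirror the decimal(11,2) column and avoid float rounding.

```python
from decimal import Decimal

def lead_zero(amount, width=11):
    """Format an amount as unsigned cents, left-padded with zeros."""
    cents = int(abs(Decimal(amount)) * 100)       # drop the sign, shift two decimals
    return str(cents).rjust(width, '0')[-width:]  # pad, then keep the rightmost chars

for value in ('0', '123.45', '45678.1', '-12.56'):
    print(lead_zero(value))
```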
I need a MySQL query that produces the expected output below from the input data. (In my data I get junk data ranging from \x128 to \x160 (hex) ASCII characters.) So I need a regex pattern that fetches only the data containing hex values and sets all remaining values to NULL, except for the key column NAME.
Input data :
**NAME PHONE ADDRESS**
anu 345#2 hyderabad
vinu 1234 raj^am
ram 234 vizag
kheer 233&3 vz1m
palni 1333 rap#d
Required output data:
**NAME PHONE ADDRESS**
anu 345#2 NULL
vinu NULL raj^am
kheer 233&3 NULL
palni NULL rap#d
You have to use two queries for this result: the first on the PHONE column and the second on the ADDRESS column.
1. UPDATE test1 SET Phone=NULL WHERE Phone REGEXP '[#^]'
2. UPDATE test1 SET Address=NULL WHERE Address REGEXP '[#^]'
HEX(phone) REGEXP '^(..)*[89ABCDEF]'
will match any phone containing non-ASCII bytes.
I suspect you did not mean x128, nor x160. Those numbers look more like decimal. If you want to catch >= 128 and < 160 (note: not <= 160):
HEX(phone) REGEXP '^(..)*[89]'
The REGEXP says:
^ -- anchored at start
(..)* -- skip any number of pairs of characters (remember they are hex)
[89] -- match 8x or 9x
-- ignore the rest of the column
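To illustrate why the pattern skips pairs of characters, here is a small Python sketch of the same check on raw bytes. The function name is hypothetical, and bytes.hex() stands in for MySQL's HEX().

```python
import re

def has_byte_in_80_9f(data):
    """True if any byte of data is in 0x80..0x9F (decimal 128..159)."""
    hex_text = data.hex()  # two hex characters per byte
    # Skip whole bytes (pairs), then require the next byte to start with 8 or 9.
    return re.match(r'^(..)*[89]', hex_text) is not None

print(has_byte_in_80_9f(b'raj\x85am'))  # contains byte 0x85
print(has_byte_in_80_9f(b'i'))          # hex is '69': the 9 is a second digit, not a first
```

Without the (..)* step, a plain search for 8 or 9 would also hit the second hex digit of ordinary ASCII bytes such as 'i' (hex 69).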
If I have a string that starts with a number, then contains non-numeric characters, casting this string to an integer in MySQL will cast the first part of the string, and give no indication that it ran into any problems! This is rather annoying.
For example:
SELECT CAST('123' AS UNSIGNED) AS WORKS,
CAST('123J45' AS UNSIGNED) AS SHOULDNT_WORK,
CAST('J123' AS UNSIGNED) AS DOESNT_WORK
returns:
+-------------+---------------+-------------+
| WORKS | SHOULDNT_WORK | DOESNT_WORK |
+-------------+---------------+-------------+
| 123 | 123 | 0 |
+-------------+---------------+-------------+
This doesn't make any sense to me, as clearly, 123J45 is not a number, and certainly does not equal 123. Here's my use case:
I have a field that contains (some malformed) zip codes. There may be mistypes, missing data, etc., and that's okay from my perspective. Because of another table storing Zip Codes as integers, when I join the tables, I need to cast the string Zip Codes to integers (I would have to pad with 0s if I was going the other way). However, if for some reason there's an entry that contains 6023JZ1, in no way would I want that to be interpreted as Zip Code 06023. I am much happier with 6023JZ1 getting mapped to NULL. Unfortunately, IF(CAST(zipcode AS UNSIGNED) <= 0, NULL, CAST(zipcode AS UNSIGNED)) doesn't work because of the problem discussed above.
How do I control for this?
Use a regular expression:
select (case when val rlike '^[0-9][0-9][0-9][0-9][0-9]$' then cast(val as unsigned)
end)
Many people consider it a nice feature that MySQL does not automatically produce an error when doing this conversion.
One option is to test for just the digit characters 0 through 9 for the entire length of the string:
zipstr REGEXP '^[0-9]+$'
Based on the result of that boolean, you could return the integer value, or a NULL.
SELECT IF(zipstr REGEXP '^[0-9]+$',zipstr+0,NULL) AS zipnum ...
(note: the addition of zero is an implicit conversion to numeric)
Another option is to do the conversion like you are doing, and cast the numeric value back to character, and compare to the original string, to return a boolean:
CAST( zipstr+0 AS CHAR) = zipstr
(note: this second approach does allow for a decimal point, e.g.
CAST( '123.4'+0 AS CHAR ) = '123.4' => 1
which may not be desirable if you are looking for just a valid integer.)
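The "digits only, otherwise NULL" rule from the first option can be sketched in Python as follows. The function name zip_to_int is hypothetical, and None plays the role of NULL.

```python
import re

def zip_to_int(zipstr):
    """Return the integer value only if the whole string is digits, else None."""
    if zipstr is not None and re.fullmatch(r'[0-9]+', zipstr):
        return int(zipstr)
    return None

for value in ('123', '123J45', 'J123', '6023JZ1', '06023'):
    print(value, zip_to_int(value))
```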