$ is used as the root object for get_json_object. My JSON string already has $ in the name of a JSON key; how can I extract its value? I don't want to use json_tuple.
create external table testing_hive (records string);
insert into testing_hive values("{\"$num\":\"hey\"}");
select get_json_object(testing_hive.records, '$.$num') from testing_hive;
You can replace "$num" with something else without $ in it, for example "xx_num":
select get_json_object(regexp_replace(testing_hive.records,'\\"\\$num\\"','\\"xx_num\\"'), '$.xx_num') as num from testing_hive;
Result:
hey
You can also replace $ in all keys with some other prefix in a single regexp_replace:
regexp_replace(testing_hive.records,'\\"\\$(.*?\\":)','\\"xx_$1')
I included ": in the pattern to make sure it matches keys only, not values. Use '\\"$1' as the replacement instead of '\\"xx_$1' if you want to remove $ and leave the rest of the key as is (the opening quote is outside the capture group, so it has to stay in the replacement).
Hope you get the idea. Modify the regex pattern as needed.
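To sanity-check the key-renaming pattern outside Hive, here is a minimal Python sketch. It assumes Python's re module, where the backreference is \1 (or m.group(1)) rather than the $1 used by Hive's Java regex; the helper name rename_dollar_keys is made up for illustration.

```python
import json
import re

# Same idea as the Hive pattern: a quote, a literal $, then the rest of the key up to ":
DOLLAR_KEY = re.compile(r'"\$(.*?":)')

def rename_dollar_keys(raw, prefix="xx_"):
    """Replace the leading $ of every JSON key with `prefix` (hypothetical helper)."""
    return DOLLAR_KEY.sub(lambda m: '"' + prefix + m.group(1), raw)

record = '{"$num":"hey"}'
renamed = rename_dollar_keys(record)
value = json.loads(renamed)["xx_num"]
print(renamed, value)
```

Note that the lazy .*? can still run from a value that starts with $ up to the next key's ":, so treat this as a sketch rather than a JSON-aware rewrite.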
I'm using the following regex (https://regex101.com/r/Kt9sNj/1) in PHP to get all the files in the third level of a directory:
/^(\/[^\/]*){1,4}\/?$/m
Then if I have the following data:
/home/myuser/folder_example/first_file.txt
/home/myuser/folder_example/second_file.txt
/home/myuser/folder_example/third_file.txt
I get:
first_file.txt
second_file.txt
third_file.txt
I try to use this in a MySQL query that contains an array of a json object.
My Query is:
SELECT data->'$.files' AS File
FROM table
WHERE user = 'myuser';
And I get:
["/home/myuser/folder_example/first_file.txt","/home/myuser/folder_example/second_file.txt","/home/myuser/folder_example/third_file.txt"]
But when I use that regex on my sql query:
SELECT data->'$.files' AS File
FROM table
WHERE user = 'myuser'
AND data->'$.files' REGEXP '^(\/[^\/]*){1,4}\/?$';
I need to get this (all files under that directory):
["first_file.txt","second_file.txt","third_file.txt"]
It doesn't work. Do you know why?
The REGEXP operator only returns 1 if the pattern matches (0 otherwise); it does not extract anything, so a matching row still returns the full value.
In your pattern you are repeating a capturing group, which captures the last value of the iteration in group 1, but that value still contains a leading forward slash that you don't want in the output.
What you might do is match the first /, and then use the quantifier {3} with a non-capturing group to repeat, exactly 3 times, a part that ends in a /.
Then capture the filename in group 1 and refer to that group with '$1' in the replacement of REGEXP_REPLACE:
^/(?:[^/]*/){3}(\S+\.[^.\s]+)$
Regex demo | Mysql with replace demo
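As a rough illustration of the same pattern outside MySQL, the sketch below applies it to each element of the JSON array in Python. It assumes the array has already been fetched as a string; Python writes the backreference as \1 where MySQL's REGEXP_REPLACE uses $1.

```python
import json
import re

# Same pattern as above: skip exactly three directory levels, capture the file name.
FILE_IN_THIRD_LEVEL = re.compile(r'^/(?:[^/]*/){3}(\S+\.[^.\s]+)$')

files = json.loads(
    '["/home/myuser/folder_example/first_file.txt",'
    '"/home/myuser/folder_example/second_file.txt",'
    '"/home/myuser/folder_example/third_file.txt"]'
)

# Apply the pattern to each path, not to the whole JSON array string.
names = [FILE_IN_THIRD_LEVEL.sub(r'\1', path) for path in files]
print(json.dumps(names))
```

The key point is that the anchors ^ and $ only make sense per path, which is why the array is decoded first.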
I have one column (varchar) in a table that contains only a JSON string. I want to replace all keys in that column with "". How can I do that using SQL? My database is MySQL.
For example:
|--------------------------------------------------------------------|
| t_column |
|--------------------------------------------------------------------|
| {"name":"mike","email":"xxx#example.com","isManage":false,"age":22}|
|--------------------------------------------------------------------|
SELECT replace(t_column, regexp, "") FROM t_table
I expect:
mikexxx#example.comfalse22
How to write that regexp?
Start from
select t_column->'$.*' from test
This will return a JSON array of attribute values:
[22, "mike", "xxx#example.com", false]
This might be already all you need, and you can try something like
select *
from test
where t_column->'$.*' like '%mike%';
Unfortunately there seems to be no native way to join array values to a single string like JSON_ARRAY_CONCAT(). In MySQL 8.0 you can try REGEXP_REPLACE() and strip all JSON characters:
select regexp_replace(t_column->'$.*', '[" ,\\[\\]]', '') from test
which will return '22mikexxx#example.comfalse'.
If the values can contain one of those characters, they will also be removed.
Note: That isn't very reliable. But it's all I can do in a "simple" way.
See demo on db-fiddle.
I could be making it too simplistic, but this is just a mockup based on your comment. I can formalize it into a query if it fits your requirement.
Let's say you get your JSON string to this format where you replace all the double quotes and curly brackets and then add a comma at the end. After playing with replace and concat_ws, you are now left with:
name:mike,email:xxx#example.com,isManage:false,age:22,
With this format, every value is now preceded by a colon and followed by a comma, which is not true of the keys. Let's say you now want to see if this JSON string has the value "mike" in it. You could achieve this using
select * from your_table where json_col like '%:mike,%';
If you really want to solve the problem with your approach then the question becomes
What is the regex that selects all the undesired text from the string {"name":"mike","email":"xxx#example.com","isManage":false,"age":22} ?
Then the answer would be: {\"name\":\"|\",\"email\":\"|\",\"isManage\":|,\"age\":|}
But, as others have pointed out, I would actually approach the problem by parsing the JSON. Look up the functions json_value and json_query.
Hope I helped
PS: Pay close attention to how I phrased the bolded sentence. Any difference changes the problem.
EDIT:
If you want a more generic expression, something like select all the text that is not a value on a json-formatted string, you can use this one:
{|",|"\w+\":|"|,|}
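To see what the generic expression does, here is a small Python sketch that runs it against the sample string and compares it with simply parsing the JSON. The braces are escaped for safety, and the helper scalar_text is a made-up name; neither is part of the original answer.

```python
import json
import re

raw = '{"name":"mike","email":"xxx#example.com","isManage":false,"age":22}'

# Generic pattern from above: braces, key names with their colon, quotes and commas.
NOT_A_VALUE = re.compile(r'\{|",|"\w+":|"|,|\}')
stripped = NOT_A_VALUE.sub('', raw)

def scalar_text(value):
    """Render a JSON scalar as plain text (hypothetical helper)."""
    return value if isinstance(value, str) else json.dumps(value)

# Parsing alternative: decode the JSON and join the values in key order.
parsed = ''.join(scalar_text(v) for v in json.loads(raw).values())
print(stripped, parsed)
```

The regex version also strips quotes and commas that appear inside values, which the parsing version leaves alone.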
In order to clean up code for my Rails project, I moved all regex strings to MySQL. How can I add the string located in MySQL fields to my match method? Here's an example of what I'm trying to do:
foobar = []
regular_ex = StringDb.pluck(:id, :regex)
# :regex is a column that stores regex strings, e.g. '/[a-c]|[1,2,3]/'
regular_ex.each do |exp|
  if foo.match(exp[1])
    foobar << exp[0]
  end
end
Please let me know if my question is not clear.
Thanks in advance!
You can create new regex from string like this
Regexp.new str
Or using the regex interpolation
%r{#{regex_string}}
That would mean you would do something similar to this:
foobar = []
regular_ex = StringDb.pluck(:id, :regex)
# :regex is a column that stores regex strings, e.g. '[a-c]|[1,2,3]'
regular_ex.each do |exp|
  # exp is an [id, regex] pair, so build the Regexp from exp[1]
  if Regexp.new(exp[1]).match("stringToMatch")
    # do something
  end
end
Please note two things.
In order for this to work you should remove the starting and trailing slash from the regex in database.
Store '[a-c]|[1,2,3]' instead of '/[a-c]|[1,2,3]/'.
I'm not really sure why you decided to store all regexes in the database but it does not sound like a good idea.
I have a text-file of data in key-value pairs that I have managed to convert to a format where the key-value pairs are all separated by an underscore between them, and the key is separated from the value by a colon. I thought this format would be useful for keeping spaces intact within the data. Here's an example with the data substituted for ~~~~~~~s.
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~ ...etc
I want to convert this to a MySQL script to insert the data into a table. My problem is there are nullable fields that aren't included in every record. e.g. A record has a _TYPE1: and may or may not have a _TYPE2:
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~_ADDRESS:~~~~~~~ ...
... _DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~ ...
I thought to fix this by inserting _TYPE2: after every _TYPE1 without a _TYPE2:. Since there are only a few different possible types, I managed to select the _ after each _TYPE1:~~~~~~ without a TYPE2: following it. I used the following regex, where egtype is one example of a possible type:
(?<=_TYPE1:egtype)_(?!TYPE2:)
At this point, all I have to do is replace that _ with _TYPE2:_ and every field is present in every line, which makes it easy to convert every row to a MySQL insert statement! Unfortunately, Notepad++ is not replacing it when I click the Replace button. I'm not sure why.
Does anyone know why it wouldn't replace an _ with _TYPE2:_ using that particular regex? Or does anyone have any other suggestions on how to turn all this data into a MySQL insert script?
Regex
To do what you want, try this:
Find:
_TYPE1:[^_]+\K(?!.*_TYPE2)
Replace:
_TYPE2:
You can test it with your sample data and have it explained here.
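If you want to try the same replacement outside Notepad++, note that Python's built-in re module has no \K. A minimal sketch, assuming a capture group is an acceptable substitute, keeps the _TYPE1 field with \1 instead; the function name add_missing_type2 is made up for illustration.

```python
import re

# \1 keeps the _TYPE1 field; the lookahead skips lines that already contain _TYPE2.
ADD_TYPE2 = re.compile(r'(_TYPE1:[^_\n]+)(?!.*_TYPE2)')

def add_missing_type2(line):
    """Insert an empty _TYPE2: field after _TYPE1 when the line has none."""
    return ADD_TYPE2.sub(r'\1_TYPE2:', line)

with_type2 = "_DESCRIPTION:aaa_TYPE1:bbb_TYPE2:ccc_ADDRESS:ddd"
without_type2 = "_DESCRIPTION:aaa_TYPE1:bbb_ADDRESS:ddd"
print(add_missing_type2(with_type2))
print(add_missing_type2(without_type2))
```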
Python Script plugin
As a side note, I don't think it's possible to convert your data into SQL insert statements with the use of one and only one regular expression, and while I see what you are trying to do by adding fake TYPE2, I don't think it is your best option.
So, my suggestion is to use Notepad++'s Python Script plugin.
Install Python Script plugin, from Plugin Manager or from the official website.
Then go to Plugins > Python Script > New Script. Choose a filename for your new file (eg sql_insert.py) and copy the code that follows.
Run Plugins > Python Script > Scripts > sql_insert.py and a new tab will open with the desired result.
Script:
columns = [[]]
values = [[]]
current_line = 0

def insert(line, match):
    global current_line
    # Start a new pair of lists whenever the search moves on to the next line
    if line > current_line:
        current_line += 1
        columns.append([])
        values.append([])
    if match:
        i = 0
        # Groups alternate: even index is the column name, odd index is the value
        for m in match.groups():
            if i % 2 == 0:
                columns[line].append(m)
            else:
                values[line].append(m)
            i += 1

editor.pysearch("_([A-Z0-9]+):([^_\n]+)", insert)
notepad.new()
for line in range(len(columns)):
    editor.addText("INSERT INTO table (" + ",".join(columns[line]) + ") values (" + ",".join(values[line]) +");\n")
Note: I'm still learning Python and I've a feeling that this one could be written in a better way. Feel free to edit my answer or drop a comment if you can suggest improvements!
Example input:
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~_TYPE1:~~~~~~_TYPE2:~~~~~~_ADDRESS:~~~~~~~
_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~
Example output:
INSERT INTO table (ID,NAME,DESCRIPTION,TYPE1,TYPE2) values (~~~,~~~~~,~~~~~~~,~~~~~~,~~~~~~);
INSERT INTO table (ID,NAME,DESCRIPTION,TYPE1,TYPE2,ADDRESS) values (~~~,~~~~~,~~~~~~,~~~~~~,~~~~~~,~~~~~~~);
INSERT INTO table (ID,NAME,DESCRIPTION,TYPE1,ADDRESS) values (~~~,~~~~~,~~~~~~,~~~~~~,~~~~~~~);
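The same idea can be sketched in plain Python, without the Notepad++ plugin objects (editor, notepad). This is only an illustration: the function name to_insert and the default table name are assumptions, and values are emitted unquoted exactly as in the example output above.

```python
import re

# Same pattern as the plugin script: _KEY:value pairs, values contain no underscore.
PAIR = re.compile(r"_([A-Z0-9]+):([^_\n]+)")

def to_insert(line, table="table"):
    """Build one INSERT statement from a single _KEY:value line (hypothetical helper)."""
    pairs = PAIR.findall(line)
    columns = ",".join(key for key, _ in pairs)
    values = ",".join(value for _, value in pairs)
    return f"INSERT INTO {table} ({columns}) values ({values});"

sample = "_ID:~~~_NAME:~~~~~_DESCRIPTION:~~~~~~_TYPE1:~~~~~~_ADDRESS:~~~~~~~"
statement = to_insert(sample)
print(statement)
```

Because only the fields present on a line are listed, missing nullable fields such as TYPE2 need no placeholder.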
Try searching for (_TYPE1:)(\S\S\S\S\S\S)(_ADDRESS:)
and replacing with \1\2_TYPE2:~~~~~~\3
I tested this in Notepad++ with your data and it works.
Don't forget to change the Search Mode to "Regular expression".
To turn it into an INSERT script, keep using regular expressions as above: put parentheses around each field you want, then refer to the groups by \number in the replacement and move them around as needed. It should be fairly simple manual labor. Have fun.
For example, search for your whole line; here I am only handling DESCRIPTION, TYPE1, and TYPE2.
Search for, using regular expression mode:
(_DESCRIPTION)(:)(\S\S\S\S\S\S)(_TYPE1)(:)(\S\S\S\S\S\S)(_TYPE2)(:)(\S\S\S\S\S\S)
Then replace with something like:
INSERT INTO table1\(desc,type1,type2\)values\('\3','\6','\9'\); (in notepad++)
If this is a once-off problem then a two step process would work. First step would add a _TYPE2:SomeDefaultValue to every line. Step two would remove it from lines where it was not needed.
Step 1: Find what: $, Replace with: _TYPE2:xxx
Step 2: Find what: (_TYPE2:.*)_TYPE2:xxx$, Replace with: \1
In both steps select "regular expression" and un-select "dot matches newline". Also change xxx to your default value.
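The two steps can be sketched in Python as follows. The default value xxx is the placeholder from the steps above, and the function name normalise is made up for illustration.

```python
import re

DEFAULT = "_TYPE2:xxx"  # xxx stands in for your real default value

def normalise(line):
    """Make sure a line carries a _TYPE2 field (hypothetical helper)."""
    # Step 1: append the default _TYPE2 field to the end of the line.
    line = line + DEFAULT
    # Step 2: drop the appended default again if the line already had a _TYPE2 field.
    return re.sub(r'(_TYPE2:.*)' + re.escape(DEFAULT) + r'$', r'\1', line)

has_type2 = "_TYPE1:a_TYPE2:b_ADDRESS:c"
lacks_type2 = "_TYPE1:a_ADDRESS:c"
print(normalise(has_type2))
print(normalise(lacks_type2))
```

Note that the default field ends up at the end of the line rather than directly after _TYPE1.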
I am having the following problem:
I have a table T which has a column Name with names. The names have the following structure:
A\\B\C
You can create one yourself like this:
create table T ( Name varchar(10));
insert into T values ('A\\\\B\\C');
select * from T;
Now if I do this:
select Name from T where Name = 'A\\B\C';
That doesn't work, I need to escape the \ (backslash):
select Name from T where Name = 'A\\\\B\\C';
Fine.
But how do I do this automatically to a string Name?
Something like the following won't do it:
select replace('A\\B\C', '\\', '\\\\');
I get: A\\\BC
Any suggestions?
Many thanks in advance.
You have to use a verbatim string. With it, your Replace call will look like this:
Replace(@"\", @"\\")
I hope this helps.
The literal A\\B\C must be coded as A\\\\B\\C, and the parameters of replace() need escaping too:
select 'A\\\\B\\C', replace('A\\\\B\\C', '\\', '\\\\');
output (see this running on SQLFiddle):
A\\B\C A\\\\B\\C
So there is little point in using replace. These two statements are equivalent:
select Name from T where Name = replace('A\\\\B\\C', '\\', '\\\\');
select Name from T where Name = 'A\\\\B\\C';
Using a regular expression will solve your problem.
The query below handles the given example.
1) S\\D\B
select * from T where Name REGEXP '[A-Z]\\\\\\\\[A-Z]\\\\[A-Z]$';
In case a segment may have more than one character:
2) D\\B\ACCC
select * from T where Name REGEXP '[A-Z]{1,5}\\\\\\\\[A-Z]{1,5}\\\\[A-Z]{1,5}$';
Note: I have used 5 as the maximum number of characters per segment, considering that the field size is 10 as stated in the create table query.
We can generalize it further. If this still does not meet your expectations, feel free to ask for my help.
You're confusing what's IN the database with how you represent that data in SQL statements. When a string in the database contains a special character like \, you have to type \\ to represent that character, because \ is a special character in SQL syntax. You have to do this in INSERT statements, but you also have to do it in the parameters to the REPLACE function. There are never actually any double slashes in the data, they're just part of the UI.
Why do you think you need to double the slashes in the SQL expression? If you're typing queries, you should just double the slashes in your command line. If you're generating the query in a programming language, the best solution is to use prepared statements; the API will take care of proper encoding (prepared statements usually use a binary interface, which deals with the raw data). If, for some reason, you need to perform queries by constructing strings, the language should hopefully provide a function to escape the string. For instance, in PHP you would use mysqli_real_escape_string.
But you can't do it by SQL itself -- if you try to feed the non-escaped string to SQL, data is lost and it can't reconstruct it.
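To illustrate the prepared-statement point, here is a minimal Python sketch. It uses the standard-library sqlite3 module purely as a stand-in, since it needs no server; a MySQL driver would typically use a different placeholder style such as %s, and SQLite does not treat backslashes in string literals the way MySQL does. The point is only that a bound parameter carries the raw string, so no doubling of backslashes is needed in the query.

```python
import sqlite3

# In-memory database as a stand-in for the MySQL table T (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("create table T (Name varchar(10))")

# Raw string: exactly the six characters A \ \ B \ C, no SQL-level escaping.
name = r"A\\B\C"

# The value travels as a bound parameter, so the SQL text contains no backslashes.
conn.execute("insert into T values (?)", (name,))
rows = conn.execute("select Name from T where Name = ?", (name,)).fetchall()
print(rows)
```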
You could use LIKE:
SELECT NAME FROM T WHERE NAME LIKE '%\\\\%';
I'm not exactly sure what you mean, but this should work.
select replace('A\\B\C', '\', '\\');
It's basically going to replace \ wherever encountered with \\ :)
Is this what you wanted?