I have a table with approx. 1K rows and two columns. The first column is emp_id and the second is tel_num. The tel_num values are not formatted consistently: some examples are (555) 555-9876, +18763334455, and 505-999-8888x222, and some rows have no value. The goal is to format them all the same way, as 10 digits without the leading 1 or any extension.
The table looks like the following:
emp_id  | tel_num
---------------------------
Jon Doe | +18763334455
Cal Foe | 505-999-8888x222
Ho Moe  | nan
GI Joe  | 676.909.4321
I'm trying to turn it into this:
emp_id  | tel_format
---------------------------
Jon Doe | (876) 333-4455
Cal Foe | (505) 999-8888
Ho Moe  | nan
GI Joe  | (676) 909-4321
I'm using Databricks. My current attempt looks roughly like this:
def formatphone(ph_var):
    ...some process
    return formatted_ph

df = df.withColumn('tel_format', formatphone(df.tel_num))
I can't get it to work.
You can use the following function, assuming that all possible formats are shown in your sample data.
To use this function in withColumn(), you need to create a UDF from it.
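from pyspark.sql import functions as F
from pyspark.sql import types as T

@F.udf(returnType=T.StringType())
def format_telephone_number(phone_number):
    # Missing values (null or the literal string 'nan') stay missing.
    if phone_number is None or phone_number == 'nan':
        return None
    # '+18763334455': skip the '+' and the country code 1.
    if phone_number.startswith('+'):
        return '(' + phone_number[2:5] + ') ' + phone_number[5:8] + '-' + phone_number[8:12]
    # '505-999-8888x222': slicing up to index 12 drops the extension.
    if '-' in phone_number:
        return '(' + phone_number[0:3] + ') ' + phone_number[4:7] + '-' + phone_number[8:12]
    # '676.909.4321'
    if '.' in phone_number:
        return '(' + phone_number[0:3] + ') ' + phone_number[4:7] + '-' + phone_number[8:12]
    # Any other format is not handled.
    return None

df = df.withColumn('tel_format', format_telephone_number(df.tel_num))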
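If the data contains formats beyond the ones in the sample, such as (555) 555-9876 from the question, slicing by position will not be enough. A minimal sketch of a more tolerant approach is to strip everything except digits and then rebuild the number. The function name format_phone and the extension pattern (a trailing x or ext followed by digits) are assumptions for illustration, not part of the original answer.

```python
import re

def format_phone(raw):
    """Return '(AAA) BBB-CCCC' for a US-style number, or None if it cannot be parsed."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text == '' or text.lower() == 'nan':
        return None
    # Assumption: an extension looks like 'x222' or 'ext. 222' at the end of the string.
    text = re.sub(r'(?i)\s*(?:ext\.?|x)\s*\d+\s*$', '', text)
    # Keep digits only, so punctuation and spacing no longer matter.
    digits = re.sub(r'\D', '', text)
    # Drop a leading country code 1 from 11-digit numbers.
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]
    # Anything that is not exactly 10 digits is treated as unparseable.
    if len(digits) != 10:
        return None
    return f'({digits[:3]}) {digits[3:6]}-{digits[6:]}'

for sample in ['+18763334455', '505-999-8888x222', 'nan', '676.909.4321', '(555) 555-9876']:
    print(sample, '->', format_phone(sample))
```

This is plain Python, so it would still need to be wrapped in a UDF in the same way as the function above before it can be used in withColumn().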
I am working on simplifying the expression f = x'yz + xy'z + xyz' + xyz. Actually, it may not be this expression. The question is: simplify the boolean expression for a voting system, the system being: three people vote on multiple candidates, and two or more people should agree (true) on the candidate in order to pass. So I think the answer would be xy + yz + xz, but I can't figure out the steps in between. Can anyone explain?
From the idempotent law, we have x + x = x, and so xyz + xyz = xyz. Applying this principle, we can rewrite your expression as:
f = x'yz + xy'z + xyz' + xyz
=> f = x'yz + xy'z + xyz' + xyz + xyz + xyz --OR with xyz twice without affecting the value
=> f = x'yz + xyz + xy'z + xyz + xyz' + xyz --Rearrange
=> f = yz (x + x') + xz (y + y') + xy(z' + z) --Group
=> f = yz + xz + xy --Since x+x' = 1
That said, as the diagram shows, you can simply AND each pair of inputs together and then OR the results to get the same output. Doing this ensures that:
If any 2 of the 3 inputs are true, your overall result is true
When all 3 are true, the result is still true
The advantage of expressing it this way is that you can focus on one pair of inputs at a time, without worrying about the impact of the third one.
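As a sanity check on the derivation, the two expressions can be compared over all eight input combinations. The following is a small sketch; the function names original, simplified and majority are made up for illustration.

```python
from itertools import product

def original(x, y, z):
    # f = x'yz + xy'z + xyz' + xyz
    return (not x and y and z) or (x and not y and z) or (x and y and not z) or (x and y and z)

def simplified(x, y, z):
    # f = xy + yz + xz
    return (x and y) or (y and z) or (x and z)

def majority(x, y, z):
    # The voting rule itself: at least two of the three inputs are true.
    return (x + y + z) >= 2

# Enumerate the full truth table (2^3 = 8 rows) and compare all three definitions.
for x, y, z in product([False, True], repeat=3):
    assert original(x, y, z) == simplified(x, y, z) == majority(x, y, z)
    print(int(x), int(y), int(z), '->', int(simplified(x, y, z)))
```

If any row disagreed, the assertion would fail, so a clean run confirms that the simplified form matches both the original sum of products and the voting rule.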
A simple way without involved logical reasoning
Write a truth table. For three inputs, there are 2^3 = 8 rows.
Four rows correspond to the given terms in your sum-of-products expression.
Enter the eight values of your expression into a Karnaugh map:
Group adjacent 1-terms to blocks as shown.
A pair of cells can be merged into a bigger block, if they just differ in one input. This way, the blocks double their cell-count and reduce their input-count by one in every merge step.
Each of the resulting blocks corresponds to one implicant term in the minimized expression.
Drawing the map and finding the blocks can be done automatically using a nice online tool of Marburg University.
I just wanted to know if there is an easy way to do this when I have the following entries:
++++++++++++++++++
+ id + languages +
++++++++++++++++++
+ 1 + DE +
++++++++++++++++++
+ 2 + DE,EN +
++++++++++++++++++
+ 3 + FR +
++++++++++++++++++
and the value of the parameter in my procedure is 'DE,EN,FR', so that it finds all of the above entries.
After a lot of searching, I concluded that I have to iterate over all values in the parameter, split them with SUBSTRING, and call
FIND_IN_SET(splittedParam, `languages`)
for each of them.
Is there an easier (with shorter code) way?
To clarify how you should normalize this, I would suggest altering your table as follows:
ID | Language
-------------
1 | DE
2 | EN
2 | DE
3 | FR
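With one language per row, the lookup becomes a plain IN query and no string splitting is needed inside the database. The sketch below shows the idea using Python's built-in sqlite3 module as a stand-in for MySQL; the table name entry_language and the helper ids_for_languages are assumptions for illustration.

```python
import sqlite3

# In-memory database holding the normalized table from the answer.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entry_language (id INTEGER, language TEXT)')
conn.executemany(
    'INSERT INTO entry_language (id, language) VALUES (?, ?)',
    [(1, 'DE'), (2, 'EN'), (2, 'DE'), (3, 'FR')],
)

def ids_for_languages(param):
    """Return the ids that have at least one of the comma-separated languages."""
    langs = [p.strip() for p in param.split(',') if p.strip()]
    if not langs:
        return []
    # One placeholder per language keeps the query parameterized.
    placeholders = ','.join('?' for _ in langs)
    sql = (
        'SELECT DISTINCT id FROM entry_language '
        f'WHERE language IN ({placeholders}) ORDER BY id'
    )
    return [row[0] for row in conn.execute(sql, langs)]

print(ids_for_languages('DE,EN,FR'))  # [1, 2, 3]
```

The parameter is still split once, but in the calling code rather than row by row in the query, and the database can use an index on the language column.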
I want to parse a text using substring. The format we have for the text is like this:
N, Adele, A, 18
And the substring we do is like this:
SUBSTRING_INDEX(SUBSTRING_INDEX(text, ',', 2), ', ', -1) as 'Name',
SUBSTRING_INDEX(SUBSTRING_INDEX(text, ',', 4), ', ', -1) as 'Age',
The output we get is:
| Name | Age |
| Adele | 18 |
But we want to change the text format to:
N Adele, A 18
What would be the correct syntax so I can parse the text in position 1 (N Adele), use a space as the delimiter, and get just Adele? And then do the same for the next text (A 18)?
I tried doing
SUBSTRING_INDEX(SUBSTRING_INDEX(text, ' ', 1), ', ', -1) as 'Name',
But the output I got is just
| Name |
| N |
The output I was hoping for is like this:
| Name |
| Adele |
I'm presuming here that you want to change your original data structure and still be able to get the results out. You change your data structure to:
N Adele, A 18 -- etc
Because the name may consist of multiple space-separated names, my previous example is not correct.
You could instead trim off the N and A together with their trailing space, knowing that the prefix is always two characters long and always present, like this:
SUBSTRING(TRIM(SUBSTRING_INDEX(`text`, ',', 1)), 3) AS 'Name',
SUBSTRING(TRIM(SUBSTRING_INDEX(`text`, ',', -1)), 3) AS 'Age'
To get:
Name | Age
--------------------
Adele | 18
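The same logic (split on the comma, trim, then drop the two-character prefix) can be sketched outside SQL to see what each step produces. The function name parse_record is hypothetical, and the sketch assumes exactly one comma separating the two fields.

```python
def parse_record(text):
    """Split 'N Adele, A 18' into (name, age) by dropping the 2-character prefixes."""
    # Assumption: the record holds exactly two fields separated by one comma.
    name_part, age_part = text.split(',', 1)
    # strip() mirrors TRIM(); [2:] mirrors SUBSTRING(..., 3), which is 1-indexed in MySQL.
    name = name_part.strip()[2:]
    age = age_part.strip()[2:]
    return name, age

print(parse_record('N Adele, A 18'))     # ('Adele', '18')
print(parse_record('N Mary Ann, A 30'))  # ('Mary Ann', '30')
```

Because nothing is split on spaces, a name made of several words comes through intact.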
You can use
SELECT
    SUBSTRING(text, 3, INSTR(text, ',') - 3) AS Name,
    SUBSTRING(text, INSTR(text, ',') + 4) AS Age
FROM your_table;
as the positions of the field descriptors (N and A) are fixed (relative to the start of the string and to the comma). You can check the working query in this fiddle.
I'm trying to solve an equation with 5 unknowns in Mathcad 14. My equations look like this:
Given
0 = e
1 = d
0 = c
-1 = 81a + 27b + 9c + 3d + e
0 = 108a + 27b + 6c + d
Find(a,b,c,d,e)
Find(a,b,c,d,e) is marked as red and says "pattern match exception". What is the problem?
In Mathcad you need to do something similar to:
c:=0
d:=1
e:=0
a:=1
b:=1
Given
81*a + 27*b + 9*c + 3*d + e = -1
108*a + 27*b + 6*c + d = 0
Find(a,b,c,d,e) = (0,0,0,0,-1)
What I have done here is define the variables BEFORE the Solve Block (Given...Find). You have to give initial values that you think are close to the solution you require in order for the iteration to succeed.
Tips: To get the equals sign in the Solve Block, use Ctrl and '='. If you're looking to solve for 5 unknowns then you need 5 equations. The original post looked like you knew 3 of the variables and were looking for a and b; in that case you would do the following:
c:=0
d:=1
e:=0
a:=1
b:=1
Given
81*a + 27*b + 9*c + 3*d + e = -1
108*a + 27*b + 6*c + d = 0
Find(a,b) = (0.111,-0.481)
This has held c, d and e to their original values and iterated to solve for a and b only.
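The values for a and b can be cross-checked by hand, since with c, d and e fixed the two equations are linear in a and b. The sketch below solves the 2x2 system exactly with Cramer's rule; it is only a verification aid and not how Mathcad's Find works internally.

```python
from fractions import Fraction

# Known values from the question.
c, d, e = 0, 1, 0

# Move the known terms to the right-hand side:
#    81a + 27b = -1 - 9c - 3d - e
#   108a + 27b =      - 6c - d
a11, a12, r1 = 81, 27, -1 - 9 * c - 3 * d - e
a21, a22, r2 = 108, 27, -6 * c - d

# Cramer's rule for a 2x2 system.
det = a11 * a22 - a12 * a21
a = Fraction(r1 * a22 - a12 * r2, det)
b = Fraction(a11 * r2 - r1 * a21, det)

print(a, b)                                  # 1/9 -13/27
print(round(float(a), 3), round(float(b), 3))  # 0.111 -0.481
```

The exact results, a = 1/9 and b = -13/27, round to the 0.111 and -0.481 that Find returns.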
Hope this helps.
In a table named TRY I have a column ABC which has records with values like abc:30|def:g h i|j:k|l:m|n:o|p: |q: 0.25 |r:0.47|s:t u
I want to fetch the numeric value after r:. In the example above the value is r:0.47, but it can also be a value such as 123456.012596363.
I am not sure how to use patindex for this. Can anyone please help?
Many Thanks
Try this:
declare @abc nvarchar(100) = 'abc:30|def:g h i|j:k|l:m|n:o|p: |q: 0.25 |r:0.47|s:t u'
select
    substring(substring(@abc, charindex('r:', @abc) + 2, len(@abc)),
              1,
              charindex('|', substring(@abc, charindex('r:', @abc) + 2, len(@abc))) - 1)
Use this query for your table:
select
    substring(substring(abc, charindex('r:', abc) + 2, len(abc)),
              1,
              charindex('|', substring(abc, charindex('r:', abc) + 2, len(abc))) - 1)
from TRY
Final try:
select case when charindex('r:', abc) = 0 then abc else
    substring(substring(abc, charindex('r:', abc) + 2, len(abc)),
              1,
              charindex('|', substring(abc, charindex('r:', abc) + 2, len(abc))) - 1) end
from TRY
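To make the intent of the nested substring/charindex calls easier to follow, here is the same extraction sketched in Python. It splits the record into key:value fields first, which also avoids matching an r: that happens to sit at the end of a longer key. The function name value_after_r is hypothetical, and the sketch assumes fields are separated by | and keys by the first colon.

```python
def value_after_r(record):
    """Return the text stored under key 'r' in a 'k:v|k:v|...' record, or None."""
    fields = {}
    for part in record.split('|'):
        # Skip anything that is not a key:value pair.
        if ':' not in part:
            continue
        key, value = part.split(':', 1)
        fields[key.strip()] = value.strip()
    return fields.get('r')

sample = 'abc:30|def:g h i|j:k|l:m|n:o|p: |q: 0.25 |r:0.47|s:t u'
print(value_after_r(sample))                      # 0.47
print(value_after_r('a:1|r:123456.012596363'))    # 123456.012596363
```

The value is returned as a string so that long decimals keep all their digits; it can be converted to a numeric type afterwards if needed.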