How to update the 'enableNonSslPort' field of a Redis cache using the Azure CLI? - azure-cli

I tried the command below to update the enableNonSslPort field to true for a Redis cache named 'emp' whose resource group is 'group1':
az redis update --name 'emp' --resource-group 'group1' --enableNonSslPort 'true'
I'm getting the error:
Try this: 'az redis update --name --resource-group --vm-size '

This is because the az redis update command does not have an option for enabling/disabling the non-SSL port:
az redis update [--add]
[--force-string]
[--ids]
[--name]
[--remove]
[--resource-group]
[--set]
[--sku {Basic, Premium, Standard}]
[--subscription]
[--vm-size {c0, c1, c2, c3, c4, c5, c6, p1, p2, p3, p4, p5}]
You can use the az redis create command to do this instead (you cannot update the non-SSL port setting of an existing Azure Redis cache using the CLI); a concrete example follows the syntax below:
az redis create --location
--name
--resource-group
--sku {Basic, Premium, Standard}
--vm-size {c0, c1, c2, c3, c4, c5, c6, p1, p2, p3, p4, p5}
[--enable-non-ssl-port]
[--minimum-tls-version {1.0, 1.1, 1.2}]
[--redis-configuration]
[--replicas-per-master]
[--shard-count]
[--static-ip]
[--subnet-id]
[--subscription]
[--tags]
[--tenant-settings]
[--zones {1, 2, 3}]
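For example, a minimal sketch using the question's cache name and resource group (the location, SKU, and size values are placeholders you would replace with your own):
```sh
# Create the cache with the non-SSL port enabled; location/sku/vm-size below are placeholders.
az redis create --location westus2 \
                --name 'emp' \
                --resource-group 'group1' \
                --sku Basic \
                --vm-size c0 \
                --enable-non-ssl-port
```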

Related

Loading a CSV into Neo4j is time-consuming

I want to load a CDR CSV file with 648,000 records into Neo4j (4.4.10), but it has been running for about 4 days and still has not completed.
My CSV has 648,000 records with 7 columns, and the file is about 48 MB.
My computer has 100 GB of RAM and an Intel Xeon E5 CPU.
The columns of the CSV are:
OP_Name
TP_Name
Called_Number
OP_ANI
Setup_Time
Duration
OP_Price
The code I use to load the CSV into Neo4j is:
```Cypher
:auto load csv with headers from 'file:///cdr.csv' as line FIELDTERMINATOR ','
with line
where line['Called_Number'] is not null and line['OP_ANI'] is not null
with line['OP_ANI'] as OP_Phone,
(CASE line['OP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS OP_country,
line['Called_Number'] as Called_Phone,
(CASE line['TP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS TP_country,
line['Setup_Time'] as Setup_Time,
line['Duration'] as Duration,
line['OP_Price'] as OP_Price
call {
with OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
MERGE (c:Customer{phone: toInteger(Called_Phone)})
on create set c.country = TP_country
WITH c, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
CALL apoc.create.addLabels( c, [ c.country ] ) YIELD node
MERGE (c2:Customer{phone: toInteger(OP_Phone)})
on create set c2.country = OP_country
WITH c2, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price, c
CALL apoc.create.addLabels( c2, [ c2.country ] ) YIELD node
MERGE (c2)-[r:CALLED{setupTime: Setup_Time,
duration: Duration,
OP_Price: OP_Price}]->(c)
} IN TRANSACTIONS
```
How can I speed up the load operation?
MERGE acts as an upsert in Neo4j. So the statement:
MERGE (c:Customer{phone: toInteger(Called_Phone)})
checks whether a Customer node with the given phone number already exists. If it does, it performs the update; otherwise it creates the node. When there is a large number of nodes, this lookup can be very slow, and the CSV import will be slow overall. Creating an index on the phone property of Customer should do the trick. You can create the index like this:
CREATE INDEX phone IF NOT EXISTS FOR (n:Customer) ON (n.phone)
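A small sketch of how you might apply that before re-running the import: create the index, then wait for it to come online (db.awaitIndexes is a built-in procedure; the index name phone is arbitrary):
```Cypher
// Create the lookup index on Customer.phone, then wait until it is online
// before starting the CSV load so MERGE can use it.
CREATE INDEX phone IF NOT EXISTS FOR (n:Customer) ON (n.phone);
CALL db.awaitIndexes();
```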

Duplicate row but update a field in json

I'm trying to duplicate all rows in my table that contain signalVersion: prod, but in the duplicated row I'd like to set signalVersion to 0. There are two columns in my table, signal_key and signal_value, and both are JSON objects. signalVersion is a property of the object in the signal_key column. Below is an example of signal_key:
{
"signalType": "OCR_ITEM",
"signalVersion": "prod"
}
This is the code I've written so far but it's failing. Does anyone know why my syntax is incorrect?
insert into signals (signal_key, signal_value)
select signal_key -> '{"signalVersion": "0"}', signal_value
from signals
where signal_key #> '{"signalVersion": "prod"}';
You'll want to use the || operator to merge {"signalVersion": "0"} into signal_key, not -> (and @>, the containment operator, for the where filter):
insert into signals (signal_key, signal_value)
select signal_key || '{"signalVersion": "0"}', signal_value
from signals
where signal_key @> '{"signalVersion": "prod"}';
Alternatively, you could use jsonb_set (assuming the columns are jsonb; note that the replacement value must itself be valid JSON, so the string "0" is written '"0"'):
insert into signals (signal_key, signal_value)
select jsonb_set(signal_key, array['signalVersion'], '"0"'), signal_value
from signals
where signal_key @> '{"signalVersion": "prod"}';
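As a quick sanity check of the || merge (assuming jsonb), the right-hand object wins on duplicate keys:
```sql
-- Returns {"signalType": "OCR_ITEM", "signalVersion": "0"}
select '{"signalType": "OCR_ITEM", "signalVersion": "prod"}'::jsonb
       || '{"signalVersion": "0"}'::jsonb;
```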

How to fetch data from MySQL in parallel with Sequel Pro in R

I want to fetch data from MySQL with Sequel Pro in R, but when I run the queries it takes ages.
Here is my code:
old_value<- data.frame()
new_value<- data.frame()
counter<- 0
for (i in 1:length(short_list$id)) {
mydb = OpenConn(dbname = '**', user = '**', password = '**', host = '**')
query <- paste0("select * from table where id IN (",short_list$id[i],") and country IN ('",short_list$country[i],"') and date >= '2019-04-31' and `date` <= '2020-09-1';", sep = "" )
temp_old <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1)
query <- paste0("select * from table2 where id IN (",short_list$id[i],") and country IN ('",short_list$country[i],"') and date >= '2019-04-31' and `date` <= '2020-09-1';", sep = "" )
temp_new <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1)
RMySQL::dbDisconnect(mydb)
new_value<- rbind(temp_new,new_value)
old_value<- rbind(temp_old,old_value)
counter=counter+1
base::print(paste("completed for ",counter),sep="")
}
Is there any way I can write this more efficiently and run the queries faster? I have around 5,000 rows that have to go through the loop. The query itself works, but it takes a long time.
I have tried the following, but it still gives me an error:
#parralel computing
clust <- makeCluster(length(6))
clusterEvalQ(cl = clust, expr = lapply(c('data.table',"RMySQL","dplyr","plyr"), library, character.only = TRUE))
clusterExport(cl = clust, c('config','short_list'), envir = environment())
new_de <- parLapply(clust, short_list, function(id,country) {
for (i in 1:length(short_list$id)) {
mydb = OpenConn(dbname = '*', user = '*', password = '*', host = '**')
query <- paste0("select * from table1 where id IN (",short_list$id[i],") and country IN ('",short_list$country[i],"') and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1';", sep = "" )
temp_data <- RMySQL::dbFetch(RMySQL::dbSendQuery(mydb, query), n = -1) %>% data.table::data.table()
RMySQL::dbDisconnect(mydb)
return(temp_data)}
})
stopCluster(clust)
gc(reset = T)
new_de <- data.table::rbindlist(new_de, use.names = TRUE)
I have also defined short_list as a list, as follows:
short_list<- as.list(short_list)
and short_list contains:
id country
2 US
3 UK
... ...
However, it gives me this error:
Error in checkForRemoteErrors(val) :
one node produced an error: object 'i' not found
However, when I remove i from id[i] and country[i], it only gives me the result for the first row instead of the results for all ids and countries.
I think an alternative is to upload the ids you need into a temporary table, and query for everything at once.
tmptable <- "mytemptable"
dbWriteTable(conn, tmptable, short_list, create = TRUE)
alldat <- dbGetQuery(conn, paste("
select t1.*
from ", tmptable, " tmp
left join table1 t1 on tmp.id=t1.id and tmp.country=t1.country
where t1.`date` >= '2019-04-31' and t1.`date` <= '2020-09-1'"))
dbExecute(conn, paste("drop table", tmptable))
(Many DBMSes use a leading # to indicate a temporary table that is only visible to the local user, is much less likely to clash in the schema namespace, and is automatically cleaned up when the connection is closed. I generally encourage the use of temp tables here; check with your DB docs, schema, and/or DBA for more info.)
The order of tables is important: by pulling everything from mytemptable and then left joining table1 onto it, we are effectively filtering out any data from table1 that does not have a matching id and country.
This doesn't solve the speed of data download, but some thoughts on that:
Each time you iterate through the queries, you have not-insignificant overhead; if there's a lot of data then this overhead should not be huge, but it's still there. Using a single query will reduce this overhead significantly.
Query time can also be affected by any index(ices) on the tables. Outside the scope of this discussion, but might be relevant if you have a large-ish table. If the table is not indexed efficiently (or the query is not structured well to use those indices), then each query will take a finite amount of time to "compile" and return data. Again, overhead that will be reduced with a single more-efficient query.
Large queries might benefit from using the command-line tool mysql; it is about as fast as you're going to get, and might iron over any issues in RMySQL and/or DBI. (I'm not saying they are inefficient, but it is unlikely that a free open-source driver will be faster than MySQL's own command-line utility.)
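For illustration only, a hedged sketch of such an export (host, credentials, database, and query are placeholders):
```sh
# Dump a query result as tab-separated text straight to a file; replace the
# placeholder host/user/database/query with your own.
mysql -h myhost -u myuser -p mydb --batch \
      -e "select * from table1 where country = 'US'" > table1_extract.tsv
```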
As for doing this in parallel ...
You're using parLapply incorrectly. It accepts a single vector/list and iterates over each object in that list. You might use it to iterate over the indices of a frame, but you cannot use it to iterate over multiple columns within that frame. This is exactly like base R's lapply.
Let's show what is going on when you do your call. I'll replace it with lapply (because debugging in multiple processes is difficult).
# parLapply(clust, mtcars, function(id, country) { ... })
lapply(mtcars, function(id, country) { browser(); 1; })
# Called from: FUN(X[[i]], ...)
debug at #1: [1] 1
id
# [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2
# [24] 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
country
# Error: argument "country" is missing, with no default
Because the argument (mtcars here, short_list in yours) is a data.frame, and therefore a list-like object, lapply (and parLapply) operate on one column at a time. You were hoping that it would "unzip" the data, applying the first column's value to id and the second column's value to country. In fact, there is a function that does this: Map (and parallel's clusterMap, as I suggested in my comment). More on that later.
The intent of parallelizing things is to not use the for loop inside the parallel function. If short_list has 10 rows, and if your use of parLapply were correct, then you would be querying all rows 10 times, making your problem significantly worse. In pseudo-code, you'd be doing:
parallelize for each row in short_list:
# this portion is run simultaneously in 10 different processes/threads
for each row in short_list:
query for data related to this row
Two alternatives:
Provide a single argument to parLapply representing the rows of the frame.
new_de <- parLapply(clust, seq_len(NROW(short_list)), function(rownum) {
mydb = OpenConn(dbname = '*', user = '*', password = '*', host = '**')
on.exit({ DBI::dbDisconnect(mydb) })
tryCatch(
DBI::dbGetQuery(mydb, "
select * from table1
where id=? and country=?
and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1'",
params = list(short_list$id[rownum], short_list$country[rownum])),
error = function(e) e)
})
Use clusterMap for the same effect.
new_de <- clusterMap(clust, function(id, country) {
mydb = OpenConn(dbname = '*', user = '*', password = '*', host = '**')
on.exit({ DBI::dbDisconnect(mydb) })
tryCatch(
DBI::dbGetQuery(mydb, "
select * from table1
where id=? and country=?
and source_event_date >= date >= '2019-04-31' and `date` <= '2020-09-1'",
params = list(id, country)),
error = function(e) e)
}, short_list$id, short_list$country)
If you are not familiar with Map, it is like "zipping" together multiple vectors/lists. For example:
myfun1 <- function(i) paste(i, "alone")
lapply(1:3, myfun1)
### "unrolls" to look like
list(
myfun1(1),
myfun1(2),
myfun1(3)
)
myfun3 <- function(i,j,k) paste(i, j, k, sep = '-')
Map(f = myfun3, 1:3, 11:13, 21:23)
### "unrolls" to look like
list(
myfun3(1, 11, 21),
myfun3(2, 12, 22),
myfun3(3, 13, 23)
)
Some liberties I took in that adapted code:
I shifted from the dbSendQuery/dbFetch double-tap to a single call to dbGetQuery.
I'm using DBI functions, since DBI functions provide a superset of what each driver's package provides. (You're likely using some of it anyway, perhaps without realizing it.) You can switch back with no issue.
I added tryCatch, since errors can sometimes be difficult to deal with in parallel processes. This means you'll need to check the return value from each of your processes to see whether it inherits(ret, "error") (problem) or is.data.frame (normal); a short sketch of that check follows this list.
I used on.exit so that even if there's a problem, the connection closure should still occur.
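A minimal sketch of that post-processing step (assuming new_de is the list returned by either alternative above):
```r
# Separate failed queries from successful ones, then bind the data frames.
errs <- Filter(function(x) inherits(x, "error"), new_de)
good <- Filter(is.data.frame, new_de)
length(errs)  # how many queries raised an error
combined <- data.table::rbindlist(good, use.names = TRUE)
```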

Substring_index between 2 delimiters

I am trying to isolate a string in my database that is located between two backslashes (\).
Z:\PHR Archives\2016\08-August\2016-08-14 Grasshopper Newfoundland\0059 - Jimmy - SGB05-09 - Bon Jovi - Runaway.mp3
Above is a sample row -- I need "2016-08-14 Grasshopper Newfoundland"
Basically everything after the 4th \ up until the next \
I tried
substring_index(substring_index(filename, '\\', -3), '\\', 1)
But it's only working on some rows --
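One sketch of an approach is to count from the left instead of from the right; this assumes the target is always the 5th path segment (i.e. everything after the 4th \), and your_table is a placeholder for the real table name:
```sql
-- Take the first 5 backslash-separated segments, then keep only the last of
-- them. Counting from the left is stable even when rows have different
-- numbers of trailing segments, which is why counting from the right (-3)
-- only works on some rows.
SELECT substring_index(substring_index(filename, '\\', 5), '\\', -1) AS folder
FROM your_table;
```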

Using AWK to merge two files based on multiple conditions

I know this question has been asked several times before. Here is one example:
Using AWK to merge two files based on multiple columns
My goal is to print out columns 2, 4, 5 and 7 of file_b and columns 17 and 18 of file_a if the following match occurs:
Columns 2, 6 and 7 of file_a.csv matches with Columns 2, 4 and 5 of file_b.csv respectively.
But no matter how much I try, I can't get it to work for my case. Here are my two files:
file_a.csv
col2, col6, col7, col17, col18
a, b, c, 145, 88
e, f, g, 101, 96
x, y, z, 243, 222
file_b.csv
col2, col4, col5, col7
a, b, c, 4.5
e, f, g, 6.3
x, k, l, 12.9
Output should look like this:
col2, col4, col5, col7, col17, col18
a, b, c, 4.5, 145, 88
e, f, g, 6.3, 101, 96
I tried this:
awk -F, -v RS='\r\n' 'NR==FNR{key[$2 FS $6 FS $7]=$17 FS $18;next} {if($2 FS $4 FS $5 in key); print $2 FS $4 FS $5 FS $7 FS key[$2 FS $6 FS $7]}' file_a.csv file_b.csv > out.csv
Currently the output I am getting is:
col2, col4, col5, col7,
a, b, c, 4.5,
e, f, g, 6.3,
In other words, col17 and col18 from file_a are not showing up.
Yesterday I asked a related question where I was having issues with line breaks. That got answered and solved but now I think this problem is related to checking the if condition.
Update:
I am sharing links to truncated copies of the actual data. The only difference between these files and the actual ones is that the real ones have millions of rows; these have only 10 rows each.
file_a.csv
file_b.csv
Please try this (GNU awk):
awk 'BEGIN{RS="\r\n";FS=OFS=",";SUBSEP=FS}NR==FNR{arr[$2,$6,$7]=$17 FS $18;next} {if(arr[$2,$4,$5]) print $2,$4,$5,$7,arr[$2,$4,$5]}' file_a.csv file_b.csv
This is where the BEGIN block kicks in, and OFS along with it.
When we are printing out many fields separated by the same thing, we can set OFS and simply put commas between the things we want to print.
There's no need to check key in arr once you've assigned a value for that key in the array:
by default, when arr[somekey] hasn't been assigned before, it's empty/"", which evaluates to false in awk (0 in a scalar context), while a non-empty string evaluates to true (there are no literal true and false values in awk).
(You also used the wrong array name: $2,$6,$7 is the key into the array arr here. It's confusing to use key as the array name.)
You can test the concept with something simple like this:
awk 'BEGIN{print arr["newkey"]}'
You don't need an input file to execute a BEGIN block.
Also, you can use quotes sometimes to avoid confusion and underlying problems.
Update:
Your files actually end in \n; if you can't be sure what the line ending is, use this:
awk 'BEGIN{RS="\r\n|\n|\r";FS=OFS=",";SUBSEP=FS}NR==FNR{arr[$2,$6,$7]=$17 FS $18;next} {if(arr[$2,$4,$5]) print $2,$4,$5,$7,arr[$2,$4,$5]}' file_a.csv file_b.csv
or this (This one will ignore empty lines):
awk 'BEGIN{RS="[\r\n]+";FS=OFS=",";SUBSEP=FS}NR==FNR{arr[$2,$6,$7]=$17 FS $18;next} {if(arr[$2,$4,$5]) print $2,$4,$5,$7,arr[$2,$4,$5]}' file_a.csv file_b.csv
Also, it's better to convert the files first to avoid such situations:
sed -i 's/\r//' files
Or you can use the dos2unix command:
dos2unix file
It's a handy command-line tool that does exactly that.
You can install it if it's not on your system yet.
Once converted, you don't need to assign RS in normal situations.
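For example, once the files have plain \n endings, the same join works with awk's default record separator (a sketch; the logic is otherwise unchanged):
```sh
awk 'BEGIN{FS=OFS=","}
     NR==FNR{arr[$2,$6,$7]=$17 FS $18; next}
     arr[$2,$4,$5]{print $2,$4,$5,$7,arr[$2,$4,$5]}' file_a.csv file_b.csv
```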
$ awk 'BEGIN {RS="\r\n"; FS=OFS=","}
NR==FNR {a[$2,$6,$7]=$17 OFS $18; next}
($2,$4,$5) in a {print $2,$4,$5,$7,a[$2,$4,$5]}' file1 file2 > output
Your main issue is that in the array lookup, the index should be built from the second file's key columns, not the first file's. Also, the semicolon after the if condition is wrong: it terminates the if, so the print runs unconditionally. The rest is cosmetics only.
Not sure if you want the output \r\n-terminated; if so, set ORS=RS as well, otherwise it's newline only.
Since you have mentioned that the file is huge, you can give Perl a try, if that is an option.
The files are assumed to contain \r line endings.
$ cat file_a.csv
col2, col6, col7, col17, col18
a, b, c, 145, 88
e, f, g, 101, 96
x, y, z, 243, 222
$ cat file_b.csv
col2, col4, col5, col7
a, b, c, 4.5
e, f, g, 6.3
x, k, l, 12.9
$ perl -F, -lane 'BEGIN { %kv=map{chomp;chop;@a=split(",");"$a[0],$a[1],$a[2]"=>"$a[3]"} qx(cat file_b.csv) } if($.>1){ $x="$F[0],$F[1],$F[2]";chomp($F[-1]);print "$x,$kv{$x}",join(",",@F[-2,-1]) if $kv{$x} } ' file_a.csv
a, b, c, 4.5 145, 88
e, f, g, 6.3 101, 96
$