I am trying to list all the prime numbers up to a specific number, e.g. 1000. The code gets slower as the number increases. I am pretty sure it is because of the for loop where (number - 1) is checked against every entry in prime_list. I need some advice on how I can decrease the processing time of the code for larger numbers. Thanks.
import time
t0 = time.time()
prime_list = [2]
number = 0
is_not_prime = 0
count = 0
while number < 1000:
    print(number)
    for i in range (2,number):
        count = 0
        if (number%i) == 0:
            is_not_prime = 1
        if is_not_prime == 1:
            for j in range (0,len(prime_list)):
                if(number-1)%prime_list[j] != 0:
                    count += 1
            if count == len(prime_list):
                prime_list.append(number-1)
                is_not_prime = 0
                count = 0
                break
    number += 1
print(prime_list)
t1 = time.time()
total = t1-t0
print(total)
Your solution, on top of being confusing, is very inefficient - O(n^3). Please, use the Sieve of Eratosthenes. Also, learn how to use booleans.
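Something like this (not optimal, just a mock-up). Essentially, you start with a list of all the numbers from 2 to 1000. Then, for each number still in the list, you remove every later number that is a multiple of it.
amount = 1000
# candidates 2..amount; list() is needed so that pop() works in Python 3
numbers = list(range(2, amount + 1))
i = 0
while i < len(numbers):
    n = i + 1
    while n < len(numbers):
        if numbers[n] % numbers[i] == 0:
            # numbers[n] is a multiple of numbers[i], so it cannot be prime
            numbers.pop(n)
        else:
            n += 1
    i += 1
print(numbers)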
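For comparison, here is a minimal sketch of the classic Sieve of Eratosthenes itself, which marks multiples in a boolean list instead of testing divisibility and popping elements. The function name primes_up_to is just an illustrative choice, not something from the question.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is found to be a multiple of a smaller prime
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Smaller multiples of p were already marked by smaller primes
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Because no division is performed and each composite is only crossed out by its prime factors, this should scale far better than the nested loops for larger limits.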
Finally, I was able to answer because your question isn't language-specific, but please tag the question with the language you're using in the example.
I have a table that has 45 columns for tax values
| Tax1 | Tax2 | .......... | Tax 44 | Tax45 |
I read in a variable length positional record that can contain zero to 90 values. The record is structured so that the first 3 characters are the tax code (values 001 - 045) and the next 7 characters are the tax value:
Examples:
0010013.990140005.00
0040002.00
0150001.150320002.200410014.250420012.990430000.500440001.750450004.68
What I would like to do is, for each record:
if ISNULL(record) or LEN(record) < 10 (3 characters for the code, 7 characters for the value)
    quit
else
    determine the number of 10-character sections
    for each 10-character section
        taxCode = SUBSTRING(section, 1, 3)
        taxValue = SUBSTRING(section, 4, 7)
        table.Tax(taxCode).Value = taxValue (ex: using the first example record, column Tax1 will hold a value of 0013.99, Tax14 will be 0005.00)
    next section
    all other Tax[n] columns will have a value of 0.00
end if
Is there a way to do this without having to create 45 variables, one for each corresponding column?
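To illustrate the idea outside of SQL, here is a rough Python sketch of the steps above that keeps the 45 values in a dictionary keyed by column name rather than in 45 separate variables. The names parse_tax_record, SECTION_LEN and NUM_TAXES are made up for this example, and it assumes every record is well formed (tax codes between 001 and 045). Unlike the pseudocode, it returns all zeros for an empty or too-short record instead of quitting.

```python
from decimal import Decimal

SECTION_LEN = 10  # 3-character tax code + 7-character tax value
NUM_TAXES = 45    # columns Tax1 .. Tax45


def parse_tax_record(record):
    """Return {column name: value} for Tax1..Tax45, defaulting to 0.00."""
    taxes = {f"Tax{n}": Decimal("0.00") for n in range(1, NUM_TAXES + 1)}
    if not record or len(record) < SECTION_LEN:
        return taxes  # nothing to parse, every column keeps its default
    # Walk the record one complete 10-character section at a time
    for pos in range(0, len(record) - SECTION_LEN + 1, SECTION_LEN):
        section = record[pos:pos + SECTION_LEN]
        code = int(section[:3])  # "014" -> 14
        taxes[f"Tax{code}"] = Decimal(section[3:])
    return taxes


row = parse_tax_record("0010013.990140005.00")
print(row["Tax1"], row["Tax14"], row["Tax2"])  # 13.99 5.00 0.00
```

The same lookup-by-name idea should carry over to whatever tool does the actual load, as long as it lets you address a column by a computed name.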
EDIT:
I apologize for the lack of clarity. I receive a flat file from our VMS database. This file has multiple record types per file (i.e. IT01, IT02, IT03, IT04, IT05, IT06, IT07). Each record type is on its own line. I read this file into a staging table, which separates the record type from the data on the line. For example (this is the record type I am referring to in my question):
IT06404034001005.000031013.000
This gets loaded into my staging table as:
RecordType | RecordData |
------------------------------------------
IT06 | 404034001005.000031013.000
The RecordData field is then able to be broken down further as:
ItemNumber | RecordData |
-------------------------------------
404034 | 001005.000031013.000
With a little bit of up-front work, I was able to create a script task to do exactly as I needed it to.
Step 1: Add a script component and set it up as a transformation.
Step 2: Define all of the necessary output columns (a long and tedious task, but it worked).
Step 3: Put the following code in the script:
public override void Input0_ProcessInputRow(Input0Buffer Row){
    int sizeOfDataSegment = 11; // size of a single segment to be parsed (3-character tax code + 8-character tax value)
    string recordDetail = Row.RecordDetail.ToString().Trim();
    string itemNumber = recordDetail.Substring(0, 6);
    //System.Windows.Forms.MessageBox.Show(String.Format("Record Detail: {0}", recordDetail));
    // we need a record for every item number, regardless of whether there are taxes or not
    Row.Company = Variables.strCompanyName;
    Row.ItemNumber = itemNumber;
    if (recordDetail.Length > 6){
        string taxData = recordDetail.Substring(6);
        if (string.IsNullOrEmpty(taxData)){
            // no tax data, nothing more to do for this row
        }
        else{
            if (taxData.Length % sizeOfDataSegment == 0){
                int numberOfTaxes = taxData.Length / sizeOfDataSegment;
                //System.Windows.Forms.MessageBox.Show(String.Format("Number of tax codes: {0}", numberOfTaxes.ToString()));
                int posTaxCode = 0;
                for (int x = 0; x < numberOfTaxes; x++){
                    string taxCode = taxData.Substring(posTaxCode, 3);
                    string taxValue = taxData.Substring(posTaxCode + 3, 8);
                    string outputColumnName = "TaxOut" + Convert.ToInt32(taxCode).ToString();
                    //System.Windows.Forms.MessageBox.Show(String.Format("TaxCode: {0}" + Environment.NewLine + "TaxValue: {1}", taxCode, taxValue));
                    // using the taxCode value (ie: 001), find and set the value for the corresponding output column (ie: TaxOut1)
                    foreach (System.Reflection.PropertyInfo dataColumn in Row.GetType().GetProperties()){
                        if (dataColumn.Name == outputColumnName){
                            if (Convert.ToDecimal(taxValue) < 0){
                                // taxValue is a negative number, and therefore a percentage value
                                taxValue = (Convert.ToDecimal(taxValue) * -1).ToString() + "%";
                            }
                            else{
                                // taxValue is a positive number, and therefore a dollar value
                                taxValue = "$" + Convert.ToDecimal(taxValue).ToString();
                            }
                            dataColumn.SetValue(Row, taxValue);
                        }
                    }
                    posTaxCode += sizeOfDataSegment;
                }
            }
            else{
                System.Windows.Forms.MessageBox.Show(String.Format("Invalid record length({0}): {1}", taxData.Length, taxData));
            }
        }
    }
}
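For readers who do not use SSIS, the core trick in the script (setting a column chosen by a computed name) can be sketched in Python with setattr. This is only an illustration under assumptions: fill_row and format_tax are hypothetical names, SimpleNamespace stands in for the SSIS row buffer, and the sample data is invented so that each segment is 11 characters long, as the script expects.

```python
from decimal import Decimal
from types import SimpleNamespace

SEGMENT_LEN = 11  # 3-character tax code + 8-character tax value


def format_tax(raw):
    """Negative values are treated as percentages, the rest as dollars."""
    value = Decimal(raw)
    return f"{-value}%" if value < 0 else f"${value}"


def fill_row(row, tax_data):
    """Set row.TaxOut<N> for every 11-character segment in tax_data."""
    if len(tax_data) % SEGMENT_LEN != 0:
        raise ValueError(f"Invalid record length({len(tax_data)}): {tax_data}")
    for pos in range(0, len(tax_data), SEGMENT_LEN):
        code = int(tax_data[pos:pos + 3])           # "031" -> 31
        raw = tax_data[pos + 3:pos + SEGMENT_LEN]   # the 8-character value
        # Pick the attribute by name instead of writing one branch per column
        setattr(row, f"TaxOut{code}", format_tax(raw))
    return row


# Hypothetical sample: code 001 with a dollar value, code 031 with a percentage
row = fill_row(SimpleNamespace(), "0010005.000031-013.000")
print(row.TaxOut1, row.TaxOut31)  # $5.000 13.000%
```

One difference to keep in mind: setattr creates the attribute if it does not exist, whereas the reflection loop in the script only sets columns that were already defined as outputs.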
I am trying to combine the text from column A with each possibility in column B. I used the formulas:
in C1:
=transpose(split(join("", arrayformula(rept(filter(A1:A, len(A1:A))&char(9999), counta(B1:B)))), char(9999)))
in D1:
=transpose(split(rept(join(char(9999), filter(B1:B, len(B1:B)))&char(9999), counta(A1:A)), char(9999)))
but when I use them on my list I get these errors in C1 and D1 respectively:
Text result of JOIN is longer than the limit of 50000 characters
Text result of REPT is longer than the limit of 32000 characters
I tested this out with a smaller list of just:
a b c 1 2
and managed to get my list to generate this after combining the two cells:
a 1
a 2
a 3
b 1
b 2
b 3
but the list I am combining has a lot more text in each of the columns.
Any suggestions on how to combine my lists as shown above but with 132 possibilities in column A and 52 possibilities in column B?
Each line has between 70 and 150 characters of text in each row.
Go to menu Tools → Script Editor...
Paste this code:
function crossJoin(arr1, arr2, delim) {
  delim = delim || '';
  var result = [];
  for (var i = 0; i < arr1.length; i++) {
    for (var j = 0; j < arr2.length; j++) {
      // a range arrives as a 2D array of rows, so the cell value is at [row][0]
      result.push(['' + arr1[i][0] + delim + arr2[j][0]]);
    }
  }
  return result;
}
Save the project.
Use it as a regular function in the spreadsheet:
=crossJoin(A1:A132,B1:B52)
Optionally, use a delimiter:
=crossJoin(A1:A132,B1:B52, "-")
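If it helps to see the same pairing logic outside of the spreadsheet, here is a small Python sketch using itertools.product. The name cross_join is just a stand-in for the script function above, and the inputs are plain lists rather than sheet ranges.

```python
from itertools import product


def cross_join(col_a, col_b, delim=""):
    """Pair every item of col_a with every item of col_b, in row order."""
    return [f"{a}{delim}{b}" for a, b in product(col_a, col_b)]


print(cross_join(["a", "b", "c"], [1, 2], " "))
# ['a 1', 'a 2', 'b 1', 'b 2', 'c 1', 'c 2']
```

With 132 entries in the first list and 52 in the second, this produces 132 * 52 = 6864 combined rows, and since nothing is joined into one giant string there is no character limit to run into.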
I apologize in advance if this question is too specific or involved for this type of forum. I have been a long-time lurker on this site, and this is the first time I haven't been able to solve my issue by looking at previous questions, so I finally decided to post. Please let me know if there is a better place to post this, or if you have advice on making it clearer. Here goes.
I have a data.table with the following structure:
library(data.table)
dt = structure(list(chr = c("chr1", "chr1", "chr1", "chr1", "chrX",
"chrX", "chrX", "chrX"), start = c(842326, 855423, 855426, 855739,
153880833, 153880841, 154298086, 154298089), end = c(842327L,
855424L, 855427L, 855740L, 153880834L, 153880842L, 154298087L,
154298090L), meth.diff = c(9.35200555410902, 19.1839617944039,
29.6734426495636, -12.3375577709254, 50.5830043986142, 52.7503561092491,
46.5783738475184, 41.8662800742733), mean_KO = c(9.35200555410902,
19.1839617944039, 32.962962583692, 1.8512250859083, 51.2741224212646,
53.0928367727283, 47.4901932463221, 44.8441659366298), mean_WT = c(0,
0, 3.28951993412841, 14.1887828568337, 0.69111802265039, 0.34248066347919,
0.91181939880374, 2.97788586235646), coverage_KO = c(139L, 55L,
55L, 270L, 195L, 194L, 131L, 131L), coverage_WT = c(120L, 86L,
87L, 444L, 291L, 293L, 181L, 181L)), .Names = c("chr", "start",
"end", "meth.diff", "mean_KO", "mean_WT", "coverage_KO", "coverage_WT"
), class = c("data.table", "data.frame"), row.names = c(NA, -8L
))
These are genomic coordinates with associated values. The file is sorted by chromosome ("chr") (1 through 22, then X, then Y), then by start and end position, so that the first row contains the lowest-numbered start position on chromosome 1, and the rows proceed sequentially through all data points on chromosome 1, then 2, etc. At this point, every single row has a start-end length of 1. After collapsing, the start-end lengths will vary depending on how many rows were collapsed and their distance from the adjacent row.
1st: I would like to collapse adjacent rows into larger start/end ranges based on the following criteria:
The two adjacent rows share the same value for the "chr" column (row 1 "chr" = chr1, and row 2 "chr" = chr1)
The two adjacent rows have "start" coordinate within 500 of one another (if row 1 "start" = 1000, and row 2 "start" <= 1499, collapse these into a single row; if row1 = 1000 and row2 = 1500, keep separate)
The adjacent rows must have the same sign for the "meth.diff" column (i.e. even if chr = chr and start is within 500, if diff1 = +5 and diff2 = -5, keep the entries separate)
2nd: I would like to calculate the coverage-weighted averages of the collapsed mean_KO/WT columns, weighting by the coverage_KO/WT columns:
Ex: collapse 2 rows,
row 1 mean_1 = 5.0, coverage_1 = 20.
row 2 mean_1 = 40.0, coverage_1 = 45.
weighted avg mean_1 = (((5.0*20)/(20+45)) + ((40.0*45)/(20+45))) = 29.23
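The worked example above can be checked with a few lines of Python. The helper name weighted_mean is only for illustration; it computes sum(mean * coverage) / sum(coverage), which is algebraically the same as the formula written out above.

```python
def weighted_mean(values, weights):
    """Average of values, each weighted by the matching entry in weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Row 1: mean 5.0 with coverage 20, row 2: mean 40.0 with coverage 45
print(round(weighted_mean([5.0, 40.0], [20, 45]), 2))  # 29.23
```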
What I would like the output to look like (except collapsed row means would be calculated and not in string form):
library(data.table)
dt_output = structure(list(chr = c("chr1", "chr1", "chr1", "chrX", "chrX"
), start = c(842326, 855423, 855739, 153880833, 154298086), end = c(842327,
855427, 855740, 153880842, 154298090), mean_1 = c("9.35", "((19.18*55)/(55+55)) + ((32.96*55)/(55+55))",
"1.85", "((51.27*195)/(195+194)) + ((53.09*194)/(195+194))",
"((47.49*131)/(131+131)) + ((44.84*131)/(131+131))"), mean_2 = c("0",
"((0.00*86)/(86+87)) + ((3.29*87)/(86+87))", "14.19", "((0.69*291)/(291+293)) + ((0.34*293)/(291+293))",
"((0.91*181)/(181+181)) + ((2.98*181)/(181+181))")), .Names = c("chr",
"start", "end", "mean_1", "mean_2"), row.names = c(NA, -5L), class = c("data.table", "data.frame"))
Help with either part 1 or 2 or any advice is appreciated.
I have been using R for most of my data manipulations, but I am open to any language that can provide a solution. Thanks in advance.
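Since any language is welcome, here is a rough Python sketch of part 1 only (the collapsing), to make the three criteria concrete. It assumes each row is compared with the row immediately before it (not with the first row of the group), and that a meth.diff of exactly zero counts as non-positive. The function name collapse is hypothetical, and the meth.diff values are rounded to three decimals from the sample table.

```python
# (chr, start, end, meth_diff), rounded from the sample data.table
rows = [
    ("chr1", 842326, 842327, 9.352),
    ("chr1", 855423, 855424, 19.184),
    ("chr1", 855426, 855427, 29.673),
    ("chr1", 855739, 855740, -12.338),
    ("chrX", 153880833, 153880834, 50.583),
    ("chrX", 153880841, 153880842, 52.750),
    ("chrX", 154298086, 154298087, 46.578),
    ("chrX", 154298089, 154298090, 41.866),
]


def collapse(rows, max_gap=500):
    """Merge consecutive rows with the same chr, same sign, and starts < max_gap apart."""
    merged = []
    prev = None
    for chrom, start, end, diff in rows:
        same_run = (
            prev is not None
            and prev[0] == chrom                 # same chromosome
            and start - prev[1] < max_gap        # starts within 500 of each other
            and (prev[3] > 0) == (diff > 0)      # same sign of meth.diff
        )
        if same_run:
            merged[-1][2] = end                  # extend the current range
        else:
            merged.append([chrom, start, end])   # open a new range
        prev = (chrom, start, end, diff)
    return [tuple(m) for m in merged]


for region in collapse(rows):
    print(region)
```

On the sample rows this yields the same five chr/start/end ranges as the desired output; the weighted means for part 2 would then be computed per range.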
def power(num, x = 1):
    result = 1
    for i in range(x):
        result = result * num
    return result
So I came across a tutorial on calling functions with 2 arguments, and the function above was used as an example to show how you could make a function called power(num, x=1) that takes an integer as the first argument and raises it to the power of the second argument. Can someone explain in layman's terms why this happens and what exactly is going on in this function and its 'for' loop?
First, range(x) is equivalent to range(0, x), and generates a sequence that ranges from 0 to x - 1. For example, with range(3) you get the sequence 0, 1, and 2, which has three elements. In general, range(x) generates a sequence that has x elements.
Second, for i in range(x) makes i iterate through all the elements of range(x). Since range(x) has x elements, i will iterate through x different values, so the statements in the for loop will be executed x times.
With the above analysis, the body of the power function is equivalent to the following:
result = 1
result = result * num
result = result * num
# repeat x times
result = result * num
which is equivalent to:
result = 1 * num * num * ... * num  # x nums here
which is, evidently, num raised to the power of x.
Update
Here's how this function works with specific input data. When num is 3 and x is 4, we have:
result = 1
result = result * num  # = 1 * 3 = 3
result = result * num  # = 3 * 3 = 9
result = result * num  # = 9 * 3 = 27
result = result * num  # = 27 * 3 = 81 = 3^4
return result  # 81 is returned
We can also show the execution process in more detail:
result = 1
i = 0  # entering the loop
result = result * num  # = 1 * 3 = 3
i = 1  # the second round of the loop begins
result = result * num  # = 3 * 3 = 9
i = 2  # the third round of the loop begins
result = result * num  # = 9 * 3 = 27
i = 3  # the fourth and final round of the loop begins
result = result * num  # = 27 * 3 = 81 = 3^4
# range(4) is exhausted, so the loop ends here
return result  # 81 is returned
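If you would like to watch this happen on your own machine, one option is to add a print call inside the loop. This is the same function as in the question with a single extra line; the print is only there for illustration.

```python
def power(num, x=1):
    result = 1
    for i in range(x):
        result = result * num
        # Show the loop counter and the running product after each round
        print(f"i = {i}: result = {result}")
    return result


print(power(3, 4))
# i = 0: result = 3
# i = 1: result = 9
# i = 2: result = 27
# i = 3: result = 81
# 81
```

Calling power(5) uses the default x = 1, so the loop runs once and 5 is returned; power(2, 0) never enters the loop and returns the initial value 1.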