Dataframe is of type 'nonetype'. How should I alter this to allow merge function to operate? - mysql

I have pulled in data from a number of CSV files, as well as a database. I wish to use a merge to build a dataframe isolating the phone numbers that are contained in both dataframes (one originating from the CSV files, the other from the database). However, the dataframe from the database displays as type 'NoneType', which disallows any operation such as merge. How can I change this to allow the operation?
The data comes in from the database as a list of tuples. I then convert this to a dataframe. However, as stated above, it displays as 'NoneType'. I'm assuming that at the moment I am confused about how dataframes handle data types.
#Grab Data
mycursor = mydb.cursor()
mycursor.execute("SELECT DISTINCT(Cell) FROM crm_data.ap_clients Order By Cell asc;")
apclients = mycursor.fetchall()
#Clean Phone Number Data
for index, row in data.iterrows():
    data['phone_number'][index] = data['phone_number'][index][-10:]
for index, row in data2.iterrows():
    data2['phone_number'][index] = data2['phone_number'][index][-10:]
for index, row in data3.iterrows():
    data3['phone_number'][index] = data3['phone_number'][index][-10:]
#make data frame from csv files
fbl = pd.concat([data,data2,data3], axis=0, sort=False)
#make data frame from apclients (database extraction)
apc = pd.DataFrame(apclients)
#perform merge finding all records in both frames
successfulleads = pd.merge(fbl, apc, left_on='phone_number', right_on='0')
#type(apc) returns NoneType
The expected results are to find all records in both dataframes, along with a count so that I may compare the two sets. Any help is greatly appreciated from this great community :)
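As an aside worth checking: a DataFrame built from a list of tuples gets integer column labels, so the single column coming out of fetchall() is labelled 0 (an int), not the string '0'. A minimal sketch (the sample numbers are hypothetical):
import pandas as pd

apclients = [('5551234567',), ('5559876543',)]  # hypothetical fetchall() result
apc = pd.DataFrame(apclients)
print(apc.columns.tolist())  # [0] -- the label is the integer 0, not '0'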

So it looks like I had a function to rename the column of the dataframe as shown below:
apc = apc.rename(columns={'0': 'phone_number'}, inplace=True)
for col in apc.columns:
    print(col)
The part of the snippet above responsible for the problem:
inplace=True
This flag dictates whether the object is modified in place or a copy is returned. With inplace=True the method modifies the dataframe in place and returns None, so assigning the result back to apc leaves apc as NoneType.
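For reference, a minimal sketch of the two working variants (either modify in place without reassigning, or reassign without inplace; note the integer 0 as the column key):
# Variant 1: modify in place and don't reassign (rename returns None with inplace=True)
apc.rename(columns={0: 'phone_number'}, inplace=True)

# Variant 2: reassign the returned copy and don't use inplace
apc = apc.rename(columns={0: 'phone_number'})
Either way, pd.merge(fbl, apc, on='phone_number') should then find the records present in both frames.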
Hope this helps whoever ends up in my position. A great thanks again to the community. :)

Related

Importing multiple 1D JSON arrays in Excel

I'm trying to import a JSON file containing multiple unrelated 1D arrays with variable amount of elements into Excel.
The JSON I wrote is:
{
"table":[1,2,3],
"table2":["A","B","C"],
"table3":["a","b","c"]
}
When I import the file using Power Query and expand the columns, it multiplies the previous entries each time I expand a new column.
Is there a way to solve this so that the elements of each array are shown below each other and each array appears as a new column?
One method would be to transform each Record into a List and then create a table using Table.FromColumns method.
This needs to be done from the Advanced Editor:
Read the code comments and explore the Applied Steps to better understand what it does.
The Help topics for the various functions will also be useful.
let
    //Change following line to reflect your actual data source
    Source = Json.Document(File.Contents("C:\Users\ron\Desktop\New Text Document.txt")),
    //Get Field Names (= table names)
    fieldNames = Record.FieldNames(Source),
    //Create a list of lists whereby each sublist is derived from the original record
    jsonLists = List.Accumulate(fieldNames, {}, (state, current) => state & {Record.Field(Source, current)}),
    //Convert the lists into columns of a new table
    myTable = Table.FromColumns(
        jsonLists,
        fieldNames
    )
in
    myTable
Results: a table with one column per array (table, table2, table3), with the elements of each array listed below each other.

Power Automate Desktop - Convert a data table (with multiple rows) to JSON

I've been researching the best way to convert a data table from Excel (with multiple rows) to JSON.
I found a solution on here that appears to "mostly" work, but I am not familiar enough with JSON to know whether it's converting multiple rows correctly.
Here is the data table that I am starting with (from excel)
Here are the steps I took to convert this to JSON
Step 1: Set variable called INVObject to be empty to initialize it
Step 3: Added a For each to loop through each Data Row in the Data Table
Step 4: Added a Set Variable to set the INVObject (Custom Object) to the Data Table for each loop in the For each
Step 5: Convert the Custom Object INVObject to JSON
Results: There is one row/object with all 3 rows from the Data table on the same row
If you scroll to the right, the 2nd row eventually starts and then the 3rd row.
I was expecting to see 3 lines/rows/object to represent the 3 different rows in the Data table.
Can someone provide some insight as to if I am doing something wrong or if this is the expected results for multiple rows?
Thank You!
There is an option in Actions under Variables: 'Convert Custom Object to JSON'
https://learn.microsoft.com/en-us/power-automate/desktop-flows/actions-reference/variables#convertcustomobjecttojson
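For reference, a correctly converted three-row table should serialize as a JSON array with one object per row, along these lines (the column names here are hypothetical):
[
  { "Invoice": "INV-001", "Amount": 100 },
  { "Invoice": "INV-002", "Amount": 200 },
  { "Invoice": "INV-003", "Amount": 300 }
]
If everything ends up in a single object instead, the loop is likely setting the variable to the whole data table on each iteration rather than to the current row.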

Pulling One Element From A CSV File

I'm trying to write a function that will return the most recent 'closing' value in a csv file containing the data of a cryptocurrency. The csv file contains 6 columns and about 900 rows and I'm looking to only pull one element of the table.
However, I seem to be facing a fair bit of difficulty in pulling this off for some reason. The function below returns values from the column I want, but it seems to be pulling values from the very bottom of the document (whereas I want the most recent values).
Also, just a side note to explain what I was attempting to do with the 'count': since I'm expecting the value I want to be located on the second row, I wanted my loop to only iterate through two lines of the file. However, as the function's output revealed, as it currently stands with the counter I'm returning two values from the function.
I understand there must be a much less convoluted way of getting the information I need so am open to any solution to the problem. Though, that being said, I'd be really interested to see where I went wrong here as I'm fairly new to Python.
Thanks a lot!
def csv_to_close(csv_file):
    with open(f"{csv_file}.csv", 'r') as csvfile:
        csv_file = csv.reader(csvfile)
        running = True
        count = 0
        while running == True:
            if count < 2:
                for column in csv_file:
                    close = column[4]
                    count += 1
            else:
                running = False
            print(close)
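A minimal sketch of a simpler approach, assuming the file has a header row, the most recent data sits on the second line, and the closing price is the fifth column (all assumptions taken from the question):
import csv

def csv_to_close(csv_file):
    # Read only as far as needed instead of looping over the whole file
    with open(f"{csv_file}.csv", 'r') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)           # skip the header row
        newest = next(reader)  # the most recent row (assumed to be line 2)
        return newest[4]       # 'Close' assumed to be the fifth column

# Hypothetical usage:
# print(csv_to_close("bitcoin_prices"))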

SSIS - Process a flat file with varying data

I have to process a flat file whose syntax is as follows, one record per line.
<header>|<datagroup_1>|...|<datagroup_n>|[CR][LF]
The header has a fixed-length field format that never changes (ID, timestamp etc). However, there are different types of data groups and, even though each is fixed-length, the number of their fields varies depending on the data group type. The first three numbers of a data group define its type. The number of data groups in each record varies as well.
My idea is to have a staging table into which I would insert all the data groups. So two records like these,
12320160101|12323456KKSD3467|456SSGFED43520160101173802|
98720160102|456GGLWSD45960160108854802|
would produce three records in the staging table:
ID   Timestamp   Data
123  01/01/2016  12323456KKSD3467
123  01/01/2016  456SSGFED43520160101173802
987  02/01/2016  456GGLWSD45960160108854802
This would allow me to preprocess the staged records for further processing (some would be discarded, some would have their data broken down further). My question is how to break the flat file down into the staging table. I can split the entire record on the pipe (|) and then use a Derived Column Transformation to break down the header with SUBSTRING. After that it gets trickier because of the varying number of data groups.
The solution I came up with myself doesn't try to split at the flat file source, but rather in a script. My Data Flow is simply Flat File Source -> Script Component -> OLE DB Destination.
So the Flat File Source output is just a single column containing the entire line. The Script Component contains output columns for each column in the Staging table. The script looks like this.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Split the incoming line into the fixed-length header and its data groups
    var splits = Row.Line.Split('|');
    // Emit one output row per data group; splits[0] is the header
    for (int i = 1; i < splits.Length; i++)
    {
        Output0Buffer.AddRow();
        Output0Buffer.ID = splits[0].Substring(0, 11);
        Output0Buffer.Time = DateTime.ParseExact(splits[0].Substring(14, 14), "yyyyMMddHHmmssFFF", CultureInfo.InvariantCulture);
        Output0Buffer.Datagroup = splits[i];
    }
}
Note that the SynchronousInputID property (Script Transformation Editor > Inputs and Outputs > Output0) must be set to None; otherwise you won't have Output0Buffer available in your script. Finally, the OLE DB Destination just maps the script output columns to the staging table columns. This solves the problem I had with creating multiple output records from a single input record.

Import CSV column from different file into new file

I have 2 CSV files almost identical with the following differences:
The first has a column, "date".
The second doesn't have "date" and also has 50 fewer rows than the first; both share the "email" column.
They are lists of subscribers with the date created. The second, however, is the updated list with the subscribers who asked to be removed taken out, but it no longer has the date created.
Is there any way to import the "date" column from the 1st CSV into the 2nd CSV by making a reference to the "email" column, so I can get the correct date for each subscriber?
Sorry, there doesn't seem to be a ready-made command line tool available for this (writing one is probably an evening's worth of effort).
You could look at different approaches; one complex way is to load the files into database tables, do the merge (using a select with a join on the two tables) and export the result back as CSV.
The simplest I could think of was to use R (given that you have header names in your CSVs):
csv1_data <- read.csv('/path/to/csv1.csv')
csv2_data <- read.csv('/path/to/csv2.csv')
merged_csv <- merge(csv1_data, csv2_data)
write.table(merged_csv,file="/path/to/merged_csv.csv",sep=",",row.names=T)
The first 2 lines load the data into R, the third line merges them using the default S3 method (which joins on the columns the two data frames have in common), and the final line exports the result as a CSV file with the headers.
Hope this helps!
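For comparison, a rough pandas equivalent (assuming both files share an "email" column and that the first file's date column is literally named "date"; the paths are placeholders):
import pandas as pd

csv1 = pd.read_csv('/path/to/csv1.csv')  # has 'email' and 'date'
csv2 = pd.read_csv('/path/to/csv2.csv')  # has 'email' but no 'date'

# Left-join on email so every subscriber in the updated list gets their date
merged = csv2.merge(csv1[['email', 'date']], on='email', how='left')
merged.to_csv('/path/to/merged.csv', index=False)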