Yii2 eager loading with subquery instead of array with IDs - MySQL

For a big application I am using the following query to get all projects with relations:
$project_query = Project::find()
    ->with(['category', 'deliveryTickets', 'garbagePerProjects', 'hourLogs',
        'materialPerProjects', 'employeePerProjects', 'contact', 'invoices'])
    ->where(['project.organization_id' => $this->organization_id]);
which generates, for example, the following query:
SELECT * FROM `delivery_ticket` WHERE `project_id` IN (124, 137, 147, 148, 149, 219, 222, 241, 1263, 1324, 1325, 1333, 1378, 1423, 1499, 1627, 1687, 1688, 1689, 1690, 1705, 1706, 1962, 2047, 2643, 2774, 2876, 2912, 3005, 3287, 3334, 4251, 4570, 4758, 4963, 5644, 6168, 6605, 6639, 6991, 7000, 7003, 7098, 7530, 7531, 7733, 7734, 7823, 7927, 8452, 8752, 8868, 8903, 8914, 8916, 8917, 8921, 8923, 8931, 8947, 8948, 8949, 8952, 8969, 9042, 9134, 9136, 9137, 9280, 9671, 10262, 10272, 10712, 10730, 11436, 11459, 11520, 11641, 11774, 11776, 12028, 12178, 12323, 12831, 12884, 13050, 13478, 13479, 13595, 13651, 13716, 13946, 14431, 14447, 14523, 15303, 15343, 16269, 16270, 16491, 16513, 17950, 17951)
MySQL EXPLAIN shows that it is using range instead of eq_ref, which is why my page takes 3 seconds to load.
How can I turn this query into a subquery?

range is ambiguous in this situation. To investigate further, provide the output of EXPLAIN SELECT ... and perform these steps:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE "Handler%";
and look at the largest number that comes out. Possible cases:
Same as the number of rows in the table -- you need INDEX(project_id); see the sketch below.
Same as the number of rows in the output (103?) -- then it did the optimal thing, namely leapfrogging through the table rather than doing the big "range" scan it implied. As for the "3 seconds", that will take some more head scratching.
Some other number -- what version are you running? (This may take more investigation.) Also provide SHOW CREATE TABLE delivery_ticket.
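If it turns out to be the first case, the missing index can be added with a one-liner such as the following (assuming delivery_ticket does not already have an index covering project_id; the index name is just illustrative):
ALTER TABLE delivery_ticket ADD INDEX idx_project_id (project_id);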


Data inserted into the database is missing when fetching data using a separate lambda

I ran into this issue using AWS Lambda with API Gateway.
The workflow was that one lambda inserts an item into the database. Then another lambda fetches the list of items from DB.
The newly created item was missing from the initial fetch, but if the listing lambda is invoked again, all items are fetched correctly (no idea why).
I defined the connection outside of the lambda handler (as recommended).
This issue was fixed by forcefully closing DB connections inside the lambda handler on all invocations. So now each lambda invocation creates a new connection.
I wanted to know why this is happening. Is the closing of the connection/cursor necessary?
Some code to give more clarity. The issue can be reproduced with the code below; consider each connection defined in the example as being used in a separate lambda:
def sample_code():
    # Create Connection 1 (//Lambda 1)
    conn_1 = get_connection()
    cursor_1 = get_cursor(conn_1)
    random_string = str(random.randint(1, 100))
    cursor_1.execute(SQL_INSERT, ("John", random_string))
    conn_1.commit()
    print(f"1. Inserted {('John', random_string)}")
    cursor_1.execute(SQL_GET)
    result = cursor_1.fetchall()
    print(f"1. All rows -> {result=}")
    # Create a new Connection/Cursor 2 (//Lambda 2)
    conn_2 = get_connection()
    cur_2 = get_cursor(conn_2)
    random_string = str(random.randint(1, 100))
    cur_2.execute(SQL_INSERT, ("Doe", random_string))
    conn_2.commit()
    print(f"2. Inserted {('Doe', random_string)}")
    cur_2.execute(SQL_GET)
    results = cur_2.fetchall()
    cur_2.close()
    conn_2.close()
    print(f"2. All rows -> {results=}")
    # Fetch results using Connection/Cursor 1 (//Lambda 1)
    cursor_1.execute(SQL_GET)
    get_resp = cursor_1.fetchall()
    print(f"\n1. All rows {get_resp=}")
    # Create a new connection 3 (//Lambda 3)
    con3 = get_connection()
    cur_3 = get_cursor(con3)
    cur_3.execute(SQL_GET)
    results = cur_3.fetchall()
    print(f"\n3. All rows -> {results=}")
Output:
1. Inserted ('John', '75')
1. All rows -> result=((None, '75', 'John'),)
2. Inserted ('Doe', '66')
2. All rows -> results=((None, '75', 'John'), (None, '66', 'Doe'))
1. All rows get_resp=((None, '75', 'John'),)
3. All rows -> results=((None, '75', 'John'), (None, '66', 'Doe'))
My problem is that the data inserted using connection 2 is missing when it is fetched using connection 1.
The database is MySQL, accessed with the PyMySQL client.
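The workaround I mentioned above, closing the connection inside the handler on every invocation, looks roughly like this (a sketch using the same get_connection/get_cursor/SQL_GET helpers as in the repro code):
def lambda_handler(event, context):
    # open a fresh connection for this invocation instead of reusing a module-level one
    conn = get_connection()
    cursor = get_cursor(conn)
    cursor.execute(SQL_GET)
    rows = cursor.fetchall()
    # force-close both so the next invocation starts with a brand-new connection
    cursor.close()
    conn.close()
    return rows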

Python 3: Adding equal elements together from JSON format

Data = [{'Ferrari': 51078},
        {'Volvo': 83245, 'Ferrari': 70432, 'Skoda': 29264, 'Lambo': 862},
        {'Ferrari': 306415, 'Jeep': 4025, 'Saab': 2708, 'Lexus': 161},
        {'Fiat': 27583, 'Maserati': 11030, 'Renault': 3194, 'Volvo': 259, 'Skoda': 164},
        {'Ferrari': 2313172, 'Renault': 2475},
        {'Volvo': 198671}, {'Volvo': 15762}]
I want to add together the numbers for each car, so I get the total amount for each element (the numbers below aren't consistent with Data and are just an example):
Ferrari: 152455
Volvo: 13515
Skoda: 1532
Lambo: 4366
Renault: 4262
Maserati: 2345
Lexus: 235
Jeep: 124
Saab: 15
I've tried sum(), appending to new lists, collections, and many other potential solutions, but I just cannot get this one right. I'm looking for a general solution, not one only applicable to my problem: if I change my dataset, and hence the numbers and cars, it needs to work for the new Data as well.
I'm using Python3.
You can use defaultdict. The code below iterates over the list of dicts, popping one key-value pair at a time until each dict is empty and summing the values per key.
from collections import defaultdict
data = [{'Ferrari': 51078},
{'Volvo': 83245, 'Ferrari': 70432, 'Skoda': 29264, 'Lambo': 862},
{'Ferrari': 306415, 'Jeep': 4025, 'Saab': 2708, 'Lexus': 161},
{'Fiat': 27583, 'Maserati': 11030, 'Renault': 3194, 'Volvo': 259, 'Skoda': 164},
{'Ferrari': 2313172, 'Renault': 2475},
{'Volvo': 198671},
{'Volvo': 15762}]
output = defaultdict(int)
for d in data:
    while d:
        k, v = d.popitem()
        output[k] += v
print(output)
Outputs
defaultdict(<class 'int'>, {'Ferrari': 2741097,
'Lambo': 862,
'Skoda': 29428,
'Volvo': 297937,
'Lexus': 161,
'Saab': 2708,
'Jeep': 4025,
'Renault': 5669,
'Maserati': 11030,
'Fiat': 27583})
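Note that popitem() empties the input dicts as a side effect. If you need data to stay intact, the same totals can be built by iterating over the items instead, for example:
output = defaultdict(int)
for d in data:
    for k, v in d.items():  # read-only iteration, leaves the input dicts untouched
        output[k] += v
print(output)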

Neo4j: Loading CSV file combined with substring function

I am trying to load a *.csv file into neo4j and, in the same load statement, split the line (which has no delimiters but has a set location for the data that I need to create nodes from). I want to use the substring function, but I can't figure out how to get it to work. The data reads in as a single line:
0067011990999991958051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
I have tried using the following code:
LOAD CSV WITH HEADERS FROM "file:/c:/itw/Ltemps.csv" AS line
WITH line
WHERE line.year IS split((substring(line, 15, 19))) and line.temp IS split((substring(line, 88, 92))) and line.qlfr IS split((substring(line, 87, 88))) and line.qual IS split((substring(line, 92, 93)))
MERGE (y:Year {year:line.year})
MERGE (t:Temp {temp:line.temp})
MERGE (f:Qlfr {qlfr:line.qlfr})
MERGE (q:Qual {qual:line.qual})
CREATE (y)-[r:HAS_TEMP]->(t);
I am looking to get 4 nodes: year, temp (an absolute value), a qualifier (positive or negative symbol), and a quality number. The indexes for where the data lies in the string should be accurate.
First, let's try to get the indexes and types right. To convert numeric substrings to integers, we use the toInteger function:
WITH '0067011990999991958051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999' AS line
RETURN
toInteger(substring(line, 15, 4)) AS year,
toInteger(substring(line, 88, 2)) AS temp,
substring(line, 87, 1) AS qlfr,
toInteger(substring(line, 92, 1)) AS qual
This gives:
╒══════╤══════╤══════╤══════╕
│"year"│"temp"│"qlfr"│"qual"│
╞══════╪══════╪══════╪══════╡
│1958  │0     │"+"   │1     │
└──────┴──────┴──────┴──────┘
If the results look good, add back LOAD CSV and the MERGE clauses. Two things:
I don't think it makes sense to use WITH HEADERS, as headers are useless in this case. Simply load the row and use row[0] as the line for splitting.
It is possible to simplify the MERGEs by combining your first two MERGE clauses with the CREATE clause.
So the loader code is the following:
LOAD CSV FROM 'file:/c:/itw/Ltemps.csv' AS row
WITH row[0] AS line
WITH
toInteger(substring(line, 15, 4)) AS year,
toInteger(substring(line, 88, 2)) AS temp,
substring(line, 87, 1) AS qlfr,
toInteger(substring(line, 92, 1)) AS qual
MERGE (y:Year {year: year})-[r:HAS_TEMP]->(t:Temp {temp: temp})
MERGE (f:Qlfr {qlfr: qlfr})
MERGE (q:Qual {qual: qual})

neo4j: How to load CSV using conditional correctly?

What I am trying to do is to import a dataset with a tree data structure from CSV into neo4j. Nodes are stored along with their parent node and depth level (max 6) in the tree. So I try to check the depth level using CASE and then add a node to its parent like this (creating a node just for the 1st level so far, for testing purposes):
export FILEPATH=file:///Example.csv
CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE;
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM {FILEPATH} AS line
WITH DISTINCT line,
line.`Level` AS level,
line.`ParentCodeID_Cal` AS parentCode,
line.`CodeSet` AS codeSet,
line.`Category` AS nodeCategory,
line.`Type` AS nodeType,
line.`L1code` AS l1Code, line.`L1Description` AS l1Description, line.`L1Name` AS l1Name, line.`L1NameAb` AS l1NameAb,
line.`L2code` AS l2Code, line.`L2Description` AS l2Description, line.`L2Name` AS l2Name, line.`L2NameAb` AS l2NameAb,
line.`L3code` AS l3Code, line.`L3Description` AS l3Description, line.`L3Name` AS l3Name, line.`L3NameAb` AS l3NameAb,
line.`L1code` AS l4Code, line.`L4Description` AS l4Description, line.`L4Name` AS l4Name, line.`L4NameAb` AS l4NameAb,
line.`L1code` AS l5Code, line.`L5Description` AS l5Description, line.`L5Name` AS l5Name, line.`L5NameAb` AS l5NameAb,
line.`L1code` AS l6Code, line.`L6Description` AS l6Description, line.`L6Name` AS l6Name, line.`L6NameAb` AS l6NameAb,
codeSet + parentCode AS nodeId
CASE line.`Level`
WHEN '1' THEN CREATE (n0:Node{id:nodeId, description:l1Description, name:l1Name, nameAb:l1NameAb, category:nodeCategory, type:nodeType})
ELSE
END;
But I get this result:
WARNING: Invalid input 'S': expected 'l/L' (line 17, column 3 (offset:
982)) "CASE level " ^
I'm aware there is a mistake in the syntax.
I'm using neo4j 3.0.4 & Windows 10 (using neo4j shell running it with D:\Program Files\Neo4j CE 3.0.4\bin>java -classpath neo4j-desktop-3.0.4.jar org.neo4j.shell.StartClient).
You have several syntax errors. For example, a CASE expression cannot contain a CREATE clause.
In any case, you should be able to greatly simplify your Cypher. For example, this might suit your needs:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM {FILEPATH} AS line
WITH DISTINCT line, ('l' + line.Level) AS prefix
CREATE (:Node{
id: line.CodeSet + line.ParentCodeID_Cal,
description: line[prefix + 'Description'],
name: line[prefix + 'Name'],
nameAb: line[prefix + 'NameAb'],
category: line.Category,
type: line.Type})

Calculating the average of a column in csv per hour

I have a csv file that contains data in the following format.
Layer  relative_time  Ht      BSs  Vge   Temp  Message
57986  2:52:46        0.00m   87   15.4  None  CMSG
20729  0:23:02        45.06m  82   11.6  None  BMSG
20729  0:44:17        45.06m  81   11.6  None  AMSG
I want to read in this csv file and calculate the average BSs for every hour. My csv file is quite large, about 2000 values. However, the values are not evenly distributed across the hours. For example,
I have 237 samples from hour 3 and only 4 samples from hour 6. I should also mention that the BSs can be collected from multiple sources. The value always ranges from 20-100. Because of this it is giving a skewed result. For each hour I am calculating the sum of BSs for that hour divided by the number of samples in that hour.
The primary purpose is to understand how BSs evolves over time.
But what is the common approach to this problem? Is this where people apply normalization? It would be great if someone could explain how to apply normalization in such a situation.
The code I am using for my processing is shown below; I believe it is correct.
# This 2x24 matrix will contain the number of values recorded per hour
hours_no_values = [[0 for i in range(24)] for j in range(2)]
# This 2x24 matrix will contain the mean BSs stats per hour
mean_bss_stats = [[0 for i in range(24)] for j in range(2)]
with open(PREFINAL_OUTPUT_FILE) as fin, open(FINAL_OUTPUT_FILE, "w", newline='') as f:
    reader = csv.reader(fin, delimiter=",")
    writer = csv.writer(f)
    header = next(reader)  # <--- Pop header out
    writer.writerow([header[0], header[1], header[2], header[3], header[4], header[5], header[6]])  # <--- Write header
    sortedlist = sorted(reader, key=lambda row: datetime.datetime.strptime(row[1], "%H:%M:%S"), reverse=True)
    print(sortedlist)
    for item in sortedlist:
        rel_time = datetime.datetime.strptime(item[1], "%H:%M:%S")
        if rel_time.hour not in hours_no_values[0]:
            print('item[6] {}'.format(item[6]))
            if 'MAN' in item[6]:
                print('Hour found {}'.format(rel_time.hour))
                hours_no_values[0][rel_time.hour] = rel_time.hour
                mean_bss_stats[0][rel_time.hour] = rel_time.hour
                mean_bss_stats[1][rel_time.hour] += int(item[3])
                hours_no_values[1][rel_time.hour] += 1
            else:
                pass
        else:
            if 'MAN' in item[6]:
                print('Hour Previous {}'.format(rel_time.hour))
                mean_bss_stats[1][rel_time.hour] += int(item[3])
                hours_no_values[1][rel_time.hour] += 1
            else:
                pass
for i in range(0, 24):
    if hours_no_values[1][i] != 0:
        mean_bss_stats[1][i] = mean_bss_stats[1][i] / hours_no_values[1][i]
    else:
        mean_bss_stats[1][i] = 0
pprint.pprint('mean bss stats {} \n hour_no_values {} \n'.format(mean_bss_stats, hours_no_values))
The number of values per hour is as follows, for hours 0 to 23:
[31, 117, 85, 237, 3, 67, 11, 4, 57, 0, 5, 21, 2, 5, 10, 8, 29, 7, 14, 3, 1, 1, 0, 0]
You could do it with pandas, using groupby and aggregating the appropriate column:
import pandas as pd
import numpy as np
df = pd.read_csv("your_file")
df.groupby('hour')['BSs'].aggregate(np.mean)
If you don't have that column in the initial dataframe, you could add it:
df['hour'] = your_hour_data
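For example, assuming the column names from the sample data above (relative_time holding H:MM:SS strings and BSs holding the values to average), the hour column could be derived along these lines:
import pandas as pd

df = pd.read_csv("your_file")
# parse the H:MM:SS strings and keep only the hour component
df['hour'] = pd.to_datetime(df['relative_time'], format="%H:%M:%S").dt.hour
hourly_mean = df.groupby('hour')['BSs'].mean()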
numpy.mean - calculates the mean of the array.
Compute the arithmetic mean along the specified axis.
pandas.groupby
Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
From pandas docs:
By “group by” we are referring to a process involving one or more of the following steps:
Splitting the data into groups based on some criteria
Applying a function to each group independently
Combining the results into a data structure
Aggregation: computing a summary statistic (or statistics) about each group.
Some examples:
Compute group sums or means
Compute group sizes / counts
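For instance, to compute the hourly mean together with the number of samples behind it (useful given the uneven distribution you describe), something like this should work:
df.groupby('hour')['BSs'].agg(['mean', 'count'])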