Multiple results from one value - MySQL

I am working on an integration output report to feed into a purchase order system.
All of the possible items are straightforward except for one particular item, which has to be broken out into multiple components.
The current code:
select concat("M", id) as ID, MaterialId, quantity, uom,
       crew_job.JobSubNbr, crew_job.EmployeeId, crewjob_schedule_actual.startTime
from crewjob_material_actual
inner join crew_job
    on crewjob_material_actual.crew_job_id = crew_job.crew_job_id
inner join crewjob_schedule_actual
    on crewjob_material_actual.crew_job_id = crewjob_schedule_actual.crew_job_id
And the current results output:
What I need to do is something to the effect of this (in plain English):
IF "MaterialID" = '3'
THEN [the results should be] show me 3 rows of data such as:
ID = MZ4931, MaterialID = 100, Qty = 0.25, UOM = 1 .... (all else the same)
ID = MZ4932, MaterialID = 101, Qty = 0.50, UOM = 1 .... (all else the same)
ID = MZ4933, MaterialID = 102, Qty = 0.33, UOM = 2 .... (all else the same)
Essentially, item #3 is a "combined item" that I need to break out based on a standard ratio, where 1 unit of item 3 is equal to 0.25 of item 100, 0.50 of item 101, and 0.33 of item 102. I'm sure this isn't too difficult, but I was having a hard time searching for it.
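One way to handle this (an illustrative sketch, not from the original post) is to keep the breakdown ratios in a small mapping table and LEFT JOIN against it, so that item 3 expands into one row per component while every other item passes through unchanged. The table and column names below (item_component, component_material_id, ratio) are assumptions, and the ID/UOM handling for the expanded rows would need adjusting to the real requirements (for example, a per-component UOM could also live in the mapping table):

-- Hypothetical mapping table: one row per component of a combined item.
-- CREATE TABLE item_component (
--     material_id           INT,          -- the combined item (e.g. 3)
--     component_material_id INT,          -- the component it expands into (e.g. 100)
--     ratio                 DECIMAL(6,2)  -- units of component per unit of the combined item
-- );
-- INSERT INTO item_component VALUES (3, 100, 0.25), (3, 101, 0.50), (3, 102, 0.33);

select concat("M", m.id) as ID,
       coalesce(c.component_material_id, m.MaterialId) as MaterialId,
       m.quantity * coalesce(c.ratio, 1) as quantity,
       m.uom,
       j.JobSubNbr, j.EmployeeId, s.startTime
from crewjob_material_actual m
inner join crew_job j
    on m.crew_job_id = j.crew_job_id
inner join crewjob_schedule_actual s
    on m.crew_job_id = s.crew_job_id
left join item_component c
    on m.MaterialId = c.material_id;

Items without a mapping row keep their original MaterialId and quantity; item 3 comes back once per component row in the mapping table.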

Related

Case statements for multiple fields when only certain cases exist for each field

We have an inventory feature where we generate Bills. There is an Edit Bill API call, which we have implemented as a PATCH call.
A Bill with id = 1 has 2 LineItems :
| Stock Id | Qty | Rate |
| 10       | 2   | 10   |
| 11       | 3   | 20   |
Now let's say I want to change the quantity for stock Id 10 to 5, and I want to change the rate for stock Id 11 to 40.
We have represented it as PATCH Call :
bill: {
    id: 1,
    lineItems: [
        {
            stockId: 10,
            qty: 5
        },
        {
            stockId: 11,
            rate: 40
        }
    ]
}
In the backend we run the following query:
UPDATE `billlineitem`
SET `rate` = ( CASE
WHEN stockid = 11 THEN '40'
ELSE rate
END ),
`qty` = ( CASE
WHEN stockid = 10 THEN 5
ELSE qty
END ),
`updated_billitemquantity_at` = '2019-09-06 05:16:06.219'
WHERE `bill_id` = '1'
AND `stockid` IN ( 10, 11 )
Is it OK that, in the above case, when there is no change for an attribute, the ELSE clause takes the existing value from the database for that attribute? The above update statement is run in a transaction.
Is this a correct approach? Will this do an update of every attribute for every stock Id? Is there a better approach?
We are using a MySQL DB.
What you've written should work, but it will get very complex if you have to update different columns for many different stock IDs. It would probably be simpler, and might perform better, to do a separate query for each ID.
START TRANSACTION;
UPDATE billlineitem
SET qty = 5, `updated_billitemquantity_at` = '2019-09-06 05:16:06.219'
WHERE `bill_id` = '1' AND stockid = 10;
UPDATE billlineitem
SET rate = '40', `updated_billitemquantity_at` = '2019-09-06 05:16:06.219'
WHERE `bill_id` = '1' AND stockid = 11;
COMMIT;
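For completeness (this is not from the original answer, just a sketch): if a PATCH can touch many rows at once and billlineitem has a UNIQUE key on (bill_id, stockid), MySQL's INSERT ... ON DUPLICATE KEY UPDATE can apply all the partial changes in one statement. The sketch passes NULL for attributes the PATCH did not include and uses COALESCE to keep the stored value in that case; it also assumes the rows already exist, since otherwise the INSERT branch would try to create them:

-- One statement for all patched rows; unmentioned attributes arrive as NULL
-- and COALESCE falls back to the current column value.
INSERT INTO billlineitem (bill_id, stockid, qty, rate, updated_billitemquantity_at)
VALUES (1, 10, 5,    NULL, '2019-09-06 05:16:06.219'),
       (1, 11, NULL, 40,   '2019-09-06 05:16:06.219')
ON DUPLICATE KEY UPDATE
    qty  = COALESCE(VALUES(qty),  qty),
    rate = COALESCE(VALUES(rate), rate),
    updated_billitemquantity_at = VALUES(updated_billitemquantity_at);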

Formatting data in a CSV file (calculating average) in python

import csv
with open('Class1scores.csv') as inf:
    for line in inf:
        parts = line.split()
        if len(parts) > 1:
            print(parts[4])
f = open('Class1scores.csv')
csv_f = csv.reader(f)
newlist = []
for row in csv_f:
    row[1] = int(row[1])
    row[2] = int(row[2])
    row[3] = int(row[3])
    maximum = max(row[1:3])
    row.append(maximum)
    average = round(sum(row[1:3])/3)
    row.append(average)
    newlist.append(row[0:4])
averageScore = [[x[3], x[0]] for x in newlist]
print('\nStudents Average Scores From Highest to Lowest\n')
The code is meant to read the CSV file and, for each row (column 0 being the user's name), add the three scores and divide by three. But it doesn't calculate a proper average; it just takes the score from the last column.
Basically you want statistics of each row. In general you should do something like this:
import csv
with open('data.csv', 'r') as f:
    rows = csv.reader(f)
    for row in rows:
        name = row[0]
        scores = [int(s) for s in row[1:]]  # scores come in as strings, convert to int
        # calculate statistics of scores
        attributes = {
            'NAME': name,
            'MAX': max(scores),
            'MIN': min(scores),
            'AVE': 1.0 * sum(scores) / len(scores)
        }
        output_mesg = "name: {NAME:s} \t high: {MAX:d} \t low: {MIN:d} \t ave: {AVE:f}"
        print(output_mesg.format(**attributes))
Try not to worry about whether doing specific things is locally inefficient. A good Pythonic script should be as readable as possible to everyone.
In your code, I spot two mistakes:
Appending to row won't change anything, since row is a local variable in the for loop and will be garbage collected.
row[1:3] only gives the second and third elements. row[1:4] gives what you want, as does row[1:]. Slicing in Python is end-exclusive.
And some questions for you to think about:
If I can open the file in Excel and it's not that big, why not just do it in Excel? Can I make use of all the tools I have to get work done as soon as possible with least effort? Can I get done with this task in 30 seconds?
Here is one way to do it. See both parts. First, we create a dictionary with names as the key and a list of results as values.
import csv

fileLineList = []
averageScoreDict = {}

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for row in fileLineList:
    highest = 0
    lowest = 0
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if column < lowest or lowest == 0:
                lowest = column
            total += column
    average = total / 3
    averageScoreDict[row[0]] = [highest, lowest, round(average)]

print(averageScoreDict)
Output:
{'Milky': [7, 4, 5], 'Billy': [6, 5, 6], 'Adam': [5, 2, 4], 'John': [10, 7, 9]}
Now that we have our dictionary, we can create your desired final output by sorting the list. See this updated code:
import csv
from operator import itemgetter

fileLineList = []
averageScoreDict = {}  # Creating an empty dictionary here.

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for row in fileLineList:
    highest = 0
    lowest = 0
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if column < lowest or lowest == 0:
                lowest = column
            total += column
    average = total / 3
    # Here is where we put the empty dictionary created earlier to good use.
    # We assign the key, in this case the contents of the first column of
    # the CSV, to the list of values.
    # For the first line of the file, the key would be 'John'.
    # We are assigning a list to John which is 3 integers:
    # highest, lowest and average (which is a float we round).
    averageScoreDict[row[0]] = [highest, lowest, round(average)]

averageScoreList = []

# Here we "unpack" the dictionary we have created into a list of the keys
# (which are the names) and the single value we want, in this case the average.
for key, value in averageScoreDict.items():
    averageScoreList.append([key, value[2]])

# Sorting the list using the value instead of the name.
averageScoreList.sort(key=itemgetter(1), reverse=True)

print('\nStudents Average Scores From Highest to Lowest\n')
print(averageScoreList)
Output:
Students Average Scores From Highest to Lowest
[['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]

R- collapse rows based on contents of two columns

I apologize in advance if this question is too specific or involved for this type of forum. I have been a long-time lurker on this site, and this is the first time I haven't been able to solve my issue by looking at previous questions, so I finally decided to post. Please let me know if there is a better place to post this, or if you have advice on making it clearer. Here goes.
I have a data.table with the following structure:
library(data.table)
dt = structure(list(chr = c("chr1", "chr1", "chr1", "chr1", "chrX",
"chrX", "chrX", "chrX"), start = c(842326, 855423, 855426, 855739,
153880833, 153880841, 154298086, 154298089), end = c(842327L,
855424L, 855427L, 855740L, 153880834L, 153880842L, 154298087L,
154298090L), meth.diff = c(9.35200555410902, 19.1839617944039,
29.6734426495636, -12.3375577709254, 50.5830043986142, 52.7503561092491,
46.5783738475184, 41.8662800742733), mean_KO = c(9.35200555410902,
19.1839617944039, 32.962962583692, 1.8512250859083, 51.2741224212646,
53.0928367727283, 47.4901932463221, 44.8441659366298), mean_WT = c(0,
0, 3.28951993412841, 14.1887828568337, 0.69111802265039, 0.34248066347919,
0.91181939880374, 2.97788586235646), coverage_KO = c(139L, 55L,
55L, 270L, 195L, 194L, 131L, 131L), coverage_WT = c(120L, 86L,
87L, 444L, 291L, 293L, 181L, 181L)), .Names = c("chr", "start",
"end", "meth.diff", "mean_KO", "mean_WT", "coverage_KO", "coverage_WT"
), class = c("data.table", "data.frame"), row.names = c(NA, -8L
))
These are genomic coordinates with associated values. The file is sorted by chromosome ("chr") (1 through 22, then X, then Y), then by start and end position, so that the first row contains the lowest numbered start position on chromosome 1 and proceeds sequentially for all data points on chromosome 1, then 2, etc. At this point, every single row has a start-end length of 1. After collapsing, the start-end lengths will vary depending on how many rows were collapsed and their distance from the adjacent row.
1st: I would like to collapse adjacent rows into larger start/end ranges based on the following criteria:
The two adjacent rows share the same value for the "chr" column (row 1 "chr" = chr1, and row 2 "chr" = chr1)
The two adjacent rows have "start" coordinate within 500 of one another (if row 1 "start" = 1000, and row 2 "start" <= 1499, collapse these into a single row; if row1 = 1000 and row2 = 1500, keep separate)
The adjacent rows must have the same sign for the "diff" column (i.e. even if chr = chr and start within 500, if diff1 = + 5 and diff2 = -5, keep entries separate)
2nd: I would like to calculate the coverage-weighted averages of the collapsed mean_KO/WT columns, with the weighting given by the coverage_KO/WT columns:
Ex: collapse 2 rows,
row 1: mean_1 = 5.0, coverage_1 = 20
row 2: mean_1 = 40.0, coverage_1 = 45
weighted avg mean_1 = ((5.0*20)/(20+45)) + ((40.0*45)/(20+45)) = 29.23
What I would like the output to look like (except collapsed row means would be calculated and not in string form):
library(data.table)
dt_output = structure(list(chr = c("chr1", "chr1", "chr1", "chrX", "chrX"
), start = c(842326, 855423, 855739, 153880833, 154298086), end = c(842327,
855427, 855740, 153880842, 154298090), mean_1 = c("9.35", "((19.18*55)/(55+55)) + ((32.96*55)/(55+55))",
"1.85", "((51.27*195)/(195+194)) + ((53.09*194)/(195+194))",
"((47.49*131)/(131+131)) + ((44.84*131)/(131+131))"), mean_2 = c("0",
"((0.00*86)/(86+87)) + ((3.29*87)/(86+87))", "14.19", "((0.69*291)/(291+293)) + ((0.34*293)/(291+293))",
"((0.91*181)/(181+181)) + ((2.98*181)/(181+181))")), .Names = c("chr",
"start", "end", "mean_1", "mean_2"), row.names = c(NA, -5L), class = c("data.table", "data.frame"))
Help with either part 1 or 2 or any advice is appreciated.
I have been using R for most of my data manipulations, but I am open to any language that can provide a solution. Thanks in advance.

Issue with Union Sub-query

I'm attempting to use a UNION subquery to get the results of a couple of different queries. What I'm looking to do is select all the players who hit a home run in the 2014 season, create a home run count for each player, and find the average pitch speed of each home run. I'm also attempting to break things down by pitch type; my current code and result are as follows:
Select output.Batter_Name,
output.Qty,
output.speed,
output.avg_Speed,
output.break,
output.Type_Pitch,
Output.CH_Qty,
Output.CH_Pitch,
Output.Ch_Speed,
Output.CH_Avg_speed,
Output.CH_Break,
Output.CH_Type_Pitch
From(
SELECT
count(gameday.atbats.event) as Qty,
gameday.batters.name_display_first_last as Batter_Name,
gameday.pitches.type as Pitch,
gameday.pitches.start_speed as speed,
avg(gameday.pitches.start_speed) as avg_speed,
avg(gameday.pitches.break_length) as Break,
gameday.pitches.Pitch_type as Type_Pitch,
"0" as CH_Qty,
"0" as CH_Pitch,
"0" as Ch_Speed,
"0" as CH_Avg_speed,
"0" as CH_Break,
"0" as CH_Type_Pitch
FROM
gameday.atbats
JOIN
gameday.pitches ON gameday.atbats.num = gameday.pitches.gameAtBatID
AND gameday.pitches.gamename = gameday.atbats.gamename
INNER JOIN
gameday.batters ON gameday.atbats.batter = gameday.batters.ID
AND gameday.atbats.gamename = gameday.batters.gameName
INNER JOIN
gameday.pitchers ON gameday.atbats.pitcher = gameday.pitchers.ID
AND gameday.atbats.gamename = gameday.pitchers.gamename
WHERE
(gameday.atbats.event = 'Home Run')
AND gameday.pitches.type = 'x'
and gameday.pitches.Pitch_type = 'FF'
group by gameday.batters.name_display_first_last
UNION ALL
SELECT
"0" as Qty,
gameday.batters.name_display_first_last as Batter_Name,
"0" as Pitch,
"0" as Speed,
"0" as Avg_speed,
"0" as Break,
"0" as Type_Pitch,
count(gameday.atbats.event) as CH_Qty,
gameday.pitches.type as CH_Pitch,
gameday.pitches.start_speed as CH_speed,
avg(gameday.pitches.start_speed) as CH_avg_speed,
avg(gameday.pitches.break_length) as CH_Break,
gameday.pitches.Pitch_type as CH_Type_Pitch
FROM
gameday.atbats
JOIN
gameday.pitches ON gameday.atbats.num = gameday.pitches.gameAtBatID
AND gameday.pitches.gamename = gameday.atbats.gamename
INNER JOIN
gameday.batters ON gameday.atbats.batter = gameday.batters.ID
AND gameday.atbats.gamename = gameday.batters.gameName
INNER JOIN
gameday.pitchers ON gameday.atbats.pitcher = gameday.pitchers.ID
AND gameday.atbats.gamename = gameday.pitchers.gamename
WHERE
(gameday.atbats.event = 'Home Run')
AND gameday.pitches.type = 'x'
and gameday.pitches.Pitch_type = 'CH'
group by gameday.batters.name_display_first_last
) as Output
group by Output.Batter_name
A Sample of my results are below:
Batter_Name, Qty, speed, avg_Speed, break, Type_Pitch, CH_Qty, CH_Pitch, Ch_Speed, CH_Avg_speed, CH_Break, CH_Type_Pitch
A.J. Pollock, 1, 89, 90, 4.3, FF, 0, 0, 0, 0, 0, 0
Aaron Hicks, 0, 0, 0, 0, 0, 1, X, 83, 83, 6, CH
The first player shows one home run on an FF and zero on a CH. The second player had 0 home runs on an FF but 1 on a CH. The issue is that I know these players had home runs on both types of pitches, but the query returns only one or the other, not both. My intended results are something like this:
Batter_Name, Qty, speed, avg_Speed, break, Type_Pitch, CH_Qty, CH_Pitch, Ch_Speed, CH_Avg_speed, CH_Break, CH_Type_Pitch
A.J. Pollock, 1, 89, 90, 4.3, FF, 2, X, 84, 82, 3.2, CH
Aaron Hicks, 4, 90, 91, 2.5, FF, 1, X, 83, 83, 6, CH
I'm thinking the issue has to be that I'm setting some fields to 0 as a kind of placeholder, but I can't seem to find a workable solution that gets me the results I want.
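For what it's worth (this is not from the original post, just a sketch), the usual way to get both pitch types onto one row per batter is conditional aggregation: aggregate once over both pitch types and put CASE expressions inside the aggregates, instead of UNIONing two pre-grouped result sets and grouping again, which keeps only one arbitrary row per batter. The columns and joins below are trimmed to the essentials and would need adjusting to the real schema:

SELECT b.name_display_first_last                                  AS Batter_Name,
       SUM(p.Pitch_type = 'FF')                                   AS FF_Qty,
       AVG(CASE WHEN p.Pitch_type = 'FF' THEN p.start_speed END)  AS FF_Avg_Speed,
       AVG(CASE WHEN p.Pitch_type = 'FF' THEN p.break_length END) AS FF_Break,
       SUM(p.Pitch_type = 'CH')                                   AS CH_Qty,
       AVG(CASE WHEN p.Pitch_type = 'CH' THEN p.start_speed END)  AS CH_Avg_Speed,
       AVG(CASE WHEN p.Pitch_type = 'CH' THEN p.break_length END) AS CH_Break
FROM gameday.atbats a
JOIN gameday.pitches p
  ON a.num = p.gameAtBatID AND a.gamename = p.gamename
JOIN gameday.batters b
  ON a.batter = b.ID AND a.gamename = b.gameName
WHERE a.event = 'Home Run'
  AND p.type = 'x'
  AND p.Pitch_type IN ('FF', 'CH')
GROUP BY b.name_display_first_last;

Each CASE only contributes to its own aggregate, so the FF and CH statistics land in separate columns of the same row for every batter.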

MySQL: Value of child lines based on proportion of cost?

We have a packslip lines table with the following structure (simplified):
line_id (unique id for packslip line)
sku (item #)
name
weight
value (value/price of item)
cost
is_kit (is this a kit/parent item?)
parent_line_id (if it's a child item, will contain line_id of parent)
A packslip line can represent an individual product, a parent kit, or kit components. For this exercise, use the following data set:
1, 'ITEM1', 'Item # 1', 0.3, 9.99, 4.79, 0, null
2, 'KIT1', 'Kit # 1', 1.3, 29.99, 0, 1, null
3, 'KITITEM1', 'Kit Item # 1', 0.7, 0, 10.0, 0, 2
4, 'KITITEM2', 'Kit Item # 2', 0.3, 0, 2.49, 0, 2
5, 'KITITEM3', 'Kit Item # 3', 0.3, 0, 4.29, 0, 2
As you can hopefully see, ITEM1 is a regular/individual product, KIT1 is a parent kit, and the last 3 items are child components for KIT1.
Notice that the kit lacks a cost and that the kit items lack a value. I need to create a query that will calculate the kit item values based on the proportion of the items' costs to the overall cost of the kit.
So in this example:
KITITEM1 Value = 10 / (10.0 + 2.49 + 4.29) * 29.99 = $17.87
KITITEM2 Value = 2.49 / (10.0 + 2.49 + 4.29) * 29.99 = $4.45
KITITEM3 Value = 4.29 / (10.0 + 2.49 + 4.29) * 29.99 = $7.67
Can I accomplish this in a single query (can have nested queries)? How?
Try this query (sqlFiddle):
SELECT T1.line_id,
       T1.sku,
       T1.name,
       T1.weight,
       IF(T1.parent_line_id IS NULL,
          T1.value,
          ROUND(T1.cost * T2.value_divided_by_total_cost, 2)) AS value,
       T1.cost,
       T1.is_kit,
       T1.parent_line_id
FROM packslip T1
LEFT JOIN
    (SELECT parent_line_id,
            (SELECT value FROM packslip p2
             WHERE p1.parent_line_id = p2.line_id) / SUM(cost)
                AS value_divided_by_total_cost
     FROM packslip p1
     WHERE parent_line_id IS NOT NULL
     GROUP BY parent_line_id) T2
    ON T1.parent_line_id = T2.parent_line_id
The query LEFT JOINs to a derived table that computes the (value of the parent) divided by SUM(cost), grouped by that parent.
The outer query then checks whether parent_line_id is set and, if so, multiplies the cost by that ratio from the derived table; otherwise it keeps the row's own value.
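An equivalent way to express the same idea (not from the original answer, just a sketch for comparison) is to join each kit component to a per-parent cost total and to its parent row, and do the proportion arithmetic directly:

-- Child rows only: value = parent value * (own cost / total cost of siblings).
SELECT c.line_id,
       c.sku,
       c.name,
       ROUND(p.value * c.cost / k.total_cost, 2) AS value
FROM packslip c
JOIN (SELECT parent_line_id, SUM(cost) AS total_cost
      FROM packslip
      WHERE parent_line_id IS NOT NULL
      GROUP BY parent_line_id) k
    ON c.parent_line_id = k.parent_line_id
JOIN packslip p
    ON p.line_id = c.parent_line_id;

This version returns only the kit components; for the sample data it yields 17.87, 4.45 and 7.67 for KITITEM1-3, matching the values worked out in the question.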