MySQL: sum the amount columns of 3 tables - mysql

I want to sum an amount column from each of three different tables in MySQL. The tables are listed below:
budget
|b_id|amount|
|   1|   100|
|   2|   200|
cash_advance
|ca_id|b_id|ca_amount|
|    1|   1|      100|
|    2|   2|      200|
expenses
|exp_id|ca_id|exp_amount|
|     1|    1|       100|
|     2|    2|        40|
|     3|    2|       160|
I want this result:
result
|sum(b_amount)|sum(ca_amount)|sum(exp_amount)|
|          100|           100|            100|
|          200|           200|            200|
Is there a MySQL query that does this? Thanks.
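For reference, here is a minimal sketch of one query that produces this result, wrapped in a small Python/mysql-connector harness (the connection parameters are placeholders, and it assumes each b_id appears only once in budget; table and column names are taken from the question):

import mysql.connector

# Placeholder connection details -- adjust for your server
conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="test")

query = """
SELECT b.amount              AS sum_b_amount,
       ca_totals.ca_amount   AS sum_ca_amount,
       exp_totals.exp_amount AS sum_exp_amount
FROM budget AS b
JOIN (SELECT b_id, SUM(ca_amount) AS ca_amount
      FROM cash_advance
      GROUP BY b_id) AS ca_totals ON ca_totals.b_id = b.b_id
JOIN (SELECT ca.b_id, SUM(e.exp_amount) AS exp_amount
      FROM expenses AS e
      JOIN cash_advance AS ca ON ca.ca_id = e.ca_id
      GROUP BY ca.b_id) AS exp_totals ON exp_totals.b_id = b.b_id
"""

cur = conn.cursor()
cur.execute(query)
for row in cur.fetchall():
    print(row)  # one row per b_id: 100/100/100 and 200/200/200
cur.close()
conn.close()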

You are trying to access the network on your UI thread. This is bad because it will freeze the UI until the network response has returned. You should do this network access on a separate thread.
There are many options, but the simplest would be:
Convert msg in MimeMessage msg = new MimeMessage(session); to be final.
Wrap Transport.send(msg); in a background thread:
new Thread(new Runnable() {
    @Override
    public void run() {
        Transport.send(msg);
    }
}).start();

The log is indicating that you're doing network tasks on your main thread, the UI / Activity thread. Use an AsyncTask for those tasks instead.
Android forbids those tasks on your main thread, because they block the UI and make it unusable until your task is finished.

In Android 3.0 and higher, network connections on the main thread aren't permitted; StrictMode is turned on automatically.
To fix this issue, you must perform the network connection on a separate thread, for example using an AsyncTask, a Thread, or a Handler.

Related

Wrong encoding when reading a CSV file with PySpark

For a university course, I run the pyspark-notebook Docker image:
docker pull jupyter/pyspark-notebook
docker run -it --rm -p 8888:8888 -v /path/to/my/working/directory:/home/jovyan/work jupyter/pyspark-notebook
and then run the following Python code:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import *
sc = pyspark.SparkContext('local[*]')
spark = SparkSession(sc)
spark
listings_df = spark.read.csv("listings.csv", header=True, mode='DROPMALFORMED')
# adding encoding="utf8" to the line above doesn't help either
listings_df.printSchema()
The problem appears when reading the file. It seems that Spark reads my file incorrectly (possibly because of an encoding problem?): after reading, listings_df has 16494 lines, while the correct number of lines is 16478 (checked with pandas.read_csv()). You can also see that something is definitely broken by running
listings_df.groupBy("room_type").count().show()
which gives the following output:
+---------------+-----+
| room_type|count|
+---------------+-----+
| 169| 1|
| 4.88612| 1|
| 4.90075| 1|
| Shared room| 44|
| 35| 1|
| 187| 1|
| null| 16|
| 70| 1|
| 27| 1|
| 75| 1|
| Hotel room| 109|
| 198| 1|
| 60| 1|
| 280| 1|
|Entire home/apt|12818|
| 220| 1|
| 190| 1|
| 156| 1|
| 450| 1|
| 4.88865| 1|
+---------------+-----+
only showing top 20 rows
while the real room_type values are only ['Private room', 'Entire home/apt', 'Hotel room', 'Shared room'].
Spark info which might be useful:
SparkSession - in-memory
SparkContext
  Version: v3.1.2
  Master: local[*]
  AppName: pyspark-shell
And the encoding of the file:
!file listings.csv
listings.csv: UTF-8 Unicode text
listings.csv is an Airbnb statistics csv file downloaded from here
I've also uploaded all the code needed to run and reproduce this to Colab.
There are two things that I've found:
Some lines have quotes that need to be escaped (escape='"').
Also, @JosefZ mentioned unwanted line breaks inside fields (multiLine=True).
This is how you should read it:
input_df = spark.read.csv(path, header=True, multiLine=True, escape='"')
output_df = input_df.groupBy("room_type").count()
output_df.show()
+---------------+-----+
| room_type|count|
+---------------+-----+
| Shared room| 44|
| Hotel room| 110|
|Entire home/apt|12829|
| Private room| 3495|
+---------------+-----+
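As a quick sanity check (a small sketch; it assumes listings.csv is in the working directory and pandas is available, as in the question), the row count obtained with these options can be compared against the pandas count from the question:

import pandas as pd

input_df = spark.read.csv("listings.csv", header=True, multiLine=True, escape='"')
print(input_df.count())                  # should now be 16478
print(len(pd.read_csv("listings.csv")))  # 16478, the reference count from pandas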
I think setting the encoding when reading the file should solve the problem, i.e. add encoding="utf8" to the options used to create listings_df, as shown below:
listings_df = spark.read.csv("listings.csv", encoding="utf8", header=True, mode='DROPMALFORMED')

MySQL: Having a table with minimum and maximum value columns, how could I find the "highest" values if my search term exceeds the maximum value?

Sorry if my question is a bit confusing, but neither database design nor queries are my strong points.
Let's say I sell a product: cables. Those products have three "variations", whose prices are applied in layers, like post-processing steps:
Type 1: Ordinary cable.
Type 2: Ordinary cable plus custom color.
Type 3: Ordinary cable plus custom color plus terminals.
Also, the final price of the cable depends on the length: the more meters of cable you buy, the lower the price per meter I apply.
So, I've designed a cable_pricings table like this:
id|product_id|product_type_id|min_length|max_length|price|
--|----------|---------------|----------|----------|-----|
 1|         1|              1|         0|        10| 0.50|
 2|         1|              1|        10|        20| 0.45|
 3|         1|              1|        20|        40| 0.40|
 4|         1|              1|        40|        50| 0.30|
 5|         1|              1|        50|        60| 0.25|
 6|         1|              1|        60|         0| 0.15|
 7|         1|              2|         0|        10| 0.35|
 8|         1|              2|        10|        20| 0.30|
 9|         1|              2|        20|        40| 0.30|
10|         1|              2|        40|        50| 0.20|
11|         1|              2|        50|        60| 0.20|
12|         1|              2|        60|         0| 0.20|
13|         1|              3|         0|        10| 0.40|
14|         1|              3|        10|        20| 0.40|
15|         1|              3|        20|        40| 0.30|
16|         1|              3|        40|        50| 0.30|
17|         1|              3|        50|        60| 0.25|
18|         1|              3|        60|         0| 0.25|
Now with this structure, let's say I want to buy 47 meters of cable, with custom color. With a single query like this:
SELECT * FROM cable_pricings
WHERE product_id = 2
AND product_type_id IN (1,2)
AND min_length <= 47
AND max_length > 47;
I get two rows that hold those types of cable and match the length interval; then, in my server code, I iterate over the results and compute the final price. Up to here, everything is good.
But my problem is with the "edge" cases:
If I want to buy exactly 60 meters of cable, my query won't work, as max_length is 0.
If I want to buy more than 60 meters of cable, my approach won't work either, because in that case none of the conditions apply.
I've already tried with MAX and MIN, but I'm not getting the expected results (and I think aggregate functions scan the whole table, so I'd like to avoid aggregates if possible).
I also thought of setting the "edge" max_length to a value like 9999999, but that's just a dirty fix. Also, this will be managed from a backend, and I don't expect the end user to type lots of 9s for the edge case.
Then my questions are:
Can I solve the "edge" cases with a single query, or do I have to split them into two separate queries?
Is my table design correct at all?
You can change
AND max_length > 47
to:
AND (max_length > 47 OR max_length = 0)
I would use this query:
SELECT *
FROM cable_pricings
WHERE product_id = 1
AND product_type_id IN (1,2)
AND min_length <= 60
AND (max_length > 60 OR max_length = 0);
dbfiddle example
You have to include the max_length = 0 case, otherwise the open-ended row will never pass the max restriction.
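As a sketch of how the backend might run this for an arbitrary requested length (mysql-connector is assumed here, and the connection parameters and product id are placeholders), parameterizing the length keeps both edge cases working:

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="shop")
cur = conn.cursor()

length = 75  # also try 47 or 60; the open-ended rows (max_length = 0) now match
cur.execute("""
    SELECT *
    FROM cable_pricings
    WHERE product_id = %s
      AND product_type_id IN (1, 2)
      AND min_length <= %s
      AND (max_length > %s OR max_length = 0)
""", (1, length, length))

for row in cur.fetchall():
    print(row)  # one matching price band per product_type_id

cur.close()
conn.close()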

How can I use a function in DataFrame.withColumn in PySpark?

I have some dictionaries and a function defined:
dict_TEMPERATURE = {(0, 70): 'Low', (70.01, 73.99): 'Normal-Low',(74, 76): 'Normal', (76.01, 80): 'Normal-High', (80.01, 300): 'High'}
...
hierarchy_dict = {'TEMP': dict_TEMPERATURE, 'PRESS': dict_PRESSURE, 'SH_SP': dict_SHAFT_SPEED, 'POI': dict_POI, 'TRIG': dict_TRIGGER}
def function_definition(valor, atributo):
    dict_atributo = hierarchy_dict[atributo]
    valor_generalizado = None
    if isinstance(valor, (int, long, float, complex)):
        for key, value in dict_atributo.items():
            if(isinstance(key, tuple)):
                lista = list(key)
                if (valor > key[0] and valor < key[1]):
                    valor_generalizado = value
    else: # if it is not numeric
        valor_generalizado = dict_atributo.get(valor)
    return valor_generalizado
What this function basically does is check the value passed as an argument to "function_definition" and translate it according to the corresponding dictionary's ranges.
So, if I call "function_definition(60, 'TEMP')" it will return 'Low'.
On the other hand, I have a dataframe with the following structure (this is an example):
+----+-----+-----+---+----+
|TEMP|SH_SP|PRESS|POI|TRIG|
+----+-----+-----+---+----+
| 0| 1| 2| 0| 0|
| 0| 2| 3| 1| 1|
| 0| 3| 4| 2| 1|
| 0| 4| 5| 3| 1|
| 0| 5| 6| 4| 1|
| 0| 1| 2| 5| 1|
+----+-----+-----+---+----+
What I want to do is replace the values of one column of the dataframe based on the function defined above, so I have the following line of code:
dataframe_new = dataframe.withColumn(atribute_name, function_definition(dataframe[atribute_name], atribute_name))
But I get the following error message when executing it:
AssertionError: col should be Column
What is wrong in my code? How could I do that?
Your function_definition(valor, atributo) returns a single string (valor_generalizado) for a single valor.
AssertionError: col should be Column means that you are passing an argument to withColumn(colName, col) that is not a Column.
So you have to wrap your function in a Column expression, for example as you can see below.
Dataframe for example (same structure as yours):
a = [(10.0,1.2),(73.0,4.0)] # like your dataframe, this is only an example
dataframe = spark.createDataFrame(a,["tp", "S"]) # tp and S are random names for these columns
dataframe.show()
+----+---+
| tp| S|
+----+---+
|10.0|1.2|
|73.0|4.0|
+----+---+
As you can see here,
udf creates a Column expression representing a user defined function (UDF).
Solution:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

attr = 'TEMP'
udf_func = udf(lambda x: function_definition(x, attr), returnType=StringType())
dataframe_new = dataframe.withColumn("newCol", udf_func(dataframe.tp))
dataframe_new.show()
+----+---+----------+
| tp| S| newCol|
+----+---+----------+
|10.0|1.2| Low|
|73.0|4.0|Normal-Low|
+----+---+----------+
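To mirror the original call site, the same UDF can be applied to your own column, overwriting it in place (a sketch assuming atribute_name = 'TEMP' and the dataframe from your question):

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

atribute_name = 'TEMP'
udf_func = udf(lambda x: function_definition(x, atribute_name), returnType=StringType())

# withColumn with an existing column name replaces that column's values
dataframe_new = dataframe.withColumn(atribute_name, udf_func(col(atribute_name)))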

Join query issue in MySQL

I'm storing records in a hierarchy.
Ex.
Account -> Hospital -> Department
Account -> Hospital -> Department -> Section
I'm storing the associations of all the records in the following manner:
+------+---------------+----------+---------------+-----------+
| Id | ParentType | ParentId | Child Type | ChildId |
+------+---------------+----------+---------------+-----------+
| 1| account| 1| hospital| 10|
| 2| account| 1| hospital| 20|
| 3| hospital| 10| department| 100|
| 4| hospital| 10| department| 101|
| 5| department| 100| device| 1000|
| 6| department| 101| device| 1001|
| 6| department| 101| device| 1002|
| 1| account| 2| hospital| 30|
| 2| account| 2| hospital| 40|
| 3| hospital| 30| department| 200|
| 4| hospital| 40| department| 201|
| 5| department| 200| section| 5000|
| 5| department| 200| section| 5001|
| 6| section| 5000| device| 2001|
| 6| section| 5001| device| 2002|
+------+---------------+----------+---------------+-----------+
So, the account with id 1 follows the first hierarchy, whereas the account with id 2 follows the second hierarchy.
I need to fetch the records for a given level.
Ex.
Get all the devices belonging to account with id = 1
Get all the devices belonging to department with id = 200 and account with id = 2
and so on.
I can retrieve these with queries like:
First query:
SELECT a3.ChildType, a3.ChildId FROM association_lookup a1 -- [got hosp level]
JOIN association_lookup a2 ON a2.parentId = a1.ChildId -- [got dept level]
JOIN association_lookup a3 ON a3.parentId = a2.ChildId AND a3.ParentType = a2.ChildType -- [got device level]
WHERE a1.ParentId = 1 AND a1.ParentType = 'account'
AND a3.ChildType = 'device'
I can build this as a dynamic query, with a number of self joins equal to the level difference minus 1, i.e. account level = 0, device level = 3, hence 2 joins.
But now, if I want to associate a device at the hospital level instead of the department level, like:
| xx| hospital| 10| device| 1003|
then for the same query this device will be skipped and only the devices associated at the department level will be returned. How can I get all the devices (i.e. under both the hospital level and the department level)?
That is a horrible way to store data.
I suggest restructuring and creating a separate table for each entity,
i.e. create table account, create table hospital ...
Then you can join properly. Everything else would require dynamic iterative selection, which is not built into MySQL and needs to be done with an external program or by hand.
You can write a script to dynamically generate a table for each ParentType and ChildType, though.
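A rough sketch of such a generator script, in Python with mysql-connector (this is one possible reading of the suggestion: one link table per ParentType/ChildType pair; the connection details are placeholders and the column names follow the question):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="test")
cur = conn.cursor()

# One link table per (ParentType, ChildType) pair found in association_lookup,
# e.g. account_hospital(parent_id, child_id)
cur.execute("SELECT DISTINCT ParentType, ChildType FROM association_lookup")
pairs = cur.fetchall()

for parent_type, child_type in pairs:
    link = f"{parent_type}_{child_type}"   # built from trusted type names only
    cur.execute(f"CREATE TABLE IF NOT EXISTS {link} (parent_id INT, child_id INT)")
    cur.execute(f"INSERT INTO {link} (parent_id, child_id) "
                "SELECT ParentId, ChildId FROM association_lookup "
                "WHERE ParentType = %s AND ChildType = %s",
                (parent_type, child_type))

conn.commit()
cur.close()
conn.close()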

CakePHP (sub)query to get youngest record of each group

In a project I manage invoices that have a status which is changed throughout their lifetime. The status changes are saved in another database table which is similar to this:
|id|invoice_id|user_id|old_status_id|new_status_id|change_date |
-----------------------------------------------------------------------
| 1| 1| 1| 1| 3|2013-11-11 12:00:00|
| 2| 1| 2| 3| 5|2013-11-11 12:30:00|
| 3| 2| 3| 1| 2|2013-11-10 08:00:00|
| 4| 1| 1| 5| 6|2013-11-11 13:10:00|
| 5| 2| 2| 2| 5|2013-11-10 09:00:00|
For each invoice, I would like to retrieve the last status change. Thus the result should contain the records with the ids 4 and 5.
|id|invoice_id|user_id|old_status_id|new_status_id|change_date |
-----------------------------------------------------------------------
| 4| 1| 1| 5| 6|2013-11-11 13:10:00|
| 5| 2| 2| 2| 5|2013-11-10 09:00:00|
If I group by invoice_id and use max(change_date), I retrieve the youngest date, but the values of the other fields are not taken from the record containing the youngest date in the group.
That's challenge #1 for me.
Challenge #2 would be to implement the query with CakePHP's methods, if possible.
Challenge #3 would be to filter the result to those records belonging to the current user. So if the current user has the id 1, the result is
|id|invoice_id|user_id|old_status_id|new_status_id|change_date |
-----------------------------------------------------------------------
| 4| 1| 1| 5| 6|2013-11-11 13:10:00|
If he or she has user id 2, the result is
|id|invoice_id|user_id|old_status_id|new_status_id|change_date |
-----------------------------------------------------------------------
| 5| 2| 2| 2| 5|2013-11-10 09:00:00|
For the user with id 3 the result would be empty.
In other words, I do not want to find all the latest changes that a user has made, regardless of whether he was the last one who made a change. Instead, I want to find all invoice changes where that user was the last one so far who made a change. The motivation is that I want to enable a user to undo his change, which is only possible if no other user performed another change after him.
In case anyone needs an answer
Strictly focusing on:
I want to find all invoice changes where that user was the last one so far who made a change
Write the SQL as
SELECT foo.*
FROM foo
LEFT JOIN foo AS after_foo
ON foo.invoice_id = after_foo.invoice_id
AND foo.change_date < after_foo.change_date
WHERE after_foo.id IS NULL
AND foo.user_id = 1;
Implement it using the JOIN clause within CakePHP's find().
The SQL for the suggested algorithm is something like:
SELECT foo.*
FROM foo
JOIN (SELECT invoice_id, MAX(change_date) AS most_recent
FROM foo
GROUP BY invoice_id) AS recently
ON recently.invoice_id = foo.invoice_id
AND recently.most_recent = foo.change_date
WHERE foo.user_id = 1;