mysql sort by number and then varchar

I want to sort data like below in MySQL.
Unsorted data:
aa
1
2
f
11
3
df
10
Sorted data (desired output):
1
2
3
10
11
aa
df
f
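This thread carries no answer, so treat the following as a sketch. A common MySQL idiom for this ordering is ORDER BY col REGEXP '^[0-9]+$' DESC, col + 0, col: the REGEXP flag puts purely numeric strings first, col + 0 sorts those numerically, and the bare column breaks ties alphabetically. The same sort key can be checked in plain Python:

```python
data = ["aa", "1", "2", "f", "11", "3", "df", "10"]

def natural_key(s):
    # Numeric strings get flag 0 (sort first, by numeric value);
    # everything else gets flag 1 (sort after, alphabetically).
    return (0, int(s), "") if s.isdigit() else (1, 0, s)

print(sorted(data, key=natural_key))
# → ['1', '2', '3', '10', '11', 'aa', 'df', 'f']
```

The tuple key mirrors the three ORDER BY terms one for one.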

Related

Window partitionBy with condition

I have a DataFrame like this:
Sys_id Id
A 4
A 5
A 6
A 100
A 2
A 3
A 4
A 5
A 6
A 7
B 100
B 2
B 3
B 4
B 5
B 6
B 100
I want to fetch the Ids that fall between the rows where Id == 100. How can I do that with a partition over Sys_id?
I want output like this:
Sys_id Id
A 2
A 3
A 4
A 5
A 6
A 7
B 2
B 3
B 4
B 5
B 6
I tried using
Windowspec = Window.partitionBy("sys_id").orderBy("timestamp")
df = df.withColumn("id", (df.id == 100).cast("int"))
df = df.withColumn("next_id", lead("id", 1).over(Windowspec))
Is there any alternative way to get the answer?
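One alternative (a sketch, not a tested Spark job): instead of lead, take a running count of the Id == 100 marker rows within each partition and keep the rows where exactly one marker has been seen so far. In PySpark that running count would be F.sum((F.col("id") == 100).cast("int")).over(Window.partitionBy("sys_id").orderBy("timestamp")). The same logic in plain Python, assuming the rows are already in timestamp order:

```python
from itertools import groupby

# Sample rows as (sys_id, id), in order, matching the question's data.
rows = [("A", 4), ("A", 5), ("A", 6), ("A", 100), ("A", 2), ("A", 3),
        ("A", 4), ("A", 5), ("A", 6), ("A", 7),
        ("B", 100), ("B", 2), ("B", 3), ("B", 4), ("B", 5), ("B", 6), ("B", 100)]

result = []
for sys_id, group in groupby(rows, key=lambda r: r[0]):
    markers_seen = 0
    for _, row_id in group:
        if row_id == 100:
            markers_seen += 1        # running count of the Id == 100 markers
            continue
        if markers_seen == 1:        # after the first marker, before any second one
            result.append((sys_id, row_id))

print(result)
```

Filtering on running count == 1 (and dropping the marker rows themselves) yields exactly the desired output for both partitions.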

Comparing the contents of two csv files, where the relation between the two files is specified in a third file?

I have two files with sales data, and I want to validate whether the sales numbers in the first file match those in the second file. But the product IDs used in each file are different. I do have a third file with the correspondence between the old product IDs and the new product IDs.
Old Sales file
Product ID Store ID Week ID Sales
a 1 201801 5
a 2 201801 4
a 2 201802 3
b 1 201801 3
b 2 201802 4
b 3 201801 2
c 2 201802 2
New Sales file
Product ID Store ID Week ID Sales
X 1 201801 5
X 2 201801 4
X 2 201802 3
Y 1 201801 5
Y 2 201802 4
Y 3 201801 2
Z 2 201802 2
And an Old product ID/New Product ID correspondence file:
Old Product ID New Product ID
a X
b Y
c Z
I want to run a script or a command that could verify if the sales are the same for each product/store/week combination in both files. That is:
If a and X designate the same product, then I want to check whether, for a given store and a given week, the sales always match in both files.
Note that not all products present in the old sales file are necessarily present in the new sales file.
The output should look like:
Product ID Store ID Week ID Sales Diff
X 1 201801 0
X 2 201801 0
X 2 201802 0
Y 1 201801 2
Y 2 201802 0
Y 3 201801 0
Z 2 201802 0
I'm thinking of either pulling all 3 files into a bunch of pandas data frames and then merging and doing the validation using pandas merge and difference utilities, or pulling the files into some redshift tables and using SQL to validate. But both seem like overkill. Is there a simpler way of doing this using command line/bash utilities?
I'm a fan of the "do it in sql" approach, specifically, sqlite:
#!/bin/sh
oldsales="$1"
newsales="$2"
junction="$3"
# Import into database. Do once and reuse if running repeated reports on the same data
if [ ! -f sales.db ]; then
sqlite3 -batch sales.db <<EOF
CREATE TABLE old_sales(product_id TEXT, store_id INTEGER, week_id INTEGER, sales INTEGER
, PRIMARY KEY(product_id, store_id, week_id)) WITHOUT ROWID;
CREATE TABLE new_sales(product_id TEXT, store_id INTEGER, week_id INTEGER, sales INTEGER
, PRIMARY KEY(product_id, store_id, week_id)) WITHOUT ROWID;
CREATE TABLE mapping(old_id TEXT PRIMARY KEY, new_id TEXT) WITHOUT ROWID;
.mode csv
.separator \t
.import '|tail -n +2 "$oldsales"' old_sales
.import '|tail -n +2 "$newsales"' new_sales
.import '|tail -n +2 "$junction"' mapping
.quit
EOF
fi
# And query it
sqlite3 -batch sales.db <<EOF
.headers on
.mode list
.separator \t
SELECT n.product_id AS "Product ID", n.store_id AS "Store ID", n.week_id AS "Week ID"
, n.sales - o.sales AS "Sales Diff"
FROM old_sales AS o
JOIN mapping AS m ON o.product_id = m.old_id
JOIN new_sales AS n ON m.new_id = n.product_id
AND o.store_id = n.store_id
AND o.week_id = n.week_id
ORDER BY "Product ID", "Store ID", "Week ID";
.quit
EOF
This assumes your data files are delimited by tabs, and produces tab-delimited output (easy to change if desired). It also caches the data in the file sales.db and reuses that if it exists, so you can run the report multiple times on the same data and only populate the database the first time, for efficiency's sake.
$ ./report.sh old_sales.tsv new_sales.tsv product_mappings.tsv
Product ID Store ID Week ID Sales Diff
X 1 201801 0
X 2 201801 0
X 2 201802 0
Y 1 201801 2
Y 2 201802 0
Y 3 201801 0
Z 2 201802 0
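The same schema and join can be exercised from Python's sqlite3 module (in memory, with the sample rows inlined), which is handy for checking the query before wiring it into the shell script:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_sales(product_id TEXT, store_id INTEGER, week_id INTEGER, sales INTEGER);
CREATE TABLE new_sales(product_id TEXT, store_id INTEGER, week_id INTEGER, sales INTEGER);
CREATE TABLE mapping(old_id TEXT PRIMARY KEY, new_id TEXT);
""")

old_rows = [("a", 1, 201801, 5), ("a", 2, 201801, 4), ("a", 2, 201802, 3),
            ("b", 1, 201801, 3), ("b", 2, 201802, 4), ("b", 3, 201801, 2),
            ("c", 2, 201802, 2)]
new_rows = [("X", 1, 201801, 5), ("X", 2, 201801, 4), ("X", 2, 201802, 3),
            ("Y", 1, 201801, 5), ("Y", 2, 201802, 4), ("Y", 3, 201801, 2),
            ("Z", 2, 201802, 2)]
conn.executemany("INSERT INTO old_sales VALUES (?,?,?,?)", old_rows)
conn.executemany("INSERT INTO new_sales VALUES (?,?,?,?)", new_rows)
conn.executemany("INSERT INTO mapping VALUES (?,?)",
                 [("a", "X"), ("b", "Y"), ("c", "Z")])

# Same join as the shell script's report query.
diffs = conn.execute("""
SELECT n.product_id, n.store_id, n.week_id, n.sales - o.sales
FROM old_sales AS o
JOIN mapping   AS m ON o.product_id = m.old_id
JOIN new_sales AS n ON m.new_id = n.product_id
                   AND o.store_id = n.store_id
                   AND o.week_id = n.week_id
ORDER BY n.product_id, n.store_id, n.week_id
""").fetchall()
print(diffs)
```

Only the Y/1/201801 row shows a nonzero diff (2), matching the expected output above.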
Here's a suggestion for your pandas approach. I called your old dataframe old and your new dataframe new:
First we use your third dataframe as a dictionary to map the old Product IDs to the new ones:
product_id_dct = dict(zip(df3['Old Product ID'], df3['New Product ID']))
old['Product ID'] = old['Product ID'].map(product_id_dct)
print(old)
Product ID Store ID Week ID Sales
0 X 1 201801 5
1 X 2 201801 4
2 X 2 201802 3
3 Y 1 201801 3
4 Y 2 201802 4
5 Y 3 201801 2
6 Z 2 201802 2
Then we do a left merge on all the columns we want to compare. Since every column is a merge key here, each row of new is kept whether or not it found a match, so nothing shows up as NaN; pass indicator=True to get a _merge column that flags unmatched rows (here the Y/1/201801 row, whose sales changed from 3 to 5, comes out as left_only):
new.merge(old, on=['Product ID', 'Store ID', 'Week ID', 'Sales'],
          suffixes=['_new', '_old'],
          how='left')
Product ID Store ID Week ID Sales
0 X 1 201801 5
1 X 2 201801 4
2 X 2 201802 3
3 Y 1 201801 5
4 Y 2 201802 4
5 Y 3 201801 2
6 Z 2 201802 2
If we leave Sales out of the merge keys, the suffixes argument makes the comparison easy:
new.merge(old, on=['Product ID', 'Store ID', 'Week ID'],
suffixes=['_new', '_old'],
how='left')
Product ID Store ID Week ID Sales_new Sales_old
0 X 1 201801 5 5
1 X 2 201801 4 4
2 X 2 201802 3 3
3 Y 1 201801 5 3
4 Y 2 201802 4 4
5 Y 3 201801 2 2
6 Z 2 201802 2 2
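From there, the Sales Diff column the question asks for is just a subtraction. A self-contained version of the whole pandas approach (data inlined from the question):

```python
import pandas as pd

old = pd.DataFrame({"Product ID": list("aaabbbc"),
                    "Store ID":   [1, 2, 2, 1, 2, 3, 2],
                    "Week ID":    [201801, 201801, 201802, 201801, 201802, 201801, 201802],
                    "Sales":      [5, 4, 3, 3, 4, 2, 2]})
new = pd.DataFrame({"Product ID": list("XXXYYYZ"),
                    "Store ID":   [1, 2, 2, 1, 2, 3, 2],
                    "Week ID":    [201801, 201801, 201802, 201801, 201802, 201801, 201802],
                    "Sales":      [5, 4, 3, 5, 4, 2, 2]})
mapping = {"a": "X", "b": "Y", "c": "Z"}

# Map old IDs onto the new naming, merge on the three key columns,
# then subtract the two Sales columns produced by the suffixes.
old["Product ID"] = old["Product ID"].map(mapping)
merged = new.merge(old, on=["Product ID", "Store ID", "Week ID"],
                   suffixes=("_new", "_old"), how="left")
merged["Sales Diff"] = merged["Sales_new"] - merged["Sales_old"]
print(merged[["Product ID", "Store ID", "Week ID", "Sales Diff"]])
```

The resulting diff column is [0, 0, 0, 2, 0, 0, 0], matching the desired output.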
$ cat tst.awk
BEGIN { OFS="\t" }
ARGIND==1 { map[$2] = $1; next }
ARGIND==2 { old[$1,$2,$3] = $4; next }
FNR==1 { gsub(/ +/,OFS); sub(/ *$/,"_Diff"); print; next }
{ print $1, $2, $3, $4 - old[map[$1],$2,$3] }
$ awk -f tst.awk map old new | column -s$'\t' -t
Product ID Store ID Week ID Sales_Diff
X 1 201801 0
X 2 201801 0
X 2 201802 0
Y 1 201801 2
Y 2 201802 0
Y 3 201801 0
Z 2 201802 0
The above uses GNU awk for ARGIND. With other awks, add the line FNR==1 { ARGIND++ } right after the BEGIN line.

How to read files and bind columns?

I have many files, but I cannot work out how to bind their columns.
For example, files are followed
[1.txt]
ID Score
A 1
B 2
C 3
D 4
[2.txt]
ID Score
A 2
B 2
C 3
D 4
[3.txt]
ID Score
A 4
B 4
C 5
D 3
I want to make
A 1 2 4
B 2 2 4
C 3 3 5
D 4 4 3
You could use cbind(), which accepts any number of arguments:
df_final <- cbind(df1, df2["Score"], df3["Score"])
df_final
ID Score Score Score
1 A 1 2 4
2 B 2 2 4
3 C 3 3 5
4 D 4 4 3
Note that if you were trying to match IDs between data frames that did not coincidentally already have the order you want, then you would be asking for a database-style join. In that case, base R offers the merge() function.

Is it possible to display this data horizontally in SSRS?

I have a table, joined with another table, that looks like this:
Id Score Total
1 10 30
1 7 30
1 13 30
2 14 27
2 10 27
2 3 27
I want to be able to display this data like this in SSRS:
Id 1 2 3 Total
1 10 7 13 30
2 14 10 3 27
Can this be done and how?
You can do this by using a matrix.
You can add a row identifier for each Id in your dataset (assuming you can modify the dataset, since you joined two tables). The code below is for SQL Server (T-SQL).
SELECT Id, Score,
       ROW_NUMBER() OVER (PARTITION BY Id ORDER BY (SELECT NULL)) AS Ident
FROM yourTable
Here ORDER BY (SELECT NULL) leaves the rows in the order they arrive, which is what produces the output below; if you have a real ordering column, use it instead.
Output:
Id Score Ident
1 10 1
1 7 2
1 13 3
2 14 1
2 10 2
2 3 3
There is no need to return the Total field from the query; you can add it in the matrix (right-click the column group > Add Total > After).
Use the above query as the matrix dataset: group rows on Id, group columns on Ident, and put Score in the data cell.
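The row-number-then-pivot idea can be checked outside SQL Server too. SQLite (3.25+) supports the same window function, so here is a quick sketch via Python's sqlite3 module, using rowid in place of the (SELECT NULL) trick to keep insertion order:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores(Id INTEGER, Score INTEGER, Total INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?,?,?)",
                 [(1, 10, 30), (1, 7, 30), (1, 13, 30),
                  (2, 14, 27), (2, 10, 27), (2, 3, 27)])

# Number the rows within each Id, preserving insertion order via rowid.
rows = conn.execute("""
SELECT Id, Score,
       ROW_NUMBER() OVER (PARTITION BY Id ORDER BY rowid) AS Ident
FROM scores
ORDER BY Id, Ident
""").fetchall()

# Pivot Ident into columns, which is what the SSRS matrix does with Score.
pivot = defaultdict(dict)
for id_, score, ident in rows:
    pivot[id_][ident] = score
print(dict(pivot))
```

Each Id row of the pivot matches the desired matrix row (10, 7, 13 for Id 1 and 14, 10, 3 for Id 2).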

how to join two different datasets in ssrs

I have two datasets, a source and a destination dataset.
Source Dataset
Type A B C D E F G
X 1 2 3 4 5 6 7
Y 2 1 3 5 6 7 8
Z 3 4 5 6 7 8 9
Destination Dataset
Type A B C D E F G
X 0 2 3 6 3 7 9
Y 1 1 5 5 4 8 0
Z 2 3 4 4 5 9 9
Is it possible to create a report in the following format?
Type A B C D E F G
Source X 1 2 3 4 5 6 7
Destin X 0 2 3 6 3 7 9
Source Y 2 1 3 5 6 7 8
Destin Y 1 1 5 5 4 8 0
Source Z 3 4 5 6 7 8 9
Destin Z 2 3 4 4 5 9 9
Handle this in SQL itself with a query like this:
SELECT * FROM
(SELECT 'Source' AS myField, Type, A, B, C, D, E, F, G
 FROM Table1 T1
 UNION ALL
 SELECT 'Destination' AS myField, Type, A, B, C, D, E, F, G
 FROM Table2 T2) A
ORDER BY Type, myField DESC
Note that the second SELECT reads from Table2, and the ORDER BY sorts by Type first so that the Source and Destination rows for each Type sit next to each other ('Source' sorts after 'Destination' alphabetically, so DESC puts it first).
That is a better way than handling it in SSRS.
To solve it in SSRS, you would need to know whether the Types in the two datasets are mutually exclusive. If some Types exist in one dataset but not the other, you would have to do a lot of hardcoding, and every change in the input data would require a change to the report. If the Types in the two datasets do overlap, you might be able to use Lookup functions.
You can use the Lookup functionality, or, instead of doing the join in SSRS, it is better to do it in SQL.
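The UNION ALL approach can be verified quickly with SQLite via Python's sqlite3 module (table names source_ds and dest_ds are placeholders for the two dataset queries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("source_ds", "dest_ds"):
    conn.execute(f"CREATE TABLE {name}"
                 "(Type TEXT, A INT, B INT, C INT, D INT, E INT, F INT, G INT)")
conn.executemany("INSERT INTO source_ds VALUES (?,?,?,?,?,?,?,?)",
                 [("X", 1, 2, 3, 4, 5, 6, 7),
                  ("Y", 2, 1, 3, 5, 6, 7, 8),
                  ("Z", 3, 4, 5, 6, 7, 8, 9)])
conn.executemany("INSERT INTO dest_ds VALUES (?,?,?,?,?,?,?,?)",
                 [("X", 0, 2, 3, 6, 3, 7, 9),
                  ("Y", 1, 1, 5, 5, 4, 8, 0),
                  ("Z", 2, 3, 4, 4, 5, 9, 9)])

# Tag each side, stack them, and sort by Type first so the Source and
# Destination rows for each Type end up adjacent, Source on top.
rows = conn.execute("""
SELECT 'Source' AS myField, * FROM source_ds
UNION ALL
SELECT 'Destination' AS myField, * FROM dest_ds
ORDER BY Type, myField DESC
""").fetchall()
for r in rows:
    print(r)
```

The first two columns come out as Source/X, Destination/X, Source/Y, Destination/Y, Source/Z, Destination/Z, matching the requested layout.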