I'm trying to make a cross tabulation in R whose output resembles, as closely as possible, what I'd get from an Excel pivot table. The objective is to replace a report made manually with Excel and Word with one automated in R Markdown; data wrangling and charts have already been taken care of, but some tables are missing. So, given this code:
set.seed(2)
df <- data.frame(
  ministry   = paste("ministry ", sample(1:3, 20, replace = TRUE)),
  department = paste("department ", sample(1:3, 20, replace = TRUE)),
  program    = paste("program ", sample(letters[1:20], 20, replace = FALSE)),
  budget     = runif(20) * 1e6
)
library(tables)
library(dplyr)
arrange(df,ministry,department,program)
tabular(ministry*department~((Count=budget)+(Avg=(mean*budget))+(Total=(sum*budget))),data=df)
which yields (actual data is much more complicated):
Avg Total
ministry department Count budget budget
ministry 1 department 1 5 479871 2399356
department 2 1 770028 770028
department 3 1 184673 184673
ministry 2 department 1 2 170818 341637
department 2 1 183373 183373
department 3 3 415480 1246440
ministry 3 department 1 0 NaN 0
department 2 5 680102 3400509
department 3 2 165118 330235
This is as close as I could get to Excel results. I need to display subtotals, like this (generated in Excel using the exact same data):
Is it possible at all to get something like this in R (without manually coding the table cell-by-cell)?
Thanks!
Replace the left hand side with:
ministry * (department + 1) + 1
That is, try this:
tabular(ministry * (department + 1) + 1 ~
((Count = budget) + (Avg = (mean * budget)) + (Total = (sum * budget))),
data = df)
giving:
Avg Total
ministry department Count budget budget
ministry 1 department 1 5 479871 2399356
department 2 1 770028 770028
department 3 1 184673 184673
All 7 479151 3354057
ministry 2 department 1 2 170818 341637
department 2 1 183373 183373
department 3 3 415480 1246440
All 6 295242 1771449
ministry 3 department 1 0 NaN 0
department 2 5 680102 3400509
department 3 2 165118 330235
All 7 532963 3730744
All 20 442813 8856250
Update: correction.
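To make clear what the two `+ 1` terms add, here is a minimal sketch in plain Python of the same three levels of aggregation: one row per ministry/department pair, an "All" row per ministry, and a grand "All" row. The rows and the `summarize` helper are hypothetical and only illustrate the idea; they are not the data from the question. Unlike `tabular`, this sketch does not emit rows for empty combinations.

```python
from collections import defaultdict

# Hypothetical rows (ministry, department, budget); not the question's data.
rows = [
    ("ministry 1", "department 1", 100.0),
    ("ministry 1", "department 1", 300.0),
    ("ministry 1", "department 2", 50.0),
    ("ministry 2", "department 1", 250.0),
]

def summarize(values):
    """Return (count, mean, total) for a non-empty list of budgets."""
    total = sum(values)
    return len(values), total / len(values), total

by_dept = defaultdict(list)      # corresponds to ministry * department
by_ministry = defaultdict(list)  # corresponds to ministry * 1 (per-ministry "All")
overall = []                     # corresponds to the trailing + 1 (grand "All")

for ministry, department, budget in rows:
    by_dept[(ministry, department)].append(budget)
    by_ministry[ministry].append(budget)
    overall.append(budget)

# Assemble the report: detail rows, then the subtotal, for each ministry.
report = []
for ministry in sorted(by_ministry):
    for (m, d) in sorted(by_dept):
        if m == ministry:
            report.append((m, d) + summarize(by_dept[(m, d)]))
    report.append((ministry, "All") + summarize(by_ministry[ministry]))
report.append(("All", "") + summarize(overall))

for line in report:
    print(line)
```

Note that each subtotal mean is computed from the underlying rows, not by averaging the department means, which is also what the `Avg` column in the `tabular` output does.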
I have a database of transactions like in the table below
user_id order_id order_number product_name n
<int> <int> <int> <fctr> <int>
1 11878590 3 Pistachios 1
1 11878590 3 Soda 1
1 12878790 4 Yogurt 1
1 12878790 4 Cheddar Popcorn 1
1 12878790 4 Cinnamon Toast Crunch 1
2 12878791 11 Milk Chocolate Almonds 1
2 12878791 11 Half & Half 1
2 12878791 11 String Cheese 1
11 12878792 19 Whole Milk 1
11 12878792 19 Pistachios 1
11 12878792 19 Soda 1
11 12878792 19 Paper Towel Rolls 1
The table has multiple users who each have multiple transactions. Some users only have 3 transactions, other users have 15, etc. This is all in one table.
I'm trying to calculate a transition matrix for a Markov model. I want to find the probability that an item will be in a user's next basket given that it was present in the previous basket.
I want my final table to look something like this
user_id product_name probability_present probability_absent
1 Soda .5 .5
1 Pistachios .5 .5
I'm having trouble figuring out how to get the data into a form from which I can calculate the probabilities, and specifically how to compare all of the (t, t-1) basket combinations.
I have code that I've written to get things into this form, but I'm stuck at this point. I've written my code using the dplyr R package, but I could translate something in SQL into the R code. I can post my code in R if it will be helpful, but it is pretty simple at this point as I just had to do a few joins to get the table into this shape.
What else do I have to do to get the table/values that I'm trying to calculate?
This seems to give you the desired probabilities:
SELECT user_id,
product_name,
COUNT(DISTINCT order_number) / COUNT(*) AS prob_present,
1 - COUNT(DISTINCT order_number) / COUNT(*) AS prob_absent
FROM tbl
WHERE user_id = 1
GROUP BY user_id, product_name;
Or at least it gives you the numbers you have. If this is not right, please provide a slightly more complex example dataset.
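One possible reading of the expected output is "the share of a user's orders that contain the product". Under that assumption, the calculation can be sketched in plain Python as below. The variable names are made up for illustration, and this is not a full t versus t-1 transition estimate; it only counts in how many of the user's baskets each product appears.

```python
from collections import defaultdict

# (user_id, order_number, product_name) rows for user 1 from the question.
rows = [
    (1, 3, "Pistachios"),
    (1, 3, "Soda"),
    (1, 4, "Yogurt"),
    (1, 4, "Cheddar Popcorn"),
    (1, 4, "Cinnamon Toast Crunch"),
]

orders_per_user = defaultdict(set)      # user -> all order numbers
orders_with_product = defaultdict(set)  # (user, product) -> orders containing it

for user_id, order_number, product in rows:
    orders_per_user[user_id].add(order_number)
    orders_with_product[(user_id, product)].add(order_number)

# Share of the user's orders that contain the product, and its complement.
probabilities = {}
for (user_id, product), orders in orders_with_product.items():
    present = len(orders) / len(orders_per_user[user_id])
    probabilities[(user_id, product)] = (present, 1 - present)

for key, value in sorted(probabilities.items()):
    print(key, value)
```

The denominator here is the user's total number of orders, not the number of rows for the product, which is what distinguishes it from the query above.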
                 quantity          total
measure    lv1   lv2   lv3   summary   lv1%   lv2%   lv3%
xyz     |   2     1     4       7
frs     |   4     4     1       9      (how do I find each level's %?)
sdfkj   |   4     1     1       6
From my package data, I used a CASE statement to produce those columns; quantity is the measure of my crosstab.
My question is: how can I find the percentage for each level? The metric should be each cell divided by the row's total summary, but I don't see how to express that in the crosstab when its measure is quantity. xyz, frs and sdfkj are the rows returned by my query, lv1, lv2 and lv3 are my columns, and "total summary" is the summary option that totals the three levels. I am using Cognos 10.2.2.
Have you tried:
lv1%
[lv1]/([lv1] + [lv2] + [lv3])
lv2%
[lv2]/([lv1] + [lv2] + [lv3])
lv3%
[lv3]/([lv1] + [lv2] + [lv3])
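As a quick check of the arithmetic behind those expressions, here is a small Python sketch that applies "cell divided by row total" to the three rows from the question. It only illustrates the formula; it says nothing about how Cognos evaluates the expression inside a crosstab.

```python
# Quantities per level (lv1, lv2, lv3) for each row in the question.
rows = {
    "xyz": (2, 1, 4),
    "frs": (4, 4, 1),
    "sdfkj": (4, 1, 1),
}

# Each level's share of its row total, i.e. lvN / (lv1 + lv2 + lv3).
percentages = {
    name: tuple(value / sum(levels) for value in levels)
    for name, levels in rows.items()
}

for name, shares in percentages.items():
    print(name, [f"{share:.1%}" for share in shares])
```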
I am using Visual Studio 2015 and MySQL 5.7.
First, I have a table called "items" which contain the following columns and rows:
BranchNo itemNo itemName Qty Pkey
1 1 Chicken 99 11
1 2 Coke 99 12
1 3 Apple Pie 99 13
Then I have another table called "setmeal" which uses the Pkey (itemNo in "setmeal" table) of table "items":
setmealno branchno itemNo Name Price SetMealID entryNo
1 1 11 1pc Fried Chicken 69 11 1
1 1 12 1pc Fried Chicken 69 11 2
2 1 13 Apple Pie Single 50 12 3
3 1 12 Coke Drink 20 13 4
Basically, in table "setmeal", if two itemNo values share the same "SetMealID", they belong to the same set.
Then I have a "cart" which consists of (assuming I ordered):
TransactionNo UserNo SetMealNo Name Price Date BranchNo
1 1 11 1pc Fried Chicken 69 2015 1
What I want to do in Visual Studio 2015 is this: whenever something is put in the "cart", code should automatically deduct 1 from "Qty" in table "items" for every item whose set (SetMealID in table "setmeal") matches the SetMealNo in the cart.
I need code in Visual Studio that deducts 1 from Qty whenever the conditions above are met.
I think joining the three tables is the first step, although I also have no idea how to do that with MySQL SELECT statements.
EDIT: I found how to merge the 3 tables
SELECT s.setmealid, t.setmealNo, i.pKey, s.itemNo, i.qty
FROM items i, setmeal s, cart t
WHERE s.itemNo = i.pKey and t.setmealno != 0
AND s.setmealID = t.setmealNo
AND t.transactionNo = '" & Form2.TextBoxTransNo.Text & "';
It produces this table:
setmealid setmealNo pKey itemNo qty
11 11 11 11 99
11 11 12 12 99
All that's left is the Visual Studio code to deduct from "qty", but I do not know how to write it.
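The deduction itself can be expressed as a single UPDATE whose WHERE clause reuses the join above. The following is only a sketch under assumptions: it runs against an in-memory SQLite database from Python rather than MySQL from Visual Studio, the table definitions are reconstructed from the samples in the question, and `transaction_no` is a made-up variable standing in for the text box value. The UPDATE statement uses plain SQL that should carry over to MySQL, and the transaction number is passed as a parameter instead of being concatenated into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items   (BranchNo INT, itemNo INT, itemName TEXT, Qty INT, Pkey INT);
CREATE TABLE setmeal (setmealno INT, branchno INT, itemNo INT, Name TEXT,
                      Price INT, SetMealID INT, entryNo INT);
CREATE TABLE cart    (TransactionNo INT, UserNo INT, SetMealNo INT, Name TEXT,
                      Price INT, Date INT, BranchNo INT);
INSERT INTO items VALUES
  (1, 1, 'Chicken', 99, 11),
  (1, 2, 'Coke', 99, 12),
  (1, 3, 'Apple Pie', 99, 13);
INSERT INTO setmeal VALUES
  (1, 1, 11, '1pc Fried Chicken', 69, 11, 1),
  (1, 1, 12, '1pc Fried Chicken', 69, 11, 2),
  (2, 1, 13, 'Apple Pie Single', 50, 12, 3),
  (3, 1, 12, 'Coke Drink', 20, 13, 4);
INSERT INTO cart VALUES (1, 1, 11, '1pc Fried Chicken', 69, 2015, 1);
""")

transaction_no = 1  # hypothetical stand-in for Form2.TextBoxTransNo.Text

# Deduct 1 from every item that belongs to a set meal in this transaction.
conn.execute(
    """
    UPDATE items
    SET Qty = Qty - 1
    WHERE Pkey IN (
        SELECT s.itemNo
        FROM setmeal s
        JOIN cart t ON s.SetMealID = t.SetMealNo
        WHERE t.TransactionNo = ?
    )
    """,
    (transaction_no,),
)

quantities = dict(conn.execute("SELECT itemName, Qty FROM items"))
print(quantities)
```

One limitation of this form: because it uses IN, an item is deducted once per transaction even if the same set meal appears in the cart several times. Handling repeated cart rows would need a count per item instead.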
I'm building an e-commerce platform (PHP + MySQL) and I want to add an attribute (feature) to products: the ability to enable or disable selling in specific cities.
Here are simplified tables:
cities
id name
==========
1 Roma
2 Berlin
3 Paris
4 London
products
id name cities
==================
1 TV 1,2,4
2 Phone 1,3,4
3 Book 1,2,3,4
4 Guitar 3
In this simple example it is easy to query (using FIND_IN_SET or LIKE) to check the availability of a product in a specific city.
This is OK for the 4 cities in this example, or even 100 cities, but will it be practical for a large number of cities and a very large number of products?
For better "performance" or better database design, should I add another table to JOIN in the query (productid, cityid, status)?
availability
id productid cityid status
=============================
1 1 1 1
2 1 2 1
3 1 4 1
4 2 1 1
5 2 3 1
6 2 4 1
7 3 1 1
8 3 2 1
9 3 3 1
10 3 4 1
11 4 3 1
For better "performance" or better database design should I add another table
Yes, you should definitely create another table to hold that information, like the one you posted, rather than storing it in a comma-separated list, which goes against normalization. Also, with the comma-separated list there is no way to get good performance when you need to JOIN to find out which products are available in which cities.
If at any point you want to get back a comma-separated list of values such as 1,2,4, you can GROUP BY productid and use GROUP_CONCAT(cityid) to produce it.
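A minimal sketch of that design follows, using an in-memory SQLite database from Python so it can be run as is. The composite primary key on (productid, cityid) is my assumption (the question's table has a separate id column), and SQLite's GROUP_CONCAT stands in for MySQL's function of the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
-- One row per (product, city) pair; the composite key is an assumption.
CREATE TABLE availability (
    productid INTEGER REFERENCES products(id),
    cityid    INTEGER REFERENCES cities(id),
    status    INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (productid, cityid)
);
INSERT INTO cities VALUES (1, 'Roma'), (2, 'Berlin'), (3, 'Paris'), (4, 'London');
INSERT INTO products VALUES (1, 'TV'), (2, 'Phone'), (3, 'Book'), (4, 'Guitar');
INSERT INTO availability (productid, cityid) VALUES
  (1, 1), (1, 2), (1, 4),
  (2, 1), (2, 3), (2, 4),
  (3, 1), (3, 2), (3, 3), (3, 4),
  (4, 3);
""")

# Products sold in Paris (city id 3): a plain indexed join, no string matching.
in_paris = [
    name
    for (name,) in conn.execute(
        """
        SELECT p.name
        FROM products p
        JOIN availability a ON a.productid = p.id
        WHERE a.cityid = ? AND a.status = 1
        ORDER BY p.id
        """,
        (3,),
    )
]

# Rebuild the comma-separated city list per product when it is needed.
city_lists = {
    productid: sorted(int(c) for c in cities.split(","))
    for productid, cities in conn.execute(
        "SELECT productid, GROUP_CONCAT(cityid) FROM availability GROUP BY productid"
    )
}

print(in_paris)
print(city_lists)
```

The lookup by city can use an index on the key columns, which FIND_IN_SET or LIKE on a text column cannot.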
If I have an output dataset from a CTE that looks like
PERIOD FT GROUP DEPT VALUE
1 Actual KINDER MATH 200
2 Actual KINDER MATH 363
3 Actual KINDER MATH 366
1 Budget KINDER MATH 457
2 Budget KINDER MATH 60
3 Budget KINDER MATH 158
1 Actual HIGHSCH ENGLISH 456
2 Actual HIGHSCH ENGLISH 745
3 Actual HIGHSCH ENGLISH 125
1 Budget HIGHSCH ENGLISH 364
2 Budget HIGHSCH ENGLISH 158
3 Budget HIGHSCH ENGLISH 200
6 Budget HIGHSCH ENGLISH 502
7 Budget HIGHSCH ENGLISH 650
1 Actual COLL ENGLISH 700
2 Actual COLL ENGLISH 540
3 Actual COLL ENGLISH 160
1 Budget COLL ENGLISH 820
2 Budget COLL ENGLISH 630
3 Budget COLL ENGLISH 800
but I want to add a column that will have an identifier for each group (the grouping being by FT, Group and Dept) like this:
PERIOD FT GROUP DEPT VALUE GroupID
1 Actual KINDER MATH 200 1
2 Actual KINDER MATH 363 1
3 Actual KINDER MATH 366 1
1 Budget KINDER MATH 457 2
2 Budget KINDER MATH 60 2
3 Budget KINDER MATH 158 2
1 Actual HIGHSCH ENGLISH 456 3
2 Actual HIGHSCH ENGLISH 745 3
3 Actual HIGHSCH ENGLISH 125 3
1 Budget HIGHSCH ENGLISH 364 4
2 Budget HIGHSCH ENGLISH 158 4
3 Budget HIGHSCH ENGLISH 200 4
1 Budget HIGHSCH ENGLISH 502 5
2 Budget HIGHSCH ENGLISH 650 5
3 Budget HIGHSCH ENGLISH 336 5
1 Actual COLL ENGLISH 700 6
2 Actual COLL ENGLISH 540 6
3 Actual COLL ENGLISH 160 6
1 Budget COLL ENGLISH 820 7
2 Budget COLL ENGLISH 630 7
3 Budget COLL ENGLISH 800 7
Do you know how I can go about this?
EDIT:
I feel like something in this direction may be useful
SELECT *,
CASE WHEN FT = 'Actual' THEN <something_incremental_to_do_with_row_num> OVER (PARTITION DEPT, GROUP, FT) END as GROUPID
FROM cte
I can't use ORDER BY in the OVER clause because I am on SQL Server 2008
It's hard to say without seeing the SQL for the query, but I would think a variant on 'row_number() over (partition by [some fields])' would give you this. Have you looked into that?
This existing question might give you what you need:
How to add sequence number for groups in a SQL query without temp tables
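One such variant is DENSE_RANK() ordered by the grouping columns, which gives every distinct (FT, GROUP, DEPT) combination its own number. The sketch below runs the idea against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later), with a few rows taken from the question. Two assumptions: the platform is SQL Server, where the column would be written [GROUP] instead of "GROUP", and, as far as I know, ranking functions there have accepted ORDER BY inside OVER since SQL Server 2005, so this should also work on 2008. The IDs follow the sort order of the columns, not the order of appearance shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE cte (PERIOD INT, FT TEXT, "GROUP" TEXT, DEPT TEXT, VALUE INT)'
)
conn.executemany(
    "INSERT INTO cte VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Actual", "KINDER", "MATH", 200),
        (2, "Actual", "KINDER", "MATH", 363),
        (1, "Budget", "KINDER", "MATH", 457),
        (2, "Budget", "KINDER", "MATH", 60),
        (1, "Actual", "HIGHSCH", "ENGLISH", 456),
        (1, "Budget", "HIGHSCH", "ENGLISH", 364),
        (1, "Actual", "COLL", "ENGLISH", 700),
    ],
)

# DENSE_RANK assigns the same number to rows that tie on the ORDER BY columns,
# with no gaps between consecutive groups.
rows = conn.execute(
    """
    SELECT PERIOD, FT, "GROUP", DEPT, VALUE,
           DENSE_RANK() OVER (ORDER BY FT, "GROUP", DEPT) AS GroupID
    FROM cte
    ORDER BY GroupID, PERIOD
    """
).fetchall()

# Map each group key to its ID to confirm the ID is constant within a group.
group_ids = {(ft, grp, dept): gid for _, ft, grp, dept, _, gid in rows}

for row in rows:
    print(row)
```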
Chen, I believe your answer is already available in another thread:
How to return a incremental group number per group in SQL
Additionally, you could just create a unique_id for each group by concatenating fields. For example, if you wanted to group by FT, GROUP, and DEPT, then just put them all together. Ex:
SELECT *,
CAST(FT AS VARCHAR) +
CAST([GROUP] AS VARCHAR) +
CAST(DEPT AS VARCHAR) AS UNIQUE_ID
FROM MYTABLE
If you wanted to use it for a while to do a more complex query, then just throw it into a temp table:
SELECT *,
CAST(FT AS VARCHAR) +
CAST([GROUP] AS VARCHAR) +
CAST(DEPT AS VARCHAR) AS UNIQUE_ID
INTO #MYTEMPTABLE
FROM MYTABLE
You only need the CAST function if there are different data types in the fields you want to join. Best to keep it in to avoid any unnecessary headaches. Hope this helps.