I'm learning MySQL (via mode) and am approaching the Advanced section. I have a dataset that contains events that recur and I want to add a column that identifies the first, second, third, fourth, etc (no limit) occurrence of the events for that day.
CustomerID ActivityType Day Sequence
Adam Inquiry 1 1
Barb Inquiry 1 2
Adam Inquiry 1 3
Charlie Inquiry 1 4
Barb Order 1 5
Charlie Inquiry 1 6
Adam Inquiry 1 7
Barb Order 1 8
I've searched here for problems that seem similar, to help focus my learning but I don't see anything quite the same.
My desired output would be the same as above but with an added column that shows the sequence of recurring combinations of customer and activity, like this:
CustomerID ActivityType Day Sequence Recur
Adam Inquiry 1 1 1
Barb Inquiry 1 2 1
Adam Inquiry 1 3 2
Charlie Inquiry 1 4 1
Barb Order 1 5 1
Charlie Inquiry 1 6 2
Adam Inquiry 1 7 3
Barb Order 1 8 2
In MySQL 8.0, you can simply use the ROW_NUMBER() window function for that purpose:
SELECT
    t.*,
    ROW_NUMBER() OVER (PARTITION BY CustomerID, ActivityType ORDER BY Sequence) AS recur
FROM mytable t;
Note: As commented by Strawberry, storing that information in an additional column does not make a lot of sense, because it can be complicated to maintain: for example, if your table is updated, you may need to recompute the whole column. It is better to compute the information on the fly when querying the table.
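To make concrete what PARTITION BY ... ORDER BY computes here, the following is a minimal Python sketch (not MySQL) that reproduces the Recur column for the sample rows in the question. The function name add_recur and the tuple layout are assumptions made for illustration only. If the count should restart every day, Day would also have to be part of the grouping key.

```python
from collections import defaultdict

def add_recur(rows):
    """Append a running count per (customer, activity) pair to each row.

    rows: iterable of (customer_id, activity_type, day, sequence) tuples.
    Mirrors ROW_NUMBER() OVER (PARTITION BY customer, activity ORDER BY sequence).
    """
    counters = defaultdict(int)
    result = []
    # Process rows in Sequence order, as the ORDER BY clause does.
    for customer, activity, day, seq in sorted(rows, key=lambda r: r[3]):
        counters[(customer, activity)] += 1
        result.append((customer, activity, day, seq, counters[(customer, activity)]))
    return result

# Sample rows copied from the question.
rows = [
    ("Adam", "Inquiry", 1, 1),
    ("Barb", "Inquiry", 1, 2),
    ("Adam", "Inquiry", 1, 3),
    ("Charlie", "Inquiry", 1, 4),
    ("Barb", "Order", 1, 5),
    ("Charlie", "Inquiry", 1, 6),
    ("Adam", "Inquiry", 1, 7),
    ("Barb", "Order", 1, 8),
]

for row in add_recur(rows):
    print(row)
```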
Related
I have a database of transactions like in the table below
user_id  order_id  order_number  product_name            n
<int>    <int>     <int>         <fctr>                  <int>
1        11878590  3             Pistachios              1
1        11878590  3             Soda                    1
1        12878790  4             Yogurt                  1
1        12878790  4             Cheddar Popcorn         1
1        12878790  4             Cinnamon Toast Crunch   1
2        12878791  11            Milk Chocolate Almonds  1
2        12878791  11            Half & Half             1
2        12878791  11            String Cheese           1
11       12878792  19            Whole Milk              1
11       12878792  19            Pistachios              1
11       12878792  19            Soda                    1
11       12878792  19            Paper Towel Rolls       1
The table has multiple users who each have multiple transactions. Some users only have 3 transactions, other users have 15, etc. This is all in one table.
I'm trying to calculate a transition matrix for a markov model. I want to find the probability that an item will be in a new basket given that it was present in the previous basket of transactions.
I want my final table to look something like this
user_id product_name probability_present probability_absent
1 Soda .5 .5
1 Pistachios .5 .5
I'm having trouble figuring out how to get the data into a form so that I can calculate the probabilities and specifically coming up with a way to compare all of the t,t-1 combinations.
I have code that I've written to get things into this form, but I'm stuck at this point. I've written my code using the dplyr R package, but I could translate something in SQL into the R code. I can post my code in R if it will be helpful, but it is pretty simple at this point as I just had to do a few joins to get the table into this shape.
What else do I have to do to get the table/values that I'm trying to calculate?
This seems to give you the desired probabilities:
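SELECT t.user_id,
       t.product_name,
       -- share of this user's orders that contain the product
       COUNT(DISTINCT t.order_number) / o.n_orders AS prob_present,
       1 - COUNT(DISTINCT t.order_number) / o.n_orders AS prob_absent
FROM tbl t
JOIN (SELECT user_id, COUNT(DISTINCT order_number) AS n_orders
      FROM tbl
      GROUP BY user_id) o ON o.user_id = t.user_id
WHERE t.user_id = 1
GROUP BY t.user_id, t.product_name, o.n_orders;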
Or at least it gives you the numbers you have. If this is not right, please provide a slightly more complex example dataset.
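For the t, t-1 comparison the question asks about, here is a minimal Python sketch under stated assumptions: baskets are ordered by order_number within each user, and the probability is estimated as the fraction of a product's appearances in a basket that are followed by an appearance in the next basket. The function name and the small data set are made up for illustration and are not the asker's data.

```python
from collections import defaultdict

def transition_probabilities(rows):
    """Estimate P(product in basket t | product in basket t-1) per user.

    rows: iterable of (user_id, order_number, product_name) tuples.
    Returns {(user_id, product_name): probability} for every product that
    appears in at least one basket that has a following basket.
    """
    # Group products into one set per (user, order).
    baskets = defaultdict(lambda: defaultdict(set))
    for user, order_no, product in rows:
        baskets[user][order_no].add(product)

    seen = defaultdict(int)  # times the product was in basket t-1
    kept = defaultdict(int)  # ... and also in basket t
    for user, orders in baskets.items():
        ordered = [orders[k] for k in sorted(orders)]
        # Walk consecutive basket pairs (t-1, t).
        for prev, nxt in zip(ordered, ordered[1:]):
            for product in prev:
                seen[(user, product)] += 1
                if product in nxt:
                    kept[(user, product)] += 1
    return {key: kept[key] / seen[key] for key in seen}

# Hypothetical data: one user, three consecutive orders.
sample = [
    (1, 1, "Soda"), (1, 1, "Pistachios"),
    (1, 2, "Soda"),
    (1, 3, "Soda"), (1, 3, "Yogurt"),
]
probs = transition_probabilities(sample)
print(probs)
```

The probability of absence given presence is then 1 minus each value. Products that only appear in a user's last basket get no entry, because no transition out of that basket is observed.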
I am pretty new to MySQL and this site. I have an old MySQL database (100,000 entries) to migrate to our new system. This is the old table:
CUSTOMER
Customer_ID Name Categories
1 Bob 1,2
2 Phil NULL
3 Ines 10,8
4 Carol 1
5 Rick 13,2
And I need the following structure:
CUSTOMER
Customer_ID Name
1 Bob
2 Phil
3 Ines
4 Carol
5 Rick
Category
Category_ID Category_Name
1 Biker
2 Doctors
3 Teacher
... ...
13 Drivers
CustomerHasCategory
Customer_ID Category_ID
1 1
1 2
3 10
3 8
4 1
5 13
5 2
Thanks for any help.
I also had this problem but not in MySQL. I solved it with Python using the Pandas library. So, the exact steps I followed won't be useful for you. However, I'll show you the general idea behind the solution I used.
Below is an image of the original column
First, I split the text into columns using the commas as the delimiter.
Next, I 'stacked' the columns
Finally, I removed the artefact column(s), so that only the ID and the values columns remain. This creates a one-to-many relationship.
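The split-and-stack idea can be sketched in plain Python (no pandas) as follows. The function name split_categories is made up for illustration, the rows are the ones from the question, and NULL is represented as None. The resulting pairs are what would be inserted into CustomerHasCategory.

```python
def split_categories(customers):
    """Turn (customer_id, name, 'a,b,c') rows into (customer_id, category_id) pairs."""
    pairs = []
    for customer_id, _name, categories in customers:
        if not categories:  # NULL or empty string: the customer has no categories
            continue
        for part in categories.split(","):
            pairs.append((customer_id, int(part.strip())))
    return pairs

# Rows copied from the old CUSTOMER table in the question.
old_customers = [
    (1, "Bob", "1,2"),
    (2, "Phil", None),
    (3, "Ines", "10,8"),
    (4, "Carol", "1"),
    (5, "Rick", "13,2"),
]

customer_has_category = split_categories(old_customers)
for pair in customer_has_category:
    print(pair)
```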
I'm building an e-commerce platform (PHP + MySQL) and I want to add an attribute (feature) to products: the ability to enable or disable selling a product in a specific city.
Here are simplified tables:
cities
id name
==========
1 Roma
2 Berlin
3 Paris
4 London
products
id name cities
==================
1 TV 1,2,4
2 Phone 1,3,4
3 Book 1,2,3,4
4 Guitar 3
In this simple example it is easy to query (using FIND_IN_SET or LIKE) to check the availability of a product for a specific city.
This is OK for the 4 cities in this example, or even for 100 cities, but will it be practical for a large number of cities and a very large number of products?
For better "performance" or better database design, should I add another table to JOIN in the query (productid, cityid, status)?
availability
id productid cityid status
=============================
1 1 1 1
2 1 2 1
3 1 4 1
4 2 1 1
5 2 3 1
6 2 4 1
7 3 1 1
8 3 2 1
9 3 3 1
10 3 4 1
11 4 3 1
For better "performance" or better database design should I add
another table
Yes, you should definitely create another table to hold that information, as in the table you posted, rather than storing it in a comma-separated list, which goes against normalization. Also, with a comma-separated list there is no way to get good performance when you need to JOIN to find out which products are available in which cities.
If at any point you want to get back a comma-separated list of values like 1,2,4, you can GROUP BY productid and use GROUP_CONCAT(cityid).
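As a rough sketch of both points, the snippet below loads the availability rows from the question into an in-memory SQLite database via Python's sqlite3 module. SQLite is used here only as a stand-in for MySQL; it also provides GROUP_CONCAT, but the order of the concatenated values is not guaranteed without further work. The variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE availability (productid INTEGER, cityid INTEGER, status INTEGER)"
)
# (productid, cityid) pairs copied from the question; status is 1 for all rows.
conn.executemany(
    "INSERT INTO availability VALUES (?, ?, 1)",
    [(1, 1), (1, 2), (1, 4), (2, 1), (2, 3), (2, 4),
     (3, 1), (3, 2), (3, 3), (3, 4), (4, 3)],
)

# Which products are sold in city 3? A plain equality filter that an index can serve.
in_city_3 = [
    row[0]
    for row in conn.execute(
        "SELECT productid FROM availability "
        "WHERE cityid = 3 AND status = 1 ORDER BY productid"
    )
]

# Rebuild the comma-separated list per product when it is needed for display.
city_lists = {
    productid: cities
    for productid, cities in conn.execute(
        "SELECT productid, GROUP_CONCAT(cityid) FROM availability GROUP BY productid"
    )
}

print(in_city_3)
print(city_lists)
```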
I want to set a display sequence for all users.
I use a select box to set the sequence.
It lists the numbers from 1 up to the total number of users; for example, with 5 records in total it displays
1,2,3,4,5
id name sequence
------------------------
1 Steve 3
2 lee 2
3 Lisa 1
4 John 0
5 smith 0
But when I edit the sequence for John, I want the select box to show only 4 or 5,
because 1, 2 and 3 are already used by other users.
Can I manage all of this in a single query?
Please advise.
Suppose I have the following:
tbl_options
===========
id name
1 experience
2 languages
3 hourly_rate
tbl_option_attributes
=====================
id option_id name value
1 1 beginner 1
2 1 advanced 2
3 2 english 1
4 2 french 2
5 2 spanish 3
6 3 £10 p/h 10
7 3 £20 p/h 20
tbl_user_options
================
user_id option_id value
1 1 2
1 2 1
1 2 2
1 2 3
1 3 20
In the above example tbl_user_options stores option data for the user. We can store multiple entries for some options.
Now I wish to extend this, i.e. for "languages" I want the user to be able to specify their proficiency in a language (basic/intermediate/advanced). There will also be other fields that will have extended attributes.
So my question is, can these extended attributes be stored in the same table (tbl_user_options) or do I need to create more tables? Obviously if I put in a field "language_proficiency" it won't apply to the other fields. But this way I only have one user options table to manage. What do you think?
EDIT: This is what I propose
tbl_user_options
================
user_id option_id value lang_prof
1 1 2 null
1 2 1 2
1 2 2 3
1 2 3 3
1 3 20 null
My gut instinct would be to split the User/Language/Proficiency relationship out into its own tables. Even if you kept it in the same table with your other options, you'd need to write special code to handle the language case, so you might as well use a new table structure.
Unless your data model is in constant flux, I would rather have tbl_languages and tbl_user_languages tables to store those types of data:
tbl_languages
================
lang_id name
1 English
2 French
3 Spanish
tbl_user_languages
================
user_id lang_id proficiency hourly_rate
1 1 1 20
1 2 2 10
2 2 1 15
2 2 3 20
3 3 2 10
Designing a system that is "too generic" is a Turing tarpit trap for a relational SQL database. A document-based database is better suited to arbitrary key-value stores.
Excepting certain optimisations, your database model should match your domain model as closely as possible to minimise the object-relational impedance mismatch.
This design lets you display a sensible table of user language proficiencies and hourly rates with only two inner joins:
SELECT
    ul.user_id,
    u.name,
    l.name,
    ul.proficiency,
    ul.hourly_rate
FROM tbl_user_languages ul
INNER JOIN tbl_languages l
    ON l.lang_id = ul.lang_id
INNER JOIN tbl_users u
    ON u.user_id = ul.user_id
ORDER BY
    l.name, ul.hourly_rate;
Optionally you can split out a list of language proficiencies into a tbl_proficiencies table, where 1 == Beginner, 2 == Advanced, 3 == Expert, and join it onto tbl_user_languages.
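A minimal sketch of that optional lookup table, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL. The column names proficiency_id and label are assumptions, and only a few of the sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_languages (lang_id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical lookup table: column names are illustrative.
CREATE TABLE tbl_proficiencies (proficiency_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE tbl_user_languages (
    user_id INTEGER, lang_id INTEGER, proficiency INTEGER, hourly_rate INTEGER
);
INSERT INTO tbl_languages VALUES (1, 'English'), (2, 'French'), (3, 'Spanish');
INSERT INTO tbl_proficiencies VALUES (1, 'Beginner'), (2, 'Advanced'), (3, 'Expert');
INSERT INTO tbl_user_languages VALUES (1, 1, 1, 20), (1, 2, 2, 10), (3, 3, 2, 10);
""")

# Resolve both the language name and the proficiency label through joins.
language_rows = conn.execute("""
SELECT ul.user_id, l.name, p.label, ul.hourly_rate
FROM tbl_user_languages ul
JOIN tbl_languages l ON l.lang_id = ul.lang_id
JOIN tbl_proficiencies p ON p.proficiency_id = ul.proficiency
ORDER BY ul.user_id, l.name
""").fetchall()

for row in language_rows:
    print(row)
```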
I think it's a mistake to put "languages" as an option. Reading your text, it seems to me that English is an option, and it might have an attribute from option_attributes.