How to request lists that contain certain items in MySQL - mysql

In the application I am developing, the user has to set parameters to define the end product he will get.
My tables look like this :
Categories
-------------
Id Name
1 Material
2 Color
3 Shape
Parameters
-------------
Id CategoryId Name
1 1 Wood
2 1 Plastic
3 1 Metal
4 2 Red
5 2 Green
6 2 Blue
7 3 Round
8 3 Square
9 3 Triangle
Combinations
-------------
Id
1
2
...
ParametersCombinations
----------------------
CombinationId ParameterId
1 1
1 4
1 7
2 1
2 5
2 7
Now only some combinations of parameters are available to the user. In my example, he could get a red round wooden thingy or a green round wooden thingy but not a blue one because I can't produce it.
Let's say the user selected wood and round parameters. How do I make a request to know that there's only red and green available so I can disable the blue option for him ?
Or is there some better way to model my database ?

Let us assume you provide the selected parameter ids in the following format:
// I call this a **parameterList** for convenience.
(1,7) // this is parameter id 1 and id 7.
I am also assuming you are using a scripting language such as Ruby or PHP to build your app.
I am also assuming you want to keep as much logic as possible out of stored procedures and MySQL queries.
Another assumption is that you are using one of the rapid application MVC frameworks like Rails, Symfony or CakePHP.
Your logic would be:
1. Find all the combinations that contain ALL the parameters in your parameterList and put them in a list called relevantCombinations.
2. Find all the parameters_combinations rows that belong to at least one of the combinations in relevantCombinations. Retrieve only the unique parameter values.
These two steps can be solved using simple Model::find methods and a for loop in the frameworks I described above.
If you are not using a framework, the raw scripting language works just as well.
If you need them as MySQL queries, here are some possible queries. Be aware that these are not necessarily the best queries.
First one is
SELECT * FROM (
SELECT `PossibleList`.`CombinationId`, COUNT(`PossibleList`.`CombinationId`) as number
FROM (
SELECT `CombinationId` FROM `ParametersCombinations`
WHERE `ParameterId` IN (1, 7)
) `PossibleList` GROUP BY `PossibleList`.`CombinationId`
) `PossibleGroupedList` WHERE `number` = 2;
-- note that the (1, 7) and the number 2 needs to be supplied by your app.
-- 2 refers to the number of parameters supplied.
-- In this case you supplied 1 and 7 therefore 2.
To confirm, look at http://sqlfiddle.com/#!2/16831/3.
Note how I purposely have a Combination 3 which only has the Parameter 1 but not 7. Therefore the query did not give you back 3, but only 1 and 2. Feel free to tweak the asterisk * in the first line.
Second one is
SELECT DISTINCT(`ParameterID`)
FROM `ParametersCombinations`
WHERE `CombinationId` IN (1, 2);
-- note that (1, 2) is the result we expect from the first step.
-- the one we call relevantCombinations
To confirm, look at http://sqlfiddle.com/#!2/16831/5
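The two steps can also be driven from a scripting language. Below is a minimal sketch in Python, assuming SQLite as a stand-in for MySQL so that it runs on its own. The function name available_parameters is made up for illustration, and the extra Combination 3 mirrors the one in the fiddle. It also drops the already selected parameters from the result.

```python
import sqlite3

def available_parameters(conn, selected):
    """Return parameter ids that appear alongside ALL selected ids."""
    marks = ",".join("?" * len(selected))
    # Step 1: combinations that contain every selected parameter.
    relevant = [row[0] for row in conn.execute(
        f"SELECT CombinationId FROM ParametersCombinations "
        f"WHERE ParameterId IN ({marks}) "
        f"GROUP BY CombinationId "
        f"HAVING COUNT(DISTINCT ParameterId) = ?",
        (*selected, len(selected)))]
    if not relevant:
        return []
    # Step 2: distinct parameters of those combinations, minus the selection.
    combo_marks = ",".join("?" * len(relevant))
    rows = conn.execute(
        f"SELECT DISTINCT ParameterId FROM ParametersCombinations "
        f"WHERE CombinationId IN ({combo_marks}) "
        f"AND ParameterId NOT IN ({marks}) "
        f"ORDER BY ParameterId",
        (*relevant, *selected))
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ParametersCombinations (CombinationId INTEGER, ParameterId INTEGER)")
conn.executemany(
    "INSERT INTO ParametersCombinations VALUES (?, ?)",
    [(1, 1), (1, 4), (1, 7), (2, 1), (2, 5), (2, 7), (3, 1)])

print(available_parameters(conn, [1, 7]))  # [4, 5], i.e. Red and Green
```

The ids are bound as placeholders rather than pasted into the SQL string, so the same function works for any number of selected parameters.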
I do not recommend being a masochist and attempting to get your answer in a single query.
I also do NOT recommend using the MySQL queries I have supplied. They are less masochistic, but still masochistic enough for me NOT to recommend this way.
Since you did not add any tag other than mysql, I suspect that MySQL is what you are strongest with. Hence my answer contains MySQL.
My strongest suggestion would be my first. Make full use of established frameworks and put your logic in the business logic layer, not in the data layer. Even if you don't use frameworks and just use raw PHP or Ruby, that is still a better place for your logic than MySQL.
I saw that T gave an answer in a single MySQL query but I can tell you that (s)he considers only 1 parameter.
See this part:
WHERE ParameterId = 7 -- 7 is the selected parameter
You can adapt his/her answer with some trickery using a forloop and appending OR clauses.
Again, I do NOT recommend that in the big picture of building an app.
I have also tested his/her answer with http://sqlfiddle.com/#!2/2eda4/2. There may be 1 or 2 small bugs.
In summary, my recommendations in descending order of strength:
Use a framework like Rails or CakePHP and the pseudocode step 1 and 2 and as many find as you need. (STRONGEST)
Use raw scripting language and the pseudocode step 1 and 2 and as many simple queries as you need.
Use the raw MySQL queries I created. (LEAST STRONG)
P.S. I left out the part in my queries as to how to get the name of the Parameters. But given that you can get the ParameterIDs from my answer, I think that is trivial. I have also left out how you may need to remove the already selected parameters (1, 7). Again, that should be trivial to you.

Try the following
SELECT p.*, pc.CombinationId
FROM Parameters p
-- get the parameter combinations for all the parameters
JOIN ParametersCombinations pc
ON pc.ParameterId = p.Id
-- filter the parameter combinations to only combinations that include the selected parameter
JOIN (
SELECT CombinationId
FROM ParametersCombinations
WHERE ParameterId = 7 -- 7 is the selected parameter
) f ON f.CombinationId = pc.CombinationId
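Or, with several selected parameters, keeping only the combinations that contain all of them and removing the already selected parameters
SELECT p.*, pc.CombinationId
FROM Parameters p
JOIN ParametersCombinations pc
ON pc.ParameterId = p.Id
JOIN (
SELECT CombinationId
FROM ParametersCombinations
WHERE ParameterId IN (7, 1)
GROUP BY CombinationId
HAVING COUNT(DISTINCT ParameterId) = 2 -- 2 is the number of selected parameters
) f ON f.CombinationId = pc.CombinationId
WHERE pc.ParameterId NOT IN (7, 1)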

Related

MySQL - counter query

Have two tables users and user_demographics
users has the basic structure of (does have more fields but not needed here):
id name email gender age ethnicity
1 test1 test1#test.com 1 1 1
2 test2 test2#test.com 1 2 1
3 test3 test3#test.com 2 3 2
4 test4 test4#test.com 3 1 1
5 test5 test5#test.com 2 4 5
**Gender**:
1 - Male,
2 - Female,
3 - Prefer not to say
**Age**:
1 - 16-20,
2 - 21-24,
3 - 25-30,
4 - 31-24
**Ethnicity**:
1 - White,
2 - Black,
5 - Prefer not to say
and so on and currently have around 1000 users.
user_demographics structure is:
coreid, type, option (for the sake of this question 'type' will be text, just to make it clearer)
coreid, type option
1 gender 1
1 gender 2
1 age 1
1 age 3
1 ethnicity 1
2 gender 2
2 gender 3
2 age 3
3 gender 1
On a web-based form I have 3 sets of checkbox lists, one for each option (gender, age, ethnicity), and a user can select multiple from each. They click update and these details are stored in the MySQL database as above. coreid is related to another table, but that is not relevant here.
What I'm trying to do is get a total count of users for each coreid regardless of what type it is. The count should get smaller the more options you select. So coreid 3 should have the biggest count because I've only selected one option.
Example: coreid 3 is selecting all males
Example: coreid 2 is selecting all (females AND 'prefer not to say') AND age range 25-30
Struggling on how to create a single query that will give me the results I need, hope this makes sense.
The idea behind the over all system is that we have a large form that a user fill outs and we store in the information in the users table. Then a member of the admin team can go in and select these users by selecting options from the various demographics information we have collected. So they might just want to see everyone that has ticked the gender options of 'male' and 'prefer not to say' for example. Another admin member may go in and say they want all males, between the age of 25-30. Or they could just tick all options under gender. The idea is that they can select any combination and get a list of results. At the minute I just need to get a count back for the combination selected.
By the SOUNDS of it, you are probably going to need dynamic SQL, where you build the query on the fly and then execute it. Also, to clarify, what I THINK you are asking is as follows. A CoreID is like a set of filters that some manager uses to get a count and/or details of specific users. For CoreID = 1 they are interested in
EITHER gender condition (1 or 2)
AND EITHER age condition (1 or 3)
AND just the one ethnicity
to possibly target products that might hit those demographics. So you would pre-query every record for CoreID = 1 then start building your query. You would want to order your query by the TYPE to group common items such as the gender, age, ethnicity categories.
Then, within your application code (language not indicated: C#, VB, Java, whatever), you would need to build the query so that conditions within the same category are OR'ed inside parentheses, and different categories are joined with a logical AND, such as
where
( Gender = 1
OR Gender = 2 )
AND ( Age = 1
OR Age = 3 )
AND ( Ethnicity = 1 )
If you are trying to write this as a MySQL stored procedure, it would be a type of dynamic SQL query. Either way, the WHERE clause needs to be constructed from the Core criteria someone is looking for.
You are correct that the last one is the easiest: CoreID = 3 would be a simple
WHERE ( Gender = 1 )
Clarify your language and I or others might be able to offer additional direction. If I am accurate, you should try to write your own first pass of code, but I will shoot out some pseudo-code for you, something like
Get records for one CoreID, ordered by the type of criteria.
prep variable identifying if there is a pending open parenthesis
prep variable identifying the last "type" we were building for
for each record
    if new type
        if has open parenthesis
            add closing parenthesis
            add logical AND before the next entry we are getting
        add open parenthesis
        set flag that we have an open parenthesis
        remember this type as the last type
    else
        since same type as last type, add logical OR
    add the condition for this record (type = option)
    go to next record, repeat
after the last record, always close the open parenthesis, even for a single criterion
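Since no language was given, here is one way that pseudo-code could look in Python. Treat it as a sketch under assumptions: build_where and ALLOWED_COLUMNS are names I made up, and I am assuming each demographic type maps to a column of the same name in users. It groups by type instead of tracking an open-parenthesis flag, which yields the same WHERE clause.

```python
from itertools import groupby

# Assumed whitelist: demographic type -> column name in the users table.
ALLOWED_COLUMNS = {"gender": "gender", "age": "age", "ethnicity": "ethnicity"}

def build_where(criteria):
    """criteria: list of (type, option) rows stored for one coreid.

    Returns (where_sql, params) with options bound as placeholders.
    """
    clauses, params = [], []
    # Sorting puts rows of the same type next to each other for groupby.
    for kind, rows in groupby(sorted(criteria), key=lambda row: row[0]):
        column = ALLOWED_COLUMNS[kind]  # raises KeyError for unknown types
        options = [option for _, option in rows]
        # OR within one category, wrapped in parentheses.
        clauses.append("(" + " OR ".join(f"{column} = ?" for _ in options) + ")")
        params.extend(options)
    # AND between different categories.
    return " AND ".join(clauses), params

core1 = [("gender", 1), ("gender", 2), ("age", 1), ("age", 3), ("ethnicity", 1)]
where_sql, params = build_where(core1)
print("SELECT COUNT(*) FROM users WHERE " + where_sql, params)
```

Column names come only from the whitelist and option values are bound as parameters, so nothing from the form is pasted into the SQL text. An empty criteria list returns an empty string, which the caller has to handle (for example by leaving out the WHERE keyword).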

SQL - Add To Existing Average

I'm trying to build a reporting table to track server traffic and popularity overall. Each SID is a unique game server hosting a particular game, and each UCID is a unique player key connecting to that server.
Say I have a table like so:
SID UCID AvgTime NumConnects
-----------------------------------------
1 AIE9348ietjg 300.55 5
1 Po328gieijge 500.66 7
2 AIE9348ietjg 234.55 3
3 Po328gieijge 1049.88 18
We can see that there are 2 unique players, and 3 unique servers, with SID 1 having 2 players that have connected to it at some point in the past. The AvgTime is the average amount of time those players spent on that server (in seconds), and the NumConnects is the size of the average (ie. 300.55 is averaged out of 5 elements).
Now I run a job in the background where I process a raw connection table and pull out player connections like so:
SID UCID ConnectTime DisconnectTime
-----------------------------------------
1 AIE9348ietjg 90.35 458.32
2 Po328gieijge 30.12 87.15
2 AIE9348ietjg 173.12 345.35
This table has no ID or other fluff to help condense my example. There may be multiple connect/disconnect records for multiple players in this table. What I want to do is add to my existing AvgTime for each SID these new values.
There is a formula from here I am trying to use (taken from this math stackexchange: https://math.stackexchange.com/questions/1153794/adding-to-an-average-without-unknown-total-sum/1153800#1153800)
Average = (Average * Size + NewValue) / (Size + 1)
How can I write an update query to update each ServerIDs traffic table above, and add to the average using the above formula for each pair of records. I tried something like the following but it didn't work (returned back null):
UPDATE server_traffic st
LEFT JOIN connect_log l
ON st.SID = l.SID AND st.UCID = l.UCID
SET AvgTime = (AvgTime * NumConnects + SUM(l.DisconnectTime - l.ConnectTime) / NumConnects + COUNT(l.UCID)
I would prefer an answer in MySql, but I'll accept MS SQL as well.
EDIT
I understand that statistics and calculations are generally not to be stored in tables and that you can run reports that would crunch the numbers for you. My requirement is that users can go to a website and view the popularity of various servers. This needs to be done in a way that
A: running a complex query per user doesn't crash or slow down the system
B: the page returns the data within a few seconds at most
See this example here: https://bf4stats.com/pc/shinku555555
This is a web page for battlefield 4 stats - notice that the load is almost near instant for this player, and I get back a load of statistics without waiting for some complex report query to return the data. I'm assuming they store these calculations in preprocessed tables where the webpage just needs to do a simple select to return back the values. That's the same approach I want to take with my Database and Web Application design.
Sorry if this is off topic to the original question - but hopefully this adds additional context that helps people understand my needs.
Aggregate functions like SUM and COUNT cannot be applied row by row in SQL; they have to live inside an aggregate query. So consider joining to an aggregate subquery in the UPDATE...LEFT JOIN. Also, adjust the parentheses in SET to match the formula above.
Also, note that since you use LEFT JOIN, rows with no matching IDs get NULL for the aggregate fields, and any arithmetic on NULL returns NULL. You can convert NULL to zero with IFNULL(), but the formula's division may then fail if the denominator ends up as zero.
UPDATE server_traffic s
LEFT JOIN
(SELECT SID, UCID, COUNT(UCID) As GrpCount,
SUM(DisconnectTime - ConnectTime) AS SumTimeDiff
FROM connect_log
GROUP BY SID, UCID) l
ON s.SID = l.SID AND s.UCID = l.UCID
SET s.AvgTime = (s.AvgTime * s.NumConnects + l.SumTimeDiff) / (s.NumConnects + l.GrpCount)
Aside - reconsider saving calculations/statistics within tables as they can always be run by queries even by timestamps. Ideally, database tables should store raw values.
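To sanity-check the arithmetic outside the database, the same update can be written as a small Python function. This is only an illustration; merge_average is a name I made up. It folds a batch of new connections (their total duration and their count) into a stored average, which is what the SET clause does with SumTimeDiff and GrpCount.

```python
def merge_average(avg, count, new_sum, new_count):
    """Fold new_count observations totalling new_sum into a stored average.

    Returns (new_average, new_size).
    """
    total = count + new_count
    if total == 0:
        # Nothing stored and nothing new: avoid dividing by zero.
        return 0.0, 0
    return (avg * count + new_sum) / total, total

# SID 1 / UCID AIE9348ietjg: stored average 300.55 over 5 connects,
# plus one new session lasting 458.32 - 90.35 seconds.
new_avg, new_size = merge_average(300.55, 5, 458.32 - 90.35, 1)
print(round(new_avg, 2), new_size)  # 311.79 6
```

Note that the size has to be stored back as well (NumConnects + GrpCount), otherwise the next run weights the old average incorrectly.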

database schema one column entry references many rows from another table

Let's say we have a table called Workorders and another table called Parts. I would like to have a column in Workorders called parts_required. This column would contain a single item that tells me what parts were required for that workorder. Ideally, this would contain the quantities as well, but a second column could contain the quantity information if needed.
Workorders looks like
WorkorderID date parts_required
1 2/24 ?
2 2/25 ?
3 3/16 ?
4 4/20 ?
5 5/13 ?
6 5/14 ?
7 7/8 ?
Parts looks like
PartID name cost
1 engine 100
2 belt 5
3 big bolt 1
4 little bolt 0.5
5 quart oil 8
6 Band-aid 0.1
Idea 1: create a string like '1-1:2-3:4-5:5-4'. My application would parse this string and show that I need --> 1 engine, 3 belts, 5 little bolts, and 4 quarts of oil.
Pros - simple enough to create and understand.
Cons - will make deep introspection into our data much more difficult. (costs over time, etc)
Idea 2: use a binary number. For example, to reference the above list (engine, belt, little bolts, oil) using an 8-bit integer would be 54, because 54 in binary representation is 110110.
Pros - datatype is optimal concerning size. Also, I am guessing there are tricky math tricks I could use in my queries to search for parts used (don't know what those are, correct me if I'm in the clouds here).
Cons - I do not know how to handle quantity using this method. Also, Even with a 64-bit BIGINT still only gives me 64 parts that can be in my table. I expect many hundreds.
Any ideas? I am using MySQL. I may be able to use PostgreSQL, and I understand that they have more flexible datatypes like JSON and arrays, but I am not familiar with how querying those would perform. Also it would be much easier to stay with MySQL
Why not create a Relationship table?
You can create a table named Workorders_Parts with the following content:
|workorderId, partId|
So when you want to get all parts from a specific workorder you just type:
select p.name
from parts p inner join workorders_parts wp on wp.partId = p.partId
where wp.workorderId = x;
what the query says is:
Give me the name of parts that belongs to workorderId=x and are listed in table workorders_parts
Remember that INNER JOIN means "intersection". In other words, the data I'm looking for (generally the id) must exist in both tables.
It will give you all the part names that are used to build workorder x.
Lets say we have workorderId = 1 with partID = 1,2,3, it will be represented in our relationship table as:
workorderId | partId
1 | 1
1 | 2
1 | 3
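The question also asked about quantities. One way to handle that, sketched below in Python with SQLite as a stand-in for MySQL, is an extra quantity column on the relationship table. That column is my assumption and not part of the table described above; parts_for and total_cost are illustrative names. The rows encode the example from the question (1 engine, 3 belts, 5 little bolts, 4 quarts of oil).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (partId INTEGER PRIMARY KEY, name TEXT, cost REAL);
CREATE TABLE workorders_parts (
    workorderId INTEGER,
    partId      INTEGER REFERENCES parts(partId),
    quantity    INTEGER NOT NULL,   -- assumed extra column
    PRIMARY KEY (workorderId, partId)
);
INSERT INTO parts VALUES
    (1, 'engine', 100), (2, 'belt', 5), (3, 'big bolt', 1),
    (4, 'little bolt', 0.5), (5, 'quart oil', 8), (6, 'Band-aid', 0.1);
INSERT INTO workorders_parts VALUES (1, 1, 1), (1, 2, 3), (1, 4, 5), (1, 5, 4);
""")

def parts_for(workorder_id):
    """Names and quantities of the parts required by one workorder."""
    return conn.execute(
        "SELECT p.name, wp.quantity FROM parts p "
        "JOIN workorders_parts wp ON wp.partId = p.partId "
        "WHERE wp.workorderId = ? ORDER BY p.partId",
        (workorder_id,)).fetchall()

def total_cost(workorder_id):
    """Sum of cost * quantity over the parts of one workorder."""
    return conn.execute(
        "SELECT SUM(p.cost * wp.quantity) FROM parts p "
        "JOIN workorders_parts wp ON wp.partId = p.partId "
        "WHERE wp.workorderId = ?",
        (workorder_id,)).fetchone()[0]

print(parts_for(1))
print(total_cost(1))  # 100*1 + 5*3 + 0.5*5 + 8*4 = 149.5
```

Because every part is its own row, questions like "cost over time" become plain joins and sums, which was the main weakness of the string and bitmask ideas.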

Mysql combinations for 2 data sets

I would like a table or query in mysql of all permutations of two separate datasets, but with rules.
I have a table of jobs, and a table of drivers.
I'd like to produce table or query of all combinations of jobs to drivers. Each job must have a driver, but each driver doesn't necessarily need a job. Like this:
In this example I have 4 jobs and 3 drivers.
Job1 | Job2 | Job3 | Job4
1 1 1 1
1 1 1 2
1 1 1 3
1 1 2 1
1 1 3 1
This, I can't do, so if someone could help me that would be awesome. I believe that the number of rows in this example would be 4 to the power 3 (jobs to the power of drivers) which is 64 rows.
But the second part of this is what I call the "rules". Each job will have defined drivers that can do the job.
For example Job 1 can only be done by driver 1 or 3.
Job 2 can only be done by driver 1.
etc
I was thinking that if I did a create table, then running delete queries, but I am really at a loss. I would like to just create the query using the rules to start with in an attempt to speed it up.
This will eventually help me to make a plan for each job by showing all the ways that these jobs can be assigned.
Sorry for being vague but I'm hoping the community can help me out here.
Edit:
I think my maths may have been wrong. According to this: combination calculator where I input 3 to choose from (drivers) and 4 numbers chosen (jobs) order is important and repetition is allowed (not sure what that is) then it produces 81.
Although this is quite unclear and smells like homework, here are my two cents:
Set up one table with the jobs, jobs:
job_id|job_name
Next, set up a table with drivers:
driver_id|driver_name
Now we need two maps:
First the "rules", i.e. which drivers are capable of which jobs, job_capabilities:
driver_id|job_id
This will contain one row for each job => driver assignment. The combination (job_id, driver_id) should be unique, since a driver is capable of a particular job only once.
The second map contains the assignments themselves, assigned_jobs:
driver_id|job_id
This actually has the same layout, but for a given time period (which is missing here) one driver can only work on one job, so driver_id and a date-time should be unique. Skipped for clarity now.
Now we can construct an SQL like
SELECT job_capabilities.driver_id, jobs.job_id from jobs, job_capabilities
where jobs.job_id = job_capabilities.job_id
AND jobs.job_id = 42;
We could just use that to insert into assigned_jobs with
INSERT INTO assigned_jobs .... (Select from above) ...
probably enhanced by an ON DUPLICATE KEY UPDATE ... clause.
To now get the assigned jobs, we can alter that statement a bit:
SELECT assigned_jobs.driver_id, drivers.driver_name, jobs.job_id, jobs.job_name
from drivers, jobs, assigned_jobs
where jobs.job_id = assigned_jobs.job_id
AND drivers.driver_id = assigned_jobs.driver_id
AND jobs.job_id = 42;
This is not tested and may not be valid SQL, but it is the first approach I would use.
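For the "all the ways these jobs can be assigned" part, it may be simpler to read the rules out of job_capabilities and enumerate the plans in application code. A minimal Python sketch follows. The rules for Job1 and Job2 are the ones from the question; those for Job3 and Job4 are assumed (any driver), and enumerate_assignments is an illustrative name.

```python
from itertools import product

def enumerate_assignments(capable):
    """capable: dict mapping each job to the drivers allowed to do it.

    Yields one dict per complete plan (every job gets exactly one driver).
    """
    jobs = sorted(capable)
    # Cartesian product of the allowed-driver lists, one list per job.
    for combo in product(*(capable[job] for job in jobs)):
        yield dict(zip(jobs, combo))

rules = {
    "Job1": [1, 3],     # from the question
    "Job2": [1],        # from the question
    "Job3": [1, 2, 3],  # assumed: no rule given
    "Job4": [1, 2, 3],  # assumed: no rule given
}
plans = list(enumerate_assignments(rules))
print(len(plans))  # 2 * 1 * 3 * 3 = 18

no_rules = {job: [1, 2, 3] for job in rules}
print(len(list(enumerate_assignments(no_rules))))  # 3 ** 4 = 81
```

Applying the rules before building the product means forbidden rows are never generated, instead of creating all 81 rows and deleting afterwards.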

Storing Hierarchical Data (MySQL) for Referral Marketing

I need a 5-level hierarchy for the users registered on a website. Every user is invited by another, and I need to know all descendants of a user, and also all ancestors of a user.
I have 2 solutions in mind.
Keeping a table with relationships this way. A closure table:
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
Keeping the 5 levels of ancestors in one table. An "ancestors" table:
user_id ancestor_level1_id ancestor_level2_id ancestor_level3_id ancestor_level4_id ancestor_level5_id
10 9 7 4 3 2
9 7 4 3 2 1
Are these good ideas?
I know about "the adjacency list model" and "the modified preorder tree traversal algorithm", but are these good solutions for a "referral" system?
The queries that I need to perform on this tree are:
frequently adding new users
when a user buys something, their referrers get a percentage commission
every user should be able to find out how many people they've referred (and how many people were referred by people who they referred....) at each level
Closure Table
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
To add user 10, referred by user 3. (I don't think you need to lock the table between these two insertions):
insert into ancestor_table
select ancestor_id, 10, distance+1
from ancestor_table
where descendant_id=3;
insert into ancestor_table values (10,10,0);
To find all users referred by user 3.
select descendant_id from ancestor_table where ancestor_id=3;
To count those users by depth:
select distance, count(*) from ancestor_table where ancestor_id=3 group by distance;
To find the ancestors of user 10.
select ancestor_id, distance from ancestor_table where descendant_id=10;
The drawback to this method is the amount of storage space this table will take.
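Here is a small runnable check of the closure table queries, written in Python with SQLite standing in for MySQL. The add_user helper is a name I made up, and wrapping the two inserts in one transaction is my choice, not a requirement stated above. Note that the self rows (distance 0) are filtered out with distance > 0 when listing referrals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ancestor_table ("
    "ancestor_id INTEGER, descendant_id INTEGER, distance INTEGER, "
    "PRIMARY KEY (ancestor_id, descendant_id))")
# Rows from the example: six self rows plus 'user 2 referred user 3'.
conn.executemany(
    "INSERT INTO ancestor_table VALUES (?, ?, ?)",
    [(i, i, 0) for i in range(1, 7)] + [(2, 3, 1)])

def add_user(conn, new_id, referrer_id):
    """Register new_id as referred by referrer_id."""
    with conn:  # both inserts commit together or not at all
        # Copy every ancestor row of the referrer, one level deeper.
        conn.execute(
            "INSERT INTO ancestor_table "
            "SELECT ancestor_id, ?, distance + 1 "
            "FROM ancestor_table WHERE descendant_id = ?",
            (new_id, referrer_id))
        # Self row for the new user.
        conn.execute(
            "INSERT INTO ancestor_table VALUES (?, ?, 0)", (new_id, new_id))

add_user(conn, 10, 3)

ancestors = conn.execute(
    "SELECT ancestor_id, distance FROM ancestor_table "
    "WHERE descendant_id = ? AND distance > 0 ORDER BY distance",
    (10,)).fetchall()
print(ancestors)  # [(3, 1), (2, 2)]

per_level = conn.execute(
    "SELECT distance, COUNT(*) FROM ancestor_table "
    "WHERE ancestor_id = ? AND distance > 0 "
    "GROUP BY distance ORDER BY distance",
    (2,)).fetchall()
print(per_level)  # [(1, 1), (2, 1)]
```

For the 5-level commission rule, adding AND distance <= 5 to the ancestor query would limit the payout chain without changing the table.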
Use the OQGRAPH storage engine.
You probably want to keep track of an arbitrary number of levels, rather than just 5 levels. Get one of the MySQL forks that supports the OQGRAPH engine (such as MariaDB or OurDelta), and use that to store your tree. It implements the adjacency list model, but by using a special column called latch to send a command to the storage engine, telling it what kind of query to perform, you get all of the advantages of a closure table without needing to do the bookkeeping work each time someone registers for your site.
Here are the queries you'd use in OQGRAPH. See the documentation at
http://openquery.com/graph-computation-engine-documentation
We're going to use origid as the referrer, and destid as the referred user.
To add user 11, referred by user 10
insert into ancestors_table (origid,destid) values (10,11)
To find all users referred by user 3.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND origid = 3;
To find the ancestors of user 10.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND destid = 10;
To find the number of users at each level, referred by user 3:
SELECT count(linkid), weight
FROM ancestors_table
WHERE latch = 2 AND origid = 3
GROUP BY weight;
Managing Hierarchical Data in MySQL
In general, I like the "nested set", esp. in MySQL which doesn't really have language support for hierarchical data.
It's fast, but you'll need to make sure your developers read that article if ease of maintenance is a big deal. It's very flexible - which doesn't seem to matter much in your case.
It seems a good fit for your problem - in the referral model, you need to find the tree of referrers, which is fast in the nested set model; you also need to know who the children of a given user are, and the depth of their relationship; this is also fast.
Delimited String of Ancestors
If you're strongly considering the 5-level relationship table, it may simplify things to use a delimited string of ancestors instead of 5 separate columns.
user_id depth ancestors
10 7 9,7,4,3,2,1
9 6 7,4,3,2,1
...
2 2 1
1 1 (empty string)
Here are some SQL commands you'd use with this model:
To add user 11, referred by user 10
insert into ancestors_table (user_id, depth, ancestors)
select 11, depth+1, concat(10,',',ancestors)
from ancestors_table
where user_id=10;
To find all users referred by user 3. (Note that this query can't use an index.)
select user_id
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3';
To find the ancestors of user 10. You need to break up the string in your client program. In Ruby, the code would be ancestorscolumn.split(",").map{|x| x.to_i}. There's no good way to break up the string in SQL.
select ancestors from ancestors_table where user_id=10;
To find the number of users at each level, referred by user 3:
select
depth-(select depth from ancestors_table where user_id=3),
count(*)
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3'
group by depth;
You can avoid SQL injection attacks in the like '%,3,%' parts of these queries by using like concat('%,', ?, ',%') instead and binding an integer for the user number to the placeholder.
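A variation worth trying, sketched here in Python with SQLite standing in for MySQL: wrap the ancestors column in commas before matching, so a single bound pattern covers the first, middle, last and only-element cases. This is a swapped-in technique, not the three-way OR shown above. In MySQL the condition would read concat(',', ancestors, ',') like concat('%,', ?, ',%'). Users 13 and 40 are made-up rows to show that 13 is not mistaken for 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ancestors_table ("
    "user_id INTEGER PRIMARY KEY, depth INTEGER, ancestors TEXT)")
conn.executemany(
    "INSERT INTO ancestors_table VALUES (?, ?, ?)",
    [(1, 1, ""), (2, 2, "1"), (3, 3, "2,1"), (4, 4, "3,2,1"),
     (7, 5, "4,3,2,1"), (9, 6, "7,4,3,2,1"), (10, 7, "9,7,4,3,2,1"),
     (13, 3, "2,1"), (40, 4, "13,2,1")])  # 13 and 40 are hypothetical

def descendants(conn, user_id):
    """All users that have user_id somewhere in their ancestors string."""
    rows = conn.execute(
        "SELECT user_id FROM ancestors_table "
        # SQLite spells concatenation as ||; the user id is a bound parameter.
        "WHERE ',' || ancestors || ',' LIKE '%,' || ? || ',%' "
        "ORDER BY user_id",
        (user_id,))
    return [row[0] for row in rows]

def ancestors_of(conn, user_id):
    """Split the delimited string client-side, nearest ancestor first."""
    row = conn.execute(
        "SELECT ancestors FROM ancestors_table WHERE user_id = ?",
        (user_id,)).fetchone()
    return [int(part) for part in row[0].split(",") if part]

print(descendants(conn, 3))    # [4, 7, 9, 10]
print(ancestors_of(conn, 10))  # [9, 7, 4, 3, 2, 1]
```

Like the original queries, this still cannot use an index, so it only suits small tables.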