MySQL - counter query

I have two tables, users and user_demographics.
users has the following basic structure (it has more fields, but they are not needed here):
id name email gender age ethnicity
1 test1 test1#test.com 1 1 1
2 test2 test2#test.com 1 2 1
3 test3 test3#test.com 2 3 2
4 test4 test4#test.com 3 1 1
5 test5 test5#test.com 2 4 5
**Gender**:
1 - Male,
2 - Female,
3 - Prefer not to say
**Age**:
1 - 16-20,
2 - 21-24,
3 - 25-30,
4 - 31-24
**Ethnicity**:
1 - White,
2 - Black,
5 - Prefer not to say
and so on. There are currently around 1000 users.
user_demographics structure is:
coreid, type, option (for the sake of this question 'type' will be text, just to make it clearer)
coreid, type option
1 gender 1
1 gender 2
1 age 1
1 age 3
1 ethnicity 1
2 gender 2
2 gender 3
2 age 3
3 gender 1
On a web-based form I have 3 sets of checkbox lists, one for each option (gender, age, ethnicity), and a user can select multiple values from each. They click update and these details are stored in the MySQL database as above. coreid is related to another table, but that is not relevant here.
What I'm trying to do is get a total count of users for each coreid regardless of what type it is. The count should get smaller the more options you select. So coreid 3 should have the biggest count because I've only selected one option.
Example: coreid 3 is selecting all males
Example: coreid 2 is selecting all (females AND 'prefer not to say') AND age range 25-30
I'm struggling with how to create a single query that will give me the results I need. I hope this makes sense.
The idea behind the overall system is that we have a large form that a user fills out, and we store the information in the users table. Then a member of the admin team can go in and select these users by choosing options from the various demographic information we have collected. So they might just want to see everyone who has ticked the gender options 'male' and 'prefer not to say', for example. Another admin member may go in and say they want all males between the ages of 25 and 30. Or they could just tick all options under gender. The idea is that they can select any combination and get a list of results. At the minute I just need to get a count back for the combination selected.

By the SOUNDS of it, you are probably going to need dynamic SQL, where you build the query on the fly and then execute it. Also, to clarify, what I THINK you are asking is as follows. A CoreID is like a set of filters that some manager uses to get a count and/or details of specific users. For CoreID = 1, they are interested in
EITHER gender condition (1 or 2)
AND EITHER age condition (1 or 3)
AND just the one ethnicity
to possibly target products that might hit those demographics. So you would pre-query every record for CoreID = 1, then start building your query. You would want to order that pre-query by TYPE so that common items (the gender, age, and ethnicity categories) are grouped together.
Then, within your application code (the language is not indicated: C#, VB, Java, whatever), you would build the query so that criteria within the same category are ORed together inside parentheses, with a logical AND between different categories, such as
where
( Gender = 1
OR Gender = 2 )
AND ( Age = 1
OR Age = 3 )
AND ( Ethnicity = 1 )
If you are trying to write as a MySQL stored procedure, it would be a type of dynamic SQL query... either way, the WHERE clause needs to be constructed from the Core criteria someone is looking for.
You are correct that the last one, CoreID = 3, would be the easiest. It would be a simple
WHERE ( Gender = 1 )
Clarify the source language and I or others might be able to offer additional direction. If I am accurate, you should try to write your own first pass of the code, but I will shoot out some pseudo-code for you, something like
Get the records for one CoreID, ordered by the type of criteria.
Prepare a flag recording whether a parenthesis is currently open.
Prepare a variable holding the last "type" we were building for.
For each record:
    If this is a new type:
        If a parenthesis is open:
            add a closing parenthesis
            add a logical AND before the next group
        add an open parenthesis
        set the flag that a parenthesis is open
        remember this type as the last type
    Else:
        same type as the last one, so add a logical OR
    add the condition for this record (column = option)
    go to the next record and repeat
After the last record, close the open parenthesis; this is always needed, even for a single criterion.
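The pseudo-code above can be sketched in Python as follows. This is only a minimal sketch, assuming the criteria rows for one coreid have already been fetched as (type, option) pairs and that each type matches a column name in users. The function name `build_where`, the `ALLOWED_COLUMNS` whitelist, and the `%s` placeholder style (the one used by common Python MySQL drivers) are assumptions made for illustration; none of them come from the question.

```python
from itertools import groupby

# Assumption: these are the only columns of `users` that may be filtered on.
ALLOWED_COLUMNS = {"gender", "age", "ethnicity"}


def build_where(criteria):
    """Build a parameterized WHERE body from (type, option) rows of one coreid."""
    ordered = sorted(criteria)  # sorting puts rows of the same type next to each other
    groups, params = [], []
    for column, rows in groupby(ordered, key=lambda row: row[0]):
        if column not in ALLOWED_COLUMNS:
            # Column names cannot be bound as parameters, so whitelist them.
            raise ValueError(f"unexpected criteria type: {column}")
        options = [option for _, option in rows]
        # OR within one type, wrapped in a single pair of parentheses
        groups.append("(" + " OR ".join(f"{column} = %s" for _ in options) + ")")
        params.extend(options)
    if not groups:
        return "1 = 1", params  # no criteria: match every user
    # AND between different types
    return " AND ".join(groups), params


# Criteria stored for coreid 1 in the question
where, params = build_where(
    [("gender", 1), ("gender", 2), ("age", 1), ("age", 3), ("ethnicity", 1)]
)
sql = "SELECT COUNT(*) FROM users WHERE " + where
```

The option values are passed separately in `params` rather than concatenated into the string, so the driver can bind them when the statement is executed.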

Related

Storing csv in MySQL field – bad idea?

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It maybe is worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create quite an overhead when you want to find all items of a user (I would have to iterate through all elements returned by MySQL individually).
You don't want to store the value in the comma separated form.
Consider the case when you decide to join this column with some other table.
Consider you have,
x items
1 1, 2, 3
1 1, 4
2 1
and you want to find distinct values for each x i.e.:
x items
1 1, 2, 3, 4
2 1
or maybe you want to check whether it has 3 in it,
or maybe you want to convert them into separate rows:
x items
1 1
1 2
1 3
1 1
1 4
2 1
It will be a HUGE PAIN.
At the very least, apply first normal form: have a separate row for each value.
Now, say you originally had this as your table:
x item
1 1
1 2
1 3
1 1
1 4
2 1
You can easily convert it into csv values:
select x, group_concat(item order by item) items
from t
group by x
If you want to search if x = 1 has item 3. Easy.
select * from t where x = 1 and item = 3
which in earlier case would use horrible find_in_set:
select * from t where x = 1 and find_in_set(3, items);
If you think you can use LIKE with CSV values to search, then first, LIKE '%x%' can't use indexes. Second, it will produce wrong results.
Say you want to check whether item ab is present and you search with '%ab%': it will also return rows containing abc, abcd, abcde, and so on.
If you have many users and items, then I'd suggest creating a separate table users with a PK userid, another table items with a PK itemid, and lastly a mapping table user_item with userid and itemid columns.
If you know you'll only need to store and retrieve these values, and never perform any operation on them such as a join, search, distinct, or conversion to separate rows, then maybe, just maybe, you can (I still wouldn't).
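To make the mapping-table suggestion concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names follow the suggestion above; the sample names, items, and the helper functions `items_for_user` and `users_with_item` are invented for illustration. It also shows that fetching all items of a user is a single join, not one query per item.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemid INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE user_item (
        userid INTEGER REFERENCES users(userid),
        itemid INTEGER REFERENCES items(itemid),
        PRIMARY KEY (userid, itemid)   -- one row per user/item pair
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO items VALUES (3, 'lamp'), (32, 'desk'), (56, 'chair'), (98, 'rug');
    INSERT INTO user_item VALUES (1, 32), (1, 3), (1, 98), (1, 56), (2, 3);
""")


def items_for_user(userid):
    """All items of one user, fetched with a single join."""
    rows = conn.execute(
        "SELECT i.itemid, i.item_name FROM user_item ui "
        "JOIN items i ON i.itemid = ui.itemid "
        "WHERE ui.userid = ? ORDER BY i.itemid",
        (userid,),
    )
    return rows.fetchall()


def users_with_item(itemid):
    """Exact-match lookup; no LIKE or find_in_set needed."""
    rows = conn.execute(
        "SELECT userid FROM user_item WHERE itemid = ? ORDER BY userid", (itemid,)
    )
    return [row[0] for row in rows]
```

Because itemid is compared as a whole value, searching for item 3 cannot accidentally match 32, which is the failure mode of the LIKE approach on a CSV column.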
Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

SQL Report Builder for Survey Data

I have a large dataset with People's names and their Rating from 1 to 5.
Then I made a query that summarizes this data for PersonA:
Rating Count
------- ------
1 4
2 6
3 1
4 0
5 2
I just need to know how to show this on my report.
I have made a cell for each rating and need to put in an expression that says "If Rating=1, show count for rating 1".
I tried using =IIf(Fields!Rating.Value = 1, Fields!Count.Value, 0) but this didn't work.
I'm not sure why you would need an expression like that. Based on your description of the dataset, it sounds like you already have two columns of data for rating and count, so you could use a tablix (table) with the columns:
Rating Count
which would list all the rating values and associated count values, similar to the example result in your question.

How to request lists that contain certain items in MySQL

In the application I am developing, the user has to set parameters to define the end product he will get.
My tables look like this :
Categories
-------------
Id Name
1 Material
2 Color
3 Shape
Parameters
-------------
Id CategoryId Name
1 1 Wood
2 1 Plastic
3 1 Metal
4 2 Red
5 2 Green
6 2 Blue
7 3 Round
8 3 Square
9 3 Triangle
Combinations
-------------
Id
1
2
...
ParametersCombinations
----------------------
CombinationId ParameterId
1 1
1 4
1 7
2 1
2 5
2 7
Now only some combinations of parameters are available to the user. In my example, he could get a red round wooden thingy or a green round wooden thingy but not a blue one because I can't produce it.
Let's say the user selected wood and round parameters. How do I make a request to know that there's only red and green available so I can disable the blue option for him ?
Or is there some better way to model my database ?
Let us assume you provide the selected parameters id in the following format
// I call this a **parameterList** for convenience sake.
(1,7) // this is parameter id 1 and id 7.
I am also assuming you are using some scripting language to help you with your app. Like ruby or php.
I am also assuming you want to avoid putting as much logic into your stored procedure or MySQL queries as much as possible.
Another assumption is that you are using one of the Rapid Application MVC Frameworks like Rails, Symfony or CakePHP.
Your logic would be:
Find all the combinations that contain ALL the parameters in your parameterList and put these found combinations in a list called relevantCombinations
Find all the parameters_combinations that contain at least 1 of the combinations in the list relevantCombinations. Retrieve only the unique parameter values.
The first two steps can be solved using simple Model::find methods and a for loop in the frameworks I described above.
If you are not using frameworks, it is also cool to use the scripting language raw.
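As a rough illustration of the two steps in application code, here is a plain Python sketch. It assumes the rows of ParametersCombinations have already been loaded as (CombinationId, ParameterId) pairs; the function name `available_parameters` is hypothetical, and the sample data is the table from the question.

```python
def available_parameters(pairs, selected):
    """pairs: iterable of (combination_id, parameter_id); selected: parameter ids."""
    selected = set(selected)

    # Index the parameters of each combination.
    by_combination = {}
    for combination_id, parameter_id in pairs:
        by_combination.setdefault(combination_id, set()).add(parameter_id)

    # Step 1: combinations that contain ALL of the selected parameters.
    relevant = {
        cid for cid, params in by_combination.items() if selected <= params
    }

    # Step 2: unique parameters used by those combinations,
    # leaving out the ones the user has already selected.
    remaining = set()
    for cid in relevant:
        remaining |= by_combination[cid]
    return relevant, remaining - selected


# ParametersCombinations rows from the question
PAIRS = [(1, 1), (1, 4), (1, 7), (2, 1), (2, 5), (2, 7)]

# The user selected Wood (1) and Round (7)
relevant, remaining = available_parameters(PAIRS, {1, 7})
```

With the question's data, `relevant` contains combinations 1 and 2, and `remaining` contains parameters 4 (Red) and 5 (Green), so Blue (6) is the option to disable.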
If you require them as MySQL queries, here are some possible queries. Be aware that these are not necessarily the best queries.
First one is
SELECT * FROM (
SELECT `PossibleList`.`CombinationId`, COUNT(`PossibleList`.`CombinationId`) as number
FROM (
SELECT `CombinationId` FROM `ParametersCombinations`
WHERE `ParameterId` IN (1, 7)
) `PossibleList` GROUP BY `PossibleList`.`CombinationId`
) `PossibleGroupedList` WHERE `number` = 2;
-- note that the (1, 7) and the number 2 need to be supplied by your app.
-- 2 refers to the number of parameters supplied.
-- In this case you supplied 1 and 7 therefore 2.
To confirm, look at http://sqlfiddle.com/#!2/16831/3.
Note how I purposely have a Combination 3 which only has the Parameter 1 but not 7. Therefore the query did not give you back 3, but only 1 and 2. Feel free to tweak the asterisk * in the first line.
Second one is
SELECT DISTINCT(`ParameterID`)
FROM `ParametersCombinations`
WHERE `CombinationId` IN (1, 2);
-- note that (1, 2) is the result we expect from the first step.
-- the one we call relevantCombinations
To confirm, look at http://sqlfiddle.com/#!2/16831/5
I do not recommend being a masochist and attempt to get your answer in a single query.
I also do NOT recommend using the MySQL queries I have supplied. It is less masochistic. But sufficiently masochistic for me NOT to recommend this way.
Since you did not indicate any tag other than mysql, I suspect that you are stronger with mysql. Hence my answer contains mysql.
My strongest suggestion would be my first. Make full use of established frameworks and put your logic in the business logic layer. Not in the data layer. Even if you don't use frameworks and just use raw php and ruby, that is still a better place for you to place your logic in than MySQL.
I saw that T gave an answer in a single MySQL query but I can tell you that (s)he considers only 1 parameter.
See this part:
WHERE ParameterId = 7 -- 7 is the selected parameter
You can adapt his/her answer with some trickery using a forloop and appending OR clauses.
Again, I do NOT recommend that in the big picture of building an app.
I have also tested his/her answer with http://sqlfiddle.com/#!2/2eda4/2. There may be 1 or 2 small bugs.
In summary, my recommendations in descending order of strength:
Use a framework like Rails or CakePHP and the pseudocode step 1 and 2 and as many find as you need. (STRONGEST)
Use raw scripting language and the pseudocode step 1 and 2 and as many simple queries as you need.
Use the raw MySQL queries I created. (LEAST STRONG)
P.S. I left out the part in my queries as to how to get the name of the Parameters. But given that you can get the ParameterIDs from my answer, I think that is trivial. I have also left out how you may need to remove the already selected parameters (1, 7). Again, that should be trivial to you.
Try the following
SELECT p.*, pc.CombinationId
FROM Parameters p
-- get the parameter combinations for all the parameters
JOIN ParametersCombinations pc
ON pc.ParameterId = p.Id
-- filter the parameter combinations to only combinations that include the selected parameter
JOIN (
SELECT CombinationId
FROM ParametersCombinations
WHERE ParameterId = 7 -- 7 is the selected parameter
) f ON f.CombinationId = pc.CombinationId
Or removing the already selected parameters
SELECT p.*, pc.CombinationId
FROM Parameters p
JOIN ParametersCombinations pc
ON pc.ParameterId = p.Id
JOIN (
SELECT CombinationId
FROM ParametersCombinations
WHERE ParameterId IN (7, 1)
) f ON f.CombinationId = pc.CombinationId
WHERE ParameterId NOT IN (7, 1)

Populating with '0' when Data in SSRS Does not exist

I'm trying to create a report in SSRS with a matrix that has gender as the column headings and specifically defined age groups as the rows. The report is filtered by date (i.e., the records being displayed are filtered by the modifedAt value). My problem is that I want all of the age group categories to be displayed, even if the dataset does not return any data for that row.
So, for example, if I set the date to one where there are no db rows containing Age5-16 children, I still want to display the category name, with the cells in that row displaying '0'. Instead, the report drops the whole row because, obviously, the query returns no data.
Is the solution to have a separate dataset that brings back the entire list of categories and then somehow fit them together? I'm stuck here so any help is appreciated!
I can think of a few ways to do this:
DataSet level
Instead of just returning the relevant data in the underlying data in the DataSet, include all the categories you want to display in all cases.
e.g. For a database query it might be the difference between an inner and left join, i.e. going from something like:
select *
from AgeGroup
inner join MyData on ...
to:
select *
from AgeGroup
left join MyData on ...
So the report always has all the age groups to display. Where there are NULL values, just display 0.
I think this is the best option if you have control over the DataSet - you won't have to update your report at all, with luck the actual DataSet changes should be minimal, there is still only one DataSet call, and it's by far the simplest to maintain.
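One detail matters when switching to a left join: a filter on the data table (such as the report date) has to go in the ON clause, not in WHERE, or the unmatched age groups are filtered out again. The sketch below shows this using Python's built-in sqlite3 module as a stand-in for the report's database. The column names `name`, `age_group` and `modified_on`, the sample rows, and the helper `counts_for_date` are all assumptions for illustration; only the AgeGroup and MyData table names come from the example above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AgeGroup (name TEXT PRIMARY KEY);
    CREATE TABLE MyData (
        id INTEGER PRIMARY KEY,
        age_group TEXT,
        gender TEXT,
        modified_on TEXT
    );
    INSERT INTO AgeGroup VALUES ('0-4'), ('5-16'), ('17+');
    INSERT INTO MyData (age_group, gender, modified_on) VALUES
        ('0-4',  'F', '2011-04-29'),
        ('17+',  'M', '2011-04-29'),
        ('5-16', 'F', '2011-05-01');
""")


def counts_for_date(day):
    """One row per age group, with 0 where no data exists for that day."""
    return conn.execute(
        """
        SELECT g.name, COUNT(d.id) AS n      -- COUNT skips the NULLs of unmatched rows
        FROM AgeGroup g
        LEFT JOIN MyData d
               ON d.age_group = g.name
              AND d.modified_on = ?          -- date filter lives in the ON clause
        GROUP BY g.name
        ORDER BY g.name
        """,
        (day,),
    ).fetchall()
```

Calling `counts_for_date('2011-04-29')` still returns a row for '5-16', with a count of 0, even though no data exists for that group on that date.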
Hard code groups into the report
Here you include a table header row for each group you want to display, so these are always displayed in all cases.
Each group row then gets a conditional expression, tailored to that group, to display the values, e.g.:
=Sum(IIf(Fields!AgeGroup.Value = "5-16", Fields!Amount.Value, Nothing))
This is not too flexible and will need updates as you change groups, and doesn't have as many options for layout. There is still only one DataSet call, so that is a plus.
Subreports
You can have a parent DataSet that displays one row for each age group, then embed a subreport in each row that displays the data you want for that row.
This allows you flexibility in layout but it will add complexity to the report(s) and will mean that you make a lot of DataSet calls that could be avoided with other options.
I know this is old, but I wanted to elaborate on Ian's section 1 above using joins at the dataset level. (His answer was super helpful to me for a report I'm working on.)
per op:
Is the solution to have a separate dataset that brings back the entire list of categories and then somehow fit them together?
That is how I've handled it successfully, but you can do so without actually creating a separate dataset by using common table expressions (or temp tables, of course).
For these example tables:
AGE_Table
ID Group Group_Desc Toys
1 A 00-10 Teddy Bear
2 B 11-20 Video Game
3 C 21-30 Sports Car
4 D 31-40 Mansion
5 E 41-50 Jewelry
People_Table (filtered for whatever date)
ID Name Age Gender Age_Group
1 Ariel 07 F A
2 Brandon 23 M C
3 Chelsea 27 F C
4 Derek 06 M A
You want to see 2 results for the 00-10 row, 2 for the 21-30 row, and then still see rows for the other age groups even if there aren't any results.
We want to create a dataset with all the different age groupings and then join on it. Behold a solution using common table expressions:
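with CTE_Age AS
-- AGE_Table stores the group code in its [Group] column; alias it to match People_Table
(SELECT DISTINCT [Group] AS Age_Group FROM AGE_Table)
SELECT ID, Name, Age, Gender, CTE_Age.Age_Group FROM People_Table
RIGHT JOIN CTE_Age ON
People_Table.Age_Group = CTE_Age.Age_Group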
This will return:
ID Name Age Gender Age_Group
1 Ariel 7 F A
4 Derek 6 M A
NULL NULL NULL NULL B
2 Brandon 23 M C
3 Chelsea 27 F C
NULL NULL NULL NULL D
NULL NULL NULL NULL E
Once you have that in your dataset, you can change NULL values to 0 on the report builder side -- I think in 2008R2 the default is just blank.

Count a specific value from multiple columns and group by values in another column... in mysql

Hey. I have 160 columns that are filled with data when a user fills out a report form and submits it. A few of these sets of columns contain similar data, but there need to be multiple instances of this data per record set, as it may be different per instance in the report.
For example, an employee opens a case by a certain type at one point in the day, then at another point in the day they open another case of a different type. I want to create totals per user based on the values in these columns. There is one column set that I want to target right now, case type. I would like to be able to see all instances of the value "TSTO" in columns CT1, CT2, CT3... through CT20. Then have that sorted by the employee ID number, which is just one column in the table.
Any ideas? I am struggling with this one.
So far I have SELECT CT1, CT2, CT3, CT4, CT5, CT6, CT7, CT8, CT9, CT10, CT11, CT12, CT13, CT14, CT15, CT16, CT17, CT18, CT19, CT20 FROM REPORTS GROUP BY OFFICER
This will display the values of all the case type entries in a record set but I need to count them, I tried to use,
SELECT CT1, CT2, CT3, CT4, CT5, CT6, CT7, CT8, CT9, CT10, CT11, CT12, CT13, CT14, CT15, CT16, CT17, CT18, CT19, CT20 FROM REPORTS COUNT(TSTO) GROUP BY OFFICER
but it just spits out an error. I am fairly new to MySQL databases and PHP. I feel I have a good grasp, but querying the database and the syntax involved are a tad confusing and/or overwhelming right now. Just gotta learn the language. I will keep looking; I have found some similar things on here, but I don't (completely) understand what I am looking at, and I would like to shy away from using code that "works" but that I don't fully understand.
Thank you very much :)
Edit -
So this database is an activity report server for the employees' daily work. The person will often open cases during the day. These cases vary in type, and the different types are designated by a four-letter convention. So your different case types could be TSTO, DOME, ASBA, etc. So the user will fill out their form throughout the day, then submit it to the database. That's all fine :) Now I am trying to build a page which will query the database, on user request, for statistics of a user's activities. So right now I am trying to generate statistics. Specifically, I want to be able to generate the statistic of, in human terms, "HOW MANY OCCURRENCES OF "USER INPUTTED CASE TYPE" ARE THERE FOR EMPLOYEEIDXXX"
So when a user submits a form they will type in this four-letter case type up to 20 times in one form; there are 20 fields for this case type entry, thus there are 20 columns. So these 20 columns for case type will be in one record set, and one record set is generated per report. Another column that is generated is the employeeid column, which basically identifies who generated the record set through their form.
So I would like to be able to query all 20 columns of case type, across all record sets, for a defined type of case (TSTO, DOME, ASBA, etc etc) and then group that to corresponding user(s).
So the output would look something like,
316 TSTO's for employeeid108
I hope this helps to clear it up a bit. Again I am fairly fresh to all of this so I am not the best with the vernacular and best practices etc etc...
Thanks so much :)
Edit 2 -
So to further elaborate on what I have going on, I have an HTML form that has 164 fields. Each of these fields ultimately puts a value into a column in a single record set in my DB, each submission. I couldn't post images or more than two URLs so I will try to explain it the best I can without screenshots.
So what happens is this information gets into the DB. Then there is the querying. I have a search page which uses an HTML form to select the type of information to be searched for. It then displays a synopsis of each report that matches the query. The user then enters the REPORT ID # of the report they want to view in full into another small form (an input field with a submit button), which brings them to a page with the full report displayed when they click submit.
So right now I am trying to do totals, and I am realizing my DB will need some work and tweaking to make it easier to write queries against it for the different information needed. I've gleaned some good information so far and will continue to try to provide concise information about my setup as best I can.
Thanks.
Edit 3 -
Maybe you can go to my photobucket and check them out, should let me do one link, there is five screenshots, you can kind of see better what I have happening there.
http://s1082.photobucket.com/albums/j376/hughessa
:)
The query you are looking for would be very long and complicated for your current db schema.
Every table like (some_id, column1, column2, column3, column4... ) where columns store the same type of data can be also represented by a table (some_id, column_number, column_value ) where instead of 1 row with values for 20 columns you have 20 rows.
So your table should rather look like:
officer ct_number ct_value
1 CT1 TSTO
1 CT2 DOME
1 CT3 TSTO
1 CT4 ASBA
(...)
2 CT1 DOME
2 CT2 TSTO
For a table like this, if you wanted to find how many occurrences of each ct_value there are for officer 1, you would use a simple query:
SELECT officer, ct_value, count(ct_value) AS ct_count
FROM reports WHERE officer=1 GROUP BY ct_value
giving results
officer ct_value ct_count
1 TSTO 2
1 DOME 1
1 ASBA 1
If you wanted to find out how many TSTO's are there for different officers you would use:
SELECT officer, ct_value, count( officer ) as ct_count FROM reports
WHERE ct_value='TSTO' GROUP BY officer
giving results
officer ct_value ct_count
1 TSTO 2
2 TSTO 1
Also any type of query for your old schema can be easily converted to new schema.
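To illustrate the conversion from the wide layout (CT1 to CT20 in one row) to the long layout described above, here is a small Python sketch. It assumes each existing report row is available as a dict keyed by column name, as a dict-style cursor would return it; the helper names `to_long` and `count_by_officer` are hypothetical, and the sample rows mirror the example table above.

```python
from collections import Counter

CT_COLUMNS = [f"CT{i}" for i in range(1, 21)]  # CT1 .. CT20


def to_long(wide_rows):
    """Yield (officer, ct_number, ct_value) for every filled-in case-type field."""
    for row in wide_rows:
        for column in CT_COLUMNS:
            value = row.get(column)
            if value:  # skip NULL or empty slots
                yield (row["OFFICER"], column, value)


def count_by_officer(wide_rows, case_type):
    """How many times each officer entered the given case type."""
    return Counter(
        officer
        for officer, _, value in to_long(wide_rows)
        if value == case_type
    )


# Two wide report rows matching the example table above
reports = [
    {"OFFICER": 1, "CT1": "TSTO", "CT2": "DOME", "CT3": "TSTO", "CT4": "ASBA"},
    {"OFFICER": 2, "CT1": "DOME", "CT2": "TSTO"},
]
counts = count_by_officer(reports, "TSTO")
```

The tuples produced by `to_long` are exactly the rows you would insert into the new (officer, ct_number, ct_value) table when migrating the existing data.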
However, if you need to store additional information about every particular report, I suggest having two tables:
Submissions (submission_id: primary key, auto-increment)
submission_id report_id ct_number ct_value
------------------------------------------------
1 1 CT1 TSTO
2 1 CT2 DOME
3 1 CT3 TSTO
4 1 CT4 ASBA
5 2 CT1 DOME
6 2 CT2 TSTO
with report_id pointing to a record in another table with as many columns as you need for additional data:
Reports (report_id: primary key, auto-increment)
report_id officer date some_other_data
--------------------------------------------------------------------
1 1 2011-04-29 11:28:15 Everything went ok
2 2 2011-04-29 14:01:00 There were troubles
Example:
How many TSTO's are there for different officers:
SELECT r.officer, s.ct_value, count( officer ) as ct_count
FROM submissions s JOIN reports r ON s.report_id = r.report_id
WHERE s.ct_value='TSTO'
GROUP BY r.officer