Aid with SQL query for my JAVA application - mysql

I'm not much of an SQL guy so forgive me if something similar has been asked before. I'm not even sure what I would need to search for in order to learn this. Since I only need to do something like this once I thought I could justify asking.
I'm writing one of my first android applications that needs to talk to an online database, and have successfully written a couple of SQL queries that work well with my application, but this one is slightly complicated for my basic knowledge.
Below I have provided a sample of what I need in what I feel is understandable by anyone with at least a basic knowledge of SQL. I am wondering if any kind soul would be able to help scratch up a query or give me a little insight for what I would need to do. Thanks in advance!
Pseudo Sample:
SELECT *
FROM events
WHERE user_has_event.user_user_id = user.user_id AND user_has_event.attendance = 1 OR 2
JOIN attendance
Here is a basic visual of my tables (Without user table):
Event Table
-----------------------------------
|event_id|event_name|event_society|
-----------------------------------
|        |          |             |

User_has_event Table
---------------------------------------
|user_user_id|event_event_id|attendance|
---------------------------------------
|            |              |          |
Here is my desired outcome:
Outcome Table
----------------------------------------------
|event_id|event_name|event_society|attendance|
----------------------------------------------
| | | | |

Since your knowledge of SQL is basic, I'll expand a bit (well, as it turns out, rather a lot) on Andy's answer. First, the t1 and t2 aliases are not required; they are a convenience. You can refer to a table directly by its full name, and you don't have to qualify a field at all if its name is unique among the tables in the query. You could do this:
SELECT
events.event_id,
events.event_name,
events.event_society,
user_has_event.attendance
FROM
events
INNER JOIN user_has_event ON events.event_id = user_has_event.event_event_id
As you can see, that is rather long-winded and tedious. So you can, when you first reference a table, immediately follow it with an abbreviation as Andy has done, and indeed as it is generally considered best practice to do. Now, you could also do this:
SELECT
event_id,
event_name,
event_society,
attendance
FROM
events
INNER JOIN user_has_event ON event_id = event_event_id
You can get away with this because all of the field names are unique in the tables accessed by your SELECT statement. Since this is often not true, it's not a good idea, since it's too easy to miss an ambiguous reference. Andy's is the best way to do it. Now, you might have gone out of your way to use different field names because you didn't know that you could reference the table using Table.Field syntax. It's often clearer to use the same field name; different people feel differently about this. I generally just use "ID" for the primary key in each table. That works because you can resolve ambiguities by using Table.Field to refer to a field.
This leads to the next thing you will find it helpful to know, which is that you can assign whatever field name you want to the output with the AS keyword. Suppose I rename your fields thus:
Event
ID
Name
Society
UserEvent
ID
EventID
Attendance
Now, have a look at this:
SELECT
e.ID AS 'Event ID',
e.Name AS 'Event Name',
e.Society AS 'Event Society',
ue.Attendance
FROM
Event e
INNER JOIN UserEvent ue ON e.ID = ue.EventID
Now you have decoupled the name of the selected field from the name of the field in the outcome, which should save you headaches down the line. An important principle is that the way that you store the data and the way that you format data output should be loosely coupled. You don't want considerations of how you want your output data to look to dictate how you should name your fields, so you need to know this stuff.
Now, let's pretend that you also have a User table (you probably do). Let's say it looks like this (it probably doesn't):
User
ID
FirstName
LastName
OtherStuff
Now, we'll modify the UserEvent table thus, to include a foreign key to the User table:
UserEvent
ID
EventID
UserID
Attendance
Now, have a look at this:
SELECT
e.Name AS 'Event Name',
e.Society AS 'Event Society',
CONCAT(u.LastName, ', ', u.FirstName) AS 'User Name', -- MySQL concatenates strings with CONCAT(); + is numeric addition
ue.Attendance
FROM
Event e
JOIN UserEvent ue ON e.ID = ue.EventID
JOIN User u ON u.ID = ue.UserID
This should give you the basics, except for the WHERE clause, the basics of which you can probably pick up on your own (feel free to ask questions about the WHERE clause as well).
One side note: a JOIN is the same as an INNER JOIN, the most common type of join, representing the intersection of two sets. There are also LEFT and RIGHT joins (both are OUTER joins, and the word OUTER is optional) and, in some databases though not MySQL, FULL OUTER joins. I generally just say JOIN rather than INNER JOIN; again, different people feel differently about this. Consistency is the most important principle here.

You can add in the USER table similarly, but for the basic output you requested, see the following:
SELECT
t1.event_id,
t1.event_name,
t1.event_society,
t2.attendance
FROM
events t1
INNER JOIN user_has_event t2 ON t1.event_id = t2.event_event_id
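The question also asked for a single user's events with attendance 1 or 2, which the join above does not yet filter. A minimal sketch of that filter follows. It uses Python's built-in sqlite3 module as a stand-in for the MySQL server so that it can be run anywhere; the sample rows, the user id 7 and the helper name events_for_user are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_name TEXT, event_society TEXT);
CREATE TABLE user_has_event (user_user_id INTEGER, event_event_id INTEGER, attendance INTEGER);
-- Hypothetical sample data.
INSERT INTO events VALUES (1, 'Quiz night', 'Chess'), (2, 'Hike', 'Outdoors'), (3, 'Gala', 'Drama');
INSERT INTO user_has_event VALUES (7, 1, 1), (7, 2, 0), (7, 3, 2), (8, 1, 2);
""")

# Same join as above, plus a WHERE clause for one user and attendance 1 or 2.
QUERY = """
SELECT t1.event_id, t1.event_name, t1.event_society, t2.attendance
FROM events t1
INNER JOIN user_has_event t2 ON t1.event_id = t2.event_event_id
WHERE t2.user_user_id = ?
  AND t2.attendance IN (1, 2)
ORDER BY t1.event_id
"""

def events_for_user(user_id):
    """Return (event_id, event_name, event_society, attendance) rows for one user."""
    return conn.execute(QUERY, (user_id,)).fetchall()

rows = events_for_user(7)
print(rows)
```

Note that the pseudo-query's "attendance = 1 OR 2" would not do what it reads like: it is parsed as "(attendance = 1) OR 2", and the constant 2 counts as true, so every row passes. IN (1, 2) or "attendance = 1 OR attendance = 2" expresses the intended filter.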

Related

How to retrieve values from a normalised MySQL 5.7 structure that match certain criteria

I am trying to normalise my MySQL 5.7 data schema and struggle with replacing the SQL queries:
At the moment there is one table containing all attributes of each article:
article_id | title | ref_id | dial_c_id
The task is to retrieve all articles which match two given attributes (ref_id and dial_c_id) and also retrieve all their other attributes.
With just one table, this is straightforward:
SELECT *
FROM test.articles_test
WHERE
ref_id = '127712'
AND dial_c_id = 51
Now in my effort to normalise, I have created a second table, which stores the attributes of each article and removed the ones in table articles:
table 1:
article_id | title
table 2:
article_id | attr_group | attribute
1 | ref_id | 51
1 | dial_c_id | 33
1 | another | 5
2 | ..
I would like to retrieve all article details, including ALL attributes, for the articles which match ref_id and dial_c_id with this two-table schema.
Somehow like this:
SELECT
a.article_id,
a.title,
attr.*
FROM test.articles_test a
INNER JOIN attributes attr ON a.article_id = attr.article_id
AND ref_id = '127712'
AND dial_c_id = 51
How can this be done?

You have used an Entity-Attribute-Value table to record your attributes.
This is the opposite of normalization.
Name the rule of normalization that guided you to put different attributes into the same column. You can't, because this is not a normalization practice.
To accomplish your query with your current EAV design, you need to pivot the result so you get something as if you had your original table.
SELECT * FROM (
SELECT
a.article_id,
a.title,
MAX(CASE attr_group WHEN 'ref_id' THEN attribute END) AS ref_id,
MAX(CASE attr_group WHEN 'dial_c_id' THEN attribute END) AS dial_c_id
-- ...others...
FROM test.articles_test a
INNER JOIN attributes attr ON a.article_id = attr.article_id
GROUP BY a.article_id, a.title) AS pivot
WHERE pivot.ref_id = '127712'
AND pivot.dial_c_id = 51
While the above query can produce the result you want, the performance will be terrible. It has to create a temp table for the subquery, containing all data from both tables, then apply the WHERE clause against the temp table.
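To see the pivot at work on a small scale, here is a sketch that runs the same query shape against an in-memory SQLite database (a stand-in for MySQL 5.7, so performance characteristics will differ). The sample articles and attribute values are invented, and the 'another' column is just the third attribute from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles_test (article_id INTEGER PRIMARY KEY, title TEXT);
-- EAV table: every attribute value ends up in the same (text) column.
CREATE TABLE attributes (article_id INTEGER, attr_group TEXT, attribute TEXT);
-- Hypothetical sample data.
INSERT INTO articles_test VALUES (1, 'First'), (2, 'Second');
INSERT INTO attributes VALUES
  (1, 'ref_id', '127712'), (1, 'dial_c_id', '51'), (1, 'another', '5'),
  (2, 'ref_id', '127712'), (2, 'dial_c_id', '33');
""")

# One MAX(CASE ...) per attribute turns the attribute rows back into columns.
PIVOT = """
SELECT * FROM (
  SELECT a.article_id, a.title,
         MAX(CASE attr_group WHEN 'ref_id' THEN attribute END) AS ref_id,
         MAX(CASE attr_group WHEN 'dial_c_id' THEN attribute END) AS dial_c_id,
         MAX(CASE attr_group WHEN 'another' THEN attribute END) AS another
  FROM articles_test a
  INNER JOIN attributes attr ON a.article_id = attr.article_id
  GROUP BY a.article_id, a.title
) AS pivot
WHERE pivot.ref_id = ? AND pivot.dial_c_id = ?
"""

# Values are passed as strings because the EAV column stores text.
matches = conn.execute(PIVOT, ("127712", "51")).fetchall()
print(matches)
```

Every new attribute needs another MAX(CASE ...) line in the query, which is part of why the EAV layout is awkward to work with.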
You're really better off with each attribute in its own column in your original table.
I understand that you are trying to allow for many attributes in the future. This is a common problem.
See my answer to "How to design a product table for many kinds of product where each product has many parameters".
But you shouldn't call it "normalised," because it isn't. It's not even denormalised. It's derelational.
You can't just use words to describe anything you want — especially not the opposite of what the word means. I can't let the air out of my bicycle tire and say "I'm inflating it."
You commented that you're trying to make your database "scalable." You also misunderstand what the word "scalable" means. By using EAV, you're creating a structure where the queries needed are difficult to write and inefficient to execute, and the data takes 10x space. It's the opposite of scalable.
What you mean is that you're trying to create a system that is extensible. This is complex to implement in SQL, but I describe several solutions in the other Stack Overflow answer to which I linked. You might also like my presentation Extensible Data Modeling with MySQL.

Learning to use Advanced features of SQL

I'm at the point where I need to learn to use more advanced features of SQL if I would like to advance my career. I confess I don't know what a join is or how to use one. I start reading queries with joins and my brain just turns to mush, but I have encountered a situation where I might benefit from using a join, and I don't know if what I want to do is even possible.
Consider the following tables:
user
id | user_name | location_id
===============================
1 | r3wt | 316
location
id | city | state
=========================
316 | Clarksville | Arkansas
Now when I select the user, I get back an array like
[
id => 1,
user_name => r3wt ,
location_id =>316
]
What i would like to get back is:
[
id => 1,
user_name => r3wt ,
location_id =>[
id=>316,
city=>Clarksville,
state=>Arkansas
]
]
I'm wondering if this is possible, and if so, how I might alter my quite bland select query to make it happen. Thanks, and if you need more details or want me to make what will be a pathetic attempt at figuring it out myself, I am willing to embarrass myself to learn. Thank you.
My pathetic little attempt, which of course doesn't work:
SELECT id, user_name, location AS location_id
FROM user JOIN location ON location.id = user.location_id
Keep in mind I have absolutely no idea what I'm doing. I don't even understand what joins are or how they work, but I understand what I would like to be able to do with SQL.

You should:
SELECT u.id, u.user_name, u.location_id, l.city, l.state
FROM user u -- identify columns from user table with 'u.'
LEFT OUTER JOIN location l -- identify columns from location with 'l.'
ON u.location_id = l.id -- the join predicate: rows match when the two ids match
WHERE l.state = 'Arkansas' -- other selection criteria (a filter on l.* here also drops users with no location)
This will give you almost what you wanted but as a single flat array:-
[
u.id => 1,
u.user_name => r3wt ,
u.location_id => 316,
l.city=>Clarksville,
l.state=>Arkansas
]
SQL results are always a two dimensional table. This is fundamental to the way SQL works! The "LEFT OUTER" means you want null values for city and state in the result if no location is present.
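Since the result is always flat, the nesting asked for in the question has to happen in application code after the rows come back. A rough sketch of that step, in Python with an in-memory SQLite database standing in for MySQL: the second user 'nomad' and the helper name nest are invented for the example, and the same reshaping can be done in any client language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
CREATE TABLE location (id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE user (id INTEGER PRIMARY KEY, user_name TEXT, location_id INTEGER);
INSERT INTO location VALUES (316, 'Clarksville', 'Arkansas');
-- 'nomad' is a hypothetical user with no location, to show the NULL case.
INSERT INTO user VALUES (1, 'r3wt', 316), (2, 'nomad', NULL);
""")

FLAT = """
SELECT u.id, u.user_name, u.location_id, l.city, l.state
FROM user u
LEFT OUTER JOIN location l ON u.location_id = l.id
ORDER BY u.id
"""

def nest(row):
    """Reshape one flat joined row into the nested structure from the question."""
    location = None
    if row["location_id"] is not None:
        location = {"id": row["location_id"], "city": row["city"], "state": row["state"]}
    return {"id": row["id"], "user_name": row["user_name"], "location_id": location}

users = [nest(r) for r in conn.execute(FLAT)]
print(users)
```

Because of the LEFT OUTER JOIN, the user without a location still appears, with location_id set to None instead of a nested record.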
Just google "MYSQL TUTORIAL" and you will find dozens of sites offering on-line guides for free. Cannot recommend a particular one as I learnt all this stuff in pre-history.

Joins are used to combine 2 or more tables to get results from them all. There are several types of Joins and you can read about them here: http://www.w3schools.com/sql/sql_join.asp
It is important to remember that MySQL returns rows. If, for the sake of discussion, a user could have several locations, you would either not use a join or, if you did, you would get several rows with the same user id, one for each of that user's locations.
In your example, where each user has a single location row, you can run this query:
SELECT *
FROM user
INNER JOIN location
ON user.location_id=location.id;
This query will return you rows for all users that have a location record and you will get all the fields of both tables (as we used *).

Specific MYSQL view with null values

I'm having trouble creating a view for one of my MYSQL assignments. I understand how to create a view technically, as in, the commands to do so. (I have already done a few other views for this assignment.) My problem is with how to design this particular view: I don't know how to do it with the knowledge I have and the way I designed my tables.
So, I have 2 relevant tables (there are 2 others, but I don't think they are needed for this problem): Attendance and Scholar. I need to create a view where all scholars are listed, as well as the date on which they were an invited speaker. However, if they were never an Invited Speaker, the date should be shown as a null value. So I need to select FirstName and LastName from Scholar and ADate from Attendance. Attendance has the column AttendanceType, which can be either Invited Speaker or Chairman. Attendance also has the foreign key LName, relating to LastName, and ADate, obviously. I can't conceptually think of how to do this. I thought that using a join, which I'm not that experienced with, would be the right choice, but it didn't work...
Here's what I attempted
CREATE VIEW InvitedScholars
AS SELECT FirstName,LastName,ADate
FROM Scholar LEFT JOIN Attendance ON AttendanceType='Invited Speaker'
WHERE Lname=LastName;
This only gave me Invited Speakers, not all Scholars... I don't know how to progress... any advice would be appreciated.

You need to do your left join on the Last Name (assuming this is your key on both tables). See the SQL below:
CREATE VIEW InvitedScholars
AS SELECT FirstName,LastName,ADate
FROM Scholar LEFT JOIN Attendance ON Scholar.LastName = Attendance.LName
AND Attendance.AttendanceType = 'Invited Speaker';
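Whether the AttendanceType test sits in the ON clause or in a WHERE clause changes the result of a LEFT JOIN, and that is easy to check on a toy dataset. The sketch below uses Python's sqlite3 module with an in-memory database in place of MySQL; the three scholars and the dates are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scholar (FirstName TEXT, LastName TEXT);
CREATE TABLE Attendance (LName TEXT, AttendanceType TEXT, ADate TEXT);
-- Hypothetical sample data: one invited speaker, one chairman, one who never attended.
INSERT INTO Scholar VALUES ('Ada', 'Lovelace'), ('Alan', 'Turing'), ('Grace', 'Hopper');
INSERT INTO Attendance VALUES
  ('Lovelace', 'Invited Speaker', '2012-03-01'),
  ('Turing', 'Chairman', '2012-03-02');
""")

# Filter inside ON: unmatched scholars are kept, with NULL for ADate.
FILTER_IN_ON = """
SELECT FirstName, LastName, ADate
FROM Scholar
LEFT JOIN Attendance
  ON Scholar.LastName = Attendance.LName
 AND Attendance.AttendanceType = 'Invited Speaker'
ORDER BY LastName
"""

# Filter inside WHERE: rows whose AttendanceType is NULL or different are discarded
# after the join, so only invited speakers remain.
FILTER_IN_WHERE = """
SELECT FirstName, LastName, ADate
FROM Scholar
LEFT JOIN Attendance ON Scholar.LastName = Attendance.LName
WHERE Attendance.AttendanceType = 'Invited Speaker'
ORDER BY LastName
"""

in_on = conn.execute(FILTER_IN_ON).fetchall()
in_where = conn.execute(FILTER_IN_WHERE).fetchall()
print(in_on)
print(in_where)
```

With the filter in ON, all three scholars are listed and only the invited speaker has a date; with the filter in WHERE, only the invited speaker is returned, which matches the behaviour described in the question.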

It appears you have your join and where clauses mixed up. You want to join the two tables on the last name (which invites another issue if you have two speakers with the same last name) and filter by AttendanceType:
FROM Scholar LEFT JOIN Attendance ON Lname=LastName
WHERE AttendanceType='Invited Speaker'

MySQL select users on multiple criteria

My team is working on a PHP/MySQL website for a school project. I have a table of users with typical information (ID, first name, last name, etc). I also have a table of questions with sample data like below. For this simplified example, all the answers to the questions are numerical.
Table Questions:
qid | questionText
1 | 'favorite number'
2 | 'gpa'
3 | 'number of years doing ...'
etc.
Users will have the ability to fill out a form to answer any or all of these questions. Note: users are not required to answer all of the questions and the questions themselves are subject to change in the future.
The answer table looks like this:
Table Answers:
uid | qid | value
37 | 1 | 42
37 | 2 | 3.5
38 | 2 | 3.6
etc.
Now, I am working on the search page for the site. I would like the user to select what criteria they want to search on. I have something working, but I'm not sure it is efficient at all or if it will scale (not that these tables will ever be huge - like I said, it is a school project). For example, I might want to list all users whose favorite number is between 100 and 200 and whose GPA is above 2.0. Currently, I have a query builder that works (it creates a valid query that returns accurate results - as far as I can tell). A result of the query builder for this example would look like this:
SELECT u.ID, u.name (etc)
FROM User u
JOIN Answer a1 ON u.ID=a1.uid
JOIN Answer a2 ON u.ID=a2.uid
WHERE 1
AND (a1.qid=1 AND a1.value>100 AND a1.value<200)
AND (a2.qid=2 AND a2.value>2.0)
I add the WHERE 1 so that in the for loops, I can just add " AND (...)". I realize I could drop the '1' and just use implode(and,array) and add the where if array is not empty, but I figured this is equivalent. If not, I can change that easy enough.
As you can see, I add a JOIN for every criterion the searcher asks for. This also allows me to order by a1.value ASC, or a2.value, etc.
First question:
Is this table organization at least somewhat decent? We figured that since the number of questions is variable, and not every user answers every question, that something like this would be necessary.
Main question:
Is the query way too inefficient? I imagine that it is not ideal to join the same table to itself up to maybe a dozen or two times (if we end up putting that many questions in). I did some searching and found these two posts which seem to kind of touch on what I'm looking for:
Mutiple criteria in 1 query
This uses multiple nested (correct term?) queries in EXISTS
Search for products with multiple criteria
One of the comments by youssef azari mentions using 'query 1' UNION 'query 2'
Would either of these perform better/make more sense for what I'm trying to do?
Bonus question:
I left out above for simplicity's sake, but I actually have 3 tables (for number valued questions, booleans, and text)
The decision to have separate tables was because (as far as I could think of) it would either be that or have one big answers table with 3 value columns of different types, having 2 always empty.
This works with my current query builder - an example query would be
SELECT u.ID,...
FROM User u
JOIN AnswerBool b1 ON u.ID=b1.uid
JOIN AnswerNum n1 ON u.ID=n1.uid
JOIN AnswerText t1 ON u.ID=t1.uid
WHERE 1
AND (b1.qid=1 AND b1.value=true)
AND (n1.qid=16 AND n1.value<999)
AND (t1.qid=23 AND t1.value LIKE '...')
With that in mind, what is the best way to get my results?
One final piece of context:
I mentioned this is for a school project. While this is true, the eventual goal (it is an undergrad senior design project) is to have a department use our site for students creating teams for their senior design. For a rough estimate of size, every semester, the department would have somewhere around 200 or so students use our site to form teams. Obviously, when we're done, the department will (hopefully) check our site for security issues and other stuff they need to worry about (what with FERPA and all). We are trying to take into account all common security practices and scalability concerns, but in the end, our code may be improved by others.
UPDATE
As per nnichols' suggestion, I put in a decent amount of data and ran some tests on different queries. I put around 250 users in the table, and about 2000 answers in each of the 3 tables. I found the links provided very informative
(links removed because I can't hyperlink more than twice yet) Links are in nnichols' response
as well as this one that I found:
http://phpmaster.com/using-explain-to-write-better-mysql-queries/
I tried 3 different types of queries, and in the end, the one I proposed worked the best.
First: using EXISTS
SELECT u.ID,...
FROM User u WHERE 1
AND EXISTS
(SELECT * FROM AnswerNumber
WHERE uid=u.ID AND qid=# AND value>#) -- or any condition on value
AND EXISTS
(SELECT * FROM AnswerNumber
WHERE uid=u.ID AND qid=another # AND some_condition(value))
AND EXISTS
(SELECT * FROM AnswerText
...
I used 10 conditions on each of the 3 answer tables (resulting in 30 EXISTS)
Second: using IN - a very similar approach (maybe even exactly?) which yields the same results
SELECT u.ID,...
FROM User u WHERE 1
AND (u.ID) IN (SELECT uid FROM AnswerNumber WHERE qid=# AND ...)
...
again with 30 subqueries.
The third one I tried was the same as described above (using 30 JOINs)
The results of using EXPLAIN on the first two were as follows: (identical)
The primary query on table u had a type of ALL (bad, though users table is not huge) and rows searched was roughly twice the size of the user table (not sure why). Each other row in the output of EXPLAIN was a dependent query on the relevant answer table, with a type of eq_ref (good) using WHERE and key=PRIMARY KEY and only searching 1 row. Overall not bad.
For the query I suggested (JOINing):
The primary query was actually on whatever table you joined first (in my case AnswerBoolean) with type of ref (better than ALL). The number of rows searched was equal to the number of questions answered by anyone (as in 50 distinct questions have been answered by anyone) (which will be much less than the number of users). For each additional row in EXPLAIN output, it was a SIMPLE query with type eq_ref (good) using WHERE and key=PRIMARY KEY and only searching 1 row. Overall almost the same, but a smaller starting multiplier.
One final advantage to the JOIN method: it was the only one I could figure out how to order by various values (such as n1.value). Since the other two queries were using subqueries, I could not access the value of a specific subquery. Adding the order by clause did change the extra field in the first query to also have 'using temporary' (required, I believe, for order by's) and 'using filesort' (not sure how to avoid that). However, even with those slow-downs, the number of rows is still much less, and the other two (as far as I could get) cannot use order by.

You could answer most of these questions yourself with a suitably large test dataset and the use of EXPLAIN and/or the profiler.
Your INNER JOINs will almost certainly perform better than switching to EXISTS but again this is easy to test with a suitable test dataset and EXPLAIN.
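Before comparing plans, it is worth confirming that the two query shapes really return the same users. A small sketch of that check follows, run against an in-memory SQLite database rather than MySQL; the three users and their answers are invented, and the (uid, qid) primary key is an assumption that each user answers a question at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (ID INTEGER PRIMARY KEY, name TEXT);
-- Assumption: one answer per user per question, enforced by the primary key.
CREATE TABLE Answer (uid INTEGER, qid INTEGER, value REAL, PRIMARY KEY (uid, qid));
-- Hypothetical sample data: qid 1 = favorite number, qid 2 = gpa.
INSERT INTO User VALUES (37, 'ann'), (38, 'bob'), (39, 'cy');
INSERT INTO Answer VALUES (37, 1, 150), (37, 2, 3.5), (38, 1, 42), (38, 2, 3.6), (39, 1, 120);
""")

# One JOIN per criterion, as produced by the query builder in the question.
JOIN_FORM = """
SELECT u.ID FROM User u
JOIN Answer a1 ON u.ID = a1.uid
JOIN Answer a2 ON u.ID = a2.uid
WHERE (a1.qid = 1 AND a1.value > 100 AND a1.value < 200)
  AND (a2.qid = 2 AND a2.value > 2.0)
ORDER BY u.ID
"""

# One EXISTS subquery per criterion.
EXISTS_FORM = """
SELECT u.ID FROM User u
WHERE EXISTS (SELECT 1 FROM Answer WHERE uid = u.ID AND qid = 1 AND value > 100 AND value < 200)
  AND EXISTS (SELECT 1 FROM Answer WHERE uid = u.ID AND qid = 2 AND value > 2.0)
ORDER BY u.ID
"""

join_ids = [r[0] for r in conn.execute(JOIN_FORM)]
exists_ids = [r[0] for r in conn.execute(EXISTS_FORM)]
print(join_ids, exists_ids)
```

On the real MySQL tables, prefixing each query with EXPLAIN shows the plan the server chooses for it, which is the comparison suggested above.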

SQL schema design question around relating 2 different types of ID's to one piece of information

I'm working on redesigning some parts of our schema, and I'm running into a problem where I just don't know a good clean way of doing something. I have an event table such as:
Events
--------
event_id
For each event, there could be n groups or users associated with it. So there's a table relating Events to Users to reflect that one-to-many relationship, such as:
EventUsers
----------
event_id
user_id
The problem is that we also have a concept of groups. We want to potentially tie n groups to an event in addition to users. So, that user_id column isn't sufficient, because we need to store potentially either a user_id or a group_id.
I've thought of a variety of ways to handle this, but they all seem like a big hack. For example, I could make that a participant_id and put in a participant_type column such as:
EventUsers
----------
event_id
participant_id
participant_type
and if I wanted to get the events that user_id 10 was a part of, it could be something like:
select event_id
from EventUsers
where participant_id = 10
and participant_type = 1
(assuming that somewhere participant_type 1 was defined to be a User). But I don't like that from a philosophical point of view because when I look at the data, I don't know what the number in participant_id means unless I also look at the value in participant_type.
I could also change EventUsers to be something like:
EventParticipants
-----------------
event_id
user_id
group_id
and allow the values of user_id and group_id to be NULL if that record is dealing with the other type of information.
Of course, I could just break EventUsers and we'll call it EventGroups into 2 different tables but I'd like to keep who is tied to an event stored in one single place if there's a good logical way to do it.
So, am I overlooking a good way to accomplish this?

Tables Events, Users and Groups represent the basic entities. They are related by EventUsers, GroupUsers and EventGroups. You need to union results together, e.g. the attendees for an event are:
select user_id
from EventUsers
where event_id = #event_id
union
select GU.user_id
from EventGroups as EG inner join
GroupUsers as GU on GU.group_id = EG.group_id
where EG.event_id = #event_id
Don't be shy about creating additional tables to represent different types of things. It is often easier to combine them, e.g. with union, than to try to sort out a mess of vague data.
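A runnable sketch of the three junction tables and the union, using Python's sqlite3 module with an in-memory database: the sample ids are invented, the helper name attendees is hypothetical, and the #event_id placeholder is written as the named parameter :event_id that sqlite3 understands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EventUsers (event_id INTEGER, user_id INTEGER);
CREATE TABLE EventGroups (event_id INTEGER, group_id INTEGER);
CREATE TABLE GroupUsers (group_id INTEGER, user_id INTEGER);
-- Hypothetical sample data: user 11 is tied to event 1 both directly and via group 100.
INSERT INTO EventUsers VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO EventGroups VALUES (1, 100);
INSERT INTO GroupUsers VALUES (100, 11), (100, 13), (200, 14);
""")

# UNION (without ALL) removes duplicates, so a user reached both ways appears once.
ATTENDEES = """
SELECT user_id FROM EventUsers WHERE event_id = :event_id
UNION
SELECT GU.user_id
FROM EventGroups AS EG
INNER JOIN GroupUsers AS GU ON GU.group_id = EG.group_id
WHERE EG.event_id = :event_id
ORDER BY user_id
"""

def attendees(event_id):
    """Return the distinct user ids tied to an event, directly or through a group."""
    return [r[0] for r in conn.execute(ATTENDEES, {"event_id": event_id})]

print(attendees(1))
```

Keeping users and groups in separate junction tables means each column holds exactly one kind of id, and the union puts the attendees back into a single list when needed.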

Of course, I could just break EventUsers and we'll call it EventGroups into 2 different tables
This is the good logical way to do it. Create a junction table for each many-to-many relationship; one for events and users, the other for events and groups.

There's no correct answer to this question (although I'm sure if you look hard enough you'll find some purists who believe that their approach is the correct one).
Personally, I'm a fan of the second approach because it allows you to give columns names that accurately reflect the data they contain. This makes your SELECT statements (in particular when it comes to joining) a bit easier to understand. Yeah, you'll end up with a bunch of NULL values in the column that is unused, but that's not really a big deal.
However, if you'll be joining on this table a lot, it might be wise to go with the first approach, so that the column you join on is consistently the same. Also, if you anticipate new types of participant being added in the future, which would result in a third column in EventParticipants, then you might want to go with the first approach to keep the table narrow.
