Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I'm fairly new to SQL and am using MySQL with phpMyAdmin. I have two CSV sheets whose information needs to be merged into a new CSV, dropping all the columns that don't apply. The columns I need are email and user_id, and I need to create a new column titled "role".
The email column is different in each file, and the end result should be 74 lines, not 100 (not all emails will match). So far, I've been able to use a join to get the columns I need. What I'm struggling with is creating a new table that has "role" and getting the data to transfer over.
The table is created, with the right columns, but the values from example and test do not appear.
What I started with:
Select example.user_name,
test.email
from example
join test
on example.user_email = test.email
Where I got to:
CREATE table test2 (role text)
select email, user_name
from (
Select example.user_name,test.email
from example
join test
on example.user_email = test.email)
As Test
I've been at this for 4 hours (took me a good chunk of time just to get to this point) and this hurdle has been the most difficult. Any help would be greatly appreciated!
Table 1 Example.
User ID :1111.
User_email: example.com.
Table 2 Test
first_name: Tom
last_name: Laugh.
email: example.com
The first query joins these two together. The third table would need to add "role" which would be a defined value and would look like
Table 3 All together.
Role: onboarding.
user_id: 1111.
user_email: example.com
Hope that helps clarify things! Apologies for the confusion.
CREATE TABLE test2 ( user_email VARCHAR(255),
user_id INT PRIMARY KEY,
user_name VARCHAR(255),
role TEXT)
SELECT user_email,
example.user_id,
CONCAT_WS(' ', test.first_name, test.last_name) user_name,
'onboarding' role
FROM example
JOIN test USING (user_email);
Specify the new table's structure completely. This avoids some possible problems later (for example, you may otherwise end up with string columns that are shorter than you need).
Specify every column name in the SELECT part so that it matches the name in the table definition exactly. CREATE TABLE ... SELECT matches columns by name: a selected column whose name does not match is appended as an extra column (sometimes with a strange name), while the defined column is left empty.
Always use exactly the same name for matching columns in different tables unless that is impossible. The query above assumes the email column is named user_email in both tables, which is what JOIN ... USING (user_email) requires.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=5e7c77aa8ad2f5f44e1851bd3ebe17e7
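For readers who want to try the idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is not the answer's exact statement: SQLite does not accept column definitions combined with a SELECT in one CREATE TABLE, so the sketch creates the table first and fills it with INSERT ... SELECT. The sample rows and email addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source tables, named as in the question; the rows are invented sample data.
    CREATE TABLE example (user_id INTEGER PRIMARY KEY, user_email TEXT);
    CREATE TABLE test (first_name TEXT, last_name TEXT, email TEXT);
    INSERT INTO example VALUES (1111, 'tom@example.com'), (2222, 'nomatch@example.com');
    INSERT INTO test VALUES ('Tom', 'Laugh', 'tom@example.com'),
                            ('Ann', 'Other', 'ann@example.com');

    -- Define the destination table completely, then fill it by explicit column name.
    CREATE TABLE test2 (
        user_email TEXT,
        user_id    INTEGER PRIMARY KEY,
        user_name  TEXT,
        role       TEXT
    );
    INSERT INTO test2 (user_email, user_id, user_name, role)
    SELECT e.user_email,
           e.user_id,
           t.first_name || ' ' || t.last_name,
           'onboarding'            -- constant value for the new column
    FROM example AS e
    JOIN test AS t ON e.user_email = t.email;
""")

# Only the email present in both tables survives the inner join.
merged_rows = conn.execute(
    "SELECT role, user_id, user_email, user_name FROM test2 ORDER BY user_id"
).fetchall()
print(merged_rows)
```

Because the join is an inner join, rows whose email appears in only one file are dropped, which matches the "74 lines, not 100" expectation in the question.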
I have a website for a project that needs to summarize all of the budget categories in one column.
For example I have a column which contains:
Categories:
Water,Electricity,Gas,Rentals,Hospital Fees,Medicine,Personal Care,Fitness,
I want to select the sum of
water, electricity, gas, rentals
and name it utility bills,
and likewise the sum of
hospital fees, medicine, personal care, fitness
as healthcare.
What SQL statement should I use?
Any help will be appreciated
You'd need another table, or another column on this table, that maps each specific bill category to a general group.
You would then run a query like (if you put the category group in the main table)
SELECT categorygroup, sum(amount)
FROM bills
GROUP BY categorygroup
Or (if you have a separate table you join in)
SELECT bcg.categorygroup, sum(amount)
FROM bills b INNER JOIN billcategorygroups bcg ON b.category=bcg.category
GROUP BY bcg.categorygroup
You would then maintain the tables, either like (category in main table style):
Bills
Category, CategoryGroup, Amount
---
Electricity, Utility, 123
Water, Utility, 456
Or (separate table to map categories with groups style)
BillCategoryGroups
Category, CategoryGroup
---
Water, Utility
Electricity, Utility
Etc
Something has to map electricity -> utility, water -> utility, etc. I'd probably use a separate table because it is easy to reorganize: if you decide that Cellular is no longer Utility but Personal, changing it in the mapping table changes all the reporting. It also helps keep typos and data entry errors from affecting reports; with the single-table route, putting even one Electricity bill down as Utitily gives it its own line on the report. Adding new categories is easy with a separate table too. All of this can be done with a single table and big update statements, but we have "normalization" of data for good reasons.
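As a rough illustration of the separate-mapping-table route, the sketch below uses Python's sqlite3 module with the bills and billcategorygroups names from the queries above. The amounts and the Healthcare rows are invented sample data, and the helper name totals_by_group is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billcategorygroups (
        category      TEXT PRIMARY KEY,
        categorygroup TEXT NOT NULL
    );
    CREATE TABLE bills (
        category TEXT NOT NULL REFERENCES billcategorygroups(category),
        amount   INTEGER NOT NULL
    );
    INSERT INTO billcategorygroups VALUES
        ('Water', 'Utility'), ('Electricity', 'Utility'),
        ('Medicine', 'Healthcare'), ('Fitness', 'Healthcare');
    INSERT INTO bills VALUES
        ('Electricity', 123), ('Water', 456), ('Medicine', 40), ('Fitness', 60);
""")

def totals_by_group():
    """Sum bill amounts per category group via the mapping table."""
    return conn.execute("""
        SELECT bcg.categorygroup, SUM(b.amount)
        FROM bills AS b
        INNER JOIN billcategorygroups AS bcg ON b.category = bcg.category
        GROUP BY bcg.categorygroup
        ORDER BY bcg.categorygroup
    """).fetchall()

totals_before = totals_by_group()

# Regrouping a category is a one-row change in the mapping table;
# the bills themselves are untouched.
conn.execute(
    "UPDATE billcategorygroups SET categorygroup = 'Personal' WHERE category = 'Fitness'"
)
totals_after = totals_by_group()
print(totals_before, totals_after)
```

The second call shows the point about reorganizing: one UPDATE on the mapping table changes the report without touching any bill rows.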
You may use conditional aggregation. Like
SELECT project,
SUM(CASE WHEN category IN ('water','electricity','gas','rentals')
THEN spent
ELSE 0
END) AS bills,
SUM(CASE WHEN category IN ('hospital fees','medicine','personal care','fitness')
THEN spent
ELSE 0
END) AS healthcare
FROM datatable
GROUP BY project;
But normalizing the data is the better option: move all categories into a separate table. See Caius Jard's answer.
I am developing a database for online quiz management.
One table, stdinfo, stores usernames and student details.
The table testinfo stores the testid, name, subjects and their marking schemes (separate for each subject) as multiple rows.
One table, question, holds all the questions with their qids,
and one table, responses, records the students' responses.
The responses table has a marks column that holds the marks obtained for that question.
I have already inserted the responses with their ids/usernames. Now I want to fill in the marks so that I can calculate the result.
What it should do:
set marks = 0 where the response is null;
set marks = posmark, taken from the testinfo table for the respective subject, if the response is correct;
set marks = negmark, taken from the testinfo table for the respective subject, if the response is incorrect.
The table structures are given below.
I'm reposting my answer in more detail, since my previous answer was deleted.
I worked it out myself, and I also found the answer by @Jirka49 helpful. See link below.
See the last section in the webpage
So, we need to avoid the use of the IN operator.
Instead, build a derived table inside the query (a subquery aliased as temp) that holds the test, question and student ids, plus the positive mark, for every question the student answered correctly.
Then use this derived table to update the responses table through the WHERE clause.
So the query becomes:
update responses,
       (select testinfo.posmark, responses.testid, responses.qid, responses.stid
        from testinfo, responses, question
        where testinfo.testid = responses.testid
          and responses.testid = '<whatever testid>'
          and question.qid = responses.qid
          and question.cans = responses.response
          and question.subcode = testinfo.subcode
          and responses.stid = '<for whichever student>') as temp
set responses.marks = temp.posmark
where responses.qid = temp.qid
  and responses.testid = temp.testid
  and responses.stid = temp.stid;
The same can be done for negative marks. Wherever the answer is null, a simple query will assign zero marks.
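The three cases can also be handled in one statement. The sketch below is a different technique from the derived-table update above: it uses a single CASE expression with correlated subqueries, run through Python's sqlite3 module rather than MySQL. Table and column names follow the question and answer; the exact column lists, the sample rows, and the convention that negmark is stored as a negative number are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal structures; only the columns used in the answer are included.
    CREATE TABLE testinfo (testid TEXT, subcode TEXT, posmark INTEGER, negmark INTEGER);
    CREATE TABLE question (qid INTEGER PRIMARY KEY, subcode TEXT, cans TEXT);
    CREATE TABLE responses (stid TEXT, testid TEXT, qid INTEGER, response TEXT, marks INTEGER);

    INSERT INTO testinfo VALUES ('T1', 'MATH', 4, -1), ('T1', 'PHY', 2, 0);
    INSERT INTO question VALUES (1, 'MATH', 'A'), (2, 'MATH', 'B'), (3, 'PHY', 'C');
    INSERT INTO responses VALUES
        ('s1', 'T1', 1, 'A',  NULL),   -- correct
        ('s1', 'T1', 2, 'D',  NULL),   -- incorrect
        ('s1', 'T1', 3, NULL, NULL);   -- unanswered
""")

# One pass: 0 for NULL, posmark for a correct response, negmark otherwise.
conn.execute("""
    UPDATE responses
    SET marks = CASE
        WHEN response IS NULL THEN 0
        WHEN response = (SELECT q.cans FROM question AS q WHERE q.qid = responses.qid)
            THEN (SELECT t.posmark
                  FROM testinfo AS t
                  JOIN question AS q ON q.subcode = t.subcode
                  WHERE t.testid = responses.testid AND q.qid = responses.qid)
        ELSE (SELECT t.negmark
              FROM testinfo AS t
              JOIN question AS q ON q.subcode = t.subcode
              WHERE t.testid = responses.testid AND q.qid = responses.qid)
    END
    WHERE testid = ? AND stid = ?
""", ("T1", "s1"))

marks_by_question = conn.execute(
    "SELECT qid, marks FROM responses ORDER BY qid"
).fetchall()
print(marks_by_question)
```

The NULL check has to come first, because comparing a NULL response with the correct answer is neither true nor false and would otherwise fall through to the negmark branch.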
See the answer to your question here: https://forums.mysql.com/read.php?20,85813,85816#msg-85816
or look at this:
UPDATE a
INNER JOIN b USING (id)
SET a.firstname='Pekka', a.lastname='Kuronen',
b.companyname='Suomi Oy',companyaddress='Mannerheimtie 123, Helsinki Suomi'
WHERE a.id=1;
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Let's say a user has a posts table like this:
Post with id=1 is the first post that a user has posted. Post with an id=2 – is the edit that was made to the post, with id=3 – latest current version of the post.
post_param_a cannot be changed throughout versions, as well as user_id – they always stay the same since the first version. So we could store it like this:
So the question is: would it be better to store it the second way, with no duplication? This way, to get a current version of user's post we'd have to join the first version and check its user_id all the time. Or is it okay to store duplicate fields in this case?
P.S. I'm asking because we want to avoid duplication and accidental changes to values that must not change across versions, so we want to store them all in one place.
Take the entity Post and look at the simple tuple:
ID  User_ID  Post_Param_A  Comment
1   69       foo           This is a post
This is perfectly normalized. However, the post may undergo editing and you want to track the changes made. So you add another field to track the changes. Instead of an incremental value, however, it would make more sense to add a datetime field.
ID  EffDate       User_ID  Post_Param_A  Comment
1   1/1/16 12:00  69       foo           This is a post
This has two advantages: 1) if you track the changes, you will want to know anyway when this version was saved and 2) you don't have to find the largest incremental value for the post to find out what value to save with each new version. Just save the current date and time.
However, with either an incremental value or a date, there is a problem. In the simple row, each field has a functional dependency on the PK. In the versioned row, User_ID and Post_Param_A keep their dependency on the PK, but Comment now depends on the PK and EffDate.
The tuple is no longer in 2NF.
So the solution is a simple matter of normalizing it:
ID  User_ID  Post_Param_A
1   69       foo

ID  EffDate       Comment
1   1/1/16 12:00  This is a post
1   1/1/17 12:00  An edit was made
1   1/1/17 15:00  The last and current version (so far)
with (ID, EffDate) the composite PK in the new table.
The query to read the latest post is a bit complicated:
select p.ID, v.EffDate, p.User_ID, p.Post_Param_A, v.Comment
from Posts p
join PostVersions v
on v.ID = p.ID
and v.EffDate = (
select Max( v1.EffDate )
from PostVersions v1
where v1.ID = p.ID
and v1.EffDate <= today )
and p.ID = 1;
This is not really as complicated as it looks and it is impressively fast. The really neat feature is -- if you replace "today" with, say, 1/1/17 13:00, the result will be the second version. So you can query the present or the past using the same query.
Another neat feature is achieved by creating a view from the "today" query with the last line ("and p.ID = 1") removed. This view will expose the latest version of all posts. Create triggers on the view and this allows the apps that are only interested in the current version to do their work without consideration of the underlying structure.
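To make the "same query for present and past" point concrete, here is a small sketch with Python's sqlite3 module. It reuses the Posts and PostVersions names and the sample rows from above, stores EffDate as ISO-formatted text (an assumption, so that string comparison orders the dates correctly), and replaces "today" with a bound parameter. The helper name comment_as_of is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (ID INTEGER PRIMARY KEY, User_ID INTEGER, Post_Param_A TEXT);
    CREATE TABLE PostVersions (
        ID      INTEGER REFERENCES Posts(ID),
        EffDate TEXT,                      -- ISO text, so it sorts chronologically
        Comment TEXT,
        PRIMARY KEY (ID, EffDate)
    );
    INSERT INTO Posts VALUES (1, 69, 'foo');
    INSERT INTO PostVersions VALUES
        (1, '2016-01-01 12:00', 'This is a post'),
        (1, '2017-01-01 12:00', 'An edit was made'),
        (1, '2017-01-01 15:00', 'The last and current version (so far)');
""")

AS_OF_QUERY = """
    SELECT v.Comment
    FROM Posts AS p
    JOIN PostVersions AS v
      ON v.ID = p.ID
     AND v.EffDate = (SELECT MAX(v1.EffDate)
                      FROM PostVersions AS v1
                      WHERE v1.ID = p.ID
                        AND v1.EffDate <= :as_of)
    WHERE p.ID = :post_id
"""

def comment_as_of(post_id, as_of):
    """Return the comment that was current at as_of, or None if no version existed yet."""
    row = conn.execute(AS_OF_QUERY, {"post_id": post_id, "as_of": as_of}).fetchone()
    return row[0] if row else None

latest = comment_as_of(1, "9999-12-31 23:59")
second = comment_as_of(1, "2017-01-01 13:00")
before_first = comment_as_of(1, "2015-01-01 00:00")
print(latest, second, before_first)
```

Passing a far-future timestamp stands in for "today" here; in a real query the database's current-time function would normally be used instead.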
You could have a separate table where you store the post_param_a for each post_id, then you wouldn't need to have NULL values or duplicate values.
The 1st solution is better because user_id stays aligned with the post_id, which avoids ambiguous interpretations.
This way, to get a current version of user's post we'd have to join the first version and check its user_id all the time.
Have you thought about adding a timestamp field, so that you can always get the latest version of a post?
In the 2nd solution, NULL could become ambiguous as the data grows. Querying will also be harder, because every SQL statement has to be designed with the NULL cases and their specific meanings in mind.
The 3rd solution could be a normalization of your table into 2 separate ones, e.g. post and post_history. As you mentioned in the question, post_param_a cannot be changed throughout versions, and neither can user_id; they always stay the same from the first version on. In this case:
In table post, you can store the information about the post that is permanent (won't be changed): id, param_a, user_id, created_at ...
In table post_history, you can store the information that belongs to each version / modification: version_id, comment, modified_at ... You can also add an FK constraint on the second table so that post_history.post_id references post.id.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to figure out a good way to build a database for events. I have a client that has a list of customer names and promo codes. A customer on the list can go to a landing page, fill out the promo code and choose an event from a drop down field they would like to attend. He currently has 4 events ready to go.
In the database, should I create 4 tables, one for each event with its customers, or separate the customers from the event tables (i.e. a customer table and 4 event tables)? There might be more events in the future, so scalable options would be preferred.
Also, each customer is only allowed a maximum of 4 tickets, and they can only use the promo code once.
Thanks!
Jay is correct that a complete answer would be quite long, but I'll offer a few starting pointers nonetheless as it sounds like you're quite new to database architecture.
As a general principle, you should never build a schema that involves adding/removing tables at run time. The relationship you're looking for between customers and events is many-to-many, which in MySQL would use a junction table. An example schema would look like this:
customer
    customer_id (primary key)
    email, name, etc.
event
    event_id (primary key)
    name, time, etc.
ticket
    ticket_id (primary key)
    customer_id (index)
    event_id (index)
    date_purchased, etc.
Rules like "each customer is only allowed 4 tickets" should be implemented at a code level rather than a schema level since that is subject to change and your schema should be flexible enough to accommodate that change, tempting as it may be to have four columns in the customers table for the four tickets.
To get the events that customer ID 1 is attending:
SELECT DISTINCT event.*
FROM ticket
LEFT JOIN event ON ticket.event_id = event.event_id
WHERE ticket.customer_id = 1
To get the customers attending event ID 1:
SELECT DISTINCT customer.*
FROM ticket
LEFT JOIN customer ON ticket.customer_id = customer.customer_id
WHERE ticket.event_id = 1
A common format for junction tables is to combine the two table names, as in event_customer, but in this case calling it ticket makes more sense, since you might be including additional information about the ticket purchase in that table.
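A minimal sketch of this schema, using Python's sqlite3 module instead of MySQL, with the 4-ticket rule enforced in application code as suggested above. The function name buy_ticket, the constant MAX_TICKETS_PER_CUSTOMER, and the sample rows are hypothetical, and the limit is interpreted here as 4 tickets in total per customer.

```python
import sqlite3

MAX_TICKETS_PER_CUSTOMER = 4  # business rule kept in code, not in the schema

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event (event_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ticket (
        ticket_id   INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        event_id    INTEGER NOT NULL REFERENCES event(event_id)
    );
    CREATE INDEX ticket_customer_idx ON ticket (customer_id);
    CREATE INDEX ticket_event_idx ON ticket (event_id);
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO event VALUES (1, 'Launch party'), (2, 'Workshop');
""")

def buy_ticket(customer_id, event_id):
    """Insert a ticket unless the customer already holds the maximum; return True on success."""
    (held,) = conn.execute(
        "SELECT COUNT(*) FROM ticket WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if held >= MAX_TICKETS_PER_CUSTOMER:
        return False
    conn.execute(
        "INSERT INTO ticket (customer_id, event_id) VALUES (?, ?)",
        (customer_id, event_id),
    )
    return True

# Five attempts for the same customer: the fifth is refused.
purchase_results = [buy_ticket(1, event_id) for event_id in (1, 1, 2, 2, 1)]

events_attended = [name for (name,) in conn.execute("""
    SELECT DISTINCT event.name
    FROM ticket
    JOIN event ON ticket.event_id = event.event_id
    WHERE ticket.customer_id = 1
    ORDER BY event.name
""")]
print(purchase_results, events_attended)
```

Adding a fifth event later only means inserting another row into event. Note that the count-then-insert check is not safe against concurrent requests on its own; a real application would wrap it in a transaction or lock.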
I'm currently working on a survey creation/administration web application with PHP/MySQL. I have gone through several revisions of the database tables, and I once again find that I may need to rethink the storage of a certain type of answer.
Right now, I have a table that looks like this:
survey_answers
id PK
eid
sesid
intvalue Nullable
charvalue Nullable
id = unique value assigned to each row
eid = Survey question that this answer is in reply to
sesid = The survey 'session' (information about the time and date of a survey take) id
intvalue = The value of the answer if it is a numerical value
charvalue = the value of the answer if it is a textual representation
This allowed me to continue using MySQL's mathematical functions to speed up processing.
I have however found a new challenge: storing questions that have multiple responses.
An example would be:
Which of the following do you enjoy eating? (choose all that apply)
Girl Scout Cookies
Bacon
Corn
Whale Fat
Now, when I want to store the result, I'm not sure of the best way to handle it.
Currently, I have a table just for multiple choice options that looks like this:
survey_element_options
id PK
eid
value
id = unique value associated with each row
eid = question/element that this option is associated with
value = textual value of that option
With this setup, I then store my returned multiple selection answers in 'survey_answers' as strings of comma separated ids of the element_options rows that were selected in the survey (i.e. something like "4,6,7,9"). I'm wondering if that is indeed the best solution, or if it would be more practical to create a new table that holds each answer chosen and references back to a given answer row, which in turn references back to the element and ultimately the survey.
EDIT
for anyone interested, here is the approach I ended up taking (In PhpMyAdmin Relations View):
And a rudimentary query to gather the counts for a multiple select question would look like this:
SELECT e.question AS question, eo.value AS value, COUNT(eo.value) AS count
FROM survey_elements e, survey_element_options eo, survey_answer_options ao
WHERE e.id = 19
AND eo.eid = e.id
AND ao.oid = eo.id
GROUP BY eo.value
This really depends on a lot of things.
Generally, storing lists of comma separated values in a database is bad, especially if you plan to do anything remotely intelligent with that data, such as any kind of advanced reporting on the answers.
The best relational way to store this is to define the answers in a second table and then link them to the user's response to a question in a third table (with multiple entries per user-question, or possibly per user-survey-question if the user could take multiple surveys with the same question on them).
This can get slightly complex. A possible scenario, as a simple example:
Example tables:
Users (Username, UserID)
Questions (qID, QuestionsText)
Answers (AnswerText [in this case example could be reusable, but this does cause an extra layer of complexity as well], aID)
Question_Answers ([Available answers for this question, multiple entries per question] qaID, qID, aID),
UserQuestionAnswers (qaID, uID)
Note: Meant as an example, not a recommendation
Convert the primary key to a non-unique index and add the answers for the same question under the same id.
For example.
id | eid | sesid | intval | charval
3  | 45  | 30    | 2      |
3  | 45  | 30    | 4      |
You can still add another column for regular unique PK if needed.
Keep things simple. No need for relation here.
It's a horses for courses thing really.
You can store as a comma separated string (But then what happens when you have a literal comma in one of your answers).
You can store as a one-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
intvalue Nullable
charvalue Nullable
And then loop over that table. If you picked one answer, it would create one row in this table. If you pick two answers, it will create two rows in this table, etc. Then you would remove the intvalue and charvalue from the survey_answers table.
Another choice, since you're already storing the element options in their own table, is to create a many-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
survey_element_options_id FK
Again, one row per option selected.
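The many-to-many variant might look like the sketch below, written with Python's sqlite3 module. Table and column names follow the listings above; the element id 19, the session ids, and the selected options are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_element_options (id INTEGER PRIMARY KEY, eid INTEGER, value TEXT);
    CREATE TABLE survey_answers (id INTEGER PRIMARY KEY, eid INTEGER, sesid INTEGER);
    CREATE TABLE survey_element_answers (
        id                        INTEGER PRIMARY KEY,
        survey_answers_id         INTEGER REFERENCES survey_answers(id),
        survey_element_options_id INTEGER REFERENCES survey_element_options(id)
    );

    INSERT INTO survey_element_options VALUES
        (1, 19, 'Girl Scout Cookies'), (2, 19, 'Bacon'),
        (3, 19, 'Corn'), (4, 19, 'Whale Fat');
    INSERT INTO survey_answers VALUES (1, 19, 100), (2, 19, 101);

    -- One row per selected option: session 100 chose cookies and bacon,
    -- session 101 chose bacon only.
    INSERT INTO survey_element_answers (survey_answers_id, survey_element_options_id)
    VALUES (1, 1), (1, 2), (2, 2);
""")

# Count selections per option; the LEFT JOIN keeps options nobody picked.
option_counts = conn.execute("""
    SELECT o.value, COUNT(ea.id)
    FROM survey_element_options AS o
    LEFT JOIN survey_element_answers AS ea
           ON ea.survey_element_options_id = o.id
    WHERE o.eid = 19
    GROUP BY o.id, o.value
    ORDER BY o.id
""").fetchall()
print(option_counts)
```

Because each selection is its own row, counting is a plain GROUP BY, with no string splitting as the comma separated approach would need.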
Another option yet again is to store a bitmask value. This will remove the need for a many-to-many table.
survey_element_options
id PK
eid FK
value Text
optionnumber unique for each eid
optionbitmask 2 ^ (optionnumber - 1)
optionnumber should be unique for each eid and increment starting with one, so the first option gets bitmask 1, the second 2, the third 4, and so on. (Here ^ means "to the power of"; in MySQL, where ^ is bitwise XOR, compute it as 1 << (optionnumber - 1).) This imposes a limit of 63 options if you are using bigint, or 31 options if you are using int.
And then in your survey_answers
id PK
eid
sesid
answerbitmask bigint
Answerbitmask is calculated by adding together the optionbitmask values of each option the user selected. For example, if 7 were stored in answerbitmask, the user selected the first three options (1 + 2 + 4).
Joins can be done by:
WHERE survey_answers.answerbitmask & survey_element_options.optionbitmask > 0
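Here is a small sketch of the bitmask approach using Python's sqlite3 module. It assumes option n owns bit n - 1, so that a stored value of 7 means the first three options, as in the example above. The helper name store_answer, the element id 19, and the session id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey_element_options (
        id INTEGER PRIMARY KEY, eid INTEGER, value TEXT,
        optionnumber INTEGER, optionbitmask INTEGER
    );
    CREATE TABLE survey_answers (
        id INTEGER PRIMARY KEY, eid INTEGER, sesid INTEGER, answerbitmask INTEGER
    );
""")

# Option n (numbered from 1) gets bit n - 1: 1, 2, 4, 8, ...
options = ["Girl Scout Cookies", "Bacon", "Corn", "Whale Fat"]
for optionnumber, value in enumerate(options, start=1):
    conn.execute(
        "INSERT INTO survey_element_options (eid, value, optionnumber, optionbitmask) "
        "VALUES (?, ?, ?, ?)",
        (19, value, optionnumber, 1 << (optionnumber - 1)),
    )

def store_answer(sesid, selected_numbers):
    """Combine the selected option numbers into one bitmask and store it."""
    mask = 0
    for number in selected_numbers:
        mask |= 1 << (number - 1)
    conn.execute(
        "INSERT INTO survey_answers (eid, sesid, answerbitmask) VALUES (?, ?, ?)",
        (19, sesid, mask),
    )
    return mask

first_three_mask = store_answer(100, [1, 2, 3])

# Decode the answer by joining on the bitwise AND, as in the WHERE clause above.
selected_values = [value for (value,) in conn.execute("""
    SELECT o.value
    FROM survey_answers AS a
    JOIN survey_element_options AS o ON o.eid = a.eid
    WHERE a.sesid = ?
      AND (a.answerbitmask & o.optionbitmask) > 0
    ORDER BY o.optionnumber
""", (100,))]
print(first_three_mask, selected_values)
```

The trade-off is that a bitwise join cannot use an ordinary index on the mask column, so this tends to suit small option sets better than heavy reporting.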
So yeah, there's a few options to consider.
If you don't use the id as a foreign key in another query, or if you can query results using the sesid, try a many to one relationship.
Otherwise I'd store multiple choice answers as a serialized array, such as JSON or through php's serialize() function.