I am new to SQL and programming in general, so this could be an easy question or it may not be; I have no clue. I just know I have not been able to find a straightforward answer.
I have an Excel file with a bunch of different data sheets. Each data sheet has the same data, just for different occurrences. I want to be able to associate certain readings with a given value of the subject (i.e. I want to be able to return all paces recorded during a race if the person is under 24 years old). In this situation, the paces would be recorded every minute during a 2 hour race. The pace would be in a column pace, and age would be in another column age. There will be a data sheet for every subject. I ultimately would like to find the average pace of all people in different age ranges (24 and under, etc.).
I can combine the columns with the UNION command. I am working with SQL in R. What I have looks like this:
sqlQuery(Race, paste("SELECT [PACE] FROM [Bill] UNION [STEVE]"))
I understand that the WHERE clause returns rows where the given value is present. My dilemma is that I have enough data that it would be very time consuming to enter the age on every row just so I can query the rows.
Is it possible for me to create code that asks something like "return me all table where age is less than 24?"
I'd strongly advise against putting each data sheet into its own table - just add a column DATA_SHEET to your table to differentiate between occurrences.
That said, I'd probably go one step further and use these tables (assuming one data sheet represents one race):
PERSON
------
PK
Name
Age (better: Date of birth, since age changes over time)
Gender
...
RACE
----
PK
Name
Start date
...
PACE
----
PERSON_FK -- foreign key to PERSON table
RACE_FK -- foreign key to race table
PACE
...
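For reference, a minimal DDL sketch of those tables might look like this (column names and types are assumptions; adapt them to whatever database you use):

-- Sketch only: names/types are illustrative, not prescriptive.
CREATE TABLE person (
    pk         INTEGER PRIMARY KEY,
    name       VARCHAR(100),
    birth_date DATE,          -- store date of birth rather than age
    gender     VARCHAR(10)
);

CREATE TABLE race (
    pk         INTEGER PRIMARY KEY,
    name       VARCHAR(100),
    start_date DATE
);

CREATE TABLE pace (
    person_fk  INTEGER REFERENCES person (pk),  -- foreign key to PERSON
    race_fk    INTEGER REFERENCES race (pk),    -- foreign key to RACE
    pace       NUMERIC(6,2)                     -- e.g. minutes per mile
);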
This way, to get all paces of people younger than 24 for a given race:
select race.name race_name, person.name person_name, pace.pace
from pace
join person on person.pk = pace.person_fk
join race on race.pk = pace.race_fk
where person.age < 24
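And since the ultimate goal is the average pace per age range, a query along these lines would do it (a sketch assuming an age column as above; adjust the brackets to taste):

select case when person.age < 25 then '24 and under'
            when person.age < 35 then '25 to 34'
            else '35 and over' end as age_range,
       avg(pace.pace) as avg_pace
from pace
join person on person.pk = pace.person_fk
group by case when person.age < 25 then '24 and under'
              when person.age < 35 then '25 to 34'
              else '35 and over' end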
sqlQuery(Race, paste("SELECT [PACE], AVG(PACE) FROM [Bill] UNION [STEVE] WHERE columnName["age"] < 24"))
I'm not a master in SQL-server/T-SQL and I have no experience developing R, but in SQL the syntax is not so different from this.
But you could separate the queries like this (I have no clue if the syntax is good):
sqlQuery(Race, paste("SELECT [PACE] FROM [Bill] UNION [STEVE] WHERE [PACE].age < 24"))
then:
sqlQuery(Race, paste("SELECT AVG([PACE].age) FROM [Bill] UNION [STEVE]"))
The question:
"return me all table where age is less than 24?"
implies that you would have more than one table with a column called age. If the age applies to the runner, I would suggest reviewing your schema. You should have an entity named something like:
Person
or
Runner
or
Participant
That has a column age. The trick would then be to simply move all your data into that table. Then a simple
SELECT *
FROM Person
WHERE age < 24
Would return all the data you're looking for.
I think where this is getting confusing is the concept of a datasheet in excel vs. a table in SQL. Your data sheets sound like they're instances of a participant with various additional data. Instead of creating a table for each data sheet you should create a schema that fits all of your data and then fill it with each instance from your data.
Take a look here for a reference to schema design:
How to design this RDBMS schema?
Try SQL Server's AVG() function together with a WHERE condition such as WHERE ageCol < 24.
Related
I have a table full of businesses each with a scannable QR Code, and another table that stores the scans the users make. Right now, the scan table schema looks like this:
id | user_id | business_id | scanned_date
If I want to create charts and analytics in the front end of my application for statistics about business scans, I'd just take the business_id and fetch the business info with it. The problem is that if a business's data is ever changed, the statistical data will also change, and it shouldn't be this way.
The first thing that came to my mind in order to have static data was to store the whole business row as a JSON string in a new column of the scan table, but that doesn't sound like good practice (although storing a JSON string isn't advised against when the data won't be tampered with, which it won't, since it's supposed to be static).
Another thing I thought of was to make a clone of the business table's schema, but that would mean doing the work twice whenever I want to change the original table, since I must also change the cloned one.
You need a way to represent the history of the businesses' data in your database.
You didn't mention what attributes you store in each business's row, so I will guess. Let's say you have these columns
business_id
name
category
qr_code
website
Your problem is this: if you change any attribute of the business, the old value vanishes.
Here's a solution to that problem. Add start and end columns to the table. They should probably have TIMESTAMP data types.
Then, never DELETE rows from the table, and when you UPDATE a row, only ever change the value of its end column; represent every other change by adding a new row instead. Let me explain.
For a row to be active at the time NOW(), it must pass these WHERE criteria:
start <= NOW()
AND (end IS NULL OR end > NOW())
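For example, fetching the rows that are active right now would be a query like this (a sketch assuming MySQL; the backticks just guard against keyword clashes):

SELECT *
FROM business
WHERE `start` <= NOW()
  AND (`end` IS NULL OR `end` > NOW());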
Let's say you start with two businesses in the table.
business_id  start       end   name   category  qr_code  website
1            2019-05-01  NULL  Joe's  tavern    lkjhg12  joes.example.com
2            2019-05-01  NULL  Acme   rockets   sdlfj48  acme.example.com
Good: You can count QR code scans day by day with this query
SELECT COUNT(*), DATE(s.scanned_date) day, b.name
FROM business b
JOIN scan s ON b.business_id = s.business_id
AND b.start <= s.scanned_date
AND (b.end IS NULL OR b.end > s.scanned_date)
GROUP BY DATE(s.scanned_date), b.name
Now, suppose Joe sells his tavern and its name changes. To represent that change you must UPDATE the existing row for Joe's to set the end date, and then INSERT a new row with the new data. Afterward, your table looks like this
             business_id  start       end         name   category  qr_code  website
(updated)    1            2019-05-01  2019-05-24  Joe's  tavern    lkjhg12  joes.example.com
(inserted)   1            2019-05-24  NULL        Fancy  tavern    lkjhg12  fancy.example.com
(unchanged)  2            2019-05-01  NULL        Acme   rockets   sdlfj48  acme.example.com
The query above still works, because it takes into account the start and end dates of the changes.
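As a rough sketch, those two statements for the name change might be (values taken from the example rows above):

-- Close out the old version of business 1 ...
UPDATE business
SET `end` = '2019-05-24'
WHERE business_id = 1
  AND `end` IS NULL;

-- ... and insert the new version.
INSERT INTO business (business_id, `start`, `end`, name, category, qr_code, website)
VALUES (1, '2019-05-24', NULL, 'Fancy', 'tavern', 'lkjhg12', 'fancy.example.com');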
This approach works best when you have many more scans than changes to businesses. That seems likely in this case.
Your business table needs a composite primary key (business_id, start).
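In MySQL that could be a sketch along these lines (assuming business_id is currently the sole primary key, and doing both changes in one statement):

ALTER TABLE business
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (business_id, `start`);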
Prof. Richard Snodgrass wrote a book on this subject, Developing Time-Oriented Database Applications in SQL, and generously made a pdf available.
I hope I got your question.
You could allow duplicates in the business table. Instead of editing a business, add a new row with a new id: when you edit a business, INSERT a new row rather than UPDATE the existing one. The stats keep using the old id and are not affected by the changes. When you need the latest business info, sort by id and take the last row. That way you won't need a second table for business data.
Edit: If the business id needs to be specific to a business, then instead of relying on the business id you can add a column that records when each row was inserted. Again, sort by that column and limit the query to get the latest one.
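A sketch of that variant, assuming a hypothetical created_at column that records when each row was inserted:

-- Latest version of business 1 (created_at is an assumed column name).
SELECT *
FROM business
WHERE business_id = 1
ORDER BY created_at DESC
LIMIT 1;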
Edit 2:
Removing entities that were inserted a certain amount of time ago
If you don't need the stats from a month ago, you could remove them from businesses to save up space. You can use the new time column you created to get the time difference and check if it is greater than the range you want.
I am asking this question to teach myself the correct approach in a certain scenario, rather than to solve a how-to-code problem.
I am a self-taught student and haven't used relational tables before. Through searching and experimenting I have come to know the basic concept of relations and their usage, but I am not sure whether I am using the correct approach with these tables.
I do not have any official teachers, so the only place I can ask troubling questions is here with you guys.
For example, I have written a little code where I have 2 tables.
Table 1 is doctors, which has an id column (AI & primary key) and a names column of VARCHAR.
Table 2 is patient_recipts, which has a doctor_name column of TINYINT.
The names column holds the name of the doctor.
The doctor_name column holds the corresponding id from the doctors table.
names and doctor_name are related to each other in the database.
Now when I need to fetch data from patient_recipts and display the doctor's name, I need to INNER JOIN the doctor table, compare the doctor_name value with the id in the doctor table, and get the name of the doctor.
The query I will use to fetch patients of a certain doctor is something like:
$getPatList = $db->prepare("SELECT *
FROM patient_recipts
INNER JOIN doctor ON patient_recipts.doctor_name = doctor.id
WHERE dept = 'OPD' AND date_time = DATE(NOW())
ORDER BY patient_recipts.id DESC");
Now if I were to INSERT an action log entry in some other processor file, it would be something like (action and log entry),
$recipt_no = $_POST['recipt_no'];
$doctor_name = $_POST['doctor_name']; //this hold id(int) not text
$dept = $_POST['dept'];
$patient_name = $_POST['patient_name'];
$patient_tel = $_POST['patient_telephone'];
$patient_addr = $_POST['patient_address'];
$patient_age = $_POST['patient_age'];
$patient_gender = $_POST['patient_gender'];
$patient_fee = $_POST['patient_fee'];
$logged_user = $_SESSION['user_name'];
$insData = $db->prepare("
INSERT INTO patient_recipts (date_time, recipt_no, doctor_name, dept, pat_gender, pat_name, pat_tel, pat_address, pat_age, pat_fee, booked_by)
VALUES (NOW(),?,?,?,?,?,?,?,?,?,?)");
$insData->bindValue(1,$recipt_no);
$insData->bindValue(2,$doctor_name);
$insData->bindValue(3,$dept);
$insData->bindValue(4,$patient_gender);
$insData->bindValue(5,$patient_name);
$insData->bindValue(6,$patient_tel);
$insData->bindValue(7,$patient_addr);
$insData->bindValue(8,$patient_age);
$insData->bindValue(9,$patient_fee);
$insData->bindValue(10,$logged_user);
$insData->execute();
// Add Log
write_log("{$logged_user} booked OPD of patient {$patient_name} for {$doctor_name}");
OUTPUT: Ayesha booked OPD of patient Steve for 15
Now here the problem is apparent: I would need to execute the above-mentioned fetch query yet again to get the name of the doctor by ID comparison and map the ID 15 to the doctor's name before calling the write_log() function.
So this is where I think my approach has been wrong altogether.
One way could be to store the actual doctor name in patient_recipts rather than the ID, but that would, in the first place, kill the purpose of learning related tables and keys, design scenarios, and troubleshooting.
Please help so I can understand and implement a better approach for days to come :)
Your table structure is correct; it's considered best practice to use the ID as the foreign key in other tables. If you want to include the doctor's name in the log message, you do have to do another SELECT query. A query like
SELECT name
FROM doctor
WHERE id = :doctor_id
is not very expensive.
But you can simply live with the log file only containing IDs. Look up the doctor's name later if you need to find out which doctor a particular log message is referring to.
BTW, when you use PDO, I recommend you use named placeholders (as in my example above) rather than ?. It makes the code easier to read, and if you modify the query to add or remove columns you don't have to change all the placeholder numbers.
I have a Microsoft Access table of data with 3 fields: "part_number", "date_of_price_change" and "new_price", but I need to convert the "new_price" field to show the "price_change", rather than the full "new_price" of each part on each date.
This will obviously involve some process that looks at each unique part number on each date, looks up the price of the record with the same part number and the next earliest date, and subtracts the prices to get the price change.
Problem is, I have no idea how to do this in Access and there are too many records for Excel. Can anyone assist with how to do this in Access? (Note that date changes can happen any time and are not periodic).
Many thanks in advance.
Ana
Add the new column price_change as a Currency data type, then run a query something like the one below. Make sure you back up the table first, e.g. by appending the table to a new table, just in case. Since price_change is a new column, it may not matter.
UPDATE MyTable AS T1
SET T1.price_change = T1.new_price - Nz(
    (SELECT TOP 1 T2.new_price
     FROM MyTable AS T2
     WHERE T2.part_number = T1.part_number
       AND T2.date_of_price_change < T1.date_of_price_change
     ORDER BY T2.date_of_price_change DESC), 0)
I'm currently working on a survey creation/administration web application with PHP/MySQL. I have gone through several revisions of the database tables, and I once again find that I may need to rethink the storage of a certain type of answer.
Right now, I have a table that looks like this:
survey_answers
id PK
eid
sesid
intvalue Nullable
charvalue Nullable
id = unique value assigned to each row
eid = Survey question that this answer is in reply to
sesid = The survey 'session' (information about the time and date of a survey take) id
intvalue = The value of the answer if it is a numerical value
charvalue = the value of the answer if it is a textual representation
This allowed me to continue using MySQL's mathematical functions to speed up processing.
I have however found a new challenge: storing questions that have multiple responses.
An example would be:
Which of the following do you enjoy eating? (choose all that apply)
Girl Scout Cookies
Bacon
Corn
Whale Fat
Now, when I want to store the result, I'm not sure of the best way to handle it.
Currently, I have a table just for multiple choice options that looks like this:
survey_element_options
id PK
eid
value
id = unique value associated with each row
eid = question/element that this option is associated with
value = textual value of that option
With this setup, I then store the returned multiple-selection answers in survey_answers as strings of comma-separated ids of the element_options rows that were selected in the survey (i.e. something like "4,6,7,9"). I'm wondering if that is indeed the best solution, or if it would be more practical to create a new table that holds each chosen answer and references back to a given answer row, which in turn references back to the element and ultimately the survey.
EDIT
for anyone interested, here is the approach I ended up taking (In PhpMyAdmin Relations View):
And a rudimentary query to gather the counts for a multiple select question would look like this:
SELECT e.question AS question, eo.value AS value, COUNT(eo.value) AS count
FROM survey_elements e, survey_element_options eo, survey_answer_options ao
WHERE e.id = 19
AND eo.eid = e.id
AND ao.oid = eo.id
GROUP BY eo.value
This really depends on a lot of things.
Generally, storing lists of comma-separated values in a database is bad, especially if you plan to do anything remotely intelligent with that data - and especially if you want to do any kind of advanced reporting on the answers.
The best relational way to store this is to also define the answers in a second table, and then link them to the user's response to a question in a third table (with multiple entries per user-question, or possibly per user-survey-question if the user could take multiple surveys with the same question on them).
This can get slightly complex; as a simple example, consider these tables:
Example tables:
Users (Username, UserID)
Questions (qID, QuestionsText)
Answers (AnswerText [in this case example could be reusable, but this does cause an extra layer of complexity as well], aID)
Question_Answers ([Available answers for this question, multiple entries per question] qaID, qID, aID),
UserQuestionAnswers (qaID, uID)
Note: Meant as an example, not a recommendation
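A rough DDL sketch of those example tables (types are assumptions, and like the tables themselves this is meant as an illustration, not a recommendation):

CREATE TABLE Users (
    UserID   INT PRIMARY KEY,
    Username VARCHAR(100)
);

CREATE TABLE Questions (
    qID           INT PRIMARY KEY,
    QuestionsText VARCHAR(255)
);

CREATE TABLE Answers (
    aID        INT PRIMARY KEY,
    AnswerText VARCHAR(255)
);

-- Which answers are available for which question (multiple rows per question).
CREATE TABLE Question_Answers (
    qaID INT PRIMARY KEY,
    qID  INT REFERENCES Questions (qID),
    aID  INT REFERENCES Answers (aID)
);

-- One row per option a user actually selected.
CREATE TABLE UserQuestionAnswers (
    qaID INT REFERENCES Question_Answers (qaID),
    uID  INT REFERENCES Users (UserID)
);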
Convert the primary key to a non-unique index and add the answers for the same question under the same id.
For example:
id | eid | sesid | intval | charval
3  | 45  | 30    | 2      |
3  | 45  | 30    | 4      |
You can still add another column for regular unique PK if needed.
Keep things simple. No need for relation here.
It's a horses for courses thing really.
You can store the answers as a comma-separated string (but then what happens when you have a literal comma in one of your answers?).
You can store as a one-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
intvalue Nullable
charvalue Nullable
And then loop over that table. If you picked one answer, it would create one row in this table. If you pick two answers, it will create two rows in this table, etc. Then you would remove the intvalue and charvalue from the survey_answers table.
Another choice, since you're already storing the element options in their own table, is to create a many-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
survey_element_options_id FK
Again, one row per option selected.
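For instance, counting how often each option of a question was selected through that many-to-many table might look like this (a sketch using the column names above; 19 is just an example element id):

SELECT eo.value, COUNT(*) AS times_selected
FROM survey_element_options eo
JOIN survey_element_answers ea ON ea.survey_element_options_id = eo.id
WHERE eo.eid = 19
GROUP BY eo.value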
Another option yet again is to store a bitmask value. This will remove the need for a many-to-many table.
survey_element_options
id PK
eid FK
value Text
optionnumber unique for each eid
optionbitmask 2 ^ (optionnumber - 1)
optionnumber should be unique for each eid and increment starting with one. This imposes a limit of 63 options if you are using bigint, or 31 options if you are using int.
And then in your survey_answers
id PK
eid
sesid
answerbitmask bigint
answerbitmask is calculated by adding together the optionbitmask of each option the user selected. For example, if 7 were stored in answerbitmask, that means the user selected the first three options (1 + 2 + 4).
Joins can be done by:
WHERE survey_answers.answerbitmask & survey_element_options.optionbitmask > 0
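As a sketch, listing the selected option texts for each answer row would then be something like:

SELECT sa.id AS answer_id, eo.value
FROM survey_answers sa
JOIN survey_element_options eo
  ON eo.eid = sa.eid
 AND (sa.answerbitmask & eo.optionbitmask) > 0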
So yeah, there's a few options to consider.
If you don't use the id as a foreign key in another query, or if you can query results using the sesid, try a many to one relationship.
Otherwise I'd store multiple choice answers as a serialized array, such as JSON or through php's serialize() function.
I have 3 tables in Mysql 5.
Table Client: ID, Username, Password.
Table Client_Data: ID, Dataname
Table Client_Client_Data: client_id, Data_id, Value
The idea is that I can have the user of this software determine which information he wants to get from his clients. The Client_Data table would typically be filled with "First Name", "Last Name", "Address" and so on. The third table will join the tables together. An example:
Client: ID=1 Username=Bert01 Password=92382938v2nvn239
Client_Data: ID=1 Dataname=First Name
Client_Client_Data: client_id=1 data_id=1 value=Bert
This would mean that Bert01 has a first name "Bert" when joining the tables in a select query.
I'm displaying all this in a table where the columns are the DataName values (if you lost me here: the headers would be like "First Name", "Last Name" and so on). I want to be able to sort this data alphabetically for each column.
My solution was to use 2 queries. The first one would collect the data with WHERE Client_Data.Dataname = $sortBy ORDER BY Client_Client_Data.value and the second query would then collect the other data with WHERE Client.ID = 1 OR 2 OR 3 containing all of the ID's collected in the first query. This is working great.
The problem that has been playing in my mind for a long time now is when I want to search my data. This would not be too hard if it weren't for the sorting. After the search has been done the table would contain the results, but this table has to be sorted the same way as before.
Does anyone have any idea on how to do this without bothering the webserver's memory by looping through potentially thousands of clients? (meaning: i want to do this in Mysql).
If your solution would require altering the tables without losing the capability of storing this kind of data: that would be no problem.
You could relocate the looping. Make a select of all the data types:
Select * from Client_Data
then use that info to build a query like so (pseudocode):
orderby = "name"
query = "select *"
foreach(datatypes as dt){
query += ",(select d.value from Client_Client_Data as d where d.data_id="+dt.ID+" and d.client_id=cl.ID) as "+dt.Dataname
}
query = "from Client as cl order by "+orderby;
This will result in a table where each of the available data types is turned into a column, with the corresponding value connected to the correct client through d.client_id = cl.ID,
where cl.ID refers to the main query's client id and is matched against Client_Client_Data.client_id.
Now beware: I am not entirely sure that the subqueries are more efficient; that would require some testing.
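For illustration, with the data types "First Name" and "Last Name" the generated query would come out roughly like this (a sketch; the data_id values are made up):

SELECT *,
       (SELECT d.value FROM Client_Client_Data AS d
        WHERE d.data_id = 1 AND d.client_id = cl.ID) AS `First Name`,
       (SELECT d.value FROM Client_Client_Data AS d
        WHERE d.data_id = 2 AND d.client_id = cl.ID) AS `Last Name`
FROM Client AS cl
ORDER BY `First Name`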