access - select representative sample data from live database - ms-access

We are regularly running controls of employees (based on employee_id) doing different types of transactions (based on transaction_id). To do so we are extracting a sample of live production data (sized by the Loi de Poisson).
The question is:
How can I make sure that every type of transaction is part of the sample, and that each employee is also being controlled in the sample, knowing that there is not always a combination of both.
For example, this is my live data (of course, mine is much larger).
Transaction_id Employee_id
100 123456
100 234567
101 123456
203 234567
203 345678
301 234567
I must make sure that in my sample, all transaction_id is selected at least once (100, 101, 203 and 301). I must also make sure that all my employees are selected at least once (123456, 234567, 345678). So the sample should be
301, 234567 -> only possibility
203, 345678 -> only possibility because if I select the other employee_id 234567, then employee_id 345678 will never be controlled
101, 123456 -> only possibility
100, 123456 or 234567
How would you advise I write the SQL statements to ensure that the sample data contains at least one occurrence of each transaction_id and at least one occurrence of employee_id ?

Related

Nested sort in SELECT followed by Conditional INSERT based upon results of SELECT inquiry

I have been struggling with the following for some time.
The server I am using has MySQL ver 5.7 installed.
The issue:
I wish to take recorded tank level readings from one table, find the difference between the last two records for a particular tank, and multiply this by a factor to get a quantity used.
The extracted quantity, if it is +ve, else 0 , then to be inserted into another table for further use.
The Quant value extracted may be +ve or -ve as tanks fill and empty. I only require the used quantity -ie falling level.
The two following tables are used:
Table 'tf_rdgs' sample;
value 1 is content height.
id
location
value1
reading_time
1
18
1500
2
18
1340
3
9
1600
4
18
1200
5
9
1400
6
18
1765
yyyy
7
18
1642
xxxx
Table 'flow' example
id
location
Quant
reading_time
1
18
5634
dd-mm: HH-mm
2
18
0
dd-mm: HH-mm
3
18
123
current time
I do not require to go back over history and am only interested in the latest level readings as a new level reading is inserted.
I can get the following to work with a table of only one location.
INSERT INTO flow (location, Quant)
SELECT t1.location, (t2.value1 - t1.value1) AS Quant
FROM tf_rdgs t1 cross join tf_rdgs t2 on t1.reading_time > t2.reading_time
ORDER BY t2.reading_time DESC limit 1
It is not particularly efficient but works and gives the following return from the above table.
location
Quant
18
123
for a table with mixed locations including a WHERE t1.location = ... statement does not work.
The problems i am struggling with are
How to nest the initial sorting by location for the subsequent inquiry of difference between the last two tank level readings.
A singular location search is ok rather than all tanks.
A Conditional INSERT to insert the 'Quant' value only if it is +ve or else insert a 0 if it is -ve (ie filling)
I have tried many permutations on these without success.
Once the above has been achieved it needs to run on a conditional trigger - based upon location of inserted data - in the tf_rdgs table activated upon each new reading inserted from the sensors on a particular tank.
I can achieve the above with the exception of the conditional insert if each tank had a dedicated table but unfortunately I cant go there due existing data structure and usage.
Any direction or assitance on parts or whole of this much appreciated.

processing MySQL data when there are field values inserted with commas

I have some columns in mysql table with field vaues are seperated with commas. fields like IP address and running_port_ids, dns_range or subnet etc. running a cron to check every hour whether the ports are used or not on the appliance. if ports are used against each appliance running_port_ids(like 2,3,7) are inserted with comma seperated values.
How to process the data so that i can get a reports which ports are less used (i have a static list of port ids) in ascending order like below by grouping of address, running_port_ids and insert date for a date range of one month.
address port usage%
10.2.1.3 3 1
10.3.21.22 2 20
there are thousands of record now in the table with comma seperated running_port_ids. is there any methods available in MySql to do this?
Any help much appreciated.
If you can convert your data model to a n:m relation (or "link table"), i.e. normalize your data model, this is pretty easy using grouping (or "aggregate") functions. So I'd advise to revise your data model and introduce a table containing one row for each of the ports, in stead of storing this de-normalized in a text column.
A typical example would be: "student has many classes", and a property of this relation is "attendance":
Student
id name
1 John
2 Jane
Course
id name
1 Engineering
2 Databases
Class
id courseid date room
1 1 2015-08-05 10:00:00 301
2 1 2015-08-13 10:00:00 301
3 1 2015-09-03 10:00:00 301
StudentClass
studentid classid attendance
1 1 TRUE
1 2 FALSE
1 3 NULL
2 1 TRUE
2 2 TRUE
2 3 NULL
In this case, you can see the relation between student and class is normalized, i.e. every other value is stored vertically in stead of horizontally. This way, you can easily query things like "How many classes did John miss?" or "How many students did not miss any class". NULL in the example shows that we can not yet tell anything about the attendance (as the date is in the future), but we do know that they should attend.
This is the way you should keep track of properties of a relation between two things. I can't really make out what you're trying to build, but I'm pretty sure you need a similar model.
Hope this helps.

Access Calculated Field

I am having difficulty trying to make a calculated field that I need. So here is what I am trying to do:
I have a query that combines the information based on three tables. The most important fields that for the application are as follows:
Family Income Age Patient
15,000 18 Yes
28,000 25 No
30,000 1 Yes
From here I want to make a calculated field that gives the correct program the patient was enrolled in. based on these fields ie:
Program Minimum Income Maximum Income Minimum Age Maximum Age Patient
Children's 0 20,000 1 19 Yes
Adult 0 12,000 19 65 No
Non Patient 0 20,000 1 19 No
Adult 2 12,000 50,000 19 65 No
Etc.
to create:
Family Income Age Patient Program
15,000 18 Yes Children's
28,000 25 No Adult 2
30,000 1 Yes Children's 2
I know I can use IIf to hard code it in to the field, but then it will be really difficult for other people to update the information as the guidelines change. Is it possible to have the information stored in a table? and use the information on the table form etc, or will I need to use IIf
Any Ideas? is it possible to dynamically create the IIf in SQL using VBA while pulling the information from the table?
EDIT:::
Thank you for your response and for formatting my tables, I still have no idea how you changed it, but it looks amazing!
I tried to add the SQL you added down below, but I was not able to make it work. I'm not sure if I made a mistake so I included the SQL of my Query. The query currently returns 0 values, so I think I messed something up. (The real Query is embarassing...I'm sorry for that). Unfortunately, I have done everything in my power to avoid SQL, and now I am paying the price.
SELECT qry_CombinedIndividual.qry_PrimaryApplicant.[Application Date],
qry_CombinedIndividual.qry_PrimaryApplicant.[Eligibility Rep],
qry_CombinedIndividual.qry_PrimaryApplicant.Name,
qry_CombinedIndividual.qry_PrimaryApplicant.Clinic,
qry_CombinedIndividual.qry_PrimaryApplicant.Outreach,
qry_CombinedIndividual.qry_PrimaryApplicant.[Content Type ID],
qry_CombinedIndividual.qry_PrimaryApplicant.[Application Status],
qry_CombinedIndividual.qry_PrimaryApplicant.Renewal,
qry_CombinedIndividual.qry_Enrolled.EthnicityEnr,
qry_CombinedIndividual.qry_Enrolled.GenderEnr, qry_CombinedIndividual.AgeAtApp,
qry_CombinedIndividual.[Percent FPL], tbl_ChildrensMedical.MinPercentFPL,
tbl_ChildrensMedical.MaxPercentFPL, tbl_ChildrensMedical.MinAge,
tbl_ChildrensMedical.MaxAge, tbl_ChildrensMedical.Program
FROM qry_CombinedIndividual
INNER JOIN tbl_ChildrensMedical ON qry_CombinedIndividual.qry_Enrolled.Patient = tbl_ChildrensMedical.Patient
WHERE (((qry_CombinedIndividual.AgeAtApp)>=[tbl_ChildrensMedical].[MinAge]
And (qry_CombinedIndividual.AgeAtApp)<[tbl_ChildrensMedical].[MinAge])
AND ((qry_CombinedIndividual.[Percent FPL])>=[tbl_ChildrensMedical].[MinPercentFPL]
And (qry_CombinedIndividual.[Percent FPL])<[tbl_ChildrensMedical].[MaxPercentFPL]));
Also there are many different programs. Here is the real Children's Table (eventually I would like to add adults if possible)
*Note the actual table uses FPL (which takes family size into account, but is used the same as income). I am again at a total loss as to how you formated the table.
Program Patient MinPercentFPL MaxPercentFPL MinAge MaxAge
SCHIP (No Premium) No 0 210 1 19
SCHIP (Tier 1) No 210 260 1 19
SCHIP (Tier 2) No 260 312 1 19
Newborn No 0 300 0 1
Newborn (Patient) Yes 0 300 0 1
Children's Medical Yes 0 200 1 19
CHIP (20 Premium) Yes 200 250 1 19
CHIP (30 Premium) Yes 250 300 1 19
Do I have the correct implementation for the table I have? Or should I be changing something. I can also send more information/sample data if that would help.
Thank you again!
I just created some tables with your sample data and used the following SQL. Your 3rd 'patient' doesn't match any of the ranges (Age 1, Income $30K)
SELECT tblPatient.PatName, tblPatient.FamInc, tblPatient.Age, tblPatient.Patient,
tblPatientRange.Program, tblPatientRange.MinInc, tblPatientRange.MaxInc, tblPatientRange.MinAge,
tblPatientRange.MaxAge, tblPatientRange.Patient
FROM tblPatient INNER JOIN tblPatientRange ON tblPatient.Patient = tblPatientRange.Patient
WHERE (((tblPatient.FamInc)>=[tblPatientRange]![MinInc] And (tblPatient.FamInc)<=[tblPatientRange]![MaxInc])
AND ((tblPatient.Age)>=[tblPatientRange]![MinAge] And (tblPatient.Age)<=[tblPatientRange]![MaxAge]));

How to use SSIS to export data from multiple tables into one flat file?

I have 2 tables with different number of columns, and I need to export the data using SSIS to a text file. For example, I have customer table, tblCustomers; order table, tblOrders
tblCustomers (id, name, address, state, zip)
id name address state zip’
100 custA address1 NY 12345
99 custB address2 FL 54321
and
tblOrders(id, cust_id, name, quantity, total, date)
id cust_id name quantity total date
1 100 candy 10 100.00 04/01/2014
2 99 veg 1 2.00 04/01/2014
3 99 fruit 2 0.99 04/01/2014
4 100 veg 1 3.99 04/05/2014
The result file would be as following
“custA”, “100”, “recordtypeA”, “address1”, “NY”, “12345”
“custA”, “100”, “recordtypeB”, “candy”, “10”, “100.00”, “04/01/2014”
“custA”, “100”, “recordtypeB”, “veg”, “1”, “3.99”, “04/05/2014”
“custB”, “99”, “recordtypeA”, “address2”, “FL”, “54321”
“custB”, “99”, “recordtypeB”, “veg”, “1”, “2.00”, “04/01/2014”
“custB”, “99”, “recordtypeB”, “fruit”, “2”, “0.99”, “04/01/2014”
Can anyone please guild me as how to do this?
I presume you meant "guide", not "guild" - I hope your typing is more careful when you code?
I would create a Data Flow Task in an SSIS package. In that I would first add an OLE DB Source and point it at tblOrders. Then I would add a Lookup to add the data from tblCustomers, by matching tblOrders.Cust_id to tblCustomers.id.
I would use a SQL Query that joins the tables, and sets up the data, use that as a source and export that.
Note that the first row has 6 columns and the second one has 7. It's generally difficult (well not as easy as a standard file) to import these types of header/detail files. How is this file being used once created? If it needs to be imported somewhere you'd be better of just joining the data up and having 10 columns, or exporting them seperately.

Aggregating multiple groups in SQL

I have an application that needs to query a MySQL database and retrieve a list of users that may be sharing an IP address, and I am having some trouble turning my the concept of what I want to do with my query in my head into a functional query.
The situation is that I have a table which contains known ip information for users. Each time the user logs in, it creates a timestamped entry containing their user id and the ip address they logged into.
Initially, I used the following query to return rows representing IP addresses that were shared:
select ip, GROUP_CONCAT(DISTINCT account ORDER BY timestamp SEPARATOR ' ')
from known_ips
group by ip having count(1) > 1
However, many users have dynamic IP addresses so this list contains many duplicate entries (one for each IP address they share with others, obviously).
What I would like to do is have each row returned be a unique group of users that have shared any ip address with one another at any point.
For example, if Bob and Jane had shared IP address 192.168.0.1 and Bob and Fred had shared IP address 192.168.0.2, I would want the row to return 'Bob Fred Jane' (the program is taking the results of this query and doing some operations with it, and essentially needs a list of accounts on which to take action).
What I can't figure out on my own is how to do this aggregation (or whether it is even possible). I initially tried to have the original query as CTE (using with clause) and then trying to group on that, however I reached a stumbling block in that I couldn't figure out how to logically carry out the operation "compare string-delimited list of users in group 1 to see if any exist in group 2", and I figure doing that sort of string comparison is not what SQL is all about anyway (and I can do it in the program rather than the SQL besides).
Does anyone know of any technique whereby I can represent the logic of what I am trying to achieve here in MySQL? Or, should I accept the solution I already reached and then do the aggregation in the client application?
Edit:
In response to the request to a sample of data and output, here is a contrived example of the data:
Account IP Timestamp
Bob 192.168.0.1 2014-02-12 08:00
Bob 192.168.0.1 2014-02-12 09:30
Bob 192.168.0.2 2014-02-12 10:00
Mary 192.168.0.1 2014-03-12 07:00
Bob 192.168.0.2 2014-03-12 08:00
Jim 192.168.0.4 2014-03-12 08:30
Ted 192.168.0.2 2014-03-12 09:00
Jim 192.168.0.5 2014-04-12 08:30
Bob 192.168.0.3 2014-04-12 09:30
Andy 192.168.0.6 2014-04-12 10:30
Paul 192.168.0.6 2014-04-12 11:30
From this sample data, I would expect exactly two rows returned:
Bob Mary Ted
Andy Paul
I am ambivalent about the ordering of accounts in the list, despite my use of ORDER BY timestamp earlier.