Need to delete random tuples from database in SQL - mysql

We're hiring some third-party test engineers and programmers to help us with some bugs on our website. They would be working on a beta installation of our web application. The thing is that we need to give them a copy of our database, but we don't want to give them the entire database; it's a huge database of companies. So we would like to give them a watered-down version that has only a small fraction of the actual data, just enough to test properly.
We have data in the following Schema:
COMPANIES
ID|NAME|CATEGORY|COUNTRY_ID.....
We also have a set number of categories and countries.
The thing is that we don't want the deletion to be too random. Out of the hundreds of thousands of entries, we need to give them a version that has a few hundred entries, but such that there are at least 2-3 companies for each country and category.
I'm a bit perplexed as to how to write a SELECT query with the above restriction, much less a DELETE.
It's a MySQL database we would be using here. Can this even be done in SQL, or do we need to write a script in PHP or similar?

The following SELECT statement returns the companies with the three lowest ids for each (category, country_id) combination:
select id, name, category, country_id
from companies c1
-- MySQL rejects LIMIT inside an IN (...) subquery, so instead keep a row
-- when fewer than 3 companies in the same group have a smaller id
where (
select count(*)
from companies c2
where c2.category = c1.category and c2.country_id = c1.country_id
and c2.id < c1.id
) < 3;
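Since the question is ultimately about deleting, here is a minimal sketch of the delete step built on the same "fewer than n smaller ids in the group" test. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption; run the real thing against a copy of the database), and the sample data and the helper names build_sample and keep_first_n_per_group are hypothetical. The ids to keep go into a temporary table first, which also sidesteps MySQL's refusal to DELETE from a table that a subquery in the same statement reads from (in MySQL the syntax would be CREATE TEMPORARY TABLE).

```python
import sqlite3


def build_sample(conn):
    """Create a hypothetical companies table: 5 rows per (category, country_id)."""
    conn.execute(
        "CREATE TABLE companies ("
        "id INTEGER PRIMARY KEY, name TEXT, category INTEGER, country_id INTEGER)"
    )
    rows = []
    next_id = 1
    for category in (1, 2):
        for country_id in (10, 20):
            for _ in range(5):
                rows.append((next_id, f"Company {next_id}", category, country_id))
                next_id += 1
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", rows)


def keep_first_n_per_group(conn, n):
    """Delete every company except the n lowest ids per (category, country_id)."""
    conn.execute("CREATE TEMP TABLE keep_ids (id INTEGER PRIMARY KEY)")
    # A row is kept when fewer than n rows in its group have a smaller id.
    conn.execute(
        """
        INSERT INTO keep_ids
        SELECT c1.id
        FROM companies c1
        WHERE (SELECT COUNT(*)
               FROM companies c2
               WHERE c2.category = c1.category
                 AND c2.country_id = c1.country_id
                 AND c2.id < c1.id) < ?
        """,
        (n,),
    )
    conn.execute("DELETE FROM companies WHERE id NOT IN (SELECT id FROM keep_ids)")
    conn.execute("DROP TABLE keep_ids")
    conn.commit()


conn = sqlite3.connect(":memory:")
build_sample(conn)
keep_first_n_per_group(conn, 3)
remaining = [row[0] for row in conn.execute("SELECT id FROM companies ORDER BY id")]
print(remaining)  # [1, 2, 3, 6, 7, 8, 11, 12, 13, 16, 17, 18]
```

Each of the four groups keeps its three lowest ids, so choosing n trades off sample size against coverage per category and country.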

Not sure my answer will fit your needs, since I am making some assumptions that may be wrong, but you could try the following approach:
select category, country_id, min(id) id1, max(id) id2
from companies
group by country_id, category
order by country_id, category
This query gives you only 2 company ids per category and country instead of 3: the lowest and the highest id in each group (if a group has only one company, both ids are the same).
Please also note that I wrote this from memory and have no MySQL engine to test it.
Hope that helps or at least gives you a hint on how to do it.
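Since the answer above was untested, here is a small check of it, using Python's sqlite3 module with an in-memory SQLite database in place of MySQL (an assumption) and hypothetical sample rows. It runs the GROUP BY query as written and then joins the min/max pairs back to companies to fetch the full rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE companies ("
    "id INTEGER PRIMARY KEY, name TEXT, category INTEGER, country_id INTEGER)"
)
# Hypothetical rows: (id, name, category, country_id).
conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?, ?)",
    [
        (1, "A", 1, 10),
        (2, "B", 1, 10),
        (3, "C", 1, 10),
        (4, "D", 2, 10),
        (5, "E", 1, 20),
        (6, "F", 1, 20),
    ],
)

# The answer's query: lowest and highest id per (country_id, category).
pairs = conn.execute(
    """
    SELECT category, country_id, MIN(id) AS id1, MAX(id) AS id2
    FROM companies
    GROUP BY country_id, category
    ORDER BY country_id, category
    """
).fetchall()
print(pairs)  # [(1, 10, 1, 3), (2, 10, 4, 4), (1, 20, 5, 6)]

# Join the pairs back to get full rows; a one-company group yields one row.
sample = conn.execute(
    """
    SELECT c.*
    FROM companies c
    INNER JOIN (SELECT MIN(id) AS id1, MAX(id) AS id2
                FROM companies
                GROUP BY country_id, category) g
            ON c.id IN (g.id1, g.id2)
    ORDER BY c.id
    """
).fetchall()
```

Here sample contains companies 1, 3, 4, 5 and 6: company 2 is dropped because it is neither the lowest nor the highest id in its group.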

Related

MySQL: Taxonomy - Get ID that belongs to multiple categories

I was hoping someone could help me come up with a query for what I'm looking to do.
I have a website that lists game servers and I'm trying to improve my search system a bit.
There are three tables of interest: servers, version_taxonomy and category_taxonomy. The taxonomy tables contain two columns, one for a server ID and one for a version/category ID, which associate a server with its supported versions and categories.
Up till now, I've been joining both taxonomy tables to the server table and looking up servers for one version and one category, and it's been working fine. However, I'd like to allow searching for a server that has multiple categories at the same time.
I've made an image to try and illustrate what I'm looking to do:
Say I'm looking for a server that has both categories 5 and 12; based on the table on the left, that would be servers 1 and 3. But how would that look in a query? And how would I use that query to later get and work with the rest of the server data (JOIN like I'd normally do?)
Hopefully that makes sense! Looking forward to your responses.
Assuming I understand the question:
Join the two tables, then count the distinct category IDs while filtering on them. DISTINCT is not needed if you can guarantee that (ServerID, CategoryID) is unique in table A and that it has a 1:1 relationship with the server taxonomy, which would be true if you always filter on one and only one version.
-- A holds (ServerID, CategoryID) pairs, B holds (ServerID, Version) pairs
SELECT A.ServerID, count(distinct A.CategoryID) CatCnt
FROM A
INNER JOIN B
on A.ServerID = B.ServerID
WHERE A.CATEGORYID in (5,12)
and B.Version= 1.16
GROUP BY A.ServerID
HAVING count(distinct A.CategoryID) = 2;
The category IDs could be passed in as parameters, as could the distinct count, since you know both values.
This could be used as a CTE or as an inline derived table, then joined to get the additional data; or you could LEFT JOIN the desired data, assuming it's a 1:1 relationship.
If you want a working example, post the DDL for the tables and SQL to create sample data, and I'll put something on https://rextester.com/.
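A minimal sketch of the count-distinct approach used as a derived table, with the category IDs and their count passed in as parameters. It assumes column names servers(id, name) and category_taxonomy(server_id, category_id), which the question does not spell out, runs on Python's sqlite3 module in place of MySQL, and leaves out the version filter for brevity; the helper servers_with_all_categories is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE servers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category_taxonomy (server_id INTEGER, category_id INTEGER);
    INSERT INTO servers VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma'), (4, 'delta');
    INSERT INTO category_taxonomy VALUES
        (1, 5), (1, 12), (1, 7), (2, 5), (3, 12), (3, 5), (4, 12);
    """
)


def servers_with_all_categories(conn, category_ids):
    """Return (id, name) of servers tagged with every id in category_ids."""
    ids = sorted(set(category_ids))  # de-duplicate so the count matches
    if not ids:
        raise ValueError("need at least one category id")
    placeholders = ", ".join("?" for _ in ids)
    # The derived table finds the matching server ids; the outer query
    # joins back to servers to pull in the rest of the server data.
    sql = f"""
        SELECT s.id, s.name
        FROM servers s
        INNER JOIN (SELECT server_id
                    FROM category_taxonomy
                    WHERE category_id IN ({placeholders})
                    GROUP BY server_id
                    HAVING COUNT(DISTINCT category_id) = ?) m
                ON m.server_id = s.id
        ORDER BY s.id
    """
    return conn.execute(sql, [*ids, len(ids)]).fetchall()


print(servers_with_all_categories(conn, [5, 12]))  # [(1, 'alpha'), (3, 'gamma')]
```

Adding a third category to the list automatically raises the required count to 3, so the same function handles any number of categories.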

SQL Query get all construction plans order by missing parts

OK, let's say we have the following DB schema:
ConstructionPlans: c_id
Part: p_id
PlanPart: c_id, p_id
The table ConstructionPlans has all the plans, Part has all the parts, and PlanPart has an entry for every plan and part combination. I am trying to write a query that returns all the construction plans (literally all of them), ordered by the number of parts that are missing. I have all the parts I own in an array for this query. I would also like to be able to list the missing parts for a plan later on (but I guess one would simply take all the PlanPart rows for that c_id and subtract the parts one already has, using another query with a WHERE clause).
Thank you for your input :)
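One possible way to get every plan ordered by its number of missing parts is a LEFT JOIN from ConstructionPlans to the PlanPart rows whose p_id is not in the owned array, counted per plan. The sketch below runs on Python's sqlite3 module instead of a real server (an assumption), and the sample data and helper names (plans_by_missing_parts, missing_parts) are hypothetical; the owned parts are bound as query parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ConstructionPlans (c_id INTEGER PRIMARY KEY);
    CREATE TABLE Part (p_id INTEGER PRIMARY KEY);
    CREATE TABLE PlanPart (c_id INTEGER, p_id INTEGER, PRIMARY KEY (c_id, p_id));
    INSERT INTO ConstructionPlans VALUES (1), (2), (3), (4);
    INSERT INTO Part VALUES (1), (2), (3), (4);
    -- Plan 4 deliberately has no parts at all.
    INSERT INTO PlanPart VALUES (1, 1), (1, 2), (2, 1), (2, 3), (2, 4), (3, 2);
    """
)


def _not_owned(column, owned_parts):
    """SQL fragment and params that keep only parts NOT in owned_parts."""
    if not owned_parts:
        return "", []  # nothing owned: every part counts as missing
    placeholders = ", ".join("?" for _ in owned_parts)
    return f"AND {column} NOT IN ({placeholders})", list(owned_parts)


def plans_by_missing_parts(conn, owned_parts):
    cond, params = _not_owned("pp.p_id", owned_parts)
    # The LEFT JOIN keeps every plan; COUNT(pp.p_id) ignores the NULL row
    # produced for a plan with nothing missing, so that plan gets 0.
    sql = f"""
        SELECT cp.c_id, COUNT(pp.p_id) AS missing
        FROM ConstructionPlans cp
        LEFT JOIN PlanPart pp
               ON pp.c_id = cp.c_id {cond}
        GROUP BY cp.c_id
        ORDER BY missing, cp.c_id
    """
    return conn.execute(sql, params).fetchall()


def missing_parts(conn, c_id, owned_parts):
    """List the p_ids a plan needs that are not in owned_parts."""
    cond, params = _not_owned("p_id", owned_parts)
    sql = f"SELECT p_id FROM PlanPart WHERE c_id = ? {cond} ORDER BY p_id"
    return [row[0] for row in conn.execute(sql, [c_id, *params])]


print(plans_by_missing_parts(conn, [1, 2]))  # [(1, 0), (3, 0), (4, 0), (2, 2)]
print(missing_parts(conn, 2, [1, 2]))  # [3, 4]
```

Putting the NOT IN filter in the ON clause rather than in WHERE is what keeps plans with zero missing parts in the result.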
Assuming that you have a column quantity in your Part table and a column name in your ConstructionPlans table:
select p.p_id from Part p inner join PlanPart pp on p.p_id = pp.p_id inner join ConstructionPlans cp on cp.c_id = pp.c_id where p.quantity = 0 and cp.name = '<your plan name goes here>';

Why would a SQL query need to be as complicated as this feature allows?

I am studying for an SQL exam, and I came across this fact about subqueries:
2. Main query and subquery can get data from different tables
When would this feature be useful? I find it difficult to imagine such a case.
Millions of situations call for finding information in different tables, it's the basis of relational data. Here's an example:
Find the emergency contact information for all students who are in a chemistry class:
SELECT Emergency_Name, Emergency_Phone
FROM tbl_StudentInfo
WHERE StudentID IN (SELECT b.StudentID
FROM tbl_ClassEnroll b
WHERE Subject = 'Chemistry')
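To see the chemistry example run end to end, here is a sketch on Python's sqlite3 module with an in-memory database; the table layouts and all names and phone numbers are hypothetical. The outer query reads one table while the subquery reads another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_StudentInfo (
        StudentID INTEGER PRIMARY KEY,
        Emergency_Name TEXT,
        Emergency_Phone TEXT
    );
    CREATE TABLE tbl_ClassEnroll (StudentID INTEGER, Subject TEXT);
    INSERT INTO tbl_StudentInfo VALUES
        (1, 'Ann Lee', '555-0101'),
        (2, 'Ben Ortiz', '555-0102'),
        (3, 'Cho Park', '555-0103');
    INSERT INTO tbl_ClassEnroll VALUES
        (1, 'Chemistry'), (2, 'History'), (3, 'Chemistry'), (3, 'History');
    """
)

# Outer query: tbl_StudentInfo. Subquery: tbl_ClassEnroll.
contacts = conn.execute(
    """
    SELECT Emergency_Name, Emergency_Phone
    FROM tbl_StudentInfo
    WHERE StudentID IN (SELECT b.StudentID
                        FROM tbl_ClassEnroll b
                        WHERE Subject = 'Chemistry')
    ORDER BY StudentID
    """
).fetchall()
print(contacts)  # [('Ann Lee', '555-0101'), ('Cho Park', '555-0103')]
```

Because IN only tests membership, each student appears at most once even if they had several matching enrollment rows, which a plain JOIN would not guarantee.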
SELECT * FROM tableA
WHERE id IN (SELECT id FROM tableB)
There are plenty of reasons why you would get data from different tables, such as selecting rows in the main query based on a subquery (or subqueries) against other tables. The range of uses is huge.
Choose customers in the main query based on regions and their values:
SELECT * FROM customers
WHERE country IN(SELECT name FROM country WHERE name LIKE '%land%')
Choose products in the main query whose value is greater or lower than the average income of customers, and so on.
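Both ideas can be sketched with Python's sqlite3 module (standing in for MySQL); the tables, columns (country.name, customers.income, products.price) and rows below are hypothetical. The first query filters with an IN subquery on another table; the second compares against a scalar subquery that returns a single average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (name TEXT PRIMARY KEY);
    CREATE TABLE customers (name TEXT, country TEXT, income REAL);
    CREATE TABLE products (name TEXT, price REAL);
    INSERT INTO country VALUES ('Finland'), ('Ireland'), ('Spain');
    INSERT INTO customers VALUES
        ('Aino', 'Finland', 3000), ('Sean', 'Ireland', 2000), ('Lucia', 'Spain', 4000);
    INSERT INTO products VALUES ('Laptop', 3500), ('Phone', 800), ('Sofa', 2500);
    """
)

# Customers whose country name contains 'land' (the country table is the subquery).
in_land = conn.execute(
    """
    SELECT name FROM customers
    WHERE country IN (SELECT name FROM country WHERE name LIKE '%land%')
    ORDER BY name
    """
).fetchall()

# Products priced above / below the average customer income (a scalar subquery).
above = conn.execute(
    "SELECT name FROM products WHERE price > (SELECT AVG(income) FROM customers) ORDER BY name"
).fetchall()
below = conn.execute(
    "SELECT name FROM products WHERE price < (SELECT AVG(income) FROM customers) ORDER BY name"
).fetchall()

print(in_land)  # [('Aino',), ('Sean',)]
print(above)  # [('Laptop',)]
print(below)  # [('Phone',), ('Sofa',)]
```

The average here is taken over all three customers (3000), so only the 3500 laptop lands above it.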
You could do something like this:
SELECT SUM(trans) as 'Transactions', branch.city as 'city'
FROM account
INNER JOIN branch
ON branch.bID = account.bID
GROUP BY branch.city
HAVING SUM(account.trans) < 0;
This would help a company identify which branches are making a loss (the HAVING clause keeps only cities whose transaction total is negative; drop it to see every city's total, including the most profitable ones). That could show whether the company needs to change its marketing approach in certain regions, in theory making it more dynamic and reactive to changes in the economy at any given time.
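A quick way to see what the HAVING clause does is to run the branch query with and without it. This sketch uses Python's sqlite3 module in place of MySQL, with hypothetical branch and account rows; the aliases are written unquoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE branch (bID INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE account (bID INTEGER, trans INTEGER);
    INSERT INTO branch VALUES (1, 'London'), (2, 'Leeds'), (3, 'York');
    INSERT INTO account VALUES (1, 500), (1, -200), (2, -300), (2, 100), (3, 50);
    """
)

base = """
    SELECT SUM(account.trans) AS Transactions, branch.city AS city
    FROM account
    INNER JOIN branch ON branch.bID = account.bID
    GROUP BY branch.city
"""

# Only loss-making cities survive the HAVING filter.
losses = conn.execute(base + " HAVING SUM(account.trans) < 0").fetchall()
# Without HAVING, every city's total, most profitable first.
totals = conn.execute(base + " ORDER BY Transactions DESC").fetchall()

print(losses)  # [(-200, 'Leeds')]
print(totals)  # [(300, 'London'), (50, 'York'), (-200, 'Leeds')]
```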

How to query 3 mysql tables and return matching results (with one to many relationships)?

I am trying to query a database to return some matching records and can't work out how to do it in the most efficient way. I have a TUsers table, a TJobsOffered table and a TJobsRequested table. The UserID is the primary key for the TUsers table and is used within the Job tables in a one to many relationship.
Ultimately I want to run a query that returns a list of all matching users for a particular UserID (a matching user is one that has at least one matching record in both tables; e.g. if UserA has jobid 999 listed in TJobsOffered and UserB has jobid 999 listed in TJobsRequested, then this is a match).
To get my head around it, I've simplified it down a lot and am trying to match the records based on the jobids for the user in question, e.g.:
SELECT DISTINCT TJobsOffered.FUserID FROM TJobsOffered, TJobsRequested
WHERE TJobsOffered.FUserID=TJobsRequested.FUserID AND
(TJobsRequested.FJobID='12' OR TJobsRequested.FJobID='30') AND
(TJobsOffered.FJobID='86' OR TJobsOffered.FJobID='5')
This seems to work fine and returns the correct results however when I introduce the TUsers table (so I can access user information) it starts returning incorrect results. I can't work out why the following query doesn't return the same results as the one listed above as surely it's still matching the same information just with a different connector (or is the one above effectively many to many and the one below 2 sets of one to many comparisons)?
SELECT DISTINCT TUsers.Fid, TUsers.FName FROM TUsers, TJobsOffered, TJobsRequested
WHERE TUsers.Fid=TJobsRequested.FUserID AND TUsers.Fid=TJobsOffered.FUserID AND
(TJobsRequested.FJobID='12' OR TJobsRequested.FJobID='30') AND
(TJobsOffered.FJobID='86' OR TJobsOffered.FJobID='5')
If anyone could explain where I'm going wrong with the second query and how I should incorporate TUsers, that would be greatly appreciated, as I can't get my head around the join. If you can give me any pointers on how to do this all in one query by just passing in the user id, that would be massively appreciated as well! :)
Thanks so much,
Dave
Try this:
SELECT DISTINCT TJobsOffered.FUserID, TUsers.FName
FROM TJobsOffered
INNER JOIN TJobsRequested ON TJobsOffered.FUserID = TJobsRequested.FUserID
LEFT JOIN TUsers ON TUsers.Fid = TJobsOffered.FUserID
WHERE TJobsRequested.FJobID IN (12, 30)
AND TJobsOffered.FJobID IN (86, 5);
You need to add "AND TJobsOffered.FUserID=TJobsRequested.FUserID" to your where clause.
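For the "just pass in the user id" part of the question, one possible shape is two EXISTS subqueries, one per direction. This sketch assumes a match means both directions (the other user requests a job the given user offers, and offers a job the given user requests); drop one EXISTS for a one-way match. It runs on Python's sqlite3 module instead of MySQL, and the helper matching_users and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TUsers (Fid INTEGER PRIMARY KEY, FName TEXT);
    CREATE TABLE TJobsOffered (FUserID INTEGER, FJobID INTEGER);
    CREATE TABLE TJobsRequested (FUserID INTEGER, FJobID INTEGER);
    INSERT INTO TUsers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dan');
    INSERT INTO TJobsOffered VALUES (1, 999), (2, 5), (3, 7), (4, 999);
    INSERT INTO TJobsRequested VALUES (2, 999), (1, 5), (3, 999), (1, 8);
    """
)


def matching_users(conn, user_id):
    """Users who request a job user_id offers AND offer a job user_id requests."""
    sql = """
        SELECT u.Fid, u.FName
        FROM TUsers u
        WHERE u.Fid <> ?
          -- u requests at least one job that user_id offers ...
          AND EXISTS (SELECT 1
                      FROM TJobsOffered o
                      JOIN TJobsRequested r ON r.FJobID = o.FJobID
                      WHERE o.FUserID = ? AND r.FUserID = u.Fid)
          -- ... and offers at least one job that user_id requests.
          AND EXISTS (SELECT 1
                      FROM TJobsOffered o
                      JOIN TJobsRequested r ON r.FJobID = o.FJobID
                      WHERE o.FUserID = u.Fid AND r.FUserID = ?)
        ORDER BY u.Fid
    """
    return conn.execute(sql, (user_id, user_id, user_id)).fetchall()


print(matching_users(conn, 1))  # [(2, 'Bob')]
```

Carol requests Alice's job 999 but offers nothing Alice wants, so she is excluded; because EXISTS only tests for a row, no DISTINCT is needed.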

What is the best way to count rows in a complex MySQL table

I have a table with the following fields (for example);
id, reference, customerId.
Now, I often want to log an enquiry for a customer, BUT in some cases I need to filter the enquiries based on the customer's country, which is in the customer table:
id, Name, Country (for example)
At the moment, my application shows 15 enquiries per page and I am SELECTing all enquiries, and for each one, checking the country field in customerTable based on the customerId to filter the country. I would also count the number of enquiries this way to find out the total number of enquiries and be able to display the page (Page 1 of 4).
As the database is growing, I am starting to notice a bit of lag, and I think my methodology is a bit flawed!
My first guess at how this should be done, is I can add the country to the enquiryTable. Problem solved, but does anyone else have a suggestion as to how this might be done? Because I don't like the idea of having to update each enquiry every time the country of a contact is changed.
Thanks in advance!
It looks to me like this data should be spread over 3 tables
customers
enquiries
countries
Then, by using joins, you can bring out the customer and country data and filter by either. Something like:
SELECT
enquiries.enquiryid,
enquiries.enquiredetails,
customers.customerid,
enquiries.reference,
customers.countryid,
countries.name AS countryname
FROM
enquiries
INNER JOIN customers ON enquiries.customerid = customers.customerid
INNER JOIN countries ON customers.countryid = countries.countryid
WHERE countries.name='United Kingdom'
You should definitely be touching the database only once to do this.
Depending on how you are accessing your data, you may be able to get a row count without issuing a second COUNT(*) query. You haven't mentioned what programming language or data access strategy you are using, so it is difficult to be more helpful with the count. If you have no easy way of determining the row count from within the data access layer of your code, you could use a stored procedure with an output parameter to give you the row count without making two round trips to the database. It all depends on your architecture, data access strategy and how close you are to your database.
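Another option, if the server supports window functions (MySQL 8.0+, SQLite 3.25+), is to return one page of rows plus the total match count in a single query: COUNT(*) OVER () is evaluated before LIMIT, so every row on the page carries the full total. The sketch below uses Python's sqlite3 module with the three-table layout from the answer; the helper enquiries_page and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (countryid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers (customerid INTEGER PRIMARY KEY, countryid INTEGER);
    CREATE TABLE enquiries (enquiryid INTEGER PRIMARY KEY, reference TEXT, customerid INTEGER);
    INSERT INTO countries VALUES (1, 'United Kingdom'), (2, 'France');
    INSERT INTO customers VALUES (1, 1), (2, 2), (3, 1);
    """
)
# 40 hypothetical enquiries spread over the three customers.
conn.executemany(
    "INSERT INTO enquiries VALUES (?, ?, ?)",
    [(i, f"REF{i}", (i % 3) + 1) for i in range(1, 41)],
)


def enquiries_page(conn, country_name, page, per_page=15):
    """Return (rows for this page, total matching rows) in one round trip."""
    rows = conn.execute(
        """
        SELECT e.enquiryid, e.reference, c.customerid,
               COUNT(*) OVER () AS total_rows
        FROM enquiries e
        INNER JOIN customers c ON e.customerid = c.customerid
        INNER JOIN countries co ON c.countryid = co.countryid
        WHERE co.name = ?
        ORDER BY e.enquiryid
        LIMIT ? OFFSET ?
        """,
        (country_name, per_page, (page - 1) * per_page),
    ).fetchall()
    # An empty page (past the end) carries no total, so report 0 there.
    total = rows[0][3] if rows else 0
    return [row[:3] for row in rows], total


page1, total = enquiries_page(conn, "United Kingdom", 1)
print(len(page1), total)  # 15 26
```

With 26 UK enquiries and 15 per page, this gives "Page 1 of 2"; page 2 would return the remaining 11 rows with the same total.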