I'm trying to figure out how to efficiently run a set of queries that will provide a new table of all values that would return results for an arbitrary query.
Say my table has a schema like:
id
name
age
city
What is an efficient way to list all values that would return results for an arbitrary query (e.g., SELECT * FROM main WHERE NOT city=X AND age BETWEEN Y AND Z)?
My naive approach would be to use a script and recurse through all possible combinations of {city, age, age} and see which SELECTs return at least one result, but that seems incredibly inefficient. I've also tried building large joins on {city, age, age} and essentially using that table as an argument list to the query, but that quickly becomes impossible for queries on many columns.
For simple conjunctive equality queries (e.g., SELECT * FROM main WHERE name=X AND age=Y), this is much simpler, as I can do something like:
SELECT name, age, count(*) AS count FROM main GROUP BY name, age HAVING count > 0
But I'm having difficulty coming up with a general approach for anything more complicated than that.
Any pointers in the right direction would be most helpful, thanks.
EDIT:
It appears I did a very poor job explaining this, sorry.
Imagine a user gives me a database and a template query and says, "Tell me all the values I can use in this query that will yield results from this database." For example, the user might want to know all age range queries that will return at least one row (e.g., the template query is SELECT * FROM main WHERE age BETWEEN X AND Y).
In that particular example, one could run a SELECT to find the min/max ages in the database, and just tell the user to query between those ages.
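For example, in that simple case the lookup would just be something like:
SELECT MIN(age) AS min_age, MAX(age) AS max_age FROM main;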
Now imagine that the query template is more complicated, such as SELECT * FROM main WHERE NOT city=W AND age BETWEEN X AND Y AND name LIKE Z. How could one determine the range of W/X/Y/Z values that can be used with this query to return results? Does it require creating a join table with every single {city, age, age, name} combination and running the SELECT on each row? How can I do this efficiently so that the operation is time-bound on large databases?
Hopefully that clarifies it.
You could write an AFTER INSERT trigger on your table which inserts the values into another table if they don't already exist (a sketch of such a trigger follows the layout below). That way you'd have a table with the distinct values of your table. This table would look something like:
columnNameFromYourTable | distinctValue
city | NY
city | LA
age  | 1
age  | 2
...
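A minimal sketch of such a trigger in MySQL might look like this (the main table name, the trigger name, and the column sizes are assumptions; INSERT IGNORE relies on the unique key over both columns to skip values that already exist):
CREATE TABLE distinctTable (
    columnNameFromYourTable VARCHAR(64),
    distinctValue VARCHAR(255),
    PRIMARY KEY (columnNameFromYourTable, distinctValue)
);

DELIMITER $$
CREATE TRIGGER main_distinct_ai AFTER INSERT ON main
FOR EACH ROW
BEGIN
    -- the primary key makes INSERT IGNORE skip values that are already stored
    INSERT IGNORE INTO distinctTable (columnNameFromYourTable, distinctValue)
    VALUES ('city', NEW.city), ('age', NEW.age), ('name', NEW.name);
END$$
DELIMITER ;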
Then, when you want to know whether a record exists in your table for the query SELECT * FROM main WHERE NOT city=W AND age BETWEEN X AND Y AND name LIKE Z, you'd query distinctTable with:
select 1 from dual where 1=1
and not exists (select 1 from distinctTable where columnNameFromYourTable = 'city' and distinctValue = 'W')
and exists (select 1 from distinctTable where columnNameFromYourTable = 'age' and distinctValue BETWEEN X AND Y)
and exists (select 1 from distinctTable where columnNameFromYourTable = 'name' and distinctValue LIKE '%Z%')
This should be pretty fast. If it returns 1, there is a matching entry in your table; if it returns no row, there isn't.
Before I start my question, I'll briefly cover what the problem is:
I have a table that stores around 4 million 'parameter' values. These values have an id, simulation id and parameter id.
The parameter id maps to a parameter table that basically just maps the id to a text-like representation of the parameter (x, y, etc.).
The simulation table has around 170k entries that map parameter values to a job.
There is also a score table which stores the score of each simulation. Simulations have varying numbers of scores; for example, one might have a single score while another has three. The score table has a simulation_id column for selecting these.
Each job has an id and an objective.
Currently I'm trying to select all the parameter_values whose parameter is 'x' and whose job id is 17, and fetch the score for each. The variables of the select will change, but in principle these are the only things I'm interested in.
Currently I'm using this statement:
SELECT simulation.id, value, name,
       (SELECT GROUP_CONCAT(score) FROM score WHERE score.simulation_id = simulation.id) AS score
FROM simulation, parameter_value, parameter
WHERE simulation.id = parameter_value.simulation_id
  AND simulation.job_id = 17
  AND parameter_value.parameter_id = parameter.id
  AND parameter.name = "$x1"
This works nicely, except it's taking around 3 seconds to execute. Can this be done any faster?
I don't know if it would be faster to run a query beforehand, pre-calculating the parameter_ids I'm searching for, and then use a WHERE parameter_id IN (1,2,3,4) etc.
But I was under the impression SQL would optimize this anyway?
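For reference, this is roughly what I mean by pre-calculating (the ids in the IN list are made up):
-- step 1: look up the parameter ids once
SELECT id FROM parameter WHERE name = '$x1';

-- step 2: plug the returned ids straight into the main query
SELECT simulation.id, value,
       (SELECT GROUP_CONCAT(score) FROM score WHERE score.simulation_id = simulation.id) AS score
FROM simulation, parameter_value
WHERE simulation.id = parameter_value.simulation_id
  AND simulation.job_id = 17
  AND parameter_value.parameter_id IN (1, 2, 3, 4)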
I have created indexes wherever possible but can't get faster than the 2.7-second mark.
So my questions would be:
Should I pre-calculate some values and avoid the joins?
Is there another way, other than GROUP_CONCAT, to get the scores?
And are there any other optimizations I could make to this?
I should also add that the scores must be in the same row, or at least returned sorted, so I can easily read them from the result set.
Thanks,
Lewis
I am working on a query that needs to output 'total engagements' by users in columns: a 1eng column will display users who have one engagement, a second column 2eng will display users who have done two engagements, likewise 3eng, and so on. Note that the display should be like this. I have an engagements table which has a userID. So I get distinct users like this:
select count(distinct userID) from engagements
and I get engagements as
select count(*) from engagements
Engagements here refer to users who have either liked, replied to, or shared the content.
Please help. Thanks! I have used CASE and IF but am unable to display the results in the form below:
1eng 2eng 3eng
100 200 100
Consider returning the results in rows and pivoting them afterwards in your application.
To return the desired results in rows, you could use the following query:
SELECT
engagementCount,
COUNT(*) AS userCount
FROM (
SELECT
userID,
COUNT(*) AS engagementCount
FROM engagements
GROUP BY userID
) AS s
GROUP BY engagementCount
;
Basically, you first group the engagements rows by userID and get the row counts per userID. Afterwards, you use the counts as the grouping criterion and count how many users were found with that count.
If you insist on returning the columnar view in SQL, you'll need to resort to dynamic SQL because of the indefinite number of columns in the final result set. You'd probably need to store the results of the inner SELECT temporarily, scan it to build the list of count expressions for every engagementCount value and ultimately construct a query of this kind:
SELECT
COUNT(engagementCount = 1 OR NULL) AS `1eng`,
COUNT(engagementCount = 2 OR NULL) AS `2eng`,
COUNT(engagementCount = 3 OR NULL) AS `3eng`,
...
FROM temporary_storage
;
Or use SUM(engagementCount = value) instead of COUNT(engagementCount = value OR NULL). (To me, the latter expresses the intention more explicitly, which is why I've suggested it first, but in case you happen to prefer the SUM technique, there should be no discernible difference in performance between the two. The OR NULL trick is explained here.)
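A rough sketch of how that dynamic query could be put together in MySQL with a prepared statement (the temporary table and user-variable names are just placeholders):
-- stage the per-user counts once
CREATE TEMPORARY TABLE user_counts AS
    SELECT userID, COUNT(*) AS engagementCount
    FROM engagements
    GROUP BY userID;

-- build one "COUNT(... OR NULL) AS `Neng`" expression per distinct count
SELECT GROUP_CONCAT(DISTINCT
           CONCAT('COUNT(engagementCount = ', engagementCount,
                  ' OR NULL) AS `', engagementCount, 'eng`')
           ORDER BY engagementCount)
INTO @cols
FROM user_counts;

-- assemble and run the final pivot query
SET @sql = CONCAT('SELECT ', @cols, ' FROM user_counts');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;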
Suppose I have a table with a field named 'rating'. It may take different values, but I want to get a count of specific values.
Example:
CREATE TABLE mytable (
    rating INT(1)
);
First and the obvious way I could think of was the following:
select rating,count(rating) from mytable group by rating order by rating
The problem, though, is that it is not clear how many rows it would return, and it may not be easy to process them that way.
What I would really like to do is to select two fields in one row showing the number of records that have some specific values.
Example...
//something like this (some pseudocode):
select count(rating=-1) as rating1, count (rating=1) as rating2 from mytable
Could you advise on some neat way I could select in the above format?
select SUM(IF(rating=-1,1,0)) AS rating1,
SUM(IF(rating=1,1,0)) AS rating2 from mytable
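Equivalently, since a comparison in MySQL evaluates to 1 or 0 (and SUM ignores NULLs), this can be shortened to:
SELECT SUM(rating = -1) AS rating1,
       SUM(rating = 1)  AS rating2
FROM mytable;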
I have multiple select statements from different tables on the same database. I was using multiple, separate queries, then loading the results into my array and sorting (again, after ordering in the query).
I would like to combine into one statement to speed up results and make it easier to "load more" (see bottom).
Each query uses SELECT, LEFT JOIN, WHERE and ORDER BY commands which are not the same for each table.
I may not need order by in each statement, but I want the end result, ultimately, to be ordered by a field representing a time (not necessarily the same field name across all tables).
I would want to limit total query results to a number, in my case 100.
I then loop through the results, and for each row I test whether OBJECTNAME_ID (i.e., comment_id, event_id, upload_id) is set, then call LOAD_WHATEVER_OBJECT, which takes the row and pushes the data into an array.
I won't have to sort the array afterwards because it was loaded in order via MySQL.
Later in the app, I will "load more" by skipping the first 100, 200 or whatever page*100 is and limit by 100 again with the same query.
The end result from the database would preferably look like this:
RESULT - selected fields from a table - field to sort on is greatest
RESULT - selected fields from a possibly different table - field to sort on is next greatest
RESULT - selected fields from a possibly different table - field to sort on is third greatest
etc, etc
I see a lot of simpler combined statements, but nothing quite like this.
Any help would be GREATLY appreciated.
The easiest way might be a UNION here (http://dev.mysql.com/doc/refman/5.0/en/union.html):
(SELECT a,b,c FROM t1)
UNION
(SELECT d AS a, e AS b, f AS c FROM t2)
ORDER BY a DESC
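To get the "load more" behaviour described in the question, a LIMIT/OFFSET can be added after the ORDER BY, which applies to the whole UNION (continuing the placeholder columns above and assuming a is the time-like field being sorted on):
(SELECT a, b, c FROM t1)
UNION
(SELECT d AS a, e AS b, f AS c FROM t2)
ORDER BY a DESC            -- newest first across both tables
LIMIT 100 OFFSET 200;      -- e.g. page 3: skip the first 200 combined rows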
I need to select records from 2 tables, one called cities and one called neighborhoods. They both share a table column in common called parent_state. In this cell the id of the parent state is stored.
I need to select all cities and neighborhoods that belong to a certain state. For example, if the state id is 10, I need to get all the cities and neighborhoods that have this value in their parent_state cell.
The state id is stored in a PHP variable like so:
$parent_state = '10';
What would this query look like (preferably the merged results from both tables should be sorted by the column name in alphabetical order)?
EDIT
Yes, I probably do need a union. I'm very new to MySQL, and all I can do at the moment is query tables individually.
I can always query both the cities and neighborhoods tables individually but the reason why I want to merge the results is for the sole purpose of listing said results alphabetically.
So can someone please show how the UNION query for this would look?
Use:
SELECT c.name
FROM CITIES c
WHERE c.parent_state = 10
UNION ALL
SELECT n.name
FROM NEIGHBORHOODS n
WHERE n.parent_state = 10
UNION ALL returns the combination of both queries as a single result set. UNION will remove duplicates, and is slower for it - this is why UNION ALL is a better choice, even if it's unlikely for a city and a neighbourhood to have the same name. Honestly, it doesn't sound like a good idea to mix the two, because a neighbourhood is part of a city...
Something else to be aware of with UNION is that there needs to be the same number of columns in the SELECT clause for all the queries being UNION'd (this goes for UNION and UNION ALL). I.e., you'll get an error if the first query has three columns in the SELECT clause and the second query only has two.
Also, the data types have to match -- that means not returning a DATE/TIME data type in the same position where another query returns an INTEGER.
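Since the question asks for the merged list in alphabetical order, an ORDER BY can be applied to the combined result, for example:
(SELECT c.name FROM CITIES c WHERE c.parent_state = 10)
UNION ALL
(SELECT n.name FROM NEIGHBORHOODS n WHERE n.parent_state = 10)
ORDER BY name;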
What you want is probably not a join, but rather a union. Note that a union can only select the exact same columns from both of the combined expressions.
select * from city as c
inner join neighborhoods as n
on n.parent_state = c.parent_state
where c.parent_state=10
You can use a LEFT or RIGHT JOIN in case the city and neighborhoods tables don't have matching relational data.