I have a requirement to remove "duplicate" entries from a dataset, which is being displayed on the front-end of our application.
A duplicate is defined by the client as a speed test result taken in the same exchange as another.
Here is my current query,
SELECT id, isp, exchange_name, exchange_postcode_area, download_kbps, upload_kbps
FROM speedtest_results
WHERE postcode IS NOT NULL
AND exchange_name IS NOT NULL
ORDER BY download_kbps DESC, upload_kbps ASC
This query would return some data like this,
id    | isp                       | exchange_name | exchange_postcode_area | download_kbps | upload_kbps
12062 | The University of Bristol | Bristol North | BS6  | 821235 | 212132
12982 | HighSpeed Office Limited  | Totton        | SO40 | 672835 | 298702
18418 | University of Birmingham  | Victoria      | B9   | 553187 | 336889
14050 | Sohonet Limited           | Lee Green     | SE13 | 537686 | 104439
19981 | The JNT Association       | Holborn       | WC1V | 335833 | 74459
19983 | The JNT Association       | Holborn       | WC1V | 333661 | 84397
5652  | University of Southampton | Woolston      | SO19 | 330320 | 64200
As you can see, there are two tests in the WC1V postcode area, which I'd like to aggregate into a single result, ideally using max rather than avg.
How can I modify my query to ensure that I am selecting the fastest speed test result for the exchange whilst still being able to return a list of all the max speeds?
It seems that I was far too hasty to create a question! I have since solved my own issue. One caveat: the non-aggregated columns (id, upload_kbps) must come from the same row as the maximum, so the per-exchange maximum is joined back onto the table rather than relying on MySQL's loose GROUP BY behaviour, which would return arbitrary values for those columns:
SELECT t.id, t.isp, t.exchange_name, t.exchange_postcode_area, t.download_kbps, t.upload_kbps
FROM speedtest_results t
JOIN (
    SELECT exchange_name, MAX(download_kbps) AS max_download
    FROM speedtest_results
    WHERE exchange_name IS NOT NULL
      AND postcode IS NOT NULL
    GROUP BY exchange_name
) m
  ON m.exchange_name = t.exchange_name
 AND m.max_download = t.download_kbps
ORDER BY t.download_kbps DESC
LIMIT 20
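A grouped maximum like this can be sanity-checked with a small SQLite session. The sketch below (with made-up rows, not the real dataset) joins each row back to its exchange's maximum download speed, which is what ties the id and upload columns to the fastest test rather than letting the database pick arbitrary values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE speedtest_results (
    id INTEGER PRIMARY KEY,
    exchange_name TEXT,
    download_kbps INTEGER,
    upload_kbps INTEGER
);
INSERT INTO speedtest_results VALUES
    (19981, 'Holborn',  335833, 74459),
    (19983, 'Holborn',  333661, 84397),
    (5652,  'Woolston', 330320, 64200);
""")

# Join each row back to its exchange's maximum download speed, so the
# non-aggregated columns come from the same row as that maximum.
rows = conn.execute("""
    SELECT t.id, t.exchange_name, t.download_kbps
    FROM speedtest_results t
    JOIN (SELECT exchange_name, MAX(download_kbps) AS max_dl
          FROM speedtest_results
          GROUP BY exchange_name) m
      ON m.exchange_name = t.exchange_name
     AND m.max_dl = t.download_kbps
    ORDER BY t.download_kbps DESC
""").fetchall()
print(rows)  # one row per exchange, fastest test first
```

With these sample rows the two Holborn tests collapse into the single fastest one.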
Related
I have an InnoDB table with 5000 rows. Here is an example of my table named 'insitutes'.
id| name
1 | University of London
2 | Department of Maths University of London
3 | Department of Biology University of London
4 | Department of Chemistry University of London
5 | Department of Physics University of London
...
This is what my query looks like
SELECT *,
MATCH (name) AGAINST ('London University' IN BOOLEAN MODE) AS score
FROM insitutes
WHERE MATCH (name) AGAINST ('London University' IN BOOLEAN MODE)
ORDER BY score DESC
This is what my result will look like
Department of Biology University of London
Department of Maths University of London
Department of Chemistry University of London
University of London
....
I want to get 'University of London' as the first result; that is, I want the closest match to the search query.
By playing with my data I found out that changing the table type to MyISAM and modifying the query to use 'IN NATURAL LANGUAGE MODE' gives me the expected results. But I cannot use the MyISAM table type, as it does not index words shorter than 4 characters.
First of all, you can indeed control the minimum word length in your MyISAM fulltext index, with ft_min_word_len. It is not a dynamic variable, though, so you cannot change it with SET on a running server: set it in your MySQL my.cnf file, restart the server, and then rebuild the fulltext index.
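For reference, a sketch of that configuration change (option name per the MySQL manual; the rebuild step is required for the new length to take effect):

```ini
# my.cnf — takes effect after a server restart
[mysqld]
ft_min_word_len = 3
```

After restarting, rebuild the MyISAM fulltext index, e.g. with `REPAIR TABLE insitutes QUICK;`.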
Second, the word of in your institute names is in the FULLTEXT stoplist. It is not indexed, so you can't use it for matching unless you remove it from the stoplist.
And, if you have managed to include of in your index, notice that the department name strings contain it twice, which will boost their score.
If you change FULLTEXT's configuration be sure to rebuild your index.
Third, as you know the order of your resultset comes from the score assigned to each row by FULLTEXT. FULLTEXT is designed as an assistance to human perception. It presents choices for a human to choose among, rather than precisely correct choices. Expecting perfectly predictable results from FULLTEXT is probably a mistake.
If full-text search is not required, you can use a MySQL query like this:
select *
from insitutes
where name like '%London%'
  and name like '%University%'
order by name desc;
If this doesn't solve it, please describe your problem in more detail.
I have this table called times where I record race information for a racing game:
race_id map name time
30509 desert Peter 12.68
30510 desert Jakob 10.72
30511 desert Peter 18.4
30512 jungle Peter 39.909
30513 jungle Peter 39.84
30514 desert Harry 16.129
30515 space Harry 774.765
30516 jungle Jonas 46.047
30517 city Jonas 23.54
30518 city Jonas 23.13
30519 desert Mike 22.9
30520 space Fred 174.244
I have two questions. How would I best go about:
Finding the lowest time (world record) on a given map?
I have tried this query:
SELECT *, MIN(time) FROM times WHERE map = 'desert';
This yields a seemingly arbitrary row with an added column called MIN(time) that does contain the correct lowest time, but the other columns do not come from that fastest row.
Finding the lowest time on all maps, but only if it's done by a certain player (find all world records by given player)?
For this I have tried this query:
SELECT *, MIN(time) FROM times WHERE name = 'Peter' GROUP BY map;
This seems to only return the first row for the given name on each map, regardless of whether it is the lowest time or not.
I'm fairly new to SQL (MySQL), so I might be missing something obvious here. I've been looking around for quite a while now, and any help would be greatly appreciated. Thanks!
If you want the fastest time on a given map, you can just order by and limit:
select *
from times
where map = 'desert'
order by time limit 1
On the other hand, if you want all race records for a given user, then it is a bit different. One option uses a correlated subquery for filtering:
select t.*
from times t
where
name = 'Peter'
and time = (select min(t1.time) from times t1 where t1.map = t.map)
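Using a subset of the sample rows from the question, the correlated-subquery version can be sanity-checked in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE times (race_id INTEGER, map TEXT, name TEXT, time REAL);
INSERT INTO times VALUES
    (30509, 'desert', 'Peter', 12.68),
    (30510, 'desert', 'Jakob', 10.72),
    (30512, 'jungle', 'Peter', 39.909),
    (30513, 'jungle', 'Peter', 39.84),
    (30518, 'city',   'Jonas', 23.13);
""")

# Keep Peter's rows only where his time equals the overall minimum
# for that map, i.e. his world records.
records = conn.execute("""
    SELECT t.race_id, t.map, t.time
    FROM times t
    WHERE t.name = 'Peter'
      AND t.time = (SELECT MIN(t1.time) FROM times t1 WHERE t1.map = t.map)
    ORDER BY t.map
""").fetchall()
print(records)
```

With these rows Peter holds only the jungle record (39.84); his desert time loses to Jakob's 10.72, so it is correctly filtered out.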
Finding the lowest time (world record) on a given map
SELECT `time`
FROM times
WHERE map = #map
ORDER BY `time` ASC
LIMIT 1
Finding the lowest time on all maps, but only if it's done by a certain player (find all world records by given player)
SELECT map, `time`
FROM times t
WHERE name = #name
  AND `time` = (SELECT MIN(`time`) FROM times WHERE map = t.map)
There is a MySQL Person table like the one below. id is the primary key and indexed, but the other columns are not indexed.
id name surname age city branch
1 John Black 34 London driver
2 Lara Croft 28 New York teacher
3 Ahmad Hasan 41 Doha doctor
...
1,000,000 ...
My question is: when I execute a SELECT query whose WHERE clause has multiple conditions, does that slow the query down?
For example, which one is faster?
Select * From Person Where age > 30
or
Select * from Person Where age > 20 AND city = 'London' AND name = 'John' AND branch = 'doctor' AND ...
Could you tell me which one will be faster?
Without indexes, any WHERE clause causes a table scan. That is, to satisfy the query the server must examine every row in the table. So the search operations you have shown take on the order of the same time as one another.
It also takes time to send a large result set from the MySQL server to the client. Fewer rows in the result set make that part of satisfying your query faster.
Pro tip: avoid SELECT * when dealing with tables over about 100 rows. Instead, give the names of the columns you actually need.
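The table-scan point can be seen directly from a query planner. A minimal sketch in SQLite (MySQL's EXPLAIN reports the equivalent information): an unindexed predicate forces a scan of the whole table, while an indexed one becomes a search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)"
)

# No index on age: the planner must examine every row.
scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE age > 30"
).fetchall()
print(scan[0][-1])   # e.g. "SCAN person"

# With an index, the same kind of predicate becomes an index search.
conn.execute("CREATE INDEX idx_city ON person (city)")
search = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE city = 'London'"
).fetchall()
print(search[0][-1])  # e.g. "SEARCH person USING INDEX idx_city (city=?)"
```

The idx_city index here is hypothetical; the question's table has no such index, which is exactly why all its WHERE variants cost roughly the same.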
I have a database table for storing restaurant names and the city they are located in. Example:
name | city
eleven madison park | NYC
gramercy tavern | NYC
Lotus of Siam | TOK
The Modern | LA
ABC Kitchen | LA
Now when there is an incoming entry before INSERT, if there is no similar restaurant name in the same city, I want to go ahead and perform the insert.
But if the entry is similar, say { name: "Eleven Madison", city: "NYC" }, I want to find similar entries in the "name" column for the same city ("eleven madison park" in "NYC" in this example), still do the insert, and also store a new row in a 'conflicts' table with the IDs of these restaurants (the last insert id and the similar row's id).
I used the Levenshtein distance algorithm, with the following SQL query:
SELECT id, levenshtein_ratio(name, 'Eleven Madison') AS levsh from restaurants
where
city_name = 'NYC'
order by levsh asc
limit 0, 1
Then I set a threshold of 8, and if levsh is less than 8, then I mark it as a conflict i.e. insert a new record in 'conflicts' table. This query was working fine until the table grew to 1000 records. Now this query takes 2 seconds to finish.
I understand that this is because I am calculating levenshtein_ratio for all the restaurants in the city, when I only need to apply the ratio function to similar names, e.g. the ones containing 'Eleven' or 'Madison'. Even better would be if I could do something like
WHERE city_name = 'NYC' AND SOUNDEX(any word in `name`) = SOUNDEX(any word in 'Eleven Madison')
Please help with suggestions on how to improve and optimize this query, and if possible any better approach to what I am doing.
Thanks
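The prefilter idea above can be sketched outside SQL. In the hypothetical Python sketch below, SequenceMatcher stands in for the question's levenshtein_ratio() UDF, and names are only scored if they share at least one word with the incoming entry; in MySQL, the analogous prefilter could be a FULLTEXT MATCH on name before applying the expensive ratio to the few surviving candidates.

```python
from difflib import SequenceMatcher

# Hypothetical sample rows: (id, name) pairs for one city.
restaurants = [
    (1, "eleven madison park"),
    (2, "gramercy tavern"),
    (3, "abc kitchen"),
]

def similarity(a: str, b: str) -> float:
    # Stand-in for the levenshtein_ratio() UDF from the question.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def closest_match(incoming: str, rows, threshold: float = 0.6):
    words = set(incoming.lower().split())
    # Cheap prefilter: only score names sharing at least one word,
    # instead of scoring every restaurant in the city.
    candidates = [(rid, name) for rid, name in rows
                  if words & set(name.lower().split())]
    scored = [(rid, name, similarity(incoming, name))
              for rid, name in candidates]
    scored.sort(key=lambda t: t[2], reverse=True)
    if scored and scored[0][2] >= threshold:
        return scored[0]   # best candidate above the conflict threshold
    return None            # no conflict: safe to insert

match = closest_match("Eleven Madison", restaurants)
print(match)
```

The 0.6 threshold is an illustrative assumption, playing the role of the question's threshold of 8 (which measures raw distance rather than a ratio).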
I am having a complete nightmare with my application. I haven't worked with datasets this big before, and my query is either timing out or taking ages to return something. I've got a feeling that my approach is just all wrong.
I have a payments table with a postcode field (among others). It has 40,000 rows roughly (one for each transaction). It has an auto-inc PRIMARY key and an INDEX on the postcode foreign-key.
I also have a postcodes lookup table with 2,500,000 rows. The table is structured like so;
postcode | country | county | localauthority | gor
AB1 1AA  | S99999 | E2304 | X | 45
AB1 1AB  | S99999 | E2304 | X | 45
The postcode field is PRIMARY and I have INDEXes on all the other fields.
Each field (apart from postcode) has a lookup table. In the case of country it's something like;
code | description
S99999 | Wales
The point of the application is that the user can select areas of interest (such as "England", "London", "South West England" etc) and be shown payments results for those areas.
To do this, when a user selects the areas they are interested in, I create a temp table with one row per postcode, listing ALL postcodes for the selected areas. Then I LEFT JOIN it onto my payments table.
The problem is that if the user selects a big region (like "England") then I have to create a massive temp table (of about 1 million rows) and then compare it to the 40,000 payments to decide which to display.
UPDATE
Here is my code;
$generated_temp_table = "selected_postcodes_".$timestamp_string."_".$userid;
$temp_table_data = $temp_table
->setTempTable($generated_temp_table)
->newQuery()
->with(['payment' => function ($query) use ($column_values) {
$query->select($column_values);
}])
;
Here is my attempt to print out the raw query;
$sql = str_replace(['%', '?'], ['%%', "'%s'"], $temp_table_data->toSql());
$fullSql = vsprintf($sql, $temp_table_data->getBindings());
print_r($fullSql);
This is the result;
select * from `selected_postcodes_1434967426_1`
This doesn't look like the right query, I can't work out what Eloquent is doing here. I don't know why the full query is not printing out.
If you have too many results, like 1 million rows, use the OFFSET/LIMIT concept; paginating will save query time. Also make sure your SELECT query retrieves only the required fields (avoid SELECT * FROM xxx; use SELECT a, b FROM xxx instead).
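The OFFSET/LIMIT idea looks like this in plain SQL, sketched here in SQLite with a hypothetical payments table and page size:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, postcode TEXT)")
conn.executemany("INSERT INTO payments (postcode) VALUES (?)",
                 [("AB1 1AA",)] * 100)

page_size = 25   # hypothetical page size
page = 2         # zero-based page number

# Fetch one page at a time instead of the whole result set.
rows = conn.execute(
    "SELECT id, postcode FROM payments ORDER BY id LIMIT ? OFFSET ?",
    (page_size, page * page_size),
).fetchall()
print(len(rows), rows[0][0])  # 25 rows, starting at id 51
```

A stable ORDER BY is what makes the pages consistent between requests; without it the database may return rows in a different order on each query.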