Is there a way to find all the orders shipped to London using an SQL query? Simply searching for London in the columns doesn't work as some customers have put the district name rather than "London".
So I thought the best approach was to go via the postcode. Would this be the best way to find the rows, continuing with an OR condition for each postcode?
select * from tt_order_data
where ship_postcode like "e1%"
According to Wikipedia, this is the postcode range:
The E, EC, N, NW, SE, SW, W and WC postcode areas (the eight London postal districts) comprised the inner area of the London postal region and correspond to the London post town.
The BR, CR, DA, EN, HA, IG, SL, TN, KT, RM, SM, TW, UB, WD and CM (the 14 outer London postcode areas) comprised the outer area of the London postal region.[20]
The inner and outer areas together comprised the London postal region.[13]
One way to do this would be to leverage REGEXP and define a pattern that matches only ship_postcodes that begin with one of the aforementioned London postcode character sequences:
SELECT *
FROM tt_order_data
WHERE UPPER(TRIM(ship_postcode)) REGEXP '^(E|EC|N|NW|SE|SW|W|WC|BR|CR|DA|EN|HA|IG|SL|TN|KT|RM|SM|TW|UB|WD|CM)[0-9]'
The [0-9] after the area letters matters: without it, the pattern would also match non-London areas that merely start with the same letters, such as EH (Edinburgh), EX (Exeter) or NE (Newcastle).
It's important to keep in mind that you will still need to perform some amount of data cleansing if the inputs weren't properly controlled, as malformed postcodes can still pass this filter (e.g., E1 7AA is valid, but a string like E1XYZ would match as well).
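As an aside, since the column is wrapped in UPPER(TRIM(...)) and matched with REGEXP, MySQL can't use an index on ship_postcode to narrow down the rows, so this will scan the whole table. I'm not sure how that will perform with your specific dataset at scale, but if this is for a one-off exercise it should fit your needs just fine.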
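If the cleansing happens in application code instead, the same filter can be applied per value. Below is a minimal Python sketch, assuming postcodes arrive as plain strings. It reuses the area list quoted above and, as a simplification of the real UK format, additionally checks the rest of the postcode (district digit with an optional letter or digit, an optional space, the sector digit and two unit letters). The names LONDON_AREAS and is_london_postcode are hypothetical.

```python
import re

# Postcode areas from the Wikipedia quote above: the eight inner London districts
# followed by the outer London areas.
LONDON_AREAS = ["E", "EC", "N", "NW", "SE", "SW", "W", "WC",
                "BR", "CR", "DA", "EN", "HA", "IG", "SL", "TN",
                "KT", "RM", "SM", "TW", "UB", "WD", "CM"]

# Area letters, then a simplified UK postcode shape: district digit plus an optional
# letter or digit, an optional space, the sector digit and two unit letters.
# Requiring a digit right after the area keeps EH, EX, NE, etc. out.
LONDON_POSTCODE = re.compile(
    r"^(?:" + "|".join(LONDON_AREAS) + r")[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$"
)


def is_london_postcode(raw):
    """Return True if raw looks like a complete postcode in one of the London areas."""
    return bool(LONDON_POSTCODE.match(raw.strip().upper()))


samples = ["e1 7aa", "EC1A 1BB", "SW1A1AA", "ERGO", "EH1 1YZ", "NE1 4ST", "E1XYZ"]
print([s for s in samples if is_london_postcode(s)])  # ['e1 7aa', 'EC1A 1BB', 'SW1A1AA']
```

Checking the full shape rather than just the prefix is what rejects strings like ERGO or E1XYZ; it still accepts some combinations the Royal Mail never issues, so treat it as a first pass.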
Related
I am using MATCH() AGAINST() in MySQL.
What I want to do is somehow get the keyword that matched the string.
Let's say the keyword is 'rain water'
and I want to find it in the table. Since it's MATCH() AGAINST(), it will match both rain and water individually, and that's fine. But I want to get the word that matched.
For example, if rain is matched I need the word rain; if water is matched I need the word water.
Example table:
id | text1                  | text2
---+------------------------+------------------------
1  | rain water harvesting  | I have a new car
2  | summer season heat     | I want to make tea
3  | I want to go to paris  | I love to water plants
4  | Its raining in england | rain drops are falling
5  | do not waste water     | we eat bun
6  | water is essential     | I love to dance in rain
7  | fire burns             | my laptop is old
8  | we breathe air         | We eat good food
This is the query I have so far:
SELECT *,
MATCH(text1,text2) AGAINST('rain water' IN NATURAL LANGUAGE MODE) AS Word
FROM EXAMPLE_TABLE
WHERE MATCH(text1,text2) AGAINST('rain water' IN NATURAL LANGUAGE MODE)
MySQL does not directly offer you feedback about why your search term hit, apart from the aggregated score.
You can however extract that information if you match your result to each term separately, and this way create your own post-evaluation:
SELECT *,
CONCAT_WS(', ',
IF(MATCH(text1,text2) AGAINST('rain'), 'rain', null),
IF(MATCH(text1,text2) AGAINST('water'), 'water', null) ) as words
FROM EXAMPLE_TABLE
WHERE MATCH(text1,text2) AGAINST('rain water')
This will check your found rows against each term separately, and if it was a hit, appends the word to your result string.
In general, though, things you do just to format your output belong in your application, which usually also has more flexibility in processing strings. If, e.g., you want to order your terms by position or occurrence in your result ("water, rain" instead of "rain, water"), a query-only solution will quickly become a mess (and if you still needed it in MySQL, you would do it in a stored function and do basically the same as you would in, e.g., PHP).
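Following that advice, here is a minimal Python sketch of the post-processing step: it takes a row's text columns and the search terms and returns the terms that occur as whole words, ordered by where they first appear. Whole-word, case-insensitive matching is only an approximation of MySQL's full-text tokenizer (it ignores stopwords, minimum word length and parser details); like MySQL's full-text search, though, it does not stem, so "raining" is not a hit for "rain". The function name matched_terms is hypothetical.

```python
import re


def matched_terms(terms, *columns):
    """Return the terms found as whole words in any column, ordered by first position."""
    text = " ".join(c for c in columns if c)  # skip NULL (None) columns
    positions = {}
    for term in terms:
        m = re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        if m:
            positions[term] = m.start()
    # Sort the matched terms by where they first occur in the combined text.
    return sorted(positions, key=positions.get)


# A few rows from the example table above.
rows = [
    (1, "rain water harvesting", "I have a new car"),
    (3, "I want to go to paris", "I love to water plants"),
    (4, "Its raining in england", "rain drops are falling"),
    (6, "water is essential", "I love to dance in rain"),
]
for row_id, text1, text2 in rows:
    print(row_id, ", ".join(matched_terms(["rain", "water"], text1, text2)))
```

Row 6 comes out as "water, rain", which is exactly the kind of ordering that is awkward to produce inside the query itself.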
I have a requirement to remove "duplicate" entries from a dataset, which is being displayed on the front-end of our application.
A duplicate is defined by the client as a speed test result which is in the same exchange.
Here is my current query,
SELECT id, isp, exchange_name, exchange_postcode_area, download_kbps, upload_kbps
FROM speedtest_results
WHERE postcode IS NOT NULL
AND exchange_name IS NOT NULL
ORDER BY download_kbps DESC, upload_kbps ASC
This query would return some data like this,
id    | isp                       | exchange_name | exchange_postcode_area | download_kbps | upload_kbps
12062 | The University of Bristol | Bristol North | BS6                    | 821235        | 212132
12982 | HighSpeed Office Limited  | Totton        | SO40                   | 672835        | 298702
18418 | University of Birmingham  | Victoria      | B9                     | 553187        | 336889
14050 | Sohonet Limited           | Lee Green     | SE13                   | 537686        | 104439
19981 | The JNT Association       | Holborn       | WC1V                   | 335833        | 74459
19983 | The JNT Association       | Holborn       | WC1V                   | 333661        | 84397
5652  | University of Southampton | Woolston      | SO19                   | 330320        | 64200
As you can see, there are two tests in the WC1V postcode area, which I'd like to aggregate into a single result, ideally using max rather than avg.
How can I modify my query to ensure that I am selecting the fastest speed test result for the exchange whilst still being able to return a list of all the max speeds?
Seems that I was far too hasty to create a question! I have since solved my own issue.
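SELECT s.id, s.isp, s.exchange_name, s.exchange_postcode_area, s.download_kbps, s.upload_kbps
FROM speedtest_results s
-- join back to the per-exchange maximum so every column comes from the fastest test
JOIN (
    SELECT exchange_name, MAX(download_kbps) AS max_download
    FROM speedtest_results
    WHERE exchange_name IS NOT NULL
      AND postcode IS NOT NULL
    GROUP BY exchange_name
) m ON m.exchange_name = s.exchange_name
   AND m.max_download = s.download_kbps
WHERE s.postcode IS NOT NULL
ORDER BY s.download_kbps DESC
LIMIT 20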
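A note on the GROUP BY approach: with GROUP BY exchange_name alone, MySQL fills non-aggregated columns such as id, isp and upload_kbps from an arbitrary row of each group (and rejects the query outright under ONLY_FULL_GROUP_BY, the default from MySQL 5.7), so they need not belong to the fastest test. Joining back to the per-exchange maximum avoids that. Below is a runnable sketch using Python's built-in sqlite3 and the sample rows from the question; the postcode column and its filter are left out because the sample doesn't include them.

```python
import sqlite3

# Sample rows from the question: (id, isp, exchange_name, exchange_postcode_area,
# download_kbps, upload_kbps).
ROWS = [
    (12062, "The University of Bristol", "Bristol North", "BS6", 821235, 212132),
    (12982, "HighSpeed Office Limited", "Totton", "SO40", 672835, 298702),
    (18418, "University of Birmingham", "Victoria", "B9", 553187, 336889),
    (14050, "Sohonet Limited", "Lee Green", "SE13", 537686, 104439),
    (19981, "The JNT Association", "Holborn", "WC1V", 335833, 74459),
    (19983, "The JNT Association", "Holborn", "WC1V", 333661, 84397),
    (5652, "University of Southampton", "Woolston", "SO19", 330320, 64200),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE speedtest_results (
           id INTEGER PRIMARY KEY, isp TEXT, exchange_name TEXT,
           exchange_postcode_area TEXT, download_kbps INTEGER, upload_kbps INTEGER)"""
)
conn.executemany("INSERT INTO speedtest_results VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Find each exchange's maximum download, then join back to fetch the whole row
# that achieved it, so id and upload_kbps come from the same test.
FASTEST_PER_EXCHANGE = """
SELECT s.id, s.exchange_name, s.download_kbps, s.upload_kbps
FROM speedtest_results AS s
JOIN (
    SELECT exchange_name, MAX(download_kbps) AS max_download
    FROM speedtest_results
    WHERE exchange_name IS NOT NULL
    GROUP BY exchange_name
) AS m
  ON m.exchange_name = s.exchange_name
 AND m.max_download = s.download_kbps
ORDER BY s.download_kbps DESC
LIMIT 20
"""

for row in conn.execute(FASTEST_PER_EXCHANGE):
    print(row)
```

The slower Holborn test (19983) drops out, and 19981 keeps its own upload_kbps of 74459. If two tests in an exchange tie on the maximum, both rows come back.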
I bought a geo-database a long time ago and I'm improving the precision of its lat/lng values. But I've found some weird stuff: some cities have the same lat/lng coordinates, which is geographically impossible.
id City State Lat Lng
1 A sA XX XX
2 B sA XX XX
3 C sA YY YY
4 D sA ZZ ZZ
So I tried GROUP BY City, Lat, Lng, but since I need the id to update the record, the GROUP BY clause asks me to add the 'id' column.
From the table, ids 1 and 2 should be updated, leaving 3 and 4 out. There shouldn't be 2 (or more) cities with the same Lat/Lng. The table has 22K rows. I could send everything to the Google Maps API, but I want to use time, bandwidth and API hits as wisely as possible, and I'm running out of time considering I can only make one request per second with the free API access.
I've tried
SELECT DISTINCT postcodes_id, Latitude, Longitude, Region1Name, Region2Name, Nation_D
FROM postcodes
where Latitude + Longitude IN
(
SELECT Latitude + Longitude
FROM
(
SELECT postcodes_id, Latitude, Longitude, count(distinct(Region2Name)) as cantidad
FROM postcodes
where Nation_D is not null
GROUP BY Latitude, Longitude
having count(distinct(Region2Name)) > 1
) A
)
AND Nation_D IS NOT NULL
ORDER BY Latitude, Longitude, Region1Name, Region2Name, Nation_D
But it's not working as expected. I think it's pretty obvious to a fresh pair of eyes.
I wrote a python script to use Google Map geocode to get the current Lat/Lng and update it if it's different. This script works ok.
Hope someone has an idea. Thanks!!
Running MySQL 5.5 and Python 2.7 on a CentOS 7.
Just some pointers for you, which may be helpful:
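You should not use GROUP BY or DISTINCT on lat/lon or any combination of them, since they are continuous floating-point numbers, not discrete integers or strings.
By the same token, you should not use WHERE clauses on lat/lon or their sum (different pairs can have the same sum: (10, 20) and (20, 10) both add up to 30). If you mean to check for proximity of two locations, use a distance function such as ST_Distance() instead; note that it only appeared in MySQL 5.6, so on 5.5 you would have to compute the distance yourself.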
Multiple city names can refer to the same location. For example, New York, NY and Manhattan, NY.
And a non-technical point: storing Google geocoding data in your database is against their licensing agreement.
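Putting those pointers together with the question's goal, here is a sketch of finding candidate rows to re-geocode, using Python's built-in sqlite3 so it runs standalone. It compares the coordinates as a pair instead of their sum and, because exact equality on floats is fragile, groups on coordinates rounded to a fixed precision (4 decimal places, about 11 m of latitude, is an assumption to tune). The table name cities and all coordinate values are made up; per the third pointer, the ids it returns are candidates to review, not necessarily errors.

```python
import sqlite3

# Hypothetical rows in the question's layout (id, City, State, Lat, Lng):
# cities A and B share coordinates, C and D do not. The values are made up.
ROWS = [
    (1, "A", "sA", 10.1234, 20.5678),
    (2, "B", "sA", 10.1234, 20.5678),
    (3, "C", "sA", 11.0000, 21.0000),
    (4, "D", "sA", 12.0000, 22.0000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT, state TEXT, lat REAL, lng REAL)"
)
conn.executemany("INSERT INTO cities VALUES (?, ?, ?, ?, ?)", ROWS)

# The inner query finds rounded (lat, lng) pairs shared by more than one city name;
# the join on BOTH coordinates (never their sum) brings back each affected row's id.
SHARED_COORDS = """
SELECT c.id
FROM cities AS c
JOIN (
    SELECT ROUND(lat, 4) AS rlat, ROUND(lng, 4) AS rlng
    FROM cities
    GROUP BY ROUND(lat, 4), ROUND(lng, 4)
    HAVING COUNT(DISTINCT city) > 1
) AS d
  ON ROUND(c.lat, 4) = d.rlat
 AND ROUND(c.lng, 4) = d.rlng
ORDER BY c.id
"""

ids_to_check = [row[0] for row in conn.execute(SHARED_COORDS)]
print(ids_to_check)  # [1, 2]
```

Rounding is a compromise: two points on either side of a rounding boundary can still land in different groups, so a distance-based check is more robust if your MySQL version supports one.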
I have been intrigued by a problem on SQLZoo. It is a "greatest-n-per-group" problem. I would like to understand how the engine is operating.
A table called bbc contains the name, region of the world and population of each country:
bbc( name, region, population)
The given task is to select the most populous country of each region, showing its name, the region and population.
The solution provided is:
SELECT region, name, population FROM bbc x
WHERE population >= ALL
(SELECT population FROM bbc y
WHERE y.region=x.region
AND population>0)
1. Main Question. I am finding this a bit of a mind twister. I would like to understand how the engine processes this, because at first blush it seems there is some kind of co-dependence (x depending on y, and y depending on x). Does the engine follow some kind of recursion to produce the final selection? Or am I missing something, such that either x or y is actually fixed?
2. Secondary Question. Oddly, when I pull the "AND population>0" out of the parentheses and leave it on its own at the bottom, one of the regions (Europe / Russia) goes missing from the 8 results. Why? I don't understand that.
And indeed, when I try the query on the world database (available from the MySQL website on the same page as Sakila), the behavior is different:
With population > 0 out of the parentheses, I get 6 regions. Six is the right number in this database, because "SELECT continent FROM country GROUP BY continent" reveals seven continents, of which one is Antarctica, which includes 5 "countries", all with a 0 population.
So that seems right.
SELECT continent, `name`, population FROM country X
WHERE population >= ALL
(SELECT population FROM country Y
WHERE Y.`Continent` = X.`Continent`)
AND population>0
On the other hand, when I pull "population > 0" back into the parentheses as on SQLZoo, I also get 5 countries with a zero (the countries "belonging to Antarctica"). It doesn't matter if I specify x.population or y.population, I get zeroes.
continent      name                                          population
-------------  --------------------------------------------  ----------
Antarctica     Antarctica                                    0
Antarctica     French Southern territories                   0
Oceania        Australia                                     18886000
South America  Brazil                                        170115000
Antarctica     Bouvet Island                                 0
Asia           China                                         1277558000
Antarctica     Heard Island and McDonald Islands             0
Africa         Nigeria                                       111506000
Europe         Russian Federation                            146934000
Antarctica     South Georgia and the South Sandwich Islands  0
North America  United States                                 278357000
Very much looking for insights on these questions!
Wishing you all a beautiful week.
:)
Notes:
For reference, the problem is number 3a on this page:
http://old.sqlzoo.net/1a.htm?answer=1
A thread mentioning the "greatest-n-per-group" problem for the same query:
MySQL world database Trying to avoid subquery
The world database is available here: http://dev.mysql.com/doc/index-other.html
Main Question. I am finding this a bit of a mind twister. I would like to understand how the engine processes this, because at first blush it seems there is some kind of co-dependence (x depending on y, and y depending on x). Does the engine follow some kind of recursion to produce the final selection? Or am I missing something, such that either x or y is actually fixed?
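This isn't recursion; it's a correlated subquery. Logically, the inner query over y is evaluated once for each row of x, with x.region fixed to that row's value, so y depends on x but x never depends on y. The solution given in the MySQL docs is equivalent to this: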
SELECT region, name, population FROM bbc x
WHERE population =
(SELECT max(population) FROM bbc y
WHERE y.region=x.region
)
Secondary Question. Oddly, when I pull the "AND population>0" out of the parentheses and leave it on its own at the bottom, one of the regions (Europe / Russia) goes missing from the 8 results. Why? I don't understand that.
A slight change (as suggested by ypercube above) works:
SELECT region, name, population FROM bbc x
WHERE population >= ALL
(SELECT population FROM bbc y
WHERE y.region=x.region
AND population IS NOT NULL)
This query
SELECT region, name, population FROM bbc x
WHERE population is null
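Returns a row. Not sure why population should be nullable, but I didn't take a good look at the rest of the data. A comparison with NULL is UNKNOWN rather than TRUE, so once that NULL is inside the subquery for its region, population >= ALL (...) can't be TRUE for any row in that region. Otherwise, the query should work fine without the >0.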
Also, this is the simplest case of greatest-n-per-group (N = 1); the general problem asks for the top N items per group instead of just the top one.
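To see both behaviours concretely, here is a small Python model of how the correlated query is evaluated: a nested loop in which each outer row x fixes the region for the inner query over y, plus a helper for SQL's three-valued `>= ALL`. It is a conceptual sketch only (a real optimizer may rewrite the query), the helper names are hypothetical, and the data is a made-up mix: one Europe row has a NULL population, as in bbc, and the Antarctica rows have population 0, as in the world database.

```python
def ge_all(value, values):
    """SQL `value >= ALL (values)` with three-valued logic: True, False or None (UNKNOWN)."""
    if not values:
        return True  # ALL over an empty set is TRUE, even when value is NULL
    unknown = False
    for v in values:
        if value is None or v is None:
            unknown = True  # any comparison involving NULL is UNKNOWN
        elif value < v:
            return False  # one FALSE comparison makes the whole ALL FALSE
    return None if unknown else True


def most_populous(rows, filter_inside):
    """Nested-loop model of: SELECT name FROM bbc x WHERE population >= ALL (...)."""
    result = []
    for name, region, pop in rows:  # outer query: bbc x, with x.region fixed per pass
        inner = [
            p for _, r, p in rows  # inner query: bbc y WHERE y.region = x.region ...
            if r == region and (not filter_inside or (p is not None and p > 0))
        ]
        keep = ge_all(pop, inner) is True  # WHERE keeps a row only if the condition is TRUE
        if not filter_inside:
            keep = keep and pop is not None and pop > 0  # outer AND population > 0
        if keep:
            result.append(name)
    return result


# Hypothetical data: (name, region, population); None stands for NULL.
ROWS = [
    ("Russian Federation", "Europe", 146934000),
    ("Hypothetical NULL row", "Europe", None),
    ("Antarctica", "Antarctica", 0),
    ("Bouvet Island", "Antarctica", 0),
    ("Brazil", "South America", 170115000),
]

print(most_populous(ROWS, filter_inside=True))   # ['Russian Federation', 'Antarctica', 'Bouvet Island', 'Brazil']
print(most_populous(ROWS, filter_inside=False))  # ['Brazil']
```

With the filter inside the subquery, the NULL is excluded so Russia survives, but Antarctica's subquery becomes empty and `>= ALL` over an empty set is TRUE, which is why the zero-population rows come back. With the filter outside, the NULL turns Russia's comparison into UNKNOWN, and WHERE keeps only TRUE rows.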
I am attempting to query a table for a limited resultset in order to populate an autocomplete field in javascript. I am, therefore, using a LIKE operator with the partial string entered.
If I have, for example, a table such as:
tblPlaces
id country
1 Balanca
2 Cameroon
3 Canada
4 Cape Verde
5 Denmark
For the sake of this example, let's say I want two rows returned - and yeah, for this example, I made up a country there ;) I want to prioritize any instance where the partial string matches at the beginning of country. The query I began with, therefore, is:
SELECT id, country FROM tblPlaces WHERE country LIKE 'ca%' LIMIT 2
This returned 'Cameroon' and 'Canada' as expected. However, in instances where there are no two names in which the string is matched at the beginning of a word (such as 'de'), I want it to look elsewhere in the word. So I revised the query to become
SELECT id, country FROM tblPlaces WHERE country LIKE '%ca%' LIMIT 2
This then returned 'Cape Verde' and 'Denmark', but in doing so broke my original search for 'ca', which now returns 'Balanca' and 'Cameroon'.
So, my question is how to go about this using a single query that will prioritize a match at the start of a word (perhaps I need to use REGEXP?). I am also assuming that if the 'country' column is indexed, these matches will at least be returned in alphabetical order after that (i.e. Cameroon before Canada, etc.).
If you mean to prioritize matches that are exactly at the start...
SELECT id, country
FROM tblPlaces
WHERE country LIKE '%ca%'
ORDER BY CASE WHEN country LIKE 'ca%' THEN 0 ELSE 1 END, country
LIMIT 2
EDIT
More generic and possibly faster (assuming "the closer to the start, the better the match")...
SELECT id, country
FROM tblPlaces
WHERE country LIKE '%ca%'
ORDER BY INSTR(country, 'ca'), country
LIMIT 2
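A quick way to check the ordering is to run it against the sample table. This sketch uses Python's built-in sqlite3 (not MySQL) with the INSTR variant and a bound parameter. Two details differ from MySQL: SQLite concatenates with || where MySQL would use CONCAT('%', ?, '%'), and SQLite's instr() is case-sensitive while its LIKE is not for ASCII, so both sides of INSTR are lower-cased. Escaping of % or _ typed by the user is left out; the helper name suggest is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPlaces (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO tblPlaces (id, country) VALUES (?, ?)",
    [(1, "Balanca"), (2, "Cameroon"), (3, "Canada"), (4, "Cape Verde"), (5, "Denmark")],
)

# Rank by the position of the first match (1 = start of the name), then alphabetically.
AUTOCOMPLETE = """
SELECT id, country
FROM tblPlaces
WHERE country LIKE '%' || :term || '%'
ORDER BY INSTR(LOWER(country), LOWER(:term)), country
LIMIT 2
"""


def suggest(term):
    """Return up to two country names for the partial string typed so far."""
    return [country for _, country in conn.execute(AUTOCOMPLETE, {"term": term})]


print(suggest("ca"))  # ['Cameroon', 'Canada']
print(suggest("de"))  # ['Denmark', 'Cape Verde']
```

Both of the question's cases come out right from one query: 'ca' keeps the prefix matches ahead of Balanca, and 'de' falls back to Cape Verde once Denmark is used up.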