Find column values that are a start string of given string. - mysql

I have a database table that contains URLs in a column. I want to show certain data depending on what page the user is on, defaulting to a 'parent' page if not a direct match. How can I find the columns where the value is part of the submitted URL?
For example, given www.example.com/foo/bar/baz/here.html, I would expect to see (after sorting by length of the column value):
www.example.com/foo/bar/baz/here.html
www.example.com/foo/bar/baz
www.example.com/foo/bar
www.example.com/foo
www.example.com
if all those URLs are in the table of course.
Is there a built in function or would I need to create a procedure? Googling kept getting me to LIKE and REGEXP, which is not what I need. I figured that a single query would be much more efficient than chopping the URL and making multiple queries (the URLs could potentially contain many path components).

Simply turn the LIKE operator around:
SELECT * FROM urls WHERE "www.example.com/foo/bar/baz/here.html" LIKE CONCAT(url, "%");
http://sqlfiddle.com/#!2/ef6ee/1
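As a rough way to try the reversed-LIKE idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in, not the MySQL query itself: SQLite's `||` operator is used in place of CONCAT(), the sample rows are made up, and the helper name `matching_prefixes` is hypothetical. The ORDER BY gives the longest (most specific) match first, as the question asks.

```python
import sqlite3


def matching_prefixes(conn, page_url):
    """Return stored URLs that are a string prefix of page_url, longest first."""
    rows = conn.execute(
        # The submitted URL is the LIKE subject; the column plus '%' is the pattern.
        "SELECT url FROM urls WHERE ? LIKE url || '%' ORDER BY LENGTH(url) DESC",
        (page_url,),
    )
    return [row[0] for row in rows]


# Made-up sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")
conn.executemany(
    "INSERT INTO urls VALUES (?)",
    [
        ("www.example.com",),
        ("www.example.com/foo",),
        ("www.example.com/foo/bar",),
        ("www.example.com/other",),
        ("www.example.org",),
    ],
)

result = matching_prefixes(conn, "www.example.com/foo/bar/baz/here.html")
print(result)
```

One thing to keep in mind: because the column value is used as the pattern, any `%` or `_` stored in a URL acts as a wildcard, and underscores are common in URLs, so some extra rows may match unless those characters are escaped.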

Related

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendars of 4730 Airbnb rooms, identified by their IDs.
Here is an example of one Json file : https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess it involves a series of nested forEach calls, but I can't find the right way to do this in OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries, which confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution in R (or Python) would also be fine for me.
Rather than just creating your Project as text, and working with GREL to parse out...
The best way is to select the part of the JSON record that you want to work with, using our visual importer wizard for JSON and XML files (you can even use a URL pointing to a JSON file, as in your example). A video tutorial shows how: https://www.youtube.com/watch?v=vUxdB-nl0Bw
Select the JSON part that contains the records you want to parse and work with (this can be any repeating part; just select one of them and OpenRefine will extract all the rest).
Limit the number of data rows that you want to load during creation, or leave the default of all rows.
Click Create Project and you're now in Rows mode. However, if you think that Records mode might be better suited to your context, just import the project again as JSON and then select the next outer area of the content, perhaps a larger array that contains a key field. In the example, the key field would probably be the date, which is why I would highlight the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for everyone, and add it to our Wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OR array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of days in the month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OR can't store OR arrays directly in a cell - it has to be a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing a JSON expression. Now for each ID you have 12 rows in OR
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from example URL above - this gives me 441 rows for the single ID - each contains the JSON describing the availability & price for a single day. At this point you can use the 'fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
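Since the question also allows Python, the same flattening can be sketched outside OpenRefine with two nested loops, one over the months and one over the days. Only `calendar_months`, `days` and `available` are named in this thread; the `date` and `price` keys, the flat numeric price and the sample payload below are assumptions for illustration, so check them against the real response before relying on this.

```python
import json


def flatten_calendar(listing_id, payload):
    """Turn the nested months/days structure into one row per day."""
    rows = []
    for month in payload["calendar_months"]:
        for day in month["days"]:
            # "date" and "price" are assumed key names; .get() returns None if absent.
            rows.append(
                (listing_id, day.get("date"), day.get("available"), day.get("price"))
            )
    return rows


# Tiny made-up payload that mimics the shape described above.
sample = json.loads(
    """
    {"calendar_months": [
        {"days": [
            {"date": "2016-11-01", "available": true, "price": 50},
            {"date": "2016-11-02", "available": false, "price": 55}
        ]},
        {"days": [
            {"date": "2016-12-01", "available": true, "price": 60}
        ]}
    ]}
    """
)

rows = flatten_calendar(4212133, sample)
for row in rows:
    print(row)
```

Running this once per listing ID and concatenating the rows would give a single table of ID, day, availability and price.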

select multi url from one column in mysql table

I have a table with a "content" column that stores forum posts. Each record's "content" field contains one or more URLs. I want to get all the URLs in the "content" column, one URL per row. I use the code below:
select substr(`content`, locate("http://", `content`))
It works when there is one URL per record, giving a list of URLs like
http://www.google.com
http://www.facebook.com
...
But it only gets the first URL when a record contains more than one.
How can I fix it?
Another way to look at it is to try:
SELECT GROUP_CONCAT(substr(`content`, locate("http://", `content`))) FROM your_table;
which would concatenate all the URLs into a single string; you can carry on from there, and perhaps split it in application code rather than requiring the DB to do it. Otherwise you can hack around it using an auxiliary table of integers 1-n; see "SQL split comma separated row".
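If the splitting is done in application code, one possible sketch in Python is a regular expression that picks up every URL in each fetched "content" value. The pattern, the helper name `extract_urls` and the sample posts are assumptions for illustration; the pattern is deliberately simple and will keep trailing punctuation such as a full stop directly after a link.

```python
import re

# Simple pattern: scheme, then everything up to whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(rows):
    """Return every URL found in an iterable of post bodies, one list entry per URL."""
    urls = []
    for content in rows:
        urls.extend(URL_PATTERN.findall(content))
    return urls


# Made-up post bodies standing in for values fetched from the "content" column.
posts = [
    "see http://www.google.com and http://www.facebook.com today",
    "no links here",
    "also http://example.com/page?x=1",
]

all_urls = extract_urls(posts)
for url in all_urls:
    print(url)
```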

How can I change the domain of a URL within a varchar column?

I have a database structure where one of my columns (innerLink) has a URL within it.
So that innerLink column will have a URL structured as follows
http://www.123456.com/forums/showthread.php?t=123456
I wanted to change the http://www.123456.com to a wholly different URL --> http://789.123.com without affecting the rest of the URL structure (ie. /forums/showthread.php?t=123456 )
I need this change to hit every URL in that column that is on the 123456 domain. I have other URLs such as cnn.com or msnbc.com, so I don't want those affected. The change should only turn www.123456.com into 789.123.com
I've never done this type of manipulation with MySQL before, so I was hoping for a bit of guidance before I hose my entire database of about 4000 records :) I will be doing this through phpMyAdmin
Thanks for any help!!
You want to use the REPLACE() string function, which takes the source string, the text to find, and the replacement:
UPDATE `table` SET `innerLink` = REPLACE(`innerLink`, 'www.123456.com', '789.123.com');
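To rehearse this kind of update before touching the real table, here is a small sketch using Python's built-in sqlite3 module, whose REPLACE(string, from, to) function takes the same three arguments as MySQL's. The table name `links` and the sample rows are made up; the WHERE clause is an optional extra that limits the update to rows starting with the old domain.

```python
import sqlite3

# Throwaway in-memory table with made-up rows, so nothing real is modified.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (innerLink TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?)",
    [
        ("http://www.123456.com/forums/showthread.php?t=123456",),
        ("http://www.cnn.com/world",),
    ],
)

# REPLACE(column, old, new) rewrites the domain; the rest of each URL is left alone.
conn.execute(
    "UPDATE links SET innerLink = REPLACE(innerLink, ?, ?) WHERE innerLink LIKE ?",
    ("http://www.123456.com", "http://789.123.com", "http://www.123456.com%"),
)

updated = [row[0] for row in conn.execute("SELECT innerLink FROM links ORDER BY rowid")]
for link in updated:
    print(link)
```

Taking a backup or running the statement against a copy of the table first is a cheap safeguard with only about 4000 records.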

What can I do with an inconsistent column delimited text file?

I have a text file that looks something like...
firstname:middle:lastname
firstname:middle:lastname
firstname:lastname
firstname:middle:lastname
firstname:lastname
I would like to be able to eventually use this information in a MySQL database, but since the columns are not correct I am not sure what to do. Is there any way to resolve this?
If the data you have is only the above variations, then you can make the assumptions:
First part is the firstname
Last part is the lastname
Therefore, if using PHP for example, you could use explode() to split each line on the delimiter, which in this case is a colon (:).
When looping through each row just assume the last part is the lastname, first part is the firstname and the middle part is the middlename.
You can use count() to find out how many parts are in the specific row you are reading inside the loop. This should allow you to figure out which one is the last part.
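The same idea can be sketched in Python, where str.split plays the role of explode() and len() the role of count(). The function name `parse_name` and the sample lines are hypothetical, and the sketch assumes every line has at least a first and a last name.

```python
def parse_name(line, delimiter=":"):
    """Split one line into (first, middle, last); middle is None when absent."""
    parts = line.strip().split(delimiter)
    if len(parts) < 2:
        raise ValueError(f"expected at least first and last name: {line!r}")
    first = parts[0]
    last = parts[-1]
    # Anything between the first and last part is treated as the middle name.
    middle = delimiter.join(parts[1:-1]) or None
    return first, middle, last


# Made-up lines in the two shapes shown in the question.
lines = ["Ann:Lee:Smith", "Bob:Jones"]
parsed = [parse_name(line) for line in lines]
for record in parsed:
    print(record)
```

Storing the missing middle name as None maps naturally onto a NULL column when the rows are inserted into MySQL.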
If the file is so simple ... the solution is trivial
firstname:middle:lastname
firstname:lastname
if(there are only two columns) { that means we have first and last name }
else { we have first, middle and last name }
If there are more columns, you might be able to assign the data to the proper columns if you can build a priority list (the order in which fields could be missing, for example 'last name > first name > middle name'), and/or combine that with data type matching (string/int/double/date). Either way, you need to gather all your domain knowledge and see whether it suffices.

separating values in a URL, not with an &

Each parameter in a URL can have multiple values. How can I separate them? Here's an example:
http://www.example.com/search?queries=cars,phones
So I want to search for 2 different things: cars and phones (this is just a contrived example). The problem is the separator, a comma. A user could enter a comma in the search form as part of their query and then this would get screwed up. I could have 2 separate URL parameters:
http://www.example.com/login?name1=harry&name2=bob
There's no real problem there, in fact I think this is how URLs were designed to handle this situation. But I can't use it in my particular situation. Requires a separate long post to say why... I need to simply separate the values.
My question is basically, is there a URL encodable character or value that can't possibly be entered in a form (textarea or input) which I can use as a separator? Like a null character? Or a non-visible character?
UPDATE: thank you all for your very quick responses. I should've listed the same parameter name example too, but order matters in my case so that wasn't an option either. We solved this by using a %00 URL encoded character (UTF-8 \u0000) as a value separator.
The standard approach to this is to use the same key name twice.
http://www.example.com/search?queries=cars&queries=phones
Most form libraries will let you access it as an array automatically. (If you are using PHP, and making use of $_POST/$_GET rather than reinventing the wheel, you will need to change the name to queries[].)
You can give them each the same parameter name.
http://www.example.com/search?query=cars&query=phones
The average server side HTTP API is able to obtain them as an array. As per your question history, you're using JSP/Servlet, so you can use HttpServletRequest#getParameterValues() for this.
String[] queries = request.getParameterValues("query");
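For comparison, a minimal sketch of the same repeated-parameter approach in Python's standard library: urllib.parse.parse_qs collects repeated keys into a list, in the order they appear in the query string. The URL is the one from the answer above.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.example.com/search?query=cars&query=phones"

# parse_qs maps each parameter name to a list of its values, in order of appearance.
params = parse_qs(urlsplit(url).query)
queries = params["query"]
print(queries)
```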
Just URL-encode the user input so that their commas become %2C.
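A rough sketch of that idea in Python: percent-encode each value on its own, then join the encoded values with a literal comma, so any comma typed by the user travels as %2C. The helper names `pack_values` and `unpack_values` are hypothetical. Note the assumption that the receiving side splits the raw, still-encoded parameter value; many frameworks decode the value before handing it over, which would turn %2C back into a comma too early.

```python
from urllib.parse import quote, unquote


def pack_values(values):
    """Percent-encode each value (commas included), then join with a literal comma."""
    return ",".join(quote(value, safe="") for value in values)


def unpack_values(raw):
    """Split the still-encoded string on commas, then decode each part."""
    return [unquote(part) for part in raw.split(",")]


original = ["cars, used", "phones"]
packed = pack_values(original)
restored = unpack_values(packed)

print(packed)
print(restored)
```

Because the list order is preserved through the join and split, this also fits the case where the order of the values matters.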
Come up with your own separator that is unlikely to get entered in a query. Two underscores '__' for example.
Why not just do something like "||"? Anyone who types that into a search area probably fell asleep on their keyboard :} Then just explode it on the backend.
The easiest thing to do would be to use a custom separator like [!!ValSep!!].