Apache Drill - Get max directory in each path?

Drill lets you query multiple directories simultaneously, and lets you control which ones with the dir0/dir1/etc variables.
It also lets you find the MAXDIR or MINDIR with its directory functions. So you can, for example, write a new version of the files in a directory and ensure Drill always uses the newest one.
Is there a way to query the newest version of each leaf directory, though? For example:
2018/
  01/
    v1/
    v2/
  02/
    v1/
    v2/
I'd like to select only the data in the v2 directories for each month. So, dir0 would be 2018, dir1 would be *, and I'd want the MAX(dir2).
I was thinking of something like this:
SELECT count(*)
FROM dfs.`/path/drill-data/`
where
dir0 = '2018' and dir1 = '*' and dir3 = MAXDIR('dfs', dir1);
but it doesn't seem to work; it says something about a null-related error with the MAXDIR function. I suspect I need to provide a full path as the second parameter but then I think it would probably choose a single max directory and not one per leaf folder.

Figured it out.
If a directory variable would be a wild-card, just don't put it in the where clause.
Use concat() to build the path using the dir variables for the second parameter of MAXDIR. This way you can make a path based on the current values of the DIR variables on the current record.
For example (note that this example has some extra directory levels):
SELECT distinct epoch_hour, concat(dir0, '-', dir1, '-', dir2, '-', dir3) as origin
FROM dfs.`/path/drill-data/`
where
dir0 = '2018' and dir1 = '01'
and dir3 = MAXDIR('dfs', concat('/path/drill-data/', dir0, '/', dir1, '/', dir2, '/'))
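To make the "one max per leaf folder" idea concrete, here is a small Python sketch of the selection the query above is aiming for. It is not Drill code: `newest_leaf_dirs` is a hypothetical helper, and it assumes the newest directory is the one whose name sorts last in a plain string comparison.

```python
def newest_leaf_dirs(paths):
    """For each parent directory, keep the child whose name sorts last."""
    newest = {}
    for path in paths:
        # Split "2018/01/v2" into parent "2018/01" and leaf "v2".
        parent, _, leaf = path.strip("/").rpartition("/")
        if parent not in newest or leaf > newest[parent]:
            newest[parent] = leaf
    return {parent: f"{parent}/{leaf}" for parent, leaf in newest.items()}


dirs = ["2018/01/v1", "2018/01/v2", "2018/02/v1", "2018/02/v2"]
print(newest_leaf_dirs(dirs))
```

One thing this makes visible: with string ordering, "v10" sorts before "v2", so zero-padded version names (v01, v02, ...) are safer if you expect more than nine versions.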

Related

MySQL SUBSTR LOCATE multi-search-strings

Tricky one, and my brain is mush after staring at my screen for about an hour.
I'm trying to query my database to return the first part of a string (domain name eg. http://www.example.com) in the column image_link.
I have managed this for all rows where the image_link contains .com as part of the string... but I need the code to be more versatile, so it searches for the likes of .net and .co.uk too.
Had thought some sort of nested REPLACE might work, but it doesn't make sense when I try to apply it - and I'm stuck.
Query Builder code:
$builder->select("SUBSTRING(image_link, 1, LOCATE('.com', image_link) + 3) AS domain");
Example strings, with desired results:
http://www.example.com/brands/567.jpg // http://www.example.com
https://www.example.org/photo.png // https://www.example.org
http://example.net/789 // http://example.net
Any help/advice warmly welcomed!
SELECT ... ,
SUBSTRING_INDEX(image_link, '/', 3) domain
FROM test;
Or, if protocol may be absent, then
SELECT ... ,
SUBSTRING_INDEX(image_link, '/', CASE WHEN LOCATE('//', image_link) THEN 3 ELSE 1 END) domain
FROM test;
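If it helps to see why a count of 3 works, here is a rough Python imitation of SUBSTRING_INDEX for positive counts, run against the example strings from the question. The helper names `substring_index` and `domain` are made up for this sketch.

```python
def substring_index(text, delim, count):
    """Imitate MySQL SUBSTRING_INDEX for a positive count:
    everything before the count-th occurrence of delim."""
    return delim.join(text.split(delim)[:count])


def domain(image_link):
    # "http://host/path" has two slashes before the host ends, so keep 3 parts;
    # without a protocol the host is simply the first part.
    count = 3 if "//" in image_link else 1
    return substring_index(image_link, "/", count)


for link in [
    "http://www.example.com/brands/567.jpg",
    "https://www.example.org/photo.png",
    "http://example.net/789",
]:
    print(link, "->", domain(link))
```

Because the split is on slashes rather than on the TLD, it does not matter whether the host ends in .com, .net or .co.uk.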

Calculating Average after MySQL Query is Run

I've run a MySQL query (this is in wordpress php):
$myQuery = $wpdb->get_results('SELECT Opponent, ROUND(AVG(Points),2)
AS Avg_Points, ROUND(AVG(Plus_Minus),2) AS Avg_Plus_Minus
FROM ' . 'afl_defense_v_position' . ' WHERE Position = "MID"
AND Rank <= 1 AND Round >= 10 GROUP BY Opponent
ORDER BY Avg_Plus_Minus DESC');
This all works fine and I can build an html table off this no problem.
What I am looking to do now though is find the standard deviation and average of the resulting Avg_Plus_Minus column and assign them to php variables so that I can use them to colour the table rows.
How do I assign these variables? (Once assigned I know how to code the colours)
I know how to do this by running another MySQL query and modifying the aforementioned code, however, I assume there is an easier way to calculate these from the array result of the original query.
Any help is appreciated.
You can install the PECL extension stats and then use the stats_standard_deviation() function to compute the standard deviation. Also look at http://php.net/manual/en/function.stats-standard-deviation.php for a pure PHP implementation (if you do not want to install the extension).
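The arithmetic itself is small enough to do by hand on the result array. Below is a sketch in Python rather than PHP, just to show the steps; the rows are invented sample data, `summarize` is a hypothetical helper, and it assumes you want the population standard deviation (dividing by n).

```python
from statistics import mean, pstdev


def summarize(rows, column):
    """Return (average, population standard deviation) of one column."""
    values = [float(row[column]) for row in rows]
    return mean(values), pstdev(values)


# Invented stand-in for the rows returned by the original query.
rows = [
    {"Opponent": "A", "Avg_Plus_Minus": 2.0},
    {"Opponent": "B", "Avg_Plus_Minus": 4.0},
    {"Opponent": "C", "Avg_Plus_Minus": 6.0},
]

avg, sd = summarize(rows, "Avg_Plus_Minus")
print(avg, sd)
```

The same loop over $myQuery in PHP would collect Avg_Plus_Minus into an array, after which the mean is the sum divided by the count and the deviation is the square root of the mean squared difference from that mean.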

SQL Querying inside XML column

I am trying to query an SQL table that has an XML column.
Table name: 'Purchase'
Column name: 'XML_COL'
Please find below xml data for column name 'XML_COL' under purchase table:
<ns1:Request xmlns:ns1="http://www.sample.com/hic/event/request"
xmlns:ns2="http://www.sample.com/hic/eventpayload/request">
<ns1:createeventRequest>
<ns1:eventPayLoad>
<ns2:eventPayLoad>
<Id>123456</Id>
</ns2:eventPayLoad>
</ns1:eventPayLoad>
</ns1:createeventRequest>
</ns1:Request>
I have written the query below:
select * from purchase,
XMLTABLE ('$d/Request/createeventRequest/eventPayLoad/eventPayLoad' PASSING XML_COL as "d"
COLUMNS
Id varchar(20) PATH 'Id') as a where(a.Id like '1234%');
But this returns an empty column with no data.
My requirement is that it should fetch all the data for this particular Id.
Please help if anyone has faced this kind of issue.
Do we need to include the namespaces as well while querying, or am I missing anything?
I think the expression PATH 'Id' is a bit too simple...
I'm not familiar with MySQL's abilities to query XML... The Path Id would try to find an element "Id" from the current node (which is the root node in the first action). But there is no "Id"... You must either specify the full path, starting with a single / to start at the root node, or let the engine try a deep search, starting with two //
These paths should work:
SELECT ExtractValue(
'<ns1:Request xmlns:ns1="http://www.sample.com/hic/event/request" xmlns:ns2="http://www.sample.com/hic/eventpayload/request">
<ns1:createeventRequest>
<ns1:eventPayLoad>
<ns2:eventPayLoad>
<Id>123456</Id>
</ns2:eventPayLoad>
</ns1:eventPayLoad>
</ns1:createeventRequest>
</ns1:Request>',
'/ns1:Request[1]/ns1:createeventRequest[1]/ns1:eventPayLoad[1]/ns2:eventPayLoad[1]/Id[1]' ) AS result;
If there is only one element with a value (in your case "Id") you might use the simple deep search like this:
SELECT ExtractValue(
'<ns1:Request xmlns:ns1="http://www.sample.com/hic/event/request" xmlns:ns2="http://www.sample.com/hic/eventpayload/request">
<ns1:createeventRequest>
<ns1:eventPayLoad>
<ns2:eventPayLoad>
<Id>123456</Id>
</ns2:eventPayLoad>
</ns1:eventPayLoad>
</ns1:createeventRequest>
</ns1:Request>',
'//Id[1]' ) AS result;
But - in general - it is good advice to be as specific as possible...
Just cracked the query... When namespaces are used in the XML, instead of writing out the entire path, I found it's better to use '/*//', which searches the whole document for the required element tag.
Final Query:
select * from purchase,
XMLTABLE('$d' PASSING XML_COL as "d"
COLUMNS
Id varchar(20) PATH '/*//Id') as a where(a.Id like '1234%') with ur
Using 'with ur' helps to read the data that has not been committed in the database.
Please post comments if it is helpful.
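For anyone wondering why the first attempt came back empty, the effect is easy to reproduce outside the database. This is a Python sketch using the standard library's ElementTree, not the XMLTABLE engine itself, so treat it as an illustration of namespace matching in general; the XML and namespace URIs are copied from the question.

```python
import xml.etree.ElementTree as ET

XML = """<ns1:Request xmlns:ns1="http://www.sample.com/hic/event/request"
    xmlns:ns2="http://www.sample.com/hic/eventpayload/request">
  <ns1:createeventRequest>
    <ns1:eventPayLoad>
      <ns2:eventPayLoad>
        <Id>123456</Id>
      </ns2:eventPayLoad>
    </ns1:eventPayLoad>
  </ns1:createeventRequest>
</ns1:Request>"""

NS = {
    "ns1": "http://www.sample.com/hic/event/request",
    "ns2": "http://www.sample.com/hic/eventpayload/request",
}

root = ET.fromstring(XML)

# Path without prefixes: the elements live in namespaces, so nothing matches.
unqualified = root.find("createeventRequest/eventPayLoad/eventPayLoad/Id")

# Same path with each step qualified by its namespace prefix.
qualified = root.find(
    "ns1:createeventRequest/ns1:eventPayLoad/ns2:eventPayLoad/Id", NS
)

# Deep search: Id itself has no namespace, so this finds it directly.
deep = root.find(".//Id")

print(unqualified, qualified.text, deep.text)
```

The deep search is the shortest to write, but as noted above it will pick up any Id element anywhere in the document, so the qualified path is safer when the structure is known.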

How to get post slug from Drupal database

I'm trying to export Drupal posts over to Wordpress (which is in itself a hassle). I can't figure out how to maintain the URLs of the blog posts though. Some of them are customized:
Blog titled Story of Soil is blog/2012/03/03/soil-story in Drupal. One titled Welcome John Doe is simply /john
Is there a Drupal function for making these URLs? Where does it store the customized blog post URLs?
You can get the URL Alias by using the url method.
$url = url('node/' . $nid);
You should be able to get the Alias for a node by using drupal_lookup_path:
// alias: return an alias for a given Drupal system path (if one exists).
$alias = drupal_lookup_path('alias', 'node/' . $node->nid);
Drupal manual: drupal_lookup_path or the reverse, look up node/internal path from alias: drupal_get_normal_path.
It seems the url function that Rawkode posted does about the same, so I guess it comes down to your personal preference.
Also see: http://daipratt.co.uk/how-to-get-the-path-of-a-node-from-the-node-id-in-drupal/
I found the above tremendously helpful because I was hoping to get what I needed directly from the database rather than by using a function. It gave me a good starting point, but running it as written only produces errors because those are not the right column names. Keep in mind that this updates rows in place and will break everything if you are not working on a copy of the database. This code will fix it:
UPDATE `url_alias`
SET source=SUBSTRING_INDEX(`source`, '/', -1),
alias=SUBSTRING_INDEX(`alias`, '/', -1)
WHERE 1
Since I'm not big on doing things in place, I just added two columns, nid and slug, and then ran this query:
UPDATE `url_alias`
SET nid=SUBSTRING_INDEX(`source`, '/', -1),
slug=SUBSTRING_INDEX(`alias`, '/', -1)
WHERE 1
I found the url_alias table in the Drupal database, and I ran this SQL statement:
UPDATE `url_alias`
SET src=SUBSTRING_INDEX(`src`, '/', -1),
dst=SUBSTRING_INDEX(`dst`, '/', -1)
WHERE 1
src is now the nid and dst is now the slug. I can rename them and then INSERT INTO wp_posts as post_name
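To see what SUBSTRING_INDEX(..., '/', -1) leaves behind before running it on real data, here is the same "keep only the last path segment" step as a Python sketch. `last_segment` is a made-up helper, and node/42 is an invented node path used only for illustration.

```python
def last_segment(path):
    """Keep whatever follows the final slash (the whole string if there is none)."""
    return path.rsplit("/", 1)[-1]


# (source, alias) pairs shaped like rows of url_alias; node ids are invented.
aliases = [
    ("node/42", "blog/2012/03/03/soil-story"),
    ("node/43", "john"),
]

for source, alias in aliases:
    print(last_segment(source), last_segment(alias))
```

Note that the date part of the alias (blog/2012/03/03/) is discarded, so if the old URLs must keep working you would need to hold on to the full alias as well.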

Combine 'like' and 'in' in a SqlServer Reporting Services query?

The following doesn't work, but something like this is what I'm looking for.
select *
from Products
where Description like (@SearchedDescription + %)
SSRS uses the @ operator in front of a parameter to simulate an 'in', and I can't find a way to match a string against a list of strings.
There are a few options on how to use a LIKE operator with a parameter.
OPTION 1
If you add the % to the parameter value, then you can customize how the LIKE filter will be processed. For instance, your query could be:
SELECT name
FROM master.dbo.sysobjects
WHERE name LIKE @ReportParameter1
For the data set to use the LIKE statement properly, you could use a parameter value like sysa%. When I tested a sample report in SSRS 2008 using this code, it returned the following four tables:
sysallocunits
sysaudacts
sysasymkeys
sysaltfiles
OPTION 2
Another way to do this that doesn't require the user to add any '%' symbol is to build the query text in a variable and execute that variable.
DECLARE @DynamicSQL NVARCHAR(MAX)
SET @DynamicSQL =
'SELECT name, id, xtype
FROM dbo.sysobjects
WHERE name LIKE ''' + @ReportParameter1 + '%''
'
EXEC (@DynamicSQL)
This will give you finer control over how the LIKE statement will be used. If you don't want users to inject any additional operators, then you can always add code to strip out non-alphanumeric characters before merging the value into the final query.
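As a rough illustration of the sanitising idea in Option 2, here is a Python sketch that strips non-alphanumeric characters and then appends the '%' inside the query. It swaps in two things that are not part of the answer above: SQLite as a stand-in for SQL Server, and a bound parameter instead of string concatenation. The `objects` table and both helper names are hypothetical.

```python
import re
import sqlite3


def strip_non_alnum(value):
    """Drop everything except ASCII letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def find_by_prefix(conn, prefix):
    # The wildcard is appended in SQL, so the user never types '%'.
    cur = conn.execute(
        "SELECT name FROM objects WHERE name LIKE ? || '%' ORDER BY name",
        (strip_non_alnum(prefix),),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name TEXT)")
conn.executemany(
    "INSERT INTO objects VALUES (?)",
    [("sysallocunits",), ("sysaltfiles",), ("sysobjects",)],
)

print(find_by_prefix(conn, "sysa"))
print(find_by_prefix(conn, "sysa' OR 1=1 --"))
```

With a bound parameter the stripping step is a second line of defence rather than the only one, which is the main reason to prefer it over building the SQL string by hand.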
OPTION 3
You can create a stored procedure that controls this functionality. I generally prefer to use stored procedures as data sources for SSRS and never allow dynamically generated SQL, but that's just a preference of mine. This helps with discoverability when performing dependency analysis checks and also allows you to ensure optimal query performance.
OPTION 4
Create a .NET code assembly that helps dynamically generate the SQL code. I think this is overkill and a poor choice at best, but it could work conceivably.
Have you tried to do:
select * from Products where Description like (@SearchedDescription + '%')
(Putting single quotes around the % sign?)
Dano, which version of SSRS are you using? If it's RS2000, the multi-parameter list is not officially supported, but there is a workaround....
Put it like this:
select *
from tsStudent
where studentName like @SName+'%'
I know this is super old, but this came up in my search to solve the same problem, and I wound up using a solution not described here. I'm adding a new potential solution to help whoever else might follow.
As written, this solution only works in SQL Server 2016 and later, but can be adapted for older versions by writing a custom string_split UDF, and by using a subquery instead of a CTE.
First, map your @SearchedDescription into your Dataset as a single string using JOIN:
=JOIN(@SearchedDescription, ",")
Then use STRING_SPLIT to map your "A,B,C,D" kind of string into a tabular structure.
;with
SearchTerms as (
select distinct
Value
from
string_split(@SearchedDescription, ',')
)
select distinct
*
from
Products
inner join SearchTerms on
Products.Description like SearchTerms.Value + '%'
If someone adds the same search term multiple times, this would duplicate rows in the result set. Similarly, a single product could match multiple search terms. I've added distinct to both the SearchTerms CTE and the main query to try to suppress this inappropriate row duplication.
If your query is more complex (including results from other joins) then this could become an increasingly big problem. Just be aware of it, it's the main drawback of this method.
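Here is a small runnable sketch of the same split-then-join idea, in case anyone wants to experiment with the duplicate behaviour. It uses Python with SQLite as a stand-in for SQL Server, does the splitting in Python instead of STRING_SPLIT, and the `products` rows and the `search_products` helper are invented for the example.

```python
import sqlite3


def search_products(conn, joined_terms):
    """Prefix-match products against a comma-joined list of search terms."""
    # A set drops repeated terms, like DISTINCT in the SearchTerms CTE.
    terms = {t.strip() for t in joined_terms.split(",") if t.strip()}
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS search_terms (value TEXT)")
    conn.execute("DELETE FROM search_terms")
    conn.executemany(
        "INSERT INTO search_terms VALUES (?)", [(t,) for t in terms]
    )
    # DISTINCT on the outer query removes products matched by several terms.
    cur = conn.execute(
        "SELECT DISTINCT p.description FROM products p "
        "JOIN search_terms s ON p.description LIKE s.value || '%' "
        "ORDER BY p.description"
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (description TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("Apple pie",), ("Apricot jam",), ("Banana bread",)],
)

print(search_products(conn, "Ap,App,Ap"))
```

"Apple pie" matches both "Ap" and "App" here, so dropping the outer DISTINCT is a quick way to see the row duplication described above.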