Spark data frame actions for selected rows [closed] - mysql

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a DataFrame like the one below:
+-----+-------------------+
|state|timestamp          |
+-----+-------------------+
|0    |Sun Aug 13 10:58:44|
|1    |Sun Aug 13 11:59:44|
|1    |Sun Aug 13 12:50:43|
|1    |Sun Aug 13 13:00:44|
|0    |Sun Aug 13 13:58:42|
|0    |Sun Aug 13 14:00:41|
|0    |Sun Aug 13 14:30:45|
|0    |Sun Aug 13 14:58:46|
|1    |Sun Aug 13 15:00:47|
|0    |Sun Aug 13 16:00:49|
+-----+-------------------+
I need to select the timestamps only where the state changes (from 0 to 1 or from 1 to 0) and separate out these rows:
Sun Aug 13 11:59:44
Sun Aug 13 13:58:42
Sun Aug 13 15:00:47
Sun Aug 13 16:00:49
Then take the time differences and sum them up.
Can someone suggest what kind of query I should write for this?
I need a result like this:
(13:58:42 - 11:59:44) + (16:00:49 - 15:00:47)

A window function (lag) should help with your first need, and a filter will fulfill your second need. Your third need can be fulfilled by extracting the time from the date-time value.
Given a DataFrame such as
+-----+-------------------+
|state|timestamp |
+-----+-------------------+
|0 |Sun Aug 13 10:58:44|
|1 |Sun Aug 13 11:59:44|
|1 |Sun Aug 13 12:50:43|
|1 |Sun Aug 13 13:00:44|
|0 |Sun Aug 13 13:58:42|
|0 |Sun Aug 13 14:00:41|
|0 |Sun Aug 13 14:30:45|
|0 |Sun Aug 13 14:58:46|
|1 |Sun Aug 13 15:00:47|
|0 |Sun Aug 13 16:00:49|
+-----+-------------------+
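The following should solve your first and second needs:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// temp = previous row's state (rows ordered by timestamp); the first row has none, so default it to 0
df.withColumn("temp", lag("state", 1).over(Window.orderBy("timestamp")))
  .withColumn("temp", when(col("temp").isNull, lit(0)).otherwise(col("temp")))
  // keep only the rows whose state differs from the previous row's, i.e. where the state changes
  .filter(col("state") =!= col("temp"))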
You should get:
+-----+-------------------+----+
|state|timestamp |temp|
+-----+-------------------+----+
|1 |Sun Aug 13 11:59:44|0 |
|0 |Sun Aug 13 13:58:42|1 |
|1 |Sun Aug 13 15:00:47|0 |
|0 |Sun Aug 13 16:00:49|1 |
+-----+-------------------+----+
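Now, regarding your third need: you need to find a way to extract the time from the timestamp column, and then do something like the following:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"..." syntax; assumes your SparkSession is named spark
df.withColumn("temp", lag("state", 1).over(Window.orderBy("timestamp")))
  .withColumn("temp", when(col("temp").isNull, lit(0)).otherwise(col("temp")))
  .filter(col("state") =!= col("temp"))
  // collect the change timestamps into a single array column
  .select(collect_list(col("timestamp")).as("time"))
  // build "time(1) - time(0) + time(3) - time(2)"; this only handles exactly two on/off pairs
  .withColumn("time", concat_ws(" + ", concat_ws(" - ", $"time"(1), $"time"(0)), concat_ws(" - ", $"time"(3), $"time"(2))))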
You should get:
+-------------------------------------------------------------------------------------+
|time |
+-------------------------------------------------------------------------------------+
|Sun Aug 13 13:58:42 - Sun Aug 13 11:59:44 + Sun Aug 13 16:00:49 - Sun Aug 13 15:00:47|
+-------------------------------------------------------------------------------------+
I hope the answer is helpful, even though it doesn't cover extracting the time value from the timestamp column.
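For the missing arithmetic step, here is a minimal plain-Python sketch (not Spark) of the same idea: treat the state before the first row as 0, as the lag/when code does, pair each 0-to-1 change with the next 1-to-0 change, and sum the differences. It assumes the strings always look like those in the question; because they carry no year, a hypothetical `ASSUMED_YEAR` is prepended so that `strptime` can build a full datetime. The helper names are illustrative only.

```python
from datetime import datetime, timedelta

# Sample data from the question: (state, timestamp without a year).
ROWS = [
    (0, "Sun Aug 13 10:58:44"),
    (1, "Sun Aug 13 11:59:44"),
    (1, "Sun Aug 13 12:50:43"),
    (1, "Sun Aug 13 13:00:44"),
    (0, "Sun Aug 13 13:58:42"),
    (0, "Sun Aug 13 14:00:41"),
    (0, "Sun Aug 13 14:30:45"),
    (0, "Sun Aug 13 14:58:46"),
    (1, "Sun Aug 13 15:00:47"),
    (0, "Sun Aug 13 16:00:49"),
]

# Hypothetical: the source strings have no year, so one is assumed for parsing.
ASSUMED_YEAR = 2017


def parse_ts(ts: str) -> datetime:
    """Parse 'Sun Aug 13 10:58:44' by prefixing the assumed year."""
    return datetime.strptime(f"{ASSUMED_YEAR} {ts}", "%Y %a %b %d %H:%M:%S")


def total_time_in_state_1(rows) -> timedelta:
    """Sum (1 -> 0 change time) minus (preceding 0 -> 1 change time)."""
    total = timedelta(0)
    prev_state = 0  # same default the lag()/when() code uses for the first row
    start = None
    for state, ts in sorted(rows, key=lambda r: parse_ts(r[1])):
        t = parse_ts(ts)
        if prev_state == 0 and state == 1:
            start = t  # state switched on: open an interval
        elif prev_state == 1 and state == 0 and start is not None:
            total += t - start  # state switched off: close the interval
            start = None
        prev_state = state
    return total


# (13:58:42 - 11:59:44) + (16:00:49 - 15:00:47)
print(total_time_in_state_1(ROWS))  # prints 2:59:00
```

If the data ends while the state is still 1, the open interval is simply ignored here; whether that is right depends on what the sum is meant to measure.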

Related

I need a query to get output like a pivot table [duplicate]

This question already has answers here:
How can I return pivot table output in MySQL?
(10 answers)
Closed 2 years ago.
My table looks like this:
name | value | monthyear | year
g1   | 10    | March 18  | 2018
g1   | 11    | March 18  | 2018
g1   | 34    | March 19  | 2019
g1   | 45    | March 19  | 2019
g2   | 10    | April 18  | 2018
g1   | 11    | May 18    | 2018
g1   | 34    | May 19    | 2019
g1   | 45    | June 19   | 2019
And I need output like this:
name | March 18 | March 19 | April 18 | May 18 | May 19 | June 19
g1   | 21       | 79       |          | 11     | 34     | 45
g2   |          |          | 10       |        |        |
I get this with a pivot table. How can I get the same table with an SQL query? Thanks in advance.
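You can use a CASE expression inside SUM (conditional aggregation), with one column per monthyear value:
select
  name,
  sum(case when monthyear = 'March 18' then value end) as March_18,
  sum(case when monthyear = 'March 19' then value end) as March_19,
  sum(case when monthyear = 'April 18' then value end) as April_18,
  sum(case when monthyear = 'May 18' then value end) as May_18,
  sum(case when monthyear = 'May 19' then value end) as May_19,
  sum(case when monthyear = 'June 19' then value end) as June_19
from yourTable
group by name;
Output: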
| name | March_18 | March_19 | April_18 | May_18 | May_19 | June_19 |
| ---- | -------- | -------- | -------- | ------ | ------ | ------- |
| g1   | 21       | 79       |          | 11     | 34     | 45      |
| g2 | | | 10 | | | |
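The conditional-aggregation query can be checked without a MySQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in (SQLite accepts the same `SUM(CASE ...)` pattern), loads the rows from the question, and generates one column per month label; the year column is left out because the query does not use it, and the helper name `pivot` is hypothetical.

```python
import sqlite3

# Rows from the question: (name, value, monthyear); the year column is omitted.
ROWS = [
    ("g1", 10, "March 18"),
    ("g1", 11, "March 18"),
    ("g1", 34, "March 19"),
    ("g1", 45, "March 19"),
    ("g2", 10, "April 18"),
    ("g1", 11, "May 18"),
    ("g1", 34, "May 19"),
    ("g1", 45, "June 19"),
]
MONTHS = ["March 18", "March 19", "April 18", "May 18", "May 19", "June 19"]


def pivot(rows, months):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE yourTable (name TEXT, "value" INTEGER, monthyear TEXT)')
        conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
        # One SUM(CASE ...) per month; a month with no rows for a name yields NULL (None).
        cols = ", ".join(
            f'SUM(CASE WHEN monthyear = ? THEN "value" END) AS "{m}"' for m in months
        )
        sql = f"SELECT name, {cols} FROM yourTable GROUP BY name ORDER BY name"
        return conn.execute(sql, months).fetchall()
    finally:
        conn.close()


for row in pivot(ROWS, MONTHS):
    print(row)
# ('g1', 21, 79, None, 11, 34, 45)
# ('g2', None, None, 10, None, None, None)
```

Generating the column list from the data only helps when the set of month labels is known up front; plain SQL still needs one explicit column per label.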

MySQL `SUM` with `GROUP BY` is missing data in results

I have a database with a table containing information on some images; each row has a createdAt date and a viewCount. The data ranges from September 2014 until today (July 2016). I want to get the monthly sum of views across all images.
When I run the query:
SELECT YEAR(createdAt), MONTH(createdAt), SUM(viewCount)
FROM Images
GROUP BY MONTH(createdAt);
I only get 12 rows back, covering September 2014 to August 2015:
Year | Month | Views
-------------------
2014 | 9 | 1452
2014 | 10 | 279
2014 | 11 | 34428
2014 | 12 | 4763
2015 | 1 | 2826
2015 | 2 | 777
2015 | 3 | 568
2015 | 4 | 1309
2015 | 5 | 46744
2015 | 6 | 1541
2015 | 7 | 8160
2015 | 8 | 91
If I add a date constraint, it gives me the latest data, but again only 12 rows:
SELECT YEAR(createdAt), MONTH(createdAt), SUM(viewCount)
FROM Images WHERE createdAt > DATE('2015-08-01 00:00:00')
GROUP BY MONTH(createdAt);
Year | Month | Views
--------------------
2015 | 8 | 981
2015 | 9 | 1031
2015 | 10 | 2566
2015 | 11 | 3325
2015 | 12 | 411
2016 | 1 | 2140
2016 | 2 | 710
2016 | 3 | 714
2016 | 4 | 1985
2016 | 5 | 426
2016 | 6 | 119
2016 | 7 | 81
I realise that the second query stops at July 2016 because that's where the data ends, but why doesn't the first query return all the results?
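Group by both year and month. With GROUP BY MONTH(createdAt) alone, September 2014 and September 2015 (and likewise every other month) fall into the same group, so you can never get more than 12 rows, and the YEAR(createdAt) shown for each group is simply taken from one of its rows:
SELECT YEAR(createdAt), MONTH(createdAt), SUM(viewCount)
FROM Images
-- WHERE createdAt > DATE('2015-08-01 00:00:00')
GROUP BY YEAR(createdAt), MONTH(createdAt);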
Related: Group by clause in MySQL and PostgreSQL, why the error in PostgreSQL?
Keep in mind that from MySQL 5.7.6 onward, your initial query may not even run, because ONLY_FULL_GROUP_BY is enabled by default.
You can simply add the year to your GROUP BY:
SELECT YEAR(createdAt), MONTH(createdAt), SUM(viewCount)
FROM Images
GROUP BY YEAR(createdAt), MONTH(createdAt)
ORDER BY YEAR(createdAt), MONTH(createdAt)
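To see why grouping by month alone caps the result at 12 rows, here is a small sketch that uses Python's `sqlite3` module as a stand-in for MySQL; SQLite's `strftime('%Y', ...)` and `strftime('%m', ...)` replace `YEAR()` and `MONTH()`, which is an assumption of the sketch rather than MySQL syntax. It inserts one hypothetical row per month from September 2014 to July 2016 and counts the groups each GROUP BY produces.

```python
import sqlite3


def month_dates(start=(2014, 9), end=(2016, 7)):
    """One hypothetical createdAt value (the 15th) for every month in the range."""
    y, m = start
    out = []
    while (y, m) <= end:
        out.append(f"{y:04d}-{m:02d}-15")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def group_counts():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Images (createdAt TEXT, viewCount INTEGER)")
        conn.executemany("INSERT INTO Images VALUES (?, 1)", [(d,) for d in month_dates()])
        # Month only: every September (2014 and 2015) lands in the same group.
        by_month = conn.execute(
            "SELECT strftime('%m', createdAt), SUM(viewCount) "
            "FROM Images GROUP BY strftime('%m', createdAt)"
        ).fetchall()
        # Year and month: one group per calendar month.
        by_year_month = conn.execute(
            "SELECT strftime('%Y', createdAt), strftime('%m', createdAt), SUM(viewCount) "
            "FROM Images GROUP BY strftime('%Y', createdAt), strftime('%m', createdAt) "
            "ORDER BY 1, 2"
        ).fetchall()
        return len(by_month), len(by_year_month)
    finally:
        conn.close()


print(group_counts())  # (12, 23)
```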

Order by month giving reverse order in Mysql

I'm trying to order a MySQL query by month name, but it comes out in reverse order. I have tried both ASC and DESC, but neither works. This is what I'm getting:
order_amount month_number
370.245 Dec
0.01 Aug
0.02 July
0.01 May
2 Apr
3 Mar
4 Feb
5 Jan
This is the query:
select sum(amnt) as order_amount,month_number from orders where paid =1 GROUP BY month_number ORDER BY MONTH(month_number) ASC
This is a sample of the table orders on which I'm running the query:
order_amount month_number paid
370.245 Jan 1
0.01 Aug 1
0.02 July 1
0.01 Apr 1
2 May 0
3 Mar 1
4 Feb 0
5 Nov 0
month_number is a lousy name for a column that holds the month's name. That said, your attempts will not work, because MONTH() takes a date, not a string. The best way is probably just to use CASE:
order by (case when month_number = 'Jan' then 1
when month_number = 'Feb' then 2
. . .
when month_number = 'Dec' then 12
end)
I would strongly suggest storing a date or an actual month number in the table, and renaming month_number to something like month_name.
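A runnable version of the CASE approach, with Python's built-in `sqlite3` module standing in for MySQL, might look like the sketch below. It writes the CASE in its "simple" form (`CASE month_number WHEN 'Jan' THEN 1 ...`), which is equivalent to the searched form above. The table and column names (`orders`, `amnt`, `month_number`, `paid`) come from the question's query and the rows from its sample table; because the sample spells one month 'July', the sketch maps both 'Jul' and 'July' to 7. The helper name `paid_totals_by_month` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (amnt, month_number, paid).
ROWS = [
    (370.245, "Jan", 1),
    (0.01, "Aug", 1),
    (0.02, "July", 1),
    (0.01, "Apr", 1),
    (2, "May", 0),
    (3, "Mar", 1),
    (4, "Feb", 0),
    (5, "Nov", 0),
]

# Map each month name to its number; 'July' is included because the sample uses it.
MONTH_ORDER = """
    CASE month_number
        WHEN 'Jan' THEN 1 WHEN 'Feb' THEN 2 WHEN 'Mar' THEN 3
        WHEN 'Apr' THEN 4 WHEN 'May' THEN 5 WHEN 'Jun' THEN 6
        WHEN 'Jul' THEN 7 WHEN 'July' THEN 7 WHEN 'Aug' THEN 8
        WHEN 'Sep' THEN 9 WHEN 'Oct' THEN 10 WHEN 'Nov' THEN 11
        WHEN 'Dec' THEN 12
    END
"""


def paid_totals_by_month(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (amnt REAL, month_number TEXT, paid INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT month_number, SUM(amnt) AS order_amount FROM orders "
            f"WHERE paid = 1 GROUP BY month_number ORDER BY {MONTH_ORDER}"
        ).fetchall()
    finally:
        conn.close()


print([m for m, _ in paid_totals_by_month(ROWS)])  # ['Jan', 'Mar', 'Apr', 'July', 'Aug']
```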
According to your sample, you've got 'Dec', 'Aug' and so on in your month_number field.
How is MONTH(month_number) supposed to work if you don't pass it a valid date? If you pass arbitrary text, you just get NULL, and ordering by a column that is all NULL gives you no meaningful order.
http://sqlfiddle.com/#!2/3a774d/3/0
http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_month
SELECT * FROM my_table;
+-------+
| month |
+-------+
| Apr |
| Aug |
| Dec |
| Feb |
| Jan |
| Jul |
| Jun |
| Mar |
| May |
| Nov |
| Oct |
| Sep |
+-------+
SELECT * FROM my_table ORDER BY STR_TO_DATE(CONCAT('2014-',month,'-01'),'%Y-%b-%d');
+-------+
| month |
+-------+
| Jan |
| Feb |
| Mar |
| Apr |
| May |
| Jun |
| Jul |
| Aug |
| Sep |
| Oct |
| Nov |
| Dec |
+-------+
... or something like that
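The STR_TO_DATE trick works by turning each abbreviation into a real date in an arbitrary year and sorting on that. A minimal Python analogue, assuming three-letter English month abbreviations and the default C locale, uses `datetime.strptime` with the same `'%Y-%b-%d'` pattern as a sort key; the function name `month_key` is hypothetical. Note that it raises `ValueError` for a value such as 'July', which is not a three-letter abbreviation.

```python
from datetime import datetime

# The unsorted month column from the example table above.
MONTHS = ["Apr", "Aug", "Dec", "Feb", "Jan", "Jul",
          "Jun", "Mar", "May", "Nov", "Oct", "Sep"]


def month_key(abbr: str) -> datetime:
    """Mirror STR_TO_DATE(CONCAT('2014-', month, '-01'), '%Y-%b-%d')."""
    return datetime.strptime(f"2014-{abbr}-01", "%Y-%b-%d")


print(sorted(MONTHS, key=month_key))
# ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
```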

MySQL extract a single word from a text field

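Here is what my table looks like:
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+-----------+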
| tw_dt | text | ID |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| Thu Dec 12 19:14:13 +0000 2013 | #CoSignCountdown airs MON-FRI #7p send in your hottest song to cosigncountdown#gmail.com Original music ONLY!!! #TEAMFOLLOWBACK #HipHopDotCA | 14985388 |
| Thu Dec 12 19:14:12 +0000 2013 | "Get any Beat at http://t.co/R61XJ5EYUQ for $200 each. Exclusive Rights. To pruchase - GummyBeatz#Gmail.com" | 170900539 |
| Thu Dec 12 19:14:09 +0000 2013 | "#iwantmixedbabys PREMO is looking for Models for the SuperBadd fashionshow January10th 3000 EXPECTED email us premopresents#gmail.com" | 343062689 |
| Thu Dec 12 19:14:09 +0000 2013 | mixtape hosting contact . bookdjlyrical#gmail.com | 99644471 |
| Thu Dec 12 19:14:09 +0000 2013 | checkout http://t.co/4f2txyVpLy RIGHT NOW!!!! #Artist send in music and videos to preciseearz#gmail.com #BlogsAreTheNewMixtapes | 27872863 |
| Thu Dec 12 20:01:10 +0000 2013 | ATTENTION ARTISTS: DO YOU NEED US TO HELP YOU PROMOTE YOUR MUSIC?!? EMAIL US FOR OUR PROMO PACKAGES AT VIEWHIPHOP#gmail.com | 357913081 |
| Thu Dec 12 20:01:10 +0000 2013 | RT #ITSMIKESWAGGER: #Mike_Bell1 no doubt my artist appreciates the love mikeswagger#gmail.com hit me up | 165499064 |
| Thu Dec 12 20:01:10 +0000 2013 | #grandst #cmj_333 - swking1964#gmail.com is my Santa - but just needs to be aware of your site in General! Happy Holidays! | 574317429 |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+-----------+
I want to extract the email address from the text into its own column.
Any thoughts on how to accomplish this in MySQL?
Well, there's this huge regex for matching e-mail addresses. Combined with the regex replace solution found here, you should be good to go.
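As a rough illustration of the regex idea, here is a sketch in Python rather than MySQL, with a deliberately simple pattern instead of a full RFC-grade one. It pulls the first e-mail-like token out of each text. It assumes the '#' before 'gmail.com' in the scraped table stands for '@', so the sample strings are three rows from the table with only the '#' inside each address changed to '@'. The names `EMAIL_RE` and `extract_email` are hypothetical.

```python
import re

# Deliberately simple e-mail pattern; it will miss or over-match some edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_email(text):
    """Return the first e-mail-like substring in text, or None if there is none."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None


SAMPLES = {
    14985388: "#CoSignCountdown airs MON-FRI #7p send in your hottest song to "
              "cosigncountdown@gmail.com Original music ONLY!!! #TEAMFOLLOWBACK #HipHopDotCA",
    170900539: '"Get any Beat at http://t.co/R61XJ5EYUQ for $200 each. Exclusive Rights. '
               'To pruchase - GummyBeatz@Gmail.com"',
    99644471: "mixtape hosting contact . bookdjlyrical@gmail.com",
}

for row_id, text in SAMPLES.items():
    print(row_id, extract_email(text))
```

In the database itself the same pattern would go into whatever regex function your MySQL version provides; check the manual for your version before relying on one.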

Header wrapping to new line when each column is smaller than its size

I have a column breakout that has a header that is rather long. The breakout always has more than one column (usually 12, one for each month).
My title is: "2012 Forecasts", so I would like it to look like this:
| 2012 Forecasts
|Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
The issue is that when I preview the report it looks like this:
| 2012 Forecasts
|
|Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
This is unacceptable formatting for this table, because there are many nested rows, which make the headers extremely large.
While experimenting with solutions, I noticed that making the columns wide enough to fit the string "2012 Forecasts" brings the row height back down. However, that makes my columns far too wide.
Is there a way to force the column header not to wrap like this?
In the properties of the header I set CanGrow to false, and that suppressed the new line.