I have a command in my script that goes like this:
MESSAGE=`grep -Po 'MSG.\K[\w\s,.'\'',.:]*' < $FILENAME`
Now when this command is run I get output which looks like this:
Kohl's EDI will be down for scheduled maintenance starting at 12:30 am until
approximately 4:00 am central time on Wednesday June 22nd. Kohl's will not be
able to send or receive EDI or AS2 transmissions during this time. If your
company's AS2 software has an automated process to resend a file after a
failure, Kohl's encourages your company to enable the resend process. This is
also a reminder for AS2 trading partners that Kohl's AS2 certificate will be
changing at 11:00 am central time on Tuesday June 21st.
Now, after grepping the whole thing out, I want to pass the result of the command to a variable so that I can store the result in a MySQL database.
The question is: how do I do it?
Make sure you have a connection to MySQL from the server; then you can pass the MESSAGE variable into the insert statement as \"$MESSAGE\" - the \ is needed because the message is wrapped in double quotes to keep the insert statement valid.
Test: I did not have a column big enough to store the entire message, so I trimmed it a little to fit in the column:
> MESSAGE="Kohl's EDI will be down for scheduled maintenance starting at 12:30 am until
approximately 4:00 am central time on Wednesday June 22nd. Kohl's will not be
able to send or receive EDI or AS2 transmissions during this time. If your
company's AS2 software has an automated process to resend a file after a
"
> sql "insert into at_test_run (run_id,run_error) values ('10000111',\"$MESSAGE\");"
> sql "select run_id,run_error from at_test_run where run_id='10000111'"
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| run_id | run_error |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 10000111 | Kohl's EDI will be down for scheduled maintenance starting at 12:30 am until
approximately 4:00 am central time on Wednesday June 22nd. Kohl's will not be
able to send or receive EDI or AS2 transmissions during this time. If your
company's AS2 software has an automated process to resend a file after a
|
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
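If you don't have a wrapper like the sql helper used above, a minimal sketch with the stock mysql client could look like this (the database name and credential variables are placeholders, the at_test_run table comes from the example above, and it assumes the server is not running in ANSI_QUOTES mode, so double quotes still delimit strings):
MESSAGE=$(grep -Po 'MSG.\K[\w\s,.'\'',.:]*' < "$FILENAME")
# Double-quoting the value inside the SQL keeps the embedded single quotes (Kohl's) from breaking the statement.
mysql -u "$DB_USER" -p"$DB_PASS" my_database -e \
  "INSERT INTO at_test_run (run_id, run_error) VALUES ('10000111', \"$MESSAGE\");"
If the messages can also contain double quotes, a safer route is to load the value from a file or use a client that supports parameterized statements.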
This one's difficult and I haven't found any answers for hours; I hope you can help me. I'm not a native English speaker, so I apologize in advance.
I arrived at a company last week and am working with .json files, which are all stored in directories, one per company.
e.g.
d/Company1/comport/enregistrement_sessionhash1
enregistrement_sessionhash2
enregistrement_sessionhash3
d/Company2/comport/enregistrement_sessionhashX
d/Company3/comport/enregistrement_sessionhashY...
Each of them can contain [0-n] characters.
We use these files to calculate data.
The person before me didn't think to classify them by /year/month, so it takes a lot of time when we run algorithms on the data for a specific month, because they read all the files inside the directory, and files have been stored every 10 seconds per website-company and website-user for approximately 2 years.
Sadly, we can't use the filesystem's creation/modification times, only the text information in the .json files, since there was a server problem and my coworkers had to copy the files back, which reset the creation times.
Here is a template of the .json files:
BEGIN OF FILE
{"session":"session_hash","enregistrements":[{"session":"session_hash",[...]{"data2":"xxx"}],"timedate_saved":"27 04 2020 12:39:21"},{"session":"session_hash",[...],"timedate_saved":"17 06 2020 11:01:08"},{"data1":"session_hash"[...],{"data2":"xxx"}],"timedate_saved":"27 04 2020 18:01:14"}]}
END OF FILE
Within a single file, there can't be a different "session" value. This value is a hash, which is also used in the filename, e.g. d/Company1/comport/enregistrement_session_hash.
I would like to read the files and cut out every "enregistrements" sub-array (starting with [{"session"... and ending with "timedate_saved":"01 01 1970 00:00:00"}]}). The cut-out text should then be written to files with the same filename (session_hash), stored under company/comport/year/month/enregistrement_sessionhash, where year and month come from the "timedate_saved" data. And of course the files should stay parseable .json so they can be reused later.
That's a lot; I hope someone has time on their hands to help me get through it.
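To make it concrete, here's a rough sketch of the kind of split I have in mind (assuming jq is installed and the real files are valid JSON shaped like the template above; it writes each enregistrement as one JSON object per line instead of rebuilding the original envelope):
for f in d/*/comport/enregistrement_*; do
  company=$(echo "$f" | cut -d/ -f2)
  session=$(basename "$f")
  # emit each enregistrement as one compact JSON object per line
  jq -c '.enregistrements[]' "$f" | while IFS= read -r rec; do
    # "timedate_saved" looks like "27 04 2020 12:39:21" -> year/month = 2020/04
    ym=$(echo "$rec" | jq -r '.timedate_saved' | awk '{print $3 "/" $2}')
    mkdir -p "d/$company/comport/$ym"
    printf '%s\n' "$rec" >> "d/$company/comport/$ym/$session"
  done
done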
I have the following CSV file:
textbox6,textbox10,textbox35,textbox17,textbox43,textbox20,textbox39,textbox23,textbox9,textbox16
"Monday, March 02, 2015",Water Front Lodge,"Tuesday, September 23, 2014",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,Critical Item,4 - Hand Washing Facilities/Practices
"Monday, March 02, 2015",Water Front Lodge,"Thursday, August 01, 2013",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,General Item,11 - Accurate Thermometer Available to Monitor Food Temperatures
"Monday, March 02, 2015",Water Front Lodge,"Wednesday, February 08, 2012",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,Critical Item,1 - Refrigeration/Cooling/Thawing (must be 4°C/40°F or lower)
"Monday, March 02, 2015",Water Front Lodge,"Wednesday, February 08, 2012",,Routine,#1 Johnson Street,Low,Northern Health - Mamaw/Keewa/Athab,General Item,12 - Construction/Storage/Cleaning of Equipment/Utensils
And here's what the file command tells me:
Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
I was trying to use Scala-csv to parse it but I always get Malformed CSV exceptions. I've uploaded it to CSV Lint and get 5 "unknown errors".
Eyeballing the file, I cannot determine why two separate parsers would fail. It seems to be perfectly ordinary and valid CSV. What about it is malformed?
And yes, I'm aware that it's terrible CSV. I didn't create it -- I just have to parse it.
EDIT: Of note is that this parser also fails.
It is definitely the newline. See the Lint results here:
CSV Lint Validation
I copied your CSV and made sure the newline characters were CRLF.
I used Notepad++ and its Edit => EOL Conversion => Windows Format option to do the conversion.
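If you'd rather do the same fix from the command line, a rough sketch (file names are placeholders and it assumes GNU iconv/tr/sed) is to convert the UTF-16 to UTF-8 and normalize every line ending in one pass:
# UTF-16LE -> UTF-8, turn every CR into a newline, drop the blank lines that CRLF pairs
# leave behind, then append a CR to each line so everything ends in a uniform CRLF.
# Caveat: this also drops lines that were genuinely empty in the original file.
iconv -f UTF-16LE -t UTF-8 report.csv | tr '\r' '\n' | sed '/^$/d' | sed 's/$/\r/' > report-fixed.csv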
I've got two files - venues.csv and tweets.csv. I want to count, for each of the venues, the number of times it occurs in the tweet messages from the tweets file.
I've imported the csv files in HCatalog.
What I managed to do so far:
I know how to filter the text fields and get the tuples whose tweet messages contain 'Shell'. I want to do the same, but not with a hard-coded 'Shell': rather, for each name from the venuesNames bag. How can I do that? Also, how can I then use GENERATE properly to produce a new bag that matches the counts with the names of the venues?
a = LOAD 'venues_test_1' USING org.apache.hcatalog.pig.HCatLoader();
b = LOAD 'tweets_test_1' USING org.apache.hcatalog.pig.HCatLoader();
venuesNames = foreach a generate name;
countX = FILTER b BY (text matches '.*Shell.*');
venueToCount = generate ('Shell' as venue, COUNT(countX) as countVenues);
DUMP venueToCount;
The files that I'm using are:
tweets.csv
created_at,text,location
Sat Nov 03 13:31:07 +0000 2012, Sugar rush dfsudfhsu, Glasgow
Sat Nov 03 13:31:07 +0000 2012, Sugar rush ;dfsosjfd HAHAHHAHA, London
Sat Apr 25 04:08:47 +0000 2009, at Sugar rush dfjiushfudshf, Glasgow
Thu Feb 07 21:32:21 +0000 2013, Shell gggg, Glasgow
Tue Oct 30 17:34:41 +0000 2012, Shell dsiodshfdsf, Edinburgh
Sun Mar 03 14:37:14 +0000 2013, Shell wowowoo, Glasgow
Mon Jun 18 07:57:23 +0000 2012, Shell dsfdsfds, Glasgow
Tue Jun 25 16:52:33 +0000 2013, Shell dsfdsfdsfdsf, Glasgow
venues.csv
city,name
Glasgow, Sugar rush
Glasgow, ABC
Glasgow, University of Glasgow
Edinburgh, Shell
London, Big Ben
I know that these are basic questions but I'm just getting started with Pig and any help will be appreciated!
I presume that your list of venue names is unique. If not, then you have more problems anyway because you will need to disambiguate which venue is being talked about (perhaps by reference to the city fields). But disregarding that potential complication, here is what you can do:
You have described a fuzzy join. In Pig, if there is no way to coerce your records to contain standard values (and in this case, there isn't without resorting to a UDF), you need to use the CROSS operator. Use this with caution because if you cross two relations with M and N records, the result will be a relation with M*N records, which might be more than your system can handle.
The general strategy is 1) CROSS the two relations, 2) Create a custom regex for each record*, and 3) Filter those that pass the regex.
venues = LOAD 'venues_test_1' USING org.apache.hcatalog.pig.HCatLoader();
tweets = LOAD 'tweets_test_1' USING org.apache.hcatalog.pig.HCatLoader();
/* Create the Cartesian product of venues and tweets */
crossed = CROSS venues, tweets;
/* For each record, create a regex like '.*name.*' */
regexes = FOREACH crossed GENERATE *, CONCAT('.*', CONCAT(venues::name, '.*')) AS regex;
/* Keep tweet-venue pairs where the tweet contains the venue name */
venueMentions = FILTER regexes BY text MATCHES regex;
venueCounts = FOREACH (GROUP venueMentions BY venues::name) GENERATE group, COUNT($1);
The sum of all venueCounts might be more than the number of tweets, if some tweets mention multiple venues.
*Note that you have to be a little careful with this technique, because if the venue name contains characters that have special interpretations in Java regular expressions, you'll need to escape them.
We have an application with 2,000,000 lines of code in Mercurial. Obviously there is a lot of valuable information inside this repository.
Are there any tools or techniques to dig out some of that information?
For instance, over the history of the project, what five files have seen the most changes? What five files are the most different from what they were one year ago? Have any particular lines of code seen a lot of churn?
I'm interested in that sort of thing and more.
Is there a way to extract this kind of information from our repository?
I don't know of any tools specifically made for doing this, but Mercurial's log templates are very powerful for getting data out of the system. I've done a bit of this sort of analysis in the past, and my approach was:
Use hg log to dump commits to some convenient format (xml in my case)
Write a script to import the xml into something queryable (database, or just work from the XML directly if it's not too big)
Here's an example hg log command to get you going:
mystyle.txt (template):
changeset = '<changeset>\n<user>{author|user}</user>\n<date>{date|rfc3339date|escape}</date>\n<files>\n{file_mods}{file_adds}{file_dels}</files>\n<rev>{node}</rev>\n<desc>{desc|strip|escape}</desc>\n<branch>{branches}</branch><diffstat>{diffstat}</diffstat></changeset>\n\n'
file_mod = '<file action="modified">{file_mod|escape}</file>\n'
file_add = '<file action="added">{file_add|escape}</file>\n'
file_del = '<file action="deleted">{file_del|escape}</file>\n'
Example invocation using template and date range:
hg --repository /path/to/repo log -d "2012-01-01 to 2012-06-01" --no-merges --style mystyle.txt
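For a question like "what five files have seen the most changes" you may not even need the XML step; a quick sketch (run inside the repository, and assuming a Mercurial new enough to have the join template function) is:
# List every file touched by every changeset, one per line, then count and rank.
hg log --template '{join(files, "\n")}\n' | grep -v '^$' | sort | uniq -c | sort -rn | head -5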
Try the built-in hg churn extension. One thing I like to use it for, for example, is to see a monthly bar graph of commits like this:
> hg churn -csf '%Y-%m'
2014-02 65 *************************************
2014-03 22 *************
2014-04 52 ******************************
2014-05 67 ***************************************
2014-06 31 ******************
2014-07 29 *****************
2014-08 29 *****************
2014-09 61 ***********************************
2014-10 36 *********************
2014-11 23 *************
2014-12 32 ******************
2015-01 60 ***********************************
2015-02 20 ************
(You might want to set up an alias if you find yourself using the command often enough.)
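For example, a user-level alias in your ~/.hgrc could look like this (the name monthly is just a placeholder; quoting of the date format may need adjusting for your setup):
[alias]
# after this, "hg monthly" prints the same per-month bar graph
monthly = churn -csf "%Y-%m"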
I am trying to reprogram the output of my Magtek MagWedge and I can't find any documentation on the syntax I need to send so that my card swipe reader outputs just the CC number and none of the other data.
Below is the example configuration; however, I have no clue what to change these values to.
Comment:Set up IntelliPIN to Required Configuration
/rawxact 50B01001011
/rawxact 50E10000000
/rawxact 940101010101010101
/rawxact 564
Comment:99{{SN}}
/rawsend 52
Comment:50Z00000110
/rawsend 42Setup Done
Thanks!
It turns out I needed to get the USBMSR Demo program, send message 01 03 and then send 02, then restart the application and send 01 03 (Send Msg) followed by 02 (Send Msg) again, and that fixed it for me.