I have to compare two csv files using Mule.
I have to find details like new rows added, deleted, or updated. I have searched, but there is no such feature/component.
Can I use a bash script in mule?
I know I can use a Java component, but am soliciting better suggestions or ideas. Please suggest pointers to get me started.
In Mule you can leverage a Java component, and there are a number of solutions to this problem using Java.
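For the comparison itself, the usual approach is to key both files on a unique column and diff the resulting maps. Below is a minimal standalone sketch in Python (the file names and the "id" key column are assumptions about your data); the same logic ports directly to a Java component or to a script invoked from your Mule flow.

```python
# Sketch of the CSV-diff logic (standalone Python). Assumes both files
# share a unique key column named "id" -- adjust to your data.
import csv

def load(path, key="id"):
    with open(path, newline="") as f:
        return {row[key]: row for row in csv.DictReader(f)}

def diff(old_path, new_path, key="id"):
    old, new = load(old_path, key), load(new_path, key)
    added   = [new[k] for k in new.keys() - old.keys()]
    deleted = [old[k] for k in old.keys() - new.keys()]
    updated = [new[k] for k in new.keys() & old.keys() if new[k] != old[k]]
    return added, deleted, updated

if __name__ == "__main__":
    added, deleted, updated = diff("yesterday.csv", "today.csv")
    print(f"added={len(added)} deleted={len(deleted)} updated={len(updated)}")
```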
I have to do the following steps two or three times a day
Log in to Elasticsearch
Go to Dev Tools
Run a specific query by selecting it and pressing ctrl + enter
Query that I have to run
Select the results returned in the "buckets" and copy them.
The yellow highlight in the image is what I have to select and copy.
Then I go to https://www.convertcsv.com/json-to-csv.htm and paste the results so it converts to CSV.
Where I have to paste the results.
I can then download the CSV and import it into Google Sheets so I can view the results in a Looker dashboard.
Button to download the converted CSV.
This takes me some time every day, and I would like to know if there is any way to automate this routine.
Maybe some ETL tool that can perform at least part of the process, or perhaps a more specific way to do it with Python.
Thanks in advance.
I don't have much experience with this, and I tried to search online for similar issues, but couldn't really find anything useful.
I don't know if you have tried it, but there is a reporting tool in Elasticsearch under "Stack Management > Reporting". Apart from that, there are other tools which you can run from a server with crontab. Here are some of them:
A little bit old, but I think it can work for you: ES2CSV. You can check the examples inside the docs folder; you can send queries via a file and export the report to CSV.
Another option, and my preference too, is the pandas library for Python. You can write a script along the lines of this article and export the results to CSV; the article explains it really well (see the sketch at the end of this answer).
Another alternative is a library written in Java, but the documentation is a little bit weak.
Another Python alternative is elasticsearch-tocsv. It has been updated more recently than the first alternative, though the query samples are a bit thin; there is a detailed article you can check.
You can use elasticdump, which is written in NodeJS and is a great tool for exporting data from Elasticsearch. There is a CSV export option; you can see examples on the GitHub page.
I will try to find more and will update this answer from time to time. Thanks!
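As a rough illustration of the pandas approach mentioned above, here is a minimal sketch that posts an aggregation query to Elasticsearch over HTTP, pulls the "buckets" out of the response, and writes them to CSV. The host, index, aggregation name, field, and credentials are placeholders; substitute the query you normally run in Dev Tools.

```python
# Minimal sketch: run an aggregation query, extract the buckets, write CSV.
# Host, index, aggregation name, field and credentials are placeholders.
import requests
import pandas as pd

ES_URL = "https://my-elasticsearch:9200/my-index/_search"
QUERY = {
    "size": 0,
    "aggs": {
        "my_agg": {  # replace with your own aggregation
            "terms": {"field": "some_field.keyword", "size": 1000}
        }
    },
}

resp = requests.post(ES_URL, json=QUERY, auth=("user", "password"), timeout=60)
resp.raise_for_status()

buckets = resp.json()["aggregations"]["my_agg"]["buckets"]

# Each bucket is a dict like {"key": ..., "doc_count": ...}, so it maps
# straight onto a DataFrame and then onto the CSV you used to build by hand.
pd.DataFrame(buckets).to_csv("report.csv", index=False)
```

Run from crontab (or extended with a library such as gspread to push straight into Google Sheets), a script like this would replace the manual Dev Tools and convertcsv.com steps.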
I am working on a school project where we need to create a website and use Redis to search a database; in my case it will be a movie database. I got a JSON file with the names and ratings of 100 movies. I would like to load this dataset into Redis instead of entering the entire dataset manually. The JSON file is saved on my desktop and I am using Ubuntu 20.04.
Is there a way to do it?
I have never used Redis, so my question might be very silly. I've been looking all over the internet and cannot find exactly what needs to be done. I might be googling the wrong question; maybe that's why I cannot find the answer.
Any help would be appreciated.
Write an appropriate program to do the job. There's no one-size-fits-all process because how your data is structured in redis is up to you; once you decide on that, it should be easy to write a program to parse the JSON and insert the data.
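For example, here is one possible loading script using redis-py, assuming the JSON file is a list of objects with "name" and "rating" fields (adjust the key layout and field names to your actual data). It stores each movie as a hash and also indexes the movies in a sorted set by rating so they can be searched by score.

```python
# One possible loading script (redis-py). Assumes movies.json is a list of
# objects with "name" and "rating" fields -- adapt to your actual structure.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

with open("movies.json") as f:
    movies = json.load(f)

for i, movie in enumerate(movies):
    key = f"movie:{i}"
    # Store each movie as a hash: HGETALL movie:0 returns name and rating.
    r.hset(key, mapping={"name": movie["name"], "rating": movie["rating"]})
    # Index by rating in a sorted set so ZRANGEBYSCORE can filter on rating.
    r.zadd("movies:by_rating", {key: float(movie["rating"])})

print(f"Loaded {len(movies)} movies")
```

You could equally store each movie as a plain string or a JSON blob; the point stands that you decide on the data layout first, and the loading script follows from it.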
I have many large JSON files that I'd like to run some analytics against. I'm just getting started with SparkSQL and am trying to make sure I understand the trade-off between having SparkSQL read the JSON records into an RDD/DataFrame from file (and have the schema inferred) and running a SparkSQL query on the files directly. If you have any experience using SparkSQL either way, I'd be interested to hear which method is preferred and why.
Thank you, in advance, for your time and help!
You can call explain() on a dataset instead of an action like show() or count(); Spark will then show you the selected physical plan.
As far as I know, there should be no difference. But I prefer to use the read() method: when I use an IDE, I can see all the available methods, whereas with SQL you could make a mistake like slect instead of select and only get the error when you run your code.
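To check this yourself, you can compare the physical plans of the two approaches with something along these lines (the path and field names below are placeholders):

```python
# Compare the physical plans of "read then query" vs "SQL on the files
# directly". Directory path and field names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-compare").getOrCreate()

# Option 1: read the JSON into a DataFrame (schema inferred), then query it.
df = spark.read.json("/data/events/")
df.createOrReplaceTempView("events")
spark.sql("SELECT status, count(*) FROM events GROUP BY status").explain()

# Option 2: query the files directly with SQL.
spark.sql(
    "SELECT status, count(*) FROM json.`/data/events/` GROUP BY status"
).explain()
```

If the two plans come out identical, the choice is purely about ergonomics, as described above.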
This is probably a noob question, so I apologize in advance.
The HBase console, as far as I understand, is an extension of (or a script running over) JIRB. It also comes with several HBase-specific commands, one of which is 'get', to retrieve columns/values from a table.
However, it seems like 'get' only writes to the screen and doesn't return values at all.
Is there any native HBase console command which will allow me to retrieve a value (e.g. a set of rows/columns), put it into a variable, and read the values from there?
Thanks
No, there is not a native console command in 0.92. If you dig into the source code, there is a class Hbase::Table that could be used to do what you want. I believe this is going to be more exposed in 0.96. At this point, I have resorted to adding my own Ruby to my shell to handle a variety of common tasks (like using SingleColumnValueFilters on scans).
I am building my first database driven website with Drupal and I have a few questions.
I am currently populating a Google Docs spreadsheet with all of the data I eventually want to be able to query from the website (after it's imported). Is this the best way to start?
If this is not the best way to start what would you recommend?
My plan is to populate the spreadsheet, then import it as a CSV into the MySQL DB via CCK nodes.
I've seen two ways to do this.
http://drupal.org/node/133705 (importing data into CCK nodes)
http://drupal.org/node/237574 (Inserting data using spreadsheet/csv instead of SQL insert statements)
Basically, my question is: what is the best way to gather and then import data into Drupal?
Thanks in advance for any help, suggestions.
There's a comparison of the available modules at http://groups.drupal.org/node/21338
In the past, when I've done this, I have simply written code to do it on cron runs (see http://drupal.org/project/phorum for an example framework that you could strip down and build back up to do what you need).
If I were to do this now I would probably use the http://drupal.org/project/migrate module where the philosophy is "get it into MySQL, View the data, Import via GUI."
There is a very good module for this: Node Import. It allows you to take your Google Docs spreadsheet and import it as a .csv file.
It's really easy to use; the module allows you to map your .csv columns to the node fields you want them to go to, so you don't have to worry about setting your columns in a particular order. Also, if there is an error on some records, it will spit out a .csv with the failed records and what caused the errors, but it will still import all the good records.
I have imported up to 3000 nodes with this method.