How to post multiple JSON files to a server using cURL

I have multiple JSON files within a folder and I would like to post them all at once, in a single command line using curl. Is there a way to do this?
I have these files within a folder in my directory:
20190116_101859_WifiSensor(1).json
20190116_101859_WifiSensor(2).json
20190116_101859_WifiSensor(3).json
20190116_101859_WifiSensor(4).json
20190116_101859_WifiSensor(5).json
20190116_101859_WifiSensor(6).json
20190116_101859_WifiSensor(7).json
20190116_101859_WifiSensor(8).json
... plus more
I'd like to post all of the files from the folder in one go.
I know how to post one file using
curl -d "#20190116_101859_WifiSensor(1).json" http://iconsvr:8005/data
I need a way of posting them in one go, without having to write out each file name, if possible.

You can use a for loop to iterate over all files in your current directory whose names contain WifiSensor.
In Linux (Bash) you could use
for f in *WifiSensor*.json; do curl -d "@$f" http://iconsvr:8005/data; done
In Windows (CMD)
for /r %f in (*WifiSensor*.json) do curl -d @%f http://iconsvr:8005/data
Don't forget that if you use the Windows snippet above in a batch file, you need to double the % signs (%%f).
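If you also want to confirm that each upload succeeded, here is a hedged variant of the Bash loop above that prints the HTTP status per file; the Content-Type header is an assumption about what the endpoint expects:
for f in *WifiSensor*.json; do
  # -s silences progress output, -o /dev/null discards the response body,
  # -w '%{http_code}' prints only the HTTP status code.
  # The Content-Type header assumes the server expects JSON.
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H 'Content-Type: application/json' -d "@$f" http://iconsvr:8005/data)
  echo "$f -> $status"
done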

Related

Use cURL to pull data from Web into excel

Guys, I'm currently working with cURL for the first time. What I am trying to do is pull data from a website using cURL and put it into Excel using the following command. I have to use an API key to get the data.
curl -H "X-API-Key: API_KEY_NUMBER" http://example.com/api/exports/model/62f0d0dc24757f6e5bb0b723 -o "text.xlsx"
This works fine so far; the problem is that when I want to open it in Excel, it tells me the file cannot be opened because the file format or the file extension is invalid.
If I change the file extension to
curl -H "X-API-Key: API_KEY_NUMBER" http://example.com/api/exports/model/62f0d0dc24757f6e5bb0b723 -o "text.txt"
it opens as a text file with all the data that I need. Now I am looking for a way to solve this.
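One way to narrow this down (a hedged diagnostic sketch, not part of the original post) is to check what the server says it is sending and what the downloaded bytes actually are; if the export is really CSV or JSON rather than a genuine .xlsx (zip) archive, renaming the extension will not make Excel accept it.
# HEAD request to see the Content-Type the server reports (some APIs reject HEAD)
curl -sI -H "X-API-Key: API_KEY_NUMBER" http://example.com/api/exports/model/62f0d0dc24757f6e5bb0b723
# Inspect the file already downloaded; "Microsoft Excel 2007+" or "Zip archive data"
# would indicate a real .xlsx, anything else points to a different export format
file text.xlsx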

How do I force tika server to exclude the TesseractOCRParser using curl

I'm running tika-server-1.23.jar with tesseract and extracting text from files using curl via PHP. Sometimes OCR takes too long, so occasionally I'd like to skip running tesseract. I can do this by inserting
<parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
in the Tika config XML file, but this means it never runs tesseract.
Can I force the tika server to skip using tesseract selectively at each request via curl and, if so, how?
I've got a workaround where I'm running two instances of the tika server each with a different config file listening on different ports but this is sub-optimal.
Thanks in advance.
You can set the OCR strategy using headers for PDF files, which includes an option not to OCR:
curl -T test.pdf http://localhost:9998/tika --header "X-Tika-PDFOcrStrategy: no_ocr"
There isn't really an equivalent for other file types, but there is a similar header prefix called X-Tika-OCR that allows you to set configuration on the TesseractOCRConfig instance for any file type.
You have some options which could be of interest in your scenario:
maxFileSizeToOcr - which you could set to 0
timeout - which you could set to the timeout you are willing to give
tesseractPath - which you can set to anything, since if Tika can't find the binary it can't run it
So, for example, if you want to skip OCR for a file you could set the max file size to 0, which means it will not be OCR-processed:
curl -T testOCR.jpg http://localhost:9998/tika --header "X-Tika-OCRmaxFileSizeToOcr: 0"
Or set the path to /dummy:
curl -T testOCR.jpg http://localhost:9998/tika --header "X-Tika-OCRtesseractPath: /dummy"
You can of course use these headers with PDF files too, should you wish.
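For example, a request that skips OCR on a PDF whether it is image-only or not could combine the two headers already shown in this answer (a sketch; test.pdf is a placeholder file name):
curl -T test.pdf http://localhost:9998/tika \
  --header "X-Tika-PDFOcrStrategy: no_ocr" \
  --header "X-Tika-OCRmaxFileSizeToOcr: 0"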

Linux shell script command - gzip

I have a shell script in Linux whose output is generated in .csv format.
At the end of the script I convert the .csv to .gz format to save space on my machine.
The generated file comes in this format: Output_04-07-2015.csv
The command I have written to zip it is: gzip Output_*.csv
But I am facing an issue: if the .gz file already exists, the script should instead create the new file with the reported time stamp in its name.
Can anyone help me with this?
If all you want is to just overwrite the file if it already exists, gzip has a -f flag for it.
gzip -f Output_*.csv
What the -f flag does is forcefully create the gzip file, overwriting any existing .gz file of the same name.
Have a look at the man pages by typing man gzip for many other options.
If instead you want to keep the existing archive and write the new one under a timestamped name, you can add that check in the script itself; the exact syntax depends on which shell you have (bash, csh, etc.), as sketched below.
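A hedged bash sketch of that approach (the timestamped naming scheme is an assumption, adjust to taste):
# For each CSV: if its .gz already exists, rename the new CSV with a time stamp
# before compressing so nothing gets overwritten.
for csv in Output_*.csv; do
  if [ -e "$csv.gz" ]; then
    stamped="${csv%.csv}_$(date +%H-%M-%S).csv"
    mv "$csv" "$stamped"
    gzip "$stamped"
  else
    gzip "$csv"
  fi
done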

Importing large datasets into Couchbase

I am having difficulty importing large datasets into Couchbase. I have experience doing this very fast with Redis via the command line but I have not seen anything yet for Couchbase.
I have tried using the PHP SDK and it imports about 500 documents per second. I have also tried the cbdocloader script in the Couchbase bin folder, but it seems to want each document in its own JSON file. It is a bit of work to create all these files and then load them. Is there some other import process I am missing? If cbdocloader is the only way to load data fast, then is it possible to put multiple documents into one JSON file?
Take the file that has all the JSON documents in it and zip up the file:
zip somefile.zip somefile.json
Place the zip file(s) into a directory. I used ~/json_files/ in my home directory.
Then load the file or files by the following command:
cbdocloader -u Administrator -p s3kre7Pa55 -b MyBucketToLoad -n 127.0.0.1:8091 -s 1000 \
~/json_files/somefile.zip
Note: '-s 1000' is the memory size. You'll need to adjust this value for your bucket.
If successful you'll see output stating how many documents were loaded, success, etc.
Here is a brief script to load up a lot of .zip files in a given directory:
#!/bin/bash
# Load every .zip in the JSON directory with cbdocloader.
JSON_Dir=~/json_files/
for ZipFile in "$JSON_Dir"/*.zip ; do
  /Applications/Couchbase\ Server.app/Contents/Resources/couchbase-core/bin/cbdocloader \
    -u Administrator -p s3kre7Pa55 -b MyBucketToLoad \
    -n 127.0.0.1:8091 -s 1000 "$ZipFile"
done
UPDATED: Keep in mind this script will only work if your data is formatted correctly and each document is under the maximum single-document size of 20MB (not the zip file itself, but any document extracted from the zip).
I have created a blog post describing bulk loading from a single file as well and it is listed here:
Bulk Loading Documents Into Couchbase
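If your source data starts out as one large JSON array rather than one file per document, here is a hedged sketch of splitting it up first (docs.json, the output paths, and the naming are placeholders; assumes jq is installed):
# Write one document per file, then zip the folder for cbdocloader as above.
mkdir -p ~/json_files/docs
i=0
jq -c '.[]' docs.json | while read -r doc; do
  i=$((i+1))
  printf '%s\n' "$doc" > ~/json_files/docs/doc_"$i".json
done
zip -j ~/json_files/docs.zip ~/json_files/docs/*.json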

Recursive directory parsing with Pandoc on Mac

I found this question which had an answer to the question of performing batch conversions with Pandoc, but it doesn't answer the question of how to make it recursive. I stipulate up front that I'm not a programmer, so I'm seeking some help on this here.
The Pandoc documentation is slim on details regarding passing batches of files to the executable, and based on the script it looks like Pandoc itself is not capable of parsing more than a single file at a time. The script below works just fine in Mac OS X, but only processes the files in the local directory and outputs the results in the same place.
find . -name \*.md -type f -exec pandoc -o {}.txt {} \;
I used the following code to get something of the result I was hoping for:
find . -name \*.html -type f -exec pandoc -o {}.markdown {} \;
This simple script, run using Pandoc installed on Mac OS X 10.7.4 converts all matching files in the directory I run it in to markdown and saves them in the same directory. For example, if I had a file named apps.html, it would convert that file to apps.html.markdown in the same directory as the source files.
While I'm pleased that it makes the conversion, and it's fast, I need it to process all files located in one directory and put the markdown versions in a set of mirrored directories for editing. Ultimately, these directories are in Github repositories. One branch is for editing while another branch is for production/publishing. In addition, this simple script is retaining the original extension and appending the new extension to it. If I convert back again, it will add the HTML extension after the markdown extension, and the file size would just grow and grow.
Technically, all I need to do is be able to parse one branch's directory and sync it with the production one; then, when all changed, removed, and new content is verified correct, I can run commits to publish the changes. It looks like the find command can handle all of this, but I just have no clue as to how to properly configure it, even after reading the Mac OS X and Ubuntu man pages.
Any kind words of wisdom would be deeply appreciated.
TC
Create the following Makefile:
TXTDIR=sources
HTMLS=$(wildcard *.html)
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown, $(HTMLS))
.PHONY : all
all : $(MDS)
$(TXTDIR) :
	mkdir $(TXTDIR)

$(TXTDIR)/%.markdown : %.html $(TXTDIR)
	pandoc -f html -t markdown -s $< -o $@
(Note: The indented lines must begin with a TAB -- this may not come through in the above, since markdown usually strips out tabs.)
Then you just need to type 'make', and it will run pandoc on every file with a .html extension in the working directory, producing a markdown version in 'sources'. An advantage of this method over using 'find' is that it will only run pandoc on a file that has changed since it was last run.
Just for the record: here is how I achieved the conversion of a bunch of HTML files to their Markdown equivalents:
for file in *.html; do pandoc -f html -t markdown "${file}" -o "${file%html}md"; done
If you have a look at the -o argument, you'll see it uses string manipulation to replace the existing html extension with the md ending.
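Neither snippet above recurses into subdirectories or writes the output into a separate tree, which is what the question ultimately asks for. Here is a hedged sketch, assuming a source tree src/ mirrored into out/ (both directory names are placeholders):
find src -type f -name '*.html' | while read -r f; do
  out="out/${f#src/}"              # mirror the path under out/
  out="${out%.html}.markdown"      # replace the extension instead of appending
  mkdir -p "$(dirname "$out")"
  pandoc -f html -t markdown -s "$f" -o "$out"
done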