Currently, I have an image stored as an MxNx3 uint8 array in MATLAB. However, I need to embed it in an HTML document, and I can't include the image separately.
Instead, I've decided to try and encode the image as a base64 string. However, I can't seem to find a way to encode the image as a string without having to first save the image to disk. I tried looking into writebmp and the like, but I can't seem to get it to work.
I'd really rather not write the image to a file just to read it back with fread. The computer I'm using has very slow disk I/O, so that would take far too long.
Any help would be appreciated!
Edit:
I looked here, but that errors in R2018b with a "no method found" error. When I linearize the image, the returned string is incorrect.
From an image matrix to HTML
1 Convert the image to the bytes of a BMP
function [header] = writeBMP(IM)
header = uint8([66;77;118;5;0;0;0;0;0;0;54;0;0;0;40;0;0;0;21;0;0;0;21;0;0;0;1;0;24;0;0;0;0;0;64;5;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0]);
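% Fixed 54-byte BMP header template: 'BM' signature, placeholder file size, pixel-data offset 54, 40-byte BITMAPINFOHEADER, 1 plane, 24 bits per pixel; width, height, data size and file size are patched in below.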
IMr = IM(:,:,1);
IMg = IM(:,:,2);
IMb = IM(:,:,3);clear IM;
IM(:,:,1)=IMb';
IM(:,:,2)=IMg';
IM(:,:,3)=IMr';
IM(:,:,:)=IM(:,end:-1:1,:);
[i,j,~]=size(IM);
header(19:22) = typecast(int32(i),'uint8'); %width
header(23:26) = typecast(int32(j),'uint8'); %height
IM = permute(IM,[3,1,2]);
IM = reshape(IM,[i*3,j]);
W = double(i)*3;
W = ceil(W/4)*4;
IM(3*i+1:W,:)=0; %pad each row with zeros up to a multiple of 4 bytes
IM = IM(:); %linear
header(35:38) = typecast(uint32(length(IM)),'uint8'); %datasize
header = [header;IM];
header(3:6) = typecast(uint32(length(header)),'uint8'); %filesize
end
You can also look into ...\toolbox\matlab\imagesci\private\writebmp.m for a more detailed example.
2 Encode the bytes to base64 characters
This is best done in a mex-file.
Save this code as encodeB64.c and run mex encodeB64.c
/*==========================================================
* encodeB64.c - converts a byte vector to base64
*
* The calling syntax is:
*
* [B] = encodeB64(B)
*
* input: - B : vector of uint8
*
* output: - B : vector of base64 char
*
* This is a MEX-file for MATLAB.
*
*========================================================*/
#include "mex.h"
/* The computational routine */
void Convert(unsigned char *in, unsigned char *out,unsigned long Nin, unsigned long Nout)
{
int temp;
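/* the 64 base64 symbols as ASCII codes: 'A'-'Z', 'a'-'z', '0'-'9', '+', '/' */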
static unsigned char alphabet[64] = {65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,48,49,50,51,52,53,54,55,56,57,43,47};
for (int i=0;i<(Nin-2);i+=3){
temp = in[i+2] | (int)in[i+1]<<8 | (int)in[i]<<16;
for (int j=0;j<4;j++){
out[3+(i/3)*4-j] = alphabet[(temp >> (j*6)) & 0x3f];
}
}
if (Nin%3==1){
temp = (int)in[Nin-1]<<16;
out[Nout-1] = 61;
out[Nout-2] = 61;
out[Nout-3] = alphabet[(temp >> 12) & 0x3f];
out[Nout-4] = alphabet[(temp >> 18) & 0x3f];
}
if (Nin%3==2){
temp = in[Nin-1]<<8 | (int)in[Nin-2]<<16;
out[Nout-1] = 61;
out[Nout-2] = alphabet[(temp >> 6) & 0x3f];
out[Nout-3] = alphabet[(temp >> 12) & 0x3f];
out[Nout-4] = alphabet[(temp >> 18) & 0x3f];
}
}
/* The gateway function */
void mexFunction( int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
unsigned char *InputV; /* input vector 1*/
unsigned char *OutputV; /* output vector 1*/
unsigned long Nin;
unsigned long Nout;
/* check for proper number of arguments */
if(nrhs!=1) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nrhs","One inputs required.");
}
if(nlhs!=1) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nlhs","One output required.");
}
/* make sure the first input argument is a uint8 column vector */
if( !mxIsClass(prhs[0],"uint8") || mxGetNumberOfElements(prhs[0]) == 1 || mxGetN(prhs[0]) != 1) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notRowInteger","Input one must be uint8 column vector.");
}
/* get a pointer to the input data */
InputV = (unsigned char *) mxGetData(prhs[0]);
Nin = mxGetM(prhs[0]); /*number of input bytes */
Nout = 4*((Nin+2)/3);
/* create the output matrix */
plhs[0] = mxCreateNumericMatrix((mwSize)Nout,1,mxUINT8_CLASS,mxREAL);
/* get a pointer to the real data in the output matrix */
OutputV = (unsigned char *) mxGetData(plhs[0]);
/* call the computational routine */
Convert(InputV,OutputV,Nin,Nout);
}
To test it you can run
T = randi(255,[2^28,1],'uint8'); %250MB random data
tic;Res=encodeB64(T);t=toc %convert
(length(T)/2^20) / t %read in MB/s
(length(Res)/2^20) / t %write in MB/s
My result:
read: 467 MB/s write: 623 MB/s
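If building a MEX file is not an option, newer MATLAB releases (R2016b and later, if I remember correctly) also ship matlab.net.base64encode, which accepts a uint8 vector directly; it is slower than the MEX routine above, but needs no compiler:
str = matlab.net.base64encode(B); % B is the uint8 vector produced by writeBMP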
3 Put it all together and test
file = 'test.html';
fid = fopen(file,'wt');
fwrite(fid,sprintf('<html>\n<header> </header>\n<body>\n'));
fwrite(fid,sprintf('<p>%s</p>\n','Show the Matlab demo image street1.jpg'));
IM = imread('street1.jpg');figure(1);clf;image(IM);
B = writeBMP(IM);
str = encodeB64(B);
fwrite(fid,sprintf('<img src="data:image/bmp;base64,%s"/>\n',str));
fwrite(fid,sprintf('</body>\n</html>'));
fclose(fid);
This should generate a 1,229,008-byte HTML file with the image embedded.
I'm working on a wrapper for texgenpack for my AssetStudio python port.
The goal of the wrapper is a conversion of texture types into a format that PIL can use.
At the moment, I simply want to save the original texture to a file, then read it via texgenpack, convert it, and feed the result to PIL.
(Later on the file r/w will be replaced by passing bytes.)
When I try to use
def decompress(self, dst_filetype : FileType = FileType.PNG):
# init
cdef Image *image = <Image *> malloc(sizeof(Image))
src = tempfile.NamedTemporaryFile(suffix=self.filetype.name.lower(), delete=True)
dst = tempfile.NamedTemporaryFile(suffix=dst_filetype.name.lower(), delete=True)
#write image data to tempfile
src.write(self.data)
#load temp file as texture -> image
load_image(<const char *>src.name, <int> self.filetype, *image)
#save image as png
save_image(*image, <const char> *dst.name, <int> dst_filetype)
I get the error
#save image as png
save_image(*image, <const char> *dst.name, <int> dst_filetype)
^
------------------------------------------------------------
texgenpack.pyx:57:34: Expected an identifier or literal
I don't understand why the error shows up there, but not at load_image.
I tried multiple things, but pretty much all of them ended up in this error.
Since I mainly want to use it to convert textures, I tried to circumvent the problem by making a C function which does the load/save itself:
void convert_stexture_to_simage(const char *filename, int filetype, const char *dstname) {
Image image;
load_image(filename, filetype, &image);
save_image(&image, dstname, FILE_TYPE_PNG);
}
in image.c and added it to the header.
When I try to use this function via
convert_stexture_to_simage(<const char *>src.name, <int> self.filetype,<const char *>dst.name)
the following error is produced
texgenpack.obj : error LNK2001: unresolved external symbol "void __cdecl convert_stexture_to_simage(char const *,int,char const *)" (?convert_stexture_to_simage@@YAXPEBDH0@Z)
build\lib.win-amd64-3.7\texgenpack_py.cp37-win_amd64.pyd : fatal error LNK1120: 1 unresolved externals
I hope that one of you can tell me how one of these two problems can be solved.
Edit
Image is defined as
typedef struct {
unsigned int *pixels;
int width;
int height;
int extended_width;
int extended_height;
int alpha_bits; // 0 for no alpha, 1 if alpha is limited to 0 and 0xFF, 8 otherwise.
int nu_components; // Indicates the number of components.
int bits_per_component; // 8 or 16.
int is_signed; // 1 if the components are signed, 0 if unsigned.
int srgb; // Whether the image is stored in sRGB format.
int is_half_float; // The image pixels are combinations of half-floats. The pixel size is 64-bit.
} Image;
in the .pyx as
ctypedef struct Image:
unsigned int* pixels
int width
int height
int extended_width
int extended_height
int alpha_bits # 0 for no alpha, 1 if alpha is limited to 0 and 0xFF, 8 otherwise.
int nu_components # Indicates the number of components.
int bits_per_component # 8 or 16.
int is_signed # 1 if the components are signed, 0 if unsigned.
int srgb # Whether the image is stored in sRGB format.
int is_half_float # The image pixels are combinations of half-floats. The pixel size is 64-bit.
based on the 2nd answer of this question
The complete code can be found here.
I was able to solve the problem; it was actually pretty laughable. The problem was that the C sources weren't being compiled, so the function references in the header files couldn't link to any actual functions.
setup.py
import os
from setuptools import Extension, setup
try:
from Cython.Build import cythonize
except ImportError:
cythonize = None
def ALL_C(folder, exclude = []):
return [
'/'.join([folder, f])
for f in os.listdir(folder)
if f[-2:] == '.c' and f not in exclude
]
extensions = [
Extension(
name = "texgenpy",
sources = [
"texgen.pyx",
*ALL_C('texgenpack'),
],
#language = "c++",
include_dirs = [
"texgenpack",
"libfgen",
],
)
]
if cythonize:
extensions = cythonize(extensions)
setup(ext_modules = extensions)
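With this setup.py in place, the extension builds with the standard setuptools command (nothing project-specific assumed here):
python setup.py build_ext --inplace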
correct cython code to load the image via texgenpack
cdef class TTexture:
cdef Image image
def __init__(self, srcfile : str, filetype : int = -1):
if filetype == -1:
filetype = KNOWN_FILE_TYPES.get(srcfile.rsplit('.')[1].upper(), 0x000)
self.load(srcfile, filetype)
def load(self, srcfile, filetype):
# convert filepath to const char
src_b = (u"%s" % srcfile).encode('ascii')
cdef const char*src = src_b
load_image(src, <int> filetype, &self.image)
conversion to pillow image
@property
def image(self) -> PILImage:
# prepare tmp image in case of required conversion
cdef Image tmp_image
clone_image(&self.image, &tmp_image)
# convert image type
if tmp_image.is_half_float:
convert_image_from_half_float(&tmp_image, 0, 1.0, 1.0)
elif tmp_image.bits_per_component != 8:
print("Error -- cannot write PNG file with non 8-bit components.\n")
return None
if tmp_image.nu_components == 1: #grayscale
img = PILImage.new('L', (tmp_image.width, tmp_image.height))
img_data = img.load()
elif tmp_image.alpha_bits > 0:
img = PILImage.new('RGBA', (tmp_image.width, tmp_image.height))
img_data = img.load()
for y in range(tmp_image.height):
for x in range(tmp_image.width):
img_data[y,x] = calc_color_rgba(tmp_image.pixels[y*tmp_image.height + x])
else:
img = PILImage.new('RGB', (tmp_image.width, tmp_image.height))
img_data = img.load()
for y in range(tmp_image.height):
for x in range(tmp_image.width):
img_data[y,x] = calc_color(tmp_image.pixels[y*tmp_image.height + x])
return img
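A usage sketch of the resulting class; the texgenpy module name comes from the setup above, while the file names are purely illustrative:
from texgenpy import TTexture
tex = TTexture("example.dds") # hypothetical input texture; the type is guessed from the extension
pil_img = tex.image # decoded via texgenpack and converted to a Pillow image
pil_img.save("example.png")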
EDIT: new minimal working example to illustrate the question and better explanation of nvvp's outcome (following suggestions given in the comments).
So, I have crafted a "minimal" working example, which follows:
#include <cuComplex.h>
#include <iostream>
int const n = 512 * 100;
typedef float real;
template < class T >
struct my_complex {
T x;
T y;
};
__global__ void set( my_complex< real > * a )
{
my_complex< real > & d = a[ blockIdx.x * 1024 + threadIdx.x ];
d = { 1.0f, 0.0f };
}
__global__ void duplicate_whole( my_complex< real > * a )
{
my_complex< real > & d = a[ blockIdx.x * 1024 + threadIdx.x ];
d = { 2.0f * d.x, 2.0f * d.y };
}
__global__ void duplicate_half( real * a )
{
real & d = a[ blockIdx.x * 1024 + threadIdx.x ];
d *= 2.0f;
}
int main()
{
my_complex< real > * a;
cudaMalloc( ( void * * ) & a, sizeof( my_complex< real > ) * n * 1024 );
set<<< n, 1024 >>>( a );
cudaDeviceSynchronize();
duplicate_whole<<< n, 1024 >>>( a );
cudaDeviceSynchronize();
duplicate_half<<< 2 * n, 1024 >>>( reinterpret_cast< real * >( a ) );
cudaDeviceSynchronize();
my_complex< real > * a_h = new my_complex< real >[ n * 1024 ];
cudaMemcpy( a_h, a, sizeof( my_complex< real > ) * n * 1024, cudaMemcpyDeviceToHost );
std::cout << "( " << a_h[ 0 ].x << ", " << a_h[ 0 ].y << " )" << '\t' << "( " << a_h[ n * 1024 - 1 ].x << ", " << a_h[ n * 1024 - 1 ].y << " )" << std::endl;
return 0;
}
When I compile and run the above code, kernels duplicate_whole and duplicate_half take just about the same time to run.
However, when I analyze the kernels using nvvp I get different reports for each of the kernels in the following sense. For kernel duplicate_whole, nvvp warns me that at line 23 (d = { 2.0f * d.x, 2.0f * d.y };) the kernel is performing
Global Load L2 Transaction/Access = 8, Ideal Transaction/Access = 4
I agree that I am loading 8-byte words. What I do not understand is why 4 bytes is the ideal word size. In particular, there is no performance difference between the kernels.
I suppose that there must be circumstances where this global store access pattern could cause performance degradation. What are these?
And why is it that I do not get a performance hit?
I hope that this edit has clarified some unclear points.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I'll start with some kernel code to exemplify my question, which follows below:
template < class data_t >
__global__ void chirp_factors_multiply( std::complex< data_t > const * chirp_factors,
std::complex< data_t > * data,
int M,
int row_length,
int b,
int i_0
)
{
#ifndef CUGALE_MUL_SHUFFLE
// Output array length:
int plane_area = row_length * M;
// Process element:
int i = blockIdx.x * row_length + threadIdx.x + i_0;
my_complex< data_t > const chirp_factor = ref_complex( chirp_factors[ i ] );
my_complex< data_t > datum;
my_complex< data_t > datum_new;
for ( int i_b = 0; i_b < b; ++ i_b )
{
my_complex< data_t > & ref_datum = ref_complex( data[ i_b * plane_area + i ] );
datum = ref_datum;
datum_new.x = datum.x * chirp_factor.x - datum.y * chirp_factor.y;
datum_new.y = datum.x * chirp_factor.y + datum.y * chirp_factor.x;
ref_datum = datum_new;
}
#else
// Output array length:
int plane_area = row_length * M;
// Element to process:
int i = blockIdx.x * row_length + ( threadIdx.x + i_0 ) / 2;
my_complex< data_t > const chirp_factor = ref_complex( chirp_factors[ i ] );
// Real and imaginary part of datum (not respectively for odd threads):
data_t datum_a;
data_t datum_b;
// Even TIDs will read data in regular order, odd TIDs will read data in inverted order:
int parity = ( threadIdx.x % 2 );
int shuffle_dir = 1 - 2 * parity;
int inwarp_tid = threadIdx.x % warpSize;
for ( int i_b = 0; i_b < b; ++ i_b )
{
int data_idx = i_b * plane_area + i;
datum_a = reinterpret_cast< data_t * >( data + data_idx )[ parity ];
datum_b = __shfl_sync( 0xFFFFFFFF, datum_a, inwarp_tid + shuffle_dir, warpSize );
// Even TIDs compute real part, odd TIDs compute imaginary part:
reinterpret_cast< data_t * >( data + data_idx )[ parity ] = datum_a * chirp_factor.x - shuffle_dir * datum_b * chirp_factor.y;
}
#endif // #ifndef CUGALE_MUL_SHUFFLE
}
Let us consider the case where data_t is float, which is memory-bandwidth limited. As can be seen above, there are two versions of the kernel: one reads/writes 8 bytes (a whole complex number) per thread, and the other reads/writes 4 bytes per thread and then shuffles the results so the complex product is computed correctly.
The reason why I have written the version using shuffle is because nvvp insisted that reading 8 bytes per thread was not the best idea because this memory access pattern would be inefficient. This is the case even though in both systems tested (GTX 1050 and GTX Titan Xp) memory bandwidth was very close to theoretical maximum.
Sure enough, I knew that no improvement was likely to happen, and this was indeed the case: both kernels take pretty much the same time to run. So, my question is the following:
Why does nvvp report that reading 8 bytes would be less efficient than reading 4 bytes per thread? In which circumstances would that be the case?
As a side note, single precision is more important to me, but double is useful in some cases too. Interestingly enough, in the case where data_t is double there is no execution time difference between the two kernel versions either, even though in that case the kernel is compute bound and the shuffle version performs somewhat more flops than the original version.
Note: the kernels are applied to a row_length * M * b dataset (b images with row_length columns and M lines) and the chirp_factor array is row_length * M. Both kernels run perfectly fine (I can edit the question to show you the calls to both versions if you have doubts about it).
The issue here has to do with how the compiler is processing your code. nvvp is merely dutifully reporting what is happening when you run your code.
If you use the cuobjdump -sass tool on your executable, you will discover that the duplicate_whole routine is doing two 4-byte loads and two 4-byte stores. This is not optimal, partly because there is a stride in each load and store (each load and store touches alternate elements in memory).
The reason for this is that the compiler does not know the alignment of your my_complex struct. Your struct would be legal for use in situations that would prevent the compiler from generating a (legal) 8-byte load. As discussed here we can fix this by informing the compiler that we only intend to use the struct in alignment scenarios where a CUDA 8-byte load is legal (i.e. it is "naturally aligned"). The modification to your struct looks like this:
template < class T >
struct __align__(8) my_complex {
T x;
T y;
};
With that change to your code, the compiler generates 8-byte loads for the duplicate_whole kernel, and you should see a different report from the profiler. You should use this sort of decoration only when you understand what it means and are willing to enter into a contract with the compiler that you will ensure this is the case. If you do something unusual, like unusual pointer casting, you can violate your end of the bargain and generate a machine fault.
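For comparison, CUDA's built-in float2 (and cuFloatComplex, which is a typedef of it) is already declared with 8-byte alignment in the CUDA headers, so the equivalent kernel written against float2 leaves the compiler free to emit single 8-byte accesses without any extra decoration. A sketch using the same indexing as duplicate_whole:
__global__ void duplicate_whole_f2( float2 * a )
{
float2 & d = a[ blockIdx.x * 1024 + threadIdx.x ];
d = { 2.0f * d.x, 2.0f * d.y };
}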
The reason you don't see much performance difference almost certainly has to do with CUDA load/store behavior and the GPU caches.
When you do a strided load, the GPU loads an entire cacheline anyway, even though (in this case) you only need half the elements (the real elements) for that particular load operation. However you need the other half of the elements (the imaginary elements) anyway; they will be loaded on the next instruction, and this instruction most likely hits in the cache, due to the previous load.
On a strided store in this case, writing strided elements in one instruction and the alternate elements in the next instruction will end up using one of the caches as a "coalescing buffer". This isn't coalescing in the typical sense used in CUDA terminology; that sort of coalescing only applies to a single instruction. However the cache "coalescing buffer" behavior allows it to "accumulate" multiple writes to an already-resident line, before that line gets written out or evicted. This is approximately equivalent to "write-back" cache behavior.
I want to build a binary tree in a vector such that a parent's value is the sum of its two children. Building the tree recursively in C would look like:
int construct(int elements[], int start, int end, int* tree, int index) {
if (start == end) {
tree[index] = elements[start];
return tree[index];
}
int middle = start + (end - start) / 2;
tree[index] = construct(elements, start, middle, tree, index*2) +
construct(elements, middle + 1, end, tree, index*2+1);
return tree[index];
}
But I don't know how to build it in the CUDA in a parallel way by utilizing the thread. One reference I found useful is
How should we go about parallelizing this kind of recursive algorithm? One way is to use the approach presented by Garanzha et al., which processes the levels of nodes sequentially, starting from the root. The idea is to maintain a growing array of nodes in a breadth-first order, so that every level in the hierarchy corresponds to a linear range of nodes. On a given level, we launch one thread for each node that falls into this range. The thread starts by reading first and last from the node array and calling findSplit(). It then appends the resulting child nodes to the same node array using an atomic counter and writes out their corresponding sub-ranges. This process iterates so that each level outputs the nodes contained on the next level, which then get processed in the next round.
which processes each level sequentially and parallelizes over the nodes within each level. I think it makes total sense, but I don't know how to implement that exactly. Can somebody give me an idea or example of how to do that?
I am not sure the indexing scheme described above would work.
Here is some sample code that could work (though the tree indexing might not suit your needs):
__global__ void buildtreelevel(const int* elements, int count, int* tree)
{
int parentcount = (count + 1) >> 1;
for (int k = threadIdx.x + blockDim.x * blockIdx.x ; k < parentcount ; k += blockDim.x * gridDim.x)
{
if ((2*k+1) < count)
tree[k] = elements[k*2] + elements[k*2+1] ;
else
tree[k] = elements[k*2] ;
}
}
This function only processes one tree level at a time. The overall tree size is provided by:
int treesize (int count, int& maxlevel)
{
int res = 1 ;
while (count > 1)
{
count = (count + 1) >> 1 ;
res += count ;
++maxlevel;
}
return res ;
}
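For example, treesize(5, maxlevel) walks count through 5 → 3 → 2 → 1 and returns 1 + 3 + 2 + 1 = 7, incrementing maxlevel three times along the way.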
And building the whole tree requires several calls to the buildtreelevel kernel:
int buildtree (int grid, int block, const int* d_elements, int count, int** h_tree, int* d_data)
{
const int* ptr_elements = d_elements ;
int* ptr_data = d_data ;
int level = 0 ;
int levelcount = count ;
while (levelcount > 1)
{
buildtreelevel <<< grid, block >>> (ptr_elements, levelcount, ptr_data) ;
levelcount = (levelcount + 1) >> 1 ;
h_tree [level++] = ptr_data ;
ptr_elements = ptr_data ;
ptr_data += levelcount ;
}
return level ;
}
Synchronization only needs to occur at the end as all kernels are executed on stream 0.
int main()
{
int nElements = 10000000 ;
int* d_elements ;
int* d_data ;
int** h_tree ;
int maxlevel = 1 ;
cudaMalloc ((void**)&d_elements, nElements * sizeof (int)) ;
cudaMalloc ((void**)&d_data, treesize(nElements, maxlevel) * sizeof (int)) ;
h_tree = new int*[maxlevel];
buildtree (64, 256, d_elements, nElements, h_tree, d_data) ;
cudaError_t res = cudaDeviceSynchronize() ;
if (cudaSuccess != res)
fprintf (stderr, "ERROR (%d) : %s \n", res, cudaGetErrorString(res));
cudaDeviceReset();
}
Your tree structure is stored in h_tree, which is a host array of device pointers.
This is not optimal, but might be a good start; using aligned int4 loads with __ldg and processing 4 levels at a time might improve performance.
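As an illustration of that idea, here is a sketch of a vectorized per-level kernel (my addition, not tested): it still processes one level per launch, and it assumes the number of elements at that level is a multiple of 4 and the buffers are 16-byte aligned; fusing several levels into one launch is left out.
__global__ void buildtreelevel_vec(const int4* elements, int quadcount, int2* tree)
{
for (int k = threadIdx.x + blockDim.x * blockIdx.x ; k < quadcount ; k += blockDim.x * gridDim.x)
{
int4 v = __ldg(&elements[k]) ; // one 16-byte read-only load = 4 children
tree[k] = make_int2(v.x + v.y, v.z + v.w) ; // two parent sums per thread
}
}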
I found a predicate xml_quote_attribute/2 in library(sgml) of SWI-Prolog. This predicate works with the first argument as input and the second argument as output:
?- xml_quote_attribute('<abc>', X).
X = '&lt;abc&gt;'.
But I couldn't figure out how I can do the reverse conversion.
For example the following query doesn't work:
?- xml_quote_attribute(X, '&lt;abc&gt;').
ERROR: Arguments are not sufficiently instantiated
Is there another predicate that does the job?
Bye
This is what Ruud's solution looks like with DCG notation + pushback lists / semicontext notation.
:- use_module(library(dcg/basics)).
html_unescape --> sgml_entity, !, html_unescape.
html_unescape, [C] --> [C], !, html_unescape.
html_unescape --> [].
sgml_entity, [C] --> "&#", integer(C), ";".
sgml_entity, "<" --> "<".
sgml_entity, ">" --> ">".
sgml_entity, "&" --> "&".
Using DCGs makes the code a bit more readable. It also does away with some of the superfluous backtracking that Cookie Monster noted is the result of using append/3 for this.
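A usage sketch (my addition; it assumes SWI-Prolog, where backquoted text denotes a list of character codes):
?- phrase(html_unescape, `&lt;a&gt; &#26361;&#25805;`, Unescaped),
format('~s~n', [Unescaped]).
This should print the same <a> 曹操 as in the sample output of the naive solution below.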
Here's the naive solution, using lists of character codes. Most likely it will not give you the best performance possible, but for strings that are not extremely long, it might just be alright.
html_unescape("", "") :- !.
html_unescape(Escaped, Unescaped) :-
append("&", _, Escaped),
!,
append(E1, E2, Escaped),
sgml_entity(E1, U1),
!,
html_unescape(E2, U2),
append(U1, U2, Unescaped).
html_unescape(Escaped, Unescaped) :-
append([C], E2, Escaped),
html_unescape(E2, U2),
append([C], U2, Unescaped).
sgml_entity(Escaped, [C]) :-
append(["&#", L, ";"], Escaped),
catch(number_codes(C, L), error(syntax_error(_), _), fail),
!.
sgml_entity("<", "<").
sgml_entity(">", ">").
sgml_entity("&", "&").
You will have to complete the list of SGML entities yourself.
Sample output:
?- html_unescape("<a> 曹操", L), format('~s', [L]).
<a> 曹操
L = [60, 97, 62, 32, 26361, 25805].
If you don't mind linking a foreign module, then you can make a very efficient implementation in C.
html_unescape.pl:
:- module(html_unescape, [ html_unescape/2 ]).
:- use_foreign_library(foreign('./html_unescape.so')).
html_unescape.c:
#include <stdio.h>
#include <string.h>
#include <SWI-Prolog.h>
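/* Encode one Unicode code point as UTF-8 at *unesc (advancing the pointer); surrogates and values above 0x10FFFF are rejected. */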
static int to_utf8(char **unesc, unsigned ccode)
{
int ok = 1;
if (ccode < 0x80)
{
*(*unesc)++ = ccode;
}
else if (ccode < 0x800)
{
*(*unesc)++ = 192 + ccode / 64;
*(*unesc)++ = 128 + ccode % 64;
}
else if (ccode - 0xd800u < 0x800)
{
ok = 0;
}
else if (ccode < 0x10000)
{
*(*unesc)++ = 224 + ccode / 4096;
*(*unesc)++ = 128 + ccode / 64 % 64;
*(*unesc)++ = 128 + ccode % 64;
}
else if (ccode < 0x110000)
{
*(*unesc)++ = 240 + ccode / 262144;
*(*unesc)++ = 128 + ccode / 4096 % 64;
*(*unesc)++ = 128 + ccode / 64 % 64;
*(*unesc)++ = 128 + ccode % 64;
}
else
{
ok = 0;
}
return ok;
}
static int numeric_entity(char **esc, char **unesc)
{
int consumed;
unsigned ccode;
int ok = (sscanf(*esc, "&#%u;%n", &ccode, &consumed) > 0 ||
sscanf(*esc, "&#x%x;%n", &ccode, &consumed) > 0) &&
consumed > 0 &&
to_utf8(unesc, ccode);
if (ok)
{
*esc += consumed;
}
return ok;
}
static int symbolic_entity(char **esc, char **unesc, char *name, int ccode)
{
int ok = strncmp(*esc, name, strlen(name)) == 0 &&
to_utf8(unesc, ccode);
if (ok)
{
*esc += strlen(name);
}
return ok;
}
static foreign_t pl_html_unescape(term_t escaped, term_t unescaped)
{
char *esc;
if (!PL_get_chars(escaped, &esc, CVT_ATOM | REP_UTF8))
{
PL_fail;
}
else if (strchr(esc, '&') == NULL)
{
return PL_unify(escaped, unescaped);
}
else
{
char buffer[strlen(esc) + 1];
char *unesc = buffer;
while (*esc != '\0')
{
if (*esc != '&' || !(numeric_entity(&esc, &unesc) ||
symbolic_entity(&esc, &unesc, "<", '<') ||
symbolic_entity(&esc, &unesc, ">", '>') ||
symbolic_entity(&esc, &unesc, "&", '&')))
// TODO: more entities...
{
*unesc++ = *esc++;
}
}
return PL_unify_chars(unescaped, PL_ATOM | REP_UTF8, unesc - buffer, buffer);
}
}
install_t install_html_unescape()
{
PL_register_foreign("html_unescape", 2, pl_html_unescape, 0);
}
The following statement will build a shared library html_unescape.so from html_unescape.c. Tested on Ubuntu 14.04; may be different on Windows.
swipl-ld -shared -o html_unescape html_unescape.c
Start up SWI-Prolog:
swipl html_unescape.pl
Sample output:
?- html_unescape('&lt;a&gt; &#26361;&#25805;', S).
S = '<a> 曹操'.
With special thanks to the SWI-Prolog documentation and source code, and to the question "C library to convert unicode code points to UTF8?".
Not aspiring to be the ultimate answer, since it doesn't give a solution for SWI-Prolog. For a Java-based interpreter the problem is that XML escaping is not part of J2SE, at least not in a simple form (I didn't figure out how to use Xerces or the like).
A possible route would be to interface to StringEscapeUtils ( * ) from Apache Commons. But then again this would not be necessary on Android, since there is the TextUtils class. So we rolled our own ( * * ) little conversion. It works as follows:
?- text_escape('<abc>', X).
X = '&lt;abc&gt;'
?- text_escape(X, '&lt;abc&gt;').
X = '<abc>'
Note the use of the Java methods codePointAt() and charCount(), respectively appendCodePoint(), in the Java source code. So it could also escape and unescape code points above the basic plane, i.e. in the range >0xFFFF (currently not implemented, left as an exercise).
On the other hand the Apache libraries, at least version 2.6, are NOT surrogate-pair aware and will place two decimal entities per code point instead of one.
Bye
( * ) Java: Class StringEscapeUtils Source
http://grepcode.com/file/repo1.maven.org/maven2/commons-lang/commons-lang/2.6/org/apache/commons/lang/Entities.java#Entities.escape%28java.io.Writer,java.lang.String%29
( * * ) Jekejeke Prolog: Module xml
http://www.jekejeke.ch/idatab/doclet/prod/en/docs/05_run/10_docu/05_frequent/07_theories/20_system/03_xml.html
I'm in a little trouble here.
Can anyone help me implement a solution that reverses every byte, so 0xAB becomes 0xBA, but not one that turns "abcd" into "dcba"? I need AB CD EF to become BA DC FE.
Preferably in C or C++ but it doesn't really matter provided it can run.
So far, I've implemented an UBER CRAPPY solution in PureBasic that doesn't even work (and yes, I know that converting to a string and back to binary is a crappy approach).
OpenConsole()
filename$ = OpenFileRequester("Open File","","All types | *.*",0)
If filename$ = ""
End
EndIf
OpenFile(0,filename$)
*Byte = AllocateMemory(1)
ProcessedBytes = 0
Loc=Loc(0)
Repeat
FileSeek(0,Loc(0)+1)
PokeB(*Byte,ReadByte(0))
BitStr$ = RSet(Bin(Asc(PeekS(*Byte))),16,"0")
FirstStr$ = Left(BitStr$,8)
SecondStr$ = Right(BitStr$,8)
BitStr$ = SecondStr$ + FirstStr$
Bit.b = Val(BitStr$)
WriteByte(0,Bit)
ProcessedBytes = ProcessedBytes + 1
ClearConsole()
Print("Processed Bytes: ")
Print(Str(ProcessedBytes))
Loc=Loc(0)
Until Loc = Lof(0)
Delay(10000)
Thanks for reading.
Reading your PureBasic code (I skipped it at first), it does seem you want to swap endianness, even though that's not what your text is asking: 0xAB practically always means a single byte with decimal value 171, not two bytes, and it's extremely common to display a byte as two hex digits, which is why A-F show up in your example.
#include <iostream>
int main() {
using namespace std;
for (char a; cin.get(a);) {
char b;
if (!cin.get(b)) {
cout.put(a); // better to write it than lose it
cerr << "Damn it, input ends with an odd byte, is it in "
"the right format?\n";
return 1;
}
cout.put(b);
cout.put(a);
}
return 0;
}
// C version is a similar easy translation from the original code
import numpy
import sys
numpy.fromfile(sys.stdin, numpy.int16).byteswap(True).tofile(sys.stdout)
Original answer:
I'm not sure why you want this (it doesn't convert endian, for example, if you want that), but here you go:
#include <stdio.h>
int main() {
for (int c; (c = getchar()) != EOF;) {
putchar(((c & 0xF) << 4) | ((c & 0xF0) >> 4));
}
return 0;
}
#include <iostream>
int main() {
for (char c; std::cin.get(c);) {
std::cout.put(((c & 0xF) << 4) | ((c & 0xF0) >> 4));
}
return 0;
}
import sys
for line in sys.stdin:
sys.stdout.write("".join(
chr(((ord(c) & 0xF) << 4) | ((ord(c) & 0xF0) >> 4))
for c in line
))
All assume that text translations don't occur (such as \n to \r\n and vice versa); you'll have to change them to opening files in binary mode if that's the case. They read from stdin and write to stdout, if you're unfamiliar with that, so just use programname < inputfile > outputfile to run them.
Reversing the high and low half-byte is possible through a simple arithmetic formula (assuming you operate on unsigned bytes):
reversed = (original % 16) * 16 + (original / 16);
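For example, with original = 0xAB (decimal 171): 171 % 16 = 11 and 171 / 16 = 10, so reversed = 11 * 16 + 10 = 186, which is 0xBA.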
A Haskell solution:
module ReverseBytes where
import qualified Data.ByteString as B
import Data.Bits
import Data.Word
-----------------------------------------------------------
main :: IO ()
main = B.getContents >>= B.putStr . B.map reverseByte
reverseByte :: Word8 -> Word8
reverseByte = flip rotate 4
runghc ReverseBytes.hs < inputfile > outputfile