Obtain source code from uncompiled contract with bytecode and ABI? - ethereum

Theres a contract on BSC that isnt verified and I am really keen to get the code behind it. I have both the full bytecode and ABI. Is it possible to obtain readable source code using this?
Thanks!

BSCScan has an integrated decompiler that produces pseudocode from the input binary bytecode.
It's not perfect - some of the resulting code performs overly complicated operations that can be written on one line in Solidity, some functions are not able to decompile at all, ... but it can help with manually reconstructing the source code.
There are other decompilers available online as well. Usually it helps to decompile the binary using multiple tools so that you get a better sense of what the source code should do.

Related

Minify ceylon-sdk and ceylon-language when compiling to javascript

For an in-browser application written in ceylon-js it would be desirable to reduce the size of the ceylon.language-1.2.0.js file to only that what is actually needed.
This question was answered already.
How to use ceylon js (also with google closure compiler)
But the given solution involves manually editing javascript code resulting from compilation. This is not desirable since a compiler should produce code that hasnĀ“t to be edited manually after compilation (abstraction).
And it is not clear to me if google closure compiler can cope with the ceylon flavour of it in a reliable way.
Is it instead a solution to copy ceylon.language source in ceylon into the project and import only those parts of ceylon.language into the project that are required by it? Then compile to javascript. And then leave away ceylon.language-1.2.0.js at all from the client / in-browser application.
Now my questions:
What parts are needed in the most simple browser application? I think of something like Array(String) and the like.
Has that solution a chance to work absolutely reliable?
Will there be a better solution coming from the authors of ceylon that make this attempt obsolete?
The compilation of the language module to JS is a tricky process, because of the native stuff involved and because there are a couple of declarations that have to be in a certain order for things to work.
Minification is still pending, we are going to do it but it's not the highest priority right now, and we have to determine the best way to solve this problem; one option that has been discussed is to have a version of the language module without any metamodel info, for example.

Code translation process

I'm going to do a presentation about programming languages in our class, gonna talk about the basics. It's going to be a brief one, around 5-10 minutes. The audience has no knowledge in this subject.
One of the things I'm going to talk about is low-level and high-level languages, and machine code. To simplify and visualize the difference I created this image.
But this is just a guess. I'm not sure if this is correct. Probably not. Could you enlighten me on how this process works without going into too much detail?
I'm not sure if this is the right place to ask this question. If not, I'll move it to somewhere else. Guide me. Also, about the title and the tags, you can correct them.
What happens largely depends on your environment, so there is no one answer. A general high level view, considering you're starting with what appears to be the C language and assuming its a standard environment (not something such as a Java virtual machine) is that:
A compiler converts C to assembly
An assembler converts assembly to object code (what you show as "low-level language")
A linker gathers one or more file of object code and attempts to fill out its needs with the content of libraries it knows about. This output is still object code, but step 3's object code was for a specific file's instructions only. This object code is in a format appropriate for step 4.
A loader reads the program into memory, potentially satisfying dynamic links that are required to run the program. It takes operating system specific steps to create a process that will execute the program.

Reading disassembled code

I wrote simple Hello word program with masm32. But then when I try to disassemble it with IDA and I am getting much bigger output (I won't write it there because it would take to much space). And I don't get it why it's different. How to run the disasembled code?
This is normal. Compilation is a "lossy" process, which means that if you compile code and then decompile it, you're not guaranteed to get exactly the same thing out that you originally put in. The same thing applies to assembly language. When you assemble and link the code, it's a one-way process.
This is why programmers save the original source code, rather than just trying to decompile their binaries when they want to fix bugs.

Understanding run time code interpretation and execution

I'm creating a game in XNA and was thinking of creating my own scripting language (extremely simple mind you). I know there's better ways to go about this (and that I'm reinventing the wheel), but I want the learning experience more than to be productive and fast.
When confronted with code at run time, from what I understand, the usual approach is to parse into a machine code or byte code or something else that is actually executable and then execute that, right? But, for instance, when Chrome first came out they said their JavaScript engine was fast because it compiles the JavaScript into machine code. This implies other engines weren't compiling into machine code.
I'd prefer not compiling to a lower language, so are there any known modern techniques for parsing and executing code without compiling to low level? Perhaps something like parsing the code into some sort of tree, branching through the tree, and comparing each symbol and calling some function that handles that symbol? (Wild guessing and stabbing in the dark)
I personally wouldn't roll your own parser ( turning the input into tokens ) or lexer ( checking the input tokens for your language grammar ). Take a look at ANTLR for parsing/lexing - it's a great framework and has full source code if you want to dig into the guts of it.
For executing code that you've parsed, I'd look at running a simple virtual machine or even better look at llvm which is an open-source(ish) attempt to standardise the virtual machine byte code format and provide nice features like JITing ( turning your script compiled byte code into assembly ).
I wouldn't discourage you from the more advanced options that you machine such as native machine code execution but bear in mind that this is a very specialist area and gets real complex, real fast!
Earlz pointed out that my reply might seem to imply 'don't bother doing this yourself. Re-reading my post it does sound a bit that way. The reason I mentioned ANTLR and LLVM is they both have heaps of source code and tutorials so I feel this is a good reference source. Take it as a base and play
You can try this framework for building languages (it works well with XNA):
http://www.meta-alternative.net/mbase.html
There are some tutorials:
http://www.meta-alternative.net/calc.pdf
http://www.meta-alternative.net/pfront.pdf
Python is great as a scripting language. I would recommend you make a C# binding for its C API and use that. Embedding Python is easy. Your application can define functions, types/classes and variables inside modules which the Python interpreter can access. The application can also call functions in Python scripts and get a result back. These two features combined gives you a two-way communication scheme.
Basically, you get the Python syntax and semantics for free. What you would need to implement is the API your application exposes to Python. An example could be access to game logic functions and render functions. Python scripts would then define functions which calls these, and the host application would invoke the Python functions (with parameters) to get work done.
EDIT: Seems like IronPython can save you even more work. It's a C# implementation of CPython, and has its own embedding API: http://www.ironpython.net/

Adding a language to the AVM2

I'm interested in making a language to run on the AVM2 and I'm looking for advice on where to start. I do realize that this is by no means a trivial task, but I would like to give it a try and at the very least learn more about implementing a language along the way.
I have messed around with ANTLR and have been reading up on syntax issues for language development. What I'm looking for is advice on a path to take or useful references/books.
For instance I would like to generate (script/manually) some very simple AVM2 bytecode and get that to run on the VM as a start.
Thanks
If you are not interested in Haxe, you will basically need to write your own compiler that compiles objects down to ABC (Actionscript Byte Code). The AVM2 Overview document available from Adobe on ABC and the AVM2 which should help you get started. It's a fairly thorough document but stay alert for a few typo's in the bytecode instructions.
You will also need to wrap the bytecode in a doABC tag as part of a SWF container. You can get more information from the SWF File Format documentation.
If you'd like a headstart on writing the data structures (optimised int formats, etc), feel free to checkout the code at asmock, a dynamic mocking project I've been working on. The SWF/ByteCode generation stuff is a bit messy but there are IDataOutput wrappers (SWF, ByteCode) that might come in handy.
Project Alchemy by Adobe can be a good reference
http://labs.adobe.com/technologies/alchemy/
How did it go?
I'm also interested in doing a Java to AVM2 compiler...
Do you have any published code?
Take a look at Haxe: it is an open source language that can target different platforms, including the AVM. You can dig into the SWF compiler source code to get some inspiration.