We are starting a project that uses PyPy 2.2.1 and we want to start using SQLAlchemy in it. One thing we noticed is that when installing SQLAlchemy in the pypy virtual environment the following message appeared:
WARNING: C extensions are not supported on this Python platform, speedups are not enabled.
We are, of course, concerned about the performance of the SQL queries we will build and of the manipulation of the result sets, and we don't really know how large the impact of these disabled speedups can be in SQLAlchemy. In addition, while looking into other solutions for the sake of a test, installing Storm and SQLObject did not produce any warning message like SQLAlchemy's in the pypy virtual environment.
In this specific scenario, would you recommend still using SQLAlchemy even though the speedups are not enabled? Or would an ORM like Storm have better performance, especially when dealing with complex queries and larger result sets?
The C extensions in SQLAlchemy provide a faster result-set object than can easily be achieved in pure Python, but in this case you may get a comparable benefit from the PyPy JIT. Even if you don't, this won't in any way affect query performance. The warning can be safely ignored.
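To see what the speedups actually optimize, here is a rough pure-Python sketch of the kind of per-row work the C extension replaces. PyRow is a hypothetical stand-in, not SQLAlchemy's actual class; the point is that this per-access overhead is exactly the kind of hot loop PyPy's JIT is good at compiling away.

```python
# Rough sketch of what the C "speedups" optimize: the per-row proxy
# that maps column names to positions. PyRow is a made-up stand-in.
import timeit

class PyRow:
    """Pure-Python row proxy: every access goes through __getitem__."""
    __slots__ = ("_data", "_keymap")

    def __init__(self, data, keymap):
        self._data = data
        self._keymap = keymap

    def __getitem__(self, key):
        # String keys pay for an isinstance check plus a dict lookup on
        # every field access; a C implementation collapses this cost.
        if isinstance(key, str):
            key = self._keymap[key]
        return self._data[key]

row = PyRow((1, "alice"), {"id": 0, "name": 1})
plain = (1, "alice")

print("proxy:", timeit.timeit(lambda: row["name"], number=100_000))
print("tuple:", timeit.timeit(lambda: plain[1], number=100_000))
```

On CPython the proxy is measurably slower than raw tuple access; on PyPy the JIT can often specialize the lookup path so the gap shrinks, which is why the missing C extensions matter less there.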
I have a couple of questions about adding options/switches (with and without parameters) to procedures/commands. I see that Tcllib has cmdline, and Ashok Nadkarni's book on Tcl recommends the parse_args package, stating that handling arguments in Tcl script is much slower than that package's C implementation. The Nov. 2016 paper on parse_args states that Tcl script methods are, or can be, 50 times slower.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Can both be easily included in a starkit?
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
Thank you for considering my questions.
Are Tcl methods really significantly slower? Is there some minimum threshold number of options to be reached before using a package?
Potentially. Procedures have overhead to do with managing the stack frame and so on, and code implemented in C can avoid a number of overheads due to the way values are managed in current Tcl implementations. The difference is much more profound for numeric code than for string-based code, as the cost of boxing and unboxing numeric values is quite significant (strings are always boxed in all languages).
As for which one to use, it really depends on the details, as you are trading off flexibility for speed. I've never known it to be a problem for command-line parsing.
(If you ask me, fifty options isn't really that many, except that it's quite a lot to pass on an actual command line. It might be easier to design a configuration file format — perhaps a simple Tcl script! — and then to just pass the name of that in as the actual argument.)
Is there any reason to use parse_args (not in tcllib) over cmdline (in tcllib)?
Performance? Details of how you describe things to the parser?
Can both be easily included in a starkit?
As long as any C code is built with Tcl stubs enabled (typically not much more than defining USE_TCL_STUBS and linking against the stub library), it can go in a starkit as a loadable library. Using the stubbed build means that the compiled code doesn't assume exactly which version of the Tcl library is present or what its path is; those are assumptions that are usually wrong with a starkit.
Tcl-implemented packages can always go in a starkit. Hybrid packages need a little care for their C parts, but are otherwise pretty easy.
Many packages either always build in stubbed mode or have a build configuration option to do so.
Is this included in 8.7a now? (I'd like to use 8.7a but I'm using Manjaro Linux and am afraid that adding it outside the package manager will cause issues that I won't know how to resolve or even just "undo").
We think we're about a month from the feature freeze for 8.7, and builds seem stable in automated testing so the beta phase will probably be fairly short. The list of what's in can be found here (filter for 8.7 and Final). However, bear in mind that we tend to feel that if code can be done in an extension then there's usually no desperate need for it to be in Tcl itself.
Since Apache released the first final version of Johnzon, it would be really interesting to see whether there are already comparisons between Johnzon and FasterXML's Jackson, to judge whether it is worth switching. The most important topic is probably performance.
Has anyone already done performance tests? Can you share your result?
Best
There are some performance benchmarks up on github.
But for each of them you really have to verify if the benchmark is actually correctly implemented.
From what I've seen, most benchmarks use the official javax.* APIs in a sub-optimal way. Most call Json.createGenerator etc. directly, but they should actually obtain the provider once via JsonProvider.provider(), store it, and then call createGenerator etc. on that JsonProvider.
That way you can make sure that you really get comparable results.
We have done quite a few tests, and to me Johnzon's numbers look really good, especially since it's much smaller than most other JSON libs.
As mentioned in several other sources and mailing lists (TomEE, for example), the performance gain, if any, is negligible, especially when you compare it to the overall request-response processing chain.
If you use Spring Boot, you will find a lot more community support and flexibility in terms of features for Jackson.
Jackson has tons of different modules and good support for other JVM languages (for example, KotlinModule).
We, in my project, also use quite a lot of Clojure, where we use Cheshire, which relies on Jackson under the hood.
In the end, it's up to you what to use and whether the cases I mentioned apply to your project, but so far I haven't seen any compelling performance reports about Johnzon, and until that happens I would go for the library with much higher adoption in the industry.
We are creating unit test cases for our existing code base. As we progress, the test files are getting bigger and are taking a very long time to execute.
I know the limitations of unit testing, and I did some research on increasing efficiency. While researching, I found one useful idea: tightening up the provided data sets.
Still, I am looking for more ideas on how to make creating and running the unit test cases more efficient. Increasing server resources is outside the scope of this question.
As your question was general I'll cover a few of the common choices. But most of the speed-up techniques have downsides.
If you have dependencies on external components (web services, file systems, etc.) you can get a speed-up by mocking them. This is usually desirable for unit testing anyway. You still need to have integration/functional tests that test with the real component.
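As a minimal sketch of this idea (shown in Python's unittest.mock; the names PriceClient, http_get, and fetch_price are invented for illustration):

```python
# Mock out an external dependency so the unit test never touches the
# network. PriceClient and its methods are hypothetical examples.
from unittest import mock

class PriceClient:
    def http_get(self, url):
        # In production this would do a real (slow, flaky) HTTP call.
        raise RuntimeError("real network call attempted in a unit test")

    def fetch_price(self, symbol):
        return float(self.http_get(f"https://example.com/quote/{symbol}"))

client = PriceClient()
# Replace the network call with a canned response for this test.
with mock.patch.object(client, "http_get", return_value="42.5"):
    price = client.fetch_price("ACME")

assert price == 42.5
```

The same pattern exists in PHPUnit (createMock/getMockBuilder), Mockito for Java, and most other test frameworks.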
If testing databases, you can get quite a speed-up by using an in-memory database (SQLite works well with PHP's PDO; with Java, maybe H2?). This can have downsides, unless database portability is already a design goal. (I'm about to move to running one set of unit tests against both MySQL and SQLite.) Mocking the database away completely (see above) may be better.
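The in-memory approach looks like this in Python's built-in sqlite3 (the schema and data are placeholders); PHP's PDO takes the same ":memory:" DSN, and H2 has an equivalent jdbc:h2:mem: URL:

```python
# Point tests at an in-memory SQLite database: no disk I/O, and the
# database vanishes when the connection closes, so tests stay isolated.
import sqlite3

conn = sqlite3.connect(":memory:")  # exists only for this connection
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

rows = conn.execute("SELECT name FROM users").fetchall()
print(rows)  # [('alice',)]
conn.close()
```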
PHPUnit allows you to specify @group on each test. You could go through and mark your slower tests with @group slow, and then use the --exclude-group command-line flag to exclude them on most of your test runs and just include them in the overnight build. (You can also specify groups to include/exclude in your phpunit.xml.dist file.)
(I don't think JUnit has this option, but TestNG does; for C#, NUnit offers categories for this.)
Creating fixtures once and then sharing them between tests is quicker than creating the fixture before each test. The book xUnit Test Patterns devotes whole chapters to the pros and cons (mostly cons) of this approach.
I know throwing hardware at it was explicitly ruled out in your question, but look again at @group, and consider how it can let you split your tests across multiple machines. Or split tests by directory, and process one directory on each of several machines on your LAN. (PHPUnit is single-threaded, so you could run multiple instances on the same machine, each doing its own directory; be aware that fixtures need to be independent (including unique names for databases you create, mocking the filesystem, etc.) if you go down this route.)
I am looking to do some work around mysql and node.js and have found a few different modules out there but I cannot get a good bead on their stability/maturity. I know each author puts very hard work into each one, but for the work we're doing I need to know I've got a solid mysql foundation. The modules I've found that look pretty good are:
db-mysql This appears pretty active.
node-mysql This is a pretty pervasive module I've seen so far, it appears to be in a maintenance phase, and seems solid.
node-mysql-native I like the async work being done here, but I'm not sure how well it works yet.
node-mysql-libmysqlclient I'm not sure about this one, but it appears to be active as well.
I don't have many needs that are too far out of the ordinary. I need regular query support, extras would be nice, I just need a good foundation to start from. Any input as to the strengths and weaknesses of these modules would be great. If there is another quality contender I have not found I am not at all against considering another option.
I'm the author of the node-mysql-native driver. From my point of view, the differences are:
no prepared statements support (yet) in node-mysql
according to my benchmarks node-mysql is 10-20% slower than node-mysql-native
node-mysql has much wider adoption, more tests and users. If you need stability, better use it
node-mysql-libmysqlclient is 2 to 3 times faster on fast queries. However, if you have a lot of slow queries and use connection pools it could be even slower than native JS driver because libmysqlclient async calls are based on nodejs thread pool and not on event loop.
Update (as of 11/07/2013):
Point 2 is no longer valid (mysql-native is now a bit slower than node-mysql).
I have this alternative to node-mysql; on some benchmarks it's 2-3 times faster, has the same API plus support for prepared statements, SSL, and compression. It also implements a simple subset of the server-side protocol; see, for example, the MySQL -> Postgres proxy.
node-mariasql is also a very good option (if it's ok to use binary addon) - fast, stable, async, prepared statements support, compression and SSL.
I went through a similar search and ended up settling on node-mysql. I like its simplicity, the fact that it's pure JS, and that it's well supported. It was slower in my tests than some of the mixed modules (those that use non-JS libs), but I did a minor patch that helped considerably with that for my cases:
https://github.com/geochap/node-mysql
Hi,
We are going to start a CRUD project. I have some experience using Groovy, and I think it is the right tool. My concern is performance: how good is Groovy compared to a Java solution? We estimate up to 100 simultaneous users. We are going to use a MySQL DB and a Tomcat server.
Any comments or suggestions?
Thanks
I've recently gathered five negative votes (!) on an answer on Groovy performance; however, I think there should be, indeed, a need for objective facts. Personally, I think it's productive and fun to work with Groovy and Grails; nevertheless, there is a performance issue that needs to be addressed.
There are a number of benchmark comparisons on the web, including this one. You can never trust single benchmarks (and the cited one isn't even close to being scientific), but you'll get the idea.
Groovy strongly relies on runtime meta programming. Every object in Groovy (well, except Groovy scripts) extends from GroovyObject with its invokeMethod(..) method, for example. Every time you call a method in your Groovy classes, the method will not be called, directly, as in Java, but by invoking the aforementioned invokeMethod(..) (which does a whole bunch of reflection and lookups).
Additionally, every GroovyObject has an associated MetaClass. The concepts of method invocation, etc., are similar.
There are other factors that decrease Groovy performance in comparison to Java, including the boxing of primitive data types and (optional) weak typing, but the aforementioned runtime meta programming is crucial. It means that the JIT compiler, which compiles Java bytecode to native code to speed up execution, has little chance to optimize such call sites.
To address these issues, there's the Groovy++ project. You simply annotate your Groovy classes with @Typed, and they'll be statically compiled to (real) Java bytecode. Unfortunately, however, I found Groovy++ to be not quite mature, not well integrated with the main Groovy line, and poorly supported by IDEs. Groovy++ also contradicts basic Groovy programming paradigms. Moreover, Groovy++'s @Typed annotation does not work recursively; that is, it does not affect underlying libraries like GORM or the Grails controllers infrastructure.
I guess you're evaluating a Grails project as well.
Grails' GORM makes heavy use of runtime meta programming; using Hibernate directly should perform much better. At the controllers or (especially) services level, extensive computations can be externalized to Java classes. However, GORM's share of the work in typical CRUD applications is high.
Potential performance issues in Grails are typically addressed by caching layers at the database level or by avoiding calls into service or controller methods (see the SpringCache plugin or the Cache Filter plugin). These are typically implemented on top of the Ehcache infrastructure.
Caching, obviously, suits static data well, in contrast to (database) data that changes frequently, or web output that is rather variable.
And, finally, you can "throw hardware at it". :-)
In conclusion, the most decisive factor for or against using Groovy/Grails on a large-scale website ought to be whether caching fits the specific website's nature.
EDIT:
As for the question whether Java's JIT compiler had a chance to step in ...
A simple Groovy class
class Hello {
    def getGreeting(name) {
        "Hello " + name
    }
}
gets compiled to
public class Hello implements GroovyObject {
    public Hello() {
        Hello this;
        CallSite[] arrayOfCallSite = $getCallSiteArray();
    }

    public Object getGreeting(Object name) {
        CallSite[] arrayOfCallSite = $getCallSiteArray();
        return arrayOfCallSite[0].call("Hello ", name);
    }

    static {
        Long tmp6_3 = Long.valueOf(0L);
        __timeStamp__239_neverHappen1288962446391 = (Long)tmp6_3;
        tmp6_3;
        Long tmp20_17 = Long.valueOf(1288962446391L);
        __timeStamp = (Long)tmp20_17;
        tmp20_17;
        return;
    }
}
This is just the tip of the iceberg. Jochen Theodorou, an active Groovy developer, put it this way:
A method invocation in Groovy usually consists of several normal method calls, where the arguments are stored in an array, the classes of the arguments must be retrieved, a key is generated out of them, a hashmap is used to look up the method, and if that fails we have to test the available methods for compatible ones, select one of them based on the runtime type, create a key for the hashmap, and then, in the end, do a reflection-like call on the method.
I really don't think that the JIT inlines such dynamic, highly complex invocations.
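As a rough illustration of the indirection the quote describes (sketched in Python, since the mechanism is language-independent; the method table and key scheme are invented for this example):

```python
# Direct call vs. a caricature of dynamic dispatch: build a key from
# the argument types, look the method up in a table, then call it.
import timeit

class Greeter:
    def greeting(self, name):
        return "Hello " + name

# A made-up runtime method table, standing in for Groovy's MetaClass.
METHOD_TABLE = {("Greeter", "greeting", (str,)): Greeter.greeting}
g = Greeter()

def direct_call():
    return g.greeting("world")

def dynamic_call():
    # Per-call work: compute argument types, build a key, hash lookup,
    # then an indirect invocation - none of which a JIT can easily inline.
    key = (type(g).__name__, "greeting", tuple(type(a) for a in ("world",)))
    method = METHOD_TABLE[key]
    return method(g, "world")

assert direct_call() == dynamic_call() == "Hello world"
print("direct: ", timeit.timeit(direct_call, number=50_000))
print("dynamic:", timeit.timeit(dynamic_call, number=50_000))
```

The real Groovy call path does considerably more than this sketch (reflection, fallback searches over candidate methods), which is why the gap to plain Java calls is larger still.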
As for a "solution" to your question, there is no "do it that way and you're fine". Instead, the task is to identify the factors that are more crucial than others and possible alternatives and mitigation strategies, to evaluate their impact on your current use cases ("can I live with it?"), and, finally, to identify the mix of technologies that meets the requirements best (not completely).
Performance (in the context of web applications) is an aspect of your application and not of the framework/language you are using. Any discussion and comparison about method invocation speed, reflection speed and the amount of framework layers a call goes through is completely irrelevant. You are not implementing photoshop filters, fractals or a raytracer. You are implementing web based CRUD.
Your showstopper will most probably be inefficient database design, N+1 queries (in case you use ORM), full table scans etc.
To answer your question: use any modern language/web framework you feel more confident with and focus on correct architecture/design to solve the business problem at hand.
Thanks for the answers and advice. I like Groovy; there might be performance problems under some circumstances, and Groovy++ might be a better choice. At this point I would prefer to give "Spring Roo" a chance, which has a huge overlap with Groovy, but you remain in Java and no roo.jar is added to your project. Therefore you are not paying any extra cost for using it.
Moreover, Roo allows reverse engineering and round-trip engineering.
Unfortunately the plug-in library is pretty small up to now.
Luis
50 to 100 active users is not much traffic. As long as you have cached pages correctly and your MySQL queries are properly indexed, you should be OK.
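The indexing point is easy to see directly from a query planner. A minimal sketch using SQLite's EXPLAIN QUERY PLAN (table, column, and data are invented; MySQL's EXPLAIN shows the same scan-vs-index distinction):

```python
# Show how an index turns a full-table scan into an index search.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (page TEXT, ts INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?)",
                 [("p%d" % i, i) for i in range(1000)])

q = "EXPLAIN QUERY PLAN SELECT * FROM hits WHERE page = 'p500'"

# Without an index the planner reports a scan over the whole table.
before = conn.execute(q).fetchall()[0][-1]

conn.execute("CREATE INDEX idx_page ON hits(page)")

# With the index it reports a search using idx_page instead.
after = conn.execute(q).fetchall()[0][-1]

print(before)
print(after)
```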
Here is a site I am running in my basement on a $1000 server. It's written in Grails.
Check out the performance yourself: http://www.ewebhostguide.com
Caution: sometimes the Comcast connection is down and the site may appear down, but that happens only for a few minutes. Those are the cons of running a site in your basement.