- redirecting final input/output from a program.
- parameterizing things before they start.
- combining results or i/o from multiple programs, written in different languages.
Examples: Data scientists often write Python scripts to post-process data that has been summarized by a big data pipeline written in a language like Java. Or a sysadmin might write a script to launch, at midnight, a precompiled C program which might, for example, do a bunch of calculations for a banking system's inflow, outflow, suggested interest rates for the next month, and so on.
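To make the sysadmin case a bit more concrete, here is a minimal sketch in Python of a script that launches another program, redirects its output back to itself, and post-processes the result. The function name `run_and_total` and the "one number per line" output format are assumptions made up for illustration, and a child Python process stands in for the precompiled binary.

```python
import subprocess
import sys


def run_and_total(command):
    """Run an external program and sum the numbers it prints, one per line."""
    # capture_output redirects the program's stdout/stderr back to this script;
    # check=True raises an error if the program exits with a non-zero status.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    values = [float(line) for line in result.stdout.splitlines() if line.strip()]
    return sum(values)


if __name__ == "__main__":
    # Stand-in for a precompiled binary (hypothetical): a child Python
    # process that prints two numbers.
    stand_in = [sys.executable, "-c", "print(1.5); print(2.5)"]
    print(run_and_total(stand_in))
```

If the binary gets rewritten in another language, only the `command` list has to change, provided the new program prints the same kind of output.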
The question comes up: Why do people write scripts, when we have sophisticated software programs which already are doing 90% of the work?
Well, two reasons: compiling code is a pain, whereas scripts don't require any recompilation, and the last 10% tends to change.
Now, do we need scripts?
1) Compilation: even though languages like Go make compilation FAST, that doesn't change the fact that compiled languages impose a lot of dependencies on anyone wanting to modify the original program in any way.
2) Changing: program use cases always change.
3) Scripts make monolithic architecture impossible, and enforce an organic decoupling leading to clean separation of concerns.
So for reasons (1) and (2), I think we will always need scripts in modern open source projects. That is: frameworks which require compilation, and a deep understanding of an internal system of tests, dependencies, and so on, should decouple the static parts (which define constants, do basic text parsing, and so on) from the more sophisticated internal implementations which need to be compiled as binaries. That decoupling needs to happen because programs tend to change.
Reason (3) is just a nice-to-have, but I won't say it's a hard requirement.
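As a rough sketch of that kind of decoupling, the Python below keeps the constants in plain text, does the basic text parsing in the script, and only hands finished flags to the compiled program. The `key = value` file format, the `--key=value` flag style, and the binary name `./calc` are all hypothetical, chosen just for the example.

```python
def parse_settings(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def build_command(binary, settings):
    """Turn the settings into '--key=value' flags for a compiled program."""
    return [binary] + [f"--{key}={value}" for key, value in sorted(settings.items())]


if __name__ == "__main__":
    constants = "rate = 0.05  # suggested monthly rate\n\nregion = us\n"
    # './calc' is a made-up binary name; the command is built but not run here.
    print(build_command("./calc", parse_settings(constants)))
```

Changing a constant then means editing a text file, not recompiling anything.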
Now, I will ramble on... What are some examples of how scripts are used?
For example, the data scientist in the first example might want to start using R instead of Python to process the ETL'd data. Since the Python script is cheap to maintain and decoupled from the codebase, he or she can easily do so. Meanwhile, the sysadmin may hear, "Oh, we're redoing the C program in Java next month, beware." Well... in that case, as long as the underlying system stays the same, the script's lack of technical coupling to the platform benefits the entire organization.
In ASF BigTop a decision was made to do everything in JVM languages. In some ways, it's quite elegant, because a Java developer can easily tool around with Jars and Groovy scripts to reuse libraries and so on. But sometimes I feel like a lot of the stuff we do is done quite often by Python developers, and our community would be bigger, and our tooling would be more robust, if we used Python for what it is good at, and Java for the rest.
In other words: I don't think you should have a single language that dominates all aspects of an open source project - it can lead to people having to fork your project's source code at its core just because they want to run it in a slightly different manner.
I love programming in Go, but a similar debate came up in the Kubernetes project this week, and I was hoping to weave a Python abstraction layer on top of the Go parts. But instead it looks like we're going to do everything in Go.
I guess time will tell if this is the ideal decision. Go is quite flexible, and compiles fast, so it is not as dangerous as the Java decision we made in BigTop. But I'm a little worried that leaving out the scripting languages will lead to extra forking, and a harder ramp-up for ops folks.