Ridiculously Long-Winded Programming Babble

Okay I don’t actually know how this pointless rambling got so long. I know the longer it is the more people will just tend to skim, because I do that all the time. So I went back and refactored—er, rewrote all the somewhat tangential bits (wow these puns are too easy) into footnotes. Manually. Obviously if I have to do this again I’ll write a script for it. But the post is still really long, and I bet nobody will read the whole thing. Oh well.

Life updates: I got out of the hospital Friday two-and-a-half weeks ago, went to the preliminaries of NPSC (a national team programming contest) with classmates, threw up a lot, went back into the hospital, and came out again. I wrote a lot of stuff about the experience and how much it sucked (hint: a lot) when I started this draft around that time, but now putting so much detail in this post feels weird. I’m mostly good now.

Three years ago NPSC was the only programming contest I really knew of; now I’ve participated in quite a few more, both online and locally, but it’s still the only contest I’ve entered that gives you real-time verdicts. I believe it inherits this from being modeled after ACM-ICPC, but that’s for college people and I’m less clear on how it works. All the other contests, namely TopCoder, CodeForces, USACO, and the other local individual competition (there doesn’t appear to be an English name so for the purpose of this post I’ll just call it “Nameless Local”; there’s a nation-wide competition in one-and-a-half weeks!), have system tests after the contest that don’t allow you to resubmit afterwards.¹ They all give pretests that you get to know about right away, just to catch super-silly non-algorithmic mistakes like failing to remove the debug statements or reading input from the wrong place, but these contain weak test cases and don’t guarantee that the solution will pass the system tests and get full score.

After roughly parsing the rules on the IOI’s website, I believe the IOI (at least this year; I don’t know if it has evolved or anything) allows real-time testing and feedback with a confusing bunch of restrictions: how it apparently works is there are free “public tests”, and you can get results for a “release test” using the official test cases by using a “release token”, which you get two of every half hour. Additionally, there is apparently a cap of 64 submissions per problem, but I’m pretty sure it’s the kind of cap that honest competitors can safely ignore, like the 100-per-group in Australian puzzlehunts. Well, at least it tells us the organizers think in binary. Or octal.

Anyway, the variety got me thinking about how these contest methods compare and which one might reveal algorithmic programming ability better.

Input/output:

Most of these contests just make you read input from standard input in a problem-dependent format and output to standard output. Some contests (USACO, NPSC a few years while they still allowed VB because it apparently doesn’t have standard I/O; can’t figure out what ACM-ICPC does) use file input and output. This is an annoyance, albeit a minor one, compared to standard input/output (setting it up may be messy and you can’t just use command-line redirection for testing.)

TopCoder seems to be special in that it does away with this: it asks you to implement a method that receives parameters of problem-dependent types and returns some values. For example if the problem is to compute some value of a list, you might have to implement a method int computeFoobar(int[] data), instead of having to reading a bunch of integers from standard input. It basically removes any parsing or formatting of string data from the coding, except when they are intentionally part of the problem.

The IOI, at least for this year, also does this for the tasks where the program is just supposed to receive one test case and return one result. It has a lot more variety in its tasks, though; there are some that deliver a sequence of function calls to your program as input, and some that give you an input file you can do whatever you want with on your computer for as long as you want to produce an answer. This year there was one where you were basically supposed to write (well, probably generate) a program in an esoteric programming language. Darn, it looks like fun.

Personally I wish more programming contests didn’t require text input/output, because parsing and formatting is a tedious and rather language-specific task that doesn’t really involve deep algorithmic knowledge. You won’t be able to get a correct answer without knowing/memorizing that you need to output a floating-point number with one digit after the decimal point with something like this:²

cout << fixed << setprecision(1) << d << endl;

and of course, the code is absolutely nothing like this if you switch languages. I’ve participated in one CodeForces contest where the outputted floating-point value is allowed, regardless of how many decimal digits you have, as long as it’s close enough, but C++’s default precision turns out to be quite small. Blargh. Well obviously if I practiced more I’d have met and learned about these problems already, but still.

There’s also the issue (at least for C++) of deciding between <cstdio> and <iostream> input and output; as everybody who reads the fine print in contest I/O tutorials knows, the latter is subtly slower and can cause a TLE on massive input files even if the rest of your program is fast enough.³ On the other hand, if you use C-style I/O sooner or later you’ll end up in a situation where your machine requires %lld while the judge machine requires %I64d for long long ints or vice-versa.

Scoring:

CodeForces and TopCoder assign each problem a score that’s correlated to difficulty and depreciates linearly with time, with penalties of a certain amount of points applied for each resubmission. The score is an all-or-nothing thing, only given if the problem passes every single test. CodeForces starts the countdown for all problems once the contest starts, but I think TopCoder starts counting time for each problem only after you first click into the problem to read it. In addition, you can also score points in either one by submitting a test case that breaks a competitor’s program. TopCoder calls this action “challenging”, and CodeForces says “hacking” apparently to prevent people getting sued or something. In CodeForces you can type in an input file with the problem format or upload a test-case generating program; as for TopCoder, due to its input/output of direct program values you have to select parts of the input with its slightly clunky interface, which is annoying if you’re trying to input a really big test case. It’s perhaps one of the few negative effects of its special input/output system.⁴

In ACM-ICPC, as in our NPSC, teams are first ranked by the number of problems solved, and then by the sum of penalties: time elapsed since contest start until first submission, plus 20 minutes for each failed submission. No matter how slow you are or how many times you resubmit, if you solve two problems then you rank higher than a team that solved only one. Problems are not weighted by difficulty in any way, but since there are live leaderboards the contestants can look at (until one hour before ending) and these are team contests, it’s probably much less likely for a team to misjudge an easy problem’s difficulty and underperform. Well, TopCoder and CodeForces have live leaderboards too, but their problems are ordered by difficulty. Usually.⁵

Compared to these, USACO is a lot more friendly; it gives points for every passed test case. It also doesn’t weight the problems or consider how long you took to solve them. The IOI has a highly problem-specific scoring method that doesn’t depend on time either; each task has different subtasks that require different levels of program optimality, and some non-program-submission tasks even give partial credit in weird ways.

Of course, as a contestant I greatly prefer being able to get partial credit, if only to save my ego when I make stupid mistakes. I guess a full-on zero is reasonable if my chosen algorithm had all the wrong asymptotics or just completely failed in any not-quite-straightforward case, but then there was the time I missed about one test case out of a hundred because the problem asked for a solution mod 1000000007 or something and I modded all the intermediate values but not the very last sum. Grrr.

I imagine it couldn’t be that hard to find a way to somewhat distinguish between these cases. For instance, maybe the score would not be linear with respect to how many test cases were solved; I would totally be fine with a score like (I’m just making this up) \(\left(\text{problem weight} \times \frac{(\text{correct test cases})^3}{(\text{total test cases})^3}\right)\), so completely solving one problem is much better than submitting two weak solutions that can only deal with trivial test cases.

But on the other hand, I guess it does reinforce the importance of testing and writing robust code.

Having said all that, I should still acknowledge that algorithmic programming is a very small (and special) part of programming, the same way math olympiad problems are a very small part of math in general. There is a lot of variety in other programming tasks and in real-world programming. Also, with programming, the space of attainable tasks opens up a lot more quickly with respect to difficulty when compared to math, so I think it’s a lot easier to get started messing around with developing one’s own programs.

For instance (you knew this was coming): there are many programming-related tasks in rankk where a lot of the difficulty (hopefully this isn’t a spoiler) is related to things like figuring out how to “submit your solution via GET in 3 seconds”, plus sometimes cryptography or reverse-engineering code. They have a bunch of affiliate sites with similar challenges, but I’m still having a lot of fun struggling at rankk and didn’t have a reason to add even more accounts.

Then after ignoring its presence on certain blogs for a long time, I decided to start Project Euler, which is a site of mathematical/algorithmic problems with much more emphasis on the math. Plenty can reportedly be solved with pencil and paper. All you need to do is produce an answer, usually a number, by plugging in some big numbers or data the problem gives to you. The number of problems is daunting (I think they release one weekly) and difficulty of problems is overall very hard to judge. (I discovered there is at least one problem with the exact same algorithmic description as rankk, but I don’t even want to start speculating whose fault this is. The algorithmic idea isn’t that special or clever, at least.)

Plus, as I “just” posted about, I wrote Gridderface, my pet project for marking logic puzzles on the computer with the keyboard. Compared to coding for any of the above problems or challenges, coding things like this are different mainly because of the sheer volume of the code.⁶ The difficulties are completely different.

The main difficulty is organization. Of course that can be a problem in short algorithmic programs, and I do make sure to separate my code into subroutines by structural logic even then⁷, but object-oriented code brings the difficulty to a new dimension. You have to think for a long time to decide what each class should do and where the code for something. (Alternatively, you can write totally unextensible and unmaintainable code.) I try, but I’m too sloppy to apply design patterns or the model-view-controller or all that consistently. At the same time, there’s also designing the program: figuring out what you want the program to do, how you want to interact with it, what the UI looks like, and so on. This is something you don’t have to worry about in contest problems, as your goal is spelled out in black and white, right down to restrictions on the integer sizes. I guess maybe it could be considered as a separate activity from programming, but either way, it has to be done to produce the project.

Also, there’s this additional really annoying gap between high-school algorithm competitions and actual programming: they tend to reward people who aren’t as diligent in organizing their code. Being organized has inherent benefits in efficiently coding and avoiding and detecting bugs, but the small size of the programs means the usefulness of this is reduced. Meanwhile, you have to produce code under stringent time limits, and in TopCoder and CodeForces, where people can try to break your program with test cases for points, you don’t want them to be able to understand and debug your code. The fine print in those contests explicitly disallows going as far as intentional code obfuscation, but people can still adopt uncommon code conventions, use as many personal macros or abbreviations as they can, and write short, cryptic variable names nobody else can understand.

Anyway. After comparing programming tasks it’s natural to go on thinking about and comparing the programming languages I know so far. Some of these following thoughts were sparked after I read this essay on which languages to learn in the modern day. Well, somebody tweeted it and somebody retweeted it and I didn’t know who the author is or anything but it was something to think about, and a lot more focused on artful, organized programming than getting a job when compared to the other stuff you get in a web search. Obviously right now I want to program well more than to be able to get a job.

My first language was Java; if memory serves me, I started this in first grade or something. It’s stuck with me, although even though I’ve become disillusioned with the hardline object-orientation that is everywhere to be found. I admit, if I said I hate the stuff about ProblemFactoryInstance patterns I’d just be parroting stuff I heard through the internet hive mind. I have only a little experience with writing things that need design patterns that complicated — not enough to have much of a valid opinion. There are two things that definitely bug me about it though:

Every program is a class. You just have to write a class if you want to do anything and stick your procedure in it. This carries a bunch of additional annoyances: you have to make sure your class has the same name as the file it’s in, you have to invoke it from the right place in the package/folder structure, and so on. Even Scala, which is almost entirely Java-interoperable and has a lot of emphasis on object-oriented thinking and classes too, lets you execute programs written in a script-like sequence of statements.
More annoyingly, functions are not objects. Even C++ has function pointers, as creepy as the syntax is. You can’t pass functions around, and the result is you have to make a zillion anonymous classes implementing interfaces. Being a GUI program, Gridderface calls for a lot of Actions in particular. So the result is a dozen anonymous classes and half a dozen named classes that are each used twice. Yuck, I thought programming was supposed to prevent people from having to repeat themselves..

Dad sent me a pretty entertaining rant on Java’s “kingdom of nouns” which, though maybe a bit exaggerated, makes these points very well. I don’t think object-oriented programming is that bad; just let me pass functions around and I’d be a happy guy. Still, it gets in the way for small programs; I don’t think I would voluntarily code in Java for any simple projects without any strange requirements. I ended up using it for Gridderface for its API. Gridderface inherently had to involve a GUI and a significant amount of image manipulation; I doubt that given its purpose, there are any design choices that could change that. And Java’s API in these things is dependably portable and one I’m familiar with.

I don’t think it would have been wasted effort to learn Python’s GUI functionality for this (Tcl/Tk, right?), since I could probably use it in other projects, but when it comes to image manipulation the most common thing is Python Imaging Library, and when I tried it it just didn’t seem very friendly to install or reliably portable. Java’s graphics capabilities are all there in java.awt, java.awt.image, and javax.swing.imageio, and the file input/output provided by ImageIO is ludicrously painless to use.

Well, thanks to the first essay, the unknowledgeable me learned that there were newer languages that operated on the JVM and could interoperate seamlessly with Java libraries. They are the fourth and fifth languages on the list: Clojure, a Lisp dialect, and Scala, a very multi-paradigm language. I’ve played with Lisp before and thought it was cool but I still doubt I’d be able to write large things in it.⁸

When I first realized Scala existed, I really wanted to learn it, and maybe port my whole project into it before the five-line anonymous classes drove me nuts. Unfortunately I didn’t find any really good tutorials on-line, and of course I’d be skipping two languages on the list. But most importantly, I want to do things the Scala way, and they have a bunch of wrappers around Java Swing, so I worry if I start porting I’ll break stuff and end up unable to hack in any puzzle types I might want to construct.⁹

I don’t remember how the learning sequence continued, but the only other languages I code fluently in are Python and C++, and I have never coded any large projects in them. Not that small programs are a bad thing; I’m not an expert but I think the whole Unix philosophy is based on them. In particular, I have never written anything large enough in them to require anything but the most trivial ten-line classes. Actually, I don’t think I have ever written a class in C++. It’s kind of silly because the addition of object-oriented programming is the big change from C to C++, but I mainly use it for <iostream> and the STL data structures. Algorithm competitions, you see.

Python I mostly use for short get-things-done scripts, like sometimes if I want to batch-rename files with a pattern or search the SOWPODS lexicon for words that fit a puzzle’s restrictions. I solved my first challenges in Project Euler with it, and I also use it for rankk. There’s a really cool library out there. That’s all I’m saying.

C++ is something I use mainly because it’s the de facto high-school programming contest language. Right now, there is really no other language that you can always depend on the contest platform supporting.¹⁰ Also, of course, just about all the programming contest resources use it in examples and so do most of my peers, so using any other language means putting much more effort into learning anything related. Obviously, the algorithms one wants to implement translate pretty easily between (imperative, non-esoteric) languages, but when you want to use built-in data structures and algorithms or even just input or output something nicely as I mentioned, you’d be on your own. And finally, it’s fastest; something as high-level as Python could never compete with it on complex allocations or loops that have to run 1,000,000 times, and even Java reportedly has some overhead due to starting the JVM. Now, you don’t need that sort of efficiency for every problem, but much contest knowledge is separate from what is encountered in regular programming and it’s a lot of extra effort to learn them for two or more languages.

What follows these? Well, I’m fluent in HTML + CSS but technically it’s not a programming language. Serious languages I’ve used (and I use “used” rather generously) include, in roughly decreasing order of familiarity, JavaScript, Common Lisp, TI-Basic¹¹, PHP, VimScript, and Haskell. Depending on your definition you might include LaTeX (it is Turing-complete, after all.)

As for esolangs, I’ve written a lot of GolfScript, for code golf at anarchy golf of course. It’s… a language designed for code golf. Yeah. After it was designed there was FlogScript (which I think has no documentation except the source code) and then apparently Burlesque. I also played with Befunge a while ago after I got my hands on the Java source for a graphical interpreter and hacked in partial Funge-98 support.

Well, what next? I decided to learn Haskell, as sort of a substitute for number-three-on-the-list ML, since I’m pretty sure the point is for the language to be functional and I’ve heard more about it (including from my favorite rankk challenge to date :P). From it, there is this feeling of writing good code, but thinking functionally is pretty exhausting. However I was also surprised how many functional ideas I had already used in Python: in particular, list comprehensions. The main obstacle I have is that it’s hard to optimize or get asymptotics when computation is expensive (a big problem if you’re trying to learn through Project Euler problems, particularly ones with lots of primes). Otherwise, the types and Int, Integer, Integral give me some trouble from time to time. But I can unhesitantly say that learning Haskell was really fun and completely worth it.

Also maybe I’ll start playing with git some of these days (from the command line!) because everybody else seems to be using it for version control.

Can you believe I have the time for this stuff? I must be crazy. I have a different programming competition next week. Nameless competition, national level. Not a snowball’s chance in hell of using Haskell there, of course.

Darn, I didn’t realize there was an USACO contest this weekend. Oh well. See? I told you this post had no point whatsoever.

What can I say? This is pretty arrogant, but I seem to consistently overestimate my competition.

Actually, “system test” is a misnomer for the local one. They had teachers go from computer to computer, plug in a USB, and copy-paste the test data in. Also, the teacher will completely let you get away with it if you add random input prompts to your program, as long as s/he can tell what the answer is supposed to be. There were only about five or six test cases for each of five problems. Each correct test case got you points (well, USACO does this too) so you’d still get some even if you did something like fail to read the problem statement and set your character array to be 3 characters long like the guy next to me.↩
Well, I guess it’s possible and not that difficult to hack together a subroutine for formatting output, using only integer and character output, with suitable integer operations and casting. Of course, it would be time-consuming, introduce more room for bugs, and possibly slow your program down.↩
Well, I’ve heard
```
ios_base::sync_with_stdio(0);
```
will speed up <iostream> to be on par with <cstdio>, but I haven’t dared to try it out under contest conditions yet.↩
The interface for entering arrays is kind of weird and I couldn’t find any documentation, so here are some key points I figured out with experimentation. You can type text into the textbox and press <Enter> and it’ll be added to the end of the array. To add a lot of elements at once, particularly if you want to get them from the output from a program, type e.g. {"1", "2", "3"} and click the {} button. The C button clears all.↩
Well… I would have said the same thing about last year’s IMO and look how things turned out.↩
Currently I have 60 or so classes; a lot are short ten-line ones, but the main class has 1,200+ lines. And I know long classes are a bad sign; I worry a lot about how I’m going to refactor the class to be shorter, but it’s not easy. Too many functions I have to pretend are anonymous classes.↩
Unlike just about all contest code I’ve read by others. The horrible nested loops and macros make me shudder.↩
Well, after further random browsing it seems Python and Ruby both have Java-interoperable versions, Jython and JRuby. I don’t code in Ruby, but being offered Python with the Java API to some degree is tempting too. So many choices…↩
This sounds like one of those short-sighted design choices that result in bad code. Part of me feels very strongly that I should probably stop being perfectionist (my code is bad enough already), just port things the naive way, and transition to wrappers when I get all the big bits done. It’s such a big project, though. Darn.↩
USACO allows C, C++, Java, Pascal, and Python. NPSC allows C, C++, and Pascal. TopCoder usually has C++, Java, C#, and Visual Basic. CodeForces has just about everything: C++, C, Pascal, Delphi, C#, Java, Ruby, Python 2.7, PHP, Haskell, F#, OCaml, and Scala. Of course, the higher-level languages will probably still be unusable on problems with tight asymptotic bounds.

Now, it’d probably still be possible to one-line easy problems with them where performance is totally unimportant, and that would free up a lot of time. Unfortunately now that I’m out of Div II, problems on which I can do that are probably rare, if even existent.↩
On my TI-89 I implemented craps (the dice game) and a complicated, somewhat turn-based spaceship shooter game where there were even upgrades you could buy for shooting bull’s-eyes, among lots of other things. I vaguely remember that I wrote so many statements, I learned to touch-type the English letters (instead of hunting through menus for the right commands). On a calculator keyboard. Whee.↩

(note: the commenting setup here is experimental and I may not check my comments often; if you want to tell me something instead of the world, email me!)