ewjordan's blog: 2010

Tuesday, October 12, 2010

Poor People Should Probably Gamble More

Well, okay - maybe not in the sense that you think. It's not generally a winning proposition for poor folks to spend a lot of time at the casino, chasing losses and re-gambling their winnings until they're left with nothing but a third mortgage and an empty fridge.

But hear me out.

Meet Mary

Mary is a pretty typical college student; she's not dumb, but she's young, and she makes mistakes. She typically doesn't have very much cash available to her, beyond what it takes to pay rent and eat. Minor financial incidents deal serious blows to her because she's so close to the edge so much of the time, and she has trouble dealing with them in intelligent ways because she gets so stressed over them.

A little while ago, Mary got pulled over and was written a ticket for driving an unregistered vehicle. She's since resolved the vehicle registration, but the ticket was substantial for her, $250, and she rarely has that much cash to spend at any one time, living mostly hand-to-mouth off of the ~$100 a week she's left with after paying rent, some of which has to go towards gas, utilities, etc.

The ticket is now extremely overdue, and she's cruising for a bruising: if she doesn't pay it within the next three days, she's going to lose her driver's license, which would be a disaster (she's a tutor, and she has no affordable way to get to her students' houses other than driving herself). If that happens, she'll ultimately have to pay another $100 to the DMV as a penalty, in addition to what she owes on the ticket, not to mention the black mark on her record.

Let's assume for the moment that Mary has no safety net; there are no friends she can ask to float her a loan, no family members that she can reach out to, etc. She's only got $130 to her name, and her next paycheck won't show up for another week. What's a girl to do?

Well, Mary lives within an easy ten minute drive from the Mohegan Sun casino. And that's exactly where she should go, to slap the remains of her meager wad down on black, and cross her fingers!

WTF? Roulette has a house edge, you dummy!

No kidding. And even if it's a tiny edge, you're going to lose in the long run if you try to play roulette to make money. On a red/black bet, your odds of winning are about 47%, whereas the payoff is simply your bet, so the pot odds there are not working for you, and in a vacuum, this is a stupid bet. Worse, when people show up to the casino with $100 to gamble, they're likely to lose a lot more than that house edge would suggest, because they tend to keep gambling until they've placed a lot more than $100 in bets, and that edge gets applied to the total amount that you end up betting over the course of the night, not to the size of your chip stack when you start!
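To put a rough number on that last point, here is a minimal sketch. It assumes a double-zero (American) wheel, which fits the "about 47%" figure above but is an assumption, not something the post states; the function name and the $1,000 of total action are made up for illustration.

```python
from fractions import Fraction

# Assumption: a double-zero (American) wheel with 38 pockets, 18 of them black.
P_WIN_EVEN_MONEY = Fraction(18, 38)

def expected_loss(total_wagered):
    """Expected loss on even-money bets: house edge times the total amount wagered."""
    house_edge = 1 - 2 * P_WIN_EVEN_MONEY   # (1 - p) - p = 2/38, a bit over 5%
    return float(house_edge * total_wagered)

one_spin = expected_loss(100)      # a single $100 bet
long_night = expected_loss(1000)   # the same $100 stack re-bet until $1,000 has been wagered
print(round(one_spin, 2), round(long_night, 2))
```

The expected loss scales with the total wagered, not with the starting stack, which is why one big bet and then walking away is so much cheaper than an evening of small ones.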

But this is a special situation. Let's evaluate Mary's two options here, following along with my fiction that these are the only two options available to her:

Option A: Do nothing. Mary's payoff here (we'll define this as compared to her current assets minus liabilities) is a completely certain -$100, because she won't be able to pay the ticket, and she'll eventually have to pay the DMV fee in order to keep driving, along with the ticket fee (which she already owes).
Option B: Gamble $130 on black. Here we've got two possibilities:
1. She wins the bet, walking away with $260, with which she's able to pay her ticket and come out with $10 to her name, for a net gain of $130, with a probability of about 47%.
2. She loses the bet, goes home with nothing, for a loss of $130, and still ends up owing an additional $100 to the DMV because she can't pay the ticket. That's a loss of $230, with a probability of 53%.

Now, we check expected values: the expected value of Option A is -$100, because it's a certainty. The expected value of Option B is $130 * .47 + (-$230) * .53 = $61.10 - $121.90 = -$60.80! Option B, going to the casino and "wasting money" on a "losing bet", saves Mary about $40, on average, which is a full 40% taken off of her expected loss!
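The same comparison can be run in a few lines of Python. This is only a sketch of the arithmetic above: it uses the post's rounded 47% win chance, and the helper name `expected_value` is invented for the example.

```python
def expected_value(outcomes):
    """Sum of payoff * probability over (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)

P_WIN = 0.47   # the post's rounded chance of hitting black

# Option A: do nothing and eat the DMV penalty later.
option_a = expected_value([(-100, 1.0)])
# Option B: put $130 on black; win +$130, or lose the $130 and still owe the $100 penalty.
option_b = expected_value([(130, P_WIN), (-230, 1 - P_WIN)])

savings = option_b - option_a
print(round(option_a, 2), round(option_b, 2), round(savings, 2))
```

The gap between the two options comes out a little over $39, which is the "about $40" quoted above.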

She's screwed either way, but by going to the casino, she's chopped 40% off of how badly she's screwed, on average - not bad for a few minutes work!

And here you were, thinking gambling was nothing more than a tax on the stupid.

The Moral

It just so happens that through sheer stupidity I landed myself in this exact situation when I was in college, except that while we assumed Mary had no options, I actually had many at my disposal that I was too foolish, stressed, and/or embarrassed to pursue (I could have asked any of a dozen family members or friends for the money, and I could have done a bunch of other things to earn it myself that only occur to me now that I'm older and wiser about such matters, not to mention all I could have done to avoid the problem in the first place).

In the end, though, I did nothing, my license was revoked, and I had to live without one until I was able to come up with the extra cash that it took to get it back. Only now do I realize that even if I was too stupid to come up with something smart to do to pay that ticket, even doing something as idiotic as going to the casino and putting my entire net worth on the line on a losing bet would have been smarter than doing nothing!

The moral here has nothing to do with this particular incident; it's a general fact of our world that poor people have to deal with all the time: there are many, many circumstances where a lack of current liquidity takes away even more of your money later. Many sorts of debt fit this description, as do many governmental interactions, bank fees, etc. And unfortunately, the approach that many people in these situations take is to do exactly what I did, throw their hands up, moan "I just don't have the money!", and get hit with the fees/rate hikes/credit report consequences/etc. But if you're within any reasonable distance of the money that you need, that might not be the best option - depending on the situation, it could even be worth it to take the more unlikely roulette bets, such as putting it all down on #17! Run the numbers, figure it out, don't just sit there - what've you got to lose?
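"Run the numbers" could look something like the sketch below. It rests on several assumptions that are not in the post: a double-zero wheel, standard payouts (1:1 on black, 2:1 on a dozen, 35:1 on a single number), and staking only the minimum needed to close the gap rather than the whole bankroll as Mary did. All names are hypothetical.

```python
from fractions import Fraction

POCKETS = 38  # assumption: double-zero (American) wheel

# (name, winning pockets, payout per $1 staked) for a few standard roulette bets
BETS = [("black", 18, 1), ("dozen", 12, 2), ("single number", 1, 35)]

def gain_vs_doing_nothing(cash, debt, penalty, winning_pockets, payout):
    """Expected dollars saved relative to eating the penalty, or None if
    the bet cannot cover the debt even when it wins."""
    shortfall = debt - cash
    stake = Fraction(shortfall, payout)   # smallest stake whose winnings close the gap
    if stake > cash:
        return None
    p = Fraction(winning_pockets, POCKETS)
    # Win: pocket stake * payout and dodge the penalty. Lose: forfeit the stake, penalty still due.
    return float(p * (stake * payout + penalty) - (1 - p) * stake)

for name, pockets, payout in BETS:
    mary = gain_vs_doing_nothing(130, 250, 100, pockets, payout)    # Mary's situation
    broke = gain_vs_doing_nothing(10, 250, 100, pockets, payout)    # same ticket, only $10 in hand
    print(name, mary, broke)
```

Under these assumptions black is the best choice in Mary's position (about $41 saved on average), while someone holding only $10 cannot reach $250 on black or a dozen at all, and the single-number long shot is the only listed bet that still beats doing nothing.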

Further, while in my example I had many other options, in the real world this is not always the case: it may have been easy enough for me to borrow $100 from my parents (even if I was too stupid to ask when I still had time), but if I was $50k in the hole, or all my family and friends were as hard up as I was, it could easily become more difficult to close the gap. Of course, with more money on the line I'd be more likely to find a bet on the stock market with less of a house edge than a roulette wheel (keep in mind it takes a few days for trades to clear, though), but the point remains.

Real world safety-nets like bankruptcy don't necessarily change this picture too drastically, because while they may partially shield you from some of the consequences of doing nothing, they shield you equally well from the consequences of the losing end of your "Hail Mary" bet.

Gambling may be "stupid", but in the face of certain loss it can seriously mitigate the damage if you have no other options. The poverty cycle is helped along quite a bit by compounding effects, and in many cases even the financial damage that small extra fees or rate hikes cause is bad enough to justify making a slightly losing bet. Just make sure you know the odds and the payoffs before you lay it all on the line...

And for God's sake, man, pay your tickets before they're overdue - a few hundred bucks is nothing compared to the pain of trying to get just about anything done at the Connecticut Department of Motor Vehicles!

Saturday, September 11, 2010

Complexity of Intelligence: Kurzweil v. Myers Redux

PZ Myers doesn't like Ray Kurzweil very much. He came out swinging with a scathing critique of some of Kurzweil's statements, which Kurzweil fired back at in a similarly hostile manner. Somehow I don't think these guys are going to be having dinner together any time soon...

To decide the winner in this little sparring match (tl;dr summary: they're both wrong, in a way), we've really got to get the argument straight.

First, Myers was upset by Kurzweil's supposed claim that we could reverse engineer the human brain in a decade. Now, it turns out that this wasn't what Kurzweil actually claimed, but a reference to the beliefs of Henry Markram of the Blue Brain project - Kurzweil's own estimate is more on the order of 20 years. Point to Kurzweil (though it should be mentioned that it was Gizmodo that got that wrong, not Myers, at least directly).

But that's not what Myers was really jazzed up about. Here's what got his goat for reals:

Here's how that math works, Kurzweil explains: The design of the brain is in the genome.  The human genome has three billion base pairs or six billion bits, which is about 800 million bytes before compression, he says. Eliminating redundancies and applying loss-less compression, that information can be compressed into about 50 million bytes, according to Kurzweil.

Myers took particular issue with the statement "The design of the brain is in the genome", and that's, in my opinion, the most important part of this dispute.

At some level, the design of the brain is quite trivially in the genome, because it is constructed from it. But Myers, and most biologists, are quick to point out, correctly, that the developmental road from DNA to a living, functioning brain is exceedingly complex, and that the complicated physics, chemistry, and biology that would be required to simulate all of that would probably take a helluva lot of bytes to simulate properly. Way more than the mere DNA string.

Point Myers.

Kurzweil fights back, though: he's not talking about literally reverse engineering the brain in this number of bytes, he's talking about the information content that it takes to create the brain.

If we ignore the fact that Kurzweil has literally advocated for such an explicit reverse engineering of the brain as the easiest route to AI, then I suppose we can give him that point, or at least split the difference. Even if, at that point in his rebuttal, he meanders into the usual "you guys just don't get exponential growth!" spiel, which I'm sure we're all tired of by now.

Break. We'll call it a draw, for now - I'm not sufficiently convinced by either side to care too much about what they're saying. Especially since most of the argument seems to be a matter of interpreting exactly what it is that Kurzweil is claiming, and he won't really pin it down very precisely.

These two are getting a little bit emo for my tastes, so let's break it down to something that we can specify precisely.

What can we infer from the fact that the brain is coded up (even if we don't understand the entire decoding framework) in somewhere around 50 million bytes? Specifically, how does this figure affect our estimates about what it would take to create an AI that was as intelligent as a human? [Note: I'm very deliberately not specifying here that it should function exactly the way a human does, just that it's "as intelligent", for whatever that means...]

My impression is that Myers would say that we can infer almost nothing: the dynamics of the translation from DNA -> cellular functionality destroy any information theoretic content of any DNA based estimate. In other words, because things that do very complex tasks can be specified in short strings of DNA, there's an unpredictable amount of information that sneaks its way in, and we can't use genome length as a proxy for information content in any meaningful way.

Kurzweil, on the other hand, would probably argue that the genome length is a reasonable estimator for the length of a program that we could hope to produce that would achieve the same function.

Here's where it gets fun: in a general sense, applied to all the things that DNA encodes, Kurzweil is wrong. Even if we restrict to just the construction of the brain, he's probably wrong - a working simulation of the brain would likely require more than that 50 million bytes, for many of the reasons that Myers gave.

But - and this is a big but - if we are very careful about what we're looking for, that 50 million byte estimate does give us an upper bound on something.

It turns out that we can, with extremely high probability, assume that we can create an algorithm that can do roughly the same thing as the human brain, within 50 million bytes or less, based on Kurzweil's argument.

Notice the crucial difference in the claim: I'm not saying that we can simulate the brain, I'm saying that we can roughly achieve its abilities. This might be a big difference to Myers, but to anyone that cares about AI, the difference is all but trivial.

Here's the way the proof goes - it's remarkably simple. We simply compare the classes of information processing algorithms that can be implemented via development from DNA - call these U(DNA), the universe of DNA-based algorithms, and the algorithms that we can implement on a computer - U(CPU). Unless you subscribe to some dualist hoohah, you'll likely accept that there's a subset of each of these sets of algorithms that would qualify as "intelligent".

Now, it turns out that the information content necessary to specify a working algorithm to solve a task within a certain "language" (universe, here) is inversely related to the number of possible algorithms that exist in that language that solve that problem. So comparing information content is really a question of percentages: what percent of algorithms in U(DNA) are intelligent, and what percent in U(CPU) are? That will tell us how many bits it takes to code up an intelligent algorithm in each.
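One way to make "inversely related" concrete, offered as a rough sketch rather than anything the argument depends on in this exact form: write S(U) for the subset of algorithms in a universe U that qualify as intelligent (the symbols S and I are introduced here for illustration). Picking out one member of S(U) then takes on the order of

$I_U \approx -\log_2 \frac{|S(U)|}{|U|} = \log_2 \frac{|U|}{|S(U)|}$

bits, so every halving of the intelligent fraction costs about one extra bit, and comparing I for U(DNA) against I for U(CPU) is the same as comparing those two percentages.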

The real guts of my claim (a revised version of Kurzweil's) follows: if we know that a problem can be solved in N bytes in universe A, then as a rule, we can expect it to take roughly N bytes or less to solve in universe B, as long as there's nothing "special" about universe A that makes this problem particularly easy to solve (or that makes it especially difficult in universe B).

Myers' claim (or rather, my implied extension of it) is, in fact, that there's something "special" somewhere along the line between the string of DNA that is used to build the brain and the working lump of goo inside your head, that makes the directive "solve intelligence" much easier to solve than it should be.

What we should be asking: what computational advantage does the DNA->protein->neuron process offer that makes it so well suited to computing an "intelligence function", to the extent that it can express such a function so much more easily than a bunch of highly flexible computer code?

Most of the instances where we see special complexity reduction are extremely obvious and low level: DNA is very well suited to the "compute protein folding" problem, because it has a direct physical representation. Similarly, it solves the "arrange atoms in physically stable configurations" problem, and the "create an object that self-replicates" problem extraordinarily efficiently in bit-wise terms, because the continued existence of the structure literally depends on solving such problems. DNA would not stably exist if it couldn't solve those things, so we shouldn't be surprised that it's able to.

Conversely, "do intelligence" is neither a low-level side-effect of the way cells work, nor required for their continued existence. So absent some other evidence, we would be extremely surprised to find that DNA was particularly well suited to create it (or in other words, that a higher percentage of DNA-spawned algorithms lead to intelligence than the corresponding percentage of machine code algorithms).

We have yet to see many (if any!) instances where the low level functionality of DNA, proteins, or cells makes high level computations much easier than they would otherwise be, and for very good reason - most "easy" constructs in biology are born of emergence, not evolution or intelligence, so they are not very powerful things, and they just don't do very much! On the other hand, evolutionary constructs can be more powerful, and intelligently designed constructs an order of magnitude more powerful than even evolutionarily designed ones. But all of evolution's work ends up writing bits into DNA, and would thus be captured by the raw complexity estimate that Myers so abhors.

So in the end, Myers might be right: the design of the brain is not exactly in the genome. We do need a massive machinery to dissect, interpret, munge, and fiddle with the base pairs before we end up with a brain, and that brain has a lot of dynamics of its own that we'd be foolish to think we could replicate without a truly heroic effort.

But absent some really extraordinary evidence, we have no reason whatsoever to believe that any of that extra machinery actually helps the goal of creating intelligence: yes, we'd need to simulate it to simulate a brain, but as long as our goal is not to explicitly simulate a brain, we can probably discard most of what it does as irrelevant implementation detail. If we could figure out the algorithm, in all likelihood we could fit it within the space of the brain's part of the human genome.

So I'm calling this one a draw. We might not be able to simulate the brain within 50 million bytes, but it's overwhelmingly probable that we can solve the same problem that the brain does in that amount of space. To me that's the goal that we're shooting for, we don't need to worry about biological accuracy - I don't want to create yet another human, we've got plenty already, I just want a smarter computer!

Kurzweil may not understand the brain (and who does, really?), but Myers doesn't understand that nobody working on AI gives one solitary shit about the physical brain; we're after intelligence. And Kurzweil's argument more or less stands up as long as that's where we constrain our focus.

Now if only I could figure out the right 50 million bytes to code, I should have this thing cracked in no time at all...there's only 2^(400 million) possible algorithms, it should be easy, right? :)

In fact, I believe we can (will, and should!) get there with orders of magnitude less code, but that's another post for another day...
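For anyone who wants to check the back-of-the-envelope figures used in this post, here is a small sketch. The constants come from the quoted passage; the variable names are made up, and the two-bits-per-base-pair encoding is the usual assumption for four possible bases.

```python
import math

BASE_PAIRS = 3_000_000_000
BITS_PER_BASE_PAIR = 2           # four possible bases -> log2(4) = 2 bits
COMPRESSED_BYTES = 50_000_000    # Kurzweil's figure after lossless compression

raw_bits = BASE_PAIRS * BITS_PER_BASE_PAIR
raw_bytes = raw_bits // 8                   # the quote rounds this to "about 800 million"
compressed_bits = COMPRESSED_BYTES * 8      # the exponent in 2^(400 million)

# Number of decimal digits in 2**compressed_bits, without building the huge integer.
decimal_digits = math.floor(compressed_bits * math.log10(2)) + 1

print(raw_bits, raw_bytes, compressed_bits, decimal_digits)
```

The search space of 2^(400 million) candidate programs is a number with roughly 120 million decimal digits, which is the joke behind "it should be easy, right?".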

Sunday, July 4, 2010

0 = infinity, for certain values of infinity

$\int x^{n}dx=\frac{x^{n+1}}{n+1}+C$
(unless n=-1, in which case this formula doesn't work - you remember our friend the logarithm, I hope)

Straightforward integral, right? Okay, then. Here's your task: integrate 1/x^2 from -1 to +1. Done? Congratulations - you just "proved" that the definite integral of a positive definite function is equal to -2.

Good one - still think this is simple?

Let's dig deeper. Now take the same integral, except from +1 to infinity. Ready? I hope you got 1 as your answer. Do the same thing for the stretch from -infinity to -1 - feel free to use the symmetry of the function if you want, I don't mind!

Now, put these two results together with the result from the last paragraph...what do we have?

$\int_{-\infty}^{+\infty} x^{-2}dx=0$

That's right! Never mind the fact that 1/x^2 is positive everywhere along the real line, except at 0, where it is infinite; we've just "proved" that its integral over the whole thing is equal to zero. Yay!
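Spelled out, the purely formal bookkeeping (plugging limits into the antiderivative -1/x and ignoring the singularity at 0) runs like this:

$\int_{-1}^{+1} x^{-2}dx=\left[-\frac{1}{x}\right]_{-1}^{+1}=(-1)-(+1)=-2$

$\int_{+1}^{+\infty} x^{-2}dx=\left[-\frac{1}{x}\right]_{+1}^{+\infty}=0-(-1)=1$

$\int_{-\infty}^{-1} x^{-2}dx=\left[-\frac{1}{x}\right]_{-\infty}^{-1}=(+1)-0=1$

Adding the three pieces gives -2 + 1 + 1 = 0.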

What is going on here? Something has clearly gone wrong, and you probably know what it is. You're not allowed to integrate over any point where a function becomes infinite. Our calculus teachers beat that mantra into our heads, and it certainly seems like reasonable advice if ignoring it gets us results like this!
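A quick sanity check shows why the mantra exists. The sketch below (function names invented for the example) cuts a small hole of half-width eps around 0, integrates the two remaining pieces of [-1, 1] exactly, and shrinks the hole.

```python
def integral_inv_square(a, b):
    """Exact integral of 1/x^2 over [a, b]; only valid when 0 is not inside [a, b]."""
    if a <= 0 <= b:
        raise ValueError("interval contains the singularity at 0")
    return (-1 / b) - (-1 / a)

def truncated(eps):
    """Integral over [-1, -eps] plus [eps, 1], i.e. [-1, 1] with a hole around 0."""
    return integral_inv_square(-1, -eps) + integral_inv_square(eps, 1)

values = [truncated(eps) for eps in (0.1, 0.01, 0.001)]
print(values)
```

Each piece contributes 1/eps - 1, so the total is 2/eps - 2, which grows without bound as the hole closes. In the ordinary sense the integral over [-1, 1] diverges to +infinity; the -2 only appears when the antiderivative is evaluated straight across the singularity.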

But happily (sadly, if you hate math) that's not the entire story. We've got a few more rules to break: imagine the state of mathematics if we were still worried about the fact that the square root of two could not be perfectly expressed as a fraction! This is a similar situation, where we obtain more by drawing outside the lines than by following them precisely.

Here's the magic: the expressions above are, in fact, correct. However, they don't mean what you probably think they mean - we have not shown that 0 = infinity, alas. But what we have shown is far more useful and pretty!

To see what we've got here, we need to talk for a moment about analytic functions. I'm sure you know what a function f(x) over the real numbers is - for any real value x we can pick another real value f(x), and this defines a function. Now, for the complex numbers, we'll write z = x + iy where both x and y are real numbers. What is a complex function f(z)? Here we have a slight problem. A function of z alone is a different beast than a function of x and y separately. When we speak of complex functions, we usually mean analytic functions; we mean functions of z alone, that don't depend separately on x or y. In other words, f(z) = 3z + z(z+1) is an analytic function of z, but f(z,x,y) = 3z + x + y*x is not, because it explicitly depends on x and y, not merely on the combination x+iy.

Now, determining whether a particular function f(x,y) is analytic as a function of z=x+iy is slightly tricky, though straightforward (just a bit of calculus - look up the Cauchy-Riemann equations if you care). There are actually some interesting connections here to electrostatics, but I digress...in any case, one of the most important facts about analytic functions is that once you know the way the function looks in a finite area or strip (for instance, between 0 and 1 on the real line, or in a small disk), this actually determines the function's values everywhere. You can even get by merely knowing the power series at a point! It does not suffice to know the values at a handful of scattered, isolated points, though - you need a continuous piece (or, to be precise, at least a set of points that accumulates somewhere inside the function's domain). With that in hand, the only other trick is to figure out a practical way to get an expression everywhere else - this is not always easy, or possible! On this point, keep in mind that with probability one, most functions are not expressible in elementary terms; in fact, most functions are not even nameable (i.e. picking a random function, even from fairly nice families of functions, out of a hat, there is zero probability that you can describe it in any way whatsoever) due to the simple fact that analytic functions are at least as numerous as the real numbers (which also cannot be named with probability one).
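For the curious, the Cauchy-Riemann test can be tried numerically on the two example functions from the previous paragraph. This is a rough sketch using central finite differences; the function names, the sample point, and the step size are arbitrary choices made for the example.

```python
def cr_residual(f, x, y, h=1e-5):
    """Numerically estimate (df/dx + i*df/dy)/2 at (x, y). For a smooth f this
    vanishes exactly when the Cauchy-Riemann equations hold there."""
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (df_dx + 1j * df_dy) / 2

def analytic_example(x, y):
    z = complex(x, y)
    return 3 * z + z * (z + 1)      # depends on x and y only through z

def non_analytic_example(x, y):
    z = complex(x, y)
    return 3 * z + x + y * x        # depends on x and y separately

good = cr_residual(analytic_example, 0.5, 0.25)
bad = cr_residual(non_analytic_example, 0.5, 0.25)
print(abs(good), abs(bad))
```

The first residual is zero up to rounding error, while the second is clearly not (working it out by hand gives (1 + y + ix)/2 at a general point), which matches the claim that only the first is a function of z alone.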

Back to the point. If knowing an analytic function along a strip is enough to determine it everywhere, then we have a fabulously useful corollary: if we know that two analytic functions are equal along any strip, we then know that they must be equal everywhere! So for instance, if we know that e^ix = cos x + i sin x along the real line, then we don't need to separately verify that e^iz = cos z + i sin z for complex z! It automatically holds, as long as each side is well defined; we'll see later that one of the most useful applications of analytic continuation is to handle cases where one side is not well defined in certain areas of the complex plane and we can match a nicer expression to it where it is defined. If they are equal there, then we can use the nice expression to actually define the nasty one elsewhere.
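The e^iz example is easy to spot-check with Python's standard `cmath` module. This is only a numerical illustration at a few arbitrarily chosen sample points, not a proof, and `euler_gap` is a name invented here.

```python
import cmath

def euler_gap(z):
    """|e^{iz} - (cos z + i sin z)| for a real or complex number z."""
    return abs(cmath.exp(1j * z) - (cmath.cos(z) + 1j * cmath.sin(z)))

# One real point, then several points well off the real line.
samples = [0.7, 2 - 1j, -1.5 + 0.5j, 3j]
gaps = [euler_gap(z) for z in samples]
print(gaps)
```

Every gap is at the level of floating-point rounding, on and off the real axis alike, just as the continuation argument says it must be.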

So now you may see the gist of where we're going, though the details may be unclear. Alas, I have no time to finish up now, so the resolution of these issues will have to wait until I return. Until then...

Wednesday, June 2, 2010

The Pirate Handshake: Mutually Assured Destruction via Copyright Infringement

Just when you thought it was safe to do P2P file sharing again, we find that P2P lawsuits have become big business once more.

Don't look to me for vitriol, I'll save my energy and direct you elsewhere to vent.

But don't get me wrong, I'm against this sort of thing, and I'd like to think of a solution. And when I say "solution", I mean some way for everyone to continue downloading free shit without getting in trouble for it, because that's obviously the only reasonable outcome, right?

As a nerd, my first response is to think of a technology-based solution to the problem. Here, the way I see it, we need two things:

1. Avoid guilt by association
2. Achieve mutually assured destruction in case of a lawsuit

In fact, we could achieve both of these things fairly easily with a few modifications to the typical torrent protocol.

The Plan

The setup: Alice wants to download an independent movie, and she's found a torrent where it's being offered. But Alice knows that Bob works for the producers of said movie, and he's watching for people downloading the movie so he can bring lawsuits against them and supplement the meager income that the movie actually earned at the box office. Bob relies on the fact that in order to download the movie via torrent, it's necessary to upload it to other people as well - this is great for him, because he can go after people for illegal distribution of a copyrighted work, which pretty much guarantees him cash when he finds his mark.

Right then. So what can Alice do? For one, Alice can try to specifically block Bob from connecting to her computer, under the theory that even if she's listed as part of the swarm, Bob will have a tough time proving that she's distributing the movie if he couldn't download any of it from her. Obviously this is not ideal, though; Alice might not know Bob's IP address, he might even have multiple ones, and there's always the risk that if it came down to it the simple fact that she's in the swarm might incriminate her. Even if he didn't have proof to stand up with in court, Bob could always threaten Alice with a lawsuit and offer a settlement, and Alice would have no idea whether Bob was bluffing or not. Not a great situation for Alice.

Enter Cathy. Cathy made some modifications to the way the torrent downloading software works, and got together a large group of people using her new version. Instead of merely connecting up to a single tracker, when Alice wants to download a movie using CathyTorrent, she connects up with an additional set of trackers tracking items that she has no interest in (or rather, CathyTorrent does it for her, automagically). She won't actively participate in these downloads, but she's listed as a peer under each one.

That hasn't bought Alice much yet. She's still got to share pieces of the file she wants in order to get the thing, so Bob can still nab her. But Cathy's a smart cookie, and she has help from Debra.

Debra owns the copyrights on a dozen or so smallish text files, containing sonnets that she painstakingly created and hopes to license. Unfortunately, her licensing fees are rather high, so there aren't many takers. Debra still wants to get her work in front of people, though, so she's decided to turn a blind eye to most copyright infringement of her work. She's fine with Cathy using the files as part of CathyTorrent's new protocol, and has no problem with users of said software redistributing the files.

Unless and until, that is, someone were to bring a copyright lawsuit against a CathyTorrent user in court. At that point, the friendly (possibly informal and unstated) agreement that Debra extended to most users becomes null and void, and she'll bring the full weight of her extensive legal team down to bear upon the infringing party. After all, it's her work, she has the right to determine who gets to distribute it.

So now, Alice is connected to the swarm, and finds a peer named "Boris" that wants to share pieces of the movie with her in exchange for the pieces that she has. She suspects this "Boris" might in fact be Bob under a fake name, but has no way to be sure, since she doesn't recognize the IP. So before she sends any of the movie his way, they "shake hands" by sending each other some of Debra's sonnets. Only once they've each verified that the other sent the actual set of sonnets through (thus opening themselves up to a dozen or more counts of infringement-fueled hellfire from Debra's lawyers should Debra get pissy) do they ever send a single bit of the actual movie that they're interested in.

Now recall that I said Alice was connected to several trackers; sometimes Eric, on one of these other trackers, requests part of his Ubuntu 12.10 (Quotidian Quokka, obviously) LiveCD from Alice, but she's not interested in that file, and she doesn't even have it (though she may have reported having parts of it, because that's how CathyTorrent rolls). In this case, at some random point (under some probabilistic weighting) before the end of the handshake, but after having sent a few full sonnets through, Alice simply bids Eric adieu, and since Eric didn't complete the handshake with her, he moves on to find another source; he temporarily blacklists Alice on that file, and won't ask her for it again.
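The handshake logic described so far can be sketched as a toy simulation. Everything here is hypothetical: CathyTorrent does not exist, nothing below touches a real BitTorrent library, and the names `Peer`, `handshake`, `NUM_SONNETS`, and `MIN_BEFORE_ABORT`, along with the specific counts, are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

NUM_SONNETS = 12       # "a dozen or so" in the post; the exact count is assumed
MIN_BEFORE_ABORT = 3   # "a few full sonnets"; assumed value

@dataclass
class Peer:
    name: str
    wanted: set                                        # torrents this peer really shares
    listed: set                                        # everything the client announced on
    sonnets_sent: dict = field(default_factory=dict)   # peer name -> sonnets sent to them

def handshake(requester, responder, torrent, rng):
    """Trade sonnets one at a time; return True only if every sonnet went both ways."""
    if torrent not in responder.listed:
        return False
    genuine = torrent in responder.wanted
    # A decoy responder quits at a random point: after a few sonnets, before the last one.
    cutoff = NUM_SONNETS if genuine else rng.randint(MIN_BEFORE_ABORT, NUM_SONNETS - 1)
    for sent_so_far in range(NUM_SONNETS):
        if sent_so_far >= cutoff:
            return False          # responder walks away; requester should blacklist it
        requester.sonnets_sent[responder.name] = sent_so_far + 1
        responder.sonnets_sent[requester.name] = sent_so_far + 1
    return True                   # only now would any piece of the real file be sent

rng = random.Random(0)
alice = Peer("alice", wanted={"movie"}, listed={"movie", "ubuntu"})
boris = Peer("boris", wanted={"movie"}, listed={"movie"})
eric = Peer("eric", wanted={"ubuntu"}, listed={"ubuntu", "movie"})

movie_ok = handshake(boris, alice, "movie", rng)     # Alice really wants the movie
ubuntu_ok = handshake(eric, alice, "ubuntu", rng)    # Alice is only a decoy here
print(movie_ok, ubuntu_ok, boris.sonnets_sent, eric.sonnets_sent)
```

In this sketch the requester cannot tell a genuine peer from a decoy until it has itself sent several sonnets, which is the property the scheme leans on.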

Bob's job has just become infinitely harder. He's got some tough decisions to make: he wants to prove that Alice has been distributing his client's movie, but he doesn't want to open himself up to the wrath of Debra, so he certainly can't complete the handshake with Alice only to sue her in court later - even if he won the lawsuit or extracted a settlement from Alice, Debra would have him on more counts of infringement than he had Alice on (recall, there are many separate works included as part of the handshake), so he'd come out behind. But because Alice is using CathyTorrent, she's been listed in many swarms that she is not actually participating in; in fact, she didn't even select them, the software did it automatically. Unless he actually completes the handshake with her, he doesn't even know if she's really sharing his client's movie or not, she could very well just be downloading Ubuntu, perfectly legally. The fact that she's willing to start the handshake doesn't prove anything, because everyone starts the handshake; the only way to know who's really willing to share the file is to complete the handshake, and it's impossible to know whether Alice will finish the handshake unless Bob has already fully shared several of Debra's sonnets.

At this point, Bob decides to stop chasing Alice and just focus on the Azureus users - this was supposed to be easy money, and even if he could possibly come up with some sort of legal workaround, it's too damn risky. There's always the chance that Cathy and Debra's shenanigans will hold up in court, and that's not something he wants to mess with.

Now, obviously this scheme is not perfect. It's a bit wasteful of bandwidth, and it overreports the number of peers (and flat out lies about the pieces of the file that the phony peers have); that's the price to pay, I guess. Cathy might be in some trouble, since it's been pretty firmly held that developers of software where the primary purpose is infringing can be found guilty of secondary infringement, so she needs to make sure her ass is covered, or perhaps far away from any jurisdiction that will care enough to track her down. In addition, Debra would have to be trustworthy and committed - she would have to be willing to put up the up-front costs of bringing legal action against Bob should he decide to fight, and Alice would need to trust Debra to help her out with any settlement or legal costs should Bob sue.

But at the root of it, the fact remains, if Bob wants to prove that Alice infringed on his client's copyright, he needs to admit that he or one of his agents infringed on Debra's, and Alice will have the logs to prove that it's so, which she'll be more than happy to share with Debra - the only way around this would be if Bob could somehow keep the IP that he used to download from Alice secret. But the fact that he caught Alice is, in itself, evidence of a crime against poor old Debra, an uninvolved third party, and Debra could probably use that in itself to begin legal proceedings. Debra has done absolutely nothing even remotely illegal, so while a judge probably wouldn't like her for being involved in this scheme, I think he would be hard pressed to find a legal reason to dismiss her completely valid copyright claims.
