Stack Overflow
The most impossible programming/technical behavior
[+21] [13] Shimi Bandiel
[2008-10-07 13:45:50]
[ fun ]
[ http://stackoverflow.com/questions/178506]

What is the most impossible/irrational/magic behavior you ever encountered and what was the 'simple' rationale behind it?

As an example, I give this link to the famous 500-miles problem [1].

[+20] [2008-10-07 13:53:18] Shimi Bandiel

Another impossible problem (although not related to programming)
The Vanilla Ice Cream Problem [1]

[1] http://www.campbells.org/Rant+Rave/r+r_VanillaIcecream.html

Wow, this was pretty awesome. - apandit
This is absolutely awesome. I'm going to send it to most of my co-workers. - Remo.D
Wow! Real life is indeed stranger than fiction! - Robert K
This was debunked in snopes.com, as an urban legend. Pretty neat anyways though - Mario Ortegón
snopes.com didn't actually "debunk" it -- they merely pointed out that the story has changed over the years -- however, I never believed it; checkout line times would vary far more than a walk to the back of the store. - James Curran
Yes, I too like that story. I read it, and several other interesting debugging stories in David J. Agans’ book Debugging : the nine indispensable rules for finding even the most elusive software and hardware problems. I highly recommend it; it’s a great read. - Synetech inc.
1
[+15] [2008-10-07 14:47:59] James Curran

This one was interesting because there were two phases to it: diagnosing it, and solving it in the field.

I was maintaining a PC-based point-of-sale system, which had replaced an embedded system. The PC system had to duplicate all the quirks of the embedded system. One of them was how it used the printer. If the printer was plugged into the PC, turned on and online, whatever went to the screen was also sent to the printer. Now this was on a mid-80s MS-DOS PC.

Normally, if a printer went off-line, the text would just be automatically buffered, and printing would resume where it left off when it came back online. Here, however, I was supposed to sense that the printer went offline, and dump the intervening output.

Similarly, if one were to attempt to print to a printer that wasn't connected to the PC, the PC would just lock up, so again, I had to monitor the pin-outs of the printer port, and if the printer suddenly disappeared, I had to stop printing.

After way too much time with printer manuals and PC hardware specs, I figured out a plan for which pins I had to watch. The system shipped and all was well.

Until some sites complained that their systems would lock up. They would get two or three characters displayed on the screen and then nothing. The system would be frozen. It sounded just like the "printing to a missing printer" problem, except (a) I had dealt with that problem, and (b) the printer really was there.

After some time (which included one site sending us their system to test --- problem couldn't be duplicated), I finally stumbled upon the problem -- if the cable was loose, I'd be getting the signal from some pins but not others, the app would get confused and try printing when it couldn't. The simple solution was to just give the plug a good shove to reseat it.

The real problem, however, was getting people to do that. I'd be doing phone support (naturally, because I was the principal developer....). People would describe the problem; I'd say "The printer cable is loose". They, not wanting to reach behind a PC that had been stuffed in some cramped corner, would just glance at the cable and assure me it was fine. We would argue a bit more....

Eventually, I hit upon a method to get them to fix the real problem. When I got a call like that, I would first tell them we needed to "isolate the problem" and ask them to pull the printer cable off. The PC would immediately unlock. So then we would "try the next step" -- plugging the printer back in. They plugged it in firmly -- problem solved.
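
The answer doesn't say which pins were watched or how. As a rough sketch only, assuming the standard PC parallel-port status register layout (bit 7 = inverted BUSY, bit 5 = PAPER OUT, bit 4 = SELECT, bit 3 = inverted ERROR), a "safe to print" check might look like the following. The function name `printer_ready` and the choice to require all four lines to agree are assumptions for illustration, not the original code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for the standard PC parallel-port status register.
   Which of these the original program watched is not stated;
   this selection is an assumption. */
#define LPT_NOT_BUSY  0x80u  /* bit 7: 1 = printer is NOT busy (line is inverted) */
#define LPT_PAPER_OUT 0x20u  /* bit 5: 1 = out of paper */
#define LPT_SELECT    0x10u  /* bit 4: 1 = printer is selected (online) */
#define LPT_NO_ERROR  0x08u  /* bit 3: 1 = no error (line is active low) */

/* Hypothetical helper: returns true only when every watched line
   reports a healthy printer. A half-seated cable that delivers
   SELECT but not the other lines is therefore treated as "not ready"
   instead of being printed to. */
bool printer_ready(uint8_t status)
{
    return (status & LPT_NOT_BUSY) != 0
        && (status & LPT_PAPER_OUT) == 0
        && (status & LPT_SELECT) != 0
        && (status & LPT_NO_ERROR) != 0;
}
```

On real hardware of that era the status byte would be read from the port's status address before each character is sent; here it is passed in as a plain value so the logic can be checked in isolation.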


Nice behavioral workaround. - Brad Gilbert
(2) This reminds me of the whole "switch the ends of the ethernet cable" trick I learned when I did tech support. It does nothing but make sure the customers' ethernet cable is secure. - Jason Baker
2
[+6] [2008-10-07 14:08:54] Randy

A somewhat programming related one (though not my story):

A Story About 'Magic' [1]

[1] http://catb.org/esr/jargon/html/magic-story.html

3
[+6] [2008-10-07 15:37:28] Justsalt

The Story of Mel [1]

Of course, not my own experience, but surely the nec plus ultra of such stories.

[1] http://rixstep.com/2/2/20071015,01.shtml

4
[+5] [2008-10-07 14:30:59] Glomek

I was debugging a C program (not written by me) that was intermittently failing by running really slowly. It decided whether to look something up in an index or do a full database scan by looking in an AVL tree. Using gdb, I verified that the index in question had been entered in the search tree, and I verified that it was not found in the search tree. It turned out that the comparison function used the addresses of nodes, which was okay for reasons that I won't go into, but it did "return node1 - node2", which works as long as the subtraction doesn't overflow. Unfortunately, when the subtraction overflows, you end up with the strange result that a > b and b > c, but c > a.

This was in a huge code base, and it took me weeks to find the problem. To top it all off, once I found the problem, it turned out that it had already been fixed in the latest release of the product.
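
The broken ordering is easy to reproduce in isolation. The sketch below is not the original code: it assumes the node addresses can be modelled as 32-bit unsigned values and that the difference is squeezed into a 32-bit signed result. The names `cmp_by_subtraction` and `cmp_safe` are made up for illustration.

```c
#include <stdint.h>

/* Buggy pattern: "return node1 - node2". The difference wraps modulo
   2^32 and is then read as a signed value, so two keys that are more
   than 2^31 apart compare with the wrong sign. (Converting a value
   above INT32_MAX to int32_t is implementation-defined; on common
   two's-complement platforms it simply reinterprets the bits.) */
int32_t cmp_by_subtraction(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b);
}

/* Overflow-free alternative: compare, don't subtract.
   Returns -1, 0 or 1. */
int cmp_safe(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}
```

With a = 0xC0000000, b = 0x60000000 and c = 0x00000000, `cmp_by_subtraction` reports a > b, b > c and c > a all at once, which is exactly the kind of cycle that lets a tree insert succeed and a later lookup of the same key fail. `cmp_safe` orders the same three values consistently.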


5
[+4] [2008-10-07 13:51:15] Rodger Cooley

So far, my problems have all been irrational timelines... ;-)


6
[+2] [2008-10-07 14:40:40] Kyle Cronin

While working on implementing an extremely naive version of call-with-current-continuation by manually copying the C stack I ran into a problem where it would work some of the time and fail in others. After losing much sanity, I discovered that the process doesn't like it when you try to rewrite your current stack frame. This was evident when attempting to restore a continuation that was deeply nested at a later time with fewer stack frames. The solution was simply to call the function again and again with the same parameters and build up the stack. Once it was large enough you could rewrite the lower portion with a saved stack and longjmp into it.

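atom *icont(acont *c, atom *ret) {
    /* ... */
    char tos;   /* a local whose address marks the current top of the stack */

    /* Not enough stack built up yet: recurse with the same arguments
       to push another frame, then test again. */
    if (c->respt < &tos)
        return icont(c, ret);
    /* ... */
}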

In this code, c is the continuation, a struct containing a restore point (respt) that indicates where the stack needs to be copied back to. The ret value is the value needed to return from the continuation and is not dealt with here. The tos value is a stack value that indicates where the current top of the stack is. The if statement tests to see if we've built up enough stack by testing the relative position of respt and tos. If not, we recurse with the same parameters, effectively adding a stack frame each time.


shudder That is diabolical! - Justsalt
This code needs a comment. - unwind
I hope this helps - Kyle Cronin
7
[+2] [2008-10-07 15:04:40] Ryan

I was in my OS class junior year of college, and I was coding my first threading assignment.

Now, here we need a bit of background. My school had recently transitioned the introductory classes from C and C++ to Java. Specific C or C++ classes are offered, but are not a requirement. I took the Java-style intro course, and had never really learned any substantial amount in C.

Somehow my OS professor did not get the message, and his class required a full, working knowledge of C. So here I am, with nearly no C knowledge, working with pthreads. Yay.

Anyway, the assignment required splitting off N threads which all perform some calculation and are then joined to get some final result. I ran into an interesting bug: my program worked with 9 or fewer threads, but with 10 it would segfault.

I worked all night mucking around with mutexes, flags, recompiling pthreads, trying it on Windows, Linux, and Solaris, etc. etc. etc. and still, 9 threads worked just fine, 10 threads blew up.

After a day of class, coming back to my assignment I finally figured out the problem. I had created an array to hold pointers to threads. Problem was, instead of creating it as [num_threads], I created it as [num_something_else]. The particular variable I used escapes me, but it always evaluated to 9.

When it was time to make my threads I had for (int x=0; x<num_threads; x++) {threads[x] = pthread_create()} or something. My Java-addicted brain was expecting an ArrayIndexOutOfBoundsException. I had made a 9th-grader mistake.

Lessons learned.

  1. Learn C.
  2. After a few hours failing to find the problem, you are better off just going to bed.
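
For comparison, here is a minimal sketch of the create-and-join pattern with the array sized from the same variable that bounds the loop. It is not the original assignment: `worker` and `run_workers` are hypothetical names, and the "calculation" is just each thread handing back its own index.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical worker: returns its own index as the thread's result. */
static void *worker(void *arg)
{
    return arg;
}

/* Starts num_threads workers, joins them, and returns the sum of
   their results, or -1 if allocation, creation or joining fails. */
long run_workers(size_t num_threads)
{
    /* The array is sized from num_threads, the same bound the loops use. */
    pthread_t *threads = malloc(num_threads * sizeof *threads);
    if (threads == NULL)
        return -1;

    size_t started = 0;
    int failed = 0;
    for (size_t i = 0; i < num_threads; i++) {
        /* pthread_create writes the handle through its first argument
           and returns an error code, not the thread itself. */
        if (pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    long total = 0;
    for (size_t i = 0; i < started; i++) {
        void *result = NULL;
        if (pthread_join(threads[i], &result) != 0)
            failed = 1;
        else
            total += (long)(intptr_t)result;
    }

    free(threads);
    return failed ? -1 : total;
}
```

Calling `run_workers(10)` sums the indices 0 through 9. C performs no bounds check on `threads[i]`, so if the array had been sized from some other variable, writing past its end would be undefined behavior rather than an exception.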

8
[+2] [2008-10-29 02:53:20] Adam Liss

We once sent a developer--about 1/5 of the company's engineering department--from the US to Finland for almost a month in the middle of winter to fix a critical problem for a large customer.

The customer had deployed our relatively new embedded product and complained that it eventually stopped responding to all network traffic until it was rebooted. Sometimes it worked for a few hours, sometimes for a few days. Everything was absolutely normal until it suddenly became completely unresponsive to the network.

After several weeks of observation and instrumentation with home-brew debug tools, we discovered that some of the customer's devices broadcast an unusual type of packet on the one port that was mis-handled in our 3rd-party stacks. Instead of being ignored, each of these packets was queued in a buffer until a (nonexistent) listener consumed it. Our product eventually ran out of buffers....

Once we identified the problem, we deployed a one-line fix almost immediately and allowed our developer to defrost.
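
The failure mode can be modelled in a few lines. This is a toy simulation under loud assumptions, not the third-party stack or the actual fix: the pool size, port count, struct layout and function names are all invented, and the `drop_unlistened` flag merely stands in for whatever the one-line change was.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8  /* hypothetical number of packet buffers */
#define NUM_PORTS 4  /* hypothetical number of ports */

struct stack_sim {
    size_t free_buffers;           /* buffers still available */
    size_t queued[NUM_PORTS];      /* packets waiting per port */
    bool has_listener[NUM_PORTS];  /* is anyone reading this port? */
    bool drop_unlistened;          /* stand-in for the fix */
};

void stack_init(struct stack_sim *s, bool drop_unlistened)
{
    s->free_buffers = POOL_SIZE;
    for (size_t p = 0; p < NUM_PORTS; p++) {
        s->queued[p] = 0;
        s->has_listener[p] = false;
    }
    s->drop_unlistened = drop_unlistened;
}

/* Returns false when a packet cannot be accepted because the buffer
   pool is empty, i.e. the device has gone deaf to the network. */
bool stack_receive(struct stack_sim *s, size_t port)
{
    if (port >= NUM_PORTS)
        return true;               /* out of range: ignored */
    if (s->drop_unlistened && !s->has_listener[port])
        return true;               /* nobody listening: ignore the packet */
    if (s->free_buffers == 0)
        return false;              /* pool exhausted */
    s->free_buffers--;
    s->queued[port]++;
    return true;
}

/* A listener consumes one packet and returns its buffer to the pool. */
bool stack_consume(struct stack_sim *s, size_t port)
{
    if (port >= NUM_PORTS || s->queued[port] == 0)
        return false;
    s->queued[port]--;
    s->free_buffers++;
    return true;
}
```

Without the flag, POOL_SIZE packets sent to a port nobody reads are enough to make every later `stack_receive` fail, even on ports that do have listeners. How long that takes in the field depends only on how often the odd packets arrive, which matches the "hours or days" pattern in the story.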


it is not that cold in Finland ;-) - Petteri Hietavirta
(2) @pepez: LOL! Guess he was just thin-blooded! - Adam Liss
9
[+1] [2009-11-25 00:35:40] Andreas Bonini
10
[0] [2008-11-27 07:35:11] MrValdez

This is an irrational technical behavior from a financial company in a previous job.

After I deployed my software, their live server refused my connections. But it worked fine with their test servers. I double-checked and then triple-checked my code to be sure I hadn't fouled up. There was no problem in my code, so I called them on the phone.

They couldn't figure out what the problem was. They even suggested that it was a problem with our code. After a week of unsatisfying responses, I asked them what error logs they were receiving. They sent me a single line describing the error. It was a Java exception about SSL [1]. I researched on the Internet and 2 hours later, I found out that their live servers are not accepting our SSL connection. I asked them why they were not accepting the connection. A day later, they responded that our SSL's Certificate Authority (CA) wasn't in their whitelist.

I'm not going to reveal who our SSL CA is, but it is one of the top 5 CAs in the market. And the CAs they recommended include CAs which our own CA outranks. I also disliked the way they referred to their CAs as "a more trusted brand".

I called bullshit and created a 2+ page report [3] detailing why our CA can be trusted, why our CA will not disappear and stop sending (which is one of their reasons why they don't want to whitelist our CA), the ranking of our CA compared to their recommendations, a partial list of big companies involved in financial and mission-critical [2] software that use our CA, and an entry in their own documentation that states that we only require an X-bit SSL certificate (no CA was stated).

The simple rationale was: they didn't want to add our CA. I think it's a political or bureaucratic reason. I refuse to believe they can't add it for technical reasons.

Weeks later, we dropped them and created our own competing software.


[1] Funny thing is, I don't know Java, yet I understood what their problem was. I wouldn't bring this up, but they insisted that we were having trouble because we weren't using Java. This doesn't explain why we had no problem with their test servers, though.

[2] Sorry for the marketing term, but I'm trying to be as vague as possible.

[3] It's only 2 pages, but I also included a lot of references such as wiki, a report on CA, a blog post about SSL Certificates on Coding Horror, etc.


11
[0] [2008-10-07 13:52:10] Steve Moyer

How about the NetFlixPrize challenge? Although some of the teams are getting close and may claim the prize by brute force, I tend to think that NetFlix has a ton of extra data that would actually result in better predictions. See http://www.netflixprize.com.


12
[0] [2008-10-07 14:16:00] skaffman

There's the classical Travelling Salesman problem [1]

[1] http://en.wikipedia.org/wiki/Travelling_salesman_problem

13