Traceroute isn't real (gekk.info)
61 points by ogurechny on Aug 1, 2023 | 37 comments


This article is overly negative - if a single use of traceroute offends the author ("debugging other people's connections"), that does not mean the tool is not useful. I have found traceroute extremely helpful when debugging the networks I control. It is also pretty good for general troubleshooting of others' websites, as long as you don't expect much. traceroute can tell you if the problem is at your ISP (switch to a 3G modem!), on the backbone (do you have that proxy on the other coast?) or at the target (just give up).

Additionally, this is demonstrably false - traceroute actually works on the great internet.

> With that information, go ahead and ask yourself if you think anyone, at any network hardware company, has given a shit about implementing TTL Exceeded since the 90s. The answer is obvious: No. Without a doubt, this is not on anyone's priority list.

Nope, just did "traceroute www.yahoo.com" and got 11/13 hops responding.. turns out Yahoo peers directly with Verizon, who knew! "google.com" got 8/10 hops responding.. "gekk.info" is badly incomplete, but it did manage to report "dreamhost.com" so I could still make my ISP/backbone/target ISP determination.


I got 16/16 with www.yahoo.com, 15/15 with google.com, 22/25 with gekk.info. Using mtr I got 26/27 after 120 packets (1 with 50% packet loss, 4 with 95% packet loss, and only 1 with 100% loss)
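
Tallies like "11/13 hops responding" can be counted mechanically. The following is only a minimal sketch, assuming Unix-style traceroute output in which each hop line starts with the hop number and a silent hop shows nothing but `*`; `count_responding_hops` is a hypothetical helper, and other implementations (Windows `tracert`, for instance) format their output differently.

```python
import re
from typing import Iterable, Tuple

# A hop line starts with the hop number followed by whitespace.
# Continuation lines (extra addresses for the same hop) start with an
# address such as "150.222.87.249", which this pattern does not match.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def count_responding_hops(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (responding, total) hops for Unix-style traceroute output."""
    responding = 0
    total = 0
    for line in lines:
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, continuation line, or unrelated text
        total += 1
        fields = match.group(2).split()
        # A hop counts as responding if anything other than "*" was printed.
        if any(field != "*" for field in fields):
            responding += 1
    return responding, total


sample = [
    "traceroute to yahoo.com (74.6.231.21), 64 hops max",
    "  1   192.168.163.25 (_gateway)  27.468ms  1.662ms  3.452ms",
    "  2   *  *  *",
    "  3   172.24.100.177 (172.24.100.177)  53.072ms  60.815ms  79.800ms",
]
print(count_responding_hops(sample))
```

A silent hop in this count only means that the hop did not answer the probes, not that it failed to forward traffic.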


This is a pretty poorly reasoned article.

Multimeters aren't real because the schematics, design documents and repair manuals for this device don't mention them. Therefore the voltage readings I'm getting here and there don't mean anything.

Sonar isn't real, because it's not in the specification of submarines and ship hulls that they should reflect sound.

Air pressure isn't real; it was here before us and is undocumented, so cruft like 101.3 kPa doesn't mean a thing.

Meltdown and Spectre exploits or other side channel attacks aren't real because the behaviors cannot be deduced from CPU architecture manuals.


>This is a pretty poorly reasoned article. Multimeters aren't real because [...]

I understand your analogies but the author was trying to explain a different type of phenomenon...

Diagnostic tools like traceroute and ping require "cooperation" from the hosts because of how they choose to handle ICMP packets.
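
To make the "cooperation" point concrete, here is a toy simulation, not real networking code. It assumes a fixed path and a per-hop flag (`replies`, a made-up name) saying whether that hop bothers to answer a probe whose TTL ran out there; the addresses come from documentation ranges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    address: str
    replies: bool = True  # hypothetical flag: does this hop answer expired probes?


def simulated_traceroute(path: List[Hop], max_ttl: int = 30) -> List[Optional[str]]:
    """Return what a prober would see per TTL: an address, or None for '*'."""
    seen: List[Optional[str]] = []
    for ttl in range(1, min(max_ttl, len(path)) + 1):
        hop = path[ttl - 1]  # the probe with this TTL expires at (or reaches) this hop
        seen.append(hop.address if hop.replies else None)
    return seen


path = [
    Hop("192.0.2.1"),
    Hop("198.51.100.7", replies=False),  # forwards traffic fine, just stays silent
    Hop("203.0.113.9"),
]
print(simulated_traceroute(path))
```

In this model the silent hop still forwards every packet; the only thing missing is the reply, which is why a `*` in the middle of a trace says little about reachability.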

In contrast, other tools like multimeters and air pressure gauges don't need cooperation from the device they're probing. E.g. a 9-volt battery can't "lie" to the digital multimeter and notice that it's hooked up to a voltage measurement tool instead of being inside a smoke alarm and pretend to be 10000 volts or 0 volts.

Anyways, when authors go for shock headlines like "Traceroute isn't real" instead of explaining in plain language, it just invites readers to misinterpret what the intended message is.


> 9-volt battery can't "lie" to the digital multimeter...

They totally do lie, for a number of reasons. The most common one is that a multimeter presents a much higher resistance than many loads, so an almost-empty battery might show "8.7V" to the multimeter but immediately drop to 1-3V under any real load. This won't happen with a smoke alarm, but can happen with a radio or a toy. This is a common gotcha for people who are just learning to use the tool.
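
The loading effect is just a voltage divider between the battery's internal resistance and whatever is connected to it. A rough sketch with made-up numbers: the 300 ohm internal resistance and the 100 ohm load are illustrative assumptions, and 10 megohm is a typical (not universal) input resistance for a digital multimeter.

```python
def terminal_voltage(open_circuit_v: float, internal_r_ohm: float, load_r_ohm: float) -> float:
    """Voltage across the load for a battery modeled as an ideal source plus a series resistor."""
    return open_circuit_v * load_r_ohm / (internal_r_ohm + load_r_ohm)


WORN_BATTERY_V = 8.7         # open-circuit voltage, illustrative
WORN_BATTERY_R = 300.0       # internal resistance in ohms, illustrative
MULTIMETER_R = 10_000_000.0  # assumed meter input resistance in ohms
RADIO_R = 100.0              # assumed load resistance in ohms

print(terminal_voltage(WORN_BATTERY_V, WORN_BATTERY_R, MULTIMETER_R))  # close to 8.7 V
print(terminal_voltage(WORN_BATTERY_V, WORN_BATTERY_R, RADIO_R))       # about 2.2 V
```

With these assumed values the meter barely loads the battery, while the radio pulls the terminal voltage down into the 1-3V range described above.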

Less common reasons include: shaking the battery (such as when you are removing it from the device) might increase its voltage for a few minutes; bringing an outdoor device into a warm room might also make the battery appear fuller.

So yes, multimeters have their own limitations, and if the device under test cooperates (exposes test points for example), they work much better. You need some skill to know when to trust it and when to ignore its reading. Kinda like for traceroute :) so maybe the analogy is not that bad after all.


>they totally do lie, for a number of reasons. A most common one is that multimeter presents much higher resistance [...] So yes, multimeters have their own limitations,

Yes, I understand that any physical measurement tool in the universe has an error range whether it's a multimeter or an air pressure gauge. I have 3 air pressure gauges and yes -- all of them are "lying" to me because they're all "wrong" by various percentages for various environmental reasons. Even the $150 Longacre elite air pressure gauge is lying to me about my car's actual tire pressure by the perspective you're using. The mistruth you're pointing out is from the perspective of the measurement tool (e.g. the multimeter has higher internal resistance.)

However, the "lie" perspective I was using was from the human-configured _target_ device itself -- for adversarial-or-performance-optimization reasons. The humans that configured the network nodes can deliberately make it not respond to ICMP packets which renders the traceroute statistics less than helpful (or "not real" as the author puts it). That's the angle the author was trying to explain in a confusing way. That's a different situation from the analogies that gp (kazinator) was using.

Your perspective is also correct, but the physical limitations of measurement tools weren't my point. Humans with "agency" over how their routers or servers respond to ICMP packets produce a different kind of "lie" than batteries that haven't been shaken.

>Kinda like for traceroute :) so maybe the analogy is not that bad after all.

I use digital multimeters every day and reading "3v" from a 9-volt battery doesn't feel like a "lie" but a sort of "truth" based on the load factors you point out. However, pinging a system and getting timeouts feels more like a mistruth because the target system is often actually there but somebody didn't want to make it reply to those particular packets.


I don't think the distinction between "physical limitation" vs "humans with agency" is quite as clear-cut as you think... After all, if one wanted my 9V batteries to be always clearly measurable they could make thicker electrodes and thinner electrolyte. Instead, the humans who designed batteries configured them to prioritize high capacity and low cost, "deliberately" making multimeters inaccurate.

Or for a more extreme example: Apple chargers prominently say "20V" on them.. and yet if you get one and hook up a multimeter to it, you will only measure 6 volts (see "Charger Startup Process" in [0] to see why, spoiler: it needs a special enable pulse as a safety measure). Will you call this "a lie"? Can we say "multimeters aren't real" because Apple power supplies require a special pulse? Can we say "humans have deliberately made it not work with multimeters"?

Instruments, be it traceroute or multimeters, have limitations; and users should know about them. Even if technology advances make tools less useful in some situations, there is no reason to disparage them.

[0] http://www.righto.com/2013/06/teardown-and-exploration-of-ma...


Electronic devices sometimes have test points on their circuit boards, and you may get documentation for those which tells you what to measure, how and how to interpret the values. These devices are analogous to a protocol with telemetry.

Devices that don't have such a thing are like hosts that may or may not give you an ICMP message. You're on your own, reverse engineering and guessing.

The author says that the latter approach isn't "real".

If an object is man-made, investigating it empirically using the scientific method, using whatever tricks are available, including the repurposing of its features, is off the table and the results are not real.

OK, a few idiots think that if traceroute shows node 6 and node 8, node 7 must be down. So what?

A few idiots probably also think that if they put an ohmmeter on a resistor that is soldered into a circuit, they can read off that resistor's resistance.

Or that if a 10V battery reads 8.5V, it's still 85% full!


>You're on your own, reverse engineering and guessing. The author says that the latter approach isn't "real". [...]

No, that isn't what the author is saying about "real". He's not saying that using some common sense problem-solving is "not real" as you're claiming.

Instead, he's saying official industry support for traceroute is what's "not real". Example quote: "For a good summary I highly recommend this presentation. But as good as that deck is, I always felt it left out a crucial piece of information: Traceroute, as far as the industry is concerned, does not exist."

To your credit, I don't think it's your fault for misinterpreting it since the author went for a "cute" title instead of just plainly stating limitations of traceroute in a non-dramatic way.

The following outlines the writing style the author used that inadvertently caused readers to hyperfocus on the word "real" instead of just agreeing or disagreeing on the actual points about "limitations of traceroute":

- step 1: define "not real" in a very specific idiosyncratic way so it conveniently creates a nice shock-value blog "title". E.g. "With the powers vested to me as author of this blog, I'm going to use "Not Real" to mean ... no modern-RFC, and no ironclad legally-binding SLA for ICMP/UDP/TTL responses from all network hosts, and no hidden performance hacks or dropping of packets, etc." (Basically, he's saying that there's no perfect cooperation/coordination between all actors in the network so that means "traceroute isn't real" in his constrained definition.) And bonus points for creating a special definition "not real" that is so nuanced that no casual reader could hold it in their head.

- step 2: write in a condescending way ("everyone was wrong") so that traceroute appears completely useless and any user is clueless for even trying it as a diagnostic tool

The author's writing style undermined the effect he wanted since the end result is a reader like you thinking he meant "reverse engineering as a problem-solving technique isn't real".


The author has latched onto one problem with traceroute, that hosts do not reliably generate the ICMP error. From that one thing (that everyone halfway knowledgeable in networking already knows) he generated a huge diatribe.

It's obvious that the author means "not real" in the same way that i = a[i++] returning a particular value on one C compiler is "not real". There are no engineering requirements anywhere which give you specifics that you can rely on.

Yes, traceroute is "not real" in a similar way.

What it means is that you probably don't want to be writing an application in which you run traceroute and have the code making important decisions based on what is scraped from the output.

Traceroute gives real information though: yes, that host #5 you see in the list really did drop the TTL-dead packet and sent you the ICMP notice. Yes, the repeated listing of hosts 9, 10, 9, 10, 9, 10 .. almost certainly means they have a routing loop between them.
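
The "9, 10, 9, 10" pattern is easy to spot in code as well. A small sketch, assuming the trace has already been reduced to one address per TTL with `None` for silent hops; `repeated_hops` is a hypothetical helper name, and a repeat is only a hint of a loop, not proof.

```python
from typing import Dict, List, Optional


def repeated_hops(hops: List[Optional[str]]) -> List[str]:
    """Return addresses that show up at more than one TTL, in order of first repeat."""
    first_seen: Dict[str, int] = {}
    repeated: List[str] = []
    for ttl, address in enumerate(hops, start=1):
        if address is None:
            continue  # silent hop, nothing to compare
        if address in first_seen:
            if address not in repeated:
                repeated.append(address)
        else:
            first_seen[address] = ttl
    return repeated


trace = ["192.0.2.1", "198.51.100.9", "198.51.100.10",
         "198.51.100.9", "198.51.100.10", "198.51.100.9"]
print(repeated_hops(trace))
```

The same address answering at increasing TTLs means probes passed through that router more than once on the way out, which is what a loop looks like from the prober's side.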

My argument is that people have to investigate and predict the behavior of systems that are not specified by any engineering requirements. For instance, we have meteorologists who predict the weather, imperfectly. The weather doesn't come with a metrics API and reference manual so meteorology isn't real. And look, it's often hardly better than a wild guess more than about 4 days out.

Maybe traceroute is like a weather radar. It tells you there are some droplets in the air out to the northeast, about 15 miles away. Maybe it's rain, maybe not; or is it a swarm of locusts? Those radars are useful nonetheless and pretty real.


I am a bit baffled by the arguments about what the author meant. (Needless to say, people who “prove” they can run `traceroute yahoo.com` are hopeless.)

Traceroute is a byproduct of a complex process that may match the forward path of real application traffic you send to some address. It may not if you have some local (transparent) proxy for certain protocols, or QoS in the middle (that may mis-classify your application traffic, unlike regular ICMP pings), or an intrusion detection system, DoS protection system, or low-level load balancer on the other end. The author repeatedly invites readers to check another explanation, and it gives at least one example where replies for a certain hop are canonically sent by a different network hop. Some ISPs use IP addresses from private ranges, and they surface in traceroutes. Some ISPs lease others' infrastructure, and you see those IP addresses and DNS names inside their networks. Some ISPs deliberately mess with low-TTL packets to cloak details like those from clients and competitors, and keep their physical network layout secret.

You need to know all of that, from ARP to BGP, to understand traceroute output. You also need to remember that one post in the network engineering community a month ago that said ISP M was bought by Corp N to guess that they are probably switching traffic from a leased line between IX A and IX B to Corp N's own lines between City X and City Y, which might explain what you see. However, the traceroute manual doesn't give you a list of 30 thick books to read in advance, nor does it contain the latest gossip about the state of ISP O's hardware in City Z. Therefore, an unprepared user is not aware of all the possible complexities.

This reminds me of “helpful” websites that state that hosts with IP addresses of well known CDNs are located in California. Then people from all over the world run ping or traceroute, and deduce that they have a 10 ms RTT to California. Wow, such a great CDN!
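
A quick sanity check for claims like that: light in optical fiber covers roughly 200 km per millisecond (about two thirds of its speed in vacuum), so an RTT puts a hard ceiling on how far away the responder can be. The constant below is that approximation, and the function name is made up for illustration.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed of light in fiber


def max_one_way_distance_km(rtt_ms: float) -> float:
    """Upper bound on the distance to a host, ignoring all queuing and processing delay."""
    one_way_ms = rtt_ms / 2.0
    return one_way_ms * FIBER_KM_PER_MS


print(max_one_way_distance_km(10))
```

Under this approximation a host answering in 10 ms cannot be more than about 1,000 km of fiber away, and real paths are never a straight line, so the true distance is smaller still.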


> This is a pretty poorly reasoned article.

Wait until you get to the preaching social justice part!


Traceroute is problematic and sometimes lies. Paths aren't always symmetric. Sometimes routers are too busy to answer ICMP. And MPLS/encapsulating/tunneling is opaque.

On the other hand, real world network engineers rely a whole lot on traceroute from their machines and from looking glasses (and mtr is wonderful).


The author doesn't like mtr either:

> The example given is that a handful of users running MTR (do not get me started on this bastard program) can actually hit this rate limit. This is an outstanding example because I have seen something similar in practice.

> Consider what that would look like, and how common it would be: If you have a NOC full of people who think they know what they're doing, but don't, that only enhances the probability that everyone is trying to troubleshoot on their own instead of doing a screenshare and coordinating their efforts - thus, you have six guys running MTR to the same IP.

> If they hit that rate limit, what do they see? Nodes suddenly not responding! Randomly, in fact - sometimes responding, sometimes not! That means it's not just a hop that doesn't respond to traceroutes, but packet loss! Wow! We found the problem!

> Of course, if four of them hit Ctrl+C, the PL would mysteriously vanish. Huh! Weird! Well, it must be an intermittent issue in *squints at resolved hostname* Hurricane Electric's network. I'm sure they have a flapping port they haven't noticed (lol.) Just send them a trouble ticket!
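
The arithmetic behind that scenario is simple enough to sketch. All numbers here are assumptions for illustration: a hop that answers at most 3 expired-TTL probes per second, and each MTR instance sending it about one probe per second. Real rate limiters are usually token buckets with bursts, which this steady-state estimate ignores.

```python
def apparent_loss(probers: int, probes_per_sec_each: float, replies_per_sec_limit: float) -> float:
    """Steady-state fraction of probes left unanswered by a rate-limited hop."""
    offered = probers * probes_per_sec_each
    if offered <= replies_per_sec_limit:
        return 0.0
    return 1.0 - replies_per_sec_limit / offered


print(apparent_loss(6, 1.0, 3.0))  # six people probing the same hop
print(apparent_loss(2, 1.0, 3.0))  # after four of them hit Ctrl+C
```

In this model the "packet loss" is produced entirely by the probers themselves; forwarded traffic is never touched.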


Is there any guarantee that there's a stable path for the duration of the traceroute packet series; FWIU, IP routes can change in the middle of a traceroute?

From "Gping – ping, but with a graph" https://news.ycombinator.com/item?id=36549005 :

> Scapy has a 3d visualization of one traceroute sequence with vpython. In college, I remember modifying it to run multiple traceroutes and then overlaying all of the routes; and wondering whether a given route is even stable through a complete traceroute packet sequence. https://scapy.readthedocs.io/en/latest/usage.html#tcp-tracer...

It's trivial to modify packet TTLs with e.g. iptables.

Virtual circuit: https://en.wikipedia.org/wiki/Virtual_circuit


There are no guarantees on path during a trace; after all, a connection could drop or come up during a trace, and routing may change as a result. We're packet switched, not circuit switched, so there's nothing out of spec if you route differently.

However, tcp is sensitive to getting packets out of order, and most multipacket udp applications aren't a big fan either. So it's common to ensure a packet with a given 5-tuple {ip dest, ip source, protocol, protocol port dest, port source} will route the same, including traveling over the same port in a multi-port bundle, unless that next hop goes down or a better next hop comes up.

That said, many traceroute implementations default to using a new port for each probe, and depending on the networks between you and what you're probing, you might see a variety of paths in a single short probe. When that happens, you really don't get much information.
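
Per-flow load balancing of that kind can be sketched as a hash over the 5-tuple. This is only an illustration: real routers use their own vendor-specific hash functions and inputs, SHA-256 is used here purely for convenience, and the next-hop names and addresses are made up.

```python
import hashlib
from typing import Sequence, Tuple

# (source IP, destination IP, protocol number, source port, destination port)
FiveTuple = Tuple[str, str, int, int, int]


def pick_next_hop(flow: FiveTuple, next_hops: Sequence[str]) -> str:
    """Map a flow to one of several equal-cost next hops, deterministically."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


next_hops = ["link-a", "link-b"]  # hypothetical members of a multi-port bundle

# One TCP connection keeps the same 5-tuple, so it always takes the same link.
tcp_flow = ("192.0.2.10", "203.0.113.5", 6, 51000, 443)
print(pick_next_hop(tcp_flow, next_hops))

# Classic UDP traceroute bumps the destination port on every probe
# (starting at 33434), so each probe is a different "flow".
probe_links = {
    pick_next_hop(("192.0.2.10", "203.0.113.5", 17, 40000, 33434 + i), next_hops)
    for i in range(64)
}
print(sorted(probe_links))
```

This is why a single trace can interleave hops from several parallel paths, and why holding the probe's ports fixed, where the tool allows it, tends to give a more coherent picture.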

Of course, the return path for the packets is likely to be different and isn't easy to know; if you control both ends, you obviously can; and if the destination network has a good looking glass, you can get some idea, but detailed traces aren't going to happen.

All that said, parts of the rant are accurate. Routers and hosts don't have to participate and many don't or are rate limited. Intermediate latency is often not very helpful. Where a trace ends only tells you where the trace ends, not where the packets left the network (possibly at the destination). Having useful diagnostic information doesn't help if you can't get in touch with someone who will use the information to fix the problem.

Being able to trace from multiple vantage points helps make sense of things.

If you're experiencing intermittent behavior like some tcp connections between two hosts work great and some don't, and most of the network involved participates in tracing and you have a good contact, you can do specific tracing and narrow down the routers they need to check to find the problem; sometimes that helps time to resolution a lot.

If you don't have a cooperative network, it's better to invest in making multiple connections and using the best one as measured; because that gives you agency to 'fix' network problems.


One perspective: http://shouldiblockicmp.com/ :

- Don't block ICMP: Echo Request and Echo Reply, Fragmentation Needed (IPv4) / Packet Too Big (IPv6), Time Exceeded, and only within the boundary allow NDP and SLAAC (IPv6)

But presumably none of that helps solve for what traceroute is used to help diagnose.

"ICMP: The Good, the Bad, and the Ugly" (2016) https://blog.securityevaluators.com/icmp-the-good-the-bad-an... ; blocking ICMP mitigates {Ping sweep, Ping flood, ICMP tunneling, Forged ICMP redirects,} but breaks {Path MTU discovery (PMTUD), TTL, ICMP Redirect}

ICMP > Redirect: https://en.wikipedia.org/wiki/Internet_Control_Message_Proto... ; Type 5

So this would drop just ICMP redirects:

  sudo iptables -A OUTPUT -p icmp --icmp-type 5 -j REJECT


What I block or don't for ICMP doesn't much matter when I'm tracerouting through networks I don't control, which is nearly all of them. Yeah, if I block the returns from my probes, I won't get useful results for traces to pretty much anywhere, but that should solve itself.


mtr, etc, notices changes.

Sure, you can create bogus traceroutes once things enter your network. I had 10 extra hops for awhile for luls.


Very informative article. I learned something and got a couple of laughs as a bonus:

> In addition to the millions of hosts that an internet router has to arbitrate between, it also has its own IP addresses, which people rudely try to interact with all day long. When you ping a router, you're making that poor little 600MHz ARM chip find time to deal with your traffic, not the terabit-per-second monster that it's married to.

> It has been my experience that easily 75% of people working networking jobs are operating in a state of absolute terror, trying to keep their head above water with problems they don't really understand at all.


I've never used traceroute, but after reading this article, I know I will.

Went ahead and looked at the source for traceroute. This heading comment is quite great and informative as well: https://github.com/openbsd/src/blob/master/usr.sbin/tracerou...

They also use the event APIs to implement the loop of traceroute. Seems like a lot of effort to me given how few events. Maybe someone else has insights why.


I am probably sharing forbidden knowledge, but look up the mtr program. It is kind of like a strange hybrid of traceroute and ping.


In my experience, nmap is a far more useful tool for troubleshooting connectivity than traceroute. Scan a host and see if the port you want is open. If not, then "closed" versus "filtered" versus "host is unreachable" will give you an idea if the issue is the destination program isn't running, the firewall on the target (or possibly elsewhere in between) is blocking things, or the host's not available on the network, either because it's misconfigured, in a different subnet or VLAN or whatever, or shut down.


Nmap is a fantastic tool, but it helps with issues at an entirely different level than debugging routing issues.

When you’re talking about L2 routing, firewalling and NACLs, it is very, very likely that most packets are just silently dropped. In this case, nmap would just report that the host is not up (even if you tell nmap it is up).


It’s an interesting read but the author really takes way too negative a view about using things for other than their intended purposes. That’s literally what we do here. Finding new uses for the things we already have is how so much real progress is made.

The tone of the writing is very distracting.


> It is, generally speaking, not possible to call AT&T and say "Hey, when I try to ping one of your subscribers in California from a Level3 circuit in New York, I'm hitting a routing loop."

I did almost this exact same thing (with an issue on AT&T's end of an AT&T/HE connection) and the issue was resolved within the hour. I contacted HE (my ISP) with my mtr results, they contacted AT&T, and AT&T fixed the issue.


> You can easily confirm this is true. Run a traceroute... anywhere. Yahoo dot com. You will see nodes that never respond, 9 times out of 10.

  tracert -d  -6 yahoo.com

  Tracing route to yahoo.com [2001:4998:24:120d::1:0] over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  XXXX:6ecd:d6ff:fe44:4082
  2    21 ms    20 ms    21 ms  XXXX::1
  3     *        *        *     Request timed out.
  4    21 ms    30 ms    19 ms  2001:504:0:3:0:1:310:1
  5    22 ms    27 ms    27 ms  2001:4998:f030:2::
  6    45 ms    47 ms    47 ms  2001:4998:f00a:24::
  7    66 ms    67 ms    66 ms  2001:4998:f00a:201::1
  8    73 ms    79 ms    77 ms  2001:4998:f00f:3::
  9    72 ms    71 ms    71 ms  2001:4998:f00f:d::1
 10    72 ms    91 ms    72 ms  2001:4998:24:fe0c::1
 11    73 ms    71 ms    70 ms  2001:4998:24:fa07::1
 12    71 ms    71 ms    70 ms  2001:4998:24:d011::1
 13    72 ms    72 ms    72 ms  2001:4998:24:120d::1:0

  Trace complete.
LGTM ¯\_(ツ)_/¯


    $ traceroute yahoo.com --resolve-hostnames
    traceroute to yahoo.com (74.6.231.21), 64 hops max
      1   192.168.163.25 (_gateway)  27.468ms  1.662ms  3.452ms 
      2   *  *  * 
      3   172.24.100.177 (172.24.100.177)  53.072ms  60.815ms  79.800ms 
      4   172.24.194.169 (172.24.194.169)  139.138ms  55.275ms  20.311ms 
      5   172.24.194.170 (172.24.194.170)  19.617ms  51.354ms  26.870ms 
      6   *  *  * 
      7   185.6.36.58 (3ireland.ipv4.sw01.inex.ie)  58.357ms  55.352ms  32.625ms 
      8   185.6.36.77 (xe-2-3-0.pat1.iry.yahoo.com)  65.414ms  41.834ms  21.050ms 
      9   209.191.64.178 (ae-6.pat1.bfy.yahoo.com)  124.513ms  107.566ms  159.550ms 
     10   209.191.65.132 (unknown.yahoo.com)  156.093ms  129.892ms  126.091ms 
     11   209.191.64.214 (ae-7.pat2.nez.yahoo.com)  148.971ms  150.930ms  176.944ms 
     12   209.191.65.111 (et-18-1-0.msr2.ne1.yahoo.com)  162.165ms  144.948ms  136.803ms 
     13   98.138.97.23 (et-18-0-1.clr1-a-gdc.ne1.yahoo.com)  148.601ms  148.984ms  171.028ms 
     14   98.138.51.1 (lo0.fab2-2-gdc.ne1.yahoo.com)  128.665ms  131.716ms  193.848ms 
     15   98.138.97.157 (usw2-1-lbd.ne1.yahoo.com)  182.217ms  133.661ms  170.828ms 
     16   74.6.231.21 (media-router-fp74.prod.media.vip.ne1.yahoo.com)  161.739ms  139.523ms  205.784ms


    traceroute yahoo.com                    
    traceroute: Warning: yahoo.com has multiple addresses; using 54.161.105.65
    traceroute to yahoo.com (54.161.105.65), 64 hops max, 52 byte packets
    [redacted]
    6  ewr-sa1-i.ewr.us.net.dtag.de (62.154.5.250)  90.669 ms  89.555 ms  90.020 ms
    7  62.157.248.123 (62.157.248.123)  91.494 ms  90.158 ms  89.145 ms
    8  150.222.87.251 (150.222.87.251)  89.575 ms
        150.222.87.249 (150.222.87.249)  92.532 ms
        150.222.87.247 (150.222.87.247)  93.773 ms
    9  15.230.204.54 (15.230.204.54)  90.257 ms
        52.93.59.77 (52.93.59.77)  92.142 ms
        15.230.204.56 (15.230.204.56)  89.808 ms
    10  * * *
    11  * * *
    12  * * *
    13  * * *
    14  * * *
    15  * * *
    16  52.93.28.194 (52.93.28.194)  97.715 ms
        52.93.28.170 (52.93.28.170)  94.380 ms
        52.93.28.168 (52.93.28.168)  110.934 ms
    17  * * *
    18  * * *
    19  * * *
    20  * * *
    21  * * *
    22  * * *
    23  * * *
    24  * * *
    25  * * *
    26  * * *
    27  * * *
    28  * * *
    29  * * *
    30  * * *
    31  * * *
    32  * * *
    33  * * *
    34  * * *
    35  * * *
    36  * * *
    37  * * *
    38  * * *
    39  * * *
    40  * * *
    41  * * *
    42  * * *
    43  * * *
    44  * * *
    45  * * *
and still going... ¯\_(ツ)_/¯


Ah, another variant of "because I barely understand something, it's too complicated for anyone to use safely" argument. Which would be an OK argument if you were the smartest person on the planet. Or in charge of making sure nobody else ever does something wrong on the internet.

Quite negative article, makes what are good points in isolation like the fact that if L2 shenanigans are happening L3 routing tools can't properly characterize them. Ooookay, that's true, but what if you don't care and want to characterize what L3 routing is doing?

But it might not work! Oookay, that's true. Then we don't use it in this case. And so on.

Sounds like the problem is the author's experience was more with poor quality network "engineers" than tools that "aren't real" enough.

I suspect that I should not enjoy a Thanksgiving meal with this man. He may be more the bad uncle than he thinks he is.


"Traceroute isn't real" sounds exactly like "bumblebees cannot fly".

Just because it isn't 100% reliable doesn't mean it's useless. It's great when diagnosing if trouble lies inside your network or somewhere outside.


decades ago while i was at work one day i had problems reaching the university server where i still had my email. for some reason i used traceroute to diagnose the problem. (i suppose it told me that there was an issue connecting to the university network or something in between, but the server itself was probably still running)

so i ran traceroute in a loop at i don't know what interval watching for the server to come back. eventually i forgot about it. and since it was in a screen terminal, it kept running.

days (or weeks?) later my boss received an email from the university about some strange packets periodically arriving at the university from our office. my boss asked me about it and then i remembered my traceroute loop, switched to the screen terminal and turned it off.

i am/was impressed that they even noticed. and given how traceroute works as a hack, it also makes sense that they could not recognize that the packets were from a traceroute.


Traceroute, ping, mtr, telnet, nc, wireshark/tcpdump, iperf, arping, dig, nslookup… the list goes on of incredibly useful tools for diagnosing network and core infra issues. No one tool does everything, and no one expects them to.


ab, openssl s_client, ssh -vvv, nmap, netstat, route... what else is missing?


ss, lsof, ifconfig, ip, tshark, mitmproxy, socat?



This article reminds me of being in Discordian and Slack FidoNet subs back in my BBS days.

That is to say very unhinged and entertaining.



