On IRC bootstrapping

16 messages on BitcoinTalk, March 15, 2010 to July 7, 2010
Participants: soultcer, Satoshi Nakamoto, theymos, The Madhatter, Cdecker, Laszlo Hanyecz, lachesis, blanu, DataWraith, Vasiliev, Martti Malmi

soultcer March 15, 2010

A week or so ago I met a very nice Freenode staffer in the #bitcoin and #bitcoin-dev channels. He told me that the #bitcoin channel turned up on Freenode’s radar because it looks like a botnet command-and-control channel, but after I explained to him how Bitcoin works and why it needs IRC, he said that the channel at its current size is not a problem.

However, this got me thinking, and later this week I also discussed the topic on IRC. I came to the conclusion that IRC is the wrong method for bootstrapping, especially in its current form. At the moment, each client will connect to IRC and stay connected. Using /who and join messages, the client will connect to the found IPs on port 8333 as a bootstrapping method. However, the clients internally also talk to each other and broadcast new nodes via the Bitcoin protocol. Still, they are always online in IRC. This has various disadvantages:

- IRC connectivity is necessary for bootstrapping (firewalls often block it and Freenode blocks TOR)
- There is a single point of failure (Freenode)
- We are leeching Freenode’s services instead of using our own infrastructure. Many servers actually disallow bot connections in their MOTDs.
- Minor point: The additional protocol inside Bitcoin brings extra complexity

There is already a list of permanently-on Bitcoin IPs around in this forum, which is a nice idea, but not very scalable, thus I propose the following solution:

Gnutella and MUTE face very similar bootstrapping problems. To solve them, they rely on a list of “Gnutella Webcaches”. Those webcaches are run by volunteers on simple PHP servers and a master list of them is distributed with each Gnutella/MUTE release. When a client wants to join the network, it asks one or two of the webcaches via HTTP for a list of other nodes and also gets added to that list (which is usually a list of the last X clients seen). Every few hours (or days) a running client reconnects to the webcache to tell it that it is still alive and does not have to be deleted from the list.

I suggest that the same thing be implemented for Bitcoin. Volunteers could run those webcaches on cheap PHP webspace and tell their URL to Satoshi or Sirius, who in turn could add the list to each release. This would allow users running behind a restrictive firewall or TOR to use Bitcoin without manually finding other nodes, and it is a much more scalable approach. (As a bonus we could remove those HTTP calls to whatismyip.com or similar sites.)
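The bookkeeping a webcache like this needs could be sketched roughly as follows. This is only an illustration in Python rather than the PHP mentioned above; the class name `WebCache`, the `announce` method, and the default limits (50 peers, six hours) are assumptions made for the example and do not come from any Gnutella or Bitcoin code.

```python
import time
from collections import OrderedDict


class WebCache:
    """Keeps the most recently seen peers, as a webcache of this kind might.

    Hypothetical sketch: names and defaults are illustrative assumptions.
    """

    def __init__(self, max_peers=50, ttl_seconds=6 * 3600):
        self.max_peers = max_peers      # the "last X clients seen"
        self.ttl_seconds = ttl_seconds  # peers must check in again within this window
        self._peers = OrderedDict()     # "ip:port" -> last-seen timestamp

    def _expire(self, now):
        # Drop peers that have not re-announced themselves in time.
        stale = [p for p, seen in self._peers.items() if now - seen > self.ttl_seconds]
        for p in stale:
            del self._peers[p]

    def announce(self, peer, now=None):
        """Record (or refresh) a peer and return the other known peers."""
        now = time.time() if now is None else now
        self._expire(now)
        others = [p for p in self._peers if p != peer]
        self._peers.pop(peer, None)
        self._peers[peer] = now              # re-insert as the most recent entry
        while len(self._peers) > self.max_peers:
            self._peers.popitem(last=False)  # evict the oldest entry
        return others
```

A client would call `announce` once when joining, to get peers, and again every few hours to stay on the list. A real cache would sit behind an HTTP endpoint and would need some protection against bogus announcements, which this sketch leaves out.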

Of course, there might be a better idea for bootstrapping Bitcoin and I would love to hear it. Or maybe suggestions for the Webcache idea. Please post them here!

Cheers, soultcer

theymos March 16, 2010

To eliminate segmentation, every peer should have a list of every other peer. Tor also has this requirement, so we should copy them: have several trusted directory servers periodically create a signed list of all peers and publish it via HTTP. All BitCoin clients have the option to act as a directory mirror, which will be indicated in the dirServers’ list. Generators need to ask to be added to the list (which could also include info like the maximum number of connections that peer will accept), but people just wishing to make transactions can just get the list from a dirMirror and connect to a few random peers.

If this is too centralized, we can do what I2P did and allow anyone to become a directory server. You need to be able to detect when a dirServer goes rogue in this case, though.

The Madhatter March 16, 2010

I vote for the I2P method, myself. It works great.

Satoshi Nakamoto March 16, 2010

Thanks soultcer for talking with the Freenode staffer. Good to know it’s OK at the current size, and now they know who we are. They’re supportive of projects like TOR so I hope they would probably be friendly to us. We don’t want to overstay our welcome. If we get too big, then by the same token, we’re big enough that we don’t need IRC anymore and we’ll get off.

We only needed IRC because nobody had a static IP. In the early days there were some steady supporters, but they all had pool-allocated IPs that change every few days. IRC was only intended as a temporary solution. Bitcoin’s built-in addr system is the main solution.

Bitcoin can get the list of IPs from any bitcoin node. In that sense, every node serves as a directory server.

When there are enough static IP nodes to have a good chance that at least one will still be running by the time the current version goes out of use, we can preprogram a seed list.

How do you think we should compile the seed list? Would it be OK to create it from the currently connected IPs that have been static for a while?

BTW, if we want to supplement by deploying separate directory server software, may I suggest IRC? IRC is a good directory server (I’ve heard it has other uses too), and there are mature IRC server implementations available that anyone can run. Bitcoin’s IRC client implementation is already thoroughly tested.

Cdecker June 8, 2010

Anyone into a set of multi entity owned DNS servers that cooperate for a Fast Flux [1] network to ensure bootstrap availability?

[1] http://en.wikipedia.org/wiki/Fast_flux

Laszlo Hanyecz (laszlo) June 8, 2010

What’s wrong with IRC? It’s just another method that’s used to exchange the peer list. You can just prevent it from connecting and use -addnode=1.2.3.4 to connect to a known node for bootstrapping if you want.

If nodes disconnect from IRC after they get their list then it makes it less useful for bootstrapping since it will be empty except for the node that’s trying to bootstrap at the time.

IRC has been around forever and it’s well documented (and easy to understand for newcomers) - why create something more complicated?

lachesis June 8, 2010

I like the idea of distributed host caches like Gnutella uses. At the moment, for the majority of people, IRC is a single point of failure. Let’s assume that for some reason our Freenode channel was gone. Maybe Freenode got fed up and shut us down. Maybe MenInBlack saw our system, laughed maniacally, and then pressured Freenode to shut us down.

When you start your client, it will do nothing. You could drop to a command line and type “-addnode” (or is it -peer? whatever) to connect to a known node, but at that point you’d somehow need to know a node. It probably wouldn’t be that hard for one of us, but what about a new user? We could keep a list of peers on the website for them to use, but at that point, they’ve gone from “just double click the shiny gold coin and get trading” to “check our website for updated peer lists, open command prompt, navigate to the bitcoin directory, and type the proper peer…” And if MIB were after us, the website would probably be long since gone.

Of course, we could implement addpeer in a more user-friendly manner. Perhaps a popup that says “I can’t connect to the network. Enter a peer: ” with instructions on some ways to find one, but at that point we’re creating a social solution to a technical problem.

Also, if we get bigger we will need to move away from IRC anyway (as implied by the OP’s conversation with a Freenode staffer). And what about Tor users? Why should people who want to use Tor to be anonymous have to manually add a peer?

Finally, anyone on Freenode can easily get a list of all running Bitcoin clients, when they came online, when they went offline, etc. That goes against the project’s stated goal of anonymity. Of course, with a host cache system, anybody who connected to that cache could be logged by the server operator, but no one operator would have a full picture of the network.

I think the IRC solution is a wonderful beginning, and I applaud how stable it has proven to be. It was a great decision to get the network up easily and concentrate on the more interesting and important considerations in the program. I just think that Bitcoin will outgrow it someday, if it hasn’t already.

Satoshi Nakamoto June 14, 2010

Bitcoin has its own distributed address directory using the “addr” message. It’s about time we coded in a list of the current long running static nodes to seed from. I can add code so new nodes do not preferentially stay connected to the seed nodes, just connect and get the list, so it won’t be a burden on them.

What do you think, should I go ahead with adding the seeds?

It’ll still try IRC first. The IRC has the advantage that it lists nodes that are currently online, since they have to stay connected to stay on the list, but the disadvantage that it’s a single point of failure. The “addr” system has no single point of failure, but can only tell you what nodes have recently been seen, so it takes a little longer to get connected since some of the nodes you try have gone offline. The combination of the two gets us the best of both worlds and more total robustness.

Is there anyone who wants to volunteer to run an IRC server in case freenode gets tired of us?

Laszlo Hanyecz (laszlo) June 14, 2010

I run an IRC server you can use, it’s fairly stable but it’s not on redundant connections or anything. It is only two servers right now but we don’t mess with it or anything, it just runs.

My box is a dedicated irc server: 2:28PM up 838 days, 20:54, 1 user, load averages: 0.06, 0.08, 0.08

You can use irc.lfnet.org to connect.

I hang out on #linuxos if anyone wants to drop in.

blanu June 18, 2010

This is a common problem in P2P, known as Original Introduction, although bootstrapping is also a good word for it. The problem with bootstrapping is that you can’t decentralize it. Whether it’s IRC or HTTP or DNS, the client needs to be hardcoded with an address or list of addresses which is sufficiently fresh that at least one of the listed addresses is still active. After the first node is reached, you are no longer in Original Introduction mode and can use the full range of techniques for decentralization, such as gossip. Unless, of course, you get disconnected from the network and all of your known peers go away, in which case you’re back to bootstrapping.

There are two properties that are at odds when you choose a bootstrapping method: robustness (scalability/reliability) and freshness. Robustness is increased at the expense of freshness by caching on multiple servers, as is usually done with HTTP peer lists. Freshness is maximized (at least up to the TCP timeout) at the expense of robustness by having everyone connected, as with IRC. Of course, the key is finding the right mix of robustness and freshness because you need both for the bootstrap to be successful.

Here are some of my current favorite methods for bootstrapping:

Append list of fresh peers to executable or installer dynamically on download. People usually get the application from its official website, so the website is already a point of failure for new users. You’re already hardcoding an address in the application, the address that the application will use to bootstrap. So instead just add fresh peers at the moment of download. You need some fancy code in the executable to read the list off the end, but I’ve implemented this in an NSIS installer and it’s not that hard. Most software developers are upset by the idea of this method.
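One way the "read the list off the end" step could work is sketched below in Python. The trailer layout (payload, then a 4-byte little-endian length, then a marker) and the names `MAGIC`, `append_peers`, and `read_peers` are assumptions made up for this illustration; this is not the NSIS implementation mentioned above.

```python
import struct

MAGIC = b"PEERLIST"  # hypothetical marker identifying an appended peer list


def append_peers(binary, peers):
    """Return the download bytes with a peer list appended as a trailer."""
    payload = "\n".join(peers).encode("ascii")
    # Layout: original bytes | payload | 4-byte length | marker
    return binary + payload + struct.pack("<I", len(payload)) + MAGIC


def read_peers(blob):
    """Read the appended peer list back, or return [] if none is present."""
    footer = len(MAGIC) + 4
    if len(blob) < footer or not blob.endswith(MAGIC):
        return []
    (length,) = struct.unpack("<I", blob[-footer:-len(MAGIC)])
    if length > len(blob) - footer:
        return []  # corrupt length field
    end = len(blob) - footer
    payload = blob[end - length:end]
    return payload.decode("ascii").split("\n") if payload else []
```

Because the trailer is read from the end, the web server can append it at download time without touching the executable itself. Any code signature over the file would have to tolerate or exclude the trailer, which is one practical objection to the method.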

Connect via XMPP to Google App Engine application. This gives the freshness of IRC, but with more robust scaling. App Engine is mostly for writing web apps, but it provides email and XMPP handling as well. It would be simple to write one application that could handle peer lists via either XMPP or HTTP with the same handler code. I’m currently using this in an application and it works well and is very reliable. I only wish there was a second App Engine to use as a fallback because it does have occasional downtime.

An alternative to requiring all nodes to include the complexity of a protocol like IRC or XMPP is to have a few special sentinel nodes which sit on the network and collect addresses of connected nodes via the usual decentralized methods available to an active node. These sentinel nodes periodically upload fresh addresses, say via HTTP POST to a number of websites. A new node can then download a fresh address list from any of the websites which is currently functioning and reachable. If you have 5 sentinels each uploading every 5 minutes (staggered), then you’ll have updates roughly once a minute. This is on par with IRC in terms of freshness and is robust as you care to make it by varying the number of HTTP mirrors and the number of sentinels.
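The freshness figure in the sentinel scheme is simple arithmetic: 5 sentinels, each uploading every 5 minutes with evenly staggered offsets, give one upload per minute. A small Python sketch makes that concrete; the function name `upload_times` and its parameters are hypothetical.

```python
def upload_times(sentinels=5, period_min=5, horizon_min=10):
    """Minutes at which any sentinel uploads, with evenly staggered offsets."""
    offset = period_min / sentinels            # gap between consecutive sentinels
    rounds = int(horizon_min // period_min)    # full upload rounds in the horizon
    return sorted(
        s * offset + k * period_min
        for s in range(sentinels)
        for k in range(rounds)
    )


times = upload_times()
gaps = {b - a for a, b in zip(times, times[1:])}
print(times)  # uploads at minutes 0.0, 1.0, ..., 9.0
print(gaps)   # a single gap of 1.0 minute
```

In general the gap between updates is `period_min / sentinels`, so adding sentinels improves freshness while adding HTTP mirrors improves robustness.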

DataWraith June 18, 2010

I think the way eMule handles bootstrapping for its KAD-network is pretty close to optimal:

The list of known peers is stored in a file (nodes.dat), and every client maintains a list of known nodes in that file (sorted by longest uptime, I think — that’s an intrinsic property of Kademlia, but still a good idea). The released client should be accompanied by such a file that contains the addresses of a few reliable peers on static IP addresses, from which a new client can then get more addresses to connect to (and hence store in its own file).

If the “seed list” gets out of date, or the server is shut down or something, you can just ask anyone in the network to publish his nodes-file (on rapidshare, say), and voila, you’ve got a fresh list of IPs you can connect to.

Satoshi Nakamoto June 18, 2010

The SVN version now uses IRC first and if that fails it falls back to a hardcoded list of seed nodes. There are enough seed nodes now that many of them should still be up by the time of the next release. It only briefly connects to a seed node to get the address list and then disconnects, so your connections drop back to zero for a while. At that point, be patient. It’s only slow to get connected the first time.
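The fallback order described here can be sketched as follows. This is a Python illustration only, not the actual client code; `bootstrap`, `irc_lookup`, and `fetch_addrs` are hypothetical names, and the network calls are passed in as functions so the control flow stays visible.

```python
def bootstrap(irc_lookup, seed_nodes, fetch_addrs):
    """Return (source, peers): try IRC first, then the hardcoded seed list.

    irc_lookup() and fetch_addrs(seed) are caller-supplied functions that
    return a list of peer addresses or raise OSError on network failure.
    """
    try:
        peers = irc_lookup()
        if peers:
            return "irc", peers
    except OSError:
        pass  # IRC blocked or unreachable; fall through to the seeds

    for seed in seed_nodes:
        try:
            # Connect briefly, ask for the address list, then disconnect.
            peers = fetch_addrs(seed)
        except OSError:
            continue  # this seed is down; try the next one
        if peers:
            return "seed", peers
    return "none", []
```

Since each seed is only contacted long enough to fetch addresses, a client that falls back this way starts with zero connections and has to dial the returned peers itself, which matches the slow first connection described above.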

This means TOR users won’t need to -addnode anymore, it’ll get connected automatically.

Satoshi Nakamoto June 25, 2010

Quote from: laszlo on June 14, 2010, 06:30:58 PM

I run an IRC server you can use, it’s fairly stable but it’s not on redundant connections or anything. It is only two servers right now but we don’t mess with it or anything, it just runs.

My box is a dedicated irc server: 2:28PM up 838 days, 20:54, 1 user, load averages: 0.06, 0.08, 0.08

You can use irc.lfnet.org to connect.

This seems like a good idea.

What does everyone think, should we make the switch for 0.3?

Vasiliev June 25, 2010

You may want to leave Freenode in as a fallback server — if his server doesn’t work, use Freenode’s.

Martti Malmi (sirius) July 7, 2010

Maybe we should have an option dialog that allows you to choose the IRC server and channel you connect to?

Satoshi Nakamoto July 7, 2010

Everybody needs to connect to the same IRC server and channel so they can find each other.

Quote from: Vasiliev on June 25, 2010, 11:50:15 PM

You may want to leave Freenode in as a fallback server — if his server doesn’t work, use Freenode’s.

It might not be good if we suddenly rushed freenode with a ton of users all at once.

The fallback is our own seed system.

irc.lfnet.org is pretty old and has impressive uptime. I think it’s going to be fine.

We could take IRC out at some point if we want, but I’d rather ease into it and just test our own seed system as a backup for now, and I really like the complementary redundant attributes of the two different systems.