Elsevier

Computer Networks

Volume 53, Issue 13, 28 August 2009, Pages 2340-2359

Brahms: Byzantine resilient random membership sampling

https://doi.org/10.1016/j.comnet.2009.03.008

Abstract

We present Brahms, an algorithm for sampling random nodes in a large dynamic system prone to malicious behavior. Brahms stores small membership views at each node, and yet overcomes Byzantine attacks by a linear portion of the system. Brahms is composed of two components. The first is an attack-resilient gossip-based membership protocol. The second component extracts independent uniformly random node samples from the stream of node ids gossiped by the first. We evaluate Brahms using rigorous analysis, backed by simulations, which show that our theoretical model captures the protocol’s essentials. We study two representative attacks, and show that with high probability, an attacker cannot create a partition between correct nodes. We further prove that each node’s sample converges to an independent uniform one over time. To our knowledge, no such properties were proven for gossip protocols in the past.

Introduction

We consider the problem of sampling random nodes (sometimes called peers) in a large dynamic system subject to adversarial (Byzantine) attacks. Random node sampling is important for many scalable dynamic applications, including neighbor selection in constructing and maintaining overlay networks [23], [32], [35], [37], selection of communication partners in gossip-based protocols [13], [18], [21], data sampling, and choosing locations for data caching, e.g., in unstructured peer-to-peer networks [34].
Typically, in such applications, each node maintains a set of random node ids that is asymptotically smaller than the system size. This set is called the node’s local view. We consider a dynamic system, subject to churn, whereby the set of active nodes changes over time. Local views in such a system must continuously evolve to incorporate new active nodes and to remove ones that are no longer active. By using small local views, the maintenance overhead is kept small. In the absence of malicious behavior, small local views can be effectively maintained with gossip-based membership protocols [1], [21], [22], [26], [43], which were proven to have a low probability of partitions, including under churn [1].
Nevertheless, adversarial attacks present a major challenge for small local views. Previous Byzantine-tolerant gossip protocols either considered static settings where the full membership is known to all [19], [33], [39], or maintained (almost) full local views [9], [28] (i.e., views that include all the nodes in the system), where faulty nodes cannot push correct ones out of the view (see Section 2 for a more detailed discussion of previous work). In contrast, small local views are susceptible to poisoning with entries (node ids) originating from faulty nodes; this is because in a dynamic system, nodes must inherently accept new ids and store them in place of old ones in their local views. In Section 3, we illustrate that traditional gossip-based membership is highly vulnerable to adversarial attacks, which can quickly poison the entire views of correct nodes.
It is even more challenging to provide independent uniform samples in such a setting. Even without Byzantine failures, gossip-based membership only ensures that eventually the average representation of nodes in local views is uniform [1], [22], [26], and not that every node obtains an independent uniform random sample. Faulty nodes may attempt to skew the system-wide distribution, as well as the individual local view of a given node.
This paper addresses these challenges. In Section 4, we present Brahms, a membership service that stores a sub-linear number of ids (e.g., Θ(∛n) in a system of size n) at each node, and provides each node with independent random node samples that converge to uniform ones over time. The main ideas behind Brahms are (1) to use gossip-based membership with some extra defenses to make it viable (in the sense that local views are not solely composed of faulty ids) in an adversarial setting; (2) to recognize that such a solution is susceptible to attacks that may bias the views, i.e., cause certain nodes to be over-represented in views while others are under-represented (we precisely quantify the extent of this bias mathematically); and (3) to correct this bias at each node. Specifically, each node maintains, in addition to the gossip-based local view, an unbiased sample list of nodes.
To achieve the latter, we introduce Sampler, a component that obtains uniform samples out of a data stream in which elements recur with an unknown bias. Sampler uses min-wise independent permutations [14], and stores one element of the stream at a time. In Brahms, the data stream consists of gossiped ids, from which Samplers obtain independent uniformly random id samples, and store them in the sample list. By using such history samples from the sample list to update part of the local view, Brahms achieves self-healing from partitions that may occur with gossip-based membership. In particular, nodes that have been active for sufficiently long (we quantify how long) cannot be isolated from the rest of the system, with high probability. The use of history samples is an example of amplification, whereby even a small healthy sample of the past can boost the resilience of a constantly evolving view. We note that only a small portion of the view is updated with history samples, e.g., 10%. Therefore, the protocol can still deal effectively with churn.
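The Sampler idea described above can be illustrated with a minimal sketch. It assumes that a keyed BLAKE2b hash is an acceptable stand-in for a random min-wise independent permutation; the class name, the method names `observe` and `sample`, and the string node ids are illustrative choices, not taken from the paper.

```python
import hashlib
import os


class Sampler:
    """Stores one stream element: the one with the smallest hashed value so far."""

    def __init__(self, key=None):
        # Assumption: a keyed hash approximates a randomly chosen
        # min-wise independent permutation; the key selects the permutation.
        self._key = key if key is not None else os.urandom(16)
        self._current = None       # the single element kept by this sampler
        self._current_rank = None  # its value under the keyed hash

    def _rank(self, node_id):
        digest = hashlib.blake2b(
            node_id.encode("utf-8"), key=self._key, digest_size=8
        ).digest()
        return int.from_bytes(digest, "big")

    def observe(self, node_id):
        """Feed the next id from the gossip stream."""
        rank = self._rank(node_id)
        if self._current is None or rank < self._current_rank:
            self._current, self._current_rank = node_id, rank

    def sample(self):
        """Return the currently stored id, or None if the stream was empty."""
        return self._current


# Repetitions and ordering in the stream do not change the outcome.
biased = Sampler(key=b"demo-key")
for node_id in ["a", "b", "a", "a", "c", "a"]:
    biased.observe(node_id)

plain = Sampler(key=b"demo-key")
for node_id in ["c", "b", "a"]:
    plain.observe(node_id)

print(biased.sample() == plain.sample())
```

Because the stored id depends only on which ids have appeared, and not on how often, an id that is pushed many times gains no advantage over an id seen once. This is the property that lets the sample list correct the bias of the gossiped stream.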
In Section 5, we define the attacker’s goals and the corresponding attack strategies, under which we evaluate Brahms. We consider two possible goals for an attacker. First, we study attacks that attempt to maximize the representation of faulty ids in local views at any given time. This goal is achieved by a uniform attack, whereby the attacker equally divides its power among all correct nodes. Second, we consider an attacker that aims to partition the network. The easiest way to do so is by isolating one node from the rest [1]. Since samples help prevent isolation, we analyze the most adverse circumstances, where an attack is launched on a new node that joins the system when its samples are still empty, and when it does not yet appear in views or samples of other nodes. We further assume that such a targeted attack on the new node occurs in tandem with an attack on the entire system, as described above.
One of the important contributions of this paper is our mathematical analysis, which provides insights into the extent of the damage that an attacker can cause and the effectiveness of various mechanisms for dealing with it. Extensive simulations of Brahms with up to 4000 nodes validate the few simplifying assumptions made in the analysis. We first show (in Section 6) that whenever the set of nodes remains connected, the sample lists converge to independent uniformly random selections from among all nodes. We further show that if views are of size Ω(∛n), then the convergence rate is bounded independently of the system size. Section 7 then analyzes the local views generated by the gossip process and shows that under certain circumstances, they preserve the connectivity required for uniform samples.
Specifically, for the attack goal of maximizing the representation of faulty ids (Section 7.1), we show that under certain conditions on the adversary, even without using history samples, the portion of faulty ids in local views generated by Brahms’s gossip process is bounded by a constant smaller than one. (Recall that the over-representation of faulty ids is later fixed by Sampler; the upper bound on faulty ids in local views ensures Sampler has good ids to work with.)
Next, we consider the goal of isolating a node (Section 7.2). The key to proving that Brahms prevents, with high probability, an attacked node’s isolation is in comparing how long it takes for two competing processes to complete: on the one hand, we provide a lower bound on the expected time to poison the entire view of the attacked node, assuming there are no history samples at all. On the other hand, we provide an upper bound on how fast history samples are expected to converge, under the same attack. Whenever the former exceeds the latter, the attacked node is expected to become immune to isolation before it is isolated. We prove that with appropriate parameter settings, this is indeed the case.
Finally, we simulate the complete system (Section 8), and measure Brahms’s resilience to the combination of both attacks. Our results show that, indeed, Brahms prevents the isolation of attacked nodes, its views never partition, and the membership samples converge to perfectly random ones over time.

Section snippets

Related work

We are not familiar with any previous work explicitly dealing with random node sampling in a Byzantine setting. We next review previous work on Byzantine membership (Section 2.1), node sampling and sampling from data streams in benign settings (Section 2.2), and on the related problem of Byzantine-resilient overlay construction (Section 2.3).

Model, goal, and challenges

We describe the system model, outline our design goal, and illustrate the challenges in achieving it.

The Brahms protocol

Brahms has two components. The local sampling component maintains a sample list S – a tuple of uniform samples from the set of ids that traversed the node (Section 4.1). The gossip component is a distributed protocol that spreads ids across the network (Section 4.2), and maintains a dynamic view V. We denote the size of V by ℓ1 and the size of S by ℓ2. Each node has some initial V (e.g., received from some bootstrap server or peer node). V and S may contain duplicates, and some entries in S may
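How the view V and the sample list S might interact can be sketched as follows. The excerpt does not give the exact update rule, so this is only an illustration under stated assumptions: a fraction `gamma` of the view slots is refilled from history samples (the text mentions about 10%), the remaining slots are refilled from ids received through gossip, and empty sample entries are represented by `None`. The function name `rebuild_view` and all parameter names are hypothetical.

```python
import random


def rebuild_view(gossiped_ids, sample_list, view_size, gamma=0.1, rng=random):
    """Build a new view: about gamma of the slots from history samples, the rest from gossip.

    Assumption: ids are drawn with replacement, so the view may contain
    duplicates, as the text allows for V.
    """
    history_slots = round(gamma * view_size)
    gossip_slots = view_size - history_slots

    # Skip sampler entries that have not yet observed any id.
    history = [s for s in sample_list if s is not None]

    view = []
    if gossiped_ids:
        view += [rng.choice(gossiped_ids) for _ in range(gossip_slots)]
    if history:
        view += [rng.choice(history) for _ in range(history_slots)]
    return view


new_view = rebuild_view(
    gossiped_ids=["g1", "g2", "g3"],
    sample_list=["h1", None, "h2"],
    view_size=20,
    gamma=0.1,
    rng=random.Random(0),
)
print(len(new_view), sum(node_id.startswith("h") for node_id in new_view))
```

With a view of 20 slots and `gamma=0.1`, two slots come from history samples and eighteen from gossip. Keeping the history share small leaves most of the view free to track churn.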

Analysis structure

In this section, we first present the definitions and the assumptions used in the analysis of our protocol, and then discuss the attack models and analysis structure.

Analysis – sampling

In this section we analyze the properties of a sample S_u of a correct node u. Let s=S_u[i] be a sampler element for some correct u and some i. Recall that s employs a permutation s.h, chosen independently at random. Let s(t) denote the output of s at time t. We define the perfect id corresponding to s, s*, to be the id with the minimal value of s.h among all n ids (we neglect collisions for the sake of the definition). Note that s* can be either a correct or a faulty id. In Section 6.1 we show
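The definition of the perfect id can be made concrete with a small sketch. It assumes that a keyed BLAKE2b hash stands in for the random permutation s.h, and the helper names `rank`, `perfect_id`, and `sampler_output` are hypothetical. The sketch illustrates that the output s(t) equals the perfect id as soon as that id has appeared in the observed stream, however biased the stream is.

```python
import hashlib


def rank(key, node_id):
    # Assumption: a keyed hash plays the role of the permutation s.h.
    digest = hashlib.blake2b(node_id.encode("utf-8"), key=key, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def perfect_id(key, all_ids):
    """The id with the minimal rank among all ids in the system."""
    return min(all_ids, key=lambda node_id: rank(key, node_id))


def sampler_output(key, stream):
    """Output after observing `stream`: the minimal-rank id seen so far, or None."""
    return min(stream, key=lambda node_id: rank(key, node_id), default=None)


ids = [f"node{i}" for i in range(8)]
key = b"example-key"
target = perfect_id(key, ids)

# node0 is heavily over-represented, yet the output is still the perfect id.
biased_stream = ["node0"] * 50 + ids
print(sampler_output(key, biased_stream) == target)

# Before the perfect id has been observed, the output is some other id.
without_target = [node_id for node_id in ids if node_id != target]
print(sampler_output(key, without_target) == target)
```

Convergence of a sampler therefore reduces to the question of how long it takes for its perfect id to reach the node through gossip.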

Analysis – overlay connectivity

We now prove that Brahms, with appropriate parameter settings, maintains overlay connectivity despite the attacks defined in Section 5, satisfying the prerequisite for Theorem 6.1.
We study two possible adversary targets. The first target, addressed in Section 7.1, is increasing the global representation of faulty ids. We prove that in any single round, a balanced attack, which spreads faulty pushes evenly among correct nodes, maximizes the expected system-wide fraction of faulty ids at the end

Putting it all together

In previous sections we analyzed each of Brahms’s mechanisms separately. We now simulate the entire system. Fig. 10 depicts the degree of node u in N(t) under a targeted attack. Node u remains connected to the overlay, thanks to history samples (γ=0.1). The actual degree of u in N(t) is higher than the lower bound shown in Section 7.2, due to the pessimistic assumptions made in the analysis (no history samples, no imperfect correct ids, etc.).
We now demonstrate the convergence of S in the

Conclusions

We presented Brahms, a Byzantine-resilient membership sampling algorithm. Brahms stores small views, and yet resists the failure of a linear portion of the nodes. It ensures that every node’s sample converges to a uniform one, which was not achieved before by gossip-based membership even in benign settings. We presented extensive analysis and simulations explaining the impact of various attacks on the membership, as well as the effectiveness of the different mechanisms Brahms employs.

Acknowledgements

We thank Christian Cachin and Udi Wieder for their valuable suggestions that helped to improve our paper. We are grateful to Roie Melamed and Igor Yanover for stimulating discussions of a random walk overlay-based solution.

References (44)

  • A.Z. Broder et al., Min-wise independent permutations, J. Comput. Syst. Sci. (2000)
  • A. Allavena, A. Demers, J.E. Hopcroft, Correctness of a gossip based membership protocol, in: ACM PODC, 2005, pp....
  • N. Alon, Y. Matias, M. Szegedy, The space complexity of approximating the frequency moments, in: Proc. of the 28th...
  • H. Attiya et al., Distributed Computing: Fundamentals, Simulations, and Advanced Topics (2004)
  • B. Awerbuch, C. Scheideler, Group spreading: a protocol for provably secure distributed name service, in: ICALP, 2004,...
  • B. Awerbuch, C. Scheideler, Robust random number generation for peer-to-peer systems, in: OPODIS, 2006, pp....
  • B. Awerbuch, C. Scheideler, Towards a scalable and robust DHT, in: SPAA, 2006, pp....
  • B. Awerbuch, C. Scheideler, Towards scalable and robust overlay networks, in: IPTPS,...
  • B. Babcock, M. Datar, R. Motwani, Sampling from a moving window over streaming data, in: Proc. of the 13th annual...
  • G. Badishi, I. Keidar, A. Sasson, Exposing and eliminating vulnerabilities to denial of service attacks in secure...
  • Z. Bar-Yossef, R. Friedman, G. Kliot, RaWMS – random walk based lightweight membership service for wireless ad hoc...
  • Z. Bar-Yossef, M. Gurevich, Random sampling from a search engine’s index, in: Proc. of 15th WWW, 2006, pp. 367–376,...
  • Z. Bar-Yossef, T.S. Jayram, R. Kumar, D. Sivakumar, L. Trevisan, Counting distinct elements in a data stream, in: Proc....
  • K.P. Birman et al., Bimodal multicast, ACM Trans. Comput. Syst. (1999)
  • M. Castro, P. Druschel, A.J. Ganesh, A.I.T. Rowstron, D.S. Wallach, Secure routing for structured peer-to-peer overlay...
  • T. Condie, V. Kacholia, S. Sankararaman, J. Hellerstein, P. Maniatis, Induced churn as shelter from routing-table...
  • M. Datar et al., Maintaining stream statistics over sliding windows, SIAM J. Comput. (2002)
  • A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, D. Terry. epidemic...
  • D. Malkhi, Y. Mansour, M. K. Reiter, On diffusing updates in a Byzantine environment, in: SRDS, 1999, pp....
  • J.R. Douceur, The Sybil attack, in: Proc. of the First International Workshop on Peer-to-Peer Systems (IPTPS), 2002,...
  • P.Th. Eugster et al., Lightweight probabilistic broadcast, ACM Trans. Comput. Syst. (TOCS) (2003)
  • A.J. Ganesh et al., Peer-to-peer membership management for gossip-based protocols, IEEE Trans. Comput. (2003)
Edward Bortnikov is a researcher at Yahoo! Research Israel. He holds the Ph.D. degree in Electrical Engineering (2008) and the M.Sc. (1998) and B.A. (1995, summa cum laude) degrees in Computer Science from the Technion, Israel Institute of Technology. His research interests broadly span networking technologies, distributed computing, and large-scale information processing. Dr. Bortnikov authored multiple papers and US patents, and received many awards for excellence in research. He has a seven-year track record in many technical leadership positions in the software industry.
Maxim Gurevich received the B.Sc. (2001, cum laude) and M.Sc. (2006) degrees from the Department of Electrical Engineering of the Technion, Israel Institute of Technology. He is currently a Ph.D. student with the Department of Electrical Engineering at the Technion. His research interests include search engine mining and distributed and P2P systems. He is a recipient of the Levi Eshkol Fellowship awarded by the Israeli Ministry of Science.
Idit Keidar is a faculty member at the Department of Electrical Engineering at Technion. She holds Ph.D., M.Sc. (summa cum laude), and B.Sc. (summa cum laude) degrees from the Hebrew University of Jerusalem. She was a postdoctoral research associate at MIT’s Laboratory for Computer Science, where she held post-doctoral fellowships from Rothschild Yad-Hanadiv and NSF CISE. Her research interests include distributed computing, fault tolerance, and concurrency.
Gabriel Kliot currently works at Microsoft Research, Redmond, WA. He holds the Ph.D. degree in Computer Science (2009) and the B.A. (2003, cum laude) degree in Computer Science from the Technion, Israel Institute of Technology. Gabriel’s Ph.D. dissertation focused on Probabilistic Middleware Services for Wireless Mobile Ad-Hoc Networks. His research interests include various aspects of distributed systems and computer networks, such as distributed middleware, fault tolerance, large-scale (P2P) systems and mobile ad hoc networks. He was awarded the Wolf fund excellence award for graduate students and he is a recipient of the Israel Internet Association scholarship for Ph.D. students.
Alexander Shraer received the B.Sc. (summa cum laude) and M.Sc. (cum laude) degrees in computer science from the Technion, Haifa, Israel, in 2004 and 2006, respectively. He is currently a Ph.D. student in the Department of Electrical Engineering, Technion. His research focuses on fault tolerance in distributed systems. He is a recipient of the Levi Eshkol Fellowship, awarded by the Israeli Ministry of Science.
A shorter preliminary version of this paper appeared in the 27th ACM Symposium on Principles of Distributed Computing (PODC), August 2008.
1 The work was done when the author was with the Department of Electrical Engineering, The Technion – Israel Institute of Technology, 32000 Haifa, Israel.
2 The work was done when the author was with the Department of Computer Science, The Technion – Israel Institute of Technology, 32000 Haifa, Israel.