[VulnWatch] [PAPER] Juggling with packets: floating data storage

From: Wojciech Purczynski (cliph@private)
Date: Mon Oct 06 2003 - 01:59:54 PDT

  • Next message: Frog Man: "[VulnWatch] myPHPCalendar : Informations Disclosure, File Include"

      The following paper explores the possibilities of using certain
      properties of the Internet or any other large network to create 
      a reliable, volatile distributed data storage of a large capacity.
    
    ==============================================
     Juggling with packets: floating data storage
    ==============================================
    
      "Your dungeon is built on an incline. Angry monsters can't play
      marbles!"
    
      Wojciech Purczynski <cliph@private>
      Michal Zalewski <lcamtuf@private>
    
    
    1) Juggle with oranges!
    ------------------------
    
      Most of us, the authors of this paper, have attempted to juggle with
      three or more apples, oranges, or other fragile ballistic objects.
      The effect is usually rather pathetic, but most adept juggler padawans
      sooner or later learn to do it without inflicting excessive collateral
      damage.
    
      A particularly bright juggler trainee may notice that, as long as he
      continues to follow a simple procedure, at least one of the objects is
      in the air at all times, and that he has to hold at most two objects
      in his hands at once. Yet, each and every apple goes thru his hands
      every once in a while, and it is possible for him to recover it at
      will.
    
      He may then decide the entire process is extremely boring and go back
      to his computer. And while checking his mail, an educated juggler may
      notice that a typical network service has but one duty: to accept and
      process data coming from a remote system, and take whatever steps it
      deems appropriate based on its interpretation of the data. Many of
      those services do their best to behave robustly and be fault-tolerant
      - and, why not, to supply useful feedback about the transaction.
    
      In some cases, the mere fact a service is attempting to process the
      data and reply conforming to the protocol can be used in the ways
      authors never dreamed of. One of more spectacular examples, which our
      fellow juggler may be familiar with, is a research done at the
      University of Notre Dame, titled "Parasitic Computing" and published
      in letters to "Nature". The researchers have attempted to use TCP/IP
      checksum generation algorithm to perform boolean operations necessary
      to solve non-polynomial problems.
    
      Nevertheless, such attempts were quite impractical in the real world,
      concludes our hero; the cost of preparing and delivering a trivia to
      be solved far exceeds any eventual gain - the sender has to perform
      operations of comparable computational complexity to even deliver the
      request. The computing power of such a device is puny! - we hear him
      saying.
    
      A real juggler would focus on a different kind of outsourced data
      processing, one that is much closer to his domain of expertise. Why,
      distributed parasitic data storage, of course. - What if I write
      a single letter on every orange, and then start juggling? I can then
      store more orange-bytes than my physical capacity (the number of
      oranges I can hold in my hands) is! How brilliant... But, but, would
      it work without oranges?
    
    
    2) The same, without oranges
    -----------------------------
    
      The basic premise for this paper is the observation that for all
      network communications, there is a non-zero (and often considerable)
      delay between the act of sending an information and receiving a reply.
      The effect should be contributed to the physical constrains of the
      medium, and to the data processing times in all computer equipment.
    
      A packet storing a piece of data, just like an orange with a message
      written on it, once pushed travels for a period of time before coming
      back to the source - and for this period of time, we can safely 
      forget its message without losing data.
    
      As such, the Internet has a non-zero momentary data storage capacity.
      It is possible to push out a piece of information and effectively have
      it stored until echoed back. By establishing a mechanism for cyclic
      transmission and reception of chunks of data to and from a number of
      remote hosts, it is possible to maintain an arbitrary amount of data
      constantly `on the wire', thus establishing a high-capacity volatile
      medium.
    
      The medium can be used for memory-expensive operations, as a regular
      storage, or for handling certain types of sensitive data that are
      expected not to leave a physical trail on a hard disk or other
      non-volatile media.
    
      Since it is not considered a bad programming practice to send back as
      much relevant information as the sender had sent to the service, and
      many services or stacks maintain a high level of verbosity, based on
      our juggling experience, we conclude it is not only possible, but also
      feasible to establish this kind of storage, even over a low-end
      network hook-up.
    
      As opposed to traditional methods of parasitic data storage (P2P
      abuse, open FTP servers, binary Usenet postings), the use of this
      medium may or may not leave a trail of data (depending on a solution
      used) and specifically does not put any single system under
      a noticable load. The likehood of this technique being detected,
      considered an abuse, and the data being removed, is much lower.
    
    
    3) Class A data storage: memory buffers
    ----------------------------------------
    
      Class A data storage uses the capacity inherent to communication
      delays during the transmission and processing of live data. The
      information remains cached in the memory of a remote machine, and is
      not likely to be swapped out to a disk device.
    
      Examples rely on sending a message that is known to result in
      partial or full echo of the original request, and include:
    
        - SYN+ACK, RST+ACK responses to SYN packets and other bounces,
    
        - ICMP echo replies,
    
        - DNS lookup responses and cache data. It is possible to store some
          information in a lookup request and have it bounced back with
          a NXDomain reply; or to store data in NS cache,
    
        - Cross-server chat network message relaying. Relaying text messages
          across IRC servers and so on can exhibit considerable latency,
    
        - HTTP, FTP, web proxy or SMTP error or status replies,
    
      Most important properties of class A storage are:
    
        - Low latency (miliseconds to minutes), making it more useful
          for near random access memory applications,
    
        - Lower per-system capacity (usually kilobytes), making it less
          suitable for massive storage,
    
        - Only one chance to receive, or few retransmits, making it
          less reliable in case of a network failure,
    
        - Data not likely to be stored on a non-volatile medium or swapped
          out, increasing privacy and deniability.
     
      In particular, when using higher level protocols, additional features
      appear which may solve some of above disadvantages. It is possible to
      establish a connection to a service such as SMTP, FTP, HTTP or any
      other text-based service, and send a command that is known to result
      in an acknowledgment or error message being echoed along with part of
      the original data. We do not, however, send full formatted message,
      leaving some necessary characters unsent. In most cases end-of-line
      characters are required in order the command to be completed. In this
      state, our data is already stored on remote service waiting for
      a complete command or until connection timeout occurs.
    
      To prevent timeouts, either on TCP or application level, no-op packets
      need to be sent periodically. \0 character interpreted as an empty
      string has no effect on many services but is sufficient to reset TCP
      and service timeout timers. A prominent example of an application
      vulnerable to this attack is Microsoft Exchange.
    
      The attacker can sustain the connection for an arbitrary amount of
      time, with a piece of data already stored at the other end. To recover
      the information, the command must be completed with missing \r\n and
      then response is send to the client.
    
      A good example is SMTP VRFY command:
    
      220 inet-imc-01.redmond.corp.microsoft.com Microsoft.com ESMTP Server
      Thu, 2 Oct 2003 15:13:22 -0700
      VRFY AAAA...
      252 2.1.5 Cannot VRFY user, but will take message for 
      <AAAA...@microsoft.com>
    
      It is possible to store just over 300 bytes, including non-printable
      characters, this way - and have it available almost instantly.
    
      More data may be stored if HTTP TRACE method is used with data passed
      in arbitrary HTTP headers, depending on the server software.
    
      Sustained connections may give us arbitrarily high latency, thus
      making large storage capacity.
    
      This type of storage is naturally more suited for privacy-critical
      applications or low-latency lower to medium capacity storage
      (immediate RAM-extending storage for information that should leave no
      visible traces).
    
      The storage is not suitable for critical data that should be preserved
      at all costs, due to the risk of data being lost on network failure.
    
    
    4) Class B data storage: disk queues
    -------------------------------------
    
      Class B data storage uses "idle" data queues that are used to store
      information for an extended period of time (often on the disk).
      Particularly on MTA systems mail messages are queued for up to 7 days
      (or more, depending on the configuration). This feature may give us
      a long delay between sending data to store on remote host and 
      receiving it back.
      
      As a typical SMTP server prevents to relay mail from the client to
      himself, mail bounces may be used to get data returned after long
      period of time.
    
      An example attack scenario:
    
      1. The user builds a list of SMTP servers (perhaps ones that provide
         a reasonable expectation of being beyond the reach of his foes),
    
      2. The user blocks (with block/drop, not reject) all incoming
         connections to his port 25.
    
      3. For each server, the attacker has to confirm its delivery timeouts
         and the IP from which the server connects back while trying to
         return a bounce. This is done by sending an appropriate probe to
         an address local to the server (or requesting a DSN notification
         for a valid address) and checking how long the server tries to
         connect back before giving up. The server does not have to be an 
         open relay.
    
      4. After confirming targets, the attacker starts sending data at
         a pace chosen so that the process is spread evenly over the
         period of one week. The data should be divided so that there
         is one chunk per each server. Every chunk is sent to a separate
         server to immediately generate a bounce back to the sender.
    
      5. The process of maintaining the data boils down to accepting
         an incoming connection and receiving the return at most a week
         from the initial submission, just before the entry is about
         to be removed from the queue. This is done by allowing this
         particular server to go thru the firewall. Immediately after
         receiving a chunk, it is relayed back.
    
      6. To access any portion of data, the attacker has to look up which
         MTA is holding this specific block, then allow this IP to connect
         and deliver the bounce. There are three possible scenarios:
    
         - If the remote MTA supports ETRN command, the delivery can be
           induced immediately,
    
         - If the remote MTA was in the middle of a three-minute run in
           attempt to connect to a local system (keeps retrying thanks to
           the fact its SYN packets are dropped, not rejected with RST+ACK),
           the connection can be established in a matter of seconds,
    
         - Otherwise, it is necessary to wait between 5 minutes and 
           an hour, depending on queue settings.
    
      This scheme can be enhanced using DNS names instead of IPs for users
      on dynamic IP or to provide an additional protection (or when it is
      necessary to cut the chain immediately).
    
      Important properties of class B storage:
    
        - High per-system capacity (megabytes), making it a perfect solution
          for storing large files and so on,
    
        - Higher access latency (minutes to hours), likening it to a tape
          device, not RAM (with exceptions of SMTP hosts that accept ETRN
          command to immediately re-attempt delivery),
    
        - Very long lifetime, increasing per-user capacity and reliability,
    
        - Plenty of delivery attempts, making it easy to recover the data
          even after temporary network or hardware problems,
    
        - Likely to leave a trace on the storage devices, making it a less
          useful solution for fully deniable storage (although it would
          still require examining a number of foreign systems, which does
          not have to be feasible).
    
      Class B storage is suitable for storing regular file archives,
      large append-only buffers, encrypted resources (with a proper
      selection of hosts, it remains practically deniable), etc.
    
    
    5) Discreet class A storage
    ----------------------------
    
      In certain situations, it might be necessary to advise a solution for
      discreet data storage that is not stored on the machine itself, and
      makes it possible to deny the presence of this information on any
      other system. 
      
      The basic requirement is that the data is:
    
        - not being delivered back until a special key sequence
          is sent,
    
        - permanently discarded without leaving any record on any
          non-volatile storage media in absence of keep-alive
          requests
    
      It is possible to use class A storage to implement this functionality
      using the sustained command method discussed above. The proper TCP
      sequence number is necessary to release the data, and until this
      sequence is delivered, the data is not sent back or disclosed to any
      party. If the client node goes offline, the data is discarded and
      likely overwritten.
    
      The sequence number is thus the key to the stored information,
      and, granted the lifetime of the data is fairly short when
      keep-alive \0s stop coming, it is often an adequate protection.
    
    
    6) User-accessible capacity
    ----------------------------
    
      In this section, we attempt to estimate the capacity available to
      a single user.
    
      To maintain a constant amount of data "outsourced" to the network, it
      is required to receive and send it back on a regular basis. 
      
      The time data may be stored remotely is defined by maximum lifetime
      Tmax of a single packet (including packet queuing and processing
      delays). The maximum amount of data that may be sent is limited by
      maximum available network bandwith (L).
    
      Thus, the maximum capacity can be defined as:
    
        Cmax [bytes] = L [bytes/second] * Tmax [seconds] / Psize * Dsize
    
      where:
    
        Dsize - size of a packet required to store an initial portion of
                data on remote host
    	    
        Psize - size of a packet required to sustain the information
                stored on remote host
    
      Psize and Dsize are equal and thus can be omitted whenever the 
      entire chunk of data is bounced back and forth, and differ only 
      for "sustained command" scenarios. The smallest TCP/IP packet to
      accomplish this would have 41 bytes. The maximum amount of data 
      that can be sustained using HTTP headers is about 4096 bytes.
    
      That all, in turn, gives us the following chart:
    
                Bandwith  | Class A | Class B
               -----------+---------+---------
                28.8 kbps |  105 MB |    2 GB
                 256 kbps |  936 MB |   18 GB
                   2 Mbps |  7.3 GB |  147 GB
                 100 Mbps |  365 GB |    7 TB
    
    
    7) Internet as a whole
    -----------------------
    
      In this section, we attempt to estimate the theoretical momentary
      capacity of the Internet as a whole.
    
      Class A:
    
        To estimate theoretical class A storage capacity of the Internet,
        we assumed that:
    
          - ICMP messages are the best compromise between storage
            capacity and preserving remote system's resources,
    
          - An average operating system has a packet input queue capable
            of holding at least 64 packets,
    
          - The default path MTU is around 1500 (the most common MTU).
    
        We have taken the estimated number of hosts on the Internet from ISC
        survey for 2003, which listed a count of 171,638,297 systems with
        reverse DNS entries. Not all IPs with reverse DNS have to be
        operational. To take this into account, we have used ICMP echo
        response ratio calculated from the last survey that performed such
        a test (1999). The data suggested that approximately 20% of visible
        systems were alive. This sets the number of systems ready to respond
        ICMP requests at roughly 34,000,000.
    
        By multiplying the number of systems that reply to ICMP echo request
        by the average packet cache size and maximum packet size (minus
        headers), we estimate the total theoretical momentary capability for
        class A ICMP storage to be at approximately 3 TB.
    
      Class B:
    
        To estimate theoretical class B storage capacity, we use the example
        of MTA software. There is no upper cap for the amount of data we
        feed to a single host. While it is safe to assume only messages
        under approximately one megabyte are not going to cause noticable
        system load and other undesirable effects, we assume the average
        maximum queue size to be at 500 MB.
    
        Own research suggests that roughly 15% of systems that respond to
        ping requests have port 25 open. We thus estimate the population of
        SMTP servers to be 3% (15% of 20%) of the total host count, that is
        just over 5,000,000 hosts.
    
        This gives a total storage space capacity of 2500 TB.
    
    
    8) Legal notice
    ----------------
    
      The authors reserve right to a method and system for cheap data
      storage for developing countries.
    
      A do-it-yourself kit (long wire, speaker + microphone, shovel and
      a driver disk) will provide an affordable, portable and reusable way 
      for extending storage memory on portable systems.
    
      It is estiamted that, after digging a 100 ft well, it is possible to
      achieve over six kilobits of extra RAM storage at 20 kHz.
    
      We are currently looking for distributors.
    
    
    9) Credits
    -----------
    
      The authors would like to thank Lance Spitzner for a brief review
      of this paper and some useful suggestions and comments.
    
    
    10) Availability
    
      You may always find this paper under following locations:
    
        http://isec.pl/papers/juggling_with_packets.txt
        http://lcamtuf.coredump.cx/juggling_with_packets.txt
    
    
    -- 
    Wojciech Purczynski
    iSEC Security Research
    http://isec.pl/
    



    This archive was generated by hypermail 2b30 : Mon Oct 06 2003 - 09:31:09 PDT