Network Working Group                                            R. Even
Internet-Draft                                                  H. Zheng
Intended status: Informational                                    Huawei
Expires: March 16, 2018                                          L. Geng
                                                             ChinaMobile
                                                                R. Huang
                                                                  Huawei
                                                      September 12, 2017


   Passive Measurements in Network for troubleshooting Video Delivery
                                Problems
           draft-even-quic-troubleshooting-video-delivery-00

Abstract

   This document provides a detailed description of the passive
   measurements that operators are using to troubleshoot network
   problems when delivering streaming video and multimedia services.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 16, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Even, et al.             Expires March 16, 2018                 [Page 1]

Internet-Draft      passive-measurements-in-network       September 2017

Table of Contents

   1.  Introduction
   2.  Passive Measurements for troubleshooting Video Delivery Problems
     2.1.  Passive Measurements for TCP
       2.1.1.  RTT Measurements
       2.1.2.  Loss Measurements
     2.2.  Video Delivery Problems Troubleshooting
       2.2.1.  Locating WIFI Problems in Home Network
       2.2.2.  Locating Network Devices Problems
       2.2.3.  Locating Server Side Problems
   3.  Conclusion
   4.  Security Considerations
   5.  IANA Considerations
   6.  Informative References
   Authors' Addresses

1.  Introduction

   Privacy protection has been a growing concern in the IETF.  [RFC7258]
   states that Pervasive Monitoring (PM) is a technical attack that
   should be mitigated where possible through the design of protocols.
   Emerging protocols such as QUIC [I-D.ietf-quic-transport] answer this
   call.  QUIC is a new transport protocol designed to be secure.  Once
   a connection is set up, the packets exchanged by QUIC are largely
   encrypted during transmission; only a minimal amount of information
   in the protocol header is exposed.  The encryption protects QUIC
   packets from being tampered with by network middleboxes.

   Driven by privacy concerns, the Internet has been accelerating the
   shift from plaintext traffic towards encrypted traffic
   [I-D.mm-wg-effect-encrypt].  Google started to offer end-to-end
   encryption for Gmail in 2010 and for searches in 2013.  YouTube
   traffic has been carried via HTTPS (or QUIC) since 2014.  In
   addition, the Snowden revelations [RFC7624] appear to have caused a
   surge in encrypted traffic.
   However, [RFC7258] also states that making networks unmanageable in
   order to mitigate PM is not an acceptable outcome.  The prevalence of
   encryption prevents operators from obtaining some of the traffic
   information they rely on to estimate application service quality.
   For example, operators need ways to locate faults and perform
   diagnosis when users report a degradation in quality of service.

   For traditional transport protocols such as TCP, passive measurements
   are easy to perform, because TCP exposes protocol state information
   in its header, even when HTTPS or TCPinc is used.  Passive
   measurements of TCP traffic enable operators to manage and diagnose
   TCP traffic in many ways, depending on the particular needs of a
   specific application.

   This draft describes how operators use passive measurements derived
   from the TCP header for video applications, for example, to monitor
   video service quality degradation and to troubleshoot and locate
   problematic devices in the network.  This information can serve as a
   reference for future transport protocols on how passive measurements
   can be useful.

2.  Passive Measurements for troubleshooting Video Delivery Problems

   Below is a blueprint of video streaming across networks:

                   ------------
                 /   +------+   \
                /    |Video |    \
       Server  |     |Server|     |
       Network |     +--v---+     |
                \       |        /
                 \      |       /
               +--------V-------+
               | Server Gateway |
               +--------V-------+
                 /      |       \
                /       |        \
               |  +-----v------+  |
               |  |Core Router |  |
      Operator |  +-----v------+  |
      Network  |        |         |
               | +------V------+  |
               | |Access Router|  |
               | +------V------+  |
                \       |        /
                 \      |       /
                 +------V-------+
                 | Home Gateway |
                 +------V-------+
                 /      |       \
                /       |        \
       Home    |     +--v---+     |
       Network |     |Video |     |
                \    |Player|    /
                 \   +------+   /
                   ------------

               Figure 1: Video Streaming across Networks

   Video streaming relies on the network to transport its data.  When a
   user experiences a degradation in the quality of video service, the
   user may complain and report the degradation to the network operator.
   Such a report usually contains little information that helps the
   operator identify the problem.  This is because video streaming is an
   end-to-end service: it is not easy to tell which part of the delivery
   chain has gone wrong, given that the chain can span the networks of
   multiple providers.  In Figure 1, the delivery of a video stream
   crosses three networks: the server network, the operator network, and
   the end user's home network.  In this case, troubleshooting the
   degradation is difficult because it involves devices in the server
   network and the home network, both of which are outside the
   operator's control.

   To deal with such issues, operators need a way to monitor and measure
   video streaming performance at various nodes along the delivery path.
   Operators may deploy probes on network nodes (e.g., the home gateway,
   access router, or core router) to measure and report information at
   the flow level.  With such information at hand, it should be easier
   for operators to diagnose service degradation.
   Such information also allows operators to detect and act on service
   degradation proactively.

   In summary, passive measurements in the network help operators detect
   application problems at large.  Without them, operators may have to
   resort to traditional methods: performing tests in the network and
   analyzing the results.  This can be time consuming and may miss the
   conditions under which the problem occurred, since not all problems
   are reproducible by tests.

2.1.  Passive Measurements for TCP

   Section 3 of [I-D.stephan-quic-interdomain-troubleshooting] also
   mentions these measurements; this section gives a more detailed
   description.

   As indicated in Figure 2, TCP passive measurements require setting up
   a measurement point on the path of a TCP connection.  The measurement
   point virtually splits the path into two halves: the half closer to
   the server is called "upstream", and the half closer to the client is
   called "downstream".  The following sections describe methods for
   measuring Round Trip Time (RTT) and loss in both the upstream and the
   downstream.  "Inbound" and "Outbound" denote stream direction:
   "Inbound" denotes a stream toward the server, whereas "Outbound"
   denotes a stream toward the client.

                         Measurement Point
   +--------+                    |                    +--------+
   | Server |<-------------------|------------------->| Client |
   +--------+      Upstream      |     Downstream     +--------+

                  Figure 2: Passive Measurement Point

   One caveat of passive measurement is that it has no way to know the
   processing time at the endpoints.  For example, if the server or the
   client adds some delay before sending a packet, that delay cannot be
   removed from the RTT calculated at the Measurement Point.

2.1.1.  RTT Measurements

   TCP connection setup is a three-way handshake.  Usually, the client
   initiates the connection to the server.  The "SYN -> SYN-ACK -> ACK"
   exchange can be used to determine the initial upstream and downstream
   RTT:

   o  Initial Upstream RTT: the time difference between the SYN and the
      SYN-ACK.

   o  Initial Downstream RTT: the time difference between the SYN-ACK
      and the ACK.

   After the connection setup phase, the initial RTT should be updated
   by sequence number matching.  TCP's built-in reliability mechanism
   requires every segment to be acknowledged.  By matching sequence
   numbers against acknowledgement numbers, the Measurement Point can
   pair a segment with its corresponding acknowledgement.  The time
   difference between the segment and its acknowledgement is a strong
   candidate for the RTT.  However, this method is subject to
   measurement error in the following situations:

   o  Delayed ACK.  For good reasons, a TCP endpoint may delay sending an
      acknowledgement for a short while.  According to [RFC1122], the
      measurement error contributed by delayed ACKs can be up to 500
      milliseconds.

   o  Packet Loss.  A segment that has passed the Measurement Point can
      still be lost on the way to its destination, and an
      acknowledgement can be lost before arriving at the Measurement
      Point.  In such cases, a segment cannot be matched to its
      corresponding acknowledgement, but only to a later one, which
      contributes to measurement error.

   Note that sequence number matching requires data flowing in both
   directions to measure both downstream and upstream RTT.  A
   unidirectional stream from server to client yields the downstream
   RTT; a unidirectional stream from client to server is required for
   the upstream RTT.

   An alternative method of measuring RTT, described in Section 4 of
   [RFC7323], uses the TCP Timestamps option.  This method results in
   less measurement error than sequence number matching.

2.1.2.  Loss Measurements

   TCP uses a sliding window at both endpoints to coordinate data
   transmission.
   The sending endpoint uses its send window to control how much data it
   can send; the receiving endpoint uses its receive window to buffer
   incoming data and to report window size updates.  The sliding window
   mechanism exchanges information through fields and options of the TCP
   header, so it is visible to the network.  Such information can be
   obtained at the Measurement Point, and the following loss
   measurements can be performed:

   o  Downstream Loss Rate Measurement.

   o  Upstream Loss Rate Measurement.

   The Downstream Loss Rate can be measured by monitoring outbound
   streams at the Measurement Point.  From the sequence numbers exposed
   in the TCP header, two values can be calculated: the total amount of
   original application data and the total amount of retransmitted data.
   The total amount of application data is the number of bytes the
   application wants to send.  The total amount of retransmitted data is
   the number of bytes that arrive at the Measurement Point again after
   having already been seen there.  The Downstream Loss Rate is
   calculated as:

      Downstream Loss Rate = Total Amount of Retransmitted Data /
                             Total Amount of Application Data

   When monitoring outbound streams, the Upstream Loss Rate can only be
   estimated, because it is difficult to count the data lost in the
   upstream before it arrives at the Measurement Point.  Data loss in
   the upstream causes the Measurement Point to see "holes" in the
   received sequence numbers, and the amount of data represented by
   these holes can be used as an estimate of upstream data loss.
   However, to make the estimate more practical, two issues need to be
   considered:

   A)  Out-of-order packets can also cause holes, so the measurement
       should account for out-of-order arrival.

   B)  If a segment lost in the upstream has a newer sequence number
       than any segment recorded at the Measurement Point, no later
       segment reveals the hole, so such loss cannot be detected at the
       Measurement Point.
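   The handshake RTT and sequence number matching of Section 2.1.1 and
   the loss accounting above can be combined into one per-connection
   monitor.  The following is a minimal sketch, not a reference
   implementation: the class and method names (PassiveTcpMonitor,
   on_syn, on_outbound_data, and so on) are hypothetical, it ignores
   sequence number wraparound and SACK, and it assumes that
   retransmissions reuse the original segment boundaries and that holes
   are filled in order.  Two choices are assumptions not made by this
   document: issue A is handled by treating a hole that is filled within
   an assumed 50 ms reorder window as reordering rather than loss, and
   RTT samples from retransmitted segments are skipped in the style of
   Karn's algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PassiveTcpMonitor:
    """Hypothetical per-connection monitor at a Measurement Point.

    Observes the handshake, outbound (server-to-client) data segments and
    inbound (client-to-server) ACKs. Assumes no sequence number wraparound,
    no SACK, retransmissions that reuse segment boundaries, and holes that
    are filled in order.
    """
    reorder_window: float = 0.05   # assumed: hole filled within 50 ms = reordering
    upstream_rtt: Optional[float] = None
    downstream_rtts: List[float] = field(default_factory=list)
    original_bytes: int = 0        # total amount of application data
    retrans_bytes: int = 0         # bytes seen again at the Measurement Point
    upstream_hole_bytes: int = 0   # estimate of data lost before this point
    _syn_t: Optional[float] = field(default=None, init=False)
    _synack_t: Optional[float] = field(default=None, init=False)
    _high: Optional[int] = field(default=None, init=False)  # highest seq end seen
    _seen: Set[int] = field(default_factory=set, init=False)
    # expected ACK number -> (time seen, segment was a retransmission)
    _pending: Dict[int, Tuple[float, bool]] = field(default_factory=dict, init=False)
    # hole start -> (hole end, time the hole was detected)
    _holes: Dict[int, Tuple[int, float]] = field(default_factory=dict, init=False)

    # Initial RTT from the three-way handshake.
    def on_syn(self, t: float) -> None:
        self._syn_t = t

    def on_synack(self, t: float) -> None:
        self._synack_t = t
        if self._syn_t is not None:
            self.upstream_rtt = t - self._syn_t               # SYN -> SYN-ACK

    def on_handshake_ack(self, t: float) -> None:
        if self._synack_t is not None:
            self.downstream_rtts.append(t - self._synack_t)   # SYN-ACK -> ACK

    # Outbound data: RTT bookkeeping plus loss accounting.
    def on_outbound_data(self, t: float, seq: int, length: int) -> None:
        end = seq + length
        if seq in self._seen:
            # Already seen here: retransmitted after a downstream loss.
            self.retrans_bytes += length
            self._pending[end] = (t, True)  # ambiguous RTT sample, skip later
            return
        self._seen.add(seq)
        self.original_bytes += length
        if self._high is not None and seq > self._high:
            # Gap in sequence space: data missing from the upstream.
            self._holes[self._high] = (seq, t)
            self.upstream_hole_bytes += seq - self._high
        elif seq in self._holes:
            hole_end, detected = self._holes.pop(seq)
            if end < hole_end:
                self._holes[end] = (hole_end, detected)  # rest still missing
            if t - detected <= self.reorder_window:
                # Arrived shortly after the gap: reordering, not loss.
                self.upstream_hole_bytes -= min(end, hole_end) - seq
        self._high = end if self._high is None else max(self._high, end)
        self._pending[end] = (t, False)

    # Inbound ACKs: sequence number matching for downstream RTT.
    def on_inbound_ack(self, t: float, ack: int) -> None:
        match = self._pending.get(ack)
        if match is not None and not match[1]:
            self.downstream_rtts.append(t - match[0])
        for key in [k for k in self._pending if k <= ack]:
            del self._pending[key]  # cumulative ACK covers these segments

    def downstream_loss_rate(self) -> float:
        if self.original_bytes == 0:
            return 0.0
        return self.retrans_bytes / self.original_bytes
```

   For example, feeding the monitor a handshake, one acknowledged
   segment, a gap that is filled too late to count as reordering, and
   one retransmission of already-seen data yields one hole of 100 bytes
   and a Downstream Loss Rate of 100/400.  When a cumulative ACK covers
   several segments, the sketch takes its sample only from the segment
   whose end matches the ACK number, which keeps delayed-ACK error from
   inflating the RTT of earlier segments.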
   The situation is reversed when monitoring inbound streams instead of
   outbound streams: the Upstream Loss Rate can then be measured more
   precisely, and the Downstream Loss Rate can only be estimated.

2.2.  Video Delivery Problems Troubleshooting

   This section describes how TCP passive measurements are used to
   troubleshoot video delivery problems.  As depicted in Figure 1, three
   network segments are concerned: the home network, the operator's
   network, and the server network.  The following subsections address
   problems in each of these network segments.

2.2.1.  Locating WIFI Problems in Home Network

   WIFI is commonly used in home networks to share Internet access
   wirelessly, which gives people mobility when accessing the Internet
   at home.  However, this comes at a cost: wireless access performance
   is often worse than wired access, since wireless signals suffer more
   from varying environmental conditions.  Wireless access inherently
   incurs more packet loss and often results in large delay, so the
   performance of network applications is often degraded in wireless
   networks.

   When network application performance degrades, WIFI is often blamed.
   Operators would like to know how much WIFI has contributed to the
   degradation, and passive measurement methods are needed to help
   visualize the problem.

   One method is to profile the RTT in the home network.  High RTT
   values may be seen for home networks that use WIFI.  One important
   reason is that WIFI retransmits lost frames at its Medium Access
   Control (MAC) layer to alleviate the high loss induced by poor
   wireless conditions.  Because of this trade-off at the MAC layer,
   WIFI traffic often shows high delay and a relatively low packet loss
   rate.  This trait makes traffic over WIFI distinguishable from
   traffic over other carriers.
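   The high-delay, low-loss trait can be turned into a simple per-home
   classifier.  The sketch below is illustrative only: the thresholds
   (HIGH_RTT_S, HIGH_LOSS_RATE), the names HomeNetworkSample, classify,
   and wifi_rtt_profile, and the use of the median RTT are hypothetical
   choices, not values or methods given in this document.  Following the
   observation that congestion raises both loss and delay, samples that
   show both are labeled "congested" and left out of the RTT profile.

```python
from dataclasses import dataclass
from statistics import median
from typing import List

# Hypothetical thresholds, not taken from this document; real values would
# need calibration per access network.
HIGH_RTT_S = 0.050      # median downstream RTT treated as "high"
HIGH_LOSS_RATE = 0.02   # downstream loss rate treated as "high"


@dataclass
class HomeNetworkSample:
    downstream_rtts: List[float]  # RTT samples (s) between Measurement Point and client
    loss_rate: float              # e.g. retransmitted bytes / application bytes


def classify(sample: HomeNetworkSample) -> str:
    """Label a home network's traffic using the high-delay/low-loss trait."""
    if not sample.downstream_rtts:
        return "unknown"
    high_rtt = median(sample.downstream_rtts) >= HIGH_RTT_S
    high_loss = sample.loss_rate >= HIGH_LOSS_RATE
    if high_rtt and high_loss:
        return "congested"  # delay and loss both up: exclude from the profile
    if high_rtt:
        return "wifi-like"  # high delay, low loss: MAC-layer retransmissions
    return "normal"


def wifi_rtt_profile(samples: List[HomeNetworkSample]) -> List[float]:
    """Collect RTT samples for profiling, dropping congestion-affected ones."""
    return [rtt for s in samples if classify(s) != "congested"
            for rtt in s.downstream_rtts]
```

   Using the median rather than the mean keeps a few delayed-ACK
   outliers from pushing an otherwise wired home into the "wifi-like"
   class; a deployment might instead compare against a per-home
   baseline.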
   To profile the RTT in the home network, the Measurement Point should
   be placed at the home gateway if the gateway is controlled by the
   operator.  Otherwise, the Measurement Point has to be deployed one
   level above the home gateway in the access network, usually at the
   next-hop IP address from the home gateway.  In this case, the
   Measurement Point is distant from the home network it measures, and
   congestion on the link between the Measurement Point and the home
   network can affect the result.  The Measurement Point must not take
   into account RTTs affected by congestion on that link.  When
   congestion occurs, both loss and delay increase, which distinguishes
   it from ordinary WIFI traffic, which has high delay but low loss.

   Passive measurement of TCP traffic is crucial to the RTT profiling
   method introduced above, since TCP carries the majority of traffic on
   the Internet.  TCP traffic is therefore a viable source from which to
   collect downstream RTT.

2.2.2.  Locating Network Devices Problems

   Sometimes application performance degradation is caused by problems
   in the network.  A single faulty or misconfigured node may cause
   unusual packet loss or unnecessary delay.  When this happens, it is
   often difficult for operators to locate that node because of the
   complex architecture of the network: operators have to find out
   whether the problem lies at the access level, the aggregation level,
   or even in the core.

   To help locate the problem, it is useful to identify which network
   segment causes it.  Passive measurement can serve as a vital means of
   demarcating problems between network segments.  Probes can be
   deployed on suitable nodes along the whole network path, as indicated
   in Figure 3.

          +--------------------+
          | Measurement Center |
          +--------------------+
                   \               \
                    \ Probe         \ Probe
                     \               \
   +-------+         ++              ++         +-------+
   |Network| Access  || Aggregation  || Core    |Network|
   |Ingress|---------++--------------++---------|Egress |
   |Node   | Network || Network      || Network |Node   |
   +-------+         ++              ++         +-------+

               Figure 3: Probes Deployed on Network Path

   The probes deployed on the network path measure TCP traffic passively
   and report the collected downstream/upstream RTT and loss information
   to the Measurement Center.  The Measurement Center can then build a
   normal baseline of the characteristics of each network segment.  If a
   network node becomes faulty or misconfigured, its behavior will
   deviate from the baseline and can thus be detected by the Measurement
   Center.  This greatly aids operators in troubleshooting problems
   caused by the network.

2.2.3.  Locating Server Side Problems

   For an end-to-end application such as video streaming, performance
   degradation may be caused by problems in the upstream part of the
   service chain, located on the server side and owned by server network
   providers.  In this case, the operator can use passive measurement
   results as evidence for the server network provider, improving that
   provider's understanding of how the network is performing outside the
   server gateway.  Using this information, server network providers can
   focus on the potential problem area rather than looking for the cause
   outside their own network.

3.  Conclusion

   The information exposed by the TCP header enables network operators
   to perform passive measurements such as RTT and packet loss.  This
   information is useful for network operators when troubleshooting.
   This document presents several use cases of passive measurement.
   A conclusion that can be drawn from these use cases is that passive
   measurement is a viable means of diagnosing application performance
   degradation, especially for demarcating problems between network
   segments.

   It is recommended that future transport protocols support passive
   measurement of RTT and packet loss.  New transport protocols may use
   mechanisms different from those of TCP, but it is required that the
   information needed for passive measurements be exposed to the
   network.  For more information and discussion on solutions, see also
   [I-D.stephan-quic-interdomain-troubleshooting] and
   [I-D.ietf-quic-manageability].

4.  Security Considerations

   T.B.D.

5.  IANA Considerations

   This document has no requests for IANA.

6.  Informative References

   [I-D.ietf-quic-manageability]
              Kuehlewind, M., Trammell, B., and D. Druta, "Manageability
              of the QUIC Transport Protocol",
              draft-ietf-quic-manageability-00 (work in progress), July
              2017.

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-05 (work
              in progress), August 2017.

   [I-D.mm-wg-effect-encrypt]
              Moriarty, K. and A. Morton, "Effect of Pervasive
              Encryption on Operators", draft-mm-wg-effect-encrypt-12
              (work in progress), June 2017.

   [I-D.stephan-quic-interdomain-troubleshooting]
              Emile, S., Cayla, M., Braud, A., and F. Fieau, "QUIC
              Interdomain Troubleshooting",
              draft-stephan-quic-interdomain-troubleshooting-00 (work in
              progress), July 2017.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014.
   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014.

   [RFC7624]  Barnes, R., Schneier, B., Jennings, C., Hardie, T.,
              Trammell, B., Huitema, C., and D. Borkmann,
              "Confidentiality in the Face of Pervasive Surveillance: A
              Threat Model and Problem Statement", RFC 7624,
              DOI 10.17487/RFC7624, August 2015.

Authors' Addresses

   Roni Even
   Huawei

   Email: roni.even@huawei.com


   Hui Zheng (Marvin)
   Huawei

   Email: marvin.zhenghui@huawei.com


   Liang Geng
   ChinaMobile

   Email: gengliang@chinamobile.com


   Rachel Huang
   Huawei

   Email: rachel.huang@huawei.com