Log of /head/sys/kern/uipc_socket.c

Revision 368326
Modified Fri Dec 4 04:39:48 2020 UTC (3 years, 6 months ago) by kevans
File length: 113260 byte(s)
Diff to previous 367498
kern: soclose: don't sleep on SO_LINGER w/ timeout=0

This is a valid scenario that's handled in the various protocol layers where
it makes sense (e.g., tcp_disconnect and sctp_disconnect). Given that it
indicates we should immediately drop the connection, it makes little sense
to sleep on it.

This could lead to panics with INVARIANTS. On non-INVARIANTS kernels, this
could result in the thread hanging until a signal interrupts it if the
protocol does not mark the socket as disconnected for whatever reason.

Reported by:	syzbot+e625d92c1dd74e402c81@syzkaller.appspotmail.com
Reviewed by:	glebius, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27407
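For reference, the userland side of this scenario is an abortive close: SO_LINGER enabled with an l_linger of 0. The sketch below assumes a POSIX sockets API; the helper names set_abortive_close() and linger_is_abortive() are hypothetical. It only sets and reads back the option. With this change, a later close(2) on such a socket no longer sleeps in soclose().

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request an abortive close (linger on, timeout 0). */
static int set_abortive_close(int fd)
{
    struct linger l;

    l.l_onoff = 1;   /* enable lingering on close */
    l.l_linger = 0;  /* zero timeout: drop the connection, do not sleep */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
}

/* Hypothetical helper: 1 if fd is set for abortive close, 0 if not, -1 on error. */
static int linger_is_abortive(int fd)
{
    struct linger l;
    socklen_t len = sizeof(l);

    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &l, &len) != 0)
        return -1;
    return l.l_onoff != 0 && l.l_linger == 0;
}
```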


Revision 367498
Modified Mon Nov 9 00:04:35 2020 UTC (3 years, 7 months ago) by mjg
File length: 113230 byte(s)
Diff to previous 365222
kqueue: save space by using only one func pointer for assertions


Revision 365222
Modified Tue Sep 1 22:12:32 2020 UTC (3 years, 10 months ago) by mjg
File length: 113534 byte(s)
Diff to previous 364409
kern: clean up empty lines in .c and .h files


Revision 364409
Modified Wed Aug 19 23:42:33 2020 UTC (3 years, 10 months ago) by rmacklem
File length: 113536 byte(s)
Diff to previous 363680
Add the MSG_TLSAPPDATA flag to indicate "return ENXIO" for non-application TLS
data records.

The kernel RPC cannot process non-application data records when
using TLS. It must do an upcall to a userspace daemon that will
call SSL_read() to process them.

This patch adds a new flag called MSG_TLSAPPDATA that the kernel
RPC can use to tell soreceive() to return ENXIO instead of a non-application
data record, when that is what is at the top of the receive queue.
I put the code in #ifdef KERN_TLS/#endif, although it will build without
that, so that it is recognized as only useful when KERN_TLS is enabled.
The alternative to doing this is to have the kernel RPC re-queue the
non-application data message after receiving it, but that seems more
complicated and might introduce message ordering issues when there
are multiple non-application data records one after another.

I do not know what, if any, changes will be required to support TLS1.3.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D25923
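The queue-head check can be modelled in userland. This is a minimal sketch, not the soreceive() code: struct rec, receive_record() and SKETCH_TLSAPPDATA are hypothetical stand-ins, and 23 is the TLS application_data content type. As the commit describes, the non-application record stays at the head of the queue when ENXIO is returned, so record ordering is preserved for the upcall.

```c
#include <errno.h>
#include <stddef.h>

/* TLS ContentType value for application data (RFC 5246). */
#define TLS_CT_APPLICATION_DATA 23

/* Hypothetical flag standing in for MSG_TLSAPPDATA in this sketch. */
#define SKETCH_TLSAPPDATA 0x1

/* Hypothetical, simplified decrypted record on the receive queue. */
struct rec {
    unsigned char tls_type;   /* TLS ContentType of the record */
    struct rec *next;
};

/*
 * Pop the head record into *out and return 0, return EWOULDBLOCK if the
 * queue is empty, or return ENXIO if the caller wants application data
 * only and the head record is something else (alert, handshake, ...).
 * In the ENXIO case the record is left in place.
 */
static int receive_record(struct rec **queue, int flags, struct rec **out)
{
    struct rec *head = *queue;

    if (head == NULL)
        return EWOULDBLOCK;
    if ((flags & SKETCH_TLSAPPDATA) != 0 &&
        head->tls_type != TLS_CT_APPLICATION_DATA)
        return ENXIO;
    *queue = head->next;
    head->next = NULL;
    *out = head;
    return 0;
}
```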


Revision 363680
Modified Wed Jul 29 23:24:32 2020 UTC (3 years, 11 months ago) by jhb
File length: 112729 byte(s)
Diff to previous 363464
Properly handle a closed TLS socket with pending receive data.

If the remote end closes a TLS socket and the socket buffer still
contains not-yet-decrypted TLS records but no decrypted TLS records,
soreceive needs to block or fail with EWOULDBLOCK.  Previously it was
trying to return data and dereferencing a NULL pointer.

Reviewed by:	np
Sponsored by:	Chelsio
Differential Revision:	https://reviews.freebsd.org/D25838
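The decision described above can be sketched as a pure function. The enum and tls_rcv_action() are hypothetical; the inputs stand for the amount of decrypted data, the amount of still-encrypted data, and whether the peer has closed the connection.

```c
#include <stdbool.h>
#include <stddef.h>

/* Possible outcomes of the hypothetical check below. */
enum rcv_action {
    RCV_RETURN_DATA,   /* decrypted data is queued: return it */
    RCV_RETURN_EOF,    /* peer closed and nothing is left: zero-length read */
    RCV_BLOCK,         /* sleep until more data is decrypted or arrives */
    RCV_EWOULDBLOCK    /* non-blocking caller: fail with EWOULDBLOCK */
};

/*
 * decrypted:   bytes of decrypted records ready to be read
 * encrypted:   bytes still waiting for the KTLS worker to decrypt
 * cantrcvmore: the remote end has closed the connection
 */
static enum rcv_action
tls_rcv_action(size_t decrypted, size_t encrypted, bool cantrcvmore,
    bool nonblocking)
{
    if (decrypted > 0)
        return RCV_RETURN_DATA;
    /*
     * Report EOF only once nothing is pending at all. Encrypted records
     * that arrived before the close must still be decrypted and returned,
     * so the reader waits (or gets EWOULDBLOCK) instead of trying to
     * return a record that does not exist yet.
     */
    if (cantrcvmore && encrypted == 0)
        return RCV_RETURN_EOF;
    return nonblocking ? RCV_EWOULDBLOCK : RCV_BLOCK;
}
```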


Revision 363464
Modified Thu Jul 23 23:48:18 2020 UTC (3 years, 11 months ago) by jhb
File length: 112685 byte(s)
Diff to previous 362338
Add support for KTLS RX via software decryption.

Allow TLS records to be decrypted in the kernel after being received
by a NIC.  At a high level this is somewhat similar to software KTLS
for the transmit path except in reverse.  Protocols enqueue mbufs
containing encrypted TLS records (or portions of records) into the
tail of a socket buffer and the KTLS layer decrypts those records
before returning them to userland applications.  However, there is an
important difference:

- In the transmit case, the socket buffer is always a single "record"
  holding a chain of mbufs.  Not-yet-encrypted mbufs are marked not
  ready (M_NOTREADY) and released to protocols for transmit by marking
  mbufs ready once their data is encrypted.

- In the receive case, incoming (encrypted) data appended to the
  socket buffer is still a single stream of data from the protocol,
  but decrypted TLS records are stored as separate records in the
  socket buffer and read individually via recvmsg().

Initially I tried to make this work by marking incoming mbufs as
M_NOTREADY, but there didn't seem to be a non-gross way to deal with
picking a portion of the mbuf chain and turning it into a new record
in the socket buffer after decrypting the TLS record it contained
(along with prepending a control message).  Also, such mbufs would
also need to be "pinned" in some way while they are being decrypted
such that a concurrent sbcut() wouldn't free them out from under the
thread performing decryption.

As such, I settled on the following solution:

- Socket buffers now contain an additional chain of mbufs (sb_mtls,
  sb_mtlstail, and sb_tlscc) containing encrypted mbufs appended by
  the protocol layer.  These mbufs are still marked M_NOTREADY, but
  soreceive*() generally don't know about them (except that they will
  block waiting for data to be decrypted for a blocking read).

- Each time a new mbuf is appended to this TLS mbuf chain, the socket
  buffer peeks at the TLS record header at the head of the chain to
  determine the encrypted record's length.  If enough data is queued
  for the TLS record, the socket is placed on a per-CPU TLS workqueue
  (reusing the existing KTLS workqueues and worker threads).

- The worker thread loops over the TLS mbuf chain decrypting records
  until it runs out of data.  Each record is detached from the TLS
  mbuf chain while it is being decrypted to keep the mbufs "pinned".
  However, a new sb_dtlscc field tracks the character count of the
  detached record and sbcut()/sbdrop() is updated to account for the
  detached record.  After the record is decrypted, the worker thread
  first checks to see if sbcut() dropped the record.  If so, it is
  freed (this can happen when a socket is closed with pending data).
  Otherwise, the header and trailer are stripped from the original
  mbufs, a control message is created holding the decrypted TLS
  header, and the decrypted TLS record is appended to the "normal"
  socket buffer chain.

(Side note: the SBCHECK() infrastructure was very useful as I was
 able to add assertions there about the TLS chain that caught several
 bugs during development.)

Tested by:	rmacklem (various versions)
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24628
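The readiness check in the second bullet amounts to reading the 5-byte TLS record header (1-byte content type, 2-byte version, 2-byte big-endian length). A minimal sketch, assuming the queued bytes are contiguous in a flat buffer rather than spread across an mbuf chain; tls_record_ready() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A TLS record header: 1-byte type, 2-byte version, 2-byte length. */
#define TLS_HDR_LEN 5

/*
 * Given the queued bytes and their count, report whether a complete TLS
 * record is available. Once the header is readable, *reclen is set to the
 * full record size, header included.
 */
static bool tls_record_ready(const uint8_t *buf, size_t queued, size_t *reclen)
{
    size_t payload;

    if (queued < TLS_HDR_LEN)
        return false;             /* cannot even read the header yet */
    /* The length field is big-endian (network byte order). */
    payload = ((size_t)buf[3] << 8) | buf[4];
    *reclen = TLS_HDR_LEN + payload;
    return queued >= *reclen;
}
```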


Revision 362338
Modified Thu Jun 18 19:32:34 2020 UTC (4 years ago) by markj
File length: 112621 byte(s)
Diff to previous 361613
Add the SCTP_SUPPORT kernel option.

This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation


Revision 361613
Modified Fri May 29 00:09:12 2020 UTC (4 years, 1 month ago) by jhb
File length: 112590 byte(s)
Diff to previous 361567
Permit SO_NO_DDP and SO_NO_OFFLOAD to be read via getsockopt(2).

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24627


Revision 361567
Modified Wed May 27 23:20:35 2020 UTC (4 years, 1 month ago) by rmacklem
File length: 112550 byte(s)
Diff to previous 361056
Fix sosend() for the case where mbufs are passed in while doing ktls.

For kernel tls, sosend() needs to call ktls_frame() on the mbuf list
to be sent.  Without this patch, this was only done when sosend()'s
arguments used a uio_iov and not when an mbuf list is passed in.
At this time, sosend() is never called with an mbuf list argument when
kernel tls is in use, but will be once nfs-over-tls has been incorporated
into head.

Reviewed by:	gallatin, glebius
Differential Revision:	https://reviews.freebsd.org/D24674


Revision 361056
Modified Thu May 14 20:17:09 2020 UTC (4 years, 1 month ago) by kib
File length: 112404 byte(s)
Diff to previous 361037
Fix r361037.

Reorder the flag manipulations and use a barrier to ensure that program
order is respected by the compiler and CPU, for unlocked readers of so_state.

In collaboration with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24842


Revision 361037
Modified Thu May 14 17:54:08 2020 UTC (4 years, 1 month ago) by kib
File length: 112091 byte(s)
Diff to previous 360581
Fix spurious ENOTCONN from a unix domain socket whose other side is closed.

Sometimes, when doing read(2) over a unix domain socket whose other
side was closed, read(2) returns -1/ENOTCONN instead of
EOF AKA zero-size read. This is because soreceive_generic() does not
lock the socket when testing the so_state SS_ISCONNECTED|SS_ISCONNECTING
flags. It can happen that we do not yet observe the so->so_rcv.sb_state
bit SBS_CANTRCVMORE, and then also miss the SS_ flags.

Change the test to check that the socket was never connected before
returning ENOTCONN, by including all connection-related state bits.

Reported and tested by:	pho
In collaboration with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24819
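The difference between the old and new tests can be sketched with illustrative flag values (the SK_* constants below are hypothetical, not FreeBSD's SS_* values). The old test treats "not currently connected" as ENOTCONN; the new one returns ENOTCONN only if no connection-related bit is set at all, so a socket that has been disconnected falls through to the EOF path.

```c
#include <errno.h>

/* Illustrative state bits only; not the real SS_* values. */
#define SK_ISCONNECTED     0x01
#define SK_ISCONNECTING    0x02
#define SK_ISDISCONNECTING 0x04
#define SK_ISDISCONNECTED  0x08

/*
 * Old-style test: ENOTCONN unless currently connected or connecting.
 * A reader racing with a disconnect can see both bits clear and report
 * ENOTCONN instead of EOF.
 */
static int check_old(int state)
{
    return (state & (SK_ISCONNECTED | SK_ISCONNECTING)) == 0 ? ENOTCONN : 0;
}

/*
 * New-style test: ENOTCONN only if none of the connection-related bits is
 * set, i.e. the socket was never connected. 0 means "go on and read",
 * where a closed peer is reported as EOF.
 */
static int check_new(int state)
{
    return (state & (SK_ISCONNECTED | SK_ISCONNECTING |
        SK_ISDISCONNECTING | SK_ISDISCONNECTED)) == 0 ? ENOTCONN : 0;
}
```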


Revision 360581
Modified Sun May 3 00:21:11 2020 UTC (4 years, 2 months ago) by glebius
File length: 112037 byte(s)
Diff to previous 360416
Step 4.1: mechanically rename M_NOMAP to M_EXTPG

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598


Revision 360416
Modified Mon Apr 27 23:55:09 2020 UTC (4 years, 2 months ago) by rmacklem
File length: 112037 byte(s)
Diff to previous 360408
Fix sosend_generic() so that it can handle a list of ext_pgs mbufs.

Without this patch, sosend_generic() will try to use top->m_pkthdr.len,
assuming that the first mbuf has a pkthdr.
When a list of ext_pgs mbufs is passed in, the first mbuf is not a
pkthdr and cannot be post-r359919.  As such, the value of top->m_pkthdr.len
is bogus (0 for my testing).
This patch fixes sosend_generic() to handle this case, calculating the
total length via m_length() for this case.

There is currently nothing that hands a list of ext_pgs mbufs to
sosend_generic(), but the nfs-over-tls kernel RPC code in
projects/nfs-over-tls will do that and was used to test this patch.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24568
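A simplified model of the fix: use the packet-header length when the first mbuf has one, otherwise sum the per-mbuf lengths as m_length() does. struct mb and both functions are hypothetical and keep only the fields needed here.

```c
#include <stddef.h>

/* Hypothetical, stripped-down mbuf: only the fields this sketch needs. */
struct mb {
    struct mb *m_next;   /* next mbuf in the chain */
    int m_len;           /* bytes of data in this mbuf */
    int has_pkthdr;      /* nonzero if m_pkthdr_len is valid */
    int m_pkthdr_len;    /* total chain length, valid only with a pkthdr */
};

/* Sum m_len over the whole chain, in the spirit of the kernel's m_length(). */
static int chain_length(const struct mb *m)
{
    int len = 0;

    for (; m != NULL; m = m->m_next)
        len += m->m_len;
    return len;
}

/* Trust the pkthdr length only when a pkthdr exists; otherwise walk the chain. */
static int send_length(const struct mb *top)
{
    return top->has_pkthdr ? top->m_pkthdr_len : chain_length(top);
}
```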


Revision 360408
Modified Mon Apr 27 23:17:19 2020 UTC (4 years, 2 months ago) by jhb
File length: 111964 byte(s)
Diff to previous 359923
Initial support for kernel offload of TLS receive.

- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
  authentication algorithms and keys as well as the initial sequence
  number.

- When reading from a socket using KTLS receive, applications must use
  recvmsg().  Each successful call to recvmsg() will return a single
  TLS record.  A new TCP control message, TLS_GET_RECORD, will contain
  the TLS record header of the decrypted record.  The regular message
  buffer passed to recvmsg() will receive the decrypted payload.  This
  is similar to the interface used by Linux's KTLS RX except that
  Linux does not return the full TLS header in the control message.

- Add plumbing to the TOE KTLS interface to request either transmit
  or receive KTLS sessions.

- When a socket is using receive KTLS, redirect reads from
  soreceive_stream() into soreceive_generic().

- Note that this interface is currently only defined for TLS 1.1 and
  1.2, though I believe we will be able to reuse the same interface
  and structures for 1.3.
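On the application side, the TLS_GET_RECORD control message has to be located among any others returned by recvmsg(). A minimal, portable sketch using the standard CMSG_* macros; find_cmsg() is a hypothetical helper, and on a KTLS receive socket it would be called with IPPROTO_TCP and TLS_GET_RECORD. The layout of the record header inside the message is not shown.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return the first control message in msg with the given level and type,
 * or NULL if there is none.
 */
static struct cmsghdr *find_cmsg(struct msghdr *msg, int level, int type)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
        cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == level && cmsg->cmsg_type == type)
            return cmsg;
    }
    return NULL;
}
```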


Revision 359923
Modified Tue Apr 14 15:38:18 2020 UTC (4 years, 2 months ago) by jtl
File length: 111485 byte(s)
Diff to previous 359922
Make sonewconn() overflow messages have per-socket rate-limits and values.

sonewconn() emits debug-level messages when a listen socket's queue
overflows. Currently, sonewconn() tracks overflows on a global basis. It
will only log one message every 60 seconds, regardless of how many sockets
experience overflows. And, when it next logs at the end of the 60 seconds,
it records a single message referencing a single PCB with the total number
of overflows across all sockets.

This commit changes to per-socket overflow tracking. The code will now
log one message every 60 seconds per socket. And, the code will provide
per-socket queue length and overflow counts. It also provides a way to
change the period between log messages using a sysctl.

Reviewed by:	jhb (previous version), bcr (manpages)
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24316
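Per-socket rate limiting can be sketched with one small state block per listen socket. The names below are hypothetical, and the logging period is passed in as a parameter where the commit reads it from a sysctl.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-listen-socket overflow bookkeeping. */
struct overflow_state {
    time_t last_log;        /* when this socket last logged */
    unsigned int overflows; /* overflows since the last log message */
};

/*
 * Record one listen-queue overflow at time now. Returns true, and stores
 * the number of overflows to report in *count, when at least period
 * seconds have passed since this socket last logged; the counter then
 * restarts. Other sockets are unaffected.
 */
static bool overflow_should_log(struct overflow_state *st, time_t now,
    time_t period, unsigned int *count)
{
    st->overflows++;
    if (now - st->last_log < period)
        return false;
    *count = st->overflows;
    st->overflows = 0;
    st->last_log = now;
    return true;
}
```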


Revision 359922
Modified Tue Apr 14 15:30:34 2020 UTC (4 years, 2 months ago) by jtl
File length: 111020 byte(s)
Diff to previous 358333
Print more detail as part of the sonewconn() overflow message.

When a socket's listen queue overflows, sonewconn() emits a debug-level
log message. These messages are sometimes useful to systems administrators
in highlighting a process which is not keeping up with its listen queue.

This commit attempts to enhance the usefulness of this message by printing
more details about the socket's address. If all else fails, it will at
least print the domain name of the socket.

Reviewed by:	bz, jhb, kbowling
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24272


Revision 358333
Modified Wed Feb 26 14:26:36 2020 UTC (4 years, 4 months ago) by kaktus
File length: 108904 byte(s)
Diff to previous 358319
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718


Revision 358319
Modified Tue Feb 25 19:26:40 2020 UTC (4 years, 4 months ago) by glebius
File length: 108813 byte(s)
Diff to previous 353355
Make ktls_frame() never fail.  Caller must supply correct mbufs.
This makes sendfile code a bit simpler.


Revision 353355
Modified Wed Oct 9 16:59:42 2019 UTC (4 years, 8 months ago) by glebius
File length: 108890 byte(s)
Diff to previous 353328
Cleanup unneeded includes that crept in with r353292.


Revision 353328
Modified Tue Oct 8 21:34:06 2019 UTC (4 years, 8 months ago) by jhb
File length: 109024 byte(s)
Diff to previous 353292
Add a TOE KTLS mode and a TOE hook for allocating TLS sessions.

This adds the glue to allocate TLS sessions and invokes it from
the TLS enable socket option handler.  This also adds some counters
for active TOE sessions.

The TOE KTLS mode is returned by getsockopt(TCP_TXTLS_MODE) when
TOE KTLS is in use on a socket, but cannot be set via setsockopt().

To simplify various checks, a TLS session now includes an explicit
'mode' member set to the value returned by TCP_TXTLS_MODE.  Various
places that used to check 'sw_encrypt' against NULL to determine
software vs ifnet (NIC) TLS now check 'mode' instead.

Reviewed by:	np, gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21891


Revision 353292
Modified Mon Oct 7 22:40:05 2019 UTC (4 years, 8 months ago) by glebius
File length: 109014 byte(s)
Diff to previous 351522
Widen NET_EPOCH coverage.

When epoch(9) was introduced to the network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance, mutex-covered areas were
as small as possible, and so were the epoch-covered areas that replaced them.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So there is no point in minimising epoch-covered
areas for performance. Meanwhile, entering/exiting epoch
also has a non-zero CPU cost, so doing it less often is a win.

Code maintainability matters too. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside the network epoch. This makes coding both the input and output
paths much easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

The tricky part is configuration code paths: ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note that, unlike a lock recursion, an
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111


Revision 351522
Modified Tue Aug 27 00:01:56 2019 UTC (4 years, 10 months ago) by jhb
File length: 108880 byte(s)
Diff to previous 351214
Add kernel-side support for in-kernel TLS.

KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets.  KTLS only supports
offload of TLS for transmitted data.  Key negotiation must still be
performed in userland.  Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option.  All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.

Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type.  Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.

At present, rekeying is not supported though the in-kernel framework
should support rekeying.

KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer.  Each TLS frame is described by a single
ext_pgs mbuf.  The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.

KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.

Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame().  ktls_enqueue() is then
called to schedule TLS frames for encryption.  In the case of
sendfile(2), sendfile_iodone() calls ktls_enqueue() instead of
pru_ready(), leaving the mbufs marked M_NOTREADY until encryption is
completed.  For other writes (vn_sendfile when pages are available,
write(2), etc.), PRUS_NOTREADY is set when invoking pru_send() along
with invoking ktls_enqueue().

A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue().  Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.

(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)

KTLS supports pluggable software encryption backends.  Internally,
Netflix uses proprietary pure-software backends.  This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames.  As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.

Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready().  At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.

ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation.  In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session.  TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted.  The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface.  If so, the packet is tagged
with the TLS send tag and sent to the interface.  The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation.  If the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped.  In addition, a task is scheduled to refresh the TLS send
tag for the TLS session.  If a new TLS send tag cannot be allocated,
the connection is dropped.  If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag.  (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another.  As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)

ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8).  ifnet TLS is supported
across both vlan devices and lagg interfaces using failover or lacp
with flowid enabled.

Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option.  They can also use this socket
option to toggle between software and ifnet TLS modes.

In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax.  However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.

Various sysctls and counters are available under the kern.ipc.tls
sysctl node.  The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default).  The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.

KTLS is enabled via the KERN_TLS kernel option.

This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.

Reviewed by:	gallatin, hselasky, rrs
Obtained from:	Netflix
Sponsored by:	Netflix, Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21277
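A sketch of preparing a sendmsg(2) call that carries a custom record type. It assumes the TLS_SET_RECORD_TYPE payload is a single byte; build_record_type_msg() and union record_ctrl are hypothetical, and the level/type pair is passed in so that the code builds on systems without KTLS headers. On a KTLS socket the caller would pass IPPROTO_TCP and TLS_SET_RECORD_TYPE, then call sendmsg(fd, &msg, 0).

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one 1-byte control message. */
union record_ctrl {
    unsigned char buf[CMSG_SPACE(sizeof(unsigned char))];
    struct cmsghdr align;
};

/*
 * Prepare msg for a single sendmsg() call carrying len bytes of data plus
 * one control message whose payload is one record-type byte (an assumption
 * of this sketch).
 */
static void build_record_type_msg(struct msghdr *msg, struct iovec *iov,
    void *data, size_t len, union record_ctrl *ctrl, int level, int type,
    unsigned char record_type)
{
    struct cmsghdr *cmsg;

    memset(msg, 0, sizeof(*msg));
    memset(ctrl, 0, sizeof(*ctrl));

    /* The data to be framed as a record of the given type. */
    iov->iov_base = data;
    iov->iov_len = len;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;

    /* One control message holding the record type. */
    msg->msg_control = ctrl->buf;
    msg->msg_controllen = sizeof(ctrl->buf);
    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = level;
    cmsg->cmsg_type = type;
    cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned char));
    *CMSG_DATA(cmsg) = record_type;
}
```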


Revision 351214
Modified Mon Aug 19 12:42:03 2019 UTC (4 years, 10 months ago) by ae
File length: 107234 byte(s)
Diff to previous 349989
Use TAILQ_FOREACH_SAFE() macro to avoid use after free in soclose().

PR:		239893
MFC after:	1 week
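The bug class fixed here is freeing list elements while iterating over them. TAILQ_FOREACH_SAFE() avoids it by saving the successor in a temporary before the loop body runs. The sketch below shows the same idea on a hand-rolled singly linked list (struct node and free_all() are hypothetical), since some libc versions of <sys/queue.h> lack the _SAFE macros.

```c
#include <stdlib.h>

/* Hypothetical list node; the kernel uses TAILQ from queue(3) instead. */
struct node {
    struct node *next;
    int value;
};

/*
 * Free every node and return how many were freed. The successor is read
 * before the current node is freed; reading n->next after free(n) would
 * be a use after free.
 */
static int free_all(struct node *head)
{
    struct node *n, *tmp;
    int count = 0;

    for (n = head; n != NULL; n = tmp) {
        tmp = n->next;      /* grab the successor first */
        free(n);
        count++;
    }
    return count;
}
```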


Revision 349989
Modified Sun Jul 14 21:44:18 2019 UTC (4 years, 11 months ago) by tuexen
File length: 107218 byte(s)
Diff to previous 349599
Improve the input validation for l_linger.
When using the SOL_SOCKET level socket option SO_LINGER, the structure
struct linger is used as the option value. The component l_linger is of
type int, but internally copied to the field so_linger of the structure
struct socket. The type of so_linger is short, but it is assumed to be
non-negative and the value is used to compute ticks to be stored in a
variable of type int.

Therefore, perform input validation on l_linger similar to the one
performed by NetBSD and OpenBSD.

Thanks to syzkaller for making me aware of this issue.

Thanks to markj@ for pointing out that a similar check should be added
to so_linger_set().

Reviewed by:		markj@
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D20948
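The constraints described above (non-negative, fits in a short, and seconds times hz must not overflow an int) can be expressed as a small predicate. This is a sketch of the reasoning, not the committed check; linger_value_ok() is hypothetical, hz is assumed positive, and the real bounds and error code may differ.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical check: accept l_linger only if it is non-negative, fits in
 * a short so_linger field, and l_linger * hz cannot overflow an int.
 */
static bool linger_value_ok(int l_linger, int hz)
{
    if (l_linger < 0)
        return false;
    if (l_linger > SHRT_MAX)
        return false;
    if (l_linger > INT_MAX / hz)   /* l_linger * hz would overflow */
        return false;
    return true;
}
```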


Revision 349599
Modified Tue Jul 2 14:24:42 2019 UTC (5 years ago) by markj
File length: 106971 byte(s)
Diff to previous 349529
Fix handling of errors from sblock() in soreceive_stream().

Previously we would attempt to unlock the socket buffer despite having
failed to lock it.  Simply return an error instead: no resources need
to be released at this point, and doing so is consistent with
soreceive_generic().

PR:		238789
Submitted by:	Greg Becker <greg@codeconcepts.com>
MFC after:	1 week
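The general pattern is to release only what was actually acquired. A minimal userland sketch, assuming pthread_mutex_trylock() as a stand-in for sblock(); receive_locked() is hypothetical.

```c
#include <errno.h>
#include <pthread.h>

/*
 * If acquiring the buffer lock fails, return the error at once. Nothing
 * has been acquired yet, so there is nothing to release; in particular
 * the lock must not be unlocked on this path.
 */
static int receive_locked(pthread_mutex_t *sb_lock, int *bytes_out)
{
    int error;

    error = pthread_mutex_trylock(sb_lock);
    if (error != 0)
        return error;           /* no pthread_mutex_unlock() here */

    *bytes_out = 0;             /* the actual receive work would go here */

    pthread_mutex_unlock(sb_lock);
    return 0;
}
```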


Revision 349529
Modified Sat Jun 29 00:48:33 2019 UTC (5 years ago) by jhb
File length: 106965 byte(s)
Diff to previous 349475
Add an external mbuf buffer type that holds multiple unmapped pages.

Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages.  It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer.  This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers).  It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused.  To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability.  This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands.  For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output.  If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	ae, kp (firewalls)
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616


Revision 349475
Modified Thu Jun 27 22:50:11 2019 UTC (5 years ago) by jhb
File length: 106859 byte(s)
Diff to previous 344741
Fix comment in sofree() to reference sbdestroy().

r160875 added sbdestroy() as a wrapper around sbrelease_internal to be
called from sofree(), yet the comment added in the same revision to
sofree() still mentions sbrelease_internal().

Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20488


Revision 344741
Modified Sun Mar 3 18:57:48 2019 UTC (5 years, 4 months ago) by glebius
File length: 106868 byte(s)
Diff to previous 343005
Remove bogus assert that I added in r319722. It is legitimate
to call soabort() on a newborn socket created by sonewconn() if
further setup of the PCB failed. Code in sofree() handles such a socket
correctly.

Submitted by:	jtl, rrs
MFC after:	3 weeks


Revision 343005
Modified Sun Jan 13 20:33:54 2019 UTC (5 years, 5 months ago) by jah
File length: 106927 byte(s)
Diff to previous 342905
Handle SIGIO for listening sockets

r319722 separated struct socket and parts of the socket I/O path into
listening-socket-specific and dataflow-socket-specific pieces.  Listening
socket connection notifications are now handled by solisten_wakeup() instead
of sowakeup(), but solisten_wakeup() does not currently post SIGIO to the
owning process.

PR:	234258
Reported by:	Kenneth Adelman
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18664


Revision 342905
Modified Thu Jan 10 00:25:12 2019 UTC (5 years, 5 months ago) by glebius
File length: 106832 byte(s)
Diff to previous 342768
Simplify sosetopt() so that function has single return point. No
functional change.


Revision 342768
Modified Fri Jan 4 17:31:50 2019 UTC (5 years, 5 months ago) by markj
File length: 106871 byte(s)
Diff to previous 340783
Support MSG_DONTWAIT in send*(2).

As it does for recv*(2), MSG_DONTWAIT indicates that the call should
not block, returning EAGAIN instead.  Linux and OpenBSD both implement
this, so the change makes porting easier, especially since we do not
return EINVAL or a similar error when unrecognized flags are specified.

Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	tuexen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18728
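A userland sketch of the new behavior: MSG_DONTWAIT makes a single send(2) call non-blocking without changing the socket's mode. fill_without_blocking() is hypothetical; it keeps writing until the kernel reports EAGAIN (or EWOULDBLOCK).

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write into fd with MSG_DONTWAIT until the kernel refuses more data.
 * Returns the number of bytes queued, or -1 on an error other than
 * EAGAIN/EWOULDBLOCK. The socket itself stays in blocking mode.
 */
static long fill_without_blocking(int fd)
{
    char chunk[1024] = { 0 };
    long total = 0;

    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), MSG_DONTWAIT);

        if (n >= 0) {
            total += n;
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;   /* would have blocked: the buffer is full */
        return -1;
    }
}
```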


Revision 340783
Modified Thu Nov 22 20:49:41 2018 UTC (5 years, 7 months ago) by markj
File length: 106842 byte(s)
Diff to previous 339419
Plug some networking sysctl leaks.

Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
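The usual fix for this class of leak is to zero the output structure before filling it, which also clears compiler-inserted padding. A minimal sketch; struct xrecord and fill_xrecord() are hypothetical, and whether struct xrecord has padding depends on the ABI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sysctl record; on common ABIs 'b' is preceded by padding. */
struct xrecord {
    char a;
    int b;
};

/*
 * Fill a record for export to userland. Zeroing the whole structure first
 * ensures that padding bytes and any field the handler does not set are
 * not left holding stale stack contents.
 */
static void fill_xrecord(struct xrecord *out, char a, int b)
{
    memset(out, 0, sizeof(*out));
    out->a = a;
    out->b = b;
}
```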


Revision 339419
Modified Thu Oct 18 14:20:15 2018 UTC (5 years, 8 months ago) by jtl
File length: 106903 byte(s)
Diff to previous 339170
r334853 added a "socket destructor" callback. However, as implemented, it
was really a "socket close" callback.

Update the socket destructor functionality to run when a socket is
destroyed (rather than when it is closed). The original submitter has
confirmed that this change satisfies the intended use case.

Suggested by:	rwatson
Submitted by:	Michio Honda <micchie at sfc.wide.ad.jp>
Tested by:	Michio Honda <micchie at sfc.wide.ad.jp>
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17590


Revision 339170 - (view) (download) (annotate) - [select for diffs]
Modified Wed Oct 3 17:40:04 2018 UTC (5 years, 9 months ago) by glebius
File length: 106902 byte(s)
Diff to previous 338136
In PR 227259, a user is reporting that they have code which is using
shutdown() to wake up another thread blocked on a stream listen socket.
This code is failing, while it used to work on FreeBSD 10 and still
works on Linux.

It seems reasonable to add another exception to support something users are
actually doing, which used to work on FreeBSD 10, and still works on Linux.
And, it seems like it should be acceptable to POSIX, as we still return
ENOTCONN.

This patch differs from what was committed to stable/11, because the
code around listening sockets is different. The patch in D15019 was
written by jtl@ and slightly modified by me.

PR:		227259
Obtained from:	jtl
Approved by:	re (kib)
Differential Revision:  D15019


Revision 338136 - (view) (download) (annotate) - [select for diffs]
Modified Tue Aug 21 14:04:30 2018 UTC (5 years, 10 months ago) by tuexen
File length: 106495 byte(s)
Diff to previous 336170
Add SOL_SOCKET level socket option with name SO_DOMAIN to get
the domain of a socket.

This is helpful when testing, and Solaris and Linux provide the same
socket option under the same name.

Reviewed by:		bcr@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16791


Revision 336170 - (view) (download) (annotate) - [select for diffs]
Modified Tue Jul 10 13:03:06 2018 UTC (5 years, 11 months ago) by brooks
File length: 106410 byte(s)
Diff to previous 336023
Use uintptr_t alone when assigning to kvaddr_t variables.

Suggested by:	jhb


Revision 336023 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jul 6 10:03:33 2018 UTC (5 years, 11 months ago) by brooks
File length: 106430 byte(s)
Diff to previous 335979
Correct breakage on 32-bit platforms from r335979.


Revision 335979 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jul 5 13:13:48 2018 UTC (5 years, 11 months ago) by brooks
File length: 106408 byte(s)
Diff to previous 334962
Make struct xinpcb and friends word-size independent.

Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
identifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386


Revision 334962 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jun 11 17:10:19 2018 UTC (6 years ago) by mmacy
File length: 106388 byte(s)
Diff to previous 334960
Limit the change to fixing controlp handling, pending review.


Revision 334960 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jun 11 16:31:42 2018 UTC (6 years ago) by mmacy
File length: 106492 byte(s)
Diff to previous 334853
soreceive_stream: correctly handle edge cases

- a non-NULL controlp is not an error; returning EINVAL
  would cause X forwarding to fail

- MSG_PEEK and MSG_WAITALL are fairly exceptional, but we still
  want to handle them - punt to soreceive_generic


Revision 334853 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jun 8 19:35:24 2018 UTC (6 years ago) by jtl
File length: 106446 byte(s)
Diff to previous 334719
Add a socket destructor callback.  This allows kernel providers to set
callbacks to perform additional cleanup actions at the time a socket is
closed.

Michio Honda presented a use for this at BSDCan 2018.
(See https://www.bsdcan.org/2018/schedule/events/965.en.html .)

Submitted by:	Michio Honda <micchie at sfc.wide.ad.jp> (previous version)
Reviewed by:	lstewart (previous version)
Differential Revision:	https://reviews.freebsd.org/D15706


Revision 334719 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jun 6 15:45:57 2018 UTC (6 years ago) by sbruno
File length: 106257 byte(s)
Diff to previous 332967
Load balance sockets with new SO_REUSEPORT_LB option.

This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple
programs or threads to bind to the same port; incoming connections are
load balanced across them using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and cannot
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As in DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967.  Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Obtained from:	DragonflyBSD
Relnotes:	Yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003


Revision 332967 - (view) (download) (annotate) - [select for diffs]
Modified Tue Apr 24 19:55:12 2018 UTC (6 years, 2 months ago) by sbruno
File length: 106209 byte(s)
Diff to previous 332894
Revert r332894 at the request of the submitter.

Submitted by:	Johannes Lundberg <johalun0_gmail.com>
Sponsored by:	Limelight Networks


Revision 332894 - (view) (download) (annotate) - [select for diffs]
Modified Mon Apr 23 19:51:00 2018 UTC (6 years, 2 months ago) by sbruno
File length: 108328 byte(s)
Diff to previous 332122
Load balance sockets with new SO_REUSEPORT_LB option

This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple
programs or threads to bind to the same port; incoming connections are
load balanced across them using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and cannot
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As in DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by:	Johannes Lundberg <johanlun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003


Revision 332122 - (view) (download) (annotate) - [select for diffs]
Modified Fri Apr 6 17:35:35 2018 UTC (6 years, 2 months ago) by brooks
File length: 106209 byte(s)
Diff to previous 326023
Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941


Revision 326023 - (view) (download) (annotate) - [select for diffs]
Modified Mon Nov 20 19:43:44 2017 UTC (6 years, 7 months ago) by pfg
File length: 106233 byte(s)
Diff to previous 323594
sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


Revision 323594 - (view) (download) (annotate) - [select for diffs]
Modified Thu Sep 14 18:05:54 2017 UTC (6 years, 9 months ago) by glebius
File length: 106189 byte(s)
Diff to previous 322856
Fix locking in soisconnected().

When a newborn socket moves from the incomplete queue to the complete
one, we need to obtain the listening socket lock after the child's,
which is the wrong order.  The old code did that in a potentially
endless loop of mtx_trylock().  The new code makes only one attempt
at mtx_trylock(), and on failure references the listening socket,
unlocks the child and locks everything in the right order.  If the
listening socket shuts down in the meantime, just bail out.

Reported & tested by:	Jason Eggleston <jeggleston llnw.com>
Reported & tested by:	Jason Wolfe <jason llnw.com>


Revision 322856 - (view) (download) (annotate) - [select for diffs]
Modified Thu Aug 24 20:49:19 2017 UTC (6 years, 10 months ago) by glebius
File length: 105734 byte(s)
Diff to previous 321325
Third take on r319685 and r320480.  Actually allow soisconnected() to be
called via soisdisconnected(), and in that path unlock earlier to avoid
lock recursion.

This fixes a situation when a socket on accept queue is reset before being
accepted.

Reported by:	Jason Eggleston <jeggleston llnw.com>


Revision 321325 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jul 21 07:44:43 2017 UTC (6 years, 11 months ago) by tuexen
File length: 105709 byte(s)
Diff to previous 320652
Fix getsockopt() for listening sockets when using SO_SNDBUF, SO_RCVBUF,
SO_SNDLOWAT, SO_RCVLOWAT. Since r31972 it only worked for non-listening
sockets.

Sponsored by:	Netflix, Inc.


Revision 320652 - (view) (download) (annotate) - [select for diffs]
Modified Tue Jul 4 18:23:17 2017 UTC (7 years ago) by hselasky
File length: 105521 byte(s)
Diff to previous 320324
After r319722 two fields were left uninitialized when transforming a
socket structure into a listening socket. This resulted in an invalid
instruction fault for all 32-bit platforms.

When INVARIANTS is set the union where the two uninitialized fields
reside gets properly zeroed. This patch ensures the two uninitialized
fields are zeroed when INVARIANTS is undefined.

For 64-bit platforms this issue was not visible because so->sol_upcall
which is uninitialized overlaps with so->so_rcv.sb_state which is
already zero during soalloc().

For 32-bit platforms this issue was visible and resulted in an invalid
instruction fault, because so->sol_upcall overlaps with
so->so_rcv.sb_sel which is always initialized to a valid data pointer
during soalloc().

Verifying the offset locations mentioned above are identical is left
as an exercise to the reader.

PR: 220452
PR: 220358
Reviewed by:	ae (network), gallatin
Differential Revision:	https://reviews.freebsd.org/D11475
Sponsored by:	Mellanox Technologies


Revision 320324 - (view) (download) (annotate) - [select for diffs]
Modified Sun Jun 25 01:41:07 2017 UTC (7 years ago) by glebius
File length: 105469 byte(s)
Diff to previous 319988
Provide sbsetopt() that handles socket buffer related socket options.
It distinguishes between data flow sockets and listening sockets, and
in case of the latter doesn't change resource limits, since listening
sockets don't hold any buffers, they only carry values to be inherited
by their children.


Revision 319988 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jun 15 20:11:29 2017 UTC (7 years ago) by glebius
File length: 106259 byte(s)
Diff to previous 319722
Plug read(2) and write(2) on listening sockets.


Revision 319722 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jun 8 21:30:34 2017 UTC (7 years ago) by glebius
File length: 106116 byte(s)
Diff to previous 319641
Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct so_accf. Since the listening fields no longer
    affect the size of struct socket, just move its fields into the
    listening part of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow acquiring two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() use atomic(9).  In some situations this allows
    calling soref() without owning the socket lock.  There is room for
    improvement here: it is possible to make sorele() lock optionally too.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parallelism in the UNIX sockets.
  - uipc_attach() may now be called with unp_link_lock held if we are
    accepting, or without unp_link_lock if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basically did nothing
    for a listening socket.  The vnode remained open for connections.  This
    is fixed by removing the vnode in uipc_close().  Maybe the right way would
    be to do it for all sockets (not only listening ones), simply moving the
    vnode teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770


Revision 319641 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jun 7 01:48:11 2017 UTC (7 years ago) by glebius
File length: 97932 byte(s)
Diff to previous 319640
Provide typedef for socket upcall function.
While here, change the so_gen_t type to the modern uint64_t.


Revision 319640 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jun 7 01:21:34 2017 UTC (7 years ago) by glebius
File length: 97961 byte(s)
Diff to previous 319505
Remove a piece of dead code.


Revision 319505 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jun 2 17:49:21 2017 UTC (7 years, 1 month ago) by glebius
File length: 98105 byte(s)
Diff to previous 317421
Rename accept filter getopt/setopt functions, so that they are prefixed
with module name and match other functions in the module.  There is no
functional change.


Revision 317421 - (view) (download) (annotate) - [select for diffs]
Modified Tue Apr 25 19:54:34 2017 UTC (7 years, 2 months ago) by pkelsey
File length: 98120 byte(s)
Diff to previous 316874
Remove unnecessary check for NULL mbuf in soreceive_generic().

This check has been redundant since it was introduced in r162554.

Reviewed by:	emaste, glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10322


Revision 316874 - (view) (download) (annotate) - [select for diffs]
Modified Fri Apr 14 17:23:28 2017 UTC (7 years, 2 months ago) by sobomax
File length: 98125 byte(s)
Diff to previous 313043
Restore the ability to shut down DGRAM sockets, while still having the
shutdown(2) system call return ENOTCONN. This ability was lost in svn
revision 285910.

Reviewed by:	ed, rwatson, glebius, hiren
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10351


Revision 313043 - (view) (download) (annotate) - [select for diffs]
Modified Wed Feb 1 13:12:07 2017 UTC (7 years, 5 months ago) by harti
File length: 97474 byte(s)
Diff to previous 312379
Merge filt_soread and filt_solisten and decide what to do when checking
for EVFILT_READ at the point of the check, not when the event is registered.
This fixes a problem with asio when accepting a connection.

Reviewed by:	kib@, Scott Mitchell


Revision 312379 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jan 18 13:31:17 2017 UTC (7 years, 5 months ago) by hselasky
File length: 97800 byte(s)
Diff to previous 312296
Implement kernel support for hardware rate limited sockets.

- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API supports rates in
the range from 1 to 4 Gbytes/s, which is suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which indicates whether a network driver supports rate limiting.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees a non-NULL "m->m_pkthdr.snd_tag", it
will move the packets into a designated rate limited sendqueue given by
the snd_tag pointer. It is up to the individual driver how the traffic
is actually rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag does not match
that of the network interface. The network adapter frees the mbuf and
returns EAGAIN, which causes ip_output() to release and clear the send
tag. On the next ip_output() call, allocation of a new "snd_tag" is attempted.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months


Revision 312296 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jan 16 17:46:38 2017 UTC (7 years, 5 months ago) by sobomax
File length: 97543 byte(s)
Diff to previous 312277
Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition, this option provides a unified interface to get bintime
(equivalent to using SO_BINTIME), except that it is also supported with
IPv6, where SO_BINTIME has never been supported. The long-term plan is to
deprecate SO_BINTIME and move everything to SO_TS_CLOCK.

The idea for this enhancement was briefly discussed at the networking session
of the dev summit in Ottawa last June, and the general feedback was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends the
unix domain test code (unix_cmsg) to test the new SCM_XXX types, and the other
implements an entirely new test case which exchanges UDP packets between two
processes using both the conventional method (i.e. calling clock_gettime(2)
before recv(2) and after send(2)) and setsockopt()+recv() in the receive
path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by:    adrian, gnn
Differential Revision:  https://reviews.freebsd.org/D9171


Revision 312277 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jan 16 08:25:33 2017 UTC (7 years, 5 months ago) by hiren
File length: 97224 byte(s)
Diff to previous 311568
Add kevent EVFILT_EMPTY for notification when a client has received all data,
i.e. when everything outstanding has been acknowledged.

Reviewed by:	bz, gnn (previous version)
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D9150


Revision 311568 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jan 6 23:41:45 2017 UTC (7 years, 5 months ago) by jhb
File length: 96739 byte(s)
Diff to previous 309018
Set MORETOCOME for AIO write requests on a socket.

Add a MSG_MORETOCOME message flag. When this flag is set, sosend*()
sets PRUS_MORETOCOME when invoking the protocol send method. The AIO
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.

Reviewed by:	adrian
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D8955


Revision 309018 - (view) (download) (annotate) - [select for diffs]
Modified Tue Nov 22 18:31:43 2016 UTC (7 years, 7 months ago) by br
File length: 96674 byte(s)
Diff to previous 306186
Revert r306186 ("Adjust the sopt_val pointer on bigendian systems").

This logic doesn't work with a bigger sopt_valsize (e.g. when ipfw
passes a 2048-byte rule).

Reported by:	adrian
Sponsored by:	DARPA, AFRL


Revision 306186 - (view) (download) (annotate) - [select for diffs]
Modified Thu Sep 22 12:41:53 2016 UTC (7 years, 9 months ago) by br
File length: 96791 byte(s)
Diff to previous 305832
Adjust the sopt_val pointer on bigendian systems (e.g. MIPS64EB).

sooptcopyin() checks whether the size of the data provided by the user is
<= what we can accept; otherwise it truncates the size. On big-endian
platforms we also have to move the pointer so that we copy the actual data.

Reviewed by:	gnn
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D7980


Revision 305832 - (view) (download) (annotate) - [select for diffs]
Modified Thu Sep 15 13:16:20 2016 UTC (7 years, 9 months ago) by emaste
File length: 96674 byte(s)
Diff to previous 305824
Renumber license clauses in sys/kern to avoid skipping #3


Revision 305824 - (view) (download) (annotate) - [select for diffs]
Modified Thu Sep 15 07:41:48 2016 UTC (7 years, 9 months ago) by kevlo
File length: 96674 byte(s)
Diff to previous 300419
Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.

Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D7878


Revision 300419 - (view) (download) (annotate) - [select for diffs]
Modified Sun May 22 13:10:48 2016 UTC (8 years, 1 month ago) by bapt
File length: 96654 byte(s)
Diff to previous 300418
Fix typo introduced by me (not the submitter) when fixing typos


Revision 300418 - (view) (download) (annotate) - [select for diffs]
Modified Sun May 22 13:04:45 2016 UTC (8 years, 1 month ago) by bapt
File length: 96653 byte(s)
Diff to previous 298819
Fix typos in the comments

Submitted by:	cipherwraith666@gmail.com (via github)


Revision 298819 - (view) (download) (annotate) - [select for diffs]
Modified Fri Apr 29 22:15:33 2016 UTC (8 years, 2 months ago) by pfg
File length: 96652 byte(s)
Diff to previous 298796
sys/kern: spelling fixes in comments.

No functional change.


Revision 298796 - (view) (download) (annotate) - [select for diffs]
Modified Fri Apr 29 20:11:09 2016 UTC (8 years, 2 months ago) by jhb
File length: 96651 byte(s)
Diff to previous 297145
Introduce a new protocol hook pru_aio_queue.

This allows a protocol to claim individual AIO requests instead of using
the default socket AIO handling.

Sponsored by:	Chelsio Communications


Revision 297145 - (view) (download) (annotate) - [select for diffs]
Modified Mon Mar 21 08:03:50 2016 UTC (8 years, 3 months ago) by maxim
File length: 96560 byte(s)
Diff to previous 296277
o "avaliable" -> "available".

PR:		208141
Submitted by:	Tyler Littlefield


Revision 296277 - (view) (download) (annotate) - [select for diffs]
Modified Tue Mar 1 18:12:14 2016 UTC (8 years, 4 months ago) by jhb
File length: 96560 byte(s)
Diff to previous 295136
Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subsystem now exports a library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool, invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order.  Socket requests will not block indefinitely,
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels.  The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon.  This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system.  To protect against this, AIO
requests are only permitted for known "safe" files by default.  AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_unsafe sysctl to a non-zero value.  The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default.  aio_mlock() is also enabled by default.

Reviewed by:	cem, jilles
Discussed with:	kib (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5289


Revision 295136 - (view) (download) (annotate) - [select for diffs]
Modified Tue Feb 2 05:57:59 2016 UTC (8 years, 5 months ago) by alfred
File length: 96381 byte(s)
Diff to previous 285910
Increase max allowed backlog for listen sockets
from short to int.

PR: 203922
Submitted by: White Knight <white_knight@2ch.net>
MFC After: 4 weeks


Revision 285910 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jul 27 13:17:57 2015 UTC (8 years, 11 months ago) by ed
File length: 96239 byte(s)
Diff to previous 285862
Make shutdown() return ENOTCONN as required by POSIX, part deux.

Summary:
Back in 2005, maxim@ attempted to fix shutdown() to return ENOTCONN in case the socket was not connected (r150152). This had to be rolled back (r150155), as it broke some of the existing programs that depend on this behavior. I reapplied this change on my system and indeed, syslogd failed to start up. I fixed this back in February (279016) and MFC'ed it to the supported stable branches. Apart from that, things seem to work out all right.

Since at least Linux and Mac OS X do the right thing, I'd like to go ahead and give this another try. To keep old copies of syslogd working, only start returning ENOTCONN for recent binaries.

I took a look at the XNU sources and they seem to test against SS_ISCONNECTED, SS_ISCONNECTING, and SS_ISDISCONNECTING, instead of just SS_ISCONNECTED. That seems reasonable, so let's do the same.

Test Plan:
This issue was uncovered while writing tests for shutdown() in CloudABI:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/sys/socket/shutdown_test.c#L26

Reviewers: glebius, rwatson, #manpages, gnn, #network

Reviewed By: gnn, #network

Subscribers: bms, mjg, imp

Differential Revision: https://reviews.freebsd.org/D3039
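A minimal sketch of the POSIX behavior this change adopts, assuming a freshly compiled binary (older binaries keep the previous behavior, as described above); shutdown_errno() is a hypothetical helper:

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: call shutdown(2) and return the errno it set,
 * or 0 on success.  For a socket that was never connected, a
 * POSIX-conforming system reports ENOTCONN.
 */
int shutdown_errno(int fd, int how)
{
	if (shutdown(fd, how) == 0)
		return 0;
	return errno;
}
```

Callers that used shutdown() on an unconnected socket only for its side effects can ignore ENOTCONN explicitly rather than treat it as a failure.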


Revision 285862 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jul 24 22:13:39 2015 UTC (8 years, 11 months ago) by delphij
File length: 96129 byte(s)
Diff to previous 285522
Fix a typo in comment.

Submitted by:	Yanhui Shen via twitter
MFC after:	3 days


Revision 285522 - (view) (download) (annotate) - [select for diffs]
Modified Tue Jul 14 02:00:50 2015 UTC (8 years, 11 months ago) by cem
File length: 96130 byte(s)
Diff to previous 279209
Fix cleanup race between unp_dispose and unp_gc

unp_dispose and unp_gc could race to tear down the same mbuf chains, which
can lead to dereferencing freed filedesc pointers.

This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
as invalid/freed. The flag is protected by UNP_LIST_LOCK.

To serialize against unp_gc, unp_dispose needs the socket object. Change the
dom_dispose() KPI to take a socket object instead of an mbuf chain directly.

PR:		194264
Differential Revision:	https://reviews.freebsd.org/D3044
Reviewed by:	mjg (earlier version)
Approved by:	markj (mentor)
Obtained from:	mjg
MFC after:	1 month
Sponsored by:	EMC / Isilon Storage Division


Revision 279209 - (view) (download) (annotate) - [select for diffs]
Modified Mon Feb 23 15:24:43 2015 UTC (9 years, 4 months ago) by ae
File length: 96137 byte(s)
Diff to previous 279206
soreceive_generic() still has a similar KASSERT(); therefore, instead of
removing the KASSERT(), change it to check that the mbuf isn't NULL.

Suggested by:	kib
MFC after:	1 week


Revision 279206 - (view) (download) (annotate) - [select for diffs]
Modified Mon Feb 23 13:41:35 2015 UTC (9 years, 4 months ago) by ae
File length: 96059 byte(s)
Diff to previous 278780
In some cases soreceive_dgram() can return no data but still have a control
message. This can happen when an application sends packets too big
for the path MTU: recvmsg() will return zero (indicating no data),
but there will be a cmsghdr with cmsg_type set to IPV6_PATHMTU.
Remove the KASSERT(), which dereferences a NULL pointer in such a case.
Also call m_freem() only when m isn't NULL.

PR:		197882
MFC after:	1 week
Sponsored by:	Yandex LLC
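
From the application side, the case described above means a zero-byte recvmsg() result on a datagram socket should not be read as "nothing arrived" until the ancillary data has been checked. A minimal sketch using the standard CMSG_* macros follows; find_cmsg() is a hypothetical helper name, and the socket is assumed to have IPV6_RECVPATHMTU enabled elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Return the first control message with the given level and type, or NULL.
 * Call this after recvmsg() even when it returned 0: a datagram socket can
 * deliver a control message (e.g. IPPROTO_IPV6 / IPV6_PATHMTU) with no payload.
 */
static struct cmsghdr *find_cmsg(struct msghdr *msg, int level, int type)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == level && cm->cmsg_type == type)
            return cm;
    }
    return NULL;
}
```

A caller would typically run `n = recvmsg(fd, &msg, 0)` and, when n is 0, still call `find_cmsg(&msg, IPPROTO_IPV6, IPV6_PATHMTU)` before deciding the datagram was empty.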


Revision 278780
Modified Sat Feb 14 20:00:57 2015 UTC (9 years, 4 months ago) by davide
File length: 96038 byte(s)
Diff to previous 275968
Don't access sockbuf fields directly; use accessor functions instead.
It is safe to move the call to socantsendmore_locked() after
sbdrop_locked() as long as we hold the sockbuf lock across the two
calls.

CR:	D1805
Reviewed by:	adrian, kmacy, julian, rwatson


Revision 275968
Modified Sat Dec 20 22:12:04 2014 UTC (9 years, 6 months ago) by glebius
File length: 96186 byte(s)
Diff to previous 275808
Revert r274494, r274712, r275955 and provide extra comments explaining
why zero-sized mbufs can appear in socket buffers.

A proper fix would be to divorce record socket buffers and stream
socket buffers, and divorce pru_send that accepts normal data from
pru_send that accepts control data.


Revision 275808
Modified Mon Dec 15 17:52:08 2014 UTC (9 years, 6 months ago) by jhb
File length: 96076 byte(s)
Diff to previous 275329
Check for SS_NBIO in so->so_state instead of sb->sb_flags in
soreceive_stream().

Differential Revision:	https://reviews.freebsd.org/D1299
Reviewed by:	bz, gnn
MFC after:	1 week
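
The fix matters because non-blocking mode is recorded on the socket itself (so_state) rather than on an individual socket buffer. A minimal model of the decision follows. The MODEL_* bit values and model_should_ewouldblock() are hypothetical (the real SS_NBIO, MSG_DONTWAIT and MSG_NBIO constants come from FreeBSD's sys/socketvar.h and sys/socket.h), and the model assumes the receive path also honours the per-call MSG_DONTWAIT and MSG_NBIO flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define MODEL_SS_NBIO      0x0100  /* socket-level non-blocking state */
#define MODEL_MSG_DONTWAIT 0x0080  /* per-call non-blocking request */
#define MODEL_MSG_NBIO     0x4000  /* per-call non-blocking request */

/*
 * With no data available, return true if the receive should fail with
 * EWOULDBLOCK instead of sleeping. The non-blocking bit is taken from the
 * socket state, not from the socket buffer's own flags.
 */
static bool model_should_ewouldblock(int so_state, int flags, unsigned avail)
{
    bool nbio = (so_state & MODEL_SS_NBIO) != 0 ||
        (flags & (MODEL_MSG_DONTWAIT | MODEL_MSG_NBIO)) != 0;

    return avail == 0 && nbio;
}
```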


Revision 275329
Modified Sun Nov 30 13:24:21 2014 UTC (9 years, 7 months ago) by glebius
File length: 96076 byte(s)
Diff to previous 275326
Merge from projects/sendfile: extend the protocol API to support
sending data that is not yet ready:
o Add a new pru_send() flag, PRUS_NOTREADY.
o Add a new protocol method, pru_ready().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix


Revision 275326
Modified Sun Nov 30 12:52:33 2014 UTC (9 years, 7 months ago) by glebius
File length: 95980 byte(s)
Diff to previous 274712
Merge from projects/sendfile:

o Introduce a notion of "not ready" mbufs in socket buffers.  These
mbufs are still being populated by some I/O in the background and are
referenced from outside the buffer.  This has the following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" one in the queue can't be taken
  out either.
- If the socket buffer is flushed, then "not ready" mbufs shouldn't be
  freed.

o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
  The sb_ccc stands for "claimed character count", or "committed
  character count", and sb_acc is the "available character count".
  Consumers of the socket buffer API should no longer access them directly,
  but should use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
  with M_BLOCKED.
o A new field, sb_fnrdy, points to the first not ready mbuf, to avoid a
  linear search.
o A new function, sbready(), is provided to activate a given number of mbufs
  in a socket buffer.

A special note on SCTP:
  SCTP has its own sockbufs.  Unfortunately, the FreeBSD stack doesn't yet
allow protocol-specific sockbufs.  Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of the amount of data in them.  The new
notion of "not ready" data isn't supported by SCTP.  Instead, only a
mechanical substitution is done: s/sb_cc/sb_ccc/.
  A proper solution would be to take struct sockbuf out of struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does.  This was discussed with rrs@.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
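
The bookkeeping described above (sbused() versus sbavail(), sb_fnrdy, sbready()) can be modelled in a few lines of userspace C. This is a simplified sketch, not struct sockbuf: fixed-size segments stand in for mbufs, every model_* name is hypothetical, and model_sbready() only loosely follows sbready(), whose real interface takes an mbuf pointer. It shows the key rule: bytes queued behind a not-ready segment count as used but not as available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAXSEG 16

/* Simplified model: segments stand in for mbufs. Not the real struct sockbuf. */
struct model_sockbuf {
    size_t len[MODEL_MAXSEG];
    bool notready[MODEL_MAXSEG];  /* cf. M_NOTREADY */
    int nseg;
    int fnrdy;   /* first not-ready segment, -1 if none (cf. sb_fnrdy) */
    size_t ccc;  /* claimed: every byte appended (cf. sb_ccc) */
    size_t acc;  /* available: bytes before fnrdy (cf. sb_acc) */
};

static void model_init(struct model_sockbuf *sb)
{
    *sb = (struct model_sockbuf){ .fnrdy = -1 };
}

static size_t model_sbused(const struct model_sockbuf *sb) { return sb->ccc; }
static size_t model_sbavail(const struct model_sockbuf *sb) { return sb->acc; }

static void model_append(struct model_sockbuf *sb, size_t len, bool notready)
{
    assert(sb->nseg < MODEL_MAXSEG);
    sb->len[sb->nseg] = len;
    sb->notready[sb->nseg] = notready;
    if (notready && sb->fnrdy < 0)
        sb->fnrdy = sb->nseg;
    sb->nseg++;
    sb->ccc += len;
    if (sb->fnrdy < 0)  /* data behind a not-ready segment is unavailable */
        sb->acc += len;
}

/* Mark 'count' segments starting at fnrdy as ready, then advance fnrdy/acc. */
static void model_sbready(struct model_sockbuf *sb, int count)
{
    int i = sb->fnrdy;

    if (i < 0)
        return;
    for (int k = 0; k < count && i + k < sb->nseg; k++)
        sb->notready[i + k] = false;
    while (i < sb->nseg && !sb->notready[i])
        sb->acc += sb->len[i++];
    sb->fnrdy = (i < sb->nseg) ? i : -1;
}
```

Keeping the index of the first not-ready segment means model_sbready() only scans forward from that point, which mirrors the "avoid a linear search" motivation for sb_fnrdy.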


Revision 274712
Modified Wed Nov 19 14:27:38 2014 UTC (9 years, 7 months ago) by glebius
File length: 95855 byte(s)
Diff to previous 274504
Do not allocate zero-length mbuf in sosend_generic().

Found by:	pho
Sponsored by:	Nginx, Inc.


Revision 274504
Modified Fri Nov 14 15:33:40 2014 UTC (9 years, 7 months ago) by glebius
File length: 95840 byte(s)
Diff to previous 274421
Merge from projects/sendfile:
  Use sbcut_locked() instead of manually editing a sockbuf.

Sponsored by:	Nginx, Inc.


Revision 274421
Modified Wed Nov 12 09:57:15 2014 UTC (9 years, 7 months ago) by glebius
File length: 95876 byte(s)
Diff to previous 271254
In preparation for merging projects/sendfile, convert bare accesses to the
sb_cc member of struct sockbuf into a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once the notion of "not ready" socket buffer
data is checked in, they are going to be different.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.


Revision 271254
Modified Mon Sep 8 09:04:22 2014 UTC (9 years, 9 months ago) by hrs
File length: 95826 byte(s)
Diff to previous 271216
- Make hhook_run_socket() vnet-aware instead of adding CURVNET_SET() around
  the function calls.
- Fix a memory leak and incorrect statistics when hhook_run_socket() fails
  in soalloc().

PR:	193265


Revision 271216
Modified Sun Sep 7 05:44:14 2014 UTC (9 years, 9 months ago) by glebius
File length: 96093 byte(s)
Diff to previous 271182
Fix for r271182.

Submitted by:	mjg
Pointy hat to:	me, submitter and everyone who urged me to commit


Revision 271182
Modified Fri Sep 5 19:50:18 2014 UTC (9 years, 9 months ago) by glebius
File length: 96135 byte(s)
Diff to previous 270664
Set vnet context before accessing V_socket_hhh[].

Submitted by:	"Hiroo Ono (小野寛生)" <hiroo.ono+freebsd gmail.com>


Revision 270664
Modified Tue Aug 26 14:44:08 2014 UTC (9 years, 10 months ago) by glebius
File length: 96041 byte(s)
Diff to previous 270318
- Remove socket file operations declaration from sys/file.h.
- Make them static in sys_socket.c.
- Provide generic invfo_truncate() instead of soo_truncate().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
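
The idea of one shared "invalid operation" handler in place of per-type stubs such as soo_truncate() can be sketched as follows. The types and names are hypothetical and far smaller than struct fileops, and returning EINVAL is an assumption made for this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_file;  /* opaque, hypothetical file object */

/* Hypothetical, trimmed-down operations table in the spirit of struct fileops. */
struct model_fileops {
    int (*fo_truncate)(struct model_file *fp, long length);
};

/* One shared handler for file types that cannot be truncated. */
static int model_invfo_truncate(struct model_file *fp, long length)
{
    (void)fp;
    (void)length;
    return EINVAL;
}

/* The per-type table stays private (static) to its implementation file. */
static const struct model_fileops model_socketops = {
    .fo_truncate = model_invfo_truncate,
};
```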


Revision 270318
Modified Fri Aug 22 05:03:30 2014 UTC (9 years, 10 months ago) by hrs
File length: 96013 byte(s)
Diff to previous 270158
Fix a panic which occurs in a VIMAGE-enabled kernel after r270158, and
separate out the socket_hhook_register() part and put it into a
VNET_SYS{,UN}INIT() handler.

Discussed with:	marcel


Revision 270158
Modified Mon Aug 18 23:45:40 2014 UTC (9 years, 10 months ago) by marcel
File length: 95345 byte(s)
Diff to previous 269502
For vendors like Juniper, extensibility for sockets is important.  A
good example is socket options that aren't necessarily generic.  To
this end, OSD is added to the socket structure and hooks are defined
for key operations on sockets.  These are:
o   soalloc() and sodealloc()
o   Get and set socket options
o   Socket related kevent filters.

One aspect of hhook that appears to be not fully baked is the return
semantics (the return value from the hook is ignored in hhook_run_hooks()
at the time of commit).  To support return values, the socket_hhook_data
structure contains a 'status' field to hold return values.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Obtained from:	Juniper Networks, Inc.


Revision 269502
Modified Mon Aug 4 05:40:51 2014 UTC (9 years, 11 months ago) by davide
File length: 93261 byte(s)
Diff to previous 269142
Fix an overflow in getsockopt(): optval isn't big enough to hold an
sbintime_t.
Re-introduce the r255030 behaviour of capping socket timeouts to INT_32
if they're too large.

CR:	https://phabric.freebsd.org/D433
Reported by:	demon
Reviewed by:	bde [1], jhb [2]
MFC after:	2 weeks
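
The capping behaviour can be sketched by converting a struct timeval into 32.32 fixed point, which is how sbintime_t represents time, and saturating when the seconds field does not fit in 32 bits. model_sbintime_t, the MODEL_SBT_* constants and model_tvtosbt_capped() are hypothetical; the kernel has its own tvtosbt() and SBT_MAX, and its exact cap may differ from this model.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for sbintime_t: signed 32.32 fixed-point seconds. */
typedef int64_t model_sbintime_t;
#define MODEL_SBT_1S  ((model_sbintime_t)1 << 32)
#define MODEL_SBT_MAX INT64_MAX

/*
 * Convert a timeval to 32.32 fixed point. Seconds above INT32_MAX would
 * overflow the 64-bit result, so saturate to MODEL_SBT_MAX instead.
 * Assumes tv_sec >= 0 and 0 <= tv_usec < 1000000 (validated by the caller).
 */
static model_sbintime_t model_tvtosbt_capped(struct timeval tv)
{
    if (tv.tv_sec > INT32_MAX)
        return MODEL_SBT_MAX;
    return (model_sbintime_t)tv.tv_sec * MODEL_SBT_1S +
        ((model_sbintime_t)tv.tv_usec << 32) / 1000000;
}
```

For example, a timeout of 1.5 seconds becomes 2^32 + 2^31 in this representation.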


Revision 269142
Modified Sat Jul 26 19:27:34 2014 UTC (9 years, 11 months ago) by marcel
File length: 93226 byte(s)
Diff to previous 260719
The accept filter code is not specific to the FreeBSD IPv4 network stack,
so it really should not be under "optional inet". The fact that uipc_accf.c
lives under kern/ lends some weight to making it a "standard" file.

Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the
need for #ifdef INET in kern/uipc_socket.c.

Also, this meant the net.inet.accf.unloadable sysctl needed to move, as
net.inet does not exist without networking compiled in (as it lives in
netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable.

In order to support existing accept filter sysctls, the net.inet.accf node
has been added to netinet/in_proto.c.

Submitted by:	Steve Kiernan <stevek@juniper.net>
Obtained from:	Juniper Networks, Inc.


Revision 260719
Modified Thu Jan 16 13:45:41 2014 UTC (10 years, 5 months ago) by glebius
File length: 93281 byte(s)
Diff to previous 257865
Simplify wait/nowait code, finally killing the last remnant of the
historical mbuf(9) allocator flag.

Sponsored by:	Nginx, Inc.


Revision 257865
Modified Fri Nov 8 20:11:15 2013 UTC (10 years, 7 months ago) by hiren
File length: 93324 byte(s)
Diff to previous 257472
Fix typo in a comment.


Revision 257472
Modified Thu Oct 31 20:33:21 2013 UTC (10 years, 8 months ago) by emax
File length: 93324 byte(s)
Diff to previous 255608
Rate limit (to once per minute) "Listen queue overflow" message in
sonewconn().

Reviewed by:	scottl, lstewart
Obtained from:	Netflix, Inc
MFC after:	2 weeks


Revision 255608
Modified Mon Sep 16 06:25:54 2013 UTC (10 years, 9 months ago) by kib
File length: 93092 byte(s)
Diff to previous 255138
Remove zero-copy sockets code.  It only worked for anonymous memory,
and the equivalent functionality is now provided by sendfile(2) over a
POSIX shared memory file descriptor.

Remove the cow member of struct vm_page, and rearrange the remaining
members.  While there, make hold_count unsigned.

Requested and reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (delphij)


Revision 255138
Added Sun Sep 1 23:34:53 2013 UTC (10 years, 10 months ago) by davide
File length: 97259 byte(s)
Diff to previous 255030
Fix socket buffer timeout precision by using the new sbintime_t KPI instead
of relying on the tvtohz() workaround. The latter was introduced
recently by jhb@ (r254699) in order to have a fix that could be backported
to STABLE.

Reported by:	Vitja Makarov <vitja.makarov at gmail dot com>
Reviewed by:	jhb (earlier version)


