Log of /head/sys/kern/kern_event.c


Revision 367498
Modified Mon Nov 9 00:04:35 2020 UTC (3 years, 11 months ago) by mjg
File length: 63984 byte(s)
Diff to previous 365222
kqueue: save space by using only one func pointer for assertions


Revision 365222
Modified Tue Sep 1 22:12:32 2020 UTC (4 years, 1 month ago) by mjg
File length: 64220 byte(s)
Diff to previous 360140
kern: clean up empty lines in .c and .h files


Revision 360140
Modified Tue Apr 21 03:57:30 2020 UTC (4 years, 6 months ago) by kevans
File length: 64221 byte(s)
Diff to previous 357956
kqueue: fix conversion of timer data to sbintime

This unbreaks the i386 kqueue timer tests after a recent change switched
NOTE_ABSTIME over to using microseconds. Notably, the data argument (which
holds useconds) is an int64_t, but we were passing it to timer2sbintime
which takes an intptr_t. Perhaps in a previous incarnation, intptr_t would
have made sense, but now it just leads to the timestamp getting truncated
and subsequently rejected when it no longer fits in an intptr_t.

PR:		245768
Reported by:	lwhsu / CI
MFC after:	1 week
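The truncation described above can be reproduced in a small, portable sketch. It assumes a 32-bit intptr_t, as on i386, and models it with an explicit int32_t; the names `i386_intptr_t` and `survives_narrowing()` are hypothetical and do not come from kern_event.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for intptr_t on i386, where it is 32 bits wide. */
typedef int32_t i386_intptr_t;

/*
 * Return nonzero if a 64-bit timer value survives being passed through a
 * 32-bit intptr_t parameter unchanged.  Narrowing an out-of-range value to
 * a signed type is implementation-defined in C; common compilers wrap it.
 */
static int
survives_narrowing(int64_t data)
{
	i386_intptr_t narrowed = (i386_intptr_t)data;

	return ((int64_t)narrowed == data);
}
```

A small relative value such as 1000 round-trips, but an absolute NOTE_ABSTIME value in microseconds since the epoch (on the order of 1.6e15 in 2020) cannot, which is consistent with the rejected timestamps the commit describes.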


Revision 357956
Modified Sat Feb 15 01:30:13 2020 UTC (4 years, 8 months ago) by mjg
File length: 64222 byte(s)
Diff to previous 350421
kqueue: use new capsicum helpers


Revision 350421
Modified Mon Jul 29 20:26:01 2019 UTC (5 years, 3 months ago) by markj
File length: 64209 byte(s)
Diff to previous 341722
Avoid relying on header pollution from sys/refcount.h.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation


Revision 341722
Modified Sat Dec 8 06:34:12 2018 UTC (5 years, 10 months ago) by mjg
File length: 64185 byte(s)
Diff to previous 340900
proc: postpone proc unlock until after reporting with kqueue

kqueue would always relock immediately afterwards.

While here drop the NULL check for list itself. The list is
always allocated.

Sponsored by:	The FreeBSD Foundation


Revision 340900
Modified Sat Nov 24 17:06:01 2018 UTC (5 years, 11 months ago) by markj
File length: 64192 byte(s)
Diff to previous 340899
Pass malloc flags directly through kevent(2) subroutines.

Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9).  Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18318


Revision 340899
Modified Sat Nov 24 17:02:31 2018 UTC (5 years, 11 months ago) by markj
File length: 64324 byte(s)
Diff to previous 340898
Plug some kernel memory disclosures via kevent(2).

The kernel may register for events on behalf of a userspace process,
in which case it must be careful to zero the kevent struct that will be
copied out to userspace.

Reviewed by:	kib
MFC after:	3 days
Security:	kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18317
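The zeroing rule can be illustrated with a minimal sketch. `struct demo_kevent` is a hypothetical structure loosely modeled on struct kevent, chosen so that padding bytes sit between `filter` and `data`; it is not the real layout. Strictly, C leaves padding unspecified after member stores, so the padding check relies on common compiler behaviour.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event structure with padding between 'filter' and 'data'. */
struct demo_kevent {
	uintptr_t	ident;
	short		filter;
	int64_t		data;
};

/*
 * Fill an event that will later be copied out to userspace.  Zeroing the
 * whole object first keeps stale stack contents out of fields and padding
 * that the kernel never sets explicitly.
 */
static void
demo_fill_event(struct demo_kevent *kev, uintptr_t ident, short filter)
{
	memset(kev, 0, sizeof(*kev));
	kev->ident = ident;
	kev->filter = filter;
}

/* Check that the padding bytes between 'filter' and 'data' are zero. */
static int
demo_padding_is_zero(const struct demo_kevent *kev)
{
	const unsigned char *p = (const unsigned char *)kev;
	size_t i;

	for (i = offsetof(struct demo_kevent, filter) + sizeof(kev->filter);
	    i < offsetof(struct demo_kevent, data); i++)
		if (p[i] != 0)
			return (0);
	return (1);
}
```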


Revision 340898
Modified Sat Nov 24 16:58:34 2018 UTC (5 years, 11 months ago) by markj
File length: 64293 byte(s)
Diff to previous 340897
Ensure that knotes do not get registered when KQ_CLOSING is set.

KQ_CLOSING is set before draining the knotes associated with a kqueue,
so we must ensure that new knotes are not added after that point.  In
particular, some kernel facilities may register for events on behalf
of a userspace process and race with a close of the kqueue.

PR:		228858
Reviewed by:	kib
Tested by:	pho
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18316
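The ordering requirement can be sketched as follows, assuming the caller already holds the kqueue lock (locking is not modeled). `DEMO_KQ_CLOSING`, `struct demo_kqueue`, and the EBADF return value are hypothetical choices for illustration, not the kern_event.c definitions.

```c
#include <errno.h>

/* Hypothetical flag mirroring the KQ_CLOSING state named in the log. */
#define DEMO_KQ_CLOSING	0x01

struct demo_kqueue {
	int	kq_state;	/* protected by the kqueue lock (not modeled) */
	int	kq_nknotes;
};

/*
 * Register a knote.  Because the caller holds the kqueue lock, the
 * closing check and the insertion cannot be separated by a concurrent
 * close that sets DEMO_KQ_CLOSING and starts draining.
 */
static int
demo_kq_register(struct demo_kqueue *kq)
{
	if ((kq->kq_state & DEMO_KQ_CLOSING) != 0)
		return (EBADF);
	kq->kq_nknotes++;
	return (0);
}

/* Close path: set the flag first, then drain everything registered. */
static void
demo_kq_close(struct demo_kqueue *kq)
{
	kq->kq_state |= DEMO_KQ_CLOSING;
	kq->kq_nknotes = 0;
}
```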


Revision 340897
Modified Sat Nov 24 16:41:29 2018 UTC (5 years, 11 months ago) by markj
File length: 63999 byte(s)
Diff to previous 340861
Lock the knlist before releasing the in-flux state in knote_fork().

Otherwise there is a window, before iteration is resumed, during which
the knote may be freed.  The in-flux state ensures that the knote will
not be removed from the knlist while locks are dropped.

PR:		228858
Reviewed by:	kib
Tested by:	pho
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18316


Revision 340861
Modified Fri Nov 23 23:10:03 2018 UTC (5 years, 11 months ago) by markj
File length: 63999 byte(s)
Diff to previous 340734
Honour the waitok parameter in kevent_expand().

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18316


Revision 340734
Modified Wed Nov 21 17:32:09 2018 UTC (5 years, 11 months ago) by markj
File length: 64121 byte(s)
Diff to previous 340733
Avoid unsynchronized updates to kn_status.

kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held.  For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.

Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter.  KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.

Reported and tested by:	Sylvain GALLIANO <sg@efficientip.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18060


Revision 340733
Modified Wed Nov 21 17:28:10 2018 UTC (5 years, 11 months ago) by markj
File length: 63926 byte(s)
Diff to previous 336761
Remove KN_HASKQLOCK.

It is a write-only flag whose last use was removed in r302235.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18059


Revision 336761
Modified Fri Jul 27 13:49:17 2018 UTC (6 years, 3 months ago) by dab
File length: 64064 byte(s)
Diff to previous 333856
Allow an EVFILT_TIMER kevent to be updated.

If a timer is updated (re-added) with a different time period
(specified in the .data field of the kevent), the new time period has
no effect; the timer will not expire until the original time has
elapsed. This violates the documented behavior as the kqueue(2) man
page says (in part) "Re-adding an existing event will modify the
parameters of the original event, and not result in a duplicate
entry."

This modification, adapted from a patch submitted by cem@ to PR214987,
fixes the kqueue system to allow updating a timer entry. The
kevent timer behavior is changed to:

  * When a timer is re-added, update the timer parameters and
    restart the timer using the new parameters.
  * Allow updating both active and already expired timers.
  * When the timer has already expired, dequeue any undelivered events
    and clear the count of expirations.

All of these changes address the original PR and also bring the
FreeBSD and macOS kevent timer behaviors into agreement.

A few other changes were made along the way:

  * Update the kqueue(2) man page to reflect the new timer behavior.
  * Fix man page style issues in kqueue(2) diagnosed by igor.
  * Update the timer libkqueue system test to test for the updated
    timer behavior.
  * Fix the (test) libkqueue common.h file so that it includes
    config.h which defines various HAVE_* feature defines, before the
    #if tests for such variables in common.h. This enables the use of
    the actual err(3) family of functions.
  * Fix the usages of the err(3) functions in the tests for incorrect
    type of variables. Those were formerly undiagnosed due to the
    disablement of the err(3) functions (see previous bullet point).

PR:		214987
Reported by:	Brian Wellington <bwelling@xbill.org>
Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D15778
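The three bullet points describing the new re-add behaviour can be modeled in a short sketch. `struct demo_timer` and its fields are hypothetical and only capture the state the commit message mentions (period, pending expirations, queued event); they are not the kern_event.c timer structures.

```c
#include <stdint.h>

/* Hypothetical timer state; not the kern_event.c data structure. */
struct demo_timer {
	int64_t	period;		/* from the .data field of the kevent */
	int64_t	deadline;	/* absolute time of the next expiration */
	int64_t	nexpired;	/* expirations not yet delivered */
	int	queued;		/* nonzero if an event is pending delivery */
};

static void
demo_timer_expire(struct demo_timer *t)
{
	t->nexpired++;
	t->queued = 1;
	t->deadline += t->period;
}

/*
 * Re-adding a timer: adopt the new period, restart from 'now', and
 * discard any expirations that were not yet delivered.
 */
static void
demo_timer_readd(struct demo_timer *t, int64_t now, int64_t new_period)
{
	t->period = new_period;
	t->deadline = now + new_period;
	t->nexpired = 0;
	t->queued = 0;
}
```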


Revision 333856
Modified Sat May 19 05:06:18 2018 UTC (6 years, 5 months ago) by mmacy
File length: 61801 byte(s)
Diff to previous 333840
kevent: annotate unused stack local


Revision 333840
Modified Sat May 19 04:07:00 2018 UTC (6 years, 5 months ago) by mmacy
File length: 61824 byte(s)
Diff to previous 333425
filt_timerdetach: only assign to old if we're going to check it in
a KASSERT


Revision 333425
Modified Wed May 9 18:47:24 2018 UTC (6 years, 5 months ago) by mmacy
File length: 61792 byte(s)
Diff to previous 332122
Eliminate the overhead of gratuitous repeated reinitialization of cap_rights

- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month


Revision 332122
Modified Fri Apr 6 17:35:35 2018 UTC (6 years, 6 months ago) by brooks
File length: 61839 byte(s)
Diff to previous 326271
Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941


Revision 326271
Modified Mon Nov 27 15:20:12 2017 UTC (6 years, 11 months ago) by pfg
File length: 61863 byte(s)
Diff to previous 326184
sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


Revision 326184
Modified Sat Nov 25 04:49:12 2017 UTC (6 years, 11 months ago) by jhb
File length: 61811 byte(s)
Diff to previous 325721
Decode kevent structures logged via ktrace(2) in kdump.

- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
  structures.

  The structure name in the record payload is preceded by a size_t
  containing the size of the individual structures.  Use this to
  replace the previous code that dumped the kevent arrays for
  kevent().  kdump is now able to decode the kevent structures rather
  than dumping their contents via a hexdump.

  One change from before is that the 'changes' and 'events' arrays are
  not marked with separate 'read' and 'write' annotations in kdump
  output.  Instead, the first array is the 'changes' array, and the
  second array (only present if kevent doesn't fail with an error) is
  the 'events' array.  For kevent(), empty arrays are denoted by an
  entry with an array containing zero entries rather than no record.

- Move kevent decoding tables from truss to libsysdecode.

  This adds three new functions to decode members of struct kevent:
  sysdecode_kevent_filter, sysdecode_kevent_flags, and
  sysdecode_kevent_fflags.

  kdump uses these helper functions to pretty-print kevent fields.

- Move structure definitions for freebsd11 and freebsd32 kevent
  structures to <sys/event.h> so that they can be shared with userland.
  The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
  The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
  defined.  The 32-bit freebsd11 structure requires both.

- Decode freebsd11 kevent structures in truss for the compat11.kevent()
  system call.

- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
  system calls.

- While here, constify the 'void *data' argument to ktrstruct().

Reviewed by:	kib (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12470


Revision 325721
Modified Sat Nov 11 18:04:39 2017 UTC (6 years, 11 months ago) by mjg
File length: 62694 byte(s)
Diff to previous 320471
Add pfind_any

It looks for both regular and zombie processes. This avoids allproc relocking
previously seen with pfind -> zpfind calls.


Revision 320471
Modified Thu Jun 29 14:40:33 2017 UTC (7 years, 4 months ago) by kib
File length: 62746 byte(s)
Diff to previous 320043
Do not cast struct kevent_args or struct freebsd11_kevent_args to
struct g_kevent_args.

On some architectures, e.g. PowerPC, there is additional padding in uap.

Reported and tested by:	andreast
Sponsored by:	The FreeBSD Foundation


Revision 320043
Modified Sat Jun 17 00:57:26 2017 UTC (7 years, 4 months ago) by kib
File length: 62380 byte(s)
Diff to previous 320038
Add abstime kqueue(2) timers and expand struct kevent members.

This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.

To make this useful, data member of the struct kevent must be extended
to 64bit.  Using the opportunity, I also added ext members.  This
changes struct kevent almost to Apple struct kevent64, except I did
not change the types of ident and udata; the latter would cause serious
API incompatibilities.

The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).

Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2).  Compat shims
are provided for both host native and compat32.

Requested by:	bapt
Reviewed by:	bapt, brooks, ngie (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D11025


Revision 320038
Modified Fri Jun 16 23:41:13 2017 UTC (7 years, 4 months ago) by kib
File length: 59679 byte(s)
Diff to previous 315238
Style.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11025


Revision 315238
Modified Tue Mar 14 09:25:01 2017 UTC (7 years, 7 months ago) by kib
File length: 59674 byte(s)
Diff to previous 315237
Use designated initializers for kevent_copyops.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 315237
Modified Tue Mar 14 08:45:52 2017 UTC (7 years, 7 months ago) by kib
File length: 59643 byte(s)
Diff to previous 315155
Hide kev_iovlen() definition under #ifdef KTRACE, fixing build of
kernel configs without KTRACE.

Reported by:	rpokala
Sponsored by:	The FreeBSD Foundation
MFC after:	4 days


Revision 315155
Modified Sun Mar 12 13:48:24 2017 UTC (7 years, 7 months ago) by kib
File length: 59622 byte(s)
Diff to previous 312277
Ktracing kevent(2) calls with unusual arguments might lead to
overly large allocation requests.

When ktrace-ing io, sys_kevent() allocates memory to copy the
requested changes and reported events.  Allocations are sized by the
incoming syscall length arguments, which are user-controlled, and
might cause overflow in calculations or overly large allocations.

Since io trace chunks are limited by ktr_geniosize, there is no sense
in even trying to satisfy unbounded allocations.  Export ktr_geniosize
and clamp the buffer sizes in advance.

PR:	217435
Reported by:	Tim Newsham <tim.newsham@nccgroup.trust>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
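The clamping idea can be sketched as below. `max_bytes` plays the role of ktr_geniosize, and `demo_clamp_trace_count()` is a hypothetical helper, not the code from sys_kevent(). Dividing the byte limit by the element size, rather than multiplying the user-supplied count, avoids the overflow the commit message mentions.

```c
#include <stddef.h>

/*
 * Clamp a user-supplied element count so that count * item_size never
 * exceeds max_bytes.  Computing max_bytes / item_size first means no
 * multiplication involving the untrusted count can overflow.
 */
static size_t
demo_clamp_trace_count(int nitems, size_t item_size, size_t max_bytes)
{
	size_t max_items;

	if (nitems <= 0 || item_size == 0)
		return (0);
	max_items = max_bytes / item_size;
	return ((size_t)nitems < max_items ? (size_t)nitems : max_items);
}
```

With a 4096-byte limit and 32-byte elements, at most 128 elements are ever traced, however large the syscall argument is.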


Revision 312277
Modified Mon Jan 16 08:25:33 2017 UTC (7 years, 9 months ago) by hiren
File length: 59399 byte(s)
Diff to previous 311040
Add kevent EVFILT_EMPTY for notification when a client has received all
data, i.e. everything outstanding has been acked.

Reviewed by:	bz, gnn (previous version)
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D9150


Revision 311040
Modified Mon Jan 2 01:23:21 2017 UTC (7 years, 9 months ago) by markj
File length: 59339 byte(s)
Diff to previous 310617
Factor out instances of a knote detach followed by a knote_drop() call.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D9015


Revision 310617
Modified Mon Dec 26 19:33:40 2016 UTC (7 years, 10 months ago) by kib
File length: 59613 byte(s)
Diff to previous 310615
Make knote KN_INFLUX state counted.  This is the final fix for the issue
closed by r310302 for knote().

If the KN_INFLUX | KN_SCAN flags are set for the note passed to knote() or
knote_fork(), i.e. the knote is being scanned, we might erroneously clear
INFLUX when finishing notification.  For normal knote() it was fixed
in r310302 simply by remembering that we do not own
KN_INFLUX, since there we own the knlist lock and the scan thread cannot
clear KN_INFLUX until we drop the lock.  For knote_fork(), the situation is
more complicated: we must drop the knlist lock, AKA the process lock, since
we need to register new knotes.

Change KN_INFLUX into counter and allow shared ownership of the
in-flux state between scan and knote_fork() or knote().  Both in-flux
setters need to ensure that knote is not dropped in parallel.  Added
assert about kn_influx == 1 in knote_drop() verifies that in-flux state
is not shared when knote is destroyed.

Since KBI of the struct knote is changed by addition of the int
kn_influx field, reorder kn_hook and kn_hookid to fill pad on LP64
arches [1].  This keeps sizeof(struct knote) to same 128 bytes as it
was before addition of kn_influx, on amd64.

Reviewed by:	markj
Suggested by:	markj [1]
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D8898
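A minimal sketch of why a counter fixes the shared-ownership problem: with a single flag, the second owner to finish would clear the state while the first still relies on it. The names `demo_enter_flux()`, `demo_leave_flux()`, and `demo_can_drop()` are hypothetical, and the locking that protects the real kn_influx field is omitted.

```c
#include <assert.h>

/* Hypothetical knote with a counted in-flux state. */
struct demo_knote {
	int	kn_influx;	/* number of current in-flux owners */
};

static void
demo_enter_flux(struct demo_knote *kn)
{
	kn->kn_influx++;
}

/* Returns nonzero when the last owner leaves the in-flux state. */
static int
demo_leave_flux(struct demo_knote *kn)
{
	assert(kn->kn_influx > 0);
	return (--kn->kn_influx == 0);
}

/* Like the assertion described for knote_drop(): only a sole owner drops. */
static int
demo_can_drop(const struct demo_knote *kn)
{
	return (kn->kn_influx == 1);
}
```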


Revision 310615
Modified Mon Dec 26 19:28:10 2016 UTC (7 years, 10 months ago) by kib
File length: 59571 byte(s)
Diff to previous 310613
Change knlist_destroy() to assert that the knlist is empty instead of
accepting the wrong state and printing a warning.  Do not obliterate
the kl_lock and kl_unlock pointers; they are often useful for post-mortem
analysis.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
X-Differential revision:	https://reviews.freebsd.org/D8898


Revision 310613
Modified Mon Dec 26 19:26:40 2016 UTC (7 years, 10 months ago) by kib
File length: 59823 byte(s)
Diff to previous 310554
Style.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D8898


Revision 310554
Modified Sun Dec 25 19:49:35 2016 UTC (7 years, 10 months ago) by kib
File length: 59814 byte(s)
Diff to previous 310552
Some optimizations for kqueue timers.

There is no need to do two allocations per kqueue timer. Gather all
data needed by the timer callout into the structure and allocate it at
once.

Use the structure to preserve the result of timer2sbintime(), to not
perform repeated 64bit calculations in callout.

Remove tautological casts.
Remove now unused p_nexttime [1].

Noted by:	markj [1]
Reviewed by:	markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-MFC note:	do not remove p_nexttime
Differential revision:	https://reviews.freebsd.org/D8901


Revision 310552
Modified Sun Dec 25 19:38:07 2016 UTC (7 years, 10 months ago) by kib
File length: 59952 byte(s)
Diff to previous 310302
Some style.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D8901


Revision 310302
Modified Mon Dec 19 22:18:36 2016 UTC (7 years, 10 months ago) by kib
File length: 59942 byte(s)
Diff to previous 310159
Do not clear KN_INFLUX when not owning influx state.

For notes in KN_INFLUX|KN_SCAN state, the influx bit is set by a
parallel scan.  When knote() reports event for the vnode filters,
which require kqueue unlocked, it unconditionally sets and then clears
influx to keep note around kqueue unlock.  There, do not clear influx
flag if a scan set it, since we do not own it, instead we prevent scan
from executing by holding knlist lock.

The knote_fork() function has a somewhat similar problem: it might set
KN_INFLUX for a scanned note, drop the kqueue and list locks, and then
clear the flag after relocking.  The solution there, and its test program,
would be different enough that the reported issue is closed first.

Reported and test case provided by:	yjh0502@gmail.com
PR:	214923
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 310159
Modified Fri Dec 16 17:41:20 2016 UTC (7 years, 10 months ago) by kib
File length: 59834 byte(s)
Diff to previous 302936
Switch from stdatomic.h to atomic.h for kernel.

Apparently stdatomic.h implementation for gcc 4.2 on sparc64 does not
work properly.  This effectively reverts r251803.

Reported and tested by:	lidl
Discussed with:	ed
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 302936
Modified Sat Jul 16 13:24:58 2016 UTC (8 years, 3 months ago) by kib
File length: 59995 byte(s)
Diff to previous 302308
Another issue reported on http://seclists.org/oss-sec/2016/q3/68 is
that struct kevent member ident has uintptr_t type, which is silently
truncated to int in the call to fget().  Explicitly check for the
valid range.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
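The range check can be sketched as a small conversion helper. `demo_ident_to_fd()` is hypothetical, and returning EBADF for an out-of-range ident is an assumption made for illustration. Without the check, an ident such as 0x100000005 on LP64 could be silently narrowed to descriptor 5.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Convert a kevent ident to a file descriptor, rejecting values that do
 * not fit in an int instead of letting the conversion truncate them.
 */
static int
demo_ident_to_fd(uintptr_t ident, int *fdp)
{
	if (ident > INT_MAX)
		return (EBADF);
	*fdp = (int)ident;
	return (0);
}
```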


Revision 302308
Modified Fri Jul 1 20:11:28 2016 UTC (8 years, 3 months ago) by kib
File length: 59940 byte(s)
Diff to previous 302242
When a process knote is attached to a process that is already exiting,
the knote is activated immediately.  If exit1() later activates
knotes, an attempt is made to activate such a knote a second time.  Detect
the condition by the zeroed kn_ptr.p_proc pointer, and avoid the excess
activation.

Before r302235, such knotes were removed from the knlist immediately
upon activation.

Reported by:	truckman
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)


Revision 302242
Modified Mon Jun 27 23:34:53 2016 UTC (8 years, 4 months ago) by kib
File length: 59865 byte(s)
Diff to previous 302235
Fix userspace build after r302235: do not expose bool field of the
structure, change it to int.

The real fix is to sanitize user-visible definitions in sys/event.h,
e.g. the affected struct knlist is of no use for userspace programs.

Reported and tested by:	jkim
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)


Revision 302235
Modified Mon Jun 27 21:52:17 2016 UTC (8 years, 4 months ago) by kib
File length: 59872 byte(s)
Diff to previous 300627
When filt_proc() removes event from the knlist due to the process
exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
- knote kn_knlist pointer is reset
- INFLUX knote is removed from the process knlist.
And, there are two consequences:
- KN_LIST_UNLOCK() on such knote is nop
- there is nothing which would block exit1() from processing past the
  knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
Both consequences result either in leaked process lock, or
dereferencing NULL function pointers for locking.

Handle this by no longer embedding the process knlist in struct proc.
Instead, the knlist is allocated together with struct proc, but marked
as autodestroy on the zombie reap by the knlist_detach() function.  The
knlist is freed when the last kevent is removed from the list, in
particular at zombie reap time if the list is empty.  As a result,
knlist_remove_inevent() is no longer needed and is removed.

Other changes:

In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
notifications.  The flags leak resulted in excessive
NOTE_EXEC/NOTE_FORK reports.

Fix immediate note activation in filt_procattach().  The condition should
be either the immediate NOTE_CHILD activation or the immediate NOTE_EXIT
report for an exiting process.

In knote_fork(), do not perform a racy check for KN_INFLUX before the kq
lock is taken.  Besides being racy, it did not account for notes
just added by scan (KN_SCAN).

Some minor and incomplete style fixes.

Analyzed and tested by:	Eric Badger <eric@badgerio.us>
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D6859
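The autodestroy lifetime rule can be modeled in a sketch that ignores locking and the knotes themselves. `struct demo_knlist`, `demo_knlist_detach()`, and `demo_knlist_remove()` are hypothetical names; only the rule "mark at reap, free when the last kevent leaves (or immediately if empty)" comes from the commit message.

```c
#include <stdlib.h>

/* Hypothetical separately allocated knlist with an autodestroy mark. */
struct demo_knlist {
	int	nknotes;
	int	autodestroy;
};

/* Free the list if it has been detached and holds no knotes. */
static int
demo_knlist_maybe_free(struct demo_knlist **lp)
{
	struct demo_knlist *l = *lp;

	if (l->autodestroy && l->nknotes == 0) {
		free(l);
		*lp = NULL;
		return (1);
	}
	return (0);
}

/* Zombie reap: mark the list for autodestroy instead of destroying it. */
static int
demo_knlist_detach(struct demo_knlist **lp)
{
	(*lp)->autodestroy = 1;
	return (demo_knlist_maybe_free(lp));
}

/* Removing the last knote from a detached list frees the list. */
static int
demo_knlist_remove(struct demo_knlist **lp)
{
	(*lp)->nknotes--;
	return (demo_knlist_maybe_free(lp));
}
```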


Revision 300627
Modified Tue May 24 21:13:33 2016 UTC (8 years, 5 months ago) by kib
File length: 59715 byte(s)
Diff to previous 296775
Silence false LOR report due to the taskqueue mutex and kqueue lock
named the same.

Reported by:	Doug Luce <doug@freebsd.con.com>
Sponsored by:	The FreeBSD Foundation


Revision 296775
Modified Sat Mar 12 23:02:53 2016 UTC (8 years, 7 months ago) by gibbs
File length: 59707 byte(s)
Diff to previous 295786
Provide high precision conversion from ns,us,ms -> sbintime in kevent

In timer2sbintime(), calculate the second and fractional second portions of
the sbintime separately. When calculating the fractional second portion,
use a 64bit multiply to prevent excess truncation. This avoids the ~7% error
in the original conversion for ns, and smaller errors of the same type for us
and ms.

PR: 198139
Reviewed by: jhb
MFC after: 1 week
Differential Revision:    https://reviews.freebsd.org/D5397
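The source of the ~7% error and the split-and-multiply fix can be shown in a sketch. sbintime_t is a 32.32 fixed-point count of seconds; the `DEMO_*` constants are redefined locally in the spirit of SBT_1S and SBT_1NS, and neither function is the actual timer2sbintime() code. Non-negative inputs are assumed.

```c
#include <stdint.h>

/* 32.32 fixed-point seconds, as sbintime_t is in FreeBSD. */
typedef int64_t demo_sbintime_t;

#define DEMO_SBT_1S	((demo_sbintime_t)1 << 32)
/* 2^32 / 10^9 is about 4.295; integer division truncates it to 4. */
#define DEMO_SBT_1NS	(DEMO_SBT_1S / 1000000000)

/* Naive conversion: scale by the truncated per-nanosecond constant. */
static demo_sbintime_t
demo_ns_to_sbt_naive(int64_t ns)
{
	return (ns * DEMO_SBT_1NS);
}

/*
 * Split into whole seconds and a fractional remainder.  The remainder is
 * below 10^9 < 2^30, so shifting it left by 32 stays below 2^62 and the
 * 64-bit intermediate cannot overflow.  Assumes ns >= 0.
 */
static demo_sbintime_t
demo_ns_to_sbt_precise(int64_t ns)
{
	int64_t sec = ns / 1000000000;
	int64_t frac = ns % 1000000000;

	return (sec * DEMO_SBT_1S + (frac << 32) / 1000000000);
}
```

For one second of nanoseconds the naive version yields 4000000000 instead of 4294967296, about 6.9% short, which matches the "~7% error" described above.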


Revision 295786
Modified Fri Feb 19 01:49:33 2016 UTC (8 years, 8 months ago) by markj
File length: 58816 byte(s)
Diff to previous 295785
Ensure that we test the event condition when a disabled kevent is enabled.

r274560 modified kqueue_register() to only test the event condition if the
corresponding knote is not disabled. However, this check takes place before
the EV_ENABLE flag is used to clear the KN_DISABLED flag on the knote, so
enabling a previously-disabled kevent would not result in a notification for
a triggered event. This change fixes the problem by testing for EV_ENABLED
before possibly checking the event condition.

This change also updates a kqueue regression test to exercise this case.

PR:		206368
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D5307


Revision 295785
Modified Fri Feb 19 01:35:01 2016 UTC (8 years, 8 months ago) by markj
File length: 58884 byte(s)
Diff to previous 295012
Return an error if both EV_ENABLE and EV_DISABLE are specified for a kevent.

Currently, this combination results in EV_DISABLE being ignored.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D5307
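The validation can be sketched as a single mask comparison. The `DEMO_EV_*` values are chosen for illustration only (the real constants come from <sys/event.h>), and returning EINVAL is an assumption about the error code.

```c
#include <errno.h>

/* Illustrative flag values; the real ones are defined in <sys/event.h>. */
#define DEMO_EV_ENABLE	0x0004
#define DEMO_EV_DISABLE	0x0008

/* Reject a change that asks to both enable and disable the same event. */
static int
demo_check_enable_flags(unsigned short flags)
{
	if ((flags & (DEMO_EV_ENABLE | DEMO_EV_DISABLE)) ==
	    (DEMO_EV_ENABLE | DEMO_EV_DISABLE))
		return (EINVAL);
	return (0);
}
```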


Revision 295012
Modified Thu Jan 28 20:24:15 2016 UTC (8 years, 9 months ago) by vangyzen
File length: 58790 byte(s)
Diff to previous 288145
kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT

NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent
pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT.
Do not let the two events be combined, since one would overwrite
the other's data.

PR:		180385
Submitted by:	David A. Bright <david_a_bright@dell.com>
Reviewed by:	jhb
MFC after:	1 month
Sponsored by:	Dell Inc.
Differential Revision:	https://reviews.freebsd.org/D4900


Revision 288145
Modified Wed Sep 23 12:45:08 2015 UTC (9 years, 1 month ago) by mjg
File length: 57415 byte(s)
Diff to previous 287366
kqueue: simplify kern_kqueue by not refing/unrefing creds too early

No functional changes.


Revision 287366
Modified Tue Sep 1 14:05:29 2015 UTC (9 years, 1 month ago) by kib
File length: 57503 byte(s)
Diff to previous 287362
Exit notification for EVFILT_PROC removes the knote from the knlist.  In
particular, this invalidates the knote's kn_link linkage, causing the
SLIST_FOREACH() loop to access undefined values (e.g. trashed by
QUEUE_MACRO_DEBUG).  If the knote is freed by another thread when the kq
lock is released or when influx is cleared, e.g. by knote_scan() for
the kqueue owning the knote, the iteration step would access freed memory.

Use SLIST_FOREACH_SAFE() to fix iteration.

Diagnosed by:	avg
Tested by:	avg, lstewart, pawel
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
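The pattern behind SLIST_FOREACH_SAFE() can be shown with a hand-written list, since not every system's <sys/queue.h> provides the _SAFE variants. `struct demo_node`, `demo_push()`, and `demo_remove_matching()` are hypothetical; the point is that the successor is saved before the current element can be unlinked or freed.

```c
#include <stdlib.h>

/* Minimal singly linked list node; a stand-in for a knote on a knlist. */
struct demo_node {
	struct demo_node	*next;
	int			 key;
};

/*
 * Remove and free every node whose key matches.  'tmp' caches the
 * successor before 'n' may be freed; reading n->next after free(n), as a
 * plain SLIST_FOREACH() loop effectively would, is a use-after-free.
 */
static int
demo_remove_matching(struct demo_node **headp, int key)
{
	struct demo_node **prevp = headp;
	struct demo_node *n, *tmp;
	int removed = 0;

	for (n = *headp; n != NULL; n = tmp) {
		tmp = n->next;		/* cache the successor first */
		if (n->key == key) {
			*prevp = tmp;	/* unlink */
			free(n);
			removed++;
		} else
			prevp = &n->next;
	}
	return (removed);
}

/* Push a node at the head; returns 0 on success, -1 on allocation failure. */
static int
demo_push(struct demo_node **headp, int key)
{
	struct demo_node *n = malloc(sizeof(*n));

	if (n == NULL)
		return (-1);
	n->key = key;
	n->next = *headp;
	*headp = n;
	return (0);
}
```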


Revision 287362
Modified Tue Sep 1 13:21:32 2015 UTC (9 years, 1 month ago) by kib
File length: 57547 byte(s)
Diff to previous 286681
Clean up the kqueue use of the uma KPI.

Explain why it is fine to not check for M_NOWAIT failures in
kqueue_register().  Remove unneeded check for NULL result from
waitable allocation in kqueue_scan().  uma_free(9) handles NULL
argument correctly, remove checks for NULL.  Remove useless cast and
adjust style in knote_alloc().

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks


Revision 286681
Modified Wed Aug 12 17:46:26 2015 UTC (9 years, 2 months ago) by ed
File length: 57509 byte(s)
Diff to previous 286631
Perform cleanups in response to D3307.

- Document the kern_kevent_anonymous() function.
- Add assertions to ensure that we don't silently leave the kqueue
  linked from a file descriptor table.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3364


Revision 286631
Modified Tue Aug 11 13:47:23 2015 UTC (9 years, 2 months ago) by ed
File length: 57257 byte(s)
Diff to previous 286309
Add support for anonymous kqueues.

CloudABI's polling system calls merge the concept of one-shot polling
(poll, select) and stateful polling (kqueue). They share the same data
structures.

Extend FreeBSD's kqueue to provide support for waiting for events on an
anonymous kqueue. Unlike stateful polling, there is no need to support
timeouts, as an additional timer event could be used instead.
Furthermore, it makes no sense to use a different number of input and
output kevents. Merge this into a single argument.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3307


Revision 286309
Modified Wed Aug 5 07:36:50 2015 UTC (9 years, 2 months ago) by ed
File length: 56460 byte(s)
Diff to previous 285670
Allow the creation of kqueues with a restricted set of Capsicum rights.

On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending on OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259


Revision 285670
Modified Sat Jul 18 09:02:50 2015 UTC (9 years, 3 months ago) by kib
File length: 56418 byte(s)
Diff to previous 284215
The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains the complete status
and is copied out into si_status.
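
To make the truncation concrete: the traditional packed wait status has room for only eight bits of exit code, whereas a separate full-width field can carry the whole value through to si_status. This is a user-space sketch; old_xstat(), exit_from_xstat() and struct exitinfo are hypothetical, and the packing follows the conventional W_EXITCODE() layout rather than quoting KW_EXITCODE() itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Conventional wait-status packing: exit code in bits 8..15, signal in
 * the low bits.  Only the low 8 bits of the exit code fit.
 */
static int
old_xstat(int xexit, int xsig)
{
	return ((xexit & 0xff) << 8 | xsig);
}

/* What a WEXITSTATUS()-style decode of the packed value recovers. */
static int
exit_from_xstat(int xstat)
{
	return ((xstat >> 8) & 0xff);
}

/* Hypothetical split fields, modelled on p_xexit and p_xsig. */
struct exitinfo {
	int xexit;	/* full value passed to _exit(2) */
	int xsig;	/* terminating signal, 0 for a normal exit */
};
```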

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation


Revision 284215
Modified Wed Jun 10 10:48:12 2015 UTC (9 years, 4 months ago) by mjg
File length: 56394 byte(s)
Diff to previous 283440
Implement lockless resource limits.

Use the same scheme implemented to manage credentials.

Code needing to look at a process's credentials (as opposed to a thread's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.


Revision 283440
Modified Sun May 24 16:36:29 2015 UTC (9 years, 5 months ago) by dchagin
File length: 56458 byte(s)
Diff to previous 283291
For future use in the Linuxulator:

1. Add a kern_kqueue() counterpart for kqueue() with flags parameter.

2. Be a bit more secure. To avoid a double fp lookup, add a kern_kevent_fp()
counterpart for kern_kevent() with a file pointer parameter instead
of a file descriptor and pass the buck to it.

Suggested by: mjg [2]

Differential Revision:	https://reviews.freebsd.org/D1091
Reviewed by:	trasz


Revision 283291
Modified Fri May 22 17:05:21 2015 UTC (9 years, 5 months ago) by jkim
File length: 56133 byte(s)
Diff to previous 274560
CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks


Revision 274560
Modified Sun Nov 16 01:18:41 2014 UTC (9 years, 11 months ago) by jmg
File length: 56146 byte(s)
Diff to previous 272528
prevent doing filter ops locking for statically compiled filter ops...
This significantly reduces lock contention when adding/removing knotes
on busy multi-kq system...  Next step is to cache these references per
kq.. i.e. kq refs it once and keeps a local ref count so that the same
refs don't get accessed by many cpus...

only allocate a knote when we might use it...

Add a new flag, _FORCEONESHOT..  This allows a thread to force the
delivery of another event in a safe manner, say waking up an idle http
connection to force it to be reaped...

If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's
disabled, so won't be delivered anyways..

Tested by:	adrian


Revision 272528
Modified Sat Oct 4 15:59:15 2014 UTC (10 years ago) by ian
File length: 55636 byte(s)
Diff to previous 271976
Make kevent(2) periodic timer events more reliably periodic.  The event
callout is now scheduled using the C_ABSOLUTE flag, and the absolute time
of each event is calculated as the time the previous event was scheduled
for plus the interval.  This ensures that latency in processing a given
event doesn't perturb the arrival time of any subsequent events.
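
The difference between re-arming relative to "now" and re-arming from the previous scheduled time can be shown with plain integer arithmetic. In this sketch (hypothetical function names, abstract time units, a constant handling latency per event) the relative schedule drifts by the latency on every period, while the absolute schedule stays on the interval grid.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Firing time of the n-th periodic event (n >= 1) when each event is
 * handled `latency` units late and the timer is re-armed from the time
 * the handler ran.
 */
static int64_t
nth_fire_relative(int64_t start, int64_t interval, int64_t latency, int n)
{
	int64_t t = start + interval;		/* first event */

	for (int i = 1; i < n; i++)
		t = (t + latency) + interval;	/* re-arm from "now" */
	return (t);
}

/*
 * Same, but re-armed from the previous scheduled time, as with an
 * absolute callout: handling latency does not accumulate.
 */
static int64_t
nth_fire_absolute(int64_t start, int64_t interval, int64_t latency, int n)
{
	int64_t t = start + interval;		/* first event */

	(void)latency;				/* does not affect the schedule */
	for (int i = 1; i < n; i++)
		t = t + interval;		/* previous target + interval */
	return (t);
}
```

With a start of 0, an interval of 100 and a latency of 3, the tenth event lands at 1027 under relative re-arming but at 1000 under absolute re-arming.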

Reviewed by:	jhb


Revision 271976
Modified Mon Sep 22 16:20:47 2014 UTC (10 years, 1 month ago) by jhb
File length: 55387 byte(s)
Diff to previous 271489
Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)


Revision 271489
Modified Fri Sep 12 21:29:10 2014 UTC (10 years, 1 month ago) by jhb
File length: 55143 byte(s)
Diff to previous 268843
Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
  and invfo_kqfilter() for use by file types that do not support the
  respective operations.  Home-grown versions of invfo_poll() were
  universally broken (they returned an errno value; invfo_poll()
  uses poll_no_poll() to return an appropriate event mask).  Home-grown
  ioctl routines also tended to return an incorrect errno (invfo_ioctl
  returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
  unsupported file operations.
- Reorder fileops members to match the order in the structure definition
  to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
  layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat().  Most
  of these used invfo_*(), but a dummy fo_stat() implementation was
  added.


Revision 268843
Modified Fri Jul 18 14:27:04 2014 UTC (10 years, 3 months ago) by bapt
File length: 55763 byte(s)
Diff to previous 264388
Extend kqueue's EVFILT_TIMER by adding precision unit flags support

Define the precision macros as bit sets to conform with the XNU equivalent.
Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag
is passed.
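
A sketch of the kind of fflags validation described above, in user-space C. The EX_* flag names and values are illustrative single-bit constants, not copies of the <sys/event.h> definitions; treating "no unit flag" as milliseconds follows the traditional EVFILT_TIMER default, and rejecting more than one unit flag at once is an assumption of this sketch. The result is expressed in nanoseconds rather than sbintime_t to keep the arithmetic simple.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative unit flags, one bit each. */
#define EX_SECONDS	0x01u
#define EX_MSECONDS	0x02u
#define EX_USECONDS	0x04u
#define EX_NSECONDS	0x08u
#define EX_UNITMASK	(EX_SECONDS | EX_MSECONDS | EX_USECONDS | EX_NSECONDS)

/*
 * Convert a timer value to nanoseconds according to fflags.  No unit
 * flag means milliseconds.  Unknown bits, several unit bits, negative
 * values and values that would overflow are rejected with EINVAL.
 */
static int
timer_to_ns(int64_t data, unsigned fflags, int64_t *ns)
{
	int64_t mult;

	if ((fflags & ~EX_UNITMASK) != 0)
		return (EINVAL);		/* unknown flag bits */
	switch (fflags) {
	case EX_SECONDS:	mult = 1000000000; break;
	case 0:
	case EX_MSECONDS:	mult = 1000000; break;
	case EX_USECONDS:	mult = 1000; break;
	case EX_NSECONDS:	mult = 1; break;
	default:		return (EINVAL);	/* several units */
	}
	if (data < 0 || data > INT64_MAX / mult)
		return (EINVAL);
	*ns = data * mult;
	return (0);
}
```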

Phabric:	https://phabric.freebsd.org/D421
Reviewed by:	kib


Revision 264388
Modified Sat Apr 12 23:29:29 2014 UTC (10 years, 6 months ago) by davide
File length: 55218 byte(s)
Diff to previous 264231
Hide internal details of sbintime_t implementation wrapping INT64_MAX into
SBT_MAX, to make it more robust in case internal type representation will
change in the future. All the consumers were migrated to SBT_MAX and
every new consumer (if any) should from now on use this interface.

Requested by:	bapt, jmg, Ryan Lortie (implicitly)
Reviewed by:	mav, bde


Revision 264231
Modified Mon Apr 7 18:10:49 2014 UTC (10 years, 6 months ago) by ed
File length: 55222 byte(s)
Diff to previous 264146
Implement kqueue(2) for procdesc(4).

kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that
behaves the same, but operates on a procdesc(4) instead. Only implement
NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also
returns the exit status of the process, meaning that we can now obtain
this value, even if pdwait4(2) is still unimplemented.

Notes:

- Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will
  be used on totally different descriptor types, this should not clash.

- Let procdesc_kqops_event() reuse the same structure as filt_proc().
  The only difference is that procdesc_kqops_event() should also be able
  to deal with the case where the process was already terminated after
  registration. Simply test this when hint == 0.

- Fix some style(9) issues in filt_proc() to keep it consistent with the
  newly added procdesc_kqops_event().

- Save the exit status of the process in pd->pd_xstat, as we cannot pick
  up the proctree_lock from within procdesc_kqops_event().

Discussed on:	arch@
Reviewed by:	kib@


Revision 264146
Modified Sat Apr 5 14:09:16 2014 UTC (10 years, 6 months ago) by kib
File length: 55242 byte(s)
Diff to previous 263233
When KN_INFLUX is set on the knote due to kqueue_register() or
kqueue_scan() unlocking the kqueue to call f_event, knote() or
knote_fork() should not skip the knote.  The knote is not going to
disappear during the influx time, and the mutual exclusion between
scan and knote() is ensured by both code paths taking the knlist lock.
The race appears because the knlist lock is ordered before the kq lock,
so KN_INFLUX must be set, the kq lock must be dropped, and only then can
the knlist lock be
taken.  The window between kq unlock and knlist lock causes lost
events.

Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe
for the knote(), and check for it to ignore KN_INFLUX in the knote*()
as needed.  Also, in knote(), remove the lockless check for the
KN_INFLUX flag, which could also result in the lost notification.
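
The change in the skip test can be reduced to a bit-mask comparison. The EX_INFLUX and EX_SCAN names below are hypothetical stand-ins for KN_INFLUX and KN_SCAN, and the functions only model the decision, not the locking around it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bits modelled on the knote flags named above. */
#define EX_INFLUX	0x01	/* knote being modified, kq lock dropped */
#define EX_SCAN		0x02	/* influx set by scan/register around f_event */

/* Before: any influx knote is skipped, so events can be lost. */
static bool
must_skip_old(int status)
{
	return ((status & EX_INFLUX) != 0);
}

/*
 * After: an influx knote is skipped only when the influx did not come
 * from scan, since scan and knote() already serialize on the knlist lock.
 */
static bool
must_skip_new(int status)
{
	return ((status & (EX_INFLUX | EX_SCAN)) == EX_INFLUX);
}
```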

Reported and tested by:	Kohji Okuno <okuno.kohji@jp.panasonic.com>
Discussed with:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 263233
Modified Sun Mar 16 10:55:57 2014 UTC (10 years, 7 months ago) by rwatson
File length: 54940 byte(s)
Diff to previous 260805
Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks


Revision 260805
Modified Fri Jan 17 05:15:44 2014 UTC (10 years, 9 months ago) by adrian
File length: 54942 byte(s)
Diff to previous 260384
Add in a default initialiser for the EVOPS_SENDFILE kqueue filterops.

Sponsored by:	Netflix, Inc.


Revision 260384
Modified Tue Jan 7 01:17:27 2014 UTC (10 years, 9 months ago) by adrian
File length: 54898 byte(s)
Diff to previous 259633
Add a compile-time control over the size of KN_HASHSIZE.

This is needed for applications that use a lot of non-filedescriptor
knotes.
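
A compile-time knob of this kind is usually an #ifndef-guarded default that can be overridden from the compiler command line (for example -DKN_HASHSIZE=1024). The sketch below assumes a default of 64 and, as a simplification of its own, requires a power of two so that the bucket index is a mask; kn_bucket() is a hypothetical helper, not the kernel's hash function.

```c
#include <assert.h>

/* Default table size; override at build time with -DKN_HASHSIZE=N. */
#ifndef KN_HASHSIZE
#define KN_HASHSIZE	64
#endif

/* This sketch's simplification: power-of-two sizes allow masking. */
_Static_assert((KN_HASHSIZE & (KN_HASHSIZE - 1)) == 0,
    "KN_HASHSIZE must be a power of two in this sketch");

/* Map a non-fd identifier to a bucket of the knote hash table. */
static unsigned
kn_bucket(unsigned long ident)
{
	return ((unsigned)(ident & (KN_HASHSIZE - 1)));
}
```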

MFC after:	1 week
Sponsored by:	Netflix, Inc.


Revision 259633
Modified Thu Dec 19 21:35:33 2013 UTC (10 years, 10 months ago) by se
File length: 54846 byte(s)
Diff to previous 259609
Fix compilation on 32 bit architectures and use INT64_MAX instead of
LONG_MAX for the upper bound check.


Revision 259609
Modified Thu Dec 19 09:01:46 2013 UTC (10 years, 10 months ago) by se
File length: 54823 byte(s)
Diff to previous 258181
Fix overflow for timeout values of more than 68 years, which is the maximum
covered by sbintime (LONG_MAX seconds).

Some programs use timeout values in excess of 1000 years. The conversion
to sbintime caused wrap-around on overflow, which resulted in short or
negative timeout values. This caused long delays on sockets opened by
affected programs (e.g. OpenSSH).

Kernels compiled without -fno-strict-overflow were not affected, apparently
because the compiler tested the sign of the timeout value before performing
the multiplication that led to overflow.

When the -fno-strict-overflow option was added to CFLAGS, this optimization
was disabled and the test was performed on the result of the multiplication.
Negative products were caught and resulted in EINVAL being returned, but
wrap-around to positive values just shortened the timeout value to the
residue of the result that could be represented by sbintime.

The fix is to cap the timeout values at the maximum that can be represented
by sbintime, which is 2^31 - 1 seconds or more than 68 years.
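
The capping approach can be sketched in user-space C. Here sbintime is modelled as a 32.32 fixed-point int64_t (EX_SBT_1S is one second, i.e. 2^32); the seconds are compared against INT32_MAX before any multiplication, so the conversion never relies on signed overflow. ts_to_sbt_capped() and ex_sbintime_t are hypothetical names, and this is not the kernel's conversion routine.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ex_sbintime_t;	/* 32.32 fixed point: seconds in high half */

#define EX_SBT_1S	((ex_sbintime_t)1 << 32)

/*
 * Convert a non-negative seconds + nanoseconds timeout to sbintime,
 * clamping at INT64_MAX instead of letting the multiplication wrap.
 * The range check runs first, so no overflowing product is computed.
 */
static ex_sbintime_t
ts_to_sbt_capped(int64_t sec, int64_t nsec)
{
	if (sec >= INT32_MAX)
		return (INT64_MAX);	/* about 68 years or more */
	return (sec * EX_SBT_1S + (nsec * EX_SBT_1S) / 1000000000);
}
```

A 1000-year timeout, as used by the programs mentioned above, clamps to INT64_MAX instead of wrapping to a short or negative value.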

After this change, the kernel can be compiled with -fno-strict-overflow
with no ill effects.

MFC after:	3 days


Revision 258181
Modified Fri Nov 15 19:55:35 2013 UTC (10 years, 11 months ago) by pjd
File length: 54771 byte(s)
Diff to previous 257597
Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had
a very hard time fully understanding) with much more intuitive rights:

	CAP_EVENT - When set on a descriptor, the descriptor can be monitored
		with syscalls like select(2), poll(2), kevent(2).

	CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2)
		syscall can be called on this kqueue with the eventlist
		argument set to a non-NULL value; in other words the given
		kqueue descriptor can be used to monitor other descriptors.
	CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2)
		syscall can be called on this kqueue with the changelist
		argument set to a non-NULL value; in other words it allows
		modifying the events monitored with the given kqueue descriptor.

Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and
CAP_KQUEUE_CHANGE.

Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days


Revision 257597
Modified Sun Nov 3 23:06:24 2013 UTC (10 years, 11 months ago) by jilles
File length: 54644 byte(s)
Diff to previous 256849
kqueue: Change error for kqueues rlimit from EMFILE to ENOMEM and document
this error condition in the kqueue(2) manual page.

Discussed with:	kib


Revision 256849
Modified Mon Oct 21 16:44:53 2013 UTC (11 years ago) by kib
File length: 54644 byte(s)
Diff to previous 255882
Add a resource limit for the total number of kqueues available to the
user.  Kqueue now saves the ucred of the allocating thread, to
correctly decrement the counter on close.

Under some specific and unrealistic use scenarios for kqueue, it is
possible for the kqueues to consume memory proportional to the square
of the number of file descriptors available to the process.  The limit
allows the administrator to prevent such abuse.
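
The accounting scheme can be sketched as follows: the creating credential is charged and remembered in the kqueue, and close uncharges that saved credential rather than whichever one the closing thread holds. All names are hypothetical, and the sketch returns ENOMEM on exhaustion, which is the error adopted by the later r257597 rather than the EMFILE originally used.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-user accounting record. */
struct ex_cred {
	int kqcnt;	/* kqueues currently charged to this user */
	int kqlimit;	/* administrator-configured maximum */
};

/* Hypothetical kqueue that remembers who paid for it. */
struct ex_kqueue {
	struct ex_cred *cred;	/* saved at allocation time */
};

static int
ex_kqueue_create(struct ex_cred *cr, struct ex_kqueue *kq)
{
	if (cr->kqcnt >= cr->kqlimit)
		return (ENOMEM);
	cr->kqcnt++;
	kq->cred = cr;
	return (0);
}

static void
ex_kqueue_close(struct ex_kqueue *kq)
{
	kq->cred->kqcnt--;	/* uncharge the saved, not current, cred */
	kq->cred = NULL;
}
```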

This is kernel-mode side of the change, with the user-mode enabling
commit following.

Reported and tested by:	pho
Discussed with:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks


Revision 255882
Modified Thu Sep 26 13:17:31 2013 UTC (11 years, 1 month ago) by kib
File length: 54197 byte(s)
Diff to previous 255798
Do not allow negative timeouts for kqueue timers, check for the
negative timeout both before and after the conversion to sbintime_t.

For periodic kqueue timer, convert zero timeout into 1ms, to avoid
interrupt storm on fast event timers.
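
Both rules fit in a few lines. The sketch below works in microseconds and uses a hypothetical check_timer_period(); the commit applies the negative check both before and after conversion to sbintime_t, which this simplified version does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a timer period in microseconds.  Negative values are
 * rejected; a zero period on a repeating timer is raised to 1 ms so the
 * timer cannot re-fire back to back.
 */
static int
check_timer_period(int64_t us, bool periodic, int64_t *out_us)
{
	if (us < 0)
		return (EINVAL);
	if (us == 0 && periodic)
		us = 1000;	/* 1 ms */
	*out_us = us;
	return (0);
}
```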

Reported and tested by:	pho
Discussed with:	mav
Reviewed by:	davide
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)


Revision 255798
Modified Sun Sep 22 19:54:47 2013 UTC (11 years, 1 month ago) by kib
File length: 54001 byte(s)
Diff to previous 255675
Pre-acquire the filedesc sx when a possibility exists that the later
code could need to remove a kqueue from the filedesc list.  The global
lock is already held at that point, so acquiring the sx there would mean
taking a sleepable lock after a non-sleepable one.

Reported and tested by:	pho
Reviewed by:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)


Revision 255675
Modified Wed Sep 18 18:48:33 2013 UTC (11 years, 1 month ago) by rdivacky
File length: 53226 byte(s)
Diff to previous 255672
Revert r255672, it has some serious flaws, leaking file references etc.

Approved by:	re (delphij)


Revision 255672
Modified Wed Sep 18 17:56:04 2013 UTC (11 years, 1 month ago) by rdivacky
File length: 53381 byte(s)
Diff to previous 255527
Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by:    re (delphij)
Sponsored by:   Google Summer of Code
Submitted by:   Yuri Victorovich <yuri at rawbw dot com>
Tested by:      Yuri Victorovich <yuri at rawbw dot com>


Revision 255527
Modified Fri Sep 13 19:50:50 2013 UTC (11 years, 1 month ago) by kib
File length: 53226 byte(s)
Diff to previous 255219
Use TAILQ instead of STAILQ for kqueue filedescriptors to ensure constant
time removal on kqueue close.
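
Why a TAILQ gives constant-time removal: each entry stores the address of the pointer that points at it, so an element can be unlinked without walking the list from the head, which a singly linked STAILQ would require. The sketch below hand-rolls that layout; struct ex_kq, struct ex_list and the ex_* functions are hypothetical and are not the <sys/queue.h> macros.

```c
#include <assert.h>
#include <stddef.h>

/* Entry with a back-pointer, laid out like a TAILQ entry. */
struct ex_kq {
	int id;
	struct ex_kq *next;
	struct ex_kq **prevp;	/* address of the pointer pointing at us */
};

struct ex_list {
	struct ex_kq *first;
	struct ex_kq **lastp;	/* address of the last `next` field */
};

static void
ex_list_init(struct ex_list *l)
{
	l->first = NULL;
	l->lastp = &l->first;
}

static void
ex_insert_tail(struct ex_list *l, struct ex_kq *e)
{
	e->next = NULL;
	e->prevp = l->lastp;
	*l->lastp = e;
	l->lastp = &e->next;
}

/* O(1): no traversal, only the neighbours are relinked. */
static void
ex_remove(struct ex_list *l, struct ex_kq *e)
{
	if (e->next != NULL)
		e->next->prevp = e->prevp;
	else
		l->lastp = e->prevp;
	*e->prevp = e->next;
}
```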

Reported and tested by:	pho
Reviewed by:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (delphij)


Revision 255219
Modified Thu Sep 5 00:09:56 2013 UTC (11 years, 1 month ago) by pjd
File length: 53234 byte(s)
Diff to previous 254932
Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain the array index. Only one of
these bits is set, and its position within this five-bit range defines the
array index. This means there can be at most five array elements in the future.

To define a new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, e.g.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine several rights, but the rights have to
belong to the same array element, e.g.:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, e.g.:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, e.g.:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is
correct, but is not advised. It should only be used for defining aliases.
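
The index encoding described above can be modelled in a few lines. EX_CAPRIGHT() below is a hypothetical re-creation built only from this description (index bits at positions 57 to 61, one-hot), not the definition from the Capsicum headers, and ex_right_index() shows how a mix of rights from different elements becomes detectable.

```c
#include <assert.h>
#include <stdint.h>

/* Index field: the five bits just below the top two bits. */
#define EX_INDEX_SHIFT	57
#define EX_INDEX_MASK	(0x1FULL << EX_INDEX_SHIFT)
#define EX_CAPRIGHT(idx, bit)	((1ULL << (EX_INDEX_SHIFT + (idx))) | (bit))

/*
 * Return the array index encoded in a right (or a same-element alias),
 * or -1 when zero or several index bits are set, i.e. when rights from
 * different array elements were OR-ed together.
 */
static int
ex_right_index(uint64_t right)
{
	uint64_t idx = (right & EX_INDEX_MASK) >> EX_INDEX_SHIFT;

	switch (idx) {
	case 0x01: return (0);
	case 0x02: return (1);
	case 0x04: return (2);
	case 0x08: return (3);
	case 0x10: return (4);
	default:   return (-1);
	}
}
```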

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation


Revision 254932
Modified Mon Aug 26 18:53:19 2013 UTC (11 years, 2 months ago) by jmg
File length: 53072 byte(s)
Diff to previous 254356
fix up some comments and a white space issue...

MFC after:	3 days


Revision 254356
Modified Thu Aug 15 07:54:31 2013 UTC (11 years, 2 months ago) by glebius
File length: 53071 byte(s)
Diff to previous 254287
Make sendfile() a method in the struct fileops.  Currently only
vnode backed file descriptors have this method implemented.

Reviewed by:	kib
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix


Revision 254287
Modified Tue Aug 13 18:45:58 2013 UTC (11 years, 2 months ago) by jhb
File length: 53039 byte(s)
Diff to previous 254072
Some small cleanups to the fixes in r180340:
- Set NOTE_TRACKERR before running filt_proc().  If the knote did not
  have NOTE_FORK set in fflags when registered, then the TRACKERR event
  could miss being posted.
- Don't pass the pid in to filt_proc() for NOTE_FORK events.  The special
  handling for pids is done in knote_fork() directly and no longer in
  filt_proc().

MFC after:	2 weeks


Revision 254072
Modified Wed Aug 7 19:56:35 2013 UTC (11 years, 2 months ago) by jhb
File length: 53051 byte(s)
Diff to previous 251803
Don't emit a spurious EVFILT_PROC event with no fflags set on process exit
if NOTE_EXIT is not being monitored.  The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT.  This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
  monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
  knote without signalling it to userland and use this when a process
  exits but the fflags in the knote are zero.

Reviewed by:	jmg
MFC after:	1 month


Revision 251803
Modified Sun Jun 16 09:30:35 2013 UTC (11 years, 4 months ago) by ed
File length: 52603 byte(s)
Diff to previous 248092
Change callout use counter to use C11 atomics.

In order to get some coverage of C11 atomics in kernelspace, switch at
least one piece of code in kernelspace to use C11 atomics instead of
<machine/atomic.h>.

While there, slightly improve the code by adding an assertion to prevent
the use count from going negative.
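
A minimal sketch of a use counter built on <stdatomic.h>, with the kind of assertion described above. The names are hypothetical, and relaxed memory ordering is this sketch's choice for a pure counter, not necessarily what the kernel code uses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical use counter, zero-initialized as a static object. */
static atomic_uint ex_callout_users;

static void
ex_callout_acquire(void)
{
	atomic_fetch_add_explicit(&ex_callout_users, 1, memory_order_relaxed);
}

static void
ex_callout_release(void)
{
	unsigned old;

	old = atomic_fetch_sub_explicit(&ex_callout_users, 1,
	    memory_order_relaxed);
	assert(old > 0);	/* released more times than acquired */
}
```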


Revision 248092
Modified Sat Mar 9 09:07:13 2013 UTC (11 years, 7 months ago) by mav
File length: 52252 byte(s)
Diff to previous 247917
Rework overflow checks of r247898 so that an overly "intelligent" compiler
cannot optimize them out.

Submitted by:	bde


Revision 247917
Modified Thu Mar 7 16:50:07 2013 UTC (11 years, 7 months ago) by mav
File length: 52228 byte(s)
Diff to previous 247898
Fix off-by-one error in nanoseconds validation.

Submitted by:	bde


Revision 247898
Modified Wed Mar 6 19:37:38 2013 UTC (11 years, 7 months ago) by mav
File length: 52227 byte(s)
Diff to previous 247804
Fix time math overflows and improve zero intervals handling in poll(),
select(), nanosleep() and kevent() functions after calloutng changes.

Reported by:	bde


Revision 247804
Modified Mon Mar 4 16:55:16 2013 UTC (11 years, 7 months ago) by davide
File length: 52128 byte(s)
Diff to previous 238424
MFcalloutng:
- Rewrite kevent() timeout implementation to allow sub-tick precision.
- Make the interval timings for EVFILT_TIMER more accurate. This also
removes a hack introduced in r238424.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil


Revision 238424
Modified Fri Jul 13 13:24:33 2012 UTC (12 years, 3 months ago) by jhb
File length: 52709 byte(s)
Diff to previous 237084
Make the interval timings for EVFILT_TIMER more accurate.  tvtohz() always
adds an extra tick to account for the current partial clock tick.  However,
that is not appropriate for a repeating timer when the exact tvtohz() value
should be used for subsequent intervals.  Fix repeating callouts for
EVFILT_TIMER by subtracting 1 tick from the tvtohz() result similar to the
fix used in realitexpire() for interval timers.
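
The off-by-one tick can be seen with a simplified tvtohz() that rounds up to whole ticks and then adds one. ex_tvtohz() and ex_repeat_ticks() are hypothetical and ignore the overflow handling of the real function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified tvtohz(): round a microsecond interval up to whole ticks,
 * then add one tick for the partially elapsed current tick, so that a
 * one-shot timeout never fires early.
 */
static int64_t
ex_tvtohz(int64_t usec, int hz)
{
	int64_t tick_us = 1000000 / hz;

	return ((usec + tick_us - 1) / tick_us + 1);
}

/* A repeating timer re-arms on a tick boundary: drop the extra tick. */
static int64_t
ex_repeat_ticks(int64_t usec, int hz)
{
	return (ex_tvtohz(usec, hz) - 1);
}
```

At hz = 1000, a 10 ms interval converts to 11 ticks; without the correction, 100 repetitions would take 1100 ticks instead of 1000.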

While here, update a few comments to note that if the EVFILT_TIMER code
were to move out of kern_event.c, it should move to kern_time.c (where the
interval timer code it mimics lives) rather than kern_timeout.c.

MFC after:	1 month


Revision 237084
Modified Thu Jun 14 17:32:58 2012 UTC (12 years, 4 months ago) by pjd
File length: 52460 byte(s)
Diff to previous 233505
Update comment.

MFC after:	1 month


Revision 233505
Modified Mon Mar 26 09:34:17 2012 UTC (12 years, 7 months ago) by melifaro
File length: 52461 byte(s)
Diff to previous 225617
- Add knlist_init_rw_reader() function to kqueue(9).
The function acquires a reader lock if needed.
Assertions check for a reader or writer lock (RA_LOCKED / RA_UNLOCKED)
- While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues

Reviewed by:    glebius
Approved by:    ae(mentor)

MFC after:      2 weeks


Revision 225617
Modified Fri Sep 16 13:58:51 2011 UTC (13 years, 1 month ago) by kmacy
File length: 51880 byte(s)
Diff to previous 225177
In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)


Revision 225177
Modified Thu Aug 25 15:51:54 2011 UTC (13 years, 2 months ago) by attilio
File length: 51872 byte(s)
Diff to previous 224914
Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a page fault (#PF).

That happens because the selinfo interface has no way to drain the
waiters before destroying the registered selinfo object. This race is
also quite rare in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine, which should be called
before destroying the selinfo objects (in order to avoid such a case),
and fix the existing cases where it should already have been called.
Sometimes the context is safe enough to prevent this type of race,
as happens in device drivers which install selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should already be
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks


Revision 224914
Modified Tue Aug 16 20:07:47 2011 UTC (13 years, 2 months ago) by kib
File length: 51848 byte(s)
Diff to previous 224797
Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by:	glebius
Reviewed by:	rwatson
Approved by:	re (bz)


Revision 224797
Modified Fri Aug 12 14:26:47 2011 UTC (13 years, 2 months ago) by jonathan
File length: 51796 byte(s)
Diff to previous 224778
Rename CAP_*_KEVENT to CAP_*_EVENT.

Change the names of a couple of capability rights to be less
FreeBSD-specific.

Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc


Revision 224778
Modified Thu Aug 11 12:30:23 2011 UTC (13 years, 2 months ago) by rwatson
File length: 51799 byte(s)
Diff to previous 220245
Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc


Revision 220245
Modified Fri Apr 1 13:28:34 2011 UTC (13 years, 6 months ago) by kib
File length: 51720 byte(s)
Diff to previous 205886
After r219999 is merged to stable/8, rename fallocf(9) to falloc(9)
and remove the falloc() version that lacks flag argument. This is done
to reduce the KPI bloat.

Requested by:	jhb
X-MFC-note:	do not


Revision 205886
Modified Tue Mar 30 18:31:55 2010 UTC (14 years, 7 months ago) by jhb
File length: 51717 byte(s)
Diff to previous 203875
Defer freeing a kevent list until after dropping kqueue locks.

LOR:		185
Submitted by:	Matthew Fleming @ Isilon
MFC after:	1 week


Revision 203875
Modified Sun Feb 14 13:59:01 2010 UTC (14 years, 8 months ago) by kib
File length: 51682 byte(s)
Diff to previous 201352
Do not leak process lock when current thread is not allowed to see target.

Bumped into by:	ed
MFC after:	3 days


Revision 201352
Modified Thu Dec 31 20:56:28 2009 UTC (14 years, 9 months ago) by brooks
File length: 51659 byte(s)
Diff to previous 201350
If a filter has already been added, actually return EEXIST when trying
to add it again.

MFC after:	1 week


Revision 201350
Modified Thu Dec 31 20:29:58 2009 UTC (14 years, 9 months ago) by brooks
File length: 51643 byte(s)
Diff to previous 197930
The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175.  Remove all definitions, documentation, and usage.

fifo_misc.c:
	Remove all kqueue tests as fifo_io.c performs all those that
	would have remained.

Reviewed by:	rwatson
MFC after:	3 weeks
X-MFC note:	don't change vlan_link_state() function signature


Revision 197930
Added Sat Oct 10 14:56:34 2009 UTC (15 years ago) by kib
File length: 51636 byte(s)
Diff to previous 197575
Postpone dropping fp until both the kq_global and kqueue mutexes are
unlocked. fdrop() closes the file descriptor when its reference count
drops to zero. The close method for vnodes locks the vnode, resulting in
"sleepable after non-sleepable". For pipes, the pipe mutex is ordered
before the kqueue lock, causing a LOR.

Reported and tested by:	pho
MFC after:	2 weeks



This form allows you to request diffs between any two revisions of this file. For each of the two "sides" of the diff, enter a numeric revision.

  Diffs between and
  Type of Diff should be a

  ViewVC Help
Powered by ViewVC 1.1.27