Log of /head/sys/kern/vfs_syscalls.c

Revision 368614 - (view) (download) (annotate) - [select for diffs]
Modified Sun Dec 13 21:28:15 2020 UTC (3 years ago) by mjg
File length: 110261 byte(s)
Diff to previous 367777
vfs: correctly predict last fdrop on failed open

Arguably since the count is guaranteed to be 1 the code should be modified
to avoid the work.


Revision 367777 - (view) (download) (annotate) - [select for diffs]
Modified Tue Nov 17 21:14:13 2020 UTC (3 years, 1 month ago) by cem
File length: 110255 byte(s)
Diff to previous 367773
Split out cwd/root/jail, cmask state from filedesc table

No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037


Revision 367773 - (view) (download) (annotate) - [select for diffs]
Modified Tue Nov 17 19:51:47 2020 UTC (3 years, 1 month ago) by cem
File length: 110214 byte(s)
Diff to previous 367632
linux(4): Implement name_to_handle_at(), open_by_handle_at()

They are similar to our getfhat(2) and fhopen(2) syscalls.

Differential Revision:	https://reviews.freebsd.org/D27111


Revision 367632 - (view) (download) (annotate) - [select for diffs]
Modified Fri Nov 13 09:42:32 2020 UTC (3 years, 1 month ago) by kib
File length: 110082 byte(s)
Diff to previous 366950
Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level.

Restart syscalls and some sync operations when the filesystem indicates the
ERELOOKUP condition, mostly for VOPs operating on metadata.  In
particular, lookup results cached in the inode/v_data are no longer
valid and need recalculating.  Right now this should be a nop.

Assert that ERELOOKUP is caught everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on the syscall return
path.
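
A minimal userland sketch of the restart pattern described above, not the
kernel code itself. The names model_run_restartable, model_op_t and
model_flaky_op are hypothetical, and the value of MODEL_ERELOOKUP is an
assumption chosen only for illustration.

```c
/* Assumed stand-in for the kernel-internal ERELOOKUP value. */
#define MODEL_ERELOOKUP (-5)

/* Hypothetical type for an operation that may ask to be restarted. */
typedef int (*model_op_t)(void *arg);

/*
 * Re-run the operation from the top for as long as it reports the
 * restart condition, so the caller never sees MODEL_ERELOOKUP.
 */
int
model_run_restartable(model_op_t op, void *arg)
{
    int error;

    do {
        error = op(arg);
    } while (error == MODEL_ERELOOKUP);
    return (error);
}

/* Demo operation: requests a restart *remaining times, then succeeds. */
int
model_flaky_op(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return (MODEL_ERELOOKUP);
    }
    return (0);
}
```

The point of the sketch is that the restart value stays inside the wrapper,
which mirrors the assertion that it must never reach userspace.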

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136


Revision 366950 - (view) (download) (annotate) - [select for diffs]
Modified Thu Oct 22 19:28:12 2020 UTC (3 years, 2 months ago) by mjg
File length: 109600 byte(s)
Diff to previous 366022
vfs: prevent avoidable evictions on mkdir of existing directories

mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST.

The NOCACHE lookup will make the namecache unnecessarily evict the existing entry,
and then fall back to the fs lookup routine, eventually leading namei to return an
error as the directory is already there.

For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers
fallbacks to the slowpath for concurrently executing lookups.

Tested by:	pho
Discussed with:	kib


Revision 366022 - (view) (download) (annotate) - [select for diffs]
Modified Tue Sep 22 22:48:12 2020 UTC (3 years, 3 months ago) by kib
File length: 109575 byte(s)
Diff to previous 366018
Add O_RESOLVE_BENEATH and AT_RESOLVE_BENEATH to mimic Linux' RESOLVE_BENEATH.

It is like O_BENEATH, but disallows walking out of the subtree rooted
at the starting directory. O_BENEATH does not care if the path walks out,
as long as it returns.

Requested by:	Dan Gohman <dev@sunfishcode.online>
PR:	248335
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886


Revision 366018 - (view) (download) (annotate) - [select for diffs]
Modified Tue Sep 22 22:22:29 2020 UTC (3 years, 3 months ago) by kib
File length: 109095 byte(s)
Diff to previous 366017
Add at2cnpflags()

A helper to convert AT_ flags for *at() syscalls to namei flags.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886


Revision 366017 - (view) (download) (annotate) - [select for diffs]
Modified Tue Sep 22 22:06:20 2020 UTC (3 years, 3 months ago) by kib
File length: 109038 byte(s)
Diff to previous 365783
Add NIRES_STRICTREL.

Stop abusing internal namei flag NI_LCF_STRICTRELATIVE as indicator of
cap-restricted lookup.  Add designated returned flag NIRES_STRICTREL
to inform kern_openat() that lookup was restricted.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886


Revision 365783 - (view) (download) (annotate) - [select for diffs]
Modified Tue Sep 15 21:55:21 2020 UTC (3 years, 3 months ago) by kib
File length: 109040 byte(s)
Diff to previous 364113
Do not copy vp into f_data for DTYPE_VNODE files.

The pointer to vnode is already stored into f_vnode, so f_data can be
reused.  Fix all found users of f_data for DTYPE_VNODE.

Provide finit_vnode() helper to initialize file of DTYPE_VNODE type.

Reviewed by:	markj (previous version)
Discussed with:	freqlabs (openzfs chunk)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346


Revision 364113 - (view) (download) (annotate) - [select for diffs]
Modified Tue Aug 11 14:27:57 2020 UTC (3 years, 4 months ago) by mjg
File length: 109280 byte(s)
Diff to previous 364086
devfs: rework si_usecount to track opens

This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612


Revision 364086 - (view) (download) (annotate) - [select for diffs]
Modified Mon Aug 10 18:11:00 2020 UTC (3 years, 4 months ago) by mjg
File length: 109265 byte(s)
Diff to previous 364044
vfs: drop the hello world stat probes from the vfs provider

Interested parties can get the same information by hooking on vop_stat.


Revision 364044 - (view) (download) (annotate) - [select for diffs]
Modified Fri Aug 7 23:06:40 2020 UTC (3 years, 4 months ago) by mjg
File length: 109531 byte(s)
Diff to previous 363668
vfs: add VOP_STAT

The current scheme of calling VOP_GETATTR adds avoidable overhead.

An example with tmpfs doing fstat (ops/s):
before: 7488958
after:  7913833

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D25910


Revision 363668 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jul 29 17:05:31 2020 UTC (3 years, 4 months ago) by mjg
File length: 109528 byte(s)
Diff to previous 363667
vfs: elide MAC-induced locking on rename if there are no relevant hooks


Revision 363667 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jul 29 17:04:33 2020 UTC (3 years, 4 months ago) by mjg
File length: 109150 byte(s)
Diff to previous 363072
vfs: honor error code returned by mac_vnode_check_rename_from

MFC after:	3 days


Revision 363072 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jul 10 09:24:27 2020 UTC (3 years, 5 months ago) by mjg
File length: 108937 byte(s)
Diff to previous 363069
vfs: fix early termination of kern_getfsstat

The kernel would unlock an already unlocked mutex if the buffer got filled up
before the mount list ended.

Reported by:	pho
Fixes:	r363069 ("vfs: depessimize getfsstat when only the count is requested")


Revision 363069 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jul 10 06:47:58 2020 UTC (3 years, 5 months ago) by mjg
File length: 108934 byte(s)
Diff to previous 362460
vfs: depessimize getfsstat when only the count is requested

This avoids relocking mountlist_mtx for each entry.


Revision 362460 - (view) (download) (annotate) - [select for diffs]
Modified Sun Jun 21 08:51:24 2020 UTC (3 years, 6 months ago) by tmunro
File length: 108220 byte(s)
Diff to previous 359272
vfs: track sequential reads and writes separately

For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.
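
A simplified model of the idea, written as a userland sketch. The struct
and function names (model_seq_state, model_seq_update) and the cap
MODEL_SEQ_MAX are hypothetical and do not reflect the kernel's actual
fields or limits. Each direction keeps its own expected offset and run
length, so interleaved reads and writes do not reset each other.

```c
#include <stdint.h>

#define MODEL_SEQ_MAX 127       /* assumed cap on the run length */

enum model_dir { MODEL_READ = 0, MODEL_WRITE = 1 };

/* Per-descriptor state, tracked separately for each direction. */
struct model_seq_state {
    int64_t next_off[2];        /* offset a sequential access would use */
    int     count[2];           /* current sequential run length */
};

/*
 * Record one access and return the run length for its direction.
 * An access that starts where the previous one in the same direction
 * ended extends the run; anything else starts a new run.
 */
int
model_seq_update(struct model_seq_state *s, enum model_dir d,
    int64_t off, int64_t len)
{
    if (off == s->next_off[d]) {
        if (s->count[d] < MODEL_SEQ_MAX)
            s->count[d]++;
    } else {
        s->count[d] = 1;
    }
    s->next_off[d] = off + len;
    return (s->count[d]);
}
```

With a single shared tracker, a write landing some distance behind the
reads would look like a seek and reset the run every time; with two
trackers both streams are recognised as sequential.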

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by:	mjg, kib, tmunro
Approved by:	mjg (mentor)
Obtained from:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision:	https://reviews.freebsd.org/D25024


Revision 359272 - (view) (download) (annotate) - [select for diffs]
Modified Tue Mar 24 17:16:52 2020 UTC (3 years, 9 months ago) by kib
File length: 108135 byte(s)
Diff to previous 357951
kern_copy_file_range(): check the file type.

The syscall can only operate on valid vnode types.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation


Revision 357951 - (view) (download) (annotate) - [select for diffs]
Modified Sat Feb 15 01:28:42 2020 UTC (3 years, 10 months ago) by mjg
File length: 107873 byte(s)
Diff to previous 357888
vfs: use new capsicum helpers


Revision 357888 - (view) (download) (annotate) - [select for diffs]
Modified Thu Feb 13 22:22:15 2020 UTC (3 years, 10 months ago) by mjg
File length: 107829 byte(s)
Diff to previous 357470
Partially decompose priv_check by adding priv_check_cred_vfs_generation

During buildkernel there are very frequent calls to priv_check and they
all are for PRIV_VFS_GENERATION (coming from stat/fstat).

This results in branching on several potential privileges checking if
perhaps that's the one which has to be evaluated.

Instead of the kitchen-sink approach provide a way to have commonly used
privs directly evaluated.


Revision 357470 - (view) (download) (annotate) - [select for diffs]
Modified Mon Feb 3 22:27:55 2020 UTC (3 years, 10 months ago) by mjg
File length: 107811 byte(s)
Diff to previous 357467
fd: remove the seq argument from fget_unlocked

It is almost always NULL.


Revision 357467 - (view) (download) (annotate) - [select for diffs]
Modified Mon Feb 3 22:26:00 2020 UTC (3 years, 10 months ago) by mjg
File length: 107817 byte(s)
Diff to previous 357312
ktrace: provide ktrstat_error

This eliminates a branch from its consumers trading it for an extra call
if ktrace is enabled for curthread. Given that this is almost never true,
the tradeoff is worth it.


Revision 357312 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jan 30 20:05:05 2020 UTC (3 years, 10 months ago) by mjg
File length: 107835 byte(s)
Diff to previous 357307
Remove duplicated empty lines from kern/*.c

No functional changes.


Revision 357307 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jan 30 19:38:12 2020 UTC (3 years, 10 months ago) by mjg
File length: 107836 byte(s)
Diff to previous 356510
vfs: keep the mount point referenced across sys_quotactl

Otherwise we risk running into use-after-free.

In particular this codepath ends up dropping all protection before
suspending writes:

ufs_quotactl -> quotaoff_inchange -> vfs_write_suspend_umnt

Reported by:	pho


Revision 356510 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jan 8 19:05:32 2020 UTC (3 years, 11 months ago) by kevans
File length: 107816 byte(s)
Diff to previous 356441
posix_fallocate: push vnop implementation into the fileop layer

This opens the door for other descriptor types to implement
posix_fallocate(2) as needed.

Reviewed by:	kib, bcr (manpages)
Differential Revision:	https://reviews.freebsd.org/D23042


Revision 356441 - (view) (download) (annotate) - [select for diffs]
Modified Tue Jan 7 15:56:24 2020 UTC (3 years, 11 months ago) by mjg
File length: 109700 byte(s)
Diff to previous 356337
vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)

The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036


Revision 356337 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jan 3 22:29:58 2020 UTC (3 years, 11 months ago) by mjg
File length: 109697 byte(s)
Diff to previous 355660
vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427


Revision 355660 - (view) (download) (annotate) - [select for diffs]
Modified Thu Dec 12 18:45:31 2019 UTC (4 years ago) by trasz
File length: 109763 byte(s)
Diff to previous 355537
Add kern_sync(9), and make kernel code call it instead of going
via sys_sync(2).  Minor cleanup, no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19366


Revision 355537 - (view) (download) (annotate) - [select for diffs]
Modified Sun Dec 8 21:30:04 2019 UTC (4 years ago) by mjg
File length: 109699 byte(s)
Diff to previous 354890
vfs: introduce v_irflag and make v_type smaller

The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715


Revision 354890 - (view) (download) (annotate) - [select for diffs]
Modified Wed Nov 20 12:05:59 2019 UTC (4 years, 1 month ago) by mjg
File length: 109727 byte(s)
Diff to previous 354574
vfs: change si_usecount management to count used vnodes

Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.

There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).

Reviewed by:	kib, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22202


Revision 354574 - (view) (download) (annotate) - [select for diffs]
Modified Sun Nov 10 01:08:14 2019 UTC (4 years, 1 month ago) by rmacklem
File length: 109705 byte(s)
Diff to previous 351193
Update copy_file_range(2) to be Linux5 compatible.

The current linux man page and testing done on a fairly recent linux5.n
kernel have identified two changes to the semantics of the linux
copy_file_range system call.
Since the copy_file_range(2) system call is intended to be linux compatible
and is only currently in head/current and not used by any commands,
it seems appropriate to update the system call to be compatible with
the current linux one.
The first of these semantic changes was changed to be compatible with
linux5.n by r354564.
For the second semantic change, the old linux man page stated that, if
infd and outfd referred to the same file, EBADF should be returned.
Now, the semantics are to allow infd and outfd to refer to the same file
so long as the byte ranges defined by the input file offset, output file offset
and len do not overlap. If the byte ranges do overlap, EINVAL should be
returned.
This patch modifies copy_file_range(2) to be linux5.n compatible for this
semantic change.
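
The overlap rule can be sketched as below. This is an illustration of the
semantics stated above, not the kernel implementation; the function name
model_copy_range_check is hypothetical, and offsets are assumed to be
non-negative.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: return 0 if the copy may proceed, or EINVAL if
 * both descriptors refer to the same file and the byte ranges
 * [inoff, inoff + len) and [outoff, outoff + len) overlap.
 */
int
model_copy_range_check(bool same_file, int64_t inoff, int64_t outoff,
    uint64_t len)
{
    uint64_t distance;

    if (!same_file)
        return (0);

    /* Two equal-length ranges overlap iff their starts are closer than len. */
    distance = inoff > outoff ? (uint64_t)(inoff - outoff) :
        (uint64_t)(outoff - inoff);
    if (distance < len)
        return (EINVAL);
    return (0);
}
```

Comparing the distance between the two start offsets against len avoids
computing inoff + len or outoff + len, which could overflow.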


Revision 351193 - (view) (download) (annotate) - [select for diffs]
Modified Sun Aug 18 18:40:12 2019 UTC (4 years, 4 months ago) by mjg
File length: 109450 byte(s)
Diff to previous 350315
vfs: stop always overwriting ->mnt_stat in VFS_STATFS

The struct is already populated on each mount (and remount). Fields are either
constant or not used by filesystem in the first place.

Some infrequently used functions use it to avoid having to allocate a new buffer
and are left alone.

The current code results in avoidable copying when single-threaded and
significant cache line bouncing when multithreaded.

While here deduplicate initial filling of the struct.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21317


Revision 350315 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jul 25 05:46:16 2019 UTC (4 years, 5 months ago) by rmacklem
File length: 110132 byte(s)
Diff to previous 346029
Add kernel support for a Linux compatible copy_file_range(2) syscall.

This patch adds support to the kernel for a Linux compatible
copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9).
This syscall/VOP can be used by the NFSv4.2 client to implement the
Copy operation against an NFSv4.2 server to do file copies locally on
the server.
The vn_generic_copy_file_range() function in this patch can be used
by the NFSv4.2 server to implement the Copy operation.
Fuse may also be able to use the VOP_COPY_FILE_RANGE() method.

vn_generic_copy_file_range() attempts to maintain holes in the output
file in the range to be copied, but may fail to do so if the input and
output files are on different file systems with different _PC_MIN_HOLE_SIZE
values.

Separate commits will be done for the generated syscall files and userland
changes. A commit for a compat32 syscall will be done later.

Reviewed by:	kib, asomers (plus comments by brooks, jilles)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20584


Revision 346029 - (view) (download) (annotate) - [select for diffs]
Modified Mon Apr 8 14:23:52 2019 UTC (4 years, 8 months ago) by oshogbo
File length: 107094 byte(s)
Diff to previous 345982
In the unlinkat syscall, the operation is performed on the directory
descriptor, not the file descriptor. The file descriptor is used only for
verification so do not expect any additional capabilities on it.

Reported by:	antoine
Tested by:	antoine
Discussed with:	kib, emaste, bapt
Sponsored by:	Fudo Security


Revision 345982 - (view) (download) (annotate) - [select for diffs]
Modified Sat Apr 6 09:34:26 2019 UTC (4 years, 8 months ago) by oshogbo
File length: 107144 byte(s)
Diff to previous 343891
Introduce funlinkat syscall that allows us to check if we are removing
the file associated with the given file descriptor.

Reviewed by:	kib, asomers
Reviewed by:	cem, jilles, brooks (they reviewed previous version)
Discussed with:	pjd, and many others
Differential Revision:	https://reviews.freebsd.org/D14567


Revision 343891 - (view) (download) (annotate) - [select for diffs]
Modified Fri Feb 8 04:18:17 2019 UTC (4 years, 10 months ago) by kib
File length: 105921 byte(s)
Diff to previous 342889
Fix renameat(2) for CAPABILITIES kernels.

When renameat(2) is used with:
- absolute path for to;
- tofd not set to AT_FDCWD;
- the target exists
kern_renameat() requires CAP_UNLINK capability on tofd, but
corresponding namei ni_filecap is not initialized at all because the
lookup is absolute.  As a result, the check was done against empty filecap
and the syscall fails erroneously.

Fix it by creating a return flags namei member and reporting if the
lookup was absolute, then do not touch to.ni_filecaps at all.

PR:	222258
Reviewed by:	jilles, ngie
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-MFC-note:	KBI breakage
Differential revision:	https://reviews.freebsd.org/D19096


Revision 342889 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jan 9 17:23:59 2019 UTC (4 years, 11 months ago) by brooks
File length: 105846 byte(s)
Diff to previous 341827
style(9): fix the indent of a return.


Revision 341827 - (view) (download) (annotate) - [select for diffs]
Modified Tue Dec 11 19:32:16 2018 UTC (5 years ago) by mjg
File length: 105849 byte(s)
Diff to previous 341809
Remove unused argument to priv_check_cred.

Patch mostly generated with coccinelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation


Revision 341809 - (view) (download) (annotate) - [select for diffs]
Modified Tue Dec 11 02:48:49 2018 UTC (5 years ago) by kib
File length: 105855 byte(s)
Diff to previous 341712
Remove special case handling for getfhat(fd, NULL, handle).

There is no reason for it to behave differently from openat(fd, NULL).
Also the handling did not work because the substituted path was from
the system address space, causing EFAULT.

Submitted by:	Jack Halford <jack@gandi.net>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18501


Revision 341712 - (view) (download) (annotate) - [select for diffs]
Modified Fri Dec 7 23:07:51 2018 UTC (5 years ago) by kib
File length: 105873 byte(s)
Diff to previous 341711
Simplify kern_readlink_vp().

When we detect that the vnode is not a symlink, return immediately.
This moves the readlink code out of else branch and unindents it.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 341711 - (view) (download) (annotate) - [select for diffs]
Modified Fri Dec 7 23:05:12 2018 UTC (5 years ago) by kib
File length: 105893 byte(s)
Diff to previous 341689
Fix expression evaluation.

Braces were put in the wrong place, causing a failing EAGAIN check to
return a zero result.  Remove the problematic assignment from the
conditional expression altogether.

While there, remove the used-once variable vp, and wrap a too-long line.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 341689 - (view) (download) (annotate) - [select for diffs]
Modified Fri Dec 7 15:17:29 2018 UTC (5 years ago) by kib
File length: 105915 byte(s)
Diff to previous 341223
Add new file handle system calls.

Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2).  The
syscalls are provided for a NFS userspace server (nfs-ganesha).

Submitted by:	Jack Halford <jack@gandi.net>
Sponsored by:	Gandi.net
Tested by:	pho
Feedback from:	brooks, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18359


Revision 341223 - (view) (download) (annotate) - [select for diffs]
Modified Thu Nov 29 09:54:27 2018 UTC (5 years ago) by mjg
File length: 102616 byte(s)
Diff to previous 341220
vfs: fix i386 build after r341220


Revision 341220 - (view) (download) (annotate) - [select for diffs]
Modified Thu Nov 29 09:04:10 2018 UTC (5 years ago) by mjg
File length: 102608 byte(s)
Diff to previous 340080
vfs: drop spurious memcpy in stat

Sponsored by:	The FreeBSD Foundation


Revision 340080 - (view) (download) (annotate) - [select for diffs]
Modified Fri Nov 2 20:50:22 2018 UTC (5 years, 1 month ago) by brooks
File length: 102633 byte(s)
Diff to previous 339748
Add const to input-only char * arguments.

These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.

This change improves the match between syscalls.master and the public
declarations of system calls.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812


Revision 339748 - (view) (download) (annotate) - [select for diffs]
Modified Thu Oct 25 22:16:34 2018 UTC (5 years, 1 month ago) by kib
File length: 102432 byte(s)
Diff to previous 338798
Implement O_BENEATH and AT_BENEATH.

Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547


Revision 338798 - (view) (download) (annotate) - [select for diffs]
Modified Wed Sep 19 14:36:57 2018 UTC (5 years, 3 months ago) by kib
File length: 101899 byte(s)
Diff to previous 335053
Fix state of dquot-less vnodes after failed quotaoff.

UFS quotaoff iterates over all mp vnodes, and dereferences and clears
the pointers to corresponding dquots. If SU work items transiently
reference some of the dquots, quotaoff() would eventually fail, but all
processed vnodes are already stripped from dquots.  The state is
problematic, since quotas are left enabled, but there are no dquots
where blocks and inodes can be accounted.  The result is assertion
failures and NULL pointer dereferences.

Fix it by suspending writes around the quotaoff() call.  Since the
filesystem is synced, no dangling references to dquots from SU
workitems can be left behind, which means that quotaoff succeeds.

The complication there is that quotaoff VFS op is performed with the
mount point busied, while to suspend, we need to start write on the
mp.  If vn_start_write() is called on busied mp, system might deadlock
against parallel unmount request.  Handle this by unbusy-ing mp before
starting write, which in turn requires changing the quotaoff()
interface to return with the mount point not busied, same as was done
for quotaon().

Reviewed by:	mckusick
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17208


Revision 335053 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jun 13 12:22:00 2018 UTC (5 years, 6 months ago) by bde
File length: 101851 byte(s)
Diff to previous 335035
Fix the encoding of major and minor numbers in 64-bit dev_t by restoring
the old encodings for the lower 16 and 32 bits and only using the
higher 32 bits for unusually large major and minor numbers.  This
change breaks compatibility with the previous encoding (which was only
used in -current).

Fix truncation to (essentially) 16-bit dev_t in newnfs v3.

Any encoding of device numbers gives an ABI, so it can't be changed
without translations for compatibility.  Extra bits give the much
larger complication that the translations need to compress into fewer
bits.  Fortunately, more than 32 bits are rarely needed, so
compression is rarely needed except for 16-bit linux dev_t where it
was always needed but never done.

The previous encoding moved the major number into the top 32 bits.
Almost no translation code handled this, so the major number was blindly
truncated away in most 32-bit encodings.  E.g., for ffs, mknod(8) with
major = 1 and minor = 2 gave dev_t = 0x10000002; ffs cannot represent
this and blindly truncated it to 2.  But if this mknod was run on any
released version of FreeBSD, it gives dev_t = 0x102.  ffs can represent
this, but in the previous encoding it was not decoded, giving major = 0,
minor = 0x102.

The presence of bugs was most obvious for exporting dev_t's from an
old system to -current, since bugs in newnfs augment them.  I fixed
oldnfs to support 32-bit dev_t in 1996 (r16634), but this regressed
to 16-bit dev_t in newnfs, first to the old 16-bit encoding and then
further in -current.  E.g., old ad0 with major = 234, minor = 0x10002
had the correct (major, minor) number on the wire, but newnfs truncated
this to (234, 2) and then the previous encoding shifted the major
number into oblivion as seen by ffs or old applications.

I first tried to fix this by translating on every ABI/API boundary, but
there are too many boundaries and too many sloppy translations by blind
truncation.  So use the old encoding for the low 32 bits so that sloppy
translations work no worse than before provided the high 32 bits are
not set.  Add some error checking for when bits are lost.  Keep not
doing any error checking for translations for almost everything in
compat/linux.

compat/freebsd32/freebsd32_misc.c:
Optionally check for losing bits after possibly-truncating assignments as
before.

compat/linux/linux_stats.c:
Depend on the representation being compatible with Linux's (or just with
itself for local use) and spell some of the translations as assignments in
a macro that hides the details.

fs/nfsclient/nfs_clcomsubs.c:
Essentially the same fix as in 1996, except there is now no possible
truncation in makedev() itself.  Also fix nearby style bugs.

kern/vfs_syscalls.c:
As for freebsd32.  Also update the sysctl description to include file
numbers, and change it to describe device ids as device numbers.

sys/types.h:
Use inline functions (wrapped by macros) since the expressions are now
a bit too complicated for plain macros.  Describe the encoding and
some of the reasons for it.  16-bit compatibility didn't leave many
reasonable choices for the 32-bit encoding, and 32-bit compatibility
doesn't leave many reasonable choices for the 64-bit encoding.  My
choice is to put the 8 new minor bits in the low 8 bits of the top 32
bits.  This minimizes discontiguities.
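
The layout described above can be modelled as follows. This is a sketch
reconstructed from the description in this log, not a copy of sys/types.h;
the model_ names are hypothetical and the exact bit masks are an
assumption consistent with the examples given (major = 1, minor = 2 giving
0x102).

```c
#include <stdint.h>

typedef uint64_t model_dev_t;

/*
 * Assumed layout:
 *   bits  0-7   minor bits 0-7     (old encoding)
 *   bits  8-15  major bits 0-7     (old encoding)
 *   bits 16-31  minor bits 16-31   (old 32-bit encoding)
 *   bits 32-39  minor bits 8-15    (the 8 new minor bits)
 *   bits 40-63  major bits 8-31
 */
static inline model_dev_t
model_makedev(uint32_t major, uint32_t minor)
{
    return (((model_dev_t)(major & 0xffffff00u) << 32) |
        ((model_dev_t)(major & 0xffu) << 8) |
        ((model_dev_t)(minor & 0xff00u) << 24) |
        (model_dev_t)(minor & 0xffff00ffu));
}

static inline uint32_t
model_major(model_dev_t d)
{
    return ((uint32_t)(((d >> 32) & 0xffffff00u) | ((d >> 8) & 0xffu)));
}

static inline uint32_t
model_minor(model_dev_t d)
{
    return ((uint32_t)(((d >> 24) & 0xff00u) | (d & 0xffff00ffu)));
}
```

Under this model, any device number whose high 32 bits are zero has the
same value as in the old 32-bit encoding, so a sloppy truncation to 32 bits
loses nothing unless the major or minor number is unusually large.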

Reviewed by:	kib (except for rewrite of the comment in linux_stats.c)


Revision 335035 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jun 13 08:50:43 2018 UTC (5 years, 6 months ago) by bde
File length: 101477 byte(s)
Diff to previous 333920
Fix some bugs found while fixing the representation and translation
of 64-bit dev_t's (but not ones involving dev_t's).

st_size was supposed to be clamped in cvtstat() and linux's copy_stat(),
but the clamping code wasn't aware that st_size is signed, and also had
an obfuscated off-by-1 value for the unsigned limit, so its effect was
to produce a bizarre negative size instead of clamping.
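
A sketch of clamping to the signed maximum, assuming a 32-bit signed size
field in the old stat structure. The function name
model_clamp_size_to_int32 is hypothetical and this is not the cvtstat()
code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: narrow a 64-bit signed size to a 32-bit signed
 * field.  The limit is INT32_MAX, not an unsigned limit, because a value
 * above INT32_MAX stored into a signed field would come out negative.
 */
int32_t
model_clamp_size_to_int32(int64_t size)
{
    if (size > INT32_MAX)
        return (INT32_MAX);
    if (size < INT32_MIN)
        return (INT32_MIN);
    return ((int32_t)size);
}
```

Clamping against an unsigned limit and then assigning to the signed field
is the kind of mistake described above: the comparison passes, but the
stored value no longer fits.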

Change freebsd32's copy_ostat() to be no worse than cvtstat().  It was
missing clamping and bzero()ing of padding.

Reviewed by:	kib (except a final fix of the clamp to the signed maximum)


Revision 333920 - (view) (download) (annotate) - [select for diffs]
Modified Sun May 20 05:13:12 2018 UTC (5 years, 7 months ago) by mmacy
File length: 101482 byte(s)
Diff to previous 333425
Add additional preinitialized cap_rights


Revision 333425 - (view) (download) (annotate) - [select for diffs]
Modified Wed May 9 18:47:24 2018 UTC (5 years, 7 months ago) by mmacy
File length: 102434 byte(s)
Diff to previous 332122
Eliminate the overhead of gratuitous repeated reinitialization of cap_rights

- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month


Revision 332122 - (view) (download) (annotate) - [select for diffs]
Modified Fri Apr 6 17:35:35 2018 UTC (5 years, 8 months ago) by brooks
File length: 102626 byte(s)
Diff to previous 328099
Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941


Revision 328099 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jan 17 22:36:58 2018 UTC (5 years, 11 months ago) by jhb
File length: 102650 byte(s)
Diff to previous 326023
Use long for the last argument to VOP_PATHCONF rather than a register_t.

pathconf(2) and fpathconf(2) both return a long.  The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly.  Instead, the system calls explicitly store the
returned long value in td_retval[0].
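
The calling convention described above can be sketched as follows. This is a
minimal illustration, not the kernel code: struct ex_thread,
ex_kern_pathconf(), ex_sys_pathconf() and the two configuration names are
hypothetical stand-ins, and the real td_retval array has a different element
type.

```c
#include <errno.h>

/* Hypothetical stand-in for struct thread; only the return slots matter. */
struct ex_thread {
	long td_retval[2];
};

/*
 * Hypothetical helper in the style of kern_pathconf(): the result is
 * reported through *valuep and the function returns an errno value.
 */
int
ex_kern_pathconf(int name, long *valuep)
{
	switch (name) {
	case 1:			/* made-up name: maximum link count */
		*valuep = 32767;
		return (0);
	case 2:			/* made-up name: maximum name length */
		*valuep = 255;
		return (0);
	default:
		return (EINVAL);
	}
}

/* Hypothetical syscall wrapper: stores the long value explicitly. */
int
ex_sys_pathconf(struct ex_thread *td, int name)
{
	long value;
	int error;

	error = ex_kern_pathconf(name, &value);
	if (error == 0)
		td->td_retval[0] = value;
	return (error);
}
```

Keeping the helper free of td_retval side effects means other callers can use
it without a thread's return slots being overwritten on their behalf.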

Requested by:	bde
Reviewed by:	kib
Sponsored by:	Chelsio Communications


Revision 326023 - (view) (download) (annotate) - [select for diffs]
Modified Mon Nov 20 19:43:44 2017 UTC (6 years, 1 month ago) by pfg
File length: 102450 byte(s)
Diff to previous 324853
sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


Revision 324853 - (view) (download) (annotate) - [select for diffs]
Modified Sun Oct 22 08:11:45 2017 UTC (6 years, 2 months ago) by kib
File length: 102406 byte(s)
Diff to previous 324560
Remove the support for mknod(S_IFMT), which created dummy vnodes with
VBAD type.

FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not
check for the type and their behaviour is really undefined.  The
comment claims that this support was done for 'badsect' to flag bad
sectors, we do not have such facility in kernel anyway.

Reported by:	Dmitry Vyukov <dvyukov@google.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 324560 - (view) (download) (annotate) - [select for diffs]
Modified Thu Oct 12 15:45:53 2017 UTC (6 years, 2 months ago) by emaste
File length: 102567 byte(s)
Diff to previous 321839
allow posix_fallocate in capability mode

posix_fallocate is logically equivalent to writing zero blocks to the
desired file size and there is no reason to prevent calling it in
capability mode. posix_fallocate already checked for the CAP_WRITE
right, so we merely need to list it in capabilities.conf.

Reviewed by:	allanjude
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D12640


Revision 321839 - (view) (download) (annotate) - [select for diffs]
Modified Tue Aug 1 03:40:19 2017 UTC (6 years, 4 months ago) by dchagin
File length: 102566 byte(s)
Diff to previous 320499
Implement proper Linux /dev/fd and /proc/self/fd behavior by adding
Linux specific things to the native fdescfs file system.

Unlike FreeBSD, the Linux fdescfs is a directory containing symbolic
links to the actual files which the process has open.
A readlink(2) call on such a file returns a full path in the case of a regular
file, or a string in a special format (type:[inode], anon_inode:<file-type>, etc.).
As in FreeBSD, opening a file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.

Here we have mutually exclusive requirements:
- for a readlink(2) call, the fdescfs lookup() method should return a VLNK
vnode, otherwise our kern_readlink() fails with an EINVAL error;
- for all other calls, the fdescfs lookup() method should return a non-VLNK vnode.

To handle this, a new vnode v_flag, VV_READLINK, was added.  It is set if fdescfs
has been mounted with the linrdlnk option, and kern_readlinkat() was modified to
handle it properly.

For now, for Linux ABI compatibility, mount the fdescfs volume with the linrdlnk option:

    mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd

Reviewed by:	kib@
MFC after:	1 week
Relnotes:	yes


Revision 320499 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jun 30 16:10:21 2017 UTC (6 years, 5 months ago) by kib
File length: 102530 byte(s)
Diff to previous 319734
Define ino64_trunc_error under same conditions as the code which uses
the variable.

Noted by:	bde
Sponsored by:	The FreeBSD Foundation


Revision 319734 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jun 9 11:17:08 2017 UTC (6 years, 6 months ago) by kib
File length: 102508 byte(s)
Diff to previous 319600
Enhance vfs.ino64_trunc_error sysctl.

Provide a new mode "2" which returns a special overflow indicator in
the non-representable field instead of the silent truncation (mode
"0") or EOVERFLOW (mode "1").

In particular, the typical use of st_ino to detect hard links with
mode "2" reports false positives, which might be more suitable for
some uses.
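
The three modes can be sketched as below. This is an illustration under
stated assumptions, not the committed code: ex_trunc_ino() is a hypothetical
name, and the use of UINT32_MAX as the "special overflow indicator" is an
assumption made for the example.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow a 64-bit inode number to 32 bits under
 * one of the three vfs.ino64_trunc_error modes described above.
 */
int
ex_trunc_ino(uint64_t ino64, int mode, uint32_t *ino32)
{
	*ino32 = (uint32_t)ino64;
	if (*ino32 == ino64)
		return (0);		/* value fits, nothing to do */
	switch (mode) {
	case 1:
		return (EOVERFLOW);	/* report the loss to the caller */
	case 2:
		*ino32 = UINT32_MAX;	/* assumed indicator value */
		return (0);
	default:
		return (0);		/* mode 0: keep truncated value */
	}
}
```

With mode "2" every non-representable inode number maps to the same
indicator, which is why two unrelated files can look like hard links to each
other.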

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation


Revision 319600 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jun 5 11:40:30 2017 UTC (6 years, 6 months ago) by kib
File length: 102139 byte(s)
Diff to previous 318736
Add sysctl vfs.ino64_trunc_error controlling action on truncating
inode number or link count for the ABI compat binaries.

Right now, and by default after the change, too large 64bit values are
silently truncated to 32 bits.  Enabling the knob causes the system to
return EOVERFLOW for stat(2) family of compat syscalls when some
values cannot be completely represented by the old structures.  For
getdirentries(2), knob skips the dirents which would cause non-trivial
truncation of d_ino.

EOVERFLOW error is specified by the X/Open 1996 LFS document
('Adding Support for Arbitrary File Sizes to the Single UNIX
Specification').

Based on the discussion with:	bde
Sponsored by:	The FreeBSD Foundation


Revision 318736 - (view) (download) (annotate) - [select for diffs]
Modified Tue May 23 09:29:05 2017 UTC (6 years, 7 months ago) by kib
File length: 101652 byte(s)
Diff to previous 318389
Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in a backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion was done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439


Revision 318389 - (view) (download) (annotate) - [select for diffs]
Modified Wed May 17 00:34:34 2017 UTC (6 years, 7 months ago) by emaste
File length: 96609 byte(s)
Diff to previous 316334
Remove register keyword from sys/ and ANSIfy prototypes

A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193


Revision 316334 - (view) (download) (annotate) - [select for diffs]
Modified Fri Mar 31 14:17:14 2017 UTC (6 years, 8 months ago) by rwatson
File length: 99883 byte(s)
Diff to previous 313016
Audit arguments to posix_fallocate(2) and posix_fadvise(2) system calls.

As posix_fadvise() does not lock the vnode argument, don't capture
detailed vnode information for the time being.

Obtained from:	TrustedBSD Project
MFC after:	3 weeks
Sponsored by:	DARPA, AFRL


Revision 313016 - (view) (download) (annotate) - [select for diffs]
Modified Tue Jan 31 15:19:44 2017 UTC (6 years, 10 months ago) by trasz
File length: 99590 byte(s)
Diff to previous 312987
Replace calls to sys_truncate() with kern_truncate().

Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9371


Revision 312987 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jan 30 12:24:47 2017 UTC (6 years, 10 months ago) by trasz
File length: 99796 byte(s)
Diff to previous 312986
Add kern_lseek() and use it instead of sys_lseek() in various compats.
I didn't touch svr4/, there's no point.

Reviewed by:	ed@, kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9366


Revision 312986 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jan 30 11:50:54 2017 UTC (6 years, 10 months ago) by trasz
File length: 100059 byte(s)
Diff to previous 311452
Replace sys_ftruncate() with kern_ftruncate() in various compats.

Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9368


Revision 311452 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jan 5 17:19:26 2017 UTC (6 years, 11 months ago) by kib
File length: 100120 byte(s)
Diff to previous 311447
Do not allocate struct statfs on kernel stack.

Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are undesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 311447 - (view) (download) (annotate) - [select for diffs]
Modified Thu Jan 5 17:03:35 2017 UTC (6 years, 11 months ago) by kib
File length: 99430 byte(s)
Diff to previous 311286
Some style fixes for getfstat(2)-related code.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 311286 - (view) (download) (annotate) - [select for diffs]
Modified Wed Jan 4 16:09:45 2017 UTC (6 years, 11 months ago) by kib
File length: 99414 byte(s)
Diff to previous 311113
The callers of kern_getfsstat(UIO_SYSSPACE) expect that *buf always
returns memory which must be freed, regardless of the error.  Assign
NULL to *buf in case we are not going to allocate any memory due to
invalid mode.
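
The contract can be sketched as follows, assuming a hypothetical
ex_getfsstat_alloc() that stands in for kern_getfsstat(UIO_SYSSPACE); the
accepted mode values and the allocation size are made up for the example.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical allocator with the same contract: on every return path
 * *buf is either NULL or memory that the caller must free.
 */
int
ex_getfsstat_alloc(int mode, void **buf, size_t *countp)
{
	if (mode != 1 && mode != 2) {
		*buf = NULL;	/* nothing allocated; free(NULL) is a no-op */
		*countp = 0;
		return (EINVAL);
	}
	*buf = calloc(4, 64);	/* arbitrary size for the sketch */
	if (*buf == NULL) {
		*countp = 0;
		return (ENOMEM);
	}
	*countp = 4;
	return (0);
}
```

A caller can then free the buffer unconditionally after the call, without
first checking the error value.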

Reported and tested by:	pho
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks (together with r310638)
Differential revision:	https://reviews.freebsd.org/D9042


Revision 311113 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jan 2 18:59:23 2017 UTC (6 years, 11 months ago) by kib
File length: 99368 byte(s)
Diff to previous 311111
There is no need to use temporary statfs buffer for fsid obliteration
and prison enforcement.  Do it on the caller buffer directly.

Besides eliminating memory copies, this change also removes large
structure from the kernel stack.

Extracted from:	ino64 work by gleb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 311111 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jan 2 18:49:48 2017 UTC (6 years, 11 months ago) by kib
File length: 99410 byte(s)
Diff to previous 311108
Style.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 311108 - (view) (download) (annotate) - [select for diffs]
Modified Mon Jan 2 18:20:22 2017 UTC (6 years, 11 months ago) by kib
File length: 99402 byte(s)
Diff to previous 310638
Move common code from kern_statfs() and kern_fstatfs() into a new helper.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 310638 - (view) (download) (annotate) - [select for diffs]
Modified Tue Dec 27 20:21:11 2016 UTC (6 years, 11 months ago) by jhb
File length: 99935 byte(s)
Diff to previous 309929
Rename the 'flags' argument to getfsstat() to 'mode' and validate it.

This argument is not a bitmask of flags, but only accepts a single value.
Fail with EINVAL if an invalid value is passed to 'flag'.  Rename the
'flags' argument to getmntinfo(3) to 'mode' as well to match.
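
A minimal sketch of the validation, assuming hypothetical names
(EX_MNT_WAIT, EX_MNT_NOWAIT, ex_check_getfsstat_mode()) and illustrative
constant values rather than the ones from the system headers:

```c
#include <errno.h>

/* Illustrative values only; the real constants live in <sys/mount.h>. */
enum { EX_MNT_WAIT = 1, EX_MNT_NOWAIT = 2 };

/* Hypothetical check: the argument is a single value, not a bitmask. */
int
ex_check_getfsstat_mode(int mode)
{
	switch (mode) {
	case EX_MNT_WAIT:
	case EX_MNT_NOWAIT:
		return (0);
	default:
		return (EINVAL);
	}
}
```

A bitmask-style test such as (mode & EX_MNT_WAIT) would accept the value 3,
which is neither of the two valid modes; an exact comparison rejects it.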

This is a followup to r308088.

Reviewed by:	kib
MFC after:	1 month


Revision 309929 - (view) (download) (annotate) - [select for diffs]
Modified Mon Dec 12 19:16:35 2016 UTC (7 years ago) by mjg
File length: 99885 byte(s)
Diff to previous 308212
vfs: use vrefact in getcwd and fchdir


Revision 308212 - (view) (download) (annotate) - [select for diffs]
Modified Wed Nov 2 12:43:15 2016 UTC (7 years, 1 month ago) by kib
File length: 99882 byte(s)
Diff to previous 308209
Allow some dotdot lookups in capability mode.

If dotdot lookup does not escape from the file descriptor passed as
the lookup root, we can allow the component traversal.  Track the
directories traversed, and check the result of dotdot lookup against
the recorded list of the directory vnodes.
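
The tracking idea can be sketched as follows. This is a simplified
illustration, not the namei() implementation: directories are represented by
plain integer identifiers instead of vnodes, and struct ex_dotdot_tracker,
ex_track_dir() and ex_dotdot_allowed() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_TRACKED 32

/* Hypothetical record of the directories traversed so far. */
struct ex_dotdot_tracker {
	unsigned long dirs[EX_MAX_TRACKED];
	size_t count;
};

/* Record a directory that the lookup has descended through. */
bool
ex_track_dir(struct ex_dotdot_tracker *t, unsigned long dir_id)
{
	if (t->count >= EX_MAX_TRACKED)
		return (false);	/* table full: caller should fail the lookup */
	t->dirs[t->count++] = dir_id;
	return (true);
}

/* A ".." step is allowed only if it lands on a recorded directory. */
bool
ex_dotdot_allowed(const struct ex_dotdot_tracker *t, unsigned long parent_id)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->dirs[i] == parent_id)
			return (true);
	return (false);
}
```

Because the lookup root is the first directory recorded, a ".." that would
climb above it never matches an entry and is refused.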

Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently
disabled by default until more verification of the approach is done.

Disallow non-local filesystems for dotdot, since remote server might
conspire with the local process to allow it to escape the namespace.
This might be too cautious, provide the knob
vfs.lookup_cap_dotdot_nonlocal to override as well.

Idea by:	rwatson
Discussed with:	emaste, jonathan, rwatson
Reviewed by:	mjg (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 week
Differential revision:	https://reviews.freebsd.org/D8110


Revision 308209 - (view) (download) (annotate) - [select for diffs]
Modified Wed Nov 2 09:43:19 2016 UTC (7 years, 1 month ago) by trasz
File length: 99852 byte(s)
Diff to previous 308088
Fix getfsstat(2) with MNT_WAIT to not skip filesystems that are in the
process of being unmounted.  Previously it would skip them, even if the
unmount eventually failed, e.g. due to the filesystem being busy.

This behaviour broke autounmountd(8) - if you tried to manually unmount
a mounted filesystem, using 'automount -u', and the autounmountd attempted
to refresh the filesystem list in that very moment, it would conclude that
the filesystem got unmounted and not try to unmount it afterwards.

Reviewed by:	kib@
Tested by:	pho@
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8030


Revision 308088 - (view) (download) (annotate) - [select for diffs]
Modified Sat Oct 29 12:38:30 2016 UTC (7 years, 1 month ago) by trasz
File length: 99368 byte(s)
Diff to previous 305832
Fix getfsstat(2) handling of flags. The 'flags' argument is an enum,
not a bitfield. For the intended usage - being passed either MNT_WAIT,
or MNT_NOWAIT - this shouldn't introduce any changes in behaviour.

Reviewed by:	jhb@
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8373


Revision 305832 - (view) (download) (annotate) - [select for diffs]
Modified Thu Sep 15 13:16:20 2016 UTC (7 years, 3 months ago) by emaste
File length: 99419 byte(s)
Diff to previous 304185
Renumber license clauses in sys/kern to avoid skipping #3


Revision 304185 - (view) (download) (annotate) - [select for diffs]
Modified Mon Aug 15 20:11:52 2016 UTC (7 years, 4 months ago) by ed
File length: 99419 byte(s)
Diff to previous 304176
Eliminate use of sys_fsync() and sys_fdatasync().

Make the kern_fsync() function public, so that it can be used by other
parts of the kernel. Fix up existing consumers to make use of it.

Requested by:	kib


Revision 304176 - (view) (download) (annotate) - [select for diffs]
Modified Mon Aug 15 19:08:51 2016 UTC (7 years, 4 months ago) by kib
File length: 99426 byte(s)
Diff to previous 302893
Add an implementation of fdatasync(2).

The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2).  For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC().  This is functionally correct but not efficient.

This is not yet a POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.

Reviewed by:	mckusick
Discussed with:	avg (ZFS), jhb (AIO part)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7471


Revision 302893 - (view) (download) (annotate) - [select for diffs]
Modified Fri Jul 15 09:23:18 2016 UTC (7 years, 5 months ago) by kib
File length: 99133 byte(s)
Diff to previous 302519
Do not allow creation of char or block special nodes with VNOVAL dev_t.

As was reported on http://seclists.org/oss-sec/2016/q3/68, tmpfs code
contains an assertion that rdev != VNOVAL.  On FreeBSD, there are no other
consequences except triggering the assert.  To be compatible with
systems where device nodes have some significance, reject mknod(2)
call with dev == VNOVAL at the syscall level.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 302519 - (view) (download) (annotate) - [select for diffs]
Modified Sun Jul 10 09:50:21 2016 UTC (7 years, 5 months ago) by rwatson
File length: 99079 byte(s)
Diff to previous 301053
Audit the file-descriptor number argument for openat(2).  Remove a comment
about the desirability of auditing the number, as it was in fact in the
wrong place (in the common path for open(2) and openat(2), and only the
latter accepts a file-descriptor argument).  Where other ABIs support
openat(2), it may be necessary to do additional argument auditing as it is
not performed in kern_openat(9).

MFC after:	3 days
Sponsored by:	DARPA, AFRL


Revision 301053 - (view) (download) (annotate) - [select for diffs]
Modified Tue May 31 16:56:30 2016 UTC (7 years, 6 months ago) by glebius
File length: 99079 byte(s)
Diff to previous 296572
Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.

Submitted by:	CTurt
Security:	SA-16:20
Security:	SA-16:21


Revision 296572 - (view) (download) (annotate) - [select for diffs]
Modified Wed Mar 9 19:05:11 2016 UTC (7 years, 9 months ago) by jhb
File length: 99052 byte(s)
Diff to previous 296060
Simplify AIO initialization now that it is standard.

- Mark AIO system calls as STD and remove the helpers to dynamically
  register them.
- Use COMPAT6 for the old system calls with the older sigevent instead of
  an 'o' prefix.
- Simplify the POSIX configuration to note that AIO is always available.
- Handle AIO in the default VOP_PATHCONF instead of special casing it in
  the pathconf() system call.  fpathconf() is still hackish.
- Remove freebsd32_aio_cancel() as it just called the native one directly.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5589


Revision 296060 - (view) (download) (annotate) - [select for diffs]
Modified Thu Feb 25 19:58:23 2016 UTC (7 years, 9 months ago) by markj
File length: 99468 byte(s)
Diff to previous 295358
Improve error handling for posix_fallocate(2) and posix_fadvise(2).

- Set td_errno so that ktrace and dtrace can obtain the syscall error
  number in the usual way.
- Pass negative error numbers directly to the syscall layer, as they're
  not intended to be returned to userland.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5425


Revision 295358 - (view) (download) (annotate) - [select for diffs]
Modified Sun Feb 7 01:04:47 2016 UTC (7 years, 10 months ago) by mckusick
File length: 99419 byte(s)
Diff to previous 291098
Clarify a comment in kern_openat() about the use of falloc_noinstall().

Suggested by: Steve Jacobson


Revision 291098 - (view) (download) (annotate) - [select for diffs]
Modified Fri Nov 20 14:08:12 2015 UTC (8 years, 1 month ago) by trasz
File length: 99381 byte(s)
Diff to previous 288640
The freebsd4_getfsstat() was broken in r281551 to always return 0 on success.
All versions of getfsstat(3) are supposed to return the number of [o]statfs
structs in the array that was copied out.

Also fix missing bounds checking and signed comparison of unsigned types.
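
The expected return convention can be sketched as below, assuming a
hypothetical ex_copy_out() and a cut-down struct ex_statfs; the real code
copies [o]statfs structures out to userland rather than using memcpy().

```c
#include <stddef.h>
#include <string.h>

/* Cut-down, hypothetical stand-in for the statfs structure. */
struct ex_statfs {
	char mntonname[16];
};

/*
 * Copy as many entries as fit into the destination buffer and return
 * how many were copied, using unsigned arithmetic throughout.
 */
size_t
ex_copy_out(const struct ex_statfs *src, size_t nsrc,
    struct ex_statfs *dst, size_t dst_bytes)
{
	size_t room = dst_bytes / sizeof(*dst);	/* bound, in entries */
	size_t n = nsrc < room ? nsrc : room;

	if (n > 0)
		memcpy(dst, src, n * sizeof(*dst));
	return (n);	/* entry count, not 0-for-success */
}
```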

Submitted by:	bde@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation


Revision 288640 - (view) (download) (annotate) - [select for diffs]
Modified Sat Oct 3 22:27:14 2015 UTC (8 years, 2 months ago) by markj
File length: 99220 byte(s)
Diff to previous 288628
Revert r288628 and instead fix a discrepancy between the posix_fadvise(2)
man page and POSIX: posix_fadvise(2) returns an error number on failure.

Reported by:	jilles
MFC after:	1 week


Revision 288628 - (view) (download) (annotate) - [select for diffs]
Modified Sat Oct 3 19:37:41 2015 UTC (8 years, 2 months ago) by markj
File length: 99197 byte(s)
Diff to previous 288431
The return value of posix_fadvise(2) is just an error status, so
sys_posix_fadvise() should simply return the errno (or 0) to syscallenter()
rather than setting a return value.

MFC after:	1 week


Revision 288431 - (view) (download) (annotate) - [select for diffs]
Modified Wed Sep 30 23:06:29 2015 UTC (8 years, 2 months ago) by markj
File length: 99220 byte(s)
Diff to previous 288336
As a step towards the elimination of PG_CACHED pages, rework the handling
of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
the head of the inactive queue instead of being cached.

This affects the implementation of POSIX_FADV_NOREUSE as well, since it
works by applying POSIX_FADV_DONTNEED to file ranges after they have been
read or written.  At that point the corresponding buffers may still be
dirty, so the previous implementation would coalesce successive ranges and
apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
dirty buffers would eventually be cached.  To preserve this behaviour in an
efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
the pages backing a VMIO buf to be placed at the head of the inactive queue
when the buf is released.  POSIX_FADV_NOREUSE then works by setting this
flag in bufs that underlie the specified range.

Reviewed by:	alc, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3726


Revision 288336 - (view) (download) (annotate) - [select for diffs]
Modified Mon Sep 28 12:14:16 2015 UTC (8 years, 2 months ago) by avg
File length: 99270 byte(s)
Diff to previous 287209
save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE

SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days


Revision 287209 - (view) (download) (annotate) - [select for diffs]
Modified Thu Aug 27 15:16:41 2015 UTC (8 years, 3 months ago) by ed
File length: 99286 byte(s)
Diff to previous 285416
Decompose linkat()/renameat() rights to source and target.

To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.

This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.

Reviewed by:	rwatson, wblock
Differential Revision:	https://reviews.freebsd.org/D3411


Revision 285416 - (view) (download) (annotate) - [select for diffs]
Modified Sun Jul 12 00:26:22 2015 UTC (8 years, 5 months ago) by bz
File length: 99195 byte(s)
Diff to previous 285390
Try to unbreak the build after r285390 removing the obsolete static
declaration.


Revision 285390 - (view) (download) (annotate) - [select for diffs]
Modified Sat Jul 11 16:19:11 2015 UTC (8 years, 5 months ago) by mjg
File length: 99252 byte(s)
Diff to previous 284446
Move chdir/chroot-related fdp manipulation to kern_descrip.c

Prefix exported functions with pwd_.

Deduplicate some code by adding a helper for setting fd_cdir.

Reviewed by:	kib


Revision 284446 - (view) (download) (annotate) - [select for diffs]
Modified Tue Jun 16 13:09:18 2015 UTC (8 years, 6 months ago) by mjg
File length: 101267 byte(s)
Diff to previous 283058
Replace struct filedesc argument in getvnode with struct thread

This is a step towards removal of spurious arguments.


Revision 283058 - (view) (download) (annotate) - [select for diffs]
Modified Mon May 18 13:43:33 2015 UTC (8 years, 7 months ago) by mjg
File length: 101399 byte(s)
Diff to previous 281714
Tidy up sys_umask a little bit

Consistently use saved fdp pointer as it cannot change. If it could change the
code would be already incorrect.

No functional changes.


Revision 281714 - (view) (download) (annotate) - [select for diffs]
Modified Sat Apr 18 21:50:13 2015 UTC (8 years, 8 months ago) by kib
File length: 101436 byte(s)
Diff to previous 281551
The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter.  The
fcntl(2) contains compatibility code to handle kernels before the
struct flock was changed during the 8.x CURRENT development.  The
shims were reasonable to allow easier revert to the older kernel at
that time.

Now, two or three major releases later, shims do not serve any
purpose.  Such old kernels cannot handle current libc, so revert the
compatibility code.

Make padded syscalls support conditional under the COMPAT6 config
option.  For COMPAT32, the syscalls were under COMPAT6 already.

Remove the WITHOUT_SYSCALL_COMPAT build option, whose only purpose was to
(partially) disable the removed shims.

Reviewed by:	jhb, imp (previous versions)
Discussed with:	peter
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week


Revision 281551 - (view) (download) (annotate) - [select for diffs]
Modified Wed Apr 15 09:13:11 2015 UTC (8 years, 8 months ago) by trasz
File length: 101364 byte(s)
Diff to previous 281086
Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This
adds missing jail and MAC checks.

Differential Revision:	https://reviews.freebsd.org/D2193
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation


Revision 281086 - (view) (download) (annotate) - [select for diffs]
Modified Sat Apr 4 21:47:54 2015 UTC (8 years, 8 months ago) by jilles
File length: 101240 byte(s)
Diff to previous 278930
utimensat: Correct Capsicum required capability rights.


Revision 278930 - (view) (download) (annotate) - [select for diffs]
Added Tue Feb 17 23:54:06 2015 UTC (8 years, 10 months ago) by mjg
File length: 101168 byte(s)
Diff to previous 277610
filedesc: simplify fget_unlocked & friends

Introduce fget_fcntl which performs appropriate checks when needed.
This removes a branch from fget_unlocked.

Introduce fget_mmap dealing with cap_rights_to_vmprot conversion.
This removes a branch from _fget.

Modify fget_unlocked to pass sequence counter to interested callers so
that they can perform their own checks and make sure the result was
obtained from stable & current state.

Reviewed by:	silence on -hackers


