Parent Directory
|
Revision Log
| Links to HEAD: | (view) (download) (annotate) |
| Sticky Revision: |
vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)
vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997
vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
Properly check for writers when fetching quotas for writeable vnodes in UFS quotaon(). Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21560
When updating the user or group disk quotas for the return of inodes or disk blocks, set the FORCE flag in the call to chkiq() or chkdq() since the user is always allowed to return resources and hence there is no need to check the user's credential . Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-1-UFS-1: Denial Of Service in mount (prison_priv_check) Discussed with: kib MFC: 1 week Sponsored by: Netflix
Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation
Fix state of dquot-less vnodes after failed quotaoff. UFS quotaoff iterates over all mp vnodes, and derefences and clears the pointers to corresponding dquots. If SU work items transiently reference some of dquots,quotaoff() would eventually fail, but all processed vnodes are already stripped from dquots. The state is problematic, since quotas are left enabled, but there is no dquots where blocks and inodes can be accounted. The result is assertion failures and NULL pointer dereferences. Fix it by suspending writes around quotaoff() call. Since the filesystem is synced, no dandling references to dquots from SU workitems can left behind, which means that quotaoff succeeds. The complication there is that quotaoff VFS op is performed with the mount point busied, while to suspend, we need to start write on the mp. If vn_start_write() is called on busied mp, system might deadlock against parallel unmount request. Handle this by unbusy-ing mp before starting write, which in turn requires changing the quotaoff() interface to return with the mount point not busied, same as was done for quotaon(). Reviewed by: mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17208
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
Reduce size of ufs inode. Remove redunand i_dev and i_fs pointers, which are available as ip->i_ump->um_dev and ip->i_ump->um_fs, and reorder members by size to reduce padding. To compensate added derefences, the most often i_ump access to differentiate between UFS1 and UFS2 dinode layout is removed, by addition of the new i_flag IN_UFS2. Overall, this actually reduces the amount of memory dereferences. On 64bit machine, original struct inode size is 176, reduced to 152 bytes with the change. Tested by: pho (previous version) Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
In dqsync(), when called from quotactl(), um_quotas entry might appear cleared since nothing prevents completion of the parallel quotaoff. There is nothing to sync in this case, and no reason to panic. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month
The sys_quotactl() contract demands that the mount point is vfs_unbusy()ed when the cmd is Q_QUOTAON, regardless of other input parameters or error return. Submitted by: Conrad Meyer Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D1684 Tested by: pho MFC after: 1 week
Use lockless quota checks in qsync and qsyncvp. No strong objections from: kib, mckusick MFC after: 1 week
Direct access to the quota files, in particular, lookup, causes lock conflict with the quota metadata access. Mark quota vnode lock as recursive and always exclusive to avoid the problem. Reported by: hrs Tested by: hrs, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
Properly handle unsigned comparison. MFC after: 2 weeks
The softdep freeblks workitem might hold a reference on the dquot. Current dqflush() panics when a dquot with with non-zero refcount is encountered. The situation is possible, because quotas are turned off before softdep workitem queue if flushed, due to the quota file writes might create softdep workitems. Make the encountering an active dquot in dqflush() not fatal, return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of softdep_flushfiles() loop, until the last iteration. At the last loop, the quotas must be closed, and because SU workitems should be already flushed, the references to dquot are gone. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Reviewed by: mckusick MFC after: 2 weeks
Fix a typo, resulting in the NULL pointer dereference. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
Microoptimize: in qsync loop over mount vnodes, only unlock mount interlock after we committed to try to vget() the vnode. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 1 week
Properly lock DQREF() with dqhlock. Missed locking caused counter corruption. Assert that the dq reference value is sane before decrementing it. Reported and tested by: pho MFC after: 1 week
Avoid LOR between vfs_busy() lock and covered vnode lock on quotaon(). The vfs_busy() is after covered vnode lock in the global lock order, but since quotaon() does recursive VFS call to open quota file, we usually end up locking covered vnode after mp is busied in sys_quotactl(). Change the interface of VFS_QUOTACTL(), requiring that mp was unbusied by fs code, and do not try to pick up vfs_busy() reference in ufs quotaon, esp. if vfs_busy cannot succeed due to unmount being performed. Reported and tested by: pho MFC after: 1 week
- Add support for referencing quota structures without needing the inode pointer for softupdates. Submitted by: mckusick
Simplify uses of the web of pointers. Reviewed by: mckusick MFC after: 1 week
Embed a quota error message (C string) into uprintf() fmt. While here, fix whitespaces. Approved by: kib (mentor)
Extend the scope of the lock on the quota file vnode in quotaon() to cover the initial read by dqopen(). Assert that vnode is locked in dqopen(). Remove VFS_LOCK_GIANT() from dqopen(), since quotaon() keeps Giant locked if needed around the call.
Merger of the quota64 project into head. This joint work of Dag-Erling Smørgrav and myself updates the FFS quota system to support both traditional 32-bit and new 64-bit quotas (for those of you who want to put 2+Tb quotas on your users). By default quotas are not compiled into the kernel. To include them in your kernel configuration you need to specify: options QUOTA # Enable FFS quotas If you are already running with the current 32-bit quotas, they should continue to work just as they have in the past. If you wish to convert to using 64-bit quotas, use `quotacheck -c 64'; if you wish to revert from 64-bit quotas back to 32-bit quotas, use `quotacheck -c 32'. There is a new library of functions to simplify the use of the quota system, do `man quotafile' for details. If your application is currently using the quotactl(2), it is highly recommended that you convert your application to use the quotafile interface. Note that existing binaries will continue to work. Special thanks to John Kozubik of rsync.net for getting me interested in pursuing 64-bit quota support and for funding part of my development time on this project.
The dqrele() function syncs the dq, then acquires the dqh lock, and then does final drop of the the dq reference to put it onto the free list. There is a possibility that the dq would be found by another thread after sync and before the dqh lock is acquired. If that other thread drops the dq before we have taken the dqh lock, the dirty dq is put on the free list. Recheck the DQ_MOD after the dqh lock is relocked. Repeat dqsync() if the dq is dirty. This ensures that up to date dq is written in the quota file and fixes assertion in dqget(). Reported and tested by: Frode Nordahl <frode nordahl net> MFC after: 3 days
Improve usefulness of the panic by printing the pointer to the problematic dquot. In-tree gdb is often unable to get the dq value, so supply it in panic message. MFC after: 3 days
Whitespace, prototypes
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
Implement fine-grained locking for UFS quotas. Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global dqhlock mutex. i_dquot array for inode is protected by lockmgr' vnode lock, corresponding assert added to the dqget(). Access to struct ufsmount quota-related fields (um_quotas and um_qflags) is protected by um_lock. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith) This work were not possible without enormous amount of help given by Tor Egge and Peter Holm. Tor reviewed each version of patch, pointed out numerous errors and provided invaluable suggestions. Peter did tireless testing of the patch as it was developed.
Rename three quota privileges from the UFS privilege namespace to the VFS privilege namespace: exceedquota, getquota, and setquota. Leave UFS-specific quota configuration privileges in the UFS name space. This renumbers VFS and UFS privileges, so requires rebuilding modules if you are using security policies aware of privilege identifiers. This is likely no one at this point since none of the committed MAC policies use the privilege checks.
Limit quota privileges in jail to PRIV_UFS_GETQUOTA and PRIV_UFS_SETQUOTA.
Style(9).
If quotacheck or edquota reset the block or inode grace time for a user or group, when the kernel first sees this, it will update the grace time value. However, it never flags the quota as modified and the updated value never makes it to the quota data file unless the user actually makes some other change that would write the data out. Fixed to flag the quota as modified if the soft limit has actually been reached and should be now enforced.
Disallow negative UIDs when processing quotactl options.
Fix build. chkdquot() should not return anything.
Quota system cleanup.
1) Do not do quota accounting for the actual quota data files
or for file system snapshot files ("system" files). This
prevents a deadlock descibed in PR kern/30958 if the kernel
ever has to grow the quota file. Snapshot files were already
exempt from the quota checks, but this change generalized the check.
2) Fix a cast that caused extremely large uids/gids to incorrectly
write the quota information to the data file at a truncated
value for a uint_t32 id value. The incorrect cast caused quota
files in this case to be around 4GB in size, with the correct cast
they can now be 131GB in size. Also related to PR kern/30958.
3) Check for what appear to be negative UIDs/GIDs and not account
for them. This prevents the quota files from becoming 131GB in
size and causing quotacheck to run forever at bootup. This could
also cause the kernel to try and expand the quota file, which might
deadlock due to the issue in #1. kern/30958 and kern/38156
(and some much older closed PR's).
4) With the deadlock problems gone, the kernel can now expand the
size of the quota database files if it needs to.
5) Pass in the i-node count change value to chkiq and chkiqchg as an
int, like it used to be before the common routine was split up
into 2 different routines to increase / decrease the i-node in-use
count. Prevents an underflow on the i-node count. Related
to PR kern/89247.
6) Prevent the block usage from growing slowly if a file system is
full and the write was denied due to that fact. PR kern/89247.
Some of these changes require an updated quotacheck to prevent
the creation of huge (131GB) quota data files (item #3).
#1/#4 probably fixes a lot of the random hangs when quotas are enabled,
possibly some of the jail hangs.
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.
Turn off disk quotas for snapshot files.
Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
- Using LK_NOWAIT in qsync() can get us into infinite loop situations that lead to deadlocks. Remove it. MFC After: 1 week
In quotaoff(), lock the vnode instead of asserting it when manipulating v_vflags. MFC after: 1 week Submitted by: Antoine Brodin <antoine at brodin at laposte dot net>
Instead of asserting the vnode lock before manipulating v_vflag, acquire it and drop it afterwards. Found by: kris MFC after: 1 week
Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
Eradicate caddr_t from the VFS API.
Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
/* -> /*- for license, minor formatting changes
Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.
Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:
MNT_ILOCK(mp);
loop:
for (vp = TAILQ_FIRST(&mp...);
(vp = nvp) != NULL;
nvp = TAILQ_NEXT(vp,...)) {
if (vp->v_mount != mp)
goto loop;
MNT_IUNLOCK(mp);
...
MNT_ILOCK(mp);
}
MNT_IUNLOCK(mp);
The code which takes vnodes off a mountpoint looks like this:
MNT_ILOCK(vp->v_mount);
...
TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
...
MNT_IUNLOCK(vp->v_mount);
...
vp->v_mount = something;
(Take a moment and try to spot the locking error before you read on.)
On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.
Fix:
Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.
Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.
Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights. Approved by: core, rwatson
Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case there td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.
Temporarily undo parts of the stuct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff
- Properly acquire the vnode interlock before releasing the mntvnode_mtx. - Use a local variable to store the results of the test to see if the next vnode on the mount list has changed. This is so that we no longer acess the vnode after we vput() it.
Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
Re-implement kernel access control for quotactl() as found in the
UFS quota implementation. Push some quite broken access control
logic out of ufs_quotactl() into the individual command
implementations in ufs_quota.c; fix that logic. Pass in the thread
argument to any quotactl command that will need to perform access
control.
o quotaon() requires privilege (PRISON_ROOT).
o quotaoff() requires privilege (PRISON_ROOT).
o getquota() requires that:
If the type is USRQUOTA, either the effective uid match the
requested quota ID, that the unprivileged_get_quota flag be
set, or that the thread be privileged (PRISON_ROOT).
If the type is GRPQUOTA, require that either the thread be
a member of the group represented by the requested quota ID,
that the unprivileged_get_quota flag be set, or that the
thread be privileged (PRISON_ROOT).
o setquota() requires privilege (PRISON_ROOT).
o setuse() requires privilege (PRISON_ROOT).
o qsync() requires no special privilege (consistent with what
was present before, but probably not very useful).
Add a new sysctl, security.bsd.unprivileged_get_quota, which when
set to a non-zero value, will permit unprivileged users to query user
quotas with non-matching uids and gids. Set this to 0 by default
to be mostly consistent with the previous behavior (the same for
USRQUOTA, but not for GRPQUOTA).
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Use __FBSDID().
More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).
Back out M_* changes, per decision of the TRB. Approved by: trb
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
Mark two places where an unsigned number is checked "if (foo < 0)" with an XXX comment. Somebody[TM] should look at this in some detail. Spotted by: FlexeLint
- Don't use the interlock to protect v_writecount.
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
Remove the bogus SYSINIT from ufs_dirhash.c and instead add a call to ufsdirhash_init() from ufs_init(). Add uninit() functions corresponding the ufs, dirhash, quota and ihash init() functions.
This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
More s/file system/filesystem/g
Remove register keyword. Sponsored by: DARPA & NAI Labs. Submitted by: mckusick
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
Remove __P.
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
Do not pull quota entries of the cache-list if they have already been removed from the cache-list as part of a previous unmount. This would result in panics (page fault in dqflush()) during subsequent umounts provided that enough distinct UID's to actually make the hash do something are active. This can probably explain a number of weird quota related behaviours. PR: 32331 maybe more. Reproduced by: Søren Schrørder <sch@cybercity.dk>
Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days
Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
- Fix a mntvnode and vnode interlock reversal. - Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
Revert consequences of changes to mount.h, part 2. Requested by: bde
Correct #includes to work with fixed sys/mount.h.
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)
Convert all simplelocks to mutexes and remove the simplelock implementations.
Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
o Substitute suser() calls for direct credential checks, which is now safe as suser() no longer sets ASU. o Note that in some cases, the PRISON_ROOT flag is used even though no process structure is passed, to indicate that if a process structure (and hence jail) was available, it would be ok. In the long run, the jail identifier should probably be moved to ucred, as the uidinfo information was. o Some uid 0 checks remain relating to the quota code, which I'll leave for another day. Reviewed by: phk, eivind Obtained from: TrustedBSD Project
Minor tweak - removed unused variable 'struct mount *mp';
This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired. Obtained from: BSD/OS
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
This form allows you to request diffs between any two revisions of this file. For each of the two "sides" of the diff, enter a numeric revision.
| ViewVC Help | |
| Powered by ViewVC 1.1.27 |