libk
libk is intended as a modernized replacement (not reimplementation) for libc.
manifesto
normally, all C binaries (and binaries from other languages, depending on the platform) use a combination of libraries to get things done: POSIX libraries (interfaces common to UNIX-like operating systems) and libc, the C standard library. unlike POSIX, libc is part of the C language -- it's a standardized interface to various critical parts of the operating system, things like IO, system clock access, random number generation, and more.
it's also a piece of shit.
libc is ancient, and it shows. it contains decades worth of cruft, masses of different interfaces with completely different design, horrible hacks to get around the fundamental shifts in basic computer architecture that have occurred over the past half-century, and vendor-specific extensions that make porting code a nightmare. using it is painful, tedious, error-prone, and unsafe. for various reasons, there are many different implementations of libc, but all of them have that same broken, bloated interface in common. as far as i can tell, there have been no serious attempts to create an actual alternative to libc - a new system interface that takes into account the decades of painful lessons we programmers have learned since the heyday of UNIX.
hence, libk.
libk aims to offer a better, safer, and most importantly, less unpleasant foundation for modern code in C or any other language. it also aims to be much smaller, simpler, and faster than glibc to build so that there's no arduous bootstrapping process for new architectures.
currently, the only dependency on libc in any form is arch/typesize.c, a small binary tool which uses libc IO routines to print type information it calculates; however, this could also be augmented to use POSIX IO routines, or even potentially libk IO routines to remove any external dependency at all -- the work would be nontrivial, but fully feasible. further, the file it creates can also in extremis be created by hand. the final compiled libk binaries and headers do not depend on or reference libc in any way; typesize is only a makedepend.
goals
libk's goals are far-reaching, and suggestions are welcome. note however that libk is not intended to be a kitchen-sink library like libiberty. it's meant to do one thing, and to do it well: to provide an easy- and pleasant-to-use foundation for modern open source projects. below is a list of some of the project's major goals.
- IO. libc's basic input/output mechanisms are dreadful, built at entirely the wrong level of abstraction. libk is intended to make many more primitives available to the user, and offer a sliding scale of abstraction so libk is suitable for a wide range of needs.
- file manipulation. libc's file manipulation primitives are a relic of a bygone age and in dire need of upgrading.
- terminal manipulation. libc has no provision for simple output formatting, a task that requires a combination of ANSI codes and in some cases pty manipulation with POSIX APIs, both of which are somewhat dark wizardry. this situation forces many innocent coders to drag in the entire unholy bulk of the aptly named library ncurses, much of whose code has been utterly obsolete for the last twenty years and whose API is one of the most singularly hateful ones in existence. libk therefore should offer a simple, straightforward way to do gracefully-degrading terminal sorcery.
- memory management. the single memory management function malloc() provided by libc is absolutely pitiful. this is 2019. modern applications have much more exotic allocation needs, and a standard library should offer a range of allocators and management techniques, as well as abstract pointer objects so that pointers to objects of different allocation types (including static or stack allocation!) can be mixed freely and safely.
- intrinsic reentrancy. because jesus christ, libc.
- interprocess communication. libc offers no useful IPC abstractions over the paltry array of tools POSIX &co. give us to work with. we can do better.
- tooling. libk is intended as more than just a library. it's also intended to work with some basic tooling to automate tasks that current binary tooling is inadequate for -- for instance, embedding binary data into a program binary. (see module kgraft)
- modularity. libk is not part of the C specification and it isn't always going to be practical for developers to expect the entire library to be present on the end-user's computer. so libk is designed to be usable in many different ways -- as a traditional library, as a static library, in full form or with only components needed by the developer, to be distributed either on its own or as part of a binary.
- compatibility. code that links against libk should be able to compile and run on any operating system. in the ideal case (Linux or FreeBSD) it will be able to do so without touching any other system libraries; for less ideal environments like Windows, libk will when necessary abstract over system libraries or libc itself.
- sane error-handling. every time you type errno, god murders a puppy.
dependencies
libk is designed to be as portable and dependency-free as possible. ideally, it will be possible to compile code against libk using nothing but libk itself.
compiling libk is also designed to be as easy as possible. it has only two external dependencies: the macro processor gpp, needed for compile-time header generation, and the GNU make utility, whose advanced features are needed to perform the relatively complex task of building all of libk from the ground up.
while gpp is a very small program that builds quickly and has no major dependencies of its own, it is an obscure program not likely to be found in any repositories and with an uncertain future. for these reasons, adding m4 translations of the gpp headers should be a long-term priority. being able to be built with both a very small, easily built macro processor, and a very large but extremely well-supported processor, should make libk maximally buildable and future-proof.
while this project will include gpp tooling and GNU makefiles designed to ease the task of writing and building libk code (as well as tools in many other languages, including native binaries that compile against libk), none of them are required for the task.
naming conventions
one of the most frustrating things about libc is its complete and total lack of a naming convention. in C, every function and global is injected into a single global namespace, including macros. this means that every libc header you include scatters words all over that namespace, potentially clobbering your function with a macro!
libk is designed to fix this (in hindsight) glaring error.
however, a common problem with libraries is the proliferation of inordinately long and hard-to-type function names such as SuperWidget_Widget_Label_Font_Size_Set(). this may be tolerable in IDEs with robust auto-complete or when referencing a highly-specific, sparsely-used library; it is however completely intolerable in the case of a core library with heavily used functionality.
therefore, libk uses two slightly different naming conventions: the short convention, for core functions the user will call frequently, and the full convention, for less-commonly used functions. the inconvenience of remembering which is which will hopefully be outweighed by the keystrokes (and bytes) saved.
in the full convention, an identifier's name is prefixed with its module name followed by an underscore. thus, kgraft/list.c is invoked as kgraft_list().
in the short convention, identifiers are prefixed by the letter k followed by the module's "glyph" -- a one- or two-letter sequence that represents the module, usually the first one or two characters. therefore, kfile/open.c is invoked as kfopen.
which naming convention a module uses should be specified at the top of its documentation. if it uses the short convention, its glyph should be specified as well.
in both naming conventions, the following rules apply:
- the possible values of enumeration types are always preceded by the name of the enumeration type and an underscore. for instance, the enum ksalloc has a value named ksalloc_static. exception: an enum named <S>_kind, where <S> is a struct type, may simply use the prefix <S>_.
- macros begin with the uppercase letter K -- e.g. Kmacro. macros that can be defined by the user to alter the behavior of the api should begin with KF if they are on/off flags, or KV otherwise.
- capital letters are only used in macro prefixes.
- low-level function names are prefixed with the API they call into. for example, the function that performs the POSIX syscall write is named kio_posix_fd_write. a wrapper around the Windows function CreateProcess() might be called kproc_win_createprocess.
atoms
libk uses the concept of "atoms" (small, regular strings of text) to standardize common references, such as operating systems or processor architectures.
operating systems
these atoms will be used to reference operating systems.
- Linux: lin
- Haiku: hai
- Android: and
- FreeBSD: fbsd
- NetBSD: nbsd
- OpenBSD: obsd
- Darwin/Mac OS X/iOS: dar
- MS-DOS: dos
- FreeDOS: fdos
- Windows: win
- Windows MinGW: mgw
file extensions
- C function implementations: *.c
- C module headers: *.h
- ancillary C headers: *.inc.h
- assembly code: *.s
arches
these atoms will be used to reference particular system architectures. these will mostly be used in the filenames of assembly code.
macros
libk will not in any circumstance use macros to encode magic numbers, instead using typedef'd enums. all libk macros begin with the uppercase letter K -- e.g. Kmacro. macros that can be defined by the user to alter the behavior of the api should begin with KF if they are on/off flags, or KV otherwise. macros should only be defined by the libk headers if the flag KFclean is not defined at the time of inclusion.
include guards take the form of the bare module name prefixed by KI. so to test if k/term.h has been included, you could write #ifdef KIterm.
languages
libk uses only three languages: C (*.c, *.h), yasm (*.s), and make (makefile).
other assemblers will probably be necessary for the more exotic targets, however.
repository structure
libk uses a strict directory structure for code, and deviations from this structure will not be tolerated without extremely good reason.
total segregation is maintained between source code, temporary files, and output objects. source is found in module directories (k*/). the destinations for temporary files and output objects are retargetable via the make parameters TMP= and OUT=, but default to tmp/ and out/, which are excluded from the repo with fossil's ignore-glob setting.
all libk code is dispersed into modules: kcore for internals, kio for I/O, kgraft for binary packing, etc. each module has a folder in the root directory. (libk does not have submodules.) inside each module's directory should be a header named after the module, minus the k prefix (e.g. kio/io.h).
each function should be kept in a separate file within its module's directory. the file's name should consist of the dot-separated fields [name, class, "c"] for C sources, or [name, class, arch, OS, bits, format, "s"] for assembly sources, where "name" is the name of the function without the module prefix and "class" is rt if the file is part of the libk runtime, or fn otherwise. this distinction is necessary because while the static library libk.a can include runtime objects, the shared library libk.so cannot. examples:
- a C file in the module kstr named kscomp would be named kstr/comp.fn.c
- a runtime assembly file called boot in the module kcore for x86-64 linux would be named kcore/boot.rt.x86.lin.64.s
- the 32-bit x86 haiku version of a function called kiowrite defined in assembly would be named kio/write.fn.x86.hai.32.s.
each module should have a header named the same thing as the module except without the k prefix. (e.g. the header for kio is kio/io.h) located in its folder. this is the header that the end-user will be importing, and should handle any user-defined flags to present the API the user has selected.
each module directory should contain a makefile that can build that module. see makefiles below. all makefiles should be named makefile (not Makefile).
each module should contain a markdown file. this file's name should be the name of the parent directory suffixed with .md; for instance, kterm should contain the file kterm/kterm.md. this file should document the module as thoroughly as possible.
each module may contain any number of files of the name *.exe.c. these files will be treated as tools by the build system and compiled as executables, rather than libraries. they should be compiled to out/$module.$tool.
the repository root and each module may also contain the directory misc. this directory may be used to store miscellaneous data such as ABI references, developer discussions, and roadmaps. if the misc directory is deleted, this must not affect the library or build system's function in any way - that is, nothing outside a misc folder may reference a misc folder or anything inside it, including documentation. the misc directory should be removed when its contents are no longer needed. in most cases, the repository wiki and forum should be used instead of the misc folder.
the folder arch in the root of the repository contains syscall tables and ABI implementations for various architectures.
makefiles
libk uses make as its build system. makefiles should be handwritten. there will be one global makefile in the root of the repository, and one makefile for each module.
each rule should be prefixed with ${OUT}, to allow retargeting of the build-dir with the OUT environment variable. this is particularly important since the makefiles chain.
the rest is TBD.
design principles
there are four overriding principles that guide the design of libk.
- it should be easy to write code that uses it.
- it should be easy to read code that uses it.
- the simple, obvious way of using libk should produce the most optimal code.
- code that uses libk should be idiomatic C.
for these reasons, the codebase follows a number of strict rules.
booleans are banned
there are a number of reasons for this.
the first is simply that the boolean type in C is a bit messy and libk headers are intended to import as few extra files as possible.
the second is that boolean-using code can be hard to read. consider a struct declaration of the form rule r = { 10, buf, true, false, true }: the meaning of this declaration is opaque unless you've memorized the structure's definition.
instead, libk uses enums liberally. so the above might be rewritten as e.g.:
rule r = { 10, buf,
	rule_kind_undialectical,
	rule_action_expropriate,
	rule_target_bourgeoisie
};
this makes code much more legible and has the added benefit of making the definitions easier to expand at a later date if new functionality is needed without breaking the API or ABI.
build process
libk has a number of targets. all files generated by a make invocation will be stored in the folder "out" at the root of the repository. this directory may be deleted entirely to clean the repository.
defs will create the directory out/k/ and populate it with module header files. the k/ directory shall be suitable to copy to /usr/include or similar. these header files will be copied by building the ${OUT}/$(module).h target of each module's makefile.
libk.so will build the dynamically linked form of libk, according to the build variables set.
libk.a will build the statically linked form of libk, according to the build variables set.
tool will build the executables used for modules such as kgraft.
clean will delete the tmp and out trees.
authors
so far, this is a one-woman show. contributions are welcome, however.
- lexi hale lexi@hale.su
caveats
the main coder, lexi hale, is first and foremost a writer, not a coder. this is a side-project of hers and will remain so unless it picks up a significant amount of attention.
while PRs adding support for Windows, OS X, and other operating systems will be gratefully accepted, the maintainer is a Linux and FreeBSD developer, will not be writing such support infrastructure herself, and has limited ability to vet code for those platforms.
license
libk is released under the terms of the GNU AGPLv3. contributors do not relinquish ownership of the code they contribute, but agree to release it under the same terms as the overall project license.
the AGPL may seem like an inappropriately restrictive license for a project with such grandiose ambitions. it is an ideological choice. i selected it because libk is intended very specifically as a contribution to the free software community, a community that i hope will continue to grow at the expense of closed-source ecosystems. i have no interest in enabling people or corporations to profit from keeping secrets, especially not with my own free labor (or anyone else's, for that matter).
if you disagree with this philosophy, you are welcome to continue using libc.
what does the k stand for?
nothing. it was chosen in reference to libc - the letter C was part of the original roman alphabet, while K was added later by analogy to the Greek kappa ‹κ›. in my native language, the older letter ‹c› can make a number of different sounds based on context, including [k] and [s], while ‹k› is fairly consistently used for the sound [k]. and for orthographical reasons, [k] is often represented by the digraph ‹ck› - that is, a C followed by a K. hopefully the analogies are obvious.
this project has nothing to do with KDE.