ㄴㅋㄹㅇ🌅

Back in school at KAIST, I took a couple of in-depth courses on OS design, and I even implemented my own OS as an assignment. But, to be honest, like many other students, I focused only on the questions asked and the topics covered. I did not read the full book, let alone all the material available. I did not fight the unknown unknowns. In my defense, I typically had 7+ courses per semester, so there was not enough time! Now that I have more free time after work, I decided to read through an authoritative UNIX text, and I picked Advanced Programming in the UNIX Environment. While reading it, I couldn't help but think of two ideas.

First, UNIX (and the OS and kernel in general) overlaps with popular language runtimes and cloud software. For example, like the Go and Java runtimes, UNIX handles scheduling, communication between executors, stacks, and memory management. Like Kubernetes, Airflow, Spark, microservices, and the cloud in general, UNIX handles login, authorization, communication, scheduling, and data sharing. Some UNIX facilities are actually ahead of the current state of the cloud, for example memory sharing. It also seems that the distributed systems and microservices community focuses too much on failures. On a single host, say one UNIX machine, you barely have to worry about partial failure, which simplifies life a lot. The single-host paradigm is powerful. If you have a single large instance with a lot of processors, maybe just run your data pipelines or other workloads on it. I think some of the aforementioned functions will be merged into UNIX itself at some point.
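To make the memory sharing point concrete, here is a minimal sketch (not taken from the book) of two processes on one host exchanging a value through a shared page. It assumes a UNIX-like system where `os.fork` is available; the function name `sum_in_child` is made up for illustration.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Fork a child that writes sum(values) into memory shared with the parent."""
    # Anonymous mapping. On UNIX, Python's mmap defaults to MAP_SHARED,
    # so the page stays shared between parent and child after fork().
    buf = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:
        # Child: write the result straight into the shared page, then exit
        # without running the parent's cleanup handlers.
        buf[:8] = struct.pack("q", sum(values))
        os._exit(0)
    # Parent: wait for the child to finish, then read what it wrote.
    os.waitpid(pid, 0)
    result = struct.unpack("q", buf[:8])[0]
    buf.close()
    return result


if __name__ == "__main__":
    print(sum_in_child([1, 2, 3]))
```

No serialization layer, no network hop, and no broker sits between the two processes, which is roughly what cloud data-sharing tools have to rebuild on top of the network.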

Which leads me to the second point. We can go further. A lot of things we take as absolute truths (we need a CPU, we need instructions, we need signals, sockets, files) are just one holistic way to interact with powerful hardware: a general-purpose computation facility, so to speak. You can plug in any code and it will run quite fast. However, now I wonder: are there alternatives? If all that our service will ever do is KV storage accessed over UDP without authentication, can we just build some raw hardware and skip UNIX, the OS, the kernel, and the CPU entirely? One big obstacle is that the CPU and this general-purpose hardware have been optimized for decades, so they already run really fast. However, if you can start from scratch, have the resources, and have the manufacturing prowess, then you might give a raw hardware service a shot. And never mind that it takes a long time to produce this hardware: the concept of KV storage will never disappear, and the concept of SQL is more than 40 years old. I am dying to see a pure hardware-based distributed KV store, or even SQL storage, or maybe something like AWS S3 distributed file storage. Performance must be blazing. An ASIC (Application-Specific Integrated Circuit) may be the way forward.
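For a sense of how small such a service is, here is a rough software sketch of a KV store accessed over UDP without authentication. The text protocol (`SET key value`, `GET key`) and the names `handle`, `serve`, and `demo` are assumptions made for illustration only; a hardware version would presumably use a fixed binary format instead.

```python
import socket
import threading


def handle(store, request):
    """Apply one text request to the store and return the reply bytes."""
    parts = request.split(b" ", 2)
    if len(parts) == 3 and parts[0] == b"SET":
        store[parts[1]] = parts[2]
        return b"OK"
    if len(parts) == 2 and parts[0] == b"GET":
        return store.get(parts[1], b"")  # empty reply for a missing key
    return b"ERR"


def serve(sock, store, max_requests):
    """Answer max_requests datagrams: one reply per request, no authentication."""
    for _ in range(max_requests):
        data, addr = sock.recvfrom(2048)
        sock.sendto(handle(store, data), addr)


def demo():
    """Run a server on loopback, send a SET and a GET, and return the replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    worker = threading.Thread(target=serve, args=(server, {}, 2), daemon=True)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    replies = []
    for msg in (b"SET color blue", b"GET color"):
        client.sendto(msg, server.getsockname())
        replies.append(client.recvfrom(2048)[0])

    worker.join()
    client.close()
    server.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

The whole request path is parse, table lookup, reply. Everything else in the stack (sockets, scheduler, interpreter) is general-purpose machinery that a dedicated circuit could, in principle, skip.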

As usual, here are some notes from the book.

Basics

Files

Process

Threads

Daemons

Advanced I/O

IPC

Network IPC, Sockets

UNIX domain sockets

Example: Database

Skipped sections