2 OSAP: Machine Systems Interconnect
In this thesis, we want to build machine systems out of modular components. Doing so lets us re-use modules across projects, so that we can quickly compose new systems from the parts bin rather than re-engineering circuits, drivers, and code. The best way we know to do this is to build networked control systems with modular software.
2.1 Machine Building Needs Flexible Network Architectures
OSAP is essentially an implementation of the Open Systems Interconnect model (Standardization 1994) guided by the end-to-end principle (Saltzer, Reed, and Clark 1984), both of which were foundational during the invention and proliferation of the internet, but neither of which has been rigorously followed in the internet’s development (Group 2019). Indeed, machine-scale modular hardware systems (of the kind we deploy in this thesis) are deployed across a heterogeneity of network links and transport layers (Lian, Moyne, and Tilbury 2001), whereas the internet is dominated by only a few (TCP, Ethernet, WiFi). The OSI model was meant to enable broad connectivity across heterogeneous link layers, but in practice the field of internetworking in hardware (and in industrial machine systems in particular) is fractured.
Machine building needs OSI badly; it is perhaps worth more here than in The Internet, which is homogeneous enough to have its own Internet Protocol Suite… and is big enough to be maintained by dedicated internet professionals… machines / mechatronics are more heterogeneous, more ad-hoc, and we build new ones more often… we can discuss the next-generation-protocols paper, how OSI isn’t real (start page 15) …

Motivation here: there is ample opportunity to deploy hardware systems… and to apply increasingly intelligent (i.e. machine-learning) systems to their control, but building new ones still requires re-building many parts of this stack: motor control… sensing… comms… etc. Taking this on as an interoperable system is the exercise.
expectation would be that folks read these sub-chapters in whatever order, it’s not really linear, and probably they are here for a particular piece anyways
What kind of systems tooling is required before we can develop machines that produce more data than they consume, so that we can make models of their operation to understand and operate them more intelligently?
In a heterogeneous mechatronic system, can we design a distributed architecture that enables flexible re-use of components, and what trade-offs do we need to make to controller performance in order to enable this? (It is often noted that performance is less important than development speed, for example… in hardware, it is typically more important - but… can we have it all?)
- background: OSI, networked control, why mechatronics needs OSI even more than the internet does: heterogeneity, etc
- we are proposing to replace gcode with ~ a representation of a distributed algorithm, this means we need to distribute algorithms, we need this layer
- OSAP (networking, not meaning)
2.2 Related Work on Mechatronic Systems Architecture
TODO: sort and merge these into relevant topics, i.e. time sync background to services section, OOH and Modular Physical Computing -> Pipes… here we are doing network design, which is more focused (but… those inform this as well, so… writing hard!)
The work in this thesis is enabled by a flexible machine control architecture that combines modular hardware with software. This model was originally formalized by (Peek 2016) and (Moyer 2013) at the CBA as Object Oriented Hardware. The CBA also has a history of developing small networks for inter-device internetworking (Gershenfeld, Krikorian, and Cohen 2004) and building modular robotics (Smith 2023) (Abdel-Rahman et al. 2022).
Work on modular physical computing is active in the HCI community (Devine et al. 2022) (Ball et al. 2024) and has a long history in STEM education (Blikstein 2013) (Papert 2020). PyBricks (Valk and Lechner 2024) is an active project that deploys python interfaces on Lego modules. I made one contribution in this domain with Modular-Things (Read et al. 2023), alongside Quentin Bolsee and Leo McElroy, where we developed a new set of hardware modules and tested their use in a machine building session at MIT. That work contained early prototypes of OSAP (Chapter 2) and MAXL (Chapter 5); I also formalized some of MAXL’s design patterns in (Read, Peek, and Gershenfeld 2023), adding time-synchronized distributed trajectories as a design pattern for organizing motion across modules.
Efforts are also ongoing to improve interfaces for digital fabrication machines: (F. Fossdal, Heldal, and Peek 2021) and (F. H. Fossdal et al. 2023) develop interactive machine interfaces in Grasshopper, using a python script as an intermediary to send GCodes to an off-the-shelf machine controller. In (Tran O’Leary, Benabdallah, and Peek 2023), computational notebooks are used as an interface for machine workflows: their system also implements an intermediary software object that communicates with off-the-shelf controllers using GCode, but presents a more useful API to the notebook.
The Jubilee project (Vasquez et al. 2020) (Dunn, Feng, and Peek 2023) is a machine platform that implements a modular tool-changer, and has been successfully deployed by researchers to automate studies of duckweed (a popular model organism) (Subbaraman et al. 2024) and to study nanoparticles (Politi et al. 2023). Jubilee also uses an intermediary python object to interface with an off-the-shelf GCode controller, and shows the value of integrating motion systems with application-layer scripting languages.
Work in this thesis aims to extend these efforts by providing lower-level motion control interfaces in the same scripting languages, reducing distributed state in the overall control architecture and making systems easier to debug and develop; consolidating configuration state was a topic discussed during an NSF-sponsored workshop on open source lab automation tools that I attended (Peek and Pozzo 2023), where we used Jubilee machines. OSAP also extends other modular physical computing frameworks by enabling the use of a multitude of link layers, whereas e.g. JacDac and Gestalt are limited to custom embedded busses.
Object Oriented Hardware for machine control presents many practical challenges: control over networks introduces timing overheads, not present in monolithic controllers, that add constraints to control algorithms (X.-M. Zhang et al. 2019) (L. Zhang, Gao, and Kaynak 2012) (Lian, Moyne, and Tilbury 2002). Some of these challenges can be overcome by distributing models throughout a system, trading computation for bandwidth (Yook, Tilbury, and Soparkar 2002) - MAXL (Chapter 5) takes some inspiration from this approach, allowing motors to incorporate simple local controllers that can take over in the event of network failures.
Developing networks for real-time systems is itself a challenge; fortunately, there is well-established practice in this domain. In particular, I borrow a scheduling pattern from (Di Natale 2000) and clock synchronization patterns from the Network Time Protocol (Mills 1991), its high-performance counterpart (Eidson, Fischer, and White 2002), and other lightweight approaches (Ciuffoletti 1994). I have also studied simpler approaches from the explicitly real-time domain (Kopetz and Ochsenreiter 1987).
2.3 OSAP
TODO: also many scattered chunks in here to anneal/merge,
OSAP (for Open Systems Assembly Protocol) is a networking protocol and system that I developed for the task. On its own it is a relatively lightweight piece of software that I have authored in C++ for embedded devices and in Python for high-level system components. It includes a runtime where messages are queued and passed between objects (Section 2.3.2), software interfaces to network drivers (Section 2.4.2), and interfaces into software (Section 2.4.5). The system itself is link agnostic, meaning that it can be extended across many types of networking technologies with relatively little overhead. Because OSAP is a protocol and design spec, it should be easy to author in other languages when, for example, we want to build versions for Rust, JavaScript, or even into hardware design languages for custom silicon or FPGAs.
OSAP’s main task is to get serialized messages from any port in the system to any other port in a timely manner. It also provides two valuable services. Section 2.5.1 describes the discovery service, which allows any device to retrieve a network map of connected devices. This allows us to inspect networks and determine, for example, whether the motor drivers that our machine needs are, in fact, connected (and how to reach them). Section 2.5.2 describes the time synchronization service, which keeps device clocks in step with one another. This is a critical building block for mechatronic systems since it lets us synchronize motion, measure real network performance, and collect coherent time-series data from networks of sensors and actuators.
OSAP is not itself semantically meaningful, in the same way that IP addresses are not; this is the layer where we get the bytes from one place to another. For the layer where we make sense and structure out of those bytes, see Chapter 3.
2.3.1 Design Goals
OSAP is based on a thread of research that goes back decades in the CBA’s history: object oriented hardware, which pairs modular hardware with modular software. I aimed to expand this architecture to span a broader heterogeneity of components and network configurations, to more easily add new firmwares and software integrations, and to enable the development of inter-device data flows (as opposed to star-shaped controller topologies).
I developed OSAP with the high-level goal of enabling asynchronous collaboration between machine developers based on interoperability and modularity of functional components. The same principles have driven the runaway success of open source software efforts as explored in (Eghbal 2020) and (Benkler 2002), who note that the ecosystems that enable distributed collaboration on open source software are themselves modular, performant, and extensible. That is: the systems that we use to compose systems are themselves composable.

To evaluate OSAP, I will measure its performance in terms of runtime overhead, program size overhead and networking overhead. I will also evaluate its flexibility in deploying across heterogeneous link layers and software components. Qualitatively, I will be able to evaluate where OSAP’s structures were helpful and where they were a hindrance while I was developing the other systems in this thesis.
2.3.2 OSAP Runtime, Implementation
an important section, har ! many challenges and outcomes from this part
effectively we are using packet queueing and scheduling as a substitute for task scheduling. The timer evaluations break that, and we get into the impossible / NP-hard situation.
Using Software Interfaces: ambition = header-only networking include
is RTOS-like, but simpler packet (rather than task) based scheduling (Di Natale 2000) …
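The packet-as-task scheduling idea above can be sketched as an earliest-deadline-first queue, where each packet's time-to-live sets its deadline and expired packets are dropped rather than delivered late. This is a minimal illustration of the pattern, not OSAP's actual runtime:

```python
import heapq

class Packet:
    def __init__(self, data, arrival_ms, ttl_ms):
        self.data = data
        self.deadline = arrival_ms + ttl_ms  # time-to-live sets the deadline

class PacketScheduler:
    """Earliest-deadline-first packet queue: packets stand in for tasks."""
    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so heapq never compares Packet objects

    def push(self, pkt):
        heapq.heappush(self._heap, (pkt.deadline, self._count, pkt))
        self._count += 1

    def pop_next(self, now_ms):
        """Return the most urgent still-live packet, dropping expired ones."""
        while self._heap:
            deadline, _, pkt = heapq.heappop(self._heap)
            if deadline >= now_ms:
                return pkt
        return None  # queue empty, or everything in it was past its deadline
```

Scheduling by deadline rather than by task priority is what lets a single queue serve both forwarding and local delivery.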
2.4 OSAP’s Layers
2.4.1 OSAP 1: PHY
- TTL, RS485, Byte Framing (?)
- is UART, SPI, … a PHY, a Link, TF?
- the backpacks, and generalizing across this layer (seems valuable)
- an aside: power routing vs. data routing and separation of interests…
2.4.2 OSAP 2: Links
- what links are responsible for (data integrity, packetization)
- what they are not responsible for and should not do (retransmits, nonlinearity: leave it to transport layers)
- how OSAP ingests links,
- a coupla’ links we love:
- UART at RS485 and TTL levels
- a SPI Bus, CAN Bus, I2C ? (the whole bus conundrum)
- USB (breaks the rules: it does delivery guarantees)
2.4.3 OSAP 3: Networking
- source routing, network addresses
- packets making routes,
- packets, pointers, instructions,
- time sync, isochronous-ness, epoch timestamps
- the OSAP runtime, first-deadline scheduling and time-to-live deadlines
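The source-routing bullets above can be illustrated with a minimal sketch: the sender writes the entire route into the packet, and each device along the way reads one instruction and advances a pointer. The layout here is illustrative, not OSAP's wire format:

```python
def next_hop(route, ptr):
    """Source routing in miniature: the sender writes the whole route
    (a list of outgoing port indices), and each device along the way
    reads the instruction at `ptr`, forwards, and increments the pointer."""
    if ptr >= len(route):
        return None, ptr          # pointer past the route: we are the destination
    return route[ptr], ptr + 1    # port to forward on, and the advanced pointer

# simulate one packet traversing a three-hop route
route = [2, 0, 3]   # leave on port 2, then port 0, then port 3
ptr, hops = 0, []
while True:
    port, ptr = next_hop(route, ptr)
    if port is None:
        break
    hops.append(port)
```

Because the route travels with the packet, intermediate devices need no routing tables: they only execute the next instruction.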
2.4.4 OSAP 4: Transport
Moving bytes around: delivery guarantees or time guarantees, single- and multi-segment, a matrix of them.
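A minimal sketch of that matrix, assuming its two axes are guarantee type (delivery vs. timeliness) and segmentation (single vs. multi); the mode names below are illustrative, not OSAP's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransportMode:
    ack_retransmit: bool  # delivery guarantee: retry until acknowledged
    deadline_drop: bool   # time guarantee: drop rather than deliver late
    multi_segment: bool   # payload split across multiple packets

# one corner of the matrix per mode; names are illustrative, not OSAP's
DATAGRAM     = TransportMode(False, True,  False)  # fresh-or-nothing, one packet
RELIABLE     = TransportMode(True,  False, False)  # arrives eventually, one packet
STREAM       = TransportMode(True,  False, True)   # reliable, multi-segment
ISOCH_STREAM = TransportMode(False, True,  True)   # timely, multi-segment
```

Delivery and time guarantees are mutually exclusive in this sketch: a transport either retries until acknowledged or drops what has gone stale.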
2.4.5 OSAP 5: Ports (Software Interface)
Exposing network APIs to software modules. Transport heads / tails.
2.5 OSAP Services
2.5.1 Network Discovery
it’s a distributed DNS, y’all
This is… where the thing comes full circle: we have our little name server port, strings to addresses, etc. It shows the awkwardness, but we can also point out that “DNS was the accidental mistake in… internet design,” says Tim Berners-Lee.
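The name-server idea can be sketched as a tiny lookup table from human-readable names to source routes, populated by the discovery sweep. This is a toy illustration, not OSAP's implementation:

```python
class NameServer:
    """A tiny DNS-like table: device names to source routes, populated
    by the discovery sweep. First-recorded wins here; shortest-route-first
    would be a natural alternative policy."""
    def __init__(self):
        self._table = {}

    def record(self, name, route):
        # multiple routes may exist to one name, e.g. on a redundant graph
        self._table.setdefault(name, []).append(route)

    def lookup(self, name):
        routes = self._table.get(name)
        if not routes:
            raise KeyError(f"no device named {name!r} on the network")
        return routes[0]
```

The awkwardness mentioned above shows up in the policy question: when several routes resolve one name, somebody has to decide which to answer with.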
2.5.2 Distributed Time Synchronization
It’s time sync, babey. We want it for all kinds of reasons, we do it with packet stamps etc…
Our modular systems are situated in the physical world, where timing is critical: across networking itself (to ensure that packets are delivered within particular windows), sensor gathering (synchronized readings let us more accurately reconstruct global phenomena), and of course motion control (where, see chapter_x, we synchronize motion using synchronized time - sort of obviously).
I elected to build time into OSAP as a core networking service; because the system’s use is so focused on the real world, because of its position in the alleged OSI stack, and …
… millisecond precision is pretty good, microsecond is better, but we can only measure down to microseconds on many of our devices … this could probably be improved, but was outside of the scope. The protocol renders time in nanoseconds.
… for an analysis on how timing precision relates to motion precision, see (something we might write?)
… for an understanding of where micro- and nano-second timing may be useful, see i.e. the small-ish kibble balance …
So, goals: few-microsecond sync between devices, autonomously, at all times… robust against hot-plugging, user-selectable grandmaster clocks, stability over time.
One of the key services that OSAP provides is clock synchronization, which is used as a basis for motion control and for time-series data collection (to build models). Since existing clock sync algorithms are complex and consume large amounts of program memory, I developed a simple version from scratch.

The algorithm is essentially a distributed diffusion routine: each device requests time stamps over all active links, picks the best source, and then skews its own clock in order to minimize errors. The algorithm works well enough for me to complete all of the tasks in this thesis, but I would like to evaluate it more rigorously, since high performance synchronization is a requirement of advanced control systems.
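The source-selection step in that diffusion routine can be sketched as follows, assuming each link poll yields a stratum (hop count from the grandmaster clock), a round-trip time, and an offset; the field names are illustrative, not the protocol's:

```python
def best_source(measurements):
    """Pick which neighbor to follow: lowest stratum first (fewest hops
    from the grandmaster clock), ties broken by lowest round-trip time."""
    return min(measurements, key=lambda m: (m["stratum"], m["rtt_us"]))
```

Preferring stratum over round-trip time is what makes the routine converge toward one grandmaster rather than chasing whichever neighbor happens to be closest.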
2.5.2.1 The Packet, the Measurement
- we bounce a ping, measure theirs, and get RTT,
- we do it underneath transport layers (which may be re-trying) to avoid their asymmetric nonlinearity
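The measurement in those bullets is the classic symmetric-delay estimate (as in NTP): a sketch, assuming microsecond stamps and the hypothetical names below:

```python
def measure_offset(t_send_us, t_remote_us, t_recv_us):
    """One ping-pong measurement: stamp t_send locally, the peer replies
    with its own clock reading t_remote, stamp t_recv on arrival.
    Assuming the link delay is symmetric, the peer's stamp was taken one
    half round-trip after our send."""
    rtt = t_recv_us - t_send_us
    offset = t_remote_us - (t_send_us + rtt / 2)  # peer clock minus ours
    return offset, rtt
```

The symmetry assumption is exactly why the measurement runs beneath the transport layer: a retransmitted leg would make the two directions unequal and bias the offset.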
2.5.2.2 Fixed Point
- we do it with fixed point, y’all - lots of bits in this one
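A minimal illustration of why fixed point matters here: with a fractional-bit representation, tiny per-tick skews accumulate exactly in integer arithmetic. The 32.32-style split below is an assumption for illustration, not the protocol's actual format:

```python
FRAC_BITS = 32  # a 32.32-style split, assumed here for illustration

def to_fixed(ns):
    """Nanoseconds as an integer with FRAC_BITS fractional bits, so that
    sub-nanosecond per-tick skews accumulate without floating-point drift."""
    return round(ns * (1 << FRAC_BITS))

def from_fixed(fx):
    """Back to (float) nanoseconds, for display or comparison."""
    return fx / (1 << FRAC_BITS)
```

A skew of one-thousandth of a nanosecond per tick is far below integer-nanosecond resolution, but summed over a million ticks it amounts to a full microsecond; the fractional bits are what keep that sum honest.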
2.5.2.3 The Control Algo
- it’s an exponential filter and a p-term, y’all, believe it
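That control loop, sketched: an exponential (EWMA) filter smooths the noisy offset measurements, and a proportional term converts the filtered error into a rate-skew command. Gains here are illustrative, not tuned values from this thesis:

```python
class ClockServo:
    """EWMA filter plus proportional term: smooth the noisy offset
    measurements, then skew the local clock rate toward the source."""
    def __init__(self, alpha=0.2, kp=0.05):
        self.alpha = alpha   # filter weight given to each new measurement
        self.kp = kp         # proportional gain on the filtered error
        self.filtered = 0.0

    def update(self, offset_measurement):
        # exponential filter: move a fraction alpha toward the new sample
        self.filtered += self.alpha * (offset_measurement - self.filtered)
        return self.kp * self.filtered  # rate-skew command for the local clock
```

Skewing the rate (rather than stepping the clock) keeps local time monotonic, which downstream consumers like motion controllers depend on.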
2.5.2.4 Evaluation
- we built a tiny test-bed using a hex of parts,
- we tuned the thing, we did hot-plugging,
- here are some graphs of clocks doing it (time-series of skews aligning)
- at variable control values
- … how does the control value and the filter value relate to the jitter in the measurement, the no. of clocks, etc ?
- here’s how hot-plugging perturbs a local section of the graph, and how long it takes to recover
2.5.2.5 Discussion
- NTP and PTP exist as well, they’re similar … but typically more complex :|
- diffusion is maybe quick-and-dirty but not super rigid, as time should be ?
2.6 Protocol and Runtime Specification
write down packet specs, rules
2.7 Evaluating OSAP
2.7.1 Runtime Overhead
- func-to-func calls in-sys vs. out vs. native (serialization overhead) (enables pseudo-parallelism in embedded, and flexibility in the same via compiler bypass)
- total time doing osap stuff vs. time doing controller stuff (on i.e. the motor controller, our tightest yet)
2.7.2 Program Size Overhead
- given a firmware w/ functional API, how much FLASH/RAM is added when we compile w/ OSAP handles, networking codes, vs. without ? compare across devices…
2.7.3 Network Overhead
- packet delay vs. (calculated) link times ? (develop trace packet ?)
- i.e. how much time is taken to move, process a packet…
- simple packet overhead calculation: frame size vs. bytes for routing, time, etc
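The simple overhead calculation in the last bullet can be sketched as a back-of-envelope function; all field sizes below are illustrative assumptions, not the protocol's spec:

```python
def overhead_fraction(payload_bytes, route_hops,
                      bytes_per_hop=1, pointer_bytes=1,
                      timestamp_bytes=8, crc_bytes=2):
    """What fraction of each frame is header rather than payload?
    Field sizes here are illustrative assumptions: one byte per route
    hop, a one-byte pointer, an eight-byte timestamp, a two-byte CRC."""
    header = pointer_bytes + route_hops * bytes_per_hop + timestamp_bytes + crc_bytes
    return header / (header + payload_bytes)
```

Under these assumptions, a 64-byte payload crossing three hops spends roughly 18% of its frame on header, and the fraction shrinks as payloads grow, which is an argument for batching small messages.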
2.8 Future Work
- osap should not be general purpose: fills a niche for heterogeneous realtime systems
- expand time for epoch ns, add precision, add link-layer stamps
- trace packets for network analysis: want to enable scheduling and resource allocation design, which is the real solution to the NP-hard problem
- use insert-sort rather than resort
- address RTOS-ness, and programming model: serialization improvements -> compile-time direct struct-memory-access rather than recursive function rollup
- address availability of multicore mcu: shuffling and transport layer + user-code layer
- address the snake in the room: python good for interfaces but not for runtime, should be possible to have c/rust backend w/ py-api