2 OSAP: Machine Systems Interconnect

In this thesis, we want to build machine systems out of modular components. Doing so lets us re-use modules across projects, letting us quickly compose new systems from the parts bin rather than re-engineering circuits, drivers and code. The best way we know to do this is to build networked control systems with modular software.

2.1 Machine Building Needs Flexible Network Architectures

OSAP is essentially an implementation of the Open Systems Interconnection model (Standardization 1994) guided by the end-to-end principle (Saltzer, Reed, and Clark 1984). Both were foundational during the invention and proliferation of the internet, but neither has been rigorously followed in the internet’s development (Group 2019). Machine-scale modular hardware systems (of the kind we deploy in this thesis) run on a heterogeneous mix of network links and transport layers (Lian, Moyne, and Tilbury 2001), whereas the internet is dominated by only a few (TCP, Ethernet, WiFi). The OSI model was meant to enable broad connectivity across heterogeneous link layers, but in practice the field of internetworking in hardware (and in industrial machine systems in particular) is fractured.

Machine building needs OSI badly; it is perhaps worth more here than in The Internet, which is homogeneous enough to have its own stack in the Internet Protocol Suite… and is big enough to be maintained by specific Internet professionals… machines / mechatronics are more heterogeneous, more ad-hoc, and we build new ones more often… we can discuss the next-generation-protocols paper, how OSI isn’t real (start page 15) …

A simplified OSI layering, as implemented in OSAP. Each layer presents well defined software APIs to surrounding layers, in an effort to make components of the system easy to interchange. For example, one of the challenges posed by modular hardware systems is that devices typically deploy on heterogeneous link layers: many choose CAN bus, others use EtherCAT, while simple devices may use I2C or UART based links. OSAP makes an effort to allow for combinations of link layers in any given system.

Motivation here: lots of opportunity to deploy hardware systems… and apply increasingly intelligent i.e. machine-learning systems to their control, but building new ones still requires re-building many parts of this stack: motor control… sensing… comms… etc. Taking this on as an interoperable system is the exercise.

expectation would be that folks read these sub-chapters in whatever order, it’s not really linear, and probably they are here for a particular piece anyways

What kind of systems tooling is required before we can develop machines that produce more data than they consume, so that we can make models of their operation to understand and operate them more intelligently?

In a heterogeneous mechatronic system, can we design a distributed architecture that enables flexible re-use of components, and what trade-offs do we need to make to controller performance in order to enable this? (It is often noted that performance is less important than development speed, for example… in hardware, it is typically more important - but… can we have it all?)

background: OSI, networked control, why mechatronics needs OSI even more than the internet does: heterogeneity, etc
we are proposing to replace gcode with ~ a representation of a distributed algorithm, this means we need to distribute algorithms, we need this layer
OSAP (networking, not meaning)

2.2 Related Work on Mechatronic Systems Architecture

TODO: sort and merge these into relevant topics, i.e. time sync background to services section, OOH and Modular Physical Computing -> Pipes… here we are doing network design, which is more focused (but… those inform this as well, so… writing hard!)

The work in this thesis is enabled by a flexible machine control architecture that combines modular hardware with software. This model was originally formalized by (Peek 2016) and (Moyer 2013) at the CBA as Object Oriented Hardware. The CBA also has a history of developing small networks for inter-device internetworking (Gershenfeld, Krikorian, and Cohen 2004) and building modular robotics (Smith 2023) (Abdel-Rahman et al. 2022).

Work on modular physical computing is active in the HCI community (Devine et al. 2022) (Ball et al. 2024) and has a long history in STEM education (Blikstein 2013) (Papert 2020). PyBricks (Valk and Lechner 2024) is an active project that deploys Python interfaces on LEGO modules. I made one contribution in this domain with Modular-Things (Read et al. 2023) alongside Quentin Bolsee and Leo McElroy, where we developed a new set of hardware modules and tested their use in a machine building session at MIT. That work contained early prototypes of OSAP (Chapter 2) and MAXL (Chapter 5); I also formalized some of MAXL’s design patterns in (Read, Peek, and Gershenfeld 2023), adding time-synchronized distributed trajectories as a design pattern for organizing motion across modules.

Efforts are also ongoing to improve interfaces for digital fabrication machines. (F. Fossdal, Heldal, and Peek 2021) and (F. H. Fossdal et al. 2023) develop interactive machine interfaces in Grasshopper, using a Python script as an intermediary to send GCodes to an off-the-shelf machine controller. In (Tran O’Leary, Benabdallah, and Peek 2023), computational notebooks are used as an interface for machine workflows: their system also implements an intermediary software object that communicates with off-the-shelf controllers using GCode, but presents a more useful API to the notebook.

The Jubilee project (Vasquez et al. 2020) (Dunn, Feng, and Peek 2023) is a machine platform that implements a modular tool-changer, and has been successfully deployed by researchers to automate studies of duckweed, a popular model organism (Subbaraman et al. 2024), and to study nanoparticles (Politi et al. 2023). Jubilee also uses an intermediary Python object to interface with an off-the-shelf GCode controller, and shows the value of integrating motion systems with application-layer scripting languages.

Work in this thesis aims to extend these efforts by providing lower level motion control interfaces in the same scripting languages, reducing distributed state in the overall control architecture and making systems easier to debug and develop. Consolidating configuration state was a topic discussed during an NSF-sponsored workshop that I attended on open source lab automation tools (Peek and Pozzo 2023), where we used Jubilee machines. OSAP also extends other modular physical computing frameworks by enabling the use of a multitude of link layers, whereas e.g. JacDac and Gestalt are limited to custom embedded busses.

Object Oriented Hardware for machine control presents many practical challenges: control over networks introduces timing overheads, not present in digital controllers, that add constraints to control algorithms (X.-M. Zhang et al. 2019) (L. Zhang, Gao, and Kaynak 2012) (Lian, Moyne, and Tilbury 2002). Some of these challenges can be overcome by distributing models throughout a system, trading computation for bandwidth (Yook, Tilbury, and Soparkar 2002). MAXL (Chapter 5) takes some inspiration from this approach, allowing motors to incorporate simple local controllers that can take over in the event of network failures.

Developing networks for real-time systems is itself a challenge, but luckily there is well established practice in this domain. In particular, I borrow a scheduling pattern from (Di Natale 2000) and clock synchronization patterns from Network Time Protocol (Mills 1991), its high-performance counterpart IEEE 1588 (Eidson, Fischer, and White 2002), and other simple approaches (Ciuffoletti 1994). I have also studied simpler approaches from the explicitly real-time domain (Kopetz and Ochsenreiter 1987).

2.3 OSAP

TODO: also many scattered chunks in here to anneal/merge,

OSAP (for Open Systems Assembly Protocol) is a networking protocol and system that I developed for the task. On its own it is a relatively lightweight piece of software that I have authored in C++ for embedded devices and in Python for high-level system components. It includes a runtime where messages are queued and passed between objects (Section 2.3.2), software interfaces to network drivers (Section 2.4.2), and ports into software (Section 2.4.5). The system itself is link agnostic, meaning that it can be extended across many types of networking technologies with relatively little overhead. Because OSAP is a protocol and design spec, it should be straightforward to implement in other languages, e.g. Rust or JavaScript, or in hardware description languages for custom silicon or FPGAs.

OSAP’s main task is to get serialized messages from any port in the system to any other port in a timely manner. It also provides two valuable services. Section 2.5.1 describes the discovery service, which allows any device to retrieve a network map of connected devices. This allows us to inspect networks and determine, e.g., whether the motor drivers that our machine needs are in fact connected (and how to reach them). Section 2.5.2 describes the time synchronization service, which keeps device clocks in step with one another. This is a critical building block for mechatronic systems since it lets us synchronize motion, measure real network performance, and collect coherent time-series data from networks of sensors and actuators.

OSAP is not itself semantically meaningful, in the same way that IP addresses are not; this is the layer where we get the bytes from one place to another. For the layer where we make sense and structure out of those bytes, see Chapter 3.

2.3.1 Design Goals

OSAP builds on a thread of research into object oriented hardware, which pairs modular hardware with modular software, that goes back tens of years in the CBA’s history. I aimed to expand this architecture to span a broader heterogeneity of components and network configurations, to more easily add new firmwares and software integrations, and to enable the development of inter-device data flows (as opposed to star-shaped controller topologies).

I developed OSAP with the high-level goal of enabling asynchronous collaboration between machine developers based on interoperability and modularity of functional components. The same principles have driven the runaway success of open source software efforts as explored in (Eghbal 2020) and (Benkler 2002), who note that the ecosystems that enable distributed collaboration on open source software are themselves modular, performant and extensible. That is, the systems that we use to compose systems are themselves composable.

A diagram of how open source software developers interchangeably use components from a commons of functional modules, and develop and publish their own. Software has many “built-in” tools for modularity, but hardware tends to resist generalization. Modular hardware approaches try to bridge this gap, to enable the development of a commons of re-useable devices.

To evaluate OSAP, I will measure its performance in terms of runtime overhead, program size overhead and networking overhead. I will also evaluate its flexibility in deploying across heterogeneous link layers and software components. Qualitatively, I will be able to evaluate where OSAP’s structures were helpful and where they were a hindrance while I was developing the other systems in this thesis.

2.3.2 OSAP Runtime, Implementation

an important section, har ! many challenges and outcomes from this part

effectively we are using packet queuing and scheduling as a substitute for task scheduling. The timer evaluations break that, and we get into the impossible / NP-Hard situation.

Using Software Interfaces: ambition = header-only networking include

is RTOS-like, but simpler packet (rather than task) based scheduling (Di Natale 2000) …
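
A minimal sketch of what packet-based (rather than task-based) scheduling could look like: queued packets are served earliest deadline first, and packets whose deadline has already passed are dropped. The class and field names (`DeadlineQueue`, `deadline_us`) are assumptions made for illustration and are not taken from OSAP's source.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedPacket:
    deadline_us: int                       # absolute deadline in microseconds
    seq: int                               # arrival order, breaks deadline ties
    payload: bytes = field(compare=False)  # never compared when ordering


class DeadlineQueue:
    """Serve packets earliest-deadline-first; drop packets that have expired."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, payload: bytes, deadline_us: int) -> None:
        packet = QueuedPacket(deadline_us, next(self._counter), payload)
        heapq.heappush(self._heap, packet)

    def pop(self, now_us: int):
        # Discard anything whose deadline is already behind us.
        while self._heap and self._heap[0].deadline_us < now_us:
            heapq.heappop(self._heap)
        return heapq.heappop(self._heap).payload if self._heap else None


queue = DeadlineQueue()
queue.push(b"late", deadline_us=500)
queue.push(b"urgent", deadline_us=120)
queue.push(b"stale", deadline_us=50)
first = queue.pop(now_us=100)  # b"stale" has expired, so b"urgent" is served
```

Each call to `pop` plays the role of one pass of a runtime loop; there is no preemption, which is the simplification relative to an RTOS task scheduler.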

2.4 OSAP’s Layers

2.4.1 OSAP 1: PHY

TTL, RS485, Byte Framing (?)
is UART, SPI, … a PHY, a Link, TF?
the backpacks, and generalizing across this layer (seems valuable)
an aside: power routing vs. data routing and separation of interests…

2.4.2 OSAP 2: Links

what links are responsible for (data integrity, packetization)
what they are not responsible for and should not do (retransmits, nonlinearity: leave it to transport layers)
how OSAP ingests links,
a coupla’ links we love:
- UART at RS485 and TTL levels
- a SPI Bus, CAN Bus, I2C ? (the whole bus conundrum)
- USB (breaks the rules: it does delivery guarantees)

2.4.3 OSAP 3: Networking

source routing, network addresses
packets making routes,
packets, pointers, instructions,
time sync, isochronous-ness, epoch timestamps
the OSAP runtime, first-deadline scheduling and time-to-live deadliness
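
The source-routing notes above can be illustrated with a small sketch. It assumes a packet that carries a list of exit ports plus a pointer to the next instruction, and a return route built from the ports the packet arrived on. The encoding, the `Packet` fields and the `network` table are hypothetical and are not OSAP's wire format.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    route: list[int]  # exit port to take at each hop (hypothetical encoding)
    pointer: int      # index of the next routing instruction
    payload: bytes


def forward(packet: Packet):
    """Consume one routing instruction; return the exit port, or None on arrival."""
    if packet.pointer >= len(packet.route):
        return None
    port = packet.route[packet.pointer]
    packet.pointer += 1
    return port


def deliver(network, start, packet):
    """Walk a packet through the network and collect the route back to its source."""
    node, entry_ports = start, []
    while True:
        port = forward(packet)
        if port is None:
            # The reply leaves through the ports we arrived on, last hop first.
            return node, list(reversed(entry_ports))
        node, arrived_on = network[node][port]
        entry_ports.append(arrived_on)


# node -> {exit port: (neighbor, port on which the neighbor receives)}
network = {
    "laptop": {0: ("router", 2)},
    "router": {1: ("motor", 0), 2: ("laptop", 0)},
    "motor": {0: ("router", 1)},
}
destination, return_route = deliver(network, "laptop", Packet([0, 1], 0, b"hi"))
```

Because the route travels with the packet, intermediate nodes need no routing tables; they only execute the instruction under the pointer.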

2.4.4 OSAP 4: Transport

Moving bytes around: delivery guarantees or time guarantees, single- and multi-segment, a matrix of them.

2.4.5 OSAP 5: Ports (Software Interface)

Exposing network APIs to software modules. Transport heads / tails.

2.5 OSAP Services

2.5.1 Network Discovery

it’s a distributed DNS, ’yall

This is… where the thing comes full circle: we have our little name server port, strings to addresses, etc. Shows the awkwardness, but also we can point out that “DNS was the accidental mistake in… internet design” says Tim Berners-Lee.
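
One way to picture the "strings to addresses" idea is a breadth-first sweep that asks each device for its neighbors and records a route to every name it finds. This is a sketch under assumptions: `TOPOLOGY` stands in for real network queries, and the device names and the route format (a list of exit ports) are hypothetical.

```python
from collections import deque

# Hypothetical topology: device name -> {port index: neighbor name}.
TOPOLOGY = {
    "laptop": {0: "router"},
    "router": {0: "laptop", 1: "motor_x", 2: "motor_y"},
    "motor_x": {0: "router"},
    "motor_y": {0: "router", 1: "sensor"},
    "sensor": {0: "motor_y"},
}


def neighbors_of(device):
    """Stand-in for a network query that lists a device's connected ports."""
    return TOPOLOGY[device]


def discover(start, neighbors_of):
    """Breadth-first sweep returning {device name: list of exit ports from start}."""
    routes = {start: []}
    frontier = deque([start])
    while frontier:
        device = frontier.popleft()
        for port, neighbor in neighbors_of(device).items():
            if neighbor not in routes:
                routes[neighbor] = routes[device] + [port]
                frontier.append(neighbor)
    return routes


network_map = discover("laptop", neighbors_of)
```

Checking whether a required motor driver is present then reduces to a lookup such as `"motor_x" in network_map`.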

2.5.2 Distributed Time Synchronization

It’s time sync, babey. We want it for all kinds of reasons, we do it with packet stamps etc…

Our modular systems are situated in the physical world where timing is critical: across networking itself (to ensure that packets are delivered within particular windows), sensor gathering (sync’d readings enable us to more accurately re-create global phenomenology) and of course motion control (where, see chapter_x, we sync motion using sync’d time - sort of obviously).

I elected to build time into OSAP as a core networking service; because the system’s use is so focused in the real world, because of its position in the alleged osi stack and …

… millisecond is pretty good, microsecond is better but we can only measure down to microseconds in many of our devices … this could probably be improved, but was outside of the scope. Protocol renders time in nanoseconds.

… for an analysis on how timing precision relates to motion precision, see (something we might write?)

… for an understanding of where micro- and nano-second timing may be useful, see i.e. the small-ish kibble balance …

So, goals: few-microsecond sync between devices, autonomously, at all times… robust against hot-plugging, user-selectable grandclocks, stability over time.

One of the key services that OSAP provides is clock synchronization, which is used as a basis for motion control and for time-series data collection (to build models). Since other clock sync algorithms are complex and consume large amounts of program memory, I developed a simple version from scratch.

Here I show results from a clock synchronization test. The test polls eight devices (constituting a subset of the Rheo Printer’s controller) as the distributed clock sync algorithm settles each device’s clock skew with respect to the chosen *grandmaster* (in this case, the laptop running the test). In the top plot, we see measured errors; these are noisy because packet round trip time is not always the same, a key issue with packetized clock sync. Errors stay within +/- one millisecond for the duration of the test, and improve over time. The bottom plot shows each device’s clock skew as calculated and updated by the distributed controller. These settle eventually, but reducing their oscillation is something I would like to investigate.

The algorithm is essentially a distributed diffusion routine: each device requests time stamps over all active links, picks the best source, and then skews its own clock in order to minimize errors. The algorithm works well enough for me to complete all of the tasks in this thesis, but I would like to evaluate it more rigorously, since high performance synchronization is a requirement of advanced control systems.
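
The "picks the best source" step could be sketched as follows. The text does not say what makes a source best, so the ranking used here (fewest hops to the grandmaster, then lowest round-trip time) is an assumption, as are the `TimeReply` field names.

```python
from dataclasses import dataclass


@dataclass
class TimeReply:
    link: str           # link the reply arrived on
    hops_to_grand: int  # sender's distance from the grandmaster (assumed field)
    rtt_us: int         # measured round-trip time, microseconds
    offset_us: int      # estimated offset of the sender's clock from ours


def pick_best_source(replies):
    """Prefer sources closest to the grandmaster; break ties on lowest RTT."""
    return min(replies, key=lambda r: (r.hops_to_grand, r.rtt_us), default=None)


replies = [
    TimeReply("uart0", hops_to_grand=2, rtt_us=180, offset_us=-40),
    TimeReply("usb", hops_to_grand=1, rtt_us=950, offset_us=12),
    TimeReply("rs485", hops_to_grand=1, rtt_us=300, offset_us=15),
]
best = pick_best_source(replies)
my_hops = best.hops_to_grand + 1  # our own distance, advertised to neighbors
```

Since every device repeats this using only its own links, the grandmaster's time spreads outward hop by hop, which is the diffusion behaviour described above.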

2.5.2.1 The Packet, the Measurement

we bounce a ping, measure theirs, and get rtt,
we do it underneath transport layers (that may be re-trying) to avoid the non-symmetric nonlinearity
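
For reference, the standard four-timestamp estimate used by NTP-style protocols is sketched below. Whether OSAP's packet carries exactly these four stamps is an assumption. The offset estimate is only exact when the outbound and return delays are equal, which is why retries in a transport layer would corrupt it.

```python
def estimate_offset_and_rtt(t0, t1, t2, t3):
    """
    t0: request sent      (local clock)
    t1: request received  (remote clock)
    t2: reply sent        (remote clock)
    t3: reply received    (local clock)
    All values are integer ticks, e.g. microseconds.
    """
    # Time on the wire: total elapsed minus the remote's processing time.
    rtt = (t3 - t0) - (t2 - t1)
    # Remote-minus-local offset, assuming both directions take equally long.
    offset = ((t1 - t0) + (t2 - t3)) // 2
    return offset, rtt


# Remote clock runs 1000 us ahead, 40 us each way, 10 us remote processing.
offset_us, rtt_us = estimate_offset_and_rtt(t0=5000, t1=6040, t2=6050, t3=5090)
```

Any asymmetry between the two directions shows up directly as offset error, at half the size of the asymmetry.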

2.5.2.2 Fixed Point

we do it with FP, ’yall - lots of bits in this one
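
As an illustration of why fixed point suits this job, the sketch below applies a clock-rate correction with only integer multiplies and shifts. The Q32.32 layout and the function names are assumptions, not OSAP's documented format.

```python
FRAC_BITS = 32         # assumed number of fractional bits (Q32.32)
ONE = 1 << FRAC_BITS   # fixed-point representation of 1.0


def rate_from_ppm(ppm: int) -> int:
    """Fixed-point rate multiplier for a correction given in parts per million."""
    return ONE + (ppm * ONE) // 1_000_000


def scale_ticks(ticks: int, rate: int) -> int:
    """Scale a tick count by a fixed-point rate using integer math only."""
    # Python integers are unbounded; on a microcontroller this product can
    # exceed 64 bits, so it would need a wider intermediate or a split multiply.
    return (ticks * rate) >> FRAC_BITS


unchanged = scale_ticks(1_000_000, rate_from_ppm(0))
sped_up = scale_ticks(1_000_000, rate_from_ppm(50))  # close to 1_000_050
```

With 32 fractional bits, the smallest representable rate step is about 2.3e-10, far finer than one part per million.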

2.5.2.3 The Control Algo

it’s an exponential filter and a p-term, y’all, believe it
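
A toy version of "an exponential filter and a p-term" is sketched below: the measured offset is low-pass filtered, and the filtered value sets a proportional trim on the local clock rate. The gains, the 10 ms update period and the 100 ppm drift are made-up values for illustration, not the tuning used in this work.

```python
class SkewController:
    """Exponential filter on the measured offset, then a proportional rate trim."""

    def __init__(self, alpha: float = 0.2, kp_ppm_per_us: float = 10.0):
        self.alpha = alpha           # weight given to each new measurement
        self.kp = kp_ppm_per_us      # ppm of rate trim per microsecond of error
        self.filtered_us = 0.0
        self.trim = 1.0              # multiplier applied to the local clock rate

    def update(self, offset_us: float) -> float:
        # Exponential filter: move a fraction alpha toward the new measurement.
        self.filtered_us += self.alpha * (offset_us - self.filtered_us)
        # P-term: the trim is proportional to the filtered error.
        self.trim = 1.0 + self.kp * 1e-6 * self.filtered_us
        return self.trim


def simulate(steps: int = 400, dt_us: float = 10_000.0, drift_ppm: float = 100.0):
    """Run a local clock with a fixed drift against an ideal reference clock."""
    hw_rate = 1.0 + drift_ppm * 1e-6
    controller = SkewController()
    reference = local = 0.0
    for _ in range(steps):
        reference += dt_us
        local += dt_us * hw_rate * controller.trim
        controller.update(reference - local)  # noise-free measurement here
    return reference - local, hw_rate * controller.trim


final_error_us, effective_rate = simulate()
```

In this toy loop the trim settles where it cancels the hardware drift. A purely proportional law leaves a constant residual offset under constant drift (about 10 µs with these made-up numbers), and with noisy measurements a larger `alpha` or `kp` would trade settling time for jitter.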

2.5.2.4 Evaluation

we built a tiny test-bed using a hex of parts,
we tuned the thing, we did hot-plugging,
here’s some graphs of clocks do’in it (time-series of skews aligning)
- at variable control values
- … how does the control value and the filter value relate to the jitter in the measurement, the no. of clocks, etc ?
here’s how hot-plugging perturbs a local section of the graph, and how long it takes to

2.5.2.5 Discussion

NTP and PTP exist as well, they’re similar … but typically more complex :|
diffusion is maybe quick-and-dirty but not super rigid, as time should be ?

2.6 Protocol and Runtime Specification

write down packet specs, rules

2.7 Evaluating OSAP

2.7.1 Runtime Overhead

func-to-func calls in-sys vs. out vs. native, (serialization overhead) (enables pseudo-parallelism in embedded, and flexibility in the same via compiler bypass)
total time doing osap stuff vs. time doing controller stuff (on i.e. the motor controller, our tightest yet)

2.7.2 Program Size Overhead

given a firmware w/ functional API, how much FLASH/RAM is added when we compile w/ OSAP handles, networking codes, vs. without ? compare across devices…

2.7.3 Network Overhead

packet delay vs. (calculated) link times ? (develop trace packet ?)
- i.e. how much time is taken to move, process a packet…
simple packet overhead calculation: frame size vs. bytes for routing, time, etc
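
The simple overhead calculation could take the following form. The byte counts in the example are placeholders rather than OSAP's actual header sizes, and the wire-time estimate assumes a UART link with 8N1 framing (ten bits on the wire per byte).

```python
def packet_efficiency(payload_bytes, route_bytes, timestamp_bytes, framing_bytes):
    """Fraction of the bytes on the wire that carry application payload."""
    overhead = route_bytes + timestamp_bytes + framing_bytes
    return payload_bytes / (payload_bytes + overhead)


def wire_time_us(total_bytes, baud, bits_per_byte=10):
    """Calculated link time in microseconds; 8N1 UART sends 10 bits per byte."""
    return total_bytes * bits_per_byte * 1_000_000 / baud


# Placeholder sizes: 32 B payload, 4 B route, 8 B timestamp, 4 B framing.
efficiency = packet_efficiency(32, route_bytes=4, timestamp_bytes=8, framing_bytes=4)
link_time = wire_time_us(32 + 4 + 8 + 4, baud=1_000_000)
```

Comparing a measured packet delay against `wire_time_us` for each hop separates time spent on the link from time spent queuing and processing.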

2.8 Future Work

osap should not be general purpose: fills a niche for heterogeneous realtime systems
expand time for epoch ns, add precision, add link-layer stamps
trace packets for network analysis: want to enable scheduling and resource allocation design, which is the real soln’ to the NP-Hard problem
use insert-sort rather than resort
address RTOS-ness, and programming model:serialization improvements -> compile time direct struct-memory-access rather than recursive function rollup
address availability of multicore mcu: shuffling and transport layer + user-code layer
address the snake in the room: python good for interfaces but not for runtime, should be possible to have c/rust backend w/ py-api

References

Abdel-Rahman, Amira, Christopher Cameron, Benjamin Jenett, Miana Smith, and Neil Gershenfeld. 2022. “Self-Replicating Hierarchical Modular Robotic Swarms.” Communications Engineering 1 (1): 35.

Ball, Thomas, Peli de Halleux, James Devine, Steve Hodges, and Michał Moskal. 2024. “Jacdac: Service-Based Prototyping of Embedded Systems.” Proceedings of the ACM on Programming Languages 8 (PLDI): 692–715.

Benkler, Yochai. 2002. “Coase’s Penguin, or, Linux and ‘The Nature of the Firm’.” Yale Law Journal, 369–446.

Blikstein, Paulo. 2013. “Gears of Our Childhood: Constructionist Toolkits, Robotics, and Physical Computing, Past and Future.” In Proceedings of the 12th International Conference on Interaction Design and Children, 173–82.

Ciuffoletti, A. 1994. “Using Simple Diffusion to Synchronize the Clocks in a Distributed System.” In 14th International Conference on Distributed Computing Systems, 484–91. Poznan, Poland: IEEE Comput. Soc. Press. https://doi.org/10.1109/ICDCS.1994.302457.

Devine, James, Michal Moskal, Peli De Halleux, Thomas Ball, Steve Hodges, Gabriele D’Amone, David Gakure, et al. 2022. “Plug-and-Play Physical Computing with Jacdac.” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6 (3): 1–30.

Di Natale, Marco. 2000. “Scheduling the CAN Bus with Earliest Deadline Techniques.” In Proceedings 21st IEEE Real-Time Systems Symposium, 259–68. IEEE.

Dunn, Kellie, Cynthia Feng, and Nadya Peek. 2023. “Jubilee: A Case Study of Distributed Manufacturing in an Open Source Hardware Project.” Journal of Open Hardware 7 (1).

Eghbal, Nadia. 2020. Working in Public: The Making and Maintenance of Open Source Software. Stripe Press.

Eidson, John C, Mike Fischer, and Joe White. 2002. “IEEE-1588™ Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems.” In Proceedings of the 34th Annual Precise Time and Time Interval Systems and Applications Meeting, 243–54.

Fossdal, Frikk H, Vinh Nguyen, Rogardt Heldal, Corie L Cobb, and Nadya Peek. 2023. “Vespidae: A Programming Framework for Developing Digital Fabrication Workflows.” In Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2034–49.

Fossdal, Frikk, Rogardt Heldal, and Nadya Peek. 2021. “Interactive Digital Fabrication Machine Control Directly Within a CAD Environment.” In Proceedings of the 6th Annual ACM Symposium on Computational Fabrication, 1–15.

Gershenfeld, Neil, Raffi Krikorian, and Danny Cohen. 2004. “The Internet of Things.” Scientific American 291 (4): 76–81.

Group, ETSI Industry Specification. 2019. “Next Generation Protocols (NGP); An Example of a Non-IP Network Protocol Architecture Based on RINA Design Principles.” ETSI. https://www.etsi.org/deliver/etsi_gr/NGP/001_099/009/01.01.01_60/gr_ngp009v010101p.pdf.

Kopetz, Hermann, and Wilhelm Ochsenreiter. 1987. “Clock Synchronization in Distributed Real-Time Systems.” IEEE Transactions on Computers C-36 (8): 933–40.

Lian, Feng-Li, J. R. Moyne, and D. M. Tilbury. 2001. “Performance Evaluation of Control Networks: Ethernet, ControlNet, and DeviceNet.” IEEE Control Systems 21 (1): 66–83. https://doi.org/10.1109/37.898793.

Lian, Feng-Li, James Moyne, and Dawn Tilbury. 2002. “Network Design Consideration for Distributed Control Systems.” IEEE Transactions on Control Systems Technology 10 (2): 297–307.

Mills, David L. 1991. “Internet Time Synchronization: The Network Time Protocol.” IEEE Transactions on Communications 39 (10): 1482–93.

Moyer, Ilan Ellison. 2013. “A Gestalt Framework for Virtual Machine Control of Automated Tools.” PhD thesis, Massachusetts Institute of Technology.

Papert, Seymour A. 2020. Mindstorms: Children, Computers, and Powerful Ideas. Basic Books.

Peek, Nadya. 2016. “Making Machines That Make: Object-Oriented Hardware Meets Object-Oriented Software.” PhD thesis, Massachusetts Institute of Technology.

Peek, Nadya, and Lilo Pozzo. 2023. “Pathways to Open-Source Hardware for Laboratory Automation.” NSF POSE Workshop. https://depts.washington.edu/machines/scienceautomation/.

Politi, Maria, Fabio Baum, Kiran Vaddi, Edwin Antonio, Joshua Vasquez, Brittany P Bishop, Nadya Peek, Vincent C Holmberg, and Lilo D Pozzo. 2023. “A High-Throughput Workflow for the Synthesis of CdSe Nanocrystals Using a Sonochemical Materials Acceleration Platform.” Digital Discovery 2 (4): 1042–57.

Read, Jake Robert, Leo McElroy, Quentin Bolsee, B Smith, and Neil Gershenfeld. 2023. “Modular-Things: Plug-and-Play with Virtualized Hardware.” In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1–6.

Read, Jake Robert, Nadya Peek, and Neil Gershenfeld. 2023. “MAXL: Distributed Trajectories for Modular Motion.” In Proceedings of the 7th Annual ACM Symposium on Computational Fabrication.

Saltzer, Jerome H, David P Reed, and David D Clark. 1984. “End-to-End Arguments in System Design.” ACM Transactions on Computer Systems (TOCS) 2 (4): 277–88.

Smith, Miana M. 2023. “Recursive Robotic Assemblers.” PhD thesis, Massachusetts Institute of Technology.

Standardization, International Organization for. 1994. Information Technology—Open Systems Interconnection—Basic Reference Model: The Basic Model. ISO/IEC 7498-1:1994. Geneva, Switzerland: ISO/IEC. https://www.iso.org/standard/20269.html.

Subbaraman, Blair, Orlando de Lange, Sam Ferguson, and Nadya Peek. 2024. “The Duckbot: A System for Automated Imaging and Manipulation of Duckweed.” Plos One 19 (1): e0296717.

Tran O’Leary, Jasper, Gabrielle Benabdallah, and Nadya Peek. 2023. “Imprimer: Computational Notebooks for CNC Milling.” In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–15.

Valk, Laurens, and David Lechner. 2024. “PyBricks: Robotics Made Easy.” https://pybricks.com/.

Vasquez, Joshua, Hannah Twigg-Smith, Jasper Tran O’Leary, and Nadya Peek. 2020. “Jubilee: An Extensible Machine for Multi-Tool Fabrication.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–13.

Yook, John K, Dawn M Tilbury, and Nandit R Soparkar. 2002. “Trading Computation for Bandwidth: Reducing Communication in Distributed Control Systems Using State Estimators.” IEEE Transactions on Control Systems Technology 10 (4): 503–18.

Zhang, Lixian, Huijun Gao, and Okyay Kaynak. 2012. “Network-Induced Constraints in Networked Control Systems—a Survey.” IEEE Transactions on Industrial Informatics 9 (1): 403–16.

Zhang, Xian-Ming, Qing-Long Han, Xiaohua Ge, Derui Ding, Lei Ding, Dong Yue, and Chen Peng. 2019. “Networked Control Systems: A Survey of Trends and Techniques.” IEEE/CAA Journal of Automatica Sinica 7 (1): 1–17.