Creating a Rust COM library

This past summer, I worked as a software engineering intern at Microsoft with the MSRC UK team in Cheltenham. I worked with the Safe Systems Programming Languages (SSPL) group, which investigates safe programming languages as a preventative measure against memory-safety-related vulnerabilities.

This blog post describes the project I worked on under the mentorship of the SSPL team. It should also give you a better idea of what Microsoft interns do!

Overview of the project

My objective was to create an open-source Rust library that would enable programmers to easily consume and produce Component Object Model (COM) components.

Note that several existing Rust crates already support COM, such as winapi-rs, Intercom, XPCOM and com-impl, covering consumption, implementation, or both. Using these crates as references, we were able to make well-informed design choices that laid the groundwork for this library.

For the uninitiated, COM is a language-agnostic, object-oriented standard for creating binaries. Thanks to COM’s extremely well-defined Application Binary Interface (ABI), binaries that support this standard can communicate with one another regardless of the language they are written in.

A COM component exposes its functionality through a set of interfaces. Each interface defines a Virtual Method Table (vtable), much like the virtual function layout of a C++ object. These vtables contain function pointers to the actual function implementations. The COM component then stores pointers to these vtables (vpointers). This is shown in the image below.

By mandating the layout of these function pointers, COM lets any language map its own data types onto the vtable of a COM object. A language can then dereference the appropriate function pointer to call the corresponding function exposed by the COM object.
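As a rough illustration of the layout described above, here is a hand-written vtable in plain Rust. This is only a sketch: the names `AnimalVtbl`, `Animal`, `animal_eat` and `call_eat` are hypothetical and are not part of the library, and the three IUnknown methods that start every real COM vtable are left out for brevity.

```rust
use std::os::raw::c_void;

/// Hypothetical vtable: a C-layout struct of function pointers.
#[repr(C)]
struct AnimalVtbl {
    eat: unsafe extern "system" fn(this: *mut c_void) -> i32,
}

/// Hypothetical component: the vpointer comes first, then the component's data.
#[repr(C)]
struct Animal {
    vptr: *const AnimalVtbl,
    meals: u32,
}

/// The actual implementation that the vtable entry points to.
unsafe extern "system" fn animal_eat(this: *mut c_void) -> i32 {
    let animal = this as *mut Animal;
    unsafe { (*animal).meals += 1 };
    0 // S_OK
}

static ANIMAL_VTBL: AnimalVtbl = AnimalVtbl { eat: animal_eat };

/// What a consumer does: starting from an interface pointer (a pointer to a
/// vpointer), read the vpointer, pick the function pointer, and call it.
unsafe fn call_eat(interface_ptr: *mut *const AnimalVtbl) -> i32 {
    unsafe {
        let vtbl: *const AnimalVtbl = *interface_ptr;
        ((*vtbl).eat)(interface_ptr as *mut c_void)
    }
}

/// Builds a component, calls `eat` through its vtable, and reports the result.
fn demo_vtable_call() -> (i32, u32) {
    let mut animal = Animal { vptr: &ANIMAL_VTBL, meals: 0 };
    let interface_ptr = &mut animal as *mut Animal as *mut *const AnimalVtbl;
    let hr = unsafe { call_eat(interface_ptr) };
    (hr, animal.meals)
}
```

The consumer never needs to know the `Animal` type; it only relies on the agreed position of the function pointer in the vtable, which is what makes the scheme language-agnostic.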

Note that COM also involves other fundamental mechanisms, such as DLL entry points and COM object instantiation. I won’t be covering those here. For more information, I heartily recommend both Don Box’s Essential COM and the MSDN reference.

Why should COM support be enabled in Rust?

Microsoft is looking into the use of safer systems programming languages in an effort to get rid of a particular class of vulnerabilities. Rust is one such language, as we covered in a previous blog post. Interoperability with C++ and current Microsoft tooling is one of the obstacles to the adoption of Rust. Supporting the COM standard, which is widely used by Microsoft, is a prerequisite for compatibility with current components.

Beyond Microsoft’s own needs, the purpose of this library is to give COM developers access to the memory safety guarantees provided by the Rust compiler. We will go into more detail about this impact and the related challenges when we discuss creating safe wrappers for COM interactions.

Project design and implementation

We will describe the actual operation of our library in this section.

Note: Along the way, we’ll explain a few of the behind-the-scenes design choices, which we hope will be helpful to those who join the project.

This project’s main objective was to develop an idiomatic library that Rust programmers could use to create COM components. Writing COM interactions involves a lot of boilerplate code to describe and create the various vtable layouts. To make the library idiomatic to use, we tried to abstract away the COM details as much as we could. Thankfully, Rust’s expressive macro system enables us to accomplish this abstraction!

To consume a COM component, you must first describe its interface in Rust (here, IAnimal).

    #[com_interface(EFF8970E-C50F-45E0-9284-291CE5A6F771)]
    pub trait IAnimal: IUnknown {
        fn eat(&mut self) -> HRESULT;
    }

Interacting with a COM object that exposes the IAnimal interface takes two steps. First, instantiate the COM object, and you’ll receive an interface pointer (a pointer to a vpointer) in return. You do this through a Runtime struct, which manages the lifetime of the COM Library using CoInitializeEx/CoUninitialize.

The second step is to call methods through the interface pointer.
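To give a feel for how a struct can tie the COM Library’s lifetime to a Rust value, here is a minimal sketch of the pattern. It is not the library’s actual Runtime: `fake_co_initialize` and `fake_co_uninitialize` are hypothetical stand-ins for CoInitializeEx and CoUninitialize so that the example runs on any platform, and the shape of `Runtime::new` is an assumption.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

// Hypothetical stand-ins for CoInitializeEx / CoUninitialize.
// They only count how many initializations are currently live.
static LIVE_INITS: AtomicI32 = AtomicI32::new(0);

fn fake_co_initialize() -> i32 {
    LIVE_INITS.fetch_add(1, Ordering::SeqCst);
    0 // S_OK
}

fn fake_co_uninitialize() {
    LIVE_INITS.fetch_sub(1, Ordering::SeqCst);
}

/// Hypothetical guard: initializes on creation, uninitializes on drop.
struct Runtime {
    _private: (),
}

impl Runtime {
    fn new() -> Result<Runtime, i32> {
        let hr = fake_co_initialize();
        // Negative HRESULT values indicate failure.
        if hr < 0 {
            Err(hr)
        } else {
            Ok(Runtime { _private: () })
        }
    }
}

impl Drop for Runtime {
    fn drop(&mut self) {
        fake_co_uninitialize();
    }
}

/// Returns the number of live initializations inside and after the guard's scope.
fn demo_runtime() -> (i32, i32) {
    let during = {
        let _runtime = Runtime::new().expect("initialization failed");
        LIVE_INITS.load(Ordering::SeqCst)
    };
    let after = LIVE_INITS.load(Ordering::SeqCst);
    (during, after)
}
```

Because the cleanup lives in `Drop`, the matching uninitialize call cannot be forgotten, even on early returns.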

Consumption was relatively simple to implement with macros because COM hides the implementation details of its components from consumers.

When it came to producing COM components, we mainly focused on making the design extensible and idiomatic. It wasn’t simple at all. COM offers numerous implementation options when creating a component, and our library must be able to handle them. We initially tried to completely abstract away the vtables generated to make an object COM-compatible. We did this by wrapping the user-defined object in a struct we can call ComBox. The vpointers are kept in this ComBox, which is what gets handed to consumers to interact with.

For basic COM, this worked well. The solution was built on the assumption that users would never need to access their vpointers. That assumption turned out to be false. COM allows a component to support numerous features. One of these is aggregation, which promotes code reuse by letting you expose the interfaces of other COM objects as if they were your own. To enable aggregation, you must at some point explicitly hand your own vpointer to the aggregated object. This contradicts our initial assumption, so it affected our design choices.

Knowing this, we need to hide the COM details while still giving seasoned COM developers access to them. Let’s say you want to develop CatDog, a COM component that implements ICat and IDoggy. You would need to write something along the lines of:

In the struct, developers specify their own user fields. The #[co_class] macro then adds the COM fields (vpointers, ref_counts, etc.) to the struct. In this case, the user must implement a constructor “new”, which must initialize the COM fields using the macro-generated “allocate” function. The main difference is that methods are now defined on the struct that itself contains the vpointers. Since these fields are in the same scope, the user has access to them.
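To make the description above more concrete, the sketch below hand-writes roughly what such a struct might look like once the COM fields have been added. It does not use the macro, and it is not the macro’s real output: the field names, the placeholder vtable types, the hand-written `allocate`, and the `*_interface_ptr` helpers are all assumptions made for illustration.

```rust
use std::cell::Cell;

// Placeholder vtable types; real ones would hold function pointers.
#[repr(C)]
struct ICatVtbl {
    _reserved: usize,
}

#[repr(C)]
struct IDoggyVtbl {
    _reserved: usize,
}

static ICAT_VTBL: ICatVtbl = ICatVtbl { _reserved: 0 };
static IDOGGY_VTBL: IDoggyVtbl = IDoggyVtbl { _reserved: 0 };

/// Hypothetical expanded struct: COM fields first, then the user's fields.
#[repr(C)]
struct CatDog {
    // COM fields (assumed names).
    icat_vptr: *const ICatVtbl,
    idoggy_vptr: *const IDoggyVtbl,
    ref_count: Cell<u32>,
    // User field.
    num_owners: u32,
}

impl CatDog {
    /// Stand-in for the generated "allocate": fills in the COM fields.
    fn allocate(num_owners: u32) -> Box<CatDog> {
        Box::new(CatDog {
            icat_vptr: &ICAT_VTBL,
            idoggy_vptr: &IDOGGY_VTBL,
            ref_count: Cell::new(0),
            num_owners,
        })
    }

    /// User-written constructor that only supplies the user fields.
    fn new() -> Box<CatDog> {
        CatDog::allocate(20)
    }

    /// Because methods live on the struct holding the vpointers, the user can
    /// reach them, for example to hand one to an aggregated object.
    fn icat_interface_ptr(&self) -> *const *const ICatVtbl {
        &self.icat_vptr
    }

    fn idoggy_interface_ptr(&self) -> *const *const IDoggyVtbl {
        &self.idoggy_vptr
    }
}

/// Checks that both interface pointers lead to the expected vtables.
fn demo_catdog() -> (bool, bool, u32, u32) {
    let cat_dog = CatDog::new();
    let cat_ok = std::ptr::eq(unsafe { *cat_dog.icat_interface_ptr() }, &ICAT_VTBL);
    let dog_ok = std::ptr::eq(unsafe { *cat_dog.idoggy_interface_ptr() }, &IDOGGY_VTBL);
    (cat_ok, dog_ok, cat_dog.ref_count.get(), cat_dog.num_owners)
}
```

Reference counting and deallocation are deliberately omitted here; the point is only that the vpointers and the user fields share one struct.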

How safe is it, really?

We’ve talked about some design choices and how to use our library. What is the actual impact of using Rust?

Even though Rust offers memory safety guarantees, all bets are off once you use the language’s unsafe superset. In this particular case, because COM exists to enable language interoperability, these interactions require frequent use of unsafe Rust.

Rust developers typically build safe wrappers around such interactions. These wrappers validate raw pointers and convert them into safe types. For example, a pointer that may be null is checked and converted into an Option, so developers must handle the null case explicitly. Unfortunately, we could not replicate this approach in our library, because such wrappers are not foolproof. We can force users to handle null pointers explicitly, but what about dangling pointers? It is impossible to verify that the pointers handed to us do not point to invalid or garbage memory. So how can any of these wrappers be marked safe by Rust’s standards?
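The null check described above can be sketched in a few lines using the standard library’s `NonNull` type. The helper name `checked` is hypothetical; the sketch also shows where the guarantee stops, since reading through the pointer still needs an unsafe block.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: turns a possibly-null raw pointer into an Option,
/// forcing callers to handle the null case explicitly.
fn checked<T>(raw: *mut T) -> Option<NonNull<T>> {
    NonNull::new(raw)
}

/// Exercises both the null case and the non-null case.
fn demo_null_check() -> (bool, Option<i32>) {
    let null_case = checked(std::ptr::null_mut::<i32>()).is_none();

    let mut value = 7;
    let some_case = checked(&mut value as *mut i32)
        // Still unsafe: being non-null says nothing about whether the memory
        // is valid. It is valid here only because `value` is alive.
        .map(|ptr| unsafe { *ptr.as_ptr() });

    (null_case, some_case)
}
```

A dangling pointer would pass the same check, which is exactly the limitation discussed above.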

One important distinction is that the wrappers in the scenario above are built around specific libraries or APIs, whereas we are trying to wrap a standard or protocol. Authors of library-specific wrappers can consult documentation, review the codebase, and so on before creating a safe wrapper for that library. If they have audited the unsafe code they are wrapping and confirmed that it returns valid pointers, that is a valid reason to mark the wrappers as safe. Because we are wrapping a standard rather than a specific library or API, we cannot ensure that every single COM component is properly implemented. We have no knowledge of the code we are wrapping, so we cannot automatically mark the generated wrappers as safe. What we can do is give users the option to mark interactions as safe once they have done their due diligence.

If we can’t assume that wrappers are safe, where does that leave us in terms of impact? Rust will still make it much easier for developers to write safe COM components than its C/C++ systems language counterparts. First, unsafe code is usually confined to interactions across foreign function interfaces (FFI). For logic flows unrelated to these FFI interactions, developers can still write safe Rust code. This is crucial when creating a new COM component. For instance, performant code frequently uses multi-threaded data structures. The logic underlying these data structures is prone to data races, and combined with compiler optimizations, such data races can lead to memory safety vulnerabilities that are difficult to find. Safe Rust rules out data races because its ownership model is enforced across threads.

This is just one example. There are numerous such logic flows that can be written in safe Rust, which will produce safer code.
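As a small illustration of the data race point, here is a shared counter updated from several threads using only safe Rust and the standard library. The function name `count_in_parallel` is made up for this example. Sharing the counter without `Arc` and `Mutex` (or another synchronized type) would be rejected at compile time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads and returns the total.
fn count_in_parallel(threads: u32, increments: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Each thread gets its own handle to the same counter.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock is the only way to reach the value.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

No increments are lost, because every access goes through the mutex and the compiler refuses unsynchronized sharing.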

Second, Rust compels developers to handle memory safety with diligence and initiative. For instance, developers in Rust must use the unsafe keyword to mark the aforementioned FFI interactions as unsafe. This encourages library users to look into these FFI interactions and thoroughly investigate the COM objects they are interacting with. A signaling mechanism built right into the compiler is much harder to overlook than a note in documentation or code comments. As the library’s maintainers, we are also obligated to take care of memory safety. To comply with Rust’s safety standards, we must be able to accurately determine whether code is safe or unsafe when exposing it.

Last but not least, I would be remiss not to consider the viewpoint of the security engineers I have worked with at MSRC UK! Since memory safety vulnerabilities can only come from unsafe code, these explicit unsafe blocks greatly reduce the surface area that security engineers must search through to find them.

As you can see, Rust programmers have a wide range of tools at their disposal for writing safe code and ensuring that it is used safely. Because these tools are built into the compiler rather than requiring complex external tooling, they improve both productivity and security.

What’s next?

We have looked at the motivation behind the project, how it currently works, and the impact it will have. So what’s next?

This project is now open source and available on GitHub! We encourage you to check it out, whether you’re just curious or want to contribute. Since this library was created with user experience in mind, we want to hear from the community as much as possible. We were not able to cover all of COM’s features in this library. Out-of-process interactions are one example, among others. Once we have a solid foundation based on the community’s input, we can look into these.

One last thing

From the first to the last day of this project, I faced challenges. I had to step outside of my comfort zone to learn about COM and pick up Rust. However, the struggle has been outweighed by an even greater sense of satisfaction from completing the project and learning so much.

During my internship at Microsoft, I had the opportunity to collaborate with and observe some of the brightest minds I have ever encountered. I am grateful to the MSRC UK team for their hospitality and for making me feel part of the team. To my team members Ryan and Sebastian, thank you for helping me with this project and rescuing me from a lot of challenges. Finally, I want to express my gratitude to my mentor Sebastian for giving me this project and for his patient guidance throughout.

MSRC intern and software engineer Hadrian Wei Heng Lim