In the article Level Up Your Reverse Engineering Skills I explained the benefits of reverse-engineering and outlined some guidelines and principles of successful reverse-engineering. If you haven’t read it, I recommend you start from there. This article will demonstrate the application of those guidelines and principles and introduce debugging techniques that I commonly use.

I was writing this article as I was exploring and debugging the React sources. To get the most value from it, I encourage you to go through a similar discovery process, using this article as a guide. If you've ever wanted to reverse-engineer something, use this guide as a starting point and join me on this journey of discovery!

I’m really excited to find out how React works under the hood, like a kid on Christmas morning 😄. Are you? Let’s get started then!

Start with acquiring the sources and setting up a sample application

For reverse-engineering I need both the React sources and a sample React application for debugging. Let’s start with the sources. I go to github.com/facebook/react and check the “releases” tab:

At the time of this writing, the latest version is 16.4.2.

Then I clone the project:

$ git clone https://github.com/facebook/react.git

search for the version in git tags:

$ git tag
…
v16.4.0
v16.4.0-alpha.16.4.0-alpha.7926752
v16.4.0-alpha.5a25959
v16.4.0-alpha.94a255d
v16.4.1
v16.4.2

and check out the version I need (v16.4.2):

$ git checkout tags/v16.4.2
Checking out files: 100% (728/728), done.
Previous HEAD position was 7d9b4ba35 Update bundle sizes for 16.1.0 release
HEAD is now at 54adb2674 16.4.2

Now I need to set up a project that uses the same version of React. I recommend the simplest setup possible: an HTML page that loads the library with script tags. Conveniently, React ships prebuilt UMD bundles that make this setup possible.

Once I download the sample project, I need to update the React version to 16.4.2. I can specify the version in the unpkg URLs inside index.html:

<script src="https://unpkg.com/react@16.4.2/umd/react.development.js" crossorigin></script>
<script src="https://unpkg.com/react-dom@16.4.2/umd/react-dom.development.js" crossorigin></script>

Okay, just to test that it works, I run the HTTP server inside the folder with index.html:

$ http-server .
Starting up http-server, serving .
Available on:
  http://192.168.0.4:8080
  http://127.0.0.1:8080
Hit CTRL-C to stop the server

Everything’s fine, the server is started on port 8080. Nice.

Identify the part of the technology to focus on

Okay, now I need to figure out where to start reverse-engineering. Right now I’m wondering how change detection works in React, so that is my starting goal.

However, that goal alone doesn’t tell me where to put the debugger statement or which files to look at in the sources. To find that out, I’m going to use my knowledge of how change detection works in modern frameworks.

Know common design patterns and general architectural concepts

I know that change detection is about synchronizing changes from a component instance to DOM nodes. In React we initiate the change detection process by calling setState, so I could start there. However, since I learned from the docs that it’s an asynchronous call, I don’t think it’ll lead me directly to the relevant parts. Instead, I’ll focus on identifying where React stores references to the DOM nodes it creates. Once I know that, I can track down where these DOM nodes are accessed and work forward from there. This path seems more likely to get me closer to the end goal of understanding change detection, so I’ll focus on DOM nodes.

DOM nodes to look for

Before I can figure out where DOM nodes are stored, I need to know which DOM nodes will be created for the component. Here is what a React component looks like in my simple application:

class LikeButton extends React.Component {
  constructor(props) {
    super(props);
    this.state = { liked: false };
  }
  render() {
    if (this.state.liked) {
      return 'You liked this.';
    }
    return e(
      'button',
      { onClick: () => this.setState({ liked: true }) },
      'Like'
    );
  }
}

I’m looking at the render function, which is responsible for returning a template for the component. Here, e is an alias for React.createElement:

const e = React.createElement;

In Angular, a template for a component is defined using HTML. In React however, the template is defined by composing calls to the createElement function. So, I need to figure out what HTML will be produced by the following statement:

return e(
    'button',
    {onClick: () => this.setState({liked: true})},
    'Like'
);

Well, since it’s a call to the createElement function, I can explore the function’s implementation. But first I need to find that function in the sources. To do that, I open the sources in WebStorm and use the IDE’s Search Everywhere feature.

The first two results seem promising. I’m going to use signatures to figure out which function I need. Here is the signature for the function in the react-dom package:

export function createElement(
  type: string,
  props: Object,
  rootContainerElement: Element | Document,
  parentNamespace: string,
): Element { ... }

And the signature for the function in the react package:

export function createElement(type, config, children) { ... }

So, which one do I need?

Think like a scientist

In the first article I defined the basic steps for the scientific approach:

  1. Make an observation and form a hypothesis.
  2. Make a prediction based on the hypothesis.
  3. Test the prediction.

So let’s use this approach now. Looking at the parameter lists, I assume the first function is the one used in the component’s render method. It also lives in the react-dom package, which adds credibility to my assumption. So that is my hypothesis: the createElement function from the react-dom package is used inside the render method. Hence, I predict that this function will be executed when React calls the render method on the component. Now I need to test the prediction. To do that, I put a debugger statement before the call to createElement, run my example, and when execution pauses, I step into the e function call to see where I end up.

Interestingly, I end up inside the createElementWithValidation function. That’s not the function I expected. Based on its name, I guess it’s just a wrapper that calls createElement internally. So I look up this function in the sources and explore its implementation. Indeed, I see the call to createElement inside:

export function createElementWithValidation(type, props, children) {
  const validType = isValidElementType(type);
  ...
  const element = createElement.apply(this, arguments);
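The shape of such a wrapper can be sketched in a few lines: validate the arguments, report problems, then delegate with createElement.apply(this, arguments), as in the excerpt. This is a simplified imitation for illustration, not React's code. The createElement stand-in and isValidElementTypeSketch are hypothetical; the real isValidElementType accepts more kinds of types, and the warning text here is my own.

```javascript
// Hypothetical stand-in for the real createElement: it just records its inputs.
function createElement(type, config, ...children) {
  return { type, config, children };
}

// Hypothetical check, much narrower than React's isValidElementType.
function isValidElementTypeSketch(type) {
  return typeof type === 'string' || typeof type === 'function';
}

const warnings = [];

// Validating wrapper: check the arguments, then delegate unchanged.
function createElementWithValidation(type, props, children) {
  if (!isValidElementTypeSketch(type)) {
    warnings.push(`createElement: type is invalid, got ${typeof type}`);
  }
  // Forward every argument, including any extra children, to the real function.
  return createElement.apply(this, arguments);
}

const el = createElementWithValidation('button', { id: 'like' }, 'Like');
console.log(el.type, el.children); // button [ 'Like' ]

createElementWithValidation(undefined, null);
console.log(warnings.length); // 1
```

Forwarding arguments rather than the three named parameters matters because createElement accepts any number of children after the second argument.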

At this point I can either continue exploring the sources, relying on WebStorm’s ability to locate the referenced createElement function, or go back to debugging. React is written in JavaScript, and for JS projects I normally prefer debugging over an IDE’s reference resolution, which is less reliable for dynamically typed code. TypeScript is a different story, because its static types let the IDE resolve references reliably. But in this particular case React uses ES modules for importing references:

import {isValidElement, createElement, ...} from './ReactElement';

which gives the IDE a reliable way to resolve the location of the references. So I’ll stick to the sources for now. I Ctrl+click createElement, and WebStorm takes me to the function in react/src/ReactElement.js:

export function createElement(type, config, children) { ... }

And it’s not the function I assumed would be called! My assumption was wrong. That happens all the time, so don’t get discouraged when a hypothesis turns out to be wrong.

Actually, I’m still a bit suspicious about my wrong assumption. So I’m going to test it again using the debugger. Switching between debugging and exploring implementations in the sources is very common during reverse-engineering.

Remember, we ended up in the createElementWithValidation function.

So I scroll a bit lower, put a breakpoint on the line with the createElement call and use the debugger’s Continue to here command.

I then step into the function call and end up inside createElement.

Okay, it’s the same function I arrived at using the IDE’s resolution process. Now that I know which function is used in the render function, I can figure out the HTML. I just need to match the parameters in the function’s signature with the arguments in the function’s call:

// signature
function createElement(type, config, children) { ... }

// actual call
return e(
    'button',
    {onClick: () => this.setState({liked: true})},
    'Like'
);

Now I’m using my knowledge of how the browser DOM works to come up with a template. There will be a bunch of hypotheses here. I assume that the type parameter refers to the tag name of a DOM node, the config specifies properties such as event handlers, and the children parameter defines child nodes. So I predict that, based on the information supplied to the function, React will create DOM nodes corresponding to the following template:

<button (click)="() => this.setState({ liked: true })">
    Like
</button>

These nodes will be: a button element with one click event binding and a child text node with the value Like. So that is my prediction that I need to validate. To do that I need to see what DOM nodes will be created by React and compare them to the template I came up with.
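To make the prediction concrete, I can write it down as a tiny model of what I expect React to produce from those arguments. Everything in it is an assumption at this stage of the investigation: predictDomNodes is a hypothetical helper, and the rules that an onXxx prop becomes an xxx event listener and a string child becomes a text node are exactly the guesses I still need to confirm.

```javascript
// Hypothetical helper: turns createElement-style arguments into the DOM
// nodes and listeners I expect to see. This is my model, not React code.
function predictDomNodes(type, config, ...children) {
  const nodes = [{ kind: 'element', tagName: type }];
  const listeners = [];
  for (const [name, value] of Object.entries(config || {})) {
    // Assumption: props like onClick map to lowercase DOM event names.
    if (/^on[A-Z]/.test(name) && typeof value === 'function') {
      listeners.push(name.slice(2).toLowerCase());
    }
  }
  for (const child of children) {
    // Assumption: a string child becomes a text node.
    if (typeof child === 'string') {
      nodes.push({ kind: 'text', value: child });
    }
  }
  return { nodes, listeners };
}

const prediction = predictDomNodes('button', { onClick: () => {} }, 'Like');
console.log(JSON.stringify(prediction));
// {"nodes":[{"kind":"element","tagName":"button"},{"kind":"text","value":"Like"}],"listeners":["click"]}
```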

Notice that knowledge of the underlying platform is required to come up with the template: you have to know about the different types of DOM nodes.

If I’m lucky, these DOM nodes will be created inside the createElement function. Since the application is paused inside createElement, I’ll continue debugging to explore the function’s body. I could do that from the sources, but I usually prefer debugging, mostly because I don’t need to rely on the IDE’s ability to resolve references and can immediately observe the values that variables hold.

So I quickly skim through the function body and stumble upon a return statement that calls ReactElement.

Now it’s time to switch to the sources and explore the function ReactElement there:

I want to see if it creates DOM nodes. Well, it doesn’t. As far as I can see, this function just creates a data structure called ReactElement. I’m instantly curious how this data structure is used in React. Interestingly, it has a type property:

const element = {
  // This tag allows us to uniquely identify this as a React Element
  $$typeof: REACT_ELEMENT_TYPE,
  // Built-in properties that belong on the element
  type: type,
  key: key,
  ...
};

So I’m wondering if the value of this property will be the string button. Since my application is paused at the call to the ReactElement function, I can step into the call to find out.

Well, in the first call it’s not the string button, but a reference to the LikeButton component class. I’m going to remember that and get back to it later. I assume that the button string should come in one of the next calls. I don’t know how many more calls there will be, but I know that I can expect the button string only after the render function has started executing. So here is a neat trick: I disable the breakpoint in the createElement function until I hit the debugger statement in the render function.

Once the breakpoint is disabled, I resume execution until I hit the debugger statement in the render function. I then navigate to the disabled breakpoint and enable it. Now I resume execution and inspect the value of the type parameter.

Great! As I guessed, the element we’re creating is the button.
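Toggling the breakpoint by hand works, but a condition can do the filtering instead. In Chrome DevTools you can right-click a line number and choose Add conditional breakpoint; a condition such as type === 'button' on the first line of createElement would skip the LikeButton call. The same idea expressed in code is sketched below; pauseWhen and fakeCreateElement are hypothetical names for illustration.

```javascript
// Wrap fn so that execution pauses only when predicate(...args) is true.
// The debugger statement is a no-op unless a debugger is attached.
function pauseWhen(fn, predicate, hits = []) {
  return function (...args) {
    if (predicate(...args)) {
      hits.push(args[0]); // record the matching call for later inspection
      debugger;
    }
    return fn.apply(this, args);
  };
}

// Stand-in for createElement, used only to demonstrate the wrapper.
const fakeCreateElement = (type, config, ...children) => ({ type, children });
class LikeButton {}

const hits = [];
const trapped = pauseWhen(fakeCreateElement, (type) => type === 'button', hits);
trapped(LikeButton, null);       // skipped: type is a class
trapped('button', null, 'Like'); // pauses here when DevTools is open
console.log(hits); // [ 'button' ]
```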

The comment that describes the function states that it is used to create a React Element. So I guess the data structure that acts as a proxy for a DOM node in React is called ReactElement. Interesting. The comment also links to the documentation on React Elements, so I’ll need to come back to it and get familiar with the docs. But I’ll do that a bit later, when I investigate the role React Elements play in the framework. For now I need to stick to my task of validating the prediction about the DOM nodes that React will create for the component.
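To check my understanding of this data structure, I can write a stripped-down factory that builds the same kind of object. This is a sketch, not React's implementation: createElementSketch is a hypothetical name, it keeps only $$typeof, type, key and props, and it ignores refs, owners and development-only validation. The Symbol.for('react.element') tag matches what React uses when Symbol is available, but treat the rest as an approximation.

```javascript
// The tag React uses for elements when Symbol is available.
const REACT_ELEMENT_TYPE = Symbol.for('react.element');

// Simplified createElement: pulls `key` out of config, copies the remaining
// props and stores children under props.children. No special handling for
// ref, and no development-only validation.
function createElementSketch(type, config, ...children) {
  const props = {};
  let key = null;
  if (config != null) {
    for (const name of Object.keys(config)) {
      if (name === 'key') {
        if (config.key !== undefined) key = String(config.key);
      } else {
        props[name] = config[name];
      }
    }
  }
  // One child is stored as-is; several children become an array.
  if (children.length === 1) {
    props.children = children[0];
  } else if (children.length > 1) {
    props.children = children;
  }
  return { $$typeof: REACT_ELEMENT_TYPE, type, key, props };
}

const button = createElementSketch('button', { onClick: () => {} }, 'Like');
console.log(button.type, button.props.children, button.key); // button Like null
```

Nothing here touches the DOM: the element is just a plain description of what should be rendered, which matches what I saw in ReactElement.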

I continue debugging and when I exit the render function I see the following code:

{
    ReactDebugCurrentFiber.setCurrentPhase('render');
    nextChildren = instance.render();
    if (debugRenderPhaseSideEffects || ...) {
        instance.render();
    }
    ReactDebugCurrentFiber.setCurrentPhase(null);
}

I look at the surrounding code and see the word Fiber everywhere. I have no idea what it is, so I google it. The search turns up numerous results. I quickly scan some of them, and it seems that Fiber is React’s new reconciliation engine. I’ll definitely need to read more about it.

Allow yourself some time to think about what you’ve found

So I’m going to pause now and think about what I’ve discovered so far. For each createElement function call, React creates a data structure called ReactElement. So, I want to add this to my notes:

createElement(type) -> ReactElement.type

I’ve seen two calls to the createElement function: the first one with a reference to the LikeButton class as the type and the second one with the string button. So I assume that there are two types of React Elements: component class references and DOM node types.
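My assumption about the two kinds of element types can be written as a small classifier. It's a sketch of my current hypothesis and nothing more: classifyElementType is a hypothetical name, and it only distinguishes the two cases I've observed so far. Note that a class is a function in JavaScript, so typeof reports 'function' for LikeButton.

```javascript
// Hypothetical classifier for the two element types observed so far.
function classifyElementType(type) {
  if (typeof type === 'string') {
    return 'dom'; // e.g. 'button': the tag name of a DOM node
  }
  if (typeof type === 'function') {
    return 'component'; // classes are functions in JavaScript, e.g. LikeButton
  }
  return 'unknown';
}

class LikeButton {}
console.log(classifyElementType(LikeButton)); // component
console.log(classifyElementType('button'));   // dom
```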

I still haven’t found where the DOM node is created. It’s important to find that place to validate the hypothesis about types of DOM nodes created.

Use the Call Stack to construct the application flow

At this point I want to take a look at the Call Stack:

By clicking through the functions in the Call Stack, I infer that React is going through some kind of work loop and executes pieces of work bit by bit. I suppose that one such piece of work could be creating a DOM node, but I’d need to work through multiple iterations of the loop to find that out. Debugging loops is always time-consuming, and if a loop is asynchronous, which I imagine is the case here, the complexity of debugging rises drastically.
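To reason about what such a loop might look like, here is a generic sketch of a work loop that processes one unit at a time and stops when its budget runs out, returning where to resume. It illustrates the pattern I suspect, not React's code: runWorkLoop, processUnit and the numeric budget are hypothetical, and a real scheduler would more likely yield based on elapsed time than on a counter.

```javascript
// Each unit of work returns the next unit, or null when nothing is left.
function runWorkLoop(firstUnit, processUnit, budget) {
  let unit = firstUnit;
  let done = 0;
  while (unit !== null && done < budget) {
    unit = processUnit(unit);
    done++;
  }
  // The unit to resume from later (null means all work is finished).
  return unit;
}

// Example: three chained units of work, processed two at a time.
const log = [];
const units = { a: 'b', b: 'c', c: null };
const perform = (name) => {
  log.push(name);
  return units[name];
};

let next = runWorkLoop('a', perform, 2); // processes a and b, returns 'c'
next = runWorkLoop(next, perform, 2);    // processes c, returns null
console.log(log, next); // [ 'a', 'b', 'c' ] null
```

Because the state between iterations lives in the returned unit, each call is short, which is also why stepping through such a loop in a debugger means following many small slices of work.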

Have a good grasp on the underlying platform

So, instead, I’ll use my knowledge of the native DOM API and try to find the native createElement call in the sources. This method is called on document to create a DOM node. I don’t expect React to use the global document directly, but rather some platform-independent wrapper, so I’ll search for .createElement, with the leading dot and without document.

I’ll first do a search inside packages/react/src. There seems to be nothing there besides tests. Okay, maybe there’s something in a different package. So I’ll do a search through all packages excluding test files:

I know that createElement should be called on document, and I see something promising:

Great. As I did with the createElement function, I’ll use debugging to verify whether the function I’ve found is used to create DOM nodes. I expect it to be called to create the button HTML element. To check that, I need to find this code in the sources loaded in the browser and add a breakpoint there.

Have a good command of debugging tools

So, I do a search in the sources tab of Chrome’s Dev Tools using Ctrl+Shift+F and look for the parent.ownerDocument.createElement(parent.tagName) line. This is what I get:

I double-click it and end up in the corresponding file. I then add a breakpoint and reload my application. Nothing. The breakpoint is not hit, so I guess it’s not the method I’m looking for. Again, I got it wrong.

Approach the task from different angles

But I’ll try something else here, a technique I use quite often. I replace methods on objects with functions that wrap the original functionality and add a debugger statement inside. This is a variation of the decorator pattern. When anything in the application calls the decorated method on the object, the debugger statement is hit, and I can see in the Call Stack where the call is made from.
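Here is a minimal version of that decorator, as a sketch. decorateWithDebugger is my own helper name, not part of any library. It saves the original method, installs a wrapper that hits a debugger statement and then forwards the call with the same this and arguments, and returns a function that restores the original. In the browser console you would call it as decorateWithDebugger(document, 'createElement'); below it runs against a plain object so it also works outside a browser.

```javascript
// Replace obj[methodName] with a wrapper that pauses in the debugger
// before delegating to the original method. Returns an "undo" function.
function decorateWithDebugger(obj, methodName) {
  const original = obj[methodName];
  obj[methodName] = function (...args) {
    debugger; // with DevTools open, the Call Stack shows who called us
    return original.apply(this, args);
  };
  return function restore() {
    obj[methodName] = original;
  };
}

// Stand-in for `document`, so the sketch runs outside a browser.
const fakeDocument = {
  createElement(tagName) {
    return { tagName: tagName.toUpperCase(), ownerDocument: this };
  },
};

const restore = decorateWithDebugger(fakeDocument, 'createElement');
const node = fakeDocument.createElement('button');
console.log(node.tagName, node.ownerDocument === fakeDocument); // BUTTON true
restore();
```

Using original.apply(this, args) keeps the receiver intact, which matters for native methods like document.createElement that throw if called without the right this.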

I’ll use this technique on the document object and decorate the createElement method. Usually, it’s important to decorate the method as early as possible in the application. In my case I know that the button DOM node won’t be created until after the render method is called on the component instance, so I can decorate the document at this point.

So, as soon as the application is paused in the render method, I decorate the createElement method by executing a few statements in the console:

When I resume execution, it pauses exactly at my debugger statement.

Now all I need to do is inspect the Call Stack. I first check the previous function call:

There are a couple of things to notice here. First, the function that creates DOM nodes is called createElement$1 in the code. Second, React references document as ownerDocument. This is useful information, and I’ll use it when searching for other method calls on the document.
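Reaching the document through ownerDocument keeps the code independent of the global document: any DOM node already knows which document it belongs to. A sketch of that idea follows, with hypothetical helper names (getOwnerDocument, createDomElement); whether React resolves the document in exactly this way is something I'd still confirm in the sources. Two DOM facts drive the special case: a Document has nodeType 9, and document.ownerDocument is null.

```javascript
const DOCUMENT_NODE = 9; // Node.DOCUMENT_NODE in the DOM standard

// Hypothetical helper: find the document that owns a container node.
// A Document's own ownerDocument is null, so it must be returned as-is.
function getOwnerDocument(container) {
  return container.nodeType === DOCUMENT_NODE ? container : container.ownerDocument;
}

// Create an element without touching the global `document`.
function createDomElement(tagName, container) {
  return getOwnerDocument(container).createElement(tagName);
}

// Plain-object stand-ins for a document and a <div> inside it.
const doc = {
  nodeType: DOCUMENT_NODE,
  ownerDocument: null,
  createElement: (tagName) => ({ nodeType: 1, tagName: tagName.toUpperCase() }),
};
const rootDiv = { nodeType: 1, ownerDocument: doc };

console.log(createDomElement('button', rootDiv).tagName); // BUTTON
console.log(getOwnerDocument(doc) === doc);               // true
```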

I’ll now take a look at the createInstance function call:

Cool, here the function name is createElement, and it creates the button DOM node. I don’t know why the previous call adds $1 to the function name, but it’s not important at this point. Perfect, I’ve found the function that creates DOM nodes!

Now I need to find it in the sources. I do a full search in WebStorm:

And I see the familiar list again, the one we started from. The first time I saw it, I assumed that the function from the react-dom package would be executed inside the render function:

return createElement(
    'button',
    {onClick: () => this.setState({liked: true})},
    'Like'
);

But it was the second function, the one that creates a React Element, that was actually called.

So I can conclude now that the function from the react-dom package creates DOM nodes, while the function from the react package creates React Elements.

That’s important information! I can use it to infer that for functions related to the DOM, I should look inside the react-dom package. On the other hand, the react package probably contains platform-independent logic. I can make this inference because I know that React is a multi-platform framework. I also know that Angular is split in a similar way, into @angular/core and @angular/platform-browser, which seem to correspond to React’s react and react-dom packages.

Now I want to spend some time exploring the file that contains the createElement function so I can start learning the code base. The file is called ReactDOMFiberComponent.js. It has many functions that perform DOM operations, for example createElement, createTextNode and updateDOMProperties. I’m so happy I’ve found it! Now I have a place to look for functions to put debugger statements in when I need to intercept a DOM operation.

Now I’m finally ready to validate my prediction that the following call:

return createElement(
    'button',
    {onClick: () => this.setState({liked: true})},
    'Like'
);

will create one button element node, one text node with the value Like and add one event listener to the button element.

I know which functions create element and text nodes, but I don’t know which function adds an event listener. I’ll do a search in the react-dom package and find the following functions:

It must be either the first or the second one.

So, to validate the prediction, I add debugger statements to the functions createElement, createTextNode, addEventBubbleListener and addEventCaptureListener. When I reload the application, I hit breakpoints in the createElement and addEventBubbleListener functions. The click event bubbles, so it makes sense that the bubble variant of addEventListener was executed. However, the createTextNode function wasn’t executed. I wonder now how the text node Like was created. Maybe it was created inside the createElement function?

To find that out, I’m going to continue stepping over the calls and check the childNodes property. This is where my knowledge of the underlying platform comes in handy. This technique will allow me to quickly move through the calls and identify the function that added the child node.
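The stepping-over routine boils down to a simple loop: run one step, compare childNodes.length with its previous value, and stop at the first step that increased it. In the debugger I do this by hand with Step over; the sketch below just makes the loop explicit. findStepThatAddsChildren, the step names and the fake node are all hypothetical stand-ins for the real button element and the functions I step over.

```javascript
// Run each step in order and report the first one after which the node
// has more child nodes than before. Returns null if no step added children.
function findStepThatAddsChildren(node, steps) {
  for (const { name, run } of steps) {
    const before = node.childNodes.length;
    run(node);
    if (node.childNodes.length > before) {
      return name;
    }
  }
  return null;
}

// Fake node and steps, for illustration only.
const fakeButton = { childNodes: [] };
const steps = [
  { name: 'step1', run: () => {} },
  { name: 'step2', run: () => {} },
  { name: 'step3', run: (n) => n.childNodes.push({ nodeValue: 'Like' }) },
];

console.log(findStepThatAddsChildren(fakeButton, steps)); // step3
```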

So, the first function I exit is createElement. I check the property:

Nope, its length is 0. That means the child node wasn’t created by the createElement function. I then step over precacheFiberNode$1 and updateFiberProps$1, checking the property after each function. The child text node is not added. I continue debugging and end up in the completeWork function:

The variable _instance holds the reference to the created DOM node. I also check the workInProgress variable, which references a FiberNode:

Based on the variable name and a little info I read on Fiber, I assume the FiberNode represents a chunk of work to do in the new Fiber architecture.

Now I step over the appendAllChildren function call and check the childNodes again:

Nope, still 0. That means the child was not added in appendAllChildren, and at this point I don’t need to dive into it. Now I repeat the same with the call to finalizeInitialChildren. Step over and check:

Great, now we have it. I’ve pinpointed finalizeInitialChildren as the function that created the child node. By continuing to debug and checking the childNodes value, I’ve found the code that adds the text node inside the setInitialDOMProperties function:

When I step into the setTextContent function I see how the node was added:
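That would explain why createTextNode never fired: if the text is applied by assigning textContent, the browser creates the text node itself, since setting textContent to a non-empty string replaces all children with a single text node. The sketch below is modeled on that idea rather than copied from React; the reuse of an existing lone text node is an optimization I'd verify in the React sources. Node has no DOM, so a hypothetical FakeElement class mimics the part of the textContent setter that matters here.

```javascript
const TEXT_NODE = 3; // Node.TEXT_NODE in the DOM standard

// Minimal stand-in for a DOM element: setting textContent replaces all
// children with one text node (or removes them all for empty text).
class FakeElement {
  constructor() {
    this.childNodes = [];
  }
  get firstChild() {
    return this.childNodes[0] || null;
  }
  get lastChild() {
    return this.childNodes[this.childNodes.length - 1] || null;
  }
  set textContent(text) {
    this.childNodes = text ? [{ nodeType: TEXT_NODE, nodeValue: text }] : [];
  }
}

// Sketch of a setTextContent-style helper.
function setTextContent(node, text) {
  const first = node.firstChild;
  // Reuse a lone existing text node instead of replacing it.
  if (text && first && first === node.lastChild && first.nodeType === TEXT_NODE) {
    first.nodeValue = text;
    return;
  }
  node.textContent = text;
}

const button = new FakeElement();
setTextContent(button, 'Like');
console.log(button.childNodes.length, button.firstChild.nodeValue); // 1 Like
```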

Finally, I’ve verified my prediction about what types of DOM nodes will be created! Take a moment to appreciate how much we’ve learnt along the way: the functions that create React Elements and DOM nodes, their locations, the difference between the packages and the new Fiber architecture. Amazing, isn’t it?

A word about luck

I talked about luck and the role it plays in reverse-engineering in my previous article. Well, here is a great demonstration. I started debugging with the goal of finding out where React stores DOM nodes. As it happens, if I just lower my gaze below the finalizeInitialChildren function call, I see the following:

The button DOM element created above is attached to the workInProgress fiber node. So DOM nodes are attached to Fiber Nodes. That’s where React stores references to DOM nodes.

Of course, that’s just a start. There’s a very long road of discovery ahead.

At this point I simply know that the created DOM node, along with its children, is attached to the FiberNode’s stateNode property. Based on my knowledge of how Angular stores DOM elements, I assume that there should be some kind of data structure corresponding to components that holds references to the DOM. Could the Fiber Node, which is a piece of work, be that kind of data structure? It’s possible, but I haven’t seen anything like that yet. So I need to imagine how I would design such a system if that were the case. For it to work, I’d need to store Fiber Nodes somewhere and then iterate over them when the DOM needs an update.
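If I were designing that system, one option is to link the nodes into a tree and walk it whenever the DOM needs an update. The sketch below does that with child, sibling and return links and collects the DOM references stored in stateNode. The link names are my guess at a reasonable design, borrowed from the Fiber descriptions I skimmed earlier; I still need to confirm them in the FiberNode class. makeNode, setChildren and collectStateNodes are hypothetical, and strings stand in for DOM nodes.

```javascript
// Hypothetical node shape: a DOM reference plus tree links.
function makeNode(name, stateNode = null) {
  return { name, stateNode, child: null, sibling: null, return: null };
}

// Link a parent to an ordered list of children.
function setChildren(parent, children) {
  parent.child = children[0] || null;
  children.forEach((c, i) => {
    c.return = parent;
    c.sibling = children[i + 1] || null;
  });
}

// Depth-first walk without recursion: go down via child, across via
// sibling, and back up via return until we are back at the root.
function collectStateNodes(root) {
  const result = [];
  let node = root;
  while (node !== null) {
    if (node.stateNode !== null) result.push(node.stateNode);
    if (node.child !== null) {
      node = node.child;
      continue;
    }
    while (node !== root && node.sibling === null) {
      node = node.return;
    }
    if (node === root) break;
    node = node.sibling;
  }
  return result;
}

const app = makeNode('LikeButton');            // a component: no DOM node
const button = makeNode('button', '<button>'); // strings stand in for DOM nodes
const text = makeNode('text', '"Like"');
setChildren(app, [button]);
setChildren(button, [text]);
console.log(collectStateNodes(app)); // [ '<button>', '"Like"' ]
```

The iterative walk avoids deep recursion and could, in principle, be paused and resumed at any node, which fits the idea of work being done bit by bit.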

I’ll need to take a look at the Fiber Node class and its type. I’ll need to figure out where Fiber nodes are stored and how they are processed. So many questions to find answers to. But I’m excited!