What’s under the hood?

Exploring some algorithms essential for self-driving

Disclaimer: All the ideas and algorithms in this post are public knowledge and does not use proprietary information from my current or former employers.

My goal with this post is to give a little insight into the kinds of problems we think about in the autonomous vehicle (AV) industry. I’ll go over the problem of forced merges on the highway and why actor predictions are crucial for them. Then I’ll dive into a recent paper on predictions for AVs, and apply it to the specific case of forced merges. Finally, I’ll give my thoughts on what the limitations are and what next steps might be. I’ll also share some code in case it’s useful for educational purposes. Let’s begin!


“Forced merges” on the highway occur when other lanes merge into our own. When this happens, an ordering is forced on the vehicles in both lanes. Usually, the right-of-way is ambiguous here, which require us to predict the intention of other vehicles as they enter our lane.¹

How do human drivers deal with this? When I’m driving in this situation and another driver is in the merging lane, I notice myself thinking in the following way:

  1. As I approach the merging point, I’m trying to get a sense of whether the other driver is aiming to be in front of or behind me.
  2. If I’m far behind or far ahead of the other car, there is very little chance they are yielding to/beating me, respectively.
  3. The longer I’m able to watch the other car, the more sure I am about their intention.

AV systems have to solve this problem as well. The stack is commonly divided hierarchically into perception, prediction, planning and controls. It is useful to be able to predict the movement or intentions of other actors in order to plan the motion of the ego vehicle.

Interaction-Aware Probabilistic Behavior Prediction in Urban Environments is a paper from 2018 that focuses on a framework for behavior prediction that encompasses a diverse set of road scenarios.What I like about this paper is that their framework clearly captures the intuitions I described earlier, using two well known tools: a Dynamic Bayesian Network and Particle Filter. Let’s dive into how these work!


Rather than discussing the paper in its full complexity, let’s focus on the getting the intuition for how this works in a merging scenario by making the following simplifications:

  1. Assume “route intention” is known — all actors are assumed to be traveling along a fixed path in their own lanes.
  2. Assume there is only one actor merging into our lane, and no one else in our lane. This means we only need to understand whether an actor intends to end up in front of or behind us by the merge point.

Dynamic Bayesian Network

The formal problem of behavior prediction is to estimate the likelihoods of different variables (such as “yield”/ “no-yield” likelihood for each actor), given a sequence of measurements. The DBN is used to represent relationships between variables across time. Note that:

  1. The chosen action a for an actor depends on its own maneuver intention (m) at that time-step (yield/no-yield). This captures the intuition that an actor will accelerate more if they are not yielding than if they are yielding.
  2. An actor’s state influences the maneuver intention (m) of other actors at the same time-step. This captures the interactive nature of prediction — an actor would need to accelerate harder to beat actors that are further in front of them or moving quickly.

Note that this DBN doesn’t need to actually be expressed in code anywhere — rather, it’s inherently captured by the action model and the resampling step of the particle filter, both of which we’ll discuss next.

Action Model

In order to calculate some of the probabilities we need, we need some model of how actors are generated once a “maneuver” is known. That is — how much will an actor accelerate if it’s yielding vs. not yielding. The paper calls this an “action model”.

The authors implement an action model by first describing a set of minimum and maximum acceleration constraints, then sampling from a gaussian distribution close to the “lowest maximum constraint”. The “yield”/ “no-yield” decision is incorporated here — if the car is believed to be yielding to us, then they have to be strictly behind us by the merge point, and this can be used to generate a “max” acceleration constraint (a^max_conf). Similarly, if they are believed to be trying to “beat” us, they must be strictly ahead of us by the merge point and this can be used to generate a “min” acceleration constraint (a^min_conf).²

This action model will be used during the forward propagation step of the particle filter, discussed next.

Particle Filter

The particle filter is the engine of our predictions. It keeps our most recent estimate of different variables, and updates them as new measurements come in. They are a specific form of a Bayesian Filter³, which use Bayes Rule⁴ to make the perfect update to our beliefs over time. The Bayes Filter is actually perfect but intractable, so we have evolved different variations that approximate it, with different trade-offs.

The particle filter works by initialize a number of “particles”, each representing a guess about the true state of the world. Each particle is assigned a probability and the aggregate set of particles represents your belief.

As each new observation comes in, the particles are propagated forward in time, their probabilities are re-weighted, and they are re-sampled according to the new probabilities.⁵

Now let’s take a look at how particle filters are used in this paper. To simplify, let’s assume we’re only considering maneuver intentions m and the state is fully observed (z=x):

I start by initializing m particles, half with a yield maneuver and half no-yield. Using the action model described below, each particle is then propagated forward. The “yield” particles will have generally smaller acceleration applied than the “no-yield” particles, resulting in a bi-model distribution of particles. Say we then receive an observation of the vehicle accelerating . We are less likely to observe this if the vehicle is yielding than if they were intending to beat us. Correspondingly, the “yield” particles will be re-weighted with a higher likelihood and the distribution of particles will shift toward the “yield” particles after re-sampling (6).

Let’s see an example showing one step of a particle filter:

Recall that each particle represents a “state” of the world. For our purposes, this is a travel, velocity, and maneuver intention of the other actor. The “yielding” particles are in red, while the “not yielding” particles are in green. (Note that I’ve also added in an “ignoring” maneuver which attempts to model actors which are not paying attention to us).

Notice that in the “before” stage, the distributions have already separated from each other as the model has been running for a few time steps, and the “no yield” particles have been applying higher acceleration on average. The new data is observed as the blue dot — a constant velocity of 20 m/s with 0 acceleration, 0.5 seconds in the future.

In the “intermediate” step, all the particles are propagated forward by the same time step of 0.5 seconds with the action model. The green “not yielding” particles have applied a higher acceleration and separate even further from the red particles. Note that the observation seems to fall more closely within the “meat” of the red distribution now. The propagated particles are then re-weighted according to their posterior likelihood. Intuitively, this should result in a relative increase in our confidence of the “yielding” hypothesis.

In the final “resampling stage”, we sample a new set of particles by drawing from the old set using their likelihood weights, with replacement. This has the property of “re-centering” the particles around the new observation, with the difference being that the number of yielding particles should now be higher than it was before, and the number of not-yielding particles should be lower. These counts are used as estimates of our belief, meaning we have more relative confidence in the yielding hypothesis.

Putting it together

Putting all these together, we now have a filter which tracks “yield”/ “no-yield” probabilities for an actor over time. As we get more observations of an actor over time (and as we both get closer to the merge point, requiring more drastic acceleration/deceleration to change between yield and no-yield, we should become more confident in our beliefs.

In this example, I’ve rigged up for an actor who is “behind” us and stays behind us all the way leading up to the merge point.⁶ This is how the particle filter evolves over the 12 seconds of observation:

You can see the “beating” probability starts low and stays low as it becomes more and more implausible that the actor can “beat” us to the merge point. They would need to start applying some positive acceleration in order to “beat” us and we just don’t observe that. Note that the acceleration is not directly observed, but inferred from the measured velocity.

Here’s another example where the actor starts off behind us, as before, but is clearly accelerating starting around 5 seconds in, and ends up beating us.

Here, you can see the filter adjust to the new, higher velocity measurements. The observations (blue dot) more and more closely match the distribution of “green” dots, meaning they get re-sampled more, and the set of “live” particles becomes dominated by “beating” particles.


Despite the relative simplicity of the ideas, and without requiring any training data, this filter did an impressive job of intention prediction in the case of forced merges. Some thoughts on how these results could be extended:

  • From a theoretical standpoint, the particle filter does not account for the possibility that an actor may actually change intentions over time, as we modeled in the second example. I upped the “re-sampling” rate of the particle filter to account for this possibility, and it seemed to work alright but was relatively unprincipled. A PGM could probably be used to model transitions between different maneuver types and their likelihoods over time, as opposed to assuming a fixed maneuver.
  • Ideally, you want a prediction model which can improve with more data, which this paper does not attempt to do. One way to incorporate a machine-learned model might be to create a “maneuver” for it and include it as a possibility in your set of beliefs. This would give us the benefit of having data-driven predictions while also protecting us when these models return nonsense answers. Some other alternatives might be to fit the gaussian noise parameters of the action model or learn the IDM model parameters from data.

I’d also like to share my code here in case it’s useful to read through — I will caveat by saying it was written in a short period of time and I will probably not be cleaning it up. For anyone interested in diving deeper, I would recommend working through the Probabilistic Robotics textbook as a starting point. Thank you for reading!


  1. Additionally, there is usually a cooperative aspect to this decision-making — neither driver wants to cause a collision — but we will not address that aspect in this post.
  2. The paper is a bit ambiguous about how this actually gets turned into a constraint (or at least I didn’t understand it). I chose to implement this by letting the actor assume our vehicle would continue with constant velocity along the road, and assuming a constant acceleration would be applied by them to be strictly ahead or behind with some padding by the critical point. This is a pretty simple heuristic and could probably be improved.
  3. For those not in the field, a lot of robotics really relies on different forms of Bayesian Filtering; see Kalman Filters in localization and tracking, for example.
  4. Bayes’ rule helps us relate P(A|B) to P(B|A). For us, this is useful to transform: “how likely is this observation given this maneuver” into “how likely is this maneuver given this observation” (something we care a lot about).
  5. Some small percentage of the particle set is usually held out to be re-sampled from observations to avoid issues with “weight collapse” (more on this later).
  6. A more realistic experiment would model the sensor noise in the measured state, but this works fine for demonstration purposes.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store