**Generating Random Number from Image of a
Probability Density Function**

In this tutorial, we are going to learn how to generate random numbers based on a given probability density function (pdf). In other words, the distribution of the random numbers generated will be the same as the desired probability density function.

Let us assume that we need random numbers from an uncommon distribution (not a Gaussian distribution). Furthermore, it may not possible to describe the probability function in a parametric form. Instead, we somehow have the image of the desired probability density function (for example: we can take picture of it, or we simply can draw the probability density function).

**Figure 1.** Example of an uncommon probability
density function: Houston Downtown Probability Density Function. We assume that
the sky is the background. Photo is taken from [1].

**What is a Probability Density Function?**

In probability theory, a probability density function (pdf) is a continuous function that describes the probability for a random variable X to take values between given boundaries. For example, we are given the probability density function below:

**Figure 2.**
Probability density function

The probability for the random variable to fall within given region is obtained by the integral of probability density function over the region. We want to know the probability of the random variable X to take values between 2 and 4.5? We need to compute the integral:

which is actually equal to the area depicted below:

**Figure 3.** Value
of

The probability value 0.50 means that 50% of the times, X is going to take values between 2 and 4.5. Given the probability density function and a boundary, it looks very straightforward and easy to compute a probability.

Instead, given a probability density function, how about generating random numbers having the probability density function above? In other words, we will draw random numbers between 2 and 7 and each will be equally probable. One can suggest sampling numbers from 2 to 7 and increment the number by 0.01:

**2.00 2.01 2.02
2.03 2.04 2.05 … 6.96 6.97 6.98 6.99 7.00**

This is a very naďve algorithm that will work for a uniform probability density function, as the one given in Figure 2. But, what if we have a probability density function that has more sophisticated form as the one given in Figure 1?

**Methods for (Pseudo–) Random Number
Sampling**

When a computer program generates random numbers, although the generated numbers appear to be random, they are actually not. It is nothing but a sequence of numbers generated by an algorithm. Therefore, such random numbers are called pseudo–random numbers. There are several algorithms to draw pseudo–random numbers from such probability density functions:

· and more.

In this tutorial, we are going to use the **inverse transform
sampling**** method**. The inverse transform sampling method
has three steps. Let be the probability density function of
interest, and let be the cumulative distribution function
of . What is a cumulative distribution
function? We will see later, do not worry for now.

1. Generate a random number u from the standard uniform distribution in the interval 0 and 1.

2. Compute the value x such that .

3. x will be the random number drawn from the distribution described by .

For example, we are given the in Figure 4.

**Figure
4.** A probability density
function that may remind you the Houston Downtown.

Our goal is to generate random numbers such that the probability distribution of the random numbers is going to the in Figure 4.

**Figure 5.** The
distribution of 1 million random numbers generated by a simulation using the in Figure 4.

First of all, to be able to apply the inverse transform sampling method, we need to know actual numeric values of each point in the probability density function. We are given the image of the probability density function, but not the actual function values!

**Image of a Probability Density Function,
What does This Mean?**

A
digital image is nothing but **a set of numbers**. If we have a pictorial image of a function, this means that we
have discrete points sampled from the function. For the case of an image of
probability density function, we can obtain a **probability distribution function** (PDF) from the image.

In
probability and statistics, a **probability distribution function** or **probability distribution** assigns a probability to each measurable subset of the possible
outcomes of a **discrete random
variable X**.

**Probability Density Function (pdf) vs.
Probability Distribution Function (PDF)**

When the
random variable X is
continuous (the number of possible outcomes of the
random variable is __infinite__), we have the probability
density function (pdf). When
the random variable X is
discrete (the number of possible outcomes of the random
variable is __finite__),
we have the probability
distribution function (PDF).

**Discretize the Probability Density
Function**

Let us assume that given probability density function is a set of discrete pixels.

**Figure 6.** Obtaining
the probability
distribution function (PDF)
from the given image of the probability
density function (pdf). We
assume that each square is a pixel.

Either
probability density or probability distribution, who cares? What is the
difference? Actually, the **difference
and what they represent are quite important**, but we will try to make it easy to understand.

We now have the image of the probability distribution function, but we still do not have the actual function values. Please remember that a digital image is nothing but a set of numbers. If we have a pictorial image of a function, this means that we have discrete points sampled from the function.

But, how
we can assign values to the pixels in Figure 6. Let
us find out some constraints that can help us. We know that a probability
values cannot be larger than 1 (100%). Therefore, our first constraint is that
the area under the entire probability distribution function (**so-called probability distribution
function**, it is the
discrete version of the probability density function) should be 1:

Please remember that we have discrete values; therefore, the integral in Equation 3 becomes a summation operation:

integral = area under the curve

where N is the number of the pixels in PDF image, and (we do not know the actual values) are the width and height of a pixel, respectively.

**Figure
7.** Size of a pixel is
defined by and . We need to know values of **N**, and , so that we can compute the integral
(i.e., area) given in the Equation 4 for different boundary values. Thereby, we
can assign actual numeric probability values to each sampling point in the image
of the probability distribution function. This will help us use the inverse
transform sampling method to generate a set of random numbers drawn from the
desired probability density function.

Let us
start with the easy one, that is N. **N can be calculated by counting the number of pixels that belong
to the probability distribution function**. For example, N is 74 for the probability distribution depicted
in Figure 6. However, we need more information to
compute and .

At this
point, we need **user
defined information**. The
second constraint will come from the boundary values of X:

**Figure 8.**
Center of each pixel is assigned to a real value denoted by . and are user provided boundary values. We can
compute the width of a pixel using the boundary values.

Please remember that our goal is to compute and , so that we can assign numeric values to the points in the image. We have 2 unknowns ( and ) and 2 equations, so we should be able to compute and . The first equation comes from the Equation 4:

,

and the second equation is given in the Figure 8:

,

where ** and **** are user defined values**, and k and N can be computed using the
image. For example, k (the number of pixels) is 16 for the probability
distribution depicted in Figure
8. If we rearrange the
Equation 5 and the Equation 6, we can obtain and as follows:

and

Once we calculate and , we can compute approximate values for each as follows:

**Figure 9.** To compute the probability value for the point , we need to the number of the pixels at
the point . Then, we need to calculate the total
area of the pixels. For example, **there are 7 pixels at the point **** above**. Thereby, is equal to , which is the total area of 7 pixels.

We
assign a real valued point, denoted by for each , to the center of each pixel. The total
number of the pixels at the point (denoted by **h in **Figure 9)
can be computed using the image. Please note that the boundary values and are provided by the user. Using the and values, we can now compute each value as follows:

After we obtain a real value for each point for , we can draw a figure using the actual probability values:

**Figure 10.** **Left**:
Input image, the desired probability density function. **Right**: We draw a plot using the estimated values of the
probability distribution function using the given probability density function.

At this point, let us remember the steps of the inverse transform sampling method. We are given a probability distribution and then we calculate the cumulative distribution. What is a cumulative distribution function (CDF)? Let us focus on CDF a little bit.

**Cumulative Distribution Function**

**Cumulative Distribution Function (CDF)** describes the probability that a random
variable X with a given probability distribution will be found at a value less
than or equal to x.

,

If X is a discrete random variable, can be computed by a summation operation as follows:

Using the Equation 11, we can compute cumulative distribution function from values:

**Figure 11. **Using
the Equation 11, we can compute values for .

We have developed methods to compute the probability distribution and the cumulative distribution from the image of the corresponding probability density function. Therefore, we can now apply the inverse transform sampling methods:

1.
Generate a random
number **u** from the **standard uniform distribution in the
interval 0 and 1**. We can
use the predefined Matlab function rand to generate uniformly distributed
pseudorandom numbers.

2. Compute the value such that . We do not need to compute the inverse of . Instead, we can use values to estimate .

3. will be the random number drawn from the distribution described by .

**Implementation**

Please click to download Matlab source code and the input image. Here are some test images and results.

(a)

(b)

(c)

Figure 12.(a) Actual image[1] (b) Input image (c) Probability distribution of 1 million numbers, where and .

This
is the end of the tutorial. I hope you enjoyed it. Please feel free to contact
me at **haberdar ****at**** ****gmail**** ****dot**** ****com****.**

**How to Cite this Document**

You can use anything here as long as you give credit as follows:

**Hakan Haberdar, ****"Generating Random Number from
Image of a Probability Density Function"****, Computer Science Tutorials [online], (****Accessed MM-DD-YYYY****) Available from: ****http://www.haberdar.org/generating-random-number-from-image-of-probability-density-function-tutorial.htm**