Return home

Generating Random Number from Image of a Probability Density Function

In this tutorial, we are going to learn how to generate random numbers based on a given probability density function (pdf). In other words, the distribution of the random numbers generated will be the same as the desired probability density function.

Let us assume that we need random numbers from an uncommon distribution (not a Gaussian distribution). Furthermore, it may not possible to describe the probability function in a parametric form. Instead, we somehow have the image of the desired probability density function (for example: we can take picture of it, or we simply can draw the probability density function).

Figure 1. Example of an uncommon probability density function: Houston Downtown Probability Density Function. We assume that the sky is the background. Photo is taken from [1].

What is a Probability Density Function?

In probability theory, a probability density function (pdf) is a continuous function that describes the probability for a random variable X to take values between given boundaries. For example, we are given the probability density function below:

Figure 2. Probability density function

The probability for the random variable to fall within given region is obtained by the integral of probability density function over the region. We want to know the probability of the random variable X to take values between 2 and 4.5? We need to compute the integral:

  

 

which is actually equal to the area depicted below:

Figure 3. Value of

 

The probability value 0.50 means that 50% of the times, X is going to take values between 2 and 4.5. Given the probability density function and a boundary, it looks very straightforward and easy to compute a probability.

Instead, given a probability density function, how about generating random numbers having the probability density function above? In other words, we will draw random numbers between 2 and 7 and each will be equally probable. One can suggest sampling numbers from 2 to 7 and increment the number by 0.01:

2.00  2.01  2.02  2.03  2.04  2.05   …   6.96  6.97  6.98  6.99  7.00

This is a very naďve algorithm that will work for a uniform probability density function, as the one given in Figure 2. But, what if we have a probability density function that has more sophisticated form as the one given in Figure 1?

 

Methods for (Pseudo–) Random Number Sampling

When a computer program generates random numbers, although the generated numbers appear to be random, they are actually not. It is nothing but a sequence of numbers generated by an algorithm. Therefore, such random numbers are called pseudo–random numbers. There are several algorithms to draw pseudo–random numbers from such probability density functions:

·        Inverse transform sampling

·        Rejection sampling

·        Slice sampling

·        and more.

 

In this tutorial, we are going to use the inverse transform sampling method. The inverse transform sampling method has three steps. Let  be the probability density function of interest, and let  be the cumulative distribution function of . What is a cumulative distribution function? We will see later, do not worry for now.

 

1.     Generate a random number u from the standard uniform distribution in the interval 0 and 1.

2.     Compute the value x such that .

3.     x will be the random number drawn from the distribution described by .

 

For example, we are given the  in Figure 4.

Figure 4. A probability density function that may remind you the Houston Downtown.

 

Our goal is to generate random numbers such that the probability distribution of the random numbers is going to the  in Figure 4.

 

Figure 5. The distribution of 1 million random numbers generated by a simulation using the  in Figure 4.

 

First of all, to be able to apply the inverse transform sampling method, we need to know actual numeric values of each point in the probability density function. We are given the image of the probability density function, but not the actual function values!

Image of a Probability Density Function, What does This Mean?

A digital image is nothing but a set of numbers. If we have a pictorial image of a function, this means that we have discrete points sampled from the function. For the case of an image of probability density function, we can obtain a probability distribution function (PDF) from the image.

In probability and statistics, a probability distribution function or probability distribution assigns a probability to each measurable subset of the possible outcomes of a discrete random variable X.

Probability Density Function (pdf) vs. Probability Distribution Function (PDF)

When the random variable X is continuous (the number of possible outcomes of the random variable is infinite), we have the probability density function (pdf). When the random variable X is discrete (the number of possible outcomes of the random variable is finite), we have the probability distribution function (PDF).

Discretize the Probability Density Function

Let us assume that given probability density function is a set of discrete pixels.

Figure 6. Obtaining the probability distribution function (PDF) from the given image of the probability density function (pdf). We assume that each square is a pixel.

 

Either probability density or probability distribution, who cares? What is the difference? Actually, the difference and what they represent are quite important, but we will try to make it easy to understand.

We now have the image of the probability distribution function, but we still do not have the actual function values. Please remember that a digital image is nothing but a set of numbers. If we have a pictorial image of a function, this means that we have discrete points sampled from the function.

But, how we can assign values to the pixels in Figure 6. Let us find out some constraints that can help us. We know that a probability values cannot be larger than 1 (100%). Therefore, our first constraint is that the area under the entire probability distribution function (so-called probability distribution function, it is the discrete version of the probability density function) should be 1:

  

 

Please remember that we have discrete values; therefore, the integral in Equation 3 becomes a summation operation:

integral = area under the curve

  

 

where N is the number of the pixels in PDF image,  and  (we do not know the actual values) are the width and height of a pixel, respectively.

Figure 7. Size of a pixel is defined by  and . We need to know values of N,  and , so that we can compute the integral (i.e., area) given in the Equation 4 for different boundary values. Thereby, we can assign actual numeric probability values to each sampling point in the image of the probability distribution function. This will help us use the inverse transform sampling method to generate a set of random numbers drawn from the desired probability density function.

Let us start with the easy one, that is N. N can be calculated by counting the number of pixels that belong to the probability distribution function. For example, N is 74 for the probability distribution depicted in Figure 6. However, we need more information to compute  and .

At this point, we need user defined information. The second constraint will come from the boundary values of X:

 

Figure 8. Center of each pixel is assigned to a real value denoted by .  and  are user provided boundary values. We can compute the width of a pixel using the boundary values.

 

Please remember that our goal is to compute  and , so that we can assign numeric values to the points in the image. We have 2 unknowns ( and ) and 2 equations, so we should be able to compute  and . The first equation comes from the Equation 4:

 

,  

 

and the second equation is given in the Figure 8:

,  

 

where  and  are user defined values, and k and N can be computed using the image. For example, k (the number of pixels) is 16 for the probability distribution depicted in Figure 8. If we rearrange the Equation 5 and the Equation 6, we can obtain  and  as follows:

  

 

and

 

Once we calculate  and , we can compute approximate  values for each  as follows:

 

Figure 9. To compute the probability value  for the point , we need to the number of the pixels at the point . Then, we need to calculate the total area of the pixels. For example, there are 7 pixels at the point  above. Thereby,  is equal to , which is the total area of 7 pixels.

We assign a real valued point, denoted by  for each , to the center of each pixel. The total number of the pixels at the point  (denoted by h in Figure 9) can be computed using the image. Please note that the boundary values  and  are provided by the user. Using the  and  values, we can now compute each  value as follows:

 

After we obtain a real value for each point  for , we can draw a figure using the actual probability values:

Figure 10. Left: Input image, the desired probability density function. Right: We draw a plot using the estimated values of the probability distribution function using the given probability density function.

At this point, let us remember the steps of the inverse transform sampling method. We are given a probability distribution and then we calculate the cumulative distribution. What is a cumulative distribution function (CDF)? Let us focus on CDF a little bit.

Cumulative Distribution Function

Cumulative Distribution Function (CDF) describes the probability that a random variable X with a given probability distribution will be found at a value less than or equal to x.

 

,  

 

If X is a discrete random variable,  can be computed by a summation operation as follows:

 

Using the Equation 11, we can compute cumulative distribution function  from  values:

Figure 11. Using the Equation 11, we can compute  values for .

 

We have developed methods to compute the probability distribution and the cumulative distribution from the image of the corresponding probability density function. Therefore, we can now apply the inverse transform sampling methods:

1.     Generate a random number u from the standard uniform distribution in the interval 0 and 1. We can use the predefined Matlab function rand to generate uniformly distributed pseudorandom numbers.

2.     Compute the value  such that . We do not need to compute the inverse of . Instead, we can use  values to estimate .

3.      will be the random number drawn from the distribution described by .

 

Implementation

Please click to download Matlab source code and the input image.  Here are some test images and results.

(a)

(b)

(c)

 

Figure 12.(a) Actual image[1] (b) Input image (c) Probability distribution of 1 million numbers, where  and .

This is the end of the tutorial. I hope you enjoyed it. Please feel free to contact me at haberdar at gmail dot com.

How to Cite this Document

You can use anything here as long as you give credit as follows: 

Hakan Haberdar, "Generating Random Number from Image of a Probability Density Function", Computer Science Tutorials [online], (Accessed MM-DD-YYYY) Available from: http://www.haberdar.org/generating-random-number-from-image-of-probability-density-function-tutorial.htm