Cross Validated: Histogram for a compound Poisson process
[+3] [1] StatsNovice
[2017-12-17 22:20:36]
[ probability poisson-distribution poisson-process ]
[ https://stats.stackexchange.com/questions/319316/histogram-for-a-compound-poisson-process ]

Would a compound Poisson process result in a histogram that isn't the same as that of a regular Poisson process? How would I fit such a histogram without knowing the rate?

I've just located your other question. If you expect the increments to be Poisson with a fixed rate, then the process is an ordinary homogeneous Poisson process rather than a compound Poisson process, and if you count events over constant-length time intervals, the counts should be Poisson. [If it's not, how would it be compound Poisson?] - Glen_b
[+4] [2017-12-17 23:07:03] Glen_b [ACCEPTED]
  1. You wouldn't draw a histogram of a Poisson process [1] at all. It's a non-decreasing step function taking the values 0, 1, 2, ... over continuous time:

    [Figure: plot of a Poisson process]

    You could draw something like a histogram of observations drawn from a Poisson distribution (which could arise as the number of events in a sequence of constant-length, non-overlapping intervals under a Poisson process). For example, if I observed the process above, split time into intervals of length 5, say, and counted the number of events in each interval, I'd get observations from a Poisson distribution, and you might try to draw a histogram of those.

  2. However, a histogram is really designed for continuous data rather than values on a lattice; if you use it on discrete data you have to be careful about both its construction and its interpretation.

    Here's an illustration of a common problem with careless use of histograms on discrete data:

    [Figure: histogram and plot of the sample pmf, showing the discrepancy that arises from using the default histogram in R]

    This is a sample of counts from a Poisson distribution ($\lambda=3$): the observed counts are shown in blue, and a histogram (R's default) is shown in black. Note carefully what happened to the counts at 0 and 1.

    In general I'd use a different display for a discrete distribution - one that clearly indicates the discreteness - rather than one more suited to continuous data; the points in blue clearly show all the information.

  3. A compound Poisson distribution [2] is not generally much like a Poisson distribution. Indeed it needn't even be discrete -- if the jumps in the compound Poisson process [3] (also see here [4]) are continuously distributed, the resulting compound Poisson distribution is of mixed type.

    Again, an ordinary histogram is not really the best way to display such data; it's even less suitable here, since the potential spike at 0 is not "on the same scale" as the continuous part. An ecdf might serve quite well. If you do use a histogram and there are some zeros, it's worth indicating the special behaviour at 0 carefully. If you have a discrete compound Poisson, the same comments as under point 2 above apply, but if you have a lot of data and the mean is quite large compared to the standard deviation, a histogram may be perfectly sensible. Similarly, if the increments (when there are any) are continuous and you observed no zeros, a histogram would make sense.

  4. It's not quite clear what you intend by "fit the histogram". If you want to model the process, you'd specify some model for the distribution of the increments and then estimate the parameters of the resulting distribution. If you don't have data on the individual increments (or at least the counts comprising them) but only the aggregates (the things you'd plot in a histogram), it's more complicated. If those aggregates are also already binned (you only have the histogram itself), you additionally have interval censoring, but it should still be possible to estimate the parameters of the aggregate distribution.
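
A minimal Python/NumPy sketch of points 1 and 2, assuming a rate of 0.6 events per unit time so that counts over intervals of length 5 are Poisson with mean 3 (matching the $\lambda=3$ example). It simulates the process from exponential gaps, counts events per interval, and compares the sample pmf with bar heights for unit-width, right-closed bins $[0,1], (1,2], (2,3], \ldots$. That bin convention is what R's `hist` uses by default (`right = TRUE`, `include.lowest = TRUE`) when the breaks fall on the integers. The function names are illustrative, and this is not the code behind the plots above.

```python
import numpy as np


def poisson_process_times(rate, total_time, rng):
    """Event times of a homogeneous Poisson process on [0, total_time],
    built from i.i.d. exponential inter-arrival gaps with mean 1/rate."""
    times = []
    t = rng.exponential(1.0 / rate)
    while t <= total_time:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times)


def counts_per_interval(times, total_time, width):
    """Number of events in each consecutive, non-overlapping interval of length `width`."""
    n_intervals = int(total_time // width)
    idx = (times // width).astype(int)
    idx = idx[idx < n_intervals]  # ignore any trailing partial interval
    return np.bincount(idx, minlength=n_intervals)


def sample_pmf(counts):
    """Relative frequency of each count 0, 1, ..., max: keeps the lattice visible."""
    freq = np.bincount(counts)
    return freq / freq.sum()


def right_closed_unit_histogram(counts):
    """Bar heights for unit-width bins [0, 1], (1, 2], (2, 3], ...
    Integer counts of 0 and of 1 both fall in the first bar."""
    return np.bincount(np.maximum(counts - 1, 0))


rng = np.random.default_rng(1)
times = poisson_process_times(rate=0.6, total_time=10_000.0, rng=rng)
counts = counts_per_interval(times, total_time=10_000.0, width=5.0)
pmf = sample_pmf(counts)  # pmf[k]: share of intervals containing exactly k events
hist = right_closed_unit_histogram(counts)

print(round(counts.mean(), 2))  # near 0.6 * 5 = 3
print(pmf[:2], hist[0])  # hist[0] lumps the 0s and the 1s together
```

In the right-closed version the first bar counts both 0s and 1s, which is the kind of distortion visible in the R plot; the sample pmf keeps each integer separate.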


Since you mentioned in comments that you have a discrete distribution of increments (resulting in a discrete compound Poisson, or stuttering Poisson distribution), I'll say a few things about that case.

A few common-enough-to-be-named discrete compound Poisson distributions are the Hermite distribution [5] - where increments are either 1 or 2 - and the Geometric Poisson distribution [6] (Pólya–Aeppli distribution) - where the increments have a geometric distribution. [You could arguably add the zero-inflated Poisson, where the increments are 0 or 1, but typically the discrete distribution is assumed to take positive values.]
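
To make the discrete case concrete, here is a sketch of the pmf of a discrete compound Poisson distribution computed with the Panjer recursion (a standard technique, not something the answer itself relies on). With increments on $\{1, 2, \ldots\}$ having pmf $f$, $P(S=0)=e^{-\lambda}$ and $P(S=n)=\frac{\lambda}{n}\sum_{j=1}^{n} j\,f(j)\,P(S=n-j)$. The Hermite case is parametrised by $a_1, a_2$ (increments of 1 and 2 arriving at rates $a_1$ and $a_2$), and the geometric Poisson by $\lambda$ and a success probability $\theta$ for increments on $\{1, 2, \ldots\}$; these parametrisations and the function names are assumptions for illustration.

```python
import math


def compound_poisson_pmf(lam, severity, n_max):
    """P(S = n) for n = 0..n_max, where S is a sum of N ~ Poisson(lam) i.i.d.
    increments with pmf severity(j) on j = 1, 2, ... (Panjer recursion)."""
    f = [0.0] + [severity(j) for j in range(1, n_max + 1)]
    p = [math.exp(-lam)]  # S = 0 only when there are no increments
    for n in range(1, n_max + 1):
        s = sum(j * f[j] * p[n - j] for j in range(1, n + 1))
        p.append(lam * s / n)
    return p


def hermite_pmf(a1, a2, n_max):
    """Hermite: increments are 1 w.p. a1/(a1+a2) and 2 w.p. a2/(a1+a2)."""
    lam = a1 + a2
    return compound_poisson_pmf(lam, lambda j: {1: a1 / lam, 2: a2 / lam}.get(j, 0.0), n_max)


def geometric_poisson_pmf(lam, theta, n_max):
    """Geometric Poisson (Polya-Aeppli): increments geometric on {1, 2, ...}."""
    return compound_poisson_pmf(lam, lambda j: theta * (1.0 - theta) ** (j - 1), n_max)


p = geometric_poisson_pmf(lam=2.0, theta=0.4, n_max=60)
print(sum(p))  # very close to 1; the mass above 60 is negligible here
print(sum(n * pn for n, pn in enumerate(p)))  # close to lam / theta = 5
```

Because $P(S=n)$ only needs $f(1), \ldots, f(n)$, truncating the increment distribution at `n_max` does not affect the values returned.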

If you're able to specify a distribution for the increments, you could attempt to estimate the collection of resulting parameters via maximum likelihood (perhaps using the EM algorithm).
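
A sketch of maximum likelihood for the geometric Poisson case, assuming the data are (or can be reduced to) integer counts per observation, i.e. unit-width bins on the integers; coarser bins would need an interval-censored likelihood instead. It uses the closed-form pmf $P(S=n)=\sum_{k=1}^{n} e^{-\lambda}\frac{\lambda^k}{k!}\binom{n-1}{k-1}\theta^k(1-\theta)^{n-k}$ for $n\ge 1$, SciPy's Nelder-Mead optimiser in place of EM, and method-of-moments starting values. The names `lam`, `theta`, the helper functions and the simulated data are illustrative assumptions.

```python
import math

import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, logsumexp


def geometric_poisson_logpmf(n, lam, theta):
    """log P(S = n): S sums N ~ Poisson(lam) increments, each geometric on {1, 2, ...}."""
    if n == 0:
        return -lam  # only possible when there are no increments
    k = np.arange(1, n + 1)  # number of increments making up S = n
    terms = (
        -lam + k * math.log(lam) - gammaln(k + 1)        # Poisson(lam) pmf at k
        + gammaln(n) - gammaln(k) - gammaln(n - k + 1)   # log C(n-1, k-1)
        + k * math.log(theta) + (n - k) * math.log1p(-theta)
    )
    return logsumexp(terms)


def neg_log_likelihood(params, values, freqs):
    # Unconstrained parametrisation: params = (log lam, logit theta).
    lam = math.exp(params[0])
    theta = min(max(1.0 / (1.0 + math.exp(-params[1])), 1e-12), 1.0 - 1e-12)
    return -sum(f * geometric_poisson_logpmf(int(v), lam, theta) for v, f in zip(values, freqs))


def fit_geometric_poisson(counts):
    """ML estimates of (lam, theta) from integer counts, started at moment estimates."""
    values, freqs = np.unique(np.asarray(counts), return_counts=True)
    mean = np.average(values, weights=freqs)
    var = np.average((values - mean) ** 2, weights=freqs)
    # For these increments Var/mean = E(X^2)/E(X) = (2 - theta)/theta.
    theta0 = min(max(2.0 / (1.0 + var / mean), 0.01), 0.99)
    lam0 = mean * theta0  # since mean = lam * E(X) = lam / theta
    x0 = [math.log(lam0), math.log(theta0 / (1.0 - theta0))]
    res = minimize(neg_log_likelihood, x0, args=(values, freqs), method="Nelder-Mead")
    return math.exp(res.x[0]), 1.0 / (1.0 + math.exp(-res.x[1]))


# Illustration on simulated data (lam = 2, theta = 0.4 are arbitrary choices).
rng = np.random.default_rng(2)
n_increments = rng.poisson(2.0, size=5000)
counts = np.array([rng.geometric(0.4, size=k).sum() for k in n_increments])
lam_hat, theta_hat = fit_geometric_poisson(counts)
print(lam_hat, theta_hat)
```

The transformation to $(\log\lambda, \operatorname{logit}\theta)$ lets an unconstrained optimiser be used; a different increment model only changes the log-pmf function.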

Even if you don't have a model for the increments ($X_i$), you can still get information about them. For the compound distribution, the aggregate has mean $\lambda E(X)$ and variance $\lambda E(X^2)$, so the population variance divided by the population mean cancels down to $E(X^2)/E(X)$ (the Poisson rate $\lambda$ cancels out). There's a similar result for the third central moment, which is $\lambda E(X^3)$, so dividing by the mean gives $E(X^3)/E(X)$; fourth and higher moments are a little more complicated. This sort of information can also be used to get method-of-moments estimators, which can in turn serve as starting values for maximum likelihood estimation.
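
A quick NumPy simulation sketch checking these moment ratios for Hermite-type increments. As an illustrative assumption they are parametrised here by $\lambda$ and $q = P(X=2)$, so $E(X^2)/E(X) = (1+3q)/(1+q)$ and $E(X^3)/E(X) = (1+7q)/(1+q)$ whatever $\lambda$ is. Solving the first ratio for $q$ gives a simple method-of-moments estimator $\hat q = (D-1)/(3-D)$, where $D$ is the sample variance-to-mean ratio (valid when $1 < D < 3$), and then $\hat\lambda = \bar S/(1+\hat q)$. The function names are hypothetical.

```python
import numpy as np


def simulate_hermite(lam, q, size, rng):
    """Compound Poisson sums whose increments are 2 with probability q and 1 otherwise."""
    n = rng.poisson(lam, size=size)  # number of increments in each observation
    twos = rng.binomial(n, q)        # how many of those increments equal 2
    return n + twos                  # every increment adds 1, each 2 adds one more


def moment_ratios(s):
    """(second central moment / mean, third central moment / mean) of a sample."""
    m = s.mean()
    c = s - m
    return np.mean(c ** 2) / m, np.mean(c ** 3) / m


def hermite_moments_fit(s):
    """Method of moments: Var/mean = (1 + 3q)/(1 + q); solve for q, then lam."""
    d = s.var() / s.mean()
    q_hat = (d - 1.0) / (3.0 - d)
    lam_hat = s.mean() / (1.0 + q_hat)
    return lam_hat, q_hat


rng = np.random.default_rng(3)
q = 0.3
print((1 + 3 * q) / (1 + q), (1 + 7 * q) / (1 + q))  # theoretical ratios
for lam in (1.0, 8.0):
    s = simulate_hermite(lam, q, size=200_000, rng=rng)
    print(lam, moment_ratios(s), hermite_moments_fit(s))  # ratios barely move with lam
```

Estimates like these are crude but cheap, which is why they are handy as starting values for a likelihood fit.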

In the case of the geometric Poisson, several estimators for the parameters are listed here [7].

[1] https://en.wikipedia.org/wiki/Poisson_point_process
[2] https://en.wikipedia.org/wiki/Compound_Poisson_distribution
[3] https://en.wikipedia.org/wiki/Compound_Poisson_process
[4] https://en.wikipedia.org/wiki/Compound_Poisson_distribution#Compound_Poisson_processes
[5] https://en.wikipedia.org/wiki/Hermite_distribution
[6] https://en.wikipedia.org/wiki/Geometric_Poisson_distribution
[7] http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/pappdf.htm

Thanks so much for your detailed explanation. The data are already binned. How would the Poisson parameter be separable? - StatsNovice
There are a number of potential approaches (e.g. MLE, method of moments). What's your model for the increments? - Glen_b
I guess I have an even more basic question: even without binning the data, what is the best way to display and model data since you mention histograms aren't the best way to do it? In my case, I have a lot of data from a physical process that I suspect should follow a compound Poisson distribution with jumps in the process that are discretely distributed. I tried binning the data into a histogram, but evidently that's not a good way to do it. - StatsNovice
Do you happen to have a small subset that could be used for illustration of possible ideas (along with some idea of how much data you have)? - Glen_b