Article Title Page

Benchmarking of Marine Bunker Fuel Suppliers: The Good, The Bad, The Ugly


Author Details
Author 1 Name: Ole Jørgen Anfindsen
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway

Author 2 Name: Grunde Løvoll
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway

Author 3 Name: Thomas Mestl
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway

Corresponding author: Ole Jørgen Anfindsen
Corresponding Author’s Email: ole.jorgen.anfindsen@dnv.com

Acknowledgments (if applicable): n/a

Biographical Details (if applicable): Ole Anfindsen holds a dr. scient. degree (PhD) in computer science and a bachelor's degree
in electronics engineering. For more than 25 years he has worked with databases and related technologies. He has been senior
research scientist in Telenor R&D, visiting researcher at GTE Laboratories (Massachusetts) and Sun Microsystems Laboratories
(California), as well as adjunct associate professor at the Institute of Informatics at the University of Oslo. He currently works as a
researcher in the Research & Innovation department of DNV, where his main activity is directed towards data analysis especially in
the maritime area.
Grunde Løvoll has a dr. scient. degree (PhD) in physics. He has worked for 6 years as a postdoc and researcher at the
Department of Physics at the University of Oslo doing experimental studies on multiphase flow in porous materials, water diffusion in
dry clay and optical tweezers. Dr. Løvoll currently works as a researcher in DNV Research & Innovation, where his main focus is on
data analysis in the maritime area.
Thomas Mestl has a dr. scient. degree (PhD) in mathematics and a degree in precision engineering. He has worked in DNV's Research
Department for the last 13 years within the field of information technology. A large part of his work has been on identifying emerging
technology trends, evaluating new ICT technologies (especially with respect to mobile work and information management), and
identifying promising business opportunities offered by new technologies or by combinations of existing ones. Currently, his main activity is
directed towards data analysis, especially in the maritime area.

Structured Abstract: Purpose - This paper has two main focus areas: the construction of a realistic best practice benchmark, and
the development of a methodology for comparison of individual suppliers of marine bunker fuel. As is well known in this trade, unfair
business behaviors in the bunker fuel market are not uncommon, resulting in financial losses for the buyers.
Design/methodology/approach - Establishing a best practice will naturally involve some degree of subjectivity as there is no a
priori correct answer to this problem. Using the concept of membership functions from fuzzy set theory, a score can be derived from
a best practice benchmark histogram. The main advantages of this method are its relative independence both of sample size and of
the underlying distribution, and its computational efficiency.

Findings - Our methodology turns out to be more powerful than standard descriptive statistics, as it is less sensitive to outliers and
is well suited for small datasets and even single numbers. When applied to data for all suppliers worldwide it turns out that the
number of good suppliers is actually much lower than might be expected.

Practical implications - Bunker fuel is a major expense for ship owners, and can easily reach $30 million/year
for a single container ship. There is therefore a considerable interest in the market for benchmarking of individual
fuel suppliers. Our methodology is also applicable to other quality related fuel parameters.

Originality/value - To the best of our knowledge this is the first attempt to benchmark actors in the marine bunker
fuel industry and to quantify their behaviors.
Keywords: benchmarking, membership functions, scoring, fuzzy clustering, supplier quality, best practice

Article Classification: Technical paper








    1. Introduction
The density of marine bunker fuel can be regarded as one of its most basic parameters. It is used for
fuel quantity estimation and for calculating the specific energy content of the fuel, and it is also the
basis for the so-called Calculated Carbon Aromaticity Index (CCAI), an important indicator of ignition
quality and of deposit formation in the engine. Density is also an important factor when it comes to the
process of separating water or solids from bunker fuel.
For the typical ship operator the primary importance of density comes from the fact that bunker fuel is
delivered by volume but paid for by the ton. The conversion is done by means of the fuel density reported by
the supplier. A small difference between the stated and the actual fuel density can quickly lead to
large financial losses for the ship operator. For instance, if a density of 977 kg/m3 is stated when the
actual value happens to be 960 kg/m3, this will give rise to a difference of nearly 35 tons when
bunkering 2000 m3, the value of which, in the current market, is close to US$ 20,000, just for a single
bunkering.
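The arithmetic behind this example can be checked in a few lines. The sketch below is illustrative only: the function name is ours, and the price of US$ 580 per ton is an assumption chosen to match the order of magnitude quoted above, not a figure taken from the data.

```python
def overbilled_tons(volume_m3: float, stated_density: float, actual_density: float) -> float:
    """Mass billed minus mass delivered, in metric tons (densities in kg/m3)."""
    return volume_m3 * (stated_density - actual_density) / 1000.0


# Figures from the example in the text.
tons = overbilled_tons(volume_m3=2000.0, stated_density=977.0, actual_density=960.0)

# ASSUMPTION: a hypothetical fuel price, used only to illustrate the cost.
ASSUMED_PRICE_USD_PER_TON = 580.0
cost_usd = tons * ASSUMED_PRICE_USD_PER_TON

print(f"over-billed quantity: {tons:.1f} tons")
print(f"approximate cost: US$ {cost_usd:,.0f}")
```

A density difference of 17 kg/m3 over 2000 m3 thus corresponds to 34 tons, which the text rounds to "nearly 35".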
Although this example belongs in the high end of the spectrum, it is not at all hard to find even more
extreme examples in real life. Many fuel suppliers exploit this opportunity for a quick profit, since
their stated density is usually used to calculate the quantity of the delivered fuel.
Over-reporting of density, i.e. claiming that the fuel density is higher than it actually is, is called
short-lifting, while the opposite could be termed long-lifting. Short-lifting implies that the ship
operator loses money, since he pays for more fuel than he receives. Long-lifting implies that the fuel
supplier loses money, and that the ship operator gets more than he pays for.
The global market for marine bunker fuel is more than 300 million tons annually (IEA 2010, p. 618;
Eyring et al 2010; IMO 2009; EPA 2008). We estimate that more than 300,000 tons of bunker fuel, i.e.
about 1‰ of the global consumption, is short-lifted every year. We further estimate that the amount of
long-lifting exceeds 150,000 tons. That is, on the order of half a million tons are long- or short-lifted
annually. Thus, bunker fuel worth more than US$200 million appears not to be properly accounted for
every year.
Both short- and long-lifting may be indications of fraudulent behavior of individual employees within
the ship operator's or bunker fuel supplier's organization. Such behavior is, however, sufficiently
widespread that a systematic and commonly accepted short-lifting practice in parts of the bunker fuel
trade may be suspected. Some fuel suppliers use this tactic to consistently over-state the delivered
amount in order to improve the company's profit margin. Many ship operators and suppliers would welcome a
benchmarking of suppliers, ports, or geo-regions against some best practice.
The rest of the paper is organized as follows: In Section 2 we take a closer look at concrete examples
of different density reporting strategies and discuss the difficulties associated with single number
characteristics. In Section 3 we use this to characterize good suppliers and derive criteria for defining a
best practice. In Section 4, a Best Practice Classifier is constructed that will assign a Best Practice
Score to an individual bunkering or a supplier. We also present a series of benchmarking comparisons
between regions together with an overview of how they developed over a 10-year period. This paper
ends with a discussion and some promising leads for further work.


    2. Investigating density reporting behavior
Table 1 gives some statistics for density deviations on a global and local basis (e.g. Canada & US
West Coast, South Asia, Middle East, and South America West) and for 4 selected suppliers (S1, S2, S3,
S4) in 4 different bunker ports. The density difference, dd, is the difference between the density
claimed by the supplier and the actual density measured by a fuel testing agency (e.g. DNVPS). The
average density difference, mean(dd), could in principle be used to characterize the behavior of a fuel
supplier (a port or a region) as good, medium or bad.
Unfortunately, most such single-number quality measures have shortcomings, as they
compress a wealth of information into a single number. They often wipe out (quite effectively) much
of the information about the interesting behavior of a supplier. In addition, the arithmetic mean and the
median are less suited for distributions that are non-normal, skewed or heavy-tailed. Also,
the mean and standard deviation are very sensitive to outliers (a few unusually large or small
observations) (Bhattacharyya & Johnson 1977). As an example, the mean value of ten bad bunkerings
could easily be balanced by one exceptionally good one (or a typing error), whereas the median is less
sensitive to outliers. Another problem with the mean and median is that they reveal nothing about the
shape of the underlying distribution. For instance, if we only look at the mean, the geo-region South
America West seems to be better than e.g. Canada & US West Coast from a short-lifting perspective,
see Table 1. If we take the standard deviation into account, however, it becomes clear that there is a
higher risk of being short-lifted in South America West than in the other geo-regions, simply because the
distribution is wider. Yet the standard deviation only describes the width of the underlying distribution,
not its actual shape. As can be seen in Figure 2, the distributions are non-normal, typically showing a
highly skewed middle spike combined with a very long one-sided tail.
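The outlier argument can be made concrete with a small numerical sketch. The dd values below are hypothetical and were chosen only to illustrate the effect; they are not taken from the DNVPS data.

```python
from statistics import mean, median

# HYPOTHETICAL density differences in kg/m3 (claimed minus measured).
ten_bad = [2.0] * 10      # ten short-lifted bunkerings
one_outlier = [-20.0]     # one exceptionally "good" bunkering, or a typing error
dd = ten_bad + one_outlier

print("mean:  ", mean(dd))    # the single outlier cancels the ten bad bunkerings
print("median:", median(dd))  # the median still reflects the typical bunkering
```

Here the mean suggests perfectly fair reporting, while the median still shows the systematic short-lifting of 2 kg/m3.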



Table 1: Standard descriptive measures of density differences for some selected geo-regions and suppliers
(n = number of samples, mean(dd) = mean density difference, σdd = standard deviation of dd). Histograms for the
geo-regions and suppliers are shown in Figures 1 and 2 respectively, whereas their scatter plots are shown
in Figures 3 and 4. Data in this table and in the following examples are, unless otherwise stated, based on
DNVPS bunkering samples of RMG380 fuel collected in 2008 (cf. DNV 2010).
                               n    mean(dd)    median(dd)     σdd
                                    in kg/m3      in kg/m3
Global                     43343        0.39          0.10    3.92
Canada & US West Coast      1919        0.03         -0.10    2.43
South Asia                  6806        1.22          0.90    3.35
Middle East                 2990        1.83          0.70    4.76
South America West           565       -0.48         -0.90    6.00
Supplier 1 (S1)              129       -0.12         -0.10    0.95
Supplier 2 (S2)              239        2.31          0.90    4.84
Supplier 3 (S3)               71        2.40          2.60    1.83
Supplier 4 (S4)              145        2.07          1.50    2.81



Histograms
For a more detailed understanding of the properties of the data in Table 1 please refer to the density
difference histograms of Figures 1 and 2. For comparison we have plotted a smoothed version of the
global histogram (dashed line) and a smoothed version of the actual histogram (solid line). These
histograms represent estimates for the underlying probability density distribution and can thus tell us
something about the risk and possible amount of the short-lifting. A comparison with a reference
histogram, like the global histogram, would provide the desired benchmark.
From Figure 1 it can be seen that none of the histograms seem to come from a normal distribution (the
implications of this observation will not be further discussed in this paper). This can be confirmed by
means of a probability plot. The different geo-regions also show significant differences in their density
reporting practice. Canada & US West Coast appears better than the global average: the peak of the
histogram is centered at 0 and the tails are shorter. For South Asia, the width of the histogram is similar to
the global one, but its center is shifted towards short-lifting, whereas the Middle East shows a fairly
heavy short-lifting tail. The histogram for South America West is especially remarkable as the chance
of actually getting the fuel density stated by the supplier appears to be slim. The rule is rather that the
buyer is either short- or long-lifted, something which could not be deduced from the standard
descriptive statistics.


Figure 1: Probability distribution of density reporting deviations (i.e. the difference between claimed and
measured density) for 4 selected geo-regions. The histograms are (clockwise from top left): Canada & US
West Coast, South Asia, Middle East, and South America West Coast. The solid lines represent the
smoothed histogram while the dashed lines are the smoothed global histogram. The underlying number of
samples, averages, medians, and standard deviations are given in Table 1. The histograms reveal
considerable variation in density reporting.

Histograms for individual suppliers listed in Table 1 are shown in Figure 2 below. A visual
comparison indicates that Supplier 1 is much better than the global average with a narrow symmetric
distribution centered at 0. The three other suppliers are all heavily short-lifting with varying degrees of
right-shifted and/or right-heavy distributions. Based on these histograms the suppliers might be
characterized as rather bad, but any fine-grained information about their underlying reporting strategy
is lost in the histogram. A main disadvantage of using histograms for characterizing suppliers is
that they require a considerable amount of data, which could be a challenge when considering short
time periods or suppliers with few data samples.



Figure 2: Probability distribution of density reporting deviations (i.e. the difference between claimed and
measured density) for 4 selected suppliers in 4 different bunker ports (for more details see Table 1). The
histograms reveal different reporting behavior, but histograms become noisy when the number of samples
becomes too low.


Scatter plots
Scatter plots of measured vs. claimed density allow a much more fine-grained view of the underlying
data. These plots may be used to unravel the various reporting strategies of the suppliers, see Figure 3
and Figure 4. Scatter plots quite effectively visualize the density reporting behavior of suppliers or
groups of suppliers. Note that each dot in a scatter plot represents at least one bunkering sample. The
diagonal solid line represents correct density reporting (i.e. stated = measured, in the following called the
no-cheat line). The horizontal and vertical dashed lines specify the upper density limit given by the
ISO 8217 standard.
These scatter plots reveal some interesting features. Note that the range of densities of the
available fuel varies between geo-regions; e.g. the fuel density range is much wider in the Middle East
than in North America or South Asia. This phenomenon may be traced back to the proximity to crude
oil production in the regions.
Observe also that in many bunkerings the fuel density was above the limit (dots to the right of vertical
dashed line) but almost none of them were reported to lie above the limit (above horizontal dashed
line). This is true for all suppliers.
From Figure 4 we may deduce that Supplier 1 could be considered as rather good, since most of his
samples are on or close to the no-cheat line. This behavior seems to be dominant for most of the
suppliers in the Canada & US West geo-region (note: good suppliers are found in all geo-regions). In
contrast, Supplier 2 may be regarded as bad, since his stated densities cover the whole range from the
no-cheat line and all the way up to maximum-cheating, i.e. the upper density limit given by the
standard. This type of behavior is also visible both in the South Asia and the Middle East scatter plots.
It seems that Supplier 3 has a strategy of simply adding an offset to the real density, which is reflected
in a mean density difference different from zero and a relatively low standard deviation. A fourth reporting
scheme appears in Supplier 4, who has a tendency of always stating a density near the limit,
independently of the actual density. This could be termed the worst behavior, since such suppliers short-lift as
much as possible. This behavior is not uncommon in South Asia and the Middle East. Variations of
this scheme, i.e. stating a fixed fuel density but lower than the limit, are seen in Asia, Middle East and
South America West. They appear as horizontal lines in the scatter plot.



Figure 3: Scatter plot of measured vs. claimed density for the same geo-regions as in Table 1 and Figure 1.
Each black dot represents (at least) one bunkering. The solid line represents the no-cheat line, i.e.
bunkerings where the supplier states the density correctly (claimed = measured), whereas the dashed lines
indicate the upper density limit in the ISO standard for bunker fuel (ISO 8217), viz. 991 kg/m3, implicitly
giving the maximum possible amount of cheating. Many dots along the upper dashed line indicate a high
degree of cheating in many bunkerings. Note that in many bunkerings the fuel density was above the limit
(dots to the right of vertical dashed line) but almost none of them were reported to lie above the limit
(above horizontal dashed line).




Figure 4: Scatter plot of measured versus claimed density for the same suppliers as in Table 1 and Figure
2. Supplier 1 reports quite honestly as his dots are scattered close along the no-cheat line. In contrast,
Suppliers 2 and 3 have many reported values away from this no-cheat line but they are not as dishonest as
Supplier 4, who basically reports only one density close to 991 kg/m3 irrespective of the actual fuel density.




3. The Good: Best practice benchmark
The above discussion has emphasized the need for a good benchmark for measuring the goodness in
density reporting, and for distinguishing between various short-lifting and long-lifting strategies.
The scatter plots of Canada & US West Coast and Supplier 1 are examples of good density reporting
behaviors that could be used as best practice references. Our interpretation of good or best practice is
indicated by the grey diagonal area around the no-cheat line in Figure 5. Fair reporting and good
control of the delivered density should result in a small symmetric scatter around the no-cheat line,
and thus a narrow density difference (dd) histogram centered at dd = 0 (like the one for Supplier 1 in
Figure 2).
The goal is to establish a best practice, and then use it as a predefined reference to which bunkerings
may be compared. This best practice benchmark is given by the dd-histogram for a group of selected
good suppliers.



Figure 5: Scatter plot of bunkering data from South Asia. Data points around the diagonal line (no-cheat
line) indicate good or best practice behavior, i.e. fair reporting, with little or no cheating. In the area
above the no-cheat line, customers get short-lifted (pay too much) whereas below the line the supplier loses
money. The more dots there are above the no-cheat line, and the further away from it they are, the less
accurate the density reporting. Bunkerings far below the fair area should be considered suspicious and
may indicate a bribing situation. Reportings in the grey horizontal area (reporting densities close to the
upper density limit) indicate that some suppliers consciously choose a strategy of maximum density
cheating. A close-up of the scatter plot near the density limit of 991 kg/m3 reveals that hardly any suppliers
are willing to state that their fuel exceeds the limit even when this is clearly the case.


This best practice histogram should represent good suppliers and be based on many data points.
Any outliers, intentional cheating, or other indications of dishonesty should be eliminated to obtain an
unbiased and fair benchmark. We therefore use the following procedure for deriving the best practice
benchmark (there will always be a certain element of subjective judgment in this process, but
the method for deriving the benchmark should as far as possible be transparent, sound, and unbiased):
    1) Select some geo-regions where the scatter plots show that data are predominantly found along
        the no-cheat line.
    2) For each selected dataset:
            a. Eliminate extreme outliers, max cheating and near-limit lying; only data inside a
                 predefined area around the no-cheat line are selected (see Figure 6 for details).
            b. Eliminate any bias by centering the dd data around dd = 0.
    3) The adjusted and selected dd data for all the selected sets are then merged into one large
        dataset.
    4) Calculate the dd histogram for the dataset.
Figure 7 shows the best practice reference histogram derived from the geo-regions Biscay, Canada &
US East Coast, Canada & US West Coast, US Gulf Coast, and Oceania.
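The four steps can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the caption of Figure 6 says that the upper boundary divides the angle between the no-cheat and max-cheat lines, and the sketch assumes an exact bisection; it also assumes that centering means subtracting the mean, and that a fixed bin width of 0.5 kg/m3 is used. All function names and sample values are hypothetical.

```python
import math
from collections import Counter

DENSITY_LIMIT = 991.0  # upper density limit in kg/m3 quoted in the text (ISO 8217)
# ASSUMPTION: the upper band line bisects the angle between the no-cheat line
# (slope 1) and the max-cheat line (slope 0), which meet at (991, 991).
BAND_FACTOR = 1.0 - math.tan(math.pi / 8)


def in_band(measured, claimed):
    """Step 2a: keep samples inside the symmetric band around the no-cheat line."""
    half_width = BAND_FACTOR * (DENSITY_LIMIT - measured)
    return abs(claimed - measured) <= half_width


def centered_dd(samples):
    """Steps 2a and 2b for one dataset of (measured, claimed) pairs."""
    dd = [c - m for m, c in samples if in_band(m, c)]
    if not dd:
        return []
    offset = sum(dd) / len(dd)  # ASSUMPTION: center by subtracting the mean
    return [x - offset for x in dd]


def best_practice_histogram(datasets, bin_width=0.5):
    """Steps 3 and 4: merge all datasets, return {bin center: relative frequency}."""
    merged = [x for samples in datasets for x in centered_dd(samples)]
    counts = Counter(round(x / bin_width) for x in merged)
    return {k * bin_width: v / len(merged) for k, v in sorted(counts.items())}


# HYPOTHETICAL (measured, claimed) densities in kg/m3 for two selected regions.
region_a = [(960.0, 960.2), (970.0, 969.8), (980.0, 990.9)]  # last: near-limit claim
region_b = [(985.0, 985.5), (975.0, 974.5), (992.0, 990.0)]  # last: fuel above limit
hist = best_practice_histogram([region_a, region_b])
print(hist)
```

With this band definition the allowed deviation shrinks to zero as the measured density approaches the limit, so samples with a measured density above 991 kg/m3 are always excluded.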



Figure 6: Only bunkering samples between the 2 blue solid lines will be used as basis for deriving the best
practice benchmark histogram. This effectively eliminates max cheating, outliers, and ‘near limit effects’,
i.e. less than complete honesty when selling too heavy fuel. The upper solid line divides the angle between
the no-cheat and max-cheat lines. The lower solid line is simply the upper one mirrored around the no-cheat
line, such that the allowed density deviations are the same above and below.




Figure 7: Best practice dd histogram based on samples from selected geo-regions (Biscay, Canada
& US East and West Coast, US Gulf Coast and Oceania) where max cheating, outliers and near
limit dishonesty have been eliminated. The dashed line is the histogram function H, i.e. a
smoothed version of the histogram indicating the global best practice.



Classification by membership function
Once the best practice histogram is generated, the challenge is to benchmark a supplier, a port, or a
region against it. In principle, this histogram must be compared with the dd histograms for the
suppliers in question and the degree of conformance would then give the desired benchmark.
Unfortunately this is a non-trivial task, and for many of the suppliers only relatively few samples are
available, resulting in noisy histograms. We therefore propose a more elegant approach that is
insensitive to the number of data points and to outliers, and that can even be used for a single bunkering.
The concept of a membership function (Turksen 1991; Terano et al 1987, p. 21), which is widely
applied in fuzzy set theory (Lowen 1996; Self 1990), is used to achieve this benchmarking. A single
number (score) is computed denoting the goodness of a specific bunkering or supplier.
An example will hopefully make this clear. Consider the task of benchmarking people into fast and
slow runners, respectively. One way to do this is to set a threshold T on how fast a person should be
able to run 100 m, and then categorize the people who run slower than the threshold as slow (=0) and
those who run faster than the threshold as fast (=1). This sorting is achieved by a Boolean membership
function B with threshold T for the measured time t on 100 m, i.e. B(T,t). However, it is quite obvious
that this benchmarking will result in a crude oversimplification as there is a continuous transition from
extremely fast runners to the really slow ones, and a small change in the chosen threshold could
seriously alter the number of members in each category. A better approach would be to replace the
Boolean function with a continuous function, assigning a continuous membership value between 0 and
1 depending on how fast they run. This is an example of a so-called membership function, and will in
the following simply be denoted m.
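The difference between the two kinds of membership can be sketched as follows. The linear ramp and its end points of 11 s and 20 s are assumptions made purely for illustration, as is the threshold of 15 s; the text does not prescribe a particular shape for m.

```python
def boolean_membership(threshold_s: float, time_s: float) -> int:
    """B(T, t): 1 (fast) if the 100 m time beats the threshold T, else 0 (slow)."""
    return 1 if time_s < threshold_s else 0


def continuous_membership(time_s: float, fast_s: float = 11.0, slow_s: float = 20.0) -> float:
    """m(t): degree of membership in 'fast', falling linearly from 1 to 0.

    ASSUMPTION: the linear ramp and the 11 s / 20 s end points are hypothetical.
    """
    if time_s <= fast_s:
        return 1.0
    if time_s >= slow_s:
        return 0.0
    return (slow_s - time_s) / (slow_s - fast_s)


T = 15.0  # hypothetical threshold in seconds
for t in (14.9, 15.1):
    print(t, boolean_membership(T, t), round(continuous_membership(t), 3))
```

Two runners separated by 0.2 s end up in opposite Boolean categories, whereas their continuous membership values are almost identical.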
The situation is analogous to our best practice density benchmark where suppliers (or bunkerings) are
not grouped into crisp sets of good and bad but rather get a score indicating how close to or far away
from the best practice they are. This, by the way, is also the reason why e.g. discriminant analysis
(Hastie et al 2009) is unsuitable for the task at hand.
The challenge is to find a membership function for the good group, faithfully reflecting what we
consider to be good. Fuzzy set theory does not provide help in determining the membership function,
as all kinds of functions are used, e.g. triangular, trapezoid, Gaussian, etc. The discussion of good
behavior above gives us some hints about the properties of the desired membership function. It should
not be too wide, as a bad bunkering could then be regarded as good. Likewise, if it is too narrow then a
good bunkering would get too low a goodness score. It is important that the membership function
represents the best practice set as well as possible. The obvious choice is to derive the membership
function directly from the dd histogram itself.
The membership function for good bunkerings, mG, must have a maximum value of 1 at dd = 0, i.e.
mG(dd=0) = 1, and is continuously decreasing in both directions, i.e. a rescaling and shift of the H
histogram has to be done. We therefore propose the following definition of the membership function:
$$ m_G(dd) = \frac{H(dd)}{\max(H)} = \frac{H(dd)}{H(0)} $$
where the subscript G indicates that this gives a goodness scoring, and H is the smoothed (and
adjusted) best practice histogram (i.e. H is the histogram function). Note that mG is a function of the
distance of dd to 0, as well as the frequency of dd in the best practice. This membership function can
now be applied e.g. to all n supplier samples to obtain the overall goodness benchmark,
$$ b_G = \frac{1}{n} \sum_{i=1}^{n} m_G(dd_i) $$
where the summation is done over all n bunkerings for a specific supplier, port, or geo-region.
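A minimal sketch of these two definitions is given below. The histogram function H is represented here by a handful of made-up knots with linear interpolation between them; the real H is the smoothed best practice histogram of Figure 7, whose values and smoothing method are not reproduced here.

```python
from bisect import bisect_right

# HYPOTHETICAL smoothed best practice histogram H, given as (dd, H(dd)) knots.
# These numbers are made up for illustration only.
H_KNOTS = [(-4.0, 0.00), (-2.0, 0.05), (-1.0, 0.20), (0.0, 0.40),
           (1.0, 0.20), (2.0, 0.05), (4.0, 0.00)]


def H(dd: float) -> float:
    """Piecewise-linear interpolation of the knots; zero outside their range."""
    xs = [x for x, _ in H_KNOTS]
    if dd <= xs[0] or dd >= xs[-1]:
        return 0.0
    j = bisect_right(xs, dd)
    (x0, y0), (x1, y1) = H_KNOTS[j - 1], H_KNOTS[j]
    return y0 + (y1 - y0) * (dd - x0) / (x1 - x0)


def m_G(dd: float) -> float:
    """Goodness membership: H rescaled so that m_G(0) = 1."""
    return H(dd) / H(0.0)


def b_G(dd_samples) -> float:
    """Goodness score: mean membership over all n bunkerings."""
    return sum(m_G(dd) for dd in dd_samples) / len(dd_samples)


print(m_G(0.0))                     # a perfectly reported bunkering
print(b_G([0.0, 1.0, -1.0, 6.0]))   # a supplier with four bunkerings
print(b_G([1.0]))                   # the score is defined even for n = 1
```

Because each sample is scored individually, an extreme value such as dd = 6.0 simply contributes a membership of 0 rather than dominating the result, as it would in a mean.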
An interesting observation is that the score from the membership function mG(dd) is not (a priori) a
probabilistic measure; it is a measure between 0 and 1 based on how far a variable is from some reference
value, here dd = 0; see Figure 8. However, this rescaling does preserve an interesting probabilistic
feature: mG(dd) is the probability of finding a value in a small interval around dd, relative to that of
finding a value in an equally sized interval around 0, given that the samples are drawn from the best
practice group.



Figure 8: The solid line gives the goodness membership function, mG, which is a scaling of the best practice
histogram. mB = 1-mG gives the membership function for the opposite (dashed line), i.e. bad which in turn
could be divided into a long- and short-lifting part, mLL and mSL respectively (corresponding to negative
and positive dd values). E.g. a bunkering with dd=2.3 would get a good score of mG=0.23 and a bad score of
mB=0.77 (with mLL=0 and mSL=0.77).



The Bad
Note that mG(dd) was derived based on what was chosen to be the best practice. It therefore gives a
measure/score for how good a bunkering or supplier is with respect to this best practice. The
complement,
                                           mB(dd) = 1 - mG(dd),
gives a badness score, but it does not tell whether the bad score comes from short- or long-lifting.
Fortunately, mB can, depending on whether a sample falls into the short- or long-lifting domain, be
further divided into mSL and mLL. That is, if the dd value of a sample is positive, its mSL will be greater
than zero; if the dd value of a sample is negative, its mLL will be greater than zero.
This enables us to calculate short- and long-lifting scores similar to the goodness score:
$$ b_{xL} = \frac{1}{n} \sum_{i=1}^{n} m_{xL}(dd_i), $$

where the subscript xL should be SL or LL, which stands for short- or long-lifting, respectively. These
scores indicate the behavior of a supplier and give the risk of being short- or long-lifted. Note, by
definition:
                                         bG + bSL + bLL = 1
Remember that the scores correspond to the degree of membership, i.e. how close a bunkering is to the
good or bad benchmark; they can therefore be understood as weights corresponding to the proportion
of good or bad.
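The split of the badness score and the identity bG + bSL + bLL = 1 can be illustrated with a short sketch. The triangular m_G used here is a hypothetical stand-in for the histogram-based function; its half-width of 3 kg/m3 was chosen only so that dd = 2.3 gives roughly the values quoted in the caption of Figure 8. The sample dd values are likewise made up.

```python
def m_G(dd: float) -> float:
    """ASSUMPTION: a triangular stand-in for the histogram-based m_G."""
    return max(0.0, 1.0 - abs(dd) / 3.0)


def m_B(dd: float) -> float:
    """Badness membership, the complement of m_G."""
    return 1.0 - m_G(dd)


def m_SL(dd: float) -> float:
    """Short-lifting part of m_B: non-zero only for positive dd."""
    return m_B(dd) if dd > 0 else 0.0


def m_LL(dd: float) -> float:
    """Long-lifting part of m_B: non-zero only for negative dd."""
    return m_B(dd) if dd < 0 else 0.0


def score(m, dd_samples) -> float:
    """b = (1/n) * sum of m(dd_i) over all n bunkerings."""
    return sum(m(dd) for dd in dd_samples) / len(dd_samples)


dd_samples = [2.3, -0.4, 0.0, 5.0, -1.5]  # HYPOTHETICAL dd values in kg/m3
b_G, b_SL, b_LL = (score(m, dd_samples) for m in (m_G, m_SL, m_LL))

print(round(m_G(2.3), 2), round(m_B(2.3), 2))
print(round(b_G, 3), round(b_SL, 3), round(b_LL, 3))
print(round(b_G + b_SL + b_LL, 12))
```

The identity holds because every sample contributes mG + mSL + mLL = 1, whatever the shape of m_G.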

The Ugly
As pointed out above, profit maximization by reporting densities at or close to the upper limit may be
considered as fairly ugly behavior. The same methodology can be applied to obtain a near limit score
for this behavior by constructing a membership function
                           mNC(claimed density) = mG(claimed density - 991)
where the subscript NC denotes Near Ceiling.
This membership function assigns a score to a bunkering according to the distance of its claimed density
from the density limit and the frequency of occurrence in the benchmark. To avoid categorizing a
bunkering as ugly when the measured density is actually near the limit, we multiply mNC by mSL. In
so doing we exclude all reportings that are near the limit but that are actually honest. We propose the
following ugly or near-limit benchmark
\[
  b_{NC} = \frac{1}{n} \sum_{i=1}^{n} m_{SL}(dd_i) \cdot m_{NC}(\text{claimed density}_i)
\]
giving the fraction of short-lifting that could be considered as near-limit reporting.
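
The near-limit benchmark can be sketched in Python as below. The triangular m_good is a hypothetical stand-in for mG, dd is taken as claimed minus measured density, and the three bunkerings are invented to show one honest near-limit report, one near-limit short-lift, and one short-lift far from the limit.

```python
LIMIT = 991.0  # density limit in kg/m3, as used in mNC


def m_good(dd: float) -> float:
    """Hypothetical stand-in for mG: peak 1 at dd = 0, zero beyond +/-3 kg/m3."""
    return max(0.0, 1.0 - abs(dd) / 3.0)


def near_ceiling_score(bunkerings):
    """bunkerings: list of (claimed density, measured density) pairs; returns bNC."""
    total = 0.0
    for claimed, measured in bunkerings:
        dd = claimed - measured
        m_sl = (1.0 - m_good(dd)) if dd > 0 else 0.0   # short-lifting membership
        m_nc = m_good(claimed - LIMIT)                 # mNC(claimed) = mG(claimed - 991)
        total += m_sl * m_nc
    return total / len(bunkerings)


bunkerings = [
    (991.0, 991.0),   # near the limit, but honest: contributes 0
    (991.0, 986.0),   # near the limit and short-lifted: contributes 1
    (975.0, 970.0),   # short-lifted, but far from the limit: contributes 0
]
b_nc = near_ceiling_score(bunkerings)
print(round(b_nc, 2))
```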


Further characterization of Good and Bad
In order to further characterize bunkering samples within the good-, short-, or long-lifting region in the
scatter plot, the average density deviations in each region could be computed by weighting each
bunkering sample with the corresponding score from the membership function. For instance, the mean
density difference ($\overline{dd}_{SL}$) in the short-lifting area is:

\[
  \overline{dd}_{SL} = \frac{\sum_{i} dd_i \cdot m_{SL}(dd_i)}{\sum_{i} m_{SL}(dd_i)}
\]

in kg/m3, where the index i runs over all n samples.
This means that for a given supplier we can provide information about the risk of being short-lifted, bSL,
and about the expected average density difference, $\overline{dd}_{SL}$. The method is easily extended to
the other identified behaviors.
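
As an illustration, the membership-weighted mean can be computed as follows. The function m_good is a hypothetical triangular stand-in for mG, and returning None when no sample has short-lifting weight is a design choice of this sketch, not something specified in the paper.

```python
def m_good(dd: float) -> float:
    """Hypothetical stand-in for mG: peak 1 at dd = 0, zero beyond +/-3 kg/m3."""
    return max(0.0, 1.0 - abs(dd) / 3.0)


def mean_dd_short_lifting(dds):
    """Weighted mean of dd, using mSL(dd) as the weight of each sample."""
    weights = [(1.0 - m_good(dd)) if dd > 0 else 0.0 for dd in dds]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return None   # no sample carries any short-lifting weight
    return sum(dd * w for dd, w in zip(dds, weights)) / total_weight


sample_dds = [0.6, 2.3, 4.0, -1.5]   # illustrative values in kg/m3
print(mean_dd_short_lifting(sample_dds))
```

Samples with negative dd get zero weight here, so they do not pull the short-lifting mean towards zero.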


    4. Application of the benchmarks
As discussed above the power of the scatter plot lies in the visualization of the different density
reporting schemes. Several patterns, like fixed value density reporting, systematic density reporting
deviations, etc., are easily spotted. The benchmarks developed above are constructed to discriminate
between some of these different reporting schemes, and to quantify the risk of being short-lifted as
well as the amount of short-lifting that should be expected. The benchmarks for our examples from
Table 1 are given in Table 2 below.

Table 2: Standard descriptive measures together with our benchmark(s) for the geo-regions and suppliers
from Table 1. The benchmarks for the data that were used to generate the best practice histogram are also
included for comparison. A row, e.g. Global, is read as follows: average density difference is 0.39, std=3.92.
Benchmarking against the best practice gives the following results: 43% of the samples can be regarded as
good (bG), 31% qualify as short-lifting (bSL), and 26% as long-lifting (bLL). For the short-lifting samples the
average density difference is 3.31, but only 7% of them were near the ceiling.
                          mean dd    σdd     bG      bSL     bLL     bNC     mean ddSL
                          (kg/m3)                                            (kg/m3)
Best Practice              0.05      1.16    0.62    0.19    0.19    0.01    1.50
Global                     0.39      3.92    0.43    0.31    0.26    0.07    3.31
Canada & US West Coast     0.03      2.43    0.55    0.22    0.24    0.02    2.09
South Asia                 1.22      3.35    0.41    0.52    0.07    0.26    2.44
Middle East                1.83      4.76    0.32    0.49    0.19    0.02    4.61
South America West         0.48      6.00    0.08    0.42    0.50    0.00    3.73
Supplier 1                 0.12      0.95    0.71    0.09    0.20    0.02    1.70
Supplier 2                 2.31      4.84    0.36    0.53    0.11    0.13    4.65
Supplier 3                 2.40      1.83    0.09    0.87    0.03    0.00    2.81
Supplier 4                 2.07      2.81    0.27    0.72    0.01    0.46    2.64
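
As a quick consistency check on Table 2, the identity bG + bSL + bLL = 1 should hold for every row up to the rounding of the printed two-decimal values. A small Python sketch (the dictionary name table2 and the tolerance are illustrative choices; the numbers are copied from the table):

```python
# (bG, bSL, bLL) for each row of Table 2
table2 = {
    "Best Practice": (0.62, 0.19, 0.19),
    "Global": (0.43, 0.31, 0.26),
    "Canada & US West Coast": (0.55, 0.22, 0.24),
    "South Asia": (0.41, 0.52, 0.07),
    "Middle East": (0.32, 0.49, 0.19),
    "South America West": (0.08, 0.42, 0.50),
    "Supplier 1": (0.71, 0.09, 0.20),
    "Supplier 2": (0.36, 0.53, 0.11),
    "Supplier 3": (0.09, 0.87, 0.03),
    "Supplier 4": (0.27, 0.72, 0.01),
}

for name, row in table2.items():
    print(f"{name:24s} sum = {sum(row):.2f}")

# Largest deviation from 1 over all rows; rounding alone allows about 0.015
max_gap = max(abs(sum(row) - 1.0) for row in table2.values())
print(f"largest deviation from 1: {max_gap:.2f}")
```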



The samples used to generate the best practice histogram were included in the table for easy
comparison. Note that the good score can only be 1 when all samples are at dd=0; this
explains why even the good score of the best practice is ‘only’ 0.62. The table shows that for the
selected geo-regions the highest risk of being short-lifted is found in South Asia (bSL = 0.52). The
near-limit benchmark, bNC, confirms what is apparent from the scatter plot (Figure 3), that for many
suppliers it is common practice to maximize their profit by simply reporting a fuel density at or near
the limit.
South America West nicely illustrates the ability of the benchmark to identify the underlying
behavior. Recall that for this area the mean was near zero, but the high standard deviation suggested
large fluctuations in the reporting. Even so, nothing about the underlying reporting schemes,
or the risk of being short- or long-lifted, can be deduced from these two numbers. In contrast, our
benchmark reveals that the likelihood of actually getting what you paid for is rather slim, viz. around
8% (bG = 0.08). In the vast majority of cases either short- or long-lifting takes place.
Observe also that Supplier 1 can indeed be regarded as honest, with a good score (0.71) higher than
that of best practice (0.62). Suppliers 2 and 3 have comparable average density differences (2.31 and
2.40), but their good and near-limit benchmarks clearly separate them. A comparison of the benchmarks
with the corresponding scatter plots will confirm that the benchmarks do indeed give a more accurate
description of the honesty of suppliers than standard descriptive statistics.



Figure 9: Comparison of different benchmarking methods: suppliers ranked by their mean density
difference, $\overline{dd}$ (top), and their corresponding good score, bG (bottom). Observe that ranking with
respect to the mean would result in about 1057 good suppliers ($|\overline{dd}| \le 0.7$). Our scoring with
respect to best practice (0.62) reveals, however, that about 150 of them are definitely bad (left-hatched area),
scoring even below the global average (0.43). 539 are really good (equal to or better than best practice,
right-hatched area), whereas the rest are located between global average and best practice. Observe also that
simply relying on the mean to characterize suppliers would label several of them as bad even though their good
score is above best practice.



Supplier ranking
In Figure 9 (top) all suppliers of RMG380 fuel worldwide are ranked with respect to their mean
density difference, $\overline{dd}$. Using $|\overline{dd}| \le 0.7$ as the criterion for goodness, the mean
would imply that there are about 1057 good suppliers. Applying this mean $\overline{dd}$ to our benchmarking
method results in the continuous bell-shaped curve (blue). If $\overline{dd}$ were indeed an unbiased measure of
the goodness of suppliers, their scores would be closely scattered around this curve; this is, however,
not at all the case. This discrepancy stems from the unreliability of the mean (or standard deviation)
as a measure whenever the underlying distributions are non-normal or outliers have a large
effect. The figure clearly shows that 150 of the apparently good suppliers are actually quite bad, i.e.
even below global average (left-hatched area), whereas only about half (539) can be considered
equal to or better than best practice (right-hatched area). Observe also that many of the apparently bad
suppliers (those with $|\overline{dd}| > 0.7$) are actually better than their reputation, as most of them are
above the bell-shaped curve and some are even above best practice, further emphasizing the need for an
unbiased score like bG.
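
The contrast between ranking by the mean and ranking by the good score can be reproduced on toy data. In this Python sketch the three suppliers and their dd values are invented, and m_good is a hypothetical triangular stand-in for mG; only the 0.7 kg/m3 criterion is taken from the text.

```python
from statistics import mean


def m_good(dd: float) -> float:
    """Hypothetical stand-in for mG: peak 1 at dd = 0, zero beyond +/-3 kg/m3."""
    return max(0.0, 1.0 - abs(dd) / 3.0)


# Invented density differences (kg/m3) for three hypothetical suppliers
suppliers = {
    "A": [0.1, -0.2, 0.3, -0.1],   # consistently close to zero
    "B": [5.0, -5.0, 4.5, -4.5],   # large deviations that cancel in the mean
    "C": [1.0, 1.2, 0.9, 1.1],     # small but systematic positive deviation
}

rows = []
for name, dds in suppliers.items():
    rows.append((name, mean(dds), mean(m_good(dd) for dd in dds)))

# "Good" according to the mean criterion |mean dd| <= 0.7
good_by_mean = [name for name, avg, _ in rows if abs(avg) <= 0.7]

# Ranking according to the good score bG, best first
ranked_by_score = [name for name, _, _ in sorted(rows, key=lambda r: r[2], reverse=True)]

print("good by mean:", good_by_mean)
print("ranked by bG:", ranked_by_score)
```

Supplier B passes the mean criterion although none of its bunkerings is close to dd = 0, while supplier C fails it despite a clearly better good score.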

Development over time
Following the development of the score of a supplier, port, or region over time may give valuable
indications about what may be expected in the near future. For instance, Figure 10 shows the
development of the bG score for two major ports, Singapore and Rotterdam, over the past 25 years.




Figure 10: Time series of goodness scores bG for two large ports in different geo-regions. Data from all
available suppliers are included. Dots are quarterly time intervals while the stippled lines are year
averages. Each dot is based on a varying number of ‘raw data points’, i.e. the number of bunkerings
during the corresponding time interval.

Observe that from the beginning of the 1980s and up to the mid 1990s the quality of the density
reporting was increasing. It then leveled off until 2008, when a change in behavior occurred – perhaps
triggered by the onset of the global recession?
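
A time series of this kind can be produced by grouping bunkerings by quarter and computing bG per group. The Python sketch below assumes each record is a (date, dd) pair; the record layout, the function names, and the triangular m_good stand-in for mG are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def m_good(dd: float) -> float:
    """Hypothetical stand-in for mG: peak 1 at dd = 0, zero beyond +/-3 kg/m3."""
    return max(0.0, 1.0 - abs(dd) / 3.0)


def quarterly_good_scores(records):
    """records: iterable of (date, dd) pairs; returns {(year, quarter): bG}."""
    buckets = defaultdict(list)
    for day, dd in records:
        quarter = (day.month - 1) // 3 + 1
        buckets[(day.year, quarter)].append(m_good(dd))
    # One score per quarter, in chronological order
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


records = [                      # invented bunkerings
    (date(2008, 1, 15), 0.3),
    (date(2008, 2, 20), -0.6),
    (date(2008, 5, 3), 2.3),
    (date(2008, 11, 9), 4.0),
]
for (year, quarter), score in quarterly_good_scores(records).items():
    print(f"{year} Q{quarter}: bG = {score:.2f}")
```

Quarters without any bunkering simply do not appear in the result, which matches the varying number of raw data points behind each dot.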

    5. Discussion and concluding remarks
This paper has two main focus areas: the construction of a realistic benchmark and the development of
a methodology that allows comparing one or more samples with the benchmark.
The examples given above demonstrate the capabilities of our approach. It is more powerful than
standard descriptive statistics (e.g. $\overline{dd}$ and σdd), as it is less sensitive to outliers and is
well suited for small datasets and even single numbers. Recall that our benchmarks give better
quantifications than $\overline{dd}$ and σdd together. Further, the approach makes no assumptions about the
probability distribution of the underlying data; any distribution is allowed.
Only some weak requirements apply to the membership function (e.g. increasing/decreasing).
The methodology is quite generic and could in principle be applied to any kind of comparison task, i.e.
benchmarking.
The fact that the benchmark is based on a probability density function, and that a probabilistic
interpretation of the scoring is possible, is an aid to the user’s intuition, making it easier to understand
and interpret the results.
Once a best practice histogram has been generated, a membership function can be derived, after which
benchmarking is easily done. Subjectivity is only involved in the definition of what can be regarded as
best practice, as there is no a priori correct answer to this problem. Our approach has been to ask:
what should be expected of a good supplier? And by answering this question we have picked suppliers
that best match our expectations. Outliers and incorrect claims near the density limit are of course not
wanted from a good supplier, hence their removal from the best practice data set.
From a user perspective the main strengths of the presented benchmark are:
    • Intuitive and easy to understand.
    • Applicable to few or even single samples.
    • Able to pinpoint different density reporting schemes.
In closing let us return to the extent and amount of global short-lifting which is estimated to be around
1.7 ton per bunkering on average. Thanks to our benchmarking methodology we can now provide a
more detailed picture of the situation. First, 43% of the bunkerings could be considered to be loss
neutral (bG=0.43), since they are within best practice. Second, 26% are instances of long-lifting
(bLL=0.26), where the buyer gains on average 1.8 ton. Third, 31% could be regarded as short-lifting
(bSL=0.31), with an average buyer loss of 2.5 ton per bunkering. This highlights the importance of
choosing the right supplier.
The presented benchmark methodology is easily extendable to other (quality and economical)
bunkering parameters like viscosity, sulfur or water content, as well as a series of physical and
chemical properties. The methodology will be the basis for a benchmarking web tool, scheduled for
release by DNVPS later this year.




Figure 11: Bunker surveyor on board a ship. Photo by DNV Petroleum Services (used with
permission).




References

Bhattacharyya, G., Johnson, R. (1977), Statistical Concepts and Methods, Wiley, New York.
DNV (2010). Total fuel management,
http://www.dnv.com/industry/maritime/servicessolutions/fueltesting (accessed 13. Oct. 2010).
EPA (2008), Global Trade and Fuels Assessment -Future Trends and Effects of Requiring Clean Fuels
in the Marine Sector. Assessment and Standards Division Office of Transportation and Air Quality,
U.S. Environmental Protection Agency. EPA420-R-08-021, November 2008.
Eyring, V., Isaksen, I.S.A., Berntsen, T., Collins, W.J., Corbett, J.J., Endresen, O., Grainger, R.G.,
Moldanova, J., Schlager, H., Stevenson, D.S. (2010), “Transport impacts on atmosphere and climate:
Shipping”, Atmospheric Environment, Volume 44, Issue 37, December 2010, pp. 4735-4771.
Hastie, T., Tibshirani, R., Friedman, J. (2009), The Elements of Statistical Learning: Data Mining,
Inference, and Prediction (second edition). Springer, New York.
IEA (2010). World Energy Outlook 2010. International Energy Agency, OECD Publishing, Paris.
IMO (2009). Prevention of Air Pollution from Ships. International Maritime Organization, Marine
Environment Protection Committee. MEPC 59/INF.10, 9 April 2009.
Lowen, R. (1996), Fuzzy Set Theory, Kluwer Academic Publishers, Dordrecht.
Self, K. (1990), “Designing with fuzzy logic”, IEEE Spectrum, Vol 27, No 11, November 1990, pp.
42-44, p. 105.
Terano, T., Asai, K., Sugeno, M. (1987), Fuzzy Systems Theory and its Applications. Academic Press,
San Diego.
Turksen, I.B. (1991), “Measurement of membership functions and their acquisition”, Fuzzy Sets and
Systems, Vol. 40, pp. 5-38.




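Figure 5
[Image: schematic scatter plot of claimed density (vertical axis, 961 to 991) against measured density (horizontal axis, 961 to 991). The limit at 991 is marked on both axes; regions are labeled Good, Bad, Suspicious, and max. cheat area.]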
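Figure 6
[Image: schematic with the lines labeled "Limit = max. cheat line" and "no cheat line", a second line labeled "Limit", and the markers "=", "+" and "-".]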
Figure 7
[Image: histogram with axes labeled Probability and Density deviations.]
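Figure 8
[Image: membership functions mG and mB = 1 - mG (peak value 1) plotted against density difference, with the long-lifting side to the left of 0 and the short-lifting side to the right. Marked example at dd = 2.3: Good: mG = 0.23; Bad: mB = 1 - 0.23 = 0.77.]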
Figure 9
[Image, top panel: claimed – measured density (vertical axis, -5 to 10) against total number of suppliers (horizontal axis, 0 to 2500), with the band from -0.7 to 0.7 marked as ca. 1057 suppliers. Bottom panel: good score (vertical axis, 0 to 1) against number of suppliers (0 to 2500), with the best practice score and the global average score marked, and the counts 539 and 150 indicated. Annotations: Some "bad suppliers" are actually very good; Some "bad suppliers" are actually slightly better; Many "good suppliers" are actually quite bad.]

6.benchmarking of

  • 1. Article Title Page Benchmarking of Marine Bunker Fuel Suppliers: The Good, The Bad, The Ugly Author Details Author 1 Name: Ole Jørgen Anfindsen University/Institution: DNV Research & Innovation Town/City: Høvik Country: Norway Author 2 Name: Grunde Løvoll University/Institution: DNV Research & Innovation Town/City: Høvik Country: Norway Author 3 Name: Thomas Mestl University/Institution: DNV Research & Innovation Town/City: Høvik Country: Norway Corresponding author: Ole Jørgen Anfindsen Corresponding Author’s Email: ole.jorgen.anfindsen@dnv.com Acknowledgments (if applicable): n/a Biographical Details (if applicable): Ole Anfindsen holds a dr. scient. degree (PhD) in computer science and a bachelors degree in electronics engineering. For more than 25 years he has worked with databases and related technologies. He has been senior research scientist in Telenor R&D, visiting researcher at GTE Laboratories (Massachusetts) and Sun Microsystems Laboratories (California), as well as adjunct associate professor at the Institute of Informatics at the University of Oslo. He currently works as a researcher in the Research & Innovation department of DNV, where his main activity is directed towards data analysis especially in the maritime area. G. Løvoll has a dr. scient. degree (PhD) in physics. Grunde has worked for 6 years as a Post Doc and researcher at the Department of Physics at the University of Oslo doing experimental studies on multiphase flow in porous materials, water diffusion in dry clay and optical tweezers. Dr. Løvoll currently works as a researcher in DNV Research & Innovation, where his main focus is on data analysis in the maritime area. Thomas Mestl has a Dr. Scient. (PhD) in mathematics and a degree in precisions engineering. He has worked in DNV's Research Department for the last 13 years within the field of information technology. 
A large part of his work has been on identifying emerging technology trends, evaluating new ICT technologies (especially with respect to mobile work and information management), and to identify promising business opportunities offered by new or combination of existing technologies. Currently, his main activity is directed towards data analysis especially in the maritime area. Structured Abstract: Purpose - This paper has two main focus areas; the construction of a realistic best practice benchmark, and the development of a methodology for comparison of individual suppliers of marine bunker fuel. As is well-known in this trade, unfair business behaviors in the bunker fuel market are not uncommon, resulting in financial losses for the buyers. Design/methodology/approach - Establishing a best practice will naturally involve some degree of subjectivity as there is not a priori correct answer to this problem. Using the concept of membership functions from fuzzy set theory, a score can be derived from a best practice benchmark histogram. The main advantages of this method are its relative independence both of sample size and of the underlying distribution, as well as being computationally very efficient. Findings - Our methodology turns out to be more powerful than standard descriptive statistics, as it is less sensitive to outliers and is well suited for small datasets and even single numbers. When applied to data for all suppliers worldwide it turns out that the number of good suppliers is actually much lower than might be expected. Practical implications - Bunker fuel is a major expense for ship owners, and can easily reach $30 million/year for a single container ship. There is therefore a considerable interest in the market for benchmarking of individual fuel suppliers. Our methodology is also applicable to other quality related fuel parameters. 
Originality/value - To the best of our knowledge this is the first attempt to benchmark actors in the marine bunker fuel industry and to quantify their behaviors. Keywords: benchmarking, membership functions, scoring, fuzzy clustering, supplier quality, best practice
  • 2. Article Classification: Technical paper
  • 3. Benchmarking of Marine Bunker Fuel Suppliers: The Good, The Bad, The Ugly
Abstract
Purpose - This paper has two main focus areas: the construction of a realistic best practice benchmark, and the development of a methodology for comparison of individual suppliers of marine bunker fuel. As is well known in this trade, unfair business behaviors in the bunker fuel market are not uncommon, resulting in financial losses for the buyers.
Design/methodology/approach - Establishing a best practice will naturally involve some degree of subjectivity as there is no a priori correct answer to this problem. Using the concept of membership functions from fuzzy set theory, a score can be derived from a best practice benchmark histogram. The main advantages of this method are its relative independence both of sample size and of the underlying distribution, as well as its computational efficiency.
Findings - Our methodology turns out to be more powerful than standard descriptive statistics, as it is less sensitive to outliers and is well suited for small datasets and even single numbers. When applied to data for all suppliers worldwide it turns out that the number of good suppliers is actually much lower than might be expected.
Practical implications - Bunker fuel is a major expense for ship owners, and can easily reach $30 million/year for a single container ship. There is therefore considerable interest in the market for benchmarking of individual fuel suppliers. Our methodology is also applicable to other quality-related fuel parameters.
Originality/value - To the best of our knowledge this is the first attempt to benchmark actors in the marine bunker fuel industry and to quantify their behaviors.
Keywords: benchmarking, membership functions, scoring, fuzzy clustering, supplier quality, best practice
Category: Technical Paper
1. Introduction
The density of marine bunker fuel can be regarded as one of its most basic parameters.
It is used for fuel quantity estimation, is the basis for the so-called Calculated Carbon Aromaticity Index (CCAI), an important factor for ignition and for deposits in the engine, and is used for calculating the specific energy content of the fuel. Density is also an important factor when it comes to the process of separating water or solids from bunker fuel. For the typical ship operator the primary importance of density comes from the fact that bunker fuel is delivered by volume but paid for per ton. The conversion is done by means of the fuel density reported by the supplier. A small difference between stated and actual fuel density can quickly lead to large financial losses for the ship operator. For instance, if a density of 977 kg/m3 is stated when the actual value happens to be 960 kg/m3, this will give rise to a difference of nearly 35 tons when bunkering 2000 m3 (2000 m3 × 17 kg/m3 = 34 tons), the value of which, in the current market, is close to US$ 20,000 for a single bunkering.
  • 4. Although this example belongs in the high end of the spectrum, it is not at all hard to find even more extreme examples in real life. Many fuel suppliers exploit this opportunity, since it is usually their stated density that is used to calculate the quantity of fuel delivered. Over-reporting of density, i.e. claiming that the fuel density is higher than what is actually the case, is called short-lifting, while the opposite could be termed long-lifting. Short-lifting implies that the ship operator loses money, since he pays for more fuel than he receives. Long-lifting implies that the fuel supplier loses money, and that the ship operator gets more than what he pays for. The global market for marine bunker fuel is more than 300 million tons annually (IEA 2010, p. 618; Eyring et al 2010; IMO 2009; EPA 2008). We estimate that more than 300,000 tons of bunker fuel, i.e. about 1‰ of the global consumption, is short-lifted every year. We further estimate that the amount of long-lifting exceeds 150,000 tons. That is, on the order of half a million tons are long- or short-lifted annually. Thus, bunker fuel worth more than US$200 million appears not to be properly accounted for every year. Both short- and long-lifting may be indications of fraudulent behavior of individual employees within the ship operator’s or bunker fuel supplier’s organization. Such behavior is, however, sufficiently widespread that a systematic and commonly accepted short-lifting practice in parts of the bunker fuel trade may be suspected. Some fuel suppliers use this tactic to consistently over-state the delivered amount to improve the company’s profit margin. Many ship operators and suppliers would welcome a benchmarking of suppliers, ports, or geo-regions against some best practice.
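As a rough check of the 2000 m3 bunkering example in the introduction, the arithmetic can be sketched as follows. The function name `mass_tonnes` and the variable names are illustrative only, and the per-ton price is merely the value implied by the US$ 20,000 figure quoted in the text, not a market quote.

```python
def mass_tonnes(volume_m3: float, density_kg_m3: float) -> float:
    """Mass in metric tons of a fuel volume at a given density."""
    return volume_m3 * density_kg_m3 / 1000.0


volume = 2000.0   # bunkered volume in m3 (example from the text)
stated = 977.0    # density claimed by the supplier, kg/m3
actual = 960.0    # density measured by the testing agency, kg/m3

billed = mass_tonnes(volume, stated)      # tons the operator pays for
delivered = mass_tonnes(volume, actual)   # tons actually received
short_lifted = billed - delivered         # tons paid for but not received

# Price per ton implied by the text's "close to US$ 20,000" (an assumption,
# used here only to show how the loss scales with the density difference).
implied_price = 20000.0 / short_lifted

print(f"billed {billed:.0f} t, delivered {delivered:.0f} t, "
      f"short-lifted {short_lifted:.0f} t")
```

The loss scales linearly with both the bunkered volume and the density difference dd, which is why even a difference of a few kg/m3 matters at typical bunkering volumes.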
The rest of the paper is organized as follows: In Section 2 we take a closer look at concrete examples of different density reporting strategies and discuss the difficulties associated with single-number characteristics. In Section 3 we use this to characterize good suppliers and derive criteria for defining a best practice. We then construct a Best Practice Classifier that assigns a Best Practice Score to an individual bunkering or a supplier. In Section 4 we present a series of benchmarking comparisons between regions together with an overview of how they developed over a 10-year period. The paper ends with a discussion and some promising leads for further work.
2. Investigating density reporting behavior
Table 1 gives some statistics for density deviations on a global and local basis (e.g. Canada and the US West Coast, South Asia, Middle East, and South America West) and for 4 selected suppliers (S1, S2, S3, S4) in 4 different bunker ports. The density difference, dd, is the difference between the density claimed by the supplier and the actual density measured by a fuel testing agency (e.g. DNVPS). The average density difference, $\overline{dd}$, could in principle be used to characterize the behavior of a fuel supplier (a port or a region) as good, medium or bad. Unfortunately, most such single-number quality measures have some sort of shortcoming, as they compress a wealth of information into a single number and thereby wipe out much of the information about the interesting behavior of a supplier. In addition, the arithmetic mean or median may be less suited for distributions that are non-normal, skewed or heavy-tailed. Also, the mean and standard deviation are very sensitive to outliers (a few unusually large or small observations) (Bhattacharyya & Johnson 1977).
As an example, the mean value of ten bad bunkerings could easily be balanced by one exceptionally good one (or a typing error), while the median is less sensitive to outliers. Another problem with the mean and median is that they reveal nothing about the shape of the underlying distribution. For instance, if we only look at the mean, the geo-region South America West seems to be better than e.g. Canada & US West Coast from a short-lifting perspective, see Table 1. If we take the standard deviation into account it is obvious that there is a higher risk of being short-lifted in South America West than in the other geo-regions, simply because the distribution is wider. The standard deviation, however, only describes the width of the underlying distribution, not its actual shape. As can be seen in Figure 2 the distributions are non-normal: they show a highly skewed middle spike combined with a very long one-sided tail.
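The outlier sensitivity described above (ten bad bunkerings balanced by a single extreme value) can be illustrated with a minimal sketch. The dd values below are hypothetical and chosen only for illustration; they are not DNVPS data.

```python
from statistics import mean, median

# Hypothetical density differences in kg/m3 (illustrative, not measured data):
# ten bunkerings that are each short-lifted by 2 kg/m3 ...
bad_bunkerings = [2.0] * 10
# ... plus one extreme negative value, e.g. a typing error in the report.
outlier = [-20.0]

sample = bad_bunkerings + outlier

sample_mean = mean(sample)      # pulled to zero by the single outlier
sample_median = median(sample)  # still reflects the typical bunkering

print(f"mean = {sample_mean:.2f} kg/m3, median = {sample_median:.2f} kg/m3")
```

Judged by the mean alone this hypothetical supplier would look perfectly fair, although ten of its eleven bunkerings were short-lifted.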
  • 5. Table 1: Standard descriptive measures of density differences for some selected geo-regions and suppliers (n = number of samples, $\overline{dd}$ = mean density difference, σdd = standard deviation of dd; all density values in kg/m3). Histograms for the geo-regions and suppliers are shown in Figures 1 and 2 respectively, whereas their scatter plots are shown in Figures 3 and 4. Data in this table and in the following examples are, unless otherwise stated, based on DNVPS bunkering samples of RMG380 fuel collected in 2008 (cf. DNV 2010).

                              n     mean dd   median(dd)   σdd
    Global                  43343     0.39       0.10      3.92
    Canada & US West Coast   1919     0.03      -0.10      2.43
    South Asia               6806     1.22       0.90      3.35
    Middle East              2990     1.83       0.70      4.76
    South America West        565    -0.48      -0.90      6.00
    Supplier 1 (S1)           129    -0.12      -0.10      0.95
    Supplier 2 (S2)           239     2.31       0.90      4.84
    Supplier 3 (S3)            71     2.40       2.60      1.83
    Supplier 4 (S4)           145     2.07       1.50      2.81

Histograms
For a more detailed understanding of the properties of the data in Table 1 please refer to the density difference histograms of Figures 1 and 2. For comparison we have plotted a smoothed version of the global histogram (dashed line) and a smoothed version of the actual histogram (solid line). These histograms represent estimates of the underlying probability density distribution and can thus tell us something about the risk and possible amount of short-lifting. A comparison with a reference histogram, like the global histogram, would provide the desired benchmark. From Figure 1 it can be seen that none of the histograms seem to come from a normal distribution (the implications of this observation will not be further discussed in this paper). This can be confirmed by means of a probability plot. The different geo-regions also show significant differences in their density reporting practice. Canada & US West Coast appears better than the global average: the peak of the histogram is centered at 0 and the tails are shorter.
For South Asia, the width of the histogram is similar to the global one, but its center is shifted towards short-lifting, whereas the Middle East shows a fairly heavy short-lifting tail. The histogram for South America West is especially remarkable as the chance of actually getting the fuel density stated by the supplier appears to be slim. The rule is rather that the buyer is either short- or long-lifted, something which could not be deduced from the standard descriptive statistics.
Figure 1: Probability distribution of density reporting deviations (i.e. the difference between claimed and measured density) for 4 selected geo-regions. The histograms are (clockwise from top left): Canada & US West Coast, South Asia, Middle East, and South America West Coast. The solid lines represent the smoothed histogram while the dashed lines are the smoothed global histogram. The underlying number of samples, averages, medians, and standard deviations are given in Table 1. The histograms reveal considerable variation in density reporting.
Histograms for individual suppliers listed in Table 1 are shown in Figure 2 below. A visual comparison indicates that Supplier 1 is much better than the global average with a narrow symmetric distribution centered at 0. The three other suppliers are all heavily short-lifting with varying degrees of right-shifted and/or right-heavy distributions. Based on these histograms the suppliers might be characterized as rather bad, but any fine-grained information about their underlying reporting strategy is removed by the histogram. A main disadvantage of using histograms for characterizing suppliers is that they require a considerable amount of data, which could be a challenge when considering short time periods or suppliers with few data samples.
  • 6. Figure 2: Probability distribution of density reporting deviations (i.e. the difference between claimed and measured density) for 4 selected suppliers in 4 different bunker ports (for more details see Table 1). The histograms reveal different reporting behavior, but histograms become noisy when the number of samples becomes too low.
Scatter plots
Scatter plots of measured vs. claimed density allow a much more fine-grained view of the underlying data. These plots may be used to unravel the various reporting strategies of the suppliers, see Figure 3 and Figure 4. Scatter plots quite effectively visualize the density reporting behavior of suppliers or groups of suppliers. Note that each dot in a scatter plot represents at least one bunkering sample. The diagonal solid line represents correct density reporting (i.e. stated = measured, in the following called the no-cheat line). The horizontal and vertical dashed lines specify the upper density limit given by the ISO 8217 standard. These scatter plots reveal some interesting features. Note that the range of densities of the available fuel varies between geo-regions; e.g. the fuel density range is much wider in the Middle East than in North America or South Asia. This phenomenon may be traced back to the proximity to crude oil production in the regions. Observe also that in many bunkerings the fuel density was above the limit (dots to the right of the vertical dashed line) but almost none of them were reported to lie above the limit (above the horizontal dashed line). This is true for all suppliers. From Figure 4 we may deduce that Supplier 1 could be considered as rather good, since most of his samples are on or close to the no-cheat line. This behavior seems to be dominant for most of the suppliers in the Canada & US West geo-region (note: good suppliers are found in all geo-regions).
In contrast, Supplier 2 may be regarded as bad, since his stated densities cover the whole range from the no-cheat line all the way up to maximum cheating, i.e. the upper density limit given by the standard. This type of behavior is also visible in both the South Asia and the Middle East scatter plots. It seems that Supplier 3 has a strategy of simply adding an offset to the real density, which is reflected in a mean density difference different from zero and a relatively low standard deviation. A fourth reporting scheme appears in Supplier 4, who has a tendency of always stating a density near the limit, independently of the actual density. This could be termed the worst behavior, since such suppliers short-lift as much as possible. This behavior is not uncommon in South Asia and the Middle East. Variations of this scheme, i.e. stating a fixed fuel density that is lower than the limit, are seen in Asia, Middle East and South America West. They appear as horizontal lines in the scatter plot.
Figure 3: Scatter plot of measured vs. claimed density for the same geo-regions as in Table 1 and Figure 1. Each black dot represents (at least) one bunkering. The solid line represents the no-cheat line, i.e. bunkerings where the supplier states the density correctly (claimed = measured), whereas the dashed lines indicate the upper density limit in the ISO standard for bunker fuel (ISO 8217), viz. 991 kg/m3, implicitly giving the maximum possible amount of cheating. Many dots along the upper dashed line indicate a high degree of cheating in many bunkerings. Note that in many bunkerings the fuel density was above the limit (dots to the right of the vertical dashed line) but almost none of them were reported to lie above the limit (above the horizontal dashed line).
Figure 4: Scatter plot of measured versus claimed density for the same suppliers as in Table 1 and Figure 2. Supplier 1 reports quite honestly as his dots are scattered close along the no-cheat line.
In contrast, Suppliers 2 and 3 have many reportings away from this no-cheat line but they are not as dishonest as Supplier 4, who basically reports only one density close to 991 kg/m3 irrespective of the actual fuel density.
  • 7. 3. The Good: Best practice benchmark
The above discussion has emphasized the need for a good benchmark for measuring the goodness in density reporting, and for distinguishing between various short-lifting and long-lifting strategies. The scatter plots of Canada & US West Coast and Supplier 1 are examples of good density reporting behaviors that could be used as best practice references. Our interpretation of good or best practice is indicated by the grey diagonal area around the no-cheat line in Figure 5. Fair reporting and good control of the delivered density should result in a small symmetric scatter around the no-cheat line, and thus a narrow density difference (dd) histogram centered at dd = 0 (like the one for Supplier 1 in Figure 2). The goal is to establish a best practice, and then use it as a predefined reference to which bunkerings may be compared. This best practice benchmark is given by the dd histogram for a group of selected good suppliers.
Figure 5: Scatter plot of bunkering data from South Asia. Data points around the diagonal line (no-cheat line) indicate good or best practice behavior, i.e. fair reporting, with little or no cheating. In the area above the no-cheat line, customers get short-lifted (pay too much) whereas below the line the supplier loses money. The more dots there are above the fair line, and the further away from it they are, the less accurate the density reporting. Bunkerings far below the fair area should be considered suspicious and may indicate a bribing situation. Reportings in the grey horizontal area (reporting densities close to the upper density limit) indicate that some suppliers consciously choose a strategy of maximum density cheating. A close-up of the scatter plot near the density limit of 991 kg/m3 reveals that hardly any suppliers are willing to state that their fuel exceeds the limit even when this is clearly the case.
This best practice histogram shall represent good suppliers and should be based on many data points. Any outliers, intentional cheating, or other indications of dishonesty should be eliminated to obtain an unbiased and fair benchmark. The following criteria for deriving the best practice benchmark should therefore be chosen (there will always be a certain element of subjective judgment in this process, but the method for deriving the benchmark should as far as possible be transparent, sound, and unbiased):
1) Select some geo-regions where the scatter plots show that data are predominantly found along the no-cheat line.
2) For each selected dataset:
   a. Eliminate extreme outliers, max cheating and near-limit lying; only data inside a predefined area around the no-cheat line are selected (see Figure 6 for details).
   b. Eliminate any bias by centering the dd data around dd = 0.
3) Merge the adjusted and selected dd data for all the selected sets into one large dataset.
4) Calculate the dd histogram for the merged dataset.
Figure 7 shows the best practice reference histogram derived from the geo-regions Biscay, Canada & US East Coast, Canada & US West Coast, US Gulf Coast, and Oceania.
Figure 6: Only bunkering samples between the 2 blue solid lines will be used as basis for deriving the best practice benchmark histogram. This effectively eliminates max cheating, outliers, and ‘near limit effects’, i.e. less than complete honesty when selling too heavy fuel. The upper solid line divides the angle between the no-cheat and max-cheat lines. The lower solid line is simply the upper line mirrored around the no-cheat line, so that the permitted density deviations are equal in magnitude above and below the no-cheat line.
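The four criteria above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the selection band is simplified to a fixed tolerance `band` (the paper instead uses the angled lines of Figure 6), centering is done by subtracting the mean of the retained values, and all names and sample values are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def best_practice_dd(regions, band=3.0):
    """Merge centered dd values from the selected regions (steps 2 and 3).

    regions: dict mapping a region name to a list of (claimed, measured)
    densities in kg/m3. `band` is an assumed fixed tolerance in kg/m3.
    """
    merged = []
    for samples in regions.values():
        dd = [claimed - measured for claimed, measured in samples]
        kept = [d for d in dd if abs(d) <= band]   # 2a: drop outliers/cheating
        if not kept:
            continue
        centre = mean(kept)
        merged.extend(d - centre for d in kept)    # 2b: remove regional bias
    return merged


def histogram(values, bin_width=0.5):
    """Relative frequency per bin centre (step 4)."""
    counts = Counter(math.floor(v / bin_width + 0.5) for v in values)
    return {k * bin_width: c / len(values) for k, c in sorted(counts.items())}


# Hypothetical (claimed, measured) samples for two selected regions (step 1).
regions = {
    "A": [(980.0, 979.5), (975.0, 975.5), (991.0, 970.0)],  # last one is cut
    "B": [(960.0, 959.0), (965.0, 964.0)],                  # constant offset
}
merged = best_practice_dd(regions)
hist = histogram(merged)
print(hist)
```

In this toy data the 21 kg/m3 deviation in region A is discarded by the band, and the constant 1 kg/m3 offset in region B disappears after centering. A smoothed version of such a histogram would play the role of the histogram function H.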
  • 8. Figure 7: Best practice dd histogram based on samples from selected geo-regions (Biscay, Canada & US East and West Coast, US Gulf Coast and Oceania) where max cheating, outliers and near limit dishonesty have been eliminated. The dashed line is the histogram function H, i.e. a smoothed version of the histogram indicating the global best practice. Classification by membership function Once the best practice histogram is generated, the challenge is to benchmark a supplier, a port, or a region against it. In principle, this histogram must be compared with the dd histograms for the suppliers in question and the degree of conformance would then give the desired benchmark. Unfortunately this is a non-trivial task and for many of the suppliers only relatively few samples are available, resulting in bad histograms. We therefore propose a more elegant approach that is insensitive to the number of data points and outliers, and that can even be used for a single bunkering. The concept of a membership function (Turksen 1991; Terano et al 1987, p. 21), which is widely applied in Fuzzy set theory (Lowen 1996, Self 1990), is used to achieve this benchmarking. A single number (score) is computed denoting the goodness of a specific bunkering or supplier. An example will hopefully make this clear. Consider the task of benchmarking people into fast and slow runners, respectively. One way to do this is to set a threshold T on how fast a person should be able to run 100 m, and then categorize the people who run slower than the threshold as slow (=0) and those who run faster than the threshold as fast (=1). This sorting is achieved by a Boolean membership function B with threshold T for the measured time t on 100 m, i.e. B(T,t). 
However, it is quite obvious that this benchmarking will result in a crude oversimplification as there is a continuous transition from extremely fast runners to the really slow ones, and a small change in the chosen threshold could seriously alter the number of members in each category. A better approach would be to replace the Boolean function with a continuous function, assigning a continuous membership value between 0 and 1 depending on how fast they run. This is an example of a so-called membership function, and will in the following simply be denoted m. The situation is analogous to our best practice density benchmark where suppliers (or bunkerings) are not grouped into crisp sets of good and bad but rather get a score indicating how close to or far away from the best practice they are. This, by the way, is also the reason why e.g. discriminant analysis (Hastie et al 2009) is unsuitable for the task at hand. The challenge is to find a membership function for the good group, faithfully reflecting what we consider to be good. Fuzzy set theory does not provide help in determining the membership function, as all kinds of functions are used, e.g. triangular, trapezoid, Gaussian, etc. The discussion of good behavior above gives us some hints about the properties of the desired membership function. It should not be too wide, as a bad bunkering could then be regarded as good. Likewise, if it is too narrow then a good bunkering would get a too low goodness score. It is important that the membership function represents the best practice set as well as possible. The obvious choice is to derive the membership function directly from the dd histogram itself. The membership function for good bunkerings, mG, must have a maximum value of 1 at dd = 0, i.e. mG(dd=0) = 1, and is continuously decreasing in both directions, i.e. a rescaling and shift of the H histogram has to be done. 
We therefore propose the following definition of the membership function:

$$ m_G(dd) = \frac{H(dd)}{\max(H)} = \frac{H(dd)}{H(0)} $$

where the subscript G indicates that this gives a goodness scoring, and H is the smoothed (and adjusted) best practice histogram (i.e. H is the histogram function). Note that mG is a function of the distance of dd from 0, as well as of the frequency of dd in the best practice. This membership function can now be applied e.g. to all n supplier samples to obtain the overall goodness benchmark,

  • 9.
$$ b_G = \frac{1}{n} \sum_{i=1}^{n} m_G(dd_i) $$

where the summation is done over all n bunkerings for a specific supplier, port, or geo-region. An interesting observation is that the scoring from the membership function mG(dd) is not (a priori) a probabilistic measure; it is a measure between 0 and 1 based on how far away a variable is from some value, here dd = 0; see Figure 8. However, this rescaling does preserve an interesting probabilistic feature: mG(dd) equals the probability of finding a value x in a small interval around dd, relative to that of finding a value y in an equally sized interval close to 0, given that the samples are drawn from the best practice group.
Figure 8: The solid line gives the goodness membership function, mG, which is a scaling of the best practice histogram. mB = 1 - mG gives the membership function for the opposite (dashed line), i.e. bad, which in turn could be divided into a long- and a short-lifting part, mLL and mSL respectively (corresponding to negative and positive dd values). E.g. a bunkering with dd = 2.3 would get a good score of mG = 0.23 and a bad score of mB = 0.77 (with mLL = 0 and mSL = 0.77).
The Bad
Note that mG(dd) was derived based on what was chosen to be the best practice. It therefore gives a measure/score for how good a bunkering or supplier is with respect to this best practice. The complement, mB(dd) = 1 - mG(dd), gives a badness scoring but it will not tell whether the bad scoring comes from short- or long-lifting. Fortunately, mB can, depending on whether a sample falls into the short- or long-lifting domain, be further divided into mSL and mLL. That is, if the dd value of a sample is positive, its mSL will be greater than zero; if the dd value of a sample is negative, its mLL will be greater than zero. This enables us to calculate short- and long-lifting scores similar to the goodness score:

$$ b_{xL} = \frac{1}{n} \sum_{i=1}^{n} m_{xL}(dd_i) $$

where the subscript xL should be SL or LL, which stands for short- or long-lifting, respectively.
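The goodness, short-lifting and long-lifting scores defined above can be sketched as follows. Since the actual smoothed best practice histogram H is not reproduced in this text, the sketch substitutes a Gaussian-shaped stand-in with an arbitrary width `sigma`; that shape, the width, and the sample dd values are assumptions made purely for illustration, so the numbers will not match Figure 8 or the tables.

```python
import math


def H(dd, sigma=1.2):
    """Stand-in for the smoothed best practice histogram (assumed Gaussian)."""
    return math.exp(-0.5 * (dd / sigma) ** 2)


def m_G(dd):
    """Goodness membership: H rescaled so that m_G(0) = 1."""
    return H(dd) / H(0.0)


def m_SL(dd):
    """Short-lifting membership: the bad part for positive dd."""
    return 1.0 - m_G(dd) if dd > 0 else 0.0


def m_LL(dd):
    """Long-lifting membership: the bad part for negative dd."""
    return 1.0 - m_G(dd) if dd < 0 else 0.0


def scores(dds):
    """Return (b_G, b_SL, b_LL) as averages of the membership values."""
    n = len(dds)
    b_G = sum(m_G(d) for d in dds) / n
    b_SL = sum(m_SL(d) for d in dds) / n
    b_LL = sum(m_LL(d) for d in dds) / n
    return b_G, b_SL, b_LL


# Hypothetical dd values in kg/m3 for one supplier.
b_G, b_SL, b_LL = scores([0.0, 0.4, -0.3, 2.5, 3.1])
print(f"b_G={b_G:.2f}  b_SL={b_SL:.2f}  b_LL={b_LL:.2f}")

# The same functions work for a single bunkering.
single = scores([2.3])
```

Because every sample contributes mG plus exactly one of mSL or mLL (or neither when dd = 0), the three scores always add up to one, whatever shape is used for H.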
These scores indicate the behavior of a supplier and give the risk of being short- or long-lifted. Note that, by definition,

$$ b_G + b_{SL} + b_{LL} = 1 $$

Remember that the scores correspond to the degree of membership, i.e. how close a bunkering is to the good or bad benchmark. They can therefore be understood as weights corresponding to the proportion of good or bad.
The Ugly
As pointed out above, profit maximization by reporting densities at or close to the upper limit may be considered as fairly ugly behavior. The same methodology can be applied to obtain a near limit score for this behavior by constructing a membership function

$$ m_{NC}(\text{claimed density}) = m_G(\text{claimed density} - 991) $$

where the subscript NC denotes Near Ceiling. This membership function assigns a scoring to a bunkering corresponding to the distance from the density limit and the frequency of occurrence in the benchmark. To avoid categorizing a bunkering as ugly when the measured density is actually near the limit, we multiply mNC by mSL.
  • 10. In so doing we exclude all reportings that are near the limit but that are actually honest. We propose the following ugly or near limit benchmark

$$ b_{NC} = \frac{1}{n} \sum_{i=1}^{n} m_{SL}(dd_i) \cdot m_{NC}(\text{claimed density}_i) $$

giving the fraction of short-lifting that could be considered as near limit reporting.
Further characterization of Good and Bad
In order to further characterize bunkering samples within the good-, short-, or long-lifting region in the scatter plot, the average density deviations in each region could be computed by weighting each bunkering sample with the corresponding score from the membership function. For instance, the mean density difference in the short-lifting area is

$$ \overline{dd}_{SL} = \frac{\sum_{i} dd_i \cdot m_{SL}(dd_i)}{\sum_{i} m_{SL}(dd_i)} $$

in kg/m3, where the index i runs over all n samples. This means that for a given supplier we can provide information about the risk of being short-lifted, bSL, and about the expected average density difference, $\overline{dd}_{SL}$. The method is easily extended to the other identified behaviors.
4. Application of the benchmarks
As discussed above the power of the scatter plot lies in the visualization of the different density reporting schemes. Several patterns, like fixed value density reporting, systematic density reporting deviations, etc., are easily spotted. The benchmarks developed above are constructed to discriminate between some of these different reporting schemes, and to quantify the risk of being short-lifted as well as the amount of short-lifting that should be expected. The benchmarks for our examples from Table 1 are given in Table 2 below.
Table 2: Standard descriptive measures together with our benchmark(s) for the geo-regions and suppliers from Table 1 (density values in kg/m3). The benchmarks for the data that were used to generate the best practice histogram are also included for comparison. A row, e.g. Global, is read as follows: the average density difference is 0.39 and the standard deviation is 3.92. Benchmarking against the best practice gives the following results: 43% of the samples can be regarded as good (bG), 31% qualify as short-lifting (bSL), and 26% as long-lifting (bLL). For the short-lifting samples the average density difference is 3.31, but only 7% of them were near the ceiling.

                             mean dd   σdd    bG     bSL    bLL    bNC    mean dd_SL
    Best Practice             0.05    1.16   0.62   0.19   0.19   0.01     1.50
    Global                    0.39    3.92   0.43   0.31   0.26   0.07     3.31
    Canada & US West Coast    0.03    2.43   0.55   0.22   0.24   0.02     2.09
    South Asia                1.22    3.35   0.41   0.52   0.07   0.26     2.44
    Middle East               1.83    4.76   0.32   0.49   0.19   0.02     4.61
    South America West        0.48    6.00   0.08   0.42   0.50   0.00     3.73
    Supplier 1                0.12    0.95   0.71   0.09   0.20   0.02     1.70
    Supplier 2                2.31    4.84   0.36   0.53   0.11   0.13     4.65
    Supplier 3                2.40    1.83   0.09   0.87   0.03   0.00     2.81
    Supplier 4                2.07    2.81   0.27   0.72   0.01   0.46     2.64
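The near-ceiling benchmark bNC and the weighted mean density difference in the short-lifting area, as defined in Section 3, can be sketched as follows. As before, a Gaussian-shaped stand-in replaces the real best practice membership function, and the width `sigma`, the helper names and the three sample bunkerings are hypothetical; only the 991 kg/m3 limit is taken from the text.

```python
import math

LIMIT = 991.0  # upper density limit in kg/m3 (ISO 8217, as quoted in the text)


def m_G(x, sigma=1.2):
    """Stand-in goodness membership (assumed Gaussian shape, m_G(0) = 1)."""
    return math.exp(-0.5 * (x / sigma) ** 2)


def m_SL(dd):
    """Short-lifting membership: 1 - m_G for positive dd, else 0."""
    return 1.0 - m_G(dd) if dd > 0 else 0.0


def m_NC(claimed):
    """Near-ceiling membership: closeness of the claimed density to the limit."""
    return m_G(claimed - LIMIT)


def near_ceiling_score(samples):
    """b_NC: average of m_SL(dd_i) * m_NC(claimed_i) over all samples."""
    return sum(m_SL(c - m) * m_NC(c) for c, m in samples) / len(samples)


def mean_dd_short_lifting(samples):
    """Mean dd weighted by m_SL, i.e. the expected short-lifting amount."""
    weights = [m_SL(c - m) for c, m in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no short-lifting at all in this sample set
    return sum(w * (c - m) for w, (c, m) in zip(weights, samples)) / total


# Hypothetical (claimed, measured) densities in kg/m3.
samples = [
    (991.0, 991.0),  # honest report of fuel that really is at the limit
    (991.0, 975.0),  # limit claimed for much lighter fuel: the "ugly" case
    (970.0, 966.0),  # short-lifted, but claimed density far below the limit
]
b_NC = near_ceiling_score(samples)
dd_SL = mean_dd_short_lifting(samples)
print(f"b_NC={b_NC:.2f}  mean dd_SL={dd_SL:.2f} kg/m3")
```

Multiplying by mSL is what keeps the honest first sample from being counted: its claimed density is at the limit, but its dd is zero, so its contribution vanishes.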
  • 11. The samples used to generate the best practice histogram were included in the table for easy comparison. Note that the only way the good score can be 1 is when all samples are at dd = 0; this explains why even the good score of the best practice is ‘only’ 0.62. The table shows that for the selected geo-regions the highest risk of being short-lifted is found in South Asia. The near-limit benchmark, bNC, confirms what is apparent from the scatter plot (Figure 3), namely that for many suppliers it is common practice to maximize their profit by just reporting a fuel density at or near the limit. South America West nicely illustrates the strong ability of the benchmark to identify the underlying behavior. Recall that for this area the mean was near zero, but the high standard deviation suggested large fluctuations in their reporting. Even so, no indications about the underlying reporting schemes, or the risk of being short- or long-lifted, can be deduced from these two numbers. In contrast, our benchmark reveals that the likelihood of actually getting what you paid for is rather slim, viz. around 8%. In the vast majority of the cases either short- or long-lifting takes place. Observe also that Supplier 1 can indeed be regarded as honest with a good score higher than best practice. Suppliers 2 and 3 have comparable average density differences but their good and near limit benchmarks clearly separate them. A comparison of the benchmarks with the corresponding scatter plots will confirm that the benchmarks do indeed give a more accurate description of the honesty of suppliers than standard descriptive statistics.
Figure 9: Comparison of different benchmarking methods: suppliers ranked based on their mean density difference, $\overline{dd}$ (top), and their corresponding good score, bG (bottom). Observe that ranking with respect to the mean would result in about 1057 good suppliers (|$\overline{dd}$| ≤ 0.7).
Our scoring with respect to best practice (0.62) reveals, however, that about 150 are definitely bad (left-hatched area), even below the global average (0.43). 539 are really good (equal to or better than best practice, right-hatched area) whereas the rest are located between global average and best practice. Observe also that simply relying on the mean to characterize suppliers would label several of them as bad even though their good score is above global best practice.
Supplier ranking
In Figure 9 (top) all suppliers of RMG380 fuel worldwide are ranked with respect to their mean density difference, $\overline{dd}$. When using |$\overline{dd}$| ≤ 0.7 as a criterion for goodness, the mean would imply there are about 1057 good suppliers. Applying this mean to our benchmarking method results in the continuous bell-shaped curve (blue). If $\overline{dd}$ were indeed an unbiased measure for the goodness of suppliers, then their scorings should be closely scattered around this curve; this is, however, not at all the case. This discrepancy stems from the unreliability of the mean (or standard deviation) as a trustworthy measure whenever the underlying distributions are non-normal or outliers have a large effect. The figure clearly shows that 150 of the apparently good suppliers are actually quite bad, i.e. even below the global average (left-hatched area), whereas only about half (539) can be considered equal to or better than best practice (right-hatched area). Observe also that many of the apparently bad suppliers (those with |$\overline{dd}$| > 0.7) are actually better than their reputation, as most of them are above the bell-shaped curve and some are even above best practice, further emphasizing the need for an unbiased score like bG.
Development over time
Following the development of the score of a supplier, port, or region over time may give valuable indications about what may be expected in the near future.
For instance, Figure 10 shows the development of the bG score for two major ports, Singapore and Rotterdam, over the past 25 years. p. 9
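The contrast between ranking by the mean and ranking by the good score can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's definitions: the triangular membership function `m_good` (the paper derives its membership function from the best practice histogram), the reading of a score as the average membership value over a sample set, the sign convention that dd > 0 (claimed above measured) counts as short-lifting, and the two made-up sample sets. Only the thresholds 0.7, 0.62 and 0.43 are taken from the text, and they were obtained there with the paper's own membership function, so the labels below are indicative only.

```python
from statistics import mean

BEST_PRACTICE = 0.62   # good score of the best practice set (from the text)
GLOBAL_AVERAGE = 0.43  # global average good score (from the text)
MEAN_LIMIT = 0.7       # |mean dd| criterion used for the mean-based ranking


def m_good(dd, half_width=3.0):
    """Hypothetical triangular membership: 1 at dd = 0, 0 beyond |dd| = half_width."""
    return max(0.0, 1.0 - abs(dd) / half_width)


def scores(dds):
    """Return (bG, bSL, bLL), assuming each score is an average membership value.

    dd = claimed - measured density; dd > 0 is treated as short-lifting and
    dd < 0 as long-lifting (assumed sign convention).
    """
    n = len(dds)
    b_good = sum(m_good(d) for d in dds) / n
    b_short = sum(1.0 - m_good(d) for d in dds if d > 0) / n
    b_long = sum(1.0 - m_good(d) for d in dds if d < 0) / n
    return b_good, b_short, b_long


def classify(b_good):
    """Place a good score relative to the two reference levels from the text."""
    if b_good >= BEST_PRACTICE:
        return "at or above best practice"
    if b_good < GLOBAL_AVERAGE:
        return "below global average"
    return "between global average and best practice"


# Two made-up suppliers with (nearly) identical means but different behavior.
steady = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3]
erratic = [4.0, -4.0, 3.5, -3.5, 0.0, 0.0]

for name, dds in (("steady", steady), ("erratic", erratic)):
    b_good, b_short, b_long = scores(dds)
    passes_mean_test = abs(mean(dds)) <= MEAN_LIMIT
    print(name, passes_mean_test, round(b_good, 2), round(b_short, 2),
          round(b_long, 2), classify(b_good))
```

Both sample sets pass the mean-based criterion, yet only the first reaches the best practice level. Because the bad membership is taken as 1 - mG and split by the sign of dd, the three scores of any sample set sum to 1, which matches the way bG, bLL and bSL are reported as shares.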
  • 12. Figure 10: Time series of goodness scores bG for two large ports in different geo-regions. Data from all available suppliers are included. Dots are quarterly time intervals, while the stippled lines are yearly averages. Each dot is based on a varying number of 'raw data points', i.e. the number of bunkerings during the corresponding time interval. Observe that from the beginning of the 1980s up to the mid-1990s the quality of the density reporting was increasing. It then leveled off until 2008, when a change in behavior occurred, perhaps triggered by the onset of the global recession?

5. Discussion and concluding remarks

This paper has two main focus areas: the construction of a realistic benchmark, and the development of a methodology for comparing one or more samples with the benchmark. The examples given above demonstrate the capabilities of our approach. It is more powerful than standard descriptive statistics (e.g. dd and σdd), as it is less sensitive to outliers and is well suited for small datasets and even single numbers. Recall that our benchmarks give better quantifications than dd and σdd together. Further, the approach places no restrictions on the probability distribution of the underlying data; any distribution is allowed. Only some weak requirements apply to the membership function (e.g. increasing/decreasing). The methodology is quite generic and could in principle be applied to any kind of comparison task, i.e. benchmarking. The fact that the benchmark is based on a probability density function, and that a probabilistic interpretation of the scoring is possible, aids the user's intuition and makes the results easier to understand and interpret. Once a best practice histogram has been generated, a membership function can be derived, after which benchmarking is easily done.
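As an illustration of the step from a best practice histogram to a membership function, the sketch below shows one possible construction. It is an assumption, not the authors' procedure: the text here states only that the membership function is derived from the histogram and must satisfy weak requirements such as being increasing/decreasing. The function name `membership_from_histogram` and the bin counts are hypothetical.

```python
def membership_from_histogram(counts):
    """Turn best-practice bin counts into membership values in [0, 1].

    Assumed construction: scale by the peak count so the most common bin
    gets membership 1, then force the values to be non-increasing when
    moving away from the peak in either direction.
    """
    peak = max(range(len(counts)), key=counts.__getitem__)
    values = [c / counts[peak] for c in counts]

    # Right of the peak: never let the membership rise again.
    for i in range(peak + 1, len(values)):
        values[i] = min(values[i], values[i - 1])

    # Left of the peak: same rule, walking outwards from the peak.
    for i in range(peak - 1, -1, -1):
        values[i] = min(values[i], values[i + 1])

    return values


# Made-up bin counts for density differences, with the peak in the middle bin.
example_counts = [1, 4, 3, 10, 6, 7, 2]
membership = membership_from_histogram(example_counts)
print(membership)
```

Scaling by the peak rather than by the total count is a design choice that lets a sample lying exactly in the most typical best practice bin receive a membership of 1; other normalizations would satisfy the monotonicity requirement equally well.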
Subjectivity is involved only in the definition of what can be regarded as best practice, as there is no a priori correct answer to this problem. Our approach has been to ask what should be expected of a good supplier, and then to pick the suppliers that best match these expectations. Outliers and incorrect claims near the density limit are of course not wanted from a good supplier, hence their removal from the best practice data set.

From a user perspective the main strengths of the presented benchmark are:
  • Intuitive and easy to understand.
  • Applicable to small or even singleton samples.
  • Able to pinpoint different density reporting schemes.

In closing, let us return to the extent of global short-lifting, which is estimated to be around 1.7 tons per bunkering on average. Thanks to our benchmarking methodology we can now provide a more detailed picture of the situation. First, 43% of the bunkerings could be considered loss neutral (bG = 0.43), since they are within best practice. Second, 26% are instances of long-lifting (bLL = 0.26), where the buyer gains on average 1.8 tons. Third, 31% could be regarded as short-lifting (bSL = 0.31), with an average buyer loss of 2.5 tons per bunkering. This highlights the importance of choosing the right supplier.

The presented benchmark methodology is easily extendable to other (quality and economic) bunkering parameters such as viscosity, sulfur or water content, as well as a range of physical and chemical properties. The methodology will be the basis for a benchmarking web tool, scheduled for release by DNVPS later this year.

Figure 11: Bunker surveyor on board a ship. Photo by DNV Petroleum Services (used with permission).
  • 13. References
Bhattacharyya, G., Johnson, R. (1977), Statistical Concepts and Methods, Wiley, New York.
DNV (2010), Total fuel management, http://www.dnv.com/industry/maritime/servicessolutions/fueltesting (accessed 13 Oct. 2010).
EPA (2008), Global Trade and Fuels Assessment - Future Trends and Effects of Requiring Clean Fuels in the Marine Sector, Assessment and Standards Division, Office of Transportation and Air Quality, U.S. Environmental Protection Agency, EPA420-R-08-021, November 2008.
Eyring, V., Isaksen, I.S.A., Berntsen, T., Collins, W.J., Corbett, J.J., Endresen, O., Grainger, R.G., Moldanova, J., Schlager, H., Stevenson, D.S. (2010), "Transport impacts on atmosphere and climate: Shipping", Atmospheric Environment, Vol. 44, No. 37, December 2010, pp. 4735-4771.
Hastie, T., Tibshirani, R., Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second edition), Springer, New York.
IEA (2010), World Energy Outlook 2010, International Energy Agency, OECD Publishing, Paris.
IMO (2009), Prevention of Air Pollution from Ships, International Maritime Organization, Marine Environment Protection Committee, MEPC 59/INF.10, 9 April 2009.
Lowen, R. (1996), Fuzzy Set Theory, Kluwer Academic Publishers, Dordrecht.
Self, K. (1990), "Designing with fuzzy logic", IEEE Spectrum, Vol. 27, No. 11, November 1990, pp. 42-44, p. 105.
Terano, T., Asai, K., Sugeno, M. (1987), Fuzzy Systems Theory and its Applications, Academic Press, San Diego.
Turksen, I.B. (1991), "Measurement of membership functions and their acquisition", Fuzzy Sets and Systems, Vol. 40, pp. 5-38.
  • 20. Figure 5: [Plot of claimed density versus measured density, both axes 961 to 991, with the limit marked and regions labeled Good, Suspicious, Bad, and max. cheat area.]
  • 21. Figure 6: [Diagram labeling the limit, the max. cheat line, and the no cheat line.]
  • 22. Figure 7: [Plot of probability versus density deviations.]
  • 23. Figure 8: [Membership functions mG and mB = 1 - mG plotted against density difference dd, with regions labeled Long-lifting and Short-lifting. Example at dd = 2.3: Good, mG = 0.23; Bad, mB = 1 - 0.23 = 0.77.]
  • 24. Figure 9: [Top panel: claimed - measured density versus total number of suppliers (0 to 2500), with ca. 1057 suppliers inside the band from -0.7 to 0.7. Bottom panel: good score (0 to 1) versus total number of suppliers, with the best practice score and the global average score marked. Annotations: "Some 'bad suppliers' are actually very good!" (539), "Some 'bad suppliers' are actually slightly better!", "Many 'good suppliers' are actually quite bad!" (150).]