Aed1222 lesson 1 and 3
1. Introduction to Statistics for Built Environment
Course Code: AED 1222
Compiled by
DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED)
CENTRE FOR FOUNDATION STUDIES (CFS)
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
2. What is/are statistics?
The term statistics is commonly used in two ways:
1. Used as a plural term that refers to numerical facts or data.
3. Statistics: as a set of numerical facts or data about the
usage of a particular website
4. Statistics: as a set of numerical facts or data about the
availability of land and the ownership of minerals in the
province of Alberta (Canada).
5. What is/are statistics? cont.
2. Used as a singular (broader) term that refers to the
science of designing studies, gathering data, and then
classifying, summarizing, interpreting, and presenting
these data to explain and support the decisions that are
reached.
In other words, when used as a singular term, statistics
refers to a science or field of study that covers the following
activities:
• Designing studies
• Gathering data
• Analyzing the data
• Presenting the data
• Reaching a decision
7. Statistical Problem-solving cont.
Step one: Identifying the problem
Step two: Gathering available facts
Step three: Gathering new data
Step four: Classifying and organizing the data
Step five: Presenting and analyzing data
Step six: Making a decision
8. Important terms used in the study of statistics
A population: the complete collection of objects or individuals under study.
A sample: a portion or subset taken from a population.
A parameter: a number that describes a population characteristic (such as
the average weight or height…).
A statistic: a number that describes a sample characteristic.
A variable: a characteristic that can be expressed by a number. The value
of this characteristic is likely to vary (change) from one item in
the data set to the next, for example: age, gender, weight,
height…
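To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch. The shipment of 500 package weights is randomly generated, hypothetical data and not from the course: the mean of all 500 weights plays the role of the parameter, and the mean of a 40-package sample is the statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: weights (kg) of every package in a shipment of 500.
random.seed(1)  # fixed seed so the example is reproducible
population = [round(random.uniform(1.0, 20.0), 1) for _ in range(500)]

# Parameter: a number describing the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of 40 packages drawn from the population.
sample = random.sample(population, 40)

# Statistic: a number describing the sample; it is used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f} kg")
print(f"Statistic (sample mean):     {sample_mean:.2f} kg")
```

The two numbers will usually be close but not identical; the gap between them is what sampling methods try to keep small.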
9. [Figure: a sample is selected from the population (sample selection), and the sample is then used to make a judgment about the unknown population.]
10. The two essential parts of the subject
of statistics
The subject of statistics can be viewed as a process
that is broken down into two parts:
1. Descriptive statistics: covers the process of
collecting, classifying, summarizing and
presenting data.
2. Inferential statistics: refers to the process of
arriving at a conclusion about a population based
on information obtained from a sample.
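The two parts can be illustrated with a short Python sketch. The 12 floor areas are hypothetical data. The descriptive part only summarizes the numbers collected; for the inferential part, the sketch uses an approximate 95% confidence interval for the population mean, which is one common inferential technique chosen here for illustration (the slides do not name a specific method). It uses the normal critical value (about 1.96) as a simplifying assumption; for a sample this small a t critical value would be more exact.

```python
import math
import statistics

# Hypothetical sample: floor areas (m^2) of 12 houses surveyed in a district.
sample = [92, 110, 85, 130, 101, 97, 120, 88, 105, 115, 99, 108]

# Descriptive statistics: summarize the data that were actually collected.
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
print(f"n = {n}, mean = {mean:.1f}, sd = {sd:.1f}, range = {min(sample)}-{max(sample)}")

# Inferential statistics: use the sample to draw a conclusion about the
# population mean. Approximate 95% confidence interval with the normal
# critical value (an assumption; a t value would be more exact for n = 12).
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sd / math.sqrt(n)
print(f"Population mean is plausibly between {mean - margin:.1f} and {mean + margin:.1f} m^2")
```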
14. An overview of Descriptive Statistics and Inferential Statistics
[Figure: flowchart from Start of the Study to End of the Study. Descriptive statistics: classification, summarization and processing of data; presentation and communication of summarized information. Inferential statistics: use sample information to make inferences and draw conclusions about the population. Or: use census data to analyze the population.]
15. The need for sampling
Sampling (or taking samples) is an activity that occurs on a daily
basis, and is not used by statisticians alone.
Sampling in daily life may not be as sophisticated as sampling
done for formal statistical studies.
However, it still serves the fundamental purpose of providing
information for judgement.
Examples of daily-life sampling include:
1. A chef tasting food to see if it has the desired flavor.
2. A car buyer test-driving a car to compare it with others…
3. Pieces of rock being analyzed to determine the availability of
a certain mineral in an area.
16. The need for sampling cont.
Sampling is needed to provide sufficient information so that
inferences can be made about the characteristics of a
population.
A population can be either finite or infinite:
A finite population is one where the total number of members
(items, measurements, etc…) is fixed and could be listed.
An infinite population has an unlimited number of members.
21. Sample data vs. Census data
Complete information acquired through a census is
generally desirable.
If every item in a population is examined, we can be
confident in describing the population.
However:
• What you want is not necessarily what you can get.
• Census data are a luxury in most situations and are
usually not available for studying a population.
Therefore:
• Data gathering by sampling (rather than census taking) is
the rule rather than the exception because of the
following sampling advantages:
22. Advantages of sampling
1. Cost: any data-gathering effort incurs costs for such things
as mailings, interviews, and data processing. The more data
to be handled, the higher the costs are likely to be.
2. Time: speed in decision making is often crucial, and
carrying out a census is too time-consuming… an example?
3. Accuracy of sampling: sometimes a small sample
provides information that’s almost as accurate as the results
obtained from a complete census. How? There are sampling
methods that produce samples that are highly
representative of the population, and in such cases, larger
samples will not produce results that are significantly more
accurate.
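Why larger samples stop paying off can be shown with the standard error of the sample mean, σ/√n, for a simple random sample. The Python sketch below assumes a hypothetical population standard deviation of 10 and ignores the finite-population correction; it is an illustration of the general principle, not a calculation from the course.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of a simple random sample of size n,
    ignoring the finite-population correction."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}  standard error = {standard_error(sigma, n):.2f}")
```

This prints standard errors of 2.00, 1.00, 0.50 and 0.25: each fourfold increase in sample size (and roughly in cost) only halves the error.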
23. Advantages of sampling cont.
4. Other advantages: sometimes the resources may be
available for a census, but the nature of the population
requires a sample.
For example, an environmental protection agency may be willing
to sponsor a study of the entire population of a certain
whale species, but migration, births and
deaths can prevent a complete count. One way to solve this
problem is to study a small area of the ocean and use the
results to make a projection (inference).
In addition, destructive tests are often used to
judge product quality. For example, a car manufacturer may
want to know how safe one of its vehicles is… what
would the manufacturer do?
25. Non-probability sampling techniques
A non-probability sample is one in which the judgment of the
experimenter, the method in which the data are collected, or
other factors could affect the results of the sample.
Items are chosen for the sample with selection probabilities
that are unknown or cannot be calculated.
The interpretation of such samples is always questionable:
“was the discovery from the sample true? Or was it just the
result of the way the sample was taken?”
There are 3 common types of non-probability samples:
27. Judgment sampling technique
Judgment sample: sample selection based on the opinion of one or
more persons who feel sufficiently qualified to identify items for
a sample as being representative of the population.
The sample is taken based on someone’s expertise about
the population.
For example, a politician picks certain voting districts as reliable
places to measure the public’s opinion of his/her political party…
A judgment sample is convenient, but it is difficult to assess how
closely it measures reality.
However, it can still be useful depending on the expertise of the
person(s) involved in determining the sample.
28. Voluntary sampling technique
Sometimes questions or questionnaires are distributed to the public by
publishing them in print media, on the internet or on radio/television.
Such questionnaires or polls produce voluntary samples and attract only
those who are interested in the subject matter…
Obviously, results obtained from such samples are unreliable… Why?
29. Convenience sampling technique
Often people want to take an “easy” sample.
Samples chosen for the ease with which they can be taken are called
convenience samples.
For example, a surveyor will stand in one location and ask passersby
their question or questions.
A student working on a project will ask an entire class to fill out a survey.
Would standing outside a bank asking customers what they thought of the
bank’s services give a complete picture? Why?
30. Probability sampling techniques
A probability sample is one in which the chance of selection of
each item in the population is known or calculable before the
sample is picked.
Probability samples are generally more reliable than non-probability
samples.
There are 4 common types of probability samples that are used
to gather new data:
32. Simple random sampling technique
If a probability sample is chosen in such a way that each item in
the population has an equal chance of being selected, then the
sample is called a simple random sample.
Assume that every item in a population is numbered, and each
number is written on a slip of paper. If all the slips are placed in a
bowl and mixed, and if a group of slips is then picked, the items
represented by the selected slips constitute a simple random
sample.
A more practical approach is often to number the items in the
population, and use a computer programme to generate a table
of random numbers, and then select a sample from that list…
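The computer-based approach can be sketched with Python's standard `random` module, whose `sample` function draws without replacement so that every item has the same chance of selection. The population of 200 numbered items and the sample size of 10 are hypothetical values chosen for illustration.

```python
import random

# Number the items in the population 1..N (here a hypothetical N = 200).
population = list(range(1, 201))

# Draw a simple random sample of 10 without replacement: every item has the
# same chance of selection, like drawing slips from a well-mixed bowl.
rng = random.Random(42)  # fixed seed only so the result is reproducible
sample = sorted(rng.sample(population, 10))
print(sample)
```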
33. Systematic sampling technique
Suppose we have a list of 1000 items in a finite population, and
that we want to pick a probability sample of 50 items.
We first number the items from 1 to 1000. Then, we can
use a random number table to pick one of the first 20 items
(1000/50 = 20) on our list.
If the table of random numbers gives us number 16, then the
16th item in the list will be the first to be selected. We would
then pick every 20th item after this random start (the 36th item,
the 56th item, etc…) to produce a systematic sample.
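The procedure above can be written as a small Python function. The name `systematic_sample` and its `start` parameter are hypothetical; `start` is the 1-based random start (drawn from 1 to k when not given), and the interval is k = N / n, as in the slide.

```python
import random

def systematic_sample(items, n, start=None, rng=random):
    """Pick n items by taking every k-th item after a random start,
    where k = len(items) // n. `start` is a 1-based position in 1..k."""
    k = len(items) // n
    if start is None:
        start = rng.randint(1, k)  # random start among the first k items
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Convert the 1-based start to a 0-based index, then step through the list by k.
    return [items[i] for i in range(start - 1, len(items), k)][:n]

population = list(range(1, 1001))  # items numbered 1..1000
sample = systematic_sample(population, 50, start=16)
print(sample[:4], "...", sample[-1])  # [16, 36, 56, 76] ... 996
```

Passing `start=16` reproduces the slide's example; leaving it out lets the function pick the random start itself.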
35. Stratified sampling technique
If a population is divided into homogeneous groups (or strata),
and then a sample is drawn from each group to produce an
overall sample, this overall sample is known as a stratified
sample.
We can stratify data based on race (in the case of Malaysia),
gender, employment category, type of programme (1 year vs. 1.5
years at CFS) and so on…
Some prior knowledge of the structure of the population is
necessary for selecting a stratified sample.
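A minimal Python sketch of stratified sampling follows. The slides do not say how many items to draw from each stratum, so this sketch assumes proportional allocation (the same fraction from every stratum, at least one item each) as a common choice. The CFS student records and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, rng=random):
    """Proportional stratified sample (an assumed allocation rule): group the
    records into strata, then draw a simple random sample of the same
    fraction from each stratum."""
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical CFS students: 60 on the 1-year programme, 40 on the 1.5-year one.
students = [(f"S{i:03d}", "1 year" if i < 60 else "1.5 years") for i in range(100)]
rng = random.Random(7)
sample = stratified_sample(students, lambda s: s[1], 0.10, rng)
print(len(sample))
```

With a 10% fraction, the overall sample has 6 students from the 1-year stratum and 4 from the 1.5-year stratum, mirroring the population's structure.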
37. Cluster sampling technique
A cluster sample is one in which the individual units to be
sampled are actually groups or clusters of items.
It is assumed that the individual items within each cluster are
representative of the population.
Consumer surveys of large cities often employ cluster sampling.
Usually a city is divided into small blocks, each block containing a
cluster of households to be surveyed.
A number of clusters are selected for the sample, and all the
households in the selected clusters are surveyed.
The benefits of this sampling method are savings in cost and time,
since an interviewer will need less energy and money if he/she
stays within a specific area rather than travelling across the city…
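The city-block example can be sketched in Python: whole blocks (clusters) are chosen at random, and then every household in each chosen block is surveyed. The city of 30 blocks with 8 to 15 households each is hypothetical data.

```python
import random

# Hypothetical city: 30 blocks, each a cluster of 8 to 15 households.
rng = random.Random(3)
city = {
    f"Block {b}": [f"B{b}-H{h}" for h in range(1, rng.randint(8, 15) + 1)]
    for b in range(1, 31)
}

# Cluster sampling: randomly select whole blocks, then survey every
# household inside each selected block.
chosen_blocks = rng.sample(sorted(city), 5)
households_to_survey = [h for b in chosen_blocks for h in city[b]]
print(chosen_blocks, len(households_to_survey))
```

Note the contrast with simple random sampling: randomness applies to the blocks, not to individual households, which is what keeps interviewers within a few areas.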
During this course we will explore each of the above activities. We will discuss basic methods of data collection (primary & secondary), basic data analysis (measures of central tendency, deviation and others), and basic presentation techniques commonly used in the field of statistics. These include frequency distribution tables, bar and pie charts, contingency tables, histograms and so on… Give the example of a study on the performance of two cars: design the study by deciding on the variables to look at (power, torque, petrol consumption, emissions…). Start gathering data from secondary and primary sources (experiments and perhaps interviews). Analyze the stockpiles of data… Present the findings and then pass a judgment!
Examples of a parameter: the average weight of all the packages in a shipment, or the average lifetime of all batteries of a certain brand. Each is a single number that describes a population, so it is a parameter. Thus, if a percentage figure or an average value describes a population, it is a parameter.
Ask the students to share examples of sampling in daily life before showing them the ones here…