1. Geoscientific Data Management Principles
2. Introduction
The quality and consistency of geoscientific data
management practices across the minerals
exploration and mining industry vary greatly.
This occurs for a variety of reasons:
• Budgetary constraints
• Technical knowledge constraints
• Lack of appreciation of the value of data within the organisation
• Lack of an IT Dept (or lack of integration with the IT group)
• Staff turnover
• Technology change-overs/updates
• “Islands” of accountability
3. Introduction
On the following slides are listed the primary
factors conducive to best practice geoscientific
data management.
Each principle is described in turn to give a clear
understanding of what it involves and why, along
with, in some instances, an example of its
application or requirement.
4. Principle 1: Centralised Data Management
Simply put, this is the practice of holding all geoscience data in a
centralised location, preferably not on an operations site, on fully
maintained and monitored servers with fully tested back-up systems
in an industry-standard server room. This data may be derived from
site/project-based databases, or replicated to them – the main
point is that the version on site is not the only copy.
It is not strictly necessary to maintain a site copy of the data.
However, experience has shown that site personnel take a
significantly better attitude towards data quality when a site copy
is maintained, as they have a stronger sense of ownership of the
data and view the centralised storage as merely a backup.
5. Principle 2: Standard Geoscientific Legend
This is a standard set of observational data codes used across all
sites/projects. Multiple legends within one organisation
frequently cause problems both in the database and at the point
of data capture.
Problems at the point of capture emerge when the different legends
“cross-breed”: a geologist transferred to a new site will
sometimes use codes from a previous posting, either out of
preference or by force of habit. If this practice is allowed to
persist for some time, the result is contaminated data that is
frequently useless.
Problems at the database end are the result of either differing
legend codes being stored in the same field, or multiple fields
being created for each data type to cater for each legend. The
former is confusing whilst the latter is both inefficient and
confusing.
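As an illustration only, a single standard legend makes it straightforward to reject stray codes at the point of capture. The sketch below assumes a hypothetical set of lithology codes and a hypothetical record layout (`hole_id`, `from_m`, `to_m`, `lith_code`); none of these names come from a real logging system.

```python
# Hypothetical standard legend: one set of lithology codes for every site.
STANDARD_LITHOLOGY_CODES = {"BAS", "GRN", "SST", "SHL"}


def find_nonstandard_codes(records, legend=STANDARD_LITHOLOGY_CODES):
    """Return (row_index, code) pairs for codes that are not in the legend."""
    problems = []
    for index, record in enumerate(records):
        # Normalise case and whitespace before comparing with the legend.
        code = record["lith_code"].strip().upper()
        if code not in legend:
            problems.append((index, code))
    return problems


# Made-up logged intervals; the second row uses a code from another legend.
logged_intervals = [
    {"hole_id": "DH001", "from_m": 0.0, "to_m": 4.5, "lith_code": "BAS"},
    {"hole_id": "DH001", "from_m": 4.5, "to_m": 9.0, "lith_code": "Bslt"},
    {"hole_id": "DH001", "from_m": 9.0, "to_m": 12.0, "lith_code": "sst"},
]

rejected = find_nonstandard_codes(logged_intervals)
print(rejected)
```

Running the check in the data collection tool, rather than after loading, stops the habitual code from a previous posting before it reaches the database.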
6. Principle 3: Standard Geoscientific Data Model
This is simply the use of one data model across the organisation
rather than having a different data model for each mining
operation or exploration project. Some companies run one data
model for their mining operations and another for their
exploration.
Ultimately these data sets should be coming together so that all
data for a project, deposit or terrane is in one location and the
maximum value can be achieved from analysis of the data.
Complete data sets such as this are essential for understanding
the geological setting and processes involved in forming the
deposit, thereby allowing for predictive tools for discovering
another.
7. Principle 4: Same System Digital Data Logging
8. Principle 5: Direct Data Transfer
This principle requires that data is either transferred directly
from the data collection tool to the database, or transferred via a
secure facility, e.g. Acquire’s Briefcase mechanism.
Systems where data is exported to a text-based file are open to,
and frequently subjected to, manual editing outside the
validation controls inherent in the system. This can result in
contaminated data in the database, or difficulty in loading the
data, which then requires support from specialised users.
Furthermore, these files are commonly not transferred
immediately to the database and are therefore exposed to the
risks of loss and multiple versions.
A further point is that importers must be constructed, have
validation coded in, and subsequently be maintained to enable
the importing of the exported, text-based file.
9. Principle 6: Digital Sample Submission
• This is where all sampling data is derived from a digital data
collection tool and submitted to the lab digitally. Where physical
sampling sheets are required, there should be a facility associated
with the data collection tool to provide a printed version.
• While barcode tags are now a common technology for managing
samples, they still have two issues: they must be handled manually
at several points in the transport and processing of the sample,
and they can be difficult to read when dirty and/or wet.
• It is recommended that RFID technology be used to manage samples
as this eliminates the multiple-point manual handling of the samples
to obtain their sample numbers. RFID tags are now extremely
affordable and readily available. Even in a small hole of a hundred
samples, the time saved by avoiding having to find and scan each
barcode is significant. Depending on how samples are placed, this
may also remove the risk of injury through bending over or
physically lifting the samples.
10. Principle 7: Automated Assay Loading
This principle involves assays being imported directly into the
database with no opportunity for personnel to edit them manually.
The idea is very similar to that behind Principle 5 (Direct Data
Transfer). Data is analogous to a piece of surgical equipment: the
more hands that come into contact with it, the dirtier it gets. The
less ability personnel have to manually interact with, or edit, the
data before it reaches the database, the cleaner it is.
There are a multitude of ways of achieving this:
• Emails from the lab may be delivered to a common folder where a
batch process extracts and loads them into the database.
• The laboratory concerned may have a portal or some other
web-hosted access through which the DHDMS can access the assay
data for loading.
• The laboratory may have direct access to the DHDMS and load the
data directly.
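The first option, a batch process working through a common folder, might look something like the sketch below. It is a minimal illustration under several assumptions: SQLite stands in for the DHDMS, and the CSV layout (`sample_id`, `element`, `value`), the table name `assay` and the `.loaded` suffix are all hypothetical. Real laboratory formats differ.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_assay_files(inbox: Path, conn: sqlite3.Connection) -> int:
    """Load every CSV file in `inbox` into the assay table; return rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assay ("
        "sample_id TEXT, element TEXT, value REAL, source_file TEXT)"
    )
    loaded = 0
    for path in sorted(inbox.glob("*.csv")):
        with path.open(newline="") as handle:
            rows = [
                (r["sample_id"], r["element"], float(r["value"]), path.name)
                for r in csv.DictReader(handle)
            ]
        with conn:  # one transaction per file: all rows load or none do
            conn.executemany("INSERT INTO assay VALUES (?, ?, ?, ?)", rows)
        loaded += len(rows)
        # Rename the file so the next run does not load it a second time.
        path.rename(path.with_name(path.name + ".loaded"))
    return loaded


# Demonstration: a temporary folder stands in for the lab drop folder.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    (inbox / "batch_001.csv").write_text(
        "sample_id,element,value\nS0001,Au,0.52\nS0002,Au,1.10\n"
    )
    conn = sqlite3.connect(":memory:")
    first_run = load_assay_files(inbox, conn)
    second_run = load_assay_files(inbox, conn)  # nothing left to load
    total = conn.execute("SELECT COUNT(*) FROM assay").fetchone()[0]
    conn.close()
```

Nobody opens or edits the file between the laboratory and the database, and recording `source_file` against each row keeps the load traceable.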
11. Principle 8: Drillhole Data Staging
The recommendation here is that the data is loaded into the
database but is not available to general users or any
extraction/reporting facilities until it has been approved (i.e. checked
that all relevant data is present, QAQC is acceptable, etc). Ultimately
what is to be avoided is unapproved data being used in what may be
critical calculations or decisions.
An example would be a geochemist including assay data in an extract,
when the geologist responsible for the data later reveals that it in
fact failed its QAQC and was subsequently re-assayed by the
laboratory. Meanwhile the geochemist is unaware that the extract
contains poor quality data that has been superseded.
While this principle is intimately linked to the following one and may
at first appear to be the same, they are in fact separate as many
companies apply Principle 9 but not Principle 8.
12. Principle 9: Drillhole Signoff/Approval
This principle is centred on assigning accountability for the
quality of the data to the person responsible for it. This is
the logical next step after the previous principle and records the
name of the approver against the data.
Elements of a sense of ownership of the data, as discussed in
Principle 1, are equally valid here.
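Principles 8 and 9 can be illustrated together. In the sketch below, data is loaded immediately but general users only ever query a view that hides unapproved holes. SQLite stands in for the drillhole database, and the table, view and column names, along with the userid `jsmith`, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE drillhole (
        hole_id     TEXT PRIMARY KEY,
        approved_by TEXT,          -- NULL until the hole is signed off
        approved_at TEXT
    );
    CREATE TABLE assay (
        hole_id   TEXT REFERENCES drillhole(hole_id),
        sample_id TEXT,
        au_ppm    REAL
    );
    -- General users and extracts read this view, never the base tables.
    CREATE VIEW approved_assay AS
        SELECT a.hole_id, a.sample_id, a.au_ppm
        FROM assay AS a
        JOIN drillhole AS d ON d.hole_id = a.hole_id
        WHERE d.approved_by IS NOT NULL;
    """
)

conn.executemany(
    "INSERT INTO drillhole (hole_id) VALUES (?)", [("DH001",), ("DH002",)]
)
conn.executemany(
    "INSERT INTO assay VALUES (?, ?, ?)",
    [("DH001", "S0001", 0.52), ("DH002", "S0101", 3.40)],
)


def approve_hole(conn, hole_id, userid):
    """Record who signed the hole off, which releases its data to the view."""
    conn.execute(
        "UPDATE drillhole SET approved_by = ?, approved_at = datetime('now') "
        "WHERE hole_id = ?",
        (userid, hole_id),
    )


before = conn.execute("SELECT sample_id FROM approved_assay").fetchall()
approve_hole(conn, "DH001", "jsmith")
after = conn.execute("SELECT sample_id FROM approved_assay").fetchall()
print(before, after)
```

Staging (Principle 8) is the view; sign-off (Principle 9) is the `approved_by` column. A company can record the approver without enforcing the view, which is why the two principles are listed separately.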
13. Principle 10: Audit Trail Facility
The recommendation here is that all inserts, deletes and
modifications made in the database are logged (date/time, userid,
previous value) in sufficient detail to allow rollback if required.
This allows data contaminated whilst in the database, whether by
accident or malicious intent, to be corrected.
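One common way to build such a log is with database triggers. The sketch below is an assumption-laden illustration rather than a recommended design: SQLite stands in for the production database, the `session` table is a hypothetical stand-in for the database's own notion of the current user, and all table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE assay (sample_id TEXT PRIMARY KEY, au_ppm REAL);
    CREATE TABLE assay_audit (
        audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
        userid     TEXT,
        action     TEXT,
        sample_id  TEXT,
        old_au_ppm REAL
    );
    -- Hypothetical session table holding the userid of the current user.
    CREATE TABLE session (userid TEXT);
    CREATE TRIGGER assay_insert AFTER INSERT ON assay
    BEGIN
        INSERT INTO assay_audit (userid, action, sample_id, old_au_ppm)
        VALUES ((SELECT userid FROM session), 'INSERT', NEW.sample_id, NULL);
    END;
    CREATE TRIGGER assay_update AFTER UPDATE ON assay
    BEGIN
        INSERT INTO assay_audit (userid, action, sample_id, old_au_ppm)
        VALUES ((SELECT userid FROM session), 'UPDATE', OLD.sample_id, OLD.au_ppm);
    END;
    CREATE TRIGGER assay_delete AFTER DELETE ON assay
    BEGIN
        INSERT INTO assay_audit (userid, action, sample_id, old_au_ppm)
        VALUES ((SELECT userid FROM session), 'DELETE', OLD.sample_id, OLD.au_ppm);
    END;
    """
)


def roll_back(conn, audit_id):
    """Undo one logged change; the undo is itself logged by the triggers."""
    action, sample_id, old_value = conn.execute(
        "SELECT action, sample_id, old_au_ppm FROM assay_audit "
        "WHERE audit_id = ?",
        (audit_id,),
    ).fetchone()
    if action == "UPDATE":
        conn.execute(
            "UPDATE assay SET au_ppm = ? WHERE sample_id = ?",
            (old_value, sample_id),
        )
    elif action == "DELETE":
        conn.execute("INSERT INTO assay VALUES (?, ?)", (sample_id, old_value))
    elif action == "INSERT":
        conn.execute("DELETE FROM assay WHERE sample_id = ?", (sample_id,))


conn.execute("INSERT INTO session VALUES ('jsmith')")
conn.execute("INSERT INTO assay VALUES ('S0001', 0.52)")                 # audit row 1
conn.execute("UPDATE assay SET au_ppm = 5.2 WHERE sample_id = 'S0001'")  # audit row 2
roll_back(conn, 2)                                                       # audit row 3

restored = conn.execute(
    "SELECT au_ppm FROM assay WHERE sample_id = 'S0001'"
).fetchone()[0]
actions = [row[0] for row in conn.execute(
    "SELECT action FROM assay_audit ORDER BY audit_id"
)]
```

Because the log stores the previous value, the mistaken edit can be reversed without restoring a backup, and the reversal leaves its own entry in the trail.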
14. Principle 11: External Database Audits
This is a self-explanatory principle – external audits provide an
independent assessment of the quality of the data stored and the
processes used in obtaining and approving it. Because large
investment decisions may be made on the basis of this data, it is
essential that this process occurs on a semi-regular basis.
15. Principle 12: Database Photo Management
While storing photos within a database is a recent technology (e.g. SQL
Filestream), its adoption is recommended for the following reasons:
• Current folder-based systems do not easily allow for integration into
other systems or software packages
• Accidental or malicious deletion may not be recoverable in
folder-based systems
• Folder-based systems currently offer no useful way to store
metadata about the image(s)
• Folder-based systems do not cater well for ATV/OTV images or
images from emerging technologies such as Hylogger.
Standardised Folder-based Photo Management
• If, due to budget or technology restrictions, it is not possible to
implement Principle 12, then a folder-based system is still better
than no system at all. In this case it is imperative that the correct
permissions be set up on the folders/system to minimise the risk of
accidental or malicious deletion. Steps should also be taken to back
up the system regularly for the same reason.
16. Principle 13: One GIS Software Standard
A simple principle, though one that often gets overruled by
personnel in islands of accountability standing their ground and
insisting that they need a particular system despite the fact that
no one else in the organisation is using it.
The advantages are obvious:
• Potential savings on licensing costs
• Elimination of conflict with IT groups, who logically want to
reduce the number of applications they need to cater for
• Data no longer gets doubled up, i.e. stored once for each system,
which removes the potential for multiple, unsynchronised versions
of the data (“multiple truths”)
• No constant conversion of data from one package to another,
which otherwise allows for mistakes and contamination,
particularly where coordinate system conversions are involved
17. Principle 14: Controlled GIS Data
This principle is primarily concerned with avoiding multiple truths
and lost data. In the application of this principle all GIS data is
published to a structured area and users are expected to access
this area for their GIS data. Other data sets brought into the
organisation must go through this process of being published
prior to use.
Implementation of this principle may be done simply with a folder
structure where proper permissions have been set to avoid
deletion or over-writing of the published data. A more
sophisticated option would be an environment such as SharePoint
where data can be checked in and out with full version control.
18. Principle 15: Centralised Grid Transformations
Grid and coordinate conversions are a constant source of error
and contamination within many organisations. Implementation of
this principle involves a sophisticated system where grid
definitions are entered into a database by a surveyor and their
userid is recorded against the entry, in much the same vein as in
Principle 9. The system must be capable of versioning these
definitions, as they do change over time.
The database then produces a definition file that is accessed by
the conversion software. The apparently complex part then is
integrating your GIS and other packages to utilize this conversion
software to do all coordinate conversions.
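To make the idea of versioned, surveyor-entered grid definitions concrete, here is a minimal sketch. It assumes a 2D conformal (four-parameter) transformation from a local mine grid to a projected grid. The source does not say which transformation type is used, and the class, the field names, the userids and every parameter value are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GridDefinition:
    """One version of a local-grid to projected-grid transformation."""
    name: str
    version: int
    effective_from: date
    entered_by: str       # surveyor's userid, recorded as in Principle 9
    scale: float
    rotation_deg: float
    shift_east: float
    shift_north: float

    def to_projected(self, x, y):
        """Apply a 2D conformal (four-parameter) transformation."""
        theta = math.radians(self.rotation_deg)
        a = self.scale * math.cos(theta)
        b = self.scale * math.sin(theta)
        return (a * x - b * y + self.shift_east,
                b * x + a * y + self.shift_north)


def definition_for(definitions, name, on_date):
    """Return the latest version of `name` that was in effect on `on_date`."""
    candidates = [d for d in definitions
                  if d.name == name and d.effective_from <= on_date]
    if not candidates:
        raise LookupError(f"No definition of {name!r} in effect on {on_date}")
    return max(candidates, key=lambda d: d.effective_from)


# Made-up definitions: the grid was re-surveyed in 2015, creating version 2.
definitions = [
    GridDefinition("MINE_GRID", 1, date(2010, 1, 1), "surveyor_a",
                   1.0, 0.0, 500000.0, 7000000.0),
    GridDefinition("MINE_GRID", 2, date(2015, 6, 1), "surveyor_b",
                   1.0, 90.0, 500000.0, 7000000.0),
]

old_def = definition_for(definitions, "MINE_GRID", date(2012, 3, 1))
new_def = definition_for(definitions, "MINE_GRID", date(2020, 1, 1))
print(old_def.version, old_def.to_projected(100.0, 50.0))
print(new_def.version, new_def.to_projected(100.0, 0.0))
```

Every package that converts coordinates would call the same lookup, so a change to the definition is made once, by a named surveyor, and older data can still be converted with the version that applied when it was collected.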
While the above may sound overly complex, the truth is that
the architecture and execution are not particularly difficult. What
this then allows for is:
Continued on next slide
19. Principle 15: Centralised Grid Transformations (cont)
• Elimination of multiple versions of coordinate conversion
formulas and macros that once released are impossible to
control.
• Following on from the above is the elimination of potentially
expensive mistakes caused by using the wrong or outdated
conversion facility.
Standardised Grid Transformations
• If it is not possible to implement Principle 15 as described
above, then it is recommended that surveyor-approved
transformation parameters or formulae are published to a
central area where they can be accessed by users, in much the
same way as discussed in Principle 14. This area is likely a
folder structure and as such should have the correct
permissions to prevent deletion or editing except by the
surveyors.
20. Principle 16: Controlled Geophysics Data
The principle in this instance is very similar to Principles 14 & 15
in that approved data is published to a central area, protected by
permissions, where users go to access the processed geophysical
data.
With regard to the raw geophysical data, while this is almost
useless to anybody but the geophysicists, the data should still be
stored in a protected folder system to prevent the contamination
or loss of the primary, unprocessed data.
21. Principle 17: Database Driven Tenement Management
22. Principle 18: Exploration Embedded IT People
This principle involves IT specialists who are embedded in, and paid
for by, the exploration group but have a reporting line through to
the company’s IT department. This is the preferable choice as the
personnel are fully exposed to the exploration requirements,
challenges and planning schedule but are grounded in the IT
requirements of security and, where feasible, standardisation.
Exploration-centric IT People
Should Principle 18 not be a feasible option, then the
organisation’s IT group should have support and architecture
people who devote a significant part of their focus to the
exploration group and are familiar with its requirements, which
sometimes change rapidly, and with the limitations and demands of
the remote environs in which exploration personnel frequently work.