UNIVERSITY OF ARKANSAS AT LITTLE ROCK
Information Quality Program

1. Information Quality and File System Management at the Department of Arkansas Heritage
By T.M. “Shelley” Keith
2. Department of Arkansas Heritage
Seven-“arm” state organization, plus a central director’s office
◦ Each arm has its own mission and staff.
◦ Some have their own regulatory requirements.
Identified issues with the file system and email
◦ Lack of naming conventions
◦ Operational inefficiencies
◦ Concerns about waste, archives, backups, resources
Digital photo storage
◦ Space, conventions, backups
Training
Step 1: Define Business Need and Approach
3. Approach Rationale
Quantify issues
◦ Verify problems identified by leadership
◦ What other problems exist that might be contributing to or more critical than what’s been reported?
Prioritize
◦ Triage identified issues and begin understanding the source
Define improvement
◦ What is “better” for this organization?
Plan
◦ What will it take to start making progress toward “better?”
4. Project Approach
Establish A Data Quality Baseline
◦ Step 1: Define Business Need and Approach
◦ Step 2: Analyze Information Environment
◦ Step 3: Assess Data Quality
◦ Step 4: Assess Business Impact
◦ Step 5: Identify Root Causes
◦ Step 6: Develop Improvement Plans
◦ Step 10: Communicate Actions and Results
Goal
◦ Uncover problems
◦ Determine which ones are worth addressing
◦ Identify root causes for high priority issues
◦ Develop realistic action plans
McGilvray, pp. 242-243
5. Project Goals
1. Assess the current ecosystem from an Information Quality perspective.
   I. Primary Dimensions
      i. Duplication
      ii. Ease of Use & Maintainability
      iii. Data Specifications
2. Provide a set of formal recommendations for naming conventions.
   I. Folder names and file system organization
   II. Metadata
   III. File names
3. Provide a path to, and structure for, unified, consistent file system governance.
Step 1: Define Business Need and Approach
6. The Organization

Department of Arkansas Heritage
◦ Director’s Office
◦ Museums
   ◦ Historic Arkansas Museum (HAM)
   ◦ Delta Cultural Center (DCC)
   ◦ Mosaic Templars Cultural Center (MTCC)
   ◦ Old State House Museum (OSH)
◦ Heritage Resource Agencies
   ◦ Arkansas Arts Council (AAC)
   ◦ Arkansas Natural Heritage Commission (ANHC)
   ◦ Arkansas Historic Preservation Program (AHPP)
Step 1: Define Business Need and Approach
7. DAH Network Access
Each agency has a dedicated network drive (T)
Each agency has access to a central shared drive (S)
Each user has their own personal network drive (U)
S: top-level folders: AAC, ANHC, Central, MTCC, AHPP, DCC, HAM, OSH
Step 2: Analyze the Information Environment
8. Project Plan & Tools
Plan
◦ File System Review
◦ Manual evaluation of the file names and folder structures across the network.
◦ Stakeholder Survey
◦ Understand perceptions across agencies and user types
◦ Administrative, Professional, Leadership
◦ Identify issues throughout the organization
◦ Uncover root causes
◦ File System Scan
◦ Quantitative measurements for the health of the file system
Tools
◦ MailChimp
◦ Google Form
◦ Microsoft Excel
◦ DiskBoss Pro
9. Stakeholder Survey
37 questions
◦ 46 input opportunities once broken down into the survey tool
◦ 5 required
112 responses from 213 employees emailed (53%)
Questions specific to Leadership & IT staff
Applicable to:
◦ Dimensions of Data Quality
◦ Business Impact Techniques
◦ Information Life Cycle
◦ 10-Step Process
Organized by:
◦ Theme
◦ Employee type
◦ Agency
10. Stakeholder Survey – IQ Map
| Information Life Cycle | Business Impact Technique | Dimension(s) of Data Quality | 10-Step Process | Theme |
| --- | --- | --- | --- | --- |
| Plan | Usage | Ease of Use | Define Business Need & Approach | General information |
| Obtain | Anecdotes | Duplication | Analyze Information Environment | Time spent on/frequency of encounters |
| Store & Share | Cost of Low-Quality Data | Timeliness & Availability | Assess Data Quality | Preferences |
| Maintain | Process Impact | Perception, Relevance, & Trust | Assess Business Impact | File storage behaviors |
| Apply | Ranking & Prioritization | Data Specifications | Identify Root Causes | Regulatory awareness |
| Dispose | | | Develop Improvement Plans | |
11. Survey Responses – Agency Information
Responses by agency:
◦ Arkansas Arts Council: 11 (10%)
◦ Arkansas Historic Preservation Program: 21 (19%)
◦ Arkansas Natural Heritage Commission: 18 (16%)
◦ Delta Cultural Center: 3 (3%)
◦ Director’s Office: 19 (17%)
◦ Historic Arkansas Museum: 16 (14%)
◦ Mosaic Templars Cultural Center: 8 (7%)
◦ Old State House Museum: 16 (14%)
12. Survey Responses – Category
Employee category:
◦ Leadership: 13 (12%)
◦ Administrative: 29 (26%)
◦ Professional: 70 (62%)

Responses by agency type:
◦ Director’s Office: 19 (17%)
◦ Museums: 43 (38%)
◦ Heritage Resource Agencies: 50 (45%)
13. Survey Responses – File Types

[Chart: file types reported by respondents, by count]
14. Survey Responses – File Findability

[Charts: “How easy is it to locate existing files?” — responses ordered easy (1) to hard (5), shown as counts by employee category (Administrative, Leadership, Professional) and as percentages per category alongside the overall average]
15. Survey Responses – Time & Frequency

26% reported recreating existing files because they couldn’t find the file they needed…
25% reported being unable to find the source file for an archive document type like PDF…
26% reported having to ask someone to email a file because they can’t find it or it’s stored where they don’t have access…
…at least once a month.
16. Survey Responses – Time & Frequency

32% reported encountering files that were supposed to be current, but actually contained outdated or incorrect information…
23% reported discovering conflicting copies of the same file…
…at least once a year.
17. Survey Responses – Time & Frequency

Time per week:
◦ Less than 5 hours: 98 (90%)
◦ Less than 10 hours: 7 (6%)
◦ Less than 20 hours: 3 (3%)
◦ 20 hours or more: 1 (1%)
18. Survey Responses – File Storage Behaviors

Storing files on non-network drives:
◦ Local: Yes 86%, No 14%
◦ External: Yes 53%, No 47%
19. Survey Responses – Regulatory Awareness

Organization-wide:
◦ Yes: 76 (70%)
◦ No: 33 (30%)

By category (No / Yes):
◦ Administrative: 9 / 18
◦ Leadership: 2 / 11
◦ Professional: 22 / 47
20. Survey Responses – Preferences

Presence of file naming preferences:
◦ Yes: 61%
◦ No: 39%

Examples (see the sketch below):
◦ [project number].[artifact_id]
◦ [location]_[year]_[description]
◦ [historic resource number]-[historic name]-[description]
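These reported patterns hint at how a convention could be checked automatically once one is adopted. The following is a minimal Python sketch, assuming a hypothetical rule modeled on the [location]_[year]_[description] example; the regular expression, function names, and sample values are illustrative only, not an adopted DAH standard.

```python
import re

# Hypothetical convention modeled on the reported [location]_[year]_[description]
# example; the exact pattern is an assumption, not an adopted DAH standard.
PATTERN = re.compile(
    r"^(?P<location>[a-z0-9-]+)_(?P<year>\d{4})_(?P<description>[a-z0-9-]+)\.(?P<ext>[a-z0-9]+)$"
)

def _slug(text):
    """Lowercase a label and collapse runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def build_name(location, year, description, ext):
    """Compose a file name that follows the assumed convention."""
    return f"{_slug(location)}_{year}_{_slug(description)}.{ext.lower()}"

def is_compliant(name):
    """Check whether an existing file name matches the assumed convention."""
    return PATTERN.match(name) is not None

if __name__ == "__main__":
    print(build_name("Old State House", 2018, "lobby exhibit photo", "JPG"))
    # -> old-state-house_2018_lobby-exhibit-photo.jpg
    print(is_compliant("old-state-house_2018_lobby-exhibit-photo.jpg"))  # True
    print(is_compliant("IMG_0042.JPG"))                                  # False
```

A compliance check along these lines could feed the recurring scans recommended later, reporting the share of files that do not match an agency’s chosen convention.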
21. File System Evaluation – Drive Scans
Columns: the shared S drive, the Central drive, and the agency T drives (AAC, ANHC, AHPP, OSH, MTCC, HAM, DCC).

| Measure | S | Central | AAC | ANHC | AHPP | OSH | MTCC | HAM | DCC | Totals |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wasted space (GB) | 9.2 | 0.33163 | 28.05 | 153.31 | 184.3 | 28.55 | 244.8 | 80.25 | 9.26 | 728.79 |
| Wasted space, by last accessed | 1-2 years | 1-3 months | 1-2 years | 3-5 years | 3-6 months | 1-2 years | 1-3 months | 6-12 months | 6-12 months | |
| Wasted space, by file type | JPG | JPG | JPG | JPG | JPG | TIF | TIF | JPG | TIF | |
| Disk space (GB) | 305.54 | 38.04 | 147.37 | 1380 | 1690 | 388.02 | 754.64 | 418.68 | 55.03 | 5122.2 |
| Disk space, by last modified | 5+ years | 2-3 years | 2-3 years | 5+ years | 5+ years | 5+ years | 5+ years | 5+ years | 5+ years | |
| Disk space, by file type | TIF | VHD | JPG | JPG | JPG | TIF | MTS | JPG | TIF | |
| % wasted | 3% | 1% | 19% | 11% | 11% | 7% | 32% | 19% | 17% | 14% |
| Number of files | 68,739 | 16,805 | 85,890 | 387,661 | 409,190 | 60,059 | 114,067 | 140,869 | 27,018 | 1,283,280 |
| Duplicate files | 9,101 | 1,046 | 11,699 | 88,089 | 56,439 | 5,080 | 8,613 | 33,555 | 3,086 | 213,622 |
| % duplicate | 13% | 6% | 14% | 23% | 14% | 8% | 8% | 24% | 11% | 17% |

The scans also recorded, for each drive, the user account associated with the most wasted space and the most disk space — most often Administrators, otherwise individual accounts (Jessica.Crenshaw, Shelle, bryan.mcdade, Patricia, Scotty, jaime) — and the age of the file population: on most drives, the largest share of files was last accessed between six months and five years ago and last modified five or more years ago.
22. Wasted Space

Wasted space per drive:
◦ S: 3%
◦ Central: 1%
◦ Arts (AAC): 19%
◦ ANHC: 11%
◦ AHPP: 11%
◦ OSH: 7%
◦ MTCC: 32%
◦ HAM: 19%
◦ DCC: 17%
23. Duplicate Files

Percentage of duplicate files on each drive:
◦ S: 13%
◦ Central: 6%
◦ Arts (AAC): 14%
◦ ANHC: 23%
◦ AHPP: 14%
◦ OSH: 8%
◦ MTCC: 8%
◦ HAM: 24%
◦ DCC: 11%
24. Network Waste

14% wasted disk space
17% duplicate files
25. File System Age

Reported age of files:
◦ 1 year: 38%
◦ 5 years: 34%
◦ 10 years: 17%
◦ Older: 11%

[Chart: last accessed — age buckets (< 1 year, 1-2 years, 2-3 years, 3-5 years, 5+ years) for wasted space, disk space, and file counts]
26. Stakeholder Support

Value perception, organization-wide (1 = not at all valuable, 5 = very valuable):
◦ 1: 4%
◦ 2: 4%
◦ 3: 13%
◦ 4: 29%
◦ 5: 50%
27. Recommendations
Create agency-level working groups to steward the resource. Include IT.
a. Naming conventions
b. Folder hierarchies
c. Metadata
d. Deletion/archiving plans
Create a central working group made up of agency stewards and IT.
a. Formalize and support the work being done at the agency level.
b. Establish “S” drive requirements for appropriate use, naming, and archiving.
Provide regular training on conventions, metadata, and the use of existing tools.
Continually scan network drives to identify areas of focus for working groups. Define and measure improvement (a minimal scan sketch follows below).
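As one way to picture what a recurring scan could report, here is a rough Python sketch that walks a drive, groups files by content hash, and computes the two baseline metrics used in this project: the share of duplicate files and the wasted space they occupy. It is a hedged stand-in for illustration, not the DiskBoss Pro configuration actually used; the drive path, hashing choice, and function names are assumptions.

```python
import hashlib
import os
from collections import defaultdict

def _sha256(path, chunk=1 << 20):
    """Hash a file's contents in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while block := handle.read(chunk):
            digest.update(block)
    return digest.hexdigest()

def scan_drive(root):
    """Group files by content hash; return totals plus duplicate-file and wasted-byte counts."""
    groups = defaultdict(list)            # content hash -> list of file sizes
    total_files = total_bytes = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                groups[_sha256(path)].append(size)
            except OSError:
                continue                   # unreadable file: skip it
            total_files += 1
            total_bytes += size
    duplicate_files = sum(len(sizes) - 1 for sizes in groups.values() if len(sizes) > 1)
    wasted_bytes = sum(sizes[0] * (len(sizes) - 1) for sizes in groups.values() if len(sizes) > 1)
    return total_files, total_bytes, duplicate_files, wasted_bytes

if __name__ == "__main__":
    files, used, dups, wasted = scan_drive("S:/")   # hypothetical shared-drive root
    print(f"{dups / max(files, 1):.0%} duplicate files, "
          f"{wasted / max(used, 1):.0%} of space wasted")
```

Re-running the same scan on a schedule and logging the two percentages per drive would give the working groups the comparative baseline this project calls for.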
28. Conclusion

Interview → Survey → Scans → Refine → Iterate

Sweeping change is not likely to render desired results. A non-invasive approach will allow agencies to establish conventions and protocols that work for their requirements while achieving the desired result of a cleaner, more efficient, more sustainable file system.
29. Future Considerations
Digital Asset Management
Geodatabase
SharePoint or another “intranet”-type file versioning tool
Editor’s notes
Regulatory requirements: archival needs, file content/format, naming conventions governed by other agencies they work closely with, etc.
Reality vs. perception. Leadership vs. the rest of the organization vs. file system scans.
Leadership had anecdotes and the desire for consistency, but no hard data to understand the actual state of their file system. They also didn’t have a clear understanding of how issues were impacting the whole organization.
Establish a data quality baseline
10-Step Process
Business Impact Techniques
Information Life Cycle (POSMAD)
Data Quality Dimensions
This project focused on the S and T drives.
The start of building comparative data for measuring progress over time.
Attempt to ensure that any corrective measures don’t miss the mark.
In many cases, the multiple choice questions existed just to get the user in the right mindset to respond to the long-answer portion. So much value comes from letting people tell their stories.
Use cases, anecdotes
Frustrations with systems and services
The survey responses helped inform the scan priorities. The scans gave context to survey results.
Use steps 1-6 of the 10-Step Process, as prescribed by the project approach. Note that Step 10 isn’t reflected in the survey.
The largest group was the Historic Preservation program with 21 respondents, followed by the Director’s office, the Natural Heritage Commission, and then by a tie between the Historic Arkansas Museum and the Old State House Museum. Delta Cultural Center contributed 3 responses to the survey.
Respondents were asked “How easy is it to locate existing files?” The overall responses clearly skewed “easy” in raw numbers, but when it was broken down into percentages for each employee category, we see that, on average, a larger percentage of leadership and professional respondents skewed toward “hard.” These opposing trends may indicate that users have adapted to the system, or that users don’t perceive reported issues as factors that increase the level of difficulty of file findability. It may also indicate that leadership and professional users rely on administrative staff for some of these functions.
A quarter of the responses indicated consistent problems finding files.
Many of the comments provided examples of files renamed or deleted by coworkers and of mislabeled or misfiled files; one respondent mentioned a file that had been password protected by a former employee. The most common issues associated with difficult-to-find files were those involving images and GIS data. The lack of consistent metadata was repeatedly cited as a contributing factor.
Emailing files: this is of concern because it results in duplicate files and/or multiple versions of files across the network.
Seventeen percent of respondents indicated they regularly create files only to discover a similar file already exists. Note that these figures only reflect self-reported instances of these scenarios. In the case of discovering that files already exist, the probability is high that existing files go undiscovered as well.
Question 10: How often have you encountered files that were supposed to be current, but actually contained outdated or incorrect information? This issue can arise because files haven’t been updated to match new information but are still the most current version, because updated versions are being stored elsewhere, or because an old version is the most readily available to the respondent. Thirty-two percent responded that they encounter this issue more than once per year.
Question 11: How often have you encountered conflicting copies of the same file or form? Question 11 measures the frequency of discovering conflicting files, rather than simply outdated ones. Twenty-three percent of respondents encounter this issue more than once per year.
Respondents were asked to self-report the impact of information quality issues in the DAH file system in terms of hours per week. Ninety percent indicated less than 5 hours per week spent on these scenarios. Overall, while frequency may be an issue for some situations, the actual time spent working through these problems is perceived to be minimal.
Keeping copies of files is a common, and sometimes necessary, behavior in networked environments. It can also be indicative of and a contributor to versioning and duplication issues. Eighty-six percent of respondents reported keeping copies of files on their computer.
Like storing files on a local computer rather than on the network drives, storing files or copies in the cloud or on other external devices can be a best practice for archiving purposes, but can also lead to versioning and duplication issues. Fifty-three percent of respondents reported storing files in this way. Reasons cited include easier sharing of files, fear of loss due to drive failure, and the ability to access files from outside the office.
Discussions with leadership indicated the possibility that employees within the agencies might not be fully aware of internal and external regulatory requirements governing the storage and deletion of files. Seventy percent of survey respondents indicated an awareness that there was a policy or policies. However, a request to describe the requirement resulted in responses ranging from “I have no idea” to “3 years” to “until legislators say it’s ok to delete them” and even “someone else keeps up with that.” Couple this lack of awareness with repeated reports of “other people” deleting needed files from network drives and we start to see some of the root causes of the issues overall.
Sixty-one percent of respondents indicated they already have a method for naming files, and many provided examples in the comments. While some respondents had very general guidelines, such as date and location for photos relevant to a geographic area, some were very specific in their methods. Examples are included above.
Many examples of hierarchical folder structures were provided to sort files in ways that were appropriate to the agency. One respondent indicated use of National Park Service naming conventions, and another suggested adherence to an ISO standard. The diversity of use cases throughout DAH will factor heavily into any efforts for file name and folder structure consistency.
The leadership interview and employee survey informed the list of needs for the file system evaluation. Not only did we need to understand the state of the file system, we also needed to understand whether the perceptions captured in the survey were reflected in its actual state.
Thorough scans were taken of each shared drive, and a set of key indicators (age of the file system, file types, and users of interest), shown in the table, was selected to profile the overall health of the file system. File system age is demonstrated by (1) how long ago the largest portion of the drive was last accessed and last modified, and (2) how long ago the largest portion of the duplicate space was last accessed.
In addition to the evaluation criteria, we’re also able to see the usernames associated with wasted space, who uses the most disk space, who has the most duplicate files, which file types are the most common, and which are the most commonly duplicated.
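For illustration only, an age profile like this can be approximated with a short script that buckets every file by its last-accessed and last-modified timestamps. The Python sketch below is a hypothetical approximation, not the DiskBoss Pro report used in the project; the bucket boundaries are assumed to mirror the ranges shown in the table, and the drive path is a placeholder.

```python
import os
import time

# Age buckets assumed to mirror the ranges in the scan table; the tool's own
# binning may differ.
BUCKETS = [("< 1 year", 365), ("1-2 years", 730), ("2-3 years", 1095),
           ("3-5 years", 1825), ("5+ years", float("inf"))]

def bucket(age_days):
    """Return the label of the first bucket whose upper bound covers age_days."""
    return next(label for label, limit in BUCKETS if age_days <= limit)

def age_profile(root):
    """Tally file count and bytes per (indicator, bucket) for last accessed and last modified."""
    now = time.time()
    profile = {}   # (indicator, bucket label) -> [file count, total bytes]
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue
            for indicator, stamp in (("accessed", info.st_atime), ("modified", info.st_mtime)):
                entry = profile.setdefault((indicator, bucket((now - stamp) / 86400)), [0, 0])
                entry[0] += 1
                entry[1] += info.st_size
    return profile

if __name__ == "__main__":
    profile = age_profile("T:/")   # placeholder for an agency T drive
    modified = {label: totals for (kind, label), totals in profile.items() if kind == "modified"}
    # The bucket holding the most bytes corresponds to the "by modified" indicator in the table.
    print(max(modified, key=lambda label: modified[label][1]))
```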
14% of the scanned file system (729 GB) qualifies as ‘wasted space’, meaning it is occupied by duplicated files.
17% of the files on the system (213,622) are duplicates.
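Both figures follow directly from the scan totals: 728.79 GB of wasted space out of 5,122.2 GB scanned is 728.79 / 5,122.2 ≈ 14%, and 213,622 duplicate files out of 1,283,280 files is 213,622 / 1,283,280 ≈ 17%.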
One of the key indicators of the age of the file system is when files were last accessed, which includes simply opening them. The chart shows the last-accessed times for the largest portion of the disk space or files of each type. To clarify, the wasted space indicator is the last-opened time for the largest chunk of wasted files on each drive; disk space is calculated the same way; files are counted by actual numbers of files.
What we see here is a comparison between the perceived age of the files users interact with and an actual reading of how recently different types of files are accessed.
Respondents were asked to rate on a scale of 1 (Not at all valuable) to 5 (Very valuable) how valuable they thought a consistent naming convention would be in the context of finding files and information. Only eight percent of respondents felt a level of consistency had little to no value. Overwhelmingly, respondents felt consistency would be valuable, with half of respondents feeling there was significant value in the effort. Many did voice concerns about the level of effort such an initiative would require. Those with well-developed naming and storage methods, especially those in use agency-wide, were opposed to imposed standards that would require significant time and manpower to adopt.
Realistic action plan(s) – Accountability, ownership, relevance.
Create agency-specific naming conventions. Consider legacy files.
Create agency-specific file cleanup plans to remove drafts and unnecessary versions.
Establish agency-specific metadata conventions. Consider metadata editors.
Provide training on existing tools such as the Microsoft Suite, Adobe, etc.
Folders should be relevant to the organization, not individuals. (Personal files should be stored on non-work spaces)
Formalize IT processes and communication surrounding backups, archiving, and loss recovery. It would be beneficial for IT to clearly define and make available archiving and backup protocols, including when backups are set to run. Make it known when issues occur, such as backup failures, and communicate updates and changes early and often.
Note the cyclical nature. Use the baselines to measure improvement. Keep testing.
Interview/survey – measure perceptions, identify things to test
Scan – tests of previous and new metrics. How you identify actual progress or lack thereof.
Refine, iterate – identify changes to be made for the next cycle (almost agile, selecting the focus of the next sprint), update survey instrument, select new scans to be run/old to be dropped
The overwhelming majority of stakeholders, including leadership and IT, support a more sustainable, strategic file system management process. The least optimal outcome of this initiative, per leadership and IT survey responses, is to do nothing. It should not go unnoticed that the status quo is designated ‘least optimal’ in these responses. The optimal outcome is agency-specific plans designed to support the needs and requirements of each branch of DAH, alleviate some of the pressure on the physical servers, and make transparent the naming, storing, and archiving of files throughout the organization.
As this is the preliminary study in a long-term rehabilitation effort, it will be necessary to re-survey stakeholders regularly to determine needed course adjustments and identify new issues.
Based on stakeholder feedback, there is significant interest and perceived need for both a Digital Asset Management System (DAM) and a geodatabase. Enterprise cloud storage options may alleviate some of the network-based issues associated with archiving and availability. Additionally, tools that include change management and version control options, such as Microsoft SharePoint, could serve to correct user behaviors regarding archiving and versioning. These considerations should not be taken as recommendations, but rather as starting points for further evaluation and review.
It should be noted that the first steps recommended here are necessary for successful implementation of enterprise systems.