Data plays a big role at Tableau, not just for our customers but throughout our company. Using our own products is one of our fundamental company values, and the analyses and discoveries we make shape our development processes and influence our day-to-day decisions. In this talk, we present and analyze a variety of data visualizations based on Perforce data from our development organization and share how that analysis has influenced our infrastructure and development practices.
Using Perforce Data in Development at Tableau
1. Using Perforce Data in
Development at Tableau
Ed Mack
Staff Systems Software Engineer
Robert Orr
Systems Software Engineer
2. 2
Tableau Software
Award-winning data analytics software that helps people see and
understand data.
More than 35,000 customer accounts get rapid results with Tableau
in the office and on-the-go, and tens of thousands of people use
Tableau Public to share data in their blogs and websites.
Check out our products by downloading a free trial at
www.tableau.com/trial.
3. 3
Using Tableau to Analyze Perforce Data
We use data to help answer questions in our day-to-day
work and to help us make decisions for the future.
Perforce provides a lot of data -- beyond basic changelist
and file activity.
Let’s take a look at our data sources and the Tableau
visualizations (vizes) we’ve created for analysis.
4. 4
Data Sources
Direct calls to Perforce servers
• Current data
• Historical data
• Calls to ‘p4’ or P4Python API
P4toDB
• A read-only replica feeds a Postgres database
5. 5
Data Sources
Custom views, tables, and databases
• “ChangesByHumans” view added to P4toDB database to filter out
background users
• Join table between P4toDB and TFS
• Integrations table
• Table with server names, types, and locations
• CSV files
• Tableau Extracts and Data Sources
6. 6
P4toDB
Supported by Perforce -- no new development
since 2012.1
Lessons learned
• List tables to include rather than exclude (the default)
• Upgrade rather than rebuild (when possible)
- Metadata table is P4TODB_CFG
7. 7
Areas of Data Tracking
General health & monitoring
Codeline queries & analysis
Infrastructure planning
Historical analysis
8. 8
General Health & Monitoring
Usual hardware monitors
• Disk space, available RAM, CPU usage, etc.
Perforce-specific monitors
• Number of active processes (“p4 monitor show”)
• Long-running processes (“p4 monitor show -ale”)
• Number & age of workspaces
- Number/Growth: “p4 files //spec/client/...@date”
- Age (last accessed): “p4 clients” (P4Python)
• Replica lag (“p4 pull”)
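
The monitors above are mostly thin wrappers around `p4` output. As a minimal sketch, here is how the "p4 monitor show" checks might parse the command's output; the line format (pid, status, user, elapsed time, command) matches typical server output, but treat it as an illustrative assumption and adjust for your server version.

```python
def parse_monitor_show(output):
    """Return (pid, status, user, elapsed_seconds, command) tuples
    parsed from assumed 'p4 monitor show' output lines."""
    rows = []
    for line in output.strip().splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        pid, status, user, elapsed, command = parts
        h, m, s = (int(x) for x in elapsed.split(":"))
        rows.append((int(pid), status, user, h * 3600 + m * 60 + s, command))
    return rows

def long_running(rows, threshold_seconds=600):
    """Processes running longer than the threshold (candidates for alerts)."""
    return [r for r in rows if r[3] > threshold_seconds]

sample = """\
8764 R builder 00:00:09 sync
9123 R alice 00:45:02 integrate
9188 I bob 00:00:01 IDLE
"""
rows = parse_monitor_show(sample)
print(len(rows))                 # 3 processes reported
print(long_running(rows)[0][0])  # pid 9123 has been running 45 minutes
```

Counting the parsed rows gives the "number of active processes" metric, and the threshold filter drives the long-running-command notifications.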
9. 9
Replica Lag
How the process works
• Script runs every 30 seconds
- Running p4 pull -ls and p4 pull -lj commands against our
servers
• Returned data is parsed & stored in a database table
• Data is currently:
- About 9 million records, back to 06-2014
- Total size 1.4 GB
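
The parsing step of the 30-second poller above can be sketched as follows. The "p4 pull -lj" output format shown here is illustrative; the exact wording varies by server version, so check yours before relying on the regex.

```python
import re

# Assumed (illustrative) 'p4 pull -lj' state lines for replica and master.
STATE_RE = re.compile(
    r"Current (replica|master) journal state is:\s+Journal (\d+),\s+Sequence (\d+)")

def journal_lag(pull_lj_output):
    """Return (journal, byte_lag) parsed from 'p4 pull -lj' output,
    or None if the journal numbers differ (lag spans a rotation)."""
    state = {}
    for role, journal, seq in STATE_RE.findall(pull_lj_output):
        state[role] = (int(journal), int(seq))
    rj, rs = state["replica"]
    mj, ms = state["master"]
    if rj != mj:
        return None
    return rj, ms - rs

sample = """\
Current replica journal state is:  Journal 1237,  Sequence 2680510.
Current master journal state is:   Journal 1237,  Sequence 2681120.
"""
print(journal_lag(sample))  # (1237, 610)
```

Each poll stores one such record per replica, which is what accumulates into the ~9 million rows mentioned above.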
17. 17
Integration Tracking
Extensive mining of the P4ToDB database
• Tracks the path of integrations through multiple branches
• On a changelist level instead of individual file level
2-part process
18. 18
Integration Tracking: Part 1
Script runs against our p4todb database every
5 minutes, looking for new integration records.
- Ignores streams
- Generates a de-duped list of ‘from cl’, ‘from branch’, ‘to cl’,
‘to branch’ tuples, for each file in each integration
- Most integrations are small, so processing takes only a few
seconds per changelist.
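
The de-duplication step above can be sketched like this; the field names are hypothetical, but P4toDB's integration rows carry similar per-file information.

```python
def dedupe_integrations(file_records):
    """Collapse per-file integration records into unique
    (from_cl, from_branch, to_cl, to_branch) tuples, preserving order."""
    seen = set()
    tuples = []
    for rec in file_records:
        key = (rec["from_cl"], rec["from_branch"], rec["to_cl"], rec["to_branch"])
        if key not in seen:
            seen.add(key)
            tuples.append(key)
    return tuples

records = [
    {"from_cl": 100, "from_branch": "main", "to_cl": 140, "to_branch": "teamA", "file": "a.cpp"},
    {"from_cl": 100, "from_branch": "main", "to_cl": 140, "to_branch": "teamA", "file": "b.cpp"},
    {"from_cl": 101, "from_branch": "main", "to_cl": 141, "to_branch": "teamB", "file": "a.cpp"},
]
print(dedupe_integrations(records))
# [(100, 'main', 140, 'teamA'), (101, 'main', 141, 'teamB')]
```

Because most integrations touch the same small set of branch pairs, the de-duped list is tiny compared to the per-file records, which is why processing takes only seconds per changelist.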
19. 19
Integration Tracking: Part 2
A web application accepts queries for
integration changelists
• Searches backwards recursively through pre-computed records for
changelists whose ‘to cl’ is the target
• Typically, the recursive search goes back only a few levels;
sometimes it can be many.
• Response time is variable, but usable in most cases.
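
The backward search above amounts to a graph walk over the pre-computed tuples: follow every 'to cl' edge back to its 'from cl', repeatedly. A minimal sketch, with hypothetical data (the real service queries the database table):

```python
def upstream_changelists(tuples, target_cl):
    """Return the set of changelists reachable by walking
    'to_cl' -> 'from_cl' edges backwards from target_cl."""
    by_to = {}
    for from_cl, _from_branch, to_cl, _to_branch in tuples:
        by_to.setdefault(to_cl, []).append(from_cl)
    found = set()
    stack = [target_cl]
    while stack:
        cl = stack.pop()
        for src in by_to.get(cl, []):
            if src not in found:
                found.add(src)
                stack.append(src)  # keep walking further upstream
    return found

edges = [
    (100, "main", 140, "teamA"),
    (140, "teamA", 200, "teamB"),
    (150, "main", 200, "teamB"),
]
print(sorted(upstream_changelists(edges, 200)))  # [100, 140, 150]
```

The visited-set check is what keeps response time bounded when the integration history is deep or contains cycles.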
21. 21
Infrastructure Planning
Size and growth of server db files
Number of Perforce clients
• “p4 files //spec/client/*@date” (ignore deleted revisions)
License usage
• Commands
- “p4 files //spec/user/*@date” (ignore deleted revisions)
• Determine “active” users
- Working backwards, draw a decreasing graph (assumes no reduction in head
count)
32. 32
Where to go from here?
We’ve given you ideas for analyzing your own Perforce data
What are your ideas?
Hint: Identify and collect data that helps you answer
questions you have now and that’s broad enough to help
you answer new questions.
P4toDB is a great source!
SPEAKER: Ed
Tableau’s mission is to help people see and understand their data ... including our own. It is one of Tableau’s core values that each of us uses our products as part of our job.
We use data—with our products and others—to monitor our development infrastructure and processes, to discover trends and outliers, and to validate changes made to our systems and processes.
SPEAKER: Ed
- Mostly, data visualizations created in Tableau. There are other products you can use to analyze your data, including Perforce Insights.
- Tableau products make it easy to create powerful, insightful, and delightful data visualizations. But what is important is the data: asking questions of it and making decisions based on it.
- Perforce produces a lot of data, even beyond the simple questions of “How many changelists are submitted each day?” or “Who submitted the most changelists last month?”
- We believe we collect interesting Perforce data and that we analyze and present it in interesting ways. We’re going to show you what we’ve done and, hopefully, we will give you some ideas for what you might do with your data.
SPEAKER: Robert
We use several sources for Perforce data:
Direct calls to the servers
Either by using ‘p4’ calls or the P4Python API.
P4toDB
- A read-only replica of our commit server populates a Postgres database
SPEAKER: Robert
We have created custom views, tables, and databases
We’ve added views in P4toDB, like “ChangesByHumans”, to filter out background users, like our automated build user.
We created a join table between P4toDB and TFS to support our Perforce/TFS integration.
An integrations table that is used to report where a changelist has been integrated to. We will describe this in more detail later in this presentation.
A table for server names, types, and locations is used in our system health monitoring.
We use a few CSV files for selected data, populated by ‘p4’ or P4Python.
A Tableau Extract is a compressed snapshot of data stored on disk or in memory. It is a database optimized for Tableau vizes.
SPEAKER: Robert
- There has been no new P4toDB development since 2012.1; Perforce Insights may be intended to replace it.
- We had to rebuild the database from scratch several times, taking a few days to complete.
- The default configuration lists tables to exclude, which will halt the process if an upgrade adds new tables. We recommend instead listing the tables you want included.
- The metadata table in P4toDB is P4TODB_CFG. The “TABLES” column contains the list of replicated tables, with the version-level of each table.
SPEAKER: Ed
General Health and Monitoring
This is the usual monitoring of disk space, CPU & RAM usage, etc., along with some custom monitors: replica lag, number of active commands, and notification of long-running commands.
Codeline Queries & Analysis
- Velocity: changes/day, lines changed/day, etc.
- What has been integrated where
Planning
- Projecting growth in data or license usage
Historical Analysis
- Analysis of past trends
- Analysis of past decisions
SPEAKER: Ed
We use Zabbix to monitor the health of our systems.
This includes the usual monitoring of disk space, CPU & RAM usage, etc. and some custom monitors:
replica lag
number of active commands
notification of long-running commands
SPEAKER: Robert
After deploying forwarding replicas, we periodically experienced replica lag, especially with our most geographically remote development office. For this data source, we capture the output of “p4 pull -lj” and “p4 pull -ls” in 30-second intervals, for each replica and edge server. This data is used more for historical analysis than for real-time monitoring, but it did influence our decision to deploy edge servers.
SPEAKER: Robert
IMAGE OR VIZ: replica-lag-1
SPEAKER: Robert
IMAGE OR VIZ: replica-lag-2
Largest integration lag since we started recording lag data.
SPEAKER: Ed
What’s in a branch?
- Analysis of third-party vs. non-third-party content in a branch.
Branch changelists & planning
- Planning retirement of branches, moving toward a single mainline.
Edited backports
- Tracking backports that may lead to code loss.
Who knows the code?
- A viz that helps people find code experts.
Integration tracking
- Reporting where changelists have integrated into downstream branches.
SPEAKER: Ed
VIZ OR IMAGE: What’s in a branch?
- Breakdown of major code modules, by number of files, in a product branch.
- 3rd-party components are integrated into product branches.
- Question: “How much of a branch is populated by third-party files, compared to other product components?”
- Answer: More than half the branch is made up of 3rd-party files.
- Revision Graph: shows one revision of one unchanged component version; nothing but integrations.
- Lower-right shows relative number of changes to 3rd-party vs. product code.
- This dashboard helped motivate us to move 3rd-party files from product branches to a shared dependency scheme, reducing branch size—saving disk space and sync time.
SPEAKER: Ed
VIZ OR IMAGE: Team Branch Breakdown
Tableau Development is moving toward working on a single mainline branch. We have been retiring “team” branches, combining teams into the remaining branches. One of the factors in deciding which branch is retired next, and where its developers will end up, is the changelist activity in each branch.
In this viz, we can see that “Branch E” is the least active branch, with an average of 13.3 changelists per day. And it looks like “Branch D” may be a good candidate to receive the bulk of those developers.
SPEAKER: Ed
VIZ OR IMAGE: Edited Backports
We have a problematic integration pattern that we must monitor. It stems from a few practices that may be unusual: 1) we have multiple “team” branches as children of “main”; 2) we use “null integrations” (a resolve accepting “yours”) when a fix doesn’t apply to the next downstream branch; and 3) developers sometimes backport from a team branch, creating a parallel merge path that can be a problem if the null integration reaches the mainline first.
To monitor this issue, we created a dashboard that uses the INTEGED table data from P4toDB to report backports, which are matched simply by from/to depot path patterns.
When a new one appears, we investigate and enter the details in a spreadsheet which feeds into the dashboard.
SPEAKER: Robert
VIZ OR IMAGE: Who knows about X?
This dashboard accepts a file or directory pattern and returns a list of people who have worked on the matching items. One can filter by the age of the changes, by file extension, and by branch.
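
The query behind this dashboard is essentially an aggregation over change records. A minimal sketch, with a hypothetical record shape of (user, depot path, year) and the pattern and extension filters mentioned above:

```python
from collections import Counter
from fnmatch import fnmatch

def code_experts(changes, path_pattern, extension=None, since_year=None):
    """Rank users by number of changes touching files that match the
    pattern, optionally filtered by file extension and change age."""
    counts = Counter()
    for user, path, year in changes:
        if not fnmatch(path, path_pattern):
            continue
        if extension and not path.endswith("." + extension):
            continue
        if since_year and year < since_year:
            continue
        counts[user] += 1
    return counts.most_common()

changes = [
    ("alice", "//depot/main/render/draw.cpp", 2014),
    ("alice", "//depot/main/render/draw.h", 2014),
    ("bob",   "//depot/main/render/draw.cpp", 2012),
    ("carol", "//depot/main/ui/panel.cpp", 2014),
]
print(code_experts(changes, "//depot/main/render/*"))
# [('alice', 2), ('bob', 1)]
```

A branch filter works the same way as the extension filter; the dashboard just exposes these as interactive controls over the underlying table.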
SPEAKER: Robert
This also connects to TFS through Perforce triggers that capture TFS item IDs from the changelist description and store them in a database, allowing us to display the changelists and integrations associated with a given TFS item.
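
The ID-capture step in the trigger can be sketched with a regular expression. The “TFS12345” token format below is a hypothetical convention for illustration; match whatever convention your changelist descriptions actually use.

```python
import re

# Hypothetical work-item token format: 'TFS' followed by digits.
TFS_ID_RE = re.compile(r"\bTFS(\d+)\b")

def tfs_ids(description):
    """Return all TFS work-item IDs referenced in a changelist description."""
    return [int(m) for m in TFS_ID_RE.findall(description)]

desc = "Fix crash on resize. TFS4821, also closes TFS4990."
print(tfs_ids(desc))  # [4821, 4990]
```

The trigger stores each (changelist, item ID) pair, which is what lets the dashboard join Perforce activity to TFS items.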
SPEAKER: Ed
Size and growth of server db files
- From an offline instance that is rebuilt daily
Number of Perforce clients
- Uses the spec depot history
License Usage
- Also uses the spec depot history
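
Both spec-depot counts above come down to counting “p4 files” output lines while skipping deleted revisions. A minimal sketch; the output line format below matches typical “p4 files” output, but verify it against your server.

```python
import re

# Assumed 'p4 files' line shape: //path#rev - action change N (type)
FILES_RE = re.compile(r"^(\S+)#\d+ - (\S+) change \d+")

def count_live_specs(p4_files_output):
    """Count spec revisions whose head action is not a delete."""
    live = 0
    for line in p4_files_output.strip().splitlines():
        m = FILES_RE.match(line)
        if m and "delete" not in m.group(2):
            live += 1
    return live

sample = """\
//spec/client/ws1.p4s#3 - edit change 1201 (text)
//spec/client/ws2.p4s#1 - add change 1188 (text)
//spec/client/old.p4s#5 - delete change 1150 (text)
"""
print(count_live_specs(sample))  # 2
```

Running this against “//spec/client/*@date” (or “//spec/user/*@date”) for a series of dates yields the growth curves used in the planning vizes.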
SPEAKER: Ed
VIZ OR IMAGE: db file size
- Files can be filtered out, to change the vertical axis scaling
- We can see when clients were deleted or moved to an edge server.
- Does anyone know what caused the big drop in db.have data?
SPEAKER: Ed
VIZ OR IMAGE : db-size-change
- The change in size by percentage
- I use this viz to spot unusual changes.
- Note where we migrated labels from traditional to “smart” labels.
SPEAKER: Ed
VIZ OR IMAGE : clients-by-server
Here you can see a split between our commit server, two build farm edge servers, and an end-user edge server. As we migrate more users to edge servers, we will see the commit server count decrease.
SPEAKER: Ed
VIZ OR IMAGE : clients-by-user
In this viz, we can filter by server, by location, and/or by last access date.
SPEAKER: Ed
VIZ OR IMAGE: active-license-growth
Using the spec history, we create charts of the number of users, using different timelines to generate different forecasts.
We use Tableau’s forecasting analysis to help determine how many new licenses we need for the next quarter. Tableau has been growing fast, and it has been challenging to stay ahead of that growth without having to order new licenses mid-quarter.
While building this viz, I discovered a lot of ups and downs in the graphs, even though Tableau was only growing. The drops reflected times when we were close to our license limit, discovered licenses that weren’t being used, and deleted them. Because those licenses were never really in use, we created an algorithm that eliminates them from the data generated by querying the spec depot.
The resulting data reflects “active” users, and the lines only increase. Surprisingly, the forecasted line was almost identical to the one from the raw data, but the confidence bands for the “active” user visualization are tighter. Note that this “active” algorithm only works while head count is growing.
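
The “active users” cleanup described above can be sketched as a backward pass over the observed counts: walking from the latest point backwards, cap each value at the value that follows it. Drops caused by deleting unused licenses disappear, leaving a non-decreasing series. This is a sketch under the stated assumption that head count never actually shrinks.

```python
def active_users(observed):
    """Backward pass: active[i] = min(observed[i], active[i+1]),
    producing a non-decreasing 'active users' series."""
    active = list(observed)
    for i in range(len(active) - 2, -1, -1):
        active[i] = min(active[i], active[i + 1])
    return active

# Illustrative monthly license counts with cleanup-induced dips.
observed = [410, 430, 425, 440, 455, 450, 470]
print(active_users(observed))  # [410, 425, 425, 440, 450, 450, 470]
```

The smoothed series feeds the forecast, which is why its confidence bands come out tighter than those of the raw counts.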
SPEAKER: Robert
Changelist History
- A dashboard that displays changelist history in different ways
Edge Server Migration
- A look at the data behind our migration to edge servers
SPEAKER: Ed
VIZ OR IMAGE: changelist-history
This was my first Perforce viz. I originally built it from a CSV file, using Perl to format the output of “p4 changes”; now it’s backed by our P4toDB database.
It’s a basic dashboard, but selecting marks in any one pane immediately filters the others, so it’s fun and interesting to play with.
SPEAKER: Ed
VIZ OR IMAGE: edge-server-arrival-1
This viz shows the typical replica lag we experienced before we had an edge server, and when we didn’t have very many build machines syncing against our servers. We had forwarding replicas, but all were referencing a single commit server.
Our Zabbix notification threshold was 100MB for five minutes. You can see the threshold line in the visualization. Our Fremont replica experienced occasional spikes and Menlo Park—a more distant office—experienced these more frequently, but they usually cleared fairly quickly.
SPEAKER: Ed
VIZ OR IMAGE: edge-server-arrival-2
The second viz reflects a doubling of the number of build machines. Many “syncs” were into fresh workspaces (and our CI app would frequently refresh workspaces using “sync -f”).
You can see that the number of spikes has increased at all locations, with the Menlo Park office suffering from serious and extended replica lag. The largest gap lasted almost an hour.
We deployed an edge server within a few days ....
SPEAKER: Ed
VIZ OR IMAGE: edge-server-arrival-3
This is what lag looked like after we deployed the build farm edge server. You can see that over this 24-hour period, neither Seattle nor Kirkland had lag that spiked above the threshold, and although Menlo Park’s server had broken the threshold, the spikes were fewer and smaller than before we added the new build machines.
SPEAKER: Ed
We showed you some of what we have done with Perforce data. We hope we have given you some ideas for what you might do with your data.
One thing we suggest is to collect a broad enough set of data not only to answer the questions you have now, but to allow you to ask new questions of the data.
P4toDB has been a great data source for us. We—who manage the Perforce servers—aren’t the only ones using this data. Many others in Development have created visualizations to answer their questions.