How using Git together with software development best practices learned from Open Source development projects can increase efficiency and turnover for your activity
4. Case 1a: Exploring historical changes
A customer reports an issue with IE8, which I know we’ve already fixed somewhere, at some point.
I can only remember that we fixed it through a change in the script/ directory, and that the
person responsible for it described the change using the term “IE8”.
I’m also convinced that, although the exact same change might be useless, I’m going to find a very
good source of inspiration in it.
How do I find that change again?
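On the command line, one way to answer this is to filter the history by commit message and by path. The sketch below builds a throwaway repository (all file names and commit messages are invented for the demonstration) and then lists the commits that mention “IE8” and touch script/. It assumes the author of the fix did write “IE8” in the commit message.

```shell
#!/bin/sh
# Throwaway demo repository; every file name and message is hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir script docs
echo 'var a = 1;' > script/menu.js
git add . && git commit -q -m "Add menu script"
echo 'var a = 2; // IE8 workaround' > script/menu.js
git commit -q -am "Fix menu rendering in IE8"
echo 'IE8 notes' > docs/notes.txt
git add . && git commit -q -m "Document IE8 support"

# Commits whose message mentions "IE8" (case-insensitive) AND that touch script/
found=$(git log -i --grep='IE8' --format='%s' -- script/)
echo "$found"
```

Adding -p to the same git log command also prints the diff of each matching commit, which is where the inspiration for the new fix would come from.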
5. Case 1b: Knowing what other people changed
I’m coming back from a four-week holiday.
A lot has been going on while I wasn’t there, and before I left, I was working on a pretty serious
new feature that required modifications to several files.
Now I’m back, and I want to get back to work on that new feature, but I don’t know if my colleagues
have modified the same files and, if I’m not careful, I might overwrite their changes or the other way
around.
How can I know what files have been modified?
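A possible command-line answer is to compare what each side changed since the two branches diverged. The sketch below uses a throwaway repository with an invented branch name (feature-x) and invented files; it is only an illustration of the idea, not a prescribed procedure.

```shell
#!/bin/sh
# Throwaway demo repository; branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'one\n' > a.txt; printf 'one\n' > b.txt; printf 'one\n' > c.txt
git add . && git commit -q -m "Add initial files"
git branch -M master

git checkout -q -b feature-x             # my work before the holiday
printf 'mine\n' >> a.txt; printf 'mine\n' >> b.txt
git commit -q -am "Start feature X"

git checkout -q master                   # what colleagues did meanwhile
printf 'theirs\n' >> b.txt; printf 'theirs\n' >> c.txt
git commit -q -am "Update b and c"

# A...B compares B with the point where A and B diverged
theirs=$(git diff --name-only feature-x...master)
mine=$(git diff --name-only master...feature-x)
# Files modified on both sides are the ones to handle with care
overlap=$(printf '%s\n' "$theirs" "$mine" | sort | uniq -d)
echo "Changed on both sides: $overlap"
```

In a real project, a git fetch would come first and the comparison would be made against origin/master rather than a local master.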
17. Case 2: Discussing changes in the right place
I sent a code change to the server the other day, but I’m really not sure whether it might break
somebody else’s code. Luckily, we have a review process, so someone will look at my change
and comment if anything feels wrong.
If I ask for that by e-mail, I am:
● Spending time writing the e-mail
● Limiting the potential reviewers
● Preventing future reviews/references
Furthermore, it is difficult to explain the code change via e-mail. Looking at the code will certainly be
much faster for the reviewer if I can pinpoint that change…
How do I improve that so that the review process is not simply abandoned because it is impractical?
18. Discussing changes
One destination: Gitlab.com/our-project/
● Find your branch
● Find the commit
● Link the commit in your e-mail, or simply reference the person in a comment
22. Case 3: Finding meaning in changes
I regularly have to give support to several customers during long periods of time.
I know that 50% of the time, the issues they are reporting have already been fixed for
some other customer, but it’s difficult for me to find references as to how these were
fixed.
On other days, I might get a report of an issue that is caused by a very
complex condition in a file. I can identify the file and the condition, but I really need
to understand why, or in relation to which issue, this condition was added.
How do I do that?
23. Solution 1: Meaningful commit messages
Meaningful commit messages are the MOST IMPORTANT ACTION to be
taken when sending changes to the version control server.
Poorly managed commit messages are the one thing that can render Git (or SVN)
completely useless.
24. What are meaningful commits?
● Verb (present imperative) + rest of the comment
● Reference the related task (or customer), preferably with a task ID
● Special markers/tags:
  ● “Minor: ” → change that does NOT affect the logic
  ● “Style: ” → change that ONLY affects the user interface
  ● “[Some module]: ” → specific to one module (will help with changelog)
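To illustrate why these markers pay off, the sketch below writes three commits following the convention and then filters the history with them. The module name “Search” and the task IDs are invented; the commits are empty only to keep the demonstration short.

```shell
#!/bin/sh
# Throwaway demo repository; module name and task IDs are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "[Search]: Fix empty result page (TASK-123)"
git commit -q --allow-empty -m "Minor: Remove trailing whitespace"
git commit -q --allow-empty -m "Style: Enlarge login button (TASK-124)"

# Everything that concerns one module (-F treats the pattern as plain text)
module_changes=$(git log --format='%s' -F --grep='[Search]: ')
# Changelog candidates: every commit except the "Minor: " ones
changelog=$(git log --format='%s' --invert-grep --grep='^Minor: ')
echo "$module_changes"
echo "$changelog"
```

The same filters work with a task ID, which is how a support request can be traced back to the commits that fixed a similar issue for another customer.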
25. Solution 2: “Blame”
Blame is actually a feature of Git (not the actual action of blaming someone
else for all your life’s pain).
● Locate a file
● Click the “Blame” button
● Locate the section and the corresponding commit
● Sometimes: Repeat the process
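The same steps can also be followed on the command line with git blame. The sketch below is an illustration in a throwaway repository: the file report.txt, its content and the task ID are invented.

```shell
#!/bin/sh
# Throwaway demo repository; file name, content and task ID are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'line one\nline two\n' > report.txt
git add . && git commit -q -m "Add report"
printf 'line one\nif archived then skip\nline two\n' > report.txt
git commit -q -am "Skip archived records (TASK-42)"

# Which commit last touched line 2 (the puzzling condition)?
# The first field of the porcelain output is the commit hash.
sha=$(git blame --porcelain -L 2,2 report.txt | head -n 1 | cut -d ' ' -f 1)
# Its message explains why the condition exists
reason=$(git log -1 --format='%s' "$sha")
echo "$reason"
```

To repeat the process and look further back, blame can be run on the state before that commit, for example git blame "$sha"^ -- report.txt, provided the file already existed at that point.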
28. Case 4: Managing several customers
I regularly fix things for different customers using the same piece of code.
How can I organize my workflow so that all customers
benefit from the changes in the end, without having to apply them
to all customers at exactly the same time?
29. Solution: Use branches
Branches in a Git project allow you to keep different versions of the code simultaneously, without them
having to be exactly at the same point.
Using branches is a bit tricky at first, and you always have to be “aware” that they exist, but they
provide great benefits in a complex coding environment.
31. Solution: Use branches (3)
Typical workflow, step by step:
● Start from the master branch
● On your computer, create branch (Feature X)
● Develop Feature X (partially)
● Customer call
● Save temporary changes (commit in Feature X)
● Switch to master branch
● Create branch “Customer-[Issue number]”
● Fix the customer issue
● Save changes (commit)
● Update the Git server with this branch (push)
● Update customer server instance (pull)
● Get back to branch “Feature X”
● Work on Feature X for a bit
● Customer approves Fix [Issue number]
● Commit temporary changes (Feature X)
● Switch to master branch
● Merge changes from branch [Issue number]
● Push to Git server
● Get back to branch “Feature X”
● Finish “Feature X” (commit)
● Test & get approval
● Switch to master branch
● Merge changes from branch Feature X
● Push to Git server
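The steps above can be replayed as commands. This is a condensed sketch in a throwaway setup, not the only way to do it: a local bare repository stands in for the Git server, and the branch names, file names and issue number 101 are invented. The pull on the customer server is left out because there is no customer server in the demonstration.

```shell
#!/bin/sh
# Condensed replay of the workflow; all names and the issue number are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)
git init -q --bare "$base/server.git"      # stands in for the Git server
git init -q "$base/work" && cd "$base/work"
git remote add origin "$base/server.git"
echo 'base' > app.txt
git add . && git commit -q -m "Add application skeleton"
git branch -M master
git push -q origin master

git checkout -q -b feature-x               # create branch Feature X
echo 'half done' > feature.txt
git add . && git commit -q -m "Add feature X (work in progress)"

git checkout -q master                     # customer call: start from master
git checkout -q -b customer-101
echo 'fixed' > fix.txt
git add . && git commit -q -m "Fix export crash (issue 101)"
git push -q origin customer-101            # the customer instance pulls this branch

git checkout -q feature-x                  # back to Feature X
echo 'more' >> feature.txt
git commit -q -am "Extend feature X"

git checkout -q master                     # customer approved the fix
git merge -q --no-edit customer-101
git push -q origin master

git checkout -q feature-x                  # finish Feature X
echo 'done' >> feature.txt
git commit -q -am "Finish feature X"

git checkout -q master                     # after tests and approval
git merge -q --no-edit feature-x
git push -q origin master
```

Committing before each switch is what keeps the half-finished Feature X out of the customer fix: every branch only carries its own commits until it is merged.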
33. Solution: Use branches (activities)
[Diagram: branches master, Feature X and Customer-[Issue number]. Legend: new branches, merges into master, commits]
34. Solution: Use branches (developer track)
[Diagram: the developer's track across the branches master, Feature X and Customer-[Issue number]]
35. Case 5: Partial feature dev, interrupted
I regularly find myself developing a new feature on the basis of the latest
code, only to be interrupted a few hours later by a customer support
request...
Solution: see Case 4: Use branches!
36. Case 6: Customer not approving all changes at once
When I develop fixes for a customer, it may take some time for them to
get back to me and validate each change. As long as one of the changes is
not validated, I cannot merge their “branch” back into master…
What do I do?
37. Solution 1: Cherry-pick the changes
cherry-pick[ing] is a Git action that allows you to apply only specific changes (commits) to another
branch. Each picked commit is copied onto the target branch as a new commit.
This allows me to only bring parts of a branch into the master branch, for example.
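As a minimal sketch, assume a customer branch holding two fixes of which only the first has been approved. The branch name customer-acme, the files and the messages below are invented for the demonstration.

```shell
#!/bin/sh
# Throwaway demo repository; branch, file names and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'base' > app.txt
git add . && git commit -q -m "Add application skeleton"
git branch -M master

git checkout -q -b customer-acme
echo 'a' > fix-a.txt
git add . && git commit -q -m "Fix invoice rounding"      # approved
echo 'b' > fix-b.txt
git add . && git commit -q -m "Change login redirect"     # still pending

# Copy only the approved commit onto master
approved=$(git rev-parse customer-acme~1)
git checkout -q master
git cherry-pick "$approved" > /dev/null
picked=$(git log -1 --format='%s')
echo "$picked"
```

After this, master contains fix-a.txt but not fix-b.txt; the pending fix stays on the customer branch until it is validated.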
39. Solution 2: Multiple branches per customer
If there is a risk of a customer only approving some fix and not another one, and this risk can be
foreseen, then it makes sense to simply create one branch per fix.
Creating one branch per fix or per feature should be the default behaviour, but it’s simply not always
practical to do so.
If done this way, then integrating the change into the master branch is only a matter of merging one
branch. This is much less work-intensive and results in a clearer history overall.
41. Case 7: Submitting a change for review
Our workflow indicates that all changes made by developers have to
be checked by at least one more person before they are integrated
into an official repository.
How can that be implemented?
42. Solution: Individual repositories and merge requests
Each developer creates his/her own copy of the official repositories (on Gitlab).
Any branch is initially created on a personal repository.
When a branch is considered finished, the developer pushes it to his/her repository on Gitlab.
Using Gitlab’s interface, the developer creates a “Merge request” to the main repository.
An e-mail is sent to the main repository managers, who will have to accept (or reject) the merge
request.
A conversation can follow...
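The Git side of this process can be sketched locally. In the example below, two bare repositories stand in for the official repository and for the developer's personal copy on Gitlab; the remote name upstream, the branch name and the task ID are invented. The merge request itself is created in Gitlab's web interface and is not part of the script.

```shell
#!/bin/sh
# Local stand-ins for the Gitlab repositories; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)
git init -q "$base/seed" && cd "$base/seed"
git commit -q --allow-empty -m "Add project skeleton"
git branch -M master
git clone -q --bare "$base/seed" "$base/official.git"       # official repository
git clone -q --bare "$base/official.git" "$base/fork.git"   # my personal copy

git clone -q "$base/fork.git" "$base/work" && cd "$base/work"
git remote add upstream "$base/official.git"

git checkout -q -b fix-login
git commit -q --allow-empty -m "Fix login redirect (TASK-7)"
git push -q origin fix-login     # goes to my copy, not to the official repository

in_fork=$(git ls-remote --heads origin fix-login)
in_official=$(git ls-remote --heads upstream fix-login)
echo "In my copy: ${in_fork:-no}"
echo "In the official repository: ${in_official:-no}"
```

The branch only reaches the official repository once a manager accepts the merge request, which is exactly the review gate the workflow asks for.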
50. Case 8: Pre-production environment
We want to improve the stability of our solutions. This is usually best
implemented through a pre-production environment where we can test things
out before delivering to the customer, or where the customer can do some
testing… but we’re not clear how that’s connected to Git.
51. Solution: multiple env with Git workflow
The “right” context for a good quality of development is to always have 3 environments:
● Development server (or machines if the whole system can be installed on one machine)
● Pre-production server (or Validation server or Testing server or Approval server)
● Production server
The challenge lies in having a clean procedure to update each of these. Here are a few tips and a
suggested workflow…
In the case of eSearch and generally all our products, setting a dev environment up for each developer
(locally) seems to be a big challenge. This is not at all uncommon. In these cases, the development
server is part of the infrastructure. There can be several dev, val and prod servers, depending on the
customer projects.
52. Solution: multiple env with Git workflow
Development server(s)
Developments happen on the developers’ computers, and are then sent to the Git server.
The development server(s) synchronize (either manually or automatically) regularly with Git. There
can be one single development server, or several servers, depending on the number of projects and
their differences. In case of several server instances, each instance is synchronized with one (and
only one) branch on the Git server.
The data connected to the development server is not of paramount importance. It can be
synchronized manually (with a documented procedure) from time to time.
53. Solution: multiple env with Git workflow
Validation server(s)
A validation server is (initially) a copy of the production server. It has the data and code of the
production server, but is then regularly updated on the basis of the code coming from the
development server.
Code on the validation server CANNOT come directly from Git. Instead, it is only ever synchronized
with the development server (using Git commands).
Data on the validation server is regularly updated using data from the production server, through a
clear documented procedure, for example to anonymize the data and avoid automated processes
sending e-mails to the customer.
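One way to picture “synchronized with the development server (using Git commands)” is a chain of clones in which each environment has the previous one as its origin. The sketch below is an assumption about how that could be set up, using local directories in place of real servers; the directory names and commit messages are invented.

```shell
#!/bin/sh
# Local directories stand in for the servers; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)
git init -q "$base/seed" && cd "$base/seed"       # a developer's computer
git commit -q --allow-empty -m "Add project skeleton"
git branch -M master
git clone -q --bare "$base/seed" "$base/server.git"   # Git server
git clone -q "$base/server.git" "$base/dev"           # dev: origin = Git server
git clone -q "$base/dev" "$base/val"                  # val: origin = dev

# New code travels: developer -> Git server -> dev -> val
git commit -q --allow-empty -m "Add export feature"
git push -q "$base/server.git" master
(cd "$base/dev" && git pull -q --ff-only)
(cd "$base/val" && git pull -q --ff-only)

val_head=$(cd "$base/val" && git log -1 --format='%s')
val_origin=$(cd "$base/val" && git config --get remote.origin.url)
echo "val is at: $val_head (origin: $val_origin)"
```

Using --ff-only makes the update fail loudly if someone committed directly on a server, instead of silently creating a merge there.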
54. Solution: multiple env with Git workflow
Production server(s)
Production servers are of utmost importance. They are only ever updated when new code has been
tested on the validation server.
Code on the production server CANNOT come directly from Git. Instead, it is only ever synchronized
with the validation server (using Git commands).
An exception to that rule exists for “hotfixes”, which are changes that require immediate attention and an
immediate fix. This is true only for data-critical situations, where data on the customer server can be
lost, damaged or stolen, or where the data is inaccessible to the customer. All other situations
MUST go through the normal process.
Any code update on the production server has to be properly prepared, documented (task in JIRA)
and agreed to by the customer.
55. Solution: multiple env with Git workflow
[Diagram: Developers, the Git server and the Dev, Val and Prod servers, with links labelled “origin”, “(automated)”, “5’”, “hotfix” and “Data dump”; DevOps]
56. Solution: multiple env, Git WF, Multi-branch
[Diagram: one Git server with several sets of Dev, Val and Prod servers]
57. Case 9: Updating servers periodically
As a developer, I want to be able to test my new developments quickly
without taking the risk of updating the code on the server directly, because
that could mean that I am unknowingly overwriting someone else’s code, and
because it would void all the precious workflow we have established to
guarantee the quality of our developments.
Can that be done through Git?
58. Solution: Git and Cron
Once development instances have been set up and everybody works through Git
(except for configuration changes), setting up a periodic update of the
development server instance is dead easy.
Our development server can simply “git pull” from the Git server (ssh key needed) and get everything
updated.
If cache needs to be cleaned or some similar process needs execution, this can be
scripted and executed after “git pull”.
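A minimal sketch of such an update script follows. The function name update_instance, the optional post-update.sh script and the paths in the crontab comment are all invented for the illustration; the 5-minute schedule is an assumption as well.

```shell
#!/bin/sh
# update_instance DIR: fast-forward the checkout in DIR from its origin, then
# run an optional post-update script (hypothetical name: post-update.sh),
# for example to clean a cache.
update_instance() {
    cd "$1" || return 1
    # --ff-only refuses to merge: the server must not carry commits of its own
    git pull -q --ff-only || return 1
    if [ -x ./post-update.sh ]; then
        ./post-update.sh
    fi
}

# A script for cron would end with a call such as:
#   update_instance /path/to/dev-instance
# and a crontab entry could look like this (every 5 minutes, placeholder paths):
#   */5 * * * * /path/to/update-dev.sh >> /path/to/update-dev.log 2>&1
```

Because the pull is non-interactive, the account running cron needs an ssh key that the Git server accepts without a passphrase prompt.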
59. Case 10: Config files varying between customers
If we synchronize everything through Git, does it mean that we need
one branch per customer and that our configuration files will be
different in each branch?
Or how should we work with configuration files?
60. Solution: Config files are not in Git
Configuration files should NOT be located inside the code (to start with).
If configuration files are located inside the code directory, then:
● they should be specifically omitted from Git (through the .gitignore file)
● a “template” for the configuration file should be present wherever the final configuration file should be
located
● the template (.dist) kept in Git should always contain all the possibilities of configuration (description,
name and example of value). You CANNOT assume that someone having access to this config file
will also have access to the documentation about configuration
● the .dist template should never be modified on the customer server. It serves as a template. It must
be copied into a new file and updated there
As a matter of fact, the goal of all development projects MUST ALWAYS BE to have only one official
repository with one official branch (be it with several versions).
This reduces confusion, improves understandability of the code and the project overall and ensures
better synchronization of the customer portals.
61. Solution: Config files are not in Git
# Config file configuration.pl.dist
# This configuration template is provided as an example
# for the formatting of the real configuration file of
# your eSearch application.
# Please copy this file to configuration.pl to start using it.
### Database connection information
# Source – used by drivers at bin/ariane.pl
my $confEnabled = 0; # 0 = disabled (default), 1 = enabled
my $confUser = 'user';
my $confPass = 'pass';
my $confHost = 'host';
# SD source – used by SD drivers at bin/ariane-sd.pl
my $confSDEnable = 0; # 0 = disabled (default), 1 = enabled
my $confSDUser = 'user';
my $confSDPass = 'pass';
...
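The Git side of this setup can be sketched as follows, in a throwaway repository: the template is tracked, the real configuration file is listed in .gitignore, and each server gets its own copy. The template content here is a one-line placeholder, not the real eSearch template.

```shell
#!/bin/sh
# Throwaway demo repository; the template content is a placeholder.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '# template: copy to configuration.pl\n' > configuration.pl.dist
printf 'configuration.pl\n' > .gitignore
git add .gitignore configuration.pl.dist
git commit -q -m "Add configuration template"

# On each server: copy the template once, then edit the copy only
cp configuration.pl.dist configuration.pl
printf '# customer-specific values go here\n' >> configuration.pl

status=$(git status --porcelain)                 # empty: Git sees nothing to commit
ignored=$(git check-ignore configuration.pl)     # prints the path when it is ignored
echo "Ignored by Git: $ignored"
```

Since the real file never enters the history, a git pull on the customer server cannot overwrite it, and passwords stay out of the repository.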
62. Case 11: Automated quality assurance
Isn’t there a way to automate the tests that we would
otherwise run manually before delivery to a customer?
63. Solution: Gitlab pipelines
Although this is less related to Git and more to Continuous Integration, Gitlab pipelines allow for the
design of validation processes that can execute for each commit sent to the server.
This way, you can efficiently track, over time, when some specific test failed for the first time.
You can also test, in general, specific features that you just developed (through interface testing) so
you know whenever someone else breaks it and can prevent it from ever reaching the server.
Also, your own repository or branch can be tested before you send the changes to the official
repository.