A funny thing sometimes happens when I am exploring a new tool or the latest release of an existing one. As someone who learns best by doing, I sometimes won’t immediately grasp its purpose. Especially in software engineering, it helps me to know why a feature was added and in what real-world context it would be used, rather than just reading about it. I cannot tell you how many times I have read the documentation on something, carried on with my work, and then, two weeks later, faced with a particular kind of problem, had a lightbulb moment: “Ah, that is what (insert new feature/tool) is for. It solves this problem.”
Nowhere was this truer for me than with Git in my very first days of coding. It was made abundantly clear to me that Git was crucial to professional engineering, a basic, must-have skill for a software engineer at any level. What wasn’t made clear to me was 1. how to use it and 2. why to use it in that particular way.
Quick disclaimer: if this sounds absurd to you, you are likely not who this is intended for. I remember creating my very first repos on GitHub, built around our favorite “Hello World” example. Given that this was the only context I had for engineering at the time, I simply could not picture how the vast number of Git tools and CLI commands would be used within such a limited context. I looked, as did the other engineers in my cohort, for resources that could explain this, but I found very few that actually answered the questions I was asking. With only my “Hello World” knowledge, I didn’t really get why we needed all of this yet, and I certainly didn’t understand the proper Git flow in this context. These are the questions we will answer here.
What is Git?
Straight from the horse’s mouth, we have this:
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Let’s break this down. The part we want to focus on is “open source distributed version control system”. Open source means that the source code is publicly available and can be used, modified, and shared under the terms of its license. Open-source frameworks and packages are often open to, or even rely on, improvements contributed by engineers in the community rather than only by a paid staff.
A version control system records every change made to a code base over time, which lets you review its history, undo mistakes, and coordinate multiple engineers working on it at the same time. Even if you are not yet at the point of collaborating with other engineers, think back to a group project or paper in school. The paper was likely divided into sections, each assigned to a different member. But to write a good paper, those pieces can’t be written entirely separately from one another. Perhaps you had to make a few changes to the opening paragraph in order to properly tie in your piece, and another member had to do the same thing. It’s easy to paste an entirely new section of the paper into its reserved slot, but changes where two members both edited the same paragraph in different ways get a bit trickier.
Now imagine that your group paper is actually a code base, and that code base is the size of Uber’s. Solving that opening-paragraph dilemma becomes nothing short of a nightmare. This is where Git comes in. Git provides an efficient way to track and manage every change, and platforms like GitHub provide a visual representation of those changes as well as remote storage for the code base.
But Git isn’t just a version control system; it is a distributed version control system. Distributed means that no single central copy of the code base is required. Instead, every engineer keeps a complete local copy of the repository, including its history, works on it there, and eventually “merges” those changes into a shared master branch (we will learn more about this later). Going back to our group paper analogy, each group member has his or her own copy of the paper locally in Microsoft Word, rather than everyone editing one Google Doc at once.
What is the Difference Between Git and GitHub?
I admittedly thought at first that Git and GitHub were the same thing, assuming Git was just short for GitHub. They are actually two separate things, and a typical workflow uses both. Git is the tool that runs locally on your machine, while GitHub is a cloud service that hosts Git repositories. For any one project, you will have a local Git repository and a remote one. Understanding this is essential to making sense of terms like “tracking”, “branches”, and “upstream” that you will come across. I remember getting sucked into seemingly endless circles because I didn’t fully grasp this, and so didn’t understand how to set Git up properly.
Not to put too fine a point on it, but this means that whenever you set up a new project, you need to initialize a new Git repository locally from the command line, separately create a remote repository on GitHub, and then connect the two by telling your local repository, “Hey, you can find your remote counterpart at this URL.”
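To make those steps concrete, here is a minimal Python sketch that lists, in order, the standard Git commands involved. It assumes Git is installed, that the empty remote repository has already been created through GitHub’s website, and that the branch should be named master; the function names and the URL are hypothetical placeholders. As written, the script only prints the commands; handing the list to run_all would execute them in the current folder.

```python
import subprocess
from typing import List


def new_project_commands(remote_url: str, branch: str = "master") -> List[List[str]]:
    """Git commands that create a local repository and connect it to a remote.

    Assumes the empty remote repository already exists on GitHub.
    """
    return [
        ["git", "init"],                                 # 1. initialize the local repository
        ["git", "add", "."],                             # stage every file in the folder
        ["git", "commit", "-m", "Initial commit"],       # record the first commit
        ["git", "branch", "-M", branch],                 # name the branch explicitly (e.g. master)
        # 2. creating the empty remote repository happens in the browser on GitHub
        ["git", "remote", "add", "origin", remote_url],  # 3. "your remote counterpart is at this URL"
        ["git", "push", "-u", "origin", branch],         # upload commits and set up tracking
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute each command in the current directory, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URL: replace it with the one GitHub shows for your new repository.
    url = "https://github.com/your-username/hello-world.git"
    for cmd in new_project_commands(url):
        print(" ".join(cmd))  # dry run: print instead of executing
```

The -u flag on the first push is what makes your local branch “track” its remote counterpart, so later plain git push and git pull commands know where to go.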
GitHub is not the only place to host a remote Git repository; there are others, such as GitLab. That said, I think GitHub is the more popular option, and I would recommend starting with it until you have a reason to switch.
A Few Terms
There are a few terms that will be helpful to discuss before diving into best practices and Git Flow.
- Branch: An independent line of development, essentially a copy of the code base that you can work on freely without risking the deployed, production-ready code. The master branch (called main in many newer repositories) is the one exception, since it holds that deployed production code you don’t want to break. Other branches usually start as copies of master. If you completely break your application on a branch other than master, you can just delete that branch, go back to master, branch off again, and start over with a clean slate.
- Merge: The integration of new code from one branch into another branch, typically the master branch.
- Commit: An entry in the repository’s change history that has two parts: the new code being “committed,” or saved, to the branch, and a message explaining the changes. There is no fixed size for a commit; it might be 3 lines of code or 30, but ideally each commit covers one logical change, such as a single block of code or function.
- Push: Sending your local commits to the remote GitHub repository so they are saved there.
- Pull: Fetching the latest updates from a remote branch and merging them into your local branch. Pulling regularly is essential when collaborating with another engineer on the same branch, and for staying up to date with master.
- Pull Request: A request to have your branch merged into the master (deployed/production) branch. You open a PR on GitHub and typically ask one or two teammates to review your changes. Once they approve the new code, the changes can be merged into master and the other branch can be deleted.
- Merge Conflict: A conflict occurs when Git detects differences between branches that it cannot resolve on its own. Git does not know which change should be kept, so it leaves that decision to you. This typically happens when two engineers branch off of master and change the same lines of code. Once one set of changes is merged into master, the other engineer will see merge conflicts when pulling the latest updates from master.
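To see why Git can combine some edits on its own but stops at others, here is a deliberately simplified three-way merge in Python. Like Git, it compares both branches against their common starting point (the base). Unlike Git, it assumes every version has the same number of lines and that edits only change lines in place, so it skips the diffing Git does to line up insertions and deletions. The function name and file contents are hypothetical.

```python
from typing import List, Tuple


def merge_lines(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Toy three-way merge over line-aligned versions of one file.

    Returns the merged lines and the indices of the lines that conflicted.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:    # both sides agree (including "nobody touched it")
            merged.append(o)
        elif o == b:  # only their branch changed this line: take theirs
            merged.append(t)
        elif t == b:  # only our branch changed this line: keep ours
            merged.append(o)
        else:         # both changed the same line differently: a human must choose
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    base = ["title = 'Hello'", "greet()", "print('done')"]
    ours = ["title = 'Hello, World'", "greet()", "print('done')"]
    theirs = ["title = 'Hi there'", "greet()", "print('finished')"]
    merged, conflicts = merge_lines(base, ours, theirs)
    print(conflicts)  # [0]: only the title line needs a decision
    print("\n".join(merged))
```

The last line merges cleanly because only one branch changed it, while the first line conflicts because both did. Real Git writes similar markers into the file (labeled with HEAD and the other branch’s name) and waits for you to edit them out and commit the result.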
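We now understand the problems that version control solves, the basics of Git and GitHub, and how the two interact and rely on one another. This leads us to “Git Flow”, a set of best practices for using Git to collaborate with other engineers. (What follows is close to the lightweight workflow GitHub calls “GitHub flow”; the name “Git Flow” is also used for a more elaborate branching model with separate develop and release branches.) Git Flow encourages regular integration between branches.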
Git Flow is designed to keep merge conflicts to a minimum and to make sure that everyone is working on up-to-date code. Roughly, it looks like this:
Create a Branch → Write and Commit New Code → Open Pull Request and Request Review(s) → Merge Into Master Once Approved → Delete Branch on GitHub → Check Out Master Branch Locally → Pull Master Branch Updates to Local Repo → Create New Branch Off of Updated Master Branch → Repeat
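Here is one way the cycle above maps onto everyday Git commands, sketched in Python as a function that only builds the command lists. The function name, branch names, and commit message are hypothetical, and opening, reviewing, and merging the pull request and deleting the branch on GitHub all happen in GitHub’s web interface, so those steps appear only as a comment.

```python
from typing import List


def feature_cycle(branch: str, message: str, next_branch: str) -> List[List[str]]:
    """Git commands for one pass through the branch -> PR -> merge -> repeat cycle."""
    return [
        ["git", "checkout", "-b", branch],        # create a branch off the current master
        ["git", "add", "."],                      # stage the new code...
        ["git", "commit", "-m", message],         # ...and commit it with a message
        ["git", "push", "-u", "origin", branch],  # publish the branch so a PR can be opened
        # Open the PR, get approval, merge, and delete the branch: done on GitHub.
        ["git", "checkout", "master"],            # check out master locally
        ["git", "pull", "origin", "master"],      # pull the newly merged changes
        ["git", "branch", "-d", branch],          # tidy up the local copy of the old branch
        ["git", "checkout", "-b", next_branch],   # branch off the updated master and repeat
    ]


if __name__ == "__main__":
    for cmd in feature_cycle("add-login", "Add login form", "add-logout"):
        print(" ".join(cmd))
```

One caveat: git branch -d may refuse to delete a branch that Git does not consider merged (for example, when pull requests are squash-merged on GitHub), in which case -D forces the deletion.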
The idea is that if all engineers working on a repository regularly branch off of master, add their own changes, and then merge those into master, everyone will be working on a current enough version of the code base, and merge conflicts will be kept to a minimum. With that goal in mind, it is also good practice to pull master into your branch every morning. This keeps potential merge conflicts small and usually very manageable when they do arise. If you happen to be working on the same branch as someone else, pull their changes regularly, too.
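The morning routine described above comes down to two or three commands. This Python sketch builds them; the function name and branch name are hypothetical, and it assumes the shared remote is called origin and the main branch is master.

```python
from typing import List


def morning_sync(branch: str, shared: bool = False) -> List[List[str]]:
    """Commands that bring a feature branch up to date at the start of the day."""
    cmds = [["git", "checkout", branch]]                # make sure you are on your own branch
    if shared:
        cmds.append(["git", "pull", "origin", branch])  # pick up a teammate's pushes to this branch
    cmds.append(["git", "pull", "origin", "master"])    # merge the latest master into this branch
    return cmds


if __name__ == "__main__":
    for cmd in morning_sync("add-login", shared=True):
        print(" ".join(cmd))
```

If that last pull reports conflicts, resolving them right away, while they are still few and small, is exactly what this habit is meant to make easy.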
Following this will make your life so much easier. It is not fun to finally be ready to open a pull request, only to pull updates from master and find 30 merge conflicts.
Thanks for reading the Industry Experts publication!