What is Git and What is It Used For

This guide covers the fundamentals of Git. You will learn why version control is necessary and how version control systems work. This background will make the practical work with Git much easier to master.

What problems does version control solve

Regardless of the language or direction of development, a developer's code is still text written in multiple files on a disk. These files are constantly being added to, deleted, and modified. Some of them may have hundreds or thousands of lines of code. Files with a thousand lines of code are pretty standard in programming.

While a project consists of only a couple of files, developing it causes no trouble: a developer writes code, runs it, and everyone is happy. As the code base grows, though, inconveniences appear that turn into real problems:

  • How do you avoid losing source code files?
  • How do you protect yourself from accidental edits and deletions?
  • How do you undo changes that turn out to be wrong?
  • How do you support the production version while developing a new one at the same time?

Assume your project has hundreds of files and tens of thousands of lines of code. You work on a task, change 15 files and 300 lines of code in the process, and then realize that the task is no longer relevant. You must now return the source code to its previous state. And this is only one of many possible scenarios. Here is another: while you are working on the code, it turns out that an urgent change has to be made in production (the website). Since an unfinished task cannot be deployed to the site, the correction must be made to the version of the code that existed before work on the new task began.

The simplest solution to these problems is to copy directories. Unfortunately, this approach has serious drawbacks. Individual changes cannot be tracked (except from memory), so moving a change from one directory to another means redoing it by hand. You'll get confused as soon as there are two folders. And this method still does not allow two people to work on the code at the same time.

Collaborative development is a separate problem. How can two developers work on tasks that require editing the same files without destroying or overwriting each other's changes?

Fortunately for us, this problem was solved back in the 80s. Since then, the tooling has developed significantly and is now widely used not only for coding but also for writing and translating books. The solution is version control. It is carried out by special programs that track changes to the code. Here are a few of the many features of these systems:

  • Revert to any previous version of the code
  • View the history of changes
  • Collaboration without fear of losing data or someone else's work

In this guide, we will analyze the general working principles of such programs.

How does version control work

Version control systems (VCS) are frequently integrated into tools that even non-programmers are familiar with. We'll start with them and define a few terms along the way.

Many people use file synchronization services such as Dropbox, and these services keep track of the versions of the files they store. This is how it happens: the program regularly synchronizes local files with those in the service's storage. If the local file differs from the server file and was modified later, the server file becomes part of the change history, and the most recently modified file becomes the current one.
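The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Dropbox is actually implemented; the names FileVersion, ServerFile and sync are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FileVersion:
    content: str
    modified_at: float  # modification time, e.g. seconds since the epoch


@dataclass
class ServerFile:
    current: FileVersion
    history: list = field(default_factory=list)  # previous versions, oldest first


def sync(server: ServerFile, local: FileVersion) -> bool:
    """Record the local file as the current version if it differs and is newer."""
    differs = local.content != server.current.content
    newer = local.modified_at > server.current.modified_at
    if differs and newer:
        server.history.append(server.current)  # old version goes to the history
        server.current = local                 # newest file becomes current
        return True
    return False


server = ServerFile(FileVersion("first draft", modified_at=1.0))
sync(server, FileVersion("second draft", modified_at=2.0))
print(server.current.content)                # second draft
print([v.content for v in server.history])   # ['first draft']
```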

dropbox, version history

In the picture above, the current version of the file is marked as current. The rest are previous versions of the current file. As you can see, Dropbox allows you to restore any version of a file.

Pay attention to the following phrase:

Dropbox keeps a snapshot every time you save a file.

Snapshot is a term we will use frequently from now on. It is also known as a "shot of state", but for simplicity we will just call it a "snapshot".

In this case, a snapshot is the file itself after the change. To understand this better, consider the alternative: the delta of changes (diff). Instead of saving a new version of the file, Dropbox could compute the difference between the new and old files (which is not difficult for text files) and save only that. Why would it? This method saves disk space but makes working with the files more complex.
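To make the difference between a snapshot and a delta more concrete, here is a small Python sketch built on the standard difflib module. It is an illustration only: the delta format (a list of changed regions) and the function names make_delta and apply_delta are assumptions made for this example, and the file contents are hypothetical.

```python
import difflib


def make_delta(old, new):
    """Keep only the regions of `old` that change, plus their replacement lines."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old, delta):
    """Rebuild the new version from the old version and the delta."""
    result = list(old)
    # Apply the changes from the end so that earlier indices stay valid.
    for i1, i2, replacement in reversed(delta):
        result[i1:i2] = replacement
    return result


old = ["const a = 5;", "const b = 3;", "console.log(a + b);"]
new = ["const a = 8;", "const b = 3;", "console.log(a + b);", "console.log(a - b);"]

snapshot = list(new)            # snapshot approach: store the whole new file
delta = make_delta(old, new)    # delta approach: store only what changed

print(apply_delta(old, delta) == snapshot)  # True
```

With a snapshot, getting a version back is just reading it. With a delta, you need the previous version and an extra "apply" step, which is the added complexity mentioned above.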

Later, we'll see that different tools use different approaches: some use deltas of changes, while others use snapshots. By the way, the term "snapshot" is commonly applied to disks. For example, you can take a snapshot of the disk and then recover from this point (just like in games).

Another good example of how to use version control is text editors, particularly online ones.

google docs, version history

Google Docs takes a snapshot after each autosave (about once every 5 seconds). If the document has not changed during this time, no new version appears. Together, these versions form a history of changes.

The version history is referred to as "Revision history" in the image above. Revision is the basic concept of version control systems. Any recorded change in the version control system is called a revision.

Please note that a revision and a snapshot are not the same thing. When you commit changes, you create a revision that can contain either a delta of changes or a snapshot.

By the way, switching between revisions also has its own name. When we load a revision, we say that we check it out.
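A revision history for a single document can be sketched like this. It is a toy model that stores revisions as snapshots; the class DocumentHistory and its methods are hypothetical and do not describe how Google Docs works internally.

```python
class DocumentHistory:
    """Toy revision history for one document, stored as snapshots."""

    def __init__(self, text=""):
        self.revisions = [text]  # revision 0 is the initial text

    def autosave(self, text):
        """Record a revision only if the text changed; return the latest revision number."""
        if text != self.revisions[-1]:
            self.revisions.append(text)
        return len(self.revisions) - 1

    def checkout(self, number):
        """Return the text as it was at the given revision."""
        return self.revisions[number]


history = DocumentHistory("Draft")
history.autosave("Draft")            # nothing changed, no new revision
history.autosave("Draft, edited")    # creates revision 1
print(history.checkout(0))           # Draft
print(history.checkout(1))           # Draft, edited
```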

editor, revision scheme

Differences between revisions can be detected even if the VCS stores snapshots, as Microsoft Word shows in the image above. This functionality is essential, because we constantly need to check "what has changed," and not only when working with code. Let me give you an example from my own practice: negotiating legal documents (contracts) typically involves lots of edits, and I want to see exactly what has changed after the lawyers have corrected the contract.

Moreover, Linux systems include a diff command that can be used to determine the differences between any files even without using VCS. These changes can be saved to a file and then applied to the source file with the patch program.

diff index.js index2.js > index.patch

cat index.patch
1c1
< const a = 5;
---
> const a = 8;
3a4
> console.log(a - b);

patch index.js -i index.patch -o index2.js

In the programs discussed above, creating a revision is tied to autosave, but this is not the only strategy. In total, three approaches are used:

  • On save
  • On auto-save
  • On explicit request (a button or a command)

The last one is used when working with code.

What are the types of version control systems?

In all the previous examples, we analyzed VCS embedded directly into programs, particularly text editors. However, VCSs for the source code are separate from the development tools used (although they can be additionally integrated with them).

This is because source code is a collection of text (and binary) files, and it is impossible to predict who will edit them, how, or where. Besides, creating revisions automatically becomes extremely inconvenient.

Committing is the process of creating a revision in a VCS. You'll often hear the phrases "will you do a commit?" or "I made a commit" at work. The term "commit" is also generally used as a noun in place of "revision" (and we will use it that way too).

When working with code, the changes in a single commit should follow certain rules. Only then do you get all the advantages of a VCS. These rules include:

  • Good description. It usually starts with a short one-line header of no more than 50 characters, followed by an empty line, then a more detailed explanatory text, if necessary. It’s better to stick to the imperative mood in the header: "Fix scrolling," not "Fixed scrolling" or "Fixes scrolling."
  • Atomicity. A commit should solve only one task, preferably from start to finish. This will allow you to create an easy-to-read and comprehend project history. And, if necessary, you can easily undo the change or transfer it to a different version of the program.

In addition to these basic recommendations, there are many others that together are commonly described as making a "good commit".
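The formal part of the "good description" rule is easy to check automatically. Below is a minimal Python sketch; the function name check_commit_message is made up for this example, and it checks only the header length and the empty second line. The imperative mood and atomicity cannot be reliably verified this way and still need a human eye.

```python
def check_commit_message(message):
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["header is missing"]

    problems = []
    if len(lines[0]) > 50:
        problems.append("header is longer than 50 characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be empty")
    return problems


good = "Fix scrolling\n\nThe list jumped to the top after every update."
bad = "Fix scrolling\nThe list jumped to the top after every update."

print(check_commit_message(good))  # []
print(check_commit_message(bad))   # ['second line must be empty']
```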

The basic workflow is the same regardless of VCS. It looks like this:

  1. Initialization (creation) of the repository
  2. Adding new files
  3. Committing
  4. Any operations with files (adding, deleting or changing)
  5. Committing
  6. ...

A repository is a set of files and directories that are under version control.
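The workflow above can be modeled with a toy in-memory repository. This is a sketch for understanding the steps, not a description of how Git or any other VCS is implemented: the Repository class, its methods, and the choice to store a full snapshot per commit are all assumptions made for this example.

```python
class Repository:
    """Toy in-memory repository: every commit stores a full snapshot."""

    def __init__(self):
        # Step 1: initialization of an empty repository.
        self.staged = {}    # path -> content waiting to be committed
        self.commits = []   # list of (message, snapshot) pairs

    def add(self, path, content):
        """Steps 2 and 4: add a new file or record a changed one."""
        self.staged[path] = content

    def remove(self, path):
        """Step 4: mark a file for deletion in the next commit."""
        self.staged[path] = None

    def commit(self, message):
        """Steps 3 and 5: turn the staged changes into a new revision."""
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        for path, content in self.staged.items():
            if content is None:
                snapshot.pop(path, None)
            else:
                snapshot[path] = content
        self.staged = {}
        self.commits.append((message, snapshot))
        return len(self.commits) - 1  # revision number

    def checkout(self, revision):
        """Return the files as they were at the given revision."""
        return dict(self.commits[revision][1])


repo = Repository()
repo.add("index.js", "const a = 5;")
first = repo.commit("Add index.js")
repo.add("index.js", "const a = 8;")
second = repo.commit("Change the value of a")

print(repo.checkout(first))   # {'index.js': 'const a = 5;'}
print(repo.checkout(second))  # {'index.js': 'const a = 8;'}
```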

VCSs are typically divided into generations, each of which significantly changed the way people work.

First generation

RCS, SCCS

  • Work with each file separately
  • Local work only

version control systems, first generation

Second generation

CVS, SourceSafe, Subversion

  • Multi-file
  • Centralized
  • Require a server

In these systems, almost nothing can be done without access to the server: viewing the history, committing, and rolling back to a previous version are all impossible without network access.

version control systems, second generation

Third generation

Git, Bazaar, Mercurial

  • Distributed
  • Each participant has their own fully functional local repository

A server is only used to store the reference repository. In fact, all repository copies are equal and can exchange data in any direction.
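The idea that all copies are equal can be shown with another toy model. The sketch below is heavily simplified and hypothetical (the Repo class, clone and pull_from are invented names, and commits are just entries in a dictionary); it ignores commit order, merging, and conflicts, and only shows that any copy can receive data from any other copy.

```python
class Repo:
    """Toy distributed repository: each copy holds its own set of commits."""

    def __init__(self):
        self.commits = {}  # commit id -> message

    def commit(self, commit_id, message):
        self.commits[commit_id] = message

    def clone(self):
        """Create a new, fully functional copy of this repository."""
        copy = Repo()
        copy.commits = dict(self.commits)
        return copy

    def pull_from(self, other):
        """Copy the commits that `other` has and this copy lacks; return their ids."""
        missing = [cid for cid in other.commits if cid not in self.commits]
        for cid in missing:
            self.commits[cid] = other.commits[cid]
        return missing


reference = Repo()                   # the copy kept on the server
reference.commit("c1", "Add index.js")

alice = reference.clone()
bob = reference.clone()
alice.commit("c2", "Fix scrolling")  # committed locally, no server needed

print(bob.pull_from(alice))          # ['c2']  (directly between participants)
print(reference.pull_from(bob))      # ['c2']  (and on to the reference copy)
```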

version control systems, third generation

Conclusion

Now you have learned what Git is used for, as well as the principles of version control systems. This information will help you master practical work with Git within the program you've chosen. If you have any questions, do not hesitate to ask them here in the comments.

Resources

Source code (github)
Kirill Mokevnin