Author: Yucheng Zhang, yucheng.zhang@tufts.edu
Date: 10/02/2025
Linux is a free, open-source, and Unix-like operating system kernel that was originally created by Linus Torvalds in 1991. Over time, Linux has grown into a full-fledged operating system used worldwide across various types of devices, from servers and desktop computers to smartphones and embedded systems.
Richard Stallman writes:
"Many computer users run a modified version of the GNU system every day, without realizing it. Through a peculiar turn of events, the version of GNU which is widely used today is often called “Linux,” and many of its users are not aware that it is basically the GNU system, developed by the GNU Project."
• Ubuntu: A user-friendly distribution popular for desktop and server use, based on Debian.
• Fedora: A cutting-edge distribution often used by developers and those who want the latest features.
• Debian: Known for its stability and extensive software repositories, often used in server environments.
• CentOS/AlmaLinux/Rocky Linux: Enterprise-grade distributions derived from Red Hat Enterprise Linux (RHEL).
• Arch Linux: A rolling release distribution known for its simplicity and customization, aimed at advanced users.
• Kali Linux: A distribution designed for penetration testing and security research.
Feature | Red Hat Enterprise Linux (RHEL) | Rocky Linux |
---|---|---|
Origin | Developed and maintained by Red Hat (IBM-owned) | Community-driven rebuild |
License | Commercial, subscription required for updates & support | Free and open-source, no subscription required |
Support | Paid enterprise support from Red Hat | Community support |
Security | Certified security patches | Security patches synced from RHEL sources |
Ecosystem | Widely certified by software/hardware vendors | Not officially certified, but works with same ecosystem |
Cost | Paid subscription | Free; optional paid support available |
A file is an addressable location that contains some data which can take many forms.
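Files have associated metadata, including permissions, which are shown with these characters:
• r: read
• w: write
• x: execute
• -: the permission is not granted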
Everything is mounted under the root directory (/).
Files are referred to by their location, called a path.
Absolute Path (from the root): /cluster/tufts/mylab/user01
Relative Path (from my current location): user01
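The difference between the two kinds of path can be tried out safely in a scratch directory. This is a minimal sketch: the mylab/user01 names mirror the example above, while the temporary location created by mktemp and the variable names are assumptions made for illustration.

```shell
# Build a small throwaway tree: <tmp>/mylab/user01 (location is illustrative)
base=$(mktemp -d)
mkdir -p "$base/mylab/user01"

# Absolute path: starts from the root (/), so it works from any location
cd "$base/mylab/user01"
abs_dir=$(pwd -P)

# Relative path: resolved from the current directory
cd "$base/mylab"
cd user01
rel_dir=$(pwd -P)

echo "$abs_dir"
echo "$rel_dir"   # the same directory, reached through a relative path
```

Both echo lines print the same location, because a relative path is only a shorter way of naming a directory from wherever you currently are.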
When working on Linux, you don’t need to Google every command: the manual pages (man pages) are built right into the system. The man command shows documentation for most Linux commands and tools.
man <command>
man ls
Linux provides powerful tools for managing files and file systems. Here we will introduce a few essential commands.
$ pwd
/cluster/home/yzhang85
$ cd /cluster/tufts/rt/yzhang85/
$ pwd
/cluster/tufts/rt/yzhang85
cd [directory]
If a directory is not supplied as an argument, it will default to your home directory.
$ pwd
/cluster/tufts/rt/yzhang85
$ cd ..
$ pwd
/cluster/tufts/rt
$ cd
$ pwd
/cluster/home/yzhang85
ls [options] [directory]
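As a rough illustration of how options change what ls reports, the sketch below lists a scratch directory three ways. The file names are made up for the example; -a (include hidden files) and -l (long format with permissions, owner, size, and date) are standard ls options.

```shell
# Scratch directory with one regular and one hidden file (names are illustrative)
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden"

plain=$(ls "$demo")            # hidden files (names starting with .) are skipped
with_hidden=$(ls -a "$demo")   # -a also shows hidden files, plus . and ..

echo "$plain"
echo "$with_hidden"
ls -l "$demo"                  # long format: permissions, owner, size, date
```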
$ chmod g+w filename ## Give the group write permission
$ chmod u+x filename ## Give user execute permission
$ chmod a+r filename ## Give all users read access
$ chmod u=rw,g=r,o=r filename ## Give user read and write permission, group and other only read permission.
To apply permissions recursively to all files and subdirectories within a directory, use the -R option:
$ chmod -R g+rx /path/to/directory
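To see what chmod actually changes, one option is to compare the permission string printed by ls -l before and after. The sketch below works on a scratch file whose name is hypothetical; the first ten characters of ls -l output are the file type followed by the user, group, and other permissions.

```shell
# Work on a scratch file so no real data is touched (file name is illustrative)
dir=$(mktemp -d)
file="$dir/script.sh"
touch "$file"

chmod u=rw,g=r,o=r "$file"                 # start from a known state
before=$(ls -l "$file" | cut -c1-10)       # -rw-r--r--

chmod u+x,g+w "$file"                      # add execute for user, write for group
after=$(ls -l "$file" | cut -c1-10)        # -rwxrw-r--

echo "$before"
echo "$after"
```

Reading the string in groups of three after the leading character gives the user, group, and other permissions in that order.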
touch is used to create new files or to update the timestamps (access and modification times) of existing files.
$ touch newfile.txt
$ touch existingfile.txt
mkdir [options] dir_name
$ mkdir -p rnaseq/output
This will create the output folder as well as its parent folder rnaseq if it doesn’t exist.
mv [options] source destination
cp [options] source destination
rm [options] file/directory
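The file-management commands above are often used together. The following sketch runs them in sequence inside a scratch directory; all file and folder names are illustrative, and rm -r (recursive delete, needed for directories) is an option not shown in the synopsis above.

```shell
# Scratch area; nothing outside it is modified (all names are illustrative)
work=$(mktemp -d)
cd "$work"

mkdir -p rnaseq/output                     # create nested directories in one step
touch rnaseq/sample.txt                    # create an empty file
cp rnaseq/sample.txt rnaseq/output/        # copy: the original stays in place
mv rnaseq/sample.txt rnaseq/renamed.txt    # move/rename: the old name is gone
rm rnaseq/output/sample.txt                # delete a single file
rm -r rnaseq/output                        # -r is needed to delete a directory

remaining=$(ls rnaseq)
echo "$remaining"                          # renamed.txt
```

Note that rm does not move files to a trash folder, so it is worth double-checking the path before running it, especially with -r.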
When your storage space starts running low on an HPC or Linux system, it’s important to figure out which files and folders are using the most space.
ncdu stands for NCurses Disk Usage, and it provides an interactive, text-based interface for exploring disk usage.
ncdu [directory]
$ ncdu ~
$ ncdu /cluster/tufts/mylab
When working on Linux (especially on shared HPC systems), it’s important to know how much disk space is available on different filesystems.
The df command (disk free) shows this information.
$ df -h /cluster/tufts/mylab
$ df -h /cluster/tufts/yzhang85
Filesystem Size Used Avail Use% Mounted on
10.246.194.77:/projects/yzhang85 1.1T 961G 64G 94% /cluster/tufts/yzhang85
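If only the usage percentage is of interest, for example inside a script, the df output can be trimmed down to that one field. This is a sketch under the assumption that the filesystem name contains no spaces; -P requests the portable one-line-per-filesystem layout, and the variable name is hypothetical.

```shell
# Report usage for the filesystem that holds the current directory.
# -P keeps each filesystem on a single line, which makes the columns easy to pick out.
usage=$(df -P . | awk 'NR==2 {print $5}')   # line 2, column 5 is the usage percentage
echo "Current filesystem is $usage full"
```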
Linux command-line tools are invaluable for bioinformatics text processing due to their efficiency and flexibility. They allow for rapid manipulation and analysis of large biological datasets, such as DNA sequences, protein structures, and gene expression data. Commands like grep, sed, awk, and cut are essential for filtering, extracting, and reformatting text-based biological information.
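As a small taste of cut and awk, the sketch below works on a tab-separated table. The gene names, sample names, and counts are made up for illustration, as are the file and variable names.

```shell
# A tiny tab-separated expression table (all values are made up)
dir=$(mktemp -d)
table="$dir/expression.tsv"
printf 'gene\tsample\tcount\nBRCA1\tS1\t120\nTP53\tS1\t8\nMYC\tS2\t300\n' > "$table"

# cut: keep only the first column, then drop the header line
genes=$(cut -f1 "$table" | tail -n +2)

# awk: skip the header (NR > 1) and print genes whose count exceeds 100
high=$(awk -F'\t' 'NR > 1 && $3 > 100 {print $1}' "$table")

echo "$genes"
echo "$high"
```

cut is the simpler tool when whole columns are needed; awk is more convenient once a condition on the values is involved.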
cat [options] file1 file2 …
head/tail [options] file
$ less largefile.txt
$ more largefile.txt
grep [options] PATTERN file
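A common bioinformatics use of these viewing and searching commands is inspecting a FASTA file, where each sequence starts with a header line beginning with >. The sketch below uses a made-up three-sequence file; the file name and sequences are assumptions for illustration, and -c is the grep option that counts matching lines.

```shell
# A tiny FASTA file (sequences are made up)
dir=$(mktemp -d)
fasta="$dir/example.fasta"
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$fasta"

first_two=$(head -n 2 "$fasta")     # first record: header plus sequence
last_line=$(tail -n 1 "$fasta")     # last line of the file
count=$(grep -c '^>' "$fasta")      # number of header lines = number of sequences

echo "$first_two"
echo "$last_line"
echo "$count"
```

The pattern is quoted so that the shell does not treat > as a redirection.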
sed (short for stream editor) is a powerful text-processing tool on Linux that allows you to parse and transform text in files or streams. It is commonly used for basic text manipulations such as searching and replacing, inserting and deleting lines, and applying regular expressions to text data.
Replace the first occurrence of old with new in each line:
sed 's/old/new/' filename.txt
Replace all occurrences of old with new in each line:
sed 's/old/new/g' filename.txt
Edit the file in place with -i:
sed -i 's/old/new/g' filename.txt
Warning: Use this command with caution as it directly modifies the original file. To create a backup, use -i.bak:
sed -i.bak 's/old/new/g' filename.txt
Delete every line that matches pattern:
sed '/pattern/d' filename.txt
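The sed forms above can be compared side by side on a small scratch file. This is a sketch with made-up file contents and variable names; the in-place step uses the -i.bak form so the original is preserved.

```shell
# Scratch file with three lines (contents are illustrative)
dir=$(mktemp -d)
file="$dir/notes.txt"
printf 'old old\nkeep me\ndrop this line\n' > "$file"

first_only=$(sed 's/old/new/' "$file" | head -n 1)    # new old
every=$(sed 's/old/new/g' "$file" | head -n 1)        # new new
without_drop=$(sed '/drop/d' "$file")                 # third line removed

# None of the commands above changed the file; -i.bak does, and keeps a backup
sed -i.bak 's/old/new/g' "$file"

echo "$first_only"
echo "$every"
echo "$without_drop"
```

After the last command, notes.txt holds the edited text and notes.txt.bak holds the original.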
When working with files on Linux, it is common practice to compress them to save space and to bundle multiple files into a single archive. The commands gzip, gunzip, and tar are the essential tools for file compression and archiving.
tar is used to create, extract, and manipulate archive files. tar itself does not compress files; it only archives them by combining multiple files and directories into a single file, which usually has a .tar extension. It can, however, be combined with compression utilities (like gzip or bzip2) to compress the archive.
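tar -cvf archive.tar my_folder ## Create an archive from my_folder
tar -xvf archive.tar ## Extract the archive
tar -cvzf archive.tar.gz my_folder ## Create a gzip-compressed archive
tar -xvzf archive.tar.gz ## Extract a gzip-compressed archive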
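A full round trip (archive, inspect, extract) can be rehearsed in a scratch directory before touching real data. In this sketch the folder contents and the restore directory are made up; -t (list contents) and -C (extract into a given directory) are tar options not shown above.

```shell
# Scratch area with a small folder to archive (names are illustrative)
work=$(mktemp -d)
cd "$work"
mkdir my_folder
echo "hello" > my_folder/a.txt

tar -czf archive.tar.gz my_folder        # create a gzip-compressed archive
listing=$(tar -tzf archive.tar.gz)       # -t lists the contents without extracting

mkdir restore
tar -xzf archive.tar.gz -C restore       # -C extracts into another directory
restored=$(cat restore/my_folder/a.txt)

echo "$listing"
echo "$restored"
```

Listing an archive with -t before extracting is a cheap way to check what it will write and where.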
VARIABLE=value ## No space around =
echo $VARIABLE ## Prefix the name with $ to read its value
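The sketch below shows assignment and expansion together, including how quoting affects expansion. The variable names and values are hypothetical.

```shell
PROJECT=rnaseq                  # assignment: no spaces around =
OUTDIR="results/$PROJECT"       # double quotes: $PROJECT is expanded
LITERAL='results/$PROJECT'      # single quotes: the text is kept as-is

echo "$OUTDIR"                  # results/rnaseq
echo "$LITERAL"                 # results/$PROJECT
```

Wrapping expansions in double quotes, as in echo "$OUTDIR", keeps values that contain spaces from being split into several words.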
>: Overwrites the contents of a file with the command’s output
$ cat file1 file2 > files
>>: Appends the output to the end of an existing file
$ cat file3 >> files
<: Uses the contents of a file as input to a command
$ sort < names.txt
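The three redirection operators can be tried together on a few one-line files. This is a minimal sketch in a scratch directory; the file names follow the examples above and their contents are made up.

```shell
# Scratch area with three one-line files (contents are illustrative)
work=$(mktemp -d)
cd "$work"
echo "b" > file1
echo "a" > file2
echo "c" > file3

cat file1 file2 > files     # > creates or overwrites: files now holds b, a
cat file3 >> files          # >> appends: files now holds b, a, c
sorted=$(sort < files)      # < feeds the file to sort as standard input

cat files
echo "$sorted"
```

Running the first cat command a second time would reset files to two lines, which is the practical difference between > and >>.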
Pipes in Linux are a powerful feature that allows you to connect the output of one command directly as the input to another command. This is a key concept in Unix/Linux philosophy, which promotes the use of small, modular tools that can be combined to perform complex tasks.
A pipe is represented by the | symbol. When you place a pipe between two commands, the standard output (stdout) of the command on the left of the pipe becomes the standard input (stdin) for the command on the right.
command1 | command2
$ sort file.txt | uniq
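Pipes can be chained to any length. The sketch below extends the sort | uniq example to count distinct values and find the most frequent one; the input file and its chromosome labels are made up, and uniq -c (prefix each line with its count) and sort -rn (numeric, descending) are options not shown above.

```shell
# Scratch file with repeated values (contents are illustrative)
dir=$(mktemp -d)
hits="$dir/hits.txt"
printf 'chr1\nchr2\nchr1\nchr3\nchr1\n' > "$hits"

# uniq only collapses adjacent duplicates, so the input is sorted first
distinct=$(sort "$hits" | uniq | wc -l | tr -d ' ')

# count each value, order by count (largest first), keep the top one
top=$(sort "$hits" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "$distinct"   # 3
echo "$top"        # chr1
```

Each stage does one small job, and no intermediate files are written.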
If you’d like extra practice with the Linux command line beyond today’s workshop, I recommend trying the Bandit wargame from OverTheWire.