Wildcards, Filters and Regex in Linux

Howdy fellows! It’s gutsytechster!!

Today we will advance our knowledge a bit more with some new topics. If you don’t know the basic commands, this post might be a little difficult to follow, so I suggest you go through An Introduction to Linux first.

Have you ever thought how easy it would be if a whole set of files could be operated on with just one command? There are also some amazing options that let you control how the output looks. All of this can be done using wildcards, filters, regex and piping. We’ll have a look at each of them. Let’s start then.

Wildcards

A wildcard is a substitute for a group of characters, which results in the formation of a pattern defining a set of files and directories. Wildcards greatly increase the flexibility and efficiency of searches in any *nix operating system. They are usually used with Linux commands. There are basically three wildcards which are used often:

  • * (Star) : It represents zero or more characters.
  • ? (Question Mark) : It represents exactly one character.
  • [] (Square Brackets) : It represents exactly one character out of the set or range given inside the brackets.

Examples:

  1. ls b* :- This command lists all files and folders whose names start with b, whatever letters come after it.
  2. ls *.txt :- This command lists every file and folder whose name ends with .txt.
  3. ls ?p* :- This command lists all files and folders whose second character is p.
  4. ls *[qv] :- This lists files and folders whose names end with either q or v.
  5. ls *[0-9]* :- This shows every file whose name has a digit in it.
  6. ls [^a-k]* :- Here, I used the character ^ (caret). Placed first inside the brackets, it excludes the characters in the range. Therefore, the command shows all files and folders which do not start with a letter from a to k. (Bash accepts ^ here; the portable spelling is [!a-k]*.)

These are only a few examples of where you can use wildcards. I have used only the ls command, though wildcards work with other commands too, because the shell expands the pattern before the command runs. For example, mv home/*.??g home/images/ moves every file in home whose extension is three characters long and ends in g, such as jpg or png files, into the given folder. Used like this, wildcards save a lot of time.
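To see what each pattern above actually expands to, here is a small sketch you can run in a scratch directory. The file names are made up for the demo, and I used echo instead of ls so that each expansion prints on a single line. It also uses the portable [!a-k] spelling for the negated range.

```shell
#!/bin/sh
# Build a throwaway directory so the patterns have something to match.
# Every file name below is invented for this demo.
LC_ALL=C; export LC_ALL           # keeps ranges such as [a-k] predictable
demo=$(mktemp -d)
cd "$demo" || exit 1
touch apple.txt banana.txt bp.dat cherry.jpg plum.png report7 zq

starts_with_b=$(echo b*)          # names beginning with b
txt_files=$(echo *.txt)           # names ending in .txt
second_p=$(echo ?p*)              # second character is p
ends_q_or_v=$(echo *[qv])         # last character is q or v
has_digit=$(echo *[0-9]*)         # a digit anywhere in the name
not_a_to_k=$(echo [!a-k]*)        # first character outside a-k
images=$(echo *.??g)              # three-character extension ending in g

printf '%s\n' "$starts_with_b" "$txt_files" "$second_p" "$ends_q_or_v" \
  "$has_digit" "$not_a_to_k" "$images"

cd / && rm -rf "$demo"            # clean up the scratch directory
```

Because the shell does the expansion, the same patterns behave identically with ls, mv, cp or any other command.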

Filters

Filters are programs that take textual data as input and transform it in some way before printing it. These filters have various command line options which give more control over their output. So, don’t forget to look at the man page for each of these filters. There are a number of filters, such as:

  • head :- It’s a program that prints the first n lines of its input. By default it prints the first 10 lines.
head sample.txt       # This will print the first 10 lines of the file
head -4 sample.txt    # This will print the first 4 lines of the file
  • tail :- It’s a program that prints the last n lines of its input. By default it prints the last 10 lines.
tail -6 sample.txt    # This will print the last 6 lines of the file
  • sort :- It sorts its input, as the name says. By default it sorts alphabetically, though we can change its behaviour using many command line options. So its man page is worth a look.
sort sample.txt      # This will print the lines of sample.txt in alphabetical order
  • nl :- It stands for number lines. As the name suggests, it shows the line number along with each line.
nl sample.txt       # This will show each line with line number
nl -s '.' -w 10 sample.txt

The first command is self-explanatory, though the second one needs a little attention. We used two command line options, -s and -w. The -s option specifies the string that comes right after the number, and -w specifies the width of the number column; the number is right-aligned in it, so a larger width means more padding in front. The output would be somewhat like this:

                   1. #something written here
                   2. #something is written here also
                   3. 
                   4. #there would be more lines in your output

The space before each number is the padding that comes from the width we gave with -w.
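Here is a small sketch that tries head, tail, sort and nl on a made-up five-line sample.txt. The file contents are my own assumption, just to have something to look at. I used the -n spelling for head and tail (head -n 2), which is the portable form of head -2.

```shell
#!/bin/sh
# sample.txt here is an invented five-line file, created only for this demo.
demo=$(mktemp -d)
printf '%s\n' pear apple fig cherry banana > "$demo/sample.txt"

first_two=$(head -n 2 "$demo/sample.txt")      # first two lines of the file
last_two=$(tail -n 2 "$demo/sample.txt")       # last two lines of the file
sorted=$(LC_ALL=C sort "$demo/sample.txt")     # alphabetical; the file itself is untouched
# -w 3 right-aligns the number in 3 columns, -s '. ' is printed after it
numbered=$(nl -s '. ' -w 3 "$demo/sample.txt" | head -n 2)

printf '%s\n' "$first_two" "$last_two" "$sorted" "$numbered"

rm -rf "$demo"                                 # clean up the scratch directory
```

Note that sort only prints the sorted lines; sample.txt keeps its original order.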

  • wc :- It stands for word count. By default it prints the line, word and byte counts of its input, though you can restrict the output using command line options: -l for lines only, -w for words only, -m for characters only. You can combine options, e.g. -lw for both lines and words.
wc -l sample.txt
  • cut :- It is used when the text is separated into columns and we want only certain columns.
cut -f 1 -d ' ' sample.txt
cut -f 1,2 -d ' ' sample.txt

We used the cut filter with two command line options, -f and -d. -f selects the fields (columns) we want; in the second example we want both fields 1 and 2, so we list them both. By default cut takes the tab character as the separator; to specify any other we use the -d option, as here, where we specified a space as the separator.
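A quick sketch of wc and cut on a made-up, space-separated sample.txt (the names and numbers are invented). I feed the file to wc through < so that the file name is not printed next to the count, and strip spaces because some systems pad the number.

```shell
#!/bin/sh
# An invented three-line file with space-separated columns.
demo=$(mktemp -d)
printf '%s\n' 'alice 30 paris' 'bob 25 rome' 'carol 41 oslo' > "$demo/sample.txt"

line_count=$(wc -l < "$demo/sample.txt" | tr -d ' ')   # number of lines
word_count=$(wc -w < "$demo/sample.txt" | tr -d ' ')   # number of words
names=$(cut -f 1 -d ' ' "$demo/sample.txt")            # first column only
names_ages=$(cut -f 1,2 -d ' ' "$demo/sample.txt")     # first and second columns

echo "lines: $line_count, words: $word_count"
printf '%s\n' "$names" "$names_ages"

rm -rf "$demo"                                         # clean up
```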

  • sed :- It stands for stream editor. It allows us to search for and replace specific data.
sed 's/search/replace/g'    # General syntax
sed 's/cash/trash/g'        # Replace every occurrence of cash with trash

In the above commands, s stands for substitute (it specifies the action to be performed). After the first / comes the text to search for, after the second / comes its replacement, and g stands for global, i.e. replace every occurrence on a line rather than only the first.
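To see what the g flag changes, here is a minimal sketch on a one-line sample.txt that I made up. It runs the same substitution with and without g, and also shows that sed prints the result without modifying the file.

```shell
#!/bin/sh
# A one-line file invented for this demo.
demo=$(mktemp -d)
printf '%s\n' 'cash in, cash out' > "$demo/sample.txt"

every=$(sed 's/cash/trash/g' "$demo/sample.txt")      # g: every occurrence on the line
first_only=$(sed 's/cash/trash/' "$demo/sample.txt")  # no g: only the first occurrence
original=$(cat "$demo/sample.txt")                    # the file itself is unchanged

printf '%s\n' "$every" "$first_only" "$original"

rm -rf "$demo"                                        # clean up
```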

  • uniq :- It stands for unique. Its work is to remove duplicate lines from the data. The only limitation is that those lines must be adjacent.
uniq sample.txt
  • tac :- It is just the opposite of cat (in spelling also). It prints the last line first and works back to the first line.
tac sample.txt
  • awk :- It is used when we need to work with data which is organised into records and fields.
awk '{print $3}' sample.txt

The above command will print the third column of the given file, where columns are separated by whitespace by default. We enclosed the expression in single quotes so that the shell does not interpret any symbol with its special meaning. Curly braces mark the part that is an action.
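The last three filters can be tried together on a small made-up file. In this sketch the file contents are invented; notice that the fourth line repeats the first, but is not adjacent to it, so uniq alone keeps it. Sorting first is one common way around that. tac is assumed to be available, as it is on typical Linux systems.

```shell
#!/bin/sh
# An invented file where one duplicate line is not adjacent to the others.
LC_ALL=C; export LC_ALL
demo=$(mktemp -d)
printf '%s\n' 'red 1 x' 'red 1 x' 'blue 2 y' 'red 1 x' > "$demo/sample.txt"

adjacent_only=$(uniq "$demo/sample.txt")         # drops only the adjacent duplicate
fully_unique=$(sort "$demo/sample.txt" | uniq)   # sort first so all duplicates touch
reversed=$(tac "$demo/sample.txt")               # last line first
third_col=$(awk '{print $3}' "$demo/sample.txt") # third whitespace-separated field

printf '%s\n' "$adjacent_only" "--" "$fully_unique" "--" "$reversed" "--" "$third_col"

rm -rf "$demo"                                   # clean up
```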

Well, these were some of the filters which would be very helpful in displaying data according to our requirements.

Regular Expressions

A regular expression is a means of describing a pattern to match against a given string. Through regular expressions we can search, replace, validate, coordinate, reformat, etc. in an efficient manner. They are often called RegEx. They are similar to wildcards but with much more functionality.

There are symbols used in regular expressions, so have a look at them:

  • . (dot) :- It represents any single character.
  • ? :- The preceding character matches zero or one time only.
  • * :- The preceding character matches zero or more times.
  • + :- The preceding character matches one or more times.
  • {p} :- The preceding character matches exactly p times.
  • {p,q} :- The preceding character matches at least p times but not more than q times.
  • {p,} :- The preceding character matches at least p times.
  • [agd] :- The character is one of those included in the [].
  • [^agd] :- The character is not one of those included in the [].
  • [a-z] :- The dash gives a range. In this case, the match can be any character from a to z.
  • () :- It allows us to group several characters together.
  • |(pipe) :- It behaves as a logical OR operation.
  • ^ :- Matches the beginning of the line.
  • $ :- Matches the end of the line.

grep

The grep command searches a given set of data and prints every line which contains a given pattern. This seemingly trivial program is very powerful when used correctly. It is most often used with regular expressions; its name stands for global regular expression print. It supports many command line options, so its man page is worth a look.

Examples

  1. To identify any line with two or more vowels in a row, we can use grep -E '[aeiou]{2,}' sample.txt. The -E option turns on extended regular expressions, which plain grep needs for {}, + and |.
  2. To identify any line with a 2 on it which is not the end of the line, you would use grep -E '2.+' sample.txt.
  3. grep '2$' sample.txt will give those lines which have the number 2 as the last character.
  4. To get each line which contains either ‘is’ or ‘go’ or ‘or’, you can type grep -E 'or|is|go' sample.txt.
  5. Lines which begin with a capital letter from A to K can be found with grep '^[A-K]' sample.txt.
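The five searches above can be tried on a small sample.txt. The file contents below are invented for the demo. The sketch passes -E wherever the pattern uses {}, + or |, since plain grep treats those as ordinary characters unless they are escaped.

```shell
#!/bin/sh
# An invented five-line file to search through.
LC_ALL=C; export LC_ALL                      # keeps the [A-K] range predictable
demo=$(mktemp -d)
printf '%s\n' 'Alice has 2 cats' 'zoo keeper 12' 'Bob is 42' 'rhythm' 'Morning' \
  > "$demo/sample.txt"

two_vowels=$(grep -E '[aeiou]{2,}' "$demo/sample.txt")  # two or more vowels in a row
two_not_last=$(grep -E '2.+' "$demo/sample.txt")        # a 2 followed by something
ends_with_2=$(grep '2$' "$demo/sample.txt")             # 2 as the last character
either=$(grep -E 'or|is|go' "$demo/sample.txt")         # contains or, is, or go
starts_a_to_k=$(grep '^[A-K]' "$demo/sample.txt")       # begins with A through K

printf '%s\n' "$two_vowels" "--" "$two_not_last" "--" "$ends_with_2" "--" \
  "$either" "--" "$starts_a_to_k"

rm -rf "$demo"                                          # clean up
```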

There are many more examples, but you will learn only by doing. So I recommend practicing regular expressions to get a good command over them. For more you can refer here.

Piping and Redirections

Every program we run on the shell has three data streams connected to it by default:

  • STDIN(0)
  • STDOUT(1)
  • STDERR(2)

We can manipulate these data streams using piping and redirections.

Redirections

  • We can redirect the output of a command into a file using the ‘>’ (greater than) operator, e.g. ls > output.txt will store the result of the ls command in output.txt (if the file doesn’t exist, it will be created).
  • If we save the output to an existing file, then its old data will be truncated and the new output will be saved. To append the data we use ‘>>’ (double greater than) operator.
  • In a similar way, the ‘<’ (less than) operator sends data the other way, from a file into a command. For example, wc -w < sample.txt prints the word count for the file sample.txt. Both operators can be used at the same time, e.g. wc -l < sample.txt > output.txt.
  • If you noticed, I mentioned a number with each data stream, and that’s for a purpose. Each stream has a stream number associated with it, and using that number we can also redirect errors into a file, e.g. ls -l exist.txt 2> error.txt. The command produces an error because the file exist.txt doesn’t exist, but the error is not shown on the screen; instead it is saved into the file error.txt.
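The four redirections above can be checked in a scratch directory. This is only a sketch: the file contents are invented, and exist.txt is deliberately never created so that ls has something to complain about.

```shell
#!/bin/sh
# Scratch directory and file contents are invented for this demo.
demo=$(mktemp -d)
cd "$demo" || exit 1

printf '%s\n' 'one two' 'three' > sample.txt   # > creates (or truncates) the file
echo 'four five six' >> sample.txt             # >> appends instead of truncating
lines_after_append=$(wc -l < sample.txt | tr -d ' ')

wc -w < sample.txt > output.txt                # STDIN from a file, STDOUT to a file
words=$(tr -d ' ' < output.txt)

# exist.txt is missing on purpose, so ls writes a message to STDERR (stream 2).
ls -l exist.txt 2> error.txt || ls_status=$?
error_text=$(cat error.txt)

echo "lines=$lines_after_append words=$words ls_status=$ls_status"
echo "captured error: $error_text"

cd / && rm -rf "$demo"                         # clean up
```

Nothing from the failed ls reaches the screen until we print error.txt ourselves.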

Piping (|)

Piping is used to send data from one program to another. It uses the same symbol as the logical OR in regular expressions, but the operation is totally different, so don’t get confused. It takes the output of the program on the left and feeds it as input to the program on the right. Let’s have some examples:

  1. ls | head -3 : It will take the output of ls and print the first three entries as the final output.
  2. We can also use as many pipes as we want, e.g. ls | head -3 | tail -2. The output of the 1st example is passed to tail -2, which prints the last two of those three entries as the final output.
  3. We can also mix the piping and redirections together, e.g. ls | head -3 | tail -2 > output.txt.
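Here is a sketch of the three pipelines in a scratch directory holding five made-up files, so the result is easy to predict. I wrote head -n 3 and tail -n 2, the portable spelling of head -3 and tail -2.

```shell
#!/bin/sh
# Five invented files, so ls prints a.txt through e.txt, one per line.
LC_ALL=C; export LC_ALL
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.txt b.txt c.txt d.txt e.txt

first_three=$(ls | head -n 3)               # first three entries
last_two_of_those=$(ls | head -n 3 | tail -n 2)   # last two of those three

# Same pipeline, with the final output redirected into a file.
# output.txt sorts after e.txt, so it cannot change the first three entries.
ls | head -n 3 | tail -n 2 > output.txt
saved=$(cat output.txt)

printf '%s\n' "$first_three" "--" "$last_two_of_those" "--" "$saved"

cd / && rm -rf "$demo"                      # clean up
```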

Well, with this we have reached the end of this blog post. I tried to cover many topics in a short manner. By now you should at least have an idea about these concepts. Each topic in itself requires more reading and practice. So don’t limit yourself to this post only.

Meet you in the next blog. Till then, goodbye! And…

Be curious!
