Split Command

Splitting files with the split command

The split command breaks a large file into a series of smaller files. In the following example, we start with a single large file that we will then split into multiple smaller files:




john@sles01:~/split> ls -l
total 116
-rw-r--r-- 1 john users 112000 May  7 10:26 file1
john@sles01:~/split> wc -l file1
8000 file1

From the above we can see that the file "file1" contains 8000 lines.

By default, when we run the split command without any options, it writes output files containing at most 1000 lines each. This limit can be changed with the "-l" option.



john@sles01:~/split> split file1
john@sles01:~/split> ls -l
total 244
-rw-r--r-- 1 john users 112000 May  7 10:26 file1
-rw-r--r-- 1 john users  14000 May  7 10:29 xaa
-rw-r--r-- 1 john users  14000 May  7 10:29 xab
-rw-r--r-- 1 john users  14000 May  7 10:29 xac
-rw-r--r-- 1 john users  14000 May  7 10:29 xad
-rw-r--r-- 1 john users  14000 May  7 10:29 xae
-rw-r--r-- 1 john users  14000 May  7 10:29 xaf
-rw-r--r-- 1 john users  14000 May  7 10:29 xag
-rw-r--r-- 1 john users  14000 May  7 10:29 xah

john@sles01:~/split> wc -l xaa
1000 xaa

After the split command has been run against file1, we can see that we now have eight smaller files each containing 1000 lines.
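
To try this without an existing large file, the session above can be approximated with a generated test file. This is only a sketch: the temporary directory and the "line NNNNNNNN" contents are assumptions chosen so that the file has 8000 lines of 14 bytes each, matching the sizes shown above.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build an 8000-line file; each line is 13 characters plus a newline (14 bytes).
awk 'BEGIN { for (i = 1; i <= 8000; i++) printf "line %08d\n", i }' > file1

# Default behaviour: at most 1000 lines per output file, named xaa, xab, ...
split file1

ls x* | wc -l    # 8 pieces
wc -l < xaa      # 1000 lines in the first piece
wc -c < xaa      # 14000 bytes in the first piece
```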

Each of the eight files has been given a unique name: xaa, xab, xac and so on. We can use the "-d" option to switch to numeric suffixes:



john@sles01:~/split> split -d file1
john@sles01:~/split> ls -l
total 372
-rw-r--r-- 1 john users 112000 May  7 10:26 file1
-rw-r--r-- 1 john users  14000 May  7 10:33 x00
-rw-r--r-- 1 john users  14000 May  7 10:33 x01
-rw-r--r-- 1 john users  14000 May  7 10:33 x02
-rw-r--r-- 1 john users  14000 May  7 10:33 x03
-rw-r--r-- 1 john users  14000 May  7 10:33 x04
-rw-r--r-- 1 john users  14000 May  7 10:33 x05
-rw-r--r-- 1 john users  14000 May  7 10:33 x06
-rw-r--r-- 1 john users  14000 May  7 10:33 x07
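
The leading "x" in the output names is only the default prefix. split accepts an optional prefix after the input file name, and "-a" sets the suffix length. The sketch below assumes GNU split and uses a hypothetical prefix "part_" on a generated test file:

```shell
# Work in a throwaway directory with a generated 8000-line test file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
awk 'BEGIN { for (i = 1; i <= 8000; i++) printf "line %08d\n", i }' > file1

# -d       numeric suffixes
# -a 3     three-character suffixes (000, 001, ...)
# part_    output prefix used instead of the default "x"
split -d -a 3 file1 part_

ls part_*    # part_000 through part_007
```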

We can also split our large file into a specified number of "chunks" by using the "-n" option:



john@sles01:~/split> split -n4 file1
john@sles01:~/split> ls -l
total 228
-rw-r--r-- 1 john users 112000 May  7 10:26 file1
-rw-r--r-- 1 john users  28000 May  7 10:36 xaa
-rw-r--r-- 1 john users  28000 May  7 10:36 xab
-rw-r--r-- 1 john users  28000 May  7 10:36 xac
-rw-r--r-- 1 john users  28000 May  7 10:36 xad
john@sles01:~/split> wc -l xaa
2000 xaa

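Now we have four files, each with 2000 lines. Note that "-n 4" divides the file by size, not by line count. The line counts come out even here only because every line in file1 is the same length; with lines of varying length, a chunk boundary can fall in the middle of a line. GNU split accepts "-n l/4" to split into four chunks without breaking lines.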
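
The difference between size-based and line-based chunks only shows up when lines differ in length. The following sketch assumes GNU split (for the "l/N" form) and uses a hypothetical test file "nums" and the prefixes "bytes_" and "lines_":

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Lines of varying length: the numbers 1 to 1000, one per line.
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i }' > nums

# Four chunks by size; a boundary may fall in the middle of a line.
split -n 4 nums bytes_

# Four chunks that only ever break on line boundaries.
split -n l/4 nums lines_

# Line counts of the line-based chunks add up to the original 1000.
wc -l lines_*

# Final line of the first size-based chunk; it may be a partial number.
tail -n 1 bytes_aa
```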

We can also specify the number of lines we require in each file by using the "-l" option:



john@sles01:~/split> split -l4000 file1
john@sles01:~/split> ls -l
total 236
-rw-r--r-- 1 john users 112000 May  7 10:26 file1
-rw-r--r-- 1 john users  56000 May  7 10:39 xaa
-rw-r--r-- 1 john users  56000 May  7 10:39 xab
john@sles01:~/split> wc -l xaa
4000 xaa

By specifying "4000" after the "-l" option, we have reduced the number of files to two, each containing 4000 lines.

Now let's join the files back together using the "cat" command. "cat" is often used for quickly displaying the contents of a file on stdout, but its main purpose is to concatenate files. In the example below, we join (concatenate) file "xaa" with file "xab":



john@sles01:~/split> rm file1
john@sles01:~/split> cat xaa xab > file1
john@sles01:~/split> ls -l
total 236
-rw-r--r-- 1 john users 112000 May  7 11:08 file1
-rw-r--r-- 1 john users  56000 May  7 11:08 xaa
-rw-r--r-- 1 john users  56000 May  7 11:08 xab
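
Before deleting the original, it can be worth confirming that the pieces really do reassemble to the same content. A minimal sketch, assuming a generated test file and the hypothetical names "piece_" and "rejoined":

```shell
# Work in a throwaway directory with a generated 8000-line test file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
awk 'BEGIN { for (i = 1; i <= 8000; i++) printf "line %08d\n", i }' > file1

# Split into 4000-line pieces named piece_aa and piece_ab.
split -l 4000 file1 piece_

# The shell expands piece_* in sorted order, which matches the order split wrote them.
cat piece_* > rejoined

# cmp exits with status 0 only if the two files are byte-for-byte identical.
cmp file1 rejoined && echo "identical"
```

Keeping the original until cmp succeeds avoids losing data if a piece is missing or the pieces are joined in the wrong order.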

Commonly used options



Split options

       -a, --suffix-length=N
              use suffixes of length N (default 2)

       -b, --bytes=SIZE
              put SIZE bytes per output file

       -C, --line-bytes=SIZE
              put at most SIZE bytes of lines per output file

       -d, --numeric-suffixes
              use numeric suffixes instead of alphabetic

       -e, --elide-empty-files
              do not generate empty output files with `-n'

       -l, --lines=NUMBER
              put NUMBER lines per output file

       -n, --number=CHUNKS
              generate CHUNKS output files.  CHUNKS may be N (split into N files by size) or l/N (split into N files without splitting lines)

       -u, --unbuffered
              immediately copy input to output with `-n r/...'

       --verbose
              print a diagnostic just before each output file is opened
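
As an illustration of two options from the list that the examples above do not use, the sketch below splits by byte count with "-b" and reports each output file with "--verbose". It assumes GNU split, a generated test file, and a hypothetical prefix "chunk_":

```shell
# Work in a throwaway directory with a generated 112000-byte test file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
awk 'BEGIN { for (i = 1; i <= 8000; i++) printf "line %08d\n", i }' > file1

# -b 50000   put 50000 bytes in each output file (the last one gets the remainder)
# --verbose  print a message as each output file is created
split -b 50000 --verbose file1 chunk_

# Three pieces: 50000, 50000 and 12000 bytes.
wc -c chunk_*
```

Because "-b" counts bytes, a piece can end in the middle of a line; "-C" is the option to use when lines must stay intact.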