//**1. Converting a scanned document to a pdf document**//
----
Last year I did some consulting for a law firm that required me to submit time sheets with my invoices. In any given invoice period I would undertake work involving multiple clients. Work undertaken for each client was broken down into standard categories: telephone call, email, meeting, etc.
I was working from my own home and my first inclination was to record everything on a spreadsheet formatted to look like the log. However, this was a bit cumbersome and I found that it was much simpler to just keep a log on the side of my desk, or in my diary, and pen in entries as necessary.
The law firm filed everything as pdf files so I had to submit my log forms via email as pdf documents.
To make life easy I wrote the following script to convert the scanned log forms from an image to a pdf. I used xsane set to lineart for scanning and saved as either .jpg or .png, which resulted in an image just about the same width and height as an A4 document. With xsane set to lineart and 300 dpi, the pdf files were around 93.5kB.
#!/bin/bash
############################################################
# /usr/local/bin/con2pdf
# Usage: con2pdf [input file]
# Converts an image to a pdf.
# requires awk
# requires convert (from ImageMagick)
############################################################
# Assign a variable
input_file=$1
# Test to see if a variable was provided with the command
test -n "$input_file"
if [ $? -eq 1 ]; then
echo -e "\nUsage: con2pdf [input file]\n"
exit
fi
# Assign another variable using the output of a command
output_file=`echo "$input_file" | awk -F "." '{ print $1 }'`.pdf
convert $input_file $output_file # This line does the actual conversion
rm $input_file # This line removes the image file
# end of script
I will now explain how this script works:
Note that with the exception of the first line, any text prefixed with a hash, //#//, is ignored up until the next newline. Text prefixed with a hash is usually referred to as a comment. Comments can be put on the same line as a command but only after the command. There are no hard and fast rules about using comments. They are handy to explain things to other folks as well as oneself. I normally don't comment a small script as much as this one. Usually I just add some notes at the top and then perhaps add comments to explain why something is done a certain way for future reference.
#!/bin/bash
The first line of my script begins with the two characters " # " and " ! ". Since files are seen by programs as streams of data, a method is required to determine the format of a particular file within the filesystem. Different operating systems have traditionally taken different approaches to this problem. In the case of Unix, and in our case Linux, " #! " tells the kernel to treat the file as an executable script and not a machine code program. "/bin/bash" declares the path to the command interpreter that will be used, in this instance //bash//.
input_file=$1
This line assigns the first argument, i.e. the file name that has been entered after the command //con2pdf//, to the variable //input_file//. More than one argument can be passed to a script when it is run and they are numbered $1, $2, etc., but I only want to pass the name of the input file to the script in this instance.
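As a small illustration of how the numbered arguments line up, here is a sketch using a shell function, which receives its arguments in the same way a script does. The function name //show_args// and the file names are made up for the example.

```shell
# show_args is a hypothetical helper, not part of con2pdf.
# A function sees its arguments as $1, $2, ... just as a script does,
# and $# holds the number of arguments supplied.
show_args() {
    echo "count: $#"
    echo "first: $1"
    echo "second: $2"
}

show_args scan_01.png scan_02.png
```

This should print the count 2 followed by the two file names in the order they were given.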
test -n "$input_file"
This line uses //test//, a bash built-in command (builtin), to test if the variable is a non-empty string, i.e. if a file name was passed to the script when the command //con2pdf// was run. //Test// will exit with an exit status of 0 (true) if //input_file// is a non-empty string and 1 (false) if it is empty. The exit status is not printed to stdout but the shell keeps it in the special variable //$?//, which can then be evaluated using an //if statement//.
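To see the two exit statuses for yourself, something like the following sketch can be pasted into a terminal. The helper name //report_status// is an assumption made for the example; it simply captures //$?// after //test// and prints it.

```shell
# report_status is a hypothetical helper used only for this illustration.
report_status() {
    status=0
    # If test fails, $? holds its exit status; keep a copy of it.
    test -n "$1" || status=$?
    echo "$status"
}

report_status "scan.png"   # non-empty string: prints 0
report_status ""           # empty string: prints 1
```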
if [ $? -eq 1 ]; then
echo -e "\nUsage: con2pdf [input file]\n"
exit
fi
This //if statement// evaluates //$?// to see if it is equal to 1.
If //$?// equals 1 then it will run the bash builtin //echo//, which prints the text within the double quotes to stdout. //Echo// is used with the flag //-e// which enables interpretation of backslash escapes. In this instance a newline, //\n//, is inserted before and after the text.
The next command is the bash builtin //exit//, which exits the script. Used without a number, //exit// passes on the exit status of the last command that ran, which here is //echo//.
All if statements must be closed with //fi//.
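The test and the //if statement// can also be folded into one step, because //if// acts directly on the exit status of whatever command follows it. The sketch below is a variation and not the script above: the function name //check_input// is hypothetical, it uses //-z// (true for an empty string), sends the usage message to stderr, and uses //return 1// where a script would use //exit 1//.

```shell
# check_input is a hypothetical name; it mirrors the argument check in con2pdf.
check_input() {
    input_file=$1
    # [ -z ... ] succeeds when the string is empty, so no separate $? test is needed.
    if [ -z "$input_file" ]; then
        printf '\nUsage: con2pdf [input file]\n\n' >&2
        return 1
    fi
    echo "input file: $input_file"
}

check_input scan.png
check_input || echo "no file name given"
```

Returning a non-zero status lets whatever called the check know that something went wrong.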
output_file=`echo "$input_file" | awk -F "." '{ print $1 }'`.pdf
Instead of passing both an input filename and an output (save) filename to the script, the next line assigns an output filename to the variable //output_file//. Variables can be assigned using the output of a command when the command is enclosed in two backticks, //`[command]`//.
In this line echo is used to print the variable //input_file// but instead of appearing on the terminal its output is passed through a pipe to //awk//.
//Awk//, or //gawk//, is a pattern scanning and text processing program. Here the flag //-F// is used to declare //"."// (full stop) as the field separator. For example, the file name //scanned_image.png// consists of two fields separated by a full stop. Awk will print the first field, //$1// (scanned_image), to stdout. Note that this //$1// is awk's first field and not the script's first argument; the single quotes stop bash from touching it.
Note //.pdf// on the same line, after the second backtick. This appends //.pdf// to the output of awk, so if the first field was scanned_image, the variable //output_file// would be scanned_image.pdf.
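The same pipeline can be tried on its own. The sketch below wraps it in a function with the made-up name //base_with_awk// and uses the //$( )// form of command substitution, which bash accepts as an equivalent of backticks. The second call uses an invented file name to show what happens when a name contains more than one full stop.

```shell
# base_with_awk is a hypothetical wrapper around the same echo | awk pipeline.
base_with_awk() {
    echo "$1" | awk -F "." '{ print $1 }'
}

output_file=$(base_with_awk "scanned_image.png").pdf
echo "$output_file"                          # scanned_image.pdf

# Only the first field is kept, so everything after the first full stop is lost.
echo "$(base_with_awk "log.2011.png").pdf"   # log.pdf
```

For names like the ones I was scanning this makes no difference, but it is worth knowing if your file names or paths contain extra full stops.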
You will find that there is often more than one way to do something when scripting. The command //cut// could also have been used in place of awk.
output_file=`echo "$input_file" | cut -d. -f1`.pdf
Field separators are also referred to as delimiters. In the above line, //-d.// nominates full stop as the delimiter and //-f1// selects field 1 for printing to stdout.
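A third possibility, not used in the script above, is to let the shell do the work itself with parameter expansion. In this sketch //${input_file%.*}// removes the shortest trailing match of a full stop followed by anything, i.e. just the last extension. The file name is invented for the example.

```shell
input_file="log.2011.png"

# %.* strips the shortest suffix that matches ".*", here ".png".
output_file="${input_file%.*}.pdf"

echo "$output_file"   # log.2011.pdf
```

Unlike the awk and cut versions this keeps everything before the last full stop, and it does not need to start another program.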
convert $input_file $output_file
rm $input_file
The next two lines need little explanation.
//Convert// is an ImageMagick utility that converts images from one format to another. The file extension //.pdf// appended to the variable //output_file// ensures that the scanned document image will be converted to pdf format.
I did not want to save the document images so the next line deletes the image file.
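Two things are worth keeping in mind with these last lines: an unquoted variable is split on spaces, and //rm// runs whether or not the conversion worked. The following is a rough sketch of a more cautious variant, not the script described above. The function name //con2pdf_safe// is hypothetical, and //CONVERT_CMD// is an assumed override that exists only so the function can be tried out without ImageMagick; left unset, it falls back to //convert//.

```shell
# con2pdf_safe is a hypothetical variant, not the script described above.
# CONVERT_CMD is an assumed override; if it is unset, convert is used.
con2pdf_safe() {
    input_file=$1
    if [ -z "$input_file" ]; then
        echo "Usage: con2pdf [input file]" >&2
        return 1
    fi
    output_file="${input_file%.*}.pdf"
    # Quoting keeps file names with spaces in one piece, and && means the
    # image is deleted only if the conversion command reports success.
    "${CONVERT_CMD:-convert}" "$input_file" "$output_file" && rm -- "$input_file"
}
```

Calling //con2pdf_safe "my scan.png"// should leave //my scan.pdf// behind and remove the image only when the conversion succeeded.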
----
//I almost always have a terminal open so my scripts are usually intended to be run on the command line. After saving the scanned image into the directory where the relevant pdf records were kept, I would //cd// into that directory and run the command //con2pdf [image name]//.
In the next section I'll show how to modify //con2pdf// so that it will have a graphical interface for both selecting the image file and selecting a path and name for the resulting .pdf file.//
----
**Cheers!**