Archives Outside

For people who love, use and manage archives


Digitising your collection – Part 4: Scanning and handling tips

So far in this digitisation blog series we’ve covered program planning, the golden rule of digitisation, the heady world of techs and specs and now we come to practical tips for image capture.

For general handling of archives see our post Moving and handling – the Basics. Most of the tips below are from our reading room posters that were designed for researchers wishing to scan or photograph archives.

See also care and handling guidelines from the National Library of Australia, which include information on glass plate negatives and transparencies.

What can be scanned?

Items smaller than the bed of the scanner:

  • flat cards and single, loose pages
  • photographs
  • glass plate negatives
  • transparencies.

What should be photographed?

  • documents that require the removal of pins or other fasteners
  • documents that would need to be bent or folded in any way on a scanner
  • documents that retain a strong “fold” memory and will not sit flat easily
  • anything that is larger than the glass on your scanner.

General tips for scanning or photographing

  • use soft leather weights to hold documents in place
  • bleed-through from the reverse page can be reduced by placing a sheet of black card between the pages
  • thin documents may benefit from a sheet of white paper placed behind the page.

Using a flat-bed scanner

  • ensure the scanner is calibrated correctly (some equipment includes colour charts, or you can find information about colour management online)
  • is the glass plate clean? Some archives leave dust behind and the glass may need a clean with a soft lens cloth or blower brush between each scan
  • do not place pressure on the scanner lid to keep a document flat – ensure there is a gap by placing your fingers under the scanner cover.

Scanner lid - ensure no pressure is applied

Handling archives for a photography session

Single Pages

  • if the document has been folded, place leather weights mainly where the heavy folds will not lie flat (Tip: small undulations will not affect the copy quality)
  • ensure the weights do not obscure text or other information
  • take care not to place weights over damaged areas of the document.

Weights helping to flatten file

Bundled files

– fastened with pins, staples, split pins, thread and plastic ring binders

When fastened at the corner:

  • use weights to position the document while a page is opened
  • maintain a soft curve in the page as you open the document – this will prevent hard creases and tears forming around the folds and indentations
  • use a book pillow to maintain a soft curve where the page does not naturally sit this way.

Using weights with stapled file

When fastened along the edge:

  • use support boards to build a level that matches (or is similar to) the document stack
  • use a soft leather weight to hold the page open on the supporting board stack while you photograph from the document stack
  • maintain a soft curve in the open page to prevent creases and tears.

Stack of documents with a support board and long weight

Volumes/ledgers etc

These objects require special handling to prevent damage and to provide the best quality copy image.

  • place the volume with the spine facing you
  • position pillows (our pillows are filled with beanbag beans) against the spine and open the front cover. The spine should sit easily and with no strain on the sewing
  • increase or decrease the number of pillows to provide the best support
  • open the book in small sections to get to the page you wish to copy
  • use soft leather weights to hold pages open

Large volume supported with pillow

Photographic prints

Photographic materials contain silver compounds and sensitive dyes that are very susceptible to damage from the oils and acids in our skin.

  • always wear plastic gloves (plastic provides a complete barrier between the archives and your skin while allowing for good dexterity and handling feel)
  • do not bend or crease the photo – this will crack the emulsion
  • if a photo is fragile or damaged use a camera rather than a scanner

Maps and plans

Maps and plans can be large and unwieldy, and come on varied supports, including paper and plastic.

  • use soft weights to hold down plans that have been rolled
  • hold from two strong points and carry plans in a u-shape to prevent creases

Carrying maps

Tips for creating your master photo files

Capture the whole image

Capture the edges of the photograph (where possible) to show that the image has not been cropped in any way. The original photos won’t necessarily have square edges so this technique will also ensure no information is left out.

Framing the picture

Frames, mounts, backings

Some photos in our collection have decorative supports (see photo of the doctor below) and some are housed in – or have been glued into – photo albums (see the album below).

Will you include these ‘extras’ in the digitised version? At State Records we do; it ensures the item has been captured in its entirety. Does the backing also need scanning? Check for information that may be relevant to the archive and scan if necessary.


Framing of pictures and text on reverse of image. Dr Lawrence William Cock, dated March 1903 Digital ID 9873_a025_a025000097

In a recently digitised photographic series at State Records the photos were stored in albums. We scanned the album page in full and then the images separately.


Showing full page scan of album plus individual photo from the album page

In the next – and final – post we look at quality control, metadata and access.

Digitising your collection – Part 3: Technical specifications

You now know all about the Golden Rule of Digitisation and your plan is starting to come together. In this post we are talking techs and specs such as: image capture; technical definitions; standards and storage.

This is the third post in a series about starting a digitisation program. The series covers: project planning; technical specifications; handling the archives; scanning tips; file storage; and access.


I’d like to thank our photographer, Tara Majoor, for her time, knowledge and contribution to this post.

Warning: we tried to keep this as basic as possible and link out to more in-depth information but you might want to grab a coffee for this one. Alternatively, if you need some bedtime reading…

Image capture – techs and specs

In our last post you learnt the Golden Rule of Digitisation and the importance of creating a master file (from which derivative files are made). As you’ll recall, master files are the original files created during the image capture process: the aim of a master file is to be of a high enough quality to meet your organisation’s access and/or preservation needs, both now and in the future.

In order to meet your digitisation goals you need to make some basic decisions relating to image specifications before you begin capturing images. And, more than likely, because of the differences in the original formats (including fragile records, large maps etc) you will need a set of specifications.

It is the unique characteristics of each archive that will often necessitate different approaches to image capture.

For example:

  • photographs and detailed images require a much greater resolution than text-based documents

The main goal when defining your technical specifications is to create the best digital image possible, given the resources available. A basic understanding of the core imaging principles/concepts will assist in this all important decision-making process.

Resolution, bit-depth (colour depth) and colour management make up the core of a digital image. These core ingredients can contain variable amounts of data depending on your selected input parameters – specifications. You should also take time to consider an archival file type for your master files, and determine what compression (if any) you wish to use.

Tech talk – some helpful definitions

Bit depth, colour management, resolution, compression, what the heck is it all about? Please allow us to shed some light on the situation (thanks Tara).

Image resolution

A digital image is a structured matrix (or grid) of tiny squares known as pixels (picture elements). Each of these pixels has an assigned tonal value and, when viewed in combination with the surrounding pixels, forms the illusion of a continuous tone image.

Image resolution is simply a measurement of the density (or number) of pixels within the digital image. It describes the amount of detail encoded within a digital image. In the scanning world, resolution is a representation of the number of samples taken from the analogue original (photograph, document etc). In general, a greater number of samples (or higher resolution) should result in a more representative digital surrogate.

Resolution can be measured using two methods. In most software programs these are referred to as pixel dimensions and document size/pixels per inch.

Showing image size properties window

Pixel dimensions (also known as pixel array) – makes reference to the number of pixels in the matrix arrangement (array) horizontally and vertically.

For example:

  • 1024 x 768 pixels, or width=1024 and height=768

Document size/pixels per inch – resolution is most commonly expressed in pixels per inch (ppi) and measures the number of pixels along each linear inch of the image, horizontally and vertically.

For example:

  • a 1 inch x 1 inch image @ 300ppi = 300 x 300 pixels

Pixels per inch (ppi) is a variable measurement and is dependent on knowing the overall size of the image; without this scale (or magnification ratio) the measurement loses context.
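The relationship between physical size, ppi and pixel dimensions can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this example and do not come from any scanning software.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px) for a physical size in inches at a given ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def effective_ppi(width_px, width_in):
    """Return the ppi that results when a fixed number of pixels spans width_in inches."""
    return width_px / width_in


# The example from the text: a 1 inch x 1 inch image at 300ppi.
print(pixel_dimensions(1, 1, 300))  # (300, 300)

# The same 300 pixels stretched over 4 inches: ppi depends on the scale.
print(effective_ppi(300, 4))  # 75.0
```

The second call shows why ppi loses context without a known size: the pixel count has not changed, only the scale at which it is displayed or printed.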

[You might be familiar with the term dots per inch (dpi) and while the two terms are often used interchangeably, dpi refers to printed resolution whereas ppi refers to the pixels within the digital image file].

Example of image resolution

Here is a plan from our collection (University Hotel, Parramatta Road, Glebe 1890). Take note of the horse bottom right.

Below is a close-up of the horse, showing three derivatives from the one master file. The higher the resolution, the greater the (uncompressed) file size – from 300ppi for printing down to 75ppi for web delivery.

Showing three versions of image resolution

So, should I be scanning at the highest resolution possible?

A common misconception is that scanning at the highest resolution available will always produce the best quality images. Whilst it is true that the amount of detail captured within an image is controlled through resolution there are some factors to be wary of such as interpolated resolution (see below).

And of course, the higher the resolution at which you scan the bigger the file size and this will impact on your storage options (we’ll get to that later).

Optical Resolution vs Interpolated Resolution

  • Optical Resolution describes the maximum sampling rate possible from a given scanning device
  • Interpolated Resolution is additional ‘resolution’ or data made up (an educated guess) by the software program

Interpolation is not desirable, especially for digitisation practices as it can degrade image quality.

Tip: Take note of your scanner’s optical resolution, and only scan up to the optical limit.

Which leads us to another question…

How do I find out my optical resolution?

Consult your scanner’s manual (search online if you don’t have one). To make life extra confusing optical resolution can be expressed in either pixels per inch (where scale = 1:1) or pixel dimensions. When presented in pixel dimensions the smallest value represents ppi at a 1:1 ratio.

For example:

  • an optical resolution of 600 x 1200px is equivalent to 600ppi at a 1:1 scale
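As a small sketch of that rule of thumb, the following Python picks the smaller figure out of a specification string. The function name and the input format are assumptions for illustration; a real scanner manual may present the figures differently.

```python
import re


def optical_ppi(spec):
    """Return the smaller number found in a spec string such as '600 x 1200px'."""
    values = [int(n) for n in re.findall(r"\d+", spec)]
    if not values:
        raise ValueError(f"no numbers found in {spec!r}")
    return min(values)


print(optical_ppi("600 x 1200px"))  # 600
```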

Bit depth (tonal or colour depth)

This is the measurement of the number of bits – or binary digits – devoted to storing the colour information about each pixel. The number of bits available determines the maximum possible range of colours and luminosity values (or grey shades) that can be represented within an image’s colour space or palette.

The formula for calculating the number of tones from the bit-depth is: 2^(number of bits) = number of grey shades. So, for instance, in a one bit image, each pixel is stored as a single bit (0 or 1), meaning there are only two values available (black [0] or white [1]).

In the image below you can see:

  • 1 bit = 2^1 = 2 grey shades (black 0 or white 1)
  • 8 bit = 2^8 = 256 grey shades

1-bit vs 8-bit

So how do we get the colour?

A 24-bit colour image comprises 8 bits of information for each of the red, green and blue (RGB) channels; so for each pixel there are 256 possible levels of red, 256 of green and 256 of blue:

  • 8 x 3 (RGB) = 24-bits

The palette of colours increases to:

  • 256 x 256 x 256 = 16.7 million colours
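The arithmetic above is easy to check with a short Python sketch (the function name is our own, chosen for illustration):

```python
def tones(bits):
    """Number of distinct values that can be stored in the given number of bits."""
    return 2 ** bits


for bits in (1, 8, 16):
    print(f"{bits}-bit greyscale: {tones(bits):,} shades")

# Colour: 8 bits for each of the three RGB channels.
bits_per_channel = 8
channels = 3
colours = tones(bits_per_channel) ** channels
print(f"{bits_per_channel * channels}-bit colour: {colours:,} colours")
# 24-bit colour: 16,777,216 colours
```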

Down sampling: some scanners may present options such as 48-24(bit) or 36-24(bit). The higher figure is the depth at which the scanner samples the raw data; the software then converts this value into a lower bit-depth (the lower figure) which becomes the final bit-depth of the exported image.
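To give a feel for what a 48-to-24-bit conversion involves, here is a minimal Python sketch that reduces each 16-bit channel value to 8 bits by discarding the least significant byte. This is an assumption made for illustration: actual scanner software may round differently or apply tone adjustments before reducing the depth.

```python
def downsample_channel(value_16bit):
    """Map a 16-bit channel value (0-65535) to 8 bits (0-255) by dropping the low byte."""
    if not 0 <= value_16bit <= 65535:
        raise ValueError("expected a 16-bit value")
    return value_16bit >> 8


def downsample_pixel(rgb48):
    """Convert one 48-bit RGB pixel (three 16-bit values) to 24-bit RGB."""
    return tuple(downsample_channel(v) for v in rgb48)


print(downsample_pixel((65535, 32768, 0)))  # (255, 128, 0)
```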

Some common bit-depths

Depth | No. of tones | Description
1-bit Bi-tonal | 2 | Monochrome – contains only black (0) and white (1) pixels. Useful when digitising clear printed/typed text documents/publications.
8-bit Greyscale | 256 | Provides the tones required for continuous tone greyscale: black and white plus a large range of intermediate greys.
16-bit Greyscale | 65,536 | 16-bit greyscale uses an extended colour space, creating a much larger file (double 8-bit), and requiring storage in formats that explicitly support this colour depth (TIF).
8-bit Colour (VGA) | 256 | This colour mode was used heavily in early digital graphics, and it is still sometimes used by web designers. This depth is NOT suitable for digitisation as it does not create true-tone images.
24-bit Colour | 16.7 million | 24-bit colour is the current standard, supported by a wide range of file formats and applications. It comprises 8 bits of information for each of the red, green and blue (RGB) values.
48-bit Colour | 281 trillion | 48-bit colour (16 bits per RGB channel) uses an extended colour space (trillions of colours), creating a much larger file size (double 24-bit), and requiring storage in formats that explicitly support this colour depth (TIF). Whilst images can be scanned and stored at this high colour depth, affordable monitors and printers that can display or reproduce images of such high quality are not currently available.

Resolution, bit-depth, file size guide

This table is from the State Records NSW Digitisation Guideline and shows the impact of resolution and bit-depth on file size (in megabytes).

Colour depth | Res (ppi) | Total bits | Uncompressed file size
1 bit bi-tonal | 300 | 8,700,867 | 1.04MB
1 bit bi-tonal | 600 | 34,803,468 | 4.15MB
8 bit grey or colour | 300 | 69,606,936 | 8.30MB
8 bit grey or colour | 600 | 278,427,744 | 34.00MB
24 bit colour | 300 | 208,820,808 | 24.89MB
24 bit colour | 600 | 835,283,232 | 101.96MB
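If you want to estimate figures like these for your own material, the calculation is pixels wide x pixels high x bit-depth. The Python sketch below assumes an A4 original (8.27 x 11.69 inches) and 1 MB = 1,048,576 bytes. Neither assumption is stated in the post; the A4 size was chosen because it reproduces the "Total bits" column.

```python
A4_INCHES = (8.27, 11.69)  # assumed page size; not stated in the post


def total_bits(ppi, bit_depth, size_in=A4_INCHES):
    """Uncompressed size in bits: pixels wide x pixels high x bits per pixel."""
    width_px = round(size_in[0] * ppi)
    height_px = round(size_in[1] * ppi)
    return width_px * height_px * bit_depth


def megabytes(bits):
    """Convert bits to megabytes, taking 1 MB as 1024 x 1024 bytes (an assumption)."""
    return bits / 8 / (1024 * 1024)


for depth in (1, 8, 24):
    for ppi in (300, 600):
        bits = total_bits(ppi, depth)
        print(f"{depth:>2}-bit @ {ppi}ppi: {bits:>11,} bits = {megabytes(bits):6.2f} MB")
```

Under these assumptions the bit counts match the table exactly. Two of the 600ppi file sizes come out a little lower than the published figures, which probably reflects a different rounding or megabyte definition in the guideline.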

Colour management

We won’t go too in depth on this as the use of colour management is not mandatory, but it does provide the opportunity to create images that have more accurate colour.

However, for the more experienced digitisation readers …

Colour management outlines the colour capabilities of hardware devices – cameras, scanners, monitors and printers – by creating a translation (profile) that controls how the colour is displayed (or printed) by those devices.

Colour profiles ensure the quality of reproduced colour across many output devices. The minimum requirement for most projects should be an input profile outlining the colour space of the device that was used to digitise the document (most devices will default to sRGB).

Printing is a common scenario where the need for a colour profile is emphasised. Whilst printing may not be the main objective of your digitisation project, the prospective requirements should be taken into account.

Calibration will also help achieve accurate and reliable colour. Calibration refers to the process of stabilising the imaging equipment to provide a consistent colour representation.

For more information see:

http://getty.edu/research/publications/electronic_publications/introimages/image.html

Phew! Still with us? We’re going to power on through to file types and file compression.

File types

Tip: Be wary of proprietary file types – eg: PSD files are Photoshop files. Without the Photoshop program the files may be inaccessible.

TIFF (TIF) – Tagged Image File Format

This is currently the preferred archival format for storage of images. It is the most common uncompressed image file type and retains all of the image information. It also offers lossless compression options (see below under File Compression). Most software programs use this format and it is available for both Macintosh and Windows.

JPG (JPEG) – Joint Photographic Experts Group

This format is highly compressed and removes “unnecessary” image information. Most software programs use this format and it is available for both Macintosh and Windows.

JPEG 2000

A compression standard enabling both lossless and lossy storage. The compression methods are different from the ones in standard JPEG and improve quality and compression ratios. However it requires more computational power (or to be more technical, grunt) to process.

Format | Bit depth | Compression
TIFF (TIF) | RGB – 24/48 bits; Greyscale – 8/16 bits; Indexed colour – 1 to 8 bits | No compression or lossless (LZW)
PNG | RGB – 24/48 bits; Greyscale – 8/16 bits; Indexed colour – 1 to 8 bits | Lossless (ZIP)
JPEG2000 (JP2) | RGB – 24/48 bits; Greyscale – 8/16 bits | Lossless or lossy
JPEG (JPG) | RGB – 24 bits; Greyscale – 8 bits | Lossy
PSD | RGB – 24/48 bits; Greyscale – 8/16 bits; Indexed colour – 1 to 8 bits | No compression

File compression

Compression shrinks the digital images for storage. There are two ways to compress:

1. Lossless eg: TIFF – keeps all data by encoding the image files. It can reduce the file size by 40-60% without scarifying (boo!) any pixel information.

The encoding stores adjacent pixels with the same colour value as a single value and the data records how many pixels have been compressed together. This way of compressing files is highly desirable when no resources for storing un-compressed files are available.
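The idea described above, storing a run of identical pixels as one value plus a count, is known as run-length encoding. Here is a minimal Python sketch of it. It is an illustration of the principle only: the LZW option commonly used in TIFF files is a different, dictionary-based algorithm, although it is lossless in the same way.

```python
def rle_encode(pixels):
    """Collapse runs of identical values into (value, count) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original list of values."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


row = [255, 255, 255, 255, 0, 0, 255]
encoded = rle_encode(row)
print(encoded)  # [(255, 4), (0, 2), (255, 1)]

# Lossless: decoding gives back exactly what went in.
assert rle_decode(encoded) == row
```

Note that a row with few repeated values would not shrink at all under this scheme, which is one reason savings from lossless compression vary with image content.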

We currently store our master files as un-compressed TIFF.

2. Lossy eg: JPEG/JPG – this method of compression permanently removes “un-important data” (subtle colour/tonal information that is hard to distinguish with the human eye), aiming to strike a balance between acceptable loss of detail and bandwidth.

Lossy compression is not recommended for master images, as it scarifies (boo! x2) pixel information. It is, however, very useful for managing the bandwidth of derivative images – particularly those used for online access.

We use compressed JPEG/PNG images on our website.

While lossless compression is preferable you can see in the image below that lossy compression doesn’t always show a loss of detail. It depends on the amount of compression that is applied which in turn depends on the image content and resolution.

Lossy compression showing quality loss with a heavily compressed file

The more compression applied the more visible the result. With lossy compression you can reduce an image to between 1/10 and 1/20 of its original size without perceived loss.

Tip: Lossy compression is irreversible. Each time a jpeg file is saved – even after minor edits – it will lose quality.

File storage – Digital Asset Management

While storage costs decrease as technological capabilities increase, the size and number of individual digital files will have an impact on your resources. Determining an adequate storage capacity for the amount of data your digitisation program will potentially generate is an important part of your plan.

A helpful storage calculation

To estimate the size of storage required for the digital images, a small organisation may have a calculation like this:

[Average file size = 20MB] x [#Digitised files/day = 100] x [Workdays/year = 260] = a storage requirement of 520GB/year (or 1.56TB over 3 years).

A larger organisation could require a storage capacity of up to 10-15TB per year (increasing each year). This calculation is from the State Records NSW Digitisation Guideline.
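To try the calculation with your own numbers, a small Python sketch like the one below may help. The function name is hypothetical, and it uses decimal units (1 GB = 1000 MB) to match the figures above.

```python
def yearly_storage_gb(avg_file_mb, files_per_day, workdays_per_year):
    """Estimated storage per year in GB, taking 1 GB as 1000 MB."""
    return avg_file_mb * files_per_day * workdays_per_year / 1000


per_year = yearly_storage_gb(20, 100, 260)
print(per_year)             # 520.0 (GB per year)
print(per_year * 3 / 1000)  # 1.56 (TB over 3 years)
```

Remember that back-up copies and derivative files will add to this figure.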

Factors to consider for storage

Security – can the files be tampered with/can an unauthorised user gain access?

Accessibility – are the files easy to retrieve by an authorised user? Is there a record of where items are stored? This could include sensible naming conventions for the digital files; organised folders/labels; keywords (metadata). Will they remain accessible long-term as storage systems change or update?

An example of a naming convention for a series of files:

series number + job number + photo/file in sequence = 17420_a012_00004.jpg
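A convention like this is easy to apply consistently with a small script. The Python sketch below builds names in the pattern shown above. The function name is hypothetical, and the five-digit zero-padded sequence number is inferred from the example rather than from any documented rule.

```python
def master_filename(series, job, sequence, ext="tif"):
    """Build a file name: series number, job number and zero-padded sequence number."""
    return f"{series}_{job}_{sequence:05d}.{ext}"


print(master_filename(17420, "a012", 4, ext="jpg"))  # 17420_a012_00004.jpg
```

Zero-padding the sequence number keeps files in the correct order when they are sorted by name.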

At State Records our master files are stored on a dedicated server. Access is limited to authorised staff only, lessening the chance of lost or tampered-with data.

Image files for ‘use’ (web delivery, staff requests, copy orders etc) are stored on a separate server. A greater number of authorised staff have access to these files.

Media – will you store images on a hard-drive; CD/DVD; USB stick/memory card? There’s no perfect medium – each has a limited lifespan.

Back-ups – any of the above media could malfunction – have you made a back-up? Do you regularly update your back-up or check its functionality?
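One simple way to check that a back-up copy still matches its master file is to compare checksums. The Python sketch below uses the standard library's hashlib module. It is only an illustration of the idea, with hypothetical function names, and is not a description of how State Records checks its back-ups.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True if the two files have identical contents."""
    return sha256_of(original) == sha256_of(backup)
```

Calling backup_matches with the path to a master file and the path to its back-up copy returns False if either file has changed or become corrupted.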

Recognised guidelines for capturing digital images

As we’ve discussed above, resolution, colour-depth, file type, compression and storage need to be considered in your plan.

Remember: these parameters often depend on the format of the original item.

Whilst there is currently no universal standard for digitisation specifications, a number of organisations have published recognised guidelines for capturing digital images – we have included them here for your reference.

Every organisation will have differing requirements/capabilities depending on the nature of their collection and the digitisation resources available to them.

If you’re still reading give yourself an almighty pat on the back! In the next post we provide some tips on handling and scanning archives.

Digitising your collection – Part 2: The Golden Rule of Digitisation

So you’ve started to lay out your digitisation plan and have made the decision to scan in-house, outsource the work or split between the two.

This is the second post in a series on starting a digitisation program. The series covers: project planning; technical specifications; handling the archives; scanning tips; file storage; and metadata and access.

The golden rule

‘Capture once, use many times’

By following this philosophy we digitise without an output in mind.

Capture once, use many times

Avoid the trap of creating a digital image to meet an immediate need. You may find later on that another digital image (with a different file format requirement) of the same archive is requested. This means you will have to access that archive a second time, resulting in further moving and handling and potential damage.

Always create a high-resolution master file

…regardless of the original purpose. Many derivatives can be created from the one master file to meet many different needs in the future.

Future uses have not yet been thought of

Needs change over time, as does the digital life of an archive. Our archives often make the must-digitise list for a Digital Gallery on our website. A low res jpeg is suitable for web access but a master file is still digitised and a low res derivative created from it. If a web visitor likes a gallery image and submits a copy order request then a high-quality derivative of the master file can be generated without having to access the original item.

Example of the ‘capture once’ philosophy

A while ago we digitised some railway posters and brochures for an exhibition installation at the Western Sydney Records Centre…you remember, the one where our boss woke up at 3am? The documents were digitised as high res (master) TIFFs.

One derivative was generated as a print-quality file to be displayed as a poster in an exhibition case here:

Photo of exhibition display

See the poster front and centre?

And one derivative was created to become the whopping, great window transparency here at the front doors:


Window poster of the same image – capture once, use many times

Even if we think an image is only to be used as a low/web resolution jpeg for web delivery we still create a high resolution master TIFF. If someone places a reading room request for a high quality image – or our boss has another 3am moment – we can provide it without disturbing the original archive.

Keep your program cost-effective

For a digitisation program to be cost effective and achieve its access and preservation goals the image file needs to be created with flexibility in mind. Maximise the preservation/access benefits and avoid unnecessary handling of the original records.

And remember the Golden Rule…

Next week we get into the nitty gritty of technical specifications (without giving you a headache).

 

Digitising your collection – Part 1: Project Planning

This is the first in a series of posts on starting a digitisation program. In the series we’ll be talking about: project planning; technical specifications; handling the archives; scanning tips; file storage; and metadata and access.

Much of this advice is based on experiences at State Records and we’ll be using examples of State Records practices along the way.

You’ve had the big ‘digitisation’ idea, now where to start?

Whether it be a large-scale project to digitise a whole collection or a more targeted preservation-priority strategy you need to have a plan.

Various scanned archival items

Factors to consider in your plan

We’ll go into some of these in more detail in later posts:

  • scanning – to scan in-house or to outsource
  • prioritising the workload – will you begin with the most requested series in the collection or the more fragile items that are in need of preservation, or will you embark on a digitisation-on-demand approach
  • prepping the records – is conservation required, do the archives need re-housing after digitisation, are the items being indexed as part of the project
  • technicalities – what resolution for the ‘master’ file, what is bit-depth, what is file compression, what equipment will be needed
  • time-frames – and workflows depend on the size of the collection and the number of staff allocated to the digitisation project
  • metadata – what are the requirements, generating a unique identifier etc
  • quality assurance checks
  • storage of digital images – long-term and ongoing costs involved
  • what is the plan for the image files – will they become accessible on your website, in an online gallery, in a searchable database, on a social media site such as Flickr, will you need IT expertise to fulfil your vision.

Remember: A digitisation project requires a financial investment – from the initial scanning of the archives (whether it be in-house or outsourced) to the ongoing digital image storage costs. Defining expected costs as part of your planning process will ensure you have adequate resources.

Consult far-and-wide

A digitisation program will have an impact on other areas in your organisation. Are you the sole full-time staff member assigned to the program? Will staff from other areas be involved part-time or will there be a new team dedicated to the cause?

An in-house advisory group will allow managers and staff across the board to discuss possible issues and modify existing workflows before the program begins.

Some of the questions we had to consider here at State Records:

Conservation

Will Conservation staff suddenly be flooded with extra work to prepare archives for scanning? How will the workload be prioritised? Has time for preservation been allotted into the overall time-frame of the project?

Your Conservation team may need to do a ‘health-check’ on the archives that have been flagged for digitisation. The more fragile the records, the more time may be needed in Conservation. This may change digitisation priorities – other records could be pushed to the top of the digitisation list while the less robust records undergo conservation work. Be prepared to be flexible.

Reading room requests

Will archives become inaccessible to researchers while digitisation is in progress? How long for? Public Access staff will need to be made aware of any delays so that notices (online and offline) can be issued to the public that certain records are temporarily unavailable.

Arrangement and Description – archives control and management

Will workflows for staff processing archives be interrupted? Record series on the digitisation list that are not fully processed might need to be slotted in to current work schedules.

Do the archives need re-housing post-digitisation? And if so, will databases need to be updated listing new storage locations?

The digitisation team

Whether or not you have a dedicated digitisation team for in-house scanning the staff involved may need training (or refresher courses) on using equipment such as scanners, cameras, computer software, file storage and advice on handling the archives. You will also need to stay up-to-date with new technologies, equipment and processes.

If you are planning to scan in-house, scope out the equipment you will need and costs involved.

Other organisations

When our Project Officer, Digitisation was appointed and began scoping out the program – way back in 2001 – she started by consulting other cultural organisations who already had digitisation programs in place. Via websites, on the phone and by email, she formulated a list of standards, processes and equipment required.

Some organisations have fully fledged digitisation programs with large budgets and can fulfil large quotas of scanning. It’s a matter of finding the balance of do-able digitisation within the boundaries of your own resources.

Visualising your program

You’ve read some ideas above on what you need to consider in your plan so now it’s time to put pen to paper: visualise your program, consult with staff and begin costing equipment, storage (a handy equation will be available in a later post) and staff resources.

Before we go, a quick look at what you can digitise and some pros and cons of in-house versus outsourced scanning.

What can you digitise?

Practically anything! A range of historic material can be digitised including:

  • large books/registers/volumes
  • manuscripts/documents/files
  • maps/plans
  • illustrations/photographs
  • negatives/transparencies (including glass plates and lantern slides)
  • audio recordings, and
  • cinematic film.

In-house scanning vs outsourced digitisation

In-house scanning | Outsourced scanning
You retain control of handling and storage of archives | Archives need to go off-site, less control over the records
Technical process can be fully controlled & modified as/if needed – a dedicated ‘digitisation’ space required | Less control over imaging process, therefore a need to clearly define technical specifications at the outset
Costs for staff training, scanning/storage equipment and software | Pay for the cost of scanning only, storage equipment still required

As you can see in the table above there are pros and cons for both in-house and outsourced scanning. At State Records NSW we scan our original archives in-house and digitising of microformed records is outsourced.

In the next post we’ll talk about the golden rule of digitisation.