Since the addition of sCMOS cameras and improvements in the ease of setting up automated image acquisition, we routinely see users generate data sets ranging from 20 GB to over 200 GB in a single imaging session.
For example, a single image from a 4 MP sCMOS camera is 8.3 MB.
- If you capture a 3-channel z-stack with 15 planes, the file size is now 3 x 15 x 8.3 = 373.5 MB.
- If you take that same 3-channel z-stack and run a time-lapse experiment with 15 time points, the file size is now 373.5 MB x 15 = 5.6 GB.
- If you add just 5 positions to your experiment, the file jumps to 28 GB for a relatively simple experiment.
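The arithmetic above can be wrapped in a short helper for planning your own experiments. The 8.3 MB frame size comes from the 4 MP sCMOS example in the text (2048 x 2048 pixels at 16 bits per pixel); everything else is just multiplication.

```python
# Estimate the total size of a multidimensional imaging data set.
# FRAME_MB is the single-frame size quoted in the text for a 4 MP sCMOS
# camera (2048 x 2048 pixels at 16 bits/pixel ~= 8.3 MB).
FRAME_MB = 8.3

def dataset_size_mb(channels=1, z_planes=1, time_points=1, positions=1):
    """Total size in MB for a multi-channel, z-stack, time-lapse, multi-position experiment."""
    return FRAME_MB * channels * z_planes * time_points * positions

# The three examples from the text:
zstack = dataset_size_mb(channels=3, z_planes=15)
timelapse = dataset_size_mb(channels=3, z_planes=15, time_points=15)
multipos = dataset_size_mb(channels=3, z_planes=15, time_points=15, positions=5)

print(f"{zstack:.1f} MB, {timelapse / 1000:.1f} GB, {multipos / 1000:.1f} GB")
# 373.5 MB, 5.6 GB, 28.0 GB
```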
As the size of your data sets grows, it is important to think about how to manage and handle your data, as discussed below. The UCSF Library has a guide on data sharing and data management that will help you make sure your data is maintained in a way that meets the data-sharing requirements of Federal funding agencies and many journals.
If you are new to imaging, we recommend that you watch the video Introduction to Digital Images on iBiology to familiarize yourself with some of the basic concepts of image files.
Transferring data from the microscope computer
All microscopes in the NIC have gigabit Ethernet connections, which can transfer data at approximately 100 MB/s. Other computers at UCSF may only have 100 megabit Ethernet connections, which can only transfer at 10 MB/s. You can use the NICdata server for local transfers or connect to your lab server if you have one. Another option available at UCSF is Box. Your Box account can be accessed via UCSF MyAccess, and there is no longer a limit on the amount of data you can store.
If you will be transferring large amounts of data, make sure that the drive you are using is USB 3. USB 2.0 flash drives cannot transfer data at more than 20-30 MB/s (10-15 minutes for a 20 GB data set), and we have seen some transfers as slow as 8 MB/s. Good USB 3 thumb drives can transfer data at 180 MB/s. If you consistently move large amounts of data, investing in faster transfer speeds will save a lot of time. All of the microscope computers in the NIC have USB 3 ports, although we have seen some incompatibilities between specific drives and ports, so if your drive isn’t transferring as fast as you think it should, try another port.
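To see how much the interface matters, you can estimate transfer times from the nominal sustained rates quoted above; real-world speeds vary by drive and port, so treat these as rough planning numbers.

```python
# Rough transfer-time estimates for the interface speeds quoted above.
# These are nominal sustained rates, not guarantees.

def transfer_minutes(dataset_gb, speed_mb_per_s):
    """Minutes to move a data set at a given sustained speed (1 GB = 1000 MB here)."""
    return dataset_gb * 1000 / speed_mb_per_s / 60

for label, speed in [("USB 2.0 flash drive", 25),
                     ("gigabit Ethernet", 100),
                     ("good USB 3 drive", 180)]:
    print(f"20 GB over {label} ({speed} MB/s): {transfer_minutes(20, speed):.1f} min")
```

At 25 MB/s a 20 GB data set takes about 13 minutes, consistent with the 10-15 minute range above; a good USB 3 drive cuts that to about 2 minutes.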
Long term storage and organization
As with any experiment, you need to think about data organization. Depending on your experiments, you can easily end up with several TB of data for a study. Consider the following issues before you start.
- File names: Use clear, consistent file naming to convey information about what data is in each file. As the number of images grows it will become harder and harder to find the file you are looking for. Having a basic structure for your file names can make this easier.
- File formats: Imaging software often uses proprietary file formats. These formats contain metadata that records important details about the microscope and acquisition settings used to collect the image. Proprietary file formats cannot be opened in all software packages, so depending on your analysis pipeline you may need to export your data to a more universal format, which adds time to the process and takes up additional hard drive space. If you do export your data, pay attention to the export settings so that you don’t alter the underlying data or make it impossible to analyze.
Examples of common issues with data export:
- Export to RGB TIFF format, which allows you to see the image but merges the channels, preventing any analysis of individual channels.
- Export to formats such as JPEG, which use lossy compression and change the underlying pixel intensity values.
- Metadata: The metadata is a record of all the settings used to acquire your images. It may not seem important at first, but it is incredibly useful down the road when you need the acquisition details for your images. Make sure you keep a copy of the metadata, either as a separate text file or embedded in the image by one of the proprietary file formats or the OME-TIFF format.
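If your pipeline forces you out of a metadata-carrying format, one low-effort safeguard is a plain-text sidecar file saved next to each image. This is a minimal sketch; the field names below are illustrative examples of acquisition details worth recording, not a standard.

```python
import json
import tempfile
from pathlib import Path

# Save acquisition metadata as a sidecar text (JSON) file next to the image,
# so the settings survive even if the image itself is exported to a format
# that strips metadata. Field names here are illustrative.
def write_metadata_sidecar(image_path, metadata):
    sidecar = Path(image_path).with_suffix(".json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar

tmp = Path(tempfile.mkdtemp())
sidecar = write_metadata_sidecar(tmp / "cells_GFP.tif", {
    "objective": "20x/0.75 NA",
    "exposure_ms": 100,
    "channel": "GFP",
    "z_step_um": 0.5,
})
print(sidecar.name)  # cells_GFP.json
```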
There are a few simple techniques for organizing your storage:
- Folder architecture
- Example: Lab/Year(2016)/Project/sub-project/experiment/replicates/files
- You will want some form of documentation saved with your files. This can be a simple README text file in each folder giving important details about your data, or an Excel spreadsheet with the experiment details for each image file.
- Databases in commercial software packages
- Open source database options such as OMERO
- Home built databases
- Making a good database is very difficult and requires careful planning to make sure you are able to capture all the needed information.
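The folder architecture and README approach above can be set up in a few lines. This is a sketch: the lab, project, and filename fields are placeholders, not an NIC convention; pick names that fit your own experiments.

```python
import tempfile
from datetime import date
from pathlib import Path

# Consistent file names built from fixed fields sort and search well.
# ISO dates (YYYY-MM-DD) sort chronologically as plain text.
def image_filename(experiment, sample, channel, acquired, ext="tif"):
    return f"{acquired.isoformat()}_{experiment}_{sample}_{channel}.{ext}"

root = Path(tempfile.mkdtemp())  # stand-in for your lab's storage root
# Example hierarchy: Lab/Year/Project/sub-project/experiment
experiment_dir = root / "SmithLab" / "2016" / "wound-healing" / "drug-screen" / "exp-01"
experiment_dir.mkdir(parents=True)

# A simple README documents the experiment alongside the images.
(experiment_dir / "README.txt").write_text(
    "Experiment: drug-screen exp-01\n"
    "Imaging settings, sample prep, and notes go here.\n"
)

name = image_filename("drug-screen", "well-B03", "GFP", date(2016, 5, 2))
print(name)  # 2016-05-02_drug-screen_well-B03_GFP.tif
```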
Remember to back up your data on a regular basis!
Computational requirements for data analysis
Another issue that comes with large data sets is that the computational resources required for visualization and data analysis grow with them. File sizes quickly become a problem for large stitched images and large 3D volumes. The microscopes make it trivial to stitch together images covering an entire slide, but if you want to manipulate that image you will have trouble on computers with limited memory.
When designing your imaging experiment, match the imaging conditions to your experimental needs and avoid increasing your data size more than necessary. Factors to consider:
- What magnification and resolution do you need to visualize your sample?
- Do you need the spatial information gained from stitching images or can you process individual image tiles?
- What temporal resolution do you need?
For example, does using a 4x objective give you enough resolution for your image or do you really need to use a 10x or 20x objective? The lower magnification objective dramatically decreases the size of a stitched image. If you don’t need to stitch, don’t; you will see better performance working through 300 individual image files than trying to work on one large image stitched from those 300 files.
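As a rough sketch of why magnification matters so much, assume a 2048 x 2048, 16-bit camera and that the number of tiles needed to cover a fixed area grows with the square of the magnification. This is an idealization that ignores objective-specific field sizes, but it shows the scaling.

```python
# Uncompressed memory needed to hold a stitched image of a fixed area,
# as a function of objective magnification. Assumes a 2048 x 2048, 16-bit
# camera; tile count is assumed to scale with magnification squared.
TILE_PIXELS = 2048 * 2048
BYTES_PER_PIXEL = 2  # 16-bit

def stitched_size_gb(tiles_at_4x, magnification):
    """Uncompressed size in GB of a stitched mosaic of a fixed area."""
    scale = (magnification / 4) ** 2  # tiles grow with magnification squared
    return tiles_at_4x * scale * TILE_PIXELS * BYTES_PER_PIXEL / 1e9

for mag in (4, 10, 20):
    print(f"{mag}x: {stitched_size_gb(12, mag):.1f} GB")
```

Under these assumptions, the same slide area stitched at 20x is 25 times larger than at 4x, which is why dropping to the lowest magnification that still resolves your sample pays off so quickly.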
Another issue to consider is whether you will be able to automate your data analysis. Before collecting thousands of images in a time-lapse experiment, make sure you have a plan for analysis, since analyzing all of those images by hand is very time consuming. Automated analysis is not always possible and can take a lot of time and work to set up. Make sure that you do pilot experiments to troubleshoot the analysis before collecting data for the large experiment. Pilot experiments will show you where the problems with the data analysis are. Some of these problems are best fixed by changing how the data is acquired, so you want to learn this before carrying out the large experiment.