Image-Based Experimental Analysis 

Scalable Analysis for Multidimensional

Structured Data

Overview and challenges:

Our algorithms are not keeping up with the rapid increase in the capabilities of imaging sensors. 


    Many DOE research laboratories store digital images as part of their experimental records. Limitations in image analysis hamper our ability to understand the data acquired by high-resolution sensors. For example, much of the data acquired at imaging facilities is inspected manually, delaying access to experimental results.


    Invaluable information, encoded in these large datasets and obtained at considerable cost, is often lost.


    Currently, users are forced to rely on memory-bound tools that require drastic downsampling to cope with overwhelming data sizes and rates. Much of the precision and nuance captured by the experimental apparatus vanishes with improper downsampling. DOE has identified these bottlenecks, emphasizing that analysis of data from high-throughput sensors is a fundamental challenge for data-intensive science. Analysis methods, such as those we have been developing, provide means for compressing large datasets, comparing experiments, and extracting the understanding needed to guide and optimize them. Solutions to these problems require parallel-capable algorithms to accommodate increasing data size and complexity, as well as new analysis algorithms. Advances in image-based methods will shorten the time between experiments, make efficient use of materials, and open imaging instruments to more experiments for more users.
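To make the downsampling hazard concrete, the following minimal sketch (synthetic data, NumPy only; not from our codebase) shows how naive stride-based downsampling can erase a fine feature entirely, while block averaging at least attenuates it:

```python
import numpy as np

# Synthetic 64x64 image containing a one-pixel-wide vertical "crack",
# a stand-in for the fine structure captured by high-resolution sensors.
img = np.zeros((64, 64))
img[:, 31] = 1.0

# Naive downsampling: keep every 4th pixel. Column 31 is never sampled,
# so the crack disappears from the downsampled image.
naive = img[::4, ::4]

# Block averaging over 4x4 tiles: the crack is attenuated to 4/16 = 0.25
# of its original intensity, but remains detectable.
block = img.reshape(16, 4, 16, 4).mean(axis=(1, 3))

print(naive.max())  # 0.0  -> feature lost
print(block.max())  # 0.25 -> feature preserved, attenuated
```

The same 4x reduction in data size thus yields very different scientific value depending on how the reduction is done, which is why downsampling strategy cannot be left to default tooling.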


This project focuses on the data-intensive science of quantitative image analysis at scale, which entails image processing, representation, analysis, classification, concurrency, and optimization. It encompasses ongoing and growing collaborations within LBL, including the Advanced Light Source and NERSC DOE user facilities and various LBNL Divisions, including Earth Sciences, Materials Sciences, and Life Sciences, as well as cross-institutional endeavors such as the Bay Area Physical Sciences Oncology Center (Bay Area PSOC), aimed at understanding mechanistic principles underlying cancer progression, and the Solar Durability and Lifetime Extension Center at Case Western Reserve University (CWRU), aimed at controlling degradation pathways in interface-rich energy materials. Some of the scientific challenges at the heart of these missions rest on multiple imaging techniques and on data generated from a vast variety of samples.


Working with all of the above partners, our mission is to employ a wide spectrum of imaging algorithms, from low-level techniques to high-level pattern recognition. Altogether, we aim to advance knowledge discovery from datasets that are inherently noisy, multi-modal, and multiscale. This will allow complex user queries through a combination of algorithms. Effective methods will also require attention to computer science issues, such as code optimization. In tandem with algorithmic developments, we will capitalize on our previous work on parallel programming for compute-intensive kernels. Above all, we propose to provide users with workable modules that can enhance experimental analysis by tracking user intervention.


    As part of CAMERA, we are building the algorithms and methodologies that will enable us to analyze unprecedented volumes and velocities of data. Our research focuses on scalable analysis of multidimensional structured data, such as those produced by simulations and experiments. In collaboration with CAMERA, we will design fundamental computer vision and pattern recognition algorithms useful throughout image-dependent applications. Only within a larger data science context will we be able to tackle:
    (a) the scalability of mathematical and statistical image analysis techniques;
    (b) the use of domain-specific knowledge about known structures as constraints, applying priors to find scientifically relevant structures;
    (c) the need for human-machine interaction algorithms that merge these two approaches by monitoring users as they navigate data, recording and then suggesting proper models for both low-level features and high-level structures;
    (d) support for releasing and maintaining free and open-source codes.


Our algorithms and software will benefit image-based science that relies on accurate measurements, improving the ability to:
(a) extract information from noisy data;
(b) construct structural models, in particular create analyzable 3D models from 2D scans;
(c) initialize numerical models;
(d) establish checkpoints for numerical simulations to verify that models match experimental data;
(e) explore and summarize data; and
(f) distill large datasets into scientifically relevant feature vectors in lower-dimensional spaces.
These algorithmic advancements are important for scientific research that requires analyzing information hidden in digital images.
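As one illustration of distilling data into low-dimensional feature vectors, the sketch below applies principal component analysis via an SVD (NumPy only; random data stands in for real measurements, and the patch and component counts are arbitrary choices, not project parameters):

```python
import numpy as np

# 500 flattened 16x16 image patches; in practice these would come from
# experimental images rather than a random generator.
rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 256))

# Center the data, then take the top principal directions from the SVD.
centered = patches - patches.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)

# Project each 256-dimensional patch onto the top 8 components,
# yielding an 8-dimensional feature vector per patch.
features = centered @ vt[:8].T
print(features.shape)  # (500, 8)
```

Summaries of this kind let downstream comparison and classification operate on compact descriptors instead of raw pixel arrays.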

Current targeted areas include:

(1) quantification of porous material clogging (as part of geological processes involved in carbon sequestration and oil recovery);
(2) crack detection and measurement of materials for use in energy-engineering applications such as the design and inspection of turbines, and in non-destructive testing of sensitive materials;
(3) fracking analysis of samples in order to understand environmental impact;
(4) molecule and cell counting, and detection of cell microstructures with unknown functionalities that play a major role in mechanical regulation and intra- and inter-cell communication, with application to artificial photosynthesis and the search for biofuels.
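A baseline for the counting task in area (4) can be sketched with thresholding followed by connected-component labeling (synthetic image and hand-picked threshold; real pipelines would add denoising and watershed-style splitting of touching cells):

```python
import numpy as np
from scipy import ndimage

# Synthetic 32x32 image with three disjoint bright blobs standing in
# for cells or molecules against a dark background.
img = np.zeros((32, 32))
img[4:8, 4:8] = 1.0
img[20:25, 10:14] = 1.0
img[10:14, 24:28] = 1.0

# Binarize at a hand-picked threshold, then label connected components
# (4-connectivity by default); the label count is the object count.
mask = img > 0.5
labels, n_cells = ndimage.label(mask)
print(n_cells)  # 3
```

Even this simple baseline makes the challenge clear: the interesting algorithmic work lies in choosing thresholds robustly and separating touching structures in noisy experimental data.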