Publications
You can also find our articles on our Google Scholar profile.
2025
- Similarity-Based Assessment of Computational Reproducibility in Jupyter Notebooks
  2025 ACM Conference on Reproducibility and Replicability (ACM REP '25)
2024
- Kondo: Efficient Provenance-driven Data Debloating
  40th IEEE International Conference on Data Engineering (ICDE)
2023
- Reproducible eScience: The Data Containerization Challenge
  IEEE eScience
- Efficient Differencing of System-level Provenance Graphs
  32nd ACM International Conference on Information and Knowledge Management (CIKM)
- Towards Shareable and Reproducible Cloud Computing Experiments
  IEEE CloudSummit
- Querying Container Provenance
  WWW '23 Companion: Companion Proceedings of the ACM Web Conference
- IOSPReD: I/O Specialized Packaging of Reduced Datasets and Data-Intensive Applications for Efficient Reproducibility
  IEEE Access
2022
- CHEX: Multiversion Replay with Ordered Checkpoints
  Proceedings of the Very Large Databases (VLDB)
- Provenance-based Workflow Diagnostics Using Program Specification
  29th IEEE International Conference on High Performance Computing, Data, and Analytics
- Reproducible Notebook Containers using Application Virtualization
  18th IEEE International Conference on eScience
2021
- Artifact Description/Artifact Evaluation: A Reproducibility Bane or a Boon
  Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems
- On Lowering Merge Costs of an LSM Tree
  Proceedings of the 33rd International Conference on Scientific and Statistical Database Management
- LDI: Learned Distribution Index for Column Stores
  2021 IEEE International Conference on Big Data (Big Data)
- Reproducibility Practice in High-Performance Computing: Community Survey Results
  Computing in Science & Engineering
- An Approach for Open and Reproducible Hydrological Modeling using Sciunit and HydroShare
  EGU General Assembly Conference Abstracts
2020
- Efficient provenance alignment in reproduced executions
  12th International Workshop on Theory and Practice of Provenance (TaPP 2020)
- Content-defined Merkle Trees for Efficient Container Delivery
  28th IEEE International Conference on High Performance Computing, Data, & Analytics
- A taxonomy for reproducible and replicable research in environmental modelling
  Environmental Modelling & Software
- PROV-CRT: Provenance Support for Container Runtimes
  12th International Workshop on Theory and Practice of Provenance (TaPP 2020)
- Documenting computing environments for reproducible experiments
  Parallel Computing: Technology Trends
- DF-toolkit: interacting with low-level database storage
  Proceedings of the VLDB Endowment
- ODSA: Open Database Storage Access
  Extending Database Technology (EDBT)
- MiDas: Containerizing Data-Intensive Applications with I/O Specialization
  Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems
2019
- Report on the first international workshop on incremental re-computation: Provenance and beyond
  ACM SIGMOD Record
- PLI+: Efficient Clustering of Cloud Databases
  Distributed and Parallel Databases
- SciInc: A Container Runtime for Incremental Recomputation
  2019 15th International Conference on eScience (eScience)
2018
- Leveraging Scientific Cyberinfrastructures to Achieve Computational Hydrologic Model Reproducibility
  AGU Fall Meeting Abstracts
- Improving Reproducibility of Distributed Computational Experiments
  Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems
- Achieving Reproducible Computational Hydrologic Models by Integrating Scientific Cyberinfrastructures
  9th International Congress on Environmental Modelling and Software
- Integrating scientific cyberinfrastructures to improve reproducibility in computational hydrology: Example for HydroShare and GeoTrust
  Environmental Modelling & Software
- Detecting database file tampering through page carving
  21st International Conference on Extending Database Technology
- Using Provenance for Generating Automatic Citations
  10th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2018)
- Utilizing provenance in reusable research objects
  Informatics
- Where Provenance in Database Storage
  International Provenance and Annotation Workshop
2017
- Cyberinfrastructure to Support Collaborative and Reproducible Computational Hydrologic Modeling
  AGU Fall Meeting Abstracts
- GeoTrust Hub: A Platform For Sharing And Reproducing Geoscience Applications
  AGU Fall Meeting Abstracts
- Sciunits: Reusable Research Objects
  2017 IEEE 13th International Conference on e-Science (e-Science)
- Database forensic analysis with DBCarver
  CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research
- PLI: Augmenting live databases with custom clustered indexes
  Proceedings of the 29th International Conference on Scientific and Statistical Database Management
2016
- Ontology-based urban data exploration
  Proceedings of the 2nd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics
- Challenges with Maintaining Legacy Software to Achieve Reproducible Computational Analyses: An Example for Hydrologic Modeling Data Processing Pipelines
  iEMSs Conference
- Interactive provenance summaries for reproducible science
  2016 IEEE 12th International Conference on e-Science (e-Science)
2015
- GEN: a database interface generator for HPC programs
  Proceedings of the 27th International Conference on Scientific and Statistical Database Management
- Sharing and reproducing database applications
  Proceedings of the VLDB Endowment
- Personalized, Shareable Geoscience Dataspaces For Simplifying Data Management and Improving Reproducibility
  AGU Fall Meeting Abstracts
- PDACS: a portal for data analysis services for cosmological simulations
  Computing in Science & Engineering
- An invariant framework for conducting reproducible computational science
  Journal of Computational Science
- LDV: Light-weight database virtualization
  2015 IEEE 31st International Conference on Data Engineering
2014
- SOLE: towards descriptive and interactive publications
  Implementing reproducible research
- Benchmarking cloud-based tagging services
  2014 IEEE 30th International Conference on Data Engineering Workshops
- Auditing and maintaining provenance in software packages
  International Provenance and Annotation Workshop
- GeoBase: indexing NetCDF files for large-scale data analysis
  Big data management, technologies, and applications
- Plenario: An Open Data Discovery and Exploration Platform for Urban Science
  IEEE Data Eng. Bull.
- GeoDataspaces: Simplifying Data Management Tasks with Globus
  AGU Fall Meeting Abstracts
2013
- Sketching distributed data provenance
  Data Provenance and Data Management in eScience
- Proactive Support for Large-Scale Data Exploration
  2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum
- Distributed data provenance for large-scale data-intensive computing
  2013 IEEE International Conference on Cluster Computing (CLUSTER)
- Using provenance for repeatability
  5th USENIX Workshop on the Theory and Practice of Provenance (TaPP 13)
- Lens: a faceted browser for research networking platforms
  2013 IEEE 9th International Conference on e-Science
- Towards a provenance-aware distributed filesystem
  5th Workshop on the Theory and Practice of Provenance (TaPP)
2012
- Addressing data access needs of the long-tail distribution of geoscientists
  2012 IEEE International Geoscience and Remote Sensing Symposium
- Wagging the long tail of earth science: Why we need an earth science data web, and how to build it
- SOLE: linking research papers with science objects
  International Provenance and Annotation Workshop
2011
- Policy-based integration of provenance metadata
  2011 IEEE International Symposium on Policies for Distributed Systems and Networks
- Improving the efficiency of subset queries on raster images
  Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems
2010
- JAWS: Job-aware workload scheduling for the exploration of turbulence simulations
  SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
- Tracking and sketching distributed data provenance
  2010 IEEE Sixth International Conference on e-Science
- Efficient querying of distributed provenance stores
  Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
- Providing scalable data services in ubiquitous networks
  International Conference on Database Systems for Advanced Applications
- RNEDE: Resilient network design environment
  2010 3rd International Symposium on Resilient Control Systems
- A Dynamic Data Middleware cache for Rapidly-growing Scientific Repositories
  ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing
2009
- Adaptive physical design for curated archives
  International Conference on Scientific and Statistical Database Management
- Liferaft: Data-driven, batch processing for the exploration of scientific databases
  Conference on Innovative Data Systems Research (CIDR)
2008
- Rule-based classification systems for informatics
  2008 IEEE Fourth International Conference on eScience
- Large scale data management for the sciences
- Workload-Aware histograms for remote applications
  International Conference on Data Warehousing and Knowledge Discovery
- Automated physical design in database caches
  2008 IEEE 24th International Conference on Data Engineering Workshop
2007
- A workload-driven unit of cache replacement for mid-tier database caching
  International Conference on Database Systems for Advanced Applications
- A Black-Box Approach to Query Cardinality Estimation
  CIDR
2006
- Estimating query result sizes for proxy caching in scientific database federations
  SC'06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
2005
- Practical passive lossy link inference
  International Workshop on Passive and Active Network Measurement
- Bypass caching: Making scientific databases good network citizens
  21st International Conference on Data Engineering (ICDE'05)
2002
- Web services for the virtual observatory
  Virtual Observatories
- The SDSS SkyServer - Public Access to the Sloan Digital Sky Server Data
  ACM Special Interest Group on Management of Data (SIGMOD)
- SkyQuery: A WebService approach to federate databases
  arXiv preprint cs/0211023