Publications
You can also find our articles on our Google Scholar profile.
2024
-
Kondo: Efficient Provenance-driven Data Debloating
40th IEEE International Conference on Data Engineering (ICDE)
2023
-
Reproducible eScience: The Data Containerization Challenge
IEEE eScience -
Efficient Differencing of System-level Provenance Graphs
32nd ACM International Conference on Information and Knowledge Management (CIKM) -
Towards Shareable and Reproducible Cloud Computing Experiments
IEEE CloudSummit -
Querying Container Provenance
WWW '23 Companion: Companion Proceedings of the ACM Web Conference -
IOSPReD: I/O Specialized Packaging of Reduced Datasets and Data-Intensive Applications for Efficient Reproducibility
IEEE Access
2022
-
CHEX: Multiversion Replay with Ordered Checkpoints
Proceedings of the VLDB Endowment -
Provenance-based Workflow Diagnostics Using Program Specification
29th IEEE International Conference on High Performance Computing, Data, and Analytics -
Reproducible Notebook Containers using Application Virtualization
18th IEEE International Conference on eScience
2021
-
Artifact Description/Artifact Evaluation: A Reproducibility Bane or a Boon
Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems -
On Lowering Merge Costs of an LSM Tree
Proceedings of the 33rd International Conference on Scientific and Statistical Database Management -
LDI: Learned Distribution Index for Column Stores
2021 IEEE International Conference on Big Data (Big Data) -
Reproducibility Practice in High-Performance Computing: Community Survey Results
Computing in Science & Engineering -
An Approach for Open and Reproducible Hydrological Modeling using Sciunit and HydroShare
EGU General Assembly Conference Abstracts
2020
-
Efficient provenance alignment in reproduced executions
12th International Workshop on Theory and Practice of Provenance (TaPP 2020) -
Content-defined Merkle Trees for Efficient Container Delivery
28th IEEE International Conference on High Performance Computing, Data, & Analytics -
A taxonomy for reproducible and replicable research in environmental modelling
Environmental Modelling & Software -
PROV-CRT: Provenance Support for Container Runtimes
12th International Workshop on Theory and Practice of Provenance (TaPP 2020) -
Documenting computing environments for reproducible experiments
Parallel Computing: Technology Trends -
DF-toolkit: interacting with low-level database storage
Proceedings of the VLDB Endowment -
ODSA: Open Database Storage Access
Extending Database Technology (EDBT) -
MiDas: Containerizing Data-Intensive Applications with I/O Specialization
Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems
2019
-
Report on the first international workshop on incremental re-computation: Provenance and beyond
ACM SIGMOD Record -
PLI+: Efficient Clustering of Cloud Databases
Distributed and Parallel Databases -
SciInc: A Container Runtime for Incremental Recomputation
2019 15th International Conference on eScience (eScience)
2018
-
Leveraging Scientific Cyberinfrastructures to Achieve Computational Hydrologic Model Reproducibility
AGU Fall Meeting Abstracts -
Improving Reproducibility of Distributed Computational Experiments
Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems -
Achieving Reproducible Computational Hydrologic Models by Integrating Scientific Cyberinfrastructures
9th International Congress on Environmental Modelling and Software -
Integrating scientific cyberinfrastructures to improve reproducibility in computational hydrology: Example for HydroShare and GeoTrust
Environmental Modelling & Software -
Detecting database file tampering through page carving
21st International Conference on Extending Database Technology -
Using Provenance for Generating Automatic Citations
10th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2018) -
Utilizing provenance in reusable research objects
Informatics -
Where Provenance in Database Storage
International Provenance and Annotation Workshop
2017
-
Cyberinfrastructure to Support Collaborative and Reproducible Computational Hydrologic Modeling
AGU Fall Meeting Abstracts -
GeoTrust Hub: A Platform For Sharing And Reproducing Geoscience Applications
AGU Fall Meeting Abstracts -
Sciunits: Reusable Research Objects
2017 IEEE 13th International Conference on e-Science (e-Science) -
Database forensic analysis with DBCarver
CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research -
PLI: Augmenting live databases with custom clustered indexes
Proceedings of the 29th International Conference on Scientific and Statistical Database Management
2016
-
Ontology-based urban data exploration
Proceedings of the 2nd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics -
Challenges with Maintaining Legacy Software to Achieve Reproducible Computational Analyses: An Example for Hydrologic Modeling Data Processing Pipelines
iEMSs Conference -
Interactive provenance summaries for reproducible science
2016 IEEE 12th International Conference on e-Science (e-Science)
2015
-
GEN: a database interface generator for HPC programs
Proceedings of the 27th International Conference on Scientific and Statistical Database Management -
Sharing and reproducing database applications
Proceedings of the VLDB Endowment -
Personalized, Shareable Geoscience Dataspaces For Simplifying Data Management and Improving Reproducibility
AGU Fall Meeting Abstracts -
PDACS: a portal for data analysis services for cosmological simulations
Computing in Science & Engineering -
An invariant framework for conducting reproducible computational science
Journal of Computational Science -
LDV: Light-weight database virtualization
2015 IEEE 31st International Conference on Data Engineering
2014
-
SOLE: towards descriptive and interactive publications
Implementing reproducible research -
Benchmarking cloud-based tagging services
2014 IEEE 30th International Conference on Data Engineering Workshops -
Auditing and maintaining provenance in software packages
International Provenance and Annotation Workshop -
GeoBase: indexing NetCDF files for large-scale data analysis
Big data management, technologies, and applications -
Plenario: An Open Data Discovery and Exploration Platform for Urban Science
IEEE Data Eng. Bull. -
GeoDataspaces: Simplifying Data Management Tasks with Globus
AGU Fall Meeting Abstracts
2013
-
Sketching distributed data provenance
Data Provenance and Data Management in eScience -
Proactive Support for Large-Scale Data Exploration
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum -
Distributed data provenance for large-scale data-intensive computing
2013 IEEE International Conference on Cluster Computing (CLUSTER) -
Using provenance for repeatability
5th USENIX Workshop on the Theory and Practice of Provenance (TaPP 13) -
Lens: a faceted browser for research networking platforms
2013 IEEE 9th International Conference on e-Science -
Towards a provenance-aware distributed filesystem
5th Workshop on the Theory and Practice of Provenance (TaPP)
2012
-
Addressing data access needs of the long-tail distribution of geoscientists
2012 IEEE International Geoscience and Remote Sensing Symposium -
Wagging the long tail of earth science: Why we need an earth science data web, and how to build it
-
SOLE: linking research papers with science objects
International Provenance and Annotation Workshop
2011
-
Policy-based integration of provenance metadata
2011 IEEE International Symposium on Policies for Distributed Systems and Networks -
Improving the efficiency of subset queries on raster images
Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems
2010
-
JAWS: Job-aware workload scheduling for the exploration of turbulence simulations
SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis -
Tracking and sketching distributed data provenance
2010 IEEE Sixth International Conference on e-Science -
Efficient querying of distributed provenance stores
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing -
Providing scalable data services in ubiquitous networks
International Conference on Database Systems for Advanced Applications -
RNEDE: Resilient network design environment
2010 3rd International Symposium on Resilient Control Systems -
A Dynamic Data Middleware cache for Rapidly-growing Scientific Repositories
ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing
2009
-
Adaptive physical design for curated archives
International Conference on Scientific and Statistical Database Management -
Liferaft: Data-driven, batch processing for the exploration of scientific databases
Conference on Innovative Data Systems Research (CIDR)
2008
-
Rule-based classification systems for informatics
2008 IEEE Fourth International Conference on eScience -
Large scale data management for the sciences
-
Workload-Aware histograms for remote applications
International Conference on Data Warehousing and Knowledge Discovery -
Automated physical design in database caches
2008 IEEE 24th International Conference on Data Engineering Workshop
2007
-
A workload-driven unit of cache replacement for mid-tier database caching
International Conference on Database Systems for Advanced Applications -
A Black-Box Approach to Query Cardinality Estimation
CIDR
2006
-
Estimating query result sizes for proxy caching in scientific database federations
SC'06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
2005
-
Practical passive lossy link inference
International Workshop on Passive and Active Network Measurement -
Bypass caching: Making scientific databases good network citizens
21st International Conference on Data Engineering (ICDE'05)
2002
-
Web services for the virtual observatory
Virtual Observatories -
The SDSS SkyServer - Public Access to the Sloan Digital Sky Server Data
ACM Special Interest Group on Management of Data (SIGMOD) -
SkyQuery: A WebService approach to federate databases
arXiv preprint cs/0211023