Publications

Peer-reviewed papers, articles, and scholarly outputs from the Radiant Systems Lab.

You can also browse our Google Scholar profile.

2026

  1. Efficiently Reproducing Distributed Workflows in Notebook-based Systems

    Article
    Talha, A., Ahmad, R., and Malik, T.
    The 26th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid) February 2026

2025

  1. AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents

    Article
    Vangala, B. P., Adibifar, A., Gehani, A., and Malik, T.
    Reproducible Artificial Intelligence (RAI2025) Workshop December 2025
  2. Efficient Multi-Model Orchestration for Self-Hosted Large Language Models

    Article
    Vangala, B. P., and Malik, T.
    Deployable Artificial Intelligence (DAI2025) Workshop November 2025
  3. Similarity-Based Assessment of Computational Reproducibility in Jupyter Notebooks

    Article
    Hossain, A. S. M. S., Brown, C., Koop, D., and Malik, T.
    2025 ACM Conference on Reproducibility and Replicability (ACM REP '25) July 2025
  4. Accurate Differential Analysis using Record and Selective Replay

    Article
    Nakamura, Y., Chu, X., Laguna, I., and Malik, T.
    37th International Conference on Scalable Scientific Data Management June 2025

2024

  1. Accurate Path Prediction of Provenance Traces

    Article
    Ahmad, R., Jung, H. Y., Nakamura, Y., and Malik, T.
    33rd ACM International Conference on Information and Knowledge Management (CIKM) October 2024
  2. Kondo: Efficient Provenance-driven Data Debloating

    Article
    Modi, A., Tikmany, R., Malik, T., Komondoor, R., Gehani, A., and D'Souza, D.
    40th IEEE International Conference on Data Engineering (ICDE) January 2024
  3. Efficiently Reducing Storage Footprint in Reproducible Containers via I/O Specialization

    Article
    Tikmany, R., Modi, A., Atiq, R., Reyad, M., Gehani, A., and Malik, T.
    IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid) January 2024
  4. FAIR Assessment of Cloud-based Experiments

    Article
    Kamath, K., Brewer, N., and Malik, T.
    4th Workshop on Reproducible Workflows, Data Management, and Security (ReWoRDS) January 2024

2023

  1. Reproducible eScience: The Data Containerization Challenge

    Article
    Malik, Tanu
    IEEE eScience October 2023
  2. Efficient Differencing of System-level Provenance Graphs

    Article
    Nakamura, Y., Kanj, I., and Malik, T.
    32nd ACM International Conference on Information and Knowledge Management (CIKM) August 2023
  3. Towards Shareable and Reproducible Cloud Computing Experiments

    Article
    Malik, T., and Khan, S.
    IEEE CloudSummit June 2023
  4. Querying Container Provenance

    Article
    Modi, A., Reyad, M., Gehani, A., and Malik, T.
    WWW '23 Companion: Companion Proceedings of the ACM Web Conference April 2023
  5. IOSPReD: I/O Specialized Packaging of Reduced Datasets and Data-Intensive Applications for Efficient Reproducibility

    Article
    Niddodi, C., Gehani, A., Malik, T., Mohan, S., and Rilee, M.
    IEEE Access February 2023
  6. Comparing containerization-based approaches for reproducible computational modeling of environmental systems

    Article
    Choi, Y., Roy, B., Nguyen, J., Ahmad, R., Maghami, I., Nassar, A., Li, Z., Castronova, A., Malik, T., Wang, S., and Goodall, J.
    Environmental Modelling & Software January 2023

2022

  1. Provenance-based Workflow Diagnostics Using Program Specification

    Chapter
    Nakamura, Y. Malik, T. Kanj, I. Gehani, A.
    29th IEEE International Conference on High Performance Computing, Data, and Analytics December 2022
  2. Reproducible Notebook Containers using Application Virtualization

    Chapter
    Ahmad, R. Manne, N. Malik, T.
    18th IEEE International Conference on eScience October 2022
  3. CHEX: Multiversion Replay with Ordered Checkpoints

    Article
    Manne, N. N. Satpati, S. Malik, T. Bagchi, A. Gehani, A. Chaudhary, A.
    Proceedings of the VLDB Endowment (PVLDB) February 2022

2021

  1. LDI: Learned Distribution Index for Column Stores

    Other
    That, D. H. T. Gharehdaghi, M. Rasin, A. Malik, T.
    2021 IEEE International Conference on Big Data (Big Data) December 2021
  2. Reproducibility Practice in High-Performance Computing: Community Survey Results

    Article
    Plale, B. A. Malik, T. Pouchard, L. C.
    Computing in Science & Engineering September 2021
  3. On Lowering Merge Costs of an LSM Tree

    Other
    That, D. H. T. Gharehdaghi, M. Rasin, A. Malik, T.
    Proceedings of the 33rd International Conference on Scientific and Statistical Database Management July 2021
  4. Artifact Description/Artifact Evaluation: A Reproducibility Bane or a Boon

    Chapter
    Malik, T.
    Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems June 2021
  5. An Approach for Open and Reproducible Hydrological Modeling using Sciunit and HydroShare

    Article
    Choi, Y. Goodall, J. Ahmad, R. Malik, T. Tarboton, D.
    EGU General Assembly Conference Abstracts April 2021

2020

  1. A taxonomy for reproducible and replicable research in environmental modelling

    Article
    Essawy, B. T. Goodall, J. L. Voce, D. Morsy, M. M. Sadler, J. M. Choi, Y. D. Tarboton, D. G. Malik, T.
    Environmental Modelling & Software December 2020
  2. DF-toolkit: interacting with low-level database storage

    Article
    Wagner, J. Rasin, A. Heart, K. Malik, T. Grier, J.
    Proceedings of the VLDB Endowment August 2020
  3. ODSA: Open Database Storage Access

    Other
    Wagner, J. Rasin, A. Malik, T. Grier, J.
    Extending Database Technology (EDBT) August 2020
  4. MiDas: Containerizing Data-Intensive Applications with I/O Specialization

    Chapter
    Niddodi, C. Gehani, A. Malik, T. Navas, J. A. Mohan, S.
    Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems June 2020
  5. Content-defined Merkle Trees for Efficient Container Delivery

    Other
    Nakamura, Y. Ahmad, R. Malik, T.
    28th IEEE International Conference on High Performance Computing, Data, & Analytics January 2020
  6. Documenting computing environments for reproducible experiments

    Chapter
    Chuah, J. Deeds, M. Malik, T. Choi, Y. Goodall, J. L.
    Parallel Computing: Technology Trends January 2020
  7. Efficient provenance alignment in reproduced executions

    Other
    Nakamura, Y. Malik, T. Gehani, A.
    12th International Workshop on Theory and Practice of Provenance (TaPP 2020) January 2020
  8. PROV-CRT: Provenance Support for Container Runtimes

    Other
    Ahmad, R. Nakamura, Y. Manne, N. N. Malik, T.
    12th International Workshop on Theory and Practice of Provenance (TaPP 2020) January 2020

2019

  1. SciInc: A Container Runtime for Incremental Recomputation

    Other
    Youngdahl, A. Ton-That, D. Malik, T.
    2019 15th International Conference on eScience (eScience) September 2019
  2. Report on the first international workshop on incremental re-computation: Provenance and beyond

    Article
    Missier, P. Malik, T. Cala, J.
    ACM SIGMOD Record May 2019
  3. PLI+: Efficient Clustering of Cloud Databases

    Article
    That, D. H. T. Wagner, J. Rasin, A. Malik, T.
    Distributed and Parallel Databases March 2019

2018

  1. Leveraging Scientific Cyberinfrastructures to Achieve Computational Hydrologic Model Reproducibility

    Article
    Sadler, J. Essawy, B. Goodall, J. Voce, D. Choi, Y. Morsy, M. Yuan, Z. Malik, T.
    AGU Fall Meeting Abstracts December 2018
  2. Where Provenance in Database Storage

    Other
    Rasin, A. Malik, T. Wagner, J. Kim, C.
    International Provenance and Annotation Workshop July 2018
  3. Integrating scientific cyberinfrastructures to improve reproducibility in computational hydrology: Example for HydroShare and GeoTrust

    Article
    Essawy, B. T. Goodall, J. L. Zell, W. Voce, D. Morsy, M. M. Sadler, J. Yuan, Z. Malik, T.
    Environmental Modelling & Software July 2018
  4. Improving Reproducibility of Distributed Computational Experiments

    Chapter
    Pham, Q. Malik, T. That, D. H. T. Youngdahl, A.
    Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems June 2018
  5. Detecting database file tampering through page carving

    Article
    Wagner, J. Rasin, A. Heart, K. Malik, T. Furst, J. Grier, J.
    21st International Conference on Extending Database Technology March 2018
  6. Utilizing provenance in reusable research objects

    Article
    Yuan, Z. That, D. H. T. Kothari, S. Fils, G. Malik, T.
    Informatics March 2018
  7. Achieving Reproducible Computational Hydrologic Models by Integrating Scientific Cyberinfrastructures

    Other
    Essawy, B. T. Goodall, J. L. Morsy, M. M. Zell, W. Sadler, J. Malik, T. Yuan, Z. Voce, D.
    9th International Congress on Environmental Modelling and Software January 2018
  8. Using Provenance for Generating Automatic Citations

    Other
    Malik, T. Rasin, A. Youngdahl, A.
    10th USENIX Workshop on the Theory and Practice of Provenance (TaPP 2018) January 2018

2017

  1. Cyberinfrastructure to Support Collaborative and Reproducible Computational Hydrologic Modeling

    Article
    Goodall, J. L. Castronova, A. M. Bandaragoda, C. Morsy, M. M. Sadler, J. M. Essawy, B. Tarboton, D. G. Malik, T. Nijssen, B. Clark, M. P. Liu, Y. Wang, S.
    AGU Fall Meeting Abstracts December 2017
  2. GeoTrust Hub: A Platform For Sharing And Reproducing Geoscience Applications

    Article
    Malik, T. Tarboton, D. G. Goodall, J. L. Choi, E. Bhatt, A. Peckham, S. D. Foster, I. That, D. H. T. Essawy, B. Yuan, Z. Dash, P. Fils, G. Gan, T. Fadugba, O. I. Saxena, A. Valentic, T. A.
    AGU Fall Meeting Abstracts December 2017
  3. Sciunits: Reusable Research Objects

    Other
    That, D. H. T. Fils, G. Yuan, Z. Malik, T.
    2017 IEEE 13th International Conference on e-Science (e-Science) October 2017
  4. PLI: Augmenting live databases with custom clustered indexes

    Chapter
    Wagner, J. Rasin, A. That, D. H. T. Malik, T.
    Proceedings of the 29th International Conference on Scientific and Statistical Database Management June 2017
  5. Database forensic analysis with DBCarver

    Article
    Wagner, J. Rasin, A. Malik, T. Heart, K. Jehle, H. Grier, J.
    CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research January 2017

2016

  1. Ontology-based urban data exploration

    Chapter
    Balasubramani, B. S. Shivaprabhu, V. R. Krishnamurthy, S. Cruz, I. F. Malik, T.
    Proceedings of the 2nd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics October 2016
  2. Interactive provenance summaries for reproducible science

    Other
    Li, X. Xu, X. Malik, T.
    2016 IEEE 12th International Conference on e-Science (e-Science) October 2016
  3. Challenges with Maintaining Legacy Software to Achieve Reproducible Computational Analyses: An Example for Hydrologic Modeling Data Processing Pipelines

    Other
    Essawy, B. T. Goodall, J. L. Malik, T. Xu, H. Conway, M. Gil, Y.
    iEMSs Conference January 2016

2015

  1. Personalized, Shareable Geoscience Dataspaces For Simplifying Data Management and Improving Reproducibility

    Article
    Malik, T. Foster, I. Goodall, J. L. Peckham, S. D. Baker, J. B. Gurnis, M.
    AGU Fall Meeting Abstracts December 2015
  2. Sharing and reproducing database applications

    Article
    Pham, Q. Thaler, S. Malik, T. Foster, I. Glavic, B.
    Proceedings of the VLDB Endowment August 2015
  3. PDACS: a portal for data analysis services for cosmological simulations

    Article
    Madduri, R. Rodriguez, A. Uram, T. Heitmann, K. Malik, T. Sehrish, S. Chard, R. Cholia, S. Paterno, M. Kowalkowski, J. Habib, S.
    Computing in Science & Engineering July 2015
  4. An invariant framework for conducting reproducible computational science

    Article
    Meng, H. Kommineni, R. Pham, Q. Gardner, R. Malik, T. Thain, D.
    Journal of Computational Science July 2015
  5. GEN: a database interface generator for HPC programs

    Chapter
    Pham, Q. Malik, T.
    Proceedings of the 27th International Conference on Scientific and Statistical Database Management June 2015
  6. LDV: Light-weight database virtualization

    Other
    Pham, Q. Malik, T. Glavic, B. Foster, I.
    2015 IEEE 31st International Conference on Data Engineering April 2015

2014

  1. GeoDataspaces: Simplifying Data Management Tasks with Globus

    Article
    Malik, T. Chard, K. Tchoua, R. B. Foster, I.
    AGU Fall Meeting Abstracts December 2014
  2. Plenario: An Open Data Discovery and Exploration Platform for Urban Science

    Article
    Catlett, C. Malik, T. Goldstein, B. Giuffrida, J. Shao, Y. Panella, A. Eder, D. van Zanten, E. Mitchum, R. Thaler, S. Foster, I. T.
    IEEE Data Engineering Bulletin December 2014
  3. Auditing and maintaining provenance in software packages

    Other
    Pham, Q. Malik, T. Foster, I.
    International Provenance and Annotation Workshop June 2014
  4. Benchmarking cloud-based tagging services

    Other
    Malik, T. Chard, K. Foster, I.
    2014 IEEE 30th International Conference on Data Engineering Workshops March 2014
  5. GeoBase: indexing NetCDF files for large-scale data analysis

    Chapter
    Malik, T.
    Big data management, technologies, and applications January 2014
  6. SOLE: towards descriptive and interactive publications

    Article
    Malik, T. Pham, Q. Foster, I. T. Leisch, F. Peng, R.
    Implementing reproducible research January 2014

2013

  1. Lens: a faceted browser for research networking platforms

    Other
    Whaling, R. Malik, T. Foster, I.
    2013 IEEE 9th International Conference on e-Science October 2013
  2. Distributed data provenance for large-scale data-intensive computing

    Other
    Zhao, D. Shou, C. Malik, T. Raicu, I.
    2013 IEEE International Conference on Cluster Computing (CLUSTER) September 2013
  3. Proactive Support for Large-Scale Data Exploration

    Other
    Hereld, M. Malik, T. Vishwanath, V.
    2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum May 2013
  4. Sketching distributed data provenance

    Chapter
    Malik, T. Gehani, A. Tariq, D. Zaffar, F.
    Data Provenance and Data Management in eScience January 2013
  5. Towards a provenance-aware distributed filesystem

    Article
    Shou, C. Zhao, D. Malik, T. Raicu, I.
    5th Workshop on the Theory and Practice of Provenance (TaPP) January 2013
  6. Using provenance for repeatability

    Other
    Pham, Q. Malik, T. Foster, I.
    5th USENIX Workshop on the Theory and Practice of Provenance (TaPP 13) January 2013

2012

  1. Addressing data access needs of the long-tail distribution of geoscientists

    Other
    Malik, T. Foster, I.
    2012 IEEE International Geoscience and Remote Sensing Symposium July 2012
  2. SOLE: linking research papers with science objects

    Other
    Pham, Q. Malik, T. Foster, I. Lauro, R. D. Montella, R.
    International Provenance and Annotation Workshop June 2012
  3. Wagging the long tail of earth science: Why we need an earth science data web, and how to build it

    Other
    Foster, I. Katz, D. S. Malik, T. Fox, P.
    January 2012

2011

  1. Improving the efficiency of subset queries on raster images

    Chapter
    Malik, T. Best, N. Elliott, J. Madduri, R. Foster, I.
    Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems November 2011
  2. Policy-based integration of provenance metadata

    Other
    Gehani, A. Tariq, D. Baig, B. Malik, T.
    2011 IEEE International Symposium on Policies for Distributed Systems and Networks June 2011

2010

  1. Tracking and sketching distributed data provenance

    Other
    Malik, T. Nistor, L. Gehani, A.
    2010 IEEE Sixth International Conference on e-Science December 2010
  2. A Dynamic Data Middleware cache for Rapidly-growing Scientific Repositories

    Other
    Malik, T. Wang, X. Little, P. Chaudhary, A. Thakar, A.
    ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing November 2010
  3. JAWS: Job-aware workload scheduling for the exploration of turbulence simulations

    Other
    Wang, X. Perlman, E. Burns, R. Malik, T. Budavári, T. Meneveau, C. Szalay, A.
    SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis November 2010
  4. RNEDE: Resilient network design environment

    Other
    Venkatasubramanian, V. Malik, T. Giridhar, A. Villez, K. Prasad, R. Shukla, A. Rieger, C. Daum, K. McQueen, M.
    2010 3rd International Symposium on Resilient Control Systems August 2010
  5. Efficient querying of distributed provenance stores

    Chapter
    Gehani, A. Kim, M. Malik, T.
    Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing June 2010
  6. Providing scalable data services in ubiquitous networks

    Other
    Malik, T. Prasad, R. Patil, S. Chaudhary, A. Venkatasubramanian, V.
    International Conference on Database Systems for Advanced Applications April 2010

2009

  1. Liferaft: Data-driven, batch processing for the exploration of scientific databases

    Article
    Wang, X. Burns, R. Malik, T.
    Conference on Innovative Data Systems Research (CIDR) September 2009
  2. Adaptive physical design for curated archives

    Other
    Malik, T. Wang, X. Dash, D. Chaudhary, A. Ailamaki, A. Burns, R.
    International Conference on Scientific and Statistical Database Management June 2009

2008

  1. Rule-based classification systems for informatics

    Other
    Krishnamurthy, B. Malik, T. Stamatis, S. Venkatasubramanian, V. Caruthers, J.
    2008 IEEE Fourth International Conference on eScience December 2008
  2. Workload-Aware histograms for remote applications

    Other
    Malik, T. Burns, R.
    International Conference on Data Warehousing and Knowledge Discovery September 2008
  3. Automated physical design in database caches

    Other
    Malik, T. Wang, X. Burns, R. Dash, D. Ailamaki, A.
    2008 IEEE 24th International Conference on Data Engineering Workshop April 2008
  4. Large scale data management for the sciences

    Dissertation
    Malik, T.
    The Johns Hopkins University January 2008

2007

  1. A workload-driven unit of cache replacement for mid-tier database caching

    Other
    Wang, X. Malik, T. Burns, R. Papadomanolakis, S. Ailamaki, A.
    International Conference on Database Systems for Advanced Applications April 2007
  2. A Black-Box Approach to Query Cardinality Estimation

    Other
    Malik, T. Burns, R. C. Chawla, N. V.
    CIDR January 2007

2006

  1. Estimating query result sizes for proxy caching in scientific database federations

    Other
    Malik, T. Burns, R. Chawla, N. V. Szalay, A.
    SC'06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing November 2006

2005

  1. Bypass caching: Making scientific databases good network citizens

    Other
    Malik, T. Burns, R. Chaudhary, A.
    21st International Conference on Data Engineering (ICDE'05) April 2005
  2. Practical passive lossy link inference

    Other
    Batsakis, A. Malik, T. Terzis, A.
    International Workshop on Passive and Active Network Measurement March 2005

2002

  1. Web services for the virtual observatory

    Other
    Szalay, A. S. Budavári, T. Malik, T. Gray, J. Thakar, A. R.
    Virtual Observatories December 2002
  2. SkyQuery: A WebService approach to federate databases

    Article
    Malik, T. Szalay, A. S. Budavári, T. Thakar, A. R.
    arXiv preprint cs/0211023 November 2002
  3. The SDSS SkyServer - Public Access to the Sloan Digital Sky Server Data

    Article
    Szalay, A. S. Gray, J. Thakar, A. R. Kunszt, P. Z. Malik, T. Raddick, J. Stoughton, C.
    ACM Special Interest Group on Management of Data (SIGMOD) August 2002