Whose Data Is It Anyway

February 10th to February 14th, 2025

For the fifth year, the Johns Hopkins Libraries are hosting Love Data Week (February 10-14, 2025), an international celebration of data. This year’s theme is “Whose Data Is It, Anyway?”

JHU Love Data Week 2025 is made possible by:

JHU Love Data Week 2025 is organized by the Johns Hopkins Libraries: Sheridan Libraries, Welch Medical Library, Arthur Friedheim Library, APL Library, and SAIS Europe Library.

Love Data Week is presented in partnership with the Data Science and AI Institute and the Berman Institute of Bioethics.

Contact

If you require a reasonable accommodation to enjoy and participate in these events, or if you have questions or feedback, please direct your inquiries to dataservices@jhu.edu.

Schedule

Recordings of Love Data Week 2025 are available below.

  • Day 1 - Monday, February 10th
    Time Topic Presenter
    10:00 – 10:10am Welcome Remarks and Introduction Elisabeth Long, Sheridan Dean of University Libraries, Archives, and Museums, Johns Hopkins University
    10:10 – 11:00am Keynote Presentation
    Context is key: Thoughtful uses of secondary data (recording)
    This talk will explore the importance of approaching secondary data with a critical eye. Reusing existing datasets – whether from government agencies, social media platforms, or online repositories – saves time and resources and often provides a broader view than starting from scratch. However, it also poses unique challenges: the data may be incomplete, outdated, or gathered with different priorities. This session explores both the advantages of and concerns with working with secondary data, highlighting considerations such as privacy and data ownership.
    Negeen Aghassibake, Data Visualization Librarian, University of Washington
    11:00am – 12:00pm Research Talk
    Defining and Implementing Clear, Reproducible Experiments for Empirical Humanistic Scholarship (recording)
    The basic processes common to setting up and performing a computational study will be illustrated with a grounded historical example from Near East Studies. Starting with the fundamental, minimal tooling needs, we’ll demonstrate the path from an empty directory and a command line to a full-fledged research project. While the narration won’t go deeply into the choices, each concrete step will be justified at a high level and linked to copious documentation of the relevant open-source software used. The demonstration can be used as a starting point for a new research project, or simply as a way to exercise and expand understanding of its components.
    Tom Lippincott, Associate Research Professor (CS/AGHI) & Director of the Center for Digital Humanities
    12:00 – 2:00pm Break
    2:00 – 3:30pm Roundtable Discussion
    Whose (Training) Data Is It Anyway? Ethics in Data Use for AI Research (recording)
    Artificial intelligence (AI) is built on data, and creates more every time it’s used. Whose Data Is It, Anyway? We invite researchers using and creating AI across disciplines, legal experts, and researchers whose data supports AI development to discuss the ownership of the data that underpins AI models used for research. Topics will include the implications of AI for patient privacy in health research, ownership of AI-generated data, researchers’ responsibilities to develop AI models ethically, and guidelines and future directions for AI research.

    • Moderator: Mark Dredze, John C Malone Professor of Computer Science, Johns Hopkins University
    • Brian Caffo, Professor, Department of Biostatistics and Department of Biomedical Engineering, Johns Hopkins University
    • Somdatta Goswami, Assistant Professor, Department of Civil and Systems Engineering, Johns Hopkins University
    • Brian Klaas, Assistant Director for Technology, Bloomberg School of Public Health Center for Teaching and Learning, Johns Hopkins University
    • Daniel Khashabi, Assistant Professor, Department of Computer Science, Johns Hopkins University
  • Day 2 - Tuesday, February 11th
    Time Topic Presenter
    10:00 – 11:00am Research Talk
    Reproductive Justice and Open Information Access on Wikipedia (recording)
    It can be hard to find accurate, up-to-date information about reproductive health and justice issues. Wikipedia is where people go when they want to answer a question without extensive research or reliance on unverified AI summaries. Questions about reproductive issues may come up in civic life (“what policies align with my values?”) or in personal life (“is abortion legal in my state?”). Access to information about reproduction, contraception, abortion, and birthing has always been unequal, with people of color, young people, and the poor facing higher information barriers, and this is exacerbated by state policies activated by the 2022 Dobbs ruling that overturned Roe v. Wade. High-quality Wikipedia articles make reproductive health information accessible in a transparent way: they cite sources and are collectively reviewed by a community of editors. This talk will cover the intersection of reproductive justice with open-source Wikipedia editing, sharing case studies of articles that have been revised with this approach, and leading participants through the process of setting up their own Wikipedia account and identifying articles in need of revision. Understanding how an information source works is critical to being a responsible reader and citizen. The open nature of Wikipedia, where all edits are sourced, tracked, reviewed, and contestable, makes it a valuable site of tension and consensus-making. At the same time, many important scholarly sources are not accessible to non-academic users, so contributions by editors working within institutions can help to move information out from behind paywalls. We will also discuss the relationship between data and narrative framing, exploring how the language used to present data can exacerbate structural biases and how appropriate language and framing can counter bias. This is especially important when providing information about reproductive justice issues. Articles may use biased framing or language that perpetuates stigma. Users who contribute to Wikipedia are empowered to reframe these issues and introduce inclusive language.
    Alicia Puglionesi, Lecturer, Johns Hopkins Medicine, Science, and the Humanities Program

    Siân Evans, Librarian for History and Area Studies, Johns Hopkins Sheridan Libraries

    Sophie Reverdy, Distance Learning Librarian, Anne Arundel Community College

    11:00am – 12:00pm Webinar
    Motivating Research with Open Data (recording)
    Students and researchers have benefitted tremendously from the advent of modern information technology architecture and recordkeeping. Organizations in the private and public sector are now amassing tremendous amounts of information as part of their daily operations and their data collection efforts. Much of this is now available to the public through data portals. Yet while data availability is greater than ever, this does not equate to excellence in quantitative research. In this workshop, Dr. Paschall and Dr. Kromphardt will review best practices for formulating research questions that build on publicly available datasets.
    Collin Paschall, Program Director, Assistant Center Director, Sr. Lecturer, MS in Data Analytics and Policy Program, Johns Hopkins University

    Chris Kromphardt, Assistant Program Director, Lecturer, MS in Data Analytics and Policy Program, Johns Hopkins University

    12:00 – 1:00pm Break
    1:00 – 2:00pm Lightning Talks: AI & Data (recording)

    • Data pipeline for athletic performance optimization at JHU (Kaze W. K. Wong) Machine learning and AI tools have made tremendous progress in computer vision tasks such as image segmentation, action localization, and pose estimation. These tasks can be used to extract useful analytics from videos of athletes’ performance, and hence to gain insight into training methods and help optimize performance. However, most existing AI tools stop at the level of research code rather than becoming production-ready tools that users can simply deploy. In this talk, I will describe a research and engineering effort I lead to build a production-ready data pipeline for the Johns Hopkins Track and Field team, specifically for the high jump event. I will discuss the prospects and challenges of building an end-to-end pipeline that not only results in machine learning research papers but also translates into improved performance for athletes. I will discuss these challenges from different perspectives, including those of an AI researcher, a software engineer, a coach, and finally an athlete. Specifically, we will go from data collection to integrating continuous integration/continuous deployment (CI/CD) and MLOps tools so that our machine learning pipeline can run whenever new data is generated.
    • Archiving Prison Witness: History, Power, and Entitling (Doran Larson) This talk will review the inception and history of the American Prison Writing Archive (APWA), discuss the potential power of first-person prison witness to bend the arc of carceral history, and discuss APWA policies and practices that seek to make platforming of prison witness non-extractive by granting rights and ownership to its 1000+ (and growing) authors.
    • The OIDA Toolbox: Supporting Research on the Opioid Industry Documents Archive Across Methods and Disciplines (Brian Wingenroth) The OIDA team at JHU has developed and released the OIDA Toolbox, a suite of tools supporting diverse approaches to conducting research with the archive. OIDA, the UCSF-JHU Opioid Industry Documents Archive, houses over 4 million documents from the opioid industry. This presentation will demonstrate how researchers across disciplines can access and analyze these materials through multiple pathways – from traditional search and download to cloud-based computing environments – designed to support users at all levels of technical expertise.
    Kaze W. K. Wong, Assistant Research Professor, Research Software Engineer, Johns Hopkins Applied Mathematics and Statistics Department; Johns Hopkins Data Science and AI Institute

    Doran Larson, Founder, Co-Director, and Program Executive of the American Prison Writing Archive, Johns Hopkins University

    Brian Wingenroth, Data Science Lead, OIDA (the Opioid Industry Documents Archive), Bloomberg School of Public Health, Johns Hopkins University

    2:00 – 3:30pm Roundtable Discussion
    Data Ownership (recording)
    This roundtable will discuss issues of data ownership, especially how to collect data, what can and should be shared, and the rights of the subjects of data collection to decide these terms for themselves.

    • Moderator: Kathleen DeLaurenti, Director, Arthur Friedheim Library, Johns Hopkins University
    • Tonika Berkley, Africana Archivist, Special Collections, Sheridan Libraries, Johns Hopkins University
    • Laura Evans, Executive Director, Homewood IRB & Conflict of Interest and Commitment Office, Johns Hopkins University
    • Dr. Valentín Quiroz Sierra, Postdoctoral Fellow, Johns Hopkins Center for Indigenous Health
    • Dr. Nadejda I. Webb, Assistant Director, Life x Code | Visiting Assistant Professor, Center for Digital Humanities, Johns Hopkins University
  • Day 3 - Wednesday, February 12th
    Time Topic Presenter
    10:00 – 11:00am Webinar
    Big Data in R: Larger-than-Memory Data Analysis with Arrow and DuckDB (recording)
    Analyzing datasets that exceed your system’s memory can be a significant challenge, but the right tools can make it manageable. In this webinar, we’ll explore how to use Apache Arrow (a high-performance, multi-language framework for working with larger-than-memory tabular data) together with DuckDB, a fast and lightweight embedded database system. We’ll guide you through combining these tools to build efficient, scalable data analysis pipelines directly in R, with practical strategies to overcome memory limitations and enhance your data processing capabilities.
    Dr. Pete Lawson, Data and Visualization Librarian, Johns Hopkins University Data Services
    11:00am – 12:00pm Webinar
    Licensing Options for Research Data & Software (recording)
    This workshop covers the fundamentals of open licensing for data and software, providing practical guidance for researchers who want to make their work openly available. Participants will gain an understanding of license selection, implementation best practices, and special considerations for AI model components.
    Lubov McKone, Data Management Consultant, Johns Hopkins University Data Services

    Megan Forbes, Program Manager, Open Source Programs Office, Johns Hopkins University Sheridan Libraries

    12:00 – 2:00pm Break
    2:00 – 3:30pm Roundtable Discussion
    Using Someone Else’s Data: Benefits and Challenges (recording)
    Data reuse benefits scholarship by reducing the amount of time required to produce data and by facilitating novel ways to use data, perhaps not foreseen by the original data creator. Given these benefits, scholars may be motivated to reuse others’ data, but how do you start using data collected by others for your own scholarship? During this session, you will have the opportunity to listen to Johns Hopkins researchers and staff talk about their experiences finding and evaluating data sources, the challenges and benefits of using secondary data, and thoughts on how Hopkins can support the use of secondary data for scholarship. The session is designed so that attendees can ask their own questions as well.

    • Moderator: Dr. Pete Lawson, Data and Visualization Librarian, Johns Hopkins University Data Services
    • Andreia Faria, Associate Professor of Radiology at Johns Hopkins University
    • Angelo Mele, Associate Professor of Economics, Johns Hopkins Carey Business School
    • Erik Westlund, Assistant Scientist, JHSPH Department of Biostatistics
    • Benjamin Zaitchik, Morton K. Blaustein Chair and Professor, Department of Earth and Planetary Sciences, Johns Hopkins University
  • Day 4 - Thursday, February 13th
    Time Topic Presenter
    10:00 – 11:00am Research Talk
    Data Stewardship and Black DH: Researching and Building K4BL (recording)
    Keywords for Black Louisiana is a pair of websites (docs.k4bl.org and stories.k4bl.org) curating a digital edition of transcriptions, translations, and stories about people of African descent in Louisiana based on French and Spanish colonial documents in the eighteenth and early long-nineteenth centuries. It is a LifexCode project sponsored by the National Historical Publications and Records Commission and Johns Hopkins Sheridan Libraries. For the past four years, Keywords researchers have engaged manuscripts digitized through the Louisiana Historical Center’s Louisiana Colonial Documents Digitization Project and housed in the New Orleans Jazz Museum. Thanks to robust, ongoing community engagement, our project serves as a resource for researchers, cultural workers, genealogists, educators, and students. We hold ourselves accountable to a primary audience of Black New Orleans: community members, descendants, creators, and scholars. This talk will address our data development and curation process, writing and publishing documents and stories on our site, and community engagement and accountability in digital humanities.
    Olivia Barnard, Associate Editor, French Language Team, French Team Lead, PhD candidate, History, Johns Hopkins University

    Zaria Sawdijah El-Fil, Digital Curation Fellow, Keywords for Black Louisiana, Ph.D. candidate, History, University of Chicago

    Akosa Obianwu, Assistant Editor, Spanish Language Team, Johns Hopkins University

    Ellie Palazzalo, Former Digital Lead, Keywords for Black Louisiana, PhD candidate, History, Johns Hopkins University

    Dr. Nadejda I. Webb, Assistant Director, Life x Code | Visiting Assistant Professor, Center for Digital Humanities, Johns Hopkins University

    11:00am – 12:00pm Webinar
    How does the choice of data source influence our research outcomes? Navigating Geospatial Tools for Smarter Research (recording)
    In this session, we’ll dive into four popular geospatial tools – Social Explorer, Data Axle, ArcGIS Online’s Living Atlas, and ArcGIS Online’s Business Analyst – to determine what questions you should ask before choosing a data source. Whether you’re asking the obvious or uncovering the unexpected, this session will help you think critically about data and make informed decisions for your projects.
    Bonni Wittstadt, Geospatial Services Librarian, Johns Hopkins University Data Services
    12:00 – 1:00pm Break
    1:00 – 2:00pm Research Talk
    Honoring Indigenous Data Sovereignty in Suicide Prevention Research: Creating a Culture-Based Risk Assessment Tool through the HOPES Study (recording)
    Suicide is the leading cause of non-accidental death among Native American youth aged 10 to 24. While clinical suicide risk assessment tools aim to identify high-risk individuals and connect them with mental healthcare services, these tools often face limitations when applied in Native American contexts. The Helping Our People End Suicide (HOPES) Study, in collaboration with the Sacramento Native American Health Center (SNAHC), explores perceptions of suicide and its prevention within the Sacramento Native American community to inform the development of a culture-based suicide risk assessment tool. The HOPES Study honors Indigenous Data Sovereignty by prioritizing a collaborative, culturally responsive research process that integrates Indigenous Knowledge Systems and Indigenous data stewardship practices. This approach challenges standard approaches to data ownership by positioning the community as the primary authority over data collection, analysis, and application.
    Dr. Valentín Quiroz Sierra, Postdoctoral Fellow, Johns Hopkins Center for Indigenous Health
    2:00 – 3:30pm Webinars
    Research Data and Software Support at JHU (recording)
    Representatives from multiple JHU offices will present the data and software support they each provide to JHU researchers. Research IT provides IT solutions to JHU researchers. Precision Medicine Analytics Platform (PMAP) offers a platform for JHU medical and clinical researchers to work on data from various data sources. AI and Data Trust oversees the data from JH Medicine to ensure patients’ data are secured and protected. The Biostatistics Center consults on biostatistical questions in research for the JHU community. The Open Source Programs Office (OSPO) is a hub for JHU affiliates to create, contribute to, and participate in open-source communities. Data Services helps JHU faculty, researchers, and students find, use, manage, visualize, and share data. Each group will present for 15 minutes and leave 3-5 minutes for Q&A. Presentations in order:

    • Research IT: Ayla Gnau, Technical Support Manager – Cloud & Virtualization Services; Monica Crubezy, IT Director, Research IT Solutions; Kendra Little, IT Director, Research Data Analytics and Engineering; Tyler J Creamer, Research IT Facilitator, Cloud & Virtualization Services, IT@JH, Johns Hopkins University
    • Precision Medicine Analytics Platform (PMAP): Paul Nagy, PhD, FSIIM, Director of Education, Biomedical Informatics and Data Science, Johns Hopkins University Schools of Medicine, Public Health, and Engineering
    • AI & Data Trust: Valerie Smothers, Data Trust Administrator, Johns Hopkins Medicine
    • Biostatistics Center: Andre Hackman, Director of Data Informatics Services Core (DISC), Associate Director, Johns Hopkins Biostatistics Center
    • Data Services: Chen Chiu, Senior Data Management Consultant, Johns Hopkins University Data Services
    • Open Source Programs Office (OSPO): Megan Forbes, Program Manager, Open Source Programs Office, Johns Hopkins University Sheridan Libraries

    Moderator: Anne K. Seymour, Associate Dean, Welch Medical Library, Johns Hopkins Medicine, School of Nursing, Bloomberg School of Public Health

  • Day 5 - Friday, February 14th
    Time Topic Presenter
    11:00 – 11:10am Closing Remarks Cynthia Hudson Vitale, Associate Dean, Sheridan Libraries, Johns Hopkins University
    11:10am – 12:00pm Unconference
    Did you expect a topic to be discussed during this Love Data Week conference that wasn’t, or did you want to dive deeper into a topic that was? This unconference session is your chance to discuss that topic with fellow attendees. Join this session and enter a breakout room to discuss one of the three most popular topics that attendees vote for during the first four days of Love Data Week.

 

Love Data Week 2025 Organizing Committee

  • Lubov McKone (Co-chair), Data Management Consultant, Data Services
  • Bonni Wittstadt (Co-chair), Geospatial Services Librarian, Data Services
  • Chen Chiu, Senior Data Management Consultant, Data Services
  • Megan Forbes, Program Manager, Open Source Programs Office, Sheridan Libraries
  • Betsy Gunia, Data Management Consultant, Data Services
  • Peter Lawson, Data and Visualization Librarian, Data Services
  • Emily McGinn, Digital Humanities Specialist, Sheridan Libraries
  • Nancy Shin, Scholarly Communications Librarian, Welch Medical Library
  • Stanley Singer, Data Informationist II, Welch Medical Library
  • Daniel Watts, GIS Specialist and Applications Administrator, Data Services