Ethics, Values, and Data
Ethical values may mean different things to data producers, to data consumers, and to organizations implementing algorithmically driven systems. The actual data management phases of artificial intelligence and machine learning are often neglected. The ethical frameworks that should guide data gathering (the “grist for the mill” of machine processes) and the systems that manage it are spotty: they are subject to little oversight, few guidelines, and uneven monitoring and enforcement.
The release of ethical guidelines in the private sector, public policies such as the EU’s General Data Protection Regulation (GDPR), and industry-based acknowledgements that ethical practices in the open data space need to be improved all signal some of the problems associated with privacy and user consent. Yet the complexities involved in large data aggregations, transformations, distribution, and reuse, together with the limited capacity to validate ethical implications embedded in routine data practices, make it difficult to track and prevent ethical breaches. Shasti et al. (2019) demonstrate how prescriptions within legal structures such as GDPR may be at variance with the normal practices of system programmers.
- What are key ethical features in data management from the perspectives of both data producers and users?
- How can ethical features, including the stakeholders’ and data subjects’ best interests, be effectively managed across the continuum of data? What are the difficulties?
- What computational/curation techniques can help identify and track these features in large, aggregated, multi-variable datasets, and how are they reflected in current production systems?
Designing and building good systems is an ongoing process fundamentally intertwined with what we call ethics data management. We will investigate how data ethics can be a point of departure for designing and evaluating good systems. Conventional practices of data and computer scientists do not always conform to the expectations of people and institutions that handle data and make decisions based on their personal or institutional priorities. By highlighting the contradictions and the pressure points in data practices, our research charts new directions for ethical considerations.
We will investigate two cases in open data and open source systems. Open data are often produced and used by the public and private sectors and shared broadly, with data sometimes repurposed by commercial entities. Open source systems include, among others, Linux, Firefox, and Kubernetes. Open data and open source systems may represent the efforts of loosely connected communities, another sort of “institution,” sometimes organized through foundations such as Mozilla. Hyperscalers such as Google, Facebook, and Amazon actively contribute infrastructure to the open source community.
Two contrasting cases will highlight a range of issues: (1) natural hazards data collected by public authorities and (2) the deployments of Kubernetes, an open source platform originally developed by Google that deploys and manages “containerized” content and applications in cloud environments.
For both cases, the actual software affordances, the scientific or engineering settings, and the demand factors that influence developmental trajectories will be examined and questioned. We will use the data lifecycle model (see figure beneath “guiding questions”) to investigate generation, evolution, and change through stages of processing, storage, distribution, and reuse (Digital Curation Centre, 2019).
The final phase (June–August 2020) will generate an ethical decision-making framework and a set of computational methods for evaluating data and applications. These deliverables will be assessed by stakeholders selected from our interview pool. They will serve as blueprints for a prototype application that extends conventional “toolkit” approaches to more detailed ethical queries and to other test cases.
15th International Open Repositories Conference
June 1st-4th in Stellenbosch, South Africa
In this presentation, we describe our research methods using the case of natural hazards data. We present ethical pressure points identified across the data lifecycle, and introduce foundations for an ethical decision-making framework.
45th Annual Natural Hazards Research and Applications Workshop
Sunday, July 12 through Wednesday, July 15, 2020
There is so much buzz about artificial intelligence, but what’s real and what’s fiction? In this hands-on session about AI’s effects on science and society, Harvard researcher and artist Sarah Newman helped us explore scholarly and practical AI...
The Technology & Information Policy Institute main office is in the Jesse H. Jones Communication Center, Building A. We are part of the Moody College of Communication.