(Sketch… | PERSONAL OPINION)

Between regulatory pressure (GDPR, HIPAA, CCPA, LGPD…) and the ever-increasing size of the elephants in the room (lakes, warehouses, stores…), you get an earful of governance wherever you turn these days, yet nobody gives a clear definition of what governance is. Without an understanding of what governance entails, it’s not surprising that current governance practices in enterprises are limited in scope, incoherent in strategy and lackluster in value proposition. Decision makers emphasize governance nowadays as a mere show of effort: they feel compelled to do something about it, yet don’t know how to realize its full potential.

Scope: from Data to Assets & Processes

Like me, you most likely first heard of governance in the data space, because of the alarming velocity and volume at which data accumulates in the enterprise. But other things around data are also growing out of control! It’s time to expand governance from data to assets and processes.

Assets in a corporate environment are artifacts that have value or potential value. Compared to IT assets such as servers and databases, analytic assets (data, features, reports, dashboards, predictive models, apps and APIs, referred to as “assets” hereafter) require special consideration when it comes to governance.

If assets are the “things”, processes are the series of actions or steps taken to produce the assets or to activate the value they provide.

Mission: From Policing to Decisioning

Governance refers to “what decisions must be made to ensure effective management and use of assets (decision domains) and who makes the decisions (locus of accountability for decision-making)”. On the other hand, management involves making and implementing governance decisions. In this sense, governance provides a (self) regulatory framework within which management operates.

To appreciate what governance really is, consider the use of AWS S3 buckets within a highly regulated enterprise. Let’s assume the only way the enterprise knows how to use an S3 bucket is with access/secret keys. Key-based S3 access has a less-than-ideal attack surface: anybody who gets hold of the keys may access the bucket’s content from outside the enterprise’s Virtual Private Cloud (VPC). Do we now categorize its use as insecure and halt all projects that depend on it? Or do we put in place a decision process that considers the content of the bucket (PII or not), the probability of a user leaking the key/secret pair (say 0.001) and the cost associated with the risk (0.001 × $50,000 HIPAA fine = $50), and allows its use when the quantified risk is less than the benefits? As a front-line manager, I’m more than willing to authorize its use and be held accountable for the decision. If you refer back to our definition of governance, “effective” is the key word.
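
The decision process above can be written down in a few lines of Python. This is only a sketch of the arithmetic, not a real risk model. The class and field names are mine, the benefit figure is a made-up placeholder (the example above never quantifies the benefits), and treating a bucket without PII as carrying no fine is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3KeyAccessDecision:
    """One governance decision: allow key/secret based access to a bucket?"""

    contains_pii: bool        # content of the bucket (PII or not)
    leak_probability: float   # chance a user leaks the key/secret pair
    fine_usd: float           # cost if the leak happens (e.g. a HIPAA fine)
    benefit_usd: float        # HYPOTHETICAL: estimated value of the project

    @property
    def expected_loss_usd(self) -> float:
        # ASSUMPTION: without PII in the bucket, no regulatory fine applies.
        if not self.contains_pii:
            return 0.0
        return self.leak_probability * self.fine_usd

    def allow(self) -> bool:
        # Allow the use when the quantified risk is less than the benefits.
        return self.expected_loss_usd < self.benefit_usd


# Figures from the example above, plus a placeholder benefit of $10,000.
decision = S3KeyAccessDecision(
    contains_pii=True,
    leak_probability=0.001,
    fine_usd=50_000,
    benefit_usd=10_000,
)
print(f"expected loss: ${decision.expected_loss_usd:,.2f}, allow: {decision.allow()}")
```

The point is not the numbers, which anybody can argue with, but that the decision becomes explicit: the inputs are written down, and whoever signs off on them is accountable.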

Decision Domains of Analytic Assets Governance

Paralleling Maslow’s hierarchy of needs, we frame assets governance as five decision domains:

  • Physiological. The physical assets that enable the end-to-end lifecycles of data and analytic assets. Storage (S3, Hive…), Compute (AWS EKS, GPU…), Automation (CI/CD), Workflow (schedule, event driven…) …
  • Security. Regulatory compliance needs such as HIPAA, GDPR…
  • Quality. Total quality management: Fairness/bias, correctness, completeness, clinical relevancy, and performance in every step of the asset’s lifecycle.
  • Discoverability. An assets graph that enables sharing and reuse: metadata, relationships, search…
  • Value. Last but not least, artifacts without value are not assets by definition! Cost/benefits analysis to drive investment or retirement decisions for assets. Cost optimization on AWS…

Putting it Together

With decision domains and asset types as two orthogonal dimensions, we can reason about the decision points and decision makers of analytic assets governance in a tabular format.
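
As a rough sketch of that grid, the two dimensions can be crossed in Python. The domain and asset-type names come from the sections above. The cell layout (`decision_point`, `decision_maker`) and the helper names are my own assumptions, and the cells start out empty on purpose: filling them in is the actual governance work. The single filled cell reuses the S3 example purely as an illustration.

```python
from enum import Enum
from itertools import product
from typing import Dict, List, Optional, Tuple


class Domain(Enum):
    PHYSIOLOGICAL = "Physiological"
    SECURITY = "Security"
    QUALITY = "Quality"
    DISCOVERABILITY = "Discoverability"
    VALUE = "Value"


# Analytic asset types as listed in the Scope section.
ASSET_TYPES = ("data", "features", "reports", "dashboards",
               "predictive models", "apps and APIs")

Cell = Dict[str, Optional[str]]
Grid = Dict[Tuple[Domain, str], Cell]


def empty_grid() -> Grid:
    """One cell per (decision domain, asset type) pair, nothing decided yet."""
    return {
        (domain, asset): {"decision_point": None, "decision_maker": None}
        for domain, asset in product(Domain, ASSET_TYPES)
    }


def unassigned(grid: Grid) -> List[Tuple[Domain, str]]:
    """Cells where nobody is accountable for the decision yet."""
    return [key for key, cell in grid.items() if cell["decision_maker"] is None]


grid = empty_grid()
# Illustration only, borrowed from the S3 example earlier in this post.
grid[(Domain.SECURITY, "data")] = {
    "decision_point": "Allow key/secret based S3 access?",
    "decision_maker": "front-line manager",
}
print(len(grid), "cells,", len(unassigned(grid)), "still without a decision maker")
```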

Semantics: from Taxonomy and Lineage to Knowledge Graph

Operational vs Conceptual (ontology) vs upper ontology
Concrete vs abstract ontology

Maturity: from Semantics to Procedural

to be continued…

References:

  1. Designing Data Governance, Vijay Khatri, Carol V. Brown, Communications of the ACM, January 2010, Vol. 53 No. 1, Pages 148-152
  2. Model governance: reducing the anarchy of production ML, Sridhar V., Subramanian S., et al., ParallelM
  3. A governance model for the application of AI in health care, Sandeep Reddy, Sonia Allan, Simon Coghlan, Paul Cooper, Journal of the American Medical Informatics Association, Volume 27, Issue 3, March 2020
  4. A governance framework for algorithmic accountability and transparency, Koene, Ansgar; Clifton, Chris; Hatada, Yohko; Webb, Helena; Richardson, Rashida
  5. Process Governance: definitions and framework, Part I, Rafael Paim & Raquel Flexa, BTTrends.com, Nov. 2011. Paper itself is not high quality, included for the bibliographical references therein.
  6. Process Governance: Part II, Rafael Paim & Raquel Flexa, BTTrends.com, Nov. 2011.
Governance 2.0