Identity Modeling: Trust

Trust: “a firm belief in the reliability, truth, ability, or strength of someone or something.”

In this post we are going to take a simplistic look at an organization’s trust relationships, i.e., “how to make trust decisions,” and then attempt to describe this in a learning model.

Let’s start with some questions to frame our task.

How do we define trust in a system? How do we determine trust? How do our responses change when we trust something? How do we respond when we don’t trust something?

We are going to attempt to formulate an observation model that we can use to learn the answers to some of these questions. For this exercise, we will focus on observing the response to the identity rather than the identity itself.

How do we define trust in a system?

If we treat trust as a simple binary classification problem, true or false, we can work backwards from fundamental access requirements and establish rules and conditional logic that govern how a system responds to a set of identity attributes. Organizations today are doing just this. They leverage business intelligence modeling tools, spending time and effort to automate and streamline their new-employee on-boarding processes (e.g., we study a process, design program logic coupled to that process, implement, and reassess). Over time this requires revisiting and reprogramming the model to keep it relevant.
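As a minimal sketch of what this classic approach looks like in code (all attribute names and rules here are hypothetical), the logic amounts to hand-coded conditionals that must be edited and redeployed whenever requirements change:

```python
# Minimal sketch of the classic, hand-coded approach (attribute names
# and rules are hypothetical). Trust is a binary classification whose
# rules are fixed at design time.

def is_trusted(identity: dict) -> bool:
    """Return True if the identity satisfies the hard-coded onboarding rules."""
    if identity.get("employment_status") != "active":
        return False
    if identity.get("department") not in {"engineering", "finance", "hr"}:
        return False
    if not identity.get("background_check_passed", False):
        return False
    return True

# Any new requirement (a new regulation, a new business line) means
# editing this function and redeploying it.
print(is_trusted({"employment_status": "active",
                  "department": "finance",
                  "background_check_passed": True}))  # True
```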

This approach works from a linear process-efficiency standpoint, but if we want to gain a deeper understanding of trust, I believe we need to take a different approach, and this is where machine learning can assist us.

The core problem with this classic model is that the rules were derived from the original upfront assessment and analysis. These rules become obsolete over time, their change driven by requirements both internal (business lines, partners, etc.) and external (regulations, changing markets, etc.). Time and resources are required to continuously reassess and update these models. These models are also typically not general, and have to be customized, sometimes even completely redeveloped, for new use cases.

On the other hand, if we build an ML model that can observe the process, generate rules, reinforce those rules, and discover why rules differ, we can develop a new kind of model: not just a one-use-case answer, but an adaptive, dynamic model capable of answering unprogrammed use cases, and perhaps even future, yet-to-be-discovered ones.

How do we determine trust?

It is an iterative process that starts long before someone requests data or is hired. It starts with the individual establishing their identity with a local governing body. In the United States this is primarily the job of state governments and, in some cases, the federal government (see passports and the REAL ID Act). They establish identity by evaluating official records, such as a birth certificate, and supporting documents from other organizations, one after the other building a foundation of trust in the person’s identity.

The individual then adds to their identity by establishing some type of communication channel that they can use to interact with other parties. In the past this was a phone number and mailing address. Nowadays it includes an e-mail address, and this is interesting because it is the start of an individual’s personal digital identity. I say personal because most organizations will establish the digital identity of the user in their own systems (HR records, IT accounts, and business e-mail).

In 2017, NIST published its Digital Identity Guidelines (SP 800-63-3), laying out guidance for

“federal agencies implementing digital identity services… guidelines cover identity proofing and authentication of users (such as employees, contractors, or private individuals) interacting with government IT systems…”

Though these guidelines are intended for the federal government, the core concepts of “identity assurance” and “authenticator assurance” are applicable across all industries. The guidelines put a user’s personal e-mail at Identity Assurance Level (IAL) 1, meaning that no second party has validated its connection to a real-world identity. NIST has gone as far as to say that two-step verification (SMS) and association with an MFA device are not valid proof of identity and are only “self-asserted, or should be treated as such.”
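To make the levels concrete, here is a rough sketch of the IALs as data (descriptions paraphrased from SP 800-63-3; the mapping is illustrative only):

```python
# Rough sketch of NIST SP 800-63-3 Identity Assurance Levels as data.
# Descriptions are paraphrased; the mapping is illustrative only.

IAL = {
    1: "self-asserted; no validated link to a real-world identity",
    2: "identity proofed against evidence, remotely or in person",
    3: "in-person identity proofing by an authorized representative",
}

# A personal e-mail address, on its own, sits at IAL 1.
identity = {"contact": "user@example.com", "ial": 1}
print(IAL[identity["ial"]])
```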

In the U.S., new employees will go through an “I-9 process to validate their identity,” after which their corporate digital identities will be considered at a higher IAL. This works for employees, but what about third parties, partners, media outlets, customers, etc., that need access to data? This is another opportunity for centralized identity management.

This leads to the next, and possibly most neglected and misunderstood, aspect of determining trust: how we apply it to a data system.

Right now organizations have rules on how employees, vendors, admins, and even the owner of the company get access to data. In some cases this knowledge exists only inside the minds of the system administrators, or is written out in vague, archaic memos and policies. Nonetheless, rules are in place, and they are being enforced and followed to some degree. This web of knowledge can be very valuable in understanding why a trusted entity receives different access.

This leads to another NIST contribution, the “Policy Machine,” and its implications for mapping attributes to access rules: “Attribute-Based Access Control.” This is going to be one of the keys to success in the mission to build a machine learning model for user trust. I will discuss this more in the next post, “Learning How to Learn”; even a generalized A.I may need some pre-education to be successful.
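NIST’s Policy Machine is a full reference architecture in its own right; the toy sketch below (with hypothetical attribute and policy names) only illustrates the core ABAC idea of mapping attributes to access rules:

```python
# Toy illustration of attribute-based access control (ABAC).
# A policy maps required attribute values to a permitted operation;
# attribute and policy names here are hypothetical.

POLICIES = [
    {"operation": "read",  "resource": "payroll",
     "require": {"department": "hr", "clearance": "internal"}},
    {"operation": "write", "resource": "payroll",
     "require": {"department": "hr", "role": "payroll_admin"}},
]

def is_permitted(user_attrs: dict, operation: str, resource: str) -> bool:
    """Grant access only if some policy's required attributes all match."""
    for policy in POLICIES:
        if policy["operation"] == operation and policy["resource"] == resource:
            if all(user_attrs.get(k) == v for k, v in policy["require"].items()):
                return True
    return False

print(is_permitted({"department": "hr", "clearance": "internal"},
                   "read", "payroll"))  # True
```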

What do we do when we trust something? What do we do when we don’t trust something?

A response to a data access request typically goes something like this: an administrator evaluates a set of documented access requirements and the access request, then compares them against the presented user credentials and/or qualifications (attributes), and in most cases against a set of “undocumented requirements.”

Why do they need access? What department? For how long? Who is their supervisor? Even bias and intuition play a factor. There is more to the question of “Why do they need access?” than “to do their job”; in a complex system this can drive logic about what access they need in order “to do their job.” This can be time consuming, and if not properly mapped, data access mistakes can be made.

Based on all of these attributes and rules, both documented and undocumented, considered together, access may be granted, limited, or denied altogether.
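Put together, the decision logic might be sketched like this (a hypothetical illustration; the “undocumented” signal stands in for the bias, intuition, and unwritten norms described above):

```python
# Hypothetical sketch of the administrator's decision, combining documented
# requirements with "undocumented" contextual signals into a three-way outcome.

def access_decision(request: dict) -> str:
    """Return 'granted', 'limited', or 'denied' for an access request."""
    documented_ok = (request.get("supervisor_approved", False)
                     and request.get("department") == request.get("data_owner_dept"))
    contextual_ok = request.get("duration_days", 9999) <= 90  # an unwritten norm

    if documented_ok and contextual_ok:
        return "granted"
    if documented_ok:
        return "limited"   # e.g., time-boxed or read-only access
    return "denied"

print(access_decision({"supervisor_approved": True,
                       "department": "finance",
                       "data_owner_dept": "finance",
                       "duration_days": 30}))  # granted
```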

What are these attributes, and how could we learn and map them? Let’s see if this could be done through knowledge transfer.

Human Observation

Where do we start? Let’s look at training: how do we train a new administrator today?

First we start with observation.

The trainee observes the interactions and decisions made by the trainer, learning how to respond to the task.

How do we reinforce that training?

The trainee’s mistake is corrected by the trainer. The trainee’s knowledge is reinforced, and the second attempt is correct.

A trainee will demonstrate their learned knowledge through practice, and the trainer will continue to teach the trainee by becoming the observer.

How do we translate this into a machine learning process?

Machine Learning Analog

Let’s take the training model from above and develop a high-level process flow.

The ML observer acts as the trainee, comparing observed attributes to observed responses.

By placing the ML actor in-line, taking the place of the trainee, the ML can generate rules based on the outcomes of observed transactions, much like a human observer.
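As a minimal sketch, assuming observed transactions are logged as (attributes, decision) pairs, a decision tree can stand in for the rule generator; the learned tree is itself a human-readable set of rules:

```python
# Minimal sketch of the "ML as trainee" step: learn rules by observing
# (attributes, decision) pairs from real transactions. A decision tree
# stands in for the rule generator; features here are hypothetical.

from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [is_employee, supervisor_approved, clearance_level]
observed_attributes = [
    [1, 1, 2],
    [1, 0, 2],
    [0, 1, 1],
    [0, 0, 0],
]
observed_decisions = ["grant", "deny", "grant", "deny"]  # what the admin did

trainee = DecisionTreeClassifier(max_depth=3)
trainee.fit(observed_attributes, observed_decisions)

# The learned tree can be read back out as a set of if/then rules.
print(export_text(trainee,
                  feature_names=["is_employee", "supervisor_approved", "clearance"]))
```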

The ML reinforces rules by comparing predictions against outcomes.

In the case of reinforcing rules, the ML process would evaluate the input and predict an outcome based on the probability of a learned rule. If the prediction was correct, the weight of the rule would increase; if the prediction was incorrect, a new rule would replace the old rule.
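A hypothetical sketch of that reinforcement loop, with a simple additive weight update standing in for whatever learning rule is actually used:

```python
# Hypothetical sketch of the reinforcement step: predict, compare with the
# observed outcome, then strengthen the rule or replace it.

rules = {("finance", "read"): {"prediction": "grant", "weight": 1.0}}

def reinforce(key, observed_outcome, learning_rate=0.1):
    rule = rules.get(key)
    if rule and rule["prediction"] == observed_outcome:
        rule["weight"] += learning_rate  # correct prediction: strengthen
    else:
        # incorrect (or unseen): replace with a fresh, lightly weighted rule
        rules[key] = {"prediction": observed_outcome, "weight": learning_rate}

reinforce(("finance", "read"), "grant")
print(rules)  # the rule's weight has increased to 1.1
```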

This works until we learn something new about the process.

The trainee’s prediction is incorrect; the trainer explains that the subject is on a list that was not consulted before.

What happens when we learn something new?

The trainee will now incorporate this into their process and consult this list of names before making a decision. Further reinforcement will increase the likelihood that the trainee will make this decision in the future.

What about a machine?

Weighted reinforcement: as rules are learned, they are added to the rule array.

A straightforward approach would be to treat the identity as a new parameter, (a, b, c) -> (Id, a, b, c), and respond to that Id with a unique rule.
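A sketch of what that looks like (names hypothetical): keying rules on the full (Id, a, b, c) tuple lets an identity-specific exception, such as the “list” above, override the generic rules:

```python
# Sketch of treating identity as an extra parameter: (a, b, c) -> (Id, a, b, c),
# so a single identity can carry its own exception rule. Names are hypothetical.

rules = {}

def learn(identity, attrs, outcome):
    # Key on the full tuple, so Id-specific exceptions override generic rules.
    rules[(identity,) + tuple(attrs)] = outcome

def respond(identity, attrs, default="deny"):
    return rules.get((identity,) + tuple(attrs), default)

learn("user42", ["finance", "read"], "deny")   # user42 is on the "list"
print(respond("user42", ["finance", "read"]))  # deny, despite generic grants
```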

We need to stop here, because we are beginning to fork the process, and at this point we are losing generality and growing beyond simple observation.

We now have a new question: would an ML model learn this new parameter on its own? Could it be learned through observation alone?

I believe the answer is possibly!

Capabilities like this have been demonstrated through techniques like deep learning; a great example is DeepMind’s Capture the Flag work (don’t let the fact that this was done with computer games fool you). This leads to a new set of issues, mainly the multitude of unique rules required and the multitude of data points required to create those useful rules. In this case, it is the time it takes to collect those data points from the observation of human admins and users; for a single organization, this equates to a long time.

We have to ask the question: how does a human make these connections so quickly?

One answer is that a human would simply ask why and incorporate this new knowledge into their process. How could an ML process do this? Is it metadata? Is it perhaps meta-learning (learning within learning)?

In the next post, “Learning How to Learn,” we will take a step back and look at learning: how machines are learning, and what is perhaps missing that could help make faster learning connections.

Fractured Identity

We have all created multiple accounts in multiple systems (e-mail, social media, financial, utilities, etc.), all of them requiring an account to centralize your data inside their independent ecosystems.

An account is a requirement for organizations to provide us a service; it allows for the internal linking of contact info, billing info, and services. An account is not an identity; it is simply a record in a database. A static record that can go on for years without being reviewed or updated, independent of the changes to your real identity (phone, address, e-mail, etc.).

There is a greater issue here than the obsolescence of data attributes: the inability to completely track the use of these accounts as identity impersonators. When we delegate permission to an organization to act on our behalf, to provide a service or a recurring financial transaction, we have in a sense fractured part of our identity. We have taken a set of PII attributes plus an account name (our pseudo-identity), verified sometimes only by e-mail, and put it in action to perform a task. This is not federation, and it is not true delegation; we cannot centrally track and maintain what we have granted to these organizations. Though most companies do their best due diligence, we cannot 100% guarantee that every organization will alert us when changes have been made to the use of these accounts.

This is greater than a personal data privacy issue. This is an all-data issue, and one organizations big and small have to deal with every day. Corporate IT infrastructures rely on non-human service accounts to perform delegated operations for databases, web servers, applications, and data transactions between third parties. Administrators and system designers are fracturing their identity, delegating permission to these systems and trusting that the logic encoded within them will always perform as expected. For the most part this is true; well-designed rules can guarantee that a process is understandable and repeatable. Unfortunately, the task of interpreting, tracking, and auditing these rules is getting more complex, and it is only going to become more difficult as the autonomous systems we use become more intelligent.

As machine learning and A.I become more prevalent, we will have long-running non-human entities with access to corporate, government, and personal data. The complexity of these systems is far greater than the rules of the past; hence the need for “explainable AI.”

What if these individual accounts did not exist? What if, instead, you had just a single digital identity and contracts of delegated authority with these services (between entities)? Contracts that specify exactly what level of information is needed, and for how long. Contracts that protect both the individual and the organization.
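As a sketch of what such a contract might contain (a hypothetical structure, borrowing the DID-style identifier used by decentralized identity standards):

```python
# Hypothetical sketch of a "contract of delegated authority": the individual
# grants a service exactly the attributes it needs, for a bounded time.

from datetime import date

contract = {
    "identity": "did:example:alice",        # single core digital identity
    "delegate": "example-utility-company",
    "granted_attributes": ["name", "billing_address"],
    "purpose": "monthly billing",
    "expires": date(2026, 1, 1).isoformat(),
    "revocable": True,
}

def is_valid(contract: dict, today: str) -> bool:
    """ISO date strings compare correctly as plain strings."""
    return today < contract["expires"]

print(is_valid(contract, date.today().isoformat()))
```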

Here at Lavaca Scientific, we believe that this is the future of digital identity: not just identity management, but a core, independent identity. We believe that by leveraging blockchain’s decentralized and independently validated nature as the backbone for identity management, users and organizations alike will benefit from a less complicated and more secure identity ecosystem.

We are not alone in this belief; organizations such as the Decentralized Identity Foundation (DIF) and NIST have pioneered the way by working to develop standards that all organizations can adopt to make this a reality.

This does not negate the need for identity proofing; the core identity will still need to be validated. It does, though, hold the promise of reducing organizations’ costs and effort by de-duplicating identity proofing: for instance, eliminating background investigations and credit checks on the same individual with multiple accounts, or accepting through reciprocity the decision another entity has made about that identity.

Another promise this holds is cleaner lines of trust. By the very nature of a standards-based communication protocol, data will be more available, interpretable, and general, making it easier to study and observe interactions between entities. These are all promising attributes for ingestion into machine learning algorithms.

In the next few weeks we will be talking about trust, learning trust, and why learning “how to learn” may be the next big step in machine learning for identity management.

Artificial Abstraction Layer (A.A.L)

What needs to be done to move A.I to the next level? Is it new innovations in A.I algorithms? Is it greater access to datasets? Is it reliability of the A.I systems? Is it better interpretability of what the A.I is doing? Is it higher interoperability? Is it more information on the predictability of the A.I? Is it a greater level of trustworthiness?

I argue it is all of these things and more. The evolution, and I dare say revolution, of the internet was brought on by the need of disparate networks and systems to interact and provide a reliable level of service to the users of those systems. A system that users could trust would allow them to work and communicate more effectively. The real accelerator was the creation of the web: when reliable, user-friendly applications were developed to interact with and utilize the potential of the web, the internet exploded. [How the web went world wide] [Internet history has just begun]

We are now at that point with A.I systems. We have separate systems and datasets across a multitude of hosting platforms, numerous implementations, and various APIs. These systems are developed and cultivated by individuals and organizations based on their defined use cases. Many of these use cases overlap, and knowledge-sharing, resource-sharing, and cost-saving opportunities exist. Unfortunately, many organizations implement their own versions of the same ML or A.I systems, incurring internal research and IT implementation costs. There are many driving factors for a costly in-house implementation over utilizing a publicly available A.I. Some of the most apparent are sensitivity of the data; reliability of the A.I; security (e.g., has a malicious actor poisoned the A.I?); configurability and interoperability (the A.I system is not adaptable to the needs of the system); and interpretability (understanding how it works, in order to explain to stakeholders how a decision was made).

We also have many systems and datasets today that would benefit from A.I: legacy systems that would see almost instant benefit from even some of the most basic ML libraries available today.

We need an Artificial Abstraction Layer (A.A.L): a system that would allow for the access and sharing of A.I resources. Algorithm compute services, datasets, and pre-trained A.I services would be accessible through a secure, standards-based approach that provides for availability, reliability, predictability, interpretability, interoperability, and trustworthiness. Such a platform could be the springboard for new revolutionary steps in A.I, allowing for growth and adoption not just in research but in practical applications.
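As a rough sketch of what an A.A.L contract might look like to a consumer (the interface and method names here are entirely hypothetical), any shared A.I resource would expose prediction, interpretability, and trust metadata through one uniform surface:

```python
# Hypothetical sketch of an Artificial Abstraction Layer (A.A.L) interface:
# a uniform, standards-style contract for consuming shared A.I services.

from abc import ABC, abstractmethod

class AIService(ABC):
    """What any shared A.I resource would expose through the A.A.L."""

    @abstractmethod
    def predict(self, inputs: dict) -> dict: ...

    @abstractmethod
    def explain(self, inputs: dict) -> str:
        """Interpretability: justify a prediction for these inputs."""

    @abstractmethod
    def trust_report(self) -> dict:
        """Trustworthiness metadata (provenance, audit results, etc.)."""

class ExampleClassifier(AIService):
    def predict(self, inputs):
        return {"label": "grant" if inputs.get("score", 0) > 0.5 else "deny"}
    def explain(self, inputs):
        return "thresholded trust score at 0.5"
    def trust_report(self):
        return {"provenance": "in-house", "last_audit": "2019-01-01"}

svc = ExampleClassifier()
print(svc.predict({"score": 0.7}), svc.trust_report())
```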

At Lavaca Scientific, we believe the first step in achieving such a system is establishing trust: understanding users and A.I services and their levels of trustworthiness. We are working to develop a distributed identity model that utilizes A.I to learn and develop policy governing and guiding how A.I systems and users interact.