Developing a Hybrid Approach to Automated Social Intelligence

Project Details

Project Lead
Katherine Metcalf 
Project Manager
Katherine Metcalf 
Institution
Indiana University, Computer Science Department  
Discipline
Computer Science (401) 
Subdiscipline
---910 Social Sciences, n.e.c.--- 

Abstract

My research plan has three main parts: (1) develop a system able to recognize different components of the social interaction process, (2) use those components to reason about the interaction’s social profile and how the interactors are situating themselves within it, and (3) use the information about the social and interactional contexts to reason about the personality and background of the people involved in the interaction. Each of these parts requires several sub-steps. My goal is to create a reasoning system that processes input and generates social inferences in a manner functionally similar to the way humans do. This means that the system must have mechanisms to reason about (1) low-level features, such as speech and dialogue acts, sentiment, and turns, (2) mid-level features, such as the success of each speech or dialogue act, the mutually held belief space, and the interactors’ intentions, and (3) high-level features, such as socially imposed constraints, violations of interactional constraints, and the personality features of the interactants. Features are connected across the levels: lower-level features guide the generation of mid- and higher-level features, while mid-level features can guide the generation of lower-level features and higher-level features can guide the generation of mid-level features. To achieve this level of interconnectedness, the system will be based on an implementation of the blackboard model. I intend to use machine-learning techniques, particularly neural networks, to process and classify textual, audio, and visual input. The results of the machine-learning step will be represented in symbolic form and further processed using a set of inference rules based on the rules of social interaction. A case-based reasoning system will then receive this symbolic input and use it to reach a conclusion about a given interaction.
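The cross-level guidance described above maps onto the classic blackboard architecture: a shared store partitioned by level, independent knowledge sources that read one level and post to another, and a control loop that decides which sources fire. The following is a minimal sketch under stated assumptions, not the proposed implementation. The `Blackboard` and `KnowledgeSource` classes, the level names, and the three toy knowledge sources (which simplistically treat the speaker with the most turns as dominant and authoritative) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

LEVELS = ("low", "mid", "high")


@dataclass
class Blackboard:
    """Shared store of hypotheses, one namespace per level of feature complexity."""
    entries: Dict[str, dict] = field(default_factory=lambda: {lvl: {} for lvl in LEVELS})

    def post(self, level: str, name: str, value) -> bool:
        """Write a hypothesis; return True only if it is new or changed."""
        if self.entries[level].get(name) == value:
            return False
        self.entries[level][name] = value
        return True

    def has(self, level: str, name: str) -> bool:
        return name in self.entries[level]


@dataclass
class KnowledgeSource:
    name: str
    can_fire: Callable[[Blackboard], bool]
    fire: Callable[[Blackboard], bool]  # returns True if it changed the board


def build_sources() -> List[KnowledgeSource]:
    """Three toy (hypothetical) sources: two bottom-up, one top-down."""
    def speaker_share(b):  # low -> mid (bottom-up)
        turns = b.entries["low"]["turns"]
        return b.post("mid", "dominant_speaker", max(turns, key=turns.get))

    def authority(b):  # mid -> high (bottom-up)
        return b.post("high", "authoritative", b.entries["mid"]["dominant_speaker"])

    def focus(b):  # mid -> low (top-down): steer low-level analysis toward that speaker
        return b.post("low", "focus_speaker", b.entries["mid"]["dominant_speaker"])

    return [
        KnowledgeSource("speaker_share", lambda b: b.has("low", "turns"), speaker_share),
        KnowledgeSource("authority", lambda b: b.has("mid", "dominant_speaker"), authority),
        KnowledgeSource("focus", lambda b: b.has("mid", "dominant_speaker"), focus),
    ]


def run(board: Blackboard, sources: List[KnowledgeSource], max_cycles: int = 20) -> Blackboard:
    """Naive control loop: fire every eligible source until a full cycle changes nothing."""
    for _ in range(max_cycles):
        changed = False
        for ks in sources:
            if ks.can_fire(board):
                changed = ks.fire(board) or changed
        if not changed:
            break
    return board
```

For example, posting `{"A": 5, "B": 2}` as low-level turn counts and calling `run(board, build_sources())` leaves `"A"` as the mid-level dominant speaker, the high-level authoritative participant, and the low-level focus. A fuller control component would schedule sources by priority instead of firing every eligible source on each cycle.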
I will use statistical machine learning, formal inference making, and case-based reasoning to mimic the results of people’s social reasoning.

1. The Components

The system will be organized as a hierarchy of components, where each level in the hierarchy handles a different level of the feature complexity represented by the social context. Because the social context is not immediately evident, its social profile must be built up from lower-level features and reasoning processes to higher-level features (Figure 1).

1.1 Low-Level Reasoning and Features

This level of analysis will extract and reason about those features that are “immediately” evident, such as word and part-of-speech counts, number of turns per participant, interaction duration, and speech and dialogue acts, using off-the-shelf tools that are already available. What counts as an immediately evident feature depends on the type of input data. For example, when working with audio, word and part-of-speech counts would not be at the first level of analysis; instead, identifying the words spoken and the paralinguistic features would be the first step. These low-level features will be extracted using statistical classifiers and existing tools suited to the task.

1.2 Mid-Level Reasoning and Features

This level of analysis will use the features extracted at the lower level(s) to identify those characteristics of the interactional context that define its social profile. At this level, the reasoners will be a combination of rule-based and case-based reasoners with representations similar to conceptual dependency. I intend to work with domain experts when developing the rules and the conceptual dependency scheme for social interactions.
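To make the hand-off from low-level features to mid-level reasoning concrete, the sketch below computes a few transcript-based low-level features (turns and word counts per participant, interaction duration) and then applies ordered rules that fill slots in a frame. The transcript tuple format, the rule table, and the slot names `participant_structure` and `floor_holder` are hypothetical illustrations; in the proposal, the rules and representations are to be developed with domain experts.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical transcript format: (speaker, start_sec, end_sec, text)
Turn = Tuple[str, float, float, str]


def low_level_features(turns: List[Turn]) -> dict:
    """Transcript features that are 'immediately evident'."""
    turn_counts: Counter = Counter()
    word_counts: Counter = Counter()
    for speaker, _, _, text in turns:
        turn_counts[speaker] += 1
        word_counts[speaker] += len(text.split())
    start = min(t[1] for t in turns)
    end = max(t[2] for t in turns)
    return {"turns": dict(turn_counts), "words": dict(word_counts), "duration": end - start}


# Illustrative mid-level rules: (slot, condition, value). Slot names other than
# "participants" are hypothetical; real rules would be written with domain experts.
RULES = [
    ("participants", lambda f: True, lambda f: sorted(f["turns"])),
    ("participant_structure", lambda f: len(f["turns"]) == 2, lambda f: "dyadic"),
    ("participant_structure", lambda f: len(f["turns"]) > 2, lambda f: "multi-party"),
    ("floor_holder",
     lambda f: max(f["words"].values()) >= 2 * min(f["words"].values()),
     lambda f: max(f["words"], key=f["words"].get)),
]


def fill_slots(features: dict) -> Dict[str, object]:
    """Apply rules in order; the first rule that fires for a slot fills it."""
    frame: Dict[str, object] = {}
    for slot, condition, value in RULES:
        if slot not in frame and condition(features):
            frame[slot] = value(features)
    return frame
```

Keeping rules as data rather than code makes it straightforward for domain experts to review, reorder, or extend them without touching the inference loop.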
The exact functions of each of the components will be a large portion of the research needed to complete this project, but the main purpose of this level of analysis is to fill in the slots of the frames that represent the social context. The slots include, but are not limited to, the interaction’s setting, participants, goals, form, tone, instrumentalities, norms of interaction, and genre (Hymes, 1967).

1.3 Top-Level Reasoning and Features

This level is responsible for reasoning directly about the social context. The social context will be represented with a frame-like structure containing the same knowledge as the SPEAKING model for ethnographic research (Hymes, 1967). The reasoning will be done with a combination of a case-based reasoner and a prototyping mechanism. Because social interactions can vary greatly while still following the same or a familiar theme, a large case base would be beneficial. Therefore, I intend to investigate the effectiveness of organizing the case base into a series of hierarchies that move from the most abstract to the least abstract (i.e., specific instances). Organizing the interactions in the case base into prototype hierarchies has precedent both in case-based reasoning (Smyth et al., 2001) and in social theory, namely Hymes’ (1967) hierarchy of the levels of an interaction: speech acts, speech events, and the speech situation. A speech situation is composed of speech events, which are, in turn, composed of speech acts. However, each speech situation can contain varying speech events, and each speech event can contain varying speech acts. Which speech events make up a speech situation, and which speech acts make up a speech event, is constrained by the nature of the interaction (Hymes, 1967). Therefore, the frame representation will provide the structure necessary to reason about a case’s prototypicality.
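One way to picture the hierarchical case base is as a tree of partially filled frames, where abstract nodes fill only a few slots and leaves are specific cases. The sketch below assumes that layout, uses the slot names listed above, and retrieves by greedy descent, stopping when no child is similar enough. The `Node` class, the slot-overlap similarity measure, and the `min_sim` threshold are assumptions chosen for illustration; this is not the method of Smyth et al. (2001).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Slot names follow the list above (after Hymes, 1967).
SLOTS = ("setting", "participants", "goals", "form", "tone",
         "instrumentalities", "norms", "genre")


@dataclass
class Node:
    """A prototype or case: more abstract nodes leave more slots unfilled."""
    frame: Dict[str, str]
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def slot_similarity(query: Dict[str, str], frame: Dict[str, str]) -> float:
    """Fraction of the node's filled slots whose values match the query."""
    filled = [s for s in SLOTS if s in frame]
    if not filled:
        return 0.0
    return sum(query.get(s) == frame[s] for s in filled) / len(filled)


def retrieve(root: Node, query: Dict[str, str], min_sim: float = 0.0) -> List[str]:
    """Descend from the most abstract node toward specific cases.

    At each level, move to the most similar child; stop early when even the
    best child falls below min_sim. Returns the labels along the path.
    """
    path, node = [root.label], root
    while node.children:
        best = max(node.children, key=lambda c: slot_similarity(query, c.frame))
        if slot_similarity(query, best.frame) < min_sim:
            break
        node = best
        path.append(node.label)
    return path
```

Varying `min_sim` is one simple way to study at which level of specificity retrieval stops, which is the kind of question the hierarchy is meant to make testable.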
Prototyping and storing the cases in a hierarchical structure would make it possible to examine which level of specificity allows for efficient and effective case retrieval and adaptation.

2. Research Already Completed

I began working on this project in the fall of 2014. Since then, I have completed two subparts as research projects and am working on two others. In the fall of 2014, I investigated the accuracy of a case-based reasoning system at identifying differences in social power. I found that it was not the best suited to the task: it would require a very complex case adaptation mechanism and a massive case base likely to slow performance. Consequently, in the spring of 2015, I performed experiments comparing the efficiency and accuracy of two statistical classifiers, random forest and SVM, at predicting social power differences using function word counts as features. Random forest performed best, with 99% accuracy. The relative social power classifier I developed will be used as a low-level component in the system proposed here. I am currently working on two more components for the proposed system: a statistical classifier for relationship type classification and a case-based reasoning mechanism for the top-level reasoner discussed above. This work is described in a paper I submitted to the ACS conference, which has been accepted.
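The social power experiments used function word counts as features. A minimal sketch of such a feature extractor follows. The short function-word list, the regular-expression tokenizer, and the choice to represent a speaker pair by the difference of their two vectors are assumptions for illustration, not the feature set used in those experiments.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately short list (pronouns, articles, prepositions, modals, negation).
FUNCTION_WORDS = ("i", "you", "we", "the", "a", "of", "to",
                  "could", "would", "should", "must", "not")

TOKEN = re.compile(r"[a-z']+")


def function_word_vector(text: str) -> List[float]:
    """Relative frequency of each function word: count / total tokens."""
    tokens = TOKEN.findall(text.lower())
    if not tokens:
        return [0.0] * len(FUNCTION_WORDS)
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in FUNCTION_WORDS]


def pair_features(speaker_a: str, speaker_b: str) -> List[float]:
    """Assumed pair encoding: A's vector minus B's, so the sign shows who uses a word more."""
    return [x - y for x, y in zip(function_word_vector(speaker_a),
                                  function_word_vector(speaker_b))]
```

Vectors of this kind can be passed directly to a standard classifier such as scikit-learn's `RandomForestClassifier` or `SVC`, which is the comparison the experiments describe.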

Intellectual Merit

My research interests concern the automated analysis of personality and interactions. This work will involve testing the applicability of Case-Based Reasoning (CBR) to social relation mapping; the test will be a system that automatically maps the social relations between two or more interactors. Mapping social relations requires three subtasks: relationship type identification, social hierarchy identification, and identification of the quality of the relationship. Each of these subtasks can be used to describe the relationship between interactors, where interactors can be two or more individuals, an individual and a group, or two groups. Here, a group means two or more people whose relationship(s) establish some type of common ground and allow them to work together as a unit. Relationship classification provides information necessary for understanding an interaction’s social context, which facilitates the identification of deviant behavior, intention, sentiment, and personality features. The purpose of relationship type identification is to identify a set of general, higher-level, socially based rules and expectations that govern the permissible actions for the role each interactor is enacting relative to the role(s) of the other interactor(s). This will be done by identifying the common ground that would cause two agents to interact and the role each participant plays relative to the other(s). The interactors’ relationship will be defined in terms of the role each is playing. For example, two people talking at a bus stop for the first time have the relationship type of strangers, whereas for a child and adult pair the relationship could be parent-child. Social hierarchy identification describes the social status one participant in an interaction has relative to the other participant(s) (Rowe et al., 2007). The main task of this step is to assign deferential and authoritative classifications to each participant.
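If a pairwise relative power classifier outputs the probability that one participant outranks another, those pairwise outputs still have to be turned into a per-participant label. The sketch below does this by averaging each participant's pairwise probabilities and thresholding at 0.5. The `p_outranks` callback and the averaging rule are hypothetical choices made for illustration, not part of the proposal.

```python
from typing import Callable, Dict, List, Tuple


def hierarchy_labels(
    participants: List[str],
    p_outranks: Callable[[str, str], float],
) -> Dict[str, Tuple[str, float]]:
    """Label each participant authoritative or deferential.

    p_outranks(a, b) is a hypothetical pairwise classifier output: the
    probability that a holds more social power than b. A participant's score
    is the mean of those probabilities against everyone else.
    """
    if len(participants) < 2:
        raise ValueError("need at least two participants")
    labels = {}
    for p in participants:
        others = [q for q in participants if q != p]
        score = sum(p_outranks(p, q) for q in others) / len(others)
        labels[p] = ("authoritative" if score > 0.5 else "deferential", score)
    return labels
```

With a manager `M` predicted to outrank two employees with probabilities 0.9 and 0.8, and the employees evenly matched, `M` scores 0.85 and is labeled authoritative while both employees are labeled deferential.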
Identifying the authoritative individual in an interaction makes it easier to identify a leader or someone who has influence. It also further defines and constrains the behaviors likely to be present in a given interaction, and a bounded model of the expected behavior for an interaction makes deviant behavior easier to identify. For example, in a work environment, a manager should be identified as authoritative, whereas employees should be identified as deferential toward the manager. Identifying the quality of a given relationship is likely to be the most difficult task. By relationship quality, I mean the valence each interactor assigns to the relationship, and implicitly to the other participant. A quality measure (+/-) and a degree, or strength-of-sentiment, marker (0-10) will be applied to a feature set of descriptive terms that a psychiatrist might use to describe a relationship. This classification is meant to answer questions of the following type: “How would you characterize the relationship? Is it good, strained, mutually desired, mutually beneficial?” For example, when a child goes through puberty, the relationship between the child and a parent tends to become strained and often prone to arguments, and the valence measure should reflect this. The ability to map and identify social relations facilitates the interpretation of human interactions. An important feature of human interaction analysis is the context of the interaction, which includes, but is not limited to, socially required, predefined actions and the rules or constraints that structure them. The social context is defined by interpersonal relations, which are arguably even more important to identify. Interpersonal relations are composed of the relationship between the participants, their relative social statuses, the quality of the relationship, and each interactor’s goals and intentions relative to the others.
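The three components of a mapped relation (type, hierarchy, and quality) can be held in a single record. The sketch below encodes the quality scheme described above, a sign (+/-) and a 0-10 degree applied to descriptive terms, with ratings kept per interactor because each interactor assigns their own valence. The class names and the mean-of-signed-ratings aggregate are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class QualityRating:
    term: str    # descriptive term, e.g. "strained" or "mutually beneficial"
    sign: int    # quality measure: +1 or -1
    degree: int  # strength of sentiment, 0-10

    def __post_init__(self):
        if self.sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        if not 0 <= self.degree <= 10:
            raise ValueError("degree must be between 0 and 10")

    @property
    def signed(self) -> int:
        return self.sign * self.degree


@dataclass
class RelationshipProfile:
    relationship_type: str            # e.g. "parent-child", "strangers"
    hierarchy: Dict[str, str]         # participant -> "authoritative" | "deferential"
    quality: Dict[str, List[QualityRating]] = field(default_factory=dict)  # rater -> ratings

    def valence(self, rater: str) -> float:
        """Hypothetical aggregate: mean signed rating the rater gives the relationship."""
        ratings = self.quality.get(rater, [])
        if not ratings:
            return 0.0
        return sum(r.signed for r in ratings) / len(ratings)
```

In the puberty example, a child who rates the relationship "strained" (-, 7) and "argumentative" (-, 6) would have a valence of -6.5, while the parent's valence could differ because it is computed from the parent's own ratings.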
These features structure the context of an interaction and define the expected behaviors and their expected manner of execution. The automatic identification of deviancy, intention, sentiment, and personality characteristics requires an analysis of social context. Relationship type, relative status, and relationship quality will not generate these classifications by themselves; however, they are vital features for such a classification task. CBR is an appropriate framework for social relation mapping because it uses prior cases to classify current cases. The feature sets vary greatly between interactions, which makes it difficult to build a statistical machine learning model of social interaction that can accurately handle the variety of interaction cases. Additionally, interactions are highly dependent on society, which is flexible and changes with each new generation; despite this, people remain able to navigate interactions without too many blunders. This ability is most likely due to reasoning about prior experience, a process CBR is able to mimic (Kolodner, 1992), and this will allow the system to adapt to new or previously unencountered modes of social interaction. This investigation will focus on supralinguistic and linguistic features extracted from an interactor’s speech. Because a scarcity of data sometimes means that an analysis must be completed on just linguistic or just supralinguistic features, the dataset will include training and test cases with only a transcript of an interaction or only tonal data from an interaction. This will make the system more useful for real-world applications, where data scarcity tends to be common.
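The retrieve-and-reuse part of the CBR cycle, together with the requirement that transcript-only and audio-only cases remain usable, can be illustrated with a nearest-neighbor sketch in which similarity is computed only over the features two cases share. This is a simplification: it omits case adaptation and retention, assumes features are already scaled to comparable ranges, and the feature names and labels in the example are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    features: Dict[str, float]  # a whole modality (audio or transcript) may be absent
    label: str                  # e.g. a relationship type


def case_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """1 / (1 + RMS difference) over shared features; 0 if nothing is shared."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    rms = math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared) / len(shared))
    return 1.0 / (1.0 + rms)


def classify(query: Dict[str, float], case_base: List[Case], k: int = 3) -> str:
    """Retrieve the k most similar prior cases and reuse their majority label."""
    ranked = sorted(case_base, key=lambda c: case_similarity(query, c.features), reverse=True)
    votes = Counter(c.label for c in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because similarity is computed over the shared features only, an audio-only query is matched against the audio features of mixed cases and simply scores zero against transcript-only cases, rather than failing on missing values.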
The features of interest are: duration of speech (both time and word count), rate of speech (syllables per second), degree of change in fundamental frequency, direction of change in fundamental frequency, duration of the change in fundamental frequency, degree of change in energy (dB), direction of the change in energy, duration of the change in energy, bag-of-words, keywords, presence of function words, number of pauses, and disfluencies. Features relating to the fundamental frequency or the energy will be calculated relative to the fundamental frequency (F0) or energy (dB) space of the speaker. Because differences in F0 and energy between speakers are physically based, absolute values are not informative. Instead, it is the percent or magnitude of change that is useful. Additionally, the gradual and expected decrease in vocal energy over the course of an utterance will be accounted for.

References:

The history of automatic speech recognition evaluation at NIST. (2009, September 14). National Institute of Standards and Technology. Retrieved September 2, 2014, from http://www.itl.nist.gov/iad/mig/publications/ASRhistory/index.html

Kolodner, J. L. (1992). An introduction to case-based reasoning. Artificial Intelligence Review, 6, 3-34.

Rowe, R., Creamer, G., Hershkop, S., & Stolfo, S. T. (2007). Automated social hierarchy detection through email network analysis. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, 109-117.
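The speaker-relative treatment of fundamental frequency and energy in the feature list above could be sketched as follows: F0 values are placed within the speaker's own range or expressed as a percent change, and the expected gradual energy drop within an utterance is removed by subtracting a least-squares straight line before change is measured. Min-max scaling for the speaker range and a linear declination model are assumptions, not methods specified in the proposal.

```python
from typing import List, Sequence, Tuple


def percent_change(start: float, end: float) -> float:
    """Change relative to the starting value, in percent (used here for F0 in Hz)."""
    return 100.0 * (end - start) / start


def range_normalized(values: Sequence[float], speaker_values: Sequence[float]) -> List[float]:
    """Place values in the speaker's own range: 0 = speaker minimum, 1 = maximum."""
    lo, hi = min(speaker_values), max(speaker_values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def remove_declination(times: Sequence[float], energy_db: Sequence[float]) -> List[float]:
    """Subtract a least-squares straight line (assumed declination model) from an energy contour."""
    n = len(times)
    t_mean = sum(times) / n
    e_mean = sum(energy_db) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (e - e_mean) for t, e in zip(times, energy_db))
    slope = sxy / sxx if sxx else 0.0
    intercept = e_mean - slope * t_mean
    return [e - (intercept + slope * t) for t, e in zip(times, energy_db)]


def describe_change(times: Sequence[float], values: Sequence[float]) -> Tuple[int, float, float]:
    """Direction (+1, -1, 0), degree (end - start), and duration (s) of a change.

    For a detrended energy contour the degree is a dB difference, which already
    expresses a ratio, so no further percent conversion is applied.
    """
    delta = values[-1] - values[0]
    direction = (delta > 0) - (delta < 0)
    return direction, delta, times[-1] - times[0]
```

On a contour that declines steadily, `remove_declination` returns residuals near zero, so only departures from the expected drop (for example, an emphatic rise mid-utterance) remain as candidate energy changes.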

Broader Impacts

Automated human interaction profiling is an important task and is highly applicable to the commercial, education, and defense sectors. Analyzing human interactions creates more accurate models of individuals and their relation to the social environment, which allows for better device personalization, personalized education decisions, and determinations about a potential security threat or asset. My research interests lie at the intersection of cognitive science, social cognition, and both statistical and knowledge-based reasoning in artificial intelligent agents. I want to use current knowledge about the cognitive and cultural practices that underpin human interactions to develop methods for reasoning about people using the social context.

2. Motivations

Awareness of the social context is necessary for an agent to demonstrate social cognition and to integrate itself seamlessly into an interaction (de Weerd et al., 2014). Therefore, the ability to reason directly about the social context provides a number of benefits. For example, it could improve the performance of dialogue systems, in-home assistive robots, and automatic profilers. This knowledge will allow a system to interpret social interactions more accurately by accounting for intentions, expectations, and appropriate or script-like behaviors.

2.1 Example

Reasoning about the social context includes the ability to identify the authoritative figure in a given interaction. This provides benefits such as enabling the generation of appropriate responses given the power dynamic; the identification of the best recipient for a given utterance or sentence; and, if necessary, the identification of the most acceptable interactant to contradict or interrupt. These abilities could allow a system to function as an interactive part of a team by providing unprompted, socially appropriate recommendations.

Scale of Use

A few VMs for experiments and project development.

Results

I am still in the process of putting together the base level of this project; therefore, I do not yet have results.