Cognitive-Inspired Conversational-Strategy Reasoner for Socially-Aware Agents

Oscar J. Romero*, Ran Zhao+, Justine Cassell+
*Machine Learning Department, Carnegie Mellon University
+Human-Computer Interaction Institute, Carnegie Mellon University
{oscarr, rzhao1}@andrew.cmu.edu, justine@cs.cmu.edu

Abstract

In this work we propose a novel module for a dialogue system that allows a conversational agent to utter phrases that do not just meet the system's task intentions, but also work towards achieving the system's social intentions. The module, a Social Reasoner, takes the task goals the system must achieve and decides the appropriate conversational style and strategy with which the dialogue system describes the information the user desires, so as to boost the strength of the relationship between the user and the system (rapport), and therefore the user's engagement and willingness to divulge the information the agent needs to efficiently and effectively achieve the user's goals. Our Social Reasoner is inspired both by analysis of empirical data of friend and stranger dyads engaged in a task, and by prior literature in fields as diverse as reasoning processes in cognitive and social psychology, decision-making, sociolinguistics and conversation analysis. Our experiments demonstrate that, when the Social Reasoner is used in a dialogue system, the rapport level between the user and the system increases by more than 35% in comparison with cases where no Social Reasoner is used.

1 Introduction and Motivation

At any one time, people speaking in conversation are pursuing multiple goals [Tracy and Coupland, 1990]. These can be divided into three categories: propositional functions, contributing informational content to the dialogue; interactional functions, managing the speaking turns and other aspects of conversation; and interpersonal functions, managing relational goals such as building rapport [Cassell and Bickmore, 2003; Fetzer, 2013]. AI systems that communicate with their users have focused in the vast majority on propositional goals, with some newer work on the interactional. More recently, however, it has been recognized that in order to build trust in the system, and evoke the desire to use the system long term, the system must also build a relational bond. In this paper we focus on how systems can build strong relational bonds using specific conversational strategies (units of discourse that are larger than speech acts) chosen automatically by a Social Reasoner module (SR). Specifically, we focus here on how to automatically select among seven common conversational strategies shown to positively impact rapport [Tajfel and Turner, 1979; Spencer-Oatey, 2008]: Self-Disclosure (SD), revealing personal information to decrease social distance; Question Elicitation of Self-Disclosure (QESD), used to encourage the other interlocutor to self-disclose; Reference to Shared Experiences (RSE), which indexes common history; Praise (PR), which serves to increase self-esteem in the listener and therefore interpersonal cohesiveness; Adhere to Social Norm (ASN), which increases coordination by adhering to behavioral expectations guided by sociocultural norms; Violation of Social Norm (VSN), in which general norms are purposely violated to accommodate the other's behavioral expectations; and Acknowledgement (ACK), a way to show that the interlocutor is listening.
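For concreteness in the code sketches that follow, the seven strategies could be encoded as a simple enumeration; this is a hypothetical Python encoding of ours, not part of the paper's implementation:

```python
from enum import Enum

class ConversationalStrategy(Enum):
    """The seven conversational strategies the Social Reasoner selects among."""
    SD = "self_disclosure"
    QESD = "question_elicitation_of_self_disclosure"
    RSE = "reference_to_shared_experiences"
    PR = "praise"
    ASN = "adhere_to_social_norm"
    VSN = "violation_of_social_norm"
    ACK = "acknowledgement"
```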
These are strategies rather than goals, as any one of them might realize a specific communicative goal. For example, for the conversational goal "greeting", a system might produce these realizations depending on the selected conversational strategy. ASN: "Hi, I am Steve. May I ask your name?"; SD: "Hi! I'm so glad you're here!"; VSN: "Hey buddy!"; PR: "Hi, I'm Steve! It's such an honor to meet you!" This perspective is adopted from [Zhao et al., 2014], who developed a computational model of rapport to learn the use of appropriate conversational strategies that contribute to building, maintaining or even destroying interpersonal (or human-agent) bonds. In this paper, we propose to leverage this model and other sociocultural theories to design a reasoning module, called the Social Reasoner, whose purpose is to select the conversational strategy most likely to raise rapport.

Given that rapport management is a dyadic process, intrinsically involving two individuals, our system must fulfill two critical prerequisites: understanding the user's conversational strategy in real time, and estimating the level of rapport, or relationship strength, at any given moment. The first prerequisite was fulfilled by [Zhao et al., 2016a], who trained a Conversational Strategy Classifier to automatically recognize the user's way of presenting him- or herself, using contextual features drawn from the verbal, visual and vocal modalities of both interlocutors in the current and previous turn. Their approach has been integrated into our decision-making system. The second prerequisite was fulfilled by [Zhao et al., 2016b], who focused on data-driven discovery of the temporally co-occurring and contingent behavioral patterns that signal high, medium and low interpersonal rapport. Their forecasting model, called the Rapport Estimator, has been shown to have strong predictive power for rapport estimation in real time, and is also integrated into the current work.

Here, then, we describe a Social Reasoner module that takes input from both the Rapport Estimator and the Conversational Strategy Classifier described above, reasons about how to respond to the social intentions underlying those particular behaviors (such as to raise rapport), and generates appropriate social conversational responses, with the system's goal of always keeping rapport high in order to increase trust and long-term engagement. While there are several potential approaches, most are not suitable for our purposes: given the large and increasing number of inputs that the Social Reasoner must process continuously, selecting a proper conversational strategy becomes a combinatorial explosion problem that is almost intractable for a purely symbolic approach such as a production-rule system or a classic planner. On the other hand, purely sub-symbolic or connectionist approaches fail to semantically express the relationships between inputs, outputs, and the negative and positive consequences of triggering a particular conversational strategy. We therefore employ a hybrid approach that combines the features of a classic planner with spreading activation dynamics. In fact, the hybrid model proposed by [Maes, 1989] and extended by [Romero, 2011], known as Behavior Networks, fits our needs well.
2 Related Work

Below we describe related work on computational models of decision-making that allow an agent to build a long-term relationship with a human.

[Bickmore and Schulman, 2012] proposed a computational model of the user-agent relationship inspired by accommodation theory. They defined a set of activities that a user is willing to perform with an agent, described as dialogue acts, and a reactive algorithm that selected the most appropriate dialogue act in order to advance user-agent intimacy. However, the study indicated that while their algorithm successfully adapted to the user's desired intimacy level, it failed to increase intimacy over the course of the user-agent interaction. Moreover, their system assessed the user-agent relationship through questionnaires rather than automatically estimating the real-time closeness level, which limited its decision-making process. Similarly, [Coon et al., 2013] targeted the development of closeness in human-agent interactions by implementing an algorithm that plans appropriate joint activities. The algorithm modeled the differences between relationship stages from stranger to companion; its decision-making was based on the closeness level required by each activity, while the algorithm optimized the activities it performed to achieve user-agent intimacy over time. However, since [Coon et al., 2013] handcrafted specific activities for each stage, scaling up their algorithm is a challenge.

We are not the first to propose using a behavior network to model social dialogue in human communication. [Cassell and Bickmore, 2003] constructed a discourse planner that could interleave small talk and task talk during a real-estate buyer interview. Conversational moves, such as introducing a new topic, were planned to maximize trust building while pursuing the task goal of selling real estate. Their implementation, however, used an activation network only to adjust the agent's linguistic behavior (more or less polite, more or less task-oriented, or more or less deliberative), not to decide which conversational strategy fit best at each state of the conversation.

3 System Architecture

The Social Reasoner described here is one module of a fully implemented virtual personal assistant called SARA (Socially-Aware Robotic Assistant)1. Using a Global Workspace approach and a spreading activation model, we endow our Social Reasoner with both short-term and long-term decision-making skills that allow it to reactively select a proper conversational strategy while deliberatively tailoring a plan (a sequence of conversational strategies) in the background. The other modules of the system have been described elsewhere by us or by the researchers from whom we adopted them.

1 http://articulab.hcii.cs.cmu.edu/projects/sara/
Our purpose here is to motivate and then evaluate this kind of Social Reasoner, which has specific properties owing to its hybrid nature: a) it efficiently makes both short-term decisions (real-time, reactive reasoning) and long-term decisions (deliberative reasoning and planning); b) its knowledge is encoded using both symbolic structures (i.e., semantically labeled nodes and links) and sub-symbolic operations (i.e., spreading activation dynamics); and c) its network's operation is grounded in cognitive-psychological phenomena such as subliminal priming, automaticity with practice, and selective attention, while the design of its network's structure relies on observations extracted from data-driven models.

3.1 Modules Description

The Social Reasoner's architecture is depicted in Figure 1. Its modules are described as follows: 1) Working Memory (WM): short-term memory that stores chunks of environmental information (percepts) that are then processed by the Social Reasoner's decision module; 2) Goals: a hierarchy of both task goals (e.g., generate a recommendation) and social goals (e.g., build rapport); 3) Social Reasoner History (SRH): records of all past decisions (i.e., system conversational strategies); 4) Selective Attention (SA): the most relevant, important, urgent, and insistent information at the moment, selected for processing by the decision module following Global Workspace Theory [Baars, 2003]; 5) Action Selection (AS): the module in charge of choosing a conversational strategy as a consequence of the decision-making dynamics, implemented as a Behavior Network (originally proposed by [Maes, 1989] and extended by [Romero, 2011]); 6) Learning Processing (LP): the module responsible for adapting the system parameters in real time; this is part of our future work, so we do not go into further detail here; 7) Other Modules: the Social Reasoner interfaces with modules commonly used in dialogue systems and conversational agents, such as ASR, NLU, NLG, etc.

Figure 1: System Architecture

The Social Reasoner's decision module is crafted as a network of interacting nodes in which decision-making emerges from the dynamics of the relationships among those nodes.

4 Computational Model

In the following, we provide details of our Behavior Network formalism. A Behavior Network (BN) is a spreading activation model proposed by [Maes, 1989] as a collection of competence modules that operates in continuous domains. Behavior selection is modeled as an emergent property of activation/inhibition dynamics among all behaviors. A behavior i can be described by a tuple ⟨ci, ai, di, αi⟩, where ci is a list of pre-conditions that have to be fulfilled before the behavior can become active, ai and di represent the expected (positive and negative) effects of the behavior's action in terms of an add list and a delete list, and αi is the behavior's level of activation. If a proposition p about the environment is true and p is in the pre-condition list of behavior i, there is an active link from the state p to behavior i. If a goal g has an activation greater than zero and g is in the add list of behavior i, there is an active link from the goal g to behavior i.
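Before turning to the internal links, here is a minimal sketch of how a competence module might be represented; this is our own hypothetical Python encoding, not one prescribed by the paper:

```python
from dataclasses import dataclass

@dataclass
class Behavior:
    """One competence module <c, a, d, alpha> of a Behavior Network."""
    name: str
    preconditions: list      # c: here a list of category sets (see the
                             #    partial-matching extension below)
    add_list: set            # a: propositions expected to become true
    delete_list: set         # d: propositions expected to become false
    activation: float = 0.0  # alpha: current activation level
```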
Internal links include predecessor links, successor links, and conflicter links. There is a successor link from behavior i to behavior j for every proposition p that is a member of the add list of i and also a member of the pre-condition list of j. A predecessor link from behavior j to behavior i exists for every successor link from i to j. There is a conflicter link from behavior i to behavior j for every proposition p that is a member of the delete list of j and a member of the pre-condition list of i. The decision-making procedure is as follows:
1. Calculate the excitation coming in from the environment.
2. Spread excitation along the predecessor, successor, and conflicter links, and normalize the behavior activations so that the average activation equals π.
3. Find the executable behaviors, choose the one with the highest activation, and execute it. A behavior is executable if all its pre-conditions are true and its activation is greater than the global threshold. If no behavior is executable, reduce the threshold and repeat the cycle.

Additionally, the model defines five global parameters that tune the spreading activation dynamics: π is the mean level of activation; θ is the threshold for becoming active, which is lowered each time no module can be selected and reset to its initial value otherwise; φ is the amount of activation energy injected into the network by a proposition observed to be true; γ is the amount of energy injected by a goal; and δ is the amount of activation energy a protected goal takes away from the network.

One important contribution we make to the original Behavior Network model is the use of partial matching rather than strict full matching. The original model activates a behavior only when all of its pre-conditions are true, which works well with discrete variables; we, however, deal with continuous variables in a frequently changing environment, under which behaviors would almost never be activated. We therefore group sets of related, well-defined pre-conditions into categories. An inclusive OR operator is used to evaluate pre-conditions within a category and an AND operator across categories; that is, at least one pre-condition per category must be true. This scheme is far more flexible and admits many more combinations of pre-conditions that can trigger a particular behavior.

Table 1: Pre-condition and post-condition categories
  Rapport level: low, medium, high
  Rapport delta: decreased, maintained, increased
  System and user conv. strategies: asn, vsn, sd, qesd, rse, ack, pr, not asn, not vsn, not sd, not qesd, not rse, etc.
  User non-verbals: gaze elsewhere, gaze partner, head nod, smile, etc.
  Dialogue history: number of turns, sd user history, pr system history, qesd user history, etc.
  System intent: greeting, do goal elicitation, start interest elicitation, start recommendation, do recommendation, end recommendation, farewell, etc.

In our model, each behavior corresponds to a specific conversational strategy (e.g., SD, PR, VSN), with pre-conditions divided into the categories shown in Table 1 and post-conditions defined in terms of the expected state after performing the current conversational strategy (e.g., the rapport score increases, the user smiles).
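A minimal sketch of one decision cycle under this partial-matching scheme follows. This is our own simplification: the link-energy coupling constant (0.1) and the threshold decay are illustrative, and the full energy-division accounting of Maes' formulation is omitted:

```python
def matches(precond_categories, true_props):
    """Partial matching: OR within a category, AND across categories,
    i.e. at least one pre-condition per category must currently hold."""
    return all(any(p in true_props for p in cat) for cat in precond_categories)

def select_behavior(behaviors, true_props, goals, pi, theta, phi, gamma):
    """One (simplified) decision cycle of the Behavior Network."""
    while True:
        # 1. External input from observed propositions and active goals.
        for b in behaviors:
            flat = set().union(*b.preconditions)
            b.activation += phi * len(flat & true_props)
            b.activation += gamma * len(b.add_list & set(goals))
        # 2. Spread energy along successor/predecessor links; take it
        #    away along conflicter links.
        deltas = {id(b): 0.0 for b in behaviors}
        for i in behaviors:
            needs_i = set().union(*i.preconditions)
            for j in behaviors:
                if i is j:
                    continue
                needs_j = set().union(*j.preconditions)
                if i.add_list & needs_j:                 # successor link i -> j
                    deltas[id(j)] += 0.1 * i.activation  # forward energy
                    deltas[id(i)] += 0.1 * j.activation  # backward (predecessor)
                if j.delete_list & needs_i:              # conflicter link i -> j
                    deltas[id(j)] -= 0.1 * i.activation  # inhibition
        for b in behaviors:
            b.activation = max(0.0, b.activation + deltas[id(b)])
        # Normalize so the mean activation equals pi.
        mean = sum(b.activation for b in behaviors) / len(behaviors)
        if mean > 0:
            for b in behaviors:
                b.activation *= pi / mean
        # 3. Pick the executable behavior with the highest activation.
        ready = [b for b in behaviors
                 if matches(b.preconditions, true_props) and b.activation > theta]
        if ready:
            return max(ready, key=lambda b: b.activation)
        theta *= 0.9  # nothing executable: lower the threshold and repeat
```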
This kind of chained reasoning over linked pre-conditions and post-conditions endows the system with planning-ahead capabilities. Intuitively, the Social Reasoner can tailor a deliberative plan as an aggregation of nodes connected through both predecessor and successor links. For instance, when a conversation starts, the most likely sequence of nodes is ASN, SD, PR, VSN: initially the system establishes cordial and respectful communication with the user (ASN), then uses SD as an ice-breaking strategy [Altman and Taylor, 1973], followed by PR to encourage the user to self-disclose as well; after some interaction, if the rapport level is high, VSN is performed. Coalitions are created between nodes, so ASN spreads some energy forward to SD, SD spreads some energy backward to ASN, and the same happens between SD and PR, between PR and SD, and so on. Inhibitory links prevent inappropriate conversational strategies from being triggered. The Social Reasoner is adaptive enough to respond to unexpected user actions by executing a reactive plan that emerges from the forward and backward spreading activation dynamics, as well as from the network's parameter configuration, which determines the system's global behavior: it can make the system more goal-oriented vs. situation-oriented, more adaptive vs. biased towards ongoing plans, or more thoughtful vs. faster.

5 Design of the Decision-Making Module

5.1 Sources of Information

As is clear from the description above, the nature of the pre-conditions and post-conditions is key to the functioning of the system. We extracted the information for these conditions from two sources: theory and empirical data.

Theoretical Sources

Rapport Theory: [Zhao et al., 2014] proposed a computational model to explain how humans in dyadic interactions build rapport through the use of conversational strategies that adapt to the dynamics of each other's behavioral expectations. At the beginning of an interaction, one tends to be tentative and polite, adhering to social norms. Initiating a self-disclosure at this stage will both signal attention and elicit self-disclosure from the interlocutor, which, in turn, enables both parties to gradually learn each other's behavioral expectations. During this stage of the interaction, praise can boost self-esteem and motivate the interlocutor to diminish social distance. Thus, adhering to social norms, self-disclosure and praise are the three prevailing conversational strategies in the early stage of communication. As the interaction proceeds, interlocutors have more interpersonal knowledge to guide their behavior. They refer to shared experiences to index commonality, and purposely violate social norms in order to accommodate each other's behavioral expectations and to signal that they are now beyond the phase of pure politeness.

Norm of Reciprocity: Reciprocity of behavior [Burger et al., 2009] plays an important role in increasing coordination between interlocutors. Our annotations of conversations revealed that most of the conversational strategies described here are used reciprocally (referring to a shared experience evokes the same behavior from one's conversation partner). Thus, one pre-condition for praise is that the user has not praised in the previous turn.

Data-driven Sources

Data-driven discovery by temporal association rules: [Zhao et al., 2016b] applied a data mining algorithm to separately learn behavioral rules for friends and strangers.
In our Social Reasoner, we input the phase of the interaction (early, middle, late) as a variable. Early stages of the interaction were governed by rules learned from the stranger data, later stages by friend rules. For instance, one rule that strangers followed was: one of the interlocutors smiles while the other gazes at the partner and begins self-disclosing; we therefore defined smile as one of a set of optional pre-conditions for self-disclosure.

Data from a Wizard-of-Oz study: We collected data from 228 English speakers interacting with a virtual assistant acting as a guide that recommends sessions to attend and people to meet at the Annual Meeting of the Champions organized by the World Economic Forum (WEF) in Tianjin in 2016 and in Davos in 2017. In each session, a dyad consisting of a user and the virtual assistant (driven by a Wizard-of-Oz protocol) interacted through a dialogue system interface for around 8-10 minutes. During the conversation, the agent elicited the user's interests and preferences and used these to improve its recommendations. The user's verbal and non-verbal behaviors were recorded by the system while the wizard picked the agent's next utterance based on the user's utterance, the current task and goal, and the wizard's assessment of the most appropriate conversational strategy for building rapport. After conducting the study, only those wizard decisions that had a significant impact on building rapport (i.e., increasing rapport) and raising engagement (defined here as increased conversation length) were taken into account.

5.2 Encoding of Pre-conditions & Post-conditions

We modeled a Behavior Network with seven behaviors (one per conversational strategy). Their pre-conditions and post-conditions were designed through a two-way tuning process: initially, for each behavior, we identified a sub-set of pre-conditions and post-conditions (from Table 1) based on the theoretical foundations provided in Section 5.1; we then validated this model through empirical analysis of the data obtained from the Wizard-of-Oz study. For the latter, we ran a feature-selection statistical analysis, specifically a bidirectional-elimination stepwise regression that allowed us, through a series of partial F-tests, to include or drop candidate variables for each behavior. This process revealed which sub-set of variables and features should be considered as pre-conditions and post-conditions for each behavior, based on their impact and significance. For instance, the theoretical foundations guided us to an initial sub-set of pre-conditions for PR; the stepwise regression analysis, however, indicated that we needed to include at least three more pre-conditions (F: 95.7, p-value: 0.00; F: 56.8, p-value: 0.00002; and F: 17.6, p-value: 0.00073) and remove one pre-condition (F: 3.4, p-value: 0.005) to improve the accuracy of conversational-strategy prediction.
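The paper does not spell out the exact selection procedure; a common realization of bidirectional (stepwise) elimination is sketched below using statsmodels. For a single added or dropped regressor, the partial F-test is equivalent to the squared t-test whose p-value is used here; the thresholds are conventional defaults and the function has no safeguard against cycling:

```python
import pandas as pd
import statsmodels.api as sm

def bidirectional_stepwise(X: pd.DataFrame, y: pd.Series,
                           p_enter: float = 0.05, p_remove: float = 0.10):
    """Greedy bidirectional (stepwise) variable selection driven by p-values."""
    selected = []
    while True:
        changed = False
        # Forward step: try each remaining variable, add the most significant.
        remaining = [c for c in X.columns if c not in selected]
        if remaining:
            pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]]))
                           .fit().pvalues[c]
                     for c in remaining}
            best = min(pvals, key=pvals.get)
            if pvals[best] < p_enter:
                selected.append(best)
                changed = True
        # Backward step: drop the least significant selected variable.
        if selected:
            pv = sm.OLS(y, sm.add_constant(X[selected])).fit() \
                   .pvalues.drop("const")
            worst = pv.idxmax()
            if pv[worst] > p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected
```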
An excerpt of the final tuned pre-conditions and post-conditions is shown below.

Self-Disclosure
  Pre-conditions: [low rapport, medium rapport, rapport decreased], [sd user, qesd user], [smile, gaze elsewhere], [introduce, start *], ...
  Post-conditions (add): [sd history, smile, gaze partner, rapport increased, rapport maintained], ...
  Post-conditions (delete): [rapport decreased, sd user, qesd user, pr history, vsn history, introduce, start *], ...

Acknowledgement
  Pre-conditions: [sd user, vsn user], [gaze partner], [not ack history user, not ack history system], [feedback *]
  Post-conditions (add): [ack history, rapport maintained]
  Post-conditions (delete): [not ack history, feedback *]

Praise
  Pre-conditions: [low rapport], [not pr user], [not pr history user, sd history system, turns lower thresh, not pr history system, qesd history system], ...
  Post-conditions (add): [pr system, pr history, rapport increased, rapport maintained], ...
  Post-conditions (delete): [low rapport, not pr history], ...

Question Elicitation of Self-Disclosure
  Pre-conditions: [rapport increased], [not qesd history, not sd history], [do *, preclosing, ask *], ...
  Post-conditions (add): [qesd system, gaze partner], ...
  Post-conditions (delete): [not qesd history system, not sd history system, do *, preclosing, ask *], ...

Reference to Shared Experiences
  Pre-conditions: [medium rapport, high rapport], [rse user, sd user, vsn user], [vsn history, not rse history system], [available shared experiences], ...
  Post-conditions (add): [rse history, rapport increased, rapport maintained, gaze partner], ...
  Post-conditions (delete): [gaze elsewhere], ...

Adhere to Social Norm
  Pre-conditions: [low rapport, medium rapport], [not asn history system], [outcome * recommendation, preclosing, greeting, farewell, feedback *, start *, ...]
  Post-conditions (add): [asn system, asn history, rapport maintained, gaze partner, ...]
  Post-conditions (delete): [not asn history system, outcome recommendation, farewell, feedback *, ...]

Violation of Social Norm
  Pre-conditions: [high rapport], [vsn user], [smile, gaze partner], [turns higher threshold], [once vsn history user, not vsn history system], [start *, feedback *, ...]
  Post-conditions (add): [vsn history, rapport increased]
  Post-conditions (delete): [not vsn history system, greeting, start *, feedback *, do *, ...]

5.3 Spreading Activation Parameters

Following the guidelines proposed by [Romero, 2011; Romero and de Antonio, 2012] and through empirical analysis, we determined the best configuration of the spreading activation parameters to be as follows:
1. To keep the balance between deliberation and reactivity, φ > γ, so φ = 68 and γ = 42.
2. To keep the balance between bias towards the ongoing plan and adaptivity, γ < π < φ, so π = 50.
3. To preserve sensitivity to goal conflicts, δ > γ, so δ = 75.
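Putting the pieces together, the Praise listing above could be instantiated in the Behavior encoding sketched earlier, together with the tuned global parameters; again, this is a hypothetical encoding of ours:

```python
# Hypothetical encoding of the Praise behavior from the excerpt above,
# using the category scheme (one set per category; OR within, AND across).
praise = Behavior(
    name="PR",
    preconditions=[
        {"low_rapport"},                              # rapport level
        {"not_pr_user"},                              # user conv. strategy
        {"not_pr_history_user", "sd_history_system",  # dialogue history
         "turns_lower_thresh", "not_pr_history_system",
         "qesd_history_system"},
    ],
    add_list={"pr_system", "pr_history",
              "rapport_increased", "rapport_maintained"},
    delete_list={"low_rapport", "not_pr_history"},
)

# Tuned global spreading-activation parameters (Section 5.3).
PARAMS = dict(phi=68, gamma=42, pi=50, delta=75)
```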
6 Experimentation and Results

Our experiments evaluated three aspects of our work: 1) whether social reasoning can increase rapport and raise engagement; 2) the effectiveness and accuracy of the Social Reasoner after the data-driven tuning process; and 3) the performance of the Social Reasoner during interaction with users.

6.1 Experiment 1: Social Reasoning Validity

H0: Social reasoning does not contribute significantly to building rapport and increasing conversational engagement in comparison with traditional dialogue systems.

For this experiment we divided the WOZ study dataset of 228 sessions (Section 5.1) into two groups: dialogue turns that used conversational strategies and dialogue turns that did not use any conversational strategy (plain behavior). We then observed the rapport score (1-7), our variable of interest, and ran a one-way ANOVA to determine whether there is a statistically significant difference between the groups at p < .05. The results are shown in Table 2.

Table 2: ANOVA for Experiment 1
  Source of variation   df    SS        MS        F     p
  Between groups        2     1012398   687297.4  4.52  0.007%
  Within groups         154   1672037   293898.8
  Total                 156   2684435

Since p is less than .05, we conclude that there is a statistically significant difference between the two groups. A Tukey post-hoc test revealed that the rapport scores of the group that used social reasoning were higher (5.65 ± 0.4, p = .032) than those of the group that used a traditional approach with no social reasoning (3.17 ± 0.6, p = .028); we can therefore reject the null hypothesis H0 that social reasoning does not contribute significantly to building rapport. Likewise, we conclude that using social reasoning may improve social bonds (rapport) by 35.4% during a conversation.
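For reference, this kind of group comparison can be reproduced with standard tools; a minimal sketch using SciPy, with hypothetical per-turn rapport scores standing in for the real dataset:

```python
import numpy as np
from scipy import stats

# Hypothetical rapport scores (1-7 scale) for the two groups of turns.
with_strategies = np.array([5.9, 5.4, 6.1, 5.2, 5.7, 5.6])
plain_behavior  = np.array([3.0, 3.4, 2.8, 3.5, 3.1, 3.2])

# One-way ANOVA: is there a significant difference between group means?
f_stat, p_value = stats.f_oneway(with_strategies, plain_behavior)
print(f"F = {f_stat:.2f}, p = {p_value:.5f}")  # reject H0 when p < .05

# Post-hoc pairwise comparison (the paper uses Tukey's HSD; with only
# two groups an independent-samples t-test is equivalent).
t_stat, t_p = stats.ttest_ind(with_strategies, plain_behavior)
print(f"t = {t_stat:.2f}, p = {t_p:.5f}")
```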
6.2 Experiment 2: Social Reasoner's Accuracy

H0: The data-driven tuning process does not improve the Social Reasoner's accuracy.

For this experiment we used the WOZ study dataset as ground truth. We ran a simulation of all 228 sessions, where the system inputs were signals from the understanding module, the task reasoner, and the history databases, and the outputs were the conversational strategies picked by the wizard. We then compared each wizard output with the Social Reasoner's output under two scenarios: before tuning the decision-making module (i.e., using only the theory-driven design) and after tuning (i.e., using both the theory-driven and the data-driven design). We ran a one-way ANOVA; the results are shown in Table 3.

Table 3: ANOVA for Experiment 2
  Source of variation   df    SS        MS        F     p
  Between groups        4     2984714   873394.3  5.34  0.005%
  Within groups         173   3439465   363797.8
  Total                 175   6424179

Since p is considerably lower than α, we conclude that there is a statistically significant difference between the two groups. A Tukey post-hoc test revealed that the rapport scores of the group that received data-driven tuning were higher (4.83 ± 0.5, p = .027) than those of the group that used only the theory-based design (3.05 ± 0.4, p = .033); we can therefore reject the null hypothesis that data-driven tuning does not improve the Social Reasoner's accuracy. We also conclude that using a data-driven tuning process along with a theory-driven design may improve the accuracy of the Social Reasoner by up to 25.4%.

6.3 Experiment 3: Social Reasoner's Performance

For this experiment we chose four well-characterized conversational sessions from the dataset log files for post-experimental evaluation of the system's performance:

Flat User scenario (FU): the user's verbal and non-verbal behaviors remain the same throughout the conversation, e.g., the rapport level is medium all the time, there are no smiles, and the user's conversational strategy is ASN most of the time.

Incremental Engagement scenario (IE): the user becomes more engaged in the conversation over time, e.g., the rapport level increases gradually, the user smiles more often, and the user's conversational strategies are mostly SD and VSN.

Low Rapport scenario (LR): during most of the conversation the user maintains a low rapport level, does not smile, and barely makes eye contact.

Losing Interest scenario (LI): initially the user is very engaged in the conversation (i.e., high rapport, many smiles and much eye contact, user conversational strategies SD and VSN, etc.) but gradually loses interest.

Table 4: Social Reasoner's performance. MSE: mean square error; MSE rate: 1 - (MSE_SR / MSE_TD)
  Scenario   Std Dev   MSE_TD   MSE_SR   MSE rate
  FU         0.83      1.31     0.86     34.35%
  IE         0.73      2.12     1.68     20.75%
  LR         0.52      0.96     0.68     29.16%
  LI         0.93      1.54     1.05     31.81%

Table 4 shows the statistical data for Experiment 3. The MSE for each scenario is the mean square error over 20 turns, where an error is a drop in the rapport score as a consequence of activating the wrong conversational strategy. The MSE rate relates the performance of MSE_TD (a traditional dialogue system that does not use conversational strategies) to MSE_SR (a dialogue system that uses our Social Reasoner). Notably, in the experiments executed, the proposed Social Reasoner improves on the results obtained by a traditional dialogue system at a rate between 20% and 34%.

It is worth mentioning that having the highest activation level is not the only criterion for choosing a particular conversational strategy (CS): the CS must also be executable and its activation level must be over the threshold; otherwise the next CS that meets those conditions is chosen. Intuitively, one can see that the Social Reasoner emergently tailors a plan combining the SD, PR and QESD strategies when it detects that the user is not as engaged as expected (e.g., in the LR and LI scenarios). Conversely, VSN is avoided when the system is trying to recover the user's attention and interest and the rapport level is low (as at the end of LR, and in FU). On the other hand, reactive decisions such as using VSN or RSE are made when the system detects that the user is receptive to these kinds of strategies, even if they are not the ones with the highest activation level. ACK is more likely to appear when there is evidence of progressively rising user engagement, since conversational strategies such as ASN, SD and RSE spread more activation energy forward and backward to it. It is also interesting to see how ASN is activated at an early stage of the conversation (e.g., in the IE scenario) but keeps accumulating energy during the whole interaction, so it can easily be triggered if the system realizes that a previous action (the consequence of using a particular CS) caused a drop in the rapport level. Finally, PR is used continually when the Social Reasoner detects no significant changes in the user's verbal and non-verbal behaviors that could raise rapport, especially when other conversational strategies such as SD and QESD have been used without success.
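The MSE-rate column follows directly from the two MSE columns; a quick check of the formula against the values in Table 4:

```python
# MSE rate = 1 - MSE_SR / MSE_TD, computed from the Table 4 columns.
scenarios = {"FU": (1.31, 0.86), "IE": (2.12, 1.68),
             "LR": (0.96, 0.68), "LI": (1.54, 1.05)}
for name, (mse_td, mse_sr) in scenarios.items():
    print(f"{name}: {1 - mse_sr / mse_td:.2%}")
# -> FU: 34.35%, IE: 20.75%, LR: 29.17%, LI: 31.82% (Table 4, up to rounding)
```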
7 Conclusions and Future Work

We have proposed a hybrid, adaptive Social Reasoner component that determines which conversational strategy should be used in order to build and maintain rapport with a user. The Social Reasoner interacts with several modules that can be connected and disconnected while its behavior remains robust. A spreading activation approach was merged with classic planner features and extended to allow the system to partially match pre-conditions using an OR operator rather than the conventional AND operator, thereby expanding the number of possible combinations of matched pre-conditions and triggered conversational strategies.

As future work we plan to: 1) continue collecting data from user interactions to fine-tune the system and improve its performance; and 2) explore an alternative way to learn and adjust pre-conditions and post-conditions. Rather than using a fixed set of pre-conditions and post-conditions, we will use our data-driven model as a cold-start solution while more suitable pre-conditions and post-conditions are discovered over time by a learning process that may personalize the interaction with the user. One approach is to assign weights to pre-conditions and post-conditions based on saliency properties observed in the data. That is, during the stepwise regression analysis some variables produced stronger effects on the spreading activation dynamics than others; for instance, the variable "past experiences available" had a stronger effect on RSE than "low rapport", so the former could have a weight of, e.g., 0.93 and the latter a weight of 0.12. RSE would then be triggered faster when the former variable is present. The weights could subsequently be adjusted through reinforcement learning.

Acknowledgments

This work was partially supported by Yahoo! through CMU's InMind project.

References

[Altman and Taylor, 1973] Irwin Altman and Dalmas Taylor. Social Penetration Theory. New York: Holt, Rinehart & Winston, 1973.
[Baars, 2003] Bernard J. Baars. The global brainweb: An update on global workspace theory. Science and Consciousness Review, 2003.
[Bickmore and Schulman, 2012] Timothy Bickmore and Daniel Schulman. Empirical validation of an accommodation theory-based model of user-agent relationship. In International Conference on Intelligent Virtual Agents, pages 390-403. Springer, 2012.
[Burger et al., 2009] Jerry M. Burger, Jackeline Sanchez, Jenny E. Imberi, and Lucia R. Grande. The norm of reciprocity as an internalized social norm: Returning favors even when no one finds out. Social Influence, 4(1):11-17, 2009.
[Cassell and Bickmore, 2003] Justine Cassell and Timothy Bickmore. Negotiated collusion: Modeling social language and its relationship effects in intelligent agents. User Modeling and User-Adapted Interaction, 13(1-2):89-132, 2003.
[Coon et al., 2013] William Coon, Charles Rich, and Candace L. Sidner. Activity planning for long-term relationships. In Intelligent Virtual Agents: 13th International Conference, IVA 2013, Edinburgh, UK, August 29-31, 2013, Proceedings, volume 8108, page 425. Springer, 2013.
[Fetzer, 2013] Anita Fetzer. "No thanks": a socio-semiotic approach. Linguistik Online, 14(2), 2013.
[Maes, 1989] Pattie Maes. How to do the right thing. Connection Science, 1(3):291-323, 1989.
[Romero and de Antonio, 2012] Oscar J. Romero and Angelica de Antonio. Evolving the way of doing the right thing. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), pages 1-8, 2012.
[Romero, 2011] Oscar J. Romero. An evolutionary behavioral model for decision making. Adaptive Behavior, 19(6):451-475, 2011.
[Spencer-Oatey, 2008] Helen Spencer-Oatey. Face, (im)politeness and rapport. 2008.
[Tajfel and Turner, 1979] Henri Tajfel and John C. Turner. An integrative theory of intergroup conflict. The Social Psychology of Intergroup Relations, 33:47, 1979.
[Tracy and Coupland, 1990] Karen Tracy and Nikolas Coupland. Multiple goals in discourse: An overview of issues. Journal of Language and Social Psychology, 9(1-2):1-13, 1990.
[Zhao et al., 2014] Ran Zhao, Alexandros Papangelis, and Justine Cassell. Towards a dyadic computational model of rapport management for human-virtual agent interaction. In Intelligent Virtual Agents, pages 514-527. Springer, 2014.
[Zhao et al., 2016a] Ran Zhao, Tanmay Sinha, Alan Black, and Justine Cassell. Automatic recognition of conversational strategies in the service of a socially-aware dialog system. In 17th Annual SIGDIAL Meeting on Discourse and Dialogue, 2016.
[Zhao et al., 2016b] Ran Zhao, Tanmay Sinha, Alan Black, and Justine Cassell. Socially-aware virtual agents: Automatically assessing dyadic rapport from temporal patterns of behavior. In 16th International Conference on Intelligent Virtual Agents, 2016.