Benchmarking Organizational Performance Dynamics Using Memory-Cell Recurrent Neural Networks

Competitiveness in the workforce is increasingly challenged on the basis of hard skills, acquired through lifelong systematic and measured learning, which a computer can be taught to reproduce with unmatched speed and accuracy. Cloud computing, for example, brought with it online accounting, sales, and even contract negotiation. The primary objective of this work is to train a neural model on soft skills, a complementary set of attributes that includes empathy, trustworthiness, extraversion, deference, ethicality, neuroticism, and similar traits, to help reclaim the competitive edge of the human workforce. The second objective is to present teams (as opposed to individuals) as a better, more optimized unit of organizational charting.


Introduction
Many experts are already asking whether the IoT (Internet of Things) will bring with it the "quantified employee" (1), and if or when the 4th industrial revolution will make its mark (2). Computers claim supremacy in speed and accuracy, but judgement and reflex remain computationally complex problems (1,2). This study attempts to design an intuitive memory from a set of vectors to consider in predictive models.
Recent advances in cognitive science describe the different states of change that the human brain undergoes. Some of these changes are shown to affect, for better or worse, the operational mechanisms of how humans behave, work, or relate to one another. This and related research has led to what is known as brain training (3). A 2015 census demonstrated that employee engagement has risen 50% over the last 20 years, and claims that 75% of time spent at work goes to communicating, exchanging views, and meetings (4).
With the advent of the internet, growing data volumes, and powerful commodity computing, tough economic conditions have put pressure on competitive differentiation. Predictive analytics is part of a growing toolset used to optimize and automate decisions in order to stay competitive. Predictive models use known results to predict values for new or different data, a probabilistic exercise based on the estimated significance of a set of input variables.
Recent talent acquisition trends favor teams over individuals. Yearly appraisals, for instance, have lost value with the rise of digital workspaces, or co-offices. This new type of workspace abstracts away conventional biases around diversity and inclusion by focusing on accountability, data, and transparency. Once the sole responsibility of HR, hiring now runs across multiple teams and hiring managers inside an organizational structure, and outside it via consulting professionals, crowd-sourcing platforms, or bots (1,4).

Literature Review
Kozlowski et al. (2003) contributed an interesting differentiation in the field of industrial and organizational psychology that guides this paper's definition of a team (5). They concluded that organizational charts do not clearly support the quantization of performance, processes, and effectiveness, primarily because these aspects are of a qualitative nature and may be difficult to contextualize. They were indirectly referring to soft skills.
A combination of qualitative and quantitative assessments was later conducted privately by Google with a mix of high- and low-performing teams. The results confirmed a similar discovery made by Dr. Edmondson, who is credited with coining the term "psychological safety" (6). In her study she attributed an overall lack of performance inside an organization to a behavioral trait acquired through experience (childhood, education, etc.) in which individuals, regardless of their responsibilities, carry a certain stigma. Psychologists refer to this trait as "impression management", which studies show a large majority of today's working class unconsciously shares. The stigma is characterized by the absence of a rationale to contribute, which impacts the climate of openness in an organization and therefore its performance; climate in this context may refer to organizational culture. Dr. Edmondson offered solutions to treat the stigma by prescribing clear guidelines. Google used insights from her research and applied its own internal assessments to derive similar implementations of what it means to be an effective team. Using standard and proprietary methods such as the Berkeley Big Five Inventory (BFI) and the Toronto Empathy Questionnaire (7), together with other assessments including demographic variables like tenure, seniority, and location (8), Google inferred statistical models over years of collected data with a wide array of variables (9). Their methods are proprietary; this paper instead offers a proof of concept using synthesized data to train a deeply layered model for prediction, under the assumptions that there is uncertainty in the problem being solved by the team and that the members need to be interdependent. Studies have shown that committing to a goal can help improve performance, either individually or group-wide (10), which this paper also assumes a priori for simplicity.
Deeply layered machine learning models apply a method referred to as feature extraction, preserving the spatial relationship between inputs. By systematically convolving through nodes, the saliency of a feature is determined as it is fed to nodes deeper in the network, while backpropagation adjusts the weights and biases (11), which in turn supports the prediction accuracy on new or different inputs. Assuming a relationship, through correlation or causation, between inputs and outputs is a heuristic approach to finding the function called a universal approximator (the algorithm): a multilayer feedforward architecture can map x to y, largely regardless of the activation function applied, to approximate a known source of truth (12).
This paper's main contribution is to use the notions of psychological safety and collective intelligence (13) to approximate empathy and openness as a function. Dr. Edmondson suggested psychological safety as another dimension on par with dimensions like accountability and transparency. A deeply layered, multi-input and multi-output model is designed as a representation of the function used to test the hypothesis.

Dataset Composition & Structure
Combining sociologists' survey methods with machine learning, this paper uses multiple sources of research to synthesize the data. Focus is placed on (a) an individual's features, (b) the features they exert in a team, (c) the workforce's collective features, and (d) the problem features, quantified for their "solvability" boundary and added once the model has built a memory of the available workforce. Results are highlighted and compared using two different optimization scenarios.

(a) Individual features. Based on factor analysis of various cognitive abilities and professional achievements. Actual data can be collected from monitored activity or via mobile sensors (4,5,8,10).

(b) Individual behavior in a team. Abilities exerted only in a team environment (6,7,9,10). They may be influenced by group norms.

(d) Problem domain. A range of problem definitions is synthesized to represent the unpredictability of a solution (6,9,12). Problems are classified as being in one of two possible states, easy or hard, with two outcomes, solved or not.
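As a minimal sketch of the synthesis step, assuming 12 features in total and the sample sizes reported later (1,000 training and 100 evaluation subjects), the dataset could be generated as follows. The feature groupings, distributions, and the label rule are illustrative assumptions, not the paper's actual generator:

```python
import numpy as np

rng = np.random.default_rng(42)

def synthesize_dataset(n_subjects):
    """Illustrative stand-in for the paper's synthetic data generator."""
    # (a) individual features, e.g. cognitive-ability factor scores in [0, 1]
    individual = rng.random((n_subjects, 5))
    # (b) behavior exerted only in a team (influenced by group norms)
    in_team = rng.random((n_subjects, 4))
    # (c) workforce collective features, shared across the whole cohort
    collective = np.tile(rng.random(2), (n_subjects, 1))
    # (d) problem domain state: easy (0) or hard (1)
    problem = rng.integers(0, 2, (n_subjects, 1))
    X = np.hstack([individual, in_team, collective, problem])  # 12 features
    # outcome "solved or not", loosely tied to the features for illustration
    y = (X[:, :4].mean(axis=1) > 0.4 * (1 + 0.5 * problem.ravel())).astype(int)
    return X, y

X_train, y_train = synthesize_dataset(1000)
X_eval, y_eval = synthesize_dataset(100)
```

The hard/easy problem state widens the "solvability" boundary here only to make the toy labels depend on the problem feature, mirroring the two-state classification above.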

Training the model
Let φ(·) be a non-constant, bounded, and increasing continuous function. Then there exist an integer N, real constants v_i, b_i ∈ ℝ, and real vectors w_i ∈ ℝ^m, where i = 1, …, N, such that

F(x) = Σ_{i=1}^{N} v_i φ(w_i^T x + b_i)     (equation 1)

is an approximate realization of the target function (15). Two optimization techniques are used to fit this approximation, namely the stochastic approximation of minima and maxima with SGD (Stochastic Gradient Descent) and its variant RMSProp (Root Mean Square Propagation, Tieleman et al., 2012), on identical learning rates (15,16).
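The approximator in equation 1 can be sketched directly in NumPy; the logistic sigmoid stands in for φ, and N, m, and the random parameters below are illustrative assumptions:

```python
import numpy as np

def sigma(z):
    """A non-constant, bounded, increasing activation (logistic sigmoid)."""
    return 1.0 / (1.0 + np.exp(-z))

def approximator(x, v, W, b):
    """F(x) = sum_i v_i * sigma(w_i . x + b_i), i = 1..N (equation 1)."""
    return sigma(x @ W.T + b) @ v

rng = np.random.default_rng(0)
N, m = 8, 12                     # N hidden units, m input features
v = rng.normal(size=N)           # real constants v_i
W = rng.normal(size=(N, m))      # real vectors w_i
b = rng.normal(size=N)           # real constants b_i

x = rng.random((3, m))           # three example inputs
F = approximator(x, v, W, b)     # three scalar approximations
```

Training then amounts to adjusting v, W, and b so that F approximates the known source of truth.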
With a sample size of 1,000, training is performed in batches of 10 over 50 epochs. The result is evaluated on a subset of 100 samples. Running two separate supervised learning experiments, one with SGD and another with RMSProp, the kernels w_i in equation 2, also called weights, are optimized using backward propagation.

Network Graph
To retain knowledge of the available workforce, this paper proposes expanding the main input dimensions using an embedding layer, then feeding the result into an LSTM layer (16) to build a "gut feeling".
Auxiliary input from Table 1 is then concatenated with the LSTM output from Table 2, and the combined tensor is fed into a 3-layer ReLU network of 64 nodes each.
The final sigmoid layer, σ(wx + b), outputs the prediction: given a problem X2, individual X1 displays the optimal set of behaviors Y1,2 in their current or assigned group/team configuration. Refer to figure 1.
Loss and accuracy are monitored using binary cross entropy and root mean squared error computations. The model uses ReLU (16) to support feature detection and classification.
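The graph above can be sketched with the Keras functional API (the paper's own stack). Since figure 1 and Tables 1-2 are not reproduced here, the dimensions below (a vocabulary of 1,000 workforce IDs, sequence length 10, 32-unit embedding and LSTM, and 12 auxiliary features) are illustrative assumptions:

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, concatenate
from tensorflow.keras.models import Model

def build_model(vocab=1000, seq_len=10, n_aux=12):
    # Main input: a sequence of workforce member IDs, expanded by an embedding
    main_in = Input(shape=(seq_len,), dtype="int32", name="workforce_ids")
    x = Embedding(input_dim=vocab, output_dim=32)(main_in)
    # The LSTM retains a memory of the available workforce (the "gut feeling")
    x = LSTM(32)(x)
    # Auxiliary input (Table 1 features) concatenated with the LSTM output
    aux_in = Input(shape=(n_aux,), name="aux_features")
    h = concatenate([x, aux_in])
    # 3-layer ReLU network of 64 nodes each
    for _ in range(3):
        h = Dense(64, activation="relu")(h)
    # Final sigmoid layer sigma(wx + b) outputs the behavior prediction
    out = Dense(1, activation="sigmoid", name="prediction")(h)
    model = Model(inputs=[main_in, aux_in], outputs=out)
    model.compile(optimizer="sgd", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
```

Swapping `optimizer="sgd"` for `optimizer="rmsprop"` reproduces the second experiment on the same graph.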

Error and Loss
As the model is supervised, the proposed method iteratively reviews the error rates and overall training loss using mean squared error (MSE) and binary cross entropy (BCE) respectively (16-19). The former squares the difference between the estimator (see equation 2) and the evaluation data, or hypothesis: for any vector Ŷ of n predictions and the vector of synthetic values Y proposed in this paper,

MSE = (1/n) Σ_{i=1}^{n} (Ŷ_i − Y_i)².
Binary cross entropy is computed for a training set as

BCE = −(1/n) Σ_{i=1}^{n} [Y_i log(Ŷ_i) + (1 − Y_i) log(1 − Ŷ_i)].

Accuracy is tested against evaluation data Y′ to provide a measure of "encoding length", which is the number of bits expected based on the hypothesis (the evaluation data). The distribution is hereby hypothesized as well.
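Both measures can be implemented in a few lines of NumPy; the small example vectors are illustrative:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over n predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_pred - y_true) ** 2)

def bce(y_true, y_pred, eps=1e-12):
    """Binary cross entropy; eps guards against log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y = np.array([1, 0, 1, 1])          # synthetic values Y
y_hat = np.array([0.9, 0.1, 0.8, 0.6])  # predictions Y-hat
```

Here `mse(y, y_hat)` tracks the error rate while `bce(y, y_hat)` is the training loss being minimized.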

Experiments
The model was designed in Python, version 3.6.3, with Keras version 2.0.9 using the TensorFlow library (version 1.4.0), running on Windows 10 Enterprise, version 1703, with 4GB of memory and an x64-based processor (Intel® Core™ i7-2700K at 3.80GHz).
Training and validation samples were identical across both experiments, totaling 1,000 and 100 respectively. Input was fed in batches of 10 samples iterated over 50 epochs. The network graph is also identical in both experiments. Figures 2 and 3 show the readings of the results, which took 108.211 and 116.913 seconds respectively to complete.

SGD Optimized Learning
The input is configured as 12 vectors, or features, generated from synthetic data given certain heuristics (psychological, sociological, human capital statistics, etc.). The sample contains 1,000 subjects for training. For optimal results, the experiment considers a 10/10 ratio. The samples are fed into the network graph in figure 1 using the Stochastic Gradient Descent (SGD) optimizer, a technique that applies a parameter update for each training sample at a high frequency.
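A minimal, library-free sketch of the per-sample SGD update described above, applied to a single sigmoid unit; the toy task, learning rate, and epoch count are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary task: 12 features, label driven by the first feature
X = rng.random((1000, 12))
y = (X[:, 0] > 0.5).astype(float)

w, b, lr = np.zeros(12), 0.0, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# SGD: one parameter update per training sample (high update frequency)
for epoch in range(5):
    for xi, yi in zip(X, y):
        p = sigmoid(xi @ w + b)
        grad = p - yi          # dBCE/dz for a sigmoid output unit
        w -= lr * grad * xi
        b -= lr * grad

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

The gradient `p - yi` is the derivative of the binary cross entropy loss with respect to the pre-activation, which keeps the update rule compact.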

RMSProp Optimized Learning
A similar design is repeated using an optimization technique that divides the learning rate by an exponentially decaying average of squared gradients, adapting the step size used in SGD per parameter. It is similar in design to the Adadelta optimizer (Zeiler, 2012) and resolves Adagrad's diminishing learning rate limitation. RMSProp stands for Root Mean Squared Propagation.
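The RMSProp update rule itself is short; this NumPy sketch applies it to a toy quadratic objective (the objective, learning rate, and decay value are illustrative assumptions):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: scale the step by a running RMS of gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize f(w) = ||w||^2 as a toy objective; its gradient is 2w
w = np.array([3.0, -2.0])
cache = np.zeros_like(w)
for _ in range(500):
    w, cache = rmsprop_step(w, 2 * w, cache)
```

Because the step is normalized by the running RMS of past gradients, progress stays steady even where raw gradients are small, which is the limitation of Adagrad's monotonically shrinking rate.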

Results
The SGD-optimized learning reached 0.754 accuracy by epoch 10 of 50 and remained constant until the end. Loss decreased smoothly from a little above 1.28 at epoch 1 to 1.17899 by epoch 50. The t-SNE visualization algorithm (18) does not suggest a noticeable separation in feature distribution across the dataset.
In the second experiment, using RMSProp-optimized learning, accuracy climbed smoothly from 0.750 at epoch 5 until it reached well over 0.97 by the end of learning (epoch 50), against a loss below 0.2. The t-SNE visualization shows 2 distinct clusters, suggesting successful learning.
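A sketch of the t-SNE projection behind these visualizations, using scikit-learn (an assumption; the paper does not name its t-SNE implementation) on illustrative random features in place of the learned representations:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Illustrative stand-in for the learned features: two shifted 12-D blobs,
# mimicking the two-cluster structure seen in the RMSProp experiment
features = np.vstack([rng.normal(0, 1, (50, 12)),
                      rng.normal(4, 1, (50, 12))])

# Project the 12-dimensional features to 2-D for plotting
embedding = TSNE(n_components=2, perplexity=10,
                 init="random", random_state=0).fit_transform(features)
```

Plotting `embedding` with the two halves colored differently would show whether the features separate into distinct clusters, as reported for the RMSProp run.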

Conclusions
By suggesting more emphasis on collective contributions and deference to group norms, and less on hierarchy or organizational charting, this paper highlights the need to frame each problem-solving endeavor as a learning opportunity. The usefulness and contribution of imperfection is encouraged as a necessary step in problem solving, without sacrificing accountability. The proposed methods can be used to support talent acquisition in sourcing good team members by focusing on soft skills.
In future revisions, live data will be sourced from a mix of automated surveys and API fetches from social media networks like Reddit and Instagram, which store valid time-series data about familiarity (follows, followers, friend requests) and common interests (shared likes), cited here as non-exhaustive examples.