Publications

Stillman, Paul E., James D. Wilson, Matthew J. Denny, Bruce Desmarais, Shankar Bhamidi, Skyler Cranmer, & Zhong-Lin Lu (2017). Statistical Modeling of the Default Mode Brain Network Reveals a Segregated Highway Structure. Scientific Reports, 7(11694), 1-14. doi.org/10.1016/j.socnet.2016.11.002

In most domains of network analysis researchers consider networks that arise in nature with weighted edges. Such networks are routinely dichotomized in the interest of using available methods for statistical inference with networks. The generalized exponential random graph model (GERGM) is a recently proposed method used to simulate and model the edges of a weighted graph. The GERGM specifies a joint distribution for an exponential family of graphs with continuous-valued edge weights. However, current estimation algorithms for the GERGM only allow inference on a restricted family of model specifications. To address this issue, we develop a Metropolis–Hastings method that can be used to estimate any GERGM specification, thereby significantly extending the family of weighted graphs that can be modeled with the GERGM. We show that new flexible model specifications are capable of avoiding likelihood degeneracy and efficiently capturing network structure in applications where such models were not previously available. We demonstrate the utility of this new class of GERGMs through application to two real network data sets, and we further assess the effectiveness of our proposed methodology by simulating non-degenerate model specifications from the well-studied two-stars model. A working R version of the GERGM code is available in the supplement and is incorporated in the GERGM CRAN package.

ben-Aaron, James, Matthew J. Denny, Bruce A. Desmarais, and Hanna Wallach (2017) "Transparency by Conformity: A Field Experiment Evaluating Openness in Local Governments." Public Administration Review, 77(1), 68–77. doi.org/10.1111/puar.12596

Sunshine laws establishing government transparency are ubiquitous in the United States; however, the intended degree of openness is often unclear or unrealized. Although researchers have identified characteristics of government organizations or officials that affect the fulfillment of public records requests, they have not considered the influence that government organizations have on each other. This picture of independently acting organizations does not accord with the literature on diffusion in public policy and administration. In this article, we present a field experiment to test whether a county government's fulfillment of a public records request is influenced by the knowledge that its peers have already complied. We argue that knowledge of peer compliance should (1) induce competitive pressures to comply and (2) resolve legal ambiguity in favor of compliance. We find evidence of peer conformity effects both in the time to initial response and in the rate of complete request fulfillment.

[preprint]

Handler, Abram, Denny, Matthew J., Wallach, Hanna, & O’Connor, Brendan (2016). Bag of What? Simple Noun Phrase Extraction for Text Analysis. In Proceedings of the Workshop on Natural Language Processing and Computational Social Science at the 2016 Conference on Empirical Methods in Natural Language Processing, 114-124. aclweb.org/anthology/W/W16/W16-56.pdf

Social scientists who do not have specialized natural language processing training often use a unigram bag-of-words (BOW) representation when analyzing text corpora. We offer a new phrase-based method, NPFST, for enriching a unigram BOW. NPFST uses a part-of-speech tagger and a finite state transducer to extract multiword phrases to be added to a unigram BOW. We compare NPFST to both n-gram and parsing methods in terms of yield, recall, and efficiency. We then demonstrate how to use NPFST for exploratory analyses; it performs well, without configuration, on many different kinds of English text. Finally, we present a case study using NPFST to analyze a new corpus of U.S. congressional bills.

de Oliveira, Angela C. M., John M. Spraggon, and Matthew J. Denny (2016) "Instrumenting Beliefs in Threshold Public Goods." PLoS ONE 11(2): e0147043. doi:10.1371/journal.pone.0147043

Understanding the causal impact of beliefs on contributions in Threshold Public Goods (TPGs) is particularly important since the social optimum can be supported as a Nash Equilibrium and best-response contributions are a function of beliefs. Unfortunately, investigations of the impact of beliefs on behavior are plagued with endogeneity concerns. We create a set of instruments by cleanly and exogenously manipulating beliefs without deception. Tests indicate that the instruments are valid and relevant. Perhaps surprisingly, we fail to find evidence that beliefs are endogenous in either the one-shot or repeated-decision settings. TPG allocations are determined by a base contribution and beliefs in a one-shot setting. In the repeated-decision environment, once we instrument for first-round allocations, we find that second-round allocations are driven equally by beliefs and history. Moreover, we find that failing to instrument prior decisions overstates their importance.

[article], [supporting information], [replication data], [BibTeX]

Working Papers

Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It (conditionally accepted, Political Analysis)

Despite the popularity of unsupervised techniques for political science text-as-data research, the importance and implications of preprocessing decisions in this domain have received scant systematic attention. Yet, as we show, such decisions have profound effects on the results of real models for real data. We argue that substantive theory is typically too vague to be of use for feature selection, and that the supervised literature is not necessarily a helpful source of advice. To aid researchers working in unsupervised settings, we introduce a statistical procedure that examines the sensitivity of findings under alternate preprocessing regimes. This approach complements a researcher's substantive understanding of a problem by characterizing the variability that changes in preprocessing choices may induce when analyzing a particular dataset. In making scholars aware of the degree to which their results are likely to be sensitive to their preprocessing decisions, it aids replication efforts. We make easy-to-use software available for this purpose.

The Importance of Generative Models for Assessing Network Structure (2016)

Network representations, theories, and methods have emerged as a powerful framework for examining political phenomena at a systems level. This article illustrates the importance of specifying and testing theories about the structure of political networks at multiple levels, and the potential pitfalls of failing to model the processes by which these networks were generated. These pitfalls range from missing important alternative explanations for an observed network structure, to specifying a model which does not test the hypotheses that derive from the researcher’s theory. It then combines these insights into a general framework for assessing the structural properties of political networks. The framework is applied to re-specify an existing political science study that seeks to engage with network concepts. I find that the approach taken in this paper can dramatically affect inferences about the structure of a political system.

Influence in the United States Senate (2015)

Interpersonal influence plays an important role in determining what legislation will see action in the United States Congress. Building on theories that conceptualize a legislator’s influence as an individual property, I cast influence in a relational framework, recognizing that influence is exercised through legislators’ social and political networks. I develop a novel measure of legislative influence using temporal patterns in bill cosponsorship data as an instrument to infer a latent network of influence relationships between legislators. I then validate the measure of legislative influence I derive from these networks in an extension of a recently published study and in the context of predicting bill advancement. I find that my measure performs like an effective measure of interpersonal influence in Congress.

Graph Compartmentalization (2014)

This article introduces a concept and measure of graph compartmentalization. This new measure allows for principled comparison between graphs of arbitrary structure, unlike existing measures such as graph modularity. The proposed measure is invariant to graph size and number of groups and can be calculated analytically, facilitating measurement on very large graphs. I also introduce a block model generative process for compartmentalized graphs as a benchmark on which to validate the proposed measure. Simulation results demonstrate improved performance of the new measure over modularity in recovering the degree of compartmentalization of graphs simulated from the generative model. I also explore an application to the measurement of political polarization.

Posters

Complex Stochastic Weighted Graphs: Flexible Specification and Simulation

Presented at the International Conference on Computational Social Science, Helsinki, Finland, 2015.

Topic-Conditioned Hierarchical Latent Space Models for Text-Valued Networks

Presented at PolMeth XXXI, University of Georgia, 2014.

Influence in the United States Senate

Presented at PolNet, McGill University, 2014.

An Analytic Measure of Graph Compartmentalization

Presented at UMass Amherst Computational Social Science Initiative Poster Session, 2014.

Community Structure and Its Implications for the Evolution of Cooperation

Presented at Sunbelt XXXII, the annual conference of the International Network for Social Network Analysis (INSNA), Redondo Beach, CA, 2012.

Presentations

Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It

Presented at the NYU NLP and Text as Data Speaker Series, February 16th, 2017.

Assessing the Consequences of Text Preprocessing Decisions

Presented at New Directions in Analyzing Text as Data, Northeastern University, 2016.

The Generalized Exponential Random Graph Model for Weighted Networks

Presented at the 9th Annual Political Networks Conference, Washington University in St. Louis, 2016.

The Importance of Generative Models for Assessing Network Structure

Presented at the 9th Annual Political Networks Conference, Washington University in St. Louis, 2016.

Content-Conditioned Hierarchical Latent Space Models for Textual Communication Networks

Presented at the International Conference on Computational Social Science, Helsinki, Finland, 2015.

Inferring Latent Influence Diffusion Networks in The United States Senate

Presented at the International Conference on Computational Social Science, Helsinki, Finland, 2015.