Research

I’m primarily interested in syntax (sentence structure), computational linguistics, and Japanese linguistics. Below are some current and past projects. See my CV for a more complete list of papers and presentations.

Computational complexity of syntactic dependencies

Recent work suggests that a wide array of syntactic phenomena are subregular over trees, meaning that they occupy strict subclasses of the regular (finite-state) tree languages. This provides a much tighter bound than previous string-based characterizations, offers insight into the specific formal patterns utilized by natural language, and helps us understand how such patterns can be learned from positive data. What’s more, phonological patterns are also overwhelmingly subregular over strings, allowing direct comparisons between syntax and phonology.

Among the various subregular classes, TSL (tier-based strictly local) seems to be the upper bound for most individual dependencies. My current work argues that case and agreement dependencies are TSL-2, which means that all constraints can be stated in a window of two elements on a tier of salient elements, and that this provides an explanation of several key properties of their formal typology.
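
For concreteness, here is a minimal sketch of a TSL-2 well-formedness check, stated over strings for simplicity (the actual proposals are stated over derivation trees), with toy symbols standing in for syntactic heads:

```python
# Minimal sketch of a TSL-2 (tier-based strictly local, window of two)
# grammar over strings. Symbols and constraints are toy placeholders.

def tsl2_ok(string, tier_alphabet, forbidden_bigrams):
    """A string is well-formed iff no forbidden bigram occurs on the tier."""
    # Step 1: tier projection -- keep only the salient symbols.
    tier = [s for s in string if s in tier_alphabet]
    # Step 2: strictly local check in a window of two tier elements.
    return all((a, b) not in forbidden_bigrams for a, b in zip(tier, tier[1:]))

# Toy constraint: on a tier of probes (P) and goals (G), no probe may be
# immediately followed by another probe, i.e. every probe finds a goal.
print(tsl2_ok("xGxPxGxP", {"P", "G"}, {("P", "P")}))  # True:  tier = G P G P
print(tsl2_ok("xGxPxxPx", {"P", "G"}, {("P", "P")}))  # False: tier = G P P
```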

Agreement

Using a formal model of agreement based on paths through MG (Minimalist Grammar) derivation trees, I show how the natural parameters of the TSL-2 model correspond to attested variation in visibility, directionality, and iteration. Even seemingly complex patterns from the literature can be given a simple analysis without resorting to any additional mechanisms.

I have also developed a two-tier analysis of Hindi verbal agreement, which demonstrates how the theoretically problematic phenomenon of parasitic agreement arises from the intersection of two very ordinary TSL-2 patterns.
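
A toy illustration of the general idea (placeholder symbols, not the actual Hindi data): the grammar is simply the intersection of two ordinary TSL-2 constraints, each with its own tier projection, and a form must satisfy both.

```python
# Toy sketch of a two-tier TSL-2 analysis; symbols are placeholders.

def tsl2_ok(string, tier_alphabet, forbidden_bigrams):
    tier = [s for s in string if s in tier_alphabet]
    return all((a, b) not in forbidden_bigrams for a, b in zip(tier, tier[1:]))

def two_tier_ok(string):
    # Tier 1 (V and S): no two verbs without an intervening subject.
    # Tier 2 (V and O): no two verbs without an intervening object.
    return (tsl2_ok(string, {"V", "S"}, {("V", "V")})
            and tsl2_ok(string, {"V", "O"}, {("V", "V")}))

print(two_tier_ok("OxSxVxSxV"))    # False: fine on tier 1, but tier 2 is O V V
print(two_tier_ok("OxSxVxOxSxV"))  # True: both tiers pass
```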

Case

Case also appears to be TSL-2, though it differs from movement and agreement in that it does not (obviously) involve feature matching. I am currently exploring a model which assigns case to the string of noun phrases in a given domain (ordered by c-command) according to a TSL-2 string language.
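
A toy sketch of the idea (an invented dependent-case-style rule, for illustration only): each NP’s case is read off a window of two elements on the tier of NPs in its domain.

```python
# Toy sketch: read each NP's case off a window of two tier elements --
# the NP and its left neighbor on the tier of NPs in the same domain
# (⋊ marks the left edge). The rule itself is invented for illustration.

def assign_case(nps):
    tier = ["⋊"] + list(nps)
    # Domain-initial NP: unmarked (nominative); an NP c-commanded by
    # another NP in the same domain: dependent case (accusative).
    return [(np, "nom" if left == "⋊" else "acc")
            for left, np in zip(tier, tier[1:])]

print(assign_case(["subject", "object"]))
# [('subject', 'nom'), ('object', 'acc')]
```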

In a recent paper, I provide an in-depth analysis of case in Japanese, including structural and lexical case, valency alternations, and long-distance case assignment.

I am currently expanding the analysis to other languages, mirroring my work on agreement.

Local dependencies

While long-distance dependencies get most of the attention, the study of computational complexity can reveal insights about local dependencies as well. I tackle this issue in a recent paper using the same path-based model as my work on agreement, showing that most of the logically possible patterns which can be generated by a strictly local (SL) grammar correspond to some real syntactic phenomenon.
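
For comparison with the TSL-2 sketches above: a strictly local constraint is one with no tier projection, so only adjacent elements of the path itself can be regulated. A toy example:

```python
# Sketch: an SL-2 constraint is a TSL-2 constraint without the tier
# projection step, so only path-adjacent pairs of heads are regulated.

def sl2_ok(path, forbidden_bigrams):
    return all((a, b) not in forbidden_bigrams for a, b in zip(path, path[1:]))

# Toy selection-style constraint on a root-to-leaf path of heads:
# a determiner (D) may not take a verbal (V) complement.
print(sl2_ok(["C", "T", "V", "D", "N"], {("D", "V")}))  # True
print(sl2_ok(["C", "T", "D", "V"], {("D", "V")}))       # False
```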

Further reading

I’m working on a bibliography of subregular linguistics, including not just syntax, but also phonology, morphology, and formal learnability.

Syntactic learning

The above work extends the finding that long-distance linguistic dependencies are predominantly TSL-2, meaning that all constraints can be stated in a window of two elements on a tier. This is interesting, as TSL-2 is one of the simplest classes of formal languages which can handle long-distance dependencies at all, and is in principle efficiently learnable. But there is a lot more that must be done to incorporate this knowledge into a theory of syntactic learning.
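
In the simplest setting, a TSL-2 grammar can be induced from positive data by recording which tier bigrams are attested, assuming the tier is known in advance (identifying the tier is the harder problem, glossed over in this sketch):

```python
# Minimal positive-data learner for a TSL-2 grammar, assuming the tier
# alphabet is given in advance (identifying the tier is the hard part).

def learn_tsl2(samples, tier_alphabet):
    """Forbid exactly the tier bigrams never attested in the input."""
    attested = set()
    for s in samples:
        tier = [x for x in s if x in tier_alphabet]
        attested.update(zip(tier, tier[1:]))
    return {(a, b) for a in tier_alphabet for b in tier_alphabet} - attested

# Toy data with probes (P) and goals (G): the learner never sees P P or
# G G on the tier, so those bigrams remain forbidden.
print(learn_tsl2(["xGxP", "GxPxGxP"], {"P", "G"}))
# {('P', 'P'), ('G', 'G')}
```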

In a recent paper, I adapt a system developed by a group at UPenn, which uses the Tolerance Principle to learn syntactic islands from movement paths, so that it instead produces a TSL-2 grammar, and I discuss the relative roles of the grammar formalism and the learning theory in explaining the constraints on long-distance syntactic dependencies.
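
The Tolerance Principle itself (Yang 2016) is a simple threshold: a rule over N items survives only if its exceptions number at most N / ln N.

```python
# The Tolerance Principle (Yang 2016): a rule over n items tolerates
# at most n / ln(n) exceptions before the learner abandons it.
import math

def tolerance_threshold(n):
    return n / math.log(n)

print(tolerance_threshold(100))  # ~21.7: a rule over 100 items survives up to 21 exceptions
```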

Restrictions on syntactic feature systems

I’m working with Logan Swanson and Thomas Graf to test Thomas’s 2020 conjecture that syntactic category features are ISL-2 recoverable, that is, that the category of a head can be inferred solely from the categories of the heads which it selects or is selected by. We are testing this using data from MGBank, an MG corpus built from the Penn Treebank. Such a principle could plug a hole found in many grammatical formalisms which allows linguistically unnatural patterns to be encoded via category features.
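
A hedged sketch of what such a test amounts to (the data format here is invented for illustration; the real experiment runs over MGBank): category features are recoverable if no two heads share the same local selection context while bearing different categories.

```python
# Sketch of an ISL-2 recoverability test over invented records.
# Each record: (category, category selected, category of selector).

def isl2_recoverable(heads):
    """True iff category is a function of the local selection context."""
    seen = {}
    for category, selected, selector in heads:
        context = (selected, selector)
        if seen.setdefault(context, category) != category:
            return False   # same context, two categories: not recoverable
    return True

print(isl2_recoverable([("V", "D", "T"), ("N", None, "D")]))  # True
```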

Allomorph selection in the Japanese verb paradigm

Allomorphy in the forms of Japanese verbal stems and suffixes appears to be phonologically motivated, yet many of the patterns are unattested elsewhere in the language. This naturally raises the question of how such patterns are encoded in the grammar.

In a recent presentation, I examine a proposal by Ito and Mester (2004) in which lexically specified allomorphs are selected by the phonology via the mechanisms of Optimality Theory. While the approach elegantly derives much of the verbal paradigm, extending it to the suffixes -te, -ta, -tara, and -tari fails due to opacity in the allomorphy of these suffixes, and also results in severe overgeneration. These results suggest that the grammar of Japanese does in fact include phonological processes which are restricted to certain verbal suffixes.
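
For orientation, the descriptive facts for the gerund suffix -te can be sketched as a function of the stem-final segment (standard textbook data in romanization; this is the pattern to be explained, not the analysis itself):

```python
# The gerund (-te) allomorphy, descriptively. The alternations for
# t/r/w and n/b/m stems are not found elsewhere in Japanese phonology.

def te_form(stem):
    c, base = stem[-1], stem[:-1]
    if c == "k":   return base + "ite"    # kak-   -> kaite    'write' (exception: ik- -> itte 'go')
    if c == "g":   return base + "ide"    # oyog-  -> oyoide   'swim'
    if c == "s":   return base + "shite"  # hanas- -> hanashite 'speak'
    if c in "trw": return base + "tte"    # mat- -> matte, tor- -> totte, kaw- -> katte
    if c in "nbm": return base + "nde"    # shin- -> shinde, tob- -> tonde, yom- -> yonde
    return stem + "te"                    # vowel stems: tabe- -> tabete 'eat'

print(te_form("kak"), te_form("yom"), te_form("tabe"))  # kaite yonde tabete
```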

In an upcoming paper, I propose an answer as to what is really going on, and discuss avenues for further exploration.

Bare noun phrases in the history of English

I worked with Cristina Schmitt and Alan Munn on a diachronic corpus study of the loss of bare singular noun phrases (those lacking a determiner, quantifier, or possessor) in Middle English. Bare singular predicates (such as “doctor” in “He is doctor”, with no preceding article) used to be common in Old and Middle English, as they still are in other Germanic and Romance languages, but are severely restricted in Modern English. Our goal was to determine exactly how and why these changes unfolded, using statistical measures of changes in frequency by syntactic environment and by the semantic class of the head noun.
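
The frequency tracking amounts to something like the following sketch (column names are invented for illustration; the real data came from coded CorpusSearch output):

```python
# Hedged sketch of the frequency analysis; column names are invented.
import pandas as pd

df = pd.read_csv("bare_nouns_coded.csv")  # hypothetical coded data, one row per NP
rates = (df.groupby(["period", "environment", "noun_class"])["bare"]  # 'bare' coded 0/1
           .mean()
           .reset_index(name="bare_rate"))  # proportion of bare NPs per cell
print(rates.head())
```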

Corpus tools and methods

Tracking lexical classes

Part of the bare noun project involved tracking lexical classes of nouns: first, because we needed to distinguish count and mass nouns (which were not distinguished in the corpus used, as of the time of research), and second, because we wanted to track several classes of count nouns predicted to be of theoretical relevance. The following poster describes the methods used for this part of the project.

CorpusExtract

A program called CorpusSearch, developed at the University of Pennsylvania, allows automated searching and coding of syntactically annotated corpora, such as the Penn Corpora of Historical English. In order to make the next step, data analysis, easier, I wrote a program I call CorpusExtract, which converts the output of a CorpusSearch coding query to spreadsheet form. See the poster linked below for a more complete explanation of the purpose behind CorpusExtract and its implementation.
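
The core transformation is small; here is a hedged sketch (assuming, for illustration, one colon-separated coding string per line, which simplifies the real CorpusSearch output format):

```python
# Hedged sketch of the core of CorpusExtract: coding strings to CSV rows.
# Real CorpusSearch output has more structure than is assumed here.
import csv

def coding_to_csv(coding_lines, column_names, out_path):
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(column_names)
        for line in coding_lines:
            writer.writerow(line.strip().split(":"))  # one coding string per row

coding_to_csv(["sg:def:subj", "pl:bare:obj"],
              ["number", "det", "function"], "out.csv")
```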

Classical Japanese Poetry

I worked with Catherine Ryu on a corpus study of the Classical Japanese text Hyakunin Isshu, a collection of 100 tanka (five-line poems with a 5-7-5-7-7 mora structure). My part of the project was to generate statistics and visualizations showing macro-level patterns in the syntax of these poems: where, by line and within lines, do different syntactic categories occur, and in what combinations? This involved parsing the poems using the morphological analyzer MeCab (with the Early Middle Japanese version of the UniDic dictionary) and running the results through a suite of Python and R scripts.
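
The parsing step looks roughly like the following sketch, using the mecab-python3 bindings (the dictionary path is a placeholder for the Early Middle Japanese UniDic):

```python
# Sketch of the MeCab parsing step (mecab-python3; placeholder dict path).
import MeCab

tagger = MeCab.Tagger("-d /path/to/unidic-emj")  # Early Middle Japanese UniDic
poem_line = "秋の田のかりほの庵の苫をあらみ"  # first line of Hyakunin Isshu #1
for row in tagger.parse(poem_line).splitlines():
    if row == "EOS":
        break
    surface, _, features = row.partition("\t")
    print(surface, features.split(",")[0])  # surface form and part of speech
```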

Human-robot interaction

When I was a research assistant for the MSU Language and Interaction Research Group (LAIR), I worked on a project to create an end-to-end system allowing a robot to collaborate with a human partner in a game of object naming, which required the robot to use speech and gesture to proactively mediate its representation of the visual scene with that of its human partner.
