Using Computational Biology to Mitigate Path Overlap in Transit Assignment

In the past thirty years, transit assignment has grown from a niche theoretical exercise to a rigorous field of study with a variety of advanced models and methodologies designed to incorporate the unique decision-making process of transit trips in the planning process. A significant boon to the transit modeling community came with the advent of the General Transit Feed Specification (GTFS) in 2006. Prior to 2006, information on transit networks was difficult to obtain and was inconsistently formatted – meaning that software developed for one network may need significant modifications to be applied elsewhere. The motivation of having transit systems searchable on Google Maps proved great enough to convince hundreds of transit systems around the globe to develop their own GTFS feed and make it openly available. This opened the door to a vast amount of research potential, not least the transit assignment community. One particular software, FAST-TrIPs, leveraged the structure of GTFS to create a dynamic transit passenger assignment (DTPA) program with improved computational performance than previous tools. The program takes a detailed (or enhanced) version of GTFS and transit demand data and simulates the transit choices people might make by generating many possible paths for every trip a simulated traveler wants to make and then assigns them to paths probabilistically, incorporating the capacity of the transit vehicles along the way. In this path generation phase, one issue identified by researchers working to take DTPA from research to practice was the presence of path overlap. Path overlap occurs when two or more paths in a path choice set contain many similar features with slight variations, violating independence assumptions in the logit-based path choice model and potentially resulting in a poor representation of transit path choice behavior. In the path generation phase there are also sometimes nonsensical paths included in the path set (multiple transfers between two routes; alighting and then reboarding the same route) that waste computational resources and in some cases (albeit rarely) result in one of those trips being chosen in the DTPA. GTFS organizes the transit network as a collection of linked spreadsheets that organizes the system as stops, trips and routes. Trips are a sequence of stops visited by a transit vehicle at a particular time of the day. It is these trips, and their associated stop times that are used by FAST-TrIPs in the path generation phase. Because GTFS presents these trips as sequences, it presents an opportunity to leverage work with sequences to potentially improve on the path generation phase of DTPA. Computational molecular biology and computer science are fields that have been working to efficiently operate with strings and sequences for decades. Among others, the Longest Common Subsequence (LCS) algorithm is a method used to measure the similarity between a pair, or group of sequences. The sequences can be either sequences of the letters T, A, C and G representing the bases of a DNA molecule or a sequence of stops representing a potential transit trips. Incredibly, this dynamic-programming based method run in linear time – making them very computationally efficient. The goals of the proposed research are broadly to apply the methods of sequence alignment and matching from computational biology, focusing on LCS, to the DTPA problem in three contexts: (1) A priori characterization of the network; (2) Elimination of nonsensical paths during path generation phase; and (3) Mitigation of path overlap through development of metrics to be integrated with the path choice model.