Difference between reported crash time and speed disturbances in urban signalized intersections: a case study in Fortaleza-Brazil

Received: January 29, 2020. Accepted for publication: May 8, 2020. Published: December 15, 2020. Area editor: Sara Ferreira.

ABSTRACT
The advent of new technologies for monitoring and controlling traffic allows the development of more robust road safety studies, based on the collection of disaggregated traffic data at intervals of 1 to 15 minutes or even in real time. However, in order to relate crashes to their precursor conditions, applying this type of data requires better knowledge of the accuracy of reported crash times. This work presents an analysis of the relationship between the reported times of crashes and disturbances in traffic flow conditions at urban signalized intersections in Fortaleza, Brazil. Vehicle flow disturbances were detected from speed oscillations by using an algorithm based on comparing speeds between "typical" and "crash" conditions and by validating the detections with visual analysis. The investigation of 291 crashes showed an average difference of 20 minutes (sd = 23 min) between the reported crash time and the occurrence of the speed disturbance. This indicates that, when developing disaggregated road safety analyses, one should examine the accuracy of the database regarding the reported crash time.


INTRODUCTION
The advent of new technologies for monitoring and controlling urban traffic (such as inductive-loop vehicle detectors, speed cameras and GPS data) provides tools for the development of dynamic traffic flow management. The main benefits of this approach include reducing the negative impacts of congestion and improving road safety performance by identifying, and preferably anticipating, incidents (Roshandel et al., 2015; USDOT, 2014).
Some techniques that use real-time vehicle data are applied mainly in uninterrupted traffic flow regimes. The Variable Speed Limit, for example, is a strategy that aims to homogenize speeds in the traffic flow, providing a more stable traffic regime and therefore reducing traffic conflicts (Yang and Loo, 2016). Another alternative is ramp metering, which controls the entrance of vehicles onto highways and has been reported to reduce rear-end and sideswipe collisions and congestion by about 30% and 8%, respectively (Taylor and Meldrum, 2000). In interrupted-flow environments, traffic control measures are mostly focused on the real-time optimization of traffic signals to reduce the delays experienced by drivers (Day and Bullock, 2017; Feng et al., 2016).
In road safety modeling, the most commonly used approach is the development of safety performance functions that evaluate the relationship between exposure and traffic variables, which are usually aggregated annually or monthly, and the frequency and severity of crashes (Cunto et al., 2011; Hauer, 2004; Qin et al., 2005). However, this temporal aggregation may lead to less useful models, especially when the explanatory variables oscillate within the same day, as is the case for average speed and vehicle volume.
The availability of disaggregated traffic information enables more robust research in which, for example, the effect of speed on road safety can be evaluated (Abdel-Aty and Pande, 2005; Imprialou et al., 2016) or proactive real-time risk management measures for traffic occurrences can be applied (Essa and Sayed, 2019; Huang et al., 2017; Pirdavani et al., 2015). These studies are based on crash precursor indicators, which seek to identify, from data aggregated in intervals of 1 to 10 minutes, disturbances in the traffic flow that may be related to crash occurrences.
The vast majority of crash precursor studies rely on the reported crash time, assumed or estimated, to set the upper limit of the time interval used to define time aggregations (5 min, 10 min, 15 min, etc.). Unfortunately, differences between the reported crash time and the precise crash time may produce another source of bias, especially for studies with short time aggregations such as 1-minute intervals, sometimes leading to the association of crashes with conditions that are not those prior to the crash, or even with scenarios in which conditions are already affected by the crash occurrence (Zheng, 2012).
Furthermore, unlike uninterrupted-flow conditions, the densely signalized urban environment has systematic interruptions in the traffic flow. Unfortunately, the vast majority of studies on traffic flow disturbances were produced for uninterrupted-flow environments, making it difficult to transfer the applied techniques to densely signalized urban environments such as the city of Fortaleza, Brazil. Thus, given the importance of the crash occurrence time for the development of disaggregated analyses and the absence of related studies for the urban environment, the main objective of this work is to propose a method to detect speed disturbances in the interrupted-flow environment and to apply it to urban signalized intersections of Fortaleza in order to quantify the differences between reported crash times and the moment of speed disturbances.
This paper is organized into five sections. This section presented the context, the motivation and the goals of this work. The second section reviews studies that evaluated the detection of traffic disturbances. The third section presents the proposed method for detecting the moment of speed disturbances in an urban environment and for comparing it to reported crash times. The fourth section presents an application of the method to urban signalized intersections of Fortaleza and a discussion of the findings. The last section presents the study conclusions.

TRANSPORTES | ISSN: 2237-1346

VEHICLE FLOW OSCILLATION AND ROAD SAFETY STUDIES
Solomon (1964) presented one of the first studies to investigate the relationship among traffic flow, crash frequency and crash severity using speed data collected in the field. The development of new traffic monitoring technologies and statistical techniques has enabled the advancement of research that, at the state of the art, is aimed at real-time incident risk monitoring strategies (Abdel-Aty et al., 2012; Essa and Sayed, 2019; Pirdavani et al., 2015; Shi and Abdel-Aty, 2015).
While the use of disaggregated data allows more robust modeling, it also requires higher precision in the crash occurrence time. The times reported by the agents responsible for recording the crash, which may be imprecise, may not correspond to the traffic characteristics considered as precursors to the crash, introducing erroneous cause-and-effect relationships (Roshandel et al., 2015; Zheng, 2012). Some of the studies developed on this topic did not consider the estimation of the most probable crash occurrence time (Imprialou et al., 2016; Lee et al., 2006; Pirdavani et al., 2015; Stempfel et al., 2016).
Other authors, in contrast, have incorporated crash time detection techniques into their analyses. Lee et al. (2002) developed a study on crash precursors considering vehicle density and speed variation indicators with 5-minute aggregation on 10 km of freeways in Toronto, Canada. The most likely crash occurrence time was identified from speed information aggregated in 1-minute intervals at the detection loop upstream of the occurrence location. The speed profile was visually inspected and the most probable crash occurrence time was assigned when an abrupt drop in speed was detected.
Zheng et al. (2010) used data aggregated at 20-second intervals from the detection loops upstream and downstream of the crash location to detect the most likely crash time and evaluate the relationship between traffic oscillations and the frequency of freeway crashes. The traffic oscillation identification was based on reductions in speed at the upstream loop and on increases in speed and reductions in volume at the downstream loop. This method, with variations in the temporal data aggregation, is the most common in studies applied to uninterrupted-flow conditions that have continuous inductive loops (Abdel-Aty and Pande, 2005; Lee et al., 2003; Pande and Abdel-Aty, 2006; Zheng, 2012).
Hojati et al. (2014) presented a study that evaluates the time difference between the crash occurrence and the return to typical traffic conditions (recovery time) on 14 km of freeways in Queensland, Australia, from speed and volume indicators aggregated in 5-minute intervals. The authors used speed data, also aggregated in 5-minute intervals, from the upstream detection loop to recognize the precise time of the crash. Average speeds at the time of the crash were compared to typical values on the road and, if substantial reductions (not quantified by the authors) were visually identified in 4 consecutive intervals, then this time was considered the crash time.
Another approach applies fixed delays to the reported times in an attempt to obtain the most adequate crash precursor conditions. Christoforou et al. (2011), for example, considered a fixed delay of 6 minutes in the crash time to collect traffic characteristics; that is, for a crash reported at 9h00m, the authors considered the conditions on the road as of 8h54m. Other researchers adopted the same method with a different fixed delay: (i) Golob et al. (2008): 12 minutes; (ii) Quddus et al. (2010): 30 minutes.
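The fixed-delay approach can be illustrated with a minimal Python sketch. The delay values are those cited above; the function name, the dictionary and the calendar date used in the usage note are hypothetical and serve only for illustration.

```python
from datetime import datetime, timedelta

# Fixed delays (minutes) as cited in the text above.
FIXED_DELAYS_MIN = {
    "Christoforou et al. (2011)": 6,
    "Golob et al. (2008)": 12,
    "Quddus et al. (2010)": 30,
}

def precursor_reference_time(reported: datetime, delay_min: int) -> datetime:
    """Shift the reported crash time back by a fixed delay (hypothetical helper)."""
    return reported - timedelta(minutes=delay_min)
```

For a crash reported at 9h00m on an arbitrary date, `precursor_reference_time(datetime(2020, 1, 1, 9, 0), 6)` yields 8h54m on the same date, matching the example given for Christoforou et al. (2011).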
Unfortunately, most studies on traffic conditions and road safety that use disaggregated data, and consequently on the relationship between traffic disturbances and reported crash times, were carried out in uninterrupted-flow environments, usually using traffic data from downstream and upstream of the crash location. Therefore, there is a gap in the literature regarding research focused on the urban environment, where interrupted flow predominates and which adds features to the analysis such as the signal cycle length and the absence of continuous loop detectors.
Moreover, although there is a consensus that fair estimates of the crash occurrence time should be considered in disaggregated analyses, few studies have evaluated, in quantitative terms, the precision of reported crash times in databases, which is the main purpose of this paper.

METHOD
The first step of the method consists of the consolidation of two databases: (i) road crashes and (ii) the speed camera enforcement system. Fortaleza's Municipal Department of Transportation (AMC) has stored historical road crash data since 2004 in a platform called Traffic Crashes Information System of the City of Fortaleza (SIAT-FOR). SIAT-FOR receives crash information from various sources (AMC officers, the emergency response system SAMU, forensic expertise, hospitals, among others), which is treated and georeferenced by the traffic managers. The database provides important characteristics of the occurrences, such as: date, time (hour and minute), severity (property damage only, PDO; injuries; or deaths), type of crash (rear-end collision, right-angle collision, sideswipe collision, among others) and involved vehicles (motorcycles or bicycles). Speed camera enforcement system data is also available in a structured text file in which the following information is recorded for each vehicle passage: equipment code, passing date, passing time (hour, minute and second), passing lane, posted speed limit, speed measured by the equipment, estimated vehicle length and vehicle classification (motorcycle, car, truck or bus).
The consolidation of the two databases is intended to associate each speed camera with the crashes that occurred (or were georeferenced) in its vicinity. Two measures were implemented to allow greater control of the crash location: (i) only equipment located at signalized intersections was selected; (ii) only crashes within a 30-meter radius of the equipment were collected. These measures were adopted mainly because of the imprecision of the crash georeferencing process, which stems from the way traffic agents record the location of occurrences (number of the neighboring lot or names of the intersecting streets).
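The 30-meter selection can be sketched as follows. This is an illustrative Python example, not the authors' implementation: it assumes that cameras and crashes are available as latitude/longitude pairs in degrees and uses the haversine great-circle distance. The field names (`lat`, `lon`, `id`) and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def crashes_near_camera(camera, crashes, radius_m=30.0):
    """Return the crashes georeferenced within radius_m of the camera."""
    return [
        c for c in crashes
        if haversine_m(camera["lat"], camera["lon"], c["lat"], c["lon"]) <= radius_m
    ]
```

A projected coordinate system with Euclidean distances would serve equally well at this scale; the haversine form is used here only because it needs no external library.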
After associating the reported crash times with the speed camera equipment, the vehicle speed profiles were used to differentiate "typical" conditions from "crash" conditions, i.e., a speed profile considerably lower than under "typical" conditions. Due to the uncertainty around the time reported in the database, a 3-hour interval around each occurrence was evaluated (the reported crash hour plus one hour before and one hour after it). To define the "typical" condition in the traffic flow, five days (one and two weeks before and one, two and three weeks after the crash date) were evaluated considering the same 3-hour time window and the same day of the week. Average speeds were determined over the three-hour interval for each of the five days, and the day whose average was the median of the five averages was selected as typical.
Three criteria for speed disturbance identification were tested, based on the average and standard deviation of the speed under typical conditions and on the speed aggregation interval: if the speed aggregated in x minutes remains below the typical average minus y standard deviations of the speed observations for z consecutive minutes, the traffic flow regime is considered a "disturbance". Each criterion took two values: speed aggregation (x) of 1 or 3 minutes, subtraction of (y) 2 or 3 standard deviations, and duration (z) of 6 or 9 minutes. A pilot study was developed to determine the most effective criterion for detecting speed disturbances. Figure 2 shows an application of the method in which the speed was aggregated in 3-minute intervals (x) and the disturbance detection threshold (thick line) was obtained by subtracting 2 standard deviations (y), computed over the sixty 3-minute speed observations, from the average speed under typical conditions (thin line). For a disturbance to be detected by this criterion, the speed must remain below the thick line for 6 or 9 minutes (z). In total, the pilot study comprised 8 combinations of the three criteria.
The comparisons between typical and crash situations and the detection of the disturbance moment were performed automatically using the software R. For the crashes that could be matched with a speed oscillation triggered by a given combination of criteria, the speed profiles were used to provide additional visual validation of the crash. In this analysis, the graphs corresponding to each crash were examined individually, and those in which it was possible to detect the exact moment of the beginning of the speed disturbance, corresponding to a sudden change in the graph pattern, were considered valid, thus increasing the reliability of the criterion used. The visual validation approach is discussed in more detail in the next section. The best combination of average speed aggregation, standard deviation and duration was the one that produced the highest percentage of automatic detection and the highest number of visually validated cases.
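The typical-day selection and the x/y/z criterion described above can be sketched in Python as shown below. The authors report using R, and their code is not reproduced here, so this is only an approximation under stated assumptions: speeds are already aggregated into x-minute bins over the 3-hour window, the sample standard deviation is used (the text does not say which estimator was applied), and all function names are hypothetical.

```python
import math
from statistics import mean, median, stdev

def pick_typical_day(day_profiles):
    """Return the day whose window-average speed is the median of the averages.

    day_profiles maps a day label to its list of aggregated speeds.
    """
    means = {day: mean(speeds) for day, speeds in day_profiles.items()}
    target = median(means.values())
    return min(means, key=lambda day: abs(means[day] - target))

def detect_disturbance(crash_speeds, typical_speeds, x=3, y=2, z=9):
    """Return the onset (minutes from window start) of the first disturbance.

    A disturbance is a run of at least z minutes in which the x-minute
    aggregated speed stays below mean(typical) - y * sd(typical).
    Returns None when no such run exists.
    """
    threshold = mean(typical_speeds) - y * stdev(typical_speeds)  # assumed: sample sd
    bins_needed = math.ceil(z / x)
    run = 0
    for i, speed in enumerate(crash_speeds):
        run = run + 1 if speed < threshold else 0
        if run == bins_needed:
            return (i - bins_needed + 1) * x
    return None
```

With x = 3 and z = 9, three consecutive bins below the threshold are required; the returned value is the start of the first such run, which corresponds to the disturbance onset that is later compared with the reported crash time.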
Finally, the reported crash times were compared to the detection times of the speed disturbances in order to evaluate the difference between them. The results were segregated by crash severity (PDO and injury), by crash type (right-angle and others) and by the number of approaches with speed cameras that identified the disturbance for the same crash. The last two groups are important because it is assumed that, for example, right-angle collisions (better accuracy of the crash location) and occurrences detected in two or more approaches (possibility of validating the disturbance time) would result in more reliable estimates.
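The comparison step can be sketched as follows. This Python example assumes each validated crash is a record holding its reported time, the detected disturbance onset and the grouping attributes. The record keys and function names are hypothetical, and the difference is taken as reported time minus disturbance time, so that a crash reported before the disturbance yields a negative value.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

def difference_minutes(reported: datetime, disturbance: datetime) -> float:
    """Reported crash time minus disturbance onset, in minutes."""
    return (reported - disturbance).total_seconds() / 60

def summarize_by(records, key):
    """Group differences by records[key]; return {group: (mean, sd, n)}."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(difference_minutes(r["reported"], r["disturbance"]))
    return {
        g: (mean(v), stdev(v) if len(v) > 1 else 0.0, len(v))
        for g, v in groups.items()
    }
```

Calling `summarize_by(records, "severity")` or `summarize_by(records, "crash_type")` would produce the kind of per-group averages and standard deviations discussed in the results.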

RESULTS AND DISCUSSION
In this section, the results are presented and analyzed in three steps. Initially, a brief characterization of the database is shown, followed by the results of the pilot study carried out to define the most effective speed disturbance identification criterion. Finally, the algorithm with the selected criterion is applied to the whole crash database, leading to an analysis of the differences between the reported crash times and the occurrence of the speed oscillations.

Consolidation of the speed camera data and crash database
The speed camera enforcement system of the city of Fortaleza is divided into three main types: signalized intersections, mid-block and exclusive public transportation lanes. As highlighted in the method, in order to increase the reliability of the crash locations, only signalized intersections equipped with speed surveillance were studied. During the period from 2015 to 2017, a total of 83 speed cameras were active, covering 60 intersections in the city. Of these, 38 were equipped in only one approach, 21 in two approaches, and a single intersection had three devices installed. Within a 30-meter radius of each device, 344 crashes were collected in 2015, 384 in 2016 and 229 in 2017, corresponding to a sample of 957 crashes whose types and severities are presented in Table 1. Figure 3 shows the spatial distribution of speed cameras and crashes and an example of a 30-meter buffer around one speed camera in the south of the city. Figure 3 also shows that most of the speed cameras placed at signalized intersections are surrounded by traffic signals, characterizing an interrupted-flow environment.

Pilot study to define the criterion of speed disturbance identification
The speed disturbance identification criterion to be implemented in the algorithm, including the temporal aggregation of the speed indicator, was defined through a pilot study applied to 200 crashes (randomly selected from the sample set) using the combination of three variables in eight different scenarios. As explained before, the evaluated variables were: (i) temporal aggregation of speed in 1 or 3 minutes; (ii) average speed on the typical day reduced by 2 or 3 standard deviations of the speed observations; (iii) disturbance duration for detection of 6 or 9 minutes. Table 2 summarizes the results.
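The eight scenarios are the full factorial of the three two-level variables, which can be enumerated with a short Python sketch. The numbering used in Table 2 is not recoverable from the text, so the ordering below is an assumption, chosen only so that the seventh entry matches the "combination 7 (3 min 2 sd 9 dur)" named later in this section. The key names are hypothetical.

```python
from itertools import product

AGGREGATION_MIN = (1, 3)   # x: speed aggregation interval, minutes
SD_MULTIPLIER = (2, 3)     # y: standard deviations subtracted from the typical mean
DURATION_MIN = (6, 9)      # z: minimum disturbance duration, minutes

# Assumed ordering: aggregation varies slowest, then duration, then sd multiplier.
combinations = [
    {"id": i, "x_min": x, "y_sd": y, "z_min": z}
    for i, (x, z, y) in enumerate(
        product(AGGREGATION_MIN, DURATION_MIN, SD_MULTIPLIER), start=1
    )
]
```

Each entry can then be passed to a detection routine and the resulting detection and visual-validation rates tabulated per combination.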
Table 2 shows that the combinations using 3-minute speed aggregation had considerably higher detection rates than those using 1-minute aggregation. The high oscillation between consecutive speeds aggregated in 1 minute, possibly due to the downstream traffic signal, produces high standard deviations for the speed, thus generating significantly lower thresholds for identifying the disturbance (thick line). Aggregating speeds in 3 minutes smooths the natural oscillation caused by the traffic signal, reducing the dispersion between observations. These situations are illustrated in Figure 4.
Regarding the number of standard deviations and the duration, as expected, increasing the duration from six to nine minutes and using three standard deviations reduced the percentage of automatic detection, but increased the share of cases in which it was possible to visually validate the crash time.
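The effect of the aggregation interval on the threshold can be reproduced with a small Python sketch. The one-minute speeds below are synthetic values invented to mimic a signal-cycle oscillation; they are not data from the study, and the function names are hypothetical.

```python
from statistics import mean, stdev

def aggregate(per_minute_speeds, x):
    """Average consecutive groups of x one-minute speeds (incomplete tail dropped)."""
    n_bins = len(per_minute_speeds) // x
    return [mean(per_minute_speeds[i * x:(i + 1) * x]) for i in range(n_bins)]

def disturbance_threshold(speeds, y=2):
    """Typical mean minus y sample standard deviations."""
    return mean(speeds) - y * stdev(speeds)

# Synthetic one-minute average speeds (km/h), illustration only.
one_minute = [50, 20, 35, 48, 22, 38, 52, 18, 32, 49, 21, 36]
three_minute = aggregate(one_minute, 3)

threshold_1min = disturbance_threshold(one_minute)
threshold_3min = disturbance_threshold(three_minute)
```

Because the 3-minute bins average out the cycle-driven swings, their standard deviation is much smaller and `threshold_3min` sits far closer to the typical mean than `threshold_1min`, which is the mechanism the paragraph above describes. Averaging one-minute means is a simplification: with raw vehicle passages, each bin would be the mean of all speeds recorded in it.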
Besides the failure to meet a given criterion, other hypotheses may explain the cases in which it was not possible to match a reported crash to the speed profile given by the speed cameras: (i) some crashes, although georeferenced at the intersection, only impacted the approach without electronic enforcement; (ii) some of the crashes occurred upstream of the loop, so only the volume indicator would be impacted; (iii) some low-severity crashes did not affect the speed on the street enough to be detected by the algorithm.
For the visual validation procedure, Figure 5 provides two examples in which a speed disturbance was detected by the algorithm but, during the complementary visual analysis, it was not possible to determine the initial oscillation time, making it difficult to estimate the crash occurrence time. In Figure 5a (rear-end collision) the algorithm detected a disturbance around 10h10m; however, as the speed did not present a constant reduction pattern, it was not possible to specify the crash occurrence time. The speed profile in Figure 5b (sideswipe collision) does not allow any specific disturbance to be associated with the crash occurrence. One hypothesis for such long-lasting below-average behavior is that the speed reduction may have caused the crash (a lane obstructed by vehicles, for example) rather than the other way around.
Among all combinations, 5 and 7 presented the best results. Combination 7 (3 min 2 sd 9 dur) was adopted to conduct the complete analysis due to its higher rate of visual confirmation. Even though the algorithm's rate of confirmed detection was approximately 26% (36% detection times 73% visual confirmation), it is still possible to achieve the central purpose of evaluating the differences between the reported crash times and the speed disturbances, since an extensive crash database was considered.
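The confirmed detection rate quoted above is simply the product of the two pilot-study rates, which can be written out as:

```latex
\[
  \underbrace{0.36}_{\text{automatic detection}} \times
  \underbrace{0.73}_{\text{visual confirmation}} \approx 0.26
\]
```

In other words, roughly one in four pilot crashes ended up with a visually confirmed disturbance onset.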

Analysis of speed disturbances and comparison with reported crash times
The analysis of the traffic disturbances was divided into four scenarios according to the type of crash and to the number of approaches with loop detectors that identified the disturbance: (1) disturbances detected by at least two approaches associated with right-angle collisions; (2) disturbances detected by at least two approaches associated with other types of crashes; (3) disturbances detected by only one approach associated with right-angle collisions; (4) disturbances detected by only one approach associated with other types of crashes. This division reflects, in descending order, the confidence level in estimating the most likely crash time. For example, scenario 1 allows the most likely crash occurrence time to be validated from the information in one of the approaches. In scenario 4, on the other hand, the confidence regarding the georeferencing of the crash is lower, which may impact the detection quality. It should be noted that approximately 31% of the crashes registered in the database were classified only as "collision", and were thus included in the Other Types category.
In general terms, of the 957 investigated crashes, speed disturbances were automatically identified in 412 (43.1%) by using the proposed criteria combination 7. Of the crashes detected by the algorithm, the complementary visual analysis allowed the exact detection of the speed disturbance moment in 291 (70.6%), keeping the performance close to that obtained in the pilot study. Figure 6 shows a summary of the detections per scenario. Figures 7a and 7b provide the speed profile for two approaches of the same intersection at the time when a right-angle collision was reported. The two loops detected the speed disturbance in the same hour, 3 minutes after the reported time.
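The four-scenario split and the detection rates quoted above can be checked with a short Python sketch. The function names and the crash-type label strings are hypothetical; the counts are those reported in the text.

```python
def classify_scenario(crash_type: str, approaches_detected: int) -> int:
    """Map a validated crash to scenarios 1-4 as described in the text."""
    if approaches_detected < 1:
        raise ValueError("at least one approach must have detected the disturbance")
    right_angle = crash_type == "right-angle"
    multiple = approaches_detected >= 2
    if multiple:
        return 1 if right_angle else 2
    return 3 if right_angle else 4

def percentage(part: int, whole: int) -> float:
    """Share of part in whole, in percent, rounded to one decimal place."""
    return round(100 * part / whole, 1)

detection_rate = percentage(412, 957)    # automatically detected / investigated
validation_rate = percentage(291, 412)   # visually validated / detected
overall_rate = percentage(291, 957)      # visually validated / investigated
```

The last quantity is not stated explicitly in the text; it follows from the reported counts and gives the share of all investigated crashes for which a disturbance time could be established.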
Most of the disturbances detected and visually validated for scenarios 1 and 2 followed the profile presented in Figure 7, allowing the precise identification of the disturbance time. Figure 8 shows two disturbances detected and confirmed by visual analysis in only one loop (scenarios 3 and 4). Figures 8a (right-angle collision) and 8b (sideswipe collision) show situations in which it was possible to determine precisely the moment at which the speed disturbance was triggered.
Table 3 shows the observed differences between the reported crash time and the speed disturbance per scenario. For the case with the highest confidence in the difference (scenario 1), there was an average delay of 12 minutes between the speed oscillation occurrence and the time reported in the database. Considering all 291 crashes, the difference was higher: an average delay of 20 minutes. In all scenarios there were also crashes reported before the disturbance identification (estimated difference less than zero). Figure 9 presents a right-angle collision for which the speed disturbance was detected 24 minutes after the reported crash time.
Table 4 shows the statistics of the differences by crash severity. There is a difference in the average between crashes with injuries and deaths and those with property damage only (p-value = 0.07), suggesting a comparatively shorter emergency response time for crashes with victims. The lower average found for scenario 1 in the previous analysis may be related to the preponderance of crashes with injuries in this scenario. The results found for the city of Fortaleza indicate differences between the reported crash occurrence time provided in the databases and the estimated time of serious disturbances in traffic. These divergences would severely impact the validity of studies that evaluate crash precursor conditions.
For example, if a study using data aggregated in five-minute intervals applies a fixed "correction" of 33 minutes (75th percentile) to the crash time, about 25% of the crash sample would be related to erroneous precursor conditions that might already be impacted by the crash. To illustrate further, take a crash that was reported at 20h40m but actually occurred at 20h00m; if one applies a correction of 33 minutes, or even worse, uses the reported time, one would associate the crash with post-crash circumstances. Therefore, one feasible solution for this issue is to consider only the crashes for which it was possible to identify the most probable occurrence time.
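The 20h40m example can be worked through in a minimal Python sketch. It assumes the precursor window is the aggregation interval that ends at the corrected crash time; the function name, the labels and the calendar date are hypothetical, and in practice the true crash time is of course unknown.

```python
from datetime import datetime, timedelta

def classify_window(reported, true_time, correction_min, aggregation_min=5):
    """Say whether the precursor window lies before or after the true crash time.

    The window is assumed to end at the reported time minus the correction
    and to span aggregation_min minutes.
    """
    window_end = reported - timedelta(minutes=correction_min)
    window_start = window_end - timedelta(minutes=aggregation_min)
    if window_end <= true_time:
        return "pre-crash"
    if window_start >= true_time:
        return "post-crash"
    return "straddles crash"

reported = datetime(2017, 1, 1, 20, 40)   # arbitrary date, reported at 20h40m
true_time = datetime(2017, 1, 1, 20, 0)   # hypothetical true occurrence at 20h00m
```

With a 33-minute correction the window runs from 20h02m to 20h07m, entirely after the true crash time, so `classify_window(reported, true_time, 33)` returns `"post-crash"`. Only a correction of at least 40 minutes would place the window wholly before the crash in this example.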

CONCLUSION
This work presented an evaluation of the difference between reported crash times and disturbances in vehicle flow conditions at urban signalized intersections in the city of Fortaleza-Brazil. The detection of vehicle speed disturbances was performed using an algorithm that compared speeds in intervals on typical (crash-free) days with those in the interval in which the crash was reported. Then, a complementary visual analysis validated the automatically identified disturbances.
A comparison between the levels of disturbance detection with speed aggregations of one and three minutes showed that the former is not suitable for the signalized environment, since the short timeframe is impacted by the signal cycle. Aggregations of three minutes were able to minimize the signal cycle effects on the average vehicle speed. The best-performing algorithm presented a 43.1% disturbance detection rate, of which 70.6% were visually validated. Disturbances close to the occurrence time of right-angle collisions that were detected in two approaches (two loop detectors) were considered the most reliable for estimating the difference between the reported time and the disturbance, showing an average value of 12 minutes (sd = 18 min). Based on all crashes (291) and speed disturbance scenarios, the average difference between the reported time and the disturbance was 20 minutes (sd = 23 min).
Given the observed differences, the main recommendation of this study is that, before developing disaggregated road safety analyses, one should examine the reported crash times registered in databases in order to evaluate possible differences between these times and the moment at which the crash probably occurred. Otherwise, one takes the risk of associating crashes with conditions that did not precede them.
The main limitation of the proposed method is that it considered only the speed indicator to detect disturbances in the traffic flow, not the traffic volume indicator. For crashes that happened upstream of the loop detectors, only the traffic volume would be impacted. Performing the analyses with both indicators would therefore probably increase the algorithm's disturbance detection rate.
For future work, it is recommended to apply techniques that consider both speed and traffic volume indicators and to develop similar studies in other cities and countries in order to measure the differences between the reported crash times and the moment of disturbances in the traffic flow. Another possibility is a deeper investigation of the features that affect the size of those differences, such as the location of the crash, traffic flow conditions before the crash, time of day and day of the week. A further step would be to associate the crashes for which the most probable time of occurrence was estimated with their precursor conditions, such as speed, traffic volume and vehicle headway, considering short temporal aggregations. The development of studies of this type in urban signalized environments would help to fill an existing gap in the literature.