Text-Based Traffic Panels Detection using the Tiny YOLOv3 Algorithm

Lately, traffic panel detection has attracted the attention of both academia and industry. This study proposes a new categorization method for traffic panels, which classifies them into three classes: symbol-based, text-based, and supplementary/additional traffic panels. Although few studies have investigated text-based traffic panels, this type is examined in detail in this study. This type of traffic panel poses many challenges, such as different languages in different countries, similarity to other text panels, and the lack of good-quality datasets. The panels must be detected first to recognize the text with reasonable accuracy. Since few public text-based traffic panel datasets exist, this study gathered a novel dataset of Persian text-based traffic panels from streets all over Tehran, Iran. This dataset includes two collections of images. The first collection has 9294 images, and the second has 3305 images. The second collection is more uniform than the first. Thus, the second is used as the main dataset and the first as an additional dataset: the additional dataset is used for pre-training and the main dataset for training the network. The tiny YOLOv3 algorithm, which is faster and less complex than YOLOv3, is used for pre-training, training, and testing to examine the utility and advantages of the data. K-fold cross-validation is used to estimate the model's skill on new data. The model achieves 0.973 for Precision, 0.945 for Recall, and 0.955 for F-measure.


Introduction
Countries define rules to guarantee a safe traffic system, and all traffic users, such as drivers and pedestrians, need to obey these rules. Traffic panels/signs play a crucial role in this system: drivers observe the panels and act on the information they provide. Drivers may intentionally or unintentionally ignore traffic panels in different circumstances (e.g., when the vehicle is travelling at high speed or the driver is distracted), which can cause terrible accidents. Relying entirely on humans, who are mentally and physically fallible, is hazardous. Therefore, vehicles equipped with advanced technologies play a significant role in the safety of the traffic system and in protecting human lives. Traffic panel detection is considered a key technology in intelligent transportation systems. It has various uses, including driver assistance systems, unmanned autonomous vehicles, and road panel maintenance [1]. Previous works divided traffic panels into two categories: I) symbol-based and II) text-based traffic panels. This study classifies traffic panels into three classes: I) symbol-based, II) supplementary/additional, and III) text-based traffic panels.

Symbol-Based Traffic Panels
Many studies have investigated this type of panel [2]-[23]. Instances of symbol-based traffic panels can be seen in Figure 1.

Supplementary/Additional Traffic Panels
The traffic panels in this group complement the meaning of symbol-based traffic panels. They contain 'text,' 'arrow,' 'pictograph,' or 'a combination of text and pictograph' [24,25,26,27,28,29,30]. These panels are generally rectangular and white, and they are found under symbol-based traffic panels. This type is highly varied in Germany, whereas only a few exist in Iran. Since this group contains fewer traffic panels than the other two groups, it has attracted less attention. Figure 2 shows instances of supplementary/additional traffic panels in Germany, and Figure 3 shows one instance in Iran.

Text-Based Traffic Panels
Text-based traffic panels include text and arrows. The text delivers rich and valuable semantic information and plays a fundamental role in the traffic system and in intelligent transportation systems. It indicates the correct routes, warns about possible dangers, and states whether access to a road is permitted or prohibited. Automatic detection of these panels can alert the driver about the traffic and the environment. Some systems can read the text aloud using a synthesized voice or combine it with navigation systems such as GPS. Such systems act as assistants and aim to increase drivers' attention for a safer traffic environment. Fewer studies have been done in this category than in the symbol-based category [31]-[44], because of the challenges listed below [1].
• There are not enough suitable public datasets.
• The texts are written in different languages in different countries.
• Sometimes the panels are not clearly visible in various weather conditions, such as snowy, rainy, sunny, foggy, or cloudy weather.
• There are other text boards in the street that look like text-based traffic panels, such as billboards, store name boards, and advertisements on vehicles.
• The colors of the panels and texts fade due to long exposure to sunlight and the reaction of the paint with the air.
• Some panels have been physically damaged.
• Some panels are obstructed by trees, vehicles, and other objects.
To the best of our knowledge, few benchmark datasets for text-based traffic panel detection are publicly available [45,40]. Hence, this study gathered a novel public dataset all over the city of Tehran, Iran, called the 'Persian Text-Based Traffic Panel Dataset' [46]. In addition to the general challenges mentioned above, the Persian dataset has its own specific challenges: I) As shown in Figure 4, some traffic panels are not uniform in color; in other words, they combine multiple colors. II) There are a few green and white rectangular panels that contain only symbols and no text, as shown in Figure 5. Since these panels look like text-based traffic panels, the network might mistakenly detect them. III) Persian text-based traffic panels contain text in two languages, Persian and English. Therefore, the text is small, and the characters are unreadable from a relatively long distance.

Contribution
The contributions of this study are as follows. This paper provides a concise and useful review of the literature on different types of traffic panels. It categorizes traffic panels in a novel and comprehensive manner and, for each category, presents the newest related works on detecting traffic panels with traditional and non-traditional methods. It introduces a new dataset and describes the data collection, the traffic panel detection technique, and the associated challenges in detail. It evaluates the dataset with a state-of-the-art algorithm, tiny YOLOv3, to analyze the efficiency and benefits of the data. The evaluation and results section discusses the algorithm's performance on the dataset extensively. The results show that the algorithm achieves 0.973 for Precision, 0.945 for Recall, and 0.955 for F-measure.

Organization
The study is structured as follows. Recent works on different types of traffic panels are reviewed in section 2. The algorithm used is explained in section 3. The steps of data collection and labeling are described in section 4.
The evaluation and results are presented in section 5, and the discussion in section 6. Finally, the conclusion and perspectives on future studies are presented in section 7.

Related Work
This section discusses related works in different categories of traffic panels.

Symbol-Based Traffic Panel Detection
Detection methods for symbol-based traffic panels are generally divided into three kinds: I) color-based methods, II) shape-based methods, and III) deep learning methods.

Color-Based Methods
Color-based methods usually use a normalized color space, such as RGB (red, green, blue), YUV, or HSI (hue, saturation, intensity), with specific thresholds. Segmentation means separating the traffic panels from the road background. Because the color and shape of traffic panels are often damaged, it is not easy to distinguish their position in outdoor images. Most segmentation techniques use color: since traffic panels are separated from the background based on color information, segmentation depends on the thresholds applied to the input image in various color spaces. First, segmentation is performed using specific color ranges; shape detection follows in the next step. Many researchers in this field face difficulties with white color and uneven lighting [1]. In [47], the average of the red and green components is calculated for each pixel, and a G-R histogram is built from the G-R values. Many researchers use the HSI color space because it models human color perception better than RGB, whereas some researchers have obtained better results in the YCbCr and YUV color spaces [48].
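The thresholding step described above can be sketched in pure Python. This is a minimal illustration, not the method of any cited work: it uses the standard `colorsys` module, which provides HSV rather than HSI, and the function name `hsv_threshold_mask` and the hue/saturation/value thresholds for a blue panel are hypothetical placeholders.

```python
import colorsys


def hsv_threshold_mask(image, hue_range, min_sat, min_val):
    """Mark pixels whose HSV values fall inside the given thresholds.

    image: list of rows; each row is a list of (R, G, B) tuples in 0..255.
    hue_range: (low, high) hue bounds in degrees, 0..360.
    min_sat: minimum saturation in 0..1 (rejects white and grey pixels).
    min_val: minimum value/brightness in 0..1 (rejects very dark pixels).
    Returns a binary mask with the same shape as image (1 = candidate pixel).
    """
    low, high = hue_range
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # colorsys expects channels in 0..1 and returns h, s, v in 0..1
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hue_deg = h * 360.0
            inside = low <= hue_deg <= high and s >= min_sat and v >= min_val
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)
    return mask


# Hypothetical thresholds for a blue panel background.
image = [
    [(0, 0, 255), (255, 255, 255)],  # pure blue, white
    [(255, 0, 0), (0, 0, 20)],       # red, near-black blue
]
print(hsv_threshold_mask(image, hue_range=(200, 260), min_sat=0.4, min_val=0.2))
# [[1, 0], [0, 0]]
```

The saturation threshold is what removes white pixels here; as noted above, white and unevenly lit panels are exactly where such fixed thresholds struggle, which is why some methods handle white separately.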

Shape-Based Methods
The particular shapes of traffic panels, such as circles, triangles, and squares, make them distinguishable from other panels. Shape-based methods use the Hough transform and its variants. As in general object detection, a sliding window with region classification is used to check for the presence of a traffic panel in the current window. Shape is an essential piece of information in the recognition of traffic panels. At the detection stage, techniques such as connected components, split-and-merge, edge detection, and clustering are used to group pixels based on color similarity measures. Attributes such as the aspect ratio, width, height, perimeter, and area of the bounding box are extracted empirically to obtain candidate blobs, which are then verified. Several features can be extracted from a blob to classify its shape more reliably [49]. For example, right, left, bottom, and top distance-to-border (DtB) vectors are produced for each segmented blob. Li et al. [50] used only the right and left DtB vectors to detect the shape. In [51], the feature is the combination of distance vectors from the blob's center to its outer edge, i.e., distance-from-center vectors. These feature vectors are used to classify the shapes with a linear support vector machine. In [52], an alternative method uses the signature of the connected components, computed directly from the blob: the signature is the distance from the center to the object border as a function of the angle. The magnitude of the fast Fourier transform (FFT) of each shape signature is used to make the descriptor insensitive to the panel's rotation. Computing the discrete Fourier transform directly yields the same magnitudes but is not preferred because of its higher computational cost [53].
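The signature-plus-Fourier idea of [52] and [53] can be illustrated with the pure-Python sketch below. It is not the cited authors' code: the angular binning, the 64-bin resolution, and normalizing by the DC term are illustrative assumptions, and `radial_signature` and `fourier_descriptor` are hypothetical names. The hand-written radix-2 FFT returns the same coefficients as a direct DFT, but in O(n log n) rather than O(n²) operations.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def radial_signature(boundary, n_bins=64):
    """Distance from the blob centroid to its boundary as a function of angle.

    boundary: list of (x, y) boundary points of one segmented blob.
    Each angular bin keeps the largest centroid-to-boundary distance seen in it.
    """
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    signature = [0.0] * n_bins
    for x, y in boundary:
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        b = int(angle / (2 * math.pi) * n_bins) % n_bins
        signature[b] = max(signature[b], math.hypot(x - cx, y - cy))
    return signature


def fourier_descriptor(signature):
    """|FFT| of the signature, divided by the DC term to remove scale."""
    mags = [abs(c) for c in fft(signature)]
    dc = mags[0] if mags[0] > 0 else 1.0
    return [m / dc for m in mags]


sig = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rotated = sig[3:] + sig[:3]  # same shape, different starting angle
a, b = fourier_descriptor(sig), fourier_descriptor(rotated)
print(max(abs(p - q) for p, q in zip(a, b)) < 1e-9)  # True
```

A rotation of the panel circularly shifts its signature, which changes only the phases of the Fourier coefficients, so taking the magnitudes yields a rotation-invariant descriptor that a classifier such as an SVM can consume.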
In [54], the histogram of oriented gradients (HOG) descriptor is extended by integrating color information into the feature vector to improve performance. Since road panels have high color contrast, some studies use the pixel values themselves as features for shape definition [3].

Deep Learning Methods
Recent advances in deep learning have inspired researchers to use neural networks to detect and recognize traffic panels. Despite the reasonable performance of detectors based on hand-crafted features, most non-deep-learning systems cannot correctly detect a large number of traffic panels [15]. Wu et al. converted the original image into a grayscale image using support vector machines (SVM) and used convolutional neural networks (CNN) with fixed and learnable layers to detect the traffic panels [55]. Sun et al. used a hybrid model of a CNN and a twin SVM in a recent study [18]. In [56], a CNN-based traffic panel detection and recognition algorithm was presented. The system targets traffic panels, Chinese characters, English letters, and digits, and uses a multi-task CNN trained to extract features for classifying and localizing different traffic panels and texts. In a more recent study, Ahmed et al. proposed a CNN-based approach that includes a CNN-based challenge classifier [20]. In [57], an AdaBoost classifier and a local binary pattern feature detector are combined to extract the Region of Interest (RoI), and cascaded CNNs are used to reduce the negative RoI samples for traffic panel recognition. Liu et al. also used a cascade saccade machine learning network with hierarchical classes for traffic panel detection [22]. In [58], Abdi et al. presented a novel real-time method based on cascade deep learning and augmented reality (AR) that provides a fast and precise framework for traffic panel recognition; it superimposes augmented virtual objects onto a real scene under all kinds of driving conditions, including adverse weather. Kamal et al. merged the segmentation architectures SegNet and U-Net to detect traffic panels in video sequences [11]. Other studies used CNN and Mask R-CNN to detect symbol-based traffic panels [12,14].
Various studies used different versions of the You Only Look Once (YOLO) algorithm, namely YOLOv3, YOLOv4, and YOLOv5, to recognize traffic panels [16,17,23]. Wang et al. presented a new traffic panel recognition method that combines a lightweight superclass detector with a refinement classifier [59]. Nadeem et al. presented a new dataset from Pakistan and used transfer learning to recognize traffic panels [21].

Additional/Supplementary Traffic Panel Detection
Since supplementary or additional panels often appear below symbol-based traffic panels, most detection methods first identify the symbol-based panels, then specify a region of interest (RoI) below them, and finally search this RoI for additional traffic panels. Following this approach, Nienhuser et al. presented an algorithm that detects additional panels together with the other panels. This method uses the Hough transform for lines and geometric constraints to find additional panels, and then uses SVM classification to verify the candidate shapes [24]. Hamdoun et al. used an MLP neural network to detect the rectangular shape in a region below speed-limit panels [30]. Puthon et al. presented a method to detect panels with 'arrow,' 'pictogram,' or 'text or mixed' content and then classified them with pyramid-HOG features [25]. Wenzel et al. used a Maximally Stable Extremal Regions (MSER) based approach to detect additional panels [26]. In their next study, they found corner areas using aggregated channel features, and then used quadrangle generation and filtering to handle the large aspect-ratio variation of supplementary panels [28]. In their earlier work, they introduced a complete pipeline for recognizing the text with optical character recognition (OCR): they assumed a specific additional panel, classified its layout, determined the bounding boxes of its content using regression, and then applied a multi-class classification step or, if necessary, a text sequence classifier [27]. Weber et al. proposed CNNs to classify the road condition and to recognize 'when wet' additional panels, which restrict the validity of the main panel to wet conditions [29].

Text-Based Traffic Panel Detection
Fewer studies have addressed text-based traffic panel detection than symbol-based traffic panel detection. This study divides text-based traffic panel detection methods into two categories: I) methods that extract features in a traditional way, and II) methods that extract features using deep learning algorithms.

The Methods That Extract the Features in a Traditional Way
Wu et al. [35] used properties of traffic panels, such as geometric constraints and color distribution, to distinguish them from other objects. Finally, they used a vertical plane criterion for traffic panel detection. Reina et al. [60] considered traffic panels in Spain. They used the HSI color space for thresholding together with shape classification techniques. They detected blue pixels by applying an appropriate threshold and used achromatic decomposition to detect white pixels. They then labeled the connected components, passed all blob candidates through a selection process, and removed some of them based on their aspect ratio and size. Xavier et al. [32] found the edges of the input image using a Canny edge detector and used the Hough transform to find straight lines among the detected edges. Then, they determined the candidate regions (top, right, and left of the road) that contain the traffic panels and used HSV thresholding to detect text-based traffic panels in these regions. Finally, they vertically aligned the detected panels using the random sample consensus (RANSAC) algorithm and homography techniques. In [36], the images were taken from the Google Street View service, which provides high-resolution 360° panoramic views of streets and roads worldwide. The authors used a text localization algorithm and detected the panel characters using MSER, which they also used to identify traffic panels as rectangles with high text density. Since many of the traffic panels studied in that paper have a blue background, a blue segmentation method was applied to detect large rectangular regions. Greenhalgh et al. [37] specified the RoI of text-based traffic panels by determining the vanishing point and the sides of the road. Text-based traffic panel candidates were detected within the RoI using MSER and HSV thresholding.
The number of candidates was then reduced using temporal and contextual information. Gonzalez et al. [38] detected text-based traffic panels using Bag of Visual Words (BoVW) and color segmentation techniques, applying BoVW to the specific parts of the images identified by blue and white masking. Khodayari et al. [61] processed the input images in HSV space and then detected the traffic panels using fuzzy logic. In the fuzzy images, the properties of each pixel, such as brightness and color, are represented by linguistic values; yellow, green, light green, white, dark red, and dark blue are examples of the linguistic values assigned to each pixel. Korghond et al. [45] presented a dataset of traffic panel videos recorded from a moving vehicle in urban areas. They categorized the video sequences by their features so that researchers can select the part of the data that serves their research purpose.
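The connected-component labeling and blob filtering used by Reina et al. [60], and by many of the segmentation-based detectors above, can be sketched as follows. This is a generic illustration, not their implementation: 4-connectivity, the minimum area, the aspect-ratio range, and the function names are hypothetical choices.

```python
from collections import deque


def label_components(mask):
    """4-connected component labeling of a binary mask via breadth-first search.

    Returns one (x_min, y_min, x_max, y_max, pixel_count) tuple per blob,
    in the order in which blobs are first met in a row-major scan.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            count = 0
            while queue:
                cx, cy = queue.popleft()
                count += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            blobs.append((x0, y0, x1, y1, count))
    return blobs


def filter_blobs(blobs, min_area, aspect_range):
    """Keep blobs whose pixel count and width/height ratio are plausible for a panel."""
    low, high = aspect_range
    kept = []
    for x0, y0, x1, y1, count in blobs:
        width, height = x1 - x0 + 1, y1 - y0 + 1
        if count >= min_area and low <= width / height <= high:
            kept.append((x0, y0, x1, y1))
    return kept


mask = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
blobs = label_components(mask)
print(blobs)  # [(0, 0, 3, 1, 8), (5, 1, 5, 3, 3)]
print(filter_blobs(blobs, min_area=4, aspect_range=(1.0, 4.0)))  # [(0, 0, 3, 1)]
```

In practice the mask would come from a color threshold, and the size and aspect-ratio limits would be tuned to the panel types of interest; wide text-based panels, for example, call for a larger upper aspect-ratio bound than circular symbol-based signs.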

The Methods That Extract the Features Using Deep Learning Algorithms
Rong et al. [31] proposed a new Cascaded Localization Network. They detected candidate traffic panels in each frame of a sequence of consecutive frames using features learned by a CNN. They also collected a novel dataset of traffic guide panels to train and evaluate the framework. Zhu et al. [62] collected a text-based traffic panel dataset containing English and Chinese traffic panels and used a fully convolutional network to segment candidate traffic panels. Luo et al. [42] proposed a new system for recognizing both symbol-based and text-based traffic panels. They applied MSER on grey and normalized RGB channels to extract RoIs from each video frame, then trained a multi-task CNN with images labeled from street views and synthetic traffic panels and used it to recognize the panels. Eun et al. [33] used a region proposal network to detect Korean character candidates and then classified them with a classification network. Jain et al. [34] applied MSER and OCR to text-based traffic panels to recognize their texts. Peng et al. [43] presented a two-stage, deep learning-based cascade detection model to detect traffic panel text in natural scenes; they found the panels' RoIs using an improved Single Shot MultiBox Detector (SSD) network. Zhang et al. [44] addressed false and missed detections by proposing a cascaded R-CNN that obtains multi-scale features from feature pyramids. Every stage of the cascaded network except the first combines the output bounding boxes of the previous stage for joint training, which benefits traffic panel detection. They also presented a multi-scale attention technique that obtains weighted multi-scale features using softmax and dot products, highlighting traffic panel features and improving detection accuracy by refining the features. Zhang et al. [63] proposed a new detection method called MSA_YOLOv3 to detect small traffic panels precisely.
They performed data augmentation using the image mix-up technique and introduced a multi-scale spatial pyramid pooling block into the Darknet53 network to help the network learn object features more comprehensively. Guo et al. [39] proposed an algorithm to detect Chinese traffic panels with mixed horizontal and vertical text in the street. To distinguish the traffic panels from objects of similar color in complicated backgrounds such as street scenes, features from the red, green, and blue channels were effectively combined. Since Chinese text lines on text-based traffic panels are usually either vertical or horizontal, the technique constructs text lines from the structural information and positions of the characters. Boujemaa et al. [40] presented a novel public multi-task dataset for detecting text-based traffic panels. Bagi et al. [41] presented an end-to-end trainable deep neural network that can recognize multi-oriented text instances in adverse weather conditions.

The Tiny YOLOv3 Algorithm
Thanks to the continuous growth of computer hardware, CNN-based deep learning methods have advanced rapidly and achieved remarkable results in computer vision and machine vision [64]. Traffic panel detection is a subset of the more general field of object detection. Object detection is a relatively old subject; however, despite the efforts made so far, it is still a complex problem and an active research field. Various object detection algorithms have been proposed, and the best known currently are SSD [65], R-CNN [66], and YOLO. YOLO has recently become widespread for object detection. Joseph Redmon et al. proposed its first three versions (v1-v3) [67,68,69], and the fourth and fifth versions were presented by [70] and [71], respectively. YOLO treats object detection as a regression problem: an end-to-end network classifies the objects and outputs their locations in a single step. Its most noteworthy feature is speed; it is one of the fastest detection algorithms to date. YOLOv3 uses K-means clustering to choose anchor boxes suited to the dataset. YOLOv3 is computationally demanding and requires powerful hardware. It has 256 layers and uses Darknet-53, which has 53 convolutional layers, as its backbone network. Tiny YOLOv3 is a smaller and faster version of YOLOv3. Its architecture is less complex and easier to implement for small datasets. In this study, it is used for pre-training, training, and testing. The tiny YOLOv3 model used in this study has 46 layers in total: 13 convolution, 11 Leaky ReLU, 11 batch normalization, 6 max pooling, 3 input, 1 concatenate, and 1 upsampling layer. Table 1 summarizes the features of the tiny YOLOv3 architecture.
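The anchor-selection idea can be sketched as K-means over the widths and heights of the ground-truth boxes, using d = 1 − IoU as the distance so that large boxes do not dominate the clustering as they would with Euclidean distance. This is a simplified illustration, not the Darknet implementation; the function names, random initialization, and iteration limit are assumptions. Tiny YOLOv3's default configuration uses six anchors, so k = 6 would be the natural choice on a real label set.

```python
import random


def wh_iou(box, anchor):
    """IoU of two boxes given as (width, height), assuming they share a center."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster ground-truth (width, height) pairs with distance d = 1 - IoU."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # nearest anchor = highest IoU, i.e. smallest 1 - IoU
            best = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                new_anchors.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:
                new_anchors.append(anchors[i])  # keep the old anchor of an empty cluster
        if new_anchors == anchors:
            break  # assignments no longer change
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Toy (width, height) pairs: three small panels and three large panels.
boxes = [(10, 20), (12, 22), (11, 21), (100, 50), (110, 55), (105, 52)]
print([(round(w, 2), round(h, 2)) for w, h in kmeans_anchors(boxes, k=2)])
# [(11.0, 21.0), (105.0, 52.33)]
```

If the real panels are systematically smaller or larger than the anchors the network starts from, some clearly visible panels can be missed, so re-clustering anchors on the target dataset is a common design choice.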

Dataset Collection
Before describing how the dataset was collected, the characteristics of Persian text-based traffic panels in Iran are explained. These traffic panels convey a particular meaning through their shape and color. The colors of this type of traffic panel have the following meanings:
• Green: route guide for highways and religious places
• Blue: route guide for freeways, and service guide
• Black and white: route guide for other roads
• Yellow: general warnings and warnings for construction or repair operations
• Orange: guide for administrative, training, and service areas
• Brown: guide for recreational, cultural, and tourism areas
The shapes of text-based traffic panels have the following meanings:
• Horizontal rectangle: warning for repair and maintenance
• Vertical rectangle: guide for imperative and service panels
• Flag rectangle: guide for routes
Since few public datasets focus on text-based traffic panels, this study collected a novel, challenging dataset of traffic panels with Persian and English text, called 'Persian Text-Based Traffic Panels.' The images were taken with diverse smartphone cameras of various specifications through crowdsourcing. Most images were taken all over the city of Tehran, Iran, mainly on highways and streets. Two collections of images were gathered. First, 4000 images were collected in different situations; instances of these images can be seen in Figure 6. The images were augmented to 9294 and then labeled. The tiny YOLOv3 algorithm was used for training and testing to examine the utility and advantages of the data. However, despite positive expectations, the model did not achieve suitable performance on this dataset. A possible reason is the large variety of text-based traffic panels and situations relative to the total number of images (9294); in other words, the dataset contained many challenges.
This dataset was compared with the publicly available part of the dataset used in [31]. Since most text-based traffic panels in that dataset are green and look similar to one another, it is nearly uniform and poses few challenges. Hence, training and testing a model on it is more straightforward than on our dataset and yields more accurate results. Therefore, 1500 new images, more uniform than the images in the first collection, were taken. These images were augmented to 3305 and labeled. This collection was called the 'main dataset' and was used for training the algorithm, while the first collection was used to pre-train the algorithm and was called the 'additional dataset.' The dataset collection steps are explained below, starting with the conditions under which the images were taken:
• Some images were taken in a static condition.
• Some images were taken from a moving car and may have been zoomed (when the traffic panel was far away).
• The images were taken at various distances.
• The images were taken in different weather conditions (e.g., sunny, cloudy, and rainy).
• The images were taken at different times of day under different lighting conditions (e.g., morning, noon, evening, and night).
• Some images were taken of traffic panels that have been physically damaged.
• Most of the images were taken in the city, with complex backgrounds.
• Some images were taken when the lights of cars and the lamps in the middle of highways and streets illuminated the traffic panels.
• Some images were taken of traffic panels blocked by trees, billboards, etc.
• Some images were taken with the camera behind the car's window.

Additional Dataset Collection
This dataset contains 4000 images with a total size of 12 GB. Since the images were taken by different people with different cameras, they are in both portrait and landscape orientation and have different sizes; in other words, they are not uniform. The images therefore had to be preprocessed before being used in the network, whose input size is 416×416. Resizing the images directly to 416×416 would distort the objects within them, because the aspect ratio would change. Instead, the images were cropped into squares with equal width and height and then resized with bicubic interpolation to a uniform size of 416×416. The cropping strategy has two advantages: I) data augmentation and II) uniform images. To augment the data, each image was cropped three times: from the left side, from the right side, and from both sides. This increases the total number of images, and no pixels of the original images are discarded, i.e., all the pixels are used. Thus, each original image produces three cropped images. However, in some images, cropping pushes some traffic panels out of the cropped image. Cropped images that contain no traffic panels were removed, so the number of final images is not exactly three times the number of original images. The total number of augmented images is 9294.
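The cropping-based augmentation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it interprets cropping "from both sides" as a centered square, assumes a crop is kept only if at least one panel lies completely inside it, and uses hypothetical function names. The bicubic resize itself is left out (it could be done, e.g., with Pillow's `Image.resize` and bicubic resampling); `scale_boxes` only shows how the labels follow that resize.

```python
def square_crops(width, height):
    """Three square crop windows (x0, y0, x1, y1) covering the image.

    For a landscape image the square side equals the height and the windows are
    left-aligned, right-aligned, and centered; portrait images use top-aligned,
    bottom-aligned, and centered windows. The first two windows together cover
    every pixel of the original image.
    """
    if width >= height:
        s = height
        offsets = [0, width - s, (width - s) // 2]
        return [(o, 0, o + s, s) for o in offsets]
    s = width
    offsets = [0, height - s, (height - s) // 2]
    return [(0, o, s, o + s) for o in offsets]


def contains(crop, box):
    """True if the panel box (x0, y0, x1, y1) lies entirely inside the crop."""
    return crop[0] <= box[0] and crop[1] <= box[1] and box[2] <= crop[2] and box[3] <= crop[3]


def augment_by_cropping(width, height, panels):
    """Return (crop, panel boxes in crop coordinates), dropping crops with no complete panel."""
    results = []
    for crop in square_crops(width, height):
        inside = [b for b in panels if contains(crop, b)]
        if inside:
            shifted = [(b[0] - crop[0], b[1] - crop[1], b[2] - crop[0], b[3] - crop[1])
                       for b in inside]
            results.append((crop, shifted))
    return results


def scale_boxes(boxes, side, target=416):
    """Rescale box coordinates from a side x side crop to the target x target input."""
    return [tuple(v * target / side for v in b) for b in boxes]


# A 1600x1200 photo with one panel near each edge: the centered crop holds neither
# panel completely, so only two of the three crops survive.
for crop, boxes in augment_by_cropping(1600, 1200, [(100, 200, 500, 400), (1300, 100, 1550, 300)]):
    print(crop, boxes)
# (0, 0, 1200, 1200) [(100, 200, 500, 400)]
# (400, 0, 1600, 1200) [(900, 100, 1150, 300)]
```

This dropping rule is why 4000 originals yield fewer than 12000 crops; a looser rule (for instance, keeping crops that contain most of a panel) would change the final count.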

Main Dataset Collection
Due to the challenges of the additional dataset, a new and more uniform dataset was collected: 1500 images were collected and augmented to 3305 using the same technique as for the additional dataset. Instances from the main dataset can be seen in Figure 7. In the new dataset, panels showing the names of alleys and streets were ignored because they are smaller than the usual traffic panels and their texts are unclear. Green and white text-based traffic panels were used more often. Images in which a large part of the panel was blocked by trees, cars, or other objects were not used either. After data collection and preprocessing, the images were labeled. Before explaining the labeling process, the concepts of predicted and ground-truth bounding boxes need to be defined. The predicted bounding box is the box predicted by the network, and the ground-truth bounding box is the exact location of the traffic panel in the image. Figure 8 depicts the predicted and ground-truth bounding boxes in red and green, respectively. The more a predicted bounding box overlaps the ground-truth bounding box, the more accurate it is. Ground-truth bounding boxes are usually specified in one of two forms: I) the coordinates of two diagonally opposite corners, or II) the coordinates of one corner and the width and height of the box. This study uses the first form. The authors developed a labeling program that displays the dataset's images one by one. As shown in Figure 9, the user clicks twice for each panel (once for the top-left corner and once for the bottom-right corner) so that the bounding box covers the whole panel.
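The two box forms described above are easy to convert between. The sketch below is illustrative and the function names are hypothetical; in particular, the document does not say which label format was fed to tiny YOLOv3, so the Darknet-style normalized `class cx cy w h` line and the class id 0 are assumptions.

```python
def clicks_to_box(click_a, click_b):
    """Order two clicks as (top-left, bottom-right), even if clicked in reverse."""
    (xa, ya), (xb, yb) = click_a, click_b
    return min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)


def corners_to_corner_size(x1, y1, x2, y2):
    """Form I (two corners) -> form II (top-left corner, width, height)."""
    return x1, y1, x2 - x1, y2 - y1


def corners_to_yolo(class_id, x1, y1, x2, y2, img_w, img_h):
    """Form I -> Darknet/YOLO-style label line 'class cx cy w h', normalized to 0..1."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


box = clicks_to_box((300, 120), (104, 40))  # clicked bottom-right first
print(box)                                  # (104, 40, 300, 120)
print(corners_to_corner_size(*box))         # (104, 40, 196, 80)
print(corners_to_yolo(0, *box, 416, 416))   # 0 0.485577 0.192308 0.471154 0.192308
```

Storing the two-corner form, as this study does, keeps the labels independent of image size; normalized formats can then be derived after the images are resized to 416×416.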

Evaluation and Result
Some important parameters for implementing the algorithm are summarized in Table 2. First, the network is trained on the additional dataset, using weights pre-trained on the COCO dataset as the initial weights. Then, it is trained on the main dataset, using the weights obtained in the first stage as the initial weights. The training set contains 80 percent of all images, the validation set contains 10 percent of the training images, and the test set contains 20 percent of all images. The exact numbers of images in the main and additional datasets are given in Table 3. To evaluate the algorithm's accuracy, the first step is to calculate the Intersection over Union (IoU) for all the bounding boxes. The IoU of each box is the area of the intersection of the ground-truth and predicted bounding boxes divided by the area of their union. This study uses an IoU threshold of 0.5.
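The IoU computation, and a way of turning IoU scores into true positives, false positives, and false negatives at a given threshold, can be sketched as follows. The document does not state its matching rule, so the greedy, confidence-ordered matching below is an assumption, and `count_matches` is a hypothetical name.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(predictions, ground_truths, threshold=0.5):
    """Greedily match predictions (highest confidence first) to unmatched ground truths.

    predictions: list of (confidence, box); ground_truths: list of boxes.
    A prediction is a true positive if its best still-unmatched ground truth has
    IoU >= threshold, otherwise a false positive. Unmatched ground truths are
    false negatives. Returns (tp, fp, fn).
    """
    matched = set()
    tp = fp = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(ground_truths):
            if j not in matched:
                score = iou(box, gt)
                if score > best_iou:
                    best_j, best_iou = j, score
        if best_j is not None and best_iou >= threshold:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(ground_truths) - len(matched)
    return tp, fp, fn


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 1, 11, 11)), (0.8, (50, 50, 60, 60))]
print(round(iou(gts[0], preds[0][1]), 3))        # 0.681
print(count_matches(preds, gts, threshold=0.5))  # (1, 1, 1)
print(count_matches(preds, gts, threshold=0.7))  # (0, 2, 2)
```

The last line illustrates why raising the IoU threshold lowers all three metrics: a slightly misplaced box that counts as a hit at 0.5 becomes both a false positive and a false negative at 0.7.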
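Precision, Recall, and F-measure, which determine the accuracy of the algorithm, are defined as Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F-measure = 2 × Precision × Recall / (Precision + Recall), where TP, FP, and FN are the numbers of true positives, false positives, and false negatives at the chosen IoU threshold.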
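These metrics can be computed from the match counts as in the short sketch below. Treating a ratio with a zero denominator as 0 is an assumption made for the example, and `precision_recall_f` is a hypothetical name.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, Recall, and F-measure (harmonic mean of P and R) from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.75 0.818
```

Because the F-measure is a harmonic mean, it stays closer to the lower of Precision and Recall, so a detector cannot score well by trading one entirely for the other.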
This study uses K-fold cross-validation with K = 5 to evaluate the algorithm's performance. The results are given in Table 4. Table 5 shows the evaluation results of Fold 1 for various IoU thresholds; as the IoU threshold increases, Precision, Recall, and F-measure drop. Table 6 compares this method with other methods for text-based traffic panel detection.
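The K-fold procedure with K = 5 can be sketched as follows, using the 3305 images of the main dataset as the item count. The shuffling seed, the way folds are sized, and the helper `split_validation` (which holds out 10 percent of the training indices for validation) are illustrative assumptions, not the authors' code.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for K-fold cross-validation.

    Items are shuffled once and split into k nearly equal folds; each fold is
    the test set exactly once while the other k - 1 folds form the training set.
    With k = 5, each test fold holds about 20% of the items.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def split_validation(train_idx, fraction=0.1):
    """Hold out a fraction of the training indices as a validation set."""
    n_val = round(len(train_idx) * fraction)
    return train_idx[n_val:], train_idx[:n_val]


for fold, (train, test) in enumerate(k_fold_indices(3305, k=5), start=1):
    train, val = split_validation(train)
    print(fold, len(train), len(val), len(test))  # e.g. 1 2380 264 661
```

Two design notes apply. First, since each original photo yields up to three crops, splitting by original photo rather than by augmented image would keep near-duplicate crops from appearing in both the training and test folds. Second, the mean of per-fold F-measures generally differs from the F-measure computed from the mean Precision and mean Recall, so the averaging rule should be stated alongside the results.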

Discussion
Good accuracy at high IoU thresholds is crucial because a detected text-based traffic panel may not contain the full text. The algorithm used in this study has some flaws, described below. As shown in Figure 10, some predicted bounding boxes do not contain the full text even though their IoU is higher than 0.5. Since the step after text-based traffic panel detection is text recognition, the predicted bounding boxes must contain the entire text to convey its meaning. As shown in Figure 11, despite the clarity of some panels of specific sizes, the network cannot detect them. The reason is the anchor sizes chosen by the algorithm at the beginning. As depicted in Figure 12, the panels that were wrongly detected as traffic panels look like some text-based traffic panels in size and background. Although tiny YOLOv3 has a few flaws, it is a robust algorithm and can detect traffic panels in many challenging situations. It does not wrongly detect panels that contain several symbol-based panels (which are in fact considered symbol-based traffic panels) and look like text-based traffic panels, as shown in Figure 13. As depicted in Figure 14, apart from the few cases mentioned as the algorithm's flaws, it does not incorrectly detect other objects similar to text-based traffic panels, such as advertisements and billboards on roadsides and vehicles, store name boards, etc. It can accurately detect text-based traffic panels in complicated backgrounds, as shown in Figure 15, and correctly detect lopsided text-based traffic panels, as shown in Figure 16. As shown in Figure 17, it can detect text-based traffic panels partially blocked by objects, including vehicles, other panels, and trees.

Conclusion
This study proposed a new categorization method for traffic panels, which classifies them into three classes: symbol-based, text-based, and supplementary/additional traffic panels. A novel text-based traffic panel dataset, named the 'Persian text-based traffic panel dataset,' was introduced. It includes two collections of images. The first is a collection of 9294 images used as an additional dataset for pre-training the network. The second is the main dataset, which includes 3305 images and was used for training and testing the network. The tiny YOLOv3 algorithm was used for pre-training, training, and testing to examine the utility and advantages of the data. K-fold cross-validation was used to estimate the model's skill on new data. The model achieves 0.973 for Precision, 0.945 for Recall, and 0.955 for F-measure. Since the step after text-based traffic panel detection is recognizing the text on the panel, in future work the authors plan to improve detection accuracy at IoU thresholds greater than 0.5 and then focus on recognizing the texts in the dataset.