Publications
Publications in reverse chronological order.
2021
- Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization. Ribeiro, L., Roder, M., Rosa, G., Passos, L., and Papa, J. In 25th Iberoamerican Congress on Pattern Recognition, 2021.
The continuous growth of computational power in the last decades has made solving several optimization problems significant to humankind a tractable task; however, tackling some of them remains a challenge due to the overwhelming amount of candidate solutions to be evaluated, even by using sophisticated algorithms. In such a context, a set of nature-inspired stochastic methods, called meta-heuristic optimization, can provide robust approximate solutions to different kinds of problems with a small computational burden, such as derivative-free real function optimization. Nevertheless, these methods may converge to inadequate solutions if the function landscape is too harsh, e.g., enclosing too many local optima. Previous works addressed this issue by employing a hypercomplex representation of the search space, like quaternions, where the landscape becomes smoother and supposedly easier to optimize. Under this approach, meta-heuristic computations happen in the hypercomplex space, whereas variables are mapped back to the real domain before function evaluation. Although this latter operation is usually performed by the Euclidean norm, we have found that after the optimization procedure has finished, it is usually possible to obtain even better solutions by employing the Minkowski p-norm instead and fine-tuning p through an auxiliary sub-problem at negligible additional cost and with no hyperparameters. Such behavior was observed in eight well-established benchmarking functions, thus fostering a new research direction for hypercomplex meta-heuristic optimization.
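The mapping from hypercomplex variables back to real values described in this abstract can be illustrated with a minimal sketch. It assumes each decision variable is stored as one row of real components (four for a quaternion); the names `minkowski_projection` and `tune_p`, the sphere objective, and the grid scan over p are illustrative assumptions, not the procedure used in the paper.

```python
import numpy as np

def minkowski_projection(hyper, p=2.0):
    """Map each hypercomplex variable (one row of components) to a real
    value via the Minkowski p-norm; p=2 recovers the Euclidean norm."""
    hyper = np.asarray(hyper, dtype=float)
    return np.sum(np.abs(hyper) ** p, axis=1) ** (1.0 / p)

def tune_p(hyper, objective, candidates):
    """Hypothetical auxiliary step: scan candidate values of p and keep the
    one whose projection yields the lowest objective value."""
    return min(candidates, key=lambda p: objective(minkowski_projection(hyper, p)))

# Two decision variables, each represented by a quaternion (4 components).
q = np.array([[3.0, 4.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

euclidean = minkowski_projection(q, p=2.0)  # standard hyper-to-real mapping
manhattan = minkowski_projection(q, p=1.0)  # same variables under p=1

# Illustrative objective (sphere function) evaluated in the real domain.
def sphere(x):
    return float(np.sum(x ** 2))

best_p = tune_p(q, sphere, candidates=[1.0, 2.0, 3.0, 4.0])
```

Because the hypercomplex solution is held fixed, only the scalar p changes in the auxiliary step, which is why such a post-hoc search can be cheap relative to the main optimization.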
2020
- O^2PF: Oversampling via Optimum-Path Forest for Breast Cancer Detection. Passos, L., Jodas, D., Ribeiro, L., Moreira, T., and Papa, J. In 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), 2020.
Breast cancer is among the most deadly diseases, distressing mostly women worldwide. Although traditional methods for detection have presented themselves as valid for the task, they still commonly present low accuracies and demand considerable time and effort from professionals. Therefore, a computer-aided diagnosis (CAD) system capable of providing early detection becomes hugely desirable. In the last decade, machine learning-based techniques have been of paramount importance in this context, since they are capable of extracting essential information from data and reasoning about it. However, such approaches still suffer from imbalanced data, specifically on medical issues, where the number of samples from healthy people is, in general, considerably higher than the number of samples from patients. Therefore, this paper proposes O^2PF, a data oversampling method based on the unsupervised Optimum-Path Forest algorithm. Experiments conducted over the full oversampling scenario attest to the robustness of the model, which is compared against three well-established oversampling methods on three breast cancer datasets and three general-purpose medical datasets.
- A Layer-Wise Information Reinforcement Approach to Improve Learning in Deep Belief Networks. Roder, Mateus, Passos, Leandro A., Ribeiro, Luiz Carlos Felix, Pereira, Clayton, and Papa, João Paulo. In Artificial Intelligence and Soft Computing, 2020.
With the advent of deep learning, the number of works proposing new methods or improving existing ones has grown exponentially in recent years. In this scenario, “very deep” models emerged, since they were expected to extract more intrinsic and abstract features while supporting better performance. However, such models suffer from the vanishing gradient problem, i.e., backpropagation values become too close to zero in their shallower layers, ultimately causing learning to stagnate. Such an issue was overcome in the context of convolutional neural networks by creating “shortcut connections” between layers, in a so-called deep residual learning framework. Nonetheless, a very popular deep learning technique called Deep Belief Network still suffers from vanishing gradients when dealing with discriminative tasks. Therefore, this paper proposes the Residual Deep Belief Network, which reinforces information layer by layer to improve feature extraction and knowledge retention, which in turn support better discriminative performance. Experiments conducted over three public datasets demonstrate its robustness concerning the task of binary image classification.
- Intestinal Parasites Classification Using Deep Belief Networks. Roder, Mateus, Passos, Leandro A., Ribeiro, Luiz Carlos Felix, Benato, Barbara Caroline, Falcão, Alexandre Xavier, and Papa, João Paulo. In Artificial Intelligence and Soft Computing, 2020.
Currently, approximately 4 billion people are infected by intestinal parasites worldwide. Diseases caused by such infections constitute a public health problem in most tropical countries, leading to physical and mental disorders, and even death in children and immunodeficient individuals. Although subject to high error rates, human visual inspection is still in charge of the vast majority of clinical diagnoses. In the past years, some works addressed intelligent computer-aided intestinal parasites classification, but they usually suffer from misclassification due to similarities between parasites and fecal impurities. In this paper, we introduce Deep Belief Networks to the context of automatic intestinal parasites classification. Experiments conducted over three datasets composed of eggs, larvae, and protozoa provided promising results, even considering unbalanced classes and fecal impurities.
- Evolving Neural Conditional Random Fields for drilling report classification. Ribeiro, Luiz C.F., Afonso, Luis C.S., Colombo, Danilo, Guilherme, Ivan R., and Papa, João P. Journal of Petroleum Science and Engineering, 2020.
Oil and gas prospecting is an important economic activity, besides being expensive and quite complex, thus requiring close monitoring to avoid work accidents and, mainly, environmental damage. An essential source of information concerns the daily drilling reports, which contain technical interpretations of operations and additional information from rig sensors. However, only a few works have focused on mining textual information from such reports to provide intelligent decision-making mechanisms that aid safety and efficiency in drilling operations. This work proposes a contextual-driven approach based on Recurrent Neural Networks to recognize events in drilling reports that can outperform other related techniques. We also introduce a novel approach based on evolutionary computing to combine partially trained models using cyclical learning rates. Experiments conducted on two unbalanced datasets provided by Petrobras (Petróleo Brasileiro S.A.) show that our model improved Macro-F1 scores over the baseline by more than 47%. Besides, the proposed ensembling technique further enhanced these values by another 3% in the best scenario. Such promising results can shed light on new research directions in the field. The source code is available at http://github.com/lzfelix/evolving-ncrf.
2019
- Bag of Samplings for computer-assisted Parkinson’s disease diagnosis based on Recurrent Neural Networks. Ribeiro, Luiz C.F., Afonso, Luis C.S., and Papa, João P. Computers in Biology and Medicine, 2019.
Parkinson’s Disease (PD) is a clinical syndrome that affects millions of people worldwide. Although considered a non-lethal disease, PD shortens the life expectancy of patients. Many studies have been dedicated to evaluating methods for early-stage PD detection, including machine learning techniques that rely, in most cases, on motor dysfunctions such as tremor. This work explores the time dependency in tremor signals collected from handwriting exams. To learn such temporal information, we propose a model based on Bidirectional Gated Recurrent Units along with an attention mechanism. We also introduce the concept of “Bag of Samplings”, which computes multiple compact representations of the signals. Experimental results have shown the proposed model is a promising technique with results comparable to some state-of-the-art approaches in the literature.
- Discovering Patterns within the Drilling Reports using Artificial Intelligence for Operation Monitoring. Colombo, Danilo, Pedronette, Daniel Carlos Guimarães, Guilherme, Ivan Rizzo, Papa, João Paulo, Ribeiro, Luiz Carlos Felix, Sugi Afonso, Luis Claudio, Presotto, João Gabriel Camacho, and Sousa, Gustavo José. 2019.
In well drilling activities, the execution of a sequence of operations defined in a well project is a central task. In order to provide proper monitoring, the operations executed during the drilling procedures are reported in Daily Drilling Reports (DDRs). Technologies capable of assisting the fulfillment of such reports represent valuable contributions. An approach using Machine Learning and Sequence Mining algorithms is proposed for predicting the next operation and classifying it based on textual descriptions. Nowadays, artificial intelligence (AI) applications play a key role in the digital transformation process, and AI is a very broad area with various branches. Machine Learning techniques provide systems the ability to automatically learn and improve from experience without explicit instructions. Sequence Mining can be broadly defined as the task of finding statistically relevant patterns between samples modeled in a sequence. In our approach, the operations reported in DDRs are analyzed by Sequence Mining algorithms for predicting the next operation, whereas Machine Learning methods are used for automatically classifying the operations according to predefined ontologies based on textual descriptions. The proposed approach was experimentally validated using a real-world dataset composed of drilling reports with approximately 90K entries. Various sequence prediction algorithms are considered, more specifically: CPT+ (Compact Prediction Tree+), DG (Dependency Graph), AKOM (All-k Order Markov), LZ78, PPM (Prediction by Partial Matching), and TDAG (Transitional Directed Acyclic Graph). For the classification tasks, approaches based on word embeddings and CRF (Conditional Random Fields) are exploited. The experiments achieved high accuracy, reaching 89% for the classification task. The promising results indicate that such strategies can be successfully exploited in the evaluated scenarios. Additionally, the positive results also encourage the investigation of their use in other oil and gas applications, since reports organized in chronological order constitute a common scenario. The main contribution to the oil and gas industry consists of using artificial intelligence strategies in tasks associated with DDRs, saving human effort and improving operational efficiency. Although Sequence Mining and Machine Learning algorithms have been extensively used in different applications, the novelty of our work consists in the use of such approaches for the task of extracting useful information from DDRs.
2018
- Unsupervised Dialogue Act Classification with Optimum-Path Forest. Ribeiro, L. C. F., and Papa, J. P. In 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2018.
Dialogue Act classification is a relevant problem for the Natural Language Processing field either as a standalone task or when used as input for downstream applications. Despite its importance, most of the existing approaches rely on supervised techniques, which depend on annotated samples, making it difficult to take advantage of the increasing amount of data available in different domains. In this paper, we briefly review the most commonly used datasets to evaluate Dialogue Act classification approaches and introduce the Optimum-Path Forest (OPF) classifier to this task. Instead of using its original strategy to determine the corresponding class for each cluster, we use a modified version based on majority voting, named M-OPF, which yields good results when compared to k-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), according to accuracy and V-measure. We also show that M-OPF, and consequently OPF, are less sensitive to hyper-parameter tuning when compared to HDBSCAN.
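The majority-voting idea mentioned in this abstract, assigning each cluster the most frequent class among its members, can be sketched generically as follows. This is not the M-OPF implementation: the function name `majority_vote_labels`, the toy cluster assignments, and the dialogue-act labels are all hypothetical, and the cluster ids could come from any clustering algorithm.

```python
from collections import Counter

def majority_vote_labels(cluster_ids, true_labels):
    """Return a {cluster_id: label} map where each cluster receives the most
    frequent label among the samples assigned to it."""
    votes = {}
    for cluster, label in zip(cluster_ids, true_labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}

# Toy example: six utterances grouped into three clusters (made-up data).
clusters = [0, 0, 0, 1, 1, 2]
labels = ["question", "question", "statement",
          "statement", "statement", "greeting"]

mapping = majority_vote_labels(clusters, labels)

# Label every sample with its cluster's majority class and score the result.
predicted = [mapping[c] for c in clusters]
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
```

Mapping clusters to classes in this way is what allows an unsupervised method to be scored with accuracy alongside clustering measures such as V-measure.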