Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18.2 days after submission; acceptance to publication takes 3.9 days (median values for papers published in this journal in the second half of 2023).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2022)
Latest Articles
Integrating OLAP with NoSQL Databases in Big Data Environments: Systematic Mapping
Big Data Cogn. Comput. 2024, 8(6), 64; https://doi.org/10.3390/bdcc8060064 - 5 Jun 2024
Abstract
The growing importance of data analytics is leading to a shift in data management strategy at many companies, moving away from simple data storage towards adopting Online Analytical Processing (OLAP) query analysis. Concurrently, NoSQL databases are gaining ground as the preferred choice for storing and querying analytical data. This article presents a comprehensive, systematic mapping, aiming to consolidate research efforts related to the integration of OLAP with NoSQL databases in Big Data environments. After identifying 1646 initial research studies from scientific digital repositories, a thorough examination of their content resulted in the acceptance of 22 studies. Utilizing the snowballing technique, an additional three studies were selected, culminating in a final corpus of twenty-five relevant articles. This review addresses the growing importance of leveraging NoSQL databases for OLAP query analysis in response to increasing data analytics demands. By identifying the most commonly used NoSQL databases with OLAP, such as column-oriented and document-oriented, prevalent OLAP modeling methods, such as Relational Online Analytical Processing (ROLAP) and Multidimensional Online Analytical Processing (MOLAP), and suggested models for batch and real-time processing, among other results, this research provides a roadmap for organizations navigating the integration of OLAP with NoSQL. Additionally, exploring computational resource requirements and performance benchmarks facilitates informed decision making and promotes advancements in Big Data analytics. The main findings of this review provide valuable insights and updated information regarding the integration of OLAP cubes with NoSQL databases to benefit future research, industry practitioners, and academia alike. This consolidation of research efforts not only promotes innovative solutions but also promises reduced operational costs compared to traditional database systems.
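As a loose illustration of the kind of ROLAP-style roll-up such OLAP-over-NoSQL integrations execute, the following sketch aggregates a measure along chosen dimensions over denormalized records, as they might sit in a document-oriented store. The records, field names, and dimensions are invented for illustration; a real deployment would push this aggregation down into the database engine rather than application code.

```python
from collections import defaultdict

# Toy "document store": each record is a denormalized sales document,
# as it might appear in a document-oriented NoSQL database.
docs = [
    {"region": "EU", "year": 2023, "product": "A", "amount": 10.0},
    {"region": "EU", "year": 2023, "product": "B", "amount": 5.0},
    {"region": "EU", "year": 2024, "product": "A", "amount": 7.5},
    {"region": "US", "year": 2023, "product": "A", "amount": 4.0},
]

def rollup(documents, dimensions, measure):
    """Aggregate a measure along the given dimension fields (an OLAP roll-up)."""
    cube = defaultdict(float)
    for doc in documents:
        key = tuple(doc[d] for d in dimensions)
        cube[key] += doc[measure]
    return dict(cube)

# Roll sales up to the (region, year) level, dropping the product dimension.
by_region_year = rollup(docs, ["region", "year"], "amount")
```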
Open Access Article
LLMs and NLP Models in Cryptocurrency Sentiment Analysis: A Comparative Classification Study
by Konstantinos I. Roumeliotis, Nikolaos D. Tselikas and Dimitrios K. Nasiopoulos
Big Data Cogn. Comput. 2024, 8(6), 63; https://doi.org/10.3390/bdcc8060063 - 5 Jun 2024
Abstract
Cryptocurrencies are becoming increasingly prominent in financial investments, with more investors diversifying their portfolios and individuals drawn to their ease of use and decentralized financial opportunities. However, this accessibility also brings significant risks and rewards, often influenced by news and the sentiments of crypto investors, known as crypto signals. This paper explores the capabilities of large language models (LLMs) and natural language processing (NLP) models in analyzing sentiment from cryptocurrency-related news articles. We fine-tune state-of-the-art models such as GPT-4, BERT, and FinBERT for this specific task, evaluating their performance and comparing their effectiveness in sentiment classification. By leveraging these advanced techniques, we aim to enhance the understanding of sentiment dynamics in the cryptocurrency market, providing insights that can inform investment decisions and risk management strategies. The outcomes of this comparative study contribute to the broader discourse on applying advanced NLP models to cryptocurrency sentiment analysis, with implications for both academic research and practical applications in financial markets.
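The comparative evaluation the authors describe comes down to scoring each model's predicted labels against gold annotations. A minimal sketch of accuracy and macro-F1 computation follows; the labels and predictions below are invented stand-ins for a fine-tuned model's outputs, not data from the paper.

```python
LABELS = ("negative", "neutral", "positive")

def f1_per_class(gold, pred, label):
    """Harmonic mean of precision and recall for one sentiment class."""
    tp = sum(g == p == label for g, p in zip(gold, pred))
    fp = sum(p == label and g != label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def evaluate(gold, pred):
    """Accuracy and macro-averaged F1 over the three sentiment classes."""
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    macro_f1 = sum(f1_per_class(gold, pred, l) for l in LABELS) / len(LABELS)
    return accuracy, macro_f1

gold    = ["positive", "negative", "neutral", "positive", "negative"]
model_a = ["positive", "negative", "neutral", "negative", "negative"]
acc, f1 = evaluate(gold, model_a)
```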
(This article belongs to the Special Issue Generative AI and Large Language Models)
Open Access Review
Insights into Industrial Efficiency: An Empirical Study of Blockchain Technology
by Kaoutar Douaioui and Othmane Benmoussa
Big Data Cogn. Comput. 2024, 8(6), 62; https://doi.org/10.3390/bdcc8060062 - 4 Jun 2024
Abstract
Blockchain technology is expected to have a radical impact on most industries by boosting security, transparency, and efficiency. This work considers the potential benefits of blockchain-focused applications in industrial process monitoring. The research design facilitates a detailed bibliometric analysis and delivers insights into the intellectual structure of blockchain technology’s application in industry via scientometric approaches. The work also draws on numerous sources across various industrial sectors to identify the transformative role of blockchain in industrial processes. Aspects such as blockchain technology’s impact on the transparency of industrial processes are discussed, while the paper does not ignore that success stories of applying blockchain in industrial sectors are often exaggerated, given the highly competitive environment that the cryptocurrency domain has become. Finally, the work presents major research avenues and decision-making areas that should be tackled to maximize the disruptive potential of blockchain and create a secure, transparent, and inclusive future.
(This article belongs to the Special Issue Industrial Applications of IoT and Blockchain for Sustainable Environment)
Open Access Article
Analyzing Trends in Digital Transformation Korean Social Media Data: A Semantic Network Analysis
by Jong-Hwi Song and Byung-Suk Seo
Big Data Cogn. Comput. 2024, 8(6), 61; https://doi.org/10.3390/bdcc8060061 - 4 Jun 2024
Abstract
This study explores the impact of digital transformation on Korean society by analyzing Korean social media data, focusing on the societal and economic effects triggered by advancements in digital technology. Utilizing text mining techniques and semantic network analysis, we extracted key terms and their relationships from online news and blogs, identifying major themes related to digital transformation. Our analysis, based on data collected from major Korean portals using various related search terms, provides deep insights into how digital evolution influences individuals, businesses, and government sectors. The findings offer a comprehensive view of the technological and social trends emerging from digital transformation, including its policy, economic, and educational implications. This research not only sheds light on the understanding and strategic approaches to digital transformation in Korea but also demonstrates the potential of social media data in analyzing the societal impact of technological advancements, offering valuable resources for future research in effectively navigating the era of digital change.
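The semantic network behind such an analysis can be built from document-level term co-occurrence: two terms are linked, and the edge weighted, by how often they appear in the same document. A minimal sketch, with toy English tokens standing in for the extracted Korean terms:

```python
from itertools import combinations
from collections import Counter

# Toy corpus standing in for tokenized news articles and blog posts.
docs = [
    ["digital", "transformation", "policy"],
    ["digital", "transformation", "education"],
    ["policy", "economy", "digital"],
]

def cooccurrence_edges(documents):
    """Weight an edge by the number of documents in which both terms appear."""
    edges = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(docs)
# ("digital", "transformation") co-occurs in two of the three documents.
```

On a real corpus, the resulting weighted graph is what centrality and community measures are then run over to surface the major themes.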
(This article belongs to the Special Issue Challenges and Perspectives of Social Networks within Social Computing)
Open Access Article
Quantifying Variations in Controversial Discussions within Kuwaiti Social Networks
by Yeonjung Lee, Hana Alostad and Hasan Davulcu
Big Data Cogn. Comput. 2024, 8(6), 60; https://doi.org/10.3390/bdcc8060060 - 4 Jun 2024
Abstract
During the COVID-19 pandemic, pro-vaccine and anti-vaccine groups emerged, influencing others to vaccinate or abstain and leading to polarized debates. Due to incomplete user data and the complexity of social network interactions, understanding the dynamics of these discussions is challenging. This study aims to discover and quantify the factors driving the controversy related to vaccine stances across Kuwaiti social networks. To tackle these challenges, a graph convolutional network (GCN) and feature propagation (FP) were utilized to accurately detect users’ stances despite incomplete features, achieving an accuracy of 96%. Additionally, the random walk controversy (RWC) score was employed to quantify polarization points within the social networks. Experiments were conducted using a dataset of vaccine-related retweets and discussions from X (formerly Twitter) during the Kuwait COVID-19 vaccine rollout period. The analysis revealed high polarization periods correlating with specific vaccination rates and governmental announcements. This research provides a novel approach to accurately detecting user stances in low-resource languages like the Kuwaiti dialect without the need for costly annotations, offering valuable insights to help policymakers understand public opinion and address misinformation effectively.
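The RWC score itself can be estimated by Monte Carlo simulation: start random walks on each side of the graph and compare the probabilities of ending on the same versus the opposite side. Below is a simplified sketch on an invented two-community toy graph; the original RWC formulation absorbs walks at high-degree vertices, whereas this version simply fixes the walk length.

```python
import random

# Toy polarized retweet network: two tight communities joined by one bridge
# (nodes and edges invented; in the study the two sides are vaccine stances).
pro  = {0, 1, 2, 3}
anti = {4, 5, 6, 7}
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}

def rwc_score(adj, side_a, side_b, walks=2000, steps=5, seed=0):
    """Monte Carlo estimate of the random walk controversy (RWC) score:
    close to 1 when walks rarely cross between the two sides."""
    rng = random.Random(seed)
    counts = {("a", "a"): 0, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): 0}
    for _ in range(walks):
        start = rng.choice(["a", "b"])
        node = rng.choice(sorted(side_a if start == "a" else side_b))
        for _ in range(steps):
            node = rng.choice(adj[node])
        end = "a" if node in side_a else "b"
        counts[(start, end)] += 1
    n_a = counts[("a", "a")] + counts[("a", "b")]
    n_b = counts[("b", "a")] + counts[("b", "b")]
    p_aa, p_ab = counts[("a", "a")] / n_a, counts[("a", "b")] / n_a
    p_ba, p_bb = counts[("b", "a")] / n_b, counts[("b", "b")] / n_b
    return p_aa * p_bb - p_ab * p_ba

score = rwc_score(adj, pro, anti)  # strongly positive for this polarized toy graph
```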
Open Access Article
An Efficient Probabilistic Algorithm to Detect Periodic Patterns in Spatio-Temporal Datasets
by Claudio Gutiérrez-Soto, Patricio Galdames and Marco A. Palomino
Big Data Cogn. Comput. 2024, 8(6), 59; https://doi.org/10.3390/bdcc8060059 - 3 Jun 2024
Abstract
Deriving insight from data is a challenging task for researchers and practitioners, especially when working on spatio-temporal domains. If pattern searching is involved, the complications introduced by temporal data dimensions create additional obstacles, as traditional data mining techniques are insufficient to address spatio-temporal databases (STDBs). We hereby present a new algorithm, which we refer to as F1/FP, and can be described as a probabilistic version of the Minus-F1 algorithm to look for periodic patterns. To the best of our knowledge, no previous work has compared the most cited algorithms in the literature to look for periodic patterns—namely, Apriori, MS-Apriori, FP-Growth, Max-Subpattern, and PPA. Thus, we have carried out such comparisons and then evaluated our algorithm empirically using two datasets, showcasing its ability to handle different types of periodicity and data distributions. By conducting such a comprehensive comparative analysis, we have demonstrated that our newly proposed algorithm has a smaller complexity than the existing alternatives and speeds up the performance regardless of the size of the dataset. We expect our work to contribute greatly to the mining of astronomical data and the permanently growing online streams derived from social media.
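In this line of work, a pattern with wildcard positions counts as periodic when it matches a large fraction of the period-aligned windows of the sequence. A compact sketch of that support computation (the sequence, pattern syntax, and period below are invented for illustration, not taken from the F1/FP algorithm):

```python
def periodic_support(sequence, pattern, period):
    """Fraction of period-aligned windows matching a pattern,
    where '*' is a wildcard position."""
    windows = [sequence[i:i + len(pattern)]
               for i in range(0, len(sequence) - len(pattern) + 1, period)]
    hits = sum(all(p == "*" or p == c for p, c in zip(pattern, w))
               for w in windows)
    return hits / len(windows)

# Toy event sequence with an underlying period of 3: "ab?" repeats.
seq = "abcabdabeabc"
support = periodic_support(seq, "ab*", 3)  # every aligned window matches
```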
(This article belongs to the Special Issue Big Data and Information Science Technology)
Open Access Article
Enhancing Self-Supervised Learning through Explainable Artificial Intelligence Mechanisms: A Computational Analysis
by Elie Neghawi and Yan Liu
Big Data Cogn. Comput. 2024, 8(6), 58; https://doi.org/10.3390/bdcc8060058 - 3 Jun 2024
Abstract
Self-supervised learning continues to drive advancements in machine learning. However, the absence of unified computational processes for benchmarking and evaluation remains a challenge. This study conducts a comprehensive analysis of state-of-the-art self-supervised learning algorithms, emphasizing their underlying mechanisms and computational intricacies. Building upon this analysis, we introduce a unified model-agnostic computation (UMAC) process, tailored to complement modern self-supervised learning algorithms. UMAC serves as a model-agnostic and global explainable artificial intelligence (XAI) methodology that is capable of systematically integrating and enhancing state-of-the-art algorithms. Through UMAC, we identify key computational mechanisms and craft a unified framework for self-supervised learning evaluation. Leveraging UMAC, we integrate an XAI methodology to enhance transparency and interpretability. Our systematic approach yields a 17.12% improvement in training time complexity and a 13.1% improvement in testing time complexity. Notably, improvements are observed in augmentation, encoder architecture, and auxiliary components within the network classifier. These findings underscore the importance of structured computational processes in enhancing model efficiency and fortifying algorithmic transparency in self-supervised learning, paving the way for more interpretable and efficient AI models.
Open Access Article
Dynamic Electrocardiogram Signal Quality Assessment Method Based on Convolutional Neural Network and Long Short-Term Memory Network
by Chen He, Yuxuan Wei, Yeru Wei, Qiang Liu and Xiang An
Big Data Cogn. Comput. 2024, 8(6), 57; https://doi.org/10.3390/bdcc8060057 - 31 May 2024
Abstract
Cardiovascular diseases (CVDs) are highly prevalent, sudden in onset, and relatively fatal, posing a significant public health burden. Long-term dynamic electrocardiography, which can continuously record the long-term dynamic ECG activities of individuals in their daily lives, has high research value. However, ECG signals are weak and highly susceptible to external interference, which may lead to false alarms and misdiagnosis, affecting diagnostic efficiency and the utilization of healthcare resources, so research on the quality of dynamic ECG signals is extremely necessary. To address these problems, this paper proposes a dynamic ECG signal quality assessment method based on CNN and LSTM that divides the signal into three quality categories: the signal of the Q1 category has a low noise level and can be used for reliable diagnosis of arrhythmia, etc.; the signal of the Q2 category has a higher noise level, but still contains information that can be used for heart rate calculation, HRV analysis, etc.; and the signal of the Q3 category has a noise level so high that it can interfere with the diagnosis of cardiovascular disease and should be discarded or labeled. In this paper, we use the widely recognized MIT-BIH database and apply the model to realistically collected exercise experimental data to assess its performance in real-world situations. The model achieves an accuracy of 98.65% on the test set, a macro-averaged F1 score of 98.5%, and a high F1 score of 99.71% for the prediction of Q3 category signals, which shows that the model has good accuracy and generalization performance.
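The paper's classifier is a CNN-LSTM; purely to illustrate the three-class output it targets, here is a deliberately naive threshold rule on a crude noise estimate. The thresholds and sample segments are invented and this is not the authors' method.

```python
def noise_level(segment):
    """Crude noise proxy: mean absolute first difference of the samples."""
    diffs = [abs(b - a) for a, b in zip(segment, segment[1:])]
    return sum(diffs) / len(diffs)

def quality_class(segment, q1_max=0.1, q2_max=0.5):
    """Map a segment to Q1/Q2/Q3 by noise level (thresholds are invented)."""
    n = noise_level(segment)
    if n <= q1_max:
        return "Q1"  # clean enough for arrhythmia diagnosis
    if n <= q2_max:
        return "Q2"  # noisy, but usable for heart rate / HRV analysis
    return "Q3"      # too noisy; discard or flag

clean = [0.0, 0.05, 0.1, 0.05, 0.0]
mid   = [0.0, 0.3, 0.0, 0.3, 0.0]
noisy = [0.0, 0.9, -0.8, 1.0, -0.7]
```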
Open Access Article
Stock Trend Prediction with Machine Learning: Incorporating Inter-Stock Correlation Information through Laplacian Matrix
by Wenxuan Zhang and Benzhuo Lu
Big Data Cogn. Comput. 2024, 8(6), 56; https://doi.org/10.3390/bdcc8060056 - 30 May 2024
Abstract
Predicting stock trends in financial markets is of significant importance to investors and portfolio managers. In addition to a stock’s historical price information, the correlation between that stock and others can also provide valuable information for forecasting future returns. Existing methods often fall short of straightforward and effective capture of the intricate interdependencies between stocks. In this research, we introduce the concept of a Laplacian correlation graph (LOG), designed to explicitly model the correlations in stock price changes as the edges of a graph. After constructing the LOG, we build a machine learning model, such as a graph attention network (GAT), and incorporate the LOG into the loss term. This innovative loss term is designed to empower the neural network to learn and leverage price correlations among different stocks in a straightforward but effective manner. The advantage of a Laplacian matrix is that its matrix-operation form is well suited to current machine learning frameworks, achieving high computational efficiency and a simpler model representation. Experimental results demonstrate improvements across multiple evaluation metrics using our LOG. Incorporating our LOG into five base machine learning models consistently enhances their predictive performance. Furthermore, backtesting results reveal superior returns and information ratios, underscoring the practical implications of our approach for real-world investment decisions. Our study addresses the limitations of existing methods that miss the correlation between stocks or fail to model correlation in a simple and effective way, and the proposed LOG emerges as a promising tool for stock returns prediction, offering enhanced predictive accuracy and improved investment outcomes.
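A Laplacian loss term of this kind is the standard graph quadratic form: for L = D - W built from a correlation-weight matrix W, pred'L pred equals the correlation-weighted sum of squared prediction differences, so it grows whenever strongly correlated stocks receive very different predictions. A pure-Python sketch with invented weights:

```python
# Correlation graph over three stocks (weights invented for illustration).
W = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.2],
     [0.1, 0.2, 0.0]]

def laplacian_penalty(pred, W):
    """Quadratic form pred' L pred for L = D - W, written in its equivalent
    edge form: sum of W[i][j] * (pred[i] - pred[j])**2 over all pairs i < j.
    Large when strongly correlated stocks get very different predictions."""
    n = len(W)
    return sum(W[i][j] * (pred[i] - pred[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

aligned_pen = laplacian_penalty([0.5, 0.5, -0.2], W)     # correlated pair agrees
misaligned_pen = laplacian_penalty([0.5, -0.5, -0.2], W) # correlated pair disagrees
```

In training, a term like this would be added to the base prediction loss with a weighting coefficient, which is how the graph structure steers the model without changing its architecture.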
(This article belongs to the Special Issue Big Data Analytics and Edge Computing: Recent Trends and Future)
Open Access Article
A Secure Data Publishing and Access Service for Sensitive Data from Living Labs: Enabling Collaboration with External Researchers via Shareable Data
by Mikel Hernandez, Evdokimos Konstantinidis, Gorka Epelde, Francisco Londoño, Despoina Petsani, Michalis Timoleon, Vasiliki Fiska, Lampros Mpaltadoros, Christoniki Maga-Nteve, Ilias Machairas and Panagiotis D. Bamidis
Big Data Cogn. Comput. 2024, 8(6), 55; https://doi.org/10.3390/bdcc8060055 - 28 May 2024
Abstract
Intending to enable a broader collaboration with the scientific community while maintaining privacy of the data stored and generated in Living Labs, this paper presents the Shareable Data Publishing and Access Service for Living Labs, implemented within the framework of the H2020 VITALISE project. Building upon previous work, significant enhancements and improvements are presented in the architecture enabling Living Labs to securely publish collected data in an internal and isolated node for external use. External researchers can access a portal to discover and download shareable data versions (anonymised or synthetic data) derived from the data stored across different Living Labs that they can use to develop, test, and debug their processing scripts locally, adhering to legal and ethical data handling practices. Subsequently, they may request remote execution of the same algorithms against the real internal data in Living Lab nodes, comparing the outcomes with those obtained using shareable data. The paper details the architecture, data flows, technical details and validation of the service with real-world usage examples, demonstrating its efficacy in promoting data-driven research in digital health while preserving privacy. The presented service can be used as an intermediary between Living Labs and external researchers for secure data exchange and to accelerate research on data analytics paradigms in digital health, ensuring compliance with data protection laws.
(This article belongs to the Special Issue Privacy-Enhancing Technologies of Data for Sustainable and Secure Cooperation)
Open Access Article
Analyzing the Attractiveness of Food Images Using an Ensemble of Deep Learning Models Trained via Social Media Images
by Tanyaboon Morinaga, Karn Patanukhom and Yuthapong Somchit
Big Data Cogn. Comput. 2024, 8(6), 54; https://doi.org/10.3390/bdcc8060054 - 27 May 2024
Abstract
With the growth of digital media and social networks, sharing visual content has become common in people’s daily lives. In the food industry, visually appealing food images can attract attention, drive engagement, and influence consumer behavior. Therefore, it is crucial for businesses to understand what constitutes attractive food images. Assessing the attractiveness of food images poses significant challenges due to the lack of large labeled datasets that align with diverse public preferences. Additionally, it is challenging for computer assessments to approach human judgment in evaluating aesthetic quality. This paper presents a novel framework that circumvents the need for explicit human annotation by leveraging user engagement data that are readily available on social media platforms. We propose procedures to collect, filter, and automatically label the attractiveness classes of food images based on their user engagement levels. The data gathered from social media are used to create predictive models for category-specific attractiveness assessments. Our experiments across five food categories demonstrate the efficiency of our approach. The experimental results show that our proposed user-engagement-based attractiveness class labeling achieves a high consistency of 97.2% compared to human judgments obtained through A/B testing. Separate attractiveness assessment models were created for each food category using convolutional neural networks (CNNs). When analyzing unseen food images, our models achieve a consistency of 76.0% compared to human judgments. The experimental results suggest that the food image dataset collected from social networks, using the proposed framework, can be successfully utilized for learning food attractiveness assessment models.
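The engagement-based labeling can be sketched as a simple rate-thresholding rule. The thresholds, fields, and class names below are invented; the paper derives its attractiveness classes per food category from real engagement distributions rather than fixed cut-offs.

```python
def label_by_engagement(posts, low_rate=0.02, high_rate=0.10):
    """Assign attractiveness classes from like/view engagement rates
    (thresholds invented for illustration)."""
    labels = {}
    for p in posts:
        rate = p["likes"] / p["views"]
        if rate >= high_rate:
            labels[p["id"]] = "attractive"
        elif rate <= low_rate:
            labels[p["id"]] = "unattractive"
        else:
            labels[p["id"]] = "neutral"
    return labels

# Toy posts standing in for food images scraped from a social platform.
posts = [
    {"id": 1, "likes": 5,   "views": 1000},
    {"id": 2, "likes": 50,  "views": 1000},
    {"id": 3, "likes": 200, "views": 1000},
]
labels = label_by_engagement(posts)
```

Labels produced this way then serve as free training targets for the per-category CNN classifiers, which is what removes the need for manual annotation.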
(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)
Open Access Article
Exploiting Rating Prediction Certainty for Recommendation Formulation in Collaborative Filtering
by Dionisis Margaris, Kiriakos Sgardelis, Dimitris Spiliotopoulos and Costas Vassilakis
Big Data Cogn. Comput. 2024, 8(6), 53; https://doi.org/10.3390/bdcc8060053 - 27 May 2024
Abstract
Collaborative filtering is a popular recommender system (RecSys) method that produces rating prediction values for products by combining the ratings that close users have already given to the same products. Afterwards, the products that achieve the highest prediction values are recommended to the user. However, as expected, prediction estimation may contain errors, which, in the case of RecSys, will lead to either not recommending a product that the user would actually like (i.e., purchase, watch, or listen) or to recommending a product that the user would not like, with both cases leading to degraded recommendation quality. Especially in the latter case, the RecSys would be deemed unreliable. In this work, we design and develop a recommendation algorithm that considers both the rating prediction values and the prediction confidence, derived from features associated with rating prediction accuracy in collaborative filtering. The presented algorithm is based on the rationale that it is preferable to recommend an item with a slightly lower prediction value, if that prediction seems to be certain and safe, over another that has a higher value but of lower certainty. The proposed algorithm prevents low-confidence rating predictions from being included in recommendations, ensuring the recommendation quality and reliability of the RecSys.
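The core idea, preferring a slightly lower but safer prediction over a higher but uncertain one, can be sketched as a confidence filter applied before the usual top-N ranking. The items, ratings, confidence values, and threshold below are invented, and the paper's actual confidence measure is derived from rating-prediction features rather than given directly.

```python
def recommend(predictions, top_n=2, min_confidence=0.6):
    """Rank candidate items by predicted rating, but drop any prediction
    whose confidence falls below a threshold (values here are invented)."""
    safe = [p for p in predictions if p["confidence"] >= min_confidence]
    ranked = sorted(safe, key=lambda p: p["rating"], reverse=True)
    return [p["item"] for p in ranked[:top_n]]

candidates = [
    {"item": "A", "rating": 4.9, "confidence": 0.3},  # highest value, but unsafe
    {"item": "B", "rating": 4.5, "confidence": 0.9},
    {"item": "C", "rating": 4.2, "confidence": 0.8},
    {"item": "D", "rating": 3.0, "confidence": 0.95},
]
picks = recommend(candidates)  # item A is excluded despite its top rating
```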
(This article belongs to the Special Issue Business Intelligence and Big Data in E-commerce)
Open Access Article
Image-Based Leaf Disease Recognition Using Transfer Deep Learning with a Novel Versatile Optimization Module
by Petra Radočaj, Dorijan Radočaj and Goran Martinović
Big Data Cogn. Comput. 2024, 8(6), 52; https://doi.org/10.3390/bdcc8060052 - 23 May 2024
Abstract
Given the projected need to increase food production by 70% by 2050, crops should be additionally protected from diseases and pests to ensure a sufficient food supply. Transfer deep learning approaches provide a more efficient solution than traditional methods, which are labor-intensive and struggle to effectively monitor large areas, leading to delayed disease detection. This study proposed a versatile module based on the Inception module, Mish activation function, and batch normalization (IncMB) as a part of deep neural networks. A convolutional neural network (CNN) with transfer learning was used as the base for the evaluated approaches for tomato disease detection: (1) CNNs, (2) CNNs with a support vector machine (SVM), and (3) CNNs with the proposed IncMB module. In the experiment, the public dataset PlantVillage was used, containing images of six different tomato leaf diseases. The best results were achieved by the pre-trained InceptionV3 network containing the IncMB module, with an accuracy of 97.78%. In three out of four cases, the highest accuracy was achieved by networks containing the proposed IncMB module in comparison to the evaluated CNNs. The proposed IncMB module represents an improvement in the early detection of plant diseases, providing a basis for timely leaf disease detection.
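Of the IncMB ingredients, the Mish activation has a closed form that is easy to state: mish(x) = x * tanh(softplus(x)), a smooth, non-monotonic alternative to ReLU. A minimal scalar implementation (the module's Inception and batch-normalization parts are not reproduced here):

```python
import math

def softplus(x):
    """Smooth approximation of ReLU: log(1 + exp(x))."""
    return math.log1p(math.exp(x))

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Near-identity for large
    positive x, smoothly bounded below for negative x."""
    return x * math.tanh(softplus(x))
```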
(This article belongs to the Topic Big Data and Artificial Intelligence, 2nd Volume)
Open Access Article
Development of Context-Based Sentiment Classification for Intelligent Stock Market Prediction
by Nurmaganbet Smatov, Ruslan Kalashnikov and Amandyk Kartbayev
Big Data Cogn. Comput. 2024, 8(6), 51; https://doi.org/10.3390/bdcc8060051 - 22 May 2024
Abstract
This paper presents a novel approach to sentiment analysis specifically customized for predicting stock market movements, bypassing the need for external dictionaries that are often unavailable for many languages. Our methodology directly analyzes textual data, with a particular focus on context-specific sentiment words within neural network models. This specificity ensures that our sentiment analysis is both relevant and accurate in identifying trends in the stock market. We employ sophisticated mathematical modeling techniques to enhance both the precision and interpretability of our models. Through meticulous data handling and advanced machine learning methods, we leverage large datasets from Twitter and financial markets to examine the impact of social media sentiment on financial trends. We achieved an accuracy exceeding 75%, highlighting the effectiveness of our modeling approach, which we further refined into a convolutional neural network model. This achievement contributes valuable insights into sentiment analysis within the financial domain, thereby improving the overall clarity of forecasting in this field.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
Open Access Article
XplAInable: Explainable AI Smoke Detection at the Edge
by Alexander Lehnert, Falko Gawantka, Jonas During, Franz Just and Marc Reichenbach
Big Data Cogn. Comput. 2024, 8(5), 50; https://doi.org/10.3390/bdcc8050050 - 17 May 2024
Abstract
Wildfires and forest fires pose a threat to forests and thereby, by extension, to wildlife and humanity. Recent history shows an increase in devastating damage caused by fires. Traditional fire detection systems, such as video surveillance, fail in the early stages of a rural forest fire: such systems would see the fire only when the damage is immense. Novel low-power smoke detection units based on gas sensors can detect smoke fumes in the early development stages of fires. The required proximity is only achieved using a distributed network of sensors interconnected via 5G. In the context of battery-powered sensor nodes, energy efficiency becomes a key metric. Using AI classification combined with XAI enables improved confidence regarding measurements. In this work, we present both a low-power gas sensor for smoke detection and a system elaboration regarding energy-efficient communication schemes and XAI-based evaluation. We show that leveraging edge processing in a smart way, combined with buffered data samples in a 5G communication network, yields optimal energy efficiency and rating results.
Full article
(This article belongs to the Special Issue Low-Power Data Processing on the Edge: Solutions for Artificial Intelligence Hardware Acceleration)
Open Access Article
Runtime Verification-Based Safe MARL for Optimized Safety Policy Generation for Multi-Robot Systems
by
Yang Liu and Jiankun Li
Big Data Cogn. Comput. 2024, 8(5), 49; https://doi.org/10.3390/bdcc8050049 - 16 May 2024
Abstract
The intelligent warehouse is a modern logistics management system that uses technologies like the Internet of Things, robots, and artificial intelligence to realize automated management and optimize warehousing operations. The multi-robot system (MRS) is an important carrier for implementing an intelligent warehouse, completing various tasks in the warehouse through cooperation and coordination between robots. As an extension of reinforcement learning and a kind of swarm intelligence, multi-agent reinforcement learning (MARL) can effectively coordinate the multi-robot systems in intelligent warehouses. However, MARL-based multi-robot systems in intelligent warehouses face serious safety issues, such as collisions, conflicts, and congestion. To deal with these issues, this paper proposes a safe MARL method based on runtime verification, i.e., an optimized safety policy-generation framework, for multi-robot systems in intelligent warehouses. The framework consists of three stages. In the first stage, a runtime model, the SCMG (safety-constrained Markov Game), is defined for the multi-robot system at runtime in the intelligent warehouse. In the second stage, rPATL (probabilistic alternating-time temporal logic with rewards) is used to express safety properties, and the SCMG is cyclically verified and refined through runtime verification (RV) to ensure safety. This stage guarantees the safety of the robots’ behaviors before training. In the third stage, the verified SCMG guides SCPO (safety-constrained policy optimization) to obtain an optimized safety policy for the robots. Finally, a multi-robot warehouse (RWARE) scenario is used for experimental evaluation. The results show that the policy obtained by our framework is safer than those of existing frameworks while retaining a degree of optimality.
Full article
(This article belongs to the Special Issue Field Robotics and Artificial Intelligence (AI))
Open Access Article
Enhanced Linear and Vision Transformer-Based Architectures for Time Series Forecasting
by
Musleh Alharthi and Ausif Mahmood
Big Data Cogn. Comput. 2024, 8(5), 48; https://doi.org/10.3390/bdcc8050048 - 16 May 2024
Abstract
Time series forecasting has been a challenging area in the field of Artificial Intelligence. Various approaches, such as linear neural networks, recurrent neural networks, Convolutional Neural Networks, and, recently, transformers, have been attempted for the time series forecasting domain. Although transformer-based architectures have been outstanding in the Natural Language Processing domain, especially in autoregressive language modeling, initial attempts to use transformers in the time series arena have met with mixed success. A recent influential work indicated that simple linear networks can outperform transformer-based designs. We investigate this paradox in detail, comparing linear neural network- and transformer-based designs and providing insights into why a certain approach may be better for a particular type of problem. We also improve upon the recently proposed simple linear neural network-based architecture by using dual pipelines with batch normalization and reversible instance normalization. Our enhanced architecture outperforms all existing architectures for time series forecasting on a majority of the popular benchmarks.
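Reversible instance normalization, one of the normalization components mentioned in the abstract above, standardizes each input window by its own statistics and maps the forecast back with those same statistics. The following is a minimal stdlib sketch of the general technique, not the authors' implementation; the function names are ours.

```python
import statistics

def rev_in_normalize(series, eps=1e-5):
    """Reversible instance normalization: standardize one input window
    by its own mean/std and return the statistics needed to invert."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    normed = [(x - mean) / (std + eps) for x in series]
    return normed, (mean, std)

def rev_in_denormalize(forecast, stats, eps=1e-5):
    """Map a forecast made in normalized space back to the original scale,
    re-applying the instance statistics saved during normalization."""
    mean, std = stats
    return [y * (std + eps) + mean for y in forecast]
```

The point of the technique is that the model sees inputs on a comparable scale regardless of each window's level and variance, which mitigates distribution shift between training and test windows.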
Full article
Open Access Article
International Classification of Diseases Prediction from MIMIC-III Clinical Text Using Pre-Trained ClinicalBERT and NLP Deep Learning Models Achieving State of the Art
by
Ilyas Aden, Christopher H. T. Child and Constantino Carlos Reyes-Aldasoro
Big Data Cogn. Comput. 2024, 8(5), 47; https://doi.org/10.3390/bdcc8050047 - 10 May 2024
Abstract
The International Classification of Diseases (ICD) serves as a widely employed framework for assigning diagnosis codes to the electronic health records of patients. These codes encapsulate the diagnoses and procedures conducted during a patient’s hospitalisation. This study aims to devise a predictive model for ICD codes based on the MIMIC-III clinical text dataset. Leveraging natural language processing techniques and deep learning architectures, we constructed a pipeline to distill pertinent information from the Medical Information Mart for Intensive Care III (MIMIC-III), a sizable, de-identified, and publicly accessible repository of medical records. Our method entails predicting diagnosis codes from unstructured data, such as discharge summaries and notes encompassing symptoms. We used state-of-the-art deep learning algorithms, such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, bidirectional LSTM (BiLSTM), and BERT models, after tokenizing the clinical text with Bio-ClinicalBERT, a pre-trained model from Hugging Face. To evaluate the efficacy of our approach, we conducted experiments using the discharge dataset within MIMIC-III. Employing the BERT model, our methodology exhibited commendable accuracy in predicting the top 10 and top 50 diagnosis codes within the MIMIC-III dataset, achieving average accuracies of 88% and 80%, respectively. In comparison to recent studies by Biseda and Kerang, as well as Gangavarapu, which reported F1 scores of 0.72 in predicting the top 10 ICD-10 codes, our model demonstrated better performance, with an F1 score of 0.87; in predicting the top 50 ICD-10 codes, previous research achieved an F1 score of 0.75, whereas our method attained an F1 score of 0.81. These results underscore the superior performance of deep learning models over conventional machine learning approaches in this domain. The ability to predict diagnoses early from clinical notes holds promise for assisting doctors in determining effective treatments, thereby reshaping the conventional diagnosis-then-treatment paradigm of care. Our code is available online.
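The "top 10" and "top 50" evaluation settings mentioned above restrict prediction to the most frequent ICD codes in the corpus. A minimal sketch of that restriction and of a per-code accuracy metric follows; this is illustrative only, the function names are ours, and the metric may differ from the paper's exact definition.

```python
from collections import Counter

def top_k_codes(records, k=10):
    """Return the k most frequent ICD codes across all records;
    evaluation is then restricted to this label space."""
    counts = Counter(code for codes in records for code in codes)
    return {code for code, _ in counts.most_common(k)}

def per_code_accuracy(true_sets, pred_sets, label_space):
    """Average, over codes in the restricted label space, of the fraction
    of records where the code's presence/absence is predicted correctly."""
    total = correct = 0
    for code in label_space:
        for t, p in zip(true_sets, pred_sets):
            total += 1
            correct += (code in t) == (code in p)
    return correct / total
```

Restricting to frequent codes sidesteps the extreme long tail of the ICD vocabulary, which is why top-10 results are typically higher than top-50 results, as in the abstract above.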
Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
Open Access Article
Imagine and Imitate: Cost-Effective Bidding under Partially Observable Price Landscapes
by
Xiaotong Luo, Yongjian Chen, Shengda Zhuo, Jie Lu, Ziyang Chen, Lichun Li, Jingyan Tian, Xiaotong Ye and Yin Tang
Big Data Cogn. Comput. 2024, 8(5), 46; https://doi.org/10.3390/bdcc8050046 - 28 Apr 2024
Abstract
Real-time bidding has become a major means of online advertisement exchange. The goal of a real-time bidding strategy is to maximize the benefits for stakeholders, e.g., click-through rates or conversion rates. In practice, however, the optimal bidding strategy for real-time bidding is constrained by at least three aspects: cost-effectiveness, the dynamic nature of market prices, and the issue of missing bidding values. To address these challenges, we propose Imagine and Imitate Bidding (IIBidder), which includes Strategy Imitation and Imagination modules, to generate cost-effective bidding strategies under partially observable price landscapes. Experimental results on the iPinYou and YOYI datasets demonstrate that IIBidder reduces investment costs, optimizes bidding strategies, and improves future market price predictions.
Full article
(This article belongs to the Special Issue Business Intelligence and Big Data in E-commerce)
Open Access Review
Digital Twins for Discrete Manufacturing Lines: A Review
by
Xianqun Feng and Jiafu Wan
Big Data Cogn. Comput. 2024, 8(5), 45; https://doi.org/10.3390/bdcc8050045 - 26 Apr 2024
Abstract
Along with the development of new-generation information technology, digital twins (DTs) have become the most promising enabling technology for smart manufacturing. This article presents a statistical analysis of the literature on the applications of DTs for discrete manufacturing lines and reviews their development status in the design and improvement of manufacturing lines, the scheduling and control of manufacturing lines, and the prediction of faults in critical equipment. The deployment frameworks of DTs in different applications are summarized. In addition, this article discusses three key technologies: high-fidelity modeling, real-time information interaction methods, and iterative optimization algorithms. Open issues are raised, such as fine-grained sculpting of twin models, model adaptivity, delay issues, and the development of efficient modeling tools. This study provides a reference for the design, modification, and optimization of discrete manufacturing lines.
Full article
Topics
Topic in
Algorithms, BDCC, BioMedInformatics, Information, Mathematics
Machine Learning Empowered Drug Screen
Topic Editors: Teng Zhou, Jiaqi Wang, Youyi Song
Deadline: 31 August 2024
Topic in
BDCC, Entropy, Information, MCA, Mathematics
New Advances in Granular Computing and Data Mining
Topic Editors: Xibei Yang, Bin Xie, Pingxin Wang, Hengrong Ju
Deadline: 30 October 2024
Topic in
Electronics, Applied Sciences, BDCC, Mathematics, Chips
Theory and Applications of High Performance Computing
Topic Editors: Pavel Lyakhov, Maxim Deryabin
Deadline: 30 November 2024
Topic in
BDCC, Digital, Information, Mathematics, Systems
Data-Driven Group Decision-Making
Topic Editors: Shaojian Qu, Ying Ji, M. Faisal Nadeem
Deadline: 31 December 2024
Special Issues
Special Issue in
BDCC
Predictive Performance-Explainability Duality for Big Data Analytics-Powered Healthcare
Guest Editors: Luca Parisi, Mansour Youseffi, Renfei Ma
Deadline: 21 June 2024
Special Issue in
BDCC
Machine Learning in Data Mining for Knowledge Discovery
Guest Editors: Cong Gao, Chuntao Ding
Deadline: 30 June 2024
Special Issue in
BDCC
Human Factor in Information Systems Development and Management
Guest Editors: Paweł Weichbroth, Jolanta Kowal, Mieczysław Lech Owoc
Deadline: 31 July 2024
Special Issue in
BDCC
Machine Learning and AI Technology for Sustainable Development
Guest Editors: Wei-Chen Wu, Jason C. Hung, Yuchih Wei, Jui-hung Kao
Deadline: 13 August 2024
Topical Collections
Topical Collection in
BDCC
Machine Learning and Artificial Intelligence for Health Applications on Social Networks
Collection Editor: Carmela Comito