Browsing by Author "Pinto Rojas, Yuri Andrea"

Publication (Open access)
Aumento de Datos Basado en Modelos Generativos de Series Temporales para Detección de Anomalías en Sistemas de Control Industrial [Data Augmentation Based on Generative Time-Series Models for Anomaly Detection in Industrial Control Systems]
(Universidad de los Andes, 2026-05-25) Manrique Cruz, Juan Miguel; Pinto Rojas, Yuri Andrea
Industrial control systems (ICS) are critical infrastructures whose operational availability is a priority. Unsupervised anomaly detectors trained on these systems are limited by the scarcity of representative data, which can produce unstable decision thresholds and a high false-alarm rate. This work evaluates the impact of synthetic data augmentation, generated with TimeGAN (Yoon et al., 2019), on the training of a detector based on a hybrid LSTM-VAE variational autoencoder, using the SWaT A1&A2 dataset. Under the Train on Synthetic, Test on Real (TSTR) scheme, three scenarios are compared: real data only, synthetic data only, and real data combined with synthetic data. The results indicate that the synthetic data are not faithful enough to fully replace real data, as shown by the poor performance of the scenario trained exclusively on synthetic data. However, when incorporated as complementary augmentation, they induce a structured regularization effect that improves the validation ELBO by 18%, raises PR-AUC by 4.7 pp and recall by 1.1 pp, and increases discriminability in latent space (ratio 2.47× vs. 2.21×), without increasing the false-alarm rate.
These findings suggest that generative augmentation can contribute to the design of more robust detectors in ICS, provided the synthetic data are used as a complement to real normal data and not as a direct replacement.

Publication (Open access)
Closing the Generalization Gap: transferable unsupervised DoS detectors for Water Distribution ICS
(Universidad de los Andes, 2025-12-10) Peñaranda Urbina, Pablo Ramón Santiago; Pinto Rojas, Yuri Andrea
This work studies the transferability of an unsupervised autoencoder-based detector for denial-of-service (DoS) attacks in industrial control systems (ICS). We use DHALSIM to simulate two water distribution networks with different sizes and topologies, Anytown and C-Town, and capture network traffic at the PLC2 controller under normal operation and ARP-based DoS attacks. From raw PCAP traces we extract aggregated ARP features over fixed 5-second windows, including ARP rate, total packet rate, ARP ratio, and communication diversity metrics. An autoencoder is trained exclusively on normal windows from Anytown and evaluated on both scenarios. Isolation Forest is used as a baseline, also trained strictly on normal Anytown traffic to ensure no data leakage. We analyse reconstruction error distributions, receiver operating characteristic (ROC) curves, and area under the ROC curve (AUC) to quantify separability between normal and attack traffic. The autoencoder achieves an AUC of 0.79 in Anytown and 0.99 in C-Town, while Isolation Forest reaches 0.79 and 0.99 respectively. The lower performance in Anytown is driven by the attack morphology: the DoS scenario is intermittent and heavily diluted across aggregation windows, whereas C-Town exhibits a continuous and sustained ARP flood that strongly departs from the learned normal manifold.
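As a rough illustration of the windowed feature extraction this abstract describes, the sketch below aggregates packets into fixed 5-second windows and computes an ARP rate, a total packet rate, an ARP ratio, and one simple diversity measure. It is not the author's pipeline: the input format (already-parsed tuples instead of raw PCAP), the function name `window_features`, and the choice of "distinct sources" as the diversity metric are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 5.0  # window length stated in the abstract


def window_features(packets, window=WINDOW_SECONDS):
    """Aggregate packets into fixed windows and compute per-window features.

    packets: iterable of (timestamp_seconds, protocol, source_address).
    This tuple layout is a hypothetical, pre-parsed stand-in for PCAP data.
    Windows that contain no packets are simply not emitted.
    """
    buckets = defaultdict(list)
    for ts, proto, src in packets:
        buckets[int(ts // window)].append((proto, src))

    rows = []
    for idx in sorted(buckets):
        items = buckets[idx]
        total = len(items)
        arp = sum(1 for proto, _ in items if proto == "ARP")
        rows.append({
            "window_start": idx * window,
            "arp_rate": arp / window,        # ARP packets per second
            "packet_rate": total / window,   # all packets per second
            "arp_ratio": arp / total,        # share of ARP in the window
            # Assumed diversity metric: number of distinct senders.
            "distinct_sources": len({src for _, src in items}),
        })
    return rows
```

A short intermittent burst inside a window raises `arp_ratio` only slightly, which is one way to picture the dilution effect the abstract reports for Anytown.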
Results show that unsupervised detectors trained in one ICS can generalise to other water distribution networks when volumetric ARP attacks induce sufficient distributional shifts, highlighting transferability as a realistic option for DoS detection in water ICS.
Keywords: Industrial Control Systems, Anomaly Detection, Unsupervised Learning, Transferability, Denial-of-Service (DoS), Autoencoder, Water Distribution Networks.

Publication (Open access)
Control de Acceso Basado en Contexto para Extensiones de Navegador Web [Context-Based Access Control for Web Browser Extensions]
(Universidad de los Andes, 2026-05-22) Rozo Cepeda, Adriana Sofía; Pinto Rojas, Yuri Andrea
Browser extensions represent a critical attack surface, since they operate within the user's trust context and can access sensitive data, modify web pages, and interact with privileged APIs. Although Manifest V3 restricted remote code execution in Google Chrome, recent campaigns show that attackers have adapted their techniques through abuse of legitimate permissions, malicious updates, persistence via native APIs, and contextual observation of user activity. This work presents a solution for auditing and mitigating risks in extensions under Manifest V3. The proposal integrates a quantitative permission-based risk model, updated from Google's 2019 Permission Risk Whitepaper; a static source-code analysis backend; an LLM module that explains findings from technical evidence; and a contextual control mechanism that disables extensions in user-defined safe zones. The solution was implemented as a Chrome extension connected to a backend with SAST rules. The extension computes the risk declared in manifest.json, monitors permission changes, and applies contextual restrictions during browsing.
The backend analyzes .crx packages, detects suspicious behavior patterns, and generates structured reports that the LLM turns into explanations understandable to the end user. The experimental evaluation was carried out on 100 malicious extensions from the public MaliciousBrowserExtensions repository, with the aim of identifying false negatives and analyzing the system's ability to recognize risky behaviors. The results make it possible to assess the usefulness of combining declared risk, permission scope, static analysis, and LLM-assisted explanation as an additional audit layer on top of traditional extension review mechanisms.

Publication (Open access)
Defense model to detect cyberattacks in critical infrastructures: Machine Learning And Cyber Threat Intelligence Approach
(Universidad de los Andes, 2024-12-04) Pinto Rojas, Yuri Andrea; Donoso Meisel, Yezyd Enrique; Gutiérrez, Jairo A.; Núñez Castro, Haydemar María; Safaei Pour, Morteza; Lozano Garzón, Carlos Andrés; Montoya Orozco, Germán Adolfo; Facultad de Ingeniería::COMIT - Comunicaciones y Tecnología de Información
Critical Infrastructures (CIs), including energy, water, and industrial control systems, are foundational to the functioning of modern society. However, the evolving sophistication of cyber threats poses significant risks to these essential services, with traditional security frameworks often falling short in addressing the complexities inherent to CIs. The increasing integration of Industrial Internet of Things (IIoT) devices and operational technologies further complicates the security landscape, creating a critical need for adaptive and holistic cybersecurity solutions that can protect against both network and physical disruptions. This doctoral thesis presents the Integrated Hybrid Cybersecurity Framework (IHCF), a novel, adaptive approach designed to address these challenges.
By integrating Adversarial Autoencoders (AAE) with Graph Convolutional Networks with Long Short-Term Memory (GCN-LSTM) and leveraging Cyber Threat Intelligence (CTI), the IHCF aims to bridge the gap between physical anomaly detection and network-based threat classification. The framework offers a comprehensive, context-aware defense mechanism capable of handling both known and emerging threats across physical and network domains in CI environments. The research follows an iterative Design Science Research Methodology (DSRM), starting with problem identification, moving through solution design, development, and rigorous evaluation, and concluding with effective communication of findings. Through an extensive systematic literature review, key limitations in existing cybersecurity frameworks were identified, primarily their inability to effectively integrate network traffic analysis with physical anomaly detection and contextual threat intelligence. The IHCF was developed to overcome these limitations, using a hybrid approach to integrate physical sensor data, network traffic data, and threat intelligence into a cohesive security framework. The IHCF was evaluated using the SWaT dataset, which comes from a scaled-down industrial testbed and provides both physical sensor and network data, with attack scenarios targeting physical components and network communications. The evaluation results demonstrate that the IHCF successfully detected and classified all 26 attack scenarios targeted for detection, achieving robust performance across both network and physical domains. The Adversarial Autoencoder (AAE) successfully identified 24 out of 26 scenarios, while the GCN-LSTM component achieved an accuracy of 99.04% and a macro F1-score of 0.9151, reflecting strong classification capabilities across diverse classes. This hybrid approach ensures that all anomalies are detected, providing a comprehensive detection mechanism that captures both temporal and spatial anomalies.
The inclusion of MITRE ATT&CK within the GCN-LSTM further enriched the framework's situational awareness, mapping detected threats to known adversary tactics, techniques, and procedures, and thereby providing valuable context to guide response actions. This feature empowers analysts with actionable insights, facilitating targeted and efficient incident responses that enhance the resilience of CI systems. While the IHCF demonstrated strong results, several limitations were identified, including reliance on a single dataset for evaluation and challenges related to generalizing the findings to other CI environments. Expanding the scope of datasets, enhancing adaptability, and ensuring scalability will be essential steps for future research to address these limitations. Overall, this thesis contributes significantly to the academic and practical domains of cybersecurity, presenting an adaptive, robust, and context-aware solution for protecting critical infrastructure systems. The IHCF provides a pathway to significantly improve the cybersecurity posture of CIs by integrating AI-driven anomaly detection with threat intelligence, and these findings will be disseminated through peer-reviewed publications and academic conference presentations to advance knowledge in the field.

Publication (Open access)
Design and implementation of a testing project to detect explanatory contradictions in large language models (LLMs) in sensitive scenarios, following ethical and regulatory principles
(Universidad de los Andes, 2025-12-10) Torres Turriago, Daniela; Pinto Rojas, Yuri Andrea
Large Language Models (LLMs) are increasingly used in sensitive areas such as law, healthcare, and education, which require that consistency, transparency, and privacy always be guaranteed.
This study examines whether these requirements are met by the explanations that two LLMs, LLaMA 2 and LLaMA 3.1, generate when answering semantically equivalent prompts across four dimensions (normative, logical, ethical, and explanatory), using a rubric-based protocol aligned with the principles of transparency, stability, and privacy in GDPR, NIST, and Colombian Law 1581. We evaluated the models' outputs against specific criteria: explanatory stability, factual contradiction, logical inconsistency, explanatory evasion, normative ambiguity, justification quality, and misclassified risk. Results from the rubric show that LLaMA 3.1 achieved near-perfect logical coherence and explanatory stability, with improvement rates above 90%, while LLaMA 2 exhibited higher variability, with up to 20% normative degradation and more than 50% "No Compliance" in factual contradiction, logical inconsistency, explanatory evasion, and justification quality. These findings demonstrate that the rubric-based evaluation can systematically detect contradictions and highlight differences between models. The proposed protocol provides a reproducible framework for regulation-aligned assessments of LLMs in sensitive contexts.

Publication (Open access)
Evaluation of the Resilience and Compliance of LLMs in DSAR Management: A Comparative Analysis of Security Configurations
(Universidad de los Andes, 2025-12-09) Puig Pardo, Juan Felipe; Pinto Rojas, Yuri Andrea
The use of large language models (LLMs) in cybersecurity tasks has created new opportunities for process automation, while also introducing challenges related to ethics, privacy, and regulatory compliance.
One of the most sensitive domains is the management of Data Subject Access Requests (DSARs), through which individuals exercise their rights of access, rectification, or erasure of personal data under the General Data Protection Regulation (GDPR) and equivalent national regulations. This work evaluates the resilience and regulatory compliance of LLMs in DSAR handling by comparing two configurations based on the Llama model: (i) a baseline version (C0) and (ii) an enhanced version (C1) that incorporates Retrieval Augmented Generation (RAG), security and privacy guardrails, a confirmation step, and defenses against prompt injection attacks. The evaluation was conducted using a structured set of prompts across three complexity levels (basic, intermediate, and advanced), designed to induce potential ethical violations or sensitive data disclosures. The models' outputs were assessed using key performance indicators (KPIs) grounded in GDPR requirements and ISO/IEC 27001:2022 and 27701:2019 controls, specifically: (i) regulatory compliance (Arts. 5, 12, 15, 17, 25 GDPR), (ii) adversarial resilience against manipulation and data leakage (Arts. 25, 32, 35 GDPR), and (iii) transparency and auditability (Arts. 5(2), 12, 24 GDPR). These KPIs provide an empirical basis for measuring improvements aligned with Privacy by Design, Accountability, and Security by Default. The central hypothesis is that the improved configuration (C1) will substantially reduce unsafe or non-compliant responses, enhancing compliance, transparency, and traceability without compromising output quality or operational efficiency.
Overall, this project contributes a reproducible audit and evaluation framework for LLMs under regulatory constraints, supporting trustworthy AI governance in contexts where data protection and cybersecurity are essential.

Publication (Open access)
Impact of Secure Aggregation and Differential Privacy on Federated Network Intrusion Detection Systems
(Universidad de los Andes, 2026-05-21) Peña Arias, Carlos Andrés; Pinto Rojas, Yuri Andrea
Federated Learning (FL) facilitates the joint training of machine learning models without centralizing sensitive data, making it highly suitable for Network Intrusion Detection Systems (NIDS). However, FL is vulnerable to inference attacks that may compromise training data privacy and to data exposure by honest-but-curious servers [1]. To mitigate these risks, privacy-preserving mechanisms such as Differential Privacy (DP) and Secure Aggregation (SecAgg) are employed [2]. This work presents an empirical evaluation of the impact of DP and SecAgg on the performance of a federated NIDS using the industry-oriented framework NVIDIA FLARE (NVFlare). Results demonstrate that federated GraphIDS preserves performance relatively close to a centralized baseline under heterogeneous client distributions. While homomorphic encryption-based SecAgg successfully protects client updates at the cost of increased computational and communication overhead, the evaluated client-side DP configurations introduce optimization instability and unfavorable privacy-utility trade-offs. These findings highlight the practical challenges of achieving strong privacy guarantees in graph-based federated intrusion detection systems.
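
For readers unfamiliar with client-side DP in federated learning, the following is a minimal sketch of one common recipe: clip each client's model update to a fixed L2 norm, then add Gaussian noise before sending it to the server. It does not use NVFlare and is not the configuration evaluated in the thesis; the function name `privatize_update` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, chosen only for this illustration.

```python
import math
import random


def privatize_update(update, clip_norm, noise_multiplier, rng=None):
    """Clip a client update to an L2 norm bound, then add Gaussian noise.

    update: list of floats (a flattened model delta; assumed format).
    clip_norm: maximum L2 norm allowed for the update.
    noise_multiplier: noise standard deviation as a multiple of clip_norm.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(x * x for x in update))
    # Scale down only if the update exceeds the bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [x * scale + rng.gauss(0.0, sigma) for x in update]
```

Larger values of `noise_multiplier` give stronger privacy but perturb every coordinate of the update more, which is one intuition for the optimization instability the abstract reports.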