Poster session 3

3 - 1
A Comparative Study of Segmentation Methods for Historical Documents
Nury Yuleny Arosquipa Yanque
Nury Yuleny Arosquipa Yanque
In recent years, great efforts have been made to digitize ancient handwritten and machine-printed documents, yet Optical Character Recognition (OCR) systems still do not work well on them for a variety of reasons (paper aging defects, faded ink, stains, uneven lighting, folds, bleed-through, ghosting, and poor contrast between text and background, among others). An important step in most OCR systems is the segmentation of text and background (binarization), which is particularly sensitive to the typical artifacts of historical documents; competitions on the segmentation of historical documents have been held over the last 8 years. Here we compare several segmentation methods and propose a new one, based on machine learning, that retains the advantages of the heuristic and texture methods. The study covers both handwritten and typeset historical documents, and we compare the segmentations using the standard DIBCO metrics and an open OCR system. The results of the proposed method are comparable with the state of the art with respect to the DIBCO metrics, and it shows advantages when evaluated with the OCR system.
document binarization, thresholding, text segmentation
3 - 2
A continuous model of pulse clarity
Martin A. Miguel, Mariano Sigman, Diego Fernandez Slezak
Martin Miguel
Music has a unique capability to evoke emotions. One of the mechanisms it uses is the manipulation of expectations in time. Specifically in rhythm, two concepts that relate to the comfort and understanding of music have been explored: pulse clarity and rhythm complexity. Several computational models have been introduced to analyze these concepts, but most do not consider how they evolve in time. We present a novel beat tracking model that, given a rhythmic passage, provides continuous information about which tacti are most reasonable and how salient they are. Here we evaluate the output of the model as a proxy for pulse clarity. We performed a beat tapping experiment in which participants (N=27) were asked to tap the subjective beat while listening to 30 rhythmic passages. A pulse clarity score was calculated as the mean certainty of the model. After each trial, participants were asked about task difficulty. We also calculated the within-subject tapping precision as an empirical measurement of pulse clarity. The proposed metric achieved Spearman correlation coefficients with both collected measures similar to those of previous models. This positive result allows us to inspect musical emotions that arise from changes in rhythm perception.
computational models; cognitive musicology; beat perception
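For readers unfamiliar with the evaluation in this abstract, the following is a minimal sketch of a Spearman rank correlation between a per-passage model score and a per-passage behavioural measure. The function names and the toy numbers in the usage note are assumptions made for illustration; they are not the authors' code or data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman(model_certainty, tapping_precision)` on two hypothetical lists with one value per passage returns a coefficient between -1 and 1; only the ordering of the passages matters, not the scale of either measure.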
3 - 3
A cross linguistic study of the production of turn taking cues in Slovak, Argentinian Spanish, and American English.
Pablo Brusco, Jazmin Vidal, Štefan Beňuš, Agustín Gravano
Pablo Brusco
Humans produce and perceive acoustic and prosodic cues during dialogue. However, little is known about the dynamics and the cross-linguistic validity of these cues. In this work we explore and show the effect of different acoustic/prosodic cues preceding different turn transitions (holds, switches, and backchannels), using machine learning techniques as a descriptive tool, in three languages: Slovak, American English, and Argentine Spanish. Results suggest that the three languages share acoustic/prosodic resources to signal turn transitions. We also rank the features in each language by their contribution to the separation of the classes of turn transitions. This study demonstrates that machine learning methods provide a powerful and efficient means of explaining how the dynamics of prosodic features relate to conversation flow.
descriptive machine learning, turn-taking, prosody
3 - 4
A Multi-Armed Bandit Approach for House Ads Decisions
Nicolás Aramayo, Mario Schiappacase, Marcel Goic
Nicolás Aramayo
In recent years, many websites have started to use a variety of recommendation systems to decide which content to display to their visitors. In this work we address this problem using a contextual combinatorial multi-armed bandit approach to select the combination of house ads to display on the homepage of a large retailer. House ads are promotional content displayed on the retailer's website to highlight some category of products. As retailers continuously change their product assortment, they can benefit from dynamically deciding which products are more effective; treating this as a reinforcement learning problem makes it possible to learn efficiently which images perform well and to quickly discard the least attractive ones. Moreover, the number of clicks an ad receives depends not only on its own attractiveness but also on how attractive the other products displayed around it are. Finally, using the previous purchases of a fraction of customers, we implemented another version of our algorithm that personalizes recommendations. We tested our methods in a controlled experiment in which we compared them against an experienced team of managers. Our results show that our method leads to a more active exploration of the decision space, as well as significant increases in conversion rates.
recommendation systems, reinforcement learning, a/b test
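As a rough illustration of the bandit idea in this abstract, the sketch below uses Beta-Bernoulli Thompson sampling to pick k ads per page view. It is a simplified, non-contextual stand-in and not the authors' algorithm; the class name `AdBandit`, the `simulate` helper, and the click rates are all assumptions made for the example.

```python
import random


class AdBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of ads (hypothetical)."""

    def __init__(self, n_ads, seed=0):
        self.clicks = [0] * n_ads  # impressions that ended in a click
        self.skips = [0] * n_ads   # impressions without a click
        self.rng = random.Random(seed)

    def select(self, k):
        """Draw a click rate per ad from its Beta posterior; show the top k."""
        draws = [
            self.rng.betavariate(1 + c, 1 + s)
            for c, s in zip(self.clicks, self.skips)
        ]
        ranked = sorted(range(len(draws)), key=lambda i: draws[i], reverse=True)
        return ranked[:k]

    def update(self, ad, clicked):
        """Record the outcome of one impression of `ad`."""
        if clicked:
            self.clicks[ad] += 1
        else:
            self.skips[ad] += 1


def simulate(true_rates, k=2, rounds=2000, seed=0):
    """Run the bandit against made-up click rates; return impressions per ad."""
    bandit = AdBandit(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    shown = [0] * len(true_rates)
    for _ in range(rounds):
        for ad in bandit.select(k):
            shown[ad] += 1
            bandit.update(ad, env.random() < true_rates[ad])
    return shown
```

Running `simulate([0.02, 0.05, 0.30, 0.10])` shows the behaviour the abstract describes in miniature: weak ads receive a few exploratory impressions early on and are then shown rarely. This sketch ignores context and the interaction between ads displayed together, both of which the abstract says the authors model.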
3 - 5
A multi-task relational model for conversation classification
Felipe del Río, Álvaro Soto
Felipe Del Río
Every day, millions of users interact with each other online, generating a large number of interactions all over the internet. These discussion data offer a great opportunity as a source of a variety of insights, as well as a great challenge in how to obtain them correctly. Most existing models focus only on classification, without explaining why they classify in a certain way. This limits our ability to get insights from our models and reduces our trust in them. To address this issue, we built two datasets. The first is based on a Reddit corpus of discussion threads on the platform; its main task is to classify a thread into its subreddit, complemented by an auxiliary task. The second is a dataset of news articles with their subsequent discussion, based on the Chilean news outlet EMOL, in which the main task is to classify the controversiality of the news, complemented by a variety of other auxiliary tasks. We propose a model based on the transformer that can learn multiple tasks jointly and can be used effectively on both of these datasets, and we tested it on the Reddit dataset. We verified that our model achieves better performance than our baseline and pays attention to relevant interactions in a conversation.
abstractive summarization, transformers, nlp
3 - 6
A Place to Go: Locating Damaged Regions after Natural Disasters through Mobile Phone Data
Galo Castillo-López, María-Belén Guaranda, Fabricio Layedra, and Carmen Vaca
María Belén Guaranda
Large-scale natural disasters create budgetary problems for governments even when local and foreign humanitarian aid is available. Prioritizing investment requires near-real-time information about the impact of the hazard in different locations. However, such information is not available through sensors or other devices, especially in developing countries that lack such infrastructure. A rich source of information is the data generated by mobile phone activity, since citizens in affected areas start using their phones as soon as service becomes available after the disaster. In this work, we exploit this source of information to conduct different analyses in order to infer the affected zones in the Ecuadorian province of Manabí after the 2016 earthquake, whose epicenter was in the same province. We propose a series of features to characterize a geographic area, as granular as a canton, after a natural disaster and to label its level of damage using mobile phone data. Our methods result in a classifier based on the K-Nearest Neighbors algorithm that detects affected zones with 75% accuracy. We compared our results with official data published two months after the disaster.
spatio-temporal analysis, mobile phone activity, disaster management
3 - 7
Advanced Transfer Learning Approach for Improving Spanish Sentiment Analysis
Daniel Palomino and Jose Ochoa-Luna
In recent years, innovative techniques like Transfer Learning have had a strong impact on Natural Language Processing, substantially advancing the state of the art in several challenging tasks. In particular, the Universal Language Model Fine-Tuning (ULMFiT) algorithm has shown impressive performance on several English text classification tasks. In this paper, we aim to develop an algorithm for Spanish Sentiment Analysis of short texts that is comparable to the state of the art. To do so, we have adapted the ULMFiT algorithm to this setting. Experimental results on benchmark datasets (InterTASS 2017 and InterTASS 2018) show that this simple transfer learning approach performs well when compared to more elaborate deep learning techniques.
sentiment analysis, transfer learning, spanish
3 - 8
Aggressive Language Identification in Social Media using Deep Learning
Errol Wilderd Mamani Condori
Errol Wilderd Mamani Condori
The increasing influence of users on social media has caused aggressive content to propagate over the internet. To control and tackle this problem, recent work on aggressive and offensive language detection has found that Deep Learning techniques perform well, as does the novel Bidirectional Encoder Representations from Transformers (BERT) model. This work presents an overview of offensive language detection in English and of aggressive content detection using this Transformer-based approach for the case study of Mexican Spanish. Our preliminary results show that the pre-trained multilingual BERT model also performs well compared with recent approaches in the aggressiveness detection track at MEX-A3T.
aggressive language, deep learning, social media
3 - 9
CharCNN Approach to Aspect-Based Sentiment Analysis for Portuguese
Ulisses Brisolara Corrêa and Ricardo Araújo
Ulisses Brisolara Correa
Sentiment Analysis was developed to support individuals in the harsh task of obtaining significant information from large amounts of unstructured, opinionated data sources, such as social networks and specialized review websites. A yet more challenging task is to point out which part of the target entity is addressed in the opinion. This task is called Aspect-Based Sentiment Analysis. The majority of work in the literature focuses on English text, while other languages lack resources, tools, and techniques. This paper focuses on Aspect-Based Sentiment Analysis for accommodation service reviews written in Brazilian Portuguese. Our proposed approach uses Convolutional Neural Networks with character-level inputs. Results suggest that our approach outperforms lexicon-based and LSTM-based approaches, displaying state-of-the-art performance for binary Aspect-Based Sentiment Analysis.
aspect-based sentiment analysis, char-level convolutional neural networks
3 - 10
Clustering meteorological scenarios
Matthieu Jonckheere, Dominique Picard, Vincent Lefieux, Alfredo Umfurer, Agustin Somacal, Yamila Barrera
Yamila Barrera
Fluctuations in temperature have a strong influence on electricity consumption. As a consequence, identifying groups of possible climate scenarios is useful for the analysis of the electric supply system. The scenario data that we consider are time series of hourly measured temperatures over a grid of geographical points in France and neighboring areas. Clustering techniques are useful for finding homogeneous groups of time series, but the challenge is to find a suitable data transformation and distance metric. In this work, we used several transformations (Fourier, wavelets, autoencoders) and distance metrics (DTW and Euclidean, among others) and found consistent groups of climate scenarios using clustering techniques (k-medoids and k-means). We found that k-shape performs best according to a within-cluster dispersion index. This is joint work with RTE (Réseau de Transport d’Électricité), the electricity transmission system operator of France.
non-supervised learning, clustering, temperature time series
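To make the choice of distance metric in this abstract concrete, here is a minimal sketch comparing dynamic time warping (DTW) with the Euclidean distance on two short series. The function names and the toy series are assumptions for illustration; the abstract does not specify the authors' implementation.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between equally long series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

For the same temperature bump shifted by one time step, `[0, 0, 1, 2, 1, 0]` versus `[0, 1, 2, 1, 0, 0]`, `dtw_distance` returns 0.0 while `euclidean_distance` returns 2.0. Whether two scenarios that differ only in timing should count as similar is exactly the kind of modelling decision the abstract refers to.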
3 - 11
Compare OCR Services
Orietha Castillo
Orietha Marcia Castillo Zalles
Starting a startup these days is not an easy job; there are many things to consider, such as choosing a partner. Some entrepreneurs have chosen Twitter as a silent partner. A startup needs press, growth, customer acquisition, and rabid fans, but entrepreneurs lack the time and money that would make the path easier. Twitter helps entrepreneurs drive traffic to their sites for free and gives them the opportunity to network with potential clients. Marketing expert Mark Schaefer says in his book “Known” that if you had to choose one distribution channel for your content, you should use Twitter. It is an excellent channel for communicating ideas, and building a positive brand is key to staying in clients' minds, which increases the number of potential clients. These benefits, however, are not the best thing Twitter can provide: Twitter knows your clients' thoughts, which is the most important thing of all. In this paper we review this topic in detail.
text and topic analysis
3 - 12
Conditioning visual reasoning through query based mechanisms
Sebastián Amenabar, Raimundo Manterola, Julio Hurtado, Francisco Rencoret and Alvaro Soto
Francisco Rencoret
Deep Neural Networks learn to solve datasets that may contain different reasoning tasks, for example VQ&A, where questions may rely on counting, positioning, or existence reasoning. Even though they may sometimes learn simple tasks effectively, they usually lack generalization capabilities for complex tasks that demand several reasoning steps. Their fixed weight structure limits the network's ability to use different neurons for different types of reasoning. In this work, we propose a method to adaptively condition a Neural Network's reasoning based on the information retrieved from an input. The proposed method helps the model carry out a variety of reasoning tasks, generalizing better for complex tasks. Based on VQ&A, we test our hypothesis by conditioning visual reasoning in models that rely on iterative reasoning. At each reasoning step, the model attends to the input and radically alters its visual reasoning. By transforming each convolutional filter, the model learns to specialize its visual reasoning for the arbitrary input and reasoning step.
conditional visual reasoning, selective feature transformation, vq&a
3 - 13
Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks
Gabriel Moreira, Dietmar Jannach, Adilson Marques da Cunha
Recommender systems help users deal with information overload by providing tailored item suggestions to them. The recommendation of news is often considered to be challenging, since the relevance of an article for a user can depend on a variety of factors, including the user’s short-term reading interests, the reader’s context, or the recency or popularity of an article. Previous work has shown that the use of RNNs is promising for the next-in-session prediction task, but has certain limitations when only recorded item click sequences are used as input. In this work, we present a hybrid, deep learning based approach for session-based news recommendation that is able to leverage a variety of information types. We evaluated our approach on two public datasets, using a temporal evaluation protocol that simulates the dynamics of a news portal in a realistic way. Our results confirm the benefits of considering additional types of information, including article popularity and recency, resulting in significantly higher recommendation accuracy and catalog coverage than other session-based algorithms. Additional experiments show that the proposed parameterizable loss function used in our method also allows us to balance two conflicting quality factors: accuracy and novelty.
recommender systems, deep learning, session-based news recommendation
3 - 14
Cost-sensitive Machine Learning
Emanuele Luzio
Emanuele Luzio
What is the difference between classification and decision making? We show how to calibrate a classifier, incorporating the economic context information into the model and transforming a classification model into a decision-making model.
decision-making, classification, economics
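One common way to turn a calibrated classifier into a decision rule, sketched here under stated assumptions, is to act whenever acting has the lower expected cost. The sketch assumes correct decisions cost nothing, and the names `cost_fp` and `cost_fn` (costs of a false positive and a false negative) are illustrative; the abstract does not say which cost model the author uses.

```python
def expected_cost(prob_positive, act, cost_fp, cost_fn):
    """Expected cost of acting (or not) given a calibrated probability."""
    if act:
        # Acting is only wrong when the case is actually negative.
        return (1.0 - prob_positive) * cost_fp
    # Not acting is only wrong when the case is actually positive.
    return prob_positive * cost_fn


def decision_threshold(cost_fp, cost_fn):
    """Probability above which acting is the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)


def decide(prob_positive, cost_fp, cost_fn):
    """Act if and only if acting has the lower expected cost."""
    return prob_positive >= decision_threshold(cost_fp, cost_fn)
```

With `cost_fp=1` and `cost_fn=9` the threshold is 0.1, so a case scored at 0.2 is acted upon even though a plain 0.5 cut-off would classify it as negative. The rule is only as good as the probabilities it is fed, which is why calibration matters here.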
3 - 15
Deep Learning for Meteor Classification
Yuri Galindo, Ana Carolina Lorena
Yuri Galindo
The EXOSS (Exploring the Southern Sky) non-profit organization manages a network of cameras across Brazil that automatically capture meteor images. More often than not, the captures are of non-meteor objects such as birds and planes, and they are currently filtered by volunteers. Our research targets the classification of these images by applying Convolutional Neural Networks to the automatic captures, which are black and white, noisy, uncentered, and largely different from publicly available datasets. The objective is to develop a system capable of automatically filtering the captures, reducing human intervention to cases with uncertain classifications.
computer vision, image classification, deep learning
3 - 16
Deep Multiple Instance Learning for the Acoustic Detection of Tropical Birds using Limited Data
Jorge Castro, Roberto Vargas-Masis, Danny Alfaro Rojas
Jorge Castro
Deep learning algorithms have produced state-of-the-art results for acoustic bird detection and classification. However, thousands of bird vocalizations have to be manually tagged by experts to train these algorithms. We use three strategies to reduce this manual work: simpler labels, fewer labels, and less labeled data. The Multiple Instance Learning (MIL) approach provides the framework to simplify and reduce the number of labels, as each recording (bag) is modeled as a collection of smaller audio segments (instances) and is associated with a single label that indicates whether at least one bird was present in the recording. In this work, we propose a deep neural network architecture based on the MIL framework to predict the presence or absence of tropical birds in one-minute recordings. As only a relatively small number of training observations (1600) are used to train the algorithm, we compare the performance of the network using several hand-crafted features.
deep learning, multiple instance learning (mil), bird detection
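The bag-level labelling described in this abstract can be sketched as a pooling step over per-segment scores. The two pooling functions below (max and noisy-OR) are common MIL choices shown for illustration only; the abstract does not state which pooling the authors' network uses, and all names are assumptions.

```python
def max_pool(instance_probs):
    """Bag score under the standard MIL assumption: the strongest instance."""
    return max(instance_probs)


def noisy_or_pool(instance_probs):
    """Bag score as the probability that at least one instance is positive,
    treating instances as independent."""
    none_positive = 1.0
    for p in instance_probs:
        none_positive *= 1.0 - p
    return 1.0 - none_positive


def bag_label(instance_probs, pool=max_pool, threshold=0.5):
    """A recording (bag) is positive if its pooled score reaches the threshold."""
    return pool(instance_probs) >= threshold
```

For example, `bag_label([0.1, 0.05, 0.9])` is `True` because one segment is confidently positive, while `bag_label([0.1, 0.2])` is `False`. Only the bag label is needed for training, which is what lets annotators mark a whole recording instead of each vocalization.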
3 - 17
Deep Q-Learning in ROS for navigation with TurtleBot3.
Leopoldo Agorio, Juan Bazerque
Leopoldo Carlos Agorio Grove
Our group is getting involved in the use of robots running ROS (Robot Operating System) for machine learning applications and distributed algorithms. ROS connects with a simulation environment, Gazebo, that reproduces the dynamics of actual robot platforms with good fidelity. This connection allows the robots to be trained offline through a series of simulated episodes and the results then to be used online in the real world. In our poster we explain the Deep Q-Learning method developed by the ROBOTIS machine learning team, in which a robot learns to navigate towards a target while avoiding a series of obstacles. We implement this technique using a TurtleBot3 platform and design our own robot world in which the robot is trained.
robotics, q-learning, navigation
3 - 18
Detecting Spatial Clusters of Disease Infection Risk Using Sparsely Sampled Social Media Mobility Patterns
Roberto Nalon, Renato Assuncao, Daniel Neill, Wagner Meira
Renato Assuncao
Standard spatial cluster detection methods used in public health surveillance assign each disease case a single location (typically, the patient’s home address), aggregate locations to small areas, and monitor the number of cases in each area over time. However, such methods cannot detect clusters of disease resulting from visits to non-residential locations, such as a park or a university campus. Thus we develop two new spatial scan methods, the unconditional and conditional spatial logistic models, to search for spatial clusters of increased infection risk. We use mobility data from two sets of individuals, disease cases and healthy individuals, where each individual is represented by a sparse sample of geographical locations (e.g., from geo-tagged social media data). The methods account for the multiple, varying number of spatial locations observed per individual, either by non-parametric estimation of the odds of being a case, or by matching case and control individuals with similar numbers of observed locations. Applying our methods to synthetic and real-world scenarios, we demonstrate robust performance on detecting spatial clusters of infection risk from mobility data, outperforming competing baselines.
spatial scan statistics, social media data, spatial cluster detection
3 - 19
Detective conditions using multi-modal deep learning approach
Diana Mosquera
Diana Mosquera
Analyzing the voice behind words represents a central aspect of human communication and is therefore key to intelligent machines. In the field of computational linguistics, research has addressed human-computer interaction by allowing machines to recognize features such as loudspeakers, social and non-verbal signals, speech emotion, and prosody estimation. These concepts, combined with sequence modeling, allow us to generate an early diagnosis of a person's cognitive condition.
cognitive impairment, prosody models, sequence models
3 - 20
Diagnosing Mental Health Disorders using Deep Learning and Probabilistic Graphical Models
Juan Pavez, Simón Michell, Diego Acuña, Héctor Allende
Juan Pavez
Mental illnesses are becoming one of the most common health concerns among the world population, with important effects on the lives of the people who suffer from them. Despite the evidence of the efficacy of psychological and pharmacological treatments, mental illnesses are largely underdiagnosed and untreated, especially in developing countries. One important cause of this is the scarcity of mental health providers who can correctly diagnose and treat people in need of help. In this work, we developed a deep learning system to help in the differential diagnosis of mental disorders. Our system can analyze a patient's description of symptoms written in natural language and, based on that, ask questions to confirm or refine the initial diagnosis made by the deep learning model. We trained our model on thousands of anonymous symptom descriptions that we collected from various sources on the internet. The initial prediction is refined by asking symptom confirmation questions extracted from a probabilistic graphical model built by experts based on the diagnostic manual DSM-5. Preliminary studies, both on symptom descriptions from the internet and on clinical vignettes extracted from psychiatry exams, show very encouraging results.
deep learning, healthcare, natural language processing
3 - 21
Efficient Data Sampling for Product Title Classification with Unbalanced Classes and Partially Unreliable Labels
Tobias Veiga
Tobias Mesquita Silva da Veiga
Having a large corpus for training can be a great asset for developing efficient Machine Learning models. However, if a corpus is too large, computational problems may arise. Sampling the data is a reasonable approach, but it becomes more complex when the problem has additional restrictions. In the MercadoLibre Challenge 2019, not only was the corpus large, but the classes were also very unbalanced and most of the labels were unreliable. The method presented here is a simple way to sample from a large corpus while taking these restrictions into account. Using this sampling method and a simple SGDClassifier from scikit-learn, the public score was 90.38% (enough to rank 13th on the public leaderboard). Internal validation and leaderboard scores were very similar, differing by less than 0.1%. To improve the score further to 90.13% (2nd place), an ensemble was used that combined a similar variation of the sampling method with a few different models.
text-classification, unbalanced-classes, unreliable-labels
3 - 22
Efficiently Improved Hierarchical Text with External Knowledge Integration
Kervy Rivas, Gina Bustamante, Arturo Oncevay, Marco Sobrevilla
Gina Bustamante
Hierarchical text classification has been addressed recently with increasingly complex deep neural networks, taking advantage solely of the annotated corpus and raw monolingual data for pre-training embeddings and language modelling transfer. We turn the focus towards the potential semantic information in the target class definitions at the different layers of the hierarchy, and proceed to exploit it directly from word embedding spaces. We find that a less complex deep neural network can achieve state-of-the-art results by integrating the target class embeddings in an on-the-fly prediction from the highest levels. We also analyse the relevance of integrating this kind of external knowledge into a flat text classification scenario. Even with a straightforward approach to connecting external semantic information to the model, we outperform flat text classification baselines and previous work with more complex neural architectures on two well-known datasets.
text classification, knowledge integration, semantic information
3 - 23
EpaDB: analysis of a database for automatic Assessment of pronunciation
Jazmin Vidal, Luciana Ferrer
Jazmin Vidal
In this paper, we describe the methodology for collecting and annotating a new database designed for conducting research and development on pronunciation assessment. We created EpaDB (English Pronunciation by Argentinians Database), which is composed of English phrases read by native Spanish speakers with different levels of English proficiency. The recordings are annotated with ratings of pronunciation quality at phrase-level and detailed phonetic alignments and transcriptions indicating which phones were actually pronounced by the speakers. We present inter-rater agreement, the effect of each phone on overall perceived non-nativeness, and the frequency of specific pronunciation errors.
Pronunciation Scoring, Databases, Phone-level
3 - 24
Exploring Double Cross Cyclic Interpolation in Unpaired Image-to-Image Translation
Jorge López, Antoni Mauricio, Guillermo Cámara
Jorge Roberto López Cáceres
Unpaired image-to-image translation consists of transferring a sample a in domain A to an analogous sample b in domain B without intensive pixel-to-pixel supervision. Current approaches focus on learning a generative function that maps both domains but ignore the latent information, even though exploring it does not require explicit supervision. This paper proposes a cross-domain GAN-based model to achieve a bi-directional translation guided by latent space supervision. The proposed architecture provides a double-loop cyclic reconstruction loss in an exchangeable training scheme adopted to reduce mode collapse and enhance local details. Our proposal achieves outstanding results in visual quality, stability, and pixel-level segmentation metrics on different public datasets.
unpaired image-to-image translation, generative adversarial networks, latent space
3 - 25
FastDVDnet: Towards Real-Time Video Denoising Without Explicit Motion Estimation
Matias Tassano, Julie Delon, Thomas Veit
Matias Tassano
We propose FastDVDnet, a state-of-the-art video denoising algorithm based on a convolutional neural network architecture. Until recently, video denoising with neural networks had been a largely underexplored domain, and existing methods could not compete with the performance of the best patch-based methods. Our approach shows similar or better performance than other state-of-the-art competitors with significantly lower computing times. In contrast to other existing neural network denoisers, our algorithm exhibits several desirable properties, such as fast runtimes and the ability to handle a wide range of noise levels with a single network model. The characteristics of its architecture make it possible to avoid a costly motion compensation stage while achieving excellent performance. The combination of its denoising performance and lower computational load makes this algorithm attractive for practical denoising applications.
video denoising, cnn, residual learning
3 - 26
From medical records to research papers: A literature analysis pipeline for supporting medical genomic diagnosis processes
Fernando López Bello, Hugo Naya, Víctor Raggio, Aiala Rosá
Fernando López Bello
In this paper, we introduce a framework for processing genetics and genomics literature, based on ontologies and lexical resources from the biomedical domain. The main objective is to support the diagnosis process that is done by medical geneticists who extract knowledge from published works. We constructed a pipeline that gathers several genetics- and genomics-related resources and applies natural language processing techniques, which include named entity recognition and relation extraction. Working on a corpus created from PubMed abstracts, we built a knowledge database that can be used for processing medical records written in Spanish. Given a medical record from Uruguayan healthcare patients, we show how we can map it to the database and perform graph queries for relevant knowledge paths. The framework is not an end user application, but an extensible processing structure to be leveraged by external applications, enabling software developers to streamline incorporation of the extracted knowledge.
health records, natural language processing, medical terminology
3 - 27
Generative Adversarial Networks for Image Synthesis and Semantic Segmentation in Brain Stroke Images
Israel Chaparro, Javier Montoya
Israel Nazareth Chaparro Cruz
Brain stroke was ranked as the 2nd leading cause of death in 2016; automated methods that can locate and segment strokes could aid clinicians' decisions about acute stroke treatment. Most medical image datasets are limited and small and have a severe class imbalance, which limits the development of medical diagnostic systems. Generative Adversarial Networks (GANs) are one of the hottest topics in artificial intelligence and can learn how to produce data. This work presents conditional image synthesis with GANs for brain stroke image analysis and class balancing; furthermore, it presents a novel training framework for segmentation with GANs.
generative adversarial networks, image synthesis, image segmentation
3 - 28
How Important is Motion in Sign Language Translation?
Jefferson Rodríguez and Fabio Martínez
Jefferson Rodríguez
More than 70 million people use at least one Sign Language (SL) as their main channel of communication. Nevertheless, the absence of effective mechanisms to translate massive information among sign, written and spoken languages is the main cause of exclusion of deaf people into society. Thanks to recent advances, sign recognition has moved from a naive isolated sign recognition problem to a structured end-to-end translation. Today, the continuous SL recognition is an open research problem because of multiple spatio-temporal variations, challenging visual sign characterization, as well as the non-linear correlation between signs. This work introduces a compact sign to text approach that explores motion as an alternative to support SL translation. In contrast to appearance-based features, the proposed representation allows focused attention on main spatio-temporal regions relevant to a corresponding word. Firstly, a 3D-CNN network codes optical flow volumes to highlight sign features. Then, an encoder-decoder architecture is used to couple visual motion-sign information with respective texts. From a challenging dataset with more than 4000 video clips, motion-based representation outperforms appearance-based representation achieving 47.51 and 56.55 on WER and Blue-4 score.
sign language translation, motion patterns, encoder-decoder architecture
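The abstract above reports a word error rate (WER) but does not say how it was computed. As a rough illustration only, the following sketch uses the standard definition: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words, expressed here as a percentage (assuming the reported 47.51 is a percentage). The function name and the example sentences are hypothetical and are not taken from the authors' evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, for illustration only.
    print(word_error_rate("tomorrow rain in the north", "tomorrow rain north"))
```

Lower WER is better, whereas higher BLEU-4 is better, so the two figures in the abstract move in opposite directions as translation quality improves.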
3 - 30
Language-Agnostic Visual-Semantic Embeddings
Jônatas Wehrmann and Rodrigo C. Barros
Jônatas Wehrmann
This paper proposes a framework for training language-invariant cross-modal retrieval models. We also introduce a novel character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage requirements for a text encoder decrease substantially, allowing for lighter and more scalable retrieval architectures. The proposed language-invariant textual encoder based on characters is virtually unaffected in terms of storage requirements when novel languages are added to the system. Our contributions include new methods for building character-level-based word-embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant, but also presents better predictive performance. We show that our models outperform the current state-of-the-art in both single and multi-language scenarios. This work can be seen as the basis of a new path on retrieval research, now allowing for the effective use of captions in multiple-language scenarios.
multimodal learning, deep neural networks, language agnostic learning
3 - 31
Mining Opinions in the Electoral Domain based on social media
Jéssica Soares dos Santos, Aline Paes, Flavia Bernardini
Jéssica Soares dos Santos
Election polls are the de facto mechanisms to predict political outcomes. Traditionally, these polls are conducted based on a process that includes personal interviews and questionnaires. Taking into account that such a process is costly and time-demanding, many alternative approaches have been proposed to the traditional way of conducting election polls. In this research, we focus on the methods that use social media data to infer citizens’ votes. As the main contribution, this research presents the state-of-the-art of this area by comparing social media-based mechanisms to predict political outcomes taking into account the quantity of collected data, the specific social media used, the collection period, the algorithms adopted, among others. This comparison allows us to identify the main factors that should be considered when forecasting elections based on social media content and the main open issues and limitations of the strategies found in the literature. In brief, the main challenges that we have found include (but are not limited to): labeling data reliably during the short period of campaigns, absence of a robust methodology to collect and analyze data, and a lack of a pattern to evaluate the obtained results.
sentiment analysis, opinion mining, election outcomes
3 - 32
Object removal from complex videos using a few annotations
Thuc Trinh LE, Andrés ALMANSA, Yann GOUSSEAU & Simon MASNOU
Andres Almansa
We present a system for the removal of objects from videos. As input, the system only needs a user to draw a few strokes on the first frame, roughly delimiting the objects to be removed. To the best of our knowledge, this is the first system allowing the semi-automatic removal of objects from videos with complex backgrounds. The key steps of our system are the following: after initialization, segmentation masks are first refined and then automatically propagated through the video. Missing regions are then synthesized using video inpainting techniques. Our system can deal with multiple, possibly crossing objects, with complex motions, and with dynamic textures. This results in a computational tool that can alleviate tedious manual operations for editing high-quality videos.
video inpainting, object removal, semantic segmentation
3 - 33
Persona-oriented approach to building a paraphrase corpus
Rossana Cunha, Adriana Pagano, Fabio Alves
Rossana Cunha
Paraphrasing shares intertwined perspectives from Linguistics and Natural Language Processing (NLP). In a broad sense, paraphrases are expressions that share approximately the same meaning. Several NLP tasks involve paraphrasing, such as paraphrase identification, text simplification, textual entailment, and semantic textual similarity. In this study, we explain how the use of personas benefited the compilation and alignment of a Brazilian Portuguese paraphrase corpus in the domain of diabetes self-management education. Our main objective is to construct a paraphrase corpus that meets the needs of healthcare professionals, patients, and families. The corpus consists of sentence pairs from three groups of real users: (i) doctors/expert readers, (ii) nurses and healthcare assistants, and (iii) patients/lay readers. We combine the Systemic Functional Theory (Halliday and Matthiessen, 2014) with semantic-based NLP approaches in order to recognize paraphrase relationships. Finally, a committee of domain experts (linguists and health professionals) further evaluates these pairs of sentences in order to validate our approach. Our experiments show preliminary results for a monolingual corpus aligned across expert, specialist, and lay discourse in the diabetes mellitus self-care domain.
natural language processing, persona, paraphrase corpus
3 - 34
PTb-Entropy: Leveraging Phase Transition of Topics for Event Detection in Social Media
Pedro H. Barros, Isadora Cardoso-Pereira, Hector Allende-Cid, Osvaldo A. Rosso and Heitor S. Ramos
Pedro Henrique Silva Souza Barros
Social media has gained increasing attention in recent years. It allows users to create and share information in an unprecedented way. Event detection in social media, such as Twitter, is related to the identification of the first story on a topic of interest. In this work, we propose a novel approach based on the observation that tweets undergo a continuous phase transition when an event takes place, i.e., their underlying model changes. The method detects events on Twitter by calculating the entropy of the keywords extracted from the content of tweets in order to classify the most shared topic as an event or not. We present a theoretical rationale for the existence of phase transitions, as well as a characterization of phase transitions with synthetic models and with real data. We evaluated the performance of our approach using seven data sets, and we outperformed nine different techniques from the literature.
event detection, social media analysis, phase transition
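The abstract above relies on the entropy of keywords extracted from tweets, without giving the exact formulation or the decision rule. The sketch below only illustrates plain Shannon entropy over a keyword frequency distribution; it is not the PTb-Entropy method itself, and the function name and sample keywords are hypothetical.

```python
import math
from collections import Counter


def keyword_entropy(keywords):
    """Shannon entropy (in bits) of the empirical keyword distribution."""
    counts = Counter(keywords)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical keyword streams: a diffuse one and a concentrated one.
    diffuse = ["music", "football", "weather", "traffic"]
    concentrated = ["earthquake", "earthquake", "earthquake", "traffic"]
    print(keyword_entropy(diffuse))
    print(keyword_entropy(concentrated))
```

Under this simple measure, a stream dominated by one keyword has lower entropy than a stream spread evenly over many keywords, which is the kind of change in the underlying distribution that an entropy-based detector could monitor.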
3 - 35
Relation extraction in Spanish radiology reports
Viviana Cotik, Javier Minces Müller
Viviana Cotik
The number of digitized texts in the clinical domain has been growing steadily in recent years, due to the adoption of clinical information systems. Information extraction from clinical texts is essential to support clinical decisions and is important for improving health care. The scarcity of lexical resources and corpora, the informality of texts, the polysemy of terms and the abundance of non-standard abbreviations and acronyms, among others, make the task difficult. For languages other than English, the challenges are usually greater. In this work, we present three different methods developed to perform relation extraction between clinical findings and anatomical entities in Spanish clinical reports: a baseline method based on co-occurrence of entities, a rule-based method and a work in progress based on convolutional neural networks. As data, we use a set of Spanish radiology reports that we previously annotated.
relation extraction, spanish radiology reports, bionlp
3 - 36
Revisiting Latent Semantic Analysis word-knowledge acquisition
Edgar Altszyler, Diego Fernandez-Slezak
Edgar Altszyler
Latent Semantic Analysis (LSA) is one of the most widely used corpus-based methods for word meaning representation (word-embeddings). In 1997, Landauer and Dumais published the foundational work “A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge”. In this paper, they compared word-knowledge acquisition in LSA with that of children, and claimed that most of the knowledge acquired comes from indirect associations (high-order co-occurrences). To this day, LSA continues to be intensively used in the computational psychology community as a plausible model of vocabulary acquisition. In this work, we revisit the experiments of Landauer and Dumais (1997) and discuss some technical elements that call into question the presence of indirect learning processes in LSA word-knowledge acquisition. We support our discussion with new experiments that shed light on the matter.
latent semantic analysis, vocabulary acquisition, higher-order co-occurrence
3 - 37
Single image deblurring with convolutional neural networks
Guillermo Carbajal, Mauricio Delbracio, José Lezama, Pablo Musé
Guillermo Carbajal
Single image deblurring is a well-studied task in computer vision and image processing. Blur may be caused by camera shake, object motion or defocus. In general, deblurring is a challenging inverse problem that is severely ill-posed. When blur across the image can be considered uniform, traditional methods produce satisfactory results. However, in the more general non-uniform case, state-of-the-art deblurring methods are end-to-end trainable convolutional neural networks. These networks learn a nonlinear mapping from pairs of low-quality and high-quality images. Long-exposure blurry frames are generated by averaging consecutive short-exposure frames from videos captured by high-speed cameras, e.g. GoPro Hero 4 Black. These generated frames are quite realistic since they can simulate complex camera shake and object motion, which are common in real photographs. Although these networks produce impressive results in some cases, their performance remains irregular and case-dependent. In this project we investigate whether it is possible to improve network performance by incorporating prior knowledge in the training process.
image deblurring, restoration, cnn
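The abstract above mentions that blurry training frames are generated by averaging consecutive short-exposure frames. A minimal sketch of that averaging step follows, assuming grayscale frames stored as nested lists of pixel intensities; the function name, the frame representation and the tiny example frames are assumptions made for illustration, and a real pipeline would typically operate on full-resolution colour images.

```python
def average_frames(frames):
    """Average consecutive frames pixel by pixel to simulate a long exposure.

    `frames` is a non-empty list of equally sized 2D lists of intensities.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    height = len(frames[0])
    width = len(frames[0][0])
    n = len(frames)
    return [
        [sum(frame[y][x] for frame in frames) / n for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Hypothetical 1x3 frames: a bright pixel moving from left to right.
    sharp_frames = [
        [[255, 0, 0]],
        [[0, 255, 0]],
        [[0, 0, 255]],
    ]
    print(average_frames(sharp_frames))
```

In this toy example the moving bright pixel is smeared evenly over the three positions it visited, which is the motion-blur effect that the averaged frames are meant to imitate; one of the sharp frames can then serve as the high-quality target of the training pair.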
3 - 38
Syntactic Analysis and Semantic Role Labeling for Spanish using Neural Networks
Luis Chiruzzo and Dina Wonsever
Luis Chiruzzo
We developed a neural network architecture for parsing Spanish sentences using a feature-based grammar. The architecture consists of several LSTM neural network models that produce syntactic analysis and semantic role labeling: they first determine how to split a sentence into a binary tree, then assign the rule to be applied for each pair of branches, and finally determine the argument structure of the resulting segments. We analyze two variants of this architecture and conclude that merging the split and rule identification models yields better results than training them separately. We train and evaluate the performance of these models against two Spanish corpora: the AnCora corpus and the IULA treebank.
parsing, spanish, lstm
3 - 39
The Encoder-Decoder Model Applied to Brazilian-Portuguese Verbal Irregularities
Beatriz Albiero
Inspired by the controversial debate about the acquisition of irregular verbs in the English language, this research aims to study the inflection process of irregular verbs in the Brazilian Portuguese language through the use of the Encoder-Decoder model. To do this, we propose the task of predicting an inflected verbal form given a primary form (Stem + Thematic Vowel). For this purpose, we built a corpus of 423 verbs that were marked as belonging to either the regular (51%) or the irregular (49%) group. Moreover, within the scope of irregular verbs, it was possible to identify 15 subgroups through the identification of inflection patterns. We also built a specific phonetic notation so that verbs could be associated with new representations that included information related to the phonetic features present. Thus, the proposed model attempts to predict inflected forms by identifying the phonetic relationships involved during the inflection process. The model was trained and tested multiple times and presented an average accuracy of 13.55%. Considering the split between regular and irregular verbs, the model performed better on the regular class (17.88% vs 9.23%).
computational linguistics, phonetics, connectionism
3 - 40
Towards goal-oriented dialog systems that consider their visual context
Luciana Benotti and Mauricio Mazuecos
Mauricio Diego Mazuecos Perez
Research in deep neural networks has made great progress in the area of computer vision in the last decade. There are preliminary works that make use of these advances to allow a dialogue system to talk about what is "observed" in an image. So far these systems are usually limited to answering questions about the image. In this project we investigate the generation of goal-directed questions that refer to the visual context of an image. We analyze how the visual context contributes to disambiguating the use of situated language. Finally, we model how the goal of the task influences the “salience” of the visual context and how the visual context restricts the range of possible clarification requests in a dialog.
natural language question generation, visually grounded dialog, reward shaping for reinforcement learning
3 - 41
Unraveling Antonym’s Word Vectors through a Siamese-like Network
Mathias Etcheverry and Dina Wonsever
Mathias Etcheverry
Discriminating antonyms and synonyms is an important NLP task that is difficult because both antonyms and synonyms contain similar distributional information. Consequently, pairs of antonyms and synonyms may have similar word vectors. We present an approach to unravel antonymy and synonymy from word vectors based on a siamese-inspired network. The model consists of a two-phase training of the same base network: a pre-training phase according to a siamese model supervised by synonyms and a training phase on antonyms through a siamese-like model that supports the antitransitivity present in antonymy. The approach makes use of the claim that the antonyms that a word has in common tend to be synonyms. We show that our approach outperforms distributional and pattern-based approaches, relying on a simple feed-forward network as the base network of the training phases.
word embeddings, synonym/antonym detection, siamese/parasiamese neural network
3 - 42
Unsupervised anomaly detection in 2D radiographs using generative models
Laura Estacio-Cerquin, Moritz Ehlke, Alexander Tack, Stefan Zachow, Hans Lamecker, Rensso Mora-Colque
Laura Jovani Estacio Cerquin
Anomaly detection in medical images plays an important role in the development of biomedical applications. One interesting example is computer-aided diagnosis tools, which are used in clinical routine for detecting and diagnosing pathologies, evaluating diseases, and planning treatment. This kind of application requires manual identification of anomalies by experts, which is a tedious, error-prone and time-consuming task. Therefore, this research field still poses highly challenging identification problems, which will be addressed in this research. The fundamental hypothesis is that manual identification problems could be solved using unsupervised methods that require minimal interaction by medical experts. We focus on the identification of prostheses and foreign objects located in the pelvic bone using X-ray images. The main idea is to use generative models such as convolutional autoencoders and variational autoencoders to reproduce X-rays without anomalies. Thereby, if a new X-ray image has an anomaly, we will be able to identify it by subtracting the reconstructed image from the input image. Preliminary results show good performance in the anomaly detection process.
anomaly detection, generative models, medical images
3 - 43
Using Contextualized Word Embeddings to detect Hate Speech in Social Media
Juan Manuel Pérez, Franco Luque, Agustín Gravano
Juan Manuel Pérez
Hate speech (also known as cyber bullying) is a pervasive phenomenon on the Internet. Racist and sexist discourse is a constant in social media, with peaks documented after “trigger” events, such as murders with religious or political motives, or other events related to the affected groups. Interventions against this phenomenon (such as Reddit's ban in 2015) have proven effective in restraining its proliferation. Due to the amount of content generated in social media, automatic tools are crucial to reduce human effort in the detection of abusive speech. In this work we present a hate speech classifier based on recurrent neural networks and contextualized word-embeddings. We use data from a recent competition (HatEval @ SemEval 2019), achieving slightly better results in Spanish. Moreover, we analyze the behaviour of our neural model to understand where it fails to detect hate speech.
nlp, hate speech, contextualized embeddings
3 - 44
Video Segmentation with Complex Networks
Josimar Chire Saire
Josimar Chire
Nowadays, the quantity of multimedia files (images, videos, audio, etc.) is increasing every day. It is therefore necessary to analyze these files when looking for something specific. Focusing on video surveillance, the number of cameras in cities has exploded; after a crime, people usually go to the video records to find related evidence. Complex Networks is an approach for analyzing phenomena by considering their inner relationships, represented as graphs. The objective of this work is to combine techniques from Image Processing with Complex Networks and Machine Learning (K-Means) to analyze surveillance video and perform automatic segmentation for later analysis. Initial experiments show the capacity for automatic segmentation using a Complex Network representation.
complex networks, machine learning, video segmentation
3 - 45
Winograd Schemas in Portuguese
Gabriela S. de Melo, Vinicius A. Imaizumi, Fabio G. Cozman
Gabriela Souza de Melo
The Winograd Schema Challenge has become a common benchmark for question answering and natural language processing. The original set of Winograd Schemas was created in English; in order to stimulate the development of Natural Language Processing in Portuguese, we have developed a set of Winograd Schemas in Portuguese. We have also adapted solutions proposed for the English-based version of the challenge so as to have an initial baseline for its Portuguese-based version; to do so, we created a language model for Portuguese based on a set of Wikipedia documents.
winograd schema challenge, natural language processing, deep learning
3 - 46
Learning the operation of energy storage systems from real trajectories of demand and renewables
Agustin Castellano, Juan Andrés Bazerque
Agustin Castellano
Storage systems at the grid level have the potential to increase the power system performance in many aspects, including arbitrage of energy, frequency stabilization, and stable island operation. When grid operators plan the investments for expansion, they must compare these benefits to the cost of installation of massive storage. We take on this question by analyzing the potential savings by arbitrage of energy, focusing on a single-bus model of the grid that includes storage, fuel and hydro-based generators, and renewables. A storage dispatch policy is optimized via the Q-learning algorithm under a cyclostationary model of the random variables. Our algorithm starts with no prior knowledge of the system and progresses to take actions that take into account the expected future state of the system. The learning agent is trained with real trajectories of demand and renewables collected from the Uruguayan power system over three years, and with a fitted cost that accounts for the actual aggregated price of energy at the Uruguayan generation market. The learned policy operates the storage system achieving lower operational costs compared to a grid with no batteries.
q-learning, reinforcement learning, energy storage systems
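The abstract above optimizes a storage dispatch policy with Q-learning. As a hedged illustration of the general technique only, the sketch below shows the standard tabular Q-learning update on a toy table. The state names, action names, reward values, learning rate and discount factor are all hypothetical; the authors' state space, cost model and cyclostationary setup are not described in enough detail here to reproduce.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update and return the new Q-value.

    q[s][a] is the current estimate of the return for taking action a in s.
    """
    best_next = max(q[next_state].values())      # greedy value of next state
    td_target = reward + gamma * best_next       # bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    # Hypothetical two-state storage example; rewards stand in for negative cost.
    q_table = {
        "battery_low": {"charge": 0.0, "discharge": 0.0},
        "battery_high": {"charge": 0.0, "discharge": 2.0},
    }
    new_value = q_update(
        q_table, "battery_low", "charge",
        reward=1.0, next_state="battery_high", alpha=0.5, gamma=0.5,
    )
    print(new_value)
```

Because the abstract is framed in terms of operational cost, a cost-minimizing variant would use the negative cost as the reward (or replace the maximum with a minimum over costs); the update rule is otherwise the same.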
3 - 47
The use of computer vision for soil analysis from pfeiffer chromatographs
Nathália Ferreira de Figueiredo, Wallinson Deives Batista Lima, Liomar Renner de Araújo Rabelo, Oderlan Freire de Sousa, João Vitor de Araújo Rocha
Nathália Ferreira de Figueiredo
The SharinAgro project was created to provide assistance to CSAs. CSA stands for Community Supported Agriculture, which is a worldwide social way of producing food that puts farmers and consumers in direct contact. In CSAs, consumers become supporters of farmers, as if owning a piece of the farm, contributing a monthly or weekly fee to help with the cost of food production. The CSA then produces baskets of organic produce, with fruits, vegetables, honey and other foods of natural origin at a cheaper price, and also establishes a trusting relationship with producers. With this in mind, we want to use technology to create software that helps provide financial and administrative control to CSAs. Through SharinAgro's application, users will have access to functionalities such as organization of human resources and control of operational expenses. In this way we help CSAs to have more financial security with regard to harvest losses caused by pests, harsh environmental conditions and other factors. On top of that, we propose a feature for soil health analysis using Pfeiffer’s chromatography. Using machine learning, we aim to give the user accurate predictions for agricultural management purposes.
computer vision, pfeiffer chromatography, soil analysis
3 - 48
Quanam Data & Analytics
Collaborative tool for physicians to streamline access to the latest medical science findings. Based on paper
health records, natural language processing, medical terminology