Hierarchical attention networks for document classification
Answers
Explanation:
We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms, applied at the word level and the sentence level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.
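The two levels of attention described above can be illustrated with a minimal sketch. It assumes that word vectors are already given, that relevance is scored by a dot product with a context vector, and that the encoders that would normally produce the word and sentence vectors are left out. The names `attend`, `document_vector`, `word_context`, and `sentence_context` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors, context):
    # Assumption: relevance is the dot product between each vector and a
    # context vector. The weighted average is the pooled representation.
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def document_vector(doc, word_context, sentence_context):
    # Level 1: pool the words of each sentence into a sentence vector.
    sentence_vectors, word_weights = [], []
    for sentence in doc:
        s_vec, w = attend(sentence, word_context)
        sentence_vectors.append(s_vec)
        word_weights.append(w)
    # Level 2: pool the sentence vectors into a single document vector.
    doc_vec, sent_weights = attend(sentence_vectors, sentence_context)
    return doc_vec, word_weights, sent_weights

# Toy document: two sentences of 2-dimensional word vectors.
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 0.0]],
]
doc_vec, word_weights, sent_weights = document_vector(doc, [1.0, 0.0], [0.0, 1.0])
```

The returned `word_weights` and `sent_weights` are the quantities one would plot to visualize which words and sentences receive the most attention. The document vector would then be passed to a classifier, which is omitted here.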
Text classification is one of the fundamental tasks in Natural Language Processing. The goal is to assign labels to text. It has broad applications, including topic labeling (Wang and Manning, 2012), sentiment classification (Maas et al., 2011; Pang and Lee, 2008), and spam detection (Sahami et al., 1998). Traditional approaches to text classification represent documents with sparse lexical features, such as n-grams, and then use a linear model or kernel methods on this representation (Wang and Manning, 2012; Joachims, 1998). More recent approaches used deep learning, such as convolutional neural networks (Blunsom et al., 2014) and recurrent neural networks based on long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997), to learn text representations.
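To make the traditional pipeline concrete, the following sketch builds sparse unigram and bigram counts and fits a linear model on them. The perceptron update, whitespace tokenization, and the toy examples are assumptions made for illustration; the cited works use other linear and kernel learners, and the function names here are hypothetical.

```python
from collections import Counter

def ngram_features(text, max_n=2):
    # Sparse bag of n-grams: maps each n-gram string to its count.
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def score(weights, feats):
    # Linear model: dot product between sparse weights and sparse features.
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_perceptron(examples, epochs=5):
    # Labels are +1 or -1; weights change only on misclassified examples.
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = ngram_features(text)
            if label * score(weights, feats) <= 0:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

# Toy sentiment data, invented for this example.
examples = [
    ("great movie", 1),
    ("terrible movie", -1),
    ("great acting", 1),
    ("terrible plot", -1),
]
weights = train_perceptron(examples)
```

A new text is classified by the sign of `score(weights, ngram_features(text))`. Because only n-grams that occur in the text are stored, the representation stays sparse even when the vocabulary is large.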