# Containment
You can calculate n-gram counts using count vectorization, and then follow the formula for containment:
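$$ \text{containment} = \frac{\sum \text{count}(\text{ngram}_{A}) \cap \text{count}(\text{ngram}_{S})}{\sum \text{count}(\text{ngram}_{A})} $$

where A = answer text and S = source text. The intersection of two counts is the smaller of the two, taken per n-gram.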
If the two texts have no n-grams in common, the containment is 0; if every n-gram in the answer text also appears in the source text, the containment is 1. Intuitively, having longer n-grams in common can be an indication of cut-and-paste plagiarism.
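To make the definition concrete, here is a minimal sketch that computes containment directly from two short strings. It uses `collections.Counter` as a simple stand-in for count vectorization. The helper names `ngram_counts` and `containment_from_texts` and the two example sentences are illustrative assumptions, not part of the original exercise.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams in a text (a simple stand-in for count vectorization)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment_from_texts(answer, source, n):
    """Shared n-gram count divided by the answer's total n-gram count."""
    counts_a = ngram_counts(answer, n)
    counts_s = ngram_counts(source, n)
    # Counter & Counter keeps the minimum count for each n-gram present in both
    shared = sum((counts_a & counts_s).values())
    total_a = sum(counts_a.values())
    # Guard against an answer that is too short to contain any n-gram
    return shared / total_a if total_a else 0.0


# Hypothetical example texts
answer = "the quick brown fox jumps"
source = "a quick brown fox sleeps"

print(containment_from_texts(answer, source, 1))  # 3 of 5 unigrams shared
print(containment_from_texts(answer, source, 2))  # 2 of 4 bigrams shared
```

The value is normalized by the answer text only, so containment is not symmetric: swapping `answer` and `source` can give a different result when the texts differ in length.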
import numpy as np


def containment(ngram_array):
    ''' Containment is a measure of text similarity. It is the normalized
    intersection of n-gram word counts in two texts.
    :param ngram_array: an array of n-gram counts for an answer (row 0)
        and source (row 1) text.
    :return: a normalized containment value.'''
    ngram_array = np.asarray(ngram_array)
    # intersection: for each n-gram, the count that both texts share
    intersection = np.minimum(ngram_array[0], ngram_array[1])
    # normalize by the total number of n-grams in the answer text
    return intersection.sum() / ngram_array[0].sum()


# row_0 = text 1
# row_1 = text 2
ngram_array = np.array([[1, 1, 1, 0, 1, 1],
                        [0, 0, 1, 1, 1, 1]], dtype=np.int64)
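As a check, the containment definition can be worked by hand on the example array above. This takes row 0 as the answer text A and row 1 as the source text S, an assumption based on the order given in the docstring. The per-column minimum of the two rows is the intersection:

$$ \min(\text{row}_0, \text{row}_1) = [0, 0, 1, 0, 1, 1], \qquad \sum \min(\text{row}_0, \text{row}_1) = 3, \qquad \sum \text{row}_0 = 5 $$

$$ \text{containment} = \frac{3}{5} = 0.6 $$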