While large-scale neural language models, such as GPT-2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops under maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive, since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probabilities of repetitive tokens and their previous repetitions in the context. Through quantitative experiments, we find that 1) models have a preference to repeat the previous sentence; 2) sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by these findings, we propose a simple and effective training method, DITTO (PseuDo-RepetITion PenalizaTiOn), in which the model learns to penalize the probabilities of sentence-level repetitions on pseudo-repetitive data. Although our method is motivated by mitigating repetition, our experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
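The core idea of the training method described above can be sketched in a few lines: build a pseudo-repetitive sequence by repeating a sentence, then penalize the model when the probability of re-emitting the sentence grows across repetitions instead of decaying. The following is a minimal pure-Python sketch under stated assumptions: the function names, the decay factor `lam`, and this exact absolute-deviation penalty form are illustrative, not the paper's precise loss, which operates on the model's per-token probabilities during training.

```python
def build_pseudo_repetitive_sequence(sentence_tokens, num_repeats):
    """Construct pseudo-repetitive data by concatenating a tokenized
    sentence with itself num_repeats times."""
    return sentence_tokens * num_repeats


def ditto_penalty(token_probs_per_repeat, lam=0.5):
    """Illustrative repetition penalty (an assumption, not the exact
    published loss): at repetition n, the desired probability of each
    repeated token is lam times its probability at repetition n-1,
    so repetition is pushed to decay rather than self-reinforce.

    token_probs_per_repeat: list of lists, one list of token
    probabilities per occurrence of the repeated sentence.
    Returns the mean absolute deviation from the decayed target.
    """
    total, count = 0.0, 0
    for n in range(1, len(token_probs_per_repeat)):
        prev = token_probs_per_repeat[n - 1]
        curr = token_probs_per_repeat[n]
        for p_prev, p_curr in zip(prev, curr):
            total += abs(p_curr - lam * p_prev)
            count += 1
    return total / count
```

A sequence whose repetition probabilities halve at each occurrence incurs zero penalty under `lam=0.5`, while one whose probabilities climb across repetitions (the self-reinforcement effect measured in the paper) is penalized.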