In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations, including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages within a single utterance but may change languages across utterances. We conduct our experiments on English and Mandarin dictation tasks, and we find that BBPE with penalty schemes can improve utterance-based bilingual ASR performance by 2% to 5% relative, even with a smaller number of outputs and fewer parameters. We conclude with an analysis that indicates directions for further improving multilingual ASR.
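
To make the contrast between character-level and byte-level output units concrete, the following minimal sketch tokenizes one English and one Mandarin string both ways. It is an illustration only, not the system described here: the helper names `char_tokens` and `byte_tokens` and the sample strings are assumptions introduced for this example, and UTF-8 is assumed as the byte encoding.

```python
# Illustrative sketch: character-level vs. byte-level output units.
# Helper names and sample strings are hypothetical, chosen for this example.

def char_tokens(text: str) -> list[str]:
    """Split text into Unicode characters (character-level units)."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Split text into UTF-8 byte values (byte-level units, each in 0..255)."""
    return list(text.encode("utf-8"))


english = "hello"
mandarin = "你好"

for sample in (english, mandarin):
    chars = char_tokens(sample)
    raw = byte_tokens(sample)
    print(f"{sample!r}: {len(chars)} characters, {len(raw)} bytes -> {raw}")

# Byte sequences are lossless: decoding the bytes recovers the original text.
restored = bytes(byte_tokens(mandarin)).decode("utf-8")
```

A character-level inventory for Mandarin needs one output per distinct character, which can run to thousands of units, whereas a byte-level inventory never exceeds 256 units for any language. The trade-off is longer output sequences for non-ASCII text, and the possibility that a model emits a byte sequence that is not valid UTF-8. BPE and BBPE sit between these extremes by merging frequent character or byte sequences into larger units.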