In my genomics series, I have tried to unravel what happened in late 2019 and early 2020, when the team around Prof. Zhang and the Chinese CDC identified the first sequence claimed to be the genome of SARS-CoV-2.
To date, several of us have not been able to reproduce the exact contigs used to produce the current reference genome of the first SARS-CoV-2 variant.
Also, nobody has so far shown that the ends of the genome can be aligned perfectly with multiple reads, which would show that the genome itself exists as a distinct entity.
In a recent Twitter debate, Kevin McKernan said that the reads would have to be trimmed by the four (or even five, as shown in the red box) random bases that were added by the Takara protocol in order to align them properly to the head of the genome.
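To make the trimming argument concrete, here is a minimal Python sketch of what such a check could look like: remove a fixed number of leading bases from each read, then test whether the remainder matches the start of a reference exactly. The sequences are made-up toy data, not real reads or the real genome, and the function names `trim_leading_bases` and `matches_reference_head` as well as the `min_overlap` threshold are my own illustrative assumptions. A real analysis would use an aligner and tolerate mismatches; this only tests for exact matches at position 0.

```python
def trim_leading_bases(read: str, n_trim: int) -> str:
    """Drop the first n_trim bases of a read (e.g. bases added by a library prep)."""
    return read[n_trim:]


def matches_reference_head(read: str, reference: str, n_trim: int,
                           min_overlap: int = 10) -> bool:
    """Return True if the trimmed read matches the reference exactly from position 0.

    min_overlap is an arbitrary illustrative threshold, not a protocol value.
    """
    trimmed = trim_leading_bases(read, n_trim)
    if len(trimmed) < min_overlap:
        return False
    overlap = min(len(trimmed), len(reference))
    return trimmed[:overlap] == reference[:overlap]


# Toy data only: a made-up reference and three made-up reads.
toy_reference = "ACGTTGCAAGGCTTAACCGGTTAC"
toy_reads = [
    "GATC" + toy_reference[:16],   # four extra leading bases
    "TTGCA" + toy_reference[:16],  # five extra leading bases
    toy_reference[:16],            # no extra leading bases
]

for n_trim in (0, 4, 5):
    hits = sum(matches_reference_head(r, toy_reference, n_trim) for r in toy_reads)
    print(f"trim={n_trim}: {hits} of {len(toy_reads)} toy reads match the reference head")
```

With this toy data, each trim length matches exactly one of the three reads. That is the practical consequence of the "four or even five" wording: a single fixed trim length cannot line up reads that carry different numbers of extra bases.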
However, further investigation of these claims reveals:
The statement that four nucleotides must be trimmed invalidates McKernan’s earlier statement that adaptors would not ligate (bind) to the ends of the genome in the first place. (Of course, that’s why the random adaptors are used, but it would still make the earlier statement irrelevant, as ligation would not be a problem in this case.)