Example 1 – CDKN2B-AS1’s DNA binding sites at the CDKN2A/2B region

Input lncRNA and DNA sequences: Human CDKN2B-AS1 (also called ANRIL) + Human CDKN2A/2B region (hg19).
LongTarget parameters: The default setting, except Nt=40.
Filter condition: None.
Result: The TTS distribution of TFO1.
Background: RefSeq Genes, CpG Islands, and ENCODE DNA Methylation and ENCODE Histone Modification in some cell lines in the UCSC Genome Browser.

Example 2 – A permutation test

All runs use the default LongTarget parameter setting, except Nt=20. Triplexes are generated by TFO1 (also called the best TFO) of CDKN2B-AS1 at the CDKN2A/2B region, as in Example 1.
(top panel) In (A) and (B), the blue and red dots at point 1 indicate the numbers of triplexes generated by all rulesets and by the dominant ruleset (also called the best ruleset), R12, respectively. In (A), the dots at points 2-101 indicate the numbers of triplexes generated by the original lncRNA sequence and 100 shuffled DNA sequences. In (B), the dots at points 2-101 indicate the numbers of triplexes generated by 100 shuffled lncRNA sequences and the original DNA sequence. When the DNA sequence is shuffled, TFO1 generates few triplexes in every case. When the lncRNA sequence is shuffled, TFO1 generates a considerable number of triplexes in three cases (points 14, 58, and 65), and many of these are generated by R12 itself.
(bottom panel) Notably, the TTSs generated in the cases at points 14, 58, and 65 (B, C, and D) have distributions identical to that generated in case 1 (A).
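
The shuffling step of such a permutation test can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes a simple mononucleotide shuffle (the text does not state which shuffling method was applied), and the function name `shuffled_copies` and the toy sequence are hypothetical. Each shuffled copy would then be submitted to LongTarget together with the unshuffled partner sequence.

```python
import random


def shuffled_copies(seq, n=100, seed=0):
    """Return n shuffled copies of seq; each copy keeps the base composition."""
    rng = random.Random(seed)  # fixed seed so that a round can be reproduced
    copies = []
    for _ in range(n):
        bases = list(seq)
        rng.shuffle(bases)  # in-place mononucleotide shuffle (an assumption)
        copies.append("".join(bases))
    return copies


# Hypothetical toy sequence for illustration only (not CDKN2B-AS1).
toy_lncrna = "AAGGAGAAAGGGAAACUCUUCCGAUAAGGA"

# Point 1 corresponds to the original sequence, points 2-101 to the shuffled copies.
points = [toy_lncrna] + shuffled_copies(toy_lncrna, n=100)
print(len(points))
```

A mononucleotide shuffle preserves base composition but not dinucleotide frequencies; a different shuffling scheme would change only the body of `shuffled_copies`.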

Shown here are the results of two more rounds of the permutation test with the same parameters and inputs. In (A) and (C), when the DNA sequence is shuffled 100 times, few triplexes are generated. In (B), when the lncRNA sequence is shuffled 100 times, TFO1 generates many triplexes in two cases (points 3 and 60), with R9 and R12 being the best and second-best rulesets. At point 3, R9 and R12 determine 22 and 8 A-rich triplexes, respectively; at point 60, they determine 45 and 32 A-rich triplexes.
In (D), the lncRNA sequence is shuffled a further 100 times, and in several cases TFO1 generates many triplexes. At point 41, the best and second-best rulesets are R9 and R12, which determine 20 and 18 A-rich triplexes, respectively, whereas at point 80 the triplexes are determined by diverse rulesets (including H6). When triplexes < 40 bp are removed from the case at point 41, R12 becomes the best ruleset (E). R9 and R12 differ only in GC-C and GC-U, and the triplexes generated by both are A-rich but contain different numbers of C and T.
The permutation test strongly supports LongTarget. In addition, the results indicate that whenever and wherever an A-rich TFO is formed by sequence shuffling, the TFO generates triplexes by ruleset R12 or R9, and these triplexes are densely distributed at CDKN2A’s promoter (the triplex distributions determined by R9 at point 60 in (B) and at point 41 in (D) are shown in (F)). If no such A-rich TFO is formed, no strong triplexes are generated in this genomic region. Triplexes generated by shuffled lncRNA sequences might be assumed to be false positives, but the above analysis indicates that in the 7 cases (3 on the previous page, and 2 in (B) and 2 in (D) on this page) the triplexes should be true positives.

Example 3 – H19’s DNA binding sites at the GTAM region

Input lncRNA and DNA sequences: human H19 and human GTAM region (hg19).
LongTarget parameters: The default setting.
Filter condition: None.
Results: The TTS distribution of TFO1.
Background: RefSeq Genes, CpG islands, ENCODE Histone Modification, and ENCODE DNA Methylation signals in some cell lines in the UCSC Genome Browser.

Example 4 – TTS distributions generated by different LongTarget parameters

Each panel shows, from top to bottom, the results generated with the parameters “Identity=60, offset=15, Nt=50”, “Identity=60, offset=15, Nt=70”, “Identity=70, offset=15, Nt=50”, and “Identity=80, offset=15, Nt=50”. With “Identity=80”, many reasonable TTSs disappear.

Example 5 – PTENP1-asRNA’s DNA binding sites at the PTEN region

Input lncRNA and DNA sequences: human PTENP1-asRNA and human PTEN region (hg19).
LongTarget parameters: The default setting.
Filter condition: None.
Results: The TTS distribution of TFO1.
Background: RefSeq Genes, CpG islands, ENCODE Histone Modification, and ENCODE DNA Methylation signals in some cell lines in the UCSC Genome Browser.

An experimental study found that PTENpg1 encodes two antisense RNAs (here called PTENP1-asRNA), one of which localizes to the PTEN promoter and epigenetically modulates PTEN transcription by recruiting DNMT3a and EZH2 (Johnsson et al., 2013). Four years later, it was reported that exon1 of PTENP1-asRNA binds to the PTEN promoter (Lister et al., 2017). LongTarget predicts that PTENP1-asRNA has a clear TTS at the PTEN promoter and that TFO1 is located exactly in exon1. This TTS overlaps DNA methylation and histone modification signals in multiple cell lines (top), and exon1 exists only in humans and chimpanzees (bottom).

Example 6 – Binding sites of 13562 human lncRNAs at the CDKN2A/2B region

Input lncRNA and DNA sequences: The 13562 GENCODE-annotated human lncRNAs and human CDKN2A/2B region (hg19).
LongTarget parameters: The default setting.
Filter condition: Remove TTSs whose total area < 400 or height < 30.
Results: The TTS distribution of TFO1.
Background: RefSeq Genes, CpG islands, ENCODE Histone Modification, and ENCODE DNA Methylation signals in some cell lines in the UCSC Genome Browser.
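
The filter condition above can be expressed as a short sketch. The `TTS` record, its field names, and the example values are hypothetical and do not reflect LongTarget's actual output format; only the two thresholds (total area 400 and height 30) come from the text.

```python
from dataclasses import dataclass


@dataclass
class TTS:
    """Hypothetical record for one TTS; field names are assumptions."""
    lncrna: str
    start: int
    end: int
    height: float
    area: float


def keep_tts(tts, min_area=400, min_height=30):
    """Keep a TTS only if neither its total area nor its height is below threshold."""
    return tts.area >= min_area and tts.height >= min_height


# Made-up records for illustration only.
candidates = [
    TTS("lnc_A", 100, 180, height=45, area=900),  # passes both thresholds
    TTS("lnc_B", 300, 340, height=12, area=650),  # removed: height < 30
    TTS("lnc_C", 500, 520, height=35, area=250),  # removed: total area < 400
]

kept = [t for t in candidates if keep_tts(t)]
print([t.lncrna for t in kept])
```

Because the condition removes a TTS when either value is too small, a TTS is retained only when both thresholds are met.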

Example 7 – Binding sites of 12196 marmoset lncRNAs at the marmoset CDKN2A/2B region

Input lncRNA and DNA sequences: The 12196 orthologues of the 13562 GENCODE-annotated human lncRNAs in marmoset and marmoset CDKN2A/2B region (WUGSC 3.2/calJac3).
LongTarget parameters: The default setting.
Filter condition: Remove TTSs whose total area < 400 or height < 30.
Results: The TTS distribution of TFO1.
Background: Ensembl Genes, CpG islands, RepeatMasker in the UCSC Genome Browser.

Example 8 – Binding sites of 83 lncRNAs on human Y chromosome at the CDKN2A/2B region

Input lncRNA and DNA sequences: The 83 GENCODE-annotated lncRNAs on human Y chromosome and human CDKN2A/2B region (hg19).
LongTarget parameters: The default setting.
Filter condition: None.
Results: The TTS distribution of TFO1.
Background: RefSeq Genes, CpG islands, RepeatMasker in the UCSC Genome Browser.

The heights of most TTSs are under 10, indicating that these TTSs are too weak to be true binding sites. Only one TTS, of TTTY8 and TTTY8B, has a height close to 83. However, this TTS lies in a Simple Repeats element, indicating that it may be a false positive (highly enriched dinucleotides in some Simple Repeats may happen to base-pair with some lncRNAs). This example therefore does not generate any sensible results (the heights of TTSs should normally exceed 30), and indeed the 83 Y chromosome lncRNAs have no known functions in the CDKN2A/2B region. This “negative” example further supports the validity of LongTarget.