Welcome to lncRNA.smu.edu.cn!

This website currently hosts two tools: LongTarget and LongMan. LongTarget was developed to predict a lncRNA’s DNA binding motifs and binding sites in a genomic region based on potential base-pairing rules between an RNA sequence and a DNA duplex. We tested LongTarget using multiple human and mouse lncRNAs together with well-known genomic imprinting clusters and genes with known lncRNA binding in humans and mice (He et al. LongTarget: a tool to predict lncRNA DNA-binding motifs and binding sites via Hoogsteen base-pairing analysis. Bioinformatics 2015, 31:178–186). The results show that LongTarget can satisfactorily predict lncRNAs’ DNA binding sites. We further used LongTarget to analyze the binding of the lncRNAs that control genomic imprinting in well-known imprinting regions of multiple mammals (Liu et al. LncRNA/DNA binding analysis reveals losses and gains and lineage specificity of genomic imprinting in mammals. Bioinformatics, 2017, 33:1431–1436). This analysis revealed that lncRNAs and imprinting sites show significant losses, gains and lineage specificity, that a lncRNA may have many binding sites in a genome, and that multiple lncRNAs may bind to the same genomic sites.
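LongTarget's predictions rest on Hoogsteen-type base pairing between an RNA third strand and a DNA duplex. As a rough illustration only, the Python sketch below scores ungapped windows using just the two canonical parallel (pyrimidine-motif) pairs, U·A-T and C+·G-C, in which the RNA runs 5'→3' in the same direction as the purine-rich DNA strand, and scans both DNA strands for the best match. The names `match_fraction` and `best_site` and the `min_fraction` threshold are hypothetical; LongTarget's actual rule sets, alignment and scoring are described in He et al. 2015 and are not reproduced here.

```python
# Minimal sketch (NOT LongTarget's algorithm): canonical parallel Hoogsteen
# pairing only. The RNA third strand runs 5'->3' in the same direction as the
# purine-rich DNA strand it pairs with: U pairs with A (U·A-T), and
# protonated C pairs with G (C+·G-C).
from typing import Optional, Tuple

PARALLEL_HOOGSTEEN = {"U": "A", "C": "G"}  # RNA base -> paired DNA base


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand (A/C/G/T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def match_fraction(rna: str, dna_strand: str) -> float:
    """Fraction of aligned positions forming a canonical parallel Hoogsteen pair."""
    if not rna or len(rna) != len(dna_strand):
        raise ValueError("segments must be non-empty and of equal length")
    hits = sum(1 for r, d in zip(rna.upper(), dna_strand.upper())
               if PARALLEL_HOOGSTEEN.get(r) == d)
    return hits / len(rna)


def best_site(rna: str, dna: str,
              min_fraction: float = 0.8) -> Optional[Tuple[str, int, float]]:
    """Slide the RNA along both DNA strands without gaps.

    Returns (strand, start, fraction) for the best window reaching
    min_fraction, or None. For strand '-', start is an offset into the
    reverse complement of `dna`.
    """
    n = len(rna)
    best = None
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for start in range(len(seq) - n + 1):
            f = match_fraction(rna, seq[start:start + n])
            if f >= min_fraction and (best is None or f > best[2]):
                best = (strand, start, f)
    return best
```

For example, `best_site("UUCUCUUC", "GGAAGAGAAGCC")` returns `("+", 2, 1.0)`, a perfect match on the given strand. The fixed-length, ungapped scan keeps the sketch short; it does not model gaps, the antiparallel purine motif, or statistical significance.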

Since the pioneering genome-wide discovery of mouse lncRNAs by the FANTOM consortium, experimental studies have identified abundant lncRNAs in both humans and mice. However, because early studies used differing and imperfect experimental protocols, the human lncRNAs they reported overlap only to a limited extent, and the same is true of mouse lncRNAs. In 2012 and 2014, the GENCODE consortium reported 13562 human lncRNAs and 10481 mouse lncRNAs. Because systematically identifying lncRNAs by RNA-sequencing many tissues from multiple species is prohibitively costly, a reliable and comprehensive set of lncRNAs for other mammals has not been reported and likely never will be. Although lncRNAs are poorly conserved in sequence, many mammalian lncRNAs are conserved in genomic position, which makes it feasible to identify orthologues of human lncRNAs in other mammals by genome search. We used RNAfold to predict the structure of each exon of the 13562 human lncRNAs and used Infernal to search for the orthologue of each exon in 16 mammalian genomes. We have organized all these orthologous lncRNAs into the database LongMan, which may be the first database of orthologous mammalian lncRNAs. The versions of the whole-genome sequences we used are: human (GRCh37/hg19), chimpanzee (CSAC 2.1.4/panTro4), macaque (BGI CR_1.0/rheMac3), marmoset (WUGSC 3.2/calJac3), tarsier (Broad/tarSyr1), mouse lemur (Broad/micMur1), tree shrew (Broad/tupBel1), mouse (GRCm38/mm10), rat (RGSC 5.0/rn5), guinea pig (Broad/cavPor3), rabbit (Broad/oryCun2), dog (Broad CanFam3.1/canFam3), cow (Baylor Btau_4.6.1/bosTau7), elephant (Broad/loxAfr3), hedgehog (EriEur2.0/eriEur2), opossum (Broad/monDom5), and platypus (WUGSC 5.0.1/ornAna1). The human GRCh38/hg38 assembly has been added recently.
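RNAfold predicted each exon's structure and Infernal searched for the orthologous exons. One common way to connect these two tools, assumed here rather than taken from the LongMan pipeline, is to write each exon sequence and its predicted dot-bracket structure as a single-sequence Stockholm alignment carrying a `#=GC SS_cons` line, build a covariance model from it with Infernal's `cmbuild`, and search each genome with `cmsearch`. The Python sketch below covers only the conversion step, assuming RNAfold's default three-line output (`>name`, the sequence, then the structure followed by the free energy in parentheses); `rnafold_to_stockholm` is a hypothetical helper name.

```python
import re

# Structure line: dot-bracket string, then the free energy in parentheses,
# e.g. "(((...))) (-1.20)" or "......... (  0.00)".
_STRUCT_LINE = re.compile(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\s*\)$")


def rnafold_to_stockholm(rnafold_record: str) -> str:
    """Turn one RNAfold record into a single-sequence Stockholm alignment.

    The dot-bracket structure becomes the #=GC SS_cons annotation that
    Infernal's cmbuild reads as the consensus secondary structure.
    """
    lines = [ln.strip() for ln in rnafold_record.strip().splitlines() if ln.strip()]
    if len(lines) < 3 or not lines[0].startswith(">") or not lines[0][1:].strip():
        raise ValueError("expected a '>name' line, a sequence line and a structure line")
    name = lines[0][1:].split()[0]  # drop any description after the name
    seq = lines[1]
    match = _STRUCT_LINE.match(lines[2])
    if match is None or len(match.group(1)) != len(seq):
        raise ValueError("structure line is malformed or does not match the sequence length")
    structure = match.group(1)
    # Left-align the labels so the sequence and SS_cons columns line up.
    width = max(len(name), len("#=GC SS_cons"))
    return "\n".join([
        "# STOCKHOLM 1.0",
        "",
        f"{name:<{width}} {seq}",
        f"{'#=GC SS_cons':<{width}} {structure}",
        "//",
    ]) + "\n"
```

The resulting file could then be given to `cmbuild`, and the model (typically calibrated with `cmcalibrate` first) to `cmsearch` against each genome; the options used for those steps are not specified here.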

Recently, we have integrated LongTarget into LongMan, creating a seamless pipeline for cross-species, genome-scale lncRNA/DNA binding analysis. To support genome-scale analysis, LongMan now includes the whole-genome sequences of all 17 mammals. Two kinds of genome-scale lncRNA/DNA binding analysis are currently available: (1) predicting the DNA binding sites of a lncRNA in the promoter regions of all transcripts in any of the 17 mammals (in the LongTarget section), and (2) predicting the DNA binding sites of all lncRNAs of a mammalian genome within a genomic region of at most 20000 bp (in the LongMan section). The 20000 bp limit will be relaxed once our server is upgraded.

Although its user interface is plain, LongMan provides database search functions comparable to those of other popular lncRNA databases. To better serve users who frequently use other databases and want to analyze lncRNAs obtained elsewhere, we have developed LongTarget-BE, a variant of LongTarget for Batch lncRNA/DNA binding analysis using External data. A job can contain either multiple lncRNA sequences and one DNA sequence, or one lncRNA sequence and multiple DNA sequences. More details about applying the pipeline, including examples, are given in “Lin et al. Pipelines for cross-species and genome-wide prediction of long noncoding RNA binding. Nature Protocols 2019.” The functions of this website have been tested with Google Chrome and Mozilla Firefox. Please send inquiries and comments to longtarget@smu.edu.cn.
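LongTarget-BE accepts two job shapes: many lncRNAs against one DNA sequence, or one lncRNA against many DNA sequences. The Python sketch below is a hypothetical client-side check, not part of this website: it parses FASTA text and reports which shape a pair of inputs corresponds to. The names `read_fasta` and `batch_mode`, and treating one lncRNA with one DNA sequence as a separate "single-pair" case, are assumptions for illustration.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) tuples; sequence lines are joined."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def batch_mode(lncrna_fasta: str, dna_fasta: str) -> str:
    """Classify a hypothetical batch job by how many sequences each side holds."""
    rnas, dnas = read_fasta(lncrna_fasta), read_fasta(dna_fasta)
    if not rnas or not dnas:
        raise ValueError("both inputs need at least one sequence")
    if len(rnas) == 1 and len(dnas) == 1:
        return "single-pair"
    if len(dnas) == 1:
        return "many-lncRNA-vs-one-DNA"
    if len(rnas) == 1:
        return "one-lncRNA-vs-many-DNA"
    # Many-vs-many is not one of the two supported shapes.
    raise ValueError("submit either one DNA sequence or one lncRNA sequence")
```

Rejecting many-versus-many inputs up front mirrors the two job shapes described above, so a malformed submission can be caught before anything is uploaded.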