List species supported by Ensembl with display name, common name, assembly, taxon ID, and division. Required discovery step — species names like homo_sapiens are opaque to non-biologists and are the input format every other Ensembl tool expects. Filter by division to limit results; use nameContains to find a species by partial name match. Returns the full species catalog when no filters are applied (EnsemblVertebrates has ~250 species; all divisions combined have ~1,000+).
Resolve a gene by symbol + species (or by stable ID) to its Ensembl ID, genomic location (chr:start-end:strand), biotype, description, and transcript list. Entry point for most workflows — the stable ID and coordinates returned here are inputs to other tools. Accepts both symbol lookup (BRCA2 + homo_sapiens) and direct ID lookup (ENSG00000139618). Supports batch lookup of up to 20 IDs or symbols in one call via the ids or symbols field. For symbol lookup, species is required; for ID lookup, species is not needed. Use ensembl_list_species to discover valid species names.
Fetch the DNA, cDNA, CDS, or protein sequence for a gene, transcript, protein, or genomic region. Returns the sequence with its stable ID, molecule type, and character count — large sequences are returned in full but the length is stated so callers can budget context. The type parameter selects which sequence is fetched: genomic (default, includes introns), cdna (spliced transcript), cds (coding sequence only), protein. For region mode, set id to the format species:chr:start-end (e.g. homo_sapiens:13:32315086-32400268) and set species. Protein sequences require a transcript or protein stable ID (ENST…/ENSP…), not a gene ID — use ensembl_lookup_gene with expand_transcripts=true to get the canonical transcript ID first.
Find genomic features overlapping a chromosomal region: genes, transcripts, variants, regulatory elements, or exons. Returns each feature with its stable ID, type, location, biotype, and name. Useful for "what's in this locus?" and for seeding follow-up lookups. Region format is chr:start-end (e.g. 13:32315086-32400268 for the BRCA2 locus). Chromosome names use Ensembl format — no "chr" prefix for vertebrates (use 13 not chr13). The feature parameter defaults to gene only to prevent overwhelming returns — requesting variation in an 85 kb region returns 44,000+ entries. Explicitly include variation, regulatory, transcript, or exon only when needed.
Predict the functional consequences of a sequence variant using the Ensembl Variant Effect Predictor (VEP). Accepts HGVS notation (transcript-relative, e.g. ENST00000380152.8:c.2T>A, or genomic, e.g. 13:g.32316462T>A) and also region+allele format (chr:start:end:strand/allele, e.g. 1:65568:65568:1/T). Returns the most severe consequence term, affected transcripts and genes, impact level (HIGH/MODERATE/LOW/MODIFIER), and any colocated known variants with clinical significance. HGVS input: provide the full notation including transcript version for best results. Region+allele input: use Ensembl chromosome naming (no chr prefix for vertebrates).
Find orthologs and/or paralogs of a gene across species. Returns each homolog's stable ID, species, homology type (ortholog_one2one, ortholog_one2many, paralog_many2many, etc.), perc_id (percent identity), perc_pos (percent positives), and taxonomy level. Essential for cross-species research — for example, "what is the mouse equivalent of human TP53?" or "how conserved is BRCA2 across mammals?". Provide either symbol + species or a stable gene ID. Target species can be filtered to a single species or left open to return all available homologs.
Retrieve cross-database references for a gene or feature — HGNC, UniProt, EntrezGene, OMIM, RefSeq, Reactome, and others. Returns each xref with its database name, primary ID, display ID, and description. The dbname filter narrows to specific databases; omit to return all xrefs. IDs returned here chain to protein (pubchem via UniProt), literature (pubmed via PubMed IDs), disease (OMIM via MIM_GENE), and pathway (Reactome) resources. Requires an Ensembl stable ID — use ensembl_lookup_gene to get the ENSG… ID first. Common dbname values: HGNC, Uniprot_gn, EntrezGene, MIM_GENE, RefSeq_mRNA, RefSeq_peptide, Reactome, GO (Gene Ontology), ChEMBL.