WGS

NOTE: Several versions of this metadata schema have been created over time. The version marked (Latest) contains most attributes, but older versions may include deprecated attributes for which data has been collected. SenNet is creating a reference that combines all of these versions into a single view. That reference will be available here once completed.

Version 1 (no longer accepting data)

Attribute	Type	Description	Allowable Values	Required
version	Allowable Value	Version of the schema to use when validating this metadata.	[‘1’]	True
description	Textfield	Free-text description of this assay.		True
source_id	Textfield	SenNet Display ID of the source of the assayed tissue.		True
tissue_id	Textfield	SenNet Display ID of the assayed tissue.		True
execution_datetime	Datetime	Start date and time of the assay, typically taken from a date-time stamped folder generated by the acquisition instrument. Format: YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minutes. MM, DD, hh, and mm are written with leading zeros.		True
protocols_io_doi	Textfield	DOI for protocols.io referring to the protocol for this assay.		True
operator	Textfield	Name of the person responsible for executing the assay.		True
operator_email	Textfield	Email address for the operator.		True
pi	Textfield	Name of the principal investigator responsible for the data.		True
pi_email	Textfield	Email address for the principal investigator.		True
assay_category	Allowable Value	Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence.	[‘sequence’]	True
assay_type	Allowable Value	The specific type of assay being executed.	[‘WGS’]	True
analyte_class	Allowable Value	Analytes are the target molecules being measured with the assay.	[‘DNA’]	True
is_targeted	Allowable Value	Specifies whether specific molecules are targeted for detection or measurement by the assay.	[‘Yes’, ‘No’]	True
acquisition_instrument_vendor	Textfield	An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.		True
acquisition_instrument_model	Textfield	Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data.		True
gdna_fragmentation_quality_assurance	Allowable Value	Is the gDNA integrity good enough for WGS? This is usually checked through running a gel.	[‘Pass’, ‘Fail’]	True
dna_assay_input_value	Numeric	Amount of DNA input into library preparation		True
dna_assay_input_unit	Allowable Value	Units of DNA input into library preparation	[‘ug’]	False
library_construction_method	Textfield	Describes the DNA library preparation kit: the method used for isolating gDNA, fragmenting it, and generating sequencing libraries.		True
library_construction_protocols_io_doi	Textfield	A link to the protocol document containing the library construction method (including version) that was used.		True
library_layout	Allowable Value	State whether the library was generated for single-end or paired-end sequencing.	[‘single-end’, ‘paired-end’]	True
library_adapter_sequence	Textfield	The adapter sequence to be used for adapter trimming, starting with the 5’ end (e.g. 5-ATCCTGAGAA).		True
library_final_yield	Numeric	Total amount of library after the final PCR amplification step		True
library_final_yield_unit	Allowable Value	Units for the total amount of library after the final PCR amplification step	[‘ng’]	False
library_average_fragment_size	Numeric	Average size in base pairs (bp) of sequencing library fragments, estimated via gel electrophoresis or Bioanalyzer/TapeStation.		True
sequencing_reagent_kit	Textfield	Reagent kit used for sequencing		True
sequencing_read_format	Textfield	Slash-delimited list of the number of sequencing cycles for each read, for example Read1, i7 index, i5 index, and Read2.		True
sequencing_read_percent_q30	Numeric	Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …)		True
sequencing_phix_percent	Numeric	Percent PhiX loaded to the run		True
contributors_path	Textfield	Relative path to file with ORCID IDs for contributors for this dataset.		True
data_path	Textfield	Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions.		True
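
The allowable values and the execution_datetime format in the Version 1 table can be checked before submission. The following is only a minimal sketch, not SenNet's validator: the function name check_row and the dict-of-strings row representation are assumptions made for illustration, values are assumed to be plain strings without the typographic quotes shown in the table, and only the Allowable Value fields and execution_datetime are checked.

```python
import re
from datetime import datetime

# Allowable values copied from the Version 1 table above.
ALLOWED = {
    "version": {"1"},
    "assay_category": {"sequence"},
    "assay_type": {"WGS"},
    "analyte_class": {"DNA"},
    "is_targeted": {"Yes", "No"},
    "gdna_fragmentation_quality_assurance": {"Pass", "Fail"},
    "dna_assay_input_unit": {"ug"},
    "library_layout": {"single-end", "paired-end"},
    "library_final_yield_unit": {"ng"},
}
# Fields marked Required = False in the table.
OPTIONAL = {"dna_assay_input_unit", "library_final_yield_unit"}

# YYYY-MM-DD hh:mm with leading zeros on every two-digit part.
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def check_row(row):
    """Return a list of problems found in one metadata row (a dict of strings)."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = row.get(field, "")
        if value == "":
            if field not in OPTIONAL:
                problems.append(f"{field}: required value is missing")
        elif value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")

    stamp = row.get("execution_datetime", "")
    if not DATETIME_RE.fullmatch(stamp):
        problems.append("execution_datetime: expected YYYY-MM-DD hh:mm")
    else:
        try:
            # Rejects impossible values such as month 13 or hour 25.
            datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        except ValueError:
            problems.append("execution_datetime: not a valid date or time")
    return problems
```

An empty list means the checked fields passed. Textfield and Numeric attributes are not inspected by this sketch.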

Version 0

Attribute	Type	Description	Allowable Values	Required
source_id	Textfield	SenNet Display ID of the source of the assayed tissue.		True
tissue_id	Textfield	SenNet Display ID of the assayed tissue.		True
execution_datetime	Datetime	Start date and time of the assay, typically taken from a date-time stamped folder generated by the acquisition instrument. Format: YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minutes. MM, DD, hh, and mm are written with leading zeros.		True
protocols_io_doi	Textfield	DOI for protocols.io referring to the protocol for this assay.		True
operator	Textfield	Name of the person responsible for executing the assay.		True
operator_email	Textfield	Email address for the operator.		True
pi	Textfield	Name of the principal investigator responsible for the data.		True
pi_email	Textfield	Email address for the principal investigator.		True
assay_category	Allowable Value	Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence.	[‘sequence’]	True
assay_type	Allowable Value	The specific type of assay being executed.	[‘WGS’]	True
analyte_class	Allowable Value	Analytes are the target molecules being measured with the assay.	[‘DNA’]	True
is_targeted	Allowable Value	Specifies whether specific molecules are targeted for detection or measurement by the assay.	[‘Yes’, ‘No’]	True
acquisition_instrument_vendor	Textfield	An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.		True
acquisition_instrument_model	Textfield	Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data.		True
gdna_fragmentation_quality_assurance	Allowable Value	Is the gDNA integrity good enough for WGS? This is usually checked through running a gel.	[‘Pass’, ‘Fail’]	True
dna_assay_input_value	Numeric	Amount of DNA input into library preparation		True
dna_assay_input_unit	Allowable Value	Units of DNA input into library preparation	[‘ug’]	False
library_construction_method	Textfield	Describes the DNA library preparation kit: the method used for isolating gDNA, fragmenting it, and generating sequencing libraries.		True
library_construction_protocols_io_doi	Textfield	A link to the protocol document containing the library construction method (including version) that was used.		True
library_layout	Allowable Value	State whether the library was generated for single-end or paired-end sequencing.	[‘single-end’, ‘paired-end’]	True
library_adapter_sequence	Textfield	The adapter sequence to be used for adapter trimming, starting with the 5’ end (e.g. 5-ATCCTGAGAA).		True
library_final_yield	Numeric	Total amount of library after the final PCR amplification step		True
library_final_yield_unit	Allowable Value	Units for the total amount of library after the final PCR amplification step	[‘ng’]	False
library_average_fragment_size	Numeric	Average size in base pairs (bp) of sequencing library fragments, estimated via gel electrophoresis or Bioanalyzer/TapeStation.		True
sequencing_reagent_kit	Textfield	Reagent kit used for sequencing		True
sequencing_read_format	Textfield	Slash-delimited list of the number of sequencing cycles for each read, for example Read1, i7 index, i5 index, and Read2.		True
sequencing_read_percent_q30	Numeric	Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …)		True
sequencing_phix_percent	Numeric	Percent PhiX loaded to the run		True
contributors_path	Textfield	Relative path to file with ORCID IDs for contributors for this dataset.		True
data_path	Textfield	Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions.		True
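
The sequencing_read_format and sequencing_read_percent_q30 descriptions can be made concrete with a short sketch. The function names and the sample values are hypothetical. The table gives only the weighted sum for Q30, so dividing that sum by the total number of bases is an assumption about what "weighted average" means here.

```python
def parse_read_format(read_format):
    """Split a slash-delimited cycle list such as '151/8/8/151' into integers."""
    return [int(part) for part in read_format.split("/")]


def weighted_q30(bases_and_q30):
    """Base-weighted mean of per-read Q30 percentages.

    bases_and_q30: iterable of (number_of_bases, percent_q30) pairs,
    one pair per read (for example UMI, R2, ...).
    Assumption: the weighted sum from the table is divided by total bases.
    """
    pairs = list(bases_and_q30)
    total_bases = sum(bases for bases, _ in pairs)
    if total_bases == 0:
        raise ValueError("no bases to average over")
    return sum(bases * q30 for bases, q30 in pairs) / total_bases


# Hypothetical values, for illustration only.
cycles = parse_read_format("151/8/8/151")
q30 = weighted_q30([(100, 90.0), (300, 80.0)])
```

Weighting by base count keeps a short read, such as an index read, from influencing the reported percentage as much as a long one.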