Data Flows
From BiomedGT
New Proposal Ready to Submit Flow - this page needs updating
- New proposals are initiated in either the Proposal or the Proposal_talk namespace.
- The Proposal namespace contains structured change proposals. Any information that isn't directly or indirectly coupled with the semantic database is ignored.
- The Proposal_talk namespace contains free-text or partially structured change proposals. Proposals in this category are intended to be read and interpreted.
- Proposals are categorized as one of:
- Minor change - changes that are relatively minor and uncontested. Changes in this category include spelling corrections, corrections of typographic errors, and fixes for obvious errors that will not significantly affect the meaning or use of the entry in question. Minor changes begin life in the Ready to Submit workflow category (see Template:WorkFlow_CurationStatus).
- Major change - all other changes. Major changes begin life in the New Proposal workflow category.
- Major changes are reviewed and evaluated by the SMEs until they are satisfied that the proposals say what is intended. At that point a designated SME changes the proposal status from New Proposal to Ready to Submit.
- The list of proposals in each workflow category can be examined by following the links on the workflow category page. As an example, all new proposals can be found on the New Proposal page.
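The routing rule described in this list can be sketched in a few lines. This is only an illustration: the function names and the `is_minor` flag are hypothetical, and only the two category names come from the text above.

```python
# Workflow category names taken from the text above.
NEW_PROPOSAL = "New Proposal"
READY_TO_SUBMIT = "Ready to Submit"


def initial_category(is_minor: bool) -> str:
    """Return the workflow category in which a proposal begins life."""
    # Minor changes skip SME review; everything else starts as a new proposal.
    return READY_TO_SUBMIT if is_minor else NEW_PROPOSAL


def approve(category: str) -> str:
    """Hypothetical helper: a designated SME promotes a reviewed major change."""
    if category != NEW_PROPOSAL:
        raise ValueError(f"only {NEW_PROPOSAL!r} proposals can be approved")
    return READY_TO_SUBMIT
```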
Packaging Ready to Submit Proposals
The Alpha SME will examine proposals in the Ready to Submit category from time to time.
- The Alpha SME will group proposals into workflow packages. Each workflow package represents a single, related unit of work.
- Each workflow package will be provided with metadata fields recording the who, why, when, priority and status of the package.
- Once assembled, a workflow package will be assigned the category of Pending Receipt and all of the package components will be changed to a status of Submitted.
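One possible shape for a workflow package is sketched below. The class names, field types and the "Assembling" placeholder status are assumptions made for illustration; the text only specifies that a package carries who, why, when, priority and status, and that submitting it moves the package to Pending Receipt and its components to Submitted.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Proposal:
    title: str
    status: str = "Ready to Submit"


@dataclass
class WorkflowPackage:
    # Metadata fields named in the text: who, why, when, priority, status.
    who: str
    why: str
    when: datetime
    priority: str
    status: str = "Assembling"  # hypothetical status before submission
    proposals: List[Proposal] = field(default_factory=list)

    def add(self, proposal: Proposal) -> None:
        """Add a proposal; only Ready to Submit proposals may be packaged."""
        if proposal.status != "Ready to Submit":
            raise ValueError(f"{proposal.title} is not Ready to Submit")
        self.proposals.append(proposal)

    def submit(self) -> None:
        """Mark the package Pending Receipt and every component Submitted."""
        for proposal in self.proposals:
            proposal.status = "Submitted"
        self.status = "Pending Receipt"
```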
Pulling Proposals from the Wiki into the External Curation System
Periodically, the external workflow software will retrieve the Pending Receipt category via the URL: http://biomedgt.org/index.php?title=Special:ExportRDF/Category:WorkFlow_Pending_Receipt
- This URL will provide the category's RDF export.
- Each unstructured or structured proposal within this RDF can then be accessed individually via the smw:hasArticle element of the corresponding smw:Thing (for unstructured content) or the rdfs:isDefinedBy element of the corresponding owl:Class (for structured content). As an example, information about Proposal:NCI_Blue_Gene(TC00001):
<owl:Class
rdf:about="&wiki;Proposal-3ANCI_Blue_Gene-28TC00001-29_by_Hsolbrig_at_00710120123">
<rdfs:label>NCI Blue Gene(TC00001) by Hsolbrig at 00710120123</rdfs:label>
<smw:hasArticle
rdf:resource="&wikiurl;Proposal:NCI_Blue_Gene(TC00001)_by_Hsolbrig_at_00710120123"/>
<rdfs:isDefinedBy
rdf:resource="&wikiurl;Special:ExportRDF/Proposal:NCI_Blue_Gene(TC00001)_by_Hsolbrig_at_00710120123"/>
</owl:Class>
Can be found in http://biomedgt.org/index.php?title=Special:ExportRDF/Proposal:NCI_Blue_Gene(TC00001)_by_Hsolbrig_at_00710120123
Information about Proposal_talk:NCI_Apoptosis_Inhibitor_Gene(C20347):
<smw:Thing
rdf:about="&wiki;Proposal_talk-3ANCI_Apoptosis_Inhibitor_Gene-28C20347-29_by_Hsolbrig_at_20071011223308">
<rdfs:label>Proposal_talk:NCI Apoptosis Inhibitor Gene(C20347) by Hsolbrig at 20071011223308</rdfs:label>
<smw:hasArticle
rdf:resource="&wikiurl;Proposal_talk:NCI_Apoptosis_Inhibitor_Gene(C20347)_by_Hsolbrig_at_20071011223308"/>
<rdfs:isDefinedBy
rdf:resource=
"&wikiurl;Special:ExportRDF/Proposal_talk:NCI_Apoptosis_Inhibitor_Gene(C20347)_by_Hsolbrig_at_20071011223308"/>
</smw:Thing>
Can be found in http://biomedgt.org/index.php?title=Proposal_talk:NCI_Apoptosis_Inhibitor_Gene(C20347)_by_Hsolbrig_at_20071011223308
Discussion can be found the same way: Category_talk:NCI_BCAS1_Gene(C20771) is found at http://biomedgt.org/index.php?title=Category_talk:NCI_BCAS1_Gene(C20771)
NOTE: (BD) There seems to be a confusion of terms here. hasArticle refers to other wiki pages and isDefinedBy refers to RDF versions thereof.
This is orthogonal to whether or not the reference is to structured or unstructured content.
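A rough sketch of how a client might walk the category export is shown below. It makes several assumptions: the `smw` namespace URI is a guess and should be replaced by whatever the real export declares, the sample document is invented with example.org URLs in place of the `&wiki;` and `&wikiurl;` entities, and `list_proposals` is a hypothetical name. In line with the note above, it reads both smw:hasArticle and rdfs:isDefinedBy for every entry, whether structured or not.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SMW = "http://smw.ontoware.org/2005/smw#"  # assumed; check the real export

# Invented sample with the same element layout as the examples above.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}"
         xmlns:owl="{OWL}" xmlns:smw="{SMW}">
  <owl:Class rdf:about="http://example.org/wiki/Proposal-3AA">
    <rdfs:label>A by Hsolbrig at 20071011222805</rdfs:label>
    <smw:hasArticle rdf:resource="http://example.org/index.php?title=Proposal:A"/>
    <rdfs:isDefinedBy rdf:resource="http://example.org/index.php?title=Special:ExportRDF/Proposal:A"/>
  </owl:Class>
  <smw:Thing rdf:about="http://example.org/wiki/Proposal_talk-3AB">
    <rdfs:label>Proposal_talk:B by Hsolbrig at 20071011223308</rdfs:label>
    <smw:hasArticle rdf:resource="http://example.org/index.php?title=Proposal_talk:B"/>
    <rdfs:isDefinedBy rdf:resource="http://example.org/index.php?title=Special:ExportRDF/Proposal_talk:B"/>
  </smw:Thing>
</rdf:RDF>"""


def list_proposals(rdf_text):
    """Yield one dict per owl:Class (structured) or smw:Thing (unstructured)."""
    root = ET.fromstring(rdf_text)
    kinds = {f"{{{OWL}}}Class": "structured", f"{{{SMW}}}Thing": "unstructured"}
    for node in root:
        kind = kinds.get(node.tag)
        if kind is None:
            continue  # ignore anything that is not a proposal entry
        article = node.find(f"{{{SMW}}}hasArticle")
        defined_by = node.find(f"{{{RDFS}}}isDefinedBy")
        yield {
            "kind": kind,
            "label": node.findtext(f"{{{RDFS}}}label"),
            # hasArticle points at the wiki page, isDefinedBy at its RDF export.
            "wiki_page": None if article is None else article.get(f"{{{RDF}}}resource"),
            "rdf_export": None if defined_by is None else defined_by.get(f"{{{RDF}}}resource"),
        }
```

Calling `list(list_proposals(SAMPLE))` returns one entry per proposal, each carrying both the wiki page URL and the RDF export URL.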
New Content Structured Proposals
New content proposals such as Special:ExportRDF/Proposal:NCI_Blue_Gene(TC00001)_by_Hsolbrig_at_00710120123 can be recognized by parsing the label (NCI Blue Gene(TC00001) by Hsolbrig at 00710120123 --> NCI Blue Gene(TC00001)) and then determining whether there is a category by the same name Category:NCI Blue Gene(TC00001).
NOTE: this could probably be improved. It might be worthwhile to add a new LexWiki_Name
property, which would eliminate the need for parsing.
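The label parsing step might look like the following sketch. The regular expression assumes every label has the form "<name> by <user> at <digits>", optionally prefixed by the namespace, as in the two labels shown on this page; `category_for_label` is a hypothetical name. Checking whether the resulting category actually exists requires a wiki lookup and is left out.

```python
import re

# "<name> by <user> at <timestamp>", optionally prefixed with a namespace.
LABEL_RE = re.compile(
    r"^(?:Proposal(?:_talk)?:)?(?P<name>.+) by (?P<user>\S+) at (?P<stamp>\d+)$"
)


def category_for_label(label: str) -> str:
    """Map a proposal label to the name of the category it would modify."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized proposal label: {label!r}")
    return "Category:" + match.group("name")
```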
Semantic content in new proposals is identified by the properties defined in the LexWiki Common Terminology Data elements:
- property:LexWiki_Concept_Code
- property:LexWiki_Preferred_Name
- property:LexWiki_URI
. . .
- property:SKOS_inScheme
- property:SKOS_note
In addition, the rdfs:subClassOf entries define the parent classes.
NOTE: This is one place where overloading the wiki category mechanism gets us into trouble.
The RDF parser is going to have to understand which subClass (and ObjectProperty!) targets are
semantic (e.g. Category:NCI_Gene(C16612)) and which are strictly
organizational (e.g. Category:WorkFlow_BiomedGT_Proposal).
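One simple way to make that distinction is a naming convention, sketched below. It assumes that every organizational category starts with "Category:WorkFlow_". That is inferred from the two examples in the note, not a rule the wiki is known to enforce, and both function names are hypothetical.

```python
# Assumed convention: workflow bookkeeping categories share this prefix.
ORGANIZATIONAL_PREFIXES = ("Category:WorkFlow_",)


def is_semantic_target(page_title: str) -> bool:
    """True if a subClassOf target looks like a terminology category."""
    title = page_title.lstrip(":")  # tolerate a leading-colon link form
    return title.startswith("Category:") and not title.startswith(ORGANIZATIONAL_PREFIXES)


def split_parents(targets):
    """Separate subClassOf targets into (semantic, organizational) lists."""
    semantic = [t for t in targets if is_semantic_target(t)]
    organizational = [t for t in targets if not is_semantic_target(t)]
    return semantic, organizational
```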
Modification Structured Proposals
Modifications to existing categories can be identified via the same mechanism as described in the section above. As an example, Special:ExportRDF/Proposal:NCI_Trefoil_Family(C21231)_by_Hsolbrig_at_20071011222805 can be parsed into Category:NCI_Trefoil_Family(C21231), which exists and can be retrieved as Special:ExportRDF/Category:NCI_Trefoil_Family(C21231). The original needs to be parsed the same way as the proposal, and the two are then compared to find what has changed.
Note: Mayo already has code to do this.
BD: Given that this is parsed into objects, isn't this just diffing two objects? Also, the little Mayo code I've seen appears to be based on hitting a MySQL database.
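If both the original category and the proposal are parsed into plain objects, the comparison can indeed be a simple diff. The sketch below assumes each side has been reduced to a mapping from property name to a set of values; that representation and the name `diff_properties` are illustrative choices, not the existing Mayo code.

```python
def diff_properties(original: dict, proposed: dict) -> dict:
    """Compare two {property name: set of values} mappings.

    Returns only the properties whose values differ, with the values
    that were removed from and added to each one.
    """
    changes = {}
    for name in sorted(original.keys() | proposed.keys()):
        before = original.get(name, set())
        after = proposed.get(name, set())
        if before != after:
            changes[name] = {"removed": before - after, "added": after - before}
    return changes
```

Using sets makes the result independent of the order in which property values appear in the two exports.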
Unstructured Proposals
Some information about unstructured proposals can be retrieved via the RDF route (e.g. Special:ExportRDF/Proposal_talk:NCI_Apoptosis_Inhibitor_Gene(C20347)_by_Hsolbrig_at_20071011223308), but the basic information needs to be retrieved from the wiki directly (e.g. Proposal_talk:NCI_Apoptosis_Inhibitor_Gene(C20347)_by_Hsolbrig_at_20071011223308). Note that Special:Export/Proposal_talk:NCI_Apoptosis_Inhibitor_Gene(C20347)_by_Hsolbrig_at_20071011223308 yields the raw text, but a MediaWiki parsing tool would be necessary to produce the final result. One of the tasks on the work list for the coming months will be to produce an export tool that transforms a MediaWiki page into relatively simple, standalone HTML for archival and reference purposes.
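The three access routes mentioned here differ only in the title passed to index.php, so a client could derive all of them from the page title. The helper below is a sketch; `page_urls` and its dictionary keys are hypothetical names, and the base URL is taken from the examples earlier on this page.

```python
from urllib.parse import quote

BASE = "http://biomedgt.org/index.php"


def page_urls(title: str) -> dict:
    """Build the rendered-page, raw-export and RDF-export URLs for a title."""
    # Keep ":", "/", "(" and ")" literal so the result matches the URLs in the text.
    safe_title = quote(title, safe=":/()")
    return {
        "page": f"{BASE}?title={safe_title}",
        "raw_export": f"{BASE}?title=Special:Export/{safe_title}",
        "rdf_export": f"{BASE}?title=Special:ExportRDF/{safe_title}",
    }
```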
Changing Proposal Status
While the precise mechanism hasn't been defined yet, there will be a URL that will be accessible via one of the many language-specific MediaWiki APIs. The application will be required to log in and will then be able to update selected pages so that the status of both workflow packages and workflow proposals indicates that they have been received, are in process or are completed.

