Journal of Biomedical Informatics 34, 15–27 (2001)
doi:10.1006/jbin.2001.1005, available online at http://www.idealibrary.com

Extracting Knowledge from Dynamics in Gene Expression

Ben Y. Reis,* Atul S. Butte,† and Isaac S. Kohane‡,1

*Harvard–MIT Division of Health Science Technology, Cambridge, Massachusetts 02139; †Children's Hospital Informatics Program & MIT Division of Health Science Technology, Boston, Massachusetts 02115; and ‡Children's Hospital Informatics Program, Boston, Massachusetts 02115

Received December 22, 2000; published online March 15, 2001

1 To whom correspondence should be addressed at Children's Hospital Informatics Program, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115. Fax: (617) 355-3456. E-mail: isaackohane@harvard.edu.

1532-04/01 $35.00. Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved.

Most investigations of coordinated gene expression have focused on identifying correlated expression patterns between genes by examining their normalized static expression levels. In this study, we focus on the dynamics of gene expression by seeking to identify correlated patterns of changes in genetic expression level. In doing so, we build upon methods developed in clinical informatics to detect temporal trends of laboratory and other clinical data. We construct relevance networks from Saccharomyces cerevisiae gene-expression dynamics data and find genes with related functional annotations grouped together. While some of these associations are also found using a standard expression level analysis, many are identified exclusively through the dynamic analysis. These results strongly suggest that the analysis of gene expression dynamics is a necessary and important tool for studying regulatory and other functional relationships among genes. The source code developed for this investigation is freely available to all noncommercial investigators by contacting the authors. © 2001 Academic Press

Key Words: gene expression; clustering; dynamics; bioinformatics; clinical informatics; trends.

INTRODUCTION

To understand a system fully, one must study its dynamics. With the sequencing of the human genome completed last year, the focus of the research community is shifting toward a functional understanding of the roles of and relationships between different genes. With advances in genetic expression profiling techniques [1, 2] enabling detailed genomic-scale measurements of genetic activity, it is important for the purposes of knowledge discovery to extract all the meaningful information present in the data. To date, most analyses [3, 4] have focused on clustering genes based simply on correlated patterns of genetic expression, ignoring other relationships present in the data. In this report we propose that further identifying correlated patterns of gene expression dynamics reveals additional meaningful information in the data.

In pursuing the investigation of gene dynamics, we are recapitulating and building on a large body of work in clinical informatics dealing with the identification of temporal abstractions and trend analysis. The literature is replete with reports of the limitations of performing diagnosis or planning with atemporal data [5] and the leverage obtained by capturing the dynamics of biomedical processes [6–11]. Until recently, the application of these techniques in bioinformatics has been relatively limited, particularly in the analysis of gene expression, in part because of the paucity of data sets with sufficient time points. Those analyses that have been published have focused primarily on signal processing techniques using the Fourier transform [12, 13].

Many techniques have been used in functional genomics for clustering, including phylogenetic trees [4], self-organizing maps [14, 15], and relevance networks [16, 17]. These clustering techniques have relied on a variety of association metrics such as Euclidean distance, correlation coefficients, and mutual information. These different techniques and association measures have, to varying degrees, all proved successful in clustering genes known to be related in function. While many differences among these various approaches exist, all of them cluster according to the absolute level of genetic expression. In this study, we propose an alternate approach involving the dynamics of genetic expression, and formulate a methodology for clustering genes according to changes in genetic expression level.

Clustering Genes According to Expression Dynamics Has Important Advantages

We use the term dynamics to refer to the rate of change of genetic expression over time, calculated as the first-order difference of the genetic expression levels (E_t2 − E_t1, E_t3 − E_t2). This is different from the simple temporal pattern of genetic expression (E_t1, E_t2, E_t3) that we refer to as statics.

The primary motivation for studying gene expression dynamics is that existing static techniques may not identify all the important relationships. Some genes may have associated dynamic behaviors but may not have associated static expression behaviors. A hypothetical example is shown in Fig. 1: Gene A codes for an enhancer protein that regulates the expression of gene B, so that a high level of gene A causes an up-regulation of expression in gene B. Since gene B can be at many possible expression levels before being affected by gene A, the enhancer-type relationship between the two genes cannot be noticed by simply examining the correlation of static expression patterns. Instead, one needs to examine the dynamics of gene expression (the way in which the expression level of gene A leads to a change in gene B) in order to detect the underlying dynamic relationship.

FIG. 1. Dynamic relationships between genes. (a) The expressed product of gene A binds an enhancer region that increases transcription of gene B. (b) B's initial expression level before being affected by A can vary throughout the experiment. As a result, measuring the correlation between the absolute levels of genes A and B will not reveal the underlying enhancement relationship between the two. Instead, this can only be done by analyzing the expression dynamics: the change in expression level of gene B in relation to the expression level of gene A.

We therefore hypothesize that this dynamic approach has the potential to discover relationships between genes that are not detectable using existing static techniques. It is the goal of this study to formulate, validate, and evaluate this dynamic approach for knowledge discovery in functional genomics.

METHODS

Experimental Data

We studied the Saccharomyces cerevisiae mRNA-expression data aggregated from several experiments reported by Eisen et al. [3], in which the response of the yeast cells to several different stimuli is recorded. The data contain 79 data points in 10 time series measured under different experimental conditions, shown in Table 1. Of over 6000 genes in the yeast genome, Eisen included only 2467 genes that had functional annotations. We analyze the same subset of genes.

TABLE 1
Experimental Conditions

Number of time points   Conditions
18                      Cell cycle after alpha factor arrest and release.
14                      Cell cycle after elutriation.
15                      Cell cycle for cdc15 mutants after temperature-sensitive arrest and release.
6                       Sporulation, Experiment 1.
3                       Sporulation, Experiment 2.
2                       Sporulation, Experiment 3.
6                       Response to high-temperature shock.
4                       Response to reducing shock.
4                       Response to low-temperature shock.
7                       Response to diauxic shift.

Note. The experimental conditions under which the gene expression measurements reported by Eisen et al. [3] were taken.

Representing Gene Expression Dynamics

Slopes are calculated between each adjacent pair of expression data points, E_tn and E_tn+1:

\[
\mathrm{Slope}(n, n+1) = \frac{\text{expression level}_{n+1} - \text{expression level}_{n}}{\text{time}_{n+1} - \text{time}_{n}}. \tag{1}
\]

Since slopes are only calculated between data points within the same time series, the 79 data points in 10 time series are reduced to only 69 slope measurements. The units of the slope measurements are normalized expression level units per minute.

Data Visualization and Analysis

Relevance networks are constructed for the purposes of analysis and visualization of the data. Relevance networks are reviewed briefly here and have been described in full previously [16–18].

Relevance networks help identify groups of interrelated genes. A metric of association is chosen for comparing patterns of genetic expression between genes. After all pairwise gene–gene association strengths are calculated, a statistically significant threshold level of association is determined. All connections weaker than this threshold are removed, leaving small interconnected islands of significantly related genes called relevance networks. The method for determining this threshold is outlined below, but as described in prior work [16–18] it involves permuting the entire original data set, which preserves the distribution of gene expression values but breaks the link between an expression value and a particular condition or tissue. The pairwise association strengths are recalculated for each permutation and the largest value of association obtained in the pairwise associations is recorded. After a large number of permutations, this maximum value becomes the minimum threshold value for any association in the unpermuted data sets.

For this study, we use a linear correlation coefficient as a measure of association. Slopes are calculated as described above, yielding a data set where each row is the time series of a particular gene's expression dynamics. Pairwise Pearson correlation coefficients are then calculated between all possible combinations of two rows. These are squared to yield the R², after which the original sign is reappended to conserve the information of whether the genes are positively or negatively correlated [7]. We call this final signed value R².

Correlation coefficients are sensitive to outlying values, which can bias downstream data analysis. Two symmetric outlying values may artificially raise the correlation coefficient of an otherwise nonlinear distribution. That is, in all except one or two microarrays, a gene will have a scatter within a small range, and then, due to an artifact of the hybridization process, the one or two remaining microarrays will have a very high value for that gene. This is all the more striking in the data set on which this analysis has been performed, where each data point belongs to a time series of a given stimulus and where the rest of the time series shows much smoother changes. For this reason, an entropy-based filter is used to remove genes with outlying values in their distributions from the analysis. We have had to apply the same filter in prior studies for the same reason [17]. It should be pointed out, nonetheless, that we will necessarily miss those few occasions where outlier values do represent a quantum and dramatic change in expression. The individual entropies of the dynamics time series are first calculated for each gene, with the entropy H(A) defined as:

\[
H(A) = \sum_{n} -p(A_n) \log_2 p(A_n). \tag{2}
\]

The genes are ranked according to their entropies, and the bottom 5% (entropy threshold < 2.14) are excluded from the analysis.

Issues Specific to Dynamics

The inclusion of dynamics in the methodology introduces a number of important issues that will be addressed in turn.

First, the issue of stasis: we observe that most genes do not change their expression levels most of the time. Figure 2 shows the distribution of slopes taken from all the points in the data set. The widespread stasis in the data can lead to seriously misleading analyses, as genes that remain stationary together can lead to an artificially high measure of association. To address this issue, we filter out the stationary data points, including only the more dynamic ones in the analysis. This involves setting an exclusion range, or hole, around the zero slope range. We choose thresholds of ±0.02 normalized slope units per minute, for a total hole size of 0.04. This approach removes approximately 70.0% of the original data points, and allows us to study the genes that change in a coordinated fashion, while avoiding the misleading identification of genes that simply remain stationary together. We automatically evaluated a range of hole sizes and picked the value of 0.04 based on maximizing the number of retained data points and minimizing the threshold association level in the permuted data (described below).

FIG. 2. Slope measurements of gene expression. A semilog plot of the distribution of slopes derived from the yeast genetic expression data set reported by Eisen et al. Almost all of the slopes are near zero, illustrating that most genes remain stationary most of the time. The complications caused by this widespread stasis must be addressed when analyzing gene expression dynamics data, as described in the text.

This solution leads to the second complication: since many data points are removed, the remaining data can be very sparse. To ensure that all correlation coefficient calculations are based on enough points to avoid too many spurious associations (as defined by our permutation analysis, below), we set a threshold requiring a minimum of five data points for a calculation to be valid: any pairwise distribution having fewer data points than this threshold is excluded from the analysis. This thresholding approach is similar to the one used in work on clinical data relevance networks [18].

The last step in the relevance networks methodology involves determining a threshold association level that represents a likely nonspurious association between genes. We determine this level by permuting the time points within each gene and obtaining the distribution of pairwise correlation coefficient values. We perform this permutation 100 times and then compare the average permuted distribution with the distribution obtained from the original data set. It is clear from Fig. 3 that much stronger correlations are present in the original data set. We comfortably place the threshold level of significance at ±0.78, as no permuted data points are able to achieve an R² value greater than that.

FIG. 3. Dynamic correlations of gene expression. A semilog plot of the distribution of R² calculated from the pairwise comparisons of the dynamic expression patterns of all the yeast genes. Plotted are the permuted and the unpermuted data. The permuted data points represent the average R² distribution derived from 100 permutations of the data set. It is clear from the graph that the original data are able to achieve high R² values that are not achieved in any of the permuted runs.

RESULTS

The relevance networks generated from the dynamics analysis are presented first. These are then evaluated in the context of the networks generated from a static analysis below.

Dynamic Relevance Networks

Using a threshold of R² = 0.78, the dynamic analysis yields 71 relevance networks consisting of 348 nodes (Fig. 4). Of the 3,041,811 possible gene–gene connections, only 371 (0.012%) are above this threshold. Each gene is represented by a box labeled with the gene name. The width of each box represents its indegree, the number of other genes connected to it.

FIG. 4. Dynamic relevance networks. The relevance networks generated using the dynamics methodology proposed in this study. Each box represents a gene, labeled with its alphanumeric identification tag. The width of each box is determined by how many other genes it is connected to. The various groups of interconnected genes are called relevance networks. The shaded genes are those that are also found in the static analysis.

There are far too many associations to discuss each one individually. We therefore present the strongest associations found, as well as some of the more interesting negative associations. The full data set and analysis are available at http://www.chip.org/chip/people/kohane/papers/dynamics/readme.html.

Of the 71 networks, the largest one contains 154 nodes with 238 links and consists mostly of ribosomal proteins and related genes, such as RNA helicases, RNA polymerases, translation initiation proteins, and other translational regulators. All of these genes are directly related in function to protein synthesis. A smaller network, with 14 nodes and 19 links, also consists of mostly ribosomal proteins.

The gene with the highest indegree is RRP4, a 3′→5′ exoribonuclease involved in a diverse array of RNA processing [19]. It is linked with 12 other genes, including RNA helicases, RNA polymerases, and other translational regulators. Its high connectivity suggests that it is coregulated with many of the genes involved in protein synthesis and appears to interact with these genes in a dynamic manner.

Of all the dynamic associations found, Table 2 shows the 10 with the highest R² values. The gene pairs found are closely related in function, including the four occurrences of ASP3 (L-asparaginase II), which are redundantly present on the microarray used for making the measurements. These associations are shown graphically in Fig. 5A.

TABLE 2
Dynamic Associations

Gene name   Category                     Gene description                                   R²
RPL42B      Protein synthesis            Ribosomal protein L42b                             0.957
RPS24B      Protein synthesis            Ribosomal protein L24B

ASP3        Asparagine utilization       L-Asparaginase II                                  0.910
ASP3        Asparagine utilization       L-Asparaginase II

RRP4        rRNA processing              Exoribonuclease/rRNA processing                    0.905
SUA5        Protein synthesis            Translation initiation protein

LOS1        tRNA splicing                Involved in tRNA splicing                          0.900
NIP1        Nuclear protein targeting    Subunit of translation initiation

RPL5        Protein synthesis            Ribosomal protein                                  0.7
RPS0A       Protein synthesis            Ribosomal protein

NMD3        mRNA decay                   Required for stable ribosomal subunit formation    0.7
NSR1        Nuclear targeting protein    NLS binding protein/rRNA processing

RPL9A       Protein synthesis            Ribosomal protein                                  0.4
RPS8B       Protein synthesis            Ribosomal protein

HHF2        Chromatin structure          Histone H4                                         0.4
HTB1        Chromatin structure          Histone H2B

ASP3        Asparagine utilization       L-Asparaginase II                                  0.3
ASP3        Asparagine utilization       L-Asparaginase II

IMG1        Protein synthesis            Mitochondrial ribosomal protein                    0.3
RSC6        Chromatin structure          Chromatin remodeling complex subunit

Note. The strongest associations between genes found using the dynamics analysis.

FIG. 5. (A) The strongest associations found using the dynamics analysis. (B) Selected networks found using the dynamics analysis, but not the static analysis.

Negative Associations

We also examine two negative associations of interest. First, we look at EXM2 and MAD3 (Fig. 6A). EXM2 is a protein involved in allowing cells to exit mitosis, while MAD3 is a spindle–assembly checkpoint protein that prevents certain cells from leaving mitosis [20]. These two counteracting genes appear as negatively correlated in their dynamics with an R² of −0.797. Meanwhile, they are/are not found to be strongly associated in the static analysis.

Figure 6B shows the distribution of slopes between RAD6 and MET18. RAD6 is a ubiquitin-conjugating enzyme concentrated in the nucleus that is essential for mediating the degradation of amino-end rule-dependent protein substrates [21]. MET18, also known as MMS19, is a protein concentrated in the nucleus that affects RNA polymerase II transcription [22]. These are inversely related in their dynamics, with an R² of −0.791. It is not surprising that a gene responsible for protein degradation has an inverse relationship to a gene responsible for RNA transcription leading to protein synthesis.

The circles in Fig. 6B represent the static data points that we filter out to direct the analysis toward finding correlated changes in gene expression. If these static points are included in the analysis, the R² shifts from −0.79 to −0.58, far below the determined level of significance. This illustrative example highlights the methodological utility of filtering out the stationary data points when clustering according to gene expression dynamics.

FIG. 6. Negative dynamic correlations. (A) The distribution of slopes of MAD3 and EXM2 plotted one against another. (B) The distribution of slopes of MET18 and RAD6 plotted one against another. The filled points in the middle are those static points excluded in the analysis to ensure identification of only truly dynamic relationships between genes.

As shown in Fig. 3, there are far more strong positive associations than strong negative associations. Figure 7 shows the distribution for the strongest positive association (two ribosomal proteins RPL42 and RPS24, R² = 0.957), and for the strongest negative association (two RNA polymerase genes RPO31 and SRB8, R² = −0.854).

In general, the distributions with an extremely tight linear fit are all positive. It could be argued that these tight correlations represent more direct relationships between genes, such as two genes occurring in the same step of a biological pathway, for example two ribosomal proteins that are always up-regulated or down-regulated together. It could further be argued that there are no extremely strong negative correlations because negative feedback in biological systems occurs mostly through multistep signal cascades. These are by definition more indirect and thus result in less tight linear relationships between negatively correlated genes.

FIG. 7. Positive and negative correlations. Slope–slope distributions of the strongest positive correlation (A) and the strongest negative correlation (B). On the whole, the strongest positive correlations were more tightly linear than the strongest negative ones, as explained in the text.

Comparison of Dynamics and Static Analyses

In this work we have formulated a methodology for clustering genes according to gene expression dynamics. To evaluate this methodology in the context of existing techniques, we construct a second set of relevance networks based on a static analysis of the same data set.

The static analysis is performed as above, with a few key differences. First, we use the original gene expression data, and not the first difference of gene expression. Second, while the genes are still ranked by entropy value and the bottom 5% (entropy threshold < 2.2187) are removed, there is no need to filter out any "stationary" data points since this is a static analysis.

For purposes of comparison, we set the threshold R² to 0.70, creating a set of relevance networks with a similar number of genes as that seen in the dynamic analysis (356 static vs 348 dynamic). Of the 3,041,811 possible gene–gene connections, 4872 (0.16%) are found to be above this threshold.

Although both analyses contain a similar number of genes, there are more individual networks generated from the dynamics analysis (71 separate networks vs 45 in the static), while there are far more interconnections between genes in the static analysis (4872 links vs 371 in the dynamic). These results may indicate that slope–slope associations are less common biologically, or that they are more difficult to detect with this methodology than static associations.

We find that 133 genes appear in both the static and dynamic analyses, leaving 215 genes that are exclusive to the dynamics analysis. However, only about half of the 133 shared genes appear linked to the same genes in both analyses; the remainder appear linked to other genes.

Some genes are found in both analyses, but appear associated with different genes. One particularly interesting example is discussed here. In the dynamic analysis, three genes involved in protein synthesis are grouped into a single network: PRS1 is involved in making PRPP, required for making amino acids [23]; SIK1 is a nucleolar protein necessary for ribosomal subunit assembly [24]; SUI2 codes for a subunit of a translation initiation factor [25]. All three of these genes appear in the static analysis as well, but none are linked to each other. In fact, while PRS1 and SIK1 do appear indirectly related in the same network in the static analysis, they are not found to be strongly directly linked to each other. These examples illustrate how using both static and dynamic approaches can attain a complementary view of gene–gene relations.

Relationships Appearing in Both Analyses

A number of links are found identically in both analyses, some of which are shown in Table 3. A large interconnected network of histone genes responsible for chromatin structure found by the static analysis appears broken up into two networks in the dynamics analysis (Fig. 8). This may indicate that certain associations are inherently more dynamic than others.

FIG. 8. Comparison of networks. A group of histone genes grouped together by both the static and dynamic analyses. The dynamics analysis (left) found few dynamic connections compared to the almost fully connected clique formed by the static analysis.

TABLE 3
Shared Associations

Name      Category                 Description
POL30     Replication              DNA polymerase processivity factor
RFA1      Replication              Replication factor A, 69 kDa subunit

RPN12     Protein degradation      26S proteasome regulatory subunit
RPN9      Protein degradation      26S proteasome regulatory subunit

CUP1      Cu2+ ion homeostasis     Metallothionein
CUP1      Cu2+ ion homeostasis     Metallothionein

ASP3      Asparagine utilization   L-Asparaginase II
ASP3      Asparagine utilization   L-Asparaginase II

APT1      Purine biosynthesis      Adenine phosphoribosyltransferase
None      Protein synthesis        Tryptophan–tRNA ligase

HHF1      Chromatin structure      Histone H4
HHT1      Chromatin structure      Histone H3

HTB1      Chromatin structure      Histone H2B
HHF2      Chromatin structure      Histone H4

HHT2      Chromatin structure      Histone H3
HTA1      Chromatin structure      Histone H2A

RPS25B    Protein synthesis        Ribosomal protein S25B
RPS31     Protein synthesis        Ribosomal protein S31

RPL18A    Protein synthesis        Ribosomal protein L18A
RPL8B     Protein synthesis        Ribosomal protein L8B

RPL1B     Protein synthesis        Ribosomal protein L1B
RPS19B    Protein synthesis        Ribosomal protein S19B

Note. Selected associations found by both the dynamic and static analyses.

Associations Found Exclusively in the Dynamics Analysis

Most of the genes appearing in the dynamics analysis are not found using the static analysis. A selection is reviewed here (Fig. 5B).

One network grouped SMD1, involved in mRNA splicing [26], with SPT15, a gene involved in transcription [27]. Another network grouped TOP1, involved in DNA replication [28], with DHS1, involved in DNA repair and recombination [29]. Yet another network grouped APL5, involved in vacuolar protein targeting [30], with SNI2, a gene involved in secretion [31].

Another interesting network consists of one cell cycle gene, BUB2 [22], and three genes localized in space in the mitochondria: SLS1 is an integral membrane protein involved in mitochondrial metabolism [32]; CEM1 is a mitochondrial protein involved in fatty acid metabolism [33]; RIB2 is involved in riboflavin synthesis, also localized to the mitochondria [34].

The intuitive nature of many of these relationships suggests that the dynamics analysis can identify meaningful associations that are not found using a static analysis.

DISCUSSION

Summary of Results

We have formulated and evaluated an analytic methodology for clustering genes according to gene expression dynamics. The relevance networks produced from the dynamics analysis reveal significant and meaningful relationships, indicating that the dynamics approach is useful for knowledge discovery in functional genomics.

Furthermore, the fact that most of these relationships are not found using a comparable static analysis suggests that the dynamic approach is actually necessary for a more complete picture of gene–gene interactions. It is argued that the inherent dynamic nature of certain gene–gene relationships requires this inherently dynamic approach for knowledge discovery.

A sizable number of associations are found using both the static and dynamic analyses. The similarity between the results of the dynamic analysis and those of the already established static analysis serves to further validate the proposed dynamic methodology.

There were clearly also relationships found with the static approach that were not found using the dynamic approach. From these results we conclude that to extract all the valuable information from gene expression measurements, one needs a full set of complementary analysis methodologies that capture the dynamics of these systems. With continuing work in this emerging and important area of research, and the continuing decrease in the cost of massively parallel expression measurements, the dynamics approach is ready to take its place amidst the growing set of tools for knowledge discovery in functional genomics. We anticipate that many more of the techniques developed to handle "noisy" dynamic processes in clinical informatics will find ready and immediate application to functional genomics.

Future Work

The slope measurements reported here were measured between adjacent data points. Longer-term effects can be studied by measuring slopes between time points that are more distant from one another. The associations reported here were measured between simultaneous slopes. We are currently studying possible time-lagged associations between slopes, allowing for signal propagation times and other delays. This phase generalization expands the analysis methodology to extract even more information from the gene expression data.

REFERENCES

1. Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, Childs G. Making and reading microarrays. Nat Genet 1999; 21(1 Suppl):15–19.
2. Botwell D. Options available, from start to finish, for obtaining expression data by microarray. Nat Genet 1999; 1(Suppl):25–32.
3. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998; 95(250):14863–8.
4. Michaels GS, Carr DB, Askenazi M, Fuhrman S, Wen X, Somogyi R. Cluster analysis and data visualization of large-scale gene expression data. Pac Symp Biocomput 1998; 42–53.
5. Schwartz WB, Patil RS, Szolovits P. Artificial intelligence in medicine: where do we stand? N Engl J Med 1987; 316(11):685–688.
6. Haimowitz IJ, Le PP, Kohane IS. Clinical monitoring using regression-based trend templates. Artificial Intelligence Med 1995; 7:471–472.
7. Russ TA. Reasoning with time dependent data [PhD]: Massachusetts Institute of Technology; 1992.
8. Kohane IS. Temporal reasoning in medical expert systems. In: Salomon R, Blum B, Jørgensen M, editors. MEDINFO 86/Fifth World Congress on Medical Informatics; 1986; Washington, DC: Elsevier Science, 1986; 170–174.
9. Rutledge G, Thomsen G, Farr B, Tovar M, Sheiner L, Fagan L. VentPlan: a ventilator-management advisor. In: Clayton PD, editor. Symposium on Computer Applications in Medical Care, 1991. Washington, DC, 1991; 869–871.
10. Shahar Y, Tu S, Musen M. Knowledge acquisition for temporal abstraction mechanisms. Knowledge Acquisition 1992; 1(4):217–236.
11. Kahn MG, Fagan LMB, Sheiner L. Model-based interpretation of time-varying medical data. In: Proceedings Symposium Computer Applications in Medical Care, 19; 19. 28–32.
12. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998; 9(12):3273–97.
13. Chen T, He HL, Church GM. Modeling gene expression with differential equations. Pac Symp Biocomput 1999; 29–40.
14. Toronen P, Kolehmainen M, Wong G, Castren E. Analysis of gene expression data using self-organizing maps. FEBS Lett 1999; 451(2):142–6.
15. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 1999; 96(6):2907–12.
16. Butte A, Kohane I. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In: Altman R, Dunker K, Hunter L, Lauderdale K, Klein T, editors. Pacific Symposium on Biocomputing 2000; Hawaii: World Scientific, 2000; 418–429.
17. Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks [In Process Citation]. Proc Natl Acad Sci USA 2000; 97(22):12182–6.
18. Butte A, Kohane IS. Unsupervised Knowledge Discovery in Medical Databases Using Relevance Networks. In: Lorenzi N, editor. Fall Symposium, American Medical Informatics Association; 1999; Washington, DC: Hanley and Belfus, 1999; 711–715.
19. Mitchell P, Petfalski E, Shevchenko A, Mann M, Tollervey D. The Exosome: A conserved eukaryotic RNA processing complex containing multiple 3′→5′ exoribonucleases. Cell 1997; 91:457–466.
20. Hwang LH, Lau LF, Smith DL, Mistrot CA, Hardwick KG, Hwang ES, et al. Budding Yeast Cdc20: A Target of the Spindle Checkpoint. Science 1998; 279:1041–4.
21. Watkins JF, Sung P, Prakash S, Prakash L. The extremely conserved amino terminus of RAD6 ubiquitin-conjugating enzyme is essential for amino-end rule-dependent protein degradation. Genes Dev 1993; 7(2):50–61.
22. Lauder S, Bankmann M, Guzder SN, Sung P, Prakash L, Prakash S. Dual requirement for the yeast MMS19 gene in DNA repair and RNA polymerase II transcription. Mol Cell Biol 1996; 16:6783–93.
23. Carter AT, Beiche F, Hove-Jensen B, Narbad A, Barker P, Schweizer LM, Schweizer M. PRS1 is a key member of the gene family encoding phosphoribosylpyrophosphate synthetase in Saccharomyces cerevisiae. Mol Gen Genet 1997; 254:148–56.
24. Gautier T, Berges T, Tollervey D, Hurt E. Nucleolar KKE/D repeat proteins Nop56p and Nop58p interact with Nop1p and are required for ribosome biogenesis. Mol Cell Biol 1997; 17:7088–98.
25. Cigan AM, Pabich EK, Feng L, Donahue TF. Yeast translation initiation suppressor sui2 encodes the alpha subunit of eukaryotic initiation factor 2 and shares sequence identity with the human alpha subunit. Proc Natl Acad Sci USA 19; 86:2784–8.
26. Rymond BC. Convergent transcripts of the yeast PRP38-MSD1 locus encode two essential splicing factors, including the D1 core polypeptide of small nuclear ribonucleoprotein particles. Proc Natl Acad Sci USA 1993; 90:848–52.
27. Cormack BP, Struhl K. The TATA-binding protein is required for transcription by all three nuclear RNA polymerases in yeast cells. Cell 1992; 69:685–96.
28. Christman MF, Dietrich FS, Fink GR. Mitotic recombination in the rDNA of S. cerevisiae is suppressed by the combined action of DNA topoisomerases I and II. Cell 1988; 55:413–25.
29. Tishkoff DX, Boerger AL, Bertrand P, Filosi N, Gaida GM, Kane MF, Kolodner RD. Identification and characterization of Saccharomyces cerevisiae EXO1, a gene encoding an exonuclease that interacts with MSH2. Proc Natl Acad Sci USA 1997; 94:7487–92.
30. Cowles CR, Odorizzi G, Payne GS, Emr SD. The AP-3 adaptor complex is essential for cargo-selective transport to the yeast vacuole. Cell 1997; 91:109–118.
31. Lehman K, Rossi G, Adamo JE, Brennwald P. Yeast homologues of tomosyn and lethal giant larvae function in exocytosis and are associated with the plasma membrane SNARE, sec9. J Cell Biol 1999; 146:125–40.
32. Rouillard JM, Dufour ME, Theunissen B, Mandart E, Dujardin G, Lacroute F. SLS1, a new Saccharomyces cerevisiae gene involved in mitochondrial metabolism, isolated as a synthetic lethal in association with an SSM4 deletion. Mol Gen Genet 1996; 252:700–8.
33. Harington A, Herbert CJ, Tung B, Getz GS, Slonimski PP. Identification of a new nuclear gene (CEM1) encoding a protein homologous to a β-keto-acyl synthase which is essential for mitochondrial respiration in Saccharomyces cerevisiae. Mol Microbiol 1993; 9:545–55.
34. Pallotta MLBC, Fratianni A, De Virgilio C, Barile M, Passarella S. Saccharomyces cerevisiae mitochondria can synthesise FMN and FAD from externally added riboflavin and export them to the extramitochondrial phase. FEBS Lett 1998; 428(3):245–9.
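
As an illustrative aside, the slope calculation of Eq. (1), the stationary-point filter, the five-point minimum, the signed R², and the permutation maximum described in the Methods might be sketched as follows. This is a minimal sketch and not the authors' source code. It assumes NumPy, and the function names (slopes, signed_r2, max_permuted_r2) are hypothetical. It further assumes that a slope pair is dropped when either gene's slope lies inside the ±0.02 hole, that the hole boundary itself is excluded, and that an invalid pair is reported as None; the text does not specify these details.

```python
import itertools
import numpy as np

def slopes(levels, times, series_lengths):
    """Eq. (1): first-order differences, taken only within each time series."""
    out, start = [], 0
    for n in series_lengths:
        e = np.asarray(levels[start:start + n], dtype=float)
        t = np.asarray(times[start:start + n], dtype=float)
        out.append(np.diff(e) / np.diff(t))  # n points give n - 1 slopes
        start += n
    return np.concatenate(out)

def signed_r2(x, y, hole=0.02, min_points=5):
    """Squared Pearson correlation of two slope vectors, with the sign restored.

    Assumption: a point is kept only if both slopes lie outside +/-hole.
    Returns None when fewer than min_points remain or a vector is constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (np.abs(x) > hole) & (np.abs(y) > hole)
    if keep.sum() < min_points:
        return None
    xs, ys = x[keep], y[keep]
    if xs.max() == xs.min() or ys.max() == ys.min():
        return None  # correlation is undefined for a constant vector
    r = np.corrcoef(xs, ys)[0, 1]
    return float(np.sign(r) * r ** 2)

def max_permuted_r2(slope_rows, n_perm=100, seed=0):
    """Largest |signed R^2| seen after shuffling time points within each gene."""
    rng = np.random.default_rng(seed)
    rows = np.asarray(slope_rows, dtype=float)
    best = 0.0
    for _ in range(n_perm):
        shuffled = np.array([rng.permutation(row) for row in rows])
        for a, b in itertools.combinations(shuffled, 2):
            value = signed_r2(a, b)
            if value is not None:
                best = max(best, abs(value))
    return best
```

With 79 data points in 10 series, slopes would return 69 values per gene, matching the count given in the Methods. The all-pairs loop in max_permuted_r2 is quadratic in the number of genes, so for the 2467 genes studied here a vectorized correlation would likely be needed in practice.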