(Prerequisites: CS 409) 

Introduction: Basic Data Mining Tasks, Database / OLTP Systems, Data Warehousing, OLAP Systems, Related Concepts (Statistics, Fuzzy Sets and Fuzzy Logic, Information Retrieval, Decision Support Systems, Dimensional Modeling, Machine Learning, Pattern Matching). Data Preprocessing, Exploratory Data Analysis, Statistical Approaches to Estimation and Prediction. Association Rule Mining. Classification and Prediction: Introduction, Decision Tree Induction Methods, Bayesian Classification, Rule-Based Algorithms, Neural Network-Based Algorithms. Cluster Analysis: Introduction, Similarity and Distance Measures, Partitioning Methods, Hierarchical Methods, Outlier Analysis. Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining. Applications and Trends in Data Mining. Practical assignments are part of the course.

Review of the following in the context of bioinformatics: basic probability, statistical inference, stochastic processes, computer-intensive approaches to statistical inference, and applications. Mathematical models and computational methods of statistical genetics, including Mendelian genetic traits, population genetics, pedigree relationships and gene identity, meiosis and recombination, linkage detection, and multipoint linkage analysis. Course work involves some computation in a Unix environment.