Course overview

Statistics is the study of how to collect, analyze, and draw conclusions from data. Data science is typically thought of as an interdisciplinary field that combines statistical thinking with elements more traditionally associated with other fields, such as programming, database management, and optimization. Compared with traditional statistics, it places a stronger focus on the practical aspects of working with data, in particular computing, and on applications in different domains, such as the sciences, business, sports, and government.

This course is the first of a three-course series on statistical data science. It is an introduction to statistical thinking, with a focus on computing. Students will get familiar with the R programming language and associated data manipulation and visualization tools. Students will also learn the basic statistical concepts of randomness, probability models, sampling variability, hypothesis testing and confidence intervals.

Course logistics

The course website is https://xhtai.github.io/statdatasci/. Lecture notes, homework, supplementary materials, etc., will be posted there. Canvas will be used for lab materials and for turning in labs and homework. Solutions will be posted on Canvas. Piazza will be used for discussion (more details below).

Pre-requisites and credit limitations

This is an introductory-level course, and no knowledge of statistics, data science, or programming will be presumed. The official pre-requisites are: MAT 016A (can be concurrent) or MAT 017A (can be concurrent) or MAT 021A (can be concurrent). The course is not open for credit to students who have taken STA 032 or STA 100. Only 2 units of credit will be available to students who have taken STA 013.

Course mechanics

Lectures are Mondays, Wednesdays and Fridays, 11-11:50 AM.

Discussions (labs) are run by the TA on Thursdays, 12:10-1 PM (Section A01) and 1:10-2 PM (Section A02).

Office hours are posted on the course webpage.

R and RStudio

R is a free, open-source programming language for statistical computing. RStudio is a free, open-source R programming environment. It contains a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems.

All of our computing work in this class will be done using R and RStudio. You will use RStudio for homework, labs and exams, so a working version of RStudio is required. You can choose to download it on your personal computer, or use UC Davis JupyterHub. You will need regular, reliable access to a computer either running an up-to-date version of R and RStudio, or with a working browser (for the JupyterHub option). If this is a problem, please let us know right away. There are resources available to support you. Some are listed here.

The room that labs are held in has computers with RStudio installed, and you may choose to use them. If you are using your own laptop, please make sure that it is charged before class.

Course schedule

A rough schedule is as follows. This is subject to change depending on time and interests.

Week | Topics | Notes
1 | Overview of data types |
2 | Overview of data types | Homework 1 assigned
3 | Data manipulation and visualization tools | Homework 2 assigned
4 | Data visualization tools | Midterm on Friday, October 20
5 | Descriptive statistics | Homework 3 assigned
6 | Intro to probability, Probability models | Homework 4 assigned
7 | Probability models | Homework 5 assigned. Veterans Day; no class Friday
8 | Probability models | Midterm on Friday, November 17
9 | Sampling distributions | Thanksgiving; no class Wed, Fri
10 | Sampling distributions, Confidence Intervals | Homework 6 assigned (Mon)
11 | Hypothesis testing | Homework 7 assigned (Mon)
12 | Final exam | Monday, December 11 at 1:00 pm

Evaluation

The grade breakdown is:

  • 15% labs
  • 25% homework
  • 30% midterms
  • 30% final
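As a rough illustration of how these weights combine, the sketch below computes an overall course percentage. It is written in Python purely for illustration (the course itself uses R). The function name `course_score`, the example scores, and the assumption that each component is already expressed as a percentage from 0 to 100 are not part of the syllabus. Because the lower midterm score is dropped, only the higher of the two midterm scores is used.

```python
# Weights from the grade breakdown above (they sum to 1.0).
WEIGHTS = {"labs": 0.15, "homework": 0.25, "midterms": 0.30, "final": 0.30}


def course_score(labs, homework, midterm_scores, final):
    """Return the overall course percentage (0-100).

    Assumes every input is already a percentage between 0 and 100.
    The lower midterm is dropped, so only the best midterm counts.
    """
    best_midterm = max(midterm_scores)
    return (
        WEIGHTS["labs"] * labs
        + WEIGHTS["homework"] * homework
        + WEIGHTS["midterms"] * best_midterm
        + WEIGHTS["final"] * final
    )


# Hypothetical student: 95% labs, 88% homework, midterms of 72% and 84%, 80% final.
overall = course_score(95, 88, [72, 84], 80)
print(round(overall, 2))  # 85.45
```

In this example the 72% midterm has no effect on the result: the midterm component is 0.30 × 84.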

Cutoffs for letter grades are:

  • A: 90% or higher
  • B: 80% to 89%
  • C: 70% to 79%
  • D: 60% to 69%
  • F: 59% or lower

These cutoffs may be adjusted at the conclusion of the quarter in the direction that favors the students. For example, the cutoff for an “A” may end up being adjusted to be lower than 90%, but not higher.
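The cutoff rule, including a possible downward adjustment, can be sketched as follows. This is an illustration in Python, not official grading code. The function name `letter_grade`, the adjusted cutoff of 88, and the handling of fractional scores (a score earns a letter once it reaches that letter's minimum, so 89.5 is treated as a B) are assumptions.

```python
# Default cutoffs from the list above, highest first: (minimum percentage, letter).
DEFAULT_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(score, cutoffs=DEFAULT_CUTOFFS):
    """Map an overall percentage to a letter grade.

    Assumption: a score earns a letter once it reaches that letter's
    minimum, so a fractional score such as 89.5 is treated as a B.
    """
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"


print(letter_grade(85.45))  # B

# Hypothetical end-of-quarter adjustment: the A cutoff is lowered to 88.
adjusted = [(88, "A"), (80, "B"), (70, "C"), (60, "D")]
print(letter_grade(89, adjusted))  # A
```

Because adjustments only ever lower a cutoff, a score never maps to a worse letter under adjusted cutoffs than under the defaults.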

Labs: Labs are due the Monday after the lab session, at 9 PM, and will be turned in via Gradescope (accessible through Canvas). Labs will be completed in R Markdown format (file extension Rmd). Labs will involve writing a combination of code and written prose, and the R Markdown format allows for a combination of the two. Labs must be submitted only in PDF format, the result of calling “Knit PDF” from RStudio on your R Markdown document. Work submitted in any other format will receive a grade of 0, without exception. All code used to produce your results must be shown in your PDF file (e.g., do not use echo = FALSE or include = FALSE as options anywhere). Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.

Students may choose to collaborate with each other on the labs and homework, but must clearly indicate with whom they collaborated.

Homework: Homework will be released on Friday after class, and will be due Thursday the week after, at 9 PM. Exceptions: the last two homework assignments (6 and 7) will be released on Monday and be due on Sunday the week after, at 9 PM. The same policies that apply to labs will apply to homework. Homework may contain other non-coding components, and these can be typed (use any software you are comfortable with), or written and scanned. The submission must be in PDF format.

Midterms and final: There will be two midterms and one final. The midterms will be in class, during the scheduled class times. The final will consist of a coding component and a written component. More details on exams will be announced at a later date.

The lower score of the two midterms will be dropped. There will be no make-up exams. If you must miss an exam due to illness, travel, or some other reason, that exam will be the one that is dropped. For the final, if you have another final starting 30 minutes before or after the scheduled time, you may present documentation and request an accommodation to start 15 minutes before or after the scheduled time.

Late Work: All labs and homework will be due at 9 PM Pacific Time, on the relevant due date. Late work will be accepted up to 48 hours after the deadline. In the first 24 hours, you will receive a 50% penalty, i.e., your score will be halved. If you submit 24-48 hours after the deadline, you will receive 25% of your score, i.e., if you score 80/100, you will receive 20/100.
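The late-work arithmetic can be checked with a short sketch like the one below (Python, for illustration only). The function name `late_adjusted_score` is hypothetical. Two boundary choices are assumptions rather than stated policy: a submission exactly 24 hours late is placed in the first window, and work more than 48 hours late is recorded as 0 because it is no longer accepted.

```python
def late_adjusted_score(raw_score, hours_late):
    """Return the score recorded after applying the late-work policy.

    Assumptions: a submission exactly 24 hours late still falls in the
    first window, and work more than 48 hours late is recorded as 0.
    """
    if hours_late <= 0:
        return raw_score          # on time: no penalty
    if hours_late <= 24:
        return raw_score * 0.50   # first 24 hours: score halved
    if hours_late <= 48:
        return raw_score * 0.25   # 24-48 hours: 25% of the score
    return 0                      # past 48 hours: not accepted


print(late_adjusted_score(80, 0))    # 80
print(late_adjusted_score(80, 3))    # 40.0
print(late_adjusted_score(80, 30))   # 20.0
print(late_adjusted_score(80, 50))   # 0
```

The third call reproduces the example in the policy: a score of 80/100 submitted 24-48 hours late is recorded as 20/100.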

Grade disputes and adjustments: Students have 24 hours after receiving a grade on any assignment to contest it. Grading is consistent and we will provide detailed rubrics. If you think you deserve a different grade, prepare a strong argument and submit it by email to the TA.

Attendance and participation

Class attendance is strongly encouraged. Please be on time. If you miss a lecture for any reason, you are responsible for all material covered and any announcements made in your absence. Active participation is encouraged, both in class and on Piazza. Cell phones, laptops, and other electronic devices must be silenced in class. Laptops are to be used in class for learning purposes related to the lecture only.

Textbooks

There are three required textbooks. They are all available online and are free.

  1. R for Data Science by Hadley Wickham and Garrett Grolemund. 1st Edition, 2017. Available here.
  2. Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin. 1st Edition, 2021. Available here.
  3. OpenIntro Statistics by David Diez, Mine Çetinkaya-Rundel and Christopher Barr. 4th Edition, 2019. Available here.

There will sometimes be additional (optional) reading.

  1. The Art of R Programming by Norman Matloff. 2011. (Look on Google)

Collaboration, copying, and plagiarism

All students are expected to follow the UCD Code of Academic Conduct. Any student who cheats on an assignment or exam will be referred to the Office of Student Support and Judicial Affairs and will receive an automatic failing grade on the relevant assignment. A second instance of academic dishonesty will result in a failing grade in the course. More information on the nature of dishonest academic behavior or UCD policy can be found on the website of the Office of Student Support and Judicial Affairs.

Collaboration is encouraged and students are encouraged to discuss course material with classmates. All work that is turned in, however, must be your own. If students have collaborated on labs or homework, the names of all students working together must be clearly indicated.

Distribution of course materials

Please do not distribute any course materials outside of this class. Doing so is an infringement of copyright as per UC policy. Use of sites like Course Hero and Chegg is not permitted.

Getting help

Labs, office hours and Piazza

You can access Piazza by clicking on the “Piazza” link on the sidebar in Canvas, or directly through this link. Students are encouraged to answer each other’s questions, and the TA will moderate by checking in every day. The quickest way to get a question answered is likely on Piazza, since anyone in class can answer. These are the rules for posting on Piazza:

  1. Please be respectful. Any content deemed inappropriate will be taken down by the TA, and reported to the instructor.
  2. Please search before you post since your question may have already been answered.
  3. Posting code that is part of your solutions for labs and homework, and asking “what is wrong with this code?” is not acceptable. “I don’t know either” is not an appropriate answer as it does not contribute constructively to the conversation. Along with your posted question, explain what else you tried that didn’t work. The answers to many common coding questions can be found on https://stackoverflow.com/.

If you have a question that requires more than a short paragraph to answer, labs and office hours are the best options.

Email

Email will be used only for questions relating to private matters (accommodations, grading, emergencies, etc.). Questions about class logistics and content should be posted on Piazza, asked in class, during labs, or during office hours. If you must ask a question about a non-private matter via email, you must first document how you tried to answer the question for yourself or through other means (e.g., “I double checked the syllabus,” “There are conflicting responses on Piazza” …). Emails that do not follow these guidelines may not be answered.

Please do not send me messages on Canvas. I do not monitor my Canvas inbox.

Other campus resources

Statistics Tutors at the Academic Assistance and Tutoring Centers provide support for RStudio. More information is available here.

Many students face different challenges during college, and it is healthy to seek support. This is a comprehensive list of resources covering general academics, health and wellness, finances, housing, career/internship, and other topics.

Health and wellness resources are available here. If you have an emergency, call 911 immediately, or go to the nearest emergency room. Mental health staff are available 24 hours a day, 7 days a week by phone at 530-752-2349. (Follow the prompts to reach a counselor.)

Accommodations for students with disabilities

UC Davis is committed to educational equity in the academic setting, and in serving a diverse student body. All students who are interested in learning more about the Student Disability Center (SDC) are encouraged to contact them directly at https://sdc.ucdavis.edu, or 530-752-3184. If you are a student who requires academic accommodations, please submit your SDC Letter of Accommodation to us as soon as possible, ideally within the first two weeks of this course.