Course overview

Statistics is the study of how to collect, analyze, and draw conclusions from data. Data science is typically thought of as an interdisciplinary field, combining statistical thinking with elements more traditionally associated with other fields, such as programming, database management and optimization. Compared with statistics, it places a stronger focus on the practical aspects of working with data, in particular computing, as well as on applications in different domains, such as the sciences, business, sports, and government.

This course is the first of a three-course series on statistical data science. It is an introduction to statistical thinking, with a focus on computing. Students will get familiar with the R programming language and associated data manipulation and visualization tools. Students will also learn the basic statistical concepts of randomness, probability models, sampling variability, hypothesis testing and confidence intervals.

Course logistics

The course website is https://xhtai.github.io/statdatasci/. Lecture notes, homework, supplementary materials, etc., will be posted there. Canvas will be used for lab materials and for turning in labs and homework. Solutions will be posted on Canvas. Piazza will be used for discussion (more details below).

Pre-requisites and credit limitations

This is an introductory-level course and no knowledge of statistics, data science or programming will be presumed. The official pre-requisites are: MAT 016A (can be concurrent) or MAT 017A (can be concurrent) or MAT 021A (can be concurrent). The course is not open for credit to students who have taken STA 032 or STA 100. Only 2 units of credit will be available for students who have taken STA 013.

Course mechanics

Lectures are Mondays, Wednesdays and Fridays, 11-11:50 AM.

Discussions (labs) are run by the TA on Thursdays, 12:10-1 PM (Section A01) and 1:10-2 PM (Section A02).

Office hours are posted on the course webpage.

R and RStudio

R is a free, open-source programming language for statistical computing. RStudio is a free, open-source R programming environment. It contains a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems.

All of our computing work in this class will be done using R and RStudio. You will use RStudio for homework, labs and exams, so a working version of RStudio is required. You can choose to download it on your personal computer, or use UC Davis JupyterHub. You will need regular, reliable access to a computer either running an up-to-date version of R and RStudio, or with a working browser (for the JupyterHub option). If this is a problem, please let us know right away. There are resources available to support you. Some are listed here.

The room that labs are held in has computers with RStudio installed, and you may choose to use them. If you are using your own laptop, please make sure that it is charged before class.

Course schedule

A rough schedule is as follows. This is subject to change depending on time and interests.

Week  Topics                                     Notes
1     Overview of data types
2     Overview of data types                     Homework 1 assigned
3     Data manipulation and visualization tools  Homework 2 assigned
4     Data visualization tools                   Midterm on Friday, October 18
5     Descriptive statistics                     Homework 3 assigned
6     Intro to probability, Probability models   Homework 4 assigned
7     Probability models                         Homework 5 assigned
8     Probability models                         Veterans Day; no class Monday. Midterm on Friday, November 15
9     Sampling distributions                     Homework 6 assigned
10    Confidence intervals                       Thanksgiving; no class Wed, Fri
11    Hypothesis testing                         Homework 7 assigned (Mon)
12                                               Final exam Thursday, December 12 at 8 AM

Evaluation

The grade breakdown is:

  • Labs: 10%
  • Six (of seven) homework assignments: 6*4 = 24%
  • Participation: 1%
  • One (of two) midterms: 30%
  • Final: 35%
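
As a rough illustration of how these weights combine, here is a minimal Python sketch (the course itself uses R; Python is used here only for illustration). The function name `course_percentage` and its arguments are hypothetical. It assumes every score is a percentage between 0 and 100, that labs are weighted equally within their 10% (the breakdown above does not say), and that the lowest homework score and the lower midterm score are dropped.

```python
def course_percentage(lab_scores, homework_scores, participated,
                      midterm_scores, final_score):
    """Combine component scores (each 0-100) into an overall percentage."""
    # Labs: 10% in total. Equal weighting across labs is an assumption.
    labs = 10 * (sum(lab_scores) / len(lab_scores)) / 100

    # Homework: the best six of seven count, 4% each (6 * 4 = 24%).
    best_six = sorted(homework_scores, reverse=True)[:6]
    homework = sum(4 * score / 100 for score in best_six)

    # Participation: a single point, earned or not.
    participation = 1 if participated else 0

    # Midterm: only the higher of the two scores counts, for 30%.
    midterm = 30 * max(midterm_scores) / 100

    # Final: 35%.
    final = 35 * final_score / 100

    return labs + homework + participation + midterm + final


example = course_percentage(
    lab_scores=[90, 100],
    homework_scores=[100, 80, 90, 70, 100, 60, 85],
    participated=True,
    midterm_scores=[70, 88],
    final_score=92,
)
print(round(example, 1))
```

In this made-up example the homework score of 60 and the midterm score of 70 are dropped before the weights are applied.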

Cutoffs for letter grades follow the standard UC Davis grading scheme:

  • A: 93% or higher
  • A-: 90% or higher
  • B+: 87% or higher
  • B: 83% or higher
  • B-: 80% or higher
  • C+: 77% or higher
  • C: 73% or higher
  • C-: 70% or higher
  • D+: 67% or higher
  • D: 63% or higher
  • D-: 60% or higher

These cutoffs may be adjusted at the conclusion of the quarter in the direction that favors the students. For example, the cutoff for an “A” may end up being adjusted to be lower than 93%, but not higher.
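
The unadjusted cutoffs can be read as a simple lookup from highest to lowest. The Python sketch below is illustrative only: the names `CUTOFFS` and `letter_grade` are hypothetical, the "F" for scores below 60% is an assumption (the list above stops at D-), and comparing the unrounded percentage against each cutoff is also an assumption.

```python
# Cutoffs as listed above, ordered from highest to lowest.
CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(percentage, cutoffs=CUTOFFS):
    """Return the first letter whose cutoff the percentage meets."""
    for minimum, letter in cutoffs:
        if percentage >= minimum:
            return letter
    # Assumption: anything below the D- cutoff is an F.
    return "F"


print(letter_grade(90.1))
```

Because cutoffs can only move in the students' favor, any end-of-quarter adjustment would amount to passing a `cutoffs` list with lower minimums.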

Labs: Labs are due the Monday after the lab session, at 9 PM, and will be turned in via Gradescope (accessible through Canvas). Labs will be completed in Quarto or R Markdown format (file extension qmd or Rmd). Labs will involve writing a combination of code and written prose, and the Quarto/R Markdown format allows for a combination of the two. Labs must be submitted only in PDF format, the result of calling “Knit PDF” from RStudio on your qmd/Rmd document. Work submitted in any other format will receive a grade of 0, without exception. All code used to produce your results must be shown in your PDF file (e.g., do not use echo = FALSE or include = FALSE as options anywhere). qmd/Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.

Students may choose to collaborate with each other on the labs and homework, but must clearly indicate with whom they collaborated.

Homework: Homework will be released on Friday after class, and will be due on Thursday of the following week, at 9 PM. Exception: the last homework assignment (HW 7) will be released on Monday and will be due on Sunday at 9 PM. The same policies that apply to labs will apply to homework. Homework may contain other non-coding components, and these can be typed (use any software you are comfortable with), or written and scanned. The submission must be in PDF format.

Midterms and final: There will be two midterms and one final. The midterms will be in class, during the scheduled class times. The final will consist of a coding component and a written component. More details on exams will be announced at a later date.

The lower score of the two midterms will be dropped. There will be no make-up exams. If you must miss a midterm due to illness, travel, or some other reason, this will be the exam that is dropped. For the final, if you have another final starting 30 minutes before or after the scheduled time, you may present documentation and request an accommodation to start 15 minutes before or after the scheduled time.

Late Work: All labs and homework will be due at 9 PM Pacific Time, on the relevant due date. No late work will be accepted. The lowest homework score will be dropped.

Grade disputes and adjustments: Students have 24 hours after receiving a grade on any assignment to contest it. Grading is consistent and we will provide detailed rubrics. If you think you deserve a different grade, prepare a strong argument and submit it by email to the TA.

Attendance and participation

Class attendance is strongly encouraged. Please be on time. If you miss a lecture for any reason, you are responsible for all material covered and any announcements made in your absence. Cell phones, laptops, and other electronic devices must be silenced in class. Laptops are to be used in class for learning purposes related to the lecture only.

Chatting and other disruptive behavior may result in a student being dismissed for the day. Repeated disruptive behavior may result in a discussion with your academic advisor on appropriate further action.

One point will be awarded for participation. To earn the point, you have to contribute to a class discussion once during a lecture (not during the discussion section) anytime during the quarter. Contributions include asking or answering a question, or commenting on a discussion or presentation. Please then respond to the survey (Quizzes tab) on Canvas indicating the lecture you contributed to, and what the question, comment or contribution was.

Textbooks

There are three required textbooks. They are all available online and are free.

  1. R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund. 2nd Edition, 2023. Available here.
  2. Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin. 2nd Edition, 2024. Available here.
  3. OpenIntro Statistics by David Diez, Mine Çetinkaya-Rundel and Christopher Barr. 4th Edition, 2019. Available here.

There will sometimes be additional (optional) reading.

  1. Art of R Programming by Norman Matloff. 2011. (Look on Google)

Collaboration, copying, and plagiarism

All students are expected to follow the UCD Code of Academic Conduct. Any student who cheats on an assignment or exam will be referred to the Office of Student Support and Judicial Affairs and will receive an automatic failing grade on the relevant assignment. A second instance of academic dishonesty will result in a failing grade in the course. More information on the nature of dishonest academic behavior or UCD policy can be found on the website of the Office of Student Support and Judicial Affairs.

Collaboration is encouraged and students are encouraged to discuss course material with classmates. All work that is turned in, however, must be your own. If students have collaborated on labs or homework, the names of all students working together must be clearly indicated.

Use of Large Language Models (LLMs): use of LLMs such as ChatGPT is not permitted. Suspected use will be reported to OSSJA.

Distribution of course materials

Please do not distribute any course materials outside of this class. Doing so is an infringement of copyright as per UC policy. Use of sites like Course Hero and Chegg is not permitted.

Getting help

Labs, office hours and Piazza

You can access Piazza by clicking on the “Piazza” link on the sidebar in Canvas, or directly through this link. Students are encouraged to answer each other’s questions, and the TA will moderate by checking in every day. The quickest way to get a question answered is likely on Piazza, since anyone in class can answer. These are the rules for posting on Piazza:

  1. Please be respectful. Any content deemed inappropriate will be taken down by the TA, and reported to the instructor.
  2. Please search before you post since your question may have already been answered.
  3. Posting code that is part of your solutions for labs and homework, and asking “what is wrong with this code?” is not acceptable. “I don’t know either” is not an appropriate answer as it does not contribute constructively to the conversation. Along with your posted question, explain what else you tried that didn’t work. The answers to many common coding questions can be found on https://stackoverflow.com/.

If you have a question that requires more than a short paragraph to answer, labs and office hours are the best options.

Office hours: who to ask?

  • Xiao Hui Tai: Questions regarding contents of the lectures, organizational aspects of the course.
  • Oscar Rivera: Questions regarding contents of the course, discussions, homework assignments and their grading, code, Canvas and Piazza.

Email

Email will be used only for questions relating to private matters (accommodations, grading, emergencies, etc.). Questions about class logistics and content should be posted on Piazza, asked in class, or raised during office hours. Canvas messages will not receive a response.

Other campus resources

Statistics Tutors at the Academic Assistance and Tutoring Centers provide support for RStudio. More information is available here.

Many students face different challenges during college, and it is healthy to seek support. This is a comprehensive list of resources covering general academics, health and wellness, finances, housing, career/internship, and other topics.

Health and wellness resources are available here. If you have an emergency, call 911 immediately, or go to the nearest emergency room. Mental health staff are available 24 hours a day, 7 days a week by phone at 530-752-2349. (Follow the prompts to reach a counselor.)

Accommodations for students with disabilities

UC Davis is committed to educational equity in the academic setting, and in serving a diverse student body. All students who are interested in learning more about the Student Disability Center (SDC) are encouraged to contact them directly at https://sdc.ucdavis.edu, or 530-752-3184. If you are a student who requires academic accommodations, please submit your SDC Letter of Accommodation as soon as possible, ideally within the first two weeks of this course.