Statistics is the study of how to collect, analyze, and draw conclusions from data. Data science is typically thought of as an interdisciplinary field, combining statistical thinking with elements more traditionally thought of as coming from other fields, such as programming, database management and optimization. There is a stronger focus on the practical aspects of working with data, in particular computing, as well as applications in different domains, such as the sciences, business, sports, and government.
This course is the first of a three-course series on statistical data science. It is an introduction to statistical thinking, with a focus on computing. Students will get familiar with the R programming language and associated data manipulation and visualization tools. Students will also learn the basic statistical concepts of randomness, probability models, sampling variability, hypothesis testing and confidence intervals.
The course website is https://xhtai.github.io/statdatasci/. Lecture notes, homework, supplementary materials, etc., will be posted there. Canvas will be used for lab materials and for turning in labs and homework. Solutions will be posted on Canvas. Piazza will be used for discussion (more details below).
This is an introductory-level course, and no knowledge of statistics, data science, or programming will be presumed. The official prerequisite is one of MAT 016A, MAT 017A, or MAT 021A (each can be taken concurrently). The course is not open for credit to students who have taken STA 032 or STA 100. Only 2 units of credit will be available for students who have taken STA 013.
Lectures are Mondays, Wednesdays and Fridays, 11-11:50 AM.
Discussions (labs) are run by the TA on Thursdays, 12:10-1 PM (Section A01) and 1:10-2 PM (Section A02).
Office hours are posted on the course webpage.
R is a free, open-source programming language for statistical computing. RStudio is a free, open-source R programming environment. It contains a built-in code editor, many features to make working with R easier, and works the same way across different operating systems.
All of our computing work in this class will be done using R and RStudio. You will use RStudio for homework, labs and exams, so a working version of RStudio is required. You can choose to download it on your personal computer, or use UC Davis JupyterHub. You will need regular, reliable access to a computer either running an up-to-date version of R and RStudio, or with a working browser (for the JupyterHub option). If this is a problem, please let us know right away. There are resources available to support you. Some are listed here.
The room that labs are held in has computers with RStudio installed, and you may choose to use them. If you are using your own laptop, please make sure that it is charged before class.
A rough schedule is as follows. This is subject to change depending on time and interests.
Week | Topics | Notes |
---|---|---|
1 | Overview of data types | |
2 | Overview of data types | Homework 1 assigned |
3 | Data manipulation and visualization tools | Homework 2 assigned |
4 | Data visualization tools | Midterm on Friday, October 18 |
5 | Descriptive statistics | Homework 3 assigned |
6 | Intro to probability, Probability models | Homework 4 assigned |
7 | Probability models | Homework 5 assigned |
8 | Probability models | Veterans Day; no class Monday. Midterm on Friday, November 15 |
9 | Sampling distributions | Homework 6 assigned |
10 | Confidence Intervals | Thanksgiving; no class Wed, Fri |
11 | Hypothesis testing | Homework 7 assigned (Mon) |
12 | | Final exam Thursday, December 12 at 8 AM |
The grade breakdown is:
Cutoffs for letter grades follow the standard UC Davis grading scheme:
These cutoffs may be adjusted at the conclusion of the quarter in the direction that favors the students. For example, the cutoff for an “A” may end up being adjusted to be lower than 90%, but not higher.
Labs: Labs are due the Monday after the lab session, at 9 PM, and will be turned in via Gradescope (accessible through Canvas). Labs will be completed in Quarto or R Markdown format (file extension `qmd` or `Rmd`). Labs will involve writing a combination of code and written prose, and the Quarto/R Markdown format allows for a combination of the two. Labs must be submitted only in PDF format, the result of calling “Knit PDF” from RStudio on your `qmd`/`Rmd` document. Work submitted in any other format will receive a grade of 0, without exception. All code used to produce your results must be shown in your PDF file (e.g., do not use `echo = FALSE` or `include = FALSE` as options anywhere). `qmd`/`Rmd` files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.
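For illustration only, a minimal R Markdown file along the following lines would keep all code visible in the knitted PDF. The title, author, and chunk contents are placeholders rather than part of any actual lab, and `cars` is simply a dataset that ships with R. For a Quarto (`qmd`) file, the header would typically use `format: pdf` in place of `output: pdf_document`.

````markdown
---
title: "Lab 1"
author: "Your Name"
output: pdf_document
---

```{r}
# No echo = FALSE or include = FALSE here: by default, both the code
# and its output appear in the knitted PDF.
summary(cars)
```

Written answers go here as ordinary prose, between the code chunks.
````

Because chunks show their code by default, leaving the chunk options unset is usually the simplest way to satisfy the requirement that all code be shown.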
Students may choose to collaborate with each other on the labs and homework, but must clearly indicate with whom they collaborated.
Homework: Homework will be released on Friday after class, and will be due on Thursday of the following week, at 9 PM. Exception: the last homework assignment (HW 7) will be released on Monday and will be due on Sunday at 9 PM. The same policies that apply to labs will apply to homework. Homework may contain non-coding components, and these can be typed (use any software you are comfortable with), or written and scanned. The submission must be in PDF format.
Midterms and final: There will be two midterms and one final. The midterms will be in class, during the scheduled class times. The final will consist of a coding component and a written component. More details on exams will be announced at a later date.
The lower score of the two midterms will be dropped. There will be no make-up exams. If you must miss an exam due to illness, travel, or some other reason, that will be the exam that is dropped. For the final, if you have another final starting 30 minutes before or after the scheduled time, you may present documentation and request an accommodation to start 15 minutes before or after the scheduled time.
Late Work: All labs and homework will be due at 9 PM Pacific Time, on the relevant due date. No late work will be accepted. The lowest homework score will be dropped.
Grade disputes and adjustments: Students have 24 hours after receiving a grade on any assignment to contest it. Grading is consistent and we will provide detailed rubrics. If you think you deserve a different grade, prepare a strong argument and submit it by email to the TA.
Class attendance is strongly encouraged. Please be on time. If you miss a lecture for any reason, you are responsible for all material covered and any announcements made in your absence. Cell phones, laptops, and other electronic devices must be silenced in class. Laptops are to be used in class for learning purposes related to the lecture only.
Chatting and other disruptive behavior may result in a student being dismissed for the day. Repeated disruptive behavior may result in a discussion with your academic advisor on appropriate further action.
One point will be awarded for participation. To earn the point, you have to contribute to a class discussion once during a lecture (not during the discussion section) anytime during the quarter. Contributions include asking or answering a question, or commenting on a discussion or presentation. Please then respond to the survey (Quizzes tab) on Canvas indicating the lecture you contributed to, and what the question, comment or contribution was.
There are three required textbooks. They are all available online and are free.
There will sometimes be additional (optional) reading.
All students are expected to follow the UCD Code of Academic Conduct. Any student who cheats on an assignment or exam will be referred to the Office of Student Support and Judicial Affairs and will receive an automatic failing grade on the relevant assignment. A second instance of academic dishonesty will result in a failing grade in the course. More information on the nature of dishonest academic behavior or UCD policy can be found on the website of the Office of Student Support and Judicial Affairs.
Collaboration is encouraged, and students are welcome to discuss course material with classmates. All work that is turned in, however, must be your own. If students have collaborated on labs or homework, the names of all students working together must be clearly indicated.
Use of Large Language Models (LLMs): use of LLMs such as ChatGPT is not permitted. Suspected use will be reported to OSSJA.
Please do not distribute any course materials outside of this class. Doing so is an infringement of copyright as per UC policy. Use of sites like Course Hero and Chegg is not permitted.
You can access Piazza by clicking on the “Piazza” link on the sidebar in Canvas, or directly through this link. Students are encouraged to answer each other’s questions, and the TA will moderate by checking in every day. The quickest way to get a question answered is likely on Piazza, since anyone in class can answer. These are the rules for posting on Piazza:
If you have a question that requires more than a short paragraph to answer, labs and office hours are the best options.
Office hours: who to ask?
Email will be used only for questions relating to private matters (accommodations, grading, emergencies, etc.). Questions about class logistics and content should be posted on Piazza, asked in class, or raised during office hours. Canvas messages will not receive a response.
Statistics Tutors at the Academic Assistance and Tutoring Centers provide support for RStudio. More information is available here.
Many students face different challenges during college, and it is healthy to seek support. This is a comprehensive list of resources covering general academics, health and wellness, finances, housing, career/internship, and other topics.
Health and wellness resources are available here. If you have an emergency, call 911 immediately, or go to the nearest emergency room. Mental health staff are available 24 hours a day, 7 days a week by phone at 530-752-2349. (Follow the prompts to reach a counselor.)
UC Davis is committed to educational equity in the academic setting, and in serving a diverse student body. All students who are interested in learning more about the Student Disability Center (SDC) are encouraged to contact them directly at https://sdc.ucdavis.edu, sdc@ucdavis.edu or 530-752-3184. If you are a student who requires academic accommodations, please submit your SDC Letter of Accommodation as soon as possible, ideally within the first two weeks of this course.