A Research-Class Computing Ecosystem for Students

The UC San Diego JupyterHub allows students to perform data- and compute-intensive research for coursework and independent studies.

Chemistry student Xuan Zhang using the UC San Diego Data Science/Machine Learning Platform for her courses. (Image courtesy of UC San Diego.)


“I can’t get a job interview if I haven’t run TensorFlow on a GPU on a real problem.”

One student’s comment, overheard more than three years ago, inspired UC San Diego to implement a high-performance computational system—and now students are reaping the benefits.

The Data Science/Machine Learning Platform began development in 2017, spearheaded by engineering and computer science professor Larry Smarr, who overheard that initial comment. He was involved in a National Science Foundation-funded project called the Pacific Research Platform (PRP), and realized that the innovations he was pursuing there could also be leveraged to create a better computing infrastructure for computationally significant research, such as work involving machine learning or data visualization. This hardware infrastructure, dubbed the JupyterHub, provides UCSD students with access to computing resources that enable them to solve real-world data- and compute-intensive problems.

The JupyterHub platform is currently being used in multiple courses across the UC San Diego campus, including engineering, data science, computer science, and the physical sciences, among many others.

According to UC San Diego, the objective behind developing the Data Science/Machine Learning Platform was to enable students in undergraduate and graduate studies to access research-class central processing unit (CPU) and graphics processing unit (GPU) hardware resources for their coursework, projects and independent studies. The platform aims to make it easier for students to continue work between classroom projects and follow-on research projects.

“Our students are getting access to the same level of computing capacity that normally only a researcher using an advanced system like a supercomputer would get. The students are exploring much more complex data problems because they can,” Smarr told UCSD News.

This commodity hardware approach to high-performance computing enabled UC San Diego to establish a “dynamic and innovative on-premises ecosystem” for data- and compute-intensive coursework. The campus previously relied on commercial cloud services, which meant students lost access to their coursework after finishing their classes. Long-term access to these types of technology and tools provides students with opportunities to work on research projects even beyond the classroom.

“The commercial cloud doesn’t provide an ecosystem that gives students the same platform from course to course, or the same platform they have in their courses as they have in their research,” says UCSD IT Services senior architect Adam Tilghman. “This is especially true in the graduate area where students are starting work in a course context and then they continue that work in their research. It’s that continuity, even starting as a lower division undergraduate, all the way up.”

“It’s essential for the nation that students all across campus learn and work on computing infrastructure that is relevant for their future, whether it’s in industry, academia, or the public sector,” added Albert P. Pisano, dean of the UCSD Jacobs School of Engineering. “These information technology ecosystems being created and deployed on campus are critical for empowering our students to leverage innovations to serve society.”

The Data Science/Machine Learning Platform was developed by an interdisciplinary team formed alongside UC San Diego’s IT Services (ITS) division. By designing their own platform, UC San Diego was able to avoid spending over $1 million in commercial cloud computing costs. The computational building blocks used to develop the system included hardware repurposed by UC San Diego’s ITS such as rack-mounted PCs containing multi-core CPUs and eight GPUs optimized for data-intensive projects including machine learning.

The Data Science/Machine Learning Platform comprises rack-mounted PCs optimized for data-intensive projects. (Image courtesy of UC San Diego.)


UCSD student Xuan Zhang uses the Data Science/Machine Learning Platform in her classes, which often involve data- and visualization-intensive tasks. In recent research for her Ph.D. dissertation, she discovered that higher-order genetic structures (R-loops) could be successfully regulated by short tandem repeats (STRs), thanks to UC San Diego’s new computing infrastructure.

Even after finishing their classes, students still have the opportunity to access and build on their research by obtaining their own independent research profile on the Data Science/Machine Learning Platform.

The platform’s Jupyter notebooks are a software environment that allows students to write code and visualize data. They allow both students and professors to streamline and scale their workload.

“With these Jupyter Notebooks, you can automatically embed the grading system. It saves a lot of work,” shared UCSD professor Melissa Gymrek from the Department of Computer Science and Engineering and the Department of Medicine’s Division of Genetics. “It was hard to go past a dozen students. Now, you can scale.”

Before the Data Science/Machine Learning Platform was in place, students typically had to send PDFs of their problem sets, making grading more time-intensive on both sides. Gymrek says that she expanded access to her personal genomics graduate class from a dozen students to more than 50 thanks to the platform.
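To give a rough sense of what grading embedded in a notebook can look like, here is a minimal sketch in Python. It is not the platform's actual grading system, which the article does not describe. The exercise (`gc_content`), the `grade` helper, and the point values are all hypothetical, chosen only to illustrate the general idea of running automated checks against a student's code.

```python
# Hypothetical problem-set cell: the student fills in this function.
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence if base in "GC")
    return gc / len(sequence)


# Hypothetical grading cell: each check is (description, points, test function).
def grade(student_function):
    checks = [
        ("all G/C", 1, lambda f: f("GGCC") == 1.0),
        ("half G/C", 1, lambda f: f("ATGC") == 0.5),
        ("lowercase input", 1, lambda f: f("atgc") == 0.5),
        ("empty sequence", 1, lambda f: f("") == 0.0),
    ]
    earned = 0
    for description, points, check in checks:
        try:
            passed = check(student_function)
        except Exception:
            passed = False  # a crash counts as a failed check
        if passed:
            earned += points
    total = sum(points for _, points, _ in checks)
    return earned, total


score, total = grade(gc_content)
print(f"Score: {score}/{total}")
```

Because the checks run automatically, the same grading cell could in principle be applied to 50 submissions as easily as to a dozen, which is the kind of scaling Gymrek describes.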

“The platform is truly transforming education. Unlike many learning technology innovations, classes in every division at UC San Diego have used the Data Science/Machine Learning Platform. Many thousands of students use it every year. It’s innovation with real impact, preparing our students in many—sometimes unexpected—fields to be leaders and innovators when they graduate,” says UCSD Academic Technology Senior Director Valerie Polichar.