Data for Impact Summer Institute



The Data for Impact Summer Institute is an eight-week project-centric program for highly motivated students focused on Data Science, Research, Analytics, Visualization, and Execution. The Institute begins with a one-week workshop series, taught by Lehigh faculty, that will get students up to speed on the core knowledge of data statistics, applications, analysis and visualization. For the remaining seven weeks, student teams will work on a wide array of interdisciplinary data-centric projects with meaningful outcomes. Projects may address compelling social, economic, population health, or community-related topics with aspirations for contributing to real sustainable impact. These projects will be faculty-guided and inquiry-driven, and prepare students to engage with faculty research while empowering them to pursue their own ideas and data-related career pathways. Project teams have the opportunity to advance their work into the academic year through the multi-year Creative Inquiry project framework.

The Data for Impact Summer Institute is offered by the Office of Creative Inquiry, the Martindale Center for the Study of Private Enterprise, and the Institute for Data, Intelligent Systems, and Computation (I-DISC). The D4I Summer Institute specifically welcomes applications from rising sophomores and juniors but is open to all Lehigh students. We have 12+ projects that will engage ~40 students chosen through a competitive application process. This immersive Summer Institute runs from June 15 to Aug 7 and students are expected to commit full-time to both the workshops and their projects. Students with high financial need can apply for wage replacement funds. This funding pool is limited and students with the highest need will be prioritized.

Student applications will open on May 19 and be due on May 26. The application will request that students indicate their top two project choices. After a quick interview process, we anticipate informing selected students by June 1. Students who successfully complete the Institute will be named “Data for Impact Fellows” and this will add to their portfolios of accomplishment.   

Students: What’s in it for me?

  • Develop / strengthen skill sets in data science, analysis and visualization: a growing field for all disciplines that opens new career pathways;

  • Experience working with interdisciplinary teams and driving research forward collaboratively with student, faculty, and external partners;

  • Work collaboratively with other self-selected, motivated students from across the university on ambitious data-centric projects;

  • Apply principles of data science to high-impact, real-world projects and put knowledge into practice;

  • Learn fast, learn big, learn to fail and iterate, and learn on the go;

  • Prepare your resume to join faculty research endeavors, pursue your own ideas for new ventures, and qualify for new internship and career opportunities.

Program Structure

  • The program begins on June 15th and ends on August 7th, lasting eight weeks.

  • June 15-19 is a one-week workshop series, followed by immersive project work (Mandatory). Additional pre-program workshops may be offered for students with no or very limited experience working with data.

  • Weekly Check-Ins from Teams (Mandatory)

  • Weekly Seminar by Industry/Academic Experts (Mandatory)

  • Pop-up classes by Lehigh folks and external subject matter experts will be offered every week. Faculty can make specific workshops mandatory for their own teams.

  • Presenting at the Virtual Summer Expo (Mandatory)

Information for Students

How do students apply?

Application Link: . The application takes about 15 minutes to complete.

When are student applications due?

Students must submit their completed applications by 11:59 EDT on Saturday, May 23rd to be guaranteed full consideration for the program.

Is there a cost to the D4I Summer Institute?

No! The Institute is tuition-free, and students with demonstrated financial need can apply for wage-replacement stipends of up to $400 per week for the eight weeks of the Institute. More information on stipends is available during the application and interview process.

Why should I become a D4I Fellow?

See above for program benefits - mostly, though, you want to do this because you are a motivated, intelligent, outcomes-focused individual who wants to continue to develop your skill sets and mindsets along your career pathway, and because COVID-19 has made summer internships and other opportunities difficult to come by.

Information for Faculty

I have a project idea. What next?

Given the short time frame, we are expecting initial information including a project title, lead faculty mentor, other faculty involved, and a short summary by May 18 if possible. After initial conversations and confirmation, and in order to facilitate the selection of students, we will request (before the program commences on June 15) a brief (1-2 page) project concept that responds to the following prompts (bullet points are completely fine):

1. Demographics
      a. Project Title
      b. Lead Faculty Member
      c. Partnering Faculty
      d. Ideal Student Profile
      e. Specific Resources Necessary

2. Dream and Impact
      a. What is the ultimate, or hoped-for, dream of the project? How might you pursue this dream?
      b. What is the topic/question/possibility/mode of inquiry you will employ?
      c. What is the project’s potential for impact? What might your impact look like? What disciplines, fields, or spheres will your work influence?

3. Project Scope
      a. What would make this project a potential game-changer? 
      b. What is the new intellectual/creative pathway you are taking?
      c. How is this project collaborative? What communities of practice are you contributing to, and calibrating against?

What are the criteria for D4I projects?

Projects should be data intensive and involve creative inquiry, defined as pursuing a novel intellectual or creative pathway. Projects should meet the 4 C's: Creative Inquiry; Convergence (of disciplines, methodologies, epistemologies, etc.); Commitment to Impact; and Community (i.e. interdisciplinary). Continuity (the 5th C) is optional, but can come with additional resources and support.

What's in it for faculty project mentors?

Faculty project mentors can:

  • Advance a research project that aligns with their research and impact agenda;

  • Work with a highly self-selected student cohort on the project in the summer and, more importantly, through the academic year;

  • Have access to project funding through the Creative Inquiry / Mountaintop infrastructure through the academic year.

What are the ways in which interested faculty can participate without mentoring a project?

We are looking for faculty who are interested in one of the following roles:

  1. Help teach the bootcamp portion of the D4I Summer Institute;
  2. Contribute content that would benefit project teams in the Institute as they advance their work;
  3. Offer one-off guest workshops and seminars, or help connect us with industry folks whom we should invite to deliver workshops.

Data for Impact Summer Institute 2020

Project List and Descriptions

1. Are Public Family Firms Aiming for Exit or for the Long Term?

Faculty mentor: Jesus Salas, Associate Professor, Finance

- Under the direction of Associate Professor of Finance Jesus Salas, the Lehigh Martindale Center’s Family Business Institute is amassing the largest database available in the world on publicly traded family firms. The long-term goal is to understand issues related to the operations, management, governance, control, and ownership of family firms and how and why their performance differs relative to other businesses. The database will serve years (possibly decades) of faculty and student research projects. The Summer 2020 Data for Impact team will collect and analyze select data on publicly-traded family firms, thereby contributing to the larger database. Depending on interests, students may also tackle independent questions related to how and why the performance of family firms differs from non-family firms and/or contribute to research publications at the frontiers of the field. In particular, the team may explore an emerging puzzle about the differences in financial performance, governance, and control between family firms that appear to be driven by entrepreneurial “exit” strategies targeting sale or acquisition vs those aiming for long-term continued control by the founders’ families. Understanding these differences, and identifying potential indicators for outside investors of these different strategies, could have important implications for investors, management, and boards of directors of such firms.

2. Data-to-control: Towards data-driven model predictive control for chemical process automation
Faculty mentors: Mayuresh Kothare, Professor and Chair, Chemical & Biomolecular Engineering; Srinivas Rangarajan, Assistant Professor, Chemical & Biomolecular Engineering.
- Most chemical and biological processes are dynamical systems. This means that their state variables (i.e. variables that characterize what state the system is in) are continuously changing, often underpinned by highly nonlinear correlated behavior that may not be easily captured by physics-based models. Modern plants in the energy and chemical industry have advanced data acquisition technologies, enabled in many cases by solutions offered by OSISoft LLC, the industrial partners on this project. These technologies allow for collecting, storing, and analyzing data from thousands of sensors every second (or faster). Our ultimate goal is to leverage this data to design, optimize, and control new energy and chemical systems. We will begin addressing this larger goal by developing algorithms that will allow us to extract the underlying ordinary differential equations from time-varying data. These algorithms will then allow us to take time-varying plant data and build data-driven dynamic equations that accurately capture the overall process. We specifically intend to build on the state-of-the-art algorithms from the applied mathematics community on inferring equations from data that have been successfully applied in the fluid mechanics domain by incorporating a number of new features including the concept of infusing chemical engineering domain knowledge as constraints while training the data-driven model.
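As a rough illustration of what "extracting an ordinary differential equation from time-varying data" can look like, here is a minimal sketch. The description does not name a specific algorithm, so this assumes one common family of methods: sparse regression over a library of candidate terms, solved by repeatedly thresholding a least-squares fit. The function name `fit_sparse_ode`, the candidate library [1, x, x^2], and the threshold value are all hypothetical choices for illustration, not the project's actual method.

```python
import numpy as np

def fit_sparse_ode(t, x, threshold=0.1, n_iter=10):
    """Fit dx/dt ~ c0*1 + c1*x + c2*x^2 and zero out small coefficients.

    Hypothetical helper: the candidate library and threshold are
    illustrative assumptions, not taken from the project description.
    """
    # Estimate the time derivative from the sampled data by finite differences.
    dxdt = np.gradient(x, t)
    # Library of candidate right-hand-side terms, one per column.
    theta = np.column_stack([np.ones_like(x), x, x**2])
    # Initial dense least-squares fit.
    coeffs = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        # Drop terms whose coefficients are below the threshold ...
        small = np.abs(coeffs) < threshold
        coeffs[small] = 0.0
        big = ~small
        if not big.any():
            break
        # ... and refit using only the remaining terms.
        coeffs[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return coeffs

# Synthetic "sensor" data from a first-order decay, dx/dt = -0.5 * x.
t = np.linspace(0.0, 5.0, 501)
x = 2.0 * np.exp(-0.5 * t)
print(fit_sparse_ode(t, x))  # coefficients for the terms [1, x, x^2]
```

On clean synthetic data like this, only the coefficient on x should survive, with a value close to -0.5. Real plant data is noisy, so derivative estimation and threshold selection would need considerably more care, and domain knowledge could enter as constraints on which library terms are allowed.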

3. Digital Humanities, Data Science, and Design
Faculty mentor: Edward Whitley, Professor of English
- The journal PMLA recently expressed “hope for an interdisciplinary encounter where digital humanities meets data science.” These two research fields have remained relatively separate, despite having similar goals, and now scholars at Lehigh are poised to realize this encounter. The field of Digital Humanities sees scholars in disciplines such as history and literature using technology to archive, analyze, and visualize the human cultural record. In Data Science, computer scientists and statisticians use natural language processing, artificial intelligence, and machine learning to understand textual and numerical data. Over the past year, Digital Humanists at Lehigh have worked with scholars from Lehigh’s Data X Initiative to identify design as a productive meeting ground for interdisciplinary collaboration. As both theory and practice, design captures a range of activities relevant to both groups—constructing data sets, creating user interfaces, and visualizing information—while design thinking has emerged as a multi-disciplinary process for creative problem solving. The Data for Impact student team will (1) interview faculty and staff who work in digital humanities, data science, and design to assess existing opportunities and expertise at Lehigh; (2) inventory programs in data science and digital humanities at other universities and research centers; and (3) use digital tools like Tableau to capture and display the information gathered in steps (1) and (2).

4. Lehigh Valley Arts & Culture in Crisis: Cultural Asset Mapping & COVID-19 Impact Assessment
Faculty mentor: Todd Watkins, Professor of Economics & Executive Director, Martindale Center for the Study of Private Enterprise
- Arts & cultural organizations and artists everywhere are suffering from a dramatic loss of performance opportunities, ticket revenue, grants, and other significant sources of income. A healthy creative sector contributes strongly to a region’s long term economic growth and quality of life. The Lehigh Valley may be particularly at risk with its high concentration of arts and cultural organizations and its proximity to New York City. An emergency survey of artists and non-profit arts and cultural organizations conducted in April 2020 by the Cultural Coalition of Allentown revealed a dire situation in the region’s arts and culture community. Individual artists have lost most of their income and nearly all non-profit arts & culture organizations across the Valley have had to cancel critical events. Preliminary survey results estimate an attendance decline of 2.5 million to 3.5 million which threatens the very existence of many organizations. A second more detailed impact survey is under development for distribution in mid-May. The Data for Impact student team’s goals will be (1) analysis of data from those two surveys; (2) development of an asset map of the region’s arts and cultural resources and of the impact on those assets of the COVID-19 crisis; (3) development of high-impact visualizations of (1) and (2) in support of a major advocacy campaign among the region’s political leaders and philanthropic and donor communities to support artists and non-profit arts and cultural organizations in crisis.

5. From Molecules to Medicine:  Overcoming the Time Scale Challenge
Faculty mentor: Ed Webb, Associate Professor, Mechanical Engineering & Mechanics
- Biological processes underpinning human wellness occur over seconds, hours, days, and longer, yet the governing molecular mechanisms occur on time scales from picoseconds to microseconds.  Molecular mechanistic understanding of complex biological systems can dramatically impact disease diagnosis and treatment. Even the longest simulations that resolve matter at the atomic scale can only examine atomistic behavior over micro- or perhaps milliseconds. Advanced data processing techniques have emerged that hold promise for the ability to bridge information obtained from molecular-scale descriptions of matter to address questions that manifest at human physiological time scales. In this project, students will develop an understanding of structure/function relationships in biology and the intrinsic multi-time scale nature of addressing human wellness from a molecular point of view.  The team will learn specific chemical-physical structure-function coupling mechanisms in the human blood protein von Willebrand Factor (vWF), which is potentially implicated in bleeding disorders affecting ~2% of the human population.  Team members will use molecular scale computational simulations, in conjunction with advanced data processing techniques, to understand how data methods are being used to bridge molecular scale mechanistic information and impact treatment of conditions at the human physiological scale.

6. Computer-Aided Drug Discovery – Practical Training and Research
Faculty lead mentor: Wonpil Im, Professor of Bioengineering and Presidential Endowed Chair in Health, Science, and Engineering
- Despite advances in biotechnology, the number of new drugs approved per billion USD spent on drug research and development (R&D) has halved roughly every 9 years, indicating declining R&D efficiency. Therefore, the ability to conduct efficient computational drug discovery has emerged as a vital component to improve both the efficiency and economics of drug discovery. Drug compounds bind to proteins, regulating their functions to acquire beneficial effects to treat diseases. Therefore, better understanding of protein-ligand interactions at the molecular level, and accurate quantification or prediction of their binding affinity, are at the core of computer-aided drug discovery. This project aims to study protein-ligand interactions computationally, using three families of impactful therapeutic targets for cancers and AIDS: estrogen receptor; HIV-1 protease; and three types of kinases (Ser, Thr, Tyr). Out of a large number of data sets, we will choose a few test cases and compare calculated binding free energy results with the corresponding experimental data. In particular, we plan to provide practical hands-on research experiences in computer-aided drug discovery. The lectures and tools in CHARMM-GUI will be used for student learning and research.

7. Spam Spotting – Using AI Tools to Educate and Improve Online Decision-Making
Faculty Mentors: Sihong Xie, Assistant Professor, Computer Science & Engineering; Qiong Fu, College of Education
- On websites like Amazon and TripAdvisor, fake reviews (“spams”) are prevalent. Stories about spams and their victims have been reported widely; these spams overturn product and service reputations and adversely affect users’ decision making. To protect the general public, AI-based spam detectors have been employed to actively flag the spams. Also, more sophisticated users may use their judgment to spot spams. However, AI detectors are not always accurate and transparent, and will not be much trusted and adopted by the general public for fighting spams (“algorithm aversion”). Further, without training, even sophisticated users have difficulty in distinguishing spams from genuine reviews. This project will work towards an education-based defense against spams, where the general public will be educated to acquire the skills to spot spams, and to trust and properly rely on AI detectors to improve their protection. Our summer scope of work will be to 1) develop surveys and questionnaires to understand the scope of the challenges; 2) code a role-playing game where a spammer can craft spams for the spotters to catch, both for fun and for research; 3) code a simple tutoring tool to teach humans to use AI spam detectors.

8. Thinking Outside the (Lunch) Box – Establishing a Food Carbon Footprint
Faculty Mentors: Don Morris, Environmental Initiative; Katharine Targett Gross, Sustainability Director
- Have you ever thought about the carbon footprint of the foods eaten on Lehigh’s campus?  In this project, students will create a food carbon footprint calculator that suits Lehigh Dining’s needs to determine which menu items are the most and least carbon intensive. This end goal will allow Lehigh Dining to provide a carbon footprint (red - high, yellow - medium, green - low) for key menu items at dining locations across campus.  This will encourage students, faculty, and staff to adjust their food choices based on the impact of the menu item.  The students will collect and analyze data from Lehigh Dining, review existing carbon footprint calculators, and examine current practices in the restaurant industry and at colleges/universities.  Students will be working with different kinds of data including recipe ingredients, ingredient quantities, distance of ingredient sourcing, food carbon intensities, etc.
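To make the calculator idea concrete, here is a minimal sketch that combines the kinds of data listed above (ingredients, quantities, sourcing distance, and carbon intensities) into a per-item footprint and a red/yellow/green label. All numbers are placeholder assumptions for illustration: the emission factors, the transport factor, and the label thresholds are hypothetical, not Lehigh Dining data, and the names `Ingredient`, `item_footprint`, and `label` are invented here.

```python
from dataclasses import dataclass

# Hypothetical carbon intensities in kg CO2e per kg of ingredient.
# Placeholder values for illustration only, not measured data.
EMISSION_FACTORS = {
    "beef": 27.0,
    "cheese": 13.5,
    "rice": 2.7,
    "lentils": 0.9,
}

# Hypothetical transport factor: kg CO2e per kg of food per km travelled.
TRANSPORT_FACTOR = 0.0001


@dataclass
class Ingredient:
    name: str
    quantity_kg: float
    sourcing_distance_km: float


def item_footprint(ingredients):
    """Return the estimated kg CO2e for one serving of a menu item."""
    total = 0.0
    for ing in ingredients:
        production = EMISSION_FACTORS[ing.name] * ing.quantity_kg
        transport = TRANSPORT_FACTOR * ing.quantity_kg * ing.sourcing_distance_km
        total += production + transport
    return total


def label(footprint_kg, low=1.0, high=3.0):
    """Map a footprint to a traffic-light label (thresholds are assumptions)."""
    if footprint_kg < low:
        return "green"
    if footprint_kg < high:
        return "yellow"
    return "red"


burger = [Ingredient("beef", 0.15, 500), Ingredient("cheese", 0.03, 200)]
lentil_bowl = [Ingredient("lentils", 0.10, 100), Ingredient("rice", 0.10, 100)]

for name, recipe in [("burger", burger), ("lentil bowl", lentil_bowl)]:
    kg = item_footprint(recipe)
    print(f"{name}: {kg:.2f} kg CO2e -> {label(kg)}")
```

A real calculator would replace the placeholder factors with sourced carbon intensities and would likely set the label thresholds relative to the distribution of footprints across the actual menu.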

9. Real-Time Machine Learning in Experimental Materials Science
Faculty Mentor: Joshua Agar, Assistant Professor, Materials Science & Engineering
- In materials science and physics more broadly, there is a growing trend to conduct multimodal experiments (experiments that collect data from a variety of sources). The boom in data collection has left a majority of the data collected under-analyzed, leaving important physics undiscovered. This project will develop machine and deep learning methods to discover actionable information from such data. This project will also consider how such models can be implemented on specialty AI hardware for real-time analysis. The work will focus on materials problems as they provide unique ways to stress-test practical theories of machine and deep learning. Outcomes of this work have direct impacts on creating interpretable AI, controlling fairness and bias, and creating autonomous control systems. The impacts of these theories can be adapted to solve problems in medicine and healthcare, resource management and logistics, and manufacturing and processing.

10. A Dose of AI for Disease Prevention and Treatment
Faculty Mentor: Lifang He, Computer Science & Engineering
- AI combined with predictive analysis has helped change the landscape of disease prevention and treatment – bringing a paradigm shift to healthcare. With improved image analytics and more concrete clinical and diagnostic decision-making, AI has been highly beneficial for treating chronic conditions in areas such as cancer, neurology, and cardiology. In this project, students will develop predictive AI algorithms for early detection and biomarker discovery on various diseases over different kinds of data, such as mood disorder prediction from mobile keyboard data, and ROP and Parkinson's disease prediction with image and clinical data. Students will work on focused problems to develop prototypes, run experiments, and scale up what works. This project would invigorate fields like computer science and healthcare and trigger the development of new models and algorithms. It could also help solve some medical problems for the benefit of humanity as well as harness the power of AI in healthcare. Through this process, students will begin to form lasting AI capability in healthcare.

11. Neurogenetics of Anxiety Disorders
Faculty Mentor: Julie Miwa, Biological Sciences
- This project is working toward identifying genetic linkages of anxiety and cognitive disorders in the population, and consists of a series of bioinformatics subprojects aimed at exploring gene mutations which could underlie neuropsychiatric disorders. Students on this team will engage in analyses of DNA databases, data analysis of complex traits and genes, DNA sequencing, and exploring psychological tests of cognitive traits. In the near term, the project can help to identify a subpopulation of people with a predisposition to anxiety disorders; the ultimate goal is to help develop a personalized medicine approach for intractable anxiety. Providing a biological basis for anxiety disorders helps to destigmatize the disorder and direct individuals to the appropriate treatments. The identification of a world-wide genetic risk connected to anxiety disorders could lead to a new treatment beyond the current treatments, which are short-term and do not address the root cause. If successful, this strategy could restore the synaptic imbalances (e.g. plasticity) underlying the over-activation of anxiety-based brain structures over a long term.

12. The “Falling Knife” Project
Faculty Mentor: Patrick Zoro, Finance
- The goal of this project is to build software with a customer interface that detects whether the stock a user has selected is a “falling knife.” If the stock is a falling knife, how far will it fall, and when will it return to its previous price level? Students will learn what a falling knife is, how to find falling knives, and how to train machine learning models to recognize their patterns. Lastly, students will build the base model on their findings and backtest its accuracy. Currently the project uses technical analysis to identify short-term (days, weeks) falling knives with MACD, RSI, EMA, and the fall duration. The team will cross-check the S&P 500 for the past 30 years to see if the findings hold, and use fundamental analysis and financial metrics to identify what counts as a long-term (months, years) falling knife. To see more about the project, see a 30-minute video from Masters in Financial Engineering students, here.
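For readers unfamiliar with the indicators named above, here is a minimal sketch of how EMA, MACD, and RSI can be computed from a list of closing prices and combined into a simple flag. The indicator formulas are standard, with RSI simplified to plain averages rather than Wilder's smoothing. The `is_falling_knife` rule itself (a 20% drop from the recent peak, MACD below its signal line, RSI under 30) is a hypothetical example and not the project's actual model.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line) for a price series."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    return macd_line, ema(macd_line, signal)


def rsi(prices, period=14):
    """Relative Strength Index over the last `period` price changes.

    Simplification: uses plain averages of gains and losses.
    """
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    window = prices[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0) / period
    losses = sum(-c for c in changes if c < 0) / period
    if gains == 0 and losses == 0:
        return 50.0  # flat prices: neither overbought nor oversold
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)


def is_falling_knife(prices, drop_threshold=0.2, lookback=20):
    """Hypothetical rule of thumb, for illustration only."""
    peak = max(prices[-lookback:])
    drop = (peak - prices[-1]) / peak
    macd_line, signal_line = macd(prices)
    return (
        drop >= drop_threshold
        and macd_line[-1] < signal_line[-1]
        and rsi(prices) < 30
    )


# Synthetic series: 30 flat days at 100, then 15 days falling by 2 per day.
falling = [100.0] * 30 + [100.0 - 2 * i for i in range(1, 16)]
rising = [100.0 + i for i in range(50)]
print(is_falling_knife(falling), is_falling_knife(rising))
```

Backtesting a rule like this against historical index data, as the description proposes, is what would show whether any particular thresholds are useful.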

13. Visualizing Change in Society
Faculty Mentors: Andrew Ward, Professor, Management; Joshua Ehrig, Professor of Practice, Management
- The pace of change continues to accelerate, and there are various societal-level forces, which we call Societal Shifts, which will drive fundamental change in society – how we live, work, and interact with one another – and will also impact the great divides within society: inequalities in wealth, health, and access to technology, as well as facilitate or hinder progress towards achieving the UN Sustainable Development Goals. We are currently focusing on eight of these Societal Shifts: Changing Demographics, Climate Change, Rapid Urbanization, Energy Generation & Storage, Social Commerce, Big Data, Artificial Intelligence, and Blockchain.  This project will create a visual “dashboard” that tells us the speed and direction that these Societal Shifts are moving, by identifying key indicators that underlie the progress of these shifts and where this data is sourced.  Data will then be fed into a visual dashboard to give current data, and trend data, in an easy-to-understand visual representation of the shifts.  


14. Voter Participation and Election Outcomes in Ireland, 1991-present
Faculty Mentor: Vincent Munley, Professor, Economics
- The focus of this project is an analysis of voter participation and electoral outcomes in the Republic of Ireland, focusing on the electoral results of three aspects: 1) referenda to amend the Irish constitution; 2) results of general elections on parliamentary membership; and 3) the success of incumbent candidates in general elections. Data sets have already been created at multiple levels of electoral divisions, but this team will be compiling additional available data on general elections during the past two decades. Ultimately, the team will be visualizing and analyzing this data to track population changes, election conditions, and changes in local-level election constituencies to create an overall picture of the recent Irish electoral process.  

15. The Mindanao Food Highway
A Collaborative Project with the Civika Asian Development Academy and the Development Academy of the Philippines 
Project mentors: Prof. Ganesh Balasubramanian, Dept. of Mechanical Engineering and Mechanics;
Jo-Ann Emilene A. de Belen - Supervising Fellow (Managing Director, DAP sa Mindanao);
Dr. Elmer S. Soriano - Senior Fellow (Managing Director, Civika Asian Development Academy - Zero Hunger Lab);
Dr. Dominic Vincent D. Ligot - Senior Fellow and Data Analyst (Founder and Chief Technology Officer, CirroLytix);
Fatima D. dela Cruz - Project Manager (Project Staff, DAP sa Mindanao)
- This project will work closely with a team from the Civika Asian Development Academy and the Development Academy of the Philippines, both Philippines-based organizations that through projects develop public and private sector leaders throughout Southeast Asia. The direct focus of this project is the creation of a “Food Highway,” using data-driven and data-centric approaches to alleviate ongoing disruptions to food and medicine supply chains throughout the Philippines during the COVID-19 pandemic. The Mindanao Food Highway, named for the second-largest island in the Philippines archipelago, aims to explore innovative, data-driven, and technology-based solutions to food security challenges during COVID-19 disruptions. Achieving this objective will result in reducing transitory food insecurity and increasing community resilience during this pandemic.