An electrical conductor is a material that allows electric current to flow. How well a material conducts is quantified by its resistance: for a given applied voltage, higher resistance means less current flows. For most conductors, the resistance decreases as the temperature of the material decreases, but it remains positive. Superconductors, however, are materials with a critical temperature below which they show no direct-current electrical resistance at all. These critical temperatures are often very low.
Superconductors have many uses, including in particle colliders and MRI machines. For practical reasons, scientists often want to predict the critical temperature of a superconducting material, but there is little scientific theory to guide such predictions. Hamidieh (2018) built a statistical model that predicts the critical temperature of superconducting materials based only on their chemical formulas.
A repository has already been created for your team and will be available in the course GitHub organization. The dataset for this assignment is a CSV file in the data folder of your repository. It contains data on five thousand superconductors collected from Japan’s National Institute for Materials Science.
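If you work in Python, loading the data might look like the minimal sketch below. It assumes pandas is installed and that the response column is named critical_temp, as stated later in this assignment; the function name load_superconductors and the file name in the commented usage line are hypothetical, so check your repository's data folder for the actual file name.

```python
import pandas as pd


def load_superconductors(path):
    """Read the lab CSV and split it into predictors X and response y.

    Assumes the response column is 'critical_temp' and that every other
    column is a predictor (drop any non-numeric ID columns if present).
    """
    df = pd.read_csv(path)
    y = df["critical_temp"]
    X = df.drop(columns=["critical_temp"])
    return X, y


# Hypothetical file name; replace with the CSV in your repository's data folder.
# X, y = load_superconductors("data/superconductors.csv")
# print(X.shape)  # check the number of rows and predictor columns
```

Checking X.shape after loading is a quick way to confirm the number of predictors matches what the assignment describes.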
There are 81 predictors in total. They are based on physical properties of the elements that make up the chemical formula of each superconducting material. In addition to the number of unique elements in the material, the dataset records several of these elemental properties (for example, first ionization energy, abbreviated fie in the dataset). For each of these physical properties, variables were created based on the following functions:
Your overall goal is to create a linear model that accurately predicts the critical temperature (critical_temp) based on the other 81 variables in the dataset. Use a variable selection approach of your choice. You may consider any transformations or interaction terms you would like.
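As one possible starting point, here is a minimal sketch of greedy forward selection that adds, one at a time, the predictor that lowers BIC the most, fitting ordinary least squares with NumPy. Forward selection with BIC is just one choice among many (backward elimination, AIC, cross-validation, or the lasso are all reasonable alternatives), and the names forward_select_bic and rss_of are hypothetical helpers, not part of any library.

```python
import numpy as np


def rss_of(design, y):
    """Residual sum of squares of an OLS fit of y on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_select_bic(X, y, names, max_terms=None):
    """Greedy forward selection that adds the column lowering BIC the most.

    X: (n, p) NumPy array of predictors WITHOUT an intercept column.
    names: list of p column names, in the same order as the columns of X.
    Returns the selected column names in the order they were added.
    """
    n, p = X.shape
    max_terms = p if max_terms is None else max_terms
    ones = np.ones((n, 1))
    selected, remaining = [], list(range(p))

    def bic(cols):
        # An intercept is always included; k counts it as a parameter.
        design = np.hstack([ones, X[:, cols]]) if cols else ones
        k = design.shape[1]
        return n * np.log(rss_of(design, y) / n) + k * np.log(n)

    best_bic = bic(selected)
    while remaining and len(selected) < max_terms:
        cand_bic, cand = min((bic(selected + [j]), j) for j in remaining)
        if cand_bic >= best_bic:
            break  # no remaining column improves BIC, so stop
        best_bic = cand_bic
        selected.append(cand)
        remaining.remove(cand)
    return [names[j] for j in selected]
```

With a pandas DataFrame, you could call forward_select_bic(X.to_numpy(), y.to_numpy(), list(X.columns)). Transformations or interaction terms can be tried by appending extra columns to X before selection (for example, np.log1p of a nonnegative predictor, or the product of two predictors) and adding matching entries to names.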
For your lab report, clearly describe the following items:
This is very open-ended! You will be evaluated on how comprehensively you describe your process. In addition, bonus points will go to the team whose linear model achieves the lowest RMSE (root mean squared error) on an out-of-sample test set (I’ll release the test set after the due date of the lab, but the observations come from the same “population” as the dataset provided for you in this lab).
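Because the test set is released only after the due date, one way to estimate out-of-sample RMSE beforehand is to hold out part of the provided data. The sketch below computes RMSE and a single random hold-out split for an OLS fit with NumPy; the helper names rmse and holdout_rmse, the 20% hold-out fraction, and the fixed seed are illustrative assumptions rather than requirements of the lab.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def holdout_rmse(X, y, test_frac=0.2, seed=0):
    """Fit OLS (with intercept) on a random training split; return RMSE on the rest.

    X: (n, p) NumPy array of predictors; y: length-n NumPy array.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    test, train = idx[:n_test], idx[n_test:]
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
    return rmse(y[test], design[test] @ beta)


print(round(rmse([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]), 3))  # prints 1.633
```

If you model a transformed response (for example, log of critical_temp), you would likely need to back-transform predictions to the original scale before computing RMSE, assuming the test RMSE is measured on critical_temp itself. Repeating the split with several seeds, or using k-fold cross-validation, gives a less noisy estimate than a single split.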
There should only be one submission per team on Gradescope.