Years ago I collected data from NHL games using the NHL’s stats API, most interestingly the x and y coordinates of plays such as shots, hits and faceoffs. I made this data available on Kaggle.
I have been updating this data intermittently and have finally got around to adding the past season’s data (a mere month before the new season is due to start…). I have also published the code I used to generate the data so that others can create their own updates when I fall behind.
I’ve enjoyed working with this data set and seeing what others have done with it, but the ultimate missing piece is player tracking data.
It’s been a long time coming, but from last season’s playoffs we started to see it used in competitive games. It is still in its infancy: the broadcasts I saw only really used it to highlight, in real time, who is on the ice and how long their shift has been. Last year was an odd one all round, so we can forgive little being done with it, but the prospect of having player tracking data recorded is an exciting one and I am keen to get my hands on it.
If we have this data, we can start to analyse positions. We can reconstruct the scene and begin to mark up passing and shooting lanes. Wrap this into a 2D tensor and we can start to apply neural networks to it. Add dimensions to hold the players’ x and y velocities for greater power.
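To make the tensor idea concrete, here is a minimal sketch of how one frame of tracking data might be rasterised into a grid. Everything about the input is an assumption on my part: no tracking feed is available yet, so the `(x, y, vx, vy, is_home)` tuple format, the 5 ft cell size and the channel layout are all hypothetical. I have also assumed the same coordinate convention as the play data, with x running from -100 to 100 and y from -42.5 to 42.5 on a 200 ft by 85 ft rink.

```python
import numpy as np

RINK_LENGTH_FT = 200.0  # assumption: x runs from -100 to 100
RINK_WIDTH_FT = 85.0    # assumption: y runs from -42.5 to 42.5


def rasterize_frame(players, cell_ft=5.0):
    """Turn one tracking frame into a (channels, rows, cols) array.

    players: iterable of (x, y, vx, vy, is_home) tuples (hypothetical format).
    Channels: 0 = home occupancy, 1 = away occupancy,
              2 = summed x velocity, 3 = summed y velocity.
    """
    cols = int(RINK_LENGTH_FT / cell_ft)
    rows = int(RINK_WIDTH_FT / cell_ft)
    grid = np.zeros((4, rows, cols), dtype=np.float32)

    for x, y, vx, vy, is_home in players:
        # Shift the origin from centre ice to a rink corner, then bucket into cells.
        col = int((x + RINK_LENGTH_FT / 2) / cell_ft)
        row = int((y + RINK_WIDTH_FT / 2) / cell_ft)
        # Clamp so that a player exactly on the boards stays inside the grid.
        col = min(max(col, 0), cols - 1)
        row = min(max(row, 0), rows - 1)

        grid[0 if is_home else 1, row, col] += 1.0
        grid[2, row, col] += vx
        grid[3, row, col] += vy

    return grid


# Two made-up players: one at centre ice, one in a corner.
frame = [(0.0, 0.0, 1.0, -2.0, True), (100.0, 42.5, 0.0, 0.0, False)]
print(rasterize_frame(frame).shape)
```

Each frame becomes a small multi-channel image, which is the shape a convolutional network expects. Splitting home and away into separate occupancy channels is only one option; a finer grid or one channel per player would be just as reasonable to try.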
I have been hesitant to use neural networks on sporting data as they can hide the understanding of why an outcome has been predicted, but the potential volume of this data opens up so many opportunities.
The plans I’ve seen target real-time tracking for broadcasters, but I hope the data is stored and made available for analysts. Tracking data can quickly get large, but if it were reduced to only the 12 players on the ice at a time, only for game time, and to 10 observations per player per second, it should be possible to get this down to ~5MB per game.
Still, this would be much larger than the game data set I have been accumulating, but I am keen to take up the challenge.
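As a rough check on that figure, here is the back-of-envelope arithmetic. The 12 players and 10 observations per second come from the estimate above; the 60 minutes of regulation play and the 12 bytes per observation (x, y and a timestamp stored as 4-byte values) are my own assumptions, since nothing is known about the real storage format.

```python
PLAYERS_ON_ICE = 12         # both teams, goalies included
SAMPLES_PER_SECOND = 10     # observations per player per second
GAME_SECONDS = 60 * 60      # assumption: regulation time only, no overtime
BYTES_PER_OBSERVATION = 12  # assumption: x, y and a timestamp at 4 bytes each


def game_size_bytes(players=PLAYERS_ON_ICE,
                    rate=SAMPLES_PER_SECOND,
                    seconds=GAME_SECONDS,
                    bytes_per_obs=BYTES_PER_OBSERVATION):
    """Uncompressed size of one game's tracking data under the assumptions above."""
    observations = players * rate * seconds
    return observations * bytes_per_obs


size = game_size_bytes()
print(f"{size:,} bytes, about {size / 1e6:.1f} MB")
```

That comes to 432,000 observations and a little over 5MB uncompressed. Storing velocities as well would roughly double it, while compression or smaller integer coordinates would pull it back down.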