Summer Interning at Byteflies: Data Science

Interested in an internship at Byteflies? Contact us!

Here at Byteflies we are always on the lookout for excellent summer internship candidates. Last summer, we had the pleasure of hosting Tomas Fiers in the Signal Processing & Data Science team. He validated the signal quality of Byteflies IMU sensors against gold-standard recordings obtained in collaboration with Case Western Reserve University (CWRU), and ran some activity classification experiments. Read on for his excellent report!

A first field-test of the Byteflies IMU sensor

Hi, I’m Tomas Fiers. I study biomedical engineering, and I did my summer internship at Byteflies. Just like in Hollywood, we attached 3D motion capture markers to arms and legs. This is the result:


Motion capture markers on limbs during walking.

We also attached Byteflies Sensor Dots to the same arms and legs. Besides sensors like ECG and PPG, they also contain motion sensors. Each Sensor Dot contains three: an accelerometer, a gyroscope, and a magnetometer. These all make 3D measurements, and together they are known as an inertial measurement unit, or IMU. This is what the data from the accelerometers looks like:


Accelerometer data while running. Each step can be clearly distinguished. Note also the left-right alternation in the legs.

We conducted experiments in collaboration with CWRU. The goal was to assess how well the IMU measurements stack up against the motion capture data, the gold standard. Here we compare acceleration data captured by one of the Sensor Dots with acceleration derived from the motion capture data:



Top: comparison of accelerometer measurements with a baseline derived from motion capture (the plot shows the vector magnitude of proper acceleration over time). Bottom: distribution of the absolute difference between the two signals; the median is just 0.06 g.
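A baseline like the one in the top panel can be derived roughly as follows. This is a minimal sketch, not the procedure used in the CWRU experiments: it assumes marker positions in metres with the z axis pointing up, a uniform sampling rate, and signals that are already time-aligned and resampled to a common rate. The function names are hypothetical. An accelerometer measures proper acceleration, that is, its kinematic acceleration minus gravitational acceleration, which is why a sensor at rest reads 1 g. Comparing vector magnitudes has the convenient property that the result does not depend on how the Sensor Dot's axes are rotated relative to the lab frame.

```python
import math
from statistics import median

G = 9.81  # standard gravity, m/s^2


def proper_acceleration_magnitude(positions, fs):
    """Magnitude (in g) of proper acceleration derived from marker positions.

    positions: list of (x, y, z) tuples in metres, z axis pointing up (assumed).
    fs: sampling rate in Hz.
    Uses a central second difference, so the output has len(positions) - 2 samples.
    """
    dt = 1.0 / fs
    mags = []
    for i in range(1, len(positions) - 1):
        prev, cur, nxt = positions[i - 1], positions[i], positions[i + 1]
        # Kinematic acceleration per axis via a central second difference.
        ax, ay, az = ((n - 2 * c + p) / dt**2 for p, c, n in zip(prev, cur, nxt))
        # An accelerometer senses a - g; with g = (0, 0, -G) this adds +G on z.
        az += G
        mags.append(math.sqrt(ax**2 + ay**2 + az**2) / G)
    return mags


def median_abs_difference(a, b):
    """Median absolute difference between two equally sampled signals."""
    return median(abs(x - y) for x, y in zip(a, b))
```

As sanity checks, a stationary marker yields 1 g everywhere, and a marker in free fall yields 0 g.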


Now, what can you do with raw IMU data? Going into the technical details of these measurements is beyond the scope of this post, but suffice it to say that, together, the three sensors can be used to robustly track the orientation of the Sensor Dot. Which direction is it facing? Up or down, east or west, and everything in between. This means that, when multiple Sensor Dots are attached to the body, they can be used to determine posture and the orientation of the limbs (relative to each other and to the torso). In turn, this information can be used to track gait, athletic performance, rehabilitation, and much more. Tracking absolute position with an IMU over longer periods of time is trickier, because position has to be obtained by integrating acceleration twice, so even small measurement errors keep accumulating. This is why smartphones, for example, cannot navigate based solely on their internal sensors; they need an external GPS signal.
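To see why position drifts, here is a minimal sketch that double-integrates a small constant accelerometer error for a sensor that is actually at rest. The bias value of 0.01 m/s² and the 100 Hz rate are hypothetical, not Byteflies specifications. The error grows roughly as 0.5 * b * t^2, so this bias alone amounts to about 18 m after one minute.

```python
def position_error_from_bias(bias, fs, duration):
    """Dead-reckoning position error caused by a constant accelerometer bias.

    The true motion is zero, so everything that accumulates is error.
    bias in m/s^2, fs in Hz, duration in seconds; simple Euler integration.
    """
    dt = 1.0 / fs
    velocity = position = 0.0
    for _ in range(int(duration * fs)):
        velocity += bias * dt        # first integration: velocity error
        position += velocity * dt    # second integration: position error
    return position


for seconds in (1, 10, 60, 600):
    err = position_error_from_bias(bias=0.01, fs=100, duration=seconds)
    print(f"{seconds:>4} s: {err:8.2f} m")
```

The quadratic growth is why short-term orientation estimates stay useful while long-term position estimates need an external reference.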

A machine learning recipe

Besides pure orientation tracking, IMU data is very useful for activity detection. Is the person walking or running, driving or cycling, eating or sleeping? It would certainly be cool to know this from the sensor readings alone, but how? Let me guide you through a simple example.

First, chop up the different accelerometer recordings into short time slices, or ‘windows’. Assign a label to each window according to what the person was doing at the time: running, walking, sitting, or standing. Video recordings can serve as the reference for these labels when training your algorithm. Such a window is shown in the top left of the figure below.


Top left: one window of accelerometer data. The different time series correspond to different sensors (right upper arm, left lower leg, etc.) and different axes (x, y, z, and magnitude). All data in such a window is transformed into a feature vector, which corresponds to one point on the center plot. Center: projection onto the two most important directions of the feature space (the principal components, or PCs for short). Colors indicate the activity class of each window.
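The windowing step can be sketched in plain Python. The window length, step size, and majority-vote labeling rule are assumptions for illustration; the post does not specify them. Each sample is taken to be a tuple of channel values, and `labels` holds one video-derived activity label per sample. `make_windows` is a hypothetical helper.

```python
from collections import Counter


def make_windows(samples, labels, window_size, step):
    """Cut a recording into fixed-length windows and label each one.

    samples: list of per-time-step readings (e.g. tuples of channel values).
    labels: activity label per time step, same length as samples.
    Each window gets the most common label inside it (majority vote).
    """
    windows, window_labels = [], []
    for start in range(0, len(samples) - window_size + 1, step):
        end = start + window_size
        windows.append(samples[start:end])
        window_labels.append(Counter(labels[start:end]).most_common(1)[0][0])
    return windows, window_labels
```

For example, 2-second windows at a (hypothetical) 100 Hz sampling rate with 50% overlap would be `make_windows(samples, labels, window_size=200, step=100)`. Dropping windows that straddle two activities, instead of voting, is an equally reasonable choice.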

Next, summarize all the accelerometer data in a window into a few numbers. Two of these numbers are plotted above as ‘PC 1’ and ‘PC 2’. This step combines feature engineering (computing many descriptive numbers per window) and dimensionality reduction (compressing them into a few principal components).
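A minimal NumPy sketch of this step follows. The feature set (per-channel mean, standard deviation, minimum, and maximum) is an illustrative assumption; the post only says a large number of features was computed per window. PCA is done via a singular value decomposition of the standardized feature matrix. The helper names are hypothetical. The fitted mean, scale, and components are returned so that new windows can later be projected in exactly the same way.

```python
import numpy as np


def window_features(window):
    """Summarize one window (n_samples x n_channels) as a flat feature vector.

    Illustrative per-channel statistics only; real pipelines typically use
    many more features (e.g. frequency-domain ones).
    """
    w = np.asarray(window, dtype=float)
    return np.concatenate([w.mean(axis=0), w.std(axis=0), w.min(axis=0), w.max(axis=0)])


def fit_pca(X, n_components):
    """Standardize features and fit PCA via SVD; returns the fitted parameters."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components]


def pca_transform(X, params):
    """Project feature vectors onto the fitted principal components."""
    mean, std, components = params
    return ((X - mean) / std) @ components.T
```

With `n_components=2`, each window becomes one (PC 1, PC 2) point like those in the plot above.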

Now, when you get new accelerometer data (that is, a new window) for which you don’t yet know the activity, you can summarize it in exactly the same way as the labeled windows. It will land somewhere on the above plot. Based on where it lands, for instance which labeled windows lie closest to it, you can infer the activity with a certain probability.
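One way to turn "where it lands" into a probability is k-nearest neighbors: look at the k labeled points closest to the new one and count their labels. The sketch below uses k = 5 to match the confusion matrix further down, and Euclidean distance in the reduced feature space; both the distance choice and the helper name are assumptions.

```python
import math
from collections import Counter


def knn_predict_proba(train_points, train_labels, query, k=5):
    """Estimate class probabilities for one new point by k-nearest neighbors.

    The probability of a class is the fraction of the k closest training
    points (Euclidean distance) that carry that label.
    """
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return {label: count / k for label, count in votes.items()}
```

The predicted activity is simply the label with the highest fraction.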

This is the outline of most machine learning pipelines, and it is what we did this summer as part of the initial validation of the motion measurements. For this simple exercise, we achieved practically perfect prediction accuracy:


Confusion matrix for 5-nearest-neighbor classification on the first 20 principal components of a large number of features per window. All labeled windows (n = 373) were split into a training set (70%) and a test set (30%). Recordings from two different subjects were used.
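The evaluation described in the caption can be sketched as a shuffled 70/30 split followed by a confusion matrix. This is plain Python with hypothetical helper names and an arbitrary fixed seed; the actual split used for the figure is not specified beyond the percentages.

```python
import random


def train_test_split(items, labels, test_fraction=0.3, seed=0):
    """Shuffle indices and split into (train_x, train_y, test_x, test_y)."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [items[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [items[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows: true class; columns: predicted class (same order as classes)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix
```

For n = 373 windows this gives 261 training and 112 test windows. A perfect classifier puts all counts on the diagonal. Note that a random split places overlapping windows from the same recording in both sets; splitting by subject is a stricter test.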


I enjoyed my time at Byteflies very much. I had fun playing with the data (using Jupyter and Python), and I learned a lot. Byteflies works in many areas, from electronics to front-end software, from embedded systems to scientific trials, and from UI and UX design to signal processing, so you can pick up a lot in such an environment. The team is talented and makes for a pleasant atmosphere. We interns had plenty of autonomy and responsibility; we were like full members of the team. Overall, it was a great internship, and I want to wholeheartedly thank the entire team for it!