[1910.11250] Fully-Automatic Semantic Segmentation for Food Intake Tracking in Long-Term Care Homes
The main difference between these approaches is that our proposed system requires no user input, whereas graph cuts requires user-defined seed initialization, which increases user time relative to the current method for tracking food and fluid intake of LTC residents at risk for malnutrition.
Abstract: Malnutrition impacts quality of life and places an annually-recurring burden on
the health care system. Half of older adults are at risk for malnutrition in
long-term care (LTC). Monitoring and measuring nutritional intake is paramount
yet involves time-consuming and subjective visual assessment, limiting current
methods' reliability. The opportunity for automatic image-based estimation
exists. Some progress outside LTC has been made (e.g., calories consumed, food
classification); however, these methods have not been implemented in LTC,
potentially due to a lack of ability to independently evaluate automatic
segmentation methods within the intake estimation pipeline. Here, we propose
and evaluate a novel fully-automatic semantic segmentation method for
pixel-level classification of food on a plate using a deep convolutional neural
network (DCNN). The macroarchitecture of the DCNN is a multi-scale
encoder-decoder food network (EDFN) architecture comprising a residual encoder
microarchitecture, a pyramid scene parsing decoder microarchitecture, and a
specialized per-pixel food/no-food classification layer. The network was
trained and validated on the pre-labelled UNIMIB 2016 food dataset (1027 tray
images, 73 categories), and tested on our novel LTC plate dataset (390 plate
images, 9 categories). Our fully-automatic segmentation method attained
intersection over union similar to semi-automatic graph cuts (91.2% vs. 93.7%).
Advantages of our proposed system include evaluation on a novel dataset,
decoupled error analysis, and no user-initiated annotations, with similar
segmentation accuracy and enhanced reliability in terms of the types of
segmentation errors. This may address several shortcomings currently limiting
utility of automated food intake tracking in time-constrained LTC and hospital
settings.
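As a rough illustration of the macroarchitecture described above, the PyTorch sketch below wires a residual encoder into a pyramid-pooling decoder followed by a per-pixel food/no-food classification layer. The block depths, channel widths, and pooling bin sizes are hypothetical placeholders chosen for brevity, not the paper's actual EDFN configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Basic residual block (in the spirit of the residual encoder [36])."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # identity shortcut


class PyramidPoolingDecoder(nn.Module):
    """Pyramid scene parsing module (in the spirit of PSPNet [37]):
    pool features at several scales, project, upsample, concatenate."""

    def __init__(self, in_channels, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(bin_sizes)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),
                nn.Conv2d(in_channels, branch_channels, 1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in bin_sizes
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels + branch_channels * len(bin_sizes),
                      in_channels // 2, 3, padding=1, bias=False),
            nn.BatchNorm2d(in_channels // 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(b(x), size=(h, w), mode="bilinear",
                                align_corners=False) for b in self.branches]
        return self.fuse(torch.cat([x, *pooled], dim=1))


class EDFN(nn.Module):
    """Hypothetical encoder-decoder food network sketch: residual encoder
    -> pyramid pooling decoder -> per-pixel food/no-food logits."""

    def __init__(self, channels=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, channels, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.encoder = nn.Sequential(ResidualBlock(channels),
                                     ResidualBlock(channels))
        self.decoder = PyramidPoolingDecoder(channels)
        self.classifier = nn.Conv2d(channels // 2, 1, 1)  # food vs. no-food

    def forward(self, x):
        h, w = x.shape[2:]
        feats = self.encoder(self.stem(x))
        logits = self.classifier(self.decoder(feats))
        # restore input resolution for pixel-level classification
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)


# Example: a 3-channel plate image -> per-pixel food/no-food logits
model = EDFN()
image = torch.randn(1, 3, 224, 224)
print(model(image).shape)  # torch.Size([1, 1, 224, 224])
```

Pooling the encoder features at several bin sizes before re-upsampling is what gives the decoder its multi-scale context, and the 1x1 classifier head emits one logit per pixel, matching the binary food/no-food formulation described in the abstract.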
Fig. 4: Network diagram of the proposed deep food segmentation network comprising a residual encoder microarchitecture [36] and a pyramid scene parsing [37] decoder microarchitecture. (Methods)
Fig. 5: Sample graph cuts annotation with one line per food item and one background line, and the resulting segmentation mask. (Testing and Analysis)
Fig. 6: Visual comparison of ground truth hand segmentation and the performance of our proposed system (EDFN-H) and graph cuts (GC-H) with Hough plate masking. (Results)
Fig. 7: Bland-Altman plot comparing agreement between graph cuts (GC-H) and our proposed method (EDFN-H) with Hough plate masking. (Performance on difficult meal scenarios)
Fig. 8: Each of the 11 instances where graph cuts greatly outperformed EDFN-H (i.e., below the limit-of-agreement cutpoint) could be attributed to food or sauce remnants (breakfast: oatmeal remnants; lunch: pasta sauce remnants; dinner: potato remnants). (Performance on difficult meal scenarios)
Fig. 9: Thirteen instances where EDFN-H greatly outperformed graph cuts (i.e., above the limit-of-agreement cutpoint); 8/13 due to graph cuts oversegmentation, 5/13 due to graph cuts undersegmentation. (Performance on difficult meal scenarios)
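The evaluation summarized in Figs. 6-9 compares per-image intersection over union (IoU) between EDFN-H and GC-H and assesses their agreement with a Bland-Altman analysis. The NumPy sketch below shows a minimal version of both computations; the per-image scores are invented for illustration, and the 1.96-sigma limits of agreement are the conventional choice, assumed here rather than taken from the paper.

```python
import numpy as np


def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over union of two boolean food masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, truth).sum() / union


def bland_altman_limits(a: np.ndarray, b: np.ndarray, k: float = 1.96):
    """Bland-Altman bias and limits of agreement between two paired sets
    of measurements (e.g., per-image IoU of EDFN-H vs. GC-H)."""
    diff = a - b
    bias = diff.mean()
    spread = k * diff.std(ddof=1)
    return bias, bias - spread, bias + spread


# Hypothetical per-image IoU scores for the two methods
edfn_h = np.array([0.93, 0.91, 0.88, 0.95, 0.90])
gc_h = np.array([0.94, 0.92, 0.93, 0.94, 0.89])
print(bland_altman_limits(edfn_h, gc_h))
```

Images whose IoU difference falls outside these limits correspond to the disagreement cases itemized in Figs. 8 and 9, where one method greatly outperformed the other.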