By Alexandre Kaspar, Tae-Hyun Oh, Liane Makatura, Petr Kellnhofer, Jacqueline Aslarus, Wojciech Matusik
Abstract
Motivated by the recent potential of mass customization brought by whole-garment knitting machines, we introduce the new problem of automatic machine instruction generation using a single image of the desired physical product, which we apply to machine knitting. We propose to tackle this problem by directly learning to synthesize regular machine instructions from real images. We create a curated dataset of real samples with their instruction counterparts and propose to use synthetic images to augment it in a novel way. We theoretically motivate our data mixing framework and show empirical results suggesting that making real images look more synthetic is beneficial in our problem setup.
Source: http://www.shimaseiki.com
Advanced manufacturing methods that allow completely automated production of customized objects and parts are transforming today’s economy. One prime example of these methods is whole-garment knitting, which is used to mass-produce many common textile products (e.g., socks, gloves, sportswear, shoes, car seats, etc.). During its operation, a whole-garment knitting machine executes a custom low-level program to manufacture each textile object. Typically, generating the code corresponding to each design is a difficult and tedious process requiring expert knowledge. A few recent works have tackled the digital design workflow for whole-garment knitting. None of these works, however, provides an easy way to specify patterns.
Figure 1: Illustration of our inverse problem and solution. An instruction map (top-left) is knitted into a physical artifact (top-right). We propose a machine learning pipeline to solve the inverse problem by leveraging synthetic renderings of the instruction maps.
Figure 2: Sample Transfer sequence: move the red center stitch to the opposite bed; rack (move) the back bed 1 needle relative to the front; transfer the red stitch back to its original side. Note that the center front needle is now empty, while the right front needle holds 2 stitches.
Figure 3: (L to R) Illustration of Knit, Tuck, and Miss operations.
Contributions include:
- An automatic translation of images to sequential instructions for a real manufacturing process;
- A diverse knitting pattern dataset that provides a mapping between images and instruction programs specified using a new domain-specific language (DSL) (Kant, 2018) that significantly simplifies low-level instructions and can be decoded without ambiguity;
- A theoretically inspired deep learning pipeline to tackle this inverse design problem; and
- A novel usage of synthetic data to learn to neutralize real-world, visual perturbations.
The basic machine operations are:
- Knit pulls a new loop of yarn through all current loops,
- Tuck stacks a new loop onto a needle,
- Miss skips a needle,
- Transfer moves a needle’s content to the other bed,
- Racking changes the offset between the two beds.
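The operations listed above can be illustrated with a toy two-bed model. This is a minimal sketch, not the machine's actual control code: the `Beds` class, its method names, and the sign convention for racking (back needle j faces front needle j + racking) are assumptions made for illustration. The usage at the end replays the transfer sequence described in Figure 2.

```python
class Beds:
    """Toy model of two needle beds; each needle holds a list of loops."""

    def __init__(self, width):
        self.front = [[] for _ in range(width)]
        self.back = [[] for _ in range(width)]
        self.racking = 0  # assumed convention: back needle j faces front j + racking

    def knit(self, bed, i, loop):
        # Knit: the new loop is pulled through all current loops,
        # so only the new loop remains on the needle.
        getattr(self, bed)[i] = [loop]

    def tuck(self, bed, i, loop):
        # Tuck: the new loop is stacked onto whatever the needle holds.
        getattr(self, bed)[i].append(loop)

    def miss(self, bed, i):
        # Miss: the needle is skipped, so its content is unchanged.
        pass

    def rack(self, offset):
        # Racking: change the offset between the two beds.
        self.racking = offset

    def transfer(self, bed, i):
        # Transfer: move the needle's content to the facing needle
        # on the other bed and return that needle's index.
        if bed == "front":
            src, dst, j = self.front, self.back, i - self.racking
        else:
            src, dst, j = self.back, self.front, i + self.racking
        dst[j].extend(src[i])
        src[i] = []
        return j


# Replay of the Figure 2 sequence on three front needles.
beds = Beds(width=3)
for i, name in enumerate(["left", "red", "right"]):
    beds.knit("front", i, name)
j = beds.transfer("front", 1)  # red stitch goes to the back bed
beds.rack(1)                   # shift the back bed by one needle
beds.transfer("back", j)       # red stitch returns one needle to the right
print(beds.front)
```

After the sequence, the center front needle is empty and the right front needle holds two loops, which matches the caption of Figure 2.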
Figure 4: Top: abstract illustration and color coding of our 17 instructions. Bottom: instruction codes, which can be interpreted using the initial character of the following names: Knit and Purl (front and back knit stitches), Tuck, Miss, Front, Back, Right, Left, Stack. Finally, X stands for Cross where + and − are the ordering (upper and lower). Move instructions are composed of their initial knitting side (Front or Back), the move direction (Left or Right) and the offset (1 or 2).
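The naming scheme in the caption of Figure 4 can be enumerated programmatically. The sketch below is illustrative only: the exact code strings (for example `FR1` or `XL-`) and the assumption that Cross instructions carry a direction (Right or Left) in addition to their ordering are inferred from the caption and the stated total of 17, not taken from the authors' implementation.

```python
from itertools import product

# Basic stitches: Knit, Purl, Tuck, Miss.
BASIC = ["K", "P", "T", "M"]
# Moves: side (Front/Back) + direction (Right/Left) + offset (1/2).
MOVES = [f"{s}{d}{o}" for s, d, o in product("FB", "RL", "12")]
# Crosses: assumed to be X + direction + ordering (+ upper, - lower).
CROSSES = [f"X{d}{o}" for d, o in product("RL", "+-")]
STACK = ["S"]

INSTRUCTIONS = BASIC + MOVES + CROSSES + STACK
CODE_TO_INDEX = {code: i for i, code in enumerate(INSTRUCTIONS)}

NAMES = {"K": "Knit", "P": "Purl", "T": "Tuck", "M": "Miss", "S": "Stack"}
SIDES = {"F": "Front", "B": "Back"}
DIRECTIONS = {"R": "Right", "L": "Left"}
ORDERINGS = {"+": "upper", "-": "lower"}


def describe(code):
    """Expand an instruction code into a readable name."""
    if code in NAMES:
        return NAMES[code]
    if code[0] == "X":
        return f"Cross {DIRECTIONS[code[1]]} ({ORDERINGS[code[2]]})"
    return f"{SIDES[code[0]]} {DIRECTIONS[code[1]]} {code[2]}"


print(len(INSTRUCTIONS))
print(describe("FR1"), "|", describe("XL-"))
```

Under these assumptions the count works out to 4 basic stitches, 8 moves, 4 crosses, and 1 stack, for 17 instructions in total.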
Given a line of instructions, the sequence of operations is done over a full line using the following steps:
- The current stitches are transferred to the new instruction side without racking;
- The base operation (knit, tuck or miss) is executed;
- The needles of all transfer-related instructions are transferred to the opposite bed without racking;
- Instructions that involve moving within a bed proceed to transfer back to the initial side using the appropriate racking and order;
- Stack instructions transfer back to the initial side without racking.
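The five steps above can be sketched as a function that groups the needles of one instruction line into ordered passes. This is a hedged illustration rather than the authors' compiler: the function `plan_line`, the code strings, the rule that only Purl and Back moves start on the back bed, and the rule that every instruction other than Tuck and Miss uses Knit as its base operation are all assumptions. Racking values and transfer ordering within step 4 are deliberately left out.

```python
def plan_line(codes):
    """Return the ordered passes for one line of instruction codes."""

    def side(code):
        # Assumption: Purl and Back moves work from the back bed.
        return "back" if code[0] in "PB" else "front"

    def base(code):
        # Assumption: everything except Tuck and Miss knits first.
        return {"T": "tuck", "M": "miss"}.get(code, "knit")

    indexed = list(enumerate(codes))
    return [
        # Step 1: bring current stitches to each instruction's side.
        ("transfer to instruction side", [(i, side(c)) for i, c in indexed]),
        # Step 2: execute the base operation on every needle.
        ("base operation", [(i, base(c)) for i, c in indexed]),
        # Step 3: move, cross, and stack needles go to the opposite bed.
        ("transfer to opposite bed", [i for i, c in indexed if c[0] in "FBXS"]),
        # Step 4: move and cross needles come back using racking.
        ("transfer back with racking", [i for i, c in indexed if c[0] in "FBX"]),
        # Step 5: stack needles come back without racking.
        ("stack transfer back", [i for i, c in indexed if c == "S"]),
    ]


passes = plan_line(["K", "FR1", "S", "P", "T"])
for name, needles in passes:
    print(name, needles)
```

In the example line, the `FR1` move on needle 1 is followed by a `S` on needle 2, so the moved stitch lands on the stack needle; needle 1 therefore appears in steps 3 and 4, and needle 2 in steps 3 and 5.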
The Refiner Network. Our refinement network translates real images into regular images that look similar to synthetic images. Its implementation is similar to Img2prog, except that its output has the same resolution as its input, as illustrated in Figure 10.
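Loss Balancing Parameters. When learning our full architecture with both Refiner and Img2prog, we have three different losses: the cross-entropy loss $L_{CE}$, the perceptual loss $L_{Perc}$, and the PatchGAN loss $L_{GAN}$.
Our combined loss is the weighted sum $L = \lambda_{CE} L_{CE} + \lambda_{Perc} L_{Perc} + \lambda_{GAN} L_{GAN}$, where each $\lambda$ is the balancing weight of the corresponding loss.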
The perceptual loss consists of a feature matching loss and a style loss (using the Gram matrix). Unless stated otherwise, we follow the implementation details of prior work, in which VGG-16 is used for feature extraction after replacing max-pooling operations with average-pooling. The feature matching part uses the pool3 layer and compares the input real image with the output of Refiner so as to preserve the content of the input data. For the style matching part, we use the Gram matrices of the {conv1_2, conv2_2, conv3_3} layers with the respective relative weights {0.3, 0.5, 1.0}. The style loss is measured between the synthetic image and the output of Refiner.
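The structure of the perceptual loss and of the weighted combination can be sketched in plain Python. This is a minimal sketch under stated assumptions: feature maps are given as precomputed lists of channels (no VGG-16 is run here), the Gram matrix is normalized by the number of positions, both parts use a mean squared error, content and style terms are simply added, and the default balancing weights in `total_loss` are placeholders rather than the values used by the authors.

```python
# Relative style weights per layer, as stated in the text.
STYLE_WEIGHTS = {"conv1_2": 0.3, "conv2_2": 0.5, "conv3_3": 1.0}


def gram(features):
    """Gram matrix of a feature map given as C channels x N positions."""
    n = len(features[0])  # assumed normalization by number of positions
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def mse(a, b):
    """Mean squared error between two equally shaped 2D lists."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def perceptual_loss(real, refined, synthetic):
    """Each argument maps a layer name to its C x N feature map."""
    # Feature matching: real input vs. Refiner output at pool3.
    content = mse(real["pool3"], refined["pool3"])
    # Style matching: Refiner output vs. synthetic image Gram matrices.
    style = sum(w * mse(gram(refined[layer]), gram(synthetic[layer]))
                for layer, w in STYLE_WEIGHTS.items())
    return content + style


def total_loss(l_ce, l_perc, l_gan, w_ce=1.0, w_perc=1.0, w_gan=1.0):
    """Weighted sum of the three losses; weights here are placeholders."""
    return w_ce * l_ce + w_perc * l_perc + w_gan * l_gan


toy = [[1.0, 2.0], [3.0, 4.0]]
feats = {"pool3": [[1.0, 2.0]], "conv1_2": toy, "conv2_2": toy, "conv3_3": toy}
print(gram(toy))
print(perceptual_loss(feats, feats, feats))
```

When the three inputs share identical features, both the content and the style terms vanish, which is a quick sanity check for any real implementation.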
Metrics
Semantic segmentation. Our problem is to transform photographs of knit structures into their corresponding instruction maps. This resembles semantic segmentation, which is a per-pixel multi-class classification problem, except that the spatial extent of individual instruction interactions is much larger when viewed in the original image domain. From a program synthesis perspective, we have access to a set of constraints on valid instruction interactions (e.g., Stack is always paired with a Move instruction reaching it). This conditional dependency is referred to as context in semantic segmentation, and there have been many efforts to tackle it explicitly with Conditional Random Fields (CRFs). These clean up spurious predictions of a weak classifier by favoring same-label assignments to neighboring pixels, e.g., with a Potts model. For our problem, we tried a first-order syntax compatibility loss, but there was no noticeable improvement. However, it has been observed that a CNN with a large receptive field but without a CRF can outperform or perform comparably to its counterpart with a CRF for subsequent structured guidance. While we did not consider any CRF post-processing in this work, more sophisticated modeling of knittability would be worth exploring as a future direction.
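To make the Potts model mentioned above concrete, the following sketch computes its pairwise penalty on a 2D label grid: every pair of 4-connected neighbors with different labels adds a fixed cost. This only illustrates the smoothness prior used in CRF-based segmentation in general; it is not part of the pipeline described here, and the function name and `weight` parameter are hypothetical.

```python
def potts_energy(labels, weight=1.0):
    """Pairwise Potts penalty on a 2D label grid (4-neighborhood)."""
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            # Count each horizontal and vertical neighbor pair once.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += weight
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += weight
    return energy


uniform = [["K" for _ in range(3)] for _ in range(3)]
noisy = [row[:] for row in uniform]
noisy[1][1] = "T"  # one isolated label in the center

print(potts_energy(uniform), potts_energy(noisy))
```

The isolated center label disagrees with its four neighbors, so it is penalized. This also shows why such a prior is a poor fit for instruction maps, where a single differing instruction is often intentional.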
Another apparent difference between knitting and semantic segmentation is that semantic segmentation is an easy – although tedious – task for humans, whereas parsing knitting instructions requires vast expertise or reverse engineering.
Neural program synthesis. In terms of returning explicit interpretable programs, our work is closely related to program synthesis, which is a traditional, challenging, and ongoing problem. A similar concept is program induction, in which the model learns to mimic the program rather than explicitly return it. From our perspective, semantic segmentation is closer to program induction, while our task is program synthesis. Recent advances in deep learning have brought notable progress in this domain. Our task has the potential to extend the research boundary of this field, since it differs from prior program synthesis tasks in that:
1) while program synthesis solutions adopt a sequence generation paradigm, our input-output pairs are 2D program maps, and
2) the domain specific language (our instruction set) is newly developed and directly applicable to practical knitting.
We have proposed an inverse process for translating high-level specifications to manufacturing instructions based on deep learning. In particular, we have developed a framework that translates images of knitted patterns to instructions for industrial whole-garment knitting machines. To realize this framework, we have collected a dataset of machine instructions and corresponding images of knitted patterns. We have shown both theoretically and empirically how we can improve the quality of our translation process by combining synthetic and real image data. We have shown an uncommon usage of synthetic data to develop a model that maps real images onto a more regular domain from which machine instructions can more easily be inferred.
The different trends between our perceptual and semantic metrics raise the question of whether adding a perceptual loss on the instructions might also help improve the semantic accuracy. This could be done with a differentiable rendering system. Another interesting question is whether using higher-accuracy simulations could help, and how the difference in regularity affects the generalization capabilities of our prediction.