Using Machine Learning Models on the Client
Learning Objectives
- You can query an inference API from a web application.
- You know of the possibility of running machine learning models directly in the browser.
Using an inference API
On the client, using a machine learning API is as simple as making a request with the fetch API. As an example, for the inference API that we created earlier, the core functionality for making the request would look something like the following:
```js
const fetchPrediction = async (exercise, code) => {
  const response = await fetch("/inference-api/predict", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ exercise, code }),
  });
  const data = await response.json();
  return data.prediction;
};
```
In principle, the above code could be called every time the user presses a key when working on the exercise. This would allow the user to see the prediction in real time as they type.
However, from a scalability perspective, this is not a good idea, as it would send a request to the server on every single keystroke. Instead, it is better to use a threshold: for example, we could make a request every N keystrokes or every X seconds, whichever comes first. This way, we reduce the number of requests to the server and improve performance, while still providing feedback to the user in a timely manner.
Concretely, monitoring the keystrokes could be implemented using the oninput event handler, which is triggered whenever the value of an input element changes, including keystrokes, pasted text, and other input methods. A separate function could then keep track of the elapsed time and the number of keystrokes, and make the request when either threshold is reached, as sketched below.
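As a minimal sketch, assuming a textarea with the id code-editor and hypothetical exerciseId and showPrediction helpers (none of these appear in the earlier material), the thresholds could be checked on every input event:

```js
// Illustrative thresholds; tune these for the application.
const KEYSTROKE_THRESHOLD = 20;
const TIME_THRESHOLD_MS = 5000;

let keystrokes = 0;
let lastRequestAt = Date.now();

const editor = document.querySelector("#code-editor");

editor.oninput = async () => {
  keystrokes++;
  const elapsed = Date.now() - lastRequestAt;
  if (keystrokes >= KEYSTROKE_THRESHOLD || elapsed >= TIME_THRESHOLD_MS) {
    keystrokes = 0;
    lastRequestAt = Date.now();
    // exerciseId and showPrediction are hypothetical; adapt to the app.
    const prediction = await fetchPrediction(exerciseId, editor.value);
    showPrediction(prediction);
  }
};
```

Note that with this approach the time threshold is only checked when the user types; if a prediction should also appear after a pause in typing, the check could additionally be run on a timer.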
Client-side machine learning
In addition to using machine learning APIs, it is also possible to run machine learning models directly in the browser. This has the benefit of reducing latency and server load, as well as improving privacy, as the data does not need to be sent to the server for processing. However, this requires more resources on the client side, and may not be suitable for all applications.
As an example, we could compile our earlier Python-based machine learning model to WebAssembly using tools like py2wasm. Due to the complexity of models and differences in library versions, however, this might not always be feasible.
There are also JavaScript libraries for machine learning, such as TensorFlow.js, which allow you to run machine learning models directly in the browser. In the broader context, there are also active efforts on Web APIs for accessing hardware, such as WebGPU and WebNN. WebGPU provides a low-level interface to the GPU, while WebNN exposes neural network operations that browsers can accelerate on the available hardware; both can be used to speed up machine learning tasks in the browser.
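As a brief sketch of what client-side inference with TensorFlow.js could look like, assuming a model has been converted to the TensorFlow.js format and is served at /model.json (the path and the input shape here are assumptions for illustration):

```js
import * as tf from "@tensorflow/tfjs";

// Load a pre-converted model; the URL is an assumption for this example.
const model = await tf.loadLayersModel("/model.json");

const predict = (features) =>
  // tf.tidy disposes of the intermediate tensors after the computation.
  tf.tidy(() => {
    const input = tf.tensor2d([features]); // batch of one sample
    return model.predict(input).dataSync()[0];
  });
```

Here, the model runs entirely in the browser, so predictions work without a round trip to the server.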
There is also a whole course on Machine Learning for Web Developers from Google, available on YouTube.