Machine Learning on the Web

Using a Third-Party Machine Learning Service


Learning Objectives

  • You know of machine learning services on the cloud.

In the previous chapter, we briefly looked into setting up a machine learning model in a container and using it in a web application. Many companies offer machine learning services on the cloud, which allow you to use machine learning models without having to set up your own infrastructure. As a few examples, check out Google Cloud AI, Amazon SageMaker AI, and Microsoft Azure Machine Learning.

Using a third-party service can be a good option if you want to quickly prototype an application or if you don’t have the resources to set up your own infrastructure. It is also a sensible choice when prototyping with models that require expensive hardware. For example, contemporary Large Language Models require expensive hardware to run efficiently, and buying such hardware just for a quick prototype rarely makes sense.

As an example, a single NVIDIA H200 Tensor Core GPU costs tens of thousands of euros — this is just for the GPU and does not include the other required hardware. Using one for an hour on a cloud service, on the other hand, costs just a few euros.

Many of these services also provide APIs that allow you to easily integrate machine learning capabilities into your applications.

As an example, if we wished to create a chatbot like the one in the lower right corner of these materials, we could install the OpenAI JavaScript client and use it to call the OpenAI API. In its simplest form, a request with Deno could look like this:

import OpenAI from "jsr:@openai/openai";

const client = new OpenAI({
  apiKey: Deno.env.get("OPENAI_API_KEY"), // read your OpenAI API key from an environment variable
});

const response = await client.responses.create({
  model: "o3-mini",
  instructions: "You are a very unhelpful coding assistant that just makes jokes about code.",
  input: "How can I create a hello world application in Python?",
});

console.log(response.output_text);

Naturally, we would wrap the above in an API, and add some error handling.

The above code uses the OpenAI API to create a response to a given input using the specified model. The model parameter specifies which model to use for generating the response, the instructions parameter is a system prompt that guides the behavior of the model, and the input parameter specifies the input text for which we want to generate a response.

OpenAI’s JavaScript client can also be used to query LLMs running on Hugging Face, which is a popular platform for sharing and using machine learning models.

For a bit more information on using Large Language Models programmatically, including Retrieval-Augmented Generation, which helps provide context to the models, check out the course Software Engineering with Large Language Models.