How to create an Instagram face finder app?

Jan 29, 2024

We won’t focus on collecting a dataset of face images. When dealing with facial data, it’s important to consider privacy laws and ethical implications, so make sure you comply with the applicable privacy laws and obtain user consent for face recognition. A freely available dataset of 100,000 celebrities would be a great choice for learning.

How the process typically works

Face Embeddings

First, you need to convert each face into a vector representation, often called an embedding. This is done using deep learning models specifically designed for facial recognition or facial feature extraction. These models output a vector (a list of numbers) for each face, where similar faces have similar vectors.

Storing in a Vector Database

Next, these vectors are stored in a vector database like SingleStore. This type of database is optimized for handling vector data and can efficiently perform operations like nearest neighbor searches, which are crucial for finding similar faces.

SQL Queries for Searching

With the vectors stored in a database, you can then perform search queries to find similar faces. In some vector databases, you can use SQL-like syntax to query the data, making it more accessible and easier to integrate with existing systems. You write a query where you input a face vector, and the database returns the most similar face vectors from your dataset.
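To make the idea of a nearest neighbor query concrete, here is a minimal sketch of what such a search does conceptually: a brute-force scan in pure Python. The function name `top_k_similar`, the face IDs, and the toy 3-dimensional vectors are assumptions made for illustration. Real embeddings have 128 or more dimensions, and a vector database would normally use an index instead of scanning every row.

```python
import heapq
import math

def top_k_similar(query, stored, k=2):
    """Return the k (distance, face_id) pairs closest to `query`.

    `stored` maps a face ID to its embedding (a list of floats).
    """
    scored = ((math.dist(query, vec), face_id) for face_id, vec in stored.items())
    return heapq.nsmallest(k, scored)

# Toy 3-dimensional "embeddings" (hypothetical values for illustration)
stored = {
    "alice": [0.1, 0.2, 0.3],
    "bob": [0.9, 0.8, 0.7],
    "carol": [0.1, 0.2, 0.35],
}
results = top_k_similar([0.1, 0.2, 0.31], stored, k=2)
print(results)  # closest first: "alice", then "carol"
```

The query vector is nearly identical to the one stored for "alice", so that entry comes back first, followed by "carol".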

Advantages

— Scalability: Vector databases are designed to handle large datasets efficiently.
— Speed: Searching for similar vectors is much faster compared to traditional methods.
— Flexibility: SQL-like querying allows for easy integration and more complex queries.

Considerations

— Accuracy: The accuracy of your face search service will largely depend on the quality of the face embedding model.
— Privacy and Ethical Concerns: When dealing with facial data, it’s important to consider privacy laws and ethical implications.
— Data Preparation: The face images need to be pre-processed and normalized before generating embeddings for consistency.

Updates and Maintenance

Keep in mind that maintaining a database and ensuring the face embedding model stays current (as better models are developed) are ongoing tasks.

Best face embedding models

When looking for the best face embedding models, you want to consider factors like accuracy, speed, robustness against variations (like lighting, pose, age, etc.), and the size of the model (which affects deployment feasibility). Here are some of the notable models and frameworks:

1. Facenet: Developed by Google, Facenet has been a popular choice for face recognition tasks. It uses a deep convolutional network, trained to optimize the embedding itself, rather than intermediate features.

2. DeepFace: Developed by Facebook, DeepFace is another highly influential model in this field. It’s known for its robustness and high accuracy.

3. VGGFace and VGGFace2: Developed by the Visual Geometry Group at Oxford, these models are known for their high performance in face recognition. VGGFace is built on the famous VGG-16 architecture, while the models released with the VGGFace2 dataset use ResNet-50 and SENet backbones.

4. ArcFace: This model is known for its angular margin loss function, which significantly enhances the discriminative power of the embeddings.

5. Dlib’s Face Recognition Model: Based on a ResNet architecture, this model is widely used in the community for its balance of accuracy and resource efficiency.

6. OpenFace: An open-source tool, OpenFace offers good performance and is easier to deploy in practical applications.

7. InsightFace: This is a recent and popular deep learning toolkit for face analysis, known for its state-of-the-art performance in face detection and recognition.

When selecting a face embedding model, it’s important to consider the specific requirements of your application. For instance, if you’re working on a mobile application, you might prioritize models that are more lightweight and efficient. On the other hand, for a security-focused application, the accuracy and robustness of the model would be the top priority.

Face finder system

Implementing a facial recognition system involves several steps: training the model, storing face embeddings, and then retrieving similar faces. I’ll outline a high-level example using Python, assuming you have a dataset of face images. For this example, I’ll use the Dlib library, which is popular for face recognition and offers a good balance between simplicity and performance.

Step 1: Installing Dependencies

First, ensure you have the necessary libraries. You can install them using pip:

pip install dlib numpy opencv-python

Step 2: Training the Model

In practice, training a state-of-the-art face recognition model from scratch requires a large dataset and significant computational resources. Fortunately, models like Dlib’s come pre-trained. Here’s how you can use Dlib to generate embeddings:

import dlib
import numpy as np
import cv2

# Load pre-trained face detection and recognition models
detector = dlib.get_frontal_face_detector()
sp = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
facerec = dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat')

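def get_face_embedding(image_path):
    # Load image; cv2.imread returns None if the file cannot be read
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # OpenCV loads images as BGR, but Dlib expects RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Detect faces (upsample the image once to find smaller faces)
    dets = detector(img, 1)
    if len(dets) == 0:
        return None  # No face detected

    # Use only the first detected face
    shape = sp(img, dets[0])
    # Compute the 128D vector that describes the face
    face_descriptor = facerec.compute_face_descriptor(img, shape)
    return np.array(face_descriptor)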

# Example usage
embedding = get_face_embedding('path_to_image.jpg')
print(embedding)

Ensure you have the required shape_predictor_68_face_landmarks.dat and dlib_face_recognition_resnet_model_v1.dat files, which can be downloaded from Dlib's model repository.

Step 3: Storing Embeddings

You can store these embeddings in a database. For simplicity, let’s assume a local SQLite database:

import sqlite3

# Connect to SQLite database
conn = sqlite3.connect('face_embeddings.db')
cursor = conn.cursor()

# Create table
cursor.execute('''CREATE TABLE IF NOT EXISTS faces
               (id INTEGER PRIMARY KEY, embedding TEXT)''')

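# Insert embedding as comma-separated text
# (OR REPLACE lets you re-run the example without a duplicate-key error)
def store_embedding(embedding, face_id):
    cursor.execute("INSERT OR REPLACE INTO faces (id, embedding) VALUES (?, ?)", (face_id, ','.join(map(str, embedding))))
    conn.commit()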

# Example usage
store_embedding(embedding, 1)
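Comma-separated text is easy to read but bulky. As an alternative sketch, the embedding can be packed into raw float32 bytes and stored in a BLOB column. The helper names `to_blob` and `from_blob` are hypothetical, the stand-in vector is invented for the demo, and the in-memory database is used only to keep the example self-contained. Note that packing to float32 drops precision compared to Python's 64-bit floats, which is usually acceptable for embeddings.

```python
import sqlite3
from array import array

def to_blob(embedding):
    """Pack a sequence of floats into raw float32 bytes."""
    return array("f", embedding).tobytes()

def from_blob(blob):
    """Unpack raw float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()

conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
conn.execute("CREATE TABLE faces (id INTEGER PRIMARY KEY, embedding BLOB)")

embedding = [0.5, -0.25, 0.125] + [0.0] * 125  # stand-in for a 128D vector
conn.execute("INSERT INTO faces (id, embedding) VALUES (?, ?)", (1, to_blob(embedding)))

blob = conn.execute("SELECT embedding FROM faces WHERE id = 1").fetchone()[0]
restored = from_blob(blob)
print(len(blob), restored[:3])
```

With 128 values at 4 bytes each, the stored blob is 512 bytes, considerably smaller than the same vector written out as decimal text.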

Step 4: Retrieving Similar Faces

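To retrieve similar faces, you’d typically use a nearest neighbor search. However, SQLite doesn’t support this natively, so this example falls back to a brute-force scan: it loads every stored embedding and compares each one to the target in Python. This is fine for a few thousand faces but does not scale to large datasets.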

def find_similar_faces(target_embedding, threshold=0.6):
    cursor.execute("SELECT id, embedding FROM faces")
    rows = cursor.fetchall()

    for row in rows:
        stored_embedding = np.array(list(map(float, row[1].split(','))))
        distance = np.linalg.norm(stored_embedding - target_embedding)

        if distance < threshold:
            print(f"Found a similar face with ID: {row[0]} and distance: {distance}")

# Example usage
find_similar_faces(embedding)

This is a very basic example. In a real-world scenario, you’d use a database better suited to vector storage and efficient similarity search, such as SingleStore. Additionally, consider handling issues like face alignment, multiple faces in an image, and scaling for large datasets. Always test and validate the system thoroughly, and be mindful of privacy and ethical considerations.
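If you want to stay with SQLite while experimenting, one possible approach is to register a Python function as a custom SQL function, so the similarity search reads like an ordinary SQL query. The sketch below assumes embeddings are stored as JSON text; the function name `l2_distance` and the 2-dimensional sample vectors are illustrative. It is still a full table scan under the hood, so it demonstrates the query style rather than the performance of a real vector database.

```python
import json
import math
import sqlite3

def l2_distance(stored_json, query_json):
    """Euclidean distance between two JSON-encoded vectors."""
    return math.dist(json.loads(stored_json), json.loads(query_json))

conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
# Make the Python function callable from SQL under the name l2_distance
conn.create_function("l2_distance", 2, l2_distance)
conn.execute("CREATE TABLE faces (id INTEGER PRIMARY KEY, embedding TEXT)")
conn.executemany(
    "INSERT INTO faces (id, embedding) VALUES (?, ?)",
    [
        (1, json.dumps([0.0, 0.0])),
        (2, json.dumps([3.0, 4.0])),
        (3, json.dumps([0.3, 0.4])),
    ],
)

query = json.dumps([0.0, 0.0])
rows = conn.execute(
    "SELECT id, l2_distance(embedding, ?) AS distance "
    "FROM faces ORDER BY distance LIMIT 2",
    (query,),
).fetchall()
print(rows)  # the two nearest faces, closest first
```

The `ORDER BY distance LIMIT k` pattern is the same shape of query that vector databases with SQL support expose, only there the distance function is built in and backed by an index.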

Which database supports nearest neighbor search?

Several databases and libraries support nearest neighbor search, which is a crucial functionality for efficiently handling vector similarity queries like those used in face recognition systems. Here are some notable options:

1. Elasticsearch: With the introduction of the KNN (k-nearest neighbors) search feature, Elasticsearch allows you to perform similarity searches on vector fields efficiently. It’s widely used for search applications and has robust support for scalable, real-time searches.

2. Faiss (Facebook AI Similarity Search): Developed by Facebook AI Research, Faiss is a library for efficient similarity search on large collections of dense vectors. While it’s not a full-fledged database, it’s often used in conjunction with databases to handle large-scale vector similarity searches.

3. Milvus: An open-source vector database, Milvus is designed for handling large-scale similarity search. It supports multiple index types for efficient vector search and is well-suited for applications like image and video search, recommendation systems, and natural language processing.

4. PostgreSQL with the pgvector or cube extensions: PostgreSQL, a popular open-source relational database, can perform nearest neighbor searches using extensions. pgvector adds a vector column type with distance operators and approximate indexes. The built-in cube extension handles multi-dimensional points with k-nearest neighbor ordering, though its default limit of 100 dimensions is too small for typical 128-dimensional face embeddings.

5. Annoy (Approximate Nearest Neighbors Oh Yeah): Annoy is a C++ library with Python bindings to search for points in space that are close to a given query point. It’s not a database itself but is commonly used alongside databases to support efficient nearest neighbor searches in high-dimensional spaces.

6. HNSWlib (Hierarchical Navigable Small World Graph Library): Similar to Annoy, HNSWlib is a library for approximate nearest neighbor search with efficient algorithms. It is known for its speed and accuracy in high-dimensional vector search tasks.

7. Vectorflow: An open-source embedding pipeline rather than a search engine in its own right. As far as I can tell, it focuses on turning raw data into vectors and loading them into a vector database, which then performs the nearest neighbor search.

8. SingleStore (formerly MemSQL): SingleStore has introduced capabilities to handle vector similarity search, enabling efficient operations on vector data for real-time applications.

When choosing a database for nearest neighbor search, consider factors like the size of your dataset, the dimensionality of your vectors, the speed of search required, and integration with your existing technology stack. Each of these databases and libraries has its strengths and is better suited for different types of applications and scales.

Face finder app

Example: https://huggingface.co/spaces/arthuqa/facefinder

Creating a face recognition application using Gradio is a simpler approach, especially for prototyping and demonstration purposes. Gradio allows for rapid development of interactive web applications with Python, and it’s well-suited for tasks involving image processing and machine learning, so I’ll use it for this example.

Setting Up the Environment

Install necessary Python libraries.

You’ll need gradio, face_recognition, and numpy. Install them via pip:

pip install gradio face_recognition numpy

Building the Gradio App

Create a Python script (e.g., app.py).

This script will set up the Gradio interface and handle the face recognition logic.

import gradio as gr
import face_recognition
import numpy as np
import sqlite3
import json
import os

# Path to the SQLite database file
db_file = 'faces.db'

# Function to get a new SQLite connection
def get_new_connection():
    conn = sqlite3.connect(db_file)
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS faces (id INTEGER PRIMARY KEY AUTOINCREMENT, embedding TEXT)''')
    return conn

def upload_face(input_image):
    conn = get_new_connection()
    cursor = conn.cursor()

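    # Gradio passes None when the button is clicked without an image
    if input_image is None:
        conn.close()
        return "Please provide an image first."

    embeddings = face_recognition.face_encodings(input_image)
    if not embeddings:
        conn.close()
        return "No face detected in the uploaded image."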

    # Store the first face embedding in the database
    embedding = embeddings[0].tolist()
    cursor.execute("INSERT INTO faces (embedding) VALUES (?)", (json.dumps(embedding),))
    conn.commit()
    conn.close()
    return "Face uploaded successfully!"

def search_faces(input_image):
    conn = get_new_connection()
    cursor = conn.cursor()

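    # Gradio passes None when the button is clicked without an image
    if input_image is None:
        conn.close()
        return "Please provide an image first."

    embeddings = face_recognition.face_encodings(input_image)
    if not embeddings:
        conn.close()
        return "No face detected in the search image."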

    input_embedding = embeddings[0]

    cursor.execute("SELECT id, embedding FROM faces")
    rows = cursor.fetchall()
    if not rows:
        conn.close()
        return "No faces in database to compare with."

    min_distance = float('inf')
    closest_id = None
    for row in rows:
        db_embedding = json.loads(row[1])
        distance = np.linalg.norm(np.array(db_embedding) - input_embedding)
        if distance < min_distance:
            min_distance = distance
            closest_id = row[0]

    conn.close()
    if closest_id is not None:
        return f"Closest match is ID {closest_id} with distance {min_distance:.2f}"
    else:
        return "No similar faces found."

with gr.Blocks() as app:
    gr.Markdown("Upload a face to add to the database")
    with gr.Row():
        input_image1 = gr.Image()
        submit_button1 = gr.Button("Upload")
    output1 = gr.Textbox(label="Upload Status")

    submit_button1.click(upload_face, inputs=input_image1, outputs=output1)

    gr.Markdown("Search for a similar face in the database")
    with gr.Row():
        input_image2 = gr.Image()
        submit_button2 = gr.Button("Search")
    output2 = gr.Textbox(label="Search Results")

    submit_button2.click(search_faces, inputs=input_image2, outputs=output2)

if __name__ == "__main__":
    # Ensure the database file exists
    if not os.path.exists(db_file):
        open(db_file, 'a').close()

    app.launch()

Run the Gradio app

Execute the script to start the application:

python app.py

This will start a local server, and you can interact with your app through the web interface.

How the App Works

Users can upload a photo through the first function. The app computes its face embedding and stores it in the database.
Users can then search for similar faces using the second function. The app computes the embedding of the search image and finds the closest match from the stored embeddings.
The interface supports two operations: uploading and searching.

Note

This implementation is for demonstration purposes. It already reports when no face is detected, but it uses only the first face found in a photo. A real-world application would need to deal with multiple faces in a photo and ensure robust error handling.
As before, consider the privacy and ethical implications of handling and storing facial data.
The database is stored in the file faces.db, so uploaded embeddings persist between runs; delete that file to start with an empty database. For larger deployments, use a database system with vector search support.

What is distance, and why is it 0 at full match?

The “distance” in this context refers to the Euclidean distance between two face embeddings in a high-dimensional space. These embeddings are numerical representations of faces, typically generated by a deep learning model. The distance is a measure of how similar or dissimilar two face embeddings (and thus the faces they represent) are.

Euclidean Distance: It’s the straight-line distance between two points in a multidimensional space. In this application, each point (or vector) represents a face embedding. The formula for Euclidean distance between two points $P$ and $Q$ in an $n$-dimensional space is:

$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (P_i - Q_i)^2}$$
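As a quick check of the Euclidean distance, here is a small sketch that computes it directly in pure Python. The function name `euclidean_distance` and the 2-dimensional sample points are chosen for illustration. It gives the same result as the `np.linalg.norm` of the difference between the two vectors, which is what the app uses.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared differences between p and q."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0 (a 3-4-5 triangle)
print(euclidean_distance([1.0, 2.0], [1.0, 2.0]))  # 0.0 (identical vectors)
```

Every term in the sum is a squared difference, so the result can only be 0 when all components of the two vectors are equal.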

Distance Interpretation:

Distance = 0: This implies that the two embeddings are identical. A distance of 0 in the app indicates a perfect match, meaning the face embedding from the search image is the same as one stored in the database. This typically happens if you’re searching with the exact image that you just uploaded.
Positive Distance: A larger distance indicates less similarity. There’s no absolute scale for what distance values are considered “close” or “far” as it depends on the model and the data. Typically, a threshold is empirically determined to decide whether a distance is close enough to consider two faces a match.

Why 0 for a Full Match?

A distance of 0 for a full match is expected because it means the two compared embeddings are identical: every term in the sum is zero. In practice, this happens when the same image is compared with itself. Two different photos of the same person usually produce a small but nonzero distance.

Practical Considerations:

Thresholds: In real-world applications, you often set a threshold distance. If the computed distance is below this threshold, the faces are considered a match.
Variability and Noise: Factors like image quality, lighting, facial expressions, and the angle of the face can affect the embeddings and thus the distances.
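One simple way to determine a threshold empirically is to collect distances for pairs you know show the same person and pairs you know show different people, then pick the cut-off that classifies the most pairs correctly. The sketch below assumes this approach; the function name `best_threshold` and all distance values are invented purely for illustration and are not measurements from any model.

```python
def best_threshold(same_person, different_people, candidates):
    """Pick the candidate threshold with the highest accuracy.

    A pair counts as a predicted match when its distance is below the threshold.
    """
    total = len(same_person) + len(different_people)

    def accuracy(threshold):
        true_matches = sum(d < threshold for d in same_person)
        true_rejections = sum(d >= threshold for d in different_people)
        return (true_matches + true_rejections) / total

    return max(candidates, key=accuracy)

# Hypothetical distances, invented purely for illustration
same_person = [0.31, 0.42, 0.48, 0.55]
different_people = [0.62, 0.71, 0.83, 0.95]
candidates = [0.4, 0.5, 0.6, 0.7]
print(best_threshold(same_person, different_people, candidates))  # 0.6
```

With real data the two groups overlap, so no threshold is perfect, and the right choice depends on whether false matches or missed matches are more costly for your application.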

In summary, the distance is a measure of similarity between faces, with smaller values indicating higher similarity. The exact interpretation of these distances can vary and typically requires some empirical tuning based on the specific characteristics of the face recognition model and the dataset being used.

What is the size of 1 vector of an image?

The size of a single vector representing an image, particularly in the context of face embeddings or similar image feature extraction techniques, depends primarily on two factors: the dimensions of the vector and the data type used for each element in the vector.

1. Vector Dimensions: In the field of face recognition and image processing, common dimensions for feature vectors are 128, 256, 512, or even higher. The specific dimension depends on the model and architecture used. For instance, FaceNet, a popular face recognition model, typically produces embeddings of 128 dimensions.

2. Data Type: The data type of each element in the vector usually is a floating-point number. In most cases, this is either a 32-bit float (float32) or, less commonly, a 64-bit float (float64). A 32-bit float takes up 4 bytes of memory, while a 64-bit float takes up 8 bytes.

Calculating the Size

To calculate the size of one vector:

- For a 128-dimensional vector using 32-bit floats: 128 elements * 4 bytes/element = 512 bytes
- For a 256-dimensional vector using 32-bit floats: 256 elements * 4 bytes/element = 1024 bytes = 1 KB

Example: If we use FaceNet’s 128-dimensional embeddings with 32-bit floats, each vector would be approximately 512 bytes.

It’s important to note that while the vector itself might be of this size, additional storage overhead may apply depending on the database or file format used to store these vectors. This overhead can include metadata, indexing information, or other auxiliary data structures.
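The size calculation can be reproduced with a few lines of Python. This sketch uses the standard library's `array` module to look up the byte width of each float type; the function name `vector_size_bytes` is an assumption for illustration, and the result covers only the raw vector, not any database overhead.

```python
from array import array

def vector_size_bytes(dimensions, typecode="f"):
    """Raw size of one embedding; 'f' is float32, 'd' is float64."""
    return dimensions * array(typecode).itemsize

for dims in (128, 256, 512):
    print(dims, vector_size_bytes(dims), vector_size_bytes(dims, "d"))
```

For 128 dimensions this gives 512 bytes with 32-bit floats and 1024 bytes with 64-bit floats, matching the figures above.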

How many resources will a vector database of 100,000,000 faces consume?

To estimate the resource consumption for a vector database containing 100,000,000 faces, we need to consider several factors, including the size of each vector, the database’s storage overhead, and the operational requirements like indexing and querying. Let’s break it down:

1. Disk Storage

Assuming each vector represents a face embedding:

Vector Size: As previously calculated, a typical 128-dimensional vector with 32-bit floats would be 512 bytes.
Total Size: 100,000,000 vectors * 512 bytes/vector = 51,200,000,000 bytes = 51.2 GB

This is the raw size of the vectors. However, databases have additional overhead for storage, such as indexing, metadata, and possibly replication for fault tolerance.

Estimated Storage Overhead: This varies widely between databases. As a conservative planning assumption, budget for about 100% overhead, which doubles the storage requirement to around 102.4 GB. Depending on the database and its configuration (e.g., index types, replication settings), the real figure could be higher or lower.
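The storage arithmetic is easy to rerun for other dataset sizes or vector dimensions. The sketch below is a back-of-the-envelope calculator, not a sizing tool for any particular database. The function name `estimate_storage_gb` and its parameters are assumptions, and `overhead=1.0` encodes the 100% overhead figure used in this estimate.

```python
def estimate_storage_gb(num_vectors, dimensions=128, bytes_per_value=4, overhead=1.0):
    """Return (raw_gb, total_gb) using decimal gigabytes (1 GB = 10**9 bytes).

    `overhead` is a fraction of the raw size: 1.0 means +100%.
    """
    raw_bytes = num_vectors * dimensions * bytes_per_value
    raw_gb = raw_bytes / 10**9
    return raw_gb, raw_gb * (1 + overhead)

raw_gb, total_gb = estimate_storage_gb(100_000_000)
print(f"raw: {raw_gb:.1f} GB, with overhead: {total_gb:.1f} GB")
```

Switching to 512-dimensional embeddings (`dimensions=512`) quadruples both numbers, which shows how strongly the choice of embedding model affects storage.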

2. RAM

The RAM requirement depends on the database’s operational needs:

Indexing: Efficient search requires indexing, which consumes additional RAM. The amount depends on the indexing algorithm and the database architecture.
Cache: Databases typically cache frequently accessed data in memory for faster access.
Concurrent Operations: Simultaneous read/write operations and queries consume more memory.

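A rough estimate is challenging without specific database details, but for a dataset of this size, a server with at least 64 GB to 128 GB of RAM would be advisable for moderate performance. Keep in mind that in-memory indexes (HNSW-style graph indexes, for example) hold the vectors themselves in RAM, so the 51.2 GB of raw vectors becomes a lower bound before any index overhead. More memory is better for optimal performance, especially if the database supports in-memory operations.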

3. CPU

CPU requirements depend on:

Query Load: The number and complexity of concurrent queries.
Data Processing: Tasks like vector insertion, updating, and maintenance operations.
Index Maintenance: Indexing algorithms can be CPU-intensive.

A multi-core processor (e.g., 16 cores or more) is recommended for handling a large-scale vector database efficiently, especially if the system will handle many simultaneous queries.

Conclusion

For a vector database with 100,000,000 faces:

Disk Storage: At least 102.4 GB, more with higher redundancy and backup requirements.
RAM: Ideally 64 GB to 128 GB or more, depending on operational needs.
CPU: A robust multi-core processor (16 cores or more) for efficient processing.

These are ballpark estimates. The actual requirements could vary based on the database’s specific implementation, configuration, and the nature of the workload (read-heavy, write-heavy, query complexity, etc.). It’s also important to consider the network infrastructure and potential scaling strategies (like sharding or clustering) for handling such a large dataset effectively.