Modular Arithmetic in Data Science

is a mathematical system the place numbers cycle after reaching a price known as the modulus. The system is sometimes called “clock arithmetic” attributable to its similarity to how analog 12-hour clocks signify time. This text supplies a conceptual overview of modular arithmetic and explores sensible use circumstances in information science.

Conceptual Overview

The Fundamentals

Modular arithmetic defines a system of operations on integers primarily based on a selected integer known as the modulus. The expression x mod d is equal to the rest obtained when x is split by d. If r ≡ x mod d, then r is claimed to be congruent to x mod d. In different phrases, which means r and x differ by a a number of of d, or that x – r is divisible by d. The image ‘≡’ (three horizontal strains) is used as a substitute of ‘=’ in modular arithmetic to emphasise that we’re coping with congruence moderately than equality within the typical sense.

For instance, in modulo 7, the quantity 10 is congruent to three as a result of 10 divided by 7 leaves a the rest of three. So, we will write 3 ≡ 10 mod 7. Within the case of a 12-hour clock, 2 a.m. is congruent to 2 p.m. (which is 14 mod 12). In programming languages comparable to Python, the p.c signal (‘%’) serves because the modulus operator (e.g., 10 % 7 would consider to three).

Here’s a video that explains these ideas in additional element:

Fixing Linear Congruences

A linear congruence generally is a modular expression of the shape n ⋅ y ≡ x (mod d), the place the coefficient n, goal x, and modulus d are identified integers, and the unknown integer y has a level of 1 (i.e., it’s not squared, cubed, and so forth). The expression 2017 ⋅ y ≡ 2025 (mod 10000) is an instance of a linear congruence; it states that when 2017 is multiplied by some integer y, the product leaves a the rest of 2025 when divided by 10000. To resolve for y within the expression n ⋅ y ≡ x (mod d), observe these steps:

Discover the biggest widespread divisor (GCD) of the coefficient n and modulus d, additionally written as GCD(n, d), which is the best constructive integer that may be a divisor of each n and d. The Extended Euclidean Algorithm could also be used to effectively compute the GCD; this will even yield a candidate for n^-1, the modular inverse of the coefficient n.

Decide whether or not an answer exists. If the goal x shouldn’t be divisible by GCD(n, d), then the equation has no resolution. It is because the congruence is simply solvable when the GCD divides the goal.

Simplify the modular expression, if wanted, by dividing the coefficient n, goal x, and modulus d by GCD(n, d) to scale back the issue to a less complicated equal kind; allow us to name these simplified portions n₀, x₀, and d₀, respectively. This ensures that n₀ and d₀ are coprime (i.e., 1 is their solely widespread divisor), which is critical for locating a modular inverse.

Compute the modular inverse n₀^-1 of n₀ mod d₀ (once more, utilizing the Prolonged Euclidean Algorithm).

Discover one resolution for the unknown worth y. To do that, multiply the modular inverse n₀^-1 by the diminished goal x₀ to acquire one legitimate resolution for y mod d₀.

Lastly, by constructing on the results of step 5, generate all attainable options. Because the authentic equation was diminished by GCD(n, d), there are GCD(n, d) distinct options. These options are spaced evenly aside by the diminished modulus d₀, and all are legitimate with respect to the unique modulus d.

Following is a Python implementation of the above process:

def extended_euclidean_algorithm(a, b): """ Computes the best widespread divisor of constructive integers a and b, together with coefficients x and y such that: a*x + b*y = gcd(a, b) """ if b == 0: return (a, 1, 0) else: gcd, x_prev, y_prev = extended_euclidean_algorithm(b, a % b) x = y_prev y = x_prev - (a // b) * y_prev return (gcd, x, y) def solve_linear_congruence(coefficient, goal, modulus): """ Solves the linear congruence: coefficient * y ≡ goal (mod modulus) Returns all integer options for y with respect to the modulus. """ # Step 1: Compute the gcd gcd, _, _ = extended_euclidean_algorithm(coefficient, modulus) # Step 2: Test if an answer exists if goal % gcd != 0: print("No resolution exists: goal shouldn't be divisible by gcd.") return None # Step 3: Cut back the equation by gcd reduced_coefficient = coefficient // gcd reduced_target = goal // gcd reduced_modulus = modulus // gcd # Step 4: Discover the modular inverse of reduced_coefficient with respect to the reduced_modulus _, inverse_reduced, _ = extended_euclidean_algorithm(reduced_coefficient, reduced_modulus) inverse_reduced = inverse_reduced % reduced_modulus # Step 5: Compute one resolution base_solution = (inverse_reduced * reduced_target) % reduced_modulus # Step 6: Generate all options modulo the unique modulus all_solutions = [(base_solution + i * reduced_modulus) % modulus for i in range(gcd)] return all_solutions

Listed here are some instance checks:

options = solve_linear_congruence(coefficient=2009, goal=2025, modulus=10000) print(f"Options for y: {options}") options = solve_linear_congruence(coefficient=20, goal=16, modulus=28) print(f"Options for y: {options}")

Outcomes:

Options for y: [225] Options for y: [5, 12, 19, 26]

This video explains how one can resolve linear congruences in additional element:

Information Science Use Circumstances

Use Case 1: Characteristic Engineering

Modular arithmetic has plenty of fascinating use circumstances in information science. An intuitive one is within the context of characteristic engineering, for encoding cyclical options like hours of the day. Since time wraps round each 24 hours, treating hours as linear values can misrepresent relationships (e.g., 11 PM and 1 AM are numerically far aside however temporally shut). By making use of modular encoding (e.g., utilizing sine and cosine transformations of the hour modulo 24), we will protect the round nature of time, permitting machine studying (ML) fashions to acknowledge patterns that happen throughout particular intervals like nighttime. The next Python code exhibits how such an encoding may be carried out:

import numpy as np import pandas as pd import matplotlib.pyplot as plt # Instance: Checklist of incident hours (in 24-hour format) incident_hours = [22, 23, 0, 1, 2] # 10 PM to 2 AM # Convert to a DataFrame df = pd.DataFrame({'hour': incident_hours}) # Encode utilizing sine and cosine transformations df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24) df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)

The ensuing dataframe df:

hour hour_sin hour_cos 0 22 -0.500000 0.866025 1 23 -0.258819 0.965926 2 0 0.000000 1.000000 3 1 0.258819 0.965926 4 2 0.500000 0.866025

Discover how the usage of sine nonetheless differentiates between the hours earlier than and after 12 (e.g., encoding 11 p.m. and 1 a.m. as -0.258819 and 0.258819, respectively), whereas the usage of cosine doesn’t (e.g., each 11 p.m. and 1 a.m. are mapped to the worth 0.965926). The optimum alternative of encoding will rely upon the enterprise context through which the ML mannequin is to be deployed. In the end, the method enhances characteristic engineering for duties comparable to anomaly detection, forecasting, and classification the place temporal proximity issues.

Within the following sections, we are going to contemplate two bigger information science use circumstances of linear congruence that contain fixing for y in modular expressions of the shape n ⋅ y ≡ x (mod d).

Use Case 2: Resharding in Distributed Database Programs

In distributed databases, information is usually partitioned (or sharded) throughout a number of nodes utilizing a hash perform. When the variety of shards adjustments — say, from d to d’ — we have to reshard the information effectively with out rehashing every thing from scratch.

Suppose every information merchandise is assigned to a shard as follows:

shard = hash(key) mod d

When redistributing objects to a brand new set of d’ shards, we would wish to map the outdated shard indices to the brand new ones in a method that preserves stability and minimizes information motion. This could result in fixing for y within the expression n ⋅ y ≡ x (mod d), the place:

x is the unique shard index,

d is the outdated variety of shards,

n is a scaling issue (or transformation coefficient),

y is the brand new shard index that we’re fixing for

Utilizing modular arithmetic on this context ensures constant mapping between outdated and new shard layouts, minimizes reallocation, preserves information locality, and permits deterministic and reversible transformations throughout resharding.

Beneath is a Python implementation of this state of affairs:

def extended_euclidean_algorithm(a, b): """ Computes gcd(a, b) and coefficients x, y such that: a*x + b*y = gcd(a, b) Used to search out modular inverses. """ if b == 0: return (a, 1, 0) else: gcd, x_prev, y_prev = extended_euclidean_algorithm(b, a % b) x = y_prev y = x_prev - (a // b) * y_prev return (gcd, x, y) def modular_inverse(a, m): """ Returns the modular inverse of a modulo m, if it exists. """ gcd, x, _ = extended_euclidean_algorithm(a, m) if gcd != 1: return None # Inverse does not exist if a and m usually are not coprime return x % m def reshard(old_shard_index, old_num_shards, new_num_shards): """ Maps an outdated shard index to a brand new one utilizing modular arithmetic. Solves: n * y ≡ x (mod d) The place: x = old_shard_index d = old_num_shards n = new_num_shards y = new shard index (to unravel for) """ x = old_shard_index d = old_num_shards n = new_num_shards # Step 1: Test if modular inverse of n modulo d exists inverse_n = modular_inverse(n, d) if inverse_n is None: print(f"No modular inverse exists for n = {n} mod d = {d}. Can't reshard deterministically.") return None # Step 2: Clear up for y utilizing modular inverse y = (inverse_n * x) % d return y

Instance take a look at:

import hashlib def custom_hash(key, num_shards): hash_bytes = hashlib.sha256(key.encode('utf-8')).digest() hash_int = int.from_bytes(hash_bytes, byteorder='massive') return hash_int % num_shards # Instance utilization old_num_shards = 10 new_num_shards = 7 # Simulate resharding for just a few keys keys = ['user_123', 'item_456', 'session_789'] for key in keys: old_shard = custom_hash(key, old_num_shards) new_shard = reshard(old_shard, old_num_shards, new_num_shards) print(f"Key: {key} | Outdated Shard: {old_shard} | New Shard: {new_shard}")

Be aware that we’re utilizing a customized hash perform that’s deterministic with respect to key and num_shards to make sure reproducibility.

Outcomes:

Key: user_123 | Outdated Shard: 9 | New Shard: 7 Key: item_456 | Outdated Shard: 7 | New Shard: 1 Key: session_789 | Outdated Shard: 2 | New Shard: 6

Use Case 3: Differential Privateness in Federated Studying

In federated studying, ML fashions are educated throughout decentralized gadgets whereas preserving person privateness. Differential privateness provides noise to gradient updates with a view to obscure particular person contributions throughout gadgets. Typically, this noise is sampled from a discrete distribution and have to be modulo-reduced to suit inside bounded ranges.

Suppose a consumer sends an replace x, and the server applies a metamorphosis of the shape n ⋅ (y + okay) ≡ x (mod d), the place:

x is the noisy gradient replace despatched to the server,

y is the unique (or true) gradient replace,

okay is the noise time period (drawn at random from a variety of integers),

n is the encoding issue,

d is the modulus (e.g., measurement of the finite discipline or quantization vary through which all operations happen)

As a result of privacy-preserving nature of this setup, the server can solely get better y + okay, the noisy replace, however not the true replace y itself.

Beneath is the now-familiar Python setup:

def extended_euclidean_algorithm(a, b): if b == 0: return a, 1, 0 else: gcd, x_prev, y_prev = extended_euclidean_algorithm(b, a % b) x = y_prev y = x_prev - (a // b) * y_prev return gcd, x, y def modular_inverse(a, m): gcd, x, _ = extended_euclidean_algorithm(a, m) if gcd != 1: return None return x % m

Instance take a look at simulating some purchasers:

import random # Parameters d = 97 # modulus (finite discipline) noise_scale = 20 # controls magnitude of noise # Simulated purchasers purchasers = [ {"id": 1, "y": 12, "n": 17}, {"id": 2, "y": 23, "n": 29}, {"id": 3, "y": 34, "n": 41}, ] # Step 1: Purchasers add noise and masks their gradients random.seed(10) for consumer in purchasers: noise = random.randint(-noise_scale, noise_scale) consumer["noise"] = noise noisy_y = consumer["y"] + noise consumer["x"] = (consumer["n"] * noisy_y) % d # Step 2: Server receives x, is aware of n, and recovers noisy gradients for consumer in purchasers: inv_n = modular_inverse(consumer["n"], d) consumer["y_noisy"] = (consumer["x"] * inv_n) % d # Output print("Consumer-side masking with noise:") for consumer in purchasers: print(f"Consumer {consumer['id']}:") print(f" True gradient y = {consumer['y']}") print(f" Added noise = {consumer['noise']}") print(f" Masked worth x = {consumer['x']}") print(f" Recovered y + noise = {consumer['y_noisy']}") print()

Outcomes:

Consumer-side masking with noise: Consumer 1: True gradient y = 12 Added noise = 16 Masked worth x = 88 Recovered y + noise = 28 Consumer 2: True gradient y = 23 Added noise = -18 Masked worth x = 48 Recovered y + noise = 5 Consumer 3: True gradient y = 34 Added noise = 7 Masked worth x = 32 Recovered y + noise = 41

Discover that the server is simply in a position to derive the noisy gradients moderately than the unique ones.

The Wrap

Modular arithmetic, with its elegant cyclical construction, provides way over only a intelligent solution to inform time — it underpins among the most important mechanisms in trendy information science. By exploring modular transformations and linear congruences, we’ve got seen how this mathematical framework turns into a robust device for fixing real-world issues. In use circumstances as numerous as characteristic engineering, resharding in distributed databases, and safeguarding person privateness in federated studying by means of differential privateness, modular arithmetic supplies each the abstraction and precision wanted to construct strong, scalable programs. As information science continues to evolve, the relevance of those modular methods will possible develop, suggesting that generally, the important thing to innovation lies within the the rest.

Source link

Bots Are Taking Over the Internet—And They’re Not Asking for Permission

Can Machines Really Recreate “You”?

Unfiltered Roleplay AI Chatbots with Pictures – My Top Picks

Bots Are Taking Over the Internet—And They’re Not Asking for Permission

I Tried Buying a Car Through Amazon: Here Are the Pros, Cons

Amazon and eBay to pay ‘fair share’ for e-waste recycling

Artificial Intelligence Concerns & Predictions For 2025

Barbara Corcoran: Entrepreneurs Must ‘Embrace Change’

Most Popular

Tried AI Image Generator from Text (Unfiltered)

How User-Generated Content Helps You Build Trust and Credibility

How AI & Machine Learning Are Revolutionizing Mobile Apps: Insights from Apps-US.com | by Abdulahad Qtech | Feb, 2025

Our Picks

Bots Are Taking Over the Internet—And They’re Not Asking for Permission

Data Analysis Lecture 2 : Getting Started with Pandas | by Yogi Code | Coding Nexus | Aug, 2025

TikTok to lay off hundreds of UK content moderators

Modular Arithmetic in Data Science

Conceptual Overview

The Fundamentals

Fixing Linear Congruences

Information Science Use Circumstances

Use Case 1: Characteristic Engineering

Use Case 2: Resharding in Distributed Database Programs

Use Case 3: Differential Privateness in Federated Studying

The Wrap

Related Posts