Salted Hybrid Bloom Filter (SHBF)

Kapil N, Thakral C, Raghuvanshi A and Sathiyamoorthy E

Published on: 2025-08-20

Abstract

Big Data’s growth led to the change of technological environment in the fields of IT, Bioinformatics etc. There is an apparent spike in the solution requirement for efficient data management in the era of continuous data generation. Bloom Filter is a probabilistic data structure that efficiently handles big datasets, which filters out redundant data and makes effective use of memory. One of the major drawbacks of Bloom Filtering is False Positives, which gives rise to operational inefficiencies and secondary lookups.

Types of Bloom Filter techniques, including Cuckoo Filter, Partitioned Bloom Filter, and Counting Bloom filter have been recommended as solution. Main goals of these techniques include optimizing the spatial properties and increasing the number of bits allotted to virtual Bloom array. In this work we have defined how the Modular Dual Bloom Filter architecture works. It involves dual fingerprint verification along with the salting property that helps in significantly reducing the false positives which requires two independent fingerprint checks for confirmation. This algorithm uses modular partitioning with multilevel fingerprint arrays instead of only bit array.

Keywords

Bloom Filters; Fingerprint array; Modular approach; Membership testing

Introduction

The Salted Hybrid Bloom Filter (SHBF), a Bloom Filter advancement described in this research, minimizes false positives while demonstrating memory and processing time efficiency. It incorporates a novel technique known as the "dual fingerprint verification mechanism," which calls for two separate fingerprint checks. To ensure that the system is not overly liberal, SHBF employs a dual verification procedure in contrast to Bloom Filters. Unlike conventional, items are not confirmed as present until both fingerprints match. The unique nature of fingerprints greatly reduces the chances of mistakenly matching two different fingerprints, leading to exceedingly low false positive rates.

The founder of Bloom Filters was Burton Howard Bloom. These probabilistic data structures are incredibly efficient when it comes to allocating memory. Unlike traditional data structures, which are limited in their functionality, Bloom Filters are specifically designed for storing data about a particular dataset. Membership testing is one area where these filters are utilized. These filters, however, are flawed in one aspect - the problem of false positives. These outdated Bloom Filters are highly inefficient in real-time applications as they have to deal with unnecessary secondary lookups. In order to overcome these shortcomings, various variants of Bloom Filter have been developed such as Counting Bloom Filters, Cuckoo Filters and Partitioned Bloom Filters. All variants try to solve some degree of scalability, element deletion, and false positive reduction, but often at the price of increased complexity or performance reductions.

This paper presents an advance in Bloom Filters called Salted Hybrid Bloom Filter that reduces false positives while being efficient in regard to processing time. It adds a “dual fingerprint verification mechanism”, a new method that requires two independent fingerprint checks for membership confirmation. In opposition to Bloom Filters, SHBF uses a dual verification process and elements are not confirmed as present until both fingerprints do match, ensuring that the system is not too permissive. False positive rates are extremely low because the independence of the fingerprints greatly reduces the chances of two dissimilar fingerprints matching incorrectly. Moreover, SHBF uses modular partitioning as well as multiple fingerprint arrays to ensure greater accuracy of the filter’s space efficiency and high reliability.

The main scopes of this research are directed towards minimizing false positives of SHBF. The dual fingerprint verification component helps SHBF outperform other Bloom Filters and their variants in the matter of false positive ratios. This enhancement is crucial in many instances that can suffer from expensive secondary lookups or mistakes performed because of database indexing, network security decisions, or massive data processing. Also, the research serves as a case study proving that SHBF accomplishes not only top remaining false positive ratios, but also competitive memories and performance in computations. High false positive reduction ratios combined with competitive memory and performance make SHBF a solution for modern Big Data problems and set a new era of probabilistic data structures on industrial engagements.

Existing Bloom Filter Variants

Bloom Filters and their extensions have been studied and applied with great attention as they aid effectiveness in probabilistic data structures. These filters offer a space-efficient solution for membership testing, making them suitable for different use cases. Bloom Filters guarantee no false negatives while maintaining scalability, space, and time efficiency, but they often produce false positives, which are one of the considerable limitations. Here is an in-depth analysis of several existing Bloom Filtering Techniques like the Standard Bloom Filter, Scalable Bloom Filter, Counting Bloom Filter, Cuckoo Filter, Quotient Filter, and Hybrid Modular Dual Bloom Filter. All these Bloom Filtering techniques have distinct characteristics, advantages, and limitations.

Standard Bloom Filter

One of the first probabilistic data structures to be introduce d was the Standard Bloom Filter (SBF), created by Burton Ho ward Bloom in 1970. The filter is very helpful in determining whether an element belongs to or is a member of a given set or not (with some chances of false positives SBF utilizes a bit array of size m in conjunction with k independent hash functions. At first, all bits within the bit array are initialized to 0. Whenever an element is inserted, it is hashed by these k hash functions. The bits that correspond to those elements are set to 1. The bits are later compared, and if all bits set to one earlier are one, then the element is placed in the set. This filter is space-efficient and provides constant-time operations O(k) for insertion and query. It is commonly used for packet filtering in network routers and query optimization in databases; however, the SBF does have limitations, such as the inability to delete, and as the filter gets saturated, an increasing likelihood of false positives. Dual fingerprinting helps reduce false positives, and the modular design provides a foundation for implementing deletion strategies.

Scalable Bloom Filter

The standard Bloom filter’s cap size issues are resolved by the Scalable Bloom Filter (SBF), which as the name states, is a lot more scalable [11]. In cases where a standard filter is being used, and the number of elements is above a certain limit, the likelihood of false positives is much greater. To make life easier, SBF dynamically adjusts its sizing. It has multiple standard bloom filters linked together in a sequential manner, with the new filters being inserted at a back when the filter being filled becomes full. This system effectively reduces the probability of false positives while also rationing memory, but it does come at a cost. The SBF can be harder to implement and require more memory than a normal Standard Bloom filter. However, it shines in applications like distributed databases and data streaming as those are hard to estimate the space needed. It has a variable size to cater for growing datasets; however, this incurs a penalty for the memory in use with a higher false positive rate as more filters get appended to the sequence. The layered configuration makes it a tedious task to manage several filters. The E-HMDBF fixes this problem with its dual fingerprinting modular design, which lowers the chances of false positives without introducing extra layers. The memory footprint remains static, with improved accuracy and much simpler management compared to the Scalable Bloom Filter.

Counting Bloom Filter

The Counting Bloom Filter (CBF) is an enhancement of the Standard Bloom Filter [8] that allows for the deletion of elements, a feature that the Standard Bloom Filter lacks. The CBF adopts an alternative method by utilizing an array of counters rather than a bit array. The array is initialized based on the number of times a given bit has been updated. Whenever an element is introduced, the filter increases the corresponding counters instead of eliminating the element from the filter. To remove the element from the filter, the counter must be decremented. One can check if the element belongs to the filter by checking whether all the related counters are positive. There is a disadvantage to CBF, which is the counters overflow that can occur to users who do not consider measures when dealing with high-turnover data sets. The CBF also shares the same disadvantages that Standard Bloom Filters share in terms of false positives and higher memory utilization. The CBF is able to delete elements and is cheaper to use compared to the Standard Bloom Filter, at a cost though, that is less efficient as far as using memory is concerned. The CBF has grown widely popular among consumers of systems used for caching as well as routers in a network, whereby elements need dynamic deletion for adequate performance. The SHBF achieves similar accuracy without counters, reducing memory overhead and complexity.

Quotient Bloom Filter

The Quotient Filter represents another memory-efficient probabilistic data structure [1] employed in membership tests, insertions, and deletions. It divides the hash value of an item into a quotient and the remainder stores the latter in a hash table entry, where the quotient is employed to report the location. It is yet another memory efficient probabilistic data structure employed in membership tests, insertions, and deletions. It separates the hash value of an element into a quotient and a remainder, storing the remainder in a hash table entry, while the quotient is used for pointing. It divides an element's hash value into a quotient and a remainder, whose storage is additionally in a hash table entry with the quotient employed for pointing of position. Limitation of Quotient Filter is greater complexity, performance loss at high load. The SHBF presents a simpler and more effective alternative with similar accuracy.

Cuckoo Bloom Filter

Unlike the Standard Bloom filter, the Cuckoo Filter is defined with a probability of False Positive outcomes [5] and allows insertions, lookups, and deletions. Elements are stored by means of cuckoo hashing, and their fingerprints are contained in a hash table where an element can be stored in two possible places. If both positions are filled, the filter kicks out a fingerprint, relocates it and allows the manage dynamic data sets in the Cuckoo Filter so that it can be utilized for networking and database applications. However, Cuckoo Filters suffer from high insertion complexity due to displacement chains, especially under heavy loads. This is also termed as ‘kick-out loop’ when the table becomes too full. This requires careful control of the load factor to alleviate the damage caused by the cost of fingerprint eviction slowing down insertion speed relative to the Bloom filter. Even though they’re less memory intensive than Counting Bloom Filters, Filters perform poorly with high- turnover data sets. The SHBF claims to maintain consistent insertions and scalability by partitioning entries into different modules at the cost of increased complexity, and claiming to use an adjustable fingerprint and hybrid bit-fingerprint compression to cut FPR and memory use fixed FPR and memory wastage from CBF.

For Full-length manuscript, kindly go through this PDF.