One of our clients once reached out to us for help with deep inspection of the traffic reaching its web servers, mainly for compliance purposes.
By then, the client was already running several kinds of workloads, most of them web or API interfaces exposed over the internet.
Now the internet is an exciting place to host one’s services and to reach out and meet a wide range of client needs. It is also a place where intelligent minds deliberately exploit vulnerabilities in applications and services. Preventing these attacks is therefore a necessity today. Every request that reaches your applications over the internet must be inspected for its ‘intentions’: is it legitimate, or is it attempting to exploit vulnerabilities that are yet to be fixed?
Legitimate requests are allowed through, while exploitative ones must be blocked. This calls for real-time inspection with negligible delay, so that the user experience remains unaffected.
The solution proposed for this situation was a reverse proxy behind a highly available load balancer. Given that the client had its deployment on AWS Cloud, the natural choice was to use the AWS Elastic Load Balancer (ELB) and host the inspection service behind it. The inspection service would then relay requests and responses to and from the application services.
Now, I won’t get into the details of setting up the inspection service; we will get to that some other time. However, I will touch upon one aspect of the integration with AWS ELB that was required to check off a key line item in the compliance checklist: identifying the source IP address from which a request originates.
Let’s quickly summarize the setup so far:
The connections are stateful and use TCP/IP, and established connections are tracked at each hop that relays them, such as a load balancer or proxy. This means that every time a packet makes such a hop, the source and destination details of the connection are stored at the hop, while the hop device’s own address is written onto the packet as the source so that return traffic comes back through it.
The data packets originating from the device making the request carry two key pieces of information: the IP address of the requesting device (the source IP address) and the IP address of the ELB (the destination IP address). The network devices that make up the internet use the destination IP address field to send the packets along to the ELB associated with that address. At this stage, the source IP address field contains the IP address of the requesting device.
After the ELB applies its load-balancing logic, the packet is addressed to one of the hosts running the inspection service. The source IP address field now contains the ELB’s IP address, so that response packets from the inspection service go back to the ELB, which in turn returns them to the device that sent the request.
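To make the address rewrite concrete, here is a minimal sketch in Python. It is only a toy model of the hop described above, not how the ELB is implemented; the `Packet` class, the `relay_through_proxy` function, and all IP addresses are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Toy model of the addressing fields we care about (hypothetical)."""
    src_ip: str
    dst_ip: str
    payload: str


def relay_through_proxy(packet: Packet, proxy_ip: str, backend_ip: str) -> Packet:
    """Re-address a packet the way a proxying hop would.

    The proxy becomes the source and the chosen backend becomes the
    destination, so the original source address is no longer on the packet.
    """
    return replace(packet, src_ip=proxy_ip, dst_ip=backend_ip)


# Documentation-range and private addresses, used purely as examples.
client_packet = Packet(src_ip="203.0.113.7", dst_ip="198.51.100.10",
                       payload="GET / HTTP/1.1")
forwarded = relay_through_proxy(client_packet, proxy_ip="10.0.0.5",
                                backend_ip="10.0.1.20")

print(forwarded.src_ip)  # the proxy's address, not the client's
```

The backend only ever sees `forwarded`, which is why its logs would show the load balancer’s address unless the client address travels separately.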
Therein lies the issue: the IP address of the original requesting device is held at the load balancer for the duration of the session and is then lost to logging and auditing services. This behavior is by design, and it is sometimes useful for security when the requesting device’s identity has to be abstracted. That said, this is not one of those cases. The same scenario recurs when the inspection service relays the request to the application service.
Here, we will look at passing the source IP address through the ELB to the inspection service. The same can be achieved on the inspection service, but we are not going to get into that. I am sure you will be able to figure it out on your own once you see how it’s done with the ELB.
Since the abstraction of the source IP is by design, the original requester’s IP address must be relayed as additional information through the ELB, i.e. in addition to the ELB’s own source IP address. AWS ELB can send this additional piece of information only in a header that it adds to the request it forwards (see the ELB documentation on the X-Forwarded-For header and on Proxy Protocol). It’s the job of the service at the receiving end (the inspection service in this case) to make sense of the additional data coming its way and extract the required information.
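As an illustration of what “making sense of the additional data” can look like for the X-Forwarded-For header, here is a small Python sketch. It assumes the load balancer appends the address it saw to the end of the header, so only the rightmost entries can be trusted; the function name `client_ip_from_xff` and the `trusted_proxy_count` parameter are hypothetical and not part of any AWS API.

```python
import ipaddress
from typing import Optional


def client_ip_from_xff(header_value: str, trusted_proxy_count: int = 1) -> Optional[str]:
    """Return the client IP recorded by the outermost trusted proxy.

    Entries to the left of the trusted ones may have been supplied by the
    client itself, so they are ignored. Returns None if the header is
    empty, too short, or the chosen entry is not a valid IP address.
    """
    entries = [e.strip() for e in header_value.split(",") if e.strip()]
    if trusted_proxy_count < 1 or len(entries) < trusted_proxy_count:
        return None
    candidate = entries[-trusted_proxy_count]
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return None


# One trusted hop (the load balancer): take the last entry.
print(client_ip_from_xff("192.0.2.99, 203.0.113.7"))
```

Taking the leftmost entry is a common mistake, because a client can send its own X-Forwarded-For header and the load balancer will simply append to it.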
Once the service uses the right module (such as myfixip with Apache in this case), Apache can read the format in which the ELB forwards the original requester’s IP address over the proxy headers. Apache can then use this to log the original IP addresses. Alternatively, the application service can pull that information and run its own logic against it.
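For readers who want to see what such a module has to decode, here is a minimal Python sketch of parsing a Proxy Protocol header. It assumes the human-readable version 1 format (`PROXY TCP4 <source> <destination> <source port> <destination port>` followed by CRLF) sent as the first line on the connection; `ProxyInfo` and `parse_proxy_v1` are hypothetical names, and this is not how myfixip itself is implemented.

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    src_ip: str    # original requester's address
    dst_ip: str    # address the requester connected to
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> Optional[ProxyInfo]:
    """Parse one Proxy Protocol v1 header line.

    Returns None for 'PROXY UNKNOWN' (no address information available)
    and raises ValueError for anything malformed.
    """
    if not line.startswith(b"PROXY ") or not line.endswith(b"\r\n"):
        raise ValueError("not a Proxy Protocol v1 header")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[1] == "UNKNOWN":
        return None
    if parts[1] not in ("TCP4", "TCP6") or len(parts) != 6:
        raise ValueError("malformed Proxy Protocol v1 header")
    # ip_address() and int() raise ValueError on invalid input.
    src = ipaddress.ip_address(parts[2])
    dst = ipaddress.ip_address(parts[3])
    return ProxyInfo(str(src), str(dst), int(parts[4]), int(parts[5]))


# Example header with documentation-range addresses (illustrative only).
info = parse_proxy_v1(b"PROXY TCP4 203.0.113.7 10.0.1.20 35646 80\r\n")
if info is not None:
    print(f"request from {info.src_ip}:{info.src_port}")
```

Whatever reads this header must do so before handing the connection to the HTTP parser, since the line precedes the actual request.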