One of our clients once reached out to us for help with deep inspection of the traffic reaching their web servers, mostly from a compliance perspective.
The client was, by then, already running multiple kinds of workloads – mostly web or API interfaces exposed over the internet.
Now the internet is an exciting place to host one’s services, as well as to reach out and meet a wide range of client needs. It is also a place where clever minds deliberately exploit vulnerabilities in applications and services. Preventing these harmful attacks is therefore a necessity today. So every request that reaches your applications over the internet must be inspected for its ‘intentions’ – is it a legitimate one, or is it attempting to exploit a vulnerability that is yet to be fixed?
While legitimate requests are allowed to go through, the ones that are exploitative in nature must be blocked. This entails real-time inspection, but with negligible delay, to ensure the user experience remains unaffected.
The remedy proposed for this sort of situation was a reverse proxy behind a highly available load balancer. Given that the client’s deployment was on AWS Cloud, the natural solution was to use the AWS Elastic Load Balancer (ELB) and host the inspection service behind it. The inspection service would then relay requests and responses to and from the application services.
Now, I won’t get into the details of setting up the inspection service -- we will get to that some other time. However, I will touch upon one aspect of the integration with AWS ELB required to check off a key line item in the compliance checklist – identifying the source IP address from which a request originates.
Let’s quickly summarize the setup so far:
The connections are stateful TCP/IP connections, and the routing of established connections is managed at each hop. This means that every time a packet makes a hop through an address-translating device (such as a proxy or load balancer), the source and destination details of the connection are stored at that hop, while the device’s own address is written onto the packet as the new source to enable the return traffic.
The data packet(s) originating from the device making the request carry two key pieces of information – the IP address of the requesting device, a.k.a. the source IP, and the IP address of the ELB, a.k.a. the destination IP. The destination IP address field of the packet is used by the network devices that make up the internet to forward the packet(s) along to the ELB associated with that address. At this stage, the source IP address field still contains the IP address of the requesting device.
After the packet is processed by the ELB’s load-balancing logic, it is re-addressed to one of the hosts running the inspection service. The source IP address field will now contain the ELB’s IP address – this is what enables response packets to flow back to the ELB, the device that forwarded the request to the inspection service.
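The address rewriting described above can be sketched in a few lines of code. This is a simplified, hypothetical model – the `Packet` class, the `elb_forward` function, and all IP addresses are illustrative placeholders, not AWS internals:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src_ip: str    # source IP address field
    dst_ip: str    # destination IP address field
    payload: bytes

# Hypothetical addresses, for illustration only.
CLIENT_IP = "203.0.113.10"   # requesting device
ELB_IP = "198.51.100.5"      # load balancer
BACKEND_IP = "10.0.1.20"     # inspection-service host

# The ELB keeps per-connection state so responses can be
# routed back to the right client.
connection_table = {}

def elb_forward(packet: Packet, backend_ip: str) -> Packet:
    """Relay a client packet to a backend, rewriting addresses."""
    # Remember which client this backend connection belongs to.
    connection_table[backend_ip] = packet.src_ip
    # Source becomes the ELB and destination becomes the chosen
    # backend, so the backend only ever sees the ELB's address.
    return Packet(src_ip=ELB_IP, dst_ip=backend_ip, payload=packet.payload)

inbound = Packet(src_ip=CLIENT_IP, dst_ip=ELB_IP, payload=b"GET / HTTP/1.1")
relayed = elb_forward(inbound, BACKEND_IP)
print(relayed.src_ip)  # prints the ELB's IP, not the client's
```

Note how the client’s address survives only in the ELB’s connection table – which is exactly the problem described next.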
Therein lies the issue: the IP address of the original device making the request is held at the load balancer for the duration of the session, and is then lost to logging/auditing services. Thankfully, this is by design, and it is sometimes useful for enhancing security where the requesting device’s identity has to be abstracted. That said, this is not one of those cases. The same scenario recurs when the inspection service relays the request to the application service.
Here, we will look at sending the source IP address through the ELB to the inspection service. The same can be achieved on the inspection service itself when it relays to the application service, but we are not going to get into that; I am sure you will be able to figure it out on your own once you see how it’s done using the ELB.
Since the abstraction of the source IP is by design, the original requester’s IP address must be relayed as additional information through the ELB, i.e. in addition to the ELB’s source IP address. AWS ELB provides for sending this additional piece of information either in the request headers it forwards (the X-Forwarded-For header) or as a connection preamble (the Proxy Protocol). It is the job of the service at the receiving end (the inspection service in this case) to make sense of the additional data coming its way and extract the required information.
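For the header-based approach, extracting the original client IP is a small parsing exercise. A minimal sketch – the function name is my own, and the addresses are placeholders; the header format follows the common convention where each proxy appends the address it saw:

```python
from typing import Optional

def client_ip_from_xff(headers: dict) -> Optional[str]:
    """Extract the original client IP from an X-Forwarded-For header.

    Each proxy hop appends the client address it saw, so the
    left-most entry is the device that originated the request.
    """
    xff = headers.get("X-Forwarded-For")
    if not xff:
        return None
    return xff.split(",")[0].strip()

# The ELB appends its view of the client to any existing entries:
headers = {"X-Forwarded-For": "203.0.113.10, 198.51.100.5"}
print(client_ip_from_xff(headers))  # 203.0.113.10
```

One caveat worth noting: the left-most entry can be spoofed by a malicious client that sends its own X-Forwarded-For header, so in practice a service should only trust the entries appended by its own load balancer.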
Once the service loads the right module (the myfixip module for Apache in this case), Apache can read the format in which the ELB forwards the original requester’s IP address over the Proxy Protocol. Apache can then use this to log the real client IPs; alternatively, the application service can pull that information and run its logic against it.
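For the header-based variant, Apache’s stock mod_remoteip module (Apache 2.4+) achieves a similar result without a third-party module. A hedged sketch – the subnet below is a placeholder for your ELB’s actual address range:

```apache
# Load the stock remote-IP module (Apache 2.4+).
LoadModule remoteip_module modules/mod_remoteip.so

# Take the real client address from this header...
RemoteIPHeader X-Forwarded-For

# ...but only trust it when the connection comes from the ELB's
# subnet (10.0.0.0/16 is a placeholder; use your VPC's ranges).
RemoteIPInternalProxy 10.0.0.0/16

# %a now logs the recovered client address instead of the ELB's.
LogFormat "%a %l %u %t \"%r\" %>s %b" elb_combined
```

Restricting `RemoteIPInternalProxy` to the load balancer’s range is what prevents clients from spoofing the header directly.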