Big Data Engineer/Cloud Data Engineer Resume

Chicago, IL


  • 8 years of experience in the IT industry on Big Data platforms, with extensive hands-on experience in the Apache Hadoop ecosystem and enterprise application development. Good knowledge of extracting models and trends from raw data in collaboration with the data science team.
  • Experience across the Hadoop ecosystem in ingestion, storage, querying, processing and analysis of big data
  • Hands-on experience with data analytics services such as Athena, Glue Data Catalog & QuickSight
  • Migrated Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR and Qubole
  • Experience in installing, configuring, supporting and managing Hadoop clusters using HDP and other distributions
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB and ElastiCache (Memcached & Redis)
  • Hands-on experience with tools like Pig & Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling and Zookeeper for coordinating cluster resources
  • Worked on a Scala code base for Apache Spark, performing actions and transformations on RDDs, DataFrames & Datasets using Spark SQL and Spark Streaming contexts
  • Proficient in analyzing large unstructured data sets using Pig, designing and developing POCs using MapReduce and Scala, and deploying them on a YARN cluster
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data
  • Good understanding of Apache Spark high-level architecture and performance tuning patterns
  • Parsed data from S3 via Python API calls through Amazon API Gateway, generating a batch source for processing
  • Good understanding of AWS SageMaker
  • Extracted, transformed and loaded data from different formats and sources, such as JSON and databases, and exposed it for ad-hoc/interactive queries using Spark SQL


Databases: Oracle, SQL Server, MySQL, HBase, MongoDB, Redshift, DynamoDB and ElastiCache

Data Visualization Tools: Cognos, Tableau

Machine Learning & Analytics Tools: AWS SageMaker, AWS Glue, AWS Athena

Cloud: AWS, Azure

Programming Languages: C++, Java, J2EE, Python, Scala, Shell scripting, Core Java, JDBC, C, PL/SQL, Perl

Web Technologies: HTML, JavaScript, CSS, J2EE, jQuery

Development Environments: Eclipse

Operating System: Linux, Unix, Windows

Integration Tools: Git, Gerrit, Jenkins, Ant, Maven

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, YARN, Impala, Sqoop, Flume, Oozie, Zookeeper, Spark, Scala, Storm, Kafka, Spark SQL, Azure SQL


Big Data Engineer/Cloud Data Engineer

Confidential - Chicago, IL


  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance and auto-scaling with AWS CloudFormation
  • Supported continuous storage in AWS using Elastic Block Store, S3 and Glacier; created volumes and configured snapshots for EC2 instances
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, developing predictive analytics using the Apache Spark Scala APIs
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop
  • Developed Hive queries to pre-process the data required for running the business process
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
  • Implemented a generalized solution model using AWS SageMaker
  • Extensive expertise using the core Spark APIs and processing data on an EMR cluster
  • Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline whose output is written to the Glue Catalog and can be queried from Athena
  • Programmed in Hive, Spark SQL, Java, C# and Python to streamline the incoming data, build and orchestrate data pipelines, and derive useful insights
  • Worked on an ETL pipeline to source tables and deliver calculated ratio data from AWS to the Datamart (SQL Server) & Credit Edge server
  • Experience in using and tuning relational databases (e.g. Microsoft SQL Server, Oracle, MySQL) and columnar databases (e.g. Amazon Redshift, Microsoft SQL Data Warehouse)

Environment & Tools: Hortonworks, Hadoop, HDFS, AWS Glue, AWS Athena, EMR, Pig, Sqoop, Hive, NoSQL, HBase, Shell Scripting, Scala, Spark, SparkSQL, AWS, SQL Server, Tableau

Big Data Engineer/Cloud Data Engineer

Confidential - Dover, NH


  • Worked closely with the business to transform business requirements into technical requirements as part of design reviews and daily project scrums; wrote custom MapReduce programs with custom input formats
  • Created Sqoop jobs with incremental load to populate Hive External tables
  • Worked on Partitioning, Bucketing, Join optimizations and query optimizations in Hive
  • Compared the performance of the Hadoop based system to the existing processes used for preparing the data for analysis
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT and UNION
  • Implemented the Java HBase MapReduce paradigm to load data into an HBase database on a 4-node Hadoop cluster
  • Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud-scale classified data stored in Cassandra
  • Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with HiveQL queries
  • Performed aggregations and analysis on large sets of log data, collected using custom-built input adapters
  • Evaluated the data import-export capabilities and data analysis performance of the Apache Hadoop framework
  • Involved in the installation of HDP Hadoop and the configuration of the cluster and ecosystem components such as Sqoop, Pig, Hive, HBase and Oozie
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
  • Tested raw data, executed performance scripts, and assisted with data capacity planning and node forecasting
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the Data science team
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms

Environment & Tools: Hadoop, Hive, Pig, Sqoop, Kafka, AWS EMR, AWS S3, AWS Redshift, Oozie, Flume, HBase, Hue, HDP, IBM Mainframes, HP NonStop and RedHat 5.6

Big Data Developer

Confidential - Phoenix, AZ


  • Worked on the Hortonworks HDP 2.5 distribution
  • Responsible for building scalable distributed data solutions using Hadoop
  • Involved in importing data from MS SQL Server, MySQL and Teradata into HDFS using Sqoop
  • Played a key role in dynamic partitioning and bucketing of the data stored in Hive
  • Wrote HiveQL queries integrating different tables to create views that produce result sets
  • Collected the log data from Web Servers and integrated into HDFS using Flume
  • Worked on loading and transforming large sets of structured and unstructured data
  • Used MapReduce programs for data cleaning and transformations, and loaded the output into Hive tables in different file formats
  • Created data pipelines for different events to load data from DynamoDB to an AWS S3 bucket and then into an HDFS location
  • Involved in loading data into HBase NoSQL database
  • Built, managed and scheduled Oozie workflows for end-to-end job processing
  • Worked on extending Hive and Pig core functionality by writing custom UDFs using Java
  • Analyzed large volumes of structured data using Spark SQL
  • Migrated HiveQL queries to Spark SQL to improve performance

Environment & Tools: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, SparkSQL

Hadoop Developer



  • Developed Hive scripts for performing transformation logic and loading data from the staging zone to the final landing zone
  • Involved in loading transactional data into HDFS using Flume for Fraud Analytics
  • Developed Python utility to validate HDFS tables with source tables
  • Designed and developed UDFs to extend the functionality of both Pig and Hive
  • Imported and exported data between MySQL and HDFS using Sqoop on a regular basis
  • Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata
  • Responsible for creating the mapping document from source fields to destination fields
  • Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs
  • Developed Oozie workflows for executing Sqoop and Hive actions
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources
  • Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing custom properties required for the workflow
  • Automated all the jobs for pulling data from an FTP server and loading it into Hive tables using Oozie workflows
  • Involved in developing Spark code using Scala and Spark-SQL for faster testing and processing of data

Environment & Tools: Hadoop, HDFS, Hive, HBase, Zookeeper, Oozie, Impala, Java, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python

SQL Developer



  • Worked on tables, packages, procedures, functions, collections, triggers, cursors, ref cursors, exceptions, views, synonyms, sequences, performance tuning, interfaces, APIs, lookups and processing constraints
  • Involved in providing the POC for the new web services DTO model flow implementation
  • Debugged and fixed Order Management, Purchase Order and Pricing issues in IAT, UAT and production
  • Developed DE fix scripts for the hold orders and corrected the process for the old existing orders.
  • Prepared test plan and test cases for various types of testing like unit, functional, performance and regression
  • Involved in documentation of functional and technical requirements specification
  • Involved in deploying and executing the code in Oracle
  • Involved in the integration of a third-party tool with Oracle
  • Worked on preparation of estimation plan to implement the change request based on the code freeze dates in different instances
  • Worked closely with the client to gather requirements for solutions
  • Completed requirement analysis and compiled a list of clarifications and issues
  • Responsible for ensuring code quality using SVN
  • Responsible for day-to-day production support operations, job monitoring, incident ticket resolution, on-time delivery and code deployment

Environment & Tools: Oracle 11g/10g, SQL*Plus, TOAD, SQL*Loader, SQL Developer, Shell Scripts, UNIX, Windows XP