Big Data ETL Engineer Resume


  • Experienced in interpreting the requirements of various Big Data Analytic Use Cases and Scenarios and driving the design and implementation of specific data models to ultimately help drive better business decisions through insights from a combination of external data assets.
  • Capable of developing end-to-end architecture for huge data sets using design patterns.
  • Proficient in batch processing of data via MongoDB and Solr, and in stream processing of data via the Storm API, Java and Drools; automated it through Jenkins and visualized it with Kibana and Banana.
  • Well versed in understanding problems and finding quick solutions using various tools and languages.
  • Hands-on experience working on an advanced analytics platform for an end-to-end incident management system.
  • Experienced in developing and implementing architectural solutions using PySpark.
  • Expertise in writing Hadoop jobs for analysing and transforming data using PySpark and Hive.
  • Proficient in processing large sets of structured, semi-structured and unstructured data and supporting systems application.
  • Well versed in defining data requirements, gathering and mining large-scale structured and unstructured data, and validating data by running various data tools in the Big Data environment.
  • Experienced in preparing and executing Unit Test Plan and Unit Test Cases after software development.
  • Involved in an Agile Scrum process that leverages the client's big data platform, and used Git for version control.
  • Coordinated with the offshore team and cross-functional teams to ensure that applications are properly tested, configured and deployed.
  • Excellent analytical, interpersonal and communication skills; fast learner, hardworking and a good team player.
  • Passionate about exploring and experiencing new technologies and tools.


Programming: Python, Java, SQL, HQL, Bash Scripting, Rules Engine (Drools).

Big Data Ecosystems and languages: Spark v2.3, Hive, Sqoop, Kafka and Java.

Streaming and search platform: Apache Storm and Apache Solr with Banana.

Data visualization: ELK (Elasticsearch, Logstash and Kibana).

NoSQL databases: MongoDB and Solr.

IDE and Automation Platforms: PyCharm, Eclipse, IntelliJ, PuTTY, Robo 3T, Jenkins.

Platforms: Windows, Linux, Mac.

Methodologies: Agile-Scrum.

Versioning: Git, Bitbucket, Codecloud.



Big Data ETL Engineer


  • Involved in the full life cycle implementation of ETL using Oracle and SQL Server, and helped design the data warehouse by defining facts, dimensions and the relationships between them, applying the corporate standards for naming conventions.
  • Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data.
  • Worked closely with the QA and production support teams by providing components, documentation, validation and knowledge transfer on new projects, and by debugging issues.
  • Designed and developed ETL code using Informatica mappings to load data from heterogeneous source systems such as flat files, XML files, MS Access files and Oracle into the Oracle staging area, then into the data warehouse, and then into Data Mart tables for reporting.
  • Created workflow instances and improved performance on large data volumes using round-robin and key-range partitioning and pushdown optimization.
  • Imported and exported data between Oracle and HDFS using Sqoop for analysis, visualization and report generation.
  • Created Hive external tables and partitioned tables using Hive indexes, and used HQL to ease data analytics.
  • Used TOAD to create, execute and optimize SQL queries to analyze data, support BAs, and create various DDL and DML scripts for daily activities. Processed very large data volumes and optimized performance with database partitioning, partition exchange, hints, DOP, dropping/rebuilding indexes and gathering stats on tables/partitions.
  • Wrote Spark programs to perform data cleansing, transformation and joins.
  • Wrote ETL jobs using Spark and tuned the performance of Hive queries.
  • Created Hive queries to join multiple tables of a source system and load them into Elasticsearch tables, and used HiveQL scripts to perform the incremental loads.
  • Managed Hadoop jobs and the logs of all scripts, and used Oozie to schedule the workflows that execute the batch jobs.
  • Used shell scripts to call SQL and to run pre- and post-ETL steps such as file validation, zipping, massaging and archiving of source and target files, and used UNIX scripting to manage the file systems.
  • Automated and scheduled Oracle, Informatica and Spark batch jobs using the PHP Program Engine, triggered by file watchers or run daily, weekly and on special request.
  • Worked closely with the Platform Service teams to set up Dev and QA environments based on the configurations, including refreshing data, creating future partitions and configuring environment variables to accommodate the framework, and helped with handshake testing.
  • Participated in migrating code from Development to Test and Production, and provided migration documentation such as environment preparation, deployment components, batch execution instructions and DFDs for jobs.

Core technical areas worked on this project: Spark, Hive, PHP, Sqoop, Cloudera, Hadoop, Python, HDFS, Oozie, HBase, Oracle 11g, PL/SQL, UNIX, Toad, SVN.


Applications Developer/Big Data Engineer


  • Was responsible for interpreting the requirements of various Big Data Analytic Use Cases and Scenarios and driving the design and implementation of specific data models to ultimately help drive better business decisions through insights from a combination of external and internal Confidential & Confidential’s data assets.
  • Was also accountable for developing the necessary enablers and data platform in the Big Data Lake Environment and had the responsibility of maintaining its integrity during the life cycle phases.
  • Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using the Hadoop ecosystem.
  • Used core Java concepts to build a working model to stream in alarms via Storm from various equipment across the mainland U.S.
  • Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Maintained and queried the batch data and the streaming data of the end-to-end incident management project.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Experience in manipulating/analysing large datasets and finding patterns and insights within structured and unstructured data.
  • Involved in developing Bash scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Because the data volume is very large, optimization is critical; used search algorithms in Java to optimize searches over the streaming data.
  • Used Jenkins to automate daily job runs and extended the jobs to include advanced filtering using Bash scripting and MongoDB queries.
  • Experience in designing and developing applications in Spark using PySpark to compare the performance of Spark with Hive and SQL/Oracle.
  • Proficient in querying MongoDB; wrote a Java program to convert MongoDB collections into Hive tables for the ML team's prediction analysis.
  • Experienced in working with the Spark ecosystem using Spark SQL and PySpark queries on different formats such as text and CSV files.
  • Developed Spark scripts using PySpark shell commands as per the requirements.
  • Developed Spark code using Spark SQL and Spark Streaming for faster testing and processing of data.
  • Exported the analysed data to relational databases using Sqoop and used Kibana for visualization and to generate reports.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.

Core technical areas worked on this project: Java, Python, Apache Spark, Apache Hive, Git, Jenkins, DVC, MongoDB, Rules Engine (Drools), Apache Storm, ELK, Docker, Kubernetes.


Hadoop Developer


  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Coordinated with business customers to gather business requirements, and interacted with other technical peers to derive technical requirements.
  • Extensively involved in Design phase and delivered Design documents.
  • Involved in testing and coordinated with the business on user testing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Experienced in defining job flows.
  • Used Hive to analyze the partitioned data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as an ETL tool to do transformations, joins and some pre-aggregations.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Responsible for managing data coming from different sources.
  • Created the data model for Hive tables.
  • Involved in Unit testing and delivered Unit test plans and results documents.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.

Core technical areas worked on this project: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Java, Python.