Big Data Engineer Resume



  • Over 8 years of experience in Big Data technologies for banking and financial services
  • Significant expertise in implementing Big Data Ecosystem components like HDFS, Hive, Sqoop, Spark, Spark Core, Spark Streaming, Spark SQL, Zookeeper, Flume, Kafka, Oozie
  • Created end to end data pipelines using Big Data tools
  • Productionizing Big Data Applications
  • Experience creating data lakes and defining data layouts in consultation with Data Warehousing teams
  • Implemented partitioning and bucketing in Hive to optimize performance and developed HiveQL aggregations based on requirements
  • Experience importing and exporting data from databases such as MySQL, Oracle, and Teradata, and from S3 storage
  • Experience with NoSQL databases such as MongoDB, HBase, and Cassandra
  • Hands-on experience with message brokers such as Apache Kafka
  • Extensive experience with SQL, PL/SQL and database concepts
  • Worked extensively with Spark components such as RDD transformations, DataFrames, Spark MLlib, Spark SQL, and the Streaming API
  • Strong experience writing applications in Python using libraries such as Pandas and NumPy
  • Involved in writing SerDe regular expressions to read unstructured data from various sources
  • Experience onboarding new tools and technologies by carrying out proofs of concept and defining metrics to evaluate them
  • Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance
  • Experience working with Cloudera and Hortonworks Hadoop distributions
  • Strong communication skills and a professional attitude; able to work under pressure with enthusiasm and full commitment


Hadoop Core Services: HDFS, Spark, YARN

Hadoop Distribution: CDH 3 and 4, Hortonworks

Hadoop Data Services: Hive, Pig, Sqoop, Spark, Kafka

Hadoop Services: Zookeeper, Oozie

Programming Languages: Python, Scala, SQL, Shell Scripting

Python: Pandas, NumPy, Matplotlib, Plotly, Seaborn

Operating Systems: Windows, Linux, Unix, CentOS 5 and 6

IDE Tools: Eclipse, IntelliJ, NetBeans

Databases: MySQL, HBase, MongoDB, Oracle

Others: Git, PuTTY, Tableau


Confidential, NJ

Big Data Engineer

Roles & Responsibilities:

  • Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing that data in HDFS
  • Created partitioned and bucketed Hive tables in ORC file format with Zlib/Snappy compression from Avro tables
  • Involved in performance tuning of Hive from design, storage and query perspectives
  • Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it
  • Implemented Spark RDDs and DataFrames and performed transformations based on requirements
  • Involved in writing queries in Spark SQL using PySpark
  • Performed real-time analysis of the incoming data using Kafka consumer API, Kafka topics, Spark Streaming utilizing Scala
  • Developed Python scripts to collect data from source systems and store it on HDFS to run analytics
  • Analyzed the SQL scripts and designed a solution to implement them in Python
  • Implemented Oozie Operational Services for batch processing and scheduling workflows dynamically
  • Involved in designing and developing tables in HBase and storing data
  • Experienced in troubleshooting errors in HBase Shell/API, Hive
  • Worked on the Hortonworks Data Platform (HDP 2.4) Hadoop distribution, using Hive to store, query, and retrieve data

Environment: HDFS, Hive, Sqoop, Spark, Spark Streaming, Spark SQL, HBase, PySpark, Kafka, Scala, Python, Oracle

Confidential, NY

Big Data Engineer

Roles & Responsibilities:

  • Importing and exporting data into HDFS and Hive using Sqoop
  • Processed unstructured data using Hive
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive
  • Developed Hive queries and Spark SQL queries to analyze large datasets
  • Exported the result set from Hive to MySQL using Sqoop
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries
  • Worked on debugging and performance tuning of Hive
  • Gained experience in managing and reviewing Hadoop log files
  • Involved in scheduling Oozie workflows to run multiple Hive jobs
  • Used HBase as a NoSQL database
  • Actively involved in code review and bug fixing for improving the performance

Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Flume, Linux, HBase, Oozie


Hadoop Developer

Roles & Responsibilities:

  • Implemented proofs of concept on the Hadoop stack and various big data analytics tools, including migration from different databases to Hadoop
  • Developed Sqoop scripts to transfer data between Hive and a MySQL database
  • Responsible for analyzing and cleansing raw data by performing Hive queries
  • Used analytical tools including Hive, Spark with Cloudera distribution
  • Experience in creating Hive tables to store the processed results in a tabular format
  • Participated in development/implementation of Cloudera Hadoop environment
  • Implemented optimization techniques like partitions and bucketing to provide better performance with HiveQL queries
  • Created custom user defined functions in Hive using Python
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume
  • Designed technical solution for real-time analytics using Kafka and Spark
  • Designed and modified database tables and used MongoDB queries to insert and fetch data
  • Used PySpark for streaming, interactive queries, and iterative algorithms
  • Used Oozie operational services for batch processing and scheduling workflows dynamically

Environment: Hadoop, Spark, YARN, Flume, Hive, Sqoop, Oozie, MongoDB, HDFS, Zookeeper, Oracle, MySQL


Hadoop Developer

Roles & Responsibilities:

  • Importing and exporting data into HDFS from Relational databases using Sqoop
  • Created partitions and buckets by state in Hive to handle structured data.
  • Implemented dashboards that run HiveQL queries internally, including aggregation functions, basic Hive operations, and different kinds of join operations.
  • Implemented state-based business logic in Hive using Generic UDFs.
  • Involved in moving all log files generated from various sources to HDFS for processing through Kafka, Flume.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Hive UDFs and reused them for other requirements.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; gained good experience with Spark-Shell and Spark Streaming.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.

Environment: CDH 3, HDFS, Hive, Spark, Spark Streaming, Kafka, Flume, Sqoop, MongoDB, MySQL


SQL Developer

Roles & Responsibilities:

  • Designed, created, and maintained database objects such as tables, views, stored procedures, and user-defined functions using SQL Server
  • Applied business rules to perform extensive data scrubbing to maintain data quality and consistency.
  • Created tables, views, and indexes based on requirements
  • Created SQL reports, data extraction and data loading scripts for different databases and schemas
  • Designed databases and developed business intelligence analysis, design specifications, implementation, and reporting with Microsoft SQL Server
  • Identified relationships between tables, enforced referential integrity using foreign key constraints.
  • Worked in Production Support Environment as well as QA/TEST environments for projects, work orders, maintenance requests, bug fixes, enhancements, data changes, etc
  • Wrote packages to fetch complex data from different tables in remote databases using joins, subqueries, and database links
  • Developed and supported analysis solutions, data transformations, and reports.