
Big Data Engineer Resume


Quincy, MA

PROFESSIONAL SUMMARY:

  • Over 7 years of experience in software development, including 5 years in Big Data as a Hadoop/Spark developer, 2+ years in AWS, and 2+ years in the analysis, design, development and testing of enterprise software applications using Java/J2EE technologies.
  • Strong hands-on experience in developing and integrating big data applications that allow for flexible and scalable data transformation with data quality controls.
  • Experience working with major components in the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, HBase, Flume, Spark, Storm, Kafka, Oozie and ZooKeeper.
  • Excellent understanding of Hadoop architecture (YARN and MRv1) and experience in setting up, configuring and monitoring Hadoop clusters.
  • Used Spark API over Cloudera and HDP to perform analytics on data in Hive.
  • Analyzed large data sets of structured, semi-structured and unstructured data using AWS Athena, HiveQL and Spark programs.
  • Worked with data in various file formats including Avro, ORC and Parquet.
  • Developed Spark code using Scala and Spark-SQL for data processing and testing.
  • Experienced with AWS services such as EMR, EC2 and S3, which provide fast and efficient processing of big data.
  • Experienced in Extraction, Transformation, and Loading (ETL) processes driven by business needs, using AWS Glue and Oozie to execute multiple Spark, Hive, shell and SSH actions.
  • Good understanding of NoSQL databases like HBase and Cassandra.
  • Implemented Talend Big Data jobs to load data into HDFS, S3 and Hive.
  • Experienced in Cluster maintenance, commissioning and decommissioning data nodes, troubleshooting, managing & reviewing Hadoop log files.
  • Solid understanding and extensive experience in working with different databases such as Oracle, SQL Server, MySQL and writing stored procedures, functions, joins and triggers for different data models.
  • Excellent Java development skills using J2EE, Servlets and JUnit, and familiarity with popular frameworks such as Spring and Hibernate.
  • Extensive experience in PL/SQL, developing stored procedures with optimization techniques.
  • Expertise in Waterfall and Agile-Scrum methodologies.
  • Excellent team player, with pleasant disposition and ability to lead a team.

TECHNICAL SKILLS:

Programming Skills: Python, Scala, Java, Shell Scripting.

Big Data Hadoop: MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Spark, Cassandra, Oozie, Flume, Ambari, Kafka, Mahout, Impala, Hue.

Amazon Web Services: EC2, S3, EMR, DynamoDB, Redshift, Athena, Glue.

Databases: Oracle, MySQL, SQL Server 2008, Apache Cassandra, MongoDB, HBase.

Operating Environment: Red Hat Linux 6/5.2, UNIX and IBM AIX 7.1.

Developer Tools: IntelliJ, Eclipse IDE, Toad, SQL Developer.

Frameworks: Struts, Hibernate, Spring MVC, Spring Core, Spring JDBC.

Tools: FileZilla, Visual Studio, Tableau, Zeppelin

PROFESSIONAL EXPERIENCE:

Big Data Engineer

Confidential, Quincy, MA

Responsibilities:

  • Redesigned and architected a new data ingestion and load solution for the Confidential DWH using Spark, Scala and Hive on a YARN cluster to improve performance.
  • Performed API calls using Python scripting and performed reads and writes to S3 using the Boto3 library (see the Boto3 sketch after this list).
  • Worked on a data lake creation project using Hive, Sqoop, Spark and AWS S3.
  • Developed a custom parser in Spark/Scala for complex healthcare data and performed CDC; deployed this application on an AWS EMR cluster as well as on Hortonworks (HDP) cloud.
  • Developed an Airflow DAG for daily incremental loads that pulls data from Oracle and imports it into Hive tables using Sqoop (see the Airflow sketch after this list).
  • Worked on a Scala code base for Apache Spark, performing transformations and actions on RDDs, DataFrames and Datasets using Spark SQL.
  • Performed data ingestion using Sqoop, used HiveQL for data processing, and scheduled complex workflows using Oozie.
  • Used AWS EMR for processing ETL jobs and loading to S3 buckets, and AWS Athena for ad-hoc/low-latency querying of S3 data.
  • Implemented bucketing in Hive as well as partitioning and dynamic partitions in Spark and Hive (see the partitioning sketch after this list).
  • Executed complex HiveQL queries for required data extraction from Hive tables.
  • Strong familiarity with Hive joins; used HiveQL for querying the databases, eventually leading to complex HiveQL.
  • Developed Slowly Changing Dimensions (SCDs), populating the data to S3 using Spark/Scala.
  • Worked with large databases and wrote complex SQL queries to get the right input files to process and analyze in the Hadoop environment.
  • Developed Oozie workflows to ingest/parse the raw data, populate staging tables and store the refined data in partitioned Hive tables.
  • Assisted in installing, configuring and maintaining a Hadoop cluster on Hortonworks Data Platform with Hadoop tools like Spark, Hive, Pig, HBase, ZooKeeper and Sqoop for application development.
  • Actively monitored, researched and analyzed ways in which the AWS services could be improved.
  • Leveraged build tools such as SBT and Maven for building the Spark applications.
  • Implemented CI/CD for Spark applications using Jenkins and Git.
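
A minimal sketch of the S3 reads and writes via Boto3 mentioned above; the bucket and object keys are placeholders, not details from the original project.

    import boto3

    BUCKET = "example-dwh-landing"   # hypothetical bucket name

    s3 = boto3.client("s3")

    # Upload a local extract to S3.
    s3.upload_file("daily_extract.csv", BUCKET, "incoming/daily_extract.csv")

    # Read the object back and decode its contents.
    obj = s3.get_object(Bucket=BUCKET, Key="incoming/daily_extract.csv")
    payload = obj["Body"].read().decode("utf-8")
    print(payload[:200])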
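
A simplified sketch of the daily incremental Airflow DAG described above (Oracle to Hive via Sqoop, then a Spark merge). The connection string, table names and schedule are assumptions for illustration, and Airflow 1.x import paths are used.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

    default_args = {"owner": "etl", "retries": 1, "retry_delay": timedelta(minutes=10)}

    dag = DAG(
        dag_id="daily_oracle_to_hive",          # hypothetical DAG name
        default_args=default_args,
        schedule_interval="@daily",
        start_date=datetime(2019, 1, 1),
        catchup=False,
    )

    # Incremental Sqoop import from Oracle into a Hive staging table.
    sqoop_import = BashOperator(
        task_id="sqoop_incremental_import",
        bash_command=(
            "sqoop import "
            "--connect jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB "   # placeholder connection
            "--username etl_user --password-file /user/etl/.pwd "
            "--table CLAIMS --hive-import --hive-table staging.claims "
            "--incremental lastmodified --check-column UPDATED_AT "
            "--last-value '{{ prev_ds }}'"
        ),
        dag=dag,
    )

    # Spark job that merges the staged increment into the refined Hive tables.
    spark_merge = BashOperator(
        task_id="spark_merge_refined",
        bash_command="spark-submit --master yarn /opt/etl/merge_incremental.py {{ ds }}",
        dag=dag,
    )

    sqoop_import >> spark_merge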
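
A short sketch of the dynamic-partition loads from Spark into Hive noted above, written here in PySpark for illustration (the original work used Scala); the table and column names are placeholders.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("dynamic-partition-load")       # hypothetical app name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Allow Hive dynamic partitioning for the write below.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    df = spark.table("staging.claims")           # placeholder source table

    # Write into a partitioned Hive table; the partition column (load_date)
    # is resolved dynamically from the data rather than hard-coded.
    (
        df.write
        .mode("overwrite")
        .format("orc")
        .partitionBy("load_date")
        .saveAsTable("refined.claims_partitioned")
    )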

Environment: Spark, Hive, HDFS, Sqoop, AWS EC2, AWS EMR, AWS S3, Airflow, Hortonworks Data Platform, Maven, MySQL, AWS Athena, Agile-Scrum, Scala, Python, PuTTY, IntelliJ, Git.

HADOOP/SPARK DEVELOPER

Confidential, Reston, VA

Responsibilities:

  • Migrated the existing architecture to Spark Streaming for live streaming of data (see the streaming sketch after this list).
  • Executed Spark code using Scala for Spark Streaming/SQL for faster processing of data.
  • Developed Oozie Bundles to schedule Sqoop and Hive jobs to create data pipelines.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Python.
  • Developed a Kafka producer for message handling (see the producer sketch after this list).
  • Involved in importing real-time data into Hadoop using Kafka and implemented the corresponding Oozie jobs.
  • Used the AWS CLI for data transfers to and from Amazon S3 buckets.
  • Executed Hadoop/Spark jobs on AWS EMR using data stored in S3 buckets.
  • Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created DataFrames in Spark to perform further analysis.
  • Worked with data serialization formats, converting complex objects into sequences of bytes using ORC and Parquet.
  • Worked on analyzing and examining customer behavioral data using HiveQL.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Involved in daily SCRUM meetings to discuss the development/progress.
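
A minimal sketch of consuming a live Kafka feed from Spark; this uses the Structured Streaming Kafka source in PySpark for illustration (the original work used Scala), and the broker, topic and schema are placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Requires the spark-sql-kafka connector package on the classpath.
    spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

    # Placeholder schema for the incoming JSON events.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the live feed from Kafka (broker and topic are placeholders).
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "customer-events")
        .load()
    )

    # Kafka delivers bytes; cast the value to string and parse it as JSON.
    events = (
        raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    # Land the parsed events in HDFS as Parquet with checkpointing.
    query = (
        events.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/streams/customer_events")
        .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
        .start()
    )
    query.awaitTermination()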
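
A small illustrative sketch of a Kafka producer for message handling, here using the kafka-python client (an assumption; the broker, topic and event fields are placeholders).

    import json
    from kafka import KafkaProducer

    # Serialize message values as JSON bytes before sending.
    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],                     # placeholder broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        acks="all",
    )

    def publish_event(event: dict) -> None:
        """Send one event to the topic and block until the broker acknowledges it."""
        future = producer.send("customer-events", value=event)  # placeholder topic
        future.get(timeout=10)  # raises if not acknowledged in time

    publish_event({"event_id": "e-123", "customer_id": "c-9", "amount": 42.5})
    producer.flush()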

Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Maven, MySQL, AWS, Agile-Scrum, Scala, PuTTY, IntelliJ, Git.

HADOOP/SPARK DEVELOPER

Confidential, Tampa, FL

Responsibilities:

  • Used Sqoop to ingest data from servers into HDFS and Hive as part of data acquisition.
  • Used Spark to remove missing data and to perform transformations that create new features in the pre-processing phase (see the pre-processing sketch after this list).
  • Used Hive and Impala to gain insights about the customer data in the data exploration stage.
  • Used Sqoop, Hive, Spark and Oozie for building data pipeline.
  • Installed and configured Hadoop MapReduce and HDFS.
  • Developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Implemented Spark RDD transformations to map business analysis and apply actions on top of transformations.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in configuring Hadoop ecosystem components like HBase, Hive, Pig and Sqoop.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the metrics sketch after this list).
  • Worked with the business analyst team for gathering requirements and client needs.
  • Hands on experience in loading data from UNIX file system and Teradata to HDFS.
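
A brief sketch of the pre-processing step described above (dropping missing values and deriving new features); the table and column names are placeholders and PySpark is used for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("customer-preprocessing").enableHiveSupport().getOrCreate()

    raw = spark.table("staging.customer_activity")   # placeholder Hive table

    cleaned = (
        raw
        # Drop rows missing the fields required downstream.
        .dropna(subset=["customer_id", "txn_amount", "txn_date"])
        # Derive simple features used in the exploration stage.
        .withColumn("txn_month", F.month("txn_date"))
        .withColumn("is_high_value", (F.col("txn_amount") > 500).cast("int"))
    )

    # Persist the engineered features for Hive/Impala exploration.
    cleaned.write.mode("overwrite").saveAsTable("analytics.customer_features")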
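
An illustrative query of the kind used to compute reporting metrics over partitioned and bucketed Hive data, run here through Spark SQL; the table, columns and date filter are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-metrics").enableHiveSupport().getOrCreate()

    # Aggregate per-region metrics; partition pruning on txn_date keeps the scan small.
    metrics = spark.sql("""
        SELECT region,
               COUNT(DISTINCT customer_id) AS active_customers,
               SUM(txn_amount)             AS total_spend
        FROM sales.transactions_partitioned
        WHERE txn_date >= '2018-01-01'
        GROUP BY region
    """)

    metrics.show(truncate=False)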

Environment: Hadoop, MapReduce, Spark, Spark SQL, Flume, HDFS, Hive, Pig, Sqoop, Oozie, SQL, Scala, Java, ZooKeeper, Shell script.

HADOOP DEVELOPER/ADMINISTRATOR

Confidential, Bentonville, AR

Responsibilities:

  • Installed and configured Hive, Pig, Sqoop and Flume on the Hadoop cluster.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Processed logs and semi-structured content using Pig.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Hive scripts for implementing dynamic partitions (see the sketch after this list).
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Used Sqoop to efficiently transfer data between databases and HDFS.
  • Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components like Hive and HBase.
  • Involved in gathering the requirements, designing, development and testing.
  • Worked on loading and transformation of large sets of structured and semi-structured data into the Hadoop system.
  • Debugging and identifying issues reported by QA with the Hadoop jobs.
  • Worked with the Hue interface for querying the data.
  • Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
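
A sketch of a dynamic-partition Hive load like the scripts mentioned above, issued here through the PyHive client (an assumption; the host, database, tables and partition column are placeholders).

    from pyhive import hive

    # Connect to HiveServer2 (host and database are placeholders).
    conn = hive.connect(host="hive-server.example.com", port=10000, database="weblogs")
    cursor = conn.cursor()

    # Enable dynamic partitioning so the partition value comes from the data.
    cursor.execute("SET hive.exec.dynamic.partition = true")
    cursor.execute("SET hive.exec.dynamic.partition.mode = nonstrict")

    # Load parsed log records into a table partitioned by log_date.
    cursor.execute("""
        INSERT OVERWRITE TABLE weblogs.page_views PARTITION (log_date)
        SELECT user_id, url, response_code, log_date
        FROM weblogs.page_views_staging
    """)

    cursor.close()
    conn.close()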

Environment: Hadoop (HDP 2.x), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, Flume, Python, Sqoop, MySQL, Tableau.

JAVA DEVELOPER

Confidential

Responsibilities:

  • Designed and developed user interface using Struts tags, JSP, HTML and JavaScript.
  • Developed the user-specific highlights (dashboard menu) section, home page, admin home page and user module (modify/search users, create-user screens with assignment of various roles) using the Spring MVC framework, Hibernate ORM module, Spring Core module, XML, JSP and XSLT.
  • Implemented various UI components using jQuery, HTML and CSS.
  • Implemented various services required for the business application.
  • Involved in designing the user interfaces using HTML, CSS, and JSPs.
  • Configured Hibernate and Spring to map the business objects to Oracle Database using XML configuration file.
  • Involved in writing shell scripts to export Oracle table data into flat files, performed unit testing using JUnit, and used Log4j for logging and automatic batch jobs.
  • Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.

Environment: Spring Core, Spring MVC, Hibernate, HTML, CSS, jQuery, AJAX, IBM WebSphere, Oracle, Eclipse, Maven, JIRA, Git.
