Big Data Engineer Resume

Philadelphia, PA

SUMMARY

  • Around 7 years of professional IT experience; certified Hadoop Developer with over 5 years of Big Data ecosystem experience.
  • Efficient in data design and development using ETL methodologies.
  • In-depth and extensive knowledge of Spark and Hadoop architectures and their components, such as Spark DataFrames, Streaming Datasets, RDDs, HDFS, Job Tracker, Task Tracker, NameNode, DataNode, YARN, Resource Manager, Node Manager and MapReduce.
  • Extensive experience in processing data using core Spark (Spark SQL, Scala) and Spark Streaming with Kafka and Kinesis.
  • Hands-on experience in using Hadoop ecosystem components Hive, Pig, Oozie, Sqoop, HUE, HBase and MapReduce.
  • Experience in processing vast data sets by performing structural modifications to cleanse both structured and semi-structured data using Spark (Scala), HiveQL and Pig Latin.
  • Experience in performing specialized joins in Spark, Pig, Hive and MapReduce.
  • Hands-on usage of Databricks: creating clusters, Spark notebooks and scheduled jobs.
  • Familiar with AWS components such as S3, Glacier, Redshift, Kinesis, Lambda, CloudWatch, IAM and EC2.
  • Worked with multiple input file formats such as Parquet, ORC, JSON, Text/CSV and key-value formats.
  • Handled incremental data loads on HDFS and Hive tables by reorganizing partitions.
  • Solid experience in optimizing Hive queries using partitioning and bucketing techniques, which control data distribution to enhance performance (a minimal sketch follows this list).
  • Experience in writing custom UDFs in Scala and Java for Spark, Pig and Hive, extending their libraries as required for working with data in non-standard formats.
  • Extensive experience in using Hive UDAFs, UDTFs and Hive SerDes.
  • Hands on experience in developing Pig Scripts to perform data transformation operations, by implementing various analytical functions, for loading and evaluating data in the relations.
  • Experience in Importing and Exporting data from different RDBMS databases like MySQL, Oracle into HDFS using Sqoop.
  • Familiarity and good understanding of the Kafka and Kinesis messaging systems.
  • Familiar with writing Oozie workflows and Job Controllers for job automation.
  • Configured Hadoop clusters in Local (Standalone), Pseudo-Distributed and Fully-Distributed modes using the Cloudera and Hortonworks distributions.
  • Experience in developing shell scripts for system management and for automating BAU tasks.
  • Good exposure to NameNode Federation and YARN.
  • Integration of Spark/Hadoop ETL pipeline with Enterprise BI and reporting tools like Tableau and Pentaho.
  • Worked on NoSQL databases like HBase, Cassandra and MongoDB.
  • Well versed in cloud environments such as Amazon Web Services (AWS) and Rackspace, and in private cloud infrastructure on the OpenStack platform.
  • Object-oriented programming and SBT/Maven project management using Scala and Java related technologies.
  • Good understanding of and familiarity with Docker containers and orchestration.
  • Worked extensively on RDBMS systems such as SQL Server, MySQL and PostgreSQL, with good database programming experience in SQL and familiarity with front-end technologies HTML(5), CSS and JavaScript.
  • In-depth knowledge of the Digital Marketing, Travel and Cable domains.
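
The following is a minimal sketch of that partitioning and bucketing pattern driven from Spark; the staging table, column names and bucket count (staging_web_events, user_id, ts, 32 buckets) are illustrative assumptions rather than values from the projects below.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

// Hypothetical staging table and column names; the 32-bucket count is an arbitrary choice.
object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val events = spark.table("staging_web_events")
      .withColumn("event_date", to_date(col("ts")))

    // partitionBy prunes whole date directories for date-filtered queries;
    // bucketBy pre-clusters rows on user_id so joins on that key shuffle less data.
    events.write
      .partitionBy("event_date")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("web_events")

    spark.stop()
  }
}
```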

TECHNICAL SKILLS

Big Data Technologies: Spark Scala, Spark SQL, Spark Streaming, Apache Hadoop, Hive, Pig, Impala, Sqoop, Flume, Java Map Reduce, Kafka, Oozie, Hue, Zeppelin, Zookeeper, HBase, Cassandra

Programming Languages: SQL, Scala, UNIX Shell Scripting, Java, PL/SQL

Databases: SQL Server, MySQL and PostgreSQL, Oracle, MS-Access

ETL & Reporting Tools: Pentaho, Tableau, Power BI, Excel

Web & Java Technologies: HTML(5), CSS, JavaScript, XML, JQuery, PHP

Web Servers & Developer Tools: Tomcat, WebSphere, SBT, Maven, Eclipse, VMware vSphere, PuTTY, WinSCP

Cloud Platforms: Amazon Web Services, OpenStack, Rackspace

Monitoring tools: Cloudera Manager, Hadoop Resource Manager, Nagios, Ganglia

Version Control & Project Management Tools: GitHub, SVN, Bitbucket, Confluence, Jira, Slack

Operating Systems: Unix, Linux(CentOS, Ubuntu, Red Hat), Windows, Mac

Domain knowledge: Cable, Travel Data, Digital Marketing and Insurance

PROFESSIONAL EXPERIENCE

Confidential, Philadelphia, PA

Big Data Engineer

Responsibilities:

  • Developed and supported a generalized Spark-Scala application as an SBT project.
  • Implemented simple to complex transformation on Spark Data Frames, Streaming Data Frames and Datasets.
  • Integrated the Spark application with a wide range of data sources, i.e., RDBMS, NoSQL, streaming queues, AWS and SFTP servers.
  • Connecting to multiple data sources involved usage of the respective system-specific APIs, SDKs and dependencies.
  • Used Spark Scala and Spark SQL appropriately within the application to keep the code simpler and execution faster.
  • Processed Kafka messages with a JSON structure, parsed them and stored them as Parquet files in HDFS.
  • Hands-on with the Databricks platform: developed Scala notebooks, scheduled jobs and handled monitoring.
  • Managed Databricks clusters with configurations sized according to data volume.
  • Kinesis stream processing with complex JSON nested structures.
  • Used windowing and watermark functions for seamless processing of batches without data loss (see the sketch after this list).
  • Created Tableau dashboards on top of Oracle and Hive data to monitor the behavior of metrics and job performance.
  • Managed deployment and support of Spark applications across different clusters and environments such as Production, Staging and Development.
  • Architectural flow, Design documentation and wiki management on Confluence boards.
  • Jira task management on Kanban Board following agile methodology of SDLC.
  • Developed shell scripts to wrap the BAU jobs and worked hands-on with Kubernetes in a UNIX environment.
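
A minimal Structured Streaming sketch of the Kafka-to-Parquet flow with an event-time window and watermark, as referenced above; the broker address, topic name, JSON schema and HDFS paths are assumptions for illustration, not the project's actual values.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, window}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-parquet-sketch").getOrCreate()

    // Assumed shape of the incoming JSON messages.
    val schema = new StructType()
      .add("eventId", StringType)
      .add("eventType", StringType)
      .add("eventTime", TimestampType)

    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   // assumed broker
      .option("subscribe", "events")                       // assumed topic
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // The watermark bounds how late events may arrive; the 5-minute event-time window
    // lets completed batches be emitted without data loss or unbounded state.
    val counts = parsed
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window(col("eventTime"), "5 minutes"), col("eventType"))
      .count()

    counts.writeStream
      .outputMode("append")
      .format("parquet")
      .option("path", "hdfs:///data/events/agg")           // assumed output path
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()
      .awaitTermination()
  }
}
```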

Environment: Spark (Scala, SQL & Streaming), Hadoop (HDFS, Hive, HBase), AWS (S3, Redshift, Kinesis, CloudWatch, etc.), Kafka, UC4 Scheduling, Tableau, Java, SQL, CentOS, UNIX, PuTTY, JIRA, Confluence.

Confidential, Scottsdale, AZ

Big Data / Hadoop Developer

Responsibilities:

  • Highly experienced in batch processing of data using Pig and Hive.
  • Used various built-in functions and regex operations to extract sensible data; also developed Pig UDFs in Java to derive business metrics.
  • Optimized the processing by using specialized joins in Pig and bucketing in Hive.
  • Worked on generating reports using Spark SQL for fast processing of data (a minimal sketch follows this list).
  • Worked on DataFrames in Spark SQL to get data from Hive tables and applied transformations on the DataFrames.
  • Worked extensively on UNIX shell scripting to handle day-to-day jobs and automation.
  • Used Oozie workflows to automate various jobs, i.e., Pig, HQL and shell scripts.
  • Also used NiFi flow to automate the process of transmitting data through MFT.
  • Worked on developing different kinds of reports using different loaders and delimiters. Involved in end to end development and deployment of reports for various clients.
  • Worked on development of Kafka-Spark streaming integration to process real-time data.
  • Developed a Spark Streaming pipeline in Java to parse JSON data and store it in Hive tables.
  • Involved in and contributed to architectural discussions and managed the offshore team.
  • Worked closely with the Data Warehouse team and Business Analysts to close gaps in requirement gathering.
  • Used the JIRA tracker for daily work logs and SDLC management.
  • Awarded a SPOT recognition for successful deployment of the project within short deadlines.
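
A minimal sketch of the Spark SQL reporting pattern referenced above: pull Hive tables into DataFrames, join and aggregate, then persist the result back to Hive. The database, table and column names (sales.orders, sales.customers, customer_id, order_total, region) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object SparkSqlReportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-report-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hive tables exposed as DataFrames (names are assumptions).
    val orders    = spark.table("sales.orders")
    val customers = spark.table("sales.customers")

    // Join on the shared key and aggregate revenue per region for the report.
    val report = orders
      .join(customers, Seq("customer_id"))
      .groupBy(col("region"))
      .agg(sum(col("order_total")).as("total_revenue"))

    report.write.mode("overwrite").saveAsTable("reports.revenue_by_region")
    spark.stop()
  }
}
```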

Environment: Hadoop Architecture, HDFS, Map Reduce, Hive, Pig, Spark, Kafka, Sqoop, NiFi, Oozie, Control-M, Java, MySQL, CentOS, UNIX, PuTTY, JIRA.

Confidential, New York, NY

Big Data / Hadoop Developer

Responsibilities:

  • Extensive experience in processing data using Spark (Scala, Spark SQL) and Spark Streaming; hands-on experience in processing real-time data streams with Spark Streaming.
  • Integrated Spark Streaming with Kafka, Flume, Amazon Kinesis and Cassandra (see the sketch after this list).
  • Worked on developing Python scripts using REST and SOAP APIs.
  • Created and maintained PL/SQL scripts and stored procedures.
  • Designed, developed and maintained data extraction and transformation processes and ensured that data is properly loaded into and extracted out of our systems.
  • Developed stored procedures, functions and cursors using SQL and PL/SQL for transformations.
  • Used advanced features of PL/SQL like records, tables, object types and dynamic SQL.
  • Extensive experience in handling transient and persistent clusters on AWS using shell scripts.
  • Developed multiple Hive scripts and Redshift scripts for several workflows. Used Sqoop to efficiently transfer data between databases and HDFS.
  • Involved in architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
  • Connected the QlikView reporting tool to Redshift and generated reports.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Developed Unix Shell scripts to automate the cluster installation.
  • Involved in developing reports in Tableau and Apache Zeppelin.
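
A minimal sketch of the receiver-based Kinesis integration referenced above, using the spark-streaming-kinesis-asl module's KinesisUtils API; the stream name, region, endpoint and the trivial word count are illustrative assumptions rather than the project's actual logic.

```scala
import java.nio.charset.StandardCharsets

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object KinesisStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kinesis-streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val records = KinesisUtils.createStream(
      ssc,
      "kinesis-streaming-sketch",                   // KCL application (checkpoint) name
      "events-stream",                              // Kinesis stream name (assumed)
      "https://kinesis.us-east-1.amazonaws.com",    // regional endpoint (assumed)
      "us-east-1",
      InitialPositionInStream.LATEST,
      Seconds(10),                                  // KCL checkpoint interval
      StorageLevel.MEMORY_AND_DISK_2)

    // Each record arrives as raw bytes; decode and run a trivial per-batch word count.
    records
      .map(bytes => new String(bytes, StandardCharsets.UTF_8))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```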

Environment: Hadoop Architecture, HDFS, Map Reduce, Hive, Pig, Spark, Kafka, Storm, NiFi, Sqoop, Oozie, Flume, Java, MySQL, PL/SQL, Tableau, CentOS, Windows, Unix, AWS, PuTTY.

Confidential

Big Data / Hadoop Developer

Responsibilities:

  • Designed and implemented customization of Keys, Values, Partitioners, Combiners, Input Formats and Record Readers in Java.
  • Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data that cannot be handled with Hive built-in functions (a minimal sketch follows this list).
  • Worked with multiple Input Formats such as Text File, Key Value, Sequence File input format.
  • Worked with multiple file formats JSON, XML, Sequence Files and RC Files.
  • Deployed and configured Flume agents to stream log events into HDFS for analysis. Transformed the log files into structured data using Hive SerDe’s and Pig Loaders.
  • Parsed JSON and XML files in PIG using Pig Loader functions and extracted meaningful information from Pig Relations by providing a regex using the built-in functions in Pig.
  • Involved in creating Hive internal and external tables, loading data and writing Hive queries, which run internally as MapReduce jobs.
  • Analyzed JSON and XML files using Hive built-in functions and SerDes.
  • Optimized Hive queries using partitioning and bucketing techniques to control the data distribution.
  • Used Sqoop to efficiently transfer data between databases and HDFS.
  • Implemented complex MapReduce programs to perform map-side joins using the Distributed Cache in Java.
  • Worked on complex data types Array, Map and Struct in Hive.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Developed scripts and batch jobs to schedule various Hadoop programs. Familiar with using the NoSQL database HBase on top of HDFS.
  • Worked with BI teams in generating the reports in Tableau and designing ETL workflows on Pentaho.
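
The custom UDFs on this project were written in Java; the sketch below shows the same Hive UDF pattern in Scala (any JVM language works), with a hypothetical phone-number-normalizing function and the registration statements that would accompany it.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hive resolves evaluate() by reflection; returning null propagates SQL NULL.
class NormalizePhone extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
  }
}

// Registered from the Hive CLI / Beeline (jar path and function name are illustrative):
//   ADD JAR hdfs:///udfs/normalize-phone.jar;
//   CREATE TEMPORARY FUNCTION normalize_phone AS 'NormalizePhone';
//   SELECT normalize_phone(raw_phone) FROM contacts;
```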

Environment: Hadoop, HDFS, Map Reduce, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, Cloudera Manager, HBase, Tableau, Pentaho, CentOS, Windows, Putty, MySQL.

Confidential

Big Data / Hadoop Developer

Responsibilities:

  • Developed MapReduce programs in Java, Hive and Pig to validate and cleanse the data in HDFS, obtained from heterogeneous data sources, to make it suitable for analysis (a minimal sketch follows this list).
  • Used a 20-node cluster with the Cloudera Hadoop distribution on Amazon EC2 for backup. Worked on streaming data into HDFS from web servers using Flume.
  • Experience in database development: creating tables, views and constraints.
  • Created indexes on the tables for faster data retrieval and to enhance query performance using SQL and PL/SQL.
  • Wrote complex SQL queries using joins, subqueries and inline views to retrieve data from the database.
  • Designed and implemented Hive and Pig UDF’s for evaluation, filtering, loading and storing of data.
  • Experience in writing Nested FOREACH in Pig Latin, for implementing complex business logic in the process of data mining.
  • Performed Fragment-Replicate Joins (Map side joins) in Pig, which implements Distributed Cache.
  • The Hive tables created as per requirement were Internal or External tables defined with appropriate Static and Dynamic partitions, intended for efficiency.
  • Wrote Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Used several built-in functions and Piggybank UDFs for data transformations in Pig. Implemented Lateral View in conjunction with UDTFs in Hive.
  • Worked extensively with Sqoop for importing and exporting data from MySQL into HDFS and Hive.
  • Performed complex joins on the tables in Hive.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
  • Connected Hive and Impala to Tableau reporting tool and generated dashboard reports.
  • Understood the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MR, Hive, Sqoop and Pig Latin.
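
The cleansing jobs referenced above were written in Java; this is a minimal map-only sketch of the same idea in Scala against the Hadoop MapReduce API. The pipe-delimited, five-column input layout is an assumption for illustration.

```scala
import org.apache.hadoop.conf.Configured
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.util.{Tool, ToolRunner}

// Map-only job: drop malformed rows, trim fields, write the cleansed records back to HDFS.
class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split("\\|", -1)
    // Keep only rows with the expected column count and a non-empty id field (assumed layout).
    if (fields.length == 5 && fields(0).nonEmpty) {
      context.write(NullWritable.get(), new Text(fields.map(_.trim).mkString("|")))
    }
  }
}

object CleanseJob extends Configured with Tool {
  override def run(args: Array[String]): Int = {
    val job = Job.getInstance(getConf, "cleanse-sketch")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[CleanseMapper])
    job.setNumReduceTasks(0)                 // map-only: no shuffle or reduce phase needed
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    if (job.waitForCompletion(true)) 0 else 1
  }

  def main(args: Array[String]): Unit =
    System.exit(ToolRunner.run(CleanseJob, args))
}
```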

Environment: Hadoop(CDH), HDFS, Map Reduce, Hive, Pig, Impala, Sqoop, PL/SQL, Oozie, Flume, Hue, Java, MySQL, Tableau, CentOS, Unix, AWS, Putty.

Confidential

Jr. Java/J2EE Developer

Responsibilities:

  • Actively involved in analyzing and collecting user requirements.
  • Participated in server-side and client-side programming.
  • Developed and tested the DAO layer for CRUD operations.
  • Involved in the complete Software Development Life Cycle (SDLC), following an object-oriented approach to the client's business process with continuous client feedback.
  • Used HTML, CSS, JavaScript, JQuery, Ajax for Front End Development.
  • Implemented the web-based application using the Spring framework.
  • Used Spring dependency injection and Spring-Hibernate integration.
  • Responsible for writing JavaScript for client-side validation.
  • Wrote database queries using SQL for accessing, manipulating and updating Oracle Database.
  • Worked with a team of 4 members to deliver the end to end web application for an educational institution.
  • Involved in analyzing, designing, coding and testing.
  • Designed and implemented UI layer using HTML, JavaScript and JSP Servlets.
  • Connected Java web applications to MySQL using JDBC connectors in the Eclipse IDE (a minimal sketch follows this list).
  • Involved in configuring and deploying the application on Tomcat Server.
  • Worked on the back end to modify and enhance business logic.
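
The DAO layer on this project was written in Java; the sketch below shows the same JDBC access pattern in Scala (the JVM API is identical) with a hypothetical students table and assumed connection settings.

```scala
import java.sql.DriverManager

object StudentDaoSketch {
  def main(args: Array[String]): Unit = {
    // Assumed connection details; credentials would normally come from configuration.
    val url  = "jdbc:mysql://localhost:3306/school"
    val conn = DriverManager.getConnection(url, "app_user", sys.env.getOrElse("DB_PASSWORD", ""))
    try {
      // A parameterized query via PreparedStatement avoids string concatenation.
      val stmt = conn.prepareStatement("SELECT id, name FROM students WHERE grade = ?")
      stmt.setInt(1, 10)
      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(s"${rs.getLong("id")} -> ${rs.getString("name")}")
      }
    } finally {
      conn.close()
    }
  }
}
```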

Environment: Java/J2EE, Web Services, SVN, Eclipse, Spring, Hibernate, JSON, XML, JSP, JavaScript, HTML, CSS, JQuery, AJAX, Tomcat Server, Oracle 10g, Windows, Unix.
