
Sr. Big Data Engineer Resume


Washington, DC

SUMMARY:

  • Highly skilled IT professional with 9+ years of experience in software engineering, with an emphasis on Big Data application development and Java server-side programming.
  • Strong expertise in the Big Data ecosystem, including Spark, Hive, Sqoop, HDFS, MapReduce, Kafka, Oozie, YARN, Pig, HBase and Flume.
  • Strong expertise in building scalable applications using various programming languages (Java, Scala and Python).
  • In-depth knowledge of distributed systems architecture and parallel computing.
  • Experience implementing end-to-end data pipelines for serving reporting and data science capabilities.
  • Experienced in working with Cloudera, Hortonworks and Amazon EMR clusters.
  • Experience fine-tuning Spark applications and Hive scripts to improve overall pipeline performance.
  • Developed production-ready Spark applications using the RDD API, DataFrames, Datasets, Spark SQL and Spark Streaming.
  • Hands-on experience fetching live stream data and ingesting it into HBase tables using Spark Streaming and Apache Kafka (see the sketch after this list).
  • Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing and reviewing Hadoop log files.
  • In-depth knowledge of importing and exporting data between databases and HDFS using Sqoop.
  • Well versed in writing complex Hive queries using analytical functions.
  • Knowledge of writing custom UDFs in Hive to support custom business requirements.
  • Solid experience with various file formats such as CSV, TSV, Parquet, ORC, JSON and Avro.
  • Experience using compression codecs such as gzip and Snappy within Hadoop.
  • Strong knowledge of NoSQL databases; worked with HBase, Cassandra and MongoDB.
  • Experience using cloud services such as Amazon EMR, S3, EC2, Redshift and Athena.
  • Extensively used IDEs such as IntelliJ, NetBeans and Eclipse.
  • Proficient in RDBMS concepts with Oracle, MySQL, DB2 and Teradata, and experienced in writing SQL queries.
  • Knowledge of writing shell scripts and scheduling them with cron jobs.
  • Experience working with Git repositories, Jenkins and Maven build tools.
  • Developed cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful web services, JDBC, JavaScript, XML, and HTML.
  • Used Log4j to enable runtime logging and performed system integration testing to ensure the quality of the system.
  • Experience using the SoapUI tool to validate web services.
  • Expertise in writing unit test cases using JUnit API.
  • Experience in database design, entity relationships, database analysis, SQL programming, PL/SQL stored procedures, packages and triggers in Oracle.
  • Highly self-motivated, with good technical, communication and interpersonal skills. Able to work reliably under pressure. Committed team player with strong analytical and problem-solving skills and the ability to quickly adapt to new environments and technologies.
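
A minimal sketch of the Kafka-to-Spark ingestion pattern referenced above, written in Scala with Structured Streaming; the broker address, topic name and checkpoint path are illustrative placeholders, and the console sink stands in for the HBase write used in the actual pipelines.

    import org.apache.spark.sql.SparkSession

    object ClickStreamIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("clickstream-ingest").getOrCreate()

        // Subscribe to the live stream topic on Kafka (names are placeholders)
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()

        // Kafka values arrive as bytes; cast to string before downstream parsing
        val events = raw.selectExpr("CAST(value AS STRING) AS json")

        // Console sink stands in for the HBase write used in the real pipeline
        val query = events.writeStream
          .format("console")
          .option("checkpointLocation", "/tmp/checkpoints/clickstream")
          .start()

        query.awaitTermination()
      }
    }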

TECHNICAL SKILLS:

Big Data Ecosystem: MapReduce, HDFS, HIVE, HBase, Pig, Sqoop, Flume, Oozie, Zookeeper, Spark, Kafka

Cloud Platform: Amazon AWS EMR, EC2, Redshift, Athena

Programming Languages: Java, Scala, Python, SQL, UNIX Shell Scripting

Databases: Oracle 12c/11g, MySQL, MS SQL Server 2016/2014

Version Control: GIT, GitLab, SVN

NoSQL Databases: HBase and MongoDB

Methodologies: Agile

Build Management Tools: Maven, Ant

IDE & Command line tools: Eclipse, IntelliJ

PROFESSIONAL EXPERIENCE:

Confidential, Washington DC

Sr. Big Data Engineer

Responsibilities:

  • Created Sqoop scripts to import and export customer profile data between RDBMS and S3 buckets.
  • Built custom input adapters to migrate clickstream data from FTP servers to S3.
  • Developed enrichment applications in Spark using Scala to cleanse clickstream data and enrich it with customer profile lookups (a sketch follows this list).
  • Troubleshot Spark applications to improve fault tolerance and reliability.
  • Used the Spark DataFrame and Spark SQL APIs to implement batch processing jobs.
  • Used Apache Kafka and Spark Streaming to consume data from Adobe live stream REST API connections.
  • Automated creation and termination of AWS EMR clusters.
  • Worked on fine-tuning and performance enhancements of various Spark applications and Hive scripts.
  • Used Spark features such as broadcast variables, caching and dynamic allocation to design more scalable Spark applications.
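
A rough sketch of the enrichment step described above, assuming illustrative S3 paths and a customer_id join key; it shows the broadcast-join style mentioned in the bullets rather than the exact production code.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object ClickStreamEnrichment {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("clickstream-enrichment").getOrCreate()

        val clicks   = spark.read.parquet("s3://example-bucket/clickstream/")        // raw click stream
        val profiles = spark.read.parquet("s3://example-bucket/customer-profiles/")  // Sqoop-imported profiles

        // Basic cleansing: drop click records that lack a customer key
        val cleansed = clicks.filter(clicks("customer_id").isNotNull)

        // Broadcasting the smaller profile table avoids shuffling the large click stream
        val enriched = cleansed.join(broadcast(profiles), Seq("customer_id"), "left")

        enriched.write.mode("overwrite").parquet("s3://example-bucket/enriched/clickstream/")
      }
    }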

Environment: AWS EMR, S3, Spark, Hive, Sqoop, Scala, Java, MySQL, Oracle DB, Athena, Redshift.

Confidential, Addison, NJ

Big Data/Hadoop Engineer

Responsibilities:

  • Worked extensively with Sqoop to migrate data from RDBMS to HDFS.
  • Ingested data from various source systems such as Teradata, MySQL and Oracle databases.
  • Developed Spark applications to perform extract, transform and load (ETL) using Spark RDDs and DataFrames.
  • Created Hive external tables on top of data in HDFS and wrote ad-hoc Hive queries to analyze the data based on business requirements (see the sketch after this list).
  • Utilized partitioning and bucketing in Hive to improve query processing times.
  • Performed incremental data ingestion using Sqoop, as the existing application generates data on a daily basis.
  • Performed data ingestion using Sqoop, Apache Kafka, Spark Streaming and Flume.
  • Reimplemented MapReduce jobs as Spark applications for better performance.
  • Handled data in different file formats such as Avro and Parquet.
  • Extensively used the Cloudera Hadoop distribution within the project.
  • Used Git for maintaining and versioning the code.
  • Created Oozie workflows to automate the data pipelines.
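
A minimal sketch of the external-table pattern mentioned above, issued through Spark's Hive support; the database, table, columns and HDFS path are placeholders, not the actual schema.

    import org.apache.spark.sql.SparkSession

    object HiveExternalTableSetup {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-external-table")
          .enableHiveSupport()
          .getOrCreate()

        // External table over the directory where Sqoop lands its daily imports
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DECIMAL(10,2)
          )
          PARTITIONED BY (load_date STRING)
          STORED AS PARQUET
          LOCATION '/data/landing/orders'
        """)

        // Register any partitions added by the incremental Sqoop import
        spark.sql("MSCK REPAIR TABLE sales.orders")
      }
    }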

Environment: Cloudera (CDH 5.x), Spark, Scala, Sqoop, Oozie, Hive, HDFS, MySQL, Oracle DB, Teradata

Confidential, Atlanta, GA

Sr. Big Data/Hadoop Engineer

Responsibilities:

  • Wrote complex MapReduce jobs to perform data cleansing and ETL-like processing on the data.
  • Worked with different file formats such as text, Avro and Parquet in MapReduce programs.
  • Developed Hive scripts to create partitioned tables and build various analytical datasets.
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop and execute solutions to derive business insights and solve client operational and strategic problems.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Extensively used Hive queries to query data in Hive tables and loaded data into HBase tables.
  • Developed Spark scripts using Scala shell commands as per requirements (a sketch follows this list).
  • Developed shell scripts to pull data from third-party systems into the Hadoop file system.
  • Exported the processed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Hive partitioning and bucketing to improve Hive query processing performance.
  • Designed Oozie workflows for job scheduling and batch processing.
  • Helped analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data processed.
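
An illustrative spark-shell (Scala) snippet in the spirit of the cleansing and partitioned-dataset work above; the paths and column names are assumptions, not the project's actual schema.

    import org.apache.spark.sql.functions._

    // Read raw CSV landed by the ingestion scripts (path is a placeholder)
    val raw = spark.read.option("header", "true").csv("/data/raw/events/")

    // Drop malformed rows and derive a partition column from the event timestamp
    val cleaned = raw
      .filter(col("event_ts").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))

    // Write a date-partitioned Parquet dataset that partitioned Hive tables can sit on top of
    cleaned.write.mode("append").partitionBy("event_date").parquet("/data/curated/events/")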

Environment: Java, HDFS, MapReduce, Hive, Pig, MySQL, CDH, IntelliJ, YARN, Sqoop, HBase, Unix Shell Scripting.

Confidential, Atlanta, GA

Bigdata/Hadoop Engineer

Responsibilities:

  • Used Avro, Parquet and JSON file formats and developed UDFs for Hive and Pig.
  • Developed and maintained workflow scheduling jobs in Oozie.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Created Hive tables, loaded them with data and wrote Hive queries.
  • Involved in collecting, aggregating and moving data from RDBMS to HDFS using Sqoop.
  • Managed and reviewed Hadoop log files to identify issues when jobs failed.
  • Analyzed web logs using Hadoop tools for operational and security-related activities (see the sketch after this list).
  • Developed efficient MapReduce programs in Java to filter out unstructured data.
  • Ingested application logs into HDFS and processed them using MapReduce jobs.
  • Created and maintained the Hive warehouse for Hive analysis.
  • Worked with different file formats such as XML, SequenceFiles, CSV and MapFiles.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Responsible for the design and creation of Hive tables, partitioning, bucketing, loading data and writing Hive queries.
  • Worked with the Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs.
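
A sketch of the log-filtering idea from the bullets above, expressed here as a Spark RDD job in Scala rather than the original Java MapReduce; the access-log regex, paths and the errors-per-URI count are illustrative assumptions.

    import org.apache.spark.sql.SparkSession

    object WebLogFilter {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("weblog-filter").getOrCreate()
        val sc = spark.sparkContext

        // Common-log-format pattern; lines that do not match are treated as unstructured noise
        val logPattern = """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\d+|-)$""".r

        val lines = sc.textFile("/data/logs/web/")

        val parsed = lines.flatMap {
          case logPattern(ip, ts, method, uri, status, _) => Some((ip, ts, method, uri, status.toInt))
          case _ => None // filter out malformed / unstructured lines
        }

        // Example operational analysis: count server errors per URI
        val errorsByUri = parsed.filter(_._5 >= 500).map(r => (r._4, 1L)).reduceByKey(_ + _)
        errorsByUri.saveAsTextFile("/data/analysis/weblog_5xx_by_uri")
      }
    }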

Environment: HDFS, Hive, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, Oozie, MySQL, Tableau.

Confidential, Houston, TX

Bigdata/Hadoop Engineer

Responsibilities:

  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Involved in data ingestion into HDFS from a variety of sources using Sqoop and Flume.
  • Developed MapReduce programs to parse the raw data, populate tables and store the refined data in partitioned tables.
  • Installed and configured Hadoop and the Hadoop stack on a 4-node cluster.
  • Experienced in managing and reviewing application log files.
  • Ingested application logs into HDFS and processed them using MapReduce jobs.
  • Created and maintained the Hive warehouse for Hive analysis.
  • Generated test cases for new MapReduce jobs.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Responsible for the design and creation of Hive tables, partitioning, bucketing, loading data and writing Hive queries.
  • Created HBase tables to store personally identifiable information (PII) arriving in various formats from different portfolios (an illustrative write path follows this list).
  • Involved in managing and reviewing Hadoop log files.
  • Worked with the Oozie workflow engine to run multiple Hive and Pig jobs.
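
An illustrative HBase write path in Scala for the kind of PII table described above; the table name, column family, qualifiers and row key are placeholders, not the actual design.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object PiiWriter {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        val table = connection.getTable(TableName.valueOf("customer_pii"))

        // Row key is the customer id; one column family holds the profile attributes
        val put = new Put(Bytes.toBytes("customer-12345"))
        put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("email"), Bytes.toBytes("user@example.com"))
        put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("phone"), Bytes.toBytes("555-0100"))

        table.put(put)
        table.close()
        connection.close()
      }
    }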

Environment: HDFS, Hive, MapReduce, Storm, Java, HBase, Pig, Sqoop, Shell Scripts, Oozie, MySQL, Eclipse, Web Services, JDBC and WebSphere.

Confidential, Philadelphia, PA

Sr. Java/J2EE Developer

Responsibilities:

  • Involved in full life cycle object-oriented application development: object modeling, database mapping and GUI design.
  • Developed the J2EE application based on a Service-Oriented Architecture.
  • Used design patterns such as Singleton, Factory, Session Facade and DAO.
  • Developed use case diagrams, class diagrams and sequence diagrams to express the detailed design.
  • Worked with EJB (session and entity beans) to implement the business logic handling various interactions with the database.
  • Created and injected Spring services, controllers and DAOs to achieve dependency injection and wire business-class objects.
  • Used Spring bean inheritance to derive beans from already developed parent beans.
  • Used the DAO pattern with Hibernate to fetch data from the database and carry out various database operations.
  • Used the SOAP Lite module to communicate with different web services based on the given WSDLs.
  • Used Hibernate transaction management, Hibernate batch transactions, and caching concepts.
  • Created complex SQL queries, PL/SQL stored procedures and functions for the back end.
  • Developed various generic JavaScript functions used for validations.
  • Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and Ext JS.
  • Used Aptana Studio and Sublime to develop and debug application code.
  • Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code.
  • Created user-friendly GUI interfaces and web pages using HTML, AngularJS, jQuery and JavaScript.
  • Used Log4j utility to generate run-time logs.
  • Deployed business components into WebSphere Application Server.
  • Developed the Functional Requirements Document based on user requirements.

Environment: Core Java, J2EE, JDK 1.6, Spring 3.0, Hibernate 3.2, Tiles, AJAX, JSP 2.1, Eclipse 3.6, IBM WebSphere 7.0, XML, XSLT, SAX, DOM Parser, HTML, UML, Oracle 10g, PL/SQL, JUnit.
