Sr. Big Data Engineer Resume
Washington, DC
SUMMARY:
- Highly skilled IT professional with 9+ years of experience in software engineering, with an emphasis on Big Data application development and Java server-side programming.
- Strong expertise in the Big Data ecosystem, including Spark, Hive, Sqoop, HDFS, MapReduce, Kafka, Oozie, YARN, Pig, HBase, and Flume.
- Strong expertise in building scalable applications using various programming languages (Java, Scala, and Python).
- In-depth knowledge of distributed systems architecture and parallel computing.
- Experience implementing end-to-end data pipelines for serving reporting and data science capabilities.
- Experienced in working with Cloudera, Hortonworks and Amazon EMR clusters.
- Experience fine-tuning Spark applications and Hive queries to improve the overall performance of pipelines.
- Developed production-ready Spark applications using the Spark RDD API, DataFrames, Datasets, Spark SQL, and Spark Streaming.
- Hands-on experience fetching live stream data and ingesting it into HBase tables using Spark Streaming and Apache Kafka (see the sketch at the end of this summary).
- Experience in Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- In-depth knowledge of importing and exporting data from databases using Sqoop.
- Well versed in writing complex Hive queries using analytical functions.
- Knowledge of writing custom UDFs in Hive to support custom business requirements.
- Solid experience using various file formats such as CSV, TSV, Parquet, ORC, JSON, and Avro.
- Experience using compression codecs such as Gzip and Snappy within Hadoop.
- Strong knowledge of NoSQL databases; worked with HBase, Cassandra, and MongoDB.
- Experience using cloud services such as Amazon EMR, S3, EC2, Redshift, and Athena.
- Extensively used various IDEs such as IntelliJ, NetBeans, and Eclipse.
- Proficient in RDBMS concepts with Oracle, MySQL, DB2, and Teradata, and experienced in writing SQL queries.
- Knowledge of writing shell scripts and scheduling them with cron jobs.
- Experience working with Git repositories and the Jenkins and Maven build tools.
- Developed cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
- Used Log4J for enabling runtime logging and performed system integration test to ensure quality of the system.
- Experience in using SOAP UI tool to validate the web service.
- Expertise in writing unit test cases using JUnit API.
- Experience in database design, entity relationships, database analysis, SQL programming, PL/SQL stored procedures, packages, and triggers in Oracle.
- Highly self-motivated, with good technical, communication, and interpersonal skills. Able to work reliably under pressure. Committed team player with strong analytical and problem-solving skills and the ability to quickly adapt to new environments and technologies.
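A minimal sketch of the Kafka-to-Spark Streaming-to-HBase pattern referenced above. The broker address, topic, table, and column family names are illustrative, and events are assumed to be pipe-delimited with the row key in the first field:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object StreamToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("stream-to-hbase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",               // illustrative broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumer",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from an illustrative "clickstream" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    stream.map(_.value).foreachRDD { rdd =>
      rdd.foreachPartition { events =>
        // One HBase connection per partition, not per record
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("clicks"))
        events.foreach { line =>
          val parts = line.split("\\|", 2)                // assumed layout: rowKey|payload
          if (parts.length == 2) {
            val put = new Put(Bytes.toBytes(parts(0)))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes(parts(1)))
            table.put(put)
          }
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```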
TECHNICAL SKILLS:
Big Data Ecosystem: MapReduce, HDFS, HIVE, HBase, Pig, Sqoop, Flume, Oozie, Zookeeper, Spark, Kafka
Cloud Platform: AWS (EMR, EC2, Redshift, Athena)
Programming Languages: Java, Scala, Python, SQL, UNIX Shell Scripting
Databases: Oracle 12c/11g, MySQL, MS SQL Server 2016/2014
Version Control: GIT, GitLab, SVN
NoSQL Databases: HBase and MongoDB
Methodologies: Agile
Build Management Tools: Maven, Ant
IDE & Command line tools: Eclipse, IntelliJ
PROFESSIONAL EXPERIENCE:
Confidential, Washington DC
Sr. Big Data Engineer
Responsibilities:
- Created Sqoop scripts to import and export customer profile data between RDBMS and S3 buckets.
- Built custom input adapters to migrate clickstream data from FTP servers to S3.
- Developed various enrichment applications in Spark using Scala to cleanse clickstream data and enrich it with customer profile lookups (see the sketch below).
- Troubleshot Spark applications to improve error tolerance and reliability.
- Used the Spark DataFrame and Spark SQL APIs to implement batch processing jobs.
- Used Apache Kafka and Spark Streaming to consume data from Adobe live stream REST API connections.
- Automated creation and termination of AWS EMR clusters.
- Worked on fine-tuning and performance enhancement of various Spark applications and Hive scripts.
- Used Spark features such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.
Environment: AWS EMR, S3, Spark, Hive, Sqoop, Scala, Java, MySQL, Oracle DB, Athena, Redshift.
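A representative sketch of the clickstream enrichment flow described above; the S3 paths, schemas, and column names are illustrative, and the broadcast join mirrors the broadcast-variable usage noted in the last bullet:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ClickstreamEnrichment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("clickstream-enrichment").getOrCreate()

    // Raw clickstream landed on S3 by the ingestion adapters (path and columns are illustrative);
    // the dt column comes from dt=YYYY-MM-DD partition folders under the raw path
    val clicks = spark.read.json("s3://bucket/raw/clickstream/")
      .filter("visitor_id is not null")       // basic cleansing
      .dropDuplicates("event_id")

    // Customer profile extract produced by the Sqoop jobs (path is illustrative)
    val profiles = spark.read.parquet("s3://bucket/lookup/customer_profiles/")

    // Broadcast the small lookup side to every executor to avoid a shuffle
    val enriched = clicks.join(broadcast(profiles), Seq("visitor_id"), "left")

    enriched.write.mode("overwrite")
      .partitionBy("dt")
      .parquet("s3://bucket/enriched/clickstream/")

    spark.stop()
  }
}
```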
Confidential, Addison, NJ
Big Data/Hadoop Engineer
Responsibilities:
- Extensively worked with Sqoop to migrate data from RDBMS to HDFS.
- Ingested data from various source systems, including Teradata, MySQL, and Oracle databases.
- Developed Spark applications to perform extract, transform, and load (ETL) processing using Spark RDDs and DataFrames.
- Created Hive external tables on top of data in HDFS and wrote ad-hoc Hive queries to analyze the data based on business requirements (see the sketch below).
- Utilized partitioning and bucketing in Hive to improve query processing times.
- Performed incremental data ingestion using Sqoop, since the existing application generates data on a daily basis.
- Performed data ingestion using Sqoop, Apache Kafka, Spark Streaming, and Flume.
- Migrated/reimplemented MapReduce jobs as Spark applications for better performance.
- Handled data in different file formats such as Avro and Parquet.
- Extensively used the Cloudera Hadoop distribution within the project.
- Used Git for maintaining and versioning the code.
- Created Oozie workflows to automate the data pipelines.
Environment: Cloudera (CDH 5.x), Spark, Scala, Sqoop, Oozie, Hive, HDFS, MySQL, Oracle DB, Teradata
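A minimal sketch of loading a Sqoop-landed daily extract into a partitioned Hive external table with Spark; the database, table, column, and path names are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

object OrdersEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-etl")
      .enableHiveSupport()        // register the table in the Hive metastore
      .getOrCreate()

    // Daily extract landed on HDFS by the incremental Sqoop job (path and columns are illustrative)
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/landing/orders/2017-01-01/")
    orders.createOrReplaceTempView("orders_stg")

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    // External table, partitioned by order date, for the ad-hoc Hive queries
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders (
        |  order_id BIGINT, customer_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/warehouse/orders'""".stripMargin)

    // Dynamic partition insert; the partition column goes last in the SELECT
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.orders PARTITION (order_date)
        |SELECT order_id, customer_id, amount, order_date FROM orders_stg""".stripMargin)

    spark.stop()
  }
}
```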
Confidential, Atlanta, GA
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Wrote complex MapReduce jobs to perform data cleansing and ETL-like processing on the data.
- Worked with different file formats such as text, Avro, and Parquet using MapReduce programs.
- Developed Hive Scripts to create partitioned tables and create various analytical datasets.
- Worked with cross functional consulting teams within the data science and analytics team to design, develop and execute solutions to derive business insights and solve client operational and strategic problems.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark (see the sketch below).
- Extensively used Hive queries to query data in Hive Tables and loaded data into HBase tables.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Developed shell scripts to pull data from third-party systems into the Hadoop file system.
- Exported the processed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Hive partitioning and bucketing concepts to increase the performance of Hive query processing.
- Designed Oozie workflows for job scheduling and batch processing.
- Helped analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data processed.
Environment: Java, HDFS, MapReduce, Hive, Pig, MySQL, CDH, IntelliJ, YARN, Sqoop, HBase, Unix Shell Scripting.
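A representative sketch of the AWS data lake flow described above: raw Avro extracts on S3 are rewritten as partitioned Parquet, and a window function builds one analytical dataset. The bucket names, schema, and availability of the spark-avro package are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object DataLakeBuild {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("data-lake-build").getOrCreate()

    // Raw Avro extracts staged on S3 (bucket and columns are illustrative)
    val raw = spark.read.format("avro").load("s3://lake-bucket/raw/transactions/")

    // Curated zone: columnar Parquet, partitioned by business date for efficient scans
    raw.write.mode("overwrite")
      .partitionBy("txn_date")
      .parquet("s3://lake-bucket/curated/transactions/")

    // One analytical dataset: latest transaction per account via a window function
    val latest = raw
      .withColumn("rn", row_number().over(
        Window.partitionBy("account_id").orderBy(col("txn_ts").desc)))
      .filter(col("rn") === 1)
      .drop("rn")

    latest.write.mode("overwrite").parquet("s3://lake-bucket/analytics/latest_transaction/")

    spark.stop()
  }
}
```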
Confidential, Atlanta, GA
Bigdata/Hadoop Engineer
Responsibilities:
- Used Avro, Parquet, and JSON file formats and developed custom UDFs for Hive and Pig (see the sketch below).
- Developed and maintained workflow scheduling jobs in Oozie.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Continuously monitored and managed Hadoop cluster using Cloudera Manager.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Involved in collecting, aggregating and moving data from RDBMS to HDFS using Sqoop.
- Managed and reviewed Hadoop log files.
- Analyzed web logs using Hadoop tools for operational and security-related activities.
- Developed efficient MapReduce programs in Java for filtering out unstructured data.
- Reviewed Hadoop log files to identify issues when jobs failed.
- Ingested application logs into HDFS and processed them using MapReduce jobs.
- Created and maintained the Hive warehouse for Hive analysis.
- Worked with different file formats such as XML, SequenceFiles, CSV, and MapFiles.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Responsible for the design and creation of Hive tables, partitioning, bucketing, loading data, and writing Hive queries.
- Worked with the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
Environment: HDFS, Hive, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, Oozie, MySQL, Tableau.
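A minimal sketch of a custom Hive UDF of the kind mentioned in the first bullet; the class name, function name, and masking rule are illustrative:

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Illustrative UDF: masks the local part of an email address, keeping the first character.
// Built into a jar, then registered in Hive with:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
class MaskEmail extends UDF {
  def evaluate(email: String): String = {
    if (email == null || !email.contains("@")) email
    else {
      val Array(local, domain) = email.split("@", 2)
      local.take(1) + "***@" + domain
    }
  }
}
```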
Confidential, Houston, TX
Bigdata/Hadoop Engineer
Responsibilities:
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.
- Developed MapReduce programs to parse the raw data, populate tables and store the refined data in partitioned tables.
- Installed and configured Hadoop and Hadoop stack on a 4-node cluster.
- Experienced in managing and reviewing application log files.
- Ingested application logs into HDFS and processed them using MapReduce jobs.
- Created and maintained the Hive warehouse for Hive analysis.
- Generated test cases for the new MapReduce jobs.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch below).
- Responsible for the design and creation of Hive tables, partitioning, bucketing, loading data, and writing Hive queries.
- Created HBase tables to store various data formats of personally identifiable information data coming from different portfolios.
- Involved in managing and reviewing Hadoop log files.
- Worked with Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: HDFS, Hive, MapReduce, Storm, Java, HBase, Pig, Sqoop, Shell Scripts, Oozie, MySQL, Eclipse, Web Services, JDBC, and WebSphere.
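A minimal sketch of a map-only cleansing job of the kind described above. The production jobs were written in Java; this sketch uses Scala against the same Hadoop MapReduce API, and the pipe-delimited, 12-field record layout is an assumption:

```scala
import org.apache.hadoop.conf.Configured
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.util.{Tool, ToolRunner}

// Mapper that keeps only well-formed, pipe-delimited records (assumed 12 non-empty fields)
class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split("\\|", -1)
    if (fields.length == 12 && fields.forall(_.nonEmpty))
      context.write(NullWritable.get(), value)
  }
}

object CleanseJob extends Configured with Tool {
  override def run(args: Array[String]): Int = {
    val job = Job.getInstance(getConf, "raw-data-cleanse")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[CleanseMapper])
    job.setNumReduceTasks(0)                    // map-only cleansing job
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    if (job.waitForCompletion(true)) 0 else 1
  }

  def main(args: Array[String]): Unit =
    System.exit(ToolRunner.run(CleanseJob, args))
}
```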
Confidential, Philadelphia, PA
Sr. Java/J2EE Developer
Responsibilities:
- Involved in a full life cycle Object Oriented application development - Object Modeling, Database Mapping, GUI Design.
- Developed the J2EE application based on the Service Oriented Architecture.
- Used Design Patterns like Singleton, Factory, Session Facade and DAO.
- Developed Use Case diagrams, Class diagrams and Sequence diagrams to express the detail design.
- Worked with EJB (Session and Entity) to implement the business logic to handle various interactions with the database.
- Created and injected Spring services, Spring controllers, and DAOs to achieve dependency injection and to wire objects of business classes.
- Used Spring bean inheritance to derive beans from already defined parent beans.
- Used the DAO pattern with Hibernate to fetch data from the database and carry out various database operations.
- Used SOAP Lite module to communicate with different web-services based on given WSDL.
- Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
- Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.
- Developed various generic JavaScript functions used for validations.
- Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and Ext.JS.
- Used Aptana Studio and Sublime to develop and debug application code.
- Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code.
- Created user-friendly GUI interfaces and web pages using HTML, AngularJS, jQuery, and JavaScript.
- Used Log4j utility to generate run-time logs.
- Deployed business components into WebSphere Application Server.
- Developed Functional Requirement Document based on users' requirement.
Environment: Core Java, J2EE, JDK 1.6, Spring 3.0, Hibernate 3.2, Tiles, AJAX, JSP 2.1, Eclipse 3.6, IBM WebSphere 7.0, XML, XSLT, SAX, DOM Parser, HTML, UML, Oracle 10g, PL/SQL, JUnit.