Sr. Big Data Engineer Resume
Bellevue, WA
SUMMARY
- 7 years of professional experience in IT, including 4+ years of comprehensive experience in Big Data, the Hadoop ecosystem, and related technologies
- Experienced in working with Hadoop ecosystem components such as HDFS, MapReduce, HBase, Spark, YARN, Kafka, ZooKeeper, Pig, Hive, Sqoop, Storm, Oozie, and Flume
- Hands-on expertise handling multi-terabyte volumes of structured and unstructured data on large cluster environments
- Good understanding of and working experience with Hadoop distributions such as Hortonworks and Cloudera.
- Demonstrated experience in delivering data and analytics solutions leveraging AWS, Azure, or similar cloud data lakes.
- Streamed data from cloud (AWS, Azure) and on-premises sources using Spark.
- Hands-on experience with AWS (Amazon Web Services), Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience in importing and exporting multi-terabyte volumes of data with Sqoop between HDFS and Relational Database Systems (RDBMS)
- Good understanding of Storm and Kafka for real-time processing, and of monitoring and managing Hadoop jobs
- Experience in NoSQL databases like HBase and MongoDB for extracting and storing huge volumes of data
- Worked with the data ingestion team; good understanding of Apache NiFi and its transformations
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java
- Experienced in developing UDFs for Hive using Java.
- Good exposure to Python programming.
- Experience writing PySpark programs for streaming data (a brief sketch follows this summary)
- Good experience in writing Spark applications using Python and Scala.
- Experienced in developing Scala scripts for data transformations
- Experience in implementing custom keys and values in MapReduce for better control over data
- Experience in writing simple to complex ad hoc Hive and Pig scripts
- Working experience with RDBMSs such as Oracle and MySQL and NoSQL databases such as MongoDB and Cassandra
- Experience processing semi-structured data (XML and JSON) in Hive
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to output directories in HDFS
- Experience in writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS
- Experience in using Zookeeper for coordinating the distributed applications
- Experience in developing Scala scripts to run on a Spark cluster
- Good knowledge in creating event-processing data pipelines using Kafka and Storm
- Good understanding of serialization and compression formats and codecs such as Avro, Snappy, and LZO
- Good understanding of data integration between Pentaho and Hadoop
- Good understanding of Data Mining and Machine Learning techniques
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns
- Experience in developing web applications with open-source frameworks such as Spring and Hibernate 2.0/3.0 ORM
- Proficient in Core Java, J2EE, JDBC, Servlets, JSP, Exception Handling, Multithreading, EJB, XML, HTML5, CSS3, JavaScript, AngularJS
- Experience in administration/maintenance of source control management systems such as Subversion (SVN) and Git
- Used Jenkins as a continuous integration tool to automate daily processes
- Worked in an iterative, Agile/Scrum SDLC with a strong ability to estimate and scope project development
- Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills
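The PySpark streaming experience referenced above involves jobs of roughly the following shape. This is a minimal, illustrative sketch rather than code from an actual engagement: the broker address, topic, schema, and HDFS paths are assumed, and it requires the Spark Kafka connector package on the classpath.

```python
# Minimal PySpark Structured Streaming sketch: read JSON events from Kafka
# and persist them to HDFS as Parquet. All names and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker address
    .option("subscribe", "events")                       # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/events")                        # assumed output path
    .option("checkpointLocation", "hdfs:///checkpoints/events")   # assumed checkpoint path
    .start()
)
query.awaitTermination()
```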
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Spark, Spark SQL, Spark Streaming, Pig, Sqoop, Flume, Oozie, HBase, ZooKeeper, Hadoop distributions (CDH)
Programming Languages: Java, Python, Scala
Databases/RDBMS: Oracle 9i/10g/11g, MySQL, SQL/PL-SQL, MS-SQL Server 2005
Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell
Messaging/CI Tools: Kafka, RabbitMQ, ActiveMQ, Jenkins
NoSQL/Data Flow Tools: HBase, Cassandra, MongoDB, NiFi
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS-Office, MS-Project and Risk Analysis tools, Visio
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Git, Log4j, REST UI, Ant, Maven, automation tools, MRUnit
PROFESSIONAL EXPERIENCE
Confidential, Bellevue, WA
Sr. Big Data Engineer
Responsibilities:
- Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Developed a real-time streaming application in Spark to ingest data from various sources into the Hadoop data lake.
- Developed a web service using the HBase and Hive Java APIs to compare schemas between HBase and Hive tables.
- Created Hive, Phoenix, and HBase tables, and HBase-integrated Hive tables, as per the design, using the ORC file format and Snappy compression.
- Created Hive tables on data imported with Sqoop from the Oracle relational database.
- Implemented static and dynamic partitioning and bucketing in Hive, based on requirements, for efficient data access (see the partitioning sketch following this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote Hive UDFs and used Piggybank, a repository of UDFs for Pig Latin.
- Involved in bulk-loading generated HFiles into HBase for faster loading of a large customer base without taking a performance hit.
- Developed Python code to gather data from HBase and designed the solution for implementation using PySpark.
- Imported data from existing databases such as MySQL and DB2 using Sqoop.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes.
- Monitored YARN applications; troubleshot and resolved cluster-related system problems.
- Created shell scripts to parameterize the Hive actions in Oozie workflow and for scheduling the jobs.
- Configured Kafka for enterprise and resolved Kafka issues.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Experience with Apache NiFi for data flows with significant processing requirements and for controlling the security of data flows.
- Played a key role on the team that developed an initial prototype of a NiFi big data pipeline, demonstrating an end-to-end scenario of data ingestion and processing.
- Used NiFi to verify whether messages reached the end system.
- Developed custom processors for NiFi.
- Worked on a NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
- Worked on NoSQL databases such as HBase; also used Spark for real-time streaming of data into the cluster.
- Developed Spark programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Implemented Spark using Python and Spark SQL for faster testing and processing of data.
- Created Spark Streaming jobs in Python to read messages from Kafka and download JSON files from AWS S3 buckets.
- Used Spark Streaming to receive real-time data from Kafka and stored the stream data in HDFS and NoSQL databases such as HBase and Cassandra using Python.
- Implemented Spark scripts using SparkSession, Python, and Spark SQL to access Hive tables in Spark for faster data processing.
- Worked with the SparkSession object, Spark SQL, and DataFrames for faster execution of Hive queries.
- Imported data from different sources such as SQL Server into Spark RDDs and developed a data pipeline using Kafka and Spark to store data in HDFS.
- Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data (see the JSON-to-Hive sketch following this list).
- Created pipelines in ADF using datasets to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
- Developed Python scripts for deploying the Azure Data Factory (ADF) pipelines that process data from different sources.
- Coded in Java, Scala, Python, PySpark, and Spark SQL.
- Developed backend application code using Java 8 and Spring Boot services to consume data from the database and from other applications.
- Developed and integrated REST web services to display data or search results.
- Fetched data from other source systems using REST API calls; integrated the APIs with Apigee and implemented JWT token validation.
- Used the Oozie scheduler to automate pipeline workflows and orchestrate the MapReduce extraction jobs, and ZooKeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Created Teradata views on top of the table as per the business requirement.
- Assessed existing EDW (enterprise data warehouse) technologies and methods to ensure the EDW/BI architecture meets the needs of the business and enterprise and allows for growth.
- Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
- Worked with the Hue GUI for easy job scheduling, file browsing, job browsing, and metastore management.
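The partitioned, bucketed ORC tables mentioned above are typically defined along these lines. A minimal, hedged sketch only: the database, table, column names, and bucket count are assumed for illustration, not taken from the project.

```python
# Minimal PySpark/HiveQL sketch of a partitioned, bucketed ORC table with
# Snappy compression. Database, table, and column names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS edw")   # assumed database name

spark.sql("""
    CREATE TABLE IF NOT EXISTS edw.customer_events (
        customer_id STRING,
        event_type  STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('orc.compress' = 'SNAPPY')
""")
```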
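Similarly, the Spark SQL JSON handling described in the bullets is roughly of this shape. Again a hedged sketch under assumptions: the landing directory, target table, and column names are illustrative.

```python
# Minimal Spark SQL sketch: read JSON files (for example, files pulled down
# from S3 into a landing area) and append them to a partitioned Hive table.
# Paths, table, and column names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("json-to-hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

events = spark.read.json("hdfs:///landing/events/")      # assumed landing directory

(
    events.select("customer_id", "event_type", "amount", "event_date")
    .write.mode("append")
    .format("orc")
    .partitionBy("event_date")
    .saveAsTable("json_events")                           # assumed target table
)
```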
Environment: CDH5, Hortonworks, Apache Hadoop 2.6.0, HDFS, Java 8, Hive 1.2.1000, Sqoop 1.4.6, HBase 1.1.2, Oozie 4.1.0, Storm 0.9.3, YARN, NiFi, Cassandra, ZooKeeper, Spark, Kafka, Oracle 11g, MySQL, Shell scripting, AWS, EC2, Tomcat 8, Spring 3.2.3, STS 3.6, Gradle 2.2 (build tool), Git (source control), Teradata SQL Assistant.
Confidential
Big Data Developer
Responsibilities:
- Worked on analyzing data and writing Hadoop MapReduce jobs using the MapReduce API, Pig, and Hive.
- Designed and implemented historical and incremental data ingestion from multiple external systems using Hive, Pig, and Sqoop.
- Supported MapReduce programs running on the cluster.
- Wrote MapReduce jobs using the Java API for data analysis and dimension/fact table generation.
- Installed and configured Pig and wrote Pig Latin scripts.
- Used Pig for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Extensively worked on Sqoop for importing metadata from Oracle.
- Extensively used Pig for data cleansing.
- Implemented complex SQL queries to generate analysis reports.
- Created Hive tables and worked on them using HiveQL.
- Used Sqoop to import and export data between RDBMSs and HDFS, and queried the data through Impala.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Worked on MongoDB using CRUD operations (create, read, update, delete) and its indexing, replication, and sharding features (a small CRUD sketch follows this list).
- Used Spark to analyze large amounts of non-unique data points with low latency and high throughput.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
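The MongoDB CRUD and indexing work mentioned above typically looks like the following in Python. A minimal sketch: the connection string, database, collection, and document fields are illustrative assumptions.

```python
# Minimal PyMongo sketch of basic CRUD plus a secondary index. The connection
# string, database, collection, and fields are illustrative assumptions.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed connection string
orders = client["demo_db"]["orders"]                # assumed database/collection

orders.insert_one({"order_id": 1, "status": "NEW", "amount": 42.50})   # create
doc = orders.find_one({"order_id": 1})                                  # read
print(doc)
orders.update_one({"order_id": 1}, {"$set": {"status": "SHIPPED"}})     # update
orders.delete_one({"order_id": 1})                                      # delete

orders.create_index([("status", ASCENDING)])   # secondary index for reads by status
```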
Environment: CDH4, Java 6, MapReduce, HDFS, Hive, Pig, Sqoop, MongoDB, Linux, XML, MySQL, MySQL Workbench, Eclipse, PL/SQL, SQL connector, Subversion.
Confidential, St Louis, MO
Software Developer
Responsibilities:
- Involved in the complete SDLC including the design of System Architecture, development of System Use Cases based on the functional requirements
- Used HTML, CSS, and JavaScript to construct the dynamic web pages for the application
- Wrote Python scripts to parse XML documents and load the data into the database (see the parsing sketch following this list)
- Developed various screens for the frontend using AngularJS
- Implemented Business Logic using Java, Spring MVC, and Hibernate
- Implemented RESTful Web services to retrieve data from client side and made REST API calls
- Developed Business objects using POJOs and data access layer using Hibernate framework
- Developed business components using Spring Boot, Spring Dependency Injection (Core) and Spring Annotations
- Used Spring Data Framework to use the features of Spring JDBC and Spring ORM classes like JDBC Template and Hibernate Template to perform the database operations by connecting to Data sources
- Developed SQL Queries, Stored Procedures, and Triggers Using Oracle, SQL, PL/SQL
- Used microservices communicating over synchronous protocols such as HTTP/REST
- Focused on Test Driven Development (TDD), created detailed JUnit tests for every single piece of functionality
- Developed automation scripts using Selenium WebDriver, Eclipse, Junit and Java
- Used Maven as the build tool in Jenkins to promote builds from one environment to another
- Responsible for configuring continuous integration with Jenkins; used GitHub as the version control tool
- Used NoSQL DB like MongoDB for the proof of concept
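The XML-to-database scripting mentioned above is roughly of the following shape. A minimal illustrative sketch: the file name, XML structure, and table schema are assumptions, and sqlite3 stands in for the project's actual database.

```python
# Minimal sketch: parse an XML document and load its records into a database.
# File name, element names, and schema are illustrative; sqlite3 is a stand-in.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("records.db")
conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, name TEXT, value REAL)")

tree = ET.parse("records.xml")                      # assumed input file
rows = [
    (rec.findtext("id"), rec.findtext("name"), float(rec.findtext("value", "0")))
    for rec in tree.getroot().iter("record")        # assumed <record> elements
]

conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
```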
Environment: SQL, PL/SQL, JAVA, J2EE, HTML, CSS, Python script, Angular JS, Bootstrap, Node JS, JSON, Spring Framework, Spring MVC, JSP, Hibernate, REST, Agile/Scrum Methodology, Maven, JIRA, JBoss, JUnit, TDD, MySQL
Confidential
Java Developer
Responsibilities:
- Involved in the development of the forms and Webpages using Spring form, JSP, JSTL
- Developed the web tier using JSP, Struts, and Spring MVC
- Used Maven to build services by defining all dependencies in POM.xml file
- Integrated Spring Dependency Injection (IOC) among different layers of an application
- Involved in creating various Data Access Objects (DAO) for addition, modification and deletion of records
- Used SQL and PL/SQL queries, stored procedures, joins, functions, using DDL and DML
- Developed presentation layer using HTML, JSP, Ajax, CSS and jQuery
- Used Tomcat Server for deploying various components of application
- Involved in unit testing of various modules and generating test cases
- Used Log4j to monitor error logs; extensively involved in Test-Driven Development (TDD)
- Used MySQL to maintain the data and JDBC to write SQL queries
Environment: Java, SQL, PL/SQL, Servlets, J2EE, JSP, Spring Framework, Spring MVC, MySQL, JDBC, Ajax, XML, HTML, CSS, jQuery, Log4j, Tomcat, Maven, JavaScript, JBoss, JUnit.
