Senior Big Data Engineer Resume, Los Angeles, California

PROFESSIONAL SUMMARY:

6+ years of experience with the Big Data Hadoop ecosystem, covering ingestion, storage, querying, processing, and analysis of big data.
Experience working with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Spark, and Flume for Big Data processing and analytics.
Hands-on experience installing and configuring Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, Sqoop, Flume, and Kafka.
Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
Experience developing MapReduce programs on Apache Hadoop to analyze big data according to requirements.
Wrote data transformations and data cleansing logic using Pig operations; good experience retrieving and processing data using Hive.
Worked with HBase to perform fast lookups, updates, inserts, and deletes in Hadoop.
Experience using the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
Proficient in Hive optimization techniques such as bucketing and partitioning.
Experienced in loading datasets into Hive for ETL (Extract, Transform, Load) operations.
Experience importing and exporting data between relational database systems and HDFS using Sqoop.
Extensive experience importing and exporting data using streaming data ingestion tools such as Flume.
Developed Apache Spark jobs using Scala and Python for faster data processing and used Spark Core and Spark SQL libraries for querying.
Experience creating Spark Streaming jobs to process large data sets in real time.
Experience tuning Spark jobs for storage and processing efficiency.
Extensive experience using Maven as a build tool to produce deployable artifacts from source code.
Experience working with AWS services such as EC2, EMR, S3, KMS, Kinesis, Lambda, API Gateway, and IAM.
Tested, cleaned, and standardized data to meet business standards using Execute SQL Task, Conditional Split, Data Conversion, and Derived Column across different environments.
Expertise in relational database systems (RDBMS) such as MySQL and NoSQL database systems such as HBase, with basic knowledge of MongoDB and Cassandra.
Experience in database development using SQL and PL/SQL, working with databases such as Oracle 12c/11g/10g, SQL Server, and MySQL.
Good understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, Secondary NameNode, DataNode, MapReduce concepts, and the YARN architecture, which includes the NodeManager, ResourceManager, and ApplicationMaster.
A strong team player with the ability to communicate effectively at all levels of the organization, including technical staff, management, and customers.
Ability to quickly master new concepts and applications.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Spark, Kafka

Hadoop platforms: Cloudera, MapR

AWS services: EC2, EMR, S3, KMS, Kinesis, Lambda, API Gateway, IAM

Languages: JavaScript, Scala, SQL, Python

Web Technologies: HTML5, CSS3, JavaScript

Scripting Language: UNIX Shell Script

RDBMS: MySQL, Oracle 12c/11g/10g

NoSQL Technologies: HBase, MongoDB, Cassandra, DynamoDB

Tools & Utilities: Eclipse, Visual Studio, NetBeans, GitHub, Maven, Jenkins

Operating Systems: Unix, Windows, CentOS, Linux (Ubuntu, Red Hat)

Others: PuTTY, WinSCP, GitHub

PROFESSIONAL EXPERIENCE:

Confidential, Los Angeles, California

Senior Big Data Engineer

Responsibilities:

Created a serverless data ingestion pipeline on AWS using MSK (Kafka) and Lambda functions.
Developed Java applications that read data from MSK (Kafka) and write it to DynamoDB.
Developed applications that leverage AWS Step Functions and CloudWatch event triggers to fetch data and generate features from it.
Involved in key project decisions spanning design, planning, implementation, and security.
Helped create a research data lake by extracting customer data from various sources, including Excel files, databases, and server logs, into S3.
Developed Apache Spark applications in Scala to process data from various streaming sources.
Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the stream data in DynamoDB.
Created Python-based Lambda functions for feature extraction.
Created a lightweight serverless pipeline that runs on Lambda functions and generates insights.
Created a set of data classifiers that read features from DynamoDB, classify them into bins, and store the results back in DynamoDB.
Used Lambda with SNS to push insight notifications to mobile devices.
Implemented mobile dashboards delivered through the Tableau Mobile application.
Used DataStage Designer stages such as Lookup, Join, Merge, Funnel, Filter, Copy, Aggregator, and Sort.
Involved in all phases of the SDLC and collaborated with a large team to get this pipeline operational.
Configured CloudWatch Logs and created a CloudWatch dashboard for monitoring.
Deployed machine learning models on SageMaker and exposed them as endpoints.
Invoked the endpoints to call the models in real time and generate insights.

Environment: Lambda, MSK, KMS, Spark, SQL Server 2016/2014, DB2, DynamoDB, CloudWatch, Tableau, Python, SNS, Step Functions.

Confidential, Columbus, Ohio

Big Data Developer

Responsibilities:

Achieved near-real-time reporting by adopting an event-based processing approach, rather than micro-batching, for data coming from Kafka.
Developed Spring Boot applications to read data from Kafka in an event-based manner. These applications ran as microservices, each handling part of the problem, and were deployed in Docker containers that were built and deployed automatically through Jenkins pipelines.
Wrote Spring Boot applications that read data from Kafka and write it to MapR-DB (MapR's version of HBase).
Wrote applications that both produced data to Kafka and consumed data from it.
Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
Implemented Spark solutions to generate reports and to fetch and load data in Hive.
Wrote real-time processing and core jobs using Spark Streaming, with Kafka as the data pipeline system.
Implemented Spark jobs in Scala and Python, using DataFrames and the Spark SQL API for faster data processing.
Involved in key project decisions, from design through planning and implementation.
Wrote HiveQL scripts to populate tables and brought in data from various systems using Sqoop.
Used DataStage as an ETL tool to extract data from source systems and load it into the Oracle database.
Built a data lake on the MapR cluster which was used by different teams.
Wrote Spark applications and mentored other team members on the benefits of Spark.
Implemented complex processing logic in Spark for data stored in MapR-DB and Hive.
Involved in all phases of the project lifecycle, including requirements gathering, design, development, testing, and deployment.
Built a dashboard of all YARN applications running on the cluster using the YARN API.

Environment: Hadoop, HDFS, Hive, HBase, Sqoop, Oracle 12c, Apache Spark, MapReduce, Python, SQL Server 2012, Spring Boot, Linux, Relational Databases.

Confidential, Detroit, Michigan

Hadoop Developer

Responsibilities:

Worked on Big Data infrastructure for both batch and real-time processing. Responsible for building scalable distributed data solutions using Hadoop.
Created Hive tables, loaded them with data, and wrote Hive queries that run as MapReduce jobs in the backend.
Designed and implemented Incremental Imports into Hive tables.
Very good understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
Imported and exported terabytes of data between HDFS and relational database systems using Sqoop.
Moved relational database data into Hive dynamic partition tables through staging tables using Sqoop.
Wrote Hive jobs to parse logs and structure them in tabular format for effective querying of the log data.
Manipulated and analyzed large datasets to find patterns and insights within structured and unstructured data.
Managed and reviewed Hadoop log files.
Implemented workflows using the Apache Oozie framework to automate tasks.
Worked with file formats such as SequenceFiles, XML files, and MapFiles in MapReduce programs.
Implemented data ingestion and managed clusters for real-time processing using Kafka.
Designed and developed Spark applications in Scala to compare the performance of Spark with Hive and SQL/Oracle.
Implemented Python scripts that perform transformations and actions on tables and send incremental data to the next zone, submitted with spark-submit.
Worked with the Spark ecosystem, running Spark SQL and Scala queries on formats such as text and CSV files.
Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
Developed Spark scripts using Scala shell commands as required.
Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
Exported the analyzed data to relational databases using Sqoop for visualization and report generation.

Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Flume, Python, Kafka, Apache Spark, HBase, Scala, Zookeeper, Maven, AWS, MySQL.

Confidential

Hadoop Developer

Responsibilities:

Developed MapReduce jobs in both Pig and Hive for data cleaning and pre-processing.
Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
Developed Sqoop scripts to load data from DB2 into HDFS and preprocessed it with Pig.
Automated the tasks of loading the data into HDFS and pre-processing with Pig by developing workflows using Oozie.
Loaded data from the UNIX file system into HDFS and wrote Hive user-defined functions (UDFs).
Used Sqoop to load data from DB2 to HBase for faster querying and performance optimization.
Worked on streaming to collect data from Flume and performed real-time and batch processing.
Developed Hive scripts for implementing dynamic partitions.
Developed Python scripts to find SQL injection vulnerabilities in SQL queries.
Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using a testing library.
Collected log data from web servers and integrated it into HDFS using Flume.
Developed ETL workflows in Scala to process the collected data in HDFS and HBase, orchestrated with Oozie.
Wrote ETL jobs in DataStage to visualize data and generate reports from the MySQL database.

Environment: Hadoop, HDFS, Hive, Pig, Flume, Mapper, ETL Workflows, HBase, Python, Sqoop, Oozie, DataStage, Linux, Relational Databases, SQL Server 2012, DB2.

Confidential

Associate Software Engineer

Responsibilities:

Imported data from Cassandra databases and stored it in AWS.
Performed transformations on the data using different Spark modules.
Responsible for Spark Core configuration based on the type of input source.
Executed Spark Streaming and Spark SQL code written in Scala for faster data processing.
Performed SQL Joins among Hive tables to get input for Spark batch process.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
Created partitions and buckets based on state for further processing with bucket-based Hive joins.
Imported data into Hadoop using Kafka and implemented an Oozie job for daily imports.
Used the AWS CLI for data transfers to and from Amazon S3 buckets.
Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
Wrote various SQL and PL/SQL queries and stored procedures for data retrieval.
Prepared utilities for unit testing the application using JSP and Servlets.
Developed Database applications using SQL and PL/SQL.
Applied design patterns and Object-Oriented design concept to improve the existing Java/J2EE based code base.
Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created DataFrames in Spark for further analysis.
Implemented Spark RDD transformations and actions for business analysis.
Developed Spark scripts using Scala shell commands as required.

Environment: Cassandra, Kafka, Spark, Pig, Hive, Oozie, AWS, SQL, Scala, Python, Core Java, FileZilla, PuTTY, IntelliJ, GitHub.
