Big Data Engineer Resume
Jersey City, NJ
SUMMARY
- I am a Big Data Engineer with 5 years of professional experience in the IT industry.
- As an experienced Big Data consultant, I ensure the successful delivery of high-quality big data solutions.
- I combine an understanding of the business case with a variety of skills, frameworks, best practices, and coding ability.
- I bring a strong work ethic and the ability to work well with teams to create the right platforms, pipelines, and reporting tools for clients.
- Used Spark SQL and the DataFrame API extensively to build Spark applications (a brief illustrative sketch follows this summary).
- Experienced with CQL (Cassandra Query Language) for retrieving data from Cassandra clusters by running CQL queries.
- Quick to learn and adapt to the CI/CD toolchain (GitHub, Jenkins) available in the customer environment or proposed to be made available.
- Configured the ELK stack for Jenkins logs and syslogs.
- Used Spark to stream analyzed data to HBase and make it available for visualization and report generation by the BI team.
- Used Spark Structured Streaming to build streaming DataFrames and update them in real time.
- Prototyped analysis and joining of customer data using Spark in Scala and wrote the processed results to HDFS.
- Implemented Spark on EMR for processing big data across our data lake in AWS.
- Developed AWS strategy and planning, and configured S3, security groups, IAM, EC2, EMR, and Redshift.
- Experience integrating Kafka with Avro for serializing and deserializing data. Expertise with Kafka producer and consumer.
- Experience in configuring, installing and managing Hortonworks & Cloudera Distributions.
- Involved in continuous integration of applications using Jenkins.
- Implemented Spark and Spark SQL for faster testing and processing of data.
- Experience writing streaming applications with Spark Streaming/Kafka.
- Utilized Spark Structured Streaming to process DataFrames and update them in real time.
- Experienced in Amazon Web Services (AWS), and cloud services such as EMR, EC2, S3, RDS and IAM entities, roles, and users.
- Knowledgeable in deploying application JAR files to AWS instances.
- Connected Spark Structured Streaming to Kafka brokers to consume data structured by schema.
- Handled schema changes in data streams using Kafka.
- Skilled in HiveQL, writing custom Hive UDFs, optimizing Hive queries, and writing incremental imports into Hive tables.
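For illustration only (not tied to a specific engagement), a minimal sketch of the Spark SQL and DataFrame API usage referenced above; the file path, view name, and column names are assumed placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, avg

    spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

    # Load a dataset with the DataFrame API (path is a placeholder)
    transactions = spark.read.parquet("hdfs:///data/raw/transactions")

    # DataFrame API: filter and aggregate
    daily_avg = (transactions
                 .filter(col("amount") > 0)
                 .groupBy("transaction_date")
                 .agg(avg("amount").alias("avg_amount")))

    # Spark SQL: the same data exposed as a temporary view
    transactions.createOrReplaceTempView("transactions")
    top_merchants = spark.sql("""
        SELECT merchant_id, SUM(amount) AS total_amount
        FROM transactions
        GROUP BY merchant_id
        ORDER BY total_amount DESC
        LIMIT 10
    """)

    daily_avg.show()
    top_merchants.show()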
TECHNICAL SKILLS
DATABASE AND DATA WAREHOUSE: Cassandra, HBase, Amazon Redshift, DynamoDB, MongoDB, Oracle, PostgreSQL, MySQL, Hive
DATA STORES (repositories): Data Lake, HDFS, Data Warehouse, S3
SOFTWARE DEVELOPMENT: Spark, Scala, Hive, Pig, Java, PySpark, Keras, TensorFlow, JavaScript, HTML, CSS, SQL, C, C++, C#, Shell Script, VBA, Python (Jupyter Notebook, Pandas, NumPy, Matplotlib, Scikit-learn, Boto3, Psycopg2, BeautifulSoup, GeoPandas, Rasterio), R, MATLAB
DEVELOPMENT TOOLS, AUTOMATION, CI/CD: Git, GitHub, MVC, Jenkins, CI/CD, Jira, Agile, Scrum
ELK LOGGING & SEARCH: Elasticsearch, Logstash, Kibana
DATA PIPELINES/ETL: Flume, Kafka, Kinesis, Hive, Pig, Spark, Spark Streaming, Spark Structured Streaming, Spark SQL, DataFrames
BIG DATA PLATFORMS: Cloudera CDH, Hortonworks HDP, Amazon Web Services (AWS)/Amazon Cloud
AWS PLATFORM: AWS IAM, AWS CloudFormation, AWS Redshift, AWS RDS, AWS EMR, AWS S3, AWS EC2, AWS Lambda, AWS Kinesis, ELK on AWS
DATA VISUALIZATION: Tableau, Power BI, Excel, Kibana
PROFESSIONAL EXPERIENCE
BIG DATA ENGINEER
Confidential, Jersey City, NJ
Responsibilities:
- Created a training program to develop IT professionals into Machine Learning developers.
- Trained IT professionals in Python and Spark so that, by the end of the program, they could understand capabilities, features, custom solutions, and limitations, and deliver high-quality proof-of-concept data processing models using PySpark programs.
- Built Real-Time Streaming Data Pipelines with Kafka, Spark Streaming and Hive.
- Implemented Spark Streaming with Kafka for real-time data processing (a brief illustrative sketch follows this section).
- Handled large amounts of data utilizing Spark.
- Wrote streaming applications with Spark Streaming/Kafka.
- Used Spark SQL to perform transformations and actions on data residing in Hive.
- Responsible for designing and deploying new ELK clusters.
- Monitored logs and generated visual representations of logs using the ELK stack.
- Implemented CI/CD tool upgrades, backups, and restores.
- Created Infrastructure design for ELK Clusters.
- Worked on Elasticsearch and Logstash (ELK) performance and configuration tuning.
- Created a Kafka producer to connect to different external sources and bring the data to a Kafka broker.
- Handled schema changes in data streams using Kafka.
- Supported clusters and topics through Kafka Manager.
- Responsible for Kafka operation and monitoring, and handling of messages funneled through Kafka topics.
- Coordinated Kafka operation and monitoring with DevOps personnel; balanced the impact of Kafka producer and consumer message (topic) consumption.
- Managed versioning with Git and set up Jenkins CI to manage CI/CD practices.
- Pulled data and populated the data in Kibana.
- Designed Kibana dashboards over Elasticsearch for visualizing the data.
- Used Kibana to create custom dashboards, data visualizations, and reports.
- Built Jenkins jobs for CI/CD infrastructure from GitHub repos.
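A minimal sketch, assuming a JSON event feed, of the kind of Kafka-to-Spark Structured Streaming pipeline described above; the broker address, topic name, and schema fields are placeholders, and the console sink stands in for the real HBase/Hive targets.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Placeholder schema for incoming JSON events
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
    ])

    spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

    # Read a stream from a Kafka topic (broker and topic are placeholders)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events")
           .load())

    # Parse the Kafka message value into structured columns
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Write the structured stream out (console sink for demonstration)
    query = (events.writeStream
             .format("console")
             .outputMode("append")
             .start())
    query.awaitTermination()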
AWS BIG DATA ENGINEER
Confidential, Atlanta, GA
Responsibilities:
- Implemented AWS IAM user roles and policies to authenticate and control access.
- Specified nodes and performed data analysis queries on Amazon Redshift clusters on AWS.
- Developed AWS CloudFormation templates to create the custom infrastructure for our pipeline.
- Worked on AWS Kinesis for processing large amounts of real-time data.
- Developed multiple Spark Streaming and batch Spark jobs using Scala and Python on AWS.
- Ingested data from various sources to S3 through AWS Kinesis.
- Configured CloudFormation, AWS IAM, and security groups in public and private subnets within a VPC.
- Worked with AWS Lambda functions for event-driven processing to various AWS resources.
- Implemented Spark on EMR for processing big data across our data lake in AWS.
- Worked with the AWS IAM console to create custom users and groups; hands-on work with AWS EMR and S3.
- Automated AWS components such as EC2 instances, security groups, ELB, RDS, Lambda, and IAM through AWS CloudFormation templates.
- Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded it into AWS Redshift.
- Implemented security measures AWS provides, employing key concepts of AWS Identity and Access Management (IAM).
- Installed, configured, and managed AWS tools such as ELK and CloudWatch for resource monitoring.
- Used AWS EMR to process big data across Hadoop clusters of virtual servers on Amazon Simple Storage Service (S3).
- Launched and configured Amazon EC2 cloud servers using AMIs (Linux/Ubuntu) and configured the servers for specified applications.
- Responsible for designing logical and physical data models for various data sources on AWS Redshift.
- Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 bucket using Amazon API Gateway (see the illustrative sketch after this section).
- Used AWS Kinesis for real-time data processing.
- Experienced in Amazon Web Services (AWS), and cloud services such as EMR, EC2, S3, ELB and IAM entities, roles, and users.
- Developed AWS CloudFormation templates for Redshift.
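A minimal sketch, assuming an S3 object-created trigger, of the kind of event-driven Lambda processing described above; the DynamoDB table name and item fields are hypothetical placeholders.

    import json
    import boto3

    # Placeholder DynamoDB table used to record ingested objects
    TABLE_NAME = "ingested_objects"
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(TABLE_NAME)

    def lambda_handler(event, context):
        """Triggered by S3 object-created events; records each object's bucket, key, and size."""
        records = event.get("Records", [])
        for record in records:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            size = record["s3"]["object"].get("size", 0)
            table.put_item(Item={"object_key": key, "bucket": bucket, "size_bytes": size})
        return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}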
BIG DATA DEVELOPER
Confidential, St. Paul, MN
Responsibilities:
- Wrote Hive queries and optimized them in HiveQL.
- Performed ETL into the Hadoop file system (HDFS) and wrote Hive UDFs.
- Experienced in importing real-time logs to HDFS using Flume.
- Created UNIX shell scripts to automate the build process, and to perform regular jobs like file transfers.
- Managed Hadoop clusters and checked the status of clusters using Ambari.
- Moved relational database data using Spark, transforming it and loading it into Hive dynamic-partition tables via staging tables (a brief illustrative sketch follows this section).
- Developed scripts to automate the workflow processes and generate reports.
- Transferred data between the Hadoop ecosystem and structured data storage in an RDBMS such as MySQL using Sqoop.
- Involved in writing incremental imports into Hive tables.
- Extensively worked on HiveQL, join operations, writing custom UDFs, and skilled in optimizing Hive Queries.
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
- Developed Shell Scripts, Oozie Scripts and Python Scripts.
- Retrieved data through Hive on the HDFS platform.
- Developed job processing scripts using Oozie workflow to run multiple Spark Jobs in sequence for processing data.
- Used Ambari to maintain a healthy cluster.
- Configured Hadoop components (HDFS, Zookeeper) to coordinate the servers in clusters.
- Performed Hive partitioning, bucketing, and joins on Hive tables, utilizing Hive SerDes.
- Wrote shell scripts to automate workflows to pull data from various databases into Hadoop.
- Loaded data into HBase tables and Hive tables for consumption purposes.
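A minimal sketch, assuming a MySQL source and a pre-existing partitioned Hive table, of the Spark-based RDBMS-to-Hive load described above; the JDBC URL, credentials, and table/column names are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Spark session with Hive support (assumes Hive is configured on the cluster)
    spark = (SparkSession.builder
             .appName("rdbms-to-hive-demo")
             .enableHiveSupport()
             .getOrCreate())

    # Allow dynamic partitioning when inserting into the Hive table
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    # Read from the relational source over JDBC (URL, table, and credentials are placeholders)
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:mysql://db-host:3306/sales")
              .option("dbtable", "orders")
              .option("user", "etl_user")
              .option("password", "***")
              .load())

    # Stage the data, then load it into a partitioned Hive table
    orders.createOrReplaceTempView("orders_staging")
    spark.sql("""
        INSERT INTO TABLE analytics.orders_partitioned PARTITION (order_date)
        SELECT order_id, customer_id, amount, order_date
        FROM orders_staging
    """)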
BIG DATA DEVELOPER
Confidential, Bloomington, IN
Responsibilities:
- Configured, installed, and managed Hortonworks (HDP) distributions.
- Enabled security to the cluster using Kerberos and integrated clusters with LDAP at Enterprise level.
- Worked on tickets related to various Hadoop/Big data services which include HDFS, Yarn, Hive, Oozie, Spark, Kafka.
- Worked on Hortonworks Hadoop distributions (HDP 2.5).
- Performed cluster tuning and ensured high availability.
- Provided cluster coordination services through ZooKeeper and Kafka.
- Coordinated cluster upgrade needs, monitored cluster health, and built proactive tools to look for anomalous behaviors.
- Managed Hadoop clusters via the command line and the Hortonworks Ambari agent.
- Monitored multiple Hadoop cluster environments using Ambari.
- Worked with cluster users to ensure efficient resource usage in the cluster and alleviate multi-tenancy concerns.
- Managed clusters using Ambari.
- Managed and scheduled batch jobs on a Hadoop Cluster using Oozie.
- Monitored Hadoop clusters using tools like Ambari.
- Performed cluster and system performance tuning.
- Ran multiple Spark jobs in sequence for processing data.
- Performed analytics on data using Spark.
- Moved data from Spark and persisted it to HDFS.
- Used Spark SQL and UDFs to perform transformations and actions on data residing in Hive (illustrated in the sketch below).
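A minimal sketch, assuming a Hive table is already defined, of the Spark SQL/UDF transformations over Hive data referenced above; the table, column names, and output path are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("hive-udf-demo")
             .enableHiveSupport()
             .getOrCreate())

    # Simple Python UDF that maps a status code to a readable label
    @udf(returnType=StringType())
    def status_label(code):
        return {"A": "active", "I": "inactive"}.get(code, "unknown")

    # Transform data residing in a Hive table (table and columns are placeholders)
    accounts = spark.table("warehouse.accounts")
    labeled = accounts.withColumn("status_label", status_label(col("status_code")))

    # Persist the results back to HDFS for downstream consumption
    labeled.write.mode("overwrite").parquet("hdfs:///data/curated/accounts_labeled")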