Big Data Developer Resume
Piscataway, NJ
SUMMARY
- Extensive experience across a variety of IT technologies, including hands-on experience with Big Data technologies
- Proficient in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, YARN, HBase, Sqoop, Spark, Storm, Kafka, Oozie, Flink, NiFi, Impala and ZooKeeper
- Strong understanding of NoSQL databases and hands-on experience writing applications on HBase, Cassandra, MongoDB and Elasticsearch
- Hands-on experience developing ETL programs for data extraction, transformation and loading, and designing and implementing data warehouse applications on platforms such as Oracle, Amazon Redshift, Snowflake and SAS
- Hands-on experience in capacity planning, monitoring and performance tuning of Hadoop and Spark clusters
- Expertise in distributed programming with Spark in Java, Scala and Python
- Proficient in collecting, aggregating and moving large amounts of real-time data with Flume and Apache Spark, and in programming Scala to analyze large datasets and process real-time data with Spark Streaming, Kinesis and Kafka
- Strong experience with batch processing and workflow tools such as Airflow, NiFi, Luigi and Azkaban
- Experience writing Pig Latin scripts and HiveQL and Impala queries for preprocessing and analyzing large volumes of data
- Extensive experience designing and implementing large-scale data warehousing and analytics solutions on RDBMS platforms (e.g. Oracle, Teradata, Amazon RDS, PostgreSQL), with a clear understanding of their challenges and limitations
- Extensive experience in importing and exporting data using Sqoop from HDFS/Hive/HBase to Relational Database Systems (RDBMS) and vice versa
- Experience working with Public Cloud platforms like Google Cloud, AWS, and Azure
- Experience creating and managing AWS compute services such as EC2 and Elastic Load Balancing, storage services such as S3 and EBS, and content delivery with Amazon CloudFront
- Hands-on experience on full life cycle implementation using MapReduce, CDH (Cloudera) and HDP (Hortonworks Data Platform)
- Experience using, designing and developing REST APIs, and building web applications with the MERN stack (MongoDB, Express.js, React, Node.js) along with HTML5/HTML, CSS3/CSS, JavaScript, jQuery, Bootstrap, JSON and AJAX
- Experienced with MVC design patterns in web frameworks such as Django and Flask, deploying applications on Heroku and containerizing applications with Docker
- Knowledge of data serialization and familiar with data formats including SequenceFile, Avro, Parquet, XML and JSON
- Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD) and Object-Oriented Programming (OOP) concepts, and Java components such as Collections, exception handling and the I/O system
- Demonstrated ability to communicate and gather requirements, partner with enterprise architects, business users, analysts and development teams to deliver rapid iteration of complex solutions
- Proficient in business intelligence reporting tools like Tableau, SAP and Looker
- Experience in Agile, Waterfall and Scrum development environments using Git, Docker and JIRA
TECHNICAL SKILLS
Programming Languages: C, C++, Java, Python, Scala, JavaScript
Development Approach: Agile/SCRUM, Waterfall
Big Data Technologies: MapReduce, Spark, Spark SQL, Spark Streaming, Elasticsearch, Kafka, Sqoop, Flume, Azkaban, Hive, Cassandra, Apache NiFi, Oozie, Storm, Flink, ZooKeeper, Pig, YARN, Airflow, Impala
AWS: EC2, SNS, SQS, VPC, Lambda, DynamoDB, RDS, Kinesis, Redshift, S3, ELB, CloudFront, EBS, EMR, Glue
Web Technologies: HTML5, CSS, JavaScript, XML, Angular, React, Node.js, Express.js, RESTful Services, Bootstrap, jQuery, JSON, Flask, Django, Spring
NoSQL Databases: MongoDB, Cassandra, Redis, HBase, Neo4j, Oracle NoSQL, Amazon DynamoDB, Couchbase
RDBMS Databases: MySQL, Oracle, Microsoft SQL Server, PostgreSQL, IBM DB2, Teradata, SQLite
Tools: GitHub, SVN, Microsoft Office, Eclipse, Jupyter, Hue, Docker, Heroku, Looker, Tableau, IntelliJ
PROFESSIONAL EXPERIENCE
Confidential, Piscataway, NJ
Big Data Developer
Responsibilities:
- Designed and implemented a data pipeline for an audience-targeting data management platform (DMP) using Kafka, Flume, Spark and Hive, and ingested the data into Elasticsearch, HBase and Redis
- Worked with Nginx and a Flume NG cluster to import real-time bidding data from the server-to-server integrations between the DMP and a Demand-Side Platform (DSP) into Kafka for real-time processing and HDFS for batch processing
- Converted raw data to the columnar Parquet format to reduce data processing time and improve data transfer efficiency across the network
- Used Spark Streaming with Kafka to perform real-time statistical analysis of business indicators and stored the computed results in Redis (sketched below)
- Developed Spark programs, creating RDDs and DataFrames and applying Spark transformations, actions and broadcasts with Scala and Spark SQL to process offline data from a variety of sources for data segmentation and profile building
- Worked closely with the data science team, using Spark GraphX to identify the same user across multiple devices and a GeoHash algorithm to solve the user geolocation identification problem
- Used Spark transformations and actions to merge newly collected user data with the unified user data and stored the merged user segmentation data in HBase so it can be expanded dynamically
- Used ECharts and the ELK stack (Elasticsearch, Logstash and Kibana) to visualize the data in HBase and generated audience profile reports based on the analysis for future research
- Worked in an Agile environment and communicated effectively with different levels of management
Environment: Apache Hadoop 2.5, YARN, Spark 2.3.2, Kafka 0.10.0.1, Flume 1.8.0, Hive 3.1.0, HBase 1.3.3, Elasticsearch 6.5.3, Redis 5.0, Logstash 6.5.3, Kibana 6.5.3, ECharts 4.1.0
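A minimal Scala sketch of the Spark Streaming plus Kafka to Redis flow described above. The broker address, topic name, Redis endpoint, CSV field layout and Redis key names are illustrative assumptions, not the production configuration.

```scala
// Illustrative sketch: Spark Streaming reads bid events from Kafka, aggregates a
// per-batch indicator per campaign and increments counters in Redis.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.kafka.common.serialization.StringDeserializer
import redis.clients.jedis.Jedis

object IndicatorStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dmp-indicator-streaming")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka:9092",               // assumed broker address
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "dmp-indicators",
      "auto.offset.reset"  -> "latest"
    )

    // "bid-events" is a hypothetical topic standing in for the real DMP topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("bid-events"), kafkaParams))

    // Count events per campaign id (assumed to be the first CSV field) in each batch.
    stream.map(_.value.split(",")(0))
      .countByValue()
      .foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          val jedis = new Jedis("redis-host", 6379)        // assumed Redis endpoint
          partition.foreach { case (campaign, count) =>
            jedis.hincrBy("indicator:impressions", campaign, count)
          }
          jedis.close()
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Writing inside foreachPartition keeps one Redis connection per partition rather than one per record, which is the usual pattern for external sinks in Spark Streaming.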
Confidential, Riverside, CA
Data Engineer
Responsibilities:
- Worked closely with the web development team to implement a web application that allows users to create accounts, view nearby available bicycles on Google Maps, unlock the bicycles they want to use and send HTTP requests to the servers
- Used NGINX to receive HTTP requests from the user-facing iOS application and forward the data to the servers with load-balancing strategies
- Worked closely with the web development team to implement a microservice module in Java Spring Boot that collects user account information behind NGINX and stores it in a MongoDB cluster and a MySQL database
- Used Flume to monitor and collect real-time user behavior data such as location, riding duration and distance behind the NGINX HTTP load balancer, sinking the log data into Kafka message queues and HDFS
- Processed data from Kafka in real time with Spark Streaming, performing the necessary transformations and aggregations and storing the results in a MySQL database
- Wrote UDFs, UDAFs and UDTFs for ETL processes in Spark SQL and Spark Core, covering data processing and storage, to transform offline unstructured data in HDFS into structured data (sketched below)
- Utilized Sqoop to transfer data from HBase to HDFS
- Deployed Elasticsearch and Kibana in Docker on AWS to perform data indexing and visualization, helping the system owner make better business decisions
- Actively participated and provided constructive feedback during daily stand-up meetings and weekly iteration review meetings under Scrum development
Environment: Apache Hadoop 2.5, Apache Spark 2.1.3, Kafka 0.10.0.1, Sqoop 1.4.7, NGINX Plus R8, Flume 1.7.0, AWS, Zookeeper, iOS 11.4.1, MongoDB 3.4, Elasticsearch 6.2.1, Spring Boot 1.5.17
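A minimal Scala sketch of the kind of Spark SQL ETL referenced above: a UDF parses raw ride-log lines from HDFS into structured fields, the result is aggregated with Spark SQL and written to MySQL over JDBC. The log layout, HDFS path, table name and database credentials are hypothetical placeholders.

```scala
// Illustrative sketch: Spark SQL UDFs structure raw ride logs and the aggregate is
// persisted to MySQL via the JDBC data source.
import org.apache.spark.sql.SparkSession

object RideLogEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ride-log-etl").getOrCreate()

    // Hypothetical log layout: "userId|lat,lon|durationSeconds|distanceMeters"
    val raw = spark.read.textFile("hdfs:///data/ride-logs/")   // assumed HDFS path

    // UDFs that pull the user id and riding duration out of a raw log line.
    spark.udf.register("rideUser",     (line: String) => line.split("\\|")(0))
    spark.udf.register("rideDuration", (line: String) => line.split("\\|")(2).toLong)

    raw.toDF("line").createOrReplaceTempView("raw_rides")

    // Aggregate total riding time per user with Spark SQL.
    val summary = spark.sql(
      """SELECT rideUser(line) AS user_id,
        |       SUM(rideDuration(line)) AS total_duration_sec
        |FROM raw_rides
        |GROUP BY rideUser(line)""".stripMargin)

    // Assumed MySQL endpoint, table and credentials; replace with the real ones.
    summary.write.mode("append").format("jdbc")
      .option("url", "jdbc:mysql://mysql-host:3306/bikes")
      .option("dbtable", "ride_summary")
      .option("user", "etl").option("password", "secret")
      .save()
  }
}
```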
Confidential, Riverside, CA
Big Data Developer
Responsibilities:
- Implemented a real-time credit card fraud detection and analysis pipeline with Kafka, Spark Streaming, batch processing, Spark SQL, Cassandra, HBase and Airflow
- Developed Scala scripts and UDFs, using both DataFrames/Spark SQL and RDDs in Spark, for data aggregation, queries and writes, transforming large sets of semi-structured credit card transaction data (card information, merchant details and a fraud label) into structured data and loading it into Cassandra
- Worked with Cassandra, creating keyspaces and tables to load the cleaned transaction data
- Worked with the Spark ML pipeline API to process the structured transaction data in Cassandra and trained a machine learning model with the random forest algorithm (sketched below)
- Delivered real time credit card transaction data from multiple sources into Kafka messaging system
- Responsible for collecting incoming real-time credit card transaction data from Kafka, processing it with Spark Streaming and detecting fraud using the Spark ML library with the deployed model
- Stored the fraud transaction detection result data into HBase
- Developed a scalable web application with Java Spring Boot, jQuery and a Bootstrap dashboard to automate monitoring of HBase and raise alerts on the dashboard when fraud is detected from the real-time and batch data
- Automated the whole pipeline with Airflow scheduling, decreasing the pipeline run time by 49.5% and reducing data storage size by 99.7% by substituting Parquet files for the intermediate database
- Deployed the Spark code on EMR
- Worked in an Agile environment and communicated effectively with different levels of management
Environment: Apache Hadoop 2.5, Apache Spark 1.6.0/2.1.3, Kafka 0.10.0.1, Cassandra 2.2, EMR, HBase, Airflow 1.7.1.2, Spring Boot 1.3.0, Zookeeper, Bootstrap 3.3.7
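A minimal Scala sketch of training a random forest fraud model with the Spark ML pipeline API on transactions read from Cassandra. The keyspace, table, column names (including a 0/1 `is_fraud` label), feature list and model path are assumptions for illustration; the real schema and connector configuration may differ.

```scala
// Illustrative sketch: read structured transactions from Cassandra, assemble features
// and train a RandomForestClassifier inside a Spark ML Pipeline.
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler

object FraudModelTraining {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("fraud-rf-training").getOrCreate()

    // Assumed keyspace/table; requires the Spark Cassandra connector on the classpath.
    val txns = spark.read.format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "fraud", "table" -> "transactions"))
      .load()

    // Assumed numeric feature columns.
    val assembler = new VectorAssembler()
      .setInputCols(Array("amount", "merchant_risk", "hour_of_day"))
      .setOutputCol("features")

    val rf = new RandomForestClassifier()
      .setLabelCol("is_fraud")        // assumed 0/1 double label column
      .setFeaturesCol("features")
      .setNumTrees(100)

    val Array(train, test) = txns.randomSplit(Array(0.8, 0.2), seed = 42)
    val model = new Pipeline().setStages(Array(assembler, rf)).fit(train)

    val auc = new BinaryClassificationEvaluator()
      .setLabelCol("is_fraud")
      .evaluate(model.transform(test))
    println(s"Test AUC: $auc")

    model.write.overwrite().save("hdfs:///models/fraud-rf")   // assumed model path
  }
}
```

The saved PipelineModel can then be loaded in the Spark Streaming job so the same feature assembly and classifier are applied to live Kafka transactions.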
Confidential
Big Data Developer
Responsibilities:
- Implemented a pipeline to process and store streaming data from Twitter with Flume, Pig, Hive, Spark SQL, HBase and Oozie, covering both streaming ingestion and offline processing
- Created an application on the Twitter API developer portal and generated the corresponding access keys
- Created Flume agents to handle streaming data from Twitter and loaded the data into the Hadoop cluster
- Developed Scala scripts and UDFs, using both DataFrames/Spark SQL and RDDs in Spark, for data aggregation, queries and writes, transforming large sets of semi-structured tweet data into structured form (sketched below)
- Used Pig as an ETL tool to perform transformations, event joins and pre-aggregations before storing the data on HDFS
- Used Spark MLlib utilities such as classification, regression, clustering and collaborative filtering on the tweet data to analyze, identify and remove non-job-ad tweets
- Created Hive tables, loaded data and implemented various Hive optimization techniques such as dynamic partitions, buckets, map joins and parallel execution on the job-tweet data
- Dumped the data from HDFS to MySQL database and vice-versa using Sqoop
- Used the Oozie engine to create workflow and coordinator jobs that schedule and execute various Hadoop jobs such as Pig, Hive and Spark jobs and to automate Sqoop jobs
- Configured Oozie workflows to run multiple Hive and Spark jobs that trigger independently based on time and data availability
- Unit tested against a sample of raw data, improved performance and turned the pipeline over to production
Environment: Apache Hadoop 2.5, Apache Spark 1.1.1, Flume 1.5.2, Pig 0.14.0, Hive 0.14.0, HBase 0.94.22, Oozie 4.1.0, Sqoop 1.4.5, MySQL 5.7.5, Twitter API
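A minimal Scala sketch of the kind of tweet flattening described above, using the DataFrame API: raw JSON landed by the Flume Twitter source is reduced to the fields needed downstream and written to a Hive table. The landing path, selected fields and Hive table name are assumptions; the field names follow the standard Twitter JSON payload.

```scala
// Illustrative sketch: flatten raw tweet JSON from HDFS into a structured Hive table.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, to_timestamp}

object TweetFlatten {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("tweet-flatten")
      .enableHiveSupport()
      .getOrCreate()

    // Assumed landing directory written by the Flume HDFS sink.
    val raw = spark.read.json("hdfs:///flume/twitter/raw/")

    // Keep only the fields needed for job-ad analysis; one row per (tweet, hashtag).
    val tweets = raw.select(
      col("id_str").as("tweet_id"),
      col("user.screen_name").as("user"),
      col("text"),
      col("lang"),
      to_timestamp(col("created_at"), "EEE MMM dd HH:mm:ss Z yyyy").as("created_at"),
      explode(col("entities.hashtags.text")).as("hashtag"))

    // Assumed Hive database/table, partitioned by language.
    tweets.write.mode("overwrite")
      .partitionBy("lang")
      .saveAsTable("social.tweets_flat")
  }
}
```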
Confidential
Data Analyst
Responsibilities:
- Worked on large volumes of flight data, performing ETL processing with big data analytic tools including Spark, Hadoop, Hive, Pig and Impala
- Applied Scala scripts and UDFs, using both DataFrames/Spark SQL and RDDs in Spark, for batch processing of the airline data
- Developed Pig Latin scripts to transform the log data files and load them into HDFS
- Analyzed large data sets on HDFS with Impala queries and created views for business processing
- Created Hive tables, analyzed data with Hive queries, wrote Hive UDFs and worked on performance optimizations such as partitioning, bucketing, clustering, sampling, data compression and query tuning with Hive and Impala
- Converted raw data to the columnar Parquet format to reduce data processing time and improve data transfer efficiency across the network (sketched below)
- Connected Hive tables to Tableau and performed data visualization for reporting
- Used Git for version control and Jenkins for continuous integration
Environment: Apache Hadoop 2.2.0, Apache Spark 0.8.0, Hive 0.12.0, Pig 0.12.0, Impala 1.1.1, Tableau 8.0.5, Jenkins 2.4.1
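A minimal Scala sketch of the raw-to-Parquet conversion mentioned above, so downstream Hive and Impala queries scan a compressed, columnar format. The input path, header/schema inference and the year/month partition columns are illustrative assumptions.

```scala
// Illustrative sketch: convert raw CSV flight records to Snappy-compressed Parquet,
// partitioned for partition pruning in Hive/Impala.
import org.apache.spark.sql.SparkSession

object FlightsToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("flights-to-parquet").getOrCreate()

    // Assumed input path; a production job would declare an explicit schema
    // instead of inferring it.
    val flights = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/flights/raw/")

    // "year" and "month" columns are assumed to exist in the source data.
    flights.write.mode("overwrite")
      .option("compression", "snappy")
      .partitionBy("year", "month")
      .parquet("hdfs:///data/flights/parquet/")
  }
}
```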
Confidential
Responsibilities:
- Built a fully functional, scalable and secure web application for a book catalog using the Flask framework.
- Utilized a PostgreSQL database to allow users to register, log in, log out and perform CRUD operations.
- Designed and styled the web application using Bootstrap.
- Deployed the application on Heroku and restored the PostgreSQL database into Heroku using Amazon S3.
Environment: Flask 0.10, PostgreSQL 9.3, Bootstrap 3.1.0, Heroku, Amazon S3