Big Data Developer Resume
New York
SUMMARY
- Over 4 years of IT experience as a developer, designer, and quality tester, with cross-platform integration experience covering Hadoop development and administration.
- Hands-on experience installing, configuring, and using the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark, and Sqoop.
- Strong understanding of various Hadoop services, MapReduce and YARN architecture.
- Responsible for writing MapReduce programs.
- Experienced in importing and exporting data to and from HDFS using Sqoop.
- Experience loading data to Hive partitions and creating buckets in Hive.
- Developed MapReduce jobs to automate data transfer from HBase.
- Expertise in analysis using Pig, Hive, and MapReduce.
- Experience with HDFS data storage and with supporting MapReduce jobs.
- Experience in Chef, Puppet or related tools for configuration management.
- Worked on analyzing Hadoop clusters and various big data analytics tools, including Pig, the HBase database, and Sqoop.
- Involved in Infrastructure set up and installation of HDP stack on Amazon Cloud.
- Experience ingesting data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.
- Experience in big data technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, ZooKeeper, and NoSQL.
- Added and removed cluster components through Cloudera Manager.
- Experience in benchmarking and in backup and disaster recovery of NameNode metadata and other important, sensitive data residing on the cluster.
- Experience designing and implementing HDFS access controls, directory and file permissions, and user authorization to provide stable, secure access for multiple users in a large multi-tenant cluster.
- Extensive experience importing and exporting streaming data into HDFS using stream processing platforms such as Flume and the Kafka messaging system.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Responsible for provisioning, installing, configuring, monitoring, and maintaining HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, SmartSense, Storm, and Kafka.
- Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Good working knowledge of Vertica DB architecture, column orientation, and high availability.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
- Major strengths: familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and being a self-motivated, focused team player with excellent interpersonal, technical, and communication skills.
- Experience in defining detailed application software test plans, including organization, participant, schedule, test and application coverage scope.
- Experience in gathering and defining functional and user interface requirements for software applications.
- Experience in real-time analytics with Apache Spark (RDDs, DataFrames, and the Streaming API).
- Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data (an illustrative sketch follows this summary).
- Experience integrating Hadoop with Kafka; expertise in loading clickstream data from Kafka into HDFS.
- Expert in using Kafka as a publish-subscribe messaging system.
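As an illustration of the Spark DataFrames work over Hive data mentioned above, the following minimal Scala sketch shows the general pattern; the database, table, and column names (web_logs.click_events, analytics.daily_clicks, event_type, event_date, page_id) are hypothetical placeholders rather than details from an actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object HiveAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; assumes hive-site.xml is visible to Spark
    val spark = SparkSession.builder()
      .appName("HiveAnalyticsSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table as a DataFrame (hypothetical table name)
    val events = spark.table("web_logs.click_events")

    // Simple DataFrame-API analytics: daily click counts per page
    val dailyClicks = events
      .filter(col("event_type") === "click")
      .groupBy("event_date", "page_id")
      .count()
      .orderBy("event_date")

    // Persist the result back to Hive (hypothetical target table)
    dailyClicks.write.mode("overwrite").saveAsTable("analytics.daily_clicks")

    spark.stop()
  }
}
```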
PROFESSIONAL EXPERIENCE
Confidential, New York
Big Data Developer
Responsibilities:
- Developed Spark applications in Scala using the DataFrames and Spark SQL APIs for faster data processing.
- Followed Agile methodologies, including daily scrum meetings and sprint planning, and wrote scripts to distribute queries for performance-test jobs in the Amazon data lake.
- Created Hive tables, loaded transactional data from Teradata using Sqoop, and worked with roughly 2 petabytes of highly unstructured and semi-structured data.
- Developed MapReduce jobs for cleaning, accessing, and validating data, and created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed optimal strategies for distributing the weblog data over the cluster, importing and exporting the stored weblog data into HDFS and Hive using Sqoop.
- Installed and configured Apache Hadoop on multiple AWS EC2 nodes, and developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the data fed to AWS S3.
- Responsible for building scalable distributed data solutions using Cloudera Hadoop, and designed and developed automation test scripts in Python.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Analyzed the SQL scripts, designed the solution to implement them using Spark, and implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
- Loaded streaming data from Kafka into HDFS, HBase, and Hive by integrating with Storm, and wrote Pig scripts to transform raw data from several data sources into baseline data.
- Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication, and Sharding features.
- Designed the HBase row key to store text and JSON as key values in the HBase table, structuring the key so rows can be retrieved and scanned in sorted order.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Created Hive tables and worked with them using HiveQL, and designed and implemented static and dynamic partitioning and bucketing in Hive.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (see the sketch at the end of this list).
- Developed syllabus/curriculum data pipelines from the syllabus/curriculum web services to HBase and Hive tables.
- Worked on cluster coordination services through ZooKeeper, and monitored workload, job performance, and capacity planning using Cloudera Manager.
- Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
- Configured, deployed, and maintained multi-node dev and test Kafka clusters, and implemented data ingestion and cluster handling for real-time processing with Kafka.
- Created cubes in Talend to build different types of aggregations over the data and to visualize them.
- Monitored Hadoop NameNode health, the number of TaskTrackers running, and the number of DataNodes running, and automated all jobs from pulling data from sources such as MySQL to pushing the result-set data into HDFS.
- Developed storytelling dashboards in Tableau Desktop, published them to Tableau Server, and used GitHub version control to maintain project versions.
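As an illustration of the Kafka-to-Cassandra streaming pattern described above, the following minimal Scala sketch uses Spark Structured Streaming with foreachBatch as a close analogue of the Spark Streaming work; it assumes the spark-sql-kafka-0-10 source and the DataStax spark-cassandra-connector are on the classpath, and the broker address, topic, keyspace, table, and event schema are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToCassandraSketch")
      .getOrCreate()

    // Hypothetical JSON payload schema for learner events
    val eventSchema = new StructType()
      .add("learner_id", StringType)
      .add("course_id", StringType)
      .add("event_time", TimestampType)

    // Read the Kafka topic as a stream and parse the JSON value column
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
      .option("subscribe", "learner-events")             // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
      .select(from_json(col("json"), eventSchema).alias("e"))
      .select("e.*")

    // Write each micro-batch to Cassandra through the connector's batch writer
    val writeToCassandra: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "learner", "table" -> "events")) // hypothetical
        .mode("append")
        .save()

    val query = events.writeStream
      .option("checkpointLocation", "/tmp/checkpoints/learner-events")
      .foreachBatch(writeToCassandra)
      .start()

    query.awaitTermination()
  }
}
```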
Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux.
Confidential, New York, New York
Big Data Developer
Responsibilities:
- Worked on Hive/SQL queries and performed Spark transformations using Spark RDDs and Python (PySpark).
- Created a serverless data ingestion pipeline on AWS using Lambda functions.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in DynamoDB using Scala.
- Developed Apache Spark applications in Scala and Python, and implemented an Apache Spark data processing module to handle data from various RDBMS and streaming sources.
- Experience developing and scheduling various Spark Streaming and batch jobs using Python (PySpark) and Scala.
- Developed Spark code in PySpark to apply various transformations and actions for faster data processing.
- Achieved high-throughput, scalable, fault-tolerant stream processing of live data streams using Apache Spark Streaming.
- Used Spark stream processing in Scala to bring data into memory, created RDDs and DataFrames, and applied transformations and actions.
- Used various Python libraries with Spark to create DataFrames and store them in Hive.
- Created Sqoop jobs and Hive queries for data ingestion from relational databases to analyze historical data.
- Experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
- Executed Hadoop/Spark jobs on AWS EMR using programs stored in S3 buckets.
- Knowledge of creating user-defined functions (UDFs) in Hive.
- Worked with different file formats such as Avro and Parquet for Hive querying and processing based on business logic.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Implemented Hive UDFs to apply business logic, and responsible for performing extensive data validation using Hive (see the sketch at the end of this list).
- Loaded structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Developed code to generate various DataFrames based on business requirements and created temporary tables in Hive.
- Utilized AWS CloudWatch to monitor environment instances for operational and performance metrics during load testing. Scripted Hadoop package installation and configuration to support fully automated deployments.
- Involved in Chef infrastructure maintenance, including backups and security fixes on the Chef server.
- Deployed application updates using Jenkins; installed, configured, and managed Jenkins.
- Triggered the client's SIT environment build remotely through Jenkins.
- Deployed and configured Git repositories wif branching, forks, tagging, and notifications.
- Experienced and proficient in deploying and administering GitHub.
- Deployed builds to production and worked with the teams to identify and troubleshoot any issues.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
- Reviewed selected issues through the SonarQube web interface.
- Developed a fully functional login page for the company's user-facing website with complete UI and validations.
- Installed, configured, and utilized AppDynamics (an application performance management tool) across the whole JBoss environment (prod and non-prod).
- Responsible for upgrading SonarQube using the Update Center.
- Resolved tickets submitted by users and P1 issues, troubleshooting, documenting, and resolving the errors.
- Installed and configured Hive in the Hadoop cluster and helped business users and application teams fine-tune their HiveQL for optimal performance and efficient use of cluster resources.
- Conducted performance tuning of the Hadoop cluster and MapReduce jobs, as well as the real-time applications, applying best practices to fix design flaws.
- Implemented Oozie workflows for the ETL process for critical data feeds across the platform.
- Configured Ethernet bonding for all nodes to double the network bandwidth.
- Implemented the Kerberos security authentication protocol for the existing cluster.
- Built high availability for the major production cluster and designed automatic failover control using the ZooKeeper Failover Controller (ZKFC) and quorum journal nodes.
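As an illustration related to the Hive UDF and data-validation work above: the resume describes Hive UDFs proper (typically Java classes extending Hive's UDF or GenericUDF interfaces), while this minimal Scala sketch shows the closely related pattern of registering a custom function through Spark SQL and applying it to a Hive table; the database, table, column names, and validation rule are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveUdfValidationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveUdfValidationSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical business rule: a transaction amount must fall within a valid range
    spark.udf.register("is_valid_amount",
      (amount: Double) => amount >= 0.0 && amount <= 100000.0)

    // Apply the registered function in a SQL query over a Hive table
    // (finance.transactions is a hypothetical table)
    val invalid = spark.sql(
      """
        |SELECT txn_id, amount
        |FROM finance.transactions
        |WHERE NOT is_valid_amount(amount)
      """.stripMargin)

    invalid.show(20, truncate = false)

    spark.stop()
  }
}
```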
Environment: HDFS, Map Reduce, Hive 1.1.0, Kafka, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, Apache Hadoop 2.6, Spark, SOLR, Storm, Cloudera Manager, Red Hat, MySQL, Prometheus, Docker, Puppet.