
Sr. Big Data Developer Resume


Brooklyn, NY

SUMMARY

  • 8+ years of experience in the IT sector across Big Data and Hadoop frameworks and cloud services.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing.
  • Exceptional understanding of Hadoop architecture and its components, including HDFS, the MapReduce programming paradigm, NameNode, DataNode, ResourceManager, NodeManager, JobTracker, TaskTracker, and the Hadoop ecosystem (Hive, Impala, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Spark).
  • Well versed in installing, configuring, supporting, and managing Hadoop clusters and their underlying Big Data infrastructure.
  • Experience in installing and updating Hadoop and its related components in both single-node and multi-node cluster environments using Cloudera Manager.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in analyzing data using HiveQL, Impala and custom MapReduce programs in Java.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
  • Experience with Database administration, maintenance, and schema design for PostgreSQL and MySQL.
  • Experience in data management and implementation of Big Data applications using Hadoop frameworks.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data Ingestion, Oozie for scheduling and HBase as a NoSQL data store.
  • Experienced in deployment of Hadoop Cluster using Ambari, Cloudera Manager.
  • Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
  • Proficient in configuring Zookeeper, Flume & Sqoop to the existing Hadoop cluster.
  • Good knowledge of Apache Flume, Sqoop, Hive, HCatalog, Impala, ZooKeeper, and Oozie.
  • Experience with Java web framework technologies like Spring Batch.
  • Expertise in deploying Hadoop, YARN, Spark, and Storm, and integrating them with Cassandra, Ignite, RabbitMQ, and Kafka.
  • Strong experience with high-volume transactional systems running on Unix/Linux and Windows.
  • Able to coordinate work across tight schedules and meet deadlines; a self-starter, adaptable, and a collaborator with effective communication and people skills.

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop 3.3, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive, Sqoop 1.4, Oozie 4.3, YARN, Apache Flume 1.9, Kafka 2.8, ZooKeeper

SDLC Methodologies: Agile, Waterfall

Hadoop Distributions: Cloudera, Hortonworks, MapR

NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB

Cloud Platforms: Amazon AWS, EC2, S3, MS Azure

Programming Languages: Java, Scala, Python 3.5, Shell Scripting, Storm 1.0, JSP, Servlets

Web Technologies: HTML, CSS, JavaScript, jQuery 3.3, XML, JSON

Frameworks: Spring MVC, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS

Web/Application Servers: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere

Version Control: GIT, SVN, CVS

Operating Systems: Linux, Unix, Windows 10/8/7

PROFESSIONAL EXPERIENCE

Confidential

Sr. Big Data Developer

Responsibilities:

  • As a Big Data Developer, assisted in leading the plan, build, and run phases within the Enterprise Analytics Team.
  • Worked in agile mode and interacted closely with the product owner and business team.
  • Interacted with other technical peers to derive technical requirements.
  • Developed a data pipeline using Kafka to store data into HDFS; a minimal sketch of this consumer-to-HDFS flow follows this list.
  • Built Jenkins jobs to create AWS infrastructure from GitHub repos containing Terraform code.
  • Collaborated with developer teams on workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to the Kafka broker.
  • Worked on Lambda functions that aggregate data from incoming events and store the results in Amazon DynamoDB.
  • Implemented File Transfer Protocol operations using Talend Studio to transfer files in between network folders.
  • Developed monitoring and notification tools using Python.
  • Ingested structured data into appropriate schemas and tables to support the rule and analytics.
  • Worked on transferring data from Kafka topic into AWS S3 storage.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Involved in performing Transformations & Actions on RDDs and Spark Streaming data.
  • Developed custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
  • Loaded data from different sources (databases and files) into Hive using Talend.
  • Involved in deploying and maintaining production environment using AWS EC2 instances and ECS with Docker.
  • Developed tools using Python and XML to automate routine tasks.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Developed Hive scripts and Hive UDFs to load data files.
  • Managed Hadoop jobs using the Airflow workflow scheduler for MapReduce, Hive, and Sqoop actions.
  • Worked on DynamoDB and Amazon EMR.
  • Troubleshot, debugged, and resolved Tableau-specific issues while maintaining the health and performance of the ETL environment.
  • Created data transformations using PySpark and AWS Glue.
  • Involved in managing and reviewing Hadoop log files.
  • Used DynamoDB Streams to trigger Lambda functions written in Python; a sketch of this handler also follows this list.
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Created and attached volumes on to EC2 instances.
  • Created test harness to enable comprehensive testing utilizing Python.
  • Worked on cluster configurations and resource management using YARN.
  • Responsible for managing test data coming from various sources.
  • Developed batch process using Unix Shell Scripting.
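
A minimal sketch of the Kafka-to-HDFS ingestion flow described above, assuming the kafka-python and hdfs (WebHDFS) client libraries; the topic, broker, and NameNode addresses are placeholders, and messages are landed as one file per batch to avoid HDFS append semantics and small-file churn:

    # Consume messages from a Kafka topic and land them in HDFS in small batches.
    import time

    from kafka import KafkaConsumer   # kafka-python
    from hdfs import InsecureClient   # WebHDFS client

    consumer = KafkaConsumer(
        "events",                                   # hypothetical topic
        bootstrap_servers=["broker1:9092"],         # placeholder broker
        group_id="hdfs-writer",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: v.decode("utf-8"),
    )
    hdfs = InsecureClient("http://namenode:9870", user="hadoop")

    batch, BATCH_SIZE = [], 1000
    for message in consumer:
        batch.append(message.value)
        if len(batch) >= BATCH_SIZE:
            path = "/data/raw/events/events_{}.json".format(int(time.time()))
            hdfs.write(path, data="\n".join(batch) + "\n", encoding="utf-8")
            batch = []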
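
A minimal sketch of the DynamoDB Streams-driven Lambda handler mentioned above, assuming boto3; the event_aggregates table and event_type attribute are hypothetical. It tallies INSERT events per key and upserts running counts into the aggregate table:

    # Lambda handler wired to a DynamoDB Stream: count INSERT events per key
    # and accumulate the totals in an aggregate DynamoDB table.
    import boto3

    dynamodb = boto3.resource("dynamodb")
    aggregates = dynamodb.Table("event_aggregates")    # hypothetical table

    def lambda_handler(event, context):
        counts = {}
        for record in event.get("Records", []):
            if record.get("eventName") != "INSERT":
                continue
            new_image = record["dynamodb"]["NewImage"]
            event_type = new_image["event_type"]["S"]  # assumed string attribute
            counts[event_type] = counts.get(event_type, 0) + 1

        for event_type, n in counts.items():
            aggregates.update_item(
                Key={"event_type": event_type},
                UpdateExpression="ADD event_count :n",
                ExpressionAttributeValues={":n": n},
            )
        return {"processed": sum(counts.values())}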

Environment: Hadoop 3.3, Agile, AWS, Jenkins, DynamoDB, Python 3.5, YARN, Hive 3.2, Oozie, MapReduce, Terraform, REST API, Kafka 2.8, Talend, Tableau.

Confidential - Brooklyn, NY

Big Data/Hadoop Developer

Responsibilities:

  • Supported Big Data/Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingestion as required.
  • Worked with business teams and created Hive queries for ad hoc access.
  • Designed the business requirement collection approach based on the project scope and SDLC (Agile) methodology.
  • Worked with S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
  • Collaborated with developer teams to move data into HDFS through Sqoop.
  • Automated data movements using python scripts.
  • Integrated Kafka with Spark Streaming for high-speed data processing; a PySpark Structured Streaming sketch follows this list.
  • Worked with the Avro data serialization system alongside JSON data formats.
  • Involved in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Developed backup policies for Hadoop systems and action plans for network failure.
  • Installed and set up Python on the AWS EC2 cloud platform.
  • Created custom columns depending on the use case while ingesting data into the Hadoop data lake using PySpark.
  • Wrote Hive queries and UDFs.
  • Worked on reading multiple data formats on HDFS.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Created automated pipelines in AWS CodePipeline to deploy Docker containers to AWS ECS using S3.
  • Extracted the data from Databases into HDFS using Sqoop.
  • Managed importing of data from various data sources, performed transformations using Hive, Pig and Spark and loaded data into HDFS.
  • Managed and reviewed Hadoop log files.
  • Used Oozie scripts for application deployment and Perforce as the versioning software.
  • Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS.
  • Involved in analysis, design, testing phases and responsible for documenting technical specifications.
  • Developed Kafka producer and consumers, HBase clients, Spark, and Hadoop MapReduce jobs along with components on HDFS, Pig, Hive.
  • Worked with Apache NiFi to manage the flow of data from sources through automated data flows.
  • Developed an AWS Lambda function to invoke a Glue job as soon as a new file arrives in the inbound S3 bucket (sketched after this list).
  • Worked on the core and SparkSQL modules of Spark extensively.
  • Defined job flows and managed and reviewed Hadoop and HBase log files.
  • Developed solutions to pre-process large sets of structured, semi-structured data.
  • Involved in loading data from UNIX file system to HDFS.
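
A minimal PySpark Structured Streaming sketch of the Kafka integration noted above, assuming the spark-sql-kafka connector is available on the cluster; the broker, topic, message schema, and HDFS paths are placeholders:

    # Read JSON messages from Kafka, parse them against an assumed schema,
    # and write the stream to HDFS as Parquet with checkpointing.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

    schema = StructType([                     # assumed message layout
        StructField("user_id", StringType()),
        StructField("action", StringType()),
        StructField("event_time", TimestampType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "clickstream")   # hypothetical topic
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    query = (
        events.writeStream.format("parquet")
        .option("path", "hdfs:///data/curated/clickstream")
        .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()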
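
A minimal sketch of the S3-triggered Lambda that starts a Glue job for each newly arrived object, assuming boto3; the Glue job name and argument key are hypothetical:

    # For every object-created record in the S3 event, kick off a Glue job run
    # pointed at the new file.
    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        runs = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            response = glue.start_job_run(
                JobName="inbound-file-etl",    # hypothetical Glue job name
                Arguments={"--source_path": "s3://{}/{}".format(bucket, key)},
            )
            runs.append(response["JobRunId"])
        return {"started_runs": runs}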

Environment: Hadoop 3.0, Agile, AWS, Python, Docker, Avro, Pig 0.17, Hive, Sqoop 1.4, HBase 1.2, Kafka 1.1, MapReduce, GitHub, Oozie 4.3, SQL.

Confidential, Jacksonville FL

Spark/Hadoop Developer

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications following project guidelines for program development.
  • Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets.
  • Worked on Spark for in-memory computations and compared DataFrames to optimize performance.
  • Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala; the migration pattern is sketched after this list.
  • Performed S3 bucket creation, configured IAM role-based policies, and customized the JSON templates.
  • Installed, configured, monitored, and maintained HDFS, Yarn, HBase, Flume, Sqoop, Oozie, Pig, Hive.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, Cassandra, and slot configuration.
  • Installed and configured Hadoop ecosystem components and Cloudera Manager using the CDH distribution.
  • Designed and developed applications in Spark using Scala to compare the performance of Spark with Hive.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Involved in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Involved in installing and configuring MapR and Hortonworks clusters, and installed Hadoop ecosystem components such as Pig, Hive, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
  • Worked on scripting Hadoop package installation and configuration to support fully automated deployments.
  • Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.
  • Ran Hadoop streaming jobs to process terabytes of text data.
  • Worked on YARN capacity scheduler by creating queues to allocate resource guarantee to specific groups.
  • Resolved tickets submitted by users, and troubleshot, documented, and fixed the underlying errors.
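
An illustrative sketch of the MapReduce-to-Spark migration pattern referenced above, shown as the classic word-count aggregation; the project code was written in Scala, but the same transformation chain is expressed here in PySpark for consistency with the other sketches, with placeholder paths:

    # Express the map and reduce phases as Spark RDD transformations.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mr-to-spark-poc").getOrCreate()
    sc = spark.sparkContext

    counts = (
        sc.textFile("hdfs:///data/raw/logs/*.txt")        # placeholder input
        .flatMap(lambda line: line.split())               # mapper: emit tokens
        .map(lambda word: (word, 1))                      # mapper: (key, 1) pairs
        .reduceByKey(lambda a, b: a + b)                  # reducer: sum per key
    )
    counts.saveAsTextFile("hdfs:///data/output/word_counts")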

Environment: Spark, Agile, AWS, Hadoop, MySQL, YARN, Scala, Hortonworks, Apache Pig, Flume, MapReduce, Oozie, ZooKeeper.

Confidential - Boston, MA

Java Developer

Responsibilities:

  • Extensively developed Java and EJB components in a mainframe DB2 environment.
  • Applied advanced HTML, CSS, and JavaScript to deliver cutting-edge user interfaces and components.
  • Used jQuery plugins to develop the custom portal templates.
  • Created and enhanced Java test applications using JUnit and the Eclipse IDE.
  • Integrated the frontend jQuery UI with backend REST API.
  • Used JSON to transmit the data from server application layers to web application layers.
  • Created web service components using REST, SOAP, WSDL, and XML to interact with the middleware.
  • Developed application using MyEclipse for rapid development.
  • Used Spring MVC with Spring annotations such as @RequestMapping.
  • Implemented Log4j for logging and developed test cases using JUnit.
  • Implemented various J2EE design patterns like DAO pattern, Business Delegate, Value Object.
  • Used AJAX and JSON to make asynchronous calls to the project server to fetch data on the fly.
  • Built jobs to provide continuous automated builds by polling the Git source control system.
  • Used JIRA extensively for Defect tracking and reporting, made use of Confluence for document management.
  • Responsible for fixing bugs and communicating results back to the QA team.

Environment: Java, Spring MVC, HTML, CSS, JavaScript, Rest API, MyEclipse, Log4j, JSON, JIRA, GIT, QA.
