Hadoop Developer Resume
New York City, New York
SUMMARY
- 4+ years of total experience in the IT sector, with development in Big Data and Hadoop frameworks.
- Exceptional understanding and knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, NameNode, DataNode, ResourceManager, NodeManager, JobTracker, and TaskTracker, the MapReduce programming paradigm, and the Hadoop ecosystem (Hive, Impala, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Spark).
- Well versed in installing, configuring, supporting, and managing Hadoop clusters and their underlying Big Data infrastructure.
- Experience with Cloudera Manager administration, including installing and updating Hadoop and its related components in both single-node and multi-node cluster environments using Cloudera Manager.
- Experience in database administration, performance tuning, backup and recovery, and troubleshooting in large-scale customer-facing environments.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Impala and custom MapReduce programs in Java.
- Extended Hive and Pig core functionality by writing custom UDFs (a sketch follows this list).
- Experience with database administration, maintenance, and schema design for relational database systems like PostgreSQL and MySQL.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling, and HBase as a NoSQL data store.
- Experienced in deployment of Hadoop Cluster using Ambari, Cloudera Manager.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Proficient in configuring Zookeeper, Flume & Sqoop to the existing Hadoop cluster.
- Good knowledge of Apache Flume, Sqoop, Hive, HCatalog, Impala, ZooKeeper, and Oozie.
- Expertise in deploying Hadoop, YARN, Spark, and Storm integrated with Cassandra, Ignite, RabbitMQ, and Kafka.
- Good knowledge of YARN (Hadoop 2.x) concepts and high-availability Hadoop clusters.
- Experience in analyzing log files for Hadoop and ecosystem services and identifying root causes.
- Performed thread dump analysis for stuck threads and heap dump analysis for leaked memory manually with a memory analyzer tool.
- Strong experience with high-volume transactional systems running on Unix/Linux and Windows.
- Firsthand experience with Chef, Confidential and Ansible.
- Involved in all phases of Software Development Life Cycle (SDLC) in large-scale enterprise software using Object Oriented Analysis and Design.
- Coordinated across teams on tight schedules and efficient in meeting deadlines.
- Followed Agile methodology in development and used tools like Git for SCM.
- Experience working across lower and higher environments such as Dev, Unit, Test, Stage, and Production.
- Self-starter, adaptable, and collaborative, with effective communication and people skills.
- Experience with JIRA for project management and for anticipating what stakeholders need to be informed about on a project, and when.
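As a hedged illustration of the custom Hive UDF work listed above: the sketch below uses Hive's classic `UDF` base class from hive-exec; the package, class name, and normalization logic are hypothetical stand-ins, not taken from this resume.

```scala
package com.example.udf

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: strip non-digits from a phone-number column.
// Register in Hive with:
//   ADD JAR udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.udf.NormalizePhone';
class NormalizePhone extends UDF {
  // Hive resolves `evaluate` by reflection; nulls pass through unchanged.
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
}
```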
PROFESSIONAL EXPERIENCE
Confidential, New York City, New York
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce.
- Coordinated with business customers to gather business requirements and interacted with other technical peers to derive technical requirements.
- Developed a data pipeline using Sqoop and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Implemented File Transfer Protocol operations using Talend Studio to transfer files between network folders.
- Developed a data pipeline using Kafka to store data into HDFS.
- Experience in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Experience in converting SQL queries into Spark transformations using Spark RDDs and Scala, and performed map-side joins on RDDs (see the sketch after this section).
- Experience in performing transformations and actions on RDDs and Spark Streaming data.
- Proficient in developing data transformation and other analytical applications in Spark and Spark SQL using the Scala programming language.
- Ingested structured data into appropriate schemas and tables to support rules and analytics.
- Developed custom user-defined functions (UDFs) in Hive to transform large volumes of data per business requirements.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Collaborated with developer teams to move data into HDFS through Sqoop.
- Collaborated with developer teams on workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to a Kafka broker.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Developed Hive scripts and Hive UDFs to load data files.
- Managed Hadoop jobs using the Airflow workflow scheduler for MapReduce, Hive, and Sqoop actions.
- Troubleshot, debugged, and resolved Tableau-specific issues while maintaining the health and performance of the ETL environment.
- Managed and reviewed Hadoop log files.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed copious data sets to determine the optimal way to aggregate and report on them.
- Responsible for managing test data coming from various sources.
- Developed batch processes using Unix shell scripting.
- Exceptional understanding of Agile methodologies with Rally, pushing development through GitLab.
- Created automation jobs with Control-M and worked in production support as backup for the team.
- Good understanding of cluster configuration and resource management using YARN.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Spark, Scala, Oracle 11g, Core Java, Cloudera, HDFS, Eclipse, UNIX, Linux
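A hedged sketch of the map-side join on RDDs mentioned in this section: the small side is collected and broadcast so that the large RDD joins locally without a shuffle. The paths, field layout, and object name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapSideJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("map-side-join"))

    // Hypothetical inputs: a small customer lookup and a large transaction set.
    val customers = sc.textFile("hdfs:///data/customers.csv")       // id,name
      .map(_.split(",")).map(f => (f(0), f(1)))
    val transactions = sc.textFile("hdfs:///data/transactions.csv") // id,amount
      .map(_.split(",")).map(f => (f(0), f(1).toDouble))

    // Map-side join: broadcast the small side; each partition of the
    // large RDD joins in place, avoiding a shuffle.
    val lookup = sc.broadcast(customers.collectAsMap())
    val joined = transactions.flatMap { case (id, amount) =>
      lookup.value.get(id).map(name => (id, name, amount))
    }

    joined.take(10).foreach(println)
    sc.stop()
  }
}
```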
Confidential, Columbus, Ohio
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Wrote Hive queries and UDFs.
- Worked on reading multiple data formats on HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Involved in loading data from UNIX file system to HDFS.
- Extensively worked on Apache Cassandra tables and created Cassandra procedures for bulk loading.
- Wrote Spark transformation and action jobs to get data from source databases/log files and migrate it to the destination Cassandra database.
- Involved in designing the Cassandra data model; used CQL (Cassandra Query Language) to perform CRUD operations on Cassandra.
- Extracted data from databases into HDFS using Sqoop.
- Managed importing of data from various data sources, performed transformations using Hive, Pig and Spark and loaded data into HDFS.
- Managed and reviewed Hadoop log files.
- Experience in migrating data across cloud environments to Amazon EC2 clusters.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications.
- Developed Kafka producers and consumers, HBase clients, Spark, and Hadoop MapReduce jobs, along with components on HDFS, Pig, and Hive.
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text, Avro, sequence files, XML, JSON, ORC, and Parquet); see the sketch after this section.
- Particularly good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Worked extensively on the core and Spark SQL modules of Spark.
Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, Hive, Pig, HBase, Solr, Sqoop, Spark/Scala, Flume, Oozie, UNIX Shell Scripting, SQL, Eclipse.
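A hedged sketch of reading the mixed file formats listed in this section with Spark SQL. It assumes the Spark 2.x `SparkSession` API; the HDFS paths and output location are hypothetical, and Avro support assumes the spark-avro package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object MultiFormatPreprocess {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-format-preprocess").getOrCreate()

    // Each built-in reader infers or carries its own schema.
    val json    = spark.read.json("hdfs:///raw/events.json")
    val orc     = spark.read.orc("hdfs:///raw/events.orc")
    val parquet = spark.read.parquet("hdfs:///raw/events.parquet")
    // Avro is an external data source; requires the spark-avro package.
    val avro    = spark.read.format("avro").load("hdfs:///raw/events.avro")

    // Union by column name (assumes the four sources share a schema)
    // and persist as Parquet for downstream jobs.
    json.unionByName(orc).unionByName(parquet).unionByName(avro)
      .write.mode("overwrite").parquet("hdfs:///curated/events")

    spark.stop()
  }
}
```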
Confidential, Ohio
Jr. Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow the project guidelines required to develop programs.
- Installed application on AWS EC2 instances and configured the storage on S3 buckets.
- Performed S3 bucket creation, worked on bucket policies and IAM role-based policies, and customized the JSON templates.
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, and Hive.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, Cassandra, and slot configuration.
- Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
- Firsthand experience in provisioning and managing multi-node Hadoop clusters on a public cloud environment, Amazon Web Services (AWS) EC2, and on private cloud infrastructure.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python. Worked with business teams and created Hive queries for ad hoc access.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to handle data stored in a single platform on YARN.
- Firsthand experience in installing and configuring MapR and Hortonworks clusters and installing Hadoop ecosystem components like Hadoop, Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
- Monitored systems and services; worked on architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Worked on scripting Hadoop package installation and configuration to support fully automated deployments.
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingestion as required.
- Defined job flows and managed and reviewed Hadoop and HBase log files.
- Ran Hadoop streaming jobs to process terabytes of text data.
- Worked on the YARN capacity scheduler, creating queues to allocate resource guarantees to specific groups.
- Implemented the Hadoop stack and different big data analytic tools; migrated data from different databases to Hadoop (HDFS).
- Developed backup policies for Hadoop systems and action plans for network failure.
- Involved in the User/Group Management in Hadoop with AD/LDAP integration.
- Handled resource management and load management using capacity scheduling, applying changes according to requirements.
- Implemented a strategy to upgrade the OS on all cluster nodes from RHEL5 to RHEL6 while ensuring the cluster remained up and running.
- Developed shell and Python scripts to automate many day-to-day activities.
- Installed several projects on Hadoop servers and configured each project to run jobs and scripts successfully.
- Created user accounts and gave users access to the Hadoop cluster.
- Resolved tickets submitted by users, troubleshooting and documenting the errors.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch after this section).
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Puppet, ZooKeeper, HBase, Flume, Ganglia, Sqoop, Linux, CentOS, Ambari.
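A hedged sketch of the Spark Streaming batching described in this section: a `StreamingContext` slices the incoming stream into fixed-interval micro-batches, each handed to the Spark engine as an ordinary batch job. The socket source, host, port, and interval are hypothetical stand-ins for whatever source the jobs actually used.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingBatches {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-batches")
    // Divide the incoming stream into 10-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: a text stream on a socket; a production job
    // would more likely read from Kafka or Flume.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Each micro-batch is processed with ordinary RDD transformations.
    val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```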
