Big Data Hadoop Developer Resume
New York
SUMMARY
- Close to 5 years of technology experience, including experience with Big Data and the Hadoop ecosystem. In-depth knowledge of and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, HiveQL, HBase, Pig, Hive, Impala, Sqoop, InfoWorks, Oozie, Control-M, Cassandra, Flume and Spark.
- Extensively worked on MRv1 and MRv2 Hadoop architectures and wrote MapReduce programs and Pig & Hive scripts.
- Designed and created Hive external tables using a shared metastore instead of Derby, with dynamic partitioning and bucketing (a PySpark sketch follows this summary).
- Experienced in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Extensively used Kafka to load log data from multiple sources directly into HDFS. Knowledge of RabbitMQ. Loaded streaming log data from various web servers into HDFS using Flume.
- Experienced in building Pig scripts to extract, transform and load data onto HDFS for processing. Excellent knowledge of data mapping and of extract, transform and load from different data sources. Experience in writing HiveQL queries to store processed data into Hive tables for analysis. Extended Hive and Pig core functionality by writing custom UDFs.
- Excellent understanding and knowledge of NoSQL databases like HBase and Cassandra.
- Designed databases, created and managed schemas, wrote stored procedures, functions, DDL, DML and SQL queries, and performed data modeling.
- Extensive experience in ETL architecture, development, enhancement, maintenance, production support, data modeling, data profiling and reporting, including business and system requirement gathering.
- Hands-on experience in multithreaded programming using Akka actors.
- Hands-on experience with Java 8, Scala and the Play/Akka framework.
- Hands-on experience in shell scripting. Knowledge of the AWS and MS Azure cloud services.
- Proficient in using RDBMS concepts with Oracle, SQL Server, MariaDB and MySQL.
- Experienced in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and core Java design patterns.
- Experienced in project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Experience in processing different file formats like XML, JSON and sequence file formats.
- Designed, deployed, maintained and led the implementation of cloud solutions using MS Azure and its underlying technologies. Implemented HA deployment models with Azure Classic and Azure Resource Manager, and configured Azure Active Directory, managing users and groups. Worked on Continuous Integration (CI)/Continuous Delivery (CD) pipelines for Azure Cloud Services using Chef. Migrated services from on-premises to Azure cloud environments.
- Good Knowledge in Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Good Experience in creating Business Intelligence solutions and designing ETL workflows using Tableau.
- Collaborated with development and QA teams to maintain high-quality deployments.
- Designed client/server telemetry adopting the latest monitoring techniques.
- Configured Azure Traffic Manager to build routing for user traffic
- Infrastructure migrations: drove operational efforts to migrate all legacy services to a fully virtualized infrastructure.
- Exhibited strong written and oral communication skills. Rapidly learn and adapt to emerging technologies and paradigms.
- Performed systems analysis for several information systems, identifying and documenting performance and administrative bottlenecks.
- Good understanding and extensive work experience on SQL and PL/SQL
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- Monitored platform health, generated performance reports and provided continuous improvements.
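A minimal sketch of the partitioned, bucketed external-table pattern mentioned above, registered against a shared Hive metastore. The original work presumably used HiveQL DDL; for consistency with the other sketches here it is shown through the PySpark DataFrame API (Spark 2.3+), and the database, table, column and HDFS path names are illustrative assumptions.

```python
from pyspark.sql import SparkSession

# Spark session backed by the shared Hive metastore (not the embedded Derby one);
# database, table, column and path names are illustrative
spark = (SparkSession.builder
         .appName("ExternalHiveTableSketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales")

# staged data previously landed on HDFS (e.g. via Sqoop or Flume)
df = spark.read.parquet("hdfs:///data/staging/transactions")

# path-based (external) table in the shared metastore, partitioned by date
# and bucketed by customer id to speed up joins and scans
(df.write
   .mode("overwrite")
   .partitionBy("txn_date")
   .bucketBy(32, "customer_id")
   .sortBy("customer_id")
   .option("path", "hdfs:///data/warehouse/sales/transactions")  # external location
   .saveAsTable("sales.transactions"))
```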
PROFESSIONAL EXPERIENCE
Confidential, New York
Big Data Hadoop Developer
Responsibilities:
- Working on developing architecture documents and proper guidelines.
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation and support for Hadoop.
- Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Added and removed components through Cloudera Manager.
- Played a key role in installation and configuration of the various Hadoop ecosystem tools such as Solr, Kafka, Pig, HBase and Cassandra.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Wrote complex Hive queries and UDFs in Scala and Python.
- Involved in implementing HDInsight version 3.3 clusters, based on Spark version 1.5.1.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters; experience in converting MapReduce applications to Spark.
- Job duties involved the design and development of various modules in the Hadoop Big Data platform and processing data using MapReduce, Hive, Pig, Sqoop and Oozie.
- Designed, developed and tested MapReduce programs on mobile offer redemptions and sent the results to downstream applications such as HAVI.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive QL and Pig jobs.
- Worked with Control-M workflow.
- Loaded huge amounts of data into HDFS using Apache Kafka.
- Collected the log data from web servers and integrated into HDFS using Flume.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts; experience in managing and reviewing Hadoop log files.
- Constructed system components and developed the server-side part using Java, EJB and the Spring Framework. Involved in designing the data model for the system.
- Used J2EE design patterns like DAO, MODEL, Service Locator, MVC and Business Delegate.
- Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python (a PySpark sketch follows this role's Environment line).
- Implemented best income logic using Pig scripts and UDFs.
- Performed component unit testing using the Azure Emulator; analyzed escalated incidents within the Azure SQL database.
- Moderate and contribute to the support forums (specific to Azure Networking, Azure Virtual Machines, Azure Active Directory, Azure Storage) for Microsoft Developers Network including Partners and MVPs.
- Built a prototype Azure Data Lake application that accesses 3rd party data services via Web Services. The solution dynamically scales, automatically adding/removing cloud-based compute, storage and network resources based upon changing workloads.
- Worked with Azure ExpressRoute to create private connections between Azure datacenters and infrastructure for on-premises and colocation environments.
- Worked with XML files, to configure the .NET core application as well as PowerShell code.
- Proactively worked across the complete Software Development Life Cycle, including requirement analysis, design, implementation, testing and maintenance.
- Experience in deployment of Hadoop ecosystem components like MapReduce, YARN, Sqoop, Flume, PySpark, Pig, Hive, HBase, Spark, Scala, Cassandra, ZooKeeper, Storm, Impala and Kafka.
- Worked on data processing, transformations and actions in Spark using Python (PySpark).
- Expertise in developing data-driven applications using Python 2.7 and Python 3.0 in the PyCharm and Anaconda Spyder IDEs.
- Worked on Hadoop, Hive, Java, Python, Scala and the Struts web framework.
Environment: Spark, Kafka, Cloudera, Teradata, InfoWorks, MS Azure, HDFS, ZooKeeper, Hive, Pig, Oozie, Control-M, Core Java, Eclipse, HBase, Sqoop, PySpark, Python.
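A minimal sketch of the Hive-to-Spark conversion described in this role. It assumes the Spark 2.x SparkSession API with Hive support; the offers.redemptions table, column names and output path are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive-enabled session; table, column and path names below are hypothetical
spark = (SparkSession.builder
         .appName("HiveToSparkSketch")
         .enableHiveSupport()
         .getOrCreate())

# the original HiveQL-style aggregation, run as-is through Spark SQL
redemptions_sql = spark.sql("""
    SELECT offer_id, COUNT(*) AS redemptions
    FROM offers.redemptions
    WHERE redeem_date >= '2017-01-01'
    GROUP BY offer_id
""")

# the same query expressed as DataFrame transformations
redemptions_df = (spark.table("offers.redemptions")
                  .filter("redeem_date >= '2017-01-01'")
                  .groupBy("offer_id")
                  .count()
                  .withColumnRenamed("count", "redemptions"))

# hand the aggregated result to downstream consumers as Parquet on HDFS
redemptions_df.write.mode("overwrite").parquet("hdfs:///data/curated/offer_redemptions")
```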
Confidential, Jersey City, New Jersey
Big Data Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive, Sqoop, Spark.
- Responsible for managing data coming from different sources.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Involved in the development of Spark Streaming application for one of the data sources using Scala, Spark.
- Experience in managing and reviewing Hadoop log files.
- Job management using Fair scheduler.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed various algorithms for generating several data patterns. Used JIRA for bug tracking and issue tracking.
- Developed Python/Django application for Analytics aggregation and reporting.
- Used Django configuration to manage URLs and application parameters.
- Generated Python Django forms to record data from online users (a Django form sketch follows this role's Environment line).
- Used Python and Django for graphics creation, XML processing, data exchange and business logic.
- Created Oozie workflows to run multiple MapReduce, Hive and Pig jobs.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of MapReduce Jobs.
- Involved in loading data from LINUX file system to HDFS.
- Performed architecture design, data modeling and implementation of the Big Data platform and analytic applications for the consumer products.
- Analyzed the latest Big Data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
- Used Pig predefined functions to convert fixed-width files to delimited files.
- Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked with Control-M workflow.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using a Python-MySQL connector package to retrieve information (a pandas/MySQL sketch follows this role's Environment line).
- Created a POC on Hortonworks and suggested best practices in terms of the HDP and HDF platforms.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Installed the Ambari server in the cloud.
- Setup security using Kerberos and AD on Hortonworks clusters/Cloudera CDH
- Assigned access to users via multi-user login.
- Installed and configured CDH cluster, using Cloudera manager for easy management of existing Hadoop cluster.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Extensively using Cloudera manager for managing multiple clusters with petabytes of data.
- Knowledgeable in documenting processes and server diagrams and preparing server requisition documents.
- Set up machines with network control, static IPs, disabled firewalls and swap memory.
- Managed cluster configuration to meet the needs of analysis, whether I/O-bound or CPU-bound.
- Worked on setting up high availability for major production cluster. Performed Hadoop version updates using automation tools.
- Working on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
- Performance-tuned and managed growth of the OS, disk usage and network traffic.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on installing cluster, commissioning & decommissioning of data node, name node recovery, capacity planning, and slots configuration.
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python, Machine Learning, NLP (Natural Language Processing)
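A minimal sketch of the pandas/MySQL pattern mentioned in this role. It assumes the mysql-connector-python driver (the resume names only a Python-MySQL connector package), and the host, credentials, table and column names are hypothetical.

```python
import mysql.connector  # assumed driver; any DB-API MySQL connector would do
import pandas as pd

# hypothetical connection details
conn = mysql.connector.connect(host="localhost", user="etl_user",
                               password="secret", database="analytics")
cur = conn.cursor()
cur.execute(
    "SELECT user_id, event_type, event_ts FROM web_events WHERE event_ts >= %s",
    ("2017-01-01",))
df = pd.DataFrame(cur.fetchall(), columns=[c[0] for c in cur.description])

# simple aggregation with a pandas DataFrame for reporting
daily_counts = (df.assign(event_date=pd.to_datetime(df["event_ts"]).dt.date)
                  .groupby(["event_date", "event_type"])
                  .size()
                  .reset_index(name="events"))
print(daily_counts.head())

cur.close()
conn.close()
```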
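A minimal Django form sketch for the online-user data capture mentioned in this role; the field names, template and URL name are hypothetical.

```python
# forms.py -- form used to record data from online users (field names are hypothetical)
from django import forms

class OnlineUserForm(forms.Form):
    username = forms.CharField(max_length=100)
    email = forms.EmailField()
    signup_source = forms.ChoiceField(choices=[("web", "Web"), ("mobile", "Mobile")])

# views.py -- validate the posted form and hand cleaned data to the reporting pipeline
from django.shortcuts import render, redirect

def register_user(request):
    if request.method == "POST":
        form = OnlineUserForm(request.POST)
        if form.is_valid():
            record = form.cleaned_data  # persist to the analytics/reporting store here
            return redirect("registration-complete")
    else:
        form = OnlineUserForm()
    return render(request, "register.html", {"form": form})
```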
Confidential, Jersey City, New Jersey
Big Data Hadoop Admin
Responsibilities:
- Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured launched instances for specific applications.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Created S3 buckets and bucket policies, worked on IAM role-based policies and customized JSON templates.
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Managed servers on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
- Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the start-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling a few jobs).
- Expertise in designing and deployment of Hadoop clusters and different Big Data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
- Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Tuned Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, leveraging a good understanding of joins, grouping and aggregation and how they map to MapReduce jobs.
- Imported data from different sources like HDFS and HBase into Spark RDDs.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (a PySpark streaming sketch follows this role's Environment line).
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
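A minimal sketch of the Kafka-to-HDFS streaming flow described in this role (the Storm leg is not shown), using the Spark 1.x/2.x DStream Kafka API; the broker address, topic and output path are hypothetical.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # DStream Kafka API in Spark 1.x/2.x

sc = SparkContext(appName="KafkaToHdfsSketch")
ssc = StreamingContext(sc, batchDuration=30)  # 30-second micro-batches

# hypothetical broker and topic names
stream = KafkaUtils.createDirectStream(
    ssc, ["web-logs"], {"metadata.broker.list": "broker1:9092"})

# each record arrives as a (key, value) pair; keep the raw log line
lines = stream.map(lambda kv: kv[1])

# persist every micro-batch to HDFS as text files for downstream batch processing
lines.saveAsTextFiles("hdfs:///data/raw/web_logs/batch")

ssc.start()
ssc.awaitTermination()
```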