
Hadoop Developer Resume

Irving, TX

PROFESSIONAL SUMMARY:

  • Around 7 years of overall experience in the IT industry and software development, including 4+ years in Hadoop development.
  • Experience in installing, upgrading, configuring, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4 and CDH5 and Hortonworks HDP 2.1, 2.2, and 2.3 on Ubuntu, Red Hat, and CentOS systems.
  • Worked on CDH and HDP components including HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, and Kafka.
  • Deployed Hadoop clusters on public and private cloud environments such as AWS and OpenStack.
  • Involved in vendor selection and capacity planning for production Hadoop clusters.
  • Experienced in administering Linux systems to deploy Hadoop clusters, and in monitoring clusters using Nagios and Ganglia.
  • Experienced in performing backup, recovery, failover, and DR practices on multiple platforms; implemented Kerberos and LDAP authentication for all services across Hadoop clusters (a keytab-login sketch follows this list).
  • Experienced in automating provisioning processes and system resources using Puppet.
  • Implemented Hadoop-based solutions to store archives and backups from multiple sources.
  • Familiar with importing and exporting data via Sqoop between HDFS and RDBMSs such as MySQL, Oracle, and Teradata, using fast loaders and direct connectors.
  • Built an ingestion framework using Flume to stream logs and aggregate the data into HDFS.
  • Worked with application teams via Scrum to provide operational support and to install Hadoop updates, patches, and version upgrades as required.
  • Imported and exported data in and out of HDFS and processed it for commercial analytics (an HDFS I/O sketch follows this list).
  • Installed, monitored, and performance-tuned standalone multi-node Kafka clusters.
  • Successfully loaded files into Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.
  • Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
  • Experienced in collaborative platforms including Jira, Rally, SharePoint, and Discovery.
  • Experience in understanding and managing Hadoop log files, and in managing Hadoop infrastructure with Cloudera Manager.
  • Experienced as a SQL DBA in HA and DR techniques such as replication, log shipping, mirroring, and clustering, as well as database security and permissions.
  • Worked with highly transactional merchandising and investment SQL databases under PCI and HIPAA compliance, involving data encryption with certificates and security keys at various levels.
  • Experienced in upgrading SQL Server software, patches, and service packs.
  • Experience providing 24x7 production support, including weekends, on a rotation basis.
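
A minimal sketch of the keytab-based Kerberos login mentioned above, using the Hadoop UserGroupInformation API; the principal name and keytab path are illustrative assumptions, not project specifics:

    // Authenticate a Java client to a Kerberized Hadoop cluster from a keytab.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLogin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // A keytab login lets long-running services re-authenticate
            // without an interactive password prompt.
            UserGroupInformation.loginUserFromKeytab(
                "etl-service@EXAMPLE.COM", "/etc/security/keytabs/etl.keytab");
            System.out.println("Logged in as: "
                + UserGroupInformation.getLoginUser().getUserName());
        }
    }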
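
And a minimal sketch of moving files in and out of HDFS with the Java FileSystem API, as in the import/export work above; the NameNode URI and file paths are illustrative assumptions:

    // Copy a local file into HDFS and pull an HDFS file back out.
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:8020"), conf);
            // Import: local file -> HDFS
            fs.copyFromLocalFile(new Path("/tmp/events.log"),
                                 new Path("/data/raw/events.log"));
            // Export: HDFS -> local file
            fs.copyToLocalFile(new Path("/data/reports/summary.csv"),
                               new Path("/tmp/summary.csv"));
            fs.close();
        }
    }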

PROFESSIONAL EXPERIENCE:

HADOOP DEVELOPER

Confidential, Irving, TX

Responsibilities:

  • Participated in the development, enhancement, and maintenance of web applications, both as an individual contributor and as a team member.
  • Led the identification, isolation, resolution, and communication of problems within the production environment.
  • Served as lead developer, applying Apache/Confluent Kafka, Spark, and other Big Data technologies.
  • Designed and recommended the approach best suited for data movement from different sources to HDFS using Apache/Confluent Kafka.
  • Performed independent functional and technical analysis for major projects supporting several corporate initiatives.
  • Communicated and worked with IT partners and the user community at all levels, from senior management to developers and business SMEs, on project definition.
  • Worked on multiple platforms and multiple projects concurrently.
  • Performed code and unit testing for complex-scope modules and projects.
  • Built a test framework in Java using the Kafka producer and consumer APIs (a minimal producer-based sketch follows this list).
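
A minimal sketch of the kind of Java test harness built on the Kafka producer API; the broker address, topic name, and payload are illustrative assumptions:

    // Smoke-test a Kafka cluster by producing one record synchronously.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class KafkaSmokeTest {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all"); // wait for full ISR acknowledgement

            try (KafkaProducer<String, String> producer =
                     new KafkaProducer<>(props)) {
                RecordMetadata md = producer.send(
                    new ProducerRecord<>("test-topic", "key-1", "hello")).get();
                // A successful send returns the partition and offset, which a
                // test can assert on before a consumer-side verification step.
                System.out.printf("Wrote to %s-%d@%d%n",
                    md.topic(), md.partition(), md.offset());
            }
        }
    }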

Environment: HDFS, Sqoop, Hive, Spark, Control-M, Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Java, Puppet, Apache YARN, Pig, Scala, PySpark, Python, AWS EC2, S3, Glue, Redshift, Athena, EMR, IAM, Kinesis, Data Pipeline, Oracle, MySQL, DB2, Cassandra, HBase, MongoDB

HADOOP DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Developed architecture documents, process documentation, server diagrams, and requisition documents.
  • Worked on installing Kafka on virtual machines and created topics for business segments and stakeholders.
  • Installed ZooKeeper, Kafka brokers, Schema Registry, and Control Center on multiple machines.
  • Set up user security so that clients connect over SSL, and defined access privileges.
  • Used Puppet to automate deployments to the servers.
  • Monitored errors and warnings on the servers using Splunk.
  • Set up Hortonworks infrastructure, from cluster configuration down to individual nodes, and installed Ambari servers in the cloud.
  • Set up security using Kerberos and AD on Hortonworks clusters.
  • Managed cluster configuration to meet the needs of analysis workloads, whether I/O-bound or CPU-bound.
  • Set up HA for major production clusters and performed Hadoop version upgrades using automation tools.
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop on Linux systems.
  • Analyzed the latest Big Data analytics and their innovative applications in both BI analysis and new service offerings.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Created user guides and training overviews for supporting teams.
  • Provided troubleshooting and best-practice methodology for development teams, including process automation and new application onboarding.
  • Designed monitoring solutions and baseline statistics reporting to support the implementation.
  • Designed and built solutions for both real-time and batch data ingestion using Sqoop, Pig, Impala, and Kafka.
  • Used MapReduce, Spark, and Spark Streaming for data processing and reporting.
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (see the streaming sketch after this list).
  • Implemented Spark applications in Scala for faster testing and processing of data.
  • Used Apache Kafka to import real-time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig (a Hive UDF sketch also follows this list).
  • Configured Oozie workflows to run multiple Hive and Pig jobs.
  • Optimized MapReduce code by rewriting it as Pig Latin scripts.
  • Imported data from external tables into Hive using the LOAD command.
  • Created tables in Hive and used static and dynamic partitioning as a data-slicing mechanism.
  • Experienced in monitoring clusters, identifying risks, and establishing good practices.
  • Good understanding of cluster configuration and resource management using YARN.
  • Started using Apache NiFi to copy data from the local file system to HDFS.
  • Worked on developing ETL processes to load data from multiple sources into HDFS using Sqoop, Pig, and Oozie.
  • Leveraged ETL methods for ETL solutions and data warehouse tools for reporting and analysis.
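
A minimal sketch of the Spark Streaming ingestion described above, written in Java against the spark-streaming-kafka-0-10 integration; the broker address, topic, batch interval, and HDFS output path are illustrative assumptions:

    // Micro-batch Kafka log events every 30 seconds and land them in HDFS.
    import java.util.*;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.*;
    import org.apache.spark.streaming.kafka010.*;

    public class LogIngest {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("kafka-log-ingest");
            // Each 30-second window becomes one micro-batch (one RDD).
            JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");
            kafkaParams.put("key.deserializer",
                org.apache.kafka.common.serialization.StringDeserializer.class);
            kafkaParams.put("value.deserializer",
                org.apache.kafka.common.serialization.StringDeserializer.class);
            kafkaParams.put("group.id", "log-ingest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("network-logs"), kafkaParams));

            // Save each batch of raw log lines under a timestamped HDFS path.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) -> rdd.saveAsTextFile(
                      "hdfs://namenode:8020/data/logs/" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }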
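
And a minimal sketch of a business-specific Hive UDF in Java, using the classic reflection-based UDF API; the normalization rule itself is an illustrative stand-in:

    // Hive UDF that normalizes an ID column.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeId extends UDF {
        private final Text result = new Text();

        // Hive resolves evaluate() by reflection. Registration (illustrative):
        //   ADD JAR hdfs:///udfs/normalize-id.jar;
        //   CREATE TEMPORARY FUNCTION normalize_id AS 'NormalizeId';
        //   SELECT normalize_id(customer_id) FROM orders;
        public Text evaluate(Text input) {
            if (input == null) return null;
            result.set(input.toString().trim().toUpperCase());
            return result;
        }
    }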

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau, Machine Learning.

HADOOP DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
  • Conducted a POC on Hortonworks and suggested best practices for the HDP/HDFS platform.
  • Set up Hortonworks infrastructure, from cluster configuration down to individual nodes.
  • Installed Ambari servers in the cloud.
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
  • Assigned access to users through multi-user logins.
  • Installed and configured a CDH cluster, using Cloudera Manager to manage the existing Hadoop cluster.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
  • Knowledgeable in documenting processes, server diagrams, and server requisition documents.
  • Set up machines with network control, static IPs, disabled firewalls, and swap memory.
  • Managed cluster configuration to meet the needs of analysis workloads, whether I/O-bound or CPU-bound.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for consumer products.
  • Analyzed the latest Big Data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on tuning Hive and Pig to improve performance and resolve issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and of how they translate into MapReduce jobs.
  • Managed datasets using Pandas data frames and MySQL; queried MySQL from Python using the MySQLdb connector package to retrieve information.
  • Created Oozie workflows to run multiple MapReduce, Hive, and Pig jobs.
  • Set up the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Involved in the development of Spark Streaming jobs for various data sources using Scala.
  • Imported data from different sources such as HDFS and MySQL into Spark RDDs (a minimal sketch follows this list).
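
A minimal sketch of pulling data from HDFS and MySQL into Spark; the jobs above used Scala, so this Java version (kept consistent with the other sketches here) with its paths, JDBC URL, and credentials is purely illustrative:

    // Load an HDFS text file as an RDD and a MySQL table via JDBC.
    import java.util.Properties;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SourcesToSpark {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("sources-to-spark").getOrCreate();

            // HDFS text file -> RDD of lines
            JavaRDD<String> logs = spark.read()
                .textFile("hdfs://namenode:8020/data/raw/events.log")
                .javaRDD();

            // MySQL table -> DataFrame via JDBC (queryable with Spark SQL)
            Properties props = new Properties();
            props.put("user", "etl");
            props.put("password", "secret");
            Dataset<Row> orders = spark.read().jdbc(
                "jdbc:mysql://dbhost:3306/sales", "orders", props);

            System.out.println("log lines: " + logs.count()
                + ", orders: " + orders.count());
            spark.stop();
        }
    }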

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)

SDET

Confidential, New York, NY

Responsibilities:

  • Launched Amazon EC2 instances (Linux/Ubuntu/RHEL) using AWS and configured the instances for specific applications.
  • Conducted functional and regression testing using Java Selenium WebDriver with data-driven and keyword-driven frameworks built on the Page Factory model.
  • Experience with Selenium Grid for cross-platform, cross-browser, and parallel tests using TestNG and Maven.
  • Experienced in working with Protractor.
  • Used Jenkins to execute the test scripts periodically on Selenium Grid for different platforms.
  • Expertise in grouping test suites, test cases, and test methods for regression and functional testing using TestNG annotations.
  • Experienced in writing test cases and conducting sanity, regression, integration, unit, black-box, and white-box tests.
  • Integrated Jenkins with Git version control to schedule automatic builds using predefined Maven commands.
  • Developed a BDD framework from scratch using Cucumber, defining steps, scenarios, and features.
  • Utilized the Apache POI jar to read test data from Excel spreadsheets and load it into test cases.
  • Administered and engineered Jenkins to manage the weekly build, test, and deploy chain, using SVN/Git with a Dev/Test/Prod branching model for weekly releases.
  • Handled Selenium synchronization problems using explicit and implicit waits during regression testing (see the wait sketch after this list).
  • Experienced in writing complex and dynamic XPaths.
  • Executed test cases on real devices for both mobile apps and mobile websites.
  • Thorough experience implementing automation tools: Selenium WebDriver, JUnit, TestNG, Eclipse, Git/GitHub, Jenkins, SoapUI, and REST with Postman.
  • Used Cucumber to automate testing of services through REST APIs.
  • Used runner classes in Cucumber to run features and generate step-definition snippets, and used tags to run different test suites such as smoke, health check, and regression (a runner sketch also follows this list).
  • Created Maven profiles to launch specific TestNG suites from Jenkins jobs.
  • Used SoapUI to test SOAP-based services and REST APIs.
  • Used the Groovy language to verify web services through SoapUI.
  • Experience in testing cloud platforms.
  • Shared daily status reports with all team members, team leads, and managers.
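
A minimal sketch of handling Selenium synchronization with implicit and explicit waits (Selenium 4-style Duration timeouts); the URL and locator are illustrative assumptions:

    // Wait explicitly for a specific element instead of sleeping.
    import java.time.Duration;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class CheckoutTest {
        public static void main(String[] args) {
            WebDriver driver = new ChromeDriver();
            // Implicit wait: a global polling window for findElement calls.
            driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(5));
            driver.get("https://example.com/checkout");
            // Explicit wait: block until this element is clickable, which
            // avoids flaky failures on slow AJAX pages.
            WebElement payButton =
                new WebDriverWait(driver, Duration.ofSeconds(15))
                    .until(ExpectedConditions.elementToBeClickable(
                        By.id("pay-now")));
            payButton.click();
            driver.quit();
        }
    }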
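
And a minimal sketch of a Cucumber JUnit runner class where tags select the suite (Cucumber-JVM 5+ io.cucumber.junit API); the package, feature path, and tag names are illustrative assumptions:

    // JUnit drives Cucumber via the annotations; the class body stays empty.
    import org.junit.runner.RunWith;
    import io.cucumber.junit.Cucumber;
    import io.cucumber.junit.CucumberOptions;

    @RunWith(Cucumber.class)
    @CucumberOptions(
        features = "src/test/resources/features",
        glue = "com.example.steps",
        tags = "@smoke", // switch to @regression or @healthcheck per run
        plugin = {"pretty", "html:target/cucumber-report.html"}
    )
    public class SmokeSuiteRunner {
    }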

Environment: Selenium IDE, Selenium RC, WebDriver, Groovy, Cucumber, HP QC, MyEclipse, JIRA, MySQL, Oracle, Java, JavaScript, .NET, Python, Microservices, RESTful API Testing, JMeter, VBScript, JUnit, TestNG, Firebug, XPath, Windows
