Senior Hadoop Developer Resume


Raleigh, NC

SUMMARY

  • Over 8 years of IT experience as a Developer, Designer & QA Engineer, with cross-platform integration experience using the Hadoop ecosystem, Java, and functional automation
  • Hands-on experience in installing, configuring, and architecting Hadoop and Hortonworks clusters and services: HDFS, MapReduce, YARN, Pig, Hive, Oozie, Flume, HBase, Spark, and Sqoop
  • Experienced in loading data into Hive partitions, creating buckets in Hive, and developing MapReduce jobs to automate data transfer from HBase
  • Experienced in developing Java UDFs for Hive and Pig
  • Experienced in NoSQL databases such as HBase, MongoDB, and Cassandra, including writing advanced queries and sub-queries
  • Experienced in defining detailed application software test plans, including organization, participant, schedule, test, and application coverage scope
  • Gathered and defined functional and UI requirements for software applications
  • Experienced in real-time analytics with Apache Spark RDDs, DataFrames, and the Streaming API; used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data (see the sketch after this list)
  • Experienced in integrating Hadoop with Kafka and in uploading clickstream data to HDFS.
  • Expert in utilizing Kafka as a messaging and publish-subscribe system.
  • Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying them on public or private clouds.
  • Experienced in troubleshooting and automating deployments to web and application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
  • Experienced in integrating deployments with multiple build systems and in providing an application model that handles multiple projects.
  • Hands-on experience integrating REST APIs with cloud environments to access resources.
  • Experienced in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experienced in analyzing data using HiveQL and MapReduce programs.
  • Experienced in ingesting data into HDFS from relational databases such as MySQL, Oracle, DB2, Teradata, and Postgres using Sqoop.
  • Experienced in importing real-time streaming logs and aggregating the data into HDFS using Kafka and Flume.
  • Experienced with UNIX shell scripting and PowerShell scripting to automate deployments.
  • Experienced in working with Azure integrated with VSTS 2015 for cloud deployments, creating websites, and deploying binaries on the Azure cloud, including Azure provisioning, deployment, and monitoring
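
A minimal sketch of the kind of Spark DataFrames analytics over Hive data described above, in Scala. The table and column names (web.clickstream, event_date, page) are illustrative placeholders, not taken from an actual engagement.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveAnalyticsSketch {
      def main(args: Array[String]): Unit = {
        // Hive support lets spark.table() resolve tables registered in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("HiveAnalyticsSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Placeholder table: daily page views from a clickstream table in Hive.
        val clicks = spark.table("web.clickstream")
        val dailyViews = clicks
          .groupBy(col("event_date"), col("page"))
          .agg(count(lit(1)).as("views"))
          .orderBy(col("event_date"), desc("views"))

        dailyViews.show(20, truncate = false)
        spark.stop()
      }
    }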

PROFESSIONAL EXPERIENCE

Senior Hadoop Developer

Confidential - Raleigh, NC

Responsibilities:

  • Worked in a team that built big data analytic solutions using the Cloudera Hadoop Distribution.
  • Migrated RDBMS (Oracle, MySQL) data into HDFS and Hive using Sqoop.
  • Extensively used Pig scripts for data cleansing and optimization.
  • Worked on creating and optimizing Hive scripts based on business requirements.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in loading data from UNIX file system to HDFS.
  • Wrote Map Reduce jobs to discover trends in data usage by users.
  • Used JUnit for unit testing of MapReduce jobs.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Exported result sets from Hive to MySQL using shell scripts.
  • Used Zookeeper for various types of centralized configurations.
  • Involved in maintaining various Unix Shell scripts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketch after this list).
  • Created and managed Hive external tables in Hadoop for data warehousing.
  • Developed customized UDFs in Java for extending Pig and Hive functionality.
  • Extracted the needed data from the server into HDFS and Bulk-Loaded the cleaned data into HBase.
  • Integrated the Hive Warehouse with HBase.
  • Well versed with various Hadoop distributions, including Cloudera (CDH), Hortonworks (HDP), and Azure HDInsight.
  • Implemented and deployed a system to Windows Azure, Microsoft's cloud platform.
  • Configured Oozie to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Worked on optimization and performance tuning of MapReduce, Hive, and Pig jobs.
  • Worked in an Agile methodology; responsible for monitoring and managing development and production environments, and worked with the Azure Portal to provide IaaS resources to the client.
  • Worked with Azure PaaS and IaaS solutions such as Azure Web Apps, Web Roles, Worker Roles, SQL Azure, and Azure Storage.
  • Worked with Azure compute services, Azure Web Apps, Azure Data Factory & Storage, Azure Media & Content Delivery, Azure Networking, Azure Hybrid Integration, and Azure Identity & Access Management.
  • Integrated Azure Databricks with other Azure services (such as Azure Storage) in a more secure manner using service endpoints.
  • Designed, planned, and created Azure virtual machines; implemented and managed virtual networking within Azure and connected it to on-premises environments.
  • Worked on Azure Databricks to use custom DNS and configured network security group (NSG) rules to specify egress traffic restrictions.
  • Designed and configured Azure Virtual Networks (VNets), subnets, Azure network settings, DHCP address blocks, DNS settings, and security policies to provide a highly secure environment for running Linux virtual machines (VMs) and applications.
  • Configured Border Gateway Protocol (BGP) to enable the connection between data centers and the Azure cloud to exchange routing and reachability information.
  • Designed and implemented Azure Site Recovery, both for disaster recovery scenarios and for migrating workloads from on-premises to Azure.
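
The Hive partitioning and bucketing work above is sketched below. On this project the statements would have been run through Hive itself; they are shown here through a Scala Spark session with Hive support only to keep one language across these sketches. Table, column, and path names are placeholders, and sales_staging stands in for an existing staging table.

    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitioningSketch")
          .enableHiveSupport()   // table definitions live in the Hive metastore
          .getOrCreate()

        // External table partitioned by load date (placeholder schema and location).
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS sales_part (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE)
          PARTITIONED BY (load_date STRING)
          STORED AS ORC
          LOCATION '/warehouse/sales_part'""")

        // Dynamic partition insert: each row lands in its own load_date partition.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT OVERWRITE TABLE sales_part PARTITION (load_date)
          SELECT order_id, customer_id, amount, load_date
          FROM sales_staging""")

        // Bucketed copy keyed on customer_id, useful for joins and sampling.
        spark.table("sales_part")
          .write
          .bucketBy(16, "customer_id")
          .sortBy("customer_id")
          .format("orc")
          .saveAsTable("sales_bucketed")

        spark.stop()
      }
    }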

Environment: CDH5, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Hue, Java, Linux

Hadoop Developer

Confidential - Plano, TX

Responsibilities:

  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Implemented Spark programs, analyzed the SQL scripts, and designed the solutions.
  • Worked directly with the Big Data Architecture team, which created the foundation of this enterprise analytics initiative in a Hadoop-based data lake.
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS (see the sketch after this list).
  • Upgraded the Hadoop cluster from CDH 4.7 to CDH 5.2 and worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Developed Spark scripts to import large files from Amazon S3 buckets and imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation, and worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
  • Worked on installing Cloudera Manager and CDH, installed the JCE policy files to create a Kerberos principal for the Cloudera Manager Server, and enabled Kerberos using the wizard.
  • Developed Spark jobs using Scala and Python on top of Yarn/MRv2 for interactive and Batch Analysis.
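
A minimal sketch of the Kafka-to-HDFS ingestion described above, in Scala, assuming the spark-streaming-kafka-0-10 integration: each micro-batch RDD from Kafka is converted to a DataFrame and appended as Parquet in HDFS. The broker address, topic name, group id, and output path are placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object ClickstreamIngestSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ClickstreamIngestSketch").getOrCreate()
        val ssc = new StreamingContext(spark.sparkContext, Seconds(30))
        import spark.implicits._

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",          // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "clickstream-ingest",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

        stream.foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            // Convert each micro-batch of Kafka records to a DataFrame and append it as Parquet.
            val df = rdd.map(_.value).toDF("event_json")
            df.write.mode("append").parquet("hdfs:///data/clickstream/raw")   // placeholder path
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }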

Environment: Hadoop 2.0, YARN Resource Manager, SQL, Python, Kafka, Hive, Sqoop 1.4.6, Qlik Sense, Tableau, Oozie, Jenkins, Linux, Scala 2.12, Spark 2.4.3.

Data Engineer

Confidential, Florida

Responsibilities:

  • Involved in requirements gathering and business analysis, and translated business requirements into technical designs for Hadoop and big data.
  • Involved in Sqoop implementation, which helps in loading data from various RDBMS sources into Hadoop and vice versa.
  • Involved in HBase setup and in storing data in HBase for further analysis (see the sketch after this list).
  • Worked on the CloudHealth tool to generate AWS reports and dashboards for cost analysis.
  • Wrote a Python script that automates launching the EMR cluster and configuring the Hadoop applications.
  • Involved in working with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
  • Involved in managing and monitoring Hadoop cluster using Cloudera Manager.
  • Used Python and Shell scripting to build pipelines.
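
A minimal sketch of storing a record into HBase, assuming the standard HBase client API (HBase 1.x/2.x) and called from Scala to keep one language across these sketches. The table name, column family, row key, and values are placeholders.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseWriteSketch {
      def main(args: Array[String]): Unit = {
        // Reads hbase-site.xml from the classpath for the ZooKeeper quorum and other settings.
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        try {
          // Placeholder table "events" with a single column family "d".
          val table = connection.getTable(TableName.valueOf("events"))
          val put = new Put(Bytes.toBytes("row-0001"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("sqoop-load"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("loaded"))
          table.put(put)
          table.close()
        } finally {
          connection.close()
        }
      }
    }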

Data Mining Analyst

Confidential - Denver, CO

Responsibilities:

  • Created data partitions on large data sets in S3 and defined DDL on the partitioned data (see the sketch after this list).
  • Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.
  • Supported all business areas of Dell with critical data analysis that helps team members make profitable decisions, serving as a forecast expert and business analyst and utilizing tools for business optimization and analytics.
  • Ensured technology roadmaps were incorporated into data and database designs.
  • Created list and summary view reports.
  • Communicated with the business and worked to understand problems from a business perspective rather than from a developer perspective.
  • Used the Spring and .NET frameworks for dependency injection.
  • Worked with back-end databases such as Oracle and MS SQL.
  • Developed web applications and services using C# and ASP.NET.
  • Prepared the unit test plan and system test plan documents.
  • Prepared and executed unit test cases; performed troubleshooting and debugging.
  • Worked with complex SQL queries, SQL Joins and Stored Procedures using TOAD for data retrieval and update.
  • Used JUnit and NUnit for performing Unit Testing.
  • Used Log4j to capture logs that included runtime exceptions.
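
A minimal sketch of the S3 partitioning and partitioned-data DDL mentioned above, in Scala: date-partitioned Parquet is written to S3 and an external table is registered over it. The input path, bucket, schema, and table name are placeholders.

    import org.apache.spark.sql.SparkSession

    object S3PartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("S3PartitionSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Write the raw events as Parquet on S3, one directory per event_date (placeholder paths).
        spark.read.parquet("hdfs:///data/events_raw/")
          .write
          .partitionBy("event_date")
          .mode("overwrite")
          .parquet("s3://example-bucket/warehouse/events/")

        // DDL over the partitioned layout: an external table plus partition discovery.
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS events_s3 (
            user_id BIGINT,
            page    STRING)
          PARTITIONED BY (event_date STRING)
          STORED AS PARQUET
          LOCATION 's3://example-bucket/warehouse/events/'""")
        spark.sql("MSCK REPAIR TABLE events_s3")

        spark.stop()
      }
    }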

Environment: HDFS, Java, Sqoop, Spark, YARN, Cloudera Manager, CloudHealth, Splunk, Oracle, Elasticsearch, Impala, Jira, Shell/Perl scripting, Python, AWS (EC2, S3, EMR, VPC, RDS, Lambda, CloudWatch, etc.)
