
Big Data Developer Resume


Seattle, Washington

PROFESSIONAL SUMMARY:

  • Over 6 years of IT industry and software development experience, including 3+ years of experience in Big Data development.
  • Experience in installing, upgrading, configuring, monitoring, supporting and managing Hadoop clusters using Cloudera CDH4 and CDH5 and Hortonworks HDP 2.1, 2.2 and 2.3 on Ubuntu, RedHat and CentOS systems.
  • Worked on components of CDH and HDP including HDFS, MapReduce, Job tracker, Task tracker, Sqoop, Zookeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark and Kafka.
  • Deployed Hadoop clusters on public and private cloud environments like AWS and OpenStack
  • Involved in vendor selection and capacity planning for the Hadoop Clusters in production.
  • Experienced in administering Linux systems to deploy Hadoop clusters and monitoring them using Nagios and Ganglia.
  • Experienced in performing backup, recovery, failover and DR practices on multiple platforms.
  • Implemented Kerberos and LDAP authentication for all services across Hadoop clusters.
  • Experienced in automating the provisioning processes and system resources using Puppet.
  • Implemented Hadoop-based solutions to store archives and backups from multiple sources.
  • Familiar with importing and exporting data using Sqoop from RDBMSs such as MySQL, Oracle and Teradata, using fast loaders and connectors.
  • Built an ingestion framework using Flume for streaming logs and aggregating the data into HDFS.
  • Worked with application team via scrum to provide operational support, install Hadoop updates, patches and version upgrades as required.
  • Imported and exported data in and out of HDFS and processed it for commercial analytics (see the HDFS API sketch after this list).
  • Installed, monitored and performance-tuned standalone and multi-node Kafka clusters.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.
  • Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
  • Experienced in collaborative platforms including Jira, Rally, SharePoint and Discovery
  • Experience in understanding and managing Hadoop log files and in managing the Hadoop infrastructure with Cloudera Manager.
  • Experienced as a SQL DBA in HA and DR techniques such as replication, log shipping, mirroring and clustering, as well as database security and permissions.
  • Worked with highly transactional merchandising and investment SQL databases under PCI and HIPAA compliance, involving data encryption with certificates and security keys at various levels.
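
A minimal sketch of the HDFS import/export referenced above, using the Hadoop FileSystem Java API; the archive paths and class name are hypothetical placeholders, and the cluster configuration is assumed to come from core-site.xml/hdfs-site.xml on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsArchiveCopy {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical paths: import a local backup into HDFS, then export it back out
            fs.copyFromLocalFile(new Path("/backups/orders-2016.tar.gz"),
                                 new Path("/data/archives/orders-2016.tar.gz"));
            fs.copyToLocalFile(new Path("/data/archives/orders-2016.tar.gz"),
                               new Path("/restore/orders-2016.tar.gz"));
            fs.close();
        }
    }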

PROFESSIONAL EXPERIENCE:

BIG DATA DEVELOPER

Confidential, Seattle, Washington

Responsibilities:

  • Developed architecture documents, process documentation, server diagrams and requisition documents.
  • Developed streaming pipelines and consumed real-time events from Kafka using the Kafka Streams API and Kafka clients.
  • Configured Spark Streaming to consume incoming messages from Kafka topics and store the stream data into HDFS (see the sketch after this list).
  • Responsible for building scalable distributed data solutions on Cloudera distributed Hadoop.
  • Involved in using Spark Streaming and Spark jobs for ongoing customer transactions, and Spark SQL to handle structured data in Hive.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
  • Wrote PySpark code to calculate aggregates such as mean, covariance and standard deviation.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Wrote UDFs in Scala and used them for sampling large data sets.
  • Used different data formats while loading the data into HDFS.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Worked on NiFi workflow development for data ingestion from multiple sources. Involved in architecture and design discussions with the technical team and interfaced with other teams to create efficient and consistent solutions.
  • Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move data in and out of HDFS.
  • Created files and tuned SQL queries in Hive using Hue.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
  • Loaded data into HBase using the HBase shell and the HBase client API.
  • Used Kafka to rebuild a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
  • Developed workflow in Oozie to automate the jobs.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved the review process and resolved technical problems.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
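
A minimal sketch of the Kafka-to-HDFS streaming step referenced above, written against the Spark Streaming Kafka 0-10 integration in Java; the broker address, consumer group, topic name and output path are hypothetical placeholders:

    import java.util.*;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.*;
    import org.apache.spark.streaming.kafka010.*;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "txn-consumers");            // hypothetical group

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("transactions"), kafkaParams));

            // Write each micro-batch of message values to a time-stamped HDFS directory
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) ->
                      rdd.saveAsTextFile("hdfs:///data/transactions/" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }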

Environment: Hadoop, MapReduce, HDFS, Hive, Java 6, Oozie, Linux, XML, Eclipse, Oracle 10g, PL/SQL, YARN, Spark, Pig, Sqoop, DB2, UNIX, HCatalog.

BIG DATA DEVELOPER

Confidential, St Louis, Missouri

Responsibilities:

  • Worked extensively on Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and MapReduce programming.
  • Developed Map-Reduce programs to clean and aggregate the data.
  • Responsible for building scalable distributed data solutions using Hadoop and Spark
  • Worked hands-on with ETL processes using Java.
  • Implemented Hive Ad-hoc queries to handle Member data from different data sources such as Epic and Centricity.
  • Implemented Hive UDFs and did performance tuning for better results (see the UDF sketch after this list).
  • Analyzed the data by performing Hive queries and running Pig Scripts.
  • Involved in loading data from UNIX file system to HDFS
  • Implemented optimized map joins to get data from different sources to perform cleaning operations before applying the algorithms.
  • Experience in using Sqoop to import and export data between Netezza and Oracle databases and HDFS/Hive.
  • Implemented POC to introduce Spark Transformations.
  • Worked with the NoSQL databases HBase, MongoDB and Cassandra to create tables and store data.
  • Handled importing data from various data sources, performed transformations using Hive and Map Reduce, streamed using Flume and loaded data into HDFS
  • Worked on transforming data from MapReduce into HBase as bulk operations.
  • Implemented CRUD operations on HBase data using the Thrift API to get real-time insights.
  • Installed the Oozie workflow engine to run multiple MapReduce, Hive, Impala, ZooKeeper and Pig jobs that run independently based on time and data availability.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster for generating reports on nightly, weekly and monthly basis.
  • Used Zookeeper to manage Hadoop clusters and Oozie to schedule job workflows.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in data ingestion into HDFS using Apache Sqoop from a variety of sources using connectors like JDBC and import parameters
  • Coordinated with Hadoop admins during deployment to production.
  • Developed Pig Latin scripts to extract data from log files and store them to HDFS. Created User Defined Functions (UDFs) to pre-process data for analysis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Participated in design and implementation discussions for developing the Cloudera 5 Hadoop ecosystem.
  • Used JIRA and Confluence to update tasks and maintain documentation.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
  • Created final reports of analyzed data using Apache Hue and the Hive browser, and generated graphs for study by the data analytics team.
  • Used Sqoop to export the analyzed data to a relational database for further analysis by the data analytics team.
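
A minimal sketch of a Hive UDF of the kind referenced above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and the normalization rule are hypothetical examples:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes member IDs before joining data from different source feeds
    public final class NormalizeMemberId extends UDF {
        public Text evaluate(Text id) {
            if (id == null) {
                return null;
            }
            return new Text(id.toString().trim().toUpperCase());
        }
    }

Packaged into a jar, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like a built-in function in HiveQL.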

Environment: Hadoop, Cloudera Hadoop, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Java, JSON, Spark, HDFS, YARN, Oozie Scheduler, ZooKeeper, Mahout, Linux, UNIX, ETL, MySQL.

BIG DATA DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Developed streaming pipelines and consumed real-time events from Kafka using the Kafka Streams API and Kafka clients.
  • Configured Spark Streaming to consume incoming messages from Kafka topics and store the stream data into HDFS.
  • Responsible for building scalable distributed data solutions on Cloudera distributed Hadoop.
  • Involved in using Spark Streaming and Spark jobs for ongoing customer transactions, and Spark SQL to handle structured data in Hive.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
  • Wrote PySpark code to calculate aggregates such as mean, covariance and standard deviation.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Wrote UDFs in Scala and used them for sampling large data sets.
  • Used different data formats while loading the data into HDFS.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Worked on NiFi workflow development for data ingestion from multiple sources. Involved in architecture and design discussions with the technical team and interfaced with other teams to create efficient and consistent solutions.
  • Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move data in and out of HDFS.
  • Created files and tuned SQL queries in Hive using Hue.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
  • Loaded data into HBase using the HBase shell and the HBase client API (see the HBase client sketch after this list).
  • Used Kafka to rebuild a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
  • Developed workflow in Oozie to automate the jobs.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved the review process and resolved technical problems.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
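
A minimal sketch of loading a row through the HBase client API as referenced above; the table name, column family, row key and values are hypothetical placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseLoadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("customer_events"))) {
                // Hypothetical row: one customer activity event keyed by customer id + timestamp
                Put put = new Put(Bytes.toBytes("cust123#20170601T120000"));
                put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("type"), Bytes.toBytes("login"));
                put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("channel"), Bytes.toBytes("mobile"));
                table.put(put);
            }
        }
    }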

SDET

Confidential, New York, NY

Responsibilities:

  • Launched Amazon EC2 Instances using AWS (Linux/ Ubuntu/RHEL) and configured instances with respect to specific applications
  • Conducted functional and regression testing using Java and Selenium WebDriver with data-driven and keyword-driven frameworks built on the Page Factory model.
  • Experience in Selenium Grid for cross-platform, cross-browser and parallel tests using TestNG and Maven.
  • Experienced in working with Protractor.
  • Used Jenkins to execute the test scripts periodically on Selenium Grid for different platforms
  • Expertise in grouping of test suites, test cases and test methods for regression and functional testing using TestNG annotations
  • Experienced in writing test cases and conducting sanity, regression, integration, unit, black-box and white-box tests.
  • Integrated Jenkins with Git version control to schedule automatic builds using predefined maven commands.
  • Developed BDD framework from scratch using Cucumber and defined steps, scenarios and features
  • Utilized the Apache POI jar to read test data from Excel spreadsheets and load it into test cases.
  • Administered and engineered Jenkins to manage the weekly build, test and deploy chain, using SVN/Git with a Dev/Test/Prod branching model for weekly releases.
  • Handled Selenium synchronization problems using explicit and implicit waits during regression testing (see the sketch after this list).
  • Experienced in writing complex and dynamic XPaths.
  • Executed test cases on real devices for both the mobile app and the mobile website.
  • Thorough experience in implementing automation tools: Selenium WebDriver, JUnit, TestNG, Eclipse, Git/GitHub, Jenkins, SoapUI and REST with Postman.
  • Used Cucumber to automate tests of services exposed through REST APIs.
  • Used runner classes in Cucumber to generate step definitions and used tags to run different kinds of test suites such as smoke, health check and regression.
  • Created profiles in Maven to launch specific TestNG suites from Jenkins jobs.
  • Used the SoapUI tool to test SOAP-based applications, SOAP services and REST APIs.
  • Used the Groovy language to verify web services through SoapUI.
  • Experience in testing cloud platforms.
  • Shared daily status reports with all team members, team leads and managers.
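
A minimal sketch of the explicit-wait synchronization pattern referenced above, using Selenium WebDriver with TestNG in Java; the URL, locator and test class name are hypothetical placeholders:

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;
    import org.testng.annotations.Test;

    public class LoginPageTest {
        @Test(groups = {"smoke", "regression"})
        public void submitButtonBecomesClickable() {
            WebDriver driver = new ChromeDriver();
            try {
                driver.get("https://example.com/login");  // hypothetical URL
                // Explicit wait: poll up to 10 seconds for the element instead of sleeping
                WebDriverWait wait = new WebDriverWait(driver, 10);
                WebElement submit = wait.until(
                    ExpectedConditions.elementToBeClickable(By.id("submit")));
                submit.click();
            } finally {
                driver.quit();
            }
        }
    }

An explicit wait polls for a specific condition up to a timeout, which is usually more reliable than fixed Thread.sleep delays.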

Environment: Selenium IDE, Groovy, Selenium RC, WebDriver, Cucumber, HP QC, MyEclipse, JIRA, MySQL, Oracle, Java, JavaScript, .NET, Python, Microservices, RESTful API testing, JMeter, VBScript, JUnit, TestNG, Firebug, XPath, Windows.
