Sr. Hadoop Developer Resume

Columbus, OH

SUMMARY

  • 7.5 years of professional IT work experience in analysis, design, administration, development, deployment and maintenance of critical software and Big Data applications.
  • 3+ years of experience on Big Data platforms as both a Developer and an Administrator.
  • Data extraction and modeling support from Hive, PySpark and Teradata using complex SQL queries.
  • Fine-tuned SQL queries on Big Data platforms - Hive, PySpark and Teradata.
  • Involved in full life cycle ETL implementations using Informatica and Oracle; helped with designing the data warehouse by defining facts, dimensions and the relationships between them, and applied corporate naming-convention standards.
  • Developed, tested and implemented ETL logic and supported other development requests as required.
  • Wrote ETL jobs using Pig Latin and worked on tuning the performance of Hive queries.
  • Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie and Cassandra.
  • Hands-on experience in using the MapReduce programming model for batch processing of data stored in HDFS (a minimal MapReduce sketch follows this list).
  • Exposure to administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig.
  • Installed and configured multiple Hadoop clusters of different sizes with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
  • Worked on all major Hadoop distributions: Cloudera and Hortonworks.
  • Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
  • Handled data movement, data transformation, analysis and visualization across the lake by integrating it with various tools.
  • Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
  • Developed ingestion scripts to load data from SQL Server, DB2 and different types of flat files into Hive using Scala and Spark, and built the ETL process with Python scripts using PySpark RDDs.
  • Good expertise in planning, installing and configuring Hadoop clusters based on business needs.
  • Good experience in working with cloud environments like Amazon Web Services (AWS) EC2 and S3.
  • Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
  • Experience working on different file formats like Avro, Parquet, ORC, Sequence and Compression techniques like Gzip, Lzo, and Snappy in Hadoop.
  • Experience in retrieving data from databases like MySQL, Teradata, Informix, DB2 and Oracle into HDFS using Sqoop and ingesting them into HBase and Cassandra.
  • Experience writing Oozie workflows and Job Controllers for job automation.
  • Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig and Spark jobs.
  • In-Depth knowledge of Scala and Experience building Spark applications using Scala.
  • Good experience working on Tableau and Spotfire, and enabled JDBC/ODBC data connectivity from those tools to Hive tables.
  • Designed neat and insightful dashboards in Tableau.
  • Worked on and designed an array of reports including Crosstab, Chart, Drill-Down, Drill-Through, Customer-Segment and Geodemographic-segmentation reports.
  • Deep understanding of Tableau features such as site and server administration, calculated fields, table calculations, parameters, filters (normal and quick), highlighting, level of detail, granularity, aggregation, reference lines and many more.
  • Adequate knowledge of Scrum, Agile and Waterfall methodologies.
  • Designed and developed multiple Model 2 (MVC) web applications using J2EE.
  • Worked on various tools and IDEs like Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer and SQL*Plus.
  • Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
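
To illustrate the MapReduce batch-processing experience noted above, here is a minimal, hypothetical sketch (class names, input layout and field positions are illustrative, not taken from any project on this resume) of a job that counts records per key in delimited files stored on HDFS:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: count records per key in comma-delimited files on HDFS.
public class RecordCountJob {

    public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assume the first comma-separated field is the grouping key.
            String[] fields = line.toString().split(",");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                outKey.set(fields[0]);
                context.write(outKey, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) {
                total += c.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-count");
        job.setJarByClass(RecordCountJob.class);
        job.setMapperClass(KeyMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}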

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark

Programming Languages: Java (5, 6, 7), Python, Scala, C/C++, XML, Shell scripting, COBOL

Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle …

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, jQuery, AJAX

ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx

Operating Systems: Linux, Windows XP/7/8

Software Life Cycles: SDLC, Waterfall and Agile models

Office Tools: MS-Office, MS-Project and Risk Analysis tools, Visio

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SoapUI, Ant, Maven, Automation and MRUnit

Cloud Platforms: Amazon (EC2, EMR, S3)

Version Control: CVS, Tortoise SVN

Visualization Tools: Tableau.

Servers: IBM WebSphere, WebLogic, Tomcat, and Red Hat Satellite Server

PROFESSIONAL EXPERIENCE

Sr. Hadoop Developer

Confidential | Columbus, OH

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail Talend ETL applications to Hadoop.
  • Installed and configured Hive, Pig and Sqoop on the HDP 2.0 cluster.
  • Performed real-time analytics on HBase using the Java API and fetched data to/from HBase by writing MapReduce jobs (a minimal HBase client sketch follows this list).
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing on HDP 2.0.
  • Wrote SQL queries to process the data using Spark SQL. Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive, making the data available.
  • Extracted data from different databases and copied it into HDFS using Sqoop.
  • Created Talend Mappings to populate the data into Staging, Dimension and Fact tables.
  • Worked on a project to retrieve log messages by leveraging Spark Streaming.
  • Designed Oozie jobs for automated processing of similar data and collected the data using Spark Streaming.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs. Used the Scala collections framework to store and process complex consumer information, and used Scala functional programming concepts to develop business logic.
  • Developed Pig scripts in the areas where extensive coding needs to be reduced.
  • In-depth understanding of the Scala programming language along with the Lift framework; generated Scala and Java classes from the respective APIs so that they can be incorporated into the overall application.
  • Worked with Spark Streaming to ingest data into the Spark engine. Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Handled importing of data from various data sources using Sqoop, performed transformations using Hive, MapReduce and loaded data into HDFS.
  • Worked on running reports in a Linux environment, wrote shell scripts to generate those reports, and used Linux to manage files.
  • Parsed high-level design specification to simple ETL coding and mapping standards.
  • Developed complex Talend job mappings to load data from various sources using different components. Designed, developed and implemented solutions using Talend Integration Suite.
  • Imported data from different sources, such as Talend ETL and the local file system, into Spark RDDs. Experience with developing and maintaining applications written for Elastic MapReduce.
  • Responsible for managing data coming from RDBMS sources and involved in HDFS maintenance and loading of structured data.
  • Optimized several Map Reduce algorithms in Java according to the client requirement for big data analytics.
  • Responsible for importing data from MySQL to HDFS and providing query capabilities using Hive.
  • Used Sqoop to import data from RDBMS to the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
  • Developed the Sqoop scripts to make the interaction between Pig and MySQL Database.
  • Involved in writing shell scripts in scheduling and automation of tasks.
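
A minimal sketch of the HBase Java-API access described above (the table, column-family and row-key names are hypothetical, and an HBase 1.x-style client on the classpath is assumed):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example: write one event row to HBase and read it back.
public class HBaseEventClient {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create(); // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("retail_events"))) {

            // Put: row key = event id, column family "d", qualifier "status"
            Put put = new Put(Bytes.toBytes("event-0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("PROCESSED"));
            table.put(put);

            // Get the row back and print the stored status
            Get get = new Get(Bytes.toBytes("event-0001"));
            Result result = table.get(get);
            byte[] status = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
            System.out.println("status = " + Bytes.toString(status));
        }
    }
}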

Environment: Hadoop, Talend, MapReduce, HDFS, Jenkins, Hive, Pig, Spark, Storm, Kafka, Flume, Sqoop, Oozie, SQL, Scala, Java and Eclipse

Sr. Hadoop Developer

Confidential | Kansas City, MO

Responsibilities:

  • Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used Hive/HQL queries to query or search for particular strings in Hive tables in HDFS.
  • Possess good Linux and Hadoop system administration skills, networking, shell scripting and familiarity with open-source configuration management and deployment tools such as Chef.
  • Worked with Puppet for application deployment.
  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality (a minimal Hive UDF sketch follows this list).
  • Created HBase tables to store data in various formats coming from different sources.
  • Used Maven to build and deploy code on the YARN cluster.
  • Good knowledge of building Apache Spark applications using Scala.
  • Developed several business services as Java RESTful web services using the Spring MVC framework.
  • Managing and scheduling Jobs to remove the duplicate log data files in HDFS using Oozie.
  • Used Apache Oozie for scheduling and managing the Hadoop Jobs. Knowledge on HCatalog for Hadoop based storage management.
  • Expert in creating and designing data-ingest pipelines using technologies such as Spring Integration and Apache Storm/Kafka.
  • Used Flume extensively in gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Implemented test scripts to support test driven development and continuous integration.
  • Dumped data from HDFS to a MySQL database and vice versa using Sqoop.
  • Responsible for managing data coming from different sources.
  • Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which of them better suits the current requirements.
  • Used File System check (FSCK) to check the health of files in HDFS.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Used Java/J2EE application development skills with Object-Oriented Analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
  • Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
  • Created a complete processing engine based on Cloudera's distribution.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
  • Used Spark Streaming to collect this data from Kafka in near real time and perform the necessary transformations and aggregation on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).
  • Configured Kerberos for the clusters
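
A minimal sketch of the kind of custom Hive UDF mentioned above (the function name and logic are hypothetical); after packaging into a JAR it would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION:

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example of a simple Hive UDF: trim and upper-case a code column.
@Description(name = "normalize_code",
             value = "_FUNC_(str) - trims and upper-cases a code value")
public class NormalizeCodeUDF extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULLs, as Hive expects
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}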

Environment: Hadoop, Map Reduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.

Sr. Hadoop Developer/Admin

Confidential | Atlanta, GA

Responsibilities:

  • Worked closely with Hortonworks architects to design and build Big Data solutions on Hadoop.
  • Extensively created complex ETL mappings using transformations like Filter, Aggregator, Update Strategy, Lookup, Router, Stored Procedure, Sequence Generator, XML and Joiner using Informatica PowerCenter 8.1.1.
  • Administered big data applications so that they are highly available and performing as expected.
  • Responsible for capacity planning and estimating the requirements for lowering or increasing the capacity of the Hadoop cluster.
  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used Hive/HQL queries to query or search for particular strings in Hive tables in HDFS.
  • Possess good Linux and Hadoop system administration skills, networking, shell scripting and familiarity with open-source configuration management and deployment tools such as Chef.
  • Worked with Puppet for application deployment.
  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Created HBase tables to store data in various formats coming from different sources.
  • Used Maven to build and deploy code on the YARN cluster.
  • Good knowledge of building Apache Spark applications using Scala.
  • Developed several business services as Java RESTful web services using the Spring MVC framework.
  • Managing and scheduling Jobs to remove the duplicate log data files in HDFS using Oozie.
  • Used Apache Oozie for scheduling and managing the Hadoop Jobs. Knowledge on HCatalog for Hadoop based storage management.
  • Expert in creating and designing data-ingest pipelines using technologies such as Spring Integration and Apache Storm/Kafka.
  • Implemented test scripts to support test driven development and continuous integration.
  • Dumped data from HDFS to a MySQL database and vice versa using Sqoop.
  • Responsible for managing data coming from different sources.
  • Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which of them better suits the current requirements.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Used Java/J2EE application development skills with Object-Oriented Analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
  • Hands-on experience in cloud services including AWS (EC2, S3, LAMBDA, RDS, IAM), GCP (Kubernetes) and Docker Hub.
  • Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances.
  • Infrastructure development on AWS by employing services such as EC2, S3, EMR, Redshift etc.
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
  • Created a complete processing engine based on Cloudera's distribution.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Used Spark Streaming to collect this data from Kafka in near real time and perform the necessary transformations and aggregation on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase); a minimal Spark Streaming/Kafka sketch follows this list.
  • Configured Kerberos for the clusters.
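
A minimal sketch of a Spark Streaming job that consumes events from Kafka and aggregates them per key, in the spirit of the learner-data-model pipeline described above (the topic, broker address and class names are hypothetical; the spark-streaming-kafka-0-10 integration is assumed, and the HBase write is left as a placeholder):

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

// Hypothetical example: count events per user in 30-second micro-batches.
public class LearnerEventStream {

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("learner-event-stream");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-model-builder");

        Collection<String> topics = Arrays.asList("learner-events");
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        ssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count events per user id (assumed to be the Kafka message key) in each batch.
        JavaPairDStream<String, Long> countsPerUser = stream
                .mapToPair(record -> new Tuple2<>(record.key(), 1L))
                .reduceByKey(Long::sum);

        // In the real pipeline this is where the aggregates would be persisted to HBase.
        countsPerUser.foreachRDD(rdd -> rdd.take(10)
                .forEach(t -> System.out.println(t._1() + " -> " + t._2())));

        ssc.start();
        ssc.awaitTermination();
    }
}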

Environment: Hadoop, Map Reduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.

Sr. Hadoop Developer

Confidential | Fort Lauderdale, FL

Responsibilities:

  • Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, the HBase database and Sqoop.
  • Designing and implementing semi-structured data analytics platform leveraging Hadoop.
  • Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job-tuning level.
  • Involved in optimization of Hive queries.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Involved in Data Ingestion to HDFS from various data sources.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
  • Automated Sqoop, Hive and Pig jobs using Oozie scheduling.
  • Extensive knowledge in NoSQL databases like HBase
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
  • Has good knowledge on writing and using the user defined functions in HIVE, PIG and MapReduce.
  • Helped the business team by installing and configuring Hadoop ecosystem components along with the Hadoop admin.
  • Developed multiple Kafka producers and consumers from scratch as per the business requirements (a minimal producer sketch follows this list).
  • Worked on loading log data into HDFS through Flume
  • Created and maintained technical documentation for executing Hive queries and Pig scripts.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Used Oozie to schedule various jobs on the Hadoop cluster.
  • Used Hive to analyze the partitioned and bucketed data.
  • Worked on establishing connectivity between Tableau and Hive.
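
A minimal sketch of a Kafka producer of the kind mentioned above (the broker address, topic name and message contents are hypothetical):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical example: a minimal producer that publishes log lines to a topic.
public class LogLineProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                String key = "host-" + (i % 3);            // partition by host
                String value = "sample log line " + i;
                producer.send(new ProducerRecord<>("app-logs", key, value));
            }
            producer.flush(); // make sure buffered records are sent before closing
        }
    }
}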

Environment: Hadoop, Talend, MapReduce, HDFS, Jenkins, Hive, Pig, Spark, Storm, Kafka, Flume, Sqoop, Oozie, SQL, Scala, Java and Eclipse.

Hadoop Developer

Confidential | Tampa, FL

Responsibilities:

  • Worked on a live 90-node Hadoop cluster running CDH 4.1.
  • Worked with highly unstructured and semi-structured data, 120 TB in size (360 TB).
  • Developed Hive queries on data logs to perform a trend analysis of user behavior on various online modules.
  • Developed Pig UDFs to pre-process the data for analysis. Involved in the setup and deployment of the Hadoop cluster.
  • Developed Map Reduce programs for some refined queries on big data. Involved in loading data from UNIX file system to HDFS.
  • Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer and Auto Scaling groups.
  • Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
  • Managing and scheduling jobs on a Hadoop cluster using Oozie.
  • Along with the infrastructure team, involved in the design and development of a Kafka- and Storm-based data pipeline.
  • Designed and configured a Kafka cluster to accommodate heavy throughput of 1 million messages per second. Used Kafka producer 0.6.3 APIs to produce messages.
  • Provided daily code contribution, worked in a test-driven development.
  • Installed, Configured Talend ETL on single and multi-server environments.
  • Developed merge jobs in Python to extract and load data into a MySQL database.
  • Created and modified several UNIX shell scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to the standard framework.
  • Developed simple to complex MapReduce jobs using Hive. Implemented partitioning and bucketing in Hive (a minimal partitioned-table sketch follows this list).
  • Mentored analysts and the test team in writing Hive queries. Involved in setting up HBase to use HDFS.
  • Extensively used Pig for data cleansing.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Performed benchmarking of the NoSQL databases Cassandra and HBase.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Knowledgeable of Spark and Scala mainly in framework exploration for transition from Hadoop/MapReduce to Spark.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
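
A minimal sketch of creating and querying a partitioned, bucketed Hive table over HiveServer2 JDBC, illustrating the partitioning and bucketing work mentioned above (the table name, columns, connection URL and credentials are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical example: create a partitioned, bucketed Hive table and run an
// aggregate query against a single partition through HiveServer2.
public class HivePartitionExample {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver2-host:10000/default", "hive_user", "");
             Statement stmt = conn.createStatement()) {

            stmt.execute("CREATE TABLE IF NOT EXISTS user_events ("
                    + " user_id STRING, action STRING, amount DOUBLE)"
                    + " PARTITIONED BY (event_date STRING)"
                    + " CLUSTERED BY (user_id) INTO 32 BUCKETS"
                    + " STORED AS ORC");

            // Query one partition only, so the scan is pruned to that date.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT action, COUNT(*) FROM user_events"
                    + " WHERE event_date = '2015-06-01' GROUP BY action")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " : " + rs.getLong(2));
                }
            }
        }
    }
}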

Environment: Unix Shell Scripting, Python, Oracle 11g, DB2, HDFS, Kafka, Storm, Spark, ETL, Java (JDK 1.7), Pig, Linux, Cassandra, MapReduce, MS Access, Toad, SQL, Scala, MySQL Workbench, XML, NoSQL, SOLR, HBase, Hive, Sqoop, Flume, Talend, Oozie.

Hadoop/Java Developer

Confidential

Responsibilities:

  • Involved in analysis, design and development of Expense Processing system.
  • Designed Use Case Diagrams, Class Diagrams and Sequence Diagrams and Object Diagrams to model the detail design of the application using UML.
  • Installed, configured and administered Hadoop clusters of major Hadoop distributions.
  • Wrote MapReduce jobs in Java, Pig and Python.
  • Extensively worked with workflow schedulers like Oozie and scripting using Unix shell script, Python and Perl.
  • Worked with SQL and NoSQL (MongoDB, Cassandra, Hadoop) data structures.
  • Managing and reviewing Hadoop log files
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Worked on Hadoop Cluster migrations or Upgrades
  • Extensively worked with Cloudera Hadoop distribution components and custom packages.
  • Built reporting using Tableau.
  • Applied ETL principles and best practices
  • Developed the application using the Spring MVC Framework. Performed client-side validations using AngularJS and Node.js.
  • Developed the user interface using JSP, HTML, CSS and JavaScript to simplify the complexities of the application.
  • Used AJAX Framework for Dynamic Searching of Bill Expense Information.
  • Created dynamic end-to-end REST APIs with the LoopBack Node.js framework.
  • Configured the Spring framework for the entire business-logic layer.
  • Developed code using various patterns like Singleton, Front Controller, Adapter, DAO, MVC, Template, Builder and Factory Patterns
  • Developed one-to-many, many-to-one, one-to-one annotation-based mappings in Hibernate.
  • Developed DAO service methods to populate the domain model objects using Hibernate (a minimal DAO sketch follows this list).
  • Used the Spring Framework's BeanFactory for initializing services.
  • Used the Java Collections API extensively, including Lists, Sets and Maps.
  • Wrote DAO classes using Spring and Hibernate to interact with the database for persistence.
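
A minimal sketch of a Hibernate-backed DAO of the kind described above (the Expense entity, its employeeId field and the HQL query are hypothetical; the entity and its mapping are assumed to be defined elsewhere in the project):

import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Hypothetical example: DAO for an Expense domain object in the expense-processing system.
public class ExpenseDao {

    private final SessionFactory sessionFactory;

    public ExpenseDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Persist a new expense inside a transaction, rolling back on failure.
    public void save(Expense expense) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(expense);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    // Fetch all expenses submitted by one employee via an HQL query.
    @SuppressWarnings("unchecked")
    public List<Expense> findByEmployee(Long employeeId) {
        Session session = sessionFactory.openSession();
        try {
            return session.createQuery("from Expense e where e.employeeId = :empId")
                          .setParameter("empId", employeeId)
                          .list();
        } finally {
            session.close();
        }
    }
}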

Environment: Java, Struts, Hibernate ORM, LoopBack Framework, Spring Application Framework, EJB, JSP, Servlets, JMS, XML, SOAP, WSDL, JDBC, JavaScript, UML, HTML, AngularJS, Node.js, JNDI, Subversion (SVN), Maven, Log4J, Spring Source Tool Suite (STS), Windows XP, WebSphere App Server, Oracle.
