
Senior Hadoop Developer Resume


Charlotte, NC

SUMMARY

  • 9+ years of overall software development experience in Big Data technologies and the Hadoop ecosystem, with programming experience in Python, Scala, and QA automation tools.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Worked with real-time data processing and streaming techniques using Spark Streaming, Storm, and Kafka.
  • Experience moving data into and out of HDFS and Relational Database Systems (RDBMS) using Apache Sqoop.
  • Experience developing Kafka producers and Kafka consumers that stream millions of events per second.
  • Significant experience writing custom UDFs in Hive and Spark.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
  • Experience in installation, configuration, management, support, and monitoring of Hadoop clusters using various distributions such as Apache Spark, Cloudera, and the AWS service console.
  • Designed and implemented test environments on AWS and built data pipelines from scratch using multiple AWS services.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, Auto Scaling, CloudFront, CloudWatch, and other services of the AWS family.
  • Strong experience productionizing end-to-end data pipelines on the Hadoop platform.
  • Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
  • Prepared RTMs (Requirements Traceability Matrix), mapping test cases to their respective business requirements.
  • Designed, developed, and executed test cases.
  • Performed UAT in both test and dev environments to verify application functionality.
  • Experience in architecting, designing, and building distributed software systems.
  • Extensively worked on UNIX shell and Python scripts for batch processing.
  • Experience in using various Hadoop distributions like Cloudera, Hortonworks, and Amazon EMR.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
  • Experienced in writing complex MapReduce programs that work with different file formats.
  • Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL/DML SQL queries.
  • Worked on Hive-HBase integration to load and retrieve data for real-time processing using REST APIs.
  • Experience in developing applications using Waterfall and Agile (XP and Scrum).
  • Strong Problem Solving and Analytical skills and abilities to make Balanced & Independent Decisions.

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, Zookeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters

AWS Ecosystems: S3, EC2, EMR, Redshift, Athena, Glue, Lambda, SNS, CloudWatch

Languages: Shell Scripting, SQL, PL/SQL, Python, Scala

Operating systems: Windows, Linux, and Unix

IDE and Build Tools: PyCharm, Jupyter Notebook, MS Visual Studio, Maven, JIRA, Confluence

Version Control: Git, SVN, CVS

Web Services: RESTful, SOAP

Web Servers: WebLogic, WebSphere, Apache Tomcat

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Senior Hadoop Developer

Responsibilities:

  • Extensively worked on Metadata Driven Automated Ingestion Framework to ingest Structured & Semi-structured data to Centralized Hadoop Data Lake.
  • Developed Hive (HQL) Applications and Spark Applications to Extract, Transform, Merge data in batch mode.
  • Bullet-proofed the data with various proactive data quality rules, designed to run in a distributed fashion.
  • Performed data enrichment, analysis in stream, aggregation, splitting, schema translation, and format conversion to prepare the data for further business processing.
  • Accessed data from the data lake and explored, analyzed, aggregated, curated, and enriched it to make it ready for consumption using Apache Hive and Apache Spark.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Designed XML file layouts, generated schemas, and parsed data using Spark applications.
  • Developed pre-processing scripts using Python, Shell scripts to parse and standardize the data feed which then can be integrated/automated with the Metadata Driven Ingestion ETL framework.
  • Transform raw data as per data governance standard using Hive UDFs and PySpark.
  • Created Unit tests for Python and Spark Applications and developed helper methods and designed data processing pipelines.
  • Designed ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop packages, and MySQL.
  • Optimized Hive tables using techniques like partitioning and bucketing to provide better query performance.
  • Worked on AWS EMR for data processing requirements and used AWS Lambda functions to create/terminate EMR clusters (see the sketch after this list).
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from Oracle into HDFS using Sqoop.
  • Used AWS S3 buckets for data storage needs.
  • Expertise with version control tools such as SVN and GitHub, and automated data processes using an enterprise client scheduler and Oozie.
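
For illustration, a minimal sketch of the Lambda-driven EMR lifecycle described above (creating a transient cluster and terminating it on demand). The cluster name, instance types, release label, and S3 log bucket are placeholders, not values from this engagement.

```python
# Hypothetical AWS Lambda handler that creates or terminates a transient EMR cluster.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

def lambda_handler(event, context):
    # Terminate an existing cluster when the event asks for it.
    if event.get("action") == "terminate":
        emr.terminate_job_flows(JobFlowIds=[event["cluster_id"]])
        return {"terminated": event["cluster_id"]}

    # Otherwise spin up a transient cluster that shuts down when its steps finish.
    response = emr.run_job_flow(
        Name="batch-etl-cluster",                    # placeholder name
        ReleaseLabel="emr-5.30.0",                   # assumed EMR release
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        LogUri="s3://example-log-bucket/emr/",       # placeholder bucket
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}
```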

Environment: HDFS, MapReduce, Hive, Sqoop, Shell Scripts, MySQL, Teradata, HBase, Cloudera, AWS, Kafka, Spark, Scala, Python, ETL.

Confidential, PA

Hadoop Spark Developer

Responsibilities:

  • Examined data, identified outliers, inconsistencies and manipulated data to ensure data quality and integration.
  • Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyse operational data.
  • Used Spark SQL with Scala for creating data frames and performed transformations on data frames.
  • Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
  • Streamed data in real time using Spark and Kafka.
  • Worked on troubleshooting Spark applications to make them more fault tolerant.
  • Worked on fine-tuning Spark applications to improve overall processing time for the pipelines.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) in Python.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list).
  • Experienced in handling large datasets using Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other optimizations.
  • Experience with Kafka, reading and writing thousands of megabytes of streaming data per second.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Involved in creating Hive tables, loading, and analysing data using Hive scripts.
  • Created Hive tables, dynamic partitions, buckets for sampling and working on them using Hive QL.
  • Used Maven extensively for building JAR files of MapReduce programs and deploying them to the cluster.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Performed tuning and increased operational efficiency on a continuous basis.
  • Worked on Spark SQL, reading/writing data from JSON files, text files, Parquet files, and schema RDDs.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
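
As a hedged illustration of the Kafka-to-Spark pattern in the bullets above, the sketch below uses PySpark Structured Streaming to consume JSON events from a Kafka topic; the broker, topic, schema, and output paths are hypothetical, and a Parquet sink stands in for the HBase writes mentioned above.

```python
# Minimal PySpark Structured Streaming consumer for a Kafka topic (illustrative only).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed schema of the JSON events arriving on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("account", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "transactions")                 # placeholder topic
       .load())

# Kafka delivers bytes; cast the value to a string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write the parsed stream out; an HBase connector could be plugged in via foreachBatch.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streams/transactions")           # placeholder path
         .option("checkpointLocation", "/data/checkpoints/txn")  # placeholder path
         .start())

query.awaitTermination()
```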

Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, HBase, Teradata, Power Centre, Tableau, Oozie, Oracle, Linux

Confidential, NJ

Big Data Developer

Responsibilities:

  • Installed, configured, and created jobs in the Hadoop MapReduce, Pig, Hive, HBase, Spark RDD, Pair RDD, Flume, Oozie, and Sqoop environment.
  • Involved in application migration from Hadoop to Spark for faster processing.
  • Extracted data from Oracle database into HDFS using Sqoop.
  • Developed Oozie workflows to schedule and manage Sqoop, Hive, Pig jobs to Extract-Transform-Load process.
  • Used Flume, and configured it to use multiplexing, replicating, multi-source, interceptors, selectors to import log files from Web Servers in to HDFS/Hive.
  • Managed and scheduled Jobs on a Hadoop cluster using Shell Scripts.
  • Maintained cluster coordination services through Zookeeper.
  • Involved in filtering partitioned data by different year ranges and formats using Hive functions (see the sketch after this list).
  • Defined schemas, created new relations, and performed Pig joins, sorting, and filtering using Pig group on large data sets.
  • Created HBase tables for random reads/writes by the MapReduce programs.
  • Designed and developed entire pipeline from data ingestion to reporting tables.
  • Developed a predictive analytics product using Apache Spark, SQL/HiveQL, JavaScript, and Highcharts.
  • Performed data cleaning, integration, transformation, and reduction by developing MapReduce jobs in Java for data mining.
  • Created Hive tables, loaded data into them, and customized Hive queries, which internally operate as MapReduce jobs.
  • Performed Map-Side joins and Reduce-Side joins for large tables.
  • Used Cloudera Manager to monitor and manage the Hadoop cluster.
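
As a small, hedged example of the partition filtering mentioned above, the PySpark snippet below prunes a year-partitioned Hive table; the database, table, and column names are hypothetical.

```python
# Illustrative partition-pruning query against a year-partitioned Hive table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("partition-filter-sketch")
         .enableHiveSupport()          # read tables registered in the Hive metastore
         .getOrCreate())

# Filtering on the partition column (year) lets the engine read only the matching
# partitions instead of scanning the whole table.
recent = (spark.table("warehouse.sales")              # placeholder database.table
          .where(col("year").between(2015, 2018)))

recent.groupBy("year").count().show()
```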

Environment: HDFS, CDH, BigInsights, Apache Spark, Flume, Hive, Pig, Scala, Java, Sqoop, SQL, Perl, Shell scripting, C, C++, Oracle, WebSphere Application Server, Spring, Hibernate, Struts, JMS

Confidential

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using HBase-Hive integration.
  • Involved in developing UML use case diagrams, class diagrams, and sequence diagrams using Rational Rose.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Created various views for HBase tables, utilizing the performance of Hive on top of HBase.
  • Developed the Apache Storm, Kafka, and HDFS integration project to do real-time data analysis.
  • Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
  • Developed MapReduce programs for parsing information and loading it into HDFS (see the sketch after this list).
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
  • Wrote Hive UDFs to sort struct fields and return complex data types.
  • Responsible for loading data from the UNIX file system to HDFS.
  • Developed ETL applications using Hive, Impala, and Sqoop, and automated them using Oozie.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
  • Designed and developed a distributed processing system to process binary files in parallel and crunch the analysis metrics into a data warehousing platform for reporting.
  • Developed workflows in Control-M to automate tasks of loading data into HDFS and pre-processing it with Pig.
  • Cluster coordination services through ZooKeeper.
  • Used Maven extensively for building JAR files of MapReduce programs and deploying them to the cluster.
  • Modelled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
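
The pair of scripts below is a minimal, hypothetical Hadoop Streaming version of the kind of log-parsing MapReduce job described above; the space-separated "timestamp level message" log format is an assumption made for illustration.

```python
# mapper.py - parse each log line and emit the log level with a count of 1.
import sys

for line in sys.stdin:
    parts = line.strip().split(" ", 2)   # assumed format: timestamp level message
    if len(parts) == 3:
        level = parts[1]
        print(f"{level}\t1")
```

```python
# reducer.py - sum the counts per log level (input arrives sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += int(value)

if current_key is not None:
    print(f"{current_key}\t{count}")
```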

Environment: Hive QL, MySQL, HBase, HDFS, HIVE, Eclipse, Hadoop, Oracle 11g, PL/SQL, SQL*PLUS, Toad 9.6, Flume, PIG, Oozie, Sqoop, Unix, Tableau, Cosmos.

Confidential

Java Developer

Responsibilities:

  • Competency in using XML web services with SOAP to transfer data to supply chain and domain monitoring systems.
  • Worked on Maven as the build tool for building JAR files.
  • Used the Hibernate framework (ORM) to interact with the database.
  • Knowledge of the Struts Tiles framework for layout management.
  • Worked on the design, analysis, development, and testing phases of the application.
  • Developed user interface using JSP and HTML.
  • Used JDBC for the Database connectivity.
  • Involved in projects utilizing Java and Java EE web applications to create fully integrated client management systems.
  • Executed SQL statements for searching contacts based on criteria.
  • Development and integration of the application using Eclipse IDE.
  • Developed JUnit tests for server-side code.
  • Involved in building, testing and debugging of JSP pages in the system.
  • Involved in multi-tiered J2EE design utilizing Spring (IoC) architecture and Hibernate.
  • Involved in the development of front-end screens using technologies like JSP, HTML, AJAX and JavaScript.
  • Configured Spring-managed beans.
  • Used the Spring Security API to configure security.
  • Investigated, debugged, and fixed potential bugs in the implementation code.

Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse.
