
Senior Hadoop Developer Resume


Charlotte, NC

SUMMARY

  • 9+ years of overall software development experience in Big Data technologies and the Hadoop ecosystem, with programming experience in Python, Scala, and QA automation tools.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Worked with real-time data processing and streaming techniques using Spark Streaming, Storm, and Kafka.
  • Experience moving data into and out of HDFS and Relational Database Systems (RDBMS) using Apache Sqoop.
  • Experience developing Kafka producers and Kafka consumers that stream millions of events per second.
  • Significant experience writing custom UDFs in Hive and Spark.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
  • Experience in installation, configuration, management, support, and monitoring of Hadoop clusters using various distributions such as Apache Spark, Cloudera, and the AWS service console.
  • Designed and implemented test environments on AWS and built data pipelines from scratch using multiple AWS services.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, Auto Scaling, CloudFront, CloudWatch, and other services of the AWS family.
  • Strong experience productionizing end-to-end data pipelines on the Hadoop platform.
  • Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
  • Prepared RTMs (Requirements Traceability Matrix), mapping test cases to their respective business requirements.
  • Designed, developed, and executed test cases.
  • Performed UAT in both test and dev environments to verify application functionality.
  • Experience in architecting, designing, and building distributed software systems.
  • Extensively worked on UNIX shell and Python scripts for batch processing.
  • Experience in using various Hadoop distributions like Cloudera, Hortonworks, and Amazon EMR.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
  • Experienced in writing complex MapReduce programs that work with different file formats.
  • Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL/DML SQL queries.
  • Worked on Hive-HBase integration to load and retrieve data for real-time processing using REST APIs.
  • Experience in developing applications using Waterfall and Agile (XP and Scrum).
  • Strong Problem Solving and Analytical skills and abilities to make Balanced & Independent Decisions.

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, Zookeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters

AWS Ecosystems: S3, EC2, EMR, Redshift, Athena, Glue, Lambda, SNS, CloudWatch

Languages: Shell Scripting, SQL, PL/SQL, Python, Scala

Operating systems: Windows, Linux, and Unix

IDE and Build Tools: PyCharm, Jupyter Notebook, MS Visual Studio, Maven, JIRA, Confluence

Version Control: Git, SVN, CVS

Web Services: RESTful, SOAP

Web Servers: WebLogic, WebSphere, Apache Tomcat

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Senior Hadoop Developer

Responsibilities:

  • Extensively worked on Metadata Driven Automated Ingestion Framework to ingest Structured & Semi-structured data to Centralized Hadoop Data Lake.
  • Developed Hive (HQL) Applications and Spark Applications to Extract, Transform, Merge data in batch mode.
  • Bullet-proofed the data with various proactive data quality rules, designed to run in a distributed fashion.
  • Performed data enrichment, analysis in stream, aggregation, splitting, schema translation, and format conversion to prepare the data for further business processing.
  • Accessed data from the data lake and explored, analyzed, aggregated, curated, and enriched it to make it ready for consumption using Apache Hive and Apache Spark.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Designed XML file layouts, generated schemas, and parsed data using Spark applications.
  • Developed pre-processing scripts using Python, Shell scripts to parse and standardize the data feed which then can be integrated/automated with the Metadata Driven Ingestion ETL framework.
  • Transform raw data as per data governance standard using Hive UDFs and PySpark.
  • Created Unit tests for Python and Spark Applications and developed helper methods and designed data processing pipelines.
  • Designed ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop packages, and MySQL.
  • Optimized Hive tables using techniques like partitioning and bucketing to provide better query performance.
  • Worked on AWS EMR for data processing requirements and used AWS Lambda functions to create/terminate EMR clusters (see the sketch after this list).
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from Oracle into HDFS using Sqoop.
  • Used AWS S3 buckets for data storage needs.
  • Expertise with version control tools such as SVN and GitHub, and automated data processes using an enterprise client scheduler and Oozie.
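
For illustration, a minimal sketch of the Lambda-driven EMR lifecycle described above (creating a transient cluster and terminating it on demand). The cluster name, instance types, release label, and S3 log bucket are placeholders, not values from this engagement.

```python
# Hypothetical AWS Lambda handler that creates or terminates a transient EMR cluster.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

def lambda_handler(event, context):
    # Terminate an existing cluster when the event asks for it.
    if event.get("action") == "terminate":
        emr.terminate_job_flows(JobFlowIds=[event["cluster_id"]])
        return {"terminated": event["cluster_id"]}

    # Otherwise spin up a transient cluster that shuts down when its steps finish.
    response = emr.run_job_flow(
        Name="batch-etl-cluster",                    # placeholder name
        ReleaseLabel="emr-5.30.0",                   # assumed EMR release
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        LogUri="s3://example-log-bucket/emr/",       # placeholder bucket
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}
```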

Environment: HDFS, MapReduce, Hive, Sqoop, Shell Scripts, MySQL, Teradata, HBase, Cloudera, AWS, Kafka, Spark, Scala, Python, ETL.

Confidential, PA

Hadoop Spark Developer

Responsibilities:

  • Examined data, identified outliers, inconsistencies and manipulated data to ensure data quality and integration.
  • Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyse operational data.
  • Used Spark SQL with Scala for creating data frames and performed transformations on data frames.
  • Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
  • Streamed data in real time using Spark and Kafka.
  • Worked on troubleshooting Spark applications to make them more fault tolerant.
  • Worked on fine-tuning Spark applications to improve overall processing time for the pipelines.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) in Python.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list).
  • Experienced in handling large datasets using Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other optimizations.
  • Experience with Kafka, reading and writing thousands of megabytes of streaming data per second.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Involved in creating Hive tables, loading, and analysing data using Hive scripts.
  • Created Hive tables, dynamic partitions, buckets for sampling and working on them using Hive QL.
  • Used Maven extensively for building JAR files of MapReduce programs and deploying them to the cluster.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Performed tuning and increased operational efficiency on a continuous basis.
  • Worked on Spark SQL, reading/writing data from JSON files, text files, Parquet files, and schema RDDs.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
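
As a hedged illustration of the Kafka-to-Spark pattern in the bullets above, the sketch below uses PySpark Structured Streaming to consume JSON events from a Kafka topic; the broker, topic, schema, and output paths are hypothetical, and a Parquet sink stands in for the HBase writes mentioned above.

```python
# Minimal PySpark Structured Streaming consumer for a Kafka topic (illustrative only).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed schema of the JSON events arriving on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("account", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "transactions")                 # placeholder topic
       .load())

# Kafka delivers bytes; cast the value to a string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write the parsed stream out; an HBase connector could be plugged in via foreachBatch.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streams/transactions")           # placeholder path
         .option("checkpointLocation", "/data/checkpoints/txn")  # placeholder path
         .start())

query.awaitTermination()
```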

Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, HBase, Teradata, Power Centre, Tableau, Oozie, Oracle, Linux

Confidential, NJ

Big Data Developer

Responsibilities:

  • Installed, configured, and created jobs in the Hadoop MapReduce, Pig, Hive, HBase, Spark RDD, Pair RDD, Flume, Oozie, and Sqoop environment.
  • Involved in application migration from Hadoop to Spark for faster processing.
  • Extracted data from Oracle database into HDFS using Sqoop.
  • Developed Oozie workflows to schedule and manage Sqoop, Hive, Pig jobs to Extract-Transform-Load process.
  • Used Flume, and configured it to use multiplexing, replicating, multi-source, interceptors, selectors to import log files from Web Servers in to HDFS/Hive.
  • Managed and scheduled Jobs on a Hadoop cluster using Shell Scripts.
  • Maintained cluster coordination services through Zookeeper.
  • Involved in filtering partitioned data by different year ranges and formats using Hive functions (see the sketch after this list).
  • Defined schemas, created new relations, and performed Pig joins, sorting, and filtering using Pig group on large data sets.
  • Created HBase tables for random reads/writes by the MapReduce programs.
  • Designed and developed entire pipeline from data ingestion to reporting tables.
  • Developed a predictive analytics product using Apache Spark, SQL/HiveQL, JavaScript, and Highcharts.
  • Performed data cleaning, integration, transformation, and reduction by developing MapReduce jobs in Java for data mining.
  • Created Hive tables, loaded data into them, and customized Hive queries, which internally operate as MapReduce jobs.
  • Performed Map-Side joins and Reduce-Side joins for large tables.
  • Used Cloudera Manager to monitor and manage the Hadoop cluster.
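
As a small, hedged example of the partition filtering mentioned above, the PySpark snippet below prunes a year-partitioned Hive table; the database, table, and column names are hypothetical.

```python
# Illustrative partition-pruning query against a year-partitioned Hive table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("partition-filter-sketch")
         .enableHiveSupport()          # read tables registered in the Hive metastore
         .getOrCreate())

# Filtering on the partition column (year) lets the engine read only the matching
# partitions instead of scanning the whole table.
recent = (spark.table("warehouse.sales")              # placeholder database.table
          .where(col("year").between(2015, 2018)))

recent.groupBy("year").count().show()
```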

Environment: HDFS, CDH, BigInsights, Apache Spark, Flume, Hive, Pig, Scala, Java, Sqoop, SQL, Perl, Shell scripting, C, C++, Oracle, WebSphere Application Server, Spring, Hibernate, Struts, JMS

Confidential

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using HBase-Hive integration.
  • Involved in developing UML use case diagrams, class diagrams, and sequence diagrams using Rational Rose.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Created various views for HBase tables, utilizing the performance of Hive on top of HBase.
  • Developed the Apache Storm, Kafka, and HDFS integration project to do real-time data analysis.
  • Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
  • Developed MapReduce programs for parsing information and loading it into HDFS (see the sketch after this list).
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
  • Wrote Hive UDFs to sort struct fields and return complex data types.
  • Responsible for loading data from the UNIX file system to HDFS.
  • Developed ETL applications using Hive, Impala, and Sqoop, and automated them using Oozie.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
  • Designed and developed a distributed processing system to process binary files in parallel and crunch the analysis metrics into a data warehousing platform for reporting.
  • Developed workflows in Control-M to automate tasks of loading data into HDFS and pre-processing it with Pig.
  • Cluster coordination services through ZooKeeper.
  • Used Maven extensively for building JAR files of MapReduce programs and deploying them to the cluster.
  • Modelled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
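
The pair of scripts below is a minimal, hypothetical Hadoop Streaming version of the kind of log-parsing MapReduce job described above; the space-separated "timestamp level message" log format is an assumption made for illustration.

```python
# mapper.py - parse each log line and emit the log level with a count of 1.
import sys

for line in sys.stdin:
    parts = line.strip().split(" ", 2)   # assumed format: timestamp level message
    if len(parts) == 3:
        level = parts[1]
        print(f"{level}\t1")
```

```python
# reducer.py - sum the counts per log level (input arrives sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += int(value)

if current_key is not None:
    print(f"{current_key}\t{count}")
```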

Environment: Hive QL, MySQL, HBase, HDFS, HIVE, Eclipse, Hadoop, Oracle 11g, PL/SQL, SQL*PLUS, Toad 9.6, Flume, PIG, Oozie, Sqoop, Unix, Tableau, Cosmos.

Confidential

Java Developer

Responsibilities:

  • Competency in using XML web services with SOAP to transfer data to supply chain and domain monitoring systems.
  • Worked on Maven as the build tool for building JAR files.
  • Used the Hibernate framework (ORM) to interact with the database.
  • Knowledge of the Struts Tiles framework for layout management.
  • Worked on the design, analysis, development, and testing phases of the application.
  • Developed user interface using JSP and HTML.
  • Used JDBC for the Database connectivity.
  • Involved in projects utilizing Java and Java EE web applications to create fully integrated client management systems.
  • Executed SQL statements for searching contacts based on criteria.
  • Development and integration of the application using Eclipse IDE.
  • Developed JUnit tests for server-side code.
  • Involved in building, testing and debugging of JSP pages in the system.
  • Involved in multi-tiered J2EE design utilizing Spring (IoC) architecture and Hibernate.
  • Involved in the development of front-end screens using technologies like JSP, HTML, AJAX and JavaScript.
  • Configured Spring-managed beans.
  • Used the Spring Security API to configure security.
  • Investigated, debugged, and fixed potential bugs in the implementation code.

Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse.
