- Over 8 years of experience in application development and design using Hadoop ecosystem tools and Java/J2EE technologies.
- Developed and built frameworks that integrate big data and advanced analytics to make business decisions.
- Extensive experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Hive, Pig, Flume, Sqoop, and Spark.
- Preprocessed and cleansed big data for better analysis.
- Certified Cloudera Spark and Hadoop Developer
- Experience in Cloudera distributions (CDH) and Hortonworks Data Platform (HDP)
- Created various use cases using massive public big data sets. Ran various performance tests to verify the efficacy of MapReduce, Pig, and Hive
- Migrated workloads to the Azure cloud and created the end-to-end architecture for running in the cloud.
- Experience with ADF, ADLS, Blob Storage, HDInsight, Ranger, S3, IR, IoT Hub, Stream Analytics, etc.
- Good knowledge of Amazon Web Services (AWS) components such as EC2, EMR, S3, CloudWatch, etc.
- Strong coding and debugging skills on the Java platform
- Experience in shipping enterprise products, web/mobile UI applications to a large customer base
- Experienced in Full Life Cycle development of software products
- Good at Servlets, JSPs, and MVC frameworks
- Have excellent analytical and problem-solving skills and ability to learn new technologies quickly
Learning: Can rapidly adapt to new environments and designs.
Apache Hadoop: HDFS, Hive, Pig, MapReduce, Flume, Sqoop and Spark
Cloud: HDInsight, ADLS, ADF, S3, EMR, EC2, NACL, Security groups
Programming Languages & Scripts: Java, J2EE, UNIX, JavaScript, SQL, UML, XML, CSS, JSON
Enterprise Java: JSP, Servlets, JSF, EJB, JMS, Socket Programming, Java Beans
Software Design: Design Patterns, Data Structures, Object Oriented design
Tools & Framework: TIBCO Composite, JSF, Spring, Web Services, Selenium, JUnit, Maven, Ant
Web Servers: Oracle WebLogic Server, WebSphere, Tomcat, Oracle OC4J
IDE & Tools: Eclipse, Visual Studio, Xcode, Git
Confidential, Long Beach, CA
Big Data Developer
- Worked on live 30-node (Prod) and 6-node (UAT) big data clusters running CDH 5.13.3
- Developed and maintained the complex Claims Semantic Pipeline for weekly full load and incremental loads
- Validated the weekly full load of claims against Netezza and checked for any discrepancies
- Resolved the state issue (Universal and Medicare state) in the reference data set used by all the pipelines
- Developed the aggregated datasets and lookup columns from Claims dataset and all reference tables
- Integrated SIU pipeline into the existing Claims pipeline and retired the SIU pipeline
- Used windowing techniques and UDFs in SparkSQL
- Developed and incorporated enhancements into the existing claims pipeline
- Monitored and maintained the weekly Talend job and resolved failures to meet the SLAs
- Converted existing SQL logic to SparkSQL for the Pharmacy pipeline and optimized it
- Improved the performance of the Provider datasets and incorporated all provider data into claims
- Worked with Parquet file formats using Snappy compression to speed up network transfer of big data
- Created Hive tables and views using Impala; implemented partitioning and bucketing in Hive for better organization of data
- Built Power BI dashboards to validate the data against Netezza
- Currently automating a pre-pipeline check to validate the L0 data
- Collaborated with Data Management team on the business requirements and retirement of Netezza
- Followed Agile Scrum methodology in JIRA throughout the project
- Gained very good business knowledge of claims processing
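The windowing pattern used in the SparkSQL work above can be sketched with a standard SQL window function. SQLite stands in for SparkSQL here so the example runs self-contained; the table and column names are hypothetical:

```python
import sqlite3

# Hypothetical claims table: pick the latest claim per member using
# ROW_NUMBER() over a window partitioned by member, the same pattern
# used in the SparkSQL pipeline work described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id TEXT, claim_id TEXT, claim_date TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [
        ("m1", "c1", "2020-01-01"),
        ("m1", "c2", "2020-02-01"),
        ("m2", "c3", "2020-01-15"),
    ],
)

latest = conn.execute(
    """
    SELECT member_id, claim_id FROM (
        SELECT member_id, claim_id,
               ROW_NUMBER() OVER (
                   PARTITION BY member_id
                   ORDER BY claim_date DESC
               ) AS rn
        FROM claims
    )
    WHERE rn = 1
    ORDER BY member_id
    """
).fetchall()
print(latest)  # one latest claim per member
```

The same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` clause works unchanged in SparkSQL.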
Confidential, Houston, TX
Big Data Developer
- Involved in the complete SDLC of Big data project that includes requirement analysis, design, coding, testing and production.
- Worked on live 24-node and 4-node (Test) big data clusters (Hadoop 3.6) on Linux.
- Experience working on both non-domain-joined and domain-joined clusters.
- Worked with highly unstructured, structured, and semi-structured data of 30 TB in size (90 TB with a replication factor of 3)
- Ingested structured data from TIBCO Composite Data Virtualization tool into ADLS using Sqoop
- Created Shell scripts to automate the Sqoop jobs.
- Developed Ambari workflows for scheduling and orchestrating the ETL process
- Worked with ORC file formats using ZLIB compression to speed up network transfer of big data
- Ingested structured big data from Teradata, Oracle, Netezza, Postgres, SQLServer into ADLS using Azure Data Factory (ADF).
- Created pipelines in ADF to create cluster, ingest, create hive tables, enable daily triggers.
- Involved in converting Hive queries into Spark transformations using Spark Structured API.
- Used PySpark (Python) and Scala for analyzing the data in a non-domain-joined Spark 2.3 cluster
- Scripted Python Code to transfer data from Hive tables into Data Science Sandbox using SFTP.
- Very good experience in monitoring and managing the Hadoop cluster using Ambari.
- Created dashboards in Power BI from the incident record data and Hive tables (via ODBC connection) to generate metrics.
- Gained very good business knowledge of the oil and gas industry, including well pads, weather, mud pressure, and exploration analysis.
- Collaborated with Digital Security, Data Scientists, Palantir and Catalog team to ensure data quality and availability.
- Followed Agile Scrum methodology in Visual Studio Team Services throughout the project.
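Daily-trigger ingestion like the ADF pipelines above typically lands data under date-partitioned folders. A minimal sketch, with a hypothetical container and folder layout (the real names differed):

```python
from datetime import date

def landing_path(source: str, table: str, run_date: date) -> str:
    # Hypothetical ADLS layout: one folder per source/table/run date,
    # mirroring the daily-trigger ingestion pattern described above.
    return f"adl://datalake/raw/{source}/{table}/{run_date:%Y/%m/%d}"

print(landing_path("teradata", "well_events", date(2019, 5, 1)))
```

Keeping the date in the path lets downstream Hive tables be partitioned by load date and lets a failed day be re-run in isolation.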
Confidential, Orlando, FL
- Worked on a live 80-node Hadoop cluster running CDH 5.10
- Worked with structured and semi-structured data of 150 TB in size (450 TB with a replication factor of 3)
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into forming baseline data.
- Developed Hive queries and UDFs to analyze/transform the data in HDFS.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data
- Used pattern-matching algorithms in Pig to recognize fraudulent customers across different sources, built risk profiles for each customer, and stored the result data in HDFS
- Used Oozie to orchestrate the MapReduce jobs and worked with HCatalog to open up access to Hive's metastore
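The cross-source customer matching idea can be illustrated in miniature. This is a pure-Python analogue using fuzzy string matching; the production logic lived in Pig and compared additional fields:

```python
from difflib import SequenceMatcher

def same_customer(a: str, b: str, threshold: float = 0.85) -> bool:
    # Treat two records as the same customer when their names are
    # nearly identical after case-folding. A stand-in for the richer
    # pattern matching done in Pig across multiple sources.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(same_customer("John Q. Smith", "john q smith"))  # near-duplicate
print(same_customer("John Smith", "Mary Jones"))       # unrelated
```

Records flagged as the same customer across sources can then be merged into a single risk profile.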
Software Development Engineer
- Worked on a 10-node Hadoop cluster
- Worked on semi structured and structured data of 15TB in size (45TB with replication factor of 3)
- Loaded data from disparate data sets using Sqoop and Flume.
- Used Sqoop to import/export data between RDBMS and Hive tables.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Created Sqoop jobs with incremental load to populate Hive External tables.
- Have a very good understanding of partitioning and bucketing concepts in Hive and designed both managed and external tables in Hive to optimize performance.
- Wrote Pig Latin scripts to perform transformations per the use-case requirements.
- Worked with different file formats and compression techniques.
Environment: Cloudera Enterprise, Hadoop, MapReduce, Pig, Hive, Avro, Sqoop, HBase
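Hive's bucketing, mentioned in the bullets above, assigns each row to a bucket by hashing the bucketing column modulo the bucket count. A minimal sketch with a simple byte-sum standing in for Hive's real hash function:

```python
def bucket_for(key: str, num_buckets: int = 4) -> int:
    # Hive computes hash(bucketing_column) % num_buckets to place a row;
    # a byte sum is used here as a deterministic stand-in hash.
    return sum(key.encode("utf-8")) % num_buckets

print(bucket_for("customer-42"))
```

Because the same key always hashes to the same bucket, joins on the bucketing column can proceed bucket-by-bucket instead of shuffling the whole table.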
Member Technical Staff
- Created functional and design specification documents.
- Analyzed how to display the data/metrics collected in Enterprise Manager (EM) and developed the relevant pages.
- Worked on User-Interface using JSPs and Servlets for the Enterprise Manager framework
- Discovered all the Universal Content Management servers installed on the content server and identified their statuses.
- Extracted the configuration details of the server.
- Integrated the targets (SOA, WebLogic, WebCenter) into the EM tree.
- Created Dynamic Monitoring Services (DMS) messages for Content Management.
- Added the DMS instrumentation to the Content Server code to extract the metrics, then validated and tested them.
- Identified the cached queries, active databases, documents waiting, and number of service requests in the Content Server
- Analyzed system performance and monitored system status.
- Used Oracle Application Development Framework (ADF) for end-to-end Java-based application development.
- Resolved issues on the server based on priority.
Environment: Java, J2EE (Servlets), OOP concepts, Oracle DB, JDBC
- Prepared requirement, functional, and design specification documents.
- Worked on Oracle JDeveloper, which is a free integrated development environment.
- Implemented peer discovery both statically and dynamically (using SLP and NAPTR)
- Created Realm and Peer routing tables.
- Opened TCP connections to send and receive data.
- Tested each method with JUnit.
- Used EMMA code coverage to help improve the coverage of the project.
- Implemented Failover and Failback procedures
Environment: Java, J2EE (Socket Programming), Design Patterns, Seagull Traffic generator
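The TCP send/receive work above was done in Java socket programming; a minimal Python sketch of the same open-connection, send, receive pattern:

```python
import socket
import threading

# Server side: accept one connection and echo back what arrives.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def serve() -> None:
    conn, _ = srv.accept()
    conn.sendall(conn.recv(64))  # echo the payload back
    conn.close()

t = threading.Thread(target=serve)
t.start()

# Client side: open a TCP connection, send data, read the reply.
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(64)
client.close()
t.join()
srv.close()
print(reply.decode())
```

The Java version uses `ServerSocket`/`Socket` with the same accept, send, receive flow.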