
Big Data Developer Resume


Dallas

SUMMARY

  • 8+ years of IT experience in Big Data (Hadoop/Spark) and J2EE, covering requirements analysis, design, development, implementation, support, maintenance, and enhancements in the Finance and Insurance domains.
  • 6+ years of experience as a Hadoop/Spark Developer with good knowledge of Java MapReduce, Hive, Pig Latin, Scala, and Spark.
  • Experience organizing data into tables, performing transformations, and simplifying complex queries with Hive.
  • Experience performing real-time interactive analysis on massive data sets stored in HDFS.
  • Strong knowledge of and experience with Hadoop architecture and its components, including HDFS, YARN, Pig, Hive, Sqoop, Oozie, Flume, Spark, Kafka, and the MapReduce programming paradigm.
  • Developed numerous MapReduce programs.
  • Experience analyzing data using Spark SQL, HiveQL, and Pig Latin, and in developing custom UDFs for Pig and Hive (a minimal Spark SQL sketch follows this list).
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Good knowledge of job scheduling tools such as Oozie.
  • Experienced with IDE tools such as Eclipse 3.x and IBM RAD 7.0.
  • Experience in requirements gathering, analysis, planning, design, coding, and unit testing.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization.
  • Strong problem-solving, communication, and interpersonal skills; a good team player.
  • Motivated to take on independent responsibility as well as to contribute as a productive team member.
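
For illustration, a minimal PySpark sketch of the Hive/Spark SQL style of analysis described above. The application name, HDFS path, view name, and column names are assumptions, not taken from any specific project.

    # Minimal sketch (hypothetical paths, view, and columns): querying data
    # stored in HDFS with Spark SQL / Hive support.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hdfs-interactive-analysis")
             .enableHiveSupport()          # lets Spark SQL read Hive tables
             .getOrCreate())

    # Load raw files from HDFS into a DataFrame and expose them to SQL.
    txns = spark.read.parquet("hdfs:///data/raw/transactions")   # hypothetical path
    txns.createOrReplaceTempView("transactions")

    # Simplify a complex query by organizing the data into a table-like view.
    daily_totals = spark.sql("""
        SELECT txn_date, SUM(amount) AS total_amount
        FROM transactions
        GROUP BY txn_date
    """)
    daily_totals.show()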

TECHNICAL SKILLS

Hadoop Technologies: Hadoop, HDFS, Hadoop MapReduce, Hive, Sqoop, Oozie, Avro, Pig Latin, Hue, CDH, Parquet, Scala, Spark, Python, AWS, S3, EMR, Apache NiFi, Lambda, Glue

NoSQL: HBase, DynamoDB

IDE/Tools: Eclipse, IntelliJ

Web and Application Servers: WebSphere, JBoss, Tomcat

Core Competency Technologies: Java, OOP, JSP, Servlets, JDBC, Java 5/6/7, C/C++, shell scripting, Spark, SAS EG, Scala, Spark Streaming, Kafka

Testing & Issue Log Tools: JUnit 4, Bugzilla, HP Quality Center

SCM/Version Control Tools: PVCS, CVS, Subversion, Bitbucket, Git

Build and Continuous Integration: Maven, SBT

Databases: Oracle 8i/9i/10g, DB2, MySQL 4.x/5.x

OS: UNIX, LINUX, Windows

PROFESSIONAL EXPERIENCE

Big Data Developer

Confidential, Dallas

Responsibilities:

  • Performed requirements analysis and created mapping documents.
  • Involved in requirements gathering, analysis, design, development, and testing.
  • Performed ETL mapping, job development, and complex transformations.
  • Analyzed the systems and met with end users and business teams to define the requirements.
  • Developed Spark applications in Java and implemented an Apache Spark data processing project.
  • Wrote Java programs to retrieve data from HDFS and provide it to REST services.
  • Used Maven to build JAR files for MapReduce programs and deployed them to the cluster.
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
  • Wrote multiple Java programs for data extraction, transformation, and aggregation across file formats including XML, JSON, CSV, and other codec formats.
  • Led the offshore team and coordinated with the onsite team.
  • Created a coding architecture that leverages a reusable framework.
  • Provided weekly status updates to senior management.
  • Ensured on-time delivery of projects to meet client needs.
  • Conducted unit, system, and performance testing.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Used Lambda functions for infrastructure automation, such as automated EC2 snapshot creation (see the sketch following this list).
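
A minimal sketch of the Lambda-based EC2 snapshot automation mentioned above, using boto3. The Backup tag convention and volume selection are assumptions for illustration, not the exact production function.

    # Minimal sketch (tag convention is an assumption): an AWS Lambda handler
    # that snapshots EBS volumes tagged for backup.
    import boto3

    ec2 = boto3.client("ec2")

    def lambda_handler(event, context):
        # Find volumes tagged Backup=true (hypothetical tagging convention).
        volumes = ec2.describe_volumes(
            Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
        )["Volumes"]

        snapshot_ids = []
        for vol in volumes:
            snap = ec2.create_snapshot(
                VolumeId=vol["VolumeId"],
                Description="Automated snapshot via Lambda",
            )
            snapshot_ids.append(snap["SnapshotId"])

        return {"snapshots_created": snapshot_ids}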

Environment: Java, Spark, SQL, Hive, PySpark, Bitbucket, Hadoop, HDFS, Lambda

Big Data Developer

Confidential, Portland, OR

Responsibilities:

  • Member of the Confidential Consumer Data Engineering (CDe) team responsible for building the data pipelines that ingest Confidential consumer data.
  • Designed and developed cloud solutions for data and analytical workloads such as warehouses, big data, data lakes, real-time streaming, and advanced analytics.
  • Solicited detailed requirements, developed designs with input from the Sr. Data Architect, and developed code consistent with existing practice patterns and standards.
  • Responsible for migrating the process from Cloudera Hadoop to AWS NGAP 2.0 (Confidential Global Analytics Platform).
  • Designed and developed Airflow DAGs to validate upstream files, source data, and load it into Hive tables built on AWS S3 (see the DAG sketch following this list).
  • Used Airflow to schedule the Spark jobs.
  • Migrated existing Hive scripts to PySpark and optimized the process.
  • Used AWS Glue for data transformation, validation, and cleansing.
  • Created filters, groups, and sets in Tableau reports.
  • Migrated reports and dashboards from Cognos to Tableau and Power BI.
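
A minimal sketch of an Airflow DAG in the shape described above: wait for an upstream file on S3, then submit a PySpark load job. The DAG id, bucket, key pattern, and script path are assumptions, and the provider import paths vary by Airflow and provider version.

    # Minimal sketch (Airflow 2.x-style imports; names and paths are assumptions):
    # validate an upstream S3 file, then run the PySpark load into Hive on S3.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    with DAG(
        dag_id="consumer_ingest_daily",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        # Validate that the upstream file has landed before processing.
        wait_for_file = S3KeySensor(
            task_id="wait_for_upstream_file",
            bucket_name="consumer-data-landing",          # hypothetical bucket
            bucket_key="incoming/{{ ds }}/consumer.csv",   # templated by Airflow
            poke_interval=300,
            timeout=60 * 60,
        )

        # Submit the PySpark job that loads the data into Hive tables on S3.
        load_to_hive = BashOperator(
            task_id="load_to_hive",
            bash_command="spark-submit /opt/jobs/load_consumer_to_hive.py {{ ds }}",
        )

        wait_for_file >> load_to_hive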

Environment: AWS (S3, EC2, EMR, Athena), Snowflake, Airflow, Glue, PySpark, Tableau, Power BI, Hive, and YARN.

Big Data Developer

Confidential, McLean, VA

Responsibilities:

  • Used NiFi as a dataflow automation tool to ingest data into HDFS from different source systems.
  • Developed common methods to bulk-load raw HDFS files into DataFrames.
  • Developed common methods to persist DataFrames to S3, Redshift, HDFS, and Hive.
  • Pruned the ingested data to remove duplicates using window functions and performed complex transformations to derive various metrics (see the sketch following this list).
  • Used the Oozie scheduler to trigger Spark jobs.
  • Created UDFs in Spark for use in Spark SQL.
  • Used the Spark API in Scala to perform analytics on data in Hive.
  • Optimized existing Hadoop algorithms using SparkContext, DataFrames, and HiveContext.
  • Created Spark DataFrames in Scala for all data files, which then undergo transformations.
  • Aggregated and transformed the filtered DataFrames based on business rules and saved them as temporary Hive tables for intermediate processing.
  • Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS/S3 as Parquet files.
  • Copied data from S3 to Redshift tables using shell scripts.
  • Tuned Spark jobs using broadcast joins, the correct level of parallelism, and memory tuning.
  • Analyzed and defined the client's business strategy and determined system architecture requirements to achieve business goals.
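
A minimal PySpark sketch of the window-function deduplication and Parquet persistence described above. The key column, timestamp column, and paths are assumptions for illustration.

    # Minimal sketch (column names and paths are assumptions): deduplicate
    # ingested records with a window function and persist the result as Parquet.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dedupe-ingested-data").getOrCreate()

    raw = spark.read.parquet("hdfs:///data/raw/events")        # hypothetical path

    # Keep only the latest record per business key, ranked by ingestion timestamp.
    w = Window.partitionBy("event_id").orderBy(F.col("ingest_ts").desc())
    deduped = (raw
               .withColumn("rn", F.row_number().over(w))
               .filter(F.col("rn") == 1)
               .drop("rn"))

    # Persist the cleansed data for downstream metrics.
    deduped.write.mode("overwrite").parquet("s3a://analytics-bucket/cleansed/events")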

Environment: Spark 2.2, Hadoop, Hive 2.1, HDFS, Java 1.8, Scala 2.11, HDP, AWS, Redshift, Oozie, IntelliJ, ORC, Shell Scripting, Bitbucket, Airflow, Python, PySpark

Big Data Developer

Confidential, Albany, New York

Responsibilities:

  • Involved in loading raw data from AWS S3 to Redshift.
  • Involved in developing and scheduling DAGs using Airflow.
  • Involved in implementing transformations on raw data and moving it to the cleansed layer in Parquet file format.
  • Involved in building the logic for the incremental data ingestion process.
  • Involved in data ingestion from Oracle to AWS S3.
  • Evaluated Spark JDBC (Scala/PySpark/SparkR) versus Sqoop for data ingestion from on-premise and cloud databases to AWS S3 (see the sketch following this list).
  • Created shell scripts to invoke PySpark commands.
  • Involved in creating a handshake process between Autosys and Airflow using the Autosys API in Python.
  • Involved in automating Airflow DAG code generation using Jinja templates.
  • Responsible for performing unit testing.
  • Involved in several proofs of concept for building change data capture using next-gen platform tools.
  • Created Spark RDDs for all data files, which then undergo transformations.
  • Aggregated and curated the filtered RDDs based on business rules, converted them into DataFrames, and saved them as temporary Hive tables for intermediate processing.
  • Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS as Parquet files.
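
A minimal PySpark sketch of the Spark JDBC ingestion from Oracle to S3 described above. The connection URL, credentials, table, partition column, and bucket are assumptions, and the Oracle JDBC driver (ojdbc) must be available on the Spark classpath.

    # Minimal sketch (connection details, table, and bucket are assumptions):
    # pull a source table from Oracle over Spark JDBC and land it on S3 as Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("oracle-to-s3-ingest").getOrCreate()

    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  # hypothetical
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user")
              .option("password", "****")
              .option("fetchsize", 10000)
              .load())

    # Land the raw extract in the S3 raw layer, partitioned by a load-date column
    # (LOAD_DATE is a hypothetical column in the source table).
    (orders.write
           .mode("append")
           .partitionBy("LOAD_DATE")
           .parquet("s3a://datalake-raw/sales/orders/"))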

Environment: Spark JDBC, PySpark, Airflow, AWS S3, EMR, Kerberos, Autosys, Avro, Parquet, GitHub, Python, Oracle, Sqoop, Teradata, CloudWatch, Redshift, Confluence, Agile

Java Developer

Confidential, Iselin, NJ

Responsibilities:

  • Involved in software development on web-based front-end applications.
  • Involved in developing the CSV files using the data load process.
  • Performed unit testing of the developed modules.
  • Involved in bug fixing, writing SQL queries & unit test cases.
  • Used Rational Application Developer (RAD).
  • Used Oracle as the Backend Database.
  • Involved in configuration and deployment of front-end application on RAD.
  • Involved in developing JSPs for the graphical user interface.
  • Implemented code for validating the input fields and displaying the error messages.

Environment: Java, JSP, Servlets, Apache Struts framework, WebSphere, RAD, Oracle, PVCS, TOAD
