Big Data Developer Resume
Dallas, TX
SUMMARY
- 8+ years of IT experience in Big Data (Hadoop/Spark) and J2EE, including requirements analysis, design, development, implementation, support, maintenance, and enhancements in the Finance and Insurance domains.
- 6+ years of experience as a Hadoop/Spark Developer with good knowledge of Java MapReduce, Hive, Pig Latin, Scala, and Spark.
- Organized data into tables, performed transformations, and simplified complex queries with Hive.
- Performed real-time interactive analysis on massive data sets stored in HDFS.
- Strong knowledge of and experience with Hadoop architecture and its components, including HDFS, YARN, Pig, Hive, Sqoop, Oozie, Flume, Spark, Kafka, and the MapReduce programming paradigm.
- Developed numerous MapReduce programs.
- Experience in analyzing data using Spark SQL, HiveQL, and Pig Latin, and in developing custom UDFs for Pig and Hive.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Good knowledge of job scheduling tools such as Oozie.
- Experienced with IDE tools such as Eclipse 3.x and IBM RAD 7.0.
- Experience in requirement gathering, analysis, planning, designing, coding and unit testing.
- Strong work ethic with desire to succeed and make significant contributions to the organization.
- Strong problem-solving skills, good communication, interpersonal skills and a good team player.
- Motivated to take on independent responsibility as well as to contribute as a productive team member.
TECHNICAL SKILLS
Hadoop Technologies: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Avro, Pig Latin, Hue, CDH, Parquet, Scala, Spark, Python, AWS, S3, EMR, Apache NiFi, Lambda, Glue
NoSQL: HBase, DynamoDB
IDE/Tools: Eclipse, IntelliJ
Web and Application Servers: WebSphere, JBoss, Tomcat
Core Competency Technologies: Java 5/6/7, OOP, JSP, Servlets, JDBC, C/C++, shell scripting, Spark, SAS EG, Scala, Spark Streaming, Kafka
Testing & Issue Tracking Tools: JUnit 4, Bugzilla, HP Quality Center
SCM/Version Control Tools: PVCS, CVS, Subversion, Bitbucket, Git
Build and continuous Integration: Maven, SBT
Databases: Oracle 8i/9i/10g, DB2, MySQL 4.x/5.x
OS: UNIX, Linux, Windows
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential, Dallas
Responsibilities:
- Requirement analysis and mapping document creation.
- Involved in requirements gathering, analysis, design, development and test.
- ETL mapping, job development and complex transformations.
- Analyzed the systems and met with end users and business teams to define the requirements.
- Developed Spark applications in Java and implemented an Apache Spark data processing project to handle the data.
- Wrote Java programs to retrieve data from HDFS and provide it to REST services.
- Used Maven to build JAR files for MapReduce programs and deployed them to the cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Wrote multiple Java programs for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Led the offshore team and coordinated with the onsite team.
- Created a coding architecture that leverages a reusable framework.
- Provided weekly status updates to senior management.
- Ensured on-time delivery of projects to meet client needs.
- Conducted unit, system, and performance testing.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch below).
- Used Lambda functions for infrastructure automation, such as automated EC2 snapshot creation.
Environment: Java, Spark, SQL, Hive, PySpark, Bitbucket, Hadoop, HDFS, Lambda
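A minimal sketch of a Glue ETL job of the kind described above, loading campaign files from S3 into Redshift. The bucket, table, and connection names are hypothetical placeholders, not the actual project values.

```python
# Sketch only: reads campaign Parquet files from S3 and loads them into Redshift
# through a Glue JDBC connection. All names below are illustrative assumptions.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read campaign files landed in S3 as Parquet.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-campaign-bucket/raw/"]},
    format="parquet",
)

# Rename/cast columns before loading.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("campaign_id", "string", "campaign_id", "string"),
              ("spend", "double", "spend", "double")],
)

# Write to Redshift via a pre-defined Glue catalog connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "campaign_facts", "database": "analytics"},
    redshift_tmp_dir="s3://example-campaign-bucket/tmp/",
)
job.commit()
```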
Big Data Developer
Confidential, Portland, OR
Responsibilities:
- Member of the Confidential Consumer Data Engineering (CDe) team, responsible for building the data pipelines that ingest Confidential consumer data.
- Designed and developed cloud solutions for data and analytical workloads such as warehouses, big data, data lakes, real-time streaming, and advanced analytics.
- Solicited detailed requirements, developed designs with input from the Sr. Data Architect, and wrote code consistent with existing practice patterns and standards.
- Responsible for migrating processes from Cloudera Hadoop to AWS NGAP 2.0 (Confidential Global Analytics Platform).
- Designed and developed Airflow DAGs to validate upstream files and to source and load data into Hive tables built on AWS S3 (see the sketch below).
- Used Airflow to schedule Spark jobs.
- Migrated existing Hive scripts to PySpark and optimized the process.
- Used AWS Glue for data transformation, validation, and cleansing.
- Created filters, groups, and sets in Tableau reports.
- Migrated Reports and Dashboards from Cognos to Tableau and Power BI.
Environment: AWS (S3, EC2, EMR, Athena), Snowflake, Airflow, Glue, PySpark, Tableau, Power BI, Hive and YARN.
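A minimal Airflow DAG sketch illustrating the pattern above: wait for an upstream file to land in S3, then submit the PySpark load job. The bucket, key, script path, and provider import paths (which vary by Airflow version) are assumptions for illustration.

```python
# Sketch only: sensor-then-spark-submit DAG; names and paths are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="consumer_daily_load",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Validate that the upstream file has landed before loading.
    wait_for_file = S3KeySensor(
        task_id="wait_for_upstream_file",
        bucket_name="example-consumer-bucket",
        bucket_key="landing/{{ ds }}/consumer.parquet",
        poke_interval=300,
        timeout=6 * 60 * 60,
    )

    # Run the PySpark job that loads the file into a Hive table backed by S3.
    load_to_hive = SparkSubmitOperator(
        task_id="load_consumer_to_hive",
        application="s3://example-consumer-bucket/scripts/load_consumer.py",
        conn_id="spark_default",
    )

    wait_for_file >> load_to_hive
```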
Big Data Developer
Confidential, McLean, VA
Responsibilities:
- Used NiFi as a dataflow automation tool to ingest data into HDFS from different source systems.
- Developed common methods to bulk load raw HDFS files into DataFrames.
- Developed common methods to persist DataFrames to S3, Redshift, HDFS, and Hive.
- Pruned the ingested data to remove duplicates by applying window functions and performed complex transformations to derive various metrics (see the sketch below).
- Used the Oozie scheduler to trigger Spark jobs.
- Created UDFs in Spark for use in Spark SQL.
- Used the Spark API to perform analytics on data in Hive using Scala.
- Optimized existing algorithms in Hadoop using SparkContext, DataFrames, and HiveContext.
- Created Spark DataFrames in Scala for all the data files, which then undergo transformations.
- Aggregated and transformed the filtered DataFrames based on business rules and saved them as temporary Hive tables for intermediate processing.
- Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS/S3 as Parquet files.
- Copied data from S3 to Redshift tables using shell scripts.
- Performed performance tuning of Spark jobs using broadcast joins, the correct level of parallelism, and memory tuning.
- Analyzed and defined the client's business strategy and determined system architecture requirements to achieve business goals.
Environment: Spark 2.2, Hadoop, Hive 2.1, HDFS, Java 1.8, Scala 2.11, HDP, AWS, Redshift, Oozie, IntelliJ, ORC, Shell Scripting, Bitbucket, Airflow, Python, PySpark
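A minimal PySpark sketch of the window-function deduplication pattern mentioned above (the project work was largely in Scala, but the same pattern applies). Column names and paths are illustrative assumptions.

```python
# Sketch only: keep the latest record per business key, then persist as Parquet.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedup_ingested_data").getOrCreate()

raw = spark.read.parquet("hdfs:///data/raw/transactions/")

# Rank records within each key by event time, newest first, and keep rank 1.
w = Window.partitionBy("transaction_id").orderBy(F.col("event_ts").desc())
deduped = (
    raw.withColumn("rn", F.row_number().over(w))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

# Store the cleaned data in S3 as Parquet for downstream metric derivation.
deduped.write.mode("overwrite").parquet("s3a://example-bucket/cleansed/transactions/")
```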
Big Data Developer
Confidential, Albany, New York
Responsibilities:
- Involved in loading raw data from AWS S3 to Redshift.
- Involved in developing and scheduling DAGs using Airflow.
- Involved in implementing transformations on raw data and moving the data to the cleansed layer in Parquet file format.
- Involved in building logic for the incremental data ingestion process.
- Involved in data ingestion from Oracle to AWS S3.
- Evaluated Spark JDBC (Scala/PySpark/SparkR) vs. Sqoop for data ingestion from on-premise/on-cloud databases to AWS S3 (see the sketch below).
- Created shell scripts to invoke PySpark commands.
- Involved in creating a handshake process between Autosys and Airflow using the Autosys API in Python.
- Involved in code automation for DAGs in Airflow using Jinja templates.
- Responsible for performing unit testing.
- Involved in performing several proofs of concept for building change data capture using next-gen platform tools.
- Created Spark RDDs for all the data files, which then undergo transformations.
- Aggregated and curated the filtered RDDs based on business rules, converted them into DataFrames, and saved them as temporary Hive tables for intermediate processing.
- Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS as Parquet files.
Environment: Spark JDBC, PySpark, Airflow, AWS S3, EMR, Kerberos, Autosys, Avro, Parquet, GitHub, Python, Oracle, Sqoop, Teradata, CloudWatch, Redshift, Confluence, Agile
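A small sketch of the Spark JDBC ingestion pattern evaluated above (Oracle to S3). The connection URL, credentials, table, and partitioning bounds are hypothetical placeholders.

```python
# Sketch only: parallel JDBC extract from Oracle, landed in S3 as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle_to_s3_ingest").getOrCreate()

# Read the source table over JDBC; partition options parallelize the extract.
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//example-host:1521/ORCLPDB")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("partitionColumn", "ORDER_ID")
    .option("lowerBound", "1")
    .option("upperBound", "1000000")
    .option("numPartitions", "8")
    .load()
)

# Land the raw extract in S3 for the cleansed-layer transformations.
src.write.mode("overwrite").parquet("s3a://example-raw-bucket/oracle/orders/")
```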
Java Developer
Confidential, Iselin, NJ
Responsibilities:
- Involved in software development on web-based front-end applications.
- Involved in developing the CSV files used by the data load process.
- Performed unit testing of the developed modules.
- Involved in bug fixing, writing SQL queries & unit test cases.
- Used Rational Application Developer (RAD).
- Used Oracle as the Backend Database.
- Involved in configuration and deployment of front-end application on RAD.
- Involved in developing JSPs for the graphical user interface.
- Implemented code for validating the input fields and displaying the error messages.
Environment: Java, JSP, Servlets, Apache Struts framework, WebSphere, RAD, Oracle, PVCS, TOAD