Big Data Developer Resume
Dallas, TX
SUMMARY
- 8+ years of IT experience in Big Data (Hadoop/Spark) and J2EE, including requirements analysis, design, development, implementation, support, maintenance, and enhancements in the Finance and Insurance domains.
- 6+ years of experience as a Hadoop/Spark Developer with good knowledge of Java MapReduce, Hive, Pig Latin, Scala, and Spark.
- Organized data into tables, performed transformations, and simplified complex queries with Hive.
- Performed real-time interactive analysis on massive data sets stored in HDFS.
- Strong knowledge of and experience with Hadoop architecture and its components, including HDFS, YARN, Pig, Hive, Sqoop, Oozie, Flume, Spark, Kafka, and the MapReduce programming paradigm.
- Developed numerous MapReduce programs.
- Experience in analyzing data using Spark SQL, HiveQL, and Pig Latin, and in developing custom UDFs for Pig and Hive.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Good knowledge of job scheduling tools such as Oozie.
- Experienced with IDE tools such as Eclipse 3.x and IBM RAD 7.0.
- Experience in requirement gathering, analysis, planning, designing, coding and unit testing.
- Strong work ethic with desire to succeed and make significant contributions to the organization.
- Strong problem-solving skills, good communication, interpersonal skills and a good team player.
- Motivated to take on independent responsibility as well as to contribute as a productive team member.
TECHNICAL SKILLS
Hadoop Technologies: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Avro, Pig Latin, Hue, CDH, Parquet, Scala, Spark, Python, AWS, S3, EMR, Apache NiFi, Lambda, Glue
NoSQL: HBase, DynamoDB
IDE/Tools: Eclipse, IntelliJ
Web and Application Servers: WebSphere, JBoss, Tomcat
Core Competency Technologies: Java 5/6/7, OOP, JSP, Servlets, JDBC, C/C++, shell scripting, Spark, SAS EG, Scala, Spark Streaming, Kafka
Testing & Issue Tracking Tools: JUnit 4, Bugzilla, HP Quality Center
SCM/Version Control Tools: PVCS, CVS, Subversion, Bitbucket, Git
Build and continuous Integration: Maven, SBT
Databases: Oracle 8i/9i/10g, DB2, MySQL 4.x/5.x
OS: UNIX, Linux, Windows
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential, Dallas
Responsibilities:
- Requirement analysis and mapping document creation.
- Involved in requirements gathering, analysis, design, development and test.
- ETL mapping, job development and complex transformations.
- Analyzed the systems and met with end users and business teams to define the requirements.
- Developed Spark applications in Java and implemented an Apache Spark data processing project to handle the data.
- Wrote Java programs to retrieve data from HDFS and provide it to REST services.
- Used Maven to build JAR files for MapReduce programs and deployed them to the cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Wrote multiple Java programs for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Led the offshore team and coordinated with the onsite team.
- Created a coding architecture that leverages a reusable framework.
- Provided weekly status updates to senior management.
- Ensured on-time delivery of projects to meet client needs.
- Conducted unit, system, and performance testing.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch below).
- Used Lambda functions for infrastructure automation, such as automated EC2 snapshot creation.
Environment: Java, Spark, SQL, Hive, PySpark, Bitbucket, Hadoop, HDFS, Lambda
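A minimal sketch of a Glue ETL job of the kind described above, loading campaign files from S3 into Redshift. The bucket, table, and connection names are hypothetical placeholders, not the actual project values.

```python
# Sketch only: reads campaign Parquet files from S3 and loads them into Redshift
# through a Glue JDBC connection. All names below are illustrative assumptions.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read campaign files landed in S3 as Parquet.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-campaign-bucket/raw/"]},
    format="parquet",
)

# Rename/cast columns before loading.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("campaign_id", "string", "campaign_id", "string"),
              ("spend", "double", "spend", "double")],
)

# Write to Redshift via a pre-defined Glue catalog connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "campaign_facts", "database": "analytics"},
    redshift_tmp_dir="s3://example-campaign-bucket/tmp/",
)
job.commit()
```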
Big Data Developer
Confidential, Portland, OR
Responsibilities:
- Member of the Confidential Consumer Data Engineering (CDe) team, responsible for building the data pipelines that ingest Confidential consumer data.
- Designed and developed cloud solutions for data and analytical workloads such as warehouses, big data, data lakes, real-time streaming, and advanced analytics.
- Solicited detailed requirements, developed designs with input from the Sr. Data Architect, and wrote code consistent with existing practice patterns and standards.
- Responsible for migrating processes from Cloudera Hadoop to AWS NGAP 2.0 (Confidential Global Analytics Platform).
- Designed and developed Airflow DAGs to validate upstream files and to source and load data into Hive tables built on AWS S3 (see the sketch below).
- Used Airflow to schedule Spark jobs.
- Migrated existing Hive scripts to PySpark and optimized the process.
- Used AWS Glue for data transformation, validation, and cleansing.
- Created filters, groups, and sets in Tableau reports.
- Migrated Reports and Dashboards from Cognos to Tableau and Power BI.
Environment: AWS (S3, EC2, EMR, Athena), Snowflake, Airflow, Glue, PySpark, Tableau, Power BI, Hive and YARN.
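A minimal Airflow DAG sketch illustrating the pattern above: wait for an upstream file to land in S3, then submit the PySpark load job. The bucket, key, script path, and provider import paths (which vary by Airflow version) are assumptions for illustration.

```python
# Sketch only: sensor-then-spark-submit DAG; names and paths are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="consumer_daily_load",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Validate that the upstream file has landed before loading.
    wait_for_file = S3KeySensor(
        task_id="wait_for_upstream_file",
        bucket_name="example-consumer-bucket",
        bucket_key="landing/{{ ds }}/consumer.parquet",
        poke_interval=300,
        timeout=6 * 60 * 60,
    )

    # Run the PySpark job that loads the file into a Hive table backed by S3.
    load_to_hive = SparkSubmitOperator(
        task_id="load_consumer_to_hive",
        application="s3://example-consumer-bucket/scripts/load_consumer.py",
        conn_id="spark_default",
    )

    wait_for_file >> load_to_hive
```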
Big Data Developer
Confidential, McLean, VA
Responsibilities:
- Used NiFi as a dataflow automation tool to ingest data into HDFS from different source systems.
- Developed common methods to bulk load raw HDFS files into DataFrames.
- Developed common methods to persist DataFrames to S3, Redshift, HDFS, and Hive.
- Pruned the ingested data to remove duplicates by applying window functions and performed complex transformations to derive various metrics (see the sketch below).
- Used the Oozie scheduler to trigger Spark jobs.
- Created UDFs in Spark for use in Spark SQL.
- Used the Spark API to perform analytics on data in Hive using Scala.
- Optimized existing algorithms in Hadoop using SparkContext, DataFrames, and HiveContext.
- Created Spark DataFrames in Scala for all the data files, which then undergo transformations.
- Aggregated and transformed the filtered DataFrames based on business rules and saved them as temporary Hive tables for intermediate processing.
- Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS/S3 as Parquet files.
- Copied data from S3 to Redshift tables using shell scripts.
- Performed performance tuning of Spark jobs using broadcast joins, the correct level of parallelism, and memory tuning.
- Analyzed and defined the client's business strategy and determined system architecture requirements to achieve business goals.
Environment: Spark 2.2, Hadoop, Hive 2.1, HDFS, Java 1.8, Scala 2.11, HDP, AWS, Redshift, Oozie, IntelliJ, ORC, Shell Scripting, Bitbucket, Airflow, Python, PySpark
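A minimal PySpark sketch of the window-function deduplication pattern mentioned above (the project work was largely in Scala, but the same pattern applies). Column names and paths are illustrative assumptions.

```python
# Sketch only: keep the latest record per business key, then persist as Parquet.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedup_ingested_data").getOrCreate()

raw = spark.read.parquet("hdfs:///data/raw/transactions/")

# Rank records within each key by event time, newest first, and keep rank 1.
w = Window.partitionBy("transaction_id").orderBy(F.col("event_ts").desc())
deduped = (
    raw.withColumn("rn", F.row_number().over(w))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

# Store the cleaned data in S3 as Parquet for downstream metric derivation.
deduped.write.mode("overwrite").parquet("s3a://example-bucket/cleansed/transactions/")
```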
Big Data Developer
Confidential, Albany, New York
Responsibilities:
- Involved in loading raw data from AWS S3 to Redshift.
- Involved in developing and scheduling DAGs using Airflow.
- Involved in implementing transformations on raw data and moving the data to the cleansed layer in Parquet file format.
- Involved in building logic for the incremental data ingestion process.
- Involved in data ingestion from Oracle to AWS S3.
- Evaluated Spark JDBC (Scala/PySpark/SparkR) vs. Sqoop for data ingestion from on-premise/on-cloud databases to AWS S3 (see the sketch below).
- Created shell scripts to invoke PySpark commands.
- Involved in creating a handshake process between Autosys and Airflow using the Autosys API in Python.
- Involved in code automation for DAGs in Airflow using Jinja templates.
- Responsible for performing unit testing.
- Involved in performing several proofs of concept for building change data capture using next-gen platform tools.
- Created Spark RDDs for all the data files, which then undergo transformations.
- Aggregated and curated the filtered RDDs based on business rules, converted them into DataFrames, and saved them as temporary Hive tables for intermediate processing.
- Applied various transformations and actions to the RDDs and DataFrames and stored the results in HDFS as Parquet files.
Environment: Spark JDBC, PySpark, Airflow, AWS S3, EMR, Kerberos, Autosys, Avro, Parquet, GitHub, Python, Oracle, Sqoop, Teradata, CloudWatch, Redshift, Confluence, Agile
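A small sketch of the Spark JDBC ingestion pattern evaluated above (Oracle to S3). The connection URL, credentials, table, and partitioning bounds are hypothetical placeholders.

```python
# Sketch only: parallel JDBC extract from Oracle, landed in S3 as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle_to_s3_ingest").getOrCreate()

# Read the source table over JDBC; partition options parallelize the extract.
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//example-host:1521/ORCLPDB")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("partitionColumn", "ORDER_ID")
    .option("lowerBound", "1")
    .option("upperBound", "1000000")
    .option("numPartitions", "8")
    .load()
)

# Land the raw extract in S3 for the cleansed-layer transformations.
src.write.mode("overwrite").parquet("s3a://example-raw-bucket/oracle/orders/")
```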
Java Developer
Confidential, Iselin, NJ
Responsibilities:
- Involved in software development on web-based front-end applications.
- Involved in developing the CSV files used by the data load process.
- Performed unit testing of the developed modules.
- Involved in bug fixing, writing SQL queries & unit test cases.
- Used Rational Application Developer (RAD).
- Used Oracle as the Backend Database.
- Involved in configuration and deployment of front-end application on RAD.
- Involved in developing JSPs for the graphical user interface.
- Implemented code for validating the input fields and displaying the error messages.
Environment: Java, JSP, Servlets, Apache Struts framework, WebSphere, RAD, Oracle, PVCS, TOAD