Spark/Hadoop Developer Resume Atlanta, GA - Hire IT People

PROFESSIONAL SUMMARY:

5 years of programming experience across all phases of the Software Development Life Cycle (SDLC).
Expertise in Big Data application development, with experience in Hadoop ecosystem components such as Spark, Hive, Sqoop, Pig and Oozie.
Hands-on experience developing and debugging Spark jobs to process large datasets.
Excellent knowledge and understanding of Distributed Computing and Parallel processing frameworks.
Experience in working with Cloudera and Hortonworks Hadoop distributions.
Worked on importing and exporting data to and from HDFS and Hive using Sqoop.
Experience in creating Hive tables, loading them using Sqoop, and processing data using HiveQL.
Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
Extended Hive and Pig core functionality by writing custom UDFs for data analysis.
Good experience in job scheduling tools like Oozie.
Experience in handling Hive queries using Spark SQL, integrated with a Spark environment implemented in Scala.
Hands-on experience with different file formats such as JSON, Avro and Parquet.
Experience in converting SQL queries into Spark transformations using Spark RDDs, DataFrames and Scala, and in performing map-side joins on RDDs.
Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
Adequate knowledge of Agile and Waterfall methodologies.
Good experience working with Tableau, including enabling JDBC/ODBC connectivity from it to Hive tables.
Well versed in the UNIX and Linux command line and shell scripting.
Extensive experience in developing stored procedures, functions, triggers and complex SQL queries using Oracle PL/SQL.
Strong written and oral communication skills. Learn and adapt quickly to emerging technologies and paradigms.
Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.

TECHNICAL SKILLS:

BigData Technologies: Hadoop, MapReduce 2.0, Pig, Hive, Sqoop, Oozie, Spark, Kafka.

Databases: Oracle 11g/10g.

Cloud Platforms/Version Control: AWS/ Git.

Programming/Scripting Languages: Scala, Python, Unix shell scripting.

Operating System: Mac OS, Linux (Various Versions), Windows 2003/7/8/8.1/XP.

Development Tools: PyCharm, Eclipse, IntelliJ.

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Spark/Hadoop Developer

Responsibilities:

Designed and developed applications on the data lake to transform data according to business users' requirements for analytics.
Developed shell scripts to perform data quality validations such as record count, file name consistency and duplicate file checks, and to create Hive tables and views.
Created views that mask the PHI columns of a table so that the data in those columns cannot be seen by unauthorized teams.
Worked with the Parquet file format to get better storage and performance for publish tables.
Worked with NoSQL databases such as HBase, creating HBase tables to store the audit data of the RAWZ and APPZ tables.
Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
Developed shell scripts for performing transformation logic and loading the data from raw zone to app zone.
Responsible for developing Spark wrapper scripts using Python to perform transformations on the data.
Responsible for creating the mapping document from source fields to destination fields.
Created different data pipelines using StreamSets to land data from source to the raw zone.
Worked with different file types such as CSV, TXT and fixed-width to load data from source to RAWZ tables.
Experienced in using Kafka as a data pipeline for JSON data between source and destination.
Responsible for creating jobs using Control-M.
Responsible for production activities and production support.
Responsible for resolving the production issues.
Worked in Agile Scrum model and involved in sprint activities.
Worked with Bitbucket and Jira to deploy projects into production environments.

Environment: Apache Hive, HBase, Spark, Python, Agile, StreamSets, Bitbucket, Cloudera, Kafka, Hadoop, Shell Scripting.

Confidential, Madison, WI

Spark/Hadoop Developer

Responsibilities:

Applied several Spark APIs to perform the necessary transformations and actions on data coming from mainframe files.
Created and worked on large data frames with a schema of more than 300 columns.
Ingested data into Amazon S3 using Sqoop and applied data transformations using Python.
Developed UDFs as needed for use in Pig and Hive queries.
Created Hive tables, and loaded and analyzed data using Hive scripts. Implemented partitioning and dynamic partitions in Hive.
Deployed and analyzed large chunks of data using Hive as well as HBase.
Worked on querying data using Spark SQL on top of the Spark engine.
Used Amazon EMR to run PySpark jobs in the cloud.
Created HBase tables as a centralized PIT table that stores the information from the remaining tables and is used to incrementally load data into the Hive tables.
Created Hive tables to store various data formats of PII data coming from the raw Hive tables.
Developed Sqoop jobs to import/export data from RDBMS to S3 data store.
Fine-tuned PySpark applications/jobs to improve the efficiency and overall processing time of the pipelines.
Knowledge of writing Hive queries and running scripts in Tez mode to improve performance on Hortonworks Data Platform.
Worked on a 10-node cluster in AWS for Dev and QA environments.
Used Bitbucket for version control.

Environment: Amazon EMR, Amazon S3, Apache Hive, Sqoop, Spark, Python, Agile, PyCharm, Bitbucket, Hortonworks.

Confidential, Bothell, WA

Spark/Hadoop Developer

Responsibilities:

Used different Scala APIs to perform the necessary transformations and actions on data arriving in batches from different sources.
Applied various parsing techniques using Spark APIs to cleanse the data from Kafka.
Experienced in working with Spark SQL on different file formats like Avro and Parquet.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
Set up Hive to run on Spark and analyzed the data using Spark SQL queries.
Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
Implemented incremental imports of analyzed data into MySQL tables using Sqoop.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
Implemented workflows using the Apache Oozie framework to automate tasks.
Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.

Environment: Hadoop, HDFS, Apache Hive, Sqoop, Apache Spark, Scala, Shell Scripting, Agile, Oracle, Cloudera.

Confidential

ETL Developer

Responsibilities:

Gathered requirements by interacting with the client/onsite team to ensure a clear understanding of them.
Participated in defining and implementing project-level standards and guidelines, and ensured adherence to enterprise-level policies.
Extracted data from various sources across the organization (Oracle, SQL Server and flat files) and loaded it into the staging area.
Used techniques like source query tuning, single pass reading and caching lookups to achieve optimized performance in the existing sessions.
Developed test cases and tested the reports.
Created and scheduled sessions and batch processes to run on demand, on schedule, or only once using Informatica Workflow Manager, and monitored the data loads using the Workflow Monitor.
Developed various daily and monthly ETL load jobs using Control-M and modified the existing Control-M jobs based on business requirements.
Worked with the testing team to define a robust test plan and supported them during functional testing of the application.
Contributed to performance tuning and volume testing of the application.
Performed impact analysis for change requests.
Reviewed and deployed the code.
Fixed the UAT defects raised by the testing team within the timelines.
Tracked and reported the status of the project at frequent intervals.

Environment: Informatica PowerCenter 9.x, Oracle 10g, SQL, UNIX, Control-M, Waterfall methodology.
