Spark/Hadoop Developer Resume

Atlanta, GA

PROFESSIONAL SUMMARY:

  • 5 years of programming experience across all phases of the Software Development Life Cycle (SDLC).
  • Expertise in Big Data application development, with experience in Hadoop ecosystem components such as Spark, Hive, Sqoop, Pig and Oozie.
  • Hands-on experience developing and debugging Spark jobs that process large datasets.
  • Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
  • Experience working with Cloudera and Hortonworks Hadoop distributions.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Experience creating Hive tables, loading them using Sqoop and processing the data with HiveQL.
  • Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Extended Hive and Pig core functionality by writing custom UDFs for data analysis.
  • Good experience in job scheduling tools like Oozie.
  • Experience running Hive queries through Spark SQL, integrated with a Spark environment implemented in Scala.
  • Hands-on experience with file formats such as JSON, Avro and Parquet.
  • Experience converting SQL queries into Spark transformations using Spark RDDs, DataFrames and Scala, including map-side joins on RDDs.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using the Apache and Cloudera distributions.
  • Adequate knowledge of Agile and Waterfall methodologies.
  • Good experience working with Tableau, including enabling JDBC/ODBC connectivity from it to Hive tables.
  • Well versed in the UNIX and Linux command line and shell scripting.
  • Extensive experience developing stored procedures, functions, triggers and complex SQL queries using Oracle PL/SQL.
  • Strong written and oral communication skills; quick to learn and adapt to emerging technologies and paradigms.
  • Highly motivated, able to work independently or as an integral part of a team, and committed to the highest levels of professionalism.

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, MapReduce 2.0, Pig, Hive, Sqoop, Oozie, Spark, Kafka.

Databases: Oracle 11g/10g.

Cloud Platforms/Version Control: AWS/ Git.

Programming/Scripting Languages: Scala, Python, Unix shell.

Operating System: Mac OS, Linux (Various Versions), Windows 2003/7/8/8.1/XP.

Development Tools: PyCharm, Eclipse, IntelliJ.

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Spark/Hadoop Developer

Responsibilities:

  • Designed and developed applications on the data lake that transform data according to business users' needs for analytics.
  • Developed shell scripts to perform data quality validations (record count, file name consistency, duplicate file checks) and to create Hive tables and views.
  • Created views that mask the PHI columns of a table so that unauthorized teams cannot see the data in those columns.
  • Used the Parquet file format to improve storage and performance for publish tables.
  • Worked with NoSQL databases such as HBase, creating HBase tables to store audit data for the RAWZ and APPZ tables.
  • Managed data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Developed shell scripts to apply transformation logic and load data from the raw zone to the app zone.
  • Responsible for developing Spark wrapper scripts in Python to perform transformations on the data.
  • Responsible for creating the mapping document from source fields to destination fields.
  • Created data pipelines using StreamSets to land data from the source in the raw zone.
  • Worked with CSV, TXT and fixed-width files to load data from the source into RAWZ tables.
  • Used Kafka as a data pipeline for JSON data between source and destination.
  • Responsible for creating jobs using Control-M.
  • Responsible for production activities and production support.
  • Responsible for resolving the production issues.
  • Worked in an Agile Scrum model and took part in sprint activities.
  • Worked with Bitbucket and Jira to deploy projects into production environments.

Environment: Apache Hive, HBase, Spark, Python, Agile, StreamSets, Bitbucket, Cloudera, Kafka, Hadoop, Shell Scripting.

Confidential, Madison, WI

Spark/Hadoop Developer

Responsibilities:

  • Applied several Spark APIs to perform the necessary transformations and actions on data that came from mainframe files.
  • Created and worked on large data frames with a schema of more than 300 columns.
  • Ingested data into Amazon S3 using Sqoop and applied data transformations using Python.
  • Developed UDFs where necessary for use in Pig and Hive queries.
  • Created Hive tables and loaded and analyzed data using Hive scripts. Implemented partitioning and dynamic partitions in Hive.
  • Deployed and analyzed large volumes of data using Hive as well as HBase.
  • Queried data using Spark SQL on top of the Spark engine.
  • Used Amazon EMR to run PySpark jobs in the cloud.
  • Created HBase tables to serve as a centralized PIT table that stores all the information from the remaining tables and is used to incrementally load data into the Hive tables.
  • Created Hive tables to store various data formats of PII data coming from the raw Hive tables.
  • Developed Sqoop jobs to import/export data between RDBMS and the S3 data store.
  • Fine-tuned PySpark applications/jobs to improve the efficiency and overall processing time of the pipelines.
  • Knowledge of writing Hive queries and running scripts in Tez mode to improve performance on Hortonworks Data Platform.
  • Worked on a 10-node cluster in AWS for Dev and QA environments.
  • Used Bitbucket for version control.

Environment: Amazon EMR, Amazon S3, Apache Hive, Sqoop, Spark, Python, Agile, PyCharm, Bitbucket, Hortonworks.

Confidential, Bothell, WA

Spark/Hadoop Developer

Responsibilities:

  • Used different Scala APIs to perform the necessary transformations and actions on data that arrived in batches from different sources.
  • Applied various parsing techniques using Spark APIs to cleanse data from Kafka.
  • Experienced in working with Spark SQL on different file formats like Avro and Parquet.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory tuning.
  • Set up Hive to run on Spark and analyzed the data using Spark SQL queries.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Implemented incremental imports of analyzed data into MySQL tables using Sqoop.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop, via staging tables.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.

Environment: Hadoop, HDFS, Apache Hive, Sqoop, Apache Spark, Scala, Shell Scripting, Agile, Oracle, Cloudera.

Confidential

ETL Developer

Responsibilities:

  • Gathered requirements, interacting with the client/onsite team to ensure a clear understanding of them.
  • Participated in defining and implementing project-level standards and guidelines and ensured adherence to enterprise-level policies.
  • Extracted data from various sources across the organization (Oracle, SQL Server and flat files) and loaded it into the staging area.
  • Used techniques like source query tuning, single pass reading and caching lookups to achieve optimized performance in the existing sessions.
  • Developed test cases and tested the reports.
  • Created and scheduled sessions and batch processes to run on demand, at a scheduled time or only once using Informatica Workflow Manager, and monitored the data loads using the Workflow Monitor.
  • Developed various daily and monthly ETL load jobs using Control-M and modified existing Control-M jobs based on business requirements.
  • Worked with the testing team to define a robust test plan and supported them during functional testing of the application.
  • Contributed to performance tuning and volume testing of the application.
  • Performed impact analysis for change requests.
  • Reviewed and deployed the code.
  • Fixed UAT defects raised by the testing team within the timelines.
  • Tracked and reported the status of the project at frequent intervals.

Environment: Informatica PowerCenter 9.x, Oracle 10g, SQL, UNIX, Control-M, Waterfall methodology.
