Hadoop/Spark Developer Resume
Charlotte, NC
SUMMARY:
- Spark/Hadoop developer with 7+ years of professional IT experience, including 4 years of Big Data experience with Hadoop ecosystem components covering data ingestion, data modeling, querying, processing, storage, analysis, data integration, and implementation of enterprise-level systems that transform Big Data.
- Strong hands-on experience in data extraction, transformation, loading, data analysis, and data visualization utilizing the Cloudera platform (Spark, Scala, HDFS, Hive, Sqoop, Oozie).
- Developed end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities as per requirements.
- Experience developing and implementing Spark programs in Scala on Hadoop to work with structured and semi-structured data.
- Experience in extracting data from heterogeneous sources like flat files, MySQL, and Teradata into HDFS using Sqoop, and vice versa.
- Broad experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries and incorporating complex UDFs into business logic (a brief sketch follows this summary).
- Experience working with Text, Sequence files, XML, Parquet, JSON, ORC, and Avro file formats as well as clickstream log files.
- Experience in using Oozie schedulers and Unix scripting to implement cron jobs that execute different kinds of Hadoop actions.
- Good experience in optimization/performance tuning of Spark Jobs & Hive Queries.
- Familiar with data architecture including data ingestion pipeline design, Hadoop architecture, data modeling, data mining, and advanced data processing. Experienced in optimizing ETL workflows.
- Excellent understanding of Spark architecture and framework: SparkContext, APIs, RDDs, Spark SQL, DataFrames, Streaming, and MLlib.
- Solid understanding and experience with Extract, Transform, Load (ETL) methodologies.
- Good understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, Secondary NameNode, and DataNode, the YARN architecture and its daemons (NodeManager, ResourceManager, ApplicationMaster), and the MapReduce programming paradigm.
- Hands on experience in using the Hue browser for interacting with Hadoop components.
- Good understanding of and experience with the Agile and Waterfall methodologies of the Software Development Life Cycle (SDLC).
- Familiar with Jenkins (CI).
- Highly motivated self-learner with a positive attitude and a willingness to learn new concepts and accept challenges.
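The following is a minimal Scala sketch of the Spark SQL/DataFrame cleansing and UDF work referenced in this summary; the paths, column names, and normalization rule are illustrative assumptions rather than details from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CleanseAndSummarize {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CleanseAndSummarize")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical UDF standing in for a business-specific normalization rule.
    val normalizeState = udf((s: String) => Option(s).map(_.trim.toUpperCase).orNull)

    // Read raw records, validate, apply the UDF, and summarize.
    val raw = spark.read.option("header", "true").csv("/data/raw/customers")   // hypothetical path
    val cleansed = raw
      .filter($"customer_id".isNotNull)                                        // basic validation
      .withColumn("state", normalizeState($"state"))

    val summary = cleansed.groupBy($"state").agg(count("*").as("customer_cnt"))
    summary.write.mode("overwrite").parquet("/data/curated/customer_summary")  // hypothetical path

    spark.stop()
  }
}
```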
TECHNICAL SKILLS:
Cloud/Big Data Frameworks: Hadoop, Spark, HDFS, Yarn, Hive, Sqoop, Pig, Impala on CDH 5.x
Databases: MySQL, Oracle, DB2, Teradata
Programming Languages: Python, Java, Scala, PL/SQL, MapReduce, Bash Scripting, C
Workflow Scheduler: Oozie, Autosys
File Formats: CSV, JSON, Parquet, Avro, Sequence, ORC
Operating Systems: Windows 7/8/10, Linux, Mac
Others: Eclipse, IntelliJ, PuTTY, Microsoft Suite
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop/ Spark Developer
Responsibilities:
- Developed and maintained the ingestion process to ingest data from various source databases into Hadoop using Sqoop.
- Created Hive tables for loading and analyzing data, implemented partitions and buckets, and developed Hive queries to process the data and generate data cubes for visualization.
- Loaded data into Spark RDDs and performed in-memory computation to generate output as per requirements.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Worked with the Spark SQL context to create DataFrames and filter input data for model execution.
- Worked on different file formats such as Text, Sequence files, Avro, Parquet, ORC, JSON, XML, and flat files (see the format-conversion sketch at the end of this list).
- Developed Spark jobs, Hive jobs to summarize and transform data.
- Developed a daily process for incremental imports of data from Oracle, DB2, and Teradata into Hive tables using Sqoop.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used HiveQL to query data in Hive tables and loaded data into HBase tables.
- Extensively worked with partitions, dynamic partitioning, and bucketing of tables in Hive; designed both managed and external tables and optimized Hive queries (a partitioning sketch follows this project's environment line).
- Designed Oozie workflows for job scheduling and batch processing.
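Relating to the file-format work listed above, a minimal Scala sketch of converting between source and columnar formats; the paths are hypothetical, and Avro would additionally require the spark-avro package on the classpath.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("FormatConversion").enableHiveSupport().getOrCreate()

// JSON clickstream logs -> columnar Parquet for analytical queries.
val clicks = spark.read.json("/data/raw/clickstream/")          // hypothetical landing path
clicks.write.mode("overwrite").parquet("/data/curated/clickstream_parquet/")

// Delimited text -> ORC, inferring the schema from a header row.
val trades = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/raw/trades/")
trades.write.mode("overwrite").orc("/data/curated/trades_orc/")

// Avro example (requires the spark-avro package), shown for completeness:
// spark.read.format("com.databricks.spark.avro").load("/data/raw/events_avro/")
```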
Environment: CDH 5.8.2, Scala 2.10.5, Spark 1.6.0, SQL, Hive, Python 2.6, HBase, Java 1.8, Oozie, ETL, MapReduce, AWS S3, Tableau.
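As a sketch of the partitioning work described above, the HiveQL below is issued through Spark so the example stays in Scala; the database, columns, and paths are hypothetical, and a bucketed variant would add a CLUSTERED BY ... INTO n BUCKETS clause to the DDL.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HivePartitioning")
  .enableHiveSupport()
  .getOrCreate()

// Partitioned external table; dropping it leaves the HDFS files in place.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
    txn_id     STRING,
    account_id STRING,
    amount     DOUBLE
  )
  PARTITIONED BY (txn_date STRING)
  STORED AS ORC
  LOCATION '/data/warehouse/sales/transactions'
""")

// Allow Hive-style dynamic partitioning, then load from a staging table.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("""
  INSERT OVERWRITE TABLE sales.transactions PARTITION (txn_date)
  SELECT txn_id, account_id, amount, txn_date
  FROM sales.transactions_staging
""")
```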
Confidential, Columbus, OH
Hadoop/Spark Developer
Responsibilities:
- Involved in requirement gathering and analysis of process metrics with business users.
- Worked on developing solutions for ingesting data from various sources and processing it using Big Data technologies such as Hive, Spark, Pig, Sqoop, HBase, and MapReduce.
- Imported data from different source systems into HDFS using Sqoop and performed data cleansing and transformation using Hive.
- Tuned Spark applications to improve performance (a brief tuning sketch follows this project's environment line).
- Worked with experimental Spark APIs such as SparkContext, Spark SQL, Spark Streaming, and Spark DataFrames to better optimize existing algorithms.
- Troubleshot problems occurring during data migration and effectively resolved production issues.
- Involved in creating Hive tables, loading data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Worked on importing data from AWS S3 using Spark API.
- Integrated data from different file formats such as CSV and JSON (see the S3 ingestion sketch at the end of this list).
- Optimized queries using dynamic partitioning and bucketing in Hive.
- Distributed data from Hive to the Teradata processing system using the Hive ODBC connector.
- Worked with the Autosys workflow scheduler to schedule different types of jobs for executing and deploying applications.
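A minimal Scala sketch of the S3 ingestion and CSV/JSON integration mentioned above, assuming s3a:// access is already configured on the cluster; the bucket, paths, and join key are made-up examples.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("S3Ingest").getOrCreate()

// CSV extract delivered to S3 by an upstream system.
val accounts = spark.read
  .option("header", "true")
  .csv("s3a://example-bucket/landing/accounts/")

// JSON event feed from a second source.
val events = spark.read.json("s3a://example-bucket/landing/events/")

// Integrate the two feeds on a shared key and persist the result to HDFS.
val joined = accounts.join(events, Seq("account_id"), "inner")
joined.write.mode("overwrite").parquet("/data/staging/account_events/")
```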
Environment: Cloudera 5.8, Spark 1.6.0, Hive, UNIX, SQL, MySQL, Sqoop, Autosys, Scala, PySpark, Python, AWS S3.
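The sketch below illustrates the kind of Spark tuning applied in this project; the configuration values, paths, and column names are illustrative assumptions, not the project's actual settings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("TunedJob")
  .config("spark.sql.shuffle.partitions", "400")   // tune shuffle parallelism to the cluster
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

val facts = spark.read.parquet("/data/curated/facts/")   // hypothetical inputs
val dims  = spark.read.parquet("/data/curated/dims/")

// Cache a DataFrame that is reused across several actions.
val reused = facts.filter("event_type = 'purchase'").persist(StorageLevel.MEMORY_AND_DISK)

// Broadcast the small dimension table to avoid a shuffle join.
val enriched = reused.join(broadcast(dims), Seq("dim_id"))

// Reduce the number of output files before writing.
enriched.coalesce(50).write.mode("overwrite").parquet("/data/output/enriched/")
reused.unpersist()
```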
Confidential
Big Data Engineer
Responsibilities:
- Worked on clusters of different sizes on the Cloudera distribution.
- Developed data pipelines using Sqoop and MapReduce to ingest current and historical data into the data staging area.
- Responsible for defining the data flow in the Hadoop ecosystem for different teams.
- Wrote Pig scripts for data cleansing and transformation as an ETL step before loading into HDFS.
- Worked on importing normalized data from the staging area into HDFS using Sqoop and performed analysis using Hive Query Language (HQL).
- Created managed and external tables in Hive and loaded data from HDFS (see the sketch at the end of this list).
- Performed query optimization for HiveQL and denormalized Hive tables to increase speed of data retrieval.
- Transferred analyzed data from HDFS to the BI team for visualization and to the data science team for predictive modeling.
- Exported data from HDFS to a Web API for the development team supporting front-end processes.
- Worked with Hadoop admin to scale data nodes, monitor jobs in cluster and reviewed Hadoop log files.
- Managed and reviewed Hadoop log files and scheduled jobs on the Hadoop cluster.
- Scheduled data processing pipeline jobs using Oozie and UNIX cron jobs.
- Developed workflow definition scripts in Oozie for automatic execution of jobs.
- Assisted in Hadoop administration and support activities, installing and configuring Apache Big Data tools and Hadoop clusters using Cloudera Manager.
- Experienced in Hadoop production support tasks, analyzing application and cluster logs.
- Used the Agile Scrum (Scrum Alliance) methodology for development.
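A sketch of the managed versus external Hive tables mentioned above, issued through Spark to keep the example in Scala; the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("HiveTableTypes").enableHiveSupport().getOrCreate()

// External table: metadata only; the raw files stay in place if the table is dropped.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS retail.orders_ext (
    order_id STRING, customer_id STRING, amount DOUBLE
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/landing/orders'
""")

// Managed table: Hive owns both metadata and data under the warehouse directory.
spark.sql("""
  CREATE TABLE IF NOT EXISTS retail.orders_managed (
    order_id STRING, customer_id STRING, amount DOUBLE
  ) STORED AS ORC
""")

// Populate the managed table from the external landing table.
spark.sql("INSERT OVERWRITE TABLE retail.orders_managed SELECT * FROM retail.orders_ext")
```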
Environment: CDH, Spark, Hive, MapReduce, Sqoop, HDFS, Oozie, UNIX, SQL, Oracle, Scala, PySpark, Python.
Confidential
Software Developer (SQL Developer)
Responsibilities:
- Used JDBC technology to establish connections with the Oracle database and communicated with it using PL/SQL.
- Defined search criteria to pull customer-requested records from the database, make the required changes, and save the updated records back to the database.
- Developed stored procedures and triggers using PL/SQL to calculate and update tables implementing business logic (a JDBC call sketch appears at the end of this list).
- Involved in loading data using PL/SQL and SQL*Loader, calling Unix scripts to download and manipulate files.
- Developed ETL processes to load data from Flat files, SQL Server and Access into the target Oracle database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Created indexes on the tables for faster retrieval of the data to enhance database performance while using the data for generation of the reports or moving the data to a different database.
- Wrote complex SQL queries and stored procedures using joins, cursors, and exception handling.
- Created stored procedures, functions, triggers, constraints, cursors, views, and materialized views.
- Designed and developed XML processing components for dynamic menus on the application.
- Implemented the database using SQL Server.
- Used RESTful web services with MVC for parsing and processing XML data.
- Analyzed the requirements and designed class diagrams, sequence diagrams, and use case diagrams using UML.
- Involved in technical workflow discussions and documentation, mapping documents, and data model design.
- Scheduled monthly, weekly, daily, and hourly reports.
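A minimal sketch of the JDBC/PL-SQL interaction described above, written in Scala for consistency with the rest of this document; the connection details and the procedure name (UPDATE_CUSTOMER_BALANCE) are hypothetical placeholders.

```scala
import java.sql.{CallableStatement, Connection, DriverManager, Types}

object CustomerUpdate {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle connection details; the ojdbc driver must be on the classpath.
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "app_password")
    try {
      // Call a PL/SQL procedure that applies the business logic inside the database.
      val stmt: CallableStatement = conn.prepareCall("{ call UPDATE_CUSTOMER_BALANCE(?, ?) }")
      stmt.setString(1, "CUST-1001")            // customer id to update
      stmt.registerOutParameter(2, Types.NUMERIC)
      stmt.execute()
      println(s"New balance: ${stmt.getBigDecimal(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```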
Environment: JDBC, Oracle, PL/SQL, Unix, ETL, Flat files, SQL Server, XML, MVC, Web services, UML diagrams.
Confidential
SQL Developer
Responsibilities:
- Used JDBC technology to establish connections with the Oracle database and communicated with it using PL/SQL.
- Defined search criteria to pull customer-requested records from the database, make the required changes, and save the updated records back to the database.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
- Involved in loading data using PL/SQL and SQL*Loader, calling Unix scripts to download and manipulate files.
- Developed ETL processes to load data from Flat files, SQL Server and Access into the target Oracle database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Created indexes on the tables for faster retrieval of the data to enhance database performance while using the data for generation of the reports or moving the data to a different database.
- Wrote complex SQL queries and stored procedures using joins, cursors, and exception handling.
- Created stored procedures, functions, triggers, constraints, cursors, views, and materialized views.
- Designed and developed XML processing components for dynamic menus on the application.
- Implemented the database using SQL Server.
- Used RESTful web services with MVC for parsing and processing XML data.
- Analyzed the requirements and designed class diagrams, sequence diagrams, and use case diagrams using UML.
- Involved in technical workflow discussions and documentation, mapping documents, and data model design.
- Scheduled monthly, weekly, daily, and hourly reports.
Environment: JDBC, Oracle, PL/SQL, Unix, ETL, Flat files, SQL Server, XML, MVC, Web services, UML diagrams.