Big Data Developer Resume
Birmingham, AL
SUMMARY
- 10 years of IT experience, including over 5 years as a Big Data Developer, with extensive knowledge of the Banking and Insurance domains.
- Experience with SDLC implementation in a large organization.
- Experience with the Cloudera Hadoop distributions CDH4 and CDH5.
- Experience in Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Flume, Pig, Hive, Oozie, ZooKeeper, and HBase.
- Experience managing data extraction jobs and building new data pipelines from various structured and unstructured sources into Hadoop.
- Experience with Amazon cloud services such as the Amazon EMR File System (EMRFS) and Simple Storage Service (S3).
- Experience using HDFS along with Amazon S3 to store input and output data.
- Experience in Spark Core and Spark SQL with the Scala API.
- Experience using HCatalog to share Hive metastore table definitions across frameworks such as Pig and MapReduce.
- Experience loading and transforming large structured and semi-structured data sets.
- Strong knowledge of Spark Streaming with complex input DStreams.
- Extensive experience in data warehousing, data architecture, and extraction, transformation, and loading (ETL) of data from various sources into data warehouses and data marts using the Informatica Power Center client tools.
- Expertise in data modeling using star and snowflake schemas, fact and dimension tables, and physical and logical data modeling with Erwin 4.x/3.x.
- In-depth working knowledge of Oracle Database, DB2, MySQL, and HBase (NoSQL database).
- Extensive knowledge of and experience in COBOL, CICS, JCL, VSAM, FILE-AID, TSO/ISPF, CA-7 (scheduler), ICETOOL, CHANGEMAN, and SPUFI.
- Excellent interpersonal skills; comfortable presenting to large groups and preparing written communications and presentation material.
- Flexible, quick learner who can adapt and execute in any fast-paced environment.
TECHNICAL SKILLS
Operating Systems: Windows XP/NT/2000, Unix/Linux, CentOS Linux
Programming Languages: SQL, PL/SQL, Scala API, Python, JCL, COBOL
Frameworks: Hadoop (Sqoop, HDFS, Hive, Pig, MapReduce, HBase (NoSQL), Oozie, Flume, ZooKeeper, HCatalog), Spark (Spark Core, Spark SQL, Spark Streaming)
RDBMS: Oracle 9i/10g/11g/12c, MySQL 5.5, DB2, SQL Server
Tools: SQL Developer, Toad, Tableau, Jira, Informatica Power Center 9.5/9.0.1/9/8.6.1, Erwin 4.x, MS Visio, ServiceNow, HP Quality Center, File-Aid, File Manager, DB2 Interactive, ICETOOL
Version Controller: Git, SVN
AWS: EC2, S3, EMR
IDE: Scala IDE (Eclipse)
PROFESSIONAL EXPERIENCE
Confidential, Birmingham, AL
Big Data Developer
Environment: Hadoop 2.x, Spark 1.6, Scala API, Scala IDE (Eclipse), HDFS, AWS EMR, AWS S3, Hive, MapReduce, Sqoop, CentOS Linux and Oracle DB Server.
Responsibilities:
- Worked with the Data Science team to gather requirements and supported data analysis by manipulating and exploring the data.
- Responsible for translating complex functional and technical requirements into detailed designs.
- Loaded disparate datasets into the Hadoop data lake, making them available to the Data Science team for predictive modeling.
- Created Hive tables with the Avro file format.
- Used HDFS along with Amazon S3 to store input and output data.
- Wrote Oozie scripts and set up workflows using the Apache Oozie workflow engine to manage and schedule Hadoop jobs.
- Created Spark RDDs from source files using the Scala API for better performance.
- Migrated Hive queries to Spark SQL to improve performance.
- Created DataFrames from Hive tables using Spark SQL (see the sketch after this list).
- Performed data analysis by exploring the datasets and provided recommendations.
- Implemented real-time data ingestion using Flume.
- Performed sentiment analysis of customer reviews using Spark Streaming.
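Below is a minimal sketch of the Hive-to-Spark SQL pattern described in the bullets above, assuming Spark 1.6 with Hive support; the table, columns, and S3 path (customer_reviews, review_date, rating, s3n://placeholder-bucket/...) are hypothetical placeholders rather than the client's actual schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveToSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToSparkSqlSketch"))
    val hiveContext = new HiveContext(sc)

    // Read an existing Hive table as a DataFrame (Spark 1.6 API).
    val reviews = hiveContext.table("customer_reviews")
    reviews.printSchema()

    // A query previously expressed in HiveQL, now executed by the Spark engine
    // through Spark SQL instead of MapReduce.
    val dailyAvgRating = hiveContext.sql(
      """SELECT review_date, AVG(rating) AS avg_rating
        |FROM customer_reviews
        |GROUP BY review_date""".stripMargin)

    // Write the result to S3 (placeholder bucket) for downstream consumers.
    dailyAvgRating.write.parquet("s3n://placeholder-bucket/output/daily_avg_rating")

    sc.stop()
  }
}
```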
Confidential, Dublin, OH
Big Data Developer
Environment: Hadoop 2.x, HDFS, Pig, Hive, MapReduce, Sqoop, Oozie, CentOS Linux and Oracle DB.
Responsibilities:
- Responsible for gathering requirements and translating the functional and technical requirements into detailed designs.
- Loaded and transformed large structured datasets from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import data from relational sources and handled incremental loading.
- Pre-processed data by analyzing and cleansing raw data using Hive queries and Pig scripts.
- Optimized Hive queries using partitioning and bucketing techniques to limit the data scanned (see the sketch after this list).
- Created Hive tables in RCFile format, which is well suited for analytical queries.
- Performed SerDe operations using Hive.
- Performed data analysis by exploring the datasets and surfacing insights.
- Wrote Oozie scripts and set up workflows using the Apache Oozie workflow engine to manage and schedule Hadoop jobs.
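Below is a minimal sketch of the partitioned, bucketed RCFile layout referenced in the bullets above. The table, columns, and bucket count (claims_cleansed, claim_id, customer_id, claim_amount, 32 buckets) are hypothetical placeholders; on the project the DDL ran through the Hive CLI, and it is issued here through a Spark HiveContext only to keep the resume's code examples in one language.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object PartitionedHiveTableSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PartitionedHiveTableSketch"))
    val hive = new HiveContext(sc)

    // Hypothetical cleansed table: partitioning by load_date means a query that
    // filters on load_date reads only that partition's files; bucketing by
    // customer_id spreads rows over a fixed number of files, which helps sampling
    // and joins; RCFile stores the data in a columnar layout suited to analysis.
    hive.sql(
      """CREATE TABLE IF NOT EXISTS claims_cleansed (
        |  claim_id     BIGINT,
        |  customer_id  BIGINT,
        |  claim_amount DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS RCFILE""".stripMargin)

    // Partition pruning: only the 2015-01-01 partition is scanned.
    hive.sql(
      """SELECT customer_id, SUM(claim_amount) AS total_amount
        |FROM claims_cleansed
        |WHERE load_date = '2015-01-01'
        |GROUP BY customer_id""".stripMargin).show()

    sc.stop()
  }
}
```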
Confidential, Dublin, OH
Big Data Developer
Environment: Hadoop 1.x, HDFS, Hive, Sqoop, Oozie, CentOS Linux and Oracle DB.
Responsibilities:
- Worked with Business Analysts to gather requirements.
- Created Sqoop scripts to import data from Oracle tables into Hive tables.
- Validated the data loaded in Hive using HiveQL.
Confidential, Bloomington, IL
Mainframe Developer
Environment: COBOL, JCL, Easytrieve, ICETOOL, DB2, File-Aid, CA-7, Control-M, Changeman.
Responsibilities:
- Worked with Business Analysts to gather requirements.
- Prepared high-level and low-level designs from the requirements.
- Performed analysis and programming in COBOL, DB2, and JCL using the design specifications.
Confidential, Charlotte, NC
ETL Developer
Environment: Informatica Power Center 8.6.1, DB2, SQL Developer, Unix, HP Quality Center.
Responsibilities:
- Developed ETL programs using Informatica to implement the business requirements.
- Applied relational and dimensional data modeling techniques to design Erwin data models.
- Developed and maintained complex Informatica mappings. Supported, monitored and tuned Informatica ETL processes.
- Designed and created table structures and modified existing tables to fit into the existing Data Model.
- Wrote SQL queries for database operations and to maintain data warehouse systems.
- Used all major Informatica transformations to load data into target systems.
- Effectively used Informatica parameter files to define mapping variables, workflow variables, FTP connections, and relational connections.
- Fine-tuned ETL jobs and performed unit testing and workflow monitoring.
Confidential, Charlotte, NC
ETL Developer
Environment: Informatica Power Center 8.6.1, DB2, SQL Developer, Unix, HP Quality Center.
Responsibilities:
- Developed ETL programs using Informatica to implement the business requirements.
- Applied relational and dimensional data modeling techniques to design Erwin data models.
- Developed and maintained complex Informatica mappings. Supported, monitored and tuned Informatica ETL processes.
- Wrote SQL queries for database operations and to maintain data warehouse systems.
- Used all major Informatica transformations to load data into target systems.
- Effectively used Informatica parameter files to define mapping variables, workflow variables, FTP connections, and relational connections.
- Fine-tuned ETL jobs and performed unit testing and workflow monitoring.
Confidential, Charlotte, NC
Mainframe Developer
Environment: COBOL, JCL, Easytrieve, ICETOOL, DB2, File-Aid, CA-7, Control-M, Changeman.
Responsibilities:
- Analyzed the requirements and prepared high-level and low-level design documents.
- Created and maintained JCL jobs; scheduled jobs via the CA-7 scheduler.
- Prepared the test region and executed region shakeout.
- Executed SQL queries for data conditioning and mining activities on the identified data from all data stores.
- Prepared and executed component and system integration test plans and scripts.