
Big Data Developer Resume


Los Angeles, CA

SUMMARY:

  • Sr. Big Data Developer/Architect with more than eleven (11) years of experience in the design and development of analytics/big data applications using leading industry tools, working with Fortune 50 firms like Confidential, Confidential, Confidential and Confidential
  • Well-rounded experience in ETL, Hadoop, Spark, data modeling, and data visualization
  • Strong knowledge of data integration concepts like Dimensional Modeling, Data Quality, Streaming, CDC, Master Data Management (MDM), REST, SOAP, Web Services
  • Good understanding of Big Data concepts like Hadoop, MapReduce, YARN, Spark, RDDs, DataFrames, Datasets, and Streaming
  • Proficient in Hive, Oracle, SQL Server, SQL, PL/SQL, T-SQL and in managing very large databases
  • Hands-on programming experience in languages such as Java and Scala
  • Experience writing in-house UNIX shell scripts for Hadoop & Big Data development
  • Skilled in performance tuning of data pipelines, distributed datasets, databases and SQL query performance
  • Strong data modeling skills with experience developing complex data models using Unified Modeling Language (UML), ER diagrams, conceptual/physical diagrams, etc.
  • Recognized for superior performance with awards such as Confidential Service Excellence, Confidential Manager’s Choice and the Amex Chairman’s award

TECHNICAL SKILLS:

Big Data: Hadoop, Sqoop, Flume, Hive, Spark, Pig, Kafka, Talend, HBase, Impala

ETL Tools: Informatica, Talend, Microsoft SSIS, Confidential DataStage

Database: Oracle, SQL Server 2016, Teradata, Netezza, MS Access

Reporting: Microsoft Power BI, Tableau, QlikView, SSRS, Business Objects (Crystal)

Business Intelligence: MDM, Change Data Capture (CDC), Metadata, Data Cleansing, OLAP, OLTP, SCD, SOA, REST, Web Services.

Tools: Ambari, DBeaver, SQL Developer, TOAD, Erwin, Visio, TortoiseSVN

Operating Systems: Windows Server, UNIX/Linux (Red Hat, Solaris, AIX)

Languages: UNIX shell scripting, Scala, SQL, PL/SQL, T-SQL

Training: Hadoop, Spark, Kafka, Hive, MapReduce, Talend, Informatica BDE, Pentaho

PROFESSIONAL EXPERIENCE:

Big Data Developer

Confidential, Los Angeles, CA

Responsibilities:

  • Work with Project Manager, Business Leaders and Technical teams to finalize requirements and create solution design & architecture.
  • Architect the data lake by cataloging the source data, analyzing entity relationships, and aligning the design as per performance, schedule & reporting requirements
  • Design and develop Hadoop ETL solutions to move data to the data lake using big data tools such as Sqoop, Hive, Spark, HDFS, and Talend
  • Design and develop Spark code in Scala and Spark SQL for high-speed data processing to meet critical business requirements
  • Implement RDD/Dataset/DataFrame transformations in Scala through SparkContext and HiveContext (see the sketch after this list)
  • Import Java libraries into the transformation logic to implement core functionality
  • Write Spark SQL and embed the SQL in Scala files to generate JAR files for submission to the Hadoop cluster
  • Develop algorithms & scripts in Hadoop to import data from source system and persist in HDFS (Hadoop Distributed File System) for staging purposes.
  • Develop Hive logic & Stored Procedures to implement business rules and perform data transformation.
  • Develop Unix Shell scripts to perform Hadoop ETL functions like Sqoop, create external/internal Hive tables, initiate HQL scripts etc.
  • Develop scripts in Hive to perform transformations on the data and load to target systems for use by the data analysts for reporting.
  • Schedule jobs through Apache Oozie by creating workflow and properties files and submitting jobs
  • Develop Kafka streaming solution to process real-time usage data and merge it with weather data to predict climate anomalies and take preventive actions
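
A minimal Scala sketch of the kind of HiveContext-based Spark SQL and DataFrame transformation described above; the stg.usage_events table, its columns, and the output path are hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.{col, sum, to_date}

object UsageTransform {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("UsageTransform"))
    val hc = new HiveContext(sc)

    // Pull the staged records with Spark SQL embedded in the Scala file
    val usage = hc.sql("SELECT customer_id, event_ts, usage_kwh FROM stg.usage_events")

    // DataFrame transformation: daily usage per customer
    val summary = usage
      .groupBy(col("customer_id"), to_date(col("event_ts")).as("event_dt"))
      .agg(sum("usage_kwh").as("total_kwh"))

    // Persist to HDFS for the downstream Hive/reporting layer
    summary.write.mode("overwrite").parquet("/data/curated/usage_summary")

    sc.stop()
  }
}
```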

Environment: Hortonworks 2.3.5, Sqoop, Hive, Informatica, Spark, Scala, Python, T-SQL, PL/SQL, Talend, UNIX, Ambari, Oozie

Big Data Developer

Confidential, San Francisco, CA

Responsibilities:
  • Performed extensive data analysis and coordinated with the client teams to develop data models
  • Developed HQL scripts in Hive & Spark SQL to perform transformations on relational data and Sqoop-export data back to the databases
  • Developed UNIX shell scripts to perform ELT operations on big data, e.g. running Sqoop jobs, creating external/internal Hive tables, and initiating HQL scripts
  • Developed the ETL/SQL code to load data from the raw staging relational databases and ingested it into the Hadoop environment using Sqoop
  • Optimized Spark code in Scala by reengineering the DAG logic to use minimal resources and provide high throughput
  • Developed Pig scripts to transform unstructured and semi-structured streaming data
  • Developed data flow architecture & physical data model with Data Warehouse Architect
  • Wrote unit scripts to automate data load and performed data transformation operations
  • Performance-tuned the Hive code using map joins, partitioning, vectorization, and computed statistics
  • Performance-tuned the Spark code by minimizing shuffle operations, caching and persisting reusable RDDs, and adjusting the number of executors/cores/tasks (see the sketch after this list)
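
An illustrative Scala sketch of the tuning patterns listed above, assuming Spark with a HiveContext: cache a reused dataset, broadcast a small lookup table so the join avoids shuffling the large side, and control the partition count. Table names, columns, and the partition count of 200 are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

object TunedPipeline {
  def main(args: Array[String]): Unit = {
    // Executor, core, and memory settings would normally come from spark-submit flags
    val sc = new SparkContext(new SparkConf().setAppName("TunedPipeline"))
    val hc = new HiveContext(sc)

    // Large fact data: repartition to a sensible task count and cache it because it is reused
    val transactions = hc.sql("SELECT * FROM stg.transactions")
      .repartition(200)
      .persist(StorageLevel.MEMORY_AND_DISK)

    // Small dimension table: broadcasting it avoids a shuffle join
    val branches = hc.sql("SELECT branch_id, branch_name FROM ref.branches")
    val enriched = transactions.join(broadcast(branches), "branch_id")

    enriched.write.mode("overwrite").parquet("/data/curated/transactions_enriched")

    transactions.unpersist()
    sc.stop()
  }
}
```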

Environment: Informatica, Hadoop, Shell Scripting, Scala, Sqoop, Hive, Oracle, PL/SQL, Java, UNIX

Talend/ Big Data Developer

Confidential, Washington, DC

Responsibilities:
  • Extracted and profiled data from the customer, commercial loans and retail source systems that would provide the data needed for the loan reporting requirements
  • Determined criteria and wrote scripts for technical and business data quality checks, error handling and rejected reports during the data quality stage
  • Provided inputs on the design of the physical and logical architecture, source/target mappings of the data warehouse, and the ETL process
  • Created Hive tables and loaded data from HDFS to Hive tables as per the requirement.
  • Processed complex XML files and generated derived fields to be loaded into the database
  • Converted large XML files into multiple XML files as required by the downstream application
  • Loaded the processed XML files into the database tables
  • Mapped source files and generated target files in multiple formats such as XML, Excel, and CSV
  • Transformed data and reports retrieved from various sources and generated derived fields
  • Wrote complex SQL queries to validate the reports
  • Wrote user-defined functions to transform data into the required formats
  • Developed Talend jobs using context variables and scheduled them to run automatically
  • Worked extensively with Data Mapper to map complex JSON formats to XML
  • Copied data to AWS S3 for storage and used the COPY command, through Talend's Redshift connectors, to load the data into Redshift (a rough equivalent of the COPY step is sketched after this list)
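
The Talend job issued the load through its Redshift connectors; the Scala sketch below is only a rough JDBC equivalent of that S3-to-Redshift COPY step. The cluster URL, credentials, table, bucket, and IAM role are placeholders, and the Redshift JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

object S3ToRedshiftCopy {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev"
    val conn = DriverManager.getConnection(url, "etl_user", sys.env.getOrElse("REDSHIFT_PWD", ""))
    try {
      // COPY pulls the files directly from S3 into the target table
      val copySql =
        """COPY analytics.loan_facts
          |FROM 's3://example-bucket/processed/loan_facts/'
          |IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
          |FORMAT AS CSV
          |IGNOREHEADER 1""".stripMargin
      conn.createStatement().execute(copySql)
    } finally {
      conn.close()
    }
  }
}
```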

Environment: Talend, Hadoop, Hortonworks, AWS, Redshift, UNIX, Hive, Informatica, Control-M

BI Developer

Confidential, Phoenix, AZ

Responsibilities:
  • Extracted data from five operational databases containing almost two terabytes of data, loaded into the data warehouse and subsequently populated seven data marts
  • Created complex transformations, mappings, mapplets, reusable items, scheduled workflows based on the business logic and rules
  • Developed ETL job workflows with QC reporting and analysis frameworks
  • Developed Informatica mappings, Lookups, Reusable Components, Sessions, Work Flows etc. (on ETL side) as per the design documents/communication
  • Designed metadata tables at the source staging layer to profile data and perform impact analysis
  • Performed query tuning and setting optimization on the Oracle database (rule and cost based)
  • Created Cardinalities, Contexts, Joins and Aliases for resolving loops and checked the data integrity
  • Debugged issues, fixed critical bugs and assisted in code deployments to QA and production
  • Coordinated with the external teams to assure the quality of master data and conduct UAT/integration testing
  • Implemented PowerExchange CDC for mainframes to load certain large data modules into the data warehouse and capture changed data
  • Designed and developed exception handling, data standardization procedures, and quality assurance controls (the kind of rules involved is illustrated in the sketch after this list)
  • Used Cognos for analysis and presentation layers
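
The standardization and quality controls above were built in Informatica; this small Scala sketch only illustrates the kind of rules involved (trimming, case normalization, and rejecting rows that fail required-field checks). The CustomerRecord shape and the checks themselves are made up for illustration.

```scala
// Hypothetical record layout used only for this illustration
case class CustomerRecord(id: String, name: String, state: String, zip: String)

object Standardize {
  // Returns the cleaned record, or a rejection reason destined for an error table
  def clean(r: CustomerRecord): Either[String, CustomerRecord] = {
    val cleaned = r.copy(
      name  = r.name.trim,
      state = r.state.trim.toUpperCase,
      zip   = r.zip.trim
    )
    if (cleaned.id.trim.isEmpty) Left(s"Missing id: $r")
    else if (!cleaned.zip.matches("\\d{5}")) Left(s"Bad zip for ${cleaned.id}: ${cleaned.zip}")
    else Right(cleaned)
  }
}
```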

Environment: Informatica, Tableau, Oracle 10g, SQL Developer, Cognos, Windows Server and Teradata

ETL Developer

Confidential, Bloomington, IL

Responsibilities:
  • Involved in gathering the business scope and technical requirements, and created technical specifications
  • Developed complex mappings and SCD Type-I, Type-II, and Type-III mappings in Informatica to load data from various sources, using transformations such as Source Qualifier, Lookup (connected and unconnected), Expression, Aggregator, Update Strategy, Sequence Generator, Joiner, Filter, Rank, Router, and SQL; created complex mapplets for reuse (the SCD Type-II pattern is sketched after this list)
  • Worked with XML, JSON, SFDC and other non-traditional sources to provide real time integration solutions
  • Developed processes that made service calls through APIs interfacing with applications in the cloud
  • Deployed reusable transformation objects such as mapplets to avoid duplication of metadata, reducing the development time
  • Created synonyms for copies of Time Dimensions, used the Sequence Generator transformation type to create Sequences for generalized Dimension Keys, Stored Procedure transformation type for encoding and decoding functions and Lookup Transformation to identify slowly changing Dimensions
  • Fine-tuned Informatica maps for performance optimization
  • Debugged mappings by creating logic that assigns a severity level to each error and sends the error rows to error table so that they could be corrected and re-loaded into a Target System
  • Involved in the unit testing, event and thread testing, and system testing
  • Tested the system end to end to ensure the quality of the adjustments made to accommodate source system upgrades
  • Created documentation for all phases, including analysis, design, development, testing, and maintenance
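
The SCD mappings above were built in Informatica; this pure-Scala sketch just illustrates the Type-II pattern they implement: when a tracked attribute changes, the current dimension row is end-dated and a new current row is inserted. The CustomerDim shape and the address attribute are hypothetical.

```scala
import java.time.LocalDate

// Hypothetical Type-II dimension row with effective-dating columns
case class CustomerDim(customerId: String, address: String,
                       effectiveFrom: LocalDate, effectiveTo: Option[LocalDate],
                       isCurrent: Boolean)

object ScdType2 {
  // Given the current row and an incoming attribute value, emit the rows to persist
  def applyChange(current: CustomerDim, newAddress: String, asOf: LocalDate): Seq[CustomerDim] =
    if (current.address == newAddress)
      Seq(current) // no change: keep the existing current row
    else
      Seq(
        current.copy(effectiveTo = Some(asOf), isCurrent = false),                // expire the old version
        CustomerDim(current.customerId, newAddress, asOf, None, isCurrent = true) // insert the new version
      )
}
```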

Environment: Informatica, Java/SOAP/Web Services, Oracle, DB2, SAS, Shell Scripting, TOAD, SQL Plus, Scheduler
