
Sr. Big Data Developer Resume

Austin, TX

SUMMARY:

  • Over 9 years of experience as a Big Data/Hadoop, data analysis, and data modeling professional in applied information technology.
  • Proficient in data modeling, DW, big data/Hadoop, data integration, master data management, data migration, operational data store, and BI reporting projects, with a deep focus on the design, development, and deployment of BI and data solutions using custom, open-source, and off-the-shelf BI tools.
  • Experienced in technical consulting and end-to-end delivery with data modeling, data governance, and the design, development, and implementation of solutions.
  • Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
  • Expertise in configuring monitoring and alerting tools, such as AWS CloudWatch, according to requirements.
  • Proficiency in Big Data Practices and Technologies like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, Kafka.
  • Data warehousing: full life-cycle project leadership, business-driven requirements gathering, capacity planning, feasibility analysis, enterprise and solution architecture, design, construction, data quality, profiling and cleansing, source-target mapping, gap analysis, data integration/ETL, SOA, ODS, data marts, Inmon/Kimball methodology, data modeling for OLTP, canonical modeling, and dimensional modeling for data warehouse star/snowflake design.
  • Experience in BI/DW solutions (ETL, OLAP, data marts), Informatica, and BI reporting tools such as Tableau and QlikView; experienced in leading teams of application, ETL, and BI developers and testers.
  • Experience in developing, support and maintenance for the ETL (Extract, Transform and Load) processes using Talend Integration Suite.
  • Worked on Informatica Power Center tools-Designer, Repository Manager, Workflow Manager.
  • Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE and MS SQL Server.
  • Extensive use of Talend ELT, database, data set, HBase, Hive, Pig, HDFS, and Sqoop components.
  • Experience in Installation, Configuration, and Administration of Informatica Power Center 8.x, 9.1 Client/Server.
  • Experienced with the Hadoop ecosystem and Big Data components including Apache Spark, Scala, Python, HDFS, MapReduce, and Kafka.
  • Expertise in reading and writing data from and to multiple source systems such as Oracle, HDFS, XML, delimited, Excel, positional, and CSV files.
  • Experience in Business Intelligence (BI) project development and implementation using the MicroStrategy product suite, including MicroStrategy Desktop/Developer, Web, OLAP Services, Administrator, and Intelligence Server.
  • Logical and physical database design (tables, constraints, indexes, etc.) using Erwin, ER Studio, TOAD Modeler, and SQL Modeler.
  • Good understanding and hands on experience with AWS S3 and EC2.
  • Good experience on programming languages Python, Scala.
  • Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
  • Excellent knowledge of creating reports in SAP BusinessObjects, including Web Intelligence (WebI) reports for multiple data providers.
  • Created and maintained UDB DDL for databases, table spaces, tables, views, triggers, and stored procedures. Resolved lock escalations, lock-waits and deadlocks.
  • Experience working with business intelligence and data warehouse software, including SSAS, Pentaho, Cognos, Amazon Redshift, and Azure SQL Data Warehouse.
  • Extensive ETL testing experience using Informatica 9x/8x, Talend, Pentaho.
  • Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
  • Experience with relational (3NF) and dimensional data architectures. Experience in leading cross-functional, culturally diverse teams to meet strategic, tactical and operational goals and objectives.
  • Good exposure to BI reporting with MicroStrategy 8/9 and Tableau, SQL programming, and RDBMSs such as Teradata, Oracle, and SQL Server.
  • Expertise on Relational Data modeling (3NF) and Dimensional data modeling.
  • Experience in developing Map Reduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Practical understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
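The star-schema concept listed above (a fact table of measures joined to dimension tables through surrogate keys) can be illustrated with a small in-memory Python sketch; all table and column names here are hypothetical:

```python
from collections import defaultdict

# Hypothetical dimension table: surrogate key -> descriptive attributes
dim_store = {
    1: {"store_name": "Austin #1", "region": "South"},
    2: {"store_name": "Chicago #4", "region": "Midwest"},
}

# Hypothetical fact table: each row carries a measure plus a dimension foreign key
fact_sales = [
    {"store_key": 1, "amount": 120.0},
    {"store_key": 2, "amount": 75.5},
    {"store_key": 1, "amount": 30.0},
]

def sales_by_region(facts, dim):
    """Join facts to a dimension via the surrogate key and aggregate a measure."""
    totals = defaultdict(float)
    for row in facts:
        region = dim[row["store_key"]]["region"]
        totals[region] += row["amount"]
    return dict(totals)

print(sales_by_region(fact_sales, dim_store))  # {'South': 150.0, 'Midwest': 75.5}
```

A snowflake design would differ only in that `dim_store` would itself reference a normalized `dim_region` lookup instead of embedding the region value.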

TECHNICAL SKILLS:

Big Data Hadoop: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm, MongoDB.

Data Analysis/Modeling Tools: Erwin R6/R9, IBM InfoSphere, ER Studio, and Oracle Designer.

Database Tools: Microsoft SQL Server 12.0, Teradata 15.0, Oracle 12c/11g/9i, and MS Access.

BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP BusinessObjects, Crystal Reports

Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server

ETL/Data Warehouse Tools: Informatica 9.6/9.1/8.6.1/8.1, SAP BusinessObjects XI R3.1/XI R2, Web Intelligence, Talend, Pentaho.

Tools: OBIEE 10g/11g/12c, SAP ECC6 EHP5, GoToMeeting, DocuSign, InsideSales.com, SharePoint, MATLAB.

Cloud: MS Azure, AWS (EC2, S3).

Operating System: Windows, Unix, Sun Solaris

RDBMS: Microsoft SQL Server 14.0, Teradata 15.0, Oracle 12c/11g/10g, and MS Access.

Other Tools: TOAD, SQL*Plus, SQL*Loader, MS Project, MS Visio, MS Office; have also worked with C++, UNIX, and PL/SQL.

PROFESSIONAL EXPERIENCE:

Confidential, Austin, TX

Sr. Big Data Developer

Responsibilities:

  • Designed, developed, and implemented complex projects handling considerable data volumes (GB to PB).
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Implemented the Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
  • Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Used Talend for Big data Integration using Spark and Hadoop.
  • Used Microsoft Windows Server and authenticated the client-server relationship via the Kerberos protocol.
  • Experience on BI reporting with At Scale OLAP for Big Data.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Designed data analysis and visualization using IBM Big SQL, DSM, and BigSheets.
  • Designed and Developed Real time Stream processing Application using Spark, Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
  • Identified query duplication, complexity, and dependency to minimize migration efforts.
  • Implemented solutions in AWS using services such as EC2, S3, RDS, Redshift, and VPC.
  • Worked with Talend for performing fast integration tasks.
  • Worked as a Hadoop consultant on (Map Reduce/Pig/HIVE/Sqoop).
  • Worked using Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, Pig, and Map Reduce.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Explored Spark to improve performance and optimize existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Performed data profiling and transformation on the raw data using Pig and Python.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using the Apache Spark Scala API.
  • Performed big data analysis using Pig and user-defined functions (UDFs).
  • Created Hive external tables, loaded data into them, and queried the data using HiveQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Worked on tools Flume, Storm and Spark.
  • Assigned names to columns using the case class option in Scala.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Wrote business-analytics scripts using HiveQL.
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
  • Wrote Hadoop jobs to analyze data using Hive and Pig, accessing text-format, sequence, and Parquet files.
  • Integrated Oozie logs into a Kibana dashboard.
  • Extracted the data from MySQL, AWS RedShift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Used Spark SQL to process large volumes of structured data.
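The Spark aggregation work described in this role follows the classic map/shuffle/reduce pattern. A minimal pure-Python sketch of that pattern, with no cluster required (the log data and key names are illustrative):

```python
from itertools import groupby

def map_phase(records):
    # Emit (key, 1) pairs, like a Spark flatMap/map step over input lines
    return [(word, 1) for line in records for word in line.split()]

def reduce_phase(pairs):
    # Sort (a stand-in for the shuffle), group by key, and sum the counts,
    # mirroring Spark's reduceByKey(_ + _)
    pairs = sorted(pairs)
    return {key: sum(n for _, n in grp)
            for key, grp in groupby(pairs, key=lambda p: p[0])}

logs = ["error warn error", "warn info"]
counts = reduce_phase(map_phase(logs))
print(counts)  # {'error': 2, 'info': 1, 'warn': 2}
```

In Spark the same shape would be `rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)`, with the shuffle handled by the framework instead of an explicit sort.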

Environment: Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL, Star Schema, Flume, Oozie.

Confidential, Albertville, AL

Big Data Engineer

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Involved in installing Hadoop Ecosystem components.
  • Wrote Sqoop scripts for moving data between Relational DBs, HDFS, and S3 storage and automated the workflow using Oozie.
  • Proof-of-concept to determine feasibility and product evaluation of Big Data products
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Created high-level and detailed data models for Azure SQL Database and NoSQL databases, including the use of storage for logging and data movement between the on-premises data warehouse and cloud vNets.
  • Designed both 3NF data models for ODS, OLTP, and OLAP systems and dimensional data models using star and snowflake schemas.
  • Developed and designed data integration and migration solutions in Azure.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Ingested data into Hadoop/Hive/HDFS from different data sources.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Wrote ETL jobs to read from web APIs using REST/HTTP calls and loaded the data into HDFS using Talend.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Imported data frequently from MySQL to HDFS using Sqoop.
  • Created Hive tables and worked on them using HiveQL.
  • Configured and managed disaster recovery and backup for Cassandra data.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Involved in migration of data from existing RDBMS (oracle and SQL server) to Hadoop using Sqoop for processing data.
  • Worked with development, storage, and network teams on the installation and administration of MongoDB in the enterprise IT environment.
  • Developed Talend ESB services and deployed them on ESB servers on different instances.
  • Designed and implemented the MongoDB schema.
  • Effectively used Informatica parameter files for defining mapping variables, workflow variables, FTP connections and relational connections.
  • Finalized naming standards for data elements and ETL jobs and created a data dictionary for metadata management.
  • Produced PL/SQL statements and stored procedures in DB2 for extracting and writing data.
  • Wrote and executed SQL queries to verify that data has been moved from transactional system to DSS, Data warehouse, data mart reporting system in accordance with requirements.
  • Installed and configured Hadoop MapReduce, HDFS, Developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Designed and developed various reports and dashboards easily accessible through Tableau.
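The Sqoop imports described above generally take the shape sketched below. This helper only assembles the command string; the connection URL, table, and target path are placeholders, not real endpoints:

```python
def sqoop_import_cmd(jdbc_url, table, target_dir, num_mappers=4):
    """Build a `sqoop import` command line for moving an RDBMS table into HDFS.

    All connection details passed in are hypothetical placeholders.
    """
    return (
        f"sqoop import --connect {jdbc_url} --table {table} "
        f"--target-dir {target_dir} --num-mappers {num_mappers}"
    )

# Example: pull a MySQL table into a raw-zone HDFS directory
cmd = sqoop_import_cmd("jdbc:mysql://dbhost/sales", "orders", "/data/raw/orders")
print(cmd)
```

Jobs like this are what an Oozie workflow would then chain together with the downstream Hive loads.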

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, Azure, MySQL, SQL Server, SQL Server Analysis Services, Oracle 12c, Eclipse, DynamoDB, PL/SQL.

Confidential, Northbrook

Sr. Data Analyst / Modeler

Responsibilities:

  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization
  • Worked closely with data architects and the DBA team to implement data model changes in the database across all environments.
  • Developed data marts for the base data in star and snowflake schemas while developing the data warehouse for the database.
  • Developed enhancements to the MongoDB architecture to improve performance and scalability.
  • Created DDL scripts for implementing data modeling changes; created Erwin reports in HTML and RTF formats as required; published the data model in the model mart; created naming convention files; and coordinated with DBAs to apply the data model changes.
  • Developed predictive analytics using the Apache Spark Scala API.
  • Assigned names to columns using the case class option in Scala.
  • Experienced with batch processing of data sources using Apache Spark.
  • Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Involved in data modeling to define table structures in the MDM system.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Designed and developed a data mapping application for 30+ disparate source systems (COBOL, MS SQL Server, Oracle, and mainframe DB2) using MS Access and UNIX Korn shell scripts.
  • Extensively used Erwin as the main tool for modeling along with Visio.
  • Installed, configured and administered JBOSS 4.0 server in various environments.
  • Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships.
  • Worked on the Metadata Repository (MRM) to keep definitions and mapping rules up to date.
  • Developed conceptual and logical data models and transformed them into physical schemas using Erwin.
  • Performed data cleaning and data manipulation activities using NZSQL utility.
  • Analyzed the physical data model to understand the relationship between existing tables.
  • Created a list of domains in Erwin and worked on building up the data dictionary for the company.
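Forward engineering a logical model into physical DDL, as Erwin does in the work described above, can be sketched roughly as follows; the entity, attributes, and types are hypothetical:

```python
def forward_engineer(entity, columns, pk):
    """Generate CREATE TABLE DDL from a simple logical-model description.

    `columns` maps attribute name -> SQL type; `pk` names the primary-key column.
    """
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    return f"CREATE TABLE {entity} (\n  {cols},\n  PRIMARY KEY ({pk})\n);"

# Hypothetical CUSTOMER entity from a logical model
ddl = forward_engineer(
    "customer",
    {"customer_id": "INTEGER", "name": "VARCHAR(100)", "region": "VARCHAR(50)"},
    pk="customer_id",
)
print(ddl)
```

A real modeling tool adds much more (constraints, indexes, target-dialect types), but the logical-to-physical translation step is the same idea.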

Environment: Erwin r8.2, Oracle SQL Developer, Oracle Data Modeler, Teradata 14, SSIS, Business Objects, SQL Server, ER/Studio, Windows, MS Excel.

Confidential, Sunnyvale

Data Analyst/Modeler

Responsibilities:

  • Developed the logical data models and physical data models that capture current state/future state data elements and data flows using ER Studio.
  • Delivered dimensional data models using ER/Studio to bring in the Employee and Facilities domain data into the oracle data warehouse.
  • Performed analysis of the existing source systems (transaction databases).
  • Involved in maintaining and updating Metadata Repository with details on the nature and use of applications/data transformations to facilitate impact analysis.
  • Created DDL scripts using ER Studio and source to target mappings to bring the data from source to the warehouse.
  • Designed ER diagrams, the logical model (relationships, cardinality, attributes, and candidate keys), and the physical database (capacity planning, object creation, and aggregation strategies) for Oracle and Teradata.
  • Imported and cleansed high-volume data from various sources such as Teradata, Oracle, flat files, and MS SQL Server.
  • Designed Logical & Physical Data Model /Metadata/ data dictionary using Erwin for both OLTP and OLAP based systems.
  • Reverse Engineered DB2 databases and then forward engineered them to Teradata using ER Studio.
  • Part of the team conducting logical data analysis and data modeling JAD sessions; communicated data-related standards.
  • Involved in meetings with SME (subject matter experts) for analyzing the multiple sources.
  • Involved in SQL queries and optimizing the queries in Teradata.
  • Wrote and executed SQL queries to verify that data has been moved from transactional system to DSS, Data warehouse, data mart reporting system in accordance with requirements.
  • Worked extensively on ER Studio for multiple Operations across Atlas Copco in both OLAP and OLTP applications.
  • Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
  • Worked with the DBA to convert logical Data models to physical Data models for implementation.
  • Involved in preparing the design flow for the DataStage objects to pull data from various upstream applications, perform the required transformations, and load the data into various downstream applications.
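The source-to-target mapping work described above can be sketched as a small mapping spec plus an apply step; the column names and transforms below are hypothetical:

```python
# Hypothetical source-to-target mapping: target column -> (source column, transform)
MAPPING = {
    "cust_id":   ("CUSTOMER_NO", int),        # cast text key to integer
    "full_name": ("CUST_NAME", str.title),    # normalize capitalization
    "state_cd":  ("STATE", str.upper),        # standardize state code
}

def apply_mapping(source_row, mapping):
    """Transform one source row into its target shape per the mapping spec."""
    return {tgt: fn(source_row[src]) for tgt, (src, fn) in mapping.items()}

row = {"CUSTOMER_NO": "1042", "CUST_NAME": "jane doe", "STATE": "tx"}
print(apply_mapping(row, MAPPING))
# {'cust_id': 1042, 'full_name': 'Jane Doe', 'state_cd': 'TX'}
```

In an ETL tool the same spec would live in a mapping document or transformation stage; keeping it as data (rather than code) is what makes gap analysis and impact analysis tractable.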

Environment: Oracle Data Modeler, Business Objects, Erwin r8.2, Oracle SQL Developer, SQL Server 2008, Teradata, ER/Studio, SSIS, Windows, MS Excel.

Confidential

Data Analyst

Responsibilities:

  • Responsible for the development and maintenance of Logical and Physical data models, along with corresponding metadata, to support Applications.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Created conceptual, logical and physical data models using best practices and company standards to ensure high data quality and reduced redundancy.
  • Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting and writing data.
  • The project involved production, test, and development administration and support for the client's existing DB2 UDB platform running DB2 UDB v9.1 and v8.2 on servers under various operating systems.
  • Attended and participated in information and requirements gathering sessions.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Designed star and snowflake data models for the enterprise data warehouse using Erwin.
  • Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
  • Validated and updated the appropriate LDMs against process mappings, screen designs, use cases, the business object model, and the system object model as they evolved and changed.
  • Excellent knowledge and experience in Technical Design and Documentation.
  • Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
  • Involved in preparing the design flow for the DataStage objects to pull data from various upstream applications, perform the required transformations, and load the data into various downstream applications.
  • Performed logical data modeling, physical data modeling (including reverse engineering) using the Erwin Data Modeling tool.

Environment: Oracle 9i, PL/SQL, Solaris 9/10, Windows Server, NZSQL, Erwin, Toad, Informatica, IBM OS/390 (V6.0), DB2 V7.
