
Big Data Analyst Resume

SUMMARY

  • Experienced in architecting, developing, and implementing Big Data solutions in core and enterprise software development initiatives, building applications that perform large-scale distributed data processing for Big Data analytics using Java and the Hadoop ecosystem: Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Scala, Spark, Spark SQL, Spark Streaming, MLlib, Oozie, ZooKeeper, Flume, YARN, Tez, and MLCP.
  • Hands-on experience with the major Hadoop distributions (Cloudera, Hortonworks, MapR).
  • Experience with the Talend Big Data Enterprise, Talend Data Integration, and Talend Data Quality platforms, covering transformations, file and database handling, and exception handling in Talend, as well as the Talend Administration Center (TAC) for deployment, job scheduling, and adding users.
  • Strong skills in IBM DataStage 9.1/8.5/7.5, Talend 5.3/6.2, SQL programming, IBM DB2, Teradata, SQL Server, Oracle PL/SQL, Netezza, MarkLogic, MongoDB, HBase, Cassandra, MySQL, debugging, performance tuning, and shell scripting.
  • In-Depth knowledge and experience in design, development and deployments of Big Data projects using Hadoop / Data Analytics / NoSQL / Distributed Machine Learning frameworks.
  • Experience in design and development of MapReduce programs and user-defined functions (UDFs) in Java for Pig and Hive, and in handling ETL transformations using Pig Latin scripts, expressions, join operations, and custom UDFs for evaluating, filtering, and storing data.
  • Experience using Sqoop to move data between RDBMS and HDFS in both directions, and using Flume to extract data from log files and copy it into HDFS.
  • Experience using the Spark SQL and MLlib libraries and converting business processes into RDD transformations using Spark and Scala; a brief Spark SQL sketch in Scala follows this summary.
  • Experienced in capacity planning, design, deployment, troubleshooting, and tuning of Hadoop clusters.
  • Expertise in logical and physical modelling of the Landing, Staging, Foundation, and Mart layers.
  • Used Erwin Model Mart for effective model management, sharing, dividing, and reusing model information and designs to improve productivity.
  • Created high-level design documents and source-to-target mappings for ETL/ELT processes.
  • Reviewed SQL to ensure efficient execution plans and adherence to CPU and performance thresholds, working closely with architects to adjust SQL as necessary.
  • Extensive experience with data extraction, transformation, and loading (ETL) from multiple data sources (Oracle, DB2, SQL Server, XML, flat files) into analytical and enterprise data models.
  • Coded complex SQL to load data into Foundation and Aggregate tables.
  • Expertise in automating build processes, application deployments, and continuous integration using tools such as Jenkins, TeamCity, Terraform, and TFS.
  • Led teams of 5 to 10 people, guiding developers through project tasks such as planning, requirement gathering, source identification, and project execution, including planning of implementation activities.
  • Good knowledge of the Retail, Healthcare, Insurance, and Logistics domains.
  • Worked in both Waterfall and Agile/Scrum methodologies.
  • Experience in database programming using Microsoft technologies (MS SQL Server, SSIS, SSRS, MS Excel).
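
As a rough illustration of the Spark SQL and RDD transformation work mentioned in the summary above, the following minimal Spark/Scala sketch reads a Hive table, applies a simple filter-and-aggregate business rule, and writes the result back to Hive. The table and column names (staging.claims, member_id, claim_amount, and so on) are illustrative placeholders, not references to any actual project data.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ClaimsRollup {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("claims-rollup")
          .enableHiveSupport()
          .getOrCreate()

        // Read a Hive staging table registered in the metastore.
        val claims = spark.table("staging.claims")

        // Express a business rule as Spark SQL/DataFrame transformations:
        // keep approved claims and total the claim amount per member.
        val perMember = claims
          .filter(col("claim_status") === "APPROVED")
          .groupBy(col("member_id"))
          .agg(sum(col("claim_amount")).as("total_approved"))

        // Persist the aggregate back to Hive for downstream reporting.
        perMember.write.mode("overwrite").saveAsTable("mart.member_claim_totals")

        spark.stop()
      }
    }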

PROFESSIONAL EXPERIENCE

Confidential

Big Data Analyst

Responsibilities:

  • Gathered the business requirements from the Business Partners and SMEs.
  • Involved in data analysis and preparation of design documents and mappings, including HIPAA-related mappings.
  • Developed data ingestion, pre-processing, and post-ingestion transformations for various data sources (CDB, ORx, Cirrus) and loaded the data into HDFS, Hive, and HBase.
  • Created Talend jobs to maintain audit and status information for ingestion in the SPP, EPP, EIT, PET, and PRT layers.
  • Involved in deploying multiple modules using Talend TAC.
  • Developed ETL mappings for XML, CSV, and TXT sources and loaded the data from these sources into relational tables with Talend ETL.
  • Developed Talend MDM solutions to manage master data for single or multiple domains: customers, patients, members, locations, providers, service offerings, and accounts.
  • Implemented data integration processes with Talend Big Data Integration.
  • Designed, developed, and deployed end-to-end data integration solutions.
  • Developed Talend jobs to load data into Hadoop and Hive, using most of the Talend Big Data components such as tHDFSInput, tHDFSOutput, and tHiveLoad; a rough Spark equivalent of this kind of load is sketched after this list.
  • Used different components in Talend such as tMap, tJoin, tFileInputDelimited, tForeach, tLoop, tConvertType, tFixedFlowInput, tFileOutputDelimited, tJava, tJavaFlex, tUnique, tFlowToIterate, tIntervalMatch, tLogCatcher, tFileList, tAggregate, tSort, tMDMInput, and tMDMOutput.
  • Extensively worked with Hadoop datasets to handle large volumes of data.
  • Involved in developing Unix Shell Scripts/Batch Scripts.
  • Worked extensively with Sqoop to import data from SQL Server and Teradata, using CDC for incremental data.
  • Imported JSON document files into the MarkLogic database using MLCP scripts.
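
The ingestion jobs above were built with Talend components (for example tFileInputDelimited feeding tHiveLoad) rather than hand-written code; purely as a rough Spark/Scala equivalent of that kind of delimited-file-to-Hive load, here is a minimal sketch. The HDFS path, delimiter, and table name are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object DelimitedToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("delimited-to-hive")
          .enableHiveSupport()
          .getOrCreate()

        // Read pipe-delimited files with a header row from an HDFS landing directory.
        val src = spark.read
          .option("header", "true")
          .option("delimiter", "|")
          .csv("hdfs:///landing/source_feed/")   // placeholder path

        // Append the records to a Hive staging table.
        src.write.mode("append").saveAsTable("staging.source_feed")

        spark.stop()
      }
    }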

Confidential

ETL Developer

Responsibilities:

  • Designed ELT processes that extract information from various foundational entities to define the set of items/stores eligible for Instock Tracking.
  • Implemented error processing and count matching wherever applicable.
  • Performed high-level capacity planning for the database.
  • Defined data refresh frequency and mechanism.
  • Designed ELT processes for the initial load that extract information from foundation data sources.
  • Provided high-level coverage of SLA details and their dependencies on the various subject areas.
  • Designed DataStage jobs to migrate data from source systems to the landing area in Teradata.
  • Created a business-driven approach to feeding data downstream.
  • Analyzed, designed, developed, implemented, and maintained parallel jobs using IBM InfoSphere DataStage.
  • Extracted data from flat files, transformed it according to requirements, and loaded it into target tables using stages such as Sequential File, Lookup, Aggregator, Transformer, Join, Remove Duplicates, Change Capture, Sort, Column Generator, Funnel, and Oracle Enterprise; a rough Spark equivalent of this transformation flow is sketched after this list.
  • Used DataStage Director to schedule and run jobs, test and debug their components, and monitor performance statistics.
  • Developed DataStage jobs to populate data into the staging and data mart layers.
  • Worked on Teradata optimization and performance tuning.
  • Interacted with the offshore team.
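
The parallel-job logic above (Lookup, Remove Duplicates, Aggregator, and Sort stages) was implemented in IBM InfoSphere DataStage; purely as an illustration, the following Spark/Scala sketch expresses a comparable flow. File paths, join keys, and column names are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object InstockEligibility {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("instock-eligibility").getOrCreate()

        // Flat-file sources (equivalent of Sequential File stages).
        val sales  = spark.read.option("header", "true").csv("hdfs:///landing/sales/")
        val stores = spark.read.option("header", "true").csv("hdfs:///landing/stores/")

        val eligible = sales
          .join(stores, Seq("store_id"), "left")       // Lookup / Join stage
          .dropDuplicates("store_id", "item_id")       // Remove Duplicates stage
          .groupBy("store_id")                         // Aggregator stage
          .agg(count("item_id").as("eligible_items"))
          .orderBy("store_id")                         // Sort stage

        // Land the result for the downstream target load.
        eligible.write.mode("overwrite").parquet("hdfs:///landing/instock_eligible/")

        spark.stop()
      }
    }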

Environment: IBM InfoSphere DataStage 8.5, IBM InfoSphere DataStage 8.1, DataStage 8.0.1 (Designer, Director, Administrator), IBM DB2/UDB, Teradata 14, UNIX, JIRA, ERWIN, Data Quality, Control-M, CA7, Teradata Viewpoint, Netezza, JCL, COBOL, FTP, SFTP, SSIS, SSAS, SSRS.

Confidential

ETL Developer

Responsibilities:

  • Involved in the creation and review of functional requirement specifications and supporting documents for business systems; experienced in the database design and data modeling processes.
  • Experience in creating Star schemas and Snowflake schemas.
  • Designing, building and testing all reports, ensuring delivery according to time constraints and programmed goals.
  • Supported project team members and worked alongside third parties to achieve project goals.
  • Developed test cases; involved in unit, system integration, and user acceptance testing to ensure the accuracy of the functionality, and supported end users in UAT.
  • Debugged, monitored, and troubleshot BI solutions.
  • Attended recovery sessions to resolve business issues.
  • Involved in version control using VSS and in keeping design documents and code up to date.
  • Collaborated with the EDW team on high-level design documents for the extract, transform, validate, and load (ETL) process, including data dictionaries, metadata descriptions, file layouts, and flow diagrams.
  • Collaborated with the EDW team on low-level design documents for mapping files from source to target and implementing business logic.
  • Developed DataStage jobs to populate data into the staging and data mart layers.
  • Developed numerous FastLoad, MultiLoad, TPump, FastExport, and BTEQ scripts for loading data; a rough Spark-based alternative to this kind of load is sketched after this list.
  • Created Teradata external loader connections such as MLoad and FastLoad while loading data into target tables in the Teradata database.
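
The loads above used the Teradata utilities themselves (FastLoad, MultiLoad, TPump, FastExport, BTEQ); as an alternative illustration only, the following minimal Spark/Scala sketch writes a staged DataFrame to Teradata over JDBC. The host, database, table, and credentials are placeholders, and the Teradata JDBC driver is assumed to be available on the classpath.

    import org.apache.spark.sql.SparkSession

    object LoadToTeradata {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("load-to-teradata").getOrCreate()

        // Placeholder source data staged on HDFS.
        val df = spark.read.parquet("hdfs:///staging/target_feed/")

        // JDBC append into a Teradata target table (placeholder connection details).
        df.write
          .format("jdbc")
          .option("url", "jdbc:teradata://td-host/DATABASE=edw")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "edw.target_table")
          .option("user", "etl_user")
          .option("password", "********")
          .mode("append")
          .save()

        spark.stop()
      }
    }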

Environment: IBM InfoSphere DataStage 8.5, IBM InfoSphere DataStage 8.1, DataStage 8.0.1 (Designer, Director, Administrator), IBM DB2/UDB.
