Big Data Architect / Data Scientist Resume
SUMMARY
- 15+ years of experience in the software industry across a wide range of projects. Worked in all areas of the software life cycle, from requirements through support and maintenance. Collected and analyzed business data to assist with strategic decision making, particularly as applied to software.
- 4+ years of experience as a Hadoop Architect/Senior Hadoop Technical Lead with strong exposure to Hadoop technologies such as HDFS, MapReduce, Hive, Spark, HBase, Sqoop, HCatalog, Pig, ZooKeeper, Flume, and Mahout.
- Expert in understanding data and designing/implementing enterprise platforms such as Hadoop data lakes and large data warehouses.
- Experience with data flow diagrams, data dictionaries, database normalization theory, and entity-relationship modeling and design techniques.
- Expertise in client-server application development using Oracle 11g/10g/9i/8i, PL/SQL, SQL*Plus, TOAD, and SQL*Loader.
- Made effective use of table functions, indexes, table partitioning, collections, analytical functions, materialized views, query rewrite, and transportable tablespaces.
- In-depth experience in translating key strategic objectives into actionable and governable roadmaps and designs using best practices and guidelines. Worked on all facets of the software development life cycle.
- Good knowledge of the Hadoop ecosystem, HDFS, Big Data, ETL (Informatica/SSIS/Sagent), reporting (Cognos from Impromptu to 10.2.x), and RDBMS.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
- Proficient in writing ad hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Collected and analyzed large sets of log data using custom-built input adapters and Sqoop.
- Expertise in Informatica PowerCenter 9.x/8.x/7.x/6.x, extracting data from Oracle, SQL Server, and DB2 databases.
- Strong experience in extraction, transformation, and loading (ETL) of data from various sources into data warehouses and data marts using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Metadata Manager), PowerExchange, and PowerConnect as ETL tools on Oracle, DB2, and SQL Server databases.
- Extensive experience in Extraction, Transformation and Loading of data using Informatica from heterogeneous sources.
- Extensive experience with all major domains.
- Extensive knowledge of dimensional data modeling, such as star and snowflake schemas, and of design tools such as Erwin and PowerDesigner.
- Extensively worked on development of Informatica Mappings and Workflows.
- Design and development of OLAP models for analysis.
- Involved in identification of facts, measures, dimensions and hierarchies for OLAP models.
- Good design and coding skills.
- Experience in debugging ETL jobs to check for errors and warnings associated with each job run.
- Provided production support and performed enhancements on multiple existing projects.
- Experience in creating and using Stored Procedures, Functions, Triggers, Views, Synonyms, and Packages in SQL Server 2000/2005, Oracle 10g/9i/8i and DB2.
- Involved in performance and query tuning, including generating and interpreting explain plans and tuning SQL to improve performance.
- Experience in scheduling ETL jobs using crontab and Control-M.
- Knowledge in developing reports using Business Intelligence tools like Business Objects and Cognos 10.x/TM1.
- Maintained outstanding relationships with Business Analysts and Business Users to identify information needs as per business requirements.
- Experience in working in an onsite-offshore structure and effectively coordinated tasks between onsite and offshore teams.
- A highly motivated self-starter and a good team-player with excellent verbal and written communication skills.
- Developed mapplets and reusable transformations.
TECHNICAL SKILLS
ETL: Informatica 9.1.x, SSIS/SSAS, DataStage, Ab Initio, and Sagent
Big Data: Hadoop, MapReduce, Hive, HBase, Spark, Mahout, Sqoop, Pig, ZooKeeper, Flume, Oozie
Reporting Tools: Cognos 10.x, SSRS, Cognos TM1, Hyperion Intelligence Client 8.5
RDBMS (databases): Oracle 10g/9i/8i, MS SQL Server 2000/7.0, MS Access
GUI: Visual Basic 4.0/5.0/6.0
Languages: C, C++
Operating Systems: Windows XP/NT/2000/98
PROFESSIONAL EXPERIENCE
Confidential
Big Data Architect / Data Scientist
Responsibilities:
- Involved in data acquisition, data pre-processing, and data exploration of claims/customers digital data.
- As part of data acquisition, used Sqoop and Flume to ingest data from servers into Hadoop using incremental import.
- In the pre-processing phase, used Spark to remove missing data and perform data transformations to create new features.
- In the data exploration stage, used Hive and Impala to gain insights into the customer data.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build the data pipeline.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Handled cluster coordination services through ZooKeeper.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
Confidential
Hadoop Architect (Hands-on)
Responsibilities:
- Implemented high-volume data integration solutions using Hadoop ecosystem tools for transferring large datasets generated by applications to Hadoop file system.
- Streamed IoT device data using Azure Event Hubs and Kafka/Kinesis, storing it in AWS DynamoDB/Hive databases to generate reports and provide analytical insights to the business.
- Transformed and processed streamed data in real time using Apache Spark, Storm, and AWS Lambda to convert unstructured data to a structured format for analysis.
- Transformed streaming data from Kinesis using Apache Spark programs written in Scala with the streaming packages, and stored the results in the data warehouse for reporting.
- Developed MapReduce jobs, Hive scripts, Pig scripts, Unix shell scripts, and Spark programs using Java/Python/Scala for all ETL loading processes and for converting files into Parquet and JSON in the Hadoop File System.
- Used Apache Oozie to schedule Hadoop jobs that move data between RDBMS and HDFS.
- Exported and imported high-volume dimension data from RDBMS to HDFS using Sqoop.
- Developed REST APIs and deployed them on AWS Elastic Beanstalk to access records in the data warehouse.
- Visualized real-time data along with enriched data in the SQL database using Power BI and Tableau, creating dashboards and reports that give the business insight into which actions need to be taken.
- Worked closely with the management team to develop proofs of concept for projects using cloud-based big data technologies.
Confidential
Big Data Architect/Technical Lead
Responsibilities:
- Analyzed Teradata procedures to prepare information on all individual queries.
- Developed Hive queries according to business requirements.
- Developed Hive UDFs for cases where Hive has no built-in function.
- Developed a UDF for converting data from Hive tables to JSON format per client requirements.
- Implemented dynamic partitioning and bucketing in Hive as part of performance tuning.
- Implemented the workflow and coordinator files using the Oozie framework to automate tasks.
- Involved in unit, integration, and system testing.
- Prepared all unit test case documents and flow diagrams for all scripts used in the project.
- Scheduled and managed jobs on a Hadoop cluster using Oozie workflows.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Transformed unstructured data into structured data using Pig.
- Imported data from MySQL to HDFS on a regular basis using Sqoop.
- Designed and developed Pig Latin scripts to process data in batches for trend analysis.
- Good experience with Hadoop tools such as MapReduce, Hive, and HBase.
- Worked with both external and managed Hive tables for optimized performance.
- Developed Hive scripts to meet analyst requirements.
- Maintained data import scripts using Hive and MapReduce jobs.
- Performed data design and analysis to handle large amounts of data.
- Cross-checked data loaded into Hive tables against the source data in Oracle.
- Worked closely with QA and Operations teams to understand, design, and develop end-to-end data flow requirements.
- Used Oozie to schedule workflows.
- Developed structured, efficient, and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
- Stored, processed, and analyzed large data sets to extract valuable insights.
Confidential
Hadoop Technical Developer
Responsibilities:
- Developed Hive scripts to select the delta (CDC) and load it into HBase tables using Pig scripts.
- Transformed data using Pig scripts.
- Developed MapReduce scripts to count large numbers of records in HBase tables.
- Worked on various Hive optimization and performance tuning techniques.
- Worked on ingestion of logs into Hadoop using Flume and Kafka.
- Processed logs using Spark Streaming and loaded them into Hive tables.
- Used Hive SerDe to read and write data in different formats.
- Involved in loading data from UNIX file system to HDFS.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS.
- Involved in design, architecture, and installation of Big Data and Hadoop ecosystem components.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Created workflows using Oozie.
- Automated Hadoop jobs using the Oozie scheduler.
- Gained very good business knowledge of health insurance, claims processing, fraud suspect identification, the appeals process, etc.
Confidential
Informatica Developer
Responsibilities:
- Led a team of 12 resources, including 2 onshore.
- Responsibilities included project planning, effort estimation, work scheduling, delivery scheduling, and process compliance.
- Developed realistic action plans with time schedules, critical dates, and resource and cost estimates.
- Handled change control management and defect density analysis and prevention.
- Set up the complete SDLC processes for the project.
- Identified, controlled, and managed risks in the project.
- Responding to a change in the client's IT direction, promptly prepared for construction of the planning and forecasting tool and developed a global enterprise-wide application using Cognos TM1.
- Designed, developed, and implemented Cognos TM1 applications for financial planning and budgeting models with complex rules.
