Senior Data Analyst Resume
SUMMARY:
- 5 years of work experience in the software industry, mainly as a Big Data and Spark Developer on data warehousing projects at Confidential (System Engineer) and Confidential (Senior Data Analyst)
- Exposure to emerging technologies such as Big Data (HDFS, Hadoop, Hive, Impala, YARN, Spark, Sqoop, Oozie, Python, Core Java), Oracle 11g PL/SQL, data warehousing, Kafka, and Flume
- Informatica PowerCenter 9.6, basics of machine learning with Python, QlikView, Tableau, and basic knowledge of React JS and JavaScript
PROFESSIONAL SKILLS AND COMPETENCIES:
- Professional work experience on data warehousing projects
- Hands-on experience in HDFS, Hive, Spark Core, Spark SQL, Python, Impala, Big Data/Hadoop, Sqoop, Oozie, and Pig
- Extensive hands-on experience developing a PySpark framework using Python libraries (pandas, datapackage, scikit-learn, matplotlib, etc.) and Spark data structures (RDD, DataFrame, and Dataset); see the sketch after this list
- Used Git with Bitbucket and Artifactory for repository management, and Jira for SAFe (Scaled Agile)
- Experience scheduling jobs in Oozie, CA7, and Control-M
- Worked with Anaconda Navigator (Jupyter Notebook) and Spyder for Python and Spark development, Informatica PowerCenter 9.6 for ETL mappings, Teradata Studio and PL/SQL Developer for SQL queries, and the Hue web interface for Hive and Impala queries and Hadoop file management
- Working knowledge of reporting tools such as BO, OBIEE, Tableau, and QlikView
- Basic knowledge of machine learning and deep learning with Python (scikit-learn, Seaborn, matplotlib, pandas, goodtables, datapackage), plus React JS and JavaScript
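A minimal sketch of the Spark data structures and pandas interop named above, assuming a local Spark session; the file path and column names are hypothetical, not taken from any project described here.

```python
# Minimal PySpark sketch: DataFrame, RDD, and pandas interop.
# The file path and column names below are hypothetical examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sketch").getOrCreate()

# DataFrame API: read a CSV feed and aggregate.
df = spark.read.csv("feeds/sample.csv", header=True, inferSchema=True)
df.groupBy("region").count().show()

# RDD API: the same rows at a lower level, for custom transformations.
counts = df.rdd.map(lambda r: (r["region"], 1)).reduceByKey(lambda a, b: a + b)
print(counts.collect())

# pandas interop, e.g. for matplotlib plots or scikit-learn feature prep.
pdf = df.toPandas()
```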
KEY AREAS OF WORK:
- Creating complex Hive, Spark SQL, and Impala queries
- PySpark application design
- Connecting Spark to Oracle through the JDBC driver (see the sketch after this list)
- Data ingestion using Sqoop
- Creating PL/SQL procedures
- ETL Design & Development
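A minimal sketch of the Spark-to-Oracle JDBC connection referenced above; the host, service name, table, and credentials are placeholders, and the Oracle JDBC jar must be on the Spark classpath.

```python
# Minimal sketch: read an Oracle table into a Spark DataFrame over JDBC.
# URL, table name, and credentials are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-jdbc").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB")
    .option("dbtable", "SCHEMA.SOURCE_TABLE")
    .option("user", "etl_user")
    .option("password", "***")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)
df.show(5)
```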
PROFESSIONAL EXPERIENCE:
Confidential
Senior Data Analyst
Environment: Anaconda Navigator (Jupyter), Spark 2, PySpark, Hive Shell, Unix, Hue, WinSCP, Python
Responsibilities:
- Managed the ISR and BCM raw feed files provided by the corresponding users in a designated server location; extracted and loaded those feed files into partitioned Hive tables (partitioning based on requirements) in the SDL (Secure Data Lake) feed framework.
- Applied business logic and transformations to create intermediate views on that data, then loaded the derived data back into Hive tables in the IRRM project space; for reporting, QlikView processes the derived data and generates reports. A sketch of this flow follows.
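A minimal sketch of this feed-to-Hive flow, assuming a Hive-enabled Spark session; the paths, schema and table names, partition column, and business logic are hypothetical placeholders.

```python
# Minimal sketch of the feed-load flow: raw file -> partitioned Hive table
# -> intermediate view -> derived Hive table. All names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("sdl-feed-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Load a raw feed file and stamp a partition column onto it.
raw = spark.read.csv("/landing/isr/feed.csv", header=True, inferSchema=True)
raw = raw.withColumn("business_date", F.lit("2020-01-31"))

# Append into a partitioned Hive table in the feed-framework schema.
raw.write.mode("append").partitionBy("business_date").saveAsTable("sdl.isr_feed")

# Apply (placeholder) business logic as an intermediate view, then persist
# the derived data for downstream reporting.
spark.sql("""
    CREATE OR REPLACE TEMPORARY VIEW isr_derived AS
    SELECT business_date, COUNT(*) AS record_count
    FROM sdl.isr_feed
    GROUP BY business_date
""")
spark.table("isr_derived").write.mode("overwrite").saveAsTable("irrm.isr_summary")
```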
Confidential
Environment: Anaconda Navigator (Jupyter), Spark 2, PySpark, Hive Shell, Unix, Hue, WinSCP, Python
Big Data and Spark Developer
Responsibilities:
- Created mappings, sessions, command tasks, workflows, CA7 jobs, and Unix scripts for email generation
Confidential
Environment: Python, Hive, Impala, Spark, Sqoop, Teradata, Hadoop
Hadoop Developer
Responsibilities:
- Involved in creating a framework from scratch that replaced the existing system's ETL (see the ingestion sketch below).
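One way such a framework might wrap Sqoop for ingestion, sketched in Python; this is an illustration only, with a hypothetical Teradata connection string, credentials path, and table names, and it assumes a Teradata JDBC connector is available to Sqoop.

```python
# Hypothetical sketch: drive a Sqoop import from a Python framework.
# Connection string, credentials path, and table names are placeholders.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:teradata://td-host/DATABASE=sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.password",
    "--table", "ORDERS",
    "--hive-import",
    "--hive-table", "staging.orders",
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)
```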
Confidential
Developer
Responsibilities:
- Designed and developed Extract, Transform, and Load (ETL) processes using Informatica PowerCenter and Oracle Warehouse Builder; optimized PL/SQL code, wrote efficient SQL queries, and improved data quality through data analysis.
- Developed and performance-tuned the ETL code populating multiple layers (Stage, Integration (IDS), Data Mart, and Aggregate Data Mart) using Informatica, OWB, and PL/SQL.
- Supported data load processes by creating process flows and Control-M schedules.
- Analyzed functional requirements and planned execution to meet them during the development phase.
- Prepared a daily report in Microsoft Excel that is shared with the client.