
Sr. Data Analyst Resume


OH

SUMMARY

  • Nearly 8 years of working experience in data engineering and Hadoop development.
  • End-to-end experience in designing and deploying data visualizations using Tableau.
  • Worked in various domains including luxury and telecommunications.
  • Strong knowledge of reporting objects such as hierarchies, filters, calculated fields, sets, groups, and parameters.
  • Developed dashboard reports covering key performance indicators for top management.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, YARN, High Availability, and the MapReduce programming paradigm.
  • Experience in analyzing data using HiveQL, HBase, and custom MapReduce programs in Java.
  • Extended Pig and Hive core functionality by writing custom UDFs.
  • Wrote ad-hoc queries for analyzing data using HiveQL.
  • Developed real-time read/write access to very large datasets via HBase.
  • Experience in integration of various data sources in RDBMSs such as Oracle and SQL Server.
  • Used NoSQL databases including HBase, MongoDB, and Cassandra.
  • Worked on big data integration and analytics based on Hadoop, Spark, Kafka, and webMethods.
  • Implemented Sqoop jobs for migrating large sets of structured and semi-structured data between HDFS and other data stores such as Hive or RDBMSs.
  • Consolidated MapReduce jobs by implementing Spark, decreasing data processing time (see the sketch after this summary).
  • In-depth understanding of scalable machine learning libraries such as Apache Mahout and MLlib.
  • Used Maven as the source build framework.
  • Theoretical knowledge of the Genesys tools, such as the code editor dev tool and the search query builder dev tool.
  • Implemented and managed ETL solutions and automated operational processes.
  • Presented data in a visually appealing tool, Tableau.
  • Experienced in Agile and Waterfall methodologies.
  • Knowledge of data mining and machine learning, including classification, clustering, regression, and anomaly detection.
  • Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
  • Work successfully in fast-paced environments, both independently and in collaborative teams.
  • Played a key role in migrating Cassandra and Hadoop clusters to AWS and defined different read/write strategies.
  • Knowledge of Informatica/ETL (Extract, Transform, and Load) of data into a data warehouse/data mart, and of Business Intelligence (BI) tools such as the Business Objects modules Reporter, Supervisor, Designer, and Web Intelligence.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Good experience with the Oozie framework and automating daily import jobs.
  • Experience in creating robust and reliable data pipelines.
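
A minimal PySpark sketch of the MapReduce-to-Spark consolidation mentioned in this summary; the HDFS paths, column names, and aggregation are hypothetical illustrations of the pattern, not code from a specific project.

    # Consolidate a MapReduce-style sum-by-key job into a single Spark job.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("mapreduce-consolidation").getOrCreate()

    # Read raw events from HDFS (hypothetical path and columns).
    events = spark.read.option("header", True).csv("hdfs:///data/raw/events/")

    # The old mapper emitted (device_id, bytes) pairs and the reducer summed them;
    # the same logic collapses to one groupBy/agg in Spark.
    usage = (events
             .withColumn("bytes", F.col("bytes").cast("long"))
             .groupBy("device_id")
             .agg(F.sum("bytes").alias("total_bytes")))

    usage.write.mode("overwrite").parquet("hdfs:///data/curated/device_usage/")
    spark.stop()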

TECHNICAL SKILLS

Big Data: HDFS, MapReduce, Hive, Spark, Kafka, Pig, Sqoop, Flume, Oozie, Zookeeper, ETL

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Cloud: AWS (EC2, S3, Redshift, Lambda, RDS, EBS, CloudWatch), Azure (Data Factory, Data Lake), Google Cloud Platform

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Analyst

Responsibilities:

  • Involved in NiFi workflow validation, checking the match rate of data between the raw and final layers.
  • Involved in collecting, validating, and analyzing metadata; manipulated data, developed visualizations, and triangulated and summarized results.
  • Built financial and data models with predictive analytics, mapped schemas between the different data sources, and documented the work in Confluence.
  • Worked on NiFi data pipelines to process large data sets from source to destination databases and configured lookups for data validation and integrity.
  • Made updates to the existing Snowpipe flow and deployed the changes.
  • Worked on clustering the NiFi pipeline on EC2 nodes, integrated with Spark and Kafka running on other instances in the production environment.
  • Used the Spark SQL Python interface, which automatically converts RDDs of structured records into schema RDDs (DataFrames).
  • Performed validations and consolidations of the imported data, data migration, and data generation.
  • Worked on developing ETL pipelines over S3 Parquet files in the data lake using AWS Glue, and performed data analytics on S3 buckets using PySpark on the Databricks platform (see the sketch after this list).
  • Performed data quality issue analysis using SnowSQL by building analytical warehouses on Snowflake.
  • Involved in the code migration of a quality monitoring tool from AWS EC2 to AWS Lambda and built logical datasets to administer quality monitoring on Snowflake warehouses.
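
A minimal PySpark sketch of the S3 Parquet analytics described above, suitable for a Databricks or Glue Spark environment; the bucket, prefix, and column names are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3-parquet-analytics").getOrCreate()

    # Read Parquet files from the data lake (hypothetical bucket/prefix).
    orders = spark.read.parquet("s3://example-datalake/curated/orders/")

    # Basic validation: total rows and per-column null counts before consolidation.
    print("rows:", orders.count())
    orders.select([F.sum(F.col(c).isNull().cast("int")).alias(c) for c in orders.columns]).show()

    # Simple consolidation: daily totals written back for downstream reporting.
    daily = (orders.groupBy("order_date")
             .agg(F.count(F.lit(1)).alias("orders"), F.sum("amount").alias("revenue")))
    daily.write.mode("overwrite").parquet("s3://example-datalake/analytics/daily_orders/")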

Confidential, OH

Data Engineer/Sr. Data Analyst

Responsibilities:

  • Involved in data cleaning and pre-processing. Performed various transformations on the raw data layer and stored the results in the curated layer.
  • Extracted data from the raw layer to the curated layer with SQL queries.
  • Designed, built, and enhanced data procedures from ingestion through the endpoints that make data usable.
  • Extracted, transformed, and loaded data from source systems into databases in Snowflake.
  • Created DAGs in Airflow using Python to extract, transform, and load data from different sources such as Azure SQL and Blob Storage (see the sketch after this list).
  • Monitored the deployed schedules, tracked the flow of data, and analyzed the results.
  • Migrated data from the raw layer to the app layer in Snowflake.
  • Exposure to the Margin Miner software used for filtering and extracting sales data belonging to various retailers.
  • Created ER diagrams following the business rules and requirements using the Lucidchart software.
  • Provided support for building reports by helping with the logic and with connecting to the data.
  • Worked with Tableau worksheets to perform functional testing on the reports and documented the test results.
  • Created tables in Snowflake and wrote ETL scripts to load data into the curated layer in Snowflake from the existing tables while maintaining row-level security.
  • Hands-on experience developing SQL scripts for automation.
  • Created DDLs for tables and executed them to create tables in the warehouse for ETL data loads.
  • Built reusable data ingestion and data transformation frameworks using Python.
  • Performed data modeling based on clustering and classification algorithms using Python.
  • Loaded a large number of CSV files with differing schemas into tables with a defined schema using PySpark.
  • Experienced with PySpark; worked on Azure Databricks and created automation scripts.
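
A minimal Airflow DAG sketch of the extract/load scheduling described above; the DAG name, schedule, and task callables are hypothetical placeholders rather than the actual production jobs.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_from_blob(**context):
        # Placeholder: pull the day's files from Azure SQL / Blob Storage.
        pass

    def load_to_snowflake(**context):
        # Placeholder: stage and copy the extracted files into Snowflake.
        pass

    with DAG(
        dag_id="blob_to_snowflake_daily",   # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_from_blob", python_callable=extract_from_blob)
        load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)
        extract >> load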

Confidential, MA

Data Engineer

Responsibilities:

  • Installed Hadoop, MapReduce, HDFS, and AWS components, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Implemented solutions for ingesting data from various sources and processing the data at rest using big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Implemented a Spark GraphX application to analyze guest behavior for data science segments.
  • Explored Spark to improve the performance and optimization of the existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
  • Developed Talend mappings using various transformations, sessions, and workflows. Teradata was the target database; the sources were a combination of flat files, Oracle tables, Excel files, and a Teradata database.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Created data pipelines per the business requirements and scheduled them using Oozie coordinators.
  • Worked with the NoSQL database HBase for real-time data analytics.
  • Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping, design, and review.
  • Designed, implemented, and improved data pipelines throughout the data platform, from data ingestion through the endpoints used to make data actionable.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as MapReduce, Hive, Pig, and Sqoop.
  • Created scripts for importing data from DB2 into HDFS/Hive using Sqoop.
  • Loaded data from different sources (databases and files) into Hive using Talend.
  • Conducted POCs for ingesting data using Flume.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both RDBMS and NoSQL data stores for data access and analysis.
  • Used all major ETL transformations to load the tables through Informatica mappings.
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies on historical data before promoting them to production.
  • Worked on data modeling and advanced SQL with columnar databases on AWS; transformed data from Hive to AWS S3 storage using AWS Glue.
  • Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
  • Used Talend to extract, transform, and load data into the Netezza data warehouse from various sources such as Oracle and flat files.
  • Used EDI structured messages for transmission between the systems.
  • Developed Pig scripts to parse the raw data, populate staging tables, and store the refined data in partitioned DB2 tables for business analysis.
  • Worked on managing and reviewing Hadoop log files; tested and reported defects within an Agile methodology.
  • Worked with Azure Databricks using PySpark alongside the other team for analysis.
  • Worked on establishing the required connections from GCP to the machine learning platform, Teradata, and DB2.
  • Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks within the team.
  • Wrote a Python program to maintain raw file archival in a GCS bucket (see the sketch after this list).
  • Conducted and participated in project team meetings to gather status and discuss issues and action items.
  • Provided support for research and resolution of testing issues.
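
A minimal sketch of the kind of Python raw-file archival job referenced above, assuming the google-cloud-storage client library; the bucket names, prefix, and retention window are hypothetical.

    from datetime import datetime, timedelta, timezone
    from google.cloud import storage

    RAW_BUCKET = "example-raw-bucket"          # hypothetical bucket names
    ARCHIVE_BUCKET = "example-archive-bucket"
    RETENTION_DAYS = 30

    def archive_old_raw_files():
        client = storage.Client()
        source = client.bucket(RAW_BUCKET)
        archive = client.bucket(ARCHIVE_BUCKET)
        cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)

        # Move raw files older than the retention window into the archive bucket.
        for blob in client.list_blobs(RAW_BUCKET, prefix="raw/"):
            if blob.time_created < cutoff:
                source.copy_blob(blob, archive, new_name=f"archive/{blob.name}")
                blob.delete()

    if __name__ == "__main__":
        archive_old_raw_files()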

Environment: Hadoop, Cloudera, Talend, Scala, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, YARN, NDM, Informatica, AWS, AWS Glue, Windows & Microsoft Office, RDBMS, Flume, Data Warehouse, Data Modelling, ETL, NoSQL, Data Lake, Oozie.

Confidential, Jersey, NJ

Big Data Engineer

Responsibilities:

  • Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, Zookeeper, and Spark.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
  • Performed data transformations in Hive and used partitions and buckets for performance improvements.
  • Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL queries for data aggregation and querying, and wrote data back into RDBMSs through Sqoop.
  • Designed and developed a data lake using Hadoop for processing raw and processed claims via Hive and Informatica.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
  • Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Created ETL/Talend jobs, both design and code, to process data into target databases.
  • Worked with Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase in Azure.
  • Experience in testing big data Hadoop (HDFS, Hive, Sqoop, and Flume), Master Data Management (MDM), and Tableau reports.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
  • Used Zookeeper to provide coordination services to the cluster.
  • Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Analyzed Hive table data using Databricks and PySpark.
  • Implemented a CI/CD pipeline with Docker and Jenkins (TFS plugin installed).
  • Designed and developed code, scripts, and data pipelines that leverage structured and unstructured data integrated from multiple sources.
  • Built Azure Data Warehouse table datasets for Power BI reports.
  • Imported data from sources like HDFS/HBase into Spark RDDs.
  • Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
  • Worked on BI reporting with AtScale OLAP for big data.
  • Implemented Kafka for streaming data and filtered and processed the data.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning (see the sketch after this list); handled load balancing of ETL processes and database performance tuning for ETL processing tools.
  • Created Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code to capture global map variables and use them in the job.
  • Developed data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Brought data from various sources into Hadoop using Kafka.
  • Developed shell scripts for scheduling and automating the job flow.
  • Developed MapReduce jobs to calculate the total data usage by commercial routers in different locations, and developed MapReduce programs for data sorting in HDFS.
  • Loaded data from Teradata to HDFS using the Teradata Hadoop connectors.
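
A minimal PySpark Structured Streaming sketch of the Kafka-to-Hive streaming ETL pattern described above (the project itself used Scala); the broker, topic, schema, and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be available.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-streaming-etl").getOrCreate()

    # Hypothetical schema for the JSON messages on the Kafka topic.
    schema = StructType([
        StructField("router_id", StringType()),
        StructField("location", StringType()),
        StructField("bytes_used", DoubleType()),
    ])

    # Read the stream from Kafka (placeholder broker and topic names).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "router-usage")
           .load())

    # Parse and filter the events, then write them out as Parquet backing a Hive table.
    events = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e")).select("e.*")
    query = (events.filter(F.col("bytes_used") > 0)
             .writeStream
             .format("parquet")
             .option("path", "hdfs:///warehouse/router_usage/")
             .option("checkpointLocation", "hdfs:///checkpoints/router_usage/")
             .start())
    query.awaitTermination()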

Environment: Spark, YARN, Hive, Pig, Scala, Mahout, NiFi, Python, Hadoop, Azure, DynamoDB, Kibana, NoSQL, Sqoop, MySQL, Talend ETL, Flume, Oracle, OLAP, Kafka, Power BI Reports, Zookeeper, SAS, MapReduce, Shell Script, AWS Redshift.

Client: Protective Life Insurance, Birmingham, AL.

Role: Hadoop Developer

Responsibilities:

  • Responsible for loading customer data and event logs into HBase using the Java API.
  • Created HBase tables to store variable formats of input data coming from different portfolios.
  • Involved in adding huge volumes of data in rows and columns to store data in HBase.
  • Responsible for architecting Hadoop clusters with CDH4 on CentOS and managing them with Cloudera Manager.
  • Involved in initiating and successfully completing a proof of concept on Flume for pre-processing.
  • Used Flume to collect log data from different sources and transferred the data to Hive tables using different SerDes to store it in JSON, XML, and SequenceFile formats.
  • Used Hive to find correlations between customers' browser logs across different sites and analyzed them.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Experienced in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS/Cassandra cluster.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Created user accounts and gave users access to the Hadoop cluster. Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Developed Pig UDFs to pre-process the data for analysis. Loaded files into Hive and HDFS from MongoDB and Solr.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on a few exceptions raised by applications. Worked on processing JSON and Parquet files using Snowflake.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
  • Explored the various modules of Spark and worked with DataFrames, RDDs, and SparkContext (see the sketch after this list).
  • Utilized Azure APIs to automate functionality in Data Factory, Data Lake, and other sources.
  • Troubleshot and found bugs in the Hadoop applications and worked with the testing team to clear all of them.
  • Monitored Hadoop cluster job performance, performed capacity planning, and managed nodes on the Hadoop cluster.
  • Created clusters in Azure Databricks and also worked with Azure Storage Explorer.
  • Involved in developing SSIS/DTS packages to extract, transform, and load (ETL) data into the data warehouse/data marts from heterogeneous sources.
  • Responsible for using Oozie to control workflows.
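
A minimal PySpark sketch of the DataFrame/RDD work referenced above, reading Flume-collected JSON logs and writing a cleaned copy that an external Hive table could sit on; the paths and field names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("weblog-json-cleanup").getOrCreate()
    sc = spark.sparkContext  # SparkContext, as referenced above

    # Load JSON logs from HDFS into a DataFrame (hypothetical path and fields).
    logs = spark.read.json("hdfs:///flume/weblogs/")

    # Example of switching between the DataFrame and RDD views of the same data.
    error_count = logs.rdd.filter(lambda row: row["status"] == "ERROR").count()
    print("error events:", error_count)

    # Store a cleaned copy as Parquet so it can back an external Hive table.
    logs.filter(F.col("status").isNotNull()) \
        .write.mode("overwrite").parquet("hdfs:///warehouse/weblogs_clean/")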

Environment: Hadoop 2.0, HDFS, Pig 0.11, Hive 0.12.0, MapReduce 2.5.2, Sqoop, LINUX, Flume 1.94, Kafka 0.8.1, HBase 0.94.6, CDH4, Cassandra, Oozie 3.3.0, JSON, XML, MongoDB, Hadoop Cluster.

Client: Sunera Tech, Hyderabad

Role: Big Data Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools such as HiveQL.
  • Imported and exported data in HDFS and Hive using Sqoop.
  • Extracted BSON files from MongoDB, placed them in HDFS, and processed them (see the sketch after this list).
  • Designed and developed MapReduce jobs to process data arriving in BSON format.
  • Worked on the POC to bring data into HDFS and Hive. Wrote Hive UDFs to extract data from staging tables. Involved in creating Hive tables and loading them with data.
  • Hands-on experience writing MapReduce code to turn unstructured data into structured data for downstream loading. Experience in creating integration between Hive and HBase.
  • Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all the jobs. Assisted management in identifying opportunities to streamline and improve processes.
  • Used the Oozie scheduler to submit workflows.
  • Reviewed QA test cases with the QA team. Strong understanding of the principles of data warehousing, fact tables, dimension tables, and star and snowflake schema modeling.
  • Strong experience with database performance tuning and optimization, query optimization, index tuning, and caching and buffer tuning.
  • Implemented stored procedures, functions, views, triggers, and packages in PL/SQL.
  • Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle, flat files, and SQL Server 2005.
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software and tools such as Perl, Toad, MS Access, Excel, and SQL.
  • Denormalized the database to fit the star schema of the data warehouse.
  • Managed all indexing, debugging, optimization, and query optimization techniques for performance tuning using T-SQL.
  • Developed PL/SQL triggers and master tables for automatic creation of primary keys.
  • Created PL/SQL stored procedures, functions, and packages for moving the data from the staging area to the data mart.
  • Experience in automating and scheduling Informatica jobs using UNIX shell scripting and configuring Korn shell jobs for Informatica sessions.
  • Created internal and external reports using stored procedures from the reporting data warehouse.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Extensively used star and snowflake schema methodologies in building and designing the logical data model into dimensional models.
  • Created entity-relationship diagrams, functional decomposition diagrams, and data flow diagrams.
  • Used data transformation tools such as DTS, SSIS, Informatica, and DataStage.
  • Managed all aspects of the data warehouse such as data migration, validation, integration, cleansing, database/query optimization, stored procedures, functions, views, index creation, and reporting. Extensive experience in all phases of the RUP and SDLC processes.
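
A minimal Python sketch of the MongoDB-to-HDFS BSON extraction referenced above, assuming the pymongo driver and its bundled bson package; the connection string, database, and collection names are hypothetical, and the resulting file would be pushed to HDFS separately (for example with hdfs dfs -put).

    import bson
    from pymongo import MongoClient

    # Hypothetical connection details and collection name.
    client = MongoClient("mongodb://localhost:27017")
    collection = client["sales_db"]["transactions"]

    # Dump each document as BSON into a local staging file for HDFS ingestion
    # and downstream MapReduce processing.
    with open("transactions.bson", "wb") as out:
        for doc in collection.find():
            out.write(bson.encode(doc))

    client.close()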

Environment: Hadoop 1.2.1, Java JDK 1.6, MapReduce 1.x, HBase 0.70, MySQL, MongoDB, Oozie 3.x, Hive, Sqoop, Data Management, Oracle, Query Optimization, Data Models, ER/Studio, DTS, SSIS, Informatica, SDLC, SQL, T-SQL, Snowflake.
