Data Analyst Resume
Seattle, WA
SUMMARY:
- 7+ years of IT experience in Data/Business Analysis, ETL Development, Data Modeling, and Project Management, including 2+ years of experience in Big Data and related Hadoop technologies.
- Excellent knowledge of the internal workings of HDFS and MapReduce
- In-depth knowledge of Hadoop ecosystem components such as Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, and Cloudera Manager
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace, and OpenStack
- Experience in using HCatalog with Hive, Pig, and HBase
- Exposure to NoSQL databases HBase and Cassandra
- Excellent knowledge of Python collections and multithreading.
- Skilled in Python, with proven expertise in adopting new tools and technical developments
- Experience with Apache Spark clusters and stream processing using Spark Streaming
- Worked with several Python packages, including NumPy, SciPy, pandas, and PyTables
- Strong experience in Business and Data Analysis, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Integration, Metadata Management Services, and Configuration Management
- Proficiency in multiple databases, including Teradata, MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server
- Design and implementation of data extraction, transformation, and loading (using SQL Loader, Informatica, and other ETL tools), and analysis and migration of Oracle and SQL Server data
- Data analysis: data collection, transformation, and loading using ETL systems such as SSIS and Informatica.
- Strong experience in interacting with stakeholders/customers; gathering requirements through interviews, workshops, and existing system documentation or procedures; defining business processes; and identifying and analyzing risks using appropriate templates and analysis tools.
- Experience in various phases of the Software Development Life Cycle (analysis, requirements gathering, design), with expertise in documenting requirement specifications, functional specifications, test plans, source-to-target mappings, and SQL joins.
- Good understanding of Relational Database Design, Data Warehouse/OLAP concepts and methodologies
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper.
Methodologies: Waterfall, Rational Unified Process (RUP), Agile.
Languages: Java, J2EE, SQL.
Business Modeling Tools: MS Visio, Visual Studio
Project Management Tools: MS Project, MS Office: Excel (Pivots and Macros), MPP, JIRA.
Requirement Management Tools: Rational Requisite Pro.
Databases: Oracle, MySQL.
Database Development: T-SQL, PL/SQL.
NoSQL Databases: HBase, MongoDB, Cassandra
Hadoop Distributions: Cloudera, Hortonworks.
Operating Systems: Unix, Windows, Linux.
Design Tools: Microsoft Visio
ETL Tools: Informatica, SSIS.
PROFESSIONAL EXPERIENCE:
Confidential, Seattle, WA
Data Analyst
Responsibilities:
- Worked with the BI team to gather report requirements and used Sqoop to export data into HDFS and Hive.
- Involved in extraction, transformation, and loading of data directly from different source systems (flat files/Excel/Oracle/MS SQL/Teradata) using SAS/SQL and SAS macros.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Developed MapReduce programs to extract and transform the data sets; results were exported back to the RDBMS using Sqoop.
- Responsible for importing data (mostly log files) from various sources into HDFS using Flume.
- Created Hive tables and loaded them with the structured data produced by the MapReduce jobs.
- Developed many HiveQL queries to extract the required information.
- Exported the required information to the RDBMS using Sqoop so the claims processing team could use the data to process claims.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Managed and reviewed Hadoop log files.
- Performed data analysis using Python pandas.
- Processed data records and returned computed results using the MongoDB aggregation framework; parsed aggregated data into Apache Solr and the graph database OrientDB (see the sketch at the end of this list).
- Involved in various phases of analytics using R, Python, and Jupyter notebooks.
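A minimal, illustrative sketch of the MongoDB aggregation and pandas analysis mentioned above; the database, collection, and field names (claims_db, claims, status, amount) are hypothetical placeholders, not the actual project schema.

    # Illustrative sketch: summarize claim records with the MongoDB aggregation framework,
    # then inspect the result with pandas. All names below are hypothetical.
    import pandas as pd
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    claims = client["claims_db"]["claims"]

    # Aggregation pipeline: total, average, and count of claim amounts per claim status.
    pipeline = [
        {"$group": {
            "_id": "$status",
            "total_amount": {"$sum": "$amount"},
            "avg_amount": {"$avg": "$amount"},
            "claim_count": {"$sum": 1},
        }},
        {"$sort": {"total_amount": -1}},
    ]
    summary = pd.DataFrame(list(claims.aggregate(pipeline)))
    print(summary.head())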
Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, MongoDB, VMware, Eclipse, Cloudera, and RStudio.
Confidential, Chicago, IL
Hadoop/ Big Data Analyst
Responsibilities:
- Developed MapReduce programs to parse and filter the raw data and partitioned tables in Teradata. Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
- Involved in running MapReduce jobs for processing millions of records.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Responsible for Data Modeling as per our requirement in HBase and for managing and scheduling Jobs on a Hadoop cluster using Oozie jobs.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext (see the sketch at the end of this list).
- Performed analysis on implementing Spark using Scala and wrote sample Spark programs using PySpark.
- Used Solr Search & MongoDB for querying and storing data.
- Created UDFs to calculate the pending payment for a given residential or small business customer's quotation data, and used them in Pig and Hive scripts.
- Experienced in moving data from Hive tables into HBase for real-time analytics.
- Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
- Responsible for data modeling in MongoDB to load both structured and unstructured data.
- Unstructured files such as XML and JSON were processed using a custom-built Java API and pushed into MongoDB.
- Built and deployed Java applications into multiple UNIX based environments and produced both unit and functional test results along with release notes.
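A minimal PySpark sketch of running Hive queries through Spark SQL and applying a UDF along the lines of the pending-payment calculation above; the table and column names (quotations, quoted_amount, amount_paid) are assumptions for illustration only.

    # Illustrative PySpark sketch; table and column names are placeholders, not the actual schema.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    spark = (SparkSession.builder
             .appName("quotation-analysis")
             .enableHiveSupport()       # lets Spark SQL read existing Hive tables
             .getOrCreate())

    # Run a HiveQL query through Spark SQL for faster execution than Hive-on-MapReduce.
    quotes = spark.sql("SELECT customer_id, quoted_amount, amount_paid FROM quotations")

    # UDF computing the pending payment per quotation (same idea as a Hive/Pig UDF).
    pending_udf = F.udf(lambda quoted, paid: float(quoted or 0.0) - float(paid or 0.0), DoubleType())

    result = quotes.withColumn("pending_payment", pending_udf("quoted_amount", "amount_paid"))
    result.show(10)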
Environment: HDFS, MapReduce, Hive, Pig, Cloudera, Impala, Oozie, Greenplum, MongoDB, HBase, Sqoop, Flume, Maven, Python, Cloud Manager, JDK, J2EE, Struts, JSP, Servlets, Solr, WebSphere, HTML, XML, JavaScript, MRUnit.
Confidential, Irvine, CA
Hadoop/ Big Data Analyst
Responsibilities:
- Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Involved in data ingestion into HDFS using Sqoop from a variety of sources, using connectors such as JDBC and import parameters.
- Responsible for managing data from various sources and their metadata.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Installed and configured Hive and wrote Hive UDF's that helped spot market trends.
- Used Hadoop Streaming to process terabytes of data in XML format.
- Used Hive queries in Spark SQL for analyzing and processing the data; used Scala programming to perform transformations and apply business logic.
- Implemented partitioning, dynamic partitions, indexing, and bucketing in Hive.
- Loaded the dataset into Hive for ETL operations.
- Stored processed data in Parquet file format.
- Streamed data from the data source using Flume.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed snowflake schemas by normalizing the dimension tables as appropriate.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch at the end of this list).
- Worked with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark Streaming.
- Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
- Implemented advanced procedures such as text analytics and processing using in-memory computing capabilities like Spark.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Used SAS Enterprise Guide and SAS Enterprise Miner to build statistical models and conduct statistical analyses and tests.
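A minimal sketch of converting a Hive-style query into Spark DataFrame transformations and storing the refined data as partitioned Parquet, as described above; the table, columns, and output path (web_logs, status_code, load_date, /data/refined/daily_counts) are illustrative assumptions.

    # Illustrative sketch; table, column, and path names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("log-etl")
             .enableHiveSupport()
             .getOrCreate())

    # Equivalent of a HiveQL aggregation, expressed as DataFrame transformations.
    logs = spark.table("web_logs")
    daily_counts = (logs
                    .filter(F.col("status_code") == 200)
                    .groupBy("load_date", "page")
                    .agg(F.count("*").alias("hits")))

    # Store the refined data as Parquet, partitioned by load date.
    daily_counts.write.mode("overwrite").partitionBy("load_date").parquet("/data/refined/daily_counts")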
Environment: Hive, Sqoop, Pig, Puppet, HBase, MongoDB, PowerPivot, and Flume.
Confidential, Indianapolis, IN
Data Analyst
Responsibilities:
- Conducted independent statistical analysis, descriptive analysis and Logistic Regression.
- Performed ad hoc analysis of data sources for all external and internal customers.
- Imported the claims data into Python using the pandas library and performed various data analyses (see the sketch at the end of this list).
- Created interactive dashboards and visualizations of claims reports, competitor analysis and improved statistical data using Tableau.
- Supported technical team members for technologies such as Microsoft Excel.
- Formulated procedures for data extraction, transformation and integration of health care data.
- Used Excel VLOOKUPs to look up customer data and created pivot tables to easily access and validate data.
- Worked extensively on Data Profiling, Data cleansing, Data Mapping and Data Quality.
- Created Tableau Dashboards with interactive views, trends and drill downs along with user level security.
- Assisted the team for standardization of reports using SAS macros and SQL.
- Performed Detailed Data Analysis (DDA), Data Quality Analysis (DQA) and Data Profiling on source data.
- Responsible for designing, developing, and testing the software (Informatica, PL/SQL, UNIX shell scripts) to maintain the data marts (loading data and analyzing it using OLAP tools).
- Experience in building data integration, workflow, and Extract, Transform, and Load (ETL) solutions for data warehousing using SQL Server Integration Services (SSIS).
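A minimal sketch of the pandas-based claims analysis and logistic regression described above; the file name, feature columns, and target column are hypothetical placeholders.

    # Illustrative sketch; file and column names are hypothetical placeholders.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Load claims data and run basic descriptive analysis.
    claims = pd.read_csv("claims.csv")
    print(claims.describe())

    # Fit a simple logistic regression on a binary outcome (e.g., claim approved or not).
    features = claims[["claim_amount", "member_age", "provider_visits"]].fillna(0)
    target = claims["approved"]

    X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("Holdout accuracy:", model.score(X_test, y_test))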
Environment: Python, Tableau 7.0, Microsoft Excel, MS SQL, SAS, Teradata, ETL, SSIS.
Confidential
Data Analyst
Responsibilities:
- Experienced in developing business reports by writing complex SQL queries using views and volatile tables.
- Experienced in Automating and Scheduling the Teradata SQL Scripts in UNIX using Korn Shell scripting.
- Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests (see the sketch at the end of this list).
- Implemented indexes, collected statistics, and applied constraints while creating tables.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
- Built and published customized interactive reports and dashboards, and scheduled reports using Tableau Server.
- Designed and deployed rich graphic visualizations with drill-down and drop-down menu options and parameters using Tableau.
- Created side-by-side bars, scatter plots, stacked bars, heat maps, filled maps, and symbol maps according to deliverable specifications.
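A minimal sketch of an ad hoc Teradata data pull loaded into pandas for reporting, in the spirit of the bullets above; the driver choice (teradatasql), host, credentials, and query are assumptions for illustration.

    # Illustrative sketch; driver, connection details, and query are assumed placeholders.
    import pandas as pd
    import teradatasql

    # Ad hoc data pull: run an aggregation query and load the result into pandas.
    query = """
    SELECT region, SUM(sales_amount) AS total_sales
    FROM sales_fact
    GROUP BY region
    ORDER BY total_sales DESC
    """

    with teradatasql.connect(host="tdhost.example.com", user="analyst", password="***") as con:
        with con.cursor() as cur:
            cur.execute(query)
            rows = cur.fetchall()
            columns = [desc[0] for desc in cur.description]

    report = pd.DataFrame(rows, columns=columns)
    print(report.head())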
Environment: Oracle 9i, MS Office, Teradata 13, Tableau 6.1.10