
Big Data Engineer Resume

CA

SUMMARY:

  • 5+ years of experience across various IT technologies, including hands-on experience in Big Data technologies
  • Proficient in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, YARN, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and ZooKeeper
  • Strong understanding of Hadoop daemons and MapReduce concepts
  • Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from numerous sources such as flat files, XML documents, and databases
  • Experienced in developing UDFs for Pig and Hive
  • Strong knowledge of Spark with Scala for large-scale data processing, including streaming
  • Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing
  • Worked with NoSQL databases like HBase, Cassandra, and MongoDB for extracting information and storing large volumes of data
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores
  • Ability to develop MapReduce programs using Java and Python
  • Good understanding of and exposure to Python programming
  • Exported and imported data to and from Oracle using SQL Developer for analysis
  • Developed PL/SQL programs (Functions, Procedures, Packages and Triggers)
  • Good experience in using Sqoop for traditional RDBMS data pulls
  • Worked with different distributions of Hadoop like Hortonworks and Cloudera
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors
  • Extensive use of open-source software such as the Eclipse 3.x IDE and web/application servers such as Apache Tomcat 6.0
  • Experience designing components from requirements using UML diagrams (use case, class, sequence, deployment, and component)
  • Involved in report development using reporting tools like Tableau; used Excel sheets, flat files, and CSV files to generate ad hoc Tableau reports
  • Broad design, development, and testing experience with Talend Integration Suite, and knowledge of performance tuning of mappings
  • Experience with cluster monitoring tools like Apache Ambari and Hue
  • Solid technical foundation, strong analytical skills, a team player, and goal-oriented, with a commitment to excellence
  • Outstanding communication and presentation skills; willing to learn and adapt to new technologies and third-party products

PROFESSIONAL EXPERIENCE:

Big Data Engineer

Confidential, CA

Responsibilities:

  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
  • Migrated Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance
  • Assessed business rules, collaborated with stakeholders, and performed source-to-target data mapping, design, and review
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2
  • Loaded data from different sources (databases and files) into Hive using Talend
  • Conducted POCs for ingesting data using Flume
  • Used all major ETL transformations to load the tables through Informatica mappings
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies to historical data before promoting them to production
  • Worked with SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning to improve Hive performance and storage
  • Developed Pig scripts to parse the raw data, populate staging tables, and store the refined data in partitioned DB2 tables for business analysis
  • Managed and reviewed Hadoop log files; tested and reported defects following Agile methodology
  • Conducted and participated in project team meetings to gather status and discuss issues and action items
  • Provided support for research and resolution of testing issues
  • Coordinated with the business for UAT sign-off

Environment: Hadoop, Cloudera, Talend, Scala, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, YARN, NDM, Quality Center 9.2, Informatica, Windows & Microsoft Office

Data Analyst

Confidential, NJ

Responsibilities:

  • Worked as a Data Analyst to generate data models using Oracle and developed relational database systems
  • Performed data analysis, primarily identifying datasets, source data, metadata, data formats, and data definitions
  • Installed and worked with R and Tableau in creating visualizations for the data
  • Documented the complete process flow to describe program development, logic, testing, implementation and application integration
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs
  • Devised procedures that solve complex business problems with due consideration for hardware/software capacity and limitations, operating times, and desired results
  • Analyzed large data sets to determine the optimal way to aggregate and report on them
  • Involved in the implementation of a metadata repository, maintaining data quality, data cleaning procedures, data transformations, stored procedures, triggers, and execution plans
  • Responsible for data extraction, data aggregation, building of centralized data solutions and quantitative analysis to generate business insights
  • Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior
  • Worked hands-on with the ETL process
  • Worked closely with ETL, SSIS, and SSRS developers to explain the logic behind the data transformations
  • Prepared the workspace for Markdown; performed data analysis and statistical analysis; and generated reports, listings, and graphs

Environment: Oracle, Tableau, R, MS Excel, SQL, MS-SQL Databases

Big Data Engineer

Confidential, OH

Responsibilities:

  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
  • Migrated Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance
  • Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/EXCEL, pivot tables, and graphs
  • Assessed business rules, collaborated with stakeholders, and performed source-to-target data mapping, design, and review
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2
  • Loaded data from different sources (databases and files) into Hive using Talend
  • Conducted POCs for ingesting data using Flume
  • Used all major ETL transformations to load the tables through Informatica mappings
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies to historical data before promoting them to production
  • Worked with SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning to improve Hive performance and storage
  • Managed and reviewed Hadoop log files; tested and reported defects following Agile methodology
  • Conducted and participated in project team meetings to gather status and discuss issues and action items
  • Involved in report development using reporting tools like Tableau; used Excel sheets, flat files, and CSV files to generate ad hoc Tableau reports
  • Provided support for research and resolution of testing issues

Environment: Hadoop, Cloudera, Talend, Python, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, YARN, NDM, SAS/SQL, SAS/EXCEL, JIRA, Informatica, Windows & Microsoft Office, Tableau
